Encyclopedia of Physical Science and Technology - Mathematics
Encyclopedia of Physical Science and Technology EN001C.19 May 26, 2001 14:19
Algebra, Abstract
Ki Hang Kim
Fred W. Roush
Alabama State University
Quotient group  For a normal subgroup N of a group G, the group formed by equivalence classes x̄ under the relation that x ∼ y if xy⁻¹ is in N, with the product x̄ ȳ defined as the class of xy.
Ring  Structure having one associative and commutative operation with identity and inverses, and a second operation associative and distributive over the first.
Semigroup  Structure having an associative operation.
Simple  Structure S such that, for every homomorphism f to another structure, either f(x) = f(y) for all x, y in S or f(x) ≠ f(y) for all x ≠ y in S.
Subgroup  Subset S of a group G containing the product and inverse of any of its elements. It is called normal if, for every g in G, s in S, the element gsg⁻¹ is in S.

ABSTRACT ALGEBRA is the study of laws of operations and relations such as those valid for the real numbers and similar laws for new systems. It studies the class of all possible systems having given laws, for example, associativity. There is no standard object that it studies such as the real numbers in analysis. Two systems are isomorphic in abstract algebra if there is a one-to-one (1–1 for short) correspondence between their elements such that relations and operations agree in the two systems.

I. SETS AND RELATIONS

A. Sets

A set is any precisely defined collection of objects. The objects in a set are called elements (or members) of the set. The set A = {1, 2, x, y} has elements 1, 2, x, y and we write 1 ∈ A, 2 ∈ A, x ∈ A, and y ∈ A. There is no set of all sets, but given any set we can obtain other sets by operations listed below, and for any set A and property P(x) we have a set {x ∈ A : P(x)} of all elements of A having the property. Things such as “all sets” are called classes rather than sets. There is a set of all real numbers and one of all isosceles triangles in three-dimensional space.

We say that A is a subset of B if all elements of A are in B. This is written A ⊂ B, or B ⊃ A. The set is called a proper subset if A ≠ B; otherwise it is called an improper subset. For finite sets, a subset is proper if and only if it has strictly fewer elements. The empty set ∅ is the set with no elements.

If A ⊂ B and B ⊂ A, then A = B. The union ∪ A of a family ℱ of sets is the set whose elements are all things that belong to at least one set A ∈ ℱ. The intersection ∩ A is the set of all elements lying in every set A in ℱ.

The power set P(A) is the set of all subsets of A. The relative complement B\A (or B − A) is the set of all elements in B but not in A. For fixed set B ⊃ A, this is called the complement of A in B.

The Cartesian product A₁ × A₂ × ··· × Aₙ of n sets A₁, A₂, …, Aₙ is the set of all n-tuples (a₁, a₂, …, aₙ) such that aᵢ ∈ Aᵢ for all i = 1, 2, …, n. Infinite Cartesian products can be similarly defined with i ranging over any index set I.

B. Binary Relations

The truth or falsity of any mathematical statement about a relationship x R y, for instance, x² + y² = 1, can be determined from the set of ordered pairs (x, y) such that x R y. In fact, R is equivalent to the relationship that (x, y) ∈ {(x, y) ∈ A × B : x R y}, where A and B are sets containing x and y.

The set of ordered pairs then represents the relationship and is called a binary relation. In general, a binary relation from A to B is a subset of A × B, that is, a set of ordered pairs.

The union, intersection, and complement of binary relations from A to B are defined on them as subsets of A × B. Another operation is composition. If R is a binary relation from A to B and S is a binary relation from B to C, then R ∘ S is {(x, z) ∈ A × C : (x, y) ∈ R and (y, z) ∈ S for some y ∈ B}. Composition is distributive over union. For R, R₁ from A to B, S, S₁ from B to C, and T from C to D, we have (R ∪ R₁) ∘ S = (R ∘ S) ∪ (R₁ ∘ S) and R ∘ (S ∪ S₁) = (R ∘ S) ∪ (R ∘ S₁). It is associative; we have (x, w) ∈ (R ∘ S) ∘ T if and only if for some y ∈ B and z ∈ C, (x, y) ∈ R, (y, z) ∈ S, and (z, w) ∈ T. The same condition is obtained for R ∘ (S ∘ T).

The identity relation 1_A is {(a, a) : a ∈ A}. This relation acts as an identity on either side for R ⊂ A × B, S ⊂ B × C; that is, R ∘ 1_B = R, and 1_B ∘ S = S.

The transpose Rᵀ of a binary relation R from A to B is {(b, a) ∈ B × A : (a, b) ∈ R}. It is also called converse and inverse.

C. Functions

A partial function from a set A to a set B is a relation f from A to B such that if (x, y) ∈ f, (x, z) ∈ f then y = z. If (x, y) ∈ f, we write y = f(x) since this condition means y is unique. A partial function is a function defined on a subset of a set considered. Its domain is {a ∈ A : (a, b) ∈ f for some b ∈ B}.

A function is a partial function such that for all x ∈ A there exists y ∈ B with (x, y) ∈ f.

A function is 1–1 if and only if whenever (x, z) ∈ f, (y, z) ∈ f we have x = y. This is the transpose of the definition of a partial function. Functions and partial functions are thought of in several ways. A function may be considered the output of a process in which x is an input, as x² + x + 1 adds a number to its square and 1. We may consider it as assigning an element of B to any element of A. Or we may consider it to be a map of A into the set B in which the point x ∈ A is represented by f(x) ∈ B. Then f(x) is called the image of x. Also for a subset C ⊂ A the image f(C) is {f(x) : x ∈ C}, the set of images of points of C. We may also consider it to be a transformation. Strictly, however, the term transformation is sometimes reserved for functions from a set to itself.

A 1–1 function sends distinct elements of A to distinct points. For a 1–1 function on a finite set C, f(C) will have exactly as many elements as C.

The composition of functions f : A → B and g : B → C is given by g(f(x)). It is written g ∘ f. Composition is associative since it is a special case of composition of binary relations except that the order is reversed. The identity relation on a set is also called the identity function. A composition of partial (1–1) functions is respectively a partial (1–1) function.

A function f : A → B is onto if f(A) = B, that is, if every y ∈ B is the image of some x. A composition of onto functions is onto. A 1–1 onto function is called a 1–1 correspondence. For a 1–1 onto function f, the inverse (converse) of f is defined by f⁻¹ = {(y, x) : (x, y) ∈ f}, or x = f⁻¹(y) if and only if y = f(x). It is characterized by f ∘ f⁻¹ = 1_B, f⁻¹ ∘ f = 1_A. Moreover, for any binary relations R, S, if R ∘ S = 1_B and S ∘ R = 1_A, both R and S must be 1–1 correspondences and S must be Rᵀ. A 1–1 correspondence from a finite set to itself is called a permutation.

D. Order Relations

That a binary relation R from a set A to itself is (OR-1) reflexive means (x, x) ∈ R for all x ∈ A; (OR-2) symmetric means if (x, y) ∈ R, then (y, x) ∈ R; and (OR-3) transitive means if (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R. Frequently x R y is written for (x, y) ∈ R, so that transitivity means if x R y and y R z, then x R z. For instance, if x ≥ y and y ≥ z, then x ≥ z. So x ≥ y is transitive, but x ≠ y is not transitive.

The following relations are reflexive, symmetric, and transitive on the set of geometric figures: x = y (same point set), x ≅ y (congruence), x ∼ y (similarity), x has the same area as y. A reflexive, symmetric, transitive relation is called an equivalence relation. For any function f from A to B the relation {(x, y) ∈ A × A : f(x) = f(y)} is an equivalence relation.

The set x̄ = {y ∈ A : x R y} is called the equivalence class of x (the set of all elements equivalent to the same element x). The set of equivalence classes x̄ is called the quotient set A/R. The function f(x) = x̄ is called the quotient map A → A/R. The relation {(x, y) ∈ A × A : f(x) = f(y)} is precisely R.

Any two equivalence classes are disjoint or equal: If x R z and y R z, by symmetry z R y and by transitivity x R y. Any element belongs to its own equivalence class. A family ℱ of nonempty subsets of a set S is called a partition if (1) whenever C ≠ D in ℱ, we have C ∩ D = ∅; (2) the union ∪ C of all C in ℱ is S. Therefore, the set of equivalence classes always forms a partition. Conversely, every partition ℱ of a set A arises from the equivalence relation {(x, y) ∈ A × A : for some C ∈ ℱ, x ∈ C and y ∈ C}.

An important equivalence is congruence modulo m of integers. We say x ≡ y (mod m) for integers x, y, m if there exists an integer h such that x − y = hm, that is, if m divides x − y. If x − y = hm and y − z = km, then x − z = (x − y) + (y − z) = hm + km = (h + k)m. So if x ≡ y (mod m) and y ≡ z (mod m), then x ≡ z (mod m). This proves transitivity.

A relation that is reflexive and transitive is called a quasiorder. If it also satisfies (OR-4) antisymmetry, that is, if (x, y) ∈ R and (y, x) ∈ R then x = y, it is called a partial order. If a partial order satisfies (OR-5) completeness, that is, for all x, y ∈ A either (x, y) ∈ R or (y, x) ∈ R, it is called a total order, or a linear order.

For every total order on a finite set A, the elements of A can be labeled a₁, a₂, …, aₙ in such a way that aᵢ R aⱼ if i < j. An isomorphism between a binary relation R₁ on a set A₁ and R₂ on A₂ is a 1–1 correspondence f : A₁ → A₂ such that (x, y) ∈ R₁ if and only if (f(x), f(y)) ∈ R₂. Therefore, every total order on a finite set is isomorphic to the standard order on {1, 2, …, n}.

There are many nonisomorphic infinite total orders on the same set and many nonisomorphic partial orders on finite sets of n elements. The structure of quasiorders can be reduced to that of partial orders. For every quasiorder R on any set A, the relation {(x, y) : (x, y) ∈ R and (y, x) ∈ R} is an equivalence relation. The quasiorder gives a partial order R₁ on the set of equivalence classes of this relation, and (x, y) ∈ R if and only if (x̄, ȳ) ∈ R₁.

The structure of partial orders on small sets can be described by diagrams known as Hasse diagrams. An element x of a partial order is called minimal (maximal) if for no y ≠ x does (y, x) ∈ R ((x, y) ∈ R), where (x, y) ∈ R is taken as x ≤ y in the order. Every partial order on a finite set has at least one minimal and at least one maximal element. Represent all minimal elements of the partial order as points at the same horizontal level at the bottom of the diagram. From then on, the ith level consists of elements z not in previous levels but such that for at least one y on
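The order properties (OR-1), (OR-3), (OR-4) and the definitions of minimal and maximal elements in Section D can be checked mechanically on a small finite set. The following Python sketch stores a relation as a set of ordered pairs. The function names and the example (divisibility on {2, 3, 4, 6, 12}) are assumptions chosen for illustration and do not come from the text.

```python
from itertools import product

def is_reflexive(A, R):
    # (OR-1): (x, x) in R for all x in A
    return all((x, x) in R for x in A)

def is_antisymmetric(R):
    # (OR-4): (x, y) in R and (y, x) in R imply x = y
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    # (OR-3): (x, y) in R and (y, w) in R imply (x, w) in R
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def is_partial_order(A, R):
    return is_reflexive(A, R) and is_antisymmetric(R) and is_transitive(R)

def minimal_elements(A, R):
    # x is minimal if for no y != x does (y, x) belong to R
    return {x for x in A if not any((y, x) in R for y in A if y != x)}

def maximal_elements(A, R):
    # x is maximal if for no y != x does (x, y) belong to R
    return {x for x in A if not any((x, y) in R for y in A if y != x)}

# Illustrative example: x R y means "x divides y".
A = {2, 3, 4, 6, 12}
DIVIDES = {(x, y) for x, y in product(A, repeat=2) if y % x == 0}

print(is_partial_order(A, DIVIDES))
print(minimal_elements(A, DIVIDES), maximal_elements(A, DIVIDES))
```

In this example 2 and 3 are the minimal elements and 12 is the only maximal element; the order is not total, since neither of 2, 3 divides the other.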
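The passage from an equivalence relation to its quotient set, described in Section D, can be sketched in a few lines of Python. The helper names `quotient_set` and `congruent_mod` and the sample range of integers are hypothetical; the sketch assumes the relation passed in really is an equivalence relation, so that comparing with one representative of each class is enough.

```python
def quotient_set(A, related):
    """Split the elements of A into equivalence classes of `related`."""
    classes = []
    for x in A:
        for cls in classes:
            # By symmetry and transitivity one representative per class suffices.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            # x is equivalent to no earlier element: it starts a new class.
            classes.append({x})
    return [frozenset(c) for c in classes]

def congruent_mod(m):
    """Return the relation x ≡ y (mod m) as a two-argument predicate."""
    return lambda x, y: (x - y) % m == 0

A = range(-4, 6)
classes = quotient_set(A, congruent_mod(3))
for cls in classes:
    print(sorted(cls))
```

The classes returned are pairwise disjoint and their union is A, which is the partition property stated in the text.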
FIGURE 2 Classification of binary operations.

F. Blockmodels

Often binary relations are empirically obtained. Such binary relations can frequently be simplified by blocking the Boolean matrices: dividing the set of indices into disjoint subsets, relabeling to get members of the same subset adjacent, and dividing the matrix into blocks. Each nonzero block is replaced by a single 1 entry and each zero block by a single 0 entry. Many techniques called clustering exist for dividing the total set into subsets. One (CONCOR) is to take iterated correlation matrices of the rows.
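The replacement of blocks by single entries can be illustrated with a minimal Python sketch. It assumes the division of the index set into subsets is already given (no clustering method such as CONCOR is implemented here); the function name `block_image` and the sample matrix are hypothetical.

```python
def block_image(matrix, blocks):
    """Replace each block of a 0/1 matrix by 1 if it is nonzero, else by 0.

    `blocks` is a list of disjoint lists of indices covering all indices;
    the same division is used for rows and for columns.
    """
    return [
        [int(any(matrix[i][j] for i in rows for j in cols)) for cols in blocks]
        for rows in blocks
    ]

# Illustrative 4 x 4 Boolean matrix, blocked as {0, 1} and {2, 3}.
M = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
blocks = [[0, 1], [2, 3]]
image = block_image(M, blocks)
print(image)  # [[1, 1], [0, 1]]
```

Only the lower-left block of M is entirely zero, so it is the only 0 entry of the 2 × 2 image.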
Provided that every nonzero block has at least one 1 in each row, the replacement of blocks by single entries will preserve all Boolean sums and products.

G. General Relational Structures

A general finite relational structure on a set S is an indexed family R_α of subsets of S ∪ (S × S) ∪ (S × S × S) ∪ ···. Such structures include order structures and operational structures such as multiplicative ones, given as subsets {(x, y, z) ∈ S × S × S : x ∗ y = z}. A homomorphism of relational structures (indexed the same way) R_α on S₁ to T_α on S₂ consists of a function f : S₁ → S₂ such that if g is the mapping S₁ ∪ (S₁ × S₁) ∪ (S₁ × S₁ × S₁) ∪ ··· to S₂ ∪ (S₂ × S₂) ∪ (S₂ × S₂ × S₂) ∪ ··· which is f on each coordinate, then g(R_α) ⊂ T_α for each α. An isomorphism of relational structures is a 1–1 onto homomorphism such that g(R_α) = T_α for each α. The quotient structure of R_α on S associated with an equivalence relation E on S is the structure T_α = g(R_α), for f the mapping S → S/E assigning to each element its equivalence class.

H. Arithmetic of Residue Classes

Let Z denote the set of all integers. Let Eₘ be the equivalence relation {(x, y) ∈ Z × Z : x − y = km for some k ∈ Z}. This is denoted x ≡ y (mod m). We have previously noted that it is an equivalence relation.

It divides the integers into exactly |m| equivalence classes, for m ≠ 0. For m = 3, the classes are 0̄ = {…, −9, −6, −3, 0, 3, 6, 9, …}, 1̄ = {…, −8, −5, −2, 1, 4, 7, 10, …}, 2̄ = {…, −7, −4, −1, 2, 5, 8, 11, …}. Any two members of the same class are equivalent (3 divides their difference).

This relation has the property that if x ≡ y (mod m) then, for any z ∈ Z, x + z ≡ y + z, since m divides (x + z) − (y + z) = x − y, and xz ≡ yz, since m divides xz − yz = (x − y)z. Such a relation in general is called a congruence. For any congruence, we can define operations on the classes by taking x̄ + ȳ to be the class of x + y, and x̄ ȳ to be the class of xy.

Let Zₘ be the set of equivalence classes of Z under congruence modulo m, with the quotient operations +, ×. Operations in Z5 are given in Fig. 4.

FIGURE 4 Addition and multiplication of modulo 4 residue classes.

II. SEMIGROUPS

A. Generators and Relations

A binary operation is a function that given two entries from a set S produces some element of a set T. Therefore, it is a function from the set S × S of ordered pairs (a, b) to T. The value is frequently denoted multiplicatively as a ∗ b, a ∘ b, or ab. Addition, subtraction, multiplication, and division are binary operations.

The set S is said to be closed under the operation if the product always lies in S itself. The positive integers are not closed under subtraction or division.

The operation is called associative if we always have (a ∘ b) ∘ c = a ∘ (b ∘ c). We have noted that this always holds for composition of functions or binary relations. Conversely, if closure and associativity hold, the set can always be represented by a set of functions under composition.

A set with a binary operation satisfying associativity and closure is called a semigroup. The positive integers form a semigroup under multiplication or addition. An element e is called a left (right) identity in a semigroup S if for all x ∈ S, ex = x (xe = x). A semigroup with two-sided identity is called a monoid. To represent a semigroup as a set of functions under composition, first add a two-sided identity element to obtain a monoid M. Then for each x in M define a function fₓ : M → M by fₓ(y) = x ∘ y.

The set generated by a subset G ⊂ S is the set of all finite products {x₁x₂···xₙ : n ∈ Z⁺, xᵢ ∈ G}, where Z⁺ denotes the set of all positive integers. The set satisfies closure since (x₁x₂···xₙ)(y₁y₂···yₘ) again has the form required. A subset of a semigroup satisfying closure is itself a semigroup and is called a subsemigroup. The set G is called a set of generators for S if G generates the entire semigroup.

For a given set G of generators of S, a relation is an equation x₁x₂···xₘ = y₁y₂···yₙ, where xᵢ, yᵢ ∈ G.

A homomorphism of semigroups is a mapping f : S₁ → S₂ such that f(x ∘ y) = f(x) f(y) for all x, y ∈ S₁. For example, eᶻ is a homomorphism from the additive semigroup of real numbers (R, +) to the multiplicative semigroup of complex numbers (C, ×), since eˣ⁺ʸ = (eˣ)(eʸ). Here C denotes the set of all complex numbers. The trace and determinant of n-square matrices are homomorphisms, respectively, from the additive and multiplicative semigroups of n-square matrices to the additive and multiplicative semigroups of real numbers.

An isomorphism of semigroups is a 1–1 onto homomorphism. The inverse function will then also be an isomorphism. Isomorphic semigroups are structurally identical. Two semigroups having the same generating set G and the same relations are isomorphic, since the mapping f(x₁ ∘ x₂ ∘ ··· ∘ xₙ) = x₁ ∗ x₂ ∗ ··· ∗ xₙ will be a homomorphism, where ∘, ∗ denote the two operations. The identity of the sets of relations guarantees that this is well defined, that is, if x₁ ∘ x₂ ∘ ··· ∘ xₘ = y₁ ∘ y₂ ∘ ··· ∘ yₙ, then f(x₁ ∘ x₂ ∘ ··· ∘ xₘ) = f(y₁ ∘ y₂ ∘ ··· ∘ yₙ).

For any set of generators G and set R of relations, a semigroup S can be produced having these generators and satisfying the relations such that if f : G → T is any function such that for every relation x₁ ∘ x₂ ∘ ··· ∘ xₘ = y₁ ∘ y₂ ∘ ··· ∘ yₙ the relation f(x₁) ∗ f(x₂) ∗ ··· ∗ f(xₘ) = f(y₁) ∗ f(y₂) ∗ ··· ∗ f(yₙ) holds, then f extends to a homomorphism S → T.

To produce S we take the set of all “words” x₁x₂···xₘ in G, that is, sequences of elements in G. Two “words” are multiplied by writing one after the other: x₁x₂···xₘy₁y₂···yₙ. We define an equivalence relation on words by w₁ ∼ w₂ if w₂ can be obtained from w₁ by a series of replacements of ax₁x₂···xₙb by ay₁y₂···yₘb, where a, b are words (or are empty) and x₁x₂···xₙ = y₁y₂···yₘ is a relation in R. Then S is the set of equivalence classes of words.

The semigroup S is called the semigroup with generators G and defining relations R. The fact that multiplication is well defined in S follows from the fact that a ∼ b is a congruence. This means that if a ∼ b then for all words x, ax ∼ bx and xa ∼ xb, and that ∼ is an equivalence relation.

For all semigroup homomorphisms f : S → T the relation f(x) = f(y) is a congruence. Conversely any congruence gives rise to a homomorphism from S to the set of equivalence classes.

B. Green’s Relations

For analyzing the structure of a semigroup, in many cases it is best to decompose it into a family of equivalence classes.

Let S be a semigroup and let M be S with an identity element added if S lacks one. Let ℛ be the relation for x, y ∈ S that for some a, b ∈ M, xa = y, yb = x. Let ℒ be the relation that for some a, b ∈ M, ax = y, by = x. Let 𝒥 be the relation that for some a, b, c, d ∈ M, axc = y, byd = x. What these equations express is a kind of divisibility, that each of x, y can be a factor of the other. For example, in Z⁺ under multiplication (2, 5) ∉ 𝒥 because 2 is not a factor of 5.

There are two other important relations: ℋ = ℒ ∩ ℛ = {(x, y) : x ℒ y and x ℛ y} and 𝒟 = ℒ ∘ ℛ = ℛ ∘ ℒ. These are both equivalence relations also, and the five relations are known as Green’s relations.

For the semigroup of transformations on a finite set T, f ℛ g if and only if the partitions given by the equivalence relations {(x, y) ∈ T × T : f(x) = f(y)}, {(x, y) ∈ T × T : g(x) = g(y)} coincide, f 𝒟 g if and only if f 𝒥 g if and only if f, g have the same number of image elements, and f ℒ g if and only if their images are the same.

In a finite semigroup, 𝒟 = 𝒥. The entire semigroup is always broken into 𝒥-classes, which are partially ordered by divisibility. The 𝒟-classes, which lie in these, are broken into ℋ-classes Hᵢⱼ = Rᵢ ∩ Lⱼ, where Rᵢ, Lⱼ are the ℛ-, ℒ-classes in a given 𝒟-class.

The ℋ-classes can be laid out in a matrix (Hᵢⱼ) called the eggbox picture of a semigroup. There exists a 1–1 correspondence between any two ℋ-classes of the same 𝒟-class.

A left ideal in a semigroup S is a set ℐ ⊂ S such that for all x ∈ S, y ∈ ℐ the element xy ∈ ℐ. Right and two-sided ideals are similarly defined by yx ∈ ℐ, xyz ∈ ℐ for all x, z ∈ S, respectively. An element x generates the principal left, right, and two-sided ideals Sx = {yx : y ∈ S}, xS = {xy : y ∈ S}, SxS = {yxz : y, z ∈ S}. Two elements are ℒ-, ℛ-, 𝒥-equivalent (assuming S has an identity, otherwise add one) if and only if they generate the same left, right, or two-sided ideals, respectively.

C. Binary Relations and Boolean Matrices

The set of binary relations on an n-element set under composition forms a semigroup that is isomorphic to Bₙ, the semigroup of n-square Boolean matrices under Boolean matrix multiplication. A 1 × n (n × 1) Boolean matrix is called a row (column) vector. Two vectors are added or multiplied by a constant (0 or 1) as matrices: (1, 0, 1) + (0, 1, 0) = (1, 1, 1), 0v = 0, and 1v = v. A set of Boolean vectors is called a subspace if it is closed under sums and contains 0. The span of a set W of Boolean vectors is the set of all finite sums of elements of W, including 0.
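The correspondence between composition of binary relations and Boolean matrix multiplication, and the notion of the span of a set of Boolean vectors, can be checked on a small case. The following Python sketch is illustrative only: the function names and the sample relations on {0, 1, 2} are assumptions, and composition is taken in the order R first, then S, as in the definition of R ∘ S in Section I.B.

```python
def relation_to_matrix(R, n):
    """Boolean matrix of a relation R on {0, ..., n-1}: entry (i, j) is 1 iff (i, j) in R."""
    return [[int((i, j) in R) for j in range(n)] for i in range(n)]

def compose(R, S):
    """R ∘ S = {(x, z) : (x, y) in R and (y, z) in S for some y}."""
    return {(x, z) for (x, y) in R for (w, z) in S if y == w}

def bool_matmul(A, B):
    """Boolean product: entry (i, j) is 1 iff A[i][k] = B[k][j] = 1 for some k."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]

def span(vectors, n):
    """All finite Boolean sums of the given length-n vectors, including 0."""
    spanned = {(0,) * n}
    for v in vectors:
        # Adding v to everything found so far; Boolean sum is entrywise "or".
        spanned |= {tuple(a | b for a, b in zip(u, v)) for u in spanned}
    return spanned

R = {(0, 1), (1, 2)}
S = {(1, 1), (2, 0)}
lhs = relation_to_matrix(compose(R, S), 3)
rhs = bool_matmul(relation_to_matrix(R, 3), relation_to_matrix(S, 3))
print(lhs == rhs)
print(sorted(span([(1, 0, 1), (0, 1, 0)], 3)))
```

The span computed at the end contains (1, 1, 1), the sum (1, 0, 1) + (0, 1, 0) used as an example in the text.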
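The representation of a monoid M by the functions fₓ(y) = x ∘ y, described in Section II.A, can be verified directly for a small example. The sketch below is a hedged illustration: the choice of M (the integers 0 to 5 under multiplication modulo 6, which already has the identity 1) and the helper names are assumptions, and functions are stored as Python dictionaries.

```python
def regular_representation(M, op):
    """Send each x in M to the function f_x(y) = op(x, y), stored as a dict."""
    return {x: {y: op(x, y) for y in M} for x in M}

def compose_functions(f, g):
    """Return the function y -> f(g(y)) for functions stored as dicts."""
    return {y: f[g[y]] for y in g}

def mul_mod6(x, y):
    return (x * y) % 6

M = range(6)
rep = regular_representation(M, mul_mod6)

# Associativity gives f_x(f_y(z)) = x(yz) = (xy)z = f_{xy}(z).
is_homomorphism = all(
    compose_functions(rep[x], rep[y]) == rep[mul_mod6(x, y)]
    for x in M for y in M
)
# The identity 1 gives f_x(1) = x, so distinct elements give distinct functions.
is_one_to_one = len({tuple(sorted(f.items())) for f in rep.values()}) == len(M)
print(is_homomorphism, is_one_to_one)
```

The second check shows why the identity element is added first: without it, two different elements could act identically on the rest of the semigroup.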
The row (column) space of a Boolean matrix is the space spanned by its row (column) vectors. Two Boolean matrices are 𝒟- (also 𝒥-) equivalent if and only if their row spaces are isomorphic. The basis for a space of Boolean vectors spanned by S is {x ∈ S : x ≠ 0 and x is not in the span of S\{x}}. Two subspaces are identical if and only if their bases are the same. Bases for the row, column spaces of a Boolean matrix are called row, column bases. Two Boolean matrices are ℒ- (ℛ-) equivalent if and only if their row (column) bases (and so spaces) coincide.

A semigroup of the form {xⁿ : n ∈ Z⁺} is called cyclic. Every cyclic semigroup is determined by the index k = inf{k ∈ Z⁺ : xᵏ⁺ᵐ = xᵏ for some m ∈ Z⁺}, and the period d = inf{d ∈ Z⁺ : xᵏ⁺ᵈ = xᵏ}. The powers xᵏ, xᵏ⁺¹, … repeat with period d. If an n-square Boolean matrix A is reflexive (A ≥ I), then A = AI ≤ A² ≤ A³ ≤ ··· ≤ Aⁿ⁻¹ = Aⁿ, an idempotent. Here I denotes the identity matrix. Also if A is fully indecomposable, meaning that there exist no nonempty K, L ⊂ {1, 2, …, n} with |K| + |L| = n, where |S| denotes the cardinality of a set S, and a_ij = 0 for i ∈ K, j ∈ L, then Aⁿ⁻¹ = J and so A has period 1 and index at most n − 1. Here J is the Boolean matrix all entries of which are 1.

In general, an n-square Boolean matrix has period equal to the period of some permutation on {1, 2, …, n} and index at most (n − 1)² + 1.

D. Regularity and Inverses

An element x of a semigroup S is said to be a group inverse of y if S has a two-sided identity e and xy = yx = e. A semigroup in which every element has a group inverse is called a group. In most semigroups, few elements are invertible in this sense. A function is invertible if and only if it is 1–1 onto. A Boolean matrix is invertible if and only if it is the matrix of a permutation (permutation matrix).

There are many weaker ideas of inverse. The two most important are regularity, yxy = y, and Thierrin–Vagner inverse, yxy = y and xyx = x. An element y has a Thierrin–Vagner inverse if and only if it is regular: If yxy = y then xyx is a Thierrin–Vagner inverse. A semigroup in which all elements are regular is called a regular semigroup. The semigroups of partial transformations, transformations, and n-square matrices over a field are regular. In the semigroup of n-square Boolean matrices, an element is regular if and only if its row space forms a distributive lattice as a poset.

An idempotent is an element x such that xx = x. An element x is regular if and only if its ℛ-equivalence class contains an idempotent: if xyx = x then xy is idempotent. The same holds for ℒ-equivalence classes. Therefore, if two elements are ℛ- or ℒ-equivalent (and therefore also if they are 𝒟-equivalent), one is regular if and only if the other is.

If an ℋ-class contains an idempotent, then it will be closed under multiplication, and multiplication by any of its elements gives a 1–1 onto mapping from it to itself. Therefore, it forms a group. Conversely any group in a semigroup lies in a single ℋ-class since under multiplication any two elements are ℒ-equivalent and ℛ-equivalent.

E. Finite State Machines

The theory of automata deals with different classes of theoretical machines representing robots, calculators, and similar devices. The simplest are the finite state machines. There are two essentially equivalent varieties: Mealy machines and Moore machines.

A Mealy machine is a 5-tuple (S, X, Z, ν, µ), where S, X, Z are sets, ν a function S × X to S, and µ a function S × X to Z. The same definition holds for Moore machines except that µ is a function S to Z.

Here S is the set of internal states of the machine. For a computer this could include all possibilities as to which circuit elements are on or off. The set X is the set of inputs, which could include a program and data. The set Z is the set of outputs, that is, the desired response. The function µ gives the particular output from a given internal state and input. The function ν gives the next internal state resulting from a given state and input. For a computer this is determined by the circuitry. For example, a flip-flop will change its internal state if it receives an input of 1; otherwise the internal state will remain unchanged.

A Mealy machine to add two n-digit binary numbers a, b can be constructed as follows. Let the ith digits aᵢ, bᵢ of a, b be the inputs, so X = {0, 1} × {0, 1}. Let the carry from the (i − 1)th digit be cᵢ. It is an internal state, so S = {0, 1}. The output is the ith digit of the answer, so Z = {0, 1}. The function ν gives the next carry. It is 1 if and only if aᵢ + bᵢ + cᵢ > 1. The function µ gives the output. It is 1 if and only if aᵢ + bᵢ + cᵢ is odd. Figure 5 gives the values of ν and µ.

FIGURE 5 Machine to add two binary numbers.

With a finite state machine is associated a semigroup of transformations, the transformations fₓ(s) = ν(s, x) of the state space, and all compositions of them f_{x₁x₂···xₙ}(s) = f_{x₁}(f_{x₂}(··· f_{xₙ}(s) ···)). This is called the semigroup of the machine.

Two machines are said to have the same behavior if there exists a binary relation R from the initial states of one machine to the initial states of the other such that if (s, t) ∈ R then for any sequence x₁, x₂, …, xₖ of inputs machine 1 in state s gives the same sequence of outputs as machine 2 in state t. A machine M₁ equivalent to a given machine M and having a minimal number of states can be constructed as follows. Call two states of M equivalent if, for any sequence of inputs, the same sequence of outputs is obtained. This gives an equivalence relation on states. Then let the states of M₁ be the equivalence classes of states of M.

No finite state machine M can multiply binary numbers of arbitrary length in the way that the above machine adds them. Suppose such a machine has n states. Then suppose we multiply the number 2ᵏ by itself in binary notation (adding zeros in front until the correct number of digits in the answer is achieved). Let fₓ be the transformation of the state space given by inputs of 0, 0 and let a be the state after the inputs 1, 1. The inputs 0, 0 applied to state fₓⁱ(a) give output 0 for i = 0, 1, 2, …, k − 2 but output 1 for i = k − 1. Yet the transformation fₓ on a set of n elements will have index at most n. Therefore, if k > n then fₓᵏ⁻¹(a) will coincide with fₓʲ(a) for some j < k − 1. This is a contradiction since one yields 0 output, the other 1 output. It follows that no such machine can exist.

F. Mathematical Linguistics

We start with a set W of basic units considered words. Mathematical linguistics is concerned with the formal theory of sentences, that is, sequences of words that are grammatically allowed, and the grammatical structure of sentences (or longer units). This is syntax. Meaning (semantics) is usually not dealt with.

For a set X, let X∗ be the set of finite sequences from X including the empty sequence e. For instance, if X is {0, 1}, then X∗ is {e, 0, 1, 00, 01, 10, 11, 000, …}. For a more important example, we can consider the family of all sequences of logical variables p, q, r, and ∨ (or), ∧ (and), (, ) (parentheses), → (if then), ∼ (not). The set of logical formulas will be a subset of this.

A phrase structure grammar is a quadruple (N, W, ρ, ψ), where W (set of words) is nonempty and finite, N (nonterminals) is a finite set disjoint from W, ψ ∈ N, and ρ is a finite subset of ((N ∪ W)∗\W∗) × (N ∪ W)∗. The set N is a set of valid grammatical forms involving abstract concepts such as ψ (sentence) or subject, predicate, object. The set ρ (productions) is a set of ways we can substitute into a valid grammatical form to obtain another, more specific one. The element ψ is called the starting symbol. If (x, y) ∈ ρ, we are allowed to replace any occurrence of x by y. Members of W are called terminals.

Consider logical formulas involving operations ∨, ∼ and variables p, q, r. Let W = {p, q, r, ∨, (, ), ∼}, N = {ψ}. We derive formulas by successive substitution, as ψ, (ψ ∨ ψ), ((ψ ∨ ψ) ∨ ψ), ((ψ ∨ ψ) ∨ ∼ψ), ((p ∨ ψ) ∨ ∼ψ), ((p ∨ q) ∨ ∼ψ), ((p ∨ q) ∨ ∼r).

An element y ∈ (N ∪ W)∗ is said to be directly derived from x ∈ (N ∪ W)∗ if x = azb, y = awb for some (z, w) ∈ ρ, a, b ∈ (N ∪ W)∗. An indirect derivation is a sequence of direct derivations. Here ρ = {(ψ, p), (ψ, q), (ψ, r), (ψ, ∼ψ), (ψ, (ψ ∨ ψ))}.

The language determined by a phrase structure grammar is the set of all a ∈ W∗ that can be derived from ψ.

A grammar is called context free if and only if for all (a, b) ∈ ρ, a ∈ N, b ≠ e. This means that what items can be substituted for a given grammatical element do not depend on other grammatical elements. The grammar above is context free.

A grammar is called regular if for all (a, b) ∈ ρ we have a ∈ N, b = tn, where t ∈ W, and n ∈ N or n = e. This means at each derivation we go from t₁t₂···tᵣn to t₁t₂···tᵣtᵣ₊₁m, where the tᵢ are terminals, n, m nonterminals, (n, tᵣ₊₁m) ∈ ρ. So we fill in one terminal at each step, going from left to right. The grammar mentioned above is not regular.

To recognize a grammar is to be able to tell whether or not a sequence from W∗ is in the language. A grammar is regular if and only if some finite state machine recognizes it. The elements of W are input one at a time and outputs are “yes, no,” meaning that the sequence of all symbols up to the present is or is not in the language. Let the internal states of the machine be in 1–1 correspondence with all subsets of N, and let the initial state be {ψ}. For a set S₁ of nonterminals and input x let the next state be the set S₂ of all nonterminals z such that for some u ∈ S₁, (u, xz) is a production. Then at any time the state consists of all nonterminals that could occur after the given sequence of inputs. Let the output be “yes” if and only if for some u ∈ S₁, (u, x) ∈ ρ. This is if and only if the previous inputs together with the current input form a word in the language.

For the converse, if a finite state machine can recognize a language, let W be as in the language, N be the set of internal states, ψ the initial state, and the productions the set of pairs (n₁, xn₂) such that if the machine is in state n₁ and x is input, state n₂ is the next state, together with the set of pairs (n₁, x) such that in state n₁ after input x the machine answers “yes.”

A further characterization of regular languages is the Myhill–Nerode theorem. Let W∗ be considered a semigroup of words. Then a language L ⊂ W∗ is regular if and only if the congruence {(x, y) ∈ W∗ × W∗ : axb ∈ L if and only if ayb ∈ L for all a, b ∈ W∗} has finitely many equivalence classes. This is if and only if there exists a
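The finite state machine that recognizes a regular grammar, with subsets of N as internal states, can be sketched as follows. This is an illustration under stated assumptions, not the authors' construction verbatim: productions (u, tn) are encoded as triples (u, t, n) with n set to None when the right-hand side is a single terminal, the output test ranges over nonterminals u in the current state, and the sample grammar (binary strings ending in 1) is hypothetical.

```python
def make_recognizer(productions, start):
    """Build the subset machine of a regular grammar.

    `productions` is a set of triples (u, t, n): u -> t n, or u -> t if n is None.
    The returned function reports, after each input symbol, whether the
    symbols read so far form a word of the language.
    """
    def run(word):
        state = frozenset({start})  # initial state {start symbol}
        outputs = []
        for x in word:
            # "yes" iff some nonterminal of the current state has a production u -> x
            outputs.append(any((u, x, None) in productions for u in state))
            # next state: all nonterminals z with a production u -> x z, u in state
            state = frozenset(n for (u, t, n) in productions
                              if u in state and t == x and n is not None)
        return outputs
    return run

# Hypothetical grammar: psi -> 0 psi, psi -> 1 psi, psi -> 1.
RHO = {("psi", "0", "psi"), ("psi", "1", "psi"), ("psi", "1", None)}
recognize = make_recognizer(RHO, "psi")
print(recognize("0101"))  # [False, True, False, True]
```

Each output refers to the prefix read so far, so the list above says that 01 and 0101 belong to the language while 0 and 010 do not.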
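The Mealy machine for binary addition described in Section II.E, with the carry as internal state, can be simulated in a few lines of Python. The sketch assumes that digit pairs are fed least significant digit first and that the initial carry is 0; the function names `run_mealy` and `add_binary` are hypothetical, while ν and µ follow the rules stated in the text.

```python
def nu(carry, inputs):
    """Next state: the new carry is 1 iff a_i + b_i + c_i > 1."""
    a, b = inputs
    return 1 if a + b + carry > 1 else 0

def mu(carry, inputs):
    """Output: the ith digit of the sum is 1 iff a_i + b_i + c_i is odd."""
    a, b = inputs
    return (a + b + carry) % 2

def run_mealy(nu, mu, state, input_sequence):
    """Run a Mealy machine; return the list of outputs and the final state."""
    outputs = []
    for x in input_sequence:
        outputs.append(mu(state, x))
        state = nu(state, x)
    return outputs, state

def add_binary(a, b, n):
    """Add two n-digit binary numbers by feeding digit pairs, lowest digit first."""
    digits = [((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    outputs, carry = run_mealy(nu, mu, 0, digits)
    # The final carry supplies the (n + 1)th digit of the answer.
    return sum(bit << i for i, bit in enumerate(outputs)) + (carry << n)

print(add_binary(0b1011, 0b0110, 4))  # 17
```

Keeping `run_mealy` separate from ν and µ mirrors the 5-tuple definition: the same driver would run any other Mealy machine given its two functions and an initial state.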
III. GROUPS

A. Examples of Groups

A group is a set G together with a binary operation here denoted ◦ such that (1) ◦ is a function G × G → G (closure); (2) ◦ is associative, that is, (x ◦ y) ◦ z = x ◦ (y ◦ z); (3) there exists a two-sided identity $e$ such that $e$ ◦ x = x ◦ $e$ = x for all x ∈ G; (4) for all x ∈ G there exists y ∈ G with x ◦ y = y ◦ x = $e$. Here y is known as an inverse of x. From now on we suppress ◦.

For any semigroup S with a two-sided identity $e$, let $S^* = \{x \in S : xy = yx = e \text{ for some } y \in S\}$, the set of invertible elements. The element $y = x^{-1}$ is unique since if $y_1 x = e$, $x y_2 = e$ then $y_1 = y_1 x y_2 = y_2$. Then $S^*$ satisfies closure since $(ab)^{-1} = b^{-1}a^{-1}$ and is a group where $a^{-1}$ is an inverse of a.

A group with a finite number of elements is called a finite group; otherwise, it is called an infinite group. If a group G is finite, and G contains n elements, we say that the order of G is n and we write |G| = n. If G is infinite, we write |G| = ∞. In general, |G| denotes the cardinality of a set G.

The group of all permutations of {1, 2, ..., n} is called $S_n$, the symmetric group of degree n. The degree is the number of elements in the domain (and range) of a permutation. Therefore, $S_n$ has degree n, order n!. A subgroup of $S_n$ is formed by the set of all transformations of the form 1 → k + 1, 2 → k + 2, ..., n − k → n, n − k + 1 → 1, ..., n → k.

For any set P of subsets of n-dimensional Euclidean space $E^n$, let T be the union of the subsets of P. A symmetry of P is a mapping f : T → T such that (S–1) for x, y ∈ T, d(x, y) = d(f(x), f(y)), where d(x, y) denotes the distance from x to y; and (S–2) for A ⊂ T, A ∈ P if and only if f(A) ∈ P. That is, f preserves distances and sends the subsets C of P, for example, points and lines, to other subsets of P. The inverse of f is its inverse function, which also satisfies (S–1) and (S–2).

The sets R, Z, $Z_m$ under addition are all groups. The group $Z_m$, the group of permutations {1 → k + 1, 2 → k + 2, ..., n → k}, and the group of rotational symmetries of an n-sided regular polygon are isomorphic when n = m. That is, there exists a 1–1 correspondence f between any two such that f(xy) = f(x)f(y) for all x, y in the domain. A group isomorphic to any of these is called a cyclic group of order m.

The group of all symmetries of a regular n-sided polygon has order 2n and is called the dihedral group of degree n. It is isomorphic to the group of functions $Z_n \to Z_n$ of the form f(x) = ±x + k. Its multiplication table is given in Fig. 6, where $x_i$ is x + i − 1 if i < n + 1, and −x + i − n − 1 if i > n.

FIGURE 6 Multiplication table of the dihedral group.

B. Fundamental Homomorphism Theorems

A group homomorphism is a function f : G → H satisfying f(xy) = f(x)f(y) for all x, y in G. If f is 1–1 and onto it is called an isomorphism. For any group G of order n there is a 1–1 homomorphism $G \to S_n$; label the elements of G as $x_1, x_2, \ldots, x_n$. For any x in G, the mapping a → xa gives a 1–1 onto function G → G, so there exists a permutation π with $x x_i = x_{\pi(i)}$. This is a homomorphism since if x, y are sent to π, φ, we have $y x x_i = y x_{\pi(i)} = x_{\phi(\pi(i))}$. There is an isomorphism from $S_n$ to the set of n-square permutation matrices (either over R or a Boolean algebra), and it follows that any finite group can be represented as a group of permutations or of invertible matrices.

Groups can be simplified by homomorphisms in many cases. The determinant represents the group of nonsingular n-square matrices in the multiplicative group of nonzero real numbers (it is not 1–1). For a bounded function g, the mapping $f \mapsto \int g f \, dx$ is a homomorphism from the additive group of integrable functions f to the additive group R.

The kernel of a homomorphism f : G → H is $\{x : f(x) = e\}$. Every homomorphism sends the identity to the identity and inverses to inverses. Existence of inverses means we can cancel on either side in groups: If ax = ay then $a^{-1}ax = a^{-1}ay$, $ex = ey$, x = y. Therefore, the identity is the unique element satisfying $ee = e$. Under a homomorphism $f(e) = f(ee) = f(e)f(e)$ so $f(e)$ is an identity. So $e$ is in the kernel. A homomorphism is 1–1 if and only if $e$ is the only element of its kernel: If f(x) = f(y), then $f(xy^{-1}) = f(x)f(y^{-1}) = f(x)f(y)^{-1} = f(x)f(x)^{-1} = e$.

If x, y are in the kernel, so is xy since $f(xy) = f(x)f(y) = ee = e$. If x is in the kernel, so is $x^{-1}$. Any
subset of a group that is closed under products and inverses is a subgroup. Both the kernel and image of a homomorphism are subgroups.

The mapping $c_g : G \to G$ defined by $c_g(x) = gxg^{-1}$ is called a conjugation. An isomorphism from a group (or other structure) to itself is called an automorphism. Since $c_g(xy) = gxyg^{-1} = gxg^{-1}gyg^{-1}$, $c_g$ is a homomorphism. If $h = g^{-1}$, then $c_g$ is the inverse of $c_h$. Therefore, the mappings $c_g$ are automorphisms of a group. They are called the inner automorphisms. This gives a homomorphism of any group to its automorphisms, which also form a group, the automorphism group. A subgroup N is normal if $c_g(x) \in N$ for all x ∈ N and g ∈ G. The kernel is always a normal subgroup.

If f : G → H has kernel K and is onto, the group G is said to be an extension of K by H. Whenever there exists a homomorphism f from K to Aut(H) for any groups K, H, a particular extension exists called the semidirect product. Here Aut(H) denotes the automorphism group of H. This has as its set K × H and products are defined by $(k_1, h_1)(k_2, h_2) = [k_1 k_2,\ f(k_2)(h_1)\,h_2]$, $k_1, k_2 \in K$ and $h_1, h_2 \in H$. The dihedral group is a semidirect product of $Z_2$ and $Z_m$.

The groups K, H will generally be simpler than G. A theory exists classifying extensions. If a group has no normal subgroups except itself and $\{e\}$, it is called simple. The alternating group, the group of permutations in $S_n$ whose matrices have positive determinant, is simple for n > 4. Its order is n!/2. For n odd, the group of n-square real-valued invertible matrices of determinant 1 is simple. For n even a homomorphic image of this group with kernel {I, −I} is simple, where I is an identity matrix. All finite and finite-dimensional differentiable, connected simple groups are known and most are constructed from groups of matrices.

For any subgroup H of a group G there exist equivalence relations defined by x ∼ y if $xy^{-1} \in H$ ($y^{-1}x \in H$). These equivalence classes are called left (right) cosets. The left (right) coset of a is Ha = {ha : h ∈ H} (aH = {ah : h ∈ H}). There exists a 1–1 correspondence H → Ha given by x → xa (x → ax for right cosets). Therefore, all cosets have the same cardinality as H. If [G : H] denotes the number of right (or left) cosets, then |G| = |H|[G : H], which is known as Lagrange's theorem. So the order of a subgroup of a finite group must divide the order of the group. If G is a group of prime order p, let x ∈ G, $x \neq e$. Then $\{x^n\}$ forms a subgroup whose order divides p and is greater than 1, so it must equal p. So $G = \{x^n\}$.

If N is a normal subgroup and x ∼ y, then for all g ∈ G, gx ∼ gy and xg ∼ yg, since $gxy^{-1}g^{-1}$ and $xy^{-1}$ are in N if $xy^{-1}$ is in N. It follows that if a ∼ b and c ∼ d, then ac ∼ bd. Therefore, the equivalence classes form a group G/N called the quotient group. Its order is |G|/|N|.

For any homomorphism f : G → H with kernel K and image M, every equivalence class (coset) of K is sent to a single element. That is, f gives a mapping G/K → M. This mapping is an isomorphism.

If A, B are normal in a group G, and B is a subgroup of A, then

$$\frac{G/B}{A/B} \cong G/A$$

If A is a normal subgroup and B is any subgroup of G, and AB = {ab : a ∈ A, b ∈ B}, then

$$\frac{AB}{A} \cong \frac{B}{A \cap B}$$

C. Cyclic Groups

Let G be a group. If there exists a ∈ G such that $G = \{a^k : k \in Z\}$, then G is called a cyclic group generated by a, and a is called a generator of G. For addition, the definition becomes: There exists a ∈ G such that for each g ∈ G there exists k ∈ Z such that g = ka.

A group G is said to be commutative when xy = yx for all x, y in G. Commutative groups are also called Abelian. In a commutative group, $gxg^{-1} = x$ for all g ∈ G. Therefore, every subgroup of a commutative group is normal. Let $R^n$ denote the set of all n-tuples of real numbers. Then $(R^n, +)$ is a commutative group. For any groups $G_1, G_2, \ldots, G_n$, the set $G_1 \times G_2 \times \cdots \times G_n$ under componentwise multiplication $(x_1, \ldots, x_n) \times (y_1, \ldots, y_n) = (x_1 y_1, \ldots, x_n y_n)$ is a group called the Cartesian or direct product of $G_1, G_2, \ldots, G_n$. If all $G_i$ are commutative, so is the direct product. For any indexed family $G_\alpha$ of groups, coordinatewise multiplication makes the Cartesian product a group $\times_\alpha G_\alpha$. The subset of elements $(x_\alpha)$ such that $\{\alpha : x_\alpha \neq e\}$ is finite is also a group, sometimes called the direct sum.

A set $G_0 \subset G$ is said to generate the set $\{x_1 x_2 \cdots x_n : x_i \in G_0 \text{ or } x_i^{-1} \in G_0 \text{ for all } i, \text{ and } n \in Z^+\}$. Every finitely generated Abelian group is isomorphic to a direct product of cyclic (1 generator) groups.

A group of words on any set X is defined as the set of finite sequences (with $e$) $x_1^{a(1)} x_2^{a(2)} \cdots x_n^{a(n)}$, $x_i \in X$, $a(i) \in Z$. We can reduce such words, by adding exponents of adjacent equal letters, until $a(i) \neq 0$ and $x_i \neq x_{i+1}$. Two words are equivalent if and only if they have the same reduced form. Equivalence classes of words form a group F.

Relations among generators in a group can be written in the form $x_1^{a(1)} x_2^{a(2)} \cdots x_n^{a(n)} = e$. For any set $G_0$ and set of relations R in the elements of $G_0$, there exists a group G defined by these relations such that any mapping $f : G_0 \to H$ extends to a unique homomorphism g : G → H if the relations hold in H. The group G is
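Lagrange's theorem and the definition of a normal subgroup from subsection B can be checked by direct enumeration in a small case. The following is only an illustrative sketch: the choice of the symmetric group on three letters, the tuple encoding of permutations, and the helper names `compose`, `inverse`, `cosets`, and `is_normal` are assumptions made here, not notation from the article.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q'; a permutation is a tuple of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cosets(G, H):
    """Return the distinct cosets Ha = {ha : h in H} as a set of frozensets."""
    return {frozenset(compose(h, a) for h in H) for a in G}

def is_normal(G, N):
    """Check that g x g^-1 lies in N for every x in N and g in G."""
    members = set(N)
    return all(compose(compose(g, x), inverse(g)) in members
               for g in G for x in N)

G = list(permutations(range(3)))           # all 6 permutations of {0, 1, 2}
H = [(0, 1, 2), (1, 0, 2)]                 # identity and one transposition
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]     # the even permutations

index_H = len(cosets(G, H))                # the index [G : H]
print(len(G), len(H), index_H)             # |G| = |H| [G : H]
print(is_normal(G, H), is_normal(G, A3))
```

Every coset has the same number of elements as H, and only the subgroup of even permutations passes the normality test, so only it gives a quotient group.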
                  6 1-cycles   c1 = 6    -                      -            -
(123456)          1 6-cycle    c6 = 1    About 2 vertices       2 2-cycles   c1 = c2 = 2
(654321)                                 (26) (35)
                                         (13) (46)
                                         (15) (24)
(135) (246)       2 3-cycles   c3 = 2    -                      -            -
(153) (264)
(14) (25) (36)    3 2-cycles   c2 = 3    About an edge center   3 2-cycles   c2 = 3
                                         (12) (36) (45)
                                         (23) (14) (65)
                                         (16) (25) (34)
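The cycle structures in this table, and the count they lead to, can be checked by brute force. The sketch below is an illustration under stated assumptions: it relabels the hexagon's vertices 0 to 5, builds the 12 symmetries as rotations and reflections modulo 6, and counts the colorings with two vertices of one color and four of the other by averaging fixed colorings over the group (Burnside's counting lemma). The function names are hypothetical.

```python
from itertools import combinations

N = 6  # vertices of the hexagon, labeled 0..5 (the table uses 1..6)

def rotation(k):
    """Rotation by k steps, as a tuple of images."""
    return tuple((i + k) % N for i in range(N))

def reflection(k):
    """Reflection i -> k - i (mod N), as a tuple of images."""
    return tuple((k - i) % N for i in range(N))

GROUP = [rotation(k) for k in range(N)] + [reflection(k) for k in range(N)]

def cycle_type(perm):
    """Return the sorted list of cycle lengths of perm."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return sorted(lengths)

def fixed_colorings(perm, k):
    """Count the k-element vertex sets that perm maps onto themselves."""
    return sum(1 for s in combinations(range(N), k)
               if {perm[i] for i in s} == set(s))

def count_orbits(k):
    """Average the number of fixed colorings over the whole group."""
    total = sum(fixed_colorings(g, k) for g in GROUP)
    return total // len(GROUP)

print(sorted(cycle_type(g) for g in GROUP))
print(count_orbits(2))  # colorings with 2 vertices of one color, 4 of the other
```

The total number of fixed colorings is 15 + 12 + 9 = 36, so the average over the 12 symmetries is 3.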
In the problem above this yields the coefficient of $\theta_1^2 \theta_2^4$ in

$$\frac{1}{12}\Big[(\theta_1 + \theta_2)^6 + 2(\theta_1^6 + \theta_2^6) + 2(\theta_1^3 + \theta_2^3)^2 + 4(\theta_1^2 + \theta_2^2)^3 + 3(\theta_1^2 + \theta_2^2)^2(\theta_1 + \theta_2)^2\Big]$$

That coefficient is

$$\frac{1}{12}\left[\binom{6}{2} + 4\binom{3}{1} + 3(3)\right] = \frac{1}{12}(15 + 12 + 9) = 3$$

IV. VECTOR SPACES

A. Vector Space Axioms

A vector space is a set having a commutative group addition, and a multiplication by another set of quantities (magnitudes) called a field. A field is a set F such as R or C having addition and multiplication F × F → F such that the axioms in Table II hold for all x, y, z and some 0, 1 in F.

The standard example of a vector space is the set $\mathcal{V}_n$ of n-tuples of members of F. Let $(x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n) \in \mathcal{V}_n$. Then $(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$ and multiplication by c is given by $c(x_1, x_2, \ldots, x_n) = (cx_1, cx_2, \ldots, cx_n)$ where c ∈ F. For F = R, $\mathcal{V}_3$ is the set of vectors considered in physics. These can be regarded as positions (displacement), velocities, or forces: Each adds in the prescribed way.

TABLE II Field Axioms

Property       Addition                              Multiplication
Commutative    x + y = y + x                         xy = yx
Associative    (x + y) + z = x + (y + z)             (xy)z = x(yz)
Identity       For all x ∈ F, x + 0 = x              For all x ∈ F, x1 = x
Inverse        For all x ∈ F, there exists −x        For all x ∈ F (x ≠ 0), there exists
               such that x + (−x) = 0                $x^{-1}$ such that $x x^{-1} = 1$
Distributive   x(y + z) = xy + xz

A general vector space is a set $\mathcal{V}$ having an addition $\mathcal{V} \times \mathcal{V} \to \mathcal{V}$ and a scalar multiplication $F \times \mathcal{V} \to \mathcal{V}$ such that $(\mathcal{V}, +)$ is a commutative group and for all a, b ∈ F, $v, w \in \mathcal{V}$: a(v + w) = av + aw, (a + b)v = av + bv, (ab)v = a(bv), and cv = v, where c = 1 ∈ F. Incidentally, F is called the field of scalars, and multiplication by its elements scalar multiplication.

B. Basis and Dimension

The span $\langle S \rangle$ of a subset $S \subset \mathcal{V}$ is $\{a_1 s_1 + a_2 s_2 + \cdots + a_n s_n : n \in Z^+, s_i \in S, a_i \in F\} \cup \{0\}$. The elements $a_1 s_1 + a_2 s_2 + \cdots + a_n s_n$ are called linear combinations of $s_1, s_2, \ldots, s_n$. A sum of two linear combinations is a linear combination and a multiple by c ∈ F of a linear combination is a linear combination. A subset of a vector space closed under addition and scalar multiplication is called a subspace. Therefore, the span of any set is a subspace. A subspace is a vector space in itself. Its span is itself. It follows that $\langle S \rangle$ lies in any subspace containing S.

An indexed family $t_\alpha$, α ∈ I, of vectors is said to be linearly independent if, for all γ, $t_\gamma \neq 0$ and $t_\gamma$ is not a linear combination of the $t_\alpha$, α ≠ γ. Otherwise, it is called linearly dependent. A set of vectors W is likewise said to be linearly independent if for all x ∈ W, $x \notin \langle W \setminus \{x\} \rangle$.

A basis for a vector space is a linearly independent spanning set for the vector space. A chain in a poset is a subset that is linearly ordered by the partial order. Zorn's lemma states that in a poset X in which every chain has an upper bound there is a maximal element of X. Take X to be the family of linearly independent sets in $\mathcal{V}$. Every chain has its union as an upper bound. A maximal linearly independent set therefore exists. It is a basis. Therefore, every vector space has a basis.
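For a finite list of vectors no appeal to Zorn's lemma is needed: a maximal linearly independent subset can be built greedily, keeping each vector that is not in the span of those already kept. The following sketch works over the rationals using exact `Fraction` arithmetic; the function names `reduce_against` and `extract_basis` and the sample vectors are illustrative assumptions.

```python
from fractions import Fraction

def reduce_against(vector, echelon):
    """Subtract multiples of the stored rows to clear their pivot positions."""
    v = list(vector)
    for pivot, row in echelon:
        if v[pivot] != 0:
            factor = v[pivot] / row[pivot]
            v = [a - factor * b for a, b in zip(v, row)]
    return v

def extract_basis(vectors):
    """Keep each vector that is not in the span of the vectors kept before it."""
    echelon = []   # pairs (pivot index, reduced row) spanning the kept vectors
    kept = []
    for vec in vectors:
        v = reduce_against([Fraction(x) for x in vec], echelon)
        pivot = next((i for i, x in enumerate(v) if x != 0), None)
        if pivot is not None:      # a nonzero remainder means vec is independent
            echelon.append((pivot, v))
            kept.append(vec)
    return kept

spanning_set = [(1, 2, 3), (2, 4, 6), (0, 1, 1), (1, 3, 4), (0, 0, 1)]
basis = extract_basis(spanning_set)
print(basis)
```

The kept vectors are linearly independent and span the same subspace as the original list, so they form a basis of that span.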
A linearly ordered set is well ordered if every nonempty subset has a least element. Using Zorn's lemma it can be shown that every set can be well ordered. From every set $\{x_\alpha\}$ of vectors where the index set is well ordered we can pick a subset $\{x_\gamma\}$ that is linearly independent and spans the same set. Let $B = \{x_\gamma : x_\gamma \notin \langle \{x_\alpha : \alpha < \gamma\} \rangle\}$. Then no element of B is linearly dependent on previous vectors. But if B were linearly dependent, let $y = \sum a_i x_i = 0$, $a_i \neq 0$. Let $x_n$ be the last of the $x_i$ in the ordering. Then $x_n = (1/a_n)(y - \sum_{i \neq n} a_i x_i)$ is linearly dependent on previous vectors. This is a contradiction.

Two bases for a vector space have the same cardinal number. For finite bases $u_1, u_2, \ldots, u_n$, $v_1, v_2, \ldots, v_m$ let m < n. Expressions $u_i = \sum_j a_{ij} v_j$ exist since $\langle \{v_i\} \rangle = \langle \{u_i\} \rangle$. Since m < n, we can find a nonzero solution of the m linear equations $\sum_i w_i a_{ij} = 0$, j = 1 to m, in n variables $w_i$. Then $\sum w_i u_i = 0$, not all $w_i = 0$. This contradicts linear independence of the $u_i$.

The cardinality of a basis for a vector space is called the dimension of the vector space.

For vector spaces $\mathcal{V}$, $\mathcal{W}$ a homomorphism (isomorphism) is a (1–1 and onto) function $f : \mathcal{V} \to \mathcal{W}$ such that f(x + y) = f(x) + f(y) and f(cx) = cf(x) for all $x, y \in \mathcal{V}$, c ∈ F. If two vector spaces over F have the same dimension, choose bases $u_\alpha$, $v_\alpha$ for them. Every element of $\mathcal{V}$ has a unique expression $\sum a_i u_i$, $u_i \in \{u_\alpha\}$, and every element of $\mathcal{W}$ has a unique expression $\sum a_i v_i$, $v_i \in \{v_\alpha\}$. Therefore, $f(\sum a_i u_i) = \sum a_i v_i$ defines an isomorphism from $\mathcal{V}$ to $\mathcal{W}$. Conversely if two vector spaces are isomorphic, their dimensions must be equal since if $\{u_\alpha\}$ is a linearly independent (spanning) set for $\mathcal{V}$, $\{f(u_\alpha)\}$ will be a linearly independent (spanning) set for $\mathcal{W}$.

alent to $\sum b_i t_i$. No two distinct sums $\sum b_i t_i$ can have as difference a nonzero sum $\sum a_i s_i$, so the classes of the $t_i$ are linearly independent in $\mathcal{V}/\mathcal{W}$. Therefore $\bar{t}_\tau$ gives a basis for $\mathcal{V}/\mathcal{W}$.

The external direct sum of two vector spaces $\mathcal{V}$, $\mathcal{W}$ is $\mathcal{V} \times \mathcal{W}$ with operations $(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2)$, and c(v, w) = (cv, cw). It has as basis the disjoint union of any bases for $\mathcal{V}$, $\mathcal{W}$. It is denoted $\mathcal{V} \oplus \mathcal{W}$. Therefore, if $\mathcal{W} \subset \mathcal{V}$, $\mathcal{V}$ is isomorphic to $\mathcal{W} \oplus \mathcal{V}/\mathcal{W}$. Moreover, $\dim(\mathcal{W}) + \dim(\mathcal{V}/\mathcal{W}) = \dim(\mathcal{V})$, where $\dim(\mathcal{V})$ denotes the dimension of $\mathcal{V}$.

A complement to a subspace $\mathcal{W} \subset \mathcal{V}$ is a subspace $\mathcal{U} \subset \mathcal{V}$ such that $\mathcal{U} \cap \mathcal{W} = \{0\}$, $\mathcal{U} + \mathcal{W} = \mathcal{V}$, where $\mathcal{U} + \mathcal{W} = \{u + w : u \in \mathcal{U}, w \in \mathcal{W}\}$. Every subspace has a complement by the construction above with $\mathcal{U} = \langle \{t_\alpha\} \rangle$. The mapping $\sum a_i t_i \to \sum a_i \bar{t}_i$ gives an isomorphism $\mathcal{U} \to \mathcal{V}/\mathcal{W}$.

For a linear transformation $f : \mathcal{V} \to \mathcal{W}$, the image set Im(f) and the null space (kernel) $\mathrm{Nu}(f) = \{v \in \mathcal{V} : f(v) = 0\}$ are both subspaces. Results on group homomorphisms imply that f gives an isomorphism $\bar{f} : \mathcal{V}/\mathrm{Nu}(f) \to \mathrm{Im}(f)$. A linear transformation is 1–1 (onto) if and only if Nu(f) = 0 ($\mathrm{Im}(f) = \mathcal{W}$). The dimension of Im(f) is called the rank of f.

Choose bases $x_i$, $y_i$, $z_i$ for vector spaces $\mathcal{U}$, $\mathcal{V}$, $\mathcal{W}$. Let f be a homomorphism from $\mathcal{U}$ to $\mathcal{V}$ and g a homomorphism from $\mathcal{V}$ to $\mathcal{W}$. There exist unique coefficients $a_{ij}$, $b_{ij}$ such that $f(x_i) = \sum_j a_{ij} y_j$ and $g(y_i) = \sum_j b_{ij} z_j$. Then $g(f(x_i)) = g(\sum_j a_{ij} y_j) = \sum_j a_{ij} g(y_j) = \sum_{j,k} a_{ij} b_{jk} z_k$. Therefore, the composition g ◦ f sends $x_i$ to $\sum_{j,k} a_{ij} b_{jk} z_k$. This rule defines a product on nm-tuples $a_{ij}$, which must be associative since composition of functions is. It is distributive since f(g(x) + h(x)) = f(g(x)) + f(h(x)) and (g + h)(f(x)) = g(f(x)) + h(f(x)).
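The rule that composing g with f multiplies their coefficient arrays can be checked numerically. In this sketch a linear map is stored as its array $(a_{ij})$ and acts on a row of coordinates from the right, matching the convention $f(x_i) = \sum_j a_{ij} y_j$; the helper names and the sample matrices are illustrative assumptions.

```python
def mat_mul(A, B):
    """Matrix product of lists of rows: (AB)[i][k] = sum over j of A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def apply_map(coords, A):
    """Send sum c_i x_i to sum_j (sum_i c_i a_ij) y_j: a row vector times A."""
    return mat_mul([coords], A)[0]

A = [[1, 2, 0],
     [0, 1, 3]]          # coefficients of f, from a 2-dimensional space to a 3-dimensional one
B = [[1, 0],
     [2, 1],
     [0, 4]]             # coefficients of g, from the 3-dimensional space to a 2-dimensional one
c = [5, -1]              # coordinates of a vector in the first space

step_by_step = apply_map(apply_map(c, A), B)   # g(f(v))
in_one_step = apply_map(c, mat_mul(A, B))      # the product array applied once
print(step_by_step, in_one_step)
```

Both computations give the same coordinates, which is the statement that g ◦ f is represented by the product of the two arrays.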
formula requires $n^3$ multiplications and $n^2(n - 1)$ additions. There does exist a slightly improved computer method known as fast matrix multiplication.

Matrices obey the following laws: (1) A + B = B + A, (2) (A + B) + C = A + (B + C), (3) (AB)C = A(BC), (4) A(B + C) = AB + AC, and (5) 0 + A = A. Here 0 denotes the zero matrix, all the entries of which are 0. It is an additive identity.

There is a 1–1 correspondence from the set of linear transformations from an n-dimensional vector space to an m-dimensional vector space to the set of n × m matrices, given bases in these. As in the last subsection $(a_{ij})$ corresponds to f such that $f(x_i) = \sum_j a_{ij} y_j$.

Consider linear transformations from a vector space with basis $\{x_i\}$ to itself. Let f be represented as $(a_{ij})$. Let $\{w_j\}$ be another basis, where $x_i = \sum_j b_{ij} w_j$, $w_j = \sum_i c_{ji} x_i$. Then $f(w_j) = \sum c_{ji} f(x_i) = \sum c_{ji} a_{ik} x_k = \sum c_{ji} a_{ik} b_{km} w_m$. From $w_j = \sum c_{ji} x_i = \sum c_{ji} b_{ik} w_k$ it follows that CB = I, where the matrix

$$I = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

is known as a (multiplicative) identity. It acts as a two-sided identity IA = AI = A. We also have $x_i = \sum b_{ij} w_j = \sum b_{ij} c_{jk} x_k$ so BC = I. Therefore, $B = C^{-1}$. Here $C^{-1}$ denotes the (multiplicative) inverse of C. Then, expressed in terms of $w_j$, f is $CAB = CAC^{-1}$. Two matrices $X$ and $YXY^{-1}$ are said to be similar.

A linear transformation represented as $(a_{ij})$ sends $\sum c_i x_i$ to $\sum_{i,j} c_i a_{ij} y_j$. The matrix product $(c_i)(a_{ij})$ is $(\sum_i c_i a_{ij})$. A row (column) vector is a 1 × n (n × 1) matrix. So the linear transformation is equivalent to that given by matrix multiplication on row vectors. Matrices also act as linear transformations on column vectors.

The rank of a matrix is its rank as a linear transformation on row vectors. Taking column vectors gives the same number.

The image space is spanned by the rows of the matrix since $(\sum_i c_i a_{ij})$ is the sum of $c_i$ times the ith row of A. Therefore, the row rank is the maximum size of a set of linearly independent rows. The ith row (column) of A is denoted $A_{i*}$ ($A_{*i}$).

In general, an n-square matrix X is invertible if there exists Y with XY = I or YX = I. Either equation implies the other and is equivalent to the rank of the matrix being n.

The determinant of a matrix A is defined to be

$$\det(A) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi)\, a_{1\pi(1)} a_{2\pi(2)} \cdots a_{n\pi(n)}$$

where the summation is over all permutations π, and sgn(π) denotes the sign of π, so sgn(π) is ±1 according to whether π is even or odd. As before, a permutation is even or odd according to whether it is the product of an even or odd number of transpositions. The determinant has the following properties, which can be proved in turn. Let $(a_{ij})^T$ denote $(a_{ji})$, called the transpose of A. This operation changes rows into columns and columns into rows.

(D-1) Since $\mathrm{sgn}(\sigma) = \mathrm{sgn}(\sigma^{-1})$, $\sigma \in S_n$, $\det(A) = \det(A^T)$.
(D-2) If $A_{i*} = B_{i*} = C_{i*}$ for i ≠ k and $C_{k*} = A_{k*} + B_{k*}$, then det(C) = det(A) + det(B).
(D-3) If the rows of A are permuted by a permutation π, the determinant of A is multiplied by sgn(π).
(D-4) If two rows (columns) of A are equal, then det(A) = 0.
(D-5) If any row is multiplied by k, then the determinant is multiplied by k.
(D-6) If $A_{i*}$ is replaced by $A_{i*} - kA_{j*}$, i ≠ j, then the determinant is unchanged.
(D-7) If $a_{ij} = 0$ for i > j, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$.
(D-8) det(AB) = det(A) det(B).
(D-9) det(A) ≠ 0 if and only if A has an inverse.
(D-10) Let A[i|j] be the submatrix of A obtained by deleting row i and column j. The (i, j)th cofactor of $a_{ij}$ is $(-1)^{i+j} \det A[i|j]$ and it is denoted as C[i|j]:

$$\det(A) = \sum_{j=1}^{n} a_{rj} C[r|j] = \sum_{i=1}^{n} a_{is} C[i|s]$$

Property (D-1) ensures that for properties of determinants stated in terms of their rows, equivalent properties can be stated in terms of columns. These sometimes will not be explicitly mentioned.

It follows from (D-4), (D-9), and (D-10) that if A has an inverse

$$A^{-1} = \frac{1}{\det(A)} \left(C[i|j]\right)^T$$

From (D-7) and (D-8), det(I) = 1, $\det(A^{-1}) = 1/\det(A)$.

E. Boolean Vectors and Matrices

Most of the theory of the last section holds in the case of Boolean matrices. Boolean row (column) vectors are 1 × n (n × 1) Boolean matrices. The set $V_n$ is the set of all Boolean n-tuples, and additive homomorphisms preserving 0 from $V_n$ to $V_m$ are in 1–1 correspondence with Boolean matrices.

Matrices over a Boolean algebra or field can be multiplied or added in blocks using the same formulas as
for individual entries $(A_{ij}) + (B_{ij}) = (A_{ij} + B_{ij})$, and $(A_{ij})(B_{ij}) = (\sum_k A_{ik} B_{kj})$. Here the set of all indices has been partitioned into r subsets $S_1, S_2, \ldots, S_r$ and $A_{ij}$ is the submatrix consisting of all $a_{km}$ such that $(k, m) \in S_i \times S_j$. Usually the numbers in each $S_t$ are assumed adjacent. For example,

$$\begin{pmatrix} 1 & 2 & 5 & 6 \\ 3 & 4 & 7 & 8 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix} = \left(\begin{array}{cc|cc} 1 & 2 & 5 & 6 \\ 3 & 4 & 7 & 8 \\ \hline 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{array}\right)$$

space if and only if their matrices are similar. Similarity is an equivalence relation. To describe it completely we need a set of similarity invariants that distinguish any two nonsimilar matrices.

The most important similarity invariant is the characteristic polynomial defined by p(λ) = det(λI − A). If $B = XAX^{-1}$, then

$$\det(\lambda I - XAX^{-1}) = \det(X(\lambda I - A)X^{-1}) = \det(X)\, p(\lambda)\, (\det(X))^{-1} = p(\lambda)$$
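The invariance of the characteristic polynomial under similarity can be checked on a small example. The sketch below computes the coefficients of det(λI − A) with the Faddeev–LeVerrier recursion, a technique chosen here for brevity and not taken from the article, using exact `Fraction` arithmetic. The matrices A and X, the hand-computed inverse `X_inv`, and the function names are illustrative assumptions.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product of square or rectangular matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

def char_poly(A):
    """Coefficients of det(lambda*I - A), highest power first (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A * M_{k-1} + (previous coefficient) * I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        product = mat_mul(A, M)
        trace = sum(product[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 1]]
X = [[1, 2, 0],
     [0, 1, 1],
     [0, 0, 1]]
X_inv = [[1, -2, 2],
         [0, 1, -1],
         [0, 0, 1]]       # inverse of X, worked out by hand

B = mat_mul(mat_mul(X, A), X_inv)    # B = X A X^-1 is similar to A
print(char_poly(A) == char_poly(B))
```

Here both polynomials come out as $\lambda^3 - 6\lambda^2 + 11\lambda - 7$, although A and B have different entries.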
The final axiom is the induction property (R-14): If a nonempty set S is contained in $\mathbf{P}$ and 1 ∈ S and 1 + x ∈ S whenever x ∈ S, then $S = \mathbf{P}$.

The ordinary rules of algebra follow from these axioms, as (1 + (−1) + 1) · (−1) = (0 + 1) · (−1) = 1 · (−1) = −1. Therefore, −1 + (−1)(−1) + (−1) = −1, 1 + (−1) + (−1)(−1) + (−1) + 1 = 1 + (−1) + 1, and (−1)(−1) = 1.

B. Euclidean Domains

Many properties of Z are shared by the ring of polynomials over any field, and the rings {a + bi : a, b ∈ Z} and $\{a + b[(1 + \sqrt{3}\,i)/2] : a, b \in Z\}$. To deal with these we define Euclidean domains. A Euclidean domain is a ring $\mathcal{E}$ satisfying (R-1) to (R-10) provided with a function ω(x) defined from nonzero elements of $\mathcal{E}$ into Z such that (R-15) ω(xy) ≥ ω(x); (R-16) for all $x, y \in \mathcal{E}$ there exist $q, r \in \mathcal{E}$ such that x = qy + r, and either ω(r) < ω(y) or r = 0; and (R-17) ω(x) ≥ 0. For polynomials, let ω be the degree. For other cases, let it be |x|.

In any ring $\mathcal{R}$ with unit 1, the relation x|y can be defined by: there exists $z \in \mathcal{R}$ with zx = y. This relation is transitive. If x|y, x|z, then x|y + z since y = ax, z = bx, y + z = (a + b)x for some a, b. If x|y then xa|ya for any a. Since 1x = x, it is a quasiorder. It is called (right) divisibility.

In an integral domain, suppose y|x and x|y. Then x = ry, y = sx for some r, s. If either is zero then both are. Otherwise rsx = ry = x = 1x and rs = 1 by (R-9). So r, s are invertible.

A greatest common divisor (g.c.d.) of a set $S \subset \mathcal{R}$ is an element $g \in \mathcal{R}$ such that g|x for all x ∈ S and if y|x for all x ∈ S then y|g. Any two g.c.d.'s of the same set must be multiples of one another, and so multiples by an invertible element. Let g be a g.c.d. of S and h be a g.c.d. of a set T and m be a g.c.d. of {g, h}. Then m is a g.c.d. of S ∪ T. The g.c.d. is a greatest lower bound in the partial order of the set of equivalence classes of elements of $\mathcal{R}$ under the relation x ∼ ay if a is invertible.

In a Euclidean domain, the Euclidean algorithm is an efficient way to calculate the g.c.d. of two elements x, y. Assume ω(x) ≥ ω(y). Let $d_0 = x$, $d_1 = y$. Choose a sequence $d_i$, $q_i$ for i ≥ 0 by $d_i = q_{i+1} d_{i+1} + d_{i+2}$, where $\omega(d_{i+2}) < \omega(d_{i+1})$. The process terminates since ω is always a nonnegative integer, and in the last stage we have a remainder $d_{k+2} = 0$. Then $d_{k+1} | d_k$. By induction using the equations $d_i = q_{i+1} d_{i+1} + d_{i+2}$ it will divide the right side and therefore the left side. So it divides all $d_i$. Also $d_{i+2} = d_i - q_{i+1} d_{i+1}$ is a linear combination of $d_i$, $d_{i+1}$, so by induction $d_{k+1}$ is a linear combination of x, y. Therefore, if z divides x and y, it divides $d_{k+1}$. Therefore, the element $d_{k+1} = rx + sy$ is a g.c.d. of x, y.

When an argument is made by induction, it is understood that a statement P(n) holds for n = 1 and that there is an argument showing that if P(1), P(2), ..., P(n − 1) hold so does P(n). Let $S = \{n \in \mathbf{P} : P(n) \text{ is true}\}$. Then by the induction axiom (R-14), S is all of $\mathbf{P}$. Arguments by induction are especially common in the theory of Z.

The following is an example of the Euclidean algorithm for polynomials:

$$x^3 + 1 = x(x^2 + 1) + (-x + 1)$$
$$x^2 + 1 = (-x + 1)(-x - 1) + 2$$
$$-x + 1 = 2\left(\frac{-x + 1}{2}\right)$$

Therefore, 2 (equivalently 1) is a g.c.d. of $x^2 + 1$, $x^3 + 1$. We can now express it in terms of them:

$$2 = (x^2 + 1) - (-x + 1)(-x - 1)$$
$$= (x^2 + 1) + (-x + 1)(x + 1)$$
$$2 = (x^2 + 1) + ((x^3 + 1) - x(x^2 + 1))(x + 1)$$
$$2 = (x^2 + 1) + (x + 1)(x^3 + 1) - x(x + 1)(x^2 + 1)$$
$$2 = (1 - x - x^2)(x^2 + 1) + (x + 1)(x^3 + 1)$$

In the second step, we substituted $x^3 + 1 - x(x^2 + 1)$ for −x + 1.

By induction there is a g.c.d. of $x_1, x_2, \ldots, x_k$ that is a linear combination $s_1 x_1 + s_2 x_2 + \cdots + s_k x_k$ of them. If we multiply by invertible elements, we obtain such an expression for any g.c.d.

If a is invertible, then ω(ab) ≤ ω(b) and $\omega(a^{-1}ab) \leq \omega(ab)$, so the two are equal. Conversely, if ω(ab) = ω(b), b ≠ 0, $b \in \mathcal{E}$, then a is a unit [divide b by ab; if ω(r) < ω(ab) = ω(b) we have a contradiction].

From this we can establish the basic facts of prime factorizations. An element $p \in \mathcal{E}$ is called prime if it is not invertible but whenever p = xy one of x, y is invertible. It follows that for any a, if p does not divide a then the g.c.d. of p, a is invertible.

Suppose 1 is a g.c.d. of c, a. Let 1 = ar + cs. If c|ab, then c|abr + cbs = b. Therefore, if p, a prime, divides xy, it divides x or y. To factor an element y of $\mathcal{E}$ into primes, choose a divisor x that has minimal ω(x) among noninvertible divisors. If x = ut and u is not invertible then ω(u) = ω(x) and so t is invertible and so x is prime. Since x is not invertible, ω(y/x) < ω(y). So after a finite number of prime factors the process ends.

By induction we can show that if p is a prime and divides $y_1 y_2 \cdots y_k$, then p divides some $y_i$. From this it can be proved that if $a = p_1 p_2 \cdots p_m = q_1 q_2 \cdots q_n$, where $p_i$, $q_i$ are primes, then n = m and the $q_i$ can be renumbered so that $p_i = u_i q_i$, where $u_i$ is invertible. Since $p_1 | q_1 q_2 \cdots q_n$, $p_1$ divides some $q_i$. Label it $q_1$. Since both are primes,
p1 = q1 . This reduces n, m by 1 and the process can be quences (a0 , a1 , . . .) such that only finitely many ai are
repeated. nonzero. The ai are the coefficients of the polynomial and
multiplication can be defined by (an )(bn ) = (r ar bn−r ).
In it or any Euclidean domain, all ideals have the form
C. Ideals and Congruences
{ f (x)y : y ∈ K[x]} for some polynomial f (x). Given an
A ring homomorphism is a function f : → satisfying ideal , take a nonzero element f (x) in it such that
f (x + y) = f (x) + f (y) and f (x y) = f (x) f (y). That is, ω( f (x)), the degree, is minimal.
it is a semigroup homomorphism for multiplication and a If f (x) is another element of the ideal, choose q(x), r (x)
group homomorphism for addition. such that r (x) = 0 or ω(r (x)) < ω( f (x)) for g(x) =
The following are examples of ring homomorphisms. f (x)q(x) + r (x). Then r (x) = g(x) − f (x)q(x) ∈ since
(1) The mapping from n-square matrices to m-square ma- g(x), f (x) do. So ω(r (x)) < ω( f (x)) contradicts minimal-
trices for m > n, which adds to a matrix m − n rows and ity of ω( f (x)). So r (x) = 0. So every element in is a
columns of zero. (2) The mapping f : Z → Zm , f (x) = x̄. multiple of f (x). Conversely multiples of f (x) belong in
For any two rings , a product ring × is defined by since it is an ideal.
operations (a, b) + (c, d) = (a + c, b + d), (a, b)(c, d) = Ideals of the form a = {ax : x ∈ } are called princi-
(ac, bd). (3) The mapping × → sending (a, b) to pal, and a is said to be a generator. Therefore, Euclidean
a (or b) is a ring homomorphism. (4) For any complex domains are principal ideal domains, that is, integral do-
number c, the evaluation f (c) is a homomorphism from mains in which all ideals are principal.
the ring of all functions f : C → C into C itself. In all principal ideal domains, all elements can be
A subring of a ring is a subset closed under addi- uniquely factored in terms of primes and invertible el-
tion, subtraction, and multiplication. An ideal is a sub- ements. Let m where m ∈ Z is not √ divisible by the
set S of a ring closed under addition and subtraction square√ of a prime denote {a + [b(1 + m)/2] : a, b ∈ Z} or
(if x, y ∈ S, then x − y ∈ S) and such that if z ∈ , x ∈ S, {a + b m : a, b ∈ Z} according to whether m ≡ 1(mod 4)
then x z, zx ∈ S. The set of even integers is an ideal in the or not. If 0 < m < 25, then m is a principal ideal domain if
set of intergers. and only if m = 10, 15. If −130 < m < 0, then m is a prin-
A congruence on a ring is an equivalence relation cipal ideal domain if m = −1, −2, −3, −7, −11, −19,
x ∼ y such that for all a ∈ if x ∼ y then x + a ∼ y + a, −43, −67. (This is a result proved by techniques of alge-
xa ∼ ya, and ax ∼ ay. For any congruence if x ∼ y, braic number theory.) Unique factorization does not hold
z ∼ w then x + z ∼ y + z ∼ y + w and x z ∼ yw. Then ad- in m if it is not a principal ideal domain.
dition and multiplication of equivalence classes x̄ + ȳ = For noncommutative rings, there are separate concepts
x + y, x̄ ȳ = x y are well defined and give a ring called the of right and left ideals. A subset S of a ring closed under
quotient ring. The function $x \to \bar{x}$ is a homomorphism.

There is a 1–1 correspondence between ideals and congruences. To a congruence associate the set of all $x$ with $x \sim 0$, which is an ideal $\mathcal{M}$. Then the congruence is determined by the ideal: $x \sim y$ if and only if $x + (-y) \sim y + (-y) = 0$. Therefore, $x \sim y$ if and only if $x - y \in \mathcal{M}$. Every ideal in this way determines a congruence.

The equivalence classes have the form $x + \mathcal{M} = \{x + m : m \in \mathcal{M}\}$. The quotient ring is denoted $\mathcal{R}/\mathcal{M}$.

All ideals in the integers (in fact, all subgroups) have the form $m\mathbf{Z} = \{mx : x \in \mathbf{Z}\}$. The congruence associated with these is $a \equiv b \pmod{m}$ if and only if $a - b = km$ for some $k$. By the general theory, such congruences can be added, subtracted, multiplied, or raised to a power. Congruences can sometimes decide whether or not equations are solvable in whole numbers. For example, $x^2 + y^2 + z^2 = w$ is not possible if $w \equiv 7 \pmod{8}$. To prove this it can first be noted that $x^2 \equiv k \pmod{8}$ where $k = 0, 1, 4$, and no three elements of $\{0, 1, 4\}$ add to $7 \pmod{8}$.

The ring of all polynomials in a variable $x$ having coefficients in a field $K$ is denoted $K[x]$. As a set it is a subset of a Cartesian product of countably many copies of $K$, se-

addition and subtraction is a left (right) ideal if and only if $ax \in S$ ($xa \in S$) for all $x \in S$, $a \in \mathcal{R}$. The ring of n-square matrices has no (two-sided) ideals except itself and zero, but for any subspace $W$, $\{M \in M_n(F) : vM \in W\}$ is a right ideal. A proper ideal is an ideal that is a proper subset. The trivial ideal is $\{0\}$.

Suppose a ring $\mathcal{R}$ has not all products zero and has no proper nontrivial right ideals. If $ab \neq 0$ for some $b$, then $\{x : ax = 0\}$ is a proper right ideal and is therefore zero. The set $\{ax\}$ is a nonzero right ideal and is therefore $\mathcal{R}$. So multiplication on the left by $a$ is 1–1 and onto.

Let $\mathcal{N} = \{a : ab = 0 \text{ for all } b \in \mathcal{R}\}$. Then $\mathcal{N}$ is a proper right ideal and is therefore 0. So for all $a \neq 0$ left multiplication by $a$ is 1–1 and onto. So if $ab = 0$, then $a = 0$ or $b = 0$.

For any $a \neq 0$, since $\{ax\} = \mathcal{R}$, there exist $e$, $a^{-1}$ such that $ae = a$, $aa^{-1} = e$. Then for any $x \in \mathcal{R}$, $aex = ax$. So $ex = x$ since left multiplication is 1–1. From $ab \neq 0$ for $a \neq 0$, $b \neq 0$, it follows that right multiplication is also 1–1. Since $xex = xx$ and right multiplication is 1–1, $xe = x$, so $e$ is an identity. From $aa^{-1}a = ea = a = ae$, it follows that $a^{-1}$ is an inverse of $a$. Therefore, in $\mathcal{R}$, all nonzero elements have inverses.
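The congruence argument for $x^2 + y^2 + z^2 = w$ can be checked by brute force. The following is a minimal Python sketch, not part of the original article; the function names `squares_mod` and `three_square_residues` are hypothetical and chosen only for illustration.

```python
def squares_mod(m):
    """Sorted list of the residues x^2 mod m, for x = 0, ..., m - 1."""
    return sorted({(x * x) % m for x in range(m)})


def three_square_residues(m):
    """Sorted list of residues mod m that are a sum of three squares mod m."""
    sq = squares_mod(m)
    return sorted({(a + b + c) % m for a in sq for b in sq for c in sq})


# Squares mod 8 and the residues reachable as a sum of three of them.
print(squares_mod(8))
print(three_square_residues(8))
```

Because 7 is absent from the second list, no integer $w \equiv 7 \pmod 8$ is a sum of three squares. The same two functions can be called with other moduli to test similar solvability questions.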
A ring with unit in which all nonzero elements have inverses is called a division algebra. The best known division algebra that is not a field is the ring of quaternions. As a set, it is $\mathbf{R} \times \mathbf{R} \times \mathbf{R} \times \mathbf{R}$. Elements are written as $a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$. Expressions are multiplied by expanding and using the rules $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -1$, $\mathbf{i}\mathbf{j} = \mathbf{k}$, $\mathbf{j}\mathbf{k} = \mathbf{i}$, $\mathbf{k}\mathbf{i} = \mathbf{j}$, $\mathbf{j}\mathbf{i} = -\mathbf{k}$, $\mathbf{k}\mathbf{j} = -\mathbf{i}$, $\mathbf{i}\mathbf{k} = -\mathbf{j}$. Any multiplication defined from a basis by expanding terms is distributive. The multiplication on basis elements and their negatives $\pm 1, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}$ is a finite group of order 8. This implies associativity.

D. Structure of Zn

The ring $\mathbf{Z}_n$ is the ring of congruence classes modulo $n$. It is the quotient ring $\mathbf{Z}/n\mathbf{Z}$, where $n\mathbf{Z} = \{nx : x \in \mathbf{Z}\}$. For any quotient ring $\mathcal{R}/\mathcal{M}$, ideals of the quotient ring are in 1–1 correspondence with ideals of $\mathcal{R}$ containing $\mathcal{M}$. An ideal $m\mathbf{Z} \supset n\mathbf{Z}$ if and only if $m \mid n$. So the ideals of $\mathbf{Z}_n$ are in 1–1 correspondence with positive integers $m$ dividing $n$. The ring $\mathbf{Z}_n$ has exactly $n$ elements, since each integer $x$ can be expressed as $nk + r$ for a unique $r \in \{0, 1, \ldots, n - 1\}$, and then $x \equiv r \pmod{n}$. It has a unit and is commutative.

Two elements of an integral domain are called relatively prime if 1 is a g.c.d. of them. If $x$, $n$ are relatively prime, then $rx + sn = 1$ for some $r$ and $s$. Thus, $\bar{r}\bar{x} = \bar{1}$ so $\bar{x}$ is invertible. Conversely, if $\bar{r}\bar{x} = \bar{1}$, then $rx - 1 = sn$ for some $s$, and 1 is a g.c.d. of $x$ and $n$. Thus, the number of invertible elements equals $\varphi(n)$, the number of positive integers $m < n$ that are relatively prime to $n$.

The Chinese remainder theorem asserts that if all pairs of $n_1, n_2, \ldots, n_m$ are relatively prime and $a_i \in \mathbf{Z}$, $i = 1$ to $m$, then there exists $x \in \mathbf{Z}$ such that $x \equiv a_i \pmod{n_i}$ for $i = 1$ to $m$. Any two such $x$ are congruent modulo $n_1 n_2 \cdots n_m$.

The set of invertible elements is closed under products since $(xy)^{-1} = y^{-1}x^{-1}$ and contains inverses and identity, so it forms a group for any ring. Here it is a finite group. For any invertible element $x$, the elements $1, x, \ldots, x^{k-1}$ form a subgroup if $k$ is the order of $x$. By Lagrange's theorem, the order of any subgroup of a group divides the order of the group. Therefore, $k \mid \varphi(n)$. From $x^k = 1$ follows $x^{\varphi(n)} = 1$. This proves a theorem of Euler and Fermat: for any $x \in \mathbf{Z}_n$, if $x$ is invertible, then $x^{\varphi(n)} = 1$.

If $p$ is prime, then $1, 2, \ldots, p - 1$ are all relatively prime to $p$, so $\varphi(p) = p - 1$, and $x^{p-1} \equiv 1 \pmod{p}$ if $x \not\equiv 0 \pmod{p}$. Then $x^p \equiv x \pmod{p}$ for all $x \in \mathbf{Z}_p$. Assume $p$ is prime for the rest of this section.

The multiplicative group $\mathbf{Z}_p^* = \{\bar{x} \in \mathbf{Z}_p : \bar{x} \neq \bar{0}\}$ is a commutative group of order $p - 1$. The ring $\mathbf{Z}_p$ is a field since $\mathbf{Z}_p^*$ is a group. Polynomials over $\mathbf{Z}_p$ can be uniquely factored into primes.

Over any field $K$, if a polynomial $p(x)$ satisfies $p(k) = 0$, where $k \in K$, then let $p(x) = (x - k)q(x) + r$. By substitution $x = k$ we find $r = 0$. So if $p(k) = 0$, then $(x - k) \mid p(x)$. Thus, a polynomial of degree $n$ cannot have more than $n$ distinct roots since each root gives a degree 1 factor $(x - k)$.

The polynomial $x^{p-1} - 1$ has exactly $p - 1$ roots in $\mathbf{Z}_p$, that is, $1, 2, \ldots, p - 1$. Thus, it factors as $c(x - 1) \cdots (x - (p - 1))$. Since the coefficient of $x^{p-1}$ is 1, we have $c = 1$. The polynomial $x^r - 1$ divides $x^{p-1} - 1$ for $r \mid p - 1$, so by unique factorization it factors into linear factors and has exactly $r$ roots. For any prime power $r^t$ dividing $p - 1$ take a root $y_r$ of $x^{r^t} = 1$ which is not a root of $x^{r^{t-1}} = 1$. Then $y_r$ has order precisely $r^t$. Take $t$ maximal. The product $u$ of all $y_r$ will have order the least common multiple of the $r^t$, which is $p - 1$. Therefore, $u, u^2, \ldots, u^{p-1}$ are distinct and are all of $\mathbf{Z}_p^*$ since it has exactly $p - 1$ elements. This proves $\mathbf{Z}_p^*$ is cyclic. Therefore, it is isomorphic to $(\mathbf{Z}_{p-1}, +)$.

In $\mathbf{Z}_n$ for $m \mid n$, an element $y$ is a multiple of $m$ if and only if $(n/m)y \equiv 0$. Therefore, in $\mathbf{Z}_p^*$, an element $y$ is an $m$th power for $m \mid p - 1$ if and only if $y^{(p-1)/m} \equiv 1 \pmod{p}$. The $m$th powers form a multiplicative subgroup of order $(p - 1)/m$.

E. Simple and Semisimple Rings

A ring is called simple if it has no proper nontrivial (two-sided) ideals. A commutative simple ring is a field. It is simplest to treat the case of finite dimensional algebras. An algebra over a field $F$ is a ring $\mathcal{R}$ provided with a multiplication $F \times \mathcal{R} \to \mathcal{R}$ such that (1) $(ax)y = a(xy) = x(ay)$ for all $a \in F$, $x, y \in \mathcal{R}$; and (2) $\mathcal{R}$ is a vector space over $F$. The quaternions, the complex numbers, and the ring of all functions $\mathbf{R}$ to $\mathbf{R}$ are all algebras over $\mathbf{R}$.

A division algebra is an algebra that is a division ring. Division algebras can be classified in terms of fields. A field $F$ is called algebraically closed if every polynomial $p(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n x^0$, $a_i \in F$, $a_0 \neq 0$, $n \neq 0$, has a root $r \in F$. Suppose we have a division algebra $\mathcal{D}$ over an algebraically closed field $F$ of finite dimension $n$. Let $a \in \mathcal{D}$. Then $1, a, \ldots, a^n$ are $n + 1$ elements in an $n$-dimensional vector space so they cannot be linearly independent (otherwise they could be extended to a basis). So some combination $p(a) = c_0 + c_1 a + \cdots + c_n a^n = 0$. Choose the degree of $p$ minimal. Let $\deg(p)$ denote the degree of $p$. If $\deg(p) > 1$, then $p$ can be factored over $F$. Each factor is nonzero evaluated at $a$, but their product is zero. This is impossible in a division algebra where nonzero elements are invertible. Therefore, $\deg(p) = 1$, so $c_0 + c_1 a = 0$, and $a \in F$. So $\mathcal{D} = F$. Any such division algebra coincides with $F$. This applies to $F = \mathbf{C}$.

A finite dimensional $F$-algebra is simple if and only if it is isomorphic to the ring $M_n(\mathcal{D})$ of n-square matrices over a division algebra $\mathcal{D}$. More generally, every ring having no infinite strictly
decreasing sequence of distinct left ideals (Artinian ring) that is simple is an $M_n(\mathcal{D})$.

A finite dimensional $F$-algebra $A$ is called semisimple if as a vector space it is an internal direct sum of minimal right ideals $\mathcal{M}_i$. This means every $x \in A$ can be uniquely expressed as $\sum x_i$, $x_i \in \mathcal{M}_i$. Left ideals can equivalently be used.

This is equivalent to any of the following conditions. (1) The multiplicative semigroup of $A$ is regular; (2) every left ideal of $A$ is generated by an idempotent; and (3) $A$ has no nonzero two-sided ideals $\mathcal{N}$ such that $\mathcal{N}^m = \{x_1 x_2 \cdots x_m : x_i \in \mathcal{N}\}$ is zero for some $m$. Ideals $\mathcal{N}$ such that $\mathcal{N}^m = 0$ are called nilpotent.

A finite dimensional semisimple $F$-algebra is isomorphic to a direct sum of rings $M_n(\mathcal{D})$.

If $\mathcal{M}$ and $\mathcal{N}$ are nilpotent two-sided ideals, so is $\mathcal{M} + \mathcal{N}$ since $(\mathcal{M} + \mathcal{N})^{2n} \subset \mathcal{M}^n + \mathcal{N}^n$. This implies that every finite dimensional algebra has a maximal two-sided nilpotent ideal, the Jacobson radical, and its quotient ring by this ideal is semisimple.

All finite division rings are fields.

F. Group Rings and Group Representations

A group ring of a group $G$ over a ring $\mathcal{R}$ is formally the set of all functions $f : G \to \mathcal{R}$ such that $|\{g : f(g) \neq 0\}|$ is finite. Multiplication is given by $(fh)(x) = \sum_{yz = x} f(y)h(z)$ and addition is the usual sum of functions. Distributivity follows from the linear form of this expression. Associativity follows from $\{(u, v, w) : uv = y \text{ and } yw = x \text{ for some } y\} = \{(u, v, w) : vw = z,\ uz = x \text{ for some } z\}$. In fact, semigroups also have semigroup rings.

The group ring can be thought of as the set of all sums $r_1 g_1 + r_2 g_2 + \cdots + r_n g_n$ of group elements with coefficients in $\mathcal{R}$. For coefficients in $\mathbf{Z}$, we have, for instance, $(2g + 1)(3 - 2g) = 3 - 2g + 6g - 4g^2 = 3 + 4g - 4g^2$, where 1 denotes the identity of $G$.

A representation of a group $G$ is a homomorphism $h$ from $G$ into the ring $M_n(F)$ of n-square matrices over $F$. Two representations $f$, $h$ are called equivalent if there exists an invertible matrix $M \in M_n(F)$ with $f(g) = M h(g) M^{-1}$ for all $g \in G$. This is an equivalence relation.

Every representation $f$ of a group $G$ defines a ring homomorphism $h : F(G) \to M_n(F)$ by $h(\sum r_i g_i) = \sum r_i f(g_i)$ such that $h(1) = I$, where $I$ is an n-square identity matrix. This is a 1–1 correspondence since if $h$ is a ring homomorphism, $h(g)$ gives a group representation.

For $F = \mathbf{C}$, every group representation is equivalent to a unitary representation, that is, one in which every matrix $f(g)$ satisfies $f(g) f(g)^{*T} = I$. Here $*$ is complex conjugation. Define a modified inner product on row vectors by $b(x, y) = \sum_g x f(g) (y f(g))^{*T}$. Then $b(x, y) = b(x f(h), y f(h))$ for $h \in G$ since multiplication by $h$ permutes the group elements $g$ and so permutes the terms in the sum. We have $b(x, x) \geq 0$ and $b(x, x) = 0$ only if $x = 0$ since each term has the form $vv^{*T} = \sum_i |v_i|^2$ for a row vector $v$. We can inductively choose a basis $u_i$ for row vectors such that $b(u_i, u_i) = 1$, $b(u_i, u_j) = 0$ if $i \neq j$. In this basis, let $u_i f(g) = \sum_k u_k a_{ki}$. Then $b(u_i, u_j) = b(\sum_k u_k a_{ki}, \sum_k u_k a_{kj}) = \sum_k a_{kj}^* a_{ki}\, b(u_k, u_k)$. So $\sum_k a_{kj}^* a_{ki} = 1$ or $0$ according to whether $i = j$ or $i \neq j$. This proves the matrix is now unitary.

For $F = \mathbf{R}$, essentially the same can be done. A real unitary matrix is called orthogonal. Orthogonal matrices are geometrically products of rotations and reflections.

G. Modules

A module differs from a vector space only in that the ring involved may not be a field. A left (right) $\mathcal{R}$-module is a set $\mathcal{M}$ provided with a binary operation denoted as addition and a multiplication $\mathcal{R} \times \mathcal{M} \to \mathcal{M}$ ($\mathcal{M} \times \mathcal{R} \to \mathcal{M}$), where $\mathcal{R}$ is a ring, such that the axioms in Table III hold for all $x, y \in \mathcal{M}$, $r, s \in \mathcal{R}$, and some $0 \in \mathcal{M}$.

For group representations, we consider only unitary modules, those in which $1x = x$ for all $x \in \mathcal{M}$. Henceforth, we consider left modules.

A homomorphism of modules is a mapping $f : \mathcal{M} \to \mathcal{N}$ such that $f(x + y) = f(x) + f(y)$ and $f(rx) = r f(x)$ for all $x, y \in \mathcal{M}$ and $r \in \mathcal{R}$. A submodule of $\mathcal{M}$ is a subset closed under addition, subtraction, and multiplication by elements of $\mathcal{R}$. For any submodule $\mathcal{N} \subset \mathcal{M}$, the equivalence relation $x \sim y$ if and only if $x - y \in \mathcal{N}$ is a module congruence, that is, $x + z \sim y + z$, $rx \sim ry$ for all $z \in \mathcal{M}$, $r \in \mathcal{R}$. Then the equivalence classes form a new module called the quotient module.

Modules are a type of algebraic action, that is, a mapping $G \times S \to S$ for a structure $G$ and set $S$. Figure 8 classifies some of these.

Direct sums of modules are defined as for vector spaces. Unlike the vector space case, not every finite dimensional module over a general algebra $\mathcal{R}$ is isomorphic to a direct sum $\mathcal{R} \oplus \mathcal{R} \oplus \cdots \oplus \mathcal{R}$ of copies of $\mathcal{R}$ (with operations from $\mathcal{R}$). If a ring is an $F$-algebra, all modules over it are vector spaces and this determines their dimension.

TABLE III Module Axioms

Left module                        Right module
(x + y) + z = x + (y + z)          (x + y) + z = x + (y + z)
x + 0 = x                          0 + x = x
0x = 0                             x0 = 0
There exists −x such that          There exists −x such that
  x + (−x) = 0 for all x             x + (−x) = 0 for all x
r(x + y) = rx + ry                 (x + y)r = xr + yr
(r + s)x = rx + sx                 x(r + s) = xr + xs
r(sx) = (rs)x                      (xs)r = x(sr)
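The group-ring product $(fh)(x) = \sum_{yz = x} f(y)h(z)$ from the section on group rings can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than anything taken from the article: an element of the group ring is stored as a dictionary from group elements to integer coefficients, the group is taken to be cyclic of order 5 with $g^k$ stored as the exponent $k$, and the names `group_ring_product` and `cyclic_mul` are hypothetical.

```python
from collections import defaultdict


def group_ring_product(f, h, mul):
    """Convolution product: (fh)(x) is the sum of f(y) * h(z) over all yz = x."""
    out = defaultdict(int)
    for y, fy in f.items():
        for z, hz in h.items():
            out[mul(y, z)] += fy * hz
    # Drop zero coefficients so that only the finite support is kept.
    return {x: c for x, c in out.items() if c != 0}


def cyclic_mul(a, b, n=5):
    """Multiplication in a cyclic group of order n: g^a * g^b = g^(a + b mod n)."""
    return (a + b) % n


f = {1: 2, 0: 1}    # 2g + 1
h = {0: 3, 1: -2}   # 3 - 2g
product = group_ring_product(f, h, cyclic_mul)
print(product)      # coefficients of 3 + 4g - 4g^2, keyed by exponent
```

The dictionary keys 0, 1, 2 carry the coefficients 3, 4, −4, matching the expansion $(2g + 1)(3 - 2g) = 3 + 4g - 4g^2$. Any other finite group can be used by passing a different `mul` function.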
vector space over $F$. The dimension of $E$ over $F$ is denoted $[E : F]$ and is called the degree of the extension.

Suppose $F \subset E \subset D$. Let $x_j$ be a basis for $E$ over $F$ and $y_i$ a basis for $D$ over $E$. Then every element of $D$ can be written uniquely as $\sum c_i y_i$, $c_i \in E$, and in turn this can be written uniquely as $\sum_i (\sum_j f_{ij} x_j) y_i$, $f_{ij} \in F$. So the $x_j y_i$ are a basis and $[E : F][D : E] = [D : F]$.

Let $E$ be an extension of $F$ and let $t \in E$. Then $F(t)$ denotes the subfield of $E$ generated by $F$ and $t$, that is, all elements of $E$ that can be obtained from elements of $F$ and $t$ by adding, multiplying, subtracting, and dividing.

Suppose $E$ is a finite dimensional extension of degree $n$. Let $x$ denote an indeterminate. Then the elements $1, t, t^2, \ldots, t^n$ are $n + 1$ elements and so are linearly dependent. Then some sum $\sum_{i=0}^{n} a_i t^i = 0$. We can choose $n$ minimal and assume by dividing by $a_n$ that $a_n = 1$. If $a_n = 1$, we say that a polynomial is monic. We then get a polynomial $p$ with $p(t) = 0$ of minimal degree, called the minimal polynomial of $t$. If $f(t) = 0$ for any other polynomial $f(x)$, divide $p(x)$ into $f(x)$, giving $f(x) = q(x)p(x) + r(x)$. Then $r(t) = f(t) - q(t)p(t) = 0$. If $r(x) \neq 0$ it would have lower degree than $p(x)$, contradicting minimality. Therefore, $p(x)$ divides every other polynomial $f(x)$ such that $f(t) = 0$. Also by minimality $1, t, t^2, \ldots, t^{n-1}$ are linearly independent. The polynomial $p(x)$ must also be prime; else if $p(x) = r(x)s(x)$ then one of $r(t)$, $s(t)$ is 0. Prime polynomials are called irreducible.

We next show that $1, t, t^2, \ldots, t^{n-1}$ are a basis for $F(t)$. They are linearly independent. Since $t^n = -\sum_{i=0}^{n-1} a_i t^i$ and $t^{n+j} = -\sum_{i=0}^{n-1} a_i t^{i+j}$, every power of $t$ can be expressed as a linear combination of $1, t, t^2, \ldots, t^{n-1}$. Therefore, their span is a subring. Suppose $f(t) \neq 0$. Then $p(x)$ does not divide $f(x)$. Since $p(x)$ is prime, 1 is a g.c.d. of $p(x)$, $f(x)$. For some $r(x)$, $s(x)$, $r(x)p(x) + s(x)f(x) = 1$. So $s(t)f(t) = 1$ and $s(t)$ is an inverse of $f(t)$. Therefore, the span of $1, t, t^2, \ldots, t^{n-1}$ is closed under division and is a field. So it must be the field $F(t)$ generated by $t$.

There exists a homomorphism $h$ from the ring of polynomials $F[x]$ to $F(t)$ defined by $h(f(x)) = f(t)$. From the usual laws of algebra it is a ring homomorphism that is also onto.

The kernel is the set of polynomials $f(x)$ such that $f(t) = 0$. We have already shown that this is the set of polynomials divisible by $p(x)$, or the ideal $p(x)F[x]$ generated by $p(x)$.

Therefore, in summary of these results, let $p(x)$ be the minimum polynomial, of degree $n$, of $t$. Then $[F(t) : F] = n$ with basis $1, t, t^2, \ldots, t^{n-1}$. The field $F(t)$ is isomorphic to the quotient ring $F[x]/p(x)F[x]$, where $F[x]$ is the ring of all polynomials in an indeterminate $x$.

Conversely, let $p(x)$ be a monic irreducible polynomial. Then we will show that $F[x]/p(x)F[x]$ is an extension field of $F$ in which $t = \bar{x}$ has minimum polynomial $p(x)$. For any $f(x)$ such that $f(\bar{x}) \neq 0$, there will exist $r(x)$, $s(x)$ such that $r(x)f(x) + s(x)p(x) = 1$ since $f$, $p$ have g.c.d. 1. So $f(t)$ has an inverse $r(t)$ and the quotient ring is a field.

C. Applications to Solvability and Constructibility

The problem of solving quadratic equations goes at least back to the Babylonians. In the ninth century, the Muslim mathematician Al-Khwarizmi gave a version of the modern quadratic formula. In the mid-sixteenth century, Italian mathematicians reduced the solution of a cubic equation to the form

$$x^3 + mx = n$$

by dividing it by the coefficient of $x^3$ and making a substitution replacing $x$ by some $x - c$. They then solved it as

$$x = a - b,$$
$$a = \sqrt[3]{\frac{n}{2} + \sqrt{\left(\frac{n}{2}\right)^2 + \left(\frac{m}{3}\right)^3}},$$
$$b = \sqrt[3]{-\frac{n}{2} + \sqrt{\left(\frac{n}{2}\right)^2 + \left(\frac{m}{3}\right)^3}}.$$

They proceeded to solve quartic equations by reducing them to cubics. But no one was able to solve the general quintic using $n$th roots, and in 1824 N. H. Abel proved this is impossible. E. Galois in his brief life proved this also, as part of a general theory that applies to all polynomials. This is based on field theory, and we describe it next.

In the extensions $F(t)$, one root of a polynomial $p(x)$ has been added, or adjoined, to $F$. Extensions obtained by adding all roots of a polynomial are called normal extensions. The roots can be added one at a time in any order.

Finite dimensional normal extensions can be studied by finite groups called Galois groups. The Galois group of a normal extension $F \subset E$ is the group of all field automorphisms of $E$ that are the identity on $F$. It will, in effect, permute the roots of a polynomial whose roots generate the extension. For example, let $F = \mathbf{Q}(\xi)$ and let $E = F(\sqrt[3]{2})$, where $\xi = (-1 + i\sqrt{3})/2$. Then an automorphism of $E$ exists taking $\sqrt[3]{2} \to \xi\sqrt[3]{2}$, $\xi\sqrt[3]{2} \to \xi^2\sqrt[3]{2}$, $\xi^2\sqrt[3]{2} \to \sqrt[3]{2}$. The Galois group is cyclic of order 3, generated by this automorphism. Since the ratio $\xi$ of two roots goes to itself, it is the identity on $\mathbf{Q}(\xi)$.

The order of the Galois group equals the degree of a normal extension. Moreover, there is a 1–1 correspondence between subfields $K$ with $F \subset K \subset E$ and subgroups $H \subset G$, the Galois group of $E$ over $F$. To a subgroup $H$ is associated the field $K = \{x \in E : f(x) = x \text{ for all } f \in H\}$.

A splitting field of a polynomial $p$ over a field $F$ is a minimal extension of $F$ over which $p$ factors into factors
of degree 1. It is a normal extension and any two splitting fields are isomorphic.

Suppose a polynomial $p$ is solvable by radicals over $\mathbf{Q}$. Let $E$ be the splitting field of $p$ over $\mathbf{Q}$. Each time we extract a radical, the roots of the radical generate a normal extension $F_2$ of a previous field $F_1$. Let $E_i = F_i \cap E$. Then $F_2$ over $F_1$ has cyclic Galois group, so $E_2$ over $E_1$ does also.

It follows that there exists a series of extensions $\mathbf{Q} = D_0 \subset D_1 \subset \cdots \subset D_n = E$, each normal over the preceding, such that the Galois group of each over the preceding one is cyclic. It follows that the Galois group $G$ has a series of subgroups $G_n = \{e\} \subset G_{n-1} \subset \cdots \subset G_0 = G$ such that $G_i$ is a normal subgroup of $G_{i-1}$ with cyclic quotient group. Such a group is called solvable.

The symmetric group of degree 5 has as its only nontrivial proper normal subgroup the alternating group, which is simple. Therefore, it is not solvable. If $f(x)$ is a degree 5 irreducible polynomial over $\mathbf{Q}$ with exactly two nonreal roots, there exists an order 5 element in its Galois group just because 5 divides the degree of the splitting field. Complex conjugation gives a transposition. Therefore, the Galois group is the symmetric group $S_5$. So polynomials of degree 5 cannot in general be solved by radicals.

Conversely, it is true that every normal extension $F \subset E$ with cyclic Galois group can be generated by radicals. It can be shown that there is a single element $\theta$ such that $E = F(\theta)$ (consider all linear combinations $\theta$ of a basis for $E$ over $F$, and use the fact that there are a finite number of intermediate fields).

Let the extension be cyclic of order $n$ and let $\tau$ be such that $\tau^n = 1$ but no lower power is 1. Let the automorphism $g$ generate the Galois group. Let $t = \theta + \tau g(\theta) + \cdots + \tau^{n-1} g^{n-1}(\theta)$. Then $t$ has $n$ distinct conjugates (assuming $\tau \in F$) $g^i(\theta) + \tau g^{i+1}(\theta) + \cdots + \tau^{n-1} g^{n-1+i}(\theta)$ and so its minimum polynomial has degree $n$. Since $g(t) = \tau^{-1} t$, the element $t^n = a$ is invariant under the Galois group and lies in $F$. So $\theta, g(\theta), \ldots, g^{n-1}(\theta)$ lie in the splitting field of $x^n = a$, which must be $E$.

Geometric constructions provide an application of field theory. Suppose we are given a unit line segment. What figures can be constructed from it by ruler and compass? Let the segment be taken as a unit length on the $x$ axis. Whenever we construct a new point from existing ones by ruler and compass it is an intersection of a line or circle with a line or circle. Such intersections lead to quadratic equations. Therefore, if a point $P$ is constructible, each coordinate must be obtained from rational numbers by adding, subtracting, multiplying, dividing, or taking square roots. Such quantities lie in an extension field $E \supset \mathbf{Q}$ such that there exist fields $E_0 = \mathbf{Q} \subset E_1 \subset \cdots \subset E_k = E$ and $E_i = E_{i-1}(\sqrt{a})$ for some $a \in E_{i-1}$. The degree $[E : \mathbf{Q}] = [E_k : E_{k-1}] \cdots [E_1 : E_0]$ is a power of 2.

Therefore, if $x$ is a coordinate of a constructible point, $x$ lies in an extension of degree $2^n$, in fact a normal extension of degree $2^n$. But if $[\mathbf{Q}(x) : \mathbf{Q}]$ is not a power of 2, this is impossible since $[E : \mathbf{Q}] = [E : \mathbf{Q}(x)][\mathbf{Q}(x) : \mathbf{Q}]$.

In particular, duplicating a cube (providing a cube of volume precisely 2) and trisecting an angle of $60^\circ$ lead to roots of the irreducible cubics $x^3 - 2 = 0$ and $4\cos^3\theta - 3\cos\theta - \cos 60^\circ = 0$ and cannot be performed. Since $\pi$ does not satisfy any monic polynomial with coefficients in $\mathbf{Q}$, the circle cannot be squared.

D. Finite Fields

The fields $\mathbf{Z}_p$ where $p$ is a prime number are fields with a finite number of elements. All finite fields have $n \cdot 1 = 0$ for some $n$ and are therefore extension fields of some $\mathbf{Z}_p$. If $\mathbf{Z}_p \subset E$, then $E$ is a vector space over $\mathbf{Z}_p$. Let $x_1, x_2, \ldots, x_n$ be a basis. Then all elements of $E$ can be uniquely expressed as linear combinations $\sum c_i x_i$, where $c_i \in \mathbf{Z}_p$. This set has cardinality $|\{(c_1, c_2, \ldots, c_n)\}| = |\mathbf{Z}_p^n| = p^n$. So $|E| = p^n$ if $[E : \mathbf{Z}_p] = n$.

Let $E$ have order $p^n$. Then the multiplicative group $E^*$ has order $p^n - 1$. So every element has order dividing $p^n - 1$. So if $r \neq 0$, $r^{p^n - 1} = 1$. So for all $r \in E$, $r^{p^n} = r$. Then for all $r \in E$, $x - r$ divides the polynomial $x^{p^n} - x$. Therefore $x^{p^n} - x$ factors as precisely the product of $x - r$ for all $r \in E$. Therefore, if $m$ is a divisor of $p^n - 1$, the polynomial $x^m - 1$ divides $x^{p^n - 1} - 1$ and splits into linear factors. As with $\mathbf{Z}_p$ this means we can get an element $u_i$ of order a maximum power of each prime dividing $p^n - 1$, and their product will have order $p^n - 1$ and generate the group. So $E^*$ is cyclic.

Since $p \mid p!$ but not $r!$ for $r < p$, $p$ prime, we have $p \mid \binom{p}{r}$ for $0 < r < p$. In $E$, since $p \cdot 1 = 0$, every element satisfies $x + x + \cdots + x = px = 0$. Now $(x + y)^n = \sum_r \binom{n}{r} x^r y^{n-r}$ by the binomial theorem, which holds in all commutative rings. In $E$, $(x + y)^p = x^p + y^p$ since the other terms are divisible by $p$. This implies by induction $(x + y)^{p^r} = x^{p^r} + y^{p^r}$. Therefore, $x \to x^{p^r}$ is an automorphism of $E$.

This gives a cyclic group of order $n$ of automorphisms of $E$ since if $y$ generates the cyclic group $E^*$ then $y^{p^i} \neq y$ for $0 < i < n$. This is the complete automorphism group of $E$.

If an element $z$ lies in a proper subfield $F$ of $E$, then $F$ has order $p^k$ and $k \mid n$ and $z^{p^k} = z$. Conversely, for $k < n$ the set $\{z : z^{p^k} = z\}$ is closed under sums and products and multiplicative inverses so it is a proper subfield. So if $z^{p^k} = z$ then $z$ lies in a proper subfield.

For any irreducible polynomial $p(x)$ of degree $k$ over $\mathbf{Z}_p$, there exists a field $H = \mathbf{Z}_p[x]/p(x)\mathbf{Z}_p[x]$. It has degree $k$ and so order $p^k$. If $t = \bar{x}$, then it is $\mathbf{Z}_p(t)$ where $p(x)$ is the minimum polynomial of $t$. Since $t^{p^k} - t = 0$, we have $p(x) \mid x^{p^k} - x$, and if $k \mid n$, then $p(x) \mid x^{p^k} - x \mid x^{p^n} - x$.
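The divisibility $p(x) \mid x^{p^k} - x$ can be checked numerically for a small case. The sketch below is illustrative only: polynomials over $\mathbf{Z}_p$ are stored as coefficient lists with the lowest degree first, the modulus is assumed monic, and the function names `poly_mod`, `poly_mulmod`, and `frobenius_power_of_x` are hypothetical. The example polynomial $x^3 + x + 1$ over $\mathbf{Z}_2$ is a standard irreducible cubic chosen for the illustration, not one taken from the article.

```python
def poly_mod(a, m, p):
    """Remainder of a modulo the monic polynomial m, coefficients in Z_p."""
    a = [c % p for c in a]
    dm = len(m) - 1
    while len(a) - 1 >= dm:
        lead = a[-1]
        if lead:
            shift = len(a) - 1 - dm
            for i, c in enumerate(m):
                a[shift + i] = (a[shift + i] - lead * c) % p
        a.pop()  # the leading coefficient is now zero
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_mulmod(a, b, m, p):
    """Product of a and b, reduced modulo m over Z_p."""
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    return poly_mod(prod, m, p)


def frobenius_power_of_x(m, p, e):
    """The class of x^(p^e) modulo m, found by taking p-th powers e times."""
    r = poly_mod([0, 1], m, p)
    for _ in range(e):
        s = [1]
        for _ in range(p):
            s = poly_mulmod(s, r, m, p)
        r = s
    return r


cubic = [1, 1, 0, 1]  # x^3 + x + 1 over Z_2, degree k = 3
print(frobenius_power_of_x(cubic, 2, 3))  # class of x^(2^3)
print(frobenius_power_of_x(cubic, 2, 2))  # class of x^(2^2)
```

The first result is the list `[0, 1]`, the class of $x$ itself, so $x^{2^3} - x$ is divisible by the cubic. The second is the class of $x^2 + x$, which differs from $x$, in line with the requirement that the degree 3 divide the exponent $n$.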
Suppose $p(x) \mid x^{p^n} - x$ and is irreducible of degree $k$, $k \nmid n$. Then in $\mathbf{Z}_p(t)$ of degree $k$ we would have $t^{p^n} = t$ and so for all $r$ in the field $r^{p^n} = r$. This is false. So an irreducible polynomial divides $x^{p^n} - x$ if and only if its degree divides $n$.

The derivative can be defined for polynomials in $\mathbf{Z}_p[x]$ and satisfies the usual sum, product, and power rules. If $x^{p^n} - x = q(x)(p(x))^2$, then $(d/dx)(x^{p^n} - x) = -1 = q'(x)(p(x))^2 + 2q(x)p(x)p'(x)$. So $p(x)$ divides $-1$. For $p$ of degree greater than zero, this is false. Therefore, $x^{p^n} - x$ has each irreducible polynomial of degree $k$ dividing $n$ exactly once as a factor.

Suppose $x^{p^n} - x$ has no irreducible factors of degree $n$. Then all its factors divide $x^{p^k} - x$, $k < n$. So $x^{p^n} - x \mid (x^{p^{n-1}} - x) \cdots (x^p - x)$. Since $x^{p^n} - x$ has higher degree this is false. So $x^{p^n} - x$ has an irreducible factor $g(x)$ of degree $n$. Therefore, a field $H$ exists of order exactly $p^n$.

Any field $F$ of order $p^n$ has an element $r$ such that $x - r \mid g(x) \mid x^{p^n} - x$ since $x^{p^n} - x$ factors into linear factors. Then $r$ has minimum polynomial $g(x)$. It follows that $\mathbf{Z}_p(r)$ has degree $n$ and equals $F$. And $F$ is isomorphic to $H$. So any two fields of order $p^n$ are isomorphic.

E. Codes

Coding theory is concerned with the following problem. Consider information in the form of sequences $a_1 a_2 \cdots a_m$ over a $q$-element set assumed to be a finite field $F$ of order $q$. We wish to find a function $f$ encoding $a_1 a_2 \cdots a_m$ as another sequence $b_1 b_2 \cdots b_n$ such that, if an error of specified type occurs in the sequence $(b_i)$, the sequence $(a_i)$ can still be recovered. There should also be a readily computable function $g$ giving $a_1 a_2 \cdots a_m$ from $b_1 b_2 \cdots b_n$ with possible errors. We assume any $t$ or fewer errors could occur in $(b_i)$.

We consider $a_1 a_2 \cdots a_m$ as an $m$-dimensional vector $(a_1, a_2, \ldots, a_m)$ over $F$, that is, $a \in V_m$, and $(b_i)$ as a vector $b \in V_n$. Therefore, a coding function is a function $f : V_m \to V_n$. The resulting code is its image $C$, where $C \subset V_n$.

Suppose two vectors $v$, $w$ in a code differ in at most $2t$ places. Then let $z$ agree with $v$ in half these, with $w$ in the other half, and agree with $v$ and $w$ where they agree. Then $z$ could have been either $v$ or $w$. Error correction is impossible.

Conversely, if two vectors $v$, $w$ could give rise to the same vector $z$ by at most $t$ errors each, then they differ in at most $2t$ places.

Therefore, a code can correct $t$ errors if and only if every pair of vectors differ in at least $2t + 1$ places.

For such a code, $q^n \geq q^m \left(\binom{n}{0} + (q - 1)\binom{n}{1} + \cdots + (q - 1)^t \binom{n}{t}\right)$ since the term $(q - 1)^r \binom{n}{r}$ on the right is the number of vectors that differ from a given code vector in exactly $r$ places, and for $r \leq t$ all these are distinct.

A perfect code is one such that equality holds. A linear code is one that is a subspace of $V_n$. A cyclic code is a linear code in which each cyclic rearrangement $b_1 b_2 \cdots b_n \to b_n b_1 \cdots b_{n-1}$ of a code vector is a code vector.

Let $V_n$ be considered as the set of polynomials of degree less than $n$ in $x$. Then cyclic rearrangement is multiplication by $x$ if we set $x^n = 1$. Therefore, cyclic codes are subspaces closed under multiplication by $x$ in

$$\frac{F[x]}{(x^n - 1)F[x]}$$

and are therefore ideals.

Such an ideal is generated by a unique monic polynomial $g(x)$. Let $\gamma$ be an element of an extension field of $F$ such that $\gamma^n = 1$ but no lower power is 1. Let $g(x)$ be the least common multiple of the minimum polynomials of $\gamma^1, \gamma^2, \ldots, \gamma^{d-1}$, where $d \leq n$, $d$ is relatively prime to $n$. Then $g(x) \mid x^n - 1$ since all powers of $\gamma$ are roots of $x^n - 1$. By a computation involving determinants no nonzero element of the ideal of $g(x)$ can have fewer than $d$ nonzero coefficients. So $t = (d - 1)/2$ errors can be corrected. The number $m$ is $n$ minus the degree of $g(x)$. These are called the BCH (Bose–Chaudhuri–Hocquenghem) codes and, if $n = q - 1$, Reed–Solomon codes.

F. Applications to Latin Squares

An $n \times n$ Latin square is a matrix whose entries are elements of an $n$-element set such that each element occurs exactly once in every row and column. The following cyclic Latin square occurs for all $n$:

$$\begin{matrix} 1 & 2 & \cdots & n \\ 2 & 3 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ n & 1 & \cdots & n - 1 \end{matrix}$$

Two Latin squares $(a_{ij})$, $(b_{ij})$ are orthogonal if for all $i$, $j$, the ordered pairs $(a_{ij}, b_{ij})$ are distinct.

Orthogonal Latin squares are used in experiments in which several factors are tested simultaneously. In particular, suppose we want to test 5 fertilizers, 5 soil types, 5 amounts of heat, and 5 plant varieties. Choose $5 \times 5$ orthogonal Latin squares. Take an experiment with variety $i$, heat amount $j$, soil type $a_{ij}$, fertilizer $b_{ij}$ for all $i$, $j$, for a total of $n^2$ experiments. Then for any two factors, each variation on one factor occurs in combination with each for the other factor exactly once. If we have $k$ mutually orthogonal Latin squares we can test $k + 2$ factors.

Suppose $n = p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$, where the $p_i$ are distinct prime numbers. Let $R$ be the direct product of fields of order $p_i^{n_i}$. Let $k = \inf\{p_i^{n_i} - 1\}$. Choose $k$ distinct invertible elements from each field, $x_{ij} : i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, r$. Then the
elements $z_i = (x_{i1}, x_{i2}, \ldots, x_{ir})$ are invertible and the difference of any two is invertible.

Define $n \times n$ Latin squares $M_s$, $s = 1, \ldots, k$, by $m^{(s)}_{ij} = z_s y_i + y_j$, where $y_i$, $y_j$ run through all elements of $R$. Then this function is 1–1 in $y_j$. Since $z_s$ is invertible, it is 1–1 in $y_i$. From the fact that $z_s - z_t$ is invertible, it follows that $(m^{(s)}_{ij}, m^{(t)}_{ij}) \neq (m^{(s)}_{hl}, m^{(t)}_{hl})$ unless $i = h$ and $j = l$. So all the squares are orthogonal. If $n$ is a prime power, then $n - 1$ is the maximum number of orthogonal $n \times n$ Latin squares. However, there exists a pair of orthogonal $n \times n$ Latin squares for all $n \neq 2, 6$, constructed by R. C. Bose, S. S. Shrikhande, and E. T. Parker in 1958. The existence of orthogonal $10 \times 10$ Latin squares disproved a longstanding conjecture of Euler.

A $k \times m$ orthogonal array is a $k \times m$ matrix $(a_{ij})$ such that for any $i \neq j$ the ordered pairs $(a_{is}, a_{js})$ are distinct for $s = 1$ to $m$.

An $(m + 2) \times n^2$ orthogonal array with entries from an $n$-element set yields $m$ mutually orthogonal Latin squares. The first two rows run through all pairs $(i, j)$, so let $m^{(r)}_{ij}$ be the entry in row $r + 2$ of the column in which rows 1, 2 have entries $(i, j)$.

For $n$ a prime power congruent to 3 modulo 4, an orthogonal $4 \times n^2$ array can be constructed. Let $F$ be a field of order $n$ and let $g$ generate $F^*$, its multiplicative group. Because of the congruence condition, $-1$ will not be a square in $F$. Let $y_1, y_2, \ldots, y_k$, $k = \frac{1}{2}(n - 1)$, be distinct symbols not in $F$. Take as columns of the array all columns of the types shown in Table IV, where $x$ ranges through $F$, and $i$ independently varies from $1, 2, \ldots, \frac{1}{2}(n - 1)$, together with $n$ columns $(x, x, x, x)^T$, $x \in F$, and $(n - 1)^2/4$ columns corresponding to a pair of orthogonal Latin squares of size $(n - 1)/2 \times (n - 1)/2$ with entries $y_1, y_2, \ldots, y_k$. This gives an orthogonal array yielding, for example, two $10 \times 10$ orthogonal Latin squares.

TABLE IV Columns Used to Construct Two Orthogonal Latin Squares

y_i                  g^{2i}(g + 1) + x    g^{2i} + x           x
x                    y_i                  g^{2i}(g + 1) + x    g^{2i} + x
g^{2i} + x           x                    y_i                  g^{2i}(g + 1) + x
g^{2i}(g + 1) + x    g^{2i} + x           x                    y_i

G. Applications to Projective Planes

A projective plane is a mathematical system with a binary relation called incidence (lying on) from a set $P$ called the set of points to a set $L$ called the set of lines, satisfying three axioms: (1) Exactly one line is determined by two distinct points; (2) two lines intersect in exactly one point; and (3) there exist four points no three of which are collinear. The last axiom is only to guarantee that the system does not reduce to various special cases.

Let $\mathcal{D}$ be any division ring. Left modules over $\mathcal{D}$ have many of the properties of vector spaces. Let $\mathcal{M} = \mathcal{D} \times \mathcal{D} \times \mathcal{D}$ as a left module. Let $P$ be the set of submodules of the form $\mathcal{D}x$, $x \in \mathcal{M}$, $x \neq 0$. Let $L$ be the set of submodules of the form $\mathcal{D}x + \mathcal{D}y$, with $x, y \in \mathcal{M}$ nonzero, where $x \neq dy$ for any $d \in \mathcal{D}$. Then $P$ and $L$ are essentially the one- and two-dimensional subspaces of $\mathcal{M}$. Let incidence be the relation $U \subset V$. Then we have a projective plane. A choice of $\mathcal{D} = \mathbf{R}$ gives the standard geometric projective plane. This type of projective plane is characterized by the validity of the theorem of Desargues.

Many more general systems also give rise to projective planes. A ternary ring (or groupoid) is a set $R$ with an operation $R \times R \times R$ to $R$ denoted $x \circ m * b$. Every projective plane can be realized with the set $R \times R \cup R \cup \{\infty\}$, where $R \times R$ is considered as a plane for the ternary ring, and $R \cup \{\infty\}$ as the points on a line at infinity. Incidence is interpreted set theoretically by membership, and lines consist of sets $y = x \circ m * b$ together with a line at infinity. Necessary and sufficient conditions that the ternary ring give a projective plane are that for some $0, 1 \in R$, for all $a, x, m, b, y, k \in R$: (TR-1) $0 \circ m * b = x \circ 0 * b = b$; (TR-2) $1 \circ m * 0 = m \circ 1 * 0 = m$; (TR-3) there exists a unique $z$ such that $a \circ m * z = b$; (TR-4) there exists a unique $z$ such that $z \circ m * b = z \circ k * a$ if $k \neq m$; and (TR-5) there exist unique $z$, $w$ such that $a \circ z * w = x$ and $b \circ z * w = y$ if $a \neq b$.

Finite fields give ternary rings $xm + b$ of all prime power orders. It is unknown whether finite ternary rings satisfying (TR-1) to (TR-5) exist whose order is not a prime power. If $|R| = m$, then $|P| = |L| = m^2 + m + 1$.

Projective planes are essentially a special case of block designs. A balanced incomplete block design of type $(b, \nu, r, k, \lambda)$ consists of a family $B_i$, $i = 1$ to $b$, of subsets of a set $V$ having $\nu$ elements such that (1) $|B_i| = k > \lambda$ for all $i$; (2) $|\{i : x \in B_i\}| = r$ for all $x \in V$; and (3) $|\{i : x \in B_i \text{ and } y \in B_i\}| = \lambda$ for all $x, y \in V$, $x \neq y$. These are also used in design of experiments where $B_i$ is the $i$th set of experiments and $V$ the set of varieties tested.

Let $A = (a_{ij})$ be the matrix such that $a_{ij} = 1$ if the $i$th element of $V$ occurs in $B_j$ and $a_{ij} = 0$ otherwise. Then $AA^T = \lambda J + (r - \lambda)I$ and all column sums of $A$ are $k$, where $J$ is a matrix all of whose entries are 1. Moreover, these properties characterize balanced incomplete block designs. For $k = 3$, $\lambda = 1$, designs are called Steiner triple systems.

A permutation group $G$ acting on a set $S$ is called doubly transitive if for all $x \neq y$, $u \neq v$ in $S$ there exists $g$ in $G$ with
gx = u and gy = v. If T ⊂ S, then the family of subsets {B_i} = {g(T)}
forms a balanced incomplete block design.

VII. OTHER ALGEBRAIC STRUCTURES

A. Groupoids

As mentioned before, a groupoid is a set having one binary operation
satisfying only closure. For instance, in a finite group, the operation
aba^{-1}b^{-1}, called the commutator, gives a groupoid. Congruences and
homomorphisms on groupoids are defined in the same way as for semigroups,
and for every congruence there is a quotient groupoid. Sometimes, as in
topology, groupoids are used to construct quotient groupoids which are
groups.

Groupoids are also used in combinatorics. For example, any n × n Latin
square with its entries labeled 1, 2, . . . , n is equivalent to a groupoid
satisfying (q): any two of a, b, c determine the third uniquely in ab = c.
A groupoid satisfying (q) is called a quasigroup. A loop is a quasigroup
with two-sided identity. Figure 9 classifies some one-operation structures.

B. Semirings

A semiring is a system in which addition and multiplication are semigroups
with distributivity on both sides. Any Boolean algebra, and any subset of a
ring closed under addition and multiplication, such as the nonnegative
elements in Z or R, is a semiring. The set of matrices over a semiring with
0, 1 comprises a semiring with additive and multiplicative identity.

A homomorphism of semirings is a function f from one semiring to another
such that f(x + y) = f(x) + f(y) and f(xy) = f(x)f(y) for all x, y in the
domain. The equivalence relation {(x, y) : f(x) = f(y)} is a congruence.
This means that, if x ∼ y, then x + z ∼ y + z, xz ∼ yz, and zx ∼ zy for all
z in the semiring. Conversely, for every congruence there is a quotient
semiring and a homomorphism to the quotient semiring x → x̄.

If the semiring is partially ordered, so are the semirings of n-square
matrices over it.

Just as Boolean matrices represent binary relations, so matrices over
semirings on [0, 1] can represent relations in which there is a concept of
degree of relationship. One widely used semiring is the fuzzy algebra in
which the operations are sup{x, y}, inf{x, y}, giving a distributive
lattice. Fuzzy matrices have applications in clustering theory, where
objects are partitioned into a hierarchy of subsets on the basis of a matrix
C = (c_ij) giving the similarity between objects i and j. Applications of
fuzzy matrices are also found in cybernetics.

Inclines are a more general class of semirings. An incline is a semiring
satisfying x + x = x, xy ≤ x, and xy ≤ y for all x, y. The set of two-sided
ideals in any ring or semigroup forms an incline. The additive operation in
an incline makes it a semilattice and, under weak restrictions such as
compactness of intervals, a lattice. In a finitely generated incline, for
every sequence (a_i), i ∈ Z+, there exist i < j with a_i ≥ a_j. The set of
n-square matrices over an incline is not an incline under matrix
multiplication but is one under the elementwise product given by
(a_ij) ⊙ (b_ij) = (a_ij b_ij). Inclines can be used to study optimization
problems related to dynamic programming.

C. Nonassociative Algebras and Higher Order Algebras

A nonassociative ring is a system having two binary operations satisfying
all the axioms for a ring except associativity of multiplication. A
nonassociative algebra A over a
field F in addition is a vector space over the field satisfying
the algebra property a(bc) = b(ac) = (ab)c for all a ∈ F,
b, c ∈ A. There exists a nonassociative eight-dimensional
division algebra over the real numbers called the Cayley
numbers. It is an alternative ring, that is, (yy)x =
y(yx), and (yx)x = y(x x) for all x, y. It has applications
in topology and to projective planes.
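
The two alternative laws, and the failure of associativity, can be tested numerically. The sketch below is only an illustration under a stated assumption: it builds the Cayley numbers by the Cayley-Dickson doubling rule (a, b)(c, d) = (ac − d*b, da + bc*), a standard construction that the text above does not describe, and it restricts to integer coordinates so that all comparisons are exact. The helper names (octonion, alternative_on, and so on) are hypothetical.

```python
import random


def neg(x):
    return -x if isinstance(x, int) else (neg(x[0]), neg(x[1]))


def conj(x):
    # Conjugation: identity on integers, (a, b)* = (a*, -b) on pairs.
    return x if isinstance(x, int) else (conj(x[0]), neg(x[1]))


def add(x, y):
    return x + y if isinstance(x, int) else (add(x[0], y[0]), add(x[1], y[1]))


def mul(x, y):
    # Cayley-Dickson doubling rule: (a, b)(c, d) = (ac - d*b, da + bc*).
    if isinstance(x, int):
        return x * y
    (a, b), (c, d) = x, y
    return (add(mul(a, c), neg(mul(conj(d), b))),
            add(mul(d, a), mul(b, conj(c))))


def octonion(coords):
    """Nest 8 integer coordinates into pairs of pairs of pairs."""
    level = list(coords)
    while len(level) > 1:
        level = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


basis = [octonion([int(i == j) for j in range(8)]) for i in range(8)]

# Associativity already fails on some triple of basis elements.
nonassociative = any(mul(mul(a, b), c) != mul(a, mul(b, c))
                     for a in basis for b in basis for c in basis)


def alternative_on(x, y):
    """Check (yy)x = y(yx) and (yx)x = y(xx) for one pair x, y."""
    return (mul(mul(y, y), x) == mul(y, mul(y, x))
            and mul(mul(y, x), x) == mul(y, mul(x, x)))


rng = random.Random(0)
samples = [octonion([rng.randint(-5, 5) for _ in range(8)]) for _ in range(20)]
alternative = all(alternative_on(x, y) for x in samples for y in samples)

print(nonassociative, alternative)
```

A finite sample cannot prove the identities, but it shows how the alternative laws survive where full associativity does not.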
FIGURE 9 Classification of one-operation structures.

A Lie algebra is an algebra in which for all a, b, c the product denoted
[a, b] satisfies (L-1) [a, a] = 0; (L-2) [a, b] + [b, a] = 0; and
(L-3) [[a, b], c] + [[b, c], a] + [[c, a], b] = 0. In any associative
algebra, the commutators [a, b] = ab − ba define a Lie algebra. Conversely,
for any Lie algebra an associative algebra called the universal enveloping
algebra can be defined such that the Lie algebra is a subalgebra of its
algebra of commutators. In many topological groups, for every element a
there exists
a parametrized family b(t), t ∈ R, of elements such that b(1) = a,
b(t + s) = b(t)b(s) = b(s)b(t). The operation b(t)c(t) is commutative up to
first order in t as t → 0 and defines a real vector space. The operation
b(t)c(t)b^{-1}(t)c^{-1}(t) taken to second order in t then defines a Lie
algebra. All finite dimensional connected, locally connected continuous
topological groups can be classified using Lie algebras.

For any group G, let Γ_i(G) denote the subgroup generated by all products
of i elements under the operation of group commutator xyx^{-1}y^{-1}. Then
Γ_i(G) is a normal subgroup and Γ_i(G)/Γ_{i+1}(G) is an Abelian group.
Commutators give a product from Γ_i(G)/Γ_{i+1}(G) × Γ_j(G)/Γ_{j+1}(G) to
Γ_{i+j}(G)/Γ_{i+j+1}(G) satisfying (L-1) to (L-3). Such a structure is
called a Lie ring. The sequence of normal subgroups Γ_i(G) is called the
lower central series.

A (noncommutative) Jordan algebra is a nonassociative algebra such that
(x^2 y)x = x^2(yx) and (xy)x = x(yx). Commutative Jordan algebras are used
in constructing exceptional Lie algebras. For any associative algebra, the
product xy + yx defines a commutative Jordan algebra. They also are used in
physics.

A median algebra is a set S with ternary product xyz such that (1) xyz is
unchanged under any permutation of x, y, z; (2) (x(xyz)w) = (xy(xzw)); and
(3) xyy = y for all x, y, z, w. The product in any two factors is then a
semilattice. The set of n-dimensional Boolean vectors is a median algebra
under the componentwise product in which xyz equals whichever value is in
the majority among x, y, z. The set of linear orders on n elements is a
median algebra under the same operation.

D. Varieties

As mentioned earlier, an algebraic operation on a set S is a function from
S^n = S × S × · · · × S to S for some positive integer n. An operational
structure is any labeled collection of operations on a set S. A
substructure is a subset closed under all operations. A congruence is an
equivalence relation such that for each operation f, if x_i = y_i for i ≠ j
and x_j ∼ y_j, then f(x_1, x_2, . . . , x_n) ∼ f(y_1, y_2, . . . , y_n).
The operations are then uniquely defined on equivalence classes, which form
a quotient structure. A homomorphism is a function h such that for each
operation f and all x_1, x_2, . . . , x_n in the domain
h[f(x_1, x_2, . . . , x_n)] = f[h(x_1), h(x_2), . . . , h(x_n)]. Its image
will be a substructure of the structure into which h is defined. An
isomorphism is a 1–1 onto homomorphism. The direct product of algebraic
structures with operations of the same type is the product set with
componentwise operations.

A law for an algebraic structure S is a relation which holds identically
for all elements of the structure. That is, it asserts that a certain
composition of operations applied to x_1, x_2, . . . , x_n equals a certain
other composition for all x_1, x_2, . . . , x_n in the set. The commutative,
associative, and distributive laws for commutative rings form an example. A
variety is the class of all structures satisfying given laws. Any law valid
for a class of structures will also hold for their substructures,
homomorphic images, and direct products of them. Conversely, suppose a
class of structures is closed under taking of substructures, product
structures, and homomorphic images. For any set of generators g we can
construct an element of the class whose only relations are those implied by
the general laws of the class. To do this we take one copy of each
isomorphism class S_α of structures in the class that are either finite or
have cardinality at most |g|. Replicate these copies until we have one copy
S_αβ for every map f_αβ : g → S_α. Take the direct product ×S_αβ and the
substructure F(g) generated by the image of g under each. Then by
construction any relation among these generators must hold identically in
all the S_α and thus be a law. It follows that the relations of g coincide
with laws of the class.

If S is any structure in which the laws hold with |S| = |g|, then the
relations of F(g), being laws of the class, hold in S. In particular, there
will then be an onto homomorphism from F(g) to S. This proves that a class
is a variety if and only if it is closed under taking substructures,
product structures, and homomorphic images. Much work has been done on
classifying varieties, especially of groups.

Quasivarieties of relational structures on a set S also exist but must be
defined a little differently, as classes closed under isomorphism,
structures induced on subsets, and product structures. Laws are of three
types. For a collection of variables x_1, x_2, . . . , x_n take a
collection T_α of m-tuples from x_1, x_2, . . . , x_n for each m-ary
relation R. For all replacements of x_i by elements of the set T either
(1) if for all α, all members of T_α belong in R, then
(x_{i(1)}, x_{i(2)}, . . . , x_{i(k)}) ∈ R, where
i(1), i(2), . . . , i(k) ∈ Z+; (2) if for all α, all members of T_α belong
in R, then (x_{i(1)}, x_{i(2)}, . . . , x_{i(k)}) ∉ R; or (3) if for all α,
all members of T_α belong in R, then x_{i(1)} = x_{i(2)} for some
i(1), i(2) ∈ Z+. The classes of reflexive, irreflexive, symmetric, and
transitive binary relations are all quasivarieties.

E. Categories and Topoi

If we take algebraic structures of a given type and ignore the internal
structure but consider only the homomorphisms that exist between them, then
we have essentially a category. To be precise, a category consists of (1) a
class called the class of objects together with (2) a set Hom(x, y) for all
objects x, y, called the set of morphisms from x to y; (3) a special
morphism 1_x for each object x, called the identity morphism; and (4) an
operation from Hom(x, y) × Hom(y, z) to Hom(x, z) called composition
of morphisms. It is required that composition be associative when defined
and that 1_x act as the identity element whenever composition with it is
defined. The most fundamental category consists of all sets and all
functions between sets. Sets and binary relations form a category. There
are categories in which Hom(x, y) does not consist of functions; for
example, let C be a poset and let Hom(x, y) = {(x, y)} if x ≤ y and
Hom(x, y) = ∅ otherwise.

The category of all modules over a ring is an additive category: morphisms
can be added in a natural way.

The direct product of objects K_1 × K_2 can be described in terms of
categories by its universal property. This is that there exist mappings
π_i : K_1 × K_2 → K_i such that for any object A and any mappings
f_i : A → K_i there exists a unique mapping h : A → K_1 × K_2 such that
π_i(h(x)) = f_i(x). There exists a dual concept of direct sum, an object
K_1 ⊕ K_2 with mappings g_j : K_j → K_1 ⊕ K_2 such that for any object A
and any mappings f_j : K_j → A there exists a unique mapping
h : K_1 ⊕ K_2 → A such that h(g_j(x)) = f_j(x).

More generally, in category theory, one works with finite diagrams, or
graphs, in which each vertex is an object and each arrow a morphism. The
limit of a diagram consists of an object A and a mapping g_j from each
object a_j of the diagram to A such that if there is an arrow
f : a_k → a_j, then g_j(f(x)) = g_k(x), and for any other object H and
mappings m_j with the same properties there exists a unique mapping
h : A → H such that h(g_j(x)) = m_j(x) for all j. A colimit is the dual
concept. Any variety of relational structures, such as the variety of
commutative rings, has products, coproducts, limits, and colimits.
Moreover, there is the trivial structure {0} where any set of operations on
zero gives zero. Any object has a unique mapping to zero.

A topos is a category with limits, colimits, products, coproducts, and a
few additional properties: There exists an exponential object X^Y for any
objects X, Y such that for any object Z there is a natural isomorphism from
Hom(Z, X^Y) to Hom(Z × Y, X). In addition, there exists a subobject
classifier, that is, an object Ω and a mapping called true from a point p
to Ω such that there is a 1–1 correspondence between substructures of any
object X and mappings X → Ω. The category of sets acted on the left by a
fixed monoid M is a topos. In the category of sets itself Ω = {0, 1}, where
subsets T of a set S are in 1–1 correspondence with mappings
f : S → {0, 1} such that f(x) = 1 if x is in T and f(x) = 0 if x ∉ T. Topoi
have important applications to models in mathematical logic, such as the
Boolean-valued models used to show the independence of the continuum
hypothesis in Zermelo–Fraenkel set theory.

A functor from one category C_1 to another category C_2 consists of an
assignment of a unique object of C_2 to each object of C_1, and a unique
morphism of C_2 to each morphism of C_1, such that the objects involved
correspond, the mappings 1_x go to identities, and composition is
preserved. Group rings over a fixed ring give a functor from the category
of groups to the category of rings.

F. Algebraic Topology

Algebraic topology is the study of functors from subcategories (subsets of
the sets and morphisms of a category forming a category under the same
operations) of the category of topological spaces and continuous mappings
to categories of algebraic structures. These functors often allow one to
deduce the nonexistence or existence of possible topological spaces or
continuous mappings. If for some functor F and two topological spaces X, Y,
F(X) is not isomorphic to F(Y), then X, Y cannot be topologically
equivalent. In topology, any mapping between spaces is meant to be a
continuous mapping.

Let A (B) be a subset of the topological space X (Y). A mapping from X, A
to Y, B is a mapping f : X → Y such that f(A) ⊂ B. Two mappings f and
g : X, A → Y, B are called homotopic if there exists a mapping
h : I × X, I × A → Y, B, where I = [0, 1], such that h(0, x) = f(x) and
h(1, x) = g(x). This is an equivalence relation. Equivalence classes of
mappings under homotopy form a new category called the homotopy category.

The first important functor from a category associated to the category of
topological spaces to the category of groups is the fundamental group. Let
x_0 denote any point of X. Homotopy classes of mappings f, g from I, ∂I
(where ∂I = {0, 1}) to X, x_0 can be multiplied by defining f ∗ g = h such
that h(t) = f(2t) for 0 ≤ t ≤ 0.5 and h(t) = g(2t − 1) for 0.5 ≤ t ≤ 1.
The product depends only on homotopy classes, and on homotopy classes is a
group, where f(1 − t) is the inverse of f(t).

Every group is the fundamental group of some topological space, and every
group homomorphism can be realized by a continuous mapping of some
topological spaces.

Higher homotopy groups are defined in a similar way using an n-cube and its
boundary in place of I, ∂I.

Suppose a space Y is a topological groupoid; that is, there exists a
continuous mapping Y × Y → Y. Then the set [X ; Y] of homotopy classes of
mappings from X to Y is a groupoid for any topological space X. If we let Y
be a topological space whose only nonzero homotopy group is an Abelian
group G in dimension n, called an Eilenberg–MacLane space, then [X ; Y] is
a group called the nth cohomology group of X. All the cohomology groups
together form a ring. From the cohomology groups of Eilenberg–MacLane
spaces themselves can be obtained cohomology operations, that is, mappings
from one cohomology group to another preserved by continuous mappings.
A vector bundle over a topological space X is a vector space associated to
each point of X such that the union of the vector spaces forms a
topological space. An example is the tangent vectors to a surface. The
K-theory of a topological space is essentially a quotient of the semigroup
of vector bundles over X under the operation of direct sum of vector
spaces. The equivalence class of a vector bundle V is the set of vector
bundles W such that U ⊕ V is equivalent to U ⊕ W for some vector bundle U.
K-theory of X also equals [X ; Y] for a space Y which can be constructed
from quotient groups of groups of n-square matrices.

By analogy with topological constructions, researchers have defined
homology and K-theory for rings and modules.

G. Inclines and Antiinclines

We have defined an incline above as a semiring satisfying the incline
inequality. The set of two-sided ideals in a ring (semigroup), for example,
forms an incline under I + J (I ∩ J) and IJ. Many results on Boolean and
fuzzy matrices generalize to inclines. Green’s relation L (R) classes in
the semiring of matrices over an incline are characterized by equality of
row (column) spaces. In many cases the row and column spaces will have a
unique basis. For the fuzzy algebra [0, 1] under sup{x, y}, inf{x, y}, it
is necessary to add the condition that if c_i = Σ_j a_ij c_j, with {c_i}
the basis, then a_ii c_i = c_i. Matrices will have computable maximal
subinverses X, satisfying AXA ≤ A, giving a way to test regularity.

In any finitely generated incline, no chain has an infinite nondecreasing
subchain. Eigenvectors and eigenvalues can be described in terms of a power
of a matrix. A particularly interesting incline is [0, 1] under sup{x, y}
and xy (ordinary multiplication).

The asymptotic forms of finite matrices can be described by the existence
of positive integers k, d, and a matrix C such that

A^{k+nd} = A ⊙ C ⊙ · · · ⊙ C

where ⊙ denotes entrywise product.

Antiinclines are defined by the reverse inequalities xy ≥ x and yx ≥ x.
They have the dual nonascending chain property.

H. Quadratic Forms

A quadratic form over a ring R is a function f(x) = Σ a_ij x_i x_j, with
a_ij ∈ R, from R^n to R. Quadratic forms occur often in statistics and
optimization and as the first nonlinear term in a power series, as well as
in pure mathematics.

Two quadratic forms are isomorphic if they become identical after a linear
invertible change of variables. A form is isotropic if some nonzero x gives
f(x) = 0. It is defined to be totally isotropic if f(x) = 0 identically,
hyperbolic if it is isomorphic to a direct sum of forms f(x) = x_1 x_2 over
F^2, and anisotropic if it is not isotropic.

Over a field F every quadratic form is uniquely expressible as the direct
sum of an anisotropic form, a hyperbolic form, and a totally isotropic
form, up to equivalence. Cancellation holds for direct sum: if h(x) ⊕ f(x)
is isomorphic to h(x) ⊕ g(x), then f(x) is isomorphic to g(x). It is then
convenient to work with a ring of isomorphism classes of forms and their
additive inverses called the Witt-Grothendieck ring Ŵ(F). The
Witt-Grothendieck ring is generated by ⟨a⟩, the class of the form ax^2, has
operations induced by direct sum and tensor product, and is defined by the
relations ⟨1⟩ = 1, ⟨a⟩⟨b⟩ = ⟨ab⟩, ⟨a⟩ + ⟨b⟩ = ⟨a + b⟩(1 + ⟨ab⟩). Here the
Witt ring W(F) is defined as Ŵ(F)/H, where H is the ideal of hyperbolic
forms. It is studied in terms of the filtration by powers of the ideal I
generated by the elements ⟨a⟩ − 1.

Over the real numbers, rank [of the matrix (a_ij)] and signature (the
number of its positive eigenvalues, given a_ij = a_ji) are complete
isomorphism invariants. Over the rational numbers the strong Hasse
principle asserts that two forms are isomorphic if and only if they are
isomorphic over each completion (real and p-adic) of the rationals. The
weak Hasse principle asserts that a form is isotropic if and only if it is
isotropic over each completion. A complete set of invariants of nonsingular
forms is given by the determinant det[(a_ij)] and Hasse invariants at each
prime. The Hasse invariant of a quadratic form Σ a_ii x_i^2 can be defined
as

∏_{i ≤ j} (a_ii, a_jj)

where (a_ii, a_jj) = ±1 according as a_ii x^2 + a_jj y^2 = 1 has a solution
or not, over the p-adic numbers. The determinant, taken modulo squares, is
called the discriminant.

Milnor’s theorem gives a description of the Witt ring of F(x), x
transcendental, in terms of the Witt ring of F. The Tsen–Lang theorem
asserts that if F has transcendence degree n over C then every quadratic
form of dimension greater than 2^n is isotropic. The theory of quadratic
forms is also related to that of division algebras.

I. Current and Recent Research

The greatest achievement in algebra itself since 1950 has been the
classification of finite simple groups. Proofs are very lengthy and due to
many researchers. Much progress has been made on decidability of algebraic
problems, for example, the result that the word problem is unsolvable in
general groups having a finite number of generators and
the solution of Hilbert’s tenth problem showing that polynomial equations
in n variables over Z (Diophantine equations) are in general undecidable by
Turing machines. The proof of Mordell’s conjecture gives a positive step
for equations of degree n in two variables over Q. Another result in group
theory was the construction of an infinite but finitely generated group G
such that x^m is the identity for all x in G and for a fixed m in Z+.

In algebraic geometry, a remarkable theory has been created leading to the
proof of the Weil conjectures. This theory made it possible to prove that,
for any Diophantine equation, it is decidable whether for every prime
number p it has a solution modulo p. Much has been done with algebraic
groups.

In coding theory, all perfect codes have been found. A pair of n × n
orthogonal Latin squares has been constructed for all n except 2 and 6.

The entire subjects of homological algebra and algebraic K-theory have been
developed. For a ring R, the following sequence of R-modules

M_1 --f_1--> M_2 --f_2--> · · · --f_n--> M_n

is called exact if Im(f_i) = Ker(f_{i+1}), where Ker(f) denotes the kernel
of f. A free resolution of a module M is an exact sequence
· · · → F_n → F_{n−1} → · · · → F_0 → M, where each F_i is a free module.
Another module N gives a sequence

· · · <-- Hom(F_n, N) <--g_n-- Hom(F_{n−1}, N) <--g_{n−1}-- · · · <--g_0-- Hom(F_0, N)

The quotients Ker(g_{n+1})/Im(g_n) are independent of the particular free
resolution and are called Ext^n(M, N).

Whitehead’s problem is whether if Ext^1(A, Z) = 0, then A is a direct sum
of copies of Z. S. Shelah proved this is independent of the axioms of
Zermelo–Fraenkel set theory.

Considerable research has been done on order structures and on ordered
algebraic structures in lattice theory and general algebra. Finite posets
can be studied in ways similar to topological spaces, because they are
equivalent to finite topological spaces. The theory of Boolean and fuzzy
matrices has been developed with the advent of Green’s relations classes.
Inclines, semirings with operations ◦, ∗ satisfying x ◦ x = x,
x ◦ (x ∗ y) ◦ (y ∗ x) = x, are a further generalization. The algebraic
structure of semigroups, especially regular semigroups, has become well
understood. In matrix theory and algebraic number theory, there have been
numerous important developments such as the proof of the van der Waerden
conjecture.

Mathematical linguistics and automata theory have become well-developed
subjects.

Research is actively continuing in most of these areas as well as in
category theory and the theory of varieties and combinatorial aspects of
algebra.

In the 1990s quantum algebra became a very active field. This deals with
structures related to traditional algebraic structures in the way quantum
physics is related to classical physics. In particular, a quantum group is
a kind of Hopf algebra.

A somewhat related and active area is noncommutative algebraic geometry, in
which a prominent place is occupied by the K-theoretic ideas of A. Connes.

SEE ALSO THE FOLLOWING ARTICLES

• ALGEBRAIC GEOMETRY • BOOLEAN ALGEBRA • GROUP THEORY • MATHEMATICAL LOGIC
• SET THEORY • TOPOLOGY, GENERAL

BIBLIOGRAPHY

Cao, Z. Q., Kim, K. H., and Roush, F. W. (1984). “Incline Algebra and
  Application,” Ellis Horwood, Chichester, England/Wiley, New York.
Childs, L. (1979). “A Concrete Introduction to Higher Algebra,”
  Springer-Verlag, Berlin and New York.
Connes, A. (1994). “Noncommutative Geometry,” Academic Press, New York.
Fraleigh, J. B. (1982). “A First Course in Abstract Algebra,”
  Addison-Wesley, Reading, Massachusetts.
Gilbert, J., and Gilbert, L. (1984). “Applied Modern Algebra,” Prindle,
  Weber & Schmidt, Boston, Massachusetts.
Kassel, C. (1995). “Quantum Groups,” Springer-Verlag, Berlin and New York.
Kim, K. H. (1982). “Boolean Matrix Theory and Applications,” Marcel
  Dekker, New York.
Kim, K. H., and Roush, F. W. (1984). “Incline Algebra and Applications,”
  Ellis Horwood, Chichester, England/Wiley, New York.
Lam, T. Y. (1973). “Algebraic Theory of Quadratic Forms,” Benjamin,
  Reading, Massachusetts.
Laufer, H. B. (1984). “Applied Modern Algebra,” Prindle, Weber & Schmidt,
  Boston, Massachusetts.
Lidl, R., and Pilz, G. (1984). “Applied Abstract Algebra,”
  Springer-Verlag, Berlin and New York.
Pinter, C. C. (1982). “A Book of Abstract Algebra,” McGraw-Hill, New York.
Encyclopedia of Physical Science and Technology EN001H-20 May 26, 2001 14:20
Algebraic Geometry
Rick Miranda
Colorado State University
coefficients of the polynomials lie, and where one looks for solutions. A
field is a set with two binary operations (addition and multiplication) in
which all of the usual rules of arithmetic hold, including subtraction and
division (by nonzero numbers). The main examples of interest are the field
Q of rational numbers, the field R of real numbers, and the field C of
complex numbers. Another is the finite field Z/p of the integers
{0, 1, . . . , p − 1} under addition and multiplication modulo a prime p.

Solutions to polynomials in n variables would naturally be n-tuples of
elements of the field. If we denote the field in question by K, the natural
place to find solutions is affine n-space over K, denoted by K^n, A^n_K, or
simply A^n:

A^n = {z = (z_1, z_2, . . . , z_n) | z_i ∈ K}.

When n = 1 we have the affine line, if n = 2 we have the affine plane, and
so forth.

B. Affine Algebraic Sets

Let f(x) = f(x_1, . . . , x_n) be a polynomial in n variables with
coefficients in K. The zeros of f are denoted by Z(f):

Z(f) = {z ∈ A^n | f(z) = 0}.

For example, Z(y − x^2) is a parabola in the plane, and
Z(x^2 − y^2 − z^2 − 4) is a hyperboloid in 3-space.

More complicated geometric objects in affine space must be defined by more
than one polynomial. Let S be a set of polynomials (not necessarily finite)
all having coefficients in the field K. The set of common zeros of all the
polynomials in the set S is denoted by Z(S):

Z(S) = {z ∈ A^n | f(z) = 0 for all f ∈ S}.

An algebraic set in A^n, or simply an affine algebraic set, is a subset of
A^n of the form Z(S) for some set of polynomials S in n variables. That is,
an affine algebraic set is a set which is exactly the set of common zeros
of a collection of polynomials. Affine algebraic sets are the fundamental
objects of study in affine algebraic geometry.

The empty set is an affine algebraic set [it is Z(1)] and so is all of
affine space [it is Z(0)]. The intersection of an arbitrary collection of
algebraic sets is an algebraic set. The union of a finite number of
algebraic sets is also an algebraic set. Therefore the algebraic sets in
affine space form the closed sets in a topology; this topology is called
the Zariski topology.

C. Ideals Defining Algebraic Sets

The “algebraic” part of algebraic geometry involves the use of the tools of
modern algebra to study algebraic sets. Algebra is the study of sets with
operations on them, and the most common algebraic structure is the ring,
which is a set with addition and multiplication, but not necessarily
division. The ring of importance in affine algebraic geometry is the ring
K[x] = K[x_1, x_2, . . . , x_n] of polynomials in the n variables
x = (x_1, . . . , x_n). Sometimes this is called the affine coordinate
ring, since it is generated by the coordinate functions x_i. It is from
this ring that we draw the subset S ⊂ K[x] of polynomials whose zeros we
want to study.

An ideal J in a ring R (like K[x]) is a subset of the ring R with the
special properties that it is closed under addition and “outside
multiplication”: if f and g are in J then f + g is in J, and if g is in J
then fg is in J for any f in the ring. An example of an ideal is the
collection of all polynomials that vanish at a particular point p ∈ A^n.

If S is a subset of the ring, the ideal generated by S is the set of all
finite “linear combinations” Σ_i h_i f_i where the h_i’s are in the ring
and the f_i’s are in the given subset S. The reader may check that this
set, denoted by ⟨S⟩, is closed under addition and outside multiplication,
and so is always an ideal.

Returning to the setting of algebraic sets, one can easily see that if S is
any collection of polynomials in K[x], and J = ⟨S⟩ is the ideal in K[x]
generated by S, then S ⊂ J and Z(S) = Z(J): the set S and the ideal J have
exactly the same set of common zeros. This allows algebraic geometers to
focus only on the zeros of ideals of polynomials: every algebraic set is of
the form Z(J) for an ideal J ⊂ K[x]. It is not the case that the ideal
defining the algebraic set is unique, however: it is possible that two
different ideals J_1 and J_2 have the same set of common zeros, so that
Z(J_1) = Z(J_2).

D. Algebraic Sets Defining Ideals

We saw in the last paragraph that ideals may be used to define all
algebraic sets, but that the defining ideal is not unique. In the
construction of algebraic sets, we look at a collection of polynomials and
take their common zeros, which is a set of points in A^n. Now we turn this
around, and look instead at a subset X ⊂ A^n_K of points in affine space,
and consider all the polynomials that vanish on X. We denote this by I(X):

I(X) = {f ∈ K[x] | f(z) = 0 for all z ∈ X}.

No matter what kind of subset X we start with, this is always an ideal of
polynomials. Again there is the same possibility of non-uniqueness: it may
be that two different subsets X and Y of A^n have the same ideal, so that
I(X) = I(Y). This would happen if every polynomial that vanished on X also
vanished on Y and vice versa.
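
Over the finite field Z/p, zero sets can be found by exhaustive search, which makes the facts above easy to try out. The sketch below is a hypothetical illustration, not part of the article: polynomials are represented as plain Python functions, and zero_set is a name chosen here. It shows that the generators x and x^2 have the same zeros in the plane even though they generate different ideals, and that Z(1) is empty while Z(0) is the whole space.

```python
from itertools import product


def zero_set(polys, p, n):
    """Common zeros in (Z/p)^n of the given polynomial functions."""
    return {z for z in product(range(p), repeat=n)
            if all(f(*z) % p == 0 for f in polys)}


p = 5
parabola = zero_set([lambda x, y: y - x * x], p, 2)   # Z(y - x^2)
line_a = zero_set([lambda x, y: x], p, 2)             # Z(x)
line_b = zero_set([lambda x, y: x * x], p, 2)         # Z(x^2)
empty = zero_set([lambda x, y: 1], p, 2)              # Z(1)
whole = zero_set([lambda x, y: 0], p, 2)              # Z(0)

# Zeros of a set of polynomials are the intersection of the individual zero sets.
both = zero_set([lambda x, y: y - x * x, lambda x, y: x], p, 2)

print(sorted(parabola))
print(line_a == line_b, len(empty), len(whole), both == parabola & line_a)
```

Brute force is only feasible for small p and n, but it is enough to see that Z does not distinguish an ideal from a larger ideal with the same zeros.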
The ideal I (X ) for a subset X ⊂ An has a special rational numbers, the field R of real numbers, and all finite
property not shared by all ideals: it is closed under roots. fields are not algebraically closed.
That is, if f is a polynomial, and f m vanishes on all the
points of X , then it must be the case that f itself vanishes. Theorem 1 (Hilbert’s Nullstellensatz). If K is an
This means that if f m ∈ I (X ), then f ∈ I (X ). An ideal algebraically closed field, then for any ideal J ⊂ K [x],
with this special property is called a radical ideal. I (Z (J )) = rad(J ).
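
The hypothesis that K is algebraically closed cannot be dropped. As a small sketch under stated assumptions, take K = Z/p and the one-variable polynomial x^p − x + 1 (an example chosen here, not taken from the article). By Fermat's little theorem x^p = x for every x in Z/p, so the polynomial takes the value 1 everywhere and has no zeros. The ideal J it generates is proper, so rad(J) is proper as well, yet Z(J) is empty and I(Z(J)) is all of K[x]. The function names are hypothetical.

```python
def roots_mod_p(coeffs, p):
    """Roots in Z/p of a polynomial given by coefficients, lowest degree first."""
    return [x for x in range(p)
            if sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p == 0]


def no_root_poly(p):
    """Coefficients of x^p - x + 1, lowest degree first."""
    coeffs = [0] * (p + 1)
    coeffs[0], coeffs[1], coeffs[p] = 1, -1, 1
    return coeffs


for p in (2, 3, 5, 7, 11, 13):
    print(p, roots_mod_p(no_root_poly(p), p))

# For comparison, x^2 + 1 has roots modulo 5 but none modulo 3.
print(roots_mod_p([1, 0, 1], 5), roots_mod_p([1, 0, 1], 3))
```

Each prime field therefore fails to be algebraically closed, and the equality I(Z(J)) = rad(J) fails for this J.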
The power of the Nullstellensatz occurs because in many applications one has access to a polynomial f which is known to be zero when other polynomials g_1, . . . , g_k are. The conclusion is then that f is in the radical of the ideal generated by the g_i’s. Therefore there is a power of f which is a linear combination of the g_i’s, and so there is an explicit equation of the form
f^m = Σ_i h_i g_i.
The Nullstellensatz permits a more detailed correspondence of properties between the algebraic set X and its ideal I(X). Typical statements when K is algebraically closed are as follows:
• There is a one-to-one correspondence between algebraic subsets of A^n and radical ideals in K[x].
• X is empty if and only if I(X) = K[x].
• X = A^n if and only if I(X) = {0}.
• X consists of a single point if and only if I(X) is a
E. The Z–I Correspondence
The two operations of taking an ideal of polynomials and using Z to get a subset of affine space, and taking a subset of affine space and using I to get an ideal of polynomials form the theoretical foundation of affine algebraic geometry. We have the following basic facts:
• If J_1 ⊂ J_2 ⊂ K[x] are ideals, then Z(J_1) ⊃ Z(J_2).
• If X_1 ⊂ X_2 ⊂ A^n, then I(X_1) ⊃ I(X_2).
• If J ⊂ K[x] is an ideal, then I(Z(J)) ⊃ J.
• If X ⊂ A^n, then Z(I(X)) ⊃ X.
This last statement can be sharpened to give a criterion for when a subset of A^n is algebraic, as follows. If X is algebraic, equal to Z(J) for some ideal J, then I(X) = I(Z(J)) ⊃ J, so that Z(I(X)) = Z(I(Z(J))) ⊂ Z(J) = X, which forces Z(I(X)) = X. Conversely if Z(I(X)) = X, then X is obviously algebraic. Hence,
to obtain a function on X. Functions on X which are restrictions of polynomial functions are called polynomial functions. The polynomial functions on X form a ring, denoted by K[X] and called the affine coordinate ring of X. For example, the affine coordinate ring of affine space is the entire polynomial ring: K[A^n] = K[x].
There is an onto ring homomorphism (restriction) from K[x] to K[X], whose kernel is the ideal I(X). Therefore K[X] is isomorphic as a ring to the quotient ring K[x]/I(X).
Irreducible algebraic sets are often referred to as algebraic varieties. If X is an algebraic variety, then I(X) is a prime ideal, so that the coordinate ring K[X] is an integral domain (if fg = 0 then either f = 0 or g = 0). The field of fractions of K[X], denoted by K(X), represents rational functions on the variety X. Rational functions are more useful than polynomial functions, but they have the drawback that any given rational function may not be defined on all of X (where the denominator vanishes). If a rational function is defined at a point p, one says that the function is regular at p.
The rational function field K(X) of an algebraic variety X is an extension field of the base field K, and as such has a transcendence degree over K: this is the largest number of algebraically independent rational functions. This transcendence degree is the dimension of the variety X. Points have dimension zero, curves have dimension one, surfaces have dimension two, and so forth.
Polynomial and rational functions are used to define maps between algebraic sets. In particular, maps between two affine spaces are simply given by a vector of functions. Maps between affine algebraic sets are given by restriction of such a vector of functions. Depending on the type of functions used, such maps are called polynomial or rational maps. Again, rational maps have the property that they may not be defined everywhere. A map is called regular if it is given by rational functions but is defined everywhere.
H. Examples
The most common example of an affine algebraic variety is an affine subspace: this is an algebraic set given by linear equations. Such a set can always be defined by an m × n matrix A, and an m-vector b, as the vanishing of the set of m equations given in matrix form by Ax = b. This gives the geometric viewpoint on linear algebra. One consequence of the geometric point of view is a deeper understanding of parallelism for linear spaces: geometrically, one begins to believe that parallel lines might be made to intersect “at infinity.” This is the germ of the development of projective geometry.
No doubt the first nonlinear algebraic sets to be studied were the conics. These are the algebraic sets given by a single quadratic equation in two variables, and they define a curve in the affine plane. The classification and study of conics date to antiquity. Most familiar is the classification of nonempty irreducible conics over the real numbers R: we have either an ellipse, a parabola, or a hyperbola. Again an appreciation of the points “at infinity” sheds light on this classification: an ellipse has no real points at infinity, a parabola has one real point at infinity, and a hyperbola has two real points at infinity.
Over any field, if an irreducible conic C [given by f(x, y) = 0] is nonempty, then it may be parametrized using a rational function of one variable. This is done by choosing a point p = (x_0, y_0) on C, writing the line L_t through p with slope t [given by y − y_0 = t(x − x_0)], and intersecting L_t with C. This intersection will consist of the point p and one other point p_t which depends on t. One may solve easily for the coordinates of p_t as rational functions of t, giving a rational parametrization for the conic C. From the point of view of maps, this gives a rational map φ : A^1 → C which has an inverse: for any point (x, y) on C, the t-value is t = (y − y_0)/(x − x_0). Invertible rational maps are called birational, and most of the classification efforts of algebraic geometry are classifications up to birational maps.
Applying this procedure to the unit circle C (defined by x^2 + y^2 = 1) and choosing the point p to be the point (−1, 0), one finds the parametrization
p_t = ( (1 − t^2)/(1 + t^2), 2t/(1 + t^2) ).
We see here the origins of algebraic number theory, in particular the formulas for the Pythagorean triples. If we look at the case when t = p/q is a rational number, we find (clearing the denominator q) that ((q^2 − p^2)/(q^2 + p^2), 2pq/(q^2 + p^2)) is a point on the unit circle. Again clearing the denominator q^2 + p^2, we see that this means
(q^2 − p^2)^2 + (2pq)^2 = (q^2 + p^2)^2,
which gives the standard parametrization of Pythagorean triples. This style of argument in elementary number theory dates to the ancient Greeks, in particular to Apollonius and Diophantus.
In higher dimensions, an algebraic variety given by a single quadratic equation is called a quadric. Although the classification of affine quadrics is not difficult, it becomes clearer that the use of projective techniques simplifies the matter considerably.
Algebraic varieties defined by either more equations or equations of degree larger than two present a much greater challenge. Even the study of cubic curves in the plane [given by a single equation f(x, y) = 0 where f
has degree three] is still a subject of modern research, especially over number fields.
I. Affine Schemes
The theory presented above was developed in the hundred years ending in the middle of the twentieth century. In the second half of the twentieth century a different foundation to algebraic geometry was developed, which more closely follows the algebra of the rings and ideals in question.
Recall that there is a correspondence between algebraic subsets of affine space and radical ideals in the polynomial ring K[x]. If the ground field K is algebraically closed, points correspond to maximal ideals, and irreducible algebraic sets to prime ideals. The ring of polynomial functions on X is naturally the ring K[x]/I(X). This ring is a finitely generated K-algebra, with no nilpotent elements (elements f ≠ 0 such that f^k = 0 for some k). From this ring one can recover X, as the set of maximal ideals.
The idea of affine schemes is to start with an arbitrary ring R and to construct a geometric object X having R as its natural ring of functions. Grothendieck’s theory, developed in the 1950s and 1960s, uses Spec(R), the set of prime ideals of R, as the set X. This gives the notion of an affine scheme.
II. PROJECTIVE GEOMETRY
A. Infinity
Consider the affine line A^1, which is simply the ground field K as a set. What should it mean to approach infinity in A^1? Consider a ratio x/y ∈ K; if y ≠ 0 this is an element of K, but as y approaches zero for fixed x, this element will “approach infinity.” However, we cannot let y equal zero in this ratio, since one cannot divide by zero in a field. It makes sense then to separate the numerator and denominator in this ratio and consider the ordered pair: let us write [y : x] for this. Since we are thinking of this ordered pair as representing a ratio, we should maintain that [y : x] = [ay : ax] for any nonzero a ∈ K. The ordered pair [1 : x] will represent the number x ∈ K; the ordered pair [0 : 1] will represent a new point, at infinity. We remove the ordered pair [0 : 0] from the discussion, since this would represent an undefined ratio.
B. Projective Space
The construction above generalizes to n-dimensional affine space A^n, as follows. Let us define projective space P^n to be the set of all ordered (n + 1)-tuples [x_0 : x_1 : . . . : x_n], with each x_i ∈ K, not all equal to zero, subject to the equivalence relation that
[x_0 : x_1 : . . . : x_n] = [λx_0 : λx_1 : . . . : λx_n]
for any nonzero λ ∈ K. The x_i’s are called the homogeneous coordinates of the point [x]; note that they are not individually well-defined, because of the scaling condition. However, it does make sense to say whether x_i = 0 or not.
If x_0 ≠ 0, we can scale by λ = 1/x_0 and assume x_0 = 1; then all the other n coordinates are well-defined and correspond to a unique point in affine n-space A^n:
p = (x_1, . . . , x_n) ∈ A^n corresponds to [1 : x_1 : . . . : x_n] ∈ P^n.
However, we have a host of new points where x_0 = 0 in P^n; these are to be thought of as points “at infinity” of A^n. In this way the points at infinity are brought into view and in fact become no different from any other point in projective space.
C. Projective Algebraic Sets
Let K[x] = K[x_0, x_1, . . . , x_n] be the polynomial ring generated by the homogeneous coordinates. We can no longer view these polynomials as functions since (because of the scaling issue) even the coordinates themselves do not have well-defined values. However, suppose that a polynomial F(x) is homogeneous; that is, every term of F has the same degree d. Then F(λx) = λ^d F(x) for any nonzero λ, and hence whether F = 0 or not at a point [x] ∈ P^n is well-defined.
We therefore define a projective algebraic set to be the set of common zeros of a collection S of homogeneous polynomials:
Z(S) = {[x] ∈ P^n | F(x) = 0 for all F ∈ S}.
Correspondingly, given a subset X ⊂ P^n, we define the homogeneous ideal of X, I(X), to be the ideal in K[x] generated by all homogeneous polynomials F which vanish at all points of X:
I(X) = {homogeneous F ∈ K[x] | F|_X = 0}.
The reader will see the possibility of developing exactly the same type of theory, complete with a Z–I correspondence, a basis theorem, and a projective version of the Nullstellensatz, all properly interpreted with the extra attention paid to the homogeneity conditions at every turn. This is in fact what happens, and projective geometry attains an algebraic foundation as solid as that of affine geometry.
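The scaling behavior of homogeneous coordinates and the identity F(λx) = λ^d F(x) can be checked numerically. The sketch below is an illustration only and is not taken from the original article: it assumes the rational numbers as ground field, a particular degree-2 homogeneous polynomial F, and a hypothetical helper normalize that picks the representative whose first nonzero coordinate is 1.

```python
from fractions import Fraction


def F(x0, x1, x2):
    """A homogeneous polynomial of degree 2, chosen for illustration."""
    return x1 * x1 + x2 * x2 - x0 * x0


def normalize(point):
    """Scale homogeneous coordinates so that the first nonzero entry is 1."""
    pivot = next(c for c in point if c != 0)  # [0 : ... : 0] is not a point
    return tuple(Fraction(c) / pivot for c in point)


d = 2
p = (Fraction(5), Fraction(3), Fraction(4))  # F(p) = 9 + 16 - 25 = 0
q = (Fraction(1), Fraction(1), Fraction(1))  # F(q) = 1, so q is not a zero

for lam in (Fraction(2), Fraction(-3), Fraction(1, 7)):
    scaled_p = tuple(lam * c for c in p)
    scaled_q = tuple(lam * c for c in q)
    # Values scale by lam**d, so they are not well-defined on P^2 ...
    assert F(*scaled_q) == lam**d * F(*q)
    # ... but whether F vanishes does not depend on the representative.
    assert F(*scaled_p) == 0
    # All rescalings describe the same projective point.
    assert normalize(scaled_p) == normalize(p)
```

Choosing the first nonzero coordinate as pivot is just one convention for a canonical representative; when x_0 ≠ 0 it agrees with the normalization x_0 = 1 used in the text.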
D. Regular and Rational Functions
In the context of projective geometry, the polynomial ring K[x] is called the homogeneous coordinate ring. It is a graded ring, graded by degree: K[x] = ⊕_{d≥0} V_d where V_d is the vector space of homogeneous polynomials in x of degree exactly d. If X ⊂ P^n is a projective algebraic set, the homogeneous ideal I(X) is also graded: I(X) = ⊕_d I_d, and the graded quotient ring K[X] = K[x]/I(X) is called the homogeneous coordinate ring of X.
As we have noted above, a polynomial (even a homogeneous one) F does not have well-defined values at points of projective space. However, a ratio of polynomials r = F/G will have a well-defined value at a point p if both F and G are homogeneous of the same degree, and G(p) ≠ 0. Such a ratio is called a rational function of degree zero, and these functions form the foundation of the function theory on projective algebraic sets. A rational function whose denominator does not vanish at p is said to be regular at p.
E. Homogenization
Let us suppose now that we have an affine algebraic set, given by the vanishing of a polynomial f(x_1, . . . , x_n). We want to investigate how this algebraic set looks at infinity, in projective space. Let d be the largest degree of terms of f. We multiply each term of f by the appropriate power of the new variable x_0 to make the term have degree exactly d. This produces a homogeneous polynomial F(x_0, x_1, . . . , x_n), which for x_0 = 1 has the same zeros as the original polynomial f. However, now for x_0 = 0 we have new zeros, at infinity, in projective space.
For a general affine algebraic set X defined by more than one equation, we do this process for every polynomial in I(X), producing a collection of homogeneous polynomials S. Then Z(S) ⊂ P^n is the projective closure X̄ of X and exactly adds the minimal set of points at infinity to X to produce a projective algebraic set.
For example, let us consider the hyperbola in the affine plane defined by f(x, y) = x^2 − y^2 − 1 = 0. We homogenize this to F(z, x, y) = x^2 − y^2 − z^2, a homogeneous polynomial of degree two. For z = 0, we recover the two points [0 : 1 : 1] and [0 : 1 : −1] at infinity (where x^2 − y^2 = 0), up to scalar factors.
F. Bezout’s Theorem
One of the consequences of adding in the points at infinity to form projective space is that algebraic sets which did not intersect in affine space now tend to intersect, at infinity, in projective space. The classic example is that of two parallel lines in the affine plane: these intersect at one point at infinity in the projective plane.
In general, one expects that each time one adds a new equation that must be satisfied, the dimension of the set of zeros goes down by one. Therefore, in projective space P^n, which has dimension n, one expects that the projective algebraic set defined by exactly n homogeneous polynomials F_1 = F_2 = · · · = F_n = 0 will have dimension zero; this means that it should be a finite set of points.
Bezout’s theorem deals with the number of such intersections. It may happen that a point of intersection is counted with multiplicity; this is the same phenomenon as when a polynomial in one variable has a double root: it counts for two when the number of roots is being enumerated. In more than one variable, there is a corresponding notion of multiplicity; this is always an integer at least one (for a common isolated root).
Theorem 3 (Bezout’s theorem). Suppose the ground field K is algebraically closed. Let F_i, i = 1, . . . , n, be homogeneous polynomials in the n + 1 homogeneous variables of projective n-space P^n. Suppose that F_i has degree d_i. Then,
(a) The common zero locus X = Z({F_1, . . . , F_n}) is nonempty.
(b) If X is finite, the cardinality of X is at most the product of the degrees d_1 d_2 · · · d_n.
(c) If each point of X is counted according to its multiplicity, the sum of the multiplicities is exactly the product of the degrees d_1 d_2 · · · d_n.
For example, three quadrics (each of degree two) in P^3 always intersect. If the intersection is finite, it is at most eight points; if the points are counted according to their multiplicity, the sum is exactly eight.
As another example, a line and a cubic in the plane will intersect three times, counted with multiplicity. This is the basis for the famous “group law” on a plane projective cubic curve X: given two points p and q on X, the line joining p and q will intersect the curve X in a third point, and this operation forms the basis for a group structure on X.
III. CURVES AND SURFACES
A. Singularities
Let X be an irreducible algebraic variety, of codimension r, in A^n. If x_i are local affine variables at a point p ∈ X, and the ideal of X is generated near p by polynomials f_1, . . . , f_k, then the Jacobian matrix J = (∂f_i/∂x_j) will have rank at most r at every point of X and will have maximal rank r at all points of X away from a subalgebraic set Sing(X). At points of X − Sing(X), the variety
is said to be nonsingular, or smooth; at points of Sing(X), the variety is said to be singular, and Sing(X) is called the singular locus of X. Over the complex numbers, the nonsingular points are those points of X where X is a complex manifold, locally analytically isomorphic to an open set in C^d, where d is the dimension of X. A common situation is when Sing(X) is empty; then the variety is said to be smooth. At smooth points of algebraic varieties, there are local “analytic” coordinates y_1, . . . , y_d equal in number to the dimension of the variety.
B. 1-Forms
A 1-form on a smooth algebraic variety is a collection of local expressions of the form Σ_i f_i(y) dy_i, where {y_i} are local coordinates on the variety and the f_i’s are rational functions; this collection of expressions must be all equal under changes of coordinates, and at every point of the variety at least one of the expressions must be valid. For a curve, where there is only one local coordinate y, a local 1-form expression has the simple form f(y) dy. The set of 1-forms on a variety forms a vector space.
C. The Genus of a Curve
Let X be a smooth projective curve. Let ω be a 1-form on X. If at every point of X there is a local expression for ω of the form f(y) dy where y is a local coordinate and f is a regular function, then we say that ω is a regular 1-form. The set of regular 1-forms on a smooth projective curve forms a finite-dimensional vector space over K, and the number of linearly independent regular 1-forms, which is the dimension of this vector space, is the most important invariant of the curve, called the genus.
If K is the complex numbers, the genus has a topological interpretation also. A smooth projective curve is a compact complex manifold of dimension one, which is therefore a compact orientable real manifold of dimension two. As such, it is topologically homeomorphic to a sphere with g handles attached; this g is the topological genus, which is equal to the genus. If g = 0, the curve is topologically a sphere; if g = 1, the curve is topologically a torus; if g ≥ 2, the curve is topologically a g-holed torus.
The simplest smooth projective curve is the projective line itself, P^1. It has genus zero.
D. Plane Curves and Plücker’s Formula
The most common projective curves studied over the centuries are the plane curves, which are defined by a single irreducible polynomial f(x, y) = 0 in affine 2-space, and then closed up with points at infinity to a projective curve defined by the homogenization F(x, y, z) = 0 in the projective plane. Plücker’s formula addresses the question of the genus of this curve in relation to the degree of the polynomial F:
Theorem 4 (Plücker’s formula). Suppose X is a smooth projective plane curve defined by an irreducible polynomial F(x, y, z) of degree d. Then the genus of X is equal to (d − 1)(d − 2)/2.
Plücker’s formula has been generalized to curves with singularities, in particular with simple singularities like nodes (locally like xy = 0 or y^2 = x^2) and cusps (locally like y^2 = x^3). Then the formula gives the genus of the desingularization of the curve: if a plane projective curve of degree d has a finite number of singularities which are all either nodes or cusps, and there are ν nodes and κ cusps, then the genus of the desingularization is
g = (d − 1)(d − 2)/2 − ν − κ.
E. Elliptic and Hyperelliptic Curves
Using Plücker’s formula, we see that smooth plane projective curves of degree one or two have genus zero. We have in fact seen above that conics may be parametrized by lines, and this is possible precisely because they have the same genus. Smooth projective plane cubic curves have genus one, and they cannot be parametrized by lines. Every smooth projective plane cubic curve over C has exactly nine inflection points, and if an inflection point is put at the projective point [0 : 1 : 0], and if the line of inflection is the line at infinity, the affine equation of the cubic curve can be brought into Weierstrass form
y^2 = x^3 + Ax + B
for complex numbers A and B, such that 4A^3 + 27B^2 ≠ 0 (this is the smoothness condition). The form of this equation shows that a cubic curve is also representable as a double cover of the x-line: for every x-value there are two y-values each giving points of the curve.
In general, there are curves of every genus representable as double covers, using similar affine equations of the form
y^2 = f_d(x),
where f_d is a polynomial with distinct roots of degree d. Such a curve is called hyperelliptic and has genus g = [(d − 1)/2]. In particular these constructions show the existence of curves of any genus g ≥ 0.
Higher-degree coverings of the projective line, such as a curve given by an equation of the form y^3 = f(x), are also important in the classification of curves. Coverings of degree three are called trigonal curves, tetragonal curves are covers of degree four, and so forth.
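The two genus formulas above are easy to tabulate. The following Python sketch is an illustration, not part of the original article; the function names are hypothetical, and the bracket in g = [(d − 1)/2] is read as the integer part (floor), which is an assumption about the article's notation.

```python
def plane_curve_genus(d, nodes=0, cusps=0):
    """Genus of the desingularization of a plane curve of degree d whose
    only singularities are the given numbers of nodes and cusps
    (Pluecker's formula; nodes = cusps = 0 gives the smooth case)."""
    if d < 1:
        raise ValueError("degree must be at least 1")
    return (d - 1) * (d - 2) // 2 - nodes - cusps


def hyperelliptic_genus(d):
    """Genus of y^2 = f_d(x), with f_d of degree d and distinct roots,
    reading [(d - 1)/2] as the integer part."""
    if d < 1:
        raise ValueError("degree must be at least 1")
    return (d - 1) // 2


# Lines and conics have genus zero; smooth cubics have genus one.
assert [plane_curve_genus(d) for d in (1, 2, 3, 4)] == [0, 0, 1, 3]
# A cubic with one node (or one cusp) has a desingularization of genus zero.
assert plane_curve_genus(3, nodes=1) == 0
assert plane_curve_genus(3, cusps=1) == 0
# The Weierstrass cubic y^2 = x^3 + Ax + B is the case d = 3.
assert hyperelliptic_genus(3) == 1
```

Since (d − 1)(d − 2) is a product of consecutive integers it is always even, so the integer division in the first function loses nothing.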
F. Rational Functions, Forms, and the Riemann-Roch Theorem
The most celebrated theorem in the theory of curves is the theorem of Riemann-Roch, which gives precise information about the rational functions on a smooth projective curve X. The space of rational functions K(X) on X forms a field of transcendence degree one over K, and as such is an infinite-dimensional vector space over K. In order to organize these functions, we concentrate our attention on the zeros (where the numerator vanishes) and the poles (where the denominator vanishes). Specifically, given a point p ∈ X and a positive integer m, we may look at all the rational functions on X with a zero at p, no zero or pole at p, or a pole at p of order at most m. We denote this space by L(mp). If m = −n is a negative integer, we define L(mp) = L(−np) to be the space of rational functions with a zero at p of order at least n.
A divisor on X is a function from the points of X to the group of integers Z, such that all but finitely many values are zero. For any divisor D, we define the vector space L(D) to be the space L(D) = ∩_{x∈X} L(D(x) · x). In plain English this is the space of rational functions with restricted poles (to the set of points with positive D-values) and prescribed zeros (as the set of points with negative D-values). Part of the Riemann-Roch theorem is that these spaces are finite-dimensional.
We can make the same construction with rational 1-forms also. Let us recall that a rational 1-form has local expression f(y) dy and use the function part f to define the zeros and poles. Given a divisor E, we may then consider the space Ω^1(E) of rational 1-forms with restricted poles and prescribed zeros as E indicates. Again, this is a finite-dimensional space of forms.
Theorem 5 (Riemann-Roch). Let X be a smooth projective curve of genus g. Let D be a divisor on X, and denote by d the sum of the values of D: d = Σ_x D(x). Then,
dim(L(D)) = d + 1 − g + dim(Ω^1(−D)).
The inequality dim(L(D)) ≥ d + 1 − g was proved by Riemann; Roch supplied the “correction term” related to 1-forms. One of the main uses of the Riemann-Roch theorem is to guarantee the existence of rational functions with prescribed zeros and poles, in order to intelligently embed the curve in projective space: given rational functions f_0, . . . , f_n on a curve X, the mapping sending x ∈ X to the point [f_0(x), f_1(x), . . . , f_n(x)] will always be well-defined on all of X and under certain circumstances will embed X as a subvariety of P^n.
Using this technique, we can show for example that (a) all curves of genus zero are isomorphic to the projective line P^1; (b) all curves of genus one are isomorphic to smooth plane cubic curves; (c) all curves of genus two are hyperelliptic, given by an equation of the form y^2 = f_6(x), where f_6 is a sextic polynomial; (d) all curves of genus three are either hyperelliptic or smooth plane quartic curves; (e) all curves of genus four are either hyperelliptic or the intersection of a quadric surface and a cubic surface in P^3; and (f) all curves of genus five are hyperelliptic, trigonal, or the intersection of three quadric threefolds in P^4. Of special interest is the so-called canonical embedding of a smooth curve, which shows that a curve of genus g either is hyperelliptic or can be embedded as a curve of degree 2g − 2 in P^{g−1}.
G. The Moduli Space M_g
Much of the research in algebraic geometry since 1960 has focused on the study of the moduli spaces for algebraic varieties. In general, a moduli space is a topological space M whose points represent particular geometric objects; the topology on M is such that points which are close in M represent geometric objects which are also “close” to each other. Of particular interest has been the moduli space M_g for smooth projective curves of genus g. For example, since every curve of genus zero is isomorphic to the projective line, M_0 consists of a single point. The moduli space M_1 classifying curves of genus one is isomorphic to the affine line A^1: the famous j-invariant of elliptic curves classifies curves of genus one by a single number. [For a plane cubic curve of genus one in Weierstrass form y^2 = x^3 + Ax + B, j = 4A^3/(4A^3 + 27B^2).] For g ≥ 2, M_g is itself an algebraic variety of dimension 3g − 3.
Of particular interest has been the construction and study of meaningful compactifications of various moduli spaces. For M_g, the most natural compactification M̄_g was constructed by Deligne and Mumford, and the additional points represent stable curves of genus g, which are curves without continuous families of automorphisms, and having only nodes as singularities. Even today the construction and elementary properties of moduli spaces for higher-dimensional algebraic varieties (e.g., surfaces) are a challenge. More recently attention has turned to moduli spaces for maps between algebraic varieties, and it is an area of very active research today to compactify and understand such spaces of maps.
H. Surfaces and Higher Dimensions
The construction and understanding of the moduli spaces M_g for smooth curves is tantamount to the successful classification of curves and their properties. The classification of higher-dimensional varieties is not anywhere near as
complete. Even surfaces, for which a fairly satisfactory classification exists due to Enriques, Kodaira, and others, present many open problems.
The Enriques classification of smooth surfaces essentially breaks up all surfaces into four categories. The first category consists of those surfaces with a family of genus zero curves on them. Since genus zero curves are all isomorphic to lines, such surfaces are known as ruled surfaces, and a detailed understanding of them is possible. The prototype is a product surface X × P^1 for a curve X. The second category consists of surfaces with a nowhere-vanishing regular 2-form, or finite quotients of such surfaces. These are the so-called abelian surfaces, K3 surfaces, Enriques surfaces, and hyperelliptic surfaces. The third category consists of surfaces with a family of genus one curves on them. Some techniques similar to those used in the study of ruled surfaces are possible, and since genus one curves are very well understood, again a rather detailed description of these surfaces, called elliptic surfaces, is available. The last category is the surfaces of general type, and most surfaces are in this category. Moduli spaces have been constructed, and many elementary invariants are known, but there is still a lot of work to do to understand general-type surfaces. An exciting current application area has to do with the connection of algebraic surfaces (which over the complex numbers are real four-dimensional objects) and the study and classification of 4-manifolds.
For varieties of dimension three or more, there are essentially no areas of complete classification. Basic techniques available for curves and surfaces begin to break down for threefolds, in particular because several fundamental constructions lead immediately to singular varieties, which are much more difficult to handle. However, since the 1980s, starting with the groundbreaking work of Mori, steady progress has been made on fundamental classification constructions.
IV. APPLICATIONS
The origins of algebraic geometry, and the development of projective geometry in the Renaissance, were driven in large part by applications (of the theory of multivariable polynomials) to a variety of problems in geography, art, number theory, and so forth. Later, in the 1800s, newer problems in differential equations, fluid flows, and the study of integrals were driving development in algebraic geometry. In the 1900s the invention of algebra as we know it today caused a rethinking in the foundations of algebraic geometry, and the energies of working researchers were somewhat siphoned into the development of new structures and techniques: these have culminated in the widespread use of general schemes, sheaves, and homological algebra, as well as slightly more general notions of algebraic spaces and stacks.
After this period of partial introspection, a vigorous return to application areas in algebraic geometry began in the 1980s. We will close this article with brief discussions of a sampling of these.
A. Enumeration
One of the fundamental questions of geometry, and of many other subjects in mathematics, is the “how many” question: in algebraic geometry, this expresses itself in counting the number of geometric objects with a given property. In contrast to pure combinatorics, often the geometric approach involves counting objects with multiplicity (e.g., roots of a polynomial).
Typical enumerative questions (and answers) ask for the number of flexes on a smooth cubic curve (9); the number of lines on a smooth cubic surface (27); the number of conics tangent to five given conics (3264); and the number of lines on a smooth quintic threefold (2875).
Recent breakthrough developments in intersection theory have enabled algebraic geometers to compute such numbers of enumerative interest which have stumped prior generations. New excitement has been brought by totally unexpected relationships with string theory in theoretical physics, where the work of Witten and others has found surprising connections between computations relating elementary particles and generating functions for enumerative problems of the above sort in algebraic geometry.
B. Computation
Computation with polynomials is a fundamentally algorithmic process which permits in many cases the design of explicit algorithms for calculating a multitude of quantities of interest to algebraic geometers. Usually these quantities either are related to enumerative questions such as those mentioned above or are the dimensions of vector spaces (typically of functions or forms) arising naturally either in elementary settings or as cohomology spaces.
With the advent of sufficient computing power as represented by modern computers and computer algebra software packages, it has become more feasible to actually perform these types of computations by means of computer. This has led to an explosion of activity in designing efficient and effective algorithms for the computations of interest. These algorithms usually rely in some way on the theory of Gröbner bases, which build on a multivariable version of the division algorithm for polynomials.
Software is now widely available to execute many calculations, and it is typically user-customizable, which has enabled researchers around the world to make
P1: ZCK Revised Pages
Encyclopedia of Physical Science and Technology EN001H-20 April 20, 2001 17:14
fundamental contributions. Software packages which are widely used include Macaulay, CoCoA, and Schubert.

C. Mechanics

Many problems in mechanical engineering and construction involve the precise understanding of the position of various machine parts when the machine is in motion. Robotic analysis is especially concerned with the position and velocity of robot extremities given various parameters for joint motion and extension. There are a few basic motions for machine joints, and these are all describable by simple polynomials [e.g., circular motion of radius $r$ occurs on the circle $x^2 + y^2 = r^2$]. It is not difficult to see therefore that in suitable coordinate systems, virtually all such problems can be formulated by polynomial systems.

However, mechanical devices with many joints can correspond to algebraic sets in affine spaces with many variables and many equations, which makes the geometric analysis rather complicated. It is therefore useful to be able to apply more sophisticated techniques of the theory to reduce the complexity of the problem, and this is where the power of algebraic geometry can come into play.

A specific example of the type of problem is the following so-called n-bar configuration. Let us consider a cyclical arrangement of $n$ stiff rods, joined at the ends by joints that can actuate only in a planar way locally. For what initial positions of this arrangement does the configuration actually move and flex? In how many different ways can it be made to flex? When $n = 3$, the configuration must lie in a plane, and no flexing or motion is possible; for $n = 4$, the flexing configurations, although known, have not been completely analyzed.

D. Coding Theory

A code is a collection of vectors in $K^n$ for some $n$, where $K$ is a field; each vector in the collection is a code word. The Hamming distance between two code words is the number of positions where the code words differ. If each code word is intended to represent some piece of irreducible information, then desirable properties of a code are as follows:

(a) Size: There should be many code words so that a large amount of information can be represented.
(b) Distinctness: The Hamming distance between any two code words should be large, so that if a code word is corrupted in some way, the original code word can be recovered by changing the corrupted code word in a small number of positions.
(c) Efficiency: The ambient dimension $n$, which is the number of positions, should be as small as possible.

These three properties tend to act against one another, and the theory of error-correcting codes is directed toward the classification and analysis of possible codes and coding schemes.

Algebraic geometry has found an application in this area, by taking the field $K$ to be a finite field, and taking the code to be certain natural spaces of functions or forms on an algebraic variety. Most successful have been attempts to use algebraic curves; this was initiated by Goppa and has been successful in producing codes with desirable properties and in aiding in the classification and uniform treatment of several families of previously known codes.

E. Automatic Theorem Proving

It is often the case, especially in elementary geometry, that geometric statements can be expressed by having some polynomial vanish. For example, three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ in the plane are collinear if and only if the determinant of the matrix

\[ \begin{pmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{pmatrix}, \]

which is the polynomial $x_2 y_3 - x_3 y_2 - x_1 y_3 + x_3 y_1 + x_1 y_2 - x_2 y_1$, is zero.

A typical theorem therefore might be viewed as a collection of hypothesis polynomials $h_1, \ldots, h_k$, and a conclusion polynomial $g$. The truth of the theorem would be equivalent to saying that wherever the hypothesis polynomials all vanish, the conclusion polynomial is also zero. This exactly says that $g \in I(Z(\{h_1, \ldots, h_k\}))$, using the language of the $Z$–$I$ correspondence of affine algebraic geometry.

If the field is algebraically closed, the Nullstellensatz says that the conclusion polynomial $g$ is in this ideal if and only if some power of $g$ is a linear combination of the $h_i$'s; that is, there is some equation of the form

\[ g^r = \sum_i f_i h_i, \]

where $f_i$ are other polynomials. A “proof” of the theorem would be given by an explicit collection of polynomials $f_i$, which exhibited the equation above for some $r$, and this can be easily checked by a computer. Indeed, algorithms exist which check for the existence of such an expression for $g$ and are therefore effective in determining the truth of a given proposed theorem.

F. Interpolation

A general problem in approximation theory is to construct easy-to-evaluate functions with specified behavior.
Typically this behavior involves the values of the function at prescribed loci, the values of the derivatives of the functions, or both. Polynomial interpolation is the method whereby polynomial functions are used in this manner, and algebraic geometry has found applications and new problems of interest in this field in recent decades.

Lagrange interpolation involves finding polynomial functions with specified values $c_k$ at a specified set of points $p_k \in \mathbb{A}^n$. In one variable, there is a relatively easy formula for writing down the desired polynomial, and this is taught in most first-year calculus courses. In higher dimensions, no such formula exists. However, the problem is a linear problem, and so it is a straightforward application of linear algebra techniques.

Hermite interpolation involves finding polynomial functions with specified derivative values. This is a significantly more complicated generalization, and open questions exist even for polynomials in two variables.

Spline interpolation involves finding piecewise polynomial functions, which stitch together to have a certain degree of global smoothness, but which also have specified behavior at given points. Cubic splines are the most popular in one variable, and with the advent of computer graphics, two- and three-variable splines are becoming more widely known and used in elementary applications.

In all these settings, techniques of algebraic geometry are brought to bear in order to determine the dimension of the space of suitable interpolating functions. Usually it is a simple matter to find a lower bound for such dimensions, and the more difficult problem is to find sharp upper bounds.

G. Graphics

The origins of projective geometry lie in the early Renaissance, when artists and architects began to be interested in accurately portraying objects in perspective. The correct treatment of horizon lines and vanishing points in artwork of the period led directly to a new appreciation of points at infinity and eventually to a mathematically sound theory of projective geometry.

With the recent explosion of capabilities in computer speed, resolution of displays, and amount and interest of visualizing data, an appreciation of projective geometry, homogeneous coordinates, and projective transformations has lately become standard fare for computer graphics specialists and novices alike.

Although initially the application of the ideas of projective geometry was rather elementary, more sophisticated techniques are now being brought to bear, especially those involving problems in computer vision and pattern recognition. In particular it now seems feasible to use subtle projective invariants to discriminate between scene objects of either different types at the same perspective or the same type at different perspectives.

SEE ALSO THE FOLLOWING ARTICLES

ABSTRACT ALGEBRA • BOOLEAN ALGEBRA • CALCULUS

BIBLIOGRAPHY

Beauville, A. (1996). “Complex Algebraic Surfaces,” 2nd ed., Cambridge Univ. Press, Cambridge, UK.
Cox, D., Little, J., and O’Shea, D. (1997a). “Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra,” 2nd ed., Springer-Verlag, New York.
Cox, D., Little, J., and O’Shea, D. (1997b). “Using Algebraic Geometry,” Graduate Texts in Mathematics, Vol. 185, Springer-Verlag, New York.
Eisenbud, D. (1996). “Commutative Algebra with a View toward Algebraic Geometry,” Graduate Texts in Mathematics, Vol. 150, Springer-Verlag, New York.
Fulton, W. (1997). “Intersection Theory,” 2nd ed., Springer-Verlag, New York.
Griffiths, P., and Harris, J. (1978). “Principles of Algebraic Geometry,” Wiley (Interscience), New York.
Harris, J. (1992). “Algebraic Geometry: A First Course,” Springer-Verlag, New York.
Harris, J., and Morrison, I. (1998). “Moduli of Curves,” Springer-Verlag, New York.
Hartshorne, R. (1979). “Algebraic Geometry,” Springer-Verlag, New York.
Kirwan, F. (1992). “Complex Algebraic Curves,” Cambridge Univ. Press, Cambridge, UK.
Miranda, R. (1995). “Algebraic Curves and Riemann Surfaces,” Am. Math. Soc., Providence.
Reid, M. (1988). “Undergraduate Algebraic Geometry,” Cambridge Univ. Press, Cambridge, UK.
Shafarevich, I. (1994). “Basic Algebraic Geometry,” 2 vols., Springer-Verlag, New York.
Sturmfels, B. (1996). “Gröbner Bases and Convex Polytopes,” Am. Math. Soc., Providence.
Ueno, K. (1999). “Algebraic Geometry I: From Algebraic Varieties to Schemes,” Am. Math. Soc., Providence.
P1: GAE Revised Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN001C-26 May 26, 2001 14:31
I. Background
II. Best Approximation
III. Bivariate Spline Functions
IV. Compact Operators and M Ideals
V. Constrained Approximations
VI. Factorization of Biinfinite Matrices
VII. Interpolation
VIII. Multivariate Polyhedral Splines
IX. Quadratures
X. Smoothing by Spline Functions
XI. Wavelets
GLOSSARY

Center Let $A$ be a bounded set in a normed linear space $X$. An element $x_0$ in $X$ is called a center (or Chebyshev center) of $A$ if $\sup\{\|x_0 - y\| : y \in A\} = \inf_{x \in X} \sup\{\|x - y\| : y \in A\}$.

L Summand A closed subspace $S$ of a Banach space $X$ is said to be an L summand (or L ideal) if there exists a closed subspace $T$ of $X$ such that $X = S \oplus T$ and $\|x + y\| = \|x\| + \|y\|$ for all $x \in S$ and $y \in T$.

M Ideal A closed subspace $S$ of a Banach space $X$ is called an M ideal in $X$ if its annihilator $S^{\perp}$ is an L summand of the dual space $X^*$.

Modulus of smoothness Let $f$ be in $L_p[0, 1]$, $1 \le p \le \infty$, and let $\Delta_t^r f$ denote the $r$th forward difference of $f$ with step size $t$, where $rt \le 1$. The $L_p$ $r$th modulus of smoothness of $f$ is defined by $\omega_r(f, h)_p = \sup\{\|\Delta_t^r f\|_p : 0 < t < h\}$, where the $L_p$ norm is taken over all of $[0, 1]$ for the periodic case and over $[0, 1 - rt]$ for the nonperiodic setting. Also, denote $\omega(f, h)_p = \omega_1(f, h)_p$.

Natural spline Let $a = t_0 < \cdots < t_{n+1} = b$. A function $s$ in $C^{2k-2}[a, b]$ is called a natural (polynomial) spline of order $2k$ and with knots at $t_1, \ldots, t_n$, if the restriction of $s$ to each $[t_i, t_{i+1}]$ is a polynomial of degree $2k - 1$ for $i = 1, \ldots, n - 1$, and a polynomial of degree $k - 1$ on each of the intervals $[a, t_1]$ and $[t_n, b]$.

Normalized B spline Let $t_i \le \cdots \le t_{i+k}$ with $t_i < t_{i+k}$. The $i$th normalized B spline of order $k$ and with knots at $t_i, \ldots, t_{i+k}$ is defined by $N_{i,k}(x) = (t_{i+k} - t_i)[t_i, \ldots, t_{i+k}](\cdot - x)_+^{k-1}$, where the $k$th order divided difference of the truncated $(k - 1)$st power is taken at $t_i, \ldots, t_{i+k}$.

n Widths Let $A$ be a subset of a normed linear space $X$. The Kolmogorov n width of $A$ in $X$ is the quantity $d_n(A; X) = \inf_{X_n} \sup_{x \in A} \inf_{y \in X_n} \|x - y\|$, where $X_n$ is any subspace of $X$ with dimension at most $n$. The linear
P1: GAE Revised Pages
Encyclopedia of Physical Science and Technology EN001C-26 May 7, 2001 13:59
yielding good approximations efficiently. This area of approximation theory includes the subjects of interpolation, least-squares methods, and approximate quadratures.

It must be noted that the separation of the field of approximation theory into five areas should not be taken very strictly since these areas obviously have to overlap. For instance, the elementary and yet important method of least-squares is an approximation scheme with efficient computational algorithms and yields an optimal approximation. In addition, many mathematicians and practitioners may give a broader or even different definition of approximation theory. Hence, only selected topics of this field are treated in this article and the selection of topics and statements of results has to be subjective. In addition, many interesting areas, such as approximation theory in the complex domain, trigonometric approximation and interpolation, numerical approximation and algorithms, inequalities, orthogonal polynomials, and asymptotic approximation, are not included.

II. BEST APPROXIMATION

Approximation theory is built on the theory of best approximation. Suppose that we are given a sequence of subsets $P_1 \subset P_2 \subset \cdots$ in a normed linear space $X$ of functions with norm $\|\cdot\|$. For each function $f$ in $X$, consider the distance

\[ E_n(f) = \inf\{\|f - g\| : g \in P_n\} \]

of $f$ from $P_n$. If there exists a function $p_n$ in $P_n$ such that $\|f - p_n\| = E_n(f)$, we say that $p_n$ is a best approximant of $f$ in $X$ from $P_n$. Existence of best approximants is the first basic question in the theory of best approximation. Compactness of $P_n$, for instance, guarantees their existence. The second basic question is uniqueness, and another basic problem is to characterize them. Characterization is fundamental to developing algorithms for the construction of best approximants.

Suppose that the function $f$ that we wish to approximate lies in some subset $G$ of $X$. Let $B$ be the unit ball of all $f$ in $G$ (with $\|f\| \le 1$). Then the quantity

\[ E_n(B) = \sup\{E_n(f) : f \in B\} \]

is called the order of approximation of functions in $G$ from $P_n$. Knowing if, and if so how fast, $E_n(B)$ tends to zero is also one of the most basic problems in best approximation.

A. Polynomial and Rational Approximation

Consider the Banach space $C[0, 1]$ with the uniform (or supremum) norm. Let $\pi_n$ be the space of all polynomials of degree at most $n$. Although $\pi_n$ is not compact, a compactness argument still applies to yield the existence of a best approximant to each $f$ in $C[0, 1]$ from $\pi_n$. It is also well known that best approximation from $\pi_n$ has a unique solution. This can be proved by using the following characterization theorem, which is better known as the alternation theorem.

Theorem 1. $p_n$ is a best approximant of $f$ in $C[0, 1]$ from $\pi_n$ if and only if there exist points

\[ 0 \le x_1 < \cdots < x_{n+2} \le 1 \]

such that $(f - p_n)(x_i) = a(-1)^i \|f - p_n\|$ for $i = 1, \ldots, n + 2$, where $a = 1$ or $-1$.

More can be said about the uniqueness of best approximants as stated in the following so-called strong unicity theorem.

Theorem 2. Let $f$ be in $C[0, 1]$ and $p_n$ its best approximant from $\pi_n$. Then there exists a constant $C_f$ such that

\[ \|f - g\| \ge E_n(f) + C_f \|p_n - g\| \]

for all $g$ in $\pi_n$.

There are results concerning the behavior of the constant $C_f$ in the literature. All the above results on approximation by algebraic polynomials are also valid for any Chebyshev system.

Results on the orders (or degrees) of approximation by algebraic or trigonometric polynomials are called Jackson’s theorems. Favard also gave sharp constants for best trigonometric polynomial approximation.

Although rational functions do not form a linear space, analogous results on existence, uniqueness, and characterization still hold. Let $R_{m,n}$ be the collection of all functions $p/q$, where $p \in \pi_m$ and $q \in \pi_n$, such that $p$ and $q$ are relatively prime and $q(x) > 0$ for all $x$ on $[0, 1]$. The last condition is not restrictive since, for approximation purposes, $p/q$ must be finite on $[0, 1]$. A more careful compactness argument shows that every function in $C[0, 1]$ has at least one best rational approximant from $R_{m,n}$. The following alternation theorem characterizes best rational approximants. We will use the notation $d(p)$ to denote the degree of a polynomial $p$.

Theorem 3. $p_m/q_n$ is a best approximant of $f$ from $R_{m,n}$ if and only if there exist points

\[ 0 \le x_1 < \cdots < x_r \le 1 \]

where $r = 2 + \max\{m + d(q_n), n + d(p_m)\}$ such that $(f - p_m/q_n)(x_i) = a(-1)^i \|f - p_m/q_n\|$ for $i = 1, \ldots, r$, where $a = 1$ or $-1$.
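The alternation characterization in Theorem 1 can be illustrated numerically. The following is only a minimal sketch, not part of the original treatment: it assumes the example $f(x) = x^2$ on $[0, 1]$ with $n = 1$ and the candidate $p(x) = x - 1/8$, and the helper names `error`, `uniform_norm`, and `alternation_signs` are hypothetical names chosen for this illustration. The sketch checks that the error attains its maximum modulus with alternating signs at the $n + 2 = 3$ points $0, 1/2, 1$, which is the condition Theorem 1 requires of a best approximant from $\pi_1$.

```python
# Illustrative check of the alternation condition of Theorem 1.
# Assumed example (not from the article): f(x) = x^2 on [0, 1], n = 1,
# candidate linear approximant p(x) = x - 1/8.

def f(x):
    return x * x

def p(x):
    return x - 0.125

def error(x):
    """Pointwise error (f - p)(x)."""
    return f(x) - p(x)

def uniform_norm(g, samples=10001):
    """Approximate the sup norm of g on [0, 1] using a uniform grid.

    The grid contains 0, 1/2 and 1, where the error of this example peaks.
    """
    return max(abs(g(i / (samples - 1))) for i in range(samples))

def alternation_signs(points):
    """Signs of the error at the given points."""
    return [1 if error(x) > 0 else -1 for x in points]

points = [0.0, 0.5, 1.0]             # n + 2 = 3 candidate alternation points
norm = uniform_norm(error)           # grid estimate of ||f - p||
values = [error(x) for x in points]  # error at the alternation points
signs = alternation_signs(points)    # should alternate: +, -, +

print("norm estimate:", norm)
print("error values:", values)
print("signs:", signs)
```

In this example the error has equal modulus at the three points and changes sign between consecutive points, so the conditions of Theorem 1 hold with $a = -1$. The grid-based norm is only an estimate of the supremum in general; here the extrema happen to lie on the grid.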
Again uniqueness of best rational approximants can be shown using the alternation theorem, and in fact an analogous strong unicity result can be obtained. Unfortunately, although rational functions form a larger class than polynomials, they do not improve the orders of approximation in general. For instance, the approximation order of the class of all lip $\alpha$ functions by $R_{nn}$ is $O(1/n^{\alpha})$, which is the same as the order of approximation from $\pi_n$. However, while best approximation from $\pi_n$ is saturated [e.g., $E_n(f) = O(1/n^{\alpha})$ for $0 < \alpha < 1$ if and only if $f \in$ lip $\alpha$, and $E_n(f) = O(1/n)$ if and only if $f$ is in the Zygmund class], this is certainly not the case in rational approximation. A typical example is that the order of best approximation of the important function $|x - \tfrac{1}{2}|$ from $R_{nn}$ is $O(e^{-\sqrt{n}})$, which is much better than $O(1/n)$.

B. Best Local Approximation

Because of the alternation properties of best polynomial and rational approximants, these approximants are believed to behave like Taylor polynomials and Padé approximants, respectively, when the interval of best approximation is small. This leads to the study of best local approximation.

Suppose we are given a set of discrete data $\{a_{jk}\}$, which are the values of some (unknown) function $f$ and its derivatives at the sample points $x_1, \ldots, x_n$, say

\[ a_{jk} = f^{(j)}(x_k), \quad k = 1, \ldots, n \]

The problem is to give a “best” approximation of $f$ using the available discrete data from some class $P$ of functions

as $\delta \to 0$. If this n tuple is balanced, then their sum is called a balanced integer. There is an algorithm to generate all balanced integers.

Theorem 4. Let $(i_1, \ldots, i_n)$ be balanced, $N$ its sum, and $P_N$ a fully interpolating set of functions in $L_p[V(\delta)]$, in the sense that, for any discrete data $\{a_{jk}\}$, there exists a unique $g$ in $P_N$ such that

\[ g^{(j)}(x_k) = a_{jk} \]

for $j = 0, \ldots, i_k - 1$ and $k = 1, \ldots, n$. Then for any $f$ that is $i_k$-times differentiable around $x_k$, $k = 1, \ldots, n$, the best local $L_p$ approximant of $f$ from $P_N$ exists and is the unique function $g_0$ that satisfies the Hermite interpolation condition

\[ (f - g_0)^{(j)}(x_k) = 0 \]

for $j = 0, \ldots, i_k - 1$ and $k = 1, \ldots, n$. Conversely, to any n tuple of positive integers $(c_1, \ldots, c_n)$ with sum $M$, there is some scaling $\varepsilon_k(\delta) > 0$ such that this n tuple is balanced with respect to $\{\varepsilon_k(\delta)\}$, $k = 1, \ldots, n$; consequently, any Hermite interpolation is a best $L_p$ local approximation scheme.

The general case where $(i_1, \ldots, i_n)$ is not balanced is much more delicate, and the best result in this direction in the literature is stated as follows:

Theorem 5. Suppose $(i_1, \ldots, i_n)$ is not balanced and $N$ is its sum. Assume that each $A_k$ is either an interval for each $\delta$ or is independent of $\delta$. Also assume that
sufficiently smooth function $f$ from $R_{mn}$ is the $[m, n]$ Padé approximant of $f$. We remark, however, that the convergence of the net of best uniform approximants on small intervals to the Padé approximant is in general not uniform, but only in measure.

C. Padé Approximants

As just mentioned, Padé approximants are best local approximants and include Taylor polynomials, which are just $[m, 0]$ approximants. What is interesting is that Padé approximants to a formal power series are defined algebraically, so that they are certainly well defined even if the series has zero radius of convergence. In this case, taking suitable limits of the approximants on the Padé table can be considered a summability scheme of the divergent series. The most interesting example is, perhaps, the Stieltjes series,

\[ a_0 + a_1 z + a_2 z^2 + \cdots \]

de Ballore. The following inverse result has been obtained. Let $q_{mn}$ denote the denominator of the $[m, n]$ Padé approximant of $f$. Since $n$ is fixed, we use any norm that is equivalent to a norm of the $(n + 1)$-dimensional space $\pi_n$.

Theorem 6. Suppose that there exist nonzero complex numbers $z_j$ and positive integers $p_j$ with $p_1 + \cdots + p_k = n$, such that

\[ \limsup_{m \to \infty} \Bigl\| q_{mn}(z) - \prod_{j=1}^{k} (z - z_j)^{p_j} \Bigr\|^{1/m} = r < 1 \]

Then $f$ is analytic in

\[ |z| < r^{-1} \max\{|z_j| : j = 1, \ldots, k\} \]

with an exception of the poles at $z_1, \ldots, z_k$ with multiplicities $p_1, \ldots, p_k$, respectively.

Without the geometric convergence assumed in Theorem 6, the following inverse result still holds.
shall discuss best approximations by incomplete polynomials in $C[0, 1]$. Let $H$ be a finite set of nonnegative integers, and consider the best approximation problem

\[ E(f, H) = \inf_{c_i} \Bigl\| f - \sum_{i \in H} c_i x^i \Bigr\| \]

Note that $E(f, H) = E_n(f)$ if $H = \{0, \ldots, n\}$. We also denote

\[ E_n^k(f) = E(f, \{0, \ldots, k - 1, k + 1, \ldots, n\}) \]

for any $0 \le k \le n$. The following result has been obtained:

Theorem 9. Let $0 < c < 1$. There exists a positive constant $M_k$ independent of $n$ such that, if $f$ is in $C^k[0, 1]$ and $f^{(k)}(c) \ne 0$, then

\[ E_n^k(f) \ge M_k n^{-k} [\,|f^{(k)}(c)| + o(1)\,] \]

as $n \to \infty$.

Hence, since $E_n(f)$ is of order

\[ O\bigl(n^{-k} E_n(f^{(k)})\bigr) = o(n^{-k}) \]

we have $E_n^k(f)/E_n(f) \to \infty$ for each $f$ in $C^k[0, 1]$ whose $k$th derivative does not vanish identically in $[0, 1]$. So even one term makes some difference in studying the order of approximation.

If only finitely many terms are allowed, the natural question is which are the optimal ones. Let $0 < n < N$ and $H_n = \{t_1, \ldots, t_n\}$ be a set of integers with $0 \le t_1 < \cdots < t_n < N$. The problem is to find a set $H_n$, which we call an optimal set, denoted by $H_n^*$, such that

\[ E(f, H_n^*) = \inf\{E(f, H_n) : H_n\} \]

An analogous problem for trigonometric approximation has interesting applications to antenna design where certain frequencies are to be received by using a specified number of antennas. Unfortunately, the general problem is unsolved. We have the following result:

Theorem 10. For $f(x) = x^M$, where $M \ge N$, $H_n^* = \{N - n, \ldots, N - 1\}$ and is unique.

This result can be generalized to a Descartes system and even holds for any Banach function space with monotone norm. In a different direction, the analogous problem of approximating a given incomplete polynomial by a monomial has also been studied.

Next, consider $H_n = \{0, t_1, \ldots, t_n\}$, where $0 < t_1 < t_2 < \cdots$ with

\[ \sum_{i=1}^{\infty} \frac{1}{t_i} = \infty \]

and set $E_n(f) = E(f, H_n)$. The so-called Müntz–Szasz’s theorem guarantees that $E_n(f) \to 0$ for every $f$ in $C[0, 1]$. Hence, the order of approximation is of interest. Results in this area are called Müntz–Jackson theorems.

We end our discussion of incomplete polynomials by considering the density of the polynomials

\[ p_{n,a}(x) = \sum_{k=m(a)}^{n} c_k x^k, \quad m(a) \ge na \]

where $0 < a < 1$.

Theorem 11. If $p_{n,a}$ is uniformly bounded, then

\[ \lim_{n \to \infty} p_{n,a}(x) = 0, \quad 0 \le x < a^2 \]

and the convergence is uniform on every interval $[0, r]$, where $r < a^2$.

Thus, the best we can do is to consider approximation on $[a^2, 1]$:

Theorem 12. Let $f \in C[a^2, 1]$. Then there exists a sequence of polynomials $p_{n,a}$ such that

\[ \lim_{n \to \infty} p_{n,a}(x) = f(x) \]

uniformly on $[r, 1]$ for any $r > a^2$.

E. Chebyshev Center

Suppose that, instead of approximating a single function $f$ in a normed linear space $X$ from a subset $P$ of $X$, we are interested in approximating a bounded set $A$ of functions in $X$ simultaneously from $P$. Then the order of approximation is the quantity

\[ r_P(A) = \inf\Bigl\{ \sup_{f \in A} \|f - g\| : g \in P \Bigr\} \]

Of course, if $P = P_n$ and $A$ is a singleton $\{f\}$, then $r_P(A)$ reduces to $E_n(f)$, introduced at the beginning of this section. In general, $r_P(A)$ is called the Chebyshev radius of $A$ with respect to $P$. A best simultaneous approximant $x_0$ of $A$ from $P$, defined by

\[ \sup_{f \in A} \|f - x_0\| = r_P(A) \]

is called a Chebyshev center (or simply center) of $A$ with respect to $P$. We denote the (possibly empty) set of such $x_0$ by $c_P(A)$. In particular, we usually drop the indices if $P = X$; that is,

\[ r(A) = r_X(A) \quad \text{and} \quad c(A) = c_X(A) \]

which are simply called the Chebyshev radius and the set of centers of $A$, respectively.

Most of the literature in this area is concerned with the existence problem; namely, $c_P(A)$ is nonempty. If $c_P(A) \ne \emptyset$ for all bounded nonempty sets $A$ in $X$, we
F. Optimal Estimation and Recovery

Suppose that a physical object $u$ is assumed to be adequately modeled by an unknown element $x_u$ of some normed linear space $X$, and observations or measurements are taken of $u$ that give a limited amount of information about $x_u$. The data accumulated, however, are inadequate to determine $x_u$ completely. The optimal estimation problem is to approximate $x_u$ from the given data in some optimal sense. Let us assume that the data describe some subset $A$ of $X$. Then the maximum error in estimating $x_u$ from $A$ is

\[ \sup_{f \in A} \|f - x_u\| \]

In order for this quantity to be finite it is necessary and sufficient that $A$ is a bounded subset of $X$, and we shall make that assumption. To optimize our estimate with only the additional knowledge that $x_u$ must lie in some subset $P$ of $X$ we minimize the error, and this leads to the Chebyshev radius

There has been some progress in optimal recovery of nonlinear operators, and results on the Hardy spaces have also been obtained. It is not surprising that (finite) Blaschke products play an essential role here in the $H^p$ problems.

G. n Widths

Let $A$ be a set in a normed linear space $X$ that we wish to approximate from an $n$-dimensional subspace $X_n$ of $X$. Then the maximum error (or order of approximation if $A$ is the unit ball) is the quantity

\[ E(A, X_n) = \sup_{x \in A} \inf\{\|x - y\| : y \in X_n\} \]

It was Kolmogorov who proposed finding an optimal $n$-dimensional subspace $\bar{X}_n$ of $X$. By this, we mean

\[ E(A, \bar{X}_n) = \inf_{X_n} E(A, X_n) \]

where the infimum is taken over all subspaces $X_n$ of $X$ with dimension at most $n$. This optimal quantity is called the Kolmogorov n width of $A$ and is denoted by
dn (A) = dn (A; X ). Knowing the value of dn (A) is impor- Theorem 16. Let k ≥ 2. Then
tant in identifying an optimal (or extremal) approximating −k,
n 1≤r ≤ p ≤∞
subspace X̄ n of X .
k or 2 ≤ p ≤ r ≤ ∞
Replacing the distance of x ∈ A from X n in dn B p ; L r ≈
−k+1/
Kolmogorov’s definition by the distance of x from n p−1/2
, 1≤ p ≤2≤r ≤∞
−k+1/ p−1/r
its image under a continuous linear operator Pn of rank n , 1≤ p ≤r ≤2
at most n of X into itself yields the notion of the linear n −k,
width of A in X , defined by
n 1≤r ≤ p ≤∞
or 1 ≤ p ≤ r ≤ 2
δn (A) = δn (A; X ) = inf sup x − Pn x d n B kp ; L r ≈ −k+1/2−1/r
n , 1 ≤ p ≤2≤r ≤∞
Pn x∈A
−k+1/ p−1/r
n , 2≤ p ≤r ≤∞
It is clear from the two definitions that δn (A) ≥ dn (A).
A dual to the Kolmogorov n width is the Gel’fand n and
−k,
width defined by n 1≤r ≤ p ≤∞
−k+1/ p−1/r
n 1≤ p ≤r ≤2
d n (A) = d n (A; X ) = inf sup x
or 2 ≤ p ≤ r ≤ ∞
L n x∈A∩L n
δn B kp ; L r ≈ n −k+1/ p−1/2 , 1≤ p ≤2≤r ≤∞
where L n is a closed subspace of X with codimension at
most n. It has also been shown that d n (A) is also bounded
and q ≥ r
above by the linear n width δn (A). To provide a lower
−k+1/2−1/r
, 1≤ p ≤2≤r ≤∞
n
bound for both dn (A) and d n (A), the Bernstein n width and q ≤ r
defined by
For the discrete case R m , let B p denote the unit ball us-
bn (A) = bn (A; X ) = sup sup{t : t B(X n+1 ) ⊆ A} ing the l p (R m ) norm, and set X = l∞ (R m ). The following
X n+1 estimates have been obtained:
could be used. Here, X n+1 is a subspace of X with di-
Theorem 17. There exist constants C and Ca inde-
mension at least n + 1, and B(X n+1 ) is the unit ball in
pendent of m and n such that
X n+1 .
An important setting is that A is the image of the unit dn (B1 ; X ) ≤ Cm1/2 n (1)
ball in a normed linear space Y under some T ∈ L(Y, X ),
the class of all continuous linear operators from Y to X . dn (B1 ; X ) ≤ 2[(ln m)/n]1/2 (2)
Although the set A is not TY, the commonly used nota- −1/2
dn (B1 ; X ) ≤ Ca n if n<m<n a
(3)
tions in this case are dn (T Y ; X ), δn (T Y ; X ), d n (T Y ; X ), 1/2
and bn (T Y ; X ). Let C(Y, X ) be the class of all compact n
dn (B1 ; X ) ≤ C 1 + ln n (4)
operators in L(Y, X ). Then the duality between dn and d n m
can be seen in the following theorem:
ln m
dn (B1 ; X ) ≤ 8 n 1/2 (5)
∗ ∗ ∗
ln n
Theorem 15. Let T ∈ L(Y, X ) and T ∈ L(X , Y )
be its adjoint. Suppose that T ∈ C(Y, X ) or that X is a (m − 1)n −1/2
dn (B1 ; X ) ≥ 1 + (6)
reflexive Banach space. Then m−n
dn (T Y ; X ) = d n (T ∗ X ∗ ; Y ∗ ) m 3/2
dn (B2 ; X ) ≤ C 1 + ln n 1/2 (7)
n
and
δn (T Y ; x) = δn (T ∗ X ∗ ; Y ∗ )
III. BIVARIATE SPLINE FUNCTIONS
Let us now consider some important examples. First,
let A be the unit ball B kp of functions in the Sobolev space Spline functions in one variable have proved to be very
H pk = H pk [0, 1] with f (k) p ≤ 1, where 1 ≤ p ≤ ∞. Also, rich in theory and extremely useful in applications. How-
let q be the conjugate of p. The following result describes ever, not much was done in the multivariable setting until
the asymptotic estimates of dn , d n and δn . We shall use the 1980s, and at this writing, many basic questions con-
the notation dn ≈ n s to mean C1 n s ≤ dn ≤ C2 n s for some cerning dimensions, bases, minimum supported elements,
positive constants C1 and C2 independent of n. interpolation, approximation order, shape-prescribed or
shape-preserving approximation schemes, computational rays through Ai are represented by irreducible polynomi-
algorithms, and so on are still unanswered. In fact, many als li,1 , . . . , li,ni with degrees f i,1 , . . . , f i,ni , respectively.
of the very fundamental problems do not seem to have Then we have the following result on the dimension of the
satisfactory solutions. This is an area in approximation space Sdk of all C k bivariate splines on D with degree d
theory that requires much research effort. In this section and grid partition .
we discuss only some selected basic results in the two-
variable setting, although most of the results here could Theorem 18.
be obtained in higher dimensions. A different aspect of m
d +2 d − (k + 1)d j + 2
the multivariable theory is discussed in Section VIII. dim Sdk = +
2 2 +
Since spline functions in one variable (or univariate j=1
splines) are piecewise polynomials separated by points,
n
$s \in S_d^k$. Let $m(d) = \min\{2(d - k), d + 1\}$. It is known that, if $k \le (2d - 2)/3$, then $m(d) - 2 \le m \le m(d)$. For instance, for $S_3^1$, $m(d) = 4$ and $m$ is known to be 3. It is also known that if $k > (2d - 2)/3$, then $m = 0$. These results were proved using box splines, which are discussed in Section VIII.

While the exact value of $m$ is still unknown even for the three-directional mesh that we discuss here, the controlled approximation order with respect to box splines has been determined. Let $\{B^1, \ldots, B^N\}$ be the collection of all box splines in $S_d^k$. Then the controlled approximation order of $S_h$ with respect to box splines is the largest integer $n$ such that

$$\Bigl\| f - \sum_{i=1}^{N} \sum_{j \in Z^2} w_i^h(j)\, B^i\Bigl(\frac{\cdot}{h} - j\Bigr) \Bigr\|_\infty \le C h^n \|D^n f\|_\infty$$

for some sequence $w_i^h(j)$ satisfying

$$\|w_i^h(\cdot)\|_{l_\infty} \le C \|f\|_\infty, \quad i = 1, \ldots, N$$

where $C$ is an absolute constant, and $f$ is a $C^\infty$ function. In Section VIII, it will be seen that $\infty$ can be replaced by any $p$, $1 \le p \le \infty$, even when the controlled condition on the weights is replaced by a "local" one.

Theorem 19. Let $S_h$ be the scaled three-directional mesh of $S_d^k$. Then the controlled approximation order of $S_h$ with respect to all box splines of $S_d^k$ is

$$\begin{aligned}
&2d - 2k && \text{for } 2d - 3k = 2 && (1)\\
&2d - 2k - 1 && \text{for } 2d - 3k = 3 \text{ or } 4 && (2)\\
&d + 1 && \text{for } k = 0 && (3)\\
&\min\{2d - 2k - 2, d\} && \text{for } k \ge 1 \text{ and } 2d - 3k \ge 5 && (4)
\end{aligned}$$

However, the controlled approximation order of $S_h$ with respect to all minimum-supported splines is still unknown.

When the rectangular partitions are nonuniform, the unidiagonal and crisscross triangulations are no longer three- and four-directional meshes. The dimensions of these spaces are still unknown except for the cases when $d - k$ is sufficiently large or when $k = 1, 2$. It is also known that, while the support of the $S_2^1$ minimum-supported bivariate splines on the crisscross triangulation is independent of the uniformity of the rectangular partition, the corresponding statement for $S_3^1$ on the unidiagonal triangulation is false. In fact, even the support of the box splines, which are larger in this case, is not preserved under nonuniform perturbation of the rectangular partition, and if one minimum-supported spline in the perturbed case here is used, the totality of all its "translates" does not necessarily produce constants. There is still no general result in the nonuniform setting.

When a triangular grid partition is considered, it is usually more convenient to use Bézier (or Bernstein) representations of the polynomial pieces. Smoothing conditions on the adjacent polynomial pieces are expressed in terms of certain relations on the Bézier coefficients. Many interesting formulas have been recently obtained. These formulas have applications to constructing Hermite interpolants and quasi interpolants to scattered data and also have nice applications to computer-aided geometric designs.

IV. COMPACT OPERATORS AND M IDEALS

Since the mid-1970s, a branch of approximation theory called operator approximation has come into vogue. This area is concerned with the approximation of an operator from a family of operators with some nice structures, namely, positive operators, self-adjoint operators, normal operators, compact operators, or operators with more than one of these properties. Since the majority of the research papers in this area have been concerned with compact operator approximation, we limit ourselves to the discussion of this topic. Let $B(X)$ denote the class of bounded linear operators on a Banach space $X$, and $C(X)$ the subcollection of compact operators on $X$. The general question of interest is the existence of a best approximant to a given operator $T$ in $B(X)$ from $C(X)$. If every $T$ in $B(X)$ has at least one best approximant from $C(X)$, we say that $C(X)$ is proximinal in $B(X)$. It is well known that, if $X$ is a Hilbert space, then $C(X)$ is proximinal in $B(X)$. The following result on Banach spaces is worth stating:

Theorem 20. Let $1 < p < \infty$. Then $C(l_p)$ is proximinal in $B(l_p)$. However, $C(X)$ is not proximinal in $B(X)$ for $X = C[0, 1]$ or $X = L_p[0, 1]$, $1 < p < \infty$, $p \ne 2$.

Nevertheless, certain bounded linear operators in $B(L_p)$ may still have best compact approximants. Let $B_1(L_p)$ be the collection of $T$ in $B(L_p)$ such that $\|T x_n\|_p \to 0$ for every uniformly bounded weakly null sequence $\{x_n\}$ in $L_p$. Also, let $B_2(L_p)$ be the integral operators in $B_1(L_p)$. The following result holds:

Theorem 21. $C(L_p)$ is proximinal in $B_1(L_p)$ for $2 < p < \infty$ and is proximinal in $B_2(L_p)$ for $1 < p < \infty$.

If $H$ is a Hilbert space, every Hankel operator in $B(H)$ has a best compact Hankel approximant. Using some function theoretic arguments, the following result can then be obtained. Here, $H^\infty$ will denote the Hardy space of bounded analytic functions and $C$ the space of continuous functions.
Theorem 22. $H^\infty + C$ is a proximinal subspace in $L^\infty$.

It should be remarked that, although this theorem does not seem to be a result in operator approximation, it is indeed related to best approximation by compact Hankel operators. For the $l_p$ spaces, if we let $P_n$ be the norm-one projection of $l_p$ onto the first $n$ coordinate vectors, the following distance formula is obtained:

Theorem 23. Let $T$ be in $B(l_p)$, where $1 < p < \infty$. Then $\|P_n^\perp T P_n^\perp\| \to \operatorname{dist}[T, C(l_p)]$ as $n$ tends to infinity.

For $p = 2$, $P_n$ could be chosen as a norm-one projection onto the first $n$ vectors of any orthonormal basis. It is important to remark, however, that nothing has been mentioned about uniqueness. In fact, if $T$ is a noncompact bounded linear operator in $l_p$, where $1 < p < \infty$, $T$ has "many" compact best approximants. This observation follows from the fact that $C(l_p)$ is an M ideal in $B(l_p)$.

The notion of an M ideal in a Banach space was introduced in the early 1970s. A closed subspace $Y$ of a Banach space $X$ is an M ideal in $X$ if its annihilator $Y^\perp$ is an L summand of the dual space $X^*$. This in turn means that $Y^\perp$ is the range of an L projection defined on $X^*$, that is, a projection $Q : X^* \to Y^\perp$ with the property that $\|f\| = \|Qf\| + \|f - Qf\|$ for every $f$ in $X^*$. The importance of M ideals in approximation theory is that M ideals are proximinal subspaces with certain special approximation properties. For instance, (1) the metric projection $P_Y$ onto $Y$ satisfies the Lipschitz condition $d_H[P_Y(x), P_Y(y)] \le 2\|x - y\|$ for all $x, y$ in $X$, where $d_H$ denotes the Hausdorff distance, and (2) there exists a continuous homogeneous selection for the metric projection $P_Y$. Perhaps the most remarkable approximation characteristic of M ideals is the following result:

Theorem 24. Let $Y$ be an M ideal in a Banach space $X$ and $x$ be in $X \setminus Y$. Then $P_Y(x)$ algebraically spans $Y$.

We have already seen in Theorem 20 that $C(l_p)$ is proximinal in $B(l_p)$, where $1 < p < \infty$. It is also known that $C(l_p)$ is an M ideal in $B(l_p)$, $1 < p < \infty$. In fact, $C(l_p)$ is the only M ideal in $B(l_p)$.

Theorem 25. Let $Z$ be a subspace of $l_p$, where $1 < p < \infty$. Then the compact operators on $Z$ form an M ideal in the space of bounded operators on $Z$ if and only if $Z$ has the compact approximation property.

In the same direction as Theorem 22, the following result can be proved:

Theorem 26. The compact Hankel operators form an M ideal in the space of Hankel operators on a Hilbert space.

M ideals have other important structures. For instance, an M ideal of a Banach algebra must be an algebra, and if $\Omega$ is a compact Hausdorff space and $A$ a function algebra contained in $C(\Omega)$, then the M ideals of $A$ are precisely the closed ideals with a bounded approximate identity. If, in particular, $A$ is the disc algebra, then the M ideals of $A$ are exactly the two-sided ideals with norm-one approximate identity. In addition, the following interpolation result has been obtained:

Theorem 27. Let $f$ be in the disc algebra $A$ and for each $n = 1, 2, \ldots$, $E_n$ be a closed subset of the unit circle $T$ with linear measure zero such that $\gamma_n = \log(\|f\|_T / \|f\|_{E_n}) \to 0$. Then there exist minimal norm interpolants $s_n$ of $f$ satisfying $(f - s_n) \in Y_n = \{g \in A : g(E_n) = 0\}$ such that $\|f - s_n\| = O(\gamma_n)$. Furthermore, the rate of convergence $O(\gamma_n)$ cannot be replaced by $o(\gamma_n)$ in general.

V. CONSTRAINED APPROXIMATIONS

In many approximation and interpolation problems, the approximants, which may also be interpolants, are required to satisfy certain constraints. The constraints may be explicit conditions imposed on the approximation problems, or they may be certain specific properties of the mathematical model or data the approximants are supposed to preserve. In general, constrained approximation problems are nonlinear, and some important problems do not even have analytic solutions. We mention a few such problems. The examples we have chosen should not be construed as the most important work in this area but should be thought of as illustrating the results of the subject.

We first limit ourselves to approximation by polynomials and splines in one variable. For polynomial approximation in the uniform (or supremum) norm $\|\cdot\|_\infty$, the following result is known:

Theorem 28. Let $k$ be any nonnegative integer. Then there exists a constant $C$ such that for any function $f$ in $C^k[0, 1]$ satisfying $f' \ge 0$ and any integer $n \ge k + 1$

$$\inf\{\|f - p\|_\infty : p' \ge 0,\ p \text{ a polynomial of degree} \le n\} \le C n^{-k}\, w(f^{(k)}, n^{-1})_\infty$$

where $w(\cdot, n^{-1})_\infty$ denotes, as usual, the modulus of continuity in the uniform norm.

This result essentially says that monotone approximation by polynomials retains the same order of approximation as unconstrained approximation. For spline approximation, an analogous result has been obtained, and this
result holds even in the $L_p$ setting. Let $S^*_{k,N}$ denote the space of all $k$th-order splines $s$ with knots at $i/N$, $i = 0, \ldots, N$, such that $s' \ge 0$. We have the following result:

Theorem 29. Let $1 \le p \le \infty$ and $k$ be a nonnegative integer. Then there exists a positive constant $C$ such that for any monotonically nondecreasing function $f$ with $f^{(j)} \in L_p[0, 1]$, if $1 \le p < \infty$, or $f^{(j)} \in C[0, 1]$, if $p = \infty$, where $0 \le j \le k - 1$, and for any $N = 1, 2, \ldots$

$$\inf\{\|f - s\|_p : s \in S^*_{k,N}\} \le C N^{-j}\, w(f^{(j)}, N^{-1})_p$$

We now turn to the interpolation problem but concentrate only on spline interpolation. Let $0 = t_0 \le t_1 \le \cdots \le t_n \le t_{n+1} = 1$. Defining $t_j$ for $j < 0$ and $j > n + 1$ arbitrarily as long as $t_j < t_{j+k}$ for all $j$, we can use $\{t_j\}$ as the knot sequence of the normalized B splines $N_{j,k}$ of order $k$. Let $H_p^k = H_p^k[0, 1]$ denote, as usual, the Sobolev space of functions $f$ in $C^{k-1}[0, 1]$ with $f^{(k-1)}$ absolutely continuous and $f^{(k)}$ in $L_p[0, 1]$. The "optimal" interpolation problem can be posed as follows. Let $\{g_i\}$, $i = 1, \ldots, n$, be given data, and let $s$ in $H_p^k$ interpolate these data with

$$\|s^{(k)}\|_p = \inf\{\|f^{(k)}\|_p : f \in H_p^k,\ f(t_i) = g_i,\ i = 1, \ldots, n\}$$

If $1 < p \le \infty$, $s$ always exists, and if $1 < p < \infty$ and $n \ge k$, $s$ is also unique. It is also known that, for $1 < p < \infty$ and $n \ge k$,

$$s^{(k)} = |h|^{q-1} \operatorname{sgn} h$$

where $h$ is some linear combination of the normalized B splines $N_{i,k}$ and $q$ the conjugate of the index $p$. Suppose that the data $\{g_i\}$ are taken from some function $g$ in $C^k[0, 1]$. Then the constrained version of the above problem is determining an $s$ in $H_p^k$ such that $s(t_i) = g(t_i)$ for $i = 1, \ldots, n$, $s^{(k)} \ge 0$, and

$$\|s^{(k)}\|_p = \inf\{\|f^{(k)}\|_p : f \in H_p^k,\ f(t_i) = g(t_i),\ f^{(k)} \ge 0\}$$

Again for $1 < p < \infty$ and $n \ge k$, $s$ exists and is unique. In fact, the following result is known. To formulate the result, we set

$$d_i = [t_i, \ldots, t_{i+k}]g$$

where $q$ is the conjugate of $p$ and

$$A = [0, 1] \setminus \bigcup_{j=1}^{n-k} \{(t_j, t_{j+k}) : d_j = 0\}$$

We observe that if $p = 2$, so that $q - 1 = 1$, then $s$ is in $C^{k-1}$ and is a piecewise polynomial of order $2k$. Recently, an analog to the perfect spline solution of the unconstrained problem has been obtained for case $p = \infty$. However, with the exception of some simple cases, no numerical algorithm to determine $s$ is known.

To describe computational methods for shape-prescribed splines, it is easier to discuss quadratic spline interpolation. It should be clear from the above result that extra knots are necessary. Let these knots be $x_1, \ldots, x_{n-1}$ with $t_i < x_i < t_{i+1}$ and let $\{y_i, m_i\}$, $i = 1, \ldots, n$, be a given Hermite data set. Then a quadratic spline $s$ with knots at $\{t_1, \ldots, t_n, x_1, \ldots, x_{n-1}\}$ is uniquely determined by the interpolation conditions $s(t_i) = y_i$ and $s'(t_i) = m_i$ for $i = 1, \ldots, n$. The slopes of $s$ at the knots $x_i$ can be shown to be

$$s'(x_i) = \frac{2(y_{i+1} - y_i) - (x_i - t_i) m_i - (t_{i+1} - x_i) m_{i+1}}{t_{i+1} - t_i}$$

for $i = 1, \ldots, n - 1$. Hence, the knot $x_i$ is not active if and only if

$$\frac{m_{i+1} + m_i}{2} = \frac{y_{i+1} - y_i}{t_{i+1} - t_i}$$

In general, the slopes $m_i$ are not given and can be considered parameters to ensure certain shape. For instance, if $m_i m_{i+1} \ge 0$, a necessary and sufficient condition that $m_i s'(t) \ge 0$ for $t_i \le t \le t_{i+1}$ is $|(x_i - t_i) m_i + (t_{i+1} - x_i) m_{i+1}| \le 2 |y_{i+1} - y_i|$.

Similar conditions can be obtained for comonotonicity and convexity preservation. For practical purposes, the slopes $m_i$ could be selected as a weighted central difference formula for $2 \le i \le n - 1$ and a weighted noncentral formula for $i = 1$ and $n$, so that the artificial knots $x_i$ can be determined using the corresponding necessary and sufficient condition such as the inequality mentioned above. Such an algorithm has the advantage of being one-pass.
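The slope formula and the two tests quoted for quadratic spline interpolation can be sketched in a few lines. This is only an illustrative sketch on a single interval $[t_i, t_{i+1}]$: the function names are hypothetical, and the shape inequality is checked exactly as stated in the text, which presumably also assumes that the slopes have the same sign as $y_{i+1} - y_i$.

```python
def slope_at_extra_knot(t0, t1, x, y0, y1, m0, m1):
    """Return s'(x) for the quadratic spline piece on [t0, t1] with extra knot x."""
    return (2.0 * (y1 - y0) - (x - t0) * m0 - (t1 - x) * m1) / (t1 - t0)


def knot_is_inactive(t0, t1, y0, y1, m0, m1, tol=1e-12):
    """True when (m0 + m1)/2 equals the chord slope, so the extra knot is not needed."""
    return abs((m0 + m1) / 2.0 - (y1 - y0) / (t1 - t0)) <= tol


def satisfies_shape_inequality(t0, t1, x, y0, y1, m0, m1):
    """Check |(x - t0) m0 + (t1 - x) m1| <= 2 |y1 - y0| (stated for m0 * m1 >= 0)."""
    if m0 * m1 < 0:
        raise ValueError("the inequality is stated only for m0 * m1 >= 0")
    return abs((x - t0) * m0 + (t1 - x) * m1) <= 2.0 * abs(y1 - y0)


# Hermite data sampled from f(t) = t**2 on [0, 1]: values (0, 1), slopes (0, 2).
sigma = slope_at_extra_knot(0.0, 1.0, 0.3, 0.0, 1.0, 0.0, 2.0)
inactive = knot_is_inactive(0.0, 1.0, 0.0, 1.0, 0.0, 2.0)
shape_ok = satisfies_shape_inequality(0.0, 1.0, 0.3, 0.0, 1.0, 0.0, 2.0)

# Slopes that are too steep for the same data violate the inequality,
# and the slope at the extra knot then changes sign.
too_steep = satisfies_shape_inequality(0.0, 1.0, 0.5, 0.0, 1.0, 4.0, 4.0)
steep_sigma = slope_at_extra_knot(0.0, 1.0, 0.5, 0.0, 1.0, 4.0, 4.0)
```

Because the data come from a single quadratic, the extra knot is inactive and the computed slope agrees with $f'(0.3)$. In the second case the inequality fails and the slope at the extra knot is negative even though both prescribed slopes are positive.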
In the definition of $d_i$, the divided difference notation has been used. Also, let $\chi_A$ denote the characteristic function of a set $A$.

Theorem 30. The unique function $s$ described above is characterized by

$$\int_0^1 s^{(k)} N_{i,k} = d_i$$

for $i = 1, \ldots, n - k$, and

$$s^{(k)} = \Bigl(\sum_{j=1}^{n-k} \alpha_j N_{j,k}\Bigr)_+^{q-1} \chi_A$$

Cubic splines could be easily used for monotone approximation. In fact, if $n \ge 2$, there is a unique function $s$ such that $s$ satisfies $s' \ge 0$, $s(t_i) = g_i$, and

$$\|s''\|_2 = \inf\{\|f''\|_2 : f \in H_2^2,\ f(t_i) = g_i,\ f' \ge 0\}$$

The unique solution $s$ is a natural cubic spline with at most $2[n/2] + 2$ extra knots in addition to the knots at $t_1, \ldots, t_n$.

Much more can be said when the Hilbert space $H_2^k$ is considered. For instance, let $C$ be a closed convex subset of $H_2^k$. This may be a set of functions that are positive, monotone, convex, and so on. The following result has been obtained:
Theorem 31. Let $g \in C \subset H_2^k$ and $n \ge k + 1$. Then there is a unique function $s_n$ in $C$ such that $s_n(t_i) = g(t_i)$ for $i = 1, \ldots, n$, and

$$\|s_n^{(k)}\|_2 = \inf\{\|f^{(k)}\|_2 : f \in H_2^k,\ f(t_i) = g(t_i),\ f \in C\}$$

Furthermore, $s_n$ is a piecewise polynomial of order $2k$, and if $s_n^*$ denotes the (unconstrained) natural spline of order $2k$ interpolating the same data, then

$$\|(s_n - g)^{(k)}\|_2 \le \|(s_n^* - g)^{(k)}\|_2$$

Consequently, the rate of convergence of the constrained interpolants $s_n$ to $g$ is established. Another consequence is that the convergence rate for splines interpolating derivatives at the end points remains the same as the unconstrained ones.

We now give a brief account of the multivariable setting. Minimizing the $L_2$ norm over the whole space $R^s$ of certain partial derivatives gives the so-called thin-plate splines. So far, the only attempt to preserve shape has been preserving the positivity of the data. Since the notation has to be quite involved, we do not go into details but remark that the convergence rate does not differ from the unconstrained thin-plate spline interpolation. It must be mentioned, however, that thin-plate splines are not piecewise polynomials. Piecewise polynomials satisfying certain smoothness joining conditions are discussed in the sections of bivariate splines for two variables and multivariate polyhedral splines for the general $n$-dimensional setting. Results on monotone and convex approximation by piecewise polynomials, however, are not quite complete and certainly not yet published.

$a_{ij} = N_j(x_i)$. Hence, the biinfinite (coefficient) matrix $A$ can be considered an operator from $l_\infty$ into itself. If $A$ is surjective, then a bounded spline interpolant $s$ exists, and if $A$ is injective, then $s$ is uniquely determined by the interpolation condition $s(x_i) = y_i$, where $i = \ldots, 0, 1, \ldots$. It is clear that $A$ cannot be invertible unless it is banded, and using the properties of B splines, it can be shown that $A$ is totally positive.

Generally, certain biinfinite matrices $A = [a_{ij}]$ can be considered as operators on $l_p$ as follows. Let $A\{x_i\} = \{Ax_i\}$, with $Ax_i = \sum_j a_{ij} x_j$, where $\{x_j\}$ is a finitely supported sequence. Then the definition can be extended to all of $l_p$ using the density of finitely supported sequences. If the extension is unique, we denote the operator also by $A$ and use the usual operator norm $\|A\|_p$. Let $\{e_j\}$ be the natural basis sequence with $e_j(i) = \delta_{ij}$, and denote by $P_n$ the projection defined by $P_n e_j = e_j$ for $|j| \le n$ and zero otherwise. Therefore, the biinfinite matrix representation of $P_n$ is $[p_{ij}]$, where $p_{ij} = \delta_{ij}$ for $|j| \le n$ and zero otherwise. For any biinfinite matrix $A$, we can consider its truncation $A_n = P_n A P_n$. Also, let $S$ be the shift operator defined by $S e_j = e_{j+1}$. Then $(S^r A)_n(i, j) = 0$ if $|i| > n$ or $|j| > n$ and $= a_{i-r,j}$ otherwise. That is, $S^r$ shifts the diagonals of $A$ down by $r$ units.

We use the notation $A \in B(l_p)$ if $A$, considered an operator on $l_p$, is a bounded operator. We also say that $A$ is boundedly invertible if, as an operator on $l_p$, $A$ is both injective and surjective. In order to extend the important results in finite matrices, it is important to be able to identify a "main diagonal" of $A$. Perhaps the following is a good definition of such diagonal. The $r$th diagonal of $A$ is main if

$$\limsup \|(S^r A)_n^{-1}\|_p < \infty$$
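The truncation $(S^r A)_n$ and the role of the shift can be made concrete with a small sketch. It is an illustration only: the helper names are hypothetical, a biinfinite matrix is represented by a function returning its entry $a_{ij}$, and the example matrix is simply the shift $S$ itself, whose entries are $\delta_{i,j+1}$.

```python
def truncation(entry, n, r=0):
    """Return the (2n+1) x (2n+1) block of S^r A for indices -n..n.

    entry(i, j) gives a_ij; the (i, j) entry of S^r A is a_{i-r, j}.
    """
    idx = range(-n, n + 1)
    return [[entry(i - r, j) for j in idx] for i in idx]


def shift_entry(i, j):
    """Entries of the shift S, defined by S e_j = e_{j+1}."""
    return 1 if i == j + 1 else 0


def identity(size):
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]


def has_zero_row(matrix):
    return any(all(v == 0 for v in row) for row in matrix)


n = 2
unshifted = truncation(shift_entry, n, r=0)    # truncation of A = S
shifted = truncation(shift_entry, n, r=-1)     # truncation of S^{-1} A
```

For this example every truncation with $r = 0$ has a zero first row and is therefore singular, while every truncation with $r = -1$ is an identity matrix whose inverse has norm 1. Under the definition quoted above, the diagonal $r = -1$ would be main and $r = 0$ would not.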
interpolation can be obtained in terms of the norm of $A^{-1}$. We remark that there are many other possible and plausible definitions for a main diagonal of a biinfinite matrix.

In solving a biinfinite linear system as discussed earlier, a Gauss elimination procedure quite often factors the coefficient matrix $A$ as $A = LU$, where $L$ is a unit lower triangular matrix with the zeroth diagonal as its rightmost (nontrivial) band, and $U$ is an upper triangular matrix. This is called an LU factorization of $A$. We say that an LU factorization is invertible if each factor is bounded and boundedly invertible, with $L^{-1}$ and $U^{-1}$ being again lower and upper triangular, respectively.

Theorem 33. Let $A$ be totally positive with both $A$ and $A^{-1}$ in $B(l_\infty)$. Then there exists a unique $r$ such that $S^r A = LU$, where $L$, $L^{-1}$, $U$, $U^{-1}$ are in $B(l_\infty)$, $L$, $L^{-1}$ being unit lower triangular and $U$, $U^{-1}$ being upper triangular biinfinite matrices. Furthermore, if we write

$$(S^r A)_n = L_n U_n$$

for each $n$, then $L_n \to L$ and $U_n \to U$ entrywise.

It is clear that the LU factorization above is unique. However, it would be even better if $U$ were the transpose of $L$. Such a factorization is called a Cholesky decomposition. We have the following result:

Theorem 34. Let $A$ be a positive definite symmetric matrix with $A$ and $A^{-1}$ in $B(l_2)$. Then $A$ has a unique Cholesky decomposition; that is, $A = L L^T$, where $L$ and $L^{-1}$ are lower triangular biinfinite matrices in $B(l_2)$ with $l_{ii} > 0$. Furthermore, $L$ is unique, and writing $A_n = L_n L_n^T$, we have $L_n \to L$ as $n \to \infty$.

The following result for $l_1$ is also of some interest.

Theorem 35. Let $A$ be a column diagonally dominant biinfinite matrix with $A$ and $A^{-1}$ in $B(l_1)$. Then there is a unique factorization $A = LU$, where $L$, $L^{-1}$ are $B(l_1)$ unit lower triangular, and $U$, $U^{-1}$ are $B(l_1)$ upper triangular. Furthermore, writing $A_n = L_n U_n$, we have $L_n x \to L x$ and $U_n x \to U x$ for all $x \in l_1$.

Certain biinfinite matrices have more refined factorizations. We will call a one-banded biinfinite matrix $R = [r_{ij}]$ elementary if $r_{ii} = 1$ and $r_{ij} = 0$ for all $i$ and $j$ with $j \le i - 2$ or $j > i$.

Theorem 36. Any strictly $m$-banded totally positive unit lower triangular biinfinite matrix $A$ has a factorization $A = R_1 \cdots R_m$ where each $R_i$, $i = 1, \ldots, m$, is elementary.

Hence, if $A$ is not necessarily unit but is $m$-banded totally positive and lower triangular, then $A = R_1 \cdots R_m B$, where $B$ is a diagonal matrix with positive diagonal entries.

We conclude this section by stating a factorization result of block Toeplitz matrices.

Theorem 37. Let $A$ be totally positive biinfinite block Toeplitz, with block size at least 2, and unit lower triangular. Write $A = [A_{ij}]$, where $A_{ij} = A_k$ for all $i$ and $j$ with $i = j + k$, and denote

$$A(z) = \sum_{k=0}^{\infty} A_k z^k$$

Then

$$A(z) = \prod_{k=1}^{\infty} [I + a_k(z)]\; c(z) \prod_{k=1}^{\infty} [I - b_k(z)]^{-1}$$

where $I + a_k(z)$ and $I - b_k(z)$ are the symbols of one-banded block Toeplitz matrices with $a_k(1), b_k(1) \ge 0$ and $\sum [a_k(1) + b_k(1)] < \infty$. Furthermore, $c(z)$ is the symbol of a totally positive block Toeplitz matrix with $c(z)$ and $c^{-1}(z)$ entire, and $\det c(z) = 1$.

VII. INTERPOLATION

The theory and methods of interpolation play a central role in approximation theory and numerical analysis. This branch of approximation theory was initiated by Newton and Lagrange. Lagrange interpolation polynomials, formulated as early as 1775, are still used in many applications. To be more explicit, let

$$\Delta_n : -1 \le t_{n1} < \cdots < t_{nn} \le 1$$

be a set of interpolation nodes (or sample points) and

$$l_{nk}(t) = \prod_{i \ne k} (t - t_{ni})/(t_{nk} - t_{ni})$$

Then for any given set of data $Y = \{y_1, \ldots, y_n\}$, the polynomial

$$p_n(t) = p_n(t; Y, \Delta_n) = \sum_{k=1}^{n} y_k l_{nk}(t)$$

interpolates the data $Y$ on $\Delta_n$. It is interesting that, even for uniformly spaced $\Delta_n$, there exists a continuous function $g$ on $[-1, 1]$ such that the (Lagrange) interpolating polynomials $p_n(t)$ to $y_i = g(t_{ni})$, $i = 1, \ldots, n$, do not converge uniformly to $g(t)$ on $[-1, 1]$. This leads to the study of the Lebesgue function and constant defined, respectively, by

$$l_n(t) = l_n(t; \Delta_n) = \sum_{k=1}^{n} |l_{nk}(t)|$$
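The Lagrange basis polynomials, the interpolating polynomial, and the Lebesgue function translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the Lebesgue constant is approximated by the maximum of the Lebesgue function over a finite sample grid (so it is a lower estimate), and Chebyshev nodes are included purely for comparison, since the text itself only discusses uniformly spaced nodes.

```python
import math


def lagrange_basis(nodes, k, t):
    """Value at t of the k-th Lagrange basis polynomial for the given nodes."""
    value = 1.0
    for i, ti in enumerate(nodes):
        if i != k:
            value *= (t - ti) / (nodes[k] - ti)
    return value


def interpolate(nodes, data, t):
    """Value at t of the polynomial interpolating data[k] at nodes[k]."""
    return sum(y * lagrange_basis(nodes, k, t) for k, y in enumerate(data))


def lebesgue_function(nodes, t):
    """Sum of the absolute values of the Lagrange basis polynomials at t."""
    return sum(abs(lagrange_basis(nodes, k, t)) for k in range(len(nodes)))


def lebesgue_constant(nodes, samples=2001):
    """Sampled maximum of the Lebesgue function over [-1, 1]."""
    grid = [-1.0 + 2.0 * j / (samples - 1) for j in range(samples)]
    return max(lebesgue_function(nodes, t) for t in grid)


def equispaced_nodes(n):
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def chebyshev_nodes(n):
    """Roots of the degree-n Chebyshev polynomial, in increasing order."""
    return [math.cos((2 * i + 1) * math.pi / (2 * n)) for i in reversed(range(n))]


uniform_growth = lebesgue_constant(equispaced_nodes(15))
chebyshev_growth = lebesgue_constant(chebyshev_nodes(15))
```

Comparing `uniform_growth` with `chebyshev_growth` shows how much faster the Lebesgue function grows for uniformly spaced nodes, which is the mechanism behind the nonconvergence mentioned above.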
denote the Hermite-Fejér interpolation polynomial of a continuous function $f$ at the roots $T_n$ of $C_n$. Then

$$|H_n(x, f) - f(x)| = O(1) \left\{ \sum_{i=1}^{n} \frac{1}{i^2}\, w\Bigl(f, \frac{i}{n} (1 - x^2)^{1/2} |C_n(x)|\Bigr) + \sum_{i=1}^{n} \frac{1}{i^2}\, w\Bigl(f, \Bigl(\frac{i}{n}\Bigr)^2 |x C_n(x)|\Bigr) \right\}$$

and

$$|H_n(x, f) - f(x)| = O(1) \left\{ \frac{1}{n} \sum_{i=1}^{n} w\Bigl(f, \frac{1}{i} (1 - x^2)^{1/2} |C_n(x)| + i^{-2} |x C_n(x)|\Bigr) \right\}$$

There has been much interest in the so-called Birkhoff polynomial interpolation problem. The interpolation conditions here are governed by an incidence matrix $E = [e_{ij}]$, where $e_{ij} = 1$ or 0, namely,

$$D^j p(x_i) = y_{ij} \quad \text{if } e_{ij} = 1$$

We now study interpolation by spline functions. Spline interpolation is much more useful than polynomial interpolation. In the first place there is no need to worry about convergence. Another great advantage is that a spline interpolation curve does not unnecessarily oscillate as a polynomial interpolation curve does when a large number of nodes are considered. The most commonly used is a cubic spline. If interpolation data are given at the nodes $\Delta_m : 0 = x_0 < \cdots < x_m = 1$, the interior nodes can be used as knots of the cubic spline. When the first derivatives are prescribed at the end points, the corresponding interpolation cubic spline, called a complete cubic spline, is uniquely determined. If no derivative values at the end points are given, a natural cubic spline with zero second derivatives at the end points could be used. Spline functions can also adjust to certain shapes. This topic is discussed in Section V on constrained approximation. There is a vast amount of literature on various results on spline interpolations: existence, uniqueness, error estimates, asymptotics, variable knots, and so on. We shall discuss only the following general interpolation problem:

Let $E = [e_{ij}]$ and $F = [f_{ij}]$ be $m \times n$ incidence matrices and $\Delta_m$ be a partition of $[0, 1]$ as defined above. We denote by $S_n(F, \Delta_m)$ the space of all piecewise polynomial functions $f$ of degree $n$ with the partition points $x_i$ of $\Delta_m$ as break points such that $D^{n-j} f(x_i^-) = D^{n-j} f(x_i^+)$ for all pairs $(i, j)$ with $f_{ij} = 0$, where $0 < i < m$. The following problem is called an interpolation problem, I.P.$(E, F, \Delta_m)$:

Given a set of data $\{y_{ij} : e_{ij} = 1\}$, find an $f$ in $S_n(F, \Delta_m)$ satisfying $D^j f(x_i) = y_{ij}$, where $e_{ij} = 1$ and $D^j f(x_i) = \frac{1}{2}[D^j f(x_i^-) + D^j f(x_i^+)]$.

The problem I.P.$(E, F, \Delta_m)$ posed here is said to be poised if it has a unique solution for any given data set. It is known that I.P.$(E, F, \Delta_m)$ is poised if and only if I.P.$(E, F, \Delta_m)$ is poised. By using the Pólya conditions on $(E, F)$ and the Budan–Fourier theorem, the following result can be shown. We say that $E$ is quasi Hermite if for each $i = 1, \ldots, m - 1$, there exists an $M_i$ such that $e_{ij} = 1$ if and only if $j < M_i$.

Theorem 42. Suppose that $(E, F)$ satisfies the Pólya conditions and that $e_{ij} = 1$ implies $f_{i,n-j} = 0$ for all $(i, j)$. Suppose further that one of the matrices $E$ and $F$ is quasi Hermite and the other has no supported odd blocks. Then the problem I.P.$(E, F, \Delta_m)$ is poised for any $\Delta_m$.

For cardinal interpolation, we impose the extra conditions $e_{0j} = e_{mj}$ and $f_{0j} = f_{mj}$ for $j = 0, \ldots, m$ on the incidence matrices. Of course, $S_n(F, \Delta_m)$ has to be extended to be piecewise polynomial functions $f$ of degree $n$ with break points at $q + x_i$, where $i = 1, \ldots, m$ and $q = \ldots, 0, 1, \ldots$, such that

$$D^{n-j} f(q + x_i^-) = D^{n-j} f(q + x_i^+)$$

for all $q$ and $(i, j)$ with $f_{ij} = 0$. The following problem will be called a cardinal interpolation problem, C.I.P.$(E, F, \Delta_m)$:

Given a set of data $\{y_{ijq}\}$, for any $0 \le i < m$ and $e_{ij} = 1$, find an $f$ in $S_n(F, \Delta_m)$ satisfying $D^j f(q + x_i) = y_{ijq}$, where $D^j f(q + x_i)$ is the average of $D^j f(q + x_i^-)$ and $D^j f(q + x_i^+)$.

The problem C.I.P.$(E, F, \Delta_m)$ is said to be poised if whenever $y_{ijq}$ is arbitrarily given and satisfies $y_{ijq} = O(|q|^p)$ for some $p$, as $q \to \pm\infty$, for all $(i, j)$, the problem C.I.P.$(E, F, \Delta_m)$ has a unique solution $f$ satisfying $f(x) = O(|x|^p)$ as $x \to \pm\infty$. Let $C_o$ denote the space of all solutions of C.I.P.$(E, F, \Delta_m)$ with $y_{ijq} = 0$ and denote its dimension by $d$. A function $f$ in $C_o$ is called an eigenspline with eigenvalue $\lambda$ if $f(x + 1) = \lambda f(x)$ for all $x$. The following results have been obtained:

Theorem 43. The C.I.P.$(E, F, \Delta_m)$ has eigenvalue $\lambda$ if and only if the C.I.P.$(E, F, \Delta_m)$ has eigenvalue $\lambda^{-1}$.

Theorem 44. Let the C.I.P.$(E, F, \Delta_m)$ have $d$ distinct eigenvalues. Then the problem is poised if and only if none of the eigenvalues lies on the unit circle.

Error estimates on interpolation by these splines have also been obtained.

We next discuss multivariate interpolation. First it should be mentioned that not every choice of nodes in $R^s$,
$s \ge 2$, admits a unique Lagrange interpolation. Hence, it is important to classify such nodes. We will use the notation

$$N_n(s) = \binom{n + s}{s}$$

The following result gives such a criterion, which can be used inductively to give an admissible choice of nodes in any dimension:

Theorem 45. Let $\Delta_n = \{x^i\}_{1, \ldots, N_n(s)}$ be a set of nodes in $R^s$. If there exist $n + 1$ distinct hyperplanes $S_0, \ldots, S_n$ in $R^s$ and $n + 1$ pairwise disjoint subsets $A_0, \ldots, A_n$ of the set of nodes $\Delta_n$ such that for each $j = 0, \ldots, n$, $A_j$ is a subset of $S_j \setminus \{S_{j+1} \cup \cdots \cup S_n\}$, has cardinality $N_j(s - 1)$, and admits a unique Lagrange polynomial interpolation of degree $j$ in $(s - 1)$ variables, then $\Delta_n$ admits a unique Lagrange polynomial interpolation of degree $n$ in $R^s$.

vertices of a simplex $T^s$ in $R^s$, and for each $x$ in $T^s$ let $(u_1, \ldots, u_{s+1})$ be its barycentric coordinate; that is,

$$x = \sum_{i=1}^{s+1} u_i V_i; \quad \sum_{i=1}^{s+1} u_i = 1, \quad u_i \ge 0$$

Let $a = (a_1, \ldots, a_{s+1})$, where $a_1, \ldots, a_{s+1}$ are nonnegative integers such that $|a| = a_1 + \cdots + a_{s+1} \le n$. It is known that the set of equally spaced nodes $\{x_a\}$, $|a| \le n$, where $x_a = (a_1 n^{-1}, \ldots, a_{s+1} n^{-1})$, admits a unique Lagrange interpolation. For this set of equally spaced knots $\{x_a\}$, we define

$$l_a(x) = n^n \prod_{i=1}^{s+1} \prod_{j=0}^{a_i - 1} \Bigl(u_i - \frac{j}{n}\Bigr) \Bigm/ \prod_{k=1}^{s+1} a_k!$$

$$L_n(x) = \sum_{|a| \le n} |l_a(x)|$$
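The node count $N_n(s)$ used in Theorem 45 is the number of monomials of total degree at most $n$ in $s$ variables, which is why exactly that many nodes are needed for unique interpolation. The following sketch checks this by brute force and also confirms the bookkeeping behind the layered construction of the theorem, namely that layers of sizes $N_0(s-1), \ldots, N_n(s-1)$ add up to $N_n(s)$. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import product
from math import comb


def node_count(n, s):
    """N_n(s) = C(n + s, s)."""
    return comb(n + s, s)


def count_monomials(n, s):
    """Brute-force count of exponent vectors (e_1, ..., e_s) with e_1 + ... + e_s <= n."""
    return sum(1 for e in product(range(n + 1), repeat=s) if sum(e) <= n)


def layered_count(n, s):
    """Total size of disjoint layers A_0, ..., A_n with |A_j| = N_j(s - 1)."""
    return sum(node_count(j, s - 1) for j in range(n + 1))


# Cubic interpolation in the plane: 10 nodes, split into layers of 1, 2, 3, 4.
planar_cubic = node_count(3, 2)
```

The agreement of `layered_count` with `node_count` is what allows the criterion to be applied inductively, one dimension at a time.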
An application of the Hermite–Gennochi formula for di- where, and hereafter, P denotes an arbitrary orthogonal
vided differences gives the interesting relation projection from R n onto R s . Then we can write
i Rs for all p ∈ πd .
and $F(p) = \lambda(p)$ for any polynomial $p$. For any $h > 0$, define

$$(Q_h f)(x) = \sum_{a} F\{f[h(\cdot + a)]\}\, B(h^{-1} x - a, X_n)$$

Then the following estimate is established:

Theorem 55. There exist absolute constants $C$ and $r$ such that

$$|f(x) - (Q_h f)(x)| \le C h^{d+1-s/p} \sum_{|j| = d+1} \|D^j f\|_{L_p(A_{x,h})}$$

for all $x \in R^s$ and $f \in H_p^d(R^s)$, where $d = d(X_n)$, and

$$A_{x,h} = x + r h [-1, 1]^s$$

At least for $p = \infty$, the order $O(h^{d+1})$ cannot be improved. Let us now replace $B$ by a finite collection $\Phi$ of locally supported splines $\phi_i$ on $R^s$. Hence, $S(\Phi)$ will denote the linear span of the translates $\phi_i(\cdot - a)$, $a \in Z^s$, where $i = 1, \ldots, N$, say. Let $H_{p,c}^m$ be the subspace of the Sobolev space $H_p^m$ of compactly supported functions $f$ with norm

$$\|f\|_{p,m} = \Bigl\{ \sum_{|j| \le m} \|D^j f\|_p^p \Bigr\}^{1/p}$$

We say that $S(\Phi)$ provides local $L_p$ approximation of order $k$, or $S(\Phi) \in A_{p,k}$, if for any $f \in H_{p,c}^m$, there exist weights $w_{i,a}^h$ such that

$$\Bigl\| f - \sum_{i=1}^{N} \sum_{a \in Z^s} w_{i,a}^h\, \phi_i\Bigl(\frac{\cdot}{h} - a\Bigr) \Bigr\|_p \le C h^k \|f\|_{p,m}$$

and $w_{i,a}^h = 0$ whenever $\operatorname{dist}(ah, \operatorname{supp} f) > r$ holds, where $C$ and $r$ are positive constants independent of $h$ and $f$.

is in $\pi_{|b|-1}$ for $|b| < k$.

3. There exist some finitely supported $w_{i,a}$ so that

$$\psi = \sum_{i=1}^{N} \sum_{a \in Z^s} w_{i,a}\, \phi_i(\cdot - a)$$

satisfies

$$\frac{x^b}{b!} - \sum_{a \in Z^s} \frac{a^b}{b!}\, \psi(x - a)$$

is in $\pi_{|b|-1}$ for $|b| < k$.

4. $S(\Phi) \in A_{p,k}$ for all $p$, $1 \le p \le \infty$.

5. $S(\Phi) \in A_{\infty,k}$.

In the two-dimensional setting, and particularly the three-directional mesh, the approximation order of $S(\Phi)$, where $\Phi$ is a collection of box splines, has been extensively studied. It is interesting that, although $C^1$ cubics locally reproduce all cubic polynomials, the optimal approximation rate is only $O(h^3)$. However, not much is known on the approximation order of $S(\Phi)$ for $R^s$ with $s > 2$.

Although multivariate polyhedral splines are defined in the distributional setting, they can be computed by using the recurrence relations. The computational schemes, however, are usually very complicated. For $R^2$, there is an efficient algorithm that gives the Bézier coefficients for each polynomial piece of a box spline. In general, one could apply the subdivision algorithms, which are based on discrete box splines. Under certain mild assumptions, the convergence rate could be shown to be quadratic.

IX. QUADRATURES

This section is devoted to approximation of integrals. Let $D$ be a region in $R^s$ and $C(D)$ the space of all continuous
A. Automatic Quadratures

An integration method that can be applied in a digital computer to evaluate definite integrals automatically within a tolerable error is called an automatic quadrature. The trapezoidal, Simpson, and adaptive Simpson rules are typical examples. We may say that an automatic quadrature is based on interpolatory-type formulas. An ideal interpolatory quadrature is one whose linear functionals $L_{nk}$ are defined by $L_{nk} f = A_{nk} f(x_k)$, $k = 1, \ldots, n$, where $-1 \le x_1 < \cdots < x_n \le 1$, such that $A_{nk} > 0$, the function values $f(x_k)$ can be used again in future computation (i.e., for $n + 1, n + 2, \ldots$), and the quadrature has algebraic precision of degree $n - 1$ [i.e., $r_n(f) = 0$ for $f(x) = 1, \ldots, x^{n-1}$]. Using Lagrange interpolation by polynomials of degree $n - 1$ guarantees the algebraic precision. Here, we have

$$A_{nk} = \int_{-1}^{1} \prod_{i \ne k} (x - x_i)/(x_k - x_i)\, dx$$

and the quadrature is sometimes attributed to Newton. However, the sequence $\{A_{nk}\}$, $k = 1, \ldots, n$, is usually quite oscillatory. Since it is known that $r_n(f) \to 0$ for all

$$Q_{(n+1)N-1}(f) = \sum_{k=1}^{N-1} A_{0k} b_{0k}(f) + \sum_{i=1}^{n} \sum_{k=0}^{N-1} A_{ik} b_{ik}(f)$$

where

$$A_{0k} = \int_{-1}^{1} U_{k-1}(x)\, dx$$

and

$$A_{ik} = \int_{-1}^{1} w_i[T_N(x)]\, T_k(x)\, dx$$

for $i = 1, \ldots, n$. It is important to point out that $b_{ik}(f)$ and $A_{ik}$ are independent of $n$ and that to proceed from $Q_{(n+1)N-1}(f)$ to $Q_{(n+2)N-1}(f)$ we need only the extra terms $b_{n+1,k}(f)$ and $A_{n+1,k}$, for $k = 1, \ldots, N$, and their recurrence relations are available.

B. Gauss Quadrature

Let us return to the integration quadrature formula

$$\int_{-1}^{1} f(x) w(x)\, dx = Q_n(f) + r_n(f)$$
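The interpolatory weights $A_{nk}$ from the discussion of automatic quadratures can be computed exactly by integrating each Lagrange basis polynomial over $[-1, 1]$ with weight function 1. The sketch below does this in rational arithmetic; the function names are hypothetical, and the choice of uniformly spaced nodes in the second example is an assumption made only to illustrate the oscillation of the weights.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def integrate_minus1_to_1(p):
    """Exact integral over [-1, 1]: odd powers vanish, x**m gives 2/(m + 1) for even m."""
    return sum(c * Fraction(2, m + 1) for m, c in enumerate(p) if m % 2 == 0)


def interpolatory_weights(nodes):
    """Weights A_nk obtained by integrating the Lagrange basis polynomials."""
    nodes = [Fraction(x) for x in nodes]
    weights = []
    for k, xk in enumerate(nodes):
        basis = [Fraction(1)]
        for i, xi in enumerate(nodes):
            if i != k:
                # Multiply by the linear factor (x - xi) / (xk - xi).
                basis = poly_mul(basis, [-xi / (xk - xi), 1 / (xk - xi)])
        weights.append(integrate_minus1_to_1(basis))
    return weights


def apply_rule(nodes, weights, f):
    """Evaluate the quadrature sum of A_nk * f(x_k)."""
    return sum(w * f(Fraction(x)) for x, w in zip(nodes, weights))


simpson_weights = interpolatory_weights([-1, 0, 1])
nine_nodes = [Fraction(i, 4) - 1 for i in range(9)]
nine_weights = interpolatory_weights(nine_nodes)
```

With three nodes at $-1, 0, 1$ the weights are those of Simpson's rule. With nine uniformly spaced nodes some weights become negative, so the positivity requirement $A_{nk} > 0$ fails even though the rule still has algebraic precision of degree $n - 1$.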
cubatures are very difficult to obtain. Some of the reasons does not exceed rm,n [Vr,s; p (M)], defined analogously,
∗
are that multivariate interpolation may not be unisolvent where rmn (g) has the same form as rmn (g) with arbitrary
and that orthogonal polynomials are difficult to find on nodes xk and y j in I and arbitrary coefficients A j , Bk , and
an arbitrary region D in R s . There are, however, other Ck j . Furthermore
methods, including Monte Carlo, quasi Monte Carlo, and ∗
number-theoretic methods. On the other hand, many of the rmn [Vr,s; p (M)] = M r̄r, p (Wr, p )r̄n,s (Ws, p )
one-variable results can be generalized to Cartesian prod-
uct regions. For instance, let us study optimal quadratures.
Let H pr (I ) denote the Sobolev spaces on I = [−1, 1], X. SMOOTHING BY SPLINE FUNCTIONS
and Wr, p the unit ball of f in H pr (I ); that is
(r ) Not only are spline functions useful in interpolation and
f ≤1 approximation; they are natural data-smoothing functions,
p
especially when the data are contaminated with noise. We
To extend to two variables, for instance, let Vr,s; p (M) be the
first discuss the one-variable setting.
set of all functions g(· , ·) in the Sobolev space H p(r,s) (I 2 )
Let 0 = t0 < · · · < tn+1 = 1 and H2k = H2k [0, 1] be the
such that
(r,s) Sobolev space of functions f such that f , . . . , f (k) are
g (· , ·) ≤ M in L 2 = L 2 [0, 1], where k ≤ n − 1. For any given data
p
Z = [z 1 , . . . , z n ]T , it can be shown that there is a unique
where M is a fixed positive constant. Now, in the one- function sk, p = sk, p (·; Z ), which minimizes the functional
variable setting, if
m
1
1 n
1
Jk, p (u; Z ) = p [D k u]2 + [u(ti ) − z i ]2
rm ( f ) = f (x) d x − Am
k f (xmk ) 0 n i=1
−1 k=1
k=1 j=1
$\max_i (t_{i+1} - t_i) \le c \min_i (t_{i+1} - t_i)$
∗
is optimal in the sense that rmn [Vr,s; p (M)], defined to be
& ∗ & where c is independent of n as n → ∞. Then there exist
sup &rmn (g)& : g ∈ Vr,s; p (M) positive constants c1 and c2 such that
so that sk,0 (·; Z ) is the natural spline of order 2k interpo- for all P in πk−1 .
lating the data Z . Let K k be the elementary solution of the k-times-
Suppose that the data are contaminated with noise, say iterated Laplacean:
z i = g(ti ) + ei , where g ∈ H2k and e1 , . . . , en are indepen-
dent random variables with zero mean and positive stan- k K k = δ0
dard deviation. A popular procedure for controlling the Then it is well known that
smoothing parameter p is the method of generalized cross- #
validation (GCV). We will not go into details but will re- ck |t|2k−s if s is odd
K k (t) =
mark that it is a predictor–corrector procedure. ck |t|2k−s log |t| if s is even
The idea of a smoothing spline in one dimension has a
natural generalization to the multivariate setting. Unfortunately not much is known for an arbitrary domain D in R s .
where ck = (−1)k 2π [2k−1 (k−1)!]2 . Also, if m is a measure with compact support orthogonal to πk−1 , then m ∗ K k is
In the following, the L 2 norm 2 will be taken over all an element of D −k L 2 (R s ). Applying this result to k sk, p ,
of R s , and this accounts for the terminology of “thin-plate we have
splines.” Again let p > 0 be the smoothing parameter and 1 n
k > s/2. Then for every set of data {Z 1 , . . . , Z n } in R and a (−1)k k sk, p ∗ K k = ci K k (t − ti )
np i=1
scattered set of sample points {t1 , . . . , tn } in R s , the thin-
plate smoothing spline of order 2k and with smoothing and
parameter p is the (unique) solution of the minimization
k (−1)k k sk, p ∗ K k = (−1)k k sk, p ∗ k K k
problem,
# $ = (−1)k k sk, p ∗ δ0
1 n
inf p|u|k +
2
[u(ti ) − z i ]2
= (−1)k k sk, p
u∈D −k L 2 (R s ) n i=1
where This yields
2 sk, p − (−1)k k sk, p ∗ K k ∈ πk−1
s ∂k
|u|k =
2
u
i ,...,i =1
∂ x i1 · · · ∂ x ik Theorem 62. The smoothing spline sk, p satisfies
1 k 2
and
n
sk, p (t) = (−1)k di K k (t − ti ) + P(t)
D −k L 2 (R s ) = {u ∈ D (R s ) : D a u ∈ L 2 (R s ), |a| = k} i=1
where P ∈ πk−1 . Furthermore, the polynomial P and the Of course, sk, p (ti ; Z ) = yi and determining {yi } is equiva-
coefficients d1 , . . . , dn are uniquely determined by the lent to determining sk, p itself.
equations: In a more general setting, let X , H , K be three Hilbert
1 spaces and T , A be linear operators mapping X onto H
(−1)k di + sk, p (ti ) − z i = 0, i = 1, . . . , n and K and having null spaces N (T ), N (A), respectively.
np
Let p > 0 be the smoothing parameter in the functional
and
Jk, p (u, Z ) = pT u2H + Au − Z 2K
d1 Q(t1 ) + · · · + dn Q(tn ) = 0, Q ∈ πk−1
where Z is the given “data” vector in K and u ∈ X .
Hence, if the values yi = sk, p (ti ; Z ), i = 1, . . . , n, are
known, the smoothing spline sk, p is also determined. Writ-
Theorem 64. If N (T ) + N (A) is closed in X with
ing Y = Ak ( p)Z , where Y = [y1 , . . . , yn ]T , it is important
N (T ) ∩ N (A) = {0}, there exists a unique function s p in
to study the influence matrix Ak ( p) as in the one-variable
X satisfying
setting. Again it is possible to write
Ak ( p) = (I + np Bk )−1 Jk, p (s p , Z ) = inf{Jk, p (u, Z ) : u ∈ X }
Let {t1 , . . . , tn } lie in a bounded domain D and R s . We Moreover, if S denotes the space of all “spline” functions
say that {ti } is asymptotically quasi-uniformly distributed defined by
in D if for all large n,
S = {s ∈ X : T ∗ T s is in the range of A∗ }
sup min |t − ti | ≤ c min |t1 − t j |
t∈D 1≤i≤n 1≤i< j≤n where T ∗ , A∗ denote the adjoints of T and A, respectively,
where c is some positive constant, depending only on D. then s p satisfies s p ∈ S and
T ∗ T s p = p −1 A∗ (Z − As p )
Theorem 63. Let D satisfy a uniform cone con-
dition and have Lipschitz boundary and {t1 , . . . , tn } As a corollary, it is clear that s p ∈ X is the smoothing
be asymptotically quasi-uniformly distributed in spline function if and only if
D. Then there exists a positive constant C such
T s p , T x H = p −1 Z − As p , Ax K
that the eigenvalues dn1 , . . . , dnn of Bk satisfy
0 = dn1 = · · · = dnk < dn,k+1 < · · · < dnn and for all x ∈ X .
An interesting special case is X = H2k (D), where D is
i 2k/s ≤ ndni ≤ Ci 2k/s , i = N + 1, . . . , n
a bounded open set in R s with Lipschitz boundary and
for all n, where satisfying a uniform cone condition. If k > s/2, then the
evaluation functional δt is continuous. Let ti , . . . , tn ∈ D
k+s−1
N = dim πk−1 = and set K = R n and H = [L 2 (D)] N , where N = s k . Also,
s
define
Thus, if we study the effect of thin-plate spline smooth- T
ing in the “frequency domain” as before by writing Au = δt1 u · · · δtn u
Ŷ = Q ∗ Y ; Ẑ = Q ∗ Z and
∗ ∂k
where Q is unitary with Q Bk Q = diag[dn1 · · · dnn ], we
T =
obtain ∂ xi1 · · · ∂ xik
1
ŷ i = ẑ i ∼ bk ( p 1/2k ri)ẑ i where i 1 = 1, . . . , s and i k = 1, . . . , s, and equip H with
1 + pdni the norm H , where
where, again, bk is a Butterworth-type low-pass filter with
d 2
cutoff frequency at (r p s/2k )−1 . Similar conclusions can v2H = vi (D)
i 1 ...i k
be drawn on the limiting cases as p → ∞ and p → 0, and t1 ,...,i k =1
L2
$$(-1)^k \Delta^k s_p = \frac{1}{p} \sum_{i=1}^{n} [z_i - s_p(t_i)]\,\delta_{t_i}$$

In fact, the following estimate has also been obtained for noisy data:

Theorem 66. Let z_i = g(t_i) + e_i, i = 1, . . . , n, where g is in H_2^k(D), and let e_1, . . . , e_n be independent identically distributed random variables with variance V. If the points t_i satisfy a quasiuniform distribution condition in D, then there exist positive constants c_1 and c_2, independent of n, such that for 0 ≤ p ≤ 1,

$$E\,|s_p(\cdot\,; Z) - g|_j^2 \le c_1 p^{(k-j)/k} + c_2 \frac{V}{n}\, p^{-(2j+s)/2k}$$

for $n^{-2k/s} \le p \le 1$, where 1 ≤ j ≤ k.

However, no general result on asymptotic optimality of cross-validation is available at the time of this writing.

XI. WAVELETS

The Fourier transform f̂(ω) of a function f(t) represents the spectral behavior of f(t) in the frequency domain. This representation, however, does not reflect time-evolution of frequencies. A classical method, known as short-time Fourier transform (STFT), is to window the Fourier integral, so that, by the Plancherel identity, the inverse Fourier integral is also windowed. This procedure is called time-frequency localization. This method is not very efficient, however, because a window of the same size must be used for both low and high frequencies. The integral wavelet transform (IWT), defined by

$$(W_\psi f)(b, a) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t - b}{a}\right) dt,$$

on the other hand, has the property that the time-window narrows at high frequency and widens at low frequency. This is seen by observing that, for real f and ψ,

$$(W_\psi f)(b, a) = \frac{\sqrt{a}}{\pi}\, \mathrm{Re} \int_0^{\infty} e^{-ib\omega} \hat{f}(\omega)\, g\!\left(a\left(\omega - \frac{\omega_0}{a}\right)\right) d\omega;$$

where g(ω) := ψ̂(ω + ω_0), and ω_0 is the center of the frequency window function ψ̂(ω). More precisely, the window in the time-frequency domain may be defined by

$$[\,b - a\Delta_\psi,\; b + a\Delta_\psi\,] \times \left[\frac{\omega_0}{a} - \frac{1}{a}\Delta_{\hat\psi},\; \frac{\omega_0}{a} + \frac{1}{a}\Delta_{\hat\psi}\right],$$

where Δ_ψ and Δ_ψ̂ denote the standard deviations of ψ and ψ̂, respectively, and we have identified the frequency by a constant multiple of a^{−1}. Hence, the IWT has the zoom-in and zoom-out capability.

This localization of time and frequency of a signal function f(t), say, not only makes filtering, detection, enhancement, etc., possible, but also greatly facilitates data reduction for the purpose of storage or transmission. The modified signal, however, must be reconstructed, and the best method for this reconstruction is by means of a wavelet series. To explain what a wavelet series is, we rely on the notion of multiresolution analysis as follows: Let {V_n} be a nested sequence of closed subspaces of L^2 = L^2(R), such that the intersection of all V_n is the zero function and the union of all V_n is dense in L^2. Then we say that {V_n} constitutes a multiresolution analysis of L^2 if for each n ∈ Z, we have f ∈ V_n ⇔ f(2·) ∈ V_{n+1} and if there exists some φ ∈ V_0 such that the integer translates of φ yield an unconditional basis of V_0. We will also say that φ generates a multiresolution analysis of L^2. Examples of φ are B-splines of arbitrary order and with Z as the knot sequence. Next, for each k ∈ Z, let W_k be the orthogonal complementary subspace of V_k relative to V_{k+1}. Then it is clear that the subspaces {W_k} are mutually orthogonal, and the orthogonal sum is all of L^2. Consequently, every f ∈ L^2 can be decomposed as an orthogonal sum of functions g_k ∈ W_k, k ∈ Z. It can be proved that there exists some function ψ whose integer translates form an unconditional basis of W_0. If this ψ, called a wavelet, is used as the window function in the IWT, then the "modified" signal f(t) can be reconstructed from IWT at dyadic values by means of a wavelet series.

Let us first introduce the dual ψ̃ ∈ W_0 of ψ defined (uniquely) by

$$\langle \tilde\psi, \psi(\cdot - n) \rangle = \delta_{n,0}, \quad n \in \mathbb{Z}.$$

Then, by using the notation

$$\psi_{k,j}(t) = \psi(2^k t - j)$$

where j, k ∈ Z, and the same notation for ψ̃_{k,j}, we have the following.

Theorem 67. For every f ∈ L^2,

$$f(t) = \sum_{k,j \in \mathbb{Z}} 2^{k/2} (W_\psi f)(j 2^{-k}, 2^{-k})\, \tilde\psi_{k,j}(t) = \sum_{k,j \in \mathbb{Z}} 2^{k/2} (W_{\tilde\psi} f)(j 2^{-k}, 2^{-k})\, \psi_{k,j}(t).$$

Algorithms are available to find the coefficients of the partial sums of this series and to sum the series using values of the coefficients. Since these coefficients are 2^{k/2} multiples of the IWT at (j2^{−k}, 2^{−k}) in the time-scale plane,
and (an , bn ), which are known as reconstruction and de- Where we have omitted the multiple of [(2m − 1)!]−1 for
composition sequences, respectively, a pyramid algorithm convenience. Then {an } and {bn } are determined by the
is realizable. Here, { pn } defines the φ that generates the Laurent series:
)
multiresolution analysis, namely: 1 (z)
G(z) = m (1 + z)m )m 2 = an z −n
φ(t) = pn φ(2t − n) 2 z m (z ) n∈Z
n∈Z
1 1
H (z) = − (1 − z)m ) 2 = bn z −n .
and {qn } relates ψ to φ by: 2 m z (z ) n∈Z
ψ(t) qn φ(2t − n). The duals Ñ m of Nm and ψ̃m of ψ can be computed by
n∈Z using the following formulas:
In addition, {an } and {bn } determine the decomposition
Ñ m (t) = 2am Ñ m (2t − n)
V1 = V0 ⊕ W0 by n∈Z
φ(2t − ) = a−2n φ(t − n) + b−2n ψ(t − n), ψ̃m (t) = 2bn Ñ m (2t − n).
n∈Z n∈Z n∈Z
for all ∈ Z.
If φ(t) = Nm (t) is the mth order B-spline with integer SEE ALSO THE FOLLOWING ARTICLES
knots and supp Nm = [0, m], then we have the following
result. FRACTALS • KALMAN FILTERS AND NONLINEAR FILTERS
• NUMERICAL ANALYSIS • PERCOLATION • WAVELETS,
Theorem 68. For each positive integer m, the min- ADVANCED
imally supported wavelet ψm corresponding to Nm (t) is
given by
BIBLIOGRAPHY
1
2m−2
ψm (t) = (−1) j N2m ( j + 1) Baker, G. A., Jr., and Graves-Morris, P. R. (1981). “Encyclopedia of
2m−1 j=0 Mathematics and Its Applications,” Padé Approximants, Parts I and
II, Addison-Wesley, Reading, Massachusetts.
m m
Braess, D. (1986). “Nonlinear Approximation Theory,” Springer-Verlag,
× (−1) Nm (2t − j − ).
=0
New York.
Chui, C. K. (1988). “Multivariate Splines,” CBMS-NSF Series in Ap-
plied Math. No. 54, SIAM, Philadelphia.
Furthermore, the reconstruction sequence pair is given by: Chui, C. K. (1992). “An Introduction to Wavelets,” Academic Press,
Boston.
−m+1 m
pn = s Chui, C. K. (1997). “Wavelets : A Mathematica Tool for Signal Analysis,”
n SIAM, Philadelphia.
Daubechies (1992). “Ten Lectures on Wavelets,” CBMS-NSF Series in
and Applied Math. No. 61, SIAM, Philadelphia.
m
Petrushev, P. P., and Popov, V. A. (1987). “Rational Approximation of
(−1)n m Real Functions,” Cambridge University Press, Cambridge, England.
qn = m−1
N2m (n − j + 1), Pinkus, A. (1985). “n-Widths in Approximation Theory,” Springer-
2 j=0
j
Verlag, New York.
Pinkus, A. (1989). “On L 1 -Approximation,” Cambridge University
with supp{ pn } = [0, m] and supp{qn } = [0, 3m − 2]. Press, Cambridge, England.
To describe the pair of decomposition sequences, we Wahba, G. (1990). “Spline Models for Observational Data,” CBMS-NSF
need the “Euler–Frobenius” polynomial Series in Applied Math. No. 59, SIAM, Philadelphia.
P1: GKB/LPB P2: GAE Revised Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN001D-72 May 19, 2001 13:37
Boolean Algebra
Raymond Balbes
University of Missouri-St. Louis
GLOSSARY

Atom Minimal nonzero element of a Boolean algebra.
Boolean homomorphism Function from one Boolean algebra to another that preserves the operations.
Boolean subalgebra Subset of a Boolean algebra that is itself a Boolean algebra under the original operations.
Complement In a lattice with least and greatest elements, two elements are complemented if their least upper bound is the greatest element and their greatest lower bound is the least element of the lattice.
Complete Pertaining to a Boolean algebra in which every subset has a greatest lower bound and a least upper bound.
Field of sets Family of sets that is closed under finite unions, intersections, and complements and also contains the empty set.
Greatest lower bound In a partially ordered set, the largest of all of the elements that are less than or equal to all of the elements in a given set.
Ideal In a lattice, a set that is closed under the formation of finite least upper bounds and contains all elements of the lattice that are less than elements in the ideal.
Lattice [distributive] Partially ordered set for which the least upper bound x + y and the greatest lower bound xy always exist [a lattice in which the identity x(y + z) = xy + xz holds].
Least upper bound In a partially ordered set, the smallest of all of the elements that are greater than or equal to all of the elements of a given set.
Partially ordered set Set with a relation “≤” that satisfies certain conditions.

A BOOLEAN ALGEBRA is an algebraic system consisting of a set of elements and rules by which the elements of this set are related. Systematic study in this area was initiated by G. Boole in the mid-1800s as a method for describing “the laws of thought” or, in present-day terminology, mathematical logic. The subject’s applications are much wider, however, and today Boolean algebras not only are used in mathematical logic, but since they form the basis for the logic circuits of computers, are of fundamental importance in computer science.
I. BASIC DEFINITIONS AND PROPERTIES

A. Definition of a Boolean Algebra

Boolean algebras can be considered from two seemingly unrelated points of view. We shall first give a definition in terms of partially ordered sets. In Section I.C, we shall give an algebraic definition and show that the two definitions are equivalent.

Definition. A partially ordered set (briefly, a poset) is a nonempty set P together with a relation ≤ that satisfies

1. x ≤ x for all x.
2. If x ≤ y and y ≤ z, then x ≤ z.
3. If x ≤ y and y ≤ x, then x = y.

For example, the set of integers with the usual ≤ relation is a poset. The set of subsets of any set together with the inclusion relation ⊆ is a poset. The positive integers with the | relation (recall that x | y means that y is divisible by x) are a poset. A subset C of a poset P is a chain if x ≤ y or y ≤ x for any x, y ∈ C. The integers themselves form a chain under ≤ but not under |, since 3 ≤ 5 but neither 3 nor 5 is divisible by the other.

In any poset, we write x < y when x ≤ y and x ≠ y. We also use y ≥ x synonymously with x ≤ y.

A subset S of a poset P is said to have an upper bound if there exists an element p ∈ P such that s ≤ p for all s ∈ S. An upper bound p for S is called a least upper bound (lub) provided that if q is any other upper bound for S then p ≤ q. Similarly, l is a greatest lower bound for S if l ≤ s for all s ∈ S, and if m ≤ s for all s ∈ S then m ≤ l.

Consider the poset 𝒫(X) of all subsets of a set X under inclusion ⊆. For S, T ∈ 𝒫(X), the set S ∪ T is an upper bound for S and T since S ⊆ S ∪ T and T ⊆ S ∪ T; clearly, any other set that contains both S and T must also contain S ∪ T, so S ∪ T is the least upper bound for S and T.

Least upper bounds (when they exist) are unique. Indeed, if z and z′ are both least upper bounds for x and y, then since z′ is an upper bound and z is least, we have z ≤ z′. Reversing the roles of z and z′, we have z′ ≤ z, so z = z′.

Denote the (unique) least upper bound, if it exists, by x + y. Thus, x ≤ x + y, y ≤ x + y, and if x ≤ z, y ≤ z then x + y ≤ z. Similarly x · y (or simply xy) denotes the greatest lower bound of x and y if it exists.

A nonempty poset P in which x + y and xy exist for every pair of elements x, y ∈ P, is called a lattice (the symbols ∨ and ∧ are often used as notation instead of + and ·). A lattice is called distributive if x(y + z) = xy + xz holds for all x, y, z. A poset P has a least element (denoted by 0) if 0 ≤ x for all x ∈ P and a greatest element (denoted by 1) if x ≤ 1 for all x ∈ P. It is evident that, when they exist, 0 and 1 are unique. A lattice L with 0 and 1 is complemented if for each x ∈ L there exists an element y ∈ L such that x + y = 1 and xy = 0. The elements x and y are called complements of one another.

Definition. A Boolean algebra is a poset B that is a complemented distributive lattice. That is, B is a poset that satisfies

1. For each x, y ∈ B, x + y and xy both exist.
2. 0 and 1 exist.
3. x(y + z) = xy + xz is an identity in B.
4. For each x ∈ B there exists y ∈ B such that x + y = 1 and xy = 0.

We shall soon see that, in a Boolean algebra, complements are unique. Thus, for each x ∈ B, x̄ will denote the complement of x.

B. Examples

Example. The two-element poset 2, which consists of {0, 1} together with the relation 0 < 1, is a Boolean algebra.

Example. Let X be a nonempty set and 𝒫(X) the set of all subsets of X. Then 𝒫(X), with ⊆ as the relation, is a Boolean algebra in which S + T = S ∪ T, ST = S ∩ T, 0 = ∅, 1 = X, and S̄ = X − S, where X − S = {x ∈ X | x ∉ S}. Note that if X has only one member then 𝒫(X) has two elements, ∅ and X, and is essentially the same as the first example.

Diagrams can sometimes be drawn to show the ordering of the elements in a Boolean algebra. Each node represents an element of the Boolean algebra. A node that is connected to one higher in the diagram is “less than” the higher one (Fig. 1).

Example. Let X be a set that is possibly infinite. A subset S of X is cofinite if its complement X − S is finite. The set of all subsets of X that are either finite or cofinite forms a Boolean algebra under inclusion.

The preceding example is a special case of the following.

Example. A field of subsets is a nonempty collection ℱ of subsets of a set X ≠ ∅ that satisfies

1. If S, T ∈ ℱ, then S ∩ T and S ∪ T are also in ℱ.
2. If S ∈ ℱ, then so is X − S.

A field of subsets is a Boolean algebra under the inclusion relation, and it is easy to see that
xz₁ = 0 and x + z₂ = 1, xz₂ = 0, then z₁ = z₂. Now to prove this, z₁ = z₁1 = z₁(x + z₂) = z₁x + z₁z₂ = 0 + z₁z₂ ≤ z₂. Similarly, z₂ ≤ z₁, so z₁ = z₂.

In a Boolean algebra B, any finite set S = {x₁, . . . , xₙ} has a least upper bound (. . . ((x₁ + x₂) + x₃) + · · ·) + xₙ. This can be proved by mathematical induction, and in fact the order and parentheses do not matter. That is, x₁ + (x₂ + x₃), (x₂ + x₁) + x₃, etc., all represent the same element, namely, the least member of B that is greater than or equal to x₁, x₂, and x₃. Thus, we use the notation x₁ + · · · + xₙ or $\sum S$ or $\sum_{i \in \{1,\dots,n\}} x_i$ for the least upper bound of a nonempty set S = {x₁, . . . , xₙ}. Similarly, x₁ . . . xₙ, $\prod S$, and $\prod_{i \in \{1,\dots,n\}} x_i$ represent the greatest lower bound of S. All of the properties listed above generalize in the obvious way. For example,

1. $\prod S \le s \le \sum S$ for all s ∈ S.
2. $x + \prod S = \prod \{x + s \mid s \in S\}$.
3. $\overline{\sum S} = \prod \{\bar{s} \mid s \in S\}$, $\overline{\prod S} = \sum \{\bar{s} \mid s \in S\}$.

Note that, if S = ∅, then 0 satisfies the criteria for being a least upper bound for the set S. So we extend the definitions of $\sum$ and $\prod$ to $\sum \emptyset = 0$, $\prod \emptyset = 1$.

As mentioned above, Boolean algebras can be defined as algebraic systems.

Definition. A Boolean algebra is a set B with two binary operations +, ·, a unary operation ¯, and two distinguished, but not necessarily distinct, elements 0, 1 that satisfies the following identities:

1. (x + y) + z = x + (y + z); (xy)z = x(yz).
2. x + y = y + x; xy = yx.
3. x + x = x; xx = x.
4. x + xy = x; x(x + y) = x.
5. x(y + z) = xy + xz; x + yz = (x + y)(x + z).
6. 0 + x = x; 1x = x.
7. x + x̄ = 1; xx̄ = 0.

A more concise definition of a Boolean algebra is as an idempotent ring (R, +, ·, 0), in the classical algebraic sense, with unity. In this case, the least upper bound is x + y + xy; the greatest lower bound is xy; 0 and 1 play their usual role; and x̄ = 1 − x.

Now that we have two definitions of a Boolean algebra, we must show that they are equivalent. Let us start with the poset definition of B. From the theorem in the last section, it is easy to see that by considering + and · as operations, rather than as least upper and greatest lower bounds, all of the above identities are true. On the other hand, suppose we start with a Boolean algebra B defined algebraically. We can then define a relation ≤ on B by x ≤ y if and only if xy = x. This relation is in fact a partial order on B since (1) xx = x implies x ≤ x; (2) if x ≤ y and y ≤ z, then xy = x, yz = y, so xz = (xy)z = x(yz) = xy = x, so x ≤ z; and (3) if x ≤ y and y ≤ x, then xy = x, yx = y, so x = xy = yx = y. If we continue in this way, x + y turns out to be the least upper bound of x and y under the ≤ that we have defined, xy is the greatest lower bound, 0 is the least element, 1 is the greatest, and x̄ is the complement of x. Thus B, under ≤, is a Boolean algebra.

Since both of these points of view have advantages, we shall use whichever is most appropriate. As we shall see, no confusion will arise on this account.

From the algebraic point of view, the definition of 2 and 𝒫({x, y}), shown in Fig. 1, could be given by Table I.

D. Subalgebras

A (Boolean) subalgebra is a subset of a Boolean algebra that is itself a Boolean algebra under the original operations.

A subset B₀ of a Boolean algebra that satisfies the conditions 1–3 below also satisfies the four conditions in the definition of a Boolean algebra. This justifies the following:

Definition. A subalgebra B₀ of a Boolean algebra B is a subset of B that satisfies

1. If x, y ∈ B₀, then x + y and xy ∈ B₀.
2. If x ∈ B₀, then x̄ ∈ B₀.
3. 0, 1 ∈ B₀.

Every subalgebra of a Boolean algebra contains 0 and 1; {0, 1} is itself a subalgebra. The collection of finite and cofinite subsets of a set X forms a subalgebra of 𝒫(X).

TABLE I Operator Tables for 2 and 𝒫({x, y})

Boolean algebra 2

    +  | 0  1        ·  | 0  1        x  | 0  1
    0  | 0  1        0  | 0  0        x̄  | 1  0
    1  | 1  1        1  | 0  1

Boolean algebra 𝒫({x, y}), where a = {x}, b = {y}

    +  | 0  a  b  1        ·  | 0  a  b  1        x  | 0  a  b  1
    0  | 0  a  b  1        0  | 0  0  0  0        x̄  | 1  b  a  0
    a  | a  a  1  1        a  | 0  a  0  a
    b  | b  1  b  1        b  | 0  0  b  b
    1  | 1  1  1  1        1  | 0  a  b  1
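The equivalence of the two definitions can be checked mechanically on a small example. The following Python sketch is an illustration added here, not part of the original article: it models the four-element algebra of subsets of {x, y} with frozensets, verifies identities 1–7 by brute force, and confirms that the order defined by x ≤ y if and only if xy = x coincides with set inclusion. The helper names (power_set, join, meet, comp, leq, satisfies_identities) are hypothetical and chosen only for this sketch.

```python
from itertools import combinations, product


def power_set(universe):
    """Return all subsets of `universe` as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


X = frozenset({"x", "y"})
B = power_set(X)          # the four elements 0, a, b, 1 of Table I
ZERO, ONE = frozenset(), X


def join(s, t):           # the operation +
    return s | t


def meet(s, t):           # the operation ·
    return s & t


def comp(s):              # the unary complement operation
    return X - s


def satisfies_identities(elements):
    """Brute-force check of identities 1-7 of the algebraic definition."""
    for x, y, z in product(elements, repeat=3):
        checks = [
            join(join(x, y), z) == join(x, join(y, z)),           # 1
            meet(meet(x, y), z) == meet(x, meet(y, z)),
            join(x, y) == join(y, x), meet(x, y) == meet(y, x),   # 2
            join(x, x) == x, meet(x, x) == x,                     # 3
            join(x, meet(x, y)) == x, meet(x, join(x, y)) == x,   # 4
            meet(x, join(y, z)) == join(meet(x, y), meet(x, z)),  # 5
            join(x, meet(y, z)) == meet(join(x, y), join(x, z)),
            join(ZERO, x) == x, meet(ONE, x) == x,                # 6
            join(x, comp(x)) == ONE, meet(x, comp(x)) == ZERO,    # 7
        ]
        if not all(checks):
            return False
    return True


def leq(x, y):
    """Order recovered from the algebra: x <= y iff xy = x."""
    return meet(x, y) == x


print(satisfies_identities(B))
print(all(leq(s, t) == (s <= t) for s in B for t in B))
```

Both printed values should be True for this example; the same functions can be pointed at any finite field of subsets by changing X.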
FIGURE 2 Coequivalence between homomorphisms and continuous functions. [Diagram relating f : B → B₁ and f̂ between S(B) and S(B₁), with x ↦ x̂ on each side.]

I ∈ S(B₁). Furthermore, $\hat{f}^{-1}[\hat{x}] = \widehat{f(x)}$ for each x ∈ B (Fig. 2).

This coequivalence can be very useful since problems in Boolean algebra can be transformed into questions or problems about Boolean spaces that may be more amenable to solution.

III. FREE BOOLEAN ALGEBRAS AND RELATED PROPERTIES

The basic fact about extending functions to homomorphisms is given by the following:

Theorem. Let B, B₁ be Boolean algebras, ∅ ≠ S ⊆ B, and f : S → B₁ a function; f can be extended to a homomorphism g : [S] → B₁ if and only if: (1) If $\prod T_1 \le \sum T_2$ then $\prod f[T_1] \le \sum f[T_2]$, where T₁ ∪ T₂ is any finite nonempty subset of S. (Note that this includes the cases $\prod T = 0 \Rightarrow \prod f[T] = 0$ and $\sum T = 1 \Rightarrow \sum f[T] = 1$, where T is a finite nonempty set.)

Sketch of Proof. To define g : [S] → B₁, we set $g\left(\sum_{i=1}^{n} \prod T_i\right) = \sum_{i=1}^{n} \prod f[T_i]$. The condition (1) is sufficient to prove that g is a well-defined homomorphism. Clearly, g | S = f.

A free Boolean algebra with free generators {xᵢ}ᵢ∈I is a Boolean algebra generated by {xᵢ}ᵢ∈I and such that any function f : {xᵢ | i ∈ I} → B₁ can be extended to a homomorphism. Free Boolean algebras with free generating sets of the same cardinality are isomorphic. For the existence of a free Boolean algebra with a free generating set of cardinality α, we start with a set X of cardinality α. For each x ∈ X, let Sₓ = {S ⊆ X | x ∈ S}. Now {Sₓ | x ∈ X} freely generates a subalgebra C of the power set of 𝒫(X). The set {Sₓ | x ∈ X} is said to be independent because it satisfies the following: $\bigcap_{x \in A} S_x \subseteq \bigcup_{y \in B} S_y$ implies A ∩ B ≠ ∅ for any finite nonempty subset A ∪ B of X. But this condition clearly implies that (1) holds, so C is free.

Free Boolean algebras have many interesting properties:

1. Every chain in a free Boolean algebra is finite or countably infinite.
2. Every free Boolean algebra has the property that, if S is a set in which xy = 0 for all x ≠ y in S, then S is finite or countably infinite. (This is called the countable chain condition.)
3. Countable atomless Boolean algebras are free.
4. The finite free Boolean algebras are all of the form 𝒫(𝒫(S)) for finite S.

The Boolean space corresponding to the free Boolean algebra on countably many free generators is the Cantor discontinuum. So for example, the topological version of “Every countable Boolean algebra is projective” is “Every closed subset of the Cantor discontinuum is a retract of itself.”

IV. COMPLETENESS AND HIGHER DISTRIBUTIVITY

The canonical work on this topic is “Boolean algebras” by R. Sikorski, and there are many more recent results as well. We present here only a brief introduction.

Definition. A Boolean algebra B is complete if $\sum S$ and $\prod S$ exist for all S ⊆ B.

For example, 𝒫(S) is complete, as is the Boolean algebra of regular open sets of a topological space. Here one must be careful, for if S is a family of regular open sets, ∩S may not be regular open. It can be shown that $\prod S = \mathrm{INT}(\cap S)$ and $\sum S = \mathrm{INT}(\mathrm{CL}(\cup S))$.

There are many useful ways to generalize the distributive law x(y + z) = xy + xz; we will restrict our attention to the following:

Definition. A Boolean algebra is completely distributive provided that, if $\{a_{ij}\}_{i \in I,\, j \in J}$ is a set of elements such that $\sum_{j \in J} a_{ij}$ exists for each i ∈ I, $\prod_{i \in I} \sum_{j \in J} a_{ij}$ exists, and $\prod_{i \in I} a_{i\varphi(i)}$ exists for each $\varphi \in J^I$, then $\sum_{\varphi \in J^I} \prod_{i \in I} a_{i\varphi(i)}$ exists and

$$\prod_{i \in I} \sum_{j \in J} a_{ij} = \sum_{\varphi \in J^I} \prod_{i \in I} a_{i\varphi(i)}.$$

A typical result involving these notions is the following:

Theorem. Let B be a Boolean algebra. Then the following are equivalent:

1. B is complete and completely distributive.
2. B is complete and every element is a sum of atoms.
3. B is isomorphic with the field of all subsets of some set.
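For a finite Boolean algebra, conditions 2 and 3 of this theorem can be verified directly. The sketch below is an illustration under assumptions not taken from the article: it uses the divisors of 30 ordered by divisibility, with the least common multiple as +, the greatest common divisor as ·, and 30/d as the complement of d (this works because 30 is squarefree). It finds the atoms, checks that every element is the sum of the atoms below it, and checks that sending an element to its set of atoms is an isomorphism onto the field of all subsets of the atoms. All function names are hypothetical.

```python
from math import gcd

N = 30  # squarefree, so the divisors of N form a Boolean algebra under |
B = [d for d in range(1, N + 1) if N % d == 0]


def join(a, b):
    """Least upper bound under divisibility: the least common multiple."""
    return a * b // gcd(a, b)


def meet(a, b):
    """Greatest lower bound under divisibility: the greatest common divisor."""
    return gcd(a, b)


def comp(a):
    """Complement of a; the least element is 1 and the greatest is N."""
    return N // a


def is_atom(a):
    """An atom is a minimal element different from the least element 1."""
    return a != 1 and not any(1 < b < a and a % b == 0 for b in B)


atoms = [a for a in B if is_atom(a)]


def atoms_below(a):
    """The set of atoms p with p <= a, that is, p dividing a."""
    return frozenset(p for p in atoms if a % p == 0)


def join_all(elements):
    """Sum of a finite set; the empty sum is the least element."""
    result = 1
    for e in elements:
        result = join(result, e)
    return result


# Condition 2: every element is a sum of atoms.
every_element_is_sum_of_atoms = all(join_all(atoms_below(a)) == a for a in B)

# Condition 3: a -> atoms_below(a) is a bijection onto all subsets of the
# atoms that carries +, ·, and complement to union, intersection, and
# set complement.
images = {atoms_below(a) for a in B}
is_bijection = len(images) == len(B) == 2 ** len(atoms)
preserves_operations = all(
    atoms_below(join(a, b)) == atoms_below(a) | atoms_below(b)
    and atoms_below(meet(a, b)) == atoms_below(a) & atoms_below(b)
    for a in B for b in B
) and all(atoms_below(comp(a)) == frozenset(atoms) - atoms_below(a) for a in B)

print(atoms, every_element_is_sum_of_atoms, is_bijection, preserves_operations)
```

Completeness is automatic in the finite case, so the sketch only exercises the atomic and representation parts of the theorem.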
A striking theorem of Sikorski, from which it follows Thus, Boolean algebras can be used as models for log-
that the injectives in the category of Boolean algebras are ical systems. A striking example of this is the simplifica-
those that are complete, can be formulated as follows: tion of the independence proofs of P. J. Cohen by means
of Boolean models. For example, by studying the appro-
Theorem. If B1 is a subalgebra of a Boolean alge- priate Boolean algebra, we can prove that the axiom of
bra B, A is a complete Boolean algebra, and f : B1 → A choice is not a consequence of set theory—even with the
is a homomorphism, then there exists a homomorphism continuum hypothesis.
g : B → A such that g | B1 = f . To illustrate, we start with the set S of formulas
α, β, γ , . . . of the classical propositional calculus. An
Proof. By Zorn’s lemma, there exists a homomor- equivalence relation ≡ is defined by identifying α and
phism f 1 : B0 → A that is maximal with respect to being β, provided the α → β and β → α are both derivable. A
an extension of f . We will, of course, be done if B0 = B, Boolean algebra is defined on the classes [α], [β], [γ ], . . .
so suppose B0 ⊂ B. Select a ∈ B − B0 and set by the partial ordering [α] ≤ [β] if and only if α → β is
derivable.
b= f 1 [{x ∈ B1 | x ≤ a} ∩ B0 ]. The resulting Boolean algebra is called the
Now, it can be shown that a ≤ x, x ∈ B0 ⇒ b ≤ f 1 (x), and Lindenbaum–Tarski algebra and is in fact the free
x ≤ a, x ∈ B0 ⇒ f 1 (x) ≤ b, so by our theorem on extend- Boolean algebras with free generators [α], [β], [γ ], . . . ,
ing functions to homomorphism, $f_1$ can be extended to $[\{a\} \cup B_0]$, contradicting the maximality of $f_1$.
Another interesting series of results concerns the embedding of Boolean algebras into ones that are complete. Specifically, a regular embedding $f : B \to B_1$ is a one-to-one homomorphism with the property that, if $\sum S$ exists in $B$, then $\sum f[S]$ exists in $B_1$ and is equal to $f(\sum S)$, and similarly for products.
To outline one of the basic results, we shall call an ideal closed if it is an intersection of principal ideals. The set $\bar{B}$ of all closed ideals forms a complete Boolean algebra under inclusion, and the map $v : B \to \bar{B}$, $v(x) = (x]$ is a regular embedding. The pair $(\bar{B}, v)$ is called the MacNeille completion of $B$. $\bar{B}$ is isomorphic with the Boolean algebra of regular open subsets of the Boolean space of $B$.
V. APPLICATIONS
A. Logic
One of the most important areas of application for Boolean algebras is logic. Indeed, many problems in logic can be reformulated in terms of Boolean algebras where a solution can be more easily obtained. For example, the fundamental compactness theorem for the (classical) propositional calculus is equivalent to the theorem that every element $a \neq 1$ in a Boolean algebra is contained in a maximal ideal.
where α, β, γ are the propositional variables. The construction of the Lindenbaum–Tarski algebra for the restricted predicate calculus is similar, and it is in this context that the independence proofs can be given.
B. Switching Functions and Electronics
Switching functions are mathematical formulas that enable us to describe and design certain parts of electronic circuits. They are widely used in both communications and computer applications.
First, we shall define some notation. We use $2^n$ to represent the set of all n-tuples of elements in 2. That is, $2^2 = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, and so on. For $\sigma \in 2^n$, we sometimes write $\sigma = (\sigma(1), \ldots, \sigma(n))$, so for $\sigma = (0, 1) \in 2^2$, we have $\sigma(1) = 0$ and $\sigma(2) = 1$.
A switching function is any function $f : 2^n \to 2$. For example, if n = 2, there are 16 switching functions, the values of which are given in Table II.
Explicitly, $f_7$ is defined by $f_7(0, 0) = 0$; $f_7(0, 1) = 1$; $f_7(1, 0) = 1$; $f_7(1, 1) = 0$.
For $x \in 2$ (that is, x = 0 or x = 1), we write $x^0 = \bar{x}$ and $x^1 = x$. Now for a fixed value of n, a switching function is called a complete product if it has the form $f(x_1, \ldots, x_n) = x_1^{\sigma(1)} \cdots x_n^{\sigma(n)}$, where $\sigma \in 2^n$. For example, if n = 3, then $f(x_1, x_2, x_3) = x_1 \bar{x}_2 x_3$ is a complete product, but $f(x_1, x_2, x_3) = \bar{x}_2 x_3$ is not.
         f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 f15 f16
(0, 0)   0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1
(0, 1)   0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1
(1, 0)   0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
(1, 1)   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1
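The definitions above can be checked mechanically. The following is a minimal Python sketch, not part of the original article: the helper names `switching_functions` and `complete_product` are hypothetical, and the numbering of the functions from 1 in the column order of the table is an assumption inferred from the stated values of $f_7$.

```python
from itertools import product

def switching_functions(n=2):
    """Return every function f: 2^n -> 2 as a dict from input tuples to 0/1.

    Assumption: functions are listed in the column order of the table,
    so bit j of the column index gives the value on the j-th input tuple.
    """
    inputs = list(product((0, 1), repeat=n))  # (0,0), (0,1), (1,0), (1,1) when n = 2
    return [
        {sigma: (code >> j) & 1 for j, sigma in enumerate(inputs)}
        for code in range(2 ** len(inputs))
    ]

def complete_product(sigma):
    """Return the complete product x1^sigma(1) ... xn^sigma(n) as a function."""
    def f(*x):
        # x^1 = x and x^0 = complement of x, so a factor is 1 exactly when x_i == sigma(i)
        return int(all(xi == si for xi, si in zip(x, sigma)))
    return f

fs = switching_functions(2)
f7 = fs[6]                     # the text numbers the functions starting from 1
print(len(fs))                 # number of switching functions for n = 2
print(f7)                      # values of f7 on the four input pairs

g = complete_product((1, 0, 1))          # x1 * (not x2) * x3
print([x for x in product((0, 1), repeat=3) if g(*x) == 1])
```

A complete product takes the value 1 on exactly one input tuple, namely σ itself, which is why the last line prints a single triple.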
P1: GKB/LPB P2: GAE Revised Pages
Encyclopedia of Physical Science and Technology EN001D-72 May 19, 2001 13:37
Calculus
A. Wayne Roberts
Macalester College
P1: GLQ Revised Pages
Encyclopedia of Physical Science and Technology En002c-76 May 17, 2001 20:27
$R_3 = \frac{1}{3}\left(\frac{1}{3}\right)^2 + \frac{1}{3}\left(\frac{2}{3}\right)^2 + \frac{1}{3}\left(\frac{3}{3}\right)^2$
Problem 2. Our second problem again makes reference to the graph of $y = x^2$. This time we seek the slope of the line tangent to the graph at P(1, 1). That is, with reference to Fig. 3, we wish to know the ratio of the “rise” BT to the “run” PB (or, stated another way, we wish to know the tangent of the angle that PT makes with the positive x axis). Our method, certainly known to Fermat (1601–1665), is again suggested by a sequence of pictures.
From the graphs in Fig. 4, we see in turn
(a) slope $PS_1 = \frac{B_1 S_1}{P B_1} = \frac{(1 + 1)^2 - 1^2}{1} = 3$
(b) slope $PS_2 = \frac{B_2 S_2}{P B_2} = \frac{(1 + 1/2)^2 - 1^2}{1/2} = \frac{5}{2}$
(c) slope $PS_3 = \frac{B_3 S_3}{P B_3} = \frac{(1 + 1/3)^2 - 1^2}{1/3} = \frac{7}{3}$
And again there is a pattern that enables us to see where we will be after n steps:
slope $PS_n = \frac{B_n S_n}{P B_n} = \frac{(1 + 1/n)^2 - 1^2}{1/n}$ (2)
FIGURE 4 The curve is $y = x^2$.
The slope of the desired line is evidently the limiting value of this infinite process.
Our interest in both Eqs. (1) and (2) focuses on what happens as n increases indefinitely. Using the symbol →∞ to represent the idea of getting large without bound, mathematicians summarize their interest in Eq. (1) by asking for
$\lim_{n \to \infty} R_n = \lim_{n \to \infty} \left[ \frac{1}{n}\left(\frac{1}{n}\right)^2 + \frac{1}{n}\left(\frac{2}{n}\right)^2 + \cdots + \frac{1}{n}\left(\frac{n}{n}\right)^2 \right]$
and in Eq. (2) by asking for
$\lim_{n \to \infty} \frac{B_n S_n}{P B_n} = \lim_{n \to \infty} \frac{(1 + 1/n)^2 - 1^2}{1/n}$
II. LIMITS
Calculus is sometimes said to be the study of limits. That is
because the nature of an infinite process is that we cannot
carry it to completion. We must instead make a careful
analysis to see if there is some limiting position toward
which things are moving.
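One way to watch these two infinite processes settle down is to compute a few terms. The sketch below is illustrative only; the function names `R` and `secant_slope` are hypothetical, and exact fractions are used so that the pattern is not blurred by rounding.

```python
from fractions import Fraction

def R(n):
    """Eq. (1): sum of (1/n) * (i/n)^2 for i = 1, ..., n, computed exactly."""
    return sum(Fraction(1, n) * Fraction(i, n) ** 2 for i in range(1, n + 1))

def secant_slope(n):
    """Eq. (2): ((1 + 1/n)^2 - 1^2) / (1/n), computed exactly."""
    h = Fraction(1, n)
    return ((1 + h) ** 2 - 1) / h

for n in (1, 2, 3, 10, 100, 1000):
    # The first column of values drifts toward 1/3, the second toward 2.
    print(n, float(R(n)), float(secant_slope(n)))
```

The printed values never reach their limits for any finite n, which is the point of the section: the limit describes where the process is heading, not a value attained along the way.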
A. Algebraic Expressions
Limits of certain algebraic expressions can be found by
the use of simplifying formulas. Archimedes was greatly
aided in his study of the area under a parabola because he
had discovered
$1^2 + 2^2 + \cdots + n^2 = \frac{1}{6} n(n + 1)(2n + 1)$ (3)
FIGURE 3 The curve is $y = x^2$.
Sometimes the limit of a process is evident from a very minor change of form. The limit of the sum (2) above is easily seen if we write it in the form
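$0$              $0$
$\frac{1}{4}$    $2\left(\frac{1}{2}\right) - \frac{1}{16} = \frac{15}{16}$
$\frac{1}{2}$    $2\sqrt{\frac{1}{2}} - \frac{1}{4} \approx 1.2$
$1$              $2\sqrt{1} - 1 = 1$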
We now ask how fast, in inches per second, y is increasing when x = 2. One way to get started is to find the average rate at which y increased during the third minute.
$\frac{(\text{height at } x = 3) - (\text{height at } x = 2)}{\text{elapsed time}} = \frac{f(3) - f(2)}{3 - 2}$
Since the rate is slowing down as time passes, however, it is clear that we would get a better approximation to the rate at x = 2 if we used a smaller time interval near x = 2. Again we let h represent some short interval of time after x = 2. Again we see that the desired rate at x = 2 will be given by the limit of
$\frac{(\text{height at } x = 2 + h) - (\text{height at } x = 2)}{h} = \frac{f(2 + h) - f(2)}{h}$
Finally, look again at our introductory Problem 2. Using the notation $f(x) = x^2$ now available to us, we see that Eq. (2) may be written in the form
slope $PS_n = \frac{f(1 + 1/n) - f(1)}{1/n}$
We find the slope of the line tangent to the graph at x = 1 by taking the limit of this expression as $n \to \infty$. Similar reasoning at an arbitrary point x, making the substitution h = 1/n, would lead us to the conclusion that the slope of a line tangent to the graph at x is given by
slope $= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$ (11)
whenever this limit exists.
The expression that we have encountered in Eqs. (9), (10), and (11) turns up time and again in the applications of mathematics. For a given function f, the derived function or derivative $f'$ is the function defined at x by
$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
It is important to recognize that the meaning of the deriva-
tive is not restricted to just one interpretation. The three
important ones that we have discussed are summarized in
Table II. Thus, the computational techniques developed
by Fermat and other mathematical pioneers have turned
out to be of central importance in modern mathematical
analysis.
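The defining limit can be explored numerically by shrinking h. This is a rough sketch under the assumption that ordinary floating-point arithmetic is accurate enough for the step sizes shown; `difference_quotient` and `square` are hypothetical names introduced for illustration.

```python
def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h whose limit as h -> 0 defines f'(x)."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2

for x in (1.0, 2.0):
    for h in (1.0, 0.1, 0.01, 0.001):
        # For f(x) = x^2 the quotient equals 2x + h up to rounding.
        print(x, h, difference_quotient(square, x, h))
```

Whether the number printed is read as a slope, a velocity, or a rate of change depends only on what x and f(x) measure; the computation is the same in each case.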
the line tangent to the graph of $y = x^2$ at x = 1. For $f(x) = x^2$, $f(x + h) = (x + h)^2 = x^2 + 2xh + h^2$, so
$\frac{f(x + h) - f(x)}{h} = \frac{x^2 + 2xh + h^2 - x^2}{h} = \frac{h(2x + h)}{h}$
$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} (2x + h) = 2x$ (12)
Finally, we note that differentiation is a linear operation. This means that if f and g are differentiable functions and r and s are real numbers, then the function h defined by $h(x) = r f(x) + s g(x)$ is differentiable, and $h'(x) = r f'(x) + s g'(x)$.
C. The Derivative and Graphing
FIGURE 9 The tangent line is parallel to the secant line.
FIGURE 11 The top curve is a sine wave.
D. Four Important Applications
1. Falling Bodies
We have seen that if $y = f(x)$ describes the distance y that a body moves in time x, then $f'(x)$ gives the velocity at time x. Let us take this another step. Suppose the velocity is changing with time. That rate of change, according to the second principle in Table II, will be described by the derivative of $f'(x)$, designated by $f''(x)$. The change of velocity with respect to time is called acceleration; acceleration at time x is $f''(x)$.
When an object is dropped, it picks up speed as it falls. This is called the acceleration due to gravity. There is speculation as to how Galileo (1564–1642), with the measuring instruments available to him, came to believe that the acceleration due to gravity is constant. But he was right; the acceleration is designated by g and is equal to 32.2 ft/sec$^2$.
Now put the two ideas together. The acceleration is $f''(x)$; the acceleration is constant; $f''(x) = -g$. We use $-g$ because the direction is down. What function, if differentiated, gives the constant $-g$? We conclude that $f'(x) = -gx + c$. The c is included because the derivative of a constant is 0, so we can only be sure of $f'(x)$ up to a constant. Since $f'(x)$ is the velocity at time x, and since $f'(0) = c$, the constant c is usually designated by $\upsilon_0$, which stands for the initial velocity (allowing for the possibility that the object was thrown instead of dropped).
The function $f'(x) = -gx + \upsilon_0$ is called the antiderivative of $f''(x) = -g$. Now to find the distance y that the object has fallen, we seek the antiderivative of $f'(x)$; what function was differentiated to give $f'(x) = -gx + \upsilon_0$? Contemplation of the derivatives computed above leads to the conclusion that
$y = f(x) = -\frac{1}{2} g x^2 + \upsilon_0 x + y_0$
this time the constant $y_0$ represents the initial height from which the object was dropped or thrown.
This is a differential equation. We are asked to find a function, the derivative of which is a constant multiple of the function with which we started. If we recall from Eq. (16) that $E'(x) = E(x)$ for $E(x) = e^x$, we can quite easily convince ourselves that a solution to (17) is
$p = f(x) = e^{kx}$ (18)
This explains why it is said that population grows exponentially.
We need not be talking about people on an island. We might be talking about bacteria growing in a culture, or ice melting in a lake. Anytime the general principle is that growth (or decay) is proportional to the amount present, Eq. (17), perhaps with negative k, describes the action and Eq. (18) provides a solution.
3. Maxima and Minima
The usefulness in practical engineering and scientific problems of our ability to find a high or low point on a graph can be hinted at with the following simple problem. Suppose that a box is to be made from a 3-ft-square piece of sheet metal by cutting squares from each corner, then folding the edges up (see Fig. 12) and welding the seams along the corners. If the edges of the removed squares are x in length, the volume of the box is given by the function
$y = f(x) = x(3 - 2x)^2 = 4x^3 - 12x^2 + 9x$
which is graphed in Fig. 13. The derivative is
$f'(x) = 12x^2 - 24x + 9 = 3(2x - 1)(2x - 3)$
from which we see that with x restricted by the nature of the problem to be between 0 and $\frac{3}{2}$, the maximum volume is obtained by choosing $x = \frac{1}{2}$.
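The conclusion of the box problem can be cross-checked by brute force. The following sketch is only an illustration: it evaluates the volume on a grid of cut sizes between 0 and 3/2 (the grid spacing of 0.001 is an arbitrary choice) and compares the best grid point with the zeros of the derivative. The names `volume` and `volume_derivative` are hypothetical.

```python
def volume(x):
    """Volume of the box when squares of side x are cut from a 3-ft square."""
    return x * (3 - 2 * x) ** 2

def volume_derivative(x):
    """Derivative of the volume: 12x^2 - 24x + 9 = 3(2x - 1)(2x - 3)."""
    return 12 * x ** 2 - 24 * x + 9

# Candidate cut sizes 0, 0.001, ..., 1.5 (the range allowed by the problem).
candidates = [i / 1000 for i in range(0, 1501)]
best = max(candidates, key=volume)

print(best, volume(best))                 # best grid point and its volume
print(volume_derivative(0.5), volume_derivative(1.5))   # both critical points
```

The derivative vanishes at both x = 1/2 and x = 3/2, but the second gives a box of zero volume, so only the first is the maximum.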
V. INTEGRATION
$+ f(t_n)(x_n - x_{n-1})$ (21)
where $t_i$ is chosen to satisfy $x_{i-1} \le t_i \le x_i$. Figure 2 shows the areas obtained if we choose n = 1, 2, and 3, respectively, if we choose intervals of equal length and if in every case we choose $t_i = x_i$, the right-hand endpoint. If we choose n intervals of equal length so that $x_i - x_{i-1} = 1/n$ for every i, and if we choose $t_i = x_i = i/n$, then Eq. (21) becomes the sum $R_n$ given in Eq. (1).
For a large class of functions, it happens that by specifying a small enough number g, we can guarantee that if $|P| < g$, then the sum $R(f, P, \{t_i\})$ will be very close to some fixed number, and this will be true no matter how the tags $\{t_i\}$ are chosen in the subintervals. Such a function f is said to be Riemann integrable on the interval from a to b. The value around which the sums congregate is designated by an elongated S, stretched between an a and a b, and set before the functional symbol f:
$\int_a^b f$
This number is called the integral of f.
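Eq. (21) translates almost directly into code. The sketch below is a minimal illustration rather than a numerical-analysis recipe; `riemann_sum`, `equal_partition`, and `square` are hypothetical names, and the choice of $f(x) = x^2$ on the interval from 0 to 1 simply mirrors the running example.

```python
def riemann_sum(f, partition, tags):
    """Eq. (21): the sum of f(t_i) * (x_i - x_{i-1}) over all subintervals."""
    total = 0.0
    for i in range(1, len(partition)):
        x_prev, x_i, t_i = partition[i - 1], partition[i], tags[i - 1]
        assert x_prev <= t_i <= x_i, "each tag must lie in its subinterval"
        total += f(t_i) * (x_i - x_prev)
    return total

def equal_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def square(x):
    return x * x

for n in (3, 10, 100, 1000):
    P = equal_partition(0.0, 1.0, n)
    right = riemann_sum(square, P, P[1:])    # tags at right-hand endpoints
    left = riemann_sum(square, P, P[:-1])    # tags at left-hand endpoints
    print(n, left, right)
```

With right-hand tags and n = 3 this reproduces $R_3$; the left-hand and right-hand sums close in on a common value as n grows, which is the behavior that integrability requires of every choice of tags.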
Which functions are integrable? Numerous answers
can be given. Monotone increasing functions, continuous
functions, and bounded functions continuous at all but a
finite number of points are all integrable. Suffice it to say
that most functions that turn up in applied mathematics
are integrable. This is also a good place to say that, following
the work of Lebesgue (1875–1941), the integral we
have defined has been generalized in numerous ways so
as to enlarge the class of integrable functions.
FIGURE 16 The curve is $y = x^2$. The intervals may be unequal, the $t_i$ located differently (not at the center, or always the same proportion of the way between $x_{i-1}$ and $x_i$) within the intervals.
B. Evaluation of Integrals
No matter what the source (distance traveled, work done, etc.) of a Riemann sum, the corresponding integral may be interpreted as an area. Thus, for $f(x) = x$, we easily see (Fig. 17) that
$\int_0^1 f = \int_0^1 x = \frac{1}{2}$
FIGURE 17 Interpreting an integral as an area.
When a function is known to be Riemann integrable, then the fact that all Riemann sums get close to the correct value for partitions with sufficiently small gauge means that we may use any convenient sum. We have seen that for $f(x) = x^2$, one such sum was given by $R_n$ as expressed in Eq. (1):
$R_n = \frac{1}{n}\left(\frac{1}{n}\right)^2 + \frac{1}{n}\left(\frac{2}{n}\right)^2 + \cdots + \frac{1}{n}\left(\frac{n}{n}\right)^2 = \frac{1}{n^3}(1^2 + 2^2 + \cdots + n^2)$
We have already said that Archimedes was helped in his work because he knew Eq. (3), enabling him to write
$R_n = \frac{1}{n^3} \cdot \frac{1}{6} n(n + 1)(2n + 1) = \frac{1}{6}\left(\frac{n}{n}\right)\left(\frac{n + 1}{n}\right)\left(\frac{2n + 1}{n}\right)$
$R_n = \frac{1}{6}\left(1 + \frac{1}{n}\right)\left(2 + \frac{1}{n}\right)$
Since the partition used here had n equal intervals, the requirement that the gauge get smaller is equivalent to requiring that n get larger. Thus,
$\int_0^1 x^2 = \lim_{n \to \infty} R_n = \frac{1}{6}(1)(2) = \frac{1}{3}$
With almost no extra effort, the length of the interval can be extended to an arbitrary b, and it can be determined that
$\int_0^b x^2 = \frac{b^3}{3}$
Proceeding in just this way, the mathematician Cavalieri (1598–1647) was able to find formulas similar to Eq. (3) for sums of the form $1^k + 2^k + \cdots + n^k$ for k = 1, 2, . . . , 9. He was thus able to prove that
$\int_0^b x^k = \frac{b^{k+1}}{k + 1}$ (22)
for all positive integers up to 9, and he certainly correctly anticipated that Eq. (22) holds for all positive integers. But each value of k presented a new problem, and as we said, Cavalieri himself stalled on k = 10. Reasonable people will be discouraged by the prospect of finding $\int_a^b f$ for more complicated functions. Indeed, the difficulty of such calculations made most such problems intractable until the time of Newton (1642–1727) and Leibniz (1646–1716), when discovery of the fundamental theorem of calculus, discussed below, brought such problems into the realm of feasibility.
The computer age has changed all that. For a given function, it is now possible to evaluate Riemann sums very rapidly. Along with this possibility have come increased emphases on variations of Riemann sums that approximate $\int_a^b f$. These are properly studied in courses on numerical analysis.
Before leaving the topic of evaluation, we note that, like differentiation, integration is a linear operator. For two integrable functions f and g and two real numbers r and s,
$\int_a^b (r f + s g) = r \int_a^b f + s \int_a^b g$
Thus, using what we know from our calculations above,
$\int_0^1 (4x - 2x^2) = 4 \int_0^1 x - 2 \int_0^1 x^2 = 4\left(\frac{1}{2}\right) - 2\left(\frac{1}{3}\right) = \frac{4}{3}$
C. Applications
It seems more difficult to group the diverse applications of integration into major classifications than is the case for differentiation. We shall indicate the breadth of possibilities with several examples typically studied in calculus.
1. Volumes
Suppose the graph of $y = x^2$ is rotated about the x axis to form a so-called solid of revolution that is cut off by the
plane x = 1 (Fig. 18). A partition of the x axis from 0 to 1 now determines a series of planes that cut off disks, each disk having a volume approximated by $\pi y_i^2 (x_i - x_{i-1})$ where $y_i = t_i^2$ for some $t_i$ between $x_{i-1}$ and $x_i$. Summing these volumes gives
$\pi t_1^4 (x_1 - x_0) + \pi t_2^4 (x_2 - x_1) + \cdots + \pi t_n^4 (x_n - x_{n-1})$
which has the form of a Riemann sum for a function $f(x) = \pi x^4$. It converges to an integral that we can evaluate with the help of Eq. (22).
Volume $= \int_0^1 \pi x^4 = \pi \int_0^1 x^4 = \frac{1}{5}\pi$
FIGURE 18 The curve $y = x^2$ has been rotated about the x axis to generate a solid figure.
2. Length of Arc
Suppose the points A and B are connected by a curve that is the graph of $y = g(x)$ for x between a and b (Fig. 19). The length of this curve is clearly approximated by the sum of the lengths of line segments joining points that have as their first coordinates the points of a partition of the x axis from a to b. A typical segment has length
$l = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2} = \sqrt{(x_i - x_{i-1})^2 + [g(x_i) - g(x_{i-1})]^2}$
If the function g is differentiable, then from the Mean Value Theorem, we see that
$g(x_i) - g(x_{i-1}) = g'(t_i)(x_i - x_{i-1})$
where $t_i$ is between $x_{i-1}$ and $x_i$. Thus
$l = \sqrt{1 + [g'(t_i)]^2}\,(x_i - x_{i-1})$
and the sum of these lengths is
$\sqrt{1 + [g'(t_1)]^2}\,(x_1 - x_0) + \cdots + \sqrt{1 + [g'(t_n)]^2}\,(x_n - x_{n-1})$
This has the form of a Riemann sum for the function $f(x) = \sqrt{1 + [g'(x)]^2}$. It converges to an integral we take to be the desired
Length $= \int_a^b \sqrt{1 + [g'(x)]^2}$
FIGURE 19 An arbitrarily drawn curve, together with a sequence of secant lines being used to (roughly) approximate the length of the curve.
3. Normal Distribution
If in a certain town we measure the heights of all the women, or the IQ scores of all the third graders, or the gallons of water consumed in each single family dwelling, we will find that the readings cluster around a number called the mean, $\bar{x}$. A common display of the readings uses a bar graph, a graph in which the percentage of the readings that fall between $x_{i-1}$ and $x_i$ is indicated by the height of a bar drawn over the appropriate interval (Fig. 20a). The sum of all the heights (all the percentages) should, of course, be 1.
As the size of the intervals is decreased and the number of data points is increased, it happens in a remarkable number of cases that the bars arrange themselves under the so-called normal distribution curve (Fig. 20b) that has an equation of the form
$y = d e^{-m(x - \bar{x})^2}$
FIGURE 20 A pair of histograms, the second one indicating how an increasing number of columns leads to the concept of a distribution curve.
The constants are related to the relative spread of the bell-shaped curve, and they are chosen so that the area under the curve is 1. The percentage of readings that fall between a and b is then given by
$\int_a^b d e^{-m(x - \bar{x})^2}$
VI. FUNDAMENTAL THEOREM OF CALCULUS
Up to this point, calculus seems to be neatly separated into two main topics, differential calculus and integral calculus. Historically, the two topics did develop separately, and considerable progress had been made in both areas. The genius of Newton and Leibniz was to see and exploit the connection between the integral and the derivative.
A. Integration by Antidifferentiation
Let $y = f(x)$ be the distance that a moving object has traveled from some fixed starting point in time x. We have seen (Table II) that the velocity of the object at time x is then given by $\upsilon = f'(x)$. Since the value of $f'(x)$ is also the slope of the line tangent at (x, y) to the graph of $y = f(x)$, a general sketch of $\upsilon = f'(x)$ may be drawn by looking at the graph of $y = f(x)$; see Fig. 21.
FIGURE 21 A graph of $y = f(x)$ on the top axes with two tangent segments having a slope of 1, an intermediate segment with slope >1, and a low point with slope 0. The lower curve, obtained by plotting the slopes of representative tangent lines, is the graph of $y = f'(x)$.
From the graph of $y = f(x)$, we see that from time x = a to time x = b,
Distance traveled $= f(b) - f(a)$ (23)
At the same time, we argued in setting up Eq. (19) that if the velocity υ of a moving object is given by $\upsilon = g(t)$, then the distance traveled would be approximated by what we came later to see was a Riemann sum that converged to
$\int_a^b g$
Since here $g = f'$, comparison with Eq. (23) gives
$\int_a^b f' = f(b) - f(a)$
The function being integrated is $f'$, the derivative of the function f on the right side of the equal sign. The function f is in turn called the antiderivative of $f'$. The relationship lies at the heart of the calculus.
Fundamental theorem of calculus. Let F be any antiderivative of f; that is, let F be a function such that $F'(x) = f(x)$. Then
$\int_a^b f = F(b) - F(a)$
B. Some Consequences
In Eq. (15), we saw that if $f(x) = x^r$, then $f'(x) = r x^{r-1}$. It follows that the antiderivative of $f(x) = x^r$ would, for $r \neq -1$, be
$F(x) = \frac{x^{r+1}}{r + 1}$
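The fundamental theorem can be tested numerically by comparing a Riemann sum for $f(x) = x^r$ with $F(b) - F(a)$, where $F(x) = x^{r+1}/(r+1)$ is the power-rule antiderivative. The sketch below is illustrative only: midpoint tags, 10,000 subintervals, and the interval from 1 to 2 are arbitrary choices, and the helper names are hypothetical.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Riemann sum for f on [a, b] with n equal subintervals and midpoint tags."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def power(r):
    """The function f(x) = x^r."""
    return lambda x: x ** r

def power_antiderivative(r):
    """An antiderivative F(x) = x^(r+1) / (r+1) of x^r, valid for r != -1."""
    if r == -1:
        raise ValueError("the power rule does not give an antiderivative for r = -1")
    return lambda x: x ** (r + 1) / (r + 1)

a, b = 1.0, 2.0
for r in (2, 3, 0.5, -2):
    F = power_antiderivative(r)
    # The two columns should agree to many decimal places.
    print(r, midpoint_sum(power(r), a, b), F(b) - F(a))
```

The left-hand column costs thousands of function evaluations per entry; the right-hand column costs two, which is the practical force of the theorem.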
TABLE IV
In a table of values accurate to seven places to the right of the decimal, one must get to 5 degrees before any difference shows up between sin x and its polynomial approximation S(x); and the polynomial approximations S(x) and C(x) give six-place accuracy for sin x and cos x for values of x all the way through 20 degrees.
Though manufacturers of calculators use variations, the idea illustrated in Table IV gives correct insight into how calculators find values of trigonometric functions; they make use of polynomial approximations. Greater accuracy can be obtained by using Taylor polynomials of higher degree. It is sufficient, of course, to obtain accuracy up to the number of digits to the right of the decimal that are displayed by the calculator.
The last function listed in Table III, Arc tan x, is the inverse tangent function. If y = Arc tan x, then x = tan y. It is included so we can address another question about which people sometimes wonder. Since $\tan 30^\circ = \frac{1}{\sqrt{3}}$ and $30^\circ = \frac{\pi}{6}$ radians, substitution of $x = \frac{1}{\sqrt{3}}$ into the polynomial approximation for Arc tan x gives
$\frac{\pi}{6} \approx \frac{1}{\sqrt{3}} - \frac{1}{3(\sqrt{3})^3} + \frac{1}{5(\sqrt{3})^5} - \frac{1}{7(\sqrt{3})^7} = 0.5230$
Multiplication by 6 gives us the approximation $\pi \approx 3.138$. Rounding to two decimals to the right of the decimal, we get the familiar approximation of 3.14. More accuracy can be obtained by using a Taylor polynomial of higher degree, and there are tricks that yield much better approximations using just the 7th degree polynomial employed here, but again we stop, having illustrated our point. Familiar constants such as π (and e) can be approximated using Taylor polynomials.
While our discussion of infinite series has given some indication of the usefulness of the idea, it is important to indicate that a great deal more could be said. Thus, while Taylor series provide an important way to represent functions, they should be seen as just one such representation. Fourier series provide another useful representation, and there are others.
Also, while we have approached infinite series by way of representing functions, we might well have started with the representation of individual numbers. The familiar use of $0.333\cdots = \frac{1}{3}$ is really a statement saying that the infinite sum
$\frac{3}{10} + \frac{3}{10^2} + \frac{3}{10^3} + \cdots$
gets closer and closer to $\frac{1}{3}$; that is, the finite sums
$S_n = \frac{3}{10} + \frac{3}{10^2} + \cdots + \frac{3}{10^n}$
get closer to $\frac{1}{3}$ as n, the number of terms, increases.
The addition of an infinite number of terms is not a trivial subject. The history of mathematics includes learned discussions of what value to assign to
$1 - 1 + 1 - 1 + 1 - \cdots$
Some argued that an obvious grouping shows that
$(1 - 1) + (1 - 1) + (1 - 1) + \cdots = 0$
Others countered that
$1 - (1 - 1) - (1 - 1) - \cdots = 1$
and Leibniz, one of the developers of calculus, suggested that the proper value would therefore be $\frac{1}{2}$. There are other instances in which famous mathematicians have challenged one another to find the value of an infinite series of constants. James Bernoulli (1654–1705) made it clear that he would like to know the sum of
$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots + \frac{1}{n^2} + \cdots$
but it wasn’t until 1736 that Euler discovered that this sum got closer and closer to $\frac{\pi^2}{6}$.
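The numerical claims in this passage are easy to reproduce. The following sketch is offered as an illustration only; `arctan_poly`, `decimal_partial_sum`, and `basel_partial_sum` are hypothetical names, and the degree-7 polynomial is the one written out in the approximation for π/6 above.

```python
import math

def arctan_poly(x):
    """Degree-7 Taylor polynomial for Arc tan x: x - x^3/3 + x^5/5 - x^7/7."""
    return x - x ** 3 / 3 + x ** 5 / 5 - x ** 7 / 7

def decimal_partial_sum(n):
    """S_n = 3/10 + 3/10^2 + ... + 3/10^n."""
    return sum(3 / 10 ** k for k in range(1, n + 1))

def basel_partial_sum(n):
    """1 + 1/4 + 1/9 + ... + 1/n^2."""
    return sum(1 / k ** 2 for k in range(1, n + 1))

x = 1 / math.sqrt(3)              # tan(30 degrees)
pi_estimate = 6 * arctan_poly(x)
print(pi_estimate, math.pi)       # the estimate agrees with pi to two decimals

for n in (1, 2, 5, 10):
    print(n, decimal_partial_sum(n))          # approaches 1/3

for n in (10, 100, 1000):
    print(n, basel_partial_sum(n), math.pi ** 2 / 6)   # slow approach to pi^2/6
```

The decimal sums converge quickly, gaining a digit per term, whereas Bernoulli's series converges slowly, which may help explain why its value resisted discovery for so long.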
VIII. SOME HISTORICAL NOTES
Archimedes, with his principle of exhaustion, certainly had in hand the notion of approaching a limiting value with a sequence of steps that could be carried on ad infinitum, but he was limited by, among other things, the lack of a convenient notation. The algebraic notation came much later, and the idea of relating algebra to geometry was later still, often attributed to Descartes (1596–1650), but arguably owing more to Fermat (1601–1665). Fermat; Barrow (1630–1677), Newton’s teacher; Huygens (1629–1695), Leibniz’s teacher; and others set the stage in the first half of the 17th century, but the actual development of the calculus is attributed to Isaac Newton (1642–1727) and Gottfried Wilhelm Leibniz (1646–1716), two geniuses who worked independently and were ultimately drawn into arguments over who developed the subject first.
Evidence seems to substantiate Newton’s claim to primacy, but his towering and deserved reputation as one of the greatest thinkers in the history of mankind is surely owing not so much to what he invented as to what he did with it. In Newton’s hands, calculus was a tool for changing the way humans understood the universe. Using calculus to extrapolate from his law of universal gravitation and other laws of motion, Newton was able not only to analyze the motion of freely falling bodies on earth, but also to explain, even predict, the motions of the planets. He was widely regarded as having supernatural insight, a reputation the poet Alexander Pope caught with the lines,
Nature and Nature’s laws lay hid in night:
God said, “Let Newton be!” and all was light.
Leibniz, on the other hand, developed the notation that made the calculus comprehensible to others, and he gathered around himself the disciples that took the lead away from Newton’s followers in England, and made the continent the center of mathematical life for the next century. James (1654–1705) and John (1667–1748) Bernoulli, Leonhard Euler (1707–1783), and others applied calculus to a host of problems and puzzles as well as applied problems.
The first real textbook on calculus was Analyse des Infiniment Petits pour l’Intelligence des Lignes Courbes, written by the Marquis de l’Hospital in 1696. It wasn’t until 1816, however, that a text by Lacroix, Traité du Calcul Différentiel et du Calcul Intégral, having been well received in France for many years, was translated into English and so brought the methods used on the continent to the English speaking world. Through all of these books there ran an undercurrent of confusion about the nature of the infinitesimal, and it was not until the work of Cauchy (1789–1857) and Weierstrass (1815–1897) that logical gaps were closed and calculus took the form that we recognize today.
The most recent changes in the teaching of calculus grew out of an effort during the decade from 1985 to 1995 to reform the way calculus was to be taught in the United States. A summary of that effort is presented in Calculus, The Dynamics of Change.
SEE ALSO THE FOLLOWING ARTICLES
ALGEBRAIC GEOMETRY • DIFFERENTIAL EQUATIONS, ORDINARY • DIFFERENTIAL EQUATIONS, PARTIAL • INTEGRAL EQUATIONS • NUMERICAL ANALYSIS • STOCHASTIC PROCESSES
BIBLIOGRAPHY
Apostol, T. M. (1961). “Calculus, Volumes 1 and 2,” Blaisdell, Boston.
Boyer, C. B. (1959). “The History of the Calculus and Its Conceptual Development,” Dover, New York.
Morgan, F. (1995). “Calculus Lite,” A. K. Peters, Wellesley, Massachusetts.
Ostebee, A., and Zorn, P. (1997). “Calculus,” Harcourt Brace & Co., San Diego.
Roberts, A. W., ed. “Calculus: The Dynamics of Change,” Mathematical Association of America, Washington, DC.
Sawyer, W. W. (1961). “What is Calculus About?,” The Mathematical Association of America New Mathematical Library, Washington, DC.
Swokowski, E. W. (1991). “Calculus,” 5th ed., PWS-Kent, Boston.
Thomas, G. B., Jr., and Finney, R. L. (1984). “Calculus and Analytic Geometry,” 6th ed., Addison-Wesley, Reading, Massachusetts.
Toeplitz, O. (1963). “The Calculus, A Genetic Approach,” University of Chicago Press, Chicago.
Encyclopedia of Physical Science and Technology EN003D-127 June 13, 2001 22:39
Complex Analysis
Joseph P. S. Kung
Department of Mathematics, University of North Texas
Chung-Chun Yang
Department of Mathematics, The Hong Kong University of Science and Technology
Cauchy integral formula A basic formula expressing the value of an analytic function at a point as a line integral along a closed curve going around that point.
Cauchy–Riemann equations The system of first-order partial differential equations $u_x = v_y$, $u_y = -v_x$, which is equivalent to the complex function $f(z) = u(x, y) + iv(x, y)$ being holomorphic (under the assumption that all four first-order partial derivatives are continuous).
Conformal maps A map that preserves angles infinitesimally.
Meromorphic function A complex function that is differentiable (or holomorphic) except at a discrete set of points, where it may have poles.
Neighborhood of a point An open set containing that point.
Pole A point a is a pole of a function f(z) if f(z) is analytic on a neighborhood of a but not at a, $\lim_{z \to a} f(z) = \infty$, and $\lim_{z \to a} (z - a)^k f(z)$ is finite for some positive integer k.
Power series or Taylor series An expansion of a complex function as an infinite sum: $f(z) = \sum_{m=0}^{\infty} a_m (z - c)^m$.
TO OVERSIMPLIFY, COMPLEX ANALYSIS is calculus using complex numbers rather than real numbers. Because a complex number a + ib is essentially a pair of real numbers, there is more “freedom of movement” over the complex numbers and many conditions become stronger. As a result, complex analysis has many features that distinguish it from real analysis. The most remarkable feature, perhaps, is that a differentiable complex function always has a power series expansion. This fact follows from the Cauchy integral formula, which expresses the value of a function at a point in terms of values of the function on a closed curve going around that point.
Many powerful mathematical techniques can be found in complex analysis. Many of these techniques were developed in the 19th century in conjunction with solving practical problems in astronomy, engineering, and physics. Complex analysis is now recognized as an indispensable component of any applied mathematician’s toolkit. Complex analysis is also extensively used in number theory, particularly in the study of the distribution of prime numbers.
In addition, complex analysis was the source of many new subjects of mathematics. An example of this is Riemann’s attempt to make multivalued complex functions such as the square-root function single-valued by enlarging the range. This led him to the idea of a Riemann surface. This, in turn, led him to the theory of differentiable manifolds, a mathematical subject that is the foundation of the theory of general relativity.
I. THE COMPLEX PLANE
A. Complex Numbers
A complex number is a number of the form a + ib, where a and b are real numbers and i is a square root of −1, that is, i satisfies the quadratic equation $i^2 + 1 = 0$. Historically, complex numbers arose out of attempts to solve polynomial equations. In particular, in the 16th century, Cardano, Tartaglia, and others were forced to use complex numbers in the process of solving cubic equations, even when all three solutions are real. Because of this, complex numbers acquired a mystical aura that was not dispelled until the early 19th century, when Gauss and Argand proposed a geometric representation for them as pairs of real numbers.
Gauss thought of a complex number z = a + ib geometrically as a point (a, b) in the real two-dimensional space. This represents the set C of complex numbers as a real two-dimensional plane, called the complex plane. The x-axis is called the real axis and the real number a is called the real part of z. The y-axis is called the imaginary axis and the real number b is called the imaginary part of z.
The complex conjugate $\bar{z}$ of the complex number z = a + ib is the complex number a − ib. The absolute value |z| is the real number $\sqrt{a^2 + b^2}$. The (multiplicative) inverse or reciprocal of a nonzero z is given by the formula
$\frac{1}{z} = \frac{\bar{z}}{|z|^2} = \frac{a - ib}{a^2 + b^2}.$
The complex number z = a + ib can also be written in the polar form
$z = r e^{i\theta} = r(\cos\theta + i \sin\theta),$
where
$r = \sqrt{a^2 + b^2}, \quad \tan\theta = b/a.$
The angle θ is called the argument of z. The argument is determined up to an integer multiple of 2π. Usually, one takes the value θ so that −π < θ ≤ π; this is called the principal value of the argument.
B. Topology of the Complex Plane
The absolute value defines a metric or distance function d on the complex plane C by
$d(z_1, z_2) = |z_1 - z_2|.$
This metric satisfies the triangle inequality
$d(z_1, z_2) + d(z_2, z_3) \ge d(z_1, z_3)$
and determines a topology on the complex plane C. We shall need the notions of open sets, closed sets, closure, compactness, continuity, and homeomorphism from topology.
C. Curves
A curve γ is a continuous map from a real interval [a, b] to C. The curve γ is said to be closed if γ(a) = γ(b); it is said to be open otherwise. A simple closed curve is a closed curve such that, for $t_1 < t_2$, $\gamma(t_1) = \gamma(t_2)$ if and only if $t_1 = a$ and $t_2 = b$.
An intuitively obvious theorem about curves that turned out to be very difficult to prove is the Jordan curve theorem. This theorem is usually not necessary in complex analysis, but is useful as background.
The Jordan Curve Theorem. The image of a simple closed curve (not assumed to be differentiable) separates the extended complex plane into two regions. One region is bounded (and “inside” the curve) and the other is unbounded.
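The basic quantities attached to a complex number can be computed with Python's built-in `complex` type and the standard `cmath` module. This is a small sketch for orientation; the sample values of z and of the three points used for the triangle inequality are arbitrary choices, not taken from the article.

```python
import cmath
import math

z = complex(3, 4)                  # z = a + ib with a = 3, b = 4

conjugate = z.conjugate()          # a - ib
modulus = abs(z)                   # sqrt(a^2 + b^2)
reciprocal = conjugate / modulus ** 2   # conjugate divided by |z|^2

r, theta = cmath.polar(z)          # polar form: z = r (cos theta + i sin theta)
rebuilt = cmath.rect(r, theta)     # back from polar to rectangular form

print(conjugate, modulus, reciprocal)
print(r, theta, rebuilt)

def d(u, v):
    """The distance d(u, v) = |u - v| on the complex plane."""
    return abs(u - v)

z1, z2, z3 = complex(0, 0), complex(1, 2), complex(-3, 1)
print(d(z1, z2) + d(z2, z3) >= d(z1, z3))   # triangle inequality for these points
```

For this z the angle returned by `cmath.polar` lies strictly between −π and π, so it coincides with the principal value of the argument described above.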
P1: ZCK Final
Encyclopedia of Physical Science and Technology EN003D-127 June 13, 2001 22:39
D. The Stereographic Projection

It is often useful to extend the complex plane by adding a point $\infty$ at infinity. The extended plane is called the extended complex plane and is denoted $\overline{\mathbb{C}}$. Using a stereographic projection, we can represent the extended plane $\overline{\mathbb{C}}$ using a sphere.

Let $S$ denote the sphere with radius 1 in real three-dimensional space defined by the equation

$$x_1^2 + x_2^2 + x_3^2 = 1$$

Let the $x_1$-axis coincide with the real axis of $\mathbb{C}$ and let the $x_2$-axis coincide with the imaginary axis of $\mathbb{C}$. The point $(0, 0, 1)$ on $S$ is called the north pole. Let $(x_1, x_2, x_3)$ be any point on $S$ not equal to the north pole. The point $z = x + iy$ of intersection of the straight line emanating from the north pole $N$ going through the point $(x_1, x_2, x_3)$ with the complex plane $\mathbb{C}$ is the stereographic projection of $(x_1, x_2, x_3)$. Going backwards, the point $(x_1, x_2, x_3)$ is the spherical image of $z$. The north pole is the spherical image of the point $\infty$ at infinity. The stereographic projection is a one-to-one mapping of the extended plane $\overline{\mathbb{C}}$ onto the sphere $S$. The sphere $S$ is called the Riemann sphere. The stereographic projection has the property that the angle between two (differentiable) curves in $\mathbb{C}$ and the angle between their images on $S$ are equal.

The mathematical formulas relating points and their spherical images are as follows:

$$x_1 = \frac{z + \bar{z}}{1 + |z|^2}, \qquad x_2 = \frac{z - \bar{z}}{i(1 + |z|^2)}, \qquad x_3 = \frac{|z|^2 - 1}{1 + |z|^2}$$

and

$$z = \frac{x_1 + i x_2}{1 - x_3}$$

Let $z_1$ and $z_2$ be two points in the plane $\mathbb{C}$. The spherical or chordal distance $\sigma(z_1, z_2)$ between their spherical images on $S$ is

$$\sigma(z_1, z_2) = \frac{2|z_1 - z_2|}{\sqrt{1 + |z_1|^2}\,\sqrt{1 + |z_2|^2}}.$$

Let $d\sigma$ and $ds$ be the length of the infinitesimal arc on $S$ and $\mathbb{C}$, respectively. Then

$$d\sigma = 2(1 + |z|^2)^{-1}\, ds.$$

hence, closed) subsets A and B. A set is simply connected if every closed curve in it can be contracted to a point, with the contraction occurring in the set. It can be proved that a set is simply connected in $\mathbb{C}$ if its complement in the extended complex plane is connected. If A is not simply connected, then the connected components (that is, open and closed subsets) of its complement in $\overline{\mathbb{C}}$ not containing the point $\infty$ are the “holes” of A.

A region is a nonempty open set of $\mathbb{C}$. A domain is a nonempty connected open set of $\mathbb{C}$. When $r > 0$, the set $\{z : |z - c| < r\}$ of all complex numbers at distance strictly less than $r$ from the center $c$ is called the open disk with center $c$ and radius $r$. Its closure is the closed disk $\{z : |z - c| \le r\}$. Open disks are the most commonly used domains in complex analysis.

II. ANALYTIC AND HOLOMORPHIC FUNCTIONS

A. Holomorphic Functions

Let $f(z)$ denote a complex function defined on a set $\Omega$ in the complex plane $\mathbb{C}$. The function $f(z)$ is said to be differentiable at the point $a$ in $\Omega$ if the limit

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

is a finite complex number. This limit is the derivative $f'(a)$ of $f(z)$ at $a$. Note that the limit is taken over all complex numbers $h$ such that the absolute value $|h|$ goes to zero, so that $h$ ranges over a set that has two real dimensions. Thus, differentiability of complex functions is a much stronger condition than differentiability of real functions. For example, the length of every (infinitesimally) small line segment starting from $a$ is changed under the function $f(z)$ by the same real scaling factor $|f'(a)|$, independently of the angle. The formal rules of differentiation in calculus hold also for complex differentiation.

It is possible to construct complex functions that are differentiable at only one point. To exclude these degenerate cases, we only consider complex functions that are differentiable at every point in a region $\Omega$. Such functions are said to be holomorphic on $\Omega$.
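The requirement that the difference quotient have a single limit as $h$ tends to zero from every direction can be illustrated numerically. The following is a minimal sketch, not part of the original article: the function names, the sample point, and the step size are illustrative assumptions. It compares $z^2$, which is holomorphic, with complex conjugation, which is differentiable nowhere.

```python
import cmath

def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h for a complex step h."""
    return (f(a + h) - f(a)) / h

def quotients_by_direction(f, a, step=1e-6, directions=8):
    """Evaluate the difference quotient along several directions of approach."""
    results = []
    for k in range(directions):
        # h has fixed modulus `step` and argument 2*pi*k/directions
        h = step * cmath.exp(2j * cmath.pi * k / directions)
        results.append(difference_quotient(f, a, h))
    return results

a = 1 + 2j  # illustrative sample point
square = quotients_by_direction(lambda z: z * z, a)
conj = quotients_by_direction(lambda z: z.conjugate(), a)

# Spread of the quotients over the directions: close to 0 when one limit exists.
spread_square = max(abs(q - square[0]) for q in square)
spread_conj = max(abs(q - conj[0]) for q in conj)
```

For $z^2$ the quotients agree with $2a$ up to the step size, whereas for conjugation the quotient equals $\bar{h}/h$, which depends only on the direction of $h$, so the spread does not shrink as the step decreases.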
where $u(x, y)$ is the real part of $f(z)$ and $v(x, y)$ is the imaginary part of $f(z)$. The functions $u(x, y)$ and $v(x, y)$ are real functions of two real variables.

Differentiability of the complex function $f(z)$ can be rewritten as a condition on the real functions $u(x, y)$ and $v(x, y)$. Let $f(z) = u(x, y) + iv(x, y)$ be a complex function such that all four first-order partial derivatives of $u$ and $v$ are continuous in the open set $\Omega$. Then a necessary and sufficient condition for $f(z)$ to be holomorphic on $\Omega$ is

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$

These equations are the Cauchy–Riemann equations. In polar coordinates $z = re^{i\theta}$, the Cauchy–Riemann equations are

$$r\frac{\partial u}{\partial r} = \frac{\partial v}{\partial \theta}, \qquad \frac{\partial u}{\partial \theta} = -r\frac{\partial v}{\partial r}$$

The square of the absolute value of the derivative is the Jacobian of $u(x, y)$ and $v(x, y)$, that is

$$|f'(z)|^2 = \frac{\partial u}{\partial x}\frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}\frac{\partial v}{\partial x}$$

C. Power Series and Analytic Functions

Just as in calculus, we can define complex functions using power series. A power series is an infinite sum of the form

$$\sum_{m=0}^{\infty} a_m (z - c)^m$$

where the coefficients $a_m$ and the center $c$ are complex numbers. This series determines a complex number (depending on $z$) whenever the series converges. Complex power series work in the same way as power series over the reals.

In particular, a power series has a radius of convergence, that is, an extended real number $\rho$, $0 \le \rho \le \infty$, such that the series converges absolutely whenever $|z - c| < \rho$ and diverges whenever $|z - c| > \rho$. The radius of convergence is given explicitly by Hadamard’s formula:

$$\rho = \frac{1}{\limsup_{n \to \infty} \sqrt[n]{|a_n|}}$$

A function $f(z)$ is analytic in the region $\Omega$ if for every point $c$ in $\Omega$, there exists an open disk $\{z : |z - c| < r\}$ contained in $\Omega$ such that $f(z)$ has a (convergent) power series or Taylor expansion

$$f(z) = \sum_{m=0}^{\infty} a_m (z - c)^m$$

When this holds, $f^{(m)}(c)/m! = a_m$.

Polynomials are analytic functions on the complex plane. Other examples of analytic functions on the complex plane are the exponential function,

A. Line Integrals and Winding Numbers

Line integrals are integrals taken over a curve rather than an interval on a real line. Let $\gamma : [a, b] \to \mathbb{C}$ be a piecewise continuously differentiable curve and let $f(z)$ be a continuous complex function defined on the image of $\gamma$. Then the line integral of $f(z)$ along the path $\gamma$ is the Riemann integral

$$\int_a^b f(\gamma(t))\,\gamma'(t)\,dt$$
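As a hedged illustration of the line integral of $f(z)$ along a path $\gamma$, the sketch below approximates the Riemann integral of $f(\gamma(t))\gamma'(t)$ by a midpoint rule on the unit circle. The helper name `line_integral`, the number of subintervals, and the choice of test functions are assumptions made for illustration. The reference values $2\pi i$ for $1/z$ and $0$ for $z^2$ are the standard ones for a single positively oriented circuit of the unit circle.

```python
import cmath

def line_integral(f, gamma, dgamma, a, b, n=2000):
    """Approximate the integral of f(gamma(t)) * gamma'(t) over [a, b] by the midpoint rule."""
    dt = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * dt  # midpoint of the k-th subinterval
        total += f(gamma(t)) * dgamma(t) * dt
    return total

# Unit circle traversed once counterclockwise, and its derivative.
unit_circle = lambda t: cmath.exp(1j * t)
unit_circle_derivative = lambda t: 1j * cmath.exp(1j * t)

integral_of_reciprocal = line_integral(
    lambda z: 1 / z, unit_circle, unit_circle_derivative, 0.0, 2 * cmath.pi)
integral_of_square = line_integral(
    lambda z: z * z, unit_circle, unit_circle_derivative, 0.0, 2 * cmath.pi)
```

The first integral is nonzero because $1/z$ has a singularity inside the circle; the second vanishes because $z^2$ is analytic throughout the disk.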
for every point $z$ in $\Omega$), then it is said to be a conformal mapping on $\Omega$.

Examples of conformal mappings (on suitable regions) are Möbius transformations. Let $a$, $b$, $c$, and $d$ be complex numbers such that $ad - bc \ne 0$. Then the bilinear transformation

$$T(z) = \frac{az + b}{cz + d}$$

is a Möbius transformation. Any Möbius transformation can be decomposed into a product of four elementary conformal mappings: translations, rotations, homotheties or dilations, and inversions. In addition to preserving angles, Möbius transformations map circles into circles, provided that a straight line is viewed as a “circle” passing through the point $\infty$ at infinity.

If $f(z)$ is a function on the disk with center 0 and radius $\rho$ and $r < \rho$, let $M_f(r)$ be the maximum value of $|f(z)|$ on the circle $\{z : |z| = r\}$.

Cauchy’s inequality. Let $f(z)$ be analytic in the disk with center 0 and radius $\rho$ and let $f(z) = \sum_{n=0}^{\infty} a_n z^n$. If $r < \rho$, then

$$|a_n|\, r^n \le M_f(r)$$

A function is entire if it is analytic on the entire complex plane. The following are two theorems about entire functions. The first follows easily from the case $n = 1$ of Cauchy’s inequality.

Liouville’s theorem. If $f(z)$ is an entire function and $f(z)$ is bounded on $\mathbb{C}$, then $f$ must be a constant function.

The second is much harder. It implies the fact that if two or more complex numbers are absent from the image of an entire function, then that entire function must be a constant.

Picard’s little theorem. An entire function that is not a polynomial takes every value, with one possible exception, infinitely many times.

Applying Liouville’s theorem to the reciprocal of a nonconstant polynomial $p(z)$ and using the fact that $p(z) \to \infty$ as $z \to \infty$, one obtains the following important theorem.

The fundamental theorem of algebra. Every polynomial with complex coefficients of degree at least one has a root in $\mathbb{C}$.

It follows that a polynomial of degree $n$ must have $n$ roots in $\mathbb{C}$, counting multiplicities.

The next theorem is a fundamental property of analytic functions.

Maximum principle. Let $\Omega$ be a bounded domain in $\mathbb{C}$. Suppose that $f(z)$ is analytic in $\Omega$ and continuous in the closure of $\Omega$. Then, $|f(z)|$ attains its maximum value $|f(a)|$ at a boundary point $a$ of $\Omega$. If $f(z)$ is not constant, then

$$|f(z)| < |f(a)|$$

for every point $z$ in the interior of $\Omega$.

Using the maximum principle, one obtains Schwarz’s lemma.

Schwarz’s lemma. Let $f(z)$ be analytic on the disk $D = \{z : |z| < 1\}$ with center 0 and radius 1. Suppose that $f(0) = 0$ and $|f(z)| \le 1$ for all points $z$ in $D$. Then,

$$|f'(0)| \le 1$$

and for all points $z$ in $D$,

$$|f(z)| \le |z|$$

If $|f'(0)| = 1$ or $|f(a)| = |a|$ for some nonzero point $a$, then $f(z) = \alpha z$ for some complex number $\alpha$ with $|\alpha| = 1$.

Another theorem, which has inspired many generalizations in the theory of partial differential equations, is the following.

Hadamard’s three-circles theorem. Let $f(z)$ be an analytic function on the annulus $\{z : \rho_1 \le |z| \le \rho_3\}$ and let $\rho_1 < r_1 \le r_2 \le r_3 < \rho_3$ with $r_1 < r_3$. Then

$$\log M_f(r_2) \le \frac{\log r_3 - \log r_2}{\log r_3 - \log r_1} \log M_f(r_1) + \frac{\log r_2 - \log r_1}{\log r_3 - \log r_1} \log M_f(r_3)$$

It follows from the three-circles theorem that $\log M_f(r)$ is a convex function of $\log r$.

Cauchy’s integral theorem has the following converse.

Morera’s theorem. Let $f(z)$ be a continuous function on a simply connected region $\Omega$ in $\mathbb{C}$. Suppose that the line integral

$$\int_{\partial \Delta} f(z)\, dz$$

over the boundary $\partial \Delta$ of every triangle $\Delta$ in $\Omega$ is zero. Then $f(z)$ is analytic in $\Omega$.

E. Analytic Continuation

An analytic function $f(z)$ is usually defined initially with a certain formula in some region $D_1$ of the complex plane. Sometimes, one can extend the function $f(z)$ to a function $\hat{f}(z)$ that is analytic on a bigger region $D_2$ containing $D_1$ such that $\hat{f}(z) = f(z)$ for all points $z$ in $D_1$. Such an extension is called analytic continuation. Expanding the function as a Taylor (or Laurent) series is one possible
The Fabry gap theorem. Let

$$f(z) = \sum_{m=0}^{\infty} a_m z^{b_m}$$

where $b_m$ is a sequence of increasing nonnegative integers such that

If one allows negative powers, then analytic functions can be expanded as power series at isolated singularities. The idea is to write a meromorphic function $f(z)$ in a neighborhood of a pole $a$ as a sum of an analytic part and a singular part. Suppose the function $f(z)$ is analytic in a region containing the annulus $\{z : \rho_1 < |z - a| < \rho_2\}$.
Then we can define two functions $f_1(z)$ and $f_2(z)$ by:

$$f_1(z) = \frac{1}{2\pi i} \int_{\{\zeta : |\zeta - a| = r\}} \frac{f(\zeta)\, d\zeta}{\zeta - z}$$

where $r$ satisfies $|z - a| < r < \rho_2$, and

$$f_2(z) = -\frac{1}{2\pi i} \int_{\{\zeta : |\zeta - a| = r\}} \frac{f(\zeta)\, d\zeta}{\zeta - z}$$

where $r$ satisfies $\rho_1 < r < |z - a|$. The function $f_1(z)$ is analytic in the disk $\{z : |z - a| < \rho_2\}$ and the function $f_2(z)$ is analytic in the complement $\{z : |z - a| > \rho_1\}$. By the Cauchy integral formula, $f(z) = f_1(z) + f_2(z)$ and this representation is valid in the annulus $\{z : \rho_1 < |z - a| < \rho_2\}$.

The functions $f_1(z)$ and $f_2(z)$ can each be expanded as Taylor series. Using the transformation $z - a \to 1/z$ and some simple calculation, we obtain the Laurent series expansion

$$f(z) = \sum_{m=-\infty}^{\infty} a_m (z - a)^m$$

where

$$a_m = \frac{1}{2\pi i} \int_{\{\zeta : |\zeta - a| = r\}} \frac{f(\zeta)\, d\zeta}{(\zeta - a)^{m+1}}$$

valid in the annulus $\{z : \rho_1 < |z - a| < \rho_2\}$. Note that

$$\operatorname{Res}(f, a) = a_{-1},$$

and the point $a$ is a pole of order $k$ if and only if $a_{-k} \ne 0$ and every coefficient $a_{-m}$ with $m > k$ is zero. The polynomial

$$\sum_{m=1}^{k} \frac{a_{-m}}{(z - a)^m}$$

in the variable $1/(z - a)$ is called the singular or principal part of $f(z)$ at the pole $a$.

$\{c\omega_1 + d\omega_2\}$, where $c$ and $d$ range over all integers. An elliptic function is a meromorphic function on the complex plane with two (independent) periods, $\omega_1$ and $\omega_2$, that is, $f(z + \omega_1) = f(z)$, $f(z + \omega_2) = f(z)$, and every complex number $\omega$ such that $f(z + \omega) = f(z)$ for all points $z$ in the complex plane is a number in the lattice $L$.

A specific example of an elliptic function is the Weierstrass $\wp$-function defined by the formula

$$\wp(z) = \frac{1}{z^2} + \sum_{\omega \in L \setminus \{0\}} \left[ \frac{1}{(z - \omega)^2} - \frac{1}{\omega^2} \right]$$

This defines a meromorphic function on the complex plane that is doubly periodic with periods $\omega_1$ and $\omega_2$. The $\wp$-function has poles exactly at points in $L$. Weierstrass proved that every elliptic function with periods $\omega_1$ and $\omega_2$ can be written as a rational function of $\wp$ and its derivative $\wp'$.

C. The Cauchy Residue Theorem

A simple but useful generalization of Cauchy’s integral formula is the Cauchy residue theorem.

The Cauchy residue theorem. Let $\Omega$ be a simply connected region in the complex plane, let $f(z)$ be a function analytic on $\Omega$ except at the isolated singularities $a_m$, and let $\gamma$ be a closed piecewise continuously differentiable curve in $\Omega$ that does not pass through any of the points $a_m$. Then

$$\frac{1}{2\pi i} \int_{\gamma} f(z)\, dz = \sum n(\gamma, a_m) \operatorname{Res}(f, a_m)$$

where the sum ranges over all the isolated singularities inside the curve $\gamma$.

Cauchy’s residue theorem has the following useful corollary.
the difference between the number $Z(f)$ of zeros and the number $P(f)$ of poles of $f(z)$ inside $\gamma$, counting multiplicities and orders.

D. Evaluation of Real Integrals

Cauchy’s residue theorem can be used to evaluate real definite integrals that are otherwise difficult to evaluate. For example, to evaluate an integral of the form

$$\int_0^{2\pi} R(\cos\theta, \sin\theta)\, d\theta$$

where $R$ is a rational function of $\cos\theta$ and $\sin\theta$, let $z = e^{i\theta}$. If we make the substitutions

$$\cos\theta = \frac{z + z^{-1}}{2}, \qquad \sin\theta = \frac{z - z^{-1}}{2i}$$

the integral becomes a line integral over the unit circle of the form

$$\int_{|z| = 1} S(z)\, dz$$

where $S(z)$ is a rational function of $z$. By Cauchy’s residue theorem, this integral equals $2\pi i$ times the sum of the residues of $S(z)$ at its poles inside the unit circle. Using this method, one can prove, for example, that if $a > b > 0$,

$$\int_0^{2\pi} \frac{d\theta}{(a + b\cos^2\theta)^2} = \frac{\pi(2a + b)}{a^{3/2}(a + b)^{3/2}}$$

One can also evaluate improper integrals, obtaining formulas such as the following formula due to Euler: For $-1 < p < 1$ and $-\pi < \alpha < \pi$,

$$\int_0^{\infty} \frac{x^{-p}\, dx}{1 + 2x\cos\alpha + x^2} = \frac{\pi \sin p\alpha}{\sin p\pi\, \sin\alpha}$$

E. Location of Zeros

It is often useful to locate zeros of polynomials in the complex plane. An elegant theorem, which can be proved by elementary arguments, is the following result.

Lucas’ theorem. Let $p(z)$ be a polynomial of degree at least 1. All the zeros of the derivative $p'(z)$ lie in the convex closure of the set of zeros of $p(z)$.

Deeper results usually involve using some form of Rouché’s theorem, which is proved using the argument principle.

Rouché’s theorem. Let $\Omega$ be a region bounded by a simple closed piecewise continuously differentiable curve. Let $f(z)$ and $g(z)$ be two functions meromorphic in an open set containing the closure of $\Omega$. If $f(z)$ and $g(z)$ satisfy

$$|f(z) - g(z)| < |f(z)| + |g(z)|$$

for every point $z$ on the curve bounding $\Omega$, then

$$Z(f) - P(f) = Z(g) - P(g)$$

Hurwitz’s theorem. Let $(f_n(z) : n = 1, 2, \ldots)$ be a sequence of functions analytic in a region $\Omega$ bounded by a simple closed piecewise continuously differentiable curve such that $f_n(z)$ converges uniformly to a nonzero (analytic) function $f(z)$ on every closed subset of $\Omega$. Let $a$ be an interior point of $\Omega$. If $a$ is a limit point of the set of zeros of the functions $f_n(z)$, then $a$ is a zero of $f(z)$. If $a$ is a zero of $f(z)$ with multiplicity $m$, then every sufficiently small neighborhood $K$ of $a$ contains exactly $m$ zeros of the functions $f_n(z)$, for all $n$ greater than a number $N$ depending on $K$.

F. Infinite Products, Partial Fractions, and Approximations

A natural way to write a meromorphic function is in terms of its zeros and poles. For example, because $\sin \pi z$ has zeros at the integers, we expect to be able to “factor” it into a product. Indeed, Euler wrote down the following product expansion:

$$\sin \pi z = \pi z \prod_{n=1}^{\infty} \left(1 - \frac{z}{n}\right)\left(1 + \frac{z}{n}\right)$$

With complex analysis, one can justify such expansions rigorously.

The question of convergence of an infinite product is easily resolved. By taking logarithms, one can reduce it to a question of convergence of a sum. For example, the product

$$\prod_{m=1}^{\infty} (1 + a_m)$$

converges absolutely if and only if the sum $\sum_{m=1}^{\infty} |\log(1 + a_m)|$ converges. Since $|\log(1 + a_m)|$ is approximately $|a_m|$, the product converges absolutely if and only if the series $\sum_{m=1}^{\infty} |a_m|$ converges.

The following theorem allows us to construct an entire function with a prescribed set of zeros.

The Weierstrass product theorem. Let $(a_j : j = 1, 2, \ldots)$ be a sequence of nonzero complex numbers in which no complex number occurs infinitely many times. Suppose that the set $\{a_j\}$ has no (finite) limit point in the complex plane. Then there exists an entire function $f(z)$ with a zero of multiplicity $m$ at 0, zeros in the set $\{a_j\}$ with the correct multiplicity, and no other zeros. This function can be written in the form

$$f(z) = z^m e^{g(z)} \prod_{j=1}^{\infty} \left(1 - \frac{z}{a_j}\right) \exp\!\left[\frac{z}{a_j} + \frac{1}{2}\left(\frac{z}{a_j}\right)^2 + \cdots + \frac{1}{m_j}\left(\frac{z}{a_j}\right)^{m_j}\right]$$
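Euler’s product expansion for $\sin \pi z$ in this section can be examined numerically by truncating the product after finitely many factors. The sketch below is illustrative only: the function name, the sample point, and the truncation levels are assumptions, and the reference value is taken from Python’s `cmath.sin`.

```python
import cmath

def sine_partial_product(z, terms):
    """Return pi*z times the product of (1 - z/n)(1 + z/n) for n = 1..terms."""
    product = cmath.pi * z
    for n in range(1, terms + 1):
        product *= (1 - z / n) * (1 + z / n)
    return product

z = 0.3 + 0.4j  # illustrative sample point
exact = cmath.sin(cmath.pi * z)

# Absolute error of the truncated product for an increasing number of factors.
errors = [abs(sine_partial_product(z, terms) - exact)
          for terms in (10, 100, 1000, 10000)]
```

The errors decrease as more factors are included, but only slowly, which is one reason such products are used for theoretical purposes rather than for computing values.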
The function $F(z)$ is called the triangle function of Schwarz.

B. Univalent Functions and Bieberbach’s Conjecture

Univalent or schlicht functions are one-to-one analytic functions. They have been extensively studied in complex analysis. A famous result in this area was the Bieberbach conjecture, which was proved by de Branges in 1984.

Theorem. Let $f(z)$ be a univalent analytic function on the unit disk $\{z : |z| < 1\}$ with power series expansion

$$f(z) = z + a_2 z^2 + a_3 z^3 + \cdots$$

(that is, $f(0) = 0$ and $f'(0) = 1$). Then

$$|a_n| \le n.$$

When equality holds, $f(z) = e^{-i\theta} K(e^{i\theta} z)$, where

$$K(z) = \frac{z}{(1 - z)^2} = z + 2z^2 + 3z^3 + 4z^4 + \cdots$$

The function $K(z)$ is called Koebe’s function.

Another famous result in this area is due to Koebe.

Koebe’s 1/4-theorem. Let $f(z)$ be a univalent function on the unit disk with $f(0) = 0$ and $f'(0) = 1$. Then the image of the unit disk $\{z : |z| < 1\}$ contains the disk $\{z : |z| < 1/4\}$ with radius 1/4.

The radius 1/4 (the “Koebe constant”) is the best possible.

C. Harmonic Functions

Harmonic functions, defined in Section II, are real functions $u(x, y)$ satisfying Laplace’s equation. We shall use the notation $u(x, y) = u(x + iy) = u(z)$. Harmonic functions have several important properties.

The mean-value property. Let $u(z)$ be a harmonic function on a region $\Omega$. Then for any disk $D$ with center $a$ and radius $r$ whose closure is contained in $\Omega$,

$$u(a) = \frac{1}{2\pi} \int_0^{2\pi} u(a + re^{i\theta})\, d\theta$$

The maximum principle for harmonic functions. Let $u(z)$ be a harmonic function on the domain $\Omega$. If there is a point $a$ in $\Omega$ such that $u(a)$ equals the maximum $\max\{u(z) : z \in \Omega\}$, then $u(z)$ is a constant function.

An important problem in the theory of harmonic functions is Dirichlet’s problem. Given a simply connected domain $\Omega$ and a piecewise continuous function $g(z)$ on the boundary $\partial\Omega$, find a function $u(z)$ on the closure $\overline{\Omega}$ such that $u(z)$ is harmonic in $\Omega$, and the restriction of $u(z)$ to the boundary $\overline{\Omega} \setminus \Omega$ equals $g(z)$. (The boundary $\partial\Omega$ is the set $\overline{\Omega} \setminus \Omega$.)

For general simply connected domains, Dirichlet’s problem is difficult. It is equivalent to finding a Green’s function. For a disk, the following formula is known.

The Poisson formula. Let $g(z) = g(e^{i\phi})$, $-\pi < \phi \le \pi$, be a piecewise continuous function on the boundary $\{z : |z| = 1\}$ of the unit disk. Then the function

$$u(z) = u(re^{i\theta}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1 - r^2}{1 + r^2 - 2r\cos(\phi - \theta)}\, g(e^{i\phi})\, d\phi$$

is a solution to Dirichlet’s problem for the unit disk.

D. Riemann Surfaces

A Riemann surface $S$ is a one-dimensional complex connected paracompact Hausdorff space equipped with a conformal atlas, that is, a set of maps or charts $\{h : D \to N\}$, where $D$ is the open disk $\{z : |z| < 1\}$ and $N$ is an open set of $S$, such that

1. The union of all the open sets $N$ is $S$.
2. The chart $h : D \to N$ is a homeomorphism of the disk $D$ to $N$.
3. Let $N_1$ and $N_2$ be neighborhoods with charts $h_1$ and $h_2$. If the intersection $N_1 \cap N_2$ is nonempty and connected, then the composite mapping $h_2^{-1} \circ h_1$, defined on the inverse image $h_1^{-1}(N_1 \cap N_2)$, is conformal.

Riemann surfaces originated in an attempt to make a “multivalued” analytic function single-valued by making its range a Riemann surface. Examples of multivalued functions are algebraic functions. These are functions $f(z)$ satisfying a polynomial equation $P(z, f(z)) = 0$. A specific example of this is the square-root function $f(z) = \sqrt{z}$, which takes on two values $\pm\sqrt{z}$ except when $z = 0$. This can be made into a single-valued function using the Riemann surface obtained by gluing together two sheets or copies of the complex plane cut from 0 to $\infty$ along the positive real axis. Another example is the logarithmic function $\log z$, which requires a Riemann surface made from countably infinitely many sheets.

Intuitively, the genus of a Riemann surface $S$ is the number of “holes” it has. The genus can be defined as the maximum number of disjoint simple closed curves that do not disconnect $S$. For example, the extended complex plane has genus 0 and a torus has genus 1. There are many results about Riemann surfaces. The following are two results that can be simply stated.

Picard’s theorem. Let $P(u, v)$ be an irreducible polynomial with complex coefficients in two variables $u$ and $v$. If there exist nonconstant entire functions $f(z)$ and $g(z)$ satisfying $P(f(z), g(z)) = 0$ for all complex numbers $z$,
then the Riemann surface associated with the algebraic equation $P(u, v) = 0$ has genus 0.

Koebe’s uniformization theorem. If $S$ is a Riemann surface, then its universal covering surface, which is simply connected, is conformally equivalent to one of the following:

1. (Elliptic type) The Riemann sphere. In this case, $S$ is the sphere.
2. (Parabolic type) The complex plane $\mathbb{C}$. In this case, $S$ is biholomorphic to $\mathbb{C}$, $\mathbb{C} \setminus \{0\}$, or a torus.
3. (Hyperbolic type) The unit disk $\{z : |z| < 1\}$.

Complex manifolds are higher-dimensional generalizations of Riemann surfaces. They have been extensively studied.

E. Other Topics

Complex analysis is a vast and ever-expanding area. “Nine lifetimes” do not suffice to cover every topic. Some interesting areas that we have not covered are complex differential equations, complex dynamics, Montel’s theorem and normal families, value distribution theory, and the theory of complex functions in several variables. Several books on these topics are listed in the Bibliography.

SEE ALSO THE FOLLOWING ARTICLES

CALCULUS • DIFFERENTIAL EQUATIONS • NUMBER THEORY • RELATIVITY, GENERAL • SET THEORY • TOPOLOGY, GENERAL

BIBLIOGRAPHY

Ahlfors, L. V. (1979). “Complex Analysis,” McGraw-Hill, New York.
Beardon, A. F. (1984). “A Primer on Riemann Surfaces,” Cambridge University Press, Cambridge, U.K.
Blair, D. E. (2000). “Inversion Theory and Conformal Mappings,” American Mathematical Society, Providence, RI.
Carleson, L., and Gamelin, T. W. (1993). “Complex Dynamics,” Springer-Verlag, Berlin.
Cartan, H. (1960). “Elementary Theory of Analytic Functions of One or Several Complex Variables,” Hermann and Addison-Wesley, Paris and Reading, MA.
Cherry, W., and Ye, Z. (2001). “Nevanlinna’s Theory of Value Distribution,” Springer-Verlag, Berlin.
Chuang, C.-T., and Yang, C.-C. (1990). “Fix-Points and Factorization of Meromorphic Functions,” World Scientific, Singapore.
Duren, P. L. (1983). “Univalent Functions,” Springer-Verlag, Berlin.
Farkas, H. M., and Kra, I. (1992). “Riemann Surfaces,” Springer-Verlag, Berlin.
Gong, S. (1999). “The Bieberbach Conjecture,” American Mathematical Society, Providence, RI.
Gunning, R. (1990). “Introduction to Holomorphic Functions of Several Variables,” Vols. I, II, and III. Wadsworth and Brooks/Cole, Pacific Grove, CA.
Hayman, W. K. (1964). “Meromorphic Functions,” Oxford University Press, Oxford.
Hille, E. (1962, 1966). “Analytic Function Theory,” Vols. I and II. Ginn-Blaisdell, Boston.
Hille, E. (1969). “Lectures on Ordinary Differential Equations,” Addison-Wesley, Reading, MA.
Hu, P.-C., and Yang, C.-C. (1999). “Differential and Complex Dynamics of One and Several Variables,” Kluwer, Boston.
Hua, X.-H., and Yang, C.-C. (1998). “Dynamics of Transcendental Functions,” Gordon and Breach, New York.
Kodaira, K. (1984). “Introduction to Complex Analysis,” Cambridge University Press, Cambridge, U.K.
Krantz, S. G. (1999). “Handbook of Complex Variables,” Birkhäuser, Boston.
Laine, I. (1992). “Nevanlinna Theory and Complex Differential Equations,” De Gruyter, Berlin.
Lang, S. (1987). “Elliptic Functions,” 2nd ed., Springer-Verlag, Berlin.
Lehto, O. (1987). “Univalent Functions and Teichmüller Spaces,” Springer-Verlag, Berlin.
Marden, M. (1949). “Geometry of Polynomials,” American Mathematical Society, Providence, RI.
McKean, H., and Moll, V. (1997). “Elliptic Curves, Function Theory, Geometry, Arithmetic,” Cambridge University Press, Cambridge, U.K.
Morrow, J., and Kodaira, K. (1971). “Complex Manifolds,” Holt, Rinehart and Winston, New York.
Nehari, Z. (1952). “Conformal Mapping,” McGraw-Hill, New York; reprinted, Dover, New York.
Needham, T. (1997). “Visual Complex Analysis,” Oxford University Press, Oxford.
Palka, B. P. (1991). “An Introduction to Complex Function Theory,” Springer-Verlag, Berlin.
Pólya, G., and Szegö, G. (1976). “Problems and Theorems in Analysis,” Vol. II. Springer-Verlag, Berlin.
Protter, M. H., and Weinberger, H. F. (1984). “Maximum Principles in Differential Equations,” Springer-Verlag, Berlin.
Remmert, R. (1993). “Classical Topics in Complex Function Theory,” Springer-Verlag, Berlin.
Rudin, W. (1980). “Function Theory in the Unit Ball of $\mathbb{C}^n$,” Springer-Verlag, Berlin.
Schiff, J. L. (1993). “Normal Families,” Springer-Verlag, Berlin.
Schwerdtfeger, H. (1962). “Geometry of Complex Numbers,” University of Toronto Press, Toronto. Reprinted, Dover, New York.
Siegel, C. L. (1969, 1971, 1973). “Topics in Complex Function Theory,” Vols. I, II, and III, Wiley, New York.
Smithies, F. (1997). “Cauchy and the Creation of Complex Function Theory,” Cambridge University Press, Cambridge, U.K.
Steinmetz, N. (1993). “Rational Iteration, Complex Analytic Dynamical Systems,” De Gruyter, Berlin.
Titchmarsh, E. C. (1939). “The Theory of Functions,” 2nd ed., Oxford University Press, Oxford.
Vitushkin, A. G. (ed.). (1990). “Several Complex Variables I,” Springer-Verlag, Berlin.
Weyl, H. (1955). “The Concept of a Riemann Surface,” 3rd ed., Addison-Wesley, Reading, MA.
Whitney, H. (1972). “Complex Analytic Varieties,” Addison-Wesley, Reading, MA.
Whittaker, E. T., and Watson, G. N. (1969). “A Course of Modern Analysis,” Cambridge University Press, Cambridge, U.K.
Yang, L. (1993). “Value Distribution Theory,” Springer-Verlag, Berlin.
P1: GLQ/GJP P2: FQP Final Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN003B-132 June 13, 2001 22:45
Computer-Based Proofs of
Mathematical Theorems
C. W. H. Lam
Concordia University
I. Mathematical Theories
II. Computer Programming
III. Computer As an Aid to Mathematical Research
IV. Examples of Computer-Based Proofs
V. Proof by Exhaustive Computer Enumeration
VI. Recent Development: RSA Factoring Challenge
VII. Future Directions
only suitable for computers. Two notable examples are the four-color theorem and the nonexistence of a finite projective plane of order 10. Both proofs required thousands of hours of computing and gave birth to the term “a computer-based proof.” The organization of such a proof requires careful estimation of the necessary computing time, meticulous optimization of the computer program, and prudent control of all possible computer errors. In spite of the difficulty in checking these proofs, mathematicians are starting to accept their validity. As computers are getting faster, many other famous open problems will be solved by this new approach.

I. MATHEMATICAL THEORIES

What is mathematics? It is not possible to answer this question precisely in this short article, but a generally acceptable definition is that mathematics is a study of quantities and relations using symbols and numbers. The starting point is often a few undefined objects, such as a set and its elements. A mathematical theory is then built by assuming some axioms, which are statements accepted as true. From these basic components, further properties can then be derived.

For example, the study of geometry can start from the undefined objects called points. A line is then defined as a set of points. An axiom may state: “Two distinct lines contain at most one common point.” This is one of the axioms in Euclidean geometry, where it is possible to have parallel lines. The complete set of axioms of Euclidean geometry was classified by the great German mathematician David Hilbert in 1902. Using these axioms, further results can be derived. Here is an example:

Two triangles are congruent if the three sides of one are equal to the three sides of the other.

To show that these derived results follow logically from the axioms, a system of well-defined principles of mathematical reasoning is used.

A. Mathematical Statements

The ability to demonstrate the truth of a statement is central to any mathematical theory. This technique of reasoning is formalized in the study of propositional calculus.

A proposition is a statement which is either true or false, but not both. For example, the following are propositions:

• 1 + 1 = 2.
• 4 is a prime number.

The first proposition is true, and the second one is false.

The statement “x = 3” is not a proposition because its truth value depends on the value of x. It is a predicate, and its truth value depends on the value of the variable or argument x. The expression P(x) is often used to denote a predicate P with an argument x. A predicate may have more than one argument, for example, x + y = 1 has two arguments. To convert a predicate to a proposition, values have to be assigned to all the arguments. The process of associating a value to a variable is called a binding.

Another method of converting a predicate to a proposition is by the quantification of the variables. There are two common quantifiers: universal and existential. For example, the statement

For all x, x < x + 1

uses the universal quantifier “for all” to provide bindings for the variable x in the predicate x < x + 1. If the predicate is true for every possible value of x, then the proposition is true. The symbol ∀ is used to denote the phrase “for all.” Thus, the above proposition can also be written as

∀x, x < x + 1.

The set of possible values for x has to come from a certain universal set. The universal set in the above example may be the set of all integers, or it may be the set of all reals. Sometimes, the actual universal set can be deduced from context and, consequently, not stated explicitly in the proposition. A careful mathematician may include all the details, such as

∀ real x > 1, x > √x.

The choice is often not a matter of sloppiness, but a conscious decision depending on whether all the exact details will obscure the main thrust of the result.

An existential quantifier asserts that a predicate P(x) is true for one or more x in the universal set. It is written as

For some x, P(x),

or, using the symbol ∃, as

∃x, P(x).

This assertion is false only if for all x, P(x) is false. In other words,

¬[∃x, P(x)] ≡ [∀x, ¬P(x)].

B. Enumerative Proof

A proof of a mathematical result is a demonstration of its truth. To qualify as a proof, the demonstration must be absolutely correct and there can be no uncertainty nor ambiguity in the proof.
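Over a finite universal set, both quantifiers can be evaluated mechanically by trying the values one after another. The following is a minimal sketch under stated assumptions: the helper names `for_all` and `exists` and the particular finite universe of integers are hypothetical choices for illustration, not part of the article. It also checks one instance of the equivalence ¬[∃x, P(x)] ≡ [∀x, ¬P(x)].

```python
def for_all(universe, predicate):
    """Universal quantifier over a finite universe; stops at the first counterexample."""
    for x in universe:
        if not predicate(x):
            return False
    return True

def exists(universe, predicate):
    """Existential quantifier over a finite universe; stops at the first witness."""
    for x in universe:
        if predicate(x):
            return True
    return False

universe = range(-100, 101)  # hypothetical finite universal set of integers

# "For all x, x < x + 1" restricted to the finite universe.
always_smaller = for_all(universe, lambda x: x < x + 1)

# "For some x, x * x = 2" has no integer witness ...
some_square_is_two = exists(universe, lambda x: x * x == 2)

# ... which is the same as saying "for all x, x * x != 2".
nothing_squares_to_two = for_all(universe, lambda x: x * x != 2)
```

Both helpers stop as soon as the outcome is decided, so a false universal statement or a true existential statement may be settled after examining only a few values, while the opposite outcomes require examining the whole universe.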
Many interesting mathematical results involve quantifiers such as ∃ or ∀. There are many techniques to prove quantified statements, one of which is the enumerative proof. In this method, the validity of ∀x, P(x) is established by investigating P(x) for every value of x one after another. The proposition ∀x, P(x) is only true if P(x) has been verified to be true for all x. However, if P(x) is false for one of the values, then it is not necessary to consider the remaining values of x, because ∀x, P(x) is already false. Similarly, the validity of ∃x, P(x) can be found by evaluating P(x) for every value of x. The process of evaluation can stop once a value of x is found for which P(x) is true, because ∃x, P(x) is true irrespective of the remaining values of x. However, to establish that ∃x, P(x) is false, it has to be shown that P(x) is false for all values of x.

To ensure a proof that is finite in length, enumerative proofs are used where only a finite number of values of x have to be considered. It may be a case where x can take on only a finite number of values, or a situation where an infinite number of values can be handled by another proof technique, leaving only a finite number of values for which P(x) is unknown.

An enumerative proof is a seldom-used method because it tends to be too long and tedious for a human being. A proof involving a hundred different cases is probably the limit of the capacity of the human mind. Yet, it is precisely in this area that a computer can help, where it can evaluate millions or even billions of cases with ease. Its use can open up new frontiers in mathematics.

C. Kinds of Mathematical Results

Mathematicians love to call their results theorems. Along the way, they also prove lemmas, deduce corollaries, and propose conjectures. What are these different classifications? The following paragraphs will give a brief answer to this question.

Mathematical results are classified as lemmas, theorems, and corollaries, depending on their importance. The most important results are called theorems. A lemma is an auxiliary result which is useful in proving a theorem. A corollary is a subsidiary result that can be derived from a theorem. These classifications are quite loose, and the actual choice of terminology is based on the subjective evaluation of the discoverer of the results. There are cases where a lemma or a corollary has, over a period of time, attained an importance surpassing the main theorem.

A conjecture, on the other hand, is merely an educated guess. It is a statement which may or may not be true. Usually, the proposer of the conjecture suspects that it is highly probable to be true but cannot prove it. Many famous mathematical problems are stated as conjectures. If a proof is later found, then it becomes a theorem.

D. What Makes an Interesting Theorem

Of course, a theorem must be interesting to be worth proving. However, whether something is interesting is a subjective judgement. The following list contains some of the properties of an interesting result:

• Useful
• Important
• Elegant
• Provides insight

While the usefulness of some theorems is immediately obvious, the appreciation of others may come only years later. The importance of a theorem is often measured by the number of mathematicians who know about it or who have tried proving it. An interesting theorem should also give insights about a problem and point to new research directions. A theorem is often appreciated as if it is a piece of art. Descriptions such as “It is a beautiful theorem” or “It is an elegant proof” are often used to characterize an interesting mathematical result.

II. COMPUTER PROGRAMMING

Figure 1 shows a highly simplified view of a computer. There are two major components: the central processing unit (CPU) and memory. Data are stored in the memory. The CPU can perform operations which change the data. A program is a sequence of instructions which tell the CPU what operations to perform, and it is also stored in the computer memory.

Computer programming is the process of creating the sequence of instructions to be used by the computer. Most programming today is done in high-level languages such as Fortran, Pascal, or C. Such languages are designed for the ease of creating and understanding a complex program by human beings. A program written in one of these languages has to be translated by a compiler before it can be used by the computer.

FIGURE 1 A simplified view of a computer.
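As a small illustration of the enumerative proof method described in Section B, the sketch below evaluates a predicate over a finite universal set and stops as early as the text allows. The function names `check_forall` and `check_exists`, the helper `is_prime`, and the sample domains are assumptions made for this example; they do not come from the article.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def check_forall(pred: Callable[[int], bool],
                 domain: Iterable[int]) -> Tuple[bool, Optional[int]]:
    """Enumerative check of "for all x, P(x)" over a finite domain.

    Returns (True, None) if P held for every x, otherwise (False, x)
    for the first counterexample found.
    """
    for x in domain:
        if not pred(x):
            return False, x  # one failure already makes the statement false
    return True, None


def check_exists(pred: Callable[[int], bool],
                 domain: Iterable[int]) -> Tuple[bool, Optional[int]]:
    """Enumerative check of "there exists x with P(x)" over a finite domain.

    Returns (True, x) for the first witness found, otherwise (False, None).
    """
    for x in domain:
        if pred(x):
            return True, x  # one witness is enough; remaining values are irrelevant
    return False, None


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small illustrative values used here."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


# Finite stand-in for "for all real x > 1, x > sqrt(x)": integers 2..100 only.
print(check_forall(lambda x: x > math.sqrt(x), range(2, 101)))
# "4 is a prime number" is false, so the universal claim fails at 4.
print(check_forall(is_prime, [2, 3, 4, 5]))
print(check_exists(is_prime, [4, 6, 8, 9]))
```

Checking integers from 2 to 100 says nothing about the real-number statement itself. It only shows the mechanics of an enumeration over a finite set, which is the situation the text requires.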
A. Four-Color Conjecture
The four-color conjecture says that four colors are sufficient to color any map drawn in the plane or on a sphere

FIGURE 3 The complete bipartite graph K3,3.
plane. After about 2000 computing hours on a CRAY-1A supercomputer in addition to several years of computing on a number of VAX-11 and micro-VAX computers, it was shown that none of the matrices could be completed, which implied the nonexistence of the projective plane of order 10. About 10^12 different subcases were investigated.

V. PROOF BY EXHAUSTIVE COMPUTER ENUMERATION

The proofs of both the four-color theorem and the nonexistence of a projective plane of order 10 share one common feature: they are both enumerative proofs. This approach to a proof is often avoided by humans, because it is tedious and error prone. Yet, it is tailor-made for a computer.

A. Methodology

Exhaustive computer enumeration is often done by a programming technique called backtrack search. A version of the search problem can be defined as follows:

Search problem:
Given a collection of sets of candidates C1, C2, C3, . . . , Cm and a boolean compatibility predicate P(x, y) defined for all x ∈ Ci and y ∈ Cj, find an m-tuple (x1, . . . , xm) with xi ∈ Ci such that P(xi, xj) is true for all i ≠ j.

An m-tuple satisfying the above condition is called a solution.

For example, if we take m = n² + n + 1 and let Ci be the set of all candidates for row i of the incidence matrix of a projective plane, then P(x, y) can be defined as

\[ P(x, y) = \begin{cases} \text{true} & \text{if } \langle x, y \rangle = 1, \\ \text{false} & \text{otherwise,} \end{cases} \]

where ⟨x, y⟩ denotes the inner product of rows x and y. A solution is then a complete incidence matrix.

In a backtrack approach, we generate k-tuples with k ≤ m. A k-tuple (x1, . . . , xk) is a partial solution at level k if P(xi, xj) is true for all i ≠ j ≤ k. The basic idea of the backtrack approach is to extend a partial solution at level k to one at level k + 1, and if this extension is impossible, then to go back to the partial solution at level k − 1 and attempt to generate a different partial solution at level k.

For example, consider the search for a projective plane of order 1. In terms of the incidence matrix, the problem is to find a 3 × 3 (0,1)-matrix A satisfying the matrix equation

\[ A A^{T} = I + J. \]

Suppose the matrix A is generated row by row. Since each row of A must have two 1's, the candidate sets Ci are all equal to {[110], [101], [011]}. A partial solution at level 1 can be formed by choosing any of these three rows as x1. If x1 is chosen to be [110], then there are only two choices for x2 which would satisfy the predicate P(x1, x2), namely, [101] and [011]. Each of these choices for x2 has a unique extension to a solution.

A nice way to organize the information inherent in the partial solutions is the backtrack search tree. Here, the empty partial solution ( ) is taken as the root, and the partial solution (x1, . . . , xk) is represented as the child of the partial solution (x1, . . . , xk−1). Following the computer science terminology, we call a partial solution a node. It is also customary to label a node (x1, . . . , xk) by only xk because this is the choice made at this level. The full partial solution can be read off the tree by following the branch from the root to the node in question. Figure 6 shows the search tree for the projective plane of order 1. The possible candidates are labeled as r1, r2, and r3. The right-most branch, for example, represents choosing r3 or [011] as the first row, r2 as the second row, and r1 as the third row.

FIGURE 6 Search tree for the projective plane of order 1.

It is often true that the computing cost of processing a node is independent of its level k. Under this assumption, the total computing cost of a search is equal to the number of nodes in the search tree times the cost of processing a node. Hence, the number of nodes in the search tree is an important parameter in a search. This number can be obtained by counting the nodes in every level, with αi defined as the node count at level i of the tree. For example, in the search tree for the projective plane of order 1 shown in Fig. 6, α1 = 3, α2 = 6, and α3 = 6.

B. Estimation

It is very difficult to predict a priori the running time of a backtrack program. Sometimes, one program runs to completion in less than 1 sec, while other programs seem to take forever. A minor change in the strategy used in the
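The backtrack search of Section A can be sketched in a few lines for the order-1 example. This is a minimal illustration under stated assumptions, not the program used for the order-10 search: the names `backtrack`, `inner`, and `level_counts` are invented here, and rows are represented as tuples of 0s and 1s.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]


def inner(x: Row, y: Row) -> int:
    """Inner product of two (0,1)-rows."""
    return sum(a * b for a, b in zip(x, y))


def backtrack(candidates: Sequence[Sequence[Row]],
              compatible: Callable[[Row, Row], bool]):
    """Enumerate all m-tuples whose entries are pairwise compatible.

    Returns (solutions, level_counts), where level_counts[i] is the number
    of partial solutions (nodes) at level i + 1 of the search tree.
    """
    m = len(candidates)
    level_counts = [0] * m
    solutions: List[Tuple[Row, ...]] = []
    partial: List[Row] = []

    def extend(k: int) -> None:
        # `partial` is a partial solution at level k; try to reach level k + 1.
        if k == m:
            solutions.append(tuple(partial))
            return
        for x in candidates[k]:
            if all(compatible(prev, x) for prev in partial):
                partial.append(x)
                level_counts[k] += 1
                extend(k + 1)
                partial.pop()  # backtrack: undo the choice and try the next candidate

    extend(0)
    return solutions, level_counts


# Projective plane of order 1: three rows, each with two 1's,
# any two distinct rows having inner product 1.
rows = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
solutions, alpha = backtrack([rows] * 3, lambda x, y: inner(x, y) == 1)
print(alpha)           # node counts per level
print(len(solutions))  # number of complete incidence matrices found
```

For this input the per-level counts agree with the values α1 = 3, α2 = 6, α3 = 6 quoted in the text. Under the constant-cost assumption, `sum(alpha)` multiplied by the cost per node gives the total cost of the search.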
One common optimization technique is to move invariant operations from the inside of a loop to the outside. This idea can be applied to a backtrack search in the following manner. We try to do as little as possible for nodes near the bulge, at the expense of more processing away from the bulge. For example, suppose we have a tree of depth 3 and that α1 = 1, α2 = 1000, and α3 = 1. If the time required to process each node is 1 sec, then the processing time for the search tree is 1002 sec. Suppose we can reduce the cost of processing the nodes at level 2 by a factor of 10 at the expense of increasing the processing cost of nodes at levels 1 and 3 by a factor of 100. Then, the total time is reduced to 1000 × 0.1 + 2 × 100 = 300 sec.

D. Practical Aspects

There are many considerations that go into developing a computer program which runs for months and years. Interruptions, ranging from power failures to hardware maintenance, are to be expected. A program should not have to restart from the very beginning for every interruption; otherwise, it may never finish. Fortunately, an enumerative computer proof can easily be divided into independent runs. If there is an interruption, just look up the last completed run and restart from that point. Thus, the disruptive effect of an untimely interrupt is now limited to the time wasted in the incomplete run. Typically, the problem is divided into hundreds or even millions of independent runs in order to minimize the time wasted by interruptions.

Another advantage of dividing a problem into many independent runs is that several computers can be used to run the program simultaneously. If a problem takes 100 years to run on one computer, then by running on 100 computers simultaneously the problem can be finished in 1 year.

E. Correctness Considerations

An often-asked question is, “How can one check a computer-based proof?” After all, a proof has to be absolutely correct. The computer program itself is part of the proof, and checking a computer program is no different from checking a traditional mathematical proof. Computer programs tend to be complicated, especially the ones that are highly optimized, but their complexity is comparable to some of the more difficult traditional proofs.

The actual execution of the program is also part of the proof, and the checking of this part is difficult or impossible. Even if the computer program is correct, there is still a very small chance that a computer makes an error in executing the program. In the search for a projective plane of order 10, this error probability is estimated to be 1 in 100,000.

So, it is impossible to have an absolutely error-free, computer-based proof! In this sense, a computer-based proof is an experimental result. As scientists in other disciplines have long discovered, the remedy is an independent verification. In a sense, the verification completes the proof.

VI. RECENT DEVELOPMENT: RSA FACTORING CHALLENGE

Recently, there has been a lot of interest in factorizing big numbers. It all started in 1977 when Rivest, Shamir, and Adleman proposed a public-key cryptosystem based on the difficulty of factorizing large numbers. Their method is now known as the RSA scheme. In order to encourage research and to gauge the strength of the RSA scheme, RSA Data Security, Inc. in 1991 started the RSA Factoring Challenge. It consists of a list of large composite numbers. A cash prize is given to the first person to factorize a number in the list. These challenge numbers are identified by the number of decimal digits contained in the numbers. In February 1999, the 140-digit RSA-140 was factorized. In August 1999, RSA-155 was factorized. The best known factorization method divides the task into two parts: a sieving part to discover relations and a matrix reduction part to discover dependencies. The sieving part has many similarities with an enumerative proof. One has to try many possibilities, and the trials can be divided into many independent runs. In fact, for the factorization of RSA-155, the sieving part took about 8000 MIPS years and was accomplished by using 292 individual computers located at 11 different sites in 3 continents. The resulting matrix had 6,699,191 rows and 6,711,336 columns. It took 224 CPU hours and 3.2 Gbytes of central memory on a Cray C916 to solve. Fortunately, there is never any question about the correctness of the final answer, because one can easily verify the result by multiplying the factors together.

VII. FUTURE DIRECTIONS

When the four-color conjecture was first settled by a computer, there was some hesitation in mathematics circles to accept it as a proof. There is first the question of how to check the result. There is also the aesthetic aspect: “Is a computer-based proof elegant?” The result itself is definitely interesting. The computer-based proof of the nonexistence of a projective plane of order 10 again demonstrated the importance of this approach. A lengthy enumerative proof is a remarkable departure
Computer-Generated Proofs
of Mathematical Theorems
David M. Bressoud
Macalester College
P1: ZCK Final
Encyclopedia of Physical Science and Technology EN003H-881 June 13, 2001 22:46
With Doron Zeilberger’s program EKHAD, one can enter the statement believed or suspected to be correct. If it is true, the computer will not only tell you so, it is capable of writing the paper ready for submission to a research journal. Even the search for likely theorems has been automated. A good deal of human input is needed to set parameters within which one is likely to find interesting results, but computer searches for mathematical theorems are now a reality.

The possible theorems to which this algorithm can be applied are strictly circumscribed, so narrowly defined that there is still a legitimate question about whether this constitutes true computer-generated proof or is merely a powerful mathematical tool. What is not in question is that such algorithms are changing the kinds of problems that mathematicians need to think about.

I. THE IDEAL VERSUS REALITY

A. What Cannot Be Done

Mathematics is frequently viewed as a formal language with clearly established underlying assumptions or axioms and unambiguous rules for determining the truth of every statement couched in this language. In the early decades of the twentieth century, works such as Russell and Whitehead’s Principia Mathematica attempted to describe all mathematics in terms of the formal language of logic. Part of the reason for this undertaking was the hope that it would lead to an algorithmic procedure for determining the truth of each mathematical statement. As the twentieth century progressed, this hope receded and finally vanished. In 1931, Kurt Gödel proved that no axiomatic system comparable to that of Russell and Whitehead could be used to determine the truth or falsehood of every mathematical statement. Every consistent system of axioms is necessarily incomplete.

One broad class of theorems deals with the existence of solutions of a particular form. Given the mathematical problem, the theorem either exhibits a solution of the desired type or states that no such solution exists. In 1900, as the tenth of his set of twenty-three problems, David Hilbert challenged the mathematical community: “Given a Diophantine equation with any number of unknown quantities and with rational integral numerical coefficients: To devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers.” A well-known example of such a Diophantine equation is the Pythagorean equation, x² + y² = z², with the restriction that we only accept integer solutions such as x = 3, y = 4, and z = 5. Another problem of this type is Fermat’s Last Theorem. This theorem asserts that no such positive integer solutions exist for the equation x^n + y^n = z^n when n is an integer greater than or equal to 3. We know that the last assertion is correct, thanks to Andrew Wiles.

For a Diophantine equation, if a solution exists then it can be found in finite (though potentially very long) time just by trying all possible combinations of integers, but if no solution exists then we cannot discover this fact just by trying possibilities. A proof that there is no solution is usually very hard. In 1970, Yuri Matijasevic̆ proved that Hilbert’s algorithm could not exist. It is impossible to construct an algorithm that, for every Diophantine equation, is able to determine whether it does or does not have a solution.

There have been other negative results. Let E be an expression that involves the rational numbers, π, ln 2, the variable x, the functions sine, exponential, and absolute value, and the operations of addition, multiplication, and composition. Does there exist a value of x where this expression is zero? As an example, is there a real x for which

e^x − sin(π ln 2) = 0?

For this particular expression the answer is “yes” because sin(π ln 2) > 0, but in 1968, Daniel Richardson proved that it is impossible to construct an algorithm that would determine in finite time whether or not, for every such E, there exists a solution to the equality E = 0.

B. What Can Be Done

In general, the problem of determining whether or not a solution of a particular form exists is extremely difficult and cannot be automated. However, there are cases where it can be done. There is a simple algorithm that can be applied to each quadratic equation to determine whether or not it has real solutions, and if it does, to find them. That x² − 4312x + 315 = 0 has real solutions may not have been explicitly observed before now, but it hardly qualifies as a theorem. The theorem is the statement of the quadratic formula that sits behind our algorithm. The conclusion for this particular equation is simply an application of that theorem, a calculation whose relevance is based on the theory.

But as the theory advances and the algorithms become more complex, the line between a calculation and a theorem becomes less clear. The Risch algorithm is used by computer algebra systems to find indefinite integrals in Liouvillian extensions of differential fields. It can answer whether or not an indefinite integral can be written in a suitably defined closed form. If such a closed form exists, the algorithm will find it. Most people would still classify a specific application of this algorithm as a calculation, but it is no longer always so clear-cut. Even definite integral evaluations can be worthy of being called theorems.
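The “simple algorithm” for quadratic equations mentioned in Section B can be written out as a short sketch. The function name `real_roots` is an assumption made for this illustration; the logic is the usual discriminant test followed by the quadratic formula.

```python
import math
from typing import List


def real_roots(a: float, b: float, c: float) -> List[float]:
    """Return the real solutions of a*x**2 + b*x + c = 0 (a must be nonzero).

    The sign of the discriminant decides whether real solutions exist;
    if they do, the quadratic formula produces them.
    """
    if a == 0:
        raise ValueError("not a quadratic equation: a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # no real solutions
    if disc == 0:
        return [-b / (2 * a)]          # one repeated real solution
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# The equation singled out in the text: x^2 - 4312x + 315 = 0.
roots = real_roots(1, -4312, 315)
print(roots)
```

Running the procedure on one equation is a calculation. The theorem is the quadratic formula that justifies every such run, which is the distinction the text draws.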
Freeman Dyson conjectured the following integral evaluation for positive integer z in 1962:

\[ (2\pi)^{-n} \int_0^{2\pi} \cdots \int_0^{2\pi} \prod_{1 \le j < k \le n} \bigl| e^{i\theta_j} - e^{i\theta_k} \bigr|^{2z} \, d\theta_1 \cdots d\theta_n = \frac{(nz)!}{(z!)^n}. \]

Four proofs have since been published. Dyson’s conjecture cannot be proven by the Risch or any other general integral evaluation algorithm because the dimension of the space over which the integral is taken is a variable, but its proof is now close to the boundary of what can be totally automated.

Most of this article will focus on the WZ method developed by Wilf and Zeilberger in the early 1990s. Given a suitable hypergeometric series, the WZ method will determine whether or not it has a closed form. If it does, the algorithm will find it. It can even be used to find new hypergeometric series that can be expressed in closed form. Again, the important mathematics is the theory that is used to create and justify the algorithm, but specific applications now look very much like theorems.

One example of a result that can be proved by the WZ method is the following identity, discovered and proved by J. C. Adams in the nineteenth century. Let Pn(x) be the Legendre polynomial defined by

\[ P_n(x) := \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k}^{2} (x - 1)^k (x + 1)^{n-k}, \]

\[ \sum_{0 \le k \le n/3} \frac{n}{n-k} \binom{n-k}{2k} 2^k = 2^{n-1} + \frac{1}{2}\bigl( i^n + (-i)^n \bigr), \qquad n \ge 2. \]

In general, the coefficients in the recursion will be polynomials in n. In 1991 Marko Petkovšek created an algorithm that will find a closed form solution for such a recursion, or prove that no such formula exists. The combination of the WZ method with Petkovšek’s algorithm gives an automated proof that a particular type of solution cannot exist, or else it finds such a solution. As an example, there is a computer-generated proof of the fact that

\[ \sum_{k=0}^{n} \binom{n}{k}^{2} \binom{n+k}{k}^{2} \]

cannot be written as a linear combination of hypergeometric terms in n.

The WZ method combined with Petkovšek’s algorithm is producing fully automated proofs of results that, until recently, have required considerable human ingenuity. Significantly, it replies not just with a statement that a particular identity is true, but also with a proof certificate, a critical insight that enables anyone with pencil and paper and a little time to verify that this identity is correct. At the very least, these algorithms have moved the line of demarcation between what constitutes a proof and what is only a computation.
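The closed form quoted on this page for the sum over 0 ≤ k ≤ n/3 can be spot-checked with exact arithmetic. The sketch below assumes the summand reads n/(n − k) · C(n − k, 2k) · 2^k, which is how the printed formula is interpreted here, and uses the fact that (i^n + (−i)^n)/2 takes the values 1, 0, −1, 0 as n runs through the residues 0, 1, 2, 3 modulo 4. The function names are invented for this example.

```python
from fractions import Fraction
from math import comb


def sum_side(n: int) -> Fraction:
    """Sum over 0 <= k <= n/3 of n/(n-k) * C(n-k, 2k) * 2^k, computed exactly."""
    return sum(Fraction(n, n - k) * comb(n - k, 2 * k) * 2 ** k
               for k in range(n // 3 + 1))


def closed_form(n: int) -> int:
    """2^(n-1) + (i^n + (-i)^n)/2, using the period-4 pattern of the last term."""
    half_power_sum = (1, 0, -1, 0)[n % 4]
    return 2 ** (n - 1) + half_power_sum


# Finite check only: agreement for 2 <= n < 40 is evidence, not a proof.
mismatches = [n for n in range(2, 40) if sum_side(n) != closed_form(n)]
print(mismatches)
```

A finite check of this kind is exactly the “computation” side of the line the text describes. The WZ machinery is what turns such an observation into a proof valid for all n.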
\[ \frac{1}{4} \, \frac{(m - 2n - k - 1)(m - 2n - k)}{(n + 1)(n + k + 1)} = \frac{\bigl(n + (k + 1 - m)/2\bigr)\bigl(n + (k - m)/2\bigr)}{(n + 1)(n + k + 1)}. \]

This is simply the Chu-Vandermonde identity with α = (k + 1 − m)/2, β = (k − m)/2, and γ = k + 1.

There is clearly an advantage to using the rising factorial notation, in which case we write

\[ {}_mF_n\!\left( \begin{matrix} \alpha_1, \ldots, \alpha_m \\ \beta_1, \ldots, \beta_n \end{matrix} ; x \right) := 1 + \sum_{k=1}^{\infty} \frac{(\alpha_1)_k \cdots (\alpha_m)_k}{k! \, (\beta_1)_k \cdots (\beta_n)_k} \, x^k. \]
\[ \sum_{0 \le k \le n/3} \frac{n}{n-k} \binom{n-k}{2k} 2^k = \sum_{0 \le k \le n/3} n \cdot 2^k \cdot \frac{(n - k - 1)!}{(2k)! \, (n - 3k)!} \]

is such a series.

Every such sum of proper hypergeometric terms will satisfy a finite recursion of the form

\[ \sum_{j=0}^{J} a_j(n) \, f(n + j) = 0. \]

Sister Celine showed how to reduce the problem of finding these coefficients to one of solving a system of linear equations. It was Doron Zeilberger who realized that this gives us an algorithm for proving hypergeometric series identities because we need only verify that each side satisfies the same recursion and the same initial conditions. The problem with using Sister Celine’s approach is that her particular algorithm for finding the coefficients is slow. Later developments would speed it up considerably, though in the process would lose the easy generalization of Sister Celine’s technique to summations over several indices.

B. Gosper’s algorithm

In 1977 and 1979, R. W. Gosper, Jr., took a different approach and became one of the first people to use computers to discover and check identities for hypergeometric series. Given a proper hypergeometric term F(n, k), Gosper showed how to automate a search for a proper hypergeometric term G(n, k) with the property that

\[ G(n, k + 1) - G(n, k) = F(n, k). \]

If such a G could be found, then

\[ f(n) = \sum_{k=0}^{n} \bigl( G(n, k + 1) - G(n, k) \bigr) = G(n, n + 1) - G(n, 0). \]

An example of the application of this algorithm is the computer-generated proof of an identity discovered and first proved by J. S. Lomont and John Brillhart: Let 1 ≤ m ≤ n, where n ≥ 2 and 1 ≤ s ≤ min(m, n − 1), then

\[ \sum_{j=0}^{s} (-1)^j (m + n - 2j) \binom{m}{j} \binom{m-j}{m-s} \binom{n}{j} \binom{n-j}{n-s} \times \binom{m+n}{j} \binom{m+n-s-j-1}{s-j} \binom{s}{j}^{-2} = 0. \]

Given this conjecture, the program EKHAD replies with

This means that if F(n, j) is the summand in the conjectured identity, then

\[ -s F(n, j) = G(n, j + 1) - G(n, j), \]

where G(n, j) = j(m + n − s − j)F(n, j)/(m + n − 2j). The sum over j of G(n, j + 1) − G(n, j) telescopes, and therefore the original summation equals [G(n, s + 1) − G(n, 0)]/(−s) = 0.

Gosper’s algorithm is a fertile approach that is often applicable, but it is limited by the fact that such a G does not always exist.

C. Wilf and Zeilberger

Major progress was made by Doron Zeilberger who, starting in 1982, began to combine the ideas of Sister Celine and William Gosper. In the early 1990s, Herbert Wilf joined Zeilberger in extending and refining these methods into a fully automated proof machine that is now known as the WZ method. If F(n, k) is a proper hypergeometric term, then there always is a proper hypergeometric term G(n, k) such that G(n, k + 1) − G(n, k) is equal to a linear combination of {F(n + j, k) | 0 ≤ j ≤ J} for some explicitly computable J,

\[ \sum_{j=0}^{J} a_j(n) \, F(n + j, k) = G(n, k + 1) - G(n, k), \tag{5} \]

where the a_j(n) are polynomials in n. If f(n) = \sum_{k=0}^{K} F(n, k), then we can sum both sides of Eq. (5) over 0 ≤ k ≤ K. The right side telescopes, and we are left with the recursive formula

\[ \sum_{j=0}^{J} a_j(n) \, f(n + j) = G(n, K + 1) - G(n, 0). \]

Gosper’s technique, which is very fast, can be used to find the function G. The coefficients a_j(n) are then found by solving a system of linear equations.

Gosper’s algorithm is the special case of the WZ method in which J = 0. The other case of particular interest is when J = 1 and a1 = −a0 = 1. Consider the conjectured identity:

\[ \sum_{k} \binom{n}{2k} \binom{2k}{k} \frac{n + 2}{k + 1} \, 2^{n-2k-1} = \binom{2n+1}{n}. \tag{6} \]

If we divide each side by \binom{2n+1}{n}, this can be rewritten as

\[ f(n) = \sum_{k} \binom{n}{2k} \binom{2k}{k} \frac{n + 2}{k + 1} \, 2^{n-2k-1} \binom{2n+1}{n}^{-1} = 1. \]
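Both identities discussed on this page can be spot-checked with exact rational arithmetic. The sketch below rests on two readings of the printed formulas that should be treated as assumptions: the Lomont-Brillhart summand is taken to contain the factor C(s, j)^(−2), and the summand of identity (6) is taken to contain the factor (n + 2)/(k + 1). The function names are invented for this example, and a finite check is evidence only, not a substitute for the telescoping proof.

```python
from fractions import Fraction
from math import comb


def identity6_lhs(n: int) -> Fraction:
    """Sum over k of C(n,2k) C(2k,k) (n+2)/(k+1) 2^(n-2k-1) (assumed reading of Eq. (6))."""
    return sum(comb(n, 2 * k) * comb(2 * k, k)
               * Fraction(n + 2, k + 1) * Fraction(2) ** (n - 2 * k - 1)
               for k in range(n // 2 + 1))


def lomont_brillhart_sum(m: int, n: int, s: int) -> Fraction:
    """Left side of the Lomont-Brillhart identity (assumed reading of the summand)."""
    total = Fraction(0)
    for j in range(s + 1):
        numerator = ((-1) ** j * (m + n - 2 * j)
                     * comb(m, j) * comb(m - j, m - s)
                     * comb(n, j) * comb(n - j, n - s)
                     * comb(m + n, j) * comb(m + n - s - j - 1, s - j))
        total += Fraction(numerator, comb(s, j) ** 2)
    return total


# Identity (6): compare with C(2n+1, n) for small n.
bad_n = [n for n in range(0, 30) if identity6_lhs(n) != comb(2 * n + 1, n)]

# Lomont-Brillhart: 1 <= m <= n, n >= 2, 1 <= s <= min(m, n - 1).
bad_mns = [(m, n, s)
           for n in range(2, 9)
           for m in range(1, n + 1)
           for s in range(1, min(m, n - 1) + 1)
           if lomont_brillhart_sum(m, n, s) != 0]

print(bad_n, bad_mns)
```

Agreement on these ranges supports the readings above. The certificate G returned by the program is what establishes the identities for all admissible parameters.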
If this is true, then f(n) satisfies the recursion f(n + 1) − f(n) = 0. Let F(n, k) be the summand,

\[ F(n, k) = \binom{n}{2k} \binom{2k}{k} \frac{n + 2}{k + 1} \, 2^{n-2k-1} \binom{2n+1}{n}^{-1}. \]

If we could find a proper hypergeometric term G(n, k) for which

IV. EXTENSIONS, FUTURE WORK, AND CONCLUSIONS

A. Extensions and Future Work

All of the techniques described in this article have been extended to q-hypergeometric series such as
Convex Sets
A. C. Thompson
Dalhousie University
I. Introduction
II. Definitions
III. Examples
IV. Descriptions of Convex Sets
V. New Convex Sets from Old
VI. Spaces of Convex Sets
VII. Basic Theorems
VIII. Volumes and Mixed Volumes
IX. Inequalities
X. Special Classes of Convex Sets
GLOSSARY
Affine map The composition of a linear map and a translation; i.e., if T is linear, then A(x) := T(x + x0) = T(x) + T(x0) is an affine map.
Compact set A closed and bounded set in Rn; in a metric space, a set C such that if (xk) is a sequence of elements of C there is a subsequence that converges to a point of C.
Dual space The collection of all linear functions from a vector space to the set of real numbers. These functions can be added and multiplied by real numbers in a point-by-point fashion which makes this collection into another vector space.
Hyperplane A level set of a function in the dual space; i.e., if f is a linear function from a vector space X to R and if α is a number, then H_f^α := {x ∈ X : f(x) = α} is a typical hyperplane.
Line segment The set of points (vectors) {x : x = a + λ(b − a), 0 ≤ λ ≤ 1} = {x : x = (1 − λ)a + λb, 0 ≤ λ ≤ 1} is called the line segment joining a and b and is denoted by [a, b].
Linear map (transformation) A function T between vector spaces that respects the vector operations of addition and multiplication by numbers; i.e., T(αx + βy) = αT(x) + βT(y).
Rn The most usual vector spaces consisting of n-tuples of real numbers that are added and multiplied by numbers in a coordinate-by-coordinate fashion.
Vector space A collection of things called vectors or points that can be added (via a parallelogram law) and multiplied by numbers (also called scalars). Here the numbers will be real but in other contexts complex or other number systems are possible.
P1: FWQ Final Pages
Encyclopedia of Physical Science and Technology EN003A-146 June 14, 2001 3:14
I. INTRODUCTION A set of vectors B with the property that every vector has
a unique representation as a finite linear combination of
Relatively few shapes in the natural world are convex. the vectors in B is called a basis for X . Each vector space
When they do occur— for example, soap bubbles, drops has a basis, and all bases for the same space have the same
of dew, smoothly worn stones on the beach, single crys- number of elements. That number is called the dimension
tals of amethyst and salt— we find them to be estheti- of the space. The dimension of Rn is n. The space is said to
cally pleasing. Among manufactured objects, rectangles, be finite dimensional if it has a basis with a finite number of
circles, hexagons, cubes, cylinders, and cones are quite elements and is infinite dimensional otherwise. We shall be
ubiquitous. We first encounter them as children and enjoy concerned almost entirely with finite dimensional spaces.
the shapes of wooden building blocks and colored tiles. A linear map is called an isomorphism if it is one to one
Convexity is the study of these shapes. Two- and onto. If X is finite dimensional, then, corresponding to
dimensional convex shapes (circles, ellipses, triangles, a basis (x1 , x2 , . . . , xn ), there is an isomorphism, T , of X
polygons) and the regular Platonic solids have been ob- onto Rn defined asfollows: each x ∈ X has a unique repre-
jects of mathematical study for a very long time. The study sentation as x = αi xi ; set T (x) := (α1 , α2 , . . . , αn ). In
of convexity as a specific mathematical topic dates back this sense, there is no real loss of generality if, when deal-
only to the end of the 19th century. The primary influence ing with finite dimensional spaces, we restrict attention to
was the pioneering work of Minkowski for which one Rn .
should consult his collected works.³ For a good historical summary, see the article by Peter Gruber in the Handbook of Convex Geometry.² This chapter covers only some of the topics that fall under the heading of convexity. Other aspects can be found in articles in Reference 2. There is not space to mention the many interactions of convexity with other branches of mathematics. The book by Roger Webster⁵ is a readable, elementary introduction to the subject and has some interesting applications.
To define convex sets precisely we need an ambient space in which they may exist. This space requires one type of mathematical structure and usually comes equipped with another.
The structure that is required is that of a vector or linear space; the most familiar vector spaces are those that we denote by $R^2$, $R^3$, and, in general, $R^n$. This space consists of n-tuples of numbers $(\xi_1, \xi_2, \ldots, \xi_n)$ whose entries are called the coordinates of the vector. It is customary to write single vectors as rows, but matrices (and other functions that operate on vectors) are usually written to the left of the vector. This means that the vectors should ‘really’ be viewed as columns.
If a vector y is expressed in the form:
$$y = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k$$
then it is said to be a linear combination of the vectors $\{x_i : i = 1, \ldots, k\}$. The usual basis for $R^n$ consists of the vectors:
$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, 0, \ldots, 1)$$
where the ith vector has a 1 as the ith coordinate and a 0 elsewhere. Any vector $x = (\xi_1, \xi_2, \ldots, \xi_n)$ can be expressed uniquely as:
$$x = \xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_n e_n.$$
The linear mappings from X to R are given the special name of linear functionals. The set of all of them is called the dual space of X and denoted by $X^*$. If X is finite dimensional and if it is given a basis, then the linear functionals can be represented by $1 \times n$ matrices (i.e., row vectors of the same size as the column vectors from X). Thus, in the finite dimensional case, the dimension of $X^*$ is the same as that of X.
In $R^n$, the length of a vector $x = (\xi_1, \xi_2, \ldots, \xi_n)$ is denoted by $\|x\|$ and is defined to be the number:
$$\|x\| := \Bigl( \sum_{i=1}^{n} |\xi_i|^2 \Bigr)^{1/2}.$$
The structure that is usually present in discussions of convex sets in addition to the vector space structure just outlined is that of a topological space. Most frequently the topology derives from a metric or distance. This is the case in $R^n$ where the distance between two vectors x and y, $d(x, y)$, is defined to be the length of $x - y$:
$$d(x, y) := \|x - y\|.$$
The topological structure allows one to talk about convergence, continuity, and such concepts as open sets, closed sets, connected sets, and compact sets. The Heine-Borel Theorem asserts that C is a compact subset of $R^n$ if and only if it is both closed and bounded. This is no longer true in infinite dimensional spaces.
II. DEFINITIONS
Definition. A set K in a vector space X is said to be convex if whenever $x, y \in K$ then the line segment $[x, y]$ is contained in K. Thus, in Fig. 1, the set (a) is convex while (b) is not.
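The length, the distance, and the definition of a convex set can be tried out numerically. The following is a minimal sketch, assuming vectors are stored as plain tuples; the helper names (`length`, `distance`, `segment_point`, `segment_stays_inside`) and the two sample sets are illustrative choices, not part of the article. Sampling a segment can only refute convexity for a particular pair of points; it does not prove it.

```python
import math


def length(x):
    """Euclidean length of a vector given as a tuple of coordinates."""
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))


def distance(x, y):
    """Distance d(x, y), defined as the length of x - y."""
    return length(tuple(a - b for a, b in zip(x, y)))


def segment_point(x, y, lam):
    """Point (1 - lam) * x + lam * y of the segment [x, y], 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return tuple((1 - lam) * a + lam * b for a, b in zip(x, y))


def segment_stays_inside(member, x, y, steps=100):
    """Check sampled points of [x, y] against a membership test."""
    return all(member(segment_point(x, y, k / steps)) for k in range(steps + 1))


def in_unit_disc(p):
    """Closed unit disc: a convex set."""
    return length(p) <= 1.0 + 1e-12


def in_annulus(p):
    """Ring between radius 1/2 and 1: not a convex set."""
    return 0.5 <= length(p) <= 1.0 + 1e-12
```

With these definitions, the segment from (-1, 0) to (1, 0) stays inside the disc but leaves the annulus at its midpoint, so the annulus fails the definition.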
P1: FWQ Final Pages
Encyclopedia of Physical Science and Technology EN003A-146 June 14, 2001 3:14
FIGURE 1
A point of the line segment $[a, b]$, that is, a point of the form:
$$a + \lambda(b - a) = (1 - \lambda)a + \lambda b, \quad 0 \le \lambda \le 1,$$
is said to be a convex combination of a and b.
A related notion is that of being star shaped. A set S is star shaped about a point $x_0$ if for all x in S the line segment $[x, x_0]$ is contained in S. In Fig. 2, the set (a) is star shaped about x but not about y, and (b) is not star shaped about any point. A set is convex if and only if it is star shaped about each of its points. Each convex set (and each star-shaped set) is connected.
In the one-dimensional space R the collections of convex sets, star-shaped sets, and connected sets coincide. Each of these is the class of intervals (closed, open, half-open, bounded, and unbounded). Therefore, in order to have an interesting theory of convexity, the dimension of the underlying space should be at least 2.
There is also a definition of convexity as an adjective that applies to functions rather than sets:
Definition. A real valued function f defined on a convex set K is said to be convex if, for all x and y in K and all $0 \le \lambda \le 1$,
$$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda) f(y).$$
Note that the convexity of K is needed to ensure that $\lambda x + (1 - \lambda)y$ is in the domain of f when x and y are.
The relation between the definitions of convexity for a function and for a set is twofold. The graph of f is a subset of $K \times R$ and is defined as:
$$\mathrm{graph}(f) := \{(x, \eta) : f(x) = \eta\}.$$
Extending this idea, the epigraph of f is the set that lies above the graph of f:
$$\mathrm{epigraph}(f) := \{(x, \eta) : f(x) \le \eta\}.$$
Then, f is convex (as a function) if and only if epigraph(f) is convex as a set. Secondly, if f is convex then, for each (extended) real number $\alpha$, the sets $\{x : f(x) \le \alpha\}$ and $\{x : f(x) < \alpha\}$ are convex. This illustrates the connection between convexity and certain types of inequality.
A function is sublinear if it is both subadditive, $f(x + y) \le f(x) + f(y)$, and non-negatively homogeneous, $f(\alpha x) = \alpha f(x)$ for all $\alpha \ge 0$. From now on, “homogeneity” will mean “non-negative homogeneity.” All linear functionals are sublinear and all sublinear functions are convex. Hence, if f is sublinear then sets of the form $\{x : f(x) \le \alpha\}$ are convex.
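The two claims in the last paragraph can be checked in a few lines. The following worked derivation is a sketch that uses only the definitions given above: subadditivity, non-negative homogeneity, and the defining inequality of a convex function.

```latex
% Sublinear implies convex: for x, y in the domain and 0 <= lambda <= 1,
\begin{align*}
f(\lambda x + (1-\lambda) y)
  &\le f(\lambda x) + f((1-\lambda) y)
     && \text{(subadditivity)} \\
  &= \lambda f(x) + (1-\lambda) f(y)
     && \text{(homogeneity, since } \lambda \ge 0 \text{ and } 1-\lambda \ge 0\text{)}.
\end{align*}

% Sublevel sets of a convex function are convex:
% if f(x) <= alpha and f(y) <= alpha, then
\begin{align*}
f(\lambda x + (1-\lambda) y)
  \le \lambda f(x) + (1-\lambda) f(y)
  \le \lambda \alpha + (1-\lambda) \alpha
  = \alpha .
\end{align*}
```

The second chain shows that the segment $[x, y]$ stays inside $\{x : f(x) \le \alpha\}$, which is exactly the definition of a convex set.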
FIGURE 2
is usually referred to as the triangle inequality and is a consequence of the Cauchy-Schwarz inequality.
A general n-simplex is the image of $S_n$ under an invertible affine map.
coplanar, and so on. The convex hull of $(n + 1)$ points in general position (necessarily in $R^m$ with $m \ge n$) is an n-simplex. To see that there is an affine map of this set onto $S_n$, observe that there is a translation that takes one point to 0 and then a linear map that takes the remaining n to the usual basis vectors. Second, the relative interior of K is the interior when K is regarded as a subset of its affine hull. The relative interior of a non-empty convex set is always non-empty.
Definition. A face F of a convex set K is a convex subset of K with the property that if y is in F and if y can be represented in the form $y = \alpha x_1 + (1 - \alpha)x_2$ with $x_1$ and $x_2$ in K, then, in fact, $x_1$ and $x_2$ are in F. A more geometrical description is that if an open line segment in K contains points of F then the whole line segment lies in F. If P is an n-dimensional polytope in $R^n$, then its 0-dimensional faces are called vertices, its one-dimensional faces are called edges, and the $(n - 1)$-dimensional faces will be called facets (this latter term is not universally used).
Examples. The faces of the standard cube $C_3$ in $R^3$ are the cube itself, the six facets of the cube (the sets where one fixed coordinate has a prescribed value from $\{-1, 1\}$), the 12 edges of the cube (the sets where two fixed coordinates have prescribed values from $\{-1, 1\}$), and the eight vertices (the sets where all three coordinates have prescribed values from $\{-1, 1\}$).
The faces of the standard simplex $S_3$ in $R^3$ are $S_3$; the four facets of the simplex (the intersections of $S_3$ with the planes $\{x : \xi_i = 0\}$ and $\{x : \sum \xi_i = 1\}$); the six edges (the line segments $[0, e_i]$, $[e_1, e_2]$, $[e_2, e_3]$, $[e_3, e_1]$); and the four vertices $\{0\}$, $\{e_i\}$.
Faces of a convex set K that are single points $\{z\}$ are called extreme points of K.
$(1, 0, 1)$, $(1, 0, -1)$, and all the points of the circle except $(1, 0, 0)$.
The next way to describe a convex set is to first suppose that the origin is an interior point of the set. This effectively means that the set has interior points because one can always either translate the set or choose the origin appropriately. Then one describes the set by saying how far in any direction the boundary is from the origin.
Definition. If K is a convex set with 0 as an interior point, then the radial function of K, $r_K(x)$, is defined by:
$$r_K(x) := \sup\{\lambda : \lambda x \in K\}.$$
A slight variant of this is to say how much K has to be dilated to contain a given vector.
Definition. If K is a convex set with 0 as an interior point, then the gauge function (or Minkowski functional) of K, $g_K(x)$, is defined by:
$$g_K(x) := \inf\{\lambda \ge 0 : x \in \lambda K\}.$$
It is evident that $1/r_K(x) = g_K(x)$ (with $1/\infty = 0$ if K is unbounded). The function $g_K$ has the advantage of being both homogeneous and subadditive (i.e., sublinear). If K is bounded, then the radial function is finite for all $x \ne 0$ and the gauge function is non-zero for $x \ne 0$. If, in addition, K is symmetric about the origin so that $g_K(-x) = g_K(x)$, then the gauge function has the properties of a norm. This leads to the use of convexity in the study of normed spaces. If K is closed, it is described as $K = \{x : g_K(x) \le 1\}$. The boundary of K (denoted by $\partial K$) is the set of points for which $g_K(x) = r_K(x) = 1$.
Instead of thinking of the boundary of K as a set of points, we may also regard it as an envelope of half-spaces (in the same way that a polytope was described as a finite intersection of half-spaces).
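For a set given as a finite intersection of half-spaces, the gauge and radial functions have a closed form. The sketch below assumes K is written as $\{y : a \cdot y \le 1 \text{ for every listed vector } a\}$, which puts 0 in the interior; this representation and the names `gauge_polytope`, `radial_polytope`, and `CUBE3_NORMALS` are illustrative assumptions. Under that assumption $x \in \lambda K$ for $\lambda > 0$ exactly when every $a \cdot x \le \lambda$, so the infimum in the definition is the largest of the values $a \cdot x$, cut off at 0.

```python
def gauge_polytope(normals, x):
    """Gauge g_K(x) for K = {y : a . y <= 1 for every a in normals}."""
    largest = max(sum(ai * xi for ai, xi in zip(a, x)) for a in normals)
    return max(0.0, float(largest))


def radial_polytope(normals, x):
    """Radial function r_K(x) = 1 / g_K(x), with 1/0 read as infinity."""
    g = gauge_polytope(normals, x)
    return float("inf") if g == 0.0 else 1.0 / g


# Standard cube C_3 = [-1, 1]^3 written as six half-spaces a . y <= 1.
CUBE3_NORMALS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]
```

For the cube the gauge is the largest absolute coordinate, which is a norm because the cube is bounded and symmetric about the origin, in line with the remark above.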
Theorem. If K is a non-empty convex body, then
$$K = \{x : f(x) \le h_K(f) \text{ for all linear functionals } f\}.$$
The proof of the second theorem requires the separation theorem from Section VII. There are a variety of proofs of the first theorem (see Schneider⁴).
V. NEW CONVEX SETS FROM OLD
A. The Convex Hull
We begin this section with a second look at the operation of the convex hull, the intersection of all convex sets con-
Definition. If A is a non-empty subset of X, then the polar of A, denoted by $A^\circ$ (or $A^*$), is defined by:
$$A^\circ := \{f \in X^* : f(x) \le 1 \text{ for all } x \in A\}.$$
If $A \subseteq X$, then $A^\circ \subseteq X^*$. Often, the distinction between X and $X^*$ is obscured and the inner product is used in the definition of $A^\circ$. Similarly, for a non-empty set B in $X^*$ we have:
$$B^\circ := \{x \in X : f(x) \le 1 \text{ for all } f \in B\}.$$
Repeated applications of this operation are indicated without parentheses, thus $A^{\circ\circ}$, $B^{\circ\circ}$, and so on. For all sets A (in either X or $X^*$) we have $A \subseteq A^{\circ\circ}$. The operation
reverses inclusions: If $A_1 \subseteq A_2$, then $A_2^\circ \subseteq A_1^\circ$. It follows that $A^\circ = A^{\circ\circ\circ}$ always.
The definition of $B^\circ$ reveals it to be an intersection of closed half-spaces so it is always a closed, convex set in X that contains 0. The same holds for $A^\circ$ in $X^*$. If $A = \{0\}$, then $A^\circ = X^*$ and $X^\circ = \{0\}$. If A is a single point other than 0, then $A^\circ$ is a half-space. Thus, the duality between points and half-spaces (encountered in Section IV) is implemented by this operation. However, if A is a half-space, then $A^\circ$ is not a singleton but a line segment joining the expected point to 0. Because $A^\circ$ is always a closed, convex set containing the origin, in order to get an exact duality we must restrict attention to this class of sets.
Theorem. If K is a closed, convex set with $0 \in K$, then $K = K^{\circ\circ}$.
This is a most important theorem whose proof relies on the separation theorem in Section VII. For an arbitrary set A, $A^{\circ\circ}$ is the closed convex hull of A and 0.
On the collection of closed convex sets that contain 0, the polar mapping is one to one and maps this collection onto the corresponding collection in $X^*$. If 0 is an interior point of K, then $K^\circ$ is compact. If K is compact, then $K^\circ$ has 0 as an interior point. Thus, the polar operation also maps the class of compact convex sets with 0 as an interior point onto the same class in $X^*$. If K is also symmetric about 0, then so is $K^\circ$. In this last case, K plays the role of the unit ball in a normed space and $K^\circ$ is the dual ball in the dual space $X^*$.
be convex, but the convex hull of the union is the smallest convex set that contains both of them. Therefore, we define the following binary operations:
$$K_1 \wedge K_2 := K_1 \cap K_2$$
$$K_1 \vee K_2 := \mathrm{co}(K_1 \cup K_2).$$
With these operations, the collection of convex sets is a lattice. The underlying order relation is that of inclusion. There are two important sublattices: the collection of closed convex sets that contain 0 and the collection of compact convex sets with 0 as an interior point. On each of these sublattices, the polar map is a bijection that reverses inclusion and hence reverses the lattice operations:
$$(K_1 \wedge K_2)^\circ = (K_1 \cap K_2)^\circ = K_1^\circ \vee K_2^\circ = \mathrm{co}(K_1^\circ \cup K_2^\circ)$$
$$(K_1 \vee K_2)^\circ = (\mathrm{co}(K_1 \cup K_2))^\circ = K_1^\circ \wedge K_2^\circ = K_1^\circ \cap K_2^\circ.$$
D. Algebraic Operations on Convex Sets
In addition to the operations just discussed, vector operations can be performed on the collection of convex sets. These are done “elementwise.”
Definition. If $K_1$ and $K_2$ are convex sets in a vector space X and $\lambda$ is a non-negative scalar, then:
$$K_1 + K_2 := \{x : x = x_1 + x_2 \text{ with } x_i \in K_i\}$$
and
FIGURE 7
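The polar operation described in this section can be explored for polytopes given by their vertices. The sketch below identifies X with $X^*$ through the inner product, as the text says is often done, and relies on the fact that a linear functional is at most 1 on a finite point set exactly when it is at most 1 on its convex hull. The function names and the two vertex lists are illustrative assumptions.

```python
def dot(f, x):
    """Inner product, used to identify a functional f with a vector."""
    return sum(fi * xi for fi, xi in zip(f, x))


def in_polar(f, points, tol=1e-12):
    """True if f(x) <= 1 for every x in `points`, i.e. f lies in the polar
    of the point set (and hence of its convex hull)."""
    return all(dot(f, x) <= 1.0 + tol for x in points)


# Vertices of the cross polytope O_2 and of the cube C_2 in the plane.
CROSS_POLYTOPE_2 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
CUBE_2 = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
```

In this example the corner (1, 1) of the cube lies in the polar of the cross polytope while (1.1, 0) does not, and (0.5, 0.5) lies on the boundary of the polar of the cube. This is consistent with the cube and the cross polytope being polar to each other, the smaller set having the larger polar.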
accomplished by first proving the result for polytopes and then extending it to all convex bodies “by continuity.”
A convex set is said to be strictly convex if its boundary does not contain any line segment (of positive length). A convex set is said to be smooth if there is a unique supporting hyperplane at each point of its boundary. The sets of smooth, of strictly convex, and of both smooth and strictly convex bodies are all dense in the set of all convex bodies.
B. The Banach-Mazur Metric
It is sometimes appropriate to consider that convex sets of different dimension are infinitely far apart. This section is concerned with metrics of this sort. We limit our attention to convex bodies with 0 as an interior point.
If K and L are two convex bodies with 0 in their interiors, then there are scalars $\lambda$ and $\mu$ such that:
$$K \subseteq \lambda L \quad \text{and} \quad L \subseteq \mu K.$$
As before, we can now take $\lambda_0$ and $\mu_0$ to be the minimal such $\lambda$ and $\mu$. Then set $\delta(K, L) := \lambda_0 \mu_0$. The fundamental result is now John’s Theorem.
Theorem (John). If K is a centrally symmetric convex body in an n-dimensional space X and if 0 is an interior point of K, then there is an ellipsoid E such that:
$$E \subseteq K \subseteq \sqrt{n}\, E.$$
The standard cube $C_n$ and cross polytope $O_n$ show that $\sqrt{n}$ cannot be improved. If we remove the condition that K be centrally symmetric, then we must replace $\sqrt{n}$ by n. This bound is attained by the simplex $S_n$.
The ellipsoid E that appears in this result is of considerable interest. It is the ellipsoid of maximal volume contained in K and is called the Löwner-John ellipsoid. It occurs in linear and nonlinear programming in Khachiyan’s polynomial time algorithm and in Schor’s algorithm.
The functional $\delta$ is not a metric for two reasons. First, the construction is multiplicative rather than additive; e.g., $\delta(K, K) = 1$ rather than 0. If we want a genuine metric we must take the logarithm of $\delta$. Since not all authors do so, one should be careful when reading the literature to see the precise definition.
Second, $\log(\delta(K, \alpha K)) = 0$; therefore, one should consider equivalence classes of multiples of K. However, it is more appropriate to enlarge the equivalence classes and say that the sets $K_1$ and $K_2$ are equivalent if there is an invertible linear map T such that $T(K_1) = K_2$.
Finally, the definition of the Banach-Mazur metric is
$$\Delta(K, L) := \inf\{\log[\delta(K, T(L))] : T \text{ is an invertible linear map}\}.$$
An infimum is used here because it is also a useful definition in the case of infinite dimensional spaces. Restricted to the finite dimensional case, the infimum is attained.
The functional $\Delta$ is a metric on the equivalence classes of convex sets under the above equivalence relation. If we consider the norms generated by K and L, then the equivalence relation is one of isometry between normed spaces and $\Delta$ measures the distance between equivalence classes of normed spaces.
John’s Theorem now says that the distance between the equivalence class containing K and the set of ellipsoids (which is the equivalence class containing B) is no more than $(\log n)/2$. Hence, the distance between any two equivalence classes is no more than $\log n$. If we allow non-symmetric sets, then these numbers are doubled. It is surprising that the exact diameter of these metric spaces is only known in the case of two dimensions.
VII. BASIC THEOREMS
A. Separation and Support Theorems
The notion of separation involves placing a hyperplane between two convex sets. There are varying degrees of separation that can be considered.
Definition. A hyperplane $H = H_f^{\alpha}$ is said to separate the convex sets K and L if K lies in one of the closed half-spaces determined by H, and L lies in the other. The separation is proper if it is not the case that both sets lie in H. The separation is strict if one can replace “closed” by “open” in the first sentence. Finally, the separation is strong if there exist $\alpha$ and $\beta$ with $\alpha < \beta$ and $K \subseteq H_f^{\alpha-}$ and $L \subseteq H_f^{\beta+}$ (or vice versa).
Separation Theorem. Let K be a convex set in $R^n$ and suppose x is not in K; then K and x can be separated. If K is closed, then K and x can be strongly separated.
It follows from the second statement that every closed convex set can be represented as an intersection of closed half-spaces. It follows from the first statement that if K has a non-empty interior then at each point x of the boundary of K there is a supporting hyperplane obtained by separating x from the interior of K.
There is a converse to this theorem. If A is a closed set with a non-empty interior and if, at each boundary point, there is a supporting hyperplane, then A is convex.
The separation theorem can be generalized to the following: If K and L are two convex sets whose relative interiors are disjoint, then K and L can be separated. If the sets are disjoint, one is closed, and the other is compact, then the separation is strong.
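The strong separation of a point from a closed convex set can be made concrete when the nearest point of the set is easy to compute. The sketch below is an illustration for an axis-parallel box, where the nearest point is obtained by clamping each coordinate; it is not the proof used in the article, and the names `project_onto_box` and `separate_point_from_box` are hypothetical. With p the nearest point to x and $f := x - p$, every y in the box satisfies $f(y) \le \alpha := f(p)$, while $f(x) = \beta := \alpha + \|f\|^2 > \alpha$.

```python
def dot(f, x):
    """Inner product, used to evaluate the linear functional f at x."""
    return sum(fi * xi for fi, xi in zip(f, x))


def project_onto_box(x, lo, hi):
    """Nearest point to x in the box {y : lo_i <= y_i <= hi_i}."""
    return tuple(min(max(xi, l), h) for xi, l, h in zip(x, lo, hi))


def separate_point_from_box(x, lo, hi):
    """Return (f, alpha, beta) with f(y) <= alpha < beta = f(x) on the box."""
    p = project_onto_box(x, lo, hi)
    f = tuple(xi - pi for xi, pi in zip(x, p))
    if all(fi == 0 for fi in f):
        raise ValueError("x lies in the box, so it cannot be separated")
    alpha = dot(f, p)  # largest value of f on the box
    beta = dot(f, x)   # value of f at the separated point
    return f, alpha, beta
```

For the square $[-1, 1]^2$ and the point (3, 2), this gives f = (2, 1), $\alpha = 3$, and $\beta = 8$; since a linear functional attains its maximum over a box at a vertex, checking the four vertices is enough to confirm $f \le \alpha$ on the whole square.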
The proof of the seemingly more general statement follows from the earlier one because K and L can be (strongly) separated if and only if $K - L$ and $\{0\}$ can be (strongly) separated. Also, if one set is compact and the other closed, then $K - L$ is closed. This is not true in general for two closed sets.
Example. In $R^2$ the sets $K := \{(x, y) : y = 0\}$ and $L := \{(x, y) : x > 0 \text{ and } y \ge 1/x\}$ are both closed and convex. They cannot be strongly (or even strictly) separated, and the set $K - L$ is open.
These theorems have very important analogs in infinite dimensional spaces. In that setting, they are all consequences of the Hahn-Banach Theorem, which is one of the most important theorems of functional analysis.
B. Carathéodory’s Theorem and Its Relatives
The following theorems are all closely related, but the Carathéodory result appears the most fundamental.
Theorem (Carathéodory). If A is a subset of an n-dimensional space and if $x \in \mathrm{co}\,A$, then x can be expressed as a convex combination of $(n + 1)$ or fewer points of A.
Another way of phrasing the conclusion is to say that x is a convex combination of a set of points in general position. Another is to say that x lies in a simplex whose vertices are in A. Thus, when constructing the convex hull, the length of the convex combinations needed is bounded (in finite dimensional spaces).
Theorem (Radon). If a finite set F of points in an n-dimensional space is not in general position, then it may be decomposed into two disjoint subsets $F_1$ and $F_2$ such that $\mathrm{co}(F_1) \cap \mathrm{co}(F_2) \ne \emptyset$.
In particular, this is true of any set of at least $(n + 2)$ points.
Theorem (Helly). Let $K_1, K_2, \ldots, K_m$ be a finite family of convex sets in an n-dimensional space ($m \ge n + 1$). If every subfamily with exactly $(n + 1)$ members has a non-empty intersection, then the whole family has a non-empty intersection.
Eggleston¹ shows how Helly’s Theorem can be derived from Carathéodory’s and conversely.
Since any family of compact sets has a non-empty intersection if every finite subfamily does, there is an easy extension to infinite families of compact convex sets. If an arbitrary family of compact convex sets in an n-dimensional space is such that every subfamily with $(n + 1)$ members has a non-empty intersection, then so does the whole family. A transversal for a family of sets is a line that meets every member of the family.
Corollary. If F is a finite family of parallel line segments in $R^2$ such that every three of them have a transversal, then the whole family has a transversal.
Corollary. Let F be a finite family of convex sets in $R^n$ and let K be a convex set. If for every finite subfamily with $(n + 1)$ elements there is a translate of K that intersects each member of the subfamily, then there is a single translate of K which intersects every member of the whole family.
Theorem (Kirchberger). Let $F_1$ and $F_2$ be finite sets in an n-dimensional space such that $F_1 \cup F_2$ has at least $(n + 2)$ elements. If for every subset F of $F_1 \cup F_2$ with exactly $(n + 2)$ points the sets $F \cap F_1$ and $F \cap F_2$ can be strictly separated, then $F_1$ and $F_2$ can be strictly separated.
Webster⁵ gives an elegant proof of Jung’s theorem also based on Helly’s theorem.
Theorem (Jung). Every set A in $R^n$ with diameter 1 is contained in a closed ball of radius no more than $\sqrt{n/(2n + 2)}$.
Finally, we give Krasnosel’skiǐ’s Theorem (sometimes called the “Art Gallery” Theorem).
Theorem (Krasnosel’skiǐ). Let A be a compact subset of $R^n$. If, for every $(n + 1)$ points $a_1, a_2, \ldots, a_{n+1}$ of A there is a point x of A such that the line segments $[x, a_i]$ all lie in A, then A is star shaped.
In other words, in an art gallery, if for every finite set of $(n + 1)$ pictures there is a point in the gallery from which one can see all $(n + 1)$, then there is a point from which one can see the whole art gallery.
All of these theorems have been much generalized. For one collection of such results, see the article by J. Eckhoff in the Handbook of Convex Geometry.²
VIII. VOLUMES AND MIXED VOLUMES
For many of the more interesting properties of convex sets it is necessary to measure them in some way. Such measurement requires more advanced ideas than were presented in Section I. The basic concept is that of volume in an n-dimensional space. There are several approaches which coincide for compact convex sets but may not for more general types of sets.
The most straightforward approach is Eggleston’s,¹ which says that the volume of a convex set in $R^n$ is its n-dimensional Lebesgue measure. The volume of a set K in $R^n$ will be denoted by $V_n(K)$. One-dimensional volume is usually called length, and $V_2$ is usually called area.
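Returning to Helly’s theorem and Jung’s theorem stated earlier in this section, both are easy to check in small cases. The sketch below treats the one-dimensional case of Helly’s theorem, where the convex sets are closed intervals and $n + 1 = 2$, and evaluates Jung’s bound $\sqrt{n/(2n + 2)}$. The function names and sample families are illustrative assumptions.

```python
import math
from itertools import combinations


def intervals_intersect(intervals):
    """True if the closed intervals (a, b) share a common point."""
    return max(a for a, _ in intervals) <= min(b for _, b in intervals)


def helly_hypothesis_1d(intervals):
    """Every subfamily with n + 1 = 2 members has a common point."""
    return all(intervals_intersect(pair) for pair in combinations(intervals, 2))


def jung_radius(n, diameter=1.0):
    """Jung's bound on the radius of a ball containing a set in R^n."""
    return diameter * math.sqrt(n / (2 * n + 2))
```

In the plane the bound is $1/\sqrt{3}$, which is the circumradius of an equilateral triangle of side 1, so the bound cannot be lowered there.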
$$V(K, n - 1; L, 1)^n \ge V_n(K)^{n-1}\, V_n(L)$$
with equality if and only if K and L are homothetic.
The power of this inequality is shown by the fact that, substituting the unit ball for L, one immediately gets the isoperimetric theorem for convex sets.
Theorem (Isoperimetric). If K is a convex body in $R^n$ with prescribed volume v, then $A(K) \ge A(B_v)$, where $B_v$ is the dilation of B with volume v. Moreover, equality holds if and only if K is a translate of $B_v$.
Corollary (Isoperimetric inequality). If K is a convex body in $R^n$, then:
$$\frac{A(K)^n}{V_n(K)^{n-1}} \ge \frac{A(B)^n}{V_n(B)^{n-1}} = n^n \omega_n,$$
where $\omega_n := V_n(B)$ denotes the volume of the unit ball in $R^n$.
$$V_n(K)\, V_n(K^\circ) \ge V_n(C_n)\, V_n(C_n^\circ) = V_n(O_n)\, V_n(O_n^\circ) = 4^n / n!$$
with equality if and only if $K = C_n$ ($O_n$ is not a zonoid).
Theorem (Rogers-Shephard). If K is a convex body in $R^n$ then
$$2^n V_n(K) \le V_n(D(K)) \le \binom{2n}{n} V_n(K)$$
with equality on the left if and only if K is symmetric and on the right if and only if K is a simplex. (The left-hand inequality is trivial; it is the other one that is due to Rogers and Shephard.)
Theorem (Busemann intersection inequality). If K is a convex body in $R^n$ with 0 as an interior point, then:
$$\frac{V_n(I(K))}{V_n(K)^{n-1}} \le \frac{V_n(I(B))}{V_n(B)^{n-1}} = \frac{\omega_{n-1}^{\,n}}{\omega_n^{\,n-2}}$$
with equality if and only if K is an ellipsoid. Here $\omega_k$ denotes the volume of the unit ball in $R^k$.
Theorem (Petty projection inequality). If K is a convex body in $R^n$, then:
does it follow that $V_n(K) \le V_n(L)$?
Petty and Schneider showed that the answer is “no” in general but “yes” if the body L is a projection body. Since all symmetric convex bodies in $R^2$ are projection bodies, the answer is “yes” in $R^2$ but “no” in all higher dimensions.
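The isoperimetric inequality stated above can be checked numerically in the plane, where A(K) is the perimeter, $V_2(K)$ is the area, and the constant $n^n V_n(B)$ equals $4\pi$. The sketch below computes the ratio $A(K)^2 / V_2(K)$ for convex polygons given by their vertices in order; the function names are illustrative, and the area uses the standard shoelace formula.

```python
import math


def perimeter(poly):
    """Perimeter of a polygon given as a list of vertices in order."""
    return sum(math.dist(poly[i], poly[(i + 1) % len(poly)])
               for i in range(len(poly)))


def area(poly):
    """Area of a simple polygon by the shoelace formula."""
    twice = sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                - poly[(i + 1) % len(poly)][0] * poly[i][1]
                for i in range(len(poly)))
    return abs(twice) / 2.0


def isoperimetric_ratio(poly):
    """A(K)^2 / V_2(K), which is at least 4 * pi for planar convex bodies."""
    return perimeter(poly) ** 2 / area(poly)


def regular_polygon(k):
    """Vertices of a regular k-gon inscribed in the unit circle."""
    return [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
            for j in range(k)]
```

The unit square gives the ratio 16, and regular polygons with many sides give ratios that approach $4\pi$ from above, in line with equality holding only for the disc.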
Finally we mention two well-known inequalities that relate to the finite (and infinite) dimensional $\ell_p$-spaces.
Theorem (Hölder’s inequality). If the numbers $p > 1$ and q are related by the equation $p^{-1} + q^{-1} = 1$ and if $x = (\xi_i)$ and $y = (\eta_i)$ are two vectors in $R^n$, then:
$$\sum \xi_i \eta_i \le \Bigl( \sum |\xi_i|^p \Bigr)^{1/p} \Bigl( \sum |\eta_i|^q \Bigr)^{1/q}.$$
A consequence of this inequality is that of Minkowski.
Theorem (Minkowski’s inequality). If x and y are vectors in $R^n$ and if $\|x\|_p := \bigl( \sum |\xi_i|^p \bigr)^{1/p}$, then
$$\|x + y\|_p \le \|x\|_p + \|y\|_p.$$
Therefore, the functional $\|x\|_p$ is a norm and the set $B(p)$ defined in Example 9 in Section III is convex.
X. SPECIAL CLASSES OF CONVEX SETS
A. Polytopes
In this section we deal with bounded polytopes. Recall that such a polytope is a convex body that may be regarded either as the convex hull of finitely many points or, dually, as the intersection of finitely many half-spaces. Polygons and polyhedra have been studied since the beginnings of mathematics. The existence theorem of Minkowski is very important.
Theorem (Minkowski). If $\{u_1, u_2, \ldots, u_k\}$ is a set of dual unit vectors which do not all lie in a hyperplane and if $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ are positive real numbers such that:
$$\sum_{i=1}^{k} \alpha_i u_i = 0$$
then there is a polytope (unique up to translation) that has k facets whose areas are given by the $\alpha_i$’s and whose “normals” are given by the $u_i$’s.
There is a generalization of this theorem to general convex bodies determined by general “surface area measures” (see Schneider⁴).
By the facial structure of a polytope, we mean the collection of all of its faces classified by their dimension. If P is an n-dimensional polytope, then there is precisely one n-dimensional face, P itself. The empty set is usually included as the unique face of dimension −1. The 0-dimensional faces are the vertices, the one-dimensional faces are the edges, and the $(n - 1)$-dimensional faces are the facets. The f-vector of P is the vector $(f_0, f_1, f_2, \ldots, f_{n-1})$ where $f_i$ is the number of faces of dimension i. The faces of P form a lattice under the inclusion relation (this is why P and $\emptyset$ are included as faces). The lattice of faces of $P^\circ$ forms the dual lattice.
A key combinatorial problem is to characterize those vectors that are f-vectors of some polytope. Only in three dimensions has this problem been solved. The primary necessary condition for a vector to be an f-vector is that it satisfies the so-called Euler relation.
Theorem. If $f_i$ denotes the number of i-dimensional faces of an n-dimensional polytope P, then
$$\sum_{i=0}^{n} (-1)^i f_i = 1.$$
The number 1 is the Euler characteristic of P.
Corollary. In three-dimensional space, the numbers of vertices $f_0$, of edges $f_1$, and of facets $f_2$ satisfy the equation $f_2 - f_1 + f_0 = 2$.
Theorem (Steinitz). In three-dimensional space, $(f_0, f_1, f_2)$ is the f-vector of a polyhedron if and only if it satisfies, in addition to the Euler relation, the following inequalities:
1. $4 \le f_0 \le 2 f_2 - 4$
2. $4 \le f_2 \le 2 f_0 - 4$
Such conditions are not completely known in higher dimensions but there are many partial results, especially for $n = 4$ (see the survey article by Bayer and Lee²). However, it is known that the Euler relation is the only affine relation satisfied by all f-vectors.
Of particular appeal are the regular figures. If $n = 2$, then there is an infinite family of regular polygons. Up to rigid motions and dilations, there is precisely one for each number k of vertices (edges). If $n = 3$, then there are precisely five regular polyhedra (the Platonic solids). There are many proofs that there can be no more. We outline one. Suppose that the facets are p-gons and that q meet at each vertex. Then the Euler relation implies that $p^{-1} + q^{-1} > 1/2$. Since $p, q \ge 3$, only the values $(p, q) = (3, 3)$, $(3, 4)$, $(4, 3)$, $(3, 5)$, and $(5, 3)$ are possible.
In all dimensions it is possible to construct a regular cube $C_n$, a regular cross-polytope $O_n$, and a regular simplex (a linear image of $S_n$). For $n \ge 5$, these are the only regular polytopes. When $n = 4$, there are three more with, respectively, 24 octahedral facets, 120 dodecahedral facets, and 600 tetrahedral facets.
A great variety of polytopes have some degree of regularity; for example, the cuboctahedron has six square facets and eight that are equilateral triangles, and its vertices are all alike. Its dual is the rhombic dodecahedron, which has all its facets alike (they are all rhombi).
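The Euler relation, the Steinitz conditions, and the counting argument for the regular polyhedra are all small integer computations. The sketch below is an illustration with hypothetical function names; the first function expects the list $(f_0, \ldots, f_n)$ including the top face $f_n = 1$, as in the theorem above, and exact fractions are used so that the strict inequality $p^{-1} + q^{-1} > 1/2$ is not affected by rounding.

```python
from fractions import Fraction


def euler_sum(faces):
    """Alternating sum of (f_0, ..., f_n), where f_n = 1 counts P itself."""
    return sum((-1) ** i * f for i, f in enumerate(faces))


def is_f_vector_3d(f0, f1, f2):
    """Euler relation plus the two Steinitz inequalities in dimension 3."""
    return (f0 - f1 + f2 == 2
            and 4 <= f0 <= 2 * f2 - 4
            and 4 <= f2 <= 2 * f0 - 4)


def regular_polyhedron_types(limit=10):
    """Pairs (p, q) with p, q >= 3 and 1/p + 1/q > 1/2, searched below limit."""
    half = Fraction(1, 2)
    return [(p, q)
            for p in range(3, limit)
            for q in range(3, limit)
            if Fraction(1, p) + Fraction(1, q) > half]
```

The cube, with 8 vertices, 12 edges, and 6 facets, passes both checks, and the search returns exactly the five pairs listed in the text. Raising `limit` does not add solutions, because $1/p + 1/q \le 1/3 + 1/6 = 1/2$ as soon as either number is at least 6.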
$x \le y$ if and only if $y - x \in C_K$.
Stephen E. Fienberg
Carnegie Mellon University
I. Two Problems
II. Classification
III. Cluster Analysis
IV. Nonparametric Regression
P1: ZBU Final Pages
Encyclopedia of Physical Science and Technology En004F-164 June 8, 2001 16:12
statistics, computer science, and database management. It techniques to study climate change over time. Or they
deals with very large datasets, tries to make fewer theoret- can be other forms of images such as those arising from
ical assumptions than has traditionally been done in statis- functional magnetic resonance imaging in medicine, or
tics, and typically focuses on problems of classification, robot sensors. Or the data might be continuous functions,
clustering, and regression. In such domains, data mining such as spectra from astronomical studies of stars. This
often uses decision trees or neural networks as models review focuses upon the most typical applications, but
and frequently fits them using some combination of tech- the reader can obtain a fuller sense of the scope by ex-
niques such as bagging, boosting/arcing, and racing. These amining the repository of benchmark data maintained at
domains and techniques are the primary focus of the the University of California at Irvine (Bay, 1999). This
present article. Other activities in data mining focus on is- archive is resource for the entire machine learning com-
sues such as causation in large-scale systems (e.g., Spirtes munity, who use it to test and tune new algorithms. It
et al. 2001; Pearl, 2000), and this effort often involves elab- contains, among other datasets, all those used in the an-
orate statistical models and, quite frequently, Bayesian nual KDD Cup contest, a competitive comparison of data
methodology and related computational techniques (e.g., mining algorithms run by the organizers of the KDD
Cowell et al., 1999, Jordan, 1998). For an introductory conference.
discussion of this dimension of data mining see Glymour It is unclear whether data mining will continue to gain
et al. (1997). intellectual credibility among academic researchers as a
The subject area of data mining began to coalesce in separate discipline or whether it will be subsumed as a
1990, as researchers from the three parent fields discov- branch of statistics or computer science. Its commercial
ered common problems and complementary strengths. success has created a certain sense of distance from tra-
The first KDD workshop (on Knowledge Discovery and ditional research universities, in part because so much of
Data Mining) was held in 1989. Subsequent workshops the work is being done under corporate auspices. But there
were held in 1991, 1993, and 1994, and these were then re- is broad agreement that the content of the field has true
organized as an annual international conference in 1995. In scientific importance, and it seems certain that under one
1997 the Journal of Data Mining and Knowledge Discov- label or another, the body of theory and heuristics that
ery was established under the editorship of Usama Fayyad. It has published special issues on electronic commerce, scalable and parallel computing, inductive logic programming, and applications to atmospheric sciences. Interest in data mining continues to grow, especially among businesses and federal statistical agencies; both of these communities gather large amounts of complex data and want to learn as much from their collections as is possible.

The needs of the business and federal communities have helped to direct the growth of data mining research. For example, typical applications in data mining include the following:

• Use of historical financial records on bank customers to predict good and bad credit risks.
• Use of sales records and customer demographics to perform market segmentation.
• Use of previous income tax returns and current returns to predict the amount of tax a person owes.

These three examples imply large datasets without explicit model structure, and the analyses are driven more by management needs than by scientific theory. The first example is a classification problem, the second requires clustering, and the third would employ a kind of regression analysis.

In these examples the raw data are numerical or categorical values, but sometimes data mining treats more exotic situations. For example, the data might be satellite photographs, and the investigator would use data mining

constitutes the core of data mining will persist and grow.

The remainder of this article describes the two practical problems whose tension defines the ambit of data mining, then reviews the three basic kinds of data mining applications: classification, clustering, and regression. In parallel, we describe three relatively recent ideas that are characteristic of the intellectual cross-fertilization that occurs in data mining, although they represent just a small part of the work going on in this domain. Boosting is discussed in the context of classification, racing is discussed in the context of clustering, and bagging is discussed in the context of regression. Racing and bagging have broad applicability, but boosting is (so far) limited in application to classification problems. Other techniques for these problems of widespread interest not discussed here include support vector machines and kernel smoothing methods (e.g., Vapnik, 2000).

I. TWO PROBLEMS

Data mining exists because modern methods of data collection and database management have created sample sizes large enough to overcome (or partially overcome) the limitations imposed by the curse of dimensionality. However, this freedom comes at a price: the size of the datasets severely restricts the complexity of the calculations that can be made. These two issues are described in the following subsections.
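One face of the curse of dimensionality can be illustrated numerically: for a fixed sample size, a new observation tends to lie farther from its nearest previously seen neighbor as the number of explanatory variables p grows. The following is a minimal simulation sketch, not part of the original article; the function names, the uniform sampling on the unit cube, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def nearest_neighbor_distance(n, p, rng):
    """Distance from one new uniform point to the closest of n uniform points in [0, 1]^p."""
    sample = [[rng.random() for _ in range(p)] for _ in range(n)]
    new_point = [rng.random() for _ in range(p)]
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(new_point, x)))
        for x in sample
    )


def median_nn_distance(n, p, trials=25, seed=0):
    """Median nearest-neighbor distance over several simulated datasets (assumed setup)."""
    rng = random.Random(seed)
    dists = sorted(nearest_neighbor_distance(n, p, rng) for _ in range(trials))
    return dists[len(dists) // 2]


if __name__ == "__main__":
    # With n held fixed, the typical nearest-neighbor distance grows with p.
    for p in (1, 2, 5, 10, 20, 50):
        print(p, round(median_nn_distance(n=200, p=p), 3))
```

Holding n fixed while p increases is the point of the sketch: keeping the nearest-neighbor distance small in high dimensions requires a much larger sample.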
P1: ZBU Final Pages
Encyclopedia of Physical Science and Technology En004F-164 June 8, 2001 16:12
might by chance happen that people whose social security numbers end in 2338 tend to have higher incomes. If the model fitting process allowed such frivolous terms, then this chance relationship would be used in predicting tax obligation, and such predictions would be less accurate than those obtained from a more parsimonious model that excluded misinformation. It does no good to hope that one’s dataset lacks spurious structure; when p is large, it becomes mathematically certain that chance patterns exist.

The third formulation of the COD is more subtle. In standard multiple regression, multicollinearity arises when two or more of the explanatory variables are highly correlated, so that the data lie mostly inside an affine subspace of \(\mathbb{R}^p\) (e.g., close to a line or a plane within the p-dimensional volume). If this happens, there are an uncountable number of models that fit the data about equally well, but these models are dramatically different with respect to predictions for future responses whose explanatory values lie outside the subspace. As p gets large, the number of possible subspaces increases rapidly, and just by chance a finite dataset will tend to concentrate in one of them.

The problem of multicollinearity is aggravated in nonparametric regression, which allows nonlinear relationships in the model. This is the kind of regression most frequently used in data mining. Here the analogue of multicollinearity arises when predictors concentrate on a smooth manifold within \(\mathbb{R}^p\), such as a curved line or sheet inside the p-volume. Since there are many more manifolds than affine subspaces, the problem of concurvity in nonparametric regression distorts prediction even more than does multicollinearity in linear regression.

When one is interested only in prediction, the COD is less of a problem for future data whose explanatory variables have values close to those observed in the past. But an unobvious consequence of large p is that nearly all new observation vectors tend to be far from those previously seen. Furthermore, if one needs to go beyond simple prediction and develop interpretable models, then the COD can be an insurmountable obstacle. Usually the most one can achieve is local interpretability, and that happens only where data are locally dense. For more detailed discussions of the COD, the reader should consult Hastie and Tibshirani (1990) and Scott and Wand (1991).

The classical statistical or psychometrical approach to dimensionality reduction typically involves some form of principal component analysis or multidimensional scaling. Roweis and Saul (2000) and Tenenbaum et al. (2000) suggest some novel ways to approach the problem of nonlinear dimensionality reduction in very high dimensional problems such as those involving images.

B. Massive Datasets

Massive datasets pose special problems in analysis. To provide some benchmarks of difficulty, consider the following taxonomy by dataset size proposed by Huber (1995):

Type       Size (bytes)
Tiny       \(10^{2}\)
Small      \(10^{4}\)
Medium     \(10^{6}\)
Large      \(10^{8}\)
Huge       \(10^{10}\)
Monster    \(10^{12}\)

Huber argues that the category steps, which are factors of 100, correspond to reasonable divisions at which the quantitative increase in sample size compels a qualitative change in the kind of analysis.

For tiny datasets, one can view all the data on a single page. Computational time and storage are not an issue. This is the realm in which classical statistics began, and most problems could be resolved by intelligent inspection of the data tables.

Small datasets could well defeat tabular examination by humans, but they invite graphical techniques. Scatterplots, histograms, and other visualization techniques are quite capable of structure discovery in this realm, and modern computers permit almost any level of analytic complexity that is wanted. It is usually possible for the analyst to proceed adaptively, looking at output from one trial in order to plan the next modeling effort.

Medium datasets begin to require serious thought. They will contain many outliers, and the analyst must develop automatic rules for detecting and handling them. Visualization is still a viable way to do structure discovery, but when the data are multivariate there will typically be more scatterplots and histograms than one has time to examine individually. In some sense, this is the first level at which the entire analytical strategy must be automated, which limits the flexibility of the study. This is the lowest order in the taxonomy in which data mining applications are common.

Large datasets are difficult to visualize; for example, it is possible that the point density is so great that a scatterplot is completely black. Many common statistical procedures, such as cluster analysis or regression and classification based on smoothing, become resource-intensive or impossible.

Huge and monster datasets put data processing issues at the fore, and analytical methods become secondary. Simple operations such as averaging require only that data be read once, and these are feasible at even the largest
size, but analyses that require more than O(n) or perhaps O(n log(n)) operations, for n the sample size, are impossible.

The last three taxa are a mixed blessing for data mining. The good news is that the large sample sizes mitigate the curse of dimensionality, and thus it is possible, in principle, to discover complex and interesting structure. The bad news is that many of the methods and much of the theory developed during a century of explosive growth in statistical research are impractical, given current data processing limitations.

From a computational standpoint, the three major issues in data mining are processor speed, memory size, and algorithmic complexity. Speed is proportional to the number of floating point operations (flops) that must be performed. Using PCs available in 2000, one can undertake analyses requiring about \(10^{13}\) flops, and perhaps up to \(10^{16}\) flops on a supercomputer. Regarding memory, a single-processor machine needs sufficient memory for about four copies of the largest array required in the analysis, and the backup storage (disk) should be able to hold an amount equal to about 10 copies of the raw data (otherwise one has trouble storing derived calculations and exploring alternative analyses). The algorithmic complexity of the analysis is harder to quantify; at a high level it depends on the number of logical branches that the analyst wants to explore when planning the study, and at a lower level it depends on the specific numerical calculations employed by particular model fitting procedures.

Datasets based on extracting information from the World Wide Web or involving all transactions from banks or grocery stores, or collections of fMRI images can rapidly fill up terabytes of disk storage, and all of these issues become relevant. The last issue, regarding algorithmic complexity, is especially important when added on top of the sheer size of some datasets. It implies that one cannot search too widely in the space of possible models, and that one cannot redirect the analysis in midstream to respond to an insight found at an intermediate step.

For more information on the issues involved in preparing superlarge datasets for analysis, see Banks and Parmigiani (1992). For additional discussion of the special problems posed in the analysis of superlarge datasets, see the workshop proceedings on massive datasets produced by the National Research Council (1997).

II. CLASSIFICATION

Classification problems arise when one has a training sample of cases in known categories and their corresponding explanatory variables. One wants to build a decision rule that uses the explanatory variables to predict category membership for future observations. For example, a common data mining application is to use the historical records on loan applicants to classify new applicants as good or bad credit risks.

A. Methods

Classical classification began in 1935, when Sir Ronald Fisher set his children to gather iris specimens. These specimens were then classified into three species by a botanist, and numerical measurements were made on each flower. Fisher (1936) then derived mathematical formulas which used the numerical measurements to find hyperplanes that best separated the three different species.

In modern classification, the basic problem is the same as Fisher faced. One has a training sample, in which the correct classification is known (or assumed to be accurate with high probability). For each case in the training sample, one has additional information, either numerical or categorical, that may be used to predict the unknown classifications of future cases. The goal is to use the information in the training sample to build a decision function that can reliably classify future cases.

Most applications in data mining use either logistic regression, neural nets, or recursive partitioning to build the decision functions. The next three subsections describe these approaches. For information on the more traditional discriminant analysis techniques, see Press (1982).

1. Logistic Regression

Logistic regression is useful when the response variable is binary but the explanatory variables are continuous. This would be the case if one were predicting whether or not a customer is a good credit risk, using information on their income, years of employment, age, education, and other continuous variables.

In such applications one uses the model

\[
P[Y = 1] = \frac{\exp(X^{T}\theta)}{1 + \exp(X^{T}\theta)}, \tag{1}
\]

where Y = 1 if the customer is a good risk, X is the vector of explanatory variables for that customer, and θ are the unknown parameters to be estimated from the data. This model is advantageous because, under the transformation

\[
p = \ln \frac{P[Y = 1]}{1 - P[Y = 1]}
\]

one obtains the linear model \(p = X^{T}\theta\). Thus all the usual machinery of multiple linear regression will apply.

Logistic regression can be modified to handle categorical explanatory variables through definition of dummy
variables, but this becomes impractical if there are many categories. Similarly, one can extend the approach to cases in which the response variable is polytomous (i.e., takes more than two categorical values). Also, logistic regression can incorporate product interactions by defining new explanatory variables from the original set, but this, too, becomes impractical if there are many potential interactions. Logistic regression is relatively fast to implement, which is attractive in data mining applications that have large datasets. Perhaps the chief value of logistic regression is that it provides an important theoretical window on the behavior of more complex classification methodologies (Friedman et al., 2000).

2. Neural Nets

Neural nets are a classification strategy that employs an algorithm whose architecture is intended to mimic that of a brain, a strategy that was casually proposed by von Neumann. Usually, the calculations are distributed across multiple nodes whose behavior is analogous to neurons, their outputs are fed forward to other nodes, and these results are eventually accumulated into a final prediction of category membership.

To make this concrete, consider a simple perceptron model in which one is attempting to use a training set of historical data to teach the network to identify customers who are poor credit risks. Figure 1 shows a hypothetical situation; the inputs are the values of the explanatory variables and the output is a prediction of either 1 (for a good risk) or −1 (for a bad risk). The weights in the nodes are developed as the net is trained on the historical sample, and may be thought of as estimated values of model parameters. As diagrammed in Fig. 1, the simple perceptron fits the model

\[
y = \operatorname{signum}\left( \sum_{i=1}^{p} w_i x_i + \tau \right),
\]

where the weights \(w_i\) and the threshold parameter τ are estimated from the training sample.

[Figure 1: simple perceptron diagram with inputs X1, X2, X3, ..., Xp and weights W1, W2, W3, ..., Wp.]

Unlike logistic regression, it is easy for the simple perceptron to include categorical explanatory variables such as profession or whether the applicant has previously declared bankruptcy, and there is much less technical difficulty in extending the perceptron to the prediction of polytomous outcomes. There is no natural way, however, to include product interaction terms automatically; these still require hand-tuning.

The simple perceptron has serious flaws, and these have been addressed in a number of ways. The result is that the field of neural networks has become complex and diverse. The method shown in Fig. 1 is too primitive to be commonly used in modern data mining, but it serves to illustrate the basic ideas.

As a strategy, neural networks go back to the pioneering work of McCulloch and Pitts (1943). The computational obstacles in training a net went beyond the technology then available. There was a resurgence of attention when Rosenblatt (1958) introduced the perceptron, an early neural net whose properties were widely studied in the 1960s; fundamental flaws with the perceptron design were pointed out by Minsky and Papert (1969). The area languished until the early 1980s, when the Hopfield net (Hopfield, 1982) and the discovery of the backpropagation algorithm [Rumelhart et al. (1985) were among several independent inventors] led to networks that could be used in practical applications. Additional information on the history and development of these ideas can be found in Ripley (1996).
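The simple perceptron model described in this subsection can be sketched in a few lines. The article does not say how the weights and threshold are estimated, so the classic error-correction update used below is an assumption, as are the function names, the tie-breaking convention in `signum`, and the toy training sample.

```python
def signum(s):
    """Return +1 for s >= 0 and -1 otherwise (sending ties to +1 is an assumption)."""
    return 1 if s >= 0 else -1


def predict(weights, tau, x):
    """Simple perceptron: y = signum(sum_i w_i * x_i + tau)."""
    return signum(sum(w * xi for w, xi in zip(weights, x)) + tau)


def train_perceptron(cases, labels, epochs=100):
    """Estimate weights and threshold with an error-correction rule (assumed here)."""
    weights = [0.0] * len(cases[0])
    tau = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(cases, labels):
            if predict(weights, tau, x) != y:
                # Move the separating hyperplane toward the misclassified case.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                tau += y
                mistakes += 1
        if mistakes == 0:  # every training case is now classified correctly
            break
    return weights, tau


# Hypothetical training sample: two 0/1 indicators per applicant,
# label +1 = good risk, -1 = bad risk (linearly separable by construction).
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, 1]
weights, tau = train_perceptron(cases, labels)
predictions = [predict(weights, tau, x) for x in cases]
```

This update rule is only guaranteed to stop making mistakes when the two classes can be separated by a hyperplane, which is one reason the simple perceptron is too primitive for most practical work.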
The three major drawbacks in using neural nets for data
mining are as follows:
3. Neural nets do not automatically provide statements of uncertainty. At the price of greatly increased computation one can use statistical techniques such as cross-validation, bootstrapping, or jackknifing to get approximate misclassification rates or standard errors or confidence intervals.

Nonetheless, neural nets are one of the most popular tools in the data mining community, in part because of their deep roots in computer science.

Subsection IV.A.4 revisits neural nets in the context of regression rather than classification. Instead of describing the neural net methodology in terms of the nodes and connections which mimic the brain, it develops an equivalent but alternative representation of neural nets as a procedure for fitting a mathematical model of a particular form. This latter perspective is the viewpoint embraced by most current researchers in the field.

3. Recursive Partitioning

Data miners use recursive partitioning to produce decision trees. This is one of the most popular and versatile of the modern classification methodologies. In such applications, the method employs the training sample to recursively partition the set of possible explanatory measurements. The resulting classification rule can be displayed as a decision tree, and this is generally viewed as an attractive and interpretable rule for inference.

Formally, recursive partitioning splits the training sample into increasingly homogeneous groups, thus inducing a partition on the space of explanatory variables. At each step, the algorithm considers three possible kinds of splits using the vector of explanatory values X:

1. Is \(X_i \le t\) (univariate split)?
2. Is \(\sum_{i=1}^{p} w_i x_i \le t\) (linear combination split)?
3. Does \(x_i \in S\) (categorical split, used if \(x_i\) is a categorical variable)?

The algorithm searches over all possible values of t, all coefficients \(\{w_i\}\), and all possible subsets S of the category values to find the split that best separates the cases in the training sample into two groups with maximum increase in overall homogeneity.

Different partitioning algorithms use different methods for assessing improvement in homogeneity. Some seek to minimize Gini’s index of diversity, others use a “twoing rule,” and hybrid methods can switch criteria as they move down the decision tree. Similarly, some methods seek to find the greatest improvement on both sides of the split, whereas other methods choose the split that achieves maximum homogeneity on one side or the other. Some methods grow elaborate trees and then prune back to improve predictive accuracy outside the training sample (this is a partial response to the kinds of overfit concerns that arise from the curse of dimensionality).

However it is done, the result is a decision tree. Figure 2 shows a hypothetical decision tree that might be built from credit applicant data. The first split is on income, a continuous variable. To the left-hand side, corresponding to applicants with incomes less than $25,000, the next split is categorical, and divides according to whether the applicant has previously declared bankruptcy. Going back to the top of the tree, the right-hand side splits on a linear combination; the applicant is considered a good risk if a linear combination of their age and income exceeds a threshold. This is a simplistic example, but it illustrates the interpretability of such trees, the fact that the same variable may be used more than once, and the different kinds of splits that can be made.

Recursive partitioning methods were first proposed by Morgan and Sonquist (1963). The method became widely popular with the advent of CART, a statistically sophisticated implementation and theoretical evaluation developed by Breiman et al. (1984). Computer scientists have also contributed to this area; prominent implementations of decision trees include ID3 and C4.5 (Quinlan, 1992). A treatment of the topic from a statistical perspective is given by Zhang and Singer (1999). The methodology extends to regression problems, and this is described in Section IV.A.5 from a model-fitting perspective.

B. Boosting

Boosting is a method invented by computer scientists to improve weak classification rules. The idea is that if one has a classification procedure that does slightly better than chance at predicting the true categories, then one can apply this procedure to the portions of the training sample that are misclassified to produce new rules and then weight all the rules together to achieve better predictive accuracy. Essentially, each rule has a weighted vote on the final classification of a case.

The procedure was proposed by Schapire (1990) and improved by Freund and Schapire (1996) under the name AdaBoost. There have been many refinements since, but the core algorithm for binary classification assumes one has a weak rule \(g_1(X)\) that takes values in the set {1, −1} according to the category. Then AdaBoost starts by putting equal weight \(w_i = n^{-1}\) on each of the n cases in the training sample. Next, the algorithm repeats the following steps K times:

1. Apply the procedure \(g_k\) to the training sample with weights \(w_1, \ldots, w_n\).
2. Find the empirical probability \(p_w\) of misclassification under these weightings.
3. Calculate \(c_k = \ln[(1 - p_w)/p_w]\).
4. If case i is misclassified, replace \(w_i\) by \(w_i \exp c_k\). Then renormalize so that \(\sum_i w_i = 1\) and go to step 1.

The final inference is the sign of \(\sum_{k=1}^{K} c_k g_k(X)\), which is a weighted sum of the determinations made by each of the K rules formed from the original rule \(g_1\).

This rule has several remarkable properties. Besides provably improving classification, it is also resistant to overfit, which arises when K is large. The procedure allows quick computation, and thus can be made practical even for huge datasets, and it can be generalized to handle more than two categories. Boosting therefore provides an automatic and effective way to increase the capability of almost any classification technique. As a new method, it is the object of active research; Friedman et al. (2000) describe the current thinking in this area, linking it to the formal role of statistical models such as logistic regression and generalized additive models.

III. CLUSTER ANALYSIS

Cluster analysis is the term that describes a collection of data mining methods which take observations and form groups that are usefully similar. One common application is in market segmentation, where a merchant has data on the purchases made by a large number of customers, together with demographic information on the customers. The merchant would like to identify clusters of customers who make similar purchases, so as to better target advertising or forecast the effect of changes in product lines.

A. Clustering Strategies

The classical method for grouping observations is hierarchical agglomerative clustering. This produces a cluster tree; the top is a list of all the observations, and these are then joined to form subclusters as one moves down the tree until all cases are merged in a single large cluster. For most applications a single large cluster is not informative, however, and so data miners require a rule to stop the agglomeration algorithm before complete merging occurs. The algorithm also requires a rule to identify which subcluster should be merged next for each stage of the tree-building process.

Statisticians have not found a universally reliable rule to determine when to stop a clustering algorithm, but many have been suggested. Milligan and Cooper (1985) described a large simulation study that included a range of realistic situations. They found that no rule dominated all the others, but that the cubic clustering criterion was rarely
bad and often quite good. However, in practice, most analysts create the entire tree and then inspect it to find a point at which further linkages do not seem to further their purpose. For the market segmentation example, a merchant might be pleased to find clusters that can be interpreted as families with children, senior citizens, yuppies, and so forth, and would want to stop the clustering algorithm when further linkage would conflate these descriptive categories.

Similar diversity exists when choosing a subcluster merging rule. For example, if the point clouds associated with the final interpretable clusters appear ellipsoidal with similar shape and orientation, then one should probably have used a joining rule that connects the two subclusters whose centers have minimum Mahalanobis (1936) distance. Alternatively, if the final interpretable clusters are nonconvex point clouds, then one will have had to discover those by using some kind of nearest neighbor joining rule. Statisticians have devised many such rules and can show that no single approach can solve all clustering problems (Van Ness, 1999).

In data mining applications, most of the joining rules developed by statisticians require infeasible amounts of computation and make unreasonable assumptions about homogeneous patterns in the data. Specifically, large, complex data sets do not generally have cluster structure that is well described by sets of similar ellipsoids. Instead, data miners expect to find structures that look like sheets, ellipsoids, strings, and so forth (rather as astronomers see when looking at the large-scale structure of the universe).

Therefore, among agglomerative clustering schemes, data miners almost always use nearest neighbor clustering. This is one of the fastest clustering algorithms available, and is basically equivalent to finding a minimum spanning tree on the data. Using the Prim (1957) algorithm for spanning trees, the computation takes \(O(n^2)\) comparisons (where n is the number of observations). This is feasible for medium datasets in Huber’s taxonomy. Furthermore, nearest neighbor methods are fairly robust at finding the diverse kinds of structure that one anticipates.

As an alternative to hierarchical agglomerative clustering, some data miners use k-means cluster analysis, which depends upon a strategy pioneered by MacQueen (1967). Starting with the assumption that the data contain a prespecified number k of clusters, this method iteratively finds k cluster centers that maximize between-cluster distances and minimize within-cluster distances, where the distance metric is chosen by the user (e.g., Euclidean, Mahalanobis, sup norm, etc.). The method is useful when one has prior beliefs about the likely number of clusters in the data. It also can be a useful exploratory tool. In the computer science community, k-means clustering is known as the quantization problem and is closely related to Voronoi tessellation.

The primary drawback to using k-means clustering in data mining applications is that an exact solution requires extensive computation. Approximate solutions can be obtained much more rapidly, and this is the direction in which many researchers have gone. However, the quality of the approximation can be problematic.

Both k-means and agglomerative cluster analysis are strongly susceptible to the curse of dimensionality. For the market segmentation example, it is easy to see that customers could form tight clusters based upon their first names, or where they went to elementary school, or other misleading features that would not provide commercial insight. Therefore it is useful to do variable selection, so that clustering is done only upon features that lead to useful divisions. However, this requires input from the user on which clusters are interpretable and which are not, and encoding such information is usually impractical. An alternative is to use robust clustering methods that attempt to find a small number of variables which produce well-separated clusters. Kaufman and Rousseeuw (1990) review many ideas in this area and pay useful attention to computational issues.

For medium datasets in Huber’s taxonomy, it is possible to use visualization to obtain insight into the kinds of cluster structure that may exist. This can provide guidance on the clustering algorithms one should employ. Swayne et al. (1997) describe software that enables data miners to see and navigate across three-dimensional projections of high-dimensional datasets. Often one can see groups of points or outliers that are important in the application.

B. Racing

One of the advances that came from the data mining synergy between statisticians and computer scientists is a technique called ‘racing’ (Maron and Moore, 1993). This enables analysts to do much larger searches of the space of models than was previously possible.

In the context of cluster analysis, suppose one wanted to do variable selection to discover which set of demographic features led to, say, 10 consumer clusters that had small intracluster variation and large intercluster variation. One approach would be to consider each possible subset of the demographic features, run the clustering algorithm, and then decide which set of results had the best cluster separation. Obviously, this would entail much computation.

An alternative approach is to consider many feature subsets simultaneously and run the clustering algorithm for perhaps 1% of the data on each subset. Then one compares the results to see if some subsets lead to less well-defined clusters than others. If so, those subsets are eliminated and the remaining subsets are then tested against each other on a larger fraction of the data. In this way one can weed out poor feature subset choices with minimal computation
and reserve resources for the evaluation of the very best candidates.

Racing can be done with respect to a fixed fraction of the data or a fixed amount of runtime. Obviously, this admits the possibility of errors. By chance a good method might appear poor when tested with only a fraction of the data or for only a fixed amount of computer time, but the probability of such errors can be controlled statistically, and the benefit far outweighs the risk. If one is using racing on data fractions, the problem of chance error makes it important that the data be presented to the algorithm in random order; otherwise the outcome of the race might depend upon subsamples that are not representative of the entire dataset.

Racing has much broader application than cluster analysis. For classification problems, competing classifiers can be tested against each other, and those with high misclassification rates are quickly eliminated. Similarly, for regression problems, competing regression models are raced to quickly eliminate those that show poor fit. In both applications one can easily obtain a 100-fold increase in the size of the model space that is searched, and this leads to the discovery of better classifiers and better regression functions.

IV. NONPARAMETRIC REGRESSION

Regression is a key problem area in data mining and has attracted a substantial amount of research attention. Among dozens of new techniques for nonparametric regression that have been invented over the last 15 years, we detail seven that are widely used. Section IV.A describes the additive model (AM), alternating conditional expectation (ACE), projection pursuit regression (PPR), neural nets (NN), recursive partitioning regression (RPR), multivariate adaptive regression splines (MARS), and locally weighted regression (LOESS). Section IV.B compares these techniques in terms of performance and computation, and Section IV.C describes how bagging can be used to improve predictive accuracy.

A. Seven Methods

The following seven methods may seem very different, but they employ at most two distinct strategies for addressing the curse of dimensionality. One strategy fits purely local models; this is done by RPR and LOESS. The other strategy uses low-dimensional smoothing to achieve flexibility in fitting specific model forms; this is done by AM, ACE, NN, and PPR. MARS combines both strategies.

1. The Additive Model (AM)

Many researchers have developed the AM; Buja et al. (1989) describe the early development in this area. The simplest AM says that the expected value of an observation \(Y_i\) can be written

\[
E[Y_i] = \theta_0 + \sum_{j=1}^{p} f_j(X_{ij}), \tag{2}
\]

where the functions \(f_j\) are unspecified but have mean zero.

Since the functions \(f_j\) are estimated from the data, the AM avoids the conventional statistical assumption of linearity in the explanatory variables; however, the effects of the explanatory variables are still additive. Thus the response is modeled as a sum of arbitrary smooth univariate functions of explanatory variables. One needs about 100 data points to estimate each \(f_j\), but under the model given in (2), the requirement for data grows only linearly in p.

The backfitting algorithm is the essential tool used in estimating an additive model. This algorithm requires some smoothing operation (e.g., kernel smoothing or nearest neighbor averages; Hastie and Tibshirani, 1990) which we denote by Sm(·|·). For a large class of smoothing operations, the backfitting algorithm converges uniquely.

The backfitting algorithm works as follows:

1. At initialization, define functions \(f_j^{(0)} \equiv 0\) and set \(\theta_0 = \bar{Y}\).
2. At the ith iteration, estimate \(f_j^{(i+1)}\) by

\[
f_j^{(i+1)} = \mathrm{Sm}\left( Y - \theta_0 - \sum_{k \ne j} f_k^{(i)} \,\Big|\, X_{1j}, \ldots, X_{nj} \right)
\]

for j = 1, . . . , p.
3. Check whether \(| f_j^{(i+1)} - f_j^{(i)} | < \delta\) for all j = 1, . . . , p, for δ the prespecified convergence tolerance. If not, go back to step 2; otherwise, take the current \(f_j^{(i)}\) as the additive function estimate of \(f_j\) in the model.

This algorithm is easy to code. Its speed is chiefly determined by the complexity of the smoothing function.

One can generalize the AM by permitting it to add a few multivariate functions that depend on prespecified explanatory variables. Fitting these would require bivariate smoothing operations. For example, if one felt that prediction of tax owed in the AM would be improved by including a function that depended upon both a person’s previous year’s declaration and their current marital status, then this bivariate smoother could be used in the second step of the backfitting algorithm.

2. Alternating Conditional Expectations (ACE)

A generalization of the AM allows a smoothing transformation of the response variable as well as the smoothing of the p explanatory variables. This uses the ACE algorithm, as developed by Breiman and Friedman (1985), and it fits the model
P1: ZBU Final Pages
Encyclopedia of Physical Science and Technology En004F-164 June 8, 2001 16:12
\[ E[g(Y_i)] = \theta_0 + \sum_{j=1}^{p} f_j(X_{ij}). \tag{3} \]

Here all conditions are as stated before for (1), except that g is an
arbitrary smooth function scaled to ensure the technically necessary
requirement that var[g(Y)] = 1 (if not for this constraint, one could
get a perfect fit by setting all functions to be identically zero).

explanatory variable (and thus perpendicular to the second), then AM
works well. When the aluminum sheet is rotated slightly so that the
corrugations do not parallel a natural axis, however, AM fails because
the true function is a nonadditive function of the explanatory
variables. PPR would succeed, however, because the true function can be
written as an additive model whose functions have arguments that are
linear combinations of the explanatory variables.
PPR combines the backfitting algorithm with a numerical search routine,
such as Gauss–Newton, to fit models of the form

\[ E[Y_i] = \sum_{k=1}^{r} f_k(\alpha_k^{T} X_i). \tag{4} \]

Here the \( \alpha_1, \ldots, \alpha_r \) are unit vectors that define a
set of r linear combinations of explanatory variables. The linear
combinations are similar to those used for principal components analysis
(Flury, 1988). These vectors need not be orthogonal, and are chosen to
maximize predictive accuracy in the model as estimated by
cross-validation.

Given data \( Y_i \) and \( X_i \), one wants to find g, \( \theta_0 \),
and \( f_1, \ldots, f_p \) such that
\( E[g(Y_i) \mid X_i] - \theta_0 - \sum_{j=1}^{p} f_j(X_{ij}) \) is well
described as independent error. Thus one solves

\[ (\hat{g}, \hat{f}_1, \ldots, \hat{f}_p) = \arg\min_{(g, f_1, \ldots, f_p)} \sum_{i=1}^{n} \Bigl[ g(Y_i) - \sum_{j=1}^{p} f_j(X_{ij}) \Bigr]^2, \]

where \( \hat{g} \) is constrained to satisfy the unit-variance
requirement. The algorithm for achieving this is described by Breiman
and Friedman (1985); they modify the backfitting algorithm to provide a
step that smoothes the left-hand side while maintaining the variance
constraint.
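The alternation between the two sides of the ACE model can be sketched as follows. This is an illustrative simplification rather than the algorithm as published by Breiman and Friedman: the names `running_mean`, `standardize`, and `ace` are hypothetical, a running mean is assumed as the smoother, and a fixed number of sweeps replaces a convergence test. Because g is recentred on every sweep, \( \theta_0 \) is zero in this sketch.

```python
def running_mean(x, r, k=9):
    """Running mean of r over the k observations nearest in rank to each x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    fitted = [0.0] * n
    for pos, i in enumerate(order):
        hi = min(n, max(0, pos - k // 2) + k)
        lo = max(0, hi - k)
        window = order[lo:hi]
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted


def standardize(v):
    """Shift and scale v to mean zero and variance one."""
    n = len(v)
    mean = sum(v) / n
    sd = (sum((a - mean) ** 2 for a in v) / n) ** 0.5
    return [(a - mean) / sd for a in v]


def ace(X, y, k=9, n_iter=20):
    """Return (g, f): transformed response values and fitted f_j values."""
    n, p = len(X), len(X[0])
    g = standardize(y)                       # start from the scaled response
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):                   # backfitting pass on the right-hand side
            partial = [g[i] - sum(f[m][i] for m in range(p) if m != j)
                       for i in range(n)]
            fj = running_mean([row[j] for row in X], partial, k)
            centre = sum(fj) / n
            f[j] = [v - centre for v in fj]
        total = [sum(f[j][i] for j in range(p)) for i in range(n)]
        # Smooth the left-hand side against Y, then restore var[g(Y)] = 1.
        g = standardize(running_mean(y, total, k))
    return g, f
```

Without the call to `standardize` in the last step, both sides would shrink toward zero on every sweep, which is the degenerate perfect fit that the unit-variance constraint rules out.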
ACE analysis returns sets of functions that maximize the linear
correlation between the sum of the smoothed explanatory variables and
the smoothed response variable. Therefore ACE is more similar in spirit
to the multiple correlation coefficient than to multiple regression.
Because ACE does not directly attempt a regression analysis, it has
certain undesirable features; for example, small changes in the data can
lead to very different solutions (Buja and Kass, 1985), it need not
reproduce model transformations, and, unlike regression, it treats the
explanatory and response variables symmetrically.

To redress some of the drawbacks in ACE, Tibshirani (1988) devised a
modification called AVAS, which uses a variance-stabilizing
transformation in the backfitting loop when fitting the explanatory
variables. This modification is somewhat technical, but in theory it
leads to improved properties when treating regression applications.

3. Projection Pursuit Regression (PPR)

The AM uses sums of functions whose arguments are the natural
coordinates for the space \( \mathbb{R}^p \) of explanatory variables.
But when the true regression function is additive with respect to
pseudovariables that are linear combinations of the explanatory
variables, then the AM is inappropriate. PPR was developed by Friedman
and Stuetzle (1981) to address such situations.

Operationally, PPR alternates calls to two routines. The first routine
conditions on a set of pseudovariables given by linear combinations of
original variables; these are fed into the backfitting algorithm to
obtain an AM in the pseudovariables. The other routine conditions on the
estimated AM functions from the previous step and then searches to find
linear combinations of the original variables which maximize the fit of
those functions. By alternating iterations of these routines, the result
converges to a unique solution.

PPR is often hard to interpret for r > 1 in (4). When r is allowed to
increase without bound, PPR is consistent, meaning that as the sample
size grows, the estimated regression function converges to the true
function.

Another improvement that PPR offers over AM is that it is invariant to
affine transformations of the data; this is often desirable when the
explanatory variables are measured in the same units and have similar
scientific justifications. For example, PPR might be sensibly used to
predict tax that is owed when the explanatory variables are shares of
stock owned in various companies. Here it makes sense that linear
combinations of shares across commercial sectors would provide better
prediction of portfolio appreciation than could be easily obtained from
the raw explanatory variables.
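For intuition, a single PPR term (r = 1) with two explanatory variables can be fitted by trying many unit vectors and smoothing the response against each projection. The sketch below is a simplification under stated assumptions: a grid search over angles stands in for the Gauss–Newton search mentioned in the text, a running mean stands in for the smoother, and the names `running_mean` and `fit_ridge_term` are hypothetical.

```python
import math


def running_mean(x, r, k=9):
    """Running mean of r over the k observations nearest in rank to each x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    fitted = [0.0] * n
    for pos, i in enumerate(order):
        hi = min(n, max(0, pos - k // 2) + k)
        lo = max(0, hi - k)
        window = order[lo:hi]
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted


def fit_ridge_term(X, y, n_angles=180, k=9):
    """Return (rss, alpha, fitted) for the best unit vector alpha on the grid."""
    best = None
    for a in range(n_angles):
        theta = math.pi * a / n_angles           # alpha and -alpha are equivalent
        alpha = (math.cos(theta), math.sin(theta))
        z = [alpha[0] * x1 + alpha[1] * x2 for x1, x2 in X]   # pseudovariable
        fitted = running_mean(z, y, k)           # smooth the response against it
        rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        if best is None or rss < best[0]:
            best = (rss, alpha, fitted)
    return best
```

For r > 1 one would subtract the fitted term from the response and repeat the search on the residuals, then cycle through the terms as in backfitting.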
Heuristically, imagine there are two explanatory variables and suppose
the regression surface is shaped like a sheet of corrugated aluminum. If
that sheet is oriented to make the corrugations parallel to the axis of
the first

4. Neural Nets (NN)

Many neural net techniques exist, but from a statistical regression
standpoint (Barron and Barron, 1988), nearly all variants fit models
that are weighted sums of sigmoidal
functions whose arguments involve linear combinations of the data. A
typical feedforward network uses a model of the form

\[ E[Y] = \beta_0 + \sum_{i=1}^{m} \beta_i f(\alpha_i^{T} x + \gamma_{i0}), \]

where f(·) is a logistic function and the \( \beta_0 \),
\( \gamma_{i0} \), and \( \alpha_i \) are estimated from the data.
Formally, this approach is similar to that in PPR. The choice of m
determines the number of hidden nodes in the network and affects the
smoothness of the fit; in most cases the user determines this parameter,
but it is also possible to use statistical techniques, such as
cross-validation to assess model fit, that allow m to be estimated from
the data.

Neural nets are widely used, although their performance properties,
compared to alternative regression methods, have not been thoroughly
studied. Ripley (1996) describes one assessment which finds that neural
net methods are not generally competitive. Schwarzer et al. (1986)
review the use of neural nets for prognostic and diagnostic
classification in clinical medicine and reach similar conclusions.
Another difficulty with neural nets is that the resulting model is hard
to interpret. The Bayesian formulation of neural net methods by Neal
(1996) provides some remedy for this difficulty.

PPR is very similar to neural net methods. The primary

lustrated for classification in Fig. 2. Many common functions are
difficult for RPR, however; for example, it approximates a straight line
by a stairstep function. In high dimensions it can be difficult to
discover when the RPR piecewise constant model closely approximates a
simple smooth function.

To be concrete, suppose one used RPR to predict tax obligation. The
algorithm would first search all possible splits in the training sample
observations and perhaps divide on whether or not the declared income is
greater than $25,000. For people with lower incomes, the next search
might split the data on marital status. For those with higher incomes, a
subsequent split might depend on whether the declared income exceeds
$75,000. Further splits might depend on the number of children, the age
and profession of the declarer, and so forth. The search process repeats
in every subset of the training data defined by previous divisions, and
eventually there is no potential split that sufficiently reduces
variability to justify further partitions. At this point RPR fits the
averages of all the training cases within the most refined subsets as
the estimates \( \theta_j \) and shows the sequence of chosen divisions
in a decision tree. (Note: RPR algorithms can be more complex; e.g.,
CART "prunes back" the final tree by removing splits to achieve better
balance of observed fit in the training sample with future predictive
error.)
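The recursive search described above can be sketched as a small regression tree. This is a minimal illustration, not the CART software: the function names, the `min_gain` and `min_leaf` stopping parameters, and the use of the within-node sum of squares as the measure of variability are assumptions made here, and no pruning is attempted.

```python
def sse(ys):
    """Sum of squared deviations of ys from their mean."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def best_split(X, y, min_leaf):
    """Search every variable and threshold; return (cost, variable, threshold)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best


def grow(X, y, min_gain=1e-9, min_leaf=2):
    """Split recursively until no split reduces variability by more than min_gain."""
    split = best_split(X, y, min_leaf)
    if split is None or sse(y) - split[0] <= min_gain:
        return {"leaf": True, "value": sum(y) / len(y)}   # theta_j: the leaf average
    _, j, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {"leaf": False, "var": j, "threshold": t,
            "left": grow([r for r, _ in left], [v for _, v in left],
                         min_gain, min_leaf),
            "right": grow([r for r, _ in right], [v for _, v in right],
                          min_gain, min_leaf)}


def predict(tree, x):
    """Follow the chosen divisions down to a leaf and return its average."""
    while not tree["leaf"]:
        branch = "left" if x[tree["var"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]
```

Each leaf corresponds to one rectangular region, and `predict` returns the average response of the training cases that fell into it, so the fitted surface is piecewise constant.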
difference is that neural net techniques usually assume that the
functions \( f_k \) are sigmoidal, whereas PPR allows more flexibility.
Zhao and Atkeson (1992) show that PPR has similar asymptotic properties
to standard neural net techniques.

5. Recursive Partitioning Regression (RPR)

RPR has become popular since the release of CART (Classification and
Regression Tree) software developed by Breiman et al. (1984). This
technique has already been described in the context of classification,
so this subsection focuses upon its application to regression. The RPR
algorithm fits the model

\[ E[Y_i] = \sum_{j=1}^{M} \theta_j I_{R_j}(X_i), \]

where the \( R_1, \ldots, R_M \) are rectangular regions which partition
\( \mathbb{R}^p \), and \( I_{R_j}(X_i) \) denotes an indicator function
that takes the value 1 if and only if \( X_i \in R_j \) and is otherwise
zero. Here \( \theta_j \) is the estimated numerical value of all
responses with explanatory variables in \( R_j \).

RPR was intended to be good at discovering local low-dimensional
structure in functions with high-dimensional

6. Multivariate Adaptive Regression Splines (MARS)

Friedman (1991) proposed a data mining method that combines PPR with RPR
through use of multivariate adaptive regression splines. It fits a model
formed as a weighted sum of multivariate spline basis functions
(tensor-spline basis functions) and can be written as

\[ E[Y_i] = \sum_{k=0}^{q} a_k B_k(X_i), \]

where the coefficients \( a_k \) are estimated by (generalized)
cross-validation fitting. The constant term is obtained by setting
\( B_0(X_1, \ldots, X_n) \equiv 1 \), and the multivariate spline terms
are products of univariate spline basis functions:

\[ B_k(x_1, \ldots, x_n) = \prod_{s=1}^{r_k} b\bigl(x_{i(s,k)} \mid t_{s,k}\bigr), \qquad 1 \le k \le q. \]

The subscript i(s, k) identifies a particular explanatory variable, and
the basis spline for that variable puts a knot at \( t_{s,k} \). The
values of q, the \( r_1, \ldots, r_q \), the knot locations, and the
explanatory variables selected for inclusion are determined from the
data adaptively.
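To make the product form of the MARS basis concrete, the sketch below evaluates a fitted model for one observation. It assumes the truncated linear ("hinge") form for the univariate basis function b, which is the form used in Friedman's MARS but is not spelled out in the text above; the names `hinge`, `basis`, and `mars_predict`, and the encoding of each factor as a (variable index, knot, sign) triple, are hypothetical. The adaptive selection of terms and knots is not shown.

```python
def hinge(x, knot, sign):
    """Truncated linear spline: max(0, x - knot) if sign is +1, max(0, knot - x) if -1."""
    return max(0.0, sign * (x - knot))


def basis(x, factors):
    """B_k(x): product of univariate hinge functions, one per (variable, knot, sign)."""
    value = 1.0
    for var, knot, sign in factors:
        value *= hinge(x[var], knot, sign)
    return value


def mars_predict(x, coefficients, terms):
    """a_0 * B_0 + sum over k >= 1 of a_k * B_k(x), with B_0 identically 1."""
    total = coefficients[0]
    for a_k, factors in zip(coefficients[1:], terms):
        total += a_k * basis(x, factors)
    return total
```

With `terms = [[(0, 1.0, +1)], [(0, 1.0, -1), (1, 0.5, +1)]]` the first term depends only on the first variable beyond its knot at 1.0, while the second is a two-variable interaction that is nonzero only where the first variable is below 1.0 and the second is above 0.5.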
global dependence. RPR is consistent; also, it has an attractive graphic
representation as a decision tree, as il-

MARS output can be represented and interpreted in a decomposition
similar to that given by analysis of vari-
is commercially available from Mathsoft, Inc., at
http://www.splus.mathsoft.com.

C. Bagging

Bagging is a strategy for improving predictive accuracy by model
averaging. It was proposed by Breiman (1996), but has a natural pedigree
in Bayesian work on variable selection, in which one often puts weights
on different possible models and then lets the data update those
weights.

Concretely, suppose one has a training sample and a nonparametric
regression technique that takes the explanatory variables and produces
an estimated response value. Then the simplest form of bagging proceeds
by drawing K random samples (with replacement) from the training sample
and applying the regression technique to each random sample to produce
regression rules \( T_1(X), \ldots, T_K(X) \). For a new observation,
say \( X^* \), the bagging predictor of the response \( Y^* \) is
\( K^{-1} \sum_{k=1}^{K} T_k(X^*) \).

The idea behind bagging is that model fitting strategies usually have
high variance but low bias. This means that small changes in the data
can produce very different models but that there is no systematic
tendency to produce models which err in particular directions. Under
these circumstances, averaging the results of many models can reduce the
error in the prediction that is associated with model instability while
preserving low bias.

Model averaging strategies are moving beyond simple bagging. Some employ
many different kinds of regression techniques rather than just a single
method. Others modify the bagging algorithm in fairly complex ways, such
as arcing (Breiman, 1998). A nice comparison of some of the recent ideas
in this area is given by Dietterich (1998), and Hoeting et al. (1999)
give an excellent tutorial on more systematic Bayesian methods for model
averaging. Model averaging removes the analyst's ability to interpret
parameters in the models used and can only be justified in terms of
predictive properties.

SEE ALSO THE FOLLOWING ARTICLES

ARTIFICIAL NEURAL NETWORKS • DATABASES • DATA STRUCTURES • INFORMATION
THEORY • STATISTICS, BAYESIAN • STATISTICS, MULTIVARIATE

BIBLIOGRAPHY

Banks, D. L., and Parmigiani, G. (1992). “Preanalysis of superlarge data sets,” J. Quality Technol. 24, 930–945.
Barron, A. R. (1991). “Complexity regularization with applications to artificial neural networks.” In “Nonparametric Functional Estimation” (G. Roussas, ed.), pp. 561–576, Kluwer, Dordrecht.
Barron, A. R. (1993). “Universal approximation bounds for superpositions of a sigmoidal function,” IEEE Trans. Information Theory 39, 930–945.
Barron, A. R., and Barron, R. L. (1988). “Statistical learning networks: A unifying view,” Comput. Sci. Stat. 20, 192–203.
Bay, S. D. (1999). “The UCI KDD Archive,” http://kdd.ics.uci.edu. Department of Information and Computer Science, University of California, Irvine, CA.
Breiman, L. (1996). “Bagging predictors,” Machine Learning 26, 123–140.
Breiman, L. (1998). “Arcing classifiers,” Ann. Stat. 26, 801–824.
Breiman, L., and Friedman, J. (1985). “Estimating optimal transformations for multiple regression and correlation,” J. Am. Stat. Assoc. 80, 580–619.
Breiman, L., Friedman, J., Olshen, R. A., and Stone, C. (1984). “Classification and Regression Trees,” Wadsworth, Belmont, CA.
Buja, A., and Kass, R. (1985). “Discussion of ‘Estimating optimal transformations for multiple regression and correlation,’ by Breiman and Friedman,” J. Am. Stat. Assoc. 80, 602–607.
Buja, A., Hastie, T. J., and Tibshirani, R. (1989). “Linear smoothers and additive models,” Ann. Stat. 17, 453–555.
Cleveland, W. (1979). “Robust locally weighted regression and smoothing scatterplots,” J. Am. Stat. Assoc. 74, 829–836.
Cleveland, W., and Devlin, S. (1988). “Locally weighted regression: An approach to regression analysis by local fitting,” J. Am. Stat. Assoc. 83, 596–610.
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. G. (1999). “Probabilistic Networks and Expert Systems,” Springer-Verlag, New York.
Dietterich, T. (1998). “An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization,” Machine Learning 28, 1–22.
De Veaux, R. D., Psichogios, D. C., and Ungar, L. H. (1993). “A comparison of two nonparametric estimation schemes: MARS and neural networks,” Computers Chem. Eng. 17, 819–837.
Donoho, D. L., and Johnstone, I. (1989). “Projection based approximation and a duality with kernel methods,” Ann. Stat. 17, 58–106.
Fisher, R. A. (1936). “The use of multiple measurements in taxonomic problems,” Ann. Eugen. 7, 179–188.
Flury, B. (1988). “Common Principal Components and Related Multivariate Models,” Wiley, New York.
Freund, Y., and Schapire, R. E. (1996). “Experiments with a new boosting algorithm.” In “Machine Learning: Proceedings of the Thirteenth International Conference,” pp. 148–156, Morgan Kaufmann, San Mateo, CA.
Friedman, J. H. (1991). “Multivariate adaptive regression splines,” Ann. Stat. 19, 1–66.
Friedman, J. H., and Stuetzle, W. (1981). “Projection pursuit regression,” J. Am. Stat. Assoc. 76, 817–23.
Friedman, J. H., Hastie, T., and Tibshirani, R. (2000). “Additive logistic regression: A statistical view,” Ann. Stat. 28, 337–373.
Glymour, C., Madigan, D., Pregibon, D., and Smyth, P. (1997). “Statistical themes and lessons for data mining,” Data Mining Knowledge Discovery 1, 11–28.
Hastie, T. J., and Tibshirani, R. J. (1990). “Generalized Additive Models,” Chapman and Hall, New York.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). “Bayesian model averaging: A tutorial,” Stat. Sci. 14, 382–417.
Hopfield, J. J. (1982). “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. USA 79, 2554–2558.
Huber, P. J. (1994). “Huge data sets.” In “Proceedings of the 1994 COMPSTAT Meeting” (R. Dutter and W. Grossmann, eds.), pp. 221–239, Physica-Verlag, Heidelberg.
Jordan, M. I. (ed.). (1998). “Learning in Graphical Models,” MIT Press, Cambridge, MA.
Kaufman, L., and Rousseeuw, P. J. (1990). “Finding Groups in Data: An Introduction to Cluster Analysis,” Wiley, New York.
MacQueen, J. (1967). “Some methods for classification and analysis of multivariate observations.” In “Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,” pp. 281–297, University of California Press, Berkeley, CA.
Mahalanobis, P. C. (1936). “On the generalized distance in statistics,” Proc. Natl. Inst. Sci. India 12, 49–55.
Maron, O., and Moore, A. W. (1993). “Hoeffding races: Accelerating model selection search for classification and function approximation.” In “Advances in Neural Information Processing Systems 6,” pp. 38–53, Morgan Kaufmann, San Mateo, CA.
McCulloch, W. S., and Pitts, W. (1943). “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biophys. 5, 115–133.
Milligan, G. W., and Cooper, M. C. (1985). “An examination of procedures for determining the number of clusters in a dataset,” Psychometrika 50, 159–179.
Minsky, M., and Papert, S. A. (1969). “Perceptrons: An Introduction to Computational Geometry,” MIT Press, Cambridge, MA.
Morgan, J. N., and Sonquist, J. A. (1963). “Problems in the analysis of survey data and a proposal,” J. Am. Stat. Assoc. 58, 415–434.
National Research Council. (1997). “Massive Data Sets: Proceedings of a Workshop,” National Academy Press, Washington, DC.
Neal, R. (1996). “Bayesian Learning for Neural Networks,” Springer-Verlag, New York.
Pearl, J. (1982). “Causality,” Cambridge University Press, Cambridge.
Pearl, J. (2000). “Causality: Models, Reasoning and Inference,” Cambridge University Press, Cambridge.
Press, S. J. (1982). “Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference,” 2nd ed., Krieger, Huntington, NY.
Prim, R. C. (1957). “Shortest connection networks and some generalizations,” Bell Syst. Tech. J. 36, 1389–1401.
Quinlan, J. R. (1992). “C4.5: Programs for Machine Learning,” Morgan Kaufmann, San Mateo, CA.
Ripley, B. D. (1996). “Pattern Recognition and Neural Networks,” Cambridge University Press, Cambridge.
Rosenblatt, F. (1958). “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychol. Rev. 65, 386–408.
Roweis, S. T., and Saul, L. K. (2000). “Nonlinear dimensionality reduction by local linear embedding,” Science 290, 2323–2326.
Rumelhart, D., Hinton, G. E., and Williams, R. J. (1986). “Learning representations by back-propagating errors,” Nature 323, 533–536.
Schapire, R. E. (1990). “The strength of weak learnability,” Machine Learning 5, 197–227.
Schwarzer, G., Vach, W., and Schumacher, M. (1986). “On misuses of artificial neural networks for prognostic and diagnostic classification in oncology,” Stat. Med. 19, 541–561.
Scott, D. W., and Wand, M. P. (1991). “Feasibility of multivariate density estimates,” Biometrika 78, 197–206.
Spirtes, P., Glymour, C., and Scheines, R. (2001). “Causation, Prediction, and Search,” 2nd ed., MIT Press, Cambridge, MA.
Stein, M. L. (1987). “Large sample properties of simulations using latin hypercube sampling,” Technometrics 29, 143–151.
Swayne, D. F., Cook, D., and Buja, A. (1997). “XGobi: Interactive dynamic graphics in the X window system,” J. Comput. Graphical Stat. 7, 113–130.
Tenenbaum, J. B., de Silva, V., and Langford, J. C. (2000). “A global geometric framework for nonlinear dimensionality reduction,” Science 290, 2319–2323.
Tibshirani, R. (1988). “Estimating optimal transformations for regression via additivity and variance stabilization,” J. Am. Stat. Assoc. 83, 394–405.
Van Ness, J. W. (1999). “Recent results in clustering admissibility.” In “Applied Stochastic Models and Data Analysis” (H. Bacelar-Nicolau, F. Costa Nicolau, and J. Janssen, eds.), pp. 19–29, Instituto Nacional de Estatistica, Lisbon, Portugal.
Vapnik, V. N. (2000). “The Nature of Statistical Learning,” 2nd ed., Springer-Verlag, New York.
Zhang, H., and Singer, B. (1999). “Recursive Partitioning in the Health Sciences,” Springer-Verlag, New York.
Zhao, Y., and Atkeson, C. G. (1992). “Some approximation properties of projection pursuit networks.” In “Advances in Neural Information Processing Systems 4” (J. Moody, S. J. Hanson, and R. P. Lippmann, eds.), pp. 936–943, Morgan Kaufmann, San Mateo, CA.
P1: GRA/GLT P2: FQP Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN004J-167 June 8, 2001 17:9
I. Introduction
II. Perfect Codes
III. Constant Weight Codes
IV. Maximum Distance Separable Codes
V. Convolutional Codes
GLOSSARY

Binary sum, u + v Componentwise addition of u and v with 1 + 1 = 0
(exclusive or).
Binary word A string (or vector or sequence) of 0's and 1's.
Code A set of (usually binary) words, often all of the same length.
Combinatorial design A collection of subsets of a set satisfying
additional regularity properties.
Decoding Finding the most likely codeword (or message) transmitted.
Distance, d(u, v) The number of coordinates in which u and v differ.
Encoding The assignment of codewords to messages.
Information rate The fraction of information per transmitted bit.
Weight, wt(u) The number of nonzero bits in the word u.

I. INTRODUCTION

Both error-correcting codes and combinatorial designs are areas of
discrete (not continuous) mathematics that began in response to applied
problems, the first in making the electronic transmission of information
reliable and the second in the design of experiments with results being
statistically analyzed. It turns out that there is substantial overlap
between these two areas, mainly because both are looking for uniformly
distributed subsets within certain finite sets. In this article we
provide a brief introduction to both areas and give some indication of
their interaction.

A. Error-Correcting Codes

Error-correcting codes is a branch of discrete mathematics, electrical
engineering, and computer science that has developed over the past 50
years, largely in response to
the dramatic growth of electronic transfer and storage of information.
Coding Theory began in the late 1940s and early 1950s with the seminal
work of Shannon (1948), Hamming (1950), and Golay (1949).
Error-correcting codes' first significant application was in NASA's deep
space satellite communications. Other important applications since then
have been in storage devices (e.g., compact discs), wireless telephone
channels, and geopositioning systems. They are now routinely used in all
satellite communications and mobile wireless communications systems.

Since no communication system is ideal, information can be altered,
corrupted, or even destroyed by noise. Any communication system needs to
be able to recognize or detect such errors and have some scheme for
recovering the information or correcting the error. In order to protect
against the more likely errors and thus improve the reliability,
redundancy must be incorporated into the message. As a crude example,
one could simply transmit the message several times in the expectation
that the majority will appear correctly at their destination, but this
would greatly increase the cost in terms of time or the rate of
transmission (or space in storage devices).

The most basic error control scheme involves simply detecting errors and
requesting retransmission. For many communications systems, requests for
retransmission are impractical or impose unacceptable costs on the
communication system's performance. In deep space communications, a
request for retransmission would take too long. In speech
communications, noticeable delays are unacceptable. In broadcast
systems, it is impractical given the multitude of receivers. The problem
of correcting errors and recovering information becomes of paramount
importance in such constrained situations.

Messages can be thought of as words over some alphabet, but for all
practical purposes, all messages are simply strings of 0's and 1's, or
binary words. Information can be partitioned or blocked up into a
sequence of binary words or messages of fixed length k. A (block) code,
C, is a set of binary words of fixed length n, each element of which

The key to being able to detect or correct errors that occur during
transmission is to have a code, C, such that no two codewords are close.
The distance between any two words u and v, denoted by d(u, v), is
simply the number of coordinates in which the two words differ. The
weight of u, wt(u), is just the number of nonzero coordinates in u.
Using the binary sum (so 1 + 1 = 0), we have d(u, v) = wt(u + v).

Under the assumptions that bits are more likely to be transmitted
correctly than incorrectly (a natural assumption) and that messages are
equally likely to be transmitted (this condition can be substantially
relaxed), it is easy to show that for any received word w, the most
likely codeword originally transmitted is the codeword c ∈ C for which
d(c, w) is least; that is, the most likely codeword sent is the one
closest to the received word. This leads to the definition of the
minimum distance
\( d(C) = \min \{ d(c_1, c_2) : c_1, c_2 \in C,\ c_1 \neq c_2 \} \) of a
code C, the distance between the two closest codewords in C. Clearly, if
a word w is received such that
\( d(c, w) \le t = \lfloor (d(C) - 1)/2 \rfloor \) for some c ∈ C, then
the unique closest codeword to w is c; therefore, c is the most likely
codeword sent and we decode w to c. (Notice that if c was the codeword
that was originally transmitted, then at most t bits were altered in c
during transmission to result in w being received.) So decoding each
received word to the closest codeword (known as maximum likelihood
decoding, or MLD) will always result in correct decoding provided at
most t errors occur during transmission. Furthermore, if \( c_1 \) and
\( c_2 \) are the two closest codewords [so
\( d(c_1, c_2) = d(C) \)], then it is clearly possible to change t + 1
bits in \( c_1 \) so that the resulting word w satisfies
\( d(c_1, w) > d(c_2, w) \). Therefore, if these t + 1 bits are altered
during the transmission of \( c_1 \), then, using MLD, we would
incorrectly decode the received word w to \( c_2 \). Since MLD results
in correct decoding for C no matter which codeword is transmitted and no
matter which set of up to t bits are altered during transmission, and
since this is not true if we replace t with t + 1, C is known as a
t-error-correcting code, or as an (n, |C|, d) code where
d = d(C) ≥ 2t + 1.
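The definitions of distance, weight, minimum distance, and nearest-codeword decoding translate directly into code. The sketch below is illustrative only: the function names and the four-word code `CODE` are hypothetical choices made here, words are represented as strings of 0's and 1's, and decoding is done by brute-force comparison with every codeword, which is practical only for very small codes.

```python
from itertools import combinations


def binary_sum(u, v):
    """Componentwise sum of two binary words with 1 + 1 = 0 (exclusive or)."""
    return "".join("1" if a != b else "0" for a, b in zip(u, v))


def weight(u):
    """wt(u): the number of nonzero coordinates of u."""
    return u.count("1")


def distance(u, v):
    """d(u, v): the number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def minimum_distance(code):
    """d(C): the smallest distance between two distinct codewords."""
    return min(distance(a, b) for a, b in combinations(code, 2))


def decode(code, w):
    """Nearest-codeword decoding: return a codeword closest to the received word w."""
    return min(code, key=lambda c: distance(c, w))


# A hypothetical (5, 4, 3) code used only for illustration.
CODE = ["00000", "01011", "10101", "11110"]
```

Here `minimum_distance(CODE)` is 3, so t = (3 - 1) // 2 = 1 and any single altered bit is corrected; for instance, the received word "01111" lies at distance 1 from "01011" and at distance at least 2 from every other codeword.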
is called a codeword. Mathematically, codewords can be considered to be
vectors of length n with elements being chosen from a finite field,
normally of order 2, but in some cases from the field \( GF(2^r) \).
[So, for example, the binary codeword 01101 could also be represented as
the vector (0, 1, 1, 0, 1).] There are also convolutional codes where
the codewords do not have fixed length (or have infinite length), but
these will be discussed later. Encoding refers to the method of
assigning messages to codewords which are then transmitted. Clearly, the
number of codewords has to be at least as large as the number of k-bit
messages. The rate of the code is k/n, since k bits of information
result in n bits being transmitted.

The construction problem is to find an (n, |C|, d) code, C, such that
the minimum distance d (and thus t) is large, which improves the
error-correction ability and thus the reliability of transmission, and
such that |C| is large, so the rate of transmission
(\( \log_2 |C| / n = k/n \)) is closer to 1. Since messages are usually
blocked up into k-bit words, one usually has \( \log_2 |C| = k \).
Clearly, these aims compete against each other. The more codewords one
packs together in C, the harder it is to keep them far apart. In
practice one needs to make decisions about the reliability of the
channel and the need to get the message transmitted correctly and then
weigh that against the cost of decreasing the rate of transmission (or
increasing the amount of data to be stored). It
is possible to obtain bounds on one of the three parameters n, d, and
|C| in terms of the other two, and in some cases families of codes have
been constructed which meet these bounds. Two of these families are
discussed in this article: the perfect codes are codes that meet the
Hamming Bound and are described in Section II; the maximum distance
separable (or MDS) codes are codes that meet the Singleton Bound and are
described in Section IV.

The second problem associated with error-correcting codes is the
encoding problem. Each message m is to be assigned a unique codeword c,
but this must be done efficiently. One class of codes with an efficient
encoding algorithm is the class of linear codes; that is, codes for
which the binary sum of any pair of codewords in the code C is also a
codeword in C (here, binary sum means componentwise binary addition of
the binary digits or the exclusive or of the two codewords). This means
that C is a vector space and thus has a basis which we can use to form
the rows of a generating matrix G. Then each message m is encoded to the
codeword c = mG. One might also require that C have

an extremely efficient decoding algorithm which finds the closest
codeword without having to compare the received word to all
\( 2^{192} \) codewords.

Again, the class of linear codes also has a relatively efficient
decoding algorithm. Associated with each linear code C is the dual code
\( C^{\perp} \) consisting of all vectors (codewords) such that the dot
product with any codeword in C is 0 (again using xor or binary
arithmetic). This is useful because of the fact that if H is a
generating matrix for \( C^{\perp} \), then \( Hw^{T} = 0 \) if and only
if w is a codeword. H is also known as the parity check matrix for C.
The word \( s = Hw^{T} \) is known as the syndrome of w. Syndromes are
even more useful because it turns out that for each possible syndrome s,
there exists a word \( e_s \) with the property that a closest codeword
to any received word w with syndrome s is \( w + e_s \). This
observation is taken even further to obtain a very efficient decoding
algorithm for the Reed-Solomon codes that can deal with the
\( 2^{196} \) codewords in real time; it incorporates the fact that
these codes are not only linear but also cyclic.
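Encoding by c = mG and decoding by syndromes can be illustrated with a small linear code. The sketch below uses one systematic form of the binary [7, 4] Hamming code; the particular matrices `G` and `H`, the function names, and the table `LEADERS` of words \( e_s \) are choices made here for illustration, and the approach of tabulating one \( e_s \) per syndrome is feasible only when the number of syndromes is small.

```python
# Generating matrix G = [I | P] and parity check matrix H = [P^T | I].
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]


def encode(m):
    """c = mG over the binary field."""
    return [sum(mi * gi for mi, gi in zip(m, column)) % 2 for column in zip(*G)]


def syndrome(w):
    """s = H w^T over the binary field; all zeros exactly when w is a codeword."""
    return tuple(sum(h * x for h, x in zip(row, w)) % 2 for row in H)


# For this code every nonzero syndrome is produced by exactly one single-bit
# error, so the table of words e_s holds the zero word and the 7 unit words.
LEADERS = {syndrome([0] * 7): [0] * 7}
for position in range(7):
    error = [0] * 7
    error[position] = 1
    LEADERS[syndrome(error)] = error


def decode(w):
    """Return w + e_s, a closest codeword to the received word w."""
    e = LEADERS[syndrome(w)]
    return [(a + b) % 2 for a, b in zip(w, e)]
```

Decoding looks up one of 8 syndromes rather than comparing the received word with all 16 codewords; the saving grows rapidly with the size of the code.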
the additional property of being cyclic; that is, the cyclic shift
\( c' = x_n x_1 x_2 \ldots x_{n-1} \) of any codeword
\( c = x_1 x_2 \ldots x_n \) (where \( x_i \in \{0, 1\} \)) in C is also
a codeword. If C is cyclic and linear, then encoding can easily be
completed using a shift register design. The representation of a code is
critical in decoding. For example, Hamming codes (see Section II)
possess a cyclic representation but also have equivalent representations
that are not cyclic.

The final main problem associated with error-correcting codes is the
decoding problem. It is all well and good to know from the design of the
code that all sets of up to t errors occurring during transmission
result in a received word, w, that is closer to the codeword c that was
sent than it is to any other codeword; but given w, how do you
efficiently find c and recover m? Obviously, one could test w against
each possible codeword and eventually determine which is closest, but
some codes are very big. Not only that, but it can also be imperative
that decoding be done extremely quickly, as the following example
demonstrates.

The introduction of the compact disc (CD) by Philips

Another family of codes that NASA uses is the convolutional codes.
Theoretically, these codes are infinite in length, so a completely
different decoding algorithm is required in this case (see Section V).

In the following sections, we focus primarily on the construction of
some of the best codes, putting aside discussion of the more technical
problem of describing decoding algorithms for all except the
convolutional codes in Section V. This allows the interaction between
codes and designs to be highlighted.

B. Combinatorial Designs

Although (combinatorial) designs were studied earlier by such people as
Euler, Steiner, and Kirkman, it was Yates (1936) who gave the subject a
shot in the arm in 1935 by pointing out their use in statistics in the
design of experiments. In particular, he defined what has become known
as an (n, k, λ) balanced incomplete block design (BIBD) to be a set V of
n elements and a set B of subsets (called blocks) of V such that
in 1979 revolutionized the recording industry. This may
not have been possible without the heavy use of error- 1. Each block has size k < n.
correcting codes in each CD. (Errors can occur, for ex- 2. Each pair of elements in V occur as a subset of
ample, from incorrect cutting of the CD.) Each codeword exactly λ blocks in B.
on each CD represents less than 0.0001 sec of music, is
represented by a binary word of length 588 bits, and is ini- Fisher and Yates (1938) went on to find a table of small de-
tially selected from a Reed-Solomon code (see Section III) signs, and Bose (1939) soon after began a systematic study
that contains 2192 codewords. Clearly, this is an applica- of the existence of such designs. Bose made use of finite
tion where all decoding must take place with no delay, geometries and finite fields in many of his constructions.
as nobody will buy a CD that stops the music while the A natural generalization of BIBD is to replace (2) with
closest codeword is being found! It turns out that not only
are the Reed-Solomon codes excellent in that they meet 2 . Each (t + 1)-element subset of V occurs as a subset
the Singleton Bound (see Section III), but they also have of exactly λ blocks of B.
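Conditions (1), (2), and (2′) can be checked mechanically for a small example. The sketch below is illustrative only: the function name `is_design` and the choice of the seven-point Fano plane as test data are assumptions made here, not part of the article. It tests whether every subset of a given size lies in exactly λ blocks.

```python
from itertools import combinations

def is_design(points, blocks, k, strength, lam):
    """Return True if every block has size k and every `strength`-element
    subset of `points` is contained in exactly `lam` blocks.
    With strength = 2 this is the BIBD condition (2); with
    strength = t + 1 it is the generalized condition (2')."""
    blocks = [frozenset(b) for b in blocks]
    if any(len(b) != k for b in blocks):
        return False
    for subset in combinations(sorted(points), strength):
        s = set(subset)
        if sum(1 for b in blocks if s <= b) != lam:
            return False
    return True

# Illustrative data (an assumption of this sketch): the Fano plane,
# a (7, 3, 1) BIBD on the points 1..7.
POINTS = range(1, 8)
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

fano_is_bibd = is_design(POINTS, FANO, k=3, strength=2, lam=1)
fano_is_3_design = is_design(POINTS, FANO, k=3, strength=3, lam=1)
```

Here `fano_is_bibd` holds because each of the 21 pairs lies in exactly one block, while `fano_is_3_design` fails because most triples of points lie in no block at all.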
P1: GRA/GLT P2: FQP Final Pages
Encyclopedia of Physical Science and Technology EN004J-167 June 8, 2001 17:9
Such designs are known as (t + 1) designs, which can be briefly denoted by $S_\lambda(t + 1, k, n)$; in the particular case when λ = 1, they are known as Steiner (t + 1) designs, which are denoted by S(t + 1, k, n). By elementary counting techniques, one can show that if s < t, then an $S_\lambda(t + 1, k, n)$ design is also an $S_\mu(s, k, n)$ design where

$$\mu = \lambda \binom{n-s}{t+1-s} \Big/ \binom{k-s}{t+1-s}.$$

Since µ must be an integer, this provides several necessary conditions for the existence of a (t + 1) design.

For many values of n, k, and t, an S(t + 1, k, n) design cannot exist. A partial S(t + 1, k, n) design then is a set of k-subsets of an n-set where any (t + 1)-subset is contained in at most one block. This is equivalent to saying that any two k-subsets intersect in at most t elements. Partial designs are also referred to as packings, and much research has focused on finding maximum packings for various parameters.

There are very few results proving the existence of s designs once s ≥ 3. Hanani found exactly when there exists an $S_\lambda(3, 4, v)$ [also called Steiner Quadruple Systems; see Lindner and Rodger (1975) for a simple proof and Hartman and Phelps (1992) for a survey], and Teirlinck (1980) proved that there exists an $S_\lambda(s, s + 1, v)$ whenever $\lambda = ((s + 1)!)^{2s+1}$, v ≥ s + 1, and $v \equiv s \pmod{((s + 1)!)^{2s+1}}$. Otherwise, just a few s designs are known [see Colbourn and Dinitz (1996)]. Much is known about their existence when s = 2. In particular, over 1000 papers have been written [see Colbourn and Rosa (1999)] about $S_\lambda(2, 3, v)$ designs (known as triple systems, and as Steiner triple systems if λ = 1). We only need to consider designs with λ = 1 in this article.

Certainly, (t + 1) designs and maximum packings are of interest in their own right, but they also play a role in the construction of good codes. To see how, suppose we have a (t + 1) design (or packing) (V, B). For each block b in B, we form its characteristic vector $c_b$ of length n, indexed by the elements in V, by placing a 1 in position i if i ∈ b and placing a 0 in position i if i ∉ b. Let C be the code $\{c_b \mid b \in B\}$. Then C is a code of length n in which all codewords have exactly k 1’s (we say each codeword has weight k). The fact that (V, B) is a (t + 1) design (or packing) also says something about the minimum distance of C: since each pair of blocks intersects in at most t elements, each pair of codewords has at most t positions where both are 1, so each pair of codewords disagrees in at least 2k − 2t positions, so d(C) ≥ 2k − 2t. This connection is considered in some detail in Section III. We also show in Section II that the codewords of weight d in perfect codes together form the characteristic vectors of a (t + 1) design.

There is much literature on the topic of combinatorial designs [see Colbourn and Dinitz (1996) for an encyclopedia of designs and Lindner and Rodger (1997) for an introductory text], and this topic is also considered elsewhere in this encyclopedia, so here we restrict our attention to designs that arise in connection with codes.

II. PERFECT CODES

Let C be a code of length n with minimum distance d. Let $t = \lfloor (d - 1)/2 \rfloor$. Then, as described in Section I, for each codeword c in C, and for each binary word w of length n with d(w, c) ≤ t, the unique closest codeword to w is c. Since we can choose any i of the n positions to change in c in order to form a word of length n at distance exactly i from c, the number of words at distance i from c is $\binom{n}{i} = n!/((n - i)!\,i!)$. So the total number of words of length n that are at distance at most t from c is $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{t}$, one of which is c; thus the number of words of length n at distance at most t from some codeword in C is $|C| \sum_{i=0}^{t} \binom{n}{i}$ (by the definition of t, no word is within distance t of two codewords). Of course, the total number of binary words of length n is $2^n$. Therefore, it must be the case that

$$|C| \sum_{i=0}^{t} \binom{n}{i} \le 2^n.$$

This bound is known as the Hamming Bound or the sphere packing bound. Any code that satisfies equality in the Hamming Bound is known as a perfect code, in which case d = 2t + 1.

From the argument above, it is clear that for any perfect code, each word of length n must be within distance t of a unique codeword (if a code is not perfect, then there exist words for which the distance to any closest codeword is more than t). In particular, if C is a perfect code with d = 2t + 1, then the codewords of minimum weight d in C are the characteristic vectors of the blocks of an S(t + 1, 2t + 1, n) design. To see this, note that each word w of weight t + 1 is within distance t of a unique codeword c, where clearly c must have weight d = 2t + 1. Equivalently, each (t + 1)-subset is contained in a unique d-subset, the characteristic vector of which is a codeword. In fact, for any codeword c ∈ C, one can define the neighborhood packing,

$$NS(c) = \{x + c \mid x \in C \text{ and } d(x, c) = d\}.$$

Then, the code C will be perfect if and only if every neighborhood packing is, in fact, the set of characteristic vectors of an S(t + 1, 2t + 1, n) design. To see the converse, suppose C is a code with every NS(c) an S(t + 1, 2t + 1, n) design. If w is any word, let c ∈ C be the closest codeword and assume d(c, w) ≥ t + 1. Choose any t + 1 coordinates where c and w disagree. Since NS(c) is an S(t + 1, 2t + 1, n) design, these coordinates uniquely determine a block of size
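The Hamming Bound and the link between perfect codes and designs stated earlier in this section can be checked numerically on a small case. The sketch below makes several assumptions of its own: it builds the length-7 binary Hamming code from a parity check matrix whose j-th column is the binary representation of j (a standard construction, not one given in the text above), and the function names are purely illustrative.

```python
from itertools import combinations, product
from math import comb

def sphere_size(n, t):
    """Number of binary words of length n within distance t of a fixed word."""
    return sum(comb(n, i) for i in range(t + 1))

def meets_hamming_bound(n, size, d):
    """True if a code with `size` codewords, length n, and minimum
    distance d satisfies the Hamming Bound with equality."""
    t = (d - 1) // 2
    return size * sphere_size(n, t) == 2 ** n

def hamming_7_4():
    """All words w of length 7 with zero syndrome, where column j of the
    assumed parity check matrix is j written in binary (so the syndrome
    is the XOR of the 1-based positions holding a 1)."""
    code = []
    for w in product((0, 1), repeat=7):
        syndrome = 0
        for pos, bit in enumerate(w, start=1):
            if bit:
                syndrome ^= pos
        if syndrome == 0:
            code.append(w)
    return code

code = hamming_7_4()
min_weight = min(sum(w) for w in code if any(w))

# Supports of the minimum-weight codewords, viewed as blocks on {1,...,7}.
blocks = [frozenset(i for i, bit in enumerate(w, start=1) if bit)
          for w in code if sum(w) == min_weight]

# How many blocks contain each pair of positions (t + 1 = 2 here).
pair_counts = {pair: sum(1 for b in blocks if set(pair) <= b)
               for pair in combinations(range(1, 8), 2)}
```

With d = 3 and t = 1, the 16 codewords fill the space exactly, and the seven codewords of weight 3 cover every pair of positions once, which is the S(2, 3, 7) design predicted by the argument above.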
a code with all nonzero words having the same weight. These codes are sometimes referred to as linear equidistant codes. The dual of the Hamming code (also called the simplex code) is an example of such a code. In fact, it has been proved that the only such codes are formed by taking several copies of a simplex code. The proofs that all such codes are generalized simplex codes come explicitly from coding theory (Bonisoli, 1983) and also implicitly from results on designs and set systems (Teirlinck, 1980). There is a close connection between linear equidistant codes and finite geometries. The words of a simplex code correspond to the hyperplanes of projective space [over GF(2)] just as the words of weight 3 in the Hamming code correspond to lines in this projective space. [For connections between codes and finite geometries, see Blake and Mullin (1976).]

Another variation on CW codes is the family of optical orthogonal codes (OOC), which were motivated by an application to optical CDMA communication systems. Briefly, an $(n, w, t_a, t_c)$ OOC is a CW code, C, of length n and weight w such that for any $c = (c_0, c_1, \ldots, c_{n-1}) \in C$, each $y \in C$ with $y \ne c$, and each $i \not\equiv 0 \pmod{n}$,

$$\sum_{j=0}^{n-1} c_j c_{j+i} \le t_a, \tag{1}$$

and

$$\sum_{j=0}^{n-1} c_j y_{j+i} \le t_c. \tag{2}$$

Equation (1) is the autocorrelation property, and Eq. (2) is the cross-correlation property. Most research has focused on the case where $t_a = t_c = t$, in which case we refer to an (n, w, t) OOC. Again, one can reformulate these properties in terms of (partial) designs or packings. In this case, an OOC is a collection of w-subsets of the integers (mod n), such that for subsets c, b ∈ C,

$$|(c + i) \cap (c + j)| \le t_a \quad (i \ne j), \tag{3}$$

and

$$|(c + i) \cap (b + j)| \le t_c. \tag{4}$$

Here, $c + i = \{x + i \pmod{n} \mid x \in c\}$.

An OOC is equivalent to a cyclic design or packing. A code or packing is said to be cyclic if every cyclic shift of a codeword (or block) is another codeword. The set of all cyclic shifts of a codeword is said to be an orbit. A representative from that orbit is often called a base block. An (n, w, t) OOC is a set of base blocks for a cyclic (partial) S(t + 1, w, n) design or packing (assuming t < w). Conversely, given such a cyclic partial S(t + 1, w, n) design or packing, one can form an (n, w, t) OOC by taking one representative block or codeword from each orbit.

IV. MAXIMUM DISTANCE SEPARABLE CODES

For any linear code C, recall that the minimum distance equals the minimum weight of any nonzero codeword. Also, if C has dimension k, then $C^{\perp}$ has dimension n − k and any parity check matrix H of C has rank n − k. If c ∈ C is a codeword of minimum weight, wt(c) = d, then $Hc^T = 0$ implies that d columns of H are dependent, but no d − 1 columns are dependent. Since H has rank n − k, every n − k + 1 columns of H are dependent. Thus,

$$d \le n - k + 1.$$

This is known as the Singleton Bound, and any code meeting equality in this bound is known as a maximum distance separable code (or MDS code).

There are no interesting binary MDS codes, but there are such codes over other alphabets, for example, the Reed-Solomon codes of order 256 used in CD encoding (Reed-Solomon codes are described below). Even though such codes are treated mathematically as codes with 256 different “digits,” each still has an implementation as a binary code, since each of the digits in the finite field $GF(2^8)$ can be represented by a binary word of length 8; that is, by one byte. So the first step in encoding the binary string representing all the music onto a CD is to divide it into bytes and to regard each such byte as a field element in $GF(2^8)$.

We now consider a code $C \subseteq F^n$ as a set of codewords over the alphabet F, where F is typically the set of elements of a finite field. There are several equivalent definitions for a linear code C of length n and dimension k to be an MDS code:

1. C has minimum distance d = n − k + 1.
2. Every set of k columns of G, the generating matrix for C, is linearly independent.
3. Every set of n − k columns of H, the parity check matrix for C, is linearly independent.

Note that from item (3) above, C is MDS if and only if $C^{\perp}$ is MDS.

If one arranges the codewords of C in a |C| × n array, then from item (2) this array will have the property that for any choice of k columns (or coordinates) and any word of length k, $w \in F^k$, there will be exactly one row of this array that has w in these coordinates. An orthogonal array is defined to be a $q^k \times n$ array with entries from a set F, |F| = q, with precisely this property: restricting the array to any k columns, every word $w \in F^k$ occurs exactly once in this $q^k \times k$ subarray. Two rows of an orthogonal array can agree in at most k − 1 coordinates, which means that they must disagree in at least n − (k − 1) coordinates. Thus, the distance between any two rows of an orthogonal
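The Singleton Bound and the orthogonal array property described in Section IV can be illustrated on a tiny nonbinary code. The sketch below is an assumption-laden example, not the article's construction: it uses a polynomial-evaluation code over the prime field GF(5) (so that arithmetic is ordinary arithmetic mod 5), with n = 4 and k = 2, and all names are hypothetical.

```python
from itertools import combinations, product

P = 5                    # assumed alphabet: integers mod the prime 5
POINTS = [0, 1, 2, 3]    # assumed evaluation points, so n = 4
K = 2                    # dimension: polynomials of degree < 2

def encode(msg):
    """Evaluate the polynomial with coefficient list `msg` at each point."""
    return tuple(sum(c * pow(x, i, P) for i, c in enumerate(msg)) % P
                 for x in POINTS)

CODE = [encode(m) for m in product(range(P), repeat=K)]

def min_distance(code):
    """Smallest number of coordinates in which two distinct codewords differ."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(code, 2))

def is_orthogonal_array(code, k, q):
    """True if the code has q**k rows and, restricted to any k columns,
    every word of length k over the alphabet occurs exactly once."""
    if len(code) != q ** k:
        return False
    n = len(code[0])
    for cols in combinations(range(n), k):
        seen = {tuple(row[c] for c in cols) for row in code}
        if len(seen) != q ** k:
            return False
    return True

n = len(POINTS)
d = min_distance(CODE)
```

For this example d equals n − k + 1 = 3, so the code meets the Singleton Bound, and the 25 × 4 array of codewords passes the orthogonal array test for any two columns.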
be efficiently implemented by using the Viterbi Decoder. Deciding on how large to make τ can affect the codeword to which w is decoded (see Hoffman et al., 1991, for example).

In deciding which convolutional code to use, choices have to be made about $g_1(x), \ldots, g_\mu(x)$ and the number k of message symbols to move into the shift register at each tick, usually chosen to be 1. The rate of the code is then k/µ.

SEE ALSO THE FOLLOWING ARTICLES

COMMUNICATION SATELLITE SYSTEMS • DATABASES • DISCRETE MATHEMATICS AND COMBINATORICS • WIRELESS COMMUNICATIONS

BIBLIOGRAPHY

Beth, Th., Jungnickel, D., and Lenz, H. (1999, 2000). “Design Theory,” Vols. 1 and 2, Cambridge Univ. Press, Cambridge, UK.
Blake, I. F., and Mullin, R. C. (1976). “The Mathematical Theory of Coding Theory,” Academic Press, New York.
Bonisoli, A. (1983). “Every equidistant linear code is a sequence of dual Hamming codes,” Ars Combinatoria 18, 181–186.
Bose, R. C. (1939). “On the construction of balanced incomplete block designs,” Ann. Eugen. 9, 353–399.
Bush, K. A. (1952). “Orthogonal arrays of index unity,” Ann. Math. Stat. 23, 426–434.
Colbourn, C. J., and Dinitz, J. H., eds. (1996). “The CRC Handbook of Combinatorial Designs,” CRC Press, Boca Raton, FL.
Colbourn, C. J., and Rosa, A. (1999). “Triple Systems,” Oxford Univ. Press, Oxford.
Fisher, R. A., and Yates, F. (1938). “Statistical Tables for Biological, Agricultural and Medical Research,” Oliver & Boyd, Edinburgh.
Golay, M. J. E. (1949). “Notes on digital coding,” Proc. IEEE 37, 657.
Hamming, R. W. (1950). “Error-detecting and error-correcting codes,” Bell Syst. Tech. J. 29, 147–160.
Hanani, H. (1960). “On quadruple systems,” Canad. J. Math. 12, 145–157.
Hartman, A., and Phelps, K. T. (1992). “Steiner Quadruple Systems, Contemporary Design Theory” (J. H. Dinitz and D. R. Stinson, eds.), Wiley, New York.
Hoffman, D. G., Leonard, D. A., Lindner, C. C., Phelps, K. T., Rodger, C. A., and Wall, J. R. (1991). “Coding Theory: The Essentials,” Dekker, New York.
Lindner, C. C., and Rodger, C. A. (1997). “Design Theory,” CRC Press, Boca Raton, FL.
van Lint, J. H. (1975). “A survey of perfect codes,” Rocky Mount. J. Math. 5, 199–224.
MacWilliams, F. J., and Sloane, N. J. A. (1977). “The Theory of Error-Correcting Codes,” North-Holland, Amsterdam.
Shannon, C. E. (1948). “A mathematical theory of communication,” Bell Syst. Tech. J. 27, 379–423 and 623–656.
Teirlinck, L. (1980). “On projective and affine hyperplanes,” J. Combinatorial Theory, Ser. A 28, 290–306.
Tietäväinen, A. (1973). “On the nonexistence of perfect codes over finite fields,” SIAM J. Appl. Math. 24, 88–96.
Yates, F. (1939). “Complex experiments,” J. R. Stat. Soc. 2, 181–247.
Yates, F. (1936). “Incomplete randomized blocks,” Ann. Eugen. 7, 121–140.
Zinov’ev, V. A., and Leont’ev, V. K. (1973). “The nonexistence of perfect codes over Galois fields,” Probl. Control Inf. Theory 2(2), 123–132.
P1: GNH Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN004E-172 June 8, 2001 17:16
I. Introduction
II. Initial-Value Problems
III. Fundamental Theory
IV. Linear Systems
V. Stability
and we call (A) an autonomous system of first-order ordinary differential equations.

2. If (t + T, x) ∈ D when (t, x) ∈ D and if f(t, x) = f(t + T, x) for all (t, x) ∈ D, then (I) assumes the form

$$x' = f(t, x) = f(t + T, x) \tag{P}$$

Such a system is called a periodic system of first-order differential equations of period T. The smallest T > 0 for which (P) is true is the least period of this system of equations.

3. When in (I), f(t, x) = A(t)x, where $A(t) = [a_{ij}(t)]$ is a real n × n matrix with elements $a_{ij}(t)$ which are defined and at least piecewise continuous on a t interval J, then we have

$$x' = A(t)x \tag{LH}$$

and we speak of a linear homogeneous system of ordinary differential equations.

4. If for (LH) A(t) is defined for all real t and if there is a T > 0 such that A(t) = A(t + T) for all t, then we have

$$x' = A(t)x = A(t + T)x \tag{LP}$$

This system is called a linear periodic system of ordinary differential equations.

$$\varphi^{(n)}(t) = h\bigl(t, \varphi(t), \ldots, \varphi^{(n-1)}(t)\bigr)$$

for all t ∈ J.

Now for a given $(\tau, \xi_1, \ldots, \xi_n) \in D$, the initial value problem for $(E_n)$ is

$$y^{(n)} = h\bigl(t, y, y^{(1)}, \ldots, y^{(n-1)}\bigr), \qquad y(\tau) = \xi_1, \ldots, y^{(n-1)}(\tau) = \xi_n \tag{$I_n$}$$

A function φ is a solution of $(I_n)$ if φ is a solution of Eq. $(E_n)$ on some interval containing τ and if $\varphi(\tau) = \xi_1, \ldots, \varphi^{(n-1)}(\tau) = \xi_n$.

As in the case of systems of first-order equations, we single out several special cases.

1. Consider equations of the form

$$y^{(n)} + a_{n-1}(t)y^{(n-1)} + \cdots + a_1(t)y^{(1)} + a_0(t)y = g(t) \tag{1}$$

where $a_{n-1}(t), \ldots, a_0(t)$ are real continuous functions defined on the interval J. We refer to Eq. (1) as a linear nonhomogeneous ordinary differential equation of order n.

2. If in (1) we let g(t) ≡ 0, then

$$y^{(n)} + a_{n-1}(t)y^{(n-1)} + \cdots + a_1(t)y^{(1)} + a_0(t)y = 0 \tag{2}$$
A. Existence of Solutions
In order to simplify our presentation, we will consider in
the next few results one-dimensional initial value prob-
lems (i.e., we will assume that for (I), n = 1). Later in
this section we will show how these results are modified
for higher dimensional systems. Thus, we have a domain
$D \subset R^2$, we are given (τ, ξ) ∈ D and f ∈ C(D), and we seek a solution for the initial-value problem,

$$x' = f(t, x), \qquad x(\tau) = \xi \tag{I′}$$

FIGURE 4 An example of an electric circuit.
B. Continuation of Solutions

Our next task is to determine if it is possible to extend a solution φ to a larger interval than was indicated above (|t − τ| ≤ c). The answer to this is affirmative. To see this, suppose that f ∈ C(D) and suppose also that f is bounded on D. Suppose also that by some procedure, as above, it was possible to show that φ is a solution of the scalar differential equation,

$$x' = f(t, x) \tag{E′}$$

on an interval J = (a, b). Using expression (V) for φ it is an easy matter to show that the limit of φ(t) as t approaches a from the right exists and that the limit of φ(t) as t approaches b from the left exists; that is,

$$\lim_{t \to a^+} \varphi(t) = \varphi(a^+)$$

and

$$\lim_{t \to b^-} \varphi(t) = \varphi(b^-)$$

Now clearly, if the point $(a, \varphi(a^+)) \in D$ (resp., if $(b, \varphi(b^-)) \in D$), then by repeating the procedure given in the above results (ε-approximate solution result and Peano–Cauchy theorem), the solution φ can be continued to the left past the point t = a (resp., to the right past the point t = b). Indeed, it should be clear that repeated applications of these procedures will make it possible to continue the solution φ to the boundary of D. This is depicted in Fig. 7. It is worthwhile to note that the solution φ in this figure exists over the interval J and not over the interval $\tilde{J}$.

FIGURE 7 Continuation of a solution φ to ∂D.

We summarize the preceding discussion in the following continuation result:

If f ∈ C(D) and if f is bounded on D, then all solutions of (E′) can be extended to the boundary of D. These solutions are then noncontinuable.

C. Uniqueness of Solutions

Next, we address the question of uniqueness of solutions. To accomplish this, we require the following concept:

f ∈ C(D) is said to satisfy a Lipschitz condition on D (with respect to x) with Lipschitz constant L if

$$|f(t, \bar{\bar{x}}) - f(t, \bar{x})| \le L\,|\bar{\bar{x}} - \bar{x}| \tag{15}$$

for all $(t, \bar{x})$, $(t, \bar{\bar{x}})$ in D. The function f is said to be Lipschitz continuous in x on D in this case.

For example, it can be shown that if ∂f(t, x)/∂x exists and is continuous on D, then f will be Lipschitz continuous on any compact and convex subset $D_0$ of D.

In order to establish a uniqueness result for solutions of the initial value problem (I′), we will also require a result known as the Gronwall inequality: Let r and k be continuous nonnegative real functions defined on an interval [a, b] and let δ ≥ 0 be a constant. If

$$r(t) \le \delta + \int_a^t k(s)\,r(s)\,ds \tag{16}$$

then

$$r(t) \le \delta \exp\left(\int_a^t k(s)\,ds\right) \tag{17}$$

Now suppose that for (I′) the Cauchy–Peano theorem holds and suppose that for one given (τ, ξ) ∈ D, two solutions $\varphi_1$ and $\varphi_2$ exist over some interval |t − τ| ≤ d, d > 0. On the interval τ ≤ t ≤ τ + d we now have, using (V) to express $\varphi_1$ and $\varphi_2$,

$$\varphi_1(t) - \varphi_2(t) = \int_\tau^t \bigl[f(s, \varphi_1(s)) - f(s, \varphi_2(s))\bigr]\,ds \tag{18}$$

Now if, in addition, f is Lipschitz continuous in x, then Eq. (18) yields:

$$|\varphi_1(t) - \varphi_2(t)| \le \int_\tau^t L\,|\varphi_1(s) - \varphi_2(s)|\,ds$$
Letting $r(t) = |\varphi_1(t) - \varphi_2(t)|$, δ = 0, and k(t) ≡ L, and applying the Gronwall inequality, we now obtain:

$$|\varphi_1(t) - \varphi_2(t)| \le 0 \quad \text{for all } \tau \le t \le \tau + d$$

Hence, it must be true that $\varphi_1(t) = \varphi_2(t)$ on τ ≤ t ≤ τ + d. A similar argument will also work for the interval τ − d ≤ t ≤ τ.

Summarizing, we have the following uniqueness result:

If f ∈ C(D) and if f satisfies a Lipschitz condition on D with Lipschitz constant L, then the initial-value problem (I′) has at most one solution on any interval |t − τ| ≤ d, d > 0.

If the solution φ of (I′) is unique, then the ε-approximate solutions constructed before will tend to φ as $\varepsilon \to 0^+$, and this is the basis for justifying Euler’s method, a numerical method of constructing approximations to φ. Now, if we assume that f satisfies a Lipschitz condition, an alternative classical method of approximation is the method of successive approximations. Specifically, let f ∈ C(D) and let S be the rectangle in D centered at (τ, ξ) shown in Fig. 5 and let c be defined as in Fig. 5. Successive approximations for (I′), or equivalently for (V), are defined as:

$$\varphi_0(t) = \xi, \qquad \varphi_{m+1}(t) = \xi + \int_\tau^t f(s, \varphi_m(s))\,ds, \quad m = 0, 1, 2, \ldots \tag{19}$$

for |t − τ| ≤ c.

The following result is the basis for justifying the method of successive approximations:

If f ∈ C(D) and if f is Lipschitz continuous on S with constant L, then the successive approximations $\varphi_m$, m = 0, 1, 2, . . . , given in Eq. (19) exist on |t − τ| ≤ c, are continuous there, and converge uniformly, as m → ∞, to the unique solution of (I′).

D. Continuity of Solutions with Respect to Parameters

Our next objective is to study the dependence of solutions φ of (I′) on initial data (τ, ξ). In this connection, we find it advantageous to highlight this dependence by writing φ(t) = φ(t, τ, ξ).

Now suppose that f ∈ C(D) and suppose that f satisfies a Lipschitz condition on D with Lipschitz constant L. Furthermore, suppose that φ and ψ solve:

$$x' = f(t, x) \tag{E′}$$

on an interval |t − τ| ≤ d with $\psi(\tau) = \xi_0$ and φ(τ) = ξ. Then, by using (V), we obtain:

$$|\varphi(t, \tau, \xi) - \psi(t, \tau, \xi_0)| \le |\xi - \xi_0| + \int_\tau^t L\,|\varphi(s, \tau, \xi) - \psi(s, \tau, \xi_0)|\,ds$$

and by using the Gronwall inequality with $\delta = |\xi - \xi_0|$, k(s) ≡ L, and r(t) = |φ(t) − ψ(t)|, we obtain the estimate:

$$|\varphi(t, \tau, \xi) - \psi(t, \tau, \xi_0)| \le |\xi - \xi_0| \exp(L|t - \tau|), \qquad |t - \tau| \le d \tag{20}$$

If, in particular, we consider a sequence of initial conditions $\{\xi_m\}$ having the property that $\xi_m \to \xi_0$ as m → ∞, then it follows from Eq. (20) that $\varphi(t, \tau, \xi_m) \to \varphi(t, \tau, \xi_0)$, uniformly in t on |t − τ| ≤ d.

Summarizing, we have the following continuous dependence result:

Let f ∈ C(D) and assume that f satisfies a Lipschitz condition on D. Then, the unique solution φ(t, τ, ξ) of (I′), existing on some bounded interval containing τ, depends continuously on ξ, uniformly in t.

This means that if $\xi_m \to \xi_0$ then $\varphi(t, \tau, \xi_m) \to \varphi(t, \tau, \xi_0)$, uniformly in t on |t − τ| ≤ d for some d > 0.

In a similar manner we can show that φ(t, τ, ξ) will depend continuously on the initial time τ. Furthermore, if the differential equation (E′) depends on a parameter, say µ, then the solutions of the corresponding initial value problem may also depend in a continuous manner on µ, provided that certain safeguards are present. We consider a specific case in the following.

Consider the initial-value problem,

$$x' = f(t, x, \mu), \qquad x(\tau) = \xi_\mu = \mu + 1 \tag{$I_\mu$}$$

where µ is a scalar parameter. Let f satisfy Lipschitz conditions with respect to x and µ for (t, x) ∈ D and for $|\mu - \mu_0| < c$. Using an argument similar to the one employed in connection with Eq. (20), we can show that the solution $\varphi(t, \tau, \xi_\mu, \mu)$ of $(I_\mu)$, where $\xi_\mu$ depends continuously on µ, is a continuous function of µ (i.e., as $\mu \to \mu_0$, $\xi_\mu \to \xi_{\mu_0}$, and $\varphi(t, \tau, \xi_\mu, \mu) \to \varphi(t, \tau, \xi_{\mu_0}, \mu_0)$).

As an example, consider the initial-value problem

$$x' = x + \mu t, \qquad x(\tau) = \xi_\mu = \mu + 1 \tag{21}$$

The right-hand side of Eq. (21) has a Lipschitz constant with respect to x equal to one and with respect to µ equal to |a − b|, where J = (a, b) is assumed to be a bounded t-interval. The solution of $(I_\mu)$ is

$$\varphi(t, \tau, \xi_\mu, \mu) = [\mu(\tau + 2) + 1]\,e^{(t - \tau)} - \mu(t + 1) \tag{22}$$

At t = τ, we have $\varphi(\tau, \tau, \xi_\mu, \mu) = \mu + 1 = x(\tau)$. Now what happens when µ → 0? In this case, Eq. (21) becomes:

$$x' = x, \qquad x(\tau) = 1 \tag{23}$$
Once again we consider the scalar initial-value problem,

$$x' = f(t, x), \qquad x(\tau) = \xi \tag{I′}$$

where f ∈ C(D) and (τ, ξ) ∈ D. For (I′) we define the maximal solution $\varphi_M$ as that noncontinuable solution of (I′) having the property that if φ is any other solution of (I′), then $\varphi_M(t) \ge \varphi(t)$ as long as both solutions are defined. The minimal solution $\varphi_m$ of (I′) is defined in a similar manner. It is not too difficult to prove that $\varphi_M$ and $\varphi_m$ actually exist.

In what follows, we also need the concept of upper right Dini derivative $D^+x$. Given x: (α, β) → R and x ∈ C(α, β) (i.e., x is a continuous real-valued function defined on the interval (α, β)), we define:

$$D^+x(t) = \limsup_{h \to 0^+} \,[x(t + h) - x(t)]/h$$

Suppose that the maximal solution $\varphi_M$ of (I′) stays in D for all t ∈ [τ, T]. If a continuous function ψ(t) with ψ(τ) = ξ satisfies

$$\psi'(t) = D^+\psi(t) \le f(t, \psi(t)) \quad \text{on } D$$

then it is true that

$$\psi(t) \le \varphi_M(t) \quad \text{for all } t \in [\tau, T]$$

A similar result involving minimal solutions can also be established.

The above result can now be applied to systems of equations to obtain estimates for the norms of solutions. We have the following:

Let f ∈ C(D), $D \subset R^{n+1}$, and let φ be a solution of

$$x' = f(t, x), \qquad x(\tau) = \xi \tag{I}$$

Let F(t, v) be a scalar-valued continuous function such that

$$|f(t, x)| \le F(t, |x|) \quad \text{for all } (t, x) \in D$$

where |f(t, x)| denotes any one of the equivalent norms of f(t, x) on $R^n$. If η ≥ |φ(τ)| and if $v_M$ denotes the maximal solution of the scalar comparison equation given by:

$$v' = F(t, v), \qquad v(\tau) = \eta \tag{29}$$

then

$$|\varphi(t)| \le v_M(t)$$

for as long as both functions exist. (Here, |φ(t)| denotes the norm of φ(t).)

As an application of this result, suppose that f(t, x) in (E) is such that

$$|f(t, x)| \le A|x| + B$$

for all t ∈ J and for all $x \in R^n$, where $J = [t_0, T]$, and where A > 0, B > 0 are parameters. Then Eq. (29) assumes the form:

$$v' = Av + B, \qquad v(\tau) = \eta$$

Both in the theory of differential equations and in their applications, linear systems of ordinary differential equations are extremely important. In this section we first present the general properties of linear systems. We then turn our attention to the special cases of linear systems of ordinary differential equations with constant coefficients and linear systems of ordinary differential equations with periodic coefficients. We also address some of the properties of nth-order linear ordinary differential equations.

A. Linear Homogeneous and Nonhomogeneous Systems

We first consider linear homogeneous systems,

$$x' = A(t)x \tag{LH}$$

As noted in Section III, this system possesses unique solutions for every (τ, ξ) ∈ D where x(τ) = ξ,

$$D = \{(t, x) : t \in J = (a, b),\ x \in R^n \ (\text{or } x \in C^n)\}$$

when each element $a_{ij}(t)$ of matrix A(t) is continuous over J. These solutions exist over the entire interval J = (a, b)
and they depend continuously on the initial conditions. In applications it is typical that J = (−∞, ∞). We note that φ(t) ≡ 0, for all t ∈ J, is a solution of (LH), with φ(τ) = 0. This is called the trivial solution of (LH).

In this section we consider matrices and vectors which will be either real or complex valued. In the former case, the field of scalars for the x space is the field of real numbers (F = R) and in the latter case, the field for the x space is the field of complex numbers (F = C).

Now let V denote the set of all solutions of (LH) on J; let $\alpha_1, \alpha_2$ be scalars (i.e., $\alpha_1, \alpha_2 \in F$); and let $\varphi_1, \varphi_2$ be solutions of (LH) (i.e., $\varphi_1, \varphi_2 \in V$). Then it is easily verified that $\alpha_1\varphi_1 + \alpha_2\varphi_2$ will also be a solution of (LH) (i.e., $\alpha_1\varphi_1 + \alpha_2\varphi_2 \in V$). We have thus shown that V is a vector space. Now if we choose n linearly independent vectors $\xi_1, \ldots, \xi_n$ in the n-dimensional x-space, then there exist n solutions $\varphi_1, \ldots, \varphi_n$ of (LH) such that $\varphi_1(\tau) = \xi_1, \ldots, \varphi_n(\tau) = \xi_n$. It is an easy matter to verify that this set of solutions $\{\varphi_1, \ldots, \varphi_n\}$ is linearly independent and that it spans V. Thus, $\{\varphi_1, \ldots, \varphi_n\}$ is a basis of V and any solution φ can be expressed as a linear combination of the vectors $\varphi_1, \ldots, \varphi_n$.

Summarizing, we have

The set of solutions of (LH) on the interval J forms an n-dimensional vector space.

In view of the above result it now makes sense to define a fundamental set of solutions for (LH) as a set of n linearly independent solutions of (LH) on J. If $\{\varphi_1, \ldots, \varphi_n\}$ is such a set, then we can form the matrix:

$$\Phi = [\varphi_1, \ldots, \varphi_n] \tag{30}$$

(Here, det Φ denotes the determinant of Φ and tr A denotes the trace of the matrix A.) This result is known as Abel’s formula.

3. A solution Φ of matrix equation (31) is a fundamental matrix of (LH) if and only if its determinant is nonzero for all t ∈ J. (This result is a direct consequence of Abel’s formula.)

4. If Φ is a fundamental matrix of (LH) and if C is any nonsingular constant n × n matrix, then ΦC is also a fundamental matrix of (LH). Moreover, if Ψ is any other fundamental matrix of (LH), then there exists a constant n × n nonsingular matrix P such that Ψ = ΦP.

In the following, we let $\{e_1, e_2, \ldots, e_n\}$ denote the set of vectors $e_1^T = (1, 0, \ldots, 0)$, $e_2^T = (0, 1, 0, \ldots, 0)$, . . . , $e_n^T = (0, \ldots, 0, 1)$. We call a fundamental matrix Φ of (LH) whose columns are determined by the linearly independent solutions $\varphi_1, \ldots, \varphi_n$ with

$$\varphi_1(\tau) = e_1, \ldots, \varphi_n(\tau) = e_n, \qquad \tau \in J$$

the state transition matrix Φ for (LH). Equivalently, if Ψ is any fundamental matrix of (LH), then the matrix Φ determined by:

$$\Phi(t, \tau) = \Psi(t)\Psi^{-1}(\tau) \quad \text{for all } t, \tau \in J$$

is said to be the state transition matrix of (LH).

We now enumerate several properties of state transition matrices. All of these are direct consequences of the definition of state transition matrix and of the properties of fundamental matrices. In the following we let τ ∈ J, we let φ(τ) = ξ, and we let Φ(t, τ) denote the state transition matrix for (LH) for all t ∈ J. Then,
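relates the “states” of (LH) at t and τ. This motivated the name “state transition matrix.”

Let us now consider a couple of specific examples.

1. For the system of equations

$$x_1' = 5x_1 - 2x_2, \qquad x_2' = 4x_1 - x_2 \tag{33}$$

we have

$$A(t) \equiv A = \begin{bmatrix} 5 & -2 \\ 4 & -1 \end{bmatrix} \quad \text{for all } t \in (-\infty, \infty)$$

Two linearly independent solutions for Eq. (33) are

$$\phi_1(t) = \begin{bmatrix} e^{3t} \\ e^{3t} \end{bmatrix}, \qquad \phi_2(t) = \begin{bmatrix} e^{t} \\ 2e^{t} \end{bmatrix}$$

The matrix

$$\Phi(t) = \begin{bmatrix} e^{3t} & e^{t} \\ e^{3t} & 2e^{t} \end{bmatrix}$$

satisfies the equation Φ′ = AΦ and

$$\det \Phi(t) = e^{4t} \neq 0 \quad \text{for all } t \in (-\infty, \infty)$$

Thus, Φ is a fundamental matrix for Eq. (33). Also, in view of Abel's formula, we obtain:

$$\det \Phi(t) = \det \Phi(\tau) \exp\left[\int_\tau^t \operatorname{tr} A(s)\, ds\right] = e^{4\tau} \exp\left[\int_\tau^t 4\, ds\right] = e^{4t} \quad \text{for all } t \in (-\infty, \infty)$$

as expected. Finally, since

$$\Phi^{-1}(t) = \begin{bmatrix} 2e^{-3t} & -e^{-3t} \\ -e^{-t} & e^{-t} \end{bmatrix}$$

we obtain for the transition matrix of Eq. (33),

$$\Phi(t)\,\Phi^{-1}(\tau) = \begin{bmatrix} 2e^{3(t-\tau)} - e^{t-\tau} & -e^{3(t-\tau)} + e^{t-\tau} \\ 2e^{3(t-\tau)} - 2e^{t-\tau} & -e^{3(t-\tau)} + 2e^{t-\tau} \end{bmatrix}$$

2. For the system,

$$x_1' = x_2, \qquad x_2' = t\,x_2 \tag{34}$$

we have

$$A(t) = \begin{bmatrix} 0 & 1 \\ 0 & t \end{bmatrix} \quad \text{for all } t \in (-\infty, \infty)$$

Two linearly independent solutions of Eq. (34) are

$$\phi_1(t) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \phi_2(t) = \begin{bmatrix} \int_\tau^t e^{\eta^2/2}\, d\eta \\ e^{t^2/2} \end{bmatrix}$$

The matrix

$$\Phi(t) = \begin{bmatrix} 1 & \int_\tau^t e^{\eta^2/2}\, d\eta \\ 0 & e^{t^2/2} \end{bmatrix}$$

satisfies the matrix equation Φ′ = A(t)Φ and

$$\det \Phi(t) = e^{t^2/2} \quad \text{for all } t \in (-\infty, \infty)$$

Therefore, Φ is a fundamental matrix for Eq. (34). Also, in view of Abel's formula, we have

$$\det \Phi(t) = \det \Phi(\tau) \exp\left[\int_\tau^t \operatorname{tr} A(s)\, ds\right] = e^{\tau^2/2} \exp\left[\int_\tau^t \eta\, d\eta\right] = e^{t^2/2} \quad \text{for all } t \in (-\infty, \infty)$$

as expected. Also, since

$$\Phi^{-1}(t) = \begin{bmatrix} 1 & -e^{-t^2/2} \int_\tau^t e^{\eta^2/2}\, d\eta \\ 0 & e^{-t^2/2} \end{bmatrix}$$

we obtain for the state transition matrix of Eq. (34),

$$\Phi(t)\,\Phi^{-1}(\tau) = \begin{bmatrix} 1 & e^{-\tau^2/2} \int_\tau^t e^{\eta^2/2}\, d\eta \\ 0 & e^{(t^2 - \tau^2)/2} \end{bmatrix}$$

Finally, suppose that φ(τ) = ξ = [1, 1]^T. Then

$$\phi(t, \tau, \xi) = \Phi(t, \tau)\,\xi = \begin{bmatrix} 1 + e^{-\tau^2/2} \int_\tau^t e^{\eta^2/2}\, d\eta \\ e^{(t^2 - \tau^2)/2} \end{bmatrix}$$

Next, we consider linear nonhomogeneous systems,

$$x' = A(t)x + g(t) \tag{LN}$$

We assume that A(t) and g(t) are defined and continuous over R = (−∞, ∞) (i.e., each component aij(t) of A(t) and each component gk(t) of g(t) is defined and continuous on R). As noted in Section III, system (LN) has for any τ ∈ R and any ξ ∈ R^n a unique solution satisfying x(τ) = ξ. This solution exists on the entire real line R and is continuous in (t, τ, ξ). Furthermore, if A and g depend continuously on parameters λ ∈ R^l, then the solution will also vary continuously with λ. Indeed, if we differentiate the function

$$\phi(t, \tau, \xi) = \Phi(t, \tau)\,\xi + \int_\tau^t \Phi(t, \eta)\, g(\eta)\, d\eta \tag{35}$$

with respect to t to obtain φ′(t, τ, ξ), and if we substitute φ and φ′ into (LN) (for x), then it is an easy matter to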
verify that Eq. (35) is in fact the unique solution of (LN) with φ(τ, τ, ξ) = ξ.

We note that when ξ = 0, then Eq. (35) reduces to

$$\phi_p(t) = \int_\tau^t \Phi(t, \eta)\, g(\eta)\, d\eta \tag{36}$$

and when ξ ≠ 0 but g(t) ≡ 0, then Eq. (35) reduces to

$$\phi_h(t) = \Phi(t, \tau)\,\xi \tag{37}$$

Thus, the solution of (LN) may be viewed as consisting of a component due to the “forcing term” g(t) and another component due to the initial data ξ. This type of separation is in general possible only in linear systems of differential equations. We call φp the particular solution and φh the homogeneous solution of (LN).

Before proceeding to linear systems with constant coefficients, we introduce adjoint equations. Let Φ be a fundamental matrix for the linear homogeneous system (LH). Then,

$$(\Phi^{-1})' = -\Phi^{-1}\,\Phi'\,\Phi^{-1} = -\Phi^{-1} A(t)$$

Taking the conjugate transpose of both sides, we obtain:

$$(\Phi^{*-1})' = -A^{*}(t)\,\Phi^{*-1}$$

This implies that Φ*^{-1} is a fundamental matrix for the system:

$$y' = -A^{*}(t)\,y, \qquad t \in J \tag{38}$$

We call Eq. (38) the adjoint to (LH), and we call the matrix equation,

$$Y' = -A^{*}(t)\,Y, \qquad t \in J$$

It turns out that a similar result holds for the system of linear equations with constant coefficients,

$$x' = Ax \tag{L}$$

By making use of the Weierstrass M test, it is not difficult to verify the following result:

Let A be a constant n × n matrix which may be real or complex and let S_N(t) denote the partial sum of matrices defined by the formula,

$$S_N(t) = E + \sum_{k=1}^{N} \frac{t^k}{k!} A^k \tag{41}$$

where E denotes the n × n identity matrix and k! stands for k factorial. Then each element of the matrix S_N(t) converges absolutely and uniformly on any finite t interval (−a, a), a > 0, as N → ∞.

This result enables us to define the matrix,

$$e^{At} = E + \sum_{k=1}^{\infty} \frac{t^k}{k!} A^k \tag{42}$$

for any −∞ < t < ∞.

It should be clear that when A(t) ≡ A, system (LH) reduces to system (L). Consequently, the results we established above for (LH) are also applicable to (L). Now by making use of these results, the definition of e^{At} in Eq. (42), and the convergence properties of S_N(t) in Eq. (41), it is not difficult to establish several important properties of e^{At} and of (L). To this end we let J = (−∞, ∞) and τ ∈ J, and we let A be a given constant n × n matrix for (L). Then the following is true:
Next, we consider the “forced” system of equations,

$$x' = Ax + g(t) \tag{44}$$

where g: J → R^n is continuous. Clearly, Eq. (44) is a special case of (LN). In view of Eq. (35) we thus have

$$\phi(t) = e^{A(t-\tau)}\,\xi + e^{At} \int_\tau^t e^{-A\eta}\, g(\eta)\, d\eta \tag{45}$$

$$\hat{C}(s) = \mathcal{L}[C(t)] = [\mathcal{L}\, c_{ij}(t)] = [\hat{c}_{ij}(s)]$$

Now consider the initial value problem,

$$x' = Ax, \qquad x(0) = \xi \tag{47}$$

Taking the Laplace transform of both sides of Eq. (47), we obtain:

$$s\,\hat{x}(s) - \xi = A\,\hat{x}(s)$$

or

$$(sE - A)\,\hat{x}(s) = \xi$$

or

$$\hat{x}(s) = (sE - A)^{-1}\,\xi \tag{48}$$

where E denotes the n × n identity matrix. It can be shown by analytic continuation that (sE − A)^{-1} exists for all s, except at the eigenvalues of A (i.e., except at those values of s where the equation det(sE − A) = 0 is satisfied). Taking the inverse Laplace transform of Eq. (48) (i.e., by reversing the procedure and obtaining, for example, in Eq. (46) f_i(t) from f̂_i(s)) we obtain for the solution of Eq. (47),

$$\phi(t) = \mathcal{L}^{-1}[(sE - A)^{-1}]\,\xi = \Phi(t, 0)\,\xi = e^{At}\,\xi \tag{49}$$

where L^{-1}[f̂(s)] = f(t) denotes the inverse Laplace transform of f̂(s). It follows from Eqs. (49) and (48) that

$$\hat{\Phi}(s) = (sE - A)^{-1}$$

$$\phi(t) = \phi_h(t) + \phi_p(t) = \mathcal{L}^{-1}[(sE - A)^{-1}]\,\xi + \mathcal{L}^{-1}[(sE - A)^{-1}\,\hat{g}(s)] = \Phi(t, 0)\,\xi + \int_0^t \Phi(t - \eta)\, g(\eta)\, d\eta \tag{53}$$

Therefore,

$$\phi_p(t) = \int_0^t \Phi(t - \eta)\, g(\eta)\, d\eta \tag{54}$$

as expected. We call the expression in Eq. (54) the convolution of Φ and g. Clearly, convolution of Φ and g in the time domain corresponds to multiplication of Φ̂ and ĝ in the s domain.

Let us now consider the specific initial-value problem,

$$x_1' = -x_1 + x_2, \quad x_1(0) = -1; \qquad x_2' = -2x_2 + u(t), \quad x_2(0) = 0 \tag{55}$$

where

$$u(t) = \begin{cases} 1 & \text{for } t > 0 \\ 0 & \text{elsewhere} \end{cases}$$
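As a hedged numerical illustration of Eqs. (42) and (45) applied to the initial-value problem (55), the sketch below approximates e^{At} by the partial sums S_N(t) of Eq. (41) and evaluates the variation-of-constants integral with a midpoint rule. The helper names (`mat_exp_series`, `solve_forced`), the truncation order N = 30, and the number of quadrature panels are assumptions made for illustration only. The closed-form expressions used for comparison were worked out by hand for this example and are not quoted from the text.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(X, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [X[0][0] * v[0] + X[0][1] * v[1],
            X[1][0] * v[0] + X[1][1] * v[1]]

def mat_exp_series(A, t, N=30):
    """Partial sum S_N(t) = E + sum_{k=1}^{N} (t^k / k!) A^k of Eq. (41)."""
    S = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, N + 1):
        # term becomes (t^k / k!) A^k, built from the previous term
        term = [[x * t / k for x in row] for row in mat_mul(term, A)]
        S = [[S[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return S

def solve_forced(A, xi, g, t, panels=2000):
    """phi(t) = e^{At} xi + int_0^t e^{A(t-eta)} g(eta) d eta (midpoint rule)."""
    phi = mat_vec(mat_exp_series(A, t), xi)
    h = t / panels
    for j in range(panels):
        eta = (j + 0.5) * h
        w = mat_vec(mat_exp_series(A, t - eta), g(eta))
        phi = [phi[0] + h * w[0], phi[1] + h * w[1]]
    return phi

A = [[-1.0, 1.0], [0.0, -2.0]]      # coefficient matrix of Eq. (55)
xi = [-1.0, 0.0]                    # initial data x(0)
g = lambda eta: [0.0, 1.0]          # forcing (0, u(t)) with u = 1 for t > 0

t = 1.5
numeric = solve_forced(A, xi, g, t)
# Hand-derived closed form for this example (an assumption of this sketch).
exact = [0.5 - 2.0 * math.exp(-t) + 0.5 * math.exp(-2.0 * t),
         0.5 * (1.0 - math.exp(-2.0 * t))]
print(numeric, exact)
```

The series is recomputed at every quadrature node for clarity rather than speed; for larger systems one would normally factor e^{A(t−η)} = e^{At} e^{−Aη} as in Eq. (45) or use a library routine.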
λi pi = Api ). Then the Jordan matrix J = P −1 AP assumes From Eq. (62), it now follows that the solution of Eq. (47)
the form: is given by:
λ1 e J0 (t−τ )
0 e J1 (t−τ )
0
J = . .. .. −1
φ(t) = P . P ξ
0 0
λn e Js (t−τ )
Using the power series representation, Eq. (42), we im- As a specific example of the above procedure of de-
mediately obtain the expression: termining the state transition matrix, consider the initial-
λ1 t value problem:
e
0
.. x1 = −x1 + x2 , x1 (0) = 1
e =
Jt .
0 x2 = −2x2 , x2 (0) = 2
λn t
e In this case, we have
In this case, we have the expression for the solution of −1 1
A=
Eq. (47): 0 −2
eλ1 (t−τ ) with eigenvalues λ1 = −1 and λ2 = −2 and with corre-
sponding eigenvectors,
0
.. −1
φ(t) = P . P ξ 1 −1
P1 = , P2 =
0 0 1
eλn (t−τ )
We thus have
In the general case when A has repeated eigenvalues,
1 −1 1 1
we can no longer diagonalize A and we have to be con- P = [ p1 , p2 ] = , P −1 =
0 1 0 1
tent with the Jordan form given by Eq. (57). In this case,
P = [v1 , . . . , vn ], where the vi denote generalized eigen- and
vectors. Using the power series representation, Eq. (42)
λ1 0 −1 0
and the very special nature of the Jordan blocks (58) and J= =
0 λ2 0 −2
(59), it is not difficult to show that in the case of repeated
eigenvalues we have Furthermore,we obtain:
e J0 t φ1 (t)
= Pe J t P −1 ξ
e J1 t 0 φ2 (t)
..
eJt = . −∞<t <∞ 1 −1 e−t 0 1 1 1
=
0 0 1 0 e −2t
0 1 2
e Js t −t
3e − 2e−2t
where =
2e−2t
e λ1 t
0
..
e J0 t
=
.
C. Linear Systems With Periodic Coefficients
0
e λk t Next, we consider linear homogeneous periodic systems
of the form:
and
x = A(t)x, −∞ < t < ∞ (P)
1 t · · · t ni −1 /(n i − 1)!
0 1 · · · t ni −2 /(n i − 2)! where the elements of A are continuous functions on R
et Ji = eλk+i t . .. .. and where
.. . .
0 0 1 A(t) = A(t + T ) (63)
i = 1, . . . , s for some T > 0 which is a period of A.
The principal result for (P) which we shall present here (called Floquet theory) involves the logarithm of a matrix, the existence of which is not too difficult to establish. We have the following:

Let B be a nonsingular n × n matrix. Then there exists an n × n matrix F, called the logarithm of B, such that

$$e^{F} = B$$

the norm of any solution of (P) tends to zero as t → ∞ at an exponential rate.

Finally, by using Eq. (64) we can write:

$$P(t) = \Phi(t)\, e^{-tR} \tag{67}$$

which in turn can be used to see that AP − P′ = PR. Thus, for the transformation,

$$x = P(t)\,y \tag{68}$$
Φ(t, τ) denotes the state transition matrix of A(t), and φp is a particular solution of Eq. (80), given by:

$$\phi_p(t) = \int_\tau^t \Phi(t, s)\, g(s)\, ds = \Phi(t) \int_\tau^t \Phi^{-1}(s)\, g(s)\, ds$$

We now specialize this result from the n-dimensional system (80) to the corresponding nth-order equation (70) to obtain the following result:

If {φ1, ..., φn} is a fundamental set for the equation L_n y = 0, then the unique solution ψ of the equation L_n y = b(t) satisfying ψ(τ) = ξ1, ..., ψ^{(n−1)}(τ) = ξn is given by:

$$\psi(t) = \psi_h(t) + \psi_p(t) = \psi_h(t) + \sum_{k=1}^{n} \phi_k(t) \int_\tau^t \frac{W_k(\phi_1, \ldots, \phi_n)(s)}{W(\phi_1, \ldots, \phi_n)(s)}\, b(s)\, ds \tag{81}$$

Here, ψh is the solution of L_n y = 0 such that ψ(τ) = ξ1, ψ′(τ) = ξ2, ..., ψ^{(n−1)}(τ) = ξn, and W_k(φ1, ..., φn)(t) is obtained from W(φ1, ..., φn)(t) by replacing the kth column in W(φ1, ..., φn)(t) by (0, ..., 0, 1)^T.

We apply the above result to the second-order differential equation,

$$y'' + (1/t)\,y' - (1/t^2)\,y = b(t), \qquad 0 < t < \infty$$

where b(t) is a real continuous function for all t > 0. From the example involving Eq. (79) we have φ1(t) = t, φ2(t) = 1/t, and W(φ1, φ2)(t) = −2/t, t > 0. Also,

$$W_1(\phi_1, \phi_2)(t) = \det \begin{bmatrix} 0 & 1/t \\ 1 & -1/t^2 \end{bmatrix} = -\frac{1}{t}, \qquad W_2(\phi_1, \phi_2)(t) = \det \begin{bmatrix} t & 0 \\ 1 & 1 \end{bmatrix} = t$$

From Eq. (81) we now have

$$\psi(t) = \psi_h(t) + \psi_p(t) = \psi_h(t) + \frac{t}{2} \int_\tau^t b(s)\, ds - \frac{1}{2t} \int_\tau^t s^2\, b(s)\, ds$$

Next, we consider nth-order ordinary differential equations with constant coefficients given by Eq. (72) which can equivalently be written as L_n y = 0, where

$$L_n = \frac{d^n}{dt^n} + a_{n-1} \frac{d^{n-1}}{dt^{n-1}} + \cdots + a_1 \frac{d}{dt} + a_0$$

We assume that J = (−∞, ∞), we call

$$p(\lambda) = \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0 \tag{82}$$

the characteristic polynomial of the differential equation (72), and we call

$$p(\lambda) = 0 \tag{83}$$

the characteristic equation of Eq. (72). The roots of p(λ) are called the characteristic roots of Eq. (72).

We see that the study of Eq. (72) reduces to the study of the system of first-order ordinary differential equations with constant coefficients given by x′ = Ax, where

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & -a_3 & \cdots & -a_{n-1} \end{bmatrix} \tag{84}$$

The following result, which is proved in a straightforward manner, connects Eq. (72) and x′ = Ax with A given by Eq. (84):

The characteristic polynomial of A in Eq. (84) is precisely the characteristic polynomial p(λ) given by Eq. (82), that is,

$$p(\lambda) = \det(\lambda E_n - A)$$

The next result enumerates a fundamental set for Eq. (72):

Let λ1, ..., λs be the distinct roots of the characteristic equation (83) and suppose that λi has multiplicity mi, i = 1, ..., s, with m1 + ··· + ms = n. Then the following set of functions is a fundamental set for Eq. (72):

$$t^k e^{\lambda_i t}, \qquad k = 0, 1, \ldots, m_i - 1, \quad i = 1, \ldots, s \tag{85}$$

As a specific example, consider:

$$p(\lambda) = (\lambda - 2)(\lambda + 3)^2 (\lambda + i)(\lambda - i)(\lambda - 4)^4 \tag{86}$$

Then, n = 9, and {e^{2t}, e^{−3t}, te^{−3t}, e^{−it}, e^{+it}, e^{4t}, te^{4t}, t^2 e^{4t}, t^3 e^{4t}} is a fundamental set for the differential equation corresponding to the characteristic equation (86).

We conclude this section by considering adjoint equations. Corresponding to the operator L_n given in Eq. (73), we define a second linear operator L_n^+ of order n, which we call the adjoint of L_n, as follows. The domain of L_n^+ is the set of all continuous functions defined on J such that [ā_j(t)y(t)] has j continuous derivatives on J. (Here, ā_j(t) denotes the complex conjugate of a_j(t).) For each function y, define:

$$L_n^{+} y = (-1)^n y^{(n)} + (-1)^{n-1} (\bar{a}_{n-1} y)^{(n-1)} + \cdots + (-1)(\bar{a}_1 y)' + \bar{a}_0 y$$
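The relation p(λ) = det(λE_n − A) between the characteristic polynomial (82) and the companion matrix (84) can be spot-checked numerically. The following is a minimal sketch, assuming a cubic chosen purely for illustration, p(λ) = (λ − 2)(λ + 3)^2 = λ^3 + 4λ^2 − 3λ − 18. The function names are hypothetical, and the determinant uses plain cofactor expansion, which is adequate only for small n.

```python
def companion(a):
    """Companion matrix (84) for p(lam) = lam^n + a[n-1] lam^(n-1) + ... + a[0]."""
    n = len(a)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0            # ones on the superdiagonal
    A[n - 1] = [-c for c in a]       # last row: -a0, -a1, ..., -a_{n-1}
    return A

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def char_poly_via_det(A, lam):
    """Evaluate det(lam * E - A) at a given number lam."""
    n = len(A)
    M = [[(lam if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    return det(M)

def p(a, lam):
    """Evaluate lam^n + a[n-1] lam^(n-1) + ... + a[0] by Horner's rule."""
    value = 1.0
    for c in reversed(a):
        value = value * lam + c
    return value

a = [-18.0, -3.0, 4.0]               # a0, a1, a2 of the illustrative cubic
A = companion(a)
samples = [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
diffs = [abs(char_poly_via_det(A, lam) - p(a, lam)) for lam in samples]
print(max(diffs))
```

With the roots λ = 2 (simple) and λ = −3 (double), Eq. (85) gives the fundamental set {e^{2t}, e^{−3t}, te^{−3t}} for the corresponding third-order equation.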
Since there are no general rules for determining explicit formulas for the solutions of systems of ordinary differential equations (E), the analysis of initial-value problems (I) is accomplished along two lines: (1) a quantitative approach is used which usually involves the numerical solution of such problems by means of simulations on a digital computer, and (2) a qualitative approach is used which is usually concerned with the behavior of families of solutions of a given differential equation and which usually does not seek specific explicit solutions. As mentioned in Section I, we will concern ourselves primarily with qualitative aspects of ordinary differential equations.

The principal results of the qualitative approach include stability properties of an equilibrium point (rest position) and the boundedness of solutions of ordinary differential equations. We shall consider these topics in the present section.

and in case of T-periodic systems,

$$x' = f(t, x), \qquad f(t, x) = f(t + T, x) \tag{P}$$

a point xe ∈ R^n is an equilibrium at some t* if and only if it is an equilibrium point at all times. Also note that if xe is an equilibrium (at t*) of (E), then the transformation s = t − t* reduces (E) to dx/ds = f(s + t*, x) and xe is an equilibrium (at s = 0) of this system. For this reason, we shall henceforth assume that t* = 0 in the above definition and we shall not mention t* further. Note also that if xe is an equilibrium point of (E), then for any t0 ≥ 0, φ(t, t0, xe) = xe for all t ≥ t0 (i.e., xe is a unique solution of (E) with initial data given by φ(t0, t0, xe) = xe).

As a specific example, consider the simple pendulum introduced in Section II, which is described by equations of the form,
Now, in Fig. 9 we depict the behavior of the trajectories in the vicinity of a stable equilibrium for the case x ∈ R^2. When xe = 0 is stable, by choosing the initial points in a sufficiently small spherical neighborhood, we can force the graph of the solution for t ≥ t0 to lie entirely inside a given cylinder.

In the above definition of stability, δ depends on ε and t0 (i.e., δ = δ(ε, t0)). If δ is independent of t0 (i.e., δ = δ(ε)), then the equilibrium x = 0 of (E) is said to be uniformly stable.

The equilibrium x = 0 of (E) is said to be asymptotically stable if (1) it is stable, and (2) for every t0 ≥ 0 there exists an η(t0) > 0 such that lim_{t→∞} φ(t, t0, ξ) = 0 whenever |ξ| < η. Furthermore, the set of all ξ ∈ R^n such that φ(t, t0, ξ) → 0 as t → ∞ for some t0 ≥ 0 is called the domain of attraction of the equilibrium x = 0 of (E). Also, if for (E) condition (2) is true, then the equilibrium x = 0 is said to be attractive.

The equilibrium x = 0 of (E) is said to be uniformly asymptotically stable if (1) it is uniformly stable, and (2) there is a δ0 > 0 such that for every ε > 0 and for any t0 ∈ R+ there exists a T(ε) > 0, independent of t0, such that |φ(t, t0, ξ)| < ε for all t ≥ t0 + T(ε) whenever |ξ| < δ0.

In Fig. 10 we depict property (2), for uniform asymptotic stability, pictorially. By choosing the initial points in a sufficiently small spherical neighborhood at t = t0, we can force the graph of the solution to lie inside a given cylinder for all t > t0 + T(ε). Condition (2) can be rephrased by saying that there exists a δ0 > 0 such that lim_{t→∞} φ(t + t0, t0, ξ) = 0 uniformly in (t0, ξ) for t0 ≥ 0 and for |ξ| ≤ δ0.

In applications, we are frequently interested in the following special case of uniform asymptotic stability: the equilibrium x = 0 of (E) is exponentially stable if there exists an α > 0, and for every ε > 0 there exists a δ(ε) > 0, such that |φ(t, t0, ξ)| ≤ ε e^{−α(t−t0)} for all t ≥ t0 whenever |ξ| < δ(ε) and t0 ≥ 0.

In Fig. 11, the behavior of a solution in the vicinity of an exponentially stable equilibrium x = 0 is shown.

The equilibrium x = 0 of (E) is said to be unstable if it is not stable. In this case, there exists a t0 ≥ 0, ε > 0, a sequence ξm → 0 of initial points, and a sequence {tm} such that |φ(t0 + tm, t0, ξm)| ≥ ε for all m, tm ≥ 0.

If x = 0 is an unstable equilibrium of (E), it still can happen that all the solutions tend to zero with increasing t. Thus, instability and attractivity are compatible concepts. Note that the equilibrium x = 0 is necessarily unstable if every neighborhood of the origin contains initial points corresponding to unbounded solutions (i.e., solutions whose norm |φ(t, t0, ξ)| grows to infinity on a sequence tm → ∞). However, it can happen that a system (E) with unstable equilibrium x = 0 may have only bounded solutions.
x = 0 is the only equilibrium of Eq. (96). This equilibrium is unstable.

where |Φ(t, t0)| denotes the matrix norm induced by the vector norm used on R^n and sup denotes supremum.
We shall find it convenient to use the following convention, which has become standard in the literature: A real n × n matrix A is called stable or a Hurwitz matrix if all of its eigenvalues have negative real parts. If at least one of the eigenvalues has a positive real part, then A is called unstable. A matrix A which is neither stable nor unstable is called critical and the eigenvalues of A with zero real parts are called critical eigenvalues.

Thus, the equilibrium x = 0 of (L) is asymptotically stable if and only if A is stable. If A is unstable, then x = 0 is unstable. If A is critical, then the equilibrium is stable if the eigenvalues with zero real parts correspond to a simple zero of the characteristic polynomial of A; otherwise, the equilibrium may be unstable.

Next, we consider the stability properties of linear periodic systems,

$$x' = A(t)x, \qquad A(t) = A(t + T) \tag{PL}$$

where A(t) is a continuous matrix for all t ∈ R. For such systems, the following results follow directly from Floquet theory:

1. λ1, λ2 are real and λ1 < 0, λ2 < 0: x = 0 is an asymptotically stable equilibrium point called a stable node.
2. λ1, λ2 are real and λ1 > 0, λ2 > 0: x = 0 is an unstable equilibrium point called an unstable node.
3. λ1, λ2 are real and λ1λ2 < 0: x = 0 is an unstable equilibrium point called a saddle.
4. λ1, λ2 are complex conjugates and Re λ1 = Re λ2 < 0: x = 0 is an asymptotically stable equilibrium point called a stable focus.
5. λ1, λ2 are complex conjugates and Re λ1 = Re λ2 > 0: x = 0 is an unstable equilibrium point called an unstable focus.
6. λ1, λ2 are complex conjugates and Re λ1 = Re λ2 = 0: x = 0 is a stable equilibrium called a center.

Using the results of Section IV, it is possible to solve Eq. (99) explicitly and verify that the qualitative behavior of the trajectories near the equilibrium x = 0 is as shown in Figs. 12–14 for the cases of a stable node, unstable node,
FIGURE 13 Trajectories near an unstable node.

FIGURE 15 Trajectories near a stable node (repeated eigenvalue case).
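The six cases enumerated above for the planar system depend only on the eigenvalues λ1, λ2 of the 2 × 2 coefficient matrix. A minimal sketch of that classification follows. The function name `classify_equilibrium`, the tolerance, and the sample matrices are assumptions made for illustration, and a zero eigenvalue is reported as not covered rather than classified.

```python
import cmath

def classify_equilibrium(a11, a12, a21, a22, tol=1e-12):
    """Classify x = 0 of x' = Ax (A real, 2x2) from the eigenvalues of A."""
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    # Eigenvalues are the roots of lam^2 - tr * lam + det = 0.
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    lam1 = (tr + disc) / 2.0
    lam2 = (tr - disc) / 2.0
    if abs(lam1.imag) > tol:                 # complex-conjugate pair
        if lam1.real < -tol:
            return "stable focus"
        if lam1.real > tol:
            return "unstable focus"
        return "center"
    r1, r2 = lam1.real, lam2.real            # both eigenvalues real
    if abs(r1) <= tol or abs(r2) <= tol:
        return "not covered (zero eigenvalue)"
    if r1 < 0 and r2 < 0:
        return "stable node"
    if r1 > 0 and r2 > 0:
        return "unstable node"
    return "saddle"

examples = {
    "eigenvalues -1, -2": (-1.0, 1.0, 0.0, -2.0),
    "eigenvalues 3, 1": (5.0, -2.0, 4.0, -1.0),
    "eigenvalues 1, -1": (0.0, 1.0, 1.0, 0.0),
    "eigenvalues -1 +/- i": (-1.0, -1.0, 1.0, -1.0),
    "eigenvalues +/- i": (0.0, 1.0, -1.0, 0.0),
}
for label, entries in examples.items():
    print(label, "->", classify_equilibrium(*entries))
```

The repeated-eigenvalue case of Fig. 15 falls under the real branches here, since a double real eigenvalue gives a zero discriminant.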
involve the existence of real-valued functions v: D → R. In the case of local results (e.g., stability, instability, asymptotic stability, and exponential stability results), we shall usually only require that D = B(h) ⊂ R^n for some h > 0, or D = R+ × B(h). (Recall that R+ = [0, ∞) and B(h) = {x ∈ R^n: |x| < h} where |x| denotes any one of the equivalent norms of x on R^n.) On the other hand, in the case of global results (e.g., asymptotic stability in the large, exponential stability in the large, and uniform boundedness of solutions), we have to assume that D = R^n or

The above observations motivate the following: let v: R+ × R^n → R (resp., v: R+ × B(h) → R) be continuously differentiable with respect to all of its arguments and let ∇v denote the gradient of v with respect to x. Then v′(E): R+ × R^n → R (resp., v′(E): R+ × B(h) → R) is defined by:
Of special interest are functions v: R^n → R that are quadratic forms given by:

$$v(x) = x^T B x = \sum_{i,k=1}^{n} b_{ik}\, x_i x_k \tag{105}$$

where B = [bij] is a real symmetric n × n matrix (i.e., B^T = B). Since B is symmetric, it is diagonalizable and all of its eigenvalues are real. For Eq. (105) one can prove the following:

1. v is positive definite (and radially unbounded) if and only if all leading principal minors of B are positive; that is, if and only if

$$\det \begin{bmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{k1} & \cdots & b_{kk} \end{bmatrix} > 0, \qquad k = 1, \ldots, n$$

2. v is negative definite if and only if

$$(-1)^k \det \begin{bmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{k1} & \cdots & b_{kk} \end{bmatrix} > 0, \qquad k = 1, \ldots, n$$

3. v is definite (i.e., either positive definite or negative definite) if and only if all eigenvalues are nonzero and have the same sign.
4. v is semidefinite (i.e., either positive semidefinite or negative semidefinite) if and only if the nonzero eigenvalues of B have the same sign.
5. If λm and λM denote the smallest and largest eigenvalues of B and if |x| denotes the Euclidean norm of x, then λm|x|^2 ≤ v(x) ≤ λM|x|^2 for all x ∈ R^n. (The Euclidean norm of x is defined as (x^T x)^{1/2} = (x1^2 + ··· + xn^2)^{1/2}.)
6. v is indefinite (i.e., in every neighborhood of the origin x = 0, v assumes positive and negative values) if and only if B possesses both positive and negative eigenvalues.

Quadratic forms (105) have some interesting geometric properties. To see this, let n = 2, and assume that both eigenvalues of B are positive so that v is positive definite and radially unbounded. In R^3, let us now consider the surface determined by:

$$z = v(x) = x^T B x \tag{106}$$

This equation describes a cup-shaped surface as depicted in Fig. 19. Note that corresponding to every point on this cup-shaped surface there exists one and only one point in the x1x2 plane. Note also that the loci defined by Ci = {x ∈ R^2: v(x) = ci ≥ 0}, ci = const, determine closed curves in the x1x2 plane as shown in Fig. 20. We call these curves level curves. Note that C0 = {0} corresponds to the case in which z = c0 = 0. Note also that this function v can be used to cover the entire R^2 plane with closed curves by selecting for z all values in R+.

In the case when v = x^T Bx is a positive definite quadratic form with x ∈ R^n, the preceding comments are still true; however, in this case, the closed curves Ci must be replaced by closed hypersurfaces in R^n and a simple geometric visualization as in Figs. 19 and 20 is no longer possible.

FIGURE 20 Level curves determined by a quadratic form.
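Properties 1 and 2 of the quadratic form (105) give a direct computational test for definiteness. The sketch below, assuming a small symmetric matrix stored as nested lists, evaluates the leading principal minors and reports positive or negative definiteness. The function names and example matrices are illustrative assumptions, and cofactor expansion is used only because n is small.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def leading_minors(B):
    """Determinants of the upper-left k x k blocks of B, k = 1, ..., n."""
    return [det([row[:k] for row in B[:k]]) for k in range(1, len(B) + 1)]

def definiteness(B):
    """Apply the sign tests of properties 1 and 2 to the leading minors."""
    minors = leading_minors(B)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"

def quad_form(B, x):
    """v(x) = x^T B x = sum over i, k of b_ik x_i x_k, as in Eq. (105)."""
    n = len(B)
    return sum(B[i][k] * x[i] * x[k] for i in range(n) for k in range(n))

B_pos = [[2.0, -1.0], [-1.0, 2.0]]   # eigenvalues 1 and 3
B_neg = [[-2.0, 1.0], [1.0, -3.0]]
B_ind = [[1.0, 2.0], [2.0, 1.0]]     # eigenvalues 3 and -1

for B in (B_pos, B_neg, B_ind):
    print(leading_minors(B), definiteness(B))

x = [1.0, -2.0]
norm_sq = sum(c * c for c in x)
print(1.0 * norm_sq, quad_form(B_pos, x), 3.0 * norm_sq)  # bounds of property 5
```

The "not definite" outcome does not distinguish semidefinite from indefinite forms; properties 3, 4, and 6 would require the eigenvalues of B for that.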
curves v(x) = c from the exterior toward the interior when we proceed along these trajectories in the direction of increasing values of t. Then, we can conclude that these trajectories approach the origin as t increases (i.e., the equilibrium x = 0 is in this case asymptotically stable).

In terms of the given v function, we have the following interpretation. For a given solution φ(t, t0, x0) to cross the curve v(x) = r, r = v(x0), the angle between the outward normal vector ∇v(x0) and the derivative of φ(t, t0, x0) at t = t0 must be greater than π/2; that is,

$$v'_{(107)}(x_0) = \nabla v(x_0)\, f(x_0) < 0$$

For this to happen at all points, we must have v′(107)(x) < 0 for 0 < |x| ≤ r1. The same results can be arrived at from an analytic point of view. The function V(t) = v(φ(t, t0, x0)) decreases monotonically as t increases. This implies that the derivative v′(φ(t, t0, x0)) along the solution φ(t, t0, x0) must be negative definite in B(r) for r > 0 sufficiently small.

Next, let us assume that Eq. (107) has only one equilibrium (at x = 0) and that v is positive definite and radially unbounded. It turns out that in this case, the relation v(x) = c, c ∈ R+, can be used to cover all of R^2 by closed curves of the type shown in Fig. 21. If for arbitrary (t0, x0), the corresponding solution of Eq. (107), φ(t, t0, x0), behaves as already discussed, then it follows that the derivative of v along this solution, v′(φ(t, t0, x0)), will be negative definite in R^2.

Since the foregoing discussion was given in terms of an arbitrary solution of Eq. (107), we may suspect that the following results are true:

asymptotically stable in the large.

FIGURE 22 Instability of an equilibrium point.

Continuing our discussion by making reference to Fig. 22, let us assume that we can find for Eq. (107) a continuously differentiable function v: R^2 → R which is indefinite and has the properties discussed below. Since v is indefinite, there exist in each neighborhood of the origin points for which v > 0, v < 0, and v(0) = 0. Confining our attention to B(k), where k > 0 is sufficiently small, we let D = {x ∈ B(k): v(x) < 0}. (D may consist of several subdomains.) The boundary of D, ∂D, as shown in Fig. 22, consists of points in ∂B(k) and of points determined by v(x) = 0. Assume that in the interior of D, v is bounded. Suppose v′(107)(x) is negative definite in D
and that x(t) is a trajectory of Eq. (107) which originates somewhere on the boundary of D (x(t0) ∈ ∂D) with v(x(t0)) = 0. Then, this trajectory will penetrate the boundary of D at points where v = 0 as t increases and it can never again reach a point where v = 0. In fact, as t increases, this trajectory will penetrate the set of points determined by |x| = k (since, by assumption, v′(107) < 0 along this trajectory and v < 0 in D). But, this indicates that the equilibrium x = 0 of Eq. (107) is unstable.

We are once more led to a conjecture:

3. Let a function v: R^2 → R be given which is continuously differentiable and which has the following properties: (a) There exist points x arbitrarily close to the origin such that v(x) < 0; they form the domain D bounded by the set of points determined by v = 0 and the disk |x| = k. (b) In the interior of D, v is bounded. (c) In the interior of D, v′(107) is negative. Then, the equilibrium x = 0 of Eq. (107) is unstable.

H. Principal Lyapunov Stability and Instability Theorems

It turns out that results of the type given above for Eq. (107) are true for general systems (E). These results, which are proved by standard δ–ε arguments, comprise the direct method of Lyapunov, which is also sometimes called the second method of Lyapunov. The reason for this nomenclature is clear: Results of the kind presented here allow us to make qualitative statements about whole families of solutions of (E), without actually solving this equation.

In the following, we enumerate some of the more important results of the direct method. We shall assume that v: R+ × B(h) → R (resp., v: R+ × R^n → R).

1. If there exists a continuously differentiable positive definite function v with a negative semidefinite (or identically zero) derivative v′(E), then the equilibrium x = 0 of (E) is stable.

As an example, consider the system given by:

$$x_1' = x_2, \qquad x_2' = -x_2 - e^{-t} x_1 \tag{108}$$

which has an equilibrium at (x1, x2)^T = (0, 0)^T. For Eq. (108), choose the positive definite function v(t, x1, x2) = x1^2 + e^t x2^2. We obtain v′(108)(t, x1, x2) = −e^t x2^2, which is negative semidefinite. We conclude that the equilibrium x = 0 of Eq. (108) is stable.

2. If there exists a continuously differentiable, positive definite, decrescent function v with negative semidefinite derivative v′(E), then the equilibrium x = 0 of (E) is uniformly stable.

As an example, consider the simple pendulum,

$$x_1' = x_2, \qquad x_2' = -k \sin x_1 \tag{109}$$

where k > 0 is a constant. As noted earlier, Eq. (109) has an isolated equilibrium at x = 0. Choose

$$v(x_1, x_2) = \tfrac{1}{2} x_2^2 + k \int_0^{x_1} \sin \eta \, d\eta$$

which is continuously differentiable and positive definite. Also, since v does not depend on t, it will automatically be decrescent. Furthermore, v′(109)(x1, x2) = (k sin x1)x1′ + x2x2′ = (k sin x1)x2 + x2(−k sin x1) = 0. Therefore, the equilibrium x = 0 of Eq. (109) is uniformly stable.

3. If there exists a continuously differentiable, positive definite, decrescent function v with a negative definite derivative v′(E), then the equilibrium x = 0 of (E) is uniformly asymptotically stable.

For an example, the system,

$$x_1' = (x_1 - c_2 x_2)(x_1^2 + x_2^2 - 1), \qquad x_2' = (c_1 x_1 + x_2)(x_1^2 + x_2^2 - 1) \tag{110}$$

has an isolated equilibrium at the origin x = 0. Choosing v(x) = c1x1^2 + c2x2^2, we obtain v′(110)(x) = 2(c1x1^2 + c2x2^2)(x1^2 + x2^2 − 1). If c1 > 0, c2 > 0, then v is positive definite (and decrescent) and v′(110) is negative definite in the domain x1^2 + x2^2 < 1. Therefore, the equilibrium x = 0 of Eq. (110) is uniformly asymptotically stable.

4. If there exists a continuously differentiable, positive definite, decrescent, and radially unbounded function v such that v′(E) is negative definite for all (t, x) ∈ R+ × R^n, then the equilibrium x = 0 of (E) is uniformly asymptotically stable in the large.

As an example, consider the system,

$$x_1' = x_2 + c\,x_1 (x_1^2 + x_2^2), \qquad x_2' = -x_1 + c\,x_2 (x_1^2 + x_2^2) \tag{111}$$

where c is a real constant. Note that x = 0 is the only equilibrium. Choosing the positive definite, decrescent, and radially unbounded function v(x) = x1^2 + x2^2, we obtain v′(111)(x) = 2c(x1^2 + x2^2)^2. We conclude that if c = 0, then x = 0 of Eq. (111) is uniformly stable and if c < 0, then x = 0 of Eq. (111) is uniformly asymptotically stable in the large.

5. If there exists a continuously differentiable function v and three positive constants c1, c2, and c3 such that

$$c_1 |x|^2 \le v(t, x) \le c_2 |x|^2, \qquad v'_{(E)}(t, x) \le -c_3 |x|^2$$

for all t ∈ R+ and for all x ∈ B(r) for some r > 0, then the equilibrium x = 0 of (E) is exponentially stable.

6. If there exist a continuously differentiable function v and three positive constants c1, c2, and c3 such that
\[ c_1 |x|^2 \le v(t, x) \le c_2 |x|^2, \qquad v'_{(E)}(t, x) \le -c_3 |x|^2 \]
for all $t \in R^+$ and for all $x \in R^n$, then the equilibrium $x = 0$ of (E) is exponentially stable in the large.

As an example, consider the system,
\[ x_1' = -a(t) x_1 - b x_2, \qquad x_2' = b x_1 - c(t) x_2 \tag{112} \]
where $b$ is a real constant and where $a$ and $c$ are real and continuous functions defined for $t \ge 0$ satisfying $a(t) \ge \delta > 0$ and $c(t) \ge \delta > 0$ for all $t \ge 0$. We assume that $x = 0$ is the only equilibrium for Eq. (112). If we choose $v(x) = \frac{1}{2}(x_1^2 + x_2^2)$, then $v'_{(112)}(t, x) = -a(t) x_1^2 - c(t) x_2^2 \le -\delta (x_1^2 + x_2^2)$. Hence, the equilibrium $x = 0$ of Eq. (112) is exponentially stable in the large.

7. If there exists a continuously differentiable function $v$ defined on $|x| \ge R$ (where $R$ may be large) and $0 \le t < \infty$, and if there exist $\psi_1, \psi_2 \in KR$ such that $\psi_1(|x|) \le v(t, x) \le \psi_2(|x|)$, $v'_{(E)}(t, x) \le 0$ for all $|x| \ge R$ and for all $0 \le t < \infty$, then the solutions of (E) are uniformly bounded.

8. If there exists a continuously differentiable function $v$ defined on $|x| \ge R$ (where $R$ may be large) and $0 \le t < \infty$, and if there exist $\psi_1, \psi_2 \in KR$ and $\psi_3 \in K$ such that $\psi_1(|x|) \le v(t, x) \le \psi_2(|x|)$, $v'_{(E)}(t, x) \le -\psi_3(|x|)$ for all $|x| \ge R$ and $0 \le t < \infty$, then the solutions of (E) are uniformly ultimately bounded.

As an example, consider the system,
\[ x' = -x - \sigma, \qquad \sigma' = -\sigma - f(\sigma) + x \tag{113} \]
where $f(\sigma) = \sigma(\sigma^2 - 6)$. There are isolated equilibrium points at $x = \sigma = 0$, $x = -\sigma = 2$, and $x = -\sigma = -2$. Choosing the radially unbounded and decrescent function $v(x, \sigma) = \frac{1}{2}(x^2 + \sigma^2)$, we obtain
\[ v'_{(113)}(x, \sigma) = -x^2 - \sigma^2(\sigma^2 - 5) \le -x^2 - \left(\sigma^2 - \tfrac{5}{2}\right)^2 + \tfrac{25}{4}. \]
Also, $v'_{(113)}$ is negative for all $(x, \sigma)$ such that $x^2 + \sigma^2 > R^2$, where, for example, $R = 10$ will do. Therefore, all solutions of Eq. (113) are uniformly bounded and, in fact, uniformly ultimately bounded.

9. The equilibrium $x = 0$ of (E) is unstable (at $t = t_0 \ge 0$) if there exists a continuously differentiable, decrescent function $v$ such that $v'_{(E)}$ is positive definite (negative definite) and if in every neighborhood of the origin there are points $x$ such that $v(t_0, x) > 0$ ($v(t_0, x) < 0$).

Reconsider system (111), this time assuming that $c > 0$. If we choose $v(x) = x_1^2 + x_2^2$, then $v'_{(111)}(x) = 2c(x_1^2 + x_2^2)^2$ and we can conclude from the above result that the equilibrium $x = 0$ of (E) is unstable.

10. Let there exist a bounded and continuously differentiable function $v: D \to R$, $D = \{(t, x): t \ge t_0, x \in B(h)\}$, with the following properties: (a) $v'_{(E)}(t, x) = \lambda v(t, x) + w(t, x)$, where $\lambda > 0$ is a constant and $w(t, x)$ is either identically zero or positive semidefinite; (b) in the set $D_1 = \{(t, x): t = t_1, x \in B(h_1)\}$ for fixed $t_1 \ge t_0$ and with arbitrarily small $h_1$, there exist values $x$ such that $v(t_1, x) > 0$. Then the equilibrium $x = 0$ of (E) is unstable.

As a specific example, consider:
\[ x_1' = x_1 + x_2 + x_1 x_2^4, \qquad x_2' = x_1 + x_2 - x_1^2 x_2 \tag{114} \]
which has an isolated equilibrium $x = 0$. Choosing $v(x) = (x_1^2 - x_2^2)/2$, we obtain $v'_{(114)}(x) = \lambda v(x) + w(x)$, where $w(x) = x_1^2 x_2^4 + x_1^2 x_2^2$ and $\lambda = 2$. It follows from the above result that the equilibrium $x = 0$ of Eq. (114) is unstable.

11. Let there exist a continuously differentiable function $v$ having the following properties: (a) For every $\varepsilon > 0$ and for every $t \ge 0$, there exist points $\bar{x} \in B(\varepsilon)$ such that $v(t, \bar{x}) < 0$. We call the set of all points $(t, x)$ such that $x \in B(h)$ and such that $v(t, x) < 0$ the “domain $v < 0$.” It is bounded by the hypersurfaces which are determined by $|x| = h$ and $v(t, x) = 0$ and it may consist of several component domains. (b) In at least one of the component domains $D$ of the domain $v < 0$, $v$ is bounded from below and $0 \in \partial D$ for all $t \ge 0$. (c) In the domain $D$, $v'_{(E)} \le -\psi(|v|)$, where $\psi \in K$. Then, the equilibrium $x = 0$ of (E) is unstable.

As an example, consider the system,
\[ x_1' = x_1 + x_2, \qquad x_2' = x_1 - x_2 + x_1 x_2 \tag{115} \]
which has an isolated equilibrium at the origin $x = 0$. Choosing $v(x) = -x_1 x_2$, we obtain $v'_{(115)}(x) = -x_1^2 - x_2^2 - x_1^2 x_2$. Let $D = \{x \in R^2: x_1 > 0, x_2 > 0, \text{ and } x_1^2 + x_2^2 < 1\}$. Then, for all $x \in D$, $v < 0$ and $v'_{(115)} < 2v$. We see that the above result is applicable and conclude that the equilibrium $x = 0$ of Eq. (115) is unstable.

The results given in items 1–11 are also true when $v$ is continuous (rather than continuously differentiable). In this case, $v'_{(E)}$ must be interpreted in the sense of Eq. (104).

For the case of systems (A),
\[ x' = f(x) \tag{A} \]
it is sometimes possible to relax the conditions on $v'_{(A)}$ when investigating the asymptotic stability of the equilibrium $x = 0$, by insisting that $v'_{(A)}$ be only negative semidefinite. In doing so, we require the following concept: A set of points in $R^n$ is invariant (with respect to (A)) if
If $f$ and $f_x = \partial f/\partial x$ are continuous on the set $(R^+ \times B(r))$ for some $r > 0$, and if the equilibrium $x = 0$ of (E) is uniformly asymptotically stable, then there exists a Lyapunov function $v$ which is continuously differentiable on $(R^+ \times B(r_1))$ for some $r_1 > 0$ such that $v$ is positive definite and decrescent and such that $v'_{(E)}$ is negative definite.

L. Comparison Theorems

Next, we consider once more some comparison results for (E), as was done in Section III. We shall assume that $f: R^+ \times B(r) \to R^n$ for some $r > 0$ and that $f$ is continuous there. We begin by considering a scalar ordinary differential equation of the form,
\[ y' = G(t, y) \tag{$\tilde{C}$} \]
where $y \in R$, $t \in R^+$, and $G: R^+ \times [0, r) \to R$ for some $r > 0$. Assume that $G$ is continuous on $R^+ \times [0, r)$ and that $G(t, 0) = 0$ for all $t$. Also, assume that $y = 0$ is an isolated equilibrium for ($\tilde{C}$).

The following results are the basis of the comparison principle in the stability analysis of the isolated equilibrium $x = 0$ of (E):

Let $f$ and $G$ be continuous on their respective domains of definition. Let $v: R^+ \times B(r) \to R$ be a continuously differentiable, positive-definite function such that
\[ v'_{(E)}(t, x) \le G(t, v(t, x)) \tag{119} \]
Then, the following statements are true: (1) If the trivial solution of ($\tilde{C}$) is stable, then the trivial solution of system (E) is stable. (2) If $v$ is decrescent and if the trivial solution of ($\tilde{C}$) is uniformly stable, then the trivial solution of (E) is uniformly stable. (3) If $v$ is decrescent and if the trivial solution of ($\tilde{C}$) is uniformly asymptotically stable, then the trivial solution of (E) is uniformly asymptotically stable. (4) If there are constants $a > 0$ and $b > 0$ such that $a|x|^b \le v(t, x)$, if $v$ is decrescent, and if the trivial solution of ($\tilde{C}$) is exponentially stable, then the trivial solution of (E) is exponentially stable. (5) If $f: R^+ \times R^n \to R^n$, $G: R^+ \times R \to R$, $v: R^+ \times R^n \to R$ is decrescent and radially unbounded, if Eq. (119) holds for all $t \in R^+$, $x \in R^n$, and if the solutions of ($\tilde{C}$) are uniformly bounded (uniformly ultimately bounded), then the solutions of (E) are also uniformly bounded (uniformly ultimately bounded).

The above results enable us to analyze the stability and boundedness properties of an $n$-dimensional system (E), which may be complex, in terms of the corresponding properties of a one-dimensional comparison system ($\tilde{C}$), which may be quite a bit simpler. The generality and effectiveness of the above results can be improved and extended by considering vector-valued comparison equations and vector Lyapunov functions.

M. Lyapunov’s First Method

We close this section by answering the following question: Under what conditions does it make sense to linearize a nonlinear system about an equilibrium $x = 0$ and then deduce the properties of $x = 0$ from the corresponding linear system? This is known as Lyapunov’s first method or Lyapunov’s indirect method.

We consider systems of $n$ real nonlinear first-order ordinary differential equations of the form:
\[ x' = Ax + F(t, x) \tag{PE} \]
where $F: R^+ \times B(h) \to R^n$ for some $h > 0$ and $A$ is a real $n \times n$ matrix. Here, we assume that $Ax$ constitutes the linear part of the right-hand side of (PE) and $F(t, x)$ represents the remaining terms which are of order higher than one in the various components of $x$. Such systems may arise in the process of linearizing nonlinear equations of the form:
\[ x' = g(t, x) \tag{G} \]
or they may arise in some other fashion during the modeling process of a physical system.

To be more specific, let $g: R \times D \to R^n$ where $D$ is some domain in $R^n$. If $g$ is continuously differentiable on $R \times D$ and if $\varphi$ is a given solution of (G) defined for all $t \ge t_0 \ge 0$, then we can linearize (G) about $\varphi$ in the following manner. Define $y = x - \varphi(t)$ so that
\[ \begin{aligned} y' &= g(t, x) - g(t, \varphi(t)) \\ &= g(t, y + \varphi(t)) - g(t, \varphi(t)) \\ &= (\partial g/\partial x)(t, \varphi(t))\, y + G(t, y) \end{aligned} \]
Here,
\[ G(t, y) = [g(t, y + \varphi(t)) - g(t, \varphi(t))] - (\partial g/\partial x)(t, \varphi(t))\, y \]
is $o(|y|)$ as $|y| \to 0$ uniformly in $t$ on compact subsets of $[t_0, \infty)$.

Of special interest is the case when $g$ is independent of $t$ (i.e., when $g(t, x) \equiv g(x)$) and $\varphi(t) = \xi_0$ is a constant (equilibrium point). Under these conditions we have
\[ y' = Ay + G(y) \]
where $A = (\partial g/\partial x)(x)|_{x = \xi_0}$, and $(\partial g/\partial x)(x)$ denotes the Jacobian of $g(x)$.

By making use of the result for the Lyapunov function (117), we can readily prove the following results:

1. Let $A$ be a real, constant, and stable $n \times n$ matrix and let $F: R^+ \times B(h) \to R^n$ be continuous in $(t, x)$ and satisfy $F(t, x) = o(|x|)$ as $|x| \to 0$, uniformly in $t \in R^+$.
Then, the trivial solution of (PE) is uniformly asymptotically stable.

As a specific example, consider the Liénard equation given by:
\[ x'' + f(x) x' + x = 0 \tag{120} \]
where $f: R \to R$ is a continuous function with $f(0) > 0$. We can rewrite Eq. (120) as:
\[ x_1' = x_2, \qquad x_2' = -x_1 - f(0) x_2 + [f(0) - f(x_1)] x_2 \]
and we can apply the above result with $x = (x_1, x_2)^T$,
\[ A = \begin{pmatrix} 0 & 1 \\ -1 & -f(0) \end{pmatrix}, \qquad F(t, x) = \begin{pmatrix} 0 \\ [f(0) - f(x_1)] x_2 \end{pmatrix} \]
Noting that $A$ is a stable matrix and that $F(t, x) = o(|x|)$ as $|x| \to 0$, uniformly in $t \in R^+$, we conclude that the trivial solution $(x, x') = (0, 0)$ of Eq. (120) is uniformly asymptotically stable.

2. Assume that $A$ is a real $n \times n$ matrix with no eigenvalues on the imaginary axis and that at least one eigenvalue of $A$ has positive real part. If $F: R^+ \times B(h) \to R^n$ is continuous and satisfies $F(t, x) = o(|x|)$ as $|x| \to 0$, uniformly in $t \in R^+$, then the trivial solution of (PE) is unstable.

As a specific example, consider the simple pendulum,
\[ x'' + k \sin x = 0, \qquad k > 0 \tag{121} \]
Note that $x_e = \pi$, $x_e' = 0$ is an equilibrium of Eq. (121). Let $y = x - x_e$ so that
\[ y'' + k \sin(y + \pi) = y'' - ky + k(\sin(y + \pi) + y) = 0 \]
This equation can be put into the form (PE) with
\[ A = \begin{pmatrix} 0 & 1 \\ k & 0 \end{pmatrix}, \qquad F(t, x) = \begin{pmatrix} 0 \\ -k(\sin(y + \pi) + y) \end{pmatrix} \]
Applying the above result we conclude that the equilibrium point $(\pi, 0)$ is unstable.

SEE ALSO THE FOLLOWING ARTICLES

ARTIFICIAL NEURAL NETWORKS • CALCULUS • COMPLEX ANALYSIS • DIFFERENTIAL EQUATIONS, PARTIAL • FUNCTIONAL ANALYSIS • MEASURE AND INTEGRATION

BIBLIOGRAPHY

Antsaklis, P. J., and Michel, A. N. (1997). “Linear Systems,” McGraw-Hill, New York.
Boyce, W. E., and DiPrima, R. C. (1997). “Elementary Differential Equations and Boundary Value Problems,” John Wiley & Sons, New York.
Brauer, F., and Nohel, J. A. (1969). “Qualitative Theory of Ordinary Differential Equations,” Benjamin, New York.
Carpenter, G. A., Cohen, M., and Grossberg, S. (1987). “Computing with neural networks,” Science 235, 1226–1227.
Coddington, E. A., and Levinson, N. (1955). “Theory of Ordinary Differential Equations,” McGraw-Hill, New York.
Hale, J. K. (1969). “Ordinary Differential Equations,” Wiley, New York.
Halmos, P. R. (1958). “Finite Dimensional Vector Spaces,” Van Nostrand, Princeton, NJ.
Hille, E. (1969). “Lectures on Ordinary Differential Equations,” Addison-Wesley, Reading, MA.
Hoffman, K., and Kunze, R. (1971). “Linear Algebra,” Prentice-Hall, Englewood Cliffs, NJ.
Hopfield, J. J. (1984). “Neurons with graded response have collective computational properties like those of two-state neurons,” Proc. Nat. Acad. Sci. U.S.A. 81, 3088–3092.
Kantorovich, L. V., and Akilov, G. P. (1964). “Functional Analysis in Normed Spaces,” Macmillan, New York.
Michel, A. N. (1983). “On the status of stability of interconnected systems,” IEEE Trans. Automat. Control 28(6), 639–653.
Michel, A. N., and Herget, C. J. (1993). “Applied Algebra and Functional Analysis,” Dover, New York.
Michel, A. N., and Miller, R. K. (1977). “Qualitative Analysis of Large-Scale Dynamical Systems,” Academic Press, New York.
Michel, A. N., and Wang, K. (1995). “Qualitative Theory of Dynamical Systems,” Dekker, New York.
Michel, A. N., Farrell, J. A., and Porod, W. (1989). “Qualitative analysis of neural networks,” IEEE Trans. Circuits Syst. 36(2), 229–243.
Miller, R. K., and Michel, A. N. (1982). “Ordinary Differential Equations,” Academic Press, New York.
Naylor, A. W., and Sell, G. R. (1971). “Linear Operator Theory in Engineering and Science,” Holt, Rinehart & Winston, New York.
Royden, H. L. (1963). “Real Analysis,” Macmillan, New York.
Rudin, W. (1953). “Principles of Mathematical Analysis,” McGraw-Hill, New York.
Simmons, G. F. (1972). “Differential Equations,” McGraw-Hill, New York.
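The two linearization examples above (the Liénard equation and the pendulum at its inverted equilibrium) can be spot-checked numerically. The following is a minimal sketch, not part of the original article: it computes the eigenvalues of a real 2 × 2 matrix from its trace and determinant and tests the sign of their real parts. The numerical values chosen for f(0) and for the positive constant of Eq. (121), and the helper names `eig2`, `is_stable`, and `has_unstable_eigenvalue`, are illustrative assumptions.

```python
import cmath

def eig2(a11, a12, a21, a22):
    """Eigenvalues of the real 2x2 matrix [[a11, a12], [a21, a22]],
    obtained from the characteristic polynomial z^2 - tr*z + det."""
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def is_stable(eigs):
    """A matrix is stable when every eigenvalue has negative real part."""
    return all(z.real < 0 for z in eigs)

def has_unstable_eigenvalue(eigs):
    """True when at least one eigenvalue has positive real part."""
    return any(z.real > 0 for z in eigs)

# Lienard equation: A = [[0, 1], [-1, -f(0)]]; f(0) = 0.5 is an assumed value.
f0 = 0.5
lienard = eig2(0.0, 1.0, -1.0, -f0)

# Pendulum linearized at (pi, 0): A = [[0, 1], [k, 0]]; k = 2.0 is an assumed value.
k = 2.0
pendulum = eig2(0.0, 1.0, k, 0.0)

print("Lienard eigenvalues:", lienard, "stable:", is_stable(lienard))
print("Pendulum eigenvalues:", pendulum,
      "has unstable eigenvalue:", has_unstable_eigenvalue(pendulum))
```

For the pendulum matrix the eigenvalues are the two real square roots of the constant, one positive and one negative, so result 2 applies; for the Liénard matrix both real parts equal −f(0)/2 whenever f(0)² < 4.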
Encyclopedia of Physical Science and Technology EN004E-173 June 8, 2001 17:20
I. Importance
II. How They Arise
III. Some Well-Known Equations
IV. Types of Equations
V. Problems Associated with Partial Differential Equations
VI. Methods of Solution

GLOSSARY

Boundary: Set of points in the closure of a region not contained in its interior.
Bounded region: Region that is contained in a sphere of finite radius.
Eigenvalue: Scalar $\lambda$ for which the equation $Au = \lambda u$ has a nonzero solution $u$.
Euclidean n dimensional space $R^n$: Set of vectors $x = (x_1, \ldots, x_n)$ where each component $x_j$ is a real number.
Partial derivative: Derivative of a function of more than one variable with respect to one of the variables keeping the other variables fixed.

A PARTIAL DIFFERENTIAL EQUATION is an equation in which a partial derivative of an unknown function appears. The order of the equation is the highest order of the partial derivatives (of an unknown function) appearing in the equation. If there is only one unknown function $u(x_1, \ldots, x_n)$, then a partial differential equation for $u$ is of the form:
\[ F\left(x_1, \ldots, x_n, u, \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_n}, \frac{\partial^2 u}{\partial x_1^2}, \ldots, \frac{\partial^k u}{\partial x_1^k}, \ldots\right) = 0 \]
One can have more than one unknown function and more than one equation involving some or all of the unknown functions. One then has a system of $j$ partial differential equations in $k$ unknown functions. The number of equations may be more or less than the number of unknown functions. Usually it is the same.

I. IMPORTANCE

One finds partial differential equations in practically every branch of physics, chemistry, and engineering. They are also found in other branches of the physical sciences and in the social sciences, economics, business, etc. Many parts of theoretical physics are formulated in terms of partial differential equations. In some cases, the axioms require that the states of physical systems be given by solutions of partial differential equations. In other cases, partial differential equations arise when one applies the axioms to specific situations.
II. HOW THEY ARISE

Partial differential equations arise in several branches of mathematics. For instance, the Cauchy–Riemann equations,
\[ \frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \qquad \frac{\partial u(x, y)}{\partial y} = -\frac{\partial v(x, y)}{\partial x} \]
must be satisfied if
\[ f(z) = u(x, y) + i v(x, y) \]
is to be an analytic function of the complex variable $z = x + iy$. Thus, the rich and beautiful branch of mathematics known as analytic function theory is merely the study of solutions of a particular system of partial differential equations.

As a simple example of a partial differential equation arising in the physical sciences, we consider the case of a vibrating string. We assume that the string is a long, very slender body of elastic material that is flexible because of its extreme thinness and is tightly stretched between the points $x = 0$ and $x = L$ on the $x$ axis of the $x,y$ plane. Let $x$ be any point on the string, and let $y(x, t)$ be the displacement of that point from the $x$ axis at time $t$. We assume that the displacements of the string occur in the $x,y$ plane. Consider the part of the string between two close points $x_1$ and $x_2$. The tension $T$ in the string acts in the direction of the tangent to the curve formed by the string. The net force on the segment $[x_1, x_2]$ in the $y$ direction is
\[ T \sin \varphi_2 - T \sin \varphi_1 \]
where $\varphi_i$ is the angle between the tangent to the curve and the $x$ axis at $x_i$. According to Newton’s second law, this force must equal mass times acceleration. This is
\[ \int_{x_1}^{x_2} \rho\, \frac{\partial^2 y}{\partial t^2}\, dx \]
where $\rho$ is the density (mass per unit length) of the string. Thus, in the limit
\[ T \frac{\partial}{\partial x} \sin \varphi = \rho \frac{\partial^2 y}{\partial t^2} \]
We note that $\tan \varphi = \partial y/\partial x$. If we make the simplifying assumption (justified or otherwise) that
\[ \cos \varphi \approx 1, \qquad \frac{\partial}{\partial x} \cos \varphi \approx 0 \]
we finally obtain:
\[ T\, \partial^2 y/\partial x^2 = \rho\, \partial^2 y/\partial t^2 \]
which is the well-known equation of the vibrating string.

The derivation of partial differential equations from physical laws usually brings about simplifying assumptions that are difficult to justify completely. Most of the time they are merely plausibility arguments. For this reason, some branches of science have accepted partial differential equations as axioms. The success of these axioms is judged by how well their conclusions describe past observations and predict new ones.

III. SOME WELL-KNOWN EQUATIONS

Now we list several equations that arise in various branches of science. Interestingly, the same equation can arise in diverse and unrelated areas.

A. Laplace’s Equation

In $n$ dimensions this equation is given by:
\[ \Delta u = 0 \]
where
\[ \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2} \]
It arises in the study of electromagnetic phenomena (e.g., electrostatics, dielectrics, steady currents, magnetostatics), hydrodynamics (e.g., irrotational flow of a perfect fluid, surface waves), heat flow, gravitation, and many other branches of science. Solutions of Laplace’s equation are called harmonic functions.

B. Poisson’s Equation

\[ \Delta u = f(x), \qquad x = (x_1, \ldots, x_n) \]
Here, the function $f(x)$ is given. This equation is found in many of the situations in which Laplace’s equation appears, since the latter is a special case.

C. Helmholtz’s Equation

\[ \Delta u \pm \alpha^2 u = 0 \]
This equation appears in the study of elastic waves, vibrating strings, bars and membranes, sound and acoustics, electromagnetic waves, and the operation of nuclear reactors.

D. The Heat (Diffusion) Equation

This equation is of the form:
\[ u_t = a^2 \Delta u \]
where $u(x_1, \ldots, x_n, t)$ depends on the variable $t$ (time) as well. It describes heat conduction or diffusion processes.

E. The Wave Equation

\[ \Box u \equiv (1/c^2)\, u_{tt} - \Delta u = 0 \]
This describes the propagation of a wave with velocity $c$. This equation governs most cases of wave propagation.

F. The Telegraph Equation

\[ \Box u + \sigma u_t = 0 \]
This applies to some types of wave propagation.

G. The Scalar Potential Equation

H. The Klein–Gordon Equation

\[ \Box u + \mu^2 u = 0 \]

\[ -\frac{\hbar^2}{2m} \Delta \psi + V(x) \psi = i \hbar \frac{\partial \psi}{\partial t} \]
This equation describes the motion of a quantum mechanical particle as it moves through a potential field. The function $V(x)$ represents the potential energy, while the unknown function $\psi(x)$ is allowed to have complex values.

M. The Navier–Stokes Equations

\[ \frac{\partial u_j}{\partial t} + \sum_k u_k \frac{\partial u_j}{\partial x_k} + \frac{1}{\rho} \frac{\partial p}{\partial x_j} = \gamma \Delta u_j, \qquad \sum_k \frac{\partial u_k}{\partial x_k} = 0 \]
This system describes viscous flow of an incompressible liquid with velocity components $u_k$ and pressure $p$.

\[ u_t + c u u_x + u_{xxx} = 0 \]
This equation is used in the study of water waves.

A. Linear Equations

The most general linear partial differential equation of order $m$ is
\[ Au \equiv \sum_{|\mu| \le m} a_\mu(x) D^\mu u = f(x) \tag{1} \]
we are prescribing Dirichlet data. On those parts of $\partial\Omega$ where $\beta(x) = 0$, we are prescribing Neumann data. On the remaining sections of $\partial\Omega$ we are prescribing Robin data. This is an example of a mixed boundary-value problem in which one prescribes different types of data on different parts of the boundary. Other examples are provided by parabolic and hyperbolic equations, to be discussed later.

and, indeed, it has a unique analytic solution:
\[ u(x, y) = n^{-2} \sinh ny \sin nx \]
The function $n^{-1} \sin nx$ tends uniformly to 0 as $n \to \infty$, but the solution does not become small as $n \to \infty$ for $y \ne 0$. It can be shown that the Cauchy problem is well posed only for hyperbolic equations.
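The loss of continuous dependence described here is easy to see numerically. The sketch below is an illustration only, not part of the original article: it compares the size of the function n⁻¹ sin nx with the size of the solution n⁻² sinh ny sin nx at a fixed height y. The choice y = 1 and the helper names `data_sup` and `solution_sup` are assumptions made for the example.

```python
import math

def data_sup(n):
    """Supremum over x of |n^{-1} sin(nx)|, which equals 1/n."""
    return 1.0 / n

def solution_sup(n, y):
    """Supremum over x of |n^{-2} sinh(ny) sin(nx)| at fixed y."""
    return abs(math.sinh(n * y)) / n ** 2

y = 1.0  # assumed sample height; any y != 0 shows the same effect
for n in (1, 5, 10, 20, 40):
    print(f"n = {n:2d}   data = {data_sup(n):.4f}   solution = {solution_sup(n, y):.4e}")
```

The first column tends to zero like 1/n while the second grows roughly like e^{n|y|}/(2n²), so arbitrarily small data produce arbitrarily large solutions.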
only and the right-hand side is a function of $x$ only, both sides are constant. Thus, there is a constant $K$ such that $X''(x) = K X(x)$. If $K = \lambda^2 > 0$, $X$ must be of the form:

For Eq. (14) to be satisfied, we must have
\[ X(0) = X(\pi) = 0 \tag{15} \]

\[ g(x) = \sum_{n=1}^{\infty} n c B_n \sin nx \]
This will be true if $f(x)$, $g(x)$ are expandable in a Fourier sine series. If they are, then the coefficients $A_n$, $B_n$ are

With these values, the series (16) converges and gives a solution of Eqs. (12)–(14).

If we desire to determine the temperature $u(x, t)$ of a system in $R^n$ with no heat added or removed and initial temperature given, we must solve:

\[ = (4\pi a^2 t)^{-n/2} e^{-|x|^2/4a^2 t} \]
This suggests that a solution of Eqs. (17) and (18) is given by:
\[ u(x, t) = (4\pi a^2 t)^{-n/2} \int e^{-|x - y|^2/4a^2 t}\, \varphi(y)\, dy \]
It is easily checked that this is indeed the case if $\varphi$ is continuous and bounded. However, the solution is not unique unless one places more restriction on the solution.

C. Fundamental Solutions, Green’s Function

Let
\[ K(x, y) = \frac{|x - y|^{2-n}}{(2 - n)\omega_n} + h(x), \quad n > 2, \qquad K(x, y) = \frac{\log |x - y|}{2\pi} + h(x), \quad n = 2 \]
where $\omega_n = 2\pi^{n/2}/\Gamma(\frac{1}{2} n)$ is the surface area of the unit sphere in $R^n$ and $h(x)$ is a harmonic function in a bounded domain $\Omega \subset R^n$ (i.e., $h(x)$ is a solution of $\Delta h = 0$ in $\Omega$). If the boundary $\partial\Omega$ of $\Omega$ is sufficiently regular and $h \in C^2(\bar\Omega)$, then Green’s theorem implies for $y \in \Omega$:
\[ u(y) = \int_\Omega K(x, y)\, \Delta u(x)\, dx + \int_{\partial\Omega} \left[ u(x) \frac{\partial K(x, y)}{\partial n} - K(x, y) \frac{\partial u}{\partial n} \right] dS_x \tag{21} \]
for all $u \in C^2(\bar\Omega)$. The function $K(x, y)$ is called a fundamental solution of the operator $\Delta$. If, in addition, $K(x, y)$ vanishes for $x \in \partial\Omega$, it is called a Green’s function, and we denote it by $G(x, y)$. In this case,
\[ u(y) = \int_{\partial\Omega} u(x) \frac{\partial G(x, y)}{\partial n}\, dS_x, \qquad y \in \Omega \tag{22} \]
for all $u \in C^2(\bar\Omega)$ that are harmonic in $\Omega$. Conversely, this formula can be used to solve the Dirichlet problem for Laplace’s equation if we know the Green’s function for $\Omega$, since the right-hand side of Eq. (22) is harmonic in $\Omega$ and involves only the values of $u(x)$ on $\partial\Omega$. It can be shown that if the prescribed boundary values are continuous, then indeed Eq. (22) does give a solution to the Dirichlet problem for Laplace’s equation.

It is usually very difficult to find the Green’s function for an arbitrary domain. It can be computed for geometrically symmetric regions. In the case of a ball of radius $R$ and center 0, it is given by:
\[ G(x, y) = K(x, y) - (|y|/R)^{2-n} K(x, R^2 |y|^{-2} y) \]

\[ c_0 |\xi|^m \le P(\xi) \le C_0 |\xi|^m, \qquad \xi \in R^n \tag{23} \]
holds for positive constants $c_0$, $C_0$. We introduce the norm,
\[ |v|_r = \left( \int |\xi|^m |\hat{v}(\xi)|^2\, d\xi \right)^{1/2} \]
for functions $v$ in $C_0^\infty(\Omega)$, the set of infinitely differentiable functions that vanish outside $\Omega$. Here, $\Omega$ is a bounded domain in $R^n$ with smooth boundary, and $\hat{v}(\xi)$ denotes the Fourier transform given by Eq. (19). By Eq. (23) we see that $(P(D)v, v)$ is equivalent to $|v|_r^2$ on $C_0^\infty(\Omega)$, where
\[ (u, v) = \int_\Omega u(x) v(x)\, dx \]
Let
\[ a(u, v) = (u, P(D)v), \qquad u, v \in C_0^\infty(\Omega) \tag{24} \]
If $u \in C^m(\bar\Omega)$ is a solution of the Dirichlet problem,
\[ P(D)u = f \quad \text{in } \Omega \tag{25} \]
\[ D^\mu u = 0 \quad \text{on } \partial\Omega, \quad |\mu| < r \tag{26} \]
then it satisfies
\[ a(u, v) = (f, v), \qquad v \in C_0^\infty(\Omega) \tag{27} \]
Conversely, if $\partial\Omega$ is sufficiently smooth and $u \in C^m(\bar\Omega)$ satisfies Eq. (27), then it is a solution of the Dirichlet problem, Eqs. (25) and (26). This is readily shown by integration by parts. Thus, one can solve Eqs. (25) and (26) by finding a function $u \in C^m(\bar\Omega)$ satisfying Eq. (27). Since $a(u, v)$ is a scalar product, it would be helpful if we had a theorem stating that the expression $(f, v)$ can be represented by the expression $a(u, v)$ for some $u$. Such a theorem exists (the Riesz representation theorem), provided $a(u, v)$ is the scalar product of a Hilbert space and
\[ |(f, v)| \le C a(v, v)^{1/2} \tag{28} \]
We can fit our situation to the theorem by completing $C_0^\infty(\Omega)$ with respect to the $|v|_r$ norm and making use of the fact that $a(v, v)^{1/2}$ and $|v|_r$ are equivalent on $C_0^\infty(\Omega)$ and consequently on the completion $H_0^r(\Omega)$. Moreover, inequality (28) follows from the Poincaré inequality,
\[ \|v\| \le M^r |v|_r, \qquad v \in C_0^\infty(\Omega) \tag{29} \]
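As a sanity check on inequality (29), consider the one-dimensional case with r = 1 and Ω = (0, M), where the inequality reads ‖v‖ ≤ M‖v′‖ in the L² norm. The sketch below is illustrative only and is not part of the original article: it evaluates the ratio ‖v‖/(M‖v′‖) by a midpoint rule for two test functions. The test functions merely vanish at the endpoints rather than having compact support, and the interval length M = 3 and the helper name `poincare_ratio` are assumptions made for the example.

```python
import math

def poincare_ratio(v, dv, M, samples=2000):
    """Return ||v|| / (M * ||v'||) on (0, M), with both L2 norms
    approximated by the midpoint rule on `samples` subintervals."""
    h = M / samples
    xs = [(i + 0.5) * h for i in range(samples)]
    norm_v = math.sqrt(h * sum(v(x) ** 2 for x in xs))
    norm_dv = math.sqrt(h * sum(dv(x) ** 2 for x in xs))
    return norm_v / (M * norm_dv)

M = 3.0  # assumed interval length

# v(x) = sin(pi x / M): the ratio equals 1/pi exactly.
ratio_sine = poincare_ratio(
    lambda x: math.sin(math.pi * x / M),
    lambda x: (math.pi / M) * math.cos(math.pi * x / M),
    M,
)

# v(x) = x (M - x): the ratio equals 1/sqrt(10) exactly.
ratio_poly = poincare_ratio(lambda x: x * (M - x), lambda x: M - 2.0 * x, M)

print(ratio_sine, ratio_poly)  # both ratios stay below 1
```

Both ratios are well below 1, which is consistent with the inequality; the constant M is therefore not sharp in this setting.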
it can be shown that if $\partial\Omega$ and $f$ are sufficiently smooth, then $u$ will be in $C^m(\bar\Omega)$ and will be a solution of Eqs. (25) and (26).

The proof of the Poincaré inequality (29) can be given as follows. It suffices to prove it for $r = 1$ and $\Omega$ contained in the slab $0 < x_1 < M$. Since $v \in C_0^\infty(\Omega)$,
\[ \begin{aligned} v(x_1, \ldots, x_n)^2 &= \left( \int_0^{x_1} v_{x_1}(t, x_2, \ldots, x_n)\, dt \right)^2 \\ &\le x_1 \int_0^{x_1} v_{x_1}(t, x_2, \ldots, x_n)^2\, dt \\ &\le M \int_0^{M} v_{x_1}(t, x_2, \ldots, x_n)^2\, dt \end{aligned} \]
Thus,
\[ \int_0^M v(x_1, \ldots, x_n)^2\, dx_1 \le M^2 \int_0^M v_{x_1}(t, x_2, \ldots, x_n)^2\, dt \]
If we now integrate over $x_2, \ldots, x_n$, we obtain:
\[ \|v\| \le M \|v_{x_1}\| \]
But, by Parseval’s identity,
\[ \int |v_{x_1}|^2\, dx = \int |\hat{v}_{x_1}|^2\, d\xi = \int \xi_1^2 |\hat{v}|^2\, d\xi \le \int |\xi|^2 |\hat{v}|^2\, d\xi = |v|_1^2 \]

E. Iterations

An important method of solving both linear and nonlinear problems is that of successive approximations. We illustrate this method for the following Dirichlet problem:
\[ \Delta u = f(x, u), \qquad x \in \Omega \tag{30} \]
\[ u = 0 \quad \text{on } \partial\Omega \tag{31} \]
We assume that the boundary of $\Omega$ is smooth and that $f(x, t) = f(x_1, \ldots, x_n, t)$ is differentiable with respect to all arguments. Also we assume that:
\[ |f(x, t)| \le N, \qquad x \in \Omega, \quad -\infty < t < \infty \tag{32} \]
\[ |\partial f(x, t)/\partial t| \le \psi(t), \qquad x \in \Omega \tag{33} \]
where $\psi(t)$ is a continuous function.

First, we note that for every compact subset $G$ of $\Omega$ there is a constant $C$ such that:
\[ \max_G |\nabla v| \le C \left( \sup_\Omega |\Delta v| + \sup_\Omega |v| \right) \tag{34} \]
for all $v \in C^2(\Omega)$. Assume this for the moment, and let $w(x)$ be the solution of the Dirichlet problem,
\[ \Delta w = -N \quad \text{in } \Omega, \qquad w = 0 \quad \text{on } \partial\Omega \tag{35} \]
It is clear that $w(x) \ge 0$ in $\Omega$. For otherwise it would have a negative interior minimum in $\Omega$. At such a point, one has $\partial^2 w/\partial x_k^2 \ge 0$ for each $k$, and consequently $\Delta w \ge 0$, contradicting Eq. (35). Since $w \in C(\bar\Omega)$, there is a constant $C_1$ such that:
\[ 0 \le w(x) \le C_1, \qquad x \in \Omega \]
Let
\[ K = \max_{|t| \le C_1} \psi(t) \]
Then, by Eq. (33):
\[ |\partial f(x, t)/\partial t| \le K, \qquad |t| \le C_1 \tag{36} \]
Consequently,
\[ f(x, t) - f(x, s) \le K(t - s) \tag{37} \]
when $-C_1 \le s \le t \le C_1$. We define a sequence $\{u_k\}$ of functions as follows. We take $u_0 = w$ and once $u_{k-1}$ has been defined, we let $u_k$ be the solution of the Dirichlet problem:
\[ L u_k \equiv \Delta u_k - K u_k = f(x, u_{k-1}) - K u_{k-1} \tag{38} \]
in $\Omega$ with $u_k = 0$ on $\partial\Omega$. The solution exists by the theory of linear elliptic equations. We show by induction that:
\[ -w \le u_k \le u_{k-1} \le w \tag{39} \]
To see this for $k = 1$, note that:
\[ L(u_1 - w) = f(x, w) - Kw - \Delta w + Kw \ge 0 \]
From this we see that $u_1 \le w$ in $\Omega$. Indeed, if $u_1 - w$ had an interior positive maximum in $\Omega$, we would have
\[ L(u_1 - w) = \Delta(u_1 - w) - K(u_1 - w) < 0 \]
at such a point. Thus, $u_1 \le w$ in $\Omega$. Also we note:
\[ \Delta(u_1 + w) = f(x, w) + K(u_1 - w) + \Delta w \le K(u_1 - w) \]
This shows that $u_1 + w$ cannot have a negative minimum inside $\Omega$. Hence, $u_1 + w \ge 0$ in $\Omega$, and Eq. (39) is verified for $k = 1$. Once we know it is verified for $k$, we note that:
\[ L[u_{k+1} - u_k] = f(x, u_k) - f(x, u_{k-1}) - K(u_k - u_{k-1}) \ge 0 \]
by Eq. (37). Thus, $u_{k+1} \le u_k$ in $\Omega$. Hence,
\[ \Delta(u_{k+1} + w) = f(x, u_k) - K u_k + K u_{k+1} + \Delta w \le K(u_{k+1} - u_k) \]
Again, we deduce from this that $u_{k+1} + w \ge 0$ in $\Omega$. Hence, Eq. (39) holds for $k + 1$ and consequently for all $k$. In particular, we see that the $u_k$ are uniformly bounded in $\Omega$, and by Eq. (38) the same is true of the functions $\Delta u_k$.
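The iteration (38) and the ordering (39) can be observed in a one-dimensional finite-difference analogue. The sketch below is an illustration under stated assumptions, not the method of the article: it takes Ω = (0, 1), replaces Δ by the standard three-point second difference, and uses the sample nonlinearity f(x, t) = cos t, for which N = 1 and K = 1 are admissible constants. The function names `solve_tridiagonal` and `monotone_iteration`, the grid size, and the number of steps are all hypothetical choices.

```python
import math

def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system whose three bands
    are the constants sub, diag, and sup."""
    n = len(rhs)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = sup / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - sub * c[i - 1]
        c[i] = sup / denom
        d[i] = (rhs[i] - sub * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def monotone_iteration(f, N, K, m=99, steps=30):
    """Discrete analogue of Eq. (38) on (0, 1) with m interior grid points.
    Starts from u_0 = w, where w'' = -N and w(0) = w(1) = 0."""
    h = 1.0 / (m + 1)
    xs = [(i + 1) * h for i in range(m)]
    w = [0.5 * N * x * (1.0 - x) for x in xs]
    iterates = [w]
    u = w
    for _ in range(steps):
        # Right-hand side f(x, u_{k-1}) - K u_{k-1}
        rhs = [f(x, t) - K * t for x, t in zip(xs, u)]
        # Solve (second difference - K) u_k = rhs with zero boundary values
        u = solve_tridiagonal(1.0 / h ** 2, -2.0 / h ** 2 - K, 1.0 / h ** 2, rhs)
        iterates.append(u)
    return xs, w, iterates

xs, w, iterates = monotone_iteration(lambda x, t: math.cos(t), N=1.0, K=1.0)
mid = len(xs) // 2
print([round(u[mid], 6) for u in iterates[:5]])  # values at x = 0.5 decrease
```

In this discrete setting the iterates decrease from w and stay above −w, mirroring Eq. (39), and they settle quickly to a grid function satisfying the discretized form of Eq. (30).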
Hence, by Eq. (34), the first derivatives of the $u_k$ are uniformly bounded on compact subsets of $\Omega$. If we differentiate Eq. (38), we see that the sequence $\Delta(\partial u_k/\partial x_j)$ is uniformly bounded on compact subsets of $\Omega$ (here we make use of the continuous differentiability of $f$). If we now make use of Eq. (34) again, we see that the second derivatives of the $u_k$ are uniformly bounded on compact subsets of $\Omega$. Hence, by the Ascoli–Arzelà theorem, there is a subsequence that converges together with its first derivatives uniformly on compact subsets of $\Omega$. Since the sequence $u_k$ is monotone, the whole sequence must converge to a continuous function $u$ that satisfies $|u(x)| \le w(x)$. Hence, $u$ vanishes on $\partial\Omega$. By Eq. (38), the functions $\Delta u_k$ must converge uniformly on compact subsets, and by Eq. (34), the same must be true of the first derivatives of the $u_k$. From the differentiated Eq. (38) we see that the $\Delta(\partial u_k/\partial x_j)$ converge uniformly on bounded subsets and consequently the same is true of the second derivatives of the $u_k$ by Eq. (34). Since the $u_k$ converge uniformly to $u$ in $\Omega$ and their second derivatives converge uniformly on bounded subsets, we see that $u \in C^2(\Omega)$ and $\Delta u_k \to \Delta u$. Hence,

F. Variational Methods

In many situations, methods of the calculus of variations are useful in solving problems for partial differential equations, both linear and nonlinear. We illustrate this with a simple example. Suppose we wish to solve the problem,
\[ -\sum_k \frac{\partial}{\partial x_k} \left( p_k(x) \frac{\partial u(x)}{\partial x_k} \right) + q(x) u(x) = 0 \quad \text{in } \Omega \tag{40} \]
\[ u(x) = g(x) \quad \text{on } \partial\Omega \tag{41} \]
Assume that $p_k(x) \ge c_0$, $q(x) \ge c_0$, $c_0 > 0$ for $x \in \Omega$, $\Omega$ bounded, $\partial\Omega$ smooth, and that $g$ is in $C^1(\partial\Omega)$. We consider the expression,
\[ a(u, v) = \frac{1}{2} \int_\Omega \left[ \sum_k p_k(x) \frac{\partial u(x)}{\partial x_k} \frac{\partial v(x)}{\partial x_k} + q(x) u(x) v(x) \right] dx \]
and put $a(u) = a(u, u)$. If $u \in C^2(\Omega) \cap C^0(\bar\Omega)$ satisfies Eq. (41), and
then solutions of Eq. (43) are the critical points of the functional,
\[ G(u) = \|\nabla u\|^2 - 2 \int_\Omega F(x, u)\, dx \]
where the norm is that of $L^2(\Omega)$ and
\[ F(x, t) = \int_0^t f(x, s)\, ds \]
The history of this approach can be traced back to the calculus of variations in which equations of the form:
\[ G'(u) = 0 \tag{44} \]
are the Euler–Lagrange equations of the functional $G$. The original method was to find maxima or minima of $G$ by solving Eq. (44) and then show that some of the solutions are extrema. This approach worked well for one-dimensional problems. In this case, it is easier to solve Eq. (44) than it is to find a maximum or minimum of $G$. However, in higher dimensions it was realized quite early that it is easier to find maxima and minima of $G$ than it is to solve Eq. (44). Consequently, the tables were turned, and critical point theory was devoted to finding extrema of $G$. This approach is called the direct method in the calculus of variations. If an extremum point of $G$ can be identified, it will automatically be a solution of Eq. (44).

The simplest extrema to find are global maxima and minima. For such points to exist one needs $G$ to be semi-

Then there exists a Palais–Smale sequence, Eq. (45), with $c$ satisfying:
\[ b_0 \le c < \infty \tag{47} \]
When $G$ satisfies Eq. (46) we say that it exhibits mountain pass geometry.

It was then discovered that other geometries (i.e., configurations) produce Palais–Smale sequences, as well. Consider the following situation. Assume that
\[ E = M \oplus N \]
is a decomposition of $E$ into the direct sum of closed subspaces with
\[ \dim N < \infty \]
Suppose there is an $R > 0$ such that
\[ \sup_{N \cap \partial B_R} G \le b_0 = \inf_M G \]
Then, again, there is a sequence satisfying Eqs. (45) and (47). Here, $B_R$ denotes the ball of radius $R$ in $E$ and $\partial B_R$ denotes its boundary.

In applying this method to solve Eq. (43), one discovers that the asymptotic behavior of $f(x, t)$ at $\infty$ plays an important role. One can consider several possibilities. When
\[ \limsup_{|t| \to \infty} |f(x, t)/t| = \infty \tag{48} \]
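\[ p(x, t) \to p_\pm(x) \quad \text{a.e. as } t \to \pm\infty \]
A stronger form occurs when
\[ p(x, t) \to 0 \quad \text{as } |t| \to \infty \tag{53} \]
and
\[ |P(x, t)| \le W(x) \in L^1(\Omega) \tag{54} \]

(this is done to keep the functions $u$ real valued), and
\[ \|u\|_t^2 = (2\pi)^n \sum (1 + \mu^2)^t |\alpha_\mu|^2 < \infty \tag{60} \]
where $\mu^2 = \mu_1^2 + \cdots + \mu_n^2$. It is not required that the series (59) converge in any way, but only that Eq. (60) hold. If
\[ u = \sum \alpha_\mu e^{i\mu x}, \qquad v = \sum \beta_\mu e^{i\mu x} \]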
are members of $H_t$, we can introduce the scalar product:

$$(u, v)_t = (2\pi)^n \sum_\mu (1 + \mu^2)^t \alpha_\mu \beta_{-\mu} \tag{61}$$

With this scalar product, $H_t$ becomes a Hilbert space. If

$$f(x) = \sum_\mu \gamma_\mu e^{i\mu x} \tag{62}$$

we wish to solve

$$-\Delta u = f \tag{63}$$

In other words, we wish to solve:

$$\sum_\mu \mu^2 \alpha_\mu e^{i\mu x} = \sum_\mu \gamma_\mu e^{i\mu x}$$

This requires:

$$\mu^2 \alpha_\mu = \gamma_\mu \quad \forall \mu$$

In order to solve for $\alpha_\mu$, we must have

$$\gamma_0 = 0 \tag{64}$$

Hence, we cannot solve for all f. However, if Eq. (64) holds, we can solve Eq. (63) by taking:

$$\alpha_\mu = \gamma_\mu / \mu^2 \quad \text{when } \mu \ne 0 \tag{65}$$

On the other hand, we can take $\alpha_0$ to be any number we like, and u will be a solution of Eq. (63) as long as it satisfies Eq. (65). Thus, we have Theorem 0.1:

Theorem 0.1. If f, given by Eq. (62), is in $H_t$ and satisfies Eq. (64), then Eq. (63) has a solution $u \in H_{t+2}$. An arbitrary constant can be added to the solution.

If we wish to solve:

$$-(\Delta + \lambda)u = f \tag{66}$$

where $\lambda \in \mathbb{R}$ is any constant, we want $(\mu^2 - \lambda)\alpha_\mu = \gamma_\mu$, or

to be given arbitrarily when Eq. (69) does hold. On the other hand, if λ is not equal to any $\lambda_k$, then Eq. (69) never holds, and we can solve Eq. (66) by taking the $\alpha_\mu$ to satisfy Eq. (67). Thus, we have Theorem 0.2:

Theorem 0.2. There is a sequence $\{\lambda_k\}$ of nonnegative integers tending to +∞ with the following properties. If $f \in H_t$ and $\lambda \ne \lambda_k$ for every k, then there is a unique solution $u \in H_{t+2}$ of Eq. (66). If $\lambda = \lambda_k$ for some k, then one can solve Eq. (66) only if f satisfies Eq. (68). The solution is not unique; there is a finite number of linearly independent periodic solutions of:

$$(\Delta + \lambda_k)u = 0 \tag{70}$$

which can be added to the solution.

The values $\lambda_k$ for which Eq. (70) has a nontrivial solution (i.e., a solution which is not ≡ 0) are called eigenvalues, and the corresponding nontrivial solutions are called eigenfunctions.

To analyze the situation a bit further, suppose $\lambda = \lambda_k$ for some k, and $f \in H_t$ is given by Eq. (62) and satisfies Eq. (68). If $v \in H_t$ is given by:

$$v = \sum_\mu \beta_\mu e^{i\mu x} \tag{71}$$

then

$$(f, v)_t = (2\pi)^n \sum_\mu (1 + \mu^2)^t \gamma_\mu \beta_{-\mu}$$

by Eq. (61). Hence, we have

$$(f, v)_t = 0$$

for all $v \in H_t$ satisfying Eq. (71) and

$$\beta_\mu = 0 \quad \text{when } \mu^2 \ne \lambda_k \tag{72}$$
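The coefficient-by-coefficient solution of Eq. (63) described by Eqs. (64) and (65) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `solve_poisson_coefficients`, the dictionary representation of the Fourier coefficients (keyed by integer frequency tuples µ), and the sample data are hypothetical and do not come from the text.

```python
def solve_poisson_coefficients(gamma, alpha0=0.0):
    """Solve mu^2 * alpha_mu = gamma_mu one coefficient at a time.

    gamma maps integer frequency tuples mu = (mu_1, ..., mu_n) to the
    Fourier coefficients of f (hypothetical representation).
    alpha0 is the constant term, which Eq. (65) leaves free.
    """
    alpha = {}
    for mu, g in gamma.items():
        mu_sq = sum(m * m for m in mu)  # mu^2 = mu_1^2 + ... + mu_n^2
        if mu_sq == 0:
            # Solvability condition, Eq. (64): gamma_0 must vanish.
            if g != 0:
                raise ValueError("Eq. (64) fails: gamma_0 must be zero")
            alpha[mu] = alpha0
        else:
            alpha[mu] = g / mu_sq  # Eq. (65)
    return alpha


# Illustrative data only: a few coefficients in two variables.
gamma = {(0, 0): 0.0, (1, 0): 2.0, (1, 2): 5.0, (-1, -2): 5.0}
alpha = solve_poisson_coefficients(gamma)
print(alpha)
```

Changing `alpha0` changes only the constant term of u, which mirrors the remark in Theorem 0.1 that an arbitrary constant can be added to the solution.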
I. Nature of Combinatorics
II. Basic Counting Techniques
III. Recurrence Relations and Generating Functions
IV. Inclusion–Exclusion Principle
V. Existence Problems
Encyclopedia of Physical Science and Technology EN004E-180 June 8, 2001 18:11
the manipulation of finite sets of symbols (e.g., strings of binary digits), and combinatorial mathematics, which provides tools for analyzing such patterns of symbols, have rapidly achieved prominence together. Moreover, since the symbols themselves can be abstract objects (rather than simply numerical quantities), combinatorics supports the more abstract manipulations of symbolic mathematics and symbolic computer languages.

Combinatorics is at heart a problem-solving discipline that blends mathematical techniques and concepts with a necessary touch of ingenuity. In order to emphasize this dual nature of combinatorics, the sections that follow will first present certain fundamental combinatorial principles and then illustrate their application through a number of diverse examples. Specifically, Sections II–IV provide an introduction to some powerful techniques for counting various combinatorial arrangements, and Section V examines when certain patterns can be guaranteed to exist.

II. BASIC COUNTING TECHNIQUES

A. Fundamental Rules of Sum and Product

Two deceptively simple, but fundamentally important, rules allow the counting of complex patterns by decomposition into simpler patterns. The first such principle states, in essence, that if we slice a pie into two nonoverlapping portions, then indeed the whole (pie) is equal to the sum of its two parts.

Rule of Sum. Suppose that event E can occur in m different ways, that event F can occur in n different ways, and that the two events are mutually exclusive. Then, the compound event where at least one of the two events happens can occur in m + n ways.

The second principle indicates the number of ways that a menu of choices (one item chosen from E, another item chosen from F) can be selected.

Rule of Product. Suppose that event E can occur in m different ways and that subsequently event F can occur in n different ways. Then, a choice from E followed by a choice from F can be made in m × n ways.

EXAMPLE 1. A certain state anticipates a total of 2,500,000 registered vehicles within the next ten years. Can the current system of license plates (consisting of six digits) accommodate the expected number of vehicles? Should there instead be a change to a proposed new system consisting of two letters followed by four digits?

Solution. To analyze the current situation, there are ten possibilities (0–9) for each of the six digits, so application of the product rule yields 10 × 10 × 10 × 10 × 10 × 10 = 1,000,000 possibilities, not enough to accommodate the expected number of vehicles. By contrast, the proposed new system allows (again by the product rule) 26 × 26 × 10 × 10 × 10 × 10 = 6,760,000 possibilities, more than enough to satisfy the anticipated demand.

EXAMPLE 2. DNA (deoxyribonucleic acid) consists of a chain of nucleotide bases (adenine, cytosine, guanine, thymine). How many different three-base sequences are possible?

Solution. For each of the three positions in the sequence, there are four possibilities for the base, so (by the product rule) there are 4 × 4 × 4 = 64 such sequences.

EXAMPLE 3. In a certain computer programming language, each identifier (variable name) consists of either one or two alphanumeric characters (A–Z, 0–9), but the first character must be alphabetic (A–Z). How many different identifier names are possible in this language?

Solution. In this case, analysis of the compound event can be broken into counting the possibilities for event E, a single-character identifier, and for event F, a two-character identifier. The number of possibilities for E is 26, whereas (by the product rule) the number of possibilities for F is 26 × (26 + 10) = 936. Since the two events E and F are mutually exclusive, the total number of distinct identifiers is 26 + 936 = 962.

B. Permutations and Combinations

In the analysis of combinatorial problems, it is essential to recognize when order is important in the arrangement and when it is not. To emphasize this distinction, the set $X = \{x_1, x_2, \ldots, x_n\}$ consists of n elements $x_i$, assembled without regard to order, whereas the list $X = [x_1, x_2, \ldots, x_n]$ contains elements arranged in a prescribed order.

In the previous examples, the order of arrangement was clearly important, so lists were implicitly being counted. More generally, arrangements of objects into a list are referred to as permutations. For example, the objects a, b, c can be arranged into the following permutations: [a, b, c], [a, c, b], [b, a, c], [b, c, a], [c, a, b], [c, b, a]. By the product rule, n distinct objects can be arranged into:

n! = n × (n − 1) × (n − 2) × · · · × 2 × 1

different permutations. (The symbol n!, or n factorial, denotes the product of the first n positive integers.) A permutation of size k is a list with k elements chosen from the n given objects, and there are exactly

P(n, k) = n × (n − 1) × (n − 2) × · · · × (n − k + 1)

such permutations.
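The counts in Examples 1 to 3 and the formula for P(n, k) can be checked mechanically. The following is a minimal sketch, assuming Python 3.8 or later (for `math.prod` and `math.perm`); the helper names `rule_of_product`, `rule_of_sum`, and `count_permutations` are illustrative and not part of the original text.

```python
from itertools import permutations
from math import factorial, perm, prod


def rule_of_product(*ways):
    """Number of ways to make one choice after another."""
    return prod(ways)


def rule_of_sum(*ways):
    """Number of ways for mutually exclusive alternatives."""
    return sum(ways)


def count_permutations(n, k):
    """P(n, k) = n (n-1) ... (n-k+1), a k-step use of the product rule."""
    return prod(range(n - k + 1, n + 1))


# Example 1: six digits versus two letters followed by four digits.
old_plates = rule_of_product(10, 10, 10, 10, 10, 10)
new_plates = rule_of_product(26, 26, 10, 10, 10, 10)

# Example 2: three positions, four bases each.
dna_triples = rule_of_product(4, 4, 4)

# Example 3: one-character identifiers or two-character identifiers.
identifiers = rule_of_sum(26, rule_of_product(26, 26 + 10))

# Cross-check P(n, k) against direct enumeration of ordered lists.
assert count_permutations(3, 3) == factorial(3) == len(list(permutations("abc")))
assert count_permutations(5, 2) == perm(5, 2) == len(list(permutations(range(5), 2)))

print(old_plates, new_plates, dna_triples, identifiers)
```

Enumerating with `itertools.permutations` is practical only for small n, but it gives an independent check on the closed-form count.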
EXAMPLE 4. In a manufacturing plant, a particular product is fabricated by processing in turn on four different machines. If any processing sequence using all four machines is permitted, how many different processing orders are possible? How many processing orders are there if only two machines from the four need to be used?

Solution. Each processing order corresponds to a permutation of the four machines, so there are P(4, 4) = 4! = 24 different orders. If processing on any two machines is allowable, then there are P(4, 2) = 4 × 3 = 12 different orders.

When the order of elements occurring in the arrangement is not pertinent, then a way of arranging k objects, chosen from n distinct objects, is called a combination of size k. For example, the objects a, b, c can be arranged into the following combinations, or sets, of size 2: {a, b}, {a, c}, {b, c}. The number of combinations of size k from n objects is given by the formula:

C(n, k) = P(n, k)/k!

EXAMPLE 5. A group of ten different blood samples is to be split into two batches, each consisting of five "pooled" samples. Further chemical analysis will then be performed on the two batches. In how many ways can the samples be split in this fashion?

Solution. Any division of the samples $S_1, S_2, \ldots, S_{10}$ into the two batches can be uniquely identified by those samples belonging to the first batch. For example, $\{S_1, S_2, S_5, S_6, S_8\}$ defines one such division. Since the order of samples within each batch is not important, there are C(10, 5) = 252 ways to divide the original samples.

EXAMPLE 6. Suppose that 12 straight lines are drawn on a piece of paper, with no two lines being parallel and no three meeting at a single point. How many different triangles are formed by these lines?

Solution. Any three lines form a triangle, since no two lines are parallel and no three meet at a single point. As a result, there are as many triangles as choices of three lines selected from the 12, giving C(12, 3) = 220 such triangles.

EXAMPLE 7. How many different solutions are there in nonnegative integers $x_i$ to the equation $x_1 + x_2 + x_3 + x_4 = 8$?

Solution. We can view this problem as an equivalent one in which eight balls are placed into four numbered boxes. For example, the solution $x_1 = 2$, $x_2 = 3$, $x_3 = 2$, $x_4 = 1$ corresponds to placing 2, 3, 2, 1 balls into boxes 1, 2, 3, 4. This solution can also be represented by the string * * | * * * | * * | * which shows the number of balls residing in the four boxes. The number of solutions is then the number of ways of constructing a string of 11 symbols (eight stars and three bars); namely, we can select the three bars in C(11, 3) = 165 ways.

C. Binomial Coefficients

Ways of arranging objects can also be viewed from an algebraic perspective. To understand this correspondence, consider the product of n identical factors (1 + x), namely:

$$(1 + x)^n = (1 + x)(1 + x) \cdots (1 + x)$$

The coefficient of $x^k$ in the expansion of this product is just the number of ways to select the symbol x from exactly k of the factors. However, the number of ways to select these k factors from the n available is C(n, k), meaning that:

$$(1 + x)^n = C(n, 0) + C(n, 1)x + C(n, 2)x^2 + \cdots + C(n, n)x^n \tag{1}$$

Because the coefficients C(n, k) arise in this way from the expansion of a two-term expression, they are also referred to as binomial coefficients. These coefficients can be conveniently placed in a triangular array, called Pascal's triangle, as shown in Fig. 1. Row n of Pascal's triangle contains the values C(n, 0), C(n, 1), . . . , C(n, n). Several patterns are apparent from this figure. First, the binomial coefficients are symmetrically placed within each row: namely, C(n, k) = C(n, n − k). Second, the coefficient appearing in any row equals the sum of the two coefficients appearing in the previous row just to the left and to the right. For example, in row 5 the third entry, 10, is the sum of the second and third entries, 4 and 6, from the previous row. In general, the binomial coefficients satisfy the identity:

C(n, k) = C(n − 1, k − 1) + C(n − 1, k)

The binomial coefficients satisfy a number of other interesting and useful identities. To illustrate one way of

FIGURE 1 Arrangement of binomial coefficients C(n, k) in Pascal's triangle.
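The arrangement of binomial coefficients in Fig. 1 can be rebuilt row by row from the identity C(n, k) = C(n − 1, k − 1) + C(n − 1, k). The sketch below is illustrative only; the function name `pascal_rows` is an assumption, and `math.comb` (Python 3.8 or later) is used purely as an independent cross-check.

```python
from math import comb


def pascal_rows(n_max):
    """Return rows 0..n_max of Pascal's triangle.

    Each interior entry is the sum of the two entries just above it,
    i.e., C(n, k) = C(n-1, k-1) + C(n-1, k); border entries are 1.
    """
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = [1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1]
        rows.append(row)
    return rows


rows = pascal_rows(6)
for n, row in enumerate(rows):
    # Symmetry within each row: C(n, k) = C(n, n - k).
    assert row == row[::-1]
    # Agreement with the formula C(n, k) = P(n, k) / k!.
    assert row == [comb(n, k) for k in range(n + 1)]
    print(" ".join(str(c) for c in row).center(30))
```

Building the rows additively avoids computing any factorials, which is the practical appeal of the recurrence.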
discovering such identities, formally substitute the value x = 1 into both sides of Eq. (1), yielding the identity:

$$2^n = C(n, 0) + C(n, 1) + C(n, 2) + \cdots + C(n, n)$$

In other words, the binomial coefficients for n must sum to the value $2^n$. If instead the value x = −1 is formally substituted into Eq. (1), the following identity results:

$$0 = C(n, 0) - C(n, 1) + C(n, 2) - \cdots + (-1)^n C(n, n)$$

This simply states that the alternating sum of the binomial coefficients in any row (n ≥ 1) of Fig. 1 must be zero.

A final identity involving the numbers in Fig. 1 concerns a string of coefficients progressing from the left-hand border along a downward sloping diagonal to any other entry in the figure. Then, the sum of these coefficients will be found as the value just below and to the left of the last such entry. For instance, the sum C(2, 0) + C(3, 1) + C(4, 2) + C(5, 3) = 1 + 3 + 6 + 10 = 20 is indeed the same as the binomial coefficient C(6, 3). In general, this observation can be expressed as the identity:

C(n + k + 1, k) = C(n, 0) + C(n + 1, 1) + C(n + 2, 2) + · · · + C(n + k, k)

This identity can be given a pleasant combinatorial interpretation. Consider selecting k items (without regard for order) from a total of n + k + 1 items, which can be done in C(n + k + 1, k) ways. In any such selection, there will be some item number r so that items 1, 2, . . . , r are selected but r + 1 is not. This then leaves k − r items to be selected from the remaining n + k + 1 − (r + 1) = n + k − r items, which can be done in C(n + k − r, k − r) ways. Since the cases r = 0, 1, . . . , k are mutually exclusive, the sum rule shows the total number of selections is also equal to C(n + k, k) + C(n + k − 1, k − 1) + · · · + C(n + 1, 1) + C(n, 0). Thus, by counting the same group of objects in two different ways, one can verify the above identity. This technique of "double counting" provides a powerful tool applicable to a number of other combinatorial problems.

D. Discrete Probability

Probability theory is an important area of mathematics in which combinatorics plays an essential role. For example, if there are only a finite number of outcomes $S_1, S_2, \ldots, S_m$ to some process, the ability to count the number of occurrences of $S_i$ provides valuable information on the likelihood that outcome $S_i$ will in fact be observed. Indeed, many phenomena in the physical sciences are governed by probabilistic rather than deterministic laws; therefore, one must generally be content with assessing the probability that certain desirable (or undesirable) outcomes will occur.

EXAMPLE 8. What is the probability that a hand of five cards, dealt from a shuffled deck of cards, contains at least three aces?

Solution. The population of 52 cards can be conveniently partitioned into set A of the 4 aces and set N of the 48 non-aces. In order to obtain exactly three aces, the hand must contain three cards from set A and two cards from set N, which can be achieved in C(4, 3) C(48, 2) = 4512 ways. To obtain exactly four aces, the hand must contain four cards from A and one card from N, which can be achieved in C(4, 4) C(48, 1) = 48 ways. Since the total number of possible hands of five cards chosen from the 52 is C(52, 5) = 2,598,960, the probability of the required hand is (4512 + 48)/2,598,960 = 0.00175, indicating a rate of occurrence of less than twice in a thousand.

EXAMPLE 9. In a certain state lottery, six winning numbers are selected from the numbers 1, 2, . . . , 40. What are the odds of matching all six winning numbers? What are the odds of matching exactly five? Exactly four?

Solution. The number of possible choices is the number of ways of selecting six numbers from the 40, or C(40, 6) = 3,838,380. Since only one of these is the winning selection, the odds of matching all six numbers are 1/3,838,380. To match exactly five of the winning numbers, there are C(6, 5) = 6 ways of selecting the five matching numbers and C(34, 1) = 34 ways of selecting a nonmatching number, giving (by the product rule) 6 × 34 = 204 ways, so the odds are 204/3,838,380 = 17/319,865 for matching five numbers. To match exactly four winning numbers, there are C(6, 4) = 15 ways of selecting the four matching numbers and C(34, 2) = 561 ways of selecting the nonmatching numbers, giving 15 × 561 = 8415 ways, so the odds are 8415/3,838,380 = 561/255,892 (or approximately 0.0022) of matching four numbers.

EXAMPLE 10. An alarm system is constructed from five identical components, each of which can fail (independently of the others) with probability q. The system is designed with a certain amount of redundancy so that it functions whenever at least three of the components are working. How likely is it that the entire system functions?

Solution. There are two states for each individual component, either good or failed. The state of the system can be represented by a binary string $x_1 x_2 x_3 x_4 x_5$, where $x_i$ is 1 if component i is good and is 0 if it fails. A functioning state for the system thus corresponds to a binary string having at most two zeros. The number of states with exactly two zeros is C(5, 2) = 10, so the probability
of two failed and three good components is $10q^2(1 - q)^3$. Similarly, the probability of exactly one failed component is $C(5, 1)q^1(1 - q)^4 = 5q(1 - q)^4$, and the probability of no failed components is $C(5, 0)(1 - q)^5 = (1 - q)^5$. Altogether, the probability that the system functions is given by $10q^2(1 - q)^3 + 5q(1 - q)^4 + (1 - q)^5$. For example, when q = 0.01, the system will operate with probability 0.99999015 and thus fail with probability 0.00000985; this shows how adding redundancy to a system composed of unreliable components (1% failure rate) produces a highly reliable system (approximately 0.001% failure rate).

III. RECURRENCE RELATIONS AND GENERATING FUNCTIONS

A. Recurrence Relations and Counting Problems

Not all counting problems can be solved as readily and as directly as in Section II. In fact, the best way to solve specific counting problems is often to solve instead a more general, and presumably more difficult, problem. One technique for doing this involves the use of recurrence relations.

Recall that the binomial coefficients satisfy the relation:

C(n, k) = C(n − 1, k − 1) + C(n − 1, k)

Such an expression shows how the value C(n, k) can be calculated from certain "prior" values C(n − 1, k − 1) and C(n − 1, k). This type of relation is termed a recurrence relation, since it enables any specific value in the sequence to be obtained from certain previously calculated values.

EXAMPLE 11. How many strings of eight binary digits contain no consecutive pair of zeros?

Solution. It is easy to find the number $f_1$ of such strings of length 1, since the strings "0" and "1" are both acceptable, yielding $f_1 = 2$. Also, the only forbidden string of length 2 is "00", so $f_2 = 3$. There are three forbidden strings of length 3 ("001," "100," and "000"), whereupon $f_3 = 5$. At this point, it becomes tedious to calculate subsequent values directly, but they can be easily found by noticing that a certain recurrence relation governs the sequence $f_n$. In an acceptable string of length n, either the first digit is a 1 or it is a 0. In the former case, the remaining digits can be any acceptable string of length n − 1 (and there are $f_{n-1}$ of these). In the latter case, the second digit must be a 1 and then the remaining digits must form an acceptable string of length n − 2 (there are $f_{n-2}$ of these). These observations provide the recurrence relation:

$$f_n = f_{n-1} + f_{n-2} \tag{2}$$

Using the initial conditions $f_1 = 2$ and $f_2 = 3$, the values $f_3, f_4, \ldots, f_8$ can be calculated in turn by substitution into Eq. (2):

$f_3 = 5$, $f_4 = 8$, $f_5 = 13$, $f_6 = 21$, $f_7 = 34$, $f_8 = 55$

Therefore, 55 binary strings of length 8 have the desired property.

In this problem it was clearly expedient to solve the general problem by use of a recurrence relation that stressed the interdependence of solutions to related problems. The particular sequence obtained for this problem, [1, 2, 3, 5, 8, 13, 21, 34, . . .], with $f_0 = 1$ added for convenience, is called the Fibonacci sequence, and it arises in numerous problems of mathematics as well as biology, physics, and computer science.

EXAMPLE 12. Suppose that ten straight lines are drawn in a plane so that no two lines are parallel and no three intersect at a single point. Into how many different regions will the plane be divided by these lines?

Solution. As seen in Fig. 2, the number of regions created by one straight line is $f_1 = 2$, the number created by two lines is $f_2 = 4$, and the number created by three lines is $f_3 = 7$. The picture becomes excessively complicated with more added lines, so it is prudent to seek a general solution for $f_n$, the number of regions created by n lines in the plane. Suppose that n − 1 lines have already been drawn and that line n is now added. Because the lines are all mutually nonparallel, line n must intersect each existing line exactly once. These n − 1 intersection points divide the new line into n segments and each segment serves to subdivide an existing region into two regions. Thus, the n segments increase the number of regions by exactly n, producing the recurrence relation:

$$f_n = f_{n-1} + n$$

Given the initial condition $f_1 = 2$, application of this recurrence relation yields the values $f_2 = f_1 + 2 = 4$ and $f_3 = f_2 + 3 = 7$, as previously verified. In fact, such a recurrence relation can be explicitly solved, giving:

FIGURE 2 Number of regions created by the placement of n lines.
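Both recurrences on this page can be iterated directly from their initial conditions. The sketch below is illustrative: the function names are assumptions, and the brute-force enumeration is included only as an independent check on Eq. (2) for small n.

```python
from itertools import product


def strings_without_00(n):
    """f_n from Eq. (2): f_n = f_{n-1} + f_{n-2}, with f_1 = 2, f_2 = 3."""
    if n == 1:
        return 2
    a, b = 2, 3  # (f_1, f_2)
    for _ in range(n - 2):
        a, b = b, a + b
    return b


def brute_force_without_00(n):
    """Count binary strings of length n with no two consecutive zeros."""
    return sum("00" not in "".join(bits) for bits in product("01", repeat=n))


def regions(n):
    """Example 12: f_n = f_{n-1} + n, with f_1 = 2."""
    f = 2
    for m in range(2, n + 1):
        f += m  # line m is cut into m segments, each splitting one region
    return f


# The recurrence and the direct enumeration agree for small lengths.
for n in range(1, 11):
    assert strings_without_00(n) == brute_force_without_00(n)

print([strings_without_00(n) for n in range(1, 9)])
print([regions(n) for n in range(1, 11)])
```

The enumeration examines all 2^n strings, so it is useful only as a check; the recurrence needs n − 2 additions.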
the number 4 has the five partitions: {1, 1, 1, 1}, {1, 1, 2}, {1, 3}, {2, 2}, and {4}.

EXAMPLE 15. How many partitions are there for the integer n?

Solution. The choices for the number of ones to include as parts are represented by the power series:

$$(1 + x + x^2 + \cdots) = (1 - x)^{-1}$$

where the $x^i$ term means that 1 is to appear i times in the partition. Similarly, the choices for the number of twos to include are given by:

$$(1 + x^2 + x^4 + \cdots) = (1 - x^2)^{-1}$$

the choices for the number of threes are given by:

$$(1 + x^3 + x^6 + \cdots) = (1 - x^3)^{-1}$$

and so forth. Therefore, the number of partitions of n can be found as the coefficient of $x^n$ in the generating function:

$$f(x) = (1 - x)^{-1}(1 - x^2)^{-1}(1 - x^3)^{-1} \cdots$$

EXAMPLE 16. Find the number of partitions of the integer n into distinct parts.

Solution. Since the parts must be distinct, the choices for any integer i are whether to include it ($x^i$) or not ($x^0$) in the given partition. As a result, the generating function for this problem is

$$f(x) = (1 + x)(1 + x^2)(1 + x^3) \cdots$$

For example, the coefficient of $x^8$ in the expansion of f(x) is found to be 6, meaning that there are six partitions of 8 into distinct parts: namely, {8}, {1, 7}, {2, 6}, {3, 5}, {1, 2, 5}, {1, 3, 4}.

IV. INCLUSION–EXCLUSION PRINCIPLE

Another important counting technique is based on the idea of successively adjusting an initial count through systematic additions and subtractions that are guaranteed to produce a correct final answer. This technique, called the inclusion–exclusion principle, is applicable to many instances where direct counting would be impractical.

As a simple example, suppose we wish to count the number of elements that are not in some subset A of a given universal set U. Then, the required number of elements equals the total number of elements in U, denoted by N = N(U), minus the number of elements in A, denoted by N(A). Expressed in this notation,

N(A′) = N − N(A)

where A′ = U − A designates the set of elements in U that do not appear in A. Figure 3a depicts this relation using a Venn diagram (John Venn, 1834–1923), in which the enclosing rectangle represents the set U, the inner circle represents A, and the shaded portion represents A′. The quantity N(A′) is thus obtained by excluding N(A) elements from N.

EXAMPLE 17. The letters a, b, c, d, e are used to form five-letter words, using each letter exactly once. How many words do not contain the sequence bad?

Solution. The universe here consists of all words, or permutations, formed from the five letters, so there are N = 5! = 120 words in total. Set A consists of all such words containing bad. By treating these three letters as a new "megaletter" x, the set A equivalently contains all words formed from x, c, e, so N(A) = 3! = 6. The number of words not containing bad is then N − N(A) = 120 − 6 = 114.

Figure 3b shows the situation for two sets, A and B, contained in the universal set U. As this figure suggests,

FIGURE 3 Venn diagrams relative to a universal set U. (a) Sets A and A′; (b) Sets A, B, and A′B′.
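The complement count N − N(A) used in Example 17 can be confirmed by listing the universe explicitly. This is a small illustrative check, not part of the original text; the variable names are assumptions.

```python
from itertools import permutations
from math import factorial

letters = "abcde"

# Universe U: every five-letter word using each letter exactly once.
universe = ["".join(p) for p in permutations(letters)]

# Set A: the words that contain the block "bad".
with_bad = [word for word in universe if "bad" in word]

N = len(universe)
N_A = len(with_bad)

# The "megaletter" argument predicts N = 5! and N(A) = 3!.
assert N == factorial(5)
assert N_A == factorial(3)

# Complement: words that avoid "bad".
without_bad = N - N_A
print(sorted(with_bad))
print(without_bad)
```

Listing `with_bad` shows the six arrangements of the block together with c and e, which is exactly the megaletter count.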
the number of elements in either A or B (or both) is then the number of elements in A plus the number of elements in B, minus the number of elements in both:

N(A ∪ B) = N(A) + N(B) − N(AB)   (3)

Here A ∪ B denotes the elements either in A or in B, or in both, whereas AB denotes the elements in both A and B. Since the sum N(A) + N(B) counts the elements of AB twice rather than just once, N(AB) is subtracted to remedy the situation. The number of elements N(A′B′) in neither A nor B is N − N(A ∪ B); thus an alternative form of Eq. (3) is

N(A′B′) = N − [N(A) + N(B)] + N(AB)   (4)

This form shows how terms are alternately included and excluded to produce the desired result.

EXAMPLE 18. In blood samples obtained from 50 patients, laboratory tests show that 20 patients have antibodies to type A bacteria, 29 patients have antibodies to type B bacteria, and 8 patients have antibodies to both types. How many patients have antibodies to neither of the two types of bacteria?

Solution. The given data state that N = 50, N(A) = 20, N(B) = 29, and N(AB) = 8. Therefore, by Eq. (4):

N(A′B′) = 50 − [20 + 29] + 8 = 9

meaning that nine patients have antibodies to neither type of bacteria.

The foregoing equations generalize in a natural way to three sets A, B, and C:

N(A ∪ B ∪ C) = N(A) + N(B) + N(C) − [N(AB) + N(AC) + N(BC)] + N(ABC)   (5)

N(A′B′C′) = N − [N(A) + N(B) + N(C)] + [N(AB) + N(AC) + N(BC)] − N(ABC)   (6)

In each of these forms, the final result is obtained by successive inclusions and exclusions, thus justifying these as manifestations of the inclusion–exclusion principle.

EXAMPLE 19. An electronic assembly is composed of components 1, 2, and 3 and functions only when at least two components are working. If all components fail independently of one another with probability q, what is the probability that the entire assembly functions?

Solution. Let A denote the event in which components 1 and 2 work, let B denote the event in which components 1 and 3 work, and let C denote the event in which components 2 and 3 work. Of interest is the probability that at least one of the events A, B, or C occurs. The analogous form of Eq. (5) in the probabilistic case is

Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − [Pr(AB) + Pr(AC) + Pr(BC)] + Pr(ABC)

Here, Pr(A) is the probability of event A occurring, Pr(AB) is the probability of event AB occurring, and so forth. Notice that $\Pr(A) = \Pr(B) = \Pr(C) = (1 - q)^2$ and $\Pr(AB) = \Pr(AC) = \Pr(BC) = \Pr(ABC) = (1 - q)^3$, so the assembly functions with probability:

$$\Pr(A \cup B \cup C) = 3(1 - q)^2 - 3(1 - q)^3 + (1 - q)^3 = 3(1 - q)^2 - 2(1 - q)^3$$

Two positive integers are called relatively prime if the only positive integer evenly dividing both is the number 1. For example, 7 and 15 are relatively prime, whereas 6 and 15 are not (they share the common divisor 3).

EXAMPLE 20. How many positive integers not exceeding 60 are relatively prime to 60?

Solution. The appropriate universe here is U = {1, 2, . . . , 60} and the (prime) divisors of N = 60 are 2, 3, 5. Relative to U, let A be the set of integers divisible by 2, let B be the set of integers divisible by 3, and let C be the set of integers divisible by 5. The problem here is to calculate N(A′B′C′), the number of integers that share no divisor greater than 1 with 60. Because every other integer is divisible by 2, we have N(A) = 60/2 = 30. Similarly, N(B) = 60/3 = 20 and N(C) = 60/5 = 12. Because any integer divisible by both 2 and 3 must also be divisible by 6, we have N(AB) = 60/6 = 10. Likewise, N(AC) = 60/10 = 6, N(BC) = 60/15 = 4, and N(ABC) = 60/30 = 2. Substituting these values into Eq. (6) gives:

N(A′B′C′) = 60 − [30 + 20 + 12] + [10 + 6 + 4] − 2 = 16

so there are 16 positive integers not exceeding 60 that are relatively prime to 60.

EXAMPLE 21. Each package of a certain product contains one of three possible prizes. How likely is it that a purchaser of five packages of the product will get at least one of each prize?

Solution. The number of possible ways in which this event can occur will first be calculated, after which a probabilistic statement can be deduced. Let the prizes be denoted a, b, c, and let the contents of the five packages be represented by a string of five letters (where repetition is allowed). For example, the string bacca would represent one possible occurrence. Define A to be the set of all such strings that do not include a. In similar fashion, define B (respectively, C) to be the set of strings that do not include b (respectively, c). It is required then to calculate
N(A′B′C′), the number of instances in which a, b, and c all occur. The approach will be to use the inclusion–exclusion relation, Eq. (6), to calculate this quantity. Here, N is the number of strings of five letters over the alphabet {a, b, c}, so (by the product rule) $N = 3^5 = 243$. Also, N(A) is the number of strings from the alphabet {b, c}, whereupon $N(A) = 2^5 = 32$. By similar reasoning, N(B) = N(C) = 32, $N(AB) = N(AC) = N(BC) = 1^5 = 1$, and N(ABC) = 0. As a result,

N(A′B′C′) = 243 − 3(32) + 3(1) − 0 = 150

In summary, the total number of possible strings is 243 and all three prizes are obtained in 150 of these cases. If all strings are equally likely (i.e., any package is just as likely to contain each prize), then the required probability is 150/243 = 0.617, indicating a better than 60% chance of obtaining three different prizes in just five packages. A similar analysis shows that for six packages, the probability of obtaining all three prizes increases to 0.741; for seven packages, there is a 0.826 probability.

V. EXISTENCE PROBLEMS

A. Pigeonhole Principle

In certain combinatorial problems, it may be exceedingly difficult to count the number of arrangements of a prescribed type. In fact, it might not even be clear that any such arrangement actually exists. What is needed then is some guiding mathematical assurance that configurations of the desired type do indeed exist. One such principle is the so-called pigeonhole principle. While its statement is overwhelmingly self-evident, its applications range from the simplest to the most challenging problems in combinatorics.

The pigeonhole principle states that if there are more than k objects (or pigeons) to be placed in k locations (or pigeonholes), then some location must house two (or possibly more) such objects. A simple illustration of this principle assures that at least two residents of a town with 400 inhabitants have the same birthday. Here, the objects are the residents and the locations are the 366 possible birthdays. Since there are more objects than locations, some location must contain at least two objects, meaning that some two residents (or more) must share the same birthday. Notice that this principle only guarantees the existence of two such residents; it does not give any information about finding them.

EXAMPLE 22. In any selection of five different elements from {1, 2, . . . , 8}, there must be some pair of selected elements that sum to 9.

Solution. Here the objects are the numbers 1, 2, . . . , 8 and the locations are the four sets $A_1 = \{1, 8\}$, $A_2 = \{2, 7\}$, $A_3 = \{3, 6\}$, and $A_4 = \{4, 5\}$. Notice that for each of these sets its two elements sum to 9. According to the pigeonhole principle, placing five numbers into these four sets results in some set $A_i$ having both its elements selected, so the sum of these two elements (by the construction of set $A_i$) must equal 9.

EXAMPLE 23. In a room with n ≥ 2 persons, there must be two persons having exactly the same number of friends in the room.

Solution. The number of possible friendships for any given person ranges from 0 to n − 1. However, if n − 1 occurs then that person is a friend of everyone else, and (assuming that friendship is a mutual relation) no other person can be without friends. Thus, both 0 and n − 1 cannot simultaneously occur in a group of n persons. If 1, 2, . . . , n − 1 are the possible numbers of friendships, then using these n − 1 numbers as locations for the n persons (objects), the pigeonhole principle assures that some number in {1, 2, . . . , n − 1} must appear twice. A similar result can be established for the case {0, 1, . . . , n − 2}, demonstrating there must always be at least two persons having the same number of friends in the room.

Two strings $x = x_1 x_2 \cdots x_n$ and $y = y_1 y_2 \cdots y_m$ over the alphabet {a, b, . . . , z} are said to be disjoint if they share no common letter and are said to overlap otherwise.

EXAMPLE 24. In any collection of at least six strings, there must either be three strings that are mutually disjoint or three strings that mutually overlap.

Solution. For a given string x, let the k ≥ 5 other strings in the collection be divided into two groups, D and O; D consists of those strings that are disjoint from x, and O consists of those that overlap with x. By a generalization of the pigeonhole principle, one of these two sets must contain at least three elements. Suppose that it is set D. Then, either D contains three mutually overlapping strings or it contains two disjoint strings y and z. In the first case, these three strings satisfy the stated requirements. In the second case, the elements x, y, z are all mutually disjoint, so again the requirements are met. A similar argument can be made if O is the set containing at least three elements. In any event, there will either be three mutually disjoint strings or three mutually overlapping strings.

This last example is a special case of Ramsey's theorem (Frank Ramsey, 1903–1930), which asserts that if there are enough objects, then configurations of certain types must exist. Not only does this theorem (which generalizes the pigeonhole principle) produce some very deep combinatorial results, but it has also
been applied to problems arising in geometry, the design of communication networks, and information retrieval.

B. Combinatorial Designs

Combinatorial designs involve ways of arranging objects into various groups in order to meet specified requirements. Such designs find application in the planning of statistical experiments as well as in other areas of mathematics (number theory, coding theory, geometry, and algebra).

As one illustration, suppose that an experiment is to be designed to test the effects of five different drugs using five different subjects. One clear requirement is that each subject should receive all five drugs, since otherwise the results could be biased by variation among the subjects. Each drug is to be administered for one week, so that at the end of five weeks the experiment will be completed. However, the order in which drugs are administered could also have an effect on their observed potency, so it is also desirable for all drugs to be represented on any given week of the experiment. One way of designing such an experiment is depicted in Fig. 4, which shows one source of variation, the subjects (S1 , S2 , . . . , S5 ), appearing along the rows and the other source of variation, the weeks (W1 , W2 , . . . , W5 ), appearing along the columns. The entries within each row show the order in which the drugs (A, B, . . . , E) are administered to each subject on a weekly basis. Such an arrangement is termed a Latin square, since the five treatments (drugs) appear exactly once in each row and exactly once in each column.

Figure 4 clearly demonstrates the existence of a 5 × 5 Latin square; more generally, Latin squares of size n × n exist for each value of n ≥ 1. There are also occasions when it is desirable to superimpose certain pairs of n × n Latin squares. An example of this arises in testing the effects of n types of fertilizer and n types of insecticide on the yield of a particular crop. Suppose that a field on which the crop is grown is divided into an n × n grid of plots. In order to minimize vertical and horizontal variations in the composition and drainage properties of the soil, each fertilizer should appear on exactly one plot in each “row” and exactly one plot in each “column” of the grid. Likewise, each insecticide should appear once in each row and once in each column of the grid. In other words, a Latin square design should be used for each of the two treatments. Figure 5a shows a Latin square design for four fertilizer types (A, B, C, D), and Fig. 5b shows another Latin square design for four insecticide types (a, b, c, d). In addition, the fertilizer and insecticide treatments can themselves interact, so an ideal design would ensure that each of the n² possible combinations of the n fertilizers and n insecticides appears together exactly once. Figure 5c shows that the two Latin squares in Figs. 5a and b (when superimposed) have this property; namely, each fertilizer–insecticide pair occurs exactly once on a plot. Such a pair of Latin squares is called orthogonal.

A pair of orthogonal n × n Latin squares need not exist for all values of n ≥ 2. However, it has been proved that the only exceptions occur when n = 2 and n = 6. In all other cases, an orthogonal pair can be constructed.

Latin squares are special instances of complete designs, since every treatment appears in each row and in each
column. Another useful class of combinatorial designs is one in which not all treatments appear within each test group. Such incomplete designs are especially relevant when the number of treatments is large relative to the number of tests that can be performed on an experimental unit.

As an example of an incomplete design, consider an experiment in which subjects are to compare v = 7 brands of soft drink (A, B, . . . , G). For practical reasons, every subject is limited to receiving k = 3 types of soft drink. Moreover, to ensure fairness in the representation of the various beverages, each soft drink should be tasted by the same number r = 3 of subjects, and each pair of soft drinks should appear together the same number λ = 1 of times. It turns out that such a design can be constructed using b = 7 subjects, with the soft drinks compared by each subject i given by the set Bi below:

B1 = {A, B, D}; B2 = {A, C, F};
B3 = {A, E, G}; B4 = {B, C, G};
B5 = {B, E, F}; B6 = {C, D, E};
B7 = {D, F, G}

The sets Bi are referred to as blocks, and such a design is termed a (b, v, r, k, λ) balanced incomplete block design. In any such design, the parameters b, v, r, k, λ must satisfy the following conditions:

bk = vr, λ(v − 1) = r(k − 1)

In the previous example, these relations hold since 7 × 3 = 7 × 3 and 1 × 6 = 3 × 2. While the above conditions must hold for any balanced incomplete block design, there need not exist a design corresponding to every set of parameters satisfying these conditions.
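The seven-block soft-drink design can be checked mechanically. The following Python sketch is an illustration only: the function name `bibd_parameters` and the encoding of blocks as strings are assumptions made here, not part of the article. It counts how often each treatment and each pair of treatments occurs and returns (b, v, r, k, λ) when the design is balanced.

```python
from collections import Counter
from itertools import combinations

# Blocks B1..B7 of the soft-drink design, one string per subject.
BLOCKS = ["ABD", "ACF", "AEG", "BCG", "BEF", "CDE", "DFG"]


def bibd_parameters(blocks):
    """Return (b, v, r, k, lam) if the blocks form a balanced incomplete
    block design, or None if block sizes, replication counts, or pair
    counts are not all equal."""
    treatments = sorted(set("".join(blocks)))
    block_sizes = {len(set(block)) for block in blocks}
    # How many blocks contain each treatment.
    replication = Counter(t for block in blocks for t in set(block))
    # How many blocks contain each unordered pair of treatments.
    pair_counts = Counter(
        frozenset(pair)
        for block in blocks
        for pair in combinations(sorted(set(block)), 2)
    )
    r_values = {replication[t] for t in treatments}
    lam_values = {
        pair_counts[frozenset(pair)] for pair in combinations(treatments, 2)
    }
    if len(block_sizes) != 1 or len(r_values) != 1 or len(lam_values) != 1:
        return None
    return (len(blocks), len(treatments),
            r_values.pop(), block_sizes.pop(), lam_values.pop())


if __name__ == "__main__":
    b, v, r, k, lam = bibd_parameters(BLOCKS)
    print("b, v, r, k, lambda =", b, v, r, k, lam)
    print("bk == vr:", b * k == v * r)
    print("lambda(v-1) == r(k-1):", lam * (v - 1) == r * (k - 1))
```

A check of this kind confirms a given design but says nothing about whether a design exists for other admissible parameter sets.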
P1: ZCK Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN004E-183 June 8, 2001 18:23
I. System Models
II. Linear Evolution Equations
III. Nonlinear Evolution Equations and Differential
Inclusions
IV. Recent Advances in Infinite-Dimensional
Systems and Control
In this section we present a few typical examples of distributed parameter systems. The Laplace equation in $R^n$ is given by:

$$\Delta\phi \equiv \sum_{i=1}^{n} \frac{\partial^2 \phi}{\partial x_i^2} = 0 \qquad (1)$$

magnetostatic fields, and also by the temperature of a body in thermal equilibrium. In addition to its importance on its own merit, the Laplacian is also used as a basic operator in diffusion and wave propagation problems. For example, the temperature of a body is governed by the so-called heat equation:

$$\frac{\partial T}{\partial t} = k\,\Delta T + f(t, x), \qquad (t, x) \in I \times \Omega,\quad \Omega \subset R^3 \qquad (2)$$

where k is the thermal conductivity of the material and f is an internal source of heat. The classical wave equation in $R^n$ is given by:

where y represents the displacement of the elastic body from its unstrained configuration, ρ is the mass density, and λ, µ are Lamé constants. In $R^3$, the state of a single particle of mass m subject to a field of potential $v = v(t, x_1, x_2, x_3)$ is given by the Schrödinger equation:

$$i\hbar\,\frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta\psi + v\psi$$

In recent years the nonlinear Schrödinger equation has been used to take into account nonlinear interaction of particles in a beam by replacing vψ by a suitable nonlinear function g(t, x, ψ).

An elastic beam allowing moderately large vibration is governed by a nonlinear equation of the form:

$$\rho A\,\frac{\partial^2 y}{\partial t^2} + \beta\,\frac{\partial y}{\partial t} + \frac{\partial^2}{\partial x^2}\left(EI\,\frac{\partial^2 y}{\partial x^2}\right) + N\,\frac{\partial^2 y}{\partial x^2} = f(t, x)$$

$$N = \frac{EA}{2l}\int_0^l \left(\frac{\partial y}{\partial x}\right)^2 dx \qquad (7)$$

$$x \in (0, l), \quad t \ge 0$$
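As a numerical illustration of the heat equation (2), the sketch below integrates the one-dimensional case $\partial T/\partial t = k\,\partial^2 T/\partial x^2$ on (0, 1) with no heat source and zero boundary temperature, using the explicit finite-difference scheme. The one-dimensional setting, the initial profile $\sin(\pi x)$, the function names, and the step-size parameter `r_max` are all assumptions chosen for illustration; the exact solution $e^{-k\pi^2 t}\sin(\pi x)$ is used only as a check.

```python
import math


def solve_heat_explicit(k, n_cells, t_final, r_max=0.25):
    """Explicit finite differences for T_t = k T_xx on (0, 1) with
    T = 0 at both ends and T(0, x) = sin(pi x).

    The time step is chosen so that r = k dt / dx^2 <= r_max <= 1/2,
    which keeps the explicit scheme stable."""
    dx = 1.0 / n_cells
    steps = max(1, math.ceil(t_final * k / (r_max * dx * dx)))
    dt = t_final / steps
    r = k * dt / (dx * dx)
    temp = [math.sin(math.pi * i * dx) for i in range(n_cells + 1)]
    temp[0] = temp[-1] = 0.0  # boundary values
    for _ in range(steps):
        interior = [
            temp[i] + r * (temp[i + 1] - 2.0 * temp[i] + temp[i - 1])
            for i in range(1, n_cells)
        ]
        temp = [0.0] + interior + [0.0]
    return temp


def exact_solution(k, x, t):
    """Closed-form solution for the sin(pi x) initial profile."""
    return math.exp(-k * math.pi ** 2 * t) * math.sin(math.pi * x)


if __name__ == "__main__":
    numeric = solve_heat_explicit(k=1.0, n_cells=20, t_final=0.1)
    error = max(
        abs(numeric[i] - exact_solution(1.0, i / 20, 0.1)) for i in range(21)
    )
    print("maximum error against the exact solution:", error)
```

The restriction on r reflects the parabolic character of the equation: the admissible time step shrinks quadratically with the mesh width.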
For simplicity we consider time-invariant systems, although the result given below holds for the general case.

Theorem 1. Consider system (33) with the operator A time invariant and self-adjoint and suppose it satisfies the conditions:

(a1) $|\langle A\phi, \psi\rangle| \le c\,\|\phi\|_V \|\psi\|_V$, $c \ge 0$, $\phi, \psi \in V$

(a2) $\langle A\phi, \phi\rangle + \lambda|\phi|_H^2 \ge \alpha\|\phi\|_V^2$, $\lambda \in R$, $\alpha > 0$, $\phi \in V$

Then, for every $\phi_0 \in V$, $\phi_1 \in H$, and $f \in L_2(I, H)$, system (33) has a unique solution φ satisfying:

(c1) $\phi \in L_\infty(I, V) \cap C(\bar I, V)$

(c2) $\dot\phi \in L_\infty(I, V) \cap C(\bar I, H)$

and

Note that the second-order evolution equation (33) can be written as a first-order equation $d\psi/dt + \tilde A\psi = \tilde f$, where

$$\psi = \begin{pmatrix}\phi\\ \dot\phi\end{pmatrix}, \qquad \tilde A = \begin{pmatrix}0 & -I\\ A & 0\end{pmatrix}, \qquad \tilde f = \begin{pmatrix}0\\ f\end{pmatrix} \qquad (34)$$

Defining X = V × H, with the product topology, as the state space, it follows from Theorem 1(c3) that there exists an operator-valued function S(t), t ≥ 0, with values $S(t) \in \mathcal{L}(X)$ so that

$$\psi(t) = S(t)\psi_0 + \int_0^t S(t - \theta)\tilde f(\theta)\,d\theta \qquad (35)$$

The family of operators {S(t), t ≥ 0} forms a $c_0$-semigroup in X and $\psi \in C(\bar I, X)$, where $C(\bar I, X)$ is the space of continuous functions on $\bar I$ with values in the Banach space X with the norm (topology):

$$\|f\| = \sup\{|f(t)|_X,\ t \in \bar I\}$$

For system (32) one can prove the following result using the same procedure.

Theorem 2. Consider system (32) and suppose that the operator A satisfies assumptions (a1) and (a2) (see Theorem 1). Then, for each $\phi_0 \in H$ and $f \in L_2(I, V^*)$, system (32) has a unique solution:

$$\phi \in L_2(I, V) \cap L_\infty(I, H) \cap C(\bar I, H)$$

and further $(\phi_0, f) \to \phi$ is a continuous map from $H \times L_2(I, V^*)$ to $C(\bar I, H)$.

As a consequence of this result there exists an evolution operator U(t, s), 0 ≤ s ≤ t ≤ ∞, with values $U(t, s) \in \mathcal{L}(H)$ such that

$$\phi(t) = U(t, 0)\phi_0 + \int_0^t U(t, \theta) f(\theta)\,d\theta \qquad (36)$$

(c) Strong limit

$$\lim_{t \downarrow 0^+} S(t)\xi = \xi, \qquad \xi \in X$$

The operator S(t), t ≥ 0, satisfying the above properties is called a strongly continuous semigroup or, in short, a $c_0$-semigroup. Let A be a closed, densely defined linear operator with domain D(A) ⊂ X and range R(A) ⊂ X. Suppose there exist numbers M > 0 and ω ∈ R such that

$$\|(\lambda I - A)^{-1}\|_{\mathcal{L}(X)} \le M/(\lambda - \omega)$$

for all real λ > ω. Then, by a fundamental theorem from the semigroup theory known as the Hille–Yosida theorem, there exists a unique $c_0$-semigroup S(t), t ≥ 0, with A as its infinitesimal generator. The semigroup S(t), t ≥ 0, satisfies the properties:

(a) $\|S(t)\|_{\mathcal{L}(X)} \le M e^{\omega t}$, $t \ge 0$

(b) for $\xi \in D(A)$, $S(t)\xi \in D(A)$ for all $t \ge 0$ (38)

(c) for $\xi \in D(A)$,

$$\frac{d}{dt}S(t)\xi = AS(t)\xi = S(t)A\xi, \qquad t \ge 0$$
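The semigroup properties can be checked directly in a finite-dimensional setting, where S(t) is a matrix exponential. The sketch below is an illustration under assumptions made here: it takes the scalar oscillator $\ddot\phi + \phi = 0$ (that is, A = 1 in the first-order form (34)), so that the generator $-\tilde A$ is a 2 × 2 matrix, and it evaluates $e^{tG}$ by a truncated power series. The helper names `mat_mul`, `mat_exp`, `S`, and `act` are hypothetical.

```python
import math


def mat_mul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def mat_exp(g, t, terms=40):
    """Truncated power series for exp(t g), with g a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes (t g)^n / n!
        term = mat_mul(term, [[t * entry / n for entry in row] for row in g])
        result = [
            [result[i][j] + term[i][j] for j in range(2)] for i in range(2)
        ]
    return result


# Generator -A~ of form (34) for the scalar case A = 1.
GENERATOR = [[0.0, 1.0], [-1.0, 0.0]]


def S(t):
    """Semigroup S(t) = exp(t * GENERATOR)."""
    return mat_exp(GENERATOR, t)


def act(p, x):
    """Apply the 2x2 matrix p to the vector x."""
    return [p[0][0] * x[0] + p[0][1] * x[1], p[1][0] * x[0] + p[1][1] * x[1]]


if __name__ == "__main__":
    xi = [1.0, 2.0]
    print("S(0.3) S(0.5):", mat_mul(S(0.3), S(0.5)))
    print("S(0.8):       ", S(0.8))
    # Norm of S(t) xi stays equal to the norm of xi (M = 1, omega = 0).
    print("norms:", math.hypot(*xi), math.hypot(*act(S(0.8), xi)))
```

In this example the semigroup is a rotation, so it is a contraction semigroup with M = 1 and ω = 0, and the derivative identity in (c) holds for every ξ because the generator is bounded.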
If ω = 0, we have a bounded semigroup; for ω = 0 and M = 1 we have a contraction semigroup, and for ω < 0, we have a dissipative semigroup. In general, the abstract Cauchy problem:

$$\frac{dy}{dt} = Ay, \qquad y(0) = \xi \qquad (39)$$

has a unique solution y(t) = S(t)ξ, t > 0, with y(t) ∈ D(A), provided ξ ∈ D(A). If ξ ∈ D(A) and f is any strongly continuously differentiable function with values f(t) ∈ X, then the inhomogeneous problem,

$$\frac{dy}{dt} = Ay + f, \qquad y(0) = \xi \qquad (40)$$

has a unique continuously differentiable solution y given by:

$$y(t) = S(t)\xi + \int_0^t S(t - \theta) f(\theta)\,d\theta \qquad (41)$$

with y(t) ∈ D(A) for all t ≥ 0.

A solution satisfying these conditions is called a classical solution. For control problems these conditions are rather too strong since, in general, we do not expect the controls, for example f(t), to be even continuous. Thus, there is a need for a broader definition and this is provided by the so-called mild solution.

Any function y: I → X having the integral representation (41) is called a mild solution of problem (40). In this regard we have the following general result.

One reason for calling the analytic semigroups parabolic semigroups is that S(t), t ≥ 0, turns out to be the fundamental solution of certain parabolic evolution equations. Consider the differential operator,

$$L(x, D) = \sum_{|\alpha| \le 2m} a_\alpha(x) D^\alpha$$

and suppose it is strongly elliptic of order 2m and

$$B_j(x, D) = \sum_{|\alpha| \le m_j} b_{j,\alpha} D^\alpha, \qquad 0 \le j \le m - 1$$

is a set of normal boundary operators as defined earlier. Define

$$D(A) = W^{2m,p}(\Omega) \cap W_0^{m,p}(\Omega), \qquad (A\phi)(x) = -L(x, D)\phi(x), \quad \phi \in D(A) \qquad (43)$$

Then, under certain technical assumptions, A generates an analytic semigroup S(t), t ≥ 0, in the Banach space $X = L_p(\Omega)$. The initial boundary value problem,

$$\frac{\partial\phi}{\partial t}(t, x) + L(x, D)\phi(t, x) = f(t, x), \qquad t \in (0, T),\ x \in \Omega$$

$$(D^\alpha\phi)(t, x) = 0, \qquad |\alpha| \le m - 1,\ t \in (0, T),\ x \in \partial\Omega \qquad (44)$$

$$\phi(0, x) = \phi_0(x), \qquad x \in \Omega$$
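The integral representation (41) remains meaningful when f is not continuous, which is the point of the mild-solution concept. As a minimal sketch, assume the scalar case X = R with A = a, so that $S(t) = e^{at}$, and take a control that is switched off at θ = 0.5; the function names and the midpoint quadrature are choices made here for illustration only.

```python
import math


def mild_solution(a, xi, f, t, n=2000):
    """Evaluate y(t) = S(t) xi + int_0^t S(t - theta) f(theta) d(theta)
    for the scalar semigroup S(t) = exp(a t), using the midpoint rule
    with n subintervals."""
    h = t / n
    integral = h * sum(
        math.exp(a * (t - (j + 0.5) * h)) * f((j + 0.5) * h) for j in range(n)
    )
    return math.exp(a * t) * xi + integral


def bang_control(theta):
    """Discontinuous input: equal to 1 on [0, 0.5) and 0 afterwards."""
    return 1.0 if theta < 0.5 else 0.0


if __name__ == "__main__":
    a, xi, t = -2.0, 1.0, 1.0
    # Closed form of (41) for this input and t >= 0.5.
    closed = math.exp(a * t) * xi + (math.exp(a * t) - math.exp(a * (t - 0.5))) / a
    print("quadrature: ", mild_solution(a, xi, bang_control, t))
    print("closed form:", closed)
```

The resulting y is continuous in t but not differentiable at the switching time, so it is a mild solution of (40) without being a classical one.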
$T \in \mathcal{L}(H)$, $T = T^*$, and $(T\xi, \xi) > 0$ for $\xi\ (\ne 0) \in H$. We can prove the following result.

Theorem 4. A necessary and sufficient condition for the system dy/dt = Ay to be exponentially stable in the Lyapunov sense (i.e., there exist M ≥ 1, β > 0 such that $\|y(t)\| \le M e^{-\beta t}$) is that the operator equation,

$$A^* Y + YA = -\Gamma \qquad (47)$$

has a solution $Y \in \mathcal{L}^+(H)$ for each $\Gamma \in \mathcal{L}^+(H)$ with $(\Gamma\xi, \xi) \ge \gamma|\xi|_H^2$ for some γ > 0.

REMARK. Equation (47) is understood in the sense that the equality,

$$0 = (\Gamma\xi, \eta) + (A\xi, Y\eta) + (Y\xi, A\eta)$$

holds for all ξ, η ∈ D(A).

Corollary 5. If the autonomous system (46) is asymptotically stable, then the system,

$$\frac{dy}{dt} = Ay + f, \qquad t \ge 0 \qquad (48)$$

is stable in the $L_p$ sense; that is, for every $y_0 \in H$ and input $f \in L_p(0, \infty; H)$, the output $y \in L_p(0, \infty; H)$ for all 1 ≤ p ≤ ∞ and in particular, for 1 ≤ p < ∞, y(t) → 0 as t → ∞.

We conclude this section with a remark on the solvability of Eq. (47). Let $\{\lambda_i\}$ be the eigenvalues of the operator A, each repeated as many times as its multiplicity requires, and let $\{\xi_i\}$ denote the corresponding eigenvectors complete in H. Consider Eq. (47) and form

$$(A^* Y\xi_i, \xi_j) + (YA\xi_i, \xi_j) = -(\Gamma\xi_i, \xi_j) \qquad (49)$$

for all integers i, j ≥ 1. Clearly, from this equation, it follows that

$$(Y\xi_i, \xi_j) = -(\Gamma\xi_i, \xi_j)/(\lambda_i + \bar\lambda_j) \qquad (50)$$

Hence, if $\lambda_i + \bar\lambda_j \ne 0$ for all i, j, Eq. (50) determines Y uniquely, and if $\lambda_i + \bar\lambda_i = 2\,\mathrm{Re}\,\lambda_i < 0$ for all i, then $Y \in \mathcal{L}^+(H)$. In other words, if system (46) is asymptotically stable then the operator equation (47) always has a positive solution.

A system analyst may know the structure of the system, for example, the order of the differential operator and its type (parabolic, hyperbolic, etc.), but the parameters are not all known. In that case, the analyst must identify the unknown parameters from available information. Consider the natural system to be given by $\dot y^* = A(q^*)y^*$. Assume that $q^*$ is unknown to the observer but the observer can observe certain data $z^*$ from a Hilbert space K, the output space, which corresponds to the natural history $y^*$. The observer chooses a feasible set Q, where $q^*$ may possibly lie, and constructs the model system,

$$\dot y(q) = A(q)y(q), \qquad y(q)(0) = y_0$$
$$z(q) = Cy(q) \qquad (51)$$

where C is the observation or output operator, an element of $\mathcal{L}(H, K)$. The analyst may choose to identify the parameter approximately by minimizing the functional,

$$J(q) = \frac{1}{2}\int_0^T |z(q) - z^*|_K^2\,dt \quad \text{over } Q \qquad (52)$$

Similarly, one may consider identification of an operator appearing in system equations. For example, one may consider the system,

$$\frac{d^2 y}{dt^2} + Ay + B^* y = f$$
$$y(0) = y_0, \qquad \dot y(0) = y_1 \qquad (53)$$
$$z = Cy$$

where the operator A is known but the operator $B^*$ is unknown. One seeks an element B from a feasible set $P_0 \subset \mathcal{L}(V, V^*)$ so that

$$J(B) = \int_0^T g(t, y(B), \dot y(B))\,dt \qquad (54)$$

is minimum, where g is a suitable measure of discrepancy between the model output, z(B) = Cy(B), and the observed data $z^*$ corresponding to the natural history $y^*$.

In general, one may consider the problem of identification of all the operators A, B including the data $y_0$, $y_1$, and f. For simplicity, we shall consider only the first two problems and present a couple of sample results. First, consider problem (51) and (52).

Theorem 6. Let the feasible set of parameters Q be a compact subset of a metric space and suppose for each q ∈ Q that A(q) is the generator of a strongly continuous contraction semigroup in H. Let

in the strong operator topology for each γ > 0 whenever $q_n \to q_0$ in Q. Then there exists $q^0 \in Q$ at which J(q) attains its minimum.

Proof. The proof follows from the fact that under assumption (55) the semigroup $S_n(t)$, t ≥ 0, corresponding to $q_n$ strongly converges on compact intervals to the semigroup $S_0(t)$, t ≥ 0, corresponding to $q_0$. Therefore, J is continuous on Q and, Q being compact, it attains its minimum on Q.
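Theorem 6 asserts that the cost (52) attains its minimum on a compact feasible set. A minimal numerical sketch of such an identification problem follows, under assumptions introduced only for illustration: the state is scalar, A(q) = −q, the output operator C is the identity, $y_0 = 1$, and the compact set Q = [0.5, 3] is replaced by a finite grid. The names `cost` and `identify` are hypothetical.

```python
import math


def cost(q, observed, t_final=2.0, n=400):
    """J(q) = 1/2 * int_0^T |y(q)(t) - z*(t)|^2 dt for the scalar model
    dy/dt = -q y, y(0) = 1, approximated by the midpoint rule."""
    h = t_final / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += (math.exp(-q * t) - observed(t)) ** 2
    return 0.5 * total * h


def identify(observed, q_min=0.5, q_max=3.0, n_grid=251):
    """Minimize J over a uniform grid on the compact set [q_min, q_max]."""
    grid = [q_min + i * (q_max - q_min) / (n_grid - 1) for i in range(n_grid)]
    return min(grid, key=lambda q: cost(q, observed))


if __name__ == "__main__":
    q_true = 1.3  # parameter of the "natural system", unknown to the observer
    data = lambda t: math.exp(-q_true * t)
    print("identified parameter:", identify(data))
```

An exhaustive grid search is used only because it mirrors the compactness argument; in practice a gradient-based method would normally replace it.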
The significance of the above result is that the identification problem is well posed.

For the second-order evolution equation (53) we have the following result.

Theorem 7. Consider system (53). Let $P_0$ be a compact (in the sense of strong operator topology) subset of the ball,

$$P_b \equiv \{B \in \mathcal{L}(V, V^*) : \|B\|_{\mathcal{L}(V, V^*)} \le b\}$$

Then, for each g defined on I × V × H which is measurable in the first variable and lower semicontinuous in the rest, the functional J(B) of Eq. (54) attains its minimum on $P_0$.

The best operator $B^0$ minimizing the functional J(B) can be determined by use of the following necessary conditions of optimality.

Theorem 8. Consider system (53) along with the functional,

$$J(B) \equiv \frac{1}{2}\int_0^T |Cy(B) - z^*(t)|_K^2\,dt$$

with the observed data $z^* \in L_2(I, K)$, the observer $C \in \mathcal{L}(H, K)$, $f \in L_2(I, H)$, $y_0 \in V$, $y_1 \in H$ and $P_0$ as in Theorem 7. Then, for $B^0$ to be optimal, it is necessary that there exists a pair {y, x} satisfying the equations,

$$\ddot y + Ay + B^0 y = f$$
$$\ddot x + A^* x + (B^0)^* x = C^*\Lambda_K\,(Cy(B^0) - z^*)$$

proper subset of $L_p^{\mathrm{loc}}(E) \equiv$ locally pth power summable E-valued functions on $R_0 = [0, \infty)$. For a given initial state $\phi_0 \in X$,

$$\phi(t) = S(t)\phi_0 + \int_0^t S(t - \theta)Bu(\theta)\,d\theta, \qquad t \ge 0$$

denotes the mild (weak) solution of problem (58).

Given $\phi_0 \in X$ and a desired target $\phi_1 \in X$, is it possible to find a control from $\mathcal{U}$ that transfers the system from state $\phi_0$ to the desired state $\phi_1$ in finite time? This is the basic question of controllability. In other words, for a given $\phi_0 \in X$, one defines the attainable set:

$$\mathcal{A}(t) \equiv \left\{x \in X : x = S(t)\phi_0 + \int_0^t S(t - \theta)Bu(\theta)\,d\theta,\ u \in \mathcal{U}\right\}$$

and inquires if there exists a finite time τ ≥ 0, such that $\phi_1 \in \mathcal{A}(\tau)$ or equivalently,

$$\phi_1 - S(\tau)\phi_0 \in R(\tau) \equiv \mathcal{A}(\tau) - S(\tau)\phi_0$$

The set R(τ), given by:

$$R(\tau) \equiv \left\{\xi \in X : \xi = L_\tau u \equiv \int_0^\tau S(\tau - \theta)Bu(\theta)\,d\theta,\ u \in \mathcal{U}\right\}$$

where
$L_\tau^*$ is the adjoint of the operator $L_\tau$ and $L_\tau u \equiv \int_0^\tau S(\tau - \theta)Bu(\theta)\,d\theta$.

Note that by our definition, here controllability means approximate controllability; that is, one can reach an arbitrary neighborhood of the target but not necessarily the target itself. Another interesting difference between finite- and infinite-dimensional systems is that in case $X = R^n$, $E = R^m$, condition (b) implies that $(L_\tau L_\tau^*)^{-1}$ exists and the control achieving the desired transfer from $\phi_0$ to $\phi_1$ is given by:

$$u = L_\tau^*\,(L_\tau L_\tau^*)^{-1}(\phi_1 - S(\tau)\phi_0)$$

For ∞-dimensional systems, the operator $(L_\tau L_\tau^*)$ does not in general have a bounded inverse even though the operator is positive.

Another distinguishing feature is that in the finite-dimensional case the system is controllable if and only if the rank condition,

$$\mathrm{rank}(B, AB, \ldots, A^{n-1}B) = n$$

holds. In the ∞-dimensional case there is no such condition; however, if $BE \subset \bigcap_{n=1}^{\infty} D(A^n)$ then the system is controllable if

$$\mathrm{closure}\left\{\bigcup_{n=0}^{\infty} \mathrm{range}(A^n B)\right\} = X \qquad (59)$$

This condition is also necessary and sufficient if S(t), t ≥ 0, is an analytic semigroup and $BE \subset \bigcup_{t>0} S(t)X$.

In recent years, much more general results that admit very general time-varying operators {A(t), B(t), t ≥ 0}, including hard constraints on controls, have been proved. We conclude this section with one such result. The system,

$$\dot y = A(t)y + B(t)u, \qquad t \ge 0$$
$$y(0) = y_0 \qquad (60)$$

is said to be globally null controllable if it can be steered to the origin from any initial state $y_0 \in X$.

Theorem 10. Let X be a reflexive Banach space and Y a Banach space densely embedded in X with the injection Y ⊂ X continuous. For each t ≥ 0, A(t) is the generator of a $c_0$-semigroup satisfying the stability condition and $A \in L_1^{\mathrm{loc}}(0, \infty; \mathcal{L}(Y, X))$ where $\mathcal{L}(X, Z)$ is the space of bounded linear operators from a Banach space X to a Banach space Z; $\mathcal{L}(X) \equiv \mathcal{L}(X, X)$, $B \in L_q^{\mathrm{loc}}(0, \infty; \mathcal{L}(E, X))$, and $\mathcal{U} = \{u \in L_p^{\mathrm{loc}}(E);\ u(t) \in \Gamma \text{ a.e.}\}$ where Γ is a closed bounded convex subset of E with 0 ∈ Γ and $p^{-1} + q^{-1} = 1$. Then a necessary and sufficient condition for global null controllability of system (60) is that

$$\int_0^\infty H(B^*(t)\psi(t))\,dt = +\infty \qquad (61)$$

for all nontrivial weak solutions of the adjoint system

$$\dot\psi + A^*(t)\psi = 0, \qquad t \ge 0$$

where $H(\xi) = \sup\{(\xi, e)_{E^*, E},\ e \in \Gamma\}$.

E. Existence of Optimal Controls

The question of existence of optimal controls is considered to be a fundamental problem in control theory. In this section, we present a simple existence result for the hyperbolic system,

$$\ddot\phi + A\phi = f + Bu, \qquad t \in I \equiv (0, T)$$
$$\phi(0) = \phi_0, \qquad \dot\phi(0) = \phi_1 \qquad (62)$$

Similar results hold for parabolic systems. Suppose the operator A and the data $\phi_0$, $\phi_1$, f satisfy the assumptions of Theorem 1. Let E be a real Hilbert space and $\mathcal{U}_0 \subset L_2(I, E)$ the class of admissible controls and $B \in \mathcal{L}(E, H)$. Let $S_0 \subset V$ and $S_1 \subset H$ denote the set of admissible initial states. By Theorem 1, for each choice of $\phi_0 \in S_0$, $\phi_1 \in S_1$, and $u \in \mathcal{U}_0$ there corresponds a unique solution φ called the response. The quality of the response is measured through a functional called the cost functional and may be given by an expression of the form,

$$J(\phi_0, \phi_1, u) \equiv \alpha\int_0^T g_1(t, \phi(t), \dot\phi(t))\,dt + \beta g_2(\phi(T), \dot\phi(T)) + \gamma g_3(u) \qquad (63)$$

α + β > 0; α, β, γ ≥ 0, where $g_1$, $g_2$, and $g_3$ are suitable functions to be defined shortly. One may interpret $g_1$ to be a measure of discrepancy between a desired response and the one arising from the given policy $\{\phi_0, \phi_1, u\}$. The function $g_2$ is a measure of distance between a desired target and the one actually realized. The function $g_3$ is a measure of the cost of control applied to system (62). A more concrete expression for J will be given in the following section. Let $P \equiv S_0 \times S_1 \times \mathcal{U}_0$ denote the set of admissible policies or controls. The question is, does there exist a policy $p^0 \in P$ such that $J(p^0) \le J(p)$ for all p ∈ P? An element $p^0 \in P$ satisfying this property is called an optimal policy. A set of sufficient conditions for the existence of an optimal policy is given in the following result.

Theorem 11. Consider system (62) with the cost functional (63) and let $S_0$, $S_1$, and $\mathcal{U}_0$ be closed bounded convex subsets of V, H, and $L_2(I, E)$, respectively. Suppose for each (ξ, η) ∈ V × H, $t \to g_1(t, \xi, \eta)$ is measurable on I and, for each t ∈ I, the functions $(\xi, \eta) \to g_1(t, \xi, \eta)$ and $(\xi, \eta) \to g_2(\xi, \eta)$ are weakly lower semicontinuous on V × H and the function $g_3$ is weakly lower
semicontinuous on $L_2(I, E)$. Then, there exists an optimal policy,

$$p^0 = (\phi_0^0, \phi_1^0, u^0) \in P$$

Another problem of considerable interest is the question of existence of time-optimal controls. Consider system (33) in the form (34) with

$$f = Bu, \qquad \tilde f = \begin{pmatrix}0\\ Bu\end{pmatrix}$$

and solution given by:

$$\psi(t) = S(t)\psi_0 + \int_0^t S(t - \theta)\tilde f(\theta)\,d\theta \qquad (35')$$

where S(t), t ≥ 0, is the $c_0$-semigroup in X ≡ V × H with the generator $-\tilde A$ as given in Eq. (34). Here, one is given the initial and the desired final states $\psi_0, \psi_1 \in X$ and the set of admissible controls $\mathcal{U}_0$. Given that the system is controllable from state $\psi_0$ to $\psi_1$ in finite time, the question is, does there exist a control that does the transfer in minimum time? A control satisfying this property is called a time-optimal control. We now present a result of this kind.

Theorem 12. If $\mathcal{U}_0$ is a closed bounded convex subset of $L_2(I, E)$ and if systems (34) and (35′) are exactly controllable from the state $\psi_0$ to $\psi_1 \in X$, then there exists a time-optimal control.

Proof. Let ψ(u) denote the response of the system corresponding

Let $x^* \in X^* \equiv V^* \times H$, where $X^*$ is the dual of the Banach space X which is the space of continuous linear functionals on X; for example,

$$X = L_p, \quad 1 \le p < \infty, \qquad X^* = L_q \ \text{ with } \ p^{-1} + q^{-1} = 1$$

Then,

$$x^*(\psi_1) = x^*(S(\tau_n)\psi_0) + x^*\left(\int_0^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right) \qquad (64)$$

By virtue of the $c_0$-property of the semigroup S(t), t ≥ 0,

$$\lim_{n \to \infty} x^*(S(\tau_n)\psi_0) = x^*(S(\tau^*)\psi_0) \qquad (65)$$

Splitting the integral in Eq. (64) into two parts, we have

$$x^*\left(\int_0^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right) = x^*\left(S(\tau_n - \tau^*)\int_0^{\tau^*} S(\tau^* - \theta) f_n(\theta)\,d\theta\right) + x^*\left(\int_{\tau^*}^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right)$$

$$= \left\langle \int_0^{\tau^*} S(\tau^* - \theta) f_n(\theta)\,d\theta,\ S^*(\tau_n - \tau^*)x^* \right\rangle_{X, X^*} + x^*\left(\int_{\tau^*}^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right)$$
Further, by the $c_0$-property of S(t), t ≥ 0, there exists a finite M > 0 such that

$$\left|x^*\left(\int_{\tau^*}^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right)\right| \le M\,\|x^*\|_{X^*}\left(\int_{\tau^*}^{\tau_n} \|f_n(\theta)\|_X^2\,d\theta\right)^{1/2}(\tau_n - \tau^*)^{1/2} \qquad (68)$$

Since $\mathcal{U}_0$ is bounded, it follows from this that

$$\lim_{n \to \infty} x^*\left(\int_{\tau^*}^{\tau_n} S(\tau_n - \theta) f_n(\theta)\,d\theta\right) = 0$$

Using Eqs. (65) to (67) in (64) we obtain:

$$x^*(\psi_1) = x^*(S(\tau^*)\psi_0) + x^*\left(\int_0^{\tau^*} S(\tau^* - \theta) f^*(\theta)\,d\theta\right)$$

for all $x^* \in X^*$. Hence,

$$\psi_1 = S(\tau^*)\psi_0 + \int_0^{\tau^*} S(\tau^* - \theta) f^*(\theta)\,d\theta$$

and $u^* \in \mathcal{U}_0$. This completes the proof.

The method of proof of the existence of time-optimal controls presented above applies to much more general systems.

F. Necessary Conditions of Optimality

After the questions of controllability and existence of optimal controls are settled affirmatively, one is faced with the problem of determining the optimal controls. For this purpose, one develops certain necessary conditions of optimality and constructs a suitable algorithm for computing the optimal (extremal) controls. We present here necessary conditions of optimality for system (62) with a quadratic cost functional of the form,

$$J(u) \equiv \alpha\int_0^T (C\phi(t) - z_1(t),\ C\phi(t) - z_1(t))_{H_1}\,dt + \beta\int_0^T (D\dot\phi(t) - z_2(t),\ D\dot\phi(t) - z_2(t))_{H_2}\,dt + \gamma\int_0^T (N(t)u, u)_E\,dt \qquad (69)$$

where α, β, γ > 0. The output spaces, where observations are made, are given by two suitable Hilbert spaces $H_1$ and $H_2$ with output operators $C \in \mathcal{L}(H, H_1)$ and $D \in \mathcal{L}(H, H_2)$. The desired trajectories are given by $z_1 \in L_2(I, H_1)$ and $z_2 \in L_2(I, H_2)$. The last integral in Eq. (69) gives a measure of the cost of control with N(t) ≥ δI for all t ∈ I, with δ > 0. We assume that $N(t) = N^*(t)$, t ≥ 0. Our problem is to find the necessary and sufficient conditions an optimal control must satisfy. By Theorem 11, we know that, for the cost functional (69) subject to the dynamic constraint (62), an optimal control exists. Since in this case J is strictly convex, there is, in fact, a unique optimal control. For characterization of optimal controls, the concept of Gateaux differentials plays a central role. A real-valued functional f defined on a Banach space X is said to be Gateaux differentiable at the point x ∈ X in the direction h ∈ X if

$$\lim_{\varepsilon \to 0}\frac{f(x + \varepsilon h) - f(x)}{\varepsilon} = f'(x, h) \qquad (70)$$

exists. In general, $h \to f'(x, h)$ is a homogeneous functional, and, in case it is linear in h, we write:

$$f'(x, h) = (f'(x), h)_{X^*, X} \qquad (71)$$

with the Gateaux derivative $f'(x) \in X^*$. Since the functional J, defined on the Hilbert space $L_2(I, E)$, is Gateaux differentiable and strictly convex, and the set of admissible controls $\mathcal{U}_0$ is a closed convex subset of $L_2(I, E)$, a control $u^0 \in \mathcal{U}_0$ is optimal if and only if

$$(J'(u^0), u - u^0) \ge 0 \quad \text{for all } u \in \mathcal{U}_0 \qquad (72)$$

Using this inequality, we can develop the necessary conditions of optimality.

Theorem 13. Consider system (62) with the cost functional (69) and $\mathcal{U}_0$, a closed bounded convex subset of $L_2(I, E)$. For $u^0 \in \mathcal{U}_0$ to be optimal, it is necessary that there exists a pair

$$\{\phi^0, \psi^0\} \in C(\bar I, V) \times C(\bar I, V)$$

with

$$\{\dot\phi^0, \dot\psi^0\} \in C(\bar I, H) \times C(\bar I, H)$$

satisfying the equations:

$$\ddot\phi^0 + A\phi^0 = f + Bu^0, \qquad \phi^0(0) = \phi_0, \qquad \dot\phi^0(0) = \phi_1 \qquad (73a)$$

$$\ddot\psi^0 + A^*\psi^0 + \int_t^T g_1\,d\theta + g_2 = 0, \qquad \psi^0(T) = 0, \qquad \dot\psi^0(T) = 0 \qquad (73b)$$

$$g_1 = 2\alpha C^*\Lambda_1\,(C\phi^0 - z_1), \qquad g_2 = 2\beta D^*\Lambda_2\,(D\dot\phi^0 - z_2)$$

and the inequality
in expression (75) and defining

$$g_1(t, \phi(u^0)) \equiv 2\alpha C^*\Lambda_1\,(C\phi(u^0) - z_1(t))$$
$$g_2(t, \phi(u^0)) \equiv 2\beta D^*\Lambda_2\,(D\dot\phi(u^0) - z_2(t))$$

we obtain:

$$(J'(u^0), w) = \int_0^T dt\,\left\{(\hat\phi(u^0, w), g_1)_H + (\dot{\hat\phi}(u^0, w), g_2)_H + (w, 2\gamma N u^0)_E\right\} \qquad (77)$$

$$J(u) \equiv \alpha\|C\phi(T) - z_1\|_{H_1}^2 + \beta\|D\dot\phi(T) - z_2\|_{H_2}^2 + \gamma\int_0^T (N(t)u, u)_E\,dt \qquad (82)$$

where $z_1 \in H_1$ and $z_2 \in H_2$.

In this case, the necessary conditions of optimality are given by:

$$\int_0^T \left(u - u^0,\ 2\gamma N u^0 - \Lambda_E^{-1}B^* p_2\right)_E dt \ge 0$$

Theorem 14 (Maximum Principle). Suppose A is the generator of a $c_0$-semigroup S(t), t ≥ 0, in X and there exists a t > 0 such that S(t)X = X. Let $y_0, y_1 \in X$, and suppose $u^0$ is the time-optimal control, with transition time τ, that steers the system from the initial state $y_0$ to the final state $y_1$. Then there exists an $x^* \in X^*$ (= dual of X) such that

$$\langle S^*(\tau - t)x^*,\ u^0(t)\rangle_{X^*, X}$$

$$J'(u_1) = 2\gamma N u_1 + \Lambda_E^{-1}B^*\dot\psi^1$$

as the gradient at $u_1$ and constructs a new control $u_2 = u_1 - \varepsilon_1 J'(u_1)$, with $\varepsilon_1 > 0$ sufficiently small so that $J(u_2) \le J(u_1)$. This way one obtains a sequence of approximating controls,

$$u_{n+1} = u_n - \varepsilon_n J'(u_n)$$
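The iteration $u_{n+1} = u_n - \varepsilon_n J'(u_n)$ can be tried on a finite-dimensional stand-in for the quadratic cost. The sketch below is an illustration under assumptions made here: the control is a vector in $R^2$, the response is $m_i u_i$ with a diagonal map m, the cost is $J(u) = \sum_i (m_i u_i - z_i)^2 + \gamma\sum_i u_i^2$, and the step size is held fixed. None of these choices comes from the article; they only keep the gradient explicit.

```python
def cost(u, m, z, gamma):
    """Quadratic cost: tracking error plus gamma times control energy."""
    return sum(
        (m_i * u_i - z_i) ** 2 + gamma * u_i ** 2
        for u_i, m_i, z_i in zip(u, m, z)
    )


def gradient(u, m, z, gamma):
    """Gradient J'(u) of the quadratic cost above."""
    return [
        2.0 * m_i * (m_i * u_i - z_i) + 2.0 * gamma * u_i
        for u_i, m_i, z_i in zip(u, m, z)
    ]


def descend(u, m, z, gamma, step=0.05, iterations=500):
    """Iterate u_{n+1} = u_n - step * J'(u_n); return the final control
    and the cost after every step."""
    history = [cost(u, m, z, gamma)]
    for _ in range(iterations):
        g = gradient(u, m, z, gamma)
        u = [u_i - step * g_i for u_i, g_i in zip(u, g)]
        history.append(cost(u, m, z, gamma))
    return u, history


if __name__ == "__main__":
    m, z, gamma = [2.0, 1.0], [1.0, -1.0], 0.1
    u_opt, history = descend([0.0, 0.0], m, z, gamma)
    # Stationarity gives u_i = m_i z_i / (m_i^2 + gamma) for this cost.
    print("computed:", u_opt)
    print("stationary point:", [mi * zi / (mi * mi + gamma) for mi, zi in zip(m, z)])
```

In the distributed parameter setting each gradient evaluation requires solving the state and adjoint equations, which is where most of the computational effort lies.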
For example, let us consider the semilinear parabolic equation with mixed initial and boundary conditions:

$$\frac{\partial\phi}{\partial t} + L\phi = g(t, x; \phi, D\phi, \ldots, D^{2m-1}\phi), \qquad (t, x) \in I \times \Omega \equiv Q$$
$$\phi(0, x) = \phi_0(x), \qquad x \in \Omega \qquad (88)$$
$$D_\nu^k\phi = 0, \qquad 0 \le k \le m - 1, \quad (t, x) \in I \times \partial\Omega$$

where

$$(L\phi)(t, x) \equiv \sum_{|\alpha| \le 2m} a_\alpha(t, x) D^\alpha\phi \qquad (89)$$

and $D_\nu^k\phi = \partial^k\phi/\partial\nu^k$ denotes the kth derivative in the direction of the normal ν to ∂Ω. We assume that L is strongly elliptic with principal coefficients $a_\alpha$, |α| = 2m, in $C(\bar Q)$ and the lower order coefficients $a_\alpha$, |α| ≤ 2m − 1, in $L_\infty(Q)$, and further they are all Hölder continuous in t uniformly on $\bar\Omega$. Let 1 < p < ∞ and define the operator-valued function A(t), t ∈ I, by:

$$D(A(t)) \equiv \{\psi \in X = L_p(\Omega) : (L\psi)(t, \cdot) \in X \text{ and } D_\nu^k\psi \equiv 0 \text{ on } \partial\Omega,\ 0 \le k \le m - 1\} \qquad (90)$$

The domain of A(t) is constant and is given by:

$$D \equiv W^{2m,p} \cap W_0^{m,p} \qquad (91)$$

Then, one can show that for each t ∈ I, −A(t) is the generator of an analytic semigroup and there exists an evolution operator $U(t, \tau) \in \mathcal{L}(X)$, 0 ≤ τ ≤ t ≤ T, that solves the abstract Cauchy problem:

$$\frac{dy}{dt} + A(t)y = 0, \qquad y(0) = y_0 \qquad (92)$$

for each $y_0 \in X$; that is, $y(t) = U(t, 0)y_0$, t ∈ I, with $y \in C(\bar I, X)$ and $\dot y \in C((0, T], X)$. In general, if $f \in L_p(I, X)$, then y, given by:

$$y(t) = U(t, 0)y_0 + \int_0^t U(t, \tau) f(\tau)\,d\tau \qquad (93)$$

is a mild solution of the Cauchy problem,

$$\frac{dy}{dt} + A(t)y = f, \qquad y(0) = y_0 \qquad (94)$$

This would be the generalized (weak) solution of the parabolic initial boundary value problem (88) if g were replaced by $f \in L_p(I, X)$. In order to solve problem (88), one must introduce an operator f such that

$$f(t, u) \equiv g(t, \cdot\,; u, D^1 u, \ldots, D^{2m-1}u) \in X \quad \text{a.e.}$$

for u in a suitable subspace Y with D(A) ⊂ Y ⊂ X. Problem (88) can then be considered as an abstract Cauchy problem,

$$\frac{d\phi}{dt} + A(t)\phi = f(t, \phi), \qquad \phi(0) = \phi_0 \qquad (95)$$

In view of Eq. (93), a mild solution of Eq. (95) is given by a solution of the integral equation,

$$\phi(t) = U(t, 0)\phi_0 + \int_0^t U(t, \theta) f(\theta, \phi(\theta))\,d\theta \qquad (96)$$

if one exists. Defining the operator G by:

$$(G\phi)(t) \equiv U(t, 0)\phi_0 + \int_0^t U(t, \theta) f(\theta, \phi(\theta))\,d\theta$$

one then looks for a fixed point for G, that is, an element φ such that φ = Gφ. Using a priori estimates, the most difficult part of the program, one can establish the existence of a solution by use of a suitable fixed-point theorem, for example, the Banach, Schauder, or Leray–Schauder fixed-point theorems. We state the following result without proof.

Theorem 15. Consider the semilinear parabolic problem (88) in the abstract form (95) and suppose A generates the evolution operator U(t, τ), 0 ≤ τ ≤ t ≤ T, and f satisfies the properties:

(F1) $\|f(t, u)\|_X \le c\,(1 + \|A^\beta(t)u\|_X)$, t ∈ I, for constants c > 0, 0 ≤ β < 1 and $u \in D(A^\beta)$, (97)

(F2) $\|f(t, u) - f(t, w)\|_X \le C\,\|A^\beta(t)u - A^\beta(t)w\|_X^\rho$, t ∈ I, for some 0 < ρ ≤ 1, $u, w \in D(A^\beta)$ (98)

Then, Eq. (95) has a mild solution φ ∈ C(I, X), hence the semilinear parabolic equation (88) has a generalized solution. The solution is unique if ρ = 1.

REMARK 1. Condition (F1) is satisfied if the function g satisfies the growth condition:

$$|g(t, x, u, Du, \ldots, D^{2m-1}u)| \le k\left(1 + \sum_{j=0}^{2m-1} |D^j u|^{r_j}\right)$$

for $0 \le r_j \le (2m + n/q)/(j + n/q)$, 1 < q < ∞, and a number β satisfying (2m − 1)/2m < β < 1. In the case
P1: ZCK Final Pages
Encyclopedia of Physical Science and Technology EN004E-183 June 8, 2001 18:23
$\rho = 1$, condition (F2) is satisfied if the function $g$ is Lipschitz in the last $2m$ variables uniformly with respect to $(t, x) \in I \times \Omega$.

If the coefficients $\{a_\alpha, |\alpha| \le 2m\}$ in Eq. (89) are also dependent on $\phi$ so that $a_\alpha = a_\alpha(t, x; \phi, D^1\phi, \ldots, D^{2m-1}\phi)$, then system (88) becomes a quasilinear system and parabolic if

$$(-1)^m \operatorname{Re} \sum_{|\alpha| = 2m} a_\alpha(t, x; \eta)\,\xi^\alpha \ge c|\xi|^{2m}, \qquad c > 0 \tag{99}$$

for $(t, x) \in \bar{Q}$ and $\eta \in R^N$, where $N = \sum_{j=0}^{2m-1} N_j$ with $N_j$ denoting the number of terms representing derivatives of order exactly $j$ appearing in the arguments of $\{a_\alpha\}$. System (88) then takes the form:

$$\frac{d\phi}{dt} + A(t, \phi)\phi = f(t, \phi), \qquad \phi(0) = \phi_0 \tag{100}$$

This problem is again solved by use of a priori estimates and a fixed-point theorem under the following assumptions:

(A1) The operator $A_0 = A(0, \phi_0)$ is a closed operator with domain $D$ dense in $X$ and
$$\|(\lambda I - A_0)^{-1}\| \le k/(1 + |\lambda|)$$
for all $\lambda$, $\operatorname{Re}\lambda \le 0$.

(A2) $A_0^{-1}$ is a completely continuous operator; that is, it is continuous in $X$ and maps bounded sets into compact subsets of $X$.

(A3) There exist numbers $\epsilon$, $\rho$ satisfying $0 < \epsilon \le 1$, $0 < \rho \le 1$, such that for all $t, \tau \in I$,
$$\|(A(t, u) - A(\tau, w))A^{-1}(\tau, w)\| \le k_R\bigl\{|t - \tau|^{\epsilon} + \|A_0^\beta u - A_0^\beta w\|^{\rho}\bigr\}$$
for all $u, w$ such that $\|A_0^\beta u\|, \|A_0^\beta w\| < R$ with $k_R$ possibly depending on $R$.

(F1) For all $t, \tau \in I$,
$$\|f(t, v) - f(t, w)\| \le k_R \|A_0^\beta v - A_0^\beta w\|^{\rho}$$
for all $v, w \in X$ such that $\|A_0^\beta v\|, \|A_0^\beta w\| \le R$.

Theorem 16. Under assumptions (A1) to (A3) and (F1) there exists a $t^* \in (0, T)$ such that Eq. (100) has at least one mild solution $\phi \in C([0, t^*], X)$ for each $\phi_0 \in D(A_0^\beta)$ with $\|A_0^\beta \phi_0\| \le R$. Further, if $f$ also satisfies the Hölder condition in $t$, the solution is $C^1$ in $t \in (0, t^*]$. If $\rho = 1$, the solution is unique.

Proof. We discuss the outline of a proof. The differential equation (100) is converted into an integral equation and then one shows that the integral equation has a solution. Let $v \in C([0, t^*], X)$ and define:

$$A^v(t) \equiv A\bigl(t, A_0^{-\beta} v(t)\bigr), \qquad f^v(t) \equiv f\bigl(t, A_0^{-\beta} v(t)\bigr)$$

and consider the linear system,

$$\frac{dy}{dt} + A^v(t)y = f^v(t), \quad t \in [0, t^*], \qquad y(0) = \phi_0 \tag{101}$$

By virtue of assumptions (A1) to (A3), $-A^v(t)$, $t \in [0, t^*]$, is the generator of an evolution operator $U^v(t, \tau)$, $0 \le \tau \le t \le t^*$. Hence, the system has a mild solution given by:

$$y^v(t) = U^v(t, 0)\phi_0 + \int_0^t U^v(t, \theta) f^v(\theta)\, d\theta, \qquad 0 \le t \le t^* \tag{102}$$

Defining an operator $G$ by setting

$$(Gv)(t) \equiv A_0^\beta U^v(t, 0)\phi_0 + \int_0^t A_0^\beta U^v(t, \theta) f^v(\theta)\, d\theta \tag{103}$$

one then looks for a fixed point of the operator $G$, that is, an element $v^* \in C([0, t^*], X)$ such that $v^* = Gv^*$. In fact, one shows, under the given assumptions, that for sufficiently small $t^* \in I$, there exists a closed convex set $K \subset C([0, t^*], X)$ such that $GK \subset K$ and $GK$ is relatively compact in $C([0, t^*], X)$ and hence, by the Schauder fixed-point theorem (which is precisely as stated), the equation $v = Gv$ has a solution $v^* \in K$. The solution (mild) of the original problem (100) is then given by $\phi^* = A_0^{-\beta} v^*$. This is a genuine (strong) solution if $f$ is also Hölder continuous in $t$. If $\rho = 1$, $G$ has the contraction property and the solution is unique.

According to our assumptions, for each $t \in [0, t^*]$ and $y \in D(A_0^\beta)$, the operator $A(t, y)$ is the generator of an analytic semigroup. This means that Theorem 16 can handle only parabolic problems and excludes many physical problems arising from hydrodynamics and wave propagation phenomena, including the semilinear and quasilinear symmetric hyperbolic systems discussed in Section I. This limitation is overcome by allowing $A(t, y)$, for each $t, y$ in a suitable domain, to be the generator of a $c_0$-semigroup rather than an analytic semigroup. The fundamental assumptions required are:

(H1) $X$ is a reflexive Banach space with $Y$ being another Banach space which is continuously and densely
embedded in $X$ and there is an isometric isomorphism $S$ of $Y$ onto $X$.

(H2) For $t \in [0, T]$, $y \in W \equiv$ an open ball in $Y$, $A(t, y)$ is the generator of a $c_0$-semigroup in $X$.

(H3) For $(t, y) \in [0, T] \times W$,
$$(S A(t, y) - A(t, y)S)S^{-1} = B(t, y) \in \mathcal{L}(Y, X)$$
and $\|B(t, y)\|_{\mathcal{L}(Y, X)}$ is uniformly bounded on $I \times Y$.

Theorem 17. Under assumptions (H1) to (H3) and certain Lipschitz and boundedness conditions for $A$ and $f$ on $[0, T] \times W$, the quasilinear system (100) has, for each $\phi_0 \in W$, a unique solution,

$$\phi \in C([0, t^*), W) \cap C^1([0, t^*), X) \quad \text{for some } 0 < t^* \le T$$

The proof of this result is also given by use of a fixed-point theorem but without invoking the operator $A_0$. Here, for any $v \in C([0, t^*), W)$, one defines $A^v(t) = A(t, v(t))$, $f^v(t) = f(t, v(t))$ and $U^v(t, \tau)$, $0 \le \tau \le t \le T$, the evolution operator corresponding to $-A^v$ and constructs the operator $G$ by setting

$$(Gv)(t) = U^v(t, 0)\phi_0 + \int_0^t U^v(t, \tau) f^v(\tau)\, d\tau$$

where the expression on the right-hand side is the mild solution of the linear equation (101). Any $v^*$ satisfying $v^* = Gv^*$ is a mild solution of the original problem (100).

From the preceding results it is clear that the solutions are defined only over a subinterval $(0, t^*) \subset (0, T)$ and they may actually blow up at time $t^*$. Mathematically this is explained through the existence of singularities, which physically correspond to the occurrence of, for example, turbulence or shocks in hydrodynamic problems. However, global solutions are defined for systems governed by differential equations with monotone operators. We present a few general results of this nature.

Let $X$ be a Banach space with dual $X^*$ and suppose $A$ is an operator from $D(A) \subset X$ to $X^*$. The operator $A$ is said to be monotone if

$$(Ax - Ay, x - y)_{X^*, X} \ge 0 \quad \text{for all } x, y \in D(A) \tag{104}$$

It is said to be demicontinuous on $X$ if

$$Ax_n \xrightarrow{w} Ax_0 \text{ in } X^* \quad \text{whenever} \quad x_n \xrightarrow{s} x_0 \text{ in } X \tag{105}$$

And, it is said to be hemicontinuous on $X$ if

$$A(x + \theta y) \xrightarrow{w} Ax \text{ in } X^* \quad \text{whenever } \theta \to 0 \tag{106}$$

Let $H$ be a real Hilbert space and $V$ a subset of $H$ having the structure of a reflexive Banach space with $V$ dense in $H$. Let $V^*$ denote the (topological) dual of $V$ and suppose $H$ is identified with its dual $H^*$. Then we have $V \subset H \subset V^*$. Using the theory of monotone operators we can prove the existence of solutions for nonlinear evolution equations of the form,

$$\frac{d\phi}{dt} + B(t)\phi = f, \quad t \in I = (0, T), \qquad \phi(0) = \phi_0 \tag{107}$$

where $B(t)$, $t \in I$, is a family of nonlinear monotone operators from $V$ to $V^*$.

Theorem 18. Consider system (107) and suppose the operator $B$ satisfies the conditions:

(B1) $B: L^p(I, V) \to L^q(I, V^*)$ is hemicontinuous and
$$\|B\phi\|_{L^q(I, V^*)} \le K_1\bigl(1 + \|\phi\|^{p-1}_{L^p(I, V)}\bigr) \tag{108}$$
where $K_1 > 0$, and $1 < p, q < \infty$, $1/p + 1/q = 1$.

(B2) For all $\phi, \psi \in L^p(I, V)$,
$$(B\phi - B\psi, \phi - \psi)_{L^q(I, V^*), L^p(I, V)} \ge 0 \tag{109}$$
That is, $B$ is a monotone operator from $L^p(I, V)$ to $L^q(I, V^*)$.

(B3) There exists a nonnegative extended-real-valued function $C$ with $C(\xi) \to +\infty$ as $\xi \to \infty$ such that for $\psi \in L^p(I, V)$,
$$(B\psi, \psi)_{L^q(I, V^*), L^p(I, V)} \ge C(\|\psi\|)\,\|\psi\| \tag{110}$$

Then for each $\phi_0 \in H$ and $f \in L^q(I, V^*)$ system (107) has a unique solution $\phi \in L^p(I, V) \cap C(\bar{I}, H)$ and $\dot{\phi} \in L^q(I, V^*)$. Further, $\phi$ is an absolutely continuous $V^*$-valued function on $\bar{I}$.

It follows from the above result that, for $\phi_0 \in H$, $\phi \in C(\bar{I}, H)$ and hence $\phi(t) \in H$ for $t \ge 0$. For $f \equiv 0$, the mapping $\phi_0 \to \phi(t)$ defines a nonlinear evolution operator $U(t, \tau)$, $0 \le \tau \le t \le T$, in $H$. In case $B$ is time invariant, we have a nonlinear semigroup $S(t)$, $t \ge 0$, satisfying the properties:

(a) $S(0) = I$ and, as $t \to 0$,
(b) $S(t)\xi \xrightarrow{s} \xi$ in $H$ and, due to uniqueness,
(c) $S(t + \tau)\xi = S(t)S(\tau)\xi$, $\xi \in H$.

Further, it follows from the equation $\dot{\phi} + B\phi = 0$ that $(\dot{\phi}(t), \phi(t)) + (B\phi(t), \phi(t)) = 0$; hence,

$$|\phi(t)|_H^2 = |\phi_0|_H^2 - 2\int_0^t (B\phi(\theta), \phi(\theta))\, d\theta, \qquad t \ge 0$$

Thus, by virtue of (B3), $|\phi(t)|_H \le |\phi_0|_H$; that is, $|S(t)\phi_0|_H \le |\phi_0|_H$. Hence the semigroup $\{S(t), t \ge 0\}$ is
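a family of nonlinear contractions in $H$, and its generator is $-B$.

A classical example is given by the nonlinear initial boundary-value problem,

$$\frac{\partial \phi}{\partial t} - \sum_i \frac{\partial}{\partial x_i}\left(\left|\frac{\partial \phi}{\partial x_i}\right|^{p-2} \frac{\partial \phi}{\partial x_i}\right) = f \quad \text{in } I \times \Omega = Q$$

(F2) $(f(t, \xi) - f(t, \eta), \xi - \eta)_{V^*, V} \le 0$ for all $\xi, \eta \in V$.

(F3) There exists an $h \in L^q(I, R_+)$, $R_+ = [0, \infty)$, and $\alpha \ge 0$ such that
$$|f(t, \xi)|_{V^*} \le h(t) + \alpha\,(\|\xi\|_V)^{p/q} \quad \text{a.e.}$$
for each $\xi \in V$.

(F4) There exists an $h_1 \in L^1(I, R)$ and $\beta > 0$ such that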
where $U$ is the transition operator corresponding to the generator $A$ and $\phi_0$ is the initial state. Then, one questions the existence of a $g \in L^1(I, V^*)$ such that $g(t) \in f(t, S_t g)$ a.e. If such a $g$ exists, then one has proved the existence of a solution of the initial value problem:

$$\dot{\phi}(t) \in A(t)\phi(t) + f(t, \phi(t)), \qquad \phi(0) = \phi_0 \tag{115}$$

These questions have been considered in control problems.

B. Stability, Identification, and Controllability

We present here some simple results on stability and some comments on the remaining topics.

We consider the semilinear system,

$$\frac{d\phi}{dt} = A\phi + f(\phi) \tag{116}$$

in a Hilbert space $(H, |\cdot|)$ and assume that $f$ is weakly nonlinear in the sense that (a) $f(0) = 0$, and (b) $f(\xi) = o(\xi)$, where

$$\lim_{|\xi| \to 0} \frac{|o(\xi)|}{|\xi|} = 0 \tag{117}$$

Theorem 20. If the linear system $\dot{\phi} = A\phi$ is asymptotically stable in the Lyapunov sense and $f$ is a weakly nonlinear continuous map from $H$ to $H$, then the nonlinear system (116) is locally asymptotically stable near the zero state.

The proof is based on Theorem 4.

For finite-dimensional systems, Lyapunov stability theory is most popular in that stability or instability of a system is characterized by a scalar-valued function known as the Lyapunov function. For infinite-dimensional systems a straightforward extension is possible only if strong solutions exist.

Consider the evolution equation (116) in a Hilbert space $H$, and suppose that $A$ is the generator of a strongly continuous semigroup in $H$ and $f$ is a continuous map in $H$ bounded on bounded sets. We assume that Eq. (116) has strong solutions in the sense that $\dot{\phi}(t) = A\phi(t) + f(\phi(t))$ holds a.e., and $\phi(t) \in D(A)$ whenever $\phi_0 \in D(A)$. Let $T_t$, $t \ge 0$, denote the corresponding nonlinear semigroup in $H$ so that $\phi(t) = T_t(\phi_0)$, $t \ge 0$. Without loss of generality we may consider $f(0) = 0$ (if necessary after proper translation in $H$) and study the question of stability of the zero state. Let $\Omega$ be a nonempty open connected set in $H$ containing the origin and define $D \equiv \Omega \cap D(A)$, and $B_a(D) \equiv \{\xi \in H : |\xi|_H < a\} \cap D(A)$ for each $a > 0$. The system is said to be stable in the region $D$ if, for each ball $B_R(D) \subset D$, there exists a ball $B_r(D) \subset B_R(D)$ such that $T_t(\phi_0) \in B_R(D)$ for all $t \ge 0$ whenever $\phi_0 \in B_r(D)$. The zero state is said to be asymptotically stable if $\lim_{t \to \infty} T_t(\phi_0) = 0$ whenever $\phi_0 \in D$.

A function $V: D \to [0, \infty]$ is said to be positive definite if it satisfies the properties:

(a) $V(x) > 0$ for $x \in D \setminus \{0\}$, $V(0) = 0$.
(b) $V$ is continuous on $D$ and bounded on bounded sets.
(c) $V$ is Gateaux differentiable on $D$ in the direction of $H$, in the sense that, for each $x \in D$ and $h \in H$,
$$\lim_{\varepsilon \to 0} \frac{V(x + \varepsilon h) - V(x)}{\varepsilon} \equiv V'(x, h)$$
exists, and for each $h \in H$, $x \to V'(x, h)$ is continuous.

The following result is the infinite-dimensional analog of the classical Lyapunov stability theory.

Theorem 21. Suppose the system $\dot{\phi} = A\phi + f(\phi)$ has strong solutions for each $\phi_0 \in D(A)$ and there exists a positive definite function $V$ on $D$ such that along any trajectory $\phi(t)$, $t \ge 0$, starting from $\phi_0 \in D$,

$$V'(\phi(t), A\phi(t) + f(\phi(t))) \le 0 \ (< 0) \tag{118}$$

for all $t \ge 0$. Then the system is stable (asymptotically stable) in the region $D$.

If the system admits only mild solutions, Theorem 21 must be modified by using positive definite functions which have Gateaux derivatives in the directions $\{h\}$ in spaces larger than $H$.

We conclude this section with a result for systems governed by monotone nonlinear operators as in Eqs. (107) and (112).

Theorem 22. Consider system (112) with the operators $A$ and $f$ satisfying the assumptions of Theorem 19 for all $t \ge 0$, and suppose $h_1 \in L^1(0, \infty; R)$ and the injection $V \subset H$ is continuous. Then, the system is globally asymptotically stable with respect to the origin in $H$.

The questions of identification of parameters appearing in any of the system equations treated above can be dealt with in a similar way as in the linear case. In fact, an identification problem may be considered as a special case of a control problem with controls appearing in the system coefficients. Such classes of problems have been covered well in the literature. However, controllability questions for the general systems are more difficult and almost nothing is known.

C. Existence of Optimal Controls

Existence of optimal controls for strongly nonlinear parabolic systems and more general nonlinear evolution
equations of the form (86) have been treated in the literature. The technical details are rather involved and long. We shall limit ourselves to a brief summary of some results.

Consider system (107) with controls denoted by $u$:

$$\frac{d\phi}{dt} + B(t)\phi = f(t, u) \tag{119}$$

where the operator $B$ is nonlinear and may be given by the expression:

$$B(t)\phi = \sum_{|\alpha| \le m} (-1)^{|\alpha|} D^\alpha F_\alpha(t, x; \phi, D^1\phi, \ldots, D^m\phi),$$

Under quite general assumptions on the function $F_\alpha$, the operator $B$ has properties (B1) to (B3) of Theorem 18. For the space $V$ one may choose $W_0^{m,p}$, $p \ge 2$, or any closed subspace of $W^{m,p}$ so that $W_0^{m,p} \subset V \subset W^{m,p}$. Here $V$ is a reflexive Banach space. For admissible controls we choose any reflexive Banach space $E$ of functions defined on $\Omega$ and $U$, a closed bounded convex subset of $E$, and consider $\mathcal{U}_{ad}$ to be the class of admissible controls which are strongly measurable functions defined on $I = (0, T)$, with values in $U$. Let $f: I \times U \to V^*$, so that for each $t$, $f(t, \cdot)$ is weakly continuous (or more generally demicontinuous) on $U$; for each $v \in U$, $f(\cdot, v)$ is measurable on $I$ (or continuous), and for each $u \in \mathcal{U}_{ad}$, $f(u) \in L^q(I, V^*)$ where $f(u)(t) \equiv f(t, u(t))$.

The system,

$$\frac{d\phi}{dt} + B(t)\phi = f(t, u(t)), \quad t \in I, \qquad \phi(0) = \phi_0, \quad u \in \mathcal{U}_{ad} \tag{121}$$

is written in its weak form:

$$(L\phi, \psi) + b(\phi, \psi) = (f(u), \psi) \quad \text{for all } \psi \in L^p(I, V) \cap C(\bar{I}, H), \qquad \phi(0) = \phi_0, \quad u \in \mathcal{U}_{ad} \tag{122}$$

where $L$ denotes the extension of $d/dt$ as an operator from the space $L^p(I, V)$ to the space $L^q(I, V^*)$ and $b$ is the Dirichlet form given by:

$$b(\phi, \psi) \equiv \int_{I \times \Omega} \sum_{|\alpha| \le m} F_\alpha(t, x, \phi, D^1\phi, \ldots, D^m\phi) \cdot D^\alpha \psi \, dx\, dt \tag{123}$$

We consider the following control problems:

(P1) Terminal control. Let $J(u) = Z(\phi(T))$, where $Z$ is a real-valued function on $H$ and the pair $\{u, \phi\}$ is subject to the dynamic constraint (122). The problem is to find a control $u \in \mathcal{U}_{ad}$ that minimizes the functional $J$.

(P2) Time-optimal control. Let $M$, a subset of $H$, be the target set. The requirement is to find a control $u \in \mathcal{U}_{ad}$ that transfers the system from the state $\phi_0 \in H$ to the target set $M$ in minimum time.

The existence of optimal controls depends on the properties of admissible trajectories, attainable sets, and the cost functionals. Let $\mathcal{X}$ denote the set of admissible trajectories, that is, the set of all $\{\phi\} \in L^p(I, V) \cap C(\bar{I}, H)$ such that $\phi$ is a solution of Eq. (122) corresponding to some control $u \in \mathcal{U}_{ad}$. Similarly, the attainable set $\mathcal{A}(t)$ may be defined as the set of states $\phi(t) \in H$ attained at time $t$ by the admissible trajectories $\phi \in \mathcal{X}$.

Under a number of technical assumptions on the operators $B$ and $f$ and the control set $U$ one can prove the following result.

Theorem 23. (a) The set of admissible trajectories $\mathcal{X}$ is a weakly closed and weakly sequentially compact subset of $L^p(I, V)$. (b) For each $t \in [0, T]$, the attainable set $\mathcal{A}(t)$ is a weakly compact subset of $H$.

Using the preceding result, one can prove the existence of optimal controls for problems (P1) and (P2).

Theorem 24. Let $Z$ be a weakly lower semicontinuous functional defined on $H$ and bounded from below. Then there exists an optimal control solving problem (P1).

Theorem 25 (P2). Suppose the given target set $M$ is a weakly closed subset of $H$ and the system is controllable in the sense that there exists an admissible $u$, and $\tau \in \bar{I}$, such that $\phi(u)(\tau) \in M$. Then there exists an optimal control that steers the system from state $\phi_0$ to the target set $M$ in minimum time.

Optimal control problems for the more general system (112) recently have been studied in several papers giving existence results for measurable controls and measure-valued controls. Systems governed by differential inclusions of the form (115) and their associated control problems also have been studied recently. The technical details are too long for presentation here. Interested readers may consult the Bibliography.

D. Necessary Conditions of Optimality

For completeness we shall present a result on the necessary conditions of optimality. Consider system (122) along with the cost functional given by:

$$J(u) = Z(\phi(T)) + \int_0^T f^0(t, \phi(t), u(t))\, dt \tag{124}$$

The problem is to find a control $u^0 \in \mathcal{U}_{ad}$ that minimizes the functional $J$ subject to constraint (122). Let $\{F_\alpha(t, x, \xi)\}$,
$(t, x) \in I \times \Omega$, $\xi = \{\xi_\alpha, |\alpha| \le m\} \in R^N$, denote the functions defining the operator $B$ (see Eqs. (119) and (120)) and $\{F_{\alpha\beta}, |\beta| \le m\}$ their directional derivatives with respect to $\xi \in R^N$. We assume that for fixed $(t, x) \in I \times \Omega$, the functions $\xi \to F_{\alpha\beta}(t, x, \xi)$ are continuous on $R^N$ for all $\alpha, \beta$ satisfying $|\alpha|, |\beta| \le m$, and, for fixed $\xi \in R^N$, $(t, x) \to F_{\alpha\beta}(t, x, \xi)$ are measurable on $I \times \Omega$. For any fixed $\phi \in L^p(I, V)$, the bilinear form,

b_φ(ψ, ν) ≡ Σ_{|α|,|β|≤m} ∫_I ( D^α ψ, F_{βα}(t, x; φ, …

denote the linear Gateaux differential of $f$ with respect to the control variable.

Under a number of technical assumptions on the functions $\{F_\alpha, |\alpha| \le m\}$, $f$, $f^0$, and $Z$ and the control constraint set $U \subset E$, one can prove the following necessary conditions of optimality.

Theorem 26. Consider system (122) with the cost functional (124). For the pair $\{u^0, \phi^0\} \in \mathcal{U}_{ad} \times \mathcal{X}$ to be optimal it is necessary that there exists a $\psi^0 \in L^p(I, V) \cap C(\bar{I}, H)$ so that the triple $\{u^0, \psi^0, \phi^0\}$ satisfies the conditions:

(a) $(L\phi^0, \nu) + b(\phi^0, \nu) = (f(u^0), \nu)$, $\phi^0(0) = \phi_0$, for all $\nu$ in the class $\{\nu \in L^p(I, V) \cap C(\bar{I}, H) : \nu(T) = 0\}$.

(b) $-(L\psi^0, \nu) + b_{\phi^0}(\psi^0, \nu) + (f_1^0, \nu) = 0$, $\psi^0(T) = -Z'(\phi^0(T))$, for all $\nu$ in the class $\{\nu \in L^p(I, V) \cap C(\bar{I}, H) : \nu(0) = 0\}$.

(c)
$$\int_I \bigl( f_2^0(t, \phi^0(t), u^0(t)) - F^*(t, u^0(t))\psi^0(t),\ w(t) - u^0(t) \bigr)_{E^*, E}\, dt \ge 0 \tag{128}$$
for all $w \in \mathcal{U}_{ad}$, where $F^*$ denotes the dual of the operator $F(t, u^0(t)) \in \mathcal{L}(E, V^*)$ and $Z'$ the Gateaux derivative of $Z$ on $H$.

For time-optimal controls, similar necessary conditions exist. In this case, the optimal control is also characterized by inequality (c), with the exceptions that $f_2^0 \equiv 0$ and the upper limit of the integral is the optimal time $t^0$ instead of $T$.

A number of interesting observations can be made from the above result. For example, if $f(t, u) = T(t)u$ and $f^0(t, \phi, u) = \tilde{f}^0(t, \phi) + (Nu, u)_{E^*, E}$ and the control set $U = E$, then it follows from inequality (c) that

$$(N + N^*)u^0(t) = T^*(t)\psi^0(t) \tag{129}$$

$$dy = Ay\, dt + f(y)\, dW(t), \quad t \ge 0, \qquad y(0) = y_0 \tag{130}$$

where $W$ is the Wiener process with covariance operator $Q \in \mathcal{L}^+(H)$ as in the linear case. $A$ is the generator of a $c_0$-semigroup $S(t)$, $t \ge 0$, on $H$, and $f: H \to \mathcal{L}(H)$ satisfying:

$$\text{(a)}\ \|f(x)\|^2_{\mathcal{L}(H)} \le K^2\bigl(1 + |x|_H^2\bigr), \qquad \text{(b)}\ \|f(x) - f(y)\|^2_{\mathcal{L}(H)} \le K^2 |x - y|_H^2 \tag{131}$$

Define the nonlinear operator $G$ by:

$$(Gx)(t) \equiv S(t)y_0 + \int_0^t S(t - \theta) f(x(\theta))\, dW(\theta) \tag{132}$$

for $x \in X \equiv C(I, L^2(\Omega, H))$, where $X$ is a Banach space with respect to the topology given by:

$$\|x\|_X \equiv \sup\Bigl\{ \sqrt{E|x(t)|_H^2},\ t \in I \Bigr\}$$

Under the above assumptions one can prove the existence of an integer $n$, such that the $n$th iteration of $G$, denoted by $G^n$, is a contraction in $X$. Hence, by the Banach fixed-point theorem, there exists $y \in X$ such that $y = Gy$. In other words, the integral equation in $X$,

$$y(t) = S(t)y_0 + \int_0^t S(t - \theta) f(y(\theta))\, dW(\theta), \qquad t \in [0, T]$$

has a unique solution $y \in X$ whenever $y_0 \in L^2(\Omega, H)$, hence $y$ is a mild solution of system (130).
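The fixed-point argument above can be illustrated numerically in a much simpler, deterministic setting. The following Python sketch is not the stochastic construction of Eq. (132): it assumes $H = R$, takes $S(t) = e^{-at}$, replaces $dW(\theta)$ by $d\theta$, and uses the globally Lipschitz map $f = \sin$. The names `picard_step` and `iterate`, the grid size, and all parameter values are illustrative assumptions. It applies $x \mapsto Gx$ repeatedly and records the sup-norm distance between successive iterates, which shrinks as one would expect when some power of $G$ is a contraction.

```python
import math

def picard_step(x, y0, a, ts, f):
    """Apply (Gx)(t) = e^{-a t} y0 + int_0^t e^{-a (t - s)} f(x(s)) ds on the grid ts.

    The integral is approximated with the trapezoidal rule (a numerical
    assumption of this sketch, not part of the theory).
    """
    out = []
    for k, t in enumerate(ts):
        vals = [math.exp(-a * (t - ts[j])) * f(x[j]) for j in range(k + 1)]
        integral = 0.0
        for j in range(k):
            integral += 0.5 * (vals[j] + vals[j + 1]) * (ts[j + 1] - ts[j])
        out.append(math.exp(-a * t) * y0 + integral)
    return out

def iterate(y0=1.0, a=1.0, T=2.0, n_grid=201, n_iter=12, f=math.sin):
    """Run n_iter Picard iterations starting from the constant function y0."""
    ts = [T * k / (n_grid - 1) for k in range(n_grid)]
    x = [y0] * n_grid
    gaps = []  # sup-norm distance between successive iterates
    for _ in range(n_iter):
        x_new = picard_step(x, y0, a, ts, f)
        gaps.append(max(abs(u - v) for u, v in zip(x_new, x)))
        x = x_new
    return ts, x, gaps

if __name__ == "__main__":
    ts, x, gaps = iterate()
    print("first gap:", gaps[0], "last gap:", gaps[-1])
```

In the deterministic case the $n$th power of this Volterra-type operator has Lipschitz constant at most $(KT)^n/n!$ with $K$ the Lipschitz constant of $f$, which is why the recorded gaps decay quickly even when $KT > 1$; the stochastic estimate uses different constants.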
Existence theory for semilinear stochastic evolution equations of the form,

$$dy = (A(t)y + f(t, y))\, dt + \sigma(t)\, dW \tag{133}$$

has been developed under much weaker hypotheses on the operators $A$ and $f$ using the Leray–Schauder degree theory. There are also results in which $A$ has been considered to be a strongly measurable function from the probability space $(\Omega, \mathcal{F}, P)$ to the space of closed densely defined linear (not necessarily bounded) operators in $H$. In this case, $A$ is assumed to generate a strongly measurable (random) $c_0$-semigroup in $H$. Existence theory for nonlinear stochastic boundary value problems of the form,

$$\begin{aligned} \frac{\partial \phi}{\partial t} + A(t)\phi &= f + F(\phi) && \text{on } I \times \Omega \\ B\phi &= g + G(\phi) && \text{on } I \times \partial\Omega \\ \phi(0) &= \phi_0 && \text{on } \Omega \end{aligned} \tag{134}$$

has been considered with $f$, $g$ being generalized random processes, $\phi_0$ a generalized random variable, and $F$, $G$ nonlinear accretive operators. Stability problems for systems of the form (134) with $f = 0$; $g = N$, a generalized white noise; $G = 0$; and $F$ being a monotone operator have been studied. It has been shown that the system is asymptotically stable with respect to a ball around the origin with radius determined by the trace of the covariance operator of the associated Wiener process.

It appears from the literature that for nonlinear systems the theory of optimal control, filtering, identification, and controllability is far from satisfactory. This is a difficult but fascinating field and certainly a challenging subject of the future.

IV. RECENT ADVANCES IN INFINITE-DIMENSIONAL SYSTEMS AND CONTROL

In this section we discuss some recent advances in the theory and applications of distributed parameter systems since the time of first publication of this encyclopedia. Details of these new developments can be found in the references. There has been substantial theoretical development of distributed parameter systems, as indicated in References 11 to 36. These include general boundary control problems, control of deterministic and stochastic evolution inequalities and differential inclusions, uncertain systems, and the so-called B-evolutions. Due to space limitations, we cannot include the details. Here we shall give only a brief outline of the major new concepts introduced in recent years. On the theoretical front, there are three new major developments:

1. The first one is in the area of control theory for systems governed by m-times integrated semigroups or distribution semigroups.
2. The second one is in the area of fundamental concepts, in particular, the notion of solution. A new notion of solutions, called measure solutions, has been introduced very recently and has been used in the theory of control of distributed parameter systems.
3. The third front extends the concept of impulsive systems to infinite dimensional Banach spaces; we shall discuss briefly these new developments and their implication in mathematical sciences.
4. The fourth front represents recent applications of the theory of distributed parameter systems to the physical sciences.

A. m-Times Integrated Semigroups

The classical semigroup theory, as seen in Section II, is based on the assumptions that $A$ is closed and $D(A)$ is dense in $X$ and that the Hille–Yosida inequality,

$$\|R(\lambda, A)\| \equiv \|(\lambda I - A)^{-1}\| \le M/(\lambda - \omega), \qquad \lambda \in \rho(A) \supset (\omega, \infty) \tag{135}$$

holds for some $M \ge 0$ and $\omega \in R$. These are the necessary and sufficient conditions for the existence of a $C_0$-semigroup $S(t)$, $t \ge 0$, and hence the existence of a solution of the Cauchy problem,

$$\dot{x} = Ax, \qquad x(0) = \xi \tag{136}$$

in $X$. The solution is given by $x(t) = S(t)\xi$, $t \ge 0$. No doubt this covers a very large class of partial differential operators with given boundary conditions and hence a large class of distributed parameter systems. However, there are classes of operators $A$ which do not satisfy the Hille–Yosida theorem, yet such systems have solutions in some generalized sense. According to the Hille–Yosida theorem, $R(\lambda, A)$ is the Laplace transform of some operator-valued function $S(t)$, $t \ge 0$. It is now known that the Cauchy problem stated above has a solution in some generalized sense even if only $R_m(\lambda, A) \equiv R(\lambda, A)/\lambda^m$, $\lambda \in \rho(A)$, is the Laplace transform of an operator-valued function $T(t)$, $t \ge 0$. In this case, $T(t)$, $t \ge 0$, is said to be the m-times integrated semigroup and $A$ is said to be its infinitesimal generator. The classical solution for the Cauchy problem as stated above is now given by:

$$x(t) = (d^m/dt^m)\,T(t)\xi, \quad t \ge 0, \quad \text{for } \xi \in D(A^{m+1})$$

In general, if $\xi \in X$ there is no classical solution but we may admit generalized derivatives and hence generalized solutions. For example, if $D(A^{m+1})$ is dense in $X$, one can choose a sequence $\{\xi_k\}$ converging strongly to $\xi$ and
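To make the definition of an m-times integrated semigroup concrete, it may help to look at the special case in which $A$ already generates a $C_0$-semigroup $S(t)$ with $\|S(t)\| \le Me^{\omega t}$. This special case is an assumption made only for illustration; the interest of the theory lies in operators for which no such $S$ exists. Under this assumption, for $\lambda > \max(\omega, 0)$:

```latex
% Illustrative special case: A generates a C_0-semigroup S(t).
\[
  T(t)\xi \equiv \int_0^t \frac{(t-s)^{m-1}}{(m-1)!}\, S(s)\xi \, ds ,
  \qquad
  \int_0^\infty e^{-\lambda t}\, T(t)\xi \, dt
  = \frac{1}{\lambda^{m}}\, R(\lambda, A)\xi
  = R_m(\lambda, A)\xi ,
\]
% because the Laplace transform of a convolution is the product of the
% transforms, and the transform of t^{m-1}/(m-1)! is \lambda^{-m}.
\[
  \frac{d^{m}}{dt^{m}}\, T(t)\xi = S(t)\xi , \qquad t \ge 0 .
\]
```

Thus $T(t)$ is the $m$-fold integral of $S(t)$, and differentiating $m$ times recovers the classical solution $x(t) = S(t)\xi$, in agreement with the solution formula given above.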
ξ ∈ E}. The dual of this space is the space of regular, + µs (Bφ), d W (s), t ≥0 (144)
0
where now the operator $\mathcal{A}$ is given by the second-order partial differential operator,

$$\mathcal{A}\phi \equiv (1/2)\operatorname{Tr}(D^2\phi\, \sigma\sigma^*) + (A^* D\phi(\xi), \xi)_H + (D\phi(\xi), f(\xi))_H \tag{145}$$

and the operator $\mathcal{B}$ is given by:

$$\mathcal{B}\phi \equiv \sigma^* D\phi$$

The last term is a stochastic integral with respect to a cylindrical Brownian motion (for details, see Reference 31). The operators $\mathcal{A}$ and $\mathcal{B}$ are well defined on a class of test functions given by:

F ≡ {φ ∈ BC(E) : φ, Dφ, D²φ continuous having

where, generally, $A$ is the infinitesimal generator of a $C_0$-semigroup in a Banach space $E$, the function $f$ is a continuous nonlinear map from $E$ to $E$, and $F_i: E \to E$, $i = 1, 2, \ldots, n$, are continuous maps. The difference operator $\Delta x(t_i) \equiv x(t_i + 0) - x(t_i - 0) \equiv x(t_i + 0) - x(t_i)$ denotes the jump operator. This represents the jump in the state $x$ at time $t_i$ with $F_i$ determining the jump size at time $t_i$. Similarly, a controlled impulsive system is governed by the following system of equations:

$$\begin{aligned} dx(t) &= [Ax(t) + f(x(t))]\, dt + g(x(t))\, dv(t), \quad t \in I \setminus D, \quad x(0) = x_0, \\ \Delta x(t_i) &= F_i(x(t_i)), \end{aligned} \tag{148}$$
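The impulsive dynamics described above can be sketched numerically under strong simplifying assumptions: $E = R$, the generator $A$ is multiplication by a scalar `a`, $f \equiv 0$, there is no control term, and every jump map is the same function. The function name `impulsive_flow` and all parameter values are hypothetical choices for this illustration. The sketch follows the convention $\Delta x(t_i) = x(t_i + 0) - x(t_i)$, so the state is left-continuous at the jump times.

```python
import math

def impulsive_flow(x0, a, jump_times, jump_map, T):
    """Return x(T) for dx/dt = a*x on I minus D, with x(t_i + 0) = x(t_i) + jump_map(x(t_i)).

    Between jumps the flow is exact: x(t) = exp(a * (t - s)) * x(s).
    jump_times must be sorted and lie strictly inside (0, T).
    """
    t, x = 0.0, x0
    for ti in jump_times:
        x = math.exp(a * (ti - t)) * x   # continuous evolution up to t_i
        x = x + jump_map(x)              # jump: Delta x(t_i) = F_i(x(t_i))
        t = ti
    return math.exp(a * (T - t)) * x     # evolve from the last jump to T

# Two jumps, each adding half of the current state.
x_T = impulsive_flow(x0=1.0, a=-1.0, jump_times=[1.0, 2.0],
                     jump_map=lambda x: 0.5 * x, T=3.0)
```

With these values each jump multiplies the state by 1.5, so the final state is $2.25\,e^{-3}$; without jumps it would be $e^{-3}$. This shows how the maps $F_i$ determine the jump sizes while the semigroup governs the evolution between jump times.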
basic control problem is to find a control policy that imparts a minimum to the following cost functional,

$$J(v) = J(q) \equiv \int_0^T \ell(x(t), v(t))\, dt + \varphi(v) + \Phi(x(T)) \equiv \int_0^T \ell(x(t), q)\, dt + \varphi(q) + \Phi(x(T))$$

Theorem 28. Suppose assumptions (A1) to (A3) hold, with $\{f, F_i, g\}$ all having Frechet derivatives continuous and bounded on bounded sets, and the functions $\ell$, $\varphi$, $\Phi$ are once continuously Gateaux differentiable on $E \times F$, $F$, $E$, respectively. Then, if the pair $\{v^o\ (\text{or } q^o), x^o\}$ is optimal, there exists a $\psi \in PWC_r(I, E^*)$ so that the triple $\{v^o, x^o, \psi\}$ satisfies the following inequality and evolution equation:

(a)
$$\sum_{i=0}^{m} \Bigl( g^*(x^o(s_i))\psi(s_i) + \varphi_{q_i}(q^o) + \int_{s_i}^{T} \ell_u(x^o(t), v^o(t))\, dt,\ q_i - q_i^o \Bigr)_{F^*, F} \ge 0$$
for all $q = \{q_0, q_1, \ldots, q_m\} \in U_{ad}$.

(b)
$$\begin{aligned} d\psi &= -\bigl( A^*\psi + f_x^*(x^o(t))\psi - \ell_x(x^o(t), v^o(t)) \bigr)\, dt - g_x^*(x^o(t), \psi(t))\, dv^o, \quad t \in I \setminus D \\ \psi(T) &= \Phi_x(x^o(T)) \\ \Delta_r \psi(t_i) &= -F_{i,x}^*(x^o(t_i))\,\psi(t_i), \quad i = 1, 2, \ldots, n \end{aligned}$$
where the operator $\Delta_r$ is defined by $\Delta_r f(t_i) \equiv f(t_i - 0) - f(t_i)$.

(c) The process $x^o$ satisfies the system equation (148) (in the mild sense) corresponding to the control $v^o$.

In case a hard constraint is imposed on the final state requiring $x^o(T) \in K$ where $K$ is a closed convex subset of $E$ with nonempty interior, the adjoint equation given by (b) requires modification. The terminal equality condition is replaced by the inclusion $\psi(T) \in \partial I_K(x^o(T))$, where $I_K$ is the indicator function of the set $K$.

Recently a very general model for evolution inclusions has been introduced (Reference 36):

$$\begin{aligned} \dot{x}(t) &\in Ax(t) + F(x(t)), \quad t \in I \setminus D, \quad x(0) = x_0, \\ \Delta x(t_i) &\in G_i(x(t_i)), \quad 0 = t_0 < t_1 < t_2 < \cdots < t_n < t_{n+1} \equiv T \end{aligned} \tag{2.3}$$

Here, both $F$ and $\{G_i, i = 1, 2, \ldots, n\}$ are multivalued maps. This model may arise under many different situations. For example, in case of a control problem where one wishes to control the jump sizes in order to achieve certain objectives, one has the model,

$$\begin{aligned} \dot{x}(t) &\in Ax(t) + F(x(t)), \quad t \in I \setminus D, \quad x(0) = x_0, \\ \Delta x(t_i) &= g_i(x(t_i), u_i), \quad 0 = t_0 < t_1 < t_2 < \cdots < t_n < t_{n+1} \equiv T \end{aligned}$$

where the controls $u_i$ may take values from a compact metric space $U$. In this case, the multivalued maps are given by $G_i(\zeta) = g_i(\zeta, U)$. For more details on impulsive evolution equations and, in general, inclusions, the reader may consult References 35 and 36.

D. Applications

The slow growth of application of distributed systems theory is partly due to its mathematical and computational complexities. In spite of this, in recent years there have been substantial applications of distributed control theory in aerospace engineering, including vibration suppression of aircraft wings, space shuttle orbiters, space stations, flexible artificial satellites, and suspension bridges. Several papers on control of fluid dynamical systems governed by Navier–Stokes equations have appeared over the past decade with particular reference to artificial heart design. During the same period, several papers on the control of quantum mechanical and molecular systems have appeared. Distributed systems theory has also found applications in stochastic control and filtering. With the advancement of computing power, we expect far more applications in the very near future.

SEE ALSO THE FOLLOWING ARTICLES

CONTROLS, LARGE-SCALE SYSTEMS • DIFFERENTIAL EQUATIONS, ORDINARY • DIFFERENTIAL EQUATIONS, PARTIAL • TOPOLOGY, GENERAL • WAVE PHENOMENA

BIBLIOGRAPHY

Ahmed, N. U., and Teo, K. L. (1981). “Optimal Control of Distributed Parameter Systems,” North-Holland, Amsterdam.
Ahmed, N. U. (1983). “Properties of relaxed trajectories for a class of nonlinear evolution equations on a Banach space,” SIAM J. Control Optimization 2(6), 953–967.
Ahmed, N. U. (1981). “Stochastic control on Hilbert space for linear evolution equations with random operator-valued coefficients,” SIAM J. Control Optimization 19(3), 401–403.
Ahmed, N. U. (1985). “Abstract stochastic evolution equations on Banach spaces,” J. Stochastic Anal. Appl. 3(4), 397–432.
Ahmed, N. U. (1986). “Existence of optimal controls for a class of sys- solutions for Zakai equations,” Publicationes Mathematicae 49(3–4),
tems governed by differential inclusions on a Banach space,” J. Opti- 251–264.
mization Theory Appl. 50(2), 213–237. Ahmed, N. U. (1996). “Generalized solutions for linear systems governed
Ahmed, N. U. (1988). “Optimization and Identification of Systems Gov- by operators beyond Hille–Yosida type,” Publicationes Mathematicae
erned by Evolution Equations on Banach Space,” Pitman Research 48(1–2), 45–64.
Notes in Mathematics Series, Vol. 184, Longman Scientific/Wiley, Ahmed, N. U. (1997). “Measure solutions for semilinear evolution equa-
New York. tions with polynomial growth and their optimal control,” Discussiones
Balakrishnan, A. V. (1976). “Applied Functional Analysis,” Springer– Mathematicae (Differential Inclusions) 17, 5–27.
Verlag, Berlin. Ahmed, N. U. (1997). “Stochastic B-evolutions on Hilbert spaces,” Non-
Butkovskiy, A. G. (1969). “Distributed Control Systems,” Elsevier, New linear Anal. 30(1), 199–209.
York. Ahmed, N. U. (1997). “Optimal control for linear systems described
Curtain, R. F., and Pritchard, A. J. (1978). “Infinite Dimensional Linear by m- times integrated semigroups,” Publicationes Mathematicae
Systems Theory,” Lecture Notes, Vol. 8, Springer–Verlag, Berlin. 50(1–2), 1–13.
Lions, J. L. (1971). “Optimal Control of Systems Governed by Partial Xiang, X., and Ahmed, N. U. (1997). “Necessary conditions of optimality
Differential Equations,” Springer–Verlag, Berlin. for differential inclusions on Banach space,” Nonlinear Anal. 30(8),
Barbu, V. (1984). “Optimal Control of Variational Inequalities,” Pit- 5437–5445.
man Research Notes in Mathematics Series, Vol. 246, Longman Ahmed, N. U., and Kerbal, S. (1997). “Stochastic systems governed by
Scientific/Wiley, New York. B-evolutions on Hilbert spaces,” Proc. Roy. Soc. Edinburgh 127A,
Lakshmikantham, V., Bainov, D. D., and Simeonov, P. S. (1989). “Theory 903–920.
of Impulsive Differential Equations,” World Scientific, Singapore. Rogovchenko, Y. V. (1997). “Impulsive evolution systems: main results
Ahmed, N. U. (1991). “Semigroup Theory with Applications to Systems and new trends,” Dynamics of Continuous, Discrete, and Impulsive
and Control,” Pitman Research Notes in Mathematics Series, Vol. 246, Systems 3(1), 77–78.
Longman Scientific/Wiley, New York. Ahmed, N. U. (1998). “Optimal control of turbulent flow as measure
Ahmed, N. U. (1992). “Optimal Relaxed Controls for Nonlinear Stochas- solutions,” IJCFD 11, 169–180.
tic Differential Inclusions on Banach Space,” Proc. First World Fattorini, H. O. (1998). “Infinite dimensional optimization and con-
Congress of Nonlinear Analysis, Tampa, FL, de Gruyter, Berlin, trol theory,” In “Encyclopedia of Mathematics and Its Applications,”
pp. 1699–1712. Cambridge Univ. Press, Cambridge, U.K.
Ahmed, N. U. (1994). “Optimal relaxed controls for nonlinear infinite Ahmed, N. U. (1999). “Measure solutions for semilinear systems with
dimensional stochastic differential inclusions,” Lect. Notes Pure Appl. unbounded nonlinearities,” Nonlinear Anal. 35, 478–503.
Math., Optimal Control of Differential Equations 160, 1–19. Ahmed, N. U. (1999). “Relaxed solutions for stochastic evolution equa-
Ahmed, N. U. (1995). “Optimal control of infinite dimensional systems tions on Hilbert space with polynomial nonlinearities,” Publicationes
governed by functional differential inclusions,” Discussiones Mathe- Mathematicae 54(1–2), 75–101.
maticae (Differential Inclusions) 15, 75–94. Ahmed, N. U. (1999). “A general result on measure solutions for semi-
Ahmed, N. U., and Xiang, X. (1996). “Nonlinear boundary control of linear evolution equations,” Nonlinear Anal. 35.
semilinear parabolic systems,” SIAM J. Control Optimization 34(2), Liu, J. H. (1999). “Nonlinear impulsive evolution equations: dynamics
473–490. of continuous, discrete, and impulsive systems,” 6, 77–85.
Ahmed, N. U. (1996). “Optimal relaxed controls for infinite-dimensional Ahmed, N. U. (1999). “Measure solutions for impulsive systems in
stochastic systems of Zakai type,” SIAM J. Control Optimization 34(5), Banach space and their control,” J. Dynamics of Continuous, Discrete,
1592– 1615. and Impulsive Systems 6, 519–535.
Ahmed, N. U., and Xiang, X. (1996). “Nonlinear uncertain systems and Ahmed, N. U. (2000). “Optimal impulse control for impulsive systems in
necessary conditions of optimality,” SIAM J. Control Optimization Banach spaces,” Int. J. of Differential Equations and Applications 1(1).
35(5), 1755–1772. Ahmed, N. U. (2000). “Systems governed by impulsive differential in-
Ahmed, N. U. (1996). “Existence and uniqueness of measure-valued clusions on Hilbert space,” J. Nonlinear Analysis.
P1: FYD/FQW P2: FYD/LRV QC: FYD Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN004A-187 June 8, 2001 18:59
Dynamic Programming
Martin L. Puterman
The University of British Columbia
application are operations research, engineering, statistics, and resource management. Improved computing capabilities will lead to the wide application of this technique in the future.

I. SEQUENTIAL DECISION PROBLEMS

A. Introduction

A system under the control of a decision maker is evolving through time. At each point of time at which a decision can be made, the decision maker, who will be referred to as “he” with no sexist connotations intended, observes the state of the system. On the basis of this information, he chooses an action from a set of alternatives. The consequences of this action are twofold: the decision maker receives an immediate reward or incurs an immediate cost, and the state that the system will occupy at subsequent decision epochs is influenced either deterministically or probabilistically. The problem faced by the decision maker is to choose a sequence of actions that will optimize the performance of the system over the decision-making horizon. Since the action selected at present affects the future evolution of the system, the decision maker cannot choose his action without taking into account future consequences.

Dynamic programming is a procedure for finding optimal policies for sequential decision problems. It differs from linear, nonlinear, and integer programming in that there is no standard dynamic programming problem formulation. Instead, it is a collection of techniques based on developing mathematical recursions to decompose a multistage problem into a series of single-stage problems that are analytically or computationally more tractable. Its implementation often requires ingenuity on the part of the analyst, and the formulation of dynamic programming problems is considered by some practitioners to be an art. This subject is best understood through examples. This section proceeds with a formal introduction of the basic sequential decision problem and follows with several examples. The reader is encouraged to skip back and forth between these sections to understand the basic ingredients of such a problem. Dynamic programming methodology is discussed in Sections II and III.

B. Problem Formulation

Some formal notation follows. Let $T$ denote the set of time points at which decisions can be made. The set $T$ can be classified in two ways; it is either finite or infinite and either a discrete set or a continuum. The primary focus of this article is the case in which $T$ is discrete. Discrete-time problems are classified as either finite horizon or infinite horizon according to whether the set $T$ is finite or infinite. The problem formulation in these two cases is almost identical; however, the dynamic programming methods of solution differ considerably. For discrete-time problems, $T$ is the set $\{1, 2, \ldots, N\}$ in the finite case and $\{1, 2, \ldots\}$ in the infinite case. The present decision point is denoted by $t$ and the subsequent point by $t + 1$. The points of time at which decisions can be made are often called stages. Almost all the results in this article concern discrete-time models; the continuous-time model is briefly mentioned in Section IV.

The set of possible states of the system at time $t$ is denoted by $S_t$. In finite-horizon problems, this is defined for $t = 1, 2, \ldots, N + 1$, although decisions are made only at times $t = 1, 2, \ldots, N$. This is because the decision at time $N$ often has future consequences that can be summarized by evaluating the state of the system at time $N + 1$. This is analogous to providing boundary values for differential equations. If at time $t$ the decision maker observes the system in state $s \in S_t$, he chooses an action $a$ from the set of allowable actions at time $t$, $A_{s,t}$. As above, $S_t$ and $A_{s,t}$ can be either finite or infinite and discrete or continuous. This distinction has little consequence for the problem formulation.

As a result of choosing action $a$ when the system is in state $s$ at time $t$, the decision maker receives an immediate reward $r_t(s, a)$. This reward can be positive or negative. In the latter case it can be thought of as a cost. Furthermore, the choice of action affects the system evolution either deterministically or probabilistically. In the deterministic case, the choice of action determines the state of the system at time $t + 1$ with certainty. Denote by $w_t(s, a) \in S_{t+1}$ the state the system will occupy if action $a$ is chosen in state $s$ at time $t$; $w_t(s, a)$ is called the transfer function. When the system evolves probabilistically, the subsequent state is random and the choice of action specifies its probability distribution. Let $p_t(j \mid s, a)$ denote the probability that the system is in state $j \in S_{t+1}$ if action $a$ is chosen in state $s$ at time $t$; $p_t(j \mid s, a)$ is called the transition probability function. When $S_t$ is a continuum, $p_t(j \mid s, a)$ is a probability density. Such models are discussed briefly in Section IV. A sequential decision problem in which the transitions from state to state are governed by a transition probability function and the sets of actions and the rewards depend only on the current state and stage is called a Markov decision problem.

The deterministic model is a special case of the probabilistic model obtained by choosing $p_t(j \mid s, a) = 1$ if $j = w_t(s, a)$ and $p_t(j \mid s, a) = 0$ if $j \neq w_t(s, a)$. Even though there is this equivalence, the transfer function representation is more convenient for deterministic problems.

A decision rule is a function $d_t : S_t \to A_{s,t}$ that specifies the action the decision maker chooses when the system is
in state $s$ at time $t$; that is, $d_t(s)$ specifies an action in $A_{s,t}$ for each $s$ in $S_t$. A decision rule of this type is called Markovian because it depends only on the current state of the system. The set of allowable decision rules at time $t$ is denoted by $D_t$ and is called the decision set. Usually it is the set of all functions mapping $S_t$ to $A_{s,t}$, but in some applications, it might be a proper subset.

Many generalizations of deterministic Markovian decision rules are possible. Decision rules can depend on the entire history of the system, which is summarized in the sequence of states and actions observed up to the present, or they can depend only on the initial and current state of the system. Furthermore, the decision rule might be randomized; that is, in each state it specifies a probability distribution on the set of allowable actions so that by using such a rule the decision maker chooses his action at each decision epoch by a probabilistic mechanism. For the problems considered in this article, using deterministic Markovian decision rules at each stage is optimal so that the generalizations referred to above will not be discussed further.

A policy specifies the sequence of decision rules to be used by the decision maker over the course of the planning horizon. A policy $\pi$ is a finite or an infinite sequence of decision rules; that is, $\pi = \{d_1, d_2, \ldots, d_N\}$, where $d_t \in D_t$ for $t = 1, 2, \ldots, N$ if the horizon is finite, or $\pi = \{d_1, d_2, \ldots\}$, where $d_t \in D_t$ for $t = 1, 2, \ldots$ if the horizon is infinite. Let $\Pi$ denote the set of all possible policies; $\Pi = D_1 \times D_2 \times \cdots \times D_N$ in the finite case and $\Pi = D_1 \times D_2 \times \cdots$ in the infinite case.

In deterministic problems, by specifying a policy at the start of the problem, the decision maker completely determines the future evolution of the system. For each policy the sequence of states the system will occupy is known with certainty, and hence the sequence of rewards the decision maker will receive over the planning horizon is known. Let $X_t^\pi$ denote the state the system occupies at time $t$ if the decision maker uses policy $\pi$ over the planning horizon. In the first period the decision maker receives a reward of $r_1(s, d_1(s))$, in the second period a reward of $r_2(X_2^\pi, d_2(X_2^\pi))$, and in the $t$th period $r_t(X_t^\pi, d_t(X_t^\pi))$.

Figure 1 depicts the evolution of the process under a policy $\pi = \{d_1, d_2, \ldots, d_N\}$ in both the deterministic and stochastic cases. The quantity in each box indicates the interaction of the incoming state with the prespecified decision rule to produce the indicated action $d_t(X_t^\pi)$. The arrow to the right of a box indicates the resulting state, and the arrow downward the resulting reward to the decision maker. The system is assumed to be in state $s$ before the first decision.

FIGURE 1 Evolution of the sequential decision model under the policy $\pi = (d_1, d_2, \ldots, d_N)$. The state at stage 1 is $s$.

The decision maker evaluates policies by comparing the value of a function of the policy’s income stream. Many such evaluation functions are available, but it is most convenient to assume a linear, additive, and risk-neutral utility function over time, which leads to using the total reward over the planning horizon for evaluation. Let $v_N^\pi(s)$ be the total reward over the planning horizon. It is given by the expression

$$v_N^\pi(s) = \sum_{t=1}^{N+1} r_t\left(X_t^\pi, d_t\left(X_t^\pi\right)\right), \qquad (1)$$

in which it is implicit that $X_1^\pi = s$. For deterministic problems, evaluation formulas such as Eq. (1) always depend on the initial state of the process, although this is not explicitly stated below.

In probabilistic problems, by specifying a policy at the start of the problem, the decision maker determines the transition probability functions of a nonstationary Markov chain. The sequence of states the system will occupy is not known with certainty, and consequently the sequence of rewards the decision maker will receive over the planning horizon is not known. Instead, what is known is the joint probability distribution of system states and rewards. In

$$v_\lambda^*(s) = \sup_{\pi \in \Pi} v_\lambda^\pi(s) \quad \text{for all } s \in S_1. \qquad (6)$$
1. A Longest-Route Problem

A finite directed graph is depicted in Fig. 2. The circles are called nodes, and the lines connecting them are called arcs. On each arc, an arrow indicates the direction in which movement is possible. The numerical value on the arc is the reward the decision maker receives if he chooses to traverse that arc on his journey from node 1 to node 7. His objective is to find the path from node 1 to node 7 that maximizes the total reward he receives on his journey. Such a problem is called a longest-route problem. A practical application is determining the length of time needed to complete a project. In such problems, the arc length represents the time to complete a task. The entire project is not finished until all tasks are performed. Finding the longest path through the network gives the minimum amount of time for the entire project to be completed since it corresponds to that sequence of tasks that requires the most time. The longest path is called a critical path because, if any task in this sequence is delayed, the entire project will be delayed.

FIGURE 2 Network for the longest-route problem.

In other applications the values on the arcs represent lengths of road segments, and the decision maker’s objective is to find the shortest route from the first node to the terminal node. Such a problem is called a shortest-route problem. All deterministic, finite-state, finite-action, finite-horizon dynamic programming problems are equivalent to shortest- or longest-route problems. Another example of this will appear in Section II.

An important assumption is that the network contains no directed cycle; that is, there is no route starting at a node that returns to that node. If there were such a cycle, the longest route would be infinite and the problem would not be of interest.

The longest-route problem is now formulated as a sequential decision problem. This requires defining the set of states, actions, decision sets, transfer functions, and rewards. They are as follows:

Decision points:
$T = \{1, 2, 3\}$

States (numbers correspond to nodes):
$S_1 = \{1\}$; $S_2 = \{2, 3\}$; $S_3 = \{4, 5, 6\}$; $S_4 = \{7\}$

Actions (action $j$ selected in node $i$ corresponds to choosing to traverse the arc between nodes $i$ and $j$; the first subscript on $A$ is the state and the second the stage):
$A_{1,1} = \{2, 3\}$
$A_{2,2} = \{4, 5\}$; $A_{3,2} = \{4, 5, 6\}$
$A_{4,3} = \{7\}$; $A_{5,3} = \{7\}$; $A_{6,3} = \{7\}$

Rewards:
$r_1(1, 2) = 2$; $r_1(1, 3) = 4$
$r_2(2, 4) = 5$; $r_2(2, 5) = 6$
$r_2(3, 4) = 5$; $r_2(3, 5) = 3$; $r_2(3, 6) = 1$
$r_3(4, 7) = 3$; $r_3(5, 7) = 6$; $r_3(6, 7) = 2$

Transfer function:
$w_t(s, a) = a$

The remaining ingredients in the sequential decision problem formulation are the decision set, the set of policies, and an evaluation formula. The decision set at stage $t$ is the set of all arcs emanating from nodes at stage $t$. A policy is a list of arcs in which there is one arc starting at each node (except 7) in the network. The policy set contains all such lists. Each policy contains a route from node 1 to node 7 and some superfluous action selections. The value of the policy is the total of the rewards along this route, and the decision maker’s problem is to choose a policy that maximizes this total reward.

The structure of a policy is described in more detail through an example. Consider the policy $\pi = \{(1, 2), (2, 4), (3, 4), (4, 7), (5, 7), (6, 7)\}$. Implicit in this definition is a sequence of decision rules $d_t(s)$ for each state and stage. They are $d_1(1) = 2$, $d_2(2) = 4$, $d_2(3) = 4$, $d_3(4) = 7$, $d_3(5) = 7$, and $d_3(6) = 7$. This policy can be formally denoted by $\pi = \{d_1, d_2, d_3\}$. The policy contains one unique routing through the graph, namely, 1 → 2 → 4 → 7, and several unnecessary decisions. We use the formal notation $X_1^\pi = 1$, $X_2^\pi = 2$, $X_3^\pi = 4$, and $X_4^\pi = 7$ so that

$$v_3^\pi(1) = r_1(1, d_1(1)) + r_2(2, d_2(2)) + r_3(4, d_3(4)) = r_1(1, 2) + r_2(2, 4) + r_3(4, 7) = 2 + 5 + 3 = 10.$$
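The formulation can be checked numerically. The following Python sketch encodes the arc rewards listed above, evaluates the example policy by following the transfer function, and compares all policies by brute force. The names `REWARD`, `evaluate`, and `all_policies` are illustrative choices and not part of the article; the stage subscript is dropped because each node belongs to exactly one stage.

```python
from itertools import product

# Arc rewards r_t(s, a) from the formulation, keyed by (node, next_node).
REWARD = {
    (1, 2): 2, (1, 3): 4,
    (2, 4): 5, (2, 5): 6,
    (3, 4): 5, (3, 5): 3, (3, 6): 1,
    (4, 7): 3, (5, 7): 6, (6, 7): 2,
}
TERMINAL = 7


def actions(node):
    """Allowable actions A_{s,t}: successor nodes reachable from `node`."""
    return sorted(j for (i, j) in REWARD if i == node)


def evaluate(policy, start=1):
    """Follow w_t(s, a) = a under `policy` and total the rewards on the route."""
    route, total, node = [start], 0, start
    while node != TERMINAL:
        nxt = policy[node]
        total += REWARD[(node, nxt)]
        route.append(nxt)
        node = nxt
    return route, total


def all_policies():
    """Yield every policy: one action chosen for each non-terminal node."""
    nodes = sorted({i for (i, _) in REWARD})
    for choice in product(*(actions(n) for n in nodes)):
        yield dict(zip(nodes, choice))


# The example policy: d1(1)=2, d2(2)=4, d2(3)=4, d3(4)=d3(5)=d3(6)=7.
pi = {1: 2, 2: 4, 3: 4, 4: 7, 5: 7, 6: 7}
route_pi, value_pi = evaluate(pi)

# Brute-force comparison of all policies.
best_route, best_value = max(
    (evaluate(p) for p in all_policies()), key=lambda rv: rv[1]
)

print(route_pi, value_pi)      # [1, 2, 4, 7] 10
print(best_route, best_value)  # [1, 2, 5, 7] 14
```

Enumeration is workable here only because the network admits just 12 policies; the count grows multiplicatively with the number of nodes and actions.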
In such a small problem, one can easily evaluate all policies by enumeration and determine that the longest route through the network is 1 → 2 → 5 → 7 with a return of 14. For larger problems this is not efficient; dynamic programming methods will be seen to offer an efficient means of determining the longest route.

The reader might note that the formal sequential decision process notation is quite redundant here. The subscript for stage does not convey any useful information and the specification of a policy requires making decisions in nodes that will never be reached. Solution by dynamic programming methods will require this superfluous information. In other settings this information will be useful.

2. A Resource Allocation Problem

A decision maker has a finite amount $K$ of a resource to allocate among $N$ possible activities. Using activity $i$ at level $x_i$ consumes $c_i(x_i)$ units of the resource and yields a reward or utility of $f_i(x_i)$ to the decision maker. The maximum level of intensity for activity $i$ is $M_i$. His objective is to determine the intensity for each of the activities that maximizes his total reward. When any level of the activity is possible, this is a nonlinear programming problem. When the activity can operate only at a finite set of levels, this is an integer programming problem. In the special case that the activity can be either utilized or not ($M_i = 1$ and $x_i$ is an integer) this is often called a knapsack problem. This is because it can be used to model the problem of a camper who has to decide which of $N$ potential items to carry in his knapsack. The value of item $i$ is $f_i(1)$ and it weighs $c_i(1)$. The camper wishes to select the most valuable set of items that do not weigh more than the capacity of the knapsack.

The mathematical formulation of the resource allocation problem is as follows:

Maximize $f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N)$

subject to

$$c_1(x_1) + c_2(x_2) + \cdots + c_N(x_N) \le K \qquad (9)$$

and

$$0 \le x_t \le M_t, \quad t = 1, 2, \ldots, N. \qquad (10)$$

The following change of variables facilitates the sequential decision problem formulation. Define the new variable $s_i = c_i(x_i)$ and assume that $c_i$ is a monotone increasing function on $[0, M_i]$. This assumption says that the more intense the activity level, the more resource utilized. Define $g_i(s_i) = f_i(c_i^{-1}(s_i))$ and $m_i = c_i(M_i)$, the amount of resource consumed at the maximum level of intensity. This change of variables corresponds to formulating the problem in terms of the quantity of resource being used. In this notation the formulation above becomes

Maximize $g_1(s_1) + g_2(s_2) + \cdots + g_N(s_N)$

subject to

$$s_1 + s_2 + \cdots + s_N \le K \qquad (11)$$

and

$$0 \le s_t \le m_t, \quad t = 1, 2, \ldots, N. \qquad (12)$$

It is not immediately obvious that this problem is a sequential decision problem. The formulation is based on treating the problem as if the decisions to allocate resources to the activities were made sequentially through time with allocation to activity 1 first and activity $N$ last. Decisions are coupled, since successive allocations must take the quantity of the resource allocated previously into account. That is, if $K - s$ units of resource have been allocated to the first $t$ activities, then $s$ units are available for activities $t + 1, t + 2, \ldots, N$.

The following sequential decision problem formulation is based on the second formulation above:

Decision points (correspond to activity number):
$T = \{1, 2, \ldots, N\}$

States (amount of resource available for allocation in remaining stages): For $1 \le t \le N$,

$$S_t = \begin{cases} \{s : 0 \le s \le m_t\} & \text{if resource levels are continuous} \\ \{0, 1, 2, \ldots, m_t\} & \text{if resource levels are discrete} \end{cases}$$

For $t = N + 1$,

$$S_t = \begin{cases} \{s : 0 \le s \le K\} & \text{if resource levels are continuous} \\ \{0, 1, 2, \ldots, K\} & \text{if resource levels are discrete} \end{cases}$$

Actions ($s$ is the amount of resource available for stages $t, t + 1, \ldots, N$):

$$A_{s,t} = \begin{cases} \{u : 0 \le u \le \min(s, m_t)\} & \text{if resource levels are continuous} \\ \{0, 1, 2, \ldots, \min(s, m_t)\} & \text{if resource levels are discrete} \end{cases}$$

Rewards:
$r_t(s, a) = g_t(a)$

Transfer function:
$w_t(s, a) = s - a$

The decision set at stage $t$ is the set of all functions from $S_t$ to $A_{s,t}$, and a policy is a sequence of such functions, one for each $t \in T$. A decision rule specifies the amount of resource to allocate to activity $t$ if $s$ units are available for allocation to activities $t, t + 1, \ldots, N$, and a policy
specifies which decision rule to use at each stage of the sequential allocation. As in the longest-route problem, a policy specifies decisions in many eventualities that will not occur using that policy. This may seem wasteful at first but is fundamental to the dynamic programming methodology. The quantity $X_t^\pi$ is the amount of remaining resource available for allocation to activities $t, t + 1, \ldots, N$ using policy $\pi$. Clearly, $X_1^\pi = K$, $X_2^\pi = K - d_1(K)$, and so forth. The decision maker compares policies through the quantity $v_N^\pi(K)$, which is given by

$$v_N^\pi(K) = g_1(d_1(K)) + g_2\left(d_2\left(X_2^\pi\right)\right) + \cdots + g_N\left(d_N\left(X_N^\pi\right)\right).$$

When the set of activities is discrete, the resource allocation problem can be formulated as a longest-route problem, as can any discrete state and action sequential decision problem. This is depicted in Fig. 3 for the following specific example:

Maximize $3s_1^2 + s_2^3 + 4s_3$

subject to

$s_1 + s_2 + s_3 \le 4$,

$s_1$, $s_2$, and $s_3$ are integers, and

$0 \le s_1 \le 2$; $0 \le s_2 \le 2$; $0 \le s_3 \le 2$.

In the longest-route formulation, the node labels are the amount of resource available for allocation at subsequent stages. A fifth stage is added so that there is a unique destination and all decisions at stage 4 correspond to moving from a node at stage 4 to node 0 with no reward. This is because an unallocated resource has no value to the decision maker in this formulation. The number on each arc is the reward, and the amount of resource allocated is the difference between the node label at stage $t$ and that at stage $t + 1$. For instance, if at stage 2, there are 3 units of resource available for allocation over successive stages and the decision maker decides to allocate 2 units, he will receive a reward of $2^3 = 8$ units and move to node 1.

FIGURE 3 Network for the resource allocation problem.

When the resource levels form a continuum, the network representation is no longer valid. The problem is reduced to a sequence of constrained one-dimensional nonlinear optimization problems through dynamic programming. Such an example will be solved in Section II by using dynamic programming methods.

3. A Stochastic Inventory Control Problem

Each month, the manager of a warehouse must determine how much of a product to keep in stock to satisfy customer demand for the product. The objective is to maximize expected total profit (sales revenue less inventory holding and ordering costs), which may or may not be discounted. The demand throughout the month is random, with a known probability distribution. Several simplifying assumptions make a concise formulation possible:

a. The decision to order additional stock is made at the beginning of each month and delivery occurs instantaneously.
b. Demand for the product arrives throughout the month but is filled on the last day of the month.
c. If demand exceeds the stock on hand, the customer goes elsewhere to purchase the product.
d. The revenues and costs and the demand distribution are identical each month.
e. The product can be sold only in whole units.
f. The warehouse capacity is $M$ units.

Let $s_t$ denote the inventory on hand at the beginning of month $t$, $a_t$ the additional product ordered in month $t$, and $D_t$ the random demand in month $t$. The demand has a known probability distribution given by $p_d = P\{D_t = d\}$, $d = 0, 1, 2, \ldots$. The cost of ordering $u$ units in any month is $O(u)$ and the cost of storing $u$ units for 1 month is $h(u)$. The ordering cost is given by

$$O(u) = \begin{cases} K + c(u), & \text{if } u > 0 \\ 0, & \text{if } u = 0, \end{cases} \qquad (13)$$

where $c(u)$ and $h(u)$ are increasing functions of $u$. For finite-horizon problems, if $u$ units of inventory are on hand at the end of the planning horizon, its value is $g(u)$. Finally, if $u$ units of product are demanded in a month and the inventory is sufficient to satisfy demand, the manager receives $f(u)$. Define $F(u)$ to be the expected revenue in a month if the inventory before receipt of customer orders is $u$ units. It is given in period $t$ by

$$F(u) = \sum_{j=0}^{u-1} f(j)\, p_j + f(u)\, P\{D_t \ge u\}. \qquad (14)$$
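As a check on Eq. (14), the expected revenue can be computed directly. The sketch below is a minimal Python illustration that borrows the data of the numerical example given later in this section ($f(u) = 8u$, $p_0 = 1/4$, $p_1 = 1/2$, $p_2 = 1/4$); the function names `expected_revenue` and `revenue` are hypothetical, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def expected_revenue(u, f, p):
    """F(u) per Eq. (14): sum_{j<u} f(j) p_j + f(u) P{D >= u}.

    `p` maps each demand value d to p_d; unlisted demands have probability 0.
    """
    below = sum(f(j) * p.get(j, 0) for j in range(u))
    tail = 1 - sum(p.get(j, 0) for j in range(u))  # P{D >= u}
    return below + f(u) * tail


def revenue(u):
    """Revenue f(u) = 8u from selling u units (numerical example data)."""
    return 8 * u


# Demand distribution of the numerical example.
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

F = [expected_revenue(u, revenue, p) for u in range(4)]
print([int(x) for x in F])  # [0, 6, 8, 8]
```

Because demand never exceeds 2 units in this example, stocking a third unit adds no expected revenue, which is why $F(3) = F(2)$.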
Equation (14) can be interpreted as follows. If the inventory on hand exceeds the quantity demanded, $j$, the revenue is $f(j)$; $p_j$ is the probability that the demand in a period is $j$ units. If the inventory on hand is $u$ units and the quantity demanded is at least $u$ units, then the revenue is $f(u)$ and $P\{D_t \ge u\}$ is the probability of such a demand. The combined quantity is the probability-weighted, or expected, revenue.

This is a stochastic sequential decision problem (Markov decision problem), and its formulation will include a transition probability function instead of a transfer function. The formulation follows:

Decision points:
$T = \{1, 2, \ldots, N\}$; $N$ may be finite or infinite

States (units of inventory on hand at the start of a month):
$S_t = \{0, 1, 2, \ldots, M\}$, $t = 1, 2, \ldots, N + 1$

Actions (the amount of additional stock to order in month $t$):
$A_{s,t} = \{0, 1, 2, \ldots, M - s\}$

Expected rewards (expected revenue less ordering and holding costs):
$r_t(s, a) = F(s + a) - O(a) - h(s + a)$, $t = 1, 2, \ldots, N$

Value of terminal inventory (no actions are possible):
$r_{N+1}(s, a) = g(s)$

Transition probabilities (see explanation below):

$$p_t(j \mid s, a) = \begin{cases} 0, & \text{if } j > s + a \\ p_{s+a-j}, & \text{if } 0 < j \le s + a \le M \text{ (that is, } j = s + a - D_t \text{ and } s + a > D_t\text{)} \\ q_{s+a}, & \text{if } j = 0,\ s + a \le M, \text{ and } s + a \le D_t \end{cases}$$

where

$$q_{s+a} = P\{D_t \ge s + a\} = \sum_{d=s+a}^{\infty} p_d.$$

A brief explanation of the transition probabilities might be helpful. If the inventory on hand at the beginning of period $t$ is $s$ units and an order is placed for $a$ units, the inventory before external demand is $s + a$ units. If the demand of $d$ units is less than $s + a$ units, then the inventory at the beginning of period $t + 1$ is $j = s + a - d$ units. This occurs with probability $p_d = p_{s+a-j}$. If the demand equals or exceeds $s + a$ units, then the inventory at the start of period $t + 1$ is 0 units. This occurs with probability $q_{s+a}$. Finally, the probability that the inventory level at the start of period $t + 1$ exceeds $s + a$ units is zero, since demand is nonnegative.

The decision sets consist of all rules that assign the quantity of inventory to be ordered each month to each possible starting inventory position in a month. A policy is a sequence of such ordering rules. Unlike deterministic problems, in which a decision rule is specified for many states that will never be reached, in stochastic problems such as this, it is necessary for the decision maker to determine the decision rule for all states. This is because the evolution of the inventory level over time is random, which makes any inventory level possible at any decision point. Consequently the decision maker must plan for each of these eventualities.

An example of a decision rule is as follows: Order only if the inventory level is below 3 units at the start of the month and order the quantity that raises the stock level to 10 units on receipt of the order. In month $t$ this is given by

$$d_t(s) = \begin{cases} 10 - s, & s < 3 \\ 0, & s \ge 3. \end{cases}$$

The evaluation method for a policy depends on the time horizon under consideration. For finite-horizon problems, the total expected reward conditional on the initial stock level is a convenient summary. Assuming the stock level at time 1 is $s$, the expected total reward for policy $\pi$ is

$$v_N^\pi(s) = E_{\pi,s}\left\{ \sum_{t=1}^{N} \left[ F\left(X_t^\pi + d_t\left(X_t^\pi\right)\right) - O\left(d_t\left(X_t^\pi\right)\right) - h\left(X_t^\pi + d_t\left(X_t^\pi\right)\right) \right] + g\left(X_{N+1}^\pi\right) \right\}.$$

If, instead, the decision maker wishes to discount future profit at a monthly discount rate of $\lambda$, $0 \le \lambda < 1$, the term $\lambda^{t-1}$ is inserted before each term in the above summation and $\lambda^N$ before the terminal reward $g$. For an infinite-horizon problem, the expected total discounted profit is given by

$$v_\lambda^\pi(s) = E_{\pi,s}\left\{ \sum_{t=1}^{\infty} \lambda^{t-1} \left[ F\left(X_t^\pi + d_t\left(X_t^\pi\right)\right) - O\left(d_t\left(X_t^\pi\right)\right) - h\left(X_t^\pi + d_t\left(X_t^\pi\right)\right) \right] \right\}.$$

The decision maker’s problem is to choose a sequence of decision rules to maximize expected total or total discounted profits.

Many modifications of this inventory problem are possible; for example, excess demand in any period could be backlogged and a penalty for carrying unsatisfied demand could be charged, or there could be a time lag between placing the order and its receipt. The formulation herein
can easily be modified to include such changes; the interested reader is encouraged to consult the Bibliography for more details.

A numerical example is now provided in complete detail. It will be solved in subsequent sections by using dynamic programming methods. The data for the problem are as follows: $K = 4$, $c(u) = 2u$, $g(u) = 0$, $h(u) = u$, $M = 3$, $N = 3$, $f(u) = 8u$, and

$$p_d = \begin{cases} \tfrac14, & \text{if } d = 0 \\ \tfrac12, & \text{if } d = 1 \\ \tfrac14, & \text{if } d = 2. \end{cases}$$

The inventory is constrained to be 3 or fewer units, and the decision maker wishes to consider the effects over three periods. All the costs and revenues are linear. This means that for each unit ordered the per unit cost is 2, for each unit held in inventory for 1 month the per unit cost is 1, and for each unit sold the per unit revenue is 8. The expected revenue when u units of stock are on hand before receipt of an order is given by

u    F(u)
0    0
1    0 × 1/4 + 8 × 3/4 = 6
2    0 × 1/4 + 8 × 1/2 + 16 × 1/4 = 8
3    0 × 1/4 + 8 × 1/2 + 16 × 1/4 = 8

II. FINITE-HORIZON DYNAMIC PROGRAMMING

A. Introduction

Dynamic programming is a collection of methods for solving sequential decision problems. The methods are based on decomposing a multistage problem into a sequence of interrelated one-stage problems. Fundamental to this decomposition is the principle of optimality, which was developed by Richard Bellman in the 1950s. Its importance is that an optimal solution for a multistage problem can be found by solving a functional equation relating the optimal value for a (t + 1)-stage problem to the optimal value for a t-stage problem.

Solution methods for problems depend on the time horizon and whether the problem is deterministic or stochastic. Deterministic finite-horizon problems are usually solved by backward induction, although several other methods, including forward induction and reaching, are available. For finite-horizon stochastic problems, backward induction is the only method of solution. In the infinite-horizon case, different approaches are used. These will be discussed in Section III. The backward induction procedure is described in the next two sections. This material might seem difficult at first; the reader is encouraged to refer to the examples at the end of this section for clarification.
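As a small illustration of how a multistage problem decomposes into one-stage problems, the following Python sketch applies backward induction to a deterministic longest-route problem and then traces one longest route forward. The network, its arc lengths, and all function names are hypothetical and chosen only for illustration; they are not taken from the article's figures.

```python
# Hypothetical layered network: arcs[node] maps each successor to an arc length.
arcs = {
    1: {2: 2, 3: 4},
    2: {4: 4, 5: 5},
    3: {4: 5, 5: 2, 6: 3},
    4: {7: 3},
    5: {7: 7},
    6: {7: 4},
}
stages = [[1], [2, 3], [4, 5, 6]]   # nodes available at decision stages 1..N
terminal = 7


def backward_induction(arcs, stages, terminal):
    """Return the value of every node and the set of maximizing successors."""
    v = {terminal: 0}                # boundary condition: nothing is earned after stage N
    best = {}
    for nodes in reversed(stages):   # stages N, N-1, ..., 1
        for s in nodes:
            # One-stage problem: immediate arc length plus value of the successor.
            totals = {a: length + v[a] for a, length in arcs[s].items()}
            v[s] = max(totals.values())
            best[s] = [a for a, tot in totals.items() if tot == v[s]]
    return v, best


def select_route(start, best, terminal):
    """Trace one longest route forward using the maximizing successors."""
    route = [start]
    while route[-1] != terminal:
        route.append(best[route[-1]][0])
    return route


def myopic_route(start, arcs, terminal):
    """Always take the longest outgoing arc, ignoring what follows."""
    route, length = [start], 0
    while route[-1] != terminal:
        s = route[-1]
        a = max(arcs[s], key=arcs[s].get)
        length += arcs[s][a]
        route.append(a)
    return route, length


v, best = backward_induction(arcs, stages, terminal)
route = select_route(1, best, terminal)
myopic, myopic_len = myopic_route(1, arcs, terminal)
print(v[1], route, myopic, myopic_len)
```

On this hypothetical data the route found by backward induction is strictly longer than the route obtained by always taking the longest immediate arc, which is the kind of gap the functional-equation approach is meant to avoid.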
induction algorithm for solving sequential decision problems. This equation corresponds to a one-stage problem in which the decision maker observes the system in state $s$ and must select an action from the set $A_{s,t}$. The consequence of this action is that the decision maker receives an immediate reward of $r_t(s, a)$ and the system moves to state $w_t(s, a)$, at which he receives a reward of $v_{t+1}(w_t(s, a))$. Equation (15) says that he chooses the action that maximizes the total of these two rewards. This is exactly the problem faced by the decision maker in a one-stage sequential decision problem when the terminal reward function is $v_{t+1}$. Equation (16) provides a boundary condition. When the application dictates, this value 0 can be replaced by an arbitrary function that assigns a value to the terminal state of the system. Such might be the case in the inventory control example.

Equation (15) emphasizes the dynamic aspects of the sequential decision problem. The decision maker chooses that action which maximizes his immediate reward plus his reward over the remaining decision epochs. This is in contrast to the situation in which the decision maker behaves myopically and chooses the decision rule that maximizes the reward only in the current period and ignores future consequences. Some researchers have given conditions in which such a myopic policy is optimal; however, in almost all problems dynamic aspects must be taken into account.

The expression "max" requires explanation because it is fundamental to the dynamic programming methodology. If $f(x, y)$ is any function of two variables with $x \in X$ and $y \in Y$, then

$$g(x) = \max_{y \in Y} \{ f(x, y) \}$$

… expected reward received over the remaining periods as a consequence of choosing action $a$ in period $t$.

C. Backward Induction and the Principle of Optimality

Backward induction is a procedure that uses the functional equation in an iterative fashion to find the optimal total value function and an optimal policy for a finite-horizon sequential decision problem. That this method achieves these objectives is demonstrated by the principle of optimality. The principle of optimality is not a universal truth that applies to all sequential decision problems but a mathematical result that requires formal proof in each application. For problems in which the (expected) total reward criterion is used, as considered in this article, it is valid. A brief argument of why it holds in such problems is given below.

To motivate backward induction, the following iterative procedure for finding the total reward of some specified policy $\pi = (d_1, d_2, \ldots, d_N)$ is given. It is called the policy evaluation algorithm. To simplify notation, assume that $p_t(j|s, a) = p(j|s, a)$ and $r_t(s, a) = r(s, a)$ for all $s$, $a$, and $j$.

a. Set $t = N + 1$ and $v_{N+1}(s) = 0$ for all $s \in S_{N+1}$.
b. Substitute $t - 1$ for $t$ ($t - 1 \to t$) and compute $v_t(s)$ for each $s \in S_t$ in the deterministic case by

$$v_t(s) = r(s, d_t(s)) + v_{t+1}(w_t(s, d_t(s))), \tag{18}$$

or in the stochastic case by
The backward induction algorithm proceeds as follows:

a. Set $t = N + 1$ and $v_{N+1}(s) = 0$ for all $s \in S_{N+1}$.
b. Substitute $t - 1$ for $t$ ($t - 1 \to t$) and compute $v_t(s)$ for each $s \in S_t$ using Eq. (15) or (17) depending on which is appropriate. Denote by $A^*_{s,t}$ the set of actions $a^*$ for which in the deterministic case,

$$v_t(s) = r(s, a^*) + v_{t+1}(w_t(s, a^*)), \tag{20}$$

or in the stochastic case,

$$v_t(s) = r(s, a^*) + \sum_{j \in S_{t+1}} p(j|s, a^*)\, v_{t+1}(j). \tag{21}$$

c. If $t = 1$, stop; otherwise, return to step b.

By comparing this procedure with the policy evaluation procedure above, we can easily see that the backward induction algorithm accomplishes three objectives:

a. It finds sets of actions $A^*_{s,t}$ that contain all actions in $A_{s,t}$ that obtain the maximum in Eq. (15) or (17).
b. It evaluates any policy made up of actions selected from the sets $A^*_{s,t}$.
c. It gives the total return or expected total return $v_t(s)$ that would be obtained if a policy corresponding to selecting actions in $A^*_{s,t}$ were used from stage $t$ onward.

Thus, if the decision maker had specified a policy that selected actions in the sets $A^*_{s,t}$ before applying the policy evaluation algorithm, these two procedures would be identical. It will be argued below that any policy obtained by selecting an action from $A^*_{s,t}$ in each state at every stage is optimal and consequently $v_1(s)$ is the optimal value function for the problem; that is, $v_1(s) = v^*_N(s)$.

In deterministic problems, specifying a policy often provides much superfluous information, since if the state of the system is known before the first decision, a policy determines the system evolution with certainty and only one state is reached at each stage. Since all deterministic problems are equivalent to longest-route problems, the objective in such problems is only to find a longest route. The following route selection algorithm does this. The system state is known before decision 1, so $S_1$ contains a single state.

a. Set $t = 1$ and for $s \in S_t$ define $d_t(s) = a^*$ for some $a^* \in A^*_{s,t}$. Set $u = w_t(s, a^*)$.
b. For $u \in S_{t+1}$, define $d_{t+1}(u) = a^*$ for some $a^* \in A^*_{u,t+1}$. Replace $u$ by $u = w_{t+1}(u, a^*)$.
c. If $t + 1 = N$, stop; otherwise, $t + 1 \to t$ and return to step b.

In this algorithm, the choice of an action at each node determines which arc will be traversed, and decisions are necessary only at nodes that can be reached from the initial state. The algorithm traces forward through the network along the path determined by decisions that obtain the maximum in Eq. (15). It produces one route through the network with longest length. If at any stage the set $A^*_{s,t}$ contains more than one action, then several optimal routings exist, and if all are desired, the procedure must be carried out to trace each path.

A problem closely related to the longest-route problem is that of finding the longest route from each node to the final node. When there is only one action in $A^*_{s,t}$ for each $s$ and $t$, then specifying the decision rule that in each state is equal to the unique maximizing action produces these routings. This is closer in spirit to the concept of a policy than a longest route.

That the above procedure results in an optimal policy and optimal value function is due to the additivity of rewards in successive periods. A formal proof of these results is based on induction, but the following argument gives the main idea. The backward induction algorithm chooses maximizing actions in reverse order. It does not matter what happened before the current stage. The only important information for future decisions is the current state of the system. First, for stage $N$ the best action in each state is selected. Clearly, $v_N(s)$ is the optimal value function for a one-stage problem beginning in state $s$ at stage $N$. Next, in each state at stage $N - 1$, an action is found to maximize the immediate reward plus the reward that will be obtained if, after reaching a state at stage $N$, the decision maker chooses the optimal action at that stage. Clearly, $v_{N-1}(s)$ is the optimal value for a one-stage problem with terminal reward $v_N(s)$. Since $v_N(s)$ is the optimal value for the one-stage problem starting at stage $N$, no greater total reward can be obtained over these two stages. Hence, $v_{N-1}(s)$ is the optimal reward from stage $N - 1$ onward starting in state $s$. Now, since the sets $A^*_{s,N}$ and $A^*_{s,N-1}$ have been determined by the backward induction algorithm, choosing any policy that selects actions from these sets at each stage and evaluating it with the policy evaluation algorithm above will also yield $v_{N-1}(s)$. Thus, this policy is optimal over these two stages since its value equals the optimal value. This argument is repeated at stages $N - 2, N - 3, \ldots, 1$ to conclude that a policy that selects an action from $A^*_{s,t}$ at each stage is optimal.

The above argument contains the essence of the principle of optimality, which appeared in its original form on p. 83 of Bellman's classic book, "Dynamic Programming," as follows:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
The functional equations (15) and (17) are mathematical statements of this principle.

It might not be obvious to the reader why the backward induction algorithm is more attractive than an enumeration procedure for solving sequential decision problems. To see this, suppose that there are $N$ stages, $M$ states at each stage, and $K$ actions that can be chosen in each state. Then there are $(K^M)^N$ policies. Solving a deterministic problem by enumeration would require $N(K^M)^N$ additions and $(K^M)^N$ comparisons. By backward induction, solution would require $NMK$ additions and $NK$ comparisons, a potentially astronomical savings in work. Solving stochastic problems requires additional $M$ multiplications at each state at each stage to evaluate expectations. Enumeration requires $MN(K^M)^N$ multiplications, whereas backward induction would require $NKM$ multiplications. Clearly, backward induction is a superior method for solving any problem of practical significance.

D. Examples

In this section, the use of the backward induction algorithm is illustrated in terms of the examples that were presented in Section I. First, the longest-route problem in Fig. 2 is considered.

1. A Longest-Route Problem

Note that $N = 3$ in this example.

a. Set $t = 4$ and $v_4(7) = 0$.
b. Since $t \ne 1$, continue. Set $t = 3$ and

$$v_3(4) = r_3(4, 7) + v_4(7) = 3 + 0 = 3,$$

$$A^*_{6,3} = \{7\},$$

d. Since $t \ne 1$, continue. Set $t = 1$ and

$$v_1(1) = \max\{ r_1(1, 2) + v_2(2),\; r_1(1, 3) + v_2(3) \} = \max\{2 + 12,\; 4 + 9\} = 14,$$

$$A^*_{1,1} = \{2\}.$$

e. Since $t = 1$, stop.

This algorithm yields the information that the longest route from node 1 to node 7 has length 14, the longest route from node 2 to node 7 has length 12, and so on. To find the choice of arcs that corresponds to the longest route, we must apply the route selection algorithm:

a. Set $t = 1$, $d_1(1) = 2$, and $u = 2$.
b. Set $d_2(2) = 5$ and $u = 5$.
c. Since $t + 1 \ne 3$, continue. Set $t = 2$, $d_3(5) = 7$, and $u = 7$.
d. Since $t + 1 = 3$, stop.

This procedure gives the longest path through the network, namely, 1 → 2 → 5 → 7, which can easily be seen to have length 14. Note that choosing the myopic policy at each node would not have been optimal. At node 1, the myopic policy would have selected action 3; at node 3, action 4; and at node 4, action 7. The path 1 → 3 → 4 → 7 has length 12. By not taking future consequences into account at the first stage, the decision maker would have found himself in a poor position for subsequent decisions.

The backward induction algorithm has also obtained an optimal policy. It is given by

$$\pi^* = (d_1^*, d_2^*, d_3^*),$$

where
Maximize

$$3s_1^2 + s_2^3 + 4s_3$$

subject to

$$s_1 + s_2 + s_3 \le 4$$

and

$$0 \le s_1 \le 3; \quad 0 \le s_2; \quad 0 \le s_3.$$

The backward induction is applied as follows:

a. Set $t = 4$ and $v_4(s) = 0$, $0 \le s \le 4$.
b. Since $t \ne 1$, continue. Set $t = 3$ and

$$v_3(s) = \max_{0 \le a \le s} \{ r_3(s, a) + v_4(s - a) \} = \max_{0 \le a \le s} \{4a\} = 4s$$

$$= \max\{31, 30\} = 31$$

and $A^*_{4,1} = \{3\}$.

e. Since $t = 1$, stop.

The objective function value for this constrained resource allocation problem is 31. To find the optimal resource allocation, we must use the second algorithm above:

a. Set $t = 1$, $d_1(4) = 3$, and $u = 1$.
b. Set $d_2(1) = 0$ and $u = 1$.
c. Since $t + 1 \ne 3$, continue. Set $t = 2$, $d_3(1) = 1$, and $u = 0$.
d. Since $t + 1 = 3$, stop.

This procedure gives the optimal allocation, namely, $s_1 = 3$, $s_2 = 0$, and $s_3 = 1$, which corresponds to the optimal value of 31.
s    v_3(s)   A*_{s,3}
0    0        0
1    5        0
2    6        0
3    5        0

c. Since $t \ne 1$, continue. Set $t = 2$ and

$$v_2(s) = \max_{a \in A_s} \{ v_2(s, a) \},$$

where, for instance,

$$v_2(0, 2) = r(0, 2) + p(0|0, 2)v_3(0) + p(1|0, 2)v_3(1) + p(2|0, 2)v_3(2) + p(3|0, 2)v_3(3)$$

$$= -2 + \tfrac14 \times 0 + \tfrac12 \times 5 + \tfrac14 \times 6 + 0 \times 5 = 2.$$

The quantities $v_2(s, a)$, $v_2(s)$, and $A^*_{s,2}$ are summarized in the following tabulation, where Xs denote infeasible actions:

                 v_2(s, a)
s    a=0      a=1      a=2      a=3      v_2(s)   A*_{s,2}
0    0        1/4      2        1/2      2        2
1    6 1/4    4        2 1/2    X        6 1/4    0
2    10       4 1/2    X        X        10       0
3    10 1/2   X        X        X        10 1/2   0

s    d_1*(s)  d_2*(s)  d_3*(s)  v_3*(s)
0    3        2        0        67/16
1    0        0        0        129/16
2    0        0        0        194/16
3    0        0        0        227/16

This policy has a particularly simple form: If at decision point 1 the inventory in stock is 0 units, order 3 units; otherwise, do not order. If at decision point 2 the inventory in stock is 0 units, order 2 units; otherwise, do not order. And at decision point 3 do not order. The quantity $v_3^*(s)$ gives the expected total reward obtained by using this policy when the inventory before the first decision epoch is $s$ units.

A policy of this type is called an (s, S) policy. An (s, S) policy is implemented as follows. If in period $t$ the inventory level is $s^t$ units or below, order the number of units required to bring the inventory level up to $S^t$ units. Under certain convexity and linearity assumptions on the cost functions, Scarf showed in an elegant and important 1960 paper that (s, S) policies are optimal for the stochastic inventory problem. His proof of this result is based on using backward induction to show analytically that, for each $t$, $v_t(s)$ is $K$-convex, which ensures that there exists a maximizing policy of (s, S) type. This important result plays a fundamental role in stochastic operations research and has been extended in several ways.
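The backward induction calculations for the stochastic inventory example can be checked with a short Python sketch. It assumes the one-period reward has the form expected revenue minus ordering cost minus holding cost on the stock after ordering, with the fixed cost $K$ charged only when an order is placed; this form is an assumption here, chosen because it reproduces the worked value $r(0, 2) = -2$. The helper names `F`, `r`, and `trans` are illustrative.

```python
from fractions import Fraction as Fr

K, M, N = 4, 3, 3                                # fixed order cost, capacity, horizon
p = {0: Fr(1, 4), 1: Fr(1, 2), 2: Fr(1, 4)}      # demand distribution p_d


def F(u):
    """Expected revenue with u units available: 8 per unit actually sold."""
    return sum(q * 8 * min(d, u) for d, q in p.items())


def r(s, a):
    """Assumed one-period reward: revenue minus ordering and holding costs."""
    order_cost = K + 2 * a if a > 0 else 0
    return F(s + a) - order_cost - (s + a)


def trans(s, a):
    """Distribution of the next state j = max(s + a - demand, 0)."""
    out = {}
    for d, q in p.items():
        j = max(s + a - d, 0)
        out[j] = out.get(j, 0) + q
    return out


v = [Fr(0)] * (M + 1)                            # boundary condition v_{N+1}(s) = 0
policy = {}
for t in range(N, 0, -1):                        # t = N, N-1, ..., 1
    new_v, rule = [], []
    for s in range(M + 1):
        # Feasible orders keep the stock at or below the capacity M.
        q = {a: r(s, a) + sum(pr * v[j] for j, pr in trans(s, a).items())
             for a in range(M - s + 1)}
        best = max(q, key=q.get)
        new_v.append(q[best])
        rule.append(best)
    v, policy[t] = new_v, rule

print([str(x) for x in v])   # values for the three-period problem
print(policy)                # decision rule at each decision point
```

Exact rational arithmetic is used so that the output can be compared directly with the fractions in the tabulations.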
iteration, successive overrelaxation, and extrapolation. Of these, only modified policy iteration and linear programming will be discussed.

tem of linear equations. Let $d$ denote the stationary policy that uses decision rule $d(s)$ at each stage and $p_d^m(j|s)$ its corresponding $m$-step transition probabilities. Then,

$$v_\lambda^d(s) = r(s, d(s)) + \sum_{j \in S} \lambda p(j|s, d(s))\, r(j, d(j)) + \sum_{j \in S} \lambda^2 p_d^2(j|s)\, r(j, d(j)) + \sum_{j \in S} \lambda^3 p_d^3(j|s)\, r(j, d(j)) + \cdots \tag{25}$$

$$= r(s, d(s)) + \sum_{j \in S} \lambda p(j|s, d(s))\, v_\lambda^d(j). \tag{26}$$

Equation (26) is derived from Eq. (25) by explicitly writing out the matrix powers, factoring out the term $p(j|s, d(s))$ from all summations, and recognizing that the remaining terms are exactly $v_\lambda^d(s)$. Equation (26) can be rewritten as

$$\sum_{j \in S} [\delta(j, s) - \lambda p(j|s, d(s))]\, v_\lambda^d(j) = r(s, d(s)),$$

By repeatedly substituting the above inequality into the right-hand side of Eq. (30) and noting that $\lambda^n \to 0$ as $n \to \infty$, we can see that what follows from Eq. (24) is that $v^*(s) \ge v_\lambda^\pi(s)$ for any policy $\pi$. Also, if $d^*(s)$ is the decision rule that satisfies

$$r(s, d^*(s)) + \sum_{j \in S} \lambda p(j|s, d^*(s))\, v^*(j) = \max_{a \in A_s} \Big\{ r(s, a) + \sum_{j \in S} \lambda p(j|s, a)\, v^*(j) \Big\}, \tag{31}$$

then the stationary policy that uses $d^*(s)$ each period is optimal. This follows because if $d^*$ satisfies Eq. (31), then

$$v^*(s) = r(s, d^*(s)) + \sum_{j \in S} \lambda p(j|s, d^*(s))\, v^*(j),$$
but since this equation has a unique solution,

$$v^*(s) = v_\lambda^{d^*}(s) = v_\lambda^*(s).$$

These results are summarized as follows. There exists an optimal policy to the infinite-horizon discounted stochastic sequential decision problem that is stationary and can be found by using Eq. (31). Its value function is the unique solution of the functional equation of discounted dynamic programming.

This result plays the same role as the principle of optimality in finite-horizon dynamic programming. It says that for us to solve the infinite-horizon dynamic programming problem, it is sufficient to obtain a solution to the functional equation. In the next section, methods for solving the functional equation will be demonstrated.

C. Computational Methods

The four methods to be discussed here are value iteration, policy iteration, modified policy iteration, and linear programming. Only the first three are applied to the example in Section III.D. Value iteration and modified policy iteration are iterative approximation methods for solving the dynamic programming functional equation, whereas policy iteration is exact. To study the convergence of an approximation method, we must have a notion of distance. If $v$ is a real-valued function on $S$ (an $M$ vector), the norm of $v$, denoted by $\|v\|$, is defined as

$$\|v\| = \max_{s \in S} |v(s)|.$$

The distance between two vectors $v$ and $u$ is given by $\|v - u\|$. This means that two vectors are $\varepsilon$ units apart if the maximum difference between any two components is $\varepsilon$ units. This is often called the $L^\infty$ norm.

A policy $\pi$ is said to be $\varepsilon$-optimal if $\|v_\lambda^\pi - v_\lambda^*\| < \varepsilon$. If $\varepsilon$ is specified sufficiently small, the two iterative algorithms can be used to find policies whose expected total discounted reward is arbitrarily close to optimum. Of course, the more accurate the approximation, the more iterations of the algorithm are required.

1. Value Iteration

Value iteration, or successive approximation, is the direct extension of backward induction to infinite-horizon problems. It obtains an $\varepsilon$-optimal policy $d^\varepsilon$ as follows:

a. Select $v^0$, specify $\varepsilon > 0$, and set $n = 0$.
b. Compute $v^{n+1}$ by

$$v^{n+1}(s) = \max_{a \in A_s} \Big\{ r(s, a) + \sum_{j \in S} \lambda p(j|s, a)\, v^n(j) \Big\}. \tag{32}$$

c. If $\|v^{n+1} - v^n\| < \varepsilon(1 - \lambda)/2\lambda$, go to step d. Otherwise, increment $n$ by 1 and return to step b.
d. For each $s \in S$, set $d^\varepsilon(s)$ equal to an $a \in A_s$ that obtains the maximum on the right-hand side of Eq. (32) at the last iteration and stop.

This algorithm can best be understood in vector space notation. In Eq. (29), the operator $T$ is defined on the set of bounded real-valued $M$ vectors. Solving the functional equation corresponds to finding a fixed point of $T$, that is, a $v$ such that $Tv = v$. The value iteration algorithm starts with an arbitrary $v^0$ (0 is usually a good choice) and iterates according to $v^{n+1} = Tv^n$. Since $T$ is a contraction mapping, that is,

$$\|Tv - Tu\| \le \lambda \|v - u\|$$

for any $v$ and $u$, the iterative method is convergent for any $v^0$. This is because

$$\|v^{n+1} - v^n\| \le \lambda^n \|v^1 - v^0\|,$$

and the space of bounded real-valued $M$ vectors is a Banach space (a complete normed linear space) with respect to the norm used here. Since a contraction mapping has a unique fixed point, $v_\lambda^*$, $v^n$ converges to it. The rate of convergence is geometric with parameter $\lambda$, that is,

$$\|v^n - v^*\| \le \lambda^n \|v^0 - v^*\|.$$

The algorithm terminates with a value function $v^{n+1}$ and a decision rule $d^\varepsilon$ with the following property:

$$v^{n+1}(s) = r(s, d^\varepsilon(s)) + \sum_{j \in S} \lambda p(j|s, d^\varepsilon(s))\, v^n(j). \tag{33}$$

The stopping rule in step c ensures that the stationary policy that uses $d^\varepsilon$ every period is $\varepsilon$-optimal.

The iterates $v^n$ have interesting interpretations. Each iterate corresponds to the optimal expected total discounted return in an $n$-period problem in which the terminal reward equals $v^0$. Alternatively, they correspond to the expected total discounted returns for the policy in an $n$-period problem that is obtained by choosing a maximizing action in each state at each iteration.

2. Policy Iteration

Policy iteration, or approximation in the policy space, is an algorithm that uses the special structure of infinite-horizon stationary dynamic programming problems to find all optimal policies. The algorithm is as follows:

a. Select a decision rule $d^0(s)$ for all $s \in S$ and set $n = 0$.
c. For each $s$, and each $a \in A_s$, compute

$$r(s, a) + \sum_{j \in S} \lambda p(j|s, a)\, v^n(j). \tag{35}$$

For each $s$, put $a$ in $A^*_{n,s}$ if $a$ obtains the maximum value in Eq. (35).
d. If, for all $s$, $d^n(s)$ is contained in $A^*_{n,s}$, stop. Otherwise, proceed.

For each $s$, put $a$ in $A^*_{n,s}$ if $a$ obtains the maximum value in Eq. (37).
c. For each $s$ in $S$, set $d^n(s)$ equal to any $a$ in $A^*_{n,s}$.
(i) Set $k = 0$ and define $u^0(s)$ by

$$u^0(s) = \max_{a \in A_s} \Big\{ r(s, a) + \sum_{j \in S} \lambda p(j|s, a)\, v^n(j) \Big\}. \tag{38}$$
4. Linear Programming

The stationary infinite-horizon discounted stochastic sequential decision problem can be formulated and solved by linear programming. The primal problem is given by

Minimize

$$\sum_{j \in S} \alpha_j v(j)$$

subject to, for $a \in A_s$ and $s \in S$,

$$v(s) \ge r(s, a) + \sum_{j \in S} \lambda p(j|s, a)\, v(j),$$

and $v(s)$ is unconstrained.

The constants $\alpha_j$ are positive and arbitrary. The dual problem is given by

Maximize

$$\sum_{s \in S} \sum_{a \in A_s} x(s, a)\, r(s, a)$$

subject to, for $j \in S$,

$$\sum_{a \in A_j} x(j, a) - \sum_{s \in S} \sum_{a \in A_s} \lambda p(j|s, a)\, x(s, a) = \alpha_j,$$

and $x(j, a) \ge 0$ for $a \in A_j$ and $j \in S$.

Using a general-purpose linear programming code for solving dynamic programming problems is not computationally attractive. The dynamic programming methods are more efficient. The interest in the linear programming formulation is primarily theoretical but allows inclusion of side constraints. Some interesting observations are as follows:

a. The dual problem is always feasible and bounded. Any optimal basis has the property that for each $s \in S$, $x(s, a) > 0$ for only one $a \in A_s$. An optimal stationary policy is given by $d^*(s) = a$ if $x(s, a) > 0$.
b. The same basis is optimal for all $\alpha_j$'s.
c. When the dual problem is solved by the simplex algorithm with block pivoting, it is equivalent to policy iteration.
d. When policy iteration is implemented by only changing the action that gives the maximum improvement over all states, it is equivalent to solving the dual problem by the simplex method.

D. Numerical Examples

In this section, an infinite-horizon version of the stochastic inventory example presented earlier is solved by using value iteration, policy iteration, and modified policy iteration. The data are as analyzed in the finite-horizon case; however, the discount rate $\lambda$ is chosen to be .9. The objective is to determine the stationary policy that maximizes the expected total infinite-horizon discounted reward.

1. Value Iteration

To initiate the algorithm, we will take $v^0$ to be the zero vector; $\varepsilon$ is chosen to be .1. The algorithm will terminate with a stationary policy that is guaranteed to have an expected total discounted reward within .1 of optimal. Calculations proceed as in the finite-horizon backward induction algorithm until the stopping criterion of

$$\|v^{n+1} - v^n\| \le \frac{\varepsilon(1 - \lambda)}{2\lambda} = \frac{.1 \times .1}{2 \times .9} = .0056$$

is satisfied. The value functions $v^n$ and the maximizing actions obtained in step b at each iteration are provided in the tabulation below. The above algorithm terminates after 58 iterations, at which point $\|v^{58} - v^{57}\| = .0054$. The .1-optimal stationary policy is $d^\varepsilon = (3, 0, 0, 0)$, which means that if the stock level is 0, order 3 units; otherwise, do not order. Observe that the optimal policy was first identified at iteration 3, but the algorithm did not terminate until iteration 58. In larger problems such additional computational effort is extremely wasteful. Improved stopping rules and more efficient algorithms are described in Section III.E.

2. Policy Iteration

To initiate policy iteration, choose the myopic policy, namely, that which maximizes the immediate one-period reward $r(s, a)$. The algorithm proceeds as follows:

a. Set $d^0 = (0, 0, 0, 0)$ and $n = 0$.
b. Solve the evaluation equations:

$$(1 - .9 \times 1)\, v^0(0) = 0,$$

$$(-.9 \times .75)\, v^0(0) + (1 - .9 \times .25)\, v^0(1) = 5,$$

$$(-.9 \times .25)\, v^0(0) + (-.9 \times .5)\, v^0(1) + (1 - .9 \times .25)\, v^0(2) = 6,$$

and

$$(-.9 \times .25)\, v^0(1) + (-.9 \times .50)\, v^0(2) + (1 - .9 \times .25)\, v^0(3) = 5.$$

These equations are obtained by substituting the transition probabilities and rewards corresponding to policy $d^0$ into Eq. (34). The solution of these equations is $v^0 = (0, 6.4516, 11.4880, 14.9951)$.
c. For each $s$, the quantities

$$r(s, a) + \sum_{j=0}^{3} \lambda p(j|s, a)\, v^0(j)$$

are computed for $a = 0, \ldots, 3 - s$, and the actions that achieve the maximum are placed into $A^*_{0,s}$. In this example
                    v^n(s)                              d^n(s)
n     s=0       s=1       s=2       s=3        s=0  s=1  s=2  s=3
0     0         0         0         0          0    0    0    0
1     0         5.0       6.0       5.0        2    0    0    0
2     1.6       6.125     9.6       9.95       2    0    0    0
3     3.2762    7.4581    11.2762   12.9368    3    0    0    0
4     4.6632    8.8895    12.6305   14.6632    3    0    0    0
5     5.9831    10.1478   13.8914   15.9831    3    0    0    0
10    10.7071   14.8966   18.6194   20.7071    3    0    0    0
15    13.5019   17.6913   21.4142   23.5019    3    0    0    0
30    16.6099   20.7994   24.5222   26.6099    3    0    0    0
50    17.4197   21.6092   25.3321   27.4197    3    0    0    0
56    17.4722   21.6617   25.3845   27.4722    3    0    0    0
57    17.4782   21.6676   25.3905   27.4782    3    0    0    0
58    17.4836   21.6730   25.3959   27.4836    3    0    0    0
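The value iteration calculation for the discounted inventory example can be sketched in Python as follows. The reward is assumed to be expected revenue minus ordering cost minus holding cost on the stock after ordering, which reproduces the first rows of the tabulation; the helper names are illustrative, and the iteration counter is simply the number of applications of Eq. (32), which need not match the row labels above.

```python
LAM, EPS = 0.9, 0.1                  # discount factor and accuracy from the example
K, M = 4, 3                          # fixed order cost and inventory capacity
p = {0: 0.25, 1: 0.5, 2: 0.25}       # demand distribution p_d


def F(u):
    """Expected revenue with u units available: 8 per unit actually sold."""
    return sum(q * 8 * min(d, u) for d, q in p.items())


def r(s, a):
    """Assumed one-period reward: revenue minus ordering and holding costs."""
    order_cost = K + 2 * a if a > 0 else 0
    return F(s + a) - order_cost - (s + a)


def trans(s, a):
    """Distribution of the next state j = max(s + a - demand, 0)."""
    out = {}
    for d, q in p.items():
        j = max(s + a - d, 0)
        out[j] = out.get(j, 0.0) + q
    return out


def bellman(v):
    """Apply Eq. (32) once; return the new values and the maximizing actions."""
    new_v, rule = [], []
    for s in range(M + 1):
        q = {a: r(s, a) + LAM * sum(pr * v[j] for j, pr in trans(s, a).items())
             for a in range(M - s + 1)}
        best = max(q, key=q.get)
        new_v.append(q[best])
        rule.append(best)
    return new_v, rule


v, n = [0.0] * (M + 1), 0                       # step a: v^0 = 0
threshold = EPS * (1 - LAM) / (2 * LAM)         # stopping bound of step c
while True:
    new_v, d_eps = bellman(v)                   # step b
    gap = max(abs(x - y) for x, y in zip(new_v, v))   # sup-norm distance
    v, n = new_v, n + 1
    if gap < threshold:                         # step c
        break

print(n, [round(x, 4) for x in v], d_eps)       # step d: d_eps is the epsilon-optimal rule
```

The maximizing actions settle after a few iterations, long before the stopping test is met, which is the behavior the text describes.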
there is a unique maximizing action in each state, and it is given by

$$A^*_{0,0} = \{3\}; \quad A^*_{0,1} = \{2\}; \quad A^*_{0,2} = \{0\}; \quad A^*_{0,3} = \{0\}.$$

d. Since $d^0(0) = 0$, it is not contained in $A^*_{0,0}$, so continue … evaluation step.

The detailed step-by-step calculations for the remainder of the algorithm are omitted. The value functions and corresponding maximizing actions are presented below. Since there is a unique maximizing action in the improvement step at each iteration, $A^*_{n,s}$ is equivalent to $d^n(s)$ and only the latter is displayed.

The algorithm terminates in three iterations with the optimal policy $d^* = (3, 0, 0, 0)$. Observe that an evaluation was unnecessary at iteration 3 since $d^2 = d^3$ terminated the algorithm before the evaluation step. Unlike value iteration, the algorithm has produced an optimal policy as well as its expected total discounted reward $v^3$, which is the optimal expected total discounted reward. This computation shows that the .1-optimal policy found by using value iteration is in fact optimal. This information could not be obtained by using value iteration unless the action elimination method described in Section III.E were used.

3. Modified Policy Iteration

The following illustrates the application of modified policy iteration of order 5. The first pass through the algorithm is described in detail; calculations for the remainder are presented in tabular form below.

a. Set $v^0 = (0, 0, 0, 0)$, $n = 0$, and $\varepsilon = .1$.
b. Observe that … so that for each $s$ the maximum value occurs for $a = 0$. Thus, $A^*_{n,s} = \{0\}$ for $s = 0, 1, 2, 3$ and $d^n = (0, 0, 0, 0)$.
(i) Set $k = 0$ and $u^0 = (0, 5, 6, 5)$.
(ii) Since $\|u^0 - v^0\| = 6 > .0056$, continue.
(iii) Compute $u^1$ by

$$u^1(s) = r(s, d^0(s)) + \sum_{j=0}^{3} \lambda p(j|s, d^0(s))\, u^0(j) = r(s, 0) + \sum_{j=0}^{3} \lambda p(j|s, 0)\, u^0(j) \tag{40}$$

$$= 0 + .9 \times 1 \times 0 = 0, \quad \text{for } s = 0,$$

$$= 5 + .9 \times \tfrac34 \times 0 + .9 \times \tfrac14 \times 5 = 6.125, \quad \text{for } s = 1,$$

$$= 6 + .9 \times \tfrac14 \times 0 + .9 \times \tfrac12 \times 5 + .9 \times \tfrac14 \times 6 = 9.60, \quad \text{for } s = 2,$$

$$= 5 + .9 \times \tfrac14 \times 5 + .9 \times \tfrac12 \times 6 + .9 \times \tfrac14 \times 5 = 9.95, \quad \text{for } s = 3,$$

so that $u^1 = (0, 6.125, 9.60, 9.95)$.
(iv) Since $k = 1 < 5$, continue.
                    v^n(s)                              d^n(s)
n     s=0       s=1       s=2       s=3        s=0  s=1  s=2  s=3
0     0         0         0         0          0    0    0    0
1     0         6.4507    11.4765   14.9200    3    2    0    0
2     7.1215    9.1215    14.6323   17.1215    3    0    0    0
3     11.5709   15.7593   19.4844   21.5709    3    0    0    0
4     14.3639   18.5534   22.2763   24.3639    3    0    0    0
5     15.8483   20.0377   23.7606   25.8483    3    0    0    0
10    17.4604   21.6499   25.3727   27.4604    3    0    0    0
11    17.4938   21.6833   25.4062   27.4938    3    0    0    0
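The first pass of modified policy iteration of order 5 can be reproduced with the Python sketch below. The algorithm's full statement is not reproduced in this section, so the loop structure is inferred from the worked example: a maximization producing $u^0$ and a decision rule, a stopping test on $\|u^0 - v^n\|$, and then five applications of the fixed decision rule. The reward form and all helper names are assumptions made for illustration.

```python
LAM, EPS, ORDER = 0.9, 0.1, 5        # discount factor, accuracy, order of the method
K, M = 4, 3
p = {0: 0.25, 1: 0.5, 2: 0.25}


def F(u):
    """Expected revenue with u units available: 8 per unit actually sold."""
    return sum(q * 8 * min(d, u) for d, q in p.items())


def r(s, a):
    """Assumed one-period reward: revenue minus ordering and holding costs."""
    order_cost = K + 2 * a if a > 0 else 0
    return F(s + a) - order_cost - (s + a)


def trans(s, a):
    """Distribution of the next state j = max(s + a - demand, 0)."""
    out = {}
    for d, q in p.items():
        j = max(s + a - d, 0)
        out[j] = out.get(j, 0.0) + q
    return out


def backup(u, s, a):
    """Immediate reward plus discounted expected value of u after action a."""
    return r(s, a) + LAM * sum(pr * u[j] for j, pr in trans(s, a).items())


def improve(v):
    """Maximization step: return u^0 = T v and a maximizing decision rule."""
    u0, rule = [], []
    for s in range(M + 1):
        q = {a: backup(v, s, a) for a in range(M - s + 1)}
        best = max(q, key=q.get)
        u0.append(q[best])
        rule.append(best)
    return u0, rule


threshold = EPS * (1 - LAM) / (2 * LAM)
v = [0.0] * (M + 1)
history, rules = [v], []
while True:
    u, d = improve(v)                            # u^0 and decision rule d^n
    rules.append(d)
    if max(abs(x - y) for x, y in zip(u, v)) < threshold:
        break                                    # d is epsilon-optimal
    for _ in range(ORDER):                       # partial evaluation of d
        u = [backup(u, s, d[s]) for s in range(M + 1)]
    v = u                                        # v^{n+1} = u^ORDER
    history.append(v)

print([round(x, 4) for x in history[1]], rules[1], d)
```

Setting `ORDER = 0` makes the loop coincide with value iteration, while a very large order approaches exact policy evaluation, which is one way to see how the method sits between the two.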
The loop is repeated four more times to evaluate $u^2$, $u^3$, $u^4$, and $u^5$. Then $v^1$ is set equal to $u^5$ and the maximization in step b is carried out. The resulting iterates are shown in the tabulation above. In step (ii), following iteration 11, the computed value of $u^0$ is (17.4976, 21.6871, 25.4100, 27.4976), so that $\|u^0 - v^{11}\| = .0038$, which guarantees that the policy (3, 0, 0, 0) is $\varepsilon$-optimal with $\varepsilon = .1$.

Bounds are given for value iteration; however, they have also been obtained for policy iteration and modified policy iteration. First, define the following two quantities:

$$L^n = \min_{s \in S} \{ v^{n+1}(s) - v^n(s) \}$$

and

$$U^n = \max_{s \in S} \{ v^{n+1}(s) - v^n(s) \}.$$
Action elimination procedures are especially important not relevant. For instance, in a large telecommunications
in approximation algorithms such as value iteration and network, millions of packet and call routing decisions are
modified policy iteration that produce only ε-optimal poli- made every second, so that discounting the consequences
cies. If a unique optimal policy exists, by eliminating sub- of latter decisions makes little sense. An alternative is the
optimal actions at each iteration, one obtains an optimal expected average reward criterion defined in Eq. (7). Us-
policy when only one action remains in each state. This ing this criterion means that rewards received at each time
will occur in finitely many iterations. period receive equal weight.
Computational methods for the average reward crite-
rion are more complex than those for the discounted re-
F. Turnpike and Planning-Horizon Results ward problem. This is because the form of the average
Infinite-horizon sequential decision models are usually ap- reward function, g π (s), depends on the structure of the
proximations to long finite-horizon problems with many Markov chain corresponding to the stationary policy π. If
decision points. A question of practical concern is, When the policy is unichain, that is, the Markov chain induced
is the optimal policy for the infinite-horizon model optimal by the policy has exactly one recurrent class and possibly
for the finite-horizon problem? An answer to this question several transient classes, then gπ is a constant vector. In
is provided through planning-horizon, or turnpike, the- this case, the functional equation is
ory. The basic result is the following. There exists an N ∗ ,
called the planning horizon, such that, for all n ≥ N ∗ , the v(s) = max r (s, a) − g + p( j|s, a)v( j) . (44)
a∈As
optimal decision when there are n periods remaining is s∈S
one of the decisions that is optimal when the horizon is infinite. This result means that if there is a unique optimal stationary policy $d^*$ for the infinite-horizon problem, then in an $n$-period problem with $n \ge N^*$, it is optimal to use the stationary strategy for the initial $n - N^*$ decisions and to find the optimal policy for the remaining $N^*$ periods by using the backward induction methods of Section II. The optimal infinite-horizon strategy is called the turnpike, and it is reached after traveling $N^*$ periods on the nonstationary “side roads.” The term turnpike originates in mathematical economics, where it refers to the policy path that produces optimal economic growth.

Another interpretation of the above result is that it is optimal to use $d^*$ for the first decision in any finite-horizon problem in which it is known that the horizon exceeds $N^*$. Thus, it is not necessary to specify the horizon, only to know that there are at least $N^*$ decisions to be made. Bounds on $N^*$ are available, and this concept has been extended to nonstationary infinite-horizon problems and the expected average reward criteria. This is referred to as a rolling horizon approach.

The computational results for the value iteration algorithm in Section III.D give further insight into the planning-horizon result. There, $N^* = 3$, so that in any problem with horizon greater than 3, it is optimal to use the decision rule (3, 0, 0, 0) until there are three decisions left to be made, at which point it is optimal to use the decision rule (2, 0, 0, 0) for two periods and (0, 0, 0, 0) in the last period.

G. The Average Expected Reward Criteria

In many applications in which infinite-horizon formulations are natural, the total discounted reward criterion is

Solving this equation uniquely determines $g$ and determines $v(s)$ up to an additive constant. The quantity $g$ is the optimal expected average reward. To specify $v$ uniquely, it is sufficient to set $v(s_0) = 0$ for some $s_0$ in the recurrent class of a policy corresponding to choosing a maximizing action in each state. If this is done, $v(s)$ is called a relative value function and $v(j) - v(k)$ is the difference in expected total reward obtained by using an optimal policy and starting the system in state $j$ as opposed to state $k$.

As in the discounted case, an optimal policy is found by solving the functional equation. This is best done by policy iteration. The theory of value iteration is quite complex in this setting and is not discussed here. The policy iteration algorithm is given below. It is assumed that all policies are unichain.

a. Select a decision rule $d_0$ and set $n = 0$.
b. Solve the system of equations
$$\sum_{j \in S} [\delta(j, s) - p(j \mid s, d_n(s))]\, v_n(j) + g_n = r(s, d_n(s)), \tag{45}$$
where $v_n(s_0) = 0$ for some $s_0$ in the recurrent class of $d_n$.
c. For each $s$, and each $a \in A_s$, compute
$$r(s, a) + \sum_{j \in S} p(j \mid s, a)\, v_n(j). \tag{46}$$
For each $s$, put $a$ in $A^*_{n,s}$ if $a$ obtains the maximum value in Eq. (46).
d. If for each $s$, $d_n(s)$ is contained in $A^*_{n,s}$, stop. Otherwise, proceed.
e. Set $d_{n+1}(s)$ equal to any $a$ in $A^*_{n,s}$ for each $s$ in $S$, increment $n$ by 1, and return to step b.
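Steps a through e can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the article's own implementation: the array layout `P[a, s, j]` and `r[s, a]`, the function names, and the two-state example are all hypothetical, and the evaluation equations are written in the form $\sum_j [\delta(j,s) - p(j \mid s, d(s))]\,v(j) + g = r(s, d(s))$, so that $g$ is the average reward of the policy. State `s0` is assumed to be recurrent under every policy.

```python
import numpy as np


def evaluate_policy(P_d, r_d, s0=0):
    """Solve sum_j [delta(j,s) - P_d[s,j]] v(j) + g = r_d[s] with v(s0) = 0."""
    n = len(r_d)
    A = np.eye(n) - P_d
    # v(s0) is pinned to 0, so its column is free to carry the coefficient of g.
    A[:, s0] = 1.0
    x = np.linalg.solve(A, r_d)
    g = x[s0]
    v = x.copy()
    v[s0] = 0.0
    return g, v


def policy_iteration(P, r, s0=0, max_iter=100):
    """P[a, s, j] = p(j | s, a); r[s, a] = one-period reward (assumed layout)."""
    n_states = r.shape[0]
    states = np.arange(n_states)
    d = np.zeros(n_states, dtype=int)                      # step a
    for _ in range(max_iter):
        g, v = evaluate_policy(P[d, states, :], r[states, d], s0)  # step b
        q = r + np.einsum("asj,j->sa", P, v)               # step c, Eq. (46)
        maximizing = np.isclose(q, q.max(axis=1, keepdims=True))
        if maximizing[states, d].all():                    # step d
            return d, g, v
        # step e: keep the current action where it is maximizing, else switch
        d = np.where(maximizing[states, d], d, q.argmax(axis=1))
    raise RuntimeError("no convergence within max_iter")


# Hypothetical two-state, two-action example. All transition probabilities
# are positive, so every policy is unichain and state 0 is recurrent.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # action 1
r = np.array([[1.0, 2.0],                  # r[s, a]
              [0.0, 0.5]])
d_opt, g_opt, v_opt = policy_iteration(P, r)
print(d_opt, g_opt, v_opt)
```

Keeping the current action whenever it already attains the maximum is one admissible way to choose "any $a$ in $A^*_{n,s}$" in step e; it avoids switching between equally good actions.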
P1: FYD/FQW P2: FYD/LRV QC: FYD Final Pages
Encyclopedia of Physical Science and Technology EN004A-187 June 8, 2001 18:59
Note that this algorithm is almost identical to that for the discounted case. The only difference is the linear system of equations solved in step b. If the assumption that all policies are unichain is dropped, solution of the functional equation [Eq. (44)] is no longer sufficient to determine an optimal policy. Instead, a nested pair of optimality equations is required.

IV. FURTHER TOPICS

A. Historical Perspective

The development of dynamic programming is usually credited to Richard Bellman. His numerous papers in the 1950s presented a formal development of this subject and numerous interesting examples. Most of this pioneering work is summarized in his book, “Dynamic Programming.” However, many of the themes of dynamic programming and sequential decision processes are scattered throughout earlier works. These include studies that appeared between 1946 and 1953 on water resource management by Massé; sequential analysis in statistics by Wald; games of pursuit by Wald; inventory theory by Arrow, Blackwell, and Girshick; Arrow, Harris, and Marschak; and Dvoretzky, Kiefer, and Wolfowitz; and stochastic games by Shapley.

Although Bellman coined the phrase “Markov decision processes,” this aspect of dynamic programming got off the ground with Howard’s monograph, “Dynamic Programming and Markov Processes” in 1960. The first formal theoretical treatment of this subject was by Blackwell in 1962. In 1960, de Ghellinck demonstrated the equivalence between Markov decision processes and linear programming. Other major contributions are those of Denardo in 1968, in which he showed that value iteration can be analyzed by the theory of contraction mappings; Veinott in 1969, in which he introduced a new family of optimality criteria for dynamic programming problems; and Federgruen and Schweitzer between 1978 and 1980, in which they investigated the properties of the sequences of policies obtained from the value iteration algorithm. Modified policy iteration is usually attributed to Puterman and Shin in 1978; however, similar ideas appeared earlier in works of Kushner and Kleinman and of van Nunen. In 1978, Puterman and Brumelle demonstrated the equivalence of policy iteration to Newton’s method. Puterman’s book “Markov Decision Processes” provides a comprehensive overview of theory, application, and calculations.

B. Applications

Dynamic programming methods have been applied in many areas. These methods have been used numerically to compute optimal policies, as well as analytically to determine the form of an optimal policy under various assumptions on the rewards and transition probabilities. A brief and by no means complete summary of applications appears in Table I. Only stochastic dynamic programming is considered; however, in many cases, the problems have also been analyzed in the deterministic setting. In these applications, probability distributions of the random quantities are assumed to be known before solution of the problem; adaptive estimation of parameters is not necessary.

A major limitation in the practical application of dynamic programming has been computational. When the set of states at each stage is large (for example, if the state description is vector-valued), solving a sequential decision problem by dynamic programming requires considerable storage as well as computational time. Bellman recognized the difficulty early on and referred to it as the “curse of dimensionality.” Research in the 1990s addressed this issue by developing approximation methods for large-scale applications. These methods combined concepts from stochastic approximation, simulation, and artificial intelligence and are sometimes referred to as reinforcement learning. A comprehensive treatment of this line of research appears in “Neuro-Dynamic Programming” by Bertsekas and Tsitsiklis.

C. Extensions

In the models considered in this article, it has been assumed that the decision maker knows the state of the system before making a decision, that decisions are made at discrete time points, that the set of states is finite and discrete (with the exception of the example in Section II.D.2), and that the model rewards and transition probabilities are known. These models can be modified in several ways: the state of the system may be only partially observed by the decision maker, the sets of decision points and states may be continuous, or the transition probabilities or rewards may not be known. These modifications are discussed briefly below.

1. Partially Observable Models

This model differs from the fully observable model in that the state of the system is not known to the decision maker at the time of decision. Instead, the decision maker receives a signal from the system and on the basis of this signal updates an estimate of the probability distribution of the system state. Updating is done using Bayes’ theorem. Decisions are based on this probability distribution, which is a sufficient statistic for the history of the process.

When the set of states is discrete, these models are referred to as partially observable Markov decision processes. Computational methods in this case are quite
Capacity expansion | Size of plant | Maintain or add capacity | Costs of expansion and production at current capacity | Demand for product
Cash management | Cash available | Borrow or invest | Transaction costs less interest | External demand for cash
Catalog mailing | Customer purchase record | Type of catalog to send to customer, if any | Purchases in current period less mailing costs | Customer purchase amount
Clinical trials | Number of successes with each treatment | Stop or continue the trial, and if stopped, choose best treatment if any | Costs of treatment and incorrect decisions | Response of a subject to a treatment
Economic growth | State of the economy | Investment or consumption | Utility of consumption | Effect of investment
Fisheries management | Fish stock in each age class | Number of fish to harvest | Value of the catch | Population size
Football | Position of ball | Play to choose | Expected points scored | Outcome of play
Forest management | Size and condition of stand | Harvesting and reforestation activities | Revenues less harvesting costs | Stand growth and price fluctuations
Gambling | Current wealth | Stop or continue playing the game | Cost of playing | Outcome of the game
Hotel and airline reservations | Number of confirmed reservations | Accept, wait-list, or reject new reservations | Profit from satisfied reservations less overbooking penalties | Demand for reservations and number of arrivals
Inventory control | Stock on hand | Order additional stock | Revenue per item sold less ordering, holding, and penalty costs | Demand for items
Project selection | Status of each project | Project to invest in at present | Return from investing in project | Change of project status
Queuing control | Number in the queue | Accept or reject arriving customers or control service rate | Revenue from serving customer less delay costs | Interarrival times and service times
Reliability | Age or status of equipment | Inspect and repair or replace if necessary | Inspection and repair costs plus failure cost | Failure and deterioration
Scheduling | Activities completed | Next activity to schedule | Cost of activity | Length of time to complete activity
Selling an asset | Current offer | Accept or reject the offer | The offer less the cost of holding the asset for one period | Size of the offer
Water resource management | Level of water in each reservoir in river system | Quantity of water to release | Value of power generated | Rainfall and runoff
complex, and only very small problems have been solved numerically. When the states form a continuum, this problem falls into the domain of control theory. An extremely important result in this area is the Kalman filter, which provides an updating formula for the expected value and covariance matrix of the system state. Another important result is the separation theorem, which gives conditions that allow decomposition of this problem into separate problems of estimation and control.

2. Continuous-State, Discrete-Time Models

The resource allocation problem in Section I is an example of a continuous-state, discrete-time, deterministic model. Its solution using dynamic programming methodology is given in Section II. When transitions are stochastic, only minor modifications to the general sequential decision problem are necessary. Instead of a transition probability function, a transition probability density is used, and summations are replaced by integrations throughout. This modification causes considerable theoretical complexity; the main issues concern measurability and integrability of value functions.

Problems of this type fall into the realm of stochastic control theory. Although dynamic programming is used to solve such problems, the formulation is quite different. Instead of explicitly giving a transition probability function for the state, the theory requires use of a dynamic equation
to relate the state at time $t + 1$ to the state at time $t$. A major result in this area is that when the state dynamics are linear in the state, action, and random component and the cost is quadratic in the state and the action, then a closed-form solution is available for the optimal decision rule and it is linear in the system state. These problems have been studied extensively in the engineering literature.

3. Continuous-Time Models

Stochastic continuous-time models are categorized according to whether the state space is continuous or discrete. The discrete-state model has been widely studied in the operations research literature. The stochastic nature of the problem is modeled as either a Markov process, a semi-Markov process, or a general jump process. The decision maker can control the transition rates, transition probabilities, or both. The infinite-horizon versions of the Markov and semi-Markov decision models are analyzed in a similar fashion to the discrete-time Markov decision process; however, general jump processes are considerably more complex. These models have been widely applied to problems in queuing and inventory control.

When the state space is continuous and Markovian assumptions are made, diffusion processes are used to model the transitions. The decision maker can control the drift of the system or can cause instantaneous state transitions or jumps. The discrete-time optimality equation is replaced by a nonlinear second-order partial differential equation and is usually solved numerically. These models are studied in the stochastic control theory literature and have been applied to inventory control, finance, and statistical modeling.

4. Adaptive Control

When transition probabilities and/or rewards are unknown, the decision maker must adaptively estimate them to control the system optimally. The usual approach to analysis of such systems is to assume that the rewards and transition probabilities depend on an unknown parameter, such as the arrival rate to a queuing system, and then use the observed sequence of system states to adaptively estimate this parameter. In a Bayesian analysis of such models, uncertainty about the parameter value is described through a probability distribution which is periodically updated as information becomes available. The classical approach to such models uses maximum likelihood theory to estimate the parameter and derive its statistical properties.

ACKNOWLEDGMENT

Preparation of this article was supported by Natural Sciences and Engineering Research Council Grant A-5527.

SEE ALSO THE FOLLOWING ARTICLES

LINEAR OPTIMIZATION • NONLINEAR PROGRAMMING • OPERATIONS RESEARCH

BIBLIOGRAPHY

Bellman, R. E. (1957). “Dynamic Programming,” Princeton University Press, Princeton, N.J.
Bertsekas, D. P. (1995). “Dynamic Programming and Optimal Control,” Vols. 1 and 2, Athena Scientific, Belmont, Mass.
Bertsekas, D. P., and Tsitsiklis, J. M. (1995). “Neuro-Dynamic Programming,” Athena Scientific, Belmont, Mass.
Blackwell, D. (1962). Ann. Math. Stat. 35, 719–726.
Denardo, E. V. (1967). SIAM Rev. 9, 169–177.
Denardo, E. V. (1982). “Dynamic Programming, Models and Applications,” Prentice-Hall, Englewood Cliffs, N.J.
Fleming, W. H., and Rishel, R. W. (1975). “Deterministic and Stochastic Optimal Control,” Springer-Verlag, New York.
Howard, R. A. (1960). “Dynamic Programming and Markov Processes,” MIT Press, Cambridge, Mass.
Puterman, M. L. (1994). “Markov Decision Processes,” Wiley, New York.
Ross, S. M. (1983). “Introduction to Stochastic Dynamic Programming,” Academic Press, New York.
Scarf, H. E. (1960). In “Studies in the Mathematical Theory of Inventory and Production,” (K. J. Arrow, S. Karlin, and P. Suppes, eds.), Stanford University Press, Stanford, Calif.
Veinott, A. F., Jr. (1969). Ann. Math. Stat. 40, 1635–1660.
Wald, A. (1947). “Sequential Analysis,” Wiley, New York.
White, D. J. (1985). Interfaces 15, 73–83.
White, D. J. (1988). Interfaces 18, 55–61.
P1: GHA/LOW P2: FQP Final Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN006C-258 June 28, 2001 19:56
Fourier Series
James S. Walker
University of Wisconsin–Eau Claire
I. Historical Background
II. Definition of Fourier Series
III. Convergence of Fourier Series
IV. Convergence in Norm
V. Summability of Fourier Series
VI. Generalized Fourier Series
VII. Discrete Fourier Series
VIII. Conclusion
GLOSSARY

Bounded variation A function f has bounded variation on a closed interval [a, b] if there exists a positive constant B such that, for all finite sets of points $a = x_0 < x_1 < \cdots < x_N = b$, the inequality $\sum_{i=1}^{N} |f(x_i) - f(x_{i-1})| \le B$ is satisfied. Jordan proved that a function has bounded variation if and only if it can be expressed as the difference of two nondecreasing functions.
Countably infinite set A set is countably infinite if it can be put into one-to-one correspondence with the set of natural numbers (1, 2, . . . , n, . . .). Examples: The integers and the rational numbers are countably infinite sets.
Continuous function If $\lim_{x \to c} f(x) = f(c)$, then the function f is continuous at the point c. Such a point is called a continuity point for f. A function which is continuous at all points is simply referred to as continuous.
Lebesgue measure zero A set S of real numbers is said to have Lebesgue measure zero if, for each $\epsilon > 0$, there exists a collection $\{(a_i, b_i)\}_{i=1}^{\infty}$ of open intervals such that $S \subset \bigcup_{i=1}^{\infty} (a_i, b_i)$ and $\sum_{i=1}^{\infty} (b_i - a_i) \le \epsilon$. Examples: All finite sets, and all countably infinite sets, have Lebesgue measure zero.
Odd and even functions A function f is odd if f(−x) = −f(x) for all x in its domain. A function f is even if f(−x) = f(x) for all x in its domain.
One-sided limits f(x−) and f(x+) denote limits of f(t) as t tends to x from the left and right, respectively.
Periodic function A function f is periodic, with period P > 0, if the identity f(x + P) = f(x) holds for all x. Example: f(x) = |sin x| is periodic with period π.

FOURIER SERIES has long provided one of the principal methods of analysis for mathematical physics, engineering, and signal processing. It has spurred generalizations and applications that continue to develop right up to the present. While the original theory of Fourier series applies to periodic functions occurring in wave motion, such as with light and sound, its generalizations often
relate to wider settings, such as the time-frequency analysis underlying the recent theories of wavelet analysis and local trigonometric analysis.

I. HISTORICAL BACKGROUND

There are antecedents to the notion of Fourier series in the work of Euler and D. Bernoulli on vibrating strings, but the theory of Fourier series truly began with the profound work of Fourier on heat conduction at the beginning of the 19th century. Fourier dealt with the problem of describing the evolution of the temperature $T(x, t)$ of a thin wire of length π, stretched between x = 0 and x = π, with a constant zero temperature at the ends: $T(0, t) = 0$ and $T(\pi, t) = 0$. He proposed that the initial temperature $T(x, 0) = f(x)$ could be expanded in a series of sine functions:
$$f(x) = \sum_{n=1}^{\infty} b_n \sin nx \tag{1}$$
with
$$b_n = \frac{2}{\pi} \int_0^{\pi} f(x) \sin nx \, dx. \tag{2}$$

A. Fourier Series

Although Fourier did not give a convincing proof of convergence of the infinite series in Eq. (1), he did offer the conjecture that convergence holds for an “arbitrary” function f. Subsequent work by Dirichlet, Riemann, Lebesgue, and others, throughout the next two hundred years, was needed to delineate precisely which functions were expandable in such trigonometric series. Part of this work entailed giving a precise definition of function (Dirichlet), and showing that the integrals in Eq. (2) are properly defined (Riemann and Lebesgue). Throughout this article we shall state results that are always true when Riemann integrals are used (except for Section IV where we need to use results from the theory of Lebesgue integrals).

In addition to positing Eqs. (1) and (2), Fourier argued that the temperature $T(x, t)$ is a solution to the following heat equation with boundary conditions:
$$\frac{\partial T}{\partial t} = \frac{\partial^2 T}{\partial x^2}, \qquad 0 < x < \pi, \quad t > 0$$
$$T(0, t) = T(\pi, t) = 0, \qquad t \ge 0$$
$$T(x, 0) = f(x), \qquad 0 \le x \le \pi.$$
Making use of Eq. (1), Fourier showed that the solution $T(x, t)$ satisfies
$$T(x, t) = \sum_{n=1}^{\infty} b_n e^{-n^2 t} \sin nx. \tag{3}$$
This was the first example of the use of Fourier series to solve boundary value problems in partial differential equations. To obtain Eq. (3), Fourier made use of D. Bernoulli’s method of separation of variables, which is now a standard technique for solving boundary value problems.

A good, short introduction to the history of Fourier series can be found in The Mathematical Experience. Besides his many mathematical contributions, Fourier has left us with one of the truly great philosophical principles: “The deep study of nature is the most fruitful source of knowledge.”

II. DEFINITION OF FOURIER SERIES

The Fourier sine series, defined in Eqs. (1) and (2), is a special case of a more general concept: the Fourier series for a periodic function. Periodic functions arise in the study of wave motion, when a basic waveform repeats itself periodically. Such periodic waveforms occur in musical tones, in the plane waves of electromagnetic vibrations, and in the vibration of strings. These are just a few examples. Periodic effects also arise in the motion of the planets, in AC electricity, and (to a degree) in animal heartbeats.

A function f is said to have period P if f(x + P) = f(x) for all x. For notational simplicity, we shall restrict our discussion to functions of period 2π. There is no loss of generality in doing so, since we can always use a simple change of scale x = (P/2π)t to convert a function of period P into one of period 2π.

If the function f has period 2π, then its Fourier series is
$$c_0 + \sum_{n=1}^{\infty} \{a_n \cos nx + b_n \sin nx\} \tag{4}$$
with Fourier coefficients $c_0$, $a_n$, and $b_n$ defined by the integrals
$$c_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx \tag{5}$$
$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx, \tag{6}$$
$$b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx. \tag{7}$$

[Note: The sine series defined by Eqs. (1) and (2) is a special instance of Fourier series. If f is initially defined over the interval [0, π], then it can be extended to [−π, π] (as an odd function) by letting f(−x) = −f(x), and then extended periodically with period P = 2π. The Fourier series for this odd, periodic function reduces to the sine series in Eqs. (1) and (2), because $c_0 = 0$, each $a_n = 0$, and each $b_n$ in Eq. (7) is equal to the $b_n$ in Eq. (2).]
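The coefficient integrals in Eqs. (5) through (7) are easy to approximate numerically, which gives a quick check of the reduction to a sine series described in the note. The sketch below is illustrative only: the function name `fourier_coefficients`, the midpoint-rule quadrature, and the test function f(x) = x (the odd extension of x from [0, π]) are assumptions made for the example. For that function, Eq. (2) gives $b_n = 2(-1)^{n+1}/n$ by a direct integration.

```python
import numpy as np


def fourier_coefficients(f, n_max, samples=4096):
    """Approximate c0, a_n, b_n of Eqs. (5)-(7) by the midpoint rule on [-pi, pi]."""
    h = 2 * np.pi / samples
    x = -np.pi + (np.arange(samples) + 0.5) * h    # midpoints of the subintervals
    fx = f(x)
    n = np.arange(1, n_max + 1)[:, None]           # column of frequencies 1..n_max
    c0 = fx.sum() * h / (2 * np.pi)
    a = (fx * np.cos(n * x)).sum(axis=1) * h / np.pi
    b = (fx * np.sin(n * x)).sum(axis=1) * h / np.pi
    return c0, a, b


# Odd extension of f(x) = x from [0, pi] to [-pi, pi] (an illustrative choice).
c0, a, b = fourier_coefficients(lambda x: x, n_max=5)
exact_b = np.array([2 * (-1) ** (k + 1) / k for k in range(1, 6)])
print(c0, a)        # both vanish up to rounding, as the note predicts
print(b, exact_b)   # numerical and closed-form sine coefficients
```

Because the sample points are symmetric about 0, the sums for $c_0$ and $a_n$ cancel in pairs for any odd function, mirroring the argument in the note.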
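$$c_0 + \sum_{n=1}^{\infty} \{c_n e^{inx} + c_{-n} e^{-inx}\} \tag{8}$$

$$\sum_{n=-\infty}^{\infty} c_n e^{inx}. \tag{10}$$

We now consider a couple of examples. First, let $f_1$ be defined over [−π, π] by
$$f_1(x) = \begin{cases} 1 & \text{if } |x| < \pi/2 \\ 0 & \text{if } \pi/2 \le |x| \le \pi \end{cases}$$
and have period 2π. The graph of $f_1$ is shown in Fig. 1; it is called a square wave in electric circuit theory. The constant $c_0$ is
$$c_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f_1(x)\, dx = \frac{1}{2\pi} \int_{-\pi/2}^{\pi/2} 1 \, dx = \frac{1}{2},$$
while, for $n \ne 0$,
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f_1(x) e^{-inx}\, dx = \frac{1}{2\pi} \int_{-\pi/2}^{\pi/2} e^{-inx}\, dx = \frac{1}{2\pi} \, \frac{e^{-in\pi/2} - e^{in\pi/2}}{-in} = \frac{\sin(n\pi/2)}{n\pi}.$$
Thus, the Fourier series for this square wave is
$$\frac{1}{2} + \sum_{n=1}^{\infty} \frac{\sin(n\pi/2)}{n\pi} (e^{inx} + e^{-inx}) = \frac{1}{2} + \sum_{n=1}^{\infty} \frac{2 \sin(n\pi/2)}{n\pi} \cos nx. \tag{11}$$

Second, let $f_2(x) = x^2$ over [−π, π] and have period 2π; see Fig. 2. We shall refer to this wave as a parabolic wave. This parabolic wave has $c_0 = \pi^2/3$ and $c_n$, for $n \ne 0$, is
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} x^2 e^{-inx}\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} x^2 \cos nx \, dx - \frac{i}{2\pi} \int_{-\pi}^{\pi} x^2 \sin nx \, dx = \frac{2(-1)^n}{n^2}$$
after an integration by parts. The Fourier series for this function is then
$$\frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{2(-1)^n}{n^2} (e^{inx} + e^{-inx}) = \frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4(-1)^n}{n^2} \cos nx. \tag{12}$$
We will discuss the convergence of these Fourier series, to $f_1$ and $f_2$, respectively, in Section III.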
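The integration by parts behind the value of $c_n$ for the parabolic wave can be spelled out as follows. This is a sketch of one standard route, not necessarily the computation the author had in mind; it uses only the definition of $c_n$, the fact that $x^2 \sin nx$ is an odd function, and $\cos n\pi = (-1)^n$.

```latex
\begin{aligned}
\int_{-\pi}^{\pi} x^2 \sin nx \, dx &= 0 \quad \text{(odd integrand)}, \\
\int_{-\pi}^{\pi} x^2 \cos nx \, dx
  &= \left[ \frac{x^2 \sin nx}{n} \right]_{-\pi}^{\pi}
     - \frac{2}{n} \int_{-\pi}^{\pi} x \sin nx \, dx \\
  &= 0 - \frac{2}{n} \left( \left[ -\frac{x \cos nx}{n} \right]_{-\pi}^{\pi}
     + \frac{1}{n} \int_{-\pi}^{\pi} \cos nx \, dx \right) \\
  &= -\frac{2}{n} \left( -\frac{2\pi \cos n\pi}{n} + 0 \right)
   = \frac{4\pi (-1)^n}{n^2}, \\
c_n &= \frac{1}{2\pi} \cdot \frac{4\pi (-1)^n}{n^2}
     = \frac{2(-1)^n}{n^2}, \qquad n \neq 0.
\end{aligned}
```

The same kind of direct integration gives $c_0 = \frac{1}{2\pi} \cdot \frac{2\pi^3}{3} = \frac{\pi^2}{3}$.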
Because of Bessel’s inequality, it follows that
$$\lim_{|n| \to \infty} c_n = 0 \tag{18}$$
holds whenever $\int_{-\pi}^{\pi} |f(x)|^2\, dx$ is finite. The Riemann-Lebesgue lemma says that Eq. (18) holds in the following more general case:

Theorem 2 (Riemann-Lebesgue Lemma): If $\int_{-\pi}^{\pi} |f(x)|\, dx$ is finite, then Eq. (18) holds.

One of the most important uses of the Riemann-Lebesgue lemma is in proofs of some basic pointwise convergence theorems, such as the ones described in the next section.

See Krantz and Walker (1998) for further discussions of the definition of Fourier series, Bessel’s inequality, and the Riemann-Lebesgue lemma.

III. CONVERGENCE OF FOURIER SERIES

There are many ways to interpret the meaning of Eq. (13). Investigations into the types of functions allowed on the left side of Eq. (13), and the kinds of convergence considered for its right side, have fueled mathematical investigations by such luminaries as Dirichlet, Riemann, Weierstrass, Lipschitz, Lebesgue, Fejér, Gelfand, and Schwartz. In short, convergence questions for Fourier series have helped lay the foundations and much of the superstructure of mathematical analysis.

The three types of convergence that we shall describe here are pointwise, uniform, and norm convergence. We shall discuss the first two types in this section and take up the third type in the next section.

All convergence theorems are concerned with how the partial sums
$$S_N(x) := \sum_{n=-N}^{N} c_n e^{inx}$$
converge to f(x). That is, does $\lim_{N \to \infty} S_N = f$ hold in some sense?

The question of pointwise convergence concerns whether $\lim_{N \to \infty} S_N(x_0) = f(x_0)$ for each fixed x-value $x_0$. If $\lim_{N \to \infty} S_N(x_0)$ does equal $f(x_0)$, then we say that the Fourier series for f converges to $f(x_0)$ at $x_0$.

We shall now state the simplest pointwise convergence theorem for which an elementary proof can be given. This theorem assumes that a function is Lipschitz at each point where convergence occurs. A function is said to be Lipschitz at a point $x_0$ if, for some positive constant A,
$$|f(x) - f(x_0)| \le A\,|x - x_0| \tag{19}$$
holds for all x near $x_0$ (i.e., $|x - x_0| < \delta$ for some $\delta > 0$). It is easy to see, for instance, that the square wave function $f_1$ is Lipschitz at all of its continuity points.

The inequality in Eq. (19) has a simple geometric interpretation. Since both sides are 0 when $x = x_0$, this inequality is equivalent to
$$\left| \frac{f(x) - f(x_0)}{x - x_0} \right| \le A \tag{20}$$
for all x near $x_0$ (and $x \ne x_0$). Inequality (20) simply says that the difference quotients of f (i.e., the slopes of its secants) near $x_0$ are bounded. With this interpretation, it is easy to see that the parabolic wave $f_2$ is Lipschitz at all points. More generally, if f has a derivative at $x_0$ (or even just left- and right-hand derivatives), then f is Lipschitz at $x_0$.

We can now state and prove a simple convergence theorem.

Theorem 3: Suppose f has period 2π, that $\int_{-\pi}^{\pi} |f(x)|\, dx$ is finite, and that f is Lipschitz at $x_0$. Then the Fourier series for f converges to $f(x_0)$ at $x_0$.

To prove this theorem, we assume that $f(x_0) = 0$. There is no loss of generality in doing so, since we can always subtract the constant $f(x_0)$ from f(x). Define the function g by $g(x) = f(x)/(e^{ix} - e^{ix_0})$. This function g has period 2π. Furthermore, $\int_{-\pi}^{\pi} |g(x)|\, dx$ is finite, because the quotient $f(x)/(e^{ix} - e^{ix_0})$ is bounded in magnitude for x near $x_0$. In fact, for such x,
$$\left| \frac{f(x)}{e^{ix} - e^{ix_0}} \right| = \left| \frac{f(x) - f(x_0)}{e^{ix} - e^{ix_0}} \right| \le A \left| \frac{x - x_0}{e^{ix} - e^{ix_0}} \right|$$
and $(x - x_0)/(e^{ix} - e^{ix_0})$ is bounded in magnitude, because it tends to the reciprocal of the derivative of $e^{ix}$ at $x_0$.

If we let $d_n$ denote the nth Fourier coefficient for g(x), then we have $c_n = d_{n-1} - d_n e^{ix_0}$ because $f(x) = g(x)(e^{ix} - e^{ix_0})$. The partial sum $S_N(x_0)$ then telescopes:
$$S_N(x_0) = \sum_{n=-N}^{N} (d_{n-1} - d_n e^{ix_0})\, e^{inx_0} = d_{-N-1} e^{-iNx_0} - d_N e^{i(N+1)x_0}.$$
Since $d_n \to 0$ as $|n| \to \infty$, by the Riemann-Lebesgue lemma, we conclude that $S_N(x_0) \to 0$. This completes the proof.

It should be noted that for the square wave $f_1$ and the parabolic wave $f_2$, it is not necessary to use the general Riemann-Lebesgue lemma stated above. That is because for those functions it is easy to see that $\int_{-\pi}^{\pi} |g(x)|^2\, dx$ is
$\to 0$ as $N \to \infty$

and thus Eq. (21) holds for the parabolic wave $f_2$.

Uniform convergence for the parabolic wave is a special case of a more general theorem. We shall say that f is uniformly Lipschitz if Eq. (19) holds for all points using the same constant A. For instance, it is not hard to show that a continuously differentiable, periodic function is uniformly Lipschitz.

Theorem 4: Suppose that f has period 2π and is uniformly Lipschitz at all points. Then the Fourier series for f converges uniformly to f.

A remarkably simple proof of this theorem is described in Jackson (1941). More general uniform convergence theorems are discussed in Walter (1994).

FIGURE 6 Dirichlet’s kernel $D_{20}$.
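The uniform convergence asserted for the parabolic wave can be observed numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: it evaluates the partial sums of the cosine series in Eq. (12) on a grid (the helper names `parabolic_partial_sum` and `max_error` and the grid size are hypothetical choices) and reports the largest deviation from $x^2$. Since each omitted term is at most $4/n^2$ in magnitude, the deviation cannot exceed $\sum_{n>N} 4/n^2 < 4/N$.

```python
import numpy as np


def parabolic_partial_sum(x, N):
    """Partial sum pi^2/3 + sum_{n=1}^{N} 4(-1)^n cos(nx)/n^2 from Eq. (12)."""
    n = np.arange(1, N + 1)[:, None]               # column of frequencies 1..N
    terms = 4 * (-1.0) ** n / n ** 2 * np.cos(n * x)
    return np.pi ** 2 / 3 + terms.sum(axis=0)


x = np.linspace(-np.pi, np.pi, 2001)               # grid including both endpoints


def max_error(N):
    """Largest deviation of the N-th partial sum from x^2 on the grid."""
    return np.max(np.abs(x ** 2 - parabolic_partial_sum(x, N)))


for N in (10, 100):
    print(N, max_error(N), 4 / N)                  # observed error vs. the 4/N bound
```

The largest deviation occurs at the endpoints x = ±π, where every omitted term has the same sign.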
of the main lobe centered at 0 (see Fig. 6) is significantly greater than 1 (about 1.09 in value).

From the facts just cited, we can explain the origin of ringing and Gibbs’ phenomenon for the square wave. For the square wave function $f_1$, Eq. (22) becomes
$$S_N(x) = \frac{1}{2\pi} \int_{x - \pi/2}^{x + \pi/2} D_N(t)\, dt. \tag{24}$$
As x ranges from −π to π, this formula shows that $S_N(x)$ is proportional to the signed area of $D_N$ over an interval of length π centered at x. By examining Fig. 6, which is a typical graph for $D_N$, it is then easy to see why there is ringing in the partial sums $S_N$ for the square wave. Gibbs’ phenomenon is a bit more subtle, but also results from Eq. (24). When x nears a jump discontinuity, the central lobe of $D_N$ is the dominant contributor to the integral in Eq. (24), resulting in a spike which overshoots the value of 1 for $f_1$ by about 9%.

Our final pointwise convergence theorem was, in essence, the first to be proved. It was established by Dirichlet using the integral form for partial sums in Eq. (22). We shall state this theorem in a stronger form first proved by Jordan.

Theorem 5: If f has period 2π and has bounded variation on [0, 2π], then the Fourier series for f converges at all points. In fact, for all x-values,
$$\lim_{N \to \infty} S_N(x) = \tfrac{1}{2} [f(x+) + f(x-)].$$

This theorem is too difficult to prove in the limited space we have here (see Zygmund, 1968). A simple consequence of Theorem 5 is that the Fourier series for the square wave $f_1$ converges at its discontinuity points to 1/2 (although this can also be shown directly by substitution of x = ±π/2 into the series in Eq. (11)).

We close by mentioning that the conditions for convergence, such as Lipschitz or bounded variation, cited in the theorems above cannot be dispensed with entirely. For instance, Kolmogorov gave an example of a period 2π function (for which $\int_{-\pi}^{\pi} |f(x)|\, dx$ is finite) that has a Fourier series which fails to converge at every point.

More discussion of pointwise convergence can be found in Walker (1998), Walter (1994), or Zygmund (1968).

IV. CONVERGENCE IN NORM

Perhaps the most satisfactory notion of convergence for Fourier series is convergence in $L^2$-norm (also called $L^2$-convergence), which we shall define in this section. One of the great triumphs of the Lebesgue theory of integration is that it yields necessary and sufficient conditions for $L^2$-convergence. There is also an interpretation of $L^2$-norm in terms of a generalized Euclidean distance and this gives a satisfying geometric flavor to $L^2$-convergence of Fourier series. By interpreting the square of $L^2$-norm as a type of energy, there is an equally satisfying physical interpretation of $L^2$-convergence. The theory of $L^2$-convergence has led to fruitful generalizations such as Hilbert space theory and norm convergence in a wide variety of function spaces.

To introduce the idea of $L^2$-convergence, we first examine a special case. By Theorem 4, the partial sums of a uniformly Lipschitz function f converge uniformly to f. Since that means that the maximum distance between the graphs of $S_N$ and f tends to 0 as $N \to \infty$, it follows that
$$\lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - S_N(x)|^2\, dx = 0. \tag{25}$$

This result motivates the definition of $L^2$-convergence. If g is a function for which $|g|^2$ has a finite Lebesgue integral over [−π, π], then we say that g is an $L^2$-function, and we define its $L^2$-norm $\|g\|_2$ by
$$\|g\|_2 = \sqrt{\frac{1}{2\pi} \int_{-\pi}^{\pi} |g(x)|^2\, dx}.$$
We can then rephrase Eq. (25) as saying that $\|f - S_N\|_2 \to 0$ as $N \to \infty$. In other words, the Fourier series for f converges to f in $L^2$-norm. The following theorem generalizes this result to all $L^2$-functions (see Rudin (1986) for a proof).

Theorem 6: If f is an $L^2$-function, then its Fourier series converges to f in $L^2$-norm.

Theorem 6 says that Eq. (25) holds for every $L^2$-function f. Combining this with Eq. (16), we obtain Parseval’s equality:
$$\sum_{n=-\infty}^{\infty} |c_n|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\, dx. \tag{26}$$

Parseval’s equation has a useful interpretation in terms of energy. It says that the energy of the set of Fourier coefficients, defined to be equal to the left side of Eq. (26), is equal to the energy of the function f, defined by the right side of Eq. (26).

The $L^2$-norm can be interpreted as a generalized Euclidean distance. To see this, take square roots of both sides of Eq. (26): $\sqrt{\sum |c_n|^2} = \|f\|_2$. The left side of this equation is interpreted as a Euclidean distance in an (infinite-dimensional) coordinate space, hence the $L^2$-norm $\|f\|_2$ is equivalent to such a distance.
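Parseval’s equality in Eq. (26) can be checked on the square wave, whose coefficients are $c_0 = 1/2$ and $c_n = \sin(n\pi/2)/(n\pi)$ for $n \ne 0$. The following sketch is a numerical illustration only; the function names are hypothetical, and the right side of Eq. (26) is entered by hand as $\frac{1}{2\pi} \cdot \pi = \frac{1}{2}$, since $|f_1|^2 = 1$ on an interval of length π and 0 elsewhere.

```python
import numpy as np


def square_wave_coeff(n):
    """Complex Fourier coefficient c_n of the square wave f1."""
    return 0.5 if n == 0 else np.sin(n * np.pi / 2) / (n * np.pi)


def coefficient_energy(N):
    """Sum of |c_n|^2 over |n| <= N, a truncation of the left side of Eq. (26)."""
    return sum(abs(square_wave_coeff(n)) ** 2 for n in range(-N, N + 1))


function_energy = 0.5   # right side of Eq. (26) for f1, computed by hand

for N in (10, 100, 2000):
    gap = function_energy - coefficient_energy(N)
    print(N, coefficient_energy(N), gap)   # the gap shrinks toward 0 as N grows
```

The gap printed for each N is the energy carried by the coefficients with $|n| > N$, so it also measures how much of the energy of $f_1$ the partial sum $S_N$ fails to capture.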
As examples of these ideas, let’s return to the square wave and parabolic wave. For the square wave $f_1$, we find that
$$\|f_1 - S_{100}\|_2^2 = \sum_{|n| > 100} \frac{\sin^2(n\pi/2)}{(n\pi)^2} = 1.0 \times 10^{-3}.$$
Likewise, for the parabolic wave $f_2$, we have $\|f_2 - S_{100}\|_2^2 = 2.6 \times 10^{-6}$. These facts show that the energy of the parabolic wave is almost entirely contained in the partial sum $S_{100}$; their energy difference is almost three orders of magnitude smaller than in the square wave case. In terms of generalized Euclidean distance, we have $\|f_2 - S_{100}\|_2 = 1.6 \times 10^{-3}$ and $\|f_1 - S_{100}\|_2 = 3.2 \times 10^{-2}$, showing that the partial sum is an order of magnitude closer for the parabolic wave.

We close this section by returning full circle to the notion of pointwise convergence. The following theorem was proved by Carleson for $L^2$-functions and by Hunt for $L^p$-functions ($p \ne 2$).

Theorem 9: If f is an $L^p$-function for p > 1, then its Fourier series converges to it at almost all points.

By almost all points, we mean that the set of points where divergence occurs has Lebesgue measure zero. References for the proof of Theorem 9 can be found in Krantz (1999) and Zygmund (1968). Its proof is undoubtedly the most difficult one in the theory of Fourier series.

V. SUMMABILITY OF FOURIER SERIES
In the previous sections, we noted some problems with
Theorem 6 has a converse, known as the Riesz-Fischer
convergence of Fourier series partial sums. Some of these
theorem.
problems include Kolmogorov’s example of a Fourier
Theorem 7 (Riesz-Fischer): If |cn |2 converges, then series for an L 1 -function that diverges everywhere, and
there exists an L -function f having {cn } as its Fourier
2 Gibbs’ phenomenon and ringing in the Fourier series par-
coefficients. tial sums for discontinuous functions. Another problem
is Du Bois Reymond’s example of a continuous function
This theorem is proved in Rudin (1986). Theorem and
whose Fourier series diverges on a countably infinite set of
the Riesz-Fischer theorem combine to give necessary and
points, see Walker (1968). It turns out that all of these dif-
sufficient conditions for L 2 -convergence of Fourier series,
ficulties simply disappear when new summation methods,
conditions which are remarkably easy to apply. This has
based on appropriate modifications of the partial sums, are
made L 2 -convergence into the most commonly used no-
used.
tion of convergence for Fourier series.
The simplest modification of partial sums, and one of
These ideas for L 2 -norms partially generalize to the
the first historically to be used, is to take arithmetic
case of L p -norms. Let p be real number satisfying p ≥ 1.
means. Define the N th arithmetic mean σ N by σ N =
If g is a function for which |g | p has a finite Lebesgue
(S0 + S1 + · · · + S N −1 )/N . From which it follows that
integral over [−π, π ], then we say that g is an L p -function,
and we define its L p -norm g p by N
|n|
π 1/ p σ N (x) = 1− cn einx . (27)
1 n=−N
N
g p = |g(x)| p d x .
2π −π The factors (1 − |n|/N ) are called convergence factors.
If f − S N p → 0, then we say that the Fourier series for They modify the Fourier coefficients cn so that the am-
f converges to f in L p -norm. The following theorem gen- plitude of the higher frequency terms (for |n| near N ) are
eralizes Theorem 6 (see Krantz (1999) for a proof). damped down toward zero. This produces a great improve-
ment in convergence properties as shown by the following
Theorem 8: If f is an L p -function for p > 1, then its theorem.
Fourier series converges to f in L p -norm.
Theorem 10: Let f be a periodic function. If f is an L p -
Notice that the case of p = 1 is not included in Theorem 8.
function for p ≥ 1, then σ N → f in L p -norm as N → ∞.
The example of Kolmogorov cited at the end of Section III
If f is a continuous function, then σ N → f uniformly as
shows that there exist L 1 -functions whose Fourier series
N → ∞.
do not converge in L 1 -norm. For p = 2, there are no simple
analogs of either Parseval’s equality or the Riesz-Fischer Notice that L 1 -convergence is included in Theorem 10.
theorem (which say that we can characterize L 2 -functions Even for Kolmogorov’s function, it is the case that
by the magnitude of their Fourier coefficients). Some par- f − σ N 1 → 0 as N → ∞. It also should be noted that
tial analogs of these latter results for L p -functions, when no assumption, other than continuity of the periodic func-
p = 2, are discussed in Zygmund (1968) (in the context of tion, is needed in order to ensure uniform convergence of
Littlewood-Paley theory). its arithmetic means.
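The damping effect of the convergence factors in Eq. (27) can be seen in a small numerical experiment. The sketch below is illustrative only. It assumes the square wave $f_1$ equals 1 for $|x| < \pi/2$ and 0 elsewhere on $[-\pi, \pi]$, with $c_0 = 1/2$ and $c_n = \sin(n\pi/2)/(n\pi)$; the function names, N = 50, and the sampling grid are arbitrary choices. Because these $c_n$ are real and even in n, both sums reduce to cosine series.

```python
import math

def c(n):
    """Assumed Fourier coefficient c_n of the square wave f1."""
    if n == 0:
        return 0.5
    return math.sin(n * math.pi / 2) / (n * math.pi)

def partial_sum(x, N):
    """S_N(x) = c_0 + 2 * sum_{n=1}^{N} c_n cos(nx), valid for real even c_n."""
    return c(0) + 2 * sum(c(n) * math.cos(n * x) for n in range(1, N + 1))

def arithmetic_mean(x, N):
    """sigma_N(x) from Eq. (27), using the convergence factors (1 - n/N)."""
    return c(0) + 2 * sum((1 - n / N) * c(n) * math.cos(n * x)
                          for n in range(1, N + 1))

N = 50
grid = [-math.pi + 2 * math.pi * k / 2000 for k in range(2001)]
S_values = [partial_sum(x, N) for x in grid]
sigma_values = [arithmetic_mean(x, N) for x in grid]

if __name__ == "__main__":
    # S_N overshoots the value 1 near the jumps (Gibbs' phenomenon);
    # sigma_N stays between the extreme values 0 and 1 of the square wave.
    print("max S_N     :", max(S_values))
    print("max sigma_N :", max(sigma_values))
    print("min sigma_N :", min(sigma_values))
```

The arithmetic means stay within the range of the function because they average the function against a nonnegative kernel of mean 1, whereas the partial sums do not have this property.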
1. For each N,

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x)\,dx = 1.$$

As an example, let f(x) = 1 for 0 ≤ x ≤ 1 and f(x) = 0 for −1 ≤ x < 0. The Legendre series for this step function is [see Walker (1988)]:

$$\frac{1}{2} + \sum_{k=0}^{\infty} \frac{(-1)^k (4k+3)(2k)!}{4^{k+1}(k+1)!\,k!}\, P_{2k+1}(x).$$

In Fig. 10 we show the partial sum $S_{11}$ for this series. The graph of $S_{11}$ is reminiscent of a Fourier series partial sum for a step function. In fact, the following theorem is true.

Theorem 12: If $\int_{-1}^{1} |f(x)|^2\,dx$ is finite, then the partial sums $S_N$ for the Legendre series for f satisfy

$$\lim_{N\to\infty} \int_{-1}^{1} |f(x) - S_N(x)|^2\,dx = 0.$$

Moreover, if f is Lipschitz at a point $x_0$, then $S_N(x_0) \to f(x_0)$ as $N \to \infty$.

This theorem is proved in Walter (1994) and Jackson (1941). Further details and other examples of orthogonal polynomial series can be found in either Davis (1975), Jackson (1941), or Walter (1994). There are many important orthogonal series, such as Hermite, Laguerre, and

$$b + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j-1} c_{j,k} H_{j,k}(x) \tag{35}$$

with $b = \int_0^1 f(x)\,dx$ and

$$c_{j,k} = \int_0^1 f(x) H_{j,k}(x)\,dx.$$

The definitions of b and $c_{j,k}$ are justified by orthogonality relations between the Haar functions (similar to the orthogonality relations that we used above to justify Fourier series and Legendre series).

A partial sum $S_N$ for the Haar series in Eq. (35) is defined by

$$S_N(x) = b + \sum_{\{j,k \,\mid\, 2^j + k \leq N\}} c_{j,k} H_{j,k}(x).$$

For example, let f be the function on [0, 1] defined as follows:

$$f(x) = \begin{cases} x - 1/2 & \text{if } 1/4 < x < 3/4 \\ 0 & \text{if } x \leq 1/4 \text{ or } 3/4 \leq x. \end{cases}$$

In Fig. 11 we show the Haar series partial sum $S_{256}$ for this function. Notice that there is no Gibbs’ phenomenon with this partial sum. This contrasts sharply with the Fourier
FIGURE 13 Two Daubechies wavelets.

If f is continuous on [0, 1], then $S_N$ converges uniformly to f on [0, 1].

Theorem 14 does not reveal the full power of wavelet series. In almost all cases, it is possible to rearrange the terms in the wavelet series in any manner whatsoever and convergence will still hold. One reason for doing a rearrangement is in order to add the terms in the series with coefficients of largest magnitude (thus largest energy) first so as to speed up convergence to the function. Here is a convergence theorem for such permuted series.

Theorem 15: Suppose that $\int_0^1 |f(x)|^p\,dx$ is finite, for p > 1. If the terms of a Daubechies wavelet series are permuted (in any manner whatsoever), then the partial sums $S_N$ of the permuted series satisfy

$$\lim_{N\to\infty} \left[\int_0^1 |f(x) - S_N(x)|^p\,dx\right]^{1/p} = 0.$$

If f is uniformly Lipschitz, then the partial sums $S_N$ of the

$$= \frac{e^{-in\pi}}{M} \sum_{k=0}^{M-1} f(x_k)\, e^{-i2\pi kn/M}.$$

The last sum above is called the Discrete Fourier Transform (DFT) of the finite sequence of numbers $\{f(x_k)\}$. That is, we define the DFT of a sequence $\{g_k\}_{k=0}^{M-1}$ of numbers by

$$G_n = \sum_{k=0}^{M-1} g_k\, e^{-i2\pi kn/M}. \tag{38}$$

The DFT is the set of numbers $\{G_n\}$, and we see from the discussion above that the Fourier coefficients of a function f can be approximated by a DFT (multiplied by the factors $e^{-in\pi}/M$). For example, in Fig. 14 we show a graph of approximations of the Fourier coefficients $\{c_n\}_{n=-50}^{50}$ of the square wave $f_1$ obtained via a DFT (using M = 1024). For all values, these approximate Fourier coefficients differ from the exact coefficients by no more than $10^{-3}$. By taking M even larger, the error can be reduced still further.

The two principal properties of DFTs are that they can be inverted and they preserve energy (up to a scale factor). The inversion formula for the DFT is

$$g_k = \frac{1}{M} \sum_{n=0}^{M-1} G_n\, e^{i2\pi kn/M}.$$
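The DFT approximation of Fourier coefficients described above can be tried out directly from Eq. (38). The following is a minimal sketch under stated assumptions, not the computation behind Fig. 14. It assumes sample points $x_k = -\pi + 2\pi k/M$, a square wave equal to 1 for $|x| < \pi/2$ and 0 elsewhere (with the value 0 assigned at the two jump points), and exact coefficients $c_0 = 1/2$, $c_n = \sin(n\pi/2)/(n\pi)$. All function names are hypothetical.

```python
import cmath
import math

M = 1024  # number of sample points, as in the M = 1024 example above

def sample(k):
    """Assumed square wave at x_k = -pi + 2*pi*k/M.

    |x_k| < pi/2 is equivalent to M/4 < k < 3M/4; the comparison is done on
    integers so the jump points (assigned the value 0 here) are exact.
    """
    return 1.0 if M < 4 * k < 3 * M else 0.0

def dft_entry(g, n):
    """G_n from Eq. (38); negative n is fine because G_n is M-periodic in n."""
    size = len(g)
    return sum(g[k] * cmath.exp(-2j * math.pi * k * n / size)
               for k in range(size))

def approx_coefficient(g, n):
    """DFT value multiplied by the factor exp(-i n pi) / M."""
    return cmath.exp(-1j * n * math.pi) / len(g) * dft_entry(g, n)

def exact_coefficient(n):
    """Assumed exact Fourier coefficient of the square wave."""
    if n == 0:
        return 0.5
    return math.sin(n * math.pi / 2) / (n * math.pi)

g = [sample(k) for k in range(M)]
max_error = max(abs(approx_coefficient(g, n) - exact_coefficient(n))
                for n in range(-50, 51))

if __name__ == "__main__":
    print("largest error for |n| <= 50:", max_error)
```

With these assumptions the largest error is on the order of 1/M, so doubling M should roughly halve it. In practice a fast Fourier transform routine would replace the direct sum in `dft_entry`.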
indices m = n + N):

$$\begin{aligned}
S_N(x_k) &= \sum_{n=-N}^{N} c_n\, e^{in(-\pi + 2\pi k/M)} \\
&= \sum_{n=-N}^{N} c_n (-1)^n e^{i2\pi nk/M} \\
&= \sum_{m=0}^{2N} c_{m-N} (-1)^{m-N} e^{-i2\pi kN/M} e^{i2\pi km/M}.
\end{aligned}$$

Thus, if we let $g_m = c_{m-N}$ for m = 0, 1, . . . , 2N and $g_m = 0$ for m = 2N + 1, . . . , M − 1, we have

$$S_M(x_k) = e^{-i2\pi kN/M} \sum_{m=0}^{M-1} g_m (-1)^{m-N} e^{i2\pi km/M}.$$

This equation shows that $S_M(x_k)$ can be computed using a DFT inversion (along with multiplications by exponential factors). By combining DFT approximations of Fourier coefficients with this last equation, it is also possible to approximate Fourier series partial sums, or arithmetic means, or other modified partial sums. See Briggs and Henson (1995) or Walker (1996) for further details.

(42)

This signal is a combination of two tones with sharply increasing frequency of oscillations. When run through a sound generator, it produces a sharply rising pitch. Signals like this bear some similarity to certain bird calls, and are also used in radar. The spectrogram magnitudes for this signal are shown in Fig. 15b. We can see two, somewhat blurred, line segments corresponding to the factors $400\pi x$ and $200\pi x$ multiplying x in the two sine factors in Eq. (42).

One important area of application of spectrograms is in speech coding. As an example, in Fig. 16 we show spectrogram magnitudes for two audio recordings. The spectrogram magnitudes in Fig. 16a come from a recording of a four-year-old girl singing the phrase “twinkle, twinkle, little star,” and the spectrogram magnitudes in Fig. 16b come from a recording of the author of this article singing the same phrase. The main frequencies are seen to be in harmonic progression (integer multiples of a lowest, fundamental frequency) in both cases, but the young girl’s main frequencies are higher (higher in pitch) than the adult male’s. The slightly curved ribbons of frequency content are known as formants in linguistics. For more
FIGURE 15 Spectrograms of test signals. (a) Bottom graph is the signal in Eq. (41). Top graph is the spectrogram
magnitudes for this signal. (b) Signal and spectrogram magnitudes for the signal in (42). Horizontal axes are time
values (in sec); vertical axes are frequency values (in Hz). Darker pixels denote larger magnitudes, white pixels are
near zero in magnitude.
FIGURE 16 Spectrograms of audio signals. (a) Bottom graph displays data from a recording of a young girl singing
“twinkle, twinkle, little star.” Top graph displays the spectrogram magnitudes for this recording. (b) Similar graphs for
the author’s rendition of “twinkle, twinkle, little star.”
details on the use of spectrograms in signal analysis, see Mallat (1998).

It is possible to invert spectrograms. In other words, we can recover the original signal by inverting the succession of DFTs that make up its spectrogram. One application of this inverse procedure is to the compression of audio signals. After discarding (setting to zero) all the values in the spectrogram with magnitudes below a threshold value, the inverse procedure creates an approximation to the signal which uses significantly less data than the original signal. For example, by discarding all of the spectrogram values having magnitudes less than 1/320 times the largest magnitude spectrogram value, the young girl’s version of “twinkle, twinkle, little star” can be approximated, without noticeable degradation of quality, using about one-eighth the amount of data as the original recording. Some of the best results in audio compression are based on sophisticated generalizations of this spectrogram technique, referred to either as lapped transforms or as local cosine expansions; see Malvar (1992) and Mallat (1998).

VIII. CONCLUSION

In this article, we have outlined the main features of the theory and application of one-variable Fourier series. Much additional information, however, can be found in the references. In particular, we did not have sufficient space to discuss the intricacies of multivariable Fourier series which, for example, have important applications in crystallography and molecular structure determination. For a mathematical introduction to multivariable Fourier series, see Krantz (1999), and for an introduction to their applications, see Walker (1988).

SEE ALSO THE FOLLOWING ARTICLES

FUNCTIONAL ANALYSIS • GENERALIZED FUNCTIONS • MEASURE AND INTEGRATION • NUMERICAL ANALYSIS • SIGNAL PROCESSING • WAVELETS

BIBLIOGRAPHY

Briggs, W. L., and Henson, V. E. (1995). “The DFT. An Owner’s Manual,” SIAM, Philadelphia.
Daubechies, I. (1992). “Ten Lectures on Wavelets,” SIAM, Philadelphia.
Davis, P. J. (1975). “Interpolation and Approximation,” Dover, New York.
Davis, P. J., and Hersh, R. (1982). “The Mathematical Experience,” Houghton Mifflin, Boston.
Fourier, J. (1955). “The Analytical Theory of Heat,” Dover, New York.
Jackson, D. (1941). “Fourier Series and Orthogonal Polynomials,” Math. Assoc. of America, Washington, DC.
Krantz, S. G. (1999). “A Panorama of Harmonic Analysis,” Math. Assoc. of America, Washington, DC.
Mallat, S. (1998). “A Wavelet Tour of Signal Processing,” Academic Press, New York.
Malvar, H. S. (1992). “Signal Processing with Lapped Transforms,” Artech House, Norwood.
Meyer, Y. (1992). “Wavelets and Operators,” Cambridge Univ. Press, Cambridge.
Rudin, W. (1986). “Real and Complex Analysis,” 3rd edition, McGraw-Hill, New York.
Walker, J. S. (1988). “Fourier Analysis,” Oxford Univ. Press, Oxford.
Walker, J. S. (1996). “Fast Fourier Transforms,” 2nd edition, CRC Press, Boca Raton.
Walker, J. S. (1999). “A Primer on Wavelets and their Scientific Applications,” CRC Press, Boca Raton.
Walter, G. G. (1994). “Wavelets and Other Orthogonal Systems with Applications,” CRC Press, Boca Raton.
Zygmund, A. (1968). “Trigonometric Series,” Cambridge Univ. Press, Cambridge.
P1: GLQ Final Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN006H-259 June 28, 2001 20:1
Fractals
Benoit B. Mandelbrot
Michael Frame
Yale University
in science. Every science started as a way to organize a large collection of messages our brain receives from our senses. The difficulty is that most of these messages are very complex, and a science can take off only after it succeeds in identifying special cases that allow a workable first step. For example, acoustics did not take its first step with chirps or drums but with idealized vibrating strings. These led to sinusoids and constants or other functions invariant under translation in time. For the notion of roughness, no proper measure was available only 20 years ago. The claim put forward forcibly in Mandelbrot (1982) is that a workable entry is provided by rough shapes that are dilation invariant. These are fractals.

Fractal roughness proves to be ubiquitous in the works of nature and man. Those works of man range from mathematics and the arts to the Internet and the financial markets. Those works of nature range from the cosmos to carbon deposits in diesel engines. A sketchy list would be useless and a complete list, overwhelming. The reader is referred to Frame and Mandelbrot (2001) and to a Panorama mentioned therein, available on the web. This essay is organized around the mathematics of fractals, and concrete examples as illustrations of it.

To avoid the need to discuss the same topic twice, mathematical complexity is allowed to fluctuate up and down. The reader who encounters paragraphs of oppressive difficulty is urged to skip ahead until the difficulty becomes manageable.

I. SCALE INVARIANCE

A. On Choosing a “Symmetry” Appropriate to the Study of Roughness

The organization of experimental data into simple theoretical models is one of the central works of every science; invariances and the associated symmetries are powerful tools for uncovering these models. The most common invariances are those under Euclidean motions: translations, rotations, reflections. The corresponding ideal physics is that of uniform or uniformly accelerated motion, uniform or smoothly varying pressure and density, smooth submanifolds of Euclidean physical or phase space. The geometric alphabet is Euclidean, the analytical tool is calculus, the statistics is stationary and Gaussian.

Few aspects of nature or man match these idealizations: turbulent flows are grossly nonuniform; solid rocks are conspicuously cracked and porous; in nature and the stock market, curves are nowhere smooth. One approach to this discrepancy, successful for many problems, is to treat observed objects and processes as “roughened” versions of an underlying smooth ideal. The underlying geometry is Euclidean or locally Euclidean, and observed nature is written in the language of noisy Euclidean geometry.

Fractal geometry was invented to approach roughness in a very different way. Under magnification, smooth shapes are more and more closely approximated by their tangent spaces. The more they are magnified, the simpler (“better”) they look. Over some range of magnifications, looking more closely at a rock or a coastline does not reveal a simpler picture, but rather more of the same kind of detail. Fractal geometry is based on this ubiquitous scale invariance. “A fractal is an object that doesn’t look any better when you blow it up.” Scale invariance is also called “symmetry under magnification.”

A manifestation is that fractals are sets (or measures) that can be broken up into pieces, each of which closely resembles the whole, except it is smaller. If the pieces scale isotropically, the shape is called self-similar; if different scalings are used in different directions, the shape is called self-affine.

There are deep relations between the geometry of fractal sets and the renormalization approach to critical phenomena in statistical physics.

B. Examples of Self-Similar Fractals

1. Exact Linear Self-Similarity

A shape S is called exactly (linearly) self-similar if the whole S splits into the union of parts $S_i$: $S = S_1 \cup S_2 \cup \ldots \cup S_n$. The parts satisfy two restrictions: (a) each part $S_i$ is a copy of the whole S scaled by a linear contraction factor $r_i$, and (b) the intersections between parts are empty or “small” in the sense of dimension. Anticipating Section II, if $i \neq j$, the fractal dimension of the intersection $S_i \cap S_j$ must be lower than that of S. The roughness of these sets is characterized by the similarity dimension d. In the special equiscaling case $r_1 = \cdots = r_n = r$, $d = \log(n)/\log(1/r)$. In general, d is the solution of the Moran equation

$$\sum_{i=1}^{n} r_i^d = 1.$$

More details are given in Section II.

Exactly self-similar fractals can be constructed by several elegant mathematical approaches.

a. Initiator and generator. An initiator is a starting shape; a generator is a juxtaposition of scaled copies of the initiator. Replacing the smaller copies of the initiator in the generator with scaled copies of the generator sets in motion a process whose limit is an exactly self-similar fractal. Stages before reaching the limit are called protofractals. Each copy is anchored by a fixed point, and one may have to specify the orientation of each replacement. The
FIGURE 2 The Sierpinski gasket, Sierpinski carpet, the Peano curve generator, and the fourth stage of the Peano curve.

FIGURE 3 The Julia set of $z^2 + 0.4$ (left) and the filled-in Julia set for $z^2 - 0.544 + 0.576 \cdot i$ (right).
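The similarity dimension defined by the Moran equation in Section I.B.1 generally has no closed form when the contraction factors differ, but it is easy to compute numerically. The sketch below is one possible approach (bisection), offered as an illustration rather than a method taken from this article. The function name is hypothetical, and the piece counts and ratios used for the Sierpinski gasket (3 copies at ratio 1/2) and carpet (8 copies at ratio 1/3) are the standard values, stated here as assumptions.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve the Moran equation sum(r_i ** d) = 1 for d by bisection.

    Requires at least two contraction ratios, each strictly between 0 and 1,
    so that the left side decreases from n > 1 at d = 0 toward 0.
    """
    if len(ratios) < 2 or any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("need at least two ratios, each in (0, 1)")

    def moran(d):
        return sum(r ** d for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while moran(hi) > 0.0:      # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:        # moran is decreasing, so bisect on its sign
        mid = 0.5 * (lo + hi)
        if moran(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("gasket :", similarity_dimension([1 / 2] * 3))   # log 3 / log 2
    print("carpet :", similarity_dimension([1 / 3] * 8))   # log 8 / log 3
    print("unequal:", similarity_dimension([1 / 2, 1 / 4]))
```

In the equiscaling case the result should agree with $d = \log(n)/\log(1/r)$, which gives a convenient check on the solver.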
5-cycle, and the Julia set is the boundary of the black region. Certainly, $J_c$ is invariant under $f_c$ and under the inverses of $f_c$, $f_{c+}^{-1}(z) = \sqrt{z - c}$ and $f_{c-}^{-1}(z) = -\sqrt{z - c}$. Polynomial functions allow several equivalent characterizations: $J_c$ is the closure of the set of repelling periodic points of $f_c(z)$ and $J_c$ is the attractor of the nonlinear IFS $\{f_{c+}^{-1}, f_{c-}^{-1}\}$.

Much is known about Julia sets of quadratic functions. For example, McMullen proved that at a point whose rotation number has periodic continued-fraction expansion, the J set is asymptotically self-similar about the critical point.

The J sets are defined for functions more general than polynomials. Visually striking and technically interesting examples correspond to the Newton function $N_f(z) = z - f(z)/f'(z)$ for polynomial families f(z), or entire functions like $\lambda \sin z$, $\lambda \cos z$, or $\lambda \exp z$ (see Section VIII.B). Discussions can be found in Blanchard (1994), Curry et al. (1983), Devaney (1994), Keen (1994), and Peitgen (1989).

b. The Mandelbrot set. The quadratic orbit $f_c^n(z)$ always converges to infinity for large enough values of z. Mandelbrot attempted a computer study of the set $M^0$ of those values of c for which the orbit does not converge to infinity, but to a stable cycle. This approach having proved unrewarding, he moved on to a set that promised an easier calculation and proved spectacular. Julia and Fatou, building on fundamental work of Montel, had shown that the Julia set $J_c$ of $f_c(z) = z^2 + c$ must be either connected or totally disconnected. Moreover, $J_c$ is connected if, and only if, the orbit $O^+(0)$ of the critical point z = 0 remains bounded. The set M defined by $\{c : f_c^n(0) \text{ remains bounded}\}$ is now called the Mandelbrot set (see the left side of Fig. 4). Mandelbrot (1980) performed a computer investigation of its structure and reported several observations. As is now well known, small copies of the M set are infinitely numerous and dense in its boundary. The right side of Fig. 4 shows one such small copy, a nonlinearly distorted copy of the whole set. Although the small copy on the right side of Fig. 4 appears to be an isolated “island,” Mandelbrot conjectured and Douady and Hubbard (1984) proved that the M set is connected. Sharpening an observation by Mandelbrot, Tan Lei (1984) proved the convergence of appropriate magnifications of Julia sets and the M set at certain points named after Misiurewicz. Shishikura (1994) proved Mandelbrot’s (1985) and Milnor’s (1989) conjecture that the boundary of the M set has Hausdorff dimension 2. Lyubich proved that the boundary of the M set is asymptotically self-similar about the Feigenbaum point.

Mandelbrot’s first conjecture, that the interior of the M set consists entirely of components (called hyperbolic) for which there is a stable cycle, remains unproved in general, though McMullen (1994) proved it for all such components that intersect the real axis. Mandelbrot’s notion that M may be the closure of $M^0$ is equivalent to the assertion that the M set is locally connected. Despite intense efforts, that assertion remains a conjecture, though Yoccoz and others have made progress.

Other developments include the theory of quadratic-like maps (Douady and Hubbard, 1985), implying the universality and ubiquity of the M set. This result was presaged by the discovery (Curry et al., 1983) of a Mandelbrot set in the parameter space of Newton’s method for a family of cubic polynomials.

The recent book by Tan Lei (2000) surveys current results and attests to the vitality of this field.

c. Circle inversion limit sets. Inversion $I_C$ in a circle C with center O and radius r transforms a point P into the point $P'$ lying on the ray OP and with $d(O, P) \cdot d(O, P') = r^2$. This is the orientation-reversing involution defined on $\mathbb{R}^2 \cup \{\infty\}$ by $P \to I_C(P) = P'$. Inversion in C leaves C fixed, and interchanges the interior and exterior of C. It contracts the “outer” component not containing O, but the contraction ratio is not bounded by any r < 1.

Poincaré generalized from inversion in one circle to a collection of more than one inversion. As an example, consider a collection of circles $C_1, \ldots, C_N$ each of which is external to all the others. That is, for all $j \neq i$, the disks bounded by $C_i$ and $C_j$ have disjoint interiors. The limit set $\Lambda(C_1, \ldots, C_N)$ of inversion in these circles is the set of limit points of the orbit $O^+(P)$ of any point P, external to $C_1, \ldots, C_N$, under the group generated by $I_{C_1}, \ldots, I_{C_N}$. Equivalently, it is the set left invariant by every one of the inversions $I_{C_1}, \ldots, I_{C_N}$.

The limit set $\Lambda$ is nearly always fractal but the nonlinearity of inversion guarantees that $\Lambda$ is nonlinearly self-similar. An example is shown in Fig. 5: the part of the limit set inside $C_1$ is easily seen to be the transform by $I_1$ of the part of the limit set inside $C_2$, $C_3$, $C_4$, and $C_5$.

How can one draw the limit set when the arrangement of the circles $C_1, \ldots, C_N$ is more involved? Poincaré’s original algorithm converges extraordinarily slowly. The first alternative algorithm was advanced in Mandelbrot

FIGURE 4 Left: The Mandelbrot set. Right: A detail of the Mandelbrot set showing a small copy of the whole. Note the nonlinear relation between the whole and the copy.
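The definition of the M set in paragraph b, namely those c for which the orbit of the critical point z = 0 under $f_c(z) = z^2 + c$ remains bounded, translates into a simple escape-time test. The sketch below is illustrative only and is not the procedure used to produce Fig. 4. The function name and the iteration cap are arbitrary choices, and the escape radius 2 relies on the standard fact that an orbit leaving the disk $|z| \leq 2$ tends to infinity.

```python
def in_mandelbrot(c, max_iter=1000, escape_radius=2.0):
    """Return False once the orbit of 0 under z -> z*z + c leaves |z| <= 2.

    True only means that no escape was detected within max_iter steps, so
    points very close to the boundary of M may be misclassified.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > escape_radius:
            return False
    return True

if __name__ == "__main__":
    for c in (0, -1, -2, 1j, 1, 0.3, -2.1):
        print(c, in_mandelbrot(c))
```

For example, c = −1 gives the bounded orbit 0, −1, 0, −1, . . . , while c = 1 gives 0, 1, 2, 5, . . . , which escapes. Coloring each pixel of a grid of c values by this test gives a rough picture of the set.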
d. Kleinian group limit sets. A Kleinian group (Beardon, 1983; Maskit, 1988) is a discrete group of Möbius transformations

$$z \to \frac{az + b}{cz + d}$$

acting on the Riemann sphere $\hat{\mathbb{C}}$, the sphere at infinity of hyperbolic 3-space $\mathbb{H}^3$. The isometries of $\mathbb{H}^3$ can be represented by complex matrices

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

[More precisely, by their equivalence classes in $PSL_2(\mathbb{C})$.] Sullivan’s side-by-side dictionary (Sullivan, 1985) between Kleinian groups and iterates of rational maps is another deep mathematical realm informed, at least in part, by fractal geometry. Thurston’s “geometrization program” for 3-manifolds (Thurston, 1997) involves giving many 3-manifolds hyperbolic structures by viewing them as quotients of $\mathbb{H}^3$ by the action of a Kleinian group G (Epstein, 1986). The corresponding action of G on $\hat{\mathbb{C}}$ determines the limit set $\Lambda(G)$, defined as the intersection of all nonempty G-invariant subsets of $\hat{\mathbb{C}}$. For many G, the limit set is a fractal. An example gives the flavor of typical results: the limit set of a finitely generated Kleinian group is either totally disconnected, a circle, or has Hausdorff dimension greater than 1 (Bishop and Jones, 1997). The Hausdorff dimension of the limit set has been studied by Beardon, Bishop, Bowen, Canary, Jones, Keen, Mantica, Maskit, McMullen, Mumford, Parker, Patterson, Sullivan, Tricot, Tukia, and many others. Poincaré exponents, eigenvalues of the Laplacian, and entropy of geodesic flows are among the tools used.

Figure 5 brings forth a relation between some limit sets of inversions or Kleinian groups and Apollonian packings

A tree’s branches are not exact shrunken copies of that tree, inlets in a bay are not exact shrunken copies of that bay, nor is each cloud made up of exact smaller copies of that cloud. To justify the role of fractal geometry as a geometry of nature, one must take a step beyond exact self-similarity (linear or otherwise). Some element of randomness appears to be present in many natural objects and processes. To accommodate this, the notions of self-similarity and self-affinity are made statistical.

a. Wiener Brownian motion: its graphs and trails. The first example is classical. It is one-dimensional Brownian motion, the random process X(t) defined by these properties: (1) with probability 1, X(0) = 0 and X(t) is continuous, and (2) the increments $X(t + \Delta t) - X(t)$ of X(t) are Gaussian with mean 0 and variance $\Delta t$. That is,

$$\Pr\{X(t + \Delta t) - X(t) \leq x\} = \frac{1}{\sqrt{2\pi \Delta t}} \int_{-\infty}^{x} \exp\left(\frac{-u^2}{2\Delta t}\right) du.$$

An immediate consequence is independence of increments over disjoint intervals. A fundamental property of Brownian motion is statistical self-affinity: for all s > 0,

$$\Pr\{X(s(t + \Delta t)) - X(st) \leq \sqrt{s}\,x\} = \Pr\{X(t + \Delta t) - X(t) \leq x\}.$$

That is, rescaling t by a factor of s, and of x by a factor of $\sqrt{s}$, leaves the distribution unchanged. This correct rescaling is shown on the left panel of Fig. 6: t (on the horizontal axis) is scaled by 4, x (on the vertical axis) is scaled by $2 = 4^{1/2}$. Note that this magnification has about the same degree of roughness as the full picture. In the center panel, t is scaled by 4, x by 4/3; the magnification is flatter than the original. In the right panel, both t and
FIGURE 10 Top: Fractional Brownian motion simulations with H = 0.25, H = 0.5, and H = 0.75. Bottom: Difference plots X(t + 1) − X(t) of the graphs above.

$$E\big((X(t) - X(0)) \cdot (X(t + h) - X(t))\big) = \tfrac{1}{2}\left((t + h)^{2H} - t^{2H} - h^{2H}\right).$$

If H = 1/2, this correlation vanishes and the increments are independent. In fact, FBM reduces to Brownian motion. If H > 1/2, the correlation is positive, so the increments tend to have the same sign. This is persistent FBM. If H < 1/2, the correlation is negative, so the increments tend to have opposite signs. This is antipersistent FBM. See Fig. 10. The exponent determines the dimension of the graph of FBM: with probability 1, $d_H = d_{box} = 2 - H$. Notice that for H > 1/2, the central band of the difference plot moves up and down, a sign of long-range correlation, but the outliers still are small. Figure 11 shows the trails of these three flavors of FBM. FBM is the main topic of Mandelbrot (2001c).

c. Lévy stable processes. While FBM introduces correlations, its increments remain Gaussian and so have small outliers. The Gaussian distribution is characterized by its first two moments (mean and variance), but some natural phenomena appear to have distributions for which these are not useful indicators. For example, at the critical point of percolation there are clusters of all sizes and the expected cluster size diverges.

Paul Lévy studied random walks for which the jump distributions follow the power law $\Pr\{X > x\} \approx x^{-\alpha}$. There is a geometrical approach for generating examples of Lévy processes.

The unit step function $\xi(t)$ is defined by

$$\xi(t) = \begin{cases} 0 & \text{for } t < 0 \\ 1 & \text{for } t \geq 0 \end{cases}$$

and a (one-dimensional) Lévy stable process is defined as a sum

$$f(t) = \sum_{k=1}^{\infty} \lambda_k\, \xi(t - t_k),$$

where the pulse times $t_k$ and amplitudes $\lambda_k$ are chosen according to the following Lévy measure: given t and $\lambda$, the probability of choosing $(t_i, \lambda_i)$ in the rectangle $t < t_i < t + dt$, $\lambda < \lambda_i < \lambda + d\lambda$ is $C\lambda^{-\alpha - 1}\,d\lambda\,dt$. Figure 12 shows the graph of a Lévy process or flight, and a graph of its increments.

Comparing Figs. 7, 10, and 12 illustrates the power of the increment plot for revealing both global correlations (FBM) and long tails (Lévy processes).

The effect of large excursions in Lévy processes is more visible in the plane. See Fig. 13. These Lévy flights were used in Mandelbrot (1982, Chapter 32) to mimic the statistical properties of galaxy distributions.

Using fractional Brownian motion and Lévy processes, Mandelbrot (in 1965 and 1963) improved upon Bachelier’s Brownian model of the stock market. The former corrects the independence of Brownian motion, the latter corrects its short tails. The original and corrected processes in the preceding sentence are statistically self-affine random fractal processes. This demonstrates the power of invariances in financial modeling; see Mandelbrot (1997a,b).

d. Self-affine cartoons with mild to wild randomness. Many natural processes exhibit long tails or global dependence or both, so it was a pleasant surprise that both can be incorporated in an elegant family of simple cartoons (Mandelbrot, 1997a, Chapter 6; 1999, Chapter N1; 2001a). As for self-similar curves (Section I.B.1.a), the basic construction of the cartoon involves an initiator and a
FIGURE 16 Generators, cartoons and difference graphs for symmetric cartoons with turning points (a, 2/3) and
(1 − a, 1/3), for a = 0.333, 0.389, 0.444, 0.456, and 0.467. The same random number seed is used in all graphs.
with $d \approx 1.71$ for clusters in the plane and $d \approx 2.5$ for clusters in space. This exponent $d$ is the mass dimension of the cluster. (See Sections II.C and V.) These values match measured scalings of physical objects moderately, but not terribly well. A careful examination of much larger clusters revealed discrepancies that added in due time to a very complex picture of DLA. Mandelbrot et al. (1995) investigated clusters in the $10^7$ range; careful measurement reveals an additional dimension of $1.65 \pm 0.01$. This suggests the clusters become more compact as they grow. Also, as the cluster grows, more arms develop and the largest gaps decrease in size; i.e., the lacunarity decreases. (See Section VI.)

II. THE GENERIC NOTION OF FRACTAL DIMENSION AND A FEW SPECIFIC IMPLEMENTATIONS

The first, but certainly not the last, step in quantifying fractals is the computation of a dimension. The notion of Euclidean dimension has many aspects and therefore extends in several fashions. The extensions are distinct in the most general cases but coincide for exactly self-similar fractals. Many other dimensions cannot be mentioned here.
A more general approach to quantifying degrees of roughness is found in the article on Multifractals.

A. Similarity Dimension

The definition of similarity dimension is rooted in the fact that the unit cube in $D$-dimensional Euclidean space is self-similar: for any positive integer $b$ the cube can be decomposed into $N = b^D$ cubes, each scaled by the similarity ratio $r = 1/b$, and overlapping at most along $(D - 1)$-dimensional cubes.
The equiscaling or isoscaling case. Provided the pieces do not overlap significantly, the power-law relation $N = (1/r)^D$ between the number $N$ and scaling factor $r$ of the pieces generalizes to all exactly self-similar sets with all pieces scaled by the factor $r$. The similarity dimension $d_{\mathrm{sim}}$ is
\[
d_{\mathrm{sim}} = \frac{\log(N)}{\log(1/r)}.
\]
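The similarity dimension formula $d_{\mathrm{sim}} = \log(N)/\log(1/r)$ is easy to evaluate numerically. The following is a minimal sketch; the function name `similarity_dimension` and the three worked cases (a cube cut into $b = 4$ parts per side, the Koch curve with $N = 4$, $r = 1/3$, and the Sierpinski gasket with $N = 3$, $r = 1/2$) are illustrative choices, not part of the original text.

```python
import math


def similarity_dimension(n_pieces: int, ratio: float) -> float:
    """Return log(N) / log(1/r) for N pieces, each scaled by the ratio r."""
    if n_pieces < 1 or not (0.0 < ratio < 1.0):
        raise ValueError("need N >= 1 and 0 < r < 1")
    return math.log(n_pieces) / math.log(1.0 / ratio)


# Unit cube in D = 3 cut into b = 4 parts per side: N = b**D, r = 1/b.
cube = similarity_dimension(4 ** 3, 1 / 4)
# Koch curve: N = 4 pieces, each scaled by r = 1/3.
koch = similarity_dimension(4, 1 / 3)
# Sierpinski gasket: N = 3 pieces, each scaled by r = 1/2.
gasket = similarity_dimension(3, 1 / 2)

print(cube, koch, gasket)
```

For the cube the formula recovers the Euclidean dimension (up to floating-point rounding), while the Koch curve and the gasket give non-integer values between 1 and 2.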
C. Mass Dimension

The mass $M(r)$ of a $d$-dimensional Euclidean ball of constant density $\rho$ and radius $r$ is given by

FIGURE 19 Attempts at measuring the mass dimension of a Sierpinski gasket using three families of circles.
P1: GLQ Final
Encyclopedia of Physical Science and Technology EN006H-259 June 28, 2001 20:1
Section V describes methods of measuring the mass dimension for physical datasets.

D. Minkowski–Bouligand Dimension

Given a set $A \subset \mathbb{R}^E$ and $\delta > 0$, the Minkowski sausage of $A$, also called the $\delta$-thickening or $\delta$-neighborhood of $A$, is defined as $A_\delta = \{x \in \mathbb{R}^E : d(x, y) \le \delta \text{ for some } y \in A\}$. (See Section I.B.b.) In the Euclidean case when $A$ is a smooth $m$-dimensional manifold imbedded in $\mathbb{R}^E$, one has $\mathrm{vol}(A_\delta) \sim \Lambda \cdot \delta^{E-m}$. That is, the $E$-dimensional volume of $A_\delta$ scales as $\delta$ to the codimension of $A$. This concept extends to fractal sets $A$: if the limit exists,
\[
E - \lim_{\delta \to 0} \frac{\log(\mathrm{vol}(A_\delta))}{\log(\delta)}
\]
defines the Minkowski–Bouligand dimension, $d_{\mathrm{MB}}(A)$ (see Mandelbrot, 1982, p. 358). In fact, it is not difficult to see that $d_{\mathrm{MB}}(A) = d_{\mathrm{box}}(A)$. If the limit does not exist, lim sup gives $\overline{d}_{\mathrm{box}}(A)$ and lim inf gives $\underline{d}_{\mathrm{box}}(A)$.
In the privileged case when the limit
\[
\lim_{\delta \to 0} \frac{\mathrm{vol}(A_\delta)}{\delta^{E-m}}
\]
exists, it generalizes the notion of Minkowski content for smooth manifolds $A$. Section VI will use this prefactor to measure lacunarity.

E. Hausdorff–Besicovitch Dimension

For $\delta > 0$, a $\delta$-cover of $A$ is a countable collection $\{U_i\}$ of sets of diameter $|U_i| \le \delta$ whose union contains $A$. For $s \ge 0$, define
\[
\mathcal{H}^s_\delta(A) = \inf\Bigl\{ \sum_i |U_i|^s : \{U_i\} \text{ is a } \delta\text{-cover of } A \Bigr\}.
\]
A decrease of $\delta$ reduces the collection of $\delta$-covers of $A$, therefore $\mathcal{H}^s_\delta(A)$ increases as $\delta \to 0$ and $\mathcal{H}^s(A) = \lim_{\delta \to 0} \mathcal{H}^s_\delta(A)$ exists. This limit defines the $s$-dimensional Hausdorff measure of $A$. For $t > s$, $\mathcal{H}^t_\delta(A) \le \delta^{t-s} \mathcal{H}^s_\delta(A)$. It follows that a unique number $d_H$ has the property that
\[
s < d_H \text{ implies } \mathcal{H}^s(A) = \infty
\]
and
\[
s > d_H \text{ implies } \mathcal{H}^s(A) = 0.
\]
That is,
\[
d_H(A) = \inf\{s : \mathcal{H}^s(A) = 0\} = \sup\{s : \mathcal{H}^s(A) = \infty\}.
\]
This quantity $d_H$ is the Hausdorff–Besicovitch dimension of $A$. It is of substantial theoretical significance, but in most cases is quite challenging to compute, even though it suffices to use coverings by disks. An upper bound often is relatively easy to obtain, but the lower bound can be much more difficult because the inf is taken over the collection of all $\delta$-covers. Because of the inf that enters in its definition, the Hausdorff–Besicovitch dimension cannot be measured for any physical object.
Note: If $A$ can be covered by $N_\delta(A)$ sets of diameter at most $\delta$, then $\mathcal{H}^s_\delta(A) \le N_\delta(A) \cdot \delta^s$. From this it follows $d_H(A) \le \underline{d}_{\mathrm{box}}(A)$, so $d_H(A) \le d_{\mathrm{box}}(A)$ if $d_{\mathrm{box}}(A)$ exists. This inequality can be strict. For example, if $A$ is any countable set, $d_H(A) = 0$ and yet $d_{\mathrm{box}}(\text{rationals in } [0, 1]) = 1$.

F. Packing Dimension

Hausdorff dimension measures the efficiency of covering a set by disks of varying radius. Tricot (1982) introduced packing dimension to measure the efficiency of packing a set with disjoint disks of varying radius. Specifically, for $\delta > 0$ a $\delta$-packing of $A$ is a countable collection of disjoint disks $\{B_i\}$ with radii $r_i < \delta$ and with centers in $A$. In analogy with Hausdorff measure, define
\[
\mathcal{P}^s_\delta(A) = \sup\Bigl\{ \sum_i |B_i|^s : \{B_i\} \text{ is a } \delta\text{-packing of } A \Bigr\}.
\]
As $\delta$ decreases, so does the collection of $\delta$-packings of $A$. Thus $\mathcal{P}^s_\delta(A)$ decreases as $\delta$ decreases and the limit
\[
\mathcal{P}^s_0(A) = \lim_{\delta \to 0} \mathcal{P}^s_\delta(A)
\]
exists. Because $\mathcal{P}^s_0$ is not countably subadditive, the $s$-dimensional packing measure is defined through countable covers:
\[
\mathcal{P}^s(A) = \inf\Bigl\{ \sum_{i=1}^{\infty} \mathcal{P}^s_0(A_i) : A \subset \bigcup_{i=1}^{\infty} A_i \Bigr\}.
\]
Then the packing dimension $d_{\mathrm{pack}}(A)$ is
\[
d_{\mathrm{pack}}(A) = \inf\{s : \mathcal{P}^s(A) = 0\} = \sup\{s : \mathcal{P}^s(A) = \infty\}.
\]
Packing, Hausdorff, and box dimensions are related:
\[
d_H(A) \le d_{\mathrm{pack}}(A) \le \overline{d}_{\mathrm{box}}(A).
\]
For appropriate $A$, each inequality is strict.

III. ALGEBRA OF DIMENSIONS AND LATENT DIMENSIONS

The dimensions of ordinary Euclidean sets obey several rules of thumb that are widely used, though rarely stated explicitly. For example, the union of two sets of dimension $d$ and $d'$ usually has dimension $\max\{d, d'\}$. The projection of a set of dimension $d$ to a set of dimension $d'$ usually gives a set of dimension $\min\{d, d'\}$. Also, for Cartesian products, the dimensions usually add:
$\dim(A \times B) = \dim(A) + \dim(B)$. For the intersection of subsets $A$ and $B$ of $\mathbb{R}^E$, it is the codimensions that usually add: $E - \dim(A \cap B) = (E - \dim(A)) + (E - \dim(B))$, but only so long as the resulting dimension $\dim(A) + \dim(B) - E$ is nonnegative. If it is negative, the intersection is empty. Mandelbrot (1984, Part II) generalized those rules to fractals and (see Section III.F) interpreted negative dimensions as measures of “degree of emptiness.”
For simplicity, we restrict our attention to generalizing these properties to the Hausdorff and box dimensions of fractals.

A. Dimension of Unions and Subsets

Simple applications of the definition of Hausdorff dimension give
\[
A \subseteq B \text{ implies } d_H(A) \le d_H(B)
\]
and
\[
d_H(A \cup B) = \max\{d_H(A), d_H(B)\}.
\]
Replacing max with sup, this property holds for countable collections of sets. The subset and finite union properties hold for box dimension, but the countable union property fails.

B. Product and Sums of Dimensions

For all subsets $A$ and $B$ of Euclidean space, $d_H(A \times B) \ge d_H(A) + d_H(B)$. Equality holds if one of the sets is sufficiently regular. For example, if the Hausdorff and upper box dimensions of $A$ coincide, then $d_H(A \times B) = d_H(A) + d_H(B)$. Equality does not always hold: Besicovitch and Moran (1945) give an example of subsets $A$ and $B$ of $\mathbb{R}$ with $d_H(A) = d_H(B) = 0$, yet $d_H(A \times B) = 1$.
For upper box dimensions, the inequality is reversed: $\overline{d}_{\mathrm{box}}(A \times B) \le \overline{d}_{\mathrm{box}}(A) + \overline{d}_{\mathrm{box}}(B)$.

C. Projection

Denote by $\mathrm{proj}_P(A)$ the projection of a set $A \subset \mathbb{R}^3$ to a plane $P \subset \mathbb{R}^3$ through the origin. If $A$ is a one-dimensional Euclidean object, then for almost all choices of the plane $P$, $\mathrm{proj}_P(A)$ is one-dimensional. If $A$ is a two- or three-dimensional Euclidean object, then for almost all choices of the plane $P$, $\mathrm{proj}_P(A)$ is two-dimensional of positive area. That is, $\dim(\mathrm{proj}_P(A)) = \min\{\dim(A), \dim(P)\}$.
The analogous properties hold for fractal sets $A$. If $d_H(A) < 2$, then for almost all choices of the plane $P$, $d_H(\mathrm{proj}_P(A)) = d_H(A)$. If $d_H(A) \ge 2$, then for almost all choices of the plane $P$, $d_H(\mathrm{proj}_P(A)) = 2$, and if $d_H(A) > 2$, $\mathrm{proj}_P(A)$ has positive area. So again, $d_H(\mathrm{proj}_P(A)) = \min\{d_H(A), d_H(P)\}$.
The obvious generalization holds for fractals $A \subset \mathbb{R}^E$ and projections to $k$-dimensional hyperplanes through the origin.
Projections of fractals can be very complicated. There are fractal sets $A \subset \mathbb{R}^3$ with the surprising property that for almost every plane $P$ through the origin, the projection $\mathrm{proj}_P(A)$ is any prescribed shape, to within a set of area 0. Consequently, as Falconer (1987) points out, in principle we could build a fractal digital sundial.

D. Subordination and Products of Dimension

We have already seen operations realizing the sum, max, and min of dimensions, and in the next subsection we shall examine the sum of codimensions. For certain types of fractals, multiplication of dimensions is achieved through “subordination,” a process introduced in Bochner (1955) and elaborated in Mandelbrot (1982). Examples are constructed easily from the Koch curve generator (Fig. 20a). The initiator (the unit interval) is unchanged, but the new generator is a subset of the original generator. Figure 20 shows three examples.
In Fig. 20, generator (b) gives a fractal dust (B) of dimension $\log 3/\log 3 = 1$. Generator (c) gives the standard Cantor dust (C) of dimension $\log 2/\log 3$. Generator (d) gives a fractal dust (D) also of dimension $\log 2/\log 3$.
Thinking of the Koch curve $K$ as the graph of a function $f: [0, 1] \to K \subset \mathbb{R}^2$, the fractal (B) can be obtained by restricting $f$ to the Cantor set with initiator $[0, 1]$ and generator the intervals $[0, 1/4]$, $[1/4, 1/2]$, and $[1/2, 3/4]$. In this case, the subordinand is a Koch curve, the subordinator is a Cantor set, and the subordinate is the fractal (B). The identity
\[
\frac{\log 3}{\log 3} = \frac{\log 4}{\log 3} \cdot \frac{\log 3}{\log 4}
\]
expresses that the dimensions multiply,
\[
\dim(\text{subordinate}) = \dim(\text{subordinand}) \cdot \dim(\text{subordinator}).
\]
Figure 20, (C) and (D) give other illustrations of this multiplicative relation. The seeded universe model of the distribution of galaxies (Section IX.D.1) uses subordination to obtain fractal dusts; see Mandelbrot (1982, plate 298).

FIGURE 20 The Koch curve (A) and its generator (a); (b), (c), and (d) are subordinators, and the corresponding subordinates of the subordinand (A) are (B), (C), and (D).
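The multiplicative rule for subordination in Section III.D can be checked with a few lines of arithmetic. This is a sketch only: the helper `sim_dim` is a hypothetical name, and the subordinator assumed for dust (C), a Cantor set keeping 2 of the 4 quarter-intervals of $[0, 1]$, is an assumption chosen to be consistent with the stated dimension $\log 2/\log 3$; the text spells out the subordinator only for dust (B).

```python
import math


def sim_dim(n_pieces, ratio):
    """Similarity dimension log(N) / log(1/r)."""
    return math.log(n_pieces) / math.log(1.0 / ratio)


# Subordinand: the Koch curve, N = 4 pieces scaled by r = 1/3.
subordinand = sim_dim(4, 1 / 3)

# Subordinator for (B): Cantor set keeping 3 of the 4 quarter-intervals.
subordinator_b = sim_dim(3, 1 / 4)
# Assumed subordinator for (C): Cantor set keeping 2 of the 4 quarter-intervals.
subordinator_c = sim_dim(2, 1 / 4)

# Subordinates: dusts made of 3 (resp. 2) pieces of the Koch generator, r = 1/3.
subordinate_b = sim_dim(3, 1 / 3)
subordinate_c = sim_dim(2, 1 / 3)

# dim(subordinate) = dim(subordinand) * dim(subordinator)
product_b = subordinand * subordinator_b
product_c = subordinand * subordinator_c
print(subordinate_b, product_b)
print(subordinate_c, product_c)
```

In both cases the product agrees with the dimension of the subordinate up to floating-point rounding, because the factor $\log 4$ cancels between the subordinand and the subordinator.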
E. Intersection and Sums of Codimension

The dimension of the intersection of two sets obviously depends on their relative placement. When $A \cap B = \emptyset$, the dimension vanishes. The following is a typical result. For Borel subsets $A$ and $B$ of $\mathbb{R}^E$, and for almost all $x \in \mathbb{R}^E$,
\[
d_H(A \cap (B + x)) \le \max\{0, d_H(A \times B) - E\}.
\]
If $d_H(A \times B) = d_H(A) + d_H(B)$, this reduces to
\[
d_H(A \cap (B + x)) \le \max\{0, d_H(A) + d_H(B) - E\}.
\]
This is reminiscent of the transversality relation for intersections of smooth manifolds.
Corresponding lower bounds are known in more restricted circumstances. For example, there is a positive measure set $M$ of similarity transformations of $\mathbb{R}^E$ with
\[
d_H(A \cap T(B)) \ge d_H(A) + d_H(B) - E
\]
for all $T \in M$. Note $d_H(A \cap T(B)) = d_H(A) + d_H(B) - E$ is equivalent to the addition of codimensions: $E - d_H(A \cap T(B)) = (E - d_H(A)) + (E - d_H(B))$.

F. Latent Dimensions below 0 or above E

A blind application of the rule that codimensions are additive easily yields results that seem nonsensical, yet become useful if they are properly interpreted and the Hausdorff dimension is replaced by a suitable new alternative.

1. Negative Latent Dimensions as Measures of the “Degree of Emptiness”

Section E noted that if the codimension addition rule gives a negative dimension, the actual dimension is 0. This exception is an irritating complication and hides a feature worth underlining.
As background relative to the plane, consider the following intersections of two Euclidean objects: two points, a point and a line, and two lines. Naive intuition tells us that the intersection of two points is emptier than the intersection of a point and a line, and that the latter in turn is emptier than the intersection of two lines (which is almost surely a point). This informal intuition fails to be expressed by either a Euclidean or a Hausdorff dimension. On the other hand, the formal addition of codimensions suggests that the three intersections in question have the respective dimensions $-2$, $-1$, and $0$. The inequalities between those values conform with the above-mentioned naive intuition. Therefore, they ushered in the search for a new mathematical definition of dimension that can be measured and for which negative values are legitimate and intuitive. This search produced several publications leading to Mandelbrot (1995). Two notions should be mentioned.
Embedding. A problem that concerns $\mathbb{R}^2$ can often be reinterpreted as a problem that really concerns $\mathbb{R}^E$, with $E > 2$, but must be approached within planar intuitions by $\mathbb{R}^2$. Conversely, if a given problem can be embedded into a problem concerning $\mathbb{R}^E$, the question arises, “which is the ‘critical’ value of $E - 2$, defined as the smallest value for which the intersection ceases to be empty, and precisely reduces to a point?” In the example of a line and a point, the critical $E - 2$ is precisely 1: once embedded in $\mathbb{R}^3$, the problem transforms into the intersection of a plane and a line, which is a point.
Approximation and pre-asymptotics in mathematics and the sciences. Consider a set defined as the limit of a sequence of decreasing approximations. When the limit is not empty, all the usual dimensions are defined as being properties of the limit, but when the limit is empty and all the dimensions vanish, it is possible to consider instead the limits of the properties of the approximations. The Minkowski–Bouligand formal definition of dimension generalizes to fit the naive intuitive values that may be either positive or negative.

2. Latent Dimensions That Exceed That of the Embedding Space

For a strictly self-similar set in $\mathbb{R}^E$, the Moran equation defines a similarity dimension that obeys $d_{\mathrm{sim}} \le E$. On the other hand, a generator that is a self-avoiding broken line can easily yield $\log(N)/\log(1/r) = d_{\mathrm{sim}} > E$. Recursive application of this generator defines a parametrized motion, but the union of the positions of the motion is neither a self-similar curve nor any other self-similar set. It is, instead, a set whose points are covered infinitely often. Its box dimension is $\le E$, which a fortiori is $< d_{\mathrm{sim}}$. However, one can load a mass on this set by following the route that applies in the absence of multiple points. Mass is distributed on the generator’s intervals in proportion to the values of $r_i^{d_{\mathrm{sim}}}$. By infinite recursion, the difference between the times $t'$ and $t''$ when points $P'$ and $P''$ are visited is defined as the mass supported by the portion of the curve that links these points.
If so and $d_{\mathrm{sim}} > E$, the similarity dimension acquires a useful role as a latent dimension. For example, consider the multiplication of dimensions in Section III.D. Suppose that our recursively constructed set is not lighted for all instants of time, but only intermittently when time falls within a fractal dust of dimension $d$. Then, the rule of thumb is that the latent dimension of the lighted points is $d_{\mathrm{sim}} d$. When $d_{\mathrm{sim}} d < E$, the rule of thumb is that the true dimension is also $d_{\mathrm{sim}} d$.
Figure 21 shows an example. The generator has $N = 6$ segments, each with scaling ratio $r = 1/2$, hence latent dimension $d_{\mathrm{sim}} = \log 6/\log 2 > 2$. Taking as subordinator
a Cantor set with generator having $N = 3$ segments, each with scaling ratio $r = 1/2$, yields a self-similar fractal with dimension $\log 3/\log 2$.

FIGURE 21 Left: Generator and limiting shape with latent dimension exceeding 2. Right: generator and limiting shape of a subordinate with dimension <2. For comparison, this limiting shape is enclosed in the outline of the left limiting shape.

G. Mapping

Recall $f$ satisfies the Hölder condition with exponent $H$ if there is a positive constant $c$ for which $|f(x) - f(y)| \le c|x - y|^H$. For such functions, $d_H(f(A)) \le (1/H)\, d_H(A)$. If $H = 1$, $f$ is called a Lipschitz function; $f$ is bi-Lipschitz if there are constants $c_1$ and $c_2$ with $c_1 |x - y| \le |f(x) - f(y)| \le c_2 |x - y|$. Hausdorff dimension is invariant under bi-Lipschitz maps. The analogous properties hold for box-counting dimension.

IV. METHODS OF COMPUTING DIMENSION IN MATHEMATICAL FRACTALS

Upper bounds for the Hausdorff dimension can be relatively straightforward: it suffices to consider a specific family of coverings of the set. Lower bounds are more delicate. We list and describe briefly some methods for computing dimension.

A. Mass Distribution Methods

A mass distribution on a set $A$ is a measure $\mu$ with $\mathrm{supp}(\mu) \subset A$ and $0 < \mu(A) < \infty$. The mass distribution principle (Falconer, 1990, p. 55) establishes a lower bound for the Hausdorff dimension: Let $\mu$ be a mass distribution on $A$ and suppose for some $s$ there are constants $c > 0$ and $\delta > 0$ with $\mu(U) \le c \cdot |U|^s$ for all sets $U$ with $|U| \le \delta$. Then $s \le d_H(A)$.
Suitable choice of mass distribution can show that no individual set of a cover can cover too much of $A$. This can eliminate the problems caused by covers by sets of a wide range of diameters.

B. Potential Theory Methods

Given a mass distribution $\mu$, the $s$-potential is defined by Frostman (1935) as
\[
\phi_s(x) = \int \frac{d\mu(y)}{|x - y|^s}.
\]
If there is a mass distribution $\mu$ on a set $A$ with $\int \phi_s(x)\, d\mu(x) < \infty$, then $d_H(A) \ge s$. Potential theory has been useful for computing dimension of many sets, for example, Brownian paths.

C. Implicit Methods

McLaughlin (1987) introduced a geometrical method, based on local approximate self-similarities, which succeeds in proving that $d_H(A) = d_{\mathrm{box}}(A)$, without first determining $d_H(A)$. If small parts of $A$ can be mapped to large parts of $A$ without too much distortion, or if $A$ can be mapped to small parts of $A$ without too much distortion, then $d_H(A) = d_{\mathrm{box}}(A) = s$ and $\mathcal{H}^s(A) > 0$ (in the former case) or $\mathcal{H}^s(A) < \infty$ (in the latter case). Details and examples can be found in Falconer (1997, Section 3.1).

D. Thermodynamic Formalism

Sinai (1972), Bowen (1975), and Ruelle (1978) adapted methods of statistical mechanics to determine the dimensions of fractals arising from some nonlinear processes. Roughly, for a fractal defined as the attractor $A$ of a family of nonlinear contractions $F_i$ with an inverse function $f$ defined on $A$, the topological pressure $P(\phi)$ of a Lipschitz function $\phi: A \to \mathbb{R}$ is
\[
P(\phi) = \lim_{k \to \infty} \frac{1}{k} \log \sum_{x \in \mathrm{Fix}(f^k)} \exp\bigl[\phi(x) + \phi(f(x)) + \cdots + \phi(f^{k-1}(x))\bigr],
\]
where $\mathrm{Fix}(f^k)$ denotes the set of fixed points of $f^k$. The sum plays the role of the partition function in statistical mechanics, part of the motivation for the name “thermodynamic formalism.” There is a unique $s$ for which
For shapes represented in the plane (for example, coastlines, rivers, mountain profiles, earthquake faultlines, fracture and cracking patterns, viscous fingering, dielectric breakdown, growth of bacteria in stressed environments), box dimension is often relatively easy to compute. Select a sequence $\epsilon_1 > \epsilon_2 > \cdots > \epsilon_n$ of sizes of boxes to be used to cover the shape, and denote by $N(\epsilon_i)$ the number of boxes of size $\epsilon_i$ needed to cover the shape. A plot of $\log(N(\epsilon_i))$ against $\log(1/\epsilon_i)$ often reveals a scaling range over which the points fall close to a straight line. In the presence of other evidence (hierarchical visual complexity, for example), this indicates a fractal structure with box dimension given by the slope of the line. Interpreting the box dimension in terms of underlying physical, chemical, and biological processes has yielded productive insights.
For physical objects in three-dimensional space (for example, aggregates, dustballs, physiological branchings (respiratory, circulatory, and neural), soot particles, protein clusters, terrain maps), it is often easier to compute mass dimension. Select a sequence of radii $r_1 > r_2 > \cdots > r_n$ and cover the object with concentric spheres of those radii. Denoting by $M(r_i)$ the mass of the part of the object contained inside the sphere of radius $r_i$, a plot of $\log(M(r_i))$ against $\log(r_i)$ often reveals a scaling range over which the points fall close to a straight line. In the presence of other evidence (hierarchical arrangements of hole sizes, for example), this indicates a fractal structure with mass dimension given by the slope of the line. Mass dimension is relevant for calculating how density scales with size, and this in turn has implications for how the object is coupled to its environment.

VI. LACUNARITY

Examples abound of fractals sharing the same dimension but looking quite different. For instance, both Sierpinski carpets in Fig. 22 have dimension $\log 40/\log 7$. The holes’ distribution is more uniform on the left than on the right. The quantification of this difference was undertaken in Mandelbrot (1982, Chapter 34). It introduced lacunarity as one expression of this difference, and took another step in characterizing fractals through associated numbers. How can the distribution of a fractal’s holes or gaps (“lacunae”) be quantified?

A. The Prefactor

Suppose $A$ is either carpet in Fig. 22, and let $A_\delta$ denote the $\delta$-thickening of $A$. As mentioned in Section II.D, $\mathrm{area}(A_\delta) \sim \Lambda \cdot \delta^{2 - \log 40/\log 7}$. One measure of lacunarity is $1/\Lambda$, if the appropriate limit exists.
It is well known that for the box dimension, the limit as $\epsilon \to 0$ can be replaced by the sequential limit $\epsilon_n \to 0$, for $\epsilon_n$ satisfying mild conditions. For these carpets, natural choices are those $\epsilon_n$ just filling successive generations of holes. Applied to Fig. 22, these $\epsilon_n$ give $1/\Lambda \approx 0.707589$ and $0.793487$, agreeing with the notion that higher lacunarity corresponds to a more uneven distribution of holes. Unfortunately, the prefactor is much more sensitive than the exponent: different sequences of $\epsilon_n$ give different limits. Logarithmic averages can be used, but this is work in progress.

B. The Crosscuts Structure

An object is often best studied through its crosscuts by straight lines, concentric circles, or spheres. For a fractal of dimension $d$ in the plane, the rule of thumb is that the crosscuts are Cantor-like objects of dimension $d - 1$. The case when the gaps between points in the crosscut are statistically independent was singled out by Mandelbrot as defining “neutral lacunarity.” If the crosscut is also self-similar, it is a Lévy dust.
Hovi et al. (1996) studied the intersection of lines (linear crosscuts) with two- and three-dimensional critical percolation clusters, and found the gaps are close to being statistically independent, thus a Lévy dust.
In studying very large DLA clusters, Mandelbrot et al. (1995) obtained a crosscut dimension of $d_c = 0.65 \pm 0.01$, different from the value 0.71 anticipated if DLA clusters were statistically self-similar objects with mass dimension $d_{\mathrm{mass}} = 1.71$. The difference can be explained by asserting the number of particles $N_c(r/l)$ on a crosscut
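The box-counting recipe of Section V (cover the shape with boxes of decreasing sizes, count the occupied boxes, and read the dimension off the slope of a log-log plot) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the test shape is a right-angled Sierpinski gasket sampled by the chaos game, the box sizes are powers of 1/2, and the helper names `gasket_points`, `box_count`, and `fit_slope` are hypothetical. For measured data, the scaling range would have to be chosen by inspecting the plot rather than fixed in advance.

```python
import math
import random


def gasket_points(n_points, seed=0):
    """Sample points of a right-angled Sierpinski gasket by the chaos game."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x, y = 0.0, 0.0  # start at a vertex, which lies on the gasket
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2.0, (y + vy) / 2.0  # move halfway to a random vertex
        points.append((x, y))
    return points


def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one point."""
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


points = gasket_points(20000)
sizes = [2.0 ** -k for k in range(2, 7)]            # eps_1 > eps_2 > ... > eps_n
counts = [box_count(points, eps) for eps in sizes]  # N(eps_i)
d_box = fit_slope([math.log(1.0 / eps) for eps in sizes],
                  [math.log(c) for c in counts])
print(d_box)  # compare with log 3 / log 2 for the gasket
```

The same `fit_slope` helper applies to the mass dimension: replace the box counts by the masses $M(r_i)$ inside concentric spheres and fit $\log(M(r_i))$ against $\log(r_i)$.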
C. Antipodal Correlations

Select an occupied point $p$ well inside a random fractal cluster, so that the $R \times R$ square centered at $p$ lies within the cluster. Now select two vectors $V$ and $W$ based at $p$ and separated by the angle $\theta$. Finally, denote by $x$ and $y$ the number of occupied sites within the wedges with apexes at $p$, apex angles $\phi$ much less than $\theta$, and centered about the vectors $V$ and $W$. The angular correlation function is
\[
C(\theta) = \frac{\langle x y \rangle - \langle x \rangle \langle y \rangle}{\langle x^2 \rangle - \langle x \rangle \langle x \rangle},
\]
where $\langle \cdots \rangle$ denotes an average over many realizations of the random fractal. Antipodal correlations concern $\theta = \pi$. Negative and positive antipodal correlations are interpreted as indicating high and low lacunarity; vanishing correlation is a weakened form of neutral lacunarity.
Mandelbrot and Stauffer (1994) used antipodal correlations to study the lacunarity of critical percolation clusters. On smaller central subclusters, they found the antipodes are uncorrelated.
Trema random fractals. These are formed by removing randomly centered discs, tremas, with radii obeying a power-law scaling. For them, $C(\pi) \to 0$ with $\phi$ because a circular hole that overlaps a sector cannot overlap the opposite sector. But nonconvex tremas introduce positive antipodal correlations. For $\theta$ close to $\pi$, needle-shaped tremas, though still convex, yield $C(\theta)$ much higher than for circular trema sets. From this more refined viewpoint, needle tremas’ lacunarity is much lower.

VII. FRACTAL GRAPHS AND SELF-AFFINITY

A. Weierstrass Functions

Smooth functions’ graphs, as seen under sufficient magnification, are approximated by their tangents. Unless the function itself is linear, the existence of a tangent contradicts the scale invariance that characterizes fractals. The early example of a continuous, nowhere-differentiable function devised in 1834 by Bolzano remained unpublished until the 1920s. The first example to become widely known was constructed by Weierstrass in 1872. The Weierstrass sine function is
\[
W(t) = \sum_{n=0}^{\infty} b^{-Hn} \sin(2\pi b^n t),
\]
and the complex Weierstrass function is
\[
W_0(t) = \sum_{n=0}^{\infty} b^{-Hn} \exp(2\pi i b^n t).
\]
Hardy (1916) showed $W(t)$ is continuous and nowhere-differentiable if and only if $b > 1$ and $0 < H < 1$.
As shown in Fig. 23, the parameter $H$ determines the roughness of the graph. In this case, $H$ is not a perspicuous “roughness exponent.” Indeed, as $b$ increases, the amplitudes of the higher frequency terms decrease and the graph is more clearly dominated by the lowest frequency terms. This effect of $b$ is a little-explored aspect of lacunarity.

B. Weierstrass–Mandelbrot Functions

The Weierstrass function revolutionized mathematics but did not enter physics until it was modified in a series of steps described in Mandelbrot (1982, pp. 388–390; 2001d, Chapter H4). The step from $W_0(t)$ to $W_1(t)$ added low frequencies in order to ensure self-affinity. The step from $W_1(t)$ to $W_2(t)$ added to each addend a random phase $\varphi_n$ uniformly distributed on $[0, 1]$. The step from $W_1(t)$ to $W_3(t)$ added a random amplitude $A_n = \sqrt{-2 \log V}$, where $V$ is uniform on $[0, 1]$. A function $W_4(t)$ that need not be written down combines a phase and an amplitude. The latest step leads to another function that need not be written down: it is $W_5(t) = W_4(t) + W_4(-t)$, where the two addends are statistically independent. Contrary to all earlier extensions, $W_5(t)$ is not chiral. We have
\[
W_1(t) = \sum_{n=-\infty}^{\infty} b^{-Hn} \bigl(\exp(2\pi i b^n t) - 1\bigr),
\]
\[
W_2(t) = \sum_{n=-\infty}^{\infty} b^{-Hn} \bigl(\exp(2\pi i b^n t) - 1\bigr) \exp(i\varphi_n),
\]
\[
W_3(t) = \sum_{n=-\infty}^{\infty} A_n b^{-Hn} \bigl(\exp(2\pi i b^n t) - 1\bigr).
\]
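Truncated versions of these series are straightforward to evaluate, and they make the role of the added low frequencies in $W_1(t)$ visible: the full series satisfies $W_1(bt) = b^H W_1(t)$, and a symmetric truncation satisfies it approximately. The sketch below is illustrative only; the function names, the default parameters $b = 2$ and $H = 0.5$, and the truncation at 40 terms are assumptions, not values taken from the text.

```python
import cmath
import math


def weierstrass_sine(t, b=2.0, H=0.5, n_terms=40):
    """Partial sum of W(t) = sum_{n>=0} b^(-H n) sin(2 pi b^n t)."""
    return sum(b ** (-H * n) * math.sin(2.0 * math.pi * b ** n * t)
               for n in range(n_terms))


def weierstrass_mandelbrot_w1(t, b=2.0, H=0.5, n_max=40):
    """Symmetric partial sum of W1(t), with n running from -n_max to n_max."""
    return sum(b ** (-H * n) * (cmath.exp(2j * math.pi * b ** n * t) - 1.0)
               for n in range(-n_max, n_max + 1))


b, H, t = 2.0, 0.5, 0.3
lhs = weierstrass_mandelbrot_w1(b * t, b, H)          # W1(b t)
rhs = b ** H * weierstrass_mandelbrot_w1(t, b, H)     # b^H W1(t)
scaling_error = abs(lhs - rhs)  # small; it comes only from the two end terms
print(weierstrass_sine(t), scaling_error)
```

The subtraction of 1 in each term of $W_1$ is what lets the low-frequency terms (negative $n$) converge, since $\exp(2\pi i b^n t) - 1$ is of order $b^n t$ there.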
C. The Hölder Exponent

A function $f: [a, b] \to \mathbb{R}$ has Hölder exponent $H$ if there is a constant $c > 0$ for which
\[
|f(x) - f(y)| \le c \cdot |x - y|^H
\]
for all $x$ and $y$ in $[a, b]$ (recall Section III.G). If $f$ is continuous and has Hölder exponent $H$ satisfying $0 < H \le 1$, then the graph of $f$ has box dimension $d_{\mathrm{box}} \le 2 - H$.
The Weierstrass function $W(t)$ has Hölder exponent $H$, hence its graph has $d_{\mathrm{box}} \le 2 - H$. For large enough $b$, $d_{\mathrm{box}} = 2 - H$, so one can think of the Hölder exponent as a measure of roughness of the graph.

VIII. FRACTAL ATTRACTORS AND REPELLERS OF DYNAMICAL SYSTEMS

The modern renaissance in dynamical systems is associated most often with chaos theory. Consequently, the relations between fractal geometry and chaotic dynamics, mediated by symbolic dynamics, are relevant to our discussion. In addition, we consider fractal basin boundaries, which generalize Julia sets to much wider contexts including mechanical systems.

A. The Smale Horseshoe

If they exist, intersections of the stable and unstable manifolds of a fixed point are called homoclinic points. Poincaré (1890) recognized that homoclinic points cause great complications in dynamics. Yet much can be understood by labeling an appropriate coarse-graining of a neighborhood of a homoclinic point and translating the corresponding dynamics into a string of symbols (the coarse-grain bin labels). The notion of symbolic dynamics first appears in Hadamard (1898), and Birkhoff (1927) proved every neighborhood of a homoclinic point contains infinitely many periodic points.
Motivated by work of Cartwright and Littlewood (1945) and Levinson (1949) on the forced van der Pol oscillator, Smale (1963) constructed the horseshoe map. This is a map from the unit square into the plane with completely invariant set a Cantor set $\Lambda$, roughly the Cartesian product of two Cantor middle-thirds sets. Restricted to $\Lambda$, with the obvious symbolic dynamics encoding, the horseshoe map is conjugate to the shift map on two symbols, the archetype of a chaotic map.
This construction is universal in the sense that it occurs in every transverse homoclinic point to a hyperbolic saddle point. The Conley–Moser theorem (see Wiggins, 1990) establishes the existence of chaos by conjugating the dynamics to a shift map on a Cantor set under general conditions. In this sense, chaos is often equivalent to simple dynamics on an underlying fractal.

B. Fractal Basin Boundaries

For any point $c$ belonging to a hyperbolic component of the Mandelbrot set, the Julia set is the boundary of the basins of attraction of the attracting cycle and the attracting fixed point at infinity. See the right side of Fig. 3.
Another example favored by Julia is found in Newton’s method for finding the roots of a polynomial $f(z)$ of degree at least 3. It leads to the dynamical system $z_{n+1} = N_f(z_n) = z_n - f(z_n)/f'(z_n)$. The roots of $f(z)$ are attracting fixed points of $N_f(z)$, and the boundary of the basins of attraction of these fixed points is a fractal; an example is shown on the left side of Fig. 24. If contaminated by even small uncertainties, the fate of initial points near the basin boundary cannot be predicted. Sensitive dependence on initial conditions is a signature of chaos, but here we deal with something different. The eventual behavior is completely predictable, except for initial points taken exactly on the basin boundary, usually of two-dimensional Lebesgue measure 0.
The same complication enters mechanical engineering problems for systems with multiple attractors. Moon (1984) exhibited an early example. Extensive theoretical and computer studies by Yorke and coworkers are described in Alligood and Yorke (1992). The driven harmonic oscillator with two-well potential
\[
\frac{d^2 x}{dt^2} + f \frac{dx}{dt} - \frac{1}{2} x (1 - x^2) = A \cos(\omega t)
\]
is a simple example. The undriven system has two equilibria, $x = -1$ and $x = +1$. Initial values $(x, x')$ are painted white if the trajectory from that point eventually stays in the left basin, black if it eventually stays in the right basin. The right side of Fig. 24 shows the initial condition portrait for the system with $f = 0.15$, $\omega = 0.8$, and $A = 0.094$.

IX. FRACTALS AND DIFFERENTIAL OR PARTIAL DIFFERENTIAL EQUATIONS

The daunting task to which a large portion of Mandelbrot (1982) is devoted was to establish that many works of nature and man [as shown in Mandelbrot (1997), the latter includes the stock market!] are fractal. New and often important examples keep being discovered, but the hardest present challenge is to discover the causes of fractality. Some cases remain obscure, but others are reasonably clear.
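The Newton iteration $z_{n+1} = z_n - f(z_n)/f'(z_n)$ of Section VIII.B can be explored directly for $f(z) = z^3 - 1$, the polynomial of Fig. 24. The following sketch labels each starting point by the root it converges to and renders a coarse character portrait of the basins; the function names, the tolerance, the iteration cap, and the grid size are all assumptions made for illustration.

```python
import cmath

# The three cube roots of unity, the roots of f(z) = z**3 - 1.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def newton_basin(z, max_iter=50, tol=1e-9):
    """Index of the root reached from z, or None if no root is reached."""
    for _ in range(max_iter):
        if z == 0:
            return None  # f'(z) = 3 z**2 vanishes, the step is undefined
        z = z - (z ** 3 - 1) / (3 * z ** 2)
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k
    return None


def basin_portrait(n=31, half_width=1.5):
    """Rows of characters, one per grid point, labelling the basins."""
    rows = []
    for j in range(n):
        y = half_width * (1 - 2 * j / (n - 1))
        row = ""
        for i in range(n):
            x = half_width * (2 * i / (n - 1) - 1)
            k = newton_basin(complex(x, y))
            row += "." if k is None else "012"[k]
        rows.append(row)
    return rows


for line in basin_portrait():
    print(line)
```

Refining the grid near the places where the three labels meet shows the interleaving of the basins at ever smaller scales, which is the practical meaning of the unpredictability near the boundary described above.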
FIGURE 24 Left: The basins of attraction of Newton’s method for finding the roots of $z^3 - 1$. Right: The basins of attraction for a damped, driven two-well harmonic oscillator.

Thus, the fractality of the physical percolation clusters (Section I.B.3.e) is the geometric counterpart of scaling and renormalization: the analytic properties of those objects follow a wealth of power-law relations. Many mathematical issues, some of them already mentioned, remain open, but the overall renormalization framework is firmly rooted. Renormalization and the resulting fractality also occur in the structure of attractors and repellers of dynamical systems. Best understood is renormalization for quadratic maps. Feigenbaum and others considered the real case. For the complex case, renormalization establishes that the Mandelbrot set contains infinitely many small copies of itself.
Unfortunately, additional examples of fractality proved to be beyond the scope of the usual renormalization. A notorious case concerns DLA (Section I.B.3.f).

connection with lasers. The sensitivity to initial conditions common to chaotic dynamics is mediated by the intricate fractal interleaving of the multiple layers of the attractor. In addition, Birman and Williams (1983) showed an abundance of knotted periodic orbits embedded in the Lorenz attractor, though Williams (1983) showed all such knots are prime. Ghrist (1997) constructed a universal template, a branched 2-manifold in which all knots are embedded. Note the interesting parallel with the universal aspects of the Sierpinski carpet (Section I.B.1.a). It is not yet known if the attractor of any differential equation contains a universal template. The Poincaré–Bendixson theorem prohibits fractal attractors for differential equations in the plane, but many other classical ordinary differential equations in at least three dimensions exhibit similar fractal attractors in certain parameter ranges.
structure that is not a deliberate and largely arbitrary input. Details are given in Mandelbrot (1982).

The first construction is the seeded universe based on a Lévy flight. Its Hausdorff-dimensional properties were well known. Its correlation properties (Mandelbrot 1975) proved to be nearly identical to those of actual galaxy maps. The second construction is the parted universe obtained by subtracting from space a random collection of overlapping tremas. Either construction yields sets that are highly irregular and involve no special center, yet, with no deliberate design, exhibit a clear-cut clustering, “filaments” and “walls.” These structures were little known when these constructions were designed.

Conjecture: Could it be that the observed “clusters,” “filaments,” and “walls” need not be explained separately? They may not result from unidentified features of specific models, but represent unavoidable consequences of a variety of unconstrained forms of random fractality, as interpreted by a human brain.

A problem arose when careful examination of the simulations revealed a clearly incorrect prediction. The simulations in the seeded universe proved to be visually far more “lacunar” than the real world. That is, the simulations’ holes are larger than in reality. The parted universe model fared better, since its lacunarity can be adjusted at will and fit to the actual distribution. A lowered lacunarity is expressed by a positive correlation between masses in antipodal directions. Testing this specific conjecture is a challenge for those who analyze the data.

Does dynamics make us expect the distribution of galaxies to be fractal? Position a large array of point masses in a cubic box in which opposite sides are identified to form a three-dimensional torus. The evolution of this array obeys the Laplace equation, with the novelty that the singularities of the solution are the positions of the points, therefore movable. All simulations we know (starting with those performed at IBM around 1960) suggest that, even when the pattern of the singularities begins by being uniform or Poisson, it gradually creates clusters and a semblance of hierarchy, and appears to tend toward fractality. It is against the preceding background that the limit distribution of galaxies is conjectured to be fractal, and fractality is viewed as compatible with Newton’s equations.

2. The Navier–Stokes Equation

The first concrete use of a Cantor dust in real spaces is found in Berger and Mandelbrot (1963), a paper on noise records. This was nearly simultaneous with Kolmogorov’s work on the intermittence of turbulence. After numerous experimental tests designed to create an intuitive feeling for this phenomenon (e.g., listening to turbulent velocity records that were made audible), the fractal viewpoint was extended to turbulence, and circa 1964 led to the following conjecture.

Conjecture. The property of being “turbulently dissipative” should not be viewed as attached to domains in a fluid with significant interior points, but as attached to fractal sets. In a first approximation, those sets’ intersection with a straight line is a Cantor-like fractal dust having a dimension in the range from 0.5 to 0.6. The corresponding full sets in space should therefore be expected to be fractals with Hausdorff dimension in the range from 2.5 to 2.6.

Actually, Cantor dust and Hausdorff dimension are not the proper notions in the context of viscous fluids because viscosity necessarily erases the fine detail essential to fractals. Hence the following conjecture (Mandelbrot, 1982, Chapter 11; 1976). The dissipation in a viscous fluid occurs in the neighborhood of a singularity of a nonviscous approximation following Euler’s equations, and the motion of a nonviscous fluid acquires singularities that are sets of dimension about 2.5–2.6. Several numerical tests agree with this conjecture (e.g., Chorin, 1981).

A related conjecture, that the Navier–Stokes equations have fractal singularities of much smaller dimension, has led to extensive work by V. Scheffer, R. Temam, C. Foias, and many others. But this topic is not exhausted.

Finally, we mention that fractals in phase space entered the transition from laminar to turbulent flow through the work of Ruelle and Takens (1971) and their followers. The task of unifying the real- and phase-space roles of fractals is challenging and far from being completed.

X. FRACTALS IN THE ARTS AND IN TEACHING

The Greeks asserted art reflects nature, so it is little surprise that the many fractal aspects of nature should find their way into the arts, beyond the fact that a representational painting of a tree exhibits the same fractal branching as a physical tree. Voss and Clarke (1975) found fractal power-law scaling in music, and self-similarity is designed in the music of the composers György Ligeti and Charles Wuorinen. Pollard-Gott (1986) established the presence of fractal repetition patterns in the poetry of Wallace Stevens. Computer artists use fractals to create both abstract aesthetic images and realistic landscapes. Larry Poons’ paintings since the 1980s have had rich fractal textures. The “decalcomania” of the 1830s and the 1930s and 1940s used viscous fingering to provide a level of visual complexity. Before that, Giacometti’s Alpine wildflower paintings are unquestionably fractal. Earlier still, relatives of the Sierpinski gasket occur as decorative motifs in Islamic and Renaissance art. Fractals abound in architecture, for example, in the cascades of spires in Indian temples, Bramante’s
plan for St. Peter’s, Malevich’s Architektonics, and some of Frank Lloyd Wright’s designs. Fractals occur in the writing of Clarke, Crichton, Hoag, Powers, Updike, and Wilhelm, among others, and in at least one play, Stoppard’s Arcadia. Postmodern literary theory has used some concepts informed by fractal geometry, though this application has been criticized for its overly free interpretations of precise scientific language. Some have seen evidence of power-law scaling in historical records, the distribution of the magnitudes of wars and of natural disasters, for example. In popular culture, fractals have appeared on t-shirts, tote bags, book covers, and MTV logos, been mentioned on public radio’s A Prairie Home Companion, and been seen on television programs from Nova and Murphy Brown, through several incarnations of Star Trek, to The X-Files and The Simpsons. While Barnsley’s (1988) slogan, “fractals everywhere,” is too strong, the degree to which fractals surround us outside of science and engineering is striking.

A corollary of this last point is a good conclusion to this high-speed survey. In our increasingly technological world, science education is very important. Yet all too often humanities students are presented with limited choices: the first course in a standard introductory sequence, or a survey course diluted to the level of journalism. The former builds toward major points not revealed until later courses; the latter discusses results from science without showing how science is done. In addition, many efforts to incorporate computer-aided instruction attempt to replace parts of standard lectures rather than engage students in exploration and discovery.

Basic fractal geometry courses for non-science students provide a radical departure from this mode. The subject of fractal geometry operates at human scale. Though new to most, the notion of self-similarity is easy to grasp, and (once understood) handles familiar objects from a genuinely novel perspective. Students can explore fractals with the aid of readily available software. These instances of computer-aided instruction are perfectly natural because computers are so central to the entire field of fractal geometry. The contemporary nature of the field is revealed by a supply of mathematical problems that are simple to state but remain unsolved. Altogether, many fields of interest to non-science students have surprising examples of fractal structures. Fractal geometry is a powerful tool for imparting to non-science students some of the excitement for science often invisible to them. Several views of this are presented in Frame and Mandelbrot (2001).

The importance of fractals in the practice of science and engineering is undeniable. But fractals are also a proven force in science education. Certainly, the boundaries of fractal geometry have not yet been reached.

SEE ALSO THE FOLLOWING ARTICLES

CHAOS • PERCOLATION • TECTONOPHYSICS

BIBLIOGRAPHY

Alligood, K., and Yorke, J. (1992). Ergodic Theory Dynam. Syst. 12, 377–400.
Alligood, K., Sauer, T., and Yorke, J. (1997). “Chaos. An Introduction to Dynamical Systems,” Springer-Verlag, New York.
Barnsley, M. (1988). “Fractals Everywhere,” 2nd ed., Academic Press, Orlando, FL.
Barnsley, M., and Demko, S. (1986). “Chaotic Dynamics and Fractals,” Academic Press, Orlando, FL.
Barnsley, M., and Hurd, L. (1993). “Fractal Image Compression,” Peters, Wellesley, MA.
Batty, M., and Longley, P. (1994). “Fractal Cities,” Academic Press, London.
Beardon, A. (1983). “The Geometry of Discrete Groups,” Springer-Verlag, New York.
Beck, C., and Schlögl, F. (1993). “Thermodynamics of Chaotic Systems: An Introduction,” Cambridge University Press, Cambridge.
Berger, J., and Mandelbrot, B. (1963). IBM J. Res. Dev. 7, 224–236.
Berry, M. (1979). “Structural Stability in Physics,” Springer-Verlag, New York.
Bertoin, J. (1996). “Lévy Processes,” Cambridge University Press, Cambridge.
Besicovitch, A., and Moran, P. (1945). J. Lond. Math. Soc. 20, 110–120.
Birkhoff, G. (1927). “Dynamical Systems,” American Mathematical Society, Providence, RI.
Birman, J., and Williams, R. (1983). Topology 22, 47–82.
Bishop, C., and Jones, P. (1997). Acta Math. 179, 1–39.
Blanchard, P. (1994). In “Complex Dynamical Systems. The Mathematics behind the Mandelbrot and Julia Sets” (Devaney, R., ed.), pp. 139–154, American Mathematical Society, Providence, RI.
Bochner, S. (1955). “Harmonic Analysis and the Theory of Probability,” University of California Press, Berkeley, CA.
Bowen, R. (1975). “Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms,” Springer-Verlag, New York.
Bunde, A., and Havlin, S. (1991). “Fractals and Disordered Systems,” Springer-Verlag, New York.
Cartwright, M., and Littlewood, J. (1945). J. Lond. Math. Soc. 20, 180–189.
Cherbit, G. (1987). “Fractals. Non-integral Dimensions and Applications,” Wiley, Chichester, UK.
Chorin, J. (1981). Commun. Pure Appl. Math. 34, 853–866.
Crilly, A., Earnshaw, R., and Jones, H. (1991). “Fractals and Chaos,” Springer-Verlag, New York.
Crilly, A., Earnshaw, R., and Jones, H. (1993). “Applications of Fractals and Chaos,” Springer-Verlag, New York.
Curry, J., Garnett, L., and Sullivan, D. (1983). Commun. Math. Phys. 91, 267–277.
Dekking, F. M. (1982). Adv. Math. 44, 78–104.
Devaney, R. (1989). “An Introduction to Chaotic Dynamical Systems,” 2nd ed., Addison-Wesley, Reading, MA.
Devaney, R. (1990). “Chaos, Fractals, and Dynamics. Computer Experiments in Mathematics,” Addison-Wesley, Reading, MA.
Devaney, R. (1992). “A First Course in Chaotic Dynamical Systems. Theory and Experiment,” Addison-Wesley, Reading, MA.
Devaney, R. (ed.). (1994). “Complex Dynamical Systems. The Mathematics Behind the Mandelbrot and Julia Sets,” American Mathematical Society, Providence, RI.
Devaney, R., and Keen, L. (1989). “Chaos and Fractals. The Mathematics Behind the Computer Graphics,” American Mathematical Society, Providence, RI.
Douady, A., and Hubbard, J. (1984). “Étude dynamique des polynômes complexes. I, II,” Publications Mathematiques d’Orsay, Orsay, France.
Douady, A., and Hubbard, J. (1985). Ann. Sci. Ecole Norm. Sup. 18, 287–343.
Edgar, G. (1990). “Measure, Topology, and Fractal Geometry,” Springer-Verlag, New York.
Edgar, G. (1993). “Classics on Fractals,” Addison-Wesley, Reading, MA.
Edgar, G. (1998). “Integral, Probability, and Fractal Measures,” Springer-Verlag, New York.
Eglash, R. (1999). “African Fractals. Modern Computing and Indigenous Design,” Rutgers University Press, New Brunswick, NJ.
Encarnação, J., Peitgen, H.-O., Sakas, G., and Englert, G. (1992). “Fractal Geometry and Computer Graphics,” Springer-Verlag, New York.
Epstein, D. (1986). “Low-Dimensional Topology and Kleinian Groups,” Cambridge University Press, Cambridge.
Evertsz, C., Peitgen, H.-O., and Voss, R. (eds.). (1996). “Fractal Geometry and Analysis. The Mandelbrot Festschrift, Curaçao 1995,” World Scientific, Singapore.
Falconer, K. (1985). “The Geometry of Fractal Sets,” Cambridge University Press, Cambridge.
Falconer, K. (1987). Math. Intelligencer 9, 24–27.
Falconer, K. (1990). “Fractal Geometry. Mathematical Foundations and Applications,” Wiley, Chichester, UK.
Falconer, K. (1997). “Techniques in Fractal Geometry,” Wiley, Chichester, UK.
Family, F., and Vicsek, T. (1991). “Dynamics of Fractal Surfaces,” World Scientific, Singapore.
Feder, J. (1988). “Fractals,” Plenum Press, New York.
Feder, J., and Aharony, A. (1990). “Fractals in Physics. Essays in Honor of B. B. Mandelbrot,” North-Holland, Amsterdam.
Fisher, Y. (1995). “Fractal Image Compression. Theory and Application,” Springer-Verlag, New York.
Flake, G. (1998). “The Computational Beauty of Nature. Computer Explorations of Fractals, Chaos, Complex Systems, and Adaptation,” MIT Press, Cambridge, MA.
Fleischmann, M., Tildesley, D., and Ball, R. (1990). “Fractals in the Natural Sciences,” Princeton University Press, Princeton, NJ.
Frame, M., and Mandelbrot, B. (2001). “Fractals, Graphics, and Mathematics Education,” Mathematical Association of America, Washington, DC.
Frostman, O. (1935). Meddel. Lunds. Univ. Math. Sem. 3, 1–118.
Gazalé, M. (1990). “Gnomon. From Pharaohs to Fractals,” Princeton University Press, Princeton, NJ.
Grist, R. (1997). Topology 36, 423–448.
Gulick, D. (1992). “Encounters with Chaos,” McGraw-Hill, New York.
Hadamard, J. (1898). J. Mathematiques 5, 27–73.
Hardy, G. (1916). Trans. Am. Math. Soc. 17, 322–323.
Hastings, H., and Sugihara, G. (1993). “Fractals. A User’s Guide for the Natural Sciences,” Oxford University Press, Oxford.
Hovi, J.-P., Aharony, A., Stauffer, D., and Mandelbrot, B. B. (1996). Phys. Rev. Lett. 77, 877–880.
Hutchinson, J. E. (1981). Ind. Univ. J. Math. 30, 713–747.
Keen, L. (1994). In “Complex Dynamical Systems. The Mathematics behind the Mandelbrot and Julia Sets” (Devaney, R., ed.), pp. 139–154, American Mathematical Society, Providence, RI.
Keen, L., and Series, C. (1993). Topology 32, 719–749.
Keen, L., Maskit, B., and Series, C. (1993). J. Reine Angew. Math. 436, 209–219.
Kigami, J. (1989). Jpn. J. Appl. Math. 8, 259–290.
Lapidus, M. (1995). Fractals 3, 725–736.
Lapidus, M., Neuberger, J., Renka, R., and Griffith, C. (1996). Int. J. Bifurcation Chaos 6, 1185–1210.
Lasota, A., and Mackey, M. (1994). “Chaos, Fractals, and Noise. Stochastic Aspects of Dynamics,” 2nd ed., Springer-Verlag, New York.
Lawler, G., Schramm, O., and Werner, W. (2000). Acta Math., to appear [xxx.lanl.gov./abs/math.PR/0010165].
Lei, T. (2000). “The Mandelbrot Set, Theme and Variations,” Cambridge University Press, Cambridge.
Le Méhauté, A. (1990). “Fractal Geometries. Theory and Applications,” CRC Press, Boca Raton, FL.
Levinson, N. (1949). Ann. Math. 50, 127–153.
Lorenz, E. (1963). J. Atmos. Sci. 20, 130–141.
Lu, N. (1997). “Fractal Imaging,” Academic Press, San Diego.
Lyubich, M. (2001). Ann. Math., to appear.
Mandelbrot, B. (1975). C. R. Acad. Sci. Paris 280A, 1075–1078.
Mandelbrot, B. (1975, 1984, 1989, 1995). “Les objets fractals,” Flammarion, Paris.
Mandelbrot, B. (1976). C. R. Acad. Sci. Paris 282A, 119–120.
Mandelbrot, B. (1980). Ann. N. Y. Acad. Sci. 357, 249–259.
Mandelbrot, B. (1982). “The Fractal Geometry of Nature,” Freeman, New York.
Mandelbrot, B. (1984). J. Stat. Phys. 34, 895–930.
Mandelbrot, B. (1985). In “Chaos, Fractals, and Dynamics” (Fischer, P., and Smith, W., eds.), pp. 235–238, Marcel Dekker, New York.
Mandelbrot, B. (1995). J. Fourier Anal. Appl. 1995, 409–432.
Mandelbrot, B. (1997a). “Fractals and Scaling in Finance. Discontinuity, Concentration, Risk,” Springer-Verlag, New York.
Mandelbrot, B. (1997b). “Fractales, Hasard et Finance,” Flammarion, Paris.
Mandelbrot, B. (1999). “Multifractals and 1/f Noise. Wild Self-Affinity in Physics,” Springer-Verlag, New York.
Mandelbrot, B. (2001a). Quant. Finance 1, 113–123, 124–130.
Mandelbrot, B. (2001b). “Gaussian Self-Affinity and Fractals: Globality, the Earth, 1/f Noise, & R/S,” Springer-Verlag, New York.
Mandelbrot, B. (2001c). “Fractals and Chaos and Statistical Physics,” Springer-Verlag, New York.
Mandelbrot, B. (2001d). “Fractals Tools,” Springer-Verlag, New York.
Mandelbrot, B. B., and Stauffer, D. (1994). J. Phys. A 27, L237–L242.
Mandelbrot, B. B., Vespignani, A., and Kaufman, H. (1995). Europhys. Lett. 32, 199–204.
Maskit, B. (1988). “Kleinian Groups,” Springer-Verlag, New York.
Massopust, P. (1994). “Fractal Functions, Fractal Surfaces, and Wavelets,” Academic Press, San Diego, CA.
Mattila, P. (1995). “Geometry of Sets and Measures in Euclidean Space. Fractals and Rectifiability,” Cambridge University Press, Cambridge.
McCauley, J. (1993). “Chaos, Dynamics and Fractals. An Algorithmic Approach to Deterministic Chaos,” Cambridge University Press, Cambridge.
McLaughlin, J. (1987). Proc. Am. Math. Soc. 100, 183–186.
McMullen, C. (1994). “Complex Dynamics and Renormalization,” Princeton University Press, Princeton, NJ.
McShane, G., Parker, J., and Redfern, I. (1994). Exp. Math. 3, 153–170.
Meakin, P. (1996). “Fractals, Scaling and Growth Far from Equilibrium,” Cambridge University Press, Cambridge.
Milnor, J. (1989). In “Computers in Geometry and Topology” (Tangora, M., ed.), pp. 211–257, Marcel Dekker, New York.
Moon, F. (1984). Phys. Rev. Lett. 53, 962–964.
Moon, F. (1992). “Chaotic and Fractal Dynamics. An Introduction for Applied Scientists and Engineers,” Wiley-Interscience, New York.
Parker, J. (1995). Topology 34, 489–496.
Peak, D., and Frame, M. (1994). “Chaos Under Control. The Art and Science of Complexity,” Freeman, New York.
Peitgen, H.-O. (1989). “Newton’s Method and Dynamical Systems,” Kluwer, Dordrecht.
Peitgen, H.-O., and Richter, P. H. (1986). “The Beauty of Fractals,” Springer-Verlag, New York.
Peitgen, H.-O., and Saupe, D. (1988). “The Science of Fractal Images,” Plenum Press, New York.
Peitgen, H.-O., Jürgens, H., and Saupe, D. (1992). “Chaos and Fractals: New Frontiers of Science,” Springer-Verlag, New York.
Peitgen, H.-O., Rodenhausen, A., and Skordev, G. (1998). Fractals 6, 371–394.
Pietronero, L. (1989). “Fractals Physical Origins and Properties,” North-Holland, Amsterdam.
Pietronero, L., and Tosatti, E. (1986). “Fractals in Physics,” North-Holland, Amsterdam.
Poincaré, H. (1890). Acta Math. 13, 1–271.
Pollard-Gott, L. (1986). Language Style 18, 233–249.
Rogers, C. (1970). “Hausdorff Measures,” Cambridge University Press, Cambridge.
Ruelle, D. (1978). “Thermodynamic Formalism: The Mathematical Structures of Classical Equilibrium Statistical Mechanics,” Addison-Wesley, Reading, MA.
Ruelle, D., and Takens, F. (1971). Commun. Math. Phys. 20, 167–192.
Samorodnitsky, G., and Taqqu, M. (1994). “Stable Non-Gaussian Random Processes. Stochastic Models with Infinite Variance,” Chapman and Hall, New York.
Sapoval, B. (1989). Physica D 38, 296–298.
Sapoval, B., Rosso, M., and Gouyet, J. (1985). J. Phys. Lett. 46, L149–L156.
Sapoval, B., Gobron, T., and Margolina, A. (1991). Phys. Rev. Lett. 67, 2974–2977.
Scholz, C., and Mandelbrot, B. (1989). “Fractals in Geophysics,” Birkhäuser, Basel.
Shishikura, M. M. (1994). Astérisque 222, 389–406.
Shlesinger, M., Zaslavsky, G., and Frisch, U. (1995). “Lévy Flights and Related Topics in Physics,” Springer-Verlag, New York.
Sinai, Y. (1972). Russ. Math. Surv. 27, 21–70.
Smale, S. (1963). In “Differential and Combinatorial Topology” (Cairns, S., ed.), pp. 63–80, Princeton University Press, Princeton, NJ.
Stauffer, D., and Aharony, A. (1992). “Introduction to Percolation Theory,” 2nd ed., Taylor and Francis, London.
Strogatz, S. (1994). “Nonlinear Dynamics and Chaos, with Applications to Physics, Biology, Chemistry, and Engineering,” Addison-Wesley, Reading, MA.
Sullivan, D. (1985). Ann. Math. 122, 410–418.
Tan, Lei (1984). In “Étude dynamique des polynômes complexes” (Douady, A., and Hubbard, J., eds.), Vol. II, pp. 139–152, Publications Mathematiques d’Orsay, Orsay, France.
Tan, Lei. (2000). “The Mandelbrot Set, Theme and Variations,” Cambridge University Press, Cambridge.
Thurston, W. (1997). “Three-Dimensional Geometry and Topology,” Princeton University Press, Princeton, NJ.
Tricot, C. (1982). Math. Proc. Camb. Philos. Soc. 91, 54–74.
Vicsek, T. (1992). “Fractal Growth Phenomena,” 2nd ed., World Scientific, Singapore.
Voss, R., and Clarke, J. (1975). Nature 258, 317–318.
West, B. (1990). “Fractal Physiology and Chaos in Medicine,” World Scientific, Singapore.
Weyl, H. (1912). Math. Ann. 71, 441–479.
Williams, R. (1983). Ergodic Theory Dynam. Syst. 4, 147–163.
Wiggins, S. (1990). “Introduction to Applied Nonlinear Dynamical Systems and Chaos,” Springer-Verlag, New York.
Witten, T., and Sander, L. (1981). Phys. Rev. Lett. 47, 1400–1403.
Witten, T., and Sander, L. (1983). Phys. Rev. B 27, 5686–5697.
P1: GRB/GWT P2: GQT Final Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN06M-269 June 27, 2001 12:29
Functional Analysis
C. W. Groetsch
University of Cincinnati
I. Linear Spaces
II. Linear Operators
III. Contractions
IV. Some Principles and Techniques
V. A Few Applications
A linear space that comes bundled with a norm is called a normed linear space. For specificity, we will sometimes indicate a linear space $V$ equipped with a norm $\|\cdot\|$ by the pair $(V, \|\cdot\|)$. For example, $(C[a, b], \|\cdot\|_\infty)$ is a normed linear space under the so-called uniform norm

$$\|f\|_\infty = \max\{|f(t)| : t \in [a, b]\}.$$

The space $C[a, b]$ is also a normed linear space when endowed with the norm

$$\|f\|_1 = \int_a^b |f(t)|\,dt.$$

for all $x \in V$. Equivalence therefore means equivalent in the sense of convergence: a sequence converges with respect to the norm $\|\cdot\|_a$ if and only if it converges with respect to the norm $\|\cdot\|_b$. It can be shown that any two norms on a finite-dimensional linear space are equivalent and hence when speaking of convergence in a finite-dimensional space one can dispense with mentioning the norm. However, as we have seen above, convergence is norm-dependent in infinite dimensional spaces.
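The norm-dependence of convergence in $C[0, 1]$ can be seen numerically. The sketch below is an illustration only: the sequence $f_n(t) = t^n$ is chosen here and is not taken from the text, and the helper names `sup_norm`, `one_norm`, and `power` are hypothetical. It approximates the uniform norm by sampling and the integral norm $\|f\|_1$ by a midpoint rule.

```python
def sup_norm(f, a, b, samples=10001):
    """Approximate max |f(t)| on [a, b] by sampling on a uniform grid
    that includes both endpoints."""
    return max(abs(f(a + (b - a) * i / (samples - 1))) for i in range(samples))


def one_norm(f, a, b, cells=10000):
    """Approximate the integral of |f| over [a, b] by the midpoint rule."""
    h = (b - a) / cells
    return h * sum(abs(f(a + (i + 0.5) * h)) for i in range(cells))


def power(n):
    """Return the function t -> t**n (illustrative choice of sequence)."""
    return lambda t: t ** n


if __name__ == "__main__":
    for n in (1, 10, 100):
        f = power(n)
        print(n, sup_norm(f, 0.0, 1.0), one_norm(f, 0.0, 1.0))
```

The integral norm of $t^n$ is $1/(n+1)$, which tends to zero, while the uniform norm stays equal to 1 because $f_n(1) = 1$ for every $n$. The sequence therefore converges to the zero function in one norm but not in the other.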
Definition. The closure, $\overline{W}$, of a subset $W$ of a normed linear space $(V, \|\cdot\|)$ is the set consisting of elements of $W$ along with all limits of convergent sequences in $W$. A set which is its own closure is called closed.

For example, the closure of the set of all polynomials in the space $(C[a, b], \|\cdot\|_\infty)$ is, by virtue of the Weierstrass approximation theorem, the space $C[a, b]$ itself. Also, for example, the set of nonnegative functions in $C[a, b]$ is a closed subset of $C[a, b]$.

Certain sequences of vectors in a normed linear space tend to “bunch up” in a way that imitates convergent sequences. Such sequences are named after the nineteenth century mathematician Augustin Cauchy.

Definition. A sequence of vectors $\{x_n\}$ in a normed linear space $(V, \|\cdot\|)$ is called a Cauchy sequence if $\lim_{n,m\to\infty} \|x_n - x_m\| = 0$.

It is easy to see that every convergent sequence is Cauchy; however, it is not necessarily the case that a Cauchy sequence is convergent. Consider, for example, the “ramp” function $h_n$ in $C[-1, 1]$ whose graph consists of the straight line segments connecting the points $(-1, 0)$, $(-\tfrac{1}{n}, 0)$, $(0, 1)$, $(1, 1)$. The sequence of ramp functions is Cauchy with respect to the norm $\|\cdot\|_1$ since

$$\|h_n - h_m\|_1 = \frac{1}{2}\left|\frac{1}{n} - \frac{1}{m}\right| \to 0, \quad \text{as } n, m \to \infty.$$

However, $\{h_n\}$ does not converge, with respect to the norm $\|\cdot\|_1$, to a function in $C[-1, 1]$ (the $\|\cdot\|_1$ limit of this sequence of ramp functions is the discontinuous Heaviside function). Spaces, such as $(C[-1, 1], \|\cdot\|_1)$, which do not accommodate limits of Cauchy sequences are in some sense “incomplete.” A normed linear space is called complete if every Cauchy sequence in the space converges to some vector in the space.

Definition. A Banach space is a complete normed linear space.

We have just seen that $C[a, b]$ with the norm $\|\cdot\|_1$ is not a Banach space. However, $C[a, b]$ with the uniform norm is a Banach space. The Lebesgue spaces are particularly important Banach spaces. For $1 \le p < \infty$ the space $L^p[a, b]$ consists of measurable real-valued functions $f$ defined on $[a, b]$ such that $|f|^p$ has a finite Lebesgue integral. The norm $\|\cdot\|_p$ on $L^p[a, b]$ is defined by

$$\|f\|_p = \left(\int_a^b |f(t)|^p\,dt\right)^{1/p}.$$

Every normed linear space can be imbedded as a dense subspace of a Banach space by an abstract process of “completion.” Essentially, the completion process involves adjoining to the original space, for each Cauchy sequence in the original space, a vector in an extended space (the completion) which is the limit in the extended space of the Cauchy sequence. For example, it can be shown that the completion of the space $(C[a, b], \|\cdot\|_1)$ is the space $(L^1[a, b], \|\cdot\|_1)$.

Perpendicularity, the key ingredient in the Pythagorean theorem, is one of the fundamental concepts in geometry. A successful geometrization of analysis requires the incorporation of this concept. The essential aspects of perpendicularity are captured in special normed linear spaces called inner product spaces.

Definition. An inner product on a real linear space $V$ is a function $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ which satisfies:

(i) $\langle x, x\rangle \ge 0$ for all $x \in V$,
(ii) $\langle x, x\rangle = 0$ if and only if $x = \theta$ (the zero vector),
(iii) $\langle x, y\rangle = \langle y, x\rangle$ for all $x, y \in V$,
(iv) $\langle tx, y\rangle = t\langle x, y\rangle$ for all $x, y \in V$ and all $t \in \mathbb{R}$,
(v) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ for all $x, y, z \in V$.

Properties (iii)–(v) are summarized by saying that $\langle\cdot,\cdot\rangle$ is a symmetric bilinear form; properties (i) and (ii) say that $\langle\cdot,\cdot\rangle$ is nonnegative and definite. A linear space endowed with an inner product is called an inner product space.

The most familiar inner product space is the Euclidean space $\mathbb{R}^n$ with the inner product

$$\langle x, y\rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

Of course other inner products may be used on the same underlying linear space. For example, in statistics, the Euclidean space is often used with a weighted inner product

$$\langle x, y\rangle = w_1 x_1 y_1 + w_2 x_2 y_2 + \cdots + w_n x_n y_n$$

where $w_1, w_2, \ldots, w_n$ are fixed positive weights.

Function spaces serve up a particularly rich stew of inner product spaces. For example, the theory of Fourier series can be developed in the Lebesgue space $L^2[a, b]$ with the inner product

$$\langle f, g\rangle = \int_a^b f(t) g(t)\,dt$$
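The weighted inner product on $\mathbb{R}^n$ and the $L^2[a, b]$ inner product above can both be evaluated in a few lines. This is a minimal sketch under stated assumptions: the function names `weighted_inner` and `l2_inner` are hypothetical, and the integral is approximated with a midpoint rule rather than computed as a Lebesgue integral. The choice of $\sin t$ and $\sin 2t$ on $[-\pi, \pi]$ is an illustration of the orthogonality that underlies Fourier series.

```python
import math


def weighted_inner(x, y, w):
    """Weighted Euclidean inner product: sum of w_i * x_i * y_i.
    The weights w are assumed to be positive."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))


def l2_inner(f, g, a, b, cells=20000):
    """Midpoint-rule approximation of the integral of f(t) * g(t) over [a, b]."""
    h = (b - a) / cells
    total = 0.0
    for i in range(cells):
        t = a + (i + 0.5) * h
        total += f(t) * g(t)
    return h * total


if __name__ == "__main__":
    print(weighted_inner([1.0, 2.0], [3.0, 4.0], [1.0, 0.5]))
    # sin(t) and sin(2t) are orthogonal on [-pi, pi]; <sin, sin> equals pi.
    print(l2_inner(math.sin, lambda t: math.sin(2 * t), -math.pi, math.pi))
    print(l2_inner(math.sin, math.sin, -math.pi, math.pi))
```

Symmetry, property (iii), is visible directly in both functions, since swapping the two arguments leaves each product unchanged.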
An orthogonal set of unit vectors (i.e., vectors of norm one) is called an orthonormal set. If $S$ is an orthonormal set in a Hilbert space $H$, then for any $y \in H$ at most countably many of the numbers $\langle y, x\rangle$, where $x \in S$, are nonzero and

$$\sum_{x \in S} |\langle y, x\rangle|^2 \le \|y\|^2$$

(this is called Bessel’s inequality). The numbers $\{\langle y, x\rangle : x \in S\}$ are called the Fourier coefficients of $y$ with respect to the orthonormal set $S$. If the orthonormal set $S$ has the property that the Fourier coefficients of a vector completely determine the vector, in the sense that, for any vector $y \in H$,

$$\langle y, x\rangle = 0 \ \text{for all } x \in S \quad \text{implies} \quad y = \theta,$$

then $S$ is called a complete orthonormal set.

Definition. A Hilbert space is called separable if it contains a sequence which is a complete orthonormal set. Any such complete orthonormal sequence is called a basis for the Hilbert space.

It is not hard to see that an orthonormal sequence $\{\varphi_n\}$ in a Hilbert space $H$ is complete if and only if

$$\sum_n |\langle y, \varphi_n\rangle|^2 = \|y\|^2$$

have been studied and applied extensively in recent years. We now show how to construct the simplest wavelet basis, the Haar basis. As in the previous example, we seek a single function to generate all the basis vectors, but instead of a wave, we choose a function that decays quickly, in fact a function that vanishes off an interval. Our choice is the Haar “wavelet”

$$\psi(t) = \begin{cases} 1 & \text{for } 0 \le t < \tfrac{1}{2} \\ -1 & \text{for } \tfrac{1}{2} \le t < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Taking dilates of this wavelet will serve the same purpose as the higher frequency waves in the previous example, that is, the dilates of the fundamental wave $\varphi$. However, this by itself will not serve to represent functions defined over the entire line. To do that we must also shift the dilates about. If this is done properly, then all the needed time and frequency attributes are captured and the resulting wavelets are orthonormal. Specifically, it can be shown that the functions $\{\psi_{j,k} : j, k = 0, \pm 1, \pm 2, \ldots\}$ defined by

$$\psi_{j,k}(t) = 2^{j/2}\,\psi(2^j t - k)$$

form an orthonormal basis for $L^2(-\infty, \infty)$ called the Haar basis.
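The orthonormality of the Haar functions $\psi_{j,k}$ can be spot-checked numerically. The following is a sketch rather than a proof: the names `haar`, `haar_jk`, and `inner` are hypothetical, and the $L^2$ inner product is approximated by a midpoint rule on $[-4, 4]$. The grid spacing is $2^{-10}$, so for the small values of $j$ and $k$ used here every jump of the step functions falls on a cell boundary.

```python
def haar(t):
    """The Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def haar_jk(j, k):
    """Return the dilated and shifted wavelet t -> 2**(j/2) * haar(2**j * t - k)."""
    scale = 2.0 ** (j / 2)
    factor = 2.0 ** j
    return lambda t: scale * haar(factor * t - k)


def inner(f, g, a=-4.0, b=4.0, cells=8192):
    """Midpoint-rule approximation of the integral of f * g over [a, b].
    The interval must contain the supports of both functions."""
    h = (b - a) / cells
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(cells))


if __name__ == "__main__":
    pairs = [((0, 0), (0, 0)), ((1, 0), (1, 0)), ((0, 0), (1, 0)),
             ((0, 0), (0, 1)), ((-1, 0), (0, 0))]
    for (j1, k1), (j2, k2) in pairs:
        print((j1, k1), (j2, k2), inner(haar_jk(j1, k1), haar_jk(j2, k2)))
```

Each function has unit norm because the factor $2^{j/2}$ compensates for the support shrinking to length $2^{-j}$. Two different functions are orthogonal either because their supports are disjoint or because the coarser one is constant on the support of the finer one, whose integral is zero.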
In functional analysis the primary interest is the study of linear operators on infinite dimensional function spaces that are defined intrinsically, that is, without regard to a specific basis. For example, given a real-valued function $k(\cdot, \cdot)$ which is continuous on the square $[0, 1] \times [0, 1]$ one can define the linear integral operator $T$ on $C[0, 1]$, taking values in $C[0, 1]$, by $Tf = g$ where

$$g(s) = \int_0^1 k(s, t) f(t)\,dt.$$

Such integral operators may be viewed as natural generalizations of matrices to the case of continuous, rather than discrete, variables.

Two important subspaces, the nullspace and range, are associated with each linear operator. The nullspace, $N(T)$, of a linear operator $T$ consists of all vectors that $T$ maps to the zero vector: $N(T) = \{x \in D(T) : Tx = \theta\}$. For example, the differentiation operator acting on the space $C^1[a, b]$ has nullspace consisting of all constant functions. The range, $R(T)$, of a linear operator $T$ is the set of all images of vectors under $T$, that is, $R(T) = \{Tx : x \in D(T)\}$. For example, the range of the integral operator on $C[0, 1]$ generated by the kernel $k(s, t) = \exp(s + t)$ consists of all scalar multiples of the exponential function $f(s) = \exp(s)$.

A. Bounded Operators

Linear operators acting between normed linear spaces that are continuous with respect to the norms are called bounded.

Definition. A linear operator $T$ defined on a normed linear space $(V, \|\cdot\|_v)$ and taking values in a normed linear space $(W, \|\cdot\|_w)$ is called bounded if there is a constant $L$ such that $\|Tx\|_w \le L\|x\|_v$ for all $x \in V$.

complish. From this it also follows that every bounded linear operator is continuous (relative to the norms in question) and, in fact, every continuous linear operator is bounded. Indeed, if $T$ is linear and continuous, then there is a $\delta > 0$ such that $\|z\| \le \delta$ implies $\|Tz\| \le 1$. For any $x \ne \theta$ one then has $\|T(\delta x/\|x\|)\| \le 1$, that is, $\|Tx\| \le \frac{1}{\delta}\|x\|$ for all $x$. Therefore, $T$ is bounded and $\|T\| \le 1/\delta$.

The set $L(V, W)$ of all bounded linear operators from a normed linear space $V$ to a normed linear space $W$ is itself a normed linear space under the natural notions of operator sum and scalar multiplication:

$$(T + S)(x) = Tx + Sx$$
$$(tT)(x) = t(Tx)$$

and with the norm on bounded linear operators as defined above. Further, if $W$ is a Banach space, then so is $L(V, W)$. If the image space $W$ is the scalar field ($\mathbb{R}$ or $\mathbb{C}$), then $L(V, W)$ is called the dual space, $V^*$, of $V$. An operator in $V^*$ is called a bounded linear functional on $V$. For example, given a fixed $t_0 \in [a, b]$, the point evaluation operator $E_{t_0}$, defined by

$$E_{t_0}(f) = f(t_0)$$

is a bounded linear functional on the space $(C[a, b], \|\cdot\|_\infty)$. The dual space of $C[a, b]$ may be identified with the space of all functions of bounded variation on $[a, b]$ in the sense that any bounded linear functional on $C[a, b]$ has a unique representation of the form

$$\ell(f) = \int_a^b f(t)\,dg(t)$$

for some function $g$ of bounded variation (the integral is a Riemann–Stieltjes integral). In the case of point evaluation the representative function $g$ is the Heaviside function at
The notational specification of norms on various lin- t0 :
ear spaces is tiresome and annoying; we will there-
0 , t < t0
fore often dispense with it and rely on the reader to g(t) =
1 , t0 ≤ t.
understand appropriate norms from the context. Hence
we will say that a linear operator T is bounded if It can be shown that for 1 < p < ∞,
Tx ≤ L x for all x ∈ D(T ) and some constant L. The
smallest constant L for which this inequality holds is (L p [a, b])∗ = L q [a, b]
called the norm of the operator T and is denoted T . where 1p + q1 = 1, in the sense that every bounded linear
Equivalently, functional on L p [a, b] has the form
Tx b
T = sup
x=θ x ( f ) = f (t)g(t) dt
a
where “sup” stands for supremum, or least upper bound. for some g ∈ L [a, b] that is uniquely determined by .
q
Using linearity of T one sees that With this understanding the space L 2 [a, b] is self-dual.
Tx − Ty ≤ T x−y In fact, by the Riesz Representation Theorem, all Hilbert
spaces are self-dual for the following reason: a bounded
and hence T gives the smallest universal bound for the linear functional on a Hilbert space H has the form
relative “spread” that a bounded linear operator T can ac- (x) = x, z, for some vector z ∈ H which is uniquely
P1: GRB/GWT P2: GQT Final
Encyclopedia of Physical Science and Technology EN06M-269 June 27, 2001 12:29
determined by $\ell$. Further, $\|\ell\| = \|z\|$, where the first norm is an operator norm and the other is the underlying Hilbert space norm. The association of the bounded linear functional $\ell$ with its Riesz representor $z$ provides the identification of $H^*$ with $H$.

Bounded linear functionals may be thought of as linear measurements on a Hilbert space in that a bounded linear functional gives a numerical measure on vectors in the space which is continuous with respect to the norm. Such measures distinguish vectors in the space because $x \ne y$ if and only if there is a vector $z$ with $\langle x, z\rangle \ne \langle y, z\rangle$. However, it may happen that no such linear measure can ultimately distinguish the vectors in a sequence from a certain fixed vector. This gives rise to the notion of weak convergence.

Definition. A sequence $\{x_n\}$ in a Hilbert space $H$ is said to converge weakly to a vector $x$, denoted $x_n \rightharpoonup x$, if $\langle x_n, z\rangle \to \langle x, z\rangle$ for all $z \in H$.

It is a consequence of the Cauchy-Schwarz inequality that every convergent sequence is weakly convergent to the same limit. Also, Bessel's inequality shows that every orthonormal sequence of vectors converges weakly to the zero vector.

$$(Tf)(s) = \int_a^b k(s, t) f(t)\,dt, \qquad s \in [c, d],$$

then

$$\langle Tf, g\rangle = \int_c^d \left(\int_a^b k(s, t) f(t)\,dt\right) g(s)\,ds = \int_a^b \left(\int_c^d k(s, t) g(s)\,ds\right) f(t)\,dt = \langle f, T^* g\rangle$$

and hence $T^*$ is the integral operator generated by the kernel $k^*(\cdot, \cdot)$ defined by $k^*(t, s) = k(s, t)$. In particular, if the kernel $k$ is symmetric and $[a, b] = [c, d]$, then the operator $T$ is self-adjoint.

C. Compact Operators

A linear operator $T : H_1 \to H_2$ is called an operator of finite rank if its range is spanned by finitely many vectors in $H_2$. In other words, $T$ has finite rank if there are linearly independent vectors $\{v_1, \ldots, v_m\}$ in $H_2$ such that

$$Tx = \sum_{k=1}^{m} a_k(x) v_k$$
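The identity $\langle Tf, g\rangle = \langle f, T^*g\rangle$ for an integral operator, with $T^*$ generated by the transposed kernel $k^*(t, s) = k(s, t)$, can be illustrated with a simple quadrature. This is a rough sketch for real-valued functions only; the midpoint rule, the grid size, and the names `midpoints`, `apply_T`, `apply_T_adjoint`, and `inner` are choices made here for illustration and do not come from the article.

```python
def midpoints(lo, hi, n):
    """Midpoints and cell width of a uniform partition of [lo, hi] into n cells."""
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)], h


def apply_T(k, f, a, b, n=200):
    """Return s -> approximate integral over [a, b] of k(s, t) f(t) dt."""
    ts, h = midpoints(a, b, n)
    return lambda s: sum(k(s, t) * f(t) for t in ts) * h


def apply_T_adjoint(k, g, c, d, n=200):
    """Return t -> approximate integral over [c, d] of k(s, t) g(s) ds.

    This is the operator generated by the transposed kernel k*(t, s) = k(s, t).
    """
    ss, h = midpoints(c, d, n)
    return lambda t: sum(k(s, t) * g(s) for s in ss) * h


def inner(u, v, lo, hi, n=200):
    """Midpoint-rule approximation of the real L^2 inner product on [lo, hi]."""
    xs, h = midpoints(lo, hi, n)
    return sum(u(x) * v(x) for x in xs) * h
```

For example, with `k = lambda s, t: s * t * t + 1`, `f = lambda t: t` on [0, 1] and `g = lambda s: 1 - s` on [0, 2], the values `inner(apply_T(k, f, 0, 1), g, 0, 2)` and `inner(f, apply_T_adjoint(k, g, 0, 2), 0, 1)` should agree up to rounding, because the two discrete double sums differ only in the order of summation.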
eigenspaces. The eigenspace of a linear operator $T : H \to H$ associated with a scalar $\lambda$ is the subspace

$$N(T - \lambda I) = \{x \in H : Tx = \lambda x\}.$$

In general, an eigenspace may be trivial, that is, it may consist only of the zero vector. If $N(T - \lambda I) \ne \{\theta\}$, we say that $\lambda$ is an eigenvalue of $T$. If $T$ is self-adjoint, then the eigenvalues of $T$ are real numbers and vectors in distinct eigenspaces are orthogonal to each other. If $T$ is self-adjoint, compact, and of infinite rank, then the eigenvalues of $T$ form a sequence of real numbers $\{\lambda_n\}$. This sequence converges to zero, for taking an orthonormal sequence $\{x_n\}$ with $x_n \in N(T - \lambda_n I)$ we have $\lambda_n x_n = Tx_n \to \theta$ since $x_n \rightharpoonup \theta$ (a consequence of Bessel's inequality). Since $\|x_n\| = 1$, it follows that $\lambda_n \to 0$. The fact that $Tx_n \to 0$ for a sequence of unit vectors $\{x_n\}$ is an abstract version of the Riemann-Lebesgue Theorem.

The prime exemplar of a compact self-adjoint operator is the integral operator on the real space $L^2[a, b]$ generated by a symmetric kernel $k(\cdot, \cdot) \in L^2([a, b] \times [a, b])$:

$$(Tf)(s) = \int_a^b k(s, t) f(t)\,dt.$$

This operator is compact because it is the limit in operator norm of the finite rank operators

$$T_N f = \sum_{n, m \le N} c_{n,m} \int_a^b f(t)\varphi_m(t)\,dt\; \varphi_n$$

where $\{\varphi_n\}_1^\infty$ is a complete orthonormal sequence in $L^2[a, b]$ and

$$k(s, t) = \sum_{n, m = 1}^{\infty} c_{n,m}\,\varphi_n(s)\varphi_m(t)$$

is the Fourier expansion of $k(\cdot, \cdot)$ relative to the orthonormal basis $\{\varphi_n(s)\varphi_m(t)\}$ for $L^2([a, b] \times [a, b])$.

D. Unbounded Operators

It is not the case that every interesting linear operator is bounded. For example, the differentiation operator acting on the space $C^1[0, \pi]$ and taking values in the space $C[0, \pi]$, both with the uniform norm, is unbounded. Indeed, for the functions $\varphi_n(t) = \sin(nt)$ we find that the quotient

$$\frac{\|\varphi_n'\|_\infty}{\|\varphi_n\|_\infty} = n$$

is unbounded.

Multiplication by the variable in the space $L^2(-\infty, \infty)$ is another famous example of an unbounded operator. Let

$$D(M) = \left\{ f \in L^2(-\infty, \infty) : \int_{-\infty}^{\infty} t^2 |f(t)|^2\,dt < \infty \right\}.$$

Then $D(M)$ is a dense subspace of $L^2(-\infty, \infty)$ and one can define the linear operator $M : D(M) \to L^2(-\infty, \infty)$ by

$$Mf = g \quad \text{where} \quad g(t) = t f(t).$$

Let $\varphi_n(t) = 1$ for $t \in [n, n + 1]$ and $\varphi_n(t) = 0$ otherwise. Then $\|\varphi_n\|_2 = 1$ and $\|M\varphi_n\|_2 > n$, and hence $M$ is unbounded. The multiplication operator has an important interpretation in quantum mechanics.

The adjoint operator may also be defined for unbounded linear operators with dense domains. Given a linear operator $T : D(T) \subset H_1 \to H_2$ with dense domain $D(T)$, let $D(T^*)$ be the subspace of all vectors $y \in H_2$ satisfying

$$\langle Tx, y\rangle = \langle x, y^*\rangle$$

for some vector $y^* \in H_1$ and all $x \in D(T)$. The vector $y^*$ is then uniquely defined and we set $T^* y = y^*$. Then the operator $T^* : D(T^*) \subset H_2 \to H_1$ is densely defined and linear.

As an example, let $D(T)$ be the space of absolutely continuous complex-valued functions $f$ defined on $[0, 1]$ with $f' \in L^2[0, 1]$ satisfying the periodic boundary condition $f(0) = f(1)$. Then $D(T)$ is dense in $L^2[0, 1]$. Define $T : D(T) \to L^2[0, 1]$ by $Tf = if'$. For $g \in D(T)$ we have

$$\langle Tf, g\rangle = \int_0^1 i f'(t)\overline{g(t)}\,dt = i\,\overline{g(t)} f(t)\Big|_0^1 - i\int_0^1 f(t)\overline{g'(t)}\,dt = \int_0^1 f(t)\overline{i g'(t)}\,dt = \langle f, ig'\rangle$$

for all $f \in D(T)$. Therefore, $D(T) \subset D(T^*)$ and, in fact, it can be shown that $D(T^*) = D(T)$. This calculation shows that $T^* g = Tg$, that is, $T$ is self-adjoint.

A linear operator $T : D(T) \subseteq H_1 \to H_2$ is called closed if its graph $G(T) = \{(x, Tx) : x \in D(T)\}$ is a closed subspace of the product Hilbert space $H_1 \times H_2$. This means that if $\{x_n\} \subset D(T)$, $x_n \to x \in H_1$, and $Tx_n \to y \in H_2$, then $(x, y) \in G(T)$, that is, $x \in D(T)$ and $Tx = y$. For example, the differentiation operator defined in the previous paragraph is closed. In fact, the adjoint of any densely defined linear operator is closed.

A densely defined linear operator $T : D(T) \subseteq H \to H$ is called symmetric if

$$\langle Tx, y\rangle = \langle x, Ty\rangle \quad \text{for all } x, y \in D(T).$$

Every self-adjoint transformation is, of course, symmetric; however, a symmetric transformation is not necessarily self-adjoint. Consider, for instance, a slight modification of the previous example. Let $D(T)$ be the space of absolutely continuous complex-valued functions on $[0, 1]$ which vanish at the end points, and let $Tf = if'$. For
$f, g \in D(T)$, integration by parts gives $\langle Tf, g\rangle = \langle f, Tg\rangle$, and hence $T$ is symmetric, and the adjoint of $T$ satisfies $T^* g = ig'$. However, $D(T^*)$ is a proper extension of $D(T)$, in that no boundary conditions are imposed on functions in $D(T^*)$, and hence $T$ is not self-adjoint. The examples just given show that a symmetric linear operator is not necessarily bounded. The Hellinger-Toeplitz Theorem gives sufficient conditions for a symmetric operator to be bounded: a symmetric linear operator whose domain is the entire space is bounded.

If a linear operator $T : D(T) \subseteq H_1 \to H_2$ is closed, then $D(T)$ is a Hilbert space when endowed with the graph inner product:

$$\langle (x, Tx), (y, Ty)\rangle = \langle x, y\rangle + \langle Tx, Ty\rangle.$$

If $T$ is closed and everywhere defined, i.e., $D(T) = H_1$, then since the graph norm dominates the norm on $H_1$, we find, by the corollary to Banach's theorem (see the inversion section), that the norm in $H_1$ is equivalent to the graph norm. In particular, the operator $T$ is then bounded. This is the closed graph theorem: a closed everywhere defined linear operator is bounded.

III. CONTRACTIONS

Suppose $X$ is a Banach space and $D \subseteq X$. A vector $x \in D$ is called a fixed point of the mapping $T : D \to X$ if $Tx = x$. Every linear operator has a fixed point, namely the zero vector. But a nonlinear mapping may be free of fixed points. However, a mapping that draws points together in a uniform relative sense (a condition called contractivity) is guaranteed to have a fixed point.

Definition. A mapping $T : D \subseteq X \to X$ is called a contraction (relative to the norm $\|\cdot\|$ on $X$) if $\|Tx - Ty\| \le \alpha\|x - y\|$ for all $x, y \in D$ and some positive constant $\alpha < 1$.

For example, if $L : X \to X$ is a bounded linear operator with $\|L\| < 1$, and $g \in X$, then the (affine) mapping $T : X \to X$ defined by $Tx = Lx + g$ is a contraction with contraction constant $\alpha = \|L\|$. As another example, consider the nonlinear integral operator $T : C[0, 1] \to C[0, 1]$ defined by

$$(Tu)(s) = \int_0^1 k(s, u(t))\,dt$$

$$|(Tw)(s) - (Tu)(s)| = \left|\int_0^1 k(s, w(t)) - k(s, u(t))\,dt\right| \le \alpha \int_0^1 |w(t) - u(t)|\,dt \le \alpha\|w - u\|_\infty.$$

Therefore, $\|Tw - Tu\|_\infty \le \alpha\|w - u\|_\infty$ and hence $T$ is a contraction for the uniform norm if $\alpha < 1$.

The Contraction Mapping Theorem (elucidated by Banach in 1922) is a constructive existence and uniqueness theorem for fixed points. If $D$ is a closed subset of a Banach space and $T : D \to D$ is a contraction, then the theorem guarantees the existence of a unique fixed point $x \in D$. This fixed point is the limit of any sequence constructed iteratively by $x_{n+1} = Tx_n$, where $x_0$ is an arbitrary vector in $D$. If $\alpha$ is a contraction constant for $T$, then there is an a priori error bound

$$\|x_n - x\| \le \frac{\alpha^n}{1 - \alpha}\|x_1 - x_0\|$$

and an a posteriori error bound

$$\|x_n - x\| \le \frac{1}{1 - \alpha}\|x_{n+1} - x_n\|$$

for $x_n$ as an approximation to the unique fixed point $x$.

The contraction mapping theorem is often used to establish the existence and uniqueness of solutions of problems in function spaces. One such application is a simple implicit function theorem. Suppose $f \in C([a, b] \times \mathbb{R})$ satisfies

$$0 < m \le \frac{\partial f}{\partial s}(t, s) \le M$$

for some constants $m$ and $M$. Then one can show that the equation $f(t, x) = 0$ implicitly defines a continuous function $x$ on $[a, b]$. Indeed, the nonlinear operator $T : C[a, b] \to C[a, b]$ defined by

$$(Tx)(t) = x(t) - \frac{2}{m + M} f(t, x(t))$$

is, under the stated conditions, a contraction mapping on $C[a, b]$ with contraction constant $\alpha = (M - m)/(M + m)$. Therefore, $T$ has a unique fixed point $x \in C[a, b]$. That is, there is a unique function $x \in C[a, b]$ satisfying

$$x(t) = x(t) - \frac{2}{m + M} f(t, x(t))$$

or, equivalently, $f(t, x(t)) = 0$.
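The fixed point iteration and its error bounds can be tried out on the implicit function example at a single fixed value of t. The sketch below is only an illustration under stated assumptions: the function f(t, x) = 2x + sin(x) - t is a hypothetical choice (its derivative in x lies between m = 1 and M = 3, so the contraction constant is 1/2), and the names `solve_implicit` and `f_example` are not from the article.

```python
import math


def solve_implicit(f, t, m, M, x0=0.0, tol=1e-12, max_iter=200):
    """Iterate x <- x - 2/(m+M) * f(t, x) for a fixed t.

    Stops when the a posteriori bound |x_{n+1} - x_n| / (1 - alpha) drops
    below tol. Returns the last iterate, the list of iterates, and alpha.
    """
    alpha = (M - m) / (M + m)   # contraction constant from the text
    step = 2.0 / (m + M)
    x = x0
    history = [x]
    for _ in range(max_iter):
        x_next = x - step * f(t, x)
        history.append(x_next)
        if abs(x_next - x) / (1.0 - alpha) < tol:
            return x_next, history, alpha
        x = x_next
    return x, history, alpha


def f_example(t, x):
    """Hypothetical test function: d f / d x = 2 + cos(x) lies in [1, 3]."""
    return 2.0 * x + math.sin(x) - t
```

With `root, history, alpha = solve_implicit(f_example, 1.0, 1.0, 3.0)`, the residual `f_example(1.0, root)` should be close to zero, and each iterate `history[n]` should satisfy the a priori bound `alpha**n / (1 - alpha) * abs(history[1] - history[0])` on its distance to `root`. The a posteriori bound is used as the stopping test because it is computable without knowing the fixed point.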
contains a unique vector of smallest norm. By shifting the origin to a vector $x$, one sees that this is equivalent to saying that given a closed convex subset $S$ of a Hilbert space $H$ and a vector $x \in H$, there is a unique vector $Px \in S$ satisfying

$$\|x - Px\| = \min_{y \in S} \|x - y\|.$$

This purely geometric projection property has important applications in optimization theory. For example, consider the following simple example of optimal control of a one-dimensional dynamical system. Suppose a unit point mass is steered from the origin with initial velocity 1 by a control (external force) $u$. We are interested in a control that will return the particle to a "soft landing" at the origin in unit time while expending minimal effort, where the measure of effort is

$$\int_0^1 |u(t)|^2\,dt.$$

We may formulate this problem in the Hilbert space $L^2[0, 1]$. The dynamics of the system are governed by the equations

$$\ddot{x} = u, \quad x(0) = 0, \quad \dot{x}(0) = 1, \quad x(1) = 0, \quad \dot{x}(1) = 0.$$

Suppose $C$ is the set of all vectors $u$ in $L^2[0, 1]$ for which the equations above are satisfied for some vector $x \in H^2[0, 1]$. It may be routinely verified that $C$ is a closed convex subset of $L^2[0, 1]$ and hence $C$ contains a unique vector of smallest $L^2$-norm, i.e., there is a unique minimal effort control that steers the system in the specified manner.

The (generally nonlinear) operator $P$ defined above is called the (metric) projection of $H$ onto $S$. If $S$ is a closed subspace of $H$, then $P$ is a bounded self-adjoint linear operator and $I - P$, where $I$ is the identity operator, is the projection of $H$ onto $S^\perp$. Since $x = Px + (I - P)x$, this provides a Cartesian decomposition of $H$, written $H = S \oplus S^\perp$, meaning that each vector in $H$ can be written uniquely as a sum of a vector in $S$ and a vector in $S^\perp$.

For example, suppose $H$ is the completion of the space $C^1[0, 1]$ with respect to the inner product

$$\langle f, g\rangle = \int_0^1 f'(t) g'(t)\,dt + f(0) g(0),$$

If $S$ is a separable closed subspace of a Hilbert space $H$, then a representation of the projection operator $P$ of $H$ onto the subspace $S$ can be given in terms of a complete orthonormal sequence $\{\varphi_n\}$ for $S$. Indeed, if $x \in H$, then

$$\sum_n \langle x, \varphi_n\rangle \varphi_n - x \in S^\perp$$

since this vector is orthogonal to each member of the basis $\{\varphi_n\}$ for $S$. Therefore,

$$Px = Px + P\left(\sum_n \langle x, \varphi_n\rangle \varphi_n - x\right) = \sum_n \langle x, \varphi_n\rangle \varphi_n.$$

There is an important relationship involving the nullspace, range, and adjoint of a bounded linear operator acting between Hilbert spaces. If $T : H_1 \to H_2$ is a bounded linear operator, then $N(T^*) = R(T)^\perp$. To see this, note that $w \in R(T)^\perp$ if and only if

$$0 = \langle Tx, w\rangle = \langle x, T^* w\rangle$$

for all $x \in H_1$, that is, if and only if $w \in N(T^*)$. By a previously discussed result on the second orthogonal complement we get the related result that $\overline{R(T)} = N(T^*)^\perp$. In particular, if $T$ is a bounded linear operator with closed range, then the equation $Tf = g$ has a solution if and only if $g$ is orthogonal to all solutions $x$ of the homogeneous adjoint equation $T^* x = \theta$.

Replacing $T$ with $T^*$ and noting that $T^{**} = T$, we obtain two additional relationships between the nullspace, range, and adjoint. Taken together these relationships, namely

$$N(T^*) = R(T)^\perp, \qquad N(T^*)^\perp = \overline{R(T)},$$

$$N(T) = R(T^*)^\perp, \qquad N(T)^\perp = \overline{R(T^*)}$$

are sometimes collectively called the theorem on the four fundamental subspaces.

The Riesz Representation Theorem is a simple consequence of the decomposition theorem. If $\ell$ is a nonzero bounded linear functional on a Hilbert space $H$, then $N(\ell)^\perp = R(\ell^*)$ is one-dimensional and $H = N(\ell) \oplus N(\ell)^\perp$. Let $y \in N(\ell)^\perp$ be a unit vector. Then $x = Px + \langle x, y\rangle y$, where $P$ is the projection operator of $H$ onto $N(\ell)$. Therefore,

$$\ell(x) = \ell(\langle x, y\rangle y) = \langle x, y\rangle \ell(y) = \langle x, z\rangle$$
of such an operator. We have seen that the nonzero eigenvalues of a compact self-adjoint operator $T$ of infinite rank form a sequence of real numbers $\{\lambda_n\}$ with $\lambda_n \to 0$ as $n \to \infty$. The corresponding eigenspaces $N(T - \lambda_n I)$ are all finite-dimensional and there is a sequence $\{v_n\}$ of orthonormal eigenvectors, that is, vectors satisfying

The system $\{u_j, v_j; \mu_j\}$ is called a singular system for the operator $T$, and any $f \in H_1$ has, by the decomposition theorem, a representation in the form

$$f = Pf + \sum_{j=1}^{\infty} \langle f, u_j\rangle u_j$$
satisfies the normal equation $T^* T u = T^* g$. Since $T$ is bounded, the set of solutions of the normal equation is closed and convex, and hence, by the projection theorem, if there is a least-squares solution, then there is a least-squares solution of smallest norm. These ideas enable us to define a generalized inverse (the Moore-Penrose generalized inverse) for $T$. Let $D(T^\dagger) = R(T) + R(T)^\perp$ and for $g \in D(T^\dagger)$, define $T^\dagger g$ to be the least-squares solution having smallest norm. The Moore-Penrose generalized inverse, $T^\dagger : D(T^\dagger) \subseteq H_2 \to H_1$, is a closed densely defined linear operator. However, $T^\dagger$ is bounded if and only if $R(T)$ is closed. If $T$ is compact, then $R(T)$ is closed if and only if $T$ has finite rank. In particular, a linear integral operator generated by a square integrable kernel has a bounded Moore-Penrose generalized inverse if and only if the kernel is degenerate. If $T$ is compact with singular system $\{u_j, v_j; \mu_j\}$, then the Moore-Penrose generalized inverse has the explicit representation

$$T^\dagger g = \sum_j \frac{\langle g, v_j\rangle}{\mu_j} u_j.$$

F. Variational Inequalities

Suppose $H$ is a real Hilbert space with inner product $\langle\cdot, \cdot\rangle$ and corresponding norm $\|\cdot\|$. A bilinear form $a(\cdot, \cdot) : H \times H \to \mathbb{R}$ is called bounded if there is a constant $C$ such that

$$|a(u, v)| \le C\|u\|\,\|v\| \quad \text{for all } u, v \in H$$

and coercive if there is a constant $m > 0$ such that

$$m\|u\|^2 \le a(u, u)$$

for all $u \in H$. A fundamental result of Stampacchia asserts that if $a(\cdot, \cdot)$ is a bounded, coercive bilinear form (which need not be symmetric), and if $f \in H$ and $K$ is a closed convex subset of $H$, then there is a unique $u \in K$ satisfying

$$a(u, v - u) \ge \langle f, v - u\rangle \quad \text{for all } v \in K.$$

This is called a variational inequality for the form $a(\cdot, \cdot)$, the closed convex set $K$, and the vector $f \in H$.

The fundamental nature of this result becomes apparent when one notices that the projection property for Hilbert space is a special case of this result on variational inequalities. Indeed, if $a(x, y) = \langle x, y\rangle$, then the theorem ensures the existence of a unique $u \in K$ satisfying

$$\langle u, v - u\rangle \ge \langle f, v - u\rangle$$

or, equivalently,

$$\langle f - u, v - u\rangle \le 0 \quad \text{for all } v \in K.$$

Geometrically, this says that the angle between the vectors $f - u$ and $v - u$ is obtuse for all vectors $v \in K$. From this it follows that $u \in K$ is the vector in $K$ that is nearest to $f$, that is, $u = Pf$, the metric projection of $f$ onto $K$.

A nonsymmetric version of the Riesz Representation Theorem also follows from the theorem on variational inequalities. Suppose $\ell$ is a bounded linear functional on $H$. Then, by the Riesz Representation Theorem, there is an $f \in H$ such that $\ell(w) = \langle f, w\rangle$ for all $w \in H$. Let the closed convex set $K$ be the entire Hilbert space $H$; then there is a unique $u \in H$ satisfying

$$a(u, v - u) \ge \langle f, v - u\rangle \quad \text{for all } v \in H$$

and hence $a(u, w) \ge \langle f, w\rangle$ for all $w \in H$. Replacing $w$ by $-w$, we also get $a(u, -w) \ge \langle f, -w\rangle$ for all $w \in H$. Therefore, $a(u, w) = \langle f, w\rangle$ for all $w \in H$. That is, the functional $\ell$ has the representation $\ell(w) = a(u, w)$ for a unique $u \in H$. This representation of bounded linear functionals in terms of a possibly nonsymmetric bilinear form is known as the Lax-Milgram lemma. The Lax-Milgram lemma can be used to establish the existence of a unique weak solution for certain nonsymmetric elliptic boundary value problems in the same way that the Riesz Representation Theorem is used to prove the existence of a unique weak solution of the Poisson problem.

As a simple application of the Lax-Milgram lemma, consider the two-point boundary value problem

$$-u'' + u' + u = f, \qquad u'(0) = u'(1) = 0.$$

Integration by parts yields $a(u, v) = \langle f, v\rangle$, where $\langle\cdot, \cdot\rangle$ is the $L^2[0, 1]$ inner product and $a(u, v)$ is the nonsymmetric, bounded, coercive, bilinear form defined on $H^1[0, 1]$ by

$$a(u, v) = \int_0^1 (u'v' + u'v + uv)(s)\,ds.$$

The Lax-Milgram lemma then ensures the existence of a unique weak solution $u \in H^1[0, 1]$ of the boundary value problem, that is, a unique vector $u \in H^1[0, 1]$ satisfying

$$a(u, v) = \langle f, v\rangle \quad \text{for all } v \in H^1[0, 1].$$

V. A FEW APPLICATIONS

A. Weak Solutions of Poisson's Equation

Suppose $\Omega$ is a bounded domain in $\mathbb{R}^2$ with smooth boundary $\partial\Omega$. Given $f \in C(\Omega)$ a classical solution of Poisson's equation is a function $u \in C^2(\Omega)$ satisfying

$$-\Delta u = f \quad \text{in } \Omega$$

$$u = 0 \quad \text{on } \partial\Omega$$

where $\Delta$ is the Laplacian operator. If the Poisson equation is multiplied by $v \in C_0^2(\Omega)$ and integrated over $\Omega$, then Green's identity gives

$$\langle \nabla u, \nabla v\rangle = \langle f, v\rangle$$
where $\nabla$ is the gradient operator and $\langle\cdot, \cdot\rangle$ is the $L^2(\Omega)$ inner product. There are two things to notice about this equation: $\langle f, v\rangle$ is defined for $f \in L^2(\Omega)$, allowing consideration of "rougher" data $f$, and on the left-hand side only first derivatives are required rather than second derivatives, allowing less smooth "solutions" $u$. These observations permit us to propose a weaker formulation of the Poisson problem.

The bilinear form $a(\cdot, \cdot)$ defined by $a(u, v) = \langle \nabla u, \nabla v\rangle$ is an inner product (sometimes called the energy inner product) on the space $C_0^1(\Omega)$ (this is a consequence of Poincaré's inequality: $a(u, u) \ge C\|u\|_2^2$, where $\|\cdot\|_2$ is the $L^2$ norm). The norm generated by this inner product is equivalent to the Sobolev $H_0^1(\Omega)$ norm. Therefore, the completion of $C_0^1(\Omega)$ relative to this inner product is the Sobolev space $H_0^1(\Omega)$, and we define a weak solution of the Poisson problem to be a vector $u \in H_0^1(\Omega)$ satisfying

$$a(u, v) = \langle f, v\rangle$$

for all $v \in H_0^1(\Omega)$. The linear functional $\ell : H_0^1(\Omega) \to \mathbb{R}$ defined by $\ell(v) = \langle f, v\rangle$ is (again, as a consequence of Poincaré's inequality) a bounded linear functional on $H_0^1(\Omega)$, and hence, by the Riesz Representation Theorem, there is a unique $u \in H_0^1(\Omega)$ satisfying

$$a(u, v) = \langle f, v\rangle \quad \text{for all } v \in H_0^1(\Omega).$$

In other words, Poisson's problem has a unique weak solution for each $f \in L^2(\Omega)$.

B. A Finite Element Method

A finite element method is a constructive procedure for approximating a weak solution by a linear combination of "basis" functions. As a simple illustration we treat a piecewise linear finite element method for the Poisson problem in the plane. A finite dimensional subspace $U_N$ of $H_0^1(\Omega)$ is chosen and a basis (whose members are called finite elements) is selected for $U_N$. The finite element approximation to the weak solution will be a certain linear combination of these basis functions, i.e., a member of $U_N$. First the region $\Omega$ is triangulated; the vertices of the resulting triangles are called nodes of the triangulation. The functions in $U_N$ will be continuous on $\Omega$, linear on each triangle, and zero on the boundary of $\Omega$. With each interior node of the triangulation we associate a basis function which is 1 at the node and zero at all other nodes of the triangulation (these basis functions are called the linear Lagrange elements). The dimension of the finite element subspace is therefore equal to the number of interior nodes in the triangulation and each basis function has a pyramidal shape with a peak at the associated nodal point of the triangulation.

A finite element solution of the Poisson problem is defined by restricting the conditions for a weak solution to the subspace of finite elements, that is, $u_N \in U_N$ is a finite element solution of the Poisson problem if

$$a(u_N, v) = \langle f, v\rangle \quad \text{for all } v \in U_N.$$

When this condition is expressed in terms of the finite element basis, the resulting coefficient matrix is positive definite and hence there is a unique finite element solution. If $u$ is the weak solution of the Poisson problem, then

$$a(u - u_N, v) = \langle f, v\rangle - \langle f, v\rangle = 0$$

for all $v \in U_N$. Geometrically, this says that the finite element solution is the projection (relative to the energy inner product) of the weak solution onto the finite element subspace and hence,

$$\|u - u_N\| \le \|u - v\|$$

for all $v \in U_N$, where $\|\cdot\|$ is the energy norm. That is, the finite element solution is the best approximation to the weak solution, with respect to the energy norm, in the finite element subspace.

C. Two-Point Boundary Value Problems

We briefly treat a simple class of Sturm-Liouville problems. The goal is to find $x \in C^2[a, b]$ satisfying the differential equation

$$\frac{d}{ds}\left[p(s)\frac{dx}{ds}\right] - [\mu + q(s)]\,x(s) = f(s)$$

where $q, f \in C[a, b]$ and $p \in C^1[a, b]$ are given functions and $\mu \ne 0$ is a given scalar. Define $T : C_0^2[a, b] \to C[a, b]$ by

$$Tx = [p x']' - q x.$$

We suppose that this differential operator is nonsingular, that is, $N(T) = \{\theta\}$ (such is the case, for example, if $p(s) < 0$ and $q(s) \le 0$). Then there is a symmetric Green's function $k(\cdot, \cdot) \in C([a, b] \times [a, b])$ for $T$. That is, $T^{-1}$ is the integral operator generated by the kernel $k(\cdot, \cdot)$. In other words, $Tx = h$ if and only if

$$x(s) = \int_a^b k(s, t) h(t)\,dt.$$

The original problem may then be expressed as

$$Tx = \mu x + f$$

or, in terms of the Green's function $k(\cdot, \cdot)$:

$$x(s) = \int_a^b k(s, t)[\mu x(t) + f(t)]\,dt.$$
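The piecewise linear finite element construction can be sketched in a simpler setting than the planar one treated above. The code below is an illustration only, assuming the one-dimensional analogue $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$, a uniform mesh, "hat" basis functions, and a one-point approximation of the load integrals; the function name `solve_poisson_1d` is hypothetical.

```python
def solve_poisson_1d(f, n_interior):
    """Piecewise linear finite elements for -u'' = f on (0, 1), u(0) = u(1) = 0.

    Uses a uniform mesh with n_interior interior nodes and one hat function
    per interior node. Returns the interior nodes and the nodal values of
    the finite element solution.
    """
    h = 1.0 / (n_interior + 1)
    nodes = [(i + 1) * h for i in range(n_interior)]

    # Energy inner products a(phi_i, phi_j) of the hat functions:
    # 2/h on the diagonal, -1/h between neighbours (symmetric tridiagonal).
    diag = [2.0 / h] * n_interior
    off = [-1.0 / h] * (n_interior - 1)

    # Load vector <f, phi_i>, approximated by h * f(node); exact for constant f.
    rhs = [h * f(x) for x in nodes]

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n_interior):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n_interior
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n_interior - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return nodes, u
```

For the constant load `f = lambda x: 1.0`, the weak solution is $u(x) = x(1 - x)/2$, and the computed nodal values should match it at the nodes up to rounding. The coefficient matrix is symmetric positive definite, which is why elimination without pivoting is adequate in this sketch.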
For example, the expected value of the multiplication by the independent variable operator, $(M\psi)(x) = x\psi(x)$,

$$E(M) = \int_{-\infty}^{\infty} x\,\psi(x)\overline{\psi(x)}\,dx$$

is the mean position of the particle (slight modifications of the arguments given in the section on unbounded operators show that $M$ is unbounded and self-adjoint). For this reason the observable $M$ is called the position operator.

The variance of an observable $T$ when the system is in state $\psi$ is defined by

$$\mathrm{Var}(T) = \|(T - E(T)I)\psi\|^2.$$

That is, the variance gives a measure of the dispersion of an observable from its expected value.

Definition. The commutator of two observables $S$ and $T$ is the observable $[S, T] : D(ST) \cap D(TS) \to H$ defined by $[S, T] = ST - TS$.

$S$ and $T$ are said to commute if $ST = TS$, i.e., $D(ST) = D(TS)$ and $[S, T] = 0$. In abstract form the Heisenberg principle says that for any state $\psi \in D([S, T])$

$$\frac{1}{4}|E([S, T])|^2 \le \mathrm{Var}(S) \times \mathrm{Var}(T).$$

Suppose $D(P)$ is the subspace of all absolutely continuous functions in $H$ whose first derivative is also in $H$. Define the operator $P : D(P) \to H$ by

$$P\psi(x) = \frac{h}{2\pi i}\psi'(x)$$

where $h$ is Planck's constant. Then $P$ is self-adjoint, that is, an observable. A physical argument shows that $E(P)$ is the expected value of the momentum of the system and hence $P$ is called the momentum operator. The commutator of the momentum and position operators can be found from the relation

$$(PM\psi)(x) = \frac{h}{2\pi i}\frac{d}{dx}[x\psi(x)] = \frac{h}{2\pi i}[\psi(x) + x\psi'(x)] = \frac{h}{2\pi i}\psi(x) + (MP\psi)(x)$$

i.e., for all $\psi \in D([P, M])$:

$$[P, M]\psi = \frac{h}{2\pi i}\psi.$$

The general Heisenberg principle then gives

$$\mathrm{Var}(P) \times \mathrm{Var}(M) \ge \left(\frac{h}{4\pi}\right)^2.$$

This is an expression of the physical uncertainty principle: no matter the state $\psi$, the position and momentum cannot both be determined with arbitrary certainty.

SEE ALSO THE FOLLOWING ARTICLES

CONVEX SETS • DATA MINING AND KNOWLEDGE DISCOVERY • DIFFERENTIAL EQUATIONS, ORDINARY • FOURIER SERIES • GENERALIZED FUNCTIONS • TOPOLOGY, GENERAL

BIBLIOGRAPHY

Brenner, S. C., and Scott, L. R. (1994). "The Mathematical Theory of Finite Element Methods," Springer-Verlag, New York.
Groetsch, C. W. (1980). "Elements of Applicable Functional Analysis," Dekker, New York.
Kantorovich, L. V., and Akilov, G. P. (1964). "Functional Analysis in Normed Spaces," Pergamon, New York.
Kirsch, A. (1996). "An Introduction to the Mathematical Theory of Inverse Problems," Springer-Verlag, New York.
Kreyszig, E. (1978). "Introductory Functional Analysis with Applications," Wiley, New York.
Lebedev, L. P., Vorovich, I. I., and Gladwell, G. M. L. (1996). "Functional Analysis: Applications in Mechanics and Inverse Problems," Kluwer, Dordrecht.
Naylor, A. W., and Sell, G. R. (1971). "Linear Operator Theory in Engineering and Science," Holt, Rinehart and Winston, New York.
Riesz, F., and Sz.-Nagy, B. (1955). "Functional Analysis," Ungar, New York.
P1: FLV/LPB P2: FJU Final Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN006A-278 June 29, 2001 21:22
Generalized Functions
Ram P. Kanwal
Pennsylvania State University
I. Introduction
II. Distributions
III. Algebraic Operations on Distributions
IV. Analytic Operations on Distributions
V. Pseudo-Function, Hadamard Finite Part and Regularization
VI. Distributional Derivatives of Discontinuous Functions
VII. Convergence of Distributions and Fourier Series
VIII. Direct Product and Convolution of Distributions
IX. Fourier Transform
X. Poisson Summation Formula
XI. Asymptotic Evaluation of Integrals: A Distributional Approach
With the following four definitions we are able to grasp the concept of distributions:

1. A sequence $\{\varphi_m(x)\}$, $m = 1, 2, \ldots$, for all $\varphi_m \in D$, converges to $\varphi_0$ if the following two conditions are satisfied. (a) All $\varphi_m$ as well as $\varphi_0$ vanish outside a common region. (b) $D^k \varphi_m \to D^k \varphi_0$ uniformly over $\mathbb{R}^n$ as $m \to \infty$, for all multi-indices $k$. For the special case $\varphi_0 = 0$, the sequence $\{\varphi_m\}$ is called a null sequence.
2. A linear functional $t(x)$ on the space $D$ is an operation by which we assign to every test function $\varphi(x)$ a real number $\langle t(x), \varphi(x)\rangle$ such that $\langle t, c_1\varphi_1 + c_2\varphi_2\rangle = c_1\langle t, \varphi_1\rangle + c_2\langle t, \varphi_2\rangle$ for arbitrary test functions $\varphi_1(x)$ and $\varphi_2(x)$ and real numbers $c_1$ and $c_2$. Then it follows that $\langle t, \sum_{j=1}^{m} c_j\varphi_j\rangle = \sum_{j=1}^{m} c_j\langle t, \varphi_j\rangle$, where $c_j$ are arbitrary real numbers.
3. A linear functional $t(x)$ on $D$ is called continuous if and only if the sequence of numbers $\langle t, \varphi_m\rangle \to \langle t, \varphi\rangle$, as $m \to \infty$, when the sequence $\{\varphi_m\}$ of test functions converges to $\varphi \in D$, that is,

$$\lim_{m\to\infty}\langle t, \varphi_m\rangle = \left\langle t, \lim_{m\to\infty}\varphi_m\right\rangle.$$

from the sifting property. This functional is continuous because $\lim_{m\to\infty}\langle\delta(x - \xi), \varphi_m(x)\rangle = \lim_{m\to\infty}\varphi_m(\xi) = \langle\delta(x - \xi), \lim_{m\to\infty}\varphi_m(x)\rangle$. Since $\delta(x - \xi)$ is not a locally integrable function, it produces a singular distribution. These results hold in $\mathbb{R}^n$ as well. The functions which generate singular distributions are called generalized functions. We shall use these words interchangeably. The definition of a distribution can be extended to include complex-valued functions.

The dual space $D'$. The space of all distributions on $D$ is called the dual space of $D$ and is denoted as $D'$. It is also a linear space.

There are many other interesting spaces. For example, the test function space $E$ consists of all the functions $\varphi(x)$, so there is no limit on their growth at infinity. Accordingly, $E \supset D$. The corresponding space $E'$ consists of the distributions which have compact support, so that $E' \subset D'$. The distributions of slow growth are defined in Sect. IX.

III. ALGEBRAIC OPERATIONS ON DISTRIBUTIONS
In the numerator of the first term on the right side of this equation we add and subtract $\phi(0)$ to get
\[ \int_{\epsilon}^{\infty}\frac{\phi(x)}{x}\,dx = \int_{\epsilon}^{1}\frac{\phi(0)}{x}\,dx + \int_{\epsilon}^{1}\frac{\phi(x)-\phi(0)}{x}\,dx + \int_{1}^{\infty}\frac{\phi(x)}{x}\,dx \]
\[ = -\phi(0)\ln\epsilon + \int_{\epsilon}^{1}\frac{\phi(x)-\phi(0)}{x}\,dx + \int_{1}^{\infty}\frac{\phi(x)}{x}\,dx. \]
The first term on the right side of this equation is infinite as $\epsilon \to 0$, while the other two terms are finite. Accordingly, we define
\[ \mathrm{Pf}\int_{-\infty}^{\infty}\frac{H(x)\phi(x)}{x}\,dx = \int_{0}^{1}\frac{\phi(x)-\phi(0)}{x}\,dx + \int_{1}^{\infty}\frac{\phi(x)}{x}\,dx. \tag{23} \]
The function $\mathrm{Pf}(1/|x|)$ can be regularized in the same way. Indeed,
\[ \left\langle \mathrm{Pf}\frac{1}{|x|}, \phi(x)\right\rangle = \int_{|x|\le 1}\frac{\phi(x)-\phi(0)}{|x|}\,dx + \int_{|x|>1}\frac{\phi(x)}{|x|}\,dx. \tag{24} \]
These concepts can be used to regularize the functions $1/x^m$ and $H(x)/x^m$ to yield the generalized functions $\mathrm{Pf}(1/x^m)$ and $\mathrm{Pf}(H(x)/x^m)$. The combination of $\mathrm{Pf}(1/x^m)$ and $\delta^{(m)}(x)$ yields the Heisenberg distributions
\[ \delta^{\pm(m)}(x) = \frac{1}{2}\delta^{(m)}(x) \mp \frac{1}{2\pi i}\,\mathrm{Pf}\frac{1}{x^m}. \]
They arise in quantum mechanics.
Let us end this section by solving the equation $x\,t(x) = g(x)$. The homogeneous part $x\,t(x) = 0$ has the solution $t(x) = c\,\delta(x)$, with $c$ an arbitrary constant, because $x\delta(x) = 0$. Thus, the complete solution is
\[ t(x) = c\,\delta(x) + g(x)\,\mathrm{Pf}\frac{1}{x}. \tag{25} \]

VI. DISTRIBUTIONAL DERIVATIVES OF DISCONTINUOUS FUNCTIONS

Let us start with $x \in \mathbb{R}$ and consider a function $F(x)$ that has a jump discontinuity at $x = \xi$ of magnitude $a$ but that has the derivative $F'(x)$ in the intervals $x < \xi$ and $x > \xi$. The derivative is undefined at $x = \xi$. To find the distributional derivative of $F(x)$ in the entire interval we define the function $f(x) = F(x) - aH(x-\xi)$, where $H$ is the Heaviside function. This function is continuous at $x = \xi$ and has a derivative which coincides with $F'(x)$ on both sides of $\xi$. Accordingly, we differentiate both sides of this equation and get $F'(x) = \bar{F}'(x) - a\delta(x-\xi)$, where the bar over $F$ stands for the generalized (distributional) derivative of $F$. Thus,
\[ \bar{F}'(x) = F'(x) + [F]\,\delta(x-\xi), \tag{26} \]
where $[F] = a$ is the value of the jump of $F$ at $x = \xi$. Before we present the corresponding $n$-dimensional theory, we give an interesting application of (26) to the Sturm-Liouville differential equation,
\[ \frac{d}{dx}\left(p(x)\frac{dE(x,\xi)}{dx}\right) = q(x)E(x,\xi) - \delta(x-\xi), \tag{27} \]
where $E(x,\xi)$ stands for the impulse response and $p(x)$ and $q(x)$ are continuous at $x = \xi$. When we compare (26) and (27) and use the continuity of $p(x)$ at $x = \xi$, we find that the jump of $dE/dx$ at $x = \xi$ is
\[ [dE/dx]_{x=\xi} = -1/p(\xi). \]
These concepts easily generalize to higher-order derivatives and to the surfaces of discontinuity and, therefore, have applications in the theory of wave fronts. Accordingly, we include time $t$ in our discussion and consider a function $F(x,t)$, $x \in \mathbb{R}^n$, which has a jump discontinuity across a moving surface $\Sigma(x,t)$. Such a surface can be represented locally either as an implicit equation of the form $u(x_1,\ldots,x_n,t) = 0$, or in terms of the curvilinear Gaussian coordinates $v_1,\ldots,v_{n-1}$ on the surface: $x_i = x_i(v_1,\ldots,v_{n-1},t)$. The surface is regular, so the above-mentioned functions have derivatives of all orders with respect to each of their arguments, and for all values of $t$, the corresponding Jacobian matrices of transformation have appropriate ranks, that is, $\operatorname{grad} u \ne 0$, and the rank of the matrix $(\partial x_i/\partial v_j)$ is $n-1$. Furthermore, $\Sigma$ divides the space into two parts, which we shall call positive and negative.
The basic distribution concentrated on a moving and deforming surface $\Sigma(x,t)$ is the delta function $\delta[\Sigma(x,t)]$ whose action on a test function $\phi(x,t) \in D$ is
\[ \langle\delta(\Sigma), \phi\rangle = \int_{-\infty}^{\infty}\int_{\Sigma}\phi(x,t)\,dS(x)\,dt, \tag{28} \]
where $dS$ is the surface element on $\Sigma$. This is a simple layer. The second surface distribution is the normal derivative operator, given as
P1: FLV/LPB P2: FJU Final
Encyclopedia of Physical Science and Technology EN006A-278 June 29, 2001 21:22
tions of rapid decay contains the complex-valued func- trary polynomial with constant coefficients, we find that
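$[P(d/dx)\phi]^{\wedge}(u) = P(-iu)\hat{\phi}(u)$. Similarly, from the relation $\int_{-\infty}^{\infty}(ix)^k\phi(x)e^{iux}\,dx = (d^k/du^k)\int_{-\infty}^{\infty}\phi(x)e^{iux}\,dx$, we derive the relation $[x^k\phi]^{\wedge}(u) = (-i\,d/du)^k\hat{\phi}(u)$. Because $\int_{-\infty}^{\infty}\phi(x-a)e^{iux}\,dx = e^{iua}\int_{-\infty}^{\infty}\phi(y)e^{iuy}\,dy$, we have $[\phi(x-a)]^{\wedge}(u) = e^{iau}\hat{\phi}(u)$. Similarly, $\hat{\phi}(u+a) = [e^{iax}\phi(x)]^{\wedge}(u)$ and $[\phi(ax)]^{\wedge}(u) = (1/|a|)\hat{\phi}(u/a)$. The corresponding $n$-dimensional formulas are listed below.
\[ \hat{\phi}(u) = \int_{-\infty}^{\infty} e^{iu\cdot x}\phi(x)\,dx, \tag{48} \]
\[ \phi(x) = \frac{1}{(2\pi)^n}\int_{-\infty}^{\infty} e^{-iu\cdot x}\hat{\phi}(u)\,du, \tag{49} \]
\[ [\hat{\phi}]^{\wedge}(u) = (2\pi)^n\phi(-u), \tag{50} \]
\[ [D^k\phi]^{\wedge}(u) = (-iu)^k\hat{\phi}(u), \tag{51} \]
\[ \left[P\left(\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_n}\right)\phi(x)\right]^{\wedge}(u) = P(-iu_1,\ldots,-iu_n)\hat{\phi}(u), \tag{52} \]
\[ [x^k\phi]^{\wedge}(u) = (-iD)^k\hat{\phi}(u), \tag{53} \]
\[ [P(x_1,\ldots,x_n)\phi]^{\wedge}(u) = P\left(-i\frac{\partial}{\partial u_1},\ldots,-i\frac{\partial}{\partial u_n}\right)\hat{\phi}(u), \tag{54} \]
\[ [\phi(x-a)]^{\wedge}(u) = e^{ia\cdot u}\hat{\phi}(u), \tag{55} \]
\[ \hat{\phi}(u+a) = [e^{ia\cdot x}\phi]^{\wedge}(u), \tag{56} \]
\[ [\phi(Ax)]^{\wedge}(u) = |\det A|^{-1}\hat{\phi}\left((A^T)^{-1}u\right), \tag{57} \]
where $u\cdot x = u_1x_1 + \cdots + u_nx_n$, $A$ is a nonsingular matrix, and $A^T$ is its transpose. We shall refer to the numbered formulas above for $n = 1$ also.
As a simple example we consider the function $\phi(x) = \exp(-x^2/2)$, which is clearly a member of $S$. To find its transform we first observe that it satisfies the equation $\phi'(x) + x\phi(x) = 0$. Taking Fourier transforms of both sides of this equation and using (52) and (53), we find that $(d/du)[\hat{\phi}(u)e^{u^2/2}] = 0$. Thus, $\hat{\phi}(u) = Ce^{-u^2/2}$, where $C$ is a constant. To evaluate $C$, we observe that $\hat{\phi}(0) = \int_{-\infty}^{\infty}\exp(-x^2/2)\,dx = \sqrt{2\pi}$, so that $C = \sqrt{2\pi}$ and we have $\hat{\phi}(u) = \sqrt{2\pi}\,\phi(u)$. Thus we have found a function which is, up to a constant factor, its own Fourier transform. [The multiplicative factor $\sqrt{2\pi}$ disappears if we use the factors $1/\sqrt{2\pi}$ in the definition of the transform pairs (48) and (49).]
Fourier transform of tempered distributions. Having discovered that $\hat{\phi} \in S$ when $\phi$ is, we can apply the relation $\langle\hat{t}, \phi\rangle = \langle t, \hat{\phi}\rangle$ to define the Fourier transform of the tempered distributions $t(x)$. Then all the formulas given above for $\hat{\phi}$ carry over for $\hat{t}$. For instance, $\langle[(d^k/dx^k)t(x)]^{\wedge}(u), \phi(u)\rangle = \langle d^kt/dx^k, \hat{\phi}(x)\rangle = (-1)^k\langle t(x), (d^k/dx^k)\hat{\phi}(x)\rangle = \langle t(x), [(-iu)^k\phi(u)]^{\wedge}(x)\rangle = \langle\hat{t}(u), (-iu)^k\phi(u)\rangle = \langle(-iu)^k\hat{t}(u), \phi(u)\rangle$, which shows that $[d^kt/dx^k]^{\wedge}(u) = (-iu)^k\hat{t}(u)$, which agrees with (51). Instead of writing the formulas (50)–(57) all over again for a distribution $t(x)$, we shall merely refer to them with $\phi$ replaced by $t$. Let us now give some important applications of these formulas.
1. Delta function. $\langle[\delta(x)]^{\wedge}, \phi(u)\rangle = \langle\delta(x), \int_{-\infty}^{\infty}\phi(u)e^{iux}\,du\rangle = \int_{-\infty}^{\infty}\phi(u)\,du = \langle 1, \phi(u)\rangle$. Thus $\hat{\delta}(x) = 1$ so that $\hat{1} = 2\pi\delta(x)$. In the $n$-dimensional case, the corresponding formulas are $\hat{\delta}(x) = 1$ and $\hat{1} = (2\pi)^n\delta(x)$. Incidentally, we recover formula (12) from this relation.
2. Heaviside function. We use formula (51), so that $[t'(x)]^{\wedge}(u) = -iu\,\hat{t}(u)$. Because $t(x) = H(x)$, we have $[\delta(x)]^{\wedge}(u) = -iu[H(x)]^{\wedge}(u)$ or $u[H(x)]^{\wedge}(u) = i$, whose solution follows from (25) as $[H(x)]^{\wedge}(u) = c\delta(u) + i\,\mathrm{Pf}(1/u)$. Similarly, from (57) and the above result we have $[H(-x)]^{\wedge}(u) = c\delta(u) - i\,\mathrm{Pf}(1/u)$. To find the constant $c$ we observe that $H(x) + H(-x) = 1$, whose Fourier transform is $2c\delta(u) = 2\pi\delta(u)$, so that $c = \pi$. Thus $[H(\pm x)]^{\wedge}(u) = \pi\delta(u) \pm i\,\mathrm{Pf}(1/u)$.
3. Signum function. Because $\operatorname{sgn} x = H(x) - H(-x)$, we take Fourier transforms of both sides and use Example 2 above to get $[\operatorname{sgn} x]^{\wedge}(u) = 2i\,\mathrm{Pf}(1/u)$.
4. $\mathrm{Pf}(1/x)$. We use formulas (47) for $t(x)$ and the example above. The result is $[(\operatorname{sgn} x)^{\wedge}(u)]^{\wedge}(x) = [2i\,\mathrm{Pf}(1/u)]^{\wedge}(x)$, where by (50) the left side equals $2\pi\operatorname{sgn}(-x) = -2\pi\operatorname{sgn} x$, so that
\[ \left[\mathrm{Pf}\frac{1}{x}\right]^{\wedge}(u) = i\pi\operatorname{sgn} u. \tag{58} \]
5. The function $|x|$. We write it as $|x| = xH(x) - xH(-x)$. Taking the Fourier transforms of both sides of this equation, we get
\[ [|x|]^{\wedge}(u) = [xH(x)]^{\wedge}(u) - [xH(-x)]^{\wedge}(u) \]
\[ = -i\frac{d}{du}[H(x)]^{\wedge}(u) + i\frac{d}{du}[H(-x)]^{\wedge}(u) \]
\[ = -i\frac{d}{du}\left(\pi\delta(u) + i\,\mathrm{Pf}\frac{1}{u}\right) + i\frac{d}{du}\left(\pi\delta(u) - i\,\mathrm{Pf}\frac{1}{u}\right) = 2\frac{d}{du}\mathrm{Pf}\frac{1}{u} = -\frac{2}{u^2}. \]
Thus $[|x|]^{\wedge}(u) = -2\,\mathrm{Pf}(1/u^2)$.
6. The function $1/x^2$. To find its transform we appeal to the previous example and relation (47) so that $[(|x|)^{\wedge}(u)]^{\wedge}(x) = (-2/u^2)^{\wedge}(x)$, where the left side equals $2\pi|x|$, and we get $(1/x^2)^{\wedge}(u) = -\pi|u|$.
7. Polynomial $P(x) = a_0 + a_1x + \cdots + a_nx^n$. Formula (54) gives $[P(x)t(x)]^{\wedge}(u) = P[-i(d/du)]\hat{t}(u)$. When we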
substitute $t(x) = 1$ in this formula and use Example 1, we get $[P(x)]^{\wedge}(u) = 2\pi P(-i\,d/du)\delta(u)$. Thus,
\[ [a_0 + a_1x + \cdots + a_nx^n]^{\wedge}(u) = 2\pi\left(a_0\delta(u) - ia_1\delta'(u) + \cdots + (-i)^na_n\delta^{(n)}(u)\right). \tag{59} \]
In particular, $\hat{x}(u) = -2\pi i\,\delta'(u)$, $[x^2]^{\wedge}(u) = -2\pi\delta''(u)$, $\ldots$, $[x^n]^{\wedge}(u) = (-i)^n2\pi\delta^{(n)}(u)$.
8. $\mathrm{Pf}(1/|x|)$. With the help of the definition (24) of this function we have
\[ \left\langle\left[\mathrm{Pf}\frac{1}{|x|}\right]^{\wedge}(u), \phi(u)\right\rangle = \left\langle\mathrm{Pf}\frac{1}{|x|}, \hat{\phi}(x)\right\rangle = \int_{-1}^{1}\frac{\hat{\phi}(x)-\hat{\phi}(0)}{|x|}\,dx + \int_{|x|>1}\frac{\hat{\phi}(x)}{|x|}\,dx, \]
which, after some algebraic manipulation, yields
\[ \left[\mathrm{Pf}\frac{1}{|x|}\right]^{\wedge}(u) = -2(\gamma + \ln|u|), \tag{60} \]
where $\gamma$ is Euler’s constant.
9. $\ln|x|$. For evaluation of the Fourier transform of this important function, we take the Fourier transform of both sides of (60), use formula (47), and obtain
\[ 2\pi\,\mathrm{Pf}\frac{1}{|x|} = -2\left[2\pi\gamma\,\delta(x) + [\ln|u|]^{\wedge}(x)\right] \]
and relabel. The result is
\[ [\ln|x|]^{\wedge}(u) = -\left[\pi\,\mathrm{Pf}\frac{1}{|u|} + 2\pi\gamma\,\delta(u)\right]. \tag{61} \]
In the theories of wavelets, sampling, and interpolation, we need the Fourier transforms of the square and triangular functions. We derive them in the next two examples.
10. The square function. It is defined as $f(x) = H(x + \tfrac{1}{2}) - H(x - \tfrac{1}{2})$. Thus
\[ \hat{f}(u) = \int_{-1/2}^{1/2} e^{iux}\,dx = \frac{1}{iu}\,e^{iux}\Big|_{-1/2}^{1/2} = \frac{2}{u}\sin\frac{u}{2} = \operatorname{sinc}\frac{u}{2}, \tag{62} \]
where $\operatorname{sinc} t = \sin t/t$.
11. The triangular function. It is defined as
\[ f(x) = \begin{cases} 1 - |x|, & |x| < 1, \\ 0, & |x| > 1, \end{cases} \]
so that
\[ \hat{f}(u) = \int_{-1}^{0}(1+x)e^{iux}\,dx + \int_{0}^{1}(1-x)e^{iux}\,dx. \]
By a routine computation it yields
\[ \hat{f}(u) = \left[\frac{\sin(u/2)}{u/2}\right]^2 = \operatorname{sinc}^2\frac{u}{2}. \tag{63} \]
12. Fourier transform of an integral. We have found previously that taking the Fourier transform of the derivative of a distribution $t(x)$ has the effect of multiplying $\hat{t}(u)$ by $(-iu)$. In the case of taking the Fourier transform of the integral of $t(x)$, it amounts to dividing $\hat{t}(u)$ by $(-iu)$ so that
\[ \left[\int_{a}^{x} t(s)\,ds\right]^{\wedge}(u) = \frac{i\,\hat{t}(u)}{u}. \tag{64} \]
Similarly, since the integral from $-\infty$ is the convolution of $t$ with $H$, whose transform was found in Example 2,
\[ \left[\int_{-\infty}^{x} t(s)\,ds\right]^{\wedge}(u) = \frac{i\,\hat{t}(u)}{u} + \pi\,\hat{t}(0)\delta(u). \tag{65} \]
13. Fourier transform of the convolution. The Fourier transform of the convolution $f * g$ of two locally integrable functions $f$ and $g$ is $[f*g]^{\wedge}(u) = \int_{-\infty}^{\infty}(f*g)(x)e^{iux}\,dx = \int_{-\infty}^{\infty}e^{iux}\,dx\int_{-\infty}^{\infty}f(x-y)g(y)\,dy = \int_{-\infty}^{\infty}g(y)e^{iuy}\,dy \times \int_{-\infty}^{\infty}f(x-y)e^{iu(x-y)}\,d(x-y) = \hat{g}(u)\hat{f}(u) = \hat{f}(u)\hat{g}(u)$. Thus, the Fourier transform of the convolution of two regular distributions is the product of their transforms. This relation also holds for singular distributions with slight restrictions. For instance, if at least one of these distributions has compact support, then the relation holds.

X. POISSON SUMMATION FORMULA

To derive the Poisson summation formula we first find the Fourier series of the delta function in the period $[0, 2\pi]$: $\delta(x) = \sum_{m=0}^{\infty}(a_m\cos mx + b_m\sin mx)$, where the coefficients $a_m$, $b_m$ are given by $a_0 = (1/2\pi)\int_0^{2\pi}\delta(x)\,dx = 1/(2\pi)$, $a_m = (1/\pi)\int_0^{2\pi}\delta(x)\cos mx\,dx = 1/\pi$, and $b_m = (1/\pi)\int_0^{2\pi}\delta(x)\sin mx\,dx = 0$. Thus, we have
\[ \delta(x) = \frac{1}{2\pi}\left(1 + 2\sum_{m=1}^{\infty}\cos mx\right) = \frac{1}{2\pi}\sum_{m=-\infty}^{\infty}e^{imx}. \tag{66} \]
Now we periodize $\delta(x)$ by putting the row of deltas at the points $2\pi m$ so that relation (66) can be written as
\[ \sum_{m=-\infty}^{\infty}\delta(x - 2\pi m) = \frac{1}{2\pi}\left(1 + 2\sum_{m=1}^{\infty}\cos mx\right) = \frac{1}{2\pi}\sum_{m=-\infty}^{\infty}e^{imx}. \]
When we set $x = 2\pi y$ in this relation and use the formula $\delta[2\pi(y-m)] = (1/2\pi)\delta(y-m)$ and relabel, we obtain
\[ \sum_{m=-\infty}^{\infty}\delta(x-m) = 1 + 2\sum_{m=1}^{\infty}\cos 2\pi mx = \sum_{m=-\infty}^{\infty}e^{i2\pi mx}. \tag{67} \]
The action of this formula on a function $\phi(x) \in S$ yields
\[ \sum_{m=-\infty}^{\infty}\phi(m) = \sum_{m=-\infty}^{\infty}\hat{\phi}(2\pi m), \tag{68} \]
which relates the sum of functions $\phi$ and their Fourier transforms $\hat{\phi}$ and is a very useful formula.
In the classical theory, it is necessary that both sides of relation (68) converge. Moreover, they must converge in the same interval. With the help of the theory of distributions we can obtain many variants of relation (68) which are applicable even when one or both of the series (68) are divergent. As an example, we replace $x$ by $x/\lambda$, where $\lambda$ is a nonzero real number, and use the relation $\delta[(x/\lambda) - m] = |\lambda|\delta(x - m\lambda)$. Then relation (67) becomes
\[ |\lambda|\sum_{m=-\infty}^{\infty}\delta(x - m\lambda) = \sum_{m=-\infty}^{\infty}e^{2i\pi xm/\lambda}. \tag{69} \]
When we multiply both sides of this relation by a test function $\phi(x)$ and integrate with respect to $x$, we obtain a variant of (68) as
\[ \sum_{m=-\infty}^{\infty}\phi(m\lambda) = \frac{1}{|\lambda|}\sum_{m=-\infty}^{\infty}\hat{\phi}\left(\frac{2\pi m}{\lambda}\right). \tag{70} \]
This is called the distributional Poisson summation formula. Among other things, the Poisson summation formula (70) transforms a slowly converging series to a rapidly converging series. For instance, if we take $\phi(x) = e^{-x^2}$, then $\hat{\phi}(u) = \sqrt{\pi}\,e^{-u^2/4}$, so that for $\lambda > 0$ (70) becomes
\[ \sum_{m=-\infty}^{\infty}e^{-m^2\lambda^2} = \frac{\sqrt{\pi}}{\lambda}\sum_{m=-\infty}^{\infty}e^{-m^2\pi^2/\lambda^2}. \tag{71} \]
The series on the left side of (71) converges rapidly for large $\lambda$, that on the right side for small $\lambda$.

arises from the points where the function $h(x)$ has a minimum. If $x_0$ is the only global minimum of $h(x)$, then we have the Laplace formula
\[ I(\lambda) \sim \left[\frac{2\pi}{\lambda h''(x_0)}\right]^{1/2} g(x_0)\,e^{-\lambda h(x_0)}. \tag{73} \]
If we could somehow prove that $e^{-\lambda h(x)}$ has an asymptotic series of delta functions such that
\[ e^{-\lambda h(x)} \sim \left[\frac{2\pi}{\lambda h''(x_0)}\right]^{1/2} e^{-\lambda h(x_0)}\,\delta(x - x_0), \tag{74} \]
then all that we have to do is to substitute (74) in (72) and use the sifting property of the delta function, and formula (73) follows immediately. To achieve the expansion (74) we first define the moments of a function $f(x)$. They are
\[ \mu_n = \langle f(x), x^n\rangle = \int_{-\infty}^{\infty} f(x)x^n\,dx. \tag{75} \]
The Taylor expansion of a test function $\phi(x)$ at $x = 0$ is
\[ \phi(x) = \sum_{n=0}^{\infty}\phi^{(n)}(0)\frac{x^n}{n!}. \tag{76} \]
Then it follows from (75) that
\[ \langle f(x), \phi(x)\rangle = \left\langle f(x), \sum_{n=0}^{\infty}\phi^{(n)}(0)\frac{x^n}{n!}\right\rangle = \sum_{n=0}^{\infty}\frac{\mu_n}{n!}\phi^{(n)}(0). \tag{77} \]
But $\phi^{(n)}(0) = (-1)^n\langle\delta^{(n)}(x), \phi(x)\rangle$, so that (77) becomes
\[ \langle f(x), \phi(x)\rangle = \left\langle\sum_{n=0}^{\infty}\frac{(-1)^n\mu_n\delta^{(n)}(x)}{n!}, \phi(x)\right\rangle. \tag{78} \]
Thus
\[ f(x) = \sum_{n=0}^{\infty}\frac{(-1)^n\mu_n\delta^{(n)}(x)}{n!}. \tag{79} \]
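The Gaussian instance of the Poisson summation formula can be checked numerically. The following is a minimal sketch, assuming the transform convention (48), for which $\phi(x) = e^{-x^2}$ has $\hat{\phi}(u) = \sqrt{\pi}\,e^{-u^2/4}$; the function names and the truncation parameter `terms` are illustrative choices, not part of the text.

```python
import math

def gaussian_sum_direct(lam, terms=200):
    """Left side of the Gaussian Poisson identity: sum over m of exp(-(m*lam)^2)."""
    return sum(math.exp(-(m * lam) ** 2) for m in range(-terms, terms + 1))

def gaussian_sum_dual(lam, terms=200):
    """Right side: (sqrt(pi)/lam) * sum over m of exp(-(pi*m/lam)^2)."""
    total = sum(math.exp(-(math.pi * m / lam) ** 2) for m in range(-terms, terms + 1))
    return math.sqrt(math.pi) / lam * total

def terms_needed(exponent_scale, tol=1e-15):
    """Smallest M such that exp(-(M*exponent_scale)^2) < tol (a rough cost measure)."""
    m = 1
    while math.exp(-(m * exponent_scale) ** 2) >= tol:
        m += 1
    return m

if __name__ == "__main__":
    for lam in (0.3, 1.0, 3.0):
        print(lam, gaussian_sum_direct(lam), gaussian_sum_dual(lam),
              terms_needed(lam), terms_needed(math.pi / lam))
```

For small `lam` the direct series needs many terms while the dual series needs very few, and the situation reverses for large `lam`, which is the practical content of the remark about rapid convergence.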
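$h''(x) = 2[\psi'(x)]^2 + 2\psi(x)\psi''(x)$, which, for $x = x_0$, becomes $h''(x_0) = 2[\psi'(x_0)]^2$ or
\[ \psi'(x_0) = \left[\tfrac{1}{2}h''(x_0)\right]^{1/2}. \tag{81} \]
Substituting this information about $h(x)$ in the Laplace integral (72), we obtain
\[ I(\lambda) = e^{-\lambda h(x_0)}\int_{-\infty}^{\infty}e^{-\lambda[\psi(x)]^2}g(x)\,dx. \tag{82} \]
The next step is to set $u = \psi(x)$, which gives $dx = du/\psi'(x)$ so that (82) becomes
\[ I(\lambda) = e^{-\lambda h(x_0)}\int_{-\infty}^{\infty}e^{-\lambda u^2}\frac{g(u)}{\psi'(x_0)}\,du, \tag{83} \]
so we need the asymptotic expansion of $e^{-\lambda u^2}$ from for-

The oscillatory integral
\[ I(\lambda) = \int_{-\infty}^{\infty}e^{i\lambda h(x)}g(x)\,dx, \quad \lambda \to \infty, \tag{87} \]
can also be processed in the same manner as the Laplace integral. Indeed, the steps leading from (87) to the relation
\[ I(\lambda) = e^{i\lambda h(x_0)}\int_{-\infty}^{\infty}e^{i\lambda u^2}\frac{g(u)}{\psi'(x_0)}\,du \tag{88} \]
are almost the same as those from (72) to (83). The difference arises in the value of the moments $\mu_n$, which now are
\[ \mu_n = \int_{-\infty}^{\infty}e^{i\lambda u^2}u^n\,du = \begin{cases} \Gamma\left(\frac{n+1}{2}\right)\lambda^{-(n+1)/2}\,e^{\pi i(n+1)/4}, & n \text{ even}, \\ 0, & n \text{ odd}. \end{cases} \]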
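The leading-order Laplace formula, $I(\lambda) \sim g(x_0)e^{-\lambda h(x_0)}[2\pi/(\lambda h''(x_0))]^{1/2}$, can be compared against direct quadrature. The sketch below is only an illustration under stated assumptions: the test functions $h(x) = \cosh(x-1)$ and $g(x) = 1/(1+x^2)$, the integration window, and all function names are hypothetical choices made here, and a plain trapezoid rule stands in for any more careful quadrature.

```python
import math

def laplace_integral(h, g, lam, a=-10.0, b=10.0, steps=40_000):
    """Trapezoid-rule estimate of the integral of exp(-lam*h(x)) * g(x) over [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (math.exp(-lam * h(a)) * g(a) + math.exp(-lam * h(b)) * g(b))
    for k in range(1, steps):
        x = a + k * dx
        total += math.exp(-lam * h(x)) * g(x)
    return total * dx

def laplace_leading_term(h0, h2, g0, lam):
    """Leading term g(x0) * exp(-lam*h(x0)) * sqrt(2*pi / (lam*h''(x0)))."""
    return g0 * math.exp(-lam * h0) * math.sqrt(2.0 * math.pi / (lam * h2))

def h(x):
    # Illustrative phase: unique global minimum at x0 = 1, h(1) = 1, h''(1) = 1.
    return math.cosh(x - 1.0)

def g(x):
    # Illustrative amplitude: g(1) = 0.5.
    return 1.0 / (1.0 + x * x)

def ratio(lam):
    """Quadrature value divided by the leading Laplace term; tends to 1 as lam grows."""
    return laplace_integral(h, g, lam) / laplace_leading_term(1.0, 1.0, 0.5, lam)

if __name__ == "__main__":
    for lam in (50.0, 400.0):
        print(lam, ratio(lam))
```

The ratio approaches 1 with a correction of order $1/\lambda$, consistent with the formula being the first term of an asymptotic series rather than an exact identity.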
Graph Theory
Ralph Faudree
University of Memphis
I. Introduction
II. Connectedness
III. Trees
IV. Eulerian and Hamiltonian Graphs
V. Colorings of Graphs
VI. Planar Graphs
VII. Factorization Theory
VIII. Graph Reconstruction
IX. Extremal Theory
X. Directed graphs
XI. Networks
XII. Random Graphs
P1: GPA/GHE P2: FYK/LSK QC: FYK Final Pages
Encyclopedia of Physical Science and Technology en007b-296 June 30, 2001 16:40
FIGURE 2
obvious generalizations to sets of vertices or edges. Also, if e is an edge not in E, then G + e is the supergraph of G with the same vertex set as G but with e added to the edge set. The concepts of subgraph and supergraph are illustrated in Fig. 2, with G being the graph pictured in Fig. 1.
Two graphs G and H are identical (written as G = H) if V(G) = V(H) and E(G) = E(H). However, it is possible for graphs to have the same appearance if they are appropriately labeled but yet not be identical. Two graphs G and H are said to be isomorphic (written as G ≅ H) if there is a one-to-one mapping (called an isomorphism) θ of V(G) onto V(H) such that edge uv is in E(G) if and only if the edge θ(u)θ(v) is in E(H). The graph G pictured in Fig. 3 is isomorphic to the graph H of Fig. 3; in fact, the map θ defined by θ(u) = a, θ(v) = d, θ(w) = f, θ(x) = b and θ(y) = c is an isomorphism from G onto H.
Some classes of graphs occur so frequently that special names are given to them. A graph of order n in which every pair of its vertices are adjacent is called a complete graph and is denoted by K_n. Each vertex in K_n has degree n − 1 and the size of the graph is $\binom{n}{2} = n(n-1)/2$. The complement Ḡ of a graph G is the graph with vertex set V(G) such that two vertices are adjacent in Ḡ if and only if they are not adjacent in G. Thus K̄_n is a graph with no edges. A bipartite graph is one in which the vertices can be partitioned into two sets, say A and B, such that every edge of the graph joins a vertex in A and a vertex in B. The sets A and B are called the parts of the bipartite graph. If every vertex in A is joined to every vertex in B, then the bipartite graph is called a complete bipartite graph, and is denoted by K_{m,n} if |A| = m and |B| = n. Thus, K_{m,n} has order m + n and size mn. The special complete bipartite graph K_{1,n} is called a star of size n (and order n + 1). A collection of vertices of a graph are said to be independent if there are no edges joining them. Thus, the vertices of the complement of a complete graph and the vertices in the parts of a bipartite graph are independent.
If all of the vertices of a graph have the same degree r, then the graph is said to be r-regular or just regular. The complete graph K_{r+1} and the complete bipartite graph K_{r,r} are examples of r-regular graphs. There are many other special graphs, but we conclude with the graph C_n, a cycle of length n. This is a connected graph of order n which is 2-regular. The graph C_5 is pictured in Fig. 4 along with other examples of complete graphs, bipartite graphs, and their complements.
Up to this point only one form of a “graph” has been discussed, and that is what is sometimes called a simple graph. No edges of the form vv, which are called loops, were allowed. Also, any pair of vertices in a graph were joined by at most one edge, not multiple edges. If loops and multiple edges are allowed, the structure is called a multigraph. This definition of a multigraph is not completely standard because loops are sometimes not allowed. Figure 5a shows an example of a multigraph which has multiple edges joining vertices u and x and loops at the vertex v. All edges have to this point been assumed to have no direction, so that the edge uv is the same as the edge vu. Sometimes it is appropriate for each edge to have a direction, and such directed edges will be called arcs. Therefore each arc is an ordered pair of distinct vertices, and the arc uv is not the same as the arc vu. Such structures are called directed graphs or digraphs. An example of a digraph is given in Fig. 5b. All of these structures and some additional variations will be considered later.
There are many ways to describe or determine a particular graph. So far, we have generally described graphs by using the definition and listing the vertices and edges of the graph. In a few cases, a picture or drawing of the
called the components of G. The components are the maximal connected subgraphs of a graph. For example, the graph G of Fig. 1 has two components, one with the four vertices {u, v, x, y} and the other with the single vertex w. Associated with any pair of vertices u and v of a graph G is the distance d_G(u, v) [or just d(u, v) when the graph G is obvious], which is the length of a shortest path from u to v. A natural and interesting problem is the determination of a good procedure or algorithm for finding d(u, v). A brute-force approach of checking all possible paths from u to v would not be very satisfactory, since the number of such paths could be quite large; in fact, it can be of order of magnitude (n − 2)! for a graph G of order n.
A good “shortest-path algorithm” was discovered by Dijkstra and determines the distance from a fixed vertex of a graph G of order n to each of the remaining vertices, and it works in order of magnitude n². Also, this algorithm will handle the more general structure of a weighted graph G in which each edge e of G is assigned a real number w(e), called its weight. For a weighted graph, the length of a path is the sum of the weights of the edges of the path. It should be noted that any graph G can be considered as a weighted graph by assigning weight 1 to each edge of G. The algorithm of Dijkstra is of greedy design and uses a breadth-first search of the graph. It starts by finding the vertex in G which is “closest” to a fixed vertex v. At each step the “next closest” vertex is found using the information about the distance from v to vertices whose distances from v have already been determined. Each of the n steps involves at most n comparisons and n additions of real numbers, and thus n² is an upper bound on the order of magnitude of the algorithm.
The “shortest-path algorithm” can be used to determine the components of a graph. All vertices in the same component of a fixed vertex v of a graph would have finite distance from v, while the remaining vertices would have infinite distance. Although there are more efficient methods for finding the components of a graph, the shortest path algorithm is a good one for this purpose.
Connectedness in a graph which is being used as a model for a transportation or communication system is clearly desirable. However, connectedness alone might not be sufficient. If the connected graph G which represented a communication network had a vertex v such that G − v was disconnected (v is called a cut-vertex of G), then it would be critical that no failure occur at v, for this could result in a failure of the entire system. Thus, we need some measure of the extent of the connectedness of a graph. With that objective in mind we introduce some additional concepts.
If S is a collection of vertices of a graph G, then G − S is the graph obtained from G by deleting the vertices of S. If G is connected but G − S is disconnected, then S is called a cut-set, and the set S is said to separate vertices x and y if they are in different components of G − S. The connectivity (sometimes called vertex connectivity) κ(G) of a graph G is the minimum number of vertices in a cut-set of G. The only graphs without cut-sets are complete graphs, and there the connectivity is one less than the order of the complete graph. A graph G is k-connected if κ(G) ≥ k. Thus, for example, a graph of order at least 3 and with no cut-vertices is 2-connected. It is easily checked that κ(P_n) = 1, κ(C_n) = 2, κ(K_n) = n − 1, and κ(K_{m,n}) = n when m ≥ n. One would expect that as the connectivity of a graph becomes larger, there would be an increase in the number of “alternative” paths between pairs of vertices. The following results verify this expectation.
Theorem 2.1 (Menger): For distinct nonadjacent vertices u and v of a graph, the maximum number of internally disjoint (vertex disjoint except for u and v) paths between u and v is equal to the minimum number of vertices that separate u and v.
Theorem 2.2 (Menger-Whitney): A graph is k-connected if and only if for each pair of distinct vertices of the graph there are at least k internally vertex disjoint paths between them.
Vertex connectivity has an analog, edge connectivity, denoted by κ_1(G). Each of the previous definitions and results concerning vertices has a natural edge analog. For example, if e is an edge of a connected graph G and the graph G − e (the graph obtained from G by deleting the edge e) is disconnected, then e is called a cut-edge. An edge-cut-set is a collection of edges which, when deleted, disconnect the graph, and the edge connectivity is the minimum number of edges in an edge-cut-set. There are results analogous to those of Menger and Whitney which also say that high edge connectivity is equivalent to the existence of many edge disjoint paths between any pair of vertices.

III. TREES

A tree is usually defined as a connected graph which contains no cycles. However, this very useful class of graphs has many other characterizations. The following theorem gives five equivalent statements, each of which could be used as a definition of a tree.
Theorem 3.1: The following statements are equivalent for a graph of order n:
(i) G is connected and has no cycles.
(ii) G is connected and has size n − 1.
(iii) G has no cycles and has size n − 1.
(iv) G is a graph in which every edge is a cut-edge.
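Returning to the shortest-path procedure of Section II, the order-n² version of Dijkstra's algorithm and its use for finding a component can be sketched as follows. This is a minimal illustration, not a reference implementation: the adjacency-dictionary representation, the function names, and the example edge sets are assumptions made here (the edges of Fig. 1 are not listed in the text), and all weights are assumed to be non-negative.

```python
import math

def dijkstra(weights, source):
    """Distances from source in a weighted graph given as {vertex: {neighbor: weight}}.

    Each of the at most n rounds scans all vertices for the closest unfinished
    one and then relaxes its edges, giving order n^2 work overall.
    Unreachable vertices keep distance math.inf.
    """
    dist = {v: math.inf for v in weights}
    dist[source] = 0.0
    finished = set()
    for _ in range(len(weights)):
        # Greedy step: pick the unfinished vertex that is currently closest.
        candidates = [v for v in weights if v not in finished]
        u = min(candidates, key=lambda v: dist[v])
        if dist[u] == math.inf:
            break  # every remaining vertex lies in another component
        finished.add(u)
        for v, w in weights[u].items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

def component_of(weights, v):
    """Vertices at finite distance from v, i.e. the component containing v."""
    return {x for x, d in dijkstra(weights, v).items() if d < math.inf}

# Hypothetical unweighted graph (every edge has weight 1) with components
# {u, v, x, y} and {w}, in the spirit of the example described for Fig. 1.
example = {
    "u": {"v": 1, "y": 1},
    "v": {"u": 1, "x": 1},
    "x": {"v": 1, "y": 1},
    "y": {"u": 1, "x": 1},
    "w": {},
}

# Hypothetical weighted graph: the shortest a-d path is a-c-b-d of length 8.
weighted = {
    "a": {"b": 4, "c": 1},
    "b": {"a": 4, "c": 2, "d": 5},
    "c": {"a": 1, "b": 2, "d": 8},
    "d": {"b": 5, "c": 8},
}

if __name__ == "__main__":
    print(dijkstra(example, "u"))
    print(component_of(example, "u"))
    print(dijkstra(weighted, "a"))
```

Scanning all vertices for the minimum in each round keeps the sketch close to the n-steps-of-n-comparisons description; a priority queue would be the usual refinement for sparse graphs.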
Euler showed that the graph G of Fig. 8b has no eulerian trail. For a graph to have such a trail, it is clear that the graph must be connected and that each vertex, except for possibly the first and last vertex of the trail, must have even degree. These conditions are also sufficient, as the following result states.
Theorem 4.1 (Euler): Let G be a connected graph (multigraph). Then, G has a closed eulerian trail if and only if each vertex has even degree, and G has an “open” eulerian trail if and only if there are precisely two vertices of odd degree.
A consequence of Theorem 1.1 is that a graph has an even number of vertices of odd degree. There is a useful immediate corollary of Theorem 4.1. If a connected graph G has 2k vertices of odd degree, then the edges of G can be “covered” with k trails, and this is the minimum number of trails which will suffice. This observation is the basis of many puzzles and games.
Also related to eulerian graphs is the Chinese postman problem, which is to determine the shortest closed walk that contains all of the edges in a connected graph G. Such a walk is called, for obvious reasons, a postman’s walk. If G has size m, then the postman’s walk will have length m if and only if G is eulerian. At the other extreme, this shortest walk will have length 2m if and only if G is a tree. There is the obvious extension of the Chinese postman problem to weighted graphs and minimizing the sum of the weights along the postman’s walk. There are several good algorithms for solving this problem.
A graph G is hamiltonian if it contains a spanning cycle, and the spanning cycle is called a hamiltonian cycle. The name is derived from the mathematician Sir William Rowan Hamilton, who in 1857 introduced a game whose object was to form such a cycle. In Euler’s problem the object was to visit each of the edges exactly once. In the hamiltonian case the object is to visit each of the vertices exactly once, so the problems seem closely related. However, in sharp contrast to the eulerian case, there are no known necessary and sufficient conditions for a graph to be hamiltonian, and the problem of finding such conditions is considered to be very difficult. There are numerous sufficient conditions for the existence of a hamiltonian cycle and a few necessary conditions. The following is an example of one of the better-known sufficient conditions.
Theorem 4.2 (Ore): If for each pair of nonadjacent vertices u and v of a graph G of order n ≥ 3, d(u) + d(v) ≥ n, then G is hamiltonian.
A traveling salesman wishes to visit all of the cities on his route precisely one time and return to his home city in the smallest possible time. The traveling salesman problem is to determine the route which will minimize the time (distance) of the trip. Another version of the same problem is presented by a robot that is tightening screws on a piece of equipment on an assembly line. An order for tightening the screws should be determined so that the distance traveled by the arm of the robot is minimized. The corresponding graph problem in both cases is to determine a minimum-weight hamiltonian cycle in a complete graph, with weights assigned to each edge. The weight assigned to an edge would represent the time or cost of that edge. A brute-force approach of examining all possible hamiltonian cycles could be quite expensive, since there are (n − 1)!/2 possibilities in a complete graph of order n. Although there are good solutions for special classes of graphs, no good algorithm is known for determining such a hamiltonian cycle in the general case; in fact, the traveling salesman problem is known to be NP-complete. This means that it is not known if a good algorithm exists, but the existence of a good algorithm to solve this problem would imply the existence of good algorithms to solve many other outstanding problems, such as the graph isomorphism problem. Some of these problems will be mentioned in later sections. Although there is no known good algorithm which always gives a minimum solution, there are procedures which give reasonable solutions most of the time.

V. COLORINGS OF GRAPHS

A coloring (sometimes called a vertex coloring) of the vertices of a graph is an assignment of one color to each vertex. If k colors are used, it is called a k-coloring, and if adjacent vertices are given different colors, it is a proper coloring. The minimum number of colors in a proper coloring of a graph G is called the (vertex) chromatic number of G and is denoted by χ(G). The chromatic number of many special graphs is easy to determine. For example, χ(K_n) = n, χ(C_n) = 3 if n is odd, and χ(B) = 2 for any bipartite graph B with at least one edge. Therefore, all paths, all cycles of even length, and all trees have chromatic number 2, since they are bipartite. In general, the chromatic number of a graph is difficult to determine; in fact, from an algorithmic point of view it is an NP-complete problem just like the traveling salesman problem.
In a proper coloring of a graph the set of all vertices assigned the same color must be independent. A proper k-coloring is thus a way of partitioning the vertices V of a graph G into k independent sets, and thus, the chromatic number is the minimum number of independent sets in such a partition.
Consider the problem of storing n chemicals (or other objects) when certain pairs of these chemicals cannot be
stored in the same building. There is a natural model as- different colors. An upper bound is given by the following
sociated with this problem, which is a graph G of order result.
n. The vertices are the chemicals, and the edges are pairs
Theorem 5.3 (Vizing): For any graph G, χ1(G) = Δ(G)
of objects that cannot be stored together. The chromatic
or Δ(G) + 1.
number χ (G) is the minimum number of buildings needed
to safely store the objects, since the buildings partition The theorem of Vizing implies that a graph G falls
the objects into sets corresponding to independent sets of into one of two categories, depending on whether
vertices. χ1(G) = Δ(G) or not. “Most” graphs are in the first cate-
The chromatic number χ(G) is related to other graphi- gory (χ1 = Δ). In particular, all bipartite graphs are in this
cal parameters which are easy to determine, such as Δ(G), category. If n is odd, then the cycle Cn has edge chromatic
the maximum degree of a vertex of G. The maximum de- number 3, but maximum degree 2, so there are graphs
gree gives an upper bound on the chromatic number. in the second category. No characterization for graphs in
either of these two categories is known.
Theorem 5.1 (Brooks): For any graph G, χ (G) ≤
There are two scheduling problems that can be consid-
Δ(G) + 1, and if G is a connected graph which is not
ered as graph coloring problems. We briefly describe each
an odd cycle or a complete graph, then χ(G) ≤ Δ(G).
of the problems along with a solution. The first problem in-
A maximal complete subgraph of a graph is called a volves a vertex coloring and the second problem involves
clique. If a graph G contains a clique of order m, then an edge coloring.
clearly χ (G) ≥ m. However, the determination of the order A schedule of classes is to be determined that will accom-
of the largest clique in a graph is also an NP-complete modate the requests of a group of students. If each class is
problem. In addition, the chromatic number of a graph offered at a different time, then each student will be able
can be very large without the graph even containing a to take the classes he or she wanted. This could result in
complete graph with three vertices or any small cycle. a large number of class periods, with some being at very
There is no relation between χ (G) and the largest clique of undesirable times. The problem is to determine the mini-
a graph G that would simplify the problem of determining mum number of time periods in which the classes can be
χ(G). The girth of a graph G is the length of a shortest scheduled so that each student will get the schedule he or
cycle in the graph. The following result indicates that a she requested. Consider a graph G with the vertices being
graph can be very sparse and still have a large chromatic the classes to be scheduled. Two classes are joined by an
number. edge if some student would like to take both classes. If
two classes are scheduled at the same time, then no stu-
Theorem 5.2 (Erdős, Lovász): For any pair of positive
dent requested to take both of these classes, so such classes
integers k and m, there is a graph with chromatic number
must be independent in the graph G. This suggests that the
at least k and girth at least m.
chromatic number χ (G) is the minimum number of time
This result is interesting, but the proof technique that periods needed for the scheduling. This fact is not difficult
was used may be of even more interest. The probabilistic to verify.
method, which was introduced by Paul Erdős, is the basis A similar scheduling problem deals with the scheduling
for the proof. It its simplest form, this powerful technique of teachers for classes. A collection of teachers T are to
merely counts the number of graphs which do not satisfy be scheduled to teach a set of classes C. The number of
a certain property and verifies that this number is less than sections of each class for which each teacher is responsible
the total number of graphs. Therefore, there must be at is known. The problem is to determine a schedule which
least one graph with the desired property. This results in a uses a minimum number of time periods. The assumption
proof of the existence of a graph, but it does not necessarily is made that, in any period, a teacher can teach only one
exhibit such a graph. class and each class is taught by only one teacher. Consider
There is an edge analog to the vertex coloring of a graph. the bipartite graph B (actually it is a bipartite multigraph
An assignment of a color to each edge of a graph is called without loops) with T as the vertices in one part and C as
an edge coloring, and it is called a k-edge coloring if at the vertices in the other part. If a teacher t in T is scheduled
most k colors are used. A proper edge coloring is one in to teach m sections of a class c in C, then place m edges
which adjacent edges are assigned different colors. The between the vertex t and the vertex c. The construction of
edge chromatic number χ1 (G) of a graph G is the mini- a schedule is really a proper edge coloring of the bipartite
mum k for which there is a proper k-edge coloring of G. graph B using the time periods as colors. This observation
It should be noted that this concept applies to any multi- can be used to verify that the minimum number of periods
graph without loops. A lower bound for χ1 (G) is (G), that must be used in the scheduling is the edge chromatic
since each of the edges incident to a fixed vertex must have number χ1 (B), and this is the maximum degree (B).
Theorem 7.1 (Petersen): A graph has a 2-factorization if and only if it is m-regular with m
even.

A similar characterization for 1-factorizations in bipartite graphs exists.

Theorem 7.2 (Kőnig): A bipartite graph has a 1-factorization if and only if it is regular.

Perfect matchings of graphs have been investigated more extensively than any other factors.
Fortunately, there is a very useful characterization of bipartite graphs which have perfect
matchings. If G is a bipartite graph with parts A and B, then a matching M is said to saturate
A if each vertex in A is incident to an edge of M. Thus, if |A| = |B|, any matching which
saturates A is a perfect matching. Also, if M is a matching which saturates A, then for any
subset S of A, the neighborhood N(S) of S (vertices not in S which are adjacent to some vertex
of S) must have at least as many vertices as S, because the matching M alone implies this.
This condition is also sufficient to imply the existence of a matching which saturates A.

Theorem 7.3 (Philip Hall): If G is a bipartite graph with parts A and B, then there is a
matching which saturates A if and only if |N(S)| ≥ |S| for all S ⊆ A.

A characterization of all graphs which have perfect matchings also exists, but it is more
complicated. If S is a separating set for a graph G, then G − S may have several components.
Any component C with an odd number of vertices (called an odd component) cannot within itself
have a perfect matching. So, any perfect matching of G would have at least one edge joining a
vertex of C with a vertex of S. Thus, if G has a perfect matching, the number of odd
components cannot exceed the number of vertices in S. This condition on the number of odd
components is also sufficient for the existence of a perfect matching.

Theorem 7.4 (Tutte): A graph G has a perfect matching if and only if for each S ⊆ V(G) the
number of odd components in G − S does not exceed |S|.

Although both Theorem 7.3 and Theorem 7.4 are useful results, they cannot be applied directly
to obtain good algorithms for finding perfect matchings (or maximum matchings) in graphs. This
is true because there are 2^n different subsets in a set with n elements. However, good
algorithms do exist for finding maximum matchings in graphs. One such algorithm for bipartite
graphs uses the idea of an alternating path. If M is a matching of a graph G, then an
M-alternating path is a path whose edges alternate between edges in M and edges not in M. If
the first and last vertices of the path are not incident to edges of M, then the path is an
M-augmenting path. If P is an M-augmenting path, then replacing the edges of P which are in M
with the edges of P not in M will give a larger matching in G. Thus a maximum matching M never
has an M-augmenting path. Berge showed that this was a necessary and sufficient condition for
a matching to be maximum.

The algorithm based on this result searches for augmenting paths by constructing a tree rooted
at a vertex which is not saturated by the matching. Also, all of the paths in this tree are
alternating. Either this tree will give an augmenting path, which will give a larger matching,
or the matching will be a maximum matching. The condition of Hall can be shown to be violated
if the maximum matching is not a perfect matching.

There are many applications of matchings in graphs. An obvious one is the assignment problem.
Assume there are n workers and n jobs, and each worker can perform certain of the jobs. The
assignment problem is to determine if it is possible to assign each person a job (two workers
cannot be assigned the same job), and if so, make the assignment. The graph model is a
bipartite graph with the workers in one part and the jobs in the other part. An edge is placed
between a worker and any job the worker can perform. The assignment problem reduces to finding
a maximum matching in this bipartite graph. If this matching is a perfect matching, then an
assignment can be made. Otherwise, it is not possible. Since there is a good algorithm for
finding maximum matchings in bipartite graphs, there is a good algorithm for solving the
assignment problem.

The assignment problem can also be considered when the number of jobs and the number of
workers are not the same. For example, when the number of workers exceeds the number of jobs,
the objective is to assign all of the jobs. This will unfortunately leave some of the workers
unemployed. The graph model is still the same bipartite graph, and the objective is to find a
matching which saturates the vertices associated with the jobs. The same algorithm used when
the number of jobs is the same as the number of workers applies in this case.

A generalization of the assignment problem is the optimal assignment problem. Here there are
also n workers and
n jobs. However, in this case, the ith worker can perform the jth job with some efficiency
c_ij. The problem is to assign each worker a job such that the sum of the efficiencies is a
maximum. Again, a brute-force approach of examining all possible assignments is not efficient,
since there are n! such possibilities. However, there is a good algorithm for solving the
optimal assignment problem which utilizes finding maximum matchings in a series of
appropriately defined bipartite graphs. Associated with a given assignment of jobs there is a
bipartite graph. A maximum matching is determined in this bipartite graph. If the maximum
matching is not a perfect matching, then it can be used to obtain a better assignment which
determines a new bipartite graph. If the matching is perfect, then the assignment is optimal
and the algorithm terminates.

VIII. GRAPH RECONSTRUCTION

A famous unsolved problem in graph theory is the Kelly-Ulam conjecture. A graph G of order n
is reconstructible if it is uniquely determined by its n subgraphs G − v for v ∈ V(G). In
Fig. 13 there is an example of the four graphs obtained from single vertex deletions of a
graph of order 4, and the graph they uniquely determine.

FIGURE 13 Graph reconstruction.

Reconstruction Conjecture (Kelly-Ulam): Any graph of order at least 3 is reconstructible.

The initial but equivalent formulation of the conjecture involved two graphs. If G and H are
graphs with V(G) = {u_1, u_2, ..., u_n} and V(H) = {v_1, v_2, ..., v_n}, and if
G − u_i ≅ H − v_i for 1 ≤ i ≤ n, then G ≅ H. Note that to say that a graph G is
reconstructible does not mean that there is a good algorithm which will construct the graph G
from the graphs G − v for v ∈ V. A positive solution to the conjecture might still leave open
the question of the complexity of algorithms that would generate a solution to the problem.

Although it is not known in general if a graph is reconstructible, certain properties and
parameters of the graph are reconstructible. It is straightforward to reconstruct from the
vertex-deleted subgraphs both the size of a graph and the degree of each vertex. Let G be a
graph of size q with vertices {v_1, v_2, ..., v_p}, and for each i let q_i be the size of the
graph G − v_i. Each edge in G would appear in precisely p − 2 of the vertex-deleted subgraphs,
hence

\[ q = \frac{1}{p-2} \sum_{i=1}^{p} q_i . \]

Also, clearly the vertex v_i has degree q − q_i. An immediate consequence of these facts is
that any regular graph is reconstructible.

Several properties dealing with the connectedness of a graph are reconstructible, including
the number of components of the graph. A subgraph of a graph is a block if it is a maximal
2-connected subgraph. The blocks of a graph partition the edges of a graph, and the only
vertices that are in more than one block are the cut-vertices. If a graph has at least two
blocks, then the blocks of the graph can also be determined. However, this does not mean the
graph can be reconstructed from the blocks.

There are many special classes of graphs which are reconstructible, but we list only three
well-known classes.

Theorem 8.1: The following classes of graphs are reconstructible:

(i) Disconnected graphs
(ii) Trees
(iii) Regular graphs

Corresponding to the "vertex" reconstruction conjecture is an edge reconstruction conjecture,
which states that a graph G of size m ≥ 4 is uniquely determined by the m subgraphs G − e for
e ∈ E(G). Such a graph is said to be edge-reconstructible. Just as in the vertex case, the
edge conjecture is open. The two conjectures are related, as the following result indicates.

Theorem 8.2 (Greenwell): If a graph with at least four edges and no isolated vertices is
reconstructible, then it is edge-reconstructible.

Theorem 8.2 implies that trees, regular graphs, and disconnected graphs with two nontrivial
components are edge-reconstructible. There are also results which show that graphs with "many"
edges are edge-reconstructible. For example, Lovász has shown that if a graph G has order n
and size m with m ≥ n(n − 1)/4, then G is edge-reconstructible.

Intuitively, the edge-reconstruction conjecture is weaker than the reconstruction conjecture.
This is confirmed by Theorem 8.2. However, there is another way of relating the two
conjectures. Associated with each graph G is the line graph L(G) of G. The vertices of L(G)
are the edges of G, and two vertices of L(G) (which are edges of G) are adjacent in L(G) if
and only if they are adjacent edges in G. The following result relates reconstruction and edge
reconstruction.

Theorem 8.3 (Harary, Hemminger, Palmer): A graph with size at least four is
edge-reconstructible if and only if its line graph is reconstructible.
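As an illustration of the line graph definition above, the following minimal sketch builds L(G) for a simple graph given as a list of edges. The function name `line_graph` and the two small inputs are hypothetical choices made for the example; it assumes a simple graph with no loops or repeated edges.

```python
from itertools import combinations


def line_graph(edges):
    """Return (vertices, adjacencies) of the line graph L(G) of a simple graph G.

    Each vertex of L(G) is an edge of G, stored as a frozenset of its two ends.
    Two such vertices are adjacent when the corresponding edges of G share an end.
    """
    vertices = [frozenset(e) for e in edges]
    adjacencies = {(e, f) for e, f in combinations(vertices, 2) if e & f}
    return vertices, adjacencies


# The star K_{1,3}: three edges meeting at vertex 0.
star_vertices, star_adj = line_graph([(0, 1), (0, 2), (0, 3)])

# The cycle C_4.
cycle_vertices, cycle_adj = line_graph([(0, 1), (1, 2), (2, 3), (3, 0)])
```

For the star, every pair of edges shares the center, so the three vertices of the line graph are pairwise adjacent and form a triangle. For the cycle, each edge meets exactly two others, so the line graph is again a cycle of the same length.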
The line graphs of some special classes of graphs are easy to determine. For example, the line
graph of a star K_{1,n} is K_n, a complete graph, and the line graph of a cycle C_n is the
cycle C_n of the same length. Therefore, the graphs K_3 and K_{1,3} have isomorphic line
graphs, namely, K_3. With this one exception, the line graphs of nonisomorphic connected
graphs are also nonisomorphic. However, this does not imply that every graph is the line graph
of some graph. In fact, there are numerous characterizations of line graphs. In particular, no
graph which has an induced subgraph isomorphic to K_{1,3} can be the line graph of a graph.

Since not every graph is the line graph of some graph, Theorem 8.3 does not imply that the
edge reconstruction conjecture and the vertex reconstruction conjecture are equivalent.

IX. EXTREMAL THEORY

There are a large number of optimization problems and results in graph theory which could be
considered in a discussion of extremal graph theory. However, we will restrict our
consideration to the Turán-type extremal problem of determining the maximum number of edges a
graph can have without containing a certain subgraph. For example, consider the problem of
determining how many edges there can be in a graph of order n which does not contain a K_3
(triangle). For n even, the complete bipartite graph K_{n/2,n/2} has n^2/4 edges and no
triangle, and, in fact, no odd cycle of any length. Also, any graph with n vertices and
(n^2/4) + 1 edges (for n even) can be shown to contain a triangle. Thus, when n is even, the
maximum number of edges in a triangle-free graph of order n is n^2/4.

The original problem of Turán, who initiated this kind of investigation, was to determine the
maximum number of edges a graph of order n can have without containing a complete graph K_p.
The case p = 3 was just considered. The general case, while more complicated, is entirely
parallel.

Multipartite graphs are a class of graphs which occur in this type of study. A graph G is
called a k-partite graph if the vertices V can be partitioned into k parts such that the only
edges in G join vertices in different parts. Thus, bipartite graphs are simply 2-partite
graphs. If all edges between each pair of the k parts are in the graph, the graph is called a
complete k-partite graph. In any k-partite graph the vertices in each part are independent,
and if the graph is a complete k-partite graph then the chromatic number of the graph is
clearly k. Let T(n, k) denote the complete k-partite graph of order n in which the difference
in the number of vertices in any pair of parts is at most 1. That is, if n = km + r with
0 ≤ r < k, there will be r parts with m + 1 vertices and the remaining k − r parts will have m
vertices. Let t(n, k) denote the size of the graph T(n, k). Turán proved the following result.

Theorem 9.1 (Turán): The maximum number of edges in a graph of order n which does not contain
a K_p is t(n, p − 1), and the graph T(n, p − 1) is the only graph of that order and size with
no K_p.

For a fixed graph F, which we will call the forbidden graph, the extremal number ex(n, F) is
the maximum size of a graph of order n which does not contain F as a subgraph. The graphs of
order n and size ex(n, F) which do not contain F are called the extremal graphs for F, and the
collection of them is denoted by Ex(n, F). The extremal problem consists of finding ex(n, F)
and also Ex(n, F), if possible. In the case when the forbidden graph is K_p, the result of
Turán states that ex(n, K_p) = t(n, p − 1) and Ex(n, K_p) = {T(n, p − 1)}. Note that

\[ t(n, p-1) = \frac{p-2}{p-1} \cdot \frac{n^2}{2} \]

when n is a multiple of p − 1 and is close to that for all other values of n. If p is thought
of as the chromatic number of the complete graph K_p, then the extremal result for an
arbitrary graph F is closely related to the result for the complete graph. In fact, the
chromatic number χ(F) determines the order of magnitude of the extremal number if the graph is
not bipartite. In the following result, an "error function" is needed. Denote by o(n^2) a
function of n with the property that lim_{n→∞} o(n^2)/n^2 = 0.

Theorem 9.2 (Erdős-Simonovits): For any nontrivial graph F with chromatic number k ≥ 3,

\[ \mathrm{ex}(n, F) = \frac{k-2}{k-1} \binom{n}{2} + o(n^2). \]

Also, any graph in Ex(n, F) can be obtained from T(n, k − 1) by deleting or adding at most
o(n^2) edges.

If F is not a bipartite graph, then the first term in the expression for ex(n, F) in
Theorem 9.2 is of order n^2, so the error function o(n^2) becomes insignificant for
sufficiently large values of n. However, when F is a bipartite graph, this first term is 0,
and thus Theorem 9.2 gives little information. The extremal problem for a bipartite graph is
called the degenerate extremal problem. In this case, there are many interesting open
questions, and no general asymptotic result like Theorem 9.2 is known. Examples which give
lower bounds for the number ex(n, F) are difficult to find when F is a bipartite graph, and in
many cases involve designs. We mention two results which give upper bounds. Sharp lower bounds
are not known for either of the classes of graphs considered
below, except for a few small-order cases such as C_4, C_6, C_10, and K_{3,3}.

Theorem 9.3 (Kővári-Sós-Turán): If r ≤ s, then there exists a constant c_rs (depending only on
r and s) such that

TABLE I Ramsey Numbers r(K_m, K_n)
m\n     3     4     5     6     7     8     9
3       6     9    14    18    23    28    36
4            18
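The Turán graph T(n, k) and its size t(n, k), defined earlier in this section, can be computed directly from the description of the parts. The sketch below is only an illustration; the function names `turan_part_sizes` and `turan_size` are hypothetical, and the count uses the fact that the edges of a complete k-partite graph are exactly the pairs of vertices lying in different parts.

```python
def turan_part_sizes(n, k):
    """Part sizes of T(n, k): with n = k*m + r and 0 <= r < k, r parts of size m + 1 and k - r of size m."""
    m, r = divmod(n, k)
    return [m + 1] * r + [m] * (k - r)


def turan_size(n, k):
    """Number of edges t(n, k) of the complete k-partite graph T(n, k)."""
    sizes = turan_part_sizes(n, k)
    all_pairs = n * (n - 1) // 2
    pairs_inside_parts = sum(s * (s - 1) // 2 for s in sizes)
    return all_pairs - pairs_inside_parts


# Triangle-free case (p = 3, so k = p - 1 = 2) on 6 vertices: n^2/4 edges.
triangle_free_max = turan_size(6, 2)

# K_4-free case (p = 4, so k = 3) on 9 vertices: ((p-2)/(p-1)) * n^2/2 edges.
k4_free_max = turan_size(9, 3)
```

When n is a multiple of k, the value returned agrees with the closed form ((k − 1)/k)·n^2/2; for other n the part sizes differ by one and the count is slightly smaller than that expression.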
specified property and showing that this number is smaller than the total number of graphs. An
example of this type of result is the following result of Erdős concerning an exponential
lower bound for the Ramsey number r(K_s, K_s).

Theorem 12.1 (Erdős): There is a positive constant c such that r(K_s, K_s) > c s 2^{s/2}.

The lower bound of the previous theorem has been improved using random graphs (probabilistic
techniques), but at this point no graphs have actually been described that satisfy the bound.
It was mentioned earlier that given positive integers k and m there is a graph G with
chromatic number χ(G) ≥ k and girth g(G) ≥ m. Again, the major tool used in this proof is the
probabilistic method.

If R is a graphical property, then the probability that a graph G in G(n, p) has property R
will be denoted by P[R ∩ G(n, p)]. We say that almost all graphs have property R if
lim_{n→∞} P[R ∩ G(n, p)] = 1. There is the corresponding definition for almost no graphs. As
the value of p increases, the expected number of edges of a random graph G in G(n, p) will
also increase. Thus, if p is a fixed positive real number, then the expected number of edges
in a random graph G in G(n, p) is p·C(n, 2), where C(n, 2) = n(n − 1)/2, which is substantial.
Therefore, these graphs are dense, so it is not surprising that the following is true.

Theorem 12.2: If p is a fixed positive number (0 < p < 1), then for any fixed positive integer
k, almost all graphs [in the probability model G(n, p)] have the following properties (among
others):

(i) Diameter 2
(ii) Contain any fixed finite graph H as an induced subgraph
(iii) k-Connected
(iv) Hamiltonian
(v) Genus exceeding k
(vi) Chromatic number exceeding k

It is obvious that as p increases, the expected number of edges of a random graph in G(n, p)
increases, but it is not obvious that random graphs undergo significant structural changes as
the number of edges increases. This behavior can be described in terms of threshold functions
for a graphical property in the probability space G(n, p) for the probability p = p(n). If R
is a graphical property, then t(n) is a threshold function for property R if almost all graphs
have property R when lim_{n→∞} p(n)/t(n) = ∞, and almost no graphs have property R when
lim_{n→∞} p(n)/t(n) = 0. The following two results are examples of threshold functions for the
property of being "connected" and the property of being "hamiltonian."

Theorem 12.3: Assume that lim_{n→∞} g(n) = ∞, and let p_1(n) = [ln n − g(n)]/n and
p_2(n) = [ln n + g(n)]/n. Then, almost no graphs in G(n, p_1(n)) are connected, and almost all
graphs in G(n, p_2(n)) are connected.

The previous theorem implies that t(n) = (ln n)/n is the threshold function for connectivity,
and the next result implies that t(n) = [ln n + ln ln n]/n is the threshold function for
hamiltonicity.

Theorem 12.4: Assume that lim_{n→∞} g(n) = ∞, and let p_1(n) = [ln n + ln ln n − g(n)]/n and
p_2(n) = [ln n + ln ln n + g(n)]/n. Then, almost no graphs in G(n, p_1(n)) are hamiltonian,
and almost all graphs in G(n, p_2(n)) are hamiltonian.

There are corresponding threshold function results for a wide range of graphical properties.

SEE ALSO THE FOLLOWING ARTICLES

ALGEBRA, ABSTRACT • DISCRETE MATHEMATICS AND COMBINATORICS • GRAPHICS IN THE PHYSICAL SCIENCES
• PROBABILITY

BIBLIOGRAPHY

Alon, N., and Spencer, J. (1992). "The Probabilistic Method," Wiley, New York.
Beineke, L. W., and Wilson, R. J., eds. (1983). "Selected Topics in Graph Theory 2," Academic Press, London.
Beineke, L. W., and Wilson, R. J., eds. (1988). "Selected Topics in Graph Theory 3," Academic Press, London.
Biggs, N. L., Lloyd, E. K., and Wilson, R. J. (1976). "Graph Theory 1736–1936," Clarendon Press, Oxford, U.K.
Bollobás, B. (1998). "Modern Graph Theory," Springer-Verlag, New York.
Bollobás, B. (1978). "Extremal Graph Theory," Academic Press, New York.
Chartrand, G., and Lesniak, L. (1996). "Graphs and Digraphs," 3rd ed., Chapman & Hall, London.
Chartrand, G., and Oellermann, O. (1993). "Applied and Algorithmic Graph Theory," McGraw-Hill, New York.
Clark, J., and Holton, D. A. (1991). "A First Look at Graph Theory," World Scientific, Singapore.
Gould, R. (1988). "Graph Theory," Benjamin/Cummings, Menlo Park, CA.
Graham, R. L., Grötschel, M., and Lovász, L., eds. (1995). "Handbook of Combinatorics, I, II," MIT Press, Cambridge, MA, and North Holland, Amsterdam, The Netherlands.
Palmer, E. (1985). "Graphical Evolution," Wiley, New York.
West, D. B. (1996). "Introduction to Graph Theory," Prentice-Hall, Upper Saddle River, NJ.
Wilson, R. J., and Watkins, J. (1991). "Graphs, An Introductory Approach," Wiley, New York.
P1: GTV Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN007J-297 July 6, 2001 16:57
I. Univariate Guidelines
II. Bivariate Guidelines
III. Multivariate Visualization
IV. Closing Remarks
Quantile plot Graph that relates the ordered observations in a batch of data to cumulative
probabilities.
Quantile–quantile plot Graph for comparing two distributions. Good for several tasks like
judging relative tail thickness, ratios of standard deviations, and changes in mean.
Scatterplot matrix Graph displaying scatterplots of all pairs of variables, arranged in a
square array with individual variables corresponding to rows and columns. Especially useful in
the early stages of studying joint relationships across several variables.
Three-dimensional scatterplot Graph with points plotted against three orthogonal axes.
Perceived via a stereo pair view or via motion parallax. Especially useful in seeing
three-dimensional structure. Useful as a framework for showing additional coordinates using
glyphs.

THE PURPOSE of graphics in the physical sciences is to help scientific and public
understanding of the phenomena being studied. The graphics and design issues for the
respective audiences differ. This article emphasizes graphics for scientists who are often
interested in the discovery and analysis of data.

Scientists involved in discovery can face many challenges. These may include

• The complexity of the phenomena being studied,
• The difficulty of parsimoniously conceptualizing this complexity,
• The logistic and political impediments to collecting adequate, representative data,
• The limits of computational resources,
• The limits of human perception and cognition for understanding multivariate summaries.

Graphics can help in conceptualizing and characterizing the phenomena being studied. The Crick
and Watson discovery of DNA's double-helix structure was a major step forward. Now the
exponentially expanding studies of genes can boggle the mind. Graphical bookkeeping (for
example, see biochemical pathways at http://www.expasy.ch/cgi-bin/show_thumbnails.pl) helps
scientists cope with the ever-increasing details. The possibilities of combinatorial chemistry
and genetics are hard to imagine. The physical scientists monitoring the earth also face huge
datasets. Consider one multispectral dataset with 30-m resolution over the 8 million km² of
the continental United States. A quick calculation indicates that it takes over 7000 screen
images to examine this data on a 1024 × 1280 monitor. That encodes all of the multispectral
information for a 30-m region into the color of a pixel. To appreciate the multispectral
resolution requires many times more images. The human visual system may be able to handle on
the order of 10^7 bits per second of visual input, so flashing through the images may not take
all that long. However, encoding the information in a way that helps scientists is a real
challenge. Norretranders estimates that consciousness is limited to about 16 bits a second. If
care is not taken, none of what is seen will get into those precious 16 bits, or whatever gets
in might be distorted.

This brief monochrome article cannot do justice to the methods available for quantitative
visualization in the physical sciences. One book of over 400 pages catalogues monochrome chart
types. The possibilities grow combinatorially from this when considering the addition of
color, texture, interactivity, and further composition of forms.

This article presents some of the basic methods associated with the field of statistical
graphics, along with a discussion of related cognitive and perceptual principles that help to
provide guidance when moving beyond the basic forms. The principles are, of course, not
correct in every detail, but provide a reasonable basis for selecting among the options.
Before proceeding, a few comments on data and estimates are appropriate.

The varied sources of physical science data include laboratory studies, field samples, and
electronically collected data such as satellite imagery and computer simulation. Most data
need to be transformed into estimates that are suitable for scientific analysis. Each type of
data and collection circumstance poses its own issues in addressing the processes of producing
estimates that are worthy of evaluation. Recurrent types of issues include calibration of
instruments, scaling of variables, adequacy of variables as surrogates for the desired
variables of interest, adjustments for covariates, representative coverage of the population
or phenomena of interest, and validation of simulated or indirect estimates.

Graphics can be no better than the estimates presented (unless they are being used to show how
bad the estimates are). Readers should look for indications of estimate quality either in the
accompanying text or in the graphics. The display of confidence bounds for estimates provides
a good indication. The lack of confidence bounds is sometimes a warning that estimates have
not been assessed with respect to accuracy (bias) and precision (variability).

The heart of graphics is visual comparison. One common task is to assess estimated density
patterns. This involves comparing local densities from different parts of a plot. Another task
compares data to a conjectured functional relationship. Other common tasks include comparing
estimates against reference values and comparing two or more sets of estimates against each
other. Much of graphics design concerns facilitating accurate comparisons.
I. UNIVARIATE GUIDELINES
FIGURE 1 Position along a scale. This is the preferred encoding for a continuous variable.
FIGURE 3 Transforming angles. This reduces the decoding task from taking differences to
judging position along an angular scale.
FIGURE 5 (Plot residue: x-axis, Cumulative Probabilities; y-axis, Quantiles; marked point
p = 0.5, q = −.31.)
FIGURE 6 Box plots (panels: Sept. 1992 and Sept. 1982; axis: NDVI from Africa). This compares
two distributions of values taken from Fig. 8. Vertical lines show the medians. First and
third quartiles bound the thicker gray rectangles. Adjacent values bound the thinner gray
rectangles. No outliers appear in the example. When interior white rectangles do not overlap,
medians are significantly different.
to the smallest observation in a sample of size 10 would be (1 − .5)/10 = .05. An
interpretation is that if the observations were from a random sample of size 10, the
probability of a future observation being smaller than the currently observed smallest
observation is .05.

Another common textbook approach for calculating cumulative probabilities is the empirical
method. For this method the cumulative probability for a value is the fraction of observations
less than or equal to the value. This approach treats the curve as a step function jumping at
each observation. The step function jumps from 0 to 1/10 at the smallest observation in a
sample of size 10. This suggests that the probability of a future observation being smaller
than the smallest one already observed is 0. The sample does not allow determination of the
true probability. The probability .05 seems reasonable. As the sample gets larger the
discrepancy between the two calculation approaches gets smaller.

Interpretation is straightforward. For any probability covered on the x-axis it is possible to
determine a quantile. To obtain the .5 quantile (or estimated median) go straight up from .5
on the x-axis to the curve and then straight across to the y-axis to obtain x units. Starting
with .25 and .75 yields the corresponding quantiles, also known as the 1st and 3rd quartiles,
or 25th and 75th percentiles, respectively. Similarly one can go from quantiles to cumulative
probabilities. Since scientists use such plots to describe convenience collections from a
population as well as random samples from a population, the trickiest interpretation task is
often to decide if the probabilities really extend to a larger population.

Quantile plots are helpful on maps and provide a frame of reference for observing change over
time. For example, one can tell if the 50th percentile (the median) has changed. For maps it
is possible to save space by representing selected cumulative probability and quantile pairs
using a parallel coordinate approach.

C. Box Plots

The box plot is a distribution caricature that has achieved wide acceptance. While used to
represent individual distributions, the common use is to compare distributions. Figure 6 shows
two box plots. The features shown include the median, quartiles, and adjacent values. The
notion of adjacent values sets a bound on what will be called outliers and warrants a note on
how it is calculated. The upper adjacent value is the largest observation less than the 3rd
quartile + 1.5 times the interquartile range (the 3rd quartile − 1st quartile). The lower
adjacent value is determined correspondingly. Outliers, if any, are more extreme than the
adjacent values. Not all box plot variations are the same. Some just show the maximum and
minimum values rather than adjacent values and outliers. The design variation in Fig. 6 has an
additional feature. It uses a white line to provide comparison intervals for the medians. If
two comparison intervals do not overlap, the medians are significantly different.

D. Quantile–Quantile Plots

Some authors consider the quantile–quantile (QQ) plot the preferred graphic for making
detailed comparisons of continuous distributions. For theoretical distributions, the
cumulative distribution function, F(), provides the correspondence between the probability and
quantile pairs via p = F(q). When F() is a strictly increasing function, the quantile
function, Q(), is the inverse of F() and Q(p) = q. Familiar pq pairs from the standard normal
distribution
P1: GTV Final Pages
Encyclopedia of Physical Science and Technology EN007J-297 July 6, 2001 16:57
are (.5, 0) and (.025, −1.96). Comparison of two distributions, denoted 1 and 2, proceeds by plotting quantile pairs (Q1(p), Q2(p)) over a range of probabilities, such as from .01 to .99 in steps of .01. For two distributions of observed data, the calculations previously described for quantile plots are appropriate. Figure 7 shows a QQplot for two sets of data. The x-axis shows quantiles from Set 1 and the y-axis shows quantiles from Set 2. Sometimes statisticians choose Q1(p) to be from a theoretical family of distributions, such as the normal (Gaussian) family, to see if parametric modeling with that family is reasonable.
A merit of QQplots is that in simple cases they have a nice interpretation. If points fall on a straight line, then the distributions have a similar shape, differing only in the first two moments. This is the case in Fig. 7 since the robust fit (thin) line matches the quantiles quite well.
The slope and intercept of the approximating straight line tell about the discrepancies in the second moment (standard deviation) and first moment (mean). The slope of the thin line tells about the ratio of standard deviations. The thick line in the figure is the reference line for identical distributions. The lines are not quite parallel in Fig. 7, so the standard deviations are not quite the same. Multiplying Set 2 quantiles by a factor and then adding a constant changes the slope and intercept of the robust fit line. Graphical input and visual assessment provide a quick way to find the two numbers that make the lines
[Figure 8, panels: Two Curves; And Their Difference]
FIGURE 8 Explicit difference of two curves. Humans tend to see closest differences between curves, not differences parallel to the y-axis.
to comparing spectra and time series as well. Adding grid lines can help, but it is often better to plot the difference explicitly or make distribution comparisons using QQplots.
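As a minimal sketch of the quantile-pair construction behind a QQplot, the following Python computes (Q1(p), Q2(p)) for p from .01 to .99 in steps of .01. The function names, the sample data, and the choice of (i − 0.5)/n plotting positions with linear interpolation are illustrative assumptions rather than the article's prescribed calculation.

```python
from statistics import NormalDist


def sample_quantile(data, p):
    """Quantile by linear interpolation of the sorted data.

    Assumption: the i-th sorted value (i = 1..n) is assigned cumulative
    probability (i - 0.5) / n; probabilities outside that range are
    clamped to the smallest or largest observation.
    """
    xs = sorted(data)
    n = len(xs)
    pos = p * n - 0.5          # fractional 0-based index
    if pos <= 0:
        return xs[0]
    if pos >= n - 1:
        return xs[-1]
    lo = int(pos)
    frac = pos - lo
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])


def qq_pairs(set1, set2, probs):
    """Quantile pairs (Q1(p), Q2(p)) for each probability p."""
    return [(sample_quantile(set1, p), sample_quantile(set2, p)) for p in probs]


probs = [k / 100 for k in range(1, 100)]   # .01, .02, ..., .99

# Hypothetical data: Set 2 is Set 1 rescaled and shifted, so the QQ points
# fall on the straight line y = 2x + 3.
set1 = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]
set2 = [2.0 * v + 3.0 for v in set1]
pairs = qq_pairs(set1, set2, probs)

# Theoretical quantiles, for comparing a data set with the normal family.
std_normal = NormalDist()
normal_q = [std_normal.inv_cdf(p) for p in probs]
```

Because Set 2 here is a linear function of Set 1, every pair lies on one line; the slope 2 plays the role of the ratio of standard deviations and the intercept reflects the shift in location.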
FIGURE 9 Pair Difference Plot (bottom panel). Top panels represent the Normalized Difference Vegetation Index (NDVI) values on a 1° grid using gray levels. The images were taken about a decade apart. The bottom panel shows the difference of location-paired values. The mean and difference plot shows the mean on the x-axis and the difference on the y-axis. The reference line is y = 0. This plot handles many thousands of points by hexagon binning. Larger hexagon symbols indicate larger counts. While there are many exceptions, the smooth line indicates a tendency for 1992 values to be larger.
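The mean and difference transformation and the hexagon binning mentioned in the caption can be sketched as follows. This is a minimal illustration, not the procedure used to draw Figure 9: the function names, the bin width, and the sample pairs are hypothetical, the difference is taken as later minus earlier by assumption, and the binning uses the common rule of assigning each point to the nearest center of two staggered rectangular lattices.

```python
import math
from collections import Counter


def mean_difference(pairs):
    """Map paired values (a, b) to (mean, difference) = ((a + b) / 2, b - a)."""
    return [((a + b) / 2.0, b - a) for a, b in pairs]


def hexbin_counts(points, width):
    """Count points per hexagon cell.

    Hexagon centers form two staggered rectangular lattices with
    horizontal spacing `width` and vertical spacing sqrt(3) * width;
    each point is assigned to the nearest center.
    """
    dy = math.sqrt(3.0) * width
    counts = Counter()
    for x, y in points:
        # Nearest center on the first lattice: (i * width, j * dy).
        i1, j1 = round(x / width), round(y / dy)
        c1 = (i1 * width, j1 * dy)
        # Nearest center on the second lattice, offset by half a cell.
        i2, j2 = math.floor(x / width), math.floor(y / dy)
        c2 = ((i2 + 0.5) * width, (j2 + 0.5) * dy)
        d1 = (x - c1[0]) ** 2 + (y - c1[1]) ** 2
        d2 = (x - c2[0]) ** 2 + (y - c2[1]) ** 2
        counts[c1 if d1 <= d2 else c2] += 1
    return counts


# Hypothetical (earlier, later) value pairs at five locations.
value_pairs = [(0.30, 0.34), (0.31, 0.33), (0.50, 0.58), (0.52, 0.60), (0.70, 0.69)]
md_points = mean_difference(value_pairs)
bins = hexbin_counts(md_points, width=0.1)
```

Each key of `bins` is a hexagon center and each value is the count that would set the size of the plotted hexagon symbol.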
low-resolution satellite images of the same region. The top two images show NDVI values on a 1° grid. NDVI provides a measure of greenness. The traditional reference line for equality is a 45° line through the origin. John Tukey suggested making the reference line horizontal by plotting differences on the y-axis and means on the x-axis. The bottom of Figure 9 shows a mean difference plot. Making transformations to simplify the visual reference and reduce the visual calculation burden is an important graphic design principle.
The data for the panels in Fig. 9 consist of over 27,000 point pairs. Overplotting in the bottom panel would cause a problem. Since we do not judge density well, the bottom panel bins the data into hexagon bins and represents the count with the size of the hexagon symbol plotted. Such binning is a fast but rudimentary way to obtain a bivariate density estimate. Carr lists the merits of hexagon binning over square binning. An alternative view estimates the density surface on a higher-resolution bivariate grid and displays the surface with perspective views or contours.
The line in the bottom panel of Fig. 9 shows a smooth of the data. The upward shift indicates increased greenness in 1992. Symbols below the zero reference line for no change indicate that the change is a decrease in some locations. The smooth indicates that the increase tends to be highest for intermediate values of NDVI.

F. Functional Relationships and Smoothing
When y is considered a function of x, common practice is to enhance scatterplots of (x, y) pairs by adding a smooth curve. This is done in Fig. 9 to see if the difference is related to the average of the two values. To avoid the considerable human variability in sketching a fit, the standard procedure is to model the data using a procedure that others can replicate. Figure 10 shows a scatterplot with the smooth line generated using loess (see Cleveland et al. for more details). Loess fits the data using weighted local regression. That is, the regression uses data local to x0 to predict a value at x0. Points closest to x0 receive the greatest weight. The use of many local regressions produces a set of pairs (x, y) that are connected to produce the smooth. Each local regression in the smooth shown in Fig. 10 used a linear model in x and included the closest 60% of the observations to the prediction point x0. Such a smooth is reproducible. The smooth in Fig. 10 draws further attention to the distinction between ocean and land states, and additional modeling is appropriate.
Smoothing is an extremely important visual enhancement technique. It helps us see the structure through the noise. The decomposition of data into smooth and residual parts is a fundamental technique in statistical modeling. Hastie and Tibshirani provide a good introduction to a variety of smoothing methods.
[Figure 10: scatterplot titled "1950-1959 State Mortality Rates"; x-axis: Latitude in Degrees; legend: o = Land State]
FIGURE 10 A smooth. The smooth curve suggests a functional relationship. The two types of points suggest there are two different functional relationships involved.
Numerous smoothers are available. Historically, many researchers used cubic splines as smoothers. Cubic splines have a continuous second derivative, and that is sufficient to make curves appear smooth to humans. The elegant mathematical formulation behind splines increased their popularity within the statistical community. However, there is no a priori best smoother. New methods, such as wavelet smoothing, keep appearing in statistical software. Recently developed wavelet smoothers are supposed to be better than many smoothers at tracking discontinuities in the functional form. However, the old local median smoothers still do well at handling discontinuities.
Smoothers typically have a smoothing parameter that needs to be estimated or specified by the user. With computational power at hand, cross-validation methods have become increasingly popular as a community standard. This reduces the judgment burden on the analyst. However, it does not guarantee a match between an empirical curve and a hypothesized true but unknown underlying curve.

III. MULTIVARIATE VISUALIZATION
Visualization in the physical sciences is inherently multivariate. Scientists are interested in the relationships among many attributes, and the attributes have space–time coordinates. The purpose of multivariate graphics is to show multivariate patterns and to facilitate comparisons. As in low dimensions, the patterns often concern data distributions or models with at least one dependent variable.
After converting attributes and their space–time coordinates to images for evaluation, human visual comparisons
typically fall into three categories: comparison of external images to each other, comparison of external images to external visual references, and comparison of external images to the analyst’s internal references. Internal references include scientific knowledge, statistical expectations, and process models. The visual investigation process often involves converting internal references into external visual references subject to further manipulation. With external images and references available, the next step often involves graphical transformation to simpler forms in terms of our perceptual–cognitive processing abilities.
Multivariate graphics must deal with the problem of showing higher-dimensional relationships and, as in lower dimensions, often have to deal with the noise that obscures patterns and the overplotting due to many points.

A. Graphical Design Principles and Resources
The description indicates the increased complexity as we move to higher dimensions. At the same time, Kosslyn warns that “The spirit is willing but the mind is weak.” We should approach the multivariate challenge prepared to do battle. Our arsenal of tools includes design principles that help us in uncharted territory. Some of our tools include:
• Distributional caricatures such as box plots to help us deal with large data sets
• Map caricatures that let us show small multiples
• Modeling to reduce noise and complexity
• Layering and separating to constrain and guide the information flow
• Partitioning and sorting to simplify appearance and facilitate comparisons
• Linking low-dimensional views to get insight into higher-dimensional relationships.
The basic formats for comparison graphics include juxtaposition, superposition, or the direct display of differences. The art of multivariate graphics is to select the methods and enhancements that work best in view of the phenomena’s complexity and in view of human perceptual and cognitive limitations.
This entry suggests just a few representation tools and principles. Good references include books by Cleveland, Kosslyn, Wilkinson, and Ware. The books by Tufte provide wonderful examples of putting design principles into practice. MacEachren provides a good introduction on mapping, and Cleveland and McGill provide an early survey on dynamic multivariate graphics that go beyond the scope of the entry. As more work is done in computing environments, issues of the computer–human interface become increasingly important. Card et al. edited a book of readings that gathers many important concepts.
The entry only briefly describes color since it is not an option here. However, color provides an important tool. Different hues provide a good way to distinguish categories of a categorical variable. The cartographic guidance limits categories or hues to six or fewer. Humans are very sensitive to a second dimension of color, a dark-to-light scale that is referred to in the literature by terms such as value, lightness, or brightness. Gray levels provide an ordered scale, so it is reasonable to represent a continuous variable using value or gray level. Humans are less sensitive to the dimension of color called saturation. A saturation scale going from an achromatic color such as medium-gray to a vivid red is also an ordered scale. However, humans can make fewer saturation distinctions than lightness distinctions. Generally color is a poor choice for representing a continuous variable. Humans perceive hue and brightness as integral dimensions, so we should not use them to encode two or three variables. Brewer et al. discuss additional considerations that apply to the color-vision impaired.
Since most people can work with four chunks of information, this article suggests attempting to fit the guidelines into four broad categories of quantitative design principles:
• Use encodings that have high perceptual accuracy of extraction
• Provide context for appropriate interpretation
• Strive for simple appearance
• Involve the reader
These organizing categories contain some conflicting guidelines. For example, a long list of caveats may provide the context for appropriate interpretation but conflict with simple appearance and reader involvement. Composing graphics that balance among the guidelines remains something of an art form. The communication objectives influence the balance.

B. Communication Objectives
Multivariate graphics can have many different communication objectives. Four common objectives include providing an overview, telling a story, suggesting hypotheses, and criticizing a model. In providing an overview, broad coverage is important. Achieving clarity often involves hiding details and working with caricatures. Similarly, in telling a story the predetermined message must shine through. Scientists often fail to tell simple stories because they are eager to list caveats and a host of details that qualify the basic results. Interactive web graphics can alleviate the archival side of this problem by providing ready access to metadata, supplemental documents, and gigabyte databases. However, it still takes careful design to draw readers to the details.
C. Surface Views
In some ways multivariate visualization is similar to lower-dimension visualization. The general tasks are still to show densities and functional relationships. The difference is that more variables, including spatial and temporal indices, are called into play. The density of bivariate points constitutes a third coordinate. The basic idea in bivariate kernel density estimation is similar to that for univariate estimation: average local likelihoods. The key difference is that the local neighborhood is bivariate. The result is a surface z = f(x, y). Figure 11 shows the contours of a bivariate density surface. Figure 12 shows a wireframe perspective view.
[Figure 11: contour plot; labeled contour levels 0.005, 0.015, 0.025, 0.035, 0.040]
FIGURE 12 A perspective view or wireframe plot. This surface shows the density of bivariate points.
Estimating functional relationships of the form z = f(x, y) also follows the pattern established with one less variable. In the context of maps some approaches to modeling address the issue of spatial correlation. Common monochrome views again include contour and wireframe plots. Fully rendered color surfaces with highlights provide a better view than wireframes. Pairing of contour and surface plots can aid understanding. The surface tends to provide a good overall impression (except for what is hidden) while the contours help to locate features, such as local extrema, on the plane. Wireframes and translucent surfaces can be superimposed on maps or contour plots.
Perspective views appeal to many people and have also been used to show local values, as in animated flyovers of three-dimensional bar charts as well as surfaces. However, perspective foreshortening complicates accurate decoding of values and comparisons. For a detailed study of surfaces, Cleveland recommends dropping back a dimension. By conditioning on intervals of x or y it is possible to study strips of the surface using two-dimensional plots.
For comparing surfaces there are three standard ap-
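To make the phrase "average local likelihoods" in the discussion of bivariate kernel density estimation concrete, here is a minimal sketch that evaluates such an estimate on a grid. The product Gaussian kernel, the bandwidths hx and hy, the grid, and the sample points are all assumptions chosen for illustration; the article does not specify a kernel.

```python
import math


def kde2d(points, x, y, hx, hy):
    """Bivariate kernel density estimate at (x, y).

    Averages a product Gaussian kernel centered at each data point;
    hx and hy are the bandwidths in the x and y directions.
    """
    total = 0.0
    for px, py in points:
        u = (x - px) / hx
        v = (y - py) / hy
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(points) * 2.0 * math.pi * hx * hy)


def density_grid(points, xs, ys, hx, hy):
    """Evaluate the surface z = f(x, y) on a grid, one row per y value."""
    return [[kde2d(points, x, y, hx, hy) for x in xs] for y in ys]


points = [(0.0, 0.0), (1.0, 0.5), (0.8, 1.2), (2.0, 2.0)]  # hypothetical data
step = 0.25
xs = [i * step for i in range(-8, 17)]   # -2.0 to 4.0
ys = [j * step for j in range(-8, 17)]
z = density_grid(points, xs, ys, hx=0.5, hy=0.5)
```

The grid `z` is the kind of array that a contour or wireframe routine would take as input; summing it and multiplying by the cell area gives a value close to 1, as expected for a density.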
H. Conditioned Plots
Conditioned plots partition the estimates (or data) into sets based on classes defined for conditioning variables. The different plots for the sets then appear as juxtaposed panels. The visualization task is then to study how the distributions or functional relationships shown in the panels vary across the conditioned panels. Tukey and Tukey provide an early example called a casement display. Conditioned plots appearing as a one-way or two-way layout are typically two-dimensional plots, but they can be three-

SEE ALSO THE FOLLOWING ARTICLES
COLOR SCIENCE • FLOW VISUALIZATION • GRAPH THEORY • IMAGE PROCESSING

BIBLIOGRAPHY
Bayly, C. I., Cieplak, P., Cornell, W. D., and Kollman, P. A. (1993). “A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model,” J. Phys. Chem. 97, 10269–10280.
Brewer, C. A., MacEachren, A. M., Pickle, L. W., and Herrmann, D. (1997). “Mapping Mortality: Evaluating color schemes for choropleth maps,” Annals of the Association of American Geographers 87(3), 411–438.
Card, K., Mackinlay, J. D., and Shneiderman, B. (1999). “Information Visualization: Using Vision To Think,” Morgan Kaufmann Publishers, Inc., San Francisco, CA.
Carr, D. B. (1991). “Looking at large data sets using binned data plots,” in “Computing and Graphics in Statistics” (A. Buja and P. Tukey, eds.), pp. 7–39, Springer-Verlag, New York.
Cleveland, W. S. (1993). “Visualizing Data,” Hobart Press, Summit, NJ.
Cleveland, W. S. (1994). “The Elements of Graphing Data,” Hobart Press, Summit, NJ.
Cleveland, W. S., and McGill, R. (1984). “Graphical perception: Theory, experimentation, and application to the development of graphics methods,” J. Am. Stat. Assoc. 79, 531–554.
Cleveland, W. S., and McGill, M. E. (eds.). (1988). “Dynamic Graphics for Statistics,” Chapman and Hall, New York.
Furnas, G. W., and Buja, A. (1994). “Prosection views: Dimensional inference through sections and projections,” J. Comp. Graph. Stat. 3(4), 323–353.
Hastie, T. J., and Tibshirani, R. J. (1990). “Generalized Additive Models,” Chapman and Hall, New York.
Kosslyn, S. M. (1994). “Elements of Graph Design,” W. H. Freeman and Company, New York.
MacEachren, A. M. (1994). “Some Truth with Maps: A Primer on Symbolization & Design,” Association of American Cartographers, Washington, D.C.
Scott, D. W. (1992). “Multivariate Density Estimation: Theory, Practice and Visualization,” Wiley, New York.
Tufte, E. R. (1983). “The Visual Display of Quantitative Information,” Graphics Press, Cheshire, CN.
Tufte, E. R. (1990). “Envisioning Information,” Graphics Press, Cheshire, CN.
Tufte, E. R. (1997). “Visual Explanations,” Graphics Press, Cheshire, CN.
Wilkinson, L. (1999). “The Grammar of Graphics,” Springer-Verlag, New York.
Wood, D. W. (1992). “The Power of Maps,” The Guilford Press, New York.
P1: LLL/GKP P2: FYK Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN007C-304 June 30, 2001 17:0
Group Theory
Ronald Solomon
Ohio State University
I. Fundamental Concepts
II. Important Examples
III. Basic Constructions
IV. Permutation Groups and Geometric Groups
V. Group Representations and Linear Groups
VI. Finite Simple Groups
VII. Other Topics
Any such function may be thought of as a permutation or symmetry of the set X, and S(X) is generally referred to as the symmetry group of X or as the symmetric group on the set X.
Historically, groups emerged first in the work of Lagrange, Ruffini, and Cauchy around 1800 as sets of permutations of a finite set X. In this context the definition may be simplified. A nonempty set G of permutations of a finite set X is a group if and only if G is closed under composition of permutations. Closure under functional inversion follows automatically. The condition of closure under composition puts severe restrictions on the group G, e.g., the remarkable fact discovered by Lagrange that if |X| = n [and so |S(X)| = n!] and G is any subgroup of S(X), then |G| is a divisor of n!.
The subject was greatly enriched by Évariste Galois, who demonstrated around 1830 how to associate to each polynomial equation a finite group (called its Galois group) whose structure encodes considerable information concerning the algebraic relationships among the roots of the equation. An analogous theory for differential equations was developed later in the century by Lie, Picard, and Vessiot. The relevant groups in this setting were no longer finite groups but rather continuous groups (more commonly known as Lie groups).
Starting around 1850, Cayley championed the idea of studying groups abstractly rather than via some specific representation. This notion simplifies numerous arguments and constructions, notably the construction of quotient groups. The first modern definition of a group was given by Dyck in 1882, who also pioneered the study of infinite groups.

I. FUNDAMENTAL CONCEPTS
We shall freely use the standard notation of naive set theory with regard to sets, subsets, elements, relations, functions, etc. By a binary operation ◦ on a set S, we shall mean a function ◦ : S × S → S. It is customary to say that S is closed under the binary operation ◦ in this situation, and it is customary to write x ◦ y for the value of ◦ at the ordered pair (x, y) ∈ S × S. The symbols Z, Q, R, and C will denote the sets of integers, rational numbers, real numbers, and complex numbers, respectively.
We now give the promised abstract definition of a group.
Definition: A group is an ordered pair (G, ◦) such that G is a set closed under the binary operation ◦ satisfying:
(1) (x ◦ y) ◦ z = x ◦ (y ◦ z) for all x, y, z ∈ G.
(2) There is an element e ∈ G with e ◦ x = x = x ◦ e for all x ∈ G.
(3) For any x ∈ G, there is an element x⁻¹ ∈ G such that x⁻¹ ◦ x = e = x ◦ x⁻¹.
This definition makes the group operation the prominent feature and is indifferent to the kind of mathematical objects which the elements of G might be.
The element e is unique and is called the identity element of G, and the element x⁻¹ is uniquely determined by x and is called the inverse of the element x. Henceforth in this article we shall in general write xy for x ◦ y. Also, we shall typically speak of a group G, with the operation ◦ being tacitly understood. It is important to note that in general xy ≠ yx. A group in which the commutative law
    xy = yx for all x, y ∈ G
holds is called an abelian group in honor of Niels Abel, who first recognized that a polynomial equation whose Galois group is abelian is solvable by radicals. Other groups are called nonabelian groups.
Sometimes when G is an abelian group, the operation is denoted by + and the identity element by 0. Many important sets of numbers form abelian groups under +, for example, (C, +) and such subsets as (Z, +), (Q, +), and (R, +).
Groups of numbers like (R, +) seem quite different from groups of functions. However, we may identify (R, +) with the group (T, ◦) of all translation functions T_a : R¹ → R¹ defined by
    T_a(x) = a + x for all x ∈ R¹.
More generally, as observed by Cayley and Jordan, if (G, ·) is any abstract group, then G may be regarded as a group of invertible functions on the set G by regarding the element g ∈ G as the function λ_g : G → G defined by
    λ_g(x) = gx for all x ∈ G.
It is for this reason that our two definitions of group are essentially the same. The mapping G → S(G) taking g to λ_g is called the (left) regular representation of G.
Groups like (Z, +), (Q, +), and (R, +) are instances of subgroups of (C, +) in the following sense.
Definition: Let (G, ◦) be a group. A nonempty subset H of G is a subgroup of G if (H, ◦_H) is itself a group, where ◦_H is the restriction of the function ◦ to H × H.
It follows readily that the identity in H is the identity in G and that inverses of elements in H are the same as their inverses in G.
Given any subgroup H of a group G, we may define an equivalence relation on the group G by the rule
    g ≡_H g₁ if g⁻¹g₁ ∈ H.
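As a concrete check of this equivalence relation, the sketch below builds the symmetric group on X = {0, 1, 2}, takes a two-element subgroup H, and groups the elements by the rule that g is equivalent to g₁ when g⁻¹g₁ lies in H. The tuple encoding of permutations and the helper names are illustrative choices, not notation from the article.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p ◦ q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


G = list(permutations(range(3)))      # S(X) for X = {0, 1, 2}
e = (0, 1, 2)                         # identity permutation
H = [e, (1, 0, 2)]                    # subgroup generated by swapping 0 and 1


def equivalent(g, g1):
    """True exactly when inverse(g) composed with g1 lies in H."""
    return compose(inverse(g), g1) in H


# Collect the equivalence classes, comparing against one representative each.
classes = []
for g in G:
    for cls in classes:
        if equivalent(cls[0], g):
            cls.append(g)
            break
    else:
        classes.append([g])
```

Here the six elements of G fall into three classes of two elements each, and the class of g consists of the products gh with h in H.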
P1: LLL/GKP P2: FYK Final Pages
Encyclopedia of Physical Science and Technology EN007C-304 June 30, 2001 17:0
The equivalence classes determined by this equivalence relation are called the left cosets of H in G. Thus the left coset containing the element g is just
    gH = {gh : h ∈ H}.
The set G\H of all left cosets of H in G forms a space on which G acts by left multiplication, in the sense that to each element g ∈ G there is naturally associated a function L_g : G\H → G\H by L_g(xH) = gxH and the functional equation L_{gg₁} = L_g ◦ L_{g₁} holds for all g, g₁ ∈ G. The left regular representation of G is the special case where H = {e}. However, although the map g → λ_g is one-to-one, in general, many elements of G may give rise to the same function on G\H.
The fact that all left cosets of H have the same cardinality yields the first important theorem about finite groups.
Lagrange’s Theorem: Let G be a finite group and H a subgroup of G. Then |H| is a divisor of |G|.
In a similar way, it is possible to define the right cosets Hg of a subgroup H. An important class of subgroups, first studied by Galois, is the normal subgroups, which are those for which the right and left cosets coincide.
Definition: A subgroup N of a group G is a normal subgroup if and only if gN = Ng for all g ∈ G. When N is a normal subgroup of G, we write N ⊴ G.
Given a group G and a normal subgroup N, we can construct a new group G/N, called the quotient group of G by N.
Definition: Let G be a group and N a normal subgroup of G. The set G\N with the operation (gN)(g′N) = gg′N is a group, called the quotient group G/N. (Note that it is conventional to replace \ by / when the set is regarded as a group.)
The normality condition is necessary for the multiplication on G\H to be well defined. The identity subgroup {e} and the entire group G are always normal subgroups of every group G. A group G whose only normal subgroups are {e} and G is said to be a simple group. Simple groups play the role in the theory of finite groups that prime numbers play in the theory of numbers, as remarked by Michael Solomon, among others. All finite groups are “composed” from the simple ones.
At the other extreme, every subgroup of an abelian group is normal. If G is any group and g ∈ G, then the subset {gⁿ : n ∈ Z} forms a subgroup of G called the cyclic subgroup generated by g and often denoted ⟨g⟩. If G = ⟨g⟩, then G is called a cyclic group. It follows that an abelian simple group must be cyclic, indeed cyclic of cardinality p for some prime p.
The quotient construction affords important examples of cyclic groups. Namely, for each positive integer n, the subset nZ of all multiples of n in Z is a subgroup of the abelian group Z and hence we have cyclic groups Z/nZ, generated by the coset 1 + nZ. Cyclic groups arise in many natural contexts. For instance, the set Z(n) of all complex nth roots of 1 forms a cyclic group of cardinality n under multiplication. Likewise, the group of all rotational symmetries of a regular n-sided polygon is a cyclic group C_n of cardinality n, under composition of functions. The groups Z/nZ, Z(n), and C_n have identical properties as abstract groups, although their manifestations in the “real world” are different. Under such circumstances the groups are called isomorphic. The formal definition follows.
Definition: The function φ : (G, ◦) → (H, ·) is an isomorphism (of groups) if φ : G → H is a bijective function satisfying:
    φ(g ◦ g′) = φ(g) · φ(g′) for all g, g′ ∈ G.
Two groups which are isomorphic are indistinguishable as abstract groups, and the isomorphism may be regarded as a dictionary for translating from one group to the other. Much of the power of abstract group theory derives from the fact that a given theorem about abstract groups may have numerous important instantiations via different realizations of the abstract groups. For later reference, we note that an isomorphism φ : G → G between a group and itself is called an automorphism of G. The set of all automorphisms of G itself forms a group, Aut(G), under composition of functions. (We shall refer later to an automorphism of a field, which has a similar meaning.)
Returning to cyclic and abelian groups, we observe that we now know, up to isomorphism, every cyclic group and every abelian simple group.
Theorem:
(1) Every cyclic group is isomorphic either to Z or to Z/nZ for some positive integer n; and
(2) Every abelian simple group is isomorphic to Z/pZ for some prime p.
In contrast to this elementary result, the classification of the finite nonabelian simple groups was a very challenging and complex undertaking, and the classification of infinite simple groups is surely impossible. In any case, a strategy for the analysis of a general (finite) group is to “decompose” it into simple factors via an iterated quotient group construction. This is meaningful for finite groups thanks to the following theorem.
Jordan–Hölder Theorem: Let G be a finite group. Then there is an integer n ≥ 0 and a chain of subgroups
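The isomorphism between Z/nZ and the group Z(n) of complex nth roots of 1, described earlier on this page, can be checked numerically. In this sketch the map sending the coset k + nZ to exp(2πik/n), the choice n = 6, and the floating-point tolerance are assumptions made for illustration.

```python
import cmath

n = 6


def phi(k):
    """Send the coset k + nZ to the nth root of unity exp(2*pi*i*k/n)."""
    return cmath.exp(2j * cmath.pi * (k % n) / n)


def close(z, w, tol=1e-12):
    """Compare complex numbers up to floating-point rounding."""
    return abs(z - w) < tol


# Homomorphism property: phi((j + k) mod n) equals phi(j) * phi(k).
is_hom = all(
    close(phi((j + k) % n), phi(j) * phi(k))
    for j in range(n)
    for k in range(n)
)

# Bijectivity onto Z(n): the n images are distinct and each satisfies z**n = 1.
images = [phi(k) for k in range(n)]
is_injective = all(
    not close(images[j], images[k])
    for j in range(n)
    for k in range(j + 1, n)
)
are_roots = all(close(z ** n, 1) for z in images)

# The coset 1 + nZ generates Z/nZ: its multiples run through every coset.
generated = {(m * 1) % n for m in range(n)}
```

Addition of cosets on one side corresponds to multiplication of roots of unity on the other, which is the "dictionary" role of an isomorphism.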
for all u, v, w ∈ V and all c, d ∈ F. We say that the form is symmetric if b(u, v) = b(v, u) for all u, v ∈ V. We say that the form is alternating if b(v, v) = 0 for all v ∈ V.
Definition: A quadratic form on an F-vector space V is a function Q : V → F such that
(1) Q(av) = a²Q(v) for all a ∈ F, v ∈ V; and
(2) b(u, v) = Q(u + v) − Q(u) − Q(v) defines a bilinear form (necessarily symmetric) on V.
If (V, b) is a space with a symmetric bilinear form and if u and v are vectors in V of length 1, i.e., b(u, u) = 1 = b(v, v), then b(u, v) may be thought of as the cosine of the angle between the vectors u and v. In all cases, if b(u, v) = 0 we say that u and v are orthogonal vectors. Note that if b(v, v) = 0 for all v ∈ V, then b(u, v) = −b(v, u) for all u, v ∈ V, and so in both the alternating and symmetric cases, orthogonality is a symmetric relation on V.
Definition: If (V, b) is a space with an alternating or symmetric form b, we let
    V⊥ = {v ∈ V : b(u, v) = 0 for all u ∈ V}.
If (V, Q) is a space with a quadratic form Q and associated symmetric bilinear form b, then we let
    Rad(V) = {v ∈ V⊥ : Q(v) = 0}.
We say that a space (V, b) with an alternating form b is a symplectic space if V⊥ = {0}. It may be shown that a finite-dimensional symplectic space V must have even dimension and for every even positive integer 2n and field F, there is (up to isometry) exactly one symplectic F-space of dimension 2n. We say that a space (V, Q) with a quadratic form Q is an orthogonal space if Rad(V) = {0}. There may be many orthogonal spaces of a given dimension over a given field; the number depends on both the field and the dimension. The classical law of inertia of Sylvester asserts in particular that there are exactly n + 1 nonisometric orthogonal structures on a real vector space of dimension n. An orthogonal space (V, Q) is completely determined by its associated bilinear structure (V, b) when the field F has characteristic different from 2 (e.g., when F is a subfield of C).
The symplectic and orthogonal groups are the isometry groups of symplectic and orthogonal spaces, respectively, in the following sense.
The symplectic group Sp(V) is the group of all isometries of the symplectic space (V, b). [As (V, b) is unique up to isometry, it is unambiguous (up to isomorphism) to write Sp(V).] If (V, Q) is an orthogonal space and T : V → V is a linear operator, then T is an isometry of (V, Q) if
    Q(v) = Q(T(v)) for all v ∈ V.
The orthogonal group O(V, Q) is the group of all isometries of the orthogonal space (V, Q).
An isometry is necessarily a bijection, whence Sp(V) and O(V, Q) are subgroups of GL(V). Classical mechanics focuses on the case where F = R and Q is positive definite, i.e.,
    Q(x_1, x_2, . . . , x_n) = x_1² + x_2² + · · · + x_n²,
and so often in the physics literature the group O(Rⁿ, Q) for this choice of Q is denoted simply O(n). The Lorentz group, however, which arises in relativity theory, is an orthogonal group defined with respect to a four-dimensional indefinite quadratic form.
Symplectic isometries always have determinant 1. There is a somewhat larger group CSp(V) of symplectic similarities, i.e., linear operators T : (V, b) → (V, b) such that for all v, w ∈ V,
    b(T(v), T(w)) = λ_T b(v, w),
for some nonzero scalar λ_T ∈ F. This group will reappear toward the end of Section V.
Orthogonal isometries have determinant ±1. The subgroup of O(V, Q) consisting of isometries of determinant 1 is called the special orthogonal group, SO(V, Q). In the case of O(n), it is denoted SO(n) and interpreted as the group of all orientation-preserving isometries of Rⁿ fixing the origin. For n = 3, SO(3) is the group of all rotations of R³ about an axis through (0, 0, 0). The earliest classification theorem for a class of nonabelian groups was the classification of finite subgroups of SO(3), achieved in the mid-nineteenth century.
Theorem: A finite subgroup of SO(3) is a subgroup of the full group of symmetries of one of the following objects centered at (0, 0, 0): a regular polygon lying in a plane through (0, 0, 0) or one of the five regular polyhedra (Platonic solids): the tetrahedron, the cube, the octahedron, the dodecahedron, or the icosahedron.
It follows that a finite subgroup G of SO(3) is either a cyclic group or is the dihedral group D_n of all symmetries of a regular n-gon or is the tetrahedral, octahedral, or
Definition: If (V, b) is a symplectic space and T : V → V icosahedral group (the group of rotational symmetries of
is a linear operator, then T is an isometry of (V, b) if the tetrahedron, octahedron, or icosahedron, respectively).
Finally, when the field F admits an involutory automor-
b(u, v) = b(T (u), T (v)) for all u, v ∈ V. phism σ (i.e., σ 2 = I = σ ), there is a further important
family of classical linear groups, namely, the unitary groups.
Definition: Let F be a field and let σ be an involutory automorphism of F. A (σ-)hermitian form on an F-vector space V is a function h : V × V → F which is linear in the first position and which satisfies
    h(v, w) = h(w, v)^σ for all v, w ∈ V.
A pair (V, h) with V an F-vector space and h a σ-hermitian form on V is said to be a unitary space if V^⊥ = {0}. We denote by U(V, h) the group of all isometries of the unitary space (V, h) and call it the unitary group of (V, h). The subgroup of unitary isometries of determinant 1 is the special unitary group, SU(V, h).
Of course the most important of all involutory automorphisms is the complex conjugation map on C. When h is the standard positive definite inner product on C^n, the unitary group U(C^n, h) is often denoted simply U(n). There are, however, other inequivalent hermitian forms on C^n. There is a famous twofold covering map from SU(2) onto SO(3) called the spin covering because of its connection with the concept of spin in quantum mechanics.
If F is a finite field with |F| = q^2 for some prime power q, then F admits an involutory automorphism and it is possible to define unitary groups of isometries of finite-dimensional F-vector spaces. Up to isomorphism, in the finite field case, the unitary group is uniquely determined by the field F and the dimension n of the vector space. Unfortunately, if |F| = q^2, some group theorists denote this group U_n(q^2), while others write U_n(q). This is only one instance of the notational inconsistencies which litter the terrain of classical linear groups. The original notations of Jordan and Dickson have been largely abandoned or modified in favor of a “classical” notation dating from the time of Weyl. However, when Cartan (for the field R) and later Chevalley and Steinberg (for arbitrary fields) constructed these groups from a Lie-theoretic standpoint, they adopted the notation of Killing, which is completely different from Weyl’s notation. Moreover, even within the Weyl framework, there are treacherous inconsistencies. For instance, some write Sp_n(F) and others Sp_{2n}(F), both meaning the group of symplectic isometries of a 2n-dimensional F-vector space. Let the reader beware.
III. BASIC CONSTRUCTIONS
In Section I we addressed the problem of decomposing a group into its constituent simple composition factors (when possible). Now we consider the opposite problem of “composing” two or more groups to create a new and larger group. One fundamental construction is the direct product construction.
Definition: Let G and H be groups. The Cartesian product set G × H with the operation
    (g, h) · (g′, h′) = (gg′, hh′)
is a group, called the direct product G × H.
This construction can be iterated to define direct products ∏_{i ∈ I} G_i over arbitrary index sets I. Another important generalization is the semidirect product. Before defining this, we must generalize the notion of an isomorphism of groups.
Definition: Let (G, ◦) and (H, ·) be groups. A function f : G → H is a homomorphism of groups provided that
    f(g ◦ g′) = f(g) · f(g′) for all g, g′ ∈ G.
Thus an isomorphism of groups is a bijective homomorphism of groups. In general, failure of injectivity for homomorphisms is measured by the following subgroup.
Definition: Let f : G → H be a homomorphism of groups. The kernel of f is defined by
    Ker(f) = {g ∈ G : f(g) = e_H},
where e_H is the identity element of the group H.
The pre-image f^{-1}(h) of each element h in f(G) is both a left and right coset of Ker(f) = f^{-1}(e_H) and so has the same cardinality as Ker(f). In particular, f is injective if and only if Ker(f) = {e_G}, where e_G is the identity element of G. There is an intimate connection between normal subgroups and homomorphisms.
Theorem: Let f : G → H be a homomorphism of groups with kernel K. Then K is a normal subgroup of G and f induces an isomorphism between G/K and f(G). Conversely, if K is any normal subgroup of the group G, then the function π : G → G/K defined by
    π(g) = gK for all g ∈ G
is a homomorphism of groups with kernel K.
Now we can describe the semidirect product construction of groups.
Definition: Let K and H be groups and let φ : H → Aut(K) be a homomorphism of groups. The semidirect product K :_φ H (or simply K : H) is the group whose underlying set is the Cartesian product set K × H with multiplication defined by
    (k, h)(k′, h′) = (kφ(h)(k′), hh′)
for all k, k′ ∈ K, h, h′ ∈ H.
The direct product K × H is the special case of the above construction in which φ is the homomorphism mapping every element of H to the identity automorphism
of K. In all cases, {(k, e_H) : k ∈ K} is a normal subgroup of K : H, which is isomorphic to K. The subset H_0 = {(e_K, h) : h ∈ H} is a subgroup of K : H isomorphic to H, but is not in general a normal subgroup of K : H. Indeed, H_0 is also normal if and only if K : H ≅ K × H.
If all composite (i.e., not simple) groups could be constructed from proper subgroups by an iterated semidirect product construction, then the classification of all finite groups, or even all groups having a composition series, would at least be thinkable, if not doable. However, not all composite groups can be so constructed, as is illustrated by the easy example of the cyclic group C_{p^2}. The obstruction to a composite group “splitting” as a semidirect product was first analyzed by Schur. The attempt to parametrize the set of all groups which are an “extension” of a given normal subgroup by a given quotient group was one of the motivating forces for the development of the cohomology theory of groups.
In the context of finitely generated abelian groups, a complete classification is nevertheless possible and essentially goes back to Gauss. (A group G is finitely generated if there is a finite subset S of G such that every element of G is expressible as a finite word in the “alphabet” S. We write G = ⟨S⟩ if G is generated by the elements of the subset S. In particular, every finite group G is finitely generated: you may take S = G.)
Theorem: If A is a finitely generated abelian group, then A is a finite direct product of cyclic groups.
On the other hand, the enumeration of all finite p-groups is an unthinkable problem. For p a prime, we call a group P a p-group if all of the elements of P have order a power of p. By theorems of Lagrange and Cauchy, P is a finite p-group if and only if |P| = p^n for some n ≥ 0. G. Higman and Sims have established that the number P(n) of p-groups of cardinality p^n is asymptotic to a function of the form p^{(2/27)n^3 + o(n^3)} as n → ∞.
In a different direction, another important generalization of the direct product is the central product. First we introduce an important normal subgroup of a group G.
Definition: Let G be a group. The center, Z(G), of G is defined by
    Z(G) = {z ∈ G : zg = gz for all g ∈ G}.
Definition: Let H, K, and C be groups and let α : C → Z(H) and β : C → Z(K) be injective homomorphisms. Let
    Z = {(α(c), β(c)) ∈ H × K : c ∈ C}.
Then the central product H ◦_C K is the quotient group (H × K)/Z.
Sometimes the central product is (ambiguously) written H ◦ K. Often this is done when Z(H) ≅ Z(K) and α and β are understood to be isomorphisms. Like the direct product, all of these product constructions can be iterated.
In some ways dual to the center of G is the commutator quotient.
Definition: Let G be a group. If x, y ∈ G, then the commutator [x, y] = x^{-1} y^{-1} x y. The commutator subgroup [G, G] of G is defined by
    [G, G] = ⟨[x, y] : x, y ∈ G⟩,
i.e., [G, G] is the subgroup of G generated by all commutators of elements of G. Then [G, G] is a normal subgroup of G and the commutator quotient G/[G, G] is the largest abelian quotient group of G.
Note that G is an abelian group if and only if Z(G) = G if and only if [G, G] = {e}. Sometimes G/[G, G] is called the abelianization of G.
This leads to the definition of the following important classes of groups.
Definition: A nonidentity finite group G is quasi-simple if G = [G, G] and G/Z(G) is a (nonabelian) simple group. A finite group G is semisimple if G = G_1 ◦ G_2 ◦ ··· ◦ G_r, where each G_i is a quasi-simple group.
In the contexts of Lie groups and algebraic groups, a group is called simple if every proper closed normal subgroup is finite. Thus the group SL(n, C), whose center is isomorphic to the finite group of complex nth roots of 1, would be called a simple group by a Lie theorist and a quasi-simple (but not simple) group by a finite group theorist. There is a notion of semisimple group in the categories of linear algebraic groups and Lie groups which coincides with the definition above for connected groups.
Some slightly larger classes of groups play an important role in many areas. The following definitions are not standard. The concept of a connected reductive group is fundamental in the theory of algebraic groups, in which context it is defined to be the product of a normal semisimple group and a torus (a group isomorphic to a product of GL_1’s). The second definition below approximates this notion in the category of all groups.
Definitions: A group G is almost simple if G has a normal subgroup E which is quasi-simple and G/Z(E) is isomorphic to a subgroup of Aut(E). A group G is almost semisimple if G has a normal subgroup E = S ◦ Z(E) with S = S_1 ◦ S_2 ◦ ··· ◦ S_r semisimple (with each S_i quasi-simple) and G/Z(E) is isomorphic to a subgroup of Aut(S) normalizing each S_i.
Most of the classical groups described above are almost semisimple. Indeed, S is a quasi-simple group in most cases. Also, the Levi complements of parabolic subgroups
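The semidirect product construction, and the remarks about which of its two natural subgroups is normal, can be checked on a small case. The following is a minimal Python sketch, not part of the original article. It assumes K = Z_5 and H = Z_2 written additively, with the nonidentity element of H acting on K by negation (so K : H is a dihedral group of order 10). The names semidirect and is_normal are illustrative.

```python
from itertools import product

def semidirect(n):
    """Return (elements, mul, inv) for Z_n : Z_2, with Z_2 acting on Z_n by negation."""
    elements = list(product(range(n), range(2)))

    def phi(h, k):
        # phi(h) is an automorphism of Z_n: the identity for h = 0, negation for h = 1.
        return k % n if h == 0 else (-k) % n

    def mul(a, b):
        # (k, h)(k', h') = (k + phi(h)(k'), h + h'), the source's rule in additive notation.
        (k1, h1), (k2, h2) = a, b
        return ((k1 + phi(h1, k2)) % n, (h1 + h2) % 2)

    identity = (0, 0)

    def inv(a):
        # Brute-force inverse; adequate for the tiny groups used here.
        return next(b for b in elements if mul(a, b) == identity)

    return elements, mul, inv

def is_normal(subset, elements, mul, inv):
    """Check that g s g^-1 lies in subset for every g in the group and s in subset."""
    members = set(subset)
    return all(mul(mul(g, s), inv(g)) in members for g in elements for s in members)

elements, mul, inv = semidirect(5)
K0 = [(k, 0) for k in range(5)]   # the copy {(k, e_H)} of K
H0 = [(0, h) for h in range(2)]   # the copy {(e_K, h)} of H

associative = all(mul(mul(a, b), c) == mul(a, mul(b, c))
                  for a in elements for b in elements for c in elements)
print("associative:", associative)
print("K0 normal:", is_normal(K0, elements, mul, inv))
print("H0 normal:", is_normal(H0, elements, mul, inv))
```

In this example the copy of K is normal while the copy of H is not, which is consistent with the product not being direct. Replacing phi by the identity map for both values of h would give the direct product Z_5 × Z_2 instead.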
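The definitions of the center Z(G) and the commutator subgroup [G, G] can be computed directly for small permutation groups. The sketch below is a hedged illustration in Python, not taken from the source. Permutations are represented as tuples, the commutator follows the convention [x, y] = x^{-1} y^{-1} x y used in the definition, and all function names are assumptions made for the example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def center(group):
    """Elements of the group that commute with every element."""
    return {z for z in group if all(compose(z, g) == compose(g, z) for g in group)}

def generated_subgroup(generators, identity):
    """Closure under multiplication by the generators (a subgroup, since the group is finite)."""
    subgroup = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(x, g)
            if y not in subgroup:
                subgroup.add(y)
                frontier.append(y)
    return subgroup

def commutator_subgroup(group):
    """Subgroup generated by all commutators x^-1 y^-1 x y."""
    identity = tuple(range(len(next(iter(group)))))
    commutators = {compose(compose(inverse(x), inverse(y)), compose(x, y))
                   for x in group for y in group}
    return generated_subgroup(commutators, identity)

S3 = set(permutations(range(3)))
C4 = generated_subgroup({(1, 2, 3, 0)}, (0, 1, 2, 3))   # cyclic group of order 4

for name, G in (("S3", S3), ("C4", C4)):
    Z, D = center(G), commutator_subgroup(G)
    print(name, "order", len(G), "center", len(Z),
          "commutator subgroup", len(D), "abelianization", len(G) // len(D))
```

For the abelian group C4 the center is the whole group and the commutator subgroup is trivial, matching the remark that G is abelian if and only if Z(G) = G if and only if [G, G] = {e}. For S3 the center is trivial and the abelianization has order 2.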
subgroup V of translation maps is a regular normal subgroup, i.e., V transitively permutes the vectors of V via translation and the stabilizer V_u of any vector u is {I}, the identity function (regarded as translation by 0). Thus AGL(V)_0 = GL(V).
In the theory of permutation groups an important role is played by the socle of G, which is the product of all minimal normal subgroups of G. There exist rather detailed theorems concerning the structure of primitive permutation groups, in particular describing the structure of the socle and its intersection with the stabilizer of a point.
Theorem (O’Nan-Scott): Let G be a primitive permutation group on a finite set X with |X| = n, and let B be the socle of G. Then one of the following holds:
1. X may be regarded as an affine space with n = p^m for some prime p, B is the abelian group of all translations of X and G is a subgroup of the affine group AGL(X); or
2. B = S^k is the direct product of k ≥ 1 isomorphic nonabelian simple groups acting via the product action on the Cartesian product set X = Y^k, where S is a primitive simple subgroup of S(Y); or
3. B = S^k is the direct product of k > 1 nonabelian simple groups acting either via diagonal action with |X| = |S|^{k−1} and with B_x a diagonally embedded copy of S, or via twisted wreath action with |X| = |S|^k and B_x = 1.
The O’Nan-Scott theorem focuses attention on the primitive permutation actions of nonabelian simple groups and indeed, in the wake of the classification of finite simple groups, considerable effort has been invested in the program of determining all maximal subgroups of all finite simple groups and all closed maximal subgroups of all simple algebraic groups. There exist detailed structure theorems for these maximal subgroups, as well as extensive specific information. Besides the case-by-case enumerations of maximal subgroups for the sporadic simple groups, most of this theory has been developed in the context of simple algebraic and Lie groups by Dynkin, Seitz, Liebeck, and others. Lifting theorems, originating with Steinberg, make it possible to a certain extent to transfer maximal subgroup questions concerning finite classical linear and exceptional groups to analogous questions about algebraic groups over the algebraic closure of the finite field.
Since the mid-nineteenth century there has been considerable interest in highly transitive permutation groups.
Definition: Let G be a permutation group on a set X and let k be a positive integer. We say that G acts k-transitively on X if for any two ordered k-tuples (x_1, ..., x_k) and (y_1, ..., y_k) with x_i ≠ x_j and y_i ≠ y_j for i ≠ j, there exists an element g ∈ G with g(x_i) = y_i for all i.
Thus transitive is the same as 1-transitive. In the 1860s, Mathieu discovered two remarkable groups, M12 and M24, which act 5-transitively on sets of cardinality 12 and 24, respectively. Despite considerable effort, the only known proof of the following theorem is as a corollary of the classification of finite simple groups.
Theorem: Let G be a finite k-transitive permutation group on a set X for some k ≥ 4. Then either G = S(X) with |X| ≥ 4 or G = Alt(X) with |X| ≥ 6 or G = M11, M12, M23, or M24. [Here M11 (resp. M23) is the stabilizer of a point in M12 (resp. M24).]
Highly transitive permutation groups lead to tight combinatorial designs which may often be interpreted as error-correcting codes or dense sphere-packing lattices. Specifically, Witt described designs (or Steiner systems) S(5, 6, 12) and S(5, 8, 24) associated with the Mathieu groups M12 and M24, respectively. These yield the ternary and binary Golay codes, respectively.
There is also a complete description of finite k-transitive permutation groups for k = 2 and 3. One important class of examples of 2-transitive permutation groups is the class of projective general linear groups PGL(V). For V any vector space (over a field F) of dimension at least 2, we may form the projective space P(V) whose objects are the k-dimensional subspaces of V, k ≥ 1, with incidence being given by symmetrized containment of subspaces. Then GL(V) acts on the points of P(V) and the kernel of the action is Z(GL(V)), the group of scalar linear transformations on V. The projective general linear group PGL(V) = GL(V)/Z(GL(V)) then acts as a 2-transitive permutation group on the set P(V). The image of SL(V) in the symmetric group on P(V) is the subgroup of PGL(V), denoted PSL(V), and it too acts 2-transitively on P(V).
The lines of projective spaces are coordinatized by the elements of the field F plus one extra “point at infinity.” As no element of PGL(V) can map three collinear points to three noncollinear points, the action of PGL(V) cannot be 3-transitive unless the geometry contains only one line. This is precisely the case when dim(V) = 2, and indeed in this case the action is 3-transitive. If V is two-dimensional over the finite field F with |F| = q, then we write PGL(V) = PGL(2, q) and PSL(V) = PSL(2, q). If q = 2^m, then PGL(2, q) = PSL(2, q), while if q is odd, then PSL(2, q) is a normal subgroup of PGL(2, q) of index 2.
Theorem: Let G be a finite 3-transitive (but not 4-transitive) permutation group on a set X. Then one of the following holds:
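The definition of k-transitivity can be tested by brute force on small groups. The following Python sketch is an illustration only and is not from the source. It uses the fact that a group is k-transitive exactly when the orbit of one k-tuple of distinct points consists of all such k-tuples. The helper names and the choice of S4, A4, and a cyclic group of order 4 as test cases are assumptions of the example.

```python
from itertools import permutations
from math import perm

def is_k_transitive(group, points, k):
    """Brute-force k-transitivity test for a finite permutation group.

    `group` is an iterable of permutations as tuples (g[i] is the image of i).
    The group is k-transitive iff the orbit of one k-tuple of distinct points
    contains every k-tuple of distinct points.
    """
    points = list(points)
    base = tuple(points[:k])
    orbit = {tuple(g[x] for x in base) for g in group}
    return len(orbit) == perm(len(points), k)

def is_even(p):
    """Parity of a permutation via its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

S4 = list(permutations(range(4)))
A4 = [p for p in S4 if is_even(p)]
C4 = [tuple((i + s) % 4 for i in range(4)) for s in range(4)]

for name, G in (("S4", S4), ("A4", A4), ("C4", C4)):
    degrees = [k for k in range(1, 5) if is_k_transitive(G, range(4), k)]
    print(name, "is k-transitive for k in", degrees)
```

On four points the symmetric group is 4-transitive, the alternating group is 2-transitive but not 3-transitive, and the cyclic group is transitive but not 2-transitive. The check is exponential in practice and is meant only for very small examples.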
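The statements about PGL(2, q) and PSL(2, q) acting on the projective line can be illustrated numerically. The Python sketch below is a hedged example, not part of the source. It is restricted to prime fields F_p (the source allows any prime power q), represents the projective line as the points [x : 1] together with [1 : 0], and counts the distinct permutations induced by invertible 2 × 2 matrices. The function names are illustrative.

```python
from itertools import product

def projective_line(p):
    """Points [x : 1] for x in F_p, together with the point at infinity [1 : 0]."""
    return [(x, 1) for x in range(p)] + [(1, 0)]

def normalize(x, y, p):
    """Scale a nonzero vector (x, y) over F_p to its standard representative."""
    if y % p != 0:
        return (x * pow(y, p - 2, p) % p, 1)   # y^(p-2) is the inverse of y mod p
    return (1, 0)

def induced_permutations(p, special=False):
    """Permutations of the projective line induced by GL(2, p), or by SL(2, p) if special."""
    points = projective_line(p)
    index = {pt: i for i, pt in enumerate(points)}
    perms = set()
    for a, b, c, d in product(range(p), repeat=4):
        det = (a * d - b * c) % p
        if det == 0 or (special and det != 1):
            continue
        perms.add(tuple(index[normalize(a * x + b * y, c * x + d * y, p)]
                        for x, y in points))
    return perms

for p in (2, 3, 5):
    pgl = induced_permutations(p)
    psl = induced_permutations(p, special=True)
    triples = {g[:3] for g in pgl}   # images of the first three points
    print("p =", p, "|PGL| =", len(pgl), "|PSL| =", len(psl),
          "3-transitive:", len(triples) == (p + 1) * p * (p - 1))
```

Scalar matrices induce the identity permutation, so collecting permutations in a set performs the quotient by Z(GL(V)) automatically. For p = 2 the two groups coincide, and for the odd primes 3 and 5 the image of SL(2, p) has half the order of PGL(2, p), in line with the index 2 statement.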
in the sense that each indecomposable summand V_i is irreducible, i.e., V_i has no proper G-invariant subspaces. On the other hand, every finite group of order divisible by a prime p has a p-modular representation space V which is indecomposable but not irreducible. The p-modular representation theory for cyclic p-groups is precisely the theory of unipotent Jordan canonical matrices. A unipotent Jordan block represents an indecomposable module and is irreducible only if it is a 1 × 1 block. Similar considerations play a significant role in the representation theory of Lie groups and algebraic groups. Thus a fundamental result in Lie theory asserts the complete reducibility of finite-dimensional complex representations of complex semisimple Lie groups. By contrast, nilpotent Lie groups typically have indecomposable finite-dimensional complex representations which are not irreducible.
For the remainder of this section we shall assume that V is finite-dimensional, although many of the concepts such as characters and induced representations have wider applicability.
In characteristic 0, a G-representation ρ is completely determined by its trace function or character χ_ρ, defined by
    χ_ρ(g) = Tr(ρ(g)) for all g ∈ G,
where Tr is the trace of the matrix ρ(g). As similar matrices have the same trace, χ_ρ is a class function, i.e., it is constant on G-conjugacy classes. The (ordinary) character theory of finite groups was developed by Frobenius, Burnside, and Schur around 1900. For abelian groups, complex characters are simply homomorphisms into C^×, and they arose in the number-theoretic investigations of Legendre, Gauss, and Dirichlet. It was Dedekind whose letter to Frobenius stimulated the extension of this theory to nonabelian groups.
The following concept goes back to Galois.
Definition: A group G is solvable if and only if there is a series
    G = G_0 > G_1 > ··· > G_{n−1} > G_n = {e},
with G_{i+1} ◁ G_i for all i < n and with G_i/G_{i+1} an abelian group.
For finite groups this is equivalent to the assertion that the composition factors of G are all cyclic of prime order. Galois established that a polynomial equation is “solvable by radicals” if and only if its Galois group is a finite solvable group, extending the result of Abel mentioned in Section I. Sylow proved that a finite group of prime-power cardinality is necessarily a solvable group. One of the great early achievements of character theory was the proof of the following result.
Burnside’s p^a q^b Theorem: If G is a finite group with |G| = p^a q^b, with p and q primes, then G is a solvable group.
There is also a theory of Brauer characters in the context of modular representation theory, but the Brauer characters do not determine the modular representations uniquely, except in the case of irreducible modular representations.
A fundamental construction in representation theory, introduced by Frobenius, is the induced representation. First note that for a finite group G and a field F, the F-vector space F[G] with basis G becomes a ring when endowed with the unique extension of the group multiplication to F[G] via the distributive law. F[G] may also be regarded as the algebra of F-valued functions on G with convolution product and, as such, has natural generalizations in the categories of Lie and algebraic groups. An F-vector space V with a G-action is an F[G]-module in a natural way.
Definition: Let G be a finite group, H a subgroup of G, F a field and V an F[H]-module. The induced F[G]-module V^G is the tensor product F[G] ⊗_{F[H]} V with the action of F[G] via left multiplication.
There is a similar construction in the categories of Lie groups and algebraic groups, which plays a fundamental role, for example, in Harish-Chandra’s construction of unitary representations of semisimple Lie groups. In this context it is often best to study representations of the associated Lie algebra, in which case the universal enveloping algebra substitutes for the group algebra.
Induced modules are analogous to imprimitive sets in the theory of permutation groups. Indeed, a G-module V is called imprimitive if V = U^G, where U is an H-module for some proper subgroup H of G. Otherwise V is called a primitive G-module.
A new complication in representation theory relates to the size of the field F. This is the extension of the basic problem of finding eigenvalues in the field for a matrix or linear operator. Thus the theory works most smoothly over algebraically closed fields such as C. Sometimes, however, it is necessary to work over smaller fields and due caution is necessary. For example, an F[G]-module V may be irreducible but not absolutely irreducible, in the sense that V becomes reducible upon suitable extension of the field of scalars. The theory of Schur indices and Brauer groups is important in this context.
There is a further important way of “analyzing” a G-module into simpler constituents. An F[G]-module V is said to be tensor decomposable if V ≅ U ⊗ W with U and W F[G]-modules. [Here ⊗ denotes the ordinary tensor product over F and g(u ⊗ w) = g(u) ⊗ g(w).] Otherwise V is tensor indecomposable. A fundamental theorem
of Steinberg, the tensor product theorem, describes all irreducible rational modules for simple algebraic groups and all irreducible modules for finite groups of Lie type in the defining characteristic (i.e., p-modular representations, where p is the defining prime for the group) as tensor products of certain basic modules and their Galois “twists.”
As tensor decomposability parallels reducibility, so too there is a notion of a tensor-induced F[G]-module which is analogous to the notion of an imprimitive F[G]-module. We can call a module tensor-primitive if it is neither tensor decomposable nor tensor-induced. Thus the “prime” object in this context is an absolutely irreducible primitive and tensor-primitive F[G]-module.
In the context of linear representation theory, the socle is no longer the correct subgroup to consider. For example, whereas the socle of the permutation group S_n (when n ≥ 5) is the nonabelian simple group A_n, which captures much of the structure of S_n, the socle of the general linear group GL(n, q), when n divides q − 1, is a subgroup of the group Z of scalar matrices and captures very little of the structure of GL(n, q). Surprisingly it was not until 1970 that the correct replacement for the socle in the context of linear groups or general finite groups was defined. Before giving the definition, we must finally discuss the important class of nilpotent groups.
Definition: For a group G, we define recursively G_0 = G and G_{i+1} = [G_i, G] for i ≥ 0. [Thus G_1 = [G, G] is the commutator subgroup of G.] A group G is said to be nilpotent (of nilpotence class n) if G_n = {e} for some positive integer n (with n the smallest such positive integer).
Thus an abelian group is a nilpotent group of nilpotence class 1. As G_i is a normal subgroup of G for all i and as G_i/G_{i+1} is abelian, every nilpotent group is a solvable group. However, the converse is false, as shown by the symmetric group S_3. Sylow proved that every finite p-group is nilpotent, and indeed, the following theorem characterizes finite nilpotent groups.
Theorem: A finite group G is nilpotent if and only if G is a direct product of finite p-groups.
In the category of linear algebraic groups, the quintessential (though certainly not the only) examples of connected nilpotent groups are the groups U(n, F) of all upper triangular matrices with all diagonal entries equal to 1. We call a matrix unipotent if it is conjugate to a matrix in U(n, F). We have the following theorems.
Theorem: Let U be a subgroup of GL(n, F) all of whose elements are unipotent matrices. Then U is a nilpotent group and is conjugate to a subgroup of U(n, F).
Lie’s Theorem: Let G be a closed connected solvable subgroup of GL(n, C). Then G is conjugate to a subgroup of B(n, C), the solvable group of all upper triangular matrices in GL(n, C).
Thus every subgroup of U(n, C) is nilpotent and every closed connected nilpotent subgroup of GL(n, C) is conjugate to a subgroup of B(n, C). Lie’s theorem does not extend to arbitrary fields. For example, SO(2) is a closed connected abelian subgroup of SL(2, R) which is not conjugate to a subgroup of B(2, R).
Returning to abstract groups, it can easily be shown that the product of two normal nilpotent subgroups of a group G is again nilpotent. Hence if G is a finite group, then G has a unique maximal normal nilpotent subgroup, called the Fitting subgroup of G, F(G). Likewise the product of two normal semisimple subgroups of a group G is again semisimple and so a finite group G has a unique maximal normal semisimple subgroup denoted E(G). Moreover, the subgroups E(G) and F(G) commute with each other elementwise.
Definition: The generalized Fitting subgroup F*(G) of the finite group G is the central product F*(G) = E(G) ◦ F(G).
Notice that if F(G) is an abelian group, then F*(G) is an almost semisimple group. In the general theory of finite groups (or more generally of groups having a composition series), the group F*(G) plays the role that the socle plays in the theory of permutation groups. Thus, returning to the example of GL(n, q), although the socle is typically a subgroup of the group of scalar matrices Z, F*(GL(n, q)) = SL(n, q) ◦ Z captures most of the structure of GL(n, q).
To further appreciate the fundamental significance of F*(G) we must introduce a basic concept.
Definition: Let X be any subset of the group G. The set
    C_G(X) = {g ∈ G : gx = xg for all x ∈ X}
is a subgroup of G called the centralizer in G of X.
Definition: Let G be a group and N a normal subgroup of G. The conjugation action of G on the elements of N defines a homomorphism G → Aut(N). The kernel of this homomorphism is C_G(N). Thus G/C_G(N) is isomorphic (via this homomorphism) to a subgroup of Aut(N).
Theorem (Fitting–Bender): Let G be a finite group. Then
    C_G(F*(G)) = Z(F(G)).
Thus G/Z(F(G)) is isomorphic to a subgroup of Aut(F*(G)).
In particular, this result says that |G| is bounded by a function of |F*(G)|. Specifically, |G| ≤ (|F*(G)|)!. (This
upper bound is achieved only when G = S_n for n ≤ 4.) In fact, much more is true in most practical applications.
Returning briefly to linear groups, we recall that the fundamental case to analyze was the following: G is a subgroup of GL(V), where V is an absolutely irreducible primitive and tensor-primitive G-module. In this case the following structure theorem is valid. We need one further definition.
Definition: Let p be a prime. An extraspecial p-group E is a finite p-group such that |Z(E)| = p and E/Z(E) is a nontrivial abelian p-group of exponent p.
Extraspecial p-groups are the finite cousins of the Heisenberg groups, which play a prominent role in quantum mechanics.
Theorem: Let G be a finite subgroup of GL(V), where V is an n-dimensional vector space over the field F. Suppose that for every normal subgroup N of G not contained in Z = Z(GL(V)), V is an absolutely irreducible primitive and tensor-primitive F[N]-module. Then either
1. F*(G) = E ◦ (Z ∩ G) with E an extraspecial p-group for some prime p such that the field F contains primitive pth roots of 1; or
2. F*(G) = E ◦ (Z ∩ G) with E a quasi-simple finite group and Z(E) ≤ Z.
A version of this theorem figures prominently in Aschbacher’s analysis of the maximal subgroups of the finite classical matrix groups.
The importance of finite groups G for which F*(G) = E is an extraspecial p-group (or a close relative thereof) affords an excuse for an illustration of the use of the Fitting–Bender theorem and of linear group methods in the study of abstract finite groups. First, let E be an extraspecial p-group and let V = E/Z(E). Then V may be regarded as a finite-dimensional vector space over the finite field F_p. Moreover, we may identify Z(E) with the field F_p. Then the function b : V × V → F_p, defined by
    b(u + Z(E), v + Z(E)) = [u, v] for all u, v ∈ E,
is a nondegenerate alternating form on the vector space V. Moreover, if p is odd, then there is a surjective homomorphism Aut(E) → CSp(V), the conformal symplectic group of all similarities of the symplectic space (V, b). The kernel of this homomorphism is the group of all inner automorphisms of E, i.e., the automorphisms induced by the conjugation action of E on itself. This group, Inn(E), may be identified with V = E/Z(E) and then we see that Aut(E) ≅ V : CSp(V). When p = 2, the squaring map Q : V → F_2, defined by
    Q(u + Z(E)) = u^2 for all u ∈ E,
is a quadratic form on V whose associated bilinear form is b. Now there is a surjective homomorphism Aut(E) → O(V, Q) whose kernel is again Inn(E) ≅ V. Thus Aut(E)/Inn(E) ≅ O(V, Q). In this case, however, Griess proved that the group Aut(E) is not in general a semidirect product. In any case, we get the following theorem.
Theorem: Let G be a finite group such that F*(G) = E is an extraspecial p-group. Set V = E/Z(E). Then G/E is isomorphic to a subgroup of CSp(V). If p = 2, then in fact G/E is isomorphic to a subgroup of O(V, Q), where Q is the quadratic squaring map. Moreover, G/E has no nontrivial normal p-subgroup.
Thus the structure of G/E for such groups G may be analyzed by the methods of linear representation theory. In fact, a similar line of reasoning may be applied to any finite group G such that F*(G) is a p-group for some prime p. One of the earliest and deepest results in this vein is the Hall–Higman theorem, which illustrates the relevance of the final assertion of the preceding theorem.
Hall–Higman Theorem B: Let V be a finite-dimensional vector space over a field of characteristic p, p an odd prime. Let G be a solvable subgroup of GL(V) having no nontrivial normal p-subgroup. Let x be an element of G of order p^n. Then the minimum polynomial m_x(t) of x is (t − 1)^{p^n}, except possibly if p is a Fermat prime and G contains a nonabelian normal 2-subgroup.
Note that p^n is the largest possible size of a Jordan block for a linear transformation of order p^n acting on a vector space in characteristic p.
The general theory of p-modular representations of finite groups was developed primarily by Richard Brauer. The sharpest results are achieved when every p-subgroup of G is cyclic, for example, when |G| is divisible by p but not by p^2. A related theme of great importance is the p-modular representation theory of a finite group of Lie type G(q) defined over a field F_q of characteristic p. The methodology here is completely different from Brauer’s theory and instead is analogous to the theory of (rational) representations of semisimple Lie groups or algebraic groups. Indeed, Steinberg proved that all irreducible representations of G(q) come by restriction from the irreducible rational representations over F̄_q of the algebraic group G(F̄_q), where F̄_q is the algebraic closure of F_q and G(q) arises from G(F̄_q) by taking fixed points of a Frobenius–Steinberg endomorphism. The group G(F̄_q) is the characteristic p analog of the Lie group G(C). Now, the irreducible continuous representations of G(C) are parametrized by “highest weights” and the characters of these representations are given by the Weyl character formula. Curtis and Richen showed that an analogous parametrization holds for the finite groups of Lie type. An
analog of the Weyl character formula has been conjectured by Lusztig and proved generically, but
not yet for all primes p.

For applications to topology and number theory, there have been efforts to study integral
representations of a finite group G, i.e., homomorphisms of G into GL(n, R), where R is the ring
of integers of some algebraic number field, e.g., R = Z. The situation here is quite subtle, and
only small cases can be handled effectively.

VI. FINITE SIMPLE GROUPS

Around 1890, Wilhelm Killing published a series of papers classifying the finite-dimensional
semisimple Lie algebras over C, modulo a significant error corrected by E. Cartan. In particular,
via Lie’s correspondence, this gave a classification of the finite-dimensional simple Lie groups
over C, and this classification was shortly extended to the real field as well by Cartan. In
addition to the Lie algebras associated with the classical linear groups over C, Killing
discovered five exceptional simple Lie algebras over C, which he denoted E_2, F_4, E_6, E_7, and
E_8, the subscript denoting the maximum dimension of a diagonalizable subalgebra, later known as
a Cartan subalgebra. Later E_2 came to be known as G_2. In Killing’s notation the classical
families of Lie groups over C are: A_n(C) = PSL(n + 1, C), B_n(C) = PSO(2n + 1, C),
C_n(C) = PSp(2n, C), and D_n(C) = PSO(2n, C).

The classification theorem of Killing and Cartan, which is a fundamental precursor of the
classification of the finite simple groups, is the following result.

Theorem: Let G be a finite-dimensional simple Lie group with trivial center. Then G is either (an
adjoint version of) a member of one of the four infinite families of classical Lie groups A_n(C),
B_n(C), C_n(C), and D_n(C); or G is isomorphic to the adjoint version of one of the five
exceptional Lie groups E_6(C), E_7(C), E_8(C), F_4(C), or G_2(C).

In 1955 Chevalley published a paper constructing analogs of Killing’s groups over all fields. In
particular, when the fields are finite, these Chevalley groups are finite simple groups, except
in a few cases over fields of cardinality 2 or 3. Shortly thereafter, Steinberg did the analog of
Cartan’s passage to real forms of Lie groups and constructed the so-called twisted Chevalley
groups or Steinberg groups ²A_n(q) = PSU(n + 1, q), ²D_n(q), ³D_4(q), and ²E_6(q). If V is an
even-dimensional vector space over a finite field F_q, then there are exactly two nonisomorphic
orthogonal groups on V: O+(V) and O−(V), where O+(V) is the isometry group of an orthogonal space
of maximal Witt index (i.e., an orthogonal sum of hyperbolic planes). Then, for n ≥ 3, D_n(q) and
²D_n(q) are the unique nonabelian simple composition factors of the groups O+(V), resp. O−(V),
where V is a 2n-dimensional F_q-space. The groups ³D_4(q) and ²E_6(q) were new families of
nonabelian finite simple groups. ²E_6 had arisen already as a real form of E_6 in Cartan’s
classification, but the ³D_4 groups were truly new, arising as the fixed points on D_4(F) of
ρ ∘ σ, where ρ is the special triality automorphism of the D_4 oriflamme geometry and σ is a
field automorphism of F of order 3. [Note: only involutory automorphisms arise in the
classification of real forms of Lie groups, since (C : R) = 2.]

As 1960 dawned, the known nonabelian finite simple groups were the alternating groups of degree
at least 5, the Chevalley and Steinberg groups (including the classical linear groups), and the
five Mathieu groups, which had been dubbed sporadic groups by Burnside around 1900. Each of these
groups has even order, and indeed it had been conjectured as early as 1900 by Miller and Burnside
that groups of odd order were necessarily solvable groups. Even more, each of these groups has
order divisible by 6. In fact, they all have subgroups isomorphic to either SL(2, 3) or
PSL(2, 3) ≅ A_4, with the exception of the 2-transitive simple permutation groups PSL(2, 2^n) for
n odd, n ≥ 3.

In 1960 Michio Suzuki discovered the first known nonabelian simple groups of order not divisible
by 3, the infinite family of 2-transitive simple permutation groups Sz(2^n), n odd, n ≥ 3.
Shortly thereafter, Ree showed that a modification of Steinberg’s twist could be applied to the
Chevalley groups B_2 and F_4 over fields of order 2^n, n odd, and to G_2 over fields of order
3^n, n odd, to produce the families of simple groups ²B_2(2^n), ²F_4(2^n), and ²G_2(3^n), n odd.
In particular, the groups ²B_2(2^n) were precisely the Suzuki groups Sz(2^n). The other two
families were new and were dubbed Ree groups. The finite Chevalley, Steinberg, Suzuki, and Ree
groups are now often called the finite groups of Lie type, inasmuch as they can be uniformly
constructed as the fixed points of Frobenius–Steinberg endomorphisms of simple algebraic groups
over algebraically closed fields of prime characteristic, and these groups are analogous to the
simple Lie groups over C classified by Killing and Cartan.

Meanwhile, in 1960–1962, Walter Feit and John G. Thompson wrote the most remarkable paper in the
history of finite group theory, proving the Miller–Burnside conjecture.

The Odd-Order Theorem (Feit–Thompson): Every finite group of odd order is solvable.

This paper and the successor “N-Group Paper” of Thompson provided most of the fundamental ideas
for the classification of the finite simple groups. The first tool in the analysis of finite
simple groups was provided in 1872 by Ludvig Sylow.
Sylow’s Theorem: If G is a finite group with |G| = p^n m for some prime p, with m not divisible
by p, then G has subgroups of order p^k for all k, 0 ≤ k ≤ n, and all subgroups of G of order p^n
are G-conjugate.

The maximal p-subgroups of G are called Sylow p-subgroups of G. It is interesting to note that
there are somewhat analogous results in the category of linear algebraic groups, namely, the
conjugacy of Borel subgroups (maximal closed connected solvable subgroups) and of maximal tori
(maximal connected semisimple abelian subgroups) of the linear algebraic group G. Note that a
Borel subgroup B = N_G(U) = UT, where U is a maximal unipotent subgroup and T is a maximal torus.
If G is defined over an algebraically closed field of characteristic p, then U is a maximal
p-subgroup of G, in the sense that U is maximal with the property that every element of U has
order a power of p.

Returning to finite group theory, the set of all normalizers and centralizers of nonidentity
p-subgroups of the finite group G is called the set of p-local subgroups of G, and the study of G
via the analysis of these subgroups is called p-local analysis. The “N-Group Paper” combined with
the “Odd Order Paper” completed the classification of finite simple groups all of whose p-local
subgroups are solvable groups. The strategy in brief is to choose a prime p and a Sylow
p-subgroup P for which the p-rank (the dimension of an elementary abelian p-subgroup, thought of
as a vector space over F_p) is as large as possible. A theorem of Burnside, using monomial
representations, guarantees that p may be chosen so that either the p-rank is at least 3 or p = 2
and |G| is a multiple of 12. One now studies the normalizers of all nonidentity subgroups of P.
If G is going to turn out to be a finite group of Lie type defined over a field of characteristic
r, then almost always the chosen p will turn out to be r and the geometry defined by the coset
spaces G/M_i, where M_i is a p-local subgroup of G containing P, will be precisely the Tits
building for the group G. If there are at least two maximal p-local subgroups M_i containing P,
then the Tits building will have rank at least 2 and the geometry will be rich enough to
characterize the group G. More accurately, finite buildings of rank at least 3 were classified in
a major paper of Tits. Finite rank 2 buildings include all finite projective planes. However, the
buildings of rank at least 2 which occur in the context of the simple group classification
satisfy a “Moufang condition,” which permits their complete identification. On the other hand, if
there is a unique maximal p-local subgroup M of G containing P, then the “geometry” is simply the
point set G/M. This leads to a problem in permutation group theory.

Problem: Classify finite transitive permutation groups G in which, for some fixed prime p, every
nonidentity p-element fixes exactly one point.

For p = 2, this problem was solved by a combination of independent results of Suzuki and Bender.

Theorem: Let G be a finite transitive permutation group in which every involution (element of
order 2) fixes exactly one point. Then either G has 2-rank at most 1 or G ≅ SL(2, 2^n),
Sz(2^(2n−1)), or PSU(3, 2^n) for some n ≥ 2.

For groups of odd order, the problem was solved by Feit and Thompson in the “Odd Order Paper.”
The theory of group characters plays a major and seemingly unavoidable role in this delicate
analysis. The complete problem has been solved as a corollary of the classification of finite
simple groups, but an independent solution for p odd remains elusive.

A somewhat different strategy for the classification of finite simple groups was pioneered by
Richard Brauer starting around 1950. For groups of even order, Brauer championed an inductive
approach focusing on the centralizer C_G(t) of an involution t. A philosophical underpinning for
this approach was provided by the following result.

Brauer–Fowler Theorem: Let G be a finite simple group of even order containing the involution t.
If |C_G(t)| = c, then |G| < (c^2)!.

The Brauer–Fowler bound is useless in practice, but it does establish that given a finite group H
with center of even order, the problem of finding all finite simple groups G containing an
involution t with C_G(t) ≅ H is a finite problem. In practice, as Brauer, Janko, and their
students demonstrated in the ensuing decades, the problem is not only finite, it is doable.
Indeed, a posteriori, C_G(t) usually determines G uniquely. Once Feit and Thompson proved that
every nonabelian finite simple group has even order, Brauer’s strategy became an inductive
strategy for the remainder of the classification proof. Brauer’s strategy is significantly
different from the Feit–Thompson strategy in the following sense: since most primes are odd, a
finite simple group of Lie type is almost always defined over a field of characteristic r with
r ≠ 2. Thus, for most finite simple groups G, each involution t is a semisimple element in the
sense that F*(C_G(t)) is an almost semisimple group. Fortunately, the two approaches mesh and
complement each other, and the current proof is a blending of both.

In the course of pursuing Brauer’s strategy, Janko investigated many possible specific structures
for the centralizer of an involution in a simple group. One of these structures, C_2 × A_5, led
him to a new simple group, J_1, the first sporadic group discovered since Mathieu. Shortly
thereafter, Janko discovered two more sporadic groups, J_2 and J_3. The former was soon
constructed as a rank-3 permutation group by M. Hall. The involution centralizer approach and the
rank-3 permutation group approach were the main tactics leading to the discovery of 21 sporadic
groups in the decade 1965–1975.
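
As an illustration of Sylow’s theorem as stated in this section, the following Python sketch checks both assertions by brute force on the symmetric group S_4, of order 24 = 2^3 · 3. The helper names (`compose`, `generated_subgroup`, `check_sylow`) are illustrative and not taken from the article, and the search only visits subgroups generated by at most two elements, which is an assumption that happens to be harmless for S_4.

```python
from itertools import permutations, product


def compose(a, b):
    """Permutation 'a after b'; a[i] is the image of point i."""
    return tuple(a[i] for i in b)


def inverse(a):
    inv = [0] * len(a)
    for point, image in enumerate(a):
        inv[image] = point
    return tuple(inv)


def generated_subgroup(gens, identity):
    """Close the generators under composition (enough in a finite group)."""
    elems = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in elems:
                elems.add(y)
                frontier.append(y)
    return frozenset(elems)


def subgroups_by_order(group, identity):
    """Subgroups generated by at most two elements, keyed by their order."""
    found = {}
    for a, b in product(group, repeat=2):
        h = generated_subgroup((a, b), identity)
        found.setdefault(len(h), set()).add(h)
    return found


def conjugates(h, group):
    """All subgroups g h g^-1 for g in the group."""
    return {frozenset(compose(compose(g, x), inverse(g)) for x in h) for g in group}


def check_sylow(group, identity, p):
    """Check: subgroups of order p^k exist for 0 <= k <= n, and those of order p^n are conjugate."""
    order, n = len(group), 0
    while order % p ** (n + 1) == 0:
        n += 1
    subs = subgroups_by_order(group, identity)
    has_all_orders = all(p ** k in subs for k in range(n + 1))
    sylow = subs.get(p ** n, set())
    one_class = bool(sylow) and conjugates(next(iter(sylow)), group) == sylow
    return has_all_orders and one_class


identity = tuple(range(4))
S4 = list(permutations(range(4)))
print(check_sylow(S4, identity, 2), check_sylow(S4, identity, 3))
```

For larger groups a brute-force search of this kind quickly becomes impractical; it is meant only to make the statement concrete.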
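
To see why the Brauer–Fowler bound is described as useless in practice, one can evaluate it for the smallest nonabelian simple group, A_5. The Python sketch below is an illustration only: the choice of A_5 and of the involution t = (0 1)(2 3) is an assumption made for the example, not something taken from the article.

```python
from itertools import permutations
from math import factorial


def compose(a, b):
    """Permutation 'a after b'; a[i] is the image of point i."""
    return tuple(a[i] for i in b)


def is_even(perm):
    """A permutation is even when it has an even number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return inversions % 2 == 0


# A_5 as the even permutations of five points.
A5 = [p for p in permutations(range(5)) if is_even(p)]

# An involution t = (0 1)(2 3) and its centralizer C_G(t) in A_5.
t = (1, 0, 3, 2, 4)
centralizer = [g for g in A5 if compose(g, t) == compose(t, g)]

c = len(centralizer)
bound = factorial(c ** 2)  # the Brauer–Fowler bound (c^2)!
print(len(A5), c, bound)
```

Here the bound (c^2)! exceeds |A_5| = 60 by many orders of magnitude, which is consistent with the remark that the theorem matters for its finiteness statement rather than for its numerical value.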
In a different vein, John Leech constructed in 1967 the Leech lattice, the densest
sphere-packing lattice in R^24, based on the Steiner system S(5, 8, 24) for M_24. John Conway
investigated the automorphism group ·0 of the Leech lattice and discovered that ·0/{±1} = Co_1
was a new finite simple group, as were two of its subgroups, Co_2 and Co_3. In 1974, pursuing the
involution centralizer philosophy, starting with a centralizer C built out of the Leech lattice
modulo 2, Λ/2Λ, and Co_1, Bernd Fischer and Robert L. Griess, Jr., independently discovered the
largest of the sporadic simple groups, dubbed M, the Monster. The Monster was constructed by
Griess as the automorphism group of a 196,884-dimensional nonassociative commutative C-algebra,
the Griess algebra. Twenty of the 26 sporadic simple groups are contained (as quotients of
subgroups) in the Monster. The investigation of numerical mysteries (dubbed Monstrous Moonshine)
associated with M led Frenkel, Lepowsky, Meurman, and Borcherds to develop the mathematical
theory of vertex operator algebras, objects which were first studied by physicists.

The classification theorem for finite simple groups is the following theorem.

Classification Theorem: Let G be a finite simple group. Then G is isomorphic to a member of one
of the following families of simple groups:

1. The cyclic groups of prime order;
2. The alternating groups of degree at least 5;
3. The finite simple groups of Lie type; and
4. The 26 sporadic simple groups.

There are a small number of unusual isomorphisms among the simple groups. For example,
A_5 ≅ SL(2, 4) ≅ PSL(2, 5), A_6 ≅ PSL(2, 9) ≅ [Sp(4, 2), Sp(4, 2)], and A_8 ≅ SL(4, 2).

The proof is a massive inductive argument, i.e., one assumes throughout that G is a minimal
counterexample to the statement of the theorem, and so every composition factor of every proper
subgroup of G (in particular, of every p-local subgroup of G) is one of the listed simple groups.
Having chosen a prime p upon which to perform p-local analysis according to either the
Feit–Thompson or Brauer strategy, one is confronted by the major problem of attempting to
establish that for many of the maximal p-local subgroups H of G, either F*(H) is an almost
semisimple group or F*(H) is a p-group. In the latter case we say that H is of parabolic type,
since in a finite simple group G of Lie type in characteristic p, all subgroups which contain a
Sylow p-normalizer (the so-called parabolic subgroups of G) are of parabolic type. Aschbacher and
Timmesfeld were leaders in the analysis of such groups in the 1970s. Later, Stroth, Stellmacher,
and Meierfrankenfeld also pursued this vein.

The principal obstruction to establishing this structure theorem for maximal p-locals is the
hypothetical existence of certain p′-subgroups (subgroups of order relatively prime to p) whose
normalizers contain large p-subgroups. These were dubbed p-signalizers by Thompson, who developed
the principal strategies for controlling them. These strategies were honed into a “signalizer
functor theory” by Gorenstein and Walter with major contributions by Goldschmidt, Glauberman,
Harada, Lyons, McBride, and others.

VII. OTHER TOPICS

Thus far the article has focused on finite groups, with some reference to Lie groups and linear
algebraic groups. Certain other classes of groups have played a fundamental role in mathematics.
Klein and Poincaré pioneered the study of discrete subgroups of PSL(2, R) and PSL(2, C). For
instance, PSL(2, R) acts naturally as fractional linear transformations on the upper half-plane
of R^2, which may be identified with the hyperbolic plane with PSL(2, R) acting as
orientation-preserving isometries. A discrete subgroup is then a subgroup which does not contain
arbitrarily small translations or rotations. A similar interpretation may be given to discrete
subgroups of PSL(2, C) viewed as isometries of hyperbolic 3-space. An important class of examples
of these discrete subgroups is given by PSL(2, Z) and its congruence subgroups, i.e., the kernels
of the natural homomorphisms of PSL(2, Z) onto PSL(2, Z/nZ), where Z/nZ is the ring of integers
modulo n. These discrete groups may be regarded as the symmetry groups of tessellations (tilings)
of hyperbolic space by congruent (hyperbolic) polygons. This theory has been generalized to the
study of arithmetic subgroups of higher-dimensional Lie groups. Like the integers, these groups
do not have a composition series and so they cannot be “built up from the bottom” like finite
groups: they have no bottom from which to build. They are, however, usually generated by a finite
set of elements or proper subgroups and may be viewed as a quotient of the universal or freest
object so generated.

Thus, if G is generated by the set of elements S, then G is a quotient of the free group F(S). A
relator is a word which becomes 1 when interpreted in G rather than in F(S). In general, if R is
a set of relators, then there is a natural homomorphism from F(S)/N(R) onto G, where N(R) is the
smallest normal subgroup of F(S) containing all of the members of R. If that homomorphism is an
isomorphism, then the generators S and relations R “present” the group G. In general, it is quite
difficult to “understand” a group given by generators and relations. In fact, it is a theorem
(the “Undecidability of the Word Problem”) that there is no
recursive algorithm for deciding whether a given finitely presented group is the identity group.

A striking exception to the intractability of the “Word Problem” is the family of Coxeter groups
C(m_ij) defined by the generators {s_i}, i ∈ I, and relators (s_i s_j)^(m_ij), where M = (m_ij)
is a symmetric matrix with m_ii = 1 for all i and with m_ij ∈ {2, 3, . . . , ω} for i ≠ j.
Classical examples of Coxeter groups are discrete groups of isometries of spherical, Euclidean,
or hyperbolic n-spaces. The spherical Coxeter groups are well understood and form the skeletons
of Tits’ spherical BN pairs. Beyond the finite dihedral groups, all but two of the spherical
Coxeter groups are finite Weyl groups, whose classification is a key step in the modern proof of
the Killing–Cartan classification theorem. The Euclidean Coxeter groups are also well understood
and play a major role in the theory of affine buildings and p-adic groups. Closely associated to
Coxeter groups are the braid groups and Artin groups, whose generators have infinite order but
which satisfy relations similar to the Coxeter relations. They have featured in recent work in
numerous areas, including knot theory, singularity theory, and inverse Galois theory, i.e., the
problem of finding a polynomial (usually in Q[x]) having a specified Galois group.

The braid groups arise as the fundamental groups of topological spaces, an important concept
introduced by Poincaré. The elements of the fundamental group are equivalence classes of loops
(closed paths) in the space with a distinguished base point. Multiplication is concatenation of
loops and inversion is reversal of direction. Fundamental groups are naturally described via
generators and relations, with the generators being certain loops and a relation being a closed
path which can be contracted to the base point in the space. For example, if the space X is the
union of n circles which intersect only at a common base point, then the fundamental group of X
is the free group on n generators. The fundamental group acts naturally on the universal cover X̃
of the space X. In this example, X̃ is a regular tree (graph without circuits) of valency 2n.

The study of fundamental groups of topological spaces attached along embedded subspaces motivates
the group-theoretic study of so-called free products with amalgamation, HNN extensions, and group
actions on trees, all of which is subsumed in the Bass–Serre theory of graphs of groups. The
objective is to understand the universal completion of a graph of groups, i.e., the largest
possible group containing certain specified subgroups (attached to the vertices and edges of the
graph) subject to certain relating homomorphisms mapping edge groups to vertex groups. The
amalgamated product of two groups over a specified common subgroup is a “free” construction,
which in all nontrivial cases leads to an infinite completion.

The free product H * K of two groups H and K is the freest group generated by H and K with
H ∩ K = {e}. Underlying the Brauer–Fowler theorem on finite simple groups is the elementary fact
that the free product C_2 * C_2 is an infinite dihedral group generated by an element x of
infinite order and an involution y such that yxy = x^(−1). All finite homomorphic images of
C_2 * C_2 are finite dihedral groups (the symmetry groups of regular polygons) or are cyclic of
order 1 or 2. By contrast, the free product C_2 * C_3 is isomorphic to PSL(2, Z) and almost every
finite simple group occurs as a homomorphic image of C_2 * C_3. PSL(2, Z) is in turn the quotient
of the 3-string braid group B_3 by an infinite cyclic central subgroup.

It follows from the theory of fundamental groups and covering spaces that the study of discrete
subgroups (at least the torsion-free ones, i.e., those having no nontrivial elements of finite
order) of PSL(2, R) and PSL(2, C) is essentially identical to the problem of finding Riemannian
metrics of constant (sectional) curvature −1 on manifolds of dimensions 2 and 3, respectively.
This suggests that one should be especially interested in studying fundamental groups of
negatively curved Riemannian manifolds (or negatively curved spaces). Quite recently, Gromov has
developed a theory of “word hyperbolic groups” which encompasses the group-theoretic aspects of
negative curvature. For example, free groups as well as the fundamental groups of negatively
curved manifolds are all word hyperbolic. In contrast to the general undecidability of the Word
Problem, it can be shown that the Word Problem is solvable for word hyperbolic groups. Indeed,
this has led to a study of the larger class of “automatic groups” for which the Word Problem is
decidable by a finite-state automaton.

Recently there have been projects to develop efficient computer algorithms for the identification
of finite groups from data consisting either of a set of generating permutations or a set of
generating matrices or an abstract set of generators and relators (a black-box group). Most of
these algorithms use the classification of finite simple groups as well as numerous corollaries
concerning simple groups with small linear or permutation representations. Computer algorithms
played a relatively small role in the classification of finite simple groups. The existence of
most of the sporadic simple groups can be established as a corollary of the existence of the
Monster, which Griess constructed “by hand.” Several of the sporadic groups were first
constructed on a machine, however, and for a few this remains the only proof of their existence.

SEE ALSO THE FOLLOWING ARTICLES

GROUP THEORY, APPLIED • SET THEORY
I. Brief History
II. Group Representation Theory
III. Continuous Lie Groups
IV. Applications in Atomic Physics
V. Applications in Nuclear Physics
VI. Applications in Molecular and Solid-State Physics
VII. Application of Lie Groups in Electrical Engineering
VIII. Applications in Particle Physics
IX. Applications in Geometrical Optics
X. The Renormalization Group
P1: GLQ/GLT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN007C-305 June 29, 2001 17:59
APPLIED GROUP THEORY is the study of the theory of group representations and its applications in
various areas of physics such as atomic physics, molecular physics, solid-state physics, and
particle physics. Group theory is essentially a mathematical description of symmetry in nature,
and it is very interesting that this symmetry plays an important role in understanding many
experimental phenomena in physics.

I. BRIEF HISTORY

Ancient as well as modern works of art and architecture tell us that humans have long familiarity
with symmetry. An Egyptian pyramid, a Greek temple, the Indian Taj Mahal, and a Chinese pagoda
strike us as beautiful examples of symmetry. There is symmetry in works of nature as well, in
living organisms, and in people. It is interesting to note, however, that while knowledge of
symmetry existed for centuries, it was only toward the end of the eighteenth century that it was
realized that symmetry has a scientific basis in the mathematical theory of groups. Since the
mathematical structure and analysis of groups is the subject matter of another article in this
Encyclopedia, we will discuss here only the applications of group theory in physics and
engineering.

In a series of papers published toward the end of the nineteenth century and at the beginning of
the twentieth century Frobenius and Schur laid the foundation of the theory of group
representations for finite groups. The structure and representation of continuous groups is the
work of Lie, Cartan, and Weyl. Wigner was probably the first to recognize the importance of
Frobenius’s work to quantum mechanics and modern physics. He applied representation theory to
physical problems in many areas of physics. Subsequently several others played important roles in
developing and applying group theory as a tool: Bargmann, Bethe, Casimir, Gell-Mann, Pauli, to
name a few.

II. GROUP REPRESENTATION THEORY

It is necessary to discuss briefly some important theorems in representation theory, since all
the applications in the physical sciences are based on these theorems. First, a Cayley
multiplication table (Table I) defines a group abstractly. In the table the operation of
“multiplication” is to be understood as, for instance, the correspondences AB = D and BA = F. A
word of caution is in order here. There is usually a notational difference between mathematicians
and physicists. Here the operation AB is to be performed from right to left (i.e., the operation
A to be made after the operation B, akin to the left multiplication of a matrix B by a matrix A).
A function of an independent variable x is written F(x). These notations are consistent with the
customary usage in group theory books by physicists (e.g., Wigner, 1959). The set of six elements
forms a finite group of order g = 6 (G or S_3 or C_3v), and from the table one can see that it is
closed with respect to multiplication. There exists an identity element (E) and every element has
an inverse element such that the product of the element and its inverse (or vice versa) is the
identity. If the multiplication is commutative (i.e., AB = BA) the group is known as an abelian
group.

TABLE I Cayley Multiplication Table of C_3v or S_3

      E  A  B  C  D  F
  E   E  A  B  C  D  F
  A   A  E  D  F  B  C
  B   B  F  E  D  C  A
  C   C  D  F  E  A  B
  D   D  C  A  B  F  E
  F   F  B  C  A  E  D

This abstract group multiplication table can be “realized” in more than one way by different
types of elements with a multiplication rule appropriately defined. For instance, the following
numbers satisfy all the group properties under ordinary multiplication:

E = D = F = 1,  A = B = C = −1.  (1)

This, for instance, is an abelian group. Another realization of the group table is obtained by
elements which are operations of permutations of three objects and this group is known as the
symmetric group S_3. Explicitly, the elements are

E = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix},  A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix},  B = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix},
C = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix},  D = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix},  F = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}.  (2)

The permutation D is defined to read “1 is replaced by 2, 2 by 3, and 3 by 1.” AB is defined as
the result of the permutation corresponding to A performed after the operation B. Yet another way
of realizing this group, which is of importance in molecular physics, is by choosing for the
elements certain geometric symmetry operations. This group is known as C_3v. These symmetry
operations are
easily understood by applying them to the equilateral triangle in Fig. 1. In this figure A, B, C
are the vertices, A′, B′, C′ are the midpoints of the opposite sides such that AA′, BB′, CC′ are
the medians of the triangle. Let us assume the triangle to be in the XY plane of a rectangular
coordinate system whose origin is at the centroid of the triangle O. Here the identity element
means “leave it alone.” Elements D, F correspond to rotations through 120° and 240°,
respectively, about the Z axis through O perpendicular to the plane of the triangle. These are
called C_3, C_3^2 operations, the subscript 3 referring to the angle of rotation 2π/3 and the
letter C standing for the cyclic axis. A, B, C are operations of reflections in three vertical
mirror planes passing through AA′, BB′, CC′ and containing the cyclic axis. Although we labeled
the vertices to be able to understand the geometric symmetry operations of rotations and
reflections, it is important to note that the operations bring the triangle into coincidence with
itself, which implies that all the vertices are identical and hence indistinguishable. The
ammonia molecule NH_3 has the symmetry of the C_3v group. The three hydrogen atoms are at the
vertices of the triangle and the nitrogen atom sits somewhere on the cyclic axis and thus all the
six geometric operations bring the molecule into coincidence with itself. The groups S_3 and C_3v
are said to be isomorphic, which means that the elements E, A, B, . . . of S_3 are in one-to-one
correspondence with elements E, σ_v, σ_v′, . . . of C_3v uniquely such that group multiplication
is preserved. AB = D in the case of the group S_3 means that the product of the elements
corresponding to A and B in C_3v, in that order, results in the element corresponding to D. Here
multiplication means the group operation. Isomorphic groups have the same Cayley multiplication
table. A many-to-one correspondence is known as homomorphism. In this case more than one element
of one group corresponds to the same element in the other group, but products do correspond to
products in either group. Thus the group of numbers in (1) is homomorphic to S_3 or C_3v.

FIGURE 1 Equilateral triangle (C_3v) with coordinates of vertices. [Reprinted with permission
from Swamy, N. V. V. J., and Samuel, M. A. (1979). “Group Theory Made Easy for Scientists and
Engineers,” Wiley-Interscience, New York. Copyright 1979 John Wiley and Sons.]

We notice from the Cayley table for S_3 that subsets of elements (E), (E, A), and (E, D, F)
themselves satisfy all the group properties. These are known as subgroups of the main group, of
orders h = 1, 2, and 3, respectively. There is a theorem of Lagrange which says that the order h
of a subgroup is a divisor of the order g of the group. Thus a group of order 11 can have only
one proper subgroup, of order 1, the trivial group with only the identity element. Given an
element A, one defines its conjugate as BAB^(−1), where B is any element of the group. For
instance, in Table I, DAD^(−1) = DAF = DC = B. Thus B is a conjugate of A, the operation of
conjugation
P1: GLQ/GLT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN007C-305 June 29, 2001 17:59
E E A B C D F G H I J K L M N O P Q R S T U V W X E
A A B E D F C R U O K N I V J L H T X Q S P W M G A
B B E A F C D X P L N J O W K I U S G T Q H M V R B
C C F D E B A K O U R G P T X H L V J W M I Q S N C
D D C F A E B N L P X R H S G U I W K M V O T Q J D
F F D C B A E J I H G X U Q R P O M N V W L S T K F
G G U S K W I E J F H C T P Q R M N O B L A X D V G
H H T X V L J I E G F S D R P Q N O M K A W C U B H
I I W K S U G H F J E V A N O M R P Q X D T B L C I
J J L V X T H F G E I B W O M N Q R P C U D K A S J
K K I W G S U C R A O E M L V J T X H D P F N B Q K
L L V J T H X P D N B M E K I W G U S R C Q A O F L
M M N O P R Q T S V W L K E A B C F D H G X I J U M
N N O M R Q P D X B L A V I W K S G U F H C J E T N
O O M N Q P R U C K A W B J L V X H T G F S E I D O
P P Q R M O N L B X D T C G U S K I W J E V F H A P
Q Q R P O N M W V S T U X F D C B E A I J K H G L Q
R R P Q N M O A K C U D S H T X V J L E I B G F W R
S S G U I K W V M T Q H R D C F A B E L N J P X O S
T T X H L J V M W Q S P G C F D E A B O K N U R I T
U U S G W I K O A R C Q F X H T J L V N B M D P E U
V V J L H X T S Q W M I N A B E D C F U R G O K P V
W W K I U G S Q T M V O J B E A F D C P X R L N H W
X X H T J V L B N D P F Q U S G W K I A O E R C M X
E A B C D F G H I J K L M N O P Q R S T U V W X
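The statements made above about subgroups, Lagrange's theorem, and conjugation can be checked by brute force. The following is a minimal sketch, assuming that the six elements of S3 are represented as permutation tuples of (0, 1, 2) rather than by the letters E, A, B, C, D, F of Table I; the helper names are illustrative only.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product pq: apply q first, then p (permutations as tuples on 0..n-1)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_closed(elements):
    """A finite set containing the identity and closed under products is a subgroup."""
    s = set(elements)
    return all(compose(a, b) in s for a in s for b in s)

def subgroups(group):
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    found = []
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            candidate = (identity,) + extra
            if is_closed(candidate):
                found.append(candidate)
    return found

def conjugacy_classes(group):
    remaining, classes = set(group), []
    while remaining:
        a = min(remaining)
        cls = {compose(compose(b, a), inverse(b)) for b in group}  # all b a b^-1
        classes.append(sorted(cls))
        remaining -= cls
    return classes

S3 = list(permutations(range(3)))
subgroup_orders = sorted(len(h) for h in subgroups(S3))
class_sizes = sorted(len(c) for c in conjugacy_classes(S3))

print(subgroup_orders)                                  # [1, 2, 2, 2, 3, 6]
print(all(len(S3) % h == 0 for h in subgroup_orders))   # True (Lagrange)
print(class_sizes)                                      # [1, 2, 3]
```

Every subgroup order divides 6, and conjugation sorts the elements into the identity, the two rotations, and the three reflections.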
theorem in permutation groups is that the number of classes of the symmetric group $S_n$ (of order n!) is equal to the number of distinct partitions of n. We thus have

\[
\begin{array}{llll}
K_1 = (E) & 1 + 1 + 1 + 1 & (1^4) & g_1 = 1, \\
K_2 = (C, D, F, G, H, Q) & 1 + 1 + 2 + 0 & (1^2 2) & g_2 = 6, \\
K_3 = (K, L, M) & 2 + 2 + 0 + 0 & (2^2) & g_3 = 3, \\
K_4 = (A, B, I, J, N, O, V, W) & 1 + 3 + 0 + 0 & (31) & g_4 = 8, \\
K_5 = (P, R, S, T, U, X) & 4 + 0 + 0 + 0 & (4^1) & g_5 = 6.
\end{array}
\tag{7}
\]

Here we have denoted the classes by $K_i$, the number of elements in each class $g_i$. The partitions of the number 4 associated with the different classes are shown as $(1^4)$, etc., which can be taken as an alternative description of classes. According to representation theory:

1. The number of inequivalent irreducible representations is equal to the number of classes. S4 thus has five representations.
2. The sum of squares of the dimensions of the irreducible representations equals the order of the group:

\[ n_1^2 + n_2^2 + n_3^2 + n_4^2 + n_5^2 = g = 24 \quad \text{for the S4 group.} \]
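The class structure in Eq. (7) and the dimension equation can be reproduced numerically. This is a sketch under the assumption that S4 is realized as all permutations of (0, 1, 2, 3), with classes identified by cycle type; the function names are hypothetical and the letter labels of the Cayley table are not used.

```python
from itertools import combinations_with_replacement, permutations

def cycle_type(p):
    """Cycle lengths of the permutation p (tuple on 0..n-1), largest first."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def partitions(n, largest=None):
    """All partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        return [()]
    result = []
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            result.append((first,) + rest)
    return result

# Count the elements of S4 in each class (one class per cycle type).
class_sizes = {}
for p in permutations(range(4)):
    t = cycle_type(p)
    class_sizes[t] = class_sizes.get(t, 0) + 1

# Five positive integers whose squares add up to the group order 24.
dimension_sets = [dims
                  for dims in combinations_with_replacement(range(1, 5), len(class_sizes))
                  if sum(d * d for d in dims) == 24]

print(class_sizes)      # sizes 1, 6, 3, 8, 6 for types 1^4, 1^2 2, 2^2, 31, 4
print(dimension_sets)   # [(1, 1, 2, 3, 3)]
```

The number of cycle types equals the number of partitions of 4, and the only solution of the sum-of-squares condition is the set of dimensions 1, 1, 2, 3, 3.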
D (2) : →
√ √ √
−i/2 i/ 2 −i/2 −1/2 −1/ 2 −1/2 −i/2 i/2
1/ 2
√ √ √ √ √ √
1/ 2 0 −1/ 2 , −1/ 2 0 1/ 2 , −i/ 2 0 i/ 2 ,
D(A) = √ D(F) = √ D(J ) = √
i/2 i/ 2 i/2 −1/2 1/ 2 −1/2 −i/2 −1/ 2 i/2
χ =0 χ = −1 χ =0
√
i/2 1/ 2 −i/2 0 0 −i −1 0 0
√ √
−i/ 2 0 −i/ 2 , 0 −1 0 , 0 1 0 ,
D(B) = √ D(G) = D(K ) =
i/2 −1/ 2 −i/2 i 0 0 0 0 −1
χ =0 χ = −1 χ = −1
√
0 0 i −1/2 i/ 2 1/2 0 0 −1
√ √
0 −1 0 , −i/ 2 0 −i/ 2 , 0 −1 0 ,
D(C) = D(H ) = √ D(L) =
−i 0 0 1/2 i/ 2 −1/2 −1 0 0
χ = −1 χ = −1 χ = −1
√ √
−1/2 −i/ 2 1/2 i/2 −i/ 2 i/2 0 0 1
√ √ √ √
i/ 2 0 i/ 2 , 1/ 2 0 −1/ 2 , 0 −1 0 ,
D(D) = √ D(I ) = √ D(M) =
1/2 −i/ 2 −1/2 −i/2 −i/ 2 −i/2 1 0 0
χ = −1 χ =0 χ = −1
√ √ √
i/2 i/ 2 i/2 1/2 −i/ 2 −1/2 −i/2 −i/ 2 −i/2
√ √ √ √ √ √
−1/ 2 0 1/ 2 , −i/ 2 0 −i/ 2 , −1/ 2 0 1/ 2 ,
D(N ) = √ D(R) = √ D(V ) = √
−i/2 i/ 2 −i/2 −1/2 −i/ 2 1/2 i/2 −i/ 2 i/2
χ =0 χ =1 χ =0
√ √ √
i/2 −1/ 2 −i/2 1/2 i/ 2 −1/2 −i/2 −1/ 2 i/2
√ √ √ √ √ √
i/ 2 0 i/ 2 , 1/ 2 0 i/ 2 , −i/ 2 0 −i/ 2 ,
D(O) = √ D(S) = √ D(W ) = √
i/2 1/ 2 −i/2 −1/2 i/ 2 1/2 −i/2 1/ 2 i/2
χ =0 χ =1 χ =0
√
−i 0 0 i 0 0 1/2 −1/ 2 1/2
√ √
0 1 0 , 0 1 0 , 1/ 2 0 −1/ 2 ,
D(P) = D(T ) = D(X ) = √
0 0 i 0 0 −i 1/2 1/ 2 1/2
χ =1 χ =1 χ =1
√ √
−1/2 +1/ 2 −1/2 1/2 1/ 2 1/2 1 0 0
√ √ √ √
1/ 2 −1/ 2 , −1/ 2 1/ 2 , 0 1 0 .
(E) =
0 0
D(Q) = √ D(U ) = √
−1/2 −1/ 2 −1/2 1/2 −1/ 2 1/2 0 0 1
χ = −1 χ =1 χ =3
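A few of the matrices above can be spot-checked against the S4 Cayley table. The sketch below assumes that products are read from the table as (row element) times (column element), which gives KC = G, KA = I, and XU = E; it then tests unitarity, the quoted characters, and those three products. Only seven of the 24 matrices are entered, and the helper names are illustrative.

```python
import math

s = 1 / math.sqrt(2)
i = 1j

# Seven of the three-dimensional representation matrices, entered row by row.
D = {
    "K": [[-1, 0, 0], [0, 1, 0], [0, 0, -1]],
    "C": [[0, 0, i], [0, -1, 0], [-i, 0, 0]],
    "G": [[0, 0, -i], [0, -1, 0], [i, 0, 0]],
    "A": [[-i / 2, i * s, -i / 2], [s, 0, -s], [i / 2, i * s, i / 2]],
    "I": [[i / 2, -i * s, i / 2], [s, 0, -s], [-i / 2, -i * s, -i / 2]],
    "X": [[0.5, -s, 0.5], [s, 0, -s], [0.5, s, 0.5]],
    "U": [[0.5, s, 0.5], [-s, 0, s], [0.5, -s, 0.5]],
}
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[c][r]).conjugate() for c in range(3)] for r in range(3)]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(3) for c in range(3))

def character(a):
    """Trace, rounded to the nearest integer."""
    return round(complex(sum(a[k][k] for k in range(3))).real)

all_unitary = all(close(matmul(m, dagger(m)), IDENTITY) for m in D.values())
characters = {name: character(m) for name, m in D.items()}
products_ok = (close(matmul(D["K"], D["C"]), D["G"])
               and close(matmul(D["K"], D["A"]), D["I"])
               and close(matmul(D["X"], D["U"]), IDENTITY))

print(all_unitary, products_ok)
print(characters)
```

The characters come out as −1 for K, C, G, 0 for A, I, and +1 for X, U, in agreement with the class assignments of Eq. (7).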
This equation has the solution $1^2 + 1^2 + 2^2 + 3^2 + 3^2$. Thus S4 has two one-dimensional, two three-dimensional, and one two-dimensional representations. All finite groups have a trivial one-dimensional representation where each element is represented by 1. One set of the three-dimensional representation matrices of S4 is given on the previous page. Swamy and Samuel give all the representation matrices in their work. It is important to note that, while the inequivalent representations will be finite in number, there will be an infinite number of equivalent representations, each differing from the other only by a similarity transformation. We use the notation $D^{(j)}$ for the jth irreducible representation, $D^{(j)}(R)$ for the matrix representing the general group element R in the jth representation, and $D^{(j)}_{mm'}(R)$ for the $mm'$th matrix element of that particular matrix; $\chi^{(j)}(R)$ is the character of the element R in the jth irreducible representation.

All the elements of one class have the same character in a given irreducible representation. Table III gives the classwise character table for S4. Two important theorems in representation theory are summarized in the formulas

\[
\begin{aligned}
\sum_R D^{(\mu)}_{il}(R)\, D^{(\nu)*}_{jm}(R) &= (g/n_\mu)\, \delta_{\mu\nu}\, \delta_{ij}\, \delta_{lm}, \\
\sum_R \chi^{(\mu)}(R)\, \chi^{(\nu)*}(R) &= g\, \delta_{\mu\nu},
\end{aligned}
\tag{8}
\]

or, equivalently,

\[ \sum_{i=1}^{k} \chi_i^{(\mu)}\, \chi_i^{(\nu)*}\, g_i = g\, \delta_{\mu\nu}. \]

to verify the above numerically with the help of the representation matrices for S4 and its character table (Table III). The third relation above shows that the normalized characters $\sqrt{g_i/g}\,\chi_i^{(j)}$ form an orthonormal vector system in the k-dimensional space of classes. Various prescriptions exist for calculating the characters of a finite group, and it is much easier to determine characters in many instances than the representation matrices themselves. Frobenius developed a systematic algebraic procedure for determining the characters of any permutation group. If we know the characters of a permutation group we can infer therefrom the characters of all finite groups because of a theorem in group theory which says that any finite group is isomorphic to a subgroup of some permutation or symmetric group. Isomorphic groups have the same character table. One of the interesting prescriptions for obtaining characters is the class product relation

\[ k_i k_j = \sum_l C_{ijl}\, k_l, \tag{9} \]

where the $k_i$ are classes and the summation is done over all the classes. From this can be derived

\[ g_i g_j\, \chi_i^{(\mu)} \chi_j^{(\mu)} = n_\mu \sum_l C_{ijl}\, g_l\, \chi_l^{(\mu)}, \tag{10} \]

where the C coefficients are identical to the ones in Eq. (9).

To illustrate the first identity let us take classes $k_2$ and $k_3$ of S4:

k3 k2 = (K, L, M)(C, D, F, G, H, Q)
      = (G, S, U, C, R, X), (T, H, X, P, D, U),
TABLE IV  $O_R x$, $O_R y$, $O_R z$ (E is the identity; F and D are the rotations about the $C_3$ axis; A, C, and B are the reflections $\sigma_v$)

X_i        E    F (132)              D (123)              A (23)   C (12)               B (13)
X_1 = x    x    −(1/2)x − (√3/2)y    −(1/2)x + (√3/2)y    −x       (1/2)x − (√3/2)y     (1/2)x + (√3/2)y
X_2 = y    y    (√3/2)x − (1/2)y     −(√3/2)x − (1/2)y    y        −(√3/2)x − (1/2)y    (√3/2)x − (1/2)y
X_3 = z    z    z                    z                    z        z                    z
basis functions, the fundamental formula of representation theory then is expressed as

\[ O_R \psi_\nu = \sum_{\mu=1}^{n} D_{\mu\nu}(R)\, \psi_\mu. \tag{12} \]

This means that the result of a group operation $O_R$ (element of the group) on one of the basis functions is a linear combination of the basis functions, and the matrix of the coefficients in the linear combination essentially determines the representation. We will illustrate this for the C3v group. In Fig. 1 let us choose a rectangular coordinate system with origin at the centroid O and the Y axis through the median AA′ with the X axis perpendicular to it in its plane. The Z axis will be pointing up from the origin perpendicular to the plane of the figure. The vertices will then have the coordinates A = (0, 1, 0), B = (−√3/2, −1/2, 0), and C = (√3/2, −1/2, 0). In Table IV we gather the results of making all the six operations of the group C3v on x, y, and z. From formula (12) and the table we see that if we choose z as one basis function, we have a one-dimensional representation with the number 1 representing all the elements. In the character table (Table V) this is shown as the A1 representation, which means that the basis function is symmetric with respect to rotation about the cyclic axis. If we choose a pair of functions $x = \psi_1$, $y = \psi_2$ in that order we obtain the two-dimensional (E representation) irreducible representation

\[
\begin{aligned}
(E)&: \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ \chi = 2; &
(A)&: \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},\ \chi = 0; \\
(B)&: \begin{pmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix},\ \chi = 0; &
(C)&: \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix},\ \chi = 0; \\
(D)&: \begin{pmatrix} -1/2 & -\sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix},\ \chi = -1; &
(F)&: \begin{pmatrix} -1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix},\ \chi = -1.
\end{aligned}
\tag{13}
\]

The characters are shown underneath each matrix, and Table V gives all the characters classwise. x, y are written against the E representation since these transform accordingly. x is said to belong to the first row of the representation $D^{(E)}$ and y to its second row. Since the A2 representation is also one-dimensional, the matrix elements (numbers which are also the characters) are easily obtained by applying the orthogonality relations either to the matrix elements of different irreducible representations or to the characters. It is to be noted that the choice of x and y for the two-dimensional representation is fortuitous inasmuch as it straightaway generated the representation. However, there is a general prescription for obtaining the basis functions starting with any arbitrary function by means of projection operators. The formula for this projection operator is

\[ p_i^{(\mu)} = \frac{n_\mu}{g} \sum_R D_{ii}^{(\mu)*}(R)\, O_R, \tag{14} \]

\[ p_i^2 = p_i, \qquad p_i p_j = 0 \quad (i \neq j), \tag{14a} \]

and for the two-dimensional representation these are explicitly

\[
\begin{aligned}
p_1 &= \tfrac{1}{3}\left(E - \tfrac{1}{2}D - \tfrac{1}{2}F - A + \tfrac{1}{2}B + \tfrac{1}{2}C\right), \\
p_2 &= \tfrac{1}{3}\left(E - \tfrac{1}{2}D - \tfrac{1}{2}F + A - \tfrac{1}{2}B - \tfrac{1}{2}C\right).
\end{aligned}
\tag{15}
\]

If we choose the arbitrary function $\varphi(x, y, z) = x + y + z$, then it is easily seen that

\[ p_1 \varphi = x, \quad p_2 \varphi = y, \qquad \text{or} \qquad \psi_i^{(\nu)} = p_i^{(\nu)} \varphi. \tag{16} \]

We will conclude this discussion of representation theory by introducing the concept of direct product of two groups. Let a group $G_1$ be of order $g_1$ with elements $A_1, A_2, \ldots, A_{g_1}$ and another group $G_2$ of order $g_2$ with elements $B_1, B_2, \ldots, B_{g_2}$. A direct product group $G_1 \times G_2$

TABLE V  Character Table of S3 or C3v

              K1 (1³)    K2 (21)      K3 (3)
              E          (A, B, C)    (D, F)
              g1 = 1     g2 = 3       g3 = 2
D^(A1)        1          1            1          z
D^(A2)        1          −1           1
D^(E)         2          0            −1         (x, y)
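The two-dimensional matrices of Eq. (13), the characters of Table V, and the projection operators of Eqs. (14) and (15) can be cross-checked numerically. This is a sketch, assuming that on the span of x and y each operator $O_R$ acts on coefficient vectors through the matrix D(R), as Eq. (12) implies; all function names are illustrative.

```python
import math

h = math.sqrt(3) / 2

# E-representation matrices of C3v in the basis (x, y), Eq. (13).
D = {
    "E": [[1, 0], [0, 1]],
    "A": [[-1, 0], [0, 1]],
    "B": [[0.5, h], [h, -0.5]],
    "C": [[0.5, -h], [-h, -0.5]],
    "D": [[-0.5, -h], [h, -0.5]],
    "F": [[-0.5, h], [-h, -0.5]],
}
CLASSES = {"K1": ["E"], "K2": ["A", "B", "C"], "K3": ["D", "F"]}
CHARACTERS = {"A1": [1, 1, 1], "A2": [1, -1, 1], "E": [2, 0, -1]}  # Table V

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(2) for c in range(2))

# 1. Closure: every product of two matrices is again one of the six.
closed = all(any(close(matmul(D[r], D[t]), m) for m in D.values()) for r in D for t in D)

# 2. Traces reproduce the E row of Table V, class by class.
traces_ok = all(abs(D[name][0][0] + D[name][1][1] - chi) < 1e-12
                for (_, members), chi in zip(CLASSES.items(), CHARACTERS["E"])
                for name in members)

# 3. Character orthogonality: sum_i g_i chi_i(mu) chi_i(nu) = g delta(mu, nu).
sizes = [len(members) for members in CLASSES.values()]
g = sum(sizes)

def inner(mu, nu):
    return sum(gi * a * b for gi, a, b in zip(sizes, CHARACTERS[mu], CHARACTERS[nu]))

orthogonal = all(inner(mu, nu) == (g if mu == nu else 0)
                 for mu in CHARACTERS for nu in CHARACTERS)

# 4. Projection operators p_i = (n/g) sum_R D_ii(R) O_R with n = 2 (entries are real).
def projector(row):
    return [[sum((2 / g) * D[name][row][row] * D[name][r][c] for name in D)
             for c in range(2)] for r in range(2)]

p1, p2 = projector(0), projector(1)
phi = [1, 1]  # coefficients of x and y in phi = x + y + z (z is annihilated by p1, p2)
p1_phi = [sum(p1[r][c] * phi[c] for c in range(2)) for r in range(2)]
p2_phi = [sum(p2[r][c] * phi[c] for c in range(2)) for r in range(2)]

print(closed, traces_ok, orthogonal)
print(p1_phi, p2_phi)  # approximately [1, 0] and [0, 1], i.e. x and y
```

In matrix form the projectors come out as diag(1, 0) and diag(0, 1), which is why they pick out x and y from the trial function.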
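\[
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix}
\cos t_2 \cos t_3 & -\cos t_2 \sin t_3 & \sin t_2 \\
\cos t_1 \sin t_3 + \sin t_1 \sin t_2 \cos t_3 & \cos t_1 \cos t_3 - \sin t_1 \sin t_2 \sin t_3 & -\sin t_1 \cos t_2 \\
\sin t_1 \sin t_3 - \cos t_1 \sin t_2 \cos t_3 & \sin t_1 \cos t_3 + \cos t_1 \sin t_2 \sin t_3 & \cos t_1 \cos t_2
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\tag{26}
\]

\[ x'^2 + y'^2 + z'^2 = x^2 + y^2 + z^2. \]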
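The invariance of $x^2 + y^2 + z^2$ under Eq. (26) can be confirmed numerically. The sketch below assumes that the (2, 3) element of the matrix is $-\sin t_1 \cos t_2$, the value for which the rows are mutually orthogonal; the parameter values and the test vector are arbitrary choices made for illustration.

```python
import math

def rotation_matrix(t1, t2, t3):
    """Matrix of Eq. (26); it equals the product Rx(t1) Ry(t2) Rz(t3)."""
    c1, s1 = math.cos(t1), math.sin(t1)
    c2, s2 = math.cos(t2), math.sin(t2)
    c3, s3 = math.cos(t3), math.sin(t3)
    return [
        [c2 * c3, -c2 * s3, s2],
        [c1 * s3 + s1 * s2 * c3, c1 * c3 - s1 * s2 * s3, -s1 * c2],
        [s1 * s3 - c1 * s2 * c3, s1 * c3 + c1 * s2 * s3, c1 * c2],
    ]

def apply(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def gram(m):
    """Rows dotted with rows; the identity matrix when m is orthogonal."""
    return [[sum(m[r][k] * m[s][k] for k in range(3)) for s in range(3)] for r in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

R = rotation_matrix(0.3, -1.1, 2.0)
x, y, z = 1.0, -2.0, 0.5
xp, yp, zp = apply(R, [x, y, z])

norm_change = abs((xp**2 + yp**2 + zp**2) - (x**2 + y**2 + z**2))
G = gram(R)
orthogonality_error = max(abs(G[r][s] - (1.0 if r == s else 0.0))
                          for r in range(3) for s in range(3))

print(norm_change < 1e-12, orthogonality_error < 1e-12, abs(det3(R) - 1.0) < 1e-12)
```

The determinant is +1, so the transformation is a proper rotation for any values of the three parameters.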
The way to describe rotations of a coordinate system mathematically in three-dimensional space was shown by Euler, centuries ago, who introduced three real parameters called the Eulerian angles. An alternative and equivalent description is given in Eq. (26). Here $t_i$ are the parameters of the Lie group and these are naturally functions of the Eulerian angles. Following the Lie prescription, the infinitesimal generators are seen to be

\[
\begin{aligned}
X_{t_1} \equiv X_1 &= z \frac{\partial}{\partial y} - y \frac{\partial}{\partial z}, \\
X_2 &= x \frac{\partial}{\partial z} - z \frac{\partial}{\partial x}, \\
X_3 &= y \frac{\partial}{\partial x} - x \frac{\partial}{\partial y},
\end{aligned}
\tag{27}
\]

and the associated Lie algebra is

\[ [X_1, X_2] = X_3 \quad \text{and cyclically.} \tag{27a} \]

This relation is written somewhat cryptically as

\[ [X_i, X_j] = \sum_k \varepsilon_{ijk} X_k, \tag{27b} \]

where the Levi–Civita symbol $\varepsilon_{ijk}$ assumes the value 0 whenever any two indices are equal, equals +1 when ijk is an even permutation of 123, and equals −1 for an odd permutation. The finite transformation (26) can be obtained

values. Thus for a given l there are 2l + 1 spherical harmonics and these are the basis functions which generate the odd (2l + 1)-dimensional irreducible representations of the three-dimensional rotation group. If a dynamical system in quantum mechanics has a certain symmetry, or more precisely if the Hamiltonian of the dynamical system is invariant to a certain set of group operations, then its solutions generate the representations of that group. In particular, if the dynamical system has spherical symmetry, which means that the Hamiltonian has rotational invariance, the spherical harmonics must be a factor in its solution. The Laplace operator is indeed rotationally invariant, and its solutions are either

\[ r^{l}\, Y_{lm} \quad \text{or} \quad r^{-l-1}\, Y_{lm}. \tag{31} \]

B. SU(2) Covering Group of the Rotation Group

The even-dimensional representations of SO(3) were first discovered by Weyl. He showed that the covering group of this rotation group is SU(2), which is the unitary unimodular group in two dimensions. This group played a crucial role in characterizing the spin properties of particles such as electrons and protons (fermions), and recently it has proved to be one of the cornerstones of a theory that led to the discovery of the $W^{\pm}$ and $Z^0$ bosons.
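The commutation relations (27a) and (27b) of the generators in Eq. (27) can be verified by letting the differential operators act on a polynomial. The following is a minimal sketch, assuming a toy representation of polynomials in x, y, z as dictionaries that map exponent triples to integer coefficients; the test polynomial and all helper names are illustrative.

```python
X, Y, Z = 0, 1, 2  # variable indices

def add(p, q, scale=1):
    """Return p + scale*q, dropping monomials whose coefficient cancels."""
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + scale * c
        if out[mono] == 0:
            del out[mono]
    return out

def d(p, var):
    """Partial derivative of p with respect to the variable with index var."""
    out = {}
    for mono, c in p.items():
        if mono[var] > 0:
            new = list(mono)
            new[var] -= 1
            out[tuple(new)] = c * mono[var]
    return out

def times(p, var):
    """Multiply p by the coordinate with index var."""
    out = {}
    for mono, c in p.items():
        new = list(mono)
        new[var] += 1
        out[tuple(new)] = c
    return out

def X1(p): return add(times(d(p, Y), Z), times(d(p, Z), Y), -1)  # z d/dy - y d/dz
def X2(p): return add(times(d(p, Z), X), times(d(p, X), Z), -1)  # x d/dz - z d/dx
def X3(p): return add(times(d(p, X), Y), times(d(p, Y), X), -1)  # y d/dx - x d/dy

GENERATORS = [X1, X2, X3]

def commutator(a, b, p):
    return add(a(b(p)), b(a(p)), -1)

def epsilon(i, j, k):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

# Test polynomial f = x^2 y + 3 y z^3 + x z.
f = {(2, 1, 0): 1, (0, 1, 3): 3, (1, 0, 1): 1}

def relation_holds(i, j):
    lhs = commutator(GENERATORS[i], GENERATORS[j], f)
    rhs = {}
    for k in range(3):
        if epsilon(i, j, k):
            rhs = add(rhs, GENERATORS[k](f), epsilon(i, j, k))
    return lhs == rhs

print(all(relation_holds(i, j) for i in range(3) for j in range(3)))  # True
```

Because the commutator of two first-order operators is again first order, agreement on a polynomial that involves all three variables is a meaningful check, although it is not a proof.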
A general element of the SU(2) group can be written as a 2 × 2 unitary matrix of determinant +1.

\[ U = \begin{pmatrix} \alpha & \beta \\ -\beta^{*} & \alpha^{*} \end{pmatrix}, \qquad \alpha\alpha^{*} + \beta\beta^{*} = 1. \tag{32} \]

Here the asterisk stands for the complex conjugate. This is called a covering group of SO(3) because it not only induces three-dimensional rotations, it gives the odd- as well as even-dimensional representations of the latter. For instance, a similarity transformation with U of the matrix

\[ \boldsymbol{\sigma} \cdot \mathbf{r} = \begin{pmatrix} z & x - iy \\ x + iy & -z \end{pmatrix} \tag{33} \]

gives

\[ \boldsymbol{\sigma} \cdot \mathbf{r}' = U^{+}\, \boldsymbol{\sigma} \cdot \mathbf{r}\, U. \tag{34} \]

$\mathbf{r}'(x', y', z')$ is related to $\mathbf{r}(x, y, z)$ through α, β and their complex conjugates. From the theory of determinants we know that $\boldsymbol{\sigma} \cdot \mathbf{r}'$ and $\boldsymbol{\sigma} \cdot \mathbf{r}$ have the same determinant or

\[ x'^2 + y'^2 + z'^2 = x^2 + y^2 + z^2. \]

Wigner has given the explicit connection between the complex α, β of SU(2) and the Eulerian angles α, β, γ which characterize the three-dimensional rotations. It is straightforward algebra to establish the relation between the SU(2) elements and the $t_i$ of (26). The Lie algebra satisfied by the generators of SU(2) is isomorphic to that of the SO(3) group. The Pauli spin operators that describe the magnetic electron in atomic physics, the Cayley–Klein operators that describe rigid body spin in classical mechanics, and quaternions in vector space theory all have algebras similar to the SU(2) algebra. The basis functions which generate the irreducible representations of SU(2) are known as monomials or tensors. A typical even-dimensional representation of SU(2) is given in Eq. (35). Some of the properties of SU(2) groups will be discussed elsewhere in this article in the context of applications in particle physics.

\[
D^{1/2}_{mm'}(\alpha\beta\gamma) =
\begin{pmatrix}
e^{-i(\alpha/2)} \cos(\beta/2)\, e^{-i(\gamma/2)} & -e^{-i(\alpha/2)} \sin(\beta/2)\, e^{i(\gamma/2)} \\
e^{i(\alpha/2)} \sin(\beta/2)\, e^{-i(\gamma/2)} & e^{i(\alpha/2)} \cos(\beta/2)\, e^{i(\gamma/2)}
\end{pmatrix}.
\tag{35}
\]

C. Four-Dimensional Rotation Group and Homogeneous Lorentz Group

The extension of the rotation group to four dimensions is straightforward but not trivial. The SO(4) group elements are the linear transformations that ensure

\[ x_1'^2 + x_2'^2 + x_3'^2 + x_4'^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2, \tag{36} \]

where each element of the group transforms $x_i$ into $x_i'$:

\[ x'_\mu = \sum_\lambda b_{\mu\lambda}\, x_\lambda. \tag{37} \]

The transformation matrix $b_{\mu\lambda}$, an element of the SO(4) group, can be expressed in terms of six real parameters α, β, γ, α′, β′, γ′. Forsyth has shown that the matrix elements $b_{\mu\lambda}$ are somewhat complicated trigonometric functions of these parameters. This matrix, as well as the element of SO(3), is an orthogonal matrix. The Lie algebra of the six infinitesimal generators is best given in terms of two sets of operators $X_1 = L_1$, $X_2 = L_2$, $X_3 = L_3$, and $X_4 = A_1$, $X_5 = A_2$, $X_6 = A_3$:

\[
\begin{aligned}
[L_i, L_j] &= \sum_k \varepsilon_{ijk} L_k, \\
[A_i, A_j] &= \sum_k \varepsilon_{ijk} L_k, \\
[L_i, A_j] &= \sum_k \varepsilon_{ijk} A_k.
\end{aligned}
\tag{38}
\]

These relations identify the SO(4) group, and any six operators which obey the same commutation relations as the $X_i$ generate a group isomorphic to SO(4). It is interesting to note that the $L_j$ themselves form a closed Lie algebra isomorphic to SO(3). This group is then a subgroup of SO(4). This group has two invariant operators,

\[
\begin{aligned}
F &= \sum_i \left( L_i^2 + A_i^2 \right), \\
G &= L_1 A_1 + L_2 A_2 + L_3 A_3.
\end{aligned}
\tag{39}
\]

This group plays an important role in understanding the bound states of the nonrelativistic hydrogen atom as well as in particle physics.

If instead of Eq. (36) we have transformations which leave

\[ X_1^2 + X_2^2 + X_3^2 - X_4^2 \tag{40} \]

invariant, then these form the elements of the homogeneous Lorentz group, of fundamental importance in the special theory of relativity and elementary particle physics. We have explicitly

\[ X'_\mu = \sum_\lambda a_{\mu\lambda}\, x_\lambda. \tag{41} \]

The $a_{\mu\lambda}$ are once again expressed in terms of six real parameters. How algebraically complicated these functions, elements of $a_{\mu\lambda}$, are can be judged from the typical element

\[
\begin{aligned}
a_{12} = {} & \sin\gamma \cos\gamma \,\{ \cosh\alpha' \cos^2\beta - \cos\alpha \cosh^2\beta' + \cos\alpha \sin^2\beta + \cosh\alpha' \sinh^2\beta' \} \\
& - \sin\alpha \sin\beta \cosh\beta' + \sinh\alpha' \sinh\beta' \cos\beta.
\end{aligned}
\tag{42}
\]

For instance, when α = 1, β = 0.7, γ = 0.35, α′ = 0.5, β′ = 1.4, and γ′ = 0.1, the $a_{\mu\lambda}$ becomes a numerical matrix, the element of the homogeneous Lorentz group, as given in Eq. (43). The
\[
a_{\mu\lambda} =
\begin{pmatrix}
0.5930545 & 0.3897004 & -0.9044001 & 0.5670270 \\
1.2037232 & -1.2987247 & 1.5783052 & -2.1509726 \\
0.5232450 & -2.2304747 & 0.7592345 & -2.1966429 \\
-1.0365561 & 2.4111345 & -1.6986536 & 3.2822924
\end{pmatrix}
\tag{43}
\]

infinitesimal generators of this group and the associated Lie algebra are

\[
\begin{aligned}
X_1 &= y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \equiv L_1, &
X_4 &= t \frac{\partial}{\partial x} + x \frac{\partial}{\partial t} \equiv A_1, \\
X_2 &= z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \equiv L_2, &
X_5 &= t \frac{\partial}{\partial y} + y \frac{\partial}{\partial t} \equiv A_2, \\
X_3 &= x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \equiv L_3, &
X_6 &= t \frac{\partial}{\partial z} + z \frac{\partial}{\partial t} \equiv A_3,
\end{aligned}
\qquad (t \equiv x_4);
\]

\[
\begin{aligned}
[L_i, L_j] &= -\sum_k \varepsilon_{ijk} L_k, \\
[A_i, A_j] &= \sum_k \varepsilon_{ijk} L_k, \\
[L_i, A_j] &= -\sum_k \varepsilon_{ijk} A_k.
\end{aligned}
\tag{44}
\]

In (45a),

\[ \begin{matrix} a\,\cdot \\ \cdot\, b \end{matrix} \equiv ab. \tag{45b} \]

The above matrix elements are related to the velocity components of the moving frame (with the speed of light c = 1) as

\[
\begin{aligned}
v_x &= \tanh\alpha' \cosh\beta' \cosh\gamma', \\
v_y &= i \tanh\alpha' \cosh\beta' \sinh\gamma', \\
v_z &= i \tanh\alpha' \sinh\beta', \\
v &= \sqrt{v_x^2 + v_y^2 + v_z^2} = \tanh\alpha'.
\end{aligned}
\tag{46}
\]

This group also has two invariant operators, somewhat complicated functions of the generators. The representations of the homogeneous Lorentz group can be obtained from its covering group, which was shown by Bargmann to be SL(2, C), an element of which is a 2 × 2 complex matrix with determinant +1. For example, the element of the covering group corresponding to the element of the homogeneous Lorentz group in Eq. (43) is

\[
\begin{pmatrix}
-1.222144 + i(0.2411005) & -0.6624081 - i(1.0352785) \\
0.1311536 - i(0.9976158) & 1.9325431 - i(0.4832207)
\end{pmatrix}.
\tag{47}
\]

It is important to mention that the full group of relativistic quantum mechanics is the inhomogeneous Lorentz group or the Poincaré group which has, in addition, elements corresponding to translations in space–time.
aµλ =
cosh α sinh α sinh β sinh α cosh β sinh γ sinh α cosh β cosh γ
cosh α (1 − cosh α ) sinh β
(1 − cosh α
)
−sinh α sinh β
+(1 − cosh α ) cosh β2
· cosh β sinh γ
· sinh β cosh β cosh γ
− α (1 − cosh α
) cosh 2
β (1 − cosh α ) .
(1 cosh )
−sinh α cosh β sinh γ
· sinh β cosh β sinh γ
· sinh γ + 1
2
· cosh β sinh γ cosh γ
2
(cosh α − 1) (cosh α
− 1) (cosh α
− 1)
sinh α cosh β cosh γ
2 2 2
· sinh β cosh β cosh γ · cosh β sinh γ cosh γ · cosh β cosh γ + 1
(45a)
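The numerical matrix of Eq. (43) can be tested against the defining property of the homogeneous Lorentz group, namely that it leaves the form (40) invariant. The sketch below assumes the metric signature (+, +, +, −) suggested by Eq. (40), with the fourth coordinate as time; the tolerance reflects the seven-decimal rounding of the printed entries, and the names are illustrative.

```python
# Matrix entries as printed in Eq. (43).
A = [
    [0.5930545, 0.3897004, -0.9044001, 0.5670270],
    [1.2037232, -1.2987247, 1.5783052, -2.1509726],
    [0.5232450, -2.2304747, 0.7592345, -2.1966429],
    [-1.0365561, 2.4111345, -1.6986536, 3.2822924],
]
ETA = [1, 1, 1, -1]  # metric of the form X1^2 + X2^2 + X3^2 - X4^2

def metric_product(u, v):
    return sum(ETA[m] * u[m] * v[m] for m in range(4))

# Invariance of the quadratic form is equivalent to A^T eta A = eta,
# i.e. the columns of A are "orthonormal" with respect to eta.
columns = list(zip(*A))
gram = [[metric_product(columns[i], columns[j]) for j in range(4)] for i in range(4)]
max_deviation = max(abs(gram[i][j] - (ETA[i] if i == j else 0))
                    for i in range(4) for j in range(4))

print(max_deviation < 1e-4)  # True
```

The three space-like columns have norm +1 and the time-like column has norm −1, up to rounding of the printed digits.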
× p) − m r .
where e and m are the charge and mass of the electron
=√ 1
A −L
(p × L (51)
and r the distance from the nucleus. The electron is in a −8mH −2H r
central field, the potential energy e2 /r being invariant to
rotations in three-dimensional space. This Hamiltonian The three components of A and the three components of L
thus has SO(3) symmetry and its bound-state solutions, make up the six generators of SO(4) and the quantum me-
which generate the irreducible representations of the chanical commutation relations among these operators are
three-dimensional rotation groups, are expressed in terms the same as the Lie algebra of the infinitesimal generators
of three quantum numbers n, l, m: of SO(4). This group has two invariants and one of them,
·A
L =A · L,
vanishes in this case. The other invariant is
Unlm ( r ) = Ylm (θ, φ)Rnl (r ). (49)
simply the sum of the squares of the six components of
As can be expected from our discussion of the rota- and L.
A It is interesting to note that linear combinations
tion group, the spherical harmonics Ylm are a factor in the
of A and L, j1 , and j2 commute with each other and each
solution. Dmm(l) satisfies an SU(2) Lie algebra:
are the matrices of the irreducible repre-
\[ U_{nlm}(r^2) = Y_{lm}(\theta, \varphi)\, R_{nl}(r^2). \tag{54} \]

Since the Laplacian

\[ \nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \]

and $r^2 = x^2 + y^2 + z^2$ both do not change on a rotation of the coordinate system in space, the Hamiltonian has rotational invariance and naturally the spherical harmonics $Y_{lm}$ are a factor in the eigenfunctions of H. However, the harmonic oscillator, like the hydrogen atom, has a higher symmetry than SO(3). Jauch, Hill, and Baker demonstrated that the group of the isotropic harmonic oscillator is SU(3), a three-dimensional generalization of SU(2), the unitary unimodular group in three dimensions. A typical element of this group is a 3 × 3 unitary matrix with determinant +1. To establish the group structure or symmetry of a Hamiltonian one should find, on the one hand, operators that commute with it and show, on the other, that these operators are the infinitesimal generators of that particular Lie group. In the 3 × 3 matrix there are nine complex elements or 18 real parameters. However, the conditions that need to be satisfied by the rows and columns of a unitary matrix, as well as the requirement that the determinant of the matrix should be +1, reduce the number to eight independent real parameters. These eight parameters lead to eight generators of the SU(3) group. Three of these are the familiar $L_x$, $L_y$, and $L_z$ operators,

\[ y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y}, \qquad z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z}, \qquad \text{and} \qquad x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x}, \]

respectively, which are the operators that generate the SO(3) group. The other generators are the five independent components of the symmetric tensor

\[ A_{ij} = (1/2\omega)\left( p_i p_j + \omega^2 x_i x_j \right), \tag{55} \]

the sum of whose diagonal elements is essentially the Hamiltonian. The Lie algebra of SU(3) can be conveniently expressed in terms of linear combinations of these eight operators:

\[
\begin{aligned}
H_1 &= (1/2\sqrt{3})\, L_z, \\
H_2 &= (1/6)(A_{11} + A_{22} - 2A_{33}), \\
E_1 &= (1/2\sqrt{6})(A_{11} - A_{22} + 2iA_{12}),
\end{aligned}
\]

The structure relations can be written explicitly as

\[
\begin{aligned}
[H_i, H_j] &= 0, & i, j &= 1, 2, \ldots, l \quad (l = 2 \text{ here}), \\
[H_i, E_\alpha] &= r_i(\alpha)\, E_\alpha, & \alpha &= \pm 1, \pm 2, \ldots, \\
[E_\alpha, E_{-\alpha}] &= \sum_i r_i(\alpha)\, H_i, \\
[E_\alpha, E_\beta] &= \sum_\gamma C^{\gamma}_{\alpha\beta}\, E_\gamma, & \alpha &\neq -\beta = \pm 1, \pm 2, \ldots.
\end{aligned}
\tag{57}
\]

The number of mutually commuting generators of this group, called the rank of the group l, is 2 in this case, $H_1$ and $H_2$ being these generators. $r_i(\alpha)$ is considered as the ith component of an l-dimensional “root vector” $\mathbf{r}(\alpha)$. The different root vectors and their components are

\[
\begin{aligned}
\mathbf{r}(1) &= (1/\sqrt{3},\, 0) \quad \text{or} \quad r_1(1) = 1/\sqrt{3},\ r_2(1) = 0, \\
\mathbf{r}(-1) &= (-1/\sqrt{3},\, 0), \\
\mathbf{r}(2) &= \left( \tfrac{1}{2\sqrt{3}},\, \tfrac{1}{2} \right), \\
\mathbf{r}(-2) &= \left( -\tfrac{1}{2\sqrt{3}},\, -\tfrac{1}{2} \right), \\
\mathbf{r}(3) &= \left( \tfrac{1}{2\sqrt{3}},\, -\tfrac{1}{2} \right), \\
\mathbf{r}(-3) &= \left( -\tfrac{1}{2\sqrt{3}},\, \tfrac{1}{2} \right).
\end{aligned}
\tag{58}
\]

These are represented graphically in Fig. 2 in a root diagram with $r_1$ and $r_2$ as rectangular axes. The root vectors have the “orthonormal” property

\[ \sum_\alpha r_i(\alpha)\, r_j(\alpha) = \delta_{ij}. \tag{59} \]

In the relation $[E_\alpha, E_\beta] = C^{\gamma}_{\alpha\beta} E_\gamma$ the structure constants $C^{\gamma}_{\alpha\beta}$ vanish whenever $\mathbf{r}(\alpha) + \mathbf{r}(\beta)$ is not a root vector. The root diagram is a concise way of showing the structure of the Lie algebra.

The well-known solutions of the isotropic harmonic oscillator can be used as basis functions for calculating the irreducible representations of the SU(3) group, and enumeration of the degenerate states gives us the dimensionality of the representations. This degeneracy is easily known from the expression for the energy eigenvalue of the Hamiltonian:
around the nucleus, which is called spin–orbit coupling, then these degenerate levels are split into two groups, giving rise to what is known as fine structure of spectral lines. In the case of sodium a 5896-Å wavelength sodium D line is really made up of two lines of 5890 and 5896 Å wavelengths, the D2 and D1 lines, respectively. The unsplit line is due to an electric dipole transition between one of the six degenerate P states to an S state. The two lines appear because the six states split into two groups and there are transitions between levels in either group and the S state. This splitting can be predicted by group theory if the symmetries of the degenerate states and the symmetry of the perturbation are known, as was first pointed out by Bethe.

We will calculate and see how the levels of the atom are split when the perturbation arises, say from the atom being in a crystalline field of D3d symmetry. The elements of D3d form a subgroup of the full rotation–reflection group and hence the irreducible representations of the latter become reducible representations of the subgroup. The reduction theorem

\[ a_\mu = \frac{1}{g} \sum_i g_i\, \chi_i\, \chi_i^{(\mu)*} \tag{62} \]

gives $a_\mu$, the number of times the µth irreducible representation of the subgroup is contained in the reducible representation in which the elements of the ith class have the compound character $\chi_i$ (see Table VI). $g_i$ is the number of elements in class $K_i$ which have the character $\chi_i^{(\mu)}$ in the µth representation. The character table (Table VII) specifies $g_i$, $K_i$, and $\chi_i^{(\mu)}$. The irreducible representations are classified as even or odd (g or u) because of inversion (or the parity operation) as an element of the subgroup. In the rotation–reflection group O(3) the rotation angles that correspond to elements in classes $K_1$, $K_2$, and $K_3$ are, respectively, 0, 2π/3, and 2π/2, although the axes for the cyclic and dihedral rotations (i.e., rotations about axes perpendicular to the cyclic axis) are different. The characters are given by the formula

\[ \chi^{(l)}(\alpha) = \frac{\sin\left(l + \tfrac{1}{2}\right)\alpha}{\sin(\alpha/2)} \qquad (\alpha = \text{angle of rotation}). \tag{63} \]

TABLE VI  Compound Character Table of Subgroup D3d

              E     2C3    3C2    i      2S6    3σd
χ^(l = 0)     1     1      1      1      1      1
χ^(1)         3     0      −1     −3     0      1
χ^(2)         5     −1     1      5      −1     1
χ^(3)         7     1      −1     −7     −1     1
χ^(4)         9     0      1      9      0      1
χ^(5)         11    −1     −1     −11    1      1

Applying this formula and remembering that the spherical harmonics $Y_{lm}$ have parity $(-)^l$, we obtain the compound characters displayed in Table VI. Since we know the compound characters of the subgroup and its characters $\chi_i^{(\mu)}$, calculation of $a_\mu$ following formula (62) is straightforward. For instance, the sevenfold-degenerate level l = 3 of the free atom (ignoring spin) is split into one level of symmetry $A_{1u}$, two levels of symmetry $A_{2u}$, and two twofold-degenerate levels each of symmetry $E_u$. The splitting of the other levels is shown below. Thus group theory predicts the number of split levels and their symmetries, which implies that the transitions and spectral lines can be predicted without calculation. The magnitude of the splitting or the frequency of the spectral lines cannot, of course, be known from symmetry considerations alone. These types of symmetry considerations are of great importance in solid state. For instance, the presence in a crystal of an impurity atom occupying a site of cubic symmetry leads to the formation of a localized triply degenerate vibrational mode. If the crystal has C3v point symmetry at that site the triply degenerate level will split into a doublet (two levels, E type) and a singlet (A type). This is of experimental use in infrared studies of crystal defects:

\[
\begin{aligned}
E_0 &\to A_{1g}, \\
E_1 &\to A_{2u} + E_u, \\
E_2 &\to A_{1g} + 2E_g, \\
E_3 &\to A_{1u} + 2A_{2u} + 2E_u, \\
E_4 &\to 2A_{1g} + A_{2g} + 3E_g, \\
E_5 &\to A_{1u} + 2A_{2u} + 4E_u.
\end{aligned}
\tag{64}
\]

D. Selection Rules in Atomic Spectra

The spectrum radiated by an atom, hydrogen for example, consists in general of discrete lines, not all of which have the same intensity of illumination. Also, in the spectra of different atoms it usually happens that some expected lines are not seen; these are called forbidden lines. It was Niels Bohr who pointed out that a spectral line is the result of a quantum mechanical transition between two states of energy $E_i$ and $E_f$ and that the frequency of the radiated line is given by the following Bohr frequency condition, which follows from the principle of conservation of energy:

\[ \frac{hc}{\lambda} = h\nu = E_i - E_f. \tag{65} \]

Here ν and λ, respectively, are the frequency and wavelength of the spectral line, h Planck’s constant, and c the speed of light. Since more than one atom in a gas makes
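TABLE VII  Character Table of D3d

µ      Representation    E     2C3    3C2    i      2S6    3σd
1      A1g               1     1      1      1      1      1
2      A2g               1     1      −1     1      1      −1
3      Eg                2     −1     0      2      −1     0
4      A1u               1     1      1      −1     −1     −1
5      A2u               1     1      −1     −1     −1     1
6      Eu                2     −1     0      −2     1      0

For example, the entry 2 in the row µ = 3 under the fourth class, i, is $\chi_4^{(3)}$.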
the same transition between quantized states, the intensity belongs to the D (1) representation and the product zUnli m i
of a spectral line depends on the population of these atoms, belongs to the direct product D (1) × D (li) . More precisely,
which can be determined from the statistical distribution zUnli m i is a certain linear combination of the basis func-
of the atoms at a given temperature and the quantum me- tions which generate the representations of the irreducible
chanical transition probability between these states. For components of D (1) × D (li) . Unl f m i belongs to the m f th
instance, the transition probability for transition from a row of the D (l f ) representation of SO(3). If D (l f ) is not
2P state to a 1S state in a hydrogen atom is 6.25 × 108 s−1 . one of the terms in the direct sum into which D (1) × D (li)
The radiation is usually expressed as a sum of multipoles decomposes, the matrix element vanishes. According to
and the transition probability decreases, by several orders the Clebsch–Gordan theorem we have
of magnitude, with increasing order of multipoles. That certain transitions are seen to happen experimentally and others not is related to selection rules, which vary with the multipole order of the radiation. For linearly polarized electric dipole radiation, the transition probability between states $|i\rangle$ and $|f\rangle$ is given by

$$A_{if} = (64\pi^4\nu^3/3hc^3)\,|\langle f|ez|i\rangle|^2. \tag{66}$$

Here $\nu$ is the frequency of the radiation, $ez$ the dipole operator, and $\langle f|ez|i\rangle$ the quantum mechanical matrix element, which depends on the nature of the initial state $|i\rangle$ and the final state $|f\rangle$. It is this matrix element that dictates whether the transition $|i\rangle \to |f\rangle$ is allowed or not. If the matrix element vanishes, then that transition is forbidden, at least in the electric dipole approximation. The selection rules then depend on this matrix element, and it is the vanishing or nonvanishing of this matrix element that can be predicted by group theory without actual calculation.

Let us assume that the initial quantum state $|i\rangle$ of the electron in the hydrogen atom making the transition is the central field state $U_{n l_i m_i}$, and that the final state it jumps into is $U_{n l_f m_f}$. The matrix element is explicitly

$$\int U^*_{n l_f m_f}\,(ez)\,U_{n l_i m_i}\,d\tau. \tag{67}$$

Now $U_{n l_i m_i}$ is the basis function which belongs to the $m_i$th row of the $D^{(l_i)}$ irreducible representation of the rotation group. The operator $z$, which is $Y_{10}$ apart from a constant, belongs to $D^{(1)}$, and

$$D^{(1)} \times D^{(l_i)} = D^{(l_i+1)} \oplus D^{(l_i)} \oplus D^{(l_i-1)}. \tag{68}$$

It is clear that unless $l_f = l_i + 1,\ l_i,\ l_i - 1$, the matrix element vanishes, with the exception that when $l_i = 0$, $l_f$ can only be 1. Furthermore, the central field functions have definite parity. For a reflection operation $x \to -x$, $y \to -y$, $z \to -z$ these functions are even (do not change sign) if $l$ is even, and vice versa. In other words, they are also basis functions of the inversion or parity group. Since $z$ has odd parity, the matrix element vanishes unless $l_f = l_i + 1$ or $l_i - 1$. Since $l_i$ cannot equal $l_f$, it is customary to say that parity should change in an electric dipole transition. Thus the electron can make a transition between a P state and an S state.

If an atom is not free but is in a crystalline field, say of $C_{3v}$ symmetry, the matrix element can be analyzed as follows. The initial state function $\psi_\mu^{(i)}$ belongs to the $\mu$th row of the $i$th irreducible representation of $C_{3v}$ and $\psi_\nu^{(f)}$ belongs to the $\nu$th row of the $f$th representation. From the character table of $C_{3v}$ we see that $z$ belongs to the one-dimensional representation $D^{(A_1)}$ of $C_{3v}$. Hence the matrix element will vanish unless $D^{(f)}$ occurs in the decomposition of the direct product $D^{(A_1)} \times D^{(i)}$. Without having to apply the systematic reduction formula one can easily see from the character table, for instance,

$$D^{(A_1)} \times D^{(A_1)} = D^{(A_1)} \oplus 0\,D^{(A_2)} \oplus 0\,D^{(E)},$$
$$D^{(A_1)} \times D^{(A_2)} = 0\,D^{(A_1)} \oplus D^{(A_2)} \oplus 0\,D^{(E)}. \tag{69}$$
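The decomposition in Eq. (69) can be checked with the standard reduction formula. The following is a minimal sketch, assuming the usual $C_{3v}$ character table with classes $E$, $2C_3$, $3\sigma_v$; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Character table of C3v; the classes are E, 2C3, 3sigma_v (assumed ordering).
CLASS_SIZES = [1, 2, 3]
CHARACTERS = {
    "A1": [1, 1, 1],
    "A2": [1, 1, -1],
    "E": [2, -1, 0],
}
GROUP_ORDER = sum(CLASS_SIZES)  # g = 6


def decompose_product(irrep_a, irrep_b):
    """Return how many times each irrep occurs in D(a) x D(b)."""
    # Characters of a direct product are products of the characters.
    product = [x * y for x, y in zip(CHARACTERS[irrep_a], CHARACTERS[irrep_b])]
    counts = {}
    for name, chi in CHARACTERS.items():
        # Reduction formula; all characters here are real.
        total = sum(n * p * c for n, p, c in zip(CLASS_SIZES, product, chi))
        counts[name] = Fraction(total, GROUP_ORDER)
    return counts


def dipole_z_allowed(initial, final):
    """z transforms as A1, so D(final) must occur in D(A1) x D(initial)."""
    return decompose_product("A1", initial)[final] > 0
```

For example, `dipole_z_allowed("A1", "A2")` evaluates the second line of Eq. (69) and reports that the transition is forbidden for $z$-polarized radiation.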
P1: GLQ/GLT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN007C-305 June 29, 2001 17:59
The coefficients of the sums on the right-hand side of the equation are determined by looking at the appropriate characters on both sides. Thus if $\psi_\mu^{(i)}$ refers to $A_1$, then $\psi_\nu^{(f)}$ also must have $A_1$ symmetry, and likewise for $A_2$. Thus an $A_1$-to-$A_2$ transition, or vice versa, is strictly forbidden in electric dipole radiation.

$$\mathbf{L} = \sum_i \mathbf{l}_i, \qquad \mathbf{S} = \sum_i \mathbf{s}_i. \tag{70}$$

The corresponding Hamiltonian is symmetrical in the interchange of particle coordinates and is free of any spin operators. The Pauli principle restricts the many-particle function to be antisymmetric in the interchange of the coordinates (space as well as spin coordinates) of any two particles because all the nucleons are fermions with spin $\tfrac12$. This implies that, when written as a product of space and spin functions, the function is symmetrical in space coordinates and antisymmetric in spin coordinates and vice versa. To illustrate this let us take a three-nucleon system and consider the spin functions first. Denoting an up-spin state by $\alpha$ and a down-spin state by $\beta$ and labeling the particles 1, 2, 3, we have both symmetric and antisymmetric functions which are built following the Clebsch–Gordan theorem. Two such functions, one symmetric and the other antisymmetric, in the interchange of 1 and 2 are given below. These are eigenfunctions of $S^2$ and $S_z$ with eigenvalues $\tfrac12$ and $\tfrac12$:

$$\phi_3 \equiv \chi^{1/2}_{1/2} = \frac{1}{\sqrt6}\,\{2\alpha(1)\alpha(2)\beta(3) - \alpha(2)\alpha(3)\beta(1) \cdots$$

A standard tableau is an arrangement of the numbers 1, 2, 3 in the boxes such that they are in increasing order in rows as well as columns, as shown above. The number of such standard tableaux is the dimensionality of the representation, which in this case is two. Since each diagram is obtained from the other by interchanging rows and columns, these are said to be conjugate to each other. The projection operators corresponding to these two diagrams are

$$P_1 = \tfrac13[E - (13)][E + (12)] = \tfrac13(E + C - B - D),$$
$$P_2 = \tfrac13[E - (12)][E + (13)] = \tfrac13(E - C + B - F), \tag{74}$$
where we used the notation of Section II. Operating with these on $\alpha(1)\alpha(2)\beta(3)$, for instance, generates the basis functions

$$\tfrac23\{\alpha(1)\alpha(2)\beta(3) - \alpha(2)\alpha(3)\beta(1)\},$$
$$\tfrac13\{\alpha(2)\alpha(3)\beta(1) - \alpha(1)\alpha(3)\beta(2)\}. \tag{75}$$

The elements $A$, $C$ of the $S_3$ group are represented in this basis by the matrices

$$(A) = \begin{pmatrix} 0 & -\tfrac12 \\ -2 & 0 \end{pmatrix}, \qquad (C) = \begin{pmatrix} 1 & 0 \\ 2 & -1 \end{pmatrix}, \qquad \chi(A) = \chi(C) = 0. \tag{76}$$

Other matrices can be calculated in a similar manner. The characters show that this representation is equivalent to the one in Eq. (72). Thus linear combinations of functions of the type $\alpha(1)\alpha(2)\beta(3)$ give rise to basis functions identical to the ones in Eq. (71). The functions of the space coordinates can be treated likewise. On the one hand, since the single-particle states are described by central field functions (product of spherical harmonics and appropriate radial function) the three-particle function must be an eigenfunction of $L^2$ and $L_z$. This means that the latter should be a Clebsch–Gordan-type linear combination of products of single-particle functions. On the other hand, they should be basis functions for the irreducible representations of $S_3$, which means that these should be obtainable by means of projection operators associated with appropriate Young tableaux. Since the wavefunction, a product of space and spin functions, has to be antisymmetric in the interchange of all the coordinates (space and spin) of any two particles, we need spatially symmetric functions multiplying spin antisymmetric functions and vice versa. The space and spin functions then correspond to conjugate Young diagrams. More details can be found in the work of Swamy and Samuel.

A. The Elliott Model

In 1958 Elliott introduced a model of the nucleus wherein the particles move in a common harmonic oscillator potential and in addition have a mutual interaction of the quadrupole type. Before writing the Hamiltonian which mathematically describes the model, let us recall some of the features of the nonrelativistic isotropic harmonic oscillator in three dimensions discussed earlier. The simultaneous eigenfunctions of energy and angular momentum have the central field form

$$U_{nlm}(\mathbf{r}) = Y_{lm}(\theta, \phi)\,R_{nl}(r), \tag{77}$$

which means three quantum numbers define a single state. While the energy levels are equally spaced they are degenerate. The ground state is nondegenerate, but the higher excited states have degeneracies 3, 6, 10, etc., which is the underlying basis for the shell structure. These states are labeled 1s, 1p, (2s, 1d), (2p, 1f), and so on. Because of the presence of $Y_{lm}$ as a factor, these are basis functions for the irreducible representation of the rotation group (the orbital angular momentum group), the Hamiltonian being invariant to rotations in space. However, as we saw earlier, the bigger group is SU(3) with eight infinitesimal generators $H_1, H_2, E_1, E_{-1}, E_2, E_{-2}, E_3, E_{-3}$ satisfying the Lie algebra appropriate to this group, and commuting with the Hamiltonian of the oscillator. The Casimir operator $C$ is simply the sum of squares of these eight operators and this commutes with every generator of the group.

As Lipkin has shown, the Elliott Hamiltonian can be written

$$H = H_0 + \lambda_1 V C + \lambda_2 V L^2, \tag{78}$$

where $\lambda_1$ and $\lambda_2$ are constants and $V$ is related to the strength of the quadrupole interaction between two particles. $H_0$ describes the independent particle motion in an oscillator potential. The additional terms are intended to remove some of the degeneracies associated with $H_0$; in other words, the SU(3) multiplets are split. Since the added terms commute with $H_0$ the eigenfunction of $H$ is also simultaneously an eigenfunction of $C$ as well as the angular momentum $L^2$. The required eigenfunctions are chosen to make sure that they are SU(3) multiplet states and within these multiplets they are also eigenfunctions of the angular momentum $L^2$. Since the energy levels corresponding to the eigenvalues of $L^2$ are in the nature of rotational energy levels, each SU(3) multiplet constitutes a rotational band. Thus the rotational spectrum, which is assumed to be due to collective motions of the nuclear particles, is derivable from an SU(3)-type independent-particle shell model under the assumption of a quadrupole two-body interaction. Experiments have shown, for instance, that in $^{20}$Ne (whose outer nucleons can be considered to be in the 2s–1d harmonic oscillator shell) there are excited levels, corresponding to a rotational spectrum, of energies 9.48 MeV($2^+$), 10.24 MeV($2^+$), 10.64 MeV($6^-$), 11.99 MeV($8^+$), and so on.

We conclude this section by remarking that currently much research activity is related to supersymmetries such as $U(6/4)$ and $U(6/12)$ starting with the interacting boson model initiated by Iachello and Arima. In the original interacting boson model the valence neutrons and protons, which are fermions individually, are paired into “s” and “d” bosons in even–even nuclei, similar to Cooper pairs in the theory of superconductivity. The $U(6/4)$ is extended
FIGURE 5 Arrangement of atoms in OL 4 molecule. [Reprinted with permission from Swamy, N. V. V. J., and Samuel, M. A. (1979). “Group Theory Made Easy for Scientists and Engineers,” Wiley-Interscience, New York. Copyright 1979 John Wiley and Sons.]
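The degeneracies 1, 3, 6, 10 quoted in the Elliott model discussion for the isotropic three-dimensional oscillator, and the shell labels such as (2s, 1d), can be reproduced by simple counting. This is a minimal sketch; it assumes the standard result that a shell with $N$ oscillator quanta contains orbital angular momenta $l = N, N-2, \ldots$, and the function names are hypothetical.

```python
def degeneracy_by_counting(n_quanta):
    """Count Cartesian triples (nx, ny, nz) with nx + ny + nz = N."""
    # Once nx and ny are chosen, nz is fixed, so only two loops are needed.
    return sum(1 for nx in range(n_quanta + 1)
               for ny in range(n_quanta + 1 - nx))


def angular_momenta(n_quanta):
    """Orbital angular momenta l = N, N-2, ..., down to 1 or 0."""
    return list(range(n_quanta, -1, -2))


def degeneracy_from_l(n_quanta):
    """Sum the (2l + 1) magnetic substates over the l values of the shell."""
    return sum(2 * l + 1 for l in angular_momenta(n_quanta))
```

Both counts agree with the closed form $(N+1)(N+2)/2$; for $N = 2$ the allowed values $l = 2, 0$ correspond to the (2s, 1d) shell.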
axis, two antisymmetrical one-dimensional representations $B_1$ and $B_2$, and a degenerate two-dimensional representation. The subscripts 1 and 2 and A and B indicate whether it is symmetric or antisymmetric with respect to the vertical reflections. Wigner’s fundamental demonstration has been that the allowed normal modes of any molecule with a certain symmetry (belonging to a group) should correspond to the symmetries of these irreducible representations and none else.

According to the theory of small vibrations in classical mechanics, a system of $N$ particles connected by springs can have only $3N - 6$ normal modes unless all the particles are in one line. In the case of BrF5 this means 12 frequencies, and these modes should have the symmetries of the $C_{4v}$ representations. We will apply the Wigner prescription to ascertain these symmetries. The first step is to calculate the compound character $\chi(R)$ of each symmetry operation following the rule

$$\chi(R) = \begin{cases} (\mu_R - 2)(1 + 2\cos\phi) & \text{for proper rotations}, \\ \mu_R(-1 + 2\cos\phi) & \text{for improper rotations}. \end{cases} \tag{79}$$

Here $\mu_R$ indicates the number of atoms left unchanged by that particular group operation. These compound characters as well as the characters of the $C_{4v}$ group (not classwise) are shown in Table VIII. Following the usual reduction formula relating characters to compound characters, the number of modes of each symmetry type is

$$N_i = \frac{1}{g}\sum_R \chi(R)\,\chi_i(R). \tag{80}$$

Calculation shows that there will be $3A_1$, $0A_2$, $1B_1$, $2B_2$, and three sets of degenerate E-type vibrations. For instance, for the $A_1$ type we have

$$N_{A_1} = \tfrac18[(12 \times 1) + (1 \times 0) + (1 \times 0) + (1 \times 0) + (2 \times 1) + (2 \times 1) + (4 \times 1) + (4 \times 1)] = 3. \tag{81}$$

Since the E type are doubly degenerate, we notice that we have accounted for all 12 frequencies. Pictures of the modes of vibration of all important symmetry types are found in the classic work of Herzberg. The transformation properties of the components of the dipole moment vector (essentially $x$ or $y$ or $z$) and the components of the polarizability tensor ($xz$, $yz$, $x^2 - y^2$, etc.) determine whether these modes are Raman active or infrared active. From the character table we notice that both these sets of components transform as the $A_1$ and $E$ representations, whereas only the components of the polarizability tensor have the B-type symmetries. Thus the $A_1$ and $E$ modes are both Raman and infrared active, whereas $B_1$ and $B_2$ are only Raman active. The experimentally measured frequencies of Stephenson and Jones (expressed in units of cm$^{-1}$) are 365, 572, 683 (A type), 315, 481, 536 (B type), 244, 415, 626 (E type).

It is important to note that group theory predicts only the symmetries of the normal modes but not their numerical frequencies. One needs to solve the secular determinant to obtain these frequencies and even then the numerical values are dependent on assumed force constants between the atoms. In general the calculation is cumbersome, but considerable reduction in labor and time will result if “symmetry-adapted eigenfunctions (symmetry coordinates)” are used in the evaluation of the individual matrix elements. This latter set of functions is easily obtained by means of projection operators. Four of the representations are one-dimensional and we give
        Basis functions         E    M    N    P    Q    S    T    U
A1      z, x^2 + y^2, z^2       1    1    1    1    1    1    1    1
A2                              1    1    1    1   −1   −1   −1   −1
B1      x^2 − y^2               1   −1    1   −1    1    1   −1   −1
B2      xy                      1   −1    1   −1   −1   −1    1    1
E       (x, y), (xz, yz)        2    0   −2    0    0    0    0    0
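The mode count for BrF5 from Eqs. (79)–(81) can be verified numerically. The sketch below is illustrative only: the numbers of unmoved atoms per operation (6, 2, 2, 2, 2, 2, 4, 4 in the element order E, M, N, P, Q, S, T, U) are an assumption inferred from the compound characters 12, 0, 0, 0, 2, 2, 4, 4 appearing in Eq. (81), and the function names are hypothetical.

```python
import math

# (unmoved atoms mu_R, rotation angle in degrees, proper?) for the eight
# C4v operations in the order E, M, N, P, Q, S, T, U. The mu_R values are
# an assumption consistent with the compound characters used in Eq. (81).
OPERATIONS = [
    (6, 0, True),     # E
    (2, 90, True),    # M: C4
    (2, 180, True),   # N: C2
    (2, 270, True),   # P: C4^3
    (2, 0, False),    # Q: reflection
    (2, 0, False),    # S: reflection
    (4, 0, False),    # T: reflection
    (4, 0, False),    # U: reflection
]

C4V_CHARACTERS = {
    "A1": [1, 1, 1, 1, 1, 1, 1, 1],
    "A2": [1, 1, 1, 1, -1, -1, -1, -1],
    "B1": [1, -1, 1, -1, 1, 1, -1, -1],
    "B2": [1, -1, 1, -1, -1, -1, 1, 1],
    "E": [2, 0, -2, 0, 0, 0, 0, 0],
}


def compound_character(mu, angle_deg, proper):
    """Compound character of Eq. (79) for one symmetry operation."""
    c = math.cos(math.radians(angle_deg))
    value = (mu - 2) * (1 + 2 * c) if proper else mu * (-1 + 2 * c)
    return round(value)


def mode_counts():
    """Number of vibrational modes of each symmetry type, Eq. (80)."""
    chi = [compound_character(*op) for op in OPERATIONS]
    g = len(OPERATIONS)
    return {name: sum(x * y for x, y in zip(chi, row)) // g
            for name, row in C4V_CHARACTERS.items()}
```

Summing the counts weighted by the dimension of each representation (1 for A and B types, 2 for E) recovers the 12 vibrational frequencies.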
a set of irreducible representation matrices for the two-dimensional E representation:

$$(E) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad (M) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad (N) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad (P) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$
$$(Q) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad (S) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \quad (T) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}, \quad (V) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{82}$$

Applying formula (14), we can readily obtain the projection operators

$$P_{A_1} = \tfrac18\{E + M + N + P + Q + S + T + U\},$$
$$P_{B_1} = \tfrac18\{E - M + N - P + Q + S - T - U\},$$
$$P_{B_2} = \tfrac18\{E - M + N - P - Q - S + T + U\}, \tag{83}$$
$$P_{E_1} = \tfrac14(E - N + Q - S),$$
$$P_{E_2} = \tfrac14(E - N - Q + S).$$

A convenient set of symmetry coordinates is easily obtained by simply applying these projection operators to the changes in the various bond lengths and bond angles of the molecule (Table VIII).

B. Brillouin Zones and Compatibility Relations

Because of the lattice structure, an electron moves in a periodic electric potential field in the solid and the Hamiltonian describing its motion has the form

$$H = -\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r}), \tag{84}$$

where $V(\mathbf{r})$ satisfies the periodicity condition

$$V(\mathbf{r} + \mathbf{a}) = V(\mathbf{r}). \tag{85}$$

Here $\mathbf{a}$ describes the periodicity of the lattice in three dimensions. This Hamiltonian has translational invariance and its eigenfunctions are basis functions for the irreducible representations of the group of translations in space. Bloch showed that group theory requires the stationary-state solution $\psi_{\mathbf{k}}$ of the Schrödinger equation,

$$H\psi_{\mathbf{k}}(\mathbf{r}) = E(\mathbf{k})\,\psi_{\mathbf{k}}(\mathbf{r}), \tag{86}$$

be of the form

$$\psi_{\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\,U_{\mathbf{k}}(\mathbf{r}), \tag{87}$$

where $U_{\mathbf{k}}$ is a periodic function with the periodicity of the lattice. This translational symmetry of the crystal is in addition to the point group symmetry of the lattice, whose elements are those rotations and reflections which always leave one point fixed. The point group is the local symmetry at the site of an atom. In an earlier section we discussed how the symmetry of the crystalline field splits the degenerate energy levels of a free atom when this is in a crystal. The complete set of symmetry operations carrying a crystal into itself, including translations and the point group operations, is known as the space group.

The eigenvalues $E(\mathbf{k})$ are such that there are planes of energy discontinuity in $\mathbf{k}$ space which are the boundaries of a volume, the Brillouin zone. The translational symmetry enables one to consider a single unit cell in $\mathbf{k}$ space known as the first Brillouin zone or the reduced zone with the corresponding reduced wave vector. Bouckaert, Smoluchowski, and Wigner studied the symmetry properties of the Brillouin zone and derived “compatibility relations” between adjoining points, lines, and planes of symmetry. These relations are of fundamental importance in the analysis of solid-state experiments. For instance, energy-band calculations become considerably simplified for states along symmetry lines and at symmetry points in the Brillouin zone. Selection rules for the absorption of polarized electromagnetic radiation depend on the symmetry associated with a given point in the reduced zone. In the analysis of the vibronic spectra of doped crystals, vibronic selection rules need to be determined for phonons at various points in the Brillouin zone, and compatibility relations are indispensable for doing this. We now discuss these compatibility relations.

The Bloch function is a product of two factors, the phase function $e^{i\mathbf{k}\cdot\mathbf{r}}$ which determines what is known as the “star of $\mathbf{k}$” and the periodic function $U_{\mathbf{k}}$ which determines the “small representations” of the “group of the wave vector $\mathbf{k}$.” The “star of $\mathbf{k}$” is the figure one obtains when a given wave vector $\mathbf{k}$ is subjected to all the symmetry operations of the point group. If $\mathbf{k}$ terminates on a zone boundary, two points separated by a reciprocal lattice vector are considered identical. Those elements of the point group that leave a $\mathbf{k}$ invariant constitute a subgroup called “group of the wave vector.” Suppose an irreducible representation $\Gamma_j$ of the point group is decomposed in terms of the irreducible representations of its subgroup. If $\Gamma'_j$, a given irreducible representation of the latter, occurs in this decomposition, it is said to be compatible with $\Gamma_j$. In band theory, compatibility relates states that can exist together in a single band.

To illustrate the star of $\mathbf{k}$ let us assume a two-dimensional square zone, the length of whose side equals a reciprocal lattice vector. Let OE in Fig. 6 be the position of our $\mathbf{k}$ vector and let this correspond to the identity element of the group $C_{4v}$ of the square. The other symmetry operations, $C_4$, $C_4^2$, $C_4^3$, $\sigma_x$, $\sigma_y$, $\sigma_\alpha$, and $\sigma_{\alpha'}$, take the vector into the positions shown in the figure, which does resemble a star.
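The star of $\mathbf{k}$ and the group of the wave vector for the square zone can be generated by applying the eight $C_{4v}$ operations to a wave vector. The sketch below is a simplified illustration: the matrix chosen for each reflection label is an assumption, integer components are used for $\mathbf{k}$, and the identification of zone-boundary points that differ by a reciprocal lattice vector is deliberately ignored.

```python
# 2x2 integer matrices for the eight operations of C4v acting on (kx, ky).
# Which reflection carries which label is an assumption for illustration.
C4V_OPERATIONS = {
    "E": ((1, 0), (0, 1)),
    "C4": ((0, -1), (1, 0)),
    "C4^2": ((-1, 0), (0, -1)),
    "C4^3": ((0, 1), (-1, 0)),
    "sigma_x": ((1, 0), (0, -1)),
    "sigma_y": ((-1, 0), (0, 1)),
    "sigma_a": ((0, 1), (1, 0)),
    "sigma_a'": ((0, -1), (-1, 0)),
}


def apply(op, k):
    """Apply a 2x2 matrix to the wave vector k = (kx, ky)."""
    return (op[0][0] * k[0] + op[0][1] * k[1],
            op[1][0] * k[0] + op[1][1] * k[1])


def star_of_k(k):
    """Set of distinct wave vectors generated from k by the point group."""
    return {apply(op, k) for op in C4V_OPERATIONS.values()}


def group_of_k(k):
    """Names of the operations that leave k unchanged."""
    return sorted(name for name, op in C4V_OPERATIONS.items()
                  if apply(op, k) == k)
```

A general vector such as (2, 1) has an eight-pointed star and a trivial group of the wave vector, while a vector on a symmetry line such as (1, 0) has a four-pointed star and a two-element group; in every case the product of the two sizes equals the order of $C_{4v}$.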
Oh      E    8C3   6C4   3C4^2   6C2    i    8S6   6S4   3σv   6σd
A1g     1     1     1     1       1     1     1     1     1     1
Eg      2    −1     0     2       0     2    −1     0     2     0
T1g     3     0     1    −1      −1     3     0     1    −1    −1
A1u     1     1     1     1       1    −1    −1    −1    −1    −1
Eu      2    −1     0     2       0    −2     1     0    −2     0
T1u     3     0     1    −1      −1    −3     0    −1     1     1

C3v     E    2C3   3σv
A1      1     1     1
A2      1     1    −1
E       2    −1     0
Here the dot above a variable means derivative with respect to time. This second-order equation can be transformed into an equivalent matrix differential equation involving coupled first-order quantities by the substitution

$$\dot V_1 = -(\dot c/c)V_1 - (1/Rc)V_2,$$
$$\dot V_2 = (1/Rc)V_1 - (\dot c/c)V_2, \qquad V_2 = V_{FM}. \tag{90}$$

Thus

$$\begin{pmatrix} \dot V_1 \\ \dot V_2 \end{pmatrix} = \begin{pmatrix} -\dot c/c & -1/Rc \\ 1/Rc & -\dot c/c \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}, \tag{91}$$

$$= \sum_k a_k(t)\,X_k. \tag{96}$$

It turns out that this particular operator $A$ can be expanded in terms of the generators of a Lie group isomorphic to the SO(3) group familiar in quantum mechanics, with the defining Lie algebra

$$[X_i, X_j] = \varepsilon_{ijk}X_k. \tag{97}$$

$\varepsilon_{ijk}$ is the well-known Levi–Civita tensor symbol, whose components can only assume values 1, −1, or 0, and the Einstein summation convention over repeated indices is implied. In the 2 × 2 matrix realization the generators are explicitly
Solving these leads to the final solution given in Eq. (102) above. If, on the other hand, the given $U$ is treated as a column vector, then either column of the above matrix satisfies the equation with the initial values $\binom{1}{0}$ and $\binom{0}{1}$, respectively.

It is interesting to note that, unlike in quantum mechanics or elementary particle physics, the properties of the Lie group itself do not seem to be amenable to a direct physical interpretation bearing on the behavior of the VFO, although the structure constants that define the group are involved in the differential relations satisfied by the $g_i$’s in the exponential solutions.

VIII. APPLICATIONS IN PARTICLE PHYSICS

Group theory has been important in particle physics since the early 1960s. The hadrons were first classified into charge multiplets (isospin). Then approximate SU(3) symmetry was used to classify the hadrons into larger multiplets. (This is the so-called flavor SU(3).) More recently, in quantum chromodynamics (QCD), exact SU(3) color

$$U(a, b)f_{Jm} = f'_{Jm} = f_{Jm}\!\left(U^{-1}\binom{u}{v}\right) = \frac{(au + bv)^{J+m}(-b^*u + a^*v)^{J-m}}{\sqrt{(J+m)!\,(J-m)!}}. \tag{104}$$

As shown in Hamermesh’s (1959) text, a straightforward binomial expansion leads to the relation

$$U(a, b)f_{Jm} = \sum_{m'} f_{Jm'}\,D^J_{m'm}(a, b),$$

where the representation matrix $D^J_{m'm}$ is explicitly

$$D^J_{m'm}(a, b) = \sum_k \frac{\sqrt{(J+m)!\,(J-m)!\,(J+m')!\,(J-m')!}}{(J+m-k)!\,k!\,(J-m'-k)!\,(m'-m+k)!}\; a^{J+m-k}\,(a^*)^{J-m'-k}\,b^k\,(-b^*)^{m'-m+k}. \tag{105}$$

We now show that the representation is unitary. The normalization of the $f_{Jm}$ was chosen so that

$$\sum_{m=-J}^{J} f^*_{Jm}f_{Jm}$$
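The unitarity of the matrices in Eq. (105) can also be checked numerically. The following is a minimal sketch, assuming the index convention $D^J_{m'm}$ with rows labeled by $m'$ and columns by $m$, and $|a|^2 + |b|^2 = 1$; half-integer $J$ is handled by passing $2J$ as an integer, and all names and sample values are hypothetical.

```python
import cmath
from math import factorial, sqrt


def d_matrix(two_j, a, b):
    """D^J_{m'm}(a, b) as in Eq. (105); rows m', columns m, from +J to -J."""
    two_ms = list(range(two_j, -two_j - 1, -2))
    matrix = []
    for two_mp in two_ms:
        row = []
        for two_m in two_ms:
            j_plus_m = (two_j + two_m) // 2
            j_minus_m = (two_j - two_m) // 2
            j_plus_mp = (two_j + two_mp) // 2
            j_minus_mp = (two_j - two_mp) // 2
            shift = (two_mp - two_m) // 2  # m' - m
            norm = sqrt(factorial(j_plus_m) * factorial(j_minus_m)
                        * factorial(j_plus_mp) * factorial(j_minus_mp))
            total = 0j
            # k runs over values keeping every factorial argument non-negative.
            for k in range(max(0, -shift), min(j_plus_m, j_minus_mp) + 1):
                denom = (factorial(j_plus_m - k) * factorial(k)
                         * factorial(j_minus_mp - k) * factorial(shift + k))
                total += ((norm / denom) * a ** (j_plus_m - k)
                          * a.conjugate() ** (j_minus_mp - k)
                          * b ** k * (-b.conjugate()) ** (shift + k))
            row.append(total)
        matrix.append(row)
    return matrix


def unitarity_defect(d):
    """Largest deviation of D D^dagger from the unit matrix."""
    n = len(d)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(d[i][k] * d[j][k].conjugate() for k in range(n))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst


# Sample parameters with |a|^2 + |b|^2 = 1, chosen only for illustration.
A = 0.6 * cmath.exp(0.3j)
B = 0.8 * cmath.exp(-1.1j)
```

For $J = \tfrac12$ the function returns the 2 × 2 matrix with entries $a$, $-b^*$, $b$, $a^*$, and for larger $J$ the defect stays at the level of rounding error.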
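and obtain

$$\left|j_1 + \tfrac12, m - 1\right\rangle = \sqrt{\frac{j_1 + m - \tfrac12}{2j_1 + 1}}\;\left|j_1, m - \tfrac32\right\rangle\left|\tfrac12, \tfrac12\right\rangle + \sqrt{\frac{j_1 - m + \tfrac32}{2j_1 + 1}}\;\left|j_1, m - \tfrac12\right\rangle\left|\tfrac12, -\tfrac12\right\rangle, \tag{116}$$

and this means that the result is true for $m - 1$, and therefore true for all $m$. In particular, when $m = j_1 - \tfrac12$ we have

$$\left|j_1 + \tfrac12, j_1 - \tfrac12\right\rangle = \sqrt{\frac{2j_1}{2j_1 + 1}}\;|j_1, j_1 - 1\rangle\left|\tfrac12, \tfrac12\right\rangle + \sqrt{\frac{1}{2j_1 + 1}}\;|j_1, j_1\rangle\left|\tfrac12, -\tfrac12\right\rangle. \tag{117}$$

Now for the case $j = j_1 - \tfrac12$ we note that the basis function $|j_1 - \tfrac12, j_1 - \tfrac12\rangle$ must be normalized and orthogonal to the above state. Equation (114) gives, with $m = j_1 - \tfrac12$,

$$\left|\tfrac32, \tfrac12\right\rangle = \sqrt{\tfrac13}\,|1, 1\rangle\left|\tfrac12, -\tfrac12\right\rangle + \sqrt{\tfrac23}\,|1, 0\rangle\left|\tfrac12, \tfrac12\right\rangle,$$
$$\left|\tfrac32, -\tfrac12\right\rangle = \sqrt{\tfrac13}\,|1, -1\rangle\left|\tfrac12, \tfrac12\right\rangle + \sqrt{\tfrac23}\,|1, 0\rangle\left|\tfrac12, -\tfrac12\right\rangle,$$
$$\left|\tfrac32, -\tfrac32\right\rangle = |1, -1\rangle\left|\tfrac12, -\tfrac12\right\rangle,$$
$$-\left|\tfrac12, \tfrac12\right\rangle = \sqrt{\tfrac13}\,\left|\tfrac12, \tfrac12\right\rangle|1, 0\rangle - \sqrt{\tfrac23}\,\left|\tfrac12, -\tfrac12\right\rangle|1, 1\rangle,$$
$$\left|\tfrac12, -\tfrac12\right\rangle = \sqrt{\tfrac13}\,\left|\tfrac12, -\tfrac12\right\rangle|1, 0\rangle - \sqrt{\tfrac23}\,\left|\tfrac12, \tfrac12\right\rangle|1, -1\rangle. \tag{119}$$

Identifying,

$$|1, 1\rangle = |\pi^+\rangle, \quad |1, 0\rangle = |\pi^0\rangle, \quad |1, -1\rangle = |\pi^-\rangle, \quad \left|\tfrac12, \tfrac12\right\rangle = |p\rangle, \quad \left|\tfrac12, -\tfrac12\right\rangle = |n\rangle. \tag{120}$$

One obtains

$$\left|\tfrac32, \tfrac32\right\rangle = |p, \pi^+\rangle,$$
$$\left|\tfrac32, \tfrac12\right\rangle = \sqrt{\tfrac13}\,|n, \pi^+\rangle + \sqrt{\tfrac23}\,|p, \pi^0\rangle,$$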
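$$\left|\tfrac32, -\tfrac12\right\rangle = \sqrt{\tfrac13}\,|p, \pi^-\rangle + \sqrt{\tfrac23}\,|n, \pi^0\rangle,$$
$$\left|\tfrac32, -\tfrac32\right\rangle = |n, \pi^-\rangle, \tag{121}$$
$$\left|\tfrac12, \tfrac12\right\rangle = \sqrt{\tfrac23}\,|n, \pi^+\rangle - \sqrt{\tfrac13}\,|p, \pi^0\rangle,$$
$$\left|\tfrac12, -\tfrac12\right\rangle = \sqrt{\tfrac13}\,|n, \pi^0\rangle - \sqrt{\tfrac23}\,|p, \pi^-\rangle,$$

$$|p, \pi^+\rangle = \left|\tfrac32, \tfrac32\right\rangle,$$
$$|p, \pi^-\rangle = \sqrt{\tfrac13}\,\left|\tfrac32, -\tfrac12\right\rangle - \sqrt{\tfrac23}\,\left|\tfrac12, -\tfrac12\right\rangle, \tag{122}$$
$$|n, \pi^0\rangle = \sqrt{\tfrac23}\,\left|\tfrac32, -\tfrac12\right\rangle + \sqrt{\tfrac13}\,\left|\tfrac12, -\tfrac12\right\rangle.$$

By SU(2) invariance of (isospin conservation in) strong interactions,

$$\frac{\langle\pi^+ p|T|\pi^+ p\rangle}{\langle\pi^- p|T|\pi^- p\rangle} = \frac{T(3/2)}{\tfrac13 T(3/2) + \tfrac23 T(1/2)}.$$

At low energy we may take

$$T(1/2) = 0,$$

and

$$\sigma(\pi^+ p \to \pi^+ p) : \sigma(\pi^- p \to \pi^- p) : \sigma(\pi^- p \to \pi^0 n)$$
$$= |\langle\pi^+ p|T|\pi^+ p\rangle|^2 : |\langle\pi^- p|T|\pi^- p\rangle|^2 : |\langle\pi^0 n|T|\pi^- p\rangle|^2$$
$$= 1 : (1/3)^2 : (\sqrt2/3)^2 = 9 : 1 : 2. \tag{123}$$

In general, for $T(1/2) \neq 0$,

$$\sqrt2\,\langle\pi^0 n|T|\pi^- p\rangle + \langle\pi^- p|T|\pi^- p\rangle = \langle\pi^+ p|T|\pi^+ p\rangle.$$

D. SU(3) and Particle Physics

SU(3) is the group of transformations $\psi'_a = U_{ab}\psi_b$, where $U$ is any unitary, unimodular 3 × 3 matrix, that is, one with determinant $U = 1$. This is the group of the three-dimensional isotropic harmonic oscillator in quantum mechanics (see Schiff 1968). In terms of Lie’s infinitesimal generators $U$ is given by

$$U = \exp\left(i\sum_{k=1}^{8}\varepsilon_k F_k\right),$$

where $F_k \equiv \tfrac12\lambda_k$ are the infinitesimal generators. An explicit matrix form for the $\lambda_k$’s, due to Gell-Mann, in which $\lambda_3$ and $\lambda_8$ are diagonal is

$$\lambda_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda_2 = \begin{pmatrix} 0 & -i & 0 \\ i & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \lambda_4 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$
$$\lambda_5 = \begin{pmatrix} 0 & 0 & -i \\ 0 & 0 & 0 \\ i & 0 & 0 \end{pmatrix}, \quad \lambda_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \lambda_7 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -i \\ 0 & i & 0 \end{pmatrix}, \quad \lambda_8 = \begin{pmatrix} 1/\sqrt3 & 0 & 0 \\ 0 & 1/\sqrt3 & 0 \\ 0 & 0 & -2/\sqrt3 \end{pmatrix}. \tag{124}$$

It is more convenient to express the structure constants in terms of $f_{\alpha\beta\gamma}$ rather than $C^\gamma_{\alpha\beta}$. The nonvanishing $f_{\alpha\beta\gamma}$ are

$$f_{123} = 1, \quad f_{246} = \tfrac12, \quad f_{367} = -\tfrac12,$$
$$f_{147} = \tfrac12, \quad f_{257} = \tfrac12, \quad f_{458} = \sqrt3/2, \tag{125}$$
$$f_{156} = -\tfrac12, \quad f_{345} = \tfrac12, \quad f_{678} = \sqrt3/2.$$

The $f_{\alpha\beta\gamma}$ are odd under permutation of any two indices.

We now define the combinations of generators, using the standard notation $T_\pm$, $T_3$ for isospin instead of $J_\pm$, $J_3$:

$$T_\pm = F_1 \pm iF_2, \quad U_\pm = F_6 \pm iF_7, \quad V_\pm = F_4 \pm iF_5, \quad T_3 = F_3, \quad \text{and} \quad Y = (2/\sqrt3)F_8. \tag{126}$$

The commutation relations satisfied by these operators can easily be derived. For example,

$$[T_3, T_\pm] = [F_3, F_1 \pm iF_2] = [F_3, F_1] \pm i[F_3, F_2] = iF_2 \pm F_1 = \pm T_\pm.$$

Similarly,

$$[Y, T_\pm] = 0 = [T_3, Y], \quad [T_3, U_\pm] = \mp\tfrac12 U_\pm, \quad [Y, U_\pm] = \pm U_\pm, \tag{127}$$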
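The commutation relations in Eqs. (125)–(127) can be confirmed directly from the Gell-Mann matrices. This is a minimal pure-Python sketch, assuming the standard traceless form of $\lambda_8$ with entry $-2/\sqrt3$ and the normalization $\mathrm{Tr}(F_aF_b) = \tfrac12\delta_{ab}$; the helper names are hypothetical.

```python
SQRT3 = 3 ** 0.5
LAMBDA = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[1 / SQRT3, 0, 0], [0, 1 / SQRT3, 0], [0, 0, -2 / SQRT3]],
]


def scale(c, m):
    return [[c * v for v in row] for row in m]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


F = [scale(0.5, lam) for lam in LAMBDA]  # F[0] is F_1, ..., F[7] is F_8
T3, Y = F[2], scale(2 / SQRT3, F[7])
T_PLUS, T_MINUS = add(F[0], scale(1j, F[1])), add(F[0], scale(-1j, F[1]))
U_PLUS, U_MINUS = add(F[5], scale(1j, F[6])), add(F[5], scale(-1j, F[6]))
ZERO = [[0, 0, 0] for _ in range(3)]


def structure_constant(a, b, c):
    """f_abc = -2i Tr([F_a, F_b] F_c), with indices running from 1 to 8."""
    m = matmul(commutator(F[a - 1], F[b - 1]), F[c - 1])
    return (-2j * sum(m[i][i] for i in range(3))).real
```

For instance, `structure_constant(4, 5, 8)` reproduces $f_{458} = \sqrt3/2$, and `commutator(T3, U_PLUS)` equals $-\tfrac12 U_+$ as in Eq. (127).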
$$= \tfrac12(p + 1)(q + 1)(p - q + 2) - \tfrac12 q(q + 1)(p - q + 2) + 3(p - q + 2)\,\tfrac12 q(q + 1) + 6\sum_{k=0}^{q-1}(q - k)k, \tag{130}$$

and since

$$\sum_{k=0}^{q-1}(q - k)k = \tfrac16(q + 1)q(q - 1),$$

we readily get

$$N = \tfrac12(p + 1)(q + 1)(p + q + 2).$$
This result, symmetric under interchange of p and q, is
also valid for p < q.
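The dimension formula and the summation identity used to obtain it are easy to check by direct evaluation. The sketch below is illustrative; the function names are hypothetical.

```python
def su3_dimension(p, q):
    """Dimension N = (p + 1)(q + 1)(p + q + 2) / 2 of the SU(3) representation (p, q)."""
    # The product is always even, so integer division is exact.
    return (p + 1) * (q + 1) * (p + q + 2) // 2


def sum_identity_holds(q):
    """Check sum_{k=0}^{q-1} (q - k) k = (q + 1) q (q - 1) / 6."""
    return sum((q - k) * k for k in range(q)) == (q + 1) * q * (q - 1) // 6
```

The familiar multiplets follow at once: (1, 0) and (0, 1) give the triplets, (1, 1) the octet, and (3, 0) and (0, 3) the decuplets, with the symmetry under interchange of $p$ and $q$ evident in the output.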
FIGURE 9 Results of shift operations on the state ψ(t3, y). [Reprinted with permission from Swamy, N. V. V. J., and Samuel, M. A. (1979). “Group Theory Made Easy for Scientists and Engineers,” Wiley Interscience, New York. Copyright 1979 John Wiley and Sons.]

If one now applies $U_+$ to the state of maximum $t_3$, $\psi_{\max}$, one moves along the boundary in the counterclockwise direction, reaching the next corner, $\psi'$, after $q$ steps. These $q + 1$ states form a U-spin multiplet of
which $\psi_{\max} = |U = \tfrac12 q, u_3 = -\tfrac12 q\rangle$ is the lowest state and $\psi' = |U = \tfrac12 q, u_3 = +\tfrac12 q\rangle$ is the highest state. $\psi'$ carries the maximum value of $y$ that occurs in the representation, $y_{\max}$, since continuing counterclockwise from $\psi'$ the (convex) boundary runs parallel to the $t_3$ axis and then turns downward to smaller values of $y$, but $u_3$, $y$, and $t_3$ are not independent: $\tfrac32 y - t_3 = 2u_3$. Applying this equation to $\psi'$ ($t_3 = \tfrac12 p$, $u_3 = \tfrac12 q$) we find

$$y_{\max} = \tfrac43\cdot\tfrac12 q + \tfrac23\cdot\tfrac12 p = \tfrac13(p + 2q). \tag{131}$$

to $\psi_{\max}$ the value of $t_3$ increases by $\tfrac12$ for each of the $q$ steps.

One makes here the usual association of the three (3) representation with the quarks and the three-star (3*) representation with the antiquarks, $y$ with hypercharge $y = B + S$ ($B$ is the baryon number and $S$ the strangeness), and $t_3$ with the third component of isospin. The formula $y_{\max} = \tfrac13(p + 2q)$ gives $y_{\max} = \tfrac13$ for the three representation, $y_{\max} = \tfrac23$ for the three-star representation. Using $t_{\max} = \tfrac12(p + q)$ one obtains $t_{\max} = \tfrac12$ for both the three and three star. Using the Gell-Mann–Nishijima relation for the charge (in units of the proton charge) $Q = t_3 + \tfrac12 y$ we obtain the usual quantum numbers for the quarks and antiquarks (Table X). These representations

TABLE X Quantum Numbers for the Quarks and the Antiquarks

         B       t      t3       y       S       Q
p       1/3     1/2     1/2     1/3      0      2/3
n       1/3     1/2    −1/2     1/3      0     −1/3
λ       1/3      0       0     −2/3     −1     −1/3
p̄      −1/3     1/2    −1/2    −1/3      0     −2/3
n̄      −1/3     1/2     1/2    −1/3      0      1/3
λ̄      −1/3      0       0      2/3      1      1/3

octet is easily constructed using the aforementioned rules and the values given by our formulas $y_{\max} = 1$ and $t_{\max} = 1$. The decuplets (Fig. 12) are easily constructed after determining that $t_{\max} = \tfrac32$, $y_{\max} = 1$ for the 10 representation and $t_{\max} = \tfrac32$, $y_{\max} = 2$ for the 10* representation. The Gell-Mann–Nishijima relation is used to obtain the charges.

E. Gell-Mann–Okubo Mass Formula

If one assumes that the mass operator is the sum of two terms, one which transforms like a U-spin scalar ($U = 0$) and the other which transforms like $U = 1$ (this is equivalent to the usual “octet enhancement” assumption), one
FIGURE 10 The fundamental triplet representation. [Reprinted with permission from Swamy, N. V. V. J., and Samuel,
M. A. (1979). “Group Theory Made Easy for Scientists and Engineers,” Wiley-Interscience, New York. Copyright 1979
John Wiley and Sons.]
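The values of $y_{\max}$ and $t_{\max}$ quoted for the triplets, octet, and decuplets, and the quark charges of Table X, follow from $y_{\max} = \tfrac13(p + 2q)$, $t_{\max} = \tfrac12(p + q)$, $y = B + S$, and $Q = t_3 + \tfrac12 y$. A minimal sketch with exact fractions, using hypothetical names, is:

```python
from fractions import Fraction as Fr


def y_max(p, q):
    """Maximum hypercharge in the representation (p, q)."""
    return Fr(p + 2 * q, 3)


def t_max(p, q):
    """Maximum isospin in the representation (p, q)."""
    return Fr(p + q, 2)


def charge(t3, baryon_number, strangeness):
    """Gell-Mann-Nishijima relation Q = t3 + y/2 with y = B + S."""
    return t3 + (baryon_number + strangeness) / 2


# (t3, B, S) for the three quarks in the notation of Table X.
QUARKS = {
    "p": (Fr(1, 2), Fr(1, 3), Fr(0)),
    "n": (Fr(-1, 2), Fr(1, 3), Fr(0)),
    "lambda": (Fr(0), Fr(1, 3), Fr(-1)),
}
QUARK_CHARGES = {name: charge(*numbers) for name, numbers in QUARKS.items()}
```

The antiquark entries of the table are obtained by reversing the signs of $t_3$, $B$, and $S$.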
$$M(\Delta^-) = a + \tfrac32 b,$$

which leads to the “equal spacing rule,”

$$M(\Omega^-) - M(\Xi^{*-}) = M(\Xi^{*-}) - M(\Sigma^{*-}) = M(\Sigma^{*-}) - M(\Delta^-).$$

G. Gauge Theories: Applications to Elementary Particle Physics

Group theory plays a very important role in our current understanding of the elementary particles, the basic building blocks of nature, and their fundamental interactions.
FIGURE 12 The decuplet. [Reprinted with permission from Swamy, N. V. V. J., and Samuel, M. A. (1979). “Group
Theory Made Easy for Scientists and Engineers,” Wiley-Interscience, New York. Copyright 1979 John Wiley and Sons.]
There are four fundamental interactions, electromagnetic, weak, strong, and gravitational. It is well established now that these are all gauge interactions based on various gauge groups which are continuous Lie groups. All the elementary particles fall neatly into the various representations of the gauge groups. The allowed interactions between the different particles are completely determined by the gauge symmetry, the mathematical requirement that the action be invariant under the transformation of the various fields under these group transformations. Historically, the pioneering step in this development was taken by H. Weyl who proposed (Weyl, Fock) that the electromagnetic interaction, currently known as quantum electrodynamics (QED), is invariant under a local U(1) gauge symmetry. The experimental consequences of this symmetry have been tested to a very high degree of precision. Then, in 1954, Yang and Mills (Yang, Mills) extended this idea to include nonabelian gauge symmetry, such as SU(2), SU(3), etc. One new feature in this generalization is that the gauge bosons belonging to the adjoint representation of the gauge groups now can have interaction among themselves. [Such an interaction is not present in the U(1) gauge symmetry. The photon cannot interact with itself at the tree level.] One problem in pure gauge theories is that all the gauge particles must be massless. While the photon, the gluons, and the graviton are massless, the gauge particles such as the weak intermediate vector bosons, $W^\pm$, $Z$, are not. Thus, part of the gauge symmetry must be broken so that some of these gauge particles can acquire masses. In 1964, P. Higgs incorporated (Higgs, Englert, Brout, Guralnik, Hagen, Kibble) the idea of spontaneous symmetry breaking with the introduction of additional scalar particles in the theory, now called the Higgs bosons. The self-interaction of these particles (the so-called Higgs potential) is such that the vacuum expectation values of some of these Higgs fields are nonzero and the gauge symmetry is broken spontaneously to a smaller gauge group. As a result, some of the gauge bosons, as well as matter fields, acquire masses. This idea of nonabelian gauge symmetry and spontaneous symmetry breaking was successfully integrated to build a unified theory of weak and electromagnetic interactions (Weinberg, Salam, Glashow). The gauge group is SU(2) × U(1), which is spontaneously broken to $U_{EM}(1)$. Thus the theory has one massless gauge
boson, the photon, and three massive gauge bosons, $W^+$, $W^-$, and $Z$. All three were discovered experimentally (Arnison et al.) in the CERN proton–antiproton collider at their theoretically predicted masses. All the detailed experimental predictions of this theory have been verified to better than 1% accuracy, and not a single deviation has been found. S. L. Glashow, A. Salam, and S. Weinberg were awarded the Nobel Prize in 1979 for proposing this theory, and C. Rubbia and S. Van der Meer were awarded the Nobel Prize in 1984 for the experimental discovery of these theoretically predicted gauge bosons, $W^\pm$ and $Z$.

It is now well established that the protons and neutrons, which are the building blocks of the nuclei, are not elementary particles. They are made of elementary constituents called quarks, bound by interaction through a set of gauge particles called gluons. The gauge symmetry group for this interaction is SU(3), sometimes called SU(3) color. The quarks, antiquarks, and gluons have nonzero color charges under this symmetry, and the gluons can interact with themselves in addition to interacting with the quarks and antiquarks. This theory is called quantum chromodynamics (QCD). This interaction, which is responsible for the binding of three quarks, or a quark and an antiquark, is very strong at low energy, and is responsible for the confinement of the quarks and gluons. At very high energy, the interaction becomes weaker and the theory becomes asymptotically free (Gross, Wilczek, Politzer). Many predictions of the theory at high energies have been tested experimentally to a good degree of accuracy.

TABLE XI Particle Contents of the Standard Model

Particles     SU(3) × SU(2) × U(1) representations
u_Lα          (3, 2, 1/3)
d_Lα          (3, 2, 1/3)
ν_eL          (1, 2, −1)
e_L           (1, 2, −1)
u_Rα          (3, 1, 4/3)
d_Rα          (3, 1, −2/3)
e_R           (1, 1, −2)
g_a           (8, 1, 0)
A_a           (1, 3, 0)     +1, −1
B             (1, 1, 0)
H             (1, 2, 1)

Our current understanding of all three particle interactions, the electromagnetic, weak, and strong, is thus based on the gauge group SU(3) × SU(2) × U(1) called the standard model of particle physics. All the existing elementary particles, the fermions and gauge bosons, fall neatly into various representations of this gauge group as shown in Table XI, where $u_L$ stands for the left-handed up quark and $u_R$ for the right-handed up quark. The index $\alpha$ is the SU(3) color index and takes the values 1, 2, 3. The same is true for the down quark, $d$. Here $u_L$ and $d_L$ are doublets under the weak SU(2), with $u_L$ having $I_3 = +\tfrac12$ and $d_L$ having $I_3 = -\tfrac12$. The electron-type neutrino, $\nu_{eL}$, and the electron, $e_L$, have no color, and are doublets under the weak SU(2). All the right-handed fermions are singlets under SU(2). The last entries in the parentheses represent the values of all the U(1) hypercharges, $Y$, and are related to the usual electric charge by the relation

$$Q = I_3 + \frac{Y}{2}. \tag{140}$$

Here $I_3$ is the third component of the weak isospin [the diagonal generator for SU(2)]. Note that all the right-handed particles are singlets under SU(2) weak, and hence they do not participate in the SU(2) weak interaction. This set of 15 chiral fermions ($u_{L\alpha}$, $d_{L\alpha}$, $\nu_{eL}$, $e_L$, $u_{R\alpha}$, $d_{R\alpha}$, $e_R$) constitutes what is called the first family of fermions. This is called the electron family. There are two other families, the muon family consisting of ($c_{L\alpha}$, $s_{L\alpha}$, $\nu_{\mu L}$, $\mu_L$, $c_{R\alpha}$, $s_{R\alpha}$, $\mu_R$), and the tau family consisting of ($t_{L\alpha}$, $b_{L\alpha}$, $\nu_{\tau L}$, $\tau_L$, $t_{R\alpha}$, $b_{R\alpha}$, $\tau_R$). The families are exact replicas of each other except for the particle masses. The $g_a$ ($a$ = 1–8) represent the eight gluons, and belong to the adjoint representation of the color SU(3). The weak gauge bosons $A_a$ ($a$ = 1–3) belong to the adjoint representation of SU(2), and $B$ is the gauge boson for the U(1) group. The observed gauge bosons $W^\pm$ are linear combinations of $A_1$ and $A_2$, and are given by

$$W^\pm = \frac{1}{\sqrt2}(A_1 \mp iA_2), \tag{141}$$

while the observed photon $\gamma$ and the $Z$ boson are linear combinations of $A_3$ and $B$,

$$\gamma = B\cos\theta_W + A_3\sin\theta_W,$$
$$Z = -B\sin\theta_W + A_3\cos\theta_W. \tag{142}$$

The angle $\theta_W$ is known as the weak mixing angle. All the experimental observations, at both low and high energy scales (up to a few hundred GeV), agree very well with the predictions of the standard model. However, the standard model has several theoretically unsatisfactory features. Since it is based on the semisimple group containing the product of three group factors, it has three independent gauge couplings associated with gauge groups U(1), SU(2), and SU(3), respectively. It will be theoretically much more beautiful as well as predictive if these three couplings can be unified into a simple group. Such a unification group (Georgi, Glashow, Pati, Salam), such as
P1: GLQ/GLT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN007C-305 June 29, 2001 17:59
SU(5), has been proposed. This not only relates the three coupling strengths $g$, $g'$, and $g_3$, it also predicts many interesting new phenomena such as proton decay, nonzero neutrino masses, and matter–antimatter asymmetry.

So far, we have left out the gravitational interaction. This is also a gauge theory, the gauge group being that of general coordinate transformations. Is it possible to unify gravity as well with the three gauge interactions of the standard model? The recent development of a new symmetry, called supersymmetry (Wess, Zumino, Volkov, Akulov), based on graded Lie algebra, makes such unification not only possible, but a requirement. Supersymmetry is a generalization of the 10-parameter Poincaré group to a 14-parameter super-Poincaré group (Wess, Zumino). It has four additional fermionic generators. Making these fermionic parameters local requires the introduction of gravity. An exciting new development in the 1990s, known as string theory (Polchinski) (which includes local supersymmetry), allows the unification of the standard model with gravity. The most interesting prediction of superstring theory is that space–time is 10-dimensional, with the extra six dimensions possibly being compact. Below we discuss the formalism of the standard model, grand unification, and supersymmetry from the group theory point of view.

H. Standard Model

1. Quantum Chromodynamics

The standard model (SM) is based on the local gauge symmetry group SU(3) × SU(2) × U(1). SU(3) corresponds to strong color interactions, while SU(2) × U(1) corresponds to electroweak interactions. Let us discuss SU(3) color interactions first. The representations of the quarks and gluons under the SU(3) color are given in Table XI. The Yang–Mills Lagrangian for the pure gauge sector involving the gluons only, invariant under the local SU(3) gauge transformations, is given by

\[ \mathcal{L}_1 = -\tfrac{1}{4} F_{\mu\nu\alpha} F^{\mu\nu\alpha}, \tag{143} \]

where

\[ F_{\mu\nu\alpha} = \partial_\mu A_{\nu\alpha} - \partial_\nu A_{\mu\alpha} + g_3 C_{\alpha\beta\gamma} A_{\mu\beta} A_{\nu\gamma}. \]

Here $A_{\mu\alpha}$ are the SU(3) gauge fields, which belong to the adjoint representation; $\mu$, $\nu$ are space–time indices; $\alpha$, $\beta$, $\gamma$ are the adjoint SU(3) indices ($\alpha, \beta, \gamma$ = 1–8); $g_3$ is the coupling constant; and the $C_{\alpha\beta\gamma}$ are the structure constants for SU(3) given by

\[ [T_\alpha, T_\beta] = i C_{\alpha\beta\gamma} T_\gamma, \tag{144} \]

where the $T_\alpha$ are the SU(3) generators given by Eq. (124) with $T_\alpha = \tfrac{1}{2}\lambda_\alpha$. The transformation law for the gauge fields $A_{\mu\alpha}$ is given by

\[ A_{\mu\alpha} \longrightarrow A'_{\mu\alpha} = A_{\mu\alpha} - \frac{1}{g_3}\,\partial_\mu \varepsilon_\alpha + C_{\alpha\beta\gamma}\,\varepsilon_\beta A_{\mu\gamma}, \tag{145} \]

where $\varepsilon_\alpha$ are the infinitesimal parameters of SU(3). The Lagrangian describing the interaction of the quarks ($q$) with the gluons is given by

\[ \mathcal{L}_2 = \bar{q}\,(\partial_\mu - i g_3 T_\alpha A_{\mu\alpha})\,q + M \bar{q} q, \tag{146} \]

where $M$ is the mass of the quark. The SU(3) transformation law for the triplet quark fields $q_i$ ($i$ = 1–3) is given by

\[ q_i \longrightarrow \left(e^{i\varepsilon_\alpha T_\alpha}\right)_{ij} q_j, \tag{147} \]

where $\varepsilon_\alpha$ are the local (space–time-dependent) infinitesimal parameters. The sum of the Lagrangians $\mathcal{L}_1 + \mathcal{L}_2$ describes the complete SU(3) color interactions involving the quarks and the gluons and is known as quantum chromodynamics (QCD). This symmetry is exact, so that the gluons are massless. One very interesting feature of this nonabelian local gauge theory is that the coupling constant decreases logarithmically with the energy scale (this will be discussed in detail later, in the renormalization group equation section). Thus, the coupling constant vanishes at infinite energy, so the theory approaches the behavior of a free field theory as the energy scale approaches infinity. This is known as asymptotic freedom. This logarithmic decrease of the coupling parameter with energy has been tested up to a few hundred GeV. On the other hand, as the energy scale decreases, the coupling increases logarithmically so that it becomes very large at low energy and the theory becomes nonperturbative. This is sometimes called infrared slavery, and it has been speculated that this may be responsible for the confinement of the quarks and the gluons.

2. Electroweak Theory

The SU(2) × U(1) part of the SM is known as the electroweak theory (Weinberg, Salam, Glashow), since it describes the weak and EM interactions. The multiplet structure of the quarks, leptons, and the electroweak gauge bosons is given in Table XI. The gauge boson part of the Lagrangian under this symmetry is given by

\[ \mathcal{L}_1 = -\tfrac{1}{4} F_{\mu\nu a} F^{\mu\nu a} - \tfrac{1}{4} B_{\mu\nu} B^{\mu\nu}, \tag{148} \]

where

\[ B_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu, \qquad F_{\mu\nu a} = \partial_\mu A_{\nu a} - \partial_\nu A_{\mu a} + g\,\epsilon_{abc} A_{\mu b} A_{\nu c}. \]
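The $\epsilon_{abc}$ in the SU(2) field strength play the same role as the $C_{\alpha\beta\gamma}$ of Eq. (144): they are the structure constants of the generators. The following is a minimal sketch of a numerical check, assuming the standard Pauli matrices $\tau_a$ with $T_a = \tau_a/2$; the helper names are illustrative and do not come from the text.

```python
# Check that T_a = tau_a / 2 satisfy [T_a, T_b] = i * eps_abc * T_c.
# Assumption: the standard Pauli matrices; all helper names are illustrative.

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
T = [[[x / 2 for x in row] for row in m] for m in PAULI]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def epsilon(a, b, c):
    """Totally antisymmetric symbol on indices 0, 1, 2 with epsilon(0, 1, 2) = +1."""
    return (a - b) * (b - c) * (c - a) / 2


def check_su2_algebra(tol=1e-12):
    """Return True if every commutator matches i * eps_abc * T_c."""
    for a in range(3):
        for b in range(3):
            lhs = commutator(T[a], T[b])
            rhs = [[sum(1j * epsilon(a, b, c) * T[c][i][j] for c in range(3))
                    for j in range(2)] for i in range(2)]
            if any(abs(lhs[i][j] - rhs[i][j]) > tol
                   for i in range(2) for j in range(2)):
                return False
    return True


print(check_su2_algebra())
```

The same loop, run over the eight matrices $\tfrac{1}{2}\lambda_\alpha$ with a table of $C_{\alpha\beta\gamma}$, would test Eq. (144) itself.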
The transformation laws for the gauge fields $B_\mu$ and $A_{\mu a}$ are given by

\[ B_\mu \longrightarrow B'_\mu = B_\mu - \frac{1}{g'}\,\partial_\mu \varepsilon, \]

\[ A_{\mu a} \longrightarrow A'_{\mu a} = A_{\mu a} - \frac{1}{g}\,\partial_\mu \varepsilon_a - \epsilon_{abc}\,\varepsilon_b A_{\mu c}. \]

Although the photon is massless, the gauge bosons $W^\pm$ and $Z$ are massive ($M_W$ = 80 GeV, $M_Z$ = 91 GeV). Thus the SU(2) × U(1) symmetry must be broken to U$_{EM}$(1) at the 100-GeV scale. In analogy with the spontaneous breaking of the rotational symmetry in the case of ferromagnetism, this symmetry is also spontaneously broken by using suitable scalar fields, the so-called Higgs fields. The Higgs field required is an SU(2) weak doublet, as shown in Table XI. The Higgs potential is such that although the Lagrangian is invariant under the symmetry group, the ground state (the vacuum state) is not. The required Higgs Lagrangian is

\[ \mathcal{L}_2 = (D^\mu H)^\dagger D_\mu H - V(H), \tag{149} \]

where

\[ D_\mu = \partial_\mu - i g T_i A_{\mu i} - i\,\frac{g'}{2}\,Y B_\mu \]

is known as the gauge-covariant derivative, and $Y$ is the U(1) hypercharge.

\[ V(H) = -\mu^2 H^\dagger H + \lambda (H^\dagger H)^2 \tag{150} \]

is called the Higgs potential.

Note that for positive values of $\mu^2$ and $\lambda$, $V(H)$ has a minimum for nonzero values of $\langle H \rangle$, where $\langle H \rangle \equiv \langle 0|H|0 \rangle \equiv V$ is the vacuum expectation value of $H$. Thus, the SU(2) × U(1) symmetry is broken spontaneously to U$_{EM}$(1), which is a linear combination of the diagonal part of the SU(2) and the U(1). The charged ($W^\pm$) and the neutral ($Z$) gauge bosons acquire masses

\[ M_W = \tfrac{1}{2}\, g V, \qquad M_Z = \tfrac{1}{2}\sqrt{g^2 + g'^2}\; V, \tag{151} \]

while the photon $A_\mu$, corresponding to the unbroken U$_{EM}$(1) gauge symmetry, remains massless. In terms of the original gauge fields ($A_1$, $A_2$, $A_3$) and $B$, the expressions for the $W^\pm$, $Z$, and $\gamma$ fields are given by Eqs. (141) and (142). The expression for the weak mixing angle $\theta_W$ is given by

\[ \tan\theta_W = \frac{g'}{g}. \tag{152} \]

The Lagrangian for the fermionic part of the gauge interactions is given by

\[ \mathcal{L}_3 = \bar{\psi}_L D_\mu^L \psi_L + \bar{\psi}_R D_\mu^R \psi_R, \tag{153} \]

where

\[ D_\mu^L = \partial_\mu - i g T_i A_{\mu i} - i\,\frac{g'}{2}\,Y_L B_\mu, \qquad D_\mu^R = \partial_\mu - i\,\frac{g'}{2}\,Y_R B_\mu. \]

Here $\psi_L$ represents the left-handed quark or lepton doublets, while $\psi_R$ represents the right-handed quark or lepton singlets under the SU(2) transformations given in Table XI:

\[ \psi_L = q_L = \begin{pmatrix} u \\ d \end{pmatrix}_L, \quad \ell_L = \begin{pmatrix} \nu_e \\ e \end{pmatrix}_L, \qquad \psi_R = u_R,\ d_R,\ e_R. \]

Finally, the Lagrangian for the Yukawa part of the interaction responsible for giving rise to the masses of the fermions is given by

\[ \mathcal{L}_4 = f_d\, \bar{q}_L d_R H + f_u\, \bar{q}_L u_R \tilde{H} + f_e\, \bar{\ell}_L e_R H, \]

where

\[ \tilde{H} \equiv i\tau_2 H^* \quad \text{and} \quad \tau_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \tag{154} \]

When the symmetry is broken spontaneously, the fermions acquire masses

\[ m_u = f_u \frac{V}{\sqrt{2}}, \qquad m_d = f_d \frac{V}{\sqrt{2}}, \qquad m_e = f_e \frac{V}{\sqrt{2}}. \tag{155} \]

Thus, in electroweak gauge theory, the gauge boson masses are predicted by the gauge symmetry, while the fermion masses are parametrized in terms of the unknown Yukawa couplings $f_u$, $f_d$, and $f_e$.

I. Grand Unification

We saw that the gauge theory based on the SM gauge group, SU(3) × SU(2) × U(1), involves three independent coupling constants, $g_3$, $g$, and $g'$ (or $g_3$, $g_2$, $g_1$, where $g_2 \equiv g$, $g_1 = \sqrt{5/3}\, g'$ in a different normalization). The question naturally arises whether the SM gauge group can be embedded into a simple group so that it will involve only one coupling constant. It turns out that there are such unification groups, the simplest being SU(5). (Since the SM gauge group has rank 4, such a unifying symmetry group must have rank at least 4.) The unification of the three SM gauge interactions in a simple gauge group is known as the grand unification theory (GUT).

J. SU(5) Grand Unification

The SU(5) grand unification theory can accommodate the chiral nature of the fermion representation under SU(2) × U(1) as well as the absence of chiral anomalies
without introducing new fermions. One can also calculate the value of the weak mixing angle. The gauge bosons belong to the adjoint representation of SU(5). The 24 gauge bosons have the following decomposition under the SM,

Expressing all fermions as left-handed Weyl fermions, $(3^*, 1, \tfrac{2}{3})$ contains the color-antitriplet, SU(2)-singlet down-type antiquarks $(d_1^c, d_2^c, d_3^c)$, $(1, 2, -1)$ contains the color-singlet SU(2) doublet of the electron-type neutrino and the electron $(\nu_e, e)$, $(3, 2, \tfrac{1}{3})$ contains the color-triplet SU(2) doublet of up and down quarks, $(3^*, 1, -\tfrac{4}{3})$ contains the color-antitriplet, SU(2)-singlet up-type antiquarks, and $(1, 1, 2)$ contains the positron, a singlet under both SU(3) and SU(2). Note that all the known fermions in one family fit neatly, and no extra fermions are needed.

K. Symmetry Breaking in SU(5)

SU(5) symmetry must be broken spontaneously at a very high energy scale ($\sim 10^{16}$ GeV) because the $X$, $Y$ gauge bosons have baryon number-violating interactions, and will cause proton decay. Agreement with the current experimental lifetime of the proton demands $M_{X,Y} \geq 10^{16}$ GeV. There are two stages of symmetry breaking:

\[ SU(5) \xrightarrow{\ \mu \sim 10^{16}\ \mathrm{GeV}\ } SU(3) \times SU(2) \times U(1) \qquad \text{(first stage)} \]

where $\beta_i(\{g\})$ is the beta-function for the coupling $g_i$. At the one-loop level, for the three SM gauge couplings, $\beta_i(\{g_i(\mu)\}) = b_i\, g_i^3(\mu)$ for $g_i \ll 1$. The boundary condition is

\[ g_1 = g_2 = g_3 = g_G \quad \text{at } \mu = M_{GUT}. \tag{157} \]

For the SM,

\[ b_3 = -\frac{1}{16\pi^2}\left(11 - \frac{2}{3} N_f\right), \]

\[ b_2 = -\frac{1}{16\pi^2}\left(\frac{22}{3} - \frac{2}{3} N_f - \frac{1}{6} N_H\right), \tag{158} \]

\[ b_1 = \frac{1}{16\pi^2}\left(\frac{2}{3} N_f + \frac{1}{10} N_H\right), \]

where $N_f$ is the number of quark flavors ($N_f = 6$ for the six quark flavors $u$, $d$, $c$, $s$, $t$, $b$) and $N_H$ is the number of Higgs doublets.
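As an illustration of how the coefficients of Eq. (158) are used, assume the renormalization group equation has the standard form $\mu\, dg_i/d\mu = \beta_i = b_i g_i^3$. With $\alpha_i = g_i^2/4\pi$ this integrates to $\alpha_i^{-1}(\mu) = \alpha_i^{-1}(M_Z) - 8\pi b_i \ln(\mu/M_Z)$, so each inverse coupling runs linearly in $\ln\mu$. The sketch below finds where pairs of these lines cross, using the inverse couplings at $M_Z$ quoted in Eq. (159). Two things are assumptions rather than statements from the text: the function names, and the rescaling of the quoted $\alpha_1^{-1}$ by 3/5, which supposes that the quoted number is in the hypercharge normalization while $b_1$ is in the $g_1 = \sqrt{5/3}\,g'$ normalization.

```python
import math

M_Z = 91.0  # GeV, the Z mass quoted in the text


def sm_coefficients(n_f=6, n_h=1):
    """Return (B1, B2, B3) = 16 pi^2 * (b1, b2, b3) from Eq. (158)."""
    big_b1 = (2.0 / 3.0) * n_f + n_h / 10.0
    big_b2 = -(22.0 / 3.0 - (2.0 / 3.0) * n_f - n_h / 6.0)
    big_b3 = -(11.0 - (2.0 / 3.0) * n_f)
    return big_b1, big_b2, big_b3


def alpha_inv(alpha_inv_mz, big_b, mu):
    """One-loop running: 1/alpha(mu) = 1/alpha(M_Z) - (B / 2 pi) ln(mu / M_Z)."""
    return alpha_inv_mz - big_b / (2.0 * math.pi) * math.log(mu / M_Z)


def crossing_scale(a_i, a_j, big_b_i, big_b_j):
    """Scale mu in GeV at which two one-loop inverse couplings are equal."""
    t = 2.0 * math.pi * (a_i - a_j) / (big_b_i - big_b_j)
    return M_Z * math.exp(t)


# Central values at M_Z from Eq. (159).
# ASSUMPTION: the quoted 98.29 is rescaled by 3/5 to match the normalization of b1.
A1, A2, A3 = 0.6 * 98.29, 29.61, 8.5
B1, B2, B3 = sm_coefficients()

mu_12 = crossing_scale(A1, A2, B1, B2)
mu_23 = crossing_scale(A2, A3, B2, B3)
print(f"alpha_1 = alpha_2 near {mu_12:.1e} GeV")
print(f"alpha_2 = alpha_3 near {mu_23:.1e} GeV")
```

Under these assumptions the two crossings come out several orders of magnitude apart (of order $10^{13}$ GeV and $10^{17}$ GeV), so the three lines do not meet at a single point. Experimental uncertainties and higher-loop terms are ignored in this sketch.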
The experimental values of the three couplings at a low energy scale, such as $\mu = M_Z$, are very well determined from the studies of $Z$-boson decay:

\[ \alpha_3^{-1}(M_Z) = 8.5 \pm 0.5, \qquad \alpha_2^{-1}(M_Z) = 29.61 \pm 0.13, \qquad \alpha_1^{-1}(M_Z) = 98.29 \pm 0.13, \tag{159} \]

where $\alpha_i(\mu) \equiv g_i^2(\mu)/4\pi$. Using these values at $\mu = M_Z$ and evolving the couplings to higher energy scales using Eqs. (156) and (157), we find that the couplings do not unify for this simple minimal SU(5) GUT. Also, the very approximate unification scale is too small, $M_X \sim 10^{13}$ GeV, so the dominant proton decay rate in this model, $p \to e^+ \pi^0$, is too high compared to the experimental limit. Thus, this simple SU(5) GUT is ruled out experimentally. However, for supersymmetric SU(5) GUT, the unification works beautifully and the unification scale is $\sim 10^{16}$ GeV, in complete agreement with the current experimental proton lifetime limits. So, next we turn to supersymmetry, the supersymmetric SM (also known as MSSM), and supersymmetric GUTs.

M. Supersymmetry

Supersymmetry is a beautiful mathematical generalization of the 10-parameter Poincaré group to the 14-parameter super-Poincaré or supersymmetry group. The 10-parameter Poincaré group, with the generators $J_{\mu\nu}$ and $P_\mu$ ($\mu, \nu = 0, 1, 2, 3$), satisfies the algebra

\[ [J_{\mu\nu}, J_{\rho\sigma}] = g_{\mu\rho} J_{\nu\sigma} - g_{\nu\rho} J_{\mu\sigma} - g_{\mu\sigma} J_{\nu\rho} + g_{\nu\sigma} J_{\mu\rho}, \]

\[ [P_\mu, P_\nu] = 0, \qquad [J_{\mu\nu}, P_\rho] = g_{\mu\rho} P_\nu - g_{\nu\rho} P_\mu. \tag{160} \]

Wess and Zumino (1974) discovered that this Poincaré algebra beautifully generalizes to a new symmetry algebra by introducing four new generators, $S_\alpha$. The $S_\alpha$ are fermionic, the index $\alpha$ ($\alpha = 1, 2, 3, 4$) is a spinor index, and the corresponding parameters $\varepsilon_\alpha$ are anticommuting Majorana spinors. The $S_\alpha$ satisfy the following algebra:

\[ [S_\alpha, P_\mu] = 0, \qquad [J_{\mu\nu}, S_\alpha] = \tfrac{1}{2} (\sigma_{\mu\nu})_{\alpha\beta} S_\beta, \qquad \{S_\alpha, S_\beta\} = (\gamma^\mu c)_{\alpha\beta} P_\mu, \tag{161} \]

where $c$ is the Dirac charge conjugation matrix.

In Eq. (161), the curly braces represent anticommutators. The introduction of anticommuting parameters and the associated fermionic generators, together with the bosonic ones, brought a lot of excitement among mathematicians. The usual Lie algebra has now been generalized to include both commutators and anticommutators and is known as graded Lie algebra. Graded Lie algebras have now been classified. The representation of the super-Poincaré algebra, Eqs. (160) and (161), involves both bosons and fermions in the same representation and is called Fermi–Bose symmetry or supersymmetry. In a given representation, we have particles of both spin $j$ and $j - \tfrac{1}{2}$. The simplest irreducible representation is called the chiral scalar superfield, in which we have a complex scalar field and a chiral fermionic field. Next is the vector representation, which has a massless vector field and a Majorana spinor field. Thus, in the SM, each representation will be associated with its superpartner. The particle content of the minimal supersymmetric extension of the SM is given in Table XII. The first column represents the usual SM particles, and the second column their supersymmetric partners. The intrinsic spins of the superpartners differ by half a unit. For example, $\tilde{e}_L$ has spin 0, a scalar particle. Note that an additional new feature is that two Higgs doublets are needed to cancel chiral anomalies.

Since none of the superpartners has so far been observed, supersymmetry must be broken at a scale of a few hundred GeV or higher. Two popular supersymmetry breaking mechanisms are the gravity mediated and the gauge mediated (GMSB). The superpartners are expected to have masses below about 1 TeV for supersymmetry to solve the gauge hierarchy problem, and are expected to be discovered at the Large Hadron Collider (LHC), if not at the upgraded Tevatron. Local supersymmetry, in which the fermionic parameters $\varepsilon_\alpha$ are arbitrary functions of space–time $(x, t)$, necessarily needs the introduction of the gravity supermultiplet containing the spin-2 graviton and its supersymmetric partner, the gravitino.

TABLE XII Particle Contents of the Supersymmetric Standard Model

Particles    Superpartners    SU(3) × SU(2) × U(1) representations
u_Lα         ũ_Lα             (3, 2, 1/3)
d_Lα         d̃_Lα             (3, 2, 1/3)
ν_eL         ν̃_eL             (1, 2, −1)
e_L          ẽ_L              (1, 2, −1)
u_Rα         ũ_Rα             (3, 1, 4/3)
d_Rα         d̃_Rα             (3, 1, −2/3)
e_R          ẽ_R              (1, 1, −2)
g_a          g̃_a              (8, 1, 0)
A_a          Ã_a              (1, 3, 0)
B            B̃                (1, 1, 0)
H_1          H̃_1              (1, 2, 1)
H_2          H̃_2              (1, 2, −1)
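The hypercharges listed in Table XII can be checked against the relation $Q = I_3 + Y/2$ of Eq. (140). The sketch below does this for the first-family fermions, assuming $I_3 = \pm\tfrac{1}{2}$ for the upper and lower members of each SU(2) doublet and $I_3 = 0$ for singlets; the variable and function names are illustrative only.

```python
from fractions import Fraction as F

# (name, I3, Y): Y is the last entry of each representation in Table XII.
# I3 = +1/2 or -1/2 for doublet members and 0 for SU(2) singlets.
FERMIONS = [
    ("u_L", F(1, 2), F(1, 3)),
    ("d_L", F(-1, 2), F(1, 3)),
    ("nu_eL", F(1, 2), F(-1)),
    ("e_L", F(-1, 2), F(-1)),
    ("u_R", F(0), F(4, 3)),
    ("d_R", F(0), F(-2, 3)),
    ("e_R", F(0), F(-2)),
]


def electric_charge(i3, y):
    """Eq. (140): Q = I3 + Y/2, in units of the proton charge."""
    return i3 + y / 2


CHARGES = {name: electric_charge(i3, y) for name, i3, y in FERMIONS}

for name, q in CHARGES.items():
    print(f"{name:6s} Q = {q}")
```

Exact fractions are used so that the left- and right-handed components of each particle can be compared for equality without rounding.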
In the supersymmetric SM, there will be additional contributions to the evolution of the gauge couplings above the supersymmetric thresholds. Thus, above threshold, the beta-function coefficients $b_1$, $b_2$, and $b_3$ are modified to

\[ \tilde{b}_3 = -\frac{1}{16\pi^2}\,(9 - N_f), \]

\[ \tilde{b}_2 = -\frac{1}{16\pi^2}\left(6 - N_f - \frac{1}{2} N_H\right), \tag{162} \]

\[ \tilde{b}_1 = \frac{1}{16\pi^2}\left(N_f + \frac{3}{10} N_H\right). \]

Starting with the experimental values of $\alpha_1(M_Z)$, $\alpha_2(M_Z)$, and $\alpha_3(M_Z)$ given in Eq. (159), and evolving these to higher energy scales, these couplings unify at an energy scale $\mu = M_{GUT} \simeq 2 \times 10^{16}$ GeV. Thus, although in non-SUSY SU(5) the unification does not take place, unification does occur very accurately in SUSY SU(5). Also, because of the very high scale of the unification, the proton decay limits are easily satisfied.

general linear group GL(2n). The requirement of skew symmetry implies that the general element of this group of transformations $S$ should satisfy

\[ S^T K S = K. \tag{167} \]

In geometric optics a single ray incident on a refracting surface is usually defined by one of its points in object space, given by a vector $\mathbf{a}$, and a vector $\mathbf{s}$ which gives the direction of the ray. The length of $\mathbf{s}$ is $n$, the refractive index of the medium in that space.

In image space the corresponding vectors describing the refracted ray are $\mathbf{a}'$ and $\mathbf{s}'$, $n'$ being the appropriate refractive index there. A two-dimensional manifold of such rays is adequately described by the components of the vectors expressed as continuous functions of two variable parameters $u$ and $v$. It has been proven by Herzberger that a fundamental invariant governing image formation is $(\mathbf{a}_u \cdot \mathbf{s}_v) - (\mathbf{a}_v \cdot \mathbf{s}_u)$. In other words,

\[ (\mathbf{a}'_u \cdot \mathbf{s}'_v) - (\mathbf{a}'_v \cdot \mathbf{s}'_u) = (\mathbf{a}_u \cdot \mathbf{s}_v) - (\mathbf{a}_v \cdot \mathbf{s}_u), \tag{168} \]
Eq. (167), and the latter describes the symplectic group Sp(4). Thus, the symplectic group is of fundamental importance in geometric optics.

An interesting application of the techniques based on the symplectic Lie group has been made to charged-particle beam optics, which is of importance in particle accelerator physics. The article in Annual Review of Nuclear and Particle Science cited in the Bibliography (Dragt et al., 1988) gives an illuminating discussion of this application.

X. THE RENORMALIZATION GROUP

The renormalization group (RG) theory has applications in several areas of physics, but we shall confine our treatment here to "critical phenomena". The importance of RG theory in elementary particle physics is discussed elsewhere in this article.

The concept of renormalization has its origins in quantum electrodynamics, which is the theory that explains the properties and behavior of electrons and photons or light quanta. According to quantum field theory, particles are the outcome of quantizing appropriate wave fields. For instance, photons are quanta resulting from quantizing the classical electromagnetic field of Maxwell. A disturbing feature of quantum electrodynamics has been the existence of infinities or divergent integrals. For instance, if we assume the charge $e$ of an electron to be uniformly distributed in a sphere of radius $a$, the electrostatic self-energy of this sphere is $\tfrac{3}{5}(e^2/a)$ according to Maxwell's theory, and this becomes infinite in the limit of $a$ going to zero. In other words, a point electron has infinite self-energy. Many such divergences exist, of which two important cases are the self-energy of the photon and the polarizability of the vacuum induced by an external electric field. This vacuum happens to be an infinite sea of occupied electron states of negative energy, which are the inevitable consequences of Dirac's relativistic electron theory. According to Oppenheimer, the roots of charge renormalization lie in the efforts to overcome these infinities by asserting that the experimentally measured charge of the electron is the sum of the "true" and "induced" charges, and it is this prescription that is the genesis of renormalization of charge. The existence of a similar distinction between "bare electron mass" and "dressed electron mass" has been pointed out by Kramers in his study of the interaction between charged particles and the radiation field.

These concepts took concrete shape ten years later in the epoch-making contributions of Schwinger, Tomonaga, and Feynman, and the term renormalization has been in vogue ever since. According to Yukawa, the forces between particles are mediated by quanta and, in particular, the forces between electrons are mediated by light quanta or photons. A popular picture of the electron can thus be that of a bare particle modestly apparelled in photons. It is important to note that underlying all these ideas is the mass–energy equivalence of Einstein's special relativity theory.

As a preliminary to the introduction of the renormalization group, it is helpful to discuss certain ideas of "critical phenomena", where it has an important application. We shall single out phase transitions in liquids for this purpose. It is common knowledge that under appropriate conditions of temperature and pressure, water exists in the three phases of solid (ice), liquid (water), and gas (steam). In a three-dimensional plot of pressure, density, and temperature, the domains of these phases and their boundaries are clearly demarcated. Across the boundary separating gas from liquid, for instance, there is a discontinuity in density. In his experiments on the liquefaction of carbon dioxide gas by isothermal compression, Andrews noticed in 1869 the existence of a critical temperature $T_c$, above which there is a continuity of the gas and liquid phases. In other words, it is only below the critical temperature that condensation of a gas can take place. The pressure at this point is called the critical pressure $P_c$, and likewise the density $\rho_c$. As the temperature is increased to $T_c$, the density of the gas phase, $\rho_g$, tends to equal the density of the liquid phase, $\rho_l$, and the two become one, $\rho_c$, at the critical temperature $T_c$. For water, for instance, $T_c$ is 648, $P_c$ is 218, and $\rho_c$ is 0.25 in appropriate units.

A classical theoretical foundation for the behavior of a gas at equilibrium temperature $T$ is the equation of state relating the thermodynamic variables pressure $p$, volume $V$ (or density $\rho$), and temperature $T$. Taking into account possible intermolecular forces and the finite size of molecules, van der Waals derived in 1873 the equation of state

\[ \left(p + \frac{a}{V^2}\right)(V - b) = NkT, \tag{171} \]

where $a$ and $b$ are constants pertaining to the gas, $k$ is the universal Boltzmann constant, and $N$ is the Avogadro number. This equation does predict the existence of critical constants according to the following relations:

\[ \frac{\partial p}{\partial V} = 0, \qquad \frac{\partial^2 p}{\partial V^2} = 0, \qquad p_c = \frac{a}{27b^2}, \quad V_c = 3b, \quad T_c = \frac{8a}{27bNk}. \tag{172} \]

In terms of the reduced variables $\pi = p/p_c$, $\nu = V/V_c$, $t = T/T_c$, the van der Waals equation can be written in the so-called universal form applicable to any gas:

\[ \left(\pi + \frac{3}{\nu^2}\right)\left(\nu - \frac{1}{3}\right) = \frac{8}{3}\, t. \tag{173} \]
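The critical constants of Eq. (172) and the reduced form of Eq. (173) can be checked numerically. The sketch below is illustrative only: the function names, the sample values of $a$, $b$, and $Nk$, and the finite-difference step are assumptions, not values from the text. It confirms that both derivatives of $p(V)$ vanish at $(V_c, T_c)$ and that an arbitrary state of the gas satisfies the reduced equation.

```python
def pressure(v, t, a, b, nk):
    """Van der Waals pressure from Eq. (171): p = NkT/(V - b) - a/V^2."""
    return nk * t / (v - b) - a / v**2


def critical_point(a, b, nk):
    """Critical constants (p_c, V_c, T_c) from Eq. (172)."""
    return a / (27 * b**2), 3 * b, 8 * a / (27 * b * nk)


def reduced_residual(pi_r, nu, t_r):
    """Left side minus right side of the reduced equation, Eq. (173)."""
    return (pi_r + 3 / nu**2) * (nu - 1 / 3) - 8 * t_r / 3


def check(a, b, nk, v, t, h=1e-4):
    """Return (p(V_c, T_c) - p_c, dp/dV, d2p/dV2, residual of Eq. (173))."""
    p_c, v_c, t_c = critical_point(a, b, nk)
    # Central finite differences on the critical isotherm at V = V_c.
    p_plus = pressure(v_c + h, t_c, a, b, nk)
    p_zero = pressure(v_c, t_c, a, b, nk)
    p_minus = pressure(v_c - h, t_c, a, b, nk)
    dp = (p_plus - p_minus) / (2 * h)
    d2p = (p_plus - 2 * p_zero + p_minus) / h**2
    # The state (v, t) rewritten in reduced variables.
    residual = reduced_residual(pressure(v, t, a, b, nk) / p_c, v / v_c, t / t_c)
    return p_zero - p_c, dp, d2p, residual


# Hypothetical gas constants, chosen only for illustration.
print(check(a=2.0, b=0.5, nk=1.0, v=2.0, t=1.5))
```

All four returned numbers are zero up to discretization and rounding error, whatever values of $a$ and $b$ are supplied, so the reduced equation contains no constants specific to the gas.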
This is known as the law of corresponding states. The existence of such a universal relation, though not the exact quadratic equation, has been demonstrated by Guggenheim, who plotted the reduced density ($\rho/\rho_c$) as a function of the reduced temperature ($T/T_c$) for eight different fluids on which experimental measurements have been made. This has given rise to the suggestion that the order parameter $\rho_l - \rho_g$ satisfies a power law in the reduced temperature,

\[ \rho_l - \rho_g \sim |\tau|^\beta, \qquad \tau = \frac{T}{T_c} - 1 = t - 1, \tag{174} \]

A second transformation involving a parameter $s_j$ takes $x_k$ to $x_l$. A succession of these two transformations then moves $x_j$ to $x_l$.

\[ \{f(-s_j, x_k)\,x_k\}\{f(-s_i, x_j)\,x_j\} = x_l. \tag{179} \]

However, there must also exist a transformation, say with parameter $s_k$, which takes $x_j$ directly to $x_l$.

\[ f(-s_k, x_j)\,x_j = x_l. \tag{180} \]

In other words, we have the group multiplication

\[ \{f(-s_j, x_k)\,x_k\}\{f(-s_i, x_j)\,x_j\} = f(-s_k, x_j)\,x_j = x_l. \]
collider,” Phys. Lett. 126B, 398 (1983). “Further evidence for charged intermediate vector bosons at the SPS collider,” Phys. Lett. 129B, 273 (1984). “Observation of the muonic decay of the charged intermediate vector boson,” Phys. Lett. 134B, 469 (1984). “Observation of muonic Z⁰ decay at the p̄p collider,” Phys. Lett. 147B, 241 (1984).
Campbell, J. E. (1966). “Introductory Treatise on Lie’s Theory of Finite Continuous Transformation Groups,” Chelsea, New York.
Cheng, T. P., and Li, L. F. (1984). “Gauge Theory of Elementary Particle Physics,” Oxford University (Clarendon) Press, London/New York.
Cotton, F. A. (1970). “Chemical Applications of Group Theory,” Wiley (Interscience), New York.
Domb, C. (1996). “The Critical Point,” Taylor & Francis, London (all other references cited in the text can be found in this book).
Dragt, A. J., et al. (1988). “Annual Review of Nuclear & Particle Science,” p. 455, Annual Reviews, Palo Alto, CA.
Englert, F., and Brout, R. (1964). “Broken symmetry and the mass of gauge vector mesons,” Phys. Rev. Lett. 13, 321.
Fock, V. (1927). “Über die invariante Form der Wellen- und der Bewegungsgleichungen für einen geladenen Massenpunkt,” Z. Physik 39, 226.
Gasiorowicz, S. (1966). “Elementary Particle Physics,” Wiley, New York.
Gell-Mann, M., and Neeman, Y. (1964). “The Eight-Fold Way,” Benjamin, New York.
Georgi, H. (1982). “Lie Algebras in Particle Physics,” Frontiers in Physics, Benjamin/Cummings, New York.
Georgi, H., and Glashow, S. L. (1974). “Unity of all elementary particle forces,” Phys. Rev. Lett. 32, 438.
Glashow, S. L. (1961). “Partial-symmetries of weak interactions,” Nuclear Phys. 22, 579.
Gross, D. J., and Wilczek, F. (1973). “Ultraviolet behavior of non-Abelian gauge theories,” Phys. Rev. Lett. 30, 1343.
Guralnik, G. S., Hagen, C. R., and Kibble, T. W. (1964). “Global conservation laws and massless particles,” Phys. Rev. Lett. 13, 585.
Halzen, F., and Martin, A. D. (1984). “Quarks and Leptons,” Wiley, New York.
Hamermesh, M. (1959). “Group Theory,” Addison-Wesley, Reading, MA.
Herzberg, G. (1959). “Infrared and Raman Spectra,” Van Nostrand, Princeton, NJ.
Herzberger, M. (1958). “Modern Geometrical Optics,” Interscience, New York.
Higgs, P. W. (1964). “Broken Symmetries and the Masses of Gauge Bosons,” Phys. Rev. Lett. 13, 508.
Kibble, T. W. B. (1967). “Symmetry breaking in non-Abelian gauge theories,” Phys. Rev. 155, 1554.
Kokkedee, J. J. J. (1969). “The Quark Model,” Benjamin, New York.
Lichtenberg, D. B. (1970). “Unitary Symmetry and Elementary Particles,” Academic Press, New York.
Lie, S. (1893). “Theorie der Transformationsgruppen,” F. Engel, Leipzig.
Lipkin, H. J. (1965). “Lie Groups for Pedestrians,” North-Holland, Amsterdam.
Luneburg, R. K. (1964). “Mathematical Theory of Optics,” University of California Press, Los Angeles.
Miller, A. (1994). “Early Quantum Electrodynamics,” Cambridge University Press, Cambridge, U.K.
Pati, J. C., and Salam, A. (1973). “Is baryon number conserved?” Phys. Rev. Lett. 31, 661.
Polchinski, J. (1998). “String Theory,” Vols. I and II, Cambridge University Press, Cambridge, U.K.
Politzer, H. D. (1973). “Reliable perturbative results for strong interactions,” Phys. Rev. Lett. 30, 1346.
Salam, A. (1968). “Elementary Particle Physics,” p. 367, Almquist and Wiksells, Stockholm, Sweden.
Schiff, L. I. (1968). “Quantum Mechanics,” McGraw-Hill, New York.
Schwinger, J. (ed.). (1958). “Quantum Electrodynamics,” Dover, New York. The original papers can be found in the books of Miller and Schwinger.
Segré, E. (1977). “Nuclei and Particles,” Benjamin, New York.
Slater, J. C. (1972). “Symmetry and Energy Bands in Crystals,” Dover, New York.
Stavroudis, O. N. (1972). “The Optics of Rays, Wavefronts and Caustics,” Academic Press, New York.
Swamy, N. V. V. J., and Samuel, M. A. (1979). “Group Theory Made Easy for Scientists & Engineers,” Wiley-Interscience, New York.
’t Hooft, G. (1992). “Conference on Lagrangian Field Theory,” Marseille.
Tinkham, M. (1975). “Group Theory and Quantum Mechanics,” McGraw-Hill, New York.
Volkov, D. V., and Akulov, V. P. (1972). “Universal neutrino interaction,” JETP Lett. 16, 438.
Weinberg, S. (1967). “A model of leptons,” Phys. Rev. Lett. 19, 1264.
Wess, J., and Zumino, B. (1974). “Supergauge transformation in four dimensions,” Nuclear Phys. B70, 39.
Weyl, H. (1929). “Elektron und Gravitation,” Z. Physik 56, 330.
Wigner, E. P. (1959). “Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra,” Academic Press, New York.
Wilson, K. G. (1971). “Renormalization group and strong interactions,” Phys. Rev. D3, 1818.
Yang, C. N., and Mills, R. L. (1954). “Conservation of isotopic spin and isotopic gauge invariance,” Phys. Rev. 96, 191.
Yeomans, J. M. (1992). “Statistical Mechanics of Phase Transitions,” Oxford University Press, New York.
P1: GLQ/GJP P2: FJU Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN007-344 June 30, 2001 18:2
Integral Equations
Ram P. Kanwal
Pennsylvania State University
equation in terms of the boundary values of the function.
Separable kernel Kernel $K(x, y) = \sum_{i=1}^{n} a_i(x) b_i(y)$; also called degenerate.
Singular integral equation Equation in which either the kernel in the integral equation is singular or one or both of the limits of integration are infinite.

INTEGRAL EQUATIONS are equations in which the unknown function appears under the integral sign. They arise in the quest for the integral representation formulas for the solution of a differential operator so as to include the boundary and initial conditions. They also arise naturally in describing phenomena by models which require summation over space and time. Among the integral equations which have received the most attention are the Fredholm- and Volterra-type equations. In the study of singular integral equations, the prominent ones are the Abel, Cauchy, and Carleman type.

I. DEFINITIONS, CLASSIFICATION, AND NOTATION

An integral equation is a functional equation in which the unknown variable $g(x)$ appears under the integral sign. A general example of an integral equation is

\[ \phi(x) g(x) = f(x) + \lambda \int_a^b F\{x, y; g(y)\}\, dy, \qquad a \le x \le b, \tag{1} \]

where $\phi(x)$, $f(x)$, and $F\{x, y, g(y)\}$ are known functions and $g(x)$ is to be evaluated. The quantity $\lambda$ is a complex parameter. When $F\{x, y, g(y)\} = K(x, y) g(y)$, Eq. (1) becomes a linear integral equation:

\[ \phi(x) g(x) = f(x) + \lambda \int_a^b K(x, y) g(y)\, dy, \qquad a \le x \le b, \tag{2} \]

where $K(x, y)$ is called a kernel. Four special cases of Eq. (2) are extensively studied. In the Fredholm integral equation of the first kind $\phi(x) = 0$, and in his equation of the second kind $\phi(x) = 1$; in both cases $a$ and $b$ are constants. The Volterra integral equations of the first and second kinds are like the corresponding Fredholm integral equations except that now $b = x$. If $f(x) = 0$ in either case, the equation is called homogeneous.

A nonlinear integral equation may occur in the form (1), or the function $F\{x, y, g(y)\}$ may have the form $K(x, y) F(y, g(y))$, where $F(y, g(y))$ is nonlinear in $g(y)$.

When one or both limits of integration become infinite or when the kernel becomes infinite at one or more points within the range of integration, the integral equation is called singular.

We shall mainly deal with functions which are either continuous or integrable or square integrable. A function $g(x)$ is square integrable if $\int_a^b |g(x)|^2\, dx < \infty$, and is called an $L_2$ function. The kernel $K(x, y)$ is an $L_2$ function if

\[ \int_a^b |K(x, y)|^2\, dx < \infty, \qquad \int_a^b |K(x, y)|^2\, dy < \infty, \qquad \int_a^b\!\!\int_a^b |K(x, y)|^2\, dx\, dy < \infty. \tag{3} \]

We shall use the inner product (or scalar product) notation

\[ \langle \phi, \psi \rangle = \int_a^b \phi(x) \bar{\psi}(x)\, dx, \tag{4} \]

where the bar indicates complex conjugate. The functions $\phi$ and $\psi$ are orthogonal if $\langle \phi, \psi \rangle = 0$. The norm of the function $\phi$ is $\|\phi\| = \langle \phi, \phi \rangle^{1/2}$. If $\|\phi\| = 1$, then $\phi$ is called normalized. In terms of this norm the famous Cauchy–Schwarz inequality can be written as

\[ |\langle \phi, \psi \rangle| \le \|\phi\|\, \|\psi\| \tag{5} \]

while the Minkowski inequality is

\[ \|\phi + \psi\| \le \|\phi\| + \|\psi\|. \tag{6} \]

NOTATION. We shall sometimes write the right-hand side of Eq. (2) as $f + \lambda K g$ and call $K$ the Fredholm operator. Furthermore, for Fredholm integral equations it will be assumed that the range of integration is $a$ to $b$ unless the contrary is stated. The limits $a$ and $b$ will be omitted.

II. THE METHOD OF SUCCESSIVE APPROXIMATIONS

Our aim is to solve the inhomogeneous Fredholm integral equation

\[ g(x) = f(x) + \lambda \int K(x, y) g(y)\, dy, \tag{7} \]

where we assume that $f(x)$ and $K(x, y)$ are in the space $L_2[a, b]$, by Picard's method of successive approximations. The method is based on choosing the first approximation as $g_0(x) = f(x)$. This is substituted into Eq. (7) under the integral sign to obtain the second approximation and the process is then repeated. This results in the sequence
P1: GLQ/GJP P2: FJU Final Pages
Encyclopedia of Physical Science and Technology EN007-344 June 30, 2001 18:2
\[ \begin{aligned} g_0(x) &= f(x) \\ g_1(x) &= f(x) + \lambda \int K(x, y) g_0(y)\, dy \\ &\;\;\vdots \\ g_m(x) &= f(x) + \lambda \int K(x, y) g_{m-1}(y)\, dy. \end{aligned} \tag{8} \]
The analysis is facilitated when we utilize the iterated kernels defined as
\[ \begin{aligned} K_1(x, y) &= K(x, y) \\ K_2(x, y) &= \int K(x, s) K_1(s, y)\, ds \\ &\;\;\vdots \\ K_m(x, y) &= \int K(x, s) K_{m-1}(s, y)\, ds. \end{aligned} \tag{9} \]
It can be proved that $K_{m+n}(x, y) = \int K_m(x, s) K_n(s, y)\, ds$. Thereby, we can express the $m$th approximation in (8) as
\[ g_m(x) = f(x) + \lambda \sum_{n=1}^{m} \lambda^{n-1} \int K_n(x, y) f(y)\, dy. \tag{10} \]
When we let $m \to \infty$, we obtain formally the so-called Neumann series
\[ g(x) = \lim_{m \to \infty} g_m(x) = f(x) + \sum_{m=1}^{\infty} \lambda^m \int K_m(x, y) f(y)\, dy. \tag{11} \]
In order to examine the convergence of this series, we apply the Cauchy–Schwarz inequality (5) to the general term $\int K_m(x, y) f(y)\, dy$ and get
\[ \left| \int K_m(x, y) f(y)\, dy \right|^2 \le \int |K_m(x, y)|^2\, dy \int |f(y)|^2\, dy. \tag{12} \]
Let us denote the norm $\|f\|$ as $D$ and the upper bound of the integral $\int |K_m(x, y)|^2\, dy$ as $C_m^2$, so that relation (12) becomes
\[ \left| \int K_m(x, y) f(y)\, dy \right|^2 \le C_m^2 D^2. \tag{13} \]
The bound $C_m^2$ is next related to $C_1^2$ by applying the Cauchy–Schwarz inequality to the recurrence (9), with $B^2 = \iint |K(x, y)|^2\, dx\, dy$. Continuing this process we get the relation $C_m^2 \le B^{2m-2} C_1^2$. Substituting it in (13) we arrive at the inequality
\[ \left| \int K_m(x, y) f(y)\, dy \right|^2 \le C_1^2 D^2 B^{2m-2}. \tag{14} \]
This means that the infinite series (11) converges faster than the geometric series with common ratio $|\lambda| B$. Thus, if $|\lambda| B < 1$, the Neumann series converges uniformly and absolutely. Fortunately, the condition $|\lambda| B < 1$ also assures us that solution (11) is unique as can be easily proved.
In view of the uniform convergence of series (11) we can change the order of integration and summation in it and write it as
\[ g(x) = f(x) + \lambda \int \Gamma(x, y; \lambda) f(y)\, dy, \tag{15} \]
where $\Gamma(x, y; \lambda) = \sum_{m=1}^{\infty} \lambda^{m-1} K_m(x, y)$ is called the resolvent kernel. This series is also convergent at least for $|\lambda| B < 1$. Indeed, the resolvent kernel is an analytic function of $\lambda$, regular at least inside the circle $|\lambda| B < 1$. From the uniqueness of the solution it can be proved that the resolvent kernel is unique.
A few remarks are in order:

1. We can start with any other suitable function for the first approximation $g_0(x)$.
2. The Neumann series, in general, cannot be summed in closed form.
3. The solution of Eq. (7) may exist even if $|\lambda| B > 1$.

The same iterative scheme is applicable to the Volterra integral equation of the second kind:
\[ g(x) = f(x) + \lambda \int_a^x K(x, y) g(y)\, dy. \tag{16} \]
In this case the formulas corresponding to (11) and (15) are
\[ g(x) = f(x) + \sum_{m=1}^{\infty} \lambda^m \int_a^x K_m(x, y) f(y)\, dy \tag{17} \]
and
\[ g(x) = f(x) + \lambda \int_a^x \Gamma(x, y; \lambda) f(y)\, dy. \tag{18} \]
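The successive approximations (8) can be sketched numerically. The following is a minimal illustration, not part of the original treatment: it assumes the integral is replaced by an n-point Gauss–Legendre quadrature on [a, b], and the names `neumann_solve`, `kernel`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np

def neumann_solve(kernel, f, lam, a=0.0, b=1.0, n=32, tol=1e-12, max_iter=200):
    """Iterate g_m = f + lam * int K g_{m-1} dy on a quadrature grid.

    Assumption: the integral is approximated by Gauss-Legendre quadrature,
    so the result is the solution of the discretized equation only.
    """
    # Gauss-Legendre nodes/weights on [-1, 1], mapped to [a, b]
    t, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    w = 0.5 * (b - a) * w
    K = kernel(x[:, None], x[None, :])      # K[i, j] = K(x_i, y_j)
    fx = f(x)
    g = fx.copy()                           # first approximation g_0 = f
    for m in range(1, max_iter + 1):
        g_new = fx + lam * (K * w) @ g      # one step of the sequence (8)
        if np.max(np.abs(g_new - g)) < tol:
            return x, g_new, m
        g = g_new
    raise RuntimeError("no convergence; |lam| * B may not be below 1")

# Example: K(x, y) = x*y on [0, 1], f(x) = x, lam = 1, so |lam| * B = 1/3
x, g, iterations = neumann_solve(lambda x, y: x * y, lambda x: x, lam=1.0)
```

For this kernel the equation can be solved by hand: g(x) = cx with c = 1 + c/3, so g(x) = 3x/2, and the iterates approach that value geometrically with ratio 1/3, in line with the condition |λ|B < 1.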
III. THE FREDHOLM ALTERNATIVE

Let us consider the inhomogeneous Fredholm integral equation of the second kind
\[ g(x) = f(x) + \lambda \int K(x, y) g(y)\, dy, \tag{19} \]
when the kernel is degenerate (separable) (i.e., $K(x, y) = \sum_{k=1}^{n} a_k(x) b_k(y)$, where $a_k(x)$ and $b_k(y)$, $k = 1, \ldots, n$, are linearly independent functions). Thus, Eq. (19) becomes
\[ g(x) = f(x) + \lambda \sum_{k=1}^{n} a_k(x) \int b_k(y) g(y)\, dy, \tag{20} \]
where we have exchanged summation with integration. It emerges that the technique of solving Eq. (20) depends on the choice of the complex parameter $\lambda$ and on the constants $c_k$ defined as
\[ c_k = \int b_k(y) g(y)\, dy, \tag{21} \]
which are unknown because $g(y)$ is so. Thereby, Eq. (20) takes the algebraic form
\[ g(x) = f(x) + \lambda \sum_{k=1}^{n} c_k a_k(x). \tag{22} \]
Next, we multiply both sides of (22) by $b_i(x)$ and integrate from $a$ to $b$ so that we have a set of linear algebraic equations
\[ c_i = f_i + \lambda \sum_{k=1}^{n} a_{ik} c_k, \qquad i = 1, 2, \ldots, n, \tag{23} \]
where
\[ f_i = \int b_i(x) f(x)\, dx, \qquad a_{ik} = \int b_i(x) a_k(x)\, dx. \tag{24} \]
Let us write the algebraic system (23) in the matrix form
\[ (I - \lambda A) c = f, \tag{25} \]
where $I$ is the identity matrix of order $n$, $A$ is the matrix $a_{ik}$, while $c$ and $f$ are column matrices.
The determinant $D(\lambda)$ of the algebraic system (23) is
\[ D(\lambda) = \begin{vmatrix} 1 - \lambda a_{11} & -\lambda a_{12} & \cdots & -\lambda a_{1n} \\ -\lambda a_{21} & 1 - \lambda a_{22} & \cdots & -\lambda a_{2n} \\ \vdots & & & \vdots \\ -\lambda a_{n1} & -\lambda a_{n2} & \cdots & 1 - \lambda a_{nn} \end{vmatrix}, \tag{26} \]
which is a polynomial of degree at most $n$ in $\lambda$. Note that $D(\lambda)$ is not identically zero because when $\lambda = 0$, it reduces to unity. Accordingly, for all values of $\lambda$ for which $D(\lambda) \ne 0$, the algebraic system (25) and thereby integral equation (19) has a unique solution. On the other hand, for all values of $\lambda$ for which $D(\lambda) = 0$, algebraic system (25), and with it integral equation (19), is either insoluble or has an infinite number of solutions. We discuss both these cases.
The Case $D(\lambda) \ne 0$. In this case the algebraic system (25) has only one solution given by Cramer's rule
\[ c_k = \frac{D_{1k} f_1 + \cdots + D_{hk} f_h + \cdots + D_{nk} f_n}{D(\lambda)}, \qquad k = 1, 2, \ldots, n, \tag{27} \]
where $D_{hk}$ denotes the cofactor of the $(h, k)$th element of the determinant (26). When we substitute (27) in (22) we obtain the unique solution
\[ g(x) = f(x) + \lambda \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{D_{jk} f_j a_k(x)}{D(\lambda)} = f(x) + \frac{\lambda}{D(\lambda)} \int \left\{ \sum_{j=1}^{n} \sum_{k=1}^{n} D_{jk} b_j(y) a_k(x) \right\} f(y)\, dy, \tag{28} \]
where we have used relation (24). This expression can be put in an elegant form if we introduce the determinant
\[ D(x, y; \lambda) = \begin{vmatrix} 0 & -a_1(x) & -a_2(x) & \cdots & -a_n(x) \\ b_1(y) & 1 - \lambda a_{11} & -\lambda a_{12} & \cdots & -\lambda a_{1n} \\ b_2(y) & -\lambda a_{21} & 1 - \lambda a_{22} & \cdots & -\lambda a_{2n} \\ \vdots & & & & \vdots \\ b_n(y) & -\lambda a_{n1} & -\lambda a_{n2} & \cdots & 1 - \lambda a_{nn} \end{vmatrix}, \tag{29} \]
which is called the Fredholm minor. Then Eq. (28) takes the form
\[ g(x) = f(x) + \lambda \int \Gamma(x, y; \lambda) f(y)\, dy, \tag{30a} \]
where the resolvent kernel $\Gamma$ is the ratio of two determinants, that is,
\[ \Gamma(x, y; \lambda) = \frac{D(x, y; \lambda)}{D(\lambda)} = \frac{1}{D(\lambda)} \sum_{j=1}^{n} \sum_{k=1}^{n} D_{jk} b_j(y) a_k(x). \tag{30b} \]
It is clear from the above analysis that if we start with the homogeneous integral equation,
\[ g(x) = \lambda \int K(x, y) g(y)\, dy, \tag{31} \]
we shall obtain the homogeneous algebraic system
\[ (I - \lambda A) c = 0. \tag{32} \]
When $D(\lambda) \ne 0$, this algebraic system and the homogeneous integral equation (31) have only the trivial solutions.
The Case $D(\lambda) = 0$. In this case the homogeneous algebraic system (32), and hence the homogeneous integral equation (31), has nontrivial solutions, while the inhomogeneous problem has either no solution or infinitely many solutions. To examine these possibilities it is necessary to discuss the subject of eigenvalues and eigenfunctions of the homogeneous problem (31). Strictly speaking we should write it as $\int K(x, y) g(y)\, dy = \omega g(x)$ for $\omega$ to be an eigenvalue, but in the theory of integral equations it has become customary to call the parameter $\lambda \ne 0$, for which the homogeneous equation (31) has a nontrivial solution, its eigenvalue. The corresponding solution $g(x)$ is called the eigenfunction of the operator $K$. From the above analysis it follows that the eigenvalues of (31) are the solutions of the polynomial equation $|I - \lambda A| = 0$. There may exist more than one eigenfunction corresponding to a specific eigenvalue. Let us denote the $r$ such eigenfunctions as $g_{i1}, g_{i2}, \ldots, g_{ir}$ corresponding to the eigenvalue $\lambda_i$. The number $r$ is called the index of the eigenvalue $\lambda_i$ (it is also called the geometric multiplicity of $\lambda_i$ while the algebraic multiplicity $m$ means that $D(\lambda) = 0$ has $m$ equal roots). We know from linear algebra that if $p$ is the rank of the determinant $D(\lambda_i) = |I - \lambda_i A|$, then $r = n - p$. If $r = 1$, $\lambda_i$ is called a simple eigenvalue. Let us assume that the eigenfunctions $g_{i1}, \ldots, g_{ir}$ have been normalized (i.e., $\|g_{ij}\| = 1$ for $j = 1, \ldots, r$). Then to each eigenvalue $\lambda_i$ of index $r = n - p$, there corresponds a solution $g_i(x)$ of the homogeneous integral equation (31) of the form $g_i(x) = \sum_{k=1}^{r} \alpha_k g_{ik}(x)$, where $\alpha_k$ are arbitrary constants.
For studying the case when the inhomogeneous integral equation (19) has a solution even when $D(\lambda) = 0$, we need the integral equation
\[ \psi(x) = f(x) + \lambda \int K(y, x) \psi(y)\, dy, \tag{33} \]
which is called the transpose (or adjoint) of Eq. (19) which is then the transpose of Eq. (33). For the separable kernel $K(x, y)$ as considered in this section, the transpose kernel is $K(y, x) = \sum_{k=1}^{n} a_k(y) b_k(x)$. When we follow the same steps that we followed for integral equation (19) we find that the transposed integral equation (33) leads to the algebraic system $(I - \lambda \bar{A}^T) c = f$, where $A^T$ stands for the transpose of $A$ while $c_k$ and $f_k$ are now defined as
\[ c_k = \int a_k(y) \psi(y)\, dy \qquad \text{and} \qquad f_k = \int a_k(y) f(y)\, dy, \tag{34} \]
respectively. Clearly, the determinant of this algebraic system is $D(\bar{\lambda})$. Accordingly, the transposed integral equation (33) also possesses a unique solution whenever (19) does. Also, the eigenvalues of the homogeneous part of Eq. (33), that is,
\[ \psi(x) = \lambda \int K(y, x) \psi(y)\, dy, \tag{35} \]
are the complex conjugates of those for (31). The eigenvectors of the homogeneous system $(I - \lambda \bar{A}^T) c = 0$ are, in general, different from the corresponding eigenvectors of system (32). The same applies to the eigenfunctions of the transposed integral equation (35). Because the geometric multiplicity $r$ of $\lambda_i$ for (31) is the same as that of $\bar{\lambda}_i$ for (35), the linearly independent eigenfunctions of the transposed equation (35) corresponding to $\bar{\lambda}_i$ are also $r$ in number, say, $\psi_{i1}, \psi_{i2}, \ldots, \psi_{ir}$, which we assume to be normalized. Accordingly, any solution $\psi_i(x)$ of (35) corresponding to the eigenvalue $\bar{\lambda}_i$ is of the form $\psi_i(x) = \sum_{k=1}^{r} \beta_k \psi_{ik}(x)$, where $\beta_k$ are arbitrary constants. Incidentally, it can be easily proved that the eigenfunctions $g(x)$ and $\psi(x)$ corresponding to eigenvalues $\lambda_1$ and $\bar{\lambda}_2$ ($\lambda_1 \ne \lambda_2$) of the homogeneous integral equation (31) and its transpose (35), respectively, are orthogonal.
This analysis is sufficient for us to prove that the necessary and sufficient condition for Eq. (19) to have a solution for $\lambda = \lambda_i$, a root of $D(\lambda) = 0$, is that $f(x)$ be orthogonal to the $r$ eigenfunctions $\psi_{ij}$, $j = 1, \ldots, r$, of the transposed equation (35). The necessary part follows from the fact that if Eq. (19) for $\lambda = \lambda_i$ admits a certain solution $g(x)$, then
\[ \begin{aligned} \int f(x) \psi_{ij}(x)\, dx &= \int g(x) \psi_{ij}(x)\, dx - \lambda_i \int \psi_{ij}(x)\, dx \int K(x, y) g(y)\, dy \\ &= \int g(x) \psi_{ij}(x)\, dx - \lambda_i \int g(y)\, dy \int K(x, y) \psi_{ij}(x)\, dx \\ &= 0 \end{aligned} \]
because $\bar{\lambda}_i$ and $\psi_{ij}(x)$ are an eigenvalue and a corresponding eigenfunction of (35). For the proof of the sufficiency, we appeal to the corresponding condition of orthogonality for the linear algebraic system which assures us that the inhomogeneous system (25) reduces to only $n - r$ independent equations (i.e., the rank of the matrix $(I - \lambda A)$ is exactly $p = n - r$ and therefore the system $(I - \lambda A) c = f$ is soluble). Substituting this value of $c$ in (22) we have the required solution of (19).
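The reduction of a degenerate-kernel equation to the algebraic system (25) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a prescribed implementation: the integrals (24) are approximated by Gauss–Legendre quadrature, and the names `solve_degenerate`, `a_funcs`, and `b_funcs` are hypothetical.

```python
import numpy as np

def solve_degenerate(a_funcs, b_funcs, f, lam, a=0.0, b=1.0, n=32):
    """Solve g = f + lam * int sum_k a_k(x) b_k(y) g(y) dy via (I - lam*A) c = f.

    Assumption: the coefficients a_ik and f_i of (24) are computed with
    Gauss-Legendre quadrature (exact here only for polynomial integrands).
    """
    t, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    w = 0.5 * (b - a) * w
    # a_ik = int b_i(x) a_k(x) dx  and  f_i = int b_i(x) f(x) dx
    A = np.array([[np.sum(w * bi(x) * ak(x)) for ak in a_funcs] for bi in b_funcs])
    fvec = np.array([np.sum(w * bi(x) * f(x)) for bi in b_funcs])
    M = np.eye(len(a_funcs)) - lam * A
    det = np.linalg.det(M)                   # D(lambda) of (26)
    if abs(det) < 1e-12:
        raise ValueError("D(lambda) = 0: lam is an eigenvalue, no unique solution")
    c = np.linalg.solve(M, fvec)

    def g(xs):
        xs = np.asarray(xs, dtype=float)
        # g(x) = f(x) + lam * sum_k c_k a_k(x), Eq. (22)
        return f(xs) + lam * sum(ck * ak(xs) for ck, ak in zip(c, a_funcs))

    return g, c, det

# Example: K(x, y) = 1 + x*y on [0, 1], f(x) = x, lam = 1
basis = [lambda x: np.ones_like(x), lambda x: x]
g, c, det = solve_degenerate(basis, basis, lambda x: x, lam=1.0)
```

In this example the matrix A has entries 1, 1/2, 1/2, 1/3, so D(1) = -1/4, c = (-2, -1), and the solution is the constant g(x) = -2, which can be checked by direct substitution. The singular case is easiest to see with the one-term kernel K(x, y) = xy on [0, 1]: there D(λ) = 1 - λ/3 vanishes at λ = 3, the eigenfunction of both the equation and its transpose is proportional to x, and Eq. (19) is solvable only if ∫ x f(x) dx = 0. For instance f(x) = 1 - 3x/2 satisfies this condition, and then g(x) = f(x) + Cx is a solution for every constant C.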
This analysis is true for a general integrable kernel. Fredholm gave three theorems in this connection and they bear his name. These theorems are of great importance in general discussion but are of little use in constructing closed form solutions or obtaining solutions numerically. Fredholm's first theorem gives the same formula as (30a) where the resolvent kernel is
\[ \Gamma(x, y; \lambda) = D(x, y; \lambda)/D(\lambda), \qquad D(\lambda) \ne 0, \tag{36} \]
which is a meromorphic function of the complex variable $\lambda$, being the ratio of two entire functions $D(x, y; \lambda)$ and $D(\lambda)$ which are given by suitable Fredholm series of form similar to (30b). The other two theorems discuss the case $D(\lambda) = 0$ for a general kernel. The discussion given above and these three theorems add up to the following important result.

A. The Fredholm Alternative Theorem

For a fixed $\lambda$, either the integral equation (19) possesses one and only one solution $g(x)$ for integrable functions $f(x)$ and $K(x, y)$ (in particular the solution $g(x) = 0$ for the homogeneous equation (31)), or the homogeneous equation (31) possesses a finite number $r$ of linearly independent solutions $g_{ij}$, $j = 1, \ldots, r$, with respect to the eigenvalue $\lambda = \lambda_i$. In the first case, the transposed inhomogeneous equation (33) also possesses a unique solution. In the second case, the transposed homogeneous equation (35) also has $r$ linearly independent solutions $\psi_{ij}(x)$, $j = 1, \ldots, r$, corresponding to the eigenvalue $\bar{\lambda}_i$; and the inhomogeneous integral equation (19) has a solution if and only if the given function $f(x)$ is orthogonal to all the eigenfunctions $\psi_{ij}(x)$. In this case the general solution of the integral equation (19) is determined only up to an additive linear combination $\sum_{j=1}^{r} \alpha_j g_{ij}(x)$.

IV. THE FREDHOLM OPERATOR

We have observed in Section I that the Fredholm operator $K g(x) = \int K(x, y) g(y)\, dy$ is linear (i.e., $K(\alpha g_1 + \beta g_2) = \alpha K g_1 + \beta K g_2$, where $\alpha$ and $\beta$ are arbitrary complex numbers). In this section we study some general results for this operator. For this purpose we consider a linear space of an infinite dimension with inner product defined by (4) (i.e., $\langle f, g \rangle = \int f(x) \bar{g}(x)\, dx$). This inner product is a complex number and satisfies the following axioms:

(a) $\langle f, f \rangle = 0$ iff $f = 0$.
(b) $\langle \alpha f_1 + \beta f_2, g \rangle = \alpha \langle f_1, g \rangle + \beta \langle f_2, g \rangle$.
(c) $\langle f, g \rangle = \overline{\langle g, f \rangle}$.

The norm $\|f\|$ defined by (4) generates the natural metric $d(f, g) = \|f - g\|$. Furthermore, we have the Cauchy–Schwarz and Minkowski inequalities as given by (5) and (6), respectively.
An important concept in the study of metric spaces is that of completeness. A metric space is called complete if every Cauchy sequence of functions in this space is a convergent sequence (i.e., the limit is in this space). A Hilbert space $H$ is an inner product linear space that is complete in its natural metric. An important example is the space of square integrable functions on the interval $[a, b]$. It is denoted as $L^2[a, b]$, called $L^2$ space in the sequel.
An operator $K$ is called bounded if there exists a constant $M > 0$, such that $\|K g\| \le M \|g\|$ for all $g \in L^2$. We can prove that the Fredholm operator with an $L^2$ kernel is bounded by starting with the relation $f = K g$. Then by using the Cauchy–Schwarz inequality we have
\[ |f(x)|^2 = \left| \int K(x, y) g(y)\, dy \right|^2 \le \int |K(x, y)|^2\, dy \int |g(y)|^2\, dy. \]
Integrating both sides of this relation we find that $\|f\| = \|K g\| \le \|g\| \left[ \iint |K(x, y)|^2\, dx\, dy \right]^{1/2}$ and we have established the boundedness of $K$. The norm $\|K\|$ of an operator $K$ is defined as
\[ \|K\| = \sup (\|K g\| / \|g\|) \tag{37a} \]
or
\[ \|K\| = \sup (\|K g\|;\ \|g\| = 1). \tag{37b} \]
The operator $K$ is called continuous in a Hilbert space if whenever $\{g_n\}$ is a sequence in the domain of $K$ with limit $g$, then $K g_n \to K g$. A linear operator is continuous if it is bounded and conversely.
A set $S$ is called precompact if a convergent subsequence can be extracted from any sequence of elements in $S$. A bounded linear operator $K$ is called compact if it transforms any bounded set in $H$ onto a precompact set. Any bounded operator $K$, whose range is finite-dimensional, is compact because it transforms a bounded set in $H$ into a bounded finite-dimensional set which is necessarily precompact. Many interesting integral operators are compact. For instance, if the Hilbert space is $L^2$ space and $K(x, y)$ is a degenerate kernel $\sum_{i=1}^{n} a_i(x) b_i(y)$, then $K$ is a compact operator. This follows by observing that
\[ K g = \sum_{i=1}^{n} \int a_i(x) b_i(y) g(y)\, dy = \sum_{i=1}^{n} c_i a_i(x) \]
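The boundedness estimate for the Fredholm operator derived in this section, ||Kg|| ≤ ||g|| [∬|K|² dx dy]^{1/2}, can be checked numerically. The sketch below is only an illustration: it assumes Gauss–Legendre quadrature in place of the exact integrals, and the name `operator_bound_check` and the sample kernel are hypothetical choices.

```python
import numpy as np

def operator_bound_check(kernel, g, a=0.0, b=1.0, n=64):
    """Return (||K g||, ||g|| * sqrt(double integral of |K|^2)) on a quadrature grid."""
    t, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    w = 0.5 * (b - a) * w
    K = kernel(x[:, None], x[None, :])        # K[i, j] = K(x_i, y_j)
    gx = g(x)
    Kg = (K * w) @ gx                          # (K g)(x_i), quadrature in y
    norm_Kg = np.sqrt(np.sum(w * np.abs(Kg) ** 2))
    norm_g = np.sqrt(np.sum(w * np.abs(gx) ** 2))
    # Discrete analogue of the double integral of |K(x, y)|^2
    hs = np.sqrt(np.sum(w[:, None] * w[None, :] * np.abs(K) ** 2))
    return norm_Kg, norm_g * hs

lhs, rhs = operator_bound_check(lambda x, y: np.exp(-np.abs(x - y)),
                                lambda x: np.cos(3.0 * x))
```

Because the quadrature weights are positive, the discrete sums obey the same Cauchy–Schwarz argument as the integrals, so `lhs` does not exceed `rhs` for any kernel and any function sampled this way.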
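A. Property 1

1. Existence of an Eigenvalue

If the Hermitian kernel is also an $L^2$ function then at least one of the quantities $\pm (\|K\|)^{-1}$, where this norm is defined by (37), must be an eigenvalue of $\lambda K f = f$.

1. Proof

Let $\phi_1$ and $\phi_2$ be the eigenfunctions corresponding, respectively, to the distinct eigenvalues $\lambda_1$ and $\lambda_2$. Then we have $\lambda_1 K \phi_1 = \phi_1$ and $\lambda_2 K \phi_2 = \phi_2$ so that $\langle K \phi_1, \phi_2 \rangle = \lambda_1^{-1} \langle \phi_1, \phi_2 \rangle$ and also $\langle K \phi_1, \phi_2 \rangle = \langle \phi_1, K \phi_2 \rangle = \lambda_2^{-1} \langle \phi_1, \phi_2 \rangle$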
which yields the $(n + 1)$th lowest eigenvalue and the corresponding eigenfunction $\phi_{n+1}(x)$. Thereby we find that either this process terminates after $n$ steps (i.e., $K^{(n+1)}(x, y) = 0$), and the kernel $K(x, y)$ is a degenerate kernel $\sum_{k=1}^{n} (\phi_k(x) \bar{\phi}_k(y)/\lambda_k)$, or the process can be continued indefinitely and there are an infinite number of eigenvalues and eigenfunctions so that
\[ K(x, y) = \sum_{k=1}^{\infty} \frac{\phi_k(x) \bar{\phi}_k(y)}{\lambda_k}. \tag{42} \]
This is called the bilinear form of the kernel. Recall that we meet a similar situation for a Hermitian matrix $A$. Indeed, by transforming to an orthonormal basis of the vector space consisting of the eigenvectors of $A$ we can transform it to a diagonal matrix.
From the bilinear form (42) we derive a useful inequality. Let the sequence $\{\phi_k(x)\}$ be all the eigenfunctions of a Hermitian $L^2$ kernel $K(x, y)$ with $\{\lambda_k\}$ as the corresponding eigenvalues as described and arranged in the above analysis. Then the series
\[ \sum_{n=1}^{\infty} \frac{|\phi_n(x)|^2}{\lambda_n^2} < C_1^2, \tag{43} \]
where $C_1^2$ is an upper bound of the integral $\int |K(x, y)|^2\, dy$. The proof follows by observing that the Fourier coefficients $a_n$ of the function $K(x, y)$ with fixed $x$, with respect to the orthonormal system $\bar{\phi}_n(y)$, are $a_n = \langle K(x, y), \bar{\phi}_n(y) \rangle = \phi_n(x)/\lambda_n$. Substituting these values of $a_n$ in the Bessel inequality (40) we derive (43).
The eigenfunctions do not have to form a complete set in order to represent the functions in $L^2$. Indeed, any function which can be written as "sourcewise" in terms of the kernel $K$ (i.e., any function $f = K h$) can be expanded in a series of the eigenfunctions of $K$. This is not surprising because integration smooths out irregularities or if the functions $f$, $K$, and $h$ are represented by the series, then the convergence of the series representing $f$ will be better

\[ f(x) = \sum_{n=1}^{\infty} f_n \phi_n(x), \qquad f_n = \langle f, \phi_n \rangle. \tag{45a} \]
The Fourier coefficients $f_n$ are related to the corresponding coefficients $h_n$ of $h(x)$ as
\[ f_n = h_n/\lambda_n = \langle h, \phi_n \rangle/\lambda_n \tag{45b} \]
and $\lambda_n$ are the eigenvalues of $K$.

1. Proof

The Fourier coefficients of the function $f(x)$ with respect to the orthonormal system $\{\phi_n(x)\}$ are
\[ f_n = \langle f, \phi_n \rangle = \langle K h, \phi_n \rangle = \langle h, K \phi_n \rangle = \langle h, \phi_n \rangle/\lambda_n = h_n/\lambda_n \]
because $K$ is self-adjoint and $\lambda_n K \phi_n = \phi_n$. Accordingly, we can write the correspondence
\[ f(x) \sim \sum_{n=1}^{\infty} f_n \phi_n(x) = \sum_{n=1}^{\infty} \frac{h_n \phi_n(x)}{\lambda_n}. \tag{46} \]
The estimate of the remainder term for this series is
\[ \left| \sum_{k=n+1}^{n+p} h_k \frac{\phi_k(x)}{\lambda_k} \right|^2 \le \sum_{k=n+1}^{n+p} |h_k|^2 \sum_{k=n+1}^{n+p} \frac{|\phi_k(x)|^2}{\lambda_k^2} \le \sum_{k=n+1}^{n+p} |h_k|^2 \sum_{k=1}^{\infty} \frac{|\phi_k(x)|^2}{\lambda_k^2}. \tag{47} \]
Now the series $\sum_{k=1}^{\infty} |\phi_k(x)|^2/\lambda_k^2$ is bounded in view of relation (43) while the partial sum $\sum_{k=n+1}^{n+p} |h_k|^2$ can be made arbitrarily small because $h(x) \in L^2$ and, as such, the series $\sum_{k=1}^{\infty} |h_k|^2$ is convergent. Thus, the estimate (47) can be made arbitrarily small so that series (46) converges absolutely and uniformly. Next, we show that this series converges to $f(x)$ in the mean and for this purpose we denote its partial sum as $\psi_n(x) = \sum_{m=1}^{n} (h_m/\lambda_m) \phi_m(x)$ and estimate the value $\|f(x) - \psi_n(x)\|$. Because
\[ f(x) - \psi_n(x) = K h - \sum_{m=1}^{n} \frac{h_m}{\lambda_m} \phi_m(x) = K^{(n+1)} h, \]
where $K^{(n+1)}$ is the truncated kernel (41), we find that
\[ \|f(x) - \psi_n(x)\|^2 = \|K^{(n+1)} h\|^2 = \langle K^{(n+1)} h, K^{(n+1)} h \rangle = \langle h, K^{(n+1)} K^{(n+1)} h \rangle = \langle h, K_2^{(n+1)} h \rangle \tag{48} \]
in view of the self-adjointness of the kernel $K^{(n+1)}$ and the relation $K^{(n+1)} K^{(n+1)} = K_2^{(n+1)}$. Now we use Property 7 for the Hermitian kernels and find that the least eigenvalue of the kernel $K_2^{(n+1)}$ is $\lambda_{n+1}^2$. On the other hand, Property 1 implies that
\[ \frac{1}{\lambda_{n+1}^2} = \sup \left( \|K^{(n+1)} h\|^2 / \|h\|^2 \right) = \sup \left( \langle h, K_2^{(n+1)} h \rangle / \|h\|^2 \right). \]
Combining it with (48) we get $\|f(x) - \psi_n(x)\|^2 \le \|h\|^2/\lambda_{n+1}^2$. Because $\lambda_{n+1} \to \infty$, we have proved that $\|f(x) - \psi_n(x)\| \to 0$ as $n \to \infty$.
In order to prove that $f = \psi$, where $\psi$ is the series with partial sum $\psi_n$, we use the Minkowski inequality (6) and get $\|f - \psi\| \le \|f - \psi_n\| + \|\psi_n - \psi\|$. The first term on the right-hand side of this inequality tends to zero as proved above. Because series (46) converges uniformly, the second term can be made as small as we want (i.e., given an arbitrarily small and positive $\varepsilon$ we can find $n$ large enough that $|\psi_n - \psi| < \varepsilon$). One integration then yields $\|\psi_n - \psi\| < \varepsilon (b - a)^{1/2}$ and we have proved the result.
Let us now use the foregoing theorem for solving the inhomogeneous Fredholm integral equation of the second kind:
\[ g(x) = f(x) + \lambda \int K(x, y) g(y)\, dy, \tag{49} \]
with a Hermitian $L^2$ kernel. First, we assume that $\lambda$ is not an eigenvalue of $K$. Because $g(x) - f(x)$ in this equation has the integral representation of the form (44) we expand both $g(x)$ and $f(x)$ in terms of the eigenfunctions $\phi_n(x)$ given by the homogeneous equation $\phi_n(x) = \lambda_n \int K(x, y) \phi_n(y)\, dy$. Accordingly, we set
\[ g(x) = \sum_{n=1}^{\infty} g_n \phi_n(x), \qquad f(x) = \sum_{n=1}^{\infty} f_n \phi_n(x), \tag{50} \]
where $g_n = \langle g, \phi_n \rangle$ is unknown and $f_n = \langle f, \phi_n \rangle$ is known. Substituting these expansions in (49) we obtain
\[ \sum_{n=1}^{\infty} g_n \phi_n(x) = \sum_{n=1}^{\infty} f_n \phi_n(x) + \lambda \int K(x, y) \sum_{n=1}^{\infty} g_n \phi_n(y)\, dy. \tag{51} \]
In view of the uniform convergence of the expansion, we can interchange the order of integration and summation, and get
\[ \sum_{n=1}^{\infty} g_n \phi_n(x) = \sum_{n=1}^{\infty} f_n \phi_n(x) + \lambda \sum_{n=1}^{\infty} \frac{g_n \phi_n(x)}{\lambda_n}. \tag{52} \]
Now we multiply both sides of (52) by $\bar{\phi}_k(x)$ and integrate from $a$ to $b$ and appeal to the orthogonality of the eigenfunctions and obtain
\[ g_k = f_k + (\lambda/\lambda_k) g_k \tag{53a} \]
or
\[ g_k = f_k + (\lambda/(\lambda_k - \lambda)) f_k. \tag{53b} \]
Substitution of (53) into (50) leads us to the required solution
\[ \begin{aligned} g(x) &= \sum_{n=1}^{\infty} \left[ f_n + \frac{\lambda}{\lambda_n - \lambda} f_n \right] \phi_n(x) \\ &= f(x) + \lambda \sum_{n=1}^{\infty} \int \frac{\phi_n(x) \bar{\phi}_n(y)}{\lambda_n - \lambda} f(y)\, dy \\ &= f(x) + \lambda \int \Gamma(x, y; \lambda) f(y)\, dy, \end{aligned} \tag{54} \]
where the resolvent kernel $\Gamma(x, y; \lambda)$ is expressed by the series
\[ \Gamma(x, y; \lambda) = \sum_{n=1}^{\infty} \frac{\phi_n(x) \bar{\phi}_n(y)}{\lambda_n - \lambda}, \tag{55} \]
and we have again interchanged the integration and summation. It follows from expression (55) that the singular points of the resolvent kernel $\Gamma$ corresponding to a Hermitian $L^2$ kernel are simple poles and every pole is an eigenvalue of the kernel.
In the event that $\lambda$ in Eq. (49) is equal to one of the eigenvalues, say, $\lambda_p$ of $K$, the solution (54) becomes infinite. To remedy it we return to relation (53) which for $k = p$ becomes $g_p = f_p + g_p$. Thus, $g_p$ is arbitrary and $f_p = 0$. This implies that $\int f(x) \bar{\phi}_p(x)\, dx = 0$ (i.e., $f(x)$ is orthogonal to the eigenfunction $\phi_p(x)$). If this is not the case, we have no solution. If $\lambda_p$ has the algebraic multiplicity $m$, then there are $m$ coefficients $g_p$ which are arbitrary and $f(x)$ is orthogonal to all these $m$ functions.

Integral equations arise in the process of inverting ordinary and partial differential operators. In the quest for the representation formula for the solutions of these operators so as to include the initial or boundary values in it, we arrive at integral equations. In the process there arises the theory of Green's functions which are symmetric and become the kernels of the integral equations. If they are not symmetric, then they can be symmetrized. We illustrate these concepts with the help of the Sturm–Liouville differential operator
when λ = 0 has a nonzero eigenfunction can be handled by a slight extension of the foregoing arguments.
The theory of Green's functions as derived above can be displayed very elegantly with the help of the Dirac delta function and other generalized functions.

VI. SINGULAR INTEGRAL EQUATIONS ON THE REAL LINE

An integral equation is said to be singular either if the kernel is singular within the range of integration or if one or both limits of integration are infinite. In this section we study some famous singular integral equations. They arise very frequently in various branches of physics and engineering. No general theory is available for these equations but methods are available for solving some special cases. We start with the Abel integral equation.

A. The Abel Integral Equation

This equation is
\[ f(x) = \int_a^x \frac{g(y)}{(x - y)^{\alpha}}\, dy, \qquad 0 < \alpha < 1. \tag{66} \]
To solve it we multiply both sides of this equation by $dx/(u - x)^{1-\alpha}$ and integrate with respect to $x$ from $a$ to $u$

Similarly, the solution of the integral equation
\[ \int_x^b \frac{g(y)}{(y - x)^{\alpha}}\, dy = f(x), \qquad 0 < \alpha < 1 \tag{70} \]
is
\[ g(y) = -\frac{\sin \alpha\pi}{\pi} \frac{d}{dy} \int_y^b \frac{f(x)}{(x - y)^{1-\alpha}}\, dx. \tag{71} \]
There are many related integral equations which can be solved by similar steps. For instance, the solution of the integral equation
\[ f(x) = \int_a^x \frac{g(y)\, dy}{[h(x) - h(y)]^{\alpha}}, \qquad 0 < \alpha < 1, \tag{72} \]
where the function $h(x)$ is a strictly increasing differentiable function with nonzero $h'(x)$ over some interval $a \le x \le b$, is
\[ g(y) = \frac{\sin \alpha\pi}{\pi} \frac{d}{dy} \int_a^y \frac{h'(u) f(u)\, du}{[h(y) - h(u)]^{1-\alpha}}. \tag{73} \]
Similarly, the solution of the integral equation
\[ f(x) = \int_x^b \frac{g(y)\, dy}{[h(y) - h(x)]^{\alpha}}, \qquad 0 < \alpha < 1 \tag{74} \]
is
\[ g(y) = -\frac{\sin \alpha\pi}{\pi} \frac{d}{dy} \int_y^b \frac{h'(u) f(u)\, du}{[h(u) - h(y)]^{1-\alpha}}. \tag{75} \]
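As a consistency check on the inversion formula (73), the following worked example takes the simplest case. The choices $h(x) = x$, $a = 0$, and $g \equiv 1$ are assumptions made here for illustration only; the Beta-function evaluation and the reflection formula $\Gamma(\alpha)\Gamma(1-\alpha) = \pi/\sin\alpha\pi$ are standard results, with $\Gamma$ denoting the gamma function in this block.

```latex
% Forward problem, Eq. (72) with h(x) = x, a = 0, g(y) = 1:
\[ f(x) = \int_0^x \frac{dy}{(x - y)^{\alpha}} = \frac{x^{1-\alpha}}{1 - \alpha}. \]
% Inner integral of Eq. (73), using the substitution u = y t:
\[ \int_0^y \frac{f(u)\, du}{(y - u)^{1-\alpha}}
   = \frac{y}{1 - \alpha} \int_0^1 t^{1-\alpha} (1 - t)^{\alpha - 1}\, dt
   = \frac{y}{1 - \alpha}\, \frac{\Gamma(2 - \alpha)\Gamma(\alpha)}{\Gamma(2)}
   = y\, \Gamma(1 - \alpha)\Gamma(\alpha)
   = \frac{\pi y}{\sin \alpha\pi}. \]
% Applying the remaining operations of Eq. (73):
\[ g(y) = \frac{\sin \alpha\pi}{\pi} \frac{d}{dy} \left( \frac{\pi y}{\sin \alpha\pi} \right) = 1. \]
```

The formula therefore returns the function $g \equiv 1$ that generated $f$, as it should.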
b
while g(y) dy
u g(x) = f (x) + λ (85)
φ(y, u) π csc απ a y−x
dy = − , u < x. (80)
0 y−x (x − u)1−α x α is
When we multiply (76) by x we get f (x) λ
g(x) = − +
1 1+π λ 2 2 (1 + π λ )(x − a)1−α (b − x)α
2 2
yg(y) dy b
λ = xg(x) − x f (x) + c, (81) (b − y)α (y − a)1−α f (y) dy
0 y−x ×
1 a y−x
where c = λ 0 g(y) dy. Next, we multiply both sides of c
(81) by φ(x, u) as defined by (78), integrate from 0 to u + , (86)
(x − a)1−α (b − x)α
and change the order of integration. The result is
u u where c is an arbitary constant.
φ(x, u) d x The solution of the Cauchy-type integral equation of
−λ yg(y) dy
0 0 x−y the first kind:
1 u b
φ(x, u) d x g(y) dy
−λ yg(y) dy = f (x), a<x <b (87)
u 0 x−y a y−x
u
can be obtained with a very similar analysis and is
= xg(x)φ(x, u) d x
0 1
g(x) = √
u u
π 2 (x − a)(b − x)
− x f (x)φ(x, u) d x + c φ(x, u) d x. b √
0 0 (y − a)(b − y)
× f (y) dy + π c . (88)
With
u the help of relations (79) and (80) and the fact that a x−y
0 φ(x, u) d x = π csc απ, the above relation becomes In particular, when a = −1, b = 1, it follows that the solu-
1 1−α tion of the airfoil equation
y g(y)
λπ csc απ dy
u (y − u) 1 1 g(y) dy
1−α
u = f (x), −1 < x < 1 (89)
π −1 y − x
= − x f (x)φ(x, u) d x + cπ csc απ (82)
0 is
This is an Abel-type integral equation whose solution is 1 1
(1 − y 2 ) f (y) c
found from the previous analysis to be g(x) = √ dy + √ .
π 1− x2 −1 x−y 1 − x2
1 u
sin2 απ d (90)
λy g(y) =
1−α
(u − y)−α
π 2 dy y 0
α−1 1−α c sin απ C. Singular Integral Equations
× (u − x) x f (x) d x d y + with a Logarithmic Kernel
π (1 − y)α
(83) We start with the integral equation
1
Now we use the relation −π cot απ = 1/λ and do a little ln|x − y|g0 (y) dy = 1, −1 < x < 1. (91)
algebraic manipulation and obtain the required solution as −1
Then relation (92) becomes
\[ \int_0^{\pi} \left[ -\ln 2 - 2 \sum_{n=1}^{\infty} \frac{\cos n\alpha \cos n\beta}{n} \right] \times \sum_{m=0}^{\infty} b_m \cos m\beta\, d\beta = 1, \]
from which it follows, due to orthogonality of cosine functions, that
\[ -\pi b_0 \ln 2 - \pi \sum_{n=1}^{\infty} \frac{\cos n\alpha}{n} b_n = 1. \]
Thus, $b_0 = -(1/(\pi \ln 2))$, $b_n = 0$, $n \ge 1$, and we find that the solution of Eq. (91) is
\[ g_0(y) = -\frac{1}{\pi \ln 2} \frac{1}{\sqrt{1 - y^2}}. \tag{94} \]
In passing we observe that by substituting solution (94) in (91) we have the useful identity
\[ \int_{-1}^{1} \frac{\ln |x - y|}{(1 - y^2)^{1/2}}\, dy = -\pi \ln 2, \qquad -1 < x < 1. \tag{95} \]
Next, we consider the integral equation
\[ \int_{-1}^{1} \ln |x - y|\, g(y)\, dy = f(x), \qquad -1 < x < 1. \tag{96} \]
Differentiation with respect to $x$ gives
\[ \int_{-1}^{1} \frac{g(y)}{x - y}\, dy = f'(x), \qquad -1 < x < 1, \]
whose solution follows from (90) to be
\[ g(x) = \frac{1}{\pi^2} \int_{-1}^{1} \left( \frac{1 - y^2}{1 - x^2} \right)^{1/2} \frac{f'(y)}{y - x}\, dy + \frac{C}{\pi \sqrt{1 - x^2}}, \tag{97} \]
where $C = \int_{-1}^{1} g(y)\, dy$. To find the constant $C$, we multiply (96) by $1/\sqrt{1 - x^2}$ and integrate it with respect to $x$ from $-1$ to $1$ and change the order of integration. The result is
\[ \int_{-1}^{1} g(y)\, dy \int_{-1}^{1} \frac{\ln |x - y|}{(1 - x^2)^{1/2}}\, dx = \int_{-1}^{1} \frac{f(x)}{(1 - x^2)^{1/2}}\, dx, \]
which, in view of identity (95), becomes
\[ (-\pi \ln 2)\, C = \int_{-1}^{1} \frac{f(x)}{(1 - x^2)^{1/2}}\, dx. \]
Thus
\[ C = -\frac{1}{\pi \ln 2} \int_{-1}^{1} \frac{f(x)}{(1 - x^2)^{1/2}}\, dx, \]
which when substituted in (97) yields the solution
\[ g(x) = \frac{1}{\pi^2} \int_{-1}^{1} \left( \frac{1 - y^2}{1 - x^2} \right)^{1/2} \frac{f'(y)}{y - x}\, dy - \frac{1}{\pi^2 \ln 2\, (1 - x^2)^{1/2}} \int_{-1}^{1} \frac{f(y)}{(1 - y^2)^{1/2}}\, dy. \tag{98} \]
Various other forms of integral equations with logarithmic kernels can be solved in a similar fashion.

VII. THE CAUCHY KERNEL AND THE RIEMANN–HILBERT PROBLEM

For the study of the singular equations in the complex plane $\mathbb{C}$, we require a few important results from the analysis of a complex variable. We present some of these concepts needed for the Cauchy kernel. Let $C$ be a simple, smooth, and closed curve in the complex $z$ plane endowed with the counterclockwise orientation. The complement $\mathbb{C} \setminus C$ consists of two parts, one interior (bounded) part $S_+$ and the other exterior part $S_-$. A function $F(z)$ defined and analytic in the complement $\mathbb{C} \setminus C$ is called a sectionally analytic function with discontinuity contour $C$. Let $f(\zeta)$ be a continuous function defined for $\zeta \in C$. The Cauchy (or analytic) representation of $f$ is the sectionally analytic function
\[ F(z) = F\{f(\zeta); z\} = \frac{1}{2\pi i} \int_C \frac{f(\zeta)\, d\zeta}{\zeta - z}, \qquad z \in \mathbb{C} \setminus C. \tag{99} \]
The boundary values $F_{\pm}(\omega)$ of this function on both sides of $C$ satisfy the Plemelj relations
\[ F_+(\omega) = \tfrac{1}{2} f(\omega) - \tfrac{i}{2} H(f), \qquad F_-(\omega) = -\tfrac{1}{2} f(\omega) - \tfrac{i}{2} H(f), \tag{100} \]
where $H(f) = (1/\pi) \int_C (f(\zeta)/(\zeta - \omega))\, d\zeta$, $\omega \in C$, is called the Hilbert transform of $f$. Solving (100) for $f$ and $H(f)$ we get
\[ f = F_+ - F_- = [F], \qquad H(f) = \frac{1}{\pi} \int_C \frac{f(\zeta)}{\zeta - \omega}\, d\zeta = i (F_+ + F_-), \tag{101} \]
where $[F]$ is called the jump of $F$ across $C$.
Now let $\Phi_1(\zeta)$ and $\Phi_2(\zeta)$ be two continuous functions defined on $C$. The Riemann–Hilbert problem is to find the sectionally analytic function $Y(z)$ defined on $\mathbb{C} \setminus C$ whose boundary values satisfy
\[ \Phi_1(\zeta) Y_+(\zeta) - \Phi_2(\zeta) Y_-(\zeta) = \Psi(\zeta), \tag{102} \]
where the inhomogeneous term on the right-hand side is a function given on C. We
assume that $\phi_1$ and $\phi_2$ never vanish on C. It is called the normality
condition. When we divide both sides of the above equation by $\phi_1(\zeta)$ we get

$$Y_+(\zeta) = \phi(\zeta)\,Y_-(\zeta) + \psi(\zeta), \tag{103}$$

where $\phi = \phi_2/\phi_1$ and $\psi$ is the inhomogeneous term divided by $\phi_1$.
When $\psi = 0$, Eq. (103) becomes the homogeneous Riemann–Hilbert problem

$$X_+(\zeta) = \phi(\zeta)\,X_-(\zeta). \tag{104}$$

To solve (103) we first reduce it to the simple form

$$W_+(\zeta) = W_-(\zeta) + \psi(\zeta), \tag{105}$$

which is obtained from (103) by taking $\phi(\zeta) = 1$, because the solution of (105)
is known to have the analytic representation

$$W(z) = F\{\psi(\zeta); z\} = \frac{1}{2\pi i}\int_C \frac{\psi(\zeta)\,d\zeta}{\zeta - z}, \tag{106}$$

where we have appealed to definition (99). We first solve the homogeneous
Riemann–Hilbert problem (104). For this purpose we take the logarithm of both sides
of (104), and get

$$\ln X_+(\zeta) = \ln X_-(\zeta) + \ln\phi(\zeta), \tag{107}$$

where we assume, for the time being, that $\ln\phi(\zeta)$ is single valued on C. A
particular solution of (107) is given by the analytic representation

$$\ln X(z) = F\{\ln\phi(\zeta); z\} = \frac{1}{2\pi i}\int_C \frac{\ln\phi(\zeta)\,d\zeta}{\zeta - z}, \tag{108}$$

or $X(z) = e^{F\{\ln\phi(\zeta);\,z\}}$, which is a sectionally analytic function that
never vanishes on $\mathbb{C}\setminus C$ and whose boundary values satisfy (104) because

$$\frac{X_+(\zeta)}{X_-(\zeta)} = \exp[F_+\{\ln\phi(\omega); \zeta\} - F_-\{\ln\phi(\omega); \zeta\}] = \exp[\ln\phi(\zeta)] = \phi(\zeta).$$

Note that this basic solution is normal because $X(\infty) = 1$. Now, if Y(z) is any
other solution of the homogeneous problem (104) then the function Y(z)/X(z), which
is known to be analytic on $\mathbb{C}\setminus C$, is also analytic on C because its
jump across C vanishes:

$$\frac{Y_+}{X_+} - \frac{Y_-}{X_-} = \frac{Y_-}{X_-} - \frac{Y_-}{X_-} = 0.$$

Thus, Y/X is an entire function and the most general solution of the homogeneous
Riemann–Hilbert problem is Y(z) = P(z)X(z) where P(z) is an entire function. The
function X(z) is called the fundamental solution of the Riemann–Hilbert problem
(fundamental in the sense that all other solutions can be obtained from it in a
suitable way).

Let us now consider the case when $\ln\phi(\zeta)$ is multiple valued on C and
introduce the number k

$$k = \frac{1}{2\pi i}\,\Delta_C(\ln\phi(\zeta)) = \frac{1}{2\pi}\,\Delta_C(\arg\phi(\zeta)), \tag{109}$$

where $\Delta_C(f(\zeta))$ denotes the increment of the function $f(\zeta)$ when the
curve C is traversed in the positive direction. Thus, k is the index of the point
z = 0 with respect to the image of the curve C under the function $\phi(\zeta)$. The
number k, which is always an integer, is called the index of the Riemann–Hilbert
problem.

Let Y(z) be a solution of the homogeneous Riemann–Hilbert problem (104); we define
the sectionally analytic function $\bar{Y}(z)$ as

$$\bar{Y}(z) = Y(z), \quad z \in S_+; \qquad \bar{Y}(z) = (z - z_0)^k\,Y(z), \quad z \in S_-. \tag{110}$$

Thus, $\bar{Y}(z)$ satisfies the following boundary value problem, where $z_0$ is an
arbitrary point of $S_+$:

$$\bar{Y}_+(\zeta) = \phi_0(\zeta)\,\bar{Y}_-(\zeta), \qquad \phi_0(\zeta) = (\zeta - z_0)^{-k}\,\phi(\zeta). \tag{111}$$

Thereby $\ln\phi_0(\zeta)$ has become single valued and we can apply the previous
analysis to conclude that the solution of (111) is $\bar{Y}(z) = P(z)\bar{X}(z)$,
where P(z) is a polynomial and where $\bar{X}(z) = \exp[F\{\ln\phi_0(\zeta); z\}]$ is
the fundamental solution. Then it follows from (110) that the solutions of (104) are
of the form Y(z) = P(z)X(z) where P(z) is an arbitrary polynomial and where the
fundamental solution X(z) is given by

$$X(z) = \exp\left[\frac{1}{2\pi i}\int_C \frac{\ln\phi_0(\zeta)\,d\zeta}{\zeta - z}\right], \quad z \in S_+, \tag{112a}$$

$$X(z) = (z - z_0)^{-k}\exp\left[\frac{1}{2\pi i}\int_C \frac{\ln\phi_0(\zeta)\,d\zeta}{\zeta - z}\right], \quad z \in S_-. \tag{112b}$$

Once a fundamental solution of the Riemann–Hilbert problem has been obtained, we can
solve the inhomogeneous problem (103) as follows. Let X(z) be a fundamental solution
and let Y(z) be a solution of (103), which we can write as

$$\frac{Y_+}{X_+} = \frac{Y_-}{X_-} + \frac{\psi}{X_+}, \tag{113}$$

because $\phi = X_+/X_-$. The solution of this equation with polynomial behavior at
$z = \infty$ is

$$\frac{Y(z)}{X(z)} = P(z) + F\left\{\frac{\psi(\zeta)}{X_+(\zeta)};\, z\right\}.$$
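As a rough numerical illustration of the index defined in (109), the following Python sketch accumulates the change in the argument of a coefficient function around a closed curve. It assumes, purely for illustration, that C is the unit circle traversed counterclockwise and that the function (called `phi` here, a hypothetical name) has no zeros or poles on C. The discretization and the three sample functions are assumptions and do not come from the text.

```python
import cmath
import math


def riemann_hilbert_index(phi, samples=2000):
    """Approximate k = (1 / (2*pi)) * (increment of arg phi around the unit circle)."""
    total = 0.0
    previous = phi(complex(1.0, 0.0))
    for m in range(1, samples + 1):
        zeta = cmath.exp(2j * math.pi * m / samples)
        current = phi(zeta)
        # The phase of the ratio is the small change of argument over one step.
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2.0 * math.pi))


# Sample coefficient functions (illustrative assumptions).
k_square = riemann_hilbert_index(lambda z: z * z)     # argument winds twice
k_inverse = riemann_hilbert_index(lambda z: 1.0 / z)  # argument winds backwards once
k_shifted = riemann_hilbert_index(lambda z: z - 3.0)  # zero lies outside the circle
```

When the computed index is nonzero, the logarithm of the coefficient is multiple valued on C, which is the case handled by the factor in (110) and (111).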
P1: GLQ/GJP P2: FJU Final Pages
Encyclopedia of Physical Science and Technology EN007-344 June 30, 2001 18:2
For the special case when a and b are constants, Eq. (115) reduces to the Cauchy
integral equation

$$a\,g(\zeta) + \frac{b}{\pi i}\int_C \frac{g(\omega)}{\omega - \zeta}\,d\omega = f(\zeta), \tag{127}$$

while its solution follows by observing that k = 0 and that the auxiliary quantities,
such as $\gamma$, are constants. Thus, we appeal to relation (126) and find the
solution to be

$$g(\zeta) = \frac{a\,f(\zeta)}{a^2 - b^2} - \frac{b}{(a^2 - b^2)\,\pi i}\int_C \frac{f(\omega)\,d\omega}{\omega - \zeta}. \tag{128}$$

Let us finally present an elementary discussion of nonlinear integral equations.
Because there are no analytic tech-

$K(x) = O(e^{-c|x|})$, $f(x) = O(e^{dx})$, as $x \to \infty$, the function
$h_-(x) = O(e^{-c|x|})$ as $x \to -\infty$, where c > 0 and d < c. (4) We look for a
solution $g_+(x) = O(e^{dx})$ as $x \to \infty$.

Let us now apply the Fourier transform $\hat{\psi}(u)$

$$\hat{\psi}(u) = \int_{-\infty}^{\infty} \psi(x)\,e^{iux}\,dx \tag{133}$$

to both sides of equation (132) and get

$$(1 + \lambda\hat{K}(u))\,\hat{g}_+(u) = \hat{f}_+(u) + \hat{h}_-(u). \tag{134}$$

as $g^{(0)}(x) = f(x)$, and assume that $|\phi(x)| < A$, a constant. Then we find that
$$\le \cdots \le [\,|\lambda| L (b - a)\,]^{k-1} A.$$

Thus, if $|\lambda| L (b - a) < 1$, the series will be absolutely and uniformly
convergent and $g^{(n)}(x)$ will tend to a function g(x) which will be the solution
of Eq. (146).

To prove the uniqueness of the solution, we let g(x) and h(x) be two solutions. Then
the value of the difference $\psi(x) = g(x) - h(x)$ is

$$\psi(x) = \lambda\int_a^b [F\{x, y, g(y)\} - F\{x, y, h(y)\}]\,dy.$$

When we denote by $\psi_{\max}$ the maximum value of $|\psi(x)|$ in (a, b) and appeal
to the Lipschitz continuity of F, we find that

$$|\psi(x)| \le |\lambda| L \int_a^b |\psi(y)|\,dy \le |\lambda| L (b - a)\,\psi_{\max},$$

which means that $\psi_{\max} \le |\lambda| L (b - a)\,\psi_{\max}$. But
$|\lambda| L (b - a) < 1$, so we have $\psi_{\max} = 0$ and the uniqueness of the
solution is proved.

Let us now consider the nonlinear Volterra integral equation

$$g(x) = f(x) + \lambda\int_0^x F\{x, y, g(y)\}\,dy, \tag{149}$$

with the same conditions on the function f(x) and F{x, y, g(y)} as given above. The
iterative scheme again yields the sequence $g^{(0)}(x) = f(x)$ and

$$g^{(n)}(x) = f(x) + \lambda\int_0^x F\{x, y, g^{(n-1)}(y)\}\,dy, \quad n \ge 1. \tag{150}$$

$$\le \cdots \le M\,\frac{(L|\lambda||x|)^k}{k!}. \tag{152}$$

But the last term in (152) is the kth term of the power series for
$M\exp\{L|\lambda||x|\}$, so that the series
$g^{(0)}(x) + \sum_{k=1}^{\infty}\{g^{(k)}(x) - g^{(k-1)}(x)\}$ is absolutely and
uniformly convergent for all values of $\lambda$ and its sum
$\lim_{n\to\infty} g^{(n)}(x)$ is the solution of the integral equation (149). The
uniqueness of this solution can be proved by a slight extension of the arguments
presented above for the Fredholm case.

X. A TAYLOR EXPANSION TECHNIQUE

In the previous analysis we have presented the Neumann and Hilbert-Schmidt expansion
techniques for solving the integral equations. Recently, it has been discovered that
both the linear and nonlinear integral equations can also be solved with the help of
the Taylor series. To present the basic ideas of this method, we consider the
Fredholm integral equation of the second kind

$$g(x) = f(x) + \int_a^b K(x, y)\,g(y)\,dy \tag{153}$$

and differentiate it n times with respect to x so that we have

$$g^{(n)}(x) = f^{(n)}(x) + \int_a^b \frac{\partial^n K(x, y)}{\partial x^n}\,g(y)\,dy.$$

For x = 0, the above relation becomes

$$g^{(n)}(0) = f^{(n)}(0) + \int_a^b \left.\frac{\partial^n K(x, y)}{\partial x^n}\right|_{x=0} g(y)\,dy. \tag{154}$$
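The iterative scheme (150) for the nonlinear Volterra equation (149) can be sketched numerically. The Python fragment below is a minimal illustration, not the authors' method: it assumes a uniform grid on [0, x_max] and replaces the integral by the composite trapezoid rule, and the function name `picard_volterra`, the grid size, and the test problem (F{x, y, g} = g, f = 1, λ = 1, whose exact solution is e^x) are all choices made here for illustration.

```python
import math


def picard_volterra(f, F, lam, x_max, n_grid=201, n_iter=30):
    """Iterate g_n(x) = f(x) + lam * integral_0^x F(x, y, g_{n-1}(y)) dy on a grid."""
    h = x_max / (n_grid - 1)
    xs = [i * h for i in range(n_grid)]
    g = [f(x) for x in xs]  # starting iterate g^(0) = f
    for _ in range(n_iter):
        new_g = []
        for i, x in enumerate(xs):
            if i == 0:
                integral = 0.0
            else:
                # Composite trapezoid rule on [0, x_i] using the previous iterate.
                vals = [F(x, xs[j], g[j]) for j in range(i + 1)]
                integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
            new_g.append(f(x) + lam * integral)
        g = new_g
    return xs, g


# Illustrative test problem: g(x) = 1 + integral_0^x g(y) dy, exact solution exp(x).
xs, g = picard_volterra(lambda x: 1.0, lambda x, y, gy: gy, 1.0, 1.0)
max_err = max(abs(gi - math.exp(x)) for x, gi in zip(xs, g))
```

The remaining error in `max_err` comes almost entirely from the quadrature rule, since the successive differences of the iterates decay factorially as in (152).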
Knots
Louis H. Kauffman
University of Illinois
P1: GRB/GWT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN008O-359 July 14, 2001 18:55
Topology The study of topological spaces: sets endowed with a notion of
neighborhoods (called open sets) closed under finite intersections and arbitrary
unions, such that the whole space and the empty set are both open. Topological
spaces encapsulate the concept of continuity in the structure of the neighborhoods.

THIS ARTICLE constitutes an introduction to the theory of knots as it has been
influenced by developments concurrent with the discovery of the Jones polynomial in
1984 and the subsequent explosion of research that followed this signal event in the
mathematics of the 20th century. I hope to give the flavor of these extraordinary
events in this exposition. Even the act of tying a shoelace can become an adventure.
The familiar world of string, rope, and the third dimension becomes an inexhaustible
source of ideas and phenomena.

Sections 1 and 2 constitute a start on the subject of knots. Later sections
introduce more technical topics. The theme of a relationship of knots with physics
begins already with the Jones polynomial and the bracket model for the Jones
polynomial as discussed in Section 4. Sections 5 and 6 provide an introduction to
Vassiliev invariants and the remarkable relationship between Lie algebras and knot
theory. The idea for the bracket model and its generalizations is to regard the knot
itself as a discrete physical system and to obtain information about its topology by
averaging over the states of the system. In the case of the bracket model this
summation is finite and purely combinatorial. Transpositions of this idea occur
throughout, involving ideas from quantum mechanics (Sections 7 and 8) and quantum
field theory (Section 9). In this way knots have become a testing ground not only
for topological ideas, but also for the methods of modern theoretical physics.

This article concentrates on the construction of invariants of knots and the
relationships of these invariants to other mathematics (such as Lie algebras) and to
physical ideas (quantum mechanics and quantum field theory). There is also a rich
vein of knot theory that considers a knot as a physical object in three-dimensional
space. Then one can put electrical charge on a knot and watch (in a computer) the
knot repel itself to form beautiful shapes in three dimensions. Or one can think of
a knot as made of thick rope and ask for an “ideal” form of the knot with minimal
length-to-diameter ratio. This idea of physical knots is a current topic of
research.

A. Introduction

For this section it is recommended that the reader obtain a length of soft rope for
the sake of direct experimentation.

We begin by making some knots. In particular, we shall look at the bowline, a most
useful knot. The bowline is widely used by persons who need to tie a horse to a post
or their boat to a dock. It is easy and quick to make, holds exceedingly well, and
can be undone in a jiffy. Figure 1 gives instructions for making the bowline. In
showing the bowline we have drawn it loosely. To use it, one grabs the lower loop
and pulls it tight by the upper line shown in the drawing. It tightens while
maintaining the given size of the loop. Nevertheless, the knot is easily undone, as
some experimentation will show.

The utility of a schema for drawing a knot is that the schema does not have to
indicate all the physical properties of the knot. It is sufficient that the schema
should contain the information needed to build the knot. Here is a remarkable use of
language. The language of the diagrams for knots implicitly contains all their
topological and physical properties, but this information may not be easily
available unless the “word is made flesh” in the sense of actually building the knot
from rope or cord.

Our aim is to get topological information about knots from their diagrams.
Topological information is information about a knot that does not depend upon the
material from which it is made and is not changed by stretching or

FIGURE 1 The bowline.
is topologically equivalent to two trefoils clasping one another, as shown in Fig. 6.

This deformation was discovered by making a bowline in a length of rope, closing it
into a loop, and fooling about with the rope until the nice pair of clasped trefoils
appeared. Note that there is more than one way to close the bowline into a loop.
Figure 6 illustrates one choice. After discovering them, it took some time to find a
clear pictorial pathway from the closed-loop bowline to the clasped trefoils. The
pictorial pathway shown in Fig. 6 can be easily expanded to a full sequence of
Reidemeister moves. In this way the model of the knot in real rope is an analog
computer that can help to find sequences of deformations that would otherwise be
overlooked.

It is a curious reversal of roles that the original physical object of study becomes
a computational aid for getting insight into the mathematics. Of course this is
really a two-way street. The very close fit between the mathematical model for knots
and the topological properties of actual knotted rope is the key ingredient. Knots
are analogous to integers. Just as we believe that objects follow the laws of
arithmetic, we believe that the topological properties of knotted rope follow the
laws of knot topology.

II. INVARIANTS OF KNOTS AND LINKS, A FIRST PASS

We want to be able to calculate numbers (or bits of algebra such as polynomials)
from given link diagrams in such a way that these numbers do not change when the
diagrams are changed by Reidemeister moves. Numbers or polynomials of this kind are
called invariants of the knot or link represented by the diagram. If we produce such
invariants, then we are finding topological information about the knot or link. The
easiest example of such an invariant is the linking number of two curves, which
measures how many times one curve winds around another. In order to calculate the
linking number we orient the curves. This means that each curve is equipped with a
directional arrow, and we keep track of the direction of the arrow when the curve is
deformed by the Reidemeister moves. If the
curves A and B are represented by an oriented link diagram with two components,
attach a sign (+1 or −1) to each crossing as in Fig. 7. Then the linking number
Lk(A, B) is the sum of these signs over all the crossings of A with B.

Of course, two singly linked rings receive linking number equal to +1 or −1 as shown
in Fig. 8.

It can be shown that the linking number is invariant under the Reidemeister moves.
That is, if we take a given diagram D (representing the curves A and B) and change it
in the diagram and replace the coloring rule by a method for combining these labels.
It turns out that a good way to articulate such a rule of combination is to make the
label on one of the undercrossing arcs at a crossing a product (in the sense of this
new mode of combination) of the labels of the other two arcs. In fact, we shall
assume that this product operation depends upon the orientation of the arcs as shown
in Fig. 15.

In Fig. 15 we show how a label A on an undercrossing arc combines with a label B on
an overcrossing arc to form C = A∗B or C = A#B depending upon whether the
overcrossing arc is oriented to the left or to the right for an observer facing the
overcrossing line and standing on the arc labeled A.

This operation depends upon the orientation of the line labeled B, so that A∗B
corresponds to B pointing to the right for an observer approaching the crossing
along A, and A#B corresponds to B pointing to the left for the same observer. All of
this is illustrated in Fig. 15.

The binary operations ∗ and # are not necessarily associative. For example, our
original color assignments of R (red), B (blue), and P (purple) for the trefoil knot
correspond to products R∗R = R, B∗B = B, P∗P = P, R∗B = P,
be handled just like the three-coloring that we have already studied. In particular,
a given labeling of a knot diagram means that it is possible to label (satisfying
the rules given above for the labels) any diagram that is related to it by a
sequence of Reidemeister moves. However, not all the labels will necessarily appear
on every related diagram, and for a given coloring scheme and a given knot, certain
special restrictions can arise.

To illustrate this, consider the color rule for numbers: A∗B = A#B = 2B − A. This
satisfies the axioms, as is easy to see. Figure 17 shows how, on the trefoil, such a
coloring must obey the equations A∗B = C, C∗A = B, B∗C = A. Hence 2B − A = C,
2A − C = B, 2C − B = A. For example, if A = 0 and B = 1, then C = 2B − A = 2 and
A = 2C − B = 4 − 1 = 3. We need 3 = 0. Hence this system of equations will be
satisfied for appropriate labelings in Z/3Z, the integers modulo three, a modular
number system.

For the reader unfamiliar with the concept of modular number system, consider a
standard clock whose dial is labeled with the hours 1, 2, 3, . . . , 11, 12. We ask
what time is it 4 hr past the hour of 10? The answer is 2, and one can say that in
the arithmetic of this clock 10 + 4 = 2. In
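The trefoil equations above can be checked by brute force. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the three equations A∗B = C, C∗A = B, B∗C = A are taken directly from the text, and a labeling is called nontrivial here when it uses more than one label.

```python
def combine(a, b, modulus):
    """Color rule from the text, A*B = A#B = 2B - A, reduced modulo `modulus`."""
    return (2 * b - a) % modulus


def trefoil_labelings(modulus):
    """All (A, B, C) with A*B = C, C*A = B and B*C = A modulo `modulus`."""
    found = []
    for a in range(modulus):
        for b in range(modulus):
            c = combine(a, b, modulus)
            if combine(c, a, modulus) == b and combine(b, c, modulus) == a:
                found.append((a, b, c))
    return found


def nontrivial(labelings):
    """Keep only labelings that use more than one label."""
    return [lab for lab in labelings if len(set(lab)) > 1]


mod3 = nontrivial(trefoil_labelings(3))  # includes the example (0, 1, 2)
mod5 = nontrivial(trefoil_labelings(5))  # empty: modulo 5 only constant labelings work
clock = (10 + 4) % 12                    # clock arithmetic: 4 hr past 10 is 2
```

Modulo 3 every choice of distinct A and B extends to a valid labeling, while modulo 5 the equations force A = B = C, which reflects the requirement 3 = 0 found in the text.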
coloring number of a knot or link K is the least number of colors (greater than 1)
needed to color it in the 2B − A fashion for any diagram of K. It is a nice exercise
to verify that the coloring number of the figure eight knot is indeed four. In
general the coloring number of a knot or link is not easy to determine. This is an
example of a topological invariant that has subtle combinatorial properties.

Other knots and links mentioned in this section can be shown to be knotted and
linked by the modular method. The reader should try it for the Borromean rings and
the Whitehead link.

The coloring (labeling) rules as we have formalized them can be described as axioms
for an algebra associated with the knot. This is called the quandle. It has been
generalized to the crystal, the interlock algebra, and the rack. The quandle is
itself a generalization of the fundamental group of the knot complement.

C. The Alexander Polynomial

The modular labeling method has a marvellous generalization to the Alexander
polynomial of the knot. This comes about through generalized coloring rules
$A * B = tA + (1 - t)B$ and $A \# B = t^{-1}A + (1 - t^{-1})B$, where t is an
indeterminate. It is a nice exercise to verify that these rules satisfy the axioms
for the quandle. This algebraic structure is called the Alexander module.

The case t = −1 gives the rule 2B − A that we have already considered. By coloring
diagrams with arbitrary t, we obtain a polynomial that generalizes the modulus. This
polynomial is the Alexander polynomial.

Alexander (1923) described it differently in his original paper, and there is a
remarkable history to the development of this invariant. See, for example, Crowell
and Fox (1963) and Kauffman (1987b) for more information. The flavor of this
relationship can be seen by doing a little experiment in labeling the trefoil
diagram shown in Fig. 19. The circularity inherent in the knot diagram results in
relations that must be satisfied by the module action. In Fig. 19 we see directly by
labeling the diagram that if arc 1 is labeled 0 and arc 2 is labeled A, then
$(t + (1 - t)^2)A = 0$. In fact, $t + (1 - t)^2 = t^2 - t + 1$ is the Alexander
polynomial of the trefoil knot. The Alexander polynomial is an algebraic modulus for
the knot.

III. THE JONES POLYNOMIAL

Our next topic describes an invariant of knots and links of quite a different
character than the modulus or the Alexander polynomial of the knot. It is a
“polynomial” invariant of knots and links discovered by Jones (1985). Jones’
invariant, usually denoted $V_K(t)$, is a polynomial in the variable $t^{1/2}$ and
its inverse $t^{-1/2}$. One says that $V_K(t)$ is a Laurent polynomial in $t^{1/2}$.
Superficially, the Jones polynomial appears to be just another polynomial invariant
of knots and links, somewhat similar to the Alexander polynomial. When I say that
the Jones polynomial is of a different
character, I mean something deeper, and it will take a little while to explain this
difference. A little history will help.

The Alexander polynomial was discovered in the 1920s and until 1984 no one had found
another polynomial invariant of knots and links that was not a simple generalization
of the Alexander polynomial. Jones discovered a new polynomial invariant of knots
and links that had some very remarkable properties. The Alexander polynomial cannot
detect the difference between any knot and its mirror image. What made the Jones
polynomial such an exciting discovery for knot theorists was the fact that it could
detect the difference between many knots and their mirror images. Later, other
properties began to emerge. It became a key tool in proving properties of
alternating links (and generalizations) that had been conjectured since the last
century (see, e.g., Murasugi, 1987a, b; Kauffman, 1987a; and Thistlethwaite, 1987).

It turns out that the Jones polynomial is intimately related to a number of topics
in mathematical physics. Curiously, it is actually easier to define and verify the
properties of the Jones polynomial than for any other invariant in the theory of
knots (except of course the linking number). We shall devote this section to the
defining properties of the Jones polynomial, and later sections to the relationships
with physics.

Here is a set of axioms for the Jones polynomial. The polynomial was not discovered
in the form of these axioms. The axioms are in a format analogous to the framework
that Conway (1970; Kauffman, 1980, 1983) discovered for the Alexander polynomial. I
am starting with these axioms because they give a quick access to the polynomial and
to sample computations.

1. Axioms for the Jones Polynomial

1. If two oriented links $K$ and $K'$ are ambient isotopic, then
$V_K(t) = V_{K'}(t)$.
2. If U is an unknotted loop, then $V_U(t) = 1$.
3. If $K_+$, $K_-$, and $K_0$ are three links with diagrams that differ only as
shown in the neighborhood of a single crossing site for $K_+$ and $K_-$ (see
Fig. 20), then

$$t^{-1}V_{K_+}(t) - t\,V_{K_-}(t) = (t^{1/2} - t^{-1/2})\,V_{K_0}(t).$$

The axioms for $V_K(t)$ are a consequence of Jones’ original definition of his
invariant. He was led to this invariant by a trail that began with the study of von
Neumann algebras (a branch of algebra directly related to quantum theory and to
statistical mechanics) and ended in braids, knots, and links. The Jones polynomial
has a distinctly different flavor from the Conway–Alexander polynomial, even though
it can be axiomatized in a very similar way. In fact, this similarity of axiomatics
points to a common generalization [the Homfly(Pt) polynomial] and to another
generalization (the Kauffman polynomial) and then to further generalizations in the
connection with statistical mechanics (see, e.g., Kauffman, 1989).

To this date no one has found a knotted loop that the Jones polynomial does not
declare to be knotted. Thus one can make the following conjecture:

Conjecture. If a single-component loop K is knotted, then $V_K(t)$ is not equal to
one.

While it is possible that the Jones polynomial is able to detect the property of
being knotted, it is not a complete classifier for knots. There are inequivalent
pairs of knots that have the same Jones polynomial. Such a pair is shown in Fig. 21.
These two knots, the Kinoshita–Terasaka knot and the Conway knot, have the same
Jones polynomial but are different topologically. Incidentally these two knots are
examples of knots whose knottedness cannot be detected by the Alexander polynomial.

Let us use the axioms to compute the Jones polynomial for the trefoil knot. To this
end, there is a useful device called the skein tree. A skein tree is obtained from a
given knot or link diagram by recording the knots and links obtained from this
diagram by smoothing or switching crossings. Each node of the tree is a knot or
link. The nodes farthest from the original knot or link are unknotted or unlinked.
Such a tree can be produced from a given knot or link by using the fact that any
knot or link diagram can
be transformed into an unknotted (unlinked) diagram by a sequence of crossing
switches.

Figure 22 illustrates a “standard unknot diagram.” This diagram is drawn by starting
at the arrowhead in the figure and tracing the diagram in such a way that one always
draws an overcrossing before drawing an undercrossing. This is the easiest possible
knot diagram to draw since one never has to make any corrections: one just passes
under when one wants to cross an already created line in the diagram. Standard
unknot diagrams are always unknotted. Trying the one in Fig. 22 will show why this
is so.

FIGURE 21 (Top) Conway knot. (Bottom) Kinoshita–Terasaka knot. Two knots with trivial Alexander polynomial.
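The way the three axioms determine the Jones polynomial through a skein tree can be sketched numerically. The Python fragment below is a hedged illustration rather than the article's own computation: it assumes a trefoil diagram with three positive crossings, for which switching one crossing gives an unknot and smoothing it gives a Hopf link, and it evaluates the exchange relation exactly at rational sample values of $t^{1/2}$ (the variable `s`) instead of manipulating Laurent polynomials. The function name and the closed forms used for comparison are assumptions made for this sketch.

```python
from fractions import Fraction


def jones_values_from_skein(s):
    """Evaluate the skein recursion at t = s**2, where s plays the role of t**(1/2)."""
    t = s * s
    z = s - 1 / s           # t^(1/2) - t^(-1/2)
    v_unknot = Fraction(1)  # axiom 2
    # Curl diagram: K+ and K- are both unknots, K0 is the two-component unlink.
    v_unlink = (v_unknot / t - t * v_unknot) / z
    # K+ = Hopf link, K- = unlink, K0 = unknot.
    v_hopf = t * (t * v_unlink + z * v_unknot)
    # K+ = trefoil, K- = unknot, K0 = Hopf link.
    v_trefoil = t * (t * v_unknot + z * v_hopf)
    return v_unlink, v_hopf, v_trefoil


checks = []
for s in (Fraction(3, 2), Fraction(2), Fraction(5, 3)):
    t = s * s
    v_unlink, v_hopf, v_trefoil = jones_values_from_skein(s)
    checks.append((
        v_unlink == -(s + 1 / s),           # -(t^(1/2) + t^(-1/2))
        v_hopf == -(s ** 5) - s,            # -t^(5/2) - t^(1/2)
        v_trefoil == t + t ** 3 - t ** 4,   # t + t^3 - t^4
    ))
```

Under these assumptions the recursion reproduces the value t + t³ − t⁴ for this trefoil. The mirror-image trefoil would give the same expression with t replaced by 1/t.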
with labels A and B as shown in Fig. 25. (In the A state the regions swept out by a
counterclockwise turn of the overcrossing line are joined. In the B state the
regions swept out by a clockwise turn of the overcrossing line are joined.)

A state S of a diagram K consists in a choice of local state for each crossing of K.
Thus a diagram with N crossings will have $2^N$ states. Two states $S$ and $S'$ of
the trefoil diagram are indicated in Fig. 26. States are evaluated in two ways.
These ways are denoted by $\langle K|S\rangle$ and by $\|S\|$.

The norm of the state S, $\|S\|$, is defined to be one less than the number of
closed curves in the plane described by S. In the example in Fig. 26, we have
$\|S\| = 1$ and $\|S'\| = 0$.

The evaluation $\langle K|S\rangle$ is defined to be the product of all the state
labels (A and B) in the state. Thus, in Fig. 26, we have
$\langle K|S\rangle = A^3$ and $\langle K|S'\rangle = A^2B$.
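The count of states and their label products can be enumerated directly. The Python sketch below is a minimal illustration: the function names are hypothetical, a state is represented simply as a tuple of local choices 'A' or 'B' (one per crossing), and the loop structure of each state is not modeled because it depends on the diagram.

```python
from collections import Counter
from itertools import product


def enumerate_states(num_crossings):
    """Return every state as a tuple of local choices 'A' or 'B', one per crossing."""
    return list(product("AB", repeat=num_crossings))


def label_product(state):
    """Return the exponents (a, b) of the monomial A^a B^b that is the evaluation <K|S>."""
    return (state.count("A"), state.count("B"))


states = enumerate_states(3)  # a three-crossing diagram such as the trefoil
tally = Counter(label_product(s) for s in states)
```

For three crossings this gives 2³ = 8 states: one with product A³, three with A²B, three with AB², and one with B³.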
Taking variables A, B, and d, we define the state summation associated with a given
diagram K by the formula

$$\langle K\rangle = \sum_S \langle K|S\rangle\, d^{\|S\|}.$$

In other words, for each state we take the product of the labels for that state
multiplied by d raised to the norm of the state (one less than the number of loops
in the state). $\langle K\rangle$ is the summation of this state evaluation over all
the states in the diagram for K. (The notation $\sum_S$ means “sum over all
instances of S.”) We will show that the state summation $\langle K\rangle$ is
invariant under the second and third Reidemeister moves if we take B = 1/A and
$d = -(A^2 + B^2)$. A normalization then enables us to obtain invariance under all
three Reidemeister moves, and hence topological information about knots and links.
[See Kauffman (1987a) for more information about the bracket and its relationship
with the Jones polynomial.] There is a great deal of topological information in the
calculations that ensue from the bracket polynomial. In particular, one can
distinguish many knots from their mirror images, and it is possible that the bracket
calculation can detect whether a given diagram is actually knotted.

A. First Steps in Bracketology

The first constructions related to the bracket polynomial are quite elementary.
There are two basic formulas that are reminiscent of the exchange relations we have
already seen for the Jones polynomial. These formulas are
as shown in Fig. 27. Here the small diagrams indicate parts of larger diagrams that
are otherwise identical. Formula 1 just says that the state summation breaks up into
two sums with respect to a given crossing in the diagram. In one sum, we have made a
smoothing of type A at the crossing, while in the other sum we have made a smoothing
of type B. The factors of A and B indicated in the formula are the contributions to
the product of vertex weights from this crossing. All the rest of the two partial
sums can be interpreted as bracket evaluations of the smoothed diagrams.

Formula 2 in Fig. 27 just states that an extra, simple closed curve in a diagram
multiplies its bracket evaluation by the loop value d. Note that a single loop
receives the value 1.

With the help of these two formulas, we can compute some basic bracket evaluations.
Note that we have not yet specialized the variables A, B, and d. We shall analyze
just what specialization will produce an invariant of knots and links. The advantage
to having set up the definition of the bracket polynomial in this way is exactly
that we have a method of labeling link diagrams with algebra, and it is possible to
then adjust the evaluation so that it is invariant under Reidemeister moves. To this
end, the next lemma tells us how the general bracket behaves under a Reidemeister
move of type two. Essential diagrams for this lemma are in Fig. 28.

Lemma. Let $K$ be a given link diagram, and let $K'$ denote a diagram that is
obtained from $K$ by performing a type 2 Reidemeister move in the simplifying
direction (eliminating two crossings from $K$). Let $K''$ be the diagram obtained
from $K$ by replacing the site of the type 2 move by two arcs in the opposite
pattern to the form of the simplified site in $K'$. (The diagrams in Fig. 28
illustrate this construction.) Then

$$\langle K\rangle = AB\,\langle K'\rangle + (ABd + A^2 + B^2)\,\langle K''\rangle.$$

Proof. Consider the four local state configurations that are obtained from the
diagram K on the left-hand side of the equation, as illustrated in Fig. 28. The
formula follows from the fact that one of these states has coefficient AB and the
other three have the same underlying diagram and respective coefficients ABd (after
converting the loop to a value d), $A^2$, and $B^2$. This completes the proof of the
Lemma.

With the help of this lemma it is now obvious that if we choose B = 1/A and
$d + A^2 + B^2 = 0$, then $\langle K\rangle$ is invariant under the second
Reidemeister move.

Once this choice is made, the resulting specialized bracket is invariant under the
third Reidemeister move, as illustrated in Fig. 29.
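The effect of the specialization B = 1/A, d = −(A² + B²) on the lemma, and on the state summation itself, can be checked with exact rational arithmetic. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the loop counts used for the trefoil (2, 1, 2, and 3 loops for states with 0, 1, 2, and 3 smoothings of type B) are assumed for the standard three-crossing trefoil diagram. The text itself only confirms the first two of these through the norms given for Fig. 26.

```python
from fractions import Fraction


def specialize(a):
    """Return (A, B, d) with B = 1/A and d = -(A^2 + B^2)."""
    a = Fraction(a)
    b = 1 / a
    d = -(a * a + b * b)
    return a, b, d


def type2_coefficients(a, b, d):
    """Coefficients of <K'> and <K''> in the type 2 lemma."""
    return a * b, a * b * d + a * a + b * b


def trefoil_state_sum(a, b, d):
    """Sum of <K|S> d^{||S||} over the 8 states, grouped by the number j of B-smoothings."""
    multiplicity = {0: 1, 1: 3, 2: 3, 3: 1}  # number of states with j labels B
    loops = {0: 2, 1: 1, 2: 2, 3: 3}         # assumed loop counts (see lead-in)
    return sum(
        multiplicity[j] * a ** (3 - j) * b ** j * d ** (loops[j] - 1)
        for j in range(4)
    )


a, b, d = specialize(Fraction(3, 2))
coeff_simplified, coeff_opposite = type2_coefficients(a, b, d)
bracket = trefoil_state_sum(a, b, d)
expected = -a ** 5 - a ** -3 + a ** -7
```

With these choices the coefficient of the simplified diagram is 1 and the other coefficient vanishes, which is the invariance under the second move. The state sum agrees with −A⁵ − A⁻³ + A⁻⁷ at the sample point.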
with a simplifying move of type 1 available where the crossing that is to be removed
has type −1. Figure 30 illustrates the diagrams for K(+) and K(−).

Proof. See Fig. 30 for the behavior under type 1 moves. We have already verified the
other statements in this lemma.

B. Framing Philosophy, Twist and Writhe

Is it unfortunate that the bracket is not invariant under the first Reidemeister
move? No, it is fortunate! First of all, the matter is easy to fix by a little
adjustment: Let K be an oriented knot or link, and define the writhe of K, denoted
w(K), to be the sum of the signs of all the crossings in K. Thus, the writhe of the
right-handed trefoil knot is three. The writhe has the following behavior under
Reidemeister moves:

(i) w(K) is invariant under the second and third Reidemeister moves.
(ii) w(K) changes by plus or minus one under the first Reidemeister move:

$$w(K(+)) = w(K) + 1, \qquad w(K(-)) = w(K) - 1.$$

(Here we use the notation of the previous lemma as shown in Fig. 30.)

Thus the writhe behaves in a parallel way to the bracket on the type 1 moves, and we
can combine writhe and bracket to make a new calculation that is invariant under all
three Reidemeister moves. We call the fully invariant calculation the
“f-polynomial” and define it by the equation

$$f_K(A) = (-A^3)^{-w(K)}\,\langle K\rangle(A).$$

Up to this normalization, the bracket gives a model for the original Jones
polynomial. The precise relationship is that $V_K(t) = f_K(t^{-1/4})$, where w(K) is
the sum of the crossing signs of the oriented link K, and $\langle K\rangle$ is the
bracket polynomial obtained by ignoring the orientation of K.

We shall return to this relationship with the Jones polynomial in a moment, but
first a little extra mathematical philosophy: Another way to view the fact of the
bracket’s lack of invariance under the first Reidemeister move is to see that the
bracket is an invariant of knotted and linked bands embedded in three-dimensional
space. Regard a link diagram as shorthand for an embedding of bands as shown in
Fig. 31.

Figure 31 illustrates a link diagram for the trefoil knot in a thick, dark mode of
drawing. This diagram is juxtaposed
with a drawing of a knotted band that parallels that knot diagram. The band has two
boundary components that proceed (mostly in the plane) parallel to one another. The
curl in the knot diagram becomes a flat curl in the band that is ambient isotopic to
a full twist (two half-twists) in the band. This isotopy is indicated in Fig. 31.
The top of Fig. 31 shows a full twist in a band and two flat curls that both give
rise to this same full twist by ambient isotopy that leaves their ends fixed. Each
component of a link diagram is replaced by a paralleled version: the analog of a
ribbon-like strip of paper attached to itself with an even number of half-twists.
The first Reidemeister move no longer applies to this shorthand since we can, at
best, replace a curl by a twist as shown in Fig. 31.

In fact, as Fig. 31 shows, there are two distinct curls corresponding to a single
full twist of a band. The bracket (and the writhe) behave the same way on both of
these twists. This means that we can reinterpret the bracket as an invariant of the
topological embeddings of knotted, linked, and twisted bands in three-dimensional
space. This means
image of the knot that would ensue if the plane on which the knot is drawn were a mirror. It is
easy to see that $\langle K^{*} \rangle(A) = \langle K \rangle(A^{-1})$ and that
$f_{K^{*}}(A) = f_{K}(A^{-1})$. Thus, if $K$ is ambient isotopic to $K^{*}$ (all three
Reidemeister moves allowed), then

\[ f_{K}(A) = f_{K^{*}}(A) = f_{K}(A^{-1}). \]

Returning to the evaluation of the $f$-invariant for the trefoil, note that $f_{T}(A^{-1})$ is
not equal to $f_{T}(A)$. Therefore, the trefoil knot $T$ and its mirror image $T^{*}$ are
topologically distinct. The proof that we have given for it is the simplest proof known to this
author. Note that we have given a complete proof of this fact, starting with the Reidemeister
moves, constructing and applying the bracket invariant.
A knot is said to be chiral if it is not ambient isotopic to its mirror image. The words chiral
and chirality come from physical chemistry and natural science. A knot that is equivalent to its
mirror image is said to be achiral (or amphicheiral in the speech of knot theorists). Many knots
are achiral. The reader may enjoy verifying that the figure eight knot shown in Fig. 33 is
ambient isotopic to its mirror image.
A complete understanding of the problem of determining whether a knot is chiral remains in the
far distance. The new invariants of knots and links have enhanced our understanding of this
difficult question.

contributions of $(-A^{3})$ or $(-A^{3})^{-1}$, one from each crossing and depending upon the
sign of the crossing. Thus we can write an oriented state expansion formula for $f_{K}$ as shown
below, where $K_{+}$ and $K_{-}$ denote links with corresponding sites with oriented crossings,
$K_{0}$ is the result of smoothing the crossing in an oriented fashion, and $K_{\infty}$ is the
result of smoothing the crossing against the orientation:

\[ f_{K_{+}} = (-A^{3})^{-1} A\, f_{K_{0}} + (-A^{3})^{-1} A^{-1} f_{K_{\infty}}. \]

Hence,

\[ f_{K_{+}} = -A^{-2} f_{K_{0}} - A^{-4} f_{K_{\infty}} \]

and similarly, for a negative crossing,

\[ f_{K_{-}} = -A^{2} f_{K_{0}} - A^{4} f_{K_{\infty}}. \]

Letting $V_{K}(t) = f_{K}(t^{-1/4})$, we have

\[ V_{K_{+}} = -t^{1/2} V_{K_{0}} - t\, V_{K_{\infty}}, \]
\[ V_{K_{-}} = -t^{-1/2} V_{K_{0}} - t^{-1} V_{K_{\infty}}. \]

Therefore,

\[ t^{-1} V_{K_{+}} - t\, V_{K_{-}} = (t^{1/2} - t^{-1/2}) V_{K_{0}}. \]

We leave the rest of the verification that $V_{K}(t)$ is the Jones polynomial (see Section 3) to
the reader (check that it has the right behavior on unknotted loops).
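The step from the two crossing formulas to the final exchange identity is a short cancellation,
and it can be spot-checked numerically. The following is a minimal Python sketch, not part of the
original derivation: the function names and the sample values chosen for $t$ and for the
polynomial values at the two smoothings are arbitrary illustrative assumptions.

```python
import cmath

def v_plus(t, v0, vinf):
    """Positive-crossing formula: -t^(1/2) * V_K0 - t * V_Kinf."""
    return -cmath.sqrt(t) * v0 - t * vinf

def v_minus(t, v0, vinf):
    """Negative-crossing formula: -t^(-1/2) * V_K0 - t^(-1) * V_Kinf."""
    return -v0 / cmath.sqrt(t) - vinf / t

def skein_defect(t, v0, vinf):
    """Absolute difference between the two sides of
    t^(-1) V_K+ - t V_K- = (t^(1/2) - t^(-1/2)) V_K0."""
    lhs = v_plus(t, v0, vinf) / t - t * v_minus(t, v0, vinf)
    rhs = (cmath.sqrt(t) - 1 / cmath.sqrt(t)) * v0
    return abs(lhs - rhs)

# Arbitrary sample points (t, V_K0, V_Kinf); any nonzero t would do.
samples = [(2.0, 1.5, -0.7), (0.3, -2.0, 4.0), (1.7 + 0.4j, 0.5j, 2.0 - 1.0j)]
max_defect = max(skein_defect(*s) for s in samples)
print(max_defect)  # expected to be at the level of floating-point rounding
```

The terms involving the smoothing against the orientation cancel exactly, which is why the
identity relates only the two crossings and the oriented smoothing.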
instance of this is the Conway polynomial $C_{K}(z)$ with its exchange identity

\[ C_{K_{+}} - C_{K_{-}} = z\, C_{K_{0}}. \]

Vassiliev (1990) gave new meaning to this sort of identity by thinking of the structure of the
entire space of all mappings of a circle into three-dimensional space. This space of mappings
includes mappings with singularities where two points on a curve touch. He interpreted the
equation

\[ Z_{K_{+}} - Z_{K_{-}} = Z_{K_{\#}} \]

as describing the difference of values across a singular embedding $K_{\#}$, where $K_{\#}$ has
a transverse singularity in the knot space as illustrated in Fig. 34. (In a transverse
singularity the curve touches itself along two different directions.)
The Vassiliev formula serves to define the value of the invariant on a singular embedding in
terms of the values on two knots “on either side” of this embedding. This Vassiliev formula
serves to describe a method of extending a given invariant of knots to a corresponding invariant
of embedded graphs with controlled singularities of this transverse type. This idea had been
considered before Vassiliev. Vassiliev carried out his program of analyzing the singular knot
space using techniques of algebraic topology, and in the course of this investigation he
discovered a key concept that had been completely overlooked in the context of graph invariants.
That concept is the idea of an invariant of finite type.

Definition. We shall say that $Z_{G}$ is an invariant of finite type $i$ if $Z_{G}$ vanishes for
all graphs with more than $i$ nodes.

This concept was extracted from Vassiliev’s work by Birman and Lin (1993). A (rigid vertex)
invariant of knotted graphs is a Vassiliev invariant of finite type $i$ if it satisfies the
identity

\[ Z_{K_{+}} - Z_{K_{-}} = Z_{K_{\#}} \]

and it is of finite type $i$. In rigid vertex isotopy the cyclic order at the vertex is
preserved, so that the vertex behaves like a rigid disk with flexible strings attached to it at
specific points.
Vassiliev invariants form an extraordinary class of knot invariants. It is an open problem
whether the Vassiliev invariants are sufficient to distinguish knots that are topologically
distinct.
Vassiliev began an analysis of the combinatorial conditions on graph evaluations that could
support such invariants. The key observation is the following result:

Lemma. If $Z_{G}$ is a Vassiliev invariant of finite type $i$, then $Z_{G}$ is independent of
the embedding of the graph $G$ when $G$ has $i$ vertices.

Proof. Suppose that $G$ is an embedded graph with $i$ nodes. If we switch a crossing in $G$ to
form $G'$, then the exchange relation for the Vassiliev invariant says that
$Z_{G} - Z_{G'} = Z_{G''}$, where $G''$ has one more node than $G$ or $G'$. But then $G''$ has
$i + 1$ nodes and hence $Z_{G''} = 0$. Therefore $Z_{G} = Z_{G'}$. This shows that we can switch
crossings in any embedding of $G$ without changing the value of $Z_{G}$. It follows from this
that $Z_{G}$ is independent of the embedding and depends only on the graph $G$. This completes
the proof of the lemma.

For a Vassiliev invariant of type $i$, there is important information in the values it takes on
graphs with exactly $i$ nodes. These evaluations do not depend upon the embedding type of the
graph. However, not just any such graphical evaluation will extend to give a topological
invariant of knots and graphs. There are necessary conditions.
Vassiliev found a version of these conditions through his analysis of the knot space and Stanford
(1996) discovered the beautiful topological meaning of these conditions in relation to the
switching identity. Stanford’s argument goes as follows: Consider a singular crossing that has an
arc from the diagram passing underneath it as shown in Fig. 35. Four crossing switches will take
that arc above the singular crossing and return the diagram to a position that is topologically
equivalent to its original position.
Each crossing switch gives an equation. There are four equations. Add them up and one gets an
identity among the values of the invariant on four diagrams. Call this the four-term relation.
This identity is illustrated in the second box in Fig. 35.
Now recall from the lemma we proved above that for a Vassiliev invariant of type $i$, the graphs
with $i$ nodes have values that are independent of their embeddings in three-dimensional space.
This means that at the top level (the $i$-noded graphs for a Vassiliev invariant of type $i$ will
be called the top level) the four-term relations will be relations among the evaluations of
abstract graphs. At the top level the four-term relations will be purely combinatorial conditions
related to the topology.
How shall we think of abstract four-valent graphs corresponding to singular embeddings of a knot?
An abstract knot is just a circle. An abstract singular knot is a circle with pairs of points
marked that become the singular points in the embedding. Indicate these paired points by arcs
between them. Call the resulting structure a chord diagram. See the example at the beginning of
Fig. 36.
In the language of the chord diagrams the four-term relation at the top level (see the discussion
of the top level in the paragraph above) becomes the equation shown in Fig. 36. This can be seen
by translating the relation in Fig. 35 into the language of chord diagrams. In Fig. 36 we
indicated parts of the chord diagram that are neighbors by showing an outer bracket connecting
them. Those sites that are neighbors can have no other chords between them. Otherwise there can
be many chords in these diagrams that are not indicated, just so long as the diagrams in the
equation for the four-term relation differ only as shown in the figure.
If one can write down a top-level evaluation of chord diagrams that satisfies the four-term
relation, then one has the raw data for a Vassiliev invariant. Such an evaluation of chord
diagrams is called a weight system for a Vassiliev invariant. By the theorems of Kontsevich
(1994) and Bar-Natan (1995), these raw data guarantee the existence of at least one invariant
that satisfies the top-level evaluation.
The world is rife with Vassiliev invariants. Birman and Lin (1993) showed directly that the Jones
polynomial and its generalizations give rise to Vassiliev invariants. In the case of the Jones
polynomial here is an easy proof of their result:

Theorem. Let $V_{G}(t)$ denote the Jones polynomial extended to rigid vertex 4-valent graphs by
the formula $V_{K_{+}} - V_{K_{-}} = V_{K_{\#}}$. Let $v_{i}(G)$ denote the coefficient of
$x^{i}$ in the expansion of $V_{G}(\exp(x))$. Then $v_{i}(G)$ is a Vassiliev invariant of type
$i$.

Proof. Use the identities from the end of Section 4:

\[ V_{K_{+}} = -t^{1/2} V_{K_{0}} - t\, V_{K_{\infty}}, \]
\[ V_{K_{-}} = -t^{-1/2} V_{K_{0}} - t^{-1} V_{K_{\infty}}. \]

Substitute $t = \exp(x)$. It follows at once that $V_{K_{\#}} = V_{K_{+}} - V_{K_{-}}$ is
divisible by $x$. Hence $V_{G}$ is divisible by $x^{i}$ when $G$ has $i$ nodes. This implies
that the coefficients $v_{i}(G) = 0$ if $G$ has more than $i$ nodes. Hence the coefficients
$v_{i}(G)$ are of finite type, proving the theorem.

With the help of theorems of this type it is possible to study Vassiliev invariants by studying
the structure of known invariants of knots and links. In particular it is possible to justify the
structure of many weight systems in terms of known invariants. We shall not go into these sorts
of investigations in this exposition. The next section shows how the algebraic study of Lie
algebras is directly related to the construction of Vassiliev invariants. This is one beginning
of a whole world of relationships between knot theory and algebra.

VI. VASSILIEV INVARIANTS AND LIE ALGEBRAS

The subject of Lie algebras is an algebraic study with a remarkable connection with the topology
of knots and links. The purpose of this section is to first give a brief introduction to the
concept of a Lie algebra and then to show the deep connection between these algebras and the
structure of Vassiliev invariants for knots and links, as described in the previous section.
In order to understand the idea behind a Lie algebra it is helpful to first consider the concept
of a group. A set $G$ is said to be a group if it has a single binary operation $*$ such that:

1. Given $a$ and $b$ in $G$, then $a * b$ is also in $G$.
2. If $a$, $b$, $c$ are in $G$, then $(a * b) * c = a * (b * c)$.
3. There is an element $E$ in $G$ such that $E * a = a * E = a$ for all $a$ in $G$.
4. Given $a$ in $G$, there exists an element $a^{-1}$ in $G$ such that
   $a * a^{-1} = a^{-1} * a = E$.
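For a finite set, the four group axioms listed above can be checked by brute force. The following
minimal Python sketch is an illustration only and is not taken from the article: the helper name
`is_group` and the choice of the integers modulo 6 as a test case are assumptions made for the
example.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force test of the four group axioms on a finite set."""
    elements = list(elements)
    element_set = set(elements)

    # Axiom 1: closure under the operation.
    if any(op(a, b) not in element_set for a, b in product(elements, repeat=2)):
        return False

    # Axiom 2: associativity.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False

    # Axiom 3: existence of a two-sided identity element E.
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    identity = identities[0]

    # Axiom 4: every element has a two-sided inverse.
    return all(any(op(a, b) == identity and op(b, a) == identity for b in elements)
               for a in elements)

n = 6
z6_add = is_group(range(n), lambda a, b: (a + b) % n)  # addition modulo 6
z6_mul = is_group(range(n), lambda a, b: (a * b) % n)  # multiplication modulo 6
print(z6_add, z6_mul)
```

Addition modulo 6 passes all four tests, while multiplication modulo 6 fails the inverse axiom
because 0 has no inverse. The same failure occurs for the set of all square matrices under matrix
multiplication, where the zero matrix has no inverse.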
One of the most fertile sources of groups is matrix algebra. Recall that an $n \times n$ matrix
$A$ is an array of numbers $A_{ij}$ (real or complex), $A = (A_{ij})$, where $i$ and $j$ range in
value from 1 to $n$. One defines the product of two matrices by the formula
$(AB)_{ij} = \sum_{k} A_{ik} B_{kj}$, where $k$ runs from 1 to $n$ in this summation.
For our purposes it is essential to have a diagrammatic representation for matrix
multiplication. This
FIGURE 37
representation is illustrated in Fig. 37. Each matrix is represented by a labeled box with one
arrow that enters the box and one arrow that leaves the box. The entering arrow corresponds to
the left index $i$ in $A_{ij}$ and the right arrow corresponds to the right index $j$. In
multiplying two matrices $A$ and $B$ together to form $AB$ we tie the outgoing arrow of $A$ to
the ingoing arrow of $B$. By convention, an arrow that has no free ends connotes the summation
over all possible choices of index for that arrow.
Many facts about matrices become quite transparent in this notation. For example, the trace of
$A$, denoted $\mathrm{tr}(A)$, is the sum of the diagonal entries $A_{ii}$ where $i$ ranges from
one to $n$. The diagrammatic proof of the basic formula $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ is
illustrated in Fig. 38.
For a given value of $n$, we let $M_{n}(R)$ denote the set of all $n \times n$ matrices with
coefficients in the real numbers $R$. We let $A * B = AB$ denote the product of matrices and we
let $E$ denote the matrix whose entries are given by $E_{ii} = 1$ for all $i$ and $E_{ij} = 0$
if $i$ is not equal to $j$. With this choice of multiplication and identity element $E$,
$M_{n}(R)$ satisfies the first three axioms for a group. However, there are matrices $A$ that
have no inverse ($A^{-1}$ so that $A A^{-1} = E$). For
example the matrix 0, all of whose entries are zero, is a matrix without an inverse. Thus
$M_{n}(R)$ is not itself a group.
There is a criterion for a matrix to have an inverse. This is simply that the determinant
$\mathrm{Det}(A)$ should be nonzero. Thus the largest group of matrices of size $n \times n$
that we can devise is the set of all matrices $A$ such that $\mathrm{Det}(A)$ is nonzero. This
is called the general linear group and is denoted by $GL_{n}(R)$. There are many interesting
subgroups of this large group of matrices. One example is the group $SL(n)$ of all matrices with
determinant equal to one. We may also restrict to orthogonal matrices $A$ over $R$. These are
invertible matrices $A$ such that $A^{t} = A^{-1}$, where $A^{t}$ denotes the transpose of the
matrix $A$: $A^{t}_{ij} = A_{ji}$. The group of orthogonal matrices is denoted by $O(n)$.
The intersection of $O(n)$ and $SL(n)$ is denoted $SO(n)$. $SO(n)$ consists of the orthogonal
matrices of determinant equal to one. In the case $n = 2$, $SO(2)$ consists of rotations of the
plane that fix the origin, and in the case of $n = 3$, $SO(3)$ consists of rotations of
three-dimensional space about specified axes.
$SO(3)$ has a fascinating collection of finite subgroups including the symmetries of the
classical regular solids: the tetrahedron, the cube, the octahedron, the dodecahedron, and the
icosahedron. Ultimately, the matrix groups become a language for the precise expression of
symmetry.
We now ask when a matrix $A$ can be written in the form

\[ A = e^{B} = E + (1/1!)B + (1/2!)B^{2} + (1/3!)B^{3} + \cdots \]

for some other matrix $B$. Since $e^{B} = \lim (E + B/m)^{m}$, where the limit is taken as $m$
approaches infinity, we can regard $(E + B/m)$, for $m$ large, as an “infinitesimal” version of
the matrix $A$, and one refers to $B$ as an “infinitesimal generator” for $A$. It is interesting
and mathematically significant to compare the algebraic properties of $A$ and $B$. The key
property for this comparison is the determinant equation

\[ \mathrm{Det}(e^{B}) = e^{\mathrm{tr}(B)}, \]

where $\mathrm{tr}(B)$ denotes the trace of $B$. (One way to prove this identity is to use the
Jordan canonical form for the matrix and the fact that similar matrices have the same trace and
determinant.)
For example, if $\mathrm{Det}(e^{B}) = 1$, then we need that $\mathrm{tr}(B) = 0$. This means
that elements of $SL(n)$ are the exponentials of matrices with trace equal to zero.
Let $sl(n)$ denote the set of $n \times n$ matrices with trace equal to zero. The set $sl(n)$ is
not closed under matrix multiplication, but it is closed under the Lie bracket (or commutator)
operation $[B, C] = BC - CB$.
If $\mathrm{tr}(B) = \mathrm{tr}(C) = 0$, then

\[ \mathrm{tr}[B, C] = \mathrm{tr}(BC - CB) = \mathrm{tr}(BC) - \mathrm{tr}(CB)
   = \mathrm{tr}(BC) - \mathrm{tr}(BC) = 0 \]

(since $\mathrm{tr}(BC) = \mathrm{tr}(CB)$ for any matrices $B$ and $C$). Thus, if $B$ and $C$
belong to $sl(n)$, then $[B, C]$ also belongs to $sl(n)$. This closure under the bracket
operation leads directly to the notion of a Lie algebra.

Definition. A Lie algebra is a vector space $L$ over a field $F$ that is closed under a binary
operation, called the Lie bracket and denoted by $[B, C]$ for $B$ and $C$ in $L$. The bracket is
assumed to satisfy the following axioms.

1. $[X, Y] = -[Y, X]$ for all $X$ and $Y$ in $L$.
2. $[aX + bY, Z] = a[X, Z] + b[Y, Z]$ for all $a$ and $b$ in $F$ and $X$, $Y$, $Z$ in $L$.
3. $[X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0$.

This last identity is called the Jacobi identity. It is easy to verify that the bracket
operation $[B, C] = BC - CB$ on the vector space of all $n \times n$ matrices over $F$ (e.g.,
$F = R$) satisfies the axioms given above. Thus, we have so far seen that $sl(n)$ is a Lie
algebra that is naturally associated with the group of matrices $SL(n)$. In fact, $sl(n)$
generates $SL(n)$ by exponentiation.
There is a general pattern. Each matrix group has its corresponding Lie algebra. The
classification of matrix groups is simplified by a corresponding classification of Lie algebras.
As a result, the Lie algebras are a subject in their own right. It has often happened that Lie
algebras are connected mathematically with subjects different from their original roots in group
theory.
In our context the Lie algebras turn out to be related to the formation of weight systems for
Vassiliev invariants. One way to see this is to just take the case of matrix Lie algebras with
commutator brackets and interpret diagrammatically the formula that states that the Lie algebra
is closed under the bracket operation. This formula states that there is a basis
$\{T^{1}, T^{2}, \ldots, T^{m}\}$ for the Lie algebra as a vector space over $F$ such that each
$T^{a}$ is an $n \times n$ matrix and such that

\[ T^{a} T^{b} - T^{b} T^{a} = f^{ab}_{c} T^{c}, \]

where $f^{ab}_{c}$ is a set of constants in $F$ depending on the three indices $a$, $b$, $c$
(running from 1 to $m$, the number of basis elements). The right-hand side of this equation
connotes a summation over all values of the index $c = 1, \ldots, m$. The left-hand side is the
commutator of $T^{a}$ and $T^{b}$ for any given choice of $a$ and $b$.
In Fig. 39 we diagram this equation using the conventions for diagrammatic matrix multiplication
explained in this section. The structure constants $f^{ab}_{c}$ are represented by a graphical
vertex with three lines attached to it, one for $a$, one for $b$, and one for $c$. For the
purpose of discussion,
we shall assume that $f^{ab}_{c}$ is dependent only on the cyclic order of $abc$. It is
convenient to regard the graphical vertex as representing a “tensor” that has this cyclic
invariance since this means that we can slide the diagram for the structure constant tensor
around in the plane so long as we keep the cyclic order of its legs unchanged. Such bases can be
obtained in many cases of matrix Lie algebras, and the results that we outline can be
generalized in any case.
Figure 40 shows a formal version of the commutator relation of Fig. 39, except that the labels
and indices have been removed and the boxes for matrix elements have been replaced by graphical
vertices. Imagine that the terms in this formal version of the commutator relation are parts of
chord diagrams as illustrated with examples in this figure. In other words, recall the method of
chord diagrams from the last section and imagine that along with the chords there are also
trivalent graphical vertices among the chords, and that these vertices are related to
commutators as shown in the figure. Finally, Fig. 41 shows a formal derivation of the four-term
relation for chord diagrams from the diagrammatic commutator identity. This means that the
four-term relation that we derived from topological considerations in the last section is
intimately related to the basic structure of a Lie algebra. This is the essence of the
relationship of Vassiliev invariants with Lie algebras and their generalizations.
Concretely, the relationship we have just described means that it is possible to construct
weight systems for Vassiliev invariants by using matrix Lie algebras. To see how this works see
Fig. 42. Here we indicated a chord diagram $D$ and a corresponding diagram involving matrices
$T^{a}$ from a Lie algebra basis. The second diagram represents the sum of traces
$\mathrm{wt}(D) = \mathrm{tr}(T^{a} T^{b} T^{c} T^{a} T^{b} T^{c})$, where we are summing over
all values for the indices $a$, $b$, and $c$. This second diagram represents the weight
$\mathrm{wt}(D)$ that is assigned to the first diagram. It follows from our considerations that
this weight system satisfies the four-term relation and hence, by the theorem of Kontsevich, is
the top-row evaluation for a Vassiliev invariant.
This section has sketched the amazing and deep connection between Lie algebras and invariants of
knots and links. The territory is even more surprising as one explores it further. First of all,
it should be clear from what we have said that what is really needed here is an appropriate
generalization of Lie algebras. In fact, prior to the discovery of the Vassiliev invariants, a
very remarkable such generalization called “quantum groups” was discovered through work in
statistical mechanics and was applied to knot theory. It was already known that quantum groups
provided a strong connection between Lie algebras and their generalizations and invariants of
knots and links. Now the matter of finding all weight systems challenges the resources of
quantum groups and it is not known if all Vassiliev invariants can be built through the quantum
groups.
In the next few sections we shall discuss the physical background behind many of the
mathematical ideas discussed so far in this introduction to knot invariants.

VII. A QUICK REVIEW OF QUANTUM MECHANICS

To recall principles of quantum mechanics it is useful to have a quick historical
recapitulation. Quantum mechanics really got started when Louis de Broglie introduced the
fantastic notion that matter (such as an electron) is accompanied by a wave that guides its
motion and produces interference phenomena just like the waves on the surface of the ocean or
the diffraction effects of light going through a small aperture.
de Broglie’s idea was successful in explaining the properties of atomic spectra. In this domain,
his wave hypothesis led to the correct orbits and spectra of atoms, formally solving a puzzle
that had been only described in ad hoc terms by the preceding theory of Niels Bohr. In Bohr’s
theory of the atom, the electrons are restricted to move only in certain elliptical orbits.
These restrictions are placed in the theory to get agreement with the known atomic spectra,
and to avoid a paradox! The paradox arises if one thinks of the electron as a classical particle
orbiting the nucleus of the atom. Such a particle is undergoing acceleration in order to move in
its orbit. Accelerated charged particles emit radiation. Therefore the electron should radiate
away its energy and spiral into the nucleus! Bohr commanded the electron to only occupy certain
orbits and thereby avoided the spiral death of the atom, at the expense of logical consistency.
de Broglie hypothesized a wave associated with the electron and he said that an integral
multiple of the length of this wave must match the circumference of the electron orbit. Thus,
not all orbits are possible, only those where the wave pattern can “bite its own tail.” The
mathematics works out, providing an alternative to Bohr’s picture.
de Broglie had waves, but he did not have an equation describing the spatial distribution and
temporal evolution of these waves. Such an equation was discovered by Erwin Schrödinger.
Schrödinger relied on inspired guesswork, based on de Broglie’s hypothesis, and produced a wave
equation, known ever since as the Schrödinger equation. Schrödinger’s equation was enormously
successful, predicting fine structure of the spectrum of hydrogen and many other aspects of
physics. Suddenly a new physics, quantum mechanics, was born from this musical hypothesis of de
Broglie.
Along with the successes of quantum mechanics came a host of extraordinary problems of
interpretation. What is the status of this wavefunction of Schrödinger and de Broglie? Does it
connote a new element of physical reality? Is matter “nothing but” the patterning of waves in a
continuum? How can the electron be a wave and still have the capacity to instantiate a very
specific event at one place and one time (such as causing a bit of phosphor to glow there on a
television screen)? Max Born developed a statistical interpretation of the wavefunction wherein
the wave determines a probability for the appearance of the localized particulate phenomenon
that one wanted to call an “electron.” In this story the wavefunction $\psi$ takes values in the
complex numbers and the associated probability is $\psi^{*}\psi$, where $\psi^{*}$ denotes the
complex conjugate of $\psi$. Mathematically, this is a satisfactory recipe for dealing with the
theory, but it leads to further questions about the exact character of the statistics. If
quantum theory is inherently statistical, then it can give no complete information about the
motion of the electron. In fact, there may be no such complete information available even in
principle. Electrons manifest as particles when they are observed in a certain manner and as
waves when they are observed in another, complementary manner. This is a capsule summary of the
view taken by Bohr, Heisenberg, and Born. Others, including de Broglie, Einstein, and
Schrödinger, hoped for a more direct and deterministic theory of nature.
As we shall see, the statistical nature of quantum theory has a formal side that can be
exploited to understand the topological properties of such mundane objects as knotted ropes in
space and spaces constructed by identifying the sides of polyhedra. These topological
applications of quantum mechanical ideas are exciting in their own right. They may shed light on
the nature of quantum theory itself.
In this section we review a bit of the mathematics of quantum theory. Recall the equation for a
wave

\[ f(x, t) = \sin[(2\pi/l)(x - ct)]. \]

With $x$ interpreted as the position and $t$ as the time, this function describes a sinusoidal
wave traveling with velocity $c$. We define the wave number $k = 2\pi/l$ and the frequency
$w = 2\pi c/l$, where $l$ is the wavelength. Thus we can write $f(x, t) = \sin(kx - wt)$. Note
that the velocity $c$ of the wave is given by the ratio of frequency to wave number, $c = w/k$.
de Broglie hypothesized two fundamental relationships: between energy and frequency, and between
momentum and wave number. These relationships are summarized in the equations

\[ E = \hbar w, \qquad p = \hbar k, \]

where $E$ denotes the energy associated with a wave and $p$ denotes the momentum associated with
the wave. Here $\hbar = h/2\pi$, where $h$ is Planck’s constant.
For de Broglie the discrete energy levels of the orbits of electrons in an atom of hydrogen
could be explained by restrictions on the vibrational modes of waves associated with the motion
of the electron. His choices for the energy and the momentum in relation to a wave are not
arbitrary. They are designed to be consistent with the notion that the wave or wave packet moves
along with the electron. That is, the velocity of the wave packet is designed to be the velocity
of the “corresponding” material particle.
It is worth illustrating how de Broglie’s idea works. Consider two waves whose frequencies are
very nearly the same. If we superimpose them (as a piano tuner superimposes a tuning fork with
the vibration of the piano string), then there will be a new wave produced by the interference
of the original waves. This new wave pattern will move at its own velocity, different from (and
generally smaller than) the velocity of the original waves. To be specific, let

\[ f(x, t) = \sin(kx - wt), \qquad g(x, t) = \sin(k'x - w't). \]

Let

\[ h(x, t) = \sin(kx - wt) + \sin(k'x - w't) = f(x, t) + g(x, t). \]

A little trigonometry shows that
\[ h(x, t) = 2\cos\{[(k - k')/2]x - [(w - w')/2]t\} \times \sin\{[(k + k')/2]x - [(w + w')/2]t\}. \]

If we assume that $k$ and $k'$ are very close and that $w$ and $w'$ are very close, then
$(k + k')/2$ is approximately $k$, and $(w + w')/2$ is approximately $w$. Thus $h(x, t)$ can be
represented by

\[ H(x, t) = 2\cos[(\delta k/2)x - (\delta w/2)t]\, f(x, t), \]

where $\delta k = k - k'$ and $\delta w = w - w'$. This means that the superposition $H(x, t)$
behaves as the waveform $f(x, t)$ carrying a slower moving “wave packet”
$G(x, t) = 2\cos[(\delta k/2)x - (\delta w/2)t]$. See Fig. 43.
Since the wave packet (seen as the clumped oscillations in Fig. 43) has the equation
$G(x, t) = 2\cos[(\delta k/2)x - (\delta w/2)t]$, we see that the velocity of this wave packet
is $v_{g} = \delta w/\delta k$. Recall that wave velocity is the ratio of frequency to wave
number. Now according to de Broglie, $E = \hbar w$ and $p = \hbar k$, where $E$ and $p$ are the
energy and momentum, respectively, associated with this wave packet. Thus we get the formula
$v_{g} = dE/dp$. In other words, the velocity of the wave packet is the rate of change of its
energy with respect to its momentum. Now this is exactly in accord with the well-known classical
laws for a material particle! For such a particle, $E = mv^{2}/2$ and $p = mv$. Thus
$E = p^{2}/2m$ and $dE/dp = p/m = v$.
It is this astonishing concordance between the simple wave model and the classical notions of
energy and momentum that initiated the beginnings of quantum theory.

A. Schrödinger’s Equation

Schrödinger answered the question, Where is the wave equation for de Broglie’s waves? Writing an
elementary wave in complex form

\[ \psi = \psi(x, t) = \exp[i(kx - wt)], \]

we see that we can extract de Broglie’s energy and momentum by differentiating:

\[ i\hbar\, \partial\psi/\partial t = E\psi \quad \text{and} \quad -i\hbar\, \partial\psi/\partial x = p\psi. \]

This led Schrödinger to postulate the identification of dynamical variables with operators so
that the first equation,

\[ i\hbar\, \partial\psi/\partial t = E\psi, \]

is promoted to the status of an equation of motion, while the second equation becomes the
definition of momentum as an operator:

\[ p = -i\hbar\, \partial/\partial x. \]

Once $p$ is identified as an operator, the numerical value of momentum is associated with an
eigenvalue of this operator, just as in the example above. In our example

\[ p\psi = \hbar k \psi. \]

In this formulation, the position operator is just multiplication by $x$ itself. Once we have
fixed specific operators for position and momentum, the operators for other
physical quantities can be expressed in terms of them. We (ψ ∗ denotes the complex conjugate of y) represents the
obtain the energy operator by substitution of the momen- probability of finding the “particle” (a particle is an ob-
tum operator in the classical formula for the energy: servable with local spatial characteristics) at a given point
in spacetime.
E = (1/2)mv 2 + V
E = p 2 /2m + V
B. Dirac Brackets
E = −(h 2 /2m) ∂ 2 /∂ x 2 + V.
We now discuss Dirac’s notation a | b (Dirac, 1958). In
Here V is the potential energy, and its corresponding op- this notation a| and |b are vectors and covectors, respec-
erator depends upon the details of the application. tively. a | b is the evaluation of a| by |b , hence it is a
With this operator identification for E, Schrodinger’s scalar, and in ordinary quantum mechanics it is a complex
equation number. One can think of this as the amplitude for the state
i h ∂ψ/∂t = −(h 2 /2m) ∂ 2 ψ/∂ x 2 + Vψ to begin in “a” and end in “b.” That is, there is a process
is an equation in the first derivatives of time and in second derivatives of space. In this form of the theory one considers general solutions to the differential equation and this in turn leads to excellent results in a myriad of applications.

In quantum theory, observation is modeled by the concept of eigenvalues for corresponding operators. The quantum model of an observation is a projection of the wavefunction into an eigenstate.

An energy spectrum $\{E_k\}$ corresponds to wavefunctions $\psi$ satisfying the Schrödinger equation such that there are constants $E_k$ with $E\psi = E_k\psi$. An observable (such as energy) $E$ is a Hermitian operator on a Hilbert space of wavefunctions. Since Hermitian operators have real eigenvalues, this provides the link with measurement for the quantum theory.

It is important to notice that there is no mechanism postulated in this theory for how a wavefunction is “sent” into an eigenstate by an observable. Just as mathematical logic need not demand causality behind an implication between propositions, the logic of quantum mechanics does not demand a specified cause behind an observation. This absence of an assumption of causality in logic does not obviate the possibility of causality in the world. Similarly, the absence of causality in quantum observation does not obviate causality in the physical world. Nevertheless, the debate over the interpretation of quantum theory has often led its participants into asserting that causality has been demolished in physics.

Note that the operators for position and momentum satisfy the equation $xp - px = \hbar i$. This corresponds directly to the equation obtained by Heisenberg on other grounds, stating that dynamical variables can no longer necessarily commute with one another. In this way, the points of view of de Broglie, Schrödinger, and Heisenberg came together, and quantum mechanics was born. In the course of this development, interpretations varied widely. Eventually, physicists came to regard the wavefunction not as a generalized wave packet, but as a carrier of information about possible observations. In this way of thinking $\psi^{*}\psi$

that can mediate a transition from state $a$ to state $b$. Except for the fact that amplitudes are complex valued, they obey the usual laws of probability. This means that if the process can be factored into a set of all possible intermediate states $c_1, c_2, \ldots, c_n$, then the amplitude for $a \to b$ is the sum of the amplitudes for $a \to c_i \to b$. Meanwhile, the amplitude for $a \to c_i \to b$ is the product of the amplitudes of the two subconfigurations $a \to c_i$ and $c_i \to b$. Formally we have

$$\langle a | b \rangle = \sum_{i} \langle a | c_i \rangle \langle c_i | b \rangle,$$

where the summation is over all the intermediate states $i = 1, \ldots, n$.

In general, the amplitude for mutually disjoint processes is the sum of the amplitudes of the individual processes. The amplitude for a configuration of disjoint processes is the product of their individual amplitudes.

Dirac’s division of the amplitudes into bras $\langle a|$ and kets $|b\rangle$ is done mathematically by taking a vector space $V$ (a Hilbert space, but it can be finite dimensional) for the bras: $\langle a|$ belongs to $V$. The dual space $V^{*}$ is the home of the kets. Thus $|b\rangle$ belongs to $V^{*}$ so that $|b\rangle$ is a linear mapping $|b\rangle : V \to C$, where $C$ denotes the complex numbers. We restore symmetry to the definition by realizing that an element of a vector space $V$ can be regarded as a mapping from the complex numbers to $V$. Given $\langle a| : C \to V$, the corresponding element of $V$ is the image of 1 (in $C$) under this mapping. In other words, $\langle a|(1)$ is a member of $V$. Now we have $\langle a| : C \to V$ and $|b\rangle : V \to C$. The composition $\langle a|\,|b\rangle = \langle a | b \rangle : C \to C$ is regarded as an element of $C$ by taking the specific value $\langle a | b \rangle(1)$. The complex numbers are regarded as the “vacuum,” and the entire amplitude $\langle a | b \rangle$ is a “vacuum-to-vacuum” amplitude for a process that includes the creation of the state $a$, its transition to $b$, and the annihilation of $b$ to the vacuum once more.

Dirac notation has a life of its own. Let $P = |y\rangle\langle x|$ and $\langle x|\,|y\rangle = \langle x | y \rangle$. Then

$$PP = |y\rangle\langle x|\,|y\rangle\langle x| = |y\rangle \langle x | y \rangle \langle x| = \langle x | y \rangle P.$$
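The two rules just stated, summation over intermediate states and the identity $PP = \langle x | y \rangle P$, can be checked numerically in a small finite-dimensional case. The following is only a sketch under stated assumptions: bras and kets are modeled as plain Python lists of complex numbers, the bracket is taken to be the bilinear pairing $\sum_i x_i y_i$ (no complex conjugation, in keeping with the use of $V$ and its dual above), and the helper names `bracket`, `outer`, `matmul`, `scale`, and `close` are illustrative choices, not part of the article.

```python
# Sketch: bras and kets as lists of complex numbers (an assumption made
# for illustration; the article works with an abstract space V and its dual).

def bracket(bra, ket):
    """Amplitude <bra|ket>, taken here as the bilinear pairing sum_i bra[i]*ket[i]."""
    return sum(b * k for b, k in zip(bra, ket))

def outer(ket, bra):
    """Operator |ket><bra| as a square matrix stored as a list of rows."""
    return [[k * b for b in bra] for k in ket]

def matmul(p, q):
    """Product of two square matrices stored as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, p):
    """Multiply every entry of the matrix p by the scalar c."""
    return [[c * entry for entry in row] for row in p]

def close(p, q, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(u - v) < tol for ru, rv in zip(p, q) for u, v in zip(ru, rv))

# Arbitrary sample vectors (hypothetical data).
x = [1 + 2j, 0.5j]
y = [2 - 1j, 3 + 0j]

# P = |y><x| satisfies PP = <x|y> P.
P = outer(y, x)
pp_matches = close(matmul(P, P), scale(bracket(x, y), P))

# Sum over intermediate states: with the standard basis c_1, c_2,
# sum_i <a|c_i><c_i|b> equals <a|b>.
a = [0.3 - 1j, 2 + 0j]
b = [1j, -1 + 4j]
basis = [[1, 0], [0, 1]]
total = sum(bracket(a, c) * bracket(c, b) for c in basis)

print(pp_matches, abs(total - bracket(a, b)) < 1e-12)
```

The standard basis is used for the intermediate states only because it makes the completeness of the set evident; any other complete set would serve equally well.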
P1: GRB/GWT P2: GPJ Final Pages
Encyclopedia of Physical Science and Technology EN008O-359 July 14, 2001 18:55
Knots 237
Up to a scalar multiple, $P$ is a projection operator. That is, if we let $Q = P/\langle x | y \rangle$, then

$$QQ = \frac{PP}{\langle x | y \rangle \langle x | y \rangle} = \frac{\langle x | y \rangle P}{\langle x | y \rangle \langle x | y \rangle} = \frac{P}{\langle x | y \rangle} = Q.$$

Thus, $QQ = Q$. In this language, the completeness of intermediate states becomes the statement that a certain sum of projections is equal to the identity: Suppose that $\sum_i |c_i\rangle\langle c_i| = 1$ (summing over $i$) with $\langle c_i | c_i \rangle = 1$ for each $i$. Then

$$\langle a | b \rangle = \langle a|\,|b\rangle = \langle a| \Big( \sum_i |c_i\rangle\langle c_i| \Big) |b\rangle = \sum_i \langle a|\,|c_i\rangle\langle c_i|\,|b\rangle = \sum_i \langle a | c_i \rangle \langle c_i | b \rangle.$$

Iterating this principle of expansion over a complete set of states leads to the most primitive form of the Feynman integral (Feynman and Hibbs, 1965). Imagine that the initial and final states $a$ and $b$ are points on the vertical lines $x = 0$ and $x = n + 1$, respectively, in the $x$–$y$ plane, and that $(c(k)_{i(k)}, k)$ is a given point on the line $x = k$ for $1 \le i(k) \le m$. Suppose that the sum of projectors for each intermediate state is complete. That is, we assume that the following sum is equal to one, for each $k$ from 1 to $n$:

$$|c(k)_1\rangle\langle c(k)_1| + \cdots + |c(k)_m\rangle\langle c(k)_m| = 1.$$

Applying the completeness iteratively, we obtain the following expression for the amplitude $\langle a | b \rangle$:

$$\langle a | b \rangle = \sum \langle a | c(1)_{i(1)} \rangle \langle c(1)_{i(1)} | c(2)_{i(2)} \rangle \cdots \langle c(n)_{i(n)} | b \rangle,$$

where the sum is taken over all $i(k)$ ranging between 1 and $m$, and $k$ ranging between 1 and $n$. Each term in this sum can be construed as a combinatorial path from $a$ to $b$ in the two-dimensional space of the $x$–$y$ plane. Thus the amplitude for going from $a$ to $b$ is seen as a summation of contributions from all the “paths” connecting $a$ to $b$.

Feynman used this description to produce his famous path integral expression for amplitudes in quantum mechanics. His path integral takes the form

$$\int dP \, \exp(iS),$$

where $i$ is the square root of minus one, the integral is taken over all paths from point $a$ to point $b$, and $S$ is the action for a particle to travel from $a$ to $b$ along a given path. For the quantum mechanics associated with a classical (Newtonian) particle the action $S$ is given by the integral along the given path from $a$ to $b$ of the difference $T - V$, where $T$ is the classical kinetic energy and $V$ is the classical potential energy of the particle.

The beauty of Feynman’s approach to quantum mechanics is that it shows the relationship between the classical and the quantum in a particularly transparent manner. Classical motion corresponds to those regions where all nearby paths contribute constructively to the summation. This classical path occurs when the variation of the action is null. To ask for those paths where the variation of the action is zero is a problem in the calculus of variations, and it leads directly to Newton’s equations of motion. Thus, with the appropriate choice of action, classical and quantum points of view are unified.

The drawback of this approach lies in the unavailability at the present time of an appropriate measure theory to support all cases of the Feynman integral.

To summarize, Dirac’s notation shows at once how the probabilistic interpretation for amplitudes is tied to the vector space structure of the space of states of the quantum mechanical system. Our strategy for bringing forth relations between quantum theory and topology is to pivot on the Dirac bracket. The Dirac bracket acts as an intermediary between notation and linear algebra. In a very real sense, the connection of quantum mechanics with topology is an amplification of Dirac notation.

The next two sections discuss how topological invariants in low-dimensional topology are related to amplitudes in quantum mechanics. In these cases the relationship with quantum mechanics is primarily mathematical. Ideas and techniques are borrowed. It is not yet clear what the effect of this interaction will be on the physics itself.

VIII. KNOT AMPLITUDES

At the end of the last section we said that the connection of quantum mechanics with topology is an amplification of Dirac notation. Consider first a circle in a spacetime plane with time represented vertically and space horizontally (Fig. 44). The circle represents a vacuum-to-vacuum process that includes the creation of two “particles” (Fig. 45) and their subsequent annihilation (Fig. 46). In accord with our previous description, we could divide the circle into two parts, creation (a) and annihilation (b), and consider the amplitude $\langle a | b \rangle$. Since the diagram for the creation of the two particles ends in two separate points, it is natural to take a vector space of the form $V \otimes V$ (the tensor product of $V$ with $V$) as the target for the bra and as the domain of the ket.

We imagine at least one particle property being catalogued by each dimension of $V$. For example, a basis of $V$ could enumerate the spins of the created particles. If $\{e_a\}$ is a basis for $V$, then $\{e_a \otimes e_b\}$ forms a basis for
In the simplest case cup and cap are represented by $2 \times 2$ matrices. The topological condition implies that these matrices are inverses of each other. Thus the problem of the existence of topological amplitudes is very easily solved for simple closed curves in the plane.

FIGURE 48 Cancellation.

Now we go to knots and links. Any knot or link can be represented by a picture that is configured with respect to a vertical direction in the plane. The picture will decompose into minima (creations), maxima (annihilations), and crossings of the two types shown below. (Here I consider knots and links that are unoriented. They do not have an intrinsic preferred direction of travel.) In Fig. 49, next to each of the crossings we have indicated mappings of $V \otimes V$ to itself, called $R$ and $\bar{R}$, respectively. These mappings represent the transitions corresponding to these elementary configurations.

FIGURE 49

That $R$ and $\bar{R}$ really must be inverses follows from the isotopy shown in Fig. 50 (this is the second Reidemeister move).

We now have the vocabulary of cup, cap, $R$, and $\bar{R}$. Any knot or link can be written as a composition of these fragments, and consequently a choice of such mappings determines an amplitude for knots and links. In order for such an amplitude to be topological we want it to be
invariant under the list of local moves on the diagrams shown in Fig. 51. These moves are an augmented list of the Reidemeister moves (see Fig. 4), adjusted to take care of the fact that the diagrams are arranged with respect to a given direction in the plane.

The equivalence relation generated by these moves is called regular isotopy. It is one move short of the relation known as ambient isotopy. The missing move is the first Reidemeister move shown in Fig. 4.

In the first Reidemeister move, a curl in the diagram is created or destroyed. Ambient isotopy (generated by all the Reidemeister moves) corresponds to the full topology of knots and links embedded in three-dimensional space. Two link diagrams are ambient isotopic via the Reidemeister moves if and only if there is a continuous family of embeddings in three dimensions leading from one link to the other. The moves give a combinatorial reformulation of the spatial topology of knots and links.
By ignoring the first Reidemeister move, we allow the possibility that these diagrams can model framed links, that is, links with a normal vector field or, equivalently, embeddings of curves that are thickened into bands. It turns out to be fruitful to study invariants of regular isotopy. In fact, one can usually normalize an invariant of regular isotopy to obtain an invariant of ambient isotopy. We have already discussed this phenomenon with the bracket polynomial in Section 4.

As the reader can see, we have already discussed the algebraic meaning of moves 0 and 2. The other moves translate into very interesting algebra. Move 3, when translated into algebra, is the famous Yang–Baxter equation. The Yang–Baxter equation occurred for the first time in problems related to exactly solved models in statistical mechanics. All the moves taken together are directly related to the axioms for a quasi-triangular Hopf algebra (“quantum group”). We shall not go into this connection here.

There is an intimate connection between knot invariants and the structure of generalized amplitudes, as we have described them in terms of vector space mappings associated with link diagrams. This strategy for the construction of invariants is directly motivated by the concept of an amplitude in quantum mechanics. It turns out that the invariants that can actually be produced by this means (that is, by assigning finite dimensional matrices to the caps, cups, and crossings) are incredibly rich. They encompass, at present, all of the known invariants of polynomial type (Alexander polynomial, Jones polynomial, and their generalizations).

It is now possible to indicate the construction of the Jones polynomial via the bracket polynomial as an amplitude, by specifying its matrices.

The cups and the caps are defined by $(M_{ab}) = (M^{ab}) = M$, where $M$ is the $2 \times 2$ matrix (with $ii = -1$)

$$M = \begin{pmatrix} 0 & iA \\ -iA^{-1} & 0 \end{pmatrix}.$$

Note that $MM = I$, where $I$ is the identity matrix. Note also that the amplitude for the circle is

$$\sum_{a,b} M_{ab} M^{ab} = \sum_{a,b} M_{ab} M_{ab} = \sum_{a,b} (M_{ab})^2 = (iA)^2 + (-iA^{-1})^2 = -A^2 - A^{-2}.$$

The matrix $R$ is then defined by the equation

$$R^{ab}_{cd} = A \, M^{ab} M_{cd} + A^{-1} \, \delta^{a}_{c} \, \delta^{b}_{d}.$$

Since, diagrammatically, we identify $R$ with a (right-handed) crossing, this equation can be written diagrammatically as

(crossing) $= A$ (cup and cap pair) $+ A^{-1}$ (two parallel strands).

Taken together with the loop value of $-A^2 - A^{-2}$,

(loop) $= -A^2 - A^{-2}$,

these equations can be regarded as a recursive algorithm for computing the amplitude. This algorithm is the bracket state model for the (unnormalized) Jones polynomial. This model can be studied on its own grounds as we have already done in Section 4.

IX. TOPOLOGICAL QUANTUM FIELD THEORY, FIRST STEPS

In order to further justify the idea of topology in relation to the amplification of Dirac notation, consider the following scenario. Let $M$ be a three-dimensional manifold; that is, a space that is locally homeomorphic to Euclidean three-dimensional space. Suppose that $F$ is a closed orientable surface inside $M$ dividing $M$ into two pieces $M_1$ and $M_2$. These pieces are 3-manifolds with boundary. They meet along the surface $F$. Now consider an amplitude $\langle M_1 | M_2 \rangle = Z(M)$. The form of this amplitude generalizes our previous considerations, with the surface $F$ constituting the distinction between the “preparation” $M_1$ and the “detection” $M_2$. This generalization of the Dirac amplitude $\langle a | b \rangle$ amplifies the notational distinction consisting of the vertical line of the bracket to a topological distinction in a space $M$. The amplitude $Z(M)$ will be said to be a topological amplitude for $M$ if it is a topological invariant of the 3-manifold $M$. Note that a topological amplitude does not depend upon the choice of surface $F$ that divides $M$.

From a physical point of view the independence of the topological amplitude from the particular surface that divides the 3-manifold is the most important property. An amplitude arises in the condition of one part of the distinction carved in the 3-manifold acting as “the observed” and the other part of the distinction acting as “the observer.” If the amplitude is to reflect physical (read topological) information about the underlying manifold, then it should not depend upon this particular decomposition into observer and observed. The same remarks apply to 4-manifolds and interface with ideas in relativity. We mention 3-manifolds because it is possible to describe many examples of topological amplitudes in three dimensions. The matter of four-dimensional amplitudes is a topic of current research. The notion that an amplitude be independent of the distinction producing it is prior to topology. Topological invariance
of the amplitude is a convenient and fundamental way to produce such independence.

This sudden jump to topological amplitudes has its counterpart in mathematical physics. Witten (1989) proposed a formulation of a class of 3-manifold invariants as generalized Feynman integrals taking the form $Z(M)$, where

$$Z(M) = \int dA \, e^{\frac{ik}{4\pi} S(M,A)}.$$

Here $M$ denotes a 3-manifold without boundary and $A$ is a gauge field (also called a gauge potential or gauge connection) defined on $M$. The gauge field is a one-form on $M$ with values in a representation of a Lie algebra. The group corresponding to this Lie algebra is said to be the gauge group for this particular field. In this integral the “action” $S(M, A)$ is taken to be the integral over $M$ of the trace of the Chern–Simons three-form $CS = A \wedge dA + (2/3) A \wedge A \wedge A$. (The product is the wedge product of differential forms.)

Instead of integrating over paths, the integral $Z(M)$ integrates over all gauge fields modulo gauge equivalence. This generalization from paths to fields is characteristic of quantum field theory. Quantum field theory was designed in order to accomplish the quantization of electromagnetism. In quantum electrodynamics the classical entity is the electromagnetic field. The question posed in this domain is to find the value of an amplitude for starting with one field configuration and ending with another. The analogue of all paths from point $a$ to point $b$ is “all fields from field $A$ to field $B$.”

Witten’s integral $Z(M)$ is, in its form, a typical integral in quantum field theory. In its content $Z(M)$ is highly unusual. The formalism of the integral and its internal logic support the existence of a large class of topological invariants of 3-manifolds and associated invariants of knots and links in these manifolds.

Invariants of 3-manifolds were initiated by Witten as functional integrals and at the same time defined in a combinatorial way by Reshetikhin and Turaev (1991). The Reshetikhin–Turaev definition proceeds in a way that is quite similar to the definition that we gave for the bracket model for the Jones polynomial in Section 1. It is an amazing fact that Witten’s definition seems to give the very same invariants. We are not in a position to go into the details of this correspondence here. However, one theme is worth mentioning: For $k$ large, the Witten integral is approximated by those gauge connections $A$ for which $S(M, A)$ has zero variation with respect to change in $A$. These are the so-called flat connections. It is possible in many examples to calculate this contribution via both the functional integral and by the combinatorial definition of Reshetikhin and Turaev. In all cases, the two methods agree (see, e.g., Freed and Gompf, 1991; Lawrence and Rozansky, 1997). This is one of the pieces of evidence in a puzzle that everyone expects will eventually justify the formalism of the functional integral.

In order to obtain invariants of knots and links from Witten’s integral, one adds an extra bit of machinery to the brew. The new machinery is the Wilson loop. The Wilson loop is an exponentiated version of integrating the gauge field along a loop $K$. We take this loop $K$ in three-space to be an embedding (a knot) or a curve with transversal self-intersections.

It is usually indicated by the symbolism $\mathrm{tr}\big(P e^{\oint_K A}\big)$, where $P$ denotes path-ordered integration, that is, we are integrating and exponentiating matrix-valued functions, and one must keep track of the order of the operations. The symbol tr denotes the trace of the resulting matrix.

With the help of the Wilson loop function on knots and links, Witten writes down a functional integral for link invariants in a 3-manifold $M$:

$$Z(M, K) = \int dA \, e^{\frac{ik}{4\pi} S(M,A)} \, \mathrm{tr}\big(P e^{\oint_K A}\big).$$

Here $S(M, A)$ is the Chern–Simons Lagrangian, as in the previous discussion.

If one takes the standard representation of the Lie algebra of SU(2) as $2 \times 2$ complex matrices then it is a fascinating exercise to see that the formalism of $Z(S^3, K)$ ($S^3$ denotes the three-dimensional sphere) yields the original Jones polynomial with the basic properties as discussed in Section 1. See Witten (1989) or Kauffman (1995) for discussions of this part of the heuristics.

This approach to link invariants crosses boundaries between different methods. There are close relations between $Z(S^3, K)$ and the invariants defined by Vassiliev (Kauffman, 1995), to name one facet of this complex crystal.

A. Links and the Wilson Loop

We shall now indicate an analysis of the formalism of this functional integral that reveals quite a bit about its role in knot theory. This analysis depends upon some key facts relating the curvature of the gauge field to both the Wilson loop and the Chern–Simons Lagrangian. To this end, let us recall the local coordinate structure of the gauge field $A(x)$, where $x$ is a point in three-space. We can write $A(x) = A^{a}_{k}(x) \, T_a \, dx^{k}$, where the index $a$ ranges from 1 to $m$ with the Lie algebra basis $\{T_1, T_2, T_3, \ldots, T_m\}$ and the index $k$ goes from 1 to 3. For each choice of $a$ and $k$, $A^{a}_{k}(x)$ is a smooth function defined on three-space. In $A(x)$ we sum over the values of repeated indices. The Lie algebra generators $T_a$ are actually matrices
corresponding to a given representation of an abstract Lie algebra.

B. Difference Formula

One can deduce a difference formula for the Witten invariants from the formal properties of the functional integral. Let $K_{+}$ and $K_{-}$ denote knots that differ at a single crossing with $+$ and $-$ signs, respectively, and $K_{\#\#}$ the result of replacing the crossing by a transverse singularity (i.e., with distinct tangent directions for the two local curve segments). We take $K_{\#}$ to denote the insertion of a graphical node at the transverse crossing, as we have done in our discussion of the Vassiliev invariant. The notation $K_{\#\#}$ indicates that the curve intersects itself in space at one point. Let $K_{\#\#} T_a T_a$ denote the result of placing the matrices of the Lie algebra basis into the Wilson line at the singular crossing as shown in Fig. 52.

These matrices become part of the big matrix product that generates the Wilson line. Then, up to order $(1/k)$ one has the difference relation

$$Z(K_{+}) - Z(K_{-}) = \frac{4\pi i}{k} \, Z(K_{\#\#} T_a T_a).$$

This formula is the key to unwrapping many properties of the knot invariants.

C. Graph Invariants and Vassiliev Invariants

Recall, from Section 5, that $V(G)$ is a Vassiliev invariant if

$$V_{K_{+}} - V_{K_{-}} = V_{K_{\#}}.$$

$V(G)$ is said to be of finite type $k$ if $V(G) = 0$ whenever $\#(G) > k$, where $\#(G)$ denotes the number of 4-valent nodes in the graph $G$. See Section 5.

With this definition in hand, let us return to the invariants derived from the functional integral $Z(K)$. We have that

$$Z(K_{+}) - Z(K_{-}) = \frac{4\pi i}{k} \, Z(K_{\#\#} T_a T_a).$$

This formula tells us that for the Vassiliev invariant associated with $Z$ we have

$$Z(K_{\#}) = \frac{4\pi i}{k} \, Z(K_{\#\#} T_a T_a).$$

Furthermore, if $V_j(K)$ denotes the coefficient of $(4\pi i/k)^{j}$ in the expansion of $Z(K)$ in powers of $(1/k)$, then the ambient difference formula implies that $(1/k)^{j}$ divides $Z(G)$ when $G$ has $j$ or more nodes. Hence $V_j(G) = 0$ if $G$ has more than $j$ nodes. Therefore $V_j(K)$ is a Vassiliev invariant of finite type. [This result was proved by Birman and Lin (1993) by different methods and by Bar-Natan (1995) by methods equivalent to ours.]

The fascinating thing is that the ambient difference formula, appropriately interpreted, actually tells us how to compute $V_k(G)$ when $G$ has $k$ nodes. This result is equivalent to the description of weight systems derived from Lie algebras that we described in Section 6. Thus the approach to link invariants via the functional integral motivates and explains the fundamental structure of Vassiliev invariants.
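The finite-type condition can be illustrated with a small computation. Applying the relation V(K#) = V(K+) − V(K−) at every node expresses the value on a graph with n nodes as a signed sum over the 2^n ways of resolving the nodes into crossings. The sketch below rests on loudly stated assumptions: a "diagram" is reduced to the tuple of signs chosen at the node sites, and `toy_v` is a hypothetical stand-in function, not an actual knot invariant. It is chosen only because it is a polynomial of degree 2 in the signs, so that its extension to graphs vanishes on more than two nodes, mimicking finite type 2.

```python
from itertools import product

def extend_to_nodes(v, n_nodes):
    """Value on a graph with n_nodes nodes, obtained by iterating
    V(K#) = V(K+) - V(K-) once per node.

    v takes a tuple of crossing signs (+1 or -1), one per node site.
    """
    total = 0
    for signs in product((+1, -1), repeat=n_nodes):
        # Each negative resolution contributes a factor of -1.
        weight = 1
        for s in signs:
            weight *= s
        total += weight * v(signs)
    return total

def toy_v(signs):
    """Hypothetical stand-in for an invariant (NOT a knot invariant):
    a degree-2 polynomial in the crossing signs."""
    return sum(signs) ** 2

# Values of the extended toy function on 1, 2, 3, and 4 nodes.
values = [extend_to_nodes(toy_v, n) for n in (1, 2, 3, 4)]
print(values)
```

In this toy setting the signed sum acts as an iterated finite difference, which annihilates any polynomial of lower degree than the number of nodes; that is the mechanism behind the vanishing required in the definition of finite type.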
This deep relationship between topological invariants in low-dimensional topology and quantum field theory in the sense of Witten’s functional integral is still in its infancy. There will be many surprises in the future as we discover that what has so far been uncovered is only the tip of an iceberg.

ACKNOWLEDGMENT

It gives me great pleasure to thank Vaughan Jones, Ed Witten, Nicolai Reshetikhin, Mario Rasetti, Sostenes Lins, Massimo Ferri, Lee Smolin, Louis Crane, David Yetter, Ray Lickorish, DeWitt Sumners, Hugh Morton, Joan Birman, John Conway, John Simon and Dennis Roseman for many conversations related to the topics of this paper. This research was partially supported by the National Science Foundation Grant DMS-2528707.

SEE ALSO THE FOLLOWING ARTICLES

QUANTUM MECHANICS • QUANTUM THEORY • STATISTICAL MECHANICS • TOPOLOGY, GENERAL

BIBLIOGRAPHY

Alexander, J. W. (1923). “Topological invariants of knots and links,” Trans. Am. Math. Soc. 20, 275–306.
Atiyah, M. F. (1990). “The Geometry and Physics of Knots,” Cambridge University Press, Cambridge.
Bar-Natan, D. (1995). “On the Vassiliev knot invariants,” Topology 34, 423–472.
Birman, J., and Lin, X. S. (1993). “Knot polynomials and Vassiliev’s invariants,” Invent. Math. 111, 225–270.
Conway, J. H. (1970). “An enumeration of knots and links and some of their algebraic properties.” In “Computational Problems in Abstract Algebra,” pp. 329–358, Pergamon Press, New York.
Crowell, R. H., and Fox, R. H. (1963). “Introduction to Knot Theory,” Ginn, Boston.
Dirac, P. A. M. (1958). “Principles of Quantum Mechanics,” Oxford University Press, Oxford.
Feynman, R., and Hibbs, A. R. (1965). “Quantum Mechanics and Path Integrals,” McGraw-Hill, New York.
Freed, D., and Gompf, R. (1991). “Computer calculations of Witten’s 3-manifold invariants,” Comm. Math. Phys. 41, 79–117.
Jones, V. F. R. (1985). “A polynomial invariant for links via von Neumann algebras,” Bull. Am. Math. Soc. 129, 103–112.
Kauffman, L. H. (1980). “The Conway polynomial,” Topology 20, 101–108.
Kauffman, L. H. (1983). “Formal Knot Theory,” Princeton University Press, Princeton, NJ.
Kauffman, L. H. (1987a). “State models and the Jones polynomial,” Topology 26, 395–407.
Kauffman, L. H. (1987b). “On Knots,” Princeton University Press, Princeton, NJ.
Kauffman, L. H. (1989). “Statistical mechanics and the Jones polynomial.” In “Proceedings of the 1986 Santa Cruz Conference on Artin’s Braid Group,” pp. 263–298, AMS, Providence, RI. [Reprinted in M. Rasetti, ed. (1990). “New Problems, Methods and Techniques in Quantum Field Theory and Statistical Mechanics,” pp. 175–222, World Scientific, Singapore.]
Kauffman, L. H. (1993). “Knots and Physics,” 2nd ed. World Scientific, Singapore.
Kauffman, L. H. (1995). “Functional integration and the theory of knots,” J. Math. Phys. 36, 2402–2429.
Kontsevich, M. (1994). “Feynman diagrams and low-dimensional topology.” First European Congress of Mathematics, Vol. II (Paris, 1992), 97–121, Progr. Math., 120, Birkhäuser, Basel.
Lawrence, R., and Rozansky, L. (1997). “Witten–Reshetikhin–Turaev invariants of Seifert manifolds,” Preprint.
Murasugi, K. (1987a). “The Jones polynomial and classical conjectures in knot theory,” Topology 26, 187–194.
Murasugi, K. (1987b). “Jones polynomials and classical conjectures in knot theory II,” Math. Proc. Camb. Phil. Soc. 102, 317–318.
Reidemeister, K. (1948, 1932). “Knotentheorie,” Chelsea, New York and Julius Springer, Berlin.
Reshetikhin, N. Y., and Turaev, V. (1990). “Ribbon graphs and their invariants derived from quantum groups,” Comm. Math. Phys. 127, 1–26.
Reshetikhin, N. Y., and Turaev, V. (1991). “Invariants of three-manifolds via link polynomials and quantum groups,” Invent. Math. 103, 547–597.
Stanford, T. (1996). “Finite-type invariants of knots, links and graphs,” Topology 35, 1027–1050.
Thistlethwaite, M. (1987). “A spanning tree expansion of the Jones polynomial,” Topology 26, 297–309.
Vassiliev, V. (1990). “Cohomology of knot spaces.” In “Theory of Singularities and Its Applications” (V. I. Arnold, ed.), pp. 23–69, AMS, Providence, RI.
Witten, E. (1989). “Quantum field theory and the Jones polynomial,” Comm. Math. Phys. 121, 351–399.
P1: GPT/MBG P2: GPJ/GKP P3: FJU/LOT QC: FJS/FYD Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN008A-379 June 29, 2001 15:17
Linear Optimization
C. Roos
Delft University of Technology/University of Leiden
I. Historical Background
II. The Simplex Method
III. Interior-Point Methods
IV. Related Topics
V. Further Extensions
point for which all inequality constraints in the LO-problem are satisfied strictly.
LO-relaxation The LO-problem that arises when the in-

$$\text{minimize} \quad p = \sum_{j=1}^{n} c_j x_j$$
or, in matrix form,

$$\begin{array}{ll} \text{minimize} & p = c^{T} x \\ \text{subject to} & Ax = b \\ & x \ge 0, \end{array} \qquad (1)$$

where $A$ is an $m \times n$ matrix and $x$, $b$, and $c$ are vectors of dimension $n$, $m$, and $n$, respectively. Here, $x \ge 0$ means that all entries of $x$ are nonnegative and the superscript $T$ refers to taking the transpose.

The relevance of the subject is due to the fact that real-life problems in many branches of applied sciences, such as economics, logistics, and engineering, can be modeled as LO-problems. As a result, LO is probably one of the most applied mathematical tools. The beauty of the subject is due to its rich mathematical theory, part of which has developed over the last twenty years.

I. HISTORICAL BACKGROUND

The field of LO¹ arose in the 1940s, due to important work of Dantzig, Kantorovich, Koopmans, Von Neumann, and Morgenstern. Dantzig was involved in the research project Scientific Computation of the Optimum Programs (SCOOP) at the U.S. Air Force. He visited Koopmans in June 1947 and told him of his work on linear models for the optimization of military operations. Koopmans immediately recognized the relevance of this approach for his economic models and made clear to Dantzig that the economists did not have an algorithm for solving such models. In the summer of 1947 Dantzig invented the Simplex method. Koopmans was the leader of a group of economists who developed the theory of optimal assignment of resources by using LO-models. For this work he received the Nobel prize, in 1975, together with Kantorovich.

¹ Historically, the field is named Linear Programming, or LP. This name was proposed shortly after World War II [Dantzig (1963)]. Since then the modern computer came to life and the word “programming” usually refers to the activity of writing computer programs. As a consequence, its use instead of the more natural word “optimization” gives rise to confusion [Williams (1990)].

In the autumn of 1947 another important meeting took place, when Dantzig visited Von Neumann. This meeting clarified the relation between Dantzig’s work and the game theory as developed by Von Neumann and Morgenstern. Dantzig also heard for the first time of Farkas’ lemma and the notion of duality in LO.

From a computational point of view the Simplex method is very simple. Only the elementary arithmetic operations addition, subtraction, division, and multiplication are performed on a rectangular table that initially contains the data of the LO-model. The number of operations, however, is so large that only relatively small models can be solved manually. Fortunately, the invention of the Simplex method coincided in time more or less with the invention of the electronic computer. The calculations could be automated, thus paving the way for large-scale applications. Oil companies were among the early users of the computer code MPS/360 for LO, which was released by IBM in 1966 and could run on the IBM 360 computers that were introduced around that time. At the end of the 1960s and in the early 1970s, other software packages for LO became available: MPSX, and later MPSX/370 of IBM, MPS III of Exxon, the UMPIRE system for the UNIVAX 1108, APX III for the CDC computers, LAMPS (of John Forrest), and LINDO. For the history of the implementations of the Simplex method the reader is referred to W. Orchard-Hays (1990).

From a theoretical point of view the Simplex method has some properties that deserve mentioning here, because they were important in the history of LO. First, the method may cycle, i.e., it may happen that after hours of calculations a tableau is generated that already occurred before during the calculations. This phenomenon is called cycling. For a mathematical algorithm this is disastrous. It implies that a computer program may run forever without terminating with a solution. Dantzig already recognized this danger and found a so-called cycle-breaking rule; as a consequence, the Simplex method always solves any LO-problem correctly.

A second property concerns the computational time. It may be expected that this time will grow with the size of the problem, i.e., with the number $n$ of variables as well as with the number $m$ of constraints. In practice the behavior is such that, as a rule of thumb, the computational time depends linearly on $n$ and $m$. Many researchers tried to find a theoretical estimate for the computational time in terms of $n$ and $m$. In all cases they ended up with a formula that is exponential in either $n$ or $m$. In 1972 it became clear that this behavior is inherent to the Simplex method. In that year Klee and Minty gave an elegant example for which the computational time is exponential in $n$ [Klee and Minty (1972)].

The result of Klee and Minty initiated a period of search for new, more efficient methods in the LO world. In 1979 the front page of the New York Times announced an important result: the Russian mathematician Khachiyan had found a new method with the desired property, the Ellipsoid method [Khachiyan (1979)]. This method is polynomial, i.e., the computational time depends polynomially on the size of the problem. Theoretically, this was an enormous breakthrough; from the practical point of view, however, the result was disappointing. Computer programs based on the Ellipsoid method proved to be much less
efficient and less robust than those using the Simplex method. The result was an unpleasant tension between theory and practice: the theoretically efficient Ellipsoid method could not match the theoretically inefficient Simplex method.
A new breakthrough came in 1984. In that year the Indian mathematician Karmarkar published a method that reconciled theory and practice [Karmarkar (1984)]. It was again front page news. Karmarkar worked at the well-known American telephone company AT&T, and had found a completely new approach to the LO-problem. His so-called Projective method was polynomial and, as claimed by Karmarkar, was in practice at least 100 times faster than the Simplex method, especially for large-scale problems. The last claim gave rise to much commotion. It is hard to find another mathematical result that caused so much excitement and dispute. The disputes were due to the fact that the published version of Karmarkar’s method differed from the version implemented at AT&T. This was not made public. Later on the reason became clear. AT&T had made a big investment to design a computer program for LO that should be about 200 times faster than the commercial codes for solving LO-problems available at that time. A project group was formed including many prominent researchers, with the task to devise a new LO-program on the basis of Karmarkar’s method. During its existence the size of the group grew from 20 to 200 researchers (R. E. Bixby and R. Vanderbei, Private communication). The system would be a hardware/software “complete” solution. For the hardware a $1 million vector/parallel machine of Alliant Computers was chosen.
About 5 years later, in 1989, the LO-package was launched under the name KORBX. In 1991, after selling one system to the military airlift command for about $4 million and one system to Delta airlines for $12 million, the venture was essentially abandoned. The LO-package was not portable; it ran only on the Alliant computer. In retrospect, this may have been the main reason for the failure of the project.
Outside AT&T much research activity took place. Initial implementations based on Karmarkar’s paper proved to be about 100 times slower than the Simplex-based codes. It looked as if the Simplex method would survive this attack from the Projective method, just as it had survived the Ellipsoid method. Karmarkar, however, persisted with his claims. This inspired further worldwide research. Over a period of 10 years more than 2000 scientific publications appeared on the subject. This led to many new theoretical concepts and algorithmic ideas. These insights were immediately incorporated into computer programs. The work of Lustig, Marsten, and Shanno resulted in a computer code, called OB1, that could compete with the Simplex method. At the time that KORBX entered the market, OB1 was faster, suitable for all current computer platforms, and freely available for research purposes.
The computer code OB1 was based on a method proposed already in the 1950s by Frisch (1956), the logarithmic barrier method. By adjusting this method the so-called path-following methods or interior-point methods arose. The first term refers to the central path of an LO-problem; this is a curve in the interior of the feasible region that converges to an optimal solution. After its rediscovery, independently by several researchers, it became the basis of all modern interior-point methods. These methods are nothing other than numerical recipes that generate a sequence of points, on or close to the central path, that converge to an optimal solution. In this way, several efficient (i.e., polynomial) methods came to life whose practical performance justifies the original claim of Karmarkar [Roos et al. (1997)].
In the meantime it turned out that the implementations of the Simplex method could be accelerated dramatically by implementing techniques already available in the literature. Especially Bixby (Rice University, Houston, TX) did important work in this respect. As a result an exciting competition arose between him and the makers of OB1. In some large-scale applications in the airline industry this has led to a nice synthesis of both approaches, where a close-to-optimal solution is generated by an interior-point method which is then used as input for the Simplex method to generate an exact solution.
A result of the sketched developments is that nowadays LO-problems can be solved about 1,000,000 times faster than in 1984. A factor of 1000 is due to the better algorithmic methods and the other factor of 1000 to the improvements in computer technology (Moore’s law). The consequences for the applications are far-reaching. An LO-model that required 1 year of computational time 16 years ago can now be solved in about 30 seconds. It is clear that problems that require a computational time of 1 year are unsolvable from a practical point of view. Therefore, the improved performance, although of a quantitative nature, has qualitative consequences: problems that could not be solved 15 years ago can now be solved in a few seconds. This explains why the use of commercial LO-packages has shown an explosive growth during the last years.
Modern LO-packages contain both a Simplex-based solver and an interior-point solver. Some of the most well-known packages are OSL of IBM, CPLEX (based on the Simplex code of Bixby and on OB1), XPRESS-MP of Dash Associates, and MOSEK of the brothers E. D. and K. D. Andersen. The environments in which these packages are used nowadays are quite diverse: aviation industry, oil industry, engineering design, finance, water management, etc.
We assume (without loss of generality) that the problem matrix $A$ has rank $m$. Since $\bar{x}$ is a basic solution, its support contains at most $m$ indices and the corresponding columns of $A$ are linearly independent. If necessary, we extend the support of $\bar{x}$ to obtain a set $B$ of $m$ indices such that the columns $A_k$ of $A$ with $k \in B$ are linearly independent. Any such set $B$ is called a basic index set (or basis) for $\bar{x}$. The remaining indices are called the nonbasic indices; their set is denoted as $N$.
By construction, the submatrix $A_B$ of $A$, consisting of the columns indexed by elements of $B$, is nonsingular. Hence, the current basic feasible solution $\bar{x}$ is given by
$$\bar{x}_B = A_B^{-1} b, \qquad \bar{x}_N = 0, \tag{2}$$
and, defining $\bar{y} \in \mathbb{R}^m$ by
$$A_B^T \bar{y} = c_B, \tag{3}$$
the current objective value satisfies
$$c^T \bar{x} = c_B^T \bar{x}_B + c_N^T \bar{x}_N = c_B^T A_B^{-1} b = b^T \bar{y}. \tag{4}$$

while leaving the other nonbasic variables zero (their current value). We then have
$$c^T x = c^T \bar{x} + \bar{s}_k x_k. \tag{9}$$
This makes clear that the objective value will decrease if $x_k$ increases. Of course, the larger $x_k$, the larger the decrease of the objective value. However, when increasing $x_k$ we have to take care of the feasibility. In this respect Eq. (7) is useful. Taking for $x_N$ the vector with $x_k$ in the $k$-position, and zeros elsewhere, we obtain $x_B$ as a function of $x_k$:
$$x_B = \bar{x}_B - x_k A_B^{-1} A_k.$$
It is convenient to introduce the matrix
$$T^B := A_B^{-1} A. \tag{10}$$
Note that this is an $m \times n$ matrix whose rows are naturally indexed by the elements of $B$. We can now rewrite the above expression for $x_B$ as follows:
$\bar{s} \ge 0$, and as we saw before this implies the optimality of the current solution $\bar{x}$. The second possible reason for termination of the method is that after having found the entering variable $k$ there is no candidate for $\ell$; as we established earlier this implies that the problem is unbounded.

G. Cycle-Breaking Rules
One elegant way to avoid cycles is to use a so-called least index rule. Then, whenever there is any ambiguity in the choice of $k$ or $\ell$ we choose the smallest possible index among the candidate indices. This rule was first proposed by Bland (1977).
A second cycle-breaking rule is the lexicographic rule [Dantzig et al. (1955)]. This rule uses the lexicographic ordering "$\prec$" of vectors: we say that $v \prec w$ ($v, w \in \mathbb{R}^p$) if the vector $u = w - v$ is lexicographically positive, i.e., $u \ne 0$ and the first nonzero coordinate of $u$ is positive. The lexicographic rule gives no condition on the choice of the entering index $k$, but it requires the leaving index $\ell$ to be taken such that the pivot element $T^B_{\ell k}$ is positive and the vector
$$\frac{(\bar{x}_B \;\; T^B)_{\ell,:}}{T^B_{\ell k}}$$
is lexicographically minimal. Here, $(\bar{x}_B \;\; T^B)$ is the matrix obtained by extending $T^B$ to the left with the column $\bar{x}_B$. So this matrix is nothing other than the part of the Simplex tableau below its first row; $(\bar{x}_B \;\; T^B)_{\ell,:}$ denotes the row of the tableau indexed by the basic variable $\ell$.
If initially all rows in the tableau below the first row are lexicographically positive, which can be realized easily, then the lexicographic rule guarantees that in all subsequent tableaus this property is maintained, and that the first row is lexicographically decreasing. The last property prevents the occurrence of cycles.

H. Initialization: Two-Phase Method
In Section II.D we assumed that we were given a basic feasible solution $\bar{x}$. We show in this section how to obtain such a solution, if it exists.
We introduce artificial variables $t_i$ ($1 \le i \le m$) and consider the problem
$$\begin{array}{ll} \text{minimize} & e^T t \\ \text{subject to} & Ax + t = b \\ & x \ge 0, \; t \ge 0, \end{array} \tag{15}$$
where $e$ denotes the all-one vector. Without loss of generality we may assume that $b \ge 0$ in Eq. (1), because otherwise we multiply constraints for which the corresponding entry in $b$ is negative, by $-1$. Then $x = 0$, $t = b$ is a feasible solution for Eq. (15), and, obviously, this solution is basic. Hence, we can solve Eq. (15) with the Simplex method. Since Eq. (15) is a bounded problem, this yields a basic feasible solution $(\bar{x}, \bar{t})$ of Eq. (15). Now two cases can occur: $e^T \bar{t} > 0$ or $e^T \bar{t} = 0$. In the first case the problem must be infeasible, because any feasible solution of Eq. (1) yields a feasible solution to Eq. (15) with $t = 0$. In the second case $\bar{x}$ is a basic feasible solution of Eq. (1).
It is now clear that we can solve Eq. (1) with the Simplex method in two phases. In phase I we solve Eq. (15); if the problem is infeasible it is detected in this phase, otherwise we obtain a basic feasible solution of Eq. (1) that can be used in phase II to solve the problem. Phase II either yields an optimal basic solution, or detects that the problem is (feasible and) unbounded.

I. Duality
Suppose that Eq. (1) has an optimal solution. Then, by applying the Simplex method, we can obtain an optimal basic index set $B$ and the corresponding basic feasible solution $\bar{x}$. Below, we use again the vectors $\bar{y} \in \mathbb{R}^m$ and $\bar{s} \in \mathbb{R}^n$ as introduced in Eqs. (3) and (5), respectively.

1. Dual of the Standard Problem
Recall from Section II.D that $\bar{s} \ge 0$. Hence, $(\bar{y}, \bar{s})$ is feasible for the following maximization problem:
$$\begin{array}{ll} \text{maximize} & d = b^T y \\ \text{subject to} & A^T y + s = c \\ & s \ge 0. \end{array} \tag{16}$$

2. Duality Results
We claim that $(\bar{y}, \bar{s})$ is an optimal solution of Eq. (16). This is a consequence of the following result.

Theorem 1 (Weak duality) Suppose that $x$ and $(y, s)$ are feasible for Eqs. (1) and (16), respectively. Then
$$c^T x - b^T y \ge 0.$$
Proof: We have
$$x^T s = x^T (c - A^T y) = c^T x - (Ax)^T y = c^T x - b^T y.$$
Since $x \ge 0$ and $s \ge 0$, $x^T s \ge 0$.

Theorem 1 reveals a close relation between problems (1) and (16). From now on we call these problems the primal problem and dual problem, respectively. The theorem implies that if $x$ is feasible for the primal problem (or shortly,
primal feasible) then $c^T x$ is an upper bound for the optimal value $d^*$ of the dual problem, and, vice versa, if $(y, s)$ is dual feasible then $b^T y$ is a lower bound for the optimal value $p^*$ of the primal problem. As a consequence, if $c^T x = b^T y$, then $x$ is an optimal solution of the primal problem and $(y, s)$ is an optimal solution of the dual problem. Hence, by taking $x = \bar{x}$, $y = \bar{y}$, and $s = \bar{s}$, the above claim now follows from Eq. (4).
A direct consequence of Theorem 1 is that if either of problems (1) or (16) is unbounded, then the other problem is infeasible.
In summary, if the primal problem has an optimal solution then so has the dual problem and the optimal values coincide. If the primal problem is unbounded then the dual problem is infeasible. In Section II.I.3 we will see that problem (16) also has a dual problem, which is exactly (1). Hence, we may interchange the words “primal” and “dual” in the first sentence of this paragraph. Thus, we have the following result, due to Von Neumann (1947).

Theorem 2 (Strong duality) If the primal and dual problem are both feasible then both problems have optimal solutions and their optimal values are equal. Otherwise, neither of the two problems has optimal solutions.

If neither of the two problems has optimal solutions then both are infeasible, or one is infeasible while the other is unbounded.
A second duality result is due to Goldman and Tucker (1956).

Theorem 3 (Goldman-Tucker) If both the primal and dual problem are feasible then there exists a strictly complementary pair of optimal solutions, i.e., optimal solutions $x$ and $(y, s)$ such that $x + s > 0$.

This duality result is less well-known. Its interest is that interior-point methods produce strictly complementary solutions (cf. Section III.J). It has recently become clear that such solutions play an important role when dealing with sensitivity analysis (cf. Section IV.B).
Let us also mention that the primal problem is infeasible if and only if there exists a vector $y$ such that $A^T y \le 0$ and $b^T y > 0$, and the dual problem is infeasible if and only if there exists a vector $x \ge 0$ such that $Ax = 0$ and $c^T x < 0$. These statements are examples of theorems of the alternatives and are equivalent to the well-known Farkas’ lemma. See, e.g., Schrijver (1986).

3. Dual of a General LO-Problem
The dual of the standard problem, as given in Section II.I.1, can be reformulated in the following way:
$$\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & A^T y \le c. \end{array}$$
In this way we get a 1–1 correspondence between the variables in the primal problem and the constraints in the dual problem, and vice versa. Note that the primal constraints are equality constraints and the dual variables are free (i.e., without sign constraints). On the other hand, the primal variables are nonnegative and the dual constraints are inequality constraints of ‘≤’ type. Also note that the roles of $b$ and $c$ are exchanged when taking the dual, and the problem matrix $A$ is transposed.
We can associate with each LO-problem (not necessarily in the standard form) a dual problem. To this end we first put the problem in the standard form (cf. Section II.A), and then take the dual of this problem. The dual problem obtained in this way, however, in general does not have such a nice 1–1 correspondence between variables on one side and constraints on the other side. With little extra effort it is possible to reformulate the dual such that we retain a 1–1 correspondence. In this way we get a simple and natural relation between a general LO-problem and its dual, at the same time making it quite easy to write down the dual problem of an arbitrary LO-problem. The relation is shown in Table I.

TABLE I Scheme for Dualizing
min $c^T x$ | max $b^T y$
‘=’ Constraint | Free variable
‘≥’ Constraint | Variable ≥ 0
‘≤’ Constraint | Variable ≤ 0
Free variable | ‘=’ Constraint
Variable ≥ 0 | ‘≤’ Constraint
Variable ≤ 0 | ‘≥’ Constraint

The scheme can be used both from the left to the right and from the right to the left. Thus, e.g., if the primal problem is a maximization problem, the dual problem is a minimization problem and, e.g., a ‘≥’-constraint in the primal problem gives rise to a nonpositive variable in the dual problem. An obvious consequence is that we may say, in short, that the dual of the dual problem is the primal problem.

J. The Dual Simplex Method
Given any basic index set $B$, following the method of Section II.I we can associate with $B$ a basic solution $\bar{x}$ of the primal problem and a basic solution $(\bar{y}, \bar{s})$ of the dual problem as done in Eqs. (2), (3), and (5). Note that if $\bar{x} \ge 0$ and $\bar{s} \ge 0$ then $\bar{x}$ and $(\bar{y}, \bar{s})$ are primal and dual feasible respectively, and due to Eq. (4), these solutions are optimal. In this section we do not assume that $\bar{x}$ is primal feasible but we assume that $(\bar{y}, \bar{s})$ is dual feasible.
A typical iteration in the dual Simplex method then goes as follows. Since $\bar{x}$ is not primal feasible we may choose
an index $\ell \in B$ such that $\bar{x}_\ell < 0$. Our aim is to replace the index $\ell \in B$ by some index $k \notin B$, thus getting the new basic index set [see Eq. (13)]
$$B' := (B \cup \{k\}) \setminus \{\ell\}.$$
The new Simplex tableau will be obtained by using $T^B_{\ell k}$ as pivot. Denoting the new basic solutions as $\tilde{x}$ and $(\tilde{y}, \tilde{s})$, we then have
$$\tilde{x}_\ell = \bar{x}_\ell - \tilde{x}_k T^B_{\ell k} = 0 \tag{17}$$
and
$$\tilde{s}_i = \bar{s}_i - \bar{s}_k \frac{T^B_{\ell i}}{T^B_{\ell k}}, \qquad 1 \le i \le n. \tag{18}$$
We want to have $\tilde{x}_k > 0$. Therefore, since $\bar{x}_\ell < 0$, Eq. (17) makes it clear that we need
$$T^B_{\ell k} < 0. \tag{19}$$
We further want to maintain dual feasibility, i.e., $\tilde{s} \ge 0$. Assuming Eq. (19), it is obvious that $\tilde{s}_i \ge 0$ whenever $T^B_{\ell i} \ge 0$, because $\bar{s}_i \ge 0$ and $\bar{s}_k \ge 0$. Thus, dual feasibility is maintained if and only if $k$ is such that
$$\bar{s}_i - \bar{s}_k \frac{T^B_{\ell i}}{T^B_{\ell k}} \ge 0$$
whenever $T^B_{\ell i} < 0$. As a consequence, we obtain
$$k = \arg\min_i \left\{ \frac{\bar{s}_i}{-T^B_{\ell i}} : T^B_{\ell i} < 0 \right\}.$$
This is the quotient rule for the dual Simplex method. With the given leaving index $\ell$ this rule finds an entering index $k$ such that dual feasibility of the new basic solution is guaranteed.
The above method works only if there are suitable candidates for the entering index $k$. What if there are no such candidates? Then we have
$$\bar{x}_\ell < 0 \quad \text{and} \quad T^B_{\ell k} \ge 0, \text{ for all } k. \tag{20}$$
Now recall from Eq. (7) that
$$x_B = \bar{x}_B - T^B_N x_N.$$
Hence, if Eq. (20) holds then we have for the $\ell$-entry of any primal feasible vector $x$:
$$x_\ell = \bar{x}_\ell - \sum_{k \in N} T^B_{\ell k} x_k \le \bar{x}_\ell < 0,$$
showing that the problem cannot be feasible. We conclude that if no candidate for the entering variable exists then the problem is infeasible. Note that this argument does not use the dual feasibility of the current basic solution.
Let us conclude this section by mentioning that in some natural way the dual Simplex method is completely equivalent with the (primal) Simplex method as treated earlier. The natural correspondence between the two methods follows by noting that when applying the Simplex method to solve problem (1) then this also yields the solution of the dual problem (16). Therefore, since the dual of (16) is exactly (1), when solving (16) with the primal Simplex method we also obtain the solution of (1). In fact, it can be seen that applying the dual Simplex method to (1) is essentially the same as applying the primal Simplex method to (16).
With the above observation in mind it will be no surprise that cycling of the dual Simplex method can be prevented by using the least index rule: whenever there is any ambiguity in the choice of $\ell$ or $k$, choose the smallest possible index among the candidate indices.

K. The Criss-Cross Method
So far, we have dealt with two variants of the Simplex method: the primal Simplex method and the dual Simplex method. The primal Simplex method generates a sequence of primal feasible basic solutions, whereas the dual Simplex method uses only dual feasible basic solutions. Feasibility is maintained by choosing the pivoting pair $(k, \ell)$ in each iteration according to the primal and dual quotient rule, respectively. To start such a method, one first needs to generate a feasible basic solution, thus separating the work into two phases. In Phase I a feasible basic solution is generated; this phase requires the introduction of the artificial variables. Phase II is used to generate an optimal solution.
We discuss in this section how we can avoid artificial variables, and solve the problem in one phase. The underlying idea, due to Zionts (1969), is to neglect the feasibility issue and to aim for feasibility at both sides simultaneously.
Let us call index $i$ primal infeasible if $\bar{x}_i < 0$ and otherwise primal feasible. Similarly, index $i$ is dual infeasible if $\bar{s}_i < 0$ and otherwise dual feasible. Recall that any basic solution satisfies $\bar{x}_N = 0$ and $\bar{s}_B = 0$. Hence, for each index either $\bar{x}_i = 0$ or $\bar{s}_i = 0$. Therefore, it is impossible for an index $i$ to be primal infeasible and dual infeasible at the same time. If all indices are feasible then the basic solutions are optimal, and we are done. Otherwise, we take an infeasible index $i$. If $i$ is dual infeasible then, putting $k := i$, we look for an index $\ell$ such that $T^B_{\ell k} > 0$; if such an $\ell$ does not exist the problem is unbounded, or infeasible. If $i$ is primal infeasible then, putting $\ell := i$, we look for an index $k$ such that $T^B_{\ell k} < 0$, in line with Eq. (19); if such a $k$ does not exist the problem is infeasible. If a pair $(k, \ell)$ has been found then we perform an iteration with this pair as pivoting pair. Hoping for the best, this process is repeated.
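The quotient rule for the dual Simplex method (Section II.J) can be illustrated with a small Python sketch. It is not taken from the article: the function name `dual_ratio_test`, the list-based data layout, and the tolerance `tol` are assumptions made for illustration. Given the reduced costs $\bar{s}$ and the row of $T^B$ that belongs to the leaving index $\ell$, it returns the entering index $k$, or `None` when no entry of that row is negative, which is the situation of Eq. (20) in which the problem is infeasible. Ties are broken in favour of the smallest index, as in the least index rule.

```python
from typing import Optional, Sequence


def dual_ratio_test(s_bar: Sequence[float], t_row: Sequence[float],
                    tol: float = 1e-12) -> Optional[int]:
    """Sketch of the dual quotient rule.

    s_bar : reduced costs of the current (dual feasible) basic solution
    t_row : row of T^B indexed by the leaving basic variable
    Returns the entering index k, or None if no entry of t_row is negative.
    """
    best_k = None
    best_ratio = None
    for i, (s_i, t_i) in enumerate(zip(s_bar, t_row)):
        if t_i < -tol:                      # only negative pivot candidates
            ratio = s_i / (-t_i)
            # strict '<' keeps the smallest index when ratios are tied
            if best_ratio is None or ratio < best_ratio:
                best_k, best_ratio = i, ratio
    return best_k


if __name__ == "__main__":
    # indices 1 and 2 are candidates with ratios 2/1 and 3/3
    print(dual_ratio_test([0.0, 2.0, 3.0, 0.0], [1.0, -1.0, -3.0, 0.0]))
```

Basic indices never qualify as candidates, because their entry in the pivot row is 1 (for the leaving index itself) or 0.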
B. Reduction to Feasibility Problem
By Theorem 2, if one of the two problems (21) and (22) has an optimal solution then so has the other, and then their optimal values coincide. Using also Theorem 1, this is the case if and only if the system
$$\begin{aligned} Az &\ge b, & z &\ge 0, \\ -A^T y &\ge -c, & y &\ge 0, \\ b^T y - c^T z &\ge 0 \end{aligned}$$
has a solution, and any such solution provides optimal solutions for (21) and (22). With $\kappa = 1$, the above system can be rewritten as
$$\begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{pmatrix} \begin{pmatrix} y \\ z \\ \kappa \end{pmatrix} \ge 0, \qquad \begin{pmatrix} y \\ z \\ \kappa \end{pmatrix} \ge 0. \tag{23}$$
Here the zeros represent matrices (or vectors) of appropriate sizes. Note that by introducing the variable $\kappa$, the system became homogeneous: if $(y, z, \kappa)$ is a solution then $\lambda (y, z, \kappa)$ is also a solution, for any positive $\lambda$. Hence, problem (23) has a solution with $\kappa = 1$ if and only if it has a solution with $\kappa > 0$. Given any solution $(y, z, \kappa)$ of (23) with $\kappa > 0$, then
$$z^* = \frac{z}{\kappa}, \qquad y^* = \frac{y}{\kappa}$$
are optimal solutions of (21) and (22).
Thus we may conclude that (21) and (22) have optimal solutions if and only if (23) has a solution with $\kappa > 0$. The extra variable $\kappa$ is called the homogenizing variable. Note that problem (23) always admits the zero solution $z = 0$, $y = 0$, and $\kappa = 0$. If $\kappa = 0$ for every solution, then we may conclude that (21) and (22) have no optimal solutions, i.e., these problems are infeasible or unbounded.

C. Embedding into Self-Dual Model
To simplify notations, we use the matrix $\bar{M}$ and the vector $\bar{x}$ defined by
$$\bar{M} = \begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{pmatrix}, \qquad \bar{x} = \begin{pmatrix} y \\ z \\ \kappa \end{pmatrix}.$$
Then Eq. (23) can be written as
$$\bar{M} \bar{x} \ge 0, \qquad \bar{x} \ge 0. \tag{24}$$
We need to find out whether or not this inequality system has a solution with $\kappa > 0$.
When using interior-point methods it is necessary that the IPC is satisfied. In other words, there should exist a vector $x^0 > 0$ such that $\bar{M} x^0 > 0$. The system (23) certainly does not satisfy this condition, because if $y$, $z$, and $\kappa$ solve (23) then $c^T z - b^T y = 0$, and hence the last coordinate of $\bar{M} \bar{x}$ vanishes.
To circumvent this difficulty the problem is embedded into a slightly larger problem that satisfies the IPC. This goes as follows. Letting $n - 1$ denote the size of $\bar{M}$, and with $e$ denoting the all-one vector, as usual of appropriate size, we introduce
$$r = e - \bar{M} e, \qquad q = \begin{pmatrix} 0 \\ n \end{pmatrix}.$$
Now consider the system
$$\begin{pmatrix} \bar{M} & r \\ -r^T & 0 \end{pmatrix} \begin{pmatrix} \bar{x} \\ \vartheta \end{pmatrix} \ge -q, \qquad \begin{pmatrix} \bar{x} \\ \vartheta \end{pmatrix} \ge 0. \tag{25}$$
Note that if $(\bar{x}, \vartheta)$ satisfies (25) and if $\vartheta = 0$, then $\bar{x}$ is a solution of (24). On the other hand, a solution $\bar{x}$ of (24) gives rise to a solution of (25) with $\vartheta = 0$ if and only if
$$r^T \bar{x} \le n.$$
This certainly holds if $r^T \bar{x} \le 0$; otherwise a positive multiple of $\bar{x}$ yields a solution of problem (25) with $\vartheta = 0$. We conclude that the set of solutions of (25) with $\vartheta = 0$ contains all solutions of (24), possibly up to a positive factor. The new variable $\vartheta$ is called the lifting variable.
We now show that the new system satisfies the IPC. In fact the all-one vector does the work. Taking $\bar{x} = e$ and $\vartheta = 1$ we get
$$\bar{M} \bar{x} + \vartheta r = \bar{M} e + r = e$$
and
$$-r^T \bar{x} = -(e - \bar{M} e)^T e = -e^T e = -n + 1,$$
where we used that $e^T \bar{M} e = 0$, since $\bar{M}$ is skew-symmetric. Hence, we obtain
$$\begin{pmatrix} \bar{M} & r \\ -r^T & 0 \end{pmatrix} \begin{pmatrix} e \\ 1 \end{pmatrix} + q = \begin{pmatrix} e \\ 1 \end{pmatrix},$$
proving the claim.
We already observed that a solution of (25) can be useful for us only if $\vartheta = 0$. How do we get such a solution? Note that, since $q \ge 0$, $\bar{x} = 0$ and $\vartheta = 0$ are feasible for problem (25). Therefore, when minimizing $\vartheta$ subject to (25) the optimal value will be equal to zero. Defining
$$M = \begin{pmatrix} \bar{M} & r \\ -r^T & 0 \end{pmatrix}, \qquad x = \begin{pmatrix} \bar{x} \\ \vartheta \end{pmatrix},$$
$q^T x = n\vartheta$ vanishes if and only if $\vartheta = 0$, and thus we are interested in the optimal solutions of the problem
$$\begin{array}{ll} \text{minimize} & q^T x \\ \text{subject to} & Mx \ge -q \\ & x \ge 0. \end{array} \tag{26}$$
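The embedding of Section III.C can be checked numerically on a tiny instance. The following Python sketch is an illustration only: the function name `self_dual_embedding`, the nested-list representation, and the example data are assumptions, not part of the article. It assembles $\bar{M}$ from $A$, $b$, $c$, forms $r = e - \bar{M}e$ and $q = (0, n)$, and returns the matrix $M$ and vector $q$ of problem (26), so that one can verify that $M$ is skew-symmetric and that $Me + q = e$ (the IPC with the all-one vector).

```python
def self_dual_embedding(A, b, c):
    """Build M and q of the self-dual model from A (m x k), b (m), c (k)."""
    m, k = len(A), len(c)
    size = m + k + 1                         # order of M-bar (n - 1 in the text)
    Mbar = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(k):
            Mbar[i][m + j] = A[i][j]         # block  A
            Mbar[m + j][i] = -A[i][j]        # block -A^T
        Mbar[i][size - 1] = -b[i]            # block -b
        Mbar[size - 1][i] = b[i]             # block  b^T
    for j in range(k):
        Mbar[m + j][size - 1] = c[j]         # block  c
        Mbar[size - 1][m + j] = -c[j]        # block -c^T
    r = [1.0 - sum(row) for row in Mbar]     # r = e - Mbar e
    n = size + 1                             # order of M
    M = [Mbar[i] + [r[i]] for i in range(size)]
    M.append([-ri for ri in r] + [0.0])      # last row (-r^T, 0)
    q = [0.0] * size + [float(n)]
    return M, q


if __name__ == "__main__":
    # hypothetical data: one constraint, two variables
    M, q = self_dual_embedding([[1.0, 1.0]], [1.0], [1.0, 2.0])
    n = len(M)
    skew = all(M[i][j] == -M[j][i] for i in range(n) for j in range(n))
    s_e = [sum(M[i]) + q[i] for i in range(n)]   # s(e) = M e + q
    print(skew, s_e)
```

Plain lists are used to keep the sketch dependency-free; a matrix library would be the natural choice for anything larger.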
When taking the dual of this problem we get
$$\begin{array}{ll} \text{maximize} & -q^T u \\ \text{subject to} & -Mu \le q \\ & u \ge 0, \end{array}$$
where we used that $M^T = -M$. Since maximizing $-q^T u$ is equivalent to minimizing $q^T u$ this is exactly the same problem as (26). This is expressed by saying that problem (26) is self-dual.
We finally point out that the optimal set of (26) is bounded. For any feasible $x$ let $s(x) = Mx + q$. Then
$$\begin{aligned} e^T M x &= e^T (\bar{M} \bar{x} + \vartheta r) - r^T \bar{x} \\ &= e^T (\bar{M} \bar{x} + \vartheta r) - (e - \bar{M} e)^T \bar{x} \\ &= e^T (\bar{M} \bar{x} + \vartheta r) - e^T \bar{x} + e^T \bar{M}^T \bar{x} \\ &= \vartheta e^T r - e^T \bar{x}, \end{aligned}$$
where we used once more that $\bar{M}^T = -\bar{M}$. For the same reason we have $e^T \bar{M} e = 0$, whence $e^T r = e^T e = n - 1$. Hence,
$$e^T s(x) = \vartheta (n - 1) - e^T \bar{x} + n$$
and, finally,
$$e^T x + e^T s(x) = e^T \bar{x} + \vartheta + e^T s(x) = n(1 + \vartheta).$$
Taking $\vartheta = 0$ we get
$$e^T x + e^T s(x) = n.$$
Since $x \ge 0$ and $s(x) \ge 0$, this implies that the optimal set is bounded.

D. Central Path
In the previous section we reduced the general LO-problem to the problem of solving the self-dual problem (26). This is not exactly true, however. Remember that the real issue is to find an optimal solution with a positive homogenizing variable $\kappa = x_{n-1}$, or to establish that such a solution does not exist.
Recall that the all-one vector is feasible and
$$s(e) = e, \tag{27}$$
so the IPC is satisfied. Although the $M$ and $q$ introduced in the previous section have a very special structure, in the analysis below we allow $M$ to be any skew-symmetric matrix and $q$ any nonnegative vector.
Denoting the optimal set of (26) as $S$, and defining the index set $B$ by
$$B := \{ i : x_i > 0 \text{ for some } x \in S \}, \tag{28}$$
the question is whether $n - 1 \in B$ or not. If the answer is yes, then any solution with positive homogenizing variable will yield optimal solutions for our original problems (21) and (22).
Since $M$ is skew-symmetric we have for any vector $u$
$$u^T M u = 0. \tag{29}$$
Hence, if $x$ is feasible,
$$q^T x = (s(x) - Mx)^T x = x^T s(x), \tag{30}$$
and $x$ is optimal if and only if $x_i s_i(x) = 0$ for each $i$.
We will use a shorthand notation for the vector with coordinates $x_i s_i(x)$, namely $x s(x)$. Then the optimality conditions for problem (26) are given by
$$x \ge 0, \qquad s(x) = Mx + q \ge 0, \qquad x s(x) = 0. \tag{31}$$
The last condition is the complementarity condition.
For more appreciation of the next theorem, let us indicate that problem (31) may have multiple solutions. For example, if
$$M = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
then
$$s(x) = M \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 - x_2 \\ x_1 \end{pmatrix},$$
and the solution to (31) is $x = (0, x_2)$ with $0 \le x_2 \le 1$.
Now consider the following system.
$$x \ge 0, \qquad s(x) = Mx + q \ge 0, \qquad x s(x) = \mu e, \tag{32}$$
where $\mu$ is any positive number.

Theorem 4 For any $\mu > 0$ the system (32) has a unique solution.

The solution of (32) is denoted as $x(\mu)$; it is called the $\mu$-center of (26). When $\mu$ runs through all positive reals then $x(\mu)$ follows a curve in the interior of the feasible region of (26). Note that $z(\mu)/\kappa(\mu)$ and $y(\mu)/\kappa(\mu)$ are feasible solutions for the problems (21) and (22); the corresponding curves are the respective central paths of these two problems.
The central path is quite relevant for two reasons. First, observe that for any $\mu > 0$ we have
$$q^T x(\mu) = n\mu. \tag{33}$$
This is an obvious consequence of Eq. (30) and $x_i s_i(x) = \mu$ for each $i$. Hence, if $\mu$ approaches 0 then $q^T x(\mu)$ goes to zero, which means that $x(\mu)$ approaches the optimal set.
The second reason is that not only the limit
$$x(0) := \lim_{\mu \downarrow 0} x(\mu)$$
exists but, even more importantly, that the support of $x(0)$ is the set $B$. As a consequence, if we know $x(0)$, we have enough information to decide whether the homogenizing variable can be positive in the optimal set, and if this is the case we can derive from $x(0)$ optimal solutions for (21) and (22).
Path-following methods use the central path as a guide to the optimal set. Such a method starts at the point on the central path corresponding to $\mu = 1$, because the 1-center is known: due to Eq. (27) we have $x(1) = e$.

E. Conceptual Method
The parameter $\mu$ is called the barrier parameter. The question we have to deal with is how to obtain the $\mu$-centers for small values of the barrier parameter.
Suppose that we know $x(\mu)$ for some $\mu > 0$, and let $\mu^+$ be obtained from $\mu$ by
$$\mu^+ := (1 - \theta)\mu,$$
where $\theta$ is a positive constant smaller than 1. We may expect that if $\theta$ is not too large, the $\mu^+$-center will be close to the given $\mu$-center. For the moment, let us assume that we are able to calculate the $\mu^+$-center, provided $\theta$ is not too large. Then the following conceptual algorithm can be used to find an $\varepsilon$-optimal solution of problem (26).

The output of this algorithm is a feasible solution for problem (26) such that the objective value does not exceed $\varepsilon$. How many iterations are needed by the algorithm? The answer is provided by the following lemma.

iterations the objective value, given by $q^T x(\mu) = n\mu$, is smaller than, or equal to $\varepsilon$ if
$$(1 - \theta)^k n \le \varepsilon.$$
Taking logarithms, this becomes
$$k \log(1 - \theta) + \log n \le \log \varepsilon.$$
Since $-\log(1 - \theta) \ge \theta$, this certainly holds if
$$k\theta \ge \log n - \log \varepsilon = \log \frac{n}{\varepsilon}.$$
This implies the lemma.
The above algorithm uses exact $\mu$-centers. These can be obtained only by solving the nonlinear system (32). To make the algorithm more practical, we have to avoid this. This is the subject of the following sections.

F. Using Approximate Centers
Let us now assume that we have an approximate $\mu$-center, i.e., a positive feasible solution $x$ that is close to $x(\mu)$. The meaning of “close” will be made more precise later on.
We want to find a displacement $\Delta x$ such that
$$x^+ = x + \Delta x \tag{34}$$
is the $\mu$-center. Denoting $s^+ = s(x^+)$, and

The above system is nonlinear and hard to solve. Following Newton’s method, we linearize the centering condition (37) by neglecting the quadratic term $\Delta x \Delta s$, thus obtaining the Newton equation
Due to the Newton equation (38) this implies
$$x^+ s^+ = \mu e + \Delta x \Delta s. \tag{39}$$
Since $M$ is skew-symmetric,
$$(\Delta x)^T \Delta s = (\Delta x)^T M \Delta x = 0,$$
proving that $\Delta x$ and $\Delta s$ are orthogonal. Hence, using Eq. (39)
$$q^T x^+ = e^T (x^+ s^+) = e^T (\mu e + \Delta x \Delta s) = n\mu.$$
Thus we have shown that after the Newton step, the objective value equals its value at the target $x(\mu)$.
But what about $x^+$? We may not expect that $x^+$ coincides with $x(\mu)$, due to the “error term” $\Delta x \Delta s$ in Eq. (39). But hopefully, $x^+$ is closer to $x(\mu)$ than $x$. For dealing with this we need a quantity to measure proximity to $x(\mu)$. For this purpose we introduce the quantity $\delta(x, \mu)$ defined by
$$\delta(x, \mu) = \frac{1}{2} \left\| \sqrt{\frac{xs}{\mu}} - \sqrt{\frac{\mu e}{xs}} \right\|.$$
The notation, although not quite common, explains itself: all operations are meant to be coordinatewise. Note that
$$x = x(\mu) \;\Leftrightarrow\; \sqrt{\frac{xs}{\mu}} = e \;\Leftrightarrow\; \sqrt{\frac{\mu e}{xs}} = e.$$
Hence, if $x = x(\mu)$ then $\delta(x, \mu) = 0$ and $\delta(x, \mu) > 0$ otherwise. One may prove the following.

Theorem 5 If $\delta := \delta(x, \mu) \le 1$, then the Newton step is feasible, i.e., $x^+$ and $s^+$ are nonnegative. Moreover, if $\delta < 1$, then $x^+$ and $s^+$ are positive and
$$\delta(x^+, \mu) \le \frac{\delta^2}{\sqrt{2(1 - \delta^2)}}.$$

Approximate Center Algorithm
Input:
  An accuracy parameter ε > 0;
  a barrier update parameter θ, 0 < θ < 1.
begin
  x := x(1); µ := 1;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    x := x + Δx;
  end
end

In the next section we discuss how an appropriate value of the parameter $\theta$ can be obtained, so that during the course of the algorithm the iterates are always close enough to the current $\mu$-center to guarantee that Newton’s method is quadratically convergent.

I. Complexity Analysis
At the start of the algorithm we have $\mu = 1$ and $x = x(1)$, whence $q^T x = n$ and $\delta(x, \mu) = 0$. In each iteration $\mu$ is first reduced by the factor $1 - \theta$ and then the Newton step is made to the new $\mu$-center. It will be clear that the reduction of $\mu$ has an effect on the value of the proximity measure. This effect is fully described by the following lemma.

Lemma 2 Let $x > 0$ and $\mu > 0$ be such that $s = s(x) > 0$ and $q^T x = n\mu$. Moreover, let $\delta := \delta(x, \mu)$ and $\mu^+ = (1 - \theta)\mu$. Then
$$\delta(x, \mu^+)^2 = (1 - \theta)\delta^2 + \frac{\theta^2 n}{4(1 - \theta)}.$$

Now let us have
Due to the choice of θ, we get
$$\delta(x, \mu^+)^2 < \frac{1}{4} + \frac{1}{16(1 - \theta)} \le \frac{1}{4} + \frac{1}{8} < \frac{1}{2}.$$
Therefore, after the Newton step $\delta(x^+, \mu^+) \le 1/2$. Also, $q^T x^+ = n\mu^+$. This proves the claim.
Thus we have shown that the following theorem holds.
Theorem 6 If $\theta = 1/(2\sqrt{n})$ then the algorithm with full Newton steps requires at most
$$2\sqrt{n}\, \log \frac{n}{\varepsilon}$$
iterations. The output is a feasible x > 0 such that $q^T x = n\mu \le \varepsilon$ and $\delta(x, \mu) \le \frac{1}{2}$.
This theorem shows that we can get an ε-solution x of our self-dual model with ε as small as desired. But note that x will always be an interior solution, so x > 0 and s = s(x) > 0.
A crucial question is whether the variable $\kappa = x_{n-1}$ is positive or zero in the limit, when µ goes to zero. In practice, for small enough ε it is usually no serious problem to decide which of the two cases occurs. But the theory can help us in an elegant way, as we explain in the next section. This requires some further analysis of the central path.
Before proceeding, we want to agree upon the following. This section has made clear that the conceptual algorithm of Section III.E can be turned into a practical algorithm; the iterates are no longer on the central path but move in a narrow neighborhood around the central path.
In the next sections we will assume that the iterates are on the central path, because this simplifies the analysis significantly. At the cost of some additional technicalities in the analysis we can obtain similar results for the iterates of the practical algorithm. For the technical details the reader is referred to Roos et al. (1997).

J. Finding the Optimal Partition

If x ∈ S, S being the set of optimal solutions, then
$$x\, s(x) = 0, \qquad x + s(x) \ge 0.$$
The equality represents the complementarity property of optimal solutions. If the inequality is strict, we say that x is a strictly complementary solution. Recall from Section III.D that the support of the limit point x(0) of the central path is the set B, defined by Eq. (28). Let N denote the complementary set. Then another property of the central path is that the support of s(0) := s(x(0)) is N. As a consequence, x(0) is a strictly complementary solution:
$$x(0)\, s(0) = 0, \qquad x(0) + s(0) > 0.$$
In fact, this property can be used to prove Theorem 3 in Section II.I.2. Another consequence is that
$$N := \{ i : s_i(x) > 0 \text{ for some } x \in S \}. \qquad (41)$$
The partition of the index set {1, . . . , n} into the classes B and N is called the optimal partition of (26).
We define the condition number of (26) by
$$\sigma := \min_{i} \max_{x \in S} \bigl( x_i + s_i(x) \bigr).$$
The calculation of this (positive!) number is even more cumbersome than solving Eq. (26). For our purpose, however, it is sufficient to know the following lower bound
$$\sigma \ge \frac{1}{\prod_{j=1}^{n} \| M_j \|}, \qquad (42)$$
where $M_j$ denotes the jth column of M. For this bound we need to make the assumption that the entries of M and q are integral. In the sequel, in all the bounds where the condition number occurs, it can be safely replaced by this easily computable lower bound.
Lemma 3 For any positive µ one has
$$x_i(\mu) \ge \frac{\sigma}{n}, \quad i \in B,$$
$$x_i(\mu) \le \frac{n\mu}{\sigma}, \quad i \in N.$$
Proof: Let i ∈ N and $\tilde{x} \in S$ be such that $\tilde{s}_i := s_i(\tilde{x})$ is maximal. Then the definition of the condition number σ implies that $\tilde{s}_i \ge \sigma$. Using the skew-symmetry of M one easily sees that
$$(x(\mu) - \tilde{x})^T (s(\mu) - \tilde{s}) = 0.$$
Since $q^T \tilde{x} = \tilde{x}^T \tilde{s} = 0$ and $x(\mu)^T s(\mu) = n\mu$, it follows that
$$x(\mu)^T \tilde{s} + s(\mu)^T \tilde{x} = n\mu.$$
This implies
$$x_i(\mu)\, \tilde{s}_i \le x(\mu)^T \tilde{s} \le n\mu.$$
Dividing by $\tilde{s}_i$ and using that $\tilde{s}_i \ge \sigma$ we obtain the second inequality of the lemma:
$$x_i(\mu) \le \frac{n\mu}{\tilde{s}_i} \le \frac{n\mu}{\sigma}.$$
The first inequality can be derived in a similar way.
Lemma 3 gives a complete separation between variables in B and variables in N, provided that µ is so small that
$$\frac{\sigma}{n} > \frac{n\mu}{\sigma},$$
or, equivalently,
$$n\mu < \frac{\sigma^2}{n}.$$
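The iteration bound of Theorem 6 and the separation condition nµ < σ²/n can be combined numerically. The sketch below simply replays the barrier updates µ := (1 − θ)µ with θ = 1/(2√n) until nµ drops below ε = σ²/n, and compares the count with the bound 2√n log(n/ε). The values n = 100 and σ = 10⁻³ are hypothetical, and the function names are chosen for illustration only.

```python
import math

def iterations_needed(n, eps):
    """Count barrier updates mu := (1 - theta) * mu until n * mu < eps."""
    theta = 1.0 / (2.0 * math.sqrt(n))
    mu, k = 1.0, 0
    while n * mu >= eps:
        mu *= 1.0 - theta
        k += 1
    return k

def theorem6_bound(n, eps):
    """Upper bound 2 * sqrt(n) * log(n / eps) from Theorem 6."""
    return 2.0 * math.sqrt(n) * math.log(n / eps)

n, sigma = 100, 1e-3          # hypothetical problem size and condition number
eps = sigma ** 2 / n          # accuracy at which Lemma 3 separates B from N
k = iterations_needed(n, eps)
bound = theorem6_bound(n, eps)
```

With this choice of ε the bound equals 2√n log(n²/σ²), because n/ε = n²/σ². The bound holds since −log(1 − θ) ≥ θ, so after (1/θ) log(n/ε) updates the product nµ is already below ε.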
P1: GPT/MBG P2: GPJ/GKP P3: FJU/LOT QC: FJS/FYD Final Pages
Encyclopedia of Physical Science and Technology EN008A-379 June 29, 2001 15:17
Thus, applying the algorithm with
$$\varepsilon = \frac{\sigma^2}{n},$$
we obtain a solution x which reveals the optimal partition according to
$$B = \{ i : x_i > s_i(x) \},$$
$$N = \{ i : x_i < s_i(x) \}.$$
By substitution of the above value of ε in Theorem 6 the required number of iterations follows:
$$2\sqrt{n}\, \log \frac{n^2}{\sigma^2}.$$
After this number of iterations we know if κ belongs to B or not. In other words, we then know whether the original problem has an optimal solution or not. In the latter case we are done; otherwise we are usually interested in finding a solution. If we are interested in an ε-solution we can proceed with the algorithm until such a solution is obtained. An alternative approach may be to use the rounding procedure described in the next section, which yields an exact solution.

K. Rounding to an Exact Solution

Knowing the optimal partition, we aim at finding a strictly complementary solution of Eq. (26).
In one case this is very easy. If it happens that the set B in the optimal partition is empty then x(0) = 0 is a strictly complementary solution and we are done. Note that in that case s(x(0)) must be positive. Since s(x(0)) = q, this case can easily be seen to occur if and only if q > 0.
Therefore, we assume from now on that the class B is not empty. Assuming this we describe a rounding procedure that can be applied to any x generated by the algorithm to yield a vector $\bar{x}$ such that $\bar{x}$ and its surplus vector $\bar{s} = s(\bar{x})$ are complementary (in the sense that $\bar{x}_N = \bar{s}_B = 0$). In general, however, these $\bar{x}$ and $\bar{s}$ are not necessarily nonnegative. But, as we will see, after sufficient additional iterations of the algorithm the separation between the “small” and the “large” variables is strong enough to get a strictly complementary solution. All of this can be done in polynomial time.
Partitioning the matrix M and the vectors x and s according to the optimal partition, the relation s = Mx + q can be rewritten as
$$\begin{pmatrix} s_B \\ s_N \end{pmatrix} = \begin{pmatrix} M_{BB} & M_{BN} \\ M_{NB} & M_{NN} \end{pmatrix} \begin{pmatrix} x_B \\ x_N \end{pmatrix} + \begin{pmatrix} q_B \\ q_N \end{pmatrix}.$$
Since $q_B^T x_B(0) = 0$ and $x_B(0) > 0$, it follows that $q_B = 0$. Hence, we have
$$s_B = M_{BB} x_B + M_{BN} x_N.$$
Consider the system of equations in the unknown vector ξ given by
$$M_{BB}\, \xi = s_B - M_{BN} x_N. \qquad (43)$$
Note that $\xi = x_B$ is a “large” solution of (43), because the entries of $x_B$ are “large” variables. We can easily see that Eq. (43) has more solutions. This follows by using any optimal solution $\tilde{x}$ with $\tilde{x}_B \ne 0$. One has $\tilde{x}_N = 0$ and $s_B(\tilde{x}) = 0$, whence $M_{BB} \tilde{x}_B = 0$. Since $x_B(0) \ne 0$, it follows that the matrix $M_{BB}$ must be singular, and hence Eq. (43) has multiple solutions.
Now let ξ be any solution of Eq. (43) and consider the vector $\bar{x}$ defined by
$$\bar{x}_B = x_B - \xi, \qquad \bar{x}_N = 0.$$
Define $\bar{s} = s(\bar{x})$. Since $\bar{x}_N = 0$, we have
$$\bar{s}_B = M_{BB} \bar{x}_B = M_{BB} (x_B - \xi) = 0.$$
Therefore, $\bar{x}_N = \bar{s}_B = 0$, showing that the vectors $\bar{x}$ and $\bar{s}$ are complementary. It will be clear, however, that the vectors $\bar{x}$ and $\bar{s}$ are not necessarily nonnegative, let alone strictly complementary. This only holds if
$$\bar{x}_B = x_B - \xi > 0, \qquad (44)$$
and
$$\bar{s}_N = M_{NB} \bar{x}_B + M_{NN} \bar{x}_N + q_N = M_{NB} (x_B - \xi) + q_N = s_N - M_{NN} x_N - M_{NB}\, \xi > 0. \qquad (45)$$
Note that if we run the algorithm long enough, $x_B$ and $s_N$ converge to positive vectors, whereas $x_N$ and $s_B$ (and hence also ξ) converge to zero. This makes it plausible that if we take ε small enough in the algorithm, then we will get a solution ξ of (43) that satisfies (44) and (45), hence giving rise to a strictly complementary solution of Eq. (26).
Omitting further details, we conclude this section by stating the main complexity result for interior-point methods: when solving Eq. (43) by Gaussian elimination, an exact solution of Eq. (26) can be found after at most $7\sqrt{n}\, L$ iterations, where L denotes the binary size of the problem.

L. Other Interior-Point Methods

In the preceding sections we have shown that the LO-problem can be solved in polynomial time. If the problem is infeasible or unbounded then this is detected by the method; otherwise it generates an exact solution. When executed, the number of iterations will be about the same as predicted by the theory. In practice this means that the method is much slower than the Simplex method. There are several ways to make the method more efficient and
competitive with the Simplex method. We briefly discuss some of them.

1. Adaptive-Update Methods

The method we considered generates iterates in a narrow neighborhood of the central path, since at the start of each iteration we have $\delta(x^+, \mu) \le \frac{1}{2}$. In the theoretical analysis this property is maintained by taking small updates of the barrier parameter ($\theta = 1/(2\sqrt{n})$). In practice, Newton’s method behaves much better than predicted by Theorem 5. As a consequence, $\delta(x^+, \mu)$ will usually be much smaller than $\frac{1}{2}$. This means that we can take larger values of θ without losing the above property. The adaptive-update method seeks to take θ as large as possible such that at the start of the next iteration we still have $\delta(x^+, \mu) \le \frac{1}{2}$. This strategy accelerates the method significantly and does not deteriorate the theoretical iteration bound.

2. Large-Update Methods

A popular strategy is to use larger values of θ, say θ = 0.5, or even θ = 0.99. In this case, after the barrier update, $\delta(x^+, \mu)$ will be much larger than 1, and Theorem 5 can no longer be used to measure progress. Even worse, the (full) Newton step will no longer be feasible. The remedy is to take a damped Newton step $x^+ = x + \alpha \Delta x$, with damping factor α; this factor can be chosen such that $x^+$ is feasible and at the same time the proximity decreases sufficiently to get a provably polynomial method. In practice this approach yields very efficient methods.

3. Predictor-Corrector Method

This is the most popular method. We describe a simple variant. It is based on a very greedy strategy that uses the Newton step targeting the zero vector. So, in the definition of the Newton step one takes µ = 0. The resulting direction is called the affine scaling direction. To stay feasible, this step is damped again, but with a greedy damping factor. As a result, after such a step we may end up far from the central path. To restore the proximity, a so-called centering step is taken. This is a (damped) Newton step with $\mu = q^T x / n$, where x is the current iterate. This method has not only turned out to be extremely efficient, but also has the nice property that asymptotically the objective value converges quadratically to zero.

4. Infeasible-Start Methods

To start an interior-point method one needs an interior solution of the problem at hand. Usually such a solution is not available, and then the method cannot even be started. The embedding technique, as described in Section III.C, elegantly resolves this initialization problem, at the cost of two additional variables, the homogenizing variable κ and the lifting variable ϑ. There exist other solutions to the initialization problem, leading to the so-called infeasible-start methods. We briefly discuss such a method for the standard form (1) and its dual (16). The centering condition for this pair of problems is simply xs = µe.
Infeasible-start methods take any two positive n-vectors x and s and an arbitrary m-vector y. These vectors need not be feasible. Defining the primal and dual residuals by
$$r_p(x) = Ax - b, \qquad r_d(y, s) = A^T y + s - c,$$
respectively, the Newton-like search directions are defined by the system
$$A \Delta x = -r_p(x)$$
$$A^T \Delta y + \Delta s = -r_d(y, s)$$
$$x \Delta s + s \Delta x = \mu e - xs.$$
The steps are damped to keep the iterates sufficiently bounded away from zero. When starting with y = 0 and x = s = ζe for some suitable ζ, under a mild condition on ζ it can be shown that within $O(n^2 |\log \varepsilon|)$ iterations an ε-solution can be found [Wright (1996)]. The generated solutions are not feasible, in general, but the norms of their residuals are bounded in terms of ε.

IV. RELATED TOPICS

A. Integer Programming

Suppose we have a linear maximization problem with feasible region P:
$$\max \{ c^T x : x \in P \}, \qquad (46)$$
and one or more of the variables are required to be integral. In many situations it is natural to impose such a condition on a variable, for example if a variable represents a number of vehicles in the model. If all variables need to be integral the problem is called a pure integer problem, otherwise a mixed integer problem. In the next sections we restrict the discussion to pure integer problems; the generalization to mixed integer problems is straightforward.

1. Branch-and-Bound Methods

This method is an intelligent way of searching for integral solutions in P with an objective value that is better than the current lower bound, which is initially put to −∞. Let $\bar{x}$ be an optimal solution of Eq. (46). Suppose that $\bar{x}_i$ is fractional. Let $P_1$ be the region obtained by adding the
V. FURTHER EXTENSIONS

It should be noted that many phenomena in this world cannot be described adequately by a linear model, but require a nonlinear model. The solution of such models goes beyond the scope of this chapter. Some remarks are in order, however.
Inspired by the success of the interior-point approach to LO, Nesterov and Nemirovski recognized that the underlying ideas can be extended to a wide class of nonlinear optimization problems, the so-called convex cone optimization problems. Such problems have the form
$$\min \{ c^T x : Ax = b,\ x \in K \}, \qquad (47)$$
where K denotes a convex cone. If $K = \mathbb{R}^n_+$, the nonnegative orthant, this is exactly the standard LO-problem. The above authors showed that the interior-point approach also applies to problems of this type for different cones. Note that Eq. (47) looks like a linear problem, but one should realize that the cone K can hide a lot of nonlinearity. One striking example occurs when K is the cone of positive semidefinite matrices. For example, the nonlinear constraint uv ≥ 1, with u ≥ 0 and v > 0, can be modeled as
$$\begin{pmatrix} u & 1 \\ 1 & v \end{pmatrix} \text{ is positive semidefinite.}$$
As a consequence, the field of semidefinite optimization at present receives much attention. It has important applications in system theory [Boyd et al. (1994)] and in combinatorial optimization. Many combinatorial optimization problems admit a natural relaxation to a semidefinite optimization problem. Using this, Goemans and Williamson could generate approximate solutions of a famous and hard combinatorial problem (finding a maximal cut in a graph), not more than 13% from optimal.
It is generally believed that Karmarkar revealed only the tip of an iceberg; there is still a big mass to be explored.

SEE ALSO THE FOLLOWING ARTICLES

COMPUTER ALGORITHMS • DYNAMIC PROGRAMMING • GAME THEORY • LINEAR SYSTEMS OF EQUATIONS • NONLINEAR PROGRAMMING • OPERATIONS RESEARCH

BIBLIOGRAPHY

Avis, D., and Chvátal, V. (1978). Notes on Bland’s pivoting rule. Math. Program. Study 8, 24–34.
Beale, E. M. L. (1955). Cycling in the dual simplex algorithm. Nav. Res. Logist. Q. 2, 269–276.
Bland, R. (1977). New finite pivoting rules for the simplex method. Math. Oper. Res. 2, 103–107.
Borgwardt, K.-H. (1982). The average number of pivot steps required by the simplex method is polynomial. Z. Oper. Res. 26, 157–177.
Boyd, S. E., El Ghaoui, L., Feron, E., and Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory. SIAM Studies in Applied Mathematics, Vol. 15. SIAM, Philadelphia.
Dantzig, G. B. (1963). “Linear Programming and Extensions,” Princeton Univ. Press, Princeton, NJ.
Dantzig, G. B., Orden, A., and Wolfe, Ph. (1955). Notes on linear programming: Part I, the generalized simplex method for minimizing a linear form under linear inequality restrictions. Pac. J. Math. 5(2), 183–195.
Frisch, K. R. (1956). La resolution des problemes de programme lineaire par la methode du potential logarithmique. Cah. Semin. D’Econ. 4, 7–20.
Goemans, M. X., and Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42(6), 1115–1145.
Goldman, A. J., and Tucker, A. W. (1956). Theory of linear programming. In “Linear Inequalities and Related Systems” (H. W. Kuhn and A. W. Tucker, eds.), Annals of Mathematical Studies, No. 38, pp. 53–97, Princeton Univ. Press, Princeton, NJ.
Hoffman, A. J. (1953). Cycling in the simplex algorithm. Technical Report 2974, National Bureau of Standards.
Jansen, B., de Jong, J. J., Roos, C., and Terlaky, T. (1997). Sensitivity analysis in linear programming: just be careful! Eur. J. Oper. Res. 101, 15–28.
Karmarkar, N. K. (1984). A new polynomial-time algorithm for linear programming. Combinatorica 4, 373–395.
Khachiyan, L. G. (1979). A polynomial algorithm in linear programming. Dok. Akad. Nauk SSSR 244, 1093–1096. Translated into English in Sov. Math. Dok. 20, 191–194.
Klee, V., and Minty, G. J. (1972). How good is the simplex algorithm? In “Inequalities III” (O. Shisha, ed.), Academic Press, New York.
Lee, J. (1997). Hoffman’s circle untangled. SIAM Rev. 39, 98–105.
Mitchell, J. E., and Borchers, B. (1996). Solving real-world linear ordering problems using a primal-dual interior point cutting plane method. Ann. Oper. Res. 62, 253–276.
Nesterov, Y., and Nemirovskii, A. S. (1994). Interior point polynomial algorithms in convex programming. SIAM Studies in Applied Mathematics, Vol. 13. SIAM, Philadelphia.
von Neumann, J. (1947). On a maximization problem. Manuscript, Institute for Advanced Studies, Princeton University, Princeton, NJ.
Orchard-Hays, W. (1990). History of the development of LP solvers. Interfaces 20(4), 61–73.
Roos, C. (1990). An exponential example for Terlaky’s pivoting rule for the criss-cross method. Math. Program. 46, 79–84.
Roos, C., Terlaky, T., and Vial, J.-Ph. (1997). “Theory and Algorithms for Linear Optimization. An Interior Approach,” John Wiley & Sons, Chichester, UK.
Schrijver, A. (1986). “Theory of Linear and Integer Programming,” John Wiley & Sons, New York.
Smale, S. (1983). On the average number of steps of the simplex method of linear programming. Math. Program. 27, 241–262.
Terlaky, T. (1985). A convergent criss–cross method. Math. Oper. Stat. ser. Optimization 16, 683–690.
Williams, H. P. (1990). “Model Building in Mathematical Programming,” 3rd edition, John Wiley & Sons, New York.
Wolsey, L. A. (1998). “Integer Programming,” John Wiley & Sons, New York.
Wright, S. J. (1996). “Primal-Dual Interior-Point Methods,” SIAM, Philadelphia.
Zionts, S. (1969). The criss-cross method for solving linear programming problems. Manag. Sci. 15, 426–445.
P1: FYK Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN008A-387 June 29, 2001 15:29
Loop Groups
Andrew Pressley
King’s College, London
GLOSSARY

Central extension A central extension of a group G by an Abelian group A is a group G̃ that has A as a central subgroup such that the quotient group G̃/A is isomorphic to G.
Deformation retract A space X is a deformation retract of a space Y if the identity map Y → Y can be deformed through continuous maps to a map Y → X.
Lie group A group that is also a smooth manifold such that the group operations are smooth.
Maximal torus A maximal connected Abelian subgroup of a compact Lie group.
Symplectic manifold A smooth manifold equipped with a closed 2-form that is nondegenerate at each point.

LOOP GROUPS are groups of maps from the circle into a finite-dimensional Lie group, usually assumed to be compact. They arise in many areas of mathematics and physics, such as the theory of integrable systems, singularity theory, and two-dimensional quantum field theory. The geometry and representation theory of loop groups is closely analogous to that of compact Lie groups. The central extensions of the Lie algebras of loop groups are the simplest infinite-dimensional examples of Kac–Moody algebras. (Throughout this article, G will denote a compact Lie group.)

I. BASIC PROPERTIES OF LOOP GROUPS

A. Definitions

A loop group LG is the group of maps from the circle $S^1$ into a Lie group G. Except where stated otherwise, the maps will be assumed to be smooth, that is, infinitely differentiable, and the group G to be compact. The group operation in LG is given by pointwise multiplication. When provided with the $C^\infty$-topology, LG can be given the structure of an infinite-dimensional Lie group, modeled on the space $L\mathfrak{g}$ of smooth maps from $S^1$ into the Lie algebra $\mathfrak{g}$ of G. The model space $L\mathfrak{g}$ is a Lie algebra, again under pointwise operations, and is the Lie algebra of LG. It is called a loop algebra. There is an exponential map $L\mathfrak{g} \to LG$ induced from that of G; unlike the exponential map of G itself, that of LG is not surjective, although its image is dense in LG.
One of the many properties of loop groups that does not hold for infinite-dimensional Lie groups in general is the existence of a complexification. This is simply the group $LG_{\mathbb{C}}$ of smooth loops in the complexification $G_{\mathbb{C}}$ of G itself. Its Lie algebra is the complexification of $L\mathfrak{g}$.

B. Twisted Loop Groups

D. Polynomial Loops

Apart from the smooth loops, it is often convenient to consider other classes of loops. One of the most important is the Laurent polynomial loops; these have finite expansions
$$f(z) = \sum_{k=-N}^{N} A_k z^k$$
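A small sketch can make Laurent polynomial loops and pointwise multiplication concrete. It assumes 2 × 2 matrix loops stored as a dictionary {k: A_k} of coefficient matrices; the example loop f(z) = diag(z, 1/z), the constant rotation g, and all function names are hypothetical choices for illustration, not notation from the article. Multiplying two loops pointwise corresponds to convolving their coefficients.

```python
import cmath
import math

def mat_mul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def evaluate(loop, z):
    """Evaluate f(z) = sum_k A_k z^k for a loop stored as {k: A_k}."""
    out = [[0j, 0j], [0j, 0j]]
    for k, A in loop.items():
        w = z ** k
        for i in range(2):
            for j in range(2):
                out[i][j] += A[i][j] * w
    return out

def multiply(f, g):
    """Pointwise product of two loops, computed by convolving coefficients."""
    h = {}
    for k, A in f.items():
        for m, B in g.items():
            P = mat_mul(A, B)
            acc = h.setdefault(k + m, [[0j, 0j], [0j, 0j]])
            for i in range(2):
                for j in range(2):
                    acc[i][j] += P[i][j]
    return h

# f(z) = diag(z, 1/z): a Laurent polynomial loop with N = 1
f = {1: [[1, 0], [0, 0]], -1: [[0, 0], [0, 1]]}
# g: a constant loop (degree 0), here a rotation matrix
c, s = math.cos(0.3), math.sin(0.3)
g = {0: [[c, -s], [s, c]]}

z = cmath.exp(0.7j)                      # a point on the unit circle
h = multiply(f, g)                       # the loop z -> f(z) g(z)
lhs = evaluate(h, z)
rhs = mat_mul(evaluate(f, z), evaluate(g, z))
fz = evaluate(f, z)
det_f = fz[0][0] * fz[1][1] - fz[0][1] * fz[1][0]
```

Here `lhs` and `rhs` agree entrywise, and `det_f` equals 1 on the unit circle, so f takes values in SU(2) there.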
(d) If G is simple, the extension of $L\mathfrak{g}$ defined by any nonzero cocycle ω of the above form is universal, in the sense that every central extension of $L\mathfrak{g}$ by any Abelian Lie algebra A arises from it via a homomorphism A → .

For the definition of coroots, see the next subsection.

F. Affine Lie Algebras

The complexified central extensions $\tilde{L}\mathfrak{g}_{\mathbb{C}}$ become, after taking the semi-direct product with the one-dimensional algebra generated by the derivation d = z d/dz, examples of Kac–Moody algebras, and in this context they are referred to as affine Lie algebras. One can describe analogs of all the usual notions associated to finite-dimensional semi-simple Lie algebras.
Recall the decomposition
$$\mathfrak{g}_{\mathbb{C}} = \mathfrak{t}_{\mathbb{C}} \oplus \bigoplus_{\alpha} \mathfrak{g}_{\alpha}$$
where $\mathfrak{g}_{\alpha}$ is the subspace of $\mathfrak{g}_{\mathbb{C}}$ on which the maximal torus T acts (by conjugation) via the homomorphism $\alpha : T \to S^1$. The homomorphisms that occur are called the roots of $\mathfrak{g}$. Roots are often identified with their derivatives at the identity, which are elements of the dual space $\mathfrak{t}^*$ of $\mathfrak{t}$. The element $\check{\alpha} \in [\mathfrak{g}_{\alpha}, \mathfrak{g}_{-\alpha}]$ such that $\alpha(\check{\alpha}) = 2$ is called the coroot associated to the root α.
To formulate the corresponding definitions for loop groups, consider the semi-direct product $LG \,\tilde{\times}\, S^1$, where $S^1$ acts on LG by rotating loops. Then $T \times S^1$ is a maximal Abelian subgroup of $LG \,\tilde{\times}\, S^1$ (T is identified with the constant loops whose image lies in T). The Weyl group of LG is thus $W_{\mathrm{aff}} = N(T \times S^1)/(T \times S^1)$. It is called the affine Weyl group, and is isomorphic to the semi-direct product $\check{T} \,\tilde{\times}\, W$, where $\check{T}$ is the lattice of homomorphisms $S^1 \to T$.
One has the following decomposition of the complexified Lie algebra of $LG \,\tilde{\times}\, S^1$:
$$L\mathfrak{g}_{\mathbb{C}} \oplus \mathbb{C} = (\mathfrak{t}_{\mathbb{C}} \oplus \mathbb{C}) \oplus \bigoplus_{k \ne 0} \mathfrak{t}_{\mathbb{C}} z^k \oplus \bigoplus_{(\alpha, k)} \mathfrak{g}_{\alpha} z^k$$
The pairs (α, k) that occur, including those corresponding to the second summand where α = 0, are called the roots of LG. The pair (α, k) is regarded as the homomorphism $T \times S^1 \to S^1$ given by $(t, z) \mapsto \alpha(t) z^k$. As in the finite-dimensional case, a subset of the roots is called a simple system if every root can be written as a linear combination of the simple roots, the coefficients being integers all of which have the same sign. A root is called positive if all the coefficients are positive.
The affine Weyl group acts on the Lie algebra of $T \times S^1$ and is generated by the reflections in the hyperplanes γ(x) = 0, where γ runs through the simple roots of LG. The length l(w) of an element $w \in W_{\mathrm{aff}}$ is the number of reflections in a minimal expression for w.
Choose elements $e_{\gamma}$ in the root space $\mathfrak{g}_{\alpha} z^k$ corresponding to each root γ = (α, k) with the property
$$[e_{\gamma}, e_{-\gamma}] = \check{\gamma}$$
for all simple roots γ of LG. Let $\gamma_1, \ldots, \gamma_l$ be the simple roots of LG and write $e_i = e_{\gamma_i}$, $f_i = e_{-\gamma_i}$. Then, the elements $e_i, f_i, \check{\gamma}_i$, i = 1, . . . , l satisfy the following commutation relations:
$$[\check{\gamma}_i, \check{\gamma}_j] = 0$$
$$[e_i, f_j] = \delta_{ij} \check{\gamma}_i$$
$$[\check{\gamma}_i, e_j] = a_{ij} e_j$$
$$[\check{\gamma}_i, f_j] = -a_{ij} f_j$$
$$(\mathrm{ad}\, e_i)^{-a_{ij}+1} e_j = 0 \quad \text{if } i \ne j$$
$$(\mathrm{ad}\, f_i)^{-a_{ij}+1} f_j = 0 \quad \text{if } i \ne j$$
Here, $(a_{ij})$ is a square matrix of integers, called the Cartan matrix of LG, satisfying
$$a_{ii} = 2$$
$$a_{ij} \le 0 \quad \text{if } i \ne j$$
$$a_{ij} = 0 \quad \text{if and only if } a_{ji} = 0$$
Theorem 4. These relations define the universal central extension of $L\mathfrak{g}$ defined in the previous section.
If $(a_{ij})$ is any square matrix of integers satisfying the above conditions, then the preceding relations define a so-called Kac–Moody Lie algebra. If $(a_{ij})$ is positive definite, the Lie algebra is finite-dimensional and semi-simple. In the case of affine Lie algebras, $\det(a_{ij}) = 0$, but every proper principal minor of $(a_{ij})$ is positive. In fact, this condition characterizes the affine Lie algebras, and their twisted analogs, among Kac–Moody algebras.
The information contained in the Cartan matrix is often expressed graphically in the form of the Dynkin diagram of the Lie algebra in question. This is a graph that has one node for each generator $e_i$, the ith and jth nodes being joined by $a_{ij} a_{ji}$ bonds. If $|a_{ij}| > |a_{ji}|$, the bonds carry an arrow pointing toward the ith node.

II. THE FUNDAMENTAL HOMOGENEOUS SPACE

A. Differential-Geometric Properties

The homogeneous space X = LG/G plays a fundamental role in the study of loop groups. It is obviously diffeomorphic to the subgroup $\Omega G$ of LG consisting of the based
loops, that is, those satisfying f(1) = 1; however, it is better to regard it as a homogeneous space of LG, partly because its properties are closely analogous to those of the homogeneous space G/T of G, where T is a maximal torus of G; of course, G/T is not a group.
The first example of this phenomenon is that X, like G/T, is a complex manifold. This follows from the following factorization theorem, in which $L^+G_{\mathbb{C}}$ denotes the subgroup of $LG_{\mathbb{C}}$ consisting of the smooth maps $f : S^1 \to G_{\mathbb{C}}$ that extend smoothly to holomorphic maps of the disc $\{ z \in \mathbb{C} : |z| < 1 \}$ into $G_{\mathbb{C}}$.
Theorem 5. $LG_{\mathbb{C}} = LG \cdot L^+G_{\mathbb{C}}$.
This result shows that X is a homogeneous space of the complex Lie group $LG_{\mathbb{C}}$:
$$X \cong LG_{\mathbb{C}} / L^+G_{\mathbb{C}}$$
One of the most striking facts about X is that it behaves like a compact complex manifold.
Theorem 6.
(a) Every holomorphic map $X \to \mathbb{C}$ is constant on each connected component of X.
(b) If M is any compact complex manifold, the space of based holomorphic maps M → X lying in a fixed homotopy class is finite-dimensional.
The second important aspect of the geometry of X is the existence of an invariant symplectic structure. The tangent space to $\Omega G$ at the identity element is $L\mathfrak{g}/\mathfrak{g}$, where $\mathfrak{g}$ is the Lie algebra of G. The formula
$$\omega(\xi, \eta) = \frac{1}{2\pi} \int_0^{2\pi} \langle \xi(\theta), \eta'(\theta) \rangle \, d\theta$$
where $\langle\ ,\ \rangle$ is an invariant inner product on $\mathfrak{g}$, defines a skew form on $L\mathfrak{g}/\mathfrak{g}$; extending by left translation gives a nondegenerate closed 2-form on X (the nondegeneracy is in the weak sense, that ω(ξ, η) = 0 for all η implies ξ = 0).
Moreover, the complex structure and the symplectic structure are compatible, in the sense that they fit together to give a Kähler structure on X; this means that ω(Jξ, Jη) = ω(ξ, η) and that ω(ξ, Jη) is a Riemannian metric on X, where $J : L\mathfrak{g}/\mathfrak{g} \to L\mathfrak{g}/\mathfrak{g}$ is the infinitesimal complex structure. The Riemannian metric in question is the Sobolev $\frac{1}{2}$-metric, given by the $L^2$-norm of the $\frac{1}{2}$-th derivative.

B. Stratifications

There is a canonical real-valued function on X, the energy function, given by
$$E(f) = \frac{1}{4\pi} \int_0^{2\pi} \langle f^{-1} f'(\theta), f^{-1} f'(\theta) \rangle \, d\theta$$
The Hamiltonian vector field associated to E by the symplectic structure is the derivative of the circle action on X that rigidly rotates loops. The critical points of E are thus the homomorphisms $S^1 \to G$; they fall into conjugacy classes under the action of G and each conjugacy class is a connected, compact complex manifold; the conjugacy classes are in one-to-one correspondence with the orbits of the action of the Weyl group W of G on the lattice of homomorphisms $S^1 \to T$.
The Kähler metric on X defined above allows one to define the gradient vector field of E. Let $\{ f_t \}$ be the downward gradient flow of E passing through a loop f at time t = 0; in other words, the solution of the ordinary differential equation
$$\frac{\partial f}{\partial t} = -\operatorname{grad} E$$
Since X is infinite-dimensional, it is not clear a priori that this exists.
Theorem 7.
(a) The integral curve $f_t$ of the downward gradient flow exists for all t > 0 for any initial loop f. The integral curve exists for all t < 0 if and only if the initial loop f is polynomial.
(b) For all integral curves $f_t$, $\lim_{t \to \infty} f_t$ exists and is a critical point of E. If $f_0$ is a polynomial loop, $\lim_{t \to -\infty} f_t$ exists and is a critical point of E.
For any conjugacy class C of homomorphisms $S^1 \to G$, let $X_C$ and $X^C$ denote the parts of X that tend to a point of C as t → ∞ and as t → −∞, respectively. Note that X is the disjoint union of the $X_C$ and $X_{\mathrm{pol}} = L_{\mathrm{pol}}G/G$ is the disjoint union of the $X^C$.
Theorem 8.
(a) For any conjugacy class C, $X_C$ and $X^C$ are locally closed complex submanifolds of X of finite codimension and finite dimension, respectively.
(b) The intersection of $X_C$ and $X^C$ is transverse and consists of the conjugacy class C.
(c) The stratum $X_1$ corresponding to the identity homomorphism is open and dense in the identity component of X.
(d) If λ is any homomorphism in C,
$$X_C = L^-G_{\mathbb{C}} \cdot \lambda$$
$$X^C = L^+_{\mathrm{pol}}G_{\mathbb{C}} \cdot \lambda$$
Here, $L^-G_{\mathbb{C}}$ is the subgroup of $LG_{\mathbb{C}}$ consisting of the loops that are boundary values of holomorphic maps from the disc |z| > 1 in the Riemann sphere into $G_{\mathbb{C}}$.
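The energy function E of the Stratifications subsection can be evaluated numerically in the simplest case. The sketch below assumes G = U(1), writes a loop as f(θ) = exp(iφ(θ)) so that f⁻¹f′ = iφ′(θ), and takes the invariant inner product ⟨ia, ib⟩ = ab on the Lie algebra; this normalization, the sample perturbation, and the function names are assumptions made for illustration. Under them the homomorphism θ ↦ exp(ikθ) has energy k²/2, and a small perturbation of it within the same component has strictly larger energy.

```python
import math

def energy(phi_prime, samples=256):
    """E(f) = (1/4pi) * integral over [0, 2pi] of phi'(theta)^2 for f = exp(i*phi).

    Uses an equally spaced Riemann sum, which is exact for trigonometric
    polynomials of degree below `samples`.
    """
    h = 2.0 * math.pi / samples
    total = sum(phi_prime(j * h) ** 2 for j in range(samples)) * h
    return total / (4.0 * math.pi)

def homomorphism(k):
    """Derivative of phi(theta) = k * theta, i.e. the loop theta -> exp(i k theta)."""
    return lambda theta: float(k)

def perturbed(k, a):
    """Derivative of phi(theta) = k * theta + a * sin(theta)."""
    return lambda theta: k + a * math.cos(theta)

E_hom = energy(homomorphism(2))       # k^2 / 2 with k = 2
E_pert = energy(perturbed(2, 0.3))    # k^2 / 2 + a^2 / 4 with a = 0.3
```

The perturbed loop stays in the same connected component (same winding number) but has larger energy, consistent with the homomorphisms being critical points of E.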
The first half of part (c) is equivalent to the Birkhoff factorization theorem, which asserts that every loop f in $G_{\mathbb{C}}$ can be written as

$$f = f_- \cdot \lambda \cdot f_+$$

where $f_\pm \in L^\pm G_{\mathbb{C}}$.

C. Grassmannian Embedding

Let H be a separable infinite-dimensional Hilbert space equipped with a polarization, that is, an orthogonal splitting $H = H_+ \oplus H_-$ into a pair of closed infinite-dimensional subspaces. The restricted general linear group $GL_{\mathrm{res}}(H)$ is defined as the group of invertible operators on H whose block decomposition

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

with respect to the polarization has the property that b and c are Hilbert–Schmidt operators. Then a and d are necessarily Fredholm operators, and index(a) = −index(d). The group $GL_{\mathrm{res}}(H)$ is a Hilbert Lie group and has connected components determined by the index of a. Also of importance is the subgroup $U_{\mathrm{res}}(H)$ of unitary operators in $GL_{\mathrm{res}}(H)$.
The restricted Grassmannian Gr(H) of H is defined to be the set of closed subspaces W of H such that the orthogonal projections $W \to H_+$ and $W \to H_-$ are Fredholm and Hilbert–Schmidt, respectively. The group $GL_{\mathrm{res}}(H)$ acts transitively on Gr(H). The Grassmannian Gr(H) is a Hilbert manifold and is homotopy equivalent to $\mathbb{Z} \times BU$, the universal classifying space for vector bundles.
To apply this theory to loop groups, one chooses a finite-dimensional unitary representation V of G and takes H to be the space of $L^2$ functions on the circle with values in V. The subspaces $H_+$ and $H_-$ consist of the functions whose Fourier expansions are of the form $\sum_{n \ge 0} v_n z^n$ and $\sum_{n < 0} v_n z^n$, respectively. The loop group $LG_{\mathbb{C}}$ acts on H, and LG acts by unitary operators.

Theorem 9. The action of LG on H induces a homomorphism $LG \to GL_{\mathrm{res}}(H)$ and a smooth map $X \to \mathrm{Gr}(H)$. Both maps are embeddings if V is a faithful representation of G.
If $G = U_n$ and V is the natural representation on $\mathbb{C}^n$, the image of $X_{\mathrm{pol}}$ in Gr(H) can be characterized as follows:

$$X_{\mathrm{pol}} = \{ W \in \mathrm{Gr}(H) : zW \subset W \text{ and } z^{N} H_+ \subset W \subset z^{-N} H_+ \}.$$

Similar characterizations are possible for classes of loops other than polynomial and groups other than $U_n$.

D. Determinant Bundle and Central Extensions

If H were a finite-dimensional space, there would be a holomorphic line bundle Det on Gr(H) whose fiber over W ∈ Gr(H) is the top exterior power of W. In the infinite-dimensional case, one can still define the determinant bundle, using the fact that operators on H of the form 1 + (trace class) have well-defined determinants. But whereas, in the finite-dimensional case, Det is homogeneous under the action of GL(H), in the infinite-dimensional case only a central extension $\widetilde{GL}_{\mathrm{res}}(H)$ of $GL_{\mathrm{res}}(H)$ by $\mathbb{C}^\times$ acts on Det.
By combining this with the embedding in Proposition 9, one obtains a central extension $\widetilde{L}G_{\mathbb{C}}$ of $LG_{\mathbb{C}}$; up to finite coverings, every extension of $LG_{\mathbb{C}}$ by $\mathbb{C}^\times$ arises in this way. In particular, this shows that all the central extensions of LG described in Section I.E have complexifications.

III. REPRESENTATION THEORY OF LOOP GROUPS

A. Positive-Energy Representations

A representation of LG is said to be of positive energy if it extends to a representation of the semidirect product $S^1 \,\tilde{\times}\, LG$, with $S^1$ acting on LG by rotation, and if $e^{i\theta} \in S^1$ acts by $e^{iR\theta}$, where the spectrum of the self-adjoint operator R is bounded below. The most interesting representations of LG are those of positive energy; it turns out that these representations are projective, that is, representations of a central extension of LG. Apart from that, their properties and classification are closely analogous to those of the finite-dimensional representations of G. The irreducible positive-energy representations of LG correspond to the so-called integrable highest weight representations of affine Lie algebras.
There are also representations that are not of positive energy. For any $a \in S^1$, there is a homomorphism LG → G given by evaluating the loop at a; pulling back an irreducible representation V of G by this homomorphism gives an irreducible representation V(a) of LG; every finite-dimensional irreducible representation of LG is a tensor product of representations of this form. A closely related construction is to let f ∈ LG act on LV by (f, v)(z) = f(az)v(z); tensor products of such representations are generically irreducible. Neither of these types of representation is of positive energy.

B. The Fundamental Representation and The Spin Representation

The space of holomorphic sections of Det is a completion of the exterior algebra $\Lambda(H_+ \oplus \bar{H}_-)$, where $\bar{H}_-$ is the orthogonal complement of $H_+$, but with the
complex conjugate complex structure. The induced action of the central extension $\widetilde{GL}_{\mathrm{res}}(H)$ on $\Lambda$ is irreducible; it is called the fundamental representation of $\widetilde{GL}_{\mathrm{res}}(H)$. "Irreducible" here means that $\Lambda$ contains no proper invariant subspace that is closed in the compact-open topology. Moreover, $\Lambda$ is a unitary representation of $\tilde{U}_{\mathrm{res}}(H)$, the central extension of $U_{\mathrm{res}}(H)$ by $S^1$ obtained by restricting the extension $\widetilde{GL}_{\mathrm{res}}(H)$. This means that $\Lambda$ contains as a dense subspace a Hilbert space on which $\tilde{U}_{\mathrm{res}}(H)$ acts by unitary operators.
One can define in a similar way the restricted orthogonal group $O_{\mathrm{res}}(H_{\mathbb{R}})$ of the real Hilbert space $H_{\mathbb{R}}$ underlying H. The fundamental representation can then be realized as the spin representation of $O_{\mathrm{res}}(H_{\mathbb{R}})$. The space of this realization is a direct sum of symmetric algebras indexed by the integers.
The isomorphism between these two realizations is called the boson-fermion correspondence, by analogy with quantum field theory. The space $\Lambda(H_+ \oplus \bar{H}_-)$ of the first realization is the Hilbert space of a system of free fermions, with $H_+$ and $H_-$ being the states of a single particle of positive and negative energy, respectively. Similarly, the symmetric algebra is the Fock space of a system of free bosons.
By restricting the fundamental representation, one obtains an irreducible unitary representation of $\tilde{L}U_n$, called the basic representation of $LU_n$.

C. Borel–Weil Theory

To construct the positive-energy representations of LG for a general compact Lie group G, one must consider the homogeneous space Y = LG/T, where T is a maximal torus of G. Dividing by the action of G exhibits Y as a bundle over X with fiber G/T. For simplicity, we shall only consider the case where G is simply-connected and simple. Then Y is connected and simply connected so the complex line bundles over Y are classified by their first Chern class, which is an element of

$$H^2(Y, \mathbb{Z}) \cong \mathbb{Z} \oplus H^2(G/T) \cong \mathbb{Z} \oplus \hat{T}$$

where $\hat{T}$ is the lattice of characters of T, that is, the homomorphisms $T \to S^1$. Let $L_{n,\lambda}$ be the line bundle associated to $(n, \lambda) \in \mathbb{Z} \oplus \hat{T}$. Then we have the following classification of the positive-energy representations of LG, which is a precise analog of the Borel–Weil theorem, which describes the finite-dimensional representations of G as sections of line bundles over G/T.

Theorem 10. Let G be a simply-connected, simple compact Lie group.

(a) Each complex line bundle $L_{n,\lambda}$ has a unique holomorphic structure.
(b) $L_{n,\lambda}$ has nonzero holomorphic sections if and only if (n, λ) is dominant, in the sense that

$$0 \le \lambda(\check{\alpha}) \le n \langle \check{\alpha}, \check{\alpha} \rangle$$

for every simple coroot $\check{\alpha}$ of G. Here, $\langle\ ,\ \rangle$ is the unique inner product on $\mathfrak{g}$ such that $\langle \check{\alpha}, \check{\alpha} \rangle = 2$ for every simple coroot $\check{\alpha}$.
(c) If (n, λ) is dominant, the space $\Gamma(L_{n,\lambda})$ of holomorphic sections of $L_{n,\lambda}$ is an irreducible positive-energy representation of a central extension of LG.
(d) Every irreducible positive-energy representation of LG arises in this way.

The positive integer n is called the level of the representation $\Gamma(L_{n,\lambda})$. The pair (n, λ) is usually regarded as an element of the dual of the Lie algebra of $\tilde{T}$, the part of the central extension $\tilde{L}G$ lying over T.
One can show also that every positive-energy representation of LG is unitary, and hence breaks up into a direct sum of representations of the above type. In particular, this implies that every positive-energy representation of LG extends to the complexification $LG_{\mathbb{C}}$. No such property holds for infinite-dimensional groups in general.

D. Kac Character Formula

V. Kac proved an analog for affine Lie algebras (in fact, for all symmetrizable Kac–Moody algebras) of the Weyl formula for the characters of the irreducible representations of G. The character of a representation V of $LG \,\tilde{\times}\, S^1$ is the formal power series

$$\chi_V = \sum_{\gamma} (\dim V_\gamma)\, \gamma(t, z)$$

where $V_\gamma$ is the part of V on which the maximal Abelian subgroup $T \times S^1$ acts by the homomorphism γ. In some cases this can be interpreted as some kind of generalized function of $(t, z) \in T \times S^1$. The formula for the character is often written

$$\chi_V = \sum_{\gamma} (\dim V_\gamma)\, e^{\gamma}$$

identifying the homomorphism γ with an element of the dual of the Lie algebra of $T \times S^1$.

Kac Character Formula

Let $L_V$ be the holomorphic line bundle on LG/T associated to a dominant element of the dual of the Lie algebra of $\tilde{T}$, as in Theorem 10. Let V be the space of holomorphic sections of $L_V$. Then the character of V is

$$\chi_V = \frac{\sum_{w \in W_{\mathrm{aff}}} (-1)^{l(w)}\, e^{w(V - \rho) + \rho}}{\prod_{\gamma > 0} (1 - e^{\gamma})}$$
Here, ρ is characterized by $\rho(\check{\gamma}) = 1$ for all simple roots γ of LG.
Regarded as functions of $z \in S^1$, these characters are boundary values of holomorphic functions in the disc |z| < 1. Since the group $SL_2(\mathbb{Z})$ acts on the disc by appropriate linear fractional transformations, it acts also on the space of such functions. One of the remarkable facts about the characters is that the finite-dimensional vector space spanned by the characters of the irreducible positive-energy representations of a given level is preserved by the action of $SL_2(\mathbb{Z})$. The proper explanation of this is to be found in conformal field theory.

E. Vertex Operators

Historically, the first realization of the basic representation was given in terms of so-called vertex operators, the construction and properties of which were known to physicists in dual resonance theory. The restriction of the central extension of LG to the Abelian group LT is an infinite-dimensional Heisenberg group, and thus has a canonical level 1 irreducible unitary representation, say H. One attempts to extend this representation, initially to the Lie algebra $L\mathfrak{g}$, and then to LG, by defining, for each coroot $\check{\alpha}$ of G, and each complex-valued function f on $S^1$, operators $V_\alpha(f)$ that obey the correct commutation relations with each other and with the generators of the action of LT on H. Since $\mathfrak{g}$ is spanned by $\mathfrak{t}$ and the coroots $\check{\alpha}$, this will accomplish the task of extending the representation to $L\mathfrak{g}$.
The construction works, at least in its simplest form, only when G is simply-laced, which means that all the simple coroots have the same length; the coroots can then be identified with the homomorphisms $S^1 \to T$ of minimal length. One defines

cycle defining the central extension LG, and $\mathrm{Diff}^+(S^1)$ acts on $\tilde{L}G$.

Theorem 11. There is a (projective) action of $\mathrm{Diff}^+(S^1)$ on all positive-energy representations of LG that intertwines with the action of LG.
More recently, the algebra of vertex operators has been formalized and generalized and is capable of describing representations of LG other than the basic one. Vertex algebras have also found other applications, notably to the construction of the "moonshine" representation of the Monster simple group.

IV. RELATIONS WITH OTHER PARTS OF MATHEMATICS

Due to limitations of space, we shall restrict discussion of applications of loop groups to the following two combinatorial topics.

A. Macdonald Identities

I. G. Macdonald discovered a remarkable series of multivariable power series identities, one for each simple compact Lie group G. When $G = SU_2$, this gives Jacobi's formula

$$\eta(q)^3 = \sum_{j=0}^{\infty} (-1)^j (2j+1)\, q^{(1/8)(2j+1)^2}$$

for the cube of the Dedekind eta-function

$$\eta(q) = q^{1/24} \prod_{j=1}^{\infty} (1 - q^j).$$
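Identities of this kind are easy to test on truncated power series. The sketch below, whose function names are illustrative and not taken from the source, expands the product $\prod_{j \ge 1}(1 - q^j) = q^{-1/24}\eta(q)$ and compares it with Euler's pentagonal-number series $\sum_{j \in \mathbb{Z}} (-1)^j q^{j(3j+1)/2}$, and compares its cube with Jacobi's series $\sum_{n \ge 0} (-1)^n (2n+1) q^{n(n+1)/2}$. The fractional prefactors $q^{1/24}$ and $q^{1/8}$ are divided out so that only integer exponents appear.

```python
def euler_product(n_terms):
    """Coefficients of prod_{j>=1} (1 - q^j), truncated below q^n_terms."""
    coeffs = [0] * n_terms
    coeffs[0] = 1
    for j in range(1, n_terms):
        # multiply in place by (1 - q^j); descending k reads not-yet-updated entries
        for k in range(n_terms - 1, j - 1, -1):
            coeffs[k] -= coeffs[k - j]
    return coeffs

def multiply(a, b):
    """Product of two truncated power series of equal length."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                out[i + j] += ai * b[j]
    return out

def pentagonal_series(n_terms):
    """Coefficients of sum over all integers j of (-1)^j q^(j(3j+1)/2)."""
    coeffs = [0] * n_terms
    for j in range(-n_terms, n_terms + 1):
        e = j * (3 * j + 1) // 2
        if 0 <= e < n_terms:
            coeffs[e] += -1 if j % 2 else 1
    return coeffs

def jacobi_series(n_terms):
    """Coefficients of sum over n >= 0 of (-1)^n (2n+1) q^(n(n+1)/2)."""
    coeffs = [0] * n_terms
    n = 0
    while n * (n + 1) // 2 < n_terms:
        coeffs[n * (n + 1) // 2] += (-1) ** n * (2 * n + 1)
        n += 1
    return coeffs

if __name__ == "__main__":
    p = euler_product(60)
    print(p == pentagonal_series(60))
    print(multiply(multiply(p, p), p) == jacobi_series(60))
```

Working with integer coefficient lists keeps the comparison exact, so agreement up to the truncation order is a genuine check rather than a floating-point coincidence.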
Manifold Geometry
C. T. J. Dodson
University of Manchester Institute of Science and Technology
I. Preliminary Notions
II. Geometrical Spaces
III. Manifolds and Bundles
IV. Calculus of Sections
V. Metric Geometry
VI. Connection Geometry
VII. Singular Geometry
VIII. Topology, Geometry, and Physics
P1: GPA Final Pages
Encyclopedia of Physical Science and Technology EN009B-400 July 6, 2001 20:51
allow a geometrization of diverse problems in analysis and in physical field theories; they also allow the use of algebraic topological methods of cohomology to study global properties. Very elegant formulations of the concepts of parallelism and curvature arise from the geometrical object called a connection. This distinguishes the geodesic curves that generalize to a manifold the Euclidean notion of a straight line and also gives a test for completeness of the manifold.

I. PRELIMINARY NOTIONS

A. Sets and Maps

We shall accept as sufficient for our purposes the intuitive notion of a set as a collection of distinct and definite elements. Likewise, a map from a set X to a set Y is a rule that prescribes precisely one element of set Y to be related to each element of set X. Then we indicate it by a diagram like

$$f : X \to Y : x \mapsto f(x)$$

and speak of the map f with domain X that sends a typical element x in X to the element f(x) in Y. A convenient way to view f is in terms of its graph, that is, the set of elements (x, f(x)) lying in the X × Y space. Every map f : X → Y defines another map bringing subsets of Y back to subsets of X

abstract algebra to organize our calculus procedures and to represent symmetries in the geometry. Our first task is to describe the appropriate parts of analysis and algebra to explain the above use of the term "geometric structure" on a set and to explain the kinds of requirements that are often imposed on maps that arise in geometrical contexts. We shall attempt to present them together in a geometrical setting with examples. In order to have a fund of examples we need to assume a working knowledge of certain mathematical raw materials, abbreviations, and basic tools.
We shall suppose some familiarity with elementary algebra and calculus on the real and complex number fields. The analysis on them depends on the modulus or absolute value function $a \mapsto |a|$, which yields a distance function d between two numbers a and b by

$$d(a, b) = |a - b|$$

This satisfies the geometrical requirements:

$d(a, b) = d(b, a)$ (symmetry)
$d(a, b) = 0$ if and only if $a = b$ (positive definiteness)
$d(a, c) \le d(a, b) + d(b, c)$ (triangle inequality)

The algebra of the real and complex numbers depends on each of them having double group structures, one under addition with identity zero and one under multiplication (excepting zero) with identity one. Recall that a group is a set with a binary operation that is associative, has identity, and admits inverses.
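The three requirements on the distance d(a, b) = |a − b| can be spot-checked on a finite sample of real and complex numbers. The following is only a sketch: the function names, the sample, and the rounding tolerance are choices made here, and a finite check illustrates the axioms without proving them.

```python
import itertools
import random

def d(a, b):
    """Distance between two real or complex numbers via the modulus."""
    return abs(a - b)

def check_metric_axioms(points, tol=1e-12):
    """Check symmetry, positive definiteness and the triangle inequality on a sample."""
    for a, b in itertools.product(points, repeat=2):
        assert abs(d(a, b) - d(b, a)) <= tol        # symmetry
        assert (d(a, b) == 0) == (a == b)           # positive definiteness
    for a, b, c in itertools.product(points, repeat=3):
        assert d(a, c) <= d(a, b) + d(b, c) + tol   # triangle inequality
    return True

random.seed(0)
sample = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(12)]
sample += [0.0, 1.0, -2.5]

if __name__ == "__main__":
    print(check_metric_axioms(sample))
```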
$A \cup B$    the set of elements in one or both of subsets A or B; their union
$\{x \in X \mid P(x)\}$    the set of elements in X for which statement P(x) is true
$\Rightarrow$    implies; then
$\Leftrightarrow$    implies and is implied by; if and only if
$A \times B$    the set of ordered pairs (a, b) such that a is in A and b is in B
$X \to Y$    a map from set X to set Y
$X \twoheadrightarrow Y$    a map from set X onto (the whole of) set Y
$X \cong Y$    an equivalence of structures (iso/homeo/diffeomorphism)
$f \circ g$ or $fg$    the composite map; do g then f

EXAMPLE. Denote by G the multiplicative group of unit modulus complex numbers, that is,

$$G = \{ z \in \mathbb{C} \mid |z| = 1 \} = \{ e^{i\theta} \mid \theta \in \mathbb{R} \}$$

This group acts on the set of complex numbers in a simple way:

$$\alpha : G \times \mathbb{C} \to \mathbb{C} : (e^{i\theta}, w) \mapsto e^{i\theta} w$$

that is simply to rotate each complex number $w = re^{i\phi}$ by the angle θ to give $e^{i\theta} \times re^{i\phi} = re^{i(\theta + \phi)}$, for each choice of φ (Fig. 1). This α is a continuous action since

1. Nearby θ values send a fixed w to nearby numbers in $\mathbb{C}$;
2. Nearby w numbers are sent to nearby numbers by a fixed θ.

Consider seeking a real map (i.e., a function)

$$f : \mathbb{C} \to \mathbb{R}$$

that is invariant under the above group action. Evidently we want commutativity of a diagram

               α
     G × C  -------→  C
       |              |
    p2 |              | f
       ↓              ↓
       C    -------→  R
               f

that is, on elements,

    (θ, w)  -------→  e^{iθ} w
       |                 |
       ↓                 ↓
       w    -------→     ?

FIGURE 1 Rotation through angle θ.

Hence we seek f satisfying, for all θ and w, the condition

$$f(e^{i\theta} w) = f(w)$$

This requirement is simply rotational symmetry; that is, f(w) is independent of the angular position of w. One such function is the modulus, defined for all w = x + iy by

$$f(x + iy) = (x^2 + y^2)^{1/2}$$

Try repeating this example for the same group when acting as a rotation about a point other than the origin.
Whenever we have a group action on a set we wish to know the subsets left invariant by the action, if there are any. In our example they do exist and in fact they are the circles with center the origin; we call such subsets the orbits of the group action. They always form a partition of the set carrying the action into disjoint nonempty subsets. Orbits that consist of single points are called fixed points (the origin is one in the example). Orbits need not look like planetary orbits; for example, the action on the Euclidean plane $E^2$ given for the additive group of real numbers by

$$\mathbb{R} \times E^2 \xrightarrow{\ \alpha\ } E^2 : (a, (x, y)) \mapsto (x + a, y)$$

has no fixed point, and its orbits are the horizontal lines. This latter group action is simply translation by a real number along the x axis, but in fact any other direction (or all directions) could have been used. Indeed, if we combine all rotations, translations, and reflections into one collection we get the Euclidean group, so called because it leaves invariant the essentials of Euclidean geometry, namely lengths and angles.

D. Geometrical Spaces

Euclidean geometry is the usual geometry of real n-space $E^n$ and it is algebraically easy to handle because $E^n$ is an affine space; this simply means that relative to any choice of origin it is equivalent to a vector space. Non-Euclidean n-dimensional geometries arise from the smooth patching together of pieces of $E^n$ without regard for the preservation of their algebraic structures. Such patchwork structures are called n-manifolds, and they are our principal domain spaces in general geometry. Familiar examples of 2-manifolds are a sphere and a torus; evidently each has some residual local similarity to pieces of the Euclidean plane $E^2$ from which they are synthesized, but globally they are very different. One geometrical difference is already apparent: in the Euclidean plane the angle sum of any triangle is 180°, but it is easy to see that on a sphere we can find a triangle with sides the arcs of three great circles and having an angle sum of 270°. In fact, this angular excess is a manifestation of the presence of curvature. The formal way to handle it in general geometry is via an entity called a connection, which governs the definition of
parallelism on a manifold: going due north on a sphere is a curve that propagates in a parallel fashion on the curved surface, that is, maintains its direction as closely as it can. Intuitively, a connection structure provides a general geometer with a tool somewhat like the parallel rulers used by navigators.
Once we can say what we mean by the "same direction" at two different points it is meaningful to pose problems like: what is the rate of change in that direction of some entity defined on the space? By means of a connection we obtain a differential operator that precisely measures all such rates of change in an invariant (sometimes called covariant) way, that is, independently of the manner in which we choose to label the directions and points. The operator involved here is the covariant derivative and it replaces ordinary derivatives in non-Euclidean geometries, so allowing the representation of very intricate systems of differential equations such as those used in electrical engineering or for physical field theories.
Probably the most important differential operator on a manifold is the exterior derivative; this comes free with the manifold: no extra structure need be assumed. In quite a precise way it generalizes the gradient operation on scalar functions and the curl operation on vector functions that are used in vector calculus on $E^3$. In that sense it turns out that curvature is the "curl of the connection." We shall see how the exterior derivative characterizes some intrinsically topological properties of manifolds by means of de Rham cohomology theory.
Perverse though it may seem, no sooner do mathematicians set up a nice machinery than there arises an interest in how and why the machinery could be made to seem inadequate or defective. Clearly, if well-founded logically, then the theory (e.g., general geometry) will not have intrinsic defects. However, near the "edges" of its validity some interesting problems arise. For example, general geometry can be asked to model the environment of a physical singularity, like a black hole. This means devising a geometrical space in which something goes wrong; namely, particles disappear if they follow certain paths. It is an interesting recent result that in such a model the singularity seems to be stable: we cannot make it go away by perturbing the geometry, as done for example in quantization procedures.

II. GEOMETRICAL SPACES

A. Euclidean Spaces

The simplest Euclidean space is the one-dimensional case of the real line, $E^1$. As is almost always the case in mathematical processes of generalization, there are two major steps: from one to more than one, and from finitely many to infinitely many. Thus the step from $E^1$ to $E^2$, the real plane, is intuitively difficult at first meeting; but the step on, to $E^4$ or $E^n$ for any finite n, is simple because the algebra and analysis are essentially unchanged. The further step to infinite dimensions is also difficult because definitions change; the kind of hurdle that arises is like the transition from polynomial functions to power series. In fact, we shall not study infinite-dimensional spaces explicitly but they will arise incidentally.
So for our purposes, a good intuitive grasp of the geometry of the real plane $E^2$ is a prime asset. By suppressing coordinates and other manifestations of its "twoness" we shall be able to use the algebra and analysis for higher-dimensional situations. We summarize the geometry as follows:

1. Points of $E^2$ are ordered 2-tuples of real numbers like $p = (p_1, p_2)$.
2. There is a vector difference between pairs of points, a map, $\mathrm{diff} : E^2 \times E^2 \to \mathbb{R}^2 : (p, q) \mapsto q - p = (q_1 - p_1, q_2 - p_2)$, and this correspondence is the best possible in the sense that if we hold one of the points fixed then it is a one-to-one correspondence between all points of $E^2$ and all vectors of $\mathbb{R}^2$; reversing the points reverses the vector.
3. The parallelogram law holds for vector differences of points.
4. There is a distance function defined between pairs of points, the length (or norm) of their vector difference:

$$\mathrm{dist}(p, q) = \| \mathrm{diff}(p, q) \|$$

We use property (1) for labeling the points of the space: in $E^2$ each point has two coordinates, so $E^2$ and $\mathbb{R}^2$ are equivalent (in bijective correspondence) as sets. The vector difference map gives us two things:

1. The set of all directions at each point; this is the tangent space at the point.
2. A simple algebraic cohesion among points that facilitates the solution of problems by trigonometric methods.

Recall that in $\mathbb{R}^2$ we have the dot product between two vectors given by

$$u \cdot v = (u_1, u_2) \cdot (v_1, v_2) = u_1 v_1 + u_2 v_2$$

This also determines the length (or norm) of a vector u by

$$\| u \| = \sqrt{u \cdot u}$$

and angle θ between nonzero vectors u and v by

$$u \cdot v = \| u \| \, \| v \| \cos\theta$$
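The vector difference, dot product, norm, distance, and angle in $E^2$ translate directly into code. The sketch below represents points and vectors as pairs of floats; the function names follow the text where the text names them (diff, dist) and are otherwise choices made here.

```python
import math

def diff(p, q):
    """Vector difference q - p between two points of E^2."""
    return (q[0] - p[0], q[1] - p[1])

def dot(u, v):
    """Dot product u1*v1 + u2*v2 of two vectors of R^2."""
    return u[0] * v[0] + u[1] * v[1]

def norm(u):
    """Length of a vector, the square root of u . u."""
    return math.sqrt(dot(u, u))

def dist(p, q):
    """Distance between points: the norm of their vector difference."""
    return norm(diff(p, q))

def angle(u, v):
    """Angle theta in [0, pi] with u . v = |u| |v| cos(theta); u, v must be nonzero."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

if __name__ == "__main__":
    p, q = (1.0, 2.0), (4.0, 6.0)
    print(diff(p, q), dist(p, q), angle((1.0, 0.0), (0.0, 2.0)))
```

Reversing the points reverses the vector, so diff(q, p) is the negative of diff(p, q), while dist is unchanged.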
1. M can support the notion of continuous functions on it; (we take M to be Hausdorff with countable base).
2. M is the union of a collection $\{U_\alpha \mid \alpha \in A\}$ of its subsets; (the collection is an open cover for M).
3. For each α in the indexing set A there is a continuous equivalence between $U_\alpha$ and $E^n$; (homeomorphisms $\phi_\alpha : U_\alpha \to E^n$ give coordinates).
4. The change of coordinate maps are smoothly differentiable; ($\phi_\alpha \circ \phi_\beta^{-1}$ is a diffeomorphism if $U_\alpha \cap U_\beta \ne \emptyset$).

We call the collection $\{(U_\alpha, \phi_\alpha) \mid \alpha \in A\}$ an atlas of charts for M. Properties 1–4 are clear enough but it is worth noting that 4 is meaningful because maps like $\phi_\alpha \circ \phi_\beta^{-1}$ go between pieces of $E^n$ on which we suppose differentiability to be well understood, namely, calculus on $\mathbb{R}^n$ attached to each point. It is property 4 that enables us to say what we mean by a map between two manifolds being differentiable. Let $M'$ be an m-manifold with atlas $\{(V_\lambda, \psi_\lambda) \mid \lambda \in B\}$. Then a map

$$f : M \to M'$$

is called differentiable if and only if the composite maps $\psi_\lambda \circ f \circ \phi_\alpha^{-1}$ are differentiable wherever $fU_\alpha \cap V_\lambda \ne \emptyset$, for any α ∈ A and λ ∈ B. The reason for this is apparent from the diagram

                    f
      U_α  ----------------→  V_λ ∩ fU_α
       ↑                          |
       | φ_α^{-1}                 | ψ_λ
       |                          ↓
      E^n  ----------------→     E^m
             ψ_λ ∘ f ∘ φ_α^{-1}

Evidently, we are borrowing for manifolds the already known property of differentiability of maps from $E^n$ to $E^m$. Property 4 is spoken of as giving a differentiable or smooth structure to M.
A similar trick of borrowing is employed to define $T_x M$, the tangent space to M at x ∈ M. If $x \in U_\alpha$ then we certainly have a nice vector space tangent at $\phi_\alpha(x) \in E^n$ and it is isomorphic to $\mathbb{R}^n$. The problem is that we may also have $x \in U_\beta$. In that case $\phi_\alpha \circ \phi_\beta^{-1}$ and $\phi_\beta \circ \phi_\alpha^{-1}$ have derivatives that are actually isomorphisms between the tangent spaces at $\phi_\alpha(x)$ and $\phi_\beta(x)$; in fact, they will appear as invertible Jacobian matrices with entries the partial derivatives, from calculus on $\mathbb{R}^n$. Such isomorphisms are used to take equivalence classes of $E^n$-tangent spaces for synthesizing $T_x M$. Then the process can be rounded off nicely because the derivative of a map between manifolds actually becomes a linear map between tangent spaces.
The totality of all tangent spaces to an n-manifold M is actually itself a 2n-manifold with point set

$$TM = \bigcup_{x \in M} T_x M = \{ (x, u) \mid x \in M,\ u \in T_x M \}$$

and atlas $\{(TU_\alpha, T\phi_\alpha) \mid \alpha \in A\}$, where $TU_\alpha = \bigcup_{x \in U_\alpha} T_x M$ and $T\phi_\alpha$ is the derivative of $\phi_\alpha$. We call this manifold the tangent bundle to M and it is noteworthy that it comes free with M; no further structure was needed. There is a natural smooth map onto M:

$$p : TM \twoheadrightarrow M : (x, u) \mapsto x$$

The fiber of such a map over x is

$$p^{\leftarrow}\{x\} = \{ y \in TM \mid p(y) = x \}$$

which evidently coincides with $T_x M$ and therefore looks like $\mathbb{R}^n$ for each x, though there is no unique isomorphism $T_x M \cong \mathbb{R}^n$ actually determined for each x since each chart gives a different one. The tangent bundle is an example of an important class of structures over manifolds, the vector bundles.

B. Bundles

A vector bundle with fiber F (a vector space) over a manifold M is a smooth surjection $p : X \twoheadrightarrow M$ such that every x ∈ M has a neighborhood U for which there is a diffeomorphism $p^{\leftarrow} U \cong U \times F$, and isomorphism on fibers. This implies that the zero vectors in fibers are joined smoothly together and the set of all of them looks like a copy of M embedded in X. In the case of the tangent bundle over an n-manifold, $F = \mathbb{R}^n$ and the local triviality condition is

$$TU_\alpha \cong U_\alpha \times \mathbb{R}^n$$

Thus a vector bundle over M consists of M together with copies of a vector space smoothly fitted over all points of M. We interpret the tangent bundle over M as the set of all points and directions in M.
If we have a vector bundle with fiber F over M then we may be able to use its construction to obtain another vector bundle with fiber the dual space $F^*$ consisting of all linear functionals defined on F. One case when this is always possible is for the tangent bundle TM; we replace each tangent space $T_x M$ by its dual $(T_x M)^*$ consisting of real-valued linear maps on $T_x M$. The resulting bundle is $T^*M$, the cotangent bundle. Recall that just as we can dual vector spaces so we can dual linear maps between them by sending linear f to linear $f^*$ where

$$F \xrightarrow{\ f\ } G, \qquad G^* \xrightarrow{\ f^*\ } F^*, \qquad \text{with } f^* : \alpha \mapsto \alpha \circ f$$

So the dual map actually goes in the reverse direction. For a finite-dimensional vector space F we may identify $(F^*)^*$, the double dual, with F itself by the natural isomorphism

$$F \cong (F^*)^* : v \mapsto \hat{v}$$
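In coordinates, the dual of a linear map is its transpose, and the double-dual element $\hat{v}$ is "evaluate at v". The sketch below assumes finite-dimensional real spaces with fixed bases, so that a linear map is a matrix (a list of rows) and a functional is a list of coefficients; these representations and all function names are assumptions made for illustration.

```python
def apply(matrix, v):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def evaluate(alpha, v):
    """Evaluate a linear functional with coefficient list alpha on a vector v."""
    return sum(a * x for a, x in zip(alpha, v))

def dual_map(matrix):
    """Matrix of f*: G* -> F*, alpha -> alpha o f; in these bases, the transpose."""
    return [list(col) for col in zip(*matrix)]

def hat(v):
    """The element of (F*)* attached to v: it evaluates functionals at v."""
    return lambda alpha: evaluate(alpha, v)

if __name__ == "__main__":
    f = [[1, 2, 0], [0, 1, 3]]      # a linear map from R^3 to R^2
    alpha = [2, -1]                 # a functional on R^2
    v = [1, 1, 1]
    pulled_back = apply(dual_map(f), alpha)   # f*(alpha), a functional on R^3
    # Both sides compute alpha(f(v)); the dual map runs from G* back to F*.
    print(evaluate(pulled_back, v), evaluate(alpha, apply(f, v)))
```

Note how f takes vectors of $\mathbb{R}^3$ forward to $\mathbb{R}^2$, while dual_map(f) takes functionals on $\mathbb{R}^2$ back to functionals on $\mathbb{R}^3$, matching the reversal of direction described above.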
For example $(\mathbb{R}^3)^* \otimes \mathbb{R}^2$ can therefore be interpreted as the space of linear maps from $\mathbb{R}^3$ to $\mathbb{R}^2$, namely the space of 2 × 3 matrices.
There is a third composition, also based on the tensor product, available for a vector space F with itself. This is the alternating or exterior product F ∧ F. In terms of the basis $\{f_1, \ldots, f_n\}$ for F:

F ∧ F has basis $\{ f_i \otimes f_j - f_j \otimes f_i \mid i < j \}$

and so dimension $\tfrac{1}{2}(n^2 - n)$.

We usually write $f_i \wedge f_j = \tfrac{1}{2}(f_i \otimes f_j - f_j \otimes f_i)$. Hence $\mathbb{R}^2 \wedge \mathbb{R}^2$ has basis $\{\hat{\imath} \otimes \hat{\jmath} - \hat{\jmath} \otimes \hat{\imath}\}$ and so $\mathbb{R}^2 \wedge \mathbb{R}^2 \cong \mathbb{R}^1$. Observe that we can only have $(F \wedge F) \cong F$ if $\tfrac{1}{2}(n^2 - n) = n$, that is, if n = 3. This fact is closely related to the existence on $\mathbb{R}^3$ only of the vector cross product × defined by

$$\hat{\imath} \times \hat{\jmath} = \hat{k}, \qquad \hat{\jmath} \times \hat{k} = \hat{\imath}, \qquad \hat{\imath} \times \hat{k} = -\hat{\jmath}$$

For, we can make an isomorphism

$$\mathbb{R}^3 \wedge \mathbb{R}^3 \cong \mathbb{R}^3 \times \mathbb{R}^3$$

by mapping linearly

$$\hat{\imath} \wedge \hat{\jmath} \mapsto \hat{\imath} \times \hat{\jmath}, \qquad \hat{\imath} \wedge \hat{k} \mapsto \hat{\imath} \times \hat{k}, \qquad \hat{\jmath} \wedge \hat{k} \mapsto \hat{\jmath} \times \hat{k}$$

Once again we can use the product ∧ on spaces to give a product on linear maps. For f, g ∈ L(F; J) we obtain

$$f \wedge g : F \wedge F \to J \wedge J : x \wedge y \mapsto f(x) \wedge g(y)$$

These vector space processes can be performed smoothly over a manifold to the fibers of vector bundles. In particular, we have automatically the following vector bundles derived from the tangent bundle TM and cotangent bundle $T^*M$ to a smooth manifold:

τ runs over all k! permutations of {1, 2, . . . , k}, and sgn(τ) is ±1 according as τ is an even or an odd permutation. Then we can define

$$(A^r F^*) \wedge (A^s F^*) = A^{r+s} F^* = A_{r+s}(A^r F^* \otimes A^s F^*)$$

which agrees with what we gave before for r = s = 1, and the exterior product turns out to be associative and distributive but anticommutative:

$$w \wedge v = (-1)^{rs}\, v \wedge w, \qquad w \in A^r F^*,\ v \in A^s F^*$$

It follows that if $\dim F = \dim F^* = n$ then

$$\dim A^r F^* = \binom{n}{r} = \frac{n!}{(n - r)!\, r!}$$

and therefore

$$\dim A^r F^* = \binom{n}{r} = \binom{n}{n - r} = \dim A^{n-r} F^*$$

so in particular

$$\dim A^n F^* = \dim A^0 F^* = 1, \qquad \dim A^k F^* = 0 \text{ for } k > n$$

All of this exterior algebra can be carried over to apply smoothly over a manifold M and hence give rise to $A^r M$, the alternating tensor bundles of differential r-forms or r-form bundles. The collection of all r-forms for all r = 0, 1, . . . on an n-manifold M is denoted $\Lambda M$ and the exterior product gives $\Lambda M$ the structure of a Grassmann or exterior algebra. The collection of all r-forms on M is denoted $\Lambda^r M$ so $\Lambda M$ is the disjoint union of these over all r = 0, 1, . . . . We call the $T^r_s M$ the tensor bundles and we usually adopt the conventional notation

$$T^0_0 M = A^0 M = M \times \mathbb{R}$$
for the trivial product vector bundle with fiber $\mathbb{R}^1$ over M. Much of general geometry and its applications is concerned with sections of these bundles. For example, we may point out that

1. A metric structure is determined by a section of $T^0_2 M$.
2. A connection (i.e., parallelism) structure is determined by a section of $A^2 M$.
3. Curvature of a connection is determined by a section of $T^1_3 M$.
4. On an n-dimensional configuration space of particles: the kinetic energy is a section of $T^0_2 M$ and the Lagrangian is a section of $A^n M$.

Notation

It is important to distinguish between a bundle $X \twoheadrightarrow M$ over M and its space of sections, which sometimes is denoted $\chi(X)$ or $\Gamma(X)$. However, it is common to speak of, for example, the bundle $A^r M$ of r-forms on M when properly $A^r M \overset{p}{\twoheadrightarrow} M$ is the bundle and the collection of its sections is quite different, denoted $\Lambda^r M$ by us; precisely

$$\Lambda^r M = \{ \text{smooth } w : M \to A^r M \mid pw = 1_M \}$$

Similarly, we use $\Upsilon^r_s M$ to denote the collection of all sections of $T^r_s M$, that is, the (r, s)-tensor fields:

$$\Upsilon^r_s M = \{ \text{smooth } v : M \to T^r_s M \mid pv = 1_M \}$$

D. Manifolds with Boundary

The alert reader may have noticed that our definition of an n-manifold disqualifies, for example, a unit cylinder and the closed unit disk from being 2-manifolds. They simply fail to be locally like $E^2$ at their boundary edges; instead neighborhoods of the edge points are homeomorphic to half-spaces like

$$\{ (x, y) \in E^2 \mid y \ge 0 \}$$

Evidently such entities are likely to be needed in geometry and physics, so we shall allow such edge points in future. Spaces that have them are called manifolds with boundary and we shall denote the boundary of such an M by ∂M. In fact, if M is an n-manifold then ∂M is an (n − 1)-manifold.

EXAMPLES

IV. CALCULUS OF SECTIONS

A. Module of Sections

Intuitively we viewed a section of a bundle X over M as a copy of M lifted vertically up into the manifold X in such a way that each point of M goes to the fiber over itself. For this to be possible at all we need the projection of X to be surjective onto M; this we denote by $X \overset{p}{\twoheadrightarrow} M$. In a precise technical sense a section σ : M → X is actually a lift of the identity map $1_M : M \to M$ since we want Fig. 3 commuting, namely such that p(σ(x)) = x, that is, $\sigma(x) \in p^{\leftarrow}\{x\}$ for all x ∈ M. Vector bundles always have one smooth such section, the zero section which sends each x ∈ M to the zero vector in the fiber $p^{\leftarrow}\{x\}$. Some vector bundles (e.g., the trivial product bundles) have a full set of linearly independent sections; such sections, of course, can never cross the zero section. Other vector bundles (e.g., the Möbius bundle over the circle, or $TS^2$, the tangent bundle to the ordinary sphere) do not have even one continuous section that never meets the zero section.
The fact that sections take values in fibers means that over any point in M we may use available algebraic structure in the fiber on sections passing through it. Now, the fibers are joined smoothly together, so not surprisingly we find that their algebraic structures change, but only smoothly, so we can apply much of the algebra to sections themselves. In particular, the set of sections of a vector bundle over a manifold is a vector space (infinite-dimensional) with pointwise addition and multiplication by numbers in each fiber. Furthermore, we can actually allow multiplication by numbers that vary smoothly from point to point, that is by sections of the trivial scalar bundle; this means that the set of sections of a vector bundle over a manifold is a module over the ring of smooth scalar-valued maps on the manifold. Thus, if σ, τ are two sections of the real vector bundle $X \twoheadrightarrow M$ and $\lambda : M \to \mathbb{R}$ is a smooth map, then also σ + τ and λσ are sections of $X \twoheadrightarrow M$ with

$$\sigma + \tau : M \to X : x \mapsto \sigma(x) + \tau(x)$$

$$\lambda\sigma : M \to X : x \mapsto \lambda(x)\sigma(x)$$

Much of geometrical analysis and its applications depend on the study of sections of the tensor bundles; these
are the tensor fields. As we have seen, they are built up from sections of the tangent and cotangent bundles via the tensor product. Accordingly, we shall begin by a study of tangent and cotangent vector fields.

B. Tangent Vector Fields
From the canonical structure of TM inherent in the differentiable structure on an n-manifold M, it turns out that for each chart
\[ \phi: U \to E^n : x \mapsto (x_1, x_2, \ldots, x_n) \]
the tangent space $T_xM$ has a basis $(\partial_1, \partial_2, \ldots, \partial_n)_x$, where each $\partial_i$ is the partial differential operator $\partial/\partial x_i$ for real functions defined on neighborhoods of $\phi(x)$. Hence, for all tangent vector fields $\sigma: M \to TM$, on the domain of such a chart, $\sigma$ appears in the form
\[ \sigma: U \to TM : x \mapsto (\sigma^1\partial_1 + \sigma^2\partial_2 + \cdots + \sigma^n\partial_n)_x \]
with each $\sigma^i$ a real function on U. A change of chart induces a change of basis and corresponding change of components of $\sigma$, both in agreement with the usual transformation law for partial derivatives. Viewing $\Upsilon M$, the space of sections of TM, as a module, it is convenient to think of the map $x \mapsto (\partial_1, \partial_2, \ldots, \partial_n)_x$ as giving a basis for this module in the neighborhood U of x.

EXAMPLE. Let $M = S^2$ be the unit sphere in $\mathbb{R}^3$, explicitly:
\[ M = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1\} \]
Define two charts $(U, \phi)$, $(V, \tilde\phi)$ for
\[ U = \{(x, y, z) \in M \mid z > 0\} \quad \text{(the "northern hemisphere")} \]
\[ V = \{(x, y, z) \in M \mid x > 0\} \quad \text{(the "eastern hemisphere")} \]
by
\[ \phi: U \to \mathbb{R}^2 : (x, y, z) \mapsto (x, y) = (x_1, x_2), \text{ say} \]
\[ \tilde\phi: V \to \mathbb{R}^2 : (x, y, z) \mapsto (y, z) = (\tilde x_1, \tilde x_2), \text{ say} \]
Then on $U \cap V = \{(x, y, z) \in M \mid x > 0, z > 0\}$ (the NE quadrant) we have alternative coordinates: the $(x_i)$ or the $(\tilde x_i)$. These determine corresponding local basis sections:
\[ (\partial_1, \partial_2) = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) \]
\[ (\tilde\partial_1, \tilde\partial_2) = \left(\frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) \]
Evidently these are related (invertibly) by
\[ \tilde\partial_1 = \partial_2 \quad\text{and}\quad \tilde\partial_2 = \frac{\partial x}{\partial z}\,\partial_1 + \frac{\partial y}{\partial z}\,\partial_2 \]
with $x = (1 - y^2 - z^2)^{1/2}$ and $y = (1 - x^2 - z^2)^{1/2}$. Also on $U \cap V$, we could introduce angular coordinates by, say,
\[ \hat\phi: U \cap V \to \mathbb{R}^2 : (x, y, z) \mapsto (\theta, \psi) = (\hat x_1, \hat x_2) \]
where $\theta$ and $\psi$ are the angles defined by
\[ x = \cos\psi\cos\theta, \qquad y = \cos\psi\sin\theta, \qquad z = \sin\psi \]
so $\psi = 0$ on the equator and $\pi/2$ at the north pole, and $\theta = 0$ in the positive x direction, $\theta = \pm\pi/2$ in the ±y directions. Again we get a local basis section
\[ (\hat\partial_1, \hat\partial_2) = \left(\frac{\partial}{\partial\theta}, \frac{\partial}{\partial\psi}\right) \]
which is related to the original choice $(\partial_1, \partial_2)$ by
\[ \hat\partial_1 = \frac{\partial x}{\partial\theta}\,\partial_1 + \frac{\partial y}{\partial\theta}\,\partial_2 \]
\[ \hat\partial_2 = \frac{\partial x}{\partial\psi}\,\partial_1 + \frac{\partial y}{\partial\psi}\,\partial_2 \]
In fact, if we wish to exploit the sphericity of $S^2$ then the chart with $(\theta, \psi)$ coordinates is very convenient. For instance,
\[ U \cap V \to TM : (\theta, \psi) \mapsto \cos\psi\,\hat\partial_1 \]
locally models an east–west wind on the earth, decaying from the equator to the north pole. Similarly,
\[ U \cap V \to TM : (\theta, \psi) \mapsto -\cos\psi\,\hat\partial_2 \]
models a north wind.
A local flow about $x_0 \in M$ for a vector field v on M is a map $\Phi$ for some neighborhood U of $x_0$ and positive $\varepsilon$,
\[ \Phi: U \times (-\varepsilon, \varepsilon) \to M \]
such that $(\forall x \in U)$

a. $\Phi(x, 0) = x$.
b. $c_x: (-\varepsilon, \varepsilon) \to M : t \mapsto \Phi(x, t)$ is a curve with tangent vector $\dot c_x(t) = v(c_x(t))$.

Intuitively we think of $\Phi$ as a family of curves that "join up the arrows" of the vector field v. We call each curve $c_x$ an integral curve of v through x.

EXAMPLE. Take U, V, and M as in the previous example. Then a local flow on $S^2$ for the vector field
\[ v: U \cap V \to TS^2 : (\theta, \psi) \mapsto \cos\psi\,\hat\partial_2 \]
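The two defining properties of a local flow can be checked numerically for this vector field. The following is only a sketch under stated assumptions: in the $(\theta, \psi)$ chart the field $\cos\psi\,\hat\partial_2$ gives the system $\dot\theta = 0$, $\dot\psi = \cos\psi$, and the helper names `flow` and `flow_exact` are hypothetical, introduced here for illustration. The closed form used for comparison follows from $\frac{d}{dt}\sin\psi = 1 - \sin^2\psi$.

```python
import math

def v_psi(psi):
    """psi-component of v = cos(psi) * d/dpsi; the theta-component is zero."""
    return math.cos(psi)

def flow(theta, psi, t, steps=2000):
    """Approximate Phi((theta, psi), t) with classical RK4 on dpsi/dt = cos(psi)."""
    h = t / steps
    for _ in range(steps):
        k1 = v_psi(psi)
        k2 = v_psi(psi + 0.5 * h * k1)
        k3 = v_psi(psi + 0.5 * h * k2)
        k4 = v_psi(psi + h * k3)
        psi += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return theta, psi

def flow_exact(theta, psi, t):
    """Closed form: sin(psi(t)) = tanh(t + artanh(sin(psi0))), valid for |psi0| < pi/2."""
    return theta, math.asin(math.tanh(t + math.atanh(math.sin(psi))))

if __name__ == "__main__":
    start = (0.2, 0.3)
    print(flow(*start, 0.0))            # property (a): Phi(x, 0) = x
    print(flow(*start, 0.5), flow_exact(*start, 0.5))
```

Each integral curve keeps $\theta$ fixed and moves along a meridian toward the north pole, and composing the map for times s and t agrees with the map for time s + t up to integration error.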
P1: GPA Final Pages
Encyclopedia of Physical Science and Technology EN009B-400 July 6, 2001 20:51
We have been careful so far to put subscript x on various entities to indicate that they vary with x. This is not always necessary and where it is clear whether we mean, for instance, a vector at x or a field about x, we shall omit such subscripts. There is another very useful notational trick: the summation convention attributed to Einstein. This is simply to sum over repeated upper and lower indices. For example, if A is an n × n matrix with entries $a^i_j$ then we can write trace $A = a^i_i$. Thus the identity matrix has entries $\delta^i_j$ and its trace is $\delta^i_i = n$. We shall use the same convention with more than one symbol present, for instance,
\[ f_1\,dx^1 + f_2\,dx^2 + \cdots + f_n\,dx^n = f_i\,dx^i \]
\[ v^1\partial_1 + v^2\partial_2 + \cdots + v^n\partial_n = v^i\partial_i \]
\[ h^{11}\,\partial_1\otimes\partial_1 + h^{12}\,\partial_1\otimes\partial_2 + \cdots + h^{nn}\,\partial_n\otimes\partial_n = h^{ij}\,\partial_i\otimes\partial_j \]

D. Commutator or Lie Bracket
Given two tangent vector fields u, v with expressions $u = u^i\partial_i$ and $v = v^j\partial_j$ in terms of local partial derivatives defined by some chart $(U, \phi)$, we obtain their commutator vector field
\[ [u, v]: U \to TM : x \mapsto (u^i\,\partial_i v^j - v^i\,\partial_i u^j)\,\partial_j \]
Independently of any chart we can define the commutator through its action as a derivation on any smooth real function $f: M \to \mathbb{R}$ by
\[ [u, v](f) = u(v(f)) - v(u(f)) \]
A feel for what the commutator measures can be obtained by examining the flows of u and v. Suppose that these flows are, respectively, $\Phi$ and $\Psi$ with
\[ \Phi_t: U \to M : x \mapsto \Phi(x, t) \]
\[ \Psi_s: U \to M : x \mapsto \Psi(x, s) \]
Then it turns out that on U
\[ \Phi_t \circ \Psi_s = \Psi_s \circ \Phi_t \quad\text{if and only if}\quad [u, v] = 0 \]
Thus, [u, v] measures the failure of the two flows to commute. Every vector field v satisfies the self-commuting property
\[ [v, v] = 0 \]
and the flow $\Phi$ of v reflects this in the identity $\Phi_t \circ \Phi_s = \Phi_s \circ \Phi_t = \Phi_{s+t}$ wherever all are defined on U.

E. Products of Fields
We have seen how $\otimes$ and $\wedge$ can be used in vector spaces, and it is easy to see how these operations can be performed smoothly over a manifold, so giving corresponding products on fields. Locally with respect to some chart $\phi: U \to E^n$, suppose that we have basis fields $\{\partial_1, \ldots, \partial_n\}$ for tangent and $\{dx^1, \ldots, dx^n\}$ for cotangent vector fields. Then we obtain, for example, the basis fields
\[ \{\partial_i\otimes\partial_j \mid i, j = 1, \ldots, n\} \text{ for } T^2_0U \]
\[ \{dx^i\otimes dx^j \mid i, j = 1, \ldots, n\} \text{ for } T^0_2U \]
\[ \{dx^i\wedge dx^j \mid 1 \le i < j \le n\} \text{ for } \Lambda^2U \]
\[ \{dx^{i_1}\wedge dx^{i_2}\wedge\ldots\wedge dx^{i_r} \mid 1 \le i_1 < i_2 < \ldots < i_r \le n\} \text{ for } \Lambda^rU \]
To accommodate the structure of such spaces as $\Lambda^rU$ we modify the summation convention by enclosing the lower set of repeated indices in brackets to indicate that the summation is only over increasing sequences. Thus, for example, we would write for $w \in \Lambda^2E^3$
\[ w = w_{12}\,dx^1\wedge dx^2 + w_{13}\,dx^1\wedge dx^3 + w_{23}\,dx^2\wedge dx^3 = w_{(ij)}\,dx^i\wedge dx^j \]

EXAMPLE. Consider the simple case $M = E^2$. Then we can illustrate two important fields with respect to the standard chart.
\[ (1)\quad g: M \to T^0_2M : (x_1, x_2) \mapsto \delta_{ij}\,dx^i\otimes dx^j, \qquad \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \]
This defines the usual metric structure on $E^2$, that is, the standard dot product on each tangent space $T_xM$:
\[ g(u^k\partial_k, v^m\partial_m) = \delta_{ij}\,dx^i\otimes dx^j\,(u^k\partial_k, v^m\partial_m) \]
\[ = \delta_{ij}\,u^k v^m\,dx^i\otimes dx^j\,(\partial_k, \partial_m) \]
\[ = \delta_{ij}\,u^k v^m\,\delta^i_k\,\delta^j_m \]
\[ = \delta_{ij}\,u^i v^j = u^1v^1 + u^2v^2 \]
\[ (2)\quad \omega: M \to \Lambda^2M : (x_1, x_2) \mapsto dx^1\wedge dx^2 \]
This defines the usual geometrical measure on $E^2$, that is, the standard parallelogram area mapped out by pairs of vectors in each tangent space $T_xM$:
\[ \omega(u^k\partial_k, v^m\partial_m) = \tfrac12(dx^1\otimes dx^2 - dx^2\otimes dx^1)(u^k\partial_k, v^m\partial_m) = \tfrac12(u^1v^2 - u^2v^1) \]
Both (1) and (2) generalize to $E^n$: g has the same form but we sum over i, j = 1, . . . , n; $\omega = dx^1\wedge\ldots\wedge dx^n$. Any change of chart induces corresponding changes in their local expressions.
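As a small illustration of fields (1) and (2) in the standard chart of $E^2$, the component formulas can be evaluated directly. This is a minimal sketch, not part of the original text: the function names `g` and `omega` are chosen here for illustration, tangent vectors are represented as plain tuples of components $(u^1, u^2)$, and the factor ½ in `omega` follows the convention $dx^1\wedge dx^2 = \tfrac12(dx^1\otimes dx^2 - dx^2\otimes dx^1)$ used in the example.

```python
def g(u, v):
    """Euclidean metric delta_ij dx^i (x) dx^j applied to component tuples u, v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def omega(u, v):
    """dx^1 ^ dx^2 applied to (u, v), with the 1/2 normalisation of the wedge product."""
    return 0.5 * (u[0] * v[1] - u[1] * v[0])

if __name__ == "__main__":
    u, v = (1.0, 2.0), (3.0, 4.0)
    print(g(u, v))                          # u^1 v^1 + u^2 v^2 = 11.0
    print(omega(u, v), omega(v, u))         # antisymmetric: -1.0 and 1.0
```

The symmetry of g and the antisymmetry of $\omega$ are visible directly: swapping the arguments leaves `g` unchanged and flips the sign of `omega`.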
F. Exterior Derivative
There is an extraordinarily useful derivative on differential forms defined locally by
\[ d: \Lambda^rM \to \Lambda^{r+1}M : w_{(i_1\ldots i_r)}\,dx^{i_1}\wedge\ldots\wedge dx^{i_r} \mapsto dw_{(i_1\ldots i_r)}\wedge dx^{i_1}\wedge\ldots\wedge dx^{i_r} \]
where $dw_{i_1\ldots i_r} = \partial_j w_{i_1\ldots i_r}\,dx^j$, the gradient of the component function. Surprisingly, this exterior derivative d is uniquely determined by the requirements:

a. $d: \Lambda^rM \to \Lambda^{r+1}M$ linearly
b. $d: \Lambda^0M \to \Lambda^1M : f \mapsto df$
c. $w \in \Lambda^rM,\ v \in \Lambda^sM \Rightarrow d(w\wedge v) = dw\wedge v + (-1)^r\,w\wedge dv$
d. $d^2 = 0$

Here of course we are using $\Lambda^rM$ to denote the space of sections of the bundle.

EXAMPLE. For a real function $f \in \Lambda^0M$, with respect to some chart
\[ df = \partial_i f\,dx^i \in \Lambda^1M \]

FIGURE 4 Chain complex of modules.

2. r-forms arriving from $\Lambda^{r-1}M$, the image of
\[ d: \Lambda^{r-1}M \to \Lambda^rM \]
We usually abbreviate those to ker d and im d, and if we wish to emphasize precisely where they come from we write
\[ \ker d = Z^r(M, d) \quad\text{and}\quad \operatorname{im} d = B^r(M, d) \]
Note that it is a consequence of $d^2 = 0$ that
\[ B^r(M, d) \subseteq Z^r(M, d) \]
and a consequence of the linearity of d that $B^r(M, d)$ is a submodule of $Z^r(M, d)$ and both are submodules of $\Lambda^rM$. When equality occurs we say that the sequence is exact at $\Lambda^rM$; this means that
But $Z^1(S^1, d) = \Lambda^1S^1 = \{\text{smooth } w: S^1 \to A^1S^1 = T^*S^1\}$ and $B^1(S^1, d) = \{df \mid f \in \Lambda^0S^1 = \text{smooth maps } S^1 \to \mathbb{R}\}$. Hence
\[ H^1(S^1, d) = \{w + B^1(S^1, d) \mid w \in \Lambda^1S^1\} \]

which, on choosing charts about x and f(x), appears locally as the Jacobian matrix $[f^i_j]$ of partial derivatives of components of f with respect to coordinates at x. As always, there is a dual to the linear map $D_xf$, going the other way:
expressible as $\lambda\,du + \mu\,dv$ for any real functions $\lambda$, $\mu$.
2. On a sphere $S^2$ there is no nongradient cotangent field: $u \in \Lambda^1S^2 \Rightarrow u = df$ for some $f: S^2 \to \mathbb{R}$; also, there is a nonzero $w \in \Lambda^2S^2$, which is consequently not of the form du for $u \in \Lambda^1S^2$ since all such have $u = df$ so $du = d^2f = 0$.
3. If M is $\mathbb{R}^n$ (or the open n-ball, $\{x \in \mathbb{R}^n \mid \|x\| < 1\}$, to which it is diffeomorphic) then every closed r-form $w \in \Lambda^rM$ is expressible as $w = du$ for some $u \in \Lambda^{r-1}M$; hence the partial differential equation $du = w$ has a solution whenever $dw = 0$.

V. METRIC GEOMETRY

In an n-manifold M we often have to measure two kinds of things: the "volume" of a piece of M or the "size" of vectors from some bundle over M. The special kind of measure function appropriate for a manifold is a volume form.

A. Volume Forms
A volume form on an n-manifold M is a nowhere-zero n-form $\mu \in \Lambda^nM$. Since $A^nM$ is a vector bundle [with fiber $(\mathbb{R}^n)^*$] there may be no such volume form.
On $E^n$ with the standard coordinates there is a volume form, namely $dx^1\wedge dx^2\wedge\ldots\wedge dx^n$, as we have pointed out before. Now every point of M has a neighborhood that is diffeomorphic to $E^n$ so in each such neighborhood we could construct a local volume form by pulling it back from $E^n$; the problem is simply that we may not be able smoothly to join up these local expressions into a global volume form on M. A sufficient condition for effecting the joining up is easily motivated:
If $(U, \phi)$, $(V, \psi)$ are two charts about $x \in U \cap V$ with coordinates $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, respectively, then

not orientable by passing to density forms, but we shall not pursue this.
If $\mu$ is a volume form on M then so is $f\mu$ for any nowhere-zero smooth real function f, so such an f must stay the same sign on each component piece of M. Since $A^nM$ can have only one independent section, volume forms can only differ by such a factor f. We say $\mu$ and $f\mu$ are equivalent choices of orientation if f is everywhere positive and opposite choices if f is everywhere negative.
Given a volume form $\mu$ on M and an atlas $\{(U_\alpha, \phi_\alpha) \mid \alpha \in A\}$, it is possible to define an integral of an n-form $w \in \Lambda^nM$ by means of a technical device called a partition of unity. This effectively shares w out fairly among the charts involved in any overlapping. Then each $\phi_\alpha^*$ carries the appropriately weighted fraction of w to $\Lambda^nE^n$, where integration is as usual, and it remains to sum up the various weighted contributions from overlapping charts. For this to give a finite answer it is necessary that the sum converge, and for this we require M to have finite volume or, if not, then w must become zero on all but a finite region of M. The technical condition is that w has compact support. The integral of w is denoted $\int_M w$ and in particular the volume of M is $\int_M \mu$.
A fundamental application of this process is in Stokes' theorem:
\[ \int_M dv = \int_{\partial M} v \qquad \text{for } v \in \Lambda^{n-1}M \text{ with compact support} \]
where $\partial M$ is the boundary of an n-manifold with boundary. In this general context, integration by parts has the expression
\[ \int_M df\wedge v = \int_{\partial M} f\,v - \int_M f\,dv \]
where $f \in \Lambda^0M$.
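Stokes' theorem can be checked numerically in the plane. The sketch below makes its own illustrative choices, none of which come from the original text: M is taken to be the region enclosed by the ellipse $x = a\cos\theta$, $y = b\sin\theta$, the 1-form is $v = \tfrac12(x\,dy - y\,dx)$ so that $dv = dx\wedge dy$, both integrals are approximated with a midpoint rule, and the function names are hypothetical.

```python
import math

def boundary_integral(a, b, n=10000):
    """Integral of (1/2)(x dy - y dx) around the ellipse, midpoint rule in theta."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        th = (k + 0.5) * h
        x, y = a * math.cos(th), b * math.sin(th)
        dx, dy = -a * math.sin(th), b * math.cos(th)   # derivatives with respect to theta
        total += 0.5 * (x * dy - y * dx) * h
    return total

def area_integral(a, b, n=100000):
    """Integral of dx ^ dy over the interior, by vertical slices (midpoint rule in x)."""
    h = 2 * a / n
    total = 0.0
    for k in range(n):
        x = -a + (k + 0.5) * h
        total += 2 * b * math.sqrt(1 - (x / a) ** 2) * h
    return total

if __name__ == "__main__":
    a, b = 3.0, 2.0
    print(boundary_integral(a, b), area_integral(a, b), math.pi * a * b)
```

Both sides approximate $\pi ab$, the boundary integral essentially exactly because its integrand reduces to the constant $\tfrac12 ab$.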
But $dx\wedge dy$ is the usual volume form for $E^2$ so $\int_M dx\wedge dy$ is just the area enclosed by the curve $\partial M$. In particular, if $\partial M$ is the ellipse L with
\[ L = \{(x = a\cos\theta,\ y = b\sin\theta) \in \mathbb{R}^2 \mid 0 \le \theta \le 2\pi\} \]
then
\[ x\,dy - y\,dx = (ab\cos^2\theta + ba\sin^2\theta)\,d\theta = ab\,d\theta \]
and it follows that the area of the ellipse is
\[ \int_M dx\wedge dy = \frac12\int_{\partial M} ab\,d\theta = \pi ab \]
Evidently the volume form on $E^3$ is $dx\wedge dy\wedge dz$ and its restriction to the two-dimensional submanifold $E^2$ which is the (x, y) plane is simply $dx\wedge dy$. Now, the ellipse itself is a one-dimensional submanifold of $E^2$ and, like the circle, it supports a nowhere-zero 1-form $d\theta$. The standard volume form on the ellipse L is actually $r\,d\theta$, where $r^2 = x^2 + y^2$, and therefore the circumference of the ellipse is
\[ \int_L r\,d\theta = \int_0^{2\pi} (a^2\cos^2\theta + b^2\sin^2\theta)^{1/2}\,d\theta = a\int_0^{2\pi} (1 - e^2\sin^2\theta)^{1/2}\,d\theta \]
the familiar elliptic integral with e the eccentricity $(1 - b^2/a^2)^{1/2}$.

B. Metric Tensors
A metric tensor on an n-manifold M is an element $g \in \Upsilon^0_2M$, that is, a section of $T^0_2M$
\[ g: M \to T^0_2M : x \mapsto g_x \in T_xM^*\otimes T_xM^* \]
where $g_x: T_xM \times T_xM \to \mathbb{R} : (u, v) \mapsto g_x(u, v)$ satisfies the conditions

1. Symmetry: $g_x(u, v) = g_x(v, u)$.
2. Nondegeneracy: if $g_x(u, v) = 0$ for all $v \in T_xM$ then $u = 0$.

If $(\partial_1, \ldots, \partial_n)_x$ is a basis for $T_xM$ then its dual, $(dx^1, \ldots, dx^n)$, is a basis for $(T_xM)^*$ and so $g_x$ is expressible in the form
\[ g_x = (g_{ij}\,dx^i\otimes dx^j)_x \]
Then the symmetry condition imposes symmetry on the matrix of numbers $(g_{ij})_x$ and nondegeneracy makes its determinant nonzero.
Usually we drop the subscript x indicating the variability of g and its coordinate expression with position in M.

EXAMPLE. Take $M = E^2$ so each tangent space is isomorphic to $\mathbb{R}^2$ as the set of directions at the point. The simplest choice of metric tensor is given by the usual inner product
\[ g_x: T_xM \times T_xM \to \mathbb{R} : (u^i\partial_i, v^j\partial_j) \mapsto u^1v^1 + u^2v^2 \]
and so here $g = \delta_{ij}\,dx^i\otimes dx^j$. This induces on $E^2$ all of the usual Euclidean geometry, including the usual volume form; it is easily generalized to $E^n$.
Another metric tensor on $E^2$ is given by
\[ \eta_x: T_xM \times T_xM \to \mathbb{R} : (u^i\partial_i, v^j\partial_j) \mapsto u^1v^1 - u^2v^2 \]
and so this is expressed as
\[ \eta = \eta_{ij}\,dx^i\otimes dx^j \quad\text{where}\quad (\eta_{ij}) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]
This induces Minkowski geometry on $E^2$, as used in relativity.

From the example we see that the Euclidean metric tensor satisfies a stronger condition than 2. It is 2′. Positive definiteness: $g_x(u, u) = 0$ if and only if $u = 0$. This imposes on the matrix $(g_{ij})_x$ that its eigenvalues all be of one sign.
A metric tensor satisfying condition 2′ is called a Riemannian metric; one satisfying only 2 is called an indefinite metric or a pseudo-Riemannian metric. Riemannian metric geometry includes all of the usual curved surfaces and much else; pseudo-Riemannian geometry includes space–time structures where angles and lengths are interpreted very differently from those of common experience.
Since the simplest possible choice of metric tensor gives the whole of standard Euclidean geometry on $E^n$, we can expect that the structure implied by any metric tensor is very rich indeed. This is the case, but before investigating it we should consider existence questions. Now, an n-manifold consists of copies of $E^n$ smoothly joined together and we have a standard metric tensor on each patch of $E^n$, so we are once again faced with patching them together in a consistent way. The same trick works as for volume forms: we use a partition of unity to share out fairly the contributions of each overlapping copy of $E^n$. Thus, all of our manifolds admit a Riemannian metric. However, whereas the module $\Lambda^nM$ of volume forms is one-dimensional and so any two differ only by a scalar function, the module $\Upsilon^0_2M$ of sections of $T^0_2M$ is $n^2$-dimensional and no particular Riemannian metric is
distinguished in general. Of course, if M is itself a submanifold of some manifold M′ (especially $E^m$ for some $m \ge n$) that has a metric tensor g, then the restriction of g to M is a metric tensor on M. This is an important source of examples.
We list next some of the important implications of an n-manifold M having a metric tensor g.

1. It induces a measure of length for a curve
\[ c: [0, 1] \to M : t \mapsto c(t) \]
The tangent vector to c at t is $\dot c(t) \in T_{c(t)}M$ and we define the length of c to be
\[ \int_{t=0}^{t=1} |g(\dot c(t), \dot c(t))|^{1/2}\,dt \]
In coordinates, if $c(t) \in U$ for some chart $(U, \phi)$ then $\phi(c(t)) = (c^1(t), \ldots, c^n(t))$ and $\dot c(t) = \dot c^i(t)\,\partial_i$. Hence $g(\dot c, \dot c) = g_{ij}\,\dot c^i\dot c^j$.
2. There is a dual metric tensor $g^* \in \Upsilon^2_0M$ with
\[ g^*_x: T_xM^* \times T_xM^* \to \mathbb{R} : (\alpha, \beta) \mapsto g^*_x(\alpha, \beta) \]
with coordinate expression
\[ g^*_x = (g^{ij}\,\partial_i\otimes\partial_j)_x \quad\text{and}\quad (g^{ij}) = (g_{ij})^{-1} \]
as matrices, well defined since $\det g_{ij} \ne 0$.
3. There is an isomorphism $\Upsilon^0_1M \cong \Upsilon^1_0M$ of tangent and cotangent fields:
\[ g^\flat_x: T_xM \to T_xM^* : v \mapsto v^* \quad\text{with}\quad v^*(u) = g_x(u, v) \]
which appears in coordinates as
\[ g^\flat_x: v^i\partial_i \mapsto g_{ij}\,v^i\,dx^j \]
with inverse
\[ g^\sharp_x: w_i\,dx^i \mapsto g^{ij}\,w_i\,\partial_j \]
4. These isomorphisms in 3 can be tensor-producted to give isomorphisms
\[ \Upsilon^r_sM \cong \Upsilon^k_mM \quad\text{for all } r + s = k + m \]
Consequently, we can switch among tensor fields to suit our convenience when we have a metric tensor.
5. If M is an orientable manifold then a choice of an oriented atlas (Jacobian everywhere positive on overlaps) yields a unique volume form $\mu_g$ determined locally by
\[ \mu_g = |\det g_{ij}|^{1/2}\,dx^1\wedge\cdots\wedge dx^n \]
This is nowhere zero since nondegeneracy ensures $\det g_{ij} \ne 0$. Observe that this does agree with the standard volume form on $E^n$, where $\det g_{ij} = 1$ everywhere.
6. If M is oriented then there is a Hodge dual isomorphism on differential forms:
\[ {*}: \Lambda^rM \to \Lambda^{n-r}M : w \mapsto {*}w \]
To explain the construction of this isomorphism suppose that $(e^1, \ldots, e^n)_x$ is any ordered basis in the orientation for $T_xM^*$ and ordered bases for $\Lambda^rM$ and $\Lambda^{n-r}M$ are given locally by
\[ \{e^{i_1}\wedge e^{i_2}\wedge\cdots\wedge e^{i_r} \mid 1 \le i_1 < \cdots < i_r \le n\} \]
\[ \{e^{j_1}\wedge e^{j_2}\wedge\cdots\wedge e^{j_{n-r}} \mid 1 \le j_1 < \cdots < j_{n-r} \le n\} \]
Then with $g_{ij} = g(e_i, e_j)$
\[ {*}\big(w_{(i_1\cdots i_r)}\,e^{i_1}\wedge\cdots\wedge e^{i_r}\big) = |\det g_{ij}|^{1/2}\,w^*_{(j_1\cdots j_{n-r})}\,e^{j_1}\wedge\cdots\wedge e^{j_{n-r}} \]
\[ w^*_{j_1\cdots j_{n-r}} = g^{k_1i_1}g^{k_2i_2}\cdots g^{k_ri_r}\,w_{i_1\cdots i_r}\,\operatorname{sgn}(k \to i) \]
and $\operatorname{sgn}(k \to i)$ is the sign of the permutation
\[ (i_1, \ldots, i_r, j_1, \ldots, j_{n-r}) \to (k_1, \ldots, k_r, j_1, \ldots, j_{n-r}) \]
Intuitively, ${*}w$ is the "complement" of w in the volume form $\mu_g$ determined by g; in particular, we find
\[ {*}1 = \mu_g \qquad (1 \text{ is the constant unit real function on } M,\ 1 \in \Lambda^0M) \]
\[ {*}\mu_g = (-1)^\nu \qquad (\nu \text{ is the number of negative eigenvalues of } g) \]
The appearance of ${*}$ is simplest if we choose the oriented base $(e^1, \ldots, e^n)$ for $T_xM^*$ to be orthonormal, that is, $g^*(e^i, e^j) = \pm\delta^{ij}$ so $|\det g_{ij}| = 1$. In this case
\[ {*}(e^{i_1}\wedge\cdots\wedge e^{i_r}) = (e^{j_1}\wedge\cdots\wedge e^{j_{n-r}}) \]
for any even permutation $(i_1, \ldots, i_r, j_1, \ldots, j_{n-r})$ of $(1, \ldots, n)$, and we extend the isomorphism to all r-forms by requiring it to be linear.

Some justification for the presence of $|\det g_{ij}|$ in the volume form can be seen by recalling that $(g_{ij})$ is the local (n × n) matrix expression of a linear map $T_xM \to T_xM^*$. For such matrix maps from $\mathbb{R}^n$ to $\mathbb{R}^n$ the determinant measures precisely the factor of volume change under the map: a unit n-cube is sent to an n-box of volume $|\det g_{ij}|$.

EXAMPLE. The equations of the electromagnetic field on a space-time manifold can be very neatly expressed in terms of the electromagnetic 2-form $F \in \Lambda^2M$ and dim M = 4.
Locally, for a basis of 1-form fields $(\omega^i)$,
\[ F = F_{(ij)}\,\omega^i\wedge\omega^j \]
If we suppose that the $\omega^i$ are mutually orthogonal unit fields, then the metric tensor components $(g_{ij})$ are the eigenvalues of g lying along the diagonal.
The Hodge dual isomorphism gives ${*}(\omega^i\wedge\omega^j) = \omega^m\wedge\omega^k$, where (i, j, m, k) is an even permutation of (1, 2, 3, 4). So ${*}(\omega^1\wedge\omega^2) = \omega^3\wedge\omega^4$ and so forth. Similarly, ${*}(\omega^1\wedge\omega^2\wedge\omega^3) = \omega^4$ and so forth.
Physical theory leads to the following equations for F in regions that contain negligible amounts of matter:
\[ dF = 0 \qquad {*}d{*}F = J \qquad (\dagger) \]
where J is the current density. These equations correspond to the usual Maxwell's equations through the vector calculus correspondences:
\[ {*}d \equiv \text{curl} \quad\text{and}\quad {*}d{*} \equiv \text{divergence} \]
Conservation of charge is expressed by
\[ \operatorname{div} J = 0 \]
This is automatically satisfied when there is negligible matter since it becomes
\[ {*}d{*}{*}d{*}F = 0 \quad\text{because } d^2 = 0 \]
However, in the presence of matter, Eq. (†) becomes
\[ dA = 0 \qquad {*}d{*}B = J \qquad \text{for some } A, B \in \Lambda^2M \]
with A and B related by some transformation, perhaps linear. Once again, $d^2 = 0$ ensures conservation of charge.
Locally, $J = \rho\,J_i\,\omega^i$, where $\rho$ is the charge density. Then over a compact spacelike submanifold S of M we can measure the total charge $Q_N$ and find that
\[ Q_N = \int_S \rho\,\omega^1\wedge\omega^2\wedge\omega^3 = \int_S {*}J = \int_S d\,{*}B = \int_{\partial S} {*}B \]

VI. CONNECTION GEOMETRY

Given a tangent vector field $v: M \to TM$ it is quite likely that we shall be interested in measuring its rate of change over M. Now, v is a smooth map between smooth manifolds so it induces the derivative
\[ Dv: TM \to T(TM) \]
between the corresponding tangent bundles. Unfortunately, this is not useful as a measure of the rate of change of v on M because Dv takes values in T(TM), not in TM. A connection gets us from T(TM) to TM in orderly fashion.

A. Linear Connection
There are more general connections than the one that we shall use; we are interested in those that have particular significance for the structures we already have on M.
A (linear) connection on M is a splitting of the vector bundle T(TM) into a direct sum of a horizontal part HM and a vertical part VM, with HM isomorphic to TM as vector bundles. The motivation is clear: given any vector $u \in TM$ and field $v: M \to TM$ then
\[ Dv(u) \in T(TM) \cong HM \oplus VM \]
and we interpret the rate of change of v in the direction u as the projection of the HM part onto TM. Equivalently, we define a connection on M to be a map $\nabla$ that assigns to each $u \in TM$ and $v \in \Upsilon^1_0M$ a vector $\nabla_uv$ in $T_xM$ (where $u \in T_xM$) such that

a. $\nabla_u$ is linear over $u \in T_xM$.
b. $\nabla_uv$ is linear over $v \in \Upsilon^1_0M$.
c. $\nabla_u(fv) = u(f)\,v(x) + f(x)\,\nabla_uv$ if $f: M \to \mathbb{R}$.
d. If $w, v \in \Upsilon^1_0M$ then so is $\nabla_wv: x \mapsto \nabla_{w(x)}v$.

We view $\nabla_uv$ as the rate of change of the field v in the direction of the vector u at $x \in M$ and call it the covariant derivative of v with respect to u.
By exploiting the linearity properties we can easily obtain coordinate expressions. Given a chart $(U, \phi)$ with coordinates $(x_1, \ldots, x_n)$ about x and basis fields $(\partial_1, \ldots, \partial_n)$ for tangent vector fields about x, any $u \in T_xM$ is of the form $u = (u^i\partial_i)_x$ and near x any vector field w is of the form $w = w^j\partial_j \in \Upsilon^1_0U$. Now we must have
\[ \nabla_uw = \nabla_{u^i\partial_i}\,w^j\partial_j \in T_xM \]
\[ = u^i\,\nabla_{\partial_i}\,w^j\partial_j \qquad\text{by a} \]
\[ = u^i\big(\partial_iw^j\,\partial_j + w^j\,\nabla_{\partial_i}\partial_j\big)_x \qquad\text{by b and c} \]
Hence $\nabla$ is completely determined on U by specification of $\nabla_{\partial_i}\partial_j \in T^1_0U$. But any such field is of the form
\[ \nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k \quad\text{for some } \Gamma^k_{ij}: U \to \mathbb{R} \]
So locally $\nabla$ appears as an $n^3$ array of smooth real functions on U. These functions are traditionally called the Christoffel symbols of the connection, and they will change smoothly from chart to chart. Substitution of $[(\partial y^m/\partial x^k)\,\hat\partial_m]$ for $(\partial_k)$ through a change of chart coordinates from $(x_1, \ldots, x_n)$ to $(y_1, \ldots, y_n)$ can be traced through the above steps to obtain expressions for
\[ \nabla_{\hat\partial_i}\hat\partial_j = \hat\Gamma^k_{ij}\,\hat\partial_k \]
Then the $\hat\Gamma^k_{ij}$ can be related to the $\Gamma^k_{ij}$ through the partial derivatives $(\partial y^m/\partial x^k)$. This is one way to see that the Christoffel symbols are not components of any tensor field.
We easily extend $\nabla$ to give the covariant derivative of arbitrary tensor fields by defining for $u \in T_xM$

a. $\nabla_uf = u(f)$ for $f \in \Upsilon^0_0M$.
b. $\nabla_u(dx^i\otimes\partial_i) = 0$ for mutually dual basis fields.
c. $\nabla_u(w\otimes v) = (\nabla_uw)\otimes v + w\otimes\nabla_uv$.

From b and c with $u = \partial_j$ we get, for example,
\[ \nabla_{\partial_j}(dx^i\otimes\partial_i) = \nabla_{\partial_j}dx^i\otimes\partial_i + dx^i\otimes\nabla_{\partial_j}\partial_i = 0 \]
\[ 0 = \nabla_{\partial_j}dx^k\otimes\partial_k + \Gamma^k_{ji}\,dx^i\otimes\partial_k \]
therefore
\[ \nabla_{\partial_j}dx^k = -\Gamma^k_{ji}\,dx^i \]
This defines a family of 1-forms called the connection 1-forms.

B. Parallel Transport
Given a curve $c: [0, 1] \to M$, its tangent vector field is denoted $\dot c: [0, 1] \to TM$ and in components $\dot c = \dot c^j\partial_j$. If $w \in \Upsilon^r_sM$ then we say that w is parallel along c if
\[ \nabla_{\dot c}w = 0 \]
If $w = w^k\partial_k \in \Upsilon^1_0M$ is parallel along c with tangent vector field $\dot c^j\partial_j = d/dt$, then
\[ \nabla_{\dot c}w = \nabla_{\dot c^j\partial_j}\,w^k\partial_k = 0 \in \Upsilon^1_0M \]
so $(\dot c^j\partial_jw^k)\,\partial_k + \dot c^jw^i\,\nabla_{\partial_j}\partial_i = 0$. Hence $dw^k/dt + (dc^j/dt)\,w^i\,\Gamma^k_{ji} = 0 \in \mathbb{R}$, k = 1, 2, . . . , n. This system of n linear differential equations, which represents $\nabla_{\dot c}w = 0$, has a unique solution
\[ w: [0, 1] \to TM : t \mapsto w_t \]

EXAMPLE. Consider $E^1$ with $\nabla_{\partial_1}\partial_1 = \lambda\,\partial_1$, for some constant $\lambda \in \mathbb{R}$, with respect to the standard chart. Take $c: [0, 1] \to E^1 : t \mapsto t$, so $\dot c(t) = \partial_1$. Then $\tau_t: T_{c(0)}E^1 \to T_{c(t)}E^1 : \alpha_0\partial_1 \mapsto \alpha(t)\partial_1$ satisfies
\[ \frac{d\alpha}{dt} + \alpha\lambda = 0 \quad\text{so}\quad \alpha(t) = \alpha_0e^{-\lambda t} \]
Evidently $\lambda = 0$ corresponds to the usual connection since we do not usually alter the length of vectors when we move them on $E^1$. Any $\lambda \ne 0$ determines a non-Euclidean parallelism structure on $E^1$. A similar connection could be put on $S^1$.

EXAMPLE. To find a local expression for the parallel transport isomorphism. We consider $M = E^2$ with the standard chart and connection $\nabla$ having constant Christoffel symbols
\[ \Gamma^1_{12} = \Gamma^1_{21} = 1 \quad\text{and all other components zero} \]
Given the curve $c: [0, 1] \to E^2 : t \mapsto (t, t^2)$ we find the parallel vector field
\[ w: [0, 1] \to TE^2 : t \mapsto f(t)\,\partial_1 + g(t)\,\partial_2 \]
for two independent initial tangent vectors:

a. $w(0) = \partial_1$
b. $w(0) = \partial_2$

The parallel transport condition is $\nabla_{\dot c}w = 0$ and we are given $\dot c(t) = \partial_1 + 2t\,\partial_2$. Substituting
\[ \dot f\,\partial_1 + \dot g\,\partial_2 + 2tf\,\Gamma^i_{21}\,\partial_i + g\,\Gamma^i_{12}\,\partial_i = 0 \]
\[ (\dot f + 2tf + g)\,\partial_1 + \dot g\,\partial_2 = 0 \]
so $g(t) = g(0)$.
We solve $\dot f + 2tf + g = 0$ for constant g to give:

Case (a): $f(t) = e^{-t^2}$, $g(t) = g(0) = 0$.
Case (b): $f(t) = -e^{-t^2}\int_0^t e^{x^2}\,dx = k(t)$, say, and
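The parallel transport equation $\dot f + 2tf + g = 0$, $\dot g = 0$ of this example can also be integrated numerically and compared with the closed forms quoted for cases (a) and (b). The following is a sketch only: the function names `transport` and `k` are hypothetical, the integrator is a classical fourth-order Runge–Kutta scheme chosen for illustration, and k(t) is evaluated by composite Simpson quadrature.

```python
import math

def transport(f0, g0, t_end=1.0, steps=2000):
    """Integrate f' = -(2 t f + g), g' = 0 along c(t) = (t, t^2) with RK4."""
    h = t_end / steps
    t, f, g = 0.0, f0, g0          # g stays constant because g' = 0

    def rhs(t, f):
        return -(2.0 * t * f + g)

    for _ in range(steps):
        k1 = rhs(t, f)
        k2 = rhs(t + 0.5 * h, f + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, f + 0.5 * h * k2)
        k4 = rhs(t + h, f + h * k3)
        f += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return f, g

def k(t, n=1000):
    """k(t) = -exp(-t^2) * integral_0^t exp(x^2) dx, composite Simpson rule (n even)."""
    h = t / n
    s = 1.0 + math.exp(t * t)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * math.exp((i * h) ** 2)
    return -math.exp(-t * t) * s * h / 3

if __name__ == "__main__":
    print(transport(1.0, 0.0), math.exp(-1.0))   # case (a): w(0) = d_1
    print(transport(0.0, 1.0), k(1.0))           # case (b): w(0) = d_2
```

Because the equation is linear, transporting the two basis vectors determines the transport of every initial vector; the resulting matrix has determinant $e^{-t^2} \ne 0$, consistent with parallel transport being an isomorphism.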
on the earth: $2m \doteq 3\times10^{-11}$ sec, $r \doteq 2.1\times10^{-2}$ sec, $\ddot r \doteq -3/8\times10^{-7}$
on the moon: $2m \doteq 2.5\times10^{-13}$ sec, $r \doteq 5.4\times10^{-3}$ sec, $\ddot r \doteq -14/3\times10^{-9}$
on the sun: $2m \doteq 10^{-5}$ sec, $r \doteq 2$ sec, $\ddot r \doteq -1/8\times10^{-5}$

A manifold with connection is called geodesically complete if all of its geodesics can be extended to infinite parameter values (or until they meet the boundary, if M is a manifold with boundary). It is known that, with their standard connections induced from being embedded in Euclidean space, the circle, sphere, and torus are all geodesically complete. Observe that on such spaces the extension of a geodesic to infinite parameter values may involve it in repeatedly covering the same points, for some geodesics become closed curves.
As might be expected, from the fact that some sufficiently small region about $x \in M$ is diffeomorphic to $E^n$, we can get a geodesic going in any direction. That is, for all sufficiently small initial tangent vectors $\dot c(0) = u_0 \in T_xM$, there is a geodesic through $c(0) = x$. To make "sufficiently small" precise we need a norm in $T_xM$, but any norm will do equally well. Define
\[ S_x = \{u_0 \in T_xM \mid \text{there is a geodesic } c: [0, 1] \to M \text{ with } c(0) = x \text{ and } \dot c(0) = u_0\} \]
Then there is a nice map called the exponential map at x
\[ \exp_x: S_x \to M : u_0 \mapsto c(1) \quad\text{where } c \text{ is a geodesic with } c(0) = x \text{ and } \dot c(0) = u_0 \]
It turns out that at every point x there is some neighborhood of the zero vector in $T_xM$ on which $\exp_x$ is a diffeomorphism onto its image. If the connection is complete then $\exp_x$ is defined on all of $T_xM$ for all $x \in M$.

D. Metric Connection
We saw that any Euclidean space has a connection, the simplest possible choice, and we implied that any subset of $E^n$ that is a manifold will inherit a unique connection. Now this is a consequence of the following result:
Every metric tensor g determines a unique connection $\nabla^g$, called the metric connection or Levi–Civita connection. There is one obvious condition that it should satisfy:

a. $\nabla^g_ug = 0$ for all $u \in \Upsilon^1_0M$.

This is called being compatible with the metric and it effectively says that the covariant derivative will always view the metric tensor (and its dual $g^*$) as a constant: $\nabla^g$ factors out any variability introduced by peculiarities of g. In particular, it will make parallel transport an isometry as well as an isomorphism, and covariant derivatives will commute with the isomorphisms $g^\sharp$ and $g^\flat$ induced by g. The second condition is less obvious at first:

b. $\nabla^g_uv - \nabla^g_vu = [u, v]$ for all $u, v \in \Upsilon^1_0M$.

This is called being symmetric because it implies that with respect to any coordinates the Christoffel symbols $\Gamma^k_{ij}$ are symmetric in ij.
Conditions (a) and (b) are sufficient to select a unique connection for a manifold with metric tensor. They give a system of differential equations that locally allows the Christoffel symbols to be calculated from partial derivatives of the components of g. We find that
\[ \partial_k\,g(\partial_i, \partial_j) = g(\nabla^g_{\partial_k}\partial_i, \partial_j) + g(\partial_i, \nabla^g_{\partial_k}\partial_j) \]
from a so
\[ \partial_kg_{ij} = \Gamma^m_{ki}\,g_{mj} + \Gamma^m_{kj}\,g_{im}, \]
\[ \nabla^g_{\partial_i}\partial_j - \nabla^g_{\partial_j}\partial_i = [\partial_i, \partial_j] = 0 \]
from b so $\Gamma^k_{ij} = \Gamma^k_{ji}$. Hence we use the inverse matrix of $(g_{rs})$ and symmetry to give
\[ \Gamma^k_{ij} = \tfrac12\,g^{km}(\partial_ig_{jm} + \partial_jg_{im} - \partial_mg_{ij}) \]
It can be shown that every manifold can be given a Riemannian metric that is geodesically complete. Evidently, the fact that $(g_{ij}) = (\delta_{ij})$ everywhere on $E^n$ immediately gives zero Christoffel symbols in the standard coordinates; however, if we use other than rectilinear coordinates, then the corresponding metric tensor components will be nonconstant and some nonzero Christoffel symbols will arise. The idea is clear: if we wish to keep Euclidean geometry but describe it with curvilinear coordinates, then we shall expect any components in these coordinates (of vectors or tensors) to alter as we parallel transport them.

EXAMPLE. To find a metric connection and equations for a parallel vector field along a given curve. We take $M = (0, 2\pi) \times S^1$, an open cylinder with identity coordinate x on the interval $(0, 2\pi)$ and angular coordinate $\theta$ on the circle $S^1$. Consider the expression in these coordinates of the pseudo-Riemannian metric tensor
\[ (g_{ij}) = \begin{pmatrix} -(1 - \cos x)^2 & 0 \\ 0 & (1 - \cos x)^2 \end{pmatrix} \quad\text{at } (x, \theta) \in M \]
From symmetry and compatibility of the induced metric connection $\nabla$ with Christoffel symbols $(\Gamma^m_{ij})$ we have
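\[ \Gamma^m_{ij} = \tfrac{1}{2}\, g^{mk} \left( \partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij} \right) = \tfrac{1}{2}\, g^{mk}\, \Gamma_{kij}, \text{ say} \]

Substitution gives

\[ (\Gamma_{1ij}) = \begin{pmatrix} -2\sin x\,(1-\cos x) & 0 \\ 0 & -2\sin x\,(1-\cos x) \end{pmatrix} \]

\[ (\Gamma_{2ij}) = \begin{pmatrix} 0 & 2\sin x\,(1-\cos x) \\ 2\sin x\,(1-\cos x) & 0 \end{pmatrix} \]

Hence

\[ (\Gamma^1_{ij}) = \begin{pmatrix} \sin x/(1-\cos x) & 0 \\ 0 & \sin x/(1-\cos x) \end{pmatrix} \]

\[ (\Gamma^2_{ij}) = \begin{pmatrix} 0 & \sin x/(1-\cos x) \\ \sin x/(1-\cos x) & 0 \end{pmatrix} \]

For the vertical-going curve $c : (0, 2\pi) \to M : t \mapsto (t, 0)$ a parallel vector field is $v : (0, 2\pi) \to TM : t \mapsto f(t)\,\partial_x + h(t)\,\partial_\theta$ where $\nabla_{\dot{c}} v = 0$. This differential equation becomes

\[ \partial_x f + f\,\Gamma^1_{11} = 0 \]
\[ \partial_x h + h\,\Gamma^2_{12} = 0 \]

We can interpret $T$ as a section of $T^1_2 M$ since at each $x \in M$ we have a bilinear map

\[ T : T_xM \times T_xM \to T_xM \]

but such maps are effectively elements of $T_xM^* \otimes T_xM^* \otimes T_xM$. Hence we may view $T$ as a section of $T^*M \otimes T^*M \otimes TM$ and locally

\[ T = \left( \Gamma^k_{ij} - \Gamma^k_{ji} \right) dx^i \otimes dx^j \otimes \partial_k \]

The antisymmetry of $T$ in $i$ and $j$ immediately suggests an interpretation of $T$ as some kind of 2-form. This is possible but the presence of the $\partial_k$ direction means that then we view $T$ as a vector-valued 2-form, the torsion form

\[ \Theta : \Upsilon^1_0 M \times \Upsilon^1_0 M \to \Upsilon^1_0 M \]

which in local coordinates becomes

\[ \Theta = \tfrac{1}{2}\, \Theta_{ij}\, dx^i \wedge dx^j \quad \text{where } \Theta_{ij} = \Gamma^k_{ij}\, \partial_k \]

so locally $\Theta$ takes values in $\mathbb{R}^n \cong T_xM$.

In much the same way, we can represent any connection by a vector-valued 1-form, the connection form.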
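
The two parallel transport equations of the cylinder example can be integrated in closed form. The following worked step is only a sketch under the example's own data: it uses $\Gamma^1_{11} = \Gamma^2_{12} = \sin x/(1-\cos x)$ evaluated along $c(t) = (t, 0)$, so that $x = t$, and the integration constants $A$ and $B$ are introduced here purely for illustration.

\[ \frac{f'(t)}{f(t)} = -\frac{\sin t}{1-\cos t} = -\frac{d}{dt} \ln(1-\cos t) \quad\Longrightarrow\quad f(t) = \frac{A}{1-\cos t}, \qquad h(t) = \frac{B}{1-\cos t} \]

As a consistency check, the metric of the example gives $g(v, v) = -(1-\cos t)^2 f^2 + (1-\cos t)^2 h^2 = B^2 - A^2$, which is constant along the curve, as one expects for parallel transport under a metric connection.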
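The connection form of this $\nabla$ is $\omega : \Upsilon^1_0 E^2 \to \Upsilon^1_1 E^2$ with

\[ \omega = \Gamma^k_{ij}\, dx^j \otimes \partial_k \otimes dx^i \]
\[ = \begin{pmatrix} 1\,dx^1 \otimes \partial_1 & 8\,dx^1 \otimes \partial_2 \\ 4\,dx^2 \otimes \partial_1 & 0\,dx^2 \otimes \partial_2 \end{pmatrix} \otimes dx^1 + \begin{pmatrix} 1\,dx^1 \otimes \partial_1 & 6\,dx^1 \otimes \partial_2 \\ 4\,dx^2 \otimes \partial_1 & 2\,dx^2 \otimes \partial_2 \end{pmatrix} \otimes dx^2 \]

So

\[ \omega = \begin{pmatrix} 1 & 8 \\ 4 & 0 \end{pmatrix} dx^1 + \begin{pmatrix} 1 & 6 \\ 4 & 2 \end{pmatrix} dx^2 \quad (\text{with values in } \mathbb{R}^{2 \times 2}) \]

F. Curvature of a Connection

Intuitively we perceive that the unit sphere in $E^3$ has curvature but the $(x, y)$ plane there has not; both are 2-manifolds inheriting a metric connection from Euclidean $E^3$. A geometer detects the presence of curvature by taking a tangent vector around closed curves by parallel transport. If upon return to the starting point the transported vector is always the same as the initial vector, then the connection used for the parallel transport is called flat; otherwise it is called curved. The amount of curvature at different points and in different directions is measured by the curvature map

\[ R : \Upsilon^1_0 M \times \Upsilon^1_0 M \to L(\Upsilon^1_0 M, \Upsilon^1_0 M) : (u, v) \mapsto R(u, v) \]

where $R(u, v) : \Upsilon^1_0 M \to \Upsilon^1_0 M : w \mapsto \nabla_u \nabla_v w - \nabla_v \nabla_u w - \nabla_{[u,v]} w$. Since $L(\Upsilon^1_0 M, \Upsilon^1_0 M) \cong \Upsilon^1_1 M$, we can consider $R$ to be a member of $L(\Upsilon^2_0 M, \Upsilon^1_1 M) \cong \Upsilon^1_3 M$, that is, a $\binom{1}{3}$-tensor field. Observe that $R(u, v) = -R(v, u)$.

We interpret $R(u, v)$ at a point $x$ as a map of $T_xM$ to itself that is a limiting case of parallel transport around a curvilinear parallelogram determined by $u(x), v(x) \in T_xM$:

\[ R(\partial_i, \partial_j)\partial_k = \nabla_{\partial_i}\!\left( \Gamma^m_{jk}\, \partial_m \right) - \nabla_{\partial_j}\!\left( \Gamma^m_{ik}\, \partial_m \right) \]
\[ = \left( \partial_i \Gamma^m_{jk} + \Gamma^l_{jk} \Gamma^m_{il} \right) \partial_m - \left( \partial_j \Gamma^m_{ik} + \Gamma^l_{ik} \Gamma^m_{jl} \right) \partial_m \]
\[ = \left( \partial_i \Gamma^m_{jk} - \partial_j \Gamma^m_{ik} + \Gamma^l_{jk} \Gamma^m_{il} - \Gamma^l_{ik} \Gamma^m_{jl} \right) \partial_m \]
\[ = R^m_{ijk}\, \partial_m \]

So as a $\binom{1}{3}$-tensor field

\[ R = R^m_{ijk}\, \partial_m \otimes dx^i \otimes dx^j \otimes dx^k \]

Most conveniently we can interpret $R$ as a vector-valued 2-form, the curvature form

\[ \Omega : \Upsilon^1_0 M \to \Upsilon^1_1 M \]

which in local coordinates becomes

\[ \Omega = \Omega_{(ij)}\, dx^i \wedge dx^j \]

where $\Omega_{ij} = \left( \partial_i \Gamma^m_{jk} - \Gamma^l_{ik} \Gamma^m_{jl} \right) \partial_m \otimes dx^k$. So, locally $\Omega$ takes values in $T_xM \otimes T_xM^* \cong \mathbb{R}^{n^2}$, giving a matrix representing the limiting parallel transport map of vectors from $T_xM$ around a parallelogram defined by $\partial_i$, $\partial_j$.

It is easy to extend the exterior product and exterior derivative to vector-valued $r$-forms, just by applying the usual operations to their components. When we do this for the connection, curvature, and torsion forms we obtain the famous structural equations of E. Cartan, for example,

\[ \Omega(u, v) = d\omega(u, v) + \tfrac{1}{2}\, [\omega(u), \omega(v)] \qquad u, v \in \Upsilon^1 M \]

EXAMPLE. The previously mentioned Schwarzschild metric tensor on $M = \mathbb{R} \times E^3 \setminus B$ with coordinates $(t, r, \theta, \phi)$ is of the form

\[ (g_{ij}) = \begin{pmatrix} -f^2(r) & 0 & 0 & 0 \\ 0 & f^{-2}(r) & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2 \sin^2\theta \end{pmatrix} \]

with $f$ a function of $r$.

As before we let indices run 0, 1, 2, 3. Evidently, a basis of mutually orthogonal unit 1-form fields, that is, of orthonormal fields, is given by

\[ (\omega^i) = (f\,dt,\; f^{-1}\,dr,\; r\,d\theta,\; r \sin\theta\, d\phi) \]

Their exterior derivatives satisfy the structural equations

\[ d\omega^i = -\omega^i_j \wedge \omega^j \quad \text{where } \omega^i_j = \Gamma^i_{jk}\, \omega^k \]

and

\[ \Omega^i_j = d\omega^i_j + \omega^i_k \wedge \omega^k_j \quad \text{where } \Omega^k_l = R^k_{l(ij)}\, \omega^i \wedge \omega^j \]

so

\[ d\omega^0 = \dot{f}\, \omega^1 \wedge \omega^0 \]
\[ d\omega^1 = 0 \]
\[ d\omega^2 = (f/r)\, \omega^1 \wedge \omega^2 \]
\[ d\omega^3 = (f/r)\, \omega^1 \wedge \omega^3 + \frac{\cot\theta}{r}\, \omega^2 \wedge \omega^3 \]

Then we deduce that the only nonzero $\omega^i_j$ are

\[ \omega^1_2 = -\omega^2_1 = \frac{f}{r}\, \omega^2 \quad \text{so} \quad d\omega^1_2 = \frac{f \dot{f}}{r}\, \omega^1 \wedge \omega^2 \]
\[ \omega^1_3 = -\omega^3_1 = \frac{f}{r}\, \omega^3 \quad \text{so} \quad d\omega^1_3 = \frac{f \dot{f}}{r}\, \omega^1 \wedge \omega^3 + \frac{f}{r^2} \cot\theta\, \omega^2 \wedge \omega^3 \]
\[ \omega^1_0 = \omega^0_1 = \dot{f}\, \omega^0 \quad \text{so} \quad d\omega^1_0 = (\dot{f}^2 + f \ddot{f})\, \omega^1 \wedge \omega^0 \]
\[ \omega^2_3 = -\omega^3_2 = \frac{\cot\theta}{r}\, \omega^3 \quad \text{so} \quad d\omega^2_3 = -\frac{1}{r^2}\, \omega^2 \wedge \omega^3 \]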
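
The exterior derivatives quoted for the orthonormal 1-forms of the Schwarzschild example can be checked by hand. The lines below are a sketch of that check for two of them, assuming only the definitions $\omega^0 = f\,dt$, $\omega^1 = f^{-1}dr$, $\omega^2 = r\,d\theta$, $\omega^3 = r \sin\theta\, d\phi$, so that $dt = f^{-1}\omega^0$, $dr = f\,\omega^1$, $d\theta = r^{-1}\omega^2$ and $d\phi = (r \sin\theta)^{-1}\omega^3$.

\[ d\omega^0 = \dot{f}\, dr \wedge dt = \dot{f}\, (f\,\omega^1) \wedge (f^{-1}\omega^0) = \dot{f}\, \omega^1 \wedge \omega^0 \]

\[ d\omega^3 = \sin\theta\, dr \wedge d\phi + r \cos\theta\, d\theta \wedge d\phi = \frac{f}{r}\, \omega^1 \wedge \omega^3 + \frac{\cot\theta}{r}\, \omega^2 \wedge \omega^3 \]

The remaining two cases follow in the same way, with $d\omega^1 = 0$ because $f^{-1}dr$ depends on $r$ alone.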
By inspection of the second structural equation we find

\[ \Omega^0_1 = (f \ddot{f} + \dot{f}^2)\, \omega^1 \wedge \omega^0 \]
\[ \Omega^0_2 = \frac{f \dot{f}}{r}\, \omega^2 \wedge \omega^0 \]
\[ \Omega^3_1 = \frac{f \dot{f}}{r}\, \omega^1 \wedge \omega^3 + \frac{f}{r^2} \cot\theta\, \omega^2 \wedge \omega^3 \]
\[ \Omega^2_1 = \frac{f \dot{f}}{r}\, \omega^1 \wedge \omega^2 \]
\[ \Omega^3_2 = \frac{f^2 - 1}{r^2}\, \omega^2 \wedge \omega^3 \]
\[ \Omega^0_3 = \frac{f \dot{f}}{r}\, \omega^3 \wedge \omega^0 \]

Then from the definition of the curvature form we obtain the components $R^k_{lij}$ of the Riemann curvature tensor. For example,

\[ R^2_{332} = R^3_{223} = \frac{f^2}{r^2} - \frac{1}{r^2} \]

Einstein’s equation in general relativity can be written

\[ R^k_{ikj} = R^0_{i0j} + R^1_{i1j} + R^2_{i2j} + R^3_{i3j} = 0 \]

It results in two differential equations for $f$, reducible to

\[ f^2 + r\, \frac{d}{dr}(f^2) - 1 = 0 \]

which admits the solution we encountered before:

\[ f(r) = (1 - 2m/r)^{1/2} \]

VII. SINGULAR GEOMETRY

In this section we shall look, albeit briefly, at some situations where the geometry goes wrong. We have seen that if we start with a nice geometrical space, like Euclidean $E^2$, we can introduce singular behavior by removing points. Spaces obtained by such removal operations are manifestly incomplete. The interesting thing is that some geometrical spaces are not of this type but are incomplete; in other words some singular behavior still persists after all removed points are replaced.

A. Riemannian Completeness

It is an interesting geometrical problem to describe just what constitutes a singularity and where it is in such spaces. Geodesic completeness is one obvious criterion and it solves the problem for Riemannian manifolds: there is a geodesic singularity in a region $U \subseteq M$ if there is a geodesic that enters $U$ and does not leave but cannot be extended. It can be shown by means of the Hopf-Rinow theorem that this criterion covers all other curves as well: a Riemannian manifold (without boundary) is complete with respect to all curves if and only if it is complete with respect to all geodesics. A sufficient condition for a Riemannian manifold to be geodesically complete is for it to be compact, for instance, a closed and bounded subset of some Euclidean $E^n$. Spheres and the torus are of this kind. It is not a necessary condition because $E^n$ itself is complete but not compact. In a connected, complete Riemannian manifold any two points can be connected by a geodesic that is minimal in length relative to all joining curves between the points. Of course, in any Riemannian manifold, “small enough” regions are always complete.

B. Connection Completeness

Geodesics do not tell the whole story about completeness in non-Riemannian manifolds. It is possible to have geodesic completeness but some other curves may be incomplete in any reasonable definition. For example, it is possible to contrive a model space-time that is geodesically complete but in which an observer, in a rocket say, could with finite energy follow a trajectory that cannot be extended beyond a certain finite time. Such an observer would disappear from that universe. Most realistic cosmological models imply space-time geometries that are of this kind, simply because gravity is attractive.

We are faced with the problem: does a given curve $c : [0, 1) \to M$ admit a continuous extension by one point to the closed interval $[0, 1]$? If not, then we say that the curve is inextensible. This in itself need not imply a problem of incompleteness since, in the Euclidean plane,

\[ c : [0, 1) \to E^2 : t \mapsto (0, (1 - t)^{-1} - 1) \]

begins at the origin and proceeds to cover the whole positive $y$ axis but we cannot extend it to a domain including $t = 1$. The additional test that we need to make is for some kind of finiteness property. In the presence of a Riemannian metric we could use the length of the curve. Clearly, though inextensible, the above curve is not finite in length, so we do not wish to infer from it any intrinsic incompleteness of the space $E^2$. In the absence of suitable measurements for length, as in pseudo-Riemannian manifolds, we need another device to test for finiteness. The natural one, which arises when we have a connection, depends on the use of parallel transport along the given curve.

C. b-Incompleteness

Let $c : [0, 1) \to M$ be a curve in manifold $M$ with connection $\nabla$ and choose a basis $(b_i)$ for $T_{c(0)}M$. Then the parallel transport isomorphism of tangent spaces along the curve’s path, say,

\[ \tau_t : T_{c(0)}M \to T_{c(t)}M : u^i b_i \mapsto u^i \tau_t(b_i) = u^i \beta_i(t) \]

defines a basis $(\beta_i(t))$ for each $T_{c(t)}M$. Now, any $n$-dimensional vector space, once we have chosen a basis,
has a unique isomorphism with $\mathbb{R}^n$ simply by taking components with respect to that basis. Next, $\mathbb{R}^n$ has a natural norm

\[ \|(x^i)\| = \left( (x^1)^2 + (x^2)^2 + \cdots + (x^n)^2 \right)^{1/2} \]

So, given the choice of basis $(b_i) = (\beta_i(0))$ at $c(0)$ for our tangent spaces, we can use its parallel transported image to define a b-length for the tangent vector

\[ \dot{c}(t) = \dot{c}^i(t)\, \beta_i(t) \quad \text{to $c$ at $t$} \]

namely

\[ \|\dot{c}(t)\|_b = \|(\dot{c}^i(t))\| \]

This gives us a b-length of the curve with respect to basis $(b_i)$ at $c(0)$, defined by

\[ L_b(c) = \int_0^1 \|\dot{c}(t)\|_b \, dt \]

In a space-time manifold, a choice of basis for one tangent space is effectively a choice of reference frame of directions and scale of units, that is, an observer. Clearly our b-length will vary with the choice of the observer. However, what does not vary is whether it is finite or not. One choice of initial basis $(b_i)$ at $c(0)$ is sufficient to test for finiteness of b-length with respect to any basis.

Accordingly, we say that curve $c$ is b-incomplete if it is inextensible and has finite b-length. This definition only needs the presence of a connection, not a metric tensor. When we do have a Riemannian metric connection then the finiteness of b-length is equivalent to the finiteness of ordinary length. We say that a manifold with connection is b-incomplete if it contains a b-incomplete curve. A Riemannian manifold is actually b-incomplete if and only if it is geodesically incomplete, but this is not the case for pseudo-Riemannian manifolds.

When it was discovered that all realistic models of the universe, relativistic or otherwise, are likely to be incomplete, it was natural to enquire if this was merely an inadequacy of the field theory of gravity. Thus, it might be hoped that an appropriate quantum theory of gravity would be such as to average out any classical singularities. Unfortunately, there is no single quantum theory of gravity that is accepted by all. One theory, geometrical quantization, when applied to a massless Klein–Gordon scalar field on a curved space-time could not prevent the collapse of the state vector: the incompleteness found in the classical geometry could not be quantized away. More generally, it was shown that if there is b-incompleteness with respect to one connection then there will be b-incompleteness with respect to any nearby connection. So the singularity is stable under perturbations of the connection and unlikely to be removable by any quantum theory of gravity.

VIII. TOPOLOGY, GEOMETRY, AND PHYSICS

The 20th century enjoyed a wealth of developments from the interplay of formal mathematics and natural sciences, and in no other area is this as rich in results as that involving geometry, topology, and theoretical physics. One of the most remarkable families of recent results has been in the work of Freedman and Donaldson that led to their award of Fields Medals at the International Congress of Mathematicians in 1986. An amazing but easily comprehended result is that there are copies of $\mathbb{R}^4$ that are topologically indistinguishable from ordinary $\mathbb{R}^4$ but which have different manifold structures; namely, there are exotic 4-spaces and this happens in no other dimension.

In simple terms, we could say that whereas on all $\mathbb{R}^n$ for $n = 1, 2, 3, 5, 6, \ldots$, there is only one way to set up calculus, in the case of $\mathbb{R}^4$ there are infinitely many different ways to do it. It was already known that there were exotic 7-spheres (but none of lower dimension), but the new results led to the existence of exotic closed 4-manifolds. Many compact (topological) 4-manifolds cannot be given a differentiable structure and those that do admit differentiable structures may allow infinitely many. Interestingly, this was proved by using the methods of Yang-Mills theory from physics. The obstructions to differentiable structures arise from the theory of connections associated with the Yang-Mills equations and instantons. Thus, Donaldson gave an application of physical gauge field theory to geometrical topology.

Cosmology has provided rich areas of application for the curved pseudo-Riemannian geometry needed for general relativity theory and it has generated very precise experiments to detect the physical consequences. In general relativity, spacetime has its curvature controlled by the matter distribution and the curvature controls how freely gravitating bodies will move.

The usual model for a simple, homogeneous isotropic spacetime is $\mathbb{R}^4$ with the Friedmann-Robertson-Walker metric tensor, given by the arclength expression

\[ ds^2 = c^2\,dt^2 - a(t)^2 \left( \frac{dr^2}{1 - kr^2} + r^2 (\sin^2\theta\, d\phi^2 + d\theta^2) \right). \]

Here, $k = 1, 0, -1$ depending on whether the universe is closed, flat, or open, respectively. In the case $k = 1$, space is represented at each instant $t$ by a sphere of radius $a(t)$.

Geodesics represent the trajectories of particles free from all influences other than gravitational interactions with the matter in the universe. Photons follow null geodesics, which lie on the boundary of local null cones ($ds^2 = 0$) determined by the constancy of the speed of light. Free material particles follow timelike geodesics but even those subject to acceleration are nevertheless
constrained to lie locally in the interior ($ds^2 > 0$) of null cones of directions.

In the Friedmann-Robertson-Walker cosmological model, the wavelength of electromagnetic radiation emitted from a distant source at time $t$ and observed at time $t_0$ has redshift relative to the local wavelength given by:

\[ \frac{\lambda_{\text{source}}}{\lambda_{\text{local}}} = 1 + z = \frac{a(t_0)}{a(t)}. \]

Free, initially thermal radiation energy remains thermal during the expansion of the universe, with temperature proportional to $a(t)^{-1}$ and the relative expansion rate of the spacelike surfaces is given by

\[ \left( \frac{1}{a} \frac{da}{dt} \right)^2 = H^2 = \frac{8}{3} \pi G \rho - \frac{k}{a^2} + \frac{1}{3} \Lambda \]

Here, $G$ is the gravitational constant, $H$ is Hubble’s constant at time $t$, $\Lambda$ is the cosmological constant and $\rho$ is the mean matter density. The critical mean density that corresponds to $k = 0$ when $\Lambda = 0$, is given at the present epoch when $H = H_0$, by

\[ \rho_{\text{crit}} = \frac{3 H_0^2}{8 \pi G} \]

A recent trend has been to allow $\Lambda$ to differ from zero, since there is evidence that the present mean matter density is less than the critical mean density $\rho_{\text{crit}}$.

A Hot Big Bang some 12 Giga years ago, at $t = 0$ in the Friedmann-Robertson-Walker model, followed by an adiabatic expansion controlled by general relativity, is the broad scenario most widely accepted by cosmologists. It accounts well for the relative abundance of the lightest nuclides and for the observable microwave background radiation. However, there are some intriguing difficulties. The matter in typical galaxies arose from fluctuations within the first year after the Big Bang, because later the scales would have been too large for causal effects to occur. This gives rise to the question of the origin and nature of the fluctuations in the hot dense early phase; “cosmic inflation” is a currently favored way to answer this question. This notion involves an initial period of exponentially accelerating expansion lasting $\sim 10^{-32}$ sec, caused by a positive cosmological constant, which physicists associate with a hypothetical scalar field or “inflaton.” Such an inflation period would precede the normal evolution of $a(t)$.

A special case of interest is when $k = 0$, the mean matter density is zero, $\rho = 0$, and the cosmological constant is positive in Friedmann-Robertson-Walker spacetime. This corresponds to the de Sitter cosmological model, which happens also to be the limiting scenario for all indefinitely expanding models with $\Lambda > 0$. In de Sitter spacetime, the cosmic inflation corresponds to $a(t) \propto e^{\sqrt{\Lambda/3}\, t}$.

Rather surprisingly, general relativistic cosmology coupled with deep space observations leads cosmologists to conclude that most of the matter in galaxies and clusters is dark matter and so is invisible as far as electromagnetic emission or absorption is concerned. The matter is known to be there through gravitational effects and it is dominant at all scales of structure larger than galactic cores, but it is unclear what form it takes. Inference from interpreting gravitational effects on infrared spectral redshifts indicates the presence of supergalactic sheetlike clusters containing about 60% of all galaxies. The remaining galaxies seem to be about equally shared between dense filamentary conglomerations (“walls”) and sparse filaments. This disposition of matter leaves large voids in the observable universe and the distribution of the sizes of these voids is also an active research area.

Quasars and active galactic nuclei need some source of energy and one possibility is for this to consist of massive black holes, of size $10^7$–$10^9$ solar masses. The simplest geometry for an otherwise empty spacetime containing an isolated black hole with mass $M$ is that of the Schwarzschild model. There we have

\[ ds^2 = (1 - 2M/r)\, dt^2 - \frac{dr^2}{1 - 2M/r} - r^2 (\sin^2\theta\, d\phi^2 + d\theta^2). \]

The event horizon consists of the surface at $r = 2M$, from which neither photons nor material particles can escape to the outside and through which any nearby particles will be drawn toward the central singularity at $r = 0$. In fact, quantum theory may allow a tunneling escape of matter and a consequential reduction of mass as a result of pair production just outside the event horizon, if only one of the pair is drawn in. Infalling matter swept up by a black hole in a galactic core could generate radiation energy through its acceleration toward the event horizon.

SEE ALSO THE FOLLOWING ARTICLES

ALGEBRA, ABSTRACT • ALGEBRAIC GEOMETRY • COSMOLOGY • LOOP GROUPS • TOPOLOGY, GENERAL

BIBLIOGRAPHY

Arnold, V., Atiyah, M., Lax, P., and Mazur, B., eds. (1999). “Mathematics: Frontiers and Perspectives,” Am. Math. Soc., Providence, RI.
Beem, J. K., and Ehrlich, P. E. (1981). “Global Lorentzian Geometry,” Dekker, New York.
Beem, J. K., Ehrlich, P. E., and Easley, K. L. (1996). “Global Lorentzian Geometry,” Second ed., Marcel Dekker, New York.
Berger, M. (1999). “Riemannian Geometry During the Second Half of the Twentieth Century,” Am. Math. Soc., Providence, RI.
Bott, R., and Tu, L. W. (1982). “Differential Forms in Algebraic Topology,” Springer-Verlag, New York.
Choquet-Bruhat, Y., DeWitt-Morette, C., and Dillard-Bleick, M. (1982). “Analysis, Manifolds and Physics,” 2nd ed., North-Holland, Amsterdam.
Cordero, L. A., Dodson, C. T. J., and de Leon, M. (1989). “Differential Geometry of the Frame Bundles,” D. Reidel, Dordrecht.
Dekel, A., and Ostriker, J. P., eds. (1999). “Formation of Structure in the Universe,” Cambridge Univ. Press, Cambridge.
Dodson, C. T. J. (1988). “Categories, Bundles and Spacetime Topology,” 2nd ed., Kluwer, Dordrecht.
Dodson, C. T. J., and Parker, P. (1997). “A Users’ Guide to Algebraic Topology,” Kluwer, Dordrecht.
Dodson, C. T. J., and Poston, T. (1991). “Tensor Geometry,” Graduate Texts in Mathematics 130, Second ed., Springer-Verlag, New York.
Donaldson, S. K. (1987). The geometry of 4-manifolds, Proc. Int. Congress of Mathematicians, Berkeley, 1986, pp. 43–54, Am. Math. Soc., Providence, RI.
Gompf, R. E., and Stipsicz, A. L. (1999). “4-Manifolds and Kirby Calculus,” Am. Math. Soc., Providence, RI.
Gray, A. (1998). “Modern Differential Geometry of Curves and Surfaces,” Second ed., CRC Press, Boca Raton, FL.
Peebles, P. J. E. (1993). “Principles of Physical Cosmology,” Princeton Univ. Press, Princeton.
Sternberg, S. (1983). “Lectures on Differential Geometry,” 2nd ed., Chelsea, New York.
Thurston, W. P. (1997). “Three-Dimensional Geometry and Topology,” Princeton Univ. Press, Princeton.
Willmore, T. J. (1982). “Total Curvature in Riemannian Geometry,” Ellis Horwood, Chichester.
P1: GNHFinal Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN009B-410 July 19, 2001 18:42
Mathematical Logic
Yiannis N. Moschovakis
University of California, Los Angeles,
and University of Athens
I. Propositional Logic, PL
II. First-Order Logic, FOL
III. Gödel’s Incompleteness Theorem
IV. Computability
V. Recursion and Programming
VI. Alternative Logics
VII. Set Theory
Each logic L has a syntax which delineates the grammatically correct linguistic expressions of L, a semantics which assigns meaning to the correct expressions, and a structured system of proofs which specifies the rules by which some L-expressions can be inferred from others. There are other words to describe these things: formal language is sometimes used to describe a plain syntax, formal system often identifies a syntax together with an inference system (but without an interpretation), and abstract logic has been used to refer to a syntax together with an interpretation, leaving inference aside. It is, however, a fundamental feature of logic that it draws clean distinctions and studies the connections among these three aspects of language. We explain them first in the simplest example of the “logic of propositions,” which is part of many important logics.

A. Propositional Syntax

The symbols of PL are the connectives

    ¬ (not)    & (and)    ∨ (or)    → (implies, if-then)

the two parentheses ‘(’, ‘)’, and an infinite list of (formal) propositional variables P0, P1, P2, . . . which intuitively stand for declarative propositions, things like “John loves Mary” or “3 is a prime number.” It has only one category of grammatically correct expressions, the formulas, which are strings (finite sequences) of symbols defined inductively by the following conditions:

1. Each Pi is a formula.
2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

For example, if P and Q are propositional variables, then (P → Q) and (P ∨ ¬P) are formulas, which we read as “if P then Q” and “either P or not P.”

The inductive definition gives a precise specification of exactly which strings of symbols are formulas, and also insures that each formula is either prime, i.e., just a variable Pi, or it can be constructed in exactly one way from its simpler immediate parts, by one of the connectives. This makes it possible to prove properties of formulas and to define operations on them by structural induction on their definition.

More propositional connectives can be introduced as “abbreviations” of formula combinations, e.g.,

    A ↔ B ≡ ((A → B) & (B → A))
    A ∨ B ∨ C ≡ (A ∨ (B ∨ C)).

B. Propositional Semantics

If B stands for some true proposition, then ¬B is false, independently of the “meaning” or internal structure of B. This is an instance of a general Compositionality Principle for PL: The truth value of a formula depends only on the truth values of its immediate parts. The semantics of PL comprise the rules for computing truth values, and they can be summarized in Table I, where 1 stands for “truth” and 0 for “falsity.” By the first line of this table, for example, if A and B are both true, then ¬A is false while (A & B), (A ∨ B), and (A → B) are all true. Notice that if A is false, then (A → B) is reckoned to be true no matter what the truth value of B, so that “if the moon is made of cheese, then 1 + 1 = 5” is true (on the plausible assumption that the moon is not made of cheese). This material implication assumed by Propositional Logic has been attacked as counterintuitive, but it agrees with mathematical practice and it is the only useful interpretation of implication which accords with the Compositionality Principle.

TABLE I

    A   B   ¬A   (A & B)   (A ∨ B)   (A → B)
    1   1   0    1         1         1
    1   0   0    0         1         0
    0   1   1    0         1         1
    0   0   1    0         0         1

Using these rules, we can construct for each formula A a truth table which tabulates its truth value under all assignments of truth values to the variables. For example, the truth table for (Q → P) consists of the first three columns of Table II while the first two and the last column give the truth table for (P → (Q → P)).

If n variables occur in a formula A, then the truth table for A has 2^n rows and determines an n-ary bit function v_A, with arguments and values in the two-element set {1, 0}. By the Definitional Completeness Theorem, every n-ary bit function is v_A for some A, so that the formulas of PL provide definitions (or “symbolic representations”) for all bit functions.

TABLE II Truth Table

    P   Q   (Q → P)   (P → (Q → P))
    1   1   1         1
    1   0   1         1
    0   0   1         1
    0   1   0         1
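
The rules of Table I, and the structural recursion that the inductive definition of formulas makes possible, can be sketched in code. The block below is a minimal sketch in Python, not part of the article: the encoding of formulas as nested tuples and the names `variables`, `value`, and `truth_table` are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical encoding (not from the article): a formula is either a
# variable name (str) or a tuple ("not", A), ("and", A, B), ("or", A, B),
# ("implies", A, B), mirroring the inductive definition of PL formulas.

def variables(formula):
    """Collect the variable names of a formula, in order of first occurrence."""
    if isinstance(formula, str):
        return [formula]
    names = []
    for part in formula[1:]:
        for name in variables(part):
            if name not in names:
                names.append(name)
    return names

def value(formula, assignment):
    """Truth value (1 or 0) of a formula, computed by structural recursion."""
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == "not":
        return 1 - value(formula[1], assignment)
    a = value(formula[1], assignment)
    b = value(formula[2], assignment)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "implies":
        return (1 - a) | b  # false only when a = 1 and b = 0
    raise ValueError(f"unknown connective: {op!r}")

def truth_table(formula):
    """All 2**n rows as pairs (tuple of variable values, value of the formula)."""
    names = variables(formula)
    rows = []
    for bits in product((1, 0), repeat=len(names)):
        rows.append((bits, value(formula, dict(zip(names, bits)))))
    return rows

# The two formulas tabulated in Table II.
q_implies_p = ("implies", "Q", "P")
p_implies_q_implies_p = ("implies", "P", q_implies_p)
```

Calling `truth_table(p_implies_q_implies_p)` lists four rows, one for each assignment to P and Q, in the spirit of the last column of Table II.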
A formula A is a semantic consequence of a set of formulas T (or T-valid) if every assignment to the variables which satisfies (makes true) all the formulas in T also satisfies A. We write

    T ⊨ A ⇔ A is T-valid,

and ⊨ A, in the important special case when T is empty, in which case A is called a tautology. A formula A is satisfiable if some assignment satisfies it, i.e., if ¬A is not a tautology. Let

    A ∼ B ⇔ {A} ⊨ B and {B} ⊨ A
          ⇔ ⊨ A ↔ B,

and call A and B equivalent if A ∼ B. Equivalent formulas define the same bit function, and they can be substituted for each other without changing truth values. Clearly

    (A → B) ∼ (¬A ∨ B),

so that the implication connective is superfluous. In fact, every formula is equivalent to one in disjunctive normal form, i.e., a disjunction A1 ∨ · · · ∨ Ak where each Ai is a conjunction of variables or negations of variables (literals).

C. Applications to Circuits

Each formula A with n variables can be realized by a switching circuit C(A) with n inputs and one output, so that C(Pi) consists of just one input-output edge, C(A & B) is constructed by joining C(A) and C(B) with an and-gate, etc. Figure 1 exhibits the circuit for ((P1 & P2) → P3) using the equivalent formula without implications, so that only ¬-, &-, and ∨-gates are required. These are restricted circuits, of fan-in (maximum number of edges into a node) 2 and fan-out 1, but the Definitional Completeness Theorem implies that every n-ary bit function can be computed by some formula circuit C(A).

FIGURE 1 The circuit for (¬(P1 & P2) ∨ P3).

There are basically two useful measures of circuit complexity, and both of them are faithfully mirrored in formulas. The number of gates of C(A) is exactly the number of connectives in A and measures size complexity (construction cost), while the depth of C(A), which measures the time complexity of computation, is exactly the rank of A, defined inductively so that rk(Pi) = 1, rk(A & B) = max(rk(A), rk(B)) + 1 and similarly for the other connectives. One can now use natural manipulations of formulas to construct circuits which compute a given bit function with minimum size or time complexity, or to establish optimality results for the computation of bit functions by appealing to the formula representations of the circuits which realize them. For example, using disjunctive normal forms, one sees immediately that (if we do not care about cost), every n-ary bit function can be computed by an unbounded fan-in circuit in no more than 3 time units. There is, in general, a substantial trade-off between the size and time complexity of the circuits which compute a given bit function.

D. The Satisfiability Problem

The assertion that “C(A) and C(B) never give the same output on the same inputs” means precisely that “(A ↔ ¬B) is a tautology,” so that to detect that A and B do not have this safety property we need to determine whether the formula ¬(A ↔ ¬B) is satisfiable.

Because of such natural formulations of “error detection” for circuits relative to given specifications, it is very important to find efficient algorithms for determining whether a given formula is satisfiable. The problem is of non-deterministically polynomial time complexity (NP), because it can be resolved by guessing (“non-deterministically”) some assignment and then verifying that it satisfies A in a number of steps which is bounded by a polynomial in the length of A; and it is NP-complete, i.e., every NP-problem can be “reduced” to it by a polynomial reduction. This is a basic result of S. Cook, who introduced the complexity class NP, showed that it contains a large number of important problems, and asked if it coincides with the (seemingly) smaller class P of “feasible,” deterministically polynomial time problems. The question whether P = NP is the fundamental open problem of complexity theory; it amounts simply to the question whether the satisfiability problem can be solved by a deterministic, polynomial algorithm.

E. Propositional Inference

A proof of a formula A from a set of hypotheses T is any finite sequence

    A0, A1, . . . , An−1, A

which ends with A, and such that each Ai is either in T, or a PL-axiom, or follows from previously listed formulas by a rule of inference. To make this notion precise we need to specify a set of PL-axioms and rules of inference; and for these to be useful, it should be that they are few and easy to understand, and that the formulas provable from T are exactly the T-tautologies.
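
The semantic notions used in the preceding subsections (satisfiability, tautology, equivalence, and disjunctive normal form) can be made concrete by brute force over all 2^n assignments. The following is a minimal sketch in Python under stated assumptions: formulas are represented directly by their bit functions as Python callables, and the helper names (`assignments`, `is_satisfiable`, `is_tautology`, `equivalent`, `dnf`, `implies`) are hypothetical, introduced here for illustration. It is an exhaustive search, not an efficient satisfiability algorithm.

```python
from itertools import product

def assignments(n):
    """All 2**n assignments of truth values (1 or 0) to n variables."""
    return product((1, 0), repeat=n)

def is_satisfiable(f, n):
    """True if some assignment makes the n-ary bit function f equal to 1."""
    return any(f(*bits) == 1 for bits in assignments(n))

def is_tautology(f, n):
    """True if every assignment makes f equal to 1."""
    return all(f(*bits) == 1 for bits in assignments(n))

def equivalent(f, g, n):
    """A ~ B: the two formulas define the same n-ary bit function."""
    return all(f(*bits) == g(*bits) for bits in assignments(n))

def dnf(f, names):
    """A disjunctive normal form for f: one conjunction per satisfying row."""
    terms = []
    for bits in assignments(len(names)):
        if f(*bits) == 1:
            literals = [v if b else "¬" + v for v, b in zip(names, bits)]
            terms.append("(" + " & ".join(literals) + ")")
    if not terms:
        # f is unsatisfiable, so any contradiction serves as its normal form.
        return f"({names[0]} & ¬{names[0]})"
    return " ∨ ".join(terms)

def implies(a, b):
    """Material implication on bits: false only when a = 1 and b = 0."""
    return 0 if (a == 1 and b == 0) else 1
```

Checking one assignment costs a single call to `f`, which is the cheap "verify" half of the NP description; the loop over all assignments is the expensive half, and its length doubles with each additional variable.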
We need just one, binary inference rule:

    A    (A → B)
    ------------    (Modus Ponens)
         B

This is sound, i.e., {A, (A → B)} ⊨ B, so that if A and (A → B) are both T-tautologies, then so is B.

An axiom is any instance of the following axiom schemes, where A, B, and C are arbitrary formulas and we have omitted several parentheses which pedantry would require:

(1) A → (B → A)
(2) (A → B) → ((A → (B → C)) → (A → C))
(3) A → (B → (A & B))
(4) (A & B) → A        (4′) (A & B) → B
(5) A → (A ∨ B)        (5′) B → (A ∨ B)
(6) (A → C) → ((B → C) → ((A ∨ B) → C))
(7) (A → B) → ((A → ¬B) → ¬A)
(8) ¬¬A → A

These are all tautologies, and so every formula provable from T is T-valid. We write

    T ⊢ A ⇔ there is a proof of A from T,

and it is not hard now to establish the basic

Soundness and Completeness Theorem for PL. For all sets T and any A,

    T ⊨ A ⇔ T ⊢ A.

F. Boolean Algebras

A Boolean algebra is a set B with at least two, distinct elements 0 and 1, a unary complementation operation ′, and binary infimum ∩ and supremum ∪ operations such that certain properties hold. The standard example is the set P(M) of all subsets of some nonempty set M, with 0 = ∅, 1 = M and the usual complementation, intersection and union operations, which for a singleton M gives the two-element set {1, 0} of truth values; but there are others, e.g., the set of all finite and cofinite subsets of some infinite set, the set of all “closed and open” subsets of a topological space, etc.

Each formula A with n variables defines an n-ary function on every Boolean algebra B, simply by letting the propositional variables range over B and replacing ¬, &, ∨, and → by ′, ∩, ∪, and ⇒ respectively, where

    x ⇒ y = x′ ∪ y

on B. Now the axioms for a Boolean algebra insure that every propositional axiom defines a function with constant value 1; in fact the particular choice of axiomatization for Boolean algebras (and there are many) is quite irrelevant as long as this fact obtains; and then the Completeness Theorem implies that two formulas A and B define the same n-ary operation on all Boolean algebras exactly when A ∼ B, i.e., when A and B define the same bit function.

Boolean algebras have many important applications in mathematics (to measure theory, among other things), and they are the subject of the classical Stone Representation Theorem which identifies them all (up to isomorphism) with sub-algebras of powerset algebras. In logic they are mostly used through the “nonstandard” Boolean semantics of this subsection, which extend to richer logics and provide a powerful tool for independence (unprovability) results.

II. FIRST-ORDER LOGIC, FOL

Consider the claim:

    If everybody has a mother, and every mother loves her children, then everybody is loved by somebody.

It is certainly true, it has the “linguistic form” of many similar (more substantial) claims in mathematics, and it appears to be true by virtue of its form and not because of any special properties of the words “mother,” “love,” etc. First-Order Logic makes it possible to express complex assertions of this type and to show that they are true by logic alone. The symbolic expression of this one will be

    [(∀x)(∃y)M(x, y) & (∀x)(∀y)[M(x, y) → L(y, x)]] → (∀x)(∃y)L(y, x),

give-or-take a few parentheses and brackets which will be required to make the syntax completely precise.

A. First-Order Syntax

The symbols of FOL are the propositional connectives, the parentheses, the quantifiers

    ∀ (for all)    ∃ (there exists)

the comma ‘,’, the identity symbol ‘=’, an infinite list v0, v1, . . . of individual variables which will denote arbitrary objects in some domain, and for each n = 0, 1, . . . , two infinite lists of function and relational symbols

    $f^n_0, f^n_1, \ldots, \quad P^n_0, P^n_1, \ldots,$

which will stand for n-ary functions and relations on the objects.

There are two categories of grammatically correct expressions in FOL, terms and formulas, defined recursively by the following conditions.
T1. Each variable $v_i$ is a term.
T2. If $t_1, \ldots, t_n$ are terms, then (the string) $f^n_i(t_1, \ldots, t_n)$ is also a term. When n = 0, we write simply $f^0_i$.
F1. If $t_1, \ldots, t_n$ are terms, then the expressions

    $t_1 = t_2 \qquad P^n_i(t_1, \ldots, t_n)$

are formulas, the latter written simply $P^0_i$ when n = 0.
F2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

F3. If A is a formula, then so are the expressions

    $(\forall v_i)A \qquad (\exists v_i)A$

Notice that by the notational convention in F1, all PL-formulas are also FOL-formulas.

This logic is called first order because quantification is only allowed over individuals; if we add formula formation rules

    $(\forall P^n_i)A \qquad (\exists P^n_i)A$

we obtain the formulas of second-order logic, SOL.

Consider the simple formula

    $(\exists v_2)[\lnot v_2 = v_1 \;\&\; P^1_1(v_2)]. \qquad (1)$

Its "translation" into English by the reading of the symbols we have introduced is

    some object other than $v_1$ has the property $P^1_1$

which is exactly how we would translate the result of substituting $v_3$ for $v_2$ in it,

    $(\exists v_3)[\lnot v_3 = v_1 \;\&\; P^1_1(v_3)].$

This is because both occurrences of $v_2$ in Eq. (1) are bound by the quantifier $\exists v_2$, just as the occurrences of x are bound by the dx in $\int_0^1 x^2\,dx$ and can be replaced by y without changing the meaning of the definite integral. On the other hand, the occurrence of $v_1$ in Eq. (1) is free, because it is not within the scope of any quantifier, and so the interpretation of $v_1$ clearly affects the meaning of Eq. (1).

Using the same simple example, consider the results of substituting $f^1_0(v_3)$ and $f^1_0(v_2)$ for $v_1$ in Eq. (1),

    $(\exists v_2)[\lnot v_2 = f^1_0(v_3) \;\&\; P^1_1(v_2)],$
    $(\exists v_2)[\lnot v_2 = f^1_0(v_2) \;\&\; P^1_1(v_2)].$

The first of these says of $f^1_0(v_3)$ what Eq. (1) says of $v_1$, but the second says that "something is not a fixed point of $f^1_0$ and has property $P^1_1$," which is quite different, evidently because the variable $v_2$ in $f^1_0(v_2)$ is "caught" by the quantifier $\exists v_2$. The first is a free substitution (causing no confusion) while the second is not. We will denote the result of substituting the term t for the free occurrences of the variable x in some formula A by

    A{x :≡ t}

and we will tacitly assume that all substitutions are free.

Formulas of FOL are too messy to write down, and so we often resort to "informal descriptions" of them like the example about mothers loving their children above: recipes, really, from which the full, grammatically correct formula could (in principle) be constructed.

B. First-Order Semantics

Whether Eq. (1) is true or false depends on the object $v_1$, on the function $f^1_0$, on the property $P^1_1$, and (most significantly) on the range of objects over which we interpret the existential quantifier: where do we search for things which may or may not satisfy $P^1_1$?

To interpret the formulas of FOL we must be given a domain D and an interpretation ı, a function which assigns an object ı($v_i$) in D to each individual variable, an n-ary function ı($f^n_i$) on D to each n-ary function symbol $f^n_i$, and an n-ary relation ı($P^n_i$) on D to each $P^n_i$. Using these, first we extend ı inductively to all terms by

    ı($f^n_i(t_1, \ldots, t_n)$) = ı($f^n_i$)(ı($t_1$), . . . , ı($t_n$)),

so that ı(t) is some object in D. To assign truth values to formulas, define first, for each variable x and d in D, the update

    ı′ = ı{x := d},

which agrees with ı on all function and relation symbols, and also on all individual variables, except that ı′(x) = d.

TABLE III The Tarski Truth Conditions

    ı ⊨ $t_1 = t_2$                  ⇔ ı($t_1$) = ı($t_2$)
    ı ⊨ $P^n_i(t_1, \ldots, t_n)$    ⇔ (ı($P^n_i$))(ı($t_1$), . . . , ı($t_n$))
    ı ⊨ ¬A                           ⇔ ı ⊭ A
    ı ⊨ (A & B)                      ⇔ ı ⊨ A and ı ⊨ B
    ı ⊨ (A ∨ B)                      ⇔ ı ⊨ A or ı ⊨ B
    ı ⊨ (A → B)                      ⇔ ı ⊭ A or ı ⊨ B
    ı ⊨ $(\forall v_i)A$             ⇔ for all d in D, ı{$v_i$ := d} ⊨ A
    ı ⊨ $(\exists v_i)A$             ⇔ for some d in D, ı{$v_i$ := d} ⊨ A

With the help of this basic operation, we can state in Table III the classical Tarski truth conditions which determine the truth of formulas relative to a fixed domain D and an interpretation ı. The truth value of a formula A relative to an interpretation ı is 1 if ı ⊨ A and 0 otherwise, and the Compositionality Principle extends to FOL in a straightforward manner and implies the following basic fact: the truth value of A relative to ı depends only on the
values of ı on the function and relation symbols which occur in A, and on the values ı(x) for the individual variables which occur free in A.

The Tarski conditions do nothing more than translate formulas into English, in effect identifying FOL with a precisely formulated, small but very expressive fragment of natural language.

C. Structures

A vocabulary (or signature) is any finite sequence $\sigma = \{f_1, \ldots, f_k, P_1, \ldots, P_l\}$ of function and relation symbols, and FOL(σ) is the part of FOL whose formulas involve only the function and relation symbols of σ. The idea is to think of $f_1, \ldots, f_k$ and $P_1, \ldots, P_l$ as constants, denoting fixed functions and relations on some set D, and to use the formulas of FOL(σ) to study definability in structures

    $M = (D_M, \bar f_1, \ldots, \bar f_k, \bar P_1, \ldots, \bar P_l)$

of vocabulary σ, where the universe $D_M$ of M is any nonempty set, and $\bar f_1, \ldots, \bar f_k, \bar P_1, \ldots, \bar P_l$ are functions and relations which can be assigned to the vocabulary symbols, e.g., such that $\bar f_i$ is n-ary if $f_i$ is n-ary.

An M-assignment is any function α from the variables to $D_M$, and it extends naturally to an interpretation $\alpha_M$ by the association of $\bar f_i$ with $f_i$ and $\bar P_i$ with $P_i$; the standard notation for structure satisfaction is

    M, α ⊨ A ⇔ $\alpha_M$ ⊨ A.

Formulas of FOL(σ) with no free variables are called sentences and (by the Compositionality Principle) they are simply true or false in every σ-structure, without reference to any assignment. They define properties of structures. We write

    M ⊨ A ⇔ for any (and hence all) α, M, α ⊨ A    (A a sentence),

and if M ⊨ A, we say that M satisfies A or is a model of A.

While sentences define properties of structures, formulas with free variables can be used to define relations on structures. If, for example, A has at most one free variable x, we set

    $R_A(d)$ ⇔ M, α{x := d} ⊨ A,

where α is any assignment, since its only relevant value is updated in this definition. In the same way, formulas with n free variables define n-ary relations on σ-structures, the first-order definable relations of M. A function $f : D_M^n \to D_M$ is first-order definable if its graph

    $G_f(x_1, \ldots, x_n, w) \Leftrightarrow w = f(x_1, \ldots, x_n)$

is first-order definable. Some examples:

A directed graph is a structure G = (D, E), where E is a binary "edge" relation on the set of "nodes" D, and it is a graph (undirected) if it satisfies the sentence

    (∀x)(∀y)[E(x, y) → E(y, x)].

Complete graphs (cliques) are characterized by the sentence

    (∀x)(∀y)E(x, y),

while "diameter ≤ 2" is defined by

    (∀x)(∀y)[x = y ∨ E(x, y) ∨ (∃z)[E(x, z) & E(z, y)]].

Finite directed and undirected graphs are used to model many notions in computer science, e.g., circuits.

A semigroup with identity (monoid) is a structure (S, e, ·) where the identity e is some specified member of S, · is a binary "multiplication" on S, and the following sentences are true:

    (∀x)(∀y)(∀z)[x · (y · z) = (x · y) · z],
    (∀x)(x · e = x & e · x = x).

Here and in the sequel we write $t_1 \cdot t_2$ rather than the pedantically correct $\cdot(t_1, t_2)$.

In addition to semigroups, there are groups, rings, fields, ordered fields, vector spaces, and any number of other structures which are the stuff of "abstract" algebra. These classes of structures are all characterized by first-order axioms, and the use of methods from logic is becoming increasingly important in their study.

Two structures $M_1$ and $M_2$ are isomorphic if some one-to-one correspondence between their universes carries the functions and relations of $M_1$ to those of $M_2$. Isomorphic structures satisfy the same first-order sentences, but the converse is not true, as we will see in Section II.F.

D. Databases

In the most general terms, a database is just a finite structure, typically relational, i.e., without functions, only relations. "Finite" does not mean "small" or "simple," and in the interesting applications databases are huge structures of large and complex vocabularies, with basic relations such as "x is an employee born in year n," "y is the supervisor of x," etc. Properties of structures are usually called queries in database theory, and one of the main tasks in the field is to develop representations for databases which support fast algorithms for updating (entering new information in the base) and data testing (determining the truth or falsity of queries). As it happens, updating and data testing for first-order queries can be done very efficiently, and so database systems, including the industry standard SQL, make heavy use of methods from first-order logic.
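The three graph sentences given above can be tested directly on a finite structure, which is the "data testing" task in miniature. The following is a minimal sketch, assuming a graph is stored as a set of nodes and a set of ordered pairs for the edge relation E; the function names are illustrative and not part of the text.

```python
from itertools import product


def is_undirected(nodes, edges):
    """Check (forall x)(forall y)[E(x, y) -> E(y, x)] on the finite structure."""
    return all((y, x) in edges
               for x, y in product(nodes, repeat=2)
               if (x, y) in edges)


def is_complete(nodes, edges):
    """Check (forall x)(forall y) E(x, y); note that this also demands E(x, x)."""
    return all((x, y) in edges for x, y in product(nodes, repeat=2))


def has_diameter_at_most_2(nodes, edges):
    """Check (forall x)(forall y)[x = y or E(x, y) or (exists z)[E(x, z) & E(z, y)]]."""
    return all(
        x == y
        or (x, y) in edges
        or any((x, z) in edges and (z, y) in edges for z in nodes)
        for x, y in product(nodes, repeat=2)
    )


# A path 0 - 1 - 2, stored with both directions of every edge.
path_nodes = {0, 1, 2}
path_edges = {(0, 1), (1, 0), (1, 2), (2, 1)}
```

Each universal quantifier becomes a loop over the universe and each existential quantifier becomes a search, so a sentence with q nested quantifiers is checked in roughly |D|^q steps by this naive method.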
Motivated by database theory, a good deal of research has been done since the 1970s in Finite Model Theory, the mathematical and logical study of finite structures. For a rather surprising, basic result, let

    $\mathrm{Prob}_\sigma[M \models A : |D_M| = n]$ = the proportion of σ-structures of size n which satisfy A,

where structures are counted "up to isomorphism."

The FOL 0-1 Law. For each sentence A of FOL(σ) in a relational vocabulary, either

    $\lim_{n\to\infty} \mathrm{Prob}_\sigma[M \models A : |D_M| = n] = 1,$

or

    $\lim_{n\to\infty} \mathrm{Prob}_\sigma[M \models A : |D_M| = n] = 0,$

i.e., either A or ¬A is asymptotically true.

More advanced work in this area is concerned primarily with the algorithmic analysis of queries on finite structures, especially in logics richer than FOL.

E. Arithmetic

Most basic is the structure of arithmetic

    N = (ℕ, 0, 1, +, ·),

where ℕ = {0, 1, . . .} is the set of (non-negative) natural numbers and + and · are the operations of addition and multiplication. The first-order definable relations and functions on N are called arithmetical, and they obviously include addition, multiplication, and the ordering on ℕ, which is defined by the formula

    x ≤ y ≡ (∃z)[x + z = y].

By a basic lemma of Gödel, if a function f is determined from arithmetical functions g and h by the equations

    f(0, x) = g(x),
    f(y + 1, x) = h(f(y, x), y, x),        (2)

then f is also arithmetical. Thus exponentiation $x^y$ is arithmetical, with g(x) = 1, h(w, y, x) = w · x, and, with some work, so is the function p(x) which enumerates the prime numbers,

    p(0) = 2, p(1) = 3, p(2) = 5, . . . .

In fact, the scheme of Primitive Recursion (2) is the basic method by which functions are introduced in number theory, so that, with some work, all fundamental number theoretic relations and functions are arithmetical, and all celebrated theorems and open problems of the theory of numbers are expressed by first-order sentences of N. These include the Prime Number Theorem, Fermat's Last (Wiles') Theorem, and the (still open) question whether there exist infinitely many twin pairs of prime numbers.

F. Model Theory

The mathematical theory of structures starts with the following basic result:

Compactness and Skolem-Löwenheim Theorem. If every finite subset of a set of sentences T has a model, then T has a countable model.

For an impressive application, let (in the vocabulary of arithmetic)

    $\underline{0} \equiv 0, \qquad \underline{m+1} \equiv (\underline{m} + 1),$

so that the numeral $\underline{m}$ is about the simplest term which denotes the number m, add a constant c to the language, and let

    $T = \{A : N \models A\} \cup \{\underline{0} \le c, \underline{1} \le c, \underline{2} \le c, \ldots\}.$

Every finite subset S of T has a model, namely

    $N_S = (\mathbb{N}, 0, 1, +, \cdot, m),$

where the object m which interprets c is some number bigger than all the numerals which occur in formulas of S. So T has a countable model

    $N_T = (\bar N, \bar 0, \bar 1, \bar +, \bar\cdot, c),$

and then $\bar N = (\bar N, \bar 0, \bar 1, \bar +, \bar\cdot)$ is a structure for the vocabulary of arithmetic which satisfies all the first-order sentences true in the "standard" structure N but is not isomorphic with N, because it has in it some object c which is "larger" than all the interpretations of the numerals $\underline{0}, \underline{1}, \ldots$. It follows that, with all its expressiveness, First-Order Logic does not capture the isomorphism type of complex structures such as N.

These nonstandard models of arithmetic were constructed by Skolem in the 1930s. Later, in the 1950s, Abraham Robinson constructed by the same methods nonstandard models of analysis, and provided firm foundations for the classical Calculus of Leibniz with its infinitesimals and "infinitely large" real numbers.

Model Theory has advanced immensely since the early work of Tarski, Abraham Robinson and Malcev. Especially with the contributions of Shelah in the 1970s and, more recently, Hrushovski, it has become one of the most mathematically sophisticated branches of logic, with substantial applications to algebra and number theory.

G. First-Order Inference

The proof system of First-Order Logic is an extension of that for Propositional Logic, first by identity axioms which insure that = is an equivalence relation and a congruence
for all function and relation symbols, e.g., for unary function symbols,

    (∀x)(∀y)[x = y → f(x) = f(y)].

In addition, there are two axioms for the quantifiers,

    A{x :≡ t} → (∃x)A        (∀x)A → A{x :≡ t},

assuming that the term substitutions are free; and there are two new inference rules,

    C → A              A → C
    -----------        -----------
    C → (∀x)A          (∃x)A → C

which can be used only when the variable x is not free in C. Proofs from a set T of FOL(σ) sentences are defined exactly as for PL, and we set again

    T ⊢ A ⇔ there is a proof of A from T.

Without the restriction on x, the sequence

    P(x) → P(x),    P(x) → (∀x)P(x),    (∃x)P(x) → (∀x)P(x)

would be a proof of (∃x)P(x) → (∀x)P(x), which is, obviously, not valid. With the restriction, however, for every structure M, if every M-assignment satisfies the hypothesis of either new rule, then every M-assignment satisfies the conclusion, so that the quantifier inference rules are sound.

H. Gödel's Completeness Theorem

A model of a set of sentences T in FOL(σ) is any structure M which satisfies every A in T, in symbols

    M ⊨ T ⇔ for all A in T, M ⊨ A.

We also write

    T ⊨ A ⇔ for all M, M ⊨ T ⇒ M ⊨ A,

which extends to FOL(σ) the semantic consequence relation of PL. From the comments above:

Soundness Theorem for FOL. If T ⊢ A, then T ⊨ A.

The fundamental fact about First-Order Logic is the converse of this result:

Completeness of FOL. If T ⊨ A, then T ⊢ A.

It may be argued that the semantic consequence relation T ⊨ A captures the intuitive notion "A follows from the assumptions in T by logic alone," in the sense that it insures that A is true whenever all the hypotheses in T are true, independently of the meaning of the function and relation symbols. Granting that and considering the strong expressibility of First-Order Logic discussed in Section II.C above, we may then argue further that the Completeness Theorem answers definitively (for science) the ancient question of what follows from what by logic alone: a proposition A follows from certain assumptions T as a matter of logic (and independently of the facts), if A and T can all be expressed faithfully as FOL(σ) assertions about some σ-structure M, and T ⊢ A. On this view, it is hard to overemphasize the importance of this result for the foundations of mathematics and science.

Incidentally, there is an obvious extension of the Tarski conditions to Second-Order Logic, e.g.,

    ı ⊨ $(\forall P^n_i)A$ ⇔ for all n-ary P on D, ı{$P^n_i$ := P} ⊨ A.

However, there is no useful Completeness Theorem for SOL, as we will see in Section IV.F.

If Model Theory is the study of semantics independently of inference, then Proof Theory can be viewed as the mathematical investigation of formal proofs independently of interpretation. This has always been one of the most active research areas of logic, and it has been invigorated in recent years by its substantial applications to computer science, including automated deduction, an important component of artificial intelligence. Key to these applications, and the basic result of Proof Theory, is the Extended Normal Form Theorem of Gentzen, whose somewhat weaker (but simpler) Herbrand version is fairly easy to describe.

There are four Herbrand inference rules, and they apply to n-ary disjunctions

    $A_1 \lor \cdots \lor A_n.$

Two of them are structural, and they clearly preserve meaning: you can interchange the order of the disjuncts, or delete one of two occurrences of the same disjunct. The other two are quantifier rules,

    $A_1 \lor \cdots \lor A_n\{x :\equiv t\}$            $A_1 \lor \cdots \lor A_n$
    ---------------------------------------            ------------------------------------  (∗)
    $A_1 \lor \cdots \lor (\exists x)A_n$                $A_1 \lor \cdots \lor (\forall x)A_n$

where the ∗ indicates that the ∀-rule can only be used if the variable x is not free in its conclusion. The result applies only to sentences without identity and in prenex normal form, i.e., looking like

    $(Q_1 x_1) \cdots (Q_n x_n)B$

where each $Q_i$ is ∀ or ∃ and B is quantifier-free.

Herbrand's Theorem. Every provable =-free sentence A of FOL(σ) in prenex form can be derived from a provable quantifier-free disjunction by the four Herbrand rules.
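As a small illustration of the two quantifier rules, here is a worked derivation. The sentence and the unary relation symbol P are chosen only for this sketch and do not come from the text. It starts from the one-disjunct tautology P(x) → P(x) and reaches the prenex sentence (∀x)(∃y)[P(x) → P(y)] in two steps.

```latex
\begin{align*}
&P(x) \to P(x)
  && \text{quantifier-free tautology (a disjunction with } n = 1\text{)}\\
&(\exists y)[P(x) \to P(y)]
  && \exists\text{-rule: } [P(x) \to P(y)]\{y :\equiv x\} \text{ is } P(x) \to P(x)\\
&(\forall x)(\exists y)[P(x) \to P(y)]
  && \forall\text{-rule: } x \text{ is not free in the conclusion}
\end{align*}
```

The substitution in the second line is free, since the term x is not captured by any quantifier of P(x) → P(y), and no structural rule is needed because there is only one disjunct throughout.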
The restriction to prenex sentences is not essential, because every formula can be converted to an equivalent prenex one by the application of simple rules which can be added to the system.

The theorem asserts (in part) that every provable sentence A has a "normal" proof, in which only formulas of "quantifier rank" no greater than A occur. This is a powerful tool for proof-theoretic studies. As for applications, all automated deduction systems use Herbrand-like inference systems (or their Gentzen variants), and the programming language PROLOG is based entirely on this idea.

The proof of Herbrand's Theorem is constructive: an algorithm is defined, which computes for each proof π of a prenex sentence A a Herbrand proof π*, and then it is shown by simple, combinatorial arguments that π*, indeed, proves A. The additional, effective content is significant for the foundational applications of the theorem (for example to consistency proofs), and also in the applications to automated deduction.

It should be emphasized that the simplistic slogans "Model Theory = no inference" and "Proof Theory = no semantics" are often honored in the breach: like the Completeness Theorem, most fundamental results of logic are about connections between truth and proof, and some of the deepest results in one part of the discipline depend on methods and ideas from the other.

and the best we can do in FOL is to adopt the Axiom Scheme

    (A{y :≡ 0} & (∀x)(A{y :≡ x} → A{y :≡ x + 1})) → (∀x)A{y :≡ x}.        (6)

The set PA of (first-order) Peano axioms is obtained by taking the correctly spelled versions of all the formulas in (3)–(6) and adding enough universal quantifiers in front of them so that they become sentences. This is a very strong set of axioms: it can prove all simple properties of numbers and most of their deep properties too, although proving a theorem from PA is harder than proving it using, say, methods from analysis, and number theorists distinguish and value "elementary proofs" in PA.

Gödel's First Incompleteness Theorem. There is a sentence g in FOL(0, 1, +, ·), such that N ⊨ g but PA ⊬ g.

One's first thought is that we can overcome this "incompleteness phenomenon" by strengthening PA, perhaps add Gödel's own g to it, or use the Second-Order Logic version of the Induction Axiom along with a suitable axiomatization of Second-Order Logic. None of this helps: Gödel's fundamental discovery is that first-order truth in N (and every other sufficiently rich structure) simply cannot be presented usefully as an "axiomatic theory." We will make this precise in a more general version of the Incompleteness Theorem in the next section.
where $n_i$ is the code of the symbol $a_i$ and p(i) is the ith prime number. For example, the (correctly spelled) prime formula

    $+(v_1, 0) = v_0$

has the horrendously large code

    $2^{13}\, 3^{5}\, 5^{16}\, 7^{9}\, 11^{11}\, 13^{6}\, 17^{10}\, 19^{15}.$

The size of codes is irrelevant: what matters is that every string of symbols (and hence every term, formula and proof) has a code from which it can be reconstructed, by the Unique Factorization Theorem for numbers; and (more significantly) that PA is powerful enough to express and prove simple properties of formulas and proofs, thus translated into properties of numbers. For example, if $\underline{n}$ is the numeral denoting n, as above, then PA can prove all true, basic relations among numerals, e.g.,

    $m + n = k \;\Rightarrow\; \mathrm{PA} \vdash \underline{m} + \underline{n} = \underline{k}.$

Less trivially, the basic (coded) proof relation

and self-reference have become standard tools of logic since Gödel's work, and they have found substantial applications in many areas, including computer science and set theory.

IV. COMPUTABILITY

It is easy to determine whether an arbitrary equation $a_0 + a_1 x + \cdots + a_n x^n = 0$ with integer coefficients $a_0, \ldots, a_n$ has integer solutions, since every integer root must divide $a_0$, and so all we have to do is to test the finitely many divisors of $a_0$. The problem is not so easy for equations in k unknowns

    $\sum_{r_1 + \cdots + r_k \le n} a_{r_1,\ldots,r_k}\, x_1^{r_1} x_2^{r_2} \cdots x_k^{r_k} = 0, \qquad (7)$

and it is much more interesting, in fact

to find an algorithm which determines whether Eq. (7) has a solution
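The one-variable test described at the start of this section (every integer root divides $a_0$, so only the divisors of $a_0$ need checking) can be sketched as follows. This is an illustration under stated assumptions, not an algorithm taken from the text: the function name and the coefficient-list representation are hypothetical, and the case $a_0 = 0$, where every integer divides $a_0$, is handled by first dividing out powers of x.

```python
def integer_roots(coeffs):
    """Return the sorted integer roots of a0 + a1*x + ... + an*x**n.

    coeffs[i] is the integer coefficient of x**i (illustrative representation).
    """
    coeffs = list(coeffs)
    if not any(coeffs):
        raise ValueError("the zero polynomial has every integer as a root")

    roots = set()
    # If the constant term is 0, then x = 0 is a root; divide out x and repeat.
    while coeffs[0] == 0:
        roots.add(0)
        coeffs = coeffs[1:]

    def value(x):
        # Horner evaluation of the (reduced) polynomial at x.
        acc = 0
        for c in reversed(coeffs):
            acc = acc * x + c
        return acc

    # Every remaining integer root divides the nonzero constant term.
    a0 = abs(coeffs[0])
    for d in range(1, a0 + 1):
        if a0 % d == 0:
            for candidate in (d, -d):
                if value(candidate) == 0:
                    roots.add(candidate)
    return sorted(roots)
```

For instance, `integer_roots([2, -3, 1])` examines only ±1 and ±2 for the equation 2 − 3x + x² = 0. No comparably simple finite search is available for Eq. (7) in several unknowns.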
otherwise it is unsolvable or undecidable. The definitions apply to problems about natural numbers, coded in unary; to problems about FOL-formulas, by identifying (for example) each variable $v_i$ with a similar sequence vv· · ·v of i + 1 v's, so that the syntax of FOL is based on a finite vocabulary; and to relations (sets of n-tuples) on strings or numbers, by thinking of $u_1, \ldots, u_n$ as a single string.

Each Turing machine can be represented by a string of 0's and 1's which codes its alphabet, internal states, and transitions, and this leads to the first and most basic unsolvability result, due to Turing:

The Halting problem: It is undecidable whether an arbitrary Turing machine M halts on an arbitrary binary string u.

For the proof, Turing constructed a universal machine U which can simulate every other, i.e.,

    $U[\bar M, u] = M[u]$, if $\bar M$ is the code of M.

This treatment of programs as data is, of course, routine today.

All unsolvability results are (ultimately) established by reducing the Halting Problem to them, i.e., showing that if such-and-such a function were computable, then the Halting Problem would be solvable. The proofs are often difficult and generally depend on results specific to the field in which the problem arises.

In mathematics, the problems which have been proved unsolvable include:

Hilbert's 10th: Whether a given Diophantine equation has integer solutions (Matijasevich, following work of Martin Davis, Hilary Putnam, and Julia Robinson).

The Word Problem for Groups: Whether two words denote the same element in a finitely generated, finitely presented group (P. Novikov, W. Boone).

The Homeomorphism Problem for 4-manifolds: Whether the orientable n-manifolds represented by two triangulations are homeomorphic, for n ≥ 4 (A. Markov). This problem is solvable for 2-manifolds, by their classical representation as spheres with handles, and it is still open for 3-manifolds, pending (among other things) the resolution of the Poincaré Conjecture.

There is also a large number of unsolvable problems in computer science.

D. Undecidable Theories

A theory T in FOL(σ) is any set of sentences closed under consequence,

    T ⊢ A ⇒ A ∈ T.

The two basic examples are theories of σ-structures

    Th(M) = {A | M ⊨ A},

and axiomatic theories of the form

    $T = \mathrm{Th}(T_0) = \{A : T_0 \vdash A\},$

where $T_0$ is a decidable set of axioms. The terminology is natural, because we would certainly demand of any "axiomatization" that it can be decided effectively whether an arbitrary sentence is an axiom.

Every decidable theory T is axiomatizable since Th(T) = T when T is a theory, but the converse fails, in general, and in particular for $T_0 = \emptyset$ when the vocabulary is not trivial:

Church's Theorem: If the vocabulary σ includes at least one binary function or relation symbol, then it is undecidable for a sentence A of FOL(σ) whether ⊢ A.

A FOL(σ)-theory T is consistent if it does not contain a contradiction A & ¬A, and it is complete if for every sentence A, either A or ¬A is in T. It is easy to verify that every consistent, axiomatizable, complete theory is decidable, and we can use this to formulate and prove a very general version of the Gödel Incompleteness Theorem. The key tool is the notion of translation.

Suppose $T_1$ and $T_2$ are theories, perhaps in different vocabularies $\sigma_1$ and $\sigma_2$; e.g., $T_1$ might be Th(PA), and $T_2$ might be some axiomatic set theory. A translation of $T_1$ into $T_2$ is a computable string function ρ which assigns a sentence ρ(A) of FOL($\sigma_2$) to every sentence A of FOL($\sigma_1$) and preserves propositional logic and $T_1$-inference, i.e.,

    $T_2 \vdash \rho(\lnot A) \leftrightarrow \lnot\rho(A)$
    $T_2 \vdash \rho(A \,\&\, B) \leftrightarrow \rho(A) \,\&\, \rho(B)$
    $T_1 \vdash A \;\Rightarrow\; T_2 \vdash \rho(A).$

Notice that the identity function ρ(A) = A translates every theory into itself.

The Gödel Incompleteness Theorem (Rosser's form): If T is a consistent, axiomatizable theory and Peano arithmetic Th(PA) is translatable into T, then T is undecidable and hence incomplete.

In short, every consistent axiomatic system in which a reasonable amount of mathematics can be developed is undecidable and incomplete.

To state the strongest corresponding result about theories of structures, we need the simple fact that every computable set is arithmetical, essentially due to Gödel.

Tarski's Theorem: If Th(N) is translatable into Th(M), then Th(M) is not arithmetical; a fortiori it is not decidable.
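The remark above that every consistent, axiomatizable, complete theory is decidable rests on a simple search: list the theorems one by one until either A or ¬A turns up. The sketch below illustrates this under loud assumptions. Sentences are modelled as nested tuples, `enumerate_theorems` stands in for an enumeration of all proofs from the axioms, and the toy theory (literals about evenness only) is invented for the example; none of these names come from the text.

```python
from itertools import count


def negate(sentence):
    """Return the negation, cancelling a leading 'not' (toy representation)."""
    if sentence[0] == "not":
        return sentence[1]
    return ("not", sentence)


def decide(sentence, enumerate_theorems):
    """Search the theorem list until the sentence or its negation appears.

    Completeness guarantees that the search stops; consistency guarantees
    that the answer is correct.
    """
    negation = negate(sentence)
    for theorem in enumerate_theorems():
        if theorem == sentence:
            return True
        if theorem == negation:
            return False


def toy_theorems():
    """Hypothetical stand-in for proof enumeration: one literal per number."""
    for n in count():
        atom = ("even", n)
        yield atom if n % 2 == 0 else ("not", atom)
```

With this toy enumeration, `decide(("even", 4), toy_theorems)` stops at the fifth theorem. For an incomplete theory the same loop may run forever, which is why completeness is essential to the argument.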
To apply Tarski's Theorem, you need (in effect) to give a first-order definition of the natural numbers within the given structure. One of the first results of this type was the undecidability of the theory of rational numbers Th(Q, 0, 1, +, ·) (Julia Robinson), but there are many others, and there are also many difficult open problems in this area.

On the other hand, many interesting theories are decidable, including the following:

• The theory Th(N, 0, 1, +) of arithmetic without multiplication (Presburger).
• The theory Th(Q, ≤). This coincides with the theory of every dense, linear ordering without end points.
• The theory Th(C, 0, 1, +, ·) of the complex number field, which coincides with the theory of every algebraically closed field of characteristic 0 (Tarski, Abraham Robinson).
• The theory Th(R, 0, 1, +, ·, ≤) of the ordered field of real numbers, which coincides with the theory of every real closed field (Tarski).

The classical result here is Tarski's decidability of the ordered field of real numbers, which (using coordinates) implies that Euclidean geometry is decidable, in a sense trivializing much of ancient Greek mathematics. It is still open whether the extended theory Th(R, 0, 1, +, ·, ≤, ↑)

What sorts of sentences are not provable in sufficiently strong axiomatizable theories? If $T = \mathrm{Th}(T_0)$ is axiomatizable in FOL(σ), then the (coded) proof relation

    $\mathrm{Proof}_T(a, p)$ ⇔ a is the code of some sentence A in FOL(σ) and p is the code of a proof of A from T

is Turing computable, and hence arithmetical. Using this, we can construct a sentence $\mathrm{Consis}_T$ in the vocabulary of PA which expresses naturally the consistency of T and establish the following:

Gödel's Second Incompleteness Theorem (Rosser's form): If T is consistent, axiomatizable and ρ translates Th(PA) into T, then T cannot prove the translation $\rho(\mathrm{Consis}_T)$ of its consistency sentence.

The theorem makes it clear that we cannot axiomatize a substantial part of mathematics in any way whatsoever so that the consistency of the system can be established "constructively": because the (presumably simple) "constructive methods" we would be willing to use in a consistency proof should be part of the "substantial part of mathematics" we want to axiomatize. Beyond its obvious foundational significance, the Second Incompleteness Theorem has numerous applications, especially in comparing the strength of various hypotheses in Axiomatic Set Theory.

F. Hierarchies

A set Q of strings or numbers is $\Sigma^0_2$ if, for all u,

    $u \in Q \Leftrightarrow (\exists x_1)(\forall x_2)R(u, x_1, x_2),$

where the quantified variables range over natural numbers and the matrix R is computable, and it is $\Pi^0_3$ if, for all u,

    $u \in Q \Leftrightarrow (\forall x_1)(\exists x_2)(\forall x_3)R(u, x_1, x_2, x_3)$

with the same restrictions. The definitions extend naturally to all k, and we also set

    $\Delta^0_k = \Sigma^0_k \cap \Pi^0_k.$

Kleene, who introduced these classes, showed that

    $\Delta^0_1$ = the class of recursive sets,

[Diagram: the classes $\Sigma^0_1, \Sigma^0_2, \ldots$ and $\Delta^0_1, \Delta^0_2, \ldots$]

with some recursive $f : \mathbb{N} \to S^*$. Moreover, these classes increase properly and exhaust the arithmetical sets. A similar hierarchy

    $\Sigma^1_k, \quad \Pi^1_k, \quad \Delta^1_k$

for the analytical (second-order definable) sets is constructed by allowing the quantified variables to range over the unary functions $\alpha : \mathbb{N} \to \mathbb{N}$ and the matrix to be arithmetical, so that all arithmetical sets are in $\Delta^1_1$.

These hierarchies classify the analytical sets of natural numbers and strings by the logical complexity of their (simplest) definitions, and they are powerful tools in the theory of definability. For example, every axiomatizable theory is $\Sigma^0_1$. This rules out an axiomatization of Second-Order Logic SOL, whose set of valid sentences (on the empty vocabulary) is not analytical. Somewhat surprisingly, it also rules out an axiomatization of the theory

    $T_f$ = {A | for all finite (D, E), (D, E) ⊨ A}

of finite graphs, which is $\Pi^0_1$ but not $\Sigma^0_1$ (Trachtenbrot).
G. Turing Reducibility

Imagine a Turing machine with a second query tape which it handles exactly like its primary tape, implementing somewhat more complex transitions of the form

    q, s_1, s_2 → q′, s_1′, s_2′, m_1, m_2

It also has a special query state q_?, and when it goes into q_?, the computation stops and does not resume until some external agent (the oracle) replaces the contents on the query tape by some string (Fig. 3).

FIGURE 3 q, a, 0 → q , , 1, −1, +1.

A string function f is computable relative to some given g if it can be computed by such an oracle machine, provided each time q_? is reached, the string u on the query tape is replaced by the value g(u). We let

    f ≤_T g ⇔ f is computable in g,

and we extend this notion of Turing reducibility to sets of natural numbers via their characteristic functions.

It is not hard to show that there exist Turing-incomparable sets of numbers (Kleene-Post). In fact, there exist Turing-incomparable recursively enumerable sets, but this was quite hard to prove and it was a celebrated open question for some twelve years, known as Post’s Problem. The simultaneous, independent discovery in 1956 by Friedberg and Muchnik of the priority method which proved it initiated an intense study of Turing reducibility which is still, today, one of the most active research areas of logic, the largest (and technically most sophisticated) part of computability or recursion theory.

V. RECURSION AND PROGRAMMING

In its most general form, a recursive definition of a function x is expressed by a recursive (or fixed point) equation

    x(t) = f(t, x),   (8)

where the functional f(t, x) provides a method for computing each value x(t), perhaps using (“calling”) other values of x in the process. It is possible to characterize the computable functions on the natural numbers using simple recursive equations of this form, generalizations of the primitive recursive definition (2) in Section III.E. Though conceptually less direct than Turing’s approach through idealized machines, this modeling of computability by

A. Recursive Equations

Not every recursive equation (8) has a solution x, and some have many, e.g., the trivial x(t) = x(t) which is satisfied by every function. The basic result which guarantees canonical solutions to a large class of recursive equations comes from the theory of partially ordered sets.

A partially ordered set or poset is a structure (D, ≤_D), where ≤_D is a binary relation and for all x, y, z in D,

    x ≤_D x,   [x ≤_D y & y ≤_D z] ⇒ x ≤_D z,
    [x ≤_D y & y ≤_D x] ⇒ x = y;

a subset C of D is a chain if every two members of C are ≤_D-comparable, i.e., x ≤_D y or y ≤_D x; and a poset D is complete if every chain in D has a supremum (least upper bound).

Every complete poset has a least element ⊥ (the supremum of the empty chain), and every set A can be turned into a flat poset A_⊥ by adding a “bottom” below all its otherwise incomparable elements (Fig. 4).

FIGURE 4 Flat poset.

Other basic examples include the set of all subsets of a set A (under ⊆) and the set of all (finite and infinite) sequences from some set, under “extension.” The Cartesian product of complete posets is complete, and, more importantly, if W is complete, then the function spaces of all arbitrary, monotone or Scott-continuous mappings π : D → W are also complete, with the pointwise partial ordering

    π ≤ ρ ⇔ for all x, π(x) ≤ ρ(x).

Here π : D → W is monotone if

    x ≤_D y ⇒ π(x) ≤_W π(y),

and it is Scott-continuous if, in addition, for every chain C in D,

    π(supremum(C)) = supremum(π[C]).

The Least-Fixed-Point Theorem. If (D, ≤) is a complete poset and π : D → D is monotone, then the recursive equation

    x = π(x)   (9)

has a least solution.
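As a minimal sketch of the theorem on a finite poset, take D to be the set of all subsets of a small node set under ⊆ and iterate a monotone map from the least element (the empty set) until the value stops changing. The graph `EDGES`, the map `step`, and the helper `least_fixed_point` are illustrative assumptions, not part of the article; on a finite poset this plain iteration suffices, while the general case needs the transfinite argument.

```python
from typing import Callable, FrozenSet

Node = int
NodeSet = FrozenSet[Node]

# Hypothetical example graph (not from the article): node -> successors.
EDGES = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: {3}}


def step(xs: NodeSet) -> NodeSet:
    """Monotone map on subsets: the start node 0 plus all successors of xs."""
    out = {0}
    for v in xs:
        out |= EDGES[v]
    return frozenset(out)


def least_fixed_point(pi: Callable[[NodeSet], NodeSet], bottom: NodeSet) -> NodeSet:
    """Iterate pi from the least element until the value stops changing."""
    x = bottom
    while True:
        nxt = pi(x)
        if nxt == x:
            return x
        x = nxt


reachable = least_fixed_point(step, frozenset())
everything = frozenset(EDGES)  # all five nodes; also a fixed point of step

print(sorted(reachable))  # prints [0, 1, 2]
```

Here `step` has two fixed points, the set of nodes reachable from 0 and the set of all nodes; the iteration from the empty set finds the smaller one.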
The theorem is proved by setting recursively

    x_0 = ⊥,   x_{n+1} = π(x_n).   (10)

In the simplest case, which is sufficient for the applications to programming languages, the mapping π is Scott continuous, and then

    x̄ = supremum{x_0, x_1, . . .}

is the least fixed point of π. For the full result we need to extend the iteration (10) into the “transfinite,” using recursion on ordinal numbers.

There is a rich theory of complete posets and various kinds of mappings on them, mostly motivated by the applications to programming, but also by earlier work in abstract recursion, the generalization of computability to abstract structures.

B. Programming Languages

From the mathematical point of view, a programming language P is very much like a logic, with a syntax, a semantics, and an implementation, which plays the role of an inference system.

The syntax is generally much more complex than that of logics, with many different categories of grammatically correct expressions. There are variables of various kinds, some of them for functions of specified types; constants which are meant to denote acts of interaction with the environment (input, output, interrupts); and various ways of combining grammatically correct expressions to produce new ones, using programming constructs like composition, “while loops,” functional abstraction, and recursion. Some closed expressions (with no free variables) corresponding to the “sentences” of a logic are singled out, typically called programs. With all this complexity, the “grammar” is still specified by an induction, as it is for logics, so that it is again possible to prove properties of correct expressions and to define operations on them by structural induction.

In the denotational semantics introduced by Dana Scott, a programming language P is interpreted in a structure (D, –) whose universe D is a complete poset, the domain. The points of D may include concrete data (words from some finite alphabet), but also functions of various sorts and complex mathematical structures which model computations, interactions, etc. For each correct expression A and each assignment α to the variables, the denotation [[A]](α) is a point in D, determined by a structural induction of the following general form: first a (Scott-continuous) recursive equation (9) is constructed from α and the denotations of the parts of A, and then we take

    [[A]](α) = the least fixed point of [x = π(x)].

The use of recursive equations is absolutely essential here, to interpret the iteration and recursive constructs which are at the heart of programming languages.

The implementation is a function which assigns to each program A a “machine” M_A (or, more concretely, code in the machine language of some processor) which computes the denotation [[A]] of A. In the simplest case, [[A]] might just be a sequence of external acts, like “printing” some file or drawing some picture on a monitor; more often [[A]] is a function relating input to output, or a “strategy” in some game, by which the machine responds to a sequence of external stimuli. As with inference systems, implementations come in a great variety of shapes and forms (compilers and interpreters, to name two), but they must have the basic soundness property, that M_A “computes” [[A]] in a well-understood way which relates the abstract (mathematical) denotations of programs to the behavior of machines.

Even with this grossly oversimplified description, it should be clear that the basic methodology of logic (the clean distinction between syntax, semantics, and inference) has had an immense influence on the development of programming languages; and that the fundamental, related notions of symbolic computation and recursion introduced by logicians in the 1930s are essential to the understanding of programming languages.

In the other direction, the study of programming languages, spurred by the need for applications, has introduced a host of interesting problems in logic, chief among them the question of logic of programs: What are the natural formal languages and inference systems in which the fundamental properties of programs can be expressed and rigorously proved? Much work has been done on this, but it is fair to say that the question is still open, and a formidable challenge to logicians and computer scientists.

VI. ALTERNATIVE LOGICS

From the many alternative logics which are obtained by changing the syntax, semantics or inference system of First-Order Logic, we consider, very briefly, just two.

A. Modal and Temporal Logic

Modal Logic goes back to Aristotle, the traditional founder of logic, who took necessity as one of the basic linguistic constructs worthy of logical study. The modern syntax is obtained by adding to FOL the propositional box operator □, so that with each formula A we have the formula □A (with the same free variables), read necessarily A. The possibility operator is defined by the abbreviation ♦A ≡ ¬□¬A.
Modal formulas are interpreted in Kripke structures

    M = (W, s_0, {M_s | s ∈ W}, R),

of a specified vocabulary σ, where W is some set of possible worlds; s_0 is a specified “actual world”; each M_s is a σ-structure associated with the world s; and R(s, t) is an accessibility relation on the worlds, intuitively standing for “t is a possible alternative to s.” There are no fixed, general assumptions about the accessibility relation or the interpretations of the given relations on the various worlds; it could be, for example, that “Mary is John’s wife” in the actual world s_0, but in alternative possible worlds John’s wife might be Ellen, John may not have a wife, or he may not even exist. Assignments associate objects in the possible worlds to individual variables, and the basic, semantic relation M_s, α ⊨ A is defined by the Tarski conditions (for structures) with the additional clause

    M_s, α ⊨ □A ⇔ for all t, if R(s, t), then M_t, α ⊨ A.

For example, if R is transitive,

    R(s, t) & R(t, t′) ⇒ R(s, t′),

then the formula

    □A → □□A   (11)

is satisfied by all assignments, in all possible worlds, while it may fail for some A in nontransitive structures. Finally,

    M, α ⊨ A ⇔ M_{s_0}, α ⊨ A.

Different conceptions of “necessity” can be modeled by placing appropriate restrictions on the accessibility relation, for example that it be transitive, linear, etc., and there is a question of constructing a suitable inference system and proving the appropriate Completeness Theorem in each case. A great deal of interesting work has been done in this area, much of it motivated by puzzles in the philosophy of language.

If we take W = N for the set of possible worlds, with s_0 = 0 and R(s, t) ⇔ s ≤ t, and if we read □A as “from now on A,” we get one version of Temporal Logic, very useful for applications to computing systems. The worlds are interpreted by the states of some finite state machine, the propositional variables stand for properties of states, and the propositional formulas (which suffice) can express interesting properties of the system, especially if we augment the language with some additional, natural primitives like Next with the truth condition

    M_s ⊨ Next A ⇔ M_{s+1} ⊨ A.

For example, □(p → Next q) says that “every state which has property p is followed immediately by one which has property q,” and ♦□p says that “p will eventually become and remain true,” both interesting properties of finite state machines. This temporal logic is decidable, and so are various extensions of it, in which essentially all interesting liveness and fairness properties of finite state machines can be expressed, so that one can mechanically verify the “correct behavior” of finite state machines. The relevant algorithms are practical, if not simple, they are used commercially, and they provide a spectacular example of the emerging field of applied logic.

B. Intuitionistic Logic

First-Order Intuitionistic Logic FOL_I has the same syntax as FOL, and almost the same inference system: we simply replace the Double Negation Law ¬¬A → A, Eq. (8) in Section I.E, by the weaker

    (8)_I   ¬A → (A → B).

Kripke has established a Completeness Theorem for FOL_I using a variation of his semantics for Modal Logic, and this is useful for obtaining unprovability results for FOL_I. The language, however, is meant to be understood constructively, and so it is not really possible to explain its semantics fully within classical mathematics. Aside from philosophical concerns, the real interest of Intuitionistic Logic comes from the proof theory of FOL_I, which, somewhat surprisingly, also has important applications to computer science. Some sample results:

(1) For any two sentences A and B,

    ⊢_I A ∨ B ⇒ ⊢_I A or ⊢_I B,

and hence ⊬_I p ∨ ¬p.

(2) In Heyting Arithmetic, i.e., the axiom system PA of Section III.A with Intuitionistic Logic, for any sentence (∃x)A,

    PA ⊢_I (∃x)A ⇒ for some n, PA ⊢_I A{x :≡ n}.

(3) If PA ⊢_I (∀x)(∃y)A and (∀x)(∃y)A is a sentence (no free variables), then there is a computable function f, such that for all n,

    PA ⊢_I A{x :≡ n, y :≡ f(n)}.

This last result is obtained with Kleene’s Realizability Theory, and it illustrates the following general principle: from a constructive proof of (∀x)(∃y)R(x, y), we can extract an algorithm which computes for each x, some y such that R(x, y). There are obvious applications of this idea in computer science, and much of the current research in intuitionism is motivated by it.
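The general principle can be illustrated, informally, with a classical example chosen here as an assumption (it does not appear in the article, and it is not Kleene's realizability construction itself). Euclid's argument that for every n there is a prime p > n is constructive: no prime at most n divides n! + 1, so the least prime factor of n! + 1 is a witness. Reading the proof as a program gives a computable function f with R(n, f(n)), where R(n, p) says "p is prime and p > n". The names `witness`, `smallest_prime_factor`, and `is_prime` are hypothetical helpers.

```python
from math import factorial


def smallest_prime_factor(m: int) -> int:
    """Least divisor d >= 2 of m (for m >= 2); this divisor is always prime."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m


def is_prime(m: int) -> bool:
    """Trial-division primality check, used only to verify the witness."""
    return m >= 2 and smallest_prime_factor(m) == m


def witness(n: int) -> int:
    """The y extracted from Euclid's proof: the least prime factor of n! + 1."""
    return smallest_prime_factor(factorial(n) + 1)


for n in range(6):
    p = witness(n)
    # R(n, p): p is prime and p > n
    print(n, p, is_prime(p) and p > n)
```

The function is far from the most efficient way to find a prime above n; the point is only that the algorithm is read off from the shape of the proof.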
VII. SET THEORY

Sets are collections into a whole of definite and separate objects of our intuition or thought, according to Georg Cantor, who initiated their mathematical study in the mid 1870s. Thus the basic relation of the theory is membership (∈),

    x ∈ A ⇔ x is a member of A,

and a set is completely determined by its members,

    A = B ⇔ (for all x)[x ∈ A ⇔ x ∈ B].

Finite sets can be simply enumerated, e.g., A = {0, 5, 7}. Infinite sets are usually specified by means of some condition P(x) which characterizes their members, and we write

    A = {x | P(x)}

to indicate that A “is the set of all objects which satisfy P(x).”

Cantor was led to the study of arbitrary, abstract sets in his effort to understand the structure of some specific sets of real numbers or pointsets, and the theory which he created still exhibits today these two related but separate concerns. The theory of pointsets or descriptive set theory is primarily a theory of definability on the real numbers, and it is characterized by its applications to other fields of mathematics, especially analysis. Abstract set theory is primarily a theory of counting, an extension of combinatorics to the transfinite. The best set-theoretic results are about the interaction between these two poles of the subject.

At about the same time as Cantor’s original contributions, Gottlob Frege initiated an effort to create a foundation of mathematics on the basis of set theory. Frege’s approach was different (he took “function” rather than “set” as his primitive notion) and his original program was overly ambitious and failed. He had the right basic idea, however, that all objects of classical mathematics can be “defined within set theory,” so that their properties can be (ultimately) derived from properties of sets. It took some time for this to take hold, but it is fair to say that since the 1930s, set theory has been the official language of mathematics, just as mathematics is the official language of science. This richness of the field makes it fertile ground for logical investigations, and it is not an accident that logicians have been involved with set theory from the beginning.

A. Cardinal Arithmetic

There are exactly as many left shoes in a (normal) shoe store as there are right shoes: we can be sure of this without counting, because of the obvious one-to-one correspondence between left and right shoes. The principle here is that equivalent sets have the same number of members,

    |A| = |B| ⇔ A ∼_c B,   (12)

where A ∼_c B indicates that some one-to-one correspondence exists between the members of A and the members of B, and

    |X| = the number of objects in the set X.

This is a basic tool in mathematics: we count a set A by establishing a one-to-one correspondence between its members and the members of some already-counted set B. Moreover, if we set

    A ≤_c B ⇔ for some subset C ⊆ B, A ∼_c C,

then, obviously,

    |A| ≤ |B| ⇔ A ≤_c B,   (13)

and we can often prove indirectly that there are objects in B which are not in A by showing (using arithmetic) that |A| < |B|, so that B ⊆ A is impossible.

Cantor proposed to associate a cardinal number |X| with every (finite or infinite) set X, so that Eqs. (12) and (13) hold, and then to use similar counting and (infinite) cardinal arithmetic techniques in the study of arbitrary sets. One might expect problems, because a finite set cannot be equivalent with one of its proper subsets (by the so-called Pigeonhole Principle), while

    N = {0, 1, 2, . . .} ∼_c {0, 2, 4, . . .}   (14)

via the correspondence f(n) = 2n. Cantor showed that, despite this “paradox,” his cardinal arithmetic is a powerful tool with important applications in almost all areas of mathematics.

Cantor’s first fundamental discovery was that there are (at least) two infinite sizes of sets: if

    ℵ_0 = |N|,   𝔠 = |R| = |the real numbers|

are the cardinal numbers of the two most basic sets in mathematics, then

    ℵ_0 < 𝔠.   (15)

A set A is countable if |A| ≤ ℵ_0, otherwise it is, like R, uncountable.

To define the arithmetical operations on (possibly infinite) cardinal numbers, choose sets K, L with no members in common so that κ = |K|, λ = |L|, and set

    κ + λ = |K ∪ L|,
    κ · λ = |K × L|,
    κ^λ = |(L → K)|.
Here the union K ∪ L is the set of all objects which belong to either K or L; the Cartesian product K × L is the set of all ordered pairs (x, y) with x ∈ K and y ∈ L; and the function space (L → K) is the set of all functions f : L → K. If κ and λ are finite, we get the usual sum, product and exponential, noting, in particular, that there are κ^λ functions from a set of size λ to one of size κ. Moreover, all the familiar arithmetical identities hold, e.g., addition and multiplication are associative and commutative, multiplication distributes over addition, κ^0 = 1, and

    κ^{λ+µ} = κ^λ · κ^µ,   (κ^λ)^µ = κ^{λ·µ}.

As examples of “proofs by counting,” Cantor showed first that

    ℵ_0 + ℵ_0 = ℵ_0 · ℵ_0 = ℵ_0   (16)

(basically because of Eq. (14)), and

    𝔠 = 2^{ℵ_0}.

Both of these facts are easy, but they support the computation

    𝔠^2 = 2^{ℵ_0} · 2^{ℵ_0} = 2^{ℵ_0 + ℵ_0} = 2^{ℵ_0} = 𝔠,

which means that there is a one-to-one correspondence between the line and the plane, and, hence, between the line and real n-space, for every n. This was new, it was surprising, and it was proved by “plain arithmetic.” Eventually it motivated the development of dimension theory, whose basic result is that there is no continuous, one-to-one correspondence of real n-space with real m-space unless n = m. Moreover, the set of rational numbers is countable, and so is the set of algebraic numbers, the solutions of polynomial equations

    a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n = 0

with integer coefficients. Thus, since R is uncountable, “by simple counting” there exist transcendental (not algebraic) real numbers, a famous result of Liouville’s whose original proof had rested on delicate convergence arguments for infinite series. It was a “killer application” which made set theory instantly known (and somewhat notorious) in the mathematical community.

Cardinal addition and multiplication satisfy the following absorption laws which basically trivialize them in the infinite case:

    if 0 < κ ≤ λ and λ is infinite, then κ + λ = κ · λ = λ.

For exponentiation, however, Cantor extended Eq. (15) to the general inequality

    κ < 2^κ,

which provides infinitely many distinct “orders of infinity,” perhaps what people meant when they referred to Cantor’s Paradise. Exponentiation is the source of the deepest questions about infinite sets, chief among them Cantor’s Generalized Continuum Hypothesis (GCH), the claim that for all infinite κ,

    (GCH)   2^κ = κ^+ = the least cardinal number > κ.

The “ordinary” case (CH) 2^{ℵ_0} = ℵ_0^+ was No. 1 in Hilbert’s list, it dominated set-theoretic research in the 20th century and, in a sense, it is still open today.

In addition to the cardinal numbers, which count the members of a set one, two, three, . . . in the finite case, Cantor also introduced infinite versions of the ordinal numbers first, second, third, . . . which assign position in a sequence. These are associated with “transfinite sequences,” i.e., well-ordered structures (A, ≤), where ≤ is a linear ordering on A (so that x ≤ y or y ≤ x, for all x, y in A) and every non-empty subset of A has a least element. Every ordinal number α has a successor α + 1 which defines “the next position,” and every set of ordinal numbers A has a least upper bound sup A. The least infinite ordinal number ω defines the first position with infinitely many predecessors, and it is a limit ordinal, without an immediate predecessor.

    0, 1, 2, . . . ω, ω + 1, ω + 2, . . . ω · 2, ω · 2 + 1, . . .

Ordinal arithmetic has fewer direct applications than the arithmetic of cardinal numbers, but well-ordered structures and ordinal numbers are the fundamental tools in the study of transfinite iteration, which is rich in applications. In a typical case, a function f : A → B is defined by recursion on some well ordered structure (A, ≤), and then the crucial properties of f are established by induction along ≤. Moreover, the exact specification of the relation ≤ is often unimportant: all that matters is that some relation well orders A, in other words that A be well orderable.

    (WOP) Is every set well orderable?

Specifically, is the set R of real numbers (where many of the applications lie) well orderable? The natural ordering of R won’t do, since (for example) R has no least element, and it is hard to imagine how one could arrange all the real numbers into a transfinite sequence, with each point followed by its successor and every nonempty subset having a least element. The Continuum Problem (whether CH is true or not) and this Well-Ordering Problem were the central open problems in set theory at the turn of the 20th century.

B. The Paradoxes

Cantor developed his theory on the basis of the following General Comprehension Principle which flows naturally from his “definition” of sets quoted in the beginning of
this section: every definite (unambiguous) property P(x) of mathematical objects has an extension, the set A = {x | P(x)} which “collects into a whole” all the objects which satisfy P(x), so that

    x ∈ A ⇔ P(x).   (17)

But this is not generally true: because if

    R = {x | x is a set and x ∉ x},

then, from Eq. (17),

    R ∈ R ⇔ R is a set and R ∉ R ⇔ R ∉ R,

which is absurd. The argument was discovered in 1902 by Bertrand Russell, and it was not the first contradiction in set theory. However, earlier “paradoxes” (some of them known to Cantor) were technical, not unlike the paradoxes with infinitesimals which had been commonplace in calculus some years earlier, and it was thought that they would go away in a careful development of the subject. The Russell Paradox is not technical, it goes to the heart of the nature of sets, and it threw the mathematical community into a spin.

L. E. J. Brouwer initiated the intuitionistic program which denies that abstract sets are meaningful objects of study and also rejects some of the basic principles of logic. Mathematical objects cannot be said to “exist” in any sense independent of (mental) “mathematical activity”; and to prove that some x has property P, one must construct some specific object x which has property P. It is not enough to derive a contradiction from the assumption that no x has property P. Intuitionism had a strong influence in the philosophy of mathematics and remains a vibrant field of study within logic, but it never carried much favor with mathematicians: too much of classical mathematics must be thrown out to satisfy its tenets.

Hilbert proposed to “save” classical mathematics from the paradoxes and Brouwer’s attack by formalizing as large a part of it as possible in some first-order, axiomatic theory T, and then establishing the consistency of T by absolutely safe, finitistic methods. Formalism is the reading of Hilbert’s Program as a philosophical view: it alleges that once T is chosen, then T is all there is: there is nothing more to mathematics but the study of the inference relation T ⊢ A, with no reference to meaning. Aside from the impact of Gödel’s Second Incompleteness Theorem (Section IV.E) which weakens it, formalism also fails to account for the applications of mathematics: it is hard to see how the existence or not of certain patterns of meaningless symbols can have any bearing on the escape velocity of a rocket.

For those reluctant to abandon the traditional, realist view that mathematical objects are, well, real, no matter how abstract and difficult to pin down, Russell first proposed to replace set theory by his famous theory of types: it is claimed (roughly, and in the later simple version due to Ramsey), that every mathematical object is of a certain (natural number) type n, and that every set A is of some successor type n + 1, such that the members of A are of the immediately preceding type n. Type theory is awkward to apply and it yields only a poor shadow of Cantor’s set theory, albeit without the paradoxes. It never gained favor as a true alternative to set theory, although it has been studied extensively as a logical system, it has found its own applications (especially recently, to programming languages), and many of its fundamental ideas were eventually incorporated in the reformulation of set theory which eventually prevailed.

What has prevailed is Axiomatic Set Theory, first proposed in 1904 by Zermelo as a pragmatic way to avoid the paradoxes by rebuilding Cantor’s set theory on the basis of a few set-theoretic principles which are basic, simple, and well understood by their uses in classical mathematics. Formalists can accept it as nothing more than the choice of a specific set of axioms, whose “truth” is irrelevant, if, at all, meaningful. But it is the realists who, in the end, have received the greatest comfort from axiomatic set theory: because the systematic development of consequences of the axioms eventually led to a narrower, more concrete concept of set, which ultimately justified the axioms.

Much of modern logic was created in response to the challenge of the set-theoretic paradoxes, and that is another reason why the discipline is so intimately tied with set theory.

C. Zermelo-Fraenkel Set Theory

There are eight axioms in ZFC (Zermelo-Fraenkel Set Theory with Choice), and it is assumed that they are interpreted over some given domain of sets V, which comes endowed with a binary membership condition, x ∈ y. The formal theory ZFC is obtained by expressing these axioms by sentences of FOL(∈), and it requires infinitely many sentences, because the Replacement Axiom 5 requires an axiom scheme. Here we will describe them briefly and informally, with a few interspersed comments.

1. Extensionality. Two sets are equal exactly when they have the same members.
2. Empty set and Pairing. There is a set ∅ with no members, and for any two sets a, b, there is a set {a, b} whose members are exactly a and b.
3. Unionset. For each set A, there is a set ∪A whose members are the members of the members of A,

    t ∈ ∪A ⇔ (∃x)[x ∈ A & t ∈ x].

4. Powerset. For each set A, there is a set P(A) whose members are all the subsets of A.
and then we model a function f by its graph,

    G_f = {(x, y) ∈ A × B | y = f(x)},

which is just a set with some special properties. It is common to use the so-called Kuratowski pair operation

    (x, y) = {{x}, {x, y}},

but there are many others, and all that is needed is some operation which satisfies Eq. (18).

6. Infinity. There is a set I and a one-to-one function f : I → I which is not onto I, f[I] ⊊ I.

Next comes Zermelo’s chief contribution:

8. Foundation. Every set is a member of some V_α.

This is a limiting axiom, not needed for the development of Cantor’s set theory or its applications, but it is important because it codifies within the axiomatic theory a conception of set which replaced in the 1930s Cantor’s free-wheeling notion of a “collection into a whole”: each set is reached starting with “nothing” (the emptyset ∅), by “indefinite” (never ending) “iteration” of the powerset operation. Admittedly more complex than Cantor’s, this notion of grounded set prohibits the circular constructions which lead to the paradoxes, and it can be described intuitively in sufficiently clear terms to justify the axioms.

To see how classical mathematics can be developed on the basis of these seven axioms, consider first arithmetic. A number system is a triple (N, 0, S) such that N is a set, 0 ∈ N, S : N → N is a one-to-one function which is never 0, and
D. Independence Results

It is, perhaps, ironic that the axiomatization of set theory made it possible to formulate and prove its own limitations. Let ZF be the theory with axioms 1–6, i.e., without the Axiom of Choice:

Theorem. If ZF is consistent, then so are the theories ZFC+GCH (Gödel, 1938) and ZFC+¬CH (Paul Cohen, 1963).

In effect, ZFC can neither disprove nor prove the Continuum Hypothesis, unless a contradiction can be obtained from its “constructive” core. In addition, Cohen showed that ZF cannot prove the Axiom of Choice, and established several additional consistency and independence results.

Gödel’s proof uses an inner model, a sub-collection of our intended universe of sets V: using only axioms 1–6, he defines a certain collection L of constructible sets and shows that if we reinterpret “set” to mean “member of L,” then all the axioms of ZFC as well as GCH are true. Cohen’s forcing method builds “virtual universes” which are “larger” than V, and so he must describe them indirectly. This can be done with Boolean-valued models: a collection M ⊂ V and a binary condition E on M are defined, and then it is shown that, for a certain (complete) Boolean algebra B, the Boolean semantics of the “structure” (M, E) assign 1 to all the theorems of ZFC but something other than 1 to CH.

In both of these proofs, logic plays an essential role which goes much beyond providing the context in which their claims can be made precise. For example, the constructible universe L is defined by iterating the operation of taking all first-order definable subsets rather than P(A) in the cumulative hierarchy of sets, and then a strong version of the Skolem-Löwenheim Theorem is used at a crucial point to show that GCH holds in L. Through the work, initially, of Robert Solovay for forcing and Ronald Jensen for constructibility, these theories have been much generalized and continue to be very active research areas of logic, with important applications to analysis, algebra and topology.

E. Current Research in Set Theory

In one direction, set theory is more involved now with applications than ever before. Especially fruitful has been the development in the 1960s of effective descriptive set theory, which incorporates methods from recursion theory into the study of definability on the continuum to yield very substantial applications to analysis.

Beyond the applications, set theory has attempted to confront the fundamental problem posed by the independence results: what does one do with the Continuum Problem, now that we know that it cannot be settled in ZFC? Some have adopted a formalist view, that it is meaningless to ask “whether CH is true or not,” and that “set theory is the study of all models of ZFC.” This is a very active area of research.

In another direction, people have looked for new axioms, extending ZFC, which might provide the needed answers, and a great deal of research has been done in this direction since the 1960s. Generally speaking, two kinds of axioms have been considered. Large cardinal axioms are plausible generalizations of the Axiom of Infinity, which, however, have very few direct consequences for the continuum. Determinacy hypotheses postulate that certain (fairly simple) infinite games on the natural numbers are determined; somewhat technical and not especially plausible, these axioms answer most definability questions about the real numbers that are independent of ZFC, although, unfortunately, they cannot settle the Continuum Problem. In a fundamental advance made in the 1980s, Donald A. Martin, John Steel, and Hugh Woodin showed that the plausible large cardinal axioms imply the fruitful determinacy hypotheses, and so a “unified,” very strong extension of ZFC has been created which is the subject of much current research. Unfortunately, it does not solve the Continuum Problem, and so the search goes on.

It may well be that set theory will continue to be dominated in the 21st century by the search for an answer to the Continuum Problem, as it certainly was during the century just ended.

SEE ALSO THE FOLLOWING ARTICLES

BOOLEAN ALGEBRA • COMPUTER ALGORITHMS • DATABASES • FUZZY SETS, FUZZY LOGIC, AND FUZZY SYSTEMS • SET THEORY

BIBLIOGRAPHY

Abramsky, S., Maibaum, T. S. E., and Gabbay, D. M., eds. (1993). “Handbook of Logic in Computer Science,” Clarendon Press, Oxford.
Buss, S. R., ed. (1998). Handbook of Proof Theory. In “Studies in Logic and the Foundations of Mathematics,” Vol. 137, Elsevier, Amsterdam/New York.
Hodges, W. (1993). Model Theory. In “Encyclopedia of Mathematics and Its Applications,” Vol. 42, Cambridge Univ. Press, Cambridge, U.K.
Rogers, Jr., H. J. (1967). “Theory of Recursive Functions and Effective Computability,” McGraw-Hill, New York.
Kunen, K. (1998). Set Theory. In “Studies in Logic and the Foundations of Mathematics,” Vol. 102, Elsevier, Amsterdam/New York.
Moschovakis, Y. N. (1980). Descriptive Set Theory. In “Studies in Logic and Foundations of Mathematics,” Vol. 100, North Holland, Amsterdam.
P1: GSS Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN009H-411 July 6, 2001 19:51
Mathematical Modeling
Xavier J. R. Avula
University of Missouri, Rolla
I. Introduction
II. Mathematical Modeling
III. Classification of Mathematical Models
IV. Formulation of Mathematical Models
V. Solution Techniques
VI. Model Validation
VII. Chaos and Complexity
VIII. Modeling with Neural Networks
IX. Mathematical Modeling and Computers
X. Concluding Remarks
years, mathematical modeling has become a powerful tool to solve complex, interconnected, and interacting phenomena arising from the rapid developments taking place in science and technology. The success in physical sciences in terms of valid mathematical models has led scientists to extend the modeling methodology to other emerging fields of inquiry in which great strides have been made. The explosive growth of mathematical modeling activity has been a driving force behind the development of high-speed digital computers, which in turn aided model solutions in a symbiotic relationship.
I. INTRODUCTION
Modern civilization has its roots in human endeavor to understand the physical universe. The effort to systematically understand the universe and the various phenomena in it appears as a never-ending struggle imbued with frustration and romance that sparked the imagination of humankind in the course of history. The birth of the scientific method and the ensuing pursuits have led men and women of learning to study phenomena systematically and have produced a vast edifice of knowledge in sciences and mathematics. Galileo (1564–1642), the famous Italian astronomer and physicist, firmly enunciated that the language of science is mathematics. The German philosopher Immanuel Kant (1724–1804), in the preface to his book “Metaphysical Foundations of Science,” declared “that each particular discipline contains only as much science as it contains mathematics.” No more words need to be wasted to say that science and mathematics are intertwined. If science and mathematics are intertwined, can technology, the child of science, exist without mathematics?
In striving to understand phenomena, one must not forget the Aristotelian adage, “The primary question is not what do we know but how do we know it.” What we know and its reliability should be the consequence of the process that answers the question how do we know it.
The famous mathematician Hilbert, after a sojourn in the chemistry department at the University of Göttingen, said that chemistry is too difficult for chemists. The American mathematician Richard Bellman echoed that medicine is too difficult for physicians, politics is too difficult for politicians, and economics is too difficult for economists. So is engineering for engineers and physics for physicists. Then how can they be enlightened? The enlightenment comes from exploring the behavior of mathematical models of the processes they encounter in their respective endeavors.
The physical universe is an abundant source of wonder. Benevolently exploited, it is a life-enhancing system for all humankind. The regularity observed in the natural processes of the universe is best expressed and explained in mathematical terms. Mathematical description has provided the tools and motivation for numerous discoveries in cosmology and atomic phenomena, as well as in biology, material science, earth sciences and the social behavior of animal and human populations. In engineering and technology, mathematical concepts and analyses have contributed greatly to the understanding, design, and operation of complex systems. The cost and risk involved in testing a real physical or engineering system are usually prohibitive. The alternative is to develop a mathematical model of the system and investigate its performance. How many of us would “pay the price for reaching the sun and learning its shape, its size, and its substance?”
II. MATHEMATICAL MODELING
Derived from its Latin root modus, the word “model” is generally understood to stand for an object that represents a physical entity with a change of scale. For example, a model airplane is a scaled-down version of a “real” airplane by a few orders of magnitude. As any airplane model builder knows, the behavior of the model and the real airplane differ in more ways than one, in spite of their physical similarity, which points to a missing ingredient: something that has fallen out during the model building process. What then is a mathematical model? A mathematical model is a set of mathematical equations representing a process or a system. It is a mathematical idealization of a real-world phenomenon. In the sense of the model just defined, it represents a change on the scale of abstraction. Also, the way one sees the world depends upon the structure of one’s language; different languages give rise to different concepts, and to many “alternate realities.” In the process of idealization, some simplifications will have been made in obtaining the mathematical model. Therefore, the mathematical model is less real than the system it is supposed to represent; it is a mathematical representation of the modeler’s perception of some aspects of reality within the confines of a formal mathematical system. Nevertheless, it is an essential step in the construction of a theory. To quote Boltzmann, there is nothing as practical as a good theory.
In the process of mathematical modeling, the objective of the modeler is, in general, to construct a model of an observed phenomenon and use the model to predict its future course. In some cases, modeling is used to explain the known facts and lay a foundation for the theory behind the phenomenon. Thus, a model is a mathematical manifestation of a particular theory. Some modelers have an altruistic motive to apply the model characteristics to
considered valid and then put into practice to predict a future event, or to make a decision. Otherwise, the model is invalidated, and steps B–D are repeated by revising one or more ingredients in the process. Even a scrutiny of the solution technique (step E) is worth considering. Thus the process of mathematical modeling is iterative in nature.
There is no uniqueness in the model-building process. Although the steps in the modeling process are subjective, they are somewhat similar. Figures 2 and 3, extracted from different sources, essentially have the same features presented in Fig. 1, but indicate the role of computers in mathematical modeling rather explicitly.
In recent years, the growth in mathematical modeling has been so phenomenal that it is evolving into a discipline in its own right. The offering of courses on mathematical modeling in the colleges in the United States, Europe, Asia and Australia has been on a steady increase. The international conferences and symposia on mathematical modeling and the journals that deal exclusively with mathematical modeling support this viewpoint. The proliferation of mathematical models in the scientific literature can be attributed to the explosive growth in the application of mathematical and empirical knowledge to the problems of science and technology. Even social sciences and business, which are not traditionally subjected to mathematical treatment, are not spared from the productiveness of mathematical modeling. The catalysts in the growth of mathematical modeling have been
1. The advent of the computer age with rapid developments in computer technology, specifically, the developments in speed, memory, and software
2. The developments in numerical techniques and analysis
3. The developments in systems theory and simulation
4. The progress in empirical knowledge resulting in a greater understanding of various physical processes
The role of experience and intuition in the process of mathematical modeling accords to it an air of art that makes the process subjective. However, the commonality of features in the process of mathematical modeling in all sciences and technology is so overwhelming that the subjectivity is cast as a minor essence. Let us now turn our attention to these common features that enhance the objective view of mathematical modeling.
III. CLASSIFICATION OF MATHEMATICAL MODELS
FIGURE 3 Guide to model building. [From Jacoby, L. S., and Kowilik, J. S. (1980). “Mathematical Modeling with Computers,” Prentice-Hall, Englewood Cliffs, NJ.]
jumps. The former subclass of models is represented by ordinary and/or partial differential equations, for example, for the boundary-layer flow over a wing of an aircraft. On the other hand, the production line in a factory is a discrete-state system.
Yet another classification breaks down mathematical models into stochastic (or probabilistic) and deterministic models. Models with at least one random variable in their description are termed stochastic. Models other than stochastic are deterministic. In fact, deterministic is a special case of stochastic.
Another classification of models is based on how the real system interacts with the environment. If the real system is isolated from its environment, the model is called autonomous. If the environment exerts any kind of influence through so-called input variables from the environment without being controlled by the model, then the model is called nonautonomous.
When models are expressed in terms of mathematical equations, the solutions are profoundly affected by nonlinearity. Based on the type of equations, the models are classified as linear or nonlinear. These classifications can be combined with the above-mentioned model types to yield, for example, linear or nonlinear dynamic and linear or nonlinear stochastic models. Although the universe is predominantly nonlinear, some processes are linear within certain bounds. Actually, linearity can be viewed as a special case of nonlinearity. Sometimes, it is mathematically expedient to consider nonlinear processes as piecewise linear. Systems are classified as linear if and only if they simultaneously satisfy the principle of homogeneity and the principle of superposition. The principle of homogeneity preserves the input function scale factor in transition from input to output. A system satisfies the superposition principle when the superposition of individual input functions results in a response that is the superposition of the corresponding outputs. Simulations and exact solutions of linear systems of equations are well established and reported in mathematical literature.
All systems that are not linear belong to the class of nonlinear systems. As there are no general methods for solving and analyzing models of nonlinear systems, simulation has become a commonly used tool for this purpose. Impacted by recent developments in numerical analysis and by advances in powerful, high-speed and high-memory digital and analog computers, simulation of nonlinear systems has blossomed into technical endeavors that are broad in scope and variety, encompassing biological, ecological, social, and economic systems in addition to those in hard scientific fields such as physics, chemistry, material science, and mechanics. Modern nonlinear equations frequently encountered in simulation include, for example, the quaternion rotational equations of motion that are treated with geometric algebra, which is of late applied to a range of problems in many fields of science and is being promoted as a unified mathematical language in the 21st century.
Traditional quantitative techniques of modeling cannot be effectively applied to model phenomena in the so-called “soft” disciplines consisting of humanistic systems such as social, economic, ecological, and biological systems because there may arise difficulties associated with multidimensionality, subsystem interactions, inexplicable feedback mechanisms, hierarchical structures, and unpredictable behavioral dynamics. Such difficulties may also be encountered in mechanistic systems that may include some physical and engineering systems. These difficulties lead to making imprecise statements, introducing fuzziness much like human reasoning, about the characteristics and behavior of a system. An important class of models
using fuzzy sets was launched in the mid-1960s. The idea of fuzzy sets is centered on the imprecision in the belonging of elements to a set. Intuitively, a fuzzy set is a set whose membership is imprecise, with no distinct boundary between the elements that belong to the set and those that do not. In other words, the transition from full membership to nonmembership of an element (or elements) is blurred, so that a gradation of partial membership is possible. Mathematical concepts based on this idea have been successfully applied to modeling of systems in “soft” disciplines as well as in engineering when imprecision is present. Fuzzy systems models are categorized into two types: linguistic and rule-based. An elaborate discussion of these types of models and modeling techniques is beyond the scope of this article.
IV. FORMULATION OF MATHEMATICAL MODELS
A. Conventional Modeling (Direct Modeling)
Experimental observations and measurements are generally accepted to constitute the backbone of physical sciences and engineering because of the physical insight they offer to the scientist for formulating the theory. The concepts that are developed from the observations are used as guides for the design of new experiments, which in turn are used for validation of the theory. Thus, experiments and theory have a hand-in-hand relationship. The information gathered during observation and measurement is usually presented in terms of curves, tables, block diagrams, circuit diagrams, flow diagrams, etc. for convenience of perception in the model building process. These information display techniques have been tremendously aided by the advent of high-performance computers.
Formulation of theory is equivalent to model building. Cast in mathematical terms, the theory stands as a mathematical model of reality. In the conventional sense, the first stage in model building is to propose and gather equations representing all relevant mechanisms of the phenomenon under study. Because of the diversity in the types of information available to the model builder, the equations representing the basic phenomenon may be extensive with complex interrelationships that may be difficult to untangle for easy tractability. To preserve fidelity, all equations that will have significant effect on the system behavior must be included in the model. It is always better to first produce a simple model and then refine it until it closely represents the reality, than to start with a complex model and simplify it for fear of facing mathematical difficulties.
Conventionally, the construction of mathematical models for physical and engineering phenomena begins with the application of physical laws (Newton’s laws, Maxwell’s laws, Kirchhoff’s laws, and balance laws, which include mass balance, energy/heat balance, momentum balance, impulse balance, and entropy balance) to the phenomena being studied. In these laws, a number of relationships between the variables are expressed in terms of ordinary differential equations, partial differential equations, and difference equations. A detailed presentation of these laws is beyond the scope of this article. However, some examples are appropriate.
The first and second laws of thermodynamics in differential form are stated as
$$dU = dQ - dW \tag{1}$$
$$dS = dQ/T, \tag{2}$$
where U is the internal energy, Q is the heat absorbed by the system, W is the work done by the system, S is the entropy, and T is the temperature.
The fundamental law of heat conduction in one dimension is represented by the ordinary differential equation
$$\frac{dQ}{dt} = -kA\,\frac{dT}{dx},$$
where dQ/dt is the time rate of heat transfer across the area A, k is the thermal conductivity of the medium, and dT/dx is the temperature gradient.
Newton, in his famous “Principia,” expressed the second law of motion in terms of momentum, which in symbolic form becomes
$$F = \frac{dp}{dt}, \tag{3}$$
where p is the linear momentum (product of mass and velocity, mv). For constant mass m, this equation can be written in the familiar form as
$$F = \frac{d(mv)}{dt} = m\,\frac{dv}{dt} = ma. \tag{4}$$
Attempts by scientists to describe the physical world led them to the formulation of various types of partial differential equations. The mathematical model for one-dimensional wave phenomena is the hyperbolic equation
$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}. \tag{5}$$
The phenomenon of diffusion of heat in solids is modeled by the parabolic equation derived by Fourier in the form
$$\frac{\partial u}{\partial t} = \sigma\,\frac{\partial^2 u}{\partial x^2}. \tag{6}$$
The Fourier heat equation in three spatial dimensions and time is modeled as
$$\frac{\partial u}{\partial t} = \sigma\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) \tag{7}$$
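As an illustration of how a model such as the one-dimensional diffusion equation (6) is explored numerically, the following is a minimal sketch of an explicit finite-difference scheme. It is not taken from the article: the function name `diffuse`, the unit interval, the zero boundary values, the sine initial profile, and the grid sizes are all assumptions chosen so that the result can be compared with the known exact solution u(x, t) = exp(−σπ²t) sin(πx).

```python
import math

def diffuse(u0, sigma, dx, dt, steps):
    """Advance du/dt = sigma * d2u/dx2 with an explicit (forward-time,
    centered-space) scheme; the two end values are held fixed."""
    r = sigma * dt / dx**2
    if r > 0.5:
        # Standard stability limit of this explicit scheme.
        raise ValueError("unstable: sigma*dt/dx^2 must not exceed 0.5")
    u = list(u0)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    return u

# Hypothetical test problem: unit interval, u = 0 at both ends.
n = 51
dx = 1.0 / (n - 1)
sigma = 1.0
dt = 0.4 * dx**2 / sigma          # keeps r = 0.4, inside the stability limit
steps = 500
x = [i * dx for i in range(n)]
u0 = [math.sin(math.pi * xi) for xi in x]

u = diffuse(u0, sigma, dx, dt, steps)
t_end = steps * dt
exact = [math.exp(-sigma * math.pi**2 * t_end) * math.sin(math.pi * xi) for xi in x]
max_err = max(abs(a - b) for a, b in zip(u, exact))
print("largest deviation from the exact solution:", max_err)
```

The explicit scheme is used here only because it is the shortest to state; implicit schemes remove the restriction on the time step at the cost of solving a linear system per step.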
the mathematical model that describes the system behavior? This latter problem is called an inverse problem and generally is referred to as the system identification. Identification is defined as the determination, on the basis of input and output, of a system within a specified class of systems, to which the system (phenomenon) under study is equivalent. In other words, identification is the process of constructing a mathematical model of a system from prior knowledge and observations. Equivalence is often expressed in terms of an error function E, which is a function of system output y and the model output $y_m$, that is,
$$E = E(y, y_m). \tag{13}$$
Two models $m_1$ and $m_2$ are said to be equivalent if the value of the error function is the same for both models, that is,
$$E(y, y_{m_1}) = E(y, y_{m_2}). \tag{14}$$
FIGURE 5 Relationship of the transfer function G(s) to the input X(s) and the output Y(s). [Block diagram: X(s), G(s), Y(s).]
2. Step-Response Method
Step input is the simplest input that can be applied to a system, like sudden closing or opening of a valve in a hydraulic network. In reality, strict step input is physically impossible, but as long as the rise time of the step input is much shorter than the period of the highest frequency in the system, the input can be considered as a step input. Step-response problems are considered as impulse-response problems, and the determination of transfer function falls into the category of time-domain problems, which can be solved by one of the gradient methods.
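To make the idea of identification through an error function E(y, y_m) concrete, here is a minimal sketch that recovers two parameters from step-response data by steepest descent. Everything specific in it is an assumption made for illustration and does not come from the article: the first-order model G(s) = K/(τs + 1), the sum-of-squares form of E, the synthetic "measured" data, and the names `step_response`, `error`, and `identify`.

```python
import math

def step_response(K, tau, times):
    """Unit-step response of the assumed model G(s) = K / (tau*s + 1)."""
    return [K * (1.0 - math.exp(-t / tau)) for t in times]

def error(y, y_m):
    """Sum-of-squares error function E(y, y_m)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_m))

def identify(times, y, K0=1.0, tau0=1.0, rate=0.005, steps=2000, h=1e-6):
    """Steepest descent on E with forward-difference gradients."""
    K, tau = K0, tau0
    for _ in range(steps):
        e0 = error(y, step_response(K, tau, times))
        dE_dK = (error(y, step_response(K + h, tau, times)) - e0) / h
        dE_dtau = (error(y, step_response(K, tau + h, times)) - e0) / h
        K -= rate * dE_dK
        tau = max(tau - rate * dE_dtau, 1e-3)   # keep the time constant positive
    return K, tau

# Synthetic "measurements" generated from known parameters (K = 2, tau = 0.5).
times = [0.1 * i for i in range(1, 51)]
y_obs = step_response(2.0, 0.5, times)

K_hat, tau_hat = identify(times, y_obs)
final_error = error(y_obs, step_response(K_hat, tau_hat, times))
print("estimated K and tau:", K_hat, tau_hat)
```

With noise-free data the iteration should settle close to the parameters that generated the data; with real measurements the minimum of E is nonzero and the fixed step size `rate` would normally be replaced by a line search or a second-order method.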
fashion after expanding the cost function in Taylor series about an assumed parameter to a desired order. In the computational algorithm, one picks the variation in the parameter to make the steepest descent toward the minimum cost function. The computation is repeated until there is no significant change in the parameter value from iteration to iteration. Widely applied among these methods are the first-order, second-order, and conjugate gradient methods.
6. Quasilinearization Method
As opposed to gradient technique, this is an indirect method in which a sequence of functions is iteratively computed until the functions converge to the solution. In this method, recurrence relations are obtained in the form of linear differential equations, even if the model equation has nonlinearity in its structure; hence the name quasilinearization. This method is also called the generalized Newton–Raphson method.
Consider, for example, the nonlinear differential equation
$$\frac{dx}{dt} = f(x), \qquad x(0) = c, \qquad t > 0 \tag{17}$$
in which the function f is continuous in x and time t and has continuous bounded second partial derivatives with respect to x for all x and t. The function can be expanded around $x^{(0)}(t)$ in Taylor series as
$$f(x) = f\bigl(x^{(0)}\bigr) + \frac{\partial f\bigl(x^{(0)}\bigr)}{\partial x}\,\bigl(x - x^{(0)}\bigr) + \cdots. \tag{18}$$
Neglecting the higher-order terms and combining Eqs. (17) and (18) yields the linear differential equation
$$\frac{dx}{dt} = f\bigl(x^{(0)}\bigr) + \frac{\partial f\bigl(x^{(0)}\bigr)}{\partial x}\,\bigl(x - x^{(0)}\bigr) \tag{19}$$
$$x(0) = c. \tag{20}$$
Proceeding likewise, one obtains
$$\frac{dx^{(n+1)}}{dt} = f\bigl(x^{(n)}\bigr) + \frac{\partial f\bigl(x^{(n)}\bigr)}{\partial x}\,\bigl(x^{(n+1)} - x^{(n)}\bigr) \tag{21}$$
$$x^{(n+1)}(0) = c. \tag{22}$$
The iterative process begins with an initial approximation $x^{(0)}(t)$ and proceeds until the sequence of functions converges.
general problem for which a sequential estimate of system parameters is made in the solution procedure. Invariant imbedding converts many boundary-value problems into initial-value problems, which makes both the analysis and the computational solution easier. This form of imbedding is also useful in engineering problems where sensitivity analysis has to be performed.
V. SOLUTION TECHNIQUES
For a mathematical modeler, the entire world of applied mathematics and new mathematical concepts being developed from time to time are open for use. The advent of high-speed digital computers has aided the solution techniques and opened the doors for modeling more complex systems. Some solution techniques practiced in modeling include:
1. Rigorous analysis
2. Symmetry methods
3. Obtaining a priori bounds on the solutions
4. Method of isoclines
5. Perturbation theory
6. Asymptotic analysis
7. Group theory
8. Inequalities
9. Integration by parts
10. Extremum principles
11. Numerical approximations (including finite difference, finite element, boundary element methods, invariant imbedding, etc.)
12. Variational principles
13. Linearization and quasi-linearization
14. Integral methods
15. Complex-variable methods
16. Graph theory
17. Mathematical programming.
This list is by no means exhaustive. For detailed analyses of these topics, the reader must refer to textbooks on applied mathematics. There are separate textbooks for several of these topics.
VI. MODEL VALIDATION
system it represents constitute model validation. On the surface, it appears that validation is something that should be done at the end of the model construction. But, in fact, validation should be carried out throughout the modeling process. A valid model can be expected by being logically consistent at each step of the modeling process, by reexamining the assumptions and constraints without sacrificing the mathematical rigor, and by leaving no stone of applicable mathematical knowledge unturned. Is it not logical that to obtain valid model behavior one must use valid assumptions and constraints? By doing so one would obtain the most accurate model, but perhaps one difficult to solve.
Suppose, by a judicious choice of a solution technique from the myriad of techniques available, the modeler determines the system behavior. This step invariably (at least in physical sciences and technology) involves a numerical approximation associated with computer programming, which should be carefully carried out and correctly implemented. To generate confidence in the model, the system behavior must be compared with the real system data. It is nearly impossible, except for some simple situations, to obtain global agreement (agreement over the entire range of parameters). Then the modeler must choose the range of validity for various model parameters and establish other criteria, such as comparison of results by different solution techniques, by which to judge the model validity.
The validity of a model should not be judged by mathematical rationality alone; nor should it be judged purely by empirical validation at the cost of mathematical and scientific principles. A combination of rationality and empiricism (logic and pragmatism) should be used in the validation. If necessary, all or some of the steps in the modeling process should be repeated several times until the model is acceptable for use.
VII. CHAOS AND COMPLEXITY
In deterministic physical and mathematical systems, when the model equations are nonlinear, the evolution of the system behavior can become irregular and unpredictable and exhibit sensitive dependence on initial conditions. Such behavior is called chaos. It occurs in vibrating objects, in rotating or heated fluids and in some chemical reactions. Understanding chaotic behavior involves tracking the time evolution of a nonlinear dynamical system or that of natural phenomena modeled by a set of nonlinear differential equations that arise from classical equations of physics. With the exception of some first-order equations, analytical solutions of nonlinear differential equations are either difficult or impossible. The desire to overcome this difficulty has presented a significant motivation for the advancement of numerical methods and innovations of computational algorithms that lead to novel computer architectures. New discoveries in nonlinear dynamics have created new concepts and tools such as fractal dimensions and Lyapunov exponents for detecting and quantifying chaos in physical systems. There are no formal, theoretical criteria to determine under what conditions a dynamical system in general would become chaotic. Significant effort is needed to determine how and when more general and complex physical systems will become chaotic.
Complexity is one of the most perplexing problems of systems theory. Many relatively independent subsystems that are highly interconnected and interactive manifest complex systems. As a result, the collective behavior of complex systems is reflected in reproducing the functions of truly complex, self-organizing, replicating, learning and adaptive systems. Systems consisting of a large number of interacting elements lead to a perception of complex and disorderly behavior. It is generally believed that biological systems involve more interacting elements than physical systems, and therefore the latter are simple with orderly behavior. However, recent developments in irreversible thermodynamics, in the theory of dynamical systems, and in classical mechanics have narrowed the gap between the simple and complex, and between order and disorder; under certain conditions, simple systems are also known to exhibit complex behavior. Complexity in system behavior is characterized by instabilities and bifurcations. In modeling complex behavior of a system, one must first assess the nonlinear character of the underlying dynamics and identify a set of variables that control these instabilities and bifurcations.
VIII. MODELING WITH NEURAL NETWORKS
As processes increase in complexity, they become less amenable to direct mathematical modeling based on physical laws. In the latter half of the 20th century, artificial neural networks have made inroads into several disciplines with a wide range of applications. An artificial neural network is a network of interconnected units called artificial neurons that are connected in different patterns to process information in a parallel distributed fashion much like the human brain. A significant aspect of neural networks is their ability to learn how to process information in supervised and unsupervised modes. They are used for simulation of physical systems that are modeled by massively parallel networks. In recent years, applications have been developed for modeling simple biological structures with known functions, for the modeling of higher functions of the central nervous system, for solving complicated problems in artificial intelligence and cognitive sciences,
for pattern recognition, and for solving combinatorial optimization problems. Further applications have been extended to medical diagnosis, financial services including stock price prediction, intelligent control of engineering plants, and manufacturing. Neural networks are also well adapted to ill-posed problems, those with damaged or incomplete data.
IX. MATHEMATICAL MODELING AND COMPUTERS
In all endeavors involving mathematical modeling, one cannot fail to perceive that computers act as an interface between phenomena and the various stages of mathematical modeling, including validation and use. Spectacular advances have been achieved in modeling complex and large systems using computers by removing the need for analytical solutions to differential equations representing the systems. The advances in high-speed computers and efficient numerical algorithms have generated acceptable numerical solutions to systems of equations hitherto intractable and thus made modeling of complicated, interconnected, and interacting systems possible. As a matter of fact, complicated mathematical models expressed in terms of nonlinear partial differential equations and new applications have been the driving force behind the development of large computing machines with huge amounts of shared memory and a processing rate of up to three trillion floating-point operations per second. The need for more efficient computers to simulate complicated models has driven computer scientists to think innovatively in the direction of new architectures and procedures. The greatest potential for achieving higher speed lies in adding parallel processors. In recent years, neurocomputers based on artificial neural networks have been built to process information efficiently and cost-effectively, but complementary to algorithmic computing. With more developments in computer technology on the horizon, mathematical modelers can expect to achieve more success in modeling and simulation of very complicated systems, and also in revising and reworking the old models, in which drastic simplifications had to be made in the past. Despite the advances in computers and computational technology, the role of analytical methods in deciphering mathematical models should not be overlooked because they offer valuable insight into a wide variety of problems.
been surveyed. In recent years, mathematical modeling has pervaded all branches of knowledge, bringing forth greater understanding of processes under investigation. In engineering and technology it provides the analytical basis for design and control in which predictions can be confidently made without spending valuable resources of money and effort.
Successful applications of mathematical modeling techniques in engineering sciences have led the way to extend the techniques to more exotic areas of inquiry, like nanotechnology, nuclear-reactor engineering, material science, environment, weather prediction, biological processes, space sciences, cosmology, and also social sciences. Although the general philosophy of modeling in these new areas remains the same as discussed in this article, the simulation procedures and validation criteria are different and dependent on the types of models and the disciplines they belong to.
Mathematical modeling is a vast, multidisciplinary field that calls for the interest and dedication of engineers, scientists and mathematicians to solve the problems facing humankind. A significant development in the mathematical modeling activity is the availability of very-high-speed computers, which can solve a variety of complex models. In spite of all the advances in empirical knowledge, solution techniques, and computer assistance, it must be noted that human intelligence, experience, and intuition still play a significant role in mathematical modeling.
SEE ALSO THE FOLLOWING ARTICLES
CHAOS • CONTROLS, ADAPTIVE SYSTEMS • DIFFERENTIAL EQUATIONS • DISCRETE SYSTEMS MODELING • GROUP THEORY • LINEAR SYSTEMS OF EQUATIONS • NUMERICAL ANALYSIS • PERTURBATION THEORY • PROBABILITY • STOCHASTIC PROCESSES • SYSTEM THEORY
BIBLIOGRAPHY
Andrews, J. G., and McLone, R. R. (1976). “Mathematical Modelling,” Butterworths, London.
Aris, R. (1995). “Mathematical Modelling Techniques,” Dover, Mineola, NY.
Atherton, D., and Borne, P. (eds.) “Concise Encyclopedia of Modelling
and Simulation,” Elsevier, New York.
Avula, X. J. R. (ed.) (1977). “Proceedings of the First International Con-
ference on Mathematical Modelling,” 5 vols., University of Missouri-
X. CONCLUDING REMARKS
Rolla, Rolla, MO.
Avula, X. J. R., Bellman, R. E., Luke, Y., and Rigler, A. K. (eds.) (1979).
In this article, mathematical modeling concepts in relation “Proceedings of the Second International Conference on Mathemati-
to physical sciences, engineering, and technology have cal Modelling,” University of Missouri-Rolla, MO.
P1: GSS Final Pages
Encyclopedia of Physical Science and Technology EN009H-411 July 6, 2001 19:51
Avula, X. J. R., Kalman, R. E., Liapis, A. I., and Rodin, E. Y. (eds.) (1983). Fulford, G., Forrester, P., and Jones, A. (1999). “Modelling with Differ-
“Mathematical Modelling in Science and Technology,” Proceedings ential and Difference Equations,” Cambridge University Press, New
of the Fourth International Conference, Zurich, Switzerland, 1983, York.
Pergamon, New York. Gibbons, M. M. (1995). “A Concrete Approach to Mathematical Mod-
Avula, X. J. R., Leitmann, G., Mote, Jr., C. D., and Rodin, E. Y. elling,” Wiley, New York.
(eds.) (1987). “Mathematical Modelling in Science and Technol- Haber, R., and Keviczky, L. (2000). “Nonlinear System Identification:
ogy,” Proceedings of the Fifth International Conference, University Input-Output Modeling Approach,” Kluwer Academic, Norwell, MA.
of California, Berkeley (July 1985), Mathematical Modelling, Vol. 8, Jacoby, S. L. S., and Kowalik, J. S. (1980). “Mathematical Modeling
Pergamon Press, New York. with Computers,” Prentice-Hall, Englewood Cliffs, NJ.
Avula, X. J. R. (ed.) (1993). “Mathematical Modelling in Science and Jerome, J. W. (ed.) (1998). “Modelling and Computation for Applications
Technology,” Proceedings of the Eighth International Conference, in Mathematics, Science, and Engineering,” Oxford University Press,
University of Maryland, College Park (April 1991). Principia Scientia, New York.
St. Louis, MO. Kagawa, Y. (1994). “Modelling and Simulation and Identification,” Acta
Avula, X. J. R., and Mote Jr., C. D. (eds.) (1994). “Mathematical Mod- Press, Anaheim, CA.
elling in Science and Technology,” Proceedings of the Ninth Inter- King, J. R. (2000). “Emerging areas of mathematical modelling,” Phil.
national Conference, University of California, Berkeley (July 1993), Trans. Roy. Soc. Lond. A 358, pp. 3–19.
Principia Scientia, St. Louis, MO. Lasenby, J., Lasenby, A. N., and Doran, C. J. L. “A unified mathematical
Avula, X. J. R., and Nerode, A. (1996). “Mathematical Modelling in language for physics and engineering in the 21st Century,” Phil. Trans.
Science and Technology,” Proceedings of the Tenth International Roy. Soc. Lond. A 358, pp. 21–39.
Conference, Boston, MA (July 1995). PrincipiaI Scientia, St. Louis, May, R. M. (1976). “Simple mathematical models with very complicated
MO. dynamics,” Nature 261, 459–467.
Avula, X. J. R., and Nerode, A. (1998). “Mathematical Modelling in Meskens, N., and Roubens, M. (eds.) (1999). Kluwer Academic, Nor-
Science and Technology,” Proceedings of the Eleventh International well, MA.
Conference, Georgetown University, Washington, DC (July 1997), Nicholson, H. (ed.) (1980). “Modelling of Dynamical Systems,”
Principia Scientia, St. Louis, MO. Vols. 1 and 2, Peter Peregrinus Ltd. (for The Institution of Electri-
Avula, X. J. R. (2000). “Mathematical Modelling in Science and Technol- cal Engineers), Stevenage, U.K.
ogy,” Proceedings of the Twelfth International Conference, Chicago, Nicolis, G., and Prigogine, I. (1989). “Exploring Complexity,” W. H.
IL (July 1999). Principia Scientia, St. Louis, MO. Freeman & Co, New York.
Bandemer, H. (1993). “Modelling Uncertain Data,” Wiley, New York. Rodin, E. Y., and Avula, X. J. R. (eds.) (1989). “Mathematical Modelling
Bellman, R., and Roth, R. (1986). “Methods in Approximation: Tech- in Science and Technology,” Proceedings of the Sixth International
niques for Mathematical Modelling,” Kluwer Academic, Norwell, Conference, St. Louis, MO (August 1987), Elsevier Science, New
MA. York.
Bellman, R., and Wing, G. M. (1975). “An Introduction to Invariant Rodin, E. Y., and Avula, X. J. R. (eds.) (1990). “Mathematical Modelling
Imbedding,” Wiley, New York. in Science and Technology,” Proceedings of the Seventh International
Bellman, R., and Roth, R. (1983). “Quasilinearization and the Identifi- Conference, Chicago, II (August 1989) Elsevier Science, New York.
cation Problem,” World Scientific, Singapore. Saaty, T. L., and Alexander, J. M. (1981). “Thinking with Models: Math-
Caldwell, J., and Ram, Y. M. (1999). “Mathematical Modelling Concepts ematical Models in the Physical, Biological, and Social Sciences,”
and Case Studies,” Kluwer Academic, Norwell, MA. Pergamon, New York.
Casti, J. L. (1989). “Alternate Realities,” Wiley, New York. Sinha, N. K., and Kuszta, B. (1983). “Modeling and Identification of
Cowan, G. A., Pines, D., and Meltzer, D. (eds.) (1994). “Complexity: Dynamic Systems,” Van Nostrand Reinhold, New York.
Metaphors, Models and Reality,” Addison Wesley, Reading, MA. Stark, J. (2000). “Observing complexity, seeing simplicity,” Phil. Trans.
Cross, M., and Moscardini, A. O. (1985). “Learning the Art of Mathe- Roy. Soc. Lond. A 358, 41–61.
matical Modelling,” Wiley, New York. Whorf, W. (1956). “Language and Thought,” MIT Press, Cambridge,
Dym, C. L., and Ivey, E. S. (1980). “Principles of Mathematical Model- MA.
ing,” Academic, New York. Yager, R. R., and Filev, D. P. (1994). “Essentials of Fuzzy Modeling
Eykhoff, P. (1974). “System Identification: Parameter and State Estima- and Control,” Wiley, New York.
tion,” Wiley, London. Zadeh, L. A. (1965). “Fuzzy sets,” Information and Control 8, 338–353.
P1: ZCK Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN009B-413 July 18, 2001 0:38
the main deficiencies and the limit theorems described in Section III together with the results on differentiation of Section V rounded off the subject. The needs of the applied mathematicians were provided for also by the results on the $L^p$ spaces of functions described in Section IV and the results using product spaces of Section VI. The Lebesgue integral turns out to be the appropriate definition to deal with orthogonal expansions and with the Fourier and Laplace transforms. The needs of the probabilists were supplied by the general theory of measure in Section VII, the decomposition theorems of Section VIII, and the results on products of measure spaces in Section VI. The differentiation of measures considered in Section VIII is of central importance in functional analysis, as is seen in Section XI.

II. LEBESGUE MEASURE

For any set $A$ contained in the real line we define the Lebesgue outer measure (or outer measure) to be the quantity $m^*(A)$ given by $m^*(A) = \inf \sum_n l(I_n)$ where we are taking the infimum or greatest lower bound over all collections $\{I_n\}$ of intervals such that $A \subseteq \bigcup I_n$ and where $l(I)$ denotes the length of the interval $I$. From this definition we have immediately that $m^*(A)$ is nonnegative; $m^*(A) = 0$ if $A$ is a one point set or empty; $m^*(A) \le m^*(B)$ whenever $A \subseteq B$. It is fairly easy to show that the outer measure of any interval equals the length of the interval, so outer measure has for some sets at least the properties we would desire. Also, for any sequence of sets $\{E_i\}$ it is easy to see that

$$m^*\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) \le \sum_{i=1}^{\infty} m^*(E_i)$$

(i.e., $m^*$ is subadditive). We, however, cannot in general assume that equality will occur here even if the sets are pairwise disjoint. The outer measure has another desirable property: the outer measure of a set is unchanged if the set is shifted to left or right (i.e., $m^*$ is translation invariant). It also has a regularity property: for any set $A$ and any $\varepsilon > 0$ there is an open set $O$ containing $A$ and such that $m^*(O) \le m^*(A) + \varepsilon$.

In order to achieve the desired additivity we restrict $m^*$ to the class $M$ of Lebesgue measurable sets. We define a set $E$ to be Lebesgue measurable (or just measurable) if for each set $A$

$$m^*(A) = m^*(A \cap E) + m^*(A \cap E^{c})$$

where $E^{c}$ denotes the complement of $E$. So a measurable set divides an arbitrary set in an additive way as regards outer measure. It follows from this definition that intervals are measurable. Also, all sets of zero outer measure are measurable (and so therefore are all subsets of such sets). The class of measurable sets has the important property of being a σ-algebra: that is, the class contains the whole space ($\mathbb{R}$ in this case) and is closed under the formation of complements and countable unions. Only the last part here presents any difficulty. Denote $m^*$ restricted to the σ-algebra $M$ by $m$, called Lebesgue measure. Then we easily have that for any sequence $\{E_i\}$ of disjoint measurable sets

$$m\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} m(E_i)$$

(i.e., $m$ is additive on disjoint measurable sets). For some purposes it is convenient to restrict the measure $m$ still further to the Borel sets which may be defined as the smallest σ-algebra containing the intervals. These sets are more natural in some contexts, but difficulties can arise from the fact that whereas for the class $M$ every subset of a set of zero measure is again measurable, this is no longer true of the smaller class of Borel sets. So we confine our attention to the Lebesgue measurable sets.

It is easily seen that every nonempty open set has positive measure. From the countable additivity of $m$ it is obvious that every countable set has zero measure. It is less obvious that there exist uncountable sets of zero measure. A standard example is the Cantor ternary set. It is formed by removing from the interval $[0, 1]$ the middle third $(\tfrac13, \tfrac23)$, then the middle thirds of the two remaining intervals, namely, $(\tfrac19, \tfrac29)$ and $(\tfrac79, \tfrac89)$, then the middle thirds of the remaining intervals, and so on. The total set removed has measure adding to 1, so the residual set (the Cantor set) has measure 0. But the Cantor set contains all the numbers between 0 and 1 with expansions to base 3 consisting of 0's and 2's. So it has the same cardinality as the set of all binary expansions, and so is uncountable.

It is also true that there exist sets of the real line which are not measurable. But these sets cannot be constructed and indeed can only be shown to exist using the axiom of choice or an equivalent tool of mathematical logic. The existence of these nonmeasurable sets is not crucial for the theory, but if they did not exist the theory would lose some of its content, for all sets would be measurable.

We consider now a "continuity" property of measure. Suppose $\{E_i\}$ is a sequence of measurable sets, $E_0 \subseteq E_1 \subseteq E_2 \subseteq \cdots$. Then $m(\bigcup_{i=1}^{\infty} E_i) = \lim m(E_i)$. Also, if $F_0 \supseteq F_1 \supseteq F_2 \supseteq \cdots$, and $m(F_0) < \infty$, then $m(\bigcap_{i=1}^{\infty} F_i) = \lim m(F_i)$ where again the sets $\{F_i\}$ are supposed measurable. If, as is conventional, we write here $\lim E_i = \bigcup_{i=1}^{\infty} E_i$ and $\lim F_i = \bigcap_{i=1}^{\infty} F_i$, then this result reads $m(\lim E_i) = \lim m(E_i)$ for any increasing sequence of measurable sets and also for any decreasing sequence of sets of finite measure. Lebesgue measure also has a regularity
property stronger than that of outer measure. If $E$ is any measurable set and $\varepsilon$ any positive number then there exists an open set $O \supseteq E$ with $m(O \setminus E) \le \varepsilon$ and a closed set $F \subseteq E$ with $m(E \setminus F) \le \varepsilon$.

Although the sets of zero measure can be uncountable, they turn out to be negligible for the purposes of integration and we say that a property holds "almost everywhere" (a.e.) if it holds except possibly on a set of zero measure. Note that sets of infinite measure frequently occur (e.g., $[1, \infty)$ is of infinite measure). So identities or inequalities involving the measures of sets must be written carefully with this in mind. In particular, indeterminate expressions of the form $\infty - \infty$ must be avoided.

The importance of Lebesgue measurable sets is that they allow us to define measurable functions. Recall that if a function $f$ is continuous then the set of points $x$ with $f(x) > \alpha$ is open, for each $\alpha$ (i.e., $f^{-1}(\alpha, \infty)$ is open). We define $f$ to be Lebesgue measurable (or just measurable) if $f^{-1}(\alpha, \infty)$ is a measurable set for each $\alpha$. Since open sets are measurable it follows that continuous functions are measurable. It follows easily from the definition that, if $f$ is measurable, $\{x: f(x) = \alpha\}$ is measurable for each $\alpha$. Also, the constant functions are measurable (since they are continuous). Of special importance are the characteristic functions. We denote by $\chi_A$ the characteristic function (or indicator function) of the set $A$: $\chi_A = 1$ on $A$, $\chi_A = 0$ on the complement of $A$. Then from the definition of measurable functions we have that a characteristic function $\chi_A$ is measurable if, and only if, the set $A$ is measurable. It is also easy to see that a constant multiple $cf$ of a measurable function $f$ is measurable. With a little more care we can see that the sum of measurable functions is measurable. So any finite linear combination of measurable functions is measurable and the product of measurable functions is measurable.

As indicated earlier it is important in applications to deal with limiting operations. So we note that the sup and inf of any finite or countable family of measurable functions is again measurable; this uses the fact that $M$ is a σ-algebra allowing countable set operations. It follows that for any sequence $\{f_n\}$ of measurable functions, $\limsup f_n$ and $\liminf f_n$ are again measurable. Here $\limsup f_n$ is the inf ($N \ge 1$) of the $\sup\{f_n: n \ge N\}$, and $\liminf f_n = -\limsup(-f_n)$. If $\lim f_n$ exists it is the common value of $\limsup f_n$ and $\liminf f_n$, so we have that the limit, when it exists, of a sequence of measurable functions is again measurable. The corresponding result is not true of continuous functions. As important special cases

For continuous functions the sup and inf are important. For a measurable function we are more interested in ess sup $f$ and ess inf $f$ (the essential supremum and essential infimum of $f$) given by ess sup $f = \inf\{\alpha: f \le \alpha \text{ a.e.}\}$, ess inf $f = \sup\{\alpha: f \ge \alpha \text{ a.e.}\}$. So if we disregard sets of measure zero we replace the sup $f$ by the possibly smaller ess sup $f$ and the inf $f$ by the possibly larger ess inf $f$. Functions for which both of these numbers are finite are called essentially bounded. A simple property of the essential supremum to which we refer later is that for any two measurable functions

$$\operatorname{ess\,sup}(f + g) \le \operatorname{ess\,sup} f + \operatorname{ess\,sup} g$$

and

$$\operatorname{ess\,inf}(f + g) \ge \operatorname{ess\,inf} f + \operatorname{ess\,inf} g$$

So finite linear combinations of essentially bounded functions are again essentially bounded. As an example suppose $f(x) = 1$ for $x$ rational, $f(x) = 0$ otherwise. Then $f$ is measurable (it is the characteristic function of a countable set which of course has measure zero), and sup $f = 1$, but ess sup $f = 0$.

All the functions considered here have been real valued. For some purposes, however, it is convenient to extend the definitions to complex-valued functions. We describe the function $f$ as measurable if $\operatorname{Re}(f)$ and $\operatorname{Im}(f)$, the real and imaginary parts of $f$, are measurable in the previous sense. To avoid difficulties we shall assume that any complex-valued functions considered take only finite values. It is easy to check that $|f|$ is measurable whenever $f$ is a complex-valued measurable function.

III. THE LEBESGUE INTEGRAL

We now show how the Lebesgue integral is defined, how its value may be obtained in practice, what the principal theorems are which make its use attractive, and how these theorems may be used in examples. We revert for the moment to real-valued functions.

As with most definitions of the integral we specify the integral for a certain basic class of functions and show how the definition can be extended to a more general class. Not all functions or even all measurable functions have integrals; the restrictions arise in a natural way from the definition. So we first consider nonnegative simple functions. These are measurable functions taking only a finite number of values. Equivalently we write the real line as the union of a finite number of measurable sets $A_1, \dots, A_n$ and consider the function

$$\varphi(x) = \sum_{i=1}^{n} a_i \chi_{A_i}(x)$$
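A nonnegative simple function of the kind just introduced can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sets $A_i$ are taken to be half-open intervals (which are measurable by Section II), the helper name `make_simple_function` and the sample coefficients are hypothetical, and points outside every listed interval are treated as belonging to one further set on which the value is 0.

```python
# Illustrative sketch only: a simple function phi = sum_i a_i * chi_{A_i},
# with each A_i assumed to be a half-open interval [left, right).
# The name make_simple_function and the sample data are hypothetical.

def make_simple_function(pieces):
    """pieces: list of (a_i, (left, right)) with disjoint intervals."""
    def phi(x):
        # chi_{A_i}(x) is 1 when left <= x < right and 0 otherwise,
        # so only the coefficient of the interval containing x contributes.
        return sum(a for a, (left, right) in pieces if left <= x < right)
    return phi

# phi = 2 on [0, 1), 5 on [1, 3), and 0 elsewhere.
phi = make_simple_function([(2.0, (0.0, 1.0)), (5.0, (1.0, 3.0))])

# Sampling many points shows that phi takes only finitely many values.
values = sorted({phi(k / 10) for k in range(-20, 50)})
```

The point of the sketch is that `values` stays a short list however finely the line is sampled, which is what "taking only a finite number of values" means in practice.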
least one of these integrals is finite; if both are finite we say $f$ is Lebesgue integrable, or integrable in short. So the measurable function $f$ is integrable provided the nonnegative measurable function $|f|$ has a finite integral. Then the integrals of integrable functions inherit the properties of the simpler integral, namely,

$$\int (af + bg)\,dx = a\int f\,dx + b\int g\,dx$$

for any integrable functions $f$ and $g$; the integral is monotone: $f \le g$ implies $\int f\,dx \le \int g\,dx$; the integral is additive on sets: $\int_A f\,dx + \int_B f\,dx = \int_{A\cup B} f\,dx$ for disjoint measurable sets $A$ and $B$. From the definition we have easily that $|\int f\,dx| \le \int |f|\,dx$ for any integrable function $f$. Also a mean value theorem applies: if $f$ is measurable and $g$ is integrable and if $\alpha$, $\beta$ are real with $\alpha \le f \le \beta$ a.e. then $\int f|g|\,dx = \gamma \int |g|\,dx$ for some $\gamma$, $\alpha \le \gamma \le \beta$. In particular, if $E$ is measurable with $m(E) < \infty$ and $\alpha \le f \le \beta$ on $E$ then $\alpha m(E) \le \int_E f\,dx \le \beta m(E)$.

We come now to the main theorem of this section: Lebesgue's dominated convergence theorem. It states that if $\{f_n\}$ is a sequence of measurable functions such that $|f_n| \le g$ where $g$ is integrable and if $\lim f_n = f$ a.e. then $f$ is integrable and

$$\lim \int f_n\,dx = \int f\,dx$$

This follows immediately on applying Fatou's lemma to the nonnegative sequences $\{g + f_n\}$ and $\{g - f_n\}$ in turn. As a corollary we have that $\int |f_n - f|\,dx \to 0$ (i.e., $f_n$ tends to $f$ "in the mean"). It is quite easy to extend this result to a family of measurable functions indexed by a parameter. So if $\{f_\alpha\}$ is such a family and $f_\alpha \to f$ at each point as $\alpha \to \alpha_0$, and $|f_\alpha| \le g$, an integrable function, then $f$ is integrable and $\lim_{\alpha\to\alpha_0} \int f_\alpha\,dx = \int f\,dx$. This version is useful in some applications; in particular it allows us to consider an integral $\int_{-\infty}^{\infty} f\,dx$ as limit of the integrals $\int_a^b f\,dx$ as $a \to -\infty$, $b \to \infty$, $|f|$ being the "dominating integrable function."

Another useful version of the dominated convergence theorem is as follows. Let $\{f_n\}$ be a sequence of integrable functions such that

$$\sum_{n=1}^{\infty} \int |f_n|\,dx < \infty$$

Then the series $\sum f_n(x)$ converges a.e., its sum $f(x)$ is integrable and $\int f\,dx = \sum_{n=1}^{\infty} \int f_n\,dx$. This follows on using the dominating function $\sum_{n=1}^{\infty} |f_n|$.

In order to apply these limiting theorems to specific examples we need to be able to obtain the integrals of elementary functions. It is easily seen, from the mean value theorem referred to earlier, that if $f$ is a continuous function on a finite interval $[a, b]$ (so that $f$ is certainly measurable) then $f$ is integrable and the function $F(x) = \int_a^x f(t)\,dt$ $(a < x < b)$ is differentiable with $F' = f$. So for continuous functions, integrals can be obtained using indefinite integrals, in the usual elementary manner. In particular the usual devices of integration by parts or by substitution, which follow from the corresponding rules for derivatives, can be applied in the integration of continuous functions.

All the theory of this section has dealt with real-valued functions. It may be extended to complex-valued functions just as in the last section. Define the complex-valued measurable function $f$ to be integrable provided its real and imaginary parts $\operatorname{Re} f$ and $\operatorname{Im} f$ are integrable. The elementary theory will still be true as will the modulus inequality $|\int f\,dx| \le \int |f|\,dx$ though the proof now needs a little more care. Lebesgue's monotone convergence theorem and Fatou's lemma have no direct application but Lebesgue's dominated convergence theorem and its counterpart for series apply unchanged to a complex-valued sequence $\{f_n\}$. Indeed, to prove this one need consider only the sequences $\{\operatorname{Re} f_n\}$ and $\{\operatorname{Im} f_n\}$ separately.

We now have several theorems allowing us to take limits "under the integral sign" or to interchange summation and integration. In the case of nonnegative functions this interchange is always possible; in the general case a finiteness condition needs to be imposed. We now give a few examples showing how the theorems may be applied.

EXAMPLE 1. Show that

$$\int_0^1 \frac{x \log x}{1 - x}\,dx = -\sum_{n=1}^{\infty} \frac{1}{(n+1)^2}$$

The integrand on the left-hand side may be considered on $(0, 1)$, as one point does not affect the integral, and there it equals $\sum_{n=0}^{\infty} x^{n+1} \log x$. Since $\sum_{n=0}^{\infty} x^{n+1} \log(1/x)$ is a series of nonnegative functions we may integrate it term by term to get $\sum_{n=1}^{\infty} 1/(n+1)^2$ as required.

EXAMPLE 2. Show that

$$\lim \int_0^{\infty} \frac{dx}{(1 + x/n)^n\, x^{1/n}} = 1$$

We may suppose $x > 0$ in the integral and write $f_n(x)$ for the integrand. Then $\lim f_n(x) = e^{-x}$ and $\int_0^{\infty} e^{-x}\,dx = 1$. So we need to interchange the integral and the limit. We may construct a dominating function $g(x)$ as follows. For $0 < x < 1$,

$$(1 + x/n)^n x^{1/n} > x^{1/2} \quad (n > 1)$$

For $1 \le x$,
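The identity of Example 1 can be checked numerically. The following is a rough sketch rather than part of the article's argument: the midpoint rule, the step counts, and the helper names are all assumptions chosen for illustration, and the closed form $1 - \pi^2/6$ comes from the standard value $\sum_{n\ge 1} 1/n^2 = \pi^2/6$, which the article itself does not state.

```python
import math

def integrand(x):
    # x log x / (1 - x), evaluated only at interior points of (0, 1)
    return x * math.log(x) / (1.0 - x)

def midpoint_rule(f, a, b, steps):
    # Composite midpoint rule; it never evaluates f at the endpoints,
    # so the undefined values at x = 0 and x = 1 are avoided.
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Left-hand side of Example 1, approximated numerically.
integral = midpoint_rule(integrand, 0.0, 1.0, 20000)

# Right-hand side: a partial sum of -sum_{n>=1} 1/(n+1)^2.
series = -sum(1.0 / (n + 1) ** 2 for n in range(1, 200001))

# Assumed closed form of the series, using sum 1/n^2 = pi^2/6.
closed_form = 1.0 - math.pi ** 2 / 6.0
```

With these settings `integral`, `series`, and `closed_form` should agree to several decimal places, which is consistent with the term-by-term integration carried out in the example.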
The special case of Minkowski's inequality for $p = \infty$ has been referred to already.

Our third inequality is Jensen's inequality. This states that if $f$ is a measurable function defined on the interval $[0, 1]$ and with values in the finite range $[a, b]$ and if $\varphi$ is a convex function on $[a, b]$ then

$$\varphi\Bigl(\int_0^1 f\,dx\Bigr) \le \int_0^1 (\varphi \circ f)\,dx$$

The proof requires a careful examination of convex functions. In the case where $\varphi$ is strictly convex (i.e., the segment referred to in the definition is strictly above the graph), we have that equality occurs in Jensen's inequality if, and only if, $f$ is constant a.e., having the value $\int_0^1 f\,dx$. An important special case is obtained by assuming $h$ to be a nonnegative measurable function such that $\log h$ is integrable over $[0, 1]$ and taking $f$ to be $\log h$ and $\varphi(x)$ to be $e^x$: then we have

$$\exp\Bigl(\int_0^1 \log h\,dx\Bigr) \le \int_0^1 h\,dx$$

By induction proofs we can show that Hölder's and Minkowski's inequalities extend to $n$ functions. For example, for the case $n = 3$, Hölder's inequality reads: if $1 < p < \infty$, $1 < q < \infty$, $1 < r < \infty$, $1/p + 1/q + 1/r = 1$, $f \in L^p$, $g \in L^q$, and $h \in L^r$ then $fgh \in L^1$ and $\|fgh\|_1 \le \|f\|_p \|g\|_q \|h\|_r$.

Hölder's inequality may be used to obtain inequalities for any integral of the product of functions. For example, to obtain an upper bound for $\int_0^{\pi/2} x^{-1/4} \cos x\,dx$ take $f(x) = x^{-1/4}$, $g(x) = \cos x$, $p = q = 2$ to get the bound $(\pi/2)^{3/4}$. As a second example,

$$\int_a^b |f|\,dx \le \Bigl(\int_a^b |f|^p\,dx\Bigr)^{1/p} (b - a)^{1/q}$$

for conjugate $p$ and $q$ on taking $g(x) \equiv 1$. This shows that $L^p(a, b) \subseteq L^1(a, b)$ for $p > 1$ and in the special case $a = 0$, $b = 1$ is just Jensen's inequality with $\varphi(x) = x^p$.

As for any normed space the norm on $L^p$ may be used to define a distance function with the "distance" between the functions $f$ and $g$ in $L^p$ defined as $d(f, g) = \|f - g\|_p$. Then Minkowski's inequality states that the triangle inequality [...] Section III, in connection with Lebesgue's dominated convergence theorem. The most important property of the $L^p$ spaces is that of completeness: every sequence which is a Cauchy sequence, in terms of the distance function $d$, converges. In the usual $L^p$ notation this may be restated as follows: if $1 \le p < \infty$ and $\|f_n - f_m\|_p \to 0$ as $n, m \to \infty$ then there exists a function $f \in L^p(\mathbb{R})$ such that $\|f_n - f\|_p \to 0$ as $n \to \infty$. We have in addition the existence of a subsequence $\{f_{n_i}\}$ such that pointwise convergence holds: $f_{n_i} \to f$ a.e. This result is proved by Fatou's lemma and Minkowski's inequality. The corresponding result in $L^\infty(\mathbb{R})$ is also true, with a more direct proof: if each $f_n \in L^\infty(\mathbb{R})$ and $\|f_n - f_m\|_\infty \to 0$ as $n, m \to \infty$ then there exists a function $f \in L^\infty$ such that $\lim f_n = f$ a.e. and $\|f - f_n\|_\infty \to 0$ as $n \to \infty$. These results need to be used with care: even if the functions $f_n$ are specified functions, the limit $f$ in $L^p$ is defined only as an element of $L^p$ and its value at any specific point cannot be obtained. The completeness of the space $L^2$ is of special importance in physical applications.

V. DIFFERENTIATION AND INTEGRATION

Differentiation and integration are closely connected; since we have extended the elementary notion of integral we must deal carefully with differentiation so that this relation continues to hold. The first point to note is that continuous functions are not very relevant here; indeed continuous functions which are nowhere differentiable are easily constructed. For example, on the interval $(0, 1)$ let $f_n(x)$ denote the distance from $x$ to the nearest number of the form $m/10^n$ where $m$ and $n$ are nonnegative integers. Then $f_n$ has a "sawtooth" graph with $10^n$ "teeth" and $\max f_n = \tfrac12 \cdot 10^{-n}$. So $f_n$ is certainly continuous, and $\sum f_n(x)$ is uniformly convergent with sum $f(x)$, say. Then $f$ is continuous but for each $x \in (0, 1)$ by considering its decimal expansion we can show that the graph of $y = f(x)$ has no tangent at this point.

We now consider an important class of functions which although not necessarily continuous are well behaved as regards differentiability, namely, the functions of bounded variation. We suppose $f$ is defined and finite-valued on the finite interval $[a, b]$ and take a partition $a = x_0 < x_1 < \cdots < x_k = b$ of $[a, b]$. Then we form the sums

$$p = \sum_{i=1}^{k} \bigl(f(x_i) - f(x_{i-1})\bigr)^{+}, \qquad n = \sum_{i=1}^{k} \bigl(f(x_i) - f(x_{i-1})\bigr)^{-}, \qquad t = p + n = \sum_{i=1}^{k} |f(x_i) - f(x_{i-1})|$$

where we are using the notation $a^{+} = \max(a, 0)$ and $a^{-} = \max(-a, 0)$. So $t, p, n \ge 0$ and $f(b) - f(a) = p - n$.
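The sums $p$, $n$, and $t$ for a given partition are easy to compute directly. The sketch below is illustrative only: the function name `variation_sums`, the choice $f = \sin$ on $[0, 2\pi]$, and the eight-piece uniform partition are assumptions made for the example, not taken from the article.

```python
import math

def variation_sums(f, partition):
    """Return (p, n, t) for f over increasing partition points x_0 < ... < x_k."""
    increments = [f(b) - f(a) for a, b in zip(partition, partition[1:])]
    p = sum(max(d, 0.0) for d in increments)   # positive parts, a+ = max(a, 0)
    n = sum(max(-d, 0.0) for d in increments)  # negative parts, a- = max(-a, 0)
    t = sum(abs(d) for d in increments)        # absolute increments
    return p, n, t

# Assumed example: f = sin on [0, 2*pi], partitioned into 8 equal pieces.
k = 8
partition = [2 * math.pi * i / k for i in range(k + 1)]
p, n, t = variation_sums(math.sin, partition)

# f(b) - f(a) for comparison with p - n.
endpoint_difference = math.sin(partition[-1]) - math.sin(partition[0])
```

Because this partition contains the turning points of the sine function, the computed values come out close to $p = 2$, $n = 2$, $t = 4$, and they satisfy $t = p + n$ and $f(b) - f(a) = p - n$ up to rounding error.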
Taking upper bounds over all partitions of $[a, b]$, and keeping to the same function $f$ we let $P = \sup p$, $N = \sup n$, $T = \sup t$ and call these quantities the positive, negative, and total variations of $f$ over $[a, b]$. If $T$ is finite, $f$ is said to be of bounded variation over $[a, b]$ or to belong to the class $BV[a, b]$. Taking suprema over partitions the relations for $t$, $p$, $n$ give $T = P + N$ and $f(b) - f(a) = P - N$. Also, each of these variations is additive over intervals, so if $a < c < b$ then $T[a, b] = T[a, c] + T[c, b]$ and similarly for $P$, $N$. This follows immediately from the corresponding identities for $t$, $p$, $n$, and leads to the important result that a function $f$ is of bounded variation over $[a, b]$ if, and only if, it may be written as the difference of two finite-valued monotone increasing functions $g$ and $h$, say. These are obtained by defining $g(x) = P[a, x] + f(a)$ and $h(x) = N[a, x]$. The converse follows from the fact that any finite-valued monotone function is of bounded variation, and so therefore is the difference of two such functions. Indeed, the class of functions $BV[a, b]$ forms a vector space; linear combinations of functions of bounded variation are again functions of bounded variation.

Functions of bounded variation share the "good" properties of monotone functions. Since any finite-valued monotone increasing function is continuous except possibly on a countable set the same is true for functions $f \in BV[a, b]$, so in particular these functions are measurable. An example of a monotone-increasing function with a countable number of discontinuities is provided by letting $\{r_i\}$ be an enumeration of the rational numbers in $[0, 1]$ and defining $f(x) = \sum_{r_n \le x} 2^{-n}$. Then $f$ is discontinuous at each rational in the interval. For an example of a function $f$ not of bounded variation on $[0, 1]$ define $f$ arbitrarily at $x = 0$ and let $f(x) = \sin(1/x)$ otherwise; another example is given by $f(x) = x \sin(1/x)$, $x \ne 0$, $f(0) = 0$. This example shows directly that $f$ may be continuous but not of bounded variation.

Lebesgue proved that if $f \in BV[a, b]$, then $f$ is differentiable a.e. and its derivative is finite a.e. This is a significantly more difficult result to prove than the earlier results of this section. The two examples of the previous paragraph show that the converse of this theorem is not true.

We consider now indefinite integrals and write for any integrable function $f$, $F(x) = \int_a^x f\,dt$, so that $F$ is the indefinite integral of $f$ over the interval $[a, b]$, say, on which $f$ is integrable. Then Lebesgue's dominated convergence theorem, applied to the family of functions $\chi_{[a,x]} f$ where $x \to x_0$ shows that $F$ is continuous. It also follows easily from the definitions that $F \in BV[a, b]$, its total variation being bounded by $\int_a^b |f|\,dt$. So $F'$ exists a.e. and it can easily be shown that $F' = f$ a.e. in $[a, b]$. We cannot expect $F$ to be everywhere differentiable (e.g., let $f$ be a step function, then $F'$ does not exist at the points of discontinuity). Nor can we assume that whenever $F'$ exists it equals $f$, for we may take a continuous function as $f$ (so that $F'$ exists at all points) and then change $f$ at a single point $x_0$. Then $F$ is unchanged but $F'(x_0)$ cannot equal $f(x_0)$.

Indefinite integrals, however, have the important property of being absolutely continuous. A function $f$ is said to be absolutely continuous on $[a, b]$ if given $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\sum_{i=1}^{n} |f(x_i) - f(y_i)| < \varepsilon$$

whenever $\sum_{i=1}^{n} |x_i - y_i| < \delta$ for any finite set of disjoint intervals $(x_i, y_i)$ in $[a, b]$. Taking the special case $n = 1$ we see that absolutely continuous functions are continuous. Considering any partition of $[a, b]$ and introducing new partition points at a distance at most $\delta$ apart we can show that every absolutely continuous function is of bounded variation.

Now for any integrable function $f$ we have that $\int_E |f|\,dt$ tends to zero as $m(E) \to 0$. This is obvious for a bounded function $f$, and follows for an unbounded function since the integral of $f$ over any set is the limit of integrals of the functions $f_n$ where $f_n = f$ provided $|f| \le n$, $f_n = \pm n$ otherwise. From this it follows (with $E = \bigcup_{i=1}^{n} (x_i, y_i)$) that if $f$ is integrable over $[a, b]$ its indefinite integral is absolutely continuous there. It is a little more difficult to prove the converse: every absolutely continuous function is an indefinite integral, indeed it is the indefinite integral of its derivative, a function which we know to exist a.e. That the derivative of any function of bounded variation $f$ is measurable can be seen from the fact that it may be obtained as the limit of a sequence of ratios $g_n(x) = n(f(x + 1/n) - f(x))$ which are themselves measurable. So, on finite intervals, a function is an indefinite integral if, and only if, it is absolutely continuous.

VI. PRODUCT SPACES AND PRODUCT MEASURES

For any two spaces $X$ and $Y$ the Cartesian product $X \times Y$ is the set of ordered pairs $\{(x, y): x \in X, y \in Y\}$. We will concern ourselves with the case $X = Y = \mathbb{R}$ so that $X \times Y$ is just the plane $\mathbb{R}^2$. However, the product notation is useful for subsets of $\mathbb{R}^2$. We call a set $E$ in $\mathbb{R}^2$ a rectangle if $E = A \times B$, $A \subseteq \mathbb{R}$, $B \subseteq \mathbb{R}$. For the purpose of measure and integration the important sets are the measurable rectangles: sets of the form $A \times B$ where $A$ and $B$ are measurable sets. These include as special cases the "genuine" rectangles where $A$ and $B$ are intervals.

From the basic measurable rectangles we can form the σ-algebra $M \times M$ which is the least σ-algebra containing the measurable rectangles. So within $M \times M$ we may
P1: ZCK Final Pages
Encyclopedia of Physical Science and Technology EN009B-413 July 18, 2001 0:38
form complements and countable unions. It can be shown that if we take all finite unions of measurable rectangles and then take the smallest class of sets which contains these sets and is closed under the formation of countable increasing unions and countable decreasing intersections then we get just the $\sigma$-algebra $\mathcal{M} \times \mathcal{M}$. As we shall see this is crucial for the theory. We shall call the sets of $\mathcal{M} \times \mathcal{M}$ the measurable sets of the plane. An alternative definition extends $\mathcal{M} \times \mathcal{M}$ by also including all subsets of sets of measure zero, so as to get a complete measure space (see Section VII).

For any set $E$ in $\mathbb{R}^2$ define the $x$ section of $E$ to be the set $E_x = \{y: (x, y) \in E\}$ and the $y$ section of $E$ to be the set $E^y = \{x: (x, y) \in E\}$. These are sets in $\mathbb{R}$, not $\mathbb{R}^2$. It can be shown quite easily that measurable sets in the plane have measurable sections. We wish now to define the product measure on the sets of $\mathcal{M} \times \mathcal{M}$. This definition should provide, for the measurable rectangle $A \times B$, the measure $m(A)m(B)$. We use the result that if $E$ is a measurable set in the plane then $\varphi(x) = m(E_x)$ and $\psi(y) = m(E^y)$ are measurable functions of $x$ and $y$, respectively, and

$$\int \varphi\,dx = \int \psi\,dy \tag{1}$$

The common value of these integrals is then taken as the measure of $E$. It is clear that for the measurable rectangle $E = A \times B$, $\varphi(x) = \chi_A(x)\,m(B)$ so $\int \varphi\,dx = m(A)m(B)$. Also, $\psi(y) = \chi_B(y)\,m(A)$ so $\int \psi\,dy = m(A)m(B)$ also. So identity (1) holds for measurable rectangles and the measure obtained is the desired one. It holds similarly for finite unions of measurable rectangles. Using the Lebesgue monotone convergence theorem we obtain (1) for the union of monotone increasing sequences of sets $\{E_n\}$, as the sequences $\{\varphi_n\}$ and $\{\psi_n\}$ are similarly monotone. Similarly for a decreasing sequence of sets $\{F_n\}$, contained in a bounded rectangle, the result for the intersection follows from the Lebesgue dominated convergence theorem. Since the plane may be written as the union of a sequence of bounded rectangles

$$\mathbb{R}^2 = \bigcup_{n,m=-\infty}^{\infty} \{[n, n + 1) \times [m, m + 1)\}$$

the result follows for all the sets of $\mathcal{M} \times \mathcal{M}$ on adding together the results for the component pieces.

These definitions and the sketch proof just given work equally well with general measures $\mu$, $\nu$ say (see Section VII), with a measurable rectangle $A \times B$ having the measure $\mu(A)\nu(B)$. The only condition on the measures is that each coordinate space can be written as a countable union of sets of finite measure for $\mu$ or $\nu$, as the case may be, so that the product space may be decomposed into a sequence of sets of finite measure as for $\mathbb{R}^2$ in the preceding paragraph.

If $f$ is a function of $x$ and $y$ we may similarly define the $x$ section of $f$ as the function $f_x(y) = f(x, y)$ for each fixed $x$, and the $y$ section of $f$ as $f^y(x) = f(x, y)$ for each fixed $y$. Since measurable sets have measurable sections it follows easily that if $f$ is measurable with respect to $\mathcal{M} \times \mathcal{M}$ then $f_x$ and $f^y$ are measurable with respect to $m$.

For any nonnegative measurable function $f$ we have the result that the integral of $f$ may be expressed in terms of repeated integrals in either order and both of these integrals are defined. More precisely, let $f$ be nonnegative and measurable with respect to $\mathcal{M} \times \mathcal{M}$; write $\varphi(x) = \int f_x\,dy$, $\psi(y) = \int f^y\,dx$. Then $\varphi$ and $\psi$ are measurable and

$$\int \varphi\,dx = \int f\,dx\,dy = \int \psi\,dy \tag{2}$$

where the middle integral is the Lebesgue integral of $f$ with respect to the product measure defined earlier in this section. Identity (2) has already been proved for the case of a function of the form $\chi_E$ by identity (1), and so for any nonnegative measurable simple function. Taking a sequence of such functions $f_n$ increasing monotonically to $f$, the sections $(f_n)_x \uparrow f_x$ and $(f_n)^y \uparrow f^y$ and the Lebesgue monotone convergence theorem gives identity (2).

We need now to extend (2) to functions not necessarily nonnegative and expect now to find some finiteness condition coming in. Now, identity (2) applied to $|f|$ states that $|f|$ is integrable if, and only if, each of the iterated integrals of $|f|$ (or, more precisely, of the sections of $|f|$) is finite, and then all three integrals are equal. In this case we write $f$ as the difference $f = f^+ - f^-$ of nonnegative measurable functions and we apply identity (2) to $f^+$, $f^-$, and their sections. On subtracting the results we get

$$\int dx \int f_x\,dy = \int f\,dx\,dy = \int dy \int f^y\,dx \tag{3}$$

So from the remarks concerning $|f|$ we deduce Fubini's theorem which states that if $f$ is a measurable function of $x$ and $y$ and either of the iterated integrals of $|f|$ is finite then so is the other, $|f|$ is integrable and identity (3) holds for $f$.

An important application of this theory is in connection with the Laplace and Fourier transforms of an integrable function. We will describe the application in the Fourier transform case; the other is similar. The following result allows us to define the convolution of two functions. Let $f$ and $g$ be integrable functions; then $f(y - x)g(x)$ is an integrable function of $x$ for almost all $y$ and if $h(y)$ is defined for these $y$ by $h(y) = \int f(y - x)g(x)\,dx$ then $h$ is integrable and $\|h\|_1 \le \|f\|_1 \|g\|_1$. To prove this we need to show that $f(y - x)g(x)$ is measurable with respect to $\mathcal{M} \times \mathcal{M}$. This is not very difficult and assuming
of the integrable functions $f$ and $g$. We can easily show that $f * g = g * f$ a.e. and $(f * g) * h = f * (g * h)$ a.e. for any integrable functions $f, g, h$. For example, to prove the first identity, let $y$ be such that $g(y - x) f(x)$ is integrable with respect to $x$, so $y$ does not lie in the exceptional set of measure zero. For such $y$ let $t = y - x$ so that $\int g(y - x) f(x)\,dx$ becomes

$$\int g(t) f(y - t)\,dt = (2\pi)^{1/2} (f * g)(y)$$

as we may translate the variables.

For any integrable function $f$ we may define the Fourier transform as the function $\hat{f}$ given by

$$\hat{f}(s) = (2\pi)^{-1/2} \int e^{-ist} f(t)\,dt$$

Then $\hat{f}$ is a continuous function, by the Lebesgue dominated convergence theorem with $|f|$ as dominating function. Also, $|\hat{f}| \le (2\pi)^{-1/2} \int |f|\,dt$. The convolution and the Fourier transform are related by the identity

$$\widehat{(f * g)} = \hat{f}\,\hat{g}$$

true for integrable functions. To see this note that

$$\widehat{(f * g)}(s) = (2\pi)^{-1} \int dt\; e^{-ist} \int f(t - x)g(x)\,dx$$

Then the modulus of the integrand here, $|f(t - x)g(x)|$, is integrable by the argument given for the convolution. So by Fubini's theorem we may interchange the order of integration to get

VII. GENERAL MEASURES

Measures arise in various ways and in various spaces and the theory outlined above can be applied in the main provided we have an appropriate class of sets defined to be measurable. So we shall suppose we have a set or space $X$ and on it a $\sigma$-algebra $\mathcal{S}$ of sets. On the sets of $\mathcal{S}$ our measure $\mu$ is defined and so $\mu$ is presumed to be a nonnegative set function which takes the value 0 on the empty set and which is countably additive (i.e., if $\{A_n\}$ is a sequence of disjoint sets of $\mathcal{S}$ we have $\mu(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mu(A_n)$). Then the triple $\{X, \mathcal{S}, \mu\}$ is called a measure space. This measure space is said to be $\sigma$ finite if we may write $X$ as $X = \bigcup_{n=1}^{\infty} X_n$, $X_n \in \mathcal{S}$, and $\mu(X_n) < \infty$. It is said to be a complete measure space if for any set $E$ of $\mathcal{S}$ with $\mu(E) = 0$ every subset of $E$ also belongs to $\mathcal{S}$ (and then of course has measure zero also).

EXAMPLE 1. $X = \mathbb{R}$, $\mathcal{S} = \mathcal{M}$, $\mu = m$, gives Lebesgue measure on the real line. This is $\sigma$ finite and complete.

EXAMPLE 2. $X = \mathbb{R}$, $\mathcal{S}$ = Borel sets, $\mu = m$, gives Borel measure. This is also $\sigma$ finite but is not complete.

EXAMPLE 3. $X = \mathbb{R}^2$, $\mathcal{S} = \mathcal{M} \times \mathcal{M}$, $\mu = m \times m$, gives planar measure. This is $\sigma$ finite but not complete.

EXAMPLE 4. $X = N$, the set of natural numbers; $\mathcal{S}$ is the class of all subsets of $N$. Let $\sum a_n < \infty$ with $a_n \ge 0$ and define $\mu(E) = \sum a_k$ where the summation is over all integers $k$ in $E$. This measure is obviously complete and is finite for the whole space (and so trivially $\sigma$ finite).
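The measure of Example 4 can be explored numerically. The following is only a minimal sketch under stated assumptions: the weights $a_k = 2^{-k}$ and the helper names `weight` and `mu` are illustrative choices, not part of the article, and only finite sets $E$ are handled.

```python
from fractions import Fraction


def weight(k: int) -> Fraction:
    """Assumed summable nonnegative weights a_k = 2**-k, k = 1, 2, 3, ..."""
    return Fraction(1, 2 ** k)


def mu(E) -> Fraction:
    """mu(E) = sum of a_k over the integers k in a finite collection E."""
    return sum((weight(k) for k in E), Fraction(0))


odds, evens = {1, 3, 5}, {2, 4, 6}

# The empty set has measure zero.
print(mu(set()))
# Additivity on disjoint sets: both sides agree exactly.
print(mu(odds | evens), mu(odds) + mu(evens))
# Partial sums stay below the total mass, so the whole space has finite measure.
print(float(mu(range(1, 21))))
```

Exact rational arithmetic is used so that additivity can be compared with `==` rather than with a floating-point tolerance.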
Results such as $m(\lim E_i) = \lim m(E_i)$ hold as before for monotone sequences of sets. A real function $f$ is measurable if $f^{-1}(\alpha, \infty)$ belongs to $\mathcal{S}$, the family of measurable sets. Integration theory proceeds as before, the nonnegative simple function $\varphi = \sum_{i=1}^{n} a_i \chi_{A_i}$ having the integral $\int \varphi\,d\mu = \sum_{i=1}^{n} a_i \mu(A_i)$ as in Section III. Then the integral $\int f\,d\mu$ of a nonnegative measurable function is the least upper bound of such integrals for all such functions $\varphi \le f$. The integration of functions not necessarily nonnegative and of complex-valued functions is defined as before and the same theorems hold: Fatou's lemma, the Lebesgue monotone convergence theorem, the Lebesgue dominated convergence theorem, and their variants for series. Of course the Riemann integral will not exist in general so comparisons cannot be made. Also the result noted in Section V holds in general: $\int_E |f|\,d\mu$ tends to zero as $\mu(E) \to 0$.

The constructions and inequalities obtained in Section IV will hold good for the general $L^p$ spaces $L^p(X, \mathcal{S}, \mu)$. So will the results of Section VI, as already noted, for a pair of measures $\mu$ and $\nu$ provided we assume these to be $\sigma$ finite.

For some purposes it is convenient to use a complete measure and it is an important fact that a measure may be extended in a unique way so as to be complete. We replace the $\sigma$-algebra $\mathcal{S}$ by the larger class $\bar{\mathcal{S}}$ of sets of the form $E \cup N$ where $E \in \mathcal{S}$, $E \cap N = \emptyset$, and $N \subseteq M$ where $M \in \mathcal{S}$ with $\mu(M) = 0$. Then we check that $\bar{\mathcal{S}}$ so defined is a $\sigma$-algebra and define $\bar{\mu}$ on $\bar{\mathcal{S}}$ by $\bar{\mu}(E \cup N) = \mu(E)$. This clearly extends $\mu$ and is easily seen to give an unambiguous definition. With this construction $\{X, \bar{\mathcal{S}}, \bar{\mu}\}$ is a complete measure space.

If $\mu(X) = 1$ the triple $\{X, \mathcal{S}, \mu\}$ is called a probability measure space. Then a particular set of terms is used, for historical reasons, and the emphasis is on a particular type of result. In particular for "almost everywhere" one writes "almost surely," abbreviated a.s.; measurable functions are called random variables; their Fourier transforms $\varphi(x) = \int e^{ixt} f(t)\,d\mu(t)$ are called characteristic functions. Note that Jensen's inequality will apply directly with $[0, 1]$ replaced by $X$; also that the product of probability spaces is again a probability space.

We shall consider now some further types of convergence which may be applied to sequences of functions. Let $\{f_n\}$ be a sequence of measurable functions and $f$ a measurable function (all on the measure space $\{X, \mathcal{S}, \mu\}$). Then $f_n$ tends to $f$ in measure if for every positive $\varepsilon$,

$$\lim \mu\{x: |f_n(x) - f(x)| > \varepsilon\} = 0$$

If $\mu$ is a probability measure this is termed convergence in probability. It is easily seen that a sequence $\{f_n\}$ can tend at most to one limit function $f$ (up to sets of measure zero). Indeed the sequence $\{f_n\}$ defines the function $f$ a.e., in the sense that if $\{f_n\}$ is a Cauchy sequence with respect to convergence in measure so that for any $\varepsilon > 0$,

$$\lim_{n,m \to \infty} \mu\{x: |f_n(x) - f_m(x)| > \varepsilon\} = 0$$

then there exists a measurable function $f$ so that $f_n \to f$ in measure and also for some subsequence $\{n_i\}$, $f_{n_i} \to f$ a.e. This is proved by a careful choice of $\varepsilon$'s as elements of a convergent series.

An analog of Fatou's lemma holds for convergence in measure: Let $\{f_n\}$ be a sequence of nonnegative measurable functions and let $f$ be a measurable function such that $f_n \to f$ in measure; then

$$\int f\,d\mu \le \liminf \int f_n\,d\mu$$

The proof depends on applying the original Fatou's lemma of Section III to subsequences $\{f_{n_i}\}$ tending to $f$ a.e. There is a corresponding analog of the Lebesgue dominated convergence theorem. Let $\{f_n\}$ be a sequence of measurable functions such that $|f_n| \le g$, an integrable function, and let $f_n \to f$ in measure, where $f$ is measurable. Then $f$ is integrable, $\lim \int f_n\,d\mu = \int f\,d\mu$, and $\lim \int |f_n - f|\,d\mu = 0$ (i.e., $f_n \to f$ in the mean). This result is easily proved. Since there exists a subsequence $\{f_{n_i}\}$ with limit $f$ a.e., we have $|f| \le g$ so $f \in L^1(\mu)$. Also, for each $n$, $g + f_n \ge 0$ and $g + f_n \to g + f$ in measure, so by the version of Fatou's lemma just given

$$\int g\,d\mu + \int f\,d\mu \le \liminf \int (g + f_n)\,d\mu$$

and

$$\int f\,d\mu \le \liminf \int f_n\,d\mu$$

Similarly, $g - f_n \ge 0$, whence

$$\int g\,d\mu - \int f\,d\mu \le \liminf \int (g - f_n)\,d\mu$$

and so

$$\int f\,d\mu \ge \limsup \int f_n\,d\mu \ge \liminf \int f_n\,d\mu \ge \int f\,d\mu$$

Therefore equality holds and the first result follows. Also it is easily seen that $|f_n - f| \to 0$ in measure. But $|f_n - f| \le 2g$ and so the second result follows from the first.

In $L^p(\mu)$, convergence in the sense of the norm is termed convergence in the mean of order $p$ (i.e., $f_n \to f$ in the mean of order $p$ ($p \ge 1$) if $\lim \|f_n - f\|_p = 0$). (It may be defined for any $p > 0$, but for $0 < p < 1$ the norm
notation is not appropriate.) This convergence and convergence in measure are easily shown to be related: if $f_n \to f$ in the mean of order $p$ then $f_n \to f$ in measure. For suppose not. Then there exist $\varepsilon > 0$, $\delta > 0$ such that $\mu\{x: |f_n(x) - f(x)| > \varepsilon\} > \delta$ for infinitely many $n$. But then $\int |f_n - f|^p\,d\mu \ge \varepsilon^p \delta$ for infinitely many $n$, contradicting convergence in the mean of order $p$.

Another important kind of convergence is almost uniform convergence. Let $\{f_n\}$ be a sequence of measurable functions and let $f$ be a measurable function. Then we say that $f_n \to f$ almost uniformly (abbreviated a.u.) if for any $\varepsilon > 0$ there exists a set $E$ with $\mu(E) < \varepsilon$ and such that on the complement of $E$, $f_n \to f$ uniformly (i.e., given $\varepsilon > 0$ there exists $N$ such that for $n > N$, $|f_n(x) - f(x)| < \varepsilon$ for all $x \in E^c$ ($N$ depending on $\varepsilon$ but not on $x$)). Obviously uniform convergence (on the whole space) implies almost uniform convergence (just take $E = \emptyset$ above). In the opposite direction, the sequence $\{x^n\}$ converges to zero almost uniformly on $[0, 1]$ but not uniformly.

If $f_n \to f$ a.u., however, then $f_n \to f$ in measure. For if not then there exist positive $\varepsilon$ and $\delta$ such that

$$\mu\{x: |f_n(x) - f(x)| > \varepsilon\} > \delta$$

for infinitely many $n$. But since there exists $E$ with $\mu(E) < \delta$ and with $f_n \to f$ uniformly on $E^c$ we get a contradiction. We can also show easily that if $f_n \to f$ a.u. then $f_n \to f$ a.e. For if $m$ is any positive integer we can find a set $E_m$ with $\mu(E_m) < 1/m$ and on $E_m^c$, $f_n \to f$ uniformly. But if $x \in \bigcup_{m=1}^{\infty} E_m^c$ we have $x \in E_N^c$, say, so $\lim f_n(x) = f(x)$. Since $\left(\bigcup_{m=1}^{\infty} E_m^c\right)^c = \bigcap_{m=1}^{\infty} E_m$, a set of measure zero, the result follows.

An important result in the opposite direction is given by Egorov's theorem: on a space of finite measure convergence almost everywhere implies almost uniform convergence. We can prove this by writing for each pair of positive integers $k, n$,

implies almost uniform convergence (i.e., if $f_n \to f$ a.e. where $|f_n| \le g$, an integrable function, then $f_n \to f$ a.u.).

Examples can be constructed showing that certain types of convergence do not in general imply certain other ones. For instance, let $\chi_n^i$ be the characteristic function of the interval $[(i - 1)/n, i/n]$, $i = 1, \ldots, n$, with $[0, 1]$ as the whole space and using Lebesgue measure, and define $\{f_n\}$ to be the sequence $\chi_1^1, \chi_2^1, \chi_2^2, \chi_3^1, \chi_3^2, \chi_3^3, \ldots$. Then $f_n = \chi_k^i$ where $\tfrac{1}{2} k(k - 1) < n \le \tfrac{1}{2} k(k + 1)$, whence $\int f_n\,dx = 1/k \to 0$ as $n$ (and thus $k$) $\to \infty$, so that $f_n \to 0$ in the mean. But $f_n$ does not tend to 0 a.e.; indeed for no $x$ in $[0, 1]$ does $f_n(x) \to 0$. Nor does $f_n \to 0$ a.u. The space here, $[0, 1]$, has nevertheless finite measure and the sequence is dominated by the integrable function $\chi_{[0,1]}$.

VIII. THE RADON–NIKODÝM THEOREM AND SIGNED MEASURES

New measures arise from given measures in natural ways. For example, let $f$ be a nonnegative function measurable with respect to the measure $\mu$ on the $\sigma$-algebra $\mathcal{S}$, and define $\nu$ on $\mathcal{S}$ by

$$\nu(E) = \int_E f\,d\mu$$

Then $\nu$ is a nonnegative set function, vanishing on the empty set. That $\nu$ is countably additive follows from the corollary to the Lebesgue monotone convergence theorem. For if $\{E_n\}$ is a sequence of disjoint sets in $\mathcal{S}$ then

$$\nu\Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr) = \int \sum_{n=1}^{\infty} \chi_{E_n} f\,d\mu = \sum_{n=1}^{\infty} \int \chi_{E_n} f\,d\mu = \sum_{n=1}^{\infty} \nu(E_n)$$
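The construction $\nu(E) = \int_E f\,d\mu$ can be checked by hand on a finite space, where the integral reduces to a finite sum. This is only a hedged sketch: the five-point space, the point masses `mu_weights`, and the density values `density` are hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space: points 0..4 carrying point masses mu({x}).
mu_weights = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(0),
    3: Fraction(1, 8),
    4: Fraction(1, 8),
}
# Hypothetical nonnegative density f on the same points.
density = {0: Fraction(2), 1: Fraction(0), 2: Fraction(7), 3: Fraction(4), 4: Fraction(1)}


def mu(E) -> Fraction:
    """mu(E) as the sum of the point masses in E."""
    return sum((mu_weights[x] for x in E), Fraction(0))


def nu(E) -> Fraction:
    """nu(E) = integral over E of f d(mu); here a finite sum of f(x) * mu({x})."""
    return sum((density[x] * mu_weights[x] for x in E), Fraction(0))


A, B = {0, 1}, {3, 4}
print(nu(A | B), nu(A) + nu(B))   # additivity on disjoint sets
print(mu({2}), nu({2}))           # a mu-null set is also nu-null
```

On this toy space the density plays the role of the derivative of $\nu$ with respect to $\mu$: changing `density` at the point of zero $\mu$ mass leaves `nu` unchanged.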
whenever $\{E_i\}$ is a disjoint sequence of sets of $\mathcal{S}$. Clearly every measure is a signed measure.

EXAMPLE. Let $f$ be an integrable function on $\{X, \mathcal{S}, \mu\}$. Then $\nu(E) = \int_E f\,d\mu$ defines a signed measure. Clearly only the countable additivity needs checking, so for $E \in \mathcal{S}$ write $\nu^+(E) = \int_E f^+\,d\mu$, $\nu^-(E) = \int_E f^-\,d\mu$. Then $\nu^+$, $\nu^-$ are (finite) measures and

$$\nu\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \nu^+\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) - \nu^-\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} \int_{E_i} f^+\,d\mu - \sum_{i=1}^{\infty} \int_{E_i} f^-\,d\mu = \sum_{i=1}^{\infty} \int_{E_i} f\,d\mu = \sum_{i=1}^{\infty} \nu(E_i)$$

In this example the signed measure $\nu$ has a decomposition as the difference of measures and there is a corresponding decomposition of the space into the sets $\{x: f(x) \ge 0\}$ and $\{x: f(x) < 0\}$ on one of which $\nu$ acts like a measure and on the other $-\nu$ acts like a measure. More generally for any signed measure $\nu$ there is a decomposition $\nu = \nu_1 - \nu_2$ as the difference of measures for which $\nu_1$ and $\nu_2$ are mutually singular (written $\nu_1 \perp \nu_2$) (i.e., for some set $A \in \mathcal{S}$ we have $\nu_2(A) = \nu_1(A^c) = 0$). Then $\nu_1$ and $\nu_2$ are said to be the Jordan decomposition of $\nu$ and are uniquely defined by $\nu$. There is a corresponding decomposition of the space $X$ as the union of disjoint sets $A$, $B$ of $\mathcal{S}$ such that $\nu$ is nonnegative on the measurable subsets of $A$, $-\nu$ is nonnegative on the measurable subsets of $B$. This, the Hahn decomposition, is unique up to sets for which all subsets in $\mathcal{S}$ have zero $\nu$ measure. The example given above where $\nu$ is defined using an integrable function displays both the Jordan and Hahn decompositions. In the general case the Hahn decomposition is established first using an exhaustion argument to find the "largest" set on which $\nu$ acts like a nonnegative measure and the Jordan decomposition follows.

By analogy with the functions of bounded variation discussed in Section V the measure $\nu^+$ is called the positive variation of $\nu$, $\nu^-$ the negative variation, and $|\nu| = \nu^+ + \nu^-$ the total variation of the signed measure $\nu$. Obviously in the example above, $|\nu|$ is given by $|\nu|(E) = \int_E |f|\,d\mu$.

The converse to the situation presented in this example is provided by the Radon–Nikodým theorem. It states that if $\{X, \mathcal{S}, \mu\}$ is a $\sigma$-finite measure space and $\nu$ is a $\sigma$-finite measure such that $\nu \ll \mu$, then there exists a nonnegative measurable function $f$ on $X$ such that, for each $E \in \mathcal{S}$, $\nu(E) = \int_E f\,d\mu$. This $f$ is unique in the convention given before; any other function with the same property agrees with $f$ almost everywhere. The proof proceeds by reduction to the case for finite measures: write $X$ as the union of a sequence of disjoint sets on which both $\mu$ and $\nu$ are finite, find the corresponding function $f$ for each set, and put these functions together to get the corresponding "density function" on the whole of $X$. In the case of a finite measure, $f$ is obtained as the supremum of functions $g$ which are nonnegative, measurable, and satisfy $\int_E g\,d\mu \le \nu(E)$ for each $E$ in $\mathcal{S}$. However, taking the supremum of such functions involves an uncountable family of functions so a nonmeasurable function could apparently appear. The proof reduces the operation to a countable one. The resulting function $f$ can be regarded as the derivative of the measure $\nu$ with respect to $\mu$ and we write $f = d\nu/d\mu$. The theorem can be extended to signed measures. If $\mu$, $\nu$ are signed measures we say $\nu$ is absolutely continuous with respect to $\mu$ if $\nu(E) = 0$ whenever $|\mu|(E) = 0$. Then $\nu$ can be written in terms of a derivative with respect to the measure $\mu$ by writing $\nu = \nu^+ - \nu^-$, finding the derivatives of $\nu^+$ and $\nu^-$ separately, and taking $d\nu/d\mu$ as their difference.

The analogy between ordinary derivatives and the Radon–Nikodým derivative $d\nu/d\mu$ goes further. The chain rule applies: if $\lambda$, $\mu$, $\nu$ are $\sigma$ finite measures such that $\nu \ll \mu$ and $\mu \ll \lambda$ then $\nu \ll \lambda$ and

$$d\nu/d\lambda = (d\nu/d\mu)(d\mu/d\lambda)$$

where equality is in the sense that this equation must hold almost everywhere (in the sense of $\lambda$). If the measure $\nu$ is defined by the integral of a function $f$ with respect to $\mu$ this theorem takes the following form: if $\mu \ll \lambda$ where $\mu$ and $\lambda$ are $\sigma$ finite then there exists a measurable function $g$ such that if $f \in L^1(X, \mu)$ then $fg \in L^1(X, \lambda)$ and for each $E$

$$\nu(E) = \int_E f\,d\mu = \int_E fg\,d\lambda$$

If we take any nonnegative integrable function $f$ on the real line we may take a multiple of $f$ so as to get a function $p(x)$ such that $\nu(E) = \int_E p\,dx$ defines a probability measure with $p = d\nu/dm$ as the probability density. The Radon–Nikodým derivative of one probability measure with respect to another is important in connection with conditional probabilities and conditional expectations.

The Radon–Nikodým theorem shows that for a finite measure $\nu$ the relation $\nu \ll \mu$ is a genuine continuity property. For since $\nu(E) = \int_E f\,d\mu$ where $f$ is an integrable function we have the result noted in Section VII that $\nu(E) \to 0$ as $\mu(E) \to 0$; more precisely, given $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $\mu(E) < \delta$ we have $\nu(E) < \varepsilon$.

Example 4 in Section VII exhibits the opposite property: the measure $\mu$ defined there is concentrated on a set of zero measure (the integer points) and so cannot be given as the
$g$ must be left-continuous (i.e., $g(x) \to g(x_0)$ whenever $x \to x_0$, $x < x_0$). Then $\mu_g$ is indeed a measure on the ring of finite unions of such intervals and extends to a measure on the Borel sets of the real line. We could, at least for the case $g$ bounded, have chosen intervals of the form $(a, b]$ and $g$ right-continuous. This form of the definition is more common in probability theory. The measure $\mu_g$ given by the construction outlined above is the Lebesgue–Stieltjes measure defined by $g$.

It is clear that points of discontinuity of $g$ correspond to atoms of the measure $\mu_g$, as defined in Section IX. Absolute continuity was defined for functions in Section V and for measures in Section VIII. These definitions are drawn together by the following result. Let $g$ be a monotone increasing absolutely continuous function. Then $g$ defines a measure on a $\sigma$-algebra. If we assume that this measure has been completed, as defined in Section VII, to get the complete extension $\bar{\mu}_g$ then $\bar{\mu}_g$ is defined on the $\sigma$-algebra $\mathcal{M}$ of Lebesgue measurable sets and $\bar{\mu}_g \ll m$. Indeed the Radon–Nikodým derivative of $\bar{\mu}_g$ and the derivative of $g$ (which exists a.e.) correspond: for any $a$, $b$ we have $g(b) - g(a) = \int_a^b g'\,dt$ and so $g' = d\bar{\mu}_g/dm$. In this case integrals with respect to $\bar{\mu}_g$ reduce to integrals with respect to Lebesgue measure:

$$\int_E f\,d\bar{\mu}_g = \int_E f g'\,dt$$

These results are especially important in probability theory where the finiteness of the measure simplifies the results. Indeed let $\{X, \mathcal{S}, \mu\}$ be a probability measure space and $f$ a finite-valued measurable function, so $f$ is a random variable. Define a function $F$ by $F(x) = \mu f^{-1}(-\infty, x]$ (i.e., $F(x) = \mu\{t: f(t) \le x\}$, so that $F(-\infty) = 0$, $F(\infty) = 1$). Then $F$ is the distribution function of $f$ and is a monotone increasing right-continuous function. The measure $\mu_F$ it defines agrees with the measure $\mu f^{-1}$ on the intervals of $\mathbb{R}$.

XI. THE RADON–NIKODÝM PROPERTY FOR BANACH SPACES

In this section we consider measures taking their values in a Banach space, that is a normed space which is complete (the examples of $L^p$, $L^\infty$ were considered in Section IV). This is more general than the finite dimensional case considered in Section X. We consider first the question of integrating functions taking values in a Banach space, with respect to a real measure. If we then set $m(A) = \int_A f\,d\mu$ we see that such a theory of integration gives rise to a Banach space–valued measure. Which measures are formed in this way depends on the Banach space in question, since the existence of such a function $f$ implies that a version of the Radon–Nikodým theorem holds. The most useful version of this integral is the Bochner integral. First, we describe the details of such measures. Let $Y$ be a Banach space with norm $\|\cdot\|$. We will suppose that we have a space $X$ with measurable sets $\mathcal{S}$. Then $m: \mathcal{S} \to Y$ is a vector measure if $m(\emptyset) = 0$ and whenever $\{A_i\}$ is a countable family of disjoint sets of $\mathcal{S}$ then $m(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} m(A_i)$ where this sum is norm convergent, that is, the sequence of vectors $\sum_{i=1}^{n} m(A_i)$ converges in the normed space $(Y, \|\cdot\|)$ as $n \to \infty$. We will suppose that the space $X$ is equipped with a finite measure $\mu$: for some purposes it is more convenient to assume, as we will, that $\mu(X) = 1$, that is: $\mu$ is a probability measure. The average range of $m$ is the set in $Y$ given by $AR(m) = \{m(A)/\mu(A): A \in \mathcal{S}, \mu(A) > 0\}$. As in the scalar case we say the vector-valued measure $m$ is absolutely continuous with respect to $\mu$ ($m \ll \mu$) if $\mu(A) = 0$ implies $m(A) = 0$ (zero vector).

Then to set up a theory of integration we consider simple functions as before of form $f = \sum_{i=1}^{n} x_i \chi_{A_i}$ with $x_i \in Y$, $A_i \in \mathcal{S}$. Then for any set $A \in \mathcal{S}$, we can define for such an $f$, $\int_A f\,d\mu = \sum_{i=1}^{n} x_i \mu(A \cap A_i)$. Then generally a function $f: X \to Y$ is Bochner integrable if there is a sequence $\{f_n\}$ of simple functions with $\lim_{n \to \infty} f_n(x) = f(x)$ a.e. ($\mu$) and $\lim_{n \to \infty} \int \|f(x) - f_n(x)\|\,d\mu = 0$. This provides an unambiguous definition if we set $\int f\,d\mu = \lim \int f_n\,d\mu$, and we then write $f \in L^1_Y(X, \mathcal{S}, \mu)$, and we say $\int f\,d\mu$ is the Bochner integral of $f$. Regarding measurability, a $Y$-valued function is said to be strongly measurable if it is the limit a.e. of simple functions. A weaker requirement is that each scalar valued function $F \circ f$ where $F$ is a continuous linear functional on $Y$ should be measurable in the usual sense. For functions $f$ with $\|f\|$ integrable and which satisfy a condition on the range (always true for integrable functions), the definitions are equivalent. Much of the theory extends; for example, we have a dominated convergence theorem: if $\|f_n(x)\| \le g(x)$ a.e. ($\mu$), where $g$ is integrable, and $\lim f_n(x) = f(x)$ a.e., then $f$ is integrable, $\int f\,d\mu = \lim \int f_n\,d\mu$ and $\lim \int \|f_n - f\|\,d\mu = 0$ (i.e., $f_n$ converges to $f$ in the mean).

The Radon–Nikodým property turns out to be very important when considering such vector-valued measures and functions; whether it holds depends on the geometry of $Y$. Let $K$ be a closed bounded convex set in the Banach space $Y$. Then $K$ has the Radon–Nikodým property (RNP) for $\{X, \mathcal{S}, \mu\}$ if for any $Y$-valued measure $m$ which is absolutely continuous with respect to $\mu$ and whose average range $AR(m)$ lies in $K$ there exists a function $f \in L^1_Y(X, \mathcal{S}, \mu)$ such that $m(A) = \int_A f\,d\mu$ for each $A \in \mathcal{S}$. More generally, if $E$ is a closed convex (possibly unbounded) set of $Y$ (e.g., $E = Y$), then $E$ has the RNP for $\{X, \mathcal{S}, \mu\}$ if each closed bounded subset $K$ of $E$ has the RNP for $\{X, \mathcal{S}, \mu\}$. Finally, $K$ has the RNP
martingale convergence property (MCP) for $\{X, \mathcal{S}, \mu\}$ if whenever $\{f_n, \mathcal{S}_n\}$ is a martingale such that $\bigcup_{n=1}^{\infty} \mathcal{S}_n$ generates $\mathcal{S}$ and $f_n \in L^1_K(X, \mathcal{S}_n, \mu)$ for each $n$ then there exists $f \in L^1_K(X, \mathcal{S}, \mu)$ such that $\lim_{n \to \infty} \|f_n(x) - f(x)\| = 0$ a.e. ($\mu$). The corresponding statements for closed convex sets and the definition of "$Y$ has MCP" follow exactly as for RNP. In fact a closed bounded convex set has the RNP if, and only if, it has the MCP. To see that a closed convex set $K$ has the MCP provided it

$U_{1/2}(f))$. So $D$ is not dentable. However, it can be shown by a geometrical argument that if $K$ is a closed convex set in $Y$ with interior $K$ (int $K$) nonempty then if $K$ is not dentable int $K$ is not s-dentable. See Davis and Phelps for details. So every bounded set of $Y$ is dentable if, and only if, every bounded set is s-dentable. This turns out to be the important property. Indeed, if a closed bounded convex set $K$ has every subset s-dentable then $K$ has the RNP.
To outline the proof: consider the special case when $m$ is in the form

$$m(A) = \sum_{i=1}^{n} x_i\,\mu(A \cap B_i),$$

with $\{B_i\}$ being a partition of $X$ into sets of positive measure and with $x_i \in K$. Then for $\mu(A) > 0$ and $A \subseteq B_i$ we have

$$\frac{m(A)}{\mu(A)} = x_i = \frac{m(B_i)}{\mu(B_i)}.$$

Set $f = \sum_{i=1}^{n} x_i \chi_{B_i}$; then for each $A$ in $\mathcal{S}$ we have

$$\int_A f\,d\mu = \sum_{i=1}^{n} x_i\,\mu(A \cap B_i) = m(A).$$

So $f = dm/d\mu$ and $K$ has the RNP. In general $m$ will not be of this simple form, but we can approximate it by a sequence of such measures, obtaining a convergent sequence of such derivatives $f_n$. So we need to know that we can partition $X$ into sets $B_i$ such that for subsets $B$ of $B_i$, with $\mu(B) > 0$, and for a suitable $x_i$ in $K$ we have $\|m(B)/\mu(B) - x_i\| < \varepsilon$. Then we use a sequence of such partitions with $\varepsilon$'s tending to zero to obtain the approximation. So we let $E = \{m(B)/\mu(B): \mu(B) > 0, B \subseteq B_i\}$, a subset of $K$ as $AR(m) \subseteq K$. So $E$ is s-dentable and so we can find $x_i \notin \text{s-co}(E \setminus U_\varepsilon(x_i))$, with $x_i \in E$, $x_i = m(B)/\mu(B)$, say. Suppose we can find sets $C$ in $B_i$ with $\|m(C)/\mu(C) - x_i\| \ge \varepsilon$. Maximizing such a family of disjoint sets $C$ we could write $B_i = \bigcup_{i=1}^{\infty} C_i$ and then

$$x_i = \frac{m(B)}{\mu(B)} = \sum_{j=1}^{\infty} \frac{\mu(C_j)}{\mu(B)}\, x_j$$

with

$$x_j = \frac{m(C_j)}{\mu(C_j)} \in E \quad \text{and} \quad \sum_{i=1}^{\infty} \frac{\mu(C_i)}{\mu(B)} = 1,$$

$f_2$. By the non-s-dentable property of $D$, this martingale so constructed will not converge.

This result establishes the equivalence of the properties RNP, MCP, dentability of subsets and s-dentability of subsets for a Banach space $Y$. Such spaces, to some extent, have the nice properties of finite dimensional spaces. Among the various other properties of such spaces is the fact that they possess the Krein–Milman property (KMP) referred to in Section IX; that is, for any closed bounded convex set $K$ in $Y$, $K$ is the closed convex hull of its extreme points. Indeed we saw earlier that the space $C[0, 1]$ has a unit ball $U_1[0]$, which is not dentable: $U_1[0]$ has just two extreme points, the constant functions $+1$ and $-1$, so it does not possess the KMP. To prove the KMP it is sufficient to show that every bounded convex set has an extreme point. For dentable sets this can be done using the fact that they have slices of arbitrarily small diameter. A nested decreasing sequence of such slices is found, the intersection of which yields the desired extreme point. For further details, related results, and references see Bourgin (1983) and Phelps (1988). We have seen that the spaces $C[0, 1]$, $L^1[0, 1]$ do not have the RNP. However, the spaces $L^p$ of Section IV for which $1 < p < \infty$ have the RNP, as have all reflexive spaces. It is not known whether the KMP implies the RNP.

Another important property of finite-dimensional spaces is that convex functions are differentiable almost everywhere. To see how this extends, we say that the real-valued function $f$ on a Banach space $Y$ is Fréchet differentiable at $y$ if there exists a linear functional $f'(y)$ on $Y$ such that for every $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(y + x) - f(y) - f'(y)(x)| \le \varepsilon \|x\|$ whenever $\|x\| \le \delta$. Then the space $Y$ is said to be an Asplund space if every continuous convex function $f$ on a nonempty open set $D$ in $Y$ is Fréchet differentiable at each point $y$ of some dense set $E$ of $D$ where the set $E$ is a $G_\delta$ set, that is: $E = \bigcap_{i=1}^{\infty} G_i$ where the sets $G_i$ are dense open sets of $D$. Then the space $Y$ is an Asplund space if, and only if, the Banach space $Y^*$ of continuous linear functionals on $Y$ has the RNP.
that is if two sets $A$ and $B$ are a positive distance apart, $H_s^*(A \cup B) = H_s^*(A) + H_s^*(B)$. Then Borel sets are $H_s^*$ measurable in the usual sense: the proof requires a little care in showing that intervals are $H_s^*$ measurable. It is easily seen that $H_1^*$ is just Lebesgue measure. Hausdorff measure, $H_s$, defined by $H_s^*$, is used to analyze sets of zero Lebesgue measure. For it can be shown fairly easily that if $H_s^*(A) < \infty$ then $H_q^*(A) = 0$ for $q > s$. So if $0 < H_s^*(A) < \infty$ then $H_q^*(A) = \infty$ for $q < s$. So we can define the (Hausdorff) dimension of a set $E$ by $\dim(E) = \inf\{s : H_s^*(E) = 0\}$. This provides a dimension, usually noninteger, for Borel sets in $\mathbb{R}$. All this may be generalized by replacing, in the definition, the $s$th power of the length of $I_k$ by $h$ applied to that length, with a suitable function $h$, for example a monotonic increasing function $h(t)$ for $t \ge 0$ with $h(0+) = h(0) = 0$.

Hausdorff dimension can be difficult to calculate. The real numbers have dimension 1; finite sets of points have dimension zero. For the Cantor set, $P$, described in Section II, the calculation is straightforward. This set is "self-similar" in the sense that the subsets in the closed intervals $J_{1,1}$, $J_{1,2}$ left after the first "open third" $I_{1,1}$ has been removed are copies of the Cantor set scaled by 1/3 and translated. So using the "scaling property" of $H$ described above, we get $H_s(P) = \frac{2}{3^s} H_s(P)$ and so $s = \frac{\log 2}{\log 3}$, provided we can show that $0 < H_s(P) < \infty$. This part needs care in transforming an arbitrary covering of $P$ into coverings of the subsets of $J_{1,1}$, $J_{1,2}$ mentioned in Section III, and using the metric outer measure property. This method and result generalize immediately to the "Cantor-like set" $P_\xi$ obtained when we remove, not the middle third, but a central open interval of positive length $1 - 2\xi$ at the first stage, with the residual intervals being of length $\xi$ at stage one, $\xi^2$ at stage two, etc. The same argument gives the Hausdorff dimension of $P_\xi$ as $-\frac{\log 2}{\log \xi}$, a number between 0 and 1. This shows that sets in $\mathbb{R}$ exist with all possible positive values of Hausdorff dimension. In forming the Cantor set we removed numbers whose expansion to base 3 contained a 1. If we remove instead those containing a 2 or those containing a zero, we get somewhat different sets whose Hausdorff dimensions were found by Best. Similar results for expansions to other bases are to be found in Weymann. Sets of finite positive $H_s$ measure are called s-sets, or fractals in this context, though some writers use "fractals" to refer to sets which are self-similar at all levels of magnification.

This generalizes easily to two dimensions: from a closed square one deletes a central open cross to leave four closed squares placed at the corners; these are then similarly reduced, etc. The resulting "Cantor square" along with many examples of self-similar sets are considered in the very accessible book of Lauwerier. In two dimensions one may consider the Hausdorff dimension of a curve. With some restrictions to avoid pathological space-filling curves one finds that the curve $C$ of length $L$ has dimension one and Hausdorff measure $H_1(C) = L$ (Falconer, p. 24). This relates to the functions of bounded variation discussed in Section V. In three dimensions we may consider the motion of minute particles in a liquid. This erratic motion is modeled probabilistically by Brownian motion. In Falconer (p. 144) we have that Brownian paths are s-sets of dimension one, with probability one. Applications to many areas are to be found in Mattila, which also gives a comparison of different definitions of "dimension" and contains a sizeable bibliography.

SEE ALSO THE FOLLOWING ARTICLES

CALCULUS • COMPLEX ANALYSIS • CONVEX SETS • DIFFERENTIAL EQUATIONS, ORDINARY • FRACTALS • INTEGRAL EQUATIONS

BIBLIOGRAPHY

Best, E. P. (1992). On sets of fractional dimension III. London Math. Soc. 47(2), 436–454.
Bourgin, R. D. (1983). "Geometric Aspects of Convex Sets with the Radon–Nikodým Property" (Lecture Notes in Mathematics, 993), Springer-Verlag, Berlin.
Cohn, D. L. (1980). "Measure Theory," Birkhäuser, Boston and Basel.
Davis, W. J., and Phelps, R. R. (1974). The Radon–Nikodým property and dentable sets in Banach spaces. Proc. Amer. Math. Soc. 45, 119–122.
De Barra, G. (1981). "Measure Theory and Integration," Ellis Horwood, Chichester, England.
Diestel, J., and Uhl, J. J. (1977). "Vector Measures," Mathematical Surveys 15. Amer. Math. Soc., Providence, RI.
Dinculeanu, N. (1967). "Vector Measures," Pergamon, London.
Dunford, N., and Schwartz, J. T. (1958). "Linear Operators, Part I," Interscience Publications Inc., New York.
Falconer, K. J. (1985). "The Geometry of Fractal Sets," Cambridge Univ. Press, Cambridge, U.K.
Gelbaum, B. R. (1982). "Problems in Analysis," Springer-Verlag, New York.
Hewitt, E., and Stromberg, K. (1965). "Real and Abstract Analysis," Springer-Verlag, New York.
Lauwerier, H. (1991). "Fractals," Penguin Books, Baltimore.
Mattila, P. (1995). "Geometry of Sets and Measures in Euclidean Spaces, Fractals and Rectifiability," Cambridge Univ. Press, Cambridge, U.K.
Pesin, I. N. (1970). "Classical and Modern Integration Theories," Academic Press, New York.
Pfeffer, W. F. (1977). "Integrals and Measures," Dekker, New York.
Phelps, R. R. (1988). "Convex Functions. Monotone Operators and Differentiability," Lecture Notes, University of Washington, Seattle.
Rogers, C. A. (1970). "Hausdorff Measures," Cambridge Univ. Press, London and New York.
Rudin, W. (1966). "Real and Complex Analysis," McGraw-Hill, New York.
Weymann, H. (1971). Das Hausdorff-Mass von Cantormengen, Math. Ann. 193, 7–20.
Wheeden, R. L., and Zygmund, A. (1977). "Measure and Integral," Dekker, New York.
P1: GNH Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN010D-486 July 14, 2001 18:50
Nonlinear Programming
Jon W. Tolle
University of North Carolina at Chapel Hill
I. Overview
II. Theoretical Aspects
III. Computation
IV. Applications
optimality conditions. Engineers build controlling devices that perform with minimal energy expenditure and design plant schedules that maximize some measure of production efficiency. Economists and other social scientists theorize that much of human behavior and development, whether individual or collective, is guided by optimization processes. Finally, modern managerial scientists use optimization methods to organize and reduce large amounts of data to make rational decisions on policy issues in the public and private sectors.

Mathematically, the most general form of an optimization model can be characterized by a set $X$ and a real-valued function $f$ defined on $X$. The set $X$, often called the feasible set, consists of the possible realizations of the model. For example, $X$ could consist of a set of permissible trajectories for a satellite launch vehicle, the possible flight schedules for an international cargo airline, or a set of possible purchases by a consumer in a given market. Each $x \in X$ is a vector, the components of which are termed the decision variables of the model. The function $f$, called the objective function, is a measure by which the vectors in $X$ are compared. Thus for these examples, $f(x)$ might represent the energy required to traverse trajectory $x$, the total cost for the implementation when $x$ is an airline schedule, or the value of a consumer's purchase when $x$ is the vector with components that are quantities of commodities.

The problem to be solved in an optimization model is

\[
\text{minimize } f(x), \quad x \in X.
\]

That is, an $x^* \in X$ is sought such that for all other $x \in X$, $f(x^*) \le f(x)$. The fact that the problem is formulated as a minimization rather than a maximization is not important because the above is mathematically equivalent to

\[
\text{maximize } -f(x), \quad x \in X.
\]

The problem as stated is much too general to yield useful information. In order to be able to analyze the problem, more must be known about the properties of the set $X$ and the function $f$. The general field of optimization is broken down into subfields according to these properties. Nonlinear programming is one of these subfields; traditionally, it refers to the case in which the set $X$ is assumed to be functionally prescribed. That is, $X$ is the intersection of a finite number of sets each of which is the solution set of a functional equation or inequality. The equations or inequalities that prescribe the feasible set are called constraints. In an optimization model, an inequality constraint may be used to represent the limits on an available resource or may be a production quota that must be exceeded. An equality constraint might represent the requirement that a consumer spend or save all of his or her income or that a trajectory begin from a designated location.

Using a functional representation of the constraint set, the standard nonlinear programming problem, hereafter denoted NLP, can be formulated as

\[
\begin{aligned}
\text{minimize } \quad & f(x) \\
\text{subject to: } \quad & g_i(x) \le 0, \quad i = 1, \ldots, M, \\
& h_j(x) = 0, \quad j = 1, \ldots, P, \\
& x \in \mathbb{R}^N.
\end{aligned}
\]

The functions $g_i$ and $h_j$ are called the constraint functions and, like $f$, are generally assumed to be continuously differentiable. Nonlinear programming problems are distinguished from the special types of optimization problems called linear programming problems, in which the objective and constraint functions are affine. It is also conventional to separate nonlinear programming, which requires the decision vectors to be finite-dimensional, from problems such as those of optimal control, in which $X$ is a subset of an infinite-dimensional space. However, the computation of the solutions to these latter problems depends on the solution of finite-dimensional approximations (see Section IV.C).

B. The Solution of a Nonlinear Program

A vector $x^* \in X$ satisfying $f(x^*) \le f(x)$ for all $x \in X$ is called a (global) optimal solution to NLP and the corresponding number $v^* = f(x^*)$ is called the optimal value of NLP. One of the distinguishing characteristics of nonlinear programming is that local optimal solutions are possible. Hence, $x^L \in X$ is a local optimal solution if there is an $\varepsilon > 0$ such that $f(x^L) \le f(x)$ for all $x \in X \cap \{x : |x - x^L| < \varepsilon\}$ ($|z|$ represents the Euclidean length of the vector $z$). Unfortunately, for many nonlinear programs, global optimal solutions cannot be identified until all local optimal solutions are known. Hereafter, the term "optimal solution" will refer to a local solution unless specified otherwise.

The major theoretical questions to be answered concerning a given NLP relate to the existence, characterization, and stability of solutions. Existence refers to the determination of conditions on the objective and constraint functions under which global and local solutions are guaranteed to exist. The characterizations of a solution are theoretical conditions on a point $x$ that are either necessary or sufficient for it to be a global or local solution. These characterizations are important for identifying the solution(s) and its properties and in the construction of algorithms for computing the solution. Stability refers to the sensitivity of the solution with respect to the perturbation of the parameters which define the objective and constraint functions. Stability is important in the practical
use of optimization because these parameters are usually not known precisely. For instance, they might be the observed mean of historical data or they might be the result of a physical measurement with its attendant errors. It is desirable, in these cases, to have assurances that the optimal solution and optimal value are not wildly inaccurate as a result of small errors in the determination of these parameters.

In most cases, it is the numerical values of an optimal solution that are ultimately needed. Thus, the computational question of how best to numerically approximate an optimal solution is as important as the questions of theory. A computationally efficient numerical algorithm that will yield a good approximate solution to NLP must be available. Consequently, the techniques of numerical analysis and computational mathematics play a major role in nonlinear programming.

The following simple one-dimensional example illustrates the major theoretical questions in nonlinear programming:

\[
\begin{aligned}
\text{minimize } \quad & f(x) \\
\text{subject to: } \quad & x - \alpha \ge 0, \\
& x - \beta \le 0,
\end{aligned}
\]

where $f$ is twice differentiable and $0 < \alpha < \beta$. Since the objective function is continuous and $X = [\alpha, \beta]$ is compact, the existence of a global solution is guaranteed. Elementary calculus shows that any local minimum point must satisfy $f'(x) = 0$ if $\alpha < x < \beta$, $f'(x) \ge 0$ if $x = \alpha$, or $f'(x) \le 0$ if $x = \beta$. These necessary conditions for optimality are examples of the characterizations of the solution. Note that $\{\alpha < x < \beta,\ f'(x) = 0,\ f''(x) > 0\}$ is a sufficient (but not necessary) set of conditions for $x$ to be a local minimum. For an example of the ideas of stability, specify $f(x) = x e^{-x}$, $\alpha = 1/2$, and $\beta > 1$. The global optima are

\[
x^* = \begin{cases} \alpha, & \beta \le \beta^*, \\ \beta, & \beta \ge \beta^*, \end{cases}
\]

where $\beta^* > 1$ satisfies

\[
\beta^* e^{-\beta^*} = (1/2) e^{-(1/2)}.
\]

Moreover the optimal value as a function of $\beta$ is

\[
v^*(\beta) = \begin{cases} (1/2) e^{-(1/2)}, & \beta \le \beta^*, \\ \beta e^{-\beta}, & \beta \ge \beta^*. \end{cases}
\]

$v^*(\beta)$ is a continuous function of the parameter $\beta$ and

\[
\frac{dv^*}{d\beta} = \begin{cases} 0, & \beta < \beta^*, \\ (1 - \beta) e^{-\beta}, & \beta > \beta^*. \end{cases}
\]

Thus, the global optimal value is differentiable except when $\beta = \beta^*$.

C. Special Types of Nonlinear Programs

The set of nonlinear programs can also be subdivided further into classes associated with special properties of the objective and constraint functions. Some of the more common classes are described in this section.

In theory, the simplest nonlinear program is the unconstrained program (i.e., the problem in which there are no constraints), where the objective function is minimized over all of $\mathbb{R}^N$. These problems are important because they include many of the important least-squares models and also because the computational algorithms for solving them are used as the foundations for algorithms for solving the more general problems. Among the constrained nonlinear programs, the easiest ones to deal with are the convex programs. In these problems every local optimal solution is also a global optimal solution; much of the theory is analogous to that found in linear programming. Of special interest among the convex programs are the convex quadratic programs. In these problems, the objective function has the form

\[
f(x) = \tfrac{1}{2} x^t A x + a^t x,
\]

where $A$ is an $N \times N$ symmetric positive definite matrix and $a$ is a fixed $N$-vector, and all of the constraint functions are affine. As will be seen, one approach to solving a general NLP is to approximate it by a quadratic program.

Two classes of problems that are currently at the forefront of research in the field of nonlinear programming are nondifferentiable and large-scale programs. Nondifferentiable programs are those for which the objective function is only piecewise differentiable. This class has many applications, one of the most important of which is the case in which $f$ is itself the solution of an optimization problem; for example,

\[
f(x) = \max_{\alpha \in A} \{ (c_\alpha)^t x \},
\]

where $\{c_\alpha\}$ is a family of vectors. The term large-scale problems refers to problems in which some or all of the parameters $N$, $M$, and $P$ are large. The theory for this class is not changed but special computational techniques are necessary to ensure that solving such a problem is feasible in terms of computer time and storage.

Although there are important applications and results associated with each of these special types of nonlinear programs, the theory and computation presented in the remainder of this article deal primarily with the general problem; the major exception is the specialization of the theory to convex programs.
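The one-dimensional stability example with f(x) = x e^{-x} and α = 1/2 can be checked numerically. The following is a minimal sketch, not part of the original article: the function names (`beta_star`, `optimal_value`, `optimal_value_slope`) and the bisection bracket [1, 3] are illustrative assumptions. It locates the threshold β* by bisection and estimates the slope of v*(β) on either side of it by a central difference.

```python
import math

ALPHA = 0.5  # lower bound alpha used in the example


def f(x):
    """Objective from the example: f(x) = x * exp(-x)."""
    return x * math.exp(-x)


def beta_star(tol=1e-12):
    """Bisection for the root beta* > 1 of f(beta) = f(ALPHA).

    f is strictly decreasing for x >= 1, with f(1) > f(ALPHA) and
    f(3) < f(ALPHA), so the (assumed) bracket [1, 3] contains the root.
    """
    target = f(ALPHA)
    lo, hi = 1.0, 3.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # still above the target: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_value(beta):
    """v*(beta) for beta > 1: f increases on [ALPHA, 1] and decreases on
    [1, beta], so the minimum over [ALPHA, beta] is at an endpoint."""
    return min(f(ALPHA), f(beta))


def optimal_value_slope(beta, h=1e-6):
    """Central-difference estimate of dv*/dbeta away from beta*."""
    return (optimal_value(beta + h) - optimal_value(beta - h)) / (2 * h)


if __name__ == "__main__":
    b = beta_star()
    print("beta* is approximately", b)
    print("slope below beta*:", optimal_value_slope(1.2))
    print("slope above beta*:", optimal_value_slope(3.0))
```

Below β* the estimated slope is zero and above it the estimate tracks (1 − β)e^{−β}, which illustrates the kink in v*(β) at β = β*.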
FIGURE 1 Level sets of the objective function with local and global minima.
\[
L(w, \alpha) = \{x : w(x) = \alpha\},
\]
\[
L^-(w, \alpha) = \{x : w(x) \le \alpha\}.
\]

With this notation, the feasible set for NLP can be written as a finite intersection of these level sets:

\[
X = \left[ \bigcap_{i=1}^{M} L^-(g_i, 0) \right] \cap \left[ \bigcap_{j=1}^{P} L(h_j, 0) \right].
\]

is empty for each $v < v_l$. An example of the geometry is illustrated in Fig. 1.

The set of optimal solutions is simplified when the feasible set and the level sets $L^-(f, \alpha)$ are convex sets. Mathematically, a subset $C \subset \mathbb{R}^N$ is convex if, for every pair of vectors $x$ and $y$ in $C$ and all $\lambda \in [0, 1]$, the vector $\lambda x + (1 - \lambda) y$ is also in $C$. Thus a convex set is one that has no indentations or protuberances. If the feasible set and
[Figure 2: labels L(f, v), L(f, v*), x*]
the level sets are all convex then, for any optimal solution $x^l$ with optimal value $v_l$, the sets

\[
L^-(f, v) \cap X
\]

are empty for all $v < v_l$ and hence there are no local solutions that are not also global solutions. This situation is illustrated in Fig. 2.

These geometric conditions characterize the optimal solutions completely. However, they are not very useful in practice because the level sets cannot be easily mapped. In the next section, these geometric conditions are transformed into algebraic conditions, which will in turn yield qualitative and quantitative results for the nonlinear programs.

B. First-Order Conditions

In order to establish first-order optimality conditions, the assumption that the objective and constraint functions are continuously differentiable will be used. This allows the functions to be approximated linearly about a given vector, which in turn leads to a local polyhedral approximation of the feasible set.

Given a continuously differentiable function $w$ defined on $\mathbb{R}^N$ and a vector $\hat{x}$, the affine (linear) approximation to $w$ at $\hat{x}$ is

\[
\hat{w}(x) = w(\hat{x}) + \nabla w(\hat{x})^t (x - \hat{x}),
\]

where $\nabla w(\hat{x})$ is the gradient of $w$, that is, the $N$-dimensional column vector of partial derivatives of $w$ at
FIGURE 3 Linear approximation of the feasible set.
$\hat{x}$. If the objective and constraint functions in NLP are replaced by their affine approximations at a point $\hat{x} \in X$ then the resulting optimization problem is a linear program with a polyhedral feasible set $\hat{X}$, which serves as a local approximation of NLP (although the approximation may be poor). Figure 3 illustrates this approximation in a simple inequality-constrained case.

It can easily be shown that in the inequality-constrained case, $\hat{x}$ is an optimal solution to the approximating linear program if and only if the vector $-\nabla f(\hat{x})$ is a nonnegative combination of the gradients of the active constraints at $\hat{x}$ (i.e., those constraints that have value 0 at $\hat{x}$). As is seen in Fig. 4, this condition means that $\hat{f}$ cannot have a direction of decrease into the interior of $\hat{X}$ from $\hat{x}$ and hence $-\nabla f(\hat{x})$ must lie in the cone $\hat{K}$ generated by the gradients of the active inequality constraints. If equality constraints are present, the situation is slightly more complicated in that $-\nabla f(\hat{x})$ must be a combination of all of the active gradients with the coefficients of the active inequality constraint gradients being nonnegative and the coefficients of the equality constraints being unrestricted in sign; that is,

\[
\nabla f(x) + \sum_{i=1}^{M} \mu_i \nabla g_i(x) + \sum_{j=1}^{P} \omega_j \nabla h_j(x) = 0
\]

for some $\mu_i \ge 0$ and some $\omega_j$.

The question that now arises is the extent to which the optimality of $\hat{x}$ for the approximating problem is related to the optimality of $\hat{x}$ for NLP. The answer is contained in the following fundamental result of nonlinear programming.

Theorem 1 (First-Order Necessary Conditions): Let $\hat{x} \in \mathbb{R}^N$ and suppose that the set of active constraint gradients at $\hat{x}$,

\[
\{\nabla g_i(\hat{x}) : i \text{ such that } g_i(\hat{x}) = 0\} \cup \{\nabla h_j(\hat{x}),\ j = 1, \ldots, P\},
\]

is linearly independent. Then $\hat{x}$ is a local optimal solution only if there exist vectors $\hat{\mu} \in \mathbb{R}^M$ and $\hat{\omega} \in \mathbb{R}^P$ such that

\[
\begin{aligned}
\nabla f(\hat{x}) + \sum_i \hat{\mu}_i \nabla g_i(\hat{x}) + \sum_j \hat{\omega}_j \nabla h_j(\hat{x}) &= 0, && (1) \\
\hat{\mu}_i g_i(\hat{x}) &= 0, \quad i = 1, \ldots, M, && (2) \\
\hat{\mu}_i &\ge 0, \quad i = 1, \ldots, M, && (3) \\
g_i(\hat{x}) &\le 0, \quad i = 1, \ldots, M, && (4) \\
h_j(\hat{x}) &= 0, \quad j = 1, \ldots, P. && (5)
\end{aligned}
\]

The condition on the linear independence of the active constraint gradients is called a constraint qualification. It assures that the geometry at $\hat{x}$ is not so pathological
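Conditions (1)–(5) of Theorem 1 can be checked numerically at a candidate point once the gradients and constraint values there are known. The sketch below is illustrative only: the helper `kkt_holds`, the tolerance `tol`, and the two-variable test problem in `toy_problem` are assumptions introduced here, not part of the article.

```python
def kkt_holds(grad_f, g_vals, g_grads, h_vals, h_grads, mu, omega, tol=1e-8):
    """Check conditions (1)-(5) of Theorem 1 up to the tolerance tol.

    grad_f   : gradient of f at the candidate point (list of N floats)
    g_vals   : values g_i at the point; g_grads: their gradients
    h_vals   : values h_j at the point; h_grads: their gradients
    mu, omega: candidate multipliers for the g_i and the h_j
    """
    n = len(grad_f)
    # (1) stationarity: grad f + sum_i mu_i grad g_i + sum_j omega_j grad h_j = 0
    residual = list(grad_f)
    for m, grad in zip(mu, g_grads):
        for k in range(n):
            residual[k] += m * grad[k]
    for w, grad in zip(omega, h_grads):
        for k in range(n):
            residual[k] += w * grad[k]
    stationary = all(abs(r) <= tol for r in residual)
    # (2) complementary slackness: mu_i * g_i = 0
    complementary = all(abs(m * g) <= tol for m, g in zip(mu, g_vals))
    # (3) nonnegative multipliers for the inequality constraints
    nonnegative = all(m >= -tol for m in mu)
    # (4) and (5) feasibility
    feasible_ineq = all(g <= tol for g in g_vals)
    feasible_eq = all(abs(h) <= tol for h in h_vals)
    return (stationary and complementary and nonnegative
            and feasible_ineq and feasible_eq)


def toy_problem(x):
    """Hypothetical example: minimize (x1 - 2)^2 + (x2 - 1)^2
    subject to g(x) = x1 + x2 - 2 <= 0 (no equality constraints)."""
    x1, x2 = x
    grad_f = [2.0 * (x1 - 2.0), 2.0 * (x2 - 1.0)]
    g_vals = [x1 + x2 - 2.0]
    g_grads = [[1.0, 1.0]]
    return grad_f, g_vals, g_grads


if __name__ == "__main__":
    grad_f, g_vals, g_grads = toy_problem((1.5, 0.5))
    print(kkt_holds(grad_f, g_vals, g_grads, [], [], mu=[1.0], omega=[]))
```

For this toy problem the point (1.5, 0.5) with multiplier 1 satisfies all five conditions, since the constraint is active there and the gradient of f is (−1, −1). The check only verifies a given multiplier vector; finding the multipliers is a separate computation.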
It can easily be established that if $w$ is convex then the level set $L^-(w, \alpha)$ is a convex set for every $\alpha$ for which
which has a unique solution; namely, $(x^*, \omega^*)$, the global optimal solution and its multiplier. Thus, in this case
solving the NLP reduces to solving a linear system of equations.

Historically, the necessary conditions in theorem 1 are a part of classical mathematics for the case when there are only equality constraints; the $\hat{\omega}_j$ are known as Lagrange multipliers and Eq. (1) is then a direct consequence of the implicit function theorem. The inequality-constrained case was treated by Karush, Kuhn, and Tucker and the theory is sometimes referred to by their names. It should be noted that a more general necessary condition can be derived without assuming any constraint qualification. This result, called the Fritz John condition, differs from that of theorem 1 in that there is a coefficient, possibly zero, in front of $\nabla f(x)$ in Eq. (1).

C. Second-Order Conditions

In this section, it is assumed that the objective and constraint functions are twice continuously differentiable. For unconstrained nonlinear functions, second-order conditions are easily derived. Suppose $\hat{x}$ is a critical point of $f$ (i.e., $\nabla f(\hat{x}) = 0$) and let $Hf(\hat{x})$ denote the Hessian matrix of $f$ at $\hat{x}$, the symmetric matrix of second-order partial derivatives. Then $\hat{x}$ is an unconstrained local minimum of $f$ if $Hf(\hat{x})$ is positive definite and only if $Hf(\hat{x})$ is positive semidefinite.

To state analogous conditions for the constrained problem, the Lagrangian function is introduced. This function of the $N + M + P$ variables $(x, \mu, \omega)$, defined by

\[
L(x, \mu, \omega) = f(x) + \sum_i \mu_i g_i(x) + \sum_j \omega_j h_j(x),
\]

is central to the theoretical development of the subject. Conditions in Eqs. (1)–(5) show that $(\hat{x}, \hat{\mu}, \hat{\omega})$ is a critical point of $L(x, \mu, \omega)$ with respect to $x$ and $\omega$ and satisfies

\[
\frac{\partial L}{\partial \mu_i}(\hat{x}, \hat{\mu}, \hat{\omega}) = g_i(\hat{x}) \le 0.
\]

The second-order conditions for NLP are given in theorems 3 and 4. In these theorems, $I(x)$ will represent the index set of active inequality constraints at $x$; that is,

\[
I(x) = \{i : g_i(x) = 0\}.
\]

Theorem 3 (Second-Order Necessary Conditions): Let $\hat{x}$ satisfy the conditions of Theorem 1 with corresponding multipliers $\hat{\mu}$ and $\hat{\omega}$. If $\hat{x}$ is a local minimum then for any vector $z$ satisfying

\[
\begin{aligned}
\nabla g_i(\hat{x})^t z &= 0, \quad i \in I(\hat{x}), \\
\nabla h_j(\hat{x})^t z &= 0, \quad j = 1, \ldots, P,
\end{aligned}
\]

it is the case that

\[
z^t H_{xx} L(\hat{x}, \hat{\mu}, \hat{\omega}) z \ge 0,
\]

where $H_{xx} L(\hat{x}, \hat{\mu}, \hat{\omega})$ is the $N \times N$ Hessian matrix of $L$ with respect to $x$.

The second-order necessary condition requires the Hessian of the Lagrangian to be positive semidefinite on the tangent subspace to the feasible set at $\hat{x}$. In a manner analogous to that of the unconstrained case, a second-order sufficiency condition can be obtained by the more stringent requirement that the Hessian of $L$ be positive definite on this subspace.

Theorem 4 (Second-Order Sufficiency Conditions): Let $\hat{x}$, $\hat{\mu}$, and $\hat{\omega}$ satisfy conditions in Eqs. (1)–(5) of theorem 1 and suppose that for $i \in I(\hat{x})$, $\hat{\mu}_i > 0$. Further, suppose that for every nonzero $z$ satisfying

\[
\begin{aligned}
\nabla g_i(\hat{x})^t z &= 0, \quad i \in I(\hat{x}), \\
\nabla h_j(\hat{x})^t z &= 0, \quad j = 1, \ldots, P,
\end{aligned}
\]

it is the case that

\[
z^t H_{xx} L(\hat{x}, \hat{\mu}, \hat{\omega}) z > 0.
\]

Then $\hat{x}$ is an isolated local minimum of NLP.

$\hat{x}$ is an isolated local minimum if there is a neighborhood of $\hat{x}$ that contains no other local solution of NLP. There are other slightly weaker versions of the second-order sufficient conditions that do not require that $i \in I(\hat{x})$ implies $\hat{\mu}_i > 0$. However, this restriction, called strict complementary slackness, is required in many important theoretical applications, including those of the next section.

Twice differentiable functions are convex if and only if their Hessian matrices are positive semidefinite. Thus for convex programs, the Hessian of the Lagrangian is positive semidefinite and the second-order conditions are redundant, as is to be expected in light of theorem 2.

D. Stability and Duality

As stated earlier, the stability of the optimal solution and its optimal value are of major importance in applications of nonlinear programming models. Fortunately, the optimal solution and its optimal value are stable under reasonable conditions. To formally state this result, it is necessary to consider the perturbed version of NLP,

\[
\begin{aligned}
\text{minimize } \quad & \hat{f}(x, p) \\
\text{subject to: } \quad & \hat{g}_i(x, p) \le 0, \quad i = 1, \ldots, M, \\
& \hat{h}_j(x, p) = 0, \quad j = 1, \ldots, P,
\end{aligned}
\]

where $p$ is a $Q$-vector, $\hat{f}(x, 0) = f(x)$, $\hat{g}_i(x, 0) = g_i(x)$, and $\hat{h}_j(x, 0) = h_j(x)$. Furthermore, it is assumed that
$\hat{f}$, $\hat{g}_i$, and $\hat{h}_j$ are continuously differentiable functions of $p$ near $p = 0$. The vector $p$ represents the parameters of the problem. The following theorem gives conditions under which the optimal solution and multipliers are smooth functions of the perturbation.

To state the theorem, we define a regular solution of NLP to be a solution at which the linear independence of the active constraint gradients, strict complementary slackness, and the second-order sufficient conditions all hold.

Theorem 5 (Basic Perturbation Theorem): Let $x^*$ be a regular optimal solution to NLP with multipliers $\mu^*$ and $\omega^*$. Let $I(x^*)$ be the index set of active constraints at $x^*$. Then there is an $\varepsilon > 0$ and functions $x^*(p)$, $\mu^*(p)$, $\omega^*(p)$ defined and continuously differentiable on the set $E = \{p : |p| \le \varepsilon\}$ with $x^*(0) = x^*$, $\mu^*(0) = \mu^*$, and $\omega^*(0) = \omega^*$. $x^*(p)$ is a local optimal solution to the corresponding perturbed problem and $\mu^*(p)$ and $\omega^*(p)$ are its multipliers. Moreover, the second-order sufficient conditions hold at $x^*(p)$ and $I(x^*(p)) = I(x^*)$.

One of the most important cases of perturbation occurs when $p$ is an $M$-vector and $\hat{g}_i(x, p) = g_i(x) - p_i$. Under the assumptions of theorem 5, the optimal solution as a function of $p$, $x^*(p)$, is smooth and it can be shown that the optimal value as a function of $p$, $v^*(p) = f(x^*(p))$, satisfies

\[
\nabla_p v^*(0) = -\mu^*.
\]

In other words, the instantaneous rate of change in the optimal value as a function of a shift in the value of $g_i$ is the negative of the $i$th multiplier. This gives an interpretation of the multiplier in terms of the model. For example, if the $i$th constraint is a bound on a resource and the objective is measured in dollars, the value of additional units of that resource in terms of decreased optimal value is linearly approximated as $\mu_i^*$ dollars per unit. A similar result holds for perturbations of the equality constraints. As a result of this interpretation, the optimal multipliers are often called shadow prices.

The preceding observations on the properties of the multipliers lead, in linear programming, to the formulation of a dual linear program with the multipliers as the optimal solution. Similar, but much less complete, results can be obtained for nonlinear programs. The appropriate formulation involves the Lagrangian function defined earlier. It can be shown that NLP is equivalent to the min-max problem

\[
\min_{x} \max_{\mu \ge 0,\, \omega} L(x, \mu, \omega).
\]

The dual problem can then be defined as the max-min problem

\[
\max_{\mu \ge 0,\, \omega} \min_{x} L(x, \mu, \omega).
\]

If the program is convex and the Slater condition holds then $x^*$ is an optimal solution to NLP with multipliers $\mu^*$ and $\omega^*$ if and only if $(x^*, \mu^*, \omega^*)$ is a saddle point for $L(x, \mu, \omega)$, that is,

\[
L(x^*, \mu, \omega) \le L(x^*, \mu^*, \omega^*) \le L(x, \mu^*, \omega^*)
\]

for all $x$, $\mu \ge 0$, and $\omega$. Thus the values of the primal and dual problems are equal at optimality. This result fails to hold under less restrictive hypotheses on the problem NLP.

III. COMPUTATION

A. Basic Concepts

Finding a numerical approximation to the solution of a nonlinear program can be a difficult task. If the number of variables is large or the functions involved are highly nonlinear, the computation can be time consuming, even on the fastest computers. Moreover, in the case of nonconvex programs, the presence of local solutions can make the determination of a global solution problematic. The methods and procedures described below are concerned with approximating local solutions only.

The algorithms for approximating the solutions of NLP are all iterative in nature; that is, given an initial estimate of the solution, $x^0$, a sequence $\{x^k\}$ of approximate solutions is generated with each iterate, $x^k$, being determined successively from information gathered at the preceding iterations. It is desired that at each iteration, the new iterate is a better approximation, in some sense, of a local solution than the previous ones. In theory, the iterations should converge to a local solution, say $x^*$. In most algorithms approximations to the optimal multipliers, $(\mu^k, \omega^k)$, are computed along with $x^k$ at each iteration and the algorithms terminate when the approximate solutions and multipliers satisfy the first-order necessary conditions to some predetermined degree of accuracy.

In constrained optimization, in addition to minimizing $f$, the optimal solution must also satisfy the feasibility requirements. It is possible (even probable when there are nonlinear equality constraints) that an iterate will not be feasible. Ideally, an algorithm should be designed in such a way that, given a current iterate $x^k$, the new iterate $x^{k+1}$ will be no more infeasible than $x^k$ and will also satisfy $f(x^{k+1}) \le f(x^k)$. Unfortunately, this is not always possible and therefore a successful algorithm should balance the goals of feasibility and decreasing $f$. In the algorithms described in this article, the iterates are of the form

\[
x^{k+1} = x^k + \alpha_k d^k,
\]
where the vector $d^k$ gives the direction in which a “step” is taken and the scalar $\alpha_k$, called the step length parameter, determines how far in this direction the next iterate lies. Another approach, called a trust region method, can also be employed to determine the step from $x^k$ to $x^{k+1}$; a reference to these methods can be found in the Bibliography.

In judging the effectiveness of an algorithm, there are three major criteria: (1) robustness, (2) rate of convergence, and (3) efficiency. The first two terms deal with theoretical issues. The term “robustness” refers to the likelihood that the algorithm will yield a sequence of iterates that converge, in theory, to a local solution regardless of the starting point $x^0$, whereas “rate of convergence” refers to the speed with which the iterates converge, for example, how fast the theoretical error terms, $|x^k - x^*|$, tend to zero. Finally, “efficiency” is concerned with practical considerations in implementing the algorithm; it includes such problems as computer time and storage requirements and numerical stability. The descriptions that follow discuss only the first two of these criteria because the third is very dependent on the particular platform and software used.

B. Unconstrained Optimization Algorithms

Any description of algorithms for solving NLP must begin with the computational algorithms for unconstrained optimization problems because the latter are fundamental to the design of the former. The algorithms for unconstrained problems are much less complex because there is no question of feasibility that must be taken into account in the choice of step direction and length. There are two basic approaches to solving the unconstrained problem to be discussed here: the basic descent method and variants of Newton’s method. The former emphasizes decreasing $f$ while the latter attempts to solve the necessary conditions, $\nabla f(x) = 0$. Each is discussed briefly, as are hybridizations of the two approaches that attempt to incorporate their best properties.

For the descent method, the step direction is a direction $d^k$ satisfying

$$\nabla f(x^k)^t d^k < 0. \qquad (6)$$

Movement in this direction, called a descent direction, will, at least for a short distance, decrease $f$, and hence $\alpha_k$ can be chosen so that

$$f(x^{k+1}) = f(x^k + \alpha_k d^k) < f(x^k).$$

The specific choice $d^k = -\nabla f(x^k)$ gives the direction of greatest local rate of decrease in $f$ and leads to the steepest descent method. In practice, $\alpha_k$ is chosen so as to minimize a polynomial approximation to $\varphi(\alpha) = f(x^k + \alpha d^k)$ subject to the need to reduce the value of $f$. Under fairly reasonable conditions, these descent methods yield a convergent sequence of iterates and the limit point satisfies the equation $\nabla f(x^*) = 0$. These methods are fairly robust, their convergence not depending on the initial iterate $x^0$. However, the convergence rate of these descent methods is typically very slow and, consequently, they are ill-suited for general use.

The second general approach is Newton’s method and its modifications. The pure Newton method determines the step, $d^k$, for solving $\nabla f(x) = 0$ by

$$d^k = -[Hf(x^k)]^{-1} \nabla f(x^k),$$

where $Hf(x^k)$ is the Hessian matrix of $f$ at $x^k$, and requires $\alpha_k = 1$. It is well known that if the initial iterate $x^0$ is sufficiently close to a local solution $x^*$ at which the Hessian $Hf(x^*)$ is nonsingular, the iterates will converge to $x^*$ very rapidly. In particular, they will converge at a quadratic rate, which roughly means that the number of decimal places of accuracy is doubled at each step. The biggest drawback to the use of Newton’s method is its lack of robustness. Unless the vector $x^0$ is close to a local solution, these iterates may fail to converge or else converge to a nonminimum point.

There are a number of methods designed to obtain both the robustness of descent methods and the rapid local convergence of Newton’s method. If $f$ is a convex function, its Hessian matrix is positive definite and the Newton step is then also a descent step. Applied with a line search to determine $\alpha_k$, this approach, called a damped Newton method, can both be robust and yield rapid local convergence. For nonconvex problems a standard approach for obtaining robustness without sacrificing rapid convergence is to use a quasi-Newton or secant method. In this type of algorithm, the Hessian matrix, $Hf(x^k)$, is approximated by a positive definite matrix, $B_k$, and the step direction is given by

$$d^k = -[B_k]^{-1} \nabla f(x^k).$$

Because $B_k$ is positive definite, the direction $d^k$ is a descent direction [satisfies Eq. (6)] and a line search procedure can be used to determine $\alpha_k$. Because it is a descent method, this algorithm will be robust, and if the $B_k$ are relatively good approximations to the Hessians of $f$, the convergence rate should be close to that of Newton’s method. The class of matrices that satisfy the generalized secant condition

$$B_{k+1}(x^{k+1} - x^k) = \nabla f(x^{k+1}) - \nabla f(x^k) \qquad (7)$$

provide good approximations to the Hessian of $f$ at $x^{k+1}$. These approximations are implemented in an algorithm by an “updating formula” of the form
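As an illustration only, one widely used rank-two update that satisfies the secant condition (7) is the BFGS formula; whether it is the formula the article goes on to give is an assumption. The sketch below combines it with a backtracking line search in a small quasi-Newton loop. The helper names (`dot`, `matvec`, `solve2`, `bfgs_update`, `quasi_newton`), the two-variable test function, and the constants in the line search are all hypothetical choices made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def solve2(B, r):
    """Solve the 2x2 system B d = r by Cramer's rule (illustration only)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(r[0] * B[1][1] - B[0][1] * r[1]) / det,
            (B[0][0] * r[1] - B[1][0] * r[0]) / det]

def bfgs_update(B, s, y):
    """Rank-two BFGS update; the result satisfies B_new s = y, i.e. Eq. (7)."""
    Bs = matvec(B, s)
    ys, sBs = dot(y, s), dot(s, Bs)
    n = len(s)
    return [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs
             for j in range(n)] for i in range(n)]

def quasi_newton(f, grad, x, iters=200, tol=1e-8):
    B = [[1.0, 0.0], [0.0, 1.0]]           # initial positive definite B_0
    for _ in range(iters):
        g = grad(x)
        if dot(g, g) ** 0.5 < tol:
            break
        d = solve2(B, [-g[0], -g[1]])      # d = -B^{-1} grad f(x), a descent direction
        alpha = 1.0
        # Backtracking line search: halve alpha until f decreases sufficiently.
        while (f([x[0] + alpha * d[0], x[1] + alpha * d[1]])
               > f(x) + 1e-4 * alpha * dot(g, d)):
            alpha *= 0.5
        x_new = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
        s = [x_new[0] - x[0], x_new[1] - x[1]]
        y = [a - b for a, b in zip(grad(x_new), g)]
        if dot(y, s) > 1e-12:              # update only when it keeps B positive definite
            B = bfgs_update(B, s, y)
        x = x_new
    return x

# Hypothetical test problem: a convex quadratic with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
x_star = quasi_newton(f, grad, [0.0, 0.0])
```

Setting `B` to the exact Hessian and skipping the update would turn the same loop into a damped Newton method, while keeping `B` equal to the identity would give steepest descent.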
and to use this information to generate the next iterate. These methods, as typified by the gradient projection and reduced gradient algorithms, are comparable to the descent method described for unconstrained optimization and hence are not competitive for the more complicated nonlinear constrained problems.

A more general approach that uses higher order information and has been shown to be particularly effective is the sequential quadratic programming algorithm. In this type of algorithm an approximation to NLP is constructed in which the constraints are approximated linearly and a quadratic approximation to the Lagrangian function is employed as an objective function. Specifically, if $x^k$ is a current iterate for NLP, not necessarily feasible, with corresponding multiplier approximations $\mu^k$ and $\omega^k$, the following quadratic programming problem results:

$$\text{minimize} \quad \tfrac{1}{2}\, d^t B_k d + \nabla_x L(x^k, \mu^k, \omega^k)^t d$$

$$\text{subject to:} \quad \nabla g_i(x^k)^t d + g_i(x^k) \le 0, \quad i = 1, \ldots, M,$$

$$\qquad\qquad\quad\; \nabla h_j(x^k)^t d + h_j(x^k) = 0, \quad j = 1, \ldots, P,$$

where $B_k$ is the Hessian of the Lagrangian with respect to $x$ at $(x^k, \mu^k, \omega^k)$, $H_{xx}L(x^k, \mu^k, \omega^k)$, or an approximation thereof. If this problem is solved to obtain a solution $d^k$ with associated multipliers $y^k$ and $z^k$, then the new iterate for NLP is given by

$$x^{k+1} = x^k + \alpha_k d^k,$$

for an appropriate choice of step length $\alpha_k$. Updates for the multipliers are given by

$$\mu^{k+1} = y^k \quad \text{and} \quad \omega^{k+1} = z^k.$$

The use of the Lagrangian in the objective function permits quadratic effects in the constraints to have an effect on the iteration.

As a justification for this approach, it can be seen that if the true Hessian of the Lagrangian is used in a problem with equality constraints only, then the iterate $(x^{k+1}, \omega^{k+1})$ is identical to that obtained by using the Newton step to find a solution of the first-order conditions of NLP:

$$\nabla f(x) + \sum_{j=1}^{P} \nabla h_j(x)\,\omega_j = 0$$

In the actual implementations of the algorithm, the matrices $\{B_k\}$ are often quasi-Newton approximations just as in the algorithms for unconstrained optimization; that is, they satisfy the conditions:

$$B_{k+1}(x^{k+1} - x^k) = \nabla_x L(x^{k+1}, \mu^{k+1}, \omega^{k+1}) - \nabla_x L(x^k, \mu^{k+1}, \omega^{k+1}).$$

These matrices are updated by a rank-two matrix at each iteration and are usually chosen to be positive definite. Since the true Hessian is not positive definite except on a subspace of $\mathbb{R}^N$ (see theorem 4), the use of positive definite quasi-Newton approximations will usually lead to a slower rate of convergence. Another aspect of this algorithm that requires careful implementation is the determination of the step length parameter $\alpha_k$. In unconstrained optimization, the parameter is chosen so that the objective function is decreased at each step. As was described in the first section of this article, in constrained optimization there is also the requirement of achieving feasibility, or at least decreasing infeasibility. At any step this may be inconsistent with decreasing the objective function and therefore it is not clear how the choice of $\alpha_k$ should be made. This is where the merit function, mentioned in the preceding section, comes into play. In the sequential quadratic programming method, a decrease in a merit function is used to determine the step length parameter for a given iterate. The choice of merit function depends upon the particular version of the algorithm but is usually taken to be one of the standard penalty functions. For example, if the $L_1$ penalty function is used, then at a given step $\alpha_k$ is chosen so that

$$P(x^k + \alpha_k d^k; \rho) < P(x^k; \rho).$$

Because $P$ is decreased at each step and for large $\rho$ a minimum of $P$ is a minimum of NLP, this choice of $\alpha_k$ is justified. However, the value of $\rho$ must generally be adjusted in the algorithm to assure that this property holds.

In spite of the complications involved in its implementation, versions of the sequential quadratic programming algorithm are considered the most effective general purpose algorithms currently available for solving the NLP.

E. Interior Point Methods
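As a rough illustration of the kind of iteration treated in this section, the sketch below applies Newton steps to a perturbed first-order system (stationarity, $g(x) + z = 0$, and complementarity $\mu z = \beta$) while driving $\beta$ toward zero. Everything specific here is an assumption made for the example: the one-variable problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$, with solution $x = 1$, $\mu = 2$), the starting point, the 0.995 fraction-to-boundary rule, and the factor 0.2 by which $\beta$ is reduced. No merit function is used, which a practical implementation would need.

```python
def interior_point_1d(iters=20):
    """Primal-dual sketch for: minimize (x - 2)^2 subject to g(x) = x - 1 <= 0."""
    x, z, mu, beta = 0.0, 1.0, 1.0, 1.0    # z > 0 and mu > 0 at the start
    for _ in range(iters):
        # Right-hand sides of the Newton system.
        r1 = -(2.0 * (x - 2.0) + mu)       # -grad_x L, with L = f + mu * g
        r2 = -((x - 1.0) + z)              # -(g(x) + z)
        r3 = beta - z * mu                 # beta - z * mu
        # Here the Hessian is 2 and the Jacobian of g is 1, so the 3x3 system
        #   2 dx + du = r1,  dx + dz = r2,  z du + mu dz = r3
        # can be reduced by substitution.
        dx = (r1 - (r3 - mu * r2) / z) / (2.0 + mu / z)
        dz = r2 - dx
        du = (r3 - mu * dz) / z
        # Step length that keeps z and mu strictly positive.
        alpha = 1.0
        if dz < 0.0:
            alpha = min(alpha, 0.995 * (-z / dz))
        if du < 0.0:
            alpha = min(alpha, 0.995 * (-mu / du))
        x, z, mu = x + alpha * dx, z + alpha * dz, mu + alpha * du
        beta *= 0.2                        # move down the central path
    return x, z, mu

x_ip, z_ip, mu_ip = interior_point_1d()
```

For this problem the iterates approach the constraint boundary from the interior, with the slack `z_ip` shrinking in proportion to `beta`.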
form of the first order conditions. A basic version of the latter development for the case where there are only inequality constraints is given here; more examples of the method can be found in the Bibliography.

The first order system to be solved in this method is the system of equations derived from Eqs. (1)–(5), in which a slack variable $z$ has been added to convert the inequalities to equalities:

$$\nabla f(x) + \sum_{i=1}^{M} \nabla g_i(x)\,\mu_i = 0,$$

$$g(x) + z = 0,$$

$$\mu_i z_i = \beta, \quad i = 1, \ldots, M,$$

$$z \ge 0,$$

$$\mu \ge 0.$$

This system is perturbed because the complementary slackness conditions are not satisfied at the zero level but at the $\beta$ level for some $\beta > 0$. As in barrier function methods, it is assumed that there is a family of solutions $(x(\beta), z(\beta), \mu(\beta))$ to this set of equations. Under certain assumptions (usually convexity of the $f$ and $g_i$ functions), this set of solutions defines a curve called the central path that tends to a solution of NLP as $\beta \to 0$. The idea here is to use Newton steps to solve this system while decreasing $\beta$ towards zero. Given a current iterate, $(x^k, z^k, \mu^k)$ with $z^k > 0$ and $\mu^k > 0$, the set of equations becomes

$$\begin{bmatrix} H_{xx}L(x^k, \mu^k) & Jg(x^k)^t & 0 \\ Jg(x^k) & 0 & I \\ 0 & Z_k & U_k \end{bmatrix} \begin{bmatrix} dx \\ du \\ dz \end{bmatrix} = \begin{bmatrix} -\nabla_x L(x^k, \mu^k) \\ -g(x^k) - z^k \\ \beta e - Z_k \mu^k \end{bmatrix},$$

where $Jg(x^k)$ is the Jacobian of the vector function $g$, $Z_k$ and $U_k$ are the diagonal matrices with diagonals that are $z^k$ and $\mu^k$, $e$ is the vector of ones, and $L$ is the Lagrangian function. The iterates are now updated by the formulas:

$$x^{k+1} = x^k + \alpha_k\, dx,$$

$$z^{k+1} = z^k + \alpha_k\, dz,$$

$$\mu^{k+1} = \mu^k + \alpha_k\, du,$$

where $\alpha_k$ is chosen to maintain the slack variables and multiplier variables as positive and to decrease an appropriate merit function. The key issue in an implementation of this algorithm is how progress toward the point on the central path corresponding to $\beta$ is mixed with decrease of $\beta$. As in the case of sequential quadratic programming, an ap-

IV. APPLICATIONS

A. Nonlinear Models

The description of actual nonlinear optimization models is hindered by the fact that the nonlinearities that occur in the model are usually due to technical theoretical concepts special to the particular field and hence are not easily explained to the nonspecialist. In other cases the objective or constraint functions may have little or no theoretical basis but be nonlinear functions that have been fit to historical data. Linear optimization models in production, blending, and network problems are discussed under the topic of linear programming. Many nonlinear programming models are generalizations of these linear problems in which a more accurate simulation of reality is obtained by allowing nonlinearities in the objective and constraint functions.

In models for which the objective or constraint function is a cost function, nonlinearities very often occur when economies of scale are present (i.e., when the per unit cost of a particular item decreases as the amount purchased or produced increases). In blending problems, a particular quality of the blend, for instance, the octane rating of gasoline, is a nonlinear function of the components of the mixture. In network flow problems, nonlinearities may represent increasing per unit cost of shipment as a function of the flow from node to node. Another type of nonlinearity results from the modeling of a learning process in production models; that is, the efficiency of production increases as production rises.

B. Statistical Applications

One of the most important uses of nonlinear programming is in the estimation of the parameters for a statistical distribution using the observed data. This is called maximum likelihood estimation. A related problem is the familiar regression problem in which parameters, say $\theta_i$, $i = 1, \ldots, N$, are estimated so that a particular nonlinear function $w(\theta)$ “best” fits a set of observed data; that is, given values of a control variable $z_k$, $k = 1, \ldots, L$, and a sequence of corresponding experimental responses $y_k$, $k = 1, \ldots, L$, values of $\theta$ are determined so that some measure of the difference vector $e \in \mathbb{R}^L$, with

$$e_k = y_k - w(\theta, z_k), \quad k = 1, \ldots, L,$$

is minimized.

The most common measure is the Euclidean norm of $e$, in which case the optimization problem is

$$\min_{\theta} \; \sum_{k=1}^{L} [y_k - w(\theta, z_k)]^2$$
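A minimal sketch of such a least-squares fit follows, under stated assumptions: the one-parameter model $w(\theta, z) = e^{-\theta z}$, the synthetic noise-free data generated with $\theta = 0.5$, and the simple gradient descent with a backtracking line search are all hypothetical choices made for illustration, not a recommended estimation procedure.

```python
import math

def residuals(theta, z, y):
    # e_k = y_k - w(theta, z_k) with the hypothetical model w = exp(-theta * z)
    return [yk - math.exp(-theta * zk) for zk, yk in zip(z, y)]

def sum_squares(theta, z, y):
    return sum(e * e for e in residuals(theta, z, y))

def gradient(theta, z, y):
    # d/dtheta of sum e_k^2, using de_k/dtheta = z_k * exp(-theta * z_k)
    return sum(2.0 * e * zk * math.exp(-theta * zk)
               for e, zk in zip(residuals(theta, z, y), z))

def fit(z, y, theta=1.0, iters=500, tol=1e-10):
    for _ in range(iters):
        g = gradient(theta, z, y)
        if abs(g) < tol:
            break
        alpha, s0 = 1.0, sum_squares(theta, z, y)
        # Backtracking: halve the step until the sum of squares decreases enough.
        while sum_squares(theta - alpha * g, z, y) > s0 - 1e-4 * alpha * g * g:
            alpha *= 0.5
        theta -= alpha * g
    return theta

# Synthetic data: control values z_k and responses y_k generated with theta = 0.5.
z_data = [0.0, 1.0, 2.0, 3.0, 4.0]
y_data = [math.exp(-0.5 * zk) for zk in z_data]
theta_hat = fit(z_data, y_data)
```

Because the data are noise-free, the minimum of the sum of squares is zero and the fitted parameter should recover the value used to generate the data.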
P1: GSS Final Pages
Encyclopedia of Physical Science and Technology En011l-504 July 25, 2001 16:52
Prime number $p$ is a prime if $p \in \mathbb{Z}$, $p > 1$, and if $a > 0$ and $a$ divides $p$ then $a = 1$ or $a = p$.

Rational function A real function of the form $F/G$ where $F$ and $G$ are polynomials.

Rational number A number of the form $a/b$ where $a, b \in \mathbb{Z}$, $(a, b) = 1$, and $b \ne 0$. $\mathbb{Q}$ denotes the set of all rational numbers with the usual operations and rules of addition, subtraction, multiplication, and division. It is a field.

Unique factorization Every element in $\mathbb{Z}$ can be represented uniquely as a product of primes. An integral domain with this property (using irreducible elements rather than primes) is called a unique factorization domain.

THE CENTRAL CONCERNS of number theory are the properties of the integers and rational numbers. Questions relating to individual real or complex numbers, integer matrices, algebraic number fields, points on curves, lattice points in convex regions, and similar entities are also considered. The subject has a very long history, as long as mathematics itself, and is as actively researched today as at any time in the past.

Number theory is not an organized theory in the usual sense but is a vast collection of individual topics and results, with some coherent subtheories and a long list of unsolved problems. Some of these topics are highly specialized, for example, giving a single solution to a Diophantine equation, whereas others have wide applicability, the Euclidean algorithm being an example. Methods range over virtually all mathematical disciplines, although group and field theory, linear algebra, and both real and complex analysis are the most commonly used. Problems and results are studied entirely for their own sake; the fact that many theorems are useful elsewhere is incidental.

In this article we shall treat a few of the more important topics not considered in the previous article. For each of these topics major results have been established recently, and this progress is likely to continue in the future.

The reader is referred to the article quoted above for an account of the history of the subject and the basic definitions and results concerning the properties of the integers, in particular the fundamental theorem of arithmetic, prime numbers, greatest common divisors, and congruences. The fundamental theorem states that every integer $n > 1$ can be expressed uniquely (except for the order of the factors) as a product of prime numbers. The greatest common divisor of two integers $a$ and $b$ is denoted by $(a, b)$; it can be found using the Euclidean algorithm. The basic results of congruence theory concern the conditions for the solution of linear and quadratic congruences and include the law of quadratic reciprocity of C. F. Gauss (1777–1855).

I. ALGEBRAIC NUMBER FIELDS

Algebraic numbers and integers are generalizations of rational numbers and integers, and algebraic number fields are extensions of the rational field $\mathbb{Q}$. In both cases many properties are preserved in the new situations, but not all; for example, unique factorization is often lost. Although these new notions are important in their own right, they are also important because many problems concerning the rational numbers are best treated in this more general context. For example, to show that one of Fermat’s equations (see Section II.A.5) has no rational solutions we work in an algebraic extension of $\mathbb{Q}$ in which the left-hand side of the equation can be factorized into linear factors. We shall define these new notions now and discuss their relationship to the rational case.

Definition: A complex number $\alpha$ is called an algebraic number if rational numbers $c_1, \ldots, c_n$ can be found to satisfy the polynomial equation

$$\alpha^n + c_1 \alpha^{n-1} + \cdots + c_n = 0.$$

The positive integer $n$ is called the degree of $\alpha$ provided that $\alpha$ does not satisfy another polynomial equation whose degree is less than $n$. If each $c_i$ ($i = 1, \ldots, n$) is a rational integer (i.e., each $c_i \in \mathbb{Z}$) then $\alpha$ is called an algebraic integer.

For example, the complex numbers $2/3$, $(2 + i)/3$, and $2^{1/3}$ are algebraic numbers, and $5$, $\sqrt{-5}$, and $(1 + \sqrt{5})/2$ are algebraic integers. (Note that the last number is an algebraic integer because it satisfies the equation $x^2 - x - 1 = 0$.) The nonalgebraic numbers, that is the transcendental numbers, are discussed in Section IV.

The algebraic number fields are defined as follows. Let $\alpha$ be an algebraic number of degree $n$, and let $\mathbb{Q}(\alpha)$ denote the set of all complex numbers of the form

$$b_0 + b_1 \alpha + \cdots + b_{n-1} \alpha^{n-1},$$

where each $b_i \in \mathbb{Q}$. It can be shown that this set is closed under the usual operations of complex addition, subtraction, multiplication, and division and so is a field. It is called an algebraic number field of degree $n$ over $\mathbb{Q}$ and its algebraic structure is similar to that of $\mathbb{Q}$ itself.

Given a field $K = \mathbb{Q}(\alpha)$, let $O_K$ denote the set of algebraic integers in $K$. This set is closed under sums and products and so is an integral domain called the ring of integers in $K$. If $\alpha$ is an algebraic integer, we also define the set $\mathbb{Z}[\alpha]$ to be the collection of all expressions $g(\alpha)$ where $g$ is a polynomial with rational integer
coefficients. Each element of $\mathbb{Z}[\alpha]$ is an algebraic integer, and so $\mathbb{Z}[\alpha] \subseteq O_K$. Note that this inclusion can be strict. For example, if $\alpha = \sqrt{5}$ then $\mathbb{Z}[\alpha] = \{a + b\sqrt{5} : a, b \in \mathbb{Z}\}$, but as noted above, the number $(1 + \sqrt{5})/2$ is an algebraic integer; hence it belongs to the ring of integers of the field $\mathbb{Q}(\sqrt{5})$ but not to the domain $\mathbb{Z}[\sqrt{5}]$.

It can be shown that for each algebraic number field $K = \mathbb{Q}(\alpha)$ of degree $n$ its ring of integers $O_K$ has a $\mathbb{Z}$-basis $\{\gamma_1, \ldots, \gamma_n\}$. That is, algebraic integers $\gamma_1, \ldots, \gamma_n$ can be found so that each integer in $O_K$ can be expressed in the form $c_1\gamma_1 + \cdots + c_n\gamma_n$ where $c_i \in \mathbb{Z}$, $i = 1, \ldots, n$. In the example above, the $\mathbb{Z}$-basis for $O_K$, where $K = \mathbb{Q}(\sqrt{5})$, is $\{1, (1 + \sqrt{5})/2\}$.

A “number theory” has been developed for these rings of integers $O_K$; it is similar to that for the rational case, but two main aspects are different: units and factorization. A unit $\varepsilon$ in an algebraic number field $K$ is an algebraic integer that divides 1 (that is, $\varepsilon\varepsilon' = 1$ for some $\varepsilon' \in O_K$). In $\mathbb{Z}$ the only units are $1$ and $-1$, but most algebraic number fields contain infinitely many units. For example, the units of the field $\mathbb{Q}(\sqrt{2})$ include $(3 + 2\sqrt{2})^m$ for each $m \in \mathbb{Z}$.

An irreducible element, which is the generalization of the notion of a prime number in $\mathbb{Z}$, is defined as follows: $\xi$ is irreducible (over $K$) if $\xi \in O_K$, $\xi$ is not a unit or zero, and whenever we have $\xi = \beta\gamma$, and $\beta$ and $\gamma$ belong to $O_K$, then either $\beta$ or $\gamma$ is a unit. Clearly, each member of $O_K$ can be written as a product of a unit and a number of irreducible elements; but in many cases this representation is not unique, that is, $K$ does not have unique factorization. For example, if $K = \mathbb{Q}(\sqrt{10})$, we have

$$39 = 3 \times 13 = (7 + \sqrt{10})(7 - \sqrt{10}),$$

and $3$, $13$, $7 + \sqrt{10}$, and $7 - \sqrt{10}$ are all irreducible elements in this field. It is an open problem to list all algebraic number fields that have unique factorization. One important case has been settled recently; that is the imaginary quadratic fields $\mathbb{Q}(\sqrt{n})$ where $n$ is a negative integer. The only fields in this class that have unique factorization are those where $n = -1, -2, -3, -7, -11, -19, -43, -67$, and $-163$ (see Section IV).

The lack of unique factorization in many number fields is a severe drawback; it can be partially retrieved by introducing ideals into the theory. Let $K$ be an algebraic number field with ring of integers $O_K$. An ideal $I$ is a nonempty subset of $O_K$ with the properties: if $a, b \in I$ then $a - b \in I$, and if $a \in I$ and $c \in O_K$ then $ac \in I$. Further, if $I$ and $J$ are ideals we define the product ideal $IJ$ to be the set of all finite sums $\sum a_i b_j$, where $a_i \in I$ and $b_j \in J$. Also $I$ is called a prime ideal when $I \ne O_K$ and, if $J_1$ and $J_2$ are ideals with $J_1 J_2 \subseteq I$, then either $J_1 \subseteq I$ or $J_2 \subseteq I$. It can be shown that each ideal $I$ where $I \ne O_K$ ($O_K$ acts as the identity ideal) can be represented uniquely in the form $I = P_1 \cdots P_n$ where each $P_i$ is a prime ideal.

There is a close connection between the arithmetic of $O_K$ and the arithmetic of these ideals. This is illustrated by the following important result. An ideal $I$ is called principal if $I = \{at : a \in O_K\} = \langle t \rangle$ for some fixed $t \in O_K$; $t$ is called the generator (of $I$) in $O_K$. We have: $O_K$ is a unique factorization domain if and only if each ideal in $O_K$ is principal. To illustrate this theory we shall consider two examples.

(a) Let $K = \mathbb{Q}(i)$ where $i = \sqrt{-1}$. In this case the ring of integers is $O_K = \{a + bi : a, b \in \mathbb{Z}\}$; it is known as the set of Gaussian integers. This is a unique factorization domain and so all ideals are principal. For instance, the ideal $\langle 2, 3 + 3i \rangle = \{2a + (3 + 3i)b : a, b \in O_K\}$ equals the principal ideal $\langle 1 + i \rangle$, as $1 + i$ divides both $2$ and $3 + 3i$ in the Gaussian integers.

(b) In our second example, let $K = \mathbb{Q}(\sqrt{10})$. Its ring of integers does not have unique factorization, and it has nonprincipal ideals. For instance, as noted above, $39 = 3 \times 13 = (7 + \sqrt{10})(7 - \sqrt{10})$, and these last four entries are irreducible in $O_K$. Further, the ideal $\langle 3, 7 + \sqrt{10} \rangle$ is not principal because there is no nonunit $t \in O_K$ that divides both $3$ and $7 + \sqrt{10}$. For the ideals we have

$$\langle 3 \rangle = \langle 3, 7 + \sqrt{10} \rangle \langle 3, 7 - \sqrt{10} \rangle$$

$$\langle 13 \rangle = \langle 13, 7 + \sqrt{10} \rangle \langle 13, 7 - \sqrt{10} \rangle$$

$$\langle 7 \pm \sqrt{10} \rangle = \langle 3, 7 \pm \sqrt{10} \rangle \langle 13, 7 \pm \sqrt{10} \rangle$$

and the ideal $\langle 39 \rangle$ can be written uniquely as the product of the four prime ideals $\langle 3, 7 + \sqrt{10} \rangle$, $\langle 3, 7 - \sqrt{10} \rangle$, $\langle 13, 7 + \sqrt{10} \rangle$, and $\langle 13, 7 - \sqrt{10} \rangle$.

Algebraic number theory can be viewed in two ways. In the first it is a branch of modern algebra and provides some good examples of structures for that theory to work with. In the second way, it provides a more general context in which some problems concerning the rational numbers and integers, especially Diophantine equation problems, can be viewed. For a good introduction to the whole subject see Stewart and Tall (1979).

II. DIOPHANTINE EQUATIONS

This topic is the largest and most important in number theory; it concerns the solution of polynomial equations in some specified number system, often the integers $\mathbb{Z}$ or the rational numbers $\mathbb{Q}$. The name refers to the early Greek mathematician Diophantus, who was working in Alexandria in the middle of the third century. His surviving books show that he had a highly developed knowledge of the solution of equations in integers, especially those involving sums of squares. Nowadays, although some
methods recur, a large array of ideas and arguments are used; this is particularly true when integer solutions are sought. When solutions in an algebraic number field (for example $\mathbb{Q}$) are required, a more organized theory based on ideas from algebraic geometry has been developed; we shall discuss this in the next section. A good survey of the whole topic can be found in Mordell (1969).

We begin by listing some of the major results in Diophantine equation theory. The list is by no means complete, but it will give the reader some idea of the range of theorems and methods used.

A. Some Major Results in Diophantine Equation Theory

1. Linear Diophantine Equations

If $a, b, n \in \mathbb{Z}$, then the equation

$$ax + by = n$$

has a solution in integers $x$ and $y$ if and only if $(a, b) \mid n$, and if $x_0, y_0$ is a solution then the general solution is given by $x_0 + tb/(a, b)$, $y_0 - ta/(a, b)$ where $t \in \mathbb{Z}$. The Euclidean algorithm provides a good method for finding solutions of these equations. This result is one of the most useful in the whole of number theory.

2. Pell’s Equation and Quadratic Forms

Suppose $d$ is a positive integer and not a square; then the equation

$$x^2 - dy^2 = 1$$

has infinitely many integer solutions $x, y$ generated from a fundamental solution $x_0, y_0$ using the relation $x + y\sqrt{d} = (x_0 + y_0\sqrt{d})^n$ for $n \in \mathbb{Z}$. The fundamental solution can be found using the continued fraction expansion for $\sqrt{d}$ (see Section IV). It is an accident that Pell’s name has been attached to this equation; J.-L. Lagrange (1736–1813) gave the first proof of its solvability.

A quadratic form (over $\mathbb{Z}$) is a function $F: \mathbb{Z}^n \to \mathbb{Z}$ given by the equation

$$F(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j,$$

where, for $1 \le i, j \le n$, $a_{ij}$ is an integer satisfying $a_{ij} = a_{ji}$. An extensive theory has been developed for solutions of equations of the form

$$F(x_1, \ldots, x_n) = m.$$

It relies to some extent on the theory of Pell’s equation discussed above. The results are generally straightforward, if a little complicated to state. Equations can have either a finite or an infinite number of solutions (this is related to the underlying geometry); in the latter case the solutions are generated by a finite set of fundamental solutions. Accounts are given in Rose (1999) [two-variable integer case] and Cassels (1978) [general field and integral domain cases].

3. Thue–Siegel–Roth Theorem

This is an example of a major class of equations that have only finitely many solutions. One version of the theorem is as follows: Suppose $f$ is an irreducible homogeneous polynomial, without multiple roots, in the rational number variables $x$ and $y$, and having degree $n > 2$. Then if the equation $f(x, y) = m$, $m \in \mathbb{Z}$, is soluble, it has only finitely many solutions. The proof uses Diophantine approximation theory (see Section IV). T. A. Skolem (1887–1963) has given an ingenious method for solving some of these equations; see Borevich and Shafarevich (1966). Also, G. Faltings has greatly extended this result; see Section III.

4. Goldbach Conjecture

Here, we look for prime number solutions only; the result has not yet been fully established. C. Goldbach (1690–1764) conjectured that every even positive integer larger than 2 can be expressed as a sum of two prime numbers. For example, $10 = 7 + 3$, $100 = 53 + 47$, $1000 = 509 + 491$, etc. At the present time, no counterexample has been found, and J.-R. Chen has proved that every large even integer is the sum of a prime and a number that has at most two prime factors. Also, I. M. Vinogradov (1891–1983) has shown that every large odd positive integer is the sum of three primes. The proofs of both of these results are long and complex.

5. Fermat’s “Last Theorem”

First proposed in 1637 and finally proved in 1995, this is undoubtedly the most famous and well-researched Diophantine equation. We shall give a brief history; for more information see Edwards (1977), Ribenboim (1979), Cornell (1997) (for a detailed account of the whole proof), and Singh (1997) (for a popular account including the recent advances).

Although of little intrinsic interest in itself, Fermat’s so-called Last Theorem has had a profound influence on the development of pure mathematics over the past three and a half centuries. It states: the equation $F_n$

$$x^n + y^n = z^n$$

has no solution in integers $x$, $y$, and $z$ if $xyz \ne 0$ and $n > 2$. (There are solutions when $n = 2$.) In 1637, P. de Fermat
(1601–1665) proposed this result and claimed to have a proof, which has not survived, although he did prove the result in the case $n = 4$ and so showed that $n$ can be restricted to be an odd prime number. The next step was taken by L. Euler (1707–1783), who proved Fermat’s result for $n = 3$. He was the first to realize that the solution to the problem would come by working in a number system larger than the integers; he used the complex numbers in his proof. In the middle of the nineteenth century, E. E. Kummer (1810–1893) greatly extended this work and so proved the result for a large number of cases (and as a by-product helped to lay the foundation of modern algebra). Building on this and using the power of modern computers, Fermat’s Last Theorem had been established for all $n < 4{,}000{,}000$ by 1980. Also, by Faltings’s work (see next section), it was known that even if one of the equations $F_n$ was solvable, it could only have a finite number of solutions.

In 1985, G. Frey noted an apparently simple connection between Fermat’s result and some properties of a class of elliptic curves (see Section III). Frey’s elliptic curves have the form

$$y^2 = x(x - a^n)(x - c^n).$$

Using these curves, he made the following conjecture: if Fermat’s Last Theorem is false then so is the Taniyama and Weil Elliptic Curve conjecture (see the last paragraph of Section III). He noted that if $a^n - c^n = b^n$ and $a, b, c \in \mathbb{Z}$, then the discriminant of the curve has the form $\Delta = (abc)^{2n}$, and so he conjectured that if an elliptic curve whose discriminant was a $2n$-power existed, then it could not satisfy the Taniyama and Weil conjecture. This implication was proved by K. Ribet in 1990. At about this time, A. Wiles began work on proving that the Taniyama and Weil conjecture is true, one consequence of which would be, by Frey and Ribet’s result, that Fermat’s result was finally proved. His work was successful, for in 1995, with some input from R. Taylor, he published his proof of the Taniyama and Weil conjecture for a large class of elliptic curves, which includes the Frey curves, and so finally established Fermat’s Last Theorem 358 years after it had been first proposed. This work will surely rank as one of the greatest achievements of twentieth century mathematics.

6. Sums of Squares

Here we ask: Can a positive integer $m$ be expressed as a sum of $k$ squares, or $l$ cubes, or in general as a sum of $n$th powers? We shall look at the squares case first. Lagrange’s famous four-squares theorem states that every positive integer is a sum of four squares. The usual proof uses the so-called “infinite descent method,” which is a kind of mathematical induction. By a simple counting argument we find, for our given $m$, positive integers $t, x, y, z, w$ satisfying $x^2 + y^2 + z^2 + w^2 = tm$. Secondly, given this equation, a procedure is devised to give a new solution $t_1, x_1, y_1, z_1, w_1$ to this equation, which satisfies $0 < t_1 < t$. The result follows by finitely many applications of this procedure.

The two-square result, first proved by Fermat, is: the equation

$$x^2 + y^2 = n$$

has a solution $x, y$ provided those prime factors of $n$ that are congruent to 3 modulo 4 occur in the prime factorization of $n$ with an even power. So, for example, 1, 2, 4, 5, 8, 9, 10, 13, 16, 17, and 18 are the positive integers less than 20 that can be expressed as sums of two squares. This can also be established using the infinite descent method, although many other proofs exist. The proofs of both of these results rely on the fact that the product of two sums of 2 or 4 squares is itself a sum of 2 or 4 squares, respectively. For example, we have

$$(x^2 + y^2)(z^2 + w^2) = (xz + yw)^2 + (xw - yz)^2$$

(a result known to Diophantus in the third century AD). No similar identity is available for an odd number of squares, and so in this case different methods involving quadratic form theory are used. The three-squares theorem, first proved by Gauss, states that $n$ can be expressed as a sum of three squares provided it is not of the form $4^a(8b + 7)$. For example, every positive integer less than 30 is a sum of three squares except 7, 15, 23, and 28.

7. Waring’s Problem

More generally we can ask if all positive integers are sums of $k$th powers with $g(k)$ summands where $g$ depends only on $k$. This is known as Waring’s problem. In 1770, E. Waring (1736–1798) postulated that $g(2) = 4$ (this is Lagrange’s theorem), $g(3) = 9$, $g(4) = 19$, and so on. In 1909, D. Hilbert (1862–1943) showed that $g$ exists for all $k$. It has been established that $g(3) = 9$ and $g(5) = 37$; very recently it has been shown that $g(4) = 19$, the value Waring conjectured. Further, for most $k \ge 6$, $g(k) = 2^k + A - 2$, where $A$ denotes the largest integer less than $(3/2)^k$. For the exact result see Hardy and Wright (1954), pages 335–337.

One aspect distinguishes the case $k = 2$, discussed in Subsection 6, and the case $k > 2$; in the latter case there are a few exceptional integers that require more summands than usual. For instance, only 23 and 239 need nine cubes, and if $n > 8042$ then, in all probability, six cubes are sufficient. If we let $G(k)$ denote the least number of
P1: GSS Final Pages
Encyclopedia of Physical Science and Technology En011l-504 July 25, 2001 16:52
summands needed to express all sufficiently large positive integers as a sum of kth powers, then clearly
G(k) ≤ g(k) and G(2) = g(2) = 4.
For most k the precise value of G(k) is not known. G(3) lies between 4 and 7 with the most likely values 4 or 5, and G(4) = 16. G. H. Hardy (1877–1947) and J. E. Littlewood (1885–1977) conjectured that $G(k) \le 2k + 1$ if $k \ne 2^m$, where m > 1, and $G(2^m) = 2^{m+2}$, again where m > 1. At the moment the best result known is: as k → ∞, $G(k) \le (2 + c_k)\,k \ln k$, where $c_k \to 0$ as k → ∞. For an account of this work see Vaughan (1981) and Ribenboim (1989), pages 236–245, which gives the current status of this problem when k ≤ 10.
III. ELLIPTIC CURVES
In this section we discuss the solution of two variable Diophantine equations defined over an algebraic number field. Compared with the material discussed in the previous section, a more organized theory has been developed; it uses some basic ideas from algebraic geometry. Also, it is the subject of considerable current research interest.
Let K be an algebraic number field, for example the rational field ℚ, and let $A^n(K)$ denote the set of n-tuples $(x_1, \ldots, x_n)$ where each $x_i \in K$. $A^n(K)$ is called the n-dimensional affine space over K. On the set $A^3(K) - \{(0, 0, 0)\}$ we define the relation ∼ by: $(x, y, z) \sim (x', y', z')$ if and only if $x = x't$, $y = y't$, $z = z't$ for some nonzero t ∈ K. This is an equivalence relation, and the set of equivalence classes is called the two-dimensional projective space $P^2(K)$ over K. Points in this space are denoted by (x:y:z). Note that by the relation ∼ it is only the ratios of x, y, and z that matter; so, for example, (2:3:5), (−4:−6:−10), and (26:39:65) all represent the same point.
A curve C in $P^2(K)$ is the set of points (x:y:z) that satisfy a homogeneous polynomial equation f(x, y, z) = 0, and the degree of C is the degree of the polynomial f. A point P on C is called singular if the partial derivatives (∂f/∂x, etc.) at P with respect to all three variables x, y, and z are zero; unique tangents can be drawn to the curve at points that are not singular. Two curves C and C′ are said to be birationally equivalent if each can be transformed into the other by a rational function and the correspondence is one-to-one and onto except at a finite number of points. For example x′ = 1/(x − 1), y′ = 1/y, z′ = 2z is birational. Note that the degrees of C and C′ need not be identical.
We come now to the important notion of the genus of a curve.
Definition: The genus g of a curve C that has degree n and has N singular points is given by $g = \tfrac{1}{2}(n - 1)(n - 2) - N$.
It can be shown that g is a nonnegative integer and is unaltered when C is birationally transformed into a new curve C′. The genus provides an important classification, into three main classes, for the collection of all curves defined over the field K. The first class contains curves of genus zero. All curves in this class can be reduced (by birational transformations) to lines or conics, and the linear and quadratic form material discussed in the previous section applies to the associated Diophantine equations. The second class contains the curves of genus one. Using birational transformations, all curves in this class that have at least one point whose coordinates lie in K can be represented by equations of the form
$y^2 z = x^3 + a x z^2 + b z^3$, (1)
where a, b ∈ K. They are called elliptic curves because they can be parametrized using elliptic functions. The discriminant Δ of the curve (1) is $-16(4a^3 + 27b^2)$. Many of the basic properties of this curve are determined by this quantity; for example, the curve (1) is elliptic if and only if Δ ≠ 0 and, in the real plane, it has one component if Δ < 0 and two if Δ > 0. The last main class contains curves having genus larger than one. Until recently little was known about these equations. In 1922 L. J. Mordell (1888–1972) conjectured that each equation in this class has only finitely many points whose coordinates lie in K. This remarkable conjecture was established by Faltings in 1983, using advanced techniques from algebraic geometry. We have already mentioned one consequence of this work. If n > 3, the genus of each of the curves associated with the Fermat equation $F_n$ (see Section II.A.5) is larger than one, and so none of these equations can have infinitely many solutions. This fact provided initial support to Wiles and others in their eventual solution of Fermat’s result. Table I summarizes the main classes for the field ℚ.
A. Elliptic Curves
Points on elliptic curves have an important algebraic structure that we shall discuss now. If the point (x:y:z) lies on the curve C (defined over the field K) and x, y, z ∈ K, then we call the point rational; and we let C(K) denote the set of rational points on C. Using the so-called chord-tangent method, C(K) can be given a group structure as follows. Let us suppose that C is represented in the standard form, Eq. (1). In this case it is a cubic curve and any straight line will intersect it in at most three points. Further, if two of these points are rational then the coefficients of the
equation of the line through these points will belong to K, and so the third point will also be rational. This conclusion is valid even if the first two points coincide at P, that is, the line is the tangent at P.
TABLE I Properties of Homogeneous Three Variable Diophantine Equations Defined Over ℚ
Genus   Parametrization         Maximum number of solutions in ℤ   Maximum number of solutions in ℚ
0       By rational functions   Infinite                           Infinite
1       By elliptic functions   Finite                             Infinite but finitely generated
≥2      -                       Finite                             Finite
Using this chord-tangent process we can define an addition operation on C(K). Let O be the point (0:1:0); it lies on all curves of the form of Eq. (1), and it will act as the identity of the group. Suppose $P_1, P_2 \in C(K)$ and the line $P_1 P_2$ meets C again at Q; as noted above, Q ∈ C(K). Now construct the line OQ; it will meet C in one further rational point. This new point is called the sum of $P_1$ and $P_2$ and is denoted by $P_1 + P_2$. It can be shown that C(K) with this operation is an Abelian group. The proof is straightforward except for associativity, which requires a result from algebraic geometry known as Bézout’s theorem. Some sample constructions are given in Fig. 1. In this example, $P_1 + P_2 = 2P$ where 2P denotes the tangent point P + P.
FIGURE 1 Sample constructions of rational points on elliptic curves by the chord-tangent method.
In 1922 Mordell showed that the Abelian group C(ℚ) has only a finite number of generators; that is, all rational points on C(ℚ) can be constructed by the chord-tangent method from a finite set of points. Later, A. Weil extended this to all algebraic number fields K, and the result is now called the Mordell–Weil theorem. The group C(K) can be finite or infinite. Recently B. Mazur has classified all groups that can occur in the finite case when K = ℚ; they are either cyclic or direct products of two cyclic groups and have orders not larger than 16.
The rank of C(K) is the number of infinite order generators of C(K); it is finite by the Mordell–Weil theorem. This number has been studied extensively, and recently some results have been established. For many curves the rank is zero or one; but examples have been given in which the rank is at least 22. For a fixed field K, it is not known if the rank can be arbitrarily large or, conversely, if a constant m exists such that the rank of all elliptic curves over K is less than m.
There are many conjectures concerning these curves, the most important of which involve L-functions, a detailed definition of which can be found in Silverman (1986). Roughly speaking, an L-function for a curve C encodes “local” properties (that is, properties of C defined over finite fields, working modulo a prime for example) in the hope of obtaining “global” properties of C (that is, properties of C valid over the rational numbers or the complex numbers). Two main collections of conjectures have been studied; the first, due to B. Birch and H. P. F. Swinnerton-Dyer, relates the behavior of the L-function of a curve C near 1 to the rank of C over the rational numbers; there is good numerical evidence for this conjecture and it has been proved recently for some classes of curves. The second main collection of conjectures concerns the domain of definition of the L-functions in the complex plane. It is a simple matter to see that these functions are defined at most points z in the right hand side of the plane (to be precise, when the real part of z is larger than 3/2). The Taniyama and Weil conjecture states that the domain of definition can be extended to cover the whole of the complex plane except for the point 1. It is known that if this conjecture holds for the curve C in question, and a set of related curves, then the curve is “modular,” which implies that it possesses a number of useful arithmetic properties including a special type of parameterization. It is this conjecture (for a subset of all curves, called the semistable curves, which contains the Frey curves related to Fermat’s Problem) that A. Wiles (and R. Taylor) proved in 1993/1995 [see Wiles (1995)] and which finally established that Fermat’s Last Theorem is valid in all cases (see Section A.5). At the time of writing (November 1999), it has been announced by C. Breuil, B. Conrad, F. Diamond, and R. Taylor that the Taniyama and Weil conjecture is valid for all elliptic curves defined over the rational numbers; no details are available.
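The chord-tangent addition described earlier in this section can be sketched in a few lines. This is an illustrative sketch only, under the following assumptions that are not in the original text: the curve is written in the affine form y² = x³ + ax + b (Eq. (1) with z = 1), the identity O = (0:1:0) is represented by `None`, and the example curve y² = x³ + 17 with its two points was chosen here for convenience. The function names are hypothetical.

```python
from fractions import Fraction


def on_curve(P, a, b):
    """Check that P lies on y^2 = x^3 + a*x + b (None stands for O)."""
    if P is None:
        return True
    x, y = P
    return y * y == x**3 + a * x + b


def ec_add(P, Q, a):
    """Chord-tangent sum of two rational points on y^2 = x^3 + a*x + b."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = Fraction(P[0]), Fraction(P[1])
    x2, y2 = Fraction(Q[0]), Fraction(Q[1])
    if x1 == x2 and y1 == -y2:
        return None                      # vertical line: the third point is O
    if (x1, y1) == (x2, y2):
        slope = (3 * x1 * x1 + a) / (2 * y1)   # tangent at P
    else:
        slope = (y2 - y1) / (x2 - x1)          # chord through P and Q
    x3 = slope * slope - x1 - x2         # x-coordinate of the third intersection
    y3 = slope * (x1 - x3) - y1          # the line through O reflects y to -y
    return (x3, y3)


a, b = 0, 17
P1, P2 = (-2, 3), (-1, 4)
print(ec_add(P1, P2, a), ec_add(P1, P1, a))
```

Exact rational arithmetic (`Fraction`) is used so that the results stay in C(ℚ), mirroring the observation that the third intersection point of a rational line is again rational.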
IV. DIOPHANTINE APPROXIMATION AND TRANSCENDENCE
These topics are concerned with the number-theoretic properties of the real and complex numbers. The starting point for this work is the question: How close can a rational number a/b approximate to a real number α assuming some restriction on the size of a and b; that is, given α can integers a and b be found to satisfy
$\left| \alpha - \frac{a}{b} \right| < \frac{1}{f(b)}$ (2)
for some suitably chosen monotone increasing function f? This is the basic question in Diophantine approximation theory. We answer this question and discuss some related topics in the subsections below.
if α is irrational then the inequality (2) has infinitely many solutions a, b provided $f(b) \le b^2\sqrt{5}$. This result can be derived using some elementary theorems concerning continued fractions [see Rose (1999)]. It is best possible, for if $\alpha = (1 + \sqrt{5})/2$ (its continued fraction representation is [1, 1, 1, . . .]) and $f(b) > b^2\sqrt{5}$ then inequality (2) has only finitely many solutions. If this α and any real number whose continued fraction expansion is eventually all ones is excluded, then Hurwitz’s theorem can be improved by replacing $\sqrt{5}$ by $\sqrt{8}$. Then $\sqrt{8}$ is best possible when $\alpha = \sqrt{2}$. This process continues; finally it can be shown that there are uncountably many real numbers α for which the inequality (2) has infinitely many solutions if $f(b) = 3b^2$, but only finitely many if the number 3 is replaced by a larger number.
3. Sets of Real Numbers Modulo 1
Let ((α)) denote the fractional part of α, that is ((α)) = α − [α]. P. L. Chebyshev (1821–1894) was the first to ask: Given α how is the set {((nα)): n = 1, 2, . . .} distributed in the unit interval? Using the methods discussed above, he was able to show that this set is dense in the unit interval and that the distribution is uniform provided α is irrational. His methods apply to a number of similar problems.
B. Transcendental Numbers
A real or complex number that satisfies no polynomial equation with algebraic coefficients is called transcendental. J. Liouville (1809–1882) was the first to show that transcendental numbers exist, although using the diagonal argument of G. Cantor (1845–1918) we now know that almost all real or complex numbers have this property. On the other hand there are many specific numbers whose transcendental status is unknown. Examples are Euler’s constant γ, e + π, and $\pi^{\sqrt{2}}$.
Continued fractions play an important role in this theory; they are defined as follows. Let α be a real number and let [α] denote the largest integer c satisfying c ≤ α; then we can write
$\alpha = [\alpha] + 1/\alpha_1$,
where $\alpha_1 > 1$ provided α is not an integer. We repeat this construction by setting
$\alpha_n = [\alpha_n] + 1/\alpha_{n+1}$
(n = 1, 2, . . .) provided $\alpha_n$ is not an integer. If α is a rational number this process will stop in a finite number of steps, otherwise it will continue indefinitely. We write $q_0$ for [α], $q_n$ for $[\alpha_n]$, and $[q_0, q_1, \ldots, q_n]$ for the expression
$q_0 + 1/(q_1 + 1/(q_2 + \cdots + 1/q_n \ldots))$.
This is called the nth continued fraction convergent to α. For example, the fifth continued fraction convergent to $\sqrt{2}$ is [1, 2, 2, 2, 2, 2].
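The continued fraction construction described in this section translates directly into a short program. The sketch below is illustrative rather than definitive: the function names are hypothetical, and exact rational input (`Fraction`) is assumed so that the expansion terminates as the text describes; a floating-point α can be passed instead, but only its first several partial quotients are then reliable.

```python
from fractions import Fraction
from math import floor


def partial_quotients(alpha, max_terms=20):
    """Return [q0, q1, ...] using alpha_n = [alpha_n] + 1/alpha_{n+1}."""
    quotients = []
    for _ in range(max_terms):
        q = floor(alpha)                 # [alpha], the largest integer <= alpha
        quotients.append(q)
        fractional_part = alpha - q
        if fractional_part == 0:         # alpha_n is an integer: the process stops
            break
        alpha = 1 / fractional_part
    return quotients


def convergents(quotients):
    """Evaluate [q0], [q0, q1], ... as exact fractions."""
    h_prev, h = 1, quotients[0]          # numerators
    k_prev, k = 0, 1                     # denominators
    values = [Fraction(h, k)]
    for q in quotients[1:]:
        h_prev, h = h, q * h + h_prev
        k_prev, k = k, q * k + k_prev
        values.append(Fraction(h, k))
    return values


# [1, 2, 2, 2, 2, 2] is the fifth convergent to the square root of 2.
print(convergents([1, 2, 2, 2, 2, 2]))
print(partial_quotients(Fraction(99, 70)))
```

The recurrence for numerators and denominators used in `convergents` is the standard one; it avoids evaluating the nested fraction from the inside out.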
A. Rational Approximation
1. Best Approximation
By a “good” rational approximation a/b to a real number α we mean that a/b is close to α, and a and b are relatively small. We define the best approximation to a real number α relative to n to be the rational number a/b closest to α satisfying 0 < b < n. For example, 22/7 is a good approximation to π; it is in fact the best approximation relative to any n ≤ 56. All good approximations to a real number α are continued fraction convergents to α.
2. Hurwitz’s Theorem
The theorem of A. Hurwitz (1859–1919) puts a limit on how good approximations can be in general. It states that
Liouville began with the inequality (2) and showed that if α is an algebraic number of degree n (n > 1) then (2) has only finitely many solutions if $f(b) = b^{n+k}$ and k > 0. Using this he was able to show that certain real numbers are transcendental; for example the number whose decimal expansion is 0.11000100 . . . , that is, all zeros except for 1’s at the n! digit places, n = 1, 2, . . . . A number of this type is now called a Liouville number.
C. Hermite (1822–1901) introduced the main method for establishing the transcendence of a number in 1873. It begins by assuming that α is algebraic, satisfying the polynomial equation f(α) = 0; that is the contrary of the required result. Then two properties of a formula F constructed using the coefficients of f are derived that contradict one another, and so transcendence is established. For example, the contradictory properties could be (a) F is a
positive integer and (b) F tends to zero. Hermite used this method to show that e ($= \sum_{n=0}^{\infty} 1/n!$) is transcendental. F. Lindemann (1852–1939) extended this to show that if $\alpha_1, \ldots, \alpha_n$ are distinct algebraic numbers and $c_1, \ldots, c_n$ are nonzero algebraic numbers, then
$c_1 e^{\alpha_1} + \cdots + c_n e^{\alpha_n} \ne 0$. (3)
Using this result he was able to show that the following numbers are transcendental where α is algebraic, nonzero, and not equal to one in the last two cases:
π, $e^{\alpha}$, sin α, cos α, sinh α, cosh α, arcsin α, arccos α, and ln α.
A famous problem first proposed by the ancient Greeks and known as “squaring the circle” was: Construct a square equal in area to a given circle using only a ruler and compass. Lindemann’s result shows that this is impossible as $\sqrt{\pi}$ is transcendental. Further extensions of the method enabled A. O. Gelfond (1906–1968) and T. Schneider to show that if α and β are algebraic numbers, α ≠ 0 or 1, and β is irrational then $\alpha^{\beta}$ is transcendental.
One consequence of this result is: If α, β, γ and δ are nonzero algebraic numbers and no linear relation with rational coefficients exists between ln β and ln δ, then
α ln β + γ ln δ ≠ 0.
In 1966, A. Baker extended this by providing a similar result with n rather than 2 summands [see Lindemann’s result, inequality (3)]. A number of important applications have followed. For example, the number $e^{\alpha}\beta^{\gamma}$ is transcendental provided α, β and γ are algebraic and nonzero. We shall mention two further applications.
(a) In Section III, we introduced the elliptic curves; Baker’s result provides an effective finite bound on the number of integer (as opposed to rational number) solutions of the associated Diophantine equations.
(b) Baker’s result enables us to list all those imaginary quadratic fields that have unique factorization (see Section I).
1. Mahler’s Classification
The set of all real and complex numbers is divided into four classes A, S, T, and U as follows:
1. The height h of a polynomial f with rational integer coefficients, not all zero, is the maximum of the absolute values of its coefficients.
2. Given γ, n, and h > 1, let f be the polynomial with rational integer coefficients, degree at most n, and height at most h, such that |f(γ)| takes the least nonzero value. Now let ω(n, h), $\omega_n$, and ω be given by
$|f(\gamma)| = h^{-n\,\omega(n,h)}$
$\omega_n = \lim_{h \to \infty} \max_{1 \le k \le h} \omega(n, k)$
$\omega = \lim_{n \to \infty} \max_{1 \le m \le n} \omega_m$
Finally, let ν be the least positive integer n such that $\omega_n = \infty$, and let ν = ∞ if $\omega_n < \infty$ for all n.
There are four possibilities for the values of ω and ν (note ω and ν cannot both be finite). The scheme of K. Mahler (1888–1972) defines
γ to be an A-number if and only if ω = 0 and ν = ∞,
γ to be an S-number if and only if 0 < ω < ∞ and ν = ∞,
γ to be a T-number if and only if ω and ν are both infinite, and
γ to be a U-number if and only if ω = ∞ and ν is finite.
The following results concerning this classification have been proved.
(a) If two numbers α and β are algebraically dependent (i.e., g(α, β) = 0 for some two-variable polynomial g) then they belong to the same class.
(b) The A-numbers are exactly the algebraic numbers, and so the S-, T-, and U-numbers are transcendental.
(c) Almost all real and complex numbers are S-numbers.
(d) The Liouville numbers are U-numbers; in the example of a Liouville number given above $\omega_1 = \infty$.
(e) T-numbers exist; this is a recent result of W. M. Schmidt.
Many problems remain, for example it is known that π is an S- or T-number, but which? For further details on all of the results in this section see Baker (1975), LeVeque (1955), and Schmidt (1980).
V. PRIME NUMBER THEORY
The properties of the prime numbers have been studied since the time of the early Greeks Pythagoras and Euclid; even so, many questions remain unanswered today. The first major result, which is attributed to Euclid, states that the set of prime numbers is infinite. The other major result from this period is the so-called “Sieve of Eratosthenes,” which is an effective method for enumerating all primes less than some fixed integer.
Two pre-eminent results have been established in modern times. They are the theorem of P. G. L. Dirichlet
(1805–1859) concerning primes in arithmetic progressions and the Prime Number Theorem (PNT), which estimates the density of the primes. We shall discuss these, but we shall begin by considering some of the many unsolved problems in prime number theory. Most of the conjectures associated with these problems are backed up with ample numerical evidence, but this should be treated with some skepticism because in most cases this evidence involves only relatively small numbers. For example, most of the available numerical evidence suggests that Li(x) > π(x) for all x. [These two prime density functions are defined below.] But it is now known that Li(x) − π(x) changes sign infinitely often, and the first change occurs before $x = 6.69 \times 10^{370}$.
A. Conjectures Concerning Prime Numbers
1. The Twin Prime Conjecture
A brief study of a table of prime numbers shows that there are many pairs of primes p, q where q = p + 2; examples are 3, 5; 101, 103; 1997, 1999; and $10^9 + 7$, $10^9 + 9$. If we let $\pi_2(x)$ denote the number of prime pairs less than x, then, for instance, $\pi_2(10^3) = 35$ and $\pi_2(10^6) = 8169$. The twin prime conjecture states that $\pi_2(x) \to \infty$ as x → ∞, and the likely value of $\pi_2(x)$ is of the order of $\int_2^x (\ln t)^{-2}\,dt$. Progress has been made on this conjecture, for Chen has shown that there are infinitely many pairs of integers p, p + 2 where p is prime and p + 2 has at most two prime factors.
Goldbach conjecture (see Section II.A.4), problems concerning formulas taking prime values, and Fermat and Mersenne numbers.
We shall discuss now the two major results mentioned earlier and their connections with the Riemann hypothesis.
B. Dirichlet’s Theorem
Suppose (a, b) = 1; this famous theorem first established in 1837 states that there are infinitely many prime numbers in the arithmetic progression A = {an + b, n = 1, 2, . . .}. It is proved by considering a series of the form $\sum \chi(p)/p$ where the sum is taken over the primes p in A. The function χ is used to pick out elements of A and is defined using some basic group theory (it is called a multiplicative character). Second, standard analytic techniques are applied to this series to show that it diverges to infinity. Dirichlet’s theorem follows, for if there were only finitely many primes in the progression A then the series would have a finite sum.
C. The Prime Number Theorem (PNT)
Let π(x) denote the number of primes p satisfying p ≤ x. J. S. Hadamard (1865–1963) and C. de la Vallée Poussin (1866–1962) proved in 1896 that
π(x) ∼ x/ln x.
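The counting functions used in this section, π(x) for primes and π2(x) for twin prime pairs, can be tabulated for small x with the Sieve of Eratosthenes. This is a minimal sketch with hypothetical function names; it assumes x ≥ 2, counts a pair (p, p + 2) only when p + 2 < x, and is meant for modest x since it stores one flag per integer.

```python
def sieve(limit):
    """Sieve of Eratosthenes: flags[n] is True exactly when n is prime (n <= limit)."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    p = 2
    while p * p <= limit:
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False   # strike out multiples of p
        p += 1
    return flags


def prime_count(x):
    """pi(x): the number of primes p with p <= x."""
    return sum(sieve(x))


def twin_pair_count(x):
    """pi_2(x): the number of pairs (p, p + 2) of primes with p + 2 < x."""
    flags = sieve(x)
    return sum(1 for p in range(2, x - 2) if flags[p] and flags[p + 2])


for x in (10**2, 10**3, 10**4):
    print(x, prime_count(x), twin_pair_count(x))
```

Comparing `prime_count(x)` with x/ln x for increasing x gives a feel for how slowly the ratio in the Prime Number Theorem approaches 1.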
A more precise version of this result is
P1: GLQ Final Pages
Encyclopedia of Physical Science and Technology EN011I-503 July 14, 2001 21:40
I. THE ANCIENT ROOTS OF NUMBER THEORY
A. Arithmetica versus Logistica
As ancient cultures slowly developed the concept of number and applied it to the solution of everyday problems, it was inevitable that a few people would become interested in the properties of numbers, in relations among numbers and their patterns, and in numbers as logical entities rather than as practical tools.
The use of numbers in commerce and trade, construction of temples, surveying of land, and other practical pursuits came to be known as logistica. Such activities, which could be carried on by anyone with knowledge of the proper procedures, particularly commoners or slaves, were considered a less noble endeavor than was the study of abstract properties and relations among numbers, called arithmetica. The latter was considered the province of scholars and royalty, too precious to be entrusted to the common people. In ancient times, then, the goals of arithmetic were the same as those of number theory today: the study of the properties of integers.
B. Ancient Mathematical Records
Some records have survived through the centuries to shed light on the early history of mathematics. In Czechoslovakia in 1937, a wolf bone was found on which were carved 55 notches in groups of 5. The bone, which was about 30,000 years old, suggests that before mankind learned to write there was a compulsion to record numbers. About 3000 B.C., a mace belonging to King Menes of Egypt was inscribed with hieroglyphic symbols representing 400,000 oxen, 422,000 goats, and 120,000 prisoners. On the Columna Rostrata in Rome, erected in honor of the victory over Carthage in 260 B.C., the symbol for 100,000 was engraved 31 times, signifying the number 3,100,000.
However, surviving records indicate that early mathematics consisted of more than just recording large numbers. Clay tablets from Babylonia containing columns of numbers, which were originally thought to be merely records of business accounts, were later discovered to be mathematical texts and tables. The tablets date from about 2000 to 200 B.C. Some contain solutions to construction problems such as the calculation of areas or volumes. Others relate to business or legal matters such as the computation of interest or the division of estates. Some were tables of multiplication or squares or square roots, whereas others gave the circumferences of circles of various diameters. A table of inverses was used to reduce the operation of division to the more easily performed operation of multiplication.
The tablet designated Plimpton 322 in the Plimpton Library at Columbia University contains columns of numbers that give the hypotenuse and one leg of a right triangle. The other leg can be calculated from the Pythagorean theorem. The sizes of some numbers imply that the Babylonians possessed a general method for solving the right triangle problem. The last column of this tablet gives the value of the cosecant of the acute angle opposite the longer leg of the triangle, indicating that the Babylonians used and tabulated some of the ratios of the sides of a right triangle, as we do today in trigonometry.
The Babylonians also considered problems of a theoretical nature such as circumscribing a circle about an isosceles triangle or finding the area of a circle, the latter leading to the erroneous value of 3 for π.
Other fertile sources of knowledge of ancient mathematics are the papyrus records from Egypt. One of the most notable, the Rhind Papyrus, is displayed in the British Museum. Dating from about 1650 B.C., the Rhind Papyrus contains material copied by the scribe Ahmes from earlier records. The material consists of problems involving arithmetic as well as ideas from geometry. Some problems involved no practical applications but seemed to be posed for the reader’s amusement.
Early records such as these show that ancient civilizations were highly interested in mathematics, their achievement in mathematics was considerable, and the knowledge they possessed no doubt formed the basis for works of the Greeks and others who followed.
C. Number Theory versus Numerology
Just as the science of astronomy developed hand in hand with the superstition of astrology, so the early history of number theory is interwoven with numerology, the belief that numbers possessed mystical powers and were a dominant influence in human affairs. One form of numerology was gematria, whereby numbers were assigned to letters of the alphabet, for example, a = 1, b = 2, and so forth. The numerical value of words or names could then be calculated and studied, thus revealing hidden relationships or future events. A minor nobleman might ingratiate himself with the king by showing that the numerical values of their names, that is, the sum of the letter values, were equal.
Gematria could also warn the unwary of evil as in the Biblical quotation “Here is wisdom. Let him that hath understanding count the number of the beast; for it is the number of a man and his number is six hundred three score and six.” To whom the passage referred has never been determined, but during the Reformation it was common to attack an enemy by writing the enemy’s name in such a way that the numbers added to 666.
D. The Pythagoreans
One of the strongest influences in ancient number theory came from the Pythagoreans, a brotherhood founded by the philosopher Pythagoras who lived around 570–500 B.C. The members of this mystical order believed that integers were the key to explaining the universe. To support their belief, they had only to point to the fact, which could be verified by trial and error, that halving the length of a vibrating string created a sound an octave higher than the original tone. Further, they knew that when tones whose frequencies were in the ratios of certain whole numbers were sounded together, the result was a harmonious chord.
On such empirical evidence as this, the Pythagoreans based a theory of the universe ruled by integers. Numbers were imbued with human traits or characteristics. Odd numbers such as 1, 3, 5 were called masculine, whereas even numbers such as 2, 4, 6 were feminine. Square numbers such as 4 = 2(2) represented justice, the two factors being equal. The number 6 represented the soul, 7 represented health and understanding, and 8 was the number of love and friendship.
Although few today believe in numerology, we may speculate on the origins of some associations of numbers with philosophical concepts. The number 6 is the smallest number that is the sum of all of its proper divisors: 6 = 1 + 2 + 3. This special property may have led to the number being identified with the important concept of the soul. Again, the Bible states that the earth was created in 6 days.
A few hints of numerology still remain in the ideas that certain numbers such as 7 are “lucky” and that numbers like 13 are “unlucky.” Another example is the common expression “bad news comes in threes.” Although such notions are now taken lightly, we must remember that ancient people took the principles of numerology seriously and that many purely mathematical questions and problems arose from such occult investigations.
E. Pythagorean Triples
The Pythagorean theorem states that the sum of the squares of the legs of a right triangle equals the square of its hypotenuse, that is, $a^2 + b^2 = c^2$, as shown in Fig. 1. This result was certainly known before the time of Pythagoras, but whether he was the first to actually prove the theorem is unknown because of the Pythagoreans’ custom of ascribing all new knowledge to the Master.
FIGURE 1 The Pythagorean theorem: $a^2 + b^2 = c^2$.
Numbers whose values satisfy the Pythagorean theorem, such as 3, 4, and 5 ($3^2 + 4^2 = 9 + 16 = 25 = 5^2$), are permissible values for the sides of a right triangle. In Egypt, men known as rope stretchers made use of the 3, 4, 5 relationship to establish right angles in surveying property or constructing buildings by stretching a rope with knots marking lengths of 3, 4, and 5 units around three stakes, as in Fig. 2.
FIGURE 2 Rope stretching.
Trial and error leads to the discovery of many such Pythagorean triples, for instance, 5, 12, 13 and 8, 15, 17. It is natural to inquire whether formulas exist that will generate sets of Pythagorean triples. One method, which is attributed to Pythagoras, is as follows: Let n be any positive integer. Then $a = 2n + 1$, $b = 2n^2 + 2n$, and $c = 2n^2 + 2n + 1$ constitute a Pythagorean triple. If n = 3, we obtain 7, 24, 25, for which $7^2 + 24^2 = 49 + 576 = 625 = 25^2$. For n = 4 we have 9, 40, 41, whereby $9^2 + 40^2 = 81 + 1600 = 1681 = 41^2$. All triples generated in this way lead to triangles having a hypotenuse that is one unit longer than the larger leg.
A more general method consists of choosing integers u and v, one of which is odd and the other even, such that u and v have no common divisors other than the number 1. We then say that u and v are relatively prime. Then, letting $a = 2uv$, $b = u^2 - v^2$, and $c = u^2 + v^2$ gives the desired result. The following examples illustrate this method.
Multiplying each member of a Pythagorean triple by the same integer yields another Pythagorean triple.
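The two generating formulas for Pythagorean triples given in this section can be tried out directly. The following sketch uses hypothetical function names; the requirement u > v > 0 in the second function is an added assumption (it keeps b positive) and is not stated in the text.

```python
from math import gcd


def pythagoras_triple(n):
    """Formula attributed to Pythagoras: a = 2n + 1, b = 2n^2 + 2n, c = b + 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    b = 2 * n * n + 2 * n
    return (2 * n + 1, b, b + 1)


def general_triple(u, v):
    """General method: a = 2uv, b = u^2 - v^2, c = u^2 + v^2."""
    if not u > v > 0:
        raise ValueError("assumed here: u > v > 0")
    if gcd(u, v) != 1 or (u - v) % 2 == 0:
        raise ValueError("u and v must be relatively prime, one odd and one even")
    return (2 * u * v, u * u - v * v, u * u + v * v)


def is_triple(a, b, c):
    return a * a + b * b == c * c


print(pythagoras_triple(3), pythagoras_triple(4))
print(general_triple(2, 1), general_triple(3, 2), general_triple(4, 1))
```

The second call reproduces, up to the order of the legs, the triples 3, 4, 5; 5, 12, 13; and 8, 15, 17 mentioned in the text.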
divisibility rules were developed. Obviously, a number is divisible by 2 only when it is even, that is, only when it ends in 0, 2, 4, 6, or 8.
Because 100 and all higher powers of 10 are divisible by 4, a number is divisible by 4 only when it ends in 00 or when the number formed by its last two digits is divisible by 4. Only numbers ending in 0 or 5 are divisible by 5.
One hundred and all higher powers of 10 are divisible by 25. A number is divisible by 25 only when it ends in 00 or when the number formed by its last two digits is divisible by 25. Hence, a number is divisible by 25 only when it ends in 00, 25, 50, or 75.
The number 10 and all higher powers of 10 leave the remainder 1 when divided by 3. This means that the remainder from dividing a number by 3 equals the remainder from dividing the sum of the digits of that number by 3. We have 371 = 3(123) + 2, whereas 3 + 7 + 1 = 11 and 11 = 3(3) + 2. Therefore, a number is divisible by 3 if and only if the sum of its digits is divisible by 3.
we can see that any common divisor of 135 and 42 divides 9. Also, any common divisor of 135 and 42 divides 6. Finally, we conclude that any common divisor of 135 and 42 divides 3. Hence, 3 is the gcd of 135 and 42. In general, the gcd of two numbers is always the last nonzero remainder that occurs in the algorithm.
Several results follow from Euclid’s algorithm. If the product ab is divisible by c and if a and c are relatively prime, then c divides b. From this it may easily be seen that if a number is relatively prime to each of two or more numbers, it is relatively prime to their product. Also, if (a, b) = c, then (ka, kb) = kc. The latter may be verified by Euclid’s algorithm. As an example, the gcd of 405 = 3(135) and 126 = 3(42) is 3(3) = 9.
Euclid’s algorithm may be used to prove the following: If c is the gcd of a and b, there exist integers x and y such that ax + by = c. For the preceding example, 5(135) − 16(42) = 3. The numbers x and y can be obtained by solving for c, starting from the next-to-the-last equation and working backward to the first equation.
A similar statement holds for the divisor 9. A num-
ber is divisible by 9 if and only if the sum of its dig-
J. Least Common Multiple
its is divisible by 9. Thus, 12,681 is divisible by 9 since
If c is divisible by both a and b, we say that c is a common 1 + 2 + 6 + 8 + 1 = 18 and 18 is divisible by 9. A process
multiple of a and b. The multiples of 4 are 4, 8, 12, 16, called casting out nines, based on the divisibility property
20, 24, etc., whereas those of 6 are 6, 12, 18, 24, etc. We of the number 9, can be used to check the results of ad-
find that 12 and 24 are both common multiples of 4 and 6. dition or multiplication. For addition, the digits of each
The smallest of the infinite set of common multiples addend are added:
of a and b is called the least common multiple, or lcm,
written [a, b]. Therefore, we have [4, 6] = 12. 17 1+7=8 8
It is easily seen that the lcm of two primes is their 31 3+1=4 4
product. However, our previous example shows that the 84 8 + 4 = 12 1+2=3 3
lcm of two numbers having a common divisor is smaller 133 15
than their product. In fact, the product of two integers 1 + 3 + 3 = 7∗ 1 + 5 = 6∗
equals the product of their lcm and their gcd. That is,
ab = (a, b)[a, b]. If the result is a two-digit number, these digits are added
Finding the gcd and the lcm for two numbers can be and so on until each addend has been reduced to a single-
facilitated by writing each one as the product of primes. digit number. These resulting numbers for all of the
For example, 84 = 22 (3)(7) and 90 = 2(32 )(5). We find the addends are then added, the digits again being added
gcd by taking each factor appearing in both numbers to if necessary to reduce the final result to a single-digit
the smallest power to which it appears in either number. number.
So (84, 90) = 2(3) = 6. The same process is applied to the sum, yielding a
We find the lcm by taking each factor appearing in ei- single-digit number. If these two single-digit numbers,
ther number to the largest power to which it appears in the one from the addends and the one from the sum, are
either number. So [84, 90] = 22 (32 )(5)(7) = 1260. Finally, not equal, the addition is incorrect. On the other hand,
we note that 6(1260) = 7560 = 84(90). The definitions of if the two single-digit numbers are the same, there is no
gcd and lcm can be extended to three or more numbers in guarantee that the addition is correct. Two different errors
a straightforward way. may make these numbers the same but make the addition
wrong. In particular, the error of transposing two digits
will not be detected by this method. Hence, casting out
K. Divisibility Rules
nines can show that the addition process is wrong but can-
Before the advent of computers, finding divisors of large not show that it is absolutely correct. The results of our
numbers was not an easy task. Therefore, certain divisibi- example show that the original sum is incorrect.
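The casting-out-nines check for addition can be sketched in a few lines. This is an illustrative sketch only; the helper names digital_root and passes_nines_check are assumptions introduced here, and nonnegative integers are assumed throughout.

```python
def digital_root(n):
    """Add the decimal digits repeatedly until a single digit remains."""
    n = abs(n)
    while n >= 10:
        n = sum(int(digit) for digit in str(n))
    return n


def passes_nines_check(addends, claimed_sum):
    """Return False if casting out nines proves the claimed sum wrong.

    A True result does not prove the sum correct; it only means this
    particular check found no inconsistency.
    """
    from_addends = digital_root(sum(digital_root(a) for a in addends))
    return from_addends == digital_root(claimed_sum)


if __name__ == "__main__":
    addends = [17, 31, 84]
    print(passes_nines_check(addends, 133))  # the sum from the example
    print(passes_nines_check(addends, 132))  # the true sum
    print(passes_nines_check(addends, 123))  # 132 with two digits transposed
```

For the addends 17, 31, and 84 the check rejects 133, accepts the true sum 132, and also accepts 123, which illustrates why a transposition of digits goes undetected.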
FIGURE 4 The sum of the first n odd positive integers is n^2.
FIGURE 6 The sum of the first n positive integers is n(n + 1)/2.
O. Magic Squares
If a square is divided into n^2 smaller squares and one of
the integers from 1 to n^2 is written in each of the smaller
squares (using each number only once) in such a way
that the sum of the integers in any row, column, or main
diagonal is always the same, the result is called a magic
square of order n. A few examples of magic squares are
shown in Fig. 9.
Interest in magic squares dates back to about 2200 B.C.
in China where the diagram of Fig. 10, called the Lo-Shu,
appeared.
FIGURE 10 The Lo-Shu.
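The defining property of a magic square of order n is easy to test mechanically. The sketch below is an assumption-laden illustration: the function name is_magic is hypothetical, and the 3-by-3 grid used for the Lo-Shu is its commonly quoted arrangement, which may differ from Fig. 10 by a rotation or reflection since the figure is not reproduced here.

```python
def is_magic(square):
    """Check that an n-by-n grid uses 1..n^2 once each and that every
    row, column, and main diagonal has the same sum."""
    n = len(square)
    if any(len(row) != n for row in square):
        return False
    values = sorted(v for row in square for v in row)
    if values != list(range(1, n * n + 1)):
        return False
    target = n * (n * n + 1) // 2  # common sum forced by using 1..n^2
    rows_ok = all(sum(row) == target for row in square)
    cols_ok = all(sum(square[r][c] for r in range(n)) == target
                  for c in range(n))
    diag_ok = sum(square[i][i] for i in range(n)) == target
    anti_ok = sum(square[i][n - 1 - i] for i in range(n)) == target
    return rows_ok and cols_ok and diag_ok and anti_ok


# Assumed orientation of the Lo-Shu; any rotation or reflection also works.
LO_SHU = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]

if __name__ == "__main__":
    print(is_magic(LO_SHU))
    print(is_magic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because the entries are exactly 1 through n^2, the common sum must be n(n^2 + 1)/2, which is 15 for order 3.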
FIGURE 12 Magic square formed by substitution of numbers in a sequence.
FIGURE 14 The result of substituting the digits of π into the magic square of Fig. 13.
Sometimes, additional conditions restrict the number of solutions. In this problem, one might require that the number of women be twice the number of men, yielding the unique solution (5, 10, 15). At other times there may be an infinite number of solutions or even no solution. One renowned indeterminate problem, called the cattle problem of Archimedes, results in seven equations in eight unknowns whose solution yields extremely large numbers.

Many problems of this type, called linear indeterminate problems, originated in India. The Hindu mathematician Brahmagupta (born in 598 A.D.) authored a treatise on astronomy that included chapters devoted to mathematics. This work, “Brahma-Sphuta-Siddhanta” (or “Brahma’s Correct System”) includes solutions of numerous linear indeterminate equations. Also found there is the second-degree indeterminate equation nx^2 + 1 = y^2, for which Brahmagupta gives the solution

x = 2t/(t^2 − n)
y = (t^2 + n)/(t^2 − n),

where t is any integer. Thus, if n = 3, we have 3x^2 + 1 = y^2 and x = 2t/(t^2 − 3) and y = (t^2 + 3)/(t^2 − 3), which leads to

t      x        y
1      −1       −2
2      4        7
3      1        2
10     20/97    103/97
...    ...      ...

He also states that the equation nx^2 − 1 = y^2 has no integral solutions for x and y unless n is the sum of the squares of two integers. For example, 4x^2 − 1 = y^2 has no integral solutions, whereas 13x^2 − 1 = y^2 has integral solutions because 13 = 2^2 + 3^2. One solution is x = 5, y = 18.

The search for general solutions was aided by the introduction of symbols for certain quantities and operations that occur frequently within a given problem. The use of this technique is credited to Diophantus. For a long period of time, mathematicians did not differentiate between problems leading to determinate or indeterminate solutions. Today, algebra students are exposed almost exclusively to determinate problems.

The following results are useful in solving certain indeterminate problems.

Theorem. If the integers a and b are relatively prime then there exist integers x and y such that ax + by = 1.

This follows directly from Euclid’s algorithm and leads to

Theorem. There exist integers x and y satisfying the equation ax + by = c if and only if the gcd of a and b divides c.

For example, in the equation 14x + 35y = 56, we find that (14, 35) = 7 and 7 divides 56. Dividing, we obtain the equivalent equation 2x + 5y = 8. We now let 2x + 5y = 1 and find the solution x = −2, y = 1. Then x = 8(−2) = −16 and y = 8(1) = 8 are solutions of 2x + 5y = 8, and therefore of 14x + 35y = 56. From a particular solution x_0, y_0 of an equation ax + by = c whose coefficients a and b are relatively prime, we can find the general solution x = x_0 + tb and y = y_0 − ta, where t is any integer. In our case, using the reduced equation 2x + 5y = 8, we obtain x = −16 + 5t and y = 8 − 2t.

Diophantine equations involving variables to the second or higher powers can prove difficult or impossible to solve. Although many special results exist, a general method of solving any Diophantine equation or proving the nonexistence of solutions is unknown. We mention a few particular results. It is known that the equations x^3 + y^3 = z^3 and x^4 + y^4 = z^4 have no positive integral solutions.

The equation x^4 + y^4 = u^4 + v^4 has general solutions that we shall not list. The smallest integral solution is 133^4 + 134^4 = 158^4 + 59^4.

It has been proven that the equation ax^n + by^n = c has only finitely many solutions if n ≥ 3. More generally, Axel Thue (1863–1922) showed that if a function of x and y,

f(x, y) = a_n x^n + a_{n−1} x^{n−1} y + · · · + a_1 x y^{n−1} + a_0 y^n,

with a_i integers, cannot be factored into two polynomials having integral coefficients, then the equation f(x, y) = c has only a finite number of solutions for n ≥ 3.

The equation x^2 − cy^2 = 1 is known as Pell’s equation, after John Pell (1610–1685), although he had no part in its solution. For any value of c there is the trivial solution x = ±1, y = 0. If c is a square number, the left-hand side is easily factored. If c is a positive nonsquare integer, then it can be proved that the equation always has a nontrivial solution. Such solutions may not necessarily be obtained readily by trial and error. The equation x^2 − 61y^2 = 1 has x = 1,766,319,049 and y = 226,153,980 as its smallest positive nontrivial solution.

The foregoing discussion is meant to illustrate the wide range of problems leading to Diophantine equations, the great variety of approaches to solving such problems and the enormous difficulty that is entailed in the solution of certain of them.

B. Congruences

The theory of congruences was begun by the work of Carl Friedrich Gauss (1777–1855), which originally appeared
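Returning to the linear equation ax + by = c treated earlier in this section, the back-substitution in Euclid’s algorithm can be carried out by the extended Euclidean algorithm. The sketch below is a hedged illustration, assuming positive integers a and b; the function names extended_gcd and solve_linear_diophantine are not from the article.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g is the gcd of a and b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """Solve a*x + b*y == c in integers.

    Returns None when gcd(a, b) does not divide c. Otherwise returns
    (x0, y0, dx, dy), meaning every solution has the form
    x = x0 + t*dx, y = y0 - t*dy for an integer t.
    """
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None
    scale = c // g
    return x * scale, y * scale, b // g, a // g


if __name__ == "__main__":
    g, x, y = extended_gcd(135, 42)
    print(g, x, y, 135 * x + 42 * y)
    print(solve_linear_diophantine(14, 35, 56))
    print(solve_linear_diophantine(14, 35, 55))  # 7 does not divide 55
```

The step sizes returned are b/g and a/g, that is, the coefficients of the reduced equation, so that no solutions are skipped.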
Theorem. If p is a prime of the form 4n + 3, then one of the congruences

((p − 1)/2)! ≡ ±1 (mod p)

holds. As an example, we see that when p = 3,

((3 − 1)/2)! = 1! = 1 ≡ 1 (mod 3)

and when p = 7,

((7 − 1)/2)! = 3! = 6 ≡ −1 (mod 7).

There is a complex procedure for determining whether the positive or negative sign prevails.

A more general result is contained in the following theorem.

Theorem. If p is an odd prime that does not divide the integer a, then x^2 ≡ a (mod p) has a solution when a^{(p−1)/2} ≡ 1 (mod p) and has no solution when a^{(p−1)/2} ≡ −1 (mod p).

Let p = 7 and a = 8. Then 8^{(7−1)/2} = 8^3 = 512 ≡ 1 (mod 7), so the congruence x^2 ≡ 8 (mod 7) has a solution. One easily found solution is x ≡ 1 (mod 7).

On the other hand, if p = 5 and a = 8, then 8^{(5−1)/2} = 8^2 = 64 ≡ −1 (mod 5). Hence, the congruence x^2 ≡ 8 (mod 5) has no solutions.

By trial and error we find that the congruence x^2 + 5x ≡ 0 (mod 6) has the four solutions x ≡ 0 (mod 6), x ≡ 1 (mod 6), x ≡ 3 (mod 6), and x ≡ 4 (mod 6). However, if the modulus is prime, a polynomial congruence of degree n cannot have more than n solutions. It may have fewer than n. For example, x^3 ≡ 1 (mod 5) has only the solution x ≡ 1 (mod 5).

In general, in order to solve a polynomial congruence, one has only to solve polynomial congruences having moduli that are powers of a prime. We illustrate the method by considering the congruence

x^3 − 2x^2 + 12 ≡ 0 (mod 44).

Because 44 = 2^2 · 11, we write two new congruences

x^3 − 2x^2 + 12 ≡ 0 (mod 2^2)

and

x^3 − 2x^2 + 12 ≡ 0 (mod 11).

The first has the solutions x ≡ 0 (mod 4) and x ≡ 2 (mod 4), whereas the second has the solutions x ≡ 1 (mod 11), x ≡ 4 (mod 11), and x ≡ 8 (mod 11). By applying the Chinese remainder theorem to pairs of these latter congruences, each pair containing a congruence of modulus 4 and a congruence of modulus 11, we find the solutions of the original congruence to be

x ≡ 4 (mod 44), x ≡ 8 (mod 44)
x ≡ 12 (mod 44), x ≡ 26 (mod 44)
x ≡ 30 (mod 44), x ≡ 34 (mod 44).

F. Quadratic Residues

We now examine the law of quadratic reciprocity, one of the most profound and powerful results in the theory of congruences. If the integers a and m are relatively prime, we say that a is a quadratic residue of m if the congruence x^2 ≡ a (mod m) has a solution. If it has no solution, we say that a is a quadratic nonresidue of m. Since x^2 ≡ 11 (mod 5) has the solution x ≡ 1 (mod 5), we see that 11 is a quadratic residue of 5. However, x^2 ≡ 13 (mod 5) has no solution, showing that 13 is a quadratic nonresidue of 5.

The Legendre symbol simplifies discussion of quadratic reciprocity. If p is an odd prime that does not divide a, we write (a/p) = 1 if a is a quadratic residue of p and (a/p) = −1 if a is a quadratic nonresidue of p. From our previous discussion, we have (11/5) = 1 and (13/5) = −1.

We have already mentioned Euler’s criterion, which states that r is a quadratic residue of an odd prime p if r^{(p−1)/2} ≡ 1 (mod p) and r is a quadratic nonresidue of p if r^{(p−1)/2} ≡ −1 (mod p). Thus, from the fact that 5^{(11−1)/2} = 5^5 = 3125 ≡ 1 (mod 11), we conclude that 5 is a quadratic residue of 11. However, 5^{(13−1)/2} = 5^6 = 15,625 ≡ −1 (mod 13), which indicates that 5 is a quadratic nonresidue of 13. Hence, (5/13) = −1 while (5/11) = 1.

Also, x^2 ≡ 11 (mod 7) has the solution x ≡ 2 (mod 7), so (11/7) = 1. However, x^2 ≡ 7 (mod 11) has no solution, whence (7/11) = −1.

We see that in some cases (p/q) = (q/p) as in (5/13) = (13/5) = −1 or (11/5) = (5/11) = 1, but in other cases (p/q) = −(q/p), for example, (11/7) = 1 whereas (7/11) = −1. The question arises as to when equality holds and when it does not. Euler discovered a pattern that he believed answered this question, and Gauss later gave several proofs that this pattern was indeed true in all cases. The key to the pattern is the observation that every odd prime has either the form 4n + 1 or 4n + 3. If p and q are distinct odd primes and either one of them has the form 4n + 1, then (p/q) = (q/p), as in (5/11) = (11/5). However, if both of the primes have the form 4n + 3, then (p/q) = −(q/p). An example of the latter is (11/7) = −(7/11).

The theorem is usually stated in the more elegant form

Law of Quadratic Reciprocity. When p and q are distinct odd primes then

(p/q)(q/p) = (−1)^{[(p−1)/2][(q−1)/2]}.

Gauss gave six proofs of the law of quadratic reciprocity and more than 50 proofs have been devised.
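Euler’s criterion gives a direct way to compute Legendre symbols, and with it the law of quadratic reciprocity can be spot-checked on the examples in the text. This is a minimal sketch, assuming that p and q are distinct odd primes (primality is not verified); the names legendre and reciprocity_holds are introduced here for illustration.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion.

    Returns 1 for a quadratic residue, -1 for a nonresidue, and 0 when
    p divides a. The caller is assumed to pass an odd prime p.
    """
    r = pow(a, (p - 1) // 2, p)  # a^((p-1)/2) mod p
    return -1 if r == p - 1 else r


def reciprocity_holds(p, q):
    """Check (p/q)(q/p) == (-1)^(((p-1)/2)((q-1)/2)) for odd primes p != q."""
    exponent = ((p - 1) // 2) * ((q - 1) // 2)
    return legendre(p, q) * legendre(q, p) == (-1) ** exponent


if __name__ == "__main__":
    for a, p in [(11, 5), (13, 5), (5, 11), (5, 13), (11, 7), (7, 11)]:
        print(f"({a}/{p}) =", legendre(a, p))
    print(all(reciprocity_holds(p, q)
              for p in (3, 5, 7, 11, 13)
              for q in (3, 5, 7, 11, 13) if p != q))
```

The three-argument form of pow performs modular exponentiation, so large exponents such as (p − 1)/2 never require computing the full power.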
A number of assertions by Fermat can be shown to follow from the law of quadratic reciprocity, including the following.

1. Every prime of the form 8n + 1 or 8n + 3 has the form x^2 + 2y^2. For example, 17 = 3^2 + 2(2^2) or 41 = 3^2 + 2(4)^2.
2. Every prime 3n + 1 has the form x^2 + 3y^2 but no prime 3n − 1 has this form.
3. Every prime 4n + 1 is the sum of two squares in only one way.

G. Formulas for Primes

A glance at a list of the first few primes is sufficient to convince one that any formula that would yield all of the primes would be a most unusual contrivance. Even the related but less demanding task of finding a formula that would produce only primes seems most formidable.

Certain formulas furnish partial results. For example, if we replace x in the function f(x) = x^2 + x + 41 with x = 0, ±1, ±2, . . . , ±39, or −40, the result in each case is a prime. The same is true of g(x) = x^2 − 79x + 1601 for x = 0, 1, 2, . . . , 79. The function f is a composite of two formulas, one given by Euler in 1772, the other by Legendre in 1798.

Euler also found that 2x^2 + n is prime for x = 0, 1, . . . , n − 1 where n is one of the numbers 3, 5, 11, or 29. Paul Pritchard discovered the imposing expression,

11,410,337,850,553 + 4,609,098,694,200x,

which is prime for x = 0, 1, . . . , 21.

Similar formulas of this nature exist, but it can be shown that no polynomial function having integral coefficients can generate only prime numbers for integral values of x. One need only observe the effect of calculating f(41) = (41)^2 + 41 + 41 to understand why any polynomial will eventually produce a composite number.

The function f given above generates primes for 80 consecutive integers. It has been conjectured that this is the largest number of consecutive primes that a quadratic polynomial can produce. All that is known for sure is that no quadratic having the form x^2 + x + c, where c > 41, can yield primes for all values of x = 0, 1, 2, 3, . . . , c − 2.

From algebra we know that a polynomial of degree n can be constructed to assume n + 1 arbitrary values. Thus, the function h(x) = −x^3/6 + x^2 + x/6 + 2 gives the first four primes for x = 0, 1, 2, and 3. However, a similar polynomial yielding the first 101 primes would have degree 100. The longest known arithmetic progression consisting entirely of primes is given by 223,092,870n + 2,236,133,941 where n = 0, 1, 2, . . . , 15.

W. H. Mills proved the following remarkable theorem: There exists a real number θ such that [θ^{3^n}] is prime for all positive integers n. Unfortunately, there seems to be little hope of determining the value of θ, so the theorem is presently of no use for constructing primes. Similar formulas exist for expressing p_n as a function of some parameter θ. In order to determine θ to an accuracy sufficient to calculate p_n, it proves necessary to know the primes p_1, p_2, p_3, . . . , p_n. Such formulas are thus equally useless for constructing new primes, unless some method can be found for determining θ without using p_1, p_2, p_3, . . . , p_n. No such method has yet been found, nor has it been proved impossible for such a method to exist. The question is thus an open one at present.

The number of primes whose values do not exceed some positive integer x is symbolized by π(x). Thus, π(12) = 5, since 2, 3, 5, 7, and 11 are primes not exceeding 12. Similarly, π(13) = 6.

The method of finding primes known as the sieve of Eratosthenes leads to the formula

π(N) − π(√N) + 1 = N − [N/p_1] − [N/p_2] − · · · − [N/p_k]
        + [N/(p_1 p_2)] + [N/(p_1 p_3)] + · · · + [N/(p_{k−1} p_k)]
        − [N/(p_1 p_2 p_3)] − · · · ,

where p_1, p_2, . . . , p_k are the primes less than √N. The formula involves considerable calculation. To determine the number of primes less than one million, one needs to consider the primes less than 1000. Obviously, today’s computers make such a formula vastly easier to apply and therefore more useful.

There are certain shortcuts that can be employed with the formula for π(x). Before the advent of computers, Bertelsen reported that there are 50,847,478 primes that do not exceed one billion, a count that was later corrected to 50,847,534. An analogous formula for the number of pairs of twin primes the larger of which does not exceed N was given by G. J. Kostis and R. L. Page.

There exists a number pattern that generates primes in a curious but inefficient way:

1 1
1 2 1
1 3 2 3 1
1 4 3 5 2 5 3 4 1
1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 1

Each successive row is formed by inserting between each consecutive pair of integers their sum. Any two consecutive entries are relatively prime. It is also true that the integer n occurs φ(n) times in the nth line, where φ(n) is the number of integers less than n and relatively prime to n. Since a prime p is relatively prime to all p − 1 integers less than itself, we see that n is a prime if and only if it appears n − 1 times in the nth row. Since 2 appears once in row 2, it must be a prime. Similarly, 3 appears twice in row 3 and 5 appears 4 times in row 5. However, 4 appears
only twice in row 4, making 4 a composite number. The number of integers in each row increases rapidly (each row has one fewer than twice as many entries as the row before it), there being 513 of them in row 10. Thus, the pattern is of little practical value in finding primes.

H. The Prime Number Theorem

By examining tables of prime numbers and by a great amount of trial-and-error calculation, mathematicians discovered that the quantities π(x) and x/ln x behave in a similar fashion and that their ratio approaches 1 as x increases without bound. This was confirmed upon proof of the following theorem.

Prime Number Theorem. lim_{x→∞} [π(x)/(x/ln x)] = 1.

This was proved independently by Hadamard and De la Vallée Poussin.

Another function that approximates π(x) better than x/ln x is the integral logarithm, defined by

Li(x) = ∫_2^x dt/ln t.

I. Riemann’s Zeta Function

J. Mersenne Primes

The study of the primality of numbers having a particular form has occupied a great deal of attention in number theory. A number of the form M_n = 2^n − 1 is called a Mersenne number in honor of Marin Mersenne (1588–1648). Thus, M_2 = 3, M_3 = 7, M_4 = 15, etc. When M_n is prime, it is called a Mersenne prime and the number 2^{n−1}(2^n − 1) is a perfect number. Furthermore, every even perfect number must be of this form. Thus, the search for even perfect numbers reduces to a search for Mersenne primes. Because 2^n − 1 can be factored when n is composite, a Mersenne prime must have the form M_p = 2^p − 1 with p prime. For M_2 = 3, the perfect number is 2(3) = 6 and for M_3 = 7, the perfect number is 4(7) = 28.

K. Fermat Numbers

Numbers of the form F_t = 2^{2^t} + 1 are called Fermat numbers. We see that F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, and F_4 = 65,537 are all primes. The next one, F_5 = 2^32 + 1, is difficult to test for factors without the use of a computer. Fermat conjectured that all Fermat numbers were primes, and it was a hundred years before Euler found a counterexample. He did this by proving that any factor of a Fermat number must have the form (2^{t+1})k + 1. For t = 5, these factors are of the form 64k + 1. He then showed that 641 is a factor of F_5.

No further primes have been discovered among the Fermat numbers and many believe that quite the opposite of Fermat’s conjecture is true. Nevertheless, interest in Fermat numbers continues because of a remarkable result due to Gauss. He showed that a regular polygon of N sides can be constructed with only straightedge and compass if N = 2^k p_1 p_2 · · · p_n, where the p_i are distinct Fermat primes. This unexpected connection between number theory and geometry is an example of the richness of results obtained by pondering prime numbers.

M. Representation of Numbers in Certain Forms

The operation of division leads to questions of factorability of integers and to congruences, to mention only two things. So, too, does the operation of addition lead to many important questions in number theory. For example, the question of whether a given integer can be represented as the sum of an arbitrary number of squares has occupied many mathematicians over the years. For the case of two squares, we have the following theorem:

Theorem. Every prime of the form 4k + 1 can be written as the sum of two squares.
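For small numbers, a representation as a sum of two squares can be found by direct search. The sketch below is only an illustration of the theorem, not a method taken from the article; the function name two_squares is hypothetical, and math.isqrt requires Python 3.8 or later.

```python
from math import isqrt


def two_squares(n):
    """Return (a, b) with 0 <= a <= b and a*a + b*b == n, or None."""
    for a in range(isqrt(n // 2) + 1):
        remainder = n - a * a
        b = isqrt(remainder)
        if b * b == remainder:
            return a, b
    return None


if __name__ == "__main__":
    # Primes of the form 4k + 1 always have a representation.
    for p in (5, 13, 17, 29, 37, 41):
        print(p, two_squares(p))
    # Primes of the form 4k + 3 never do.
    for p in (3, 7, 11, 19):
        print(p, two_squares(p))
```

The search stops at the square root of n/2 because one of the two squares can always be taken to be at most n/2.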
Thus, 29 = 4(7) + 1 has the required form and can be written 29 = 4 + 25 = 2^2 + 5^2. A more powerful result states: An integer n can be written as the sum of two squares if and only if all of its prime factors having the form 4k + 3 occur with even exponents. We have 5(7^2) = 245 = 49 + 196 = 7^2 + 14^2 as the sum of two squares, whereas 7^3 = 343 is not.

The case for three squares is settled by the following theorem:

Theorem. An integer n can be represented as the sum of three squares unless n has the form 4^e(8k + 7) for some integers e and k.

We see that 45 = 4 + 16 + 25 = 2^2 + 4^2 + 5^2, whereas 60 = 4(8 + 7) cannot be so represented.

Lagrange proved that every positive integer is the sum of four squares. The search for the four squares that represent a certain number can be reduced to representing the prime factors of the number as the sum of four squares. An identity then gives the four squares for the original number.

The question of finding the least value of s such that every integer can be expressed as the sum of no more than s kth powers of integers is known as Waring’s problem. It has long been known that every integer can be written as the sum of nine cubes. Kevin S. McCurley proved that every integer exceeding e^{e^{13.97}} is a sum of seven positive integral cubes.

For fourth powers, the most that can be said is that every integer larger than 10^{10^{89}} can be represented as a sum of 19 fourth powers. For fifth powers, the question is unresolved. Surprisingly, for many higher powers, the problem has been solved. It is known that for 6 ≤ k ≤ 200,000, every integer can be written as a sum of no more than 2^k + [(3/2)^k] − 2 kth powers. Hence, every integer is the sum of no more than 2^6 + [(3/2)^6] − 2 = 73 sixth powers.

We have seen that every integer can be represented as the sum of no more than nine cubes. In fact, if 23 and 239 are exempted, every integer is the sum of no more than eight cubes. Similarly, every integer larger than 454 is the sum of no more than seven cubes. One is therefore led to ask: What is the smallest number G(k) of kth powers whose sum represents any sufficiently large integer? I. M. Vinogradov showed that

k + 1 ≤ G(k) ≤ k(3 ln k + 11).

Although one of the outstanding accomplishments in number theory, this nevertheless falls short of answering the question.

From studying tables of primes, one comes to the inescapable conclusion that most integers are composite. This is true in the sense that the ratio π(x)/x approaches zero as x approaches infinity. Nevertheless, because primes are the building blocks for all integers, research concerning their properties continues to be a major area in number theory.

We sometimes refer to numbers such as 1000 or 1,000,000 as nice round numbers because of the zeros in their base 10 representation. More generally, we consider a round number to be one having a large number of relatively small factors. Therefore, 4200 = 2^3 · 3 · 5^2 · 7 would be a round number but 17,858,257 = (3607)(4951) would not. It can be shown that the number of prime factors of an integer x is, on the average, of the order of ln(ln x). That is, for all integers in a large interval, the preponderance of them have as the number of their prime factors a number close to ln(ln x).

N. Factorization Methods

Many methods have been devised to aid in the factorization of large numbers. Fermat’s factorization method depends on writing the number to be factored as the difference of two squares. If

n = x^2 − y^2 = (x + y)(x − y) = ab,

we may assume that n is odd and then a and b will also be odd. If we write x^2 = n + y^2, then x ≥ √n. We examine x^2 − n for various values of x > √n, and when the difference is a square, we have found the desired factorization.

For example, if n = 26,781, we see that (164)^2 − 26,781 = 115, which is not a square. Instead of considering (165)^2 − 26,781, (166)^2 − 26,781, etc., we can use the easier but equivalent method of adding 2x + 1, 2x + 3, etc., and check each result for a square.

x      x^2 − 26,781
165    115 + 329 = 444
166    444 + 331 = 775
167    775 + 333 = 1108
168    1108 + 335 = 1443
169    1443 + 337 = 1780
170    1780 + 339 = 2119
171    2119 + 341 = 2460
172    2460 + 343 = 2803
173    2803 + 345 = 3148
174    3148 + 347 = 3495
175    3495 + 349 = 3844 = (62)^2

Then

(175)^2 − 26,781 = (62)^2

or

26,781 = (175 + 62)(175 − 62) = 237(113).
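The incremental form of Fermat’s method used in the table, where each new difference is obtained by adding 2x + 1 to the previous one, translates directly into a short loop. This is a sketch under the stated assumption that n is odd and greater than 1; the function name fermat_factor is illustrative.

```python
from math import isqrt


def fermat_factor(n):
    """Factor an odd n > 1 as (x + y) * (x - y) with x*x - n == y*y.

    For a prime n the search ends only at the trivial factorization n * 1.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 1")
    x = isqrt(n)
    if x * x < n:
        x += 1  # smallest x with x*x >= n
    diff = x * x - n
    while True:
        y = isqrt(diff)
        if y * y == diff:
            return x + y, x - y
        # (x + 1)^2 - n equals (x^2 - n) + 2x + 1, so no squaring is needed.
        diff += 2 * x + 1
        x += 1


if __name__ == "__main__":
    print(fermat_factor(26781))
```

The method is fastest when the two factors are close to √n and slows down sharply as they move apart, which is why it is usually combined with other techniques.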
Euler’s factorization method depends on expressing the number to be factored as the sum of two squares in two different ways: N = a^2 + b^2 = c^2 + d^2. Then a^2 − c^2 = d^2 − b^2 or (a + c)(a − c) = (d + b)(d − b), where we may assume that a and c are odd and b and d are even. Let k be the gcd of a − c and d − b. Then k is even and a − c = kl and d − b = km. Then (a + c)kl = (d + b)km or (a + c)l = (d + b)m. We see that a + c is divisible by m since (l, m) = 1. Thus a + c = nm, where n is even. Also, nml = (d + b)m or nl = d + b. We see that n is the gcd of a + c and d + b. The factorization is

N = [(k/2)^2 + (n/2)^2](m^2 + l^2).

For example,

1,000,009 = (235)^2 + (972)^2 = 3^2 + (1000)^2.

We have a − c = 232, a + c = 238, d − b = 28, and b + d = 1972. Then k = (232, 28) = 4, n = (238, 1972) = 34, l = (a − c)/k = 232/4 = 58, and m = (d − b)/k = 28/4 = 7. We find the factorization

1,000,009 = [(4/2)^2 + (34/2)^2](7^2 + 58^2) = 293(3413).

O. Fibonacci Numbers

Consider the following ancient problem. Assume that we have a pair of newborn rabbits, one male and one female. Suppose that at the end of two months the pair becomes mature. At the end of the third month, and every succeeding month, they produce a male–female pair. Finally, assume that the rules just stated apply to each new pair of rabbits that is born. Assuming no deaths, how many pairs of rabbits will there be at the end of any given month?

If we represent an immature pair of rabbits by an open circle and a mature pair by a shaded circle, we have the results shown in Fig. 15.

FIGURE 15 The Fibonacci sequence.

Notice that the number of mature pairs each month equals the total number of pairs from the previous month. Also, the number of immature pairs each month equals the number of mature pairs from the previous month. But, since the number of mature pairs from the previous month equals the total number of pairs from two months ago, we see that the total number of pairs in any given month is the sum of the total number of pairs from the previous two months.

The sequence 1, 1, 2, 3, 5, 8, . . . , defined by F_1 = 1, F_2 = 1, F_n = F_{n−1} + F_{n−2} for all n ≥ 3, is called the Fibonacci sequence in honor of Leonardo of Pisa (who was known as Fibonacci, the son of Bonacci). Table I lists the first 30 terms of the Fibonacci sequence.

TABLE I The first 30 Fibonacci numbers

n     F_n       n     F_n        n     F_n
1     1         11    89         21    10,946
2     1         12    144        22    17,711
3     2         13    233        23    28,657
4     3         14    377        24    46,368
5     5         15    610        25    75,025
6     8         16    987        26    121,393
7     13        17    1597       27    196,418
8     21        18    2584       28    317,811
9     34        19    4181       29    514,229
10    55        20    6765       30    832,040

Many properties have been discovered for the sequence, which has been the subject of extensive study. One readily sees, for example, that every third term is divisible by 2, every fourth term is divisible by 3 and every fifth term is divisible by 5. In fact, it can be shown that every prime p divides an infinitude of Fibonacci numbers.

Several other easily discovered properties are

1. The sum of the first n Fibonacci numbers is one less than F_{n+2}.
2. F_{n−1}F_{n+1} − F_n^2 = (−1)^n; in particular, F_{n−1}F_{n+1} + 1 = F_n^2 when n is odd.
3. F_n^2 + F_{n+1}^2 = F_{2n+1}.
4. If A, B, C, and D are four consecutive Fibonacci numbers, then C^2 − B^2 = AD.

The sequence is also related to patterns found in nature. For example, the seeds on the head of a sunflower lie in rows that form clockwise (CW) and counterclockwise (CCW) spirals. The numbers of spirals of each type are often consecutive Fibonacci numbers with the smaller number enumerating the CCW spirals and the larger number the CW ones. A typical sunflower will have 34 CCW and 55 CW spirals, although heads with 144 CCW and 233 CW spirals have been reported.

Other sunflower heads may have numbers of CCW and CW spirals given by consecutive Lucas numbers, named for Edouard Lucas. These numbers are 1, 3, 4, 7, 11, 18, . . . , where L_1 = 1, L_2 = 3, and L_n = L_{n−1} + L_{n−2} for n ≥ 3.

If we consider the ratios F_{n+1}/F_n, we have the sequence 1, 2, 1.5, 1.666, . . . , 1.6154, 1.6190, . . . , 1.6180, . . . .
This sequence converges to the number (1 + √5)/2 = 1.6180339 . . . , which is the golden ratio of ancient Greece.
FIGURE 18 An approximation to the logarithmic spiral.
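The Fibonacci identities listed earlier and the convergence of the ratios F_{n+1}/F_n can be checked numerically. The following is a small sketch; the helper name fibonacci and the padded list F are conventions chosen here, not the article’s. The second identity is tested in the signed form F_{n−1}F_{n+1} − F_n^2 = (−1)^n, because the sign of the difference alternates with n.

```python
def fibonacci(count):
    """Return the list [F_1, F_2, ..., F_count] with F_1 = F_2 = 1."""
    terms = []
    a, b = 1, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


# Pad with a leading 0 so that F[n] is the nth Fibonacci number.
F = [0] + fibonacci(61)

if __name__ == "__main__":
    n_values = range(2, 30)
    print(all(sum(F[1:n + 1]) == F[n + 2] - 1 for n in n_values))
    print(all(F[n - 1] * F[n + 1] - F[n] ** 2 == (-1) ** n for n in n_values))
    print(all(F[n] ** 2 + F[n + 1] ** 2 == F[2 * n + 1] for n in n_values))
    # Four consecutive terms A, B, C, D satisfy C^2 - B^2 = A*D.
    print(all(F[n + 2] ** 2 - F[n + 1] ** 2 == F[n] * F[n + 3]
              for n in n_values))
    golden = (1 + 5 ** 0.5) / 2
    for n in (1, 2, 3, 7, 8, 20):
        print(n, F[n + 1] / F[n], abs(F[n + 1] / F[n] - golden))
```

The ratios alternate above and below the golden ratio while the gap shrinks, which is the behavior visible in the sequence of decimals quoted in the text.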
Another interesting relationship involves the first k Fibonacci numbers. Suppose we choose the first 11 of them: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, and 89. We construct a rectangle having sides of 55 and 89 and then divide it into a square of side 55 and a 34-by-55 rectangle as shown in Fig. 16. Now we divide the rectangle into a square of side 34 and a 34-by-21 rectangle as in Fig. 17. We continue the process of constructing squares in the remaining rectangles until two 1-by-1 squares are constructed. Then in the 55-by-55 square, we draw a quarter circle of radius 55, in the 34-by-34 square we draw a quarter circle of radius 34 that is connected to the first quarter circle, and continue in this fashion until all of the squares have quarter circles. The resulting curve, which closely approximates the logarithmic spiral, a curve found in the shell of the chambered nautilus, is shown in Fig. 18.
FIGURE 17 A rectangle divided in proportion to numbers in the Fibonacci sequence.
III. MODERN DIRECTIONS
Research in number theory flourishes today, the results of that research appearing in numerous scholarly journals on a monthly or quarterly basis. Obviously, this article cannot begin to treat the current state of research in the field. Besides being too large in scope, the subject has become extremely abstract, with many topics which can be fully understood only by experts in the field.
High-speed computers have affected number theory by allowing the consideration of cases involving computations far beyond the capacity of the human mind. Furthermore, by performing thousands of operations a second, computers reduce the time required for certain calculations to within reasonable limits. Starting with $M_{521}$, the most recently discovered Mersenne primes were found with the aid of computers. Some results that were established by use of home computers have been published.
Computers have sparked renewed interest in other ancient problems. In 1974, H. J. J. te Riele announced a pair of amicable numbers, each of which has 152 digits. Using a probabilistic algorithm in conjunction with a computer, Michael O. Rabin found a pair of numbers of the order of magnitude $10^{123}$, which he conjectures are twin primes.
In November, 1996, Joel Armengaud and George F. Woltman proved that $2^{1,398,269} - 1$ is a Mersenne prime that would require more than 400,000 digits to write in the base 10 system and is the largest known prime.
Research in the distribution of primes continues, including the study of primes that differ by a fixed amount and gaps between primes. Sol Weintraub has found the largest known gap between primes, which consists of 682 consecutive composite numbers following the prime 61,003,096,898,749. In view of the use of computers to facilitate searches, it is safe to say that many of today’s largest results will soon be surpassed.
Sometimes a long-term conjecture falls prey to a counterexample. Euler conjectured that no nth power is a sum of fewer than n nth powers; for example, no cube is the sum of fewer than 3 cubes. In 1966, L. J. Lander and T. R. Parkin found the counterexample:
$144^5 = 27^5 + 84^5 + 110^5 + 133^5.$
In 1921, V. Brun showed that even if the number of twin primes should be infinite, they are more thinly distributed throughout the integers than are the primes. More specifically, he showed that the sum of the reciprocals of the primes is infinite but the sum of the reciprocals of twin primes is finite.
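Two of the computational claims in this section, the Lander and Parkin counterexample and the size of the Mersenne prime found in 1996, can be confirmed directly with exact integer arithmetic. This is only a sketch; the variable names are illustrative, and the digit count is obtained from a logarithm rather than by printing the number.

```python
from math import floor, log10

# Lander-Parkin counterexample: a fifth power written as a sum of
# only four fifth powers. Python integers are exact, so the comparison
# involves no rounding.
left_side = 144 ** 5
right_side = 27 ** 5 + 84 ** 5 + 110 ** 5 + 133 ** 5
print(left_side, right_side, left_side == right_side)

# Decimal digits of 2**p - 1. Since 2**p is never a power of 10, the
# count equals floor(p * log10(2)) + 1.
exponent = 1398269
digit_count = floor(exponent * log10(2)) + 1
print("digits in 2**1398269 - 1:", digit_count)
```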
left-hand side cannot be factored into two similar expressions. If r is a root of such an irreducible equation of degree n, then the set of all expressions that can be formed from that root by addition, subtraction, multiplication, and division by nonzero terms is called the algebraic number field of degree n generated by r.
In order to preserve the concept of unique factorization, Kummer introduced what he called ideal numbers. He was able to find a sufficient condition for Fermat’s last theorem to be true and proved it for several particular cases.
Dedekind extended this concept to algebraic number fields in general by considering sets in the fields called ideals. This work led to the development of field theory, an area in which number theory overlaps another branch of modern mathematics called abstract or modern algebra.
Much research is presently conducted on Diophantine equations. A related topic is Diophantine approximation, the approximation of irrational numbers by rational numbers.
A review of current literature in number theory gives evidence of interest in old as well as new topics. Among the former are tests for divisibility, distribution of quadratic residues, Riemann’s zeta function, Pythagorean triples, and the distribution of primes. Among the latter are Fortune’s conjecture (see Section IV.A) and permutable primes. These are numbers such as 13 or 37 that are also prime when their digits are reversed.
IV. UNSOLVED PROBLEMS AND CONJECTURES
The natural numbers are the simplest set of numbers used in mathematics, yet the number of patterns derived from the natural numbers seems almost endless. These patterns have led mathematicians to ask questions such as: Is the pattern true for all integers? Under what conditions does the pattern hold?
A. Conjectures
When statements concerning number patterns are proved to be true, they are called theorems. Before that, they are conjectures, nothing more than educated guesses based on inductive reasoning applied to a finite number of particular cases. Conjectures, then, fall in a kind of no-man’s-land between the list of facts that have been shown to be true and the host of statements that have been shown to be false. They remain there until they are proven to be true theorems or until a counterexample shows them to be false.
Some conjectures enjoy long lives before they are disproved. For example, Fermat’s conjecture that numbers of the form $2^{2^t} + 1$ are always prime survived for a hundred years before it died at the hands of Euler. Many other patterns are disposed of almost as quickly as they appear. As an example, consider some patterns which seem to generate primes. For $p_1^2 + (p_1 + 1)p_2$ we have
$2^2 + 3(3) = 13$
$3^2 + 4(5) = 29$
$5^2 + 6(7) = 67$
$7^2 + 8(11) = 137$
$11^2 + 12(13) = 277,$
all of which are prime. However, $13^2 + 14(17) = 407 = 11(37)$.
Again, consider $p_1^{p_2} + p_1 + p_2$:
$2^3 + 2 + 3 = 13$
$3^5 + 3 + 5 = 251$
$5^7 + 5 + 7 = 78,137,$
all of which are prime. Numbers in this sequence grow in size rapidly. Thus, $7^{11} + 7 + 11 = 1,977,326,761$, which is harder to check for primality by use of a pocket calculator than its predecessors. Of course, it is easily handled by a modern computer. Even this is not necessary, however, since the next term $11^{13} + 11 + 13 = 11^{13} + 24$ is clearly divisible by 5.
The numbers 31, 331, 3331, and 33,331 are all primes. In fact, the pattern continues to yield primes until we reach 333,333,331, which is divisible by 17.
Another interesting pattern is
3! − 2! + 1! = 5
4! − 3! + 2! − 1! = 19
5! − 4! + 3! − 2! + 1! = 101
6! − 5! + 4! − 3! + 2! − 1! = 619
7! − 6! + 5! − 4! + 3! − 2! + 1! = 4421
8! − 7! + 6! − 5! + 4! − 3! + 2! − 1! = 35,899,
which yields the primes listed. Unfortunately, the next step gives 326,981, which is divisible by 79.
The following pattern gives primes for many steps:
41 + 2 = 43
43 + 4 = 47
47 + 6 = 53
53 + 8 = 61
61 + 10 = 71
71 + 12 = 83.
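The prime-generating patterns above, and the points at which they fail, are easy to reproduce by machine. The sketch below uses plain trial division, which is adequate only because the numbers involved are small; the helper name `is_prime` and the list names are illustrative assumptions.

```python
from math import factorial, isqrt


def is_prime(n):
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


primes = [2, 3, 5, 7, 11, 13, 17]
pairs = list(zip(primes, primes[1:]))  # consecutive primes (p1, p2)

# p1**2 + (p1 + 1) * p2 for consecutive primes p1, p2.
quadratic_pattern = [p1 ** 2 + (p1 + 1) * p2 for p1, p2 in pairs]

# p1**p2 + p1 + p2 for the first five pairs.
power_pattern = [p1 ** p2 + p1 + p2 for p1, p2 in pairs[:5]]

# 31, 331, 3331, ... up to 333,333,331.
threes_pattern = [int("3" * k + "1") for k in range(1, 9)]

# n! - (n-1)! + (n-2)! - ... for n = 3, ..., 9.
alternating_factorials = [
    sum((-1) ** (n - k) * factorial(k) for k in range(1, n + 1))
    for n in range(3, 10)
]

for name, values in [
    ("quadratic", quadratic_pattern),
    ("power", power_pattern),
    ("threes", threes_pattern),
    ("alternating factorials", alternating_factorials),
]:
    print(name, [(value, is_prime(value)) for value in values])
```

Each list ends with the first composite value mentioned in the text, so the last entry in every printed row is flagged as not prime.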
In fact, it will give primes for another 33 steps before the composite number $1681 = 41^2$ appears. The pattern is made up of numbers obtained from the formula $x^2 + x + 41$, which we have previously discussed.
Finally, consider the pattern of integers in which each succeeding row is obtained by inserting the number of the row, n, between each pair of integers from row n − 1 whose sum is n:
n    Row                          Number of terms
1    1 1                          2
2    1 2 1                        3
3    1 3 2 3 1                    5
4    1 4 3 2 3 4 1                7
5    1 5 4 3 5 2 5 3 4 5 1        11
6    1 6 5 4 3 5 2 5 3 4 5 6 1    13
In each row shown, the number of terms is prime. This pattern fails for row 10, which contains 33 terms. If we count the number of digits in each row, the tenth row contains 37 digits, a prime number. However, the 11th row contains 57 digits, a composite number.
A recent conjecture is due to Reo F. Fortune who examined the pattern
2 + 1 = 3                          5 − 2 = 3
2(3) + 1 = 7                       11 − 6 = 5
2(3)(5) + 1 = 31                   37 − 30 = 7
2(3)(5)(7) + 1 = 211               223 − 210 = 13
2(3)(5)(7)(11) + 1 = 2311          2333 − 2310 = 23
2(3)(5)(7)(11)(13) + 1 = 30,031    30,047 − 30,030 = 17
The first five sums in the left-hand column are primes, but 30,031 = 59(509). However, for each sum if we find the next larger prime and subtract from it the product of consecutive primes given in that row, the result is prime. Fortune’s conjecture is that this pattern always gives primes. Many feel that the conjecture is true but proving it appears to be a difficult task.
The question of whether there exist an infinite number of Mersenne primes has been unsolved for approximately 300 years, as has the companion question of the existence of an infinite number of even perfect numbers. To date, only 28 Mersenne primes are known, so the question is far from resolved.
Catalan’s Conjecture, due to Eugène Charles Catalan (1814–1894), states that 8 and 9 are the only positive consecutive integral powers of integers.
In general, this suggests that $x^m - y^n = 1$, where x and y are integers greater than 0 and m and n are integers greater than 1, has as its only solution x = 3, y = 2, m = 2, and n = 3. Since m and n vary, as well as x and y, the equation above is a Diophantine equation that is not in polynomial form.
In 1974, Robert Tijdeman proved that there exists a constant k with the property that all powers of integers which equal consecutive integers are less than k. Thus, we know that there can be at most a finite number of such pairs of consecutive powers, although the work of calculating k seems too formidable to allow a definite value to be determined at present.
In 1876, Catalan also examined the sequence $P_0 = 2$ and $P_{n+1} = 2^{P_n} - 1$. Thus,
$P_1 = 2^{P_0} - 1 = 2^2 - 1 = 3$
$P_2 = 2^{P_1} - 1 = 2^3 - 1 = 7$
$P_3 = 2^{P_2} - 1 = 2^7 - 1 = 127$
$P_4 = 2^{P_3} - 1 = 2^{127} - 1.$
He speculated that $P_n$ is prime for n = 1, 2, 3, and 4, all of which were subsequently verified. $P_5$ seems to be undecidable since it has approximately $10^{38}$ digits.
B. Fermat’s Last Theorem
One of the most famous of all problems in number theory, unsolved for over 350 years, goes by the misnomer of Fermat’s last theorem. In the margin of a copy of Diophantus’ “Arithmetic,” opposite a problem concerning writing a square as the sum of two squares, Fermat wrote that it is impossible to write a cube as the sum of two cubes, a fourth power as the sum of two fourth powers, and so forth. In other words, he claimed that the equation $x^n + y^n = z^n$ cannot be solved in positive integers when n > 2.
He also claimed to “have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain.” In view of Fermat’s wrong guess concerning the primality of all Fermat numbers, one must be skeptical of his claim, or at least wish he had had a supply of blank paper at hand.
Only one proof concerning number theory is known to be due to Fermat, that being found in another margin of the same book. This theorem showed that the area of a Pythagorean triangle having integral sides cannot be a square integer. This theorem leads to the proof of Fermat’s last theorem for the case n = 4; that is, $x^4 + y^4 = z^4$ has no solutions.
Fermat claimed to be able to prove the conjecture for n = 3, but published no proof. Euler gave such a proof nearly 100 years later, but it contained some faulty reasoning that fortunately could be corrected.
Work on the problem progressed slowly as other mathematicians proved the conjecture true for n = 5, n = 7, and other particular values of n. It can be shown that if the conjecture is true for some integer k, then it is true for any multiple of k. Hence, with the case n = 4 settled, it suffices to consider only exponents that are odd primes. Partial results included showing that if the theorem is false for some value of n > 2, then n must exceed 4,000,000. It was also shown that any solution led to
numbers that are inconceivably large, exceeding the capacity of even the most modern computers.
Furthermore, it was known that $x^p + y^p = z^p$ has no solution in integers that are relatively prime to p if p is an odd prime and q = 2p + 1 is also prime. In 1983 Gerd Faltings proved that the formula contained in Fermat’s last theorem has only finitely many rational solutions when n > 2. From time to time, several reputable mathematicians, as well as countless amateurs, gave purported proofs that Fermat’s last theorem was true for all n > 2. Unfortunately, errors were discovered in each of the proofs presented before 1993.
In June, 1993, Andrew J. Wiles of Princeton University gave three lectures at Cambridge University, which culminated in a proof of Fermat’s last theorem. Wiles had been interested in the problem since the age of 10 when he vowed to find a proof. His interest in the problem led him to choose mathematics as a career.
His teachers and professors advised him that he would be wasting his time pursuing such a difficult and uncertain goal. At Cambridge, his advisor guided him to the field of elliptic curves, which are curves having cubic equations and which can be used for calculating the perimeter of ellipses.
Wiles’ concern for the theorem never abated, even while he did research on elliptic curves. In 1986, his determination was strengthened by the work of other mathematicians who postulated and then proved a connection between Fermat’s last theorem and elliptic functions. He devoted himself completely to solving the problem and seven years later he was ready to present his proof to fellow mathematicians.
The reaction to his proof was astonishment and widespread acclaim. Verifying his proof was a slow process for two reasons: (1) his proof was 200 pages long and (2) it was estimated that elliptic curves were understood by approximately only one tenth of one percent of all professional mathematicians.
Minor errors in his proof were found and easily corrected. Then a catastrophe occurred; a flaw in the proof was discovered that could not be easily overcome. Wiles and his former student, Richard Taylor, worked tirelessly to save the proof. They finally decided to employ a different type of elliptic curve that would eliminate the error.
After 14 months Wiles began to believe that his proof would suffer the same fate as all previous ones, failure. Then, one of those events occurred that sound more like fiction than fact. They were close to giving up but on September 19, 1994 they found a way to eliminate the error and save the proof.
In June, 1997 Wiles was awarded the Wolfskehl Prize, which had been unclaimed for 89 years, for proving Fermat’s last theorem. Had he not studied elliptic curves at Cambridge, Wiles might not have been prepared to do the work which led to his successful proof. Thus, his advisor unwittingly helped Wiles achieve his childhood goal.
C. Goldbach’s Conjectures
Nearly 250 years ago, Christian Goldbach, in correspondence with Euler, posed two conjectures:
1. Every even number greater than or equal to 6 is the sum of two odd primes.
2. Every odd number greater than or equal to 9 is the sum of three odd primes.
The second conjecture is actually a consequence of the first, and in 1937 Vinogradoff proved that any odd number that is sufficiently large is the sum of three odd primes. How large the number has to be is not known. It has been established that if N is an even integer larger than $e^{e^{16,038}}$, then N is the sum of no more than four primes.
It is also known that every sufficiently large even integer is the sum of a prime and an integer having no more than two distinct prime factors. Complete resolution of Goldbach’s conjectures awaits future results.
The question of the existence of odd perfect numbers is still unresolved. Most mathematicians are inclined to believe that perfect numbers must be even, although an odd one may be detected someday by methods as yet unimagined. It is known that any such integer must exceed $10^{50}$ and must have at least eight prime factors.
Number theory is the oldest branch of mathematics and concerns the simplest set of numbers, the integers. Because some of its problems can be stated in easily understood terms, it probably has attracted more amateurs than any other branch of mathematics. Although many of its problems today are stated by means of abstract technical definitions not easily mastered by the lay person, nevertheless the subject that has commanded widespread attention over the past 3000 years should remain a vital area of human learning for at least that far into the future.
SEE ALSO THE FOLLOWING ARTICLES
COMPUTER ALGORITHMS • LINEAR SYSTEMS OF EQUATIONS • NUMBER THEORY, ALGEBRAIC AND ANALYTIC • SET THEORY
BIBLIOGRAPHY
Boston, N., and Greenwood, M. L. (1995). “Quadratics representing primes,” Am. Math. Month. 102(7), 595–599.
Bunt, L. N. H., Jones, P. S., and Bedient, J. D. (1976). “The Historical Roots of Elementary Mathematics,” Prentice-Hall, Englewood Cliffs, New Jersey.
Crandall, R. E. (1997). “The challenge of large numbers,” Sci. Am. 276(2), 74–78.
Dudley, U. (1969). “Elementary Number Theory,” Freeman, San Francisco, California.
Edwards, H. M. (1978). Fermat’s last theorem. Sci. Am. 239(4), 104–122.
Gardner, M. (1979). Sci. Am. 241(3), 22–32.
Gardner, M. (1980). Mathematical games. Sci. Am. 243(6), 18–28.
Hardy, G. H., and Wright, E. M. (1960). “The Theory of Numbers,” 4th ed. Oxford Univ. Press (Clarendon), London and New York.
Garrison, B. (1981). Consecutive integers for which $n^2 + 1$ is composite. Pacific J. Math. 97(1), 93–96.
Kostis, G. J., and Page, R. L. (1964). A formula concerning twin primes. Math. Mag. 37(3), 153–154.
McCoy, N. H. (1965). “The Theory of Numbers,” Macmillan, New York.
McCurley, K. S., An effective seven cube theorem, J. Number Theory 19(2), 176–183.
Newman, J. R. (ed.) (1956). “The World of Mathematics,” Simon & Schuster, New York.
Pomerance, C. (1980/1981). Recent developments in primality testing. Math. Intelligencer 3(3), 97–105.
Ribenboim, P. (1994). “Prime number records,” College Math. J. 25(4), 280–290.
Rouse Ball, W. W. (1960). “A Short Account of the History of Mathematics,” Dover, New York.
Singh, S., and Ribet, K. A. (1997). “Fermat’s last stand,” Sci. Am. 277(5), 68–73.
Weintraub, S. (1982). A prime gap of 682 and a prime arithmetic sequence. BIT 22(4), 538.
P1: GPA/GJK Final Pages P2: GSS Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN011G-505 July 25, 2001 18:45
Numerical Analysis
John N. Shoosmith
NASA, Langley Research Center, retired
Roundoff error Error introduced because of the limit to the precision (number of places) to which numbers can be represented in any finite computation.
Stability A numerical method is unstable if, when it is applied to a well-conditioned problem, a small change in the data results in a large change in the numerical solution. It is stable if the change in the solution remains small.
Truncation error Error introduced because of the limit to the number of numbers that can be used in any finite computation. For example, a variable may be represented exactly by an infinite mathematical series, but only the most significant terms can be retained. The discarded terms contribute to the truncation error in the computed solution.
Vector algorithm Algorithm designed to make efficient use of a vector processor. In a vector processor a single instruction causes the same operation to be performed on one or more sequences of numbers (vectors) in an assembly-line fashion.
NUMERICAL ANALYSIS is the study of the solution of mathematical problems through the manipulation of numbers. The solution processes are usually referred to as numerical methods, and the required sequences of numerical and logical operations, when precisely set down, are called algorithms. The solutions thus obtained are usually not exactly correct, but approximately so. In the context of science and engineering, the problems of concern are either the result of reducing the analysis of a physical situation to mathematical equations that cannot be simplified further by usual mathematical means, or the physical laws and constraints governing a system under study are “modeled” by mathematical equations, in which case the numerical solution can be said to simulate the behavior of the system. Although numerical methods have been used for centuries (Johannes Kepler used one to determine the orbit of Mars in 1607), the development of digital computers, with their tremendous capacity for carrying out arithmetic and logical operations on numbers, has made numerical methods practical in a great variety of scientific and technological applications. Today, almost all numerical methods are carried out on computers. From an understanding of number representations and manipulations, the numerical analyst must devise the methods and design the algorithms to solve a variety of problems. In doing so, he or she must be concerned with the source and propagation of errors, and whether or not the method “converges” to a close approximation of the solution. Also, in spite of the great speed of modern computers, the matters of convergence rate and computational efficiency are of paramount importance and may determine whether or not the method is practical. A recent development in computing is the use of collections of computers to solve individual problems (parallel processing), and this requires the consideration of problem partitioning and algorithm development to take advantage of the particular computer architectures involved. It is in this area that numerical analysis and computer science are particularly closely related.
I. NUMERICAL ANALYSIS AS A SUBJECT AREA
A. The Numerical Approach to Problem Solution
The means by which physical situations and processes are described, analyzed, designed, and simulated is through mathematics. Natural laws are stated in terms of mathematical equations, and the behavior of systems that obey those laws is described by their solutions.
Unfortunately, the mathematics of many of the processes we would like to study quickly becomes intractable when approached by conventional means. For example, analytical solutions to most nonlinear systems of equations simply cannot be found. The best that can be done, in the traditional sense, is to attempt a series expansion of the solution.
Today, there is another approach: the problem statement and variables of interest can be approximated numerically. Analysis and problem solution can then be performed through numerical computation with the aid of high-speed digital computers. To be sure, something is lost when it becomes necessary to resort to numerical methods, because characteristics of the solution that are immediately apparent from inspection of analytical expressions may be obscured in listings of numbers; however, a numerical solution is certainly better than no solution, and sometimes its nature can be revealed by repeating the process with small changes in the data. Also, computer-generated graphs or images can often provide a sufficiently accurate visual interpretation of the solution to aid in its understanding.
The numerical approach is illustrated for a very simple problem in Fig. 1. The problem is posed in physical terms in (a), and its mathematical formulation is given in (b). In this situation we know the solution, but in more complex cases, of course, we may not.
The next step is to select or develop a numerical method. Here, we have chosen the Euler method for the solution of initial value problems involving ordinary differential equations, again because of its simplicity. Numerical methods can be thought of as operators that accept numbers as input (in this case the initial velocity
$V_0$, the problem parameters D and M, and the discretization parameter h) and produce other numbers as output (the successive values of time and velocity).
The final stage is to produce an algorithm, a step-by-step implementation of the method. Algorithms are thought of rather like flow charts and are usually described in an unambiguous way by means of an algorithmic or even a computer-programming language. Algorithms are recipes that could conceivably be followed by a person with pencil and paper; however, it is usual to convert them to computer programs, which can then be executed on a suitable computer.
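As an illustration of turning a method into an algorithm, the Euler method can be written out in a few lines. Fig. 1 is not reproduced here, so the model used below, $M\,dV/dt = -DV$ with $V(0) = V_0$, is an assumption chosen only because it involves the same symbols; the parameter values and the function name `euler_velocity` are likewise hypothetical.

```python
import math


def euler_velocity(v0, drag, mass, h, t_final):
    """Euler's method for the assumed model M dV/dt = -D V, V(0) = V0.

    Returns the successive values of time and velocity.
    """
    steps = round(t_final / h)
    times, velocities = [0.0], [v0]
    for _ in range(steps):
        t, v = times[-1], velocities[-1]
        slope = -drag * v / mass          # dV/dt evaluated at the current point
        velocities.append(v + h * slope)  # one Euler step of size h
        times.append(t + h)
    return times, velocities


# Hypothetical data: V0 = 10, D = 0.5, M = 2, final time 4.
V0, D, M, T_FINAL = 10.0, 0.5, 2.0, 4.0
exact = V0 * math.exp(-D * T_FINAL / M)  # known solution of the assumed model

errors = {}
for h in (0.4, 0.2, 0.1, 0.05):
    _, velocities = euler_velocity(V0, D, M, h, T_FINAL)
    errors[h] = abs(velocities[-1] - exact)
    print(h, velocities[-1], errors[h])
```

For this assumed problem, halving the interval size h roughly halves the error in the final velocity.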
In fact, in the limit as h approaches zero, the answer is exact, since the error bound is then zero. Of course, it does not make sense to use a zero interval size, but the point is that we can make the error as small as we wish by selecting h sufficiently small. In this case, the truncation error in computing $V(t_f)$ is bounded in direct proportion to the interval size h, and we say that the method converges with order h.
Turning now to the computation of the solution as described by the algorithm, we first must specify values for the parameters of the problem and the initial data. Because, in any practical computing device, the number of digits allocated to a number is limited, it will probably be necessary to chop or round these numbers before they are stored. The error committed by doing this is called inherent roundoff. Also, during the course of the computation, arithmetic operations are performed that produce results with more digits than the operands, and these results must be chopped or rounded before they are stored. This error is called arithmetic roundoff. A particularly serious consequence of roundoff occurs when two numbers of nearly the same value are rounded before they are subtracted, since this can result in the loss of significant information.
It is possible to minimize the effect of roundoff error by judicious design of the algorithm. For example, the computation of the roots of a quadratic equation
$ax^2 + bx + c = 0,$
by the quadratic formula,
$r = \left(-b \pm \sqrt{b^2 - 4ac}\right)/(2a),$
when b is large relative to a and c, can produce a poor approximation to the smallest in magnitude root. To avoid this, it is possible to use the mathematically equivalent formula,
$r = 2c/\left(-b \mp \sqrt{b^2 - 4ac}\right)$
for that root.
Another example is in the calculation of expressions of the type
$\sum_{i=1}^{n} a_i b_i.$
The results of each multiplication can be saved and added together in extended precision. (If the computer hardware cannot do this, then it can be done with the use of a special program, treating the lower and upper halves of the multiplication results as separate variables.) Then the accumulated sum may be rounded at the end of the calculation.
In order to assess the effect of an error as it is propagated through the computation, we consider an algorithm as a mapping F from a set of input numbers $X \equiv \{x_1, x_2, \ldots, x_n\}$ to a set of results $Y \equiv \{y_1, y_2, \ldots, y_m\}$ and write
$y_i = F_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, m.$
The error propagation formula is simply a first order difference expansion of this equation. Thus,
$\Delta y_i \approx \frac{\partial F_i}{\partial x_1}\Delta x_1 + \frac{\partial F_i}{\partial x_2}\Delta x_2 + \cdots + \frac{\partial F_i}{\partial x_n}\Delta x_n, \quad i = 1, 2, \ldots, m.$
The effect on the ith output variable of an error in the jth input variable can be estimated by knowing the appropriate partial derivative. Of course, in most cases this will not be known directly; however, it may be possible to find an experimental estimate by observing the change in the output variables as the algorithm is executed with small, incremental changes in the input data, one variable at a time.
Arithmetic operations that introduce roundoff error are often treated as equivalent, exact operations with erroneous operands. By working backward, the effect of arithmetic roundoff can be reduced to an equivalent error in the input, which can then be translated to output error by the use of the error propagation formula.
C. Condition and Numerical Stability
In solving a problem numerically, it is inevitable that errors will be committed, both because of the use of a discrete method (truncation) and because of finite precision calculations (roundoff). There are two effects that can cause such errors to grow to serious proportions, one of which has to do with the problem itself, and the other due to the method which is employed.
Some problems are inherently ill-conditioned, which means that a small change in the data results in a large change in the solution. A simple example is the system of two linear equations representing the intersection of two straight lines that are nearly parallel, such as
$y = x + 1, \quad y = 1.01x.$
These equations have the solution (100, 101); however, if the coefficient of x is changed to 1.001x, less than a 1% change, the solution is then (1000, 1001), which is a 900% change in x. It is important that the numerical analyst be aware that a problem is ill-conditioned. It may be possible to reformulate it to avoid the ill conditioning, or if not, to use a higher-order method and increased precision arithmetic in order to reduce the introduction of numerical error as much as possible.
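Two of the points made in this section, the loss of accuracy in the quadratic formula when b is large and the sensitivity of the nearly parallel lines, can be observed in ordinary double-precision arithmetic. The sketch below is illustrative only; the function names and the test coefficients a = 1, b = 10^8, c = 1 are assumptions, not taken from the article.

```python
import math


def roots_naive(a, b, c):
    """Both roots from the quadratic formula as usually written."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def roots_stable(a, b, c):
    """Return (larger-magnitude root, smaller-magnitude root) without cancellation.

    b and the square root are combined with the same sign, so no nearly
    equal numbers are subtracted; the small root is then 2c / (-b -+ sqrt(...)).
    Assumes real roots and b != 0.
    """
    disc = math.sqrt(b * b - 4 * a * c)
    q = -(b + math.copysign(disc, b)) / 2
    return q / a, c / q


def intersection(slope):
    """Intersection of y = x + 1 with y = slope * x (slope != 1)."""
    x = 1 / (slope - 1)
    return x, slope * x


small_naive = roots_naive(1.0, 1e8, 1.0)[0]
large_stable, small_stable = roots_stable(1.0, 1e8, 1.0)
print("naive small root: ", small_naive)
print("stable small root:", small_stable)

print("slope 1.01: ", intersection(1.01))
print("slope 1.001:", intersection(1.001))
```

The true small root is very nearly $-10^{-8}$; the rearranged formula reproduces it closely, while the textbook form loses most of its significant digits to cancellation.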
Numerical instability is a condition that is due to the
n
∝
method employed. A simple example is the solution of a x =± bi × 2i + b− j × 2 − j .
differential equation problem i=0 j=1
y = f (x, y), y(0) = y0 In general, for the integer base B > 1, a real number is
represented by
by the “midpoint method,”
$$y_{i+1} = y_{i-1} + 2h f(x_i, y_i), \qquad y_1 = y_0 + h f(x_0, y_0).$$

In this method, a small error can grow as x increases, even when the problem itself is well-conditioned, because of the existence of a parasitic solution. This is explained more fully in the section on differential equations. It is a challenge to the numerical analyst to recognize instability and to provide an alternative, stable method.

II. FINITE-PRECISION NUMERICAL OPERATIONS

A. Number Representation

In our standard, positional, decimal notation, a real number is represented as

$$\pm d_n d_{n-1} \cdots d_0 \,.\, d_{-1} d_{-2} \cdots,$$

where each d is a digit, that is, a symbol representing zero or one of the first nine natural numbers, and the subscript is an index specifying the position of the digit in the number. The ± indicates that the sign may be either + or −. The period after $d_0$ is called the decimal point, and it separates the integral part of the number (on the left) from the fractional part. The leading digit $d_n$ is nonzero except when the integral part is zero.

The value of the number, which we will designate by x, is then given by

$$x = \pm\left(\sum_{i=0}^{n} d_i \times 10^{i} + \sum_{j=1}^{\infty} d_{-j} \times 10^{-j}\right),$$

the first sum being the value of the integral part and the second being that of the fraction.

The number of distinct symbols used to represent a single digit in the decimal system is 10 (probably because we have 10 fingers), and we say that the base of the decimal system is 10. Electronic devices depend on sensing either the presence or absence of a signal, just two possible states, which we can represent with 0 and 1. Thus, for electronic computers we need to consider the binary representation, which uses the base 2. The binary representation of a real number is

$$\pm b_n b_{n-1} \cdots b_0 \,.\, b_{-1} b_{-2} \cdots,$$

where each b (bit) is either 0 or 1. The value of this number is

$$x = \pm\left(\sum_{i=0}^{n} b_i \times 2^{i} + \sum_{j=1}^{\infty} b_{-j} \times 2^{-j}\right).$$

In general, a real number is represented in base B as

$$\pm g_n g_{n-1} \cdots g_0 \,.\, g_{-1} g_{-2} \cdots,$$

where the g's are symbols for zero and the first B − 1 positive integers; its value is given by

$$x = \pm\left(\sum_{i=0}^{n} g_i \times B^{i} + \sum_{j=1}^{\infty} g_{-j} \times B^{-j}\right).$$

Commonly used bases, in addition to 10 and 2, are 8 (octal), which uses the symbols 0 through 7, and 16 (hexadecimal), which uses 0 through 9, then A, B, C, D, E, and F to represent the integer values from 10 to 15. These bases are useful because conversion to and from binary is easily accomplished through the grouping of bits three and four at a time, respectively. Table I gives representations of the first 16 positive integers for five different bases.

Arithmetic is carried out in any base system by using the same procedure as in decimal arithmetic, keeping in mind, however, that “carries” and “borrows” are dependent on the value of the base. Figure 2 gives examples of arithmetic in binary, octal, and hexadecimal.

Conversion from one base to another is accomplished by a slightly different procedure, depending on the base in which the arithmetic is to be performed. Examples are given, in Fig. 3, of conversion between octal and decimal. If the arithmetic is to be done in the base notation from

TABLE I The First Sixteen Natural Numbers Written in Some Different Base Notations

Decimal    Binary    Ternary   Octal     Hexadecimal
(base 10)  (base 2)  (base 3)  (base 8)  (base 16)
1          1         1         1         1
2          10        2         2         2
3          11        10        3         3
4          100       11        4         4
5          101       12        5         5
6          110       20        6         6
7          111       21        7         7
8          1000      22        10        8
9          1001      100       11        9
10         1010      101       12        A
11         1011      102       13        B
12         1100      110       14        C
13         1101      111       15        D
14         1110      112       16        E
15         1111      120       17        F
16         10000     121       20        10
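The positional formula for a base B and the bit-grouping shortcut between binary, octal, and hexadecimal can be sketched in a few lines of Python. This is only an illustration for non-negative integers; the names `DIGITS`, `to_base`, `from_base`, and `regroup_binary` are hypothetical and do not come from the article.

```python
DIGITS = "0123456789ABCDEF"  # symbols for the values 0..15


def to_base(n, base):
    """Write the non-negative integer n in the given base (2..16)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the lowest-order digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base(text, base):
    """Evaluate sum(g_i * base**i) for an integer digit string."""
    value = 0
    for symbol in text:
        digit = DIGITS.index(symbol)
        if digit >= base:
            raise ValueError("symbol not valid in this base")
        value = value * base + digit
    return value


def regroup_binary(bits, group):
    """Convert a bit string to octal (group=3) or hexadecimal (group=4)."""
    width = -(-len(bits) // group) * group  # round length up to a multiple
    padded = bits.zfill(width)
    return "".join(
        DIGITS[int(padded[k:k + group], 2)] for k in range(0, width, group)
    )


if __name__ == "__main__":
    for n in (9, 12, 16):
        print(n, [to_base(n, b) for b in (2, 3, 8, 16)])
    print(regroup_binary(to_base(358, 2), 3))
```

The rows printed for 9, 12, and 16 can be compared with the corresponding rows of Table I.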
P1: GPA/GJK Final Pages P2: GSS
Encyclopedia of Physical Science and Technology EN011G-505 July 25, 2001 18:45
44 Numerical Analysis
Numerical Analysis 45
and negative zero are numerically equivalent, but the distinction can be useful; for example, division of a positive number by negative zero produces negative infinity.

The smallest positive normalized single format number is $2^{-126}$. If any operation produces a positive number that is smaller than this, the biased exponent is set to zero and the significand is shifted to the right, so that the number is no longer normalized and the precision is less than 24 bits. Similarly, the largest negative normalized number is $-2^{-126}$, and the production of a larger negative number results in a denormalized form. The generation of a denormalized number signals an underflow condition.

A biased exponent of 255 with a zero fraction indicates plus or minus infinity (e.g., the result of dividing a nonzero number by a signed zero). A biased exponent of 255 with a nonzero fraction is defined as a NaN (Not a Number). A NaN is produced by an invalid operation, such as dividing zero by zero. When the magnitude of the result of an operation is equal to or larger than $2^{128}$ (overflow condition), the default rounding mode returns an infinity of the appropriate sign.

The IEEE standard also defines a double format number that is stored in a 64-bit word. In the double format mode the fraction length is 52 bits, the bias is 1022, and the biased exponent length is 11 bits. The precision of a double format normalized number is 53 bits.

In order to reduce the effect of roundoff error, the floating point processing units of modern computers generally contain a number of registers that are longer than either the single or double format standard. Register to register computations take place in extended precision (also defined by the IEEE standard), and only when a result is to be stored in the computer’s memory does it get shortened to single or double precision.

FIGURE 4 Representation of numbers in a 32 bit computer word. (a) Bit numbering convention. (b) Binary-coded decimal representation of 358. (c) Binary integer representation of +358 (+546 octal). (d) Single format floating point binary representation of +358.625.

C. Roundoff Error

Exact arithmetic operations typically produce results that contain a greater number of significant positions than the operands. For example, the sum of 8.5 and 9.2 is 17.7 (one more significant digit than the addends), and the product of 1.234 and 1.432 is 1.767088 (the fractional part has twice as many significant digits). In a binary floating point processing unit, the exact result of an arithmetic operation on single format operands may be contained in an extended precision register; however, if the result is to be stored in single format, some loss of precision is inevitable. The shortening of a number to a lower precision is accomplished by “rounding.”

There are two choices in rounding. The first is to merely discard the portion of the significand that follows the least significant bit of the shortened format; the second is to add one (with appropriate carries to more significant positions) to the least significant bit after discarding the same portion. The default rounding mode in the IEEE standard is “round-to-nearest.” This means that the choice is made so that the rounded number is nearest to the infinitely precise result. If both choices are equally near, then the even shortened significand is chosen.

The IEEE standard also provides for user specified “directed” rounding modes in order to reproduce results that may have been obtained on nonstandard implementations. Round-towards-zero (also referred to as chopping) always uses the first choice. Round-towards-positive-infinity uses the first choice if the number is negative and the second if it is positive. Round-towards-negative-infinity uses the first choice if the number is positive and the second if it is negative.

A bound for the relative error (magnitude of the maximum possible error divided by the exact number) is $2^{-p}$ for round-to-nearest and $2^{-p+1}$ for directed rounding, where p is the significand length. This bound is referred to as the machine unit and designated by µ.

The generation of error in sequences of floating point operations is determined by backward error analysis. In this approach, each rounded result is supposed to be obtained by an exact calculation with numbers that are initially in error. It is possible to work backward in this manner to establish a bound on the error in the initial data that would lead to the same final result with exact arithmetic. An estimate of the accumulated error can then be obtained from the error propagation formula. Bounds for the error that have been obtained in this manner for some common sequences of calculations are shown in Table II.

The rate of convergence of iterative methods can be strongly affected by roundoff error; thus, the use of multiple extended precision registers in floating point units, in which sequences such as those shown in Table II can be executed prior to rounding to single or double format, can significantly improve the performance of the computer.

TABLE II Bounds for Errors in Some Common Floating Point Calculations

F(x_1, x_2, ..., x_n)            Absolute error bound (a)
$\sum_{i=1}^{n} x_i$             $\sum_{i=1}^{n} |(n+1-i)\,x_i|\,(1.06\mu)$
$\prod_{i=1}^{n} x_i$            $(n-1)\prod_{i=1}^{n} |x_i|\,(1.06\mu)$
$\sum_{i=1}^{n} x_i y_i$         $\sum_{i=1}^{n} |(n+2-i)\,x_i y_i|\,(1.06\mu)$
$\sum_{i=1}^{n} a_i x^i$         $\sum_{i=1}^{n} |(2i+1)\,a_i x^i|\,(1.06\mu)$

(a) Note: µ is the machine unit; $2^{1-p}$ for chopping, $2^{-p}$ for rounding, where p is the significand length.
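The machine unit and the summation bound in the first row of Table II can be checked numerically. The sketch below assumes that Python floats are IEEE double format with round-to-nearest (p = 53), which holds on common platforms but is an assumption here; the names `P`, `MU`, and `summation_error_and_bound` are hypothetical. Exact rational arithmetic from the standard `fractions` module stands in for "infinitely precise" results.

```python
from fractions import Fraction

P = 53            # assumed significand length of a Python float (IEEE double)
MU = 2.0 ** -P    # machine unit for round-to-nearest, 2**-p


def summation_error_and_bound(xs):
    """Return (actual error, Table II bound) for left-to-right summation."""
    s = 0.0
    for x in xs:
        s += x                                  # every addition is rounded
    exact = sum(Fraction(x) for x in xs)        # exact sum of the same data
    error = abs(Fraction(s) - exact)
    n = len(xs)
    weighted = sum(
        abs(Fraction(x)) * (n + 1 - i) for i, x in enumerate(xs, start=1)
    )
    bound = weighted * Fraction(106, 100) * Fraction(MU)
    return float(error), float(bound)


if __name__ == "__main__":
    # 1 + MU lies halfway between 1 and the next float; the tie goes to
    # the even significand, so the sum rounds back to 1.
    print(1.0 + MU == 1.0, 1.0 + 2.0 * MU > 1.0)
    xs = [1.0 / k for k in range(1, 1001)]
    print(summation_error_and_bound(xs))
```

In practice the observed error is usually far below the bound, since the bound covers the worst case in which every rounding error has the same sign.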
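The single format fields discussed in Section II (sign, biased exponent, and fraction; compare Fig. 4d) can be inspected with Python's standard `struct` module. This is a minimal sketch: the helper names `single_fields` and `classify` are hypothetical, and the code reports the stored exponent field as-is, leaving aside which bias and significand convention is used to interpret it.

```python
import struct


def single_fields(x):
    """Round x to IEEE single format and split the 32-bit word into
    (sign bit, 8-bit biased exponent, 23-bit fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, biased_exponent, fraction


def classify(x):
    """Name the category of x according to its exponent and fraction fields."""
    _, exponent, fraction = single_fields(x)
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"


if __name__ == "__main__":
    sign, exponent, fraction = single_fields(358.625)
    print(sign, format(exponent, "08b"), format(fraction, "023b"))
    for value in (2.0 ** -126, 2.0 ** -127, -0.0, float("inf"), float("nan")):
        print(value, classify(value))
```

Values whose magnitude is too large for single format make `struct.pack` raise `OverflowError`, so they are not used in the demonstration.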
A. Background
By an algebraic equation in a single independent variable
x, we will mean an equation that can be put in the form
f (x) = 0,
where f is a single-valued function of x, containing neither derivatives nor integrals with respect to x. Examples are
$$x - \sin x = 10$$
$$x^3 = 2$$
$$e^x + \ln x - 3 = 0.$$
For purposes of this discussion, we will assume that f is
continuous and differentiable in x. Clearly, by adding x to
both sides, we have the equivalent equation
F(x) = f (x) + x = x.
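As a small numerical illustration of the two forms, the example $x^3 = 2$ can be written as $f(x) = x^3 - 2 = 0$, with $F(x) = f(x) + x$. The Python sketch below (the names `f`, `F`, and `x_star` are chosen here for illustration) checks that the real cube root of 2 satisfies both equations to within roundoff.

```python
def f(x):
    """Root form of the example x**3 = 2."""
    return x ** 3 - 2.0


def F(x):
    """Fixed-point form obtained by adding x to both sides."""
    return f(x) + x


x_star = 2.0 ** (1.0 / 3.0)  # real cube root of 2, known in closed form

if __name__ == "__main__":
    print("f(x*)      =", f(x_star))           # approximately zero
    print("F(x*) - x* =", F(x_star) - x_star)  # approximately zero
```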
A commonly occurring problem is to solve the equation in one of these two forms, by which we mean to find a value of x for which the equation is true. In the first case a solution is called a root of f(x), and in the second it is called a fixed point of F(x).

In the case where f(x) is linear, that is, of the form

$$f(x) = ax + b, \qquad a \neq 0,$$

a solution exists, is unique, and can be determined immediately for given values of a and b. If f(x) is not linear, the situation is a great deal more complicated, because there may not be a solution or there may be many solutions. In order to get some understanding of the nature of the problem, it is usually advantageous to construct a graph of f(x) versus x and observe if and where this graph crosses the x axis, or alternatively to graph F(x) and observe if and where it crosses the line y = x. This is illustrated in Fig. 5.

FIGURE 5 The graphical determination of a root or fixed point.

Another complication for a nonlinear equation is that (with certain specific exceptions) it is not possible to obtain a solution in a finite number of operations. The best that can be done is to approximate a solution by making an initial numerical estimate and then to methodically refine it by obtaining a succession of (hopefully, ever closer) estimates, until there is little change between successive values. This process is referred to as iteration.

To make this more precise, suppose a solution is $x^*$, the initial estimate is $x_0$, and successive iterations produce the infinite sequence

$$x_1, x_2, x_3, \ldots, x_i, \ldots;$$

then the absolute error of the ith iterate is $|x_i - x^*|$. The iteration is said to converge to $x^*$ if

$$\lim_{i \to \infty} |x_i - x^*| = 0.$$

If f(x) is continuous and differentiable, a sufficient condition for convergence is that there exists a constant C such that

$$\frac{|x_{i+1} - x^*|}{|x_i - x^*|} \le C < 1$$

for all i greater than some threshold number. If this inequality holds, the convergence is said to be of order 1, or linear. More generally, if it can be shown that there is a number p ≥ 1 and a constant C such that

$$\frac{|x_{i+1} - x^*|}{|x_i - x^*|^{p}} \le C < 1$$

for all i greater than some threshold, then the iteration converges with order p. In particular, if p = 2, the convergence is said to be quadratic. The higher the order of convergence, the fewer the iterations that should be required to home in on the solution; however, whether or not a particular iteration will converge and how rapidly it will do so depend on a number of factors, such as how close the initial estimate is to the desired solution and how close solutions are to one another. The situation of multiple, identical solutions can be particularly troublesome.

As a practical matter, a test for convergence is usually made at the end of each iteration. If

$$\frac{|x_{i+1} - x_i|}{|x_i|} < \varepsilon_1$$
for a prescribed $\varepsilon_1 > 0$, usually of the order of the machine unit µ, the iteration is stopped and the solution is taken as $x_{i+1}$. In the event that $x_i$ is close to zero, however, the test must be modified to the absolute form, and the iteration is stopped when

$$|x_{i+1} - x_i| < \varepsilon_2,$$

where, again, $\varepsilon_2$ is a sufficiently small positive number. Finally, in the root-finding case, a termination test can be in the form

$$|f(x_{i+1})| < \varepsilon_3.$$

B. Iteration Methods

The simplest possible iteration method is the ancient method of repeated substitution, which can most readily be applied to an algebraic equation in the (fixed-point) form

$$x = F(x).$$

Starting with an initial estimate of $x_0$, the iteration formula is

$$x_{i+1} = F(x_i).$$

It can be shown that the method converges to $x^*$ if

$$|F'(x)| < 1$$

for $(x^* - |x_0 - x^*|) \le x \le (x^* + |x_0 - x^*|)$; the order of convergence is linear if $F'(x^*) \neq 0$.

Bracketing methods are applied to equations in the (root) form,

$$f(x) = 0.$$

They rely on first finding two values of x, say a and b, such that f(a) and f(b) have opposite sign. Thus, if f(x) is continuous, a root lies somewhere between a and b. Two methods for successively narrowing the interval containing the root are the bisection method and the method of false position. These methods are illustrated in Fig. 6. Their advantage is that once an interval containing a root has been found, convergence is guaranteed. On the negative side, it is not always easy to find two values that bracket a root, and the order of convergence of both of these methods is only linear. In most cases the method of false position converges more rapidly than the method of bisection; however, there are situations where it is much slower. A common approach is to start with the method of false position and then to revert to the method of bisection if convergence is not achieved after a prescribed number of iterations.

Extrapolation methods use information at previous iterations to extrapolate to a new estimate of a root. Newton’s method uses the value and derivative of f(x) at $x = x_i$ to extrapolate linearly to $x_{i+1}$; the secant method uses the latest two values of f(x) to extrapolate linearly; and Muller’s method uses the latest three values to perform a quadratic extrapolation. The order of convergence of these methods (to simple roots) is 2, 1.62, and 1.84, respectively, but the major disadvantage is that initial estimates must be relatively close in order to achieve convergence. Also, Newton’s method requires the computation of the function and its derivative at each step. Newton’s method and the secant method are illustrated in Fig. 7.

C. Roots of Polynomials

A frequently occurring problem is the particular case where f(x) is a polynomial of degree n, that is, finding the roots of

$$p_n(x) = a_0 x^n + a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n, \qquad a_0 \neq 0.$$

Any of the methods of the previous section can, of course, be used for this case; however, it is sometimes possible to take advantage of the special properties of polynomials.

To start with, p(x) can be evaluated for x = α by the following recurrence formula:

$$b_0 = a_0$$
$$b_i = \alpha b_{i-1} + a_i \quad \text{for } i = 1, 2, \ldots, n.$$

This is called the method of synthetic division, because the numbers $b_0, b_1, \ldots, b_{n-1}$ are the coefficients of the polynomial of degree n − 1 that is the result of dividing p(x) by the binomial (x − α). Then $p(\alpha) = b_n$ is the remainder for this division. Also, once a root $x_1$ has been found, that is, $p(x_1) = 0$, then the b’s of the final iteration define the reduced polynomial, which has for its roots the remaining n − 1 roots of p(x), so that additional roots can be sought, starting from the reduced polynomial.

The derivative p′(x) can be evaluated for x = α by repeating the synthetic division process on the coefficients $b_0, b_1, \ldots, b_{n-1}$; thus,

$$c_0 = b_0$$
$$c_i = \alpha c_{i-1} + b_i \quad \text{for } i = 1, 2, \ldots, n - 1,$$

and

$$p'(\alpha) = c_{n-1}.$$

For this reason, Newton’s method is easy to apply.

A disadvantage of the methods considered so far, except Muller’s method, is that they cannot directly find the complex roots that are usually needed in the case of polynomial equations. Muller’s method and Cauchy’s method, which bears the same relationship to Muller’s method as Newton’s method does to the secant method [that is, it
FIGURE 6 Bracketing method. (a) Bisection method. (b) False position method.
uses the first and second derivative of f(x) at an estimate of a root to extrapolate quadratically to the next estimate], are capable of finding complex roots and are thus more useful for the polynomial case.

A method designed for polynomial equations is due to Lin and Bairstow (not described in detail here because of its complexity). It uses synthetic division of p(x) by a quadratic and iterates to reduce the linear remainder to zero. The quadratic factor thus determined may have complex conjugate roots, and the reduced polynomial has degree 2 less than p(x).

The properties of polynomials can also be invoked to determine approximate locations of roots, which are often necessary in order to improve the chance of convergence of the methods discussed so far. Particularly useful in this regard is the theory of Sturm sequences, which can be used to determine the number of real roots in a prescribed interval.

Finally, for every polynomial, a matrix can be found that has that polynomial as its characteristic polynomial; thus, the problem of finding roots of a polynomial can be expressed as the problem of finding the eigenvalues of a matrix. Methods exist for finding all or selected numbers of eigenvalues of a matrix that do not depend for convergence on close estimates to start with, and are, therefore, often preferred over the methods discussed in this section. Matrix eigenvalue methods are discussed in Section V.

IV. SYSTEMS OF ALGEBRAIC EQUATIONS

A. Systems of Linear Equations

If $x_1, x_2, \ldots, x_n$ represent n variables, and $a_1, a_2, \ldots, a_n$ are given numbers (called coefficients), then the expression

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n$$
is called a linear combination of the x’s. The equation

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n = b$$

is a linear equation. A system of m linear equations is written in the form

$$\begin{aligned}
a_{1,1} x_1 + a_{1,2} x_2 + a_{1,3} x_3 + \cdots + a_{1,n} x_n &= b_1 \\
a_{2,1} x_1 + a_{2,2} x_2 + a_{2,3} x_3 + \cdots + a_{2,n} x_n &= b_2 \\
a_{3,1} x_1 + a_{3,2} x_2 + a_{3,3} x_3 + \cdots + a_{3,n} x_n &= b_3 \\
&\;\;\vdots \\
a_{m,1} x_1 + a_{m,2} x_2 + a_{m,3} x_3 + \cdots + a_{m,n} x_n &= b_m,
\end{aligned}$$

where the subscript of the b’s and the first subscript of the a’s designate the equation, while the subscript of the x’s and the second subscript of the a’s designate the variable. We define the m × n matrix A of coefficients as the m-row by n-column array

$$A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n}
\end{pmatrix}$$

and we define the n-vector x and the m-vector b as the columnar arrays
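$$x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{pmatrix}.$$

If A is lower triangular, the solution can be obtained by forward substitution:

$$x_1 = b_1 / a_{1,1}$$
$$x_i = \Bigl( b_i - \sum_{j=1}^{i-1} a_{i,j} x_j \Bigr) \Big/ a_{i,i}, \qquad i = 2, 3, \ldots, n.$$

Similarly, if A is upper triangular, which can be visualized by reversing the order of the rows in the above matrix, then the solution can be obtained by back substitution:

$$x_n = b_n / a_{n,n}$$
$$x_i = \Bigl( b_i - \sum_{j=i+1}^{n} a_{i,j} x_j \Bigr) \Big/ a_{i,i}, \qquad i = n-1, n-2, \ldots, 1.$$

$$L y = b, \qquad U x = y.$$

The number of additions and multiplications required for computing the elements of L and U is of the order of $\tfrac{1}{3} n^3$; thus for large n there is considerable savings in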
performing the LU factorization once, then solving the triangular systems only for each new b for which the solution may be needed.

In solving an arbitrary n × n system it can occur that, at the kth stage, the divisor $a^{(k-1)}_{k,k}$ is zero, in which case the solution can proceed no further. Also, if not exactly zero, it may be very small, which can cause relatively large arithmetic roundoff errors to be introduced. To avoid these difficulties the technique of partial pivoting is employed, in which at each stage the largest element (in magnitude) in the column headed by the pivotal element, $a^{(k-1)}_{k,k}$, is found, and the entire row in which this element occurs is exchanged with the kth row. Partial pivoting does not change the solution, since it merely changes the order in which the equations of the system appear; however, the order of elements in the vector b must be adjusted accordingly.

Pivoting is not necessary when the matrix A is diagonally dominant, $|a_{i,i}| \ge \sum_{j \neq i} |a_{i,j}|$ for all i, or when it is symmetric and positive definite, meaning that for any n-vector x that is not identically zero, the quadratic form $x^T A x$ is positive. If A is symmetric and pivoting is not necessary, a symmetric version of LU factorization is possible in which only the elements of $A^{(k)}$ above the diagonal are computed and the elements of the $M^{(k)}$ are not saved, thereby reducing the storage required by half and the number of operations to the order of $\tfrac{1}{6} n^3$. In this case, after the matrix U has been computed, the solution is found by solving

$$U^T y = b, \qquad U x = D y,$$

where D is the diagonal of U. This is often referred to as the (root-free) Cholesky method.

Systems that have a banded structure (the matrix A has zero elements wherever the difference between the indices i and j is greater than a fixed number less than n − 1) retain their bandwidth during LU factorization if pivoting is not required; thus advantages in storage and computation can be realized when working with them. Matrices that are sparse within a band of some nonzero elements, however, suffer from “zerofill,” which means that no advantage can be taken of sparsity with direct methods, except that by judicious exchanging of equations and order of variables, the bandwidth or profile of the matrix can sometimes be reduced, thereby reducing the number of operations required.

Before considering the error in the solution to a system of equations (either linear or nonlinear), we need to have a means to measure a vector or matrix. Just as in measuring
a scalar quantity by its magnitude, we measure a vector where µ is the machine unit. In practice, it has bee