Mathematics 1001
Richard Elwes
Quercus
CONTENTS
Introduction 6
NUMBERS 8
The Basics 10
Arithmetic 14
Number Systems 22
Rational Numbers 27
Factors and Multiples 31
Induction 35
Representation of Numbers 37
Transcendental Numbers 46
Ruler and Compass Constructions 48
Diophantine Equations 54
Prime Numbers 64
GEOMETRY 78
Discrete Geometry 125
Differential Geometry 129
Topology 132
Knot Theory 139
Non-Euclidean Geometry 142
Algebraic Topology 143
Algebraic Geometry 145
Diophantine Geometry 152
ALGEBRA 156
Letters for Numbers 158
Equations 161
Vectors and Matrices 167
Group Theory 177
Abstract Algebra 185
The study of mathematics, then, is as ancient as civilization; but it also represents the
modernity of today’s world. In the millennia since Ahmes’ work, we have seen
scientific and technological progress of which he could not have dreamt. Central to this
advance has been the march of mathematics, which has contributed the basic language
used in all scientific contexts. Probably mathematics’ most fundamental contribution
has been in the sphere of physics. Galileo’s revolutionary insight in the early 17th
century that the universe might yield to a purely mathematical description set the
direction towards the world-changing theories of quantum mechanics and relativity.
This reliance on mathematics is not confined to the physical sciences. The social
sciences depend on techniques of probability and statistics to validate their theories, as
indeed do the worlds of business and government. More recently, with the emergence
of information technology, mathematics became entangled in another love-affair, with
computer science. This too has had a profound impact on our world.
As its influence has broadened, the subject of mathematics itself has grown at a
startling rate. One of history’s greatest mathematicians, Henri Poincaré, was described
by Eric Temple Bell as ‘the last universalist’, the final person to have complete
mastery of every mathematical discipline that existed during his lifetime. He died in
1912. Today, no-one can claim to have mastered the whole of topology, let alone
geometry or logic, and these are just a fraction of the whole of mathematics.
Poincaré lived through a turbulent period in the history of mathematics. Old ideas had
been blown away, and new seeds planted which flourished during the 20th century.
The result is that the mathematical world we know today is rich and complex in ways
that even the greatest visionaries of the past could not have imagined. My aim in this
book is to give
an overview of this world and how it came to be. I might have tried to sketch a low-
resolution map of the entire mathematical landscape, but this would be neither useful
nor entertaining. Instead, I have presented 1001 short ‘postcards’ from interesting
landmarks around the mathematical world that nonetheless give a feel for the bigger
picture of mathematics.
In the scheme of things, 1001 is a very small number (see the frivolous theorem of
arithmetic). My challenge has been to select the real highlights: the truly great
theorems, the outstanding open problems and the central ideas. I have also sought to
represent the surprises and quirks that make the subject truly fascinating.
This book is organized thematically, on three levels. It is divided into ten chapters,
each covering a broad subject, beginning with ‘Numbers’. Each chapter is subdivided
into sections, which are more narrowly focused on a single topic, such as ‘Prime
numbers’. Each section comprises a series of individual entries, such as the one on the
Riemann hypothesis.
How you should read Mathematics 1001 depends on what you want from it. If you are
interested in prime numbers you can read through that section. If you want a quick
explanation of the Riemann hypothesis, you can jump straight there; but, because ‘a
quick explanation of the Riemann hypothesis’ is an impossibility, you will then need to
rewind a little, to take in the preceding few entries where the necessary prerequisites
are laid out. Alternatively you can dip in and out, perhaps finding a new story by
following the bold cross-references to different entries in the book.
Who is this book aimed at? The answer is: anyone with a curiosity about mathematics,
from the novice to the informed student or enthusiast. Whatever the reader’s current
knowledge, I’m sure that there will be material here to enlighten and engage. Some
parts of the book undoubtedly cover highly complex subjects. That is the nature of the
subject; shying away from it would defeat the purpose. However, the book is
structured so that the relevant basic concepts precede the complex ones, giving a
foundation for understanding. My job in writing has been to discuss all ideas, from the
basic to the most abstract, in as direct and focused a way as possible. I have done my
best, and have certainly relished the challenge. Now I can only hope that you will
enjoy it too.
The first thing to do with these is combine them arithmetically, through addition,
subtraction, multiplication and division. There are several time-honoured ways of
calculating the results, which rely on our decimal place value notation.
At a deeper level, questions about the natural numbers fall into two main categories.
The first concerns the prime numbers, the atoms from which all others are built. Even
today, the primes guard their secrets. Major open questions include Landau’s problems
and the Riemann hypothesis.
The second principal branch of number theory considers Diophantine equations. These
encode the possible relationships
THE BASICS
Addition Today, as for thousands of years, numbers are principally used for counting.
Counting already involves addition: when you include a new item in your collection,
you have to add one to your total.
More general addition extends this: what happens if you add three items to a collection
of five? Efficient methods for adding larger numbers first required the development of
numerical notation (see addition by hand). Mathematicians have many terms for
addition. The sum of a collection is the total when everything is added together. As
more and more numbers are added together, objects called series emerge. Today,
addition extends beyond plain numbers to more colourful objects such as polynomials
and vectors.
Subtraction The mathematical perspective on subtraction may seem strange on first
sight. Since the dawn of negative numbers, every number (such as 10) has an opposing
additive inverse (−10). This is defined so that when you add the two together they
completely cancel each other out, to leave 0. Subtraction is then a two-step process: to
calculate ‘20 – 9’ first you replace 9 with its negative, and then you add the numbers 20 and −9. So ‘20 – 9’ is really short-hand for ‘20 + (−9)’.
This settles a matter which often troubles children: why does the order of addition not
matter (3 + 7 = 7 + 3), but the order of subtraction does (3 − 7 ≠ 7 − 3)? When
understood as addition, the order does not matter after all: 3 + (−7) = (−7) + 3.
So how do you subtract negative numbers, as in −7 − (−4)? The same rules apply: first replace −4 with its inverse, 4, and then add: −7 + 4 = −3.
There are several terms to describe multiplication, and several symbols. ‘Times’ is
nicely descriptive. A more everyday term is the English word ‘of’: three sets of seven
make 21, for example. This remains valid for fractions: half of six is three, which
translates as 1/2 × 6 = 3.
The most common symbol for multiplication is ‘×’, though mathematicians often
prefer ‘·’, or even nothing at all: 3 · x and 3x mean the same as 3 × x. For multiplying
lots of numbers together, we use the product notation (see sums and products).
Since 3 × 5 = 15, we say that 3 is a factor of 15, and 15 is a multiple of 3. The prime
factors of a number are its basic building blocks, as affirmed by the fundamental
theorem of arithmetic.
Sums and products Suppose you want to add together the numbers 1 to 100. Writing out the whole list would take up too much time and paper, so mathematicians have devised a short-hand. A capital Greek sigma (standing for sum) is used:
Σ_{j=1}^{100} j
The numbers on top and bottom of the sigma show the range: j starts at 1 and takes each value successively up to 100. After the sigma comes the formula we are adding. Since in this case we are just adding the plain numbers, the formula is just j.
Instead, if we wanted to add the first 100 square numbers, we would write:
Σ_{j=1}^{100} j²
(These expressions have easily computable answers; see adding up 1 to 100 and adding the first hundred squares.)
If we want to multiply instead of add, we use a capital Greek pi (standing for product):
Π_{j=1}^{100} j
(This is also written as ‘100!’, that is 100 factorial.)
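A minimal sketch in the Python programming language computes the same three quantities, using only the standard math module:

import math

total = sum(range(1, 101))                        # Σ j for j = 1..100, which is 5050
square_total = sum(j * j for j in range(1, 101))  # Σ j² for j = 1..100, which is 338350
product = math.factorial(100)                     # Π j for j = 1..100, i.e. 100!

print(total, square_total, product)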
x^a × x^b = x^(a + b)
(x^a)^b = x^(a × b)
The rule for getting larger positive powers of 10 is ‘keep multiplying by 10’. Starting with 10^1 (which is 10), multiplying by 10 tells us 10^2 = 10 × 10 = 100. Multiplying by 10 again gives 10^3 = 1000, and so on.
We can turn this on its head. If we start with 10^6 (which is 1,000,000) and count down, to get to the next power down, we divide by 10. That is, 10^5 = 10^6 ÷ 10 = 1,000,000 ÷ 10 = 100,000. Then to get to 10^4 we divide by 10 again, and so on, until we get back to 10^1 (which equals 10).
But there is no reason to stop here. To reach the next power down, 10^0, we divide by 10 again, so 10^0 = 10^1 ÷ 10 = 1. If we continue, we arrive among the negative powers: 10^(−1) = 10^0 ÷ 10 = 1/10. Then 10^(−2) = 10^(−1) ÷ 10 = 1/100, and so on. The pattern is: 10^2 = 100, 10^1 = 10, 10^0 = 1, 10^(−1) = 1/10, 10^(−2) = 1/100, and so on.
What works for 10 works for every other number (other than 0). Negative powers of x are defined as the corresponding positive power of the reciprocal of x. That is, x^(−n) = (x^(−1))^n = (1/x)^n.
Roots What number when squared gives 16? The answer of course is 4. To put it another way, 4 is the square root of 16. In the same way, 3 cubed is 27, so 3 is the cube root of 27. Similarly 2⁵ = 32, so 2 is the fifth root of 32. The symbol we use to denote this is √, with a small number showing which root is meant. So ⁵√32 = 2 and ³√27 = 3. For square roots the little 2 is usually omitted, and we just write √16 = 4.
Fractional powers On the face of it, a fractional power such as 16^(1/2) would mean multiplying 16 by itself ‘half a time’, which does not seem very meaningful. Just as negative powers are made comprehensible by incorporating the reciprocal, fractional powers can make sense when interpreted as roots.
If notation such as 32^(1/5) is to mean anything at all, then it must satisfy the second law of powers: (x^a)^b = x^(a × b). In particular (32^(1/5))^5 should equal 32^((1/5) × 5) = 32^1 = 32. That is, 32^(1/5) should be a number which when raised to the fifth power gives 32. This must mean 32^(1/5) = ⁵√32 = 2. Similarly we can write 27^(1/3) = ³√27 = 3 and 16^(1/2) = √16 = 4. In general, x^(1/n) = ⁿ√x.
What meaning can we give to notation like 32^(4/5)? Again the second law of powers helps: it should be equal to (32^(1/5))^4 = 2⁴ = 16. Similarly, 27^(4/3) = (27^(1/3))^4 = 3⁴ = 81, and in general, x^(m/n) = (ⁿ√x)^m.
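A quick numerical check of these rules is easy to carry out; the following minimal Python sketch evaluates the examples above (floating-point arithmetic makes the results only approximate):

print(32 ** (1 / 5))   # fifth root of 32, approximately 2.0
print(27 ** (1 / 3))   # cube root of 27, approximately 3.0
print(16 ** (1 / 2))   # square root of 16, 4.0
print(32 ** (4 / 5))   # (32^(1/5))^4 = 2^4, approximately 16.0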
You can take logarithms to any base number, as long as it is positive and not equal to
1.
But two bases are particularly common. Because powers of 10 are such a convenient
way to represent numbers, logarithms to base 10 are very useful for measuring the
order of magnitude of a number: log₁₀ N is approximately the number of digits in the
decimal representation of N.
Logarithms to base e are called natural logarithms, and are the most commonly
occurring within pure mathematics.
The first law of logarithms says that, for any positive numbers c and d, log(cd) = log c + log d. (All the logarithms must be taken to the same base here, but every positive base works.) To see why this holds, define c = x^a and d = x^b in the first law of powers. Translating these, we get a = log c and b = log d. (Since all the logarithms are to base x, we take the x as read.) Now, cd = x^(a + b), and so a + b = log(cd), giving the result above.
The second law of logarithms corresponds to the second law of powers. It says that, for
any c and d:
log(c^d) = d × log c
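Both laws are easy to check numerically. A minimal Python sketch, using natural logarithms from the standard math module:

import math

c, d = 4.0, 7.0
# First law: log(c × d) equals log c + log d
print(math.log(c * d), math.log(c) + math.log(d))
# Second law: log(c^d) equals d × log c
print(math.log(c ** d), d * math.log(c))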
Slide rules A rudimentary slide rule for addition might work as follows: take two rulers
marked in centimetres, and place them side by side. If you want to calculate 4 + 7, slide
the top ruler so that its start aligns with the ‘4’ on the bottom ruler. Find the ‘7’ on the
top ruler and read off the value aligned with it on the bottom. With a slight
modification, this simple idea produces a logarithmic slide rule, which can manage
multiplication instead of addition.
Logarithmic slide rules In the days before calculators, multiplying large numbers was
time consuming and error-prone. A logarithmic slide rule is a device which uses
logarithms to produce a quick and easy method. The crucial ingredient is the first law of logarithms: log(cd) = log c + log d.
This says that a logarithm converts multiplication into addition: if two numbers are multiplied (cd), then their logarithms are added (log c + log d).
A logarithmic slide rule works as an ordinary one, with one important difference.
Instead of standard rulers where the ‘4’ is marked 4 cm from the end, it uses
logarithmic rulers where the ‘4’ is marked log 4 cm from the end. (One consequence is
that the logarithmic ruler starts at ‘1’ instead of ‘0’, because log 1 = 0.) Following exactly the same procedure as for an ordinary slide rule, you will have arrived at a point log 4 + log 7 cm along the bottom ruler, which will be marked 28, since log 4 + log 7 = log 28.
Logarithmic slide rules will work to any base, but were first designed using natural
logarithms, by William Oughtred in the 1620s.
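The principle behind the instrument is easily simulated: adding lengths on a logarithmic scale multiplies the numbers they represent. A minimal Python sketch of the idea (the function name slide_rule_multiply is just an illustrative label):

import math

def slide_rule_multiply(a, b):
    # The position of a number on a logarithmic ruler is log(number) cm from the start.
    position = math.log10(a) + math.log10(b)   # sliding one ruler along the other adds lengths
    return 10 ** position                      # read off the number sitting at that position

print(slide_rule_multiply(4, 7))   # approximately 28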
ARITHMETIC
The difficulty arises when the numbers in a column add up to more than 9. Suppose we
want to add 56 and 37. We always begin on the right, with the units column. This time,
6 and 7 give us 13.
The final number of units will certainly be 3, so we can write this as the answer in the
units column. This leaves us with the extra 10 to cope with. Well, the next stage is to
add up the tens column anyway, so we just need to add one more ten to the pile. So we
carry the 1 to the top of the tens column before adding that up:
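The whole carrying procedure can be spelled out step by step. A minimal Python sketch of the method described above (the function name add_by_hand is simply an illustrative label):

def add_by_hand(a, b):
    digits_a = [int(d) for d in str(a)][::-1]   # units first
    digits_b = [int(d) for d in str(b)][::-1]
    result, carry = [], 0
    for i in range(max(len(digits_a), len(digits_b))):
        column = carry
        if i < len(digits_a):
            column += digits_a[i]
        if i < len(digits_b):
            column += digits_b[i]
        result.append(column % 10)   # the digit written in this column
        carry = column // 10         # the 1 carried to the next column, if any
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in reversed(result)))

print(add_by_hand(56, 37))   # 93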
Subtraction by hand As with addition by hand, the basic idea for subtraction is to align
the numbers in columns and proceed column by column, starting with the units:
This time, we may encounter the problem that we need to take a larger digit from a
smaller:
Here we seem to need to take 8 from 3, which we cannot do without heading into the
negative numbers (which is best avoided until inevitable). The way around this is to
split up 73 differently. Currently it is split into 7 tens (T), and 3 ones, or units (U),
which is proving inconvenient. So instead we will write it as 6 tens, and 13 units.
Essentially we are rewriting the calculation as:
Now we can proceed as before. What this looks like when written normally is:
This process of ‘borrowing 1’ from the next column might have to be repeated several
times in one calculation.
Multiplication by hand, table method
If you know your times tables, then multiplying two single-digit numbers should be
straightforward. Once we can do this, the decimal system makes it fairly simple to
multiply larger numbers.
To calculate 53 × 7, as usual we split 53 into 5 tens and 3 units (ones). The key fact is that we can multiply each part separately. That is: (50 + 3) × 7 = (50 × 7) + (3 × 7). Some people use a grid method:
To finish the calculation, we add up everything in the inner part of the table: 350 + 21 = 371. This easily extends to calculations with more digits, such as 123 × 45:
To finish this calculation, we add up: 4000 + 800 + 120 + 500 + 100 + 15 = 5535.
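A minimal Python sketch of the grid idea, splitting each number into its place-value parts and multiplying every pair (the function name grid_multiply is just an illustrative label):

def grid_multiply(a, b):
    # Split each number into place-value parts, e.g. 123 becomes [3, 20, 100]
    parts_a = [int(d) * 10 ** i for i, d in enumerate(reversed(str(a))) if d != "0"]
    parts_b = [int(d) * 10 ** i for i, d in enumerate(reversed(str(b))) if d != "0"]
    # Multiply every part of a by every part of b, then add everything up
    return sum(x * y for x in parts_a for y in parts_b)

print(grid_multiply(53, 7))     # 350 + 21 = 371
print(grid_multiply(123, 45))   # 4000 + 800 + 120 + 500 + 100 + 15 = 5535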
An alternative method to the table method of multiplication (which uses less ink) is
column by column, adding as we go along, instead of at the end:
Starting with the units column, we get 15. So the answer’s digit in that column will be
5, and we have also gained an extra ten. So, after we perform the multiplication in the
tens column, we need to add that on:
To multiply 13 by 45, we multiply by 40 and 5 separately and then add up. Writing
these two calculations one under the other makes this addition quicker to do:
Short division Short division is a method for dividing a large number (the dividend) by
a single-digit number (the divisor). The basic idea is to take the dividend digit by digit
starting on the left. Above each digit we write the number of times the divisor fits in:
The complication comes when the divisor does not fit exactly into one of the digits, but
leaves a remainder. In the next example, 3 into 7 goes 2 times, with a remainder of 1.
The 2 goes above the 7 as before, and the 1 is carried to the next step, placed before the
8, which is then considered as 18:
Short division can work with small double-digit divisors such as 12 too (as long as you
know their times table). In this example, 12 cannot go into 9, so the whole 9 is carried
at the start:
Long division Long division is essentially the same procedure as short division. As the
divisors become larger, however, more digits have to be carried, and calculating the
remainders becomes lengthier. Rather than cluttering up the division symbol with
carried remainders, they are written out underneath. Consider 846 ÷ 18.
Since 18 cannot divide the 8 in the hundreds column, 84 is the first number to be
divided by 18, corresponding to the tens column. It goes in four times, since 4 × 18 = 72 but 5 × 18 = 90. The 4 is written on top, and 72 is written below 84 and then subtracted from it
to find the remainder, 12. If we were doing short division, this would be the number to
carry to the next column. The equivalent here is to bring down the next digit from 846
(namely 6) and append it to the 12 to get 126, the next number to be divided by 18.
This goes exactly 7 times, so we write 7 on the top and we have finished.
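The bring-down-a-digit procedure is easy to mechanize. A minimal Python sketch of the method (illustrative only; in practice the // and % operators do the whole job at once):

def long_division(dividend, divisor):
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)       # bring down the next digit
        quotient_digits.append(remainder // divisor)  # how many times the divisor fits
        remainder = remainder % divisor               # what is carried on
    quotient = int("".join(str(d) for d in quotient_digits))
    return quotient, remainder

print(long_division(846, 18))   # (47, 0)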
Divisibility tests How can you tell when one whole number is divisible by another? In
general there is no easy method. But for small numbers there are various tricks which
exploit quirks in the decimal system, our ordinary method of writing numbers. Some
are so easy that we do them without thinking: a number is divisible by 2 exactly when its final digit is even, by 5 when it ends in 0 or 5, and by 10 when it ends in 0. This last one easily extends to divisibility tests by 100, 1000, and so on.
A number is divisible by 3 if and only if its digits add up to a multiple of 3. This trick works because the number written as ‘xyz’ is really 100x + 10y + z. This is equal to 99x + 9y + x + y + z. Now, 99x + 9y is certainly divisible by 3. So the whole thing is divisible by 3 if and only if x + y + z is divisible by 3. This proof also shows that the same trick works for 9. So 972 is divisible by 9, as 9 + 7 + 2 = 18, a multiple of 9. But 1001 is not divisible by 9, as 1 + 0 + 0 + 1 = 2.
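A minimal Python sketch of the digit-sum test (illustrative only):

def divisible_by_3(n):
    digit_sum = sum(int(d) for d in str(n))
    return digit_sum % 3 == 0   # the same idea works for 9, using % 9

print(divisible_by_3(972), divisible_by_3(1001))   # True False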
Divisibility by 6 The test of divisibility by 6 simply amounts to applying the tests for
both 2 and 3: a whole number is divisible by 6 if and only if it is even, and its digits
add up to a multiple of 3. So 431 is not divisible by 6, as it is not even. Also 430 is not
divisible by 6, as its digits add up to 7, not a multiple of 3. But 432 is divisible by 6 as
it is even, and its digits add up to 9, a multiple of 3. (Notice that its digits do not have
to add up to a multiple of 6.)
Similarly, we can tell whether a number is divisible by 4 just by looking at the last two
digits. If they form a number divisible by 4, then the whole thing is divisible by 4. So
1924 is divisible by 4, because 24 is divisible by 4. On the other hand 846 is not
divisible by 4, because 46 is not
divisible by 4. Again, ‘wxyz’ is short-hand for 1000w + 100x + 10y + z. This time, 1000w + 100x is always divisible by 4, so whether the whole thing is divisible by 4 depends only on whether 10y + z is.
Divisibility by 8 The divisibility test for 4 easily extends to 8, 16, 32, and so on.
Looking at the last three digits of a number is enough to determine divisibility by 8. So
7448 is divisible by 8, as 448 is.
Admittedly, this divisibility test relies on knowing your 8 times table up to 1000, but is
still useful when analysing very large numbers. For smaller numbers, it may be more
practical to divide by 2, and then apply the divisibility test for 4. Similarly the last four
digits determine divisibility by 16, and so on.
Divisibility by 7 One test works as follows: chop off the final digit, and double it. Then subtract the
result from the shortened number. If the result is divisible by 7 then so was the original
number. For example, starting with 224, we remove the final 4 and double it to get 8.
Then we subtract this from 22, to get 14. Since this is divisible by 7, so is 224.
For larger numbers, we might need to apply this trick more than once. Starting with
3028, remove the 8 and double it to get 16. Now subtract that from 302 to give 286,
and we repeat. Remove the final 6, double it to get 12, and subtract that from 28 to
leave 16. That is not divisible by 7, so neither is 286, and therefore neither is 3028.
This works because every number can be written as 10x + y, where y is the last digit (and so between 0 and 9), and x is the result of chopping off y. In the example of 224, x = 22 and y = 4. Next, 10x + y is divisible by 7 if and only if 20x + 2y is divisible by 7 (multiplying by 2 does not affect divisibility by 7). Now, 20x + 2y = 21x − (x − 2y). Of course 21x is always divisible by 7, so whether or not the original number is divisible by 7 depends on 2y − x, or equivalently its negative, x − 2y.
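The chop-and-double test can be applied repeatedly until the number is small enough to recognize, as in this minimal Python sketch (illustrative only):

def divisible_by_7(n):
    n = abs(n)
    while n >= 70:                 # keep shrinking until the answer is obvious
        n, last = n // 10, n % 10  # chop off the final digit
        n = abs(n - 2 * last)      # subtract double the final digit
    return n % 7 == 0

print(divisible_by_7(224), divisible_by_7(3028))   # True False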
Divisibility by 11 11 has an elegant divisibility test. It works by taking the alternating
sum of the digits: add the first, subtract the second, add the third, and so on.
If the result is divisible by 11, then so is the original number. More precisely, taking a five-digit number ‘vwxyz’ as an example, this is divisible by 11 if and only if v − w + x − y + z is divisible by 11. (If v − w + x − y + z = 0, that is classed as divisible by 11.) So, to test 5893, we calculate 5 − 8 + 9 − 3 = 3, which is not divisible by 11.
This works because the following numbers are all divisible by 11: 99, 9999, 999999,
and so on. On the other hand 9, 999, 99999, etc. are not, but 11, 1001, 100001 etc. are.
Writing ‘vwxyz’ as 10000v + 1000w + 100x + 10y + z, this is equal to 9999v + 1001w + 99x + 11y + (v − w + x − y + z). From the above observation, 9999v + 1001w + 99x + 11y will always be divisible by 11. So whether or not the whole number is divisible by 11 depends on v − w + x − y + z.
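The alternating sum is equally quick to compute. A minimal Python sketch (illustrative only):

def divisible_by_11(n):
    digits = [int(d) for d in str(n)]
    # add the first digit, subtract the second, add the third, and so on
    alternating_sum = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))
    return alternating_sum % 11 == 0

print(divisible_by_11(5893), divisible_by_11(918082))   # False True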
Divisibility by other primes For composite numbers the best approach to divisibility is
to test for every constituent prime individually. Other prime numbers can all be tested
in a way similar to 7. They involve chopping off the final digit, multiplying it by some
suitable constant, and then adding or subtracting that from the curtailed number.
o To test for divisibility by 13, chop off the last digit, multiply it by 4, and add that to
the shortened number. So, to test 197, chop off the 7, multiply by 4 to give 28, and add
that to 19 to give 47. As this is not divisible by 13, neither is 197.
o For 17, chop off the final digit, multiply by 5, and subtract that from the curtailed
number. For example, starting with 272, chop off the 2, multiply by 5 to get 10, and
subtract that from 27 to leave 17, which is divisible by 17, so 272 is too.
o For 19, chop off the last digit, double it, and add that to the curtailed number. Similar
tests work for larger primes too.
Difference of two squares One of the simplest and most useful algebraic identities is the difference of two squares. This says that for any numbers a and b, a² − b² = (a + b)(a − b). The proof is simply a matter of expanding brackets:
(a + b)(a − b) = a² − ab + ab − b² = a² − b²
This works equally well with any combination of numbers and algebraic variables. For instance, 15² − 3² = (15 + 3)(15 − 3) and x² − 16 = (x + 4)(x − 4), because 16 is 4². One of many uses for this identity is as a technique for speeding up mental arithmetic.
Arithmetic using squares One of the first tasks for people training for speed arithmetic
is to memorise the first 32 square numbers. As well as being useful on their own, they
can be used to multiply other pairs of numbers. The trick is to exploit the difference of
two squares.
If the two numbers are both odd or both even, then there will be another number directly in the middle of them. For example, if we want to multiply 14 × 18, we note that 16 is in the middle. So we can rewrite the problem as (16 − 2) × (16 + 2). This is now the difference of two squares: 16² − 2². Since we have memorized that 16² = 256, the answer is 252.
If the two numbers are not both odd or both even, we can do it in two steps. For example, to calculate 15 × 18, split it up as (14 × 18) + 18. We calculated 14 × 18 = 252 above, so 15 × 18 = 252 + 18 = 270.
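The midpoint trick translates directly into a minimal Python sketch (illustrative only; it assumes the two numbers are both odd or both even, so that their midpoint is a whole number):

def multiply_via_squares(a, b):
    mid = (a + b) // 2                        # the number directly in the middle
    half_gap = (b - a) // 2
    return mid * mid - half_gap * half_gap    # difference of two squares

print(multiply_via_squares(14, 18))           # 16² − 2² = 252
print(multiply_via_squares(14, 18) + 18)      # two-step route to 15 × 18 = 270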
Casting out nines Casting out nines is a useful technique for checking for errors in
arithmetic. The basic idea is to add up the digits of the number, and subtract 9 as many
times as possible, to get an answer between 0 and 8. So starting with 16,987 we add 1
and
6 to get 7. We can ignore the next 9. Then add 8 to get 15, and subtract 9 to get 6, add 7 to get 13, and subtract 9 to get an answer of 4. We can write N(16,987) = 4.
The point of this is that if we have calculated 16,987 + 41,245 as 58,242, we can check it as follows: N(16,987) = 4 and N(41,245) = 7. Adding these together, and casting out nines again, gives an N-value for the question of 4 + 7 − 9 = 2. However, our answer produces N(58,242) = 3. As these do not match, we know we have made a mistake. In fact, 16,987 + 41,245 = 58,232.
The same trick works for subtraction, multiplication and integer division. For example, if we have calculated 845 × 637 as 538,265, we work out N(845) = 8 and N(637) = 7. Multiplying these together gives 56. Repeating the process, we get a result for the left-hand side of N(56) = 5 + 6 − 9 = 2. Since N(538,265) = 2 too, the two sides match, and the test is passed.
This technique amounts to checking the answer in arithmetic modulo 9 (see modular
arithmetic). It is useful for detecting errors, but also gives false negatives (for instance,
it cannot detect swapping two digits in the answer).
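In a minimal Python sketch (illustrative only), the N-value is found by repeatedly summing digits, which amounts to taking the remainder on division by 9:

def cast_out_nines(n):
    while n > 9:
        n = sum(int(d) for d in str(n))   # add up the digits, repeating as needed
    return n % 9                          # treat a final 9 as 0, as in the description above

# Check the (incorrect) addition 16,987 + 41,245 = 58,242
lhs = cast_out_nines(cast_out_nines(16987) + cast_out_nines(41245))   # 2
rhs = cast_out_nines(58242)                                           # 3
print(lhs, rhs, lhs == rhs)   # mismatch, so an error has been made somewhere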
Trachtenberg multiplication by 11
The only complicating factor is when a pair of digits makes 10 or more. To multiply 87
by 11, for example, we first write down 7. Now, 8 and 7 sum to 15. So we write down
5, and carry 1 to the next step. So far we have 57. The final step is usually to write
down the final digit: 8. But this time we must also add the carried 1, so the final answer
is 957.
This method can be summarized as ‘add each digit to its neighbour’, where
‘neighbour’ means the digit to its right. With a little practice, this makes multiplying
by 11 almost instantaneous. Jakow Trachtenberg devised similar methods for
multiplying by all the numbers from 1 to 12. The corresponding rule for multiplying by
12, for example, is ‘double each digit and add its neighbour’.
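The ‘add each digit to its neighbour’ rule is easy to mechanize. A minimal Python sketch (the function name times_eleven is simply an illustrative label):

def times_eleven(n):
    digits = [int(d) for d in str(n)]
    padded = [0] + digits + [0]          # imaginary zeros at both ends
    out, carry = [], 0
    # Work from the right: each new digit is (digit + its right-hand neighbour), plus any carry.
    for i in range(len(padded) - 1, 0, -1):
        s = padded[i] + padded[i - 1] + carry
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return int("".join(str(d) for d in reversed(out)))

print(times_eleven(87), times_eleven(87) == 87 * 11)   # 957 True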
NUMBER SYSTEMS
Number systems The most ancient number system is the one that humans have used to
count objects for millennia: the system N of natural numbers consisting of 0, 1, 2, 3, 4,
5, … As civilizations advanced, more sophisticated number systems became necessary.
To measure profit and debt, we need to incorporate negative numbers, giving Z, the
system of integers.
Not everything can be measured using whole numbers. Half a day, or two thirds of a
metre, show the need for a system extending beyond the integers. Today, the system
which unites the fractions and the integers is known as Q, the rational numbers. As the
Pythagoreans discovered, the rational numbers are not adequate for measuring every
length. By plugging the gaps between rational numbers, we arrive at the system R of
real numbers. In the 16th century, Italian algebraists working on solving equations
realized that this was still not enough. The system that results from introducing a
square root of −1 is C, the complex numbers.
Mathematical disciplines Each number system can be studied and investigated on its
own terms, and mathematicians have come to know the characters and idiosyncrasies
of each. N and Q seem straightforward and welcoming on first meeting, but are highly
secretive, and downright awkward to work with. This is the realm of number theory.
At the other end of the spectrum, C perplexes those who see it from a distance, but
rewards anyone brave enough to get to know it, with its incredible simplicity and
power. Complex analysis was one of the triumphs of 19th-century mathematics.
In between, R is the right arena for understanding lengths, as the ancient Greeks first
realized. Much of geometry and topology is built from R.
In the Lebombo Mountains of Swaziland, archaeologists in the 1970s found a baboon’s leg bone, with 29 notches
carved into it. It had been used as a tally, and the number 29 suggests a lunar calendar.
Dating from around 35,000 bc, the Lebombo bone is the oldest mathematical artifact
that we have. It illustrates the first number system, the one that humans have used to
count, for millennia: the natural numbers.
0, 1, 2, 3, 4, 5, …
The prehistory of zero The entity we know as zero took many years to be accepted as a
number in its own right. It required a leap of imagination to start thinking of 0, which
represents nothing, as being something. The trigger for the ascent of zero was the
development of place value notation. There are, of course, infinitely many numbers.
But we don’t want to have to invent ever more symbols to describe them. Today, we
use only the symbols 1–9, as well as 0, to describe any number, with the place of the
number imparting as much information as its value. So in ‘512’, the ‘5’ means ‘five
hundreds’, while in ‘54’ it means ‘five tens’.
This is an ingenious system, but what happens when you have no tens, as in two
hundred and three? The ancient Babylonians simply left gaps. So they might have
written two hundred and three as ‘2 3’ (of course, they did not use Arabic numerals or
a decimal base, nevertheless this illustrates the idea). The problem is obvious: this can
be easily mistaken for 23. By the third century bc, the Babylonians, in common with
other cultures, had got around this by using a place holding symbol to indicate an
empty column. Ancient Chinese mathematicians had a mathematical concept of zero,
and indeed negative numbers, but their primitive notation lagged behind this deeper
understanding.
Brahmagupta’s zero It was in India that the Babylonian notation and Chinese
conception of zero finally came together, in the development of 0 as a full-blown
number. In ad 628, Brahmagupta formally defined it as the result of subtracting a number from itself.
His arithmetical insights may seem obvious today, but they represent a real
breakthrough in the history of human thought: ‘When zero is added to a number or
subtracted from a number, the number remains unchanged; and a number multiplied by
zero becomes zero.’
Brahmagupta’s work also laid out the basic theory of negative numbers, though these
took longer to gain widespread acceptance.
This book generally adopts the Zen-like philosophy that 0 is the epitome of
naturalness. In some contexts, however, this leads to the annoyance that 0 is the first
natural number, 1 is the second, 2 the third, and so on. So, when working with
sequences, for example, it is more convenient to exclude zero.
Profit and debt If numbers are principally used for counting, then what do negative
numbers mean? How can you have −3 apples? The likely origin of negative numbers is
in trade, where positive numbers represent profit, and negative numbers debt. Ancient
Chinese mathematicians represented numbers with counting rods, and used red and
black rods respectively to distinguish between positive and negative numbers. (These
colours have been reversed in the western world, with the phrases ‘in the black’ and ‘in
the red’ meaning ‘in credit’ and ‘in debit’, or debt, respectively.)
Negative numbers Despite their ancient pedigree, negative numbers were long viewed
with suspicion, even until the early 19th century. Many saw them as short-hand for
something else (a positive quantity of debt instead of a negative quantity of profit),
rather than legitimate numbers in their own right. Many mathematicians were content
to employ them as tools for calculating, but if the final answer came out negative, it
would often be abandoned as invalid. (Complex numbers held a similarly
indeterminate status for several years.) However, the direction had been set in 628,
when the Indian mathematician Brahmagupta wrote his treatise on the combined
arithmetic of positive and negative whole numbers, and 0. We now call this the system
of integers.
Integers The integers are the whole numbers: positive, negative and zero. The system
of integers is known as Z, standing for Zahlen, meaning ‘numbers’ in German. The
advantage of Z is that quantities, such as temperature, which naturally come either side
of 0 can be measured. Similarly, profits and debts can be measured on a single scale.
This also allows more equations to be solved. For example, x + 3 = 2 is an equation built solely from natural numbers, but it has no solution in that system. In the integers, however, it does. Indeed any equation x + a = b, where a and b are integers, can now be solved without leaving Z.
Rational numbers Any number which can be expressed as a fraction of integers (whole
numbers) is rational (meaning that they are ratios, rather than that they are logical or
cerebral). Examples are 2 (since it equals 2/1), 17/8 and 3/4. The reasons for the development of fractions are self-evident. For measuring time,
distance or resources, quantities such as half a month, a third of a mile, or three-
quarters of a gallon are obviously useful.
Mathematicians denote the system of all rational numbers by Q (standing for quotient).
This system augments the integers, and also brings mathematical benefits. Among the
whole
numbers, division is not well-behaved. You can divide 8 by 2, but not by 3. The
rational numbers form a system called a field, where any numbers can be divided, as
well as added, subtracted or multiplied. The solitary exception is 0, by which you can
never divide (see division by 0).
Real numbers Integers are a fixed distance apart. From one integer to the next is
always a distance of 1. For the rational numbers this is no longer true; they can
measure shorter distances. Starting at 1, there is no ‘next’ rational number. There are 1 1/2, 1 1/10, 1 1/20,000, and rational numbers as close to 1 as you like. It seems strange, then, to insist that there are nevertheless ‘gaps’ among the rational numbers. But this is true, as the irrationality of √2 demonstrates. Drawing the graph of y = x² − 2, at the place where it should cross the x-axis, it sneaks through a gap in the rational numbers.
Filling in these gaps is a technical procedure, which results in R, the system of real
numbers. Also known as the real line, R can be thought of as all the points on an
infinite line, which is now complete, meaning that there are no gaps in it.
Complex numbers A complex number is an expression of the form a + bi, where a and b are real numbers and i is a square root of −1; 2 + 3i, for example, is a complex number. These can then be added, subtracted, multiplied and divided according to the rules of complex arithmetic.
The Argand diagram is the standard way to represent complex numbers, as a 2-
dimensional plane. Then complex numbers look like familiar Cartesian coordinates,
with the real axis horizontal, and the imaginary axis vertical. This complex plane is a
wonderful setting for geometry, as geometric and algebraic ideas mesh perfectly.
In many ways, the complex numbers form the endpoint for the evolution of the concept
of number. For the purposes of solving polynomial equations, the fundamental theorem
of algebra says that they do everything that could possibly be required.
However, we do not have to stop there. It is possible to extend the complex numbers to
a still larger system. The complex numbers are built from pairs of real numbers (a, b), with a new ingredient i, the square root of −1. Every complex number can be written in the form a + ib.
Octonions When Sir William Hamilton had discovered his system of quaternions, he
wrote to his friend John Graves explaining his breakthrough. Graves replied ‘If with
your alchemy you can make three pounds of gold, why should you stop there?’ As
good as his word, Graves came up with an even larger system, now known as the
octonions. It is built from octuples of real numbers (a0, a1, a2, a3, a4, a5, a6, a7) along with seven new ingredients i1, i2, i3, i4, i5, i6, i7, each of which squares to −1. So a general octonion is of the form a0 + i1a1 + i2a2 + i3a3 + i4a4 + i5a5 + i6a6 + i7a7.
[Figure: the Argand diagram, with the real axis running horizontally from −4 to 4 and the imaginary axis vertically from −4i to 4i; the complex number ½ + 3i is plotted as a point in the upper half of the plane.]
Hamilton gave us the quaternions and Graves the octonions; how much further can this generalization be pushed? In 1898, Adolf Hurwitz proved that this really is the limit. The real and complex numbers,
the quaternions and the octonions are the only four normed division algebras:
structures containing the real numbers which allow multiplication and division in a
geometrically sensible way. The mathematical physicist John Baez described this
family in 2002:
The real numbers are the dependable breadwinner of the family, the complete ordered
field we all rely on. The complex numbers are a slightly flashier but still respectable
younger brother: not ordered, but algebraically complete. The quaternions, being non-
commutative, are the eccentric cousin who is shunned at important family gatherings.
But the octonions are the crazy old uncle nobody lets out of the attic: they are non-
associative.
Non-associative means that, if you multiply octonions A, B and C, you might find A × (B × C) ≠ (A × B) × C. This is a violation of one of the most basic algebraic laws. Despite (or
because of) their craziness, the quaternions and octonions are useful for explaining
other mathematical anomalies, such as the exceptional Lie groups.
RATIONAL NUMBERS
Reciprocals The reciprocal of a number is 1 divided by that number: the reciprocal of 2 is 1/2, and vice versa. The reciprocal of a fraction is easily found: just turn it upside down. So the reciprocal of 5/2 is 2/5.
Division Division measures how many times one number fits into another. So 15 ÷ 3 = 5 because 3 fits into 15 exactly 5 times: 5 × 3 = 15. This equally applies to fractions, so 1/2 ÷ 1/4 = 2, because 1/4 fits into 1/2 exactly 2 times: 2 × 1/4 = 1/2.
An alternative view is to see division as being built from the more basic operation of reciprocation. (Of course the notation 1/4 is already highly suggestive of 1 being divided by 4.)
We can then understand m ÷ n as meaning m multiplied by the reciprocal of n, that is m × 1/n. So 15 ÷ 3 = 15 × 1/3 = 5, and 2/5 ÷ 3/4 = 2/5 × 4/3 = 8/15.
Equivalent fractions In the world of whole numbers, there is just one way to denote
any number: 7 is 7, and you cannot write it in any other way. (James Bond might fairly
object that it can be prefixed with zeroes, but this presents no real ambiguity.) When
we enter the realm of the rational numbers, that is to say fractions, something
inconvenient happens. There are now several genuinely different ways to write the
same number. For instance, 2/3 is equal to 4/6, 6/9, 14/21, and a host of other fractions which do not immediately seem the same. These are called equivalent fractions. If you start with a number, multiply it by 2, and then divide the result by 2, you would expect to arrive back where you started. This leads to the rule for equivalent fractions: if you multiply the top by some number (not including zero) and you do the same to the bottom, you get an equivalent fraction. So 4/6 is equivalent to 2/3, because 4/6 = (2 × 2)/(3 × 2).
Taking this backwards, if the top and bottom of the fraction are each divisible by some number, you can always cancel down to produce an equivalent fraction: 12/15 can be cancelled down since 12 and 15 are both divisible by 3. This is called simplifying and gives the equivalent fraction 4/5. For most purposes, the best (simplest) representation of a fraction is when all possible cancelling has been done, so that the top and bottom of the fraction have no common factors (or are coprime). It is a consequence of the fundamental theorem of arithmetic that such a representation always exists, and is unique.
Multiplying fractions The basic rule for multiplying fractions is simple: multiply the top, and multiply the bottom. So 2/3 × 4/5 = 8/15. A shortcut is to do any possible cancelling at the beginning, instead of the end. So instead of calculating 2/15 × 21/8 as 42/120 and then simplifying, we spot that the top and bottom are both divisible by 2 and 3. Cancelling these gives 1/5 × 7/4 = 7/20.
Adding fractions One of the commonest mistakes young students make is to think that 1/4 + 1/3 = 2/7. The reasoning here is obvious, but a little thought-experiment quickly dismisses it: 1/4 of a cake plus 1/3 of a cake makes 7/12 of a cake, not 2/7 of one.
Some fractions are easy to add: those which have the same number on the bottom, or a common denominator. It is intuitive enough that 1/5 + 2/5 = 3/5. To add other pairs of fractions we first have to find an equivalent pair with a common denominator. When one denominator is a multiple of the other, this is straightforward. To evaluate 3/7 + 5/14, we translate 3/7 into 14ths, namely 6/14. Then 6/14 + 5/14 = 11/14.
When faced with 3/4 + 1/6, we need to find a common multiple of the two denominators, that is, a number into which both 4 and 6 divide. One possibility is to multiply them: 4 × 6 = 24. This will work perfectly well. But it will reduce the cancelling later on if we use their lowest common multiple, 12, instead. Now we convert each into 12ths, and add: 3/4 + 1/6 = 9/12 + 2/12 = 11/12.
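Python’s standard fractions module carries out exactly this common-denominator bookkeeping, as a minimal sketch shows:

from fractions import Fraction

print(Fraction(3, 7) + Fraction(5, 14))   # 11/14
print(Fraction(3, 4) + Fraction(1, 6))    # 11/12
print(Fraction(1, 4) + Fraction(1, 3))    # 7/12, not 2/7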
Recurring decimals Recurring decimals are decimal expansions which repeat for ever
without end. For example 1/7 = 0.285714285714285714…, which is written as 0.285714 with a bar (or dots) marking the repeating block ‘285714’. Recurring decimals can always be rewritten as exact fractions, and so represent rational numbers. Numbers such as √2, by contrast, have decimal expansions which continue for ever without repeating, and so are not recurring decimals.
Some numbers have both terminating and recurring representations: the number 1 is an
example.
That 0.9999… (a 9 recurring for ever) is equal to 1 is one of the most resisted facts in elementary mathematics. Students often insist that the two numbers are ‘next to each other’, but not the same. Or they say that something should happen ‘after all the 9s’. There are several proofs that the two are equal (any one of them is enough); the most useful sets x = 0.9999…, so that 10x = 9.9999…, and subtracts the first equation from the second to give 9x = 9, and so x = 1.
This argument is the template for converting any recurring decimal into a fraction. Let x = 0.4444…, so that 10x = 4.4444…. Subtracting the first equation from the second shows that 9x = 4, and so x = 4/9.
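The subtraction trick generalizes: for a repeating block of k digits, multiply by 10^k before subtracting. A minimal Python sketch of the conversion, using the standard fractions module to keep the arithmetic exact (the function name is an illustrative label):

from fractions import Fraction

def recurring_to_fraction(repeating_block):
    # Converts 0.(block)(block)(block)... into an exact fraction.
    k = len(repeating_block)
    numerator = int(repeating_block)
    return Fraction(numerator, 10 ** k - 1)   # e.g. x = 0.444... gives 4/9

print(recurring_to_fraction("4"))        # 4/9
print(recurring_to_fraction("285714"))   # 2/7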
Irrational numbers Numbers such as √2 and the exponential constant e, which can never be written exactly as fractions, are therefore irrational. (Again this has nothing to do with being illogical or stupid.) The Pythagorean cult in ancient Greece attached mystical significance to the integers, and believed that all numbers were rational. According to legend, when Hippasus of Metapontum first proved the irrationality of √2, his fellow Pythagoreans had him drowned at sea.
The irrationality of √2 The fact that √2 is irrational means it can never be written exactly as a fraction. So the proof begins by assuming that it can, say √2 = a/b. If a and b have any common factors then the fraction can be cancelled down: we will assume this has been done. So a and b have no common factors. Now, the definition of √2 means that a²/b² = 2, that is, a² = 2b². So a² is even, which forces a to be even too, say a = 2c. Then a² = 4c² = 2b², so b² = 2c², and b must also be even. But now a and b share the common factor 2, contradicting the assumption. So no such fraction can exist.
This proof, often attributed to Hippasus of Metapontum around 500 bc, can be adapted
to show that the square root of any prime number (in fact any non-square whole
number) is irrational.
The problem of the ray We work on the plane equipped with Cartesian coordinates, and we plant a flag at all
the points with integer coordinates (1, 1), (2, 5), (4, 7), and so on. Now imagine that a
laser beam is fired from the origin out across the plane. Will it eventually hit a flagpole
or not?
The answer depends on the gradient m of the straight line followed by the ray. The
equation of this line is y = mx. If the ray hits the post at (p, q), then it must be that q = mp, and so m = q/p.
Since q and p are whole numbers, this means that m is a rational number. So the
answer is that if m is rational the ray will hit a post, and if m is irrational it will not.
The problem of the reflected ray König and
Szücs considered an interesting adaptation of the problem of the ray. Instead of posts,
they imagine a square whose internal walls are mirrors. A laser is fired from one corner
into this mirrored box. What sort of path will it follow? (If the ray ever hits a corner of
the box, we assume it bounces back in the direction it came.)
Again the answer depends on the initial gradient m of the ray. If m is rational, then
after some time the ray will start retracing its former path, and will then repeat the
same loop over and over again. On the other hand, if m is irrational, then the ray never
repeats itself. The resulting line will be dense inside the box. This does not mean that it
will literally pass through every point of the interior of the box (so it is not quite a
space-filling curve).
But if you choose any point inside the box, and specify some distance, no matter how
tiny, then eventually the ray will pass within that distance of your chosen point.
Suppose that Anna pays Bob $3 every day, and has been doing so for some time. So, each day, Bob’s balance changes by +3, and Anna’s by −3. In 2 days from now, Bob’s balance will be +6, compared with today, illustrating that 2 × 3 = 6. On the other hand, −2 days from now (that is, two days ago) Bob had $6 less, that is −6, relative to today. Putting these in a table we get the multiplication table for 3:
2 × 3 = 6
1 × 3 = 3
0 × 3 = 0
−1 × 3 = −3
−2 × 3 = −6
(The middle row reflects that we are comparing Bob’s balance to today’s level. After 0 days, of course it has changed by 0 dollars!)
Now consider Anna’s money, which changes by −3 dollars each day. So, in 2 days’ time, it will be −6, compared with today’s level, illustrating that 2 × (−3) = −6. What about −2 days from now? Anna had $6 more, that is, a relative level of +6. Putting these in a table we get the multiplication table for −3:
2 × (−3) = −6
1 × (−3) = −3
0 × (−3) = 0
−1 × (−3) = 3
−2 × (−3) = 6
Division by 0 Dividing by 0 is probably the commonest mathematical mistake of all.
Even experienced researchers can tell horror stories of finding division by 0 lurking
within their proofs. There is a good reason why division by 0 is forbidden, coming
straight from the meaning of ‘divide’. We write that 8 ÷ 2 = 4 because 4 is the number which, when multiplied by 2, gives 8. So to calculate 8 ÷ 0 we would need to find a number which, when multiplied by 0, gives 8. But anything multiplied by 0 gives 0, so there is no possible answer. Nor can 0 ÷ 0 be given a sensible value: consider the fraction x/y as x and y both get closer and closer to 0. Depending on the precise relationship
between x and y, this fraction can approach any fixed number, or explode out to
infinity, or cycle around seemingly at random.
A consequence of this, also known to the ancient Greeks, is the fundamental theorem
of arithmetic, which says two things:
1 Every positive whole number can be broken up into prime factors. 2 This can happen
in only one way.
So, 308 can be broken down into 2 × 2 × 7 × 11, and the only other ways of writing 308 as a product of primes are re-orderings of this (such as 11 × 2 × 7 × 2). So we know immediately, without checking, that 308 ≠ 2 × 2 × 2 × 3 × 13.
As its name suggests, this fact is a foundation for the whole of mathematics. It is
peculiar to the system of natural numbers, however. In the rational numbers, nothing
similar holds, as there are many different ways to divide up a number. For example, 2 = 4 × 1/2 = 1/8 × 16 = 7 × 2/7, among infinitely many other possibilities.
Highest common factor The highest common factor (or greatest common divisor) of
two natural numbers is the largest number which divides each of them. For example,
the highest common factor of 18 and 24 is 6: there is nothing bigger which divides
both.
The hcf of two numbers can be found by dividing them up into primes, and multiplying
together all their common prime factors (including any repetitions). For example, 60 = 2 × 2 × 3 × 5 and 84 = 2 × 2 × 3 × 7. So the hcf of 60 and 84 is 2 × 2 × 3 = 12. The same
method extends to finding the hcf of more than two numbers. Numbers like 8 and 9,
whose highest common factor is 1, are called co-prime.
Lowest common multiple The lowest common multiple of two natural numbers is the
smallest number into which they both divide. So the lowest common multiple of 4 and
6 is 12, since this is the first number appearing in both the 4 and 6 times tables. The
lcm of two numbers can be found by multiplying them together, and then dividing by
their highest common factor. So the lcm of 60 and 84 is (60 × 84) / 12 = 420.
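Python’s standard math module provides the hcf directly (under the name gcd), and the lcm then follows from the relationship just described. A minimal sketch:

import math

hcf = math.gcd(60, 84)    # 12
lcm = 60 * 84 // hcf      # 420
print(hcf, lcm)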
Perfect numbers A whole number is perfect if all its factors (including 1 but not the
number itself) add together to make the original number.
The first perfect number is 6: its factors are 1, 2 and 3. The next is 28 whose factors are
1, 2, 4, 7 and 14. Perfect numbers have aroused curiosity since the Pythagoreans, who
attached a mystical significance to this balancing out of additive and multiplicative
components. Perfect numbers continue to attract the attentions of mathematicians
today, and their study is divided between the even perfect numbers and the odd perfect
numbers.
It was in Book 9 of Euclid’s Elements that the first important result on perfect numbers was proved. Proposition 9.36 proved that if 2^k − 1 is prime, then (2^k − 1) × 2^k / 2 is perfect. Centuries later this result would be revisited, once prime numbers of the form 2^k − 1 had come to be known as Mersenne primes.
The converse to Euclid’s theorem is also true, as the 10th-century scientist Ibn al-Haytham noticed. If an even number is perfect, then it must be of the form M(M + 1)/2, where M is a Mersenne prime. For example, 6 = 3 × 4/2 and 28 = 7 × 8/2. However, Ibn al-Haytham was not able to prove this result fully. That had to wait for Leonhard Euler, around 800 years later.
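Euclid’s recipe can be checked directly for small k: whenever 2^k − 1 is prime, the resulting number’s proper factors add back up to it. A minimal Python sketch (illustrative only, and far too slow for large k; the helper names are my own labels):

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def proper_factor_sum(n):
    return sum(d for d in range(1, n) if n % d == 0)

for k in range(2, 8):
    m = 2 ** k - 1
    if is_prime(m):                     # m is a Mersenne prime
        candidate = m * 2 ** k // 2     # Euclid's construction
        print(k, candidate, proper_factor_sum(candidate) == candidate)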
Odd perfect numbers No-one has yet found an odd perfect number and most experts
today doubt their existence. However, no-one has managed to prove that such a
creature cannot exist either. In any case, their likely non-existence is no obstacle
for mathematicians to study them in depth. By investigating what an odd perfect
number will look like if it does exist, mathematicians hope either to pin down where
one might be found, or gather ammunition for an eventual proof by contradiction.
In the 19th century, James Sylvester identified numerous conditions which will have to
be met by any odd perfect number. Sylvester believed that for such a number to exist
‘would be little short of a miracle’. Certainly, if there are any, they will have to be
extremely large. In 1991, Brent, Cohen and te Riele used a computer to rule out the
existence of any odd perfect number shorter than 300 digits long.
Amicable pairs Most numbers are not perfect. When you add up their factors you are
likely either to come up short (in the case of a deficient number) or overshoot (for an
abundant number). But sometimes abundant and deficient numbers can balance each
other out. For example, 220 is abundant: its factors are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55
and 110, which sum to 284. The factors of 284 are 1, 2, 4, 71 and 142, which total not
to 284, but back to 220.
Mathematicians of the classical and Islamic worlds were fascinated by amicable pairs
of numbers such as this. In the 10th century, Thâbit ibn Kurrah discovered a rule for
producing amicable pairs, which was later improved by Leonhard Euler.
At the time of writing, 11,994,387 different amicable pairs are known, the largest
being 24,073 digits long. As with perfect numbers, it remains an open question
whether there are really infinitely many of them. Every known pair consists either of
two odd or two even numbers, but it has never been proved that this must always be
the case.
Aliquot sequences Start with any number S1, and add up all its proper factors to get a new number S2. If S1 = S2, then we have a perfect number. Otherwise, repeat the process: add up all the proper factors of S2, to get a new number S3. If S1 = S3, then S1 and S2 form an amicable pair. Otherwise we can continue, to get S4, S5, S6 and so on. This is called an aliquot sequence (aliquot is the Latin word for ‘several’).
Sociable numbers An amicable pair consists of two numbers where the process of
adding up all the factors for each number takes you from one to the other. But longer
cycles can exist too: adding up the factors of 12,496 gives 14,288. Next we get 15,472,
then 14,536 and 14,264, before getting back to 12,496. So these form a cycle of length
5. Numbers like this are known as sociable. At the time of writing, the longest known
cycle of sociable numbers has length 28. It is: 14316, 19116, 31704, 47616, 83328,
177792, 295488, 629072, 589786, 294896, 358336, 418904, 366556, 274924, 275444,
243760, 376736, 381028, 285778, 152990, 122410, 97946, 48976, 45946, 22976,
22744, 19916, 17716.
The aliquot sequence of 95 is
95, 25, 6, 6, 6, … This has landed on a perfect number, where it stays. Similarly, if an
aliquot sequence ever lands on a member of an amicable pair, or any sociable number,
then it will simply cycle around for ever.
An alternative is for the sequence to hit a prime number, which marks its death. The
sequence starting with 49, for example, proceeds next to 8, and then to 7. But the next
number must be 1 (since 7 has no other factors), and after that 0.
In 1888, Eugène Catalan conjectured that every aliquot sequence will end in one of
these ways. But some numbers are throwing this conjecture into serious doubt: 276 is
the first of many numbers whose eventual fate is unknown. Around 2000 terms of its
aliquot sequence have been calculated to date and so far it is simply growing and
growing. After 1500 terms, the numbers are over 150 digits long.
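The aliquot process is easy to experiment with. A minimal Python sketch (illustrative only; it stops after a fixed number of steps, since, as noted above, some sequences may never terminate):

def proper_factor_sum(n):
    return sum(d for d in range(1, n) if n % d == 0)

def aliquot_sequence(n, steps=10):
    seq = [n]
    for _ in range(steps):
        n = proper_factor_sum(n)
        seq.append(n)
        if n == 0:
            break
    return seq

print(aliquot_sequence(95))    # [95, 25, 6, 6, 6, ...]
print(aliquot_sequence(49))    # [49, 8, 7, 1, 0]
print(aliquot_sequence(220))   # [220, 284, 220, 284, ...]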
INDUCTION
Proof by induction How can you prove infinitely many things in one go? One prized
technique is mathematical induction, which is used to prove results involving natural
numbers. Suppose I want to prove that every number satisfies some property, call it X.
Induction attacks the natural numbers in order. The base case of the argument is to
show that 0 is an X-number. The inductive step is to show that if 0, …, k are all X-
numbers, then k + 1 must be one too. If this can be done, then no number can ever be the
first non-X-number, so there cannot be any non-X-numbers at all. So all numbers are
X-numbers.
Induction resembles a mathematical domino-effect: the base case pushes over the first
domino. Then the inductive step shows that the first domino must knock over the
second, and the second will knock over the third, and so on, until no domino is left
standing. Induction is a defining feature of the natural numbers: it does not apply
directly to the real numbers for example. Adding the numbers 1 to 100 is an example
of induction in use, and light-hearted takes on it include the bald man paradox, and the
proof that every number is interesting.
1729 is interesting
1729 = 1³ + 12³ = 9³ + 10³
A proof that every number is interesting
The base case is the number 0: unquestionably among the most interesting of all
numbers. The inductive step assumes that the numbers 0, 1, 2, …, k are all interesting.
The next one is k + 1, which must be either interesting or not. If not, then it is the first
non-interesting number, but this would make it of unique interest: a contradiction. So,
it must be interesting. This completes the inductive step, and so by induction, every
number is interesting.
This is, of course, a parody of a proof, rather than the real thing. Interestingness is not
rigorously defined, and more accurately operates on a subjective sliding-scale. In
reality some numbers are more interesting than others (and it depends on who you
ask). This ‘proof’ requires an artificial inflexibility whereby every number is deemed
either absolutely interesting or not.
Adding up 1 to 100 The sum of the whole numbers from 1 to n is given by the formula n(n + 1)/2. For n = 3, we add up the first three numbers: 1 + 2 + 3 = 6, the same answer as given by 3 × (3 + 1)/2. Gauss realized that all he needed to do was substitute n = 100 into this formula to get his answer: 100 × (100 + 1)/2 = 5050.
There is a geometric way to see this, by considering the nth triangular number. It can be proved more rigorously by induction. The formula for adding up 1 to n can be written neatly in sum notation as:
Σ_{j=1}^{n} j = n(n + 1)/2
The base case involves adding up the first zero numbers. Obviously the answer is 0. The right-hand side is 0 × (0 + 1)/2 = 0, so the formula holds for n = 0. Now, the inductive step: we suppose that the formula holds for some particular value of n, say k. So 1 + 2 + … + k = k(k + 1)/2. We want to deduce that the corresponding thing holds for the next term, n = k + 1. It follows that 1 + 2 + … + k + (k + 1) = k(k + 1)/2 + (k + 1). After a little algebraic sleight of hand, the right-hand side comes out as (k + 1)(k + 2)/2, which is exactly the formula for n = k + 1, completing the induction.
There is a similar formula for the sum of the first n square numbers, which can also be
proved by induction:
1² + 2² + … + n² = n(n + 1)(2n + 1)/6
There is one for cubes too:
1³ + 2³ + … + n³ = n²(n + 1)²/4
(Notice that this is just the first formula squared.) For higher powers, we get:
1⁴ + 2⁴ + … + n⁴ = (1/5)n⁵ + (1/2)n⁴ + (1/3)n³ − (1/30)n
1⁵ + 2⁵ + … + n⁵ = (1/6)n⁶ + (1/2)n⁵ + (5/12)n⁴ − (1/12)n²
1⁶ + 2⁶ + … + n⁶ = (1/7)n⁷ + (1/2)n⁶ + (1/2)n⁵ − (1/6)n³ + (1/42)n
To give a general formula for higher powers requires more work, and involves the
Bernoulli numbers, an important sequence in number theory.
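These closed formulas are easy to spot-check by machine. The following minimal Python sketch (illustrative names, not from the text) compares direct sums of fourth, fifth and sixth powers against the formulas quoted above, using exact fractions.

```python
from fractions import Fraction as F

def power_sum(n, k):
    """Compute 1^k + 2^k + ... + n^k directly."""
    return sum(j**k for j in range(1, n + 1))

# Closed formulas for the sums of fourth, fifth and sixth powers.
formulas = {
    4: lambda n: F(1, 5)*n**5 + F(1, 2)*n**4 + F(1, 3)*n**3 - F(1, 30)*n,
    5: lambda n: F(1, 6)*n**6 + F(1, 2)*n**5 + F(5, 12)*n**4 - F(1, 12)*n**2,
    6: lambda n: F(1, 7)*n**7 + F(1, 2)*n**6 + F(1, 2)*n**5 - F(1, 6)*n**3 + F(1, 42)*n,
}

for k, formula in formulas.items():
    for n in range(1, 20):
        assert formula(n) == power_sum(n, k)
print("formulas agree for n = 1 to 19")
```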
REPRESENTATIONS OF NUMBERS
Place value and decimal notation Counting from 1 to 9 is easy; all we need to do is
remember the correct symbols. But something strange happens when we arrive at 10.
Instead of a new symbol, we start recycling the old ones. This is a deceptively
sophisticated system, which took many centuries to evolve. A crucial moment was the
arrival of a symbol for 0.
In place value notation, the symbol ‘3’ does not just represent the number 3. It can also
stand for 30, 300 or 0.3. The position of the symbol carries as much meaning as the
symbol itself. Whole numbers are represented as digits arranged in columns. On the
right are the units (ones), and with each step left we go up a power of 10. In a place
value table, the number 1001 is shown as:
Thousands  Hundreds  Tens  Units
    1          0       0      1
This is called the decimal system, because 10 is its base. Other choices of base are
perfectly possible.
Bases The fact that 10 is the base of our counting system is surely due in part to
evolution gifting us 10 fingers rather than 8 or 12. From a mathematical perspective, it
is an arbitrary choice. You can form an equally good counting system based on any
number. Indeed a binary system, that is base 2, has some advantages (at least in the
computer age). It should be stressed that this discussion is only concerned with how to
represent numbers using symbols. Whether written as 11 in decimal notation, 1011 in
binary, B in hexadecimal, or 10 in base 11, the fundamental object is the same
throughout. It is no more affected by these cosmetic alterations than a human is by a
change of hat.
Today we mostly work with decimals, but not entirely. Telling the time is not decimal:
there are 60 seconds in a minute, 60 minutes in an hour. This is a hangover from
ancient Babylon, whose mathematicians and bureaucrats worked in base 60. The
ancient Chinese divided the day into 100 ke (around ¼ of an hour), until the adoption of the western system in the 17th century. Plans to
replace hours and minutes with a decimal system have come and gone several times
since then, only ever meeting with fleeting success, such as during the French
revolution.
Binary Binary means ‘base 2’. So, counting from the right, the place values represent
units (or ones), twos, fours, eights, sixteens, etc. (powers of 2). To translate a decimal
number into binary, we break it down into these pieces. For example, 45 = (1 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1), giving a binary representation of 101101. Going the other way, we translate binary to decimal as follows: 11001 has 1 in the units column, 0 in the twos column, 0 in the fours, 1 in the eights, and 1 in the sixteens. So it represents 1 + 8 + 16 = 25.
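The same breaking-down into powers of 2 can be written as a few lines of code. A minimal Python sketch (illustrative names) converting in both directions:

```python
def to_binary(n):
    """Repeatedly split off the remainder mod 2 to build the binary digits."""
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits
        n //= 2
    return bits or "0"

def from_binary(bits):
    """Add up the powers of 2 indicated by the 1s, reading from the right."""
    return sum(2**i for i, b in enumerate(reversed(bits)) if b == "1")

print(to_binary(45))             # 101101
print(from_binary("11001"))      # 25, that is 1 + 8 + 16
print(bin(45), int("11001", 2))  # the built-in conversions agree
```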
Binary is the most convenient base for computers, since the 1 and 0 can be stored as
the ‘on’ and ‘off’ settings of a basic component. These binary digits are known as bits.
Eight bits make a byte, which is used to measure computer memory. The way in which
strings of bits can carry data is the subject of information theory. Like so much of
modern mathematics, binary was first conceived by Gottfried Leibniz in the 17th
century.
Binary is not only useful for computers. Using your fingers to represent bits, it is
possible to count to 31 on one hand, or 1023 on two.
Hexadecimals Binary may be the easiest numerical representation for a computer but,
to the human eye, a long string of 1s and 0s is not easy to decipher. For the most part,
we deal in decimals instead. These have the disadvantage of not being easy to convert
into binary. For this reason, some computer scientists prefer to work in hexadecimals,
base 16. The digits 0–9 have their ordinary meanings, and A, B, C, D, E, F stand for
10, 11, 12, 13, 14 and 15 respectively. The decimal ‘441’ would be written in
hexadecimals as ‘1B9’.
Translating between binary and hexadecimals is much easier than between binary and
decimals. We just group the binary expression into clumps of four digits, and translate
each in turn. So 1111001011 gets grouped as ‘(00)11 1100 1011’, which translates to
hexadecimals as 3CB.
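The group-of-four trick is equally mechanical. A minimal Python sketch (illustrative names) of the binary-to-hexadecimal translation:

```python
def binary_to_hex(bits):
    """Pad to a multiple of 4 bits, then translate each group of 4 into one hex digit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)   # e.g. 1111001011 -> 001111001011
    digits = "0123456789ABCDEF"
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(digits[int(g, 2)] for g in groups)

print(binary_to_hex("1111001011"))           # 3CB
print(format(441, "X"), format(441, "b"))    # 1B9 and 110111001
```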
Standard form Thanks to our decimal system, the number 10 has several uniquely
useful properties. Multiplying or dividing by 10 has the effect of sliding the digits one
place left or right, relative to the decimal point. For example, 47 ÷ 10 = 4.7, and 0.89 × 10 = 8.9. Exploiting this, every number can be expressed as a number between 1 and 10, multiplied or divided by some number of 10s, which can be written as a positive or negative power of 10. This is called standard form. For example, 3.14 × 10⁶ and 2.71 × 10⁻⁵ are both written in standard form.
To convert an ordinary number, such as 14,100, into standard form, first slide the
decimal point so that it gives a number between 1 and 10. In this case it gives 1.41. To
get back to 14,100 we need to multiply by 10 four times. So in standard form 14100
becomes 1.41 × 10⁴. For small numbers such as 0.00173, the procedure is the same. First slide the decimal point (right this time) to give 1.73. This time, to get back to the original number, we need to divide by 10 three times. So we get a negative power: 1.73 × 10⁻³.
Standard form is useful as it allows the order of magnitude of the number to be gauged
quickly (by the power of 10), and meshes well with the metric system.
Surds Because √2 is irrational, the neatest way to write it exactly is simply as '√2'. Expressions like this are called surds. (This word comes from the Latin surdus, meaning 'voiceless', reflecting Al-Khwarizmi's take on irrational numbers, against the more 'audible' rationals.) The same applies equally to the square root of any non-square natural number. Although it makes sense to keep these expressions as surds rather than approximating them with decimals or fractions, we usually want to simplify them as far as possible. The key mathematical ingredient in working with surds is the observation that √(ab) = √a × √b, and in particular √(a²b) = a√b. So if we are presented with an unwieldy number such as √12, we simplify it: √12 = √(4 × 3) = √4 × √3 = 2√3.
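Extracting the largest square factor, as in the simplification of √12, can be automated. A minimal Python sketch (illustrative names):

```python
def simplify_surd(n):
    """Write sqrt(n) as a*sqrt(b) with b square-free, by pulling out square factors."""
    a, b = 1, n
    d = 2
    while d * d <= b:
        while b % (d * d) == 0:
            b //= d * d
            a *= d
        d += 1
    return a, b

print(simplify_surd(12))    # (2, 3): sqrt(12) = 2*sqrt(3)
print(simplify_surd(360))   # (6, 10): sqrt(360) = 6*sqrt(10)
```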
Rationalizing the denominator If we have an expression like 2/(1 + √3), we often want to simplify it to a form where the square roots occur only on the top of the fraction. To achieve this, we can always multiply the top and bottom of any fraction by the same thing without changing its value. The trick is in choosing the right multiplier. In this case choose 1 − √3:
2/(1 + √3) = 2(1 − √3)/((1 + √3)(1 − √3)) = (2 − 2√3)/(1 − 3) = (2 − 2√3)/(−2) = √3 − 1
In general, if the denominator is a + √b, the right multiplier is a − √b. This will transform the denominator into the whole number a² − b (see difference of two squares).
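Assuming the worked example above is indeed 2/(1 + √3), the simplification is easy to verify numerically. A minimal Python sketch (illustrative names):

```python
import math

lhs = 2 / (1 + math.sqrt(3))
rhs = math.sqrt(3) - 1
print(lhs, rhs, math.isclose(lhs, rhs))   # both are about 0.732

# The general trick: (a + sqrt(b)) * (a - sqrt(b)) is the whole number a*a - b.
a, b = 1, 3
print(round((a + math.sqrt(b)) * (a - math.sqrt(b)), 10), a * a - b)   # -2.0 and -2
```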
Large numbers Large numbers have always held a great fascination for humans (or at
least humans of a certain disposition). Jaina mathematicians of ancient India attached
deep mystical significance to enormous numbers. They defined a rajju as the distance a
God travels in six months. (Gods travel a million kilometres in every blink of the eye.)
Building on this, they imagined a cubic box whose sides are one rajju long, filled with
wool. A palya is then the time it will take to empty the box, removing one strand per
century. The Jains also developed a theory of different denominations of infinity,
anticipating Georg Cantor’s set theory by over 2000 years.
In his Sand Reckoner of around 250 bc, Archimedes estimated the number of grains of sand needed to fill the universe. His solution was that no more than 10⁶³ grains should be
required. This number is of no great interest, marred as it is by the heliocentric
cosmology of Archimedes’ time, where the stars were assumed a fixed distance from
the sun. Nevertheless the text marks an important conceptual distinction: that between
very large natural numbers and infinity.
The Sand Reckoner was no idle game. Archimedes was correcting a common
misconception of his time: that there is no number which can measure anything so
huge, that sand is essentially an infinite quantity. To do this, he had to invent a whole
system of notation for large numbers. He got as far as ‘a myriad myriad units of a
myriad myriad numbers of the myriad myriadth period' or, as we would write it, 10^(8 × 10¹⁶). Modern notation, such as taller towers of exponentials, can take us far beyond
this.
Towers of exponentials How can large numbers best be described, using modern
mathematics? We could start by trying to write them down in ordinary decimal place
value notation. Following this strategy, a googol is:
10, 000, 000, 000, 000, 000, 000, 000, 000,
000, 000, 000, 000, 000, 000, 000, 000, 000,
000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000
For most purposes, exponentiation is all we need. The number of atoms in the universe is around 10⁸⁰, and the number of possible games of chess is estimated at 10¹²³. But
we can always concoct larger numbers. A googolplex is 1 followed by a googol zeroes.
In exponential notation, that is 10 raised to the power of:
10, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000,
000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 000
Larger numbers still can be written as towers of exponentials, one power stacked upon another. To build larger numbers than towers can give us, we need Knuth's arrow notation.
Knuth’s arrows Donald Knuth occupies a place in every mathematician’s heart. As the
creator of the typesetting programming TeX, he is largely responsible for what modern
mathematics looks like, in the pages of countless books and journals. In 1976, Knuth
also devised an efficient notation for writing down very large numbers. It is based on
iteration. To start with, multiplication is iterated (repeated) addition: 4 × 3 = 4 + 4 + 4. In the same way, exponentiation is iterated multiplication, and Knuth's single arrow stands for it: 4↑3 = 4 × 4 × 4 = 4³. The double arrow then denotes iterated exponentiation: 4↑↑3 = 4↑(4↑4) = 4^(4^4), a tower of exponentials three storeys high. The third arrow iterates the second: 4↑↑↑3 = 4↑↑(4↑↑4), where the tower is 4↑↑4 = 4^4^4^4 storeys high. This is a stupendously large number, which is already
almost impossible to express in any other way. We continue by defining the fourth
arrow as the third iterated, and so on. The next problem is that the number of arrows
might grow unmanageable. To counter this, we may write 4{n}3 as shorthand for 4↑↑…↑3, where there are n arrows. For still larger numbers, we need more powerful
notation, such as Bowers’ operators.
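The arrows translate into a very short recursive definition, though it can only be evaluated for the tiniest inputs. A minimal Python sketch (illustrative names):

```python
def arrow(a, b, n):
    """Knuth's up-arrow a {n} b, with one arrow meaning ordinary exponentiation."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1                 # a followed by any number of arrows and 0 is 1
    return arrow(a, arrow(a, b - 1, n), n - 1)

print(arrow(2, 3, 1))   # 2^3 = 8
print(arrow(2, 3, 2))   # 2 double-arrow 3 = 2^(2^2) = 16
print(arrow(2, 3, 3))   # 2 triple-arrow 3 = 65536
print(arrow(3, 3, 2))   # 3 double-arrow 3 = 3^27 = 7625597484987
```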
Bowers has devoted a great deal of time to finding and naming ever larger numbers. At the time of writing, his largest is a colossus he has called meameamealokkapoowa oompa. Bowers' basic idea is a process which far extends Knuth's arrows. His first operator
is {{1}} defined by:
m{{1}}2 = m{m}m
m{{1}}3 = m{m{m}m}m
m{{1}}4 = m{m{m{m}m}m}m
and so on. This is enough to locate one of the largest of all mathematical constants,
Graham’s number. But we can continue with a second operator {{2}} defined by:
m{{2}}2 = m{{1}}m
and so on. Then the operators {{3}}, {{4}}, etc. can all be defined analogously.
We begin the next level with {{{1}}}, which behaves in relation to {{ }} as {{ }} does
to { }, and so on. We can press on, with a new function which counts the brackets: we
write [ m, n, p, q ] to mean m {{ … { p }… }} n where there are q sets of brackets. Of
course, Bowers does not stop here, pushing this line of thought to ever more
outrageous heights. But some numbers will always remain out of reach, such as
Friedman’s TREE (3).
Graham’s number Graham’s number is often cited as the largest number ever put to
practical use. The previous record holder was Skewes' number, a puny 10^10^10^34.
(Whether Graham’s number still holds the crown depends on whether you class the
likes of TREE (3) as useful.) While Skewes’ number can easily be written as a short
tower of exponentials, it is impossible to describe Graham’s number without the aid of
some heavy machinery, such as Bowers’ operators. In these terms, Graham’s number
lies between 3{{1}}63 and 3{{1}}64.
Friedman’s TREE sequence grows so fast that ordinary mathematics (as formalized in
Peano arithmetic) simply cannot cope, making this one of the most concrete
established examples of Gödelian incompleteness.
A general continued fraction is a nested expression of the form:
a + b/(c + d/(e + f/g))
Two especially simple examples, which continue for ever, are:
1 + 1/(1 + 1/(1 + 1/(1 + …)))
and
2 + 1/(2 + 1/(2 + 1/(2 + …)))
Leonhard Euler showed that if a continued fraction continues for ever and converges, it
must represent an irrational number. He then deduced for the first time that the number
e is irrational by showing that it is equal to:
e = 2 + 1/(1 + 1/(2 + 1/(1 + 1/(1 + 1/(4 + 1/(1 + 1/(1 + 1/(6 + …))))))))
These are all simple continued fractions, because their numerators (top rows) are all 1.
Continued fractions need not be simple, however. Another expression for e, with larger numerators, is:
e = 2 + 1/(1 + 1/(2 + 2/(3 + 3/(4 + 4/(5 + 5/(6 + …))))))
One of the earliest continued fractions was discovered by Lord William Brouncker, in
the early 17th century:
4/π = 1 + 1²/(2 + 3²/(2 + 5²/(2 + 7²/(2 + …))))
This can be manipulated to produce a continued fraction for π. But π's simple continued
fraction remains mysterious. Ramanujan’s continued fractions are not only non-simple,
but are not even built from whole numbers.
Ramanujan’s continued fractions According to his friend and mentor G.H.Hardy, the
Indian virtuoso Srinivasa Ramanujan, had ‘mastery of continued fractions … beyond
that of any mathematician in the world’. Ramanujan discovered numerous spectacular
formulas involving continued fractions, many of which were discovered in
unorganized notebooks after his death. Not only did Ramanujan not provide proofs of
his formulas, often he did not even leave hints as to how he performed these
astonishing feats of mental acrobatics. It was left to later mathematicians to verify his
formulas, and some of his highly individual notation remains undeciphered to this day.
An example of his work on continued fractions is this, involving the golden section:
(The numerical value of this is around 0.999999, which may be why Ramanujan found
it intriguing.) This is a special case of the celebrated Rogers–Ramanujan fraction.
Discovered independently by Leonard Rogers, this is an ingenious system for
calculating the value of:
1/(1 + q/(1 + q²/(1 + q³/(1 + …))))
To convert a fraction such as 43/30 into a simple continued fraction, there are two basic steps: first we separate the integer part from the fractional part, which means splitting 43/30 into 1 + 13/30. Next we turn the fractional part upside down, below a 1. So our fraction becomes:
1 + 1/(30/13)
Repeating the two steps on 30/13 (which is 2 + 4/13) gives:
1 + 1/(2 + 1/(13/4))
and repeating them once more on 13/4 (which is 3 + 1/4) gives:
1 + 1/(2 + 1/(3 + 1/4))
Because we started with a rational number, the process ended after finitely many steps.
For an irrational number, this process will produce an infinitely long continued
fraction. This is a simple continued fraction, because all the components are whole
numbers, and all the numerators are 1. Every real number can be expressed as a simple
continued fraction, and this can be done in exactly one way.
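The two steps (split off the whole-number part, then turn the remainder upside down) become a short loop. A minimal Python sketch (illustrative names), working with exact fractions:

```python
from fractions import Fraction

def simple_continued_fraction(x, max_terms=20):
    """Peel off integer parts: x = a0 + 1/(a1 + 1/(a2 + ...))."""
    terms = []
    x = Fraction(x)
    for _ in range(max_terms):
        a = x.numerator // x.denominator   # the integer part
        terms.append(a)
        x = x - a
        if x == 0:
            break
        x = 1 / x                          # turn the fractional part upside down
    return terms

print(simple_continued_fraction(Fraction(43, 30)))    # [1, 2, 3, 4]
print(simple_continued_fraction(Fraction(355, 113)))  # [3, 7, 16]
```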
A perplexing question involves the simple continued fraction for π. The sequence begins: 3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, … and has been calculated to 100 million terms by Eric Weisstein, in 2003. The underlying pattern, however, remains mysterious. By truncating the sequence after a few steps, good fractional approximations to π can be found: 3, 22/7, 333/106, 355/113, …
Every real number x can be expressed as a simple continued fraction, in a unique way:
x = a + 1/(b + 1/(c + 1/(d + …)))
So, just as for π, we get a sequence (a, b, c, d, e, …) which encapsulates x. Now take the geometric means of longer and longer initial stretches of this sequence:
√(ab), ³√(abc), ⁴√(abcd), ⁵√(abcde), …
This sequence will get closer and closer to some fixed number. This is K, the geometric mean of the sequence (a, b, c, d, e, f, …).
This may seem a convoluted process, but the punch-line is truly astounding: almost every value of x produces the same value of K. This amazing number is approximately 2.685.
It is not true that literally every value of x reveals the same value. Rational numbers do
not produce K, for example, and neither does e. However, these exceptions are
infinitely outnumbered by those which do have Khinchin’s constant hiding behind
them. If we were able to pick a real number at random, the probability that it would
yield K would be 100% (exactly). It is all the more surprising then, that no-one has yet
managed to prove that any individual value of x does produce K. π appears to (as does K
itself), but no full proofs have yet been found. K itself is very secretive and is not even
known to be irrational.
TRANSCENDENTAL NUMBERS
Cantor went beyond this, however. The algebraic numbers include the rational
numbers but also many of the more common irrational numbers, such as √2. Cantor proved that, as well as all the rational numbers, the algebraic numbers are
also countable. This had a stunning consequence: transcendental numbers, rather than
being exotic anomalies, are the norm. In fact, almost every real number is
transcendental. The familiar numbers that we use the most, the integers, rationals and
algebraic numbers, are just a tiny sliver in a universe of transcendence.
Although √2 is irrational, there is a sense in which it is not too far away from the safety of the whole numbers. There is a quick route back from √2 to the whole numbers: (√2)² = 2, a whole number. Transcendental numbers such as π are not like this. π is not a whole number, and neither is π³, or 1001π⁵ − 64. In fact, there is no route from a transcendental
number back to the whole numbers, using addition, subtraction, multiplication and
division. The technical definition is that a number a is transcendental if there is no
polynomial built from integers which produces an integer when a is substituted in.
Every transcendental number is irrational. However, many irrational numbers, such as √2, are algebraic rather than transcendental.
Transcendental number theory Georg Cantor had shown that almost all real numbers
are transcendental. It is all the more surprising then that specific examples are hard to
find. Liouville’s numbers, e, and were for some time the only known examples.
Hilbert’s 7th problem of 1900 first addressed the core of the difficulty: the way that
transcendence and exponentiation interact.
In answer, the Gelfond–Schneider theorem of 1934 provided the first solid rule of
transcendence: it said that if a is an algebraic number (not 0 or 1), and b is an irrational
algebraic number, then a^b is always transcendental. So 2^√2 and 3^√7 are transcendental, for
example. Over the 20th century this beginning was built upon, notably in results such
as the six exponentials theorem, and Alan Baker’s pioneering work in the 1960s, for
which he won the Fields medal.
Baker’s work investigated sums of numbers of the form b ln a. His results extended the
Gelfond–Schneider theorem to products of numbers of the form a b (where a and and b
are both algebraic, a is not 0 or 1, and b is irrational). This hugely increased the stock
of known examples of transcendental numbers, by including the likes of 2
Six exponentials theorem The six exponentials theorem was proved by Siegel,
Schneider, Lang and Ramachandra. It attacks the central problem of transcendental
number theory: how often exponentiation produces a transcendental result. It states that
if a and b are complex numbers, and x, y and z are complex numbers, then at least one
of e^(ax), e^(bx), e^(ay), e^(by), e^(az) and e^(bz) is transcendental. The caveats are that a and b
must be linearly independent, meaning that neither is a multiple of the other by a
rational number (so a ≠ ¾b, for example). Similarly, none of x, y and z can be reached by multiplying the other two by rational numbers and adding the results together (so x ≠ ⅓y + ⅖z, for example). It is an open question, known as the four exponentials conjecture, whether the same conclusion holds when z is omitted; this would follow from Schanuel's conjecture.
Schanuel’s conjecture is phrased in the technical language of Galois theory, and says,
in essence, that there are no nasty surprises in store: transcendence and exponentiation
interact in as simple a fashion as could be hoped. According to the number theorist
David Masser,
Schanuel’s conjecture ‘is generally regarded as impossibly difficult to prove’.
However, in 2004 Boris Zilber applied techniques of model theory to provide strong,
indirect evidence that Schanuel’s conjecture should be true. It remains to be seen
whether this insight can be captured as a proof of this momentous conjecture.
Ruler and compass constructions The world of geometry is filled with exotic figures
and shapes. Mostly we consider them from a lofty theoretical perspective, but how can
we actually construct them? Taking this question to its limit, which of these shapes can
be constructed using just the simplest of tools: a ruler to draw a straight line, and a
compass (pair of compasses) to draw a circle? This was a question which fascinated
ancient Greek mathematicians.
Sometimes this process is called straight-edge and compass construction to make clear
that the ruler is not to be used for measuring: it is just a tool for drawing straight lines.
Similarly, the compass can only be set to the length of any line which has already been
constructed (or to a randomly chosen length).
Bisecting a line We are presented with two points on a page: the challenge is to find
the point exactly half way between them, using only a ruler and compass. The solution,
provided as Proposition 1.10 of Euclid's Elements, is as follows:
1 Join the two points with a straight line (A).
2 Next, set the compass to more than half the distance between the points.
Putting the pin in one of the points, draw an arc crossing the line. Keeping the compass
at the same setting, do the same thing at the other point.
3 The two arcs should meet at two places (as long as you drew them large enough).
Join these with a straight line (B).
4 The place where the line A and line B meet is the midpoint we want.
In fact this construction does a little more than just find the midpoint: line B is the
perpendicular bisector of line A.
Can a line be drawn through a given point A, parallel to a given straight line L, using only a ruler and compass? Proposition 1.31 of Elements shows that it can. First set your compass to any length
greater than that from A to L, and keep it at this setting throughout the whole
construction.
1 Draw a circle (X) centred at A. This will cross L at two places; pick one and call it B.
2 Draw another circle (Y) centred at B. This will cross L at two places too. Pick one
and call it C.
3 Draw a third circle (Z) centred at C. This will cross the circle X at B, and at a second point; call it D.
4 The straight line through A and D is the required parallel: the four sides AB, BC, CD and DA all have the same length, so ABCD is a rhombus, and AD is parallel to L.
Trisecting a line Since a line can be bisected using only a ruler and compass, it can also
be divided into four, eight or sixteen equal parts, just by repeating that procedure. But
can a line be divided into three equal parts? In Proposition 6.9 of his Elements, Euclid
shows that it can:
1 To divide the line AB into three, first draw any other line (L) through A. 2 Pick any
point (C) on L, and then use the compass to find another point D on L so that the
distance from A to C is the same as that from C to D. 3 Repeat this, to find a third
point E on L so that the distance from A to C is the same as that from D to E, so that C
is one third of the way from A to E.
4 Now join E to B with a new line M. 5 Draw a line N through C, parallel to M. 6 The
point where N crosses AB is one third of the distance from A to B. Step 5 requires use
of the construction of parallel lines.
Lines of rational length The method for trisecting a line easily generalizes to allow a
segment to be divided into as many pieces as you like. If we begin with a line of length
1, we can now create a line of length x/y, where x and y are any positive whole numbers.
First divide the line into y equal pieces. Next place x such pieces end to end (by
measuring off chunks of a long line with the compass). This shows that all rational
numbers are constructible. Some, but not all, irrational numbers are constructible too,
since the square root is a constructible operation.
Bisecting an angle We are given a sheet of paper containing two lines meeting at an angle: the challenge is to divide that angle in half, using only a ruler and compass.
1 Put the point of the compass at the corner where the two lines meet, and draw an arc crossing both lines. Call the crossing points A and B.
2 Ensuring that the compass is set large enough, put the compass at A and draw a circle.
3 Keeping the compass at the same setting, draw another circle centred at B.
4 The two circles cross at two points; the straight line joining the corner to either of them cuts the angle exactly in half.
Trisecting an angle The procedure for bisecting an angle is not especially complicated.
Altogether harder is the question of whether a general angle can be trisected: divided
into three equal angles. This was a problem tackled by Archimedes, among others. He
discovered how to do it using an extra tool which allowed him to draw an Archimedean
spiral. Neither he nor anyone else could manage the task exactly using just a ruler and
compass, however. Approximate methods were found: one such is to draw a chord
across the angle, and trisect that line.
The conundrum remained unsettled until 1836 when Pierre Wantzel finally proved
algebraically that, in general, angles cannot be trisected using only a ruler and
compass. Some specific angles, such as right angles, are trisectable however.
Suppose the angle is centred at the origin O, and is formed by a horizontal line and an
inclined line. Draw the spiral centred at O. At some point, call it A, the curve will cross
the inclined line.
By definition, at every point on the spiral, the distance to the origin is equal to the
angle formed, since r = θ. At this stage, trisecting the angle becomes equivalent to
trisecting the distance from A to O. Of course this is a problem which can be solved.
Say the resulting distance is X. We set our compass to X, and use this to find a point B
on the spiral which is a distance X from the origin. This solves the original problem.
Archimedean spiral
The very first proposition, 1.1, of Euclid’s Elements shows that an equilateral triangle
can be constructed with ruler and compass. We assume that we are given a segment of
straight line (say with ends at A and B). The challenge is to find another point C so that
ABC is an equilateral triangle. Euclid’s solution is as follows:
1 Set the compass to the distance from A to B. Draw an arc centred at A, passing
through B.
2 Without changing the compass setting, draw an arc centred at B passing through A.
3 The two arcs cross at a point; call it C. Joining C to A and to B completes the equilateral triangle ABC.
The first regular polygon is an equilateral triangle, which is constructible. The second
is a square. Proposition 1.46 of Euclid’s Elements shows that this is constructible too.
The procedure hinges on being able to create a right angle at a point A on a line L. This
can be done by taking points B and C an equal distance on either side of A, and then
bisecting the segment BC.
The Elements also contains instructions for constructing a regular pentagon (see
illustration), as well as a hexagon and pentadecagon (15-gon).
Squaring the circle The challenge is as follows: we are given a circle, and required to
construct a square of the same area, using only a ruler and compass. Also known as the
quadrature of the circle, this problem has baffled mathematicians since the time of the
ancient Greeks. It is intimately tied to an even older problem: the value of π.
Suppose the circle has a radius of 1. Then its area is π × 1² = π. So the sides of our square need to have length √π. Once we have a line of this length, building the square is straightforward. So the nub
of the problem is to construct this length.
In 2.14 of Euclid’s Elements, he showed how to construct a square root, using just a
ruler and compass. Suppose we are given two lengths, 1 and x. (For convenience we'll suppose that x > 1, but the method can easily be adapted if x < 1.) The challenge is to construct a new line of length √x.
1 First put these two lengths end to end, to give a line AC of length x + 1. This can be done with the compass. Mark the point B where the two segments meet.
2 Bisect AC, and call its midpoint D.
3 Set the compass to the length AD, and draw a circle centred at D.
4 Draw a line through B, perpendicular to AC. It crosses the circle at a point E, and the length of BE is √x.
The reason this works is that DE has the same length as DC, namely half that of AC, so DE = (x + 1)/2, while DB = DC − BC = (x + 1)/2 − 1 = (x − 1)/2. By Pythagoras' theorem, BE² = DE² − DB² = ((x + 1)/2)² − ((x − 1)/2)² = x, so BE has length √x.
Since square roots are constructible, it is enough to construct a line of length π. The key to the problem then is whether or not π is constructible. In 1836, Pierre Wantzel showed that if π is a transcendental number, then it is not a constructible number. The final piece of the puzzle arrived in 1882, when the Lindemann–Weierstrass theorem implied that π is indeed transcendental, and so squaring the circle is impossible.
In 1914, Srinivasa Ramanujan found a very accurate approximate method for squaring the circle. Although π is not constructible, he found an extraordinary constructible approximation to it, namely ⁴√(2143/22). The fraction 2143/22 can be constructed as a rational length, and then the procedure for constructing a square root can be applied twice, to give √(2143/22) and then ⁴√(2143/22). If the original circle has a radius of 1 metre, the resulting square will have sides accurate to the nearest nanometre.
Doubling the cube In around 430 bc, Athens suffered a terrible plague.
The Athenians consulted the Oracle of Apollo, on the island of Delos. To cure the
plague, the Oracle said they must construct a new altar, double the size of the present
one. When they consulted Plato, his response was that the Oracle intended to shame
the Greeks for their neglect of geometry. So goes the tale of the origin of the problem
of doubling the cube. Whatever the truth of the story, this problem did indeed
preoccupy ancient mathematicians. Essentially it is a problem of a ruler and compass
construction (albeit in three dimensions): we are given a cube and challenged to
construct a new one of double the volume.
Suppose that the original cube has sides of 1 unit. Then it has volume 1, and so the new
cube should have volume 2. This means that it must have sides of length ³√2. Constructing this length is the core of the problem. Plato's friend Menaechmus managed to solve it, with extra tools beyond simple ruler and compass: essentially, he realized that the parabolas y² = 2x and y = x² would intersect at the point whose x-coordinate was ³√2 (an astonishing insight since Cartesian coordinates were still thousands of years away). The problem was eventually proved impossible in 1836 by Pierre Wantzel, who demonstrated that ³√2 is not a constructible number.
Constructible numbers Many problems of ruler and compass construction boil down to
this: given a line of length 1, and a number r, is it possible to construct a line of length
r ? If so, we say that r is constructible. Obviously integers, such as 4, are constructible:
draw a long line, and then use the compass to measure off four successive segments of
length 1. Numbers of the form m
n are also constructible: the method of trisecting a line easily extends to give lines of
rational length.
So far, this shows that every rational number is constructible. This is not all however:
taking square roots is also a constructible procedure. Pierre Wantzel proved that this is now everything
we can do: the only constructible numbers are those obtainable from the rational
numbers by adding, subtracting, multiplying, dividing, and taking square roots. As this
suggests, the constructible numbers form a field.
DIOPHANTINE EQUATIONS
Number theory The term number theory might seem a good description of the whole of
mathematics, but the focus of this subject is the system of ordinary whole numbers (or
integers) rather than the more rarefied real numbers or complex numbers. The whole
numbers are the most ancient and fundamental of all mathematical structures. But,
below their surface, lie some of the deepest questions in mathematics, including the
Riemann hypothesis and Fermat’s last theorem.
The two major concerns of contemporary number theory are the behaviour of the
prime numbers and the study of Diophantine equations: the formulas which describe
relationships between whole numbers. These two topics roughly correspond to the two
principal branches of the subject: algebraic number theory and analytic number theory.
The tools of the two subjects are different. The algebraic approach studies numbers via
objects such as groups and elliptic curves, while analytic number theory uses
techniques from complex analysis, such as L-functions. The Langlands program provides
the tantalizing suggestion that these two great subjects may be different perspectives
on the same underlying objects.
Modular arithmetic is widespread not just within mathematics, but in daily life. The
12-hour clock relies on our ability to do arithmetic modulo 12, and if you work out
what day of the week it will be in nine days’ time, you are doing arithmetic mod 7.
Modular arithmetic is useful in number theory for providing information on numbers
whose exact values are unknown, through powerful results such as Fermat’s little
theorem, and Gauss’ quadratic reciprocity law.
Chinese remainder theorem Some time between the third and fifth centuries ad, the
Chinese mathematician Sun Zi wrote: ‘Suppose we have an unknown number of
objects. When counted in threes, two are left over. When counted in fives, three are left
over. When counted in sevens, two are left over. How many objects are there?’ In
modern terms, this is a problem in modular arithmetic: what is needed is a number n, where n ≡ 2 (mod 3), n ≡ 3 (mod 5), and n ≡ 2 (mod 7).
The Chinese remainder theorem states that this type of problem always has a solution.
The simplest case involves just two congruences: if a, b, r and s are any numbers, then
there is always a number n where n ≡ a (mod r), and n ≡ b (mod s). The caveat is that r and
s must be coprime, meaning they have no common factors. This immediately extends
to solving any number of congruences (again providing the moduli are all coprime).
The solution n is not quite unique: 23 and 128 both solve Sun Zi’s original problem. In
general there will be exactly one solution which is at most the product of all the
moduli, in this case 105 (= 3 × 5 × 7).
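Sun Zi's puzzle, and others like it, can be solved mechanically by satisfying one congruence at a time. A minimal Python sketch (illustrative names), assuming the moduli are pairwise coprime:

```python
def solve_congruences(remainders, moduli):
    """Find n with n = r (mod m) for each pair, assuming pairwise coprime moduli."""
    n, step = 0, 1
    for r, m in zip(remainders, moduli):
        while n % m != r:       # step in a way that preserves the congruences found so far
            n += step
        step *= m
    return n

print(solve_congruences([2, 3, 2], [3, 5, 7]))              # 23, Sun Zi's answer
print(solve_congruences([2, 3, 2], [3, 5, 7]) + 3 * 5 * 7)  # 128, the next solution
```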
Fermat's little theorem states that if p is a prime number, then for any whole number n,
n^p − n ≡ 0 (mod p), or equivalently n^p ≡ n (mod p).
When n is not divisible by p, this can also be written as
n^(p−1) ≡ 1 (mod p).
Fermat added the characteristic comment ‘I would send you the proof, if I did not fear
its being too long’. The first known proofs are due to Gottfried Leibniz, in unpublished
work around 1683, and Leonhard Euler in 1736.
Quadratic reciprocity law The great German mathematician Carl Friedrich Gauss loved
this result, which he called the 'Golden Theorem'. It was first stated by Leonhard Euler in 1783, and Gauss published its first complete proof in 1796.
For two odd primes p and q, this law describes an elegant symmetry between two questions: whether p is a square modulo q, and whether q is a square modulo p. It asserts that these questions always have the same answer, except when p ≡ q ≡ 3 (mod 4), when they have opposite answers. Take 5 and 11, for example. Firstly, 11 ≡ 1 (mod 5), and 1 is a square. So the theorem predicts that 5 should also be a square modulo 11. This is not immediately obvious, but on closer inspection, 4² = 16 ≡ 5 (mod 11).
Along with Pythagoras’ theorem, the quadratic reciprocity law is one of the most
profusely proved results in mathematics. Gauss alone produced eight proofs during his
life, and over 200 more now exist, employing a wide variety of techniques.
Diophantus of Alexandria lived around ad 250. Although the ancient Babylonians had
begun probing integer solutions to quadratic equations, it was in Diophantus’ thirteen
volume Arithmetica that the study of Diophantine equations, named in his honour, was
begun in earnest. This work marked a milestone in the history of number theory, but
was believed to have been lost in the destruction of the great library at Alexandria.
However in 1464 six of the books resurfaced and became a major focus for European
mathematicians, most notably Pierre de Fermat.
A unit fraction is a fraction with numerator 1, such as ½, ⅓ or ¼ (but not ¾). Of course, any rational number can be written as unit fractions added together: for example ¾ = ¼ + ¼ + ¼.
The more interesting challenge is to write a fraction as a sum of distinct unit fractions. For instance, ½ + ¼ is such a representation of ¾.
In 1202 Leonardo of Pisa (better known as Fibonacci) wrote his book Liber Abaci, in
which he proved that every fraction can be split up in this way. He also provided an
algorithm for finding this representation. But this has not completely settled the matter:
questions still remain about how many unit fractions are needed to represent a
particular number, which include the Erdős–Straus conjecture.
The Erdős–Straus conjecture concerns fractions of the form 4/n. It asserts that every such fraction (with n at least 2) can be written as the sum of three unit fractions (that is, fractions whose numerator is 1). So, for every such n, there are three whole numbers x, y, z where 4/n = 1/x + 1/y + 1/z. For example, 4/5 = 1/2 + 1/5 + 1/10. The claim that this is always true, formulated by Paul Erdős and Ernst Straus in 1948, has so far resisted all attempts at proof and counterexample.
Bézout’s lemma The highest common factor (highest common denominator) of 36 and
60 is 12. Named after Étienne Bézout, Bézout’s lemma says that there are integers x
and y where 36 x 60 y 12. On closer inspection, one solution is x 2 and y 1. But there
will be infinitely many others too, such as x 7 and y 4. This can be rephrased as a
statement about Diophantine equations: if a and b have highest common factor d, then
Bézout’s lemma guarantees that the linear equation ax by d has infinitely many integer
solutions. Bézout’s lemma is the key to understanding all linear Diophantine equations.
Diophantine equations are ones which just involve plain x and y (no x² or xy or higher powers): linear equations, such as 6x + 8y = 11. Any such equation defines a straight line
on the plane. So asking whether it has integer solutions is the equivalent to looking for
points on this line which have integer co-ordinates.
Bézout’s lemma deals with the most important case: it says ax by d has a solution
when d is the highest common factor (highest common denominator) of a and b (and in
this case it has infinitely many solutions). In fact, the equation ax by d only has
solutions when d is a multiple of the hcf of a and b. So the equation 6 x 8 y 11 has no
integer solutions, since the hcf of 6 and 8 is 2, and 11 is not a multiple of 2.
If a linear equation has any integer solutions, then it will have infinitely many. This
result extends to equations in more variables, such as 3x + 4y + 5z = 8. Here, the highest
common factor of 3, 4 and 5 is 1. Since 8 is a multiple of 1, this equation will have
infinitely many integer solutions.
Archimedes’ cattle Around 250 bc, Archimedes sent a letter to his friend Eratosthenes,
containing a challenge for the mathematicians of Alexandria. It concerned the Sicilian
‘cattle of the Sun’, a herd consisting of cows and bulls of various colours. We will
write W for the number of white bulls and w for the number of white cows, and
similarly X and x for the black ones, Y and y for the dappled, and Z and z for the
brown.
W = (1/2 + 1/3)X + Z     X = (1/4 + 1/5)Y + Z     Y = (1/6 + 1/7)W + Z
w = (1/3 + 1/4)(X + x)   x = (1/4 + 1/5)(Y + y)   y = (1/5 + 1/6)(Z + z)   z = (1/6 + 1/7)(W + w)
The challenge was to find the make-up of the herd by solving the equations. We do not
know how the Alexandrians fared. The first known solution is due to the European
mathematicians who rediscovered it in the 18th century: W = 10,366,482, X = 7,460,514, Y = 7,358,060, Z = 4,149,387, w = 7,206,360, x = 4,893,246, y = 3,515,820, z = 5,439,213, a total of 50,389,082 head of cattle.
However, Archimedes warned that anyone who solved this problem ‘would not be
called unskilled or ignorant of numbers, but nor shall you yet be numbered among the
wise’. In order to attain perfect wisdom, the problem had to be solved with two extra
conditions included: W + X should be a square number, and Y + Z should be a triangular number. This removes the problem from the sphere of linear equations, and makes it substantially harder. In 1880, A. Amthor described a solution, based on reducing the problem to the Pell equation a² − 4,729,494b² = 1. With the dawn of the computer age this was fleshed out to give the
full 206,545-digit answer: incomparably more cattle than there are atoms in the
universe.
Pell equations The easiest Diophantine equations to manage are the linear ones, where
there is a straightforward procedure to determine whether or not there are any integer
solutions. The picture becomes much murkier once squares are introduced, as the
unknown status of perfect cuboids demonstrates.
Around 800 bc the Hindu scholar Baudhayana gave 577/408 as an approximation to √2. It is likely that this came from studying the equation x² − 2y² = 1, which has x = 577 and y = 408 as a solution (as well as x = 17, y = 12). This, x² − 2y² = 1, is the first example of a Pell equation. Another is x² − 3y² = 1, and generally x² − ny² = 1, where n is any non-square natural number. (In fact they have little to do with John Pell, but Leonhard Euler confused him with William Brouncker, and the name stuck.)
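For small n, the smallest solution can be found by brute force (serious solvers use continued fractions or the chakravala method instead). A minimal Python sketch (illustrative names):

```python
from math import isqrt

def pell_fundamental_solution(n):
    """Smallest x, y > 0 with x*x - n*y*y == 1, found by searching over y."""
    y = 1
    while True:
        x2 = n * y * y + 1
        x = isqrt(x2)
        if x * x == x2:
            return x, y
        y += 1

print(pell_fundamental_solution(2))    # (3, 2): 3^2 - 2*2^2 = 1
print(pell_fundamental_solution(13))   # (649, 180)
# pell_fundamental_solution(61) would eventually return (1766319049, 226153980),
# but brute force is hopelessly slow there, hence the chakravala method.
```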
Pell equations were studied much earlier in India, notably by Brahmagupta in ad 628.
The chakravala method, attributed to Bhāskara II in the 12th century, is a procedure for
solving Pell equations by continued fractions. It was spectacularly deployed to solve
the awkward case x² − 61y² = 1, to find minimum solutions of x = 1,766,319,049 and y = 226,153,980. Hermann Hankel called chakravala 'the finest thing achieved in the theory of numbers before Lagrange'. It was Joseph Louis Lagrange who gave the first rigorous proof that x² − ny² = 1 must have infinitely many integer solutions for any non-
square number n.
Euler bricks Pythagorean triples allow us to construct a rectangle whose sides and
diagonals are all whole numbers. For instance, a rectangle with sides of 3 and 4 units will, by Pythagoras' theorem, have diagonals of length 5. An Euler brick generalizes this to three dimensions: it is a cuboid (see irregular polyhedra) all of whose lengths
are whole numbers, as are the diagonals of each face. The smallest Euler brick has
sides of lengths 44, 117 and 240 units, and was discovered in 1719 by Paul Halcke. In
1740, the blind mathematician Nicholas Saunderson discovered a method to produce
infinitely many Euler bricks. This approach was later augmented by Leonhard Euler
himself.
We can translate the geometry into algebra using Pythagoras’ theorem. An Euler brick
with sides a, b, c and face diagonals d, e, f must satisfy the Diophantine equations:
a² + b² = d², b² + c² = e² and c² + a² = f²
The most sought after Euler bricks are the elusive perfect cuboids.
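Halcke's brick can be recovered by a simple computer search. A minimal Python sketch (illustrative names) that looks for the smallest Euler brick:

```python
from math import isqrt

def is_square(n):
    r = isqrt(n)
    return r * r == n

def smallest_euler_brick(limit=300):
    """Search for a <= b <= c whose three face diagonals are all whole numbers."""
    for c in range(1, limit + 1):
        for b in range(1, c + 1):
            if not is_square(b * b + c * c):
                continue
            for a in range(1, b + 1):
                if is_square(a * a + b * b) and is_square(a * a + c * c):
                    return a, b, c
    return None

print(smallest_euler_brick())   # (44, 117, 240)
```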
Perfect cuboids Euler bricks have intrigued mathematicians since the 18th century:
they are cuboids where the edges and face-diagonals all have integer lengths. A natural
extension is to require that the cuboid’s body-diagonal should also be a whole number.
This describes a perfect cuboid. The problem is that no-one has ever seen one: whether
or not they exist is a significant open problem.
In equations, a perfect cuboid with sides a, b, c, face diagonals d, e, f and body diagonal g must satisfy
a² + b² = d², b² + c² = e² and c² + a² = f², as well as a² + b² + c² = g².
No perfect cuboid has ever been found. It is known that if one does exist one of its
sides must be at least 9 billion units long. However in 2009, Jorge Sawyer and Clifford
Reiter discovered the existence of perfect parallelepipeds (see irregular polyhedra).
The smallest has edges of 271, 106 and 103, parallelogram faces with diagonals of 101 and 183, 266 and 312, and 255 and 323, and body diagonals of 374, 300, 278 and 272.
Sums of two squares An ancient mathematical question asks: which natural numbers
can be written as two square numbers added together? Fermat’s two square theorem
answered this for prime numbers: the only ones which can are those of the form 4m + 1 (as well as 2 = 1² + 1²). But what happens for composite numbers? 6 and 14 cannot, but 45 can: 45 = 3² + 6². For large numbers like 6615, finding the answer could require a lot
of experimentation.
Fermat’s result suggests that prime numbers are especially important, and the main
obstacle are the (4 m 3)-primes: 3, 7, 11, 19, 23, 31, etc. The solution (which can be
deduced from Fermat’s theorem) first breaks a whole number n into primes using the
Fundamental theorem of arithmetic. Then the theorem states that n can be written as
the sum of two squares if each (4 m 3)-prime appears in this break-down an even
number of times. So 6 2 3 fails, since 3 appears once (an odd number of times). But 45
3 2 5 can be written as a sum of two squares, since 3 appears twice (an even number of
times). With no further need for experiment, we now know that 6615 can never be
written as a sum of two squares, because 6615 3 3 5 7 2, where 3 appears three times.
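The criterion turns directly into a short test: factorize n and check the exponents of its primes of the form 4m + 3. A minimal Python sketch (illustrative names):

```python
def is_sum_of_two_squares(n):
    """True if every prime factor of the form 4m + 3 occurs to an even power."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            count = 0
            while n % d == 0:
                n //= d
                count += 1
            if d % 4 == 3 and count % 2 == 1:
                return False
        d += 1
    if n > 1 and n % 4 == 3:   # a leftover prime factor, occurring exactly once
        return False
    return True

print([m for m in range(1, 50) if is_sum_of_two_squares(m)])
print(is_sum_of_two_squares(45), is_sum_of_two_squares(6615))   # True, False
```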
Bachet translated the six surviving books of Diophantus’ Arithmetica (dating from
around ad 250) from ancient Greek into Latin. These volumes would play an important
role in the development of modern number theory, most notably in the hands of Pierre
de Fermat. Bachet was also a mathematician in his own right. He noticed that implicit
in Diophantus’ work was a remarkable claim: that every whole number could be
written as the sum of four square numbers. For example, 11 = 3² + 1² + 1² + 0² and 1001 = 30² + 8² + 6² + 1².
Fermat later made the same observation, known as Bachet’s conjecture. But the first
published proof for all natural numbers was by Joseph-Louis Lagrange in 1770. This
important theorem is generalized further by Fermat’s polygonal number theorem and
Waring’s problem.
Euler published the first proof of Fermat’s two square theorem in 1749, essentially
settling the question of which numbers could be written as sums of two squares. In
1770, Joseph-Louis Lagrange proved his four square theorem, showing that every
natural number can be written as the sum of four squares.
A puzzle remained: which numbers can be written as the sum of three squares? Most
numbers can. The first which cannot are 7, 15, 23, 28, 31, 39, 47, 55, 60, …
Triangular numbers In cue-sports such as billiards and pool, the pack comprises 15
balls arranged as an equilateral triangle at the start of the game. It is not fanciful to
imagine that the number 15 was chosen because it is a triangular number: 14 or 16
balls cannot form an equilateral triangle. The smallest triangular number is 1, then 3,
then 6. Looking at the corresponding packs of balls, in each case the first row contains 1 ball, the next 2, then 3 and so on. So triangular numbers are those of the form 1 + 2 + 3 + 4 + … + n, for some n. The formula for summing the numbers 1 to n (see adding up the numbers 1 to 100) gives a formula for the nth triangular number: n(n + 1)/2. In this guise, triangular numbers give the solution to the handshaking problem: how many handshakes take place when n people all shake hands with each other?
Triangular numbers are the first of the polygonal numbers.
Polygonal numbers Triangular numbers are those which count the number of balls
which can be arranged in an equilateral triangle. Square numbers are the corresponding
numbers for squares. Can this be extended to regular pentagons and other polygons?
The answer is that it can, though care is needed since it is not immediately obvious
how to arrange balls into pentagonal arrays. The convention is that the first pentagonal
number is 1, the second is 5, and subsequent numbers are found by picking one corner,
extending its two sides each by one ball, and completing the pentagon (enclosing all
previous balls). See diagrams. This gives the next pentagonal numbers as 12, 22, 35,
51, etc.
A similar process works for hexagonal numbers (the first few of which are 1, 6, 15, 28,
45, 66), heptagonal numbers, and indeed n-agonal numbers for any n.
The formula for the mth triangular number is m(m + 1)/2 or, to put it another way, ½m² + ½m (see adding up the numbers 1 to 100). The formula for the mth square number is, of course, m². For the mth pentagonal number, it is (3/2)m² − ½m. The general pattern might be visible now: the formula for the mth n-agonal number is (n/2 − 1)m² − (n/2 − 2)m. The fact that these numbers have genuine number theoretic significance is proved in Fermat's polygonal number theorem.
Triangular numbers: 1, 3, 6, 10, …   Square numbers: 1, 4, 9, 16, …   Pentagonal numbers: 1, 5, 12, 22, …
Fermat’s polygonal number theorem
In 1796 Carl Friedrich Gauss proved that every number can be written as the sum of
three triangular numbers. Echoing Archimedes, he wrote:
ΕΥΡΗΚΑ! num = Δ + Δ + Δ
In 1770 Lagrange had already proved his four square theorem: that every number can
be written as the sum of four square numbers (or, as Gauss might have put it, num = □ + □ + □ + □).
Together these two results formed the first two cases of Fermat’s polygonal number
theorem. The next says that any number should be expressible as the sum of five
pentagonal numbers. In general, the theorem asserts that any number can be obtained
by adding together n n-agonal numbers. In his customary fashion, Pierre de Fermat
claimed to have a proof, but did not apparently communicate it to anyone. The first
known complete proof is due to Augustin Louis Cauchy in 1813.
The numbers 3, 4 and 5 form a Pythagorean triple because 3² + 4² = 5² (that is, 9 + 16 = 25). Around 300 bc, Euclid realized that there must be infinitely many triples of whole numbers like this which satisfy x² + y² = z². In 1637, Pierre de Fermat contemplated what would happen if these squares
were replaced with higher powers. In his copy of Diophantus’ Arithmetica, he wrote ‘It
is impossible to separate a cube into two cubes, or a fourth power into two fourth
powers, or any power higher than the second into two like powers.’ Fermat was
claiming that, for any value of n larger than 2, it would never be possible to find whole
numbers x, y and z satisfying xⁿ + yⁿ = zⁿ. Infamously, he went on 'I have discovered a
truly marvellous proof of this which this margin is too narrow to contain’.
Wiles’ theorem Fermat’s famous claim became known as his last theorem, not because
it was the last he wrote, but because it was the last to be proved. Despite his claim,
most experts today do not believe that Fermat could possibly have had a complete
proof (although he did prove the particular case of n 4). Nor did he or anyone else find
a counterexample.
It would more accurately have been called Fermat’s conjecture, and remained
unresolved over the centuries, despite the attentions of many of mathematics’ greatest
thinkers. In 1995, Andrew Wiles and his former research student Richard Taylor
completed a proof which established Fermat’s last theorem as a consequence of the
modularity theorem for elliptic curves.
Beal’s conjecture Best known as a self-made billionaire, and for playing poker for the
highest stakes in history, the Texas-based businessman Andrew Beal is also an
enthusiastic amateur number theorist. In 1993, he was investigating Fermat’s last
theorem,
which says that xⁿ + yⁿ = zⁿ has no integer solutions when n > 2. Beal's idea was to alter the formula by allowing the exponents to differ: so, instead of xⁿ + yⁿ = zⁿ, he considered x^r + y^s = z^t, where r, s and t are allowed to be different (but all must be bigger than 2). Beal's conjecture is that any whole-number solution of this equation must have x, y and z sharing a common factor. Similar situations had been studied by Viggo Brun at the beginning of the 20th century.
Waring’s problem By Lagrange’s four square theorem, we know that every positive
whole number can be written as the sum of four squares. What about writing numbers
as sums of higher powers? In 1909, Arthur Wieferich showed that every number can
be written as the sum of nine cubes. In 1986, Balasubramanian, Deshouillers, and
Dress showed that 19 fourth powers are also enough. These results had been
conjectured by Edward Waring in 1770. Waring had further suggested that this
problem must have a solution for every positive power. That is, for every whole
number n, there must be some other number g, so that any number can be written as
the sum of at most g nth powers.
This important result was proved by David Hilbert in 1909, and is known as the
Hilbert–Waring theorem. The first few numbers in the sequence are:
n   1   2   3   4    5    6    7     8     9     10
g   1   4   9   19   37   73   143   279   548   1079
The exact nature of this sequence, and related sequences, remains an active research
topic.
The abc conjecture To calculate the radical of a number, first break it down into
primes, and then multiply together the distinct primes, ignoring any repetitions. So,
300 = 2 × 2 × 3 × 5 × 5, and its radical, Rad(300), equals 2 × 3 × 5 = 30. Similarly, 16 = 2 × 2 × 2 × 2 and Rad(16) = 2. (The radical is also called the square-free part of x, since it is the largest
factor of x not divisible by a square. Note that the term ‘radical’ is separately used to
refer to roots, as in square root.)
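Computing a radical is just factorization with repetitions ignored. A minimal Python sketch (illustrative names):

```python
def radical(n):
    """Product of the distinct prime factors of n."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:   # strip out repeated copies of this prime
                n //= d
        d += 1
    if n > 1:
        result *= n
    return result

print(radical(300), radical(16))              # 30 and 2
print(radical(3 * 4 * 7), radical(1 * 8 * 9)) # 42 and 6, the examples below
```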
In 1985, Joseph Oesterlé and David Masser formulated a conjecture which would
generalize Fermat’s last theorem, Catalan’s conjecture (Miha? ilescu’s theorem) and a
great many other number theoretical problems. Dorian Goldfeld called it ‘the most
important unsolved problem in Diophantine analysis’. It concerns situations where
three numbers a, b and c are coprime and satisfy a + b = c. The conjecture compares c and Rad(a × b × c). In most cases c < Rad(a × b × c). For instance, if a = 3, b = 4, c = 7, then Rad(3 × 4 × 7) = 42, which is more than 7. But occasionally this does not happen: for instance, when a = 1, b = 8, c = 9, then Rad(1 × 8 × 9) = 6, which is less than 9.
Masser proved that there are infinitely many of these exceptions, where c > Rad(a × b × c).
The abc conjecture says that these are only just in violation of the rule: even a very
minor tweak will get rid of most of them. For any number d bigger than 1 (even d = 1.0000000001), almost all of the exceptions should disappear, leaving just finitely many triples where c > Rad(a × b × c)^d.
PRIME NUMBERS
Prime numbers The definition of a prime number is among the most profound and
ancient there is: a positive whole number which cannot be divided into smaller whole
numbers. The first examples are 2, 3, 5 and 7. On the other hand, 4 is not prime, as it equals 2 × 2. Up until the early 20th century, 1 was classed as prime, but no longer.
Today we say that a prime must have exactly two factors: itself and 1. Non-prime
numbers (apart from 1) are known as composite. The fundamental theorem of
arithmetic tells us that prime numbers are to mathematicians as atoms are to chemists:
the basic building blocks from which every other natural number is built. Despite the
simplicity of their definition, the prime numbers still hold many mysteries. Major open
questions about primes include Landau’s problems and the Riemann hypothesis. There
is no simple formula for generating prime numbers, and no immediate way to tell
whether or not a large number is prime, making primality testing an important area of
research.
The list of prime numbers starts with 2, 3, 5, 7, 11, 13, 17, … But where does it end?
In fact, it goes on forever, a fact demonstrated by Euclid in Proposition 9.20 of his
Elements. His method is a classical proof by contradiction: Euclid started by imagining
that what he wanted to prove was false, and
there really was a biggest prime number. Call this P. According to this assumption, he
now had a list of all the prime numbers: 2, 3, 5, 7, …, P. Euclid then constructed a new
number (Q, say), by multiplying all the primes together, and then adding 1. So:
Q = (2 × 3 × 5 × 7 × … × P) + 1.
Now, Q leaves remainder 1 when divided by any of the primes 2, 3, 5, 7, …, P. So either Q is itself prime, or it has a prime factor which is not on the list. Either way, the assumption that P was the biggest prime is contradicted, and so the primes must go on for ever.
The sieve of Eratosthenes is a simple way to find all the prime numbers up to a chosen limit: list the numbers, then strike out the multiples of 2, then the multiples of 3, and so on; the numbers which survive are the primes. This technique of sieving the natural numbers, much evolved since Eratosthenes' day, plays a major role in modern number theory, and forms the basis for the proof of Chen's theorem.
A related statement is the weak Goldbach conjecture, which says that every odd number from 9 onwards is the sum of three primes. In 1937, Ivan Vinogradov found a large number N and proved that the conjecture is true for all odd numbers bigger than N. So, theoretically, only finitely many numbers remain to be checked to prove the result. However, this threshold is still so huge (around 10^43,000) that the weak conjecture also remains open today.
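The statement is easy to test by machine for small numbers. Here is a minimal Python check (the bound of 2,000 is arbitrary, and of course no amount of checking amounts to a proof):

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

limit = 2000
primes = primes_up_to(limit)
prime_set = set(primes)

for n in range(9, limit + 1, 2):                 # every odd number from 9 upwards
    ok = any((n - p - q) in prime_set
             for p in primes for q in primes if p + q < n)
    assert ok, f"{n} is not a sum of three primes?"
print("weak Goldbach verified for odd numbers up to", limit)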
Fermat’s two square theorem With the exception of 2, all prime numbers are odd. So, when divided by 4, every odd prime must leave remainder either 1 or 3. This divides the odd primes into two families, and in the 17th century Pierre de Fermat noticed a surprising difference between them. The primes of the form 4n + 1 could all be written as the sum of two squares: 5 = 2² + 1², and 13 = 3² + 2², for example, and furthermore this could be done in only one way. But none of the primes of the form 4n + 3 (such as 7 or 11) could be expressed in this way.
This is also known as Fermat’s Christmas theorem because it was first written in a letter to Marin Mersenne, on 25 December 1640. Typically, Fermat gave just the barest sketch of an argument that this should always be true. The first published full proof was by Leonhard Euler, in 1749. The theorem is intimately related to the quadratic reciprocity theorem and, as G. H. Hardy remarked, ‘is ranked, very justly, as one of the finest in arithmetic’. A consequence is that if a number can be written as the sum of two squares in two different ways (such as 50 = 7² + 1² = 5² + 5²), then it cannot be prime.
Fermat’s result extends to a complete analysis of which numbers can be written as the sums of two squares.
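A brute-force Python sketch (illustrative only) makes the dichotomy visible: primes leaving remainder 1 on division by 4 have exactly one decomposition, those leaving remainder 3 have none, and 50 has two.

def two_square_decompositions(n):
    # All ways of writing n = a*a + b*b with 0 < a <= b.
    out = []
    a = 1
    while 2 * a * a <= n:
        b_squared = n - a * a
        b = int(b_squared ** 0.5)
        if b * b == b_squared:
            out.append((a, b))
        a += 1
    return out

for n in [5, 13, 17, 29, 7, 11, 19, 50]:
    print(n, "remainder", n % 4, two_square_decompositions(n))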
Prime gaps The study of prime numbers can be turned on its head by looking instead at
the gaps between them: that is, the differences between successive prime numbers.
So the first prime gap is that between 2 and 3, namely 1. Then between 3 and 5 we find
2, and so on. Can prime gaps be arbitrarily large? It turns out that they can; so, like the
primes, the set of prime gaps is infinite. But where very large gaps first appear, and how the possible size of a gap relates to the sizes of the primes on either side of it, remain topics of research. The twin primes and de Polignac’s conjectures are long-standing
open problems about prime gaps. Bertrand’s postulate implies that a prime number
can’t ever be followed by a gap bigger than it. Andrica’s conjecture, if proved, would
substantially strengthen this result.
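Prime gaps are easy to tabulate by computer. The following Python sketch (with an arbitrary ceiling of two million) prints each gap the first time a gap of record size appears:

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(2_000_000)
record = 0
for p, q in zip(primes, primes[1:]):
    gap = q - p
    if gap > record:                      # a gap larger than any seen before
        record = gap
        print(f"gap {gap:3d} first follows the prime {p}")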
Bertrand’s postulate How far apart can consecutive primes be? That was the question on Joseph Bertrand’s mind in 1845, when he postulated that, between any natural number n and its double (2n), you must always be able to find at least one prime number. Formally, for any number n (greater than 1), there is a prime number p, where n < p < 2n. Five years later, his postulate was proved by Pafnuty Chebyshev. Its subsequent reproof by Paul Erdős was immortalized in a verse by Nathan Fine: ‘Chebyshev said it, but I’ll say it again; there’s always a prime between n and 2n’.
Andrica’s conjecture Dorin Andrica formulated what he describes as ‘a very hard open problem in number theory’. It concerns the differences between the square roots of successive primes. His conjecture is that these are always less than 1. More formally, if p and q are consecutive primes (with p < q), then √q − √p < 1. If true, squaring this inequality translates into a rule about ordinary prime gaps: if p is a prime number, then the gap following p is less than 2√p + 1.
Legendre’s conjecture Adrien-Marie Legendre’s investigations of prime numbers led him to conjecture that between any two square numbers there will always be a prime. That is, for any natural number n, there is a prime number p where n² < p < (n + 1)². The format of the conjecture is similar to Bertrand’s postulate. But while that quickly succumbed to mathematical ingenuity, Legendre’s conjecture remains stubbornly resistant. The most promising progress on Legendre’s conjecture to date is Chen’s Theorem 3.
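Both conjectures are easily tested for small numbers. A rough Python check (bounds chosen arbitrarily; numerical evidence proves nothing, of course) finds that the largest difference of square roots of consecutive primes below a million is about 0.67, between 7 and 11, and that a prime does sit between n² and (n + 1)² for every n tried:

from math import isqrt, sqrt

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(1_100_000)
prime_set = set(primes)

# Andrica: the largest observed difference of square roots of consecutive primes.
print(max(sqrt(q) - sqrt(p) for p, q in zip(primes, primes[1:])))

# Legendre: a prime strictly between n^2 and (n + 1)^2, for each n tested.
for n in range(1, 1000):
    assert any(m in prime_set for m in range(n * n + 1, (n + 1) ** 2)), n
print("Legendre's conjecture holds for n up to 999")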
Twin prime conjecture Pairs of prime numbers which are 2 apart, such as 5 and 7, or 17 and 19, are known as twin primes. Although the infinity of the primes has been known since the time of Euclid, whether there are infinitely many pairs of twin primes is still an open question.
At time of writing, the largest known twin primes are the two numbers either side of 2,003,663,613 × 2^195,000, discovered in 2007 by Eric Vautier as part of PrimeGrid and the internet Twin Prime Search. The twin primes conjecture is generalized by de Polignac’s conjecture and the first Hardy–Littlewood conjecture. It is the first of many attempts to analyse the different constellations which can be found among the primes.
Chen’s theorem 1 Between 1966 and 1975, Chen Jingrun proved three theorems which marked major progress towards several outstanding open problems about prime numbers. At the centre of Chen’s work was the notion of a semiprime: a number which is the product of exactly two primes (e.g. 4 = 2 × 2 or 6 = 2 × 3). Chen’s first theorem was aimed in the direction of the Goldbach conjecture. It says that, after a certain threshold, every even number can be written either as the sum of two primes (as Goldbach would have it), or of a prime and a semiprime.
Chen’s theorem 2 The second theorem points towards the twin prime and de Polignac’s conjectures. It states that every even number appears infinitely often either as the difference of two prime numbers (which would follow from de Polignac’s conjecture), or of a prime and a semiprime. A consequence of Chen’s theorem 2 is that there are infinitely many prime numbers p where p + 2 is either prime (as the twin prime conjecture states) or semiprime.
Chen’s theorem 3 The third theorem is aimed at Legendre’s conjecture: it says that between any two square numbers there exists either a prime, or a semiprime.
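Chen’s first theorem is stated ‘after a certain threshold’, but the pattern already shows up for small numbers. The Python sketch below (bound arbitrary) checks that every even number from 6 to 10,000 is a prime plus either a prime or a semiprime:

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

limit = 10_000
primes = primes_up_to(limit)
prime_set = set(primes)

def is_semiprime(n):
    # True if n is the product of exactly two primes (possibly equal).
    for p in primes:
        if p * p > n:
            return False
        if n % p == 0:
            return (n // p) in prime_set
    return False

for n in range(6, limit + 1, 2):
    ok = any((n - p) in prime_set or is_semiprime(n - p) for p in primes if p < n)
    assert ok, n
print("checked every even number from 6 to", limit)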
Constellations of primes The twin primes and de Polignac’s conjectures both concern pairs of primes, a fixed distance apart. But what of longer patterns: triples or quadruples of primes, in a prescribed pattern? Care is needed: not all possible arrangements of primes are possible. For instance, why does the twin primes conjecture ask for pairs of prime numbers of the form n and n + 2, instead of n and n + 1? The answer is that one of n and n + 1 must always be divisible by 2, and so they cannot both be prime (the sole exception being the pair 2 and 3).
The same phenomenon occurs for longer patterns. It is no use searching for triples of primes of the form n, n + 2 and n + 4: one of these will always be divisible by 3, and so not prime (except for the triple 3, 5, 7). These forbidden patterns can be described neatly using modular arithmetic.
Hypothesis H Moving on from the twin prime conjecture (still unproved!) and the first Hardy–Littlewood conjecture, Andrzej Schinzel and Wacław Sierpiński suggested yet another step. The first Hardy–Littlewood conjecture is concerned only with constellations of primes obtained through addition, such as n, n + 2, n + 6. Each such pattern requires finding a suitable n, and then adding specified constants to it. This does not encompass the n² + 1 conjecture: there the condition on the prime also involves multiplication. To come up with one broad conjecture which included this too, Schinzel and Sierpiński needed the language of polynomials. The idea was that any suitable collection, such as n² + 1, 2n² + 1, and n³ + 3, should all give simultaneously prime outputs, for infinitely many values of n. Only irreducible polynomials need be considered: reducible polynomials, which can be split up (n², for example, is reducible to n × n), can never produce a prime result. Schinzel and Sierpiński also provided a detailed estimate of how often they expected their systems to be simultaneously prime.
Dirichlet’s theorem Undoubtedly, the primes form the most enigmatic of all sequences
of natural numbers: 2, 3, 5, 7, 11, 13,… More prosaic are arithmetic progressions: start
with a natural number (3 for example), and then repeatedly add on another number
(such as 2), to get a sequence: 3, 5, 7, 9, 11, 13,…. Understanding the relationship
between these simple sequences and the infinitely subtler sequence of primes is an
ongoing area of research.
A fundamental question to ask is: does the sequence above contain infinitely many
primes?
Not every arithmetic progression can contain primes: the sequence with initial term 4
and common difference 6 is 4, 10, 16, 22,… But this can never reach a prime number:
every number in the sequence is even, and so divisible by 2. To have a hope of finding
infinitely many primes, the initial term of the sequence and its common difference
should be coprime (that is, have highest common factor equal to 1). This sequence fails
because 4 and 6 have 2 as a common factor.
In 1837, Johann Dirichlet proved his most celebrated and technically impressive result:
he showed that this is indeed the only obstacle. Every arithmetic progression does
contain infinitely many primes, as long as its initial term and common difference are
coprime. That is, if a and b are coprime, then there are infinitely many prime numbers of the form a + bn. Dirichlet’s proof was the first to deploy L-functions, and as such is often considered as marking the birth of analytic number theory.
An old conjecture, whose roots are lost in mathematical folklore, but likely to date
back at least to 1770, says that there is an arithmetic progression of primes of any
length you care to name. At time of writing, the longest known arithmetic progression
of primes was found by Benoît Perichon, as part of PrimeGrid. It has length 26, and starts at 43,142,746,595,714,191, with common difference 5,283,234,035,979,900. Progress
towards the general conjecture was slow over the 20th century, but it was dramatically
proved in 2004, by Terence Tao and Ben Green, which contributed to Tao winning the
Fields medal in 2006. This theorem would be generalized by the first Hardy–
Littlewood conjecture and Hypothesis H.
Fermat primes Pierre Fermat was interested in primes of a particularly simple sort, those which are one more than a power of two: 2ⁿ + 1. For example, 3 = 2¹ + 1, and 17 = 2⁴ + 1. Not every number of this form is prime: 2³ + 1 = 9, for example; and nor does every prime fit this form (7 does not). Fermat noticed that in all cases where 2ⁿ + 1 is prime, n is itself a power of 2. He conjectured that every number of the form 2^(2^m) + 1 (these are now called the Fermat numbers) is prime.
These numbers grow large very quickly, so it was difficult to test this conjecture for many values. In 1953, however, J. L. Selfridge showed that 2^(2^16) + 1 is not prime, refuting Fermat’s conjecture.
In fact, most Fermat numbers are not prime. To date, only five Fermat primes are known: 3, 5, 17, 257 (= 2^(2^3) + 1), and 65,537 (= 2^(2^4) + 1). After that the sequence seems to stop yielding primes (2^(2^5) + 1 = 4,294,967,297 = 641 × 6,700,417, for example). Whether there are any more Fermat primes remains an open question. One consequence would be for the theory of constructible polygons.
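The first few Fermat numbers are small enough to test directly. A short Python sketch (simple trial division, nothing like the methods needed for larger Fermat numbers) confirms that F0 to F4 are prime while F5 is not:

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

for m in range(6):
    F = 2 ** (2 ** m) + 1
    print(f"F{m} = {F}: prime? {is_prime(F)}")

print(2 ** 32 + 1, "=", 641, "x", (2 ** 32 + 1) // 641)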
Mersenne primes Mersenne numbers are those which are 1 less than a power of 2: that is, numbers of the form 2ⁿ − 1. Not all such numbers are prime, for example 15 = 2⁴ − 1. Also, not all primes are Mersenne (5 and 11 are not, for instance). But those which are, such as 3 = 2² − 1 and 7 = 2³ − 1, comprise the important class of Mersenne primes, named after the French monk Marin Mersenne who began listing them in 1644.
The study of Mersenne primes is intimately connected to that of perfect numbers, since the even perfect numbers are all the numbers of the form M(M + 1)/2, where M is a Mersenne prime (for example 6 = 3 × 4/2 and 28 = 7 × 8/2).
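A perfect number is one equal to the sum of its proper divisors, and the correspondence is easy to check for the first few Mersenne primes. A small Python sketch (slow, divisor-by-divisor, purely for illustration):

def proper_divisor_sum(n):
    return sum(d for d in range(1, n) if n % d == 0)

for M in [3, 7, 31, 127]:                 # the first few Mersenne primes
    candidate = M * (M + 1) // 2          # 6, 28, 496, 8128
    print(candidate, proper_divisor_sum(candidate) == candidate)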
Large primes The infinity of the primes has been known since Euclid’s time, proving
beyond doubt that there is no biggest prime number. Nevertheless the quest to pin
down ever larger primes dates back for hundreds of years. With a handful of
exceptions, the largest known prime numbers have always been Mersenne primes. In
1588, Pietro Cataldi showed that the Mersenne number 2¹⁹ − 1 = 524,287 is prime, and this record stood for almost 200 years.
In the 20th century the search accelerated not only through improved techniques of
primality testing, such as the discovery of the Lucas–Lehmer test, but through sheer
computing power. Since 1951, the hunt for large primes has relied on ever faster
computers and, since 1996, it has been dominated by the Great Internet Mersenne
Prime Search (GIMPS), which exploits idle time on the computers of thousands of
volunteers worldwide. The largest known prime at the time of writing is a Mersenne prime discovered in August 2008 through GIMPS (by Smith, Woltman, Kurowski and colleagues): 2^43,112,609 − 1. When written out in full, this is 12,978,189 digits long.
Primality testing The simplest method to tell whether a number n is prime is just to try to divide it by every smaller number. (In fact it is enough just to try the prime numbers up to √n.) However, for really large numbers this is impractically slow: to test a number 100 digits long would take longer than the lifetime of the universe.
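In Python, the naive method looks something like this (fine for modest numbers, hopeless for 100-digit ones):

from math import isqrt

def is_prime(n):
    # Trial division: try every possible divisor up to the square root of n.
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print([n for n in range(2, 40) if is_prime(n)])    # 2, 3, 5, 7, 11, 13, ...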
The sieve of Eratosthenes is the oldest known method for generating lists of primes,
and remains in use today. Other primality tests, such as the Fermat primality test and the Miller–Rabin test, are probabilistic: no true primes fail them, but the numbers which pass are only ‘probably prime’. (These tests might more accurately be termed
compositeness tests.) All modern tests have to be grounded in modular arithmetic to
bring the numbers involved down to a manageable size. Today, the search for large
prime numbers hinges on the non-probabilistic Lucas–Lehmer test. In 2002, the AKS
primality test demonstrated for the first time that testing the primality of general
numbers can be done comparatively efficiently.
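The sieve itself is only a few lines long. A minimal Python version (unoptimized, for illustration):

def sieve_of_eratosthenes(n):
    # Cross out the multiples of each prime in turn; whatever survives is prime.
    candidate = [True] * (n + 1)
    candidate[0] = candidate[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if candidate[p]:
            for multiple in range(p * p, n + 1, p):
                candidate[multiple] = False
    return [i for i, is_p in enumerate(candidate) if is_p]

print(sieve_of_eratosthenes(100))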
Fermat’s primality test The Chinese hypothesis was an incorrect method for identifying prime numbers. It said that the number q is prime if and only if q divides 2^q − 2. For small numbers this works: 4 does not divide 2⁴ − 2 (= 14), 5 divides 2⁵ − 2 (= 30), and this continues to work for some time. The first counterexample is the non-prime 341 = 11 × 31, which does divide 2³⁴¹ − 2.
Despite being false, this conjecture contains the seeds of a primality test based on Fermat’s little theorem. A consequence of that theorem is that if q is a possible prime, and we find some number n where n^q − n is not divisible by q, then q cannot be prime after all. The Chinese hypothesis corresponds to the case n = 2: so it provides a necessary condition for primality, but not a sufficient one.
Fermat’s primality test tries a putative prime q against base numbers n. If the conclusion of the theorem fails (so q does not divide n^q − n), it follows that q is not a true prime after all.
However, this test is not perfect, as there are pseudoprimes which can pass it for
particular bases: 341 is a 2-pseudoprime. The worst cases are non-primes q which can
pass this test for every base n which is coprime to q. These are called Carmichael
numbers, after their discoverer in 1910, Robert Carmichael. The first is 561. In 1994,
Alford, Granville and Pomerance proved that there are infinitely many Carmichael
numbers.
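Python’s built-in modular exponentiation makes the test, and its failures, easy to see. In the sketch below, 341 fools the test for base 2 but not for base 3, while the Carmichael number 561 fools it for every base coprime to it:

from math import gcd

def fermat_test(q, base):
    # q 'passes' for this base if q divides base**q - base.
    return pow(base, q, q) == base % q

print(fermat_test(341, 2), fermat_test(341, 3))     # True False
print(all(fermat_test(561, b) for b in range(2, 561) if gcd(b, 561) == 1))   # True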
Lucas–Lehmer test The Lucas–Lehmer test dates from 1930 and is used today by the Great Internet Mersenne Prime Search to test for Mersenne primes. If p is a large prime number, the question to answer is whether the enormous number 2^p − 1 is also prime. Call this number k. The basis of the test is the Lucas–Lehmer sequence, which is defined recursively: S₀ = 4 and Sₙ₊₁ = Sₙ² − 2. Lehmer’s theorem says that k is prime if and only if it divides Sₚ₋₂.
The difficulty is that the Lucas–Lehmer sequence grows very fast indeed: the first few terms are S₀ = 4, S₁ = 14, S₂ = 194, S₃ = 37,634 and S₄ = 1,416,317,954. By S₈ the sequence has grown bigger than the number of atoms in the universe, making any further direct calculations impossible.
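The remedy is modular arithmetic: only the remainder of each term on division by k matters, so every number in the calculation can be kept below k. A bare-bones Python version follows (real searches such as GIMPS use heavily optimized big-number arithmetic, but the principle of squaring and reducing modulo k at each step is the same):

def lucas_lehmer(p):
    # Test whether 2**p - 1 is prime, for an odd prime p.
    k = 2 ** p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % k        # reduce modulo k at every step to keep s small
    return s == 0

for p in [3, 5, 7, 11, 13, 17, 19, 23, 31]:
    print(p, 2 ** p - 1, lucas_lehmer(p))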
The prime counting function There are of course infinitely many prime numbers. So what can it mean to ‘count’ them? The answer is given by the prime counting function π(n) (not to be confused with the number π). For a natural number n, π(n) is defined to be the number of primes up to and including n. So π(7) = 4, because there are exactly four prime numbers up to 7 (namely 2, 3, 5 and 7). Similarly π(8) = 4, π(11) = 5, π(100) = 25, and π(1000) = 168.
The prime number theorem 1 Carl Friedrich Gauss made a remarkable observation about the prime counting function. He noticed that the ratio n : π(n) was roughly the natural logarithm of n, that is, ln n. This means that if you pick a number between 1 and n at random, the probability of getting a prime is approximately 1/ln n. So the number of primes between 1 and n is approximately n/ln n. The young Gauss conjectured that, even for big values of n, this would continue to be true: that is, that the two functions π(n) and n/ln n are asymptotically equal, written π(n) ~ n/ln n. More precisely, this means that one divided by the other will get closer and closer to 1, as n gets larger.
The prime number theorem 2 Later, Gauss refined his estimate for the number of primes from n/ln n to the logarithmic integral Li(n). Again, Gauss believed, but did not prove, that π(n) ~ Li(n). This assertion is the prime number theorem, and it also implies the weaker result that π(n) ~ n/ln n. Technically, Li(n) is the integral ∫₂ⁿ dx/ln x.
[Graph: π(n) compared with n/ln n, for n up to 100,000.]
In 1896, by studying the Riemann zeta function, Jacques Hadamard and Charles de la Vallée Poussin each independently proved the prime number theorem. This theorem describes the general pattern, but not the exact values, of π(n). It predicts that the number of primes up to 10,000,000,000 should be approximately 455,055,614. In fact there are 455,052,511, an error of around 0.0007%. The Riemann hypothesis, if proved, would fill in the details of these errors, and so turn the prime number theorem into a method for describing π(n) at the best possible level of accuracy.
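The agreement is striking even for modest n. The Python sketch below counts primes directly and compares the count with n/ln n and with a crude numerical estimate of Li(n) (a simple midpoint-rule integral, accurate enough for illustration):

from math import log

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(1_000_000)

def li(n, steps=100_000):
    # Rough numerical value of the integral of dx/ln x from 2 to n.
    h = (n - 2) / steps
    return sum(h / log(2 + (i + 0.5) * h) for i in range(steps))

for n in [1_000, 10_000, 100_000, 1_000_000]:
    pi_n = sum(1 for p in primes if p <= n)
    print(n, pi_n, round(n / log(n)), round(li(n)))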
The Riemann zeta function Bernhard Riemann only published one paper on number theory: ‘On the Number of Prime Numbers less than a Given Quantity’ of 1859. But it has a strong claim to be the most influential ever written. The paper concerned a certain function defined by a power series:
ζ(s) = Σ 1/n^s = 1 + 1/2^s + 1/3^s + 1/4^s + 1/5^s + …
The function was not new: Leonhard Euler had contemplated it in 1737, and had realized its connection with the prime numbers by spotting a different way to write it, now known as Euler’s product formula:
ζ(s) = Π p^s/(p^s − 1) = 2^s/(2^s − 1) × 3^s/(3^s − 1) × 5^s/(5^s − 1) × 7^s/(7^s − 1) × …
Here, the product is over the prime numbers p, where the sum above was over all
natural numbers n.
Riemann’s first insight was that the zeta function made sense as a complex function: he could feed in complex values of s and get out complex values of ζ(s). In fact the formulas above are only valid in half the complex plane. (They only converge when the real part of s is greater than 1.)
His second stroke of genius was to find a third formula for ζ, valid everywhere (except at the single point s = 1), which tallies exactly with Euler’s versions.
Extracting explicit values from the zeta function is extremely difficult: studying Riemann’s new presentation shows that ζ(0) = −1/2, and that ζ(2n) is transcendental for every positive integer n. But it was only in 1978 that Roger Apéry was able to show that ζ(3) is irrational. The nature of ζ(n) for other odd values of n remains mysterious. The further behaviour of this function is the focus of Riemann’s famous hypothesis.
Euler’s product formula Applying the generalized binomial theorem to (1 − 1/2)^(−1) gives:
(1 − 1/2)^(−1) = 1 + 1/2 + 1/2² + 1/2³ + …
Similarly:
(1 − 1/3)^(−1) = 1 + 1/3 + 1/3² + 1/3³ + …
Multiplying these two series together gives:
(1 − 1/2)^(−1) × (1 − 1/3)^(−1) = 1 × (1 + 1/3 + 1/9 + …) + 1/2 × (1 + 1/3 + 1/9 + …) + 1/4 × (1 + 1/3 + 1/9 + …) + …
= 1 + 1/2 + 1/3 + 1/4 + 1/6 + 1/8 + 1/9 + 1/12 + …
Every number which has only 2 and 3 as its prime factors will appear in this expression exactly once. Similarly:
(1 − 1/2)^(−1) × (1 − 1/3)^(−1) × (1 − 1/5)^(−1) = 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/8 + 1/9 + 1/10 + 1/12 + …
In order for every natural number to appear on the right-hand side, every prime number would have to appear on the left. This suggests taking the infinite product over all prime numbers p:
Π (1 − 1/p)^(−1)
Expanding this should produce the sum of the reciprocals of every natural number, 1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 + … Unfortunately, this series does not converge: it just gets bigger and bigger. But if s > 1, then the series given by the Riemann zeta function
ζ(s) = 1 + 1/2^s + 1/3^s + 1/4^s + 1/5^s + 1/6^s + 1/7^s + …
does converge, and the same argument suggests that this should equal:
Π (1 − 1/p^s)^(−1)
This is the core idea behind Euler’s product formula, the first hint that the Riemann zeta function is intimately linked to the prime numbers:
ζ(s) = Π (1 − 1/p^s)^(−1), or equivalently ζ(s) = Π p^s/(p^s − 1),
with each product taken over all prime numbers p.
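The formula can also be seen numerically. Taking s = 2 (for which the exact value is known to be π²/6), a truncated sum and a truncated product already agree to three or four decimal places; the bounds in the Python sketch below are arbitrary:

from math import pi

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, is_p in enumerate(sieve) if is_p]

s = 2
partial_sum = sum(1 / n ** s for n in range(1, 200_001))

partial_product = 1.0
for p in primes_up_to(1000):
    partial_product *= 1 / (1 - 1 / p ** s)

print(partial_sum, partial_product, pi ** 2 / 6)    # all close to 1.6449...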
The Riemann hypothesis ‘If I were to awaken after having slept for a thousand years,
my first question would be: Has the Riemann hypothesis been proven?’, said the great
mathematician David Hilbert. Using the zeta function and some spectacular insights
into complex analysis, Riemann was able to concoct an exact formula for
the prime counting function: arguably the holy grail of number theory. However, his
formula depended on knowing where the zeta function disappeared: the values of s for which ζ(s) = 0. It is comparatively easy to check that ζ(−2), ζ(−4), ζ(−6), … are all zero: these are called the trivial zeroes. But there are infinitely many non-trivial zeroes too. Riemann showed that these all lie in a critical strip between Re(s) = 0 and Re(s) = 1 (see diagram), and that they must lie symmetrically across the critical line Re(s) = 1/2.
[Diagram: the critical strip 0 ≤ Re(s) ≤ 1, the critical line Re(s) = 1/2, and the trivial zeroes at −2, −4, −6, ….]
The Riemann hypothesis is the claim that all the non-trivial zeroes of the zeta function
lie on the critical line. Riemann wrote that he thought this ‘very probable’ but had
‘temporarily put aside the search for this after some fleeting futile attempts’; 150 years
later, after the concentrated efforts of hundreds of our greatest minds, a proof remains
elusive.
Of course, the Riemann hypothesis implies that the true proportion of zeroes lying on the critical line is 100%, and so far the experimental evidence backs this up: in 2004, Xavier Gourdon and Patrick Demichel harnessed the power of distributed computing to verify that the first 10 trillion non-trivial zeroes do indeed lie on the critical line.
L-functions The Riemann zeta function is given by the series
ζ(s) = 1 + 1/2^s + 1/3^s + 1/4^s + 1/5^s + …
Notice that all the fractions have 1 as their numerator. Other functions can be formed
by replacing these 1s with other (carefully chosen) complex numbers. This is the
profoundly important family of L-functions. These behave somewhat similarly to the
zeta function (including in their reluctance to give up their secrets). Each comes with
its own version of Euler’s product formula, its own extension to the complex numbers,
and its own variant of the Riemann hypothesis, which are collected together in the
generalized Riemann hypothesis.
L-functions first arose in the original proof of Dirichlet’s theorem. Since then, they
have become central to the subject of analytic number theory, despite the formidable
technical difficulties they pose. An analogue of the Riemann hypothesis for closely
related functions was the centrepiece of the Weil conjectures, proved in 1980 by Pierre
Deligne. Then in 1994 L-functions were exploited by Andrew Wiles to prove Fermat’s
last theorem. Other families of L-functions hold the keys to many further treasures of
the mathematical world, including the Birch and Swinnerton-Dyer conjecture and
Langlands’ program.
A proof of the Riemann hypothesis would be momentous for the light it shed on the
prime numbers. In particular, if it is true, the behaviour of the prime counting function
can be captured far more precisely than the prime number theorem so far allows. But
the implications go even wider: the zeta function is just the first in the infinite family
of L-functions which pervade modern number theory. Techniques which allowed it to
be understood would have deep implications for the rest.
The generalized Riemann hypothesis says that the non-trivial zeroes of every L-function lie on the critical line Re(s) = 1/2.
GEOMETRY
MANY OF THE THEOREMS OF GEOMETRY that are taught in school today date
back to ancient Greece. In fact most of them are found in one book: Euclid’s Elements.
In it, Euclid combined his own research with the amassed knowledge of the ancient
world to study the interactions of points, straight lines and circles. The Elements was
as significant for its axiomatic approach as it was for the theorems it contained.
However, in the 19th century new forms of non-Euclidean geometry were discovered,
in which Euclid’s axioms were not valid.
From here, geometry diverged into several strands, which can be illustrated by one of
the simplest mathematical figures: the circle. In differential geometry, a circle is the
result
Euclid did not just include his own work, but collected that of his contemporaries and
forebears into an impressive body of knowledge. For around 2000 years, it was held in
such high regard that it remained a standard text around the world. The first printed
edition was produced in 1482. Over one thousand subsequent editions have been
published, a number exceeded only by the Bible.
The proof of the infinity of the primes is a highlight of the section on number theory,
and Euclid also developed the theory of ruler and compass constructions. But it is for
Euclid’s geometrical work that the Elements is chiefly famous, celebrated as much for
his modern approach as for the facts proved. For the first time, a mathematician wrote
down in detail his starting assumptions (or postulates or axioms), and deduced all his
results with precision, directly from these axioms.
Euclid’s postulates People had been studying points and lines for thousands of years
before Euclid. However, in his hands this subject (plane geometry) became the first
area of mathematics to be axiomatized. Euclid showed that his work rested on five
fundamental assumptions:
1 Between any two points you can draw a straight line segment.
2 Any straight line segment can be extended indefinitely in both directions.
3 A circle can be drawn with radius of any length, centred at any point.
4 Any two right angles are equal.
5 If two straight lines are crossed by a third, so that the interior angles on one side are less than two right angles (x + y < 180°), then the two lines, when extended indefinitely, will eventually cross.
It is likely that Euclid was not satisfied by the long-winded fifth axiom, known as the
parallel postulate. The first 28 propositions of the Elements use only postulates 1–4.
But in the 29th, on parallel lines, he had no choice but to rely on postulate 5.
For centuries after Euclid’s Elements, the term ‘Euclidean’ geometry was redundant; there was no other sort. Only geometrical systems obeying Euclid’s five postulates were known. Nevertheless, most geometers
shared Euclid’s view that the fifth postulate was somewhat troublesome, being less
obviously true than the others. Easier alternative formulations were found, including
one known as Playfair’s axiom (although John Playfair was not the first to discover it):
Given a straight line, and a point not on that line, there is at most one straight line
passing through the point, parallel to the given line.
‘Parallel’ means that the lines, extended indefinitely, will never cross. Euclid had
already shown (using just the first four postulates) that there is at least one such line,
which explains the strange expression ‘at most one’.
The independence of the parallel postulate For centuries, European and Islamic
mathematicians debated this axiom. Could it be derived from the first four of Euclid’s
postulates? A huge number of fallacious proofs were written and debunked, and
several logically equivalent conditions were discovered (including the assertion that
the angles in a triangle always sum to 180°). Recently, Scott Brodie observed that
Pythagoras’ theorem is logically equivalent to the parallel postulate.
It was not until the 19th century, when Carl Friedrich Gauss, Nikolai Lobachevsky and János Bolyai conceived of a system of hyperbolic geometry which obeyed postulates 1–4 but not the parallel postulate, that the parallel postulate was finally shown to be independent of the first four postulates (see independence results).
Angles An angle is an amount of turn. But how can we quantify this? Perhaps the easiest way is in terms of revolutions. So one complete rotation counts as 1, and a right angle counts as 1/4. This is convenient for measuring large amounts of turn, which is why rotational
speed is usually measured in revolutions per minute. For smaller amounts of turn, a
finer scale is helpful.
The traditional measure of angle is the degree (°), where 360° constitutes a complete
rotation. This has its roots in the ancient Babylonian year, which had 360 days.
Traditionally a degree is divided into 60 arc-minutes, and each minute into 60 arc-
seconds. Another system, no longer much used, is the grade (or gradian), invented in
France in an attempt to incorporate angles into the metric system. 100 grades constitute
a right angle, making a complete revolution 400 gradians. Though historically
interesting, these systems are mathematically arbitrary. Scientists generally prefer to use radians, a system based on the circle, where one full revolution equals 2π radians.
Parallel lines Two straight lines are parallel if they lie in the same plane and never
meet, even when extended indefinitely in either direction, like an infinitely long set of
rail-tracks. (The condition that the lines lie in a plane is necessary: in three dimensions
two
lines can be skew, neither parallel nor crossing each other.) If we assume the parallel
postulate, an equivalent condition is that the perpendicular distance between the two
lines is always the same, wherever it is measured.
Parallel lines are usually represented with matching arrows on the lines, while line
segments of equal length are drawn with matching dashes across them.
Two important facts concern what happens when a pair of parallel lines are crossed by
a third line, and were proved by Euclid as Proposition 1.29 of Elements, his first result
depending on the parallel postulate:
1 Corresponding angles are equal: in the illustration A and B are corresponding angles (sometimes called ‘F angles’) because they occupy corresponding positions on different parallel lines. Such angles are always equal.
2 Alternate angles are equal: C and D are alternate angles (or ‘Z angles’) and again are always equal.
The opposite notion to parallel: two lines are perpendicular if they are at right angles to
each other. (In some settings, such as vector geometry, the word orthogonal is used.)
Right angles also provide a notion of measurement. It is obvious how to measure the
distance between two points on a plane. But between two lines (or a point and a line, or
a line and a plane) we need the perpendicular distance: the length of a new line
segment which crosses the original ones at right angles.
Cartesian coordinates In Cartesian geometry, every point on the plane can be identified
by saying which quadrant it is in, and how far it is from each axis. This is much like
reading a map; indeed the two numbers which capture this information are called the
point’s coordinates.
The point (4, 3) is in the top right quadrant (because both numbers are positive), four units to the right (along the x-axis), and three units up. (Some people use a mnemonic such as ‘Along the corridor and up the stairs’ to remember the order of the x and y coordinates.) Similarly, (−3, −2) is in the bottom left quadrant, three units to the left and two units down. The origin has coordinates (0, 0).
[Diagram: the points (4, 3), (−1, 3), (−3, −2) and (2, −1) plotted on Cartesian axes, with the origin at (0, 0).]
Plotting graphs As far back as the ancient Babylonians, algebraic methods have been
used in geometry. But Cartesian coordinates opened up geometry to more sophisticated
techniques. With Cartesian coordinates, points are precisely defined by numbers. The
relationships between those numbers can be used to describe geometric figures.
For instance we could look at all the points whose two coordinates are the same: (0, 0), (1, 1), (10, 10), etc. Plotting these on the plane, we find that they all lie on a straight line. Looking at the numbers, it is obvious that a point (x, y) lies on this line precisely if y = x. So we call y = x the equation of the line.
Another example is those points where the second coordinate is double the first: (1, 2), (6, 12), (0, 0), etc. Again they lie on a straight line, and in this case its equation is y = 2x. Similarly, starting with the equation y = x + 3, substituting values for x (say 0, −2 and 8) gives points (0, 3), (−2, 1) and (8, 11) on the line. All of these points have y-coordinate 3 bigger than their x-coordinate.
[Graphs of the lines y = x, y = 2x and y = x + 3.]
1/4 of a metre. Another way to say this is that, over any range, the increase in height divided by the horizontal distance covered is always 1/4.
Of course, physical hills have beginnings and ends, and the steepness in between is irregular; this is not so with straight lines in the plane. To calculate the gradient of a straight line, pick any two points on the line, measure the height gained vertically and divide this by the distance covered horizontally. (If left to right is a downwards slope, then the gradient is negative.)
For a curve, the gradient varies from point to point, but it still makes sense to pick a point on the curve and ask about the gradient of the tangent to the curve there. This can be calculated by differentiation.
The equation of a straight line A straight line in the plane can be identified by two
pieces of information: firstly, its gradient, and secondly the coordinates of any point
the line passes through. A convenient point to pick is the y- intercept: the place where
the line crosses the vertical axis (or y-axis). Since it lies on the axis, the first coordinate
of this point will be 0. Call it (0, c).
If the gradient of the line is m, then the equation of the line is y = mx + c. So the line y = 4x + 2 crosses the y-axis at (0, 2) and has gradient 4.
[Graph of the line y = 4x + 2, crossing the y-axis at (0, 2).]
An exceptional class is that of the vertical lines. Since they are parallel to the y-axis,
they do not cross it anywhere, and there is no meaningful value of c to calculate.
Similarly the gradient of these lines is not defined, since they do not cover any
horizontal distances. (Some would say that they have ‘infinite gradient’.) Despite this,
the equations of these lines are straightforward, since they are defined by the first coordinate staying fixed: x = 3, for example.
Horizontal lines have a gradient of 0 and are defined by their fixed second coordinate: y = 4, for example.
The real line Euclid’s Elements set up geometry for the millennia to come. His
fundamentals – parallel lines, right-angled triangles, circles – are as important now as
ever. Nevertheless, by the 19th century, mathematicians had become concerned not to
take too much for granted. The Elements opens with some definitions: ‘1. A point is
that which has no part.
2. A line is a breadthless length … 4. A straight line is a line which lies evenly with the
points on itself. 5. A surface is that which has length and breadth only’. Any modern
mathematician will have no trouble grasping Euclid’s meaning. At the same time, what
really is ‘that which has no part’? It was necessary to translate Euclid into the precise
terminology of modern mathematics.
These ideas can be formalized through the system of real numbers (R). This comes
ready wrapped as a geometric object, the real line. In modern terms this is the first
Euclidean space.
Euclidean plane The set R of real numbers is also known as the real line: it provides the perfect model of a 1-dimensional line. Points on this line are simply numbers, and the distance between two points is the larger number minus the smaller. So two points are either a positive distance apart (with infinitely many points in between), or are actually the same point; no two points are true neighbours. Also, points come in a natural order, given by their size.
A plane can be modelled by replacing the individual numbers with pairs of real numbers (a, b). This is the system of Cartesian coordinates. Formally, these pairs of numbers are not merely directions to a point; they are the points. The distance between a point (a, b) and the origin (0, 0) is given by Pythagoras’ theorem: √(a² + b²). (This is easily extended to give the distance between any two points.)
This collection of all pairs of real numbers is denoted R² and called Euclidean 2-space, or simply the plane. Repeating this process gives higher-dimensional spaces.
In Euclidean n-space, a point is a string of n real numbers (a, b, …, z), and its distance from the origin is √(a² + b² + … + z²).
The unit circle consists of the points (x, y) in the plane whose distance from the origin is 1: √(x² + y²) = 1, that is, x² + y² = 1. The same idea in three dimensions produces a sphere: the set of points (x, y, z) which are distance 1 away from (0, 0, 0). So, x² + y² + z² = 1. It is obvious how to extend this to higher dimensions: in Euclidean n-space, the n-sphere is the set of points (x, y, …, z) a distance of 1 unit away from (0, 0, …, 0), that is, the points for which x² + y² + … + z² = 1.
Despite the human impediments to visualizing shapes in higher dimensions, they can
often be accessed fairly painlessly, by generalizing from spaces which are more
familiar.
[Diagrams: the distance from the origin to (a, b) is √(a² + b²), and to (a, b, c) it is √(a² + b² + c²).]
TRIANGLES
Triangles The world of human affairs is best described by the system of Euclidean
geometry. Here, there are no biangles, shapes built from just two straight lines. So the
humble triangle is among the most elementary figures, and therefore one of the most
significant.
Triangles come in many forms: the first regular polygon, an equilateral triangle, is one
whose sides are all the same length. An isosceles triangle is one with two sides of the
same length. Triangles whose sides are all different lengths are known as scalene.
Angles in a triangle The angles in any triangle add up to 180°. Proved by Euclid in Proposition 1.32 of Elements, this is the first important result of triangular geometry. The theorem follows from the alternate angles theorem on parallel lines. Starting with a triangle ABC, draw a new line through C parallel to AB. Then the three angles at C lie on a straight line, and so sum to 180°. But the two new angles at C equal the angles at A and B, by the alternate angles theorem. Among many consequences, this fact implies that the angles in an equilateral triangle are each 60°.
Pythagoras’ theorem Perhaps the most famous theorem of all, Pythagoras’ theorem is also one of the oldest. Though attributed to the Greek mathematician and mystic Pythagoras (circa 569–475 bc), there is strong evidence that the ancient Babylonians knew this result, over a thousand years earlier. Euclid included it as Proposition 1.47 of his Elements.
The theorem concerns a right-angled triangle. It says: the square of the hypotenuse (c) is equal to the sum of the squares of the other two sides (a and b). This can be thought of geometrically, as relating the areas of squares built on the sides of the triangle, or purely algebraically: a² + b² = c².
Pythagoras’ theorem is perhaps the most proved of all mathematical theorems. In the 1907 book The Pythagorean Proposition, Elisha Loomis collected together 367 different proofs.
Pythagorean triples The easiest lengths to work with (especially in the days before pocket calculators) are those with whole-number values. Once right-angled triangles had assumed their rightful place as the gatekeepers to geometry, there was a need to find some which had integer-valued lengths. Unfortunately, most do not. For instance, if you make the two shorter sides each 1 unit long, then Pythagoras’ theorem shows that the length of the hypotenuse (c) must satisfy c² = 1² + 1² = 1 + 1 = 2. So c = √2, which is not only not a whole number, but worse, it is an irrational number. Inconveniently, this is what usually happens.
The first right-angled triangle you can build from whole numbers has sides of length 3, 4 and 5. This satisfies Pythagoras’ theorem: 3² + 4² = 9 + 16 = 25 = 5². So (3, 4, 5) is called a Pythagorean triple. Multiples of this, such as (6, 8, 10) and (9, 12, 15), are also Pythagorean triples. Primitive Pythagorean triples are those which are not multiples of a smaller one. The next few are (5, 12, 13), (7, 24, 25), (8, 15, 17), and (9, 40, 41).
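Such triples can be hunted by brute force: check each pair of shorter sides for a whole-number hypotenuse, and keep only the pairs with no common factor. A short Python sketch (the limit of 45 is arbitrary):

from math import gcd, isqrt

def primitive_triples(limit):
    out = []
    for a in range(3, limit):
        for b in range(a + 1, limit):
            c_squared = a * a + b * b
            c = isqrt(c_squared)
            if c <= limit and c * c == c_squared and gcd(a, b) == 1:
                out.append((a, b, c))
    return out

print(primitive_triples(45))
# [(3, 4, 5), (5, 12, 13), (7, 24, 25), (8, 15, 17), (9, 40, 41), (12, 35, 37), (20, 21, 29)]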
[Diagram: a square of side a + b dissected in two ways, showing that a² + b² = c².]
The problem of finding Pythagorean triples was an early topic in the study of Diophantine equations. Fermat’s last theorem states that if we replace the squares with a higher power (n ≥ 3), then we will never find any integer solutions to aⁿ + bⁿ = cⁿ.
The area of a triangle Over the years, geometers have discovered a phenomenal number of formulas for calculating the area of a triangle. The commonest is ‘half the base times the height’: ½ × b × h, where b is the length of one of the edges (the ‘base’) and h is the perpendicular distance from the base to the third point of the triangle. (This is really three formulas in one, depending on which edge you choose as the ‘base’.)
A more sophisticated formula is r × s, where r is the radius of the triangle’s incircle (see centres of triangles) and s is its semiperimeter, s = (a + b + c)/2, a, b and c being the lengths of the three sides. Another formula, Heron’s, uses only the side lengths: the area is √(s(s − a)(s − b)(s − c)).
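For the (3, 4, 5) right-angled triangle, these formulas can be checked against one another in a few lines of Python (the incircle radius of a right-angled triangle with legs a, b and hypotenuse c is (a + b − c)/2, a standard fact not proved here):

from math import sqrt

a, b, c = 3.0, 4.0, 5.0          # the legs meet at a right angle, so base 3, height 4
s = (a + b + c) / 2              # semiperimeter

half_base_height = 0.5 * a * b
heron = sqrt(s * (s - a) * (s - b) * (s - c))
r = (a + b - c) / 2              # incircle radius of a right-angled triangle
print(half_base_height, heron, r * s)    # 6.0 6.0 6.0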
Suppose we know that one angle of our right-angled triangle is 60°. What are the possible lengths of the sides? Focusing on the two edges on either side of the 60° angle, the adjacent side (A) and the hypotenuse (H), possible lengths are A = 1 cm, H = 2 cm; A = 5 km, H = 10 km; or A = 8 µm, H = 16 µm. These solutions are different, but they have something in common. In each case the hypotenuse is twice the length of the adjacent side.
Knowing that one angle is 60° is not enough to determine the lengths of the triangle’s sides, but it is enough to tell us the value of A/H, in this case 1/2. If we had started with a 45° angle we would have found a different ratio, namely A/H = 1/√2.
Sine, cosine and tangent If x is an angle in a right-angled triangle, we can call the three
lengths H (hypotenuse), O (opposite the angle) and A (adjacent to the angle). Although
H, A and O can change without altering x (by enlarging or shrinking the
triangle), these lengths always remain in the same proportions to each other. So the values of O/H, A/H and O/A are fixed, and totally determined by the angle x. These ratios are given by the sine, cosine and tangent functions respectively (or sin, cos and tan for short): sin x = O/H, cos x = A/H and tan x = O/A. For example, in a (3, 4, 5)-triangle (see Pythagorean triples), if x is opposite the side of length 3, then sin x = 3/5, cos x = 4/5 and tan x = 3/4.
Evaluating these functions for a particular angle, say 34.2°, is decidedly tricky; happily most pocket calculators have buttons dedicated to the task. In the past, people had to wade through trigonometric tables or construct accurate scale drawings of triangles.
These important functions have long since outgrown their humble geometric origins. In new guises as power series, they play pivotal roles in complex analysis and Fourier analysis (the study of waveforms), among other areas.
In the definitions of sine, cosine and tangent, you might wonder why O/H, A/H and O/A were chosen instead of H/O, H/A and A/O. After all, these are also fixed for any right-angled triangle containing the given angle x. These are the definitions of the three less well-known trigonometric functions: cosecant, secant and cotangent, respectively (or cosec, sec and cot for short).
It immediately follows from the definitions that cosec x = 1/sin x, sec x = 1/cos x and cot x = 1/tan x. So translating information about cosec, sec and cot into terms of sin, cos and tan (or vice versa) is never difficult.
Trigonometric identities
1 The three main trigonometric functions are connected by the formula tan x = sin x / cos x. This is because O/A is equal to O/H divided by A/H.
2 Pythagoras’ theorem says that O² + A² = H²; dividing through by H² gives (sin x)² + (cos x)² = 1. Usually rewritten as sin²x + cos²x = 1, this formula often makes an appearance when sin and cos are in use.
3 If we know the value of sin x and sin y, what can we say about sin(x + y)? An elementary mistake is to believe that sin(x + y) = sin x + sin y. The situation is a little more complicated, but there are still manageable formulas:
sin(x + y) = sin x cos y + cos x sin y
Similarly:
cos(x + y) = cos x cos y − sin x sin y
4 Applying the formulas above to the case where x = y gives the so-called double-angle formulas for sin 2x and cos 2x:
sin 2x = 2 sin x cos x
and
cos 2x = cos²x − sin²x
Trigonometric values
1 In most cases, the values of sin x, cos x and tan x are best left to an electronic calculator. But some values are suitable for human consumption, notably 0°, 90° and 180°. Translating these into radians: sin 0 = 0, sin π/2 = 1, sin π = 0, cos π/2 = 0, cos 0 = 1, and tan 0 = tan π = 0.
Other important values are for 30°, 60° and 45°. In radians, they are:
sin π/6 = 1/2, cos π/6 = √3/2, tan π/6 = 1/√3
sin π/3 = √3/2, cos π/3 = 1/2, tan π/3 = √3
sin π/4 = 1/√2, cos π/4 = 1/√2, tan π/4 = 1
2 Since sin x = O/H and cos x = A/H, it must be that sin x ≤ 1 and cos x ≤ 1. In a slightly extended form, sine and cosine can assume negative values too, but only ever between −1 and 1 (until complex values of x are permitted).
3 The tangent function, meanwhile, can give any value as output. But it cannot accept any value for x as input: no triangle can have two 90° angles in it (no Euclidean triangle at any rate, but see elliptic geometry). So it is not clear what tan π/2 might mean. Worse, as x gets closer and closer to 90°, the ratio of the side O to A gets ever larger: tan 89° ≈ 57, tan 89.9° ≈ 573, and tan 89.99999° is almost 6 million. So there is no sensible value which can be assigned as tan 90° or tan π/2.
The law of sines The definitions of sin, cos and tan all involve working in a right-
angled triangle, as does Pythagoras’ theorem. What can be said about non-right-angled
triangles? The sine rule, or law of sines, says that if you take one side of a triangle, and
90
GEOMETRY
TRIANGLES
divide its length by the sine of the opposite angle, then you will always get the same
answer (d), irrespective of which side you picked. Suppose a triangle has sides of
length a, b and c, with opposite angles A, B and C respectively. The sine rule is the
statement:
a/sin A = b/sin B = c/sin C = d
The number d has a nice geometric interpretation: it is the diameter of the triangle’s
circumscribing circle (the unique circle which passes through the three corners of the
triangle; see also centres of triangles).
The sine rule has its roots in results 1.18 and 1.19 of Euclid’s Elements, but was first
written down explicitly by the 13th-century Persian mathematician and astronomer
Nasīr al-Dīn al-Tūsī.
The law of cosines The law of cosines is an extension of Pythagoras’ theorem to non-right-angled triangles. Although trigonometric functions such as cosine were not developed until later, a geometric version of this result was proved in Propositions 2.12 and 2.13 of Euclid’s Elements. Also known as the cosine rule, it says that if a triangle has sides of length a, b and c, with opposite angles A, B and C respectively, then:
a² = b² + c² − 2bc cos A
If the triangle happens to be right-angled, this collapses back down to the ordinary statement of Pythagoras’ theorem, since cos 90° = 0.
Any triangle can be divided into two right-angled triangles: the cosine rule comes from piecing together the ordinary trigonometry of these two.
Centres of triangles The centre of a circle or rectangle has a clear and unambiguous
meaning. But where is the centre of a triangle? According to the Encyclopedia of
Triangle Centers, maintained by the mathematician Clark Kimberling at the University
of Evansville, there are 3587 different answers to this question (at time of writing)!
Worse, the potential number of triangle centre functions is infinite.
1 Inside any triangle you can draw a unique inscribed circle, which has all three sides
of the triangle as tangents. The incentre is this circle’s centre. So this point is the same
perpendicular distance from each side. The incentre is also where the bisectors of the
three angles meet.
2 If you join each corner of the triangle to the midpoint of the opposite side, these three
lines meet at the triangle’s centroid. If you cut the triangle out of sheet metal, this
would be its centre of gravity.
3 Every triangle also has a unique circumscribing circle, which passes through all three
of the triangle’s corners. The circumcentre is the centre of this circle. It is also the
point where the perpendicular bisectors of the three sides meet. (It’s not much of a
‘centre’ perhaps, since it will lie outside the triangle if one of the angles is obtuse.)
4 If you join each corner of the triangle to its opposite side so that the lines meet at a right angle, then the three lines (or altitudes) meet at the triangle’s orthocentre. (For
obtuse triangles you need to extend the sides, and again this centre can lie outside the
triangle.)
5 The first Brocard point is defined as the point P so that the angles PAB, PBC and
PCA are all equal. It is not quite a triangle centre. There is a second Brocard point Q
where QBA, QAC and QCB are equal. The third Brocard point (R) is defined as the
midpoint of the first and second. It is a triangle centre.
The Euler line Only for an equilateral triangle do all the different centres of the triangle
coincide. In 1765, however, the great Swiss mathematician Leonhard Euler proved that
the orthocentre, centroid and circumcentre always lie on a straight line, now known as
the Euler line. He also demonstrated that the distance between the orthocentre and
centroid is twice that between the centroid and circumcentre.
CIRCLES
Circles Mark a spot on the ground, ask 10 people each to stand 1 metre away from it,
and an approximation to a circle should appear. This is, more or less, Euclid’s
definition of a circle, which has survived till today essentially unchanged. Two pieces
of data are required: a point (O) to be its centre, and a length (r) for its radius. Formally
then, the circle is the set of all points in the plane r units away from O. (The same
definition in three dimensions will give a sphere.)
A disc is the set of points at most r units away from O: a filled-in circle, in other
words. In practice, this distinction is often abandoned, as in discussion of the area of a circle (which to an irrepressible pedant is 0, though the area of the corresponding disc is given by πr²).
Circles come with their own lexicon of terms: the circumference is the perimeter of the
circle; an arc is a portion of the circumference between two points; a radius is a line
segment from the centre to the circumference; a chord is a line segment from one point
on the circle to another, dividing it in two; a diameter is a chord which passes through
the centre (so its length is twice the radius); a tangent is a straight line outside the
circle which touches it at exactly one point.
In fact, π is an irrational number, meaning that it can never be written exactly as a fraction or a recurring decimal. So its decimal representation continues for ever, without repeating. More than this, it is a transcendental number, even further removed from the familiar territory of the whole numbers. Perhaps this fact, combined with its ancient history and succinct definition, explains the incredible superstardom of this number, well beyond mathematical circles. The eponymous hero of many books, π also stars in several films and songs. March 14th (3.14) is celebrated as international π day.
π has also become a test-bed for various feats of human endeavour. The current record for computing digits of π belongs to a team of computer scientists led by Fabrice Bellard, who established the first 2,699,999,990,000 digits on an ordinary desk-top computer, in 2009. The mathematician John Conway tells of romantic walks with his wife, when they would recite the digits of π, taking alternate groups of twenty. Conway has memorized the first thousand decimal places, some way short of the current Guinness world record of 67,890, set by Chao Lu in 2005. However, verification is continuing on Akira Haraguchi’s 2006 attempt at the 100,000 digit mark.
Circle formulas As ancient as the Pyramids, the original definition of the number π is as the ratio of a circle’s circumference (c) to its diameter (d). So the formula for the circumference of a circle is a true mathematical antiquity: c = πd. Equivalently, c = 2πr, where r is the circle’s radius.
The formula for the area (A) of a disc was first explicitly derived by Archimedes around 225 bc: A = πr². This states that a square built on the radius of the circle fits into the circle exactly π times. In situations with Cartesian coordinates, the unit circle is the gold standard. It has centre at the origin, and radius 1. So it is formed by all the points (x, y) which are 1 unit away from (0, 0), which (by Pythagoras’ theorem) amounts to the equation x² + y² = 1. (In the complex numbers, this is neatly expressed as |z| = 1.) More generally, a circle with centre at (a, b) and radius r is described by the formula (x − a)² + (y − b)² = r².
A basic fact about circles, proved by Euclid in Proposition 3.18 of his Elements, is that
the tangent at a point on the circle is at right angles to the radius at that point. A
consequence is the equal tangent theorem: take any point X outside a circle, and there
are exactly two tangents to the circle you can draw, which pass through X. The
theorem says that the distance from X to the circle is the same along both of these
lines. To see this, call the points where the tangents touch the circle A and B, and the
centre of the circle O. Then OA and OB are both radii, and so the same length. Also,
OA and AX are perpendicular, as are OB and BX. So we have two right-angled
triangles: OAX and OBX. OX is the hypotenuse of both, and OA and OB have the
same length, so Pythagoras’ theorem gives the lengths for AX and BX as equal.
This result has several important consequences, including the theorem of Thales, the
theorem on angles in the same segment, and the characterization of cyclic
quadrilaterals.
To unpack this result a little, if you draw a diameter across a circle, and connect its two
endpoints A and B to any point C on the circumference, then
the resulting angle ACB is always a right angle. This follows from the inscribed angle
theorem: since A and B form an angle of 180° at the centre, the angle on the
circumference must be half that, 90°. Euclid included this fact as Proposition 3.31 of
his Elements, but it was first proved by the ‘Father of Geometry’, the Egyptian-
influenced philosopher Thales of Miletus, around 600 bc.
If you take two points on the circumference of a circle, and join them with a chord, you
have divided the circle into two segments.
An angle in a segment is the angle created when the two endpoints (A and B) of the
chord are connected to a third point (C) on the circle’s circumference. In Proposition
3.21 of Elements, Euclid proves that any two angles in the same segment are equal. So,
in the illustration, the angles ACB and ADB are equal. In fact, this is a consequence of
the inscribed angle theorem, since each of the angles ACB and ADB must be half that
at the centre (AOB), and so must be equal.
Cyclic quadrilaterals Any triangle may be inscribed in a circle. That is, you can draw a
circle so that all three of the triangle’s corners sit on the circumference. However, the
same thing is not true for quadrilaterals (four-sided shapes). A rhombus, for example,
cannot be inscribed in a circle (except when the rhombus is actually a square). Cyclic
quadrilaterals are those which can be inscribed in a circle, and in Proposition 3.22 of
Elements, Euclid proved their defining characteristic: pairs of opposite angles add up
to 180°.
Really this is two results in one. The first says that cyclic quadrilaterals have this
property. This is a consequence of the inscribed angle theorem: join two opposite
corners of the quadrilateral to the centre (O). Then the two angles at the centre
obviously add up to 360° (2x + 2z = 360°), and so the angles at the two remaining corners must add up to half this: x + z = 180°. The second half of the theorem says that every quadrilateral with this property can be inscribed in a circle.
Suppose we have a circle with three points, A, B and C, on its circumference, and a
tangent to the circle at A. Take D to be any point on the tangent line which is on the
opposite side of the line AB from C. In Proposition 3.32 of Elements, Euclid shows
that the angles ACB and BAD are equal. (Angle ACB is the angle produced at C by the
lines from A and B.) Also known as the tangent–chord theorem, this result says, intuitively, that the angle ACB is equal to the angle between the chord AB and the
arc from A to B. But curved lines and angles don’t necessarily mix well, so the straight
tangent is used for precision.
If A, B, C and D are any four points on a circle, and X is the point where the lines AC
and BD meet, then the triangles ABX and DCX are similar. This theorem still applies
even if X is outside the circle.
Eyeball theorem The analysis of circles did not stop with Euclid. Since he laid the
foundations, a multitude of exotic and beautiful facts have been discovered. One such
is the eyeball theorem: Take two circles C1 and C2 (not necessarily the same size), and draw two tangents from C1 which meet at the centre of C2. Say these cross C2 at A and B. Similarly draw two tangents from C2 meeting at the centre of C1, crossing C1 at points D and E. Then the distance from A to B (along a straight line) is the same
as that from D to E.
Polygons Triangles, quadrilaterals and pentagons all fall under the broader category of
polygons: 2-dimensional shapes bounded by straight lines, meeting at vertices
(corners).
Polygons have been studied since antiquity. The easiest to analyse are convex
polygons. (Stars are examples of non-convex polygons.)
A regular polygon is one where the edges all have the same length, and all the angles
are equal. Equilateral triangles and squares are the first examples of regular polygons,
followed by regular pentagons, and so on.
Starting at three, a regular convex polygon can exist with any number of sides. As the
number of sides increases, the polygon gets ever closer to a circle. In 1796, at the age
of 19, Carl Friedrich Gauss proved that not all regular polygons can be drawn using
elementary ruler and compass constructions.
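The full criterion (usually credited to Gauss and Wantzel) is that a regular n-gon is constructible with ruler and compass exactly when n is a power of 2 times a product of distinct Fermat primes. A small sketch of that test (the function names are mine):

```python
def constructible(n):
    # A regular n-gon (n >= 3) is constructible with ruler and compass
    # exactly when n = 2^k * (a product of distinct Fermat primes).
    fermat_primes = (3, 5, 17, 257, 65537)   # the only known Fermat primes
    while n % 2 == 0:
        n //= 2
    for p in fermat_primes:
        if n % p == 0:
            n //= p
            if n % p == 0:        # a repeated Fermat prime is not allowed
                return False
    return n == 1

print([n for n in range(3, 21) if constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```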
Convexity An object X is convex if, whenever you mark two points inside it, and join
them together with a straight line, the whole line segment between them lies entirely
inside the shape.
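For polygons, convexity can be tested mechanically: a polygon is convex when walking around its boundary always turns the same way. A minimal sketch (assuming the vertices are listed in order, with no repeated points):

```python
def is_convex(vertices):
    # vertices: list of (x, y) corners in order around the polygon.
    # Convex iff every cross product of consecutive edge vectors has the same sign.
    n = len(vertices)
    signs = set()
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cx, cy = vertices[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            signs.add(cross > 0)
    return len(signs) <= 1

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
chevron = [(0, 0), (2, 1), (4, 0), (2, 3)]      # a non-convex (reflex) kite
print(is_convex(square), is_convex(chevron))    # True False
```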
A regular pentagon is convex, but a star shape is not, as you can find two points where
the line crosses outside the shape. The same definition holds in higher dimensions. So
a spherical ball is convex, but a curvy banana is not.
The Platonic solids are convex, but they have non-convex, self-intersecting analogues
in the form of the four regular Kepler–Poinsot polyhedra.
Convex quadrilaterals Constructed from four sides of equal length and four right angles, a square is the simplest quadrilateral (a 2-dimensional shape built from four straight edges). Squares are the only quadrilaterals classed as regular polygons.
Relaxing these conditions, other types of quadrilaterals emerge:
o A rectangle contains four right angles (consequently its sides come as two pairs of
equal length).
o A rhombus has four equal sides, which come in two parallel pairs. The angles will not be right angles (except when it happens also to be a square).
o A parallelogram has two pairs of parallel sides; opposite sides have equal length.
o A trapezium (or trapezoid) has one pair of parallel sides. Subspecies include
isosceles trapezia (where the remaining pair of sides are of equal length) and right
trapezia (which also contain two right angles).
o A kite (or deltoid) has two pairs of sides of equal length, like a parallelogram. But in
this case the equal sides meet, rather than being opposite each other.
Non-convex quadrilaterals
The definition of a kite can be satisfied by both convex and non-convex shapes. A
chevron is the name usually given to a non-convex kite, the most symmetric of the
reflex quadrilaterals (which contain one angle of more than 180°).
On the fringes of acceptability are formations of four straight lines where two crash
through each other: the self-intersecting quadrilaterals, of which the most symmetric
are the bow-ties.
(Illustrations: rectangle, rhombus, parallelogram, trapezium, kite, chevron, bow-tie.)
Polyhedra The definition of a polyhedron has changed over time. But essentially it is a
surface which is built from flat 2-dimensional faces meeting at straight edges and
vertices (corners). Polyhedra are the 3-dimensional analogues of polygons, and are
divided into convex solids and the more intricate non-convex polyhedra.
The most symmetrical convex polyhedra are the five Platonic solids. Next are the Archimedean solids, prisms and antiprisms, which are also highly
symmetric in that their vertices are all identical, and faces are all regular convex
polygons. Unlike the Platonic solids, however, their faces may be different shapes.
Other attractive polyhedra are the Catalan solids which make fair dice (since their faces
are all identical, though not regular polygons), and the Johnson solids: all the convex
polyhedra whose faces are regular polygons.
Non-convex polyhedra can be classified too: the most symmetrical of these are the
regular Kepler–Poinsot polyhedra. Polyhedra whose vertices are all identical are
known as isogonal.
The Platonic solids The Platonic solids comprise five beautiful and important
polyhedra:
o the tetrahedron, with four equilateral triangular faces
o the cube, with its six square faces meeting at right angles
o the octahedron, with eight equilateral triangular faces meeting at each corner in fours
o the dodecahedron, with twelve regular pentagonal faces
o the icosahedron, where twenty equilateral triangular faces meet at each corner in fives.
The philosopher Plato held these five highly symmetrical shapes in the highest regard.
Around 350 bc, he wrote that the tetrahedron, cube, octahedron and icosahedron
correspond to the four elements: fire, earth, air and water, respectively. The
dodecahedron he considered ‘God used for arranging the constellation of the whole
universe’.
Mathematically, these five are convex and regular: each face is a regular polygon,
identical to every other, and similarly the edges are all indistinguishable, as are the
vertices (corners). In one of the world’s first classification theorems, Plato presented a
proof that these are the only convex regular polyhedra; no sixth would ever be found.
The final book of Euclid’s Elements is also devoted to these shapes.
Irregular polyhedra The world is full of polyhedra and solids, and most do not have the
high levels of symmetry that mathematicians enjoy. A brick, for example, is not a
Platonic cube, but
a cuboid, with sides of three different rectangular shapes. A cuboid is a special case of
a parallelepiped, built from three pairs of parallel parallelograms. Pyramids are another
important family of irregular polyhedra: the square- and pentagon-based pyramids can
have sides with equilateral triangles (as can the triangle-based pyramid, or
tetrahedron). Beyond this, all pyramids must have irregular triangular sides.
Once we allow polyhedra with irregular polygons, the most symmetrical are those
which make fair dice, where every face is identical. Beyond this, there is no limit to the
list of possible irregular polyhedra. The Johnson solids at least provide a complete
lexicon of those convex polyhedra which can be constructed from regular polygons.
The Stewart toroids extend this list to non-convex shapes.
Nets The German artist Albrecht Dürer was also a mathematician, with a particular
interest in polyhedra. In his 1538 work Instruction on Measurement, Dürer introduced
an invaluable tool for understanding polyhedra. A net is a flat arrangement of
polygons, some joined along their edges. By folding and gluing this pattern, you can
create a model of the polyhedron.
Every polyhedron can be described by a net. Indeed, the cube has 11 different nets.
Dürer’s fascination with polyhedra led him to rediscover two of the Archimedean
solids (the truncated cuboctahedron and the snub cube), and to design a shape of his
own: the melancholy octahedron.
Polyhedral duality A cube has six faces and eight vertices, while an octahedron has
eight faces and six vertices. Both have twelve edges. This symmetry comes from the
fact that the cube and octahedron are dual polyhedra.
To obtain the dual of a polyhedron, mark a spot in the middle of each face, and join
two spots with a line if the two corresponding faces meet. The resulting framework of
spots and lines describes a new polyhedron: the dual of the original. Repeating the
process, taking the dual of the dual, gives back the original shape. Among the Platonic
solids, the tetrahedron, with its four faces and four vertices, is self-dual. The
dodecahedron and icosahedron form a dual pair.
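A hedged sketch tabulating (vertices, edges, faces) for the Platonic solids: duality swaps V and F while keeping E, and each solid satisfies Euler's polyhedral formula V − E + F = 2.

```python
# (V, E, F) for each Platonic solid
platonic = {
    "tetrahedron":  (4, 6, 4),
    "cube":         (8, 12, 6),
    "octahedron":   (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron":  (12, 30, 20),
}

for name, (v, e, f) in platonic.items():
    assert v - e + f == 2                  # Euler's polyhedral formula
    print(f"{name}: its dual has {f} vertices, {e} edges, {v} faces")
```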
The Archimedean solids No other polyhedra can possess the perfect symmetry of the
Platonic solids, but slightly loosening the requirements opens up an exciting range of
new shapes. The fourth-century mathematician Pappus credits Archimedes
with the discovery of 13 convex polyhedra with faces which are regular polygons
(though not all the same), and which are symmetrical in their vertices: that is, the
arrangement of faces and edges at every vertex is identical to every other, so moving
any vertex to any other is a symmetry of the shape.
(Illustrations: rhombi-cuboctahedron, truncated icosidodecahedron.)
Prisms and antiprisms The Archimedean solids are defined as convex polyhedra whose
faces are regular and whose vertices are all indistinguishable. But these 13 shapes do
not list every possibility. There are also two infinite families of polyhedra which
satisfy these criteria.
The prisms are formed by taking two identical regular polygons (such as two hexagons
of the same size), and joining their edges with squares. In general a prism is a solid
formed by two regular n-gons joined by a ring of n squares. (Polyhedra with rectangles
in place of squares are also sometimes known as prisms, but these do not have all
regular polygonal faces.)
In a hexagonal antiprism, the two hexagons are twisted out of sync, and then joined by
equilateral triangles. In general an antiprism comprises two regular n-gons joined by a
ring of 2n alternating equilateral triangles.
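As a small check (a sketch, not from the book), the vertex, edge and face counts of an n-gonal prism and antiprism follow directly from this construction, and both satisfy V − E + F = 2:

```python
def prism_counts(n):
    # Two n-gons plus a ring of n squares.
    return 2 * n, 3 * n, n + 2          # (V, E, F)

def antiprism_counts(n):
    # Two n-gons plus a ring of 2n triangles.
    return 2 * n, 4 * n, 2 * n + 2      # (V, E, F)

for n in (3, 4, 5, 6):
    for v, e, f in (prism_counts(n), antiprism_counts(n)):
        assert v - e + f == 2
print(prism_counts(6), antiprism_counts(6))   # (12, 18, 8) (12, 24, 14)
```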
Fair dice Which shapes make fair dice? To make a fair die, a polyhedron should be
convex, with all its faces identical. Certainly the Platonic solids satisfy these
requirements, but they are not alone. In 1865, Eugène Catalan published a list of 13
beautiful new solids with this property. He did not discredit Plato’s classification, since
the faces are not regular polygons: they are rhombuses, non-equilateral triangles, kites
or irregular pentagons.
The Catalan solids are obtained as the duals of the Archimedean solids. These elegant
shapes go by clumsy names, such as the rhombic dodecahedron with 12 rhombus faces
and the strombic hexecontahedron with its 60 kite-shaped faces.
(Illustrations: truncated tetrahedron, truncated octahedron, truncated cube, cuboctahedron, truncated cuboctahedron, snub cube, truncated dodecahedron, snub dodecahedron, icosidodecahedron, rhomb-icosidodecahedron, truncated icosahedron, hexagonal prism, hexagonal antiprism, strombic hexecontahedron, disphenoid, bipyramid, trapezohedron.)
There are also three infinite families satisfying the criteria:
1 bipyramids, where two pyramids with n-gonal bases are glued at their bases (obtainable as the duals of the prisms)
2 trapezohedra, where two ‘cones’ built from kites are glued together (these are the duals of the antiprisms)
3 disphenoids, which are tetrahedra whose four faces are identical non-equilateral acute triangles.
Kepler–Poinsot polyhedra
Are the Platonic solids the only solids with identical faces of regular polygons? As so
often in mathematics, the answer closely depends on definitions.
In 1619 Johannes Kepler noticed two non-convex polyhedra which also fit the bill.
Artists such as Paolo Uccello had already exploited their beauty, but previous
mathematicians had overlooked them, perhaps because their faces do not only meet
along edges, they also pass through each other at false edges.
If we are really interested in 3-dimensional solids, these should probably be ruled out.
But they are usually classed as legitimate polyhedra. Louis Poinsot identified two more
in 1809, and the resulting four Kepler–Poinsot polyhedra arguably complete the list of
regular polyhedra begun by Plato. They are the small stellated dodecahedron, great
dodecahedron, great stellated dodecahedron, and the great icosahedron.
Two of the Kepler-Poinsot polyhedra are star polyhedra: the great and small stellated
dodecahedra. Both are obtained from the dodecahedron by a process of stellation:
extending edges and faces until they meet. The basic idea can be seen with polygons:
the pentagram is a stellation of the pentagon, for instance. Usually there are choices
about which edges to connect: a heptagon can be stellated in two ways, to give two
different heptagrams.
The possibilities for stellating an icosahedron were documented in the book The Fifty-
Nine Icosahedra, by Coxeter, Du Val, Flather and Petrie, in 1938. Some of the
Archimedean solids have hundreds of millions of distinct stellations.
Seventeen of the uniform polyhedra are star polyhedra deriving from Archimedean
solids.
Because of the way their faces pass through each other, star polyhedra are typically not
topologically spherical, and so will not satisfy Euler’s polyhedral formula.
(Illustrations: great dodecahedron, great icosahedron.)
Compound polyhedra As well as recognizing the first two of the Kepler–Poinsot
polyhedra, Johannes Kepler also discovered the first compound polyhedron: the stella
octangula, obtained by pushing two tetrahedra into each other, so that they have a
common centre. (It can also be obtained as a stellation of an octahedron.) As with other
star polyhedra, this shape is non-convex and self-intersecting, resulting in false edges
and vertices. Shapes such as this, which can be pulled apart into separate polyhedra,
are not usually classed as polyhedra themselves. Nevertheless, all its faces are
identical, as are its edges and vertices, making it a regular compound polyhedron.
There are four others, built similarly from five tetrahedra, ten tetrahedra, five cubes
and five octahedra.
Uniform polyhedra The discovery of star polyhedra opened the door for a new
classification of uniform polyhedra: polyhedra with regular polygonal faces (including
star polygons), and whose vertices are all identical. The convex examples were long
known: the Platonic solids, the Archimedean solids, prisms and antiprisms. The Kepler–
Poinsot polyhedra are the first non-convex, self-intersecting examples, and in 1954
Coxeter, Longuet-Higgins and Miller produced a list of 53 more, starting with the
tetrahemihexahedron, created from three intersecting squares and four equilateral
triangles. The last, the great dirhombicosidodecahedron or ‘Miller’s monster’ has 60
vertices, at each of which four squares, two triangles, and two pentagrams meet,
making 124 faces in total.
The uniform polyhedra are completed by two more infinite families: star prisms (where
two regular n-grams are joined by a ring of intersecting squares) and star antiprisms
(where two regular n-grams are joined by a ring of intersecting equilateral triangles).
See illustrations.
In 1970 S.P. Sopov proved that this list was complete, although in 1975 John Skilling
made a curious discovery. If edges were allowed to coincide (that is, one edge could be
shared between four faces), he found one further possibility: Skilling’s figure (the great
disnub dirhombidodecahedron, pictured) with 204 faces meeting at 60 vertices.
Isogonal polyhedra The definition of a uniform polyhedron has two parts: the vertices
are all the same (the arrangement of faces and edges at every vertex is identical to
every other), and every face is a regular (possibly non-convex) polygon. If this second
requirement is dropped, the number of qualifying shapes becomes infinite: these are
the isogonal polyhedra.
This can be seen with the very first examples: the disphenoid tetrahedra (see fair dice).
Take any cuboid, and pick four corners which do not share an edge. Joining these four
creates a disphenoid. No complete classification of isogonal polyhedra is yet known.
(Illustrations: star prism, star antiprism, Skilling’s figure, an isogonal polyhedron.)
The Johnson solids In 1966, Norman Johnson ignored all questions of symmetry and
asked simply: what convex polyhedra can be built from regular polygons (not
necessarily all the same)? He produced a catalogue of 92 convex polyhedra. In 1969
Victor Zalgaller proved that Johnson’s list, along with the Platonic and Archimedean
solids, and the prisms and antiprisms, are indeed all there are.
The first Johnson solid (J1) is a pyramid with square base and equilateral triangular
sides. Another is the gyrobifastigium (J26), built from four squares and four equilateral
triangles.
Stewart toroids In his book Adventures Among the Toroids, Bonnie Stewart considered polyhedra that can be built from regular polygons. But,
departing from the Johnson solids, he did not confine himself to convex shapes. Others
may have given this up as a lost cause, since there is no limit to the number of Johnson
solids which can be glued together.
However, some of these shapes do have a high level of symmetry. Eight octahedra, for
example, can be glued at their faces to form a ring. Some of the most breathtaking of
Stewart’s shapes come not from gluing solids together, but from the opposite
philosophy: he took large models of several Johnson and Archimedean solids, and
analysed the possibilities for drilling through them, lining the tunnels with regular
polygons.
Topologically these shapes are not spheres but n-tori (see orientable surfaces), with the number of holes n given by their genus. The arrangements must therefore satisfy the appropriate polyhedral formula.
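For a surface of genus g, the polyhedral formula becomes V − E + F = 2 − 2g. A small sketch (the Császár polyhedron figures quoted in the comment are standard values, not taken from this book):

```python
def genus(v, e, f):
    # Rearranging V - E + F = 2 - 2g gives g = (2 - (V - E + F)) / 2.
    return (2 - (v - e + f)) // 2

print(genus(8, 12, 6))    # 0: a cube is topologically a sphere
print(genus(7, 21, 14))   # 1: the Csaszar polyhedron, a toroidal polyhedron with
                          #    7 vertices, 21 edges and 14 triangular faces
```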
Polychora With the study of polyhedra having yielded such spectacular fruits, a
mathematician’s immediate reaction is to seek to generalize it. Just as polyhedra are
the 3-dimensional analogues of polygons, so polychora are the equivalent objects in
four dimensions. These are built from 3-dimensional polyhedral cells, meeting at 2-
dimensional polygonal faces, 1-dimensional straight edges, and zero-dimensional
vertices.
Just as Plato had classified the regular convex polyhedra, so in 1852 the Swiss
geometer Ludwig Schläfli classified the regular convex polychora. He found:
o the pentachoron (or 4-simplex), built from five tetrahedra, and analogous to the tetrahedron
o the tesseract (or 4-hypercube), built from eight cubes
o the hexadecachoron (4-orthoplex or 4-cross-polytope), built from 16 tetrahedra: the analogue of the octahedron
o the icositetrachoron (octaplex), built from 24 octahedra, a new shape, with no 3-dimensional analogue
o the hecatonicosachoron (120-cell), built from 120 dodecahedra: the analogue of the dodecahedron
o the hexacosichoron (600-cell), built from 600 tetrahedra: the analogue of the icosahedron.
Ludwig Schläfli proved that a remarkable thing occurs when we look in higher
dimensions than four. There are only ever three regular polytopes: the simplex,
hypercube and orthoplex (the equivalents of the tetrahedron, cube and octahedron
respectively).
There are also the troublesome self-intersecting, non-convex polytopes to account for.
In two dimensions these are the star polygons, beginning with the pentagram. In three
dimensions, we find the four Kepler–Poinsot polyhedra, and in four dimensions, ten
Schläfli–Hess polychora. Again higher dimensions turn out to be far simpler and, from five dimensions onwards, there are none at all. This is an example of a phenomenon
well-known to geometers: that life in three and four dimensions is in many ways more
complicated than in higher-dimensional spaces.
TRANSFORMATIONS
Isometries of the plane Having drawn a picture on the plane, there are various ways to
move it into a new position, without it becoming twisted or distorted. These are called
isometries of the plane; technically, this means that any line segment will have the same length before and after the move.
o A rotation is given by two pieces of information: a point (the centre of rotation), and
an angle describing the amount of rotation. (As always in mathematics, a positive
angle corresponds to an anticlockwise rotation, and a negative one to a clockwise
rotation.)
o A translation slides a figure around and is expressed by a vector, with the top row
corresponding to movement right (or left, if negative), and the bottom row
corresponding to shifting the figure up (or down).
o A reflection is determined by a mirror line (the line of reflection): each point is sent to its mirror image on the other side of that line.
o A glide is a reflection followed by a translation along the same line. (See glide
symmetry.)
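A minimal numerical sketch of these isometries (plain Python; the function names are mine): each one preserves the distance between any two points.

```python
import math

def rotate(p, centre, angle_deg):
    # Rotate point p about 'centre' by angle_deg (positive = anticlockwise).
    a = math.radians(angle_deg)
    x, y = p[0] - centre[0], p[1] - centre[1]
    return (centre[0] + x * math.cos(a) - y * math.sin(a),
            centre[1] + x * math.sin(a) + y * math.cos(a))

def translate(p, vector):
    return (p[0] + vector[0], p[1] + vector[1])

def reflect_in_x_axis(p):
    return (p[0], -p[1])

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

p, q = (1.0, 2.0), (4.0, 6.0)
for move in (lambda t: rotate(t, (0, 0), 90),
             lambda t: translate(t, (3, -1)),
             reflect_in_x_axis):
    print(round(dist(move(p), move(q)), 9))   # always 5.0: distances are preserved
```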
Symmetry For an object drawn on the plane, a symmetry is an isometry which leaves
the shape looking the same. A square, for example, has both rotational and reflectional
symmetry. With the centre of the square as the centre of rotation, rotating by 90°
leaves it looking the same. Repeating this manoeuvre produces two other symmetries,
rotations by 180° and 270°, before bringing the square back to its starting position. So
we say that a square has rotational symmetry ‘of order 4’.
A square also has four different lines of reflectional symmetry: the two diagonals, the
horizontal and the vertical. Altogether this produces eight symmetries (including the
trivial one: just leaving the square as it is). This information is encapsulated in the
symmetry group of the square.
Shapes can have just rotational symmetry, just reflectional symmetry, or both. Infinite
patterns, such as tessellations, may also have translational symmetry and glide
symmetry.
Of course there is one symmetry which has no effect when combined with any other:
the trivial symmetry (‘1’) which leaves the square as it is.
Every symmetry has an inverse: if A is ‘rotate 90° anticlockwise’, then its inverse
(denoted A⁻¹) is ‘rotate 90° clockwise’. These facts suggest that the collection of
symmetries of the square forms a group. This turns out to be true for the symmetries of
any object.
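A hedged sketch of the square's symmetry group: represent each of the eight symmetries as a 2 × 2 matrix acting on a square centred at the origin, and check that composing any two of them gives another one (closure), as a group requires.

```python
import itertools

def mat_mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

r = ((0, -1), (1, 0))                      # rotation by 90 degrees anticlockwise
f = ((1, 0), (0, -1))                      # reflection in the horizontal axis
identity = ((1, 0), (0, 1))

# The eight symmetries of the square: four rotations and four reflections.
rotations = [identity]
for _ in range(3):
    rotations.append(mat_mul(r, rotations[-1]))
symmetries = rotations + [mat_mul(f, m) for m in rotations]

# Closure: the composite of any two symmetries is again one of the eight.
assert all(mat_mul(a, b) in symmetries
           for a, b in itertools.product(symmetries, repeat=2))
print(len(set(symmetries)))                # 8 distinct symmetries (the dihedral group)
```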
For 2-dimensional shapes, there are two families of groups, depending on whether the
object has any reflectional symmetry. If it does not, as is the case with a swastika, the
group is cyclic, meaning that it resembles addition modulo some number (in this case
4). If there is also reflectional symmetry, as in the case of the square, the group is
dihedral.
Other polyhedra and higher dimensional polytopes come with more complicated
groups.
In the case of the most symmetrical of all shapes, circles and spheres, these groups are
infinite Lie groups.
Tessellations also have infinite symmetry groups, namely frieze groups or wallpaper
groups.
Similarity Two triangles are similar if their angles match. So if triangles A and B both have angles of 30°, 60° and 90°, for example, then they are similar. (It is not required
that the lengths of their sides should be equal.) Similarity also allows the triangle to be
reflected.
In general, two shapes are similar if they are the same shape but not necessarily the
same size or in the same position. (If they are the same shape and size, then they are
congruent, and there will be an isometry taking one to the other.) As it stands, this
definition is unsatisfactorily imprecise, but can be made exact through the idea of an
enlargement.
The same procedure works for objects in higher dimensions. Objects which are
enlargements of each other are said to be similar.
(Illustrations: enlargements with scale factors 3, ½ and −2, showing the original figure, the enlarged figure and the centre of enlargement.)
Scale factors Two objects which are similar look the same, have the same proportions,
but are different sizes. This difference in size is measured by the scale factor. If a shape
is 3 units long, and the scale factor is 2, then the length of the enlarged shape is 3 × 2 = 6 units. This does not only work for straight lines. If an ellipse with a circumference of 11 units is enlarged by a scale factor of 3, then the new ellipse has a circumference of 33 units.
However, the same process does not apply to area. If a triangle of area 4 is enlarged by scale factor 3, the area of the new triangle is not 4 × 3, but is calculated by multiplying the original area by the square of the scale factor: 4 × 3² = 36. Again this procedure works for any shape, and is useful when direct methods for calculating area are not available.
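A small sketch of the scaling rules (lengths scale by k and areas by k²; in three dimensions, volumes scale by k³, a standard fact implied by the book's illustrations rather than stated here):

```python
def scale_length(x, k):
    return x * k

def scale_area(a, k):
    return a * k ** 2

def scale_volume(v, k):
    return v * k ** 3

print(scale_length(3, 2))   # 6
print(scale_area(4, 3))     # 36: areas scale by the square of the scale factor
print(scale_volume(5, 2))   # 40: volumes scale by the cube of the scale factor
```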
TESSELLATIONS
Tessellations From the tombs of the Pharaohs to the etchings of M.C. Escher, humans
have always been fascinated by patterns built from the repetition of simple shapes. The
art of tiling was widespread in the Islamic world, where a religious injunction against
figurative art spurred artists to explore the aesthetic possibilities of abstract design,
such as those within the Alhambra Palace in Spain.
It took longer for the mathematics behind these patterns to be revealed. The first idea is
that a 2-dimensional figure tessellates if it can function as a tile: copies of it can be
placed side by side, so that as large an area as you like can be covered, with no
overlaps or gaps.
The simplest tilings are the regular tessellations, though irregular and semiregular ones
are equally common in design. Once a larger variety of tiles become involved, a more
careful account of the possible symmetries is provided by the wallpaper and frieze
groups. Of course, mathematicians also explore the same phenomenon in higher
dimensions.
Regular tessellations The most basic tilings are those involving just one regular
polygon: these regular tessellations are the equivalents of the Platonic solids. A grid is
the commonest example: tessellating squares with four meeting at every vertex.
Equilateral triangles also tessellate when arranged so that six meet at every vertex.
Which other regular polygons tessellate? Pentagons do not: if you try it, you will end
up with pentaflakes. (The interior angle of a pentagon is 108°, which does not divide
360°.) The regular hexagon is the only other regular polygon to tessellate, a fact
exploited by bees. Heptagons (and n-gons for larger n) cannot. If you place two side by
side, the remaining angle is too small to fit a third.
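The criterion is easy to check mechanically: a regular n-gon tessellates exactly when its interior angle, (n − 2) × 180°/n, divides 360°. A sketch:

```python
def tessellates(n):
    # A regular n-gon tiles the plane iff its interior angle divides 360 degrees.
    interior = (n - 2) * 180 / n
    return (360 / interior).is_integer()

print([n for n in range(3, 13) if tessellates(n)])   # [3, 4, 6]
```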
Pentagonal tilings Regular pentagons do not tessellate, but there are some irregular
convex pentagons which do. There are 14 essentially different known ways for this to
happen. One of these is the Cairo tessellation (illustrated) which adorns the pavements
of that city. Others include the four discovered in 1977 by the amateur mathematician
Marjorie Rice, and the most recent found in 1985 by Rolf Stein.
It is not yet established whether these 14 exhaust the possible tilings by convex pentagons.
Semiregular tessellations Like regular tessellations, the semiregular ones use only
regular polygons, but this time more than one type of tile is allowed, and it is required
that every vertex is identical to every other. There are eight such tilings; each involves
two or three shapes out of equilateral triangles, squares, regular hexagons, octagons
and dodecagons. One of these tilings comes in a left-handed and a right-handed
version.
Translational symmetry Finite objects such as polygons or polyhedra can have at most
two types of symmetry: reflectional, and rotational. Infinite tilings admit a third
possibility: translational symmetry. Shifting the rhombus tiling overleaf right by 1 unit
leaves it looking the same, and so is a symmetry. Of course, moving it right by 2 units,
or 3, 4, 5, … are also symmetries, meaning that a translationally symmetrical pattern
automatically has infinitely many symmetries.
Irregular tessellations It is not only regular polygons that can tessellate: every triangle
does. To see this, draw and cut out any triangle. Using this as a template, draw around
it once. Then move the template so that one side matches up with the corresponding
side of the drawn triangle (be careful only to rotate it in the plane, do not flip it over)
and draw around it again. Repeating this will create a pattern that tiles the plane.
The same procedure works for quadrilaterals: every figure with four straight sides
tessellates. Pentagonal tilings are more complicated. Some pentagons do tessellate,
though regular ones do not. Regular hexagons tessellate, and in 1918 Karl Reinhardt
showed there are exactly three classes of irregular convex hexagon which do too. For n ≥ 7, there are no convex n-gons which tessellate.
The translational symmetry of a pattern is the first step towards classification. Many
common patterns have two distinct types of translational symmetry: left–right and up–
down. This allows 17 essentially different patterns, classified by the 17 wallpaper
groups. Other tilings have only one type of translational symmetry: left–right (or up–
down, but not both). These are classified by the seven frieze groups.
The aperiodic tilings, including the Penrose and Ammann tilings, are remarkable for
having rotational and reflectional symmetry, but no translational symmetry at all.
Glide symmetry As well as rotational, reflectional and translational symmetry, the final
possible type of symmetry of 2-dimensional pattern is the glide: a combination of
translation and a reflection. In the illustration, the picture can be reflected in the
horizontal line, and then translated along that same line. The resulting symmetry is
neither a reflection nor a translation, but a glide (or glide-reflection). In contrast, the
combination of a rotation with a translation always gives another rotation (although
locating its centre is not always straightforward). Many tilings and patterns have glide
symmetries, and so they feature in the frieze and wallpaper groups.
Frieze groups In architecture, a frieze is a narrow band along the top of a wall. Since
classical times, friezes have often been decorated with repeating geometrical patterns.
In this context there is translational symmetry, but only of the left–right variety.
Patterns like this, with just one form of translational symmetry, come in seven types.
Their names, given to them by John Horton Conway, describe trails of footsteps with
the correct symmetries:
o The spinning jump is the largest group, with translations, vertical reflections, one
horizontal reflection and rotational symmetries of 180°.
(Illustrations: the seven frieze patterns, named hop, sidle, jump, step, spinning hop, spinning sidle and spinning jump.)
Conway’s orbifolds The frieze groups classify patterns with left–right or up–down
translational symmetry. What of patterns with both? The simplest such pattern has only
translational symmetry, without any reflectional, rotational or glide symmetry. In the
orbifold notation devised by John Conway, this is denoted o.
Unlike polygons, tilings can have more than one centre of rotation. One possibility has
centres of order 6, other centres of order 3 and a final set of order 2. The illustrated
example has these centres of rotation and no reflectional symmetries. This group is
denoted 632.
The final type of symmetry to consider is the glide, denoted x. The possibilities are xx
which has two types of glide (and no reflections or rotations),
*x which has one glide and one reflection, and 22x which has no reflections, two
rotations of order 2 and a glide.
Wallpaper groups Wallpaper groups classify those patterns which contain two different
translational symmetries. In 1891, Evgraf Fedorov proved that there are exactly 17
different possibilities. Here we will use the later orbifold notation of John Conway.
The starting point of the classification is the following fact: patterns with two types of
translation can only have rotational symmetry of order 1, 2, 3, 4 or 6. This is the
crystallographic restriction theorem.
The simplest pattern has translational symmetry alone: this is denoted o. The
possibilities for a pattern with rotations, but without reflection or glide symmetries, are
632, 442, 333 and 2222. With both rotational and reflectional symmetry, the
possibilities are *632, *333, 3*3, *442, 4*2, 22*,
*2222, 2*22 and ** (the last having two parallel lines of reflection, and no rotational
symmetry). Finally, those with glides are xx, *x and 22x.
(Illustrations: patterns of types 632, 4*2, xx, 333 and *2222, with a gyration of order 4 and a kaleidoscope of order 2 marked.)
Heesch’s tile One of the questions posed by David Hilbert in his 18th problem was
about shapes which can tessellate on their own, but only in a rather strange way.
Although every tile is identical, nevertheless they appear in positions which are non-
identical. This would mean that you can always find two tiles which cannot be
matched by a symmetry, because the arrangements of tiles around them are different. It
is not difficult to construct a tiling like this (any aperiodic tiling fits this description).
Hilbert’s question was whether there was any shape which can only tessellate this way.
Aperiodic tilings Tilings can have two types of translational symmetry (classified by
the wallpaper groups), or just one, as described by the frieze groups. There are also
tilings which have no translational symmetry at all. These are the aperiodic tilings,
which never repeat themselves: even if you were to tile a square mile, there would be
no way to slide the pattern around so that it fits back over itself. Having only rotational
and reflectional symmetry, their possible symmetry groups are the same as for a
polygon. Common examples are radial tilings (with dihedral symmetry groups), and
the beautiful spiral tilings (with cyclic symmetry groups), such as the Voderberg tiling.
More exotic possibilities are the Penrose and Ammann tilings.
Uncomputable tilings Suppose I present you with a collection of shapes and challenge
you to tile the plane with them. I am not concerned about symmetry; all that is required
is that you can cover as large an area as I ask, with no gaps or overlaps. You can have
as many of each shape as you like. If I give you squares and equilateral triangles, you
will have no trouble. If I pick regular pentagons, heptagons and decagons, you will be
unable to do it. But if I present you with a collection of 100 intricate and irregular
polygons, then you will have to stop and think.
The question Hao Wang addressed was whether there is some definite procedure that
you could follow, to decide whether or not my selection of shapes does tile the plane.
In other words, he was searching for an algorithm. In 1961, he believed he had found
one. But to prove it would work, Wang had to make an assumption, that no set of tiles
should only tile the plane aperiodically. The discovery of Penrose and Ammann tilings
demolished this hypothesis, and with it Wang’s algorithm. In fact, there can be no
algorithm to solve this problem: it is uncomputable.
(Illustrations: Heesch’s tile; a radial tiling.)
Penrose and Ammann tilings Aperiodic tilings were known to the mosaic-makers of
ancient Rome. But every example seemed to have a periodic cousin. If a set of tiles
could tile the plane, then it could be rearranged to do so in a way with translational
symmetry. In 1961 Hao Wang conjectured that this must always be the case: no finite
set of tiles should only give rise to aperiodic patterns. Wang’s conjecture was refuted
by his student Robert Berger in 1964, who concocted a set of 20,426 notched square
tiles which tile the plane only aperiodically.
During the 1970s the mathematical physicist Roger Penrose and the amateur
mathematician Robert Ammann independently discovered beautiful simplifications of
this result. In 1974 Penrose found an aperiodic set of just two tiles: both rhombuses
and with equal sides, one fat and one skinny. These tiled the plane, but by notching the
edges (or colouring the edges and insisting that touching edges match) Penrose ensured
that they could never do so in a periodic fashion. Penrose dubbed these tiles the
‘rhombs’ and he found two other purely aperiodic sets of tiles: the ‘kites and darts’,
which also use two different tiles, and the ‘pentacles’, which use four. Ammann
discovered further examples including, with Frans Beenker, the Ammann–Beenker
tiling involving squares and rhombuses.
An outstanding question in tiling theory is: is there a single tile which can tile the
plane, but only aperiodically?
For patterns involving more than one solid, a key result is the crystallographic
restriction theorem, which says that a tessellation of 3-space which has translational
symmetry can only have rotation of order 2, 3, 4 or 6. The analogues of the wallpaper
groups are the 230 crystallographic groups. This is a fundamental result in materials
science, as it restricts the possible arrangements of molecules in crystalline solids.
However more complicated patterns can be formed with the inclusion of more than one
type of tile. To analyse these, space groups play the role of the 17 wallpaper groups
and 230 crystallographic groups.
In his 18th problem, David Hilbert asked a key question: are there only finitely many
space groups in each dimension? In 1911, Ludwig Bieberbach was able to answer this
question in the positive. (Bieberbach made other notable contributions to mathematics,
but was disgraced for his Nazi politics.) In 1978, Harold Brown, Rolf Bülow and
Joachim Neubüser showed that there are 4,895 space groups in four dimensions. In
2001, Wilhelm Plesken and Tilman Schulz used a computer to list the 222,097 space
groups in five dimensions, and the 28,934,974 in six dimensions.
Quasicrystals On the molecular scale, solids come in two basic forms: amorphous,
where molecules are arranged haphazardly as in a liquid (an example is glass), and
crystalline where they are arranged in fixed geometric patterns (such as diamond).
Curves A curve is a 1-dimensional geometric object. The simplest examples are those
which are not ‘curved’ at all: straight lines. These are also the simplest when looked at
algebraically. On a plane with Cartesian coordinates, straight lines are described by
equations such as x + y − 1 = 0, or 3x + y − 7 = 0, or generally Ax + By + C = 0 for some numbers A, B, C (and not with A = B = 0). These are the polynomial equations of degree 1.
(Illustrations: parabola, circle, ellipse.)
Focus and directrix As well as arising when a cone intersects with a plane, the conic
sections can be obtained by another construction. Take a point on the plane, and a
straight line (not passing through the point). We call these the focus and the directrix,
respectively. For any point on the plane we can ask what its distance is to the focus,
and what its distance is to the directrix. (The ‘distance’ from a point to a line always
means the shortest, or perpendicular, distance.) What pattern is formed by the
collection of points for which these two distances are the same? The answer is a
parabola.
By subtly changing the question we get different curves: if we want the distance to the
focus to be half that to the directrix, the resulting curve is an ellipse. If we require it to
be double, we get a hyperbola.
In these constructions, the crucial quantity is the ratio of a point’s distance from the focus to its distance from the directrix. Call this number e. The defining characteristic of conic sections is that, whichever point on the curve you choose, you get the same value of e, called the curve’s eccentricity. If 0 < e < 1 the curve is an ellipse, if e = 1 it is a parabola, and if e > 1 it is a hyperbola.
Conic sections Which curves are described by equations of degree 2? Even 1800 years before Descartes invented his system of coordinates, Greek geometers, most notably Apollonius of Perga around 220 bc, were able to address this problem. It has a very elegant solution: first we consider a pair of infinite cones joined at their tips. The conic sections are the curves formed by taking slices through this surface. These make up the family of curves given by equations of degree 2. Slicing through horizontally will produce a circle (such as that given by x² + y² − 1 = 0). Slicing vertically, right through the centre, gives a pair of intersecting straight lines (described by the equation x² − y² = 0, for example).
Quadratic curves The three principal types of conic section are ellipses, parabolas and hyperbolas. Any equation of degree 2 is of the form Ax² + Bxy + Cy² + Dx + Ey + F = 0.
Ellipses An ellipse can be described by its two foci: it consists of the points for which the sum of the distances to the two foci takes a fixed value. This provides a nice method for drawing an ellipse: push two pins into a piece of paper with a piece of slack string tied between them. Tracing out the places where the string is pulled taut will produce an ellipse.
The major axis of the ellipse is the longest straight line segment inside it, passing
through both foci and the centre. The minor axis is perpendicular to this: the shortest
straight line through the centre, connecting two points of the ellipse. Taking the
distance between the foci and dividing by the length of the major axis gives the
eccentricity of the ellipse. A circle is the special case where the two foci coincide: an
ellipse of eccentricity 0.
In 1609, Johannes Kepler formulated his first law of planetary motion: that the orbit of
a planet is an ellipse with the sun at one focus.
Parabolas Unlike an ellipse, a parabola is not a closed curve, but has infinite length.
One of the conic sections, a parabola can be defined as a slice through a cone along a
plane parallel to the edge of the cone. Alternatively it is the set of points whose
distance from a given line (the directrix) is equal to that from a particular point (the
focus).
A common parabola is given by y = x² (or equivalently x² − y = 0). This has its focus at (0, 1/4), and its directrix is the horizontal line y = −1/4.
(Illustration: the parabola y = x² with its focus and directrix, and a parabolic trajectory.)
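A quick numerical check of the focus–directrix property for y = x² (a sketch; nothing here beyond the standard values): every point of the curve is the same distance from the focus (0, 1/4) as from the directrix y = −1/4.

```python
import math

focus = (0.0, 0.25)
directrix_y = -0.25

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    y = x ** 2                                # a point on the parabola y = x^2
    to_focus = math.hypot(x - focus[0], y - focus[1])
    to_directrix = abs(y - directrix_y)       # perpendicular distance to the line
    print(round(to_focus, 9) == round(to_directrix, 9))   # True each time
```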
During the middle ages, the physics of motion was not well understood. People
generally believed that when you fired a cannon, the missile would fly in a straight line
until it lost ‘impetus’ and fell to the ground. It was Galileo who, in the 17th century,
combined mathematical knowledge with experimental skill to challenge this
assumption. He devised a series of tests which demonstrated that projectiles actually
travel in parabolic paths (ignoring the effect of air resistance). It was not until the work
of Isaac Newton that scientists understood why this is true.
NASA distinguishes between periodic comets, which have elliptical orbits (and so
reappear – in the case of Halley’s comet, every 75 years, but for some long-period
comets every ten million years), and single-apparition comets which travel on
parabolic or hyperbolic paths, only passing through the solar system once.
(Illustration: hyperbolas of interference.)
Hyperbolas A hyperbola is the only conic section with two separate branches, which
come from slicing through both halves of the double cone. Like their cousins the
ellipses, hyperbolas have two foci and two directrices. Again the two foci provide an
alternative characterization. If the distances from a point on the curve to the two foci
are a and b, then the numbers a − b and b − a are fixed for all points on the curve
(swapping them over moves between the two branches). For this reason, hyperbolas
occur as interference patterns in waves. If you drop two pebbles into a pond, the two
sets of circular ripples will interfere in a family of hyperbolas.
All hyperbolas have two asymptotes. In the case of the most famous hyperbola, xy = 1 (or equivalently y = 1/x), the asymptotes are the x- and y-axes. Of course these are at right angles to each other, making this a rectangular hyperbola (which always has eccentricity √2).
Not every curve has asymptotes; ellipses and parabolas do not, for example.
Newton’s cubics The three types of conic section are the curves given by quadratic
equations. Curves of higher degree come in a far greater variety than three. Some of
them can be more neatly expressed in polar coordinates.
The most important cubics are the elliptic curves, which remain of central importance
to mathematics today. Not every cubic curve is elliptic, but Newton showed that all
cubic curves can be constructed by suitably squashing or stretching an elliptic curve.
Quadric surfaces In 2-dimensional space, the simplest curves are straight lines (given
by linear equations), followed by the conic sections: curves on the plane defined by
quadratic equations. Looking at surfaces in 3-dimensional space, linear formulas define
flat planes, and quadratic formulas produce the family of quadric surfaces. These are
essentially obtained by lifting the conic sections into three dimensions, in different
ways. The simplest versions are cylinders formed from ellipses, parabolas and
hyperbolas, obtained by building straight walls on top of these curves. The equations
for these are the same as for the original curves, with the z-coordinate free to take any
value.
After rotating and centring the surface, many quadrics are given by an equation of the
form Ax² + By² + Cz² = 1. If A, B and C are all positive, an ellipsoid is defined. If
two are positive and one negative, a one-sheeted hyperboloid is defined, and if two are
negative and one positive, a two-sheeted hyperboloid is defined.
The remaining cases are elliptic cones (cones with elliptical cross-sections) and pairs
of planes.
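A minimal sketch of that sign-based classification (only the cases named above; other sign patterns are reported as degenerate):

```python
def classify_quadric(a, b, c):
    # Classify Ax^2 + By^2 + Cz^2 = 1 by the signs of A, B, C.
    positives = sum(coef > 0 for coef in (a, b, c))
    negatives = sum(coef < 0 for coef in (a, b, c))
    if positives == 3:
        return "ellipsoid"
    if positives == 2 and negatives == 1:
        return "one-sheeted hyperboloid"
    if positives == 1 and negatives == 2:
        return "two-sheeted hyperboloid"
    return "other (empty or degenerate case)"

print(classify_quadric(1, 2, 3))       # ellipsoid
print(classify_quadric(1, 1, -1))      # one-sheeted hyperboloid
print(classify_quadric(1, -1, -1))     # two-sheeted hyperboloid
```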
A special case is the spheroid, where the cross-sections in one direction are all circles.
We have known that the earth is approximately spheroidal since Isaac Newton’s work
on universal gravitation. However, in the 18th century, there was some debate about
whether it forms a stretched sphere (a prolate spheroid, like a rugby or American
football) as the French astronomers Giovanni and Jacques Cassini believed, or a
squashed sphere (an oblate spheroid) as Newton himself held. Further measurements
proved Newton correct, although smaller celestial bodies can form prolate spheroids.
One such is the dwarf planet Haumea.
The second form is the saddle-shaped hyperbolic paraboloid, famous as the shape of
Pringles snacks, and also used for roofing in modern architecture. The archetypal
hyperbolic paraboloid is described by z = xy.
Most applications involve one-sheeted circular hyperboloids, where the elliptical cross-sections are
actually circles. (Often ‘hyperboloid’ is understood as short-hand for ‘one-sheeted
circular hyperboloid’.) As these are ruled surfaces, they are easily constructed. Take
two identical circular rings, and join the corresponding points with wires. Pulling them
apart will form a cylinder. Twisting turns this into a hyperboloid. Since this surface is
doubly ruled, two sets of wires can simultaneously be straightened. Being constructible
from straight beams, hyperboloids have been popular in art and architecture ever since
Vladimir Shukhov built a hyperboloid water tower, in 1896.
Surfaces of revolution Rotating a straight line about a parallel axis produces a circular cylinder. If the two
lines are not parallel on the plane, then the result is the double cone that encapsulates
the conic sections. Starting with two skew lines in 3-dimensional space (that is, two
lines which are not parallel, but do not cross), the surface swept out is a circular
hyperboloid of one sheet. Rotating an ellipse about one of its axes produces a spheroid,
while a parabola gives a circular paraboloid, and a hyperbola produces a circular
hyperboloid. More intricate curves can produce very beautiful surfaces of revolution, a
fact long exploited by potters and sculptors.
Ruled surfaces A plane is completely built from straight lines: the straight lines on the
surface together totally cover it.
More surprising is that there are other, apparently curvier, surfaces with the same
property. Cylinders are just obtained by building straight walls along curved paths.
Cones are also ruled surfaces. Another famous example is the helicoid, swept out by a
straight line spiralling down a vertical axis, reminiscent of a ramp in a multi-storey
carpark (garage).
There are three doubly ruled surfaces, where every point lies on two straight lines.
These are the plane, the circular single-sheeted hyperboloid, and the hyperbolic
paraboloid.
An interesting class of surfaces are the superquadrics. These mimic the quadric surfaces, replacing the x² terms with higher powers. For example, a spheroid is a surface of revolution of an ellipse. An ellipse has equation |x/a|² + |y/b|² = 1. By replacing the squares with some higher power n, we obtain a superellipse, given by |x/a|ⁿ + |y/b|ⁿ = 1.
POLAR COORDINATES
Polar coordinates The system of Cartesian coordinates identifies a point on the plane through its distances from a pair of perpendicular axes. An alternative is to use polar coordinates; these also involve two pieces of information: a distance and an angle, usually denoted r and θ respectively. The distance states how far the point is from the origin: so, in the diagram, the points A and B (with Cartesian coordinates (1, 0) and (0, 1) respectively) are each 1 unit away from the origin, as is C with Cartesian coordinates (1/√2, 1/√2), since (1/√2)² + (1/√2)² = 1.
Distance alone cannot distinguish between these, so we also provide the angle that the point makes at the origin, against the polar axis. This is the horizontal line starting at the origin and going right (the positive x-axis in Cartesian coordinates). So the point A which sits on this line has angle 0°. In mathematics a positive angle is always anticlockwise, so the point C has angle 45° and B has 90°.
Usually, however, we measure the angles in radians: so, writing the distance before the angle, the polar coordinates of A, B and C are (1, 0), (1, π/2) and (1, π/4) respectively. The point D is 2 units away from the origin, at an angle of 270°, so it has polar coordinates (2, 3π/2).
Polar geometry Polar and Cartesian coordinates are two different languages for talking about the same objects. Anything which can be said in one can be translated into the other. Nevertheless, polar coordinates efficiently describe certain geometrical shapes in the plane. For instance, the circle of radius 1 has a simple formula: r = 1. This describes the set of points of the form (1, θ), each of which is one unit away from the origin. Similarly, fixing the angle at π/4 and letting r vary produces a straight line at that angle to the horizontal. This is described by the equation θ = π/4.
Other examples of shapes that are well described by polar coordinates are Archimedean and logarithmic spirals, and cycloids.
Polar coordinates are ubiquitous in complex analysis: every complex number z comes equipped with a distance r (its modulus) and an angle θ (its argument). These are tied together in the formula z = re^iθ.
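Converting between the two coordinate systems is mechanical (a sketch using Python's standard library; math.atan2 returns the angle in radians):

```python
import math

def to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def to_polar(x, y):
    return (math.hypot(x, y), math.atan2(y, x))

print(to_cartesian(1, math.pi / 4))    # (0.7071..., 0.7071...): the point C above
print(to_polar(0, 1))                  # (1.0, 1.5707...): the point B, angle pi/2
```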
Archimedean spirals Polar coordinates are perfect for describing spirals, where a point’s distance from the origin depends on some quantity of rotation. The simplest case is the Archimedean spiral, given by the equation r = θ. This consists of all points whose first and second polar coordinates agree: those of the form (θ, θ). When θ = 0, the length r is also 0, which happens only at the origin. When θ = π/4 (that is, 45°) the length is π/4 too (around 0.8). When θ = π/2, the length is π/2 (around 1.6), and so on. Once θ gets up to 2π, the spiral has performed one revolution, and crosses the polar axis. But we can continue plotting larger values of θ, until it crosses the axis again at 4π (720°), and again at 6π, 8π, 10π, and so on.
The spiral can be expanded or contracted by including a multiplying constant: r = 2θ defines one twice as sparse as that above, and r = θ/2 one twice as dense.
The Parker spiral is an Archimedean spiral formed by the sun’s magnetic field, as it
permeates space.
Logarithmic spirals The logarithmic spiral, or spira mirabilis (‘the miraculous spiral’), has the polar equation r = e^θ (or equivalently θ = ln r). Starting at θ = 0, e^0 = 1, so the curve crosses the polar axis at 1. It crosses again at e^2π (around 535.5), e^4π (around 286751.3), and so on. It also makes sense to spiral back the other way, by allowing θ to take negative values. So it also crosses the axis at e^−2π (about 0.002), e^−4π (around 0.000003), and infinitely often as it winds ever more tightly towards the origin (although, unlike the hyperbolic spiral, the distances decrease so quickly that the length along the curve from any point to the origin is finite).
Jakob Bernoulli was stunned by the fractal-like self-similarity of the logarithmic spirals: if you enlarge or shrink it by a factor of e^2π, the result is exactly the same curve. Even more, if you take the inverse of the spiral (given by r = e^−θ) the result is
again the same. Bernoulli was so besotted by this curve that he instructed that one be
engraved on his tombstone. (Unfortunately the stonemason was no geometer, and
carved an Archimedean spiral instead.)
The logarithmic spiral is also known as the equiangular spiral because of another defining property: the angle between the tangent and the radius is constant, at π/4 (that is, 45°). Logarithmic spirals with different angles are given by the equation r = e^cθ, for different numbers c, producing an angle of tan⁻¹(1/c).
The problem of the four mice In 1871, the astronomer and mathematician Robert
Kalley Miller set a tricky problem in the notorious mathematical tripos exam at
Cambridge University. It concerned four mice A, B, C and D, which start at four
corners of a square room. They are released simultaneously and all run at the same
speed, with A chasing B, B chasing C, C chasing D, and D chasing A. The problem
was to predict the paths they will follow.
Initially each mouse runs along the wall. But, as its target also moves, it deviates from
that path. The answer is that their paths produce four intertwined logarithmic spirals
converging at the centre of the room. This generalizes to rooms of other polygonal
shapes. In 1880 Pierre Brocard considered three mice in an irregular triangular room.
The three spirals meet at the first or second Brocard points of the triangle, depending
on their direction (see centres of triangles).
Roses Why might a superstitious mathematician think the equation r = cos 2θ lucky? In polar coordinates, it describes a four-leafed clover: a quadrifolium. This is one of the family of rose curves, given by the equations r = cos kθ (or r = sin kθ) for different values of k. Roses were first studied by the Italian priest Luigi Guido Grandi in the early 18th century.
The curve depends on the number k. If k is odd, the rose has k petals. So the equation r = cos 3θ describes a 3-petalled rose, or trifolium. When k is even, the rose has not k but 2k petals. It also makes sense to consider non-integer values for k. This time the petals will overlap. If k is a rational number, say k = a/b, with a and b coprime, again there are two cases: if a and b are both odd, the rose will have a petals, and a pattern which starts to repeat when θ reaches bπ (a period of bπ). Otherwise it will have 2a petals and a period of 2bπ. When k is irrational (such as k = √2), the curve never closes up.
The tautochrone problem In 1659, Christiaan Huygens was considering beads sliding
down slopes. Assuming zero friction, he discovered a remarkable curve which is
tautochronous (‘of the same time’). No matter how far up he put the bead at the start, it
always took exactly the same time to slide to the bottom. Huygens’ curve was a cycloid.
If you draw a spot on a bicycle tyre, this is the path the spot follows as you cycle
along. However, the tautochrone cycloid is the other way up, as if you are cycling
across the ceiling.
The brachistochrone problem In 1696, Johann Bernoulli set the readers of Acta Eruditorum a challenge. Suppose we have two points
A and B, with A higher than B (but not vertically above it). The idea is to draw a curve
from A to B, and let a bead slide down. What curve should we use, if we want the bead
to reach B in the shortest possible time? This is the brachistochrone (‘shortest time’)
problem.
Several mathematicians were able to provide the answer, including Newton, Leibniz,
Bernoulli himself, as well as his brother Jakob. The solution was the same as for the
tautochrone problem: a cycloid.
Cycloids Draw a straight horizontal line, and roll a circle along it. If you mark one
point on the circle, the curve traced out is a cycloid, famous as the solution of the
tautochrone and brachistochrone problems. A cycloid is described parametrically in
Cartesian coordinates by:
x = t − sin t
y = 1 − cos t
As t increases, the centre of the circle moves horizontally along the line y = 1. So, at time t, the centre is at (t, 1), with the curve cycling around it.
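A minimal sketch generating points of the cycloid from its parametric equations (for a rolling circle of radius 1):

```python
import math

def cycloid_point(t):
    # Point traced by a mark on a circle of radius 1 rolling along the x-axis.
    return (t - math.sin(t), 1 - math.cos(t))

for t in (0, math.pi / 2, math.pi, 2 * math.pi):
    x, y = cycloid_point(t)
    print(round(x, 3), round(y, 3))
# (0, 0): a cusp on the ground; (pi, 2): the top of the arch; (2*pi, 0): the next cusp
```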
Hypocycloids and epicycloids are similar constructions, with circles taking the place of
the straight line.
Hypocycloids and epicycloids If a small circle rolls around the inside of a larger circle,
we can mark one point on the small circle and trace out its path. The resulting curve is
a hypocycloid. If the outer circle has double the radius of the inner one, they are called
a Tusi couple. As Nasir al-Din al-Tusi realized in the 13th century, this hypocycloid is
just a straight line segment. If the radius of the larger circle is triple that of the smaller, we obtain a three-pointed deltoid. If it is quadruple, we obtain a curve with four cusps (sharp corners), and so on.
A similar construction is an epicycloid, where one circle rolls around the outside of
another. A notable epicycloid is where the two circles are the same size. In this case,
the result is a cardioid. If the radius of the outer circle is half that of the inner, the
result is a nephroid (‘kidney shape’).
For hypocycloids and epicycloids, the key is the ratio of the larger circle’s radius to
that of the smaller. Call this k. If k is a whole number, we get a curve with k cusps. If k
is a rational number, say k = a/b, with a and b coprime, the curve will self-intersect and have a cusps. If k is irrational,
the curve never closes up, and will produce an inky mess.
Roulettes The spirograph is a mathematical toy, invented by Denys Fisher in the 1960s
for creating beautiful, intricate patterns. Probably it appeals as much to geometers as to
children. The spirograph relies on a principle similar to that of cycloids, hypocycloids
and epicycloids: a small plastic disc is rolled along a line, or around a fixed larger
circle. The difference is where the pen is placed: rather than being on the perimeter of
the smaller circle, it fits in a hole somewhere inside the disc. The resulting curves are
called trochoids, hypotrochoids and epitrochoids (coming from the Greek trochos
meaning ‘wheel’).
This idea can be extended to allow the pen-point to be outside the smaller disc, a fixed
distance from the centre (as if on a matchstick glued to the disc).
Roulettes are the most general curves formed in this way, obtained by attaching a point
to a curve (not necessarily on it) and ‘rolling’ that curve along another, tracing out the
path of the point. For example, rolling a parabola along a straight line and tracing out
the path of its focus produces a catenary.
Catenary Fix the two ends of a chain to a wall, and let the chain hang down in between.
What curve will be formed? That was the question posed by Jakob Bernoulli in Acta
Eruditorum in 1690. (We assume the chain is infinitely flexible and of even density.)
Galileo Galilei had already considered this in 1638.
He said the curve was a parabola. But Galileo was wrong, as Joachim Jungius
demonstrated in 1669.
Bernoulli received three correct solutions to his challenge: from Gottfried Leibniz,
Christiaan Huygens and his brother Johann Bernoulli. Their result was one of the early
triumphs of differential calculus. The curve which answers the question is a catenary
(chain curve), given by the equation y = (e^x + e^-x)/2.
DISCRETE GEOMETRY
Pick’s theorem There are many ways to calculate the area of a shape on the plane.
Generally, as the figure becomes increasingly complicated, so does the formula for its
area. An elegant way around this was found by the Austrian mathematician Georg
Pick, in 1899.
Create a grid of dots by placing one at every point on the plane whose coordinates are
both integers. Pick’s method applies to any shape which can be formed by joining
these dots with straight lines. There are only two ingredients: the number of points on
the boundary of the shape (call this A), and the number of points enclosed within it
(B). Pick’s theorem states that the area is A/2 + B − 1. A shape with 14 points on its boundary and 11 points inside, for example, has area 7 + 11 − 1 = 17 square units. This provides a fast method to calculate the areas of complicated
shapes which would otherwise involve dividing the shape into triangles, in a rather
longwinded way.
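A sketch of the formula in plain Python, using the labels A and B as above (the rectangle chosen for the check is an illustrative example):

def pick_area(A, B):
    # Pick's theorem: A points on the boundary, B points strictly inside.
    return A / 2 + B - 1

# A 4 x 3 lattice rectangle has A = 14 boundary points and B = 6 interior points:
print(pick_area(14, 6))   # 12.0, agreeing with the familiar area 4 x 3 = 12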
Thue’s circle packing Suppose you have a table and a bag of coins, and are challenged
to fit as many coins on the table as you can. All the coins are the same size and may
not be piled up, just laid flat on the table-top. What is the best strategy?
There are two obvious candidates: square packing, where the coins lie in vertical and
horizontal rows, each coin touching four others; and hexagonal packing, where each
coin touches six others (leading to staggered rows).
To compare them we use the packing density: the proportion of the area of the table
covered by coins. The packing density of the square lattice is π/4 (about 79%), while that of the hexagonal packing is π/√12 (about 91%). So the hexagonal packing seems to be better. But can we be certain
we have not missed another, better arrangement?
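The two candidate densities themselves are easy to compute (a small Python check; the closed forms π/4 and π/√12 are the standard ones):

from math import pi, sqrt

square_density = pi / 4            # each coin of radius r occupies a 2r x 2r square of table
hexagonal_density = pi / sqrt(12)  # equivalently π / (2√3)

print(round(square_density, 4))     # 0.7854, roughly 79%
print(round(hexagonal_density, 4))  # 0.9069, roughly 91%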
In 1831, Carl Friedrich Gauss proved that the hexagonal packing is indeed the tightest
of the regular packings. These are symmetrical packings built on lattices: patterns with
double translational symmetry. But could some
strange irregular arrangement do better? In 1890, Axel Thue finally proved that there
was no such arrangement. Thue’s circle packing theorem is the 2-dimensional
analogue of the Kepler conjecture.
The Kepler conjecture Kepler considered how to stack identical spheres, such as cannonballs, as tightly as possible: lay down a flat layer, settle the next layer of spheres into the hollows of the first, and carry on upwards in the same way. In what came to be known as the Kepler conjecture, Kepler asserted that this is ‘the tightest possible, so that in no other arrangement could more pellets be stuffed into the same container’. On closer inspection, the solution is not unique: there are two choices, depending on whether you place the third layer directly over the first (the hexagonal packing) or staggered (the face-centred cubic packing). However, both have the same packing density: π/√18 (about 74%).
Intuitively, it seemed obvious enough that these should be the optimal solutions. But by 1900 no proof had been found, and David Hilbert highlighted the challenge in his 18th problem.
It was not until 1998 that Thomas Hales, with his graduate student Samuel Ferguson, finally produced a proof. However, at 250 pages long, and with large sections relying on computer code and data amounting to over three gigabytes, it posed a massive challenge to check. After four years of work, the team who had been assigned the task gave up, stating that while they were 99% sure the proof was correct, they were unable to fully certify it. Hales is now working on a second-generation proof, intended to be verifiable by proof-checking software.
Hypersphere packing The Kepler conjecture was solved (at least with 99% certainty) in 1998. But the same question in higher dimensions remains open. It is not even certain that the closest packing of hyperspheres must be a regular lattice, rather than some less symmetrical arrangement. The regular packings, at least, are well understood up to dimension eight.
Although we do not know what the best packings of hyperspheres are, thanks to the Minkowski–Hlawka theorem we do have some idea of what packing density they will produce. This says the optimal packing in n dimensions will always achieve a density of at least ζ(n)/2^(n−1), where ζ is the Riemann zeta function.
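A rough Python sketch of this lower bound, approximating the zeta function by a partial sum (the function names are illustrative):

def zeta(n, terms=100_000):
    # Crude partial sum of the Riemann zeta function; adequate for n >= 2.
    return sum(k ** -n for k in range(1, terms + 1))

def minkowski_hlawka_bound(n):
    # Guaranteed density achievable by some packing in n dimensions: ζ(n) / 2^(n−1).
    return zeta(n) / 2 ** (n - 1)

for n in (2, 4, 8):
    print(n, round(minkowski_hlawka_bound(n), 5))
# Dimension 8 gives roughly 0.00785, far below the densities of the best known lattices.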
The hexagonal honeycomb conjecture (Hales’ theorem 2) Suppose you want to divide up a large sheet of paper into cells of area 1 cm², using the least possible amount of ink to draw the lines. Probably the first attempt would consist of 1 × 1 squares. But there are other possibilities: rectangles, tessellating triangles, irregular pentagons, the Heesch tile, or any other tessellation
which uses just one tile.
For many years the one which involved the lowest total line-length was believed to be
the hexagonal lattice. Indeed, this was often tacitly assumed as a fact, even though no-
one had proved it. In 1999, Thomas Hales did provide a proof: the hexagonal pattern is
indeed the most efficient method.
This explains why bees pack their honey in hexagonal tubes rather than those with
square or Heesch tile cross-sections, for example. These tubes require the least amount
of wax in proportion to their volume of any system of identical tessellating tubes. An
interesting appendix to the story concerns what the bees don’t know.
What bees don’t know In 1953, László Fejes Tóth published a paper ‘What the bees
know and what they do not know’, in which he showed that the bees’ honeycomb
design is not quite optimal, despite their exploitation of the hexagonal honeycomb.
Honey cells are open at one end. At the closed end, two layers of hexagonal tubes are
separated by a wax partition, consisting of four rhombuses closing off each tube. Fejes
Tóth showed that less wax would be needed for a partition comprising two hexagons
and two small squares. However, Fejes Tóth’s design used only 0.35% less wax than the
bees’. The compensating benefits of ease of construction and the stability of the hive
may explain the bees’ choice.
Kelvin’s conjecture Which 3-dimensional shape has the lowest ratio of surface area to
internal volume? The answer is the sphere, which explains why soap bubbles are
round.
The question becomes more complicated when we want more than one cell. This is a
question Lord Kelvin considered in 1887: how to divide up
3-dimensional space into cells of the same size, using a partition of minimal surface
area? This is a 3-dimensional analogue of the hexagonal honeycomb conjecture, and
Kelvin believed that he had found the optimal arrangement, in what is now called a
Kelvin cell, essentially a truncated octahedron (a space-filling Archimedean solid),
with slightly curved faces.
Denis Weaire and Robert Phelan refuted Kelvin’s conjecture by finding a new
structure which improved upon the Kelvin cell, by 0.3%. Their repeating unit is
constructed from eight irregular, slightly curved polyhedrons: two dodecahedra (with
12 pentagonal faces), and six 14-hedra, each with two hexagonal and 12 pentagonal
faces. This discovery was celebrated in the architecture of the aquatic centre at the
Beijing Olympics, in 2008. The architect Kurt Wagner explained that they created a
huge Weaire–Phelan foam, and then ‘cut the overall shape of the building out of the
foam structure’. It is not known whether Weaire-Phelan foam provides the ultimate
solution to Kelvin's problem.
The map colouring problem How many colours do you need to colour a map so that no
country borders another of the same colour? This was the question that the British
lawyer and mathematician Sir Alfred Kempe addressed in 1879. The question did not
concern the geography of the real world: any arrangement of shapes whose ‘countries’
came in one connected piece (unlike the USA, in which Alaska and Hawaii form
disconnected parts) was classed as a ‘map’. Countries meeting at a single point are
allowed to have the same colour, only ones separated by boundary lines may not.
Kempe’s solution was that every conceivable map drawn on a sphere ‘can in every
case be painted with four colours’. Unfortunately for him, in 1890 Percy Heawood
found a problem. This did not mean that four colours could not work, but it did identify a
fatal gap in Kempe’s argument. Extending his ideas, however, Heawood was able to
prove that five colours would always be enough.
The four colour theorem For over 80 years the map colouring problem stood open: no-
one was able to prove that four colours are always enough, nor could anyone construct
a map which required five. That was until 1976 when Kenneth Appel and Wolfgang
Haken of the University of Illinois published their proof that four colours are indeed
always sufficient. Their proof did not rely on mathematical ingenuity alone. It was a
mammoth effort, requiring 1000 hours of computer time, and containing 10,000
different diagrams. Appel explained: ‘There is no simple elegant answer, and we had
to make an absolutely horrendous case analysis of every possibility’.
DIFFERENTIAL GEOMETRY
Gaussian curvature If we have a surface, we want some way to measure how curved it
is. Of course, it may have some flat parts and some highly curved parts. So curvature is
a local phenomenon. Gaussian curvature is a device which uses differential calculus to
measure the curvature at a particular point, x. It produces a number K (x). It works by
placing an arrow coming out of the surface at x, perpendicular to the surface. This is a
normal vector. If K (x) < 0, then the surface bends towards the normal vector in one
direction, and away from it in another. In other words, the surface is saddle-shaped at
x. An example is a one-sheeted hyperboloid, which has negative curvature everywhere.
A sphere is an example of a surface with positive curvature everywhere (so K (x) > 0
at every point). Starting at our chosen point x, all directions of the surface bend in the
same direction, relative to the normal vector.
As expected, a surface may have positive curvature in one place, negative in another
and zero at a third. There is a limit to this variety, however, given by the Gauss–
Bonnet theorem.
Developable surfaces If the Gaussian curvature at a point x is zero, that is, K(x) = 0, it
does not follow that the surface is completely flat, like a plane. It means there is at
least one direction in which the surface is flat. A cylinder has zero curvature at every
point, as does any surface which can be unrolled onto a plane without distortion. These are
the developable surfaces. Other examples are a cone and a developable helicoid.
Theorema Egregium If I draw a straight line on a piece of paper, and ask if my line is
horizontal, your answer would not just depend on the line. It would have to take into
account the position of the line in relation to the page or the floor. Qualities such as
being horizontal are not intrinsic to a geometric object, but depend on its relationship to
the ambient space. Gauss’ Theorema Egregium says that Gaussian curvature is not like
this. It is an intrinsic property of the surface and does not depend on the ambient space.
Gauss thought this result ‘remarkable’ because his original definition does make heavy
use of the ambient conditions.
Local and global geometry There are several ways to view a mathematical surface.
One is to pay close attention to its detailed geometry in small regions. Curvature
belongs to this local world. The Euler characteristic, on the other hand, belongs to
another realm, namely topology. Here, the object is considered globally, and changes
in small regions are irrelevant. In a wonderful development, these two phenomena are
nevertheless intimately related by a fundamental result of 19th- century geometry,
attributed to Carl Friedrich Gauss and Pierre Bonnet: the Gauss–Bonnet theorem.
Gauss–Bonnet theorem If we are working on a surface (S) with finite area, and no
edges, then integration provides a way to take local data at each point of S and average
it all out, to give one global piece of information about the whole surface. The Gauss–
Bonnet theorem says that when you do this with the Gaussian curvature K, what
emerges is none other than the Euler characteristic χ(S) (multiplied by a constant 2π):
∫S K dA = 2π χ(S)
If you bend and pull the surface, you can dramatically alter the curvature at every
single point, but these changes all cancel each other out.
Every smooth surface which is a topological sphere (such as the surface of a banana or
a frying pan) has Euler characteristic 2. So when you integrate the Gaussian curvature
over the whole surface, you will always get the answer 4π.
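A numerical illustration in Python, this time for a torus rather than a sphere (the curvature and area-element formulas used here are the standard ones for a torus of radii R and r, which the text does not give):

from math import cos, pi

def integrated_curvature_torus(R=2.0, r=1.0, n=200):
    # Integrate K over the torus parametrized by angles u, v in [0, 2π):
    #   K(u, v) = cos v / (r (R + r cos v)),  dA = r (R + r cos v) du dv.
    du = dv = 2 * pi / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * dv
        K = cos(v) / (r * (R + r * cos(v)))
        dA = r * (R + r * cos(v)) * du * dv
        total += n * K * dA   # the integrand is independent of u, so each u-strip contributes equally
    return total

print(round(integrated_curvature_torus(), 6))   # 0.0, which is 2π × χ(torus), since χ = 0

(For a sphere of radius 1, K = 1 everywhere and the surface area is 4π, so the integral is 4π, matching 2π × 2 as stated above.)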
Geodesics Everyone knows that the shortest path between two points is a straight line.
But what happens if we’re working on a surface, such as a sphere, which does not
contain any straight lines? In the specific case of a sphere, there is a nice answer: the
roles of straight lines are played by great circles. These are the largest circles the
sphere can hold (obtained by slicing a plane through the centre of the sphere).
In general, the curves which make up the shortest paths on a surface are called its
geodesics. Shortest paths do not always exist. For example, if we take the ordinary
plane, and remove the point (0, 0), then there is no shortest route in the punctured
plane between the points (−1, 0) and (1, 0); every route can be shortened by bypassing
the hole more closely.
However, geodesics always exist locally. In other words, at any point on a surface
there are geodesics leaving it, in every direction. On a cylinder, there are three types of geodesic: straight lines running along its length, circles running around it, and helices spiralling around it.
Maps of the earth A flat map of the round earth cannot faithfully preserve every geometric feature of the globe; different projections make different compromises:
1 Isometric maps should preserve the distance between any two points. Unfortunately a
flat map can never do this.
2 Equiareal functions keep areas proportionately the same. An example is the Albers
projection, which is formed by cutting a cone through the globe, and projecting the
points onto it.
3 Conformal functions preserve angles. So two lines which meet on the earth will
meet at the same angle on the map. It is possible to draw a conformal map of the earth,
for example by stereographic projection.
4 The gnomonic projection preserves the shortest route between any two points (but
not the length of that route). This means that it represents great circles on the sphere as
straight lines on the page. (Only one hemisphere can be represented.)
5 A rhumb line is a path spiralling around the sphere determined only by an initial
bearing. These were historically important in navigation. Mercator projections are
maps which represent rhumb lines as straight lines. They are formed by wrapping the
globe in a cylinder, and projecting the points outward.
No flat map can have every one of these features; some distortion is inevitable; the
question for the cartographer is where the priorities lie.
Isometric maps of the earth A map of the earth would be isometric if the distance
between any two points on the earth was the same as that given by the map (suitably
scaled down, of course). However, isometric maps, even of portions of the earth, are
impossible on flat paper. This is a consequence of the Theorema Egregium. Because
curvature is intrinsic, it must be preserved by any isometric function (such as one
taking a point on the earth to a point on the map). Consequently, any isometric map of
the earth has to be curved like a globe, not flat. For a small region, such as a city, this
does not create a problem as the earth is approximately flat.
There are ways to approximate isometric maps for the whole globe, such as the orange-
peel map, but fundamentally, the obstacle is immovable.
Stereographic projection Stereographic projection is a way to draw a map of a sphere
on a flat plane. The idea is as follows: place the sphere on the plane. Then, given a
point x on the sphere, send a beam from the north pole through the sphere, passing
through x. The place where that beam crosses the plane is where x is marked on the
map. The only point which gets missed off the map is the north pole. The resulting
map is conformal, that is, angles on the sphere match the corresponding angles on the
map.
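A small Python sketch of this construction, assuming for concreteness a unit sphere resting on the plane z = 0 with its north pole at (0, 0, 2) (these coordinates are an assumption made for illustration):

def stereographic(x, y, z):
    # The beam from the north pole (0, 0, 2) through the point (x, y, z) of the sphere
    # meets the plane z = 0 at the following point of the map.
    return 2 * x / (2 - z), 2 * y / (2 - z)

print(stereographic(0, 0, 0))   # (0.0, 0.0): the south pole lands at the point of tangency
print(stereographic(1, 0, 1))   # (2.0, 0.0): a point on the equator lands at distance 2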
TOPOLOGY
Topology Whereas traditional geometry focuses on rigid objects such as straight lines,
angles or curves given by precise equations, topology studies structures at a higher
level of abstraction. A shape’s topological properties are those which can survive any
amount of stretching and twisting (but not cutting or gluing). For example a cube can
be squashed until it is spherical, but a torus (or donut) cannot. So, topologically, a
sphere and a cube are equivalent, but a torus is different. Similarly the letter C is
topologically equivalent to L, but distinct from B.
Dubbed ‘rubber sheet geometry’, topology grew into a subject in its own right in the
early 20th century, although its roots go back to Leonhard Euler’s solution of the Seven
Bridges of Königsberg problem, in 1736. Topology now comprises several large
bodies of research. The London underground map is an example of a topological
representation, since it ignores the precise geometry of distances and directions, but
accurately represents factors such as the ordering of stations, and the intersections of
tube (subway) lines.
Möbius strip Take a rectangular strip of paper, give it a half-twist before gluing the
ends together, and you will obtain a Möbius strip. Discovered in 1858 by August
Ferdinand Möbius, the interest of this object is that it only has one side (or in
mathematical terms is non-orientable). Möbius strips featured heavily in the work of
the artist M.C. Escher. They are mathematically important because many other
topological constructions depend on them: notably the real projective plane and Klein
bottle. Both can be built by gluing the edge of a Möbius strip to itself, in one of two
ways.
Orientable surfaces If you look at just a small portion of a sphere, it looks very much
like a 2-dimensional plane. (Indeed, this is the perspective of most human beings for
most of their lives.) It is only on zooming out that its global spherical nature becomes
clear. This is essentially the definition of a surface: around every point is a patch of
plane (perhaps slightly curved). The question is what are the possible global shapes
that can be patched together this way? The whole plane itself is one example; but much
focus is on closed surfaces, which have a finite area (and come in one piece).
The closed orientable surfaces form an infinite family: the sphere, the torus, the double torus, and so on, adding one handle at a time.
Non-orientable surfaces The definition of a surface is a local one: something built from
small patches of 2-dimensional plane. Sometimes, mathematics throws up unexpected
possibilities. We can patch together pieces of plane in a way which is internally
coherent, but which cannot accurately be represented in 3-dimensional space. These
are the non-orientable surfaces. Technically, a surface is non-orientable if a watermark placed on it can be slid around until it arrives back in its starting position, but as a mirror image of the original. This means that the surface must
contain a Möbius strip somewhere.
So a way to produce non-orientable surfaces is to start with a sphere, cut slits in it and
‘sew in’ Möbius strips along their edges. The first two non-orientable surfaces
obtained this way are the real projective plane and the Klein bottle.
Start with a square of paper, and draw arrows along the left and right edges, both
pointing up. If you glue these edges together, with the arrows matching, you will create
a cylinder.
Start again, this time drawing the arrows in opposite directions. Gluing the edges
together with the arrow heads touching now creates a Möbius strip.
To arrive at a surface without edges, we’ll also need to join the two remaining sides:
take the configuration for the cylinder, and add a new pair of arrows on the top and
bottom, both running from left to right, and you have the pattern for a torus. The
sphere can also be created this way (of course two cones joined at their bases form a
sphere, topologically speaking).
The two remaining possible configurations represent the real projective plane and the
Klein bottle. If you try to build these shapes out of paper, you’ll find that it can’t be
done in 3-dimensional space without the paper cutting through itself. In a true Klein
bottle this does not happen. Nevertheless, making this allowance, some very beautiful
models of the Klein bottle and real projective plane can be built. The real projective
plane is an important example of a projective space.
What if we started with a torus, and sewed a Möbius strip into that? In a piece of
mathematical magic, it turns out that this gives us nothing new: this non-orientable
surface is the same as a sphere with three Möbius strips sewed in.
One of the first significant results in topology, the classification of closed surfaces
(formalized by Brahana in 1921 based on the earlier work of others), says that there are
no other surfaces. Every closed surface belongs to one of these two families.
This result can be made more precise using the Euler characteristic.
Euler’s polyhedral formula Start with a sphere, and mark some spots (or vertices) on it,
say V of them. Now join these together with some edges, E of those. (The fine-print is
that every vertex must have at least two routes away from it, and edges cannot end, or
meet, except at vertices.) This process will have divided the sphere into different
regions (or faces). Suppose there are F of those. The simplest example contains one vertex and one edge, which passes around the sphere once, dividing it into two faces: V = 1, E = 1 and F = 2.
In 1750, Leonhard Euler noticed that in every example the numbers of vertices, edges
and faces satisfy the formula: V − E + F = 2. (This is sometimes known just as Euler’s
formula: Euler was a man of many formulas.) He did not
provide a complete proof that this would always be the case, but in 1794 Adrien-Marie
Legendre did manage this.
This must be true for every surface which is topologically equivalent to a sphere too:
so the vertices, edges and faces of every polyhedron must obey this formula.
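A quick Python check of the formula for the five Platonic solids (their vertex, edge and face counts are standard):

platonic = {
    'tetrahedron':  (4, 6, 4),
    'cube':         (8, 12, 6),
    'octahedron':   (6, 12, 8),
    'dodecahedron': (20, 30, 12),
    'icosahedron':  (12, 30, 20),
}
for name, (V, E, F) in platonic.items():
    print(name, V - E + F)   # prints 2 every time, as Euler's formula requires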
Euler’s polyhedral formula does not hold for surfaces, such as a torus, which are not
topologically spherical. But we can apply the same technique, taking care that the
edges really divide the surface into faces which can be uncurled flat, rather than tubes.
For a torus, this can be done with one vertex and two edges, which produces one face.
So, in this case: V − E + F = 0. This new formula will hold true for any arrangement of vertices, edges and faces on a torus. Trying the same thing for the double torus, we find V − E + F = −2.
Similar results hold for non-orientable surfaces too. On the real projective plane, we
always get V − E + F = 1; on the Klein bottle, V − E + F = 0.
Euler characteristic The polyhedral formulas on surfaces say that if you have a surface,
however you divide it up with vertices and edges, the number V − E + F will always come out the same. This fixed number is called the Euler characteristic of the surface, symbolized by the Greek letter χ. So if S is a sphere, T is a torus, and D is a double torus, then χ(S) = 2, χ(T) = 0 and χ(D) = −2.
For orientable surfaces, if X is a sphere with g handles added (that is, X has genus g), then χ(X) = 2 − 2g. For non-orientable surfaces, if Y is a sphere with n Möbius strips sewn in, then χ(Y) = 2 − n.
The Euler characteristic alone cannot identify a closed surface (the torus and Klein
bottle both have Euler characteristic zero). But the classification of closed surfaces
implies that every surface is completely determined by two pieces of information:
whether or not it is orientable, and its Euler characteristic.
Alexander’s horned sphere In 1924, J.W. Alexander constructed a wildly distorted version of the sphere sitting inside ordinary 3-dimensional space, with an infinite sequence of interlocking horns. This came as a shock to topologists of the early 20th century, who believed that any
sphere should crisply divide 3-dimensional space into two simple parts: an inside and
an outside. But the outside of the horned sphere is fiendishly complex. You can draw
infinitely many different loops, which cannot slide from one to the other, because they
are irretrievably entwined in the horns.
Manifolds A surface is an object you can divide into patches that each look like a piece
of plane (otherwise known as 2-dimensional Euclidean space). The notion of a
manifold lifts this idea to higher dimensions. An n-manifold is an object which can be
divided up into patches that look like n- dimensional Euclidean space. (How close this
resemblance has to be is the difference between plain topology, and differential
topology.)
3-manifolds Just as the classification of closed surfaces describes every possible closed
2-manifold, so the more recent geometrization theorem fully classifies closed 3-
manifolds (closed meaning that the manifold has finite volume, and no edges).
In 1982, the Fields medallist William Thurston listed eight special types of 3-manifold
that, he thought, should be the elementary forms from which all others are built. He
described how to chop any closed 3-manifold into pieces, each of which, he believed,
should be of his eight types. Thurston’s eight manifolds correspond to different notions
of distance. The commonest are the Euclidean and hyperbolic ones, another is
spherical geometry. The remaining five come from certain Lie groups.
Thurston showed that many 3-manifolds satisfy his geometrization conjecture, but he
could not prove that all do. In 2003, Grigori Perelman did prove this, using highly
sophisticated methods from dynamical systems, and building on the work of Richard
Hamilton. The celebrated Poincaré conjecture followed as a consequence.
Simple connectedness If you draw a loop on an ordinary spherical surface, that loop
can be gradually contracted until it is just a single point. This is the definition of being
simply connected. This property distinguishes a sphere from a torus (donut) where a
loop encircling the hole can never be shrunk away. Other surfaces such as cubes are
simply connected too, but these are equivalent to spheres. So, topologically, the sphere
is the only simply connected 2-dimensional surface. The Poincaré conjecture tackled
the question of whether the same thing is true for 3-manifolds instead of surfaces.
First posed in 1904 by Jules Henri Poincaré, this conjecture (now elevated to the status
of a theorem) occupies a prominent place in modern topology. It had been known for
many years that the sphere is the only simply connected surface. Poincaré’s question
was whether the same was true when we step up one dimension. It was known that the
3-sphere (the 3-dimensional analogue of the usual sphere) is simply connected. The
missing ingredient was to rule out other undiscovered 3- manifolds which might also
be simply connected.
Poincaré’s conjecture states that the 3-sphere is indeed the only simply connected 3-
manifold. It attracted widespread attention within mathematics, defying numerous
attempts to prove it throughout the 20th century, and also in physics where it can be
interpreted as limiting the possible shape of the universe. In 2000, the conjecture was
listed as one of the Clay Institute’s $1,000,000 problems. It was finally proved in 2003
by Grigori Perelman as a consequence of the geometrization theorem. Perelman
declined both the prize money and the Fields Medal.
We might naïvely expect that the more dimensions we look at the more impenetrable
mathematics becomes. But this is not true; in many respects 3- and 4-dimensional
space is more awkward to analyse than that of higher dimensions.
The generalized Poincaré conjecture says that, in every dimension, the ordinary sphere
is the only homotopy sphere. The classification of closed surfaces had already given a
positive answer in the 2-dimensional case. In 1961 Stephen Smale proved that the
generalized Poincaré conjecture is true in all dimensions from 5 upwards, under an
additional hypothesis. This triumph won him a Fields
medal in 1966. In the same year, Max Newman was able to show that the extra
hypothesis was not needed, thereby completing the proof. This just left dimensions 3
and 4. The latter was resolved in 1982 by Michael Freedman (who was also awarded a
Fields medal in 1986). So Henri Poincaré’s original 3-dimensional conjecture was the
last to fall, as it did to Grigori Perelman in 2003.
Differential topology The topological definition of a manifold allows some fairly wild
objects to qualify. In differential topology, these requirements are tightened. Only
smooth manifolds are allowed, so pathologies such as the Koch snowflake and
Alexander’s horned sphere are eliminated. Similarly, the topological idea of two
manifolds being equivalent is quite coarse, allowing one to be pulled and twisted into
the shape of the other in pretty violent ways. A finer notion is that of differential
equivalence: two smooth manifolds are considered the same if one can morph into the
other smoothly (essentially in a way that can be differentiated).
But this raises a subtle possibility: two smooth manifolds could turn out to be
topologically identical, but differentially different. Another way to say this is that one
underlying topological manifold could support two incompatible smooth structures.
This phenomenon is difficult to imagine, not least because it does not actually happen
in dimensions 1, 2 or 3. (So the classification of closed surfaces and the geometrization
theorem for closed 3-manifolds both remain valid at the level of smooth manifolds.)
But in dimension 4 it does occur, in spectacular style.
Every science-fiction fan knows that the fourth dimension is a crazy place. In the
1980s differential topologists discovered that the truth is even stranger than fiction. In
dimensions 1, 2 and 3, the distinction between a manifold and a smooth manifold is not
especially important: every topological manifold can be smoothed, and smooth
manifolds which are topologically equivalent are also differentially equivalent.
On entering the fourth dimension, this cosy set-up crashes, badly. In 1983, using ideas
from Yang-Mills theory, Simon Donaldson discovered a large collection of 4-
manifolds which are unsmoothable: they do not admit any differential structure at all.
Worse, the simplest 4-manifold, 4-dimensional space itself (R^4), came under attack. Michael Freedman found a manifold which is topologically identical to R^4, but differentially different from it. In 1987, Clifford Taubes showed that the situation is even more extreme: there are uncountably infinitely many such manifolds, all differentially inequivalent. These are the exotic R^4s.
Exotic spheres The exotic R^4s are truly an anomaly: in every other dimension n, there is only one smooth version of Euclidean space (R^n).
In higher dimensions than 4, as in the fourth dimension, it does remain true that
smooth manifolds can be indistinguishable topologically but differentially different.
(However, unlike the 4-dimensional jungle, there can only ever be finitely many
incompatible manifolds in higher dimensions.)
This even happens with spheres: in 1956, by experimenting with the quaternions, John
Milnor discovered a bizarre new 7-dimensional manifold. Later he remembered, ‘At
first, I thought I’d found a counterexample to the generalized Poincaré conjecture in
dimension seven’. On closer inspection, this was not true. His manifold could morph
into a sphere, but it could not do so smoothly. Topologically it was a sphere but,
differentially, it was not. This was the first exotic sphere.
The smooth Poincaré conjecture in four dimensions In dimensions 1, 2 and 3 there are
no exotic spheres: smooth manifolds which are spherical from a topological
perspective, but not from a differential one. In 1963, John Milnor and Michel Kervaire
developed surgery theory: a powerful way to manipulate high-dimensional manifolds
by cutting and gluing. This technological leap allowed them to determine exactly how
many exotic spheres there are in every dimension of at least 5. The answer is that there
are none in dimensions 5 and 6. But the one that Milnor had discovered in dimension 7
was one of a family of 27. In higher dimensions the answers range from none to
arbitrarily large families.
But, as with the case of the exotic R^4s, 4-dimensional space is uniquely awkward. At
time of writing, it is not known whether there are any exotic spheres in dimension 4. It
is even conceivable that there could be infinitely many. The assertion that there are no
4-dimensional exotic spheres (that is, that every topological sphere is also differentially
a sphere), is known as the smooth Poincaré conjecture in four dimensions, and is
considered a very difficult problem.
KNOT THEORY
Mathematical knots The vortex theory of the atom was an idea in 19th-century physics.
It held that atoms were knots in some all-pervading aether. Although the theory was
short-lived, it prompted Lord Kelvin and Peter Tait to begin the mathematical
investigation of knots, which remains an active area of research today.
For a mathematician, a knot is a knotted piece of string, in which, importantly, the ends
have been fused to produce a knotted loop. When more than two pieces of string are
involved, it is called a link. According to the vortex theory, two knots represented the
same chemical element if they are essentially the same. Although this idea is long
discredited, the question it poses is a completely natural one. Being ‘essentially the
same’ is a topological concept: two knots are equivalent if one can be pulled and
stretched into the shape of the other (without cutting or gluing, of course).
The principal aim of knot theory is to find a method to determine whether two knots
are equivalent. This is a deceptively deep problem. The Perko pair illustrates how
difficult the problem is, even for comparatively simple knots. Even the simplest knot
of all, the unknot, a plain unknotted loop, can be cleverly disguised. Spotting this is
called the unknotting problem. The Haken algorithm provides a theoretical solution to
the knot problem, but the search for ever more powerful knot invariants goes on.
Knot tables In the 19th century, Peter Tait began listing all possible knots, according to
their number of crossings. By 1877, Tait had listed knots with up to seven crossings.
He was joined in his project by the Reverend Thomas Kirkman, a vicar from England,
and Charles Little in Nebraska, USA. Communicating by mail, they largely managed
to classify knots with eight, nine and ten crossings, and made some inroads into those
with 11.
Efforts continued throughout the 20th century but, as the lists grew longer and the
knots more complex, identifying those which are really the same became a formidable
challenge.
In 1998, Hoste, Thistlethwaite and Weeks published a paper entitled ‘The first
1,701,936 knots’, which was a full classification up to 16 crossings. Although no
complete list is yet known beyond this, some important subfamilies have been
classified up to 24 crossings, taking current tables to over 500,000,000,000 distinct
knots and links.
The Perko pair One of the early knot tables was compiled by Charles Little in 1885. It
featured a list of 166 knots with 10 crossings, including the pair 10₁₆₁ and 10₁₆₂. Subsequent generations of knot theorists built on this foundation, and for almost 100 years 10₁₆₁ and 10₁₆₂ sat side by side in every knot catalogue and textbook. It was
not until 1974 that the New York lawyer and amateur mathematician Kenneth Perko
spotted the error: the two are actually different formations of the same knot.
Chiral knots After the unknot, the simplest knot is the trefoil, with its three crossings.
Actually, this comes in two variants: left-handed and right-handed. The figure-of-eight
knot also comes in two mirror-image forms. It’s by no means obvious, but these two
turn out to be equivalent. With a little pulling around, the left-handed figure of eight
can be turned into the right-handed one. Might the same be true for the trefoil knot? The
answer is no. The trefoil knot is chiral, meaning that the two forms are distinct, while
the figure-of-eight knot is achiral. For more complicated knots, chirality becomes very
difficult to detect.
Knot theory has many applications in wider science, and chirality often has physical
significance. In chemistry some compounds are chiral, meaning that their molecules
come in both left-handed and right-handed varieties, which can have different chemical
properties.
The Jones polynomial For over 50 years, the Alexander polynomial was the best
algebraic tool for telling knots apart. However, in 1984, Vaughan Jones noticed an
unexpected connection between his own work in analysis and knot theory.
His insight blossomed into a brand new knot invariant. Although still not perfect, the
Jones polynomial holds several advantages over Alexander’s. Notably it can almost
always identify chirality. The right-handed trefoil knot, for example, has Jones
polynomial s + s^3 - s^4, while the left-handed one has s^-1 + s^-3 - s^-4.
Jones’ discovery quickly found applications in broader science, notably among the
knotted DNA molecules of biochemistry. Since his work, the Jones polynomial has
been built upon, in the search for yet more powerful invariants. In 1993, Maxim
Kontsevich formulated a new mathematical entity, known as the Kontsevich integral. It is a seriously complicated object (even the Kontsevich integral for the unknot is difficult to write down). A major open problem in knot theory is Vassiliev’s conjecture, which implies that the Kontsevich integral can indeed distinguish between
any two knots.
The Haken algorithm In 1970, Wolfgang Haken tackled the question of telling when
two knots are equivalent. His tactic was to turn the whole problem inside out. Instead
of comparing two knots floating in space, he looked at the knots’ complements: the 3-
dimensional shapes that are left when you remove the knots from the surrounding
matter, leaving knot-shaped holes (as if he’d set loosely knotted strings in blocks of
glass, and then removed the strings). By telling whether these two objects are
topologically the same, the same would go for the knots. Haken developed a method to
dissect the two complements in stages, before deciding whether or not they are the
same. It was a brilliant idea, but Haken’s algorithm still had holes in it when he moved
on to other concerns (most notably the four colour theorem). However, Sergei Matveev
picked it up, and was able to fill in the final gap, in 2003.
The unknotting problem A general aim of knot theory is to tell whether two knots are
equivalent. A simpler question is to tell whether a given knot is equivalent to the
unknot (the plain unknotted loop). More manageable algorithms than Haken’s have
been found, specifically for recognizing configurations of the unknot. Even with these,
it is unknown whether they can be made to run fast enough ever to be of practical use
in the real world, that is, whether the question can be answered in polynomial time (see
Cobham’s thesis). It is known that the unknotting problem is in the complexity class
NP, so a positive answer to the P = NP question would settle this.
NON-EUCLIDEAN GEOMETRY
Hyperbolic geometry In the early 19th century, Nikolai Lobachevsky and János Bolyai independently discovered a new and unfamiliar possible system for geometry, called hyperbolic geometry. The basic Euclidean ingredients survive: distances, angles and areas. But they combine in new and unexpected ways. Crucially, Euclid’s parallel postulate fails, which was the historical impetus for this discovery. The angles in a triangle now add up to less than π (180°). More weirdly, just knowing the angles in a triangle (say A, B and C) is enough to tell you its area: π − (A + B + C). (This is inconceivable in Euclidean space, where there are
similar triangles of any area you like.)
How can we imagine such an alien space? Various models of the hyperbolic plane
have been constructed, notably the Poincaré disc. The Minkowski model is hyperbolic
geometry on one half of a two-sheeted hyperboloid. This plays a central role in special
relativity in physics. Most topics of interest in Euclidean geometry have hyperbolic
counterparts. There are hyperbolic surfaces and manifolds, hyperbolic trigonometry
(involving cosh and sinh), and a well-developed theory of tilings of hyperbolic spaces,
for example.
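As a worked instance of the angle formula above: a hyperbolic triangle whose three angles are each π/6 (that is, 30°) has area π − (π/6 + π/6 + π/6) = π/2. In particular, no hyperbolic triangle can have area greater than π.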
Elliptic geometry Euclidean hegemony was blown apart, and a question arose: are there any other
possible geometries? In fact, we have been living on one for the last 4 billion years.
Hyperbolic geometry breaks the parallel postulate by having more than one line
through any point that is parallel to a given line. Elliptic geometry says that there are
no parallel lines at all; every pair of lines meets. (Euclid’s other postulates have to be
slightly amended to allow this, but variants of them continue to hold.)
There are subtly different forms of elliptic geometry, but the commonest is the
spherical variety. Here, the space is the surface of a sphere. The role of straight lines is
played by great circles: the largest circles the sphere can hold (formed by a plane
cutting through the centre of the sphere, these are the sphere’s geodesics). In elliptic
geometry, the angles in a triangle add up to more than 180° (but less than 540°). One
such is the triple right-angled triangle.
Biangles In elliptic geometry, there are no parallel lines: every pair of lines must meet.
In this context, the triangle’s crown as most elementary polygon is appropriated by the
two-sided biangle. In spherical geometry, a biangle’s two angles are necessarily equal,
and its area is determined simply by adding them together. The biangle illustrates that,
in elliptic geometry, two points no longer determine a unique line, but there are
infinitely many possible lines joining them together. Similarly, on planet Earth, there is no single shortest route from the north pole to the south pole: every meridian is equally short.
ALGEBRAIC TOPOLOGY
Hairy tori and Klein bottles
The hairy ball theorem raises a question: which shapes can be combed? For surfaces, the answer turns out to be exactly those with Euler characteristic zero: a hairy torus or a hairy Klein bottle can be combed flat, though a hairy sphere cannot.
The story changes when we move up a dimension. We can comb a hairy 3-sphere (that
is the 3-dimensional analogue of the familiar ball). Similarly, we can comb a 5-sphere
and 7-sphere and generally an n-sphere where n is odd. But an n-sphere is never
combable when n is even.
Geometric fixed points Suppose you have a piece of paper lying flat on the bottom of a
box. If you scrunch, fold, or roll it up, and then throw it back in the box, Brouwer’s
fixed point theorem guarantees that there must be at least one point on the paper which
is directly above its original position.
Another example, which inspired Brouwer to make his observation, is if you stir a cup
of coffee, at any moment there is a molecule of drink which is exactly in its original
place. These are geometric fixed points, and Brouwer’s theorem guarantees that in
many circumstances there will be one.
Brouwer’s fixed point theorem is that if you take a geometric object and deform it
somehow, there must be at least one point whose position is unchanged. There are
some important caveats however.
Firstly, in the example of the paper in the box, you cannot tear it. The theorem is easily
violated by ripping the paper in two, and swapping the two halves. In mathematical
terms, the function must be continuous.
Secondly, you must put the paper back entirely in the box (and not spill the coffee).
Another example: if you are carrying around a map of a city, there will always be one
point on the map which occupies exactly the position that it represents. But if you take
the map out of town, this is no longer true. Technically this means that we must be
talking about a function from a space X into itself.
The final point is even more subtle. If we consider the whole infinite plane, sliding it
one unit to the right will move everything, leaving no fixed points. The theorem holds
only if X is (topologically) a disc or a ball. That would be a line segment in one
dimension, a disc in two dimensions, and a ball in three or more dimensions. In each
case, the boundary of X must be included. Shave off the edge, and the theorem no
longer holds.
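A small Python illustration with an arbitrary continuous map of the square [-1, 1] × [-1, 1] into itself (the particular map is an invented example; Brouwer's theorem only guarantees that a fixed point exists, and for general maps simple iteration need not find it, though here it happens to converge):

from math import cos, sin

def f(p):
    # A continuous map sending the square [-1, 1] x [-1, 1] into itself.
    x, y = p
    return 0.5 * cos(y), 0.5 * sin(x)

p = (0.0, 0.0)
for _ in range(50):
    p = f(p)
print(p)   # approximately (0.486, 0.234), a point with f(p) = p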
Algebraic topology Topologists of the early 20th century arrived at a new streamlined
language for geometry, based on simplices.
A complex is a shape obtained by gluing any number of simplices together, along their
edges. What Solomon Lefschetz and others realized is that the data needed here is so
minimal, that certain underlying algebraic rules emerge. By adding and subtracting
simplices, they produced groups. These homology groups encode a great deal of data
about a shape.
Triangulation A sphere is not built from basic chunks like simplices. By allowing
simplices to be bent and stretched once they are assembled, a much broader range of
shapes can be constructed. When a shape can be broken down into these stretched
simplices, we say that it has been triangulated. A sphere can be triangulated as four
triangles. In fact, every manifold in two and three dimensions can be triangulated. For
this reason homology groups are powerful tools in practical geometric problems, such
as medical imaging.
However, 4-dimensional space is a strange place. Some 4-manifolds cannot be
triangulated at all. In higher dimensions, the complete answer is not known.
Euler characteristic is one example of a valuable piece of data which emerges from
triangulation.
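A sketch in Python of extracting the Euler characteristic from a triangulation (the data structure, a list of vertex-triples, is an illustrative choice):

def euler_characteristic(triangles):
    # V - E + F, computed from a list of triangles given as triples of vertex labels.
    vertices = {v for t in triangles for v in t}
    edges = {frozenset(pair) for t in triangles
             for pair in ((t[0], t[1]), (t[1], t[2]), (t[0], t[2]))}
    return len(vertices) - len(edges) + len(triangles)

# The sphere triangulated as the four faces of a (hollow) tetrahedron:
sphere = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
print(euler_characteristic(sphere))   # 2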
ALGEBRAIC GEOMETRY
Varieties The standard way to describe a curve or surface is via an equation, typically
involving a polynomial. This immediately gives us two perspectives: the geometric
curve and the algebraic polynomial. The equation of a circle, for example, is x^2 + y^2 = 1. So if we write P(x, y) to stand for the polynomial x^2 + y^2 − 1, then the circle is the set of points where P vanishes. That is, it is the collection of all pairs (x, y) such that P(x, y) = 0.
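In Python, the circle example looks like this (a trivial sketch; the points tested are arbitrary):

def P(x, y):
    # The polynomial whose vanishing defines the unit circle.
    return x**2 + y**2 - 1

print(P(1, 0))    # 0  -> (1, 0) lies on the variety
print(P(0, -1))   # 0  -> so does (0, -1)
print(P(2, 3))    # 12 -> (2, 3) does not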
[Figure: a 0-simplex (a point), a 1-simplex (a line segment), a 2-simplex (a triangle), a 3-simplex (a tetrahedron), and a complex built by gluing simplices together]
This is the basic idea of a variety, the set of points where a polynomial (or collection of
polynomials) vanish. Geometric operations such as gluing varieties together, or
looking at where they overlap, correspond to particular algebraic operations on the
polynomials. This is the starting point of the subject of algebraic geometry, whose
principal aim is to understand varieties as general phenomena, instead of focusing on
individual examples (such as conic sections).
The primary setting for contemporary geometry is the complex numbers. Here the
fundamental theorem of algebra guarantees that every variety has its full quota of
points. However, modern geometry also applies these techniques in a wide range of
other settings.
Polynomials make sense anywhere where we can add, subtract, and multiply. This
opens up geometric questions in unexpected places, such as finite fields. The Weil
conjectures unlocked the secrets of these finite geometries. How algebraic geometry
interfaces with algebraic topology is the subject of the Hodge conjecture, one of the
deepest questions in the subject.
The early use of perspective by artists such as Masaccio involved a single vanishing
point, typically in the centre of the canvas. If the ground in the painting is imagined as
an infinite chessboard, the parallel lines running away from the viewer all converge at
this single point.
Later artists such as Vittore Carpaccio experimented by placing this vanishing point in
different positions, sometimes even outside the canvas.
This modification introduces an extra problem, however. With the central vanishing
point, the chessboard lines running away from the viewer converge, and the lines at
right angles to them appear horizontal. But relocating the vanishing point means that
these perpendicular lines no longer look horizontal. In fact they all converge to a
second vanishing point. Straight lines drawn across the chessboard with intermediate
angles converge at vanishing points between the two. The line joining the two is the
vanishing line.
Desargues’ theorem
Take two triangles abc and ABC. A theorem named in honour of the father of
projective geometry, Gérard Desargues, relates two different notions.
1 The triangles are in perspective from a point, if the lines Aa, Bb, and Cc all converge
to a single point.
2 The triangles are in perspective from a line, if the three points where AB and ab
meet, BC and bc meet, and AC and ac meet, all lie on a straight line.
Desargues’ theorem says that these two are equivalent: two triangles are in perspective
from a point if and only if they are in perspective from a line.
How to draw a triangle Desargues’ theorem is an important result for the visual artist,
as it removes the need to worry about vanishing points. The key point is that the two
triangles do not have to be in the same plane.
A relevant case is when ABC is on the floor, and abc is its image on the artist’s canvas.
We imagine the canvas standing on the floor at 90°.
The artist’s aim is that the two triangles should be in perspective from a point. But
where should she draw abc? Desargues’ theorem says that the triangles should also be
in perspective from a line. This line must be where the floor and canvas meet.
The artist can draw the first point ‘a’, wherever she likes. Then she has to make sure
that each edge of abc extends to hit the floor in the same place that the corresponding
side of ABC hits the line of the canvas.
Projective geometry Any two points on the Euclidean plane define a straight line.
Going the other way, it is almost true that any two lines meet at a single point. This
duality between points and lines is an elegant and powerful principle. Unfortunately, it
does not quite hold true: pairs of parallel lines never meet.
In art, however, parallel lines can meet, or at least appear to. Parallel lines running
away from the viewer eventually converge. What artists call a vanishing point
corresponds to the mathematician’s idea of a ‘point at infinity’. The idea is to enlarge
the plane by adjoining extra points where parallel lines are deemed to meet.
Once this is done, geometry starts to look different, and in many ways simpler. For
example, the three conic sections are reunited, as different views of an ellipse. Of
course projective geometry is not Euclidean, since it is engineered for the parallel
postulate to fail. But neither is it non-Euclidean in the same sense that hyperbolic
geometry is. It is best thought of as a closing up (technically a compactification) of
Euclidean geometry. Every small region is still perfectly Euclidean; it is only when the
space as a whole is considered that its overall non-Euclidean nature is revealed.
Homogeneous coordinates Mathematicians do not like playing fast and loose with the
concept of ‘infinity’; too much nonsense has been written over the centuries by people
playing around freely with this concept. So if projective geometry requires extra
‘points at infinity’, then we need a system of coordinates for which this is meaningful.
August Möbius’ homogeneous coordinates provide an elegant solution.
While points on the real plane are given by a pair of Cartesian coordinates such as (2,
3) or (x, y), points on the projective plane have three coordinates: [2, 3, 1] or [ x, y, z ].
The difference is that now some coordinates represent the same point, just as
equivalent fractions represent the same number. For example [2, 3, 1], [4, 6, 2] and [
10, 15, 5] all represent the same projective point. In general, multiplying each
coordinate by a fixed number does not change the point.
All the points of the ordinary plane can be included, by associating (x, y) with [ x, y,
1]. But there are extra points ‘at infinity’ too, of the form [ x, y, 0]. So we can even
give the equation of the artists’ vanishing line: z = 0.
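A small Python sketch of this equivalence (the function name is illustrative; coordinates are assumed to be integers):

from fractions import Fraction

def same_projective_point(p, q):
    # [x, y, z] and [x', y', z'] name the same point exactly when one is a non-zero
    # scalar multiple of the other.
    zeros_match = all((a == 0) == (b == 0) for a, b in zip(p, q))
    ratios = {Fraction(b, a) for a, b in zip(p, q) if a != 0}
    return zeros_match and len(ratios) <= 1

print(same_projective_point([2, 3, 1], [4, 6, 2]))    # True
print(same_projective_point([2, 3, 1], [10, 15, 5]))  # True
print(same_projective_point([2, 3, 1], [2, 3, 0]))    # False: [2, 3, 0] is a point at infinity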
Finite geometry The approach of algebraic geometry is to study shapes via their
defining polynomials. This approach began in the real numbers, with the study of conic
sections, for example.
Once the system of complex numbers came of age, this assumed centre stage. But the
philosophy of algebraic geometry makes sense in any context where we can add,
subtract, multiply and divide. In particular, we can also consider the geometry of finite
fields.
The basic examples of finite fields are modular arithmetic, to a prime base. We can
begin with a polynomial such as x^2 + 2 = 0 and, instead of solving it in the real numbers, try to solve it in the integers, mod 3.
Here: 1^2 + 2 = 3 ≡ 0 (mod 3)
Also: 2^2 + 2 = 6 ≡ 0 (mod 3)
So, modulo 3, the polynomial x^2 + 2 has two solutions, namely 1 and 2. The same polynomial interpreted modulo 5 has no solutions at all.
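Counting these solutions for several primes takes only a few lines of Python (a sketch; the primes chosen are arbitrary):

def count_solutions(p):
    # The number of x in {0, 1, ..., p-1} satisfying x^2 + 2 ≡ 0 (mod p).
    return sum(1 for x in range(p) if (x * x + 2) % p == 0)

for p in (3, 5, 7, 11, 13):
    print(p, count_solutions(p))
# 3 -> 2 solutions (x = 1 and x = 2), 5 -> 0, 7 -> 0, 11 -> 2 (x = 3 and x = 8), 13 -> 0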
This is a basic example of starting with a polynomial and counting the points satisfying
it in different finite fields. But what is the pattern? As we work over larger and larger
finite fields, can the number of solutions flicker around at random or is there some
underlying rule? This question was answered in one of the landmarks of 20th-century
geometry, Pierre Deligne’s proof of the Weil conjectures.
Projective geometry also makes sense in this context, yielding objects such as the Fano
plane.
The Fano plane A projective plane is a collection of points and lines organized so that:
• any two points lie on a unique line
• any two lines meet at a unique point
• there are four points, no three of which lie on one line.
The smallest example is the Fano plane, which has just seven points and seven lines, each line containing three points (in the usual picture, one of the ‘lines’ is drawn as a circle).
Weil's zeta function Finite fields come in towers, with a prime number at their base. The first tower is based on 2 and consists of the fields F2, F4, F8, F16, … (of sizes 2, 4, 8, 16, and so on). The tower with 3 at its base consists of F3, F9, F27, F81, … For any prime number p, there is a tower F_p, F_p², F_p³, … whose mth storey, F_p^m, has p^m elements.
Given a variety defined over F_p, we can count its points at each storey: N₁ points in F_p, N₂ in F_p², N₃ in F_p³, and so on. Certainly the variety cannot lose points as it ascends the tower, so N₁ ≤ N₂ ≤ N₃ ≤ … This sequence N₁, N₂, N₃, … is the key to the geometry of finite fields, and is what André Weil set out to understand. His idea was to encode it into a single L-function, his zeta function:
ζ(T) = exp(N₁T + N₂T²/2 + N₃T³/3 + …)
The Weil conjectures [Deligne’s theorem]
Weil’s zeta function is an abstruse object. The Weil conjectures, however, assert that it
is far simpler to understand than it first appears.
His first conjecture said that ζ is determined by finitely many pieces of data. This is a crucial fact, as it means the sequence N₁, N₂, N₃, … does not jump around at random, but is governed by a fixed, predictable pattern.
The two remaining conjectures pin down this pattern precisely. Significantly, the second identifies those places where ζ is 0. In the simplest case, it says that all the zeroes of ζ lie on the critical line Re(z) = 1/2.
The Weil conjectures were a driving force behind the huge expansion of algebraic
geometry in the 20th century. The first was proved by Alexander Grothendieck in
1964, and the others by Pierre Deligne in 1974.
Hodge theory By the mid 20th century, geometry had come a long way from anything
Euclid would have recognized. At the same time, geometers’ concerns were still
traditional ones: what sorts of shapes may exist? Answers to this question had been
hugely enriched by algebra, in two different directions: firstly the polynomial
equations of algebraic geometry, and secondly the groups of algebraic topology. The
first gives us varieties as the fundamental notion of shape. Extending these slightly are
algebraic cycles, built by formally adding varieties together, and multiplying them by
rational numbers.
On the other side is the topological set-up of simplices. A critical difference here is that
these objects are constructed from the real, rather than the complex numbers. Another
distinction is that these are flexible topological objects, not tied down by polynomial
equations. Again, simplices are formally added together into topological cycles.
The setting for the meeting of these two powerful theories is projective geometry over
the complex numbers. The question William Hodge addressed in his 1950 speech to
the International Congress of Mathematicians is: when do these two different ideas
produce the same result? When is a topological cycle equivalent to an algebraic one?
The Hodge conjecture A partial answer for the central problem of Hodge theory is easy
to give. When viewed over the real numbers, the complex numbers have dimension 2
(which is why the Argand diagram looks like a 2-dimensional plane). Similarly any
complex variety must have even numbered dimension, from a real perspective. So the
first criterion is that to be algebraic, the cycle must have even dimension.
However, this is not enough. Hodge was also a master of calculus. His work on
Laplace’s equation provided him with the language to describe some particularly stable
topological cycles, now called Hodge cycles. He conjectured that in these he had found the right candidates: every Hodge cycle should be equivalent to an algebraic one.
Schemes The motivation for Grothendieck’s approach was a difference between the
two languages of algebra and geometry. The fundamental geometric objects were
varieties, defined by polynomials. These polynomials simultaneously give rise to an
algebraic object: a polynomial ring. Grothendieck realized that only a very limited type
of ring can arise in this fashion, however. The commonest ring of all is Z, the ring of
integers. But there is no variety which has Z as its ring. His bold idea was that the
geometric techniques which had been developed should work with any ring, even
where there was no underlying variety. He called these new structures schemes.
Schemes are highly abstract; many do not have obvious geometric interpretations.
Working with them poses a formidable technical challenge. The pay-off is that the
category of schemes is far better behaved than that of varieties. This leads to a more
coherent and powerful overall theory, as witnessed by the proof of the Weil
conjectures.
DIOPHANTINE GEOMETRY
Curves are subdivided by their complexity, or genus. Simple curves such as conic sections have genus 0, and must either have infinitely many rational points (as is the case with the circle x² + y² = 2) or none at all (as happens for x² + y² = 3). These days, we can tell which is the case without too much difficulty. On the other hand, more
complicated curves with genus 2 or more can only ever have a finite number of
rational points. This deep fact was conjectured by Louis Mordell in 1922, and finally
proved by Gerd Faltings in 1983.
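The contrast between those two circles can be seen with a naive search. The Python sketch below is my own illustration: it simply tries all fractions with small denominators, and the bound of 12 is an arbitrary choice.

from fractions import Fraction

def rational_points(n, max_denominator=12):
    # search for rational solutions of x^2 + y^2 = n (here |x|, |y| < 2 suffices)
    found = []
    for q in range(1, max_denominator + 1):
        for a in range(-2 * q, 2 * q + 1):
            for b in range(-2 * q, 2 * q + 1):
                x, y = Fraction(a, q), Fraction(b, q)
                if x * x + y * y == n:
                    found.append((x, y))
    return found

print(rational_points(2)[:3])   # first finds (-1, -1), (-1, 1), (1, -1); larger denominators give more, e.g. (1/5, 7/5)
print(rational_points(3))       # [] : this circle has no rational points at all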
Faltings’ theorem was a huge step forward. But it did not quite close the book on the
matter of rational points on curves. Between the simple and complex, sit the enigmatic
elliptic curves with genus 1. These may have either a finite or an infinite number of
rational points. ...
How to tell the difference is the question addressed by one of the most important open
problems in the subject, the Birch and Swinnerton-Dyer conjecture.
The equation of an elliptic curve describes a curve, but an unusual one: you can add its points according to the following rule. If a and b are points on the curve, draw a straight line joining the two. This line must intersect the curve at a third point c. We say that a + b + c = 0, so a + b = −c. (The zero of the group is the point at infinity.)
Only for elliptic curves (not to be confused with ellipses) does this rule work so
smoothly that the curve becomes a group. Because the resulting groups are difficult to
predict, they are a mainstay of modern cryptography.
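The chord-and-tangent rule is easy to implement once we work modulo a prime, which is exactly how these groups are used in cryptography. The sketch below is my own illustration, using an arbitrarily chosen curve y² = x³ + 2x + 3 modulo 97; it is not a formula from the book.

def ec_add(P, Q, a, p):
    # add two points on y^2 = x^3 + ax + b over the integers mod p
    if P is None: return Q          # None plays the role of the point at infinity,
    if Q is None: return P          # the zero of the group
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                  # P + (-P) = 0
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p
    return (x3, y3)

P = (3, 6)                       # lies on y^2 = x^3 + 2x + 3 (mod 97), since 36 = 27 + 6 + 3
print(ec_add(P, P, a=2, p=97))   # (80, 10), another point on the same curve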
The most elementary equations where this question is not yet fully understood are the elliptic curves, such as y² = x³ + 1. Solving this problem for elliptic curves is an
important goal of contemporary number theory. During the 1960s Bryan Birch and
Peter Swinnerton-Dyer made a conjecture which would quantify the rational solutions
of these important equations.
From any elliptic curve (call it E), there is a procedure to define a corresponding L-
function, L. Birch and Swinnerton-Dyer claim that this function encodes the details of
E’s rational solutions. In particular, they believe the function can detect whether E has
infinitely many rational points, or only finitely many. Their conjecture implies that if L(1) = 0 then the curve has infinitely many rational points, and if L(1) ≠ 0 then it does not.
The best progress to date is due to Victor Kolyvagin in 1988. When combined with subsequent results of Wiles and others, Kolyvagin's theorem proves half the conjecture, namely that if L(1) ≠ 0 then E has only finitely many rational points. The remainder now comes with a $1,000,000 price tag, courtesy of the Clay Foundation.
Modular forms The slick, modern world of complex analysis seems a very long way
from the hoary problems of Diophantine equations. However, just as number theory
and geometry were drawn together over the 20th century, more recently complex
analysis has also been pulled into the mix. A key notion is that of a modular form. A
modular form is a
function which takes complex numbers from the upper half-plane as inputs and gives
complex numbers as outputs. Modular forms are notable for their high level of
symmetry.
The sine function is a periodic function, meaning that it repeats itself. If you take a step of 2π to the right, the function looks the same. Modular forms satisfy similar, but more intricate rules, where the symmetry is determined not by a single number such as 2π but by 2 × 2 matrices of complex numbers.
Modular forms have starred in two of the greatest stories in modern mathematics: the
proof of Fermat’s last theorem (via the modularity theorem), and the investigation of
the monster (as thrown up by the classification of finite simple groups).
Taniyama and Shimura made a conjecture that proposed to bridge two very different areas of
mathematics. It concerned two entirely different types of object. The first are the
elliptic curves, a preoccupation of number theorists. The second come from the world
of complex analysis, the modular forms. Taniyama and Shimura claimed that elliptic
curves and modular forms are essentially the same things, though described in utterly
different languages. They claimed that L-functions should provide a dictionary for
translating between the languages of analysis and number theory.
The modularity theorem shows that certain elements from the two worlds are closely
related. This was predicted by Langlands, but his broader program goes much further.
To express it, he needed to go beyond modular forms to the automorphic forms,
complex functions whose symmetries are described by larger matrices. Central to
Langlands’ project are L-functions, which convert algebraic data coming from Galois
theory into analytic functions in the complex numbers. Langlands believes that, as this
divide is crossed, key concepts on one side marry up with key concepts on the other
side.
In addition to the proof of the modularity theorem, there has been major progress
towards realizing Langlands’ vision. The algebraic side comes in three parts: local
fields, function fields and number fields, which are of particular importance in
analysis, geometry and number theory, respectively. By 2000, Langlands’ conjectures
had been verified in two out of the three, with Laurent Lafforgue winning a Fields
medal for his 300-page solution of the case of function fields.
Only one part remains unproved, but it is monumental. Langlands' conjectures for number fields continue to stand defiantly open, and represent a massive challenge to today's number theorists.
ALGEBRA
rules, such as the fact that a + b = b + a, whatever the numbers a and b. A more sophisticated example is the binomial theorem.
Using letters to stand for unknown numbers also opened up the science of equation solving. A simple example is to find a number x such that 4 + x = 6. The quest to solve
more complex cubic and quartic equations was the driving concern for mathematicians
of the Italian renaissance. Then came Abel
and Galois’ paradigm-shattering work on the quintic equation, which lifted algebra to
new heights. At this stage, the process of replacing familiar objects with more abstract
analogues reoccurs. Here the familiar number systems are replaced with more general
algebraic structures. Especially important are groups. At this level magnificent
classification theorems are possible, notably those of finite simple groups and simple
Lie groups.
Modern abstract algebra supplies much of the machinery for other areas of
mathematics, from geometry and number theory to quantum field theory.
LETTERS FOR NUMBERS
Letters for numbers For many people, the moment mathematics becomes
uncomfortable is when letters start appearing where previously there were only
numbers. What is the point of this? The first purpose is to use a letter to stand for an
unknown number. So '3y = 12' is an equation, which says that y is a number which when multiplied by 3 equals 12. In this example, the role of the y could equally well be played by a question mark: '3 × ? = 12'. But this is less practical when we have more than one unknown quantity. For example: which two numbers add together to equal 5, and multiply together to equal 6? Using letters to represent these numbers produces a pair of simultaneous equations: x + y = 5 and x × y = 6.
Usually traditional times-symbols are left out when writing algebra, or replaced with dots. So our equation could be written '3y = 12' or '3 · y = 12'. Notice that the number is conventionally placed before the letter, so '3y = 12' rather than 'y3 = 12'.
Variables and substitution As well as standing for specific but unknown numbers,
letters can be used as variables, in place of any number. Here’s an example: if I am
jogging down the road, the formula d = 4t + 5 might express the relationship between my distance from my house (d, measured in metres) and the time since I started (t, measured in seconds). Here d and t no longer represent individual unknown numbers to be discovered. Instead they can vary; if we want to know my distance after 2 seconds, we substitute the value t = 2 into the formula, and discover that d = 4 × 2 + 5 = 13 metres. Asking how long it will take until I reach the end of the road 21 metres away means substituting in d = 21, to get 21 = 4t + 5. Now we can solve this equation, to arrive at the answer t = 4 seconds.
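The same substitution can be phrased as a tiny program. This sketch is my own, reusing the jogging formula above.

def distance(t):
    # d = 4t + 5: distance from the house (metres) after t seconds
    return 4 * t + 5

print(distance(2))      # 13, as in the text

# solving 21 = 4t + 5 by undoing the operations in turn
t = (21 - 5) / 4
print(t)                # 4.0 seconds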
So (3 + 2) × 8 means first add 3 and 2, and then multiply the result by 8. To work through more complicated calculations, you always start with the innermost brackets.
BIDMAS What is 2 + 3 × 7? It may seem a trivial question, but only one of these can be correct: (i) 2 + 3 × 7 = 5 × 7 = 35, or (ii) 2 + 3 × 7 = 2 + 21 = 23.
Since M comes before A, (i) above is wrong. Brackets trump all else, so if (i) is what was intended, it can be expressed as (2 + 3) × 7 = 5 × 7 = 35.
Similarly, 6 ÷ 3 − 1 = 2 − 1 = 1, but 6 ÷ (3 − 1) = 6 ÷ 2 = 3.
If everyone used a lot of brackets, for example ((2 × 5) + ((6 − 4) × 3)), there would be no need for BIDMAS.
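Programming languages follow the same convention. A quick illustration of mine in Python:

print(2 + 3 * 7)                 # 23: multiplication is done before addition
print((2 + 3) * 7)               # 35: brackets change the order
print(6 / 3 - 1, 6 / (3 - 1))    # 1.0 and 3.0, matching the example above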
Technically, the law of distributivity assures us that equality always holds in these
situations. This is more often encountered under the guise of expanding brackets and
taking out common factors. Both involve a pair of brackets with addition or subtraction
happening inside, being multiplied by something outside:
3 × (5 + 10) = 3 × 5 + 3 × 10
Going back the other way, if a string of terms are added up, and they are all divisible by a particular number, we can take out this common factor:
20 + 28 = 4 × 5 + 4 × 7 = 4 × (5 + 7)
This may all seem straightforward when only numbers are being used. But, once variables are involved, these techniques become essential for simplifying formulas. For example, ax + 4x = x(a + 4) shows a common factor of x being taken out.
Squaring brackets If we have addition inside a bracket, being multiplied by something outside, the technique of expanding brackets tells us how to cope. For example: (1 + 2) × 2 = 1 × 2 + 2 × 2. Does a similar thing happen when the bracket is raised to a power? It is a common mistake to believe so. In fact:
(1 + 2)² = (1 + 2) × (1 + 2) = 1(1 + 2) + 2(1 + 2) = 1 + 2 + 2 + 4
(1 + x)² = (1 + x)(1 + x) = 1(1 + x) + x(1 + x) = 1 + x + x + x² = 1 + 2x + x²
(y + x)² = y² + 2yx + x²
The binomial theorem What of brackets raised to powers higher than squares? Opening
up more brackets by hand quickly becomes laborious. If we persevere, the results are
as follows:
(1 + x)³ = 1 + 3x + 3x² + x³
(1 + x)⁴ = 1 + 4x + 6x² + 4x³ + x⁴
(1 + x)⁵ = 1 + 5x + 10x² + 10x³ + 5x⁴ + x⁵
(x + y)⁴ = y⁴ + 4y³x + 6y²x² + 4yx³ + x⁴
So to calculate (7 + 2a)⁴ we just make the substitution in this formula. But what is the meaning of these sequences of numbers, 1, 4, 6, 4, 1, and so on?
The discovery of the binomial theorem is commonly attributed to Blaise Pascal, though
it was known by earlier thinkers, including Omar Khayyam. A convenient way to
remember the binomial coefficients is Pascal’s triangle.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
etc.
Each row begins and ends with 1, and each number in between is the sum of the two
numbers above it. Pascal’s triangle is principally used as a way of calculating binomial
coefficients. If you want to expand (1 + y)⁴, the fifth row of the triangle tells you the answer:
1 + 4y + 6y² + 4y³ + y⁴
There are many other patterns concealed in this triangle too. For instance, if you look
at the diagonals, the first is just 1s, and the second simply counts: 1, 2, 3, 4, 5, … But
the third diagonal (1, 3, 6, 10, …) lists the triangular numbers. After this are higher-
dimensional analogues of polygonal numbers, the next being the tetrahedral numbers.
Deleting the even numbers from the triangle produces a pattern which gets ever closer to the fractal Sierpiński triangle, as more and more rows are added.
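The rule 'each number is the sum of the two above it' translates directly into code. A minimal sketch of mine:

def pascal_rows(n):
    # generate the first n rows of Pascal's triangle
    row = [1]
    for _ in range(n):
        yield row
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

for row in pascal_rows(6):
    print(row)
# the fifth row, [1, 4, 6, 4, 1], gives the coefficients of (1 + y)^4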
EQUATIONS
When manipulating an equation, the rule is to treat it like a pair of scales: keep it
balanced. This means that, to remain true, you must always perform the same action to
both sides: if you add 6 to the left-hand side, or multiply it by 24, you must do the
same thing to the right, or else end up with something false. This is obvious when only
numbers are involved. The same applies when unknowns are involved. So for instance, if 3x + 1 = 16, then subtracting 1 from both sides gives 3x = 15, and dividing both sides by 3, we get x = 5, which is the solution to the original equation.
Polynomials are important in geometry: setting the polynomial equal to zero defines a
condition on the coordinates of a point on the plane, (x, y). The set of all points
satisfying this condition produces a geometric object, such as a straight line or a parabola.
If a polynomial has just one variable, setting it equal to zero produces an equation such as x² − 3x + 2 = 0. Every number x either satisfies it or not. Solving this equation involves finding all x for which the equation is true: the roots of the polynomial.
The fundamental theorem of algebra says that (in most cases) a polynomial of degree n will have n roots in the complex numbers.
Linear equations As soon as humans could perform basic addition and multiplication, they were equipped to address questions such as 'what number when doubled and added to three gives nine?', a useful skill for dividing up resources. It is probable, therefore, that our species has been solving equations such as 2x + 3 = 9 since Paleolithic times. This equation is of the simplest kind: it has one unknown (x), and that is never squared, or square rooted, or otherwise complicated. We call such equations linear, because they come from the equation of a straight line: y = 2x + 3. The key to solving these is to perform the same action to both sides, until x is left on its own: first subtract 3 from both sides: 2x = 6. Then divide both sides by 2, to get x = 3.
The equation x³ − 6x² + 11x − 6 = 0 has x = 1 and x = 2 among its solutions, and in fact x³ − 6x² + 11x − 6 = (x − 1)(x² − 5x + 6). (This can be checked by expanding the brackets.) If the original equation is to hold, it must be that (x − 1)(x² − 5x + 6) = 0. So if we substitute the value of 1 for x, then the first bracket is zero, and thus the equation holds. It follows that x = 1 is a solution to the original equation.
Now two numbers multiplied together can only produce zero if one of them is zero. So if (x − 1)(x² − 5x + 6) = 0, then it must follow that either x = 1 or x² − 5x + 6 = 0. This second polynomial can be split up further, into (x − 2)(x − 3). So the original equation can be rephrased as (x − 1)(x − 2)(x − 3) = 0. This is its factorized form. At this point the solutions can just be read off: they are 1 and 2, as already expected, and 3. Can there be any other solutions? The answer is no. For any other value of x, the expression (x − 1)(x − 2)(x − 3) must be non-zero.
The factor theorem The factor theorem makes precise the argument around factorizing polynomials. A number a is a root of a polynomial P if and only if (x − a) divides P, that is, there is another polynomial Q such that P = (x − a)Q. To completely solve a polynomial we can split it into the form (x − a)(x − b) … (x − c). Then its solutions are exactly the numbers a, b, …, c. In the previous example these are 1, 2, and 3.
The method for solving quadratic equations originates with the seventh-century Indian
mathematician Brahmagupta, and ninth-century Persian scholar Muhammad ibn Musa
al-Khwārizmī, from whose book on the subject, Hisāb al-jabr w’al-muqābala
(‘Compendium on calculation by completion and balancing’), we get the word algebra.
The quadratic formula In modern notation, any quadratic equation can be rearranged into the form ax² + bx + c = 0, for some values of a, b, c. This will typically have two solutions, and these can be found by a formula imprinted in the minds of generations of mathematics students:
x = (−b ± √(b² − 4ac)) / 2a
The two different solutions come from the '±'. So the equation x² + 5x − 36 = 0 has solutions
x = (−5 ± √(5² − 4 × 1 × (−36))) / (2 × 1)
which is (−5 ± 13)/2, that is, 4 or −9.
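The formula can be checked with a few lines of code. This sketch is mine; it assumes the discriminant is not negative.

import math

def solve_quadratic(a, b, c):
    # the two roots of ax^2 + bx + c = 0 (assuming b^2 - 4ac >= 0)
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

print(solve_quadratic(1, 5, -36))   # (4.0, -9.0), the solutions of x^2 + 5x - 36 = 0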
Completing the square Some quadratic equations are easier to solve than others: just taking square roots is enough to see that x² = 9 has solutions x = 3 and x = −3. The equation x² − 2x + 1 = 9 may seem more complicated. But if we spot that it is equivalent to (x − 1)² = 9, then taking square roots gives x − 1 = ±3, giving solutions of x = 4 and x = −2.
Completing the square is a method to turn any quadratic equation into one of this type. The first step is to arrange the equation so that the x²-term has coefficient 1. For instance, starting with 2x² − 12x − 32 = 0, divide both sides by 2 to get x² − 6x − 16 = 0. The second stage involves the coefficient of the x-term. Dividing this by 2 and then squaring it gives the number which 'completes the square', as we shall see. So, from x² − 6x − 16 = 0, we divide 6 by 2 to get 3, and square this to get 9. We want this number to be the only constant on the left-hand side: x² − 6x + 9. So we have to change the right-hand side to keep the equation balanced: x² − 6x + 9 = 25.
The pay-off for all this manipulation is that the left-hand side is now a square: (x − 3)². So the equation can be solved by taking square roots: x − 3 = ±5, and the solutions are x = 8 and x = −2. Applying this method to the general equation ax² + bx + c = 0 produces the formula
x = (−b ± √(b² − 4ac)) / 2a
Cubic equations Following the quadratics, next in the hierarchy of equations in one unknown are the cubic equations: those which also involve x³, such as x³ − 6x² + 11x − 6 = 0. The first serious analysis of cubics was made in the 11th century by the Persian
poet and polymath Omar Khayyam, using conic sections.
In 16th-century Italy, cubic and quartic equations became the great problems of the
age. Mathematicians such as Girolamo Cardano, Niccolò Fontana, Scipione del Ferro
and Lodovico Ferrari gambled their reputations on public bouts of equation solving.
Cardano published the general solution to the cubic in his 1545 book Ars Magna,
crediting del Ferro with its discovery. Their work was a major force in the acceptance
of negative numbers and the development of complex numbers. In general, a cubic
equation has three complex solutions, and it always has at least one among the real
numbers.
The cubic formula The formula for the general cubic x³ + ax² + bx + c = 0 is significantly more complicated than that for the quadratic. To give it, we first define
q = a³/27 − ab/6 + c/2   and   p = q² + (b/3 − a²/9)³
Then one solution is
x = ∛(−q + √p) + ∛(−q − √p) − a/3
The other two solutions are obtained by multiplying the two cube roots by (−1 + √3 i)/2 and (−1 − √3 i)/2 (see complex numbers), the two cube roots of 1 besides 1.
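The recipe can be followed mechanically. The rough Python sketch below is my own illustration, not a formula from the book: it uses complex arithmetic throughout, and applied to x³ − 6x² + 11x − 6 = 0 it recovers the roots 1, 2 and 3.

import cmath

def solve_cubic(a, b, c):
    # roots of x^3 + ax^2 + bx + c = 0, via the q and p defined above
    q = a**3 / 27 - a * b / 6 + c / 2
    p = q**2 + (b / 3 - a**2 / 9)**3
    u = (-q + cmath.sqrt(p)) ** (1 / 3)   # first cube root
    v = (-q - cmath.sqrt(p)) ** (1 / 3)   # second cube root
    # (for some cubics the two cube roots need to be matched more carefully)
    w = (-1 + cmath.sqrt(-3)) / 2         # a complex cube root of 1
    return [u + v - a / 3,
            w * u + w**2 * v - a / 3,
            w**2 * u + w * v - a / 3]

print([round(r.real, 6) for r in solve_cubic(-6, 11, -6)])   # [3.0, 1.0, 2.0]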
Quartic equations Beyond the cubics lie the quartic equations, which additionally involve x⁴. An example is x⁴ − 5x³ + 5x² + 5x − 6 = 0. Without complex numbers, which were yet to gain acceptance, these equations posed a serious conundrum for the renaissance algebraists such as Lodovico Ferrari. Some quartics such as x⁴ + 1 = 0 have no solutions in the real numbers, and others have four. Nevertheless, they persevered, prepared simply to assume negative and complex numbers as their working required, hoping they would later cancel out.
The quartic x⁴ + ax³ + bx² + cx + d = 0 can be reduced to an auxiliary cubic equation in a new unknown y. This cubic must have a solution in the real numbers. Pick one such and call it y. Next, let g = √(a² − 4b + 4y) and h = √(y² − 4d). Then the four solutions to the original quartic can be found by solving two quadratic equations:
x² + ½(a ± g)x + ½(y ± h) = 0
Quintic equations The formulas for finding the solutions of cubic and quartic equations
are certainly fiendish. Between 1600 and 1800, the assumption among the
mathematical community was that the formulas for quintic, sextic, septic equations
would become ever more complicated, and mathematicians toiled away trying to
discover them. Leonhard Euler admitted ‘All the pains that have been taken in order to
resolve equations of the fifth degree, and those of higher dimensions … have been
unsuccessful’.
The story took a surprising twist in the 19th century, in the hands of one of
mathematics’ most brilliant and tragic figures: Niels Abel. The formulas for quadratics,
cubics and quartics involve addition, subtraction, multiplication, division and taking radicals (roots): square roots, cube roots, and so on. Working in the mathematical backwater of Norway, Abel produced a six-
page manuscript, in which he proved that the search was in vain: there is no
corresponding formula for quintic equations or for those of any higher degree. These
equations will always have solutions; that is the fundamental theorem of algebra. But
there is no single method using radicals which will allow you to find them.
This is sometimes known as the Abel-Ruffini theorem, as Paolo Ruffini had arrived at
the same conclusion (although his 500-page tome contained an incomplete proof).
Tragically, Abel did not survive to witness the algebraic revolution he had begun, dying penniless at the age of 26.
Insoluble equations Throughout history, the adoption of new number systems has
always produced solutions to previously insoluble polynomial equations. The
introduction of negative numbers allowed equations such as x + 6 = 4, which Diophantus considered 'absurd', to be solved in exactly the same manner as other linear equations. The real numbers included irrationals such as √2, which provided solutions to problems like x² = 2. The complex numbers are built around a new number i, a solution to the hitherto intractable equation x² = −1.
With this accomplished, the question was: are there any insoluble polynomials left?
The fundamental theorem of algebra gives a triumphant answer: no.
In a piece of mathematical magic, proved by Carl Friedrich Gauss in his doctoral thesis in 1799, it turns out the complex numbers do not merely provide the solution to the equation x² = −1. Every polynomial built from complex numbers must always have a solution, also in the complex numbers. For example, x⁵ + 2ix = 4 must have a solution in the complex numbers.
Indeed, a polynomial of degree n (that is, with highest term xⁿ) will usually have n different solutions. Occasionally, however, these solutions can double up, as in the case of (x − 1)² = 0 which has just one solution: x = 1. The fundamental theorem of algebra
has several proofs, four of which were discovered by Gauss. All invoke the power of
complex analysis.
Simultaneous equations Suppose we need two numbers, x and y, satisfying the pair of equations x + y = 4 and x − y = 2. There are two main methods for tackling this sort of problem. The first is by elimination: we add or subtract the equations from each other, until one variable vanishes. In this case adding the two equations eliminates y (the y and −y cancel out), leaving 2x = 6. We easily solve this, to get x = 3. Now we substitute that value back into one of the original equations, say x + y = 4, which becomes 3 + y = 4, and we can solve this to get y = 1. So the solution is x = 3, y = 1.
There are two things that can go wrong, however. If we start with the simultaneous equations x + y = 1 and 2x + 2y = 2, and try to solve them, we will not get very far. The reason for this is that they are not really two different equations, more like the same one twice. This is easy to spot with just two equations, but the same phenomenon can happen more subtly with larger systems. For instance, x + y + z = 6, 2x − y + z = 3 and x + 4y + 2z = 15 cannot be solved uniquely. On closer inspection, this is because the third equation is not really new, but comes from the first two (triple the first, and subtract the second). Systems like this are called dependent. They do have solutions: for example, x = 1, y = 2, z = 3 is a solution. In fact, they have infinitely many solutions, lying along a line.
The opposite problem is when the system has no solutions at all. For example, we cannot hope to find any solutions to the system x + y = 1 and x + y = 2. These are inconsistent. In geometric terms, they represent parallel lines (so there is no use looking for the place where they cross). In three dimensions, we may get parallel planes such as x + y + z = 1 and x + y + z = 2.
In more complicated systems, we can end up with skew lines, such as the line given by z = x, y = 1 and the line given by z = −x, y = −1. These are not parallel, but nor do they cross; they pass each other in 3-dimensional space, without meeting.
Polynomial rings Originally, numbers were just tools for counting objects. In time,
humans came to see their order and beauty, and were inspired to investigate them
further for their own sake. Over the 20th century, a similar shift of perspective
happened regarding polynomials. Once, a polynomial was just a convenient way to
formalize a problem which involved some unknown number. A more modern, abstract
approach sees polynomials as objects in their own right, which can be added,
subtracted and multiplied together.
So the collection of all polynomials with integer coefficients and just one variable (X)
forms a ring, called Z [ X ]. Alternatively we could look at the ring of polynomials in
two variables with complex coefficients: C [ X, Y ]. There are many other possibilities.
Like different number systems, these new structures have hidden depths, and their
study is of huge importance in contemporary algebra, number theory, and geometry.
VECTORS AND MATRICES
Vectors Columns of numbers such as (3, 4) are known as vectors. Essentially, they are a system for giving directions from one point to another. The first entry gives the distance to travel right, and the second the distance to travel up. So (3, 4) translates as 'three right and four up'. A negative number in the first position means left, and in the second down. So (−2, −1) translates as 'two left and one down'. Starting at the point (1, 5) and following the vector (3, 4) brings us to (4, 9); following (−2, −1) from there, we arrive at (2, 8). On the other hand, the trip from (1, 5) to (2, 8) can be given in one step, by (1, 3). This illustrates how to add vectors: just add the numbers in the corresponding positions:
(3, 4) + (−2, −1) = (1, 3)
Parallelogram law A vector has direction and magnitude. It can conveniently be written as a straight arrow, of a particular length. It does not have any particular start or end point, however. (This makes vectors perfect for modelling quantities such as velocity.) You can start the same vector (1, 4) at the origin (0, 0), or at the point (100, 101). If we start at the origin and follow (3, 2) and then (1, 4), we get to the point (4, 6); following (1, 4) and then (3, 2) brings us to exactly the same place. The two routes trace out a parallelogram, illustrating that u + v = v + u for any vectors u and v.
The triangle inequality Everyone knows that it is further to travel along two sides of a triangle than to take a shortcut along the third. This piece of common sense makes its appearance in mathematics as the triangle inequality, occupying an axiomatic status in several areas. Any sensible notion of distance should obey this rule, whether that be in an exotic hyperbolic space, or in ordinary Euclidean space.
The triangle inequality is also important in the study of vectors. If v and u are vectors, it is not true that the length of v + u is equal to the lengths of v and u added together. For example, if v = (3, 4) and u = (0, −4), then ||v|| = 5 and ||u|| = 4, but ||v + u|| = 3. So we cannot say, in general, that ||v + u|| = ||v|| + ||u||. However, the length of v + u cannot be more than the total lengths of v and u. So ||v + u|| ≤ ||v|| + ||u||, which says exactly that the third side of the triangle is shorter than the sum of the other two sides.
The dot product There is a way to combine two vectors, not to get a third vector, but a number which describes their relationship. This is called the dot product (or scalar product). It works by multiplying together the corresponding entries in the vectors, and adding them up. So
(1, 2) · (3, 4) = 1 × 3 + 2 × 4 = 3 + 8 = 11
The dot product has several useful properties. Firstly it incorporates the length of a vector: ||v||² = v · v. Secondly, it detects right angles: two vectors are perpendicular exactly when their dot product is 0. For example, (1, −1) · (2, 2) = 1 × 2 − 1 × 2 = 0. Expanding on this, the dot product can also be used to find the angle between two vectors.
The angle between two vectors The dot product provides a convenient method for finding the angle between any two vectors. If the angle is θ between the vectors u and v, the formula connecting them is:
cos θ = u · v / (||u|| ||v||)
For example, taking u = (1, 1) and v = (2, 0), we get u · v = 1 × 2 + 1 × 0 = 2. Also, ||u|| = √(1² + 1²) = √2, and ||v|| = 2. So cos θ = 2 / (√2 × 2) = 1/√2, which gives a result of θ = π/4, or 45°.
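In code the whole calculation takes a few lines. The sketch below is my own; it reproduces the 45° example.

import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    # cos(theta) = u . v / (||u|| ||v||)
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.degrees(math.acos(cos_theta))

print(dot((1, 2), (3, 4)))     # 11, as in the text
print(angle((1, 1), (2, 0)))   # approximately 45 degrees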
Or alternatively,
ax + by ≤ √((a² + b²)(x² + y²))
This can be expressed very neatly in terms of the dot product of two vectors. Putting u = (a, b) and v = (x, y), it says that
u · v ≤ ||u|| ||v||
This version of the Cauchy–Schwarz inequality follows from the formula for the angle between two vectors, as |cos θ| must be between 0 and 1.
It extends to larger collections. Suppose you have two equally sized collections of real numbers a, b, …, c and x, y, …, z. Then
ax + by + … + cz ≤ √((a² + b² + … + c²)(x² + y² + … + z²))
The cross product The dot product of two vectors is not a vector, but a scalar: an ordinary number. There is also a way to combine two vectors u and v to give a third u × v, known as the cross product. Algebraically, it is rather delicate, and only works in 3-dimensional space. It also fails to satisfy some criteria we expect a product to obey. For example u × v ≠ v × u (in fact u × v = −v × u). It is defined as follows:
(a, b, c) × (x, y, z) = (bz − cy, cx − az, ay − bx)
So for example (2, 0, 0) × (0, 3, 0) = (0, 0, 6).
The length of u × v is ||u × v|| = ||u|| ||v|| sin θ, where θ is the angle between u and v. (If the two vectors are parallel, then the angle is zero, and the result is the zero vector.)
Despite being algebraically awkward, the cross product is important in physical problems such as understanding electromagnetic fields. (See Maxwell's equations.)
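A direct translation of the definition into code (my own sketch, with example vectors chosen for illustration):

def cross(u, v):
    (a, b, c), (x, y, z) = u, v
    return (b * z - c * y, c * x - a * z, a * y - b * x)

print(cross((2, 0, 0), (0, 3, 0)))   # (0, 0, 6), as in the text
print(cross((0, 3, 0), (2, 0, 0)))   # (0, 0, -6): reversing the order flips the sign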
Matrices A matrix is an array of numbers such as
(1 2)      (1 1 0)
(3 4)  or  (0 1 1)
Matrices can come in any rectangular form, but square matrices are particularly important. You can only add matrices which are the same size. To do this, just add the corresponding entries. So:
(1 2)   (5 7)   (6  9)
(3 4) + (6 8) = (9 12)
Multiplying a vector by a matrix To start with a simple example, we can multiply the single row matrix (2 3) by the vector (4, 5) to get a 1 × 1 matrix. It works by taking each number in the matrix in turn, multiplying it by the corresponding number in the vector, and adding all these together. So:
(2 3)(4, 5) = (2 × 4 + 3 × 5) = (8 + 15) = (23)
This is the fundamental technique. With a larger matrix we do the same thing, taking each row in turn. So:
(2 3)(4)   (2 × 4 + 3 × 5)   (23)
(6 7)(5) = (6 × 4 + 7 × 5) = (59)
The same process will always work, as long as the width of the matrix is the same as the height of the vector.
An important matrix is
(1 0)
(0 1)
which is the 2 × 2 identity matrix. This has the property that it leaves every vector as it is:
(1 0)(a)   (a)
(0 1)(b) = (b)
Similarly, the 3 × 3 identity matrix is
(1 0 0)
(0 1 0)
(0 0 1)
Usually denoted I, the identity matrix is the multiplicative identity, meaning that it plays the same role in multiplying matrices as the number 1 does in multiplying numbers.
To multiply two matrices such as
(1 2)      (5 6)
(3 4) and (7 8)
just split the second matrix into two column vectors, (5, 7) and (6, 8). Then
(1 2)(5)   (19)       (1 2)(6)   (22)
(3 4)(7) = (43)  and  (3 4)(8) = (50)
so
(1 2)(5 6)   (19 22)
(3 4)(7 8) = (43 50)
After a little practice, this can be done without splitting up the matrix. The thing to remember is that, when you evaluate an entry, you use the corresponding row from the first matrix and the column from the second. So, focusing on the bottom central number in this evaluation, we get:
(1 2 3)(4 5 6)   (…  …  …)
(2 4 6)(2 3 4) = (…  …  …)
(3 6 9)(1 2 3)   (… 51  …)
since 3 × 5 + 6 × 3 + 9 × 2 = 51.
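The row-times-column rule is a pair of nested loops. A minimal sketch of mine:

def mat_mul(A, B):
    # multiply matrix A (list of rows) by matrix B (list of rows)
    rows, cols, inner = len(A), len(B[0]), len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]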
Determinants The determinant of a square matrix is a number associated to it, which encodes, in a rather subtle way, useful information about it. For a 2 × 2 matrix
(a b)
(c d)
the determinant is ad − bc. For example,
det (1 2)
    (3 4) = 1 × 4 − 2 × 3 = 4 − 6 = −2
Larger square matrices also have determinants, but calculating them is a little more involved:
det (a b c)         (e f)         (d f)         (d e)
    (d e f) = a det (h i) − b det (g i) + c det (g h)
    (g h i)
Extending this process allows the calculation of 4 × 4 and larger matrices, although the process becomes increasingly time-consuming.
One useful feature is that the determinant respects matrix multiplication. For any matrices A and B:
det(AB) = det A × det B
So if you know the determinant of A and of B, you immediately know the determinant of AB, without having to work through the whole process.
The most important information the determinant carries is whether or not it is 0. If det A = 0, then A has no inverse, that is, there is no other matrix B, such that AB = I (where I is the identity matrix). If det A ≠ 0, then A does have an inverse.
To invert a 2 × 2 matrix
A = (a b)
    (c d)
1 Find its determinant, det A = ad − bc. If this is zero, then stop, because A does not have an inverse.
2 Form the adjugate of A, by swapping the entries a and d, and changing the signs of b and c:
adj A = ( d −b)
        (−c  a)
3 Divide each entry in this new matrix by det A, to get A⁻¹, the inverse of A:
A⁻¹ = ( d/(ad − bc)   −b/(ad − bc))
      (−c/(ad − bc)    a/(ad − bc))
Or more briefly, A⁻¹ = (1/det A) adj A. This also holds for 3 × 3 and larger matrices, but both the adjugate and determinant become more complicated to calculate.
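For 2 × 2 matrices the whole recipe fits in a few lines. This sketch is mine; it uses fractions so the answer is exact.

from fractions import Fraction

def inverse_2x2(a, b, c, d):
    det = a * d - b * c
    if det == 0:
        return None                      # no inverse exists
    det = Fraction(det)
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

print(inverse_2x2(1, 2, 3, 4))   # the inverse of (1 2; 3 4): entries -2, 1, 3/2, -1/2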
The adjugate of a matrix The procedure for inverting 3 × 3 and larger matrices is essentially the same as for 2 × 2 matrices, but finding the adjugate is a longer process. For the matrix
A = (1 2 3)
    (3 2 1)
    (2 1 3)
it works as follows:
1 First, form a new matrix by swapping the rows and columns of the original. So the first row becomes the first column, the second row becomes the second column, and so on. This is called the transpose of A:
Aᵀ = (1 3 2)
     (2 2 1)
     (3 1 3)
2 Next, we focus on one entry of Aᵀ. Striking out the whole row and column of that entry leaves a 2 × 2 matrix. For example, taking the 3 on the top row of Aᵀ and removing its row and column leaves the matrix
(2 1)
(3 3)
The determinant of this smaller matrix is called the minor of that entry: in this case 2 × 3 − 1 × 3 = 3.
3 The next matrix is formed by replacing every entry in Aᵀ with the corresponding minor. This produces:
( 5  3 −4)
( 7 −3 −8)
(−1 −3 −4)
Now adjust the signs of these entries according to the pattern
(+ − +)
(− + −)
(+ − +)
where + means do not change the sign, and − means do. So we finally arrive at:
adj A = ( 5 −3 −4)
        (−7 −3  8)
        (−1  3 −4)
Finally, recall that A⁻¹ = (1/det A) adj A, and divide every entry of the adjugate by the determinant of A, in this case −12. So:
A⁻¹ = (−5/12   1/4   1/3)
      ( 7/12   1/4  −2/3)
      ( 1/12  −1/4   1/3)
Transformation matrices Transformations such as rotations and reflections involve taking the coordinates of a point, and recombining them in some way. For example, reflecting a point in the line y = x involves swapping its coordinates. So (1, 2) becomes (2, 1). Reflecting in the x-axis involves changing the sign of the y-coordinate: (1, 2) becomes (1, −2). Rotating by 90° about the origin (anticlockwise as always) takes (1, 2) to (−2, 1).
Each of these transformations can be carried out by multiplying by a matrix. Reflection in the line y = x, reflection in the x-axis, and rotation by 90° are given by
(0 1)    (1  0)        (0 −1)
(1 0),   (0 −1)   and  (1  0)
respectively. So:
(0 1)(1)   (2)       (1  0)(1)   ( 1)       (0 −1)(1)   (−2)
(1 0)(2) = (1),      (0 −1)(2) = (−2)  and  (1  0)(2) = ( 1)
What of more general rotations? Matrices can only really cope with rotations about the origin. (Rotating about any other point can be broken down into a rotation about the origin, and a translation.) The general form for the matrix of rotation by an angle of θ is:
(cos θ   −sin θ)
(sin θ    cos θ)
Reflection matrices Matrices can express reflection in any line which passes through the origin. The commonest examples are the lines y = x, y = −x, y = 0 and x = 0. These are described by the matrices
(0 1)   ( 0 −1)   (1  0)       (−1 0)
(1 0),  (−1  0),  (0 −1)  and  ( 0 1)
respectively.
For other reflections, every line through the origin is uniquely defined by its gradient or, equivalently, its angle to the x-axis. The x-axis itself (that is, the line y = 0) has angle 0°, the line y = x has angle 45°, the y-axis has angle 90° and y = −x has angle 135°. At 180°, we are back at the x-axis. If the angle to the x-axis is θ, then the gradient is tan θ, and so the equation of the line is given by y = (tan θ)x. Reflection in this line is described by the matrix:
(cos 2θ    sin 2θ)
(sin 2θ   −cos 2θ)
The matrix
(3 0)
(0 3)
produces an enlargement by a scale factor of 3. For example, it takes the point (2, 1) to (6, 3), since
(3 0)(2)   (6)
(0 3)(1) = (3)
In general, an enlargement by a scale factor a is given by the matrix
(a 0)
(0 a)
Not every matrix describes a rotation, reflection or enlargement. Consider, for example,
(1 1)
(0 1)
This is an example of a shear. Shearing always keeps one line fixed (in this case the x-axis) and slides other points parallel to this line, in proportion to their distance away from it. This proportion is called the shear factor (in this case 1). It is notable that shearing always preserves the area of a shape.
Groups of matrices Matrix multiplication works best for matrices which are square. In
this case, any two matrices of the same size can be multiplied, and there is a special
matrix of each size, the identity, which leaves every other untouched. For 3 × 3 matrices, for example, this is the identity matrix I shown earlier.
However, the set of all 3 × 3 matrices does not form a group. The problem is that not
every matrix has an inverse. In particular, if the determinant of a matrix A is zero, then
A has no inverse. Luckily, this is the only obstacle. If we rule out the matrices with
zero determinant, then the remaining matrices do form a group, called the general
linear group of degree 3. There are other, smaller groups lurking inside here too.
Restricting to matrices which have determinant 1 produces another group, the special
linear group.
Focusing on 2 × 2 matrices, the collection of just rotation and reflection matrices forms a
group, called the orthogonal group of degree 2. In higher dimensions, the range of
possible transformations grows, but orthogonal groups remain important.
These different matrix groups are the prime examples of Lie groups, and a painstaking
analysis of them resulted in the classification of simple Lie groups. Beyond this, by
replacing the numbers in these matrices with elements from a finite field, we can find
the families which feature in the classification of finite simple groups.
Matrices and equations As well as being central to geometry, matrices are useful for condensing a whole system of simultaneous equations into a single one. Take the pair of equations 2x + y = 7 and 3x − y = 3. These can be expressed together as:
(2  1)(x)   (7)
(3 −1)(y) = (3)
Multiplying both sides of this equation by the inverse of the matrix produces
(x)   (2)
(y) = (3)
so the solution to the original system is x = 2 and y = 3. This approach can be extended to solving bigger sets of equations with more unknowns.
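The same calculation can be automated. The sketch below is mine, not the book's: it solves a 2 × 2 system by applying the inverse of the coefficient matrix directly.

from fractions import Fraction

def solve_2x2(a, b, c, d, e, f):
    # solve ax + by = e, cx + dy = f by inverting the coefficient matrix
    det = Fraction(a * d - b * c)
    x = ( d * e - b * f) / det
    y = (-c * e + a * f) / det
    return x, y

print(solve_2x2(2, 1, 3, -1, 7, 3))   # (2, 3): the solution x = 2, y = 3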
GROUP THEORY
The group axioms If we are adding integers together, two things are obvious:
o There is a special number 0 which does nothing when added to any other.
o Every number has a negative, which when added to it produces 0.
A third fact is only slightly less obvious: (12 + 5) + 6 = 17 + 6 = 23, which gives the same answer as 12 + (5 + 6) = 12 + 11 = 23. So the way we choose to bracket the numbers doesn't matter. It is because of this property of associativity that we can write 12 + 5 + 6 without ambiguity.
These simple facts, and the philosophy of abstraction, give the idea of an abstract algebraic structure called a group as a collection of objects which can be combined in pairs, satisfying three axioms:
1 There is one special object, the identity, which does nothing when combined with any other.
2 Every object has an inverse which combines with it to produce the identity.
3 The method of combination is associative: the way in which objects are bracketed does not affect the result.
The integers form a group under addition. Multiplying numbers can also produce a group: this time the identity will be 1, and the inverse of a number q will be its reciprocal 1/q. But the integers run into a problem here: the inverse of 2 would have to be 1/2, which is not an integer. So the integers do not form a group under multiplication. If
we extend to the rational numbers, then we almost have a group. But there’s still one
problem. The number 0 has no inverse, as
there is no number you can multiply by 0 to get 1. So we’ll exclude that, to arrive at
the group of non-zero rational numbers. This forms a group under multiplication.
These examples are both infinite, but finite groups also exist, including many
symmetry groups and permutation groups.
Another observation about the integers is that 17 + 89 = 89 + 17, so the order of combination doesn't matter. Groups such as this are called Abelian after the Norwegian algebraist
Niels Abel. Many non-Abelian groups also exist, including groups of matrices under
multiplication.
Permutation groups How many different ways can the set {1, 2, 3} be ordered?
Factorials supply the answer: 3! = 3 × 2 × 1 = 6. The six orderings are: 1, 2, 3; 1, 3, 2; 2, 1, 3; 2, 3, 1; 3, 1, 2; 3, 2, 1.
Each of these orderings is produced by a permutation: a rule for rearranging 1, 2, 3, such as
1 → 2, 2 → 3, 3 → 1   or   1 → 1, 2 → 3, 3 → 2
These permutations can be performed one after the other. Putting these two together gives:
1 → 2 → 3, 2 → 3 → 2, 3 → 1 → 1
which simplifies to
1 → 3, 2 → 2, 3 → 1
The identity permutation is the one which moves nothing:
1 → 1, 2 → 2, 3 → 3
The inverse of
1 → 2, 2 → 3, 3 → 1   is   1 → 3, 2 → 1, 3 → 2
because when you put these together, you get the identity:
1 → 2 → 1, 2 → 3 → 2, 3 → 1 → 3
So these six permutations form a group. This is called the symmetric group on three
elements, or S 3. Similarly we can construct the symmetric group on any number of
elements.
In cycle notation, the permutation
1 → 1, 2 → 3, 3 → 2
can be written as (1)(2 3), since 1 is on a cycle on its own, and 2 and 3 form a cycle of length 2. Usually the trivial cycle (1) is left out, and the permutation is just written as (2 3).
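Composition and cycle notation are both easy to program. The sketch below is my own illustration; permutations are stored as dictionaries, and the names are hypothetical.

def compose(p, q):
    # the permutation 'do p first, then q'
    return {x: q[p[x]] for x in p}

def cycles(p):
    # decompose a permutation into its cycles
    seen, result = set(), []
    for start in p:
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = p[x]
        result.append(tuple(cycle))
    return result

r = {1: 2, 2: 3, 3: 1}      # the permutation 1 -> 2, 2 -> 3, 3 -> 1
t = {1: 1, 2: 3, 3: 2}      # the transposition (2 3)
print(compose(r, t))         # {1: 3, 2: 2, 3: 1}, as worked out in the text
print(cycles(t))             # [(1,), (2, 3)], i.e. (1)(2 3)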
Librarian’s nightmare theorem If customers borrow books one at a time, and return
them one place to the left or right of the original place, what arrangements of books
may emerge? The answer is that, after some time, every conceivable ordering is
possible. The simplest permutations are the transpositions, which leave everything
alone except for swapping two neighbouring points. The question is: which more
complex permutations can be built from successive transpositions? The answer is that
every permutation can be so constructed.
Alternating groups The librarian’s nightmare theorem says that every permutation can
be broken down into transpositions. This representation is not unique, however. There
are many ways to build a particular permutation from transpositions. For example, in
the symmetric group S5, (1 3 2) = (1 2)(2 3) and also (1 3 2) = (2 3)(1 2)(2 3)(1 2).
Through all possible representations, something does remain fixed. If a permutation is
a product of an even number of transpositions, then every representation must
comprise an even number. Similarly, a permutation which is a product of an odd
number of transpositions can only be written as a combination of an odd number. This
fact divides permutations into the odd ones and the even ones.
For example, the permutation 1 → 1, 2 → 3, 3 → 2 is odd, since it is a single transposition.
The collection of even permutations is particularly important, as it forms a subgroup,
called the alternating group on n elements, An. When n ≥ 5, this group is a simple
group. The fact that A 5 is the first non-Abelian simple group is central to Galois’
theorem.
Cayley tables A small group can be described completely by writing out its multiplication table, such as this one (see modular arithmetic):
×   1   2   3   4
1   1   2   3   4
2   2   4   1   3
3   3   1   4   2
4   4   3   2   1
Named after the 19th-century British mathematician Arthur Cayley, this table
completely defines the group of non-zero integers under multiplication modulo 5.
(Notice that each element of the group appears exactly once in each column and in
each row, making Cayley tables examples of Latin squares.)
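A table like this can be generated mechanically. The short Python sketch below is mine, not the book's, and simply prints the table for multiplication modulo 5.

n = 5
elements = [1, 2, 3, 4]
print("x  " + "  ".join(str(e) for e in elements))
for a in elements:
    row = "  ".join(str((a * b) % n) for b in elements)
    print(f"{a}  {row}")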
For another example, take the symmetry group of an equilateral triangle. As well as the identity, written e, this contains a rotation by 120°, call it R. The other rotation is by 240°, the result of doing R twice, so R². There is also a reflection in the vertical line, call this T. The remaining two reflections are the results of doing T followed by either R or R², so we call them TR and TR². Then the full multiplication table shows how these six elements interact:
       e     R     R²    T     TR    TR²
e      e     R     R²    T     TR    TR²
R      R     R²    e     TR²   T     TR
R²     R²    e     R     TR    TR²   T
T      T     TR    TR²   e     R     R²
TR     TR    TR²   T     R²    e     R
TR²    TR²   T     TR    R     R²    e
Isomorphisms Arthur Cayley understood that the pattern within one of his tables
contained the abstract essence of a group. The names of the elements, and the
geometric scenarios that gave rise to it, are of secondary importance. To say that two
groups are isomorphic means that they are essentially the same (even if they arose in
completely different situations). All that is required to turn one into the other is a
systematic changing of the labels.
         e        (0 1 2)  (0 2 1)  (0 1)    (0 2)    (1 2)
e        e        (0 1 2)  (0 2 1)  (0 1)    (0 2)    (1 2)
(0 1 2)  (0 1 2)  (0 2 1)  e        (1 2)    (0 1)    (0 2)
(0 2 1)  (0 2 1)  e        (0 1 2)  (0 2)    (1 2)    (0 1)
(0 1)    (0 1)    (0 2)    (1 2)    e        (0 1 2)  (0 2 1)
(0 2)    (0 2)    (1 2)    (0 1)    (0 2 1)  e        (0 1 2)
(1 2)    (1 2)    (0 1)    (0 2)    (0 1 2)  (0 2 1)  e
This table, for the permutations of {0, 1, 2}, has exactly the same pattern as the table above, under the dictionary:
e ↔ e,  (0 1 2) ↔ R,  (0 2 1) ↔ R²,  (0 1) ↔ T,  (0 2) ↔ TR,  (1 2) ↔ TR²
The term isomorphic can apply to algebraic structures other than groups (such as rings
or fields), but always carries the same meaning: that two structures are essentially
identical, with only a renaming of the elements needed to turn one into the other.
Simple groups Just as a prime number is one which cannot be divided into smaller
numbers, so a simple group cannot be broken into smaller groups. In the case of finite
groups, there is an analogue of the fundamental theorem of arithmetic too: the Jordan–
Hölder theorem of 1889, which says that every finite group is built from simple
groups, in a unique way. The classification of finite simple groups gives a complete
account of these fundamental building blocks.
For infinite groups, the situation is not so straightforward, since not every group can be
broken down into indivisible pieces in this precise way. However, certain special cases
can be tackled, a particularly important case being the classification of simple Lie
groups.
One of the most spectacular mathematical feats of the 20th century, the classification
of finite simple groups was the culmination of a mammoth team project, spread across
500 papers by more than 100 different mathematicians worldwide. Efforts to create a
‘second generation proof’ in one place are continuing, and will stretch to 12 volumes
(at time of writing six have been published). The final theorem gives precise
descriptions of 18 infinite families of finite groups.
The first of these is the family of cyclic groups: addition modulo p, where p is prime.
Next come the alternating groups, and the remaining families are symmetry groups of
certain finite geometric structures. The theorem also describes 26 individual groups,
known as the sporadic groups, the largest of which is the monster.
Ultimately, this remarkable theorem states that the 18 families and 26 individual
groups together comprise the entire collection of finite simple groups: there are no
others. These, then, are the atoms from which every finite group is built.
Constructed by Bernd Fischer and Robert Griess in 1980, the monster was initially seen as a curiosity: a freakish combinatorial possibility.
In 1979, however, John Conway and Simon Norton were surprised to find its
fingerprints in the unconnected area of modular forms. Dubbing this unexpected
phenomenon monstrous moonshine, they made a bold conjecture that these two worlds
are actually intimately related. In a tour de force in 1992, Richard Borcherds proved
the moonshine conjecture using deep ideas from quantum field theory. Having
completed the proof, he described his feelings as ‘over the moon’.
Lie groups The symmetry group of a square contains four rotations: by 0°,
90°, 180° and 270°. In contrast, a circle can be rotated by 1° or 197.8°, or any amount,
to leave it looking the same. This means that the symmetry group of the circle is
infinite. Beyond this however, it makes sense to talk about two rotations being ‘near’
to each other: rotating by 1° is nearly the same as leaving the circle fixed. Rotating by
0.1° or 0.01° are closer still, so these rotations are getting closer to the group’s identity:
the trivial symmetry which moves nothing. This group allows for gradual, continuous
changes rather than the discrete symmetry groups of polygons or tessellations.
In fact the symmetry group of the circle starts to look rather like the circle itself: you
can rotate by any angle, and 360° takes you back to the start. But there are also
infinitely many reflections, corresponding to choosing different diameters of the circle
as the mirror lines. Two reflections can be close to each other (if their lines are nearly
the same), but a reflection can never be close to the identity. So this symmetry group
comes in two distinct components: rotations which can slide towards the identity, and
reflections which cannot.
This is called the second orthogonal group. (The third corresponds to the symmetries
of the sphere, and so on.) The orthogonal groups are the first examples of Lie groups:
groups which are also manifolds, named after their Norwegian discoverer Sophus Lie
(pronounced ‘Lee’). The symmetries of any smooth manifold produce a Lie group,
making these of great importance in physics. Often, as with the orthogonal groups, a
Lie group can be made concrete as a group of matrices.
In 1989, the quantum theorist A. John Coleman wrote an article entitled ‘The greatest
mathematical paper of all time’. His nomination for this majestic title went to an 1888
work by Wilhelm Killing. In it, among other technological breakthroughs, Killing laid
the groundwork for a full classification of simple Lie groups.
Lie groups today play central roles in physics, and are mathematically profound for the
way they tie together the algebra of group theory, with deep ideas from differential
geometry. Simple Lie groups are especially important: they are those which cannot be
broken down into smaller Lie groups.
Subsequently Élie Cartan completed the momentous project begun by Killing. As Jean
Dieudonné wrote, it was ‘made possible only by his uncanny algebraic and geometric
insight and that has baffled two generations of mathematicians’. The result of Killing
and Cartan’s efforts is a list of four infinite families of simple Lie groups deriving from
matrix groups. Additionally there are five individual groups, the so-called exceptional
Lie groups, which come from the quaternions and octonions. Every simple Lie group must then be either a member of one of the four families, or equal to one of the five exceptional groups.
The atlas of Lie groups Since Killing and Cartan’s classification of simple Lie groups,
we have known that every simple Lie group must be one from the list. A tremendous
achievement though this was, it is not the end of the story, since it does not tell us the
inner workings of these groups. The best way to understand such gigantic, abstract
objects is to approximate them by things we know far better, namely groups of
matrices. This is the subject of representation theory. The atlas of Lie groups is an
ongoing project to accumulate the representation theory of all Lie groups.
The extension problem In chemistry, elements such as carbon and hydrogen are
combined into compounds. A formula such as C 5 H 12 tells us that each molecule of
the compound contains five carbon and twelve hydrogen atoms. This information is
not enough to pin down the new chemical exactly. Elements can combine in different
ways called isomers. Two isomers can both have the formula C 5 H 12 but, because
these atoms are connected in different ways, the result can be two compounds with
very dissimilar chemical properties.
The same thing is true in group theory. To understand a group it is not enough to know
its make-up in terms of simple groups; two groups can combine in many different
ways. For example, start with the group of addition modulo 2:
+   0   1
0   0   1
1   1   0
Taking two copies of this and joining them together gives two possibilities. The first is
to add pairs from the group. This produces the so-called Klein 4 group:
+        (0, 0)  (0, 1)  (1, 0)  (1, 1)
(0, 0)   (0, 0)  (0, 1)  (1, 0)  (1, 1)
(0, 1)   (0, 1)  (0, 0)  (1, 1)  (1, 0)
(1, 0)   (1, 0)  (1, 1)  (0, 0)  (0, 1)
(1, 1)   (1, 1)  (1, 0)  (0, 1)  (0, 0)
The other possibility is addition modulo 4:
+   0   1   2   3
0   0   1   2   3
1   1   2   3   0
2   2   3   0   1
3   3   0   1   2
Understanding the different possibilities for combining two groups is called the
extension problem. In general it is intractably difficult. But particular special cases,
such as finite groups of prime size, are subjects of intensive study. One technique, deployed by Marcus du Sautoy and others, is the use of L-functions to encode
information about the groups.
Solvable groups The easiest groups to understand are Abelian groups: those where it is always true that xy = yx. All the familiar number systems satisfy this, but not every
group is Abelian. Matrix groups, permutation groups, and symmetry groups are
typically not. Second best are groups which, though not Abelian themselves, are built
from Abelian simple pieces. These are the solvable groups. Groups which are built
from non-Abelian simple pieces, such as A n, are not solvable. Évariste Galois’ great
insight was that groups are useful for studying equations. Whether or not the resulting
group is solvable is a significant question, addressed in Galois’ theorem.
ABSTRACT ALGEBRA
Abstract algebra Higher algebra differs from the plain letters-for-numbers variety,
mainly in that calculations are performed in settings which are far more abstract than
the familiar systems of the integers or rational numbers. The role of the number system
(such as the integers) is played by a structure: a collection of objects (called its
elements) which take the place of individual numbers. The whole set-up is governed by
some precisely defined axioms, such as the group axioms. These structures can then be
studied in depth, purely by investigating the logical consequences of the axioms.
Algebraic structures The range of abstract algebraic structures studied today is so vast
as to be bewildering, even to professional mathematicians. The most important
examples, however, are groups, rings and fields. At first glance, this dry, formal
approach may seem totally disconnected from any sort of reality. But many ordinary
mathematical objects, such as the integers, complex numbers and matrices fit into one
or more of these boxes, so to study abstract structures is also to study these more
common systems.
Rings Groups provide a wonderful way to study either addition or multiplication in the
abstract. But, for ordinary numbers, both of these processes go on at the same time. A
new way to abstract this set-up was also needed. The solution that mathematicians of
the early 20th century arrived at is the notion of a ring. This is a collection of objects
which can be added, subtracted and multiplied together, following precise axioms
governing how these processes interact. The most important example is, of course, Z,
the ring of integers. But there are many others too: rings of matrices and polynomials
appear throughout algebra and geometry, and rings of functions play an important role
in analysis.
All fields are also rings. Similarly all rings are groups (when you forget about their
multiplication and focus on addition and subtraction).
Fields In the ring of integers we can add, subtract and multiply, but dividing one whole
number by another (2 by 7, say) does not usually give an integer answer. So we cannot
hope to be able to divide without leaving the structure.
A field is a place which also permits division. (There is an exception though: even in a
field, you can never have division by 0.) The most common fields are the rational
numbers, the real numbers and the complex numbers. Another important class is that
of finite fields. Non-examples are the ‘field’ with one element, the quaternions and the
octonions.
Prime fields In arithmetic modulo 4 (see modular arithmetic), the number 1 cannot be
divided by 2. The reason is that the multiples of 2 are 0, 2, 4, 6, 8, 10, … Each of these,
when divided by 4, leaves remainder 0 or 2. So, no whole number, when multiplied by
2, gives an answer congruent to 1 mod 4. This means that when we look at the
numbers {0, 1, 2, 3} under arithmetic modulo 4, we can add, subtract and multiply
them, but not divide them. So this structure is a ring, but not a field.
In modulo 5, however, any number can be divided by any other (except 0 as always).
For instance, 1 ÷ 2 = 3 (mod 5), because 2 × 3 = 1 (mod 5). This means that the collection of
numbers {0, 1, 2, 3, 4} does form a field (called F5). You can do addition, subtraction,
multiplication and division in this structure. The crucial difference is that 5 is prime,
but 4 is not. In the same way, arithmetic modulo any prime p produces a finite field,
called F_p. But this never works for non-primes.
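As an informal check (a short Python sketch, not part of the original entry; the helper name invertible_residues is purely illustrative), one can list which residues have multiplicative inverses modulo 4 and modulo 5:

# List the residues that have a multiplicative inverse modulo n.
def invertible_residues(n):
    return [a for a in range(1, n)
            if any((a * b) % n == 1 for b in range(1, n))]

print(invertible_residues(4))   # [1, 3]: 2 has no inverse, so mod 4 is not a field
print(invertible_residues(5))   # [1, 2, 3, 4]: every non-zero residue is invertible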
Finite fields The prime fields F_p are not quite the only finite fields. Just as we can
extend the real numbers to the complex numbers by adding in a square root of −1, so we
may extend the field F_p by incorporating some new elements.
For any n, there is a new field with exactly p^n elements, written F_(p^n). So, in particular, there
is a field F4 with four elements, but it is not the same as {0, 1, 2, 3} under arithmetic
modulo 4. These finite fields are remarkably useful objects, not least in cryptography.
The 'field' with one element According to the axioms, every field must contain an
element which does nothing when multiplied by any other (usually called '1' for
obvious reasons) and an element which does nothing when added to anything else
('0'), and these must be different: 0 ≠ 1. One of the most basic facts about fields, then, is
that all of them contain at least two elements. (In fact the smallest field, F2, has
exactly two.)
So there can be no field with only one element. However, this logical impossibility
didn’t stop Jacques Tits from opening discussions on the subject in 1957. Nor has it
prevented numerous mathematicians since then from developing a body of knowledge
around the non-existent entity known as F 1 (or perhaps more appropriately as F un,
‘un’ being the French for ‘one’).
This is not (only) a joke. These mathematicians view F 1 as a notional limit object for
fields, as infinity is for the natural numbers. The guiding principle is that F 1 turns
combinatorial objects into geometric ones. According to this philosophy, the set of
integers can be imagined as a curve over F 1, and plain unstructured sets of objects
come to resemble varieties.
Galois theory The rational numbers, real numbers and complex numbers are all
examples of fields. Although self-contained, these number systems are closely related:
rational numbers are all real numbers, and real numbers are all complex numbers. An
important thing which can happen when we move from a smaller field to a bigger one
is that equations which previously had no solutions can gain some. x² − 2 = 0 has no
rational solutions, but when we move to the real numbers we find one: √2. Similarly,
x² + 1 = 0 has no solutions among the real numbers, but by stepping into the
complex numbers we find one: i.
Building on Abel’s work on quintic equations, in the early 19th century, Évariste
Galois drew together several central themes of algebra. From his work, the subject of
Galois theory has grown and grown. In modern terms, Galois theory is the study of
how one field can live inside another. A crucial aspect is to decide what new equations
can be solved in the bigger field that could not be solved in the smaller.
Symmetries and equations When passing from a smaller field (such as the real
numbers) to a larger one (such as the complex numbers), the new field brings with it
new symmetries. For example, complex conjugation is a symmetry of the complex
numbers, not present in the real numbers. This symmetry flips over i and −i, the two
solutions of the equation x² + 1 = 0. Galois theory can be thought of as the study of
symmetries of solutions of equations. These symmetries form a group. Évariste Galois
realized that a critical question is whether or not the resulting group is solvable.
Like Abel's, Galois' life was both spectacular and tragically short. A committed
revolutionary, he was once arrested for threatening the king’s life (though later
acquitted). Aged just 21, he was killed in a duel, in circumstances which remain
mysterious.
Galois' theorem Galois addressed the question of when an equation can be solved by
radicals (roots). The quadratic formula is a formula involving just the coefficients of the
equation, the ordinary operations of arithmetic and the taking of roots, which will give
the solutions to any quadratic equation. The cubic and quartic formulas are similar.
Galois' theorem says that such a formula exists if and only if the corresponding group
is solvable. The symmetric groups S1, S2, S3 and S4 are solvable, which explains
why linear, quadratic, cubic and quartic equations all have formulas for their solution.
From then on, the groups S5, S6, S7, … are not solvable. When you split S_n into its
simple pieces, you encounter the non-Abelian simple group A_n (see alternating
groups). Hence quintic (and higher degree) equations are not solvable by radicals.
Representation theory The trouble with algebra, it is often said, is that it is so abstract.
It is not only the baffled student who thinks so; this is a problem of which professional
algebraists are all too aware. A group, for example, is a very abstract object. For finite
groups, we can get a handle on them by writing out the Cayley table (as long as it is
not too big). With this done, the inner workings of the group are laid bare. For infinite
groups there is no such easy way in. More concrete groups are much easier to work
with. Matrix groups are the prime examples. All we need to do is to learn the laws of
matrix multiplication, and the group becomes far more accommodating.
Representation theory is about bringing abstract algebra down to earth. The best case is
when we can identify that our group actually is a matrix group (that is, it is isomorphic
to one). Even when this is not true, we can often find matrix groups which are
reasonable approximations to our group. These are called the group’s representations.
Representation theorists aim to piece together the original group from the information
provided by its representations. Representation theory began with groups, but the study
of rings and other structures by these methods is a major theme of modern algebra.
Category theory ‘No man is an island’, said the poet John Donne, and the same is true
of algebraic structures. If we wish to study a particular group, for example, it is often
productive to investigate its relationships with other groups. The classical example is
to break the group down into smaller groups. But other, subtler relationships are also
possible. These relationships are best expressed as functions between groups. So, a
certain amount of information resides not with the individual group, but at a higher
level, in the collection of all groups, and the functions between them. This is an
example of a category.
Something rather remarkable can happen; you can prove things using very general
techniques, which do not seem to rely on the properties of groups at all, but just the
category of objects with its functions. The topologists Saunders Mac Lane and Samuel
Eilenberg turned what had previously been dismissed as ‘abstract nonsense’ into a
subject in its own right. They were led to this through their development of algebraic
topology, which associates groups to topological objects. The best way of imagining
this, they realized, is as a functor between the category of topological spaces and that
of groups.
With mathematicians contemplating ever more abstract and arcane objects (the
schemes of modern algebraic geometry being a prime example), the importance of
category-theoretic methods is rapidly growing.
When are two things the same? This is a surprisingly deep question, with many
answers, depending on precisely what is meant by ‘the same’. Different approaches to
geometry provide different answers there, but the question is important in algebra too.
The tightest concept of sameness is that of equality: two things are the same if they are
not really two at all, but one. (In non-mathematical language, equality has a different
meaning, and things, such as men and women, can be equal without being the same.)
Equality is the right notion of sameness for set theory; in algebra this is too restrictive.
In group theory, isomorphisms better capture the idea of being identical. Two groups
may not literally be the same thing, but if they are isomorphic, then they have the same
Cayley tables and are identical in every important respect.
In some contexts, notably algebraic geometry, this is still too tight. Kiiti Morita
considered two rings to be essentially the same if they had exactly the same
representations. This does not imply that they are isomorphic, but being Morita
equivalent is enough to guarantee that many important properties carry over from one
to the other.
The derived category The best language for addressing the question of when two
things are the same is that of category theory. It was here that an even more profound
notion of sameness was discovered, by Jean-Louis Verdier and Alexander
Grothendieck. It is called the derived category, and generalizes Morita equivalence. If
two objects produce the same derived category this means that, ignoring the superficial
details, at their core, they are in many ways the same. The derived category is an
important component of Langlands’ program, as well as modern algebra and algebraic
geometry.
DISCRETE MATHEMATICS
A graph is a collection of points, some of which are joined together with edges. This is the right
way to express many important optimization questions, such as the travelling salesman
and Chinese postman problems. Graphs also relate to questions of pure geometry,
through the subject of topological graph theory.
COMBINATORICS
The pigeonhole principle Suppose 1001 pigeons are housed in 1000 coops. It must be
that at least one coop contains two or more pigeons. If this seems obvious, it is. But it
is also extraordinarily useful. In general, the pigeonhole principle says that if n objects
are distributed among m boxes, and n > m, then at least one box must contain two or
more objects. As an example, people have four different blood types: O, A, B and AB.
Imagining these as boxes, the pigeonhole principle guarantees that from any group of
five or more people, at least two will share the same blood type.
For example, we can define the parity of a whole number to be 1 if it is odd, and 0 if it
is even. The pigeonhole principle guarantees that in any collection of three numbers, at
least two will have the same parity. The reason that this simple idea is so useful is that
it is non-constructive.
It asserts the existence of two objects sharing a box (two people with the same blood type, say)
without having to go to the trouble of taking blood tests.
Suppose we have two collections of objects, A and B. How many objects are there all together? That is, how many objects does the
union, A ∪ B, contain? It is tempting to count the objects in A and those in B, and add
the two numbers together. If we use |A| to denote the size of A, what this suggests is
that |A ∪ B| = |A| + |B|.
This will sometimes be true, but not always. My table has two musical instruments on
it and two wooden things on it. However, there are not four objects in total, but three: a
metal flute, a wooden ukulele and a wooden spoon. The difficulty arises when A and B
overlap, that is, have non-empty intersection, A ∩ B. When we count each of A and B
and add the results together, we are actually double-counting those objects that are in
both A and B. So we have not worked out |A ∪ B| at all. The correct formula is:
|A ∪ B| = |A| + |B| − |A ∩ B|
The formula for the size of the union tells us the size of the union of two sets, A ∪ B.
What happens if we have three sets, and want to know the size of A ∪ B ∪ C? We can
apply the formula twice, and after a little manipulation we arrive at:
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |B ∩ C| − |A ∩ C| + |A ∩ B ∩ C|
This is a rather long formula, but there are only three components to it: first add up the
sizes of the individual sets, then subtract the intersections of pairs, and then add the
common intersection of all three.
In general, for any number of sets, the same pattern continues:
|A1 ∪ A2 ∪ A3| = Σ|Ai| − Σ|Ai ∩ Aj| + |A1 ∩ A2 ∩ A3|
|A1 ∪ A2 ∪ … ∪ An| = Σ|Ai| − Σ|Ai ∩ Aj| + Σ|Ai ∩ Aj ∩ Ak| − … + (−1)^(n + 1) |A1 ∩ A2 ∩ … ∩ An|
where the sums run over all the individual sets, all the pairs, all the triples, and so on.
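As a quick sanity check (a Python sketch added editorially, not part of the original text), the three-set formula can be verified on any concrete example:

# Verify the three-set inclusion-exclusion formula on a small example.
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}
lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(B & C) - len(A & C)
       + len(A & B & C))
print(lhs, rhs)   # both are 7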
The convention is that 0! = 1. This may seem strange, but stops us having to insert a lot
of caveats and special cases in our statements. For n ≥ 0, the factorial function satisfies
the recurrence relation: (n + 1)! = (n + 1) × n!
Permutations Suppose we have gold, silver and bronze medals for mathematics, and a
shortlist of five candidates: Abel, Bernoulli, Cantor, Descartes and Euler (A, B, C, D
and E). How many different possible outcomes are there?
The answer depends on the rules of the award. The question is: may someone receive
more than one medal? Suppose they may. Then the number of choices for each medal
is five, so the number of total possible outcomes is 5 5 5 5 3 125.
However, if we insist that the medals go to different people, then there are again five
choices for the gold-medal winner, but then four for the silver, and three for the
bronze, giving a total of 5 4 3 60. We call this the number of permutations of three
objects from five, written as
nPrn!
_______ (n r)!
2 60.
Combinations When we are calculating permutations, the order of the selected objects
matters: E, B, D is not the same as D, E, B. However, sometimes we may not care
about the order. In these circumstances, we want a combination rather than a
permutation.
This time, we have three identical gold medals, and intend to award them from a
shortlist of Abel, Bernoulli, Cantor, Descartes and Euler (A, B, C, D and E). Therefore
the ordering of the three recipients does not matter: E, B, D is the same as D, E, B.
How many possible outcomes are there now? We know there are 5P3 = 60 different
possible orderings of the recipients. These fall into clusters of six which are identical
from our new perspective, such as {B, E, D}, {B, D, E}, {D, B, E}, {D, E, B}, {E, B,
D}, {E, D, B}. What we want to know is the number of clusters. This is given by 60 ÷ 6 = 10.
Binomial coefficients In general the number of sets of size r which can be selected
from a set of size n is written as (n r), read 'n choose r'. It can be calculated as (n r) = nPr ÷ r!.
These combinations also appear as the coefficients in the binomial theorem, and can be
read off Pascal’s triangle.
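A small Python sketch (an editorial illustration, assuming Python 3.8 or later for math.perm and math.comb) reproduces the medal counts from the permutations and combinations entries:

from math import comb, factorial, perm

print(perm(5, 3))                        # 60: ordered choices of gold, silver, bronze
print(factorial(5) // factorial(5 - 3))  # 60 again, computed as n!/(n - r)!
print(comb(5, 3))                        # 10: unordered choices of three gold medallists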
Partitions of a set Suppose we have four fruits: an apple, a banana, a cherry and a date (a, b, c,
d). How many ways can we divide these onto identical plates? The simplest case is to
put them onto four plates individually: {a}, {b}, {c}, {d}. There is only one way
to do this. If instead we divide them as one pair and two singles, such as {a, b}, {c},
{d}, then there are (4 2) = 6 choices for the pair. Alternatively, we could divide them as
two pairs, such as {a, b}, {c, d}. There are three ways to do this. There are four ways to
divide them into a triple and a single, {a, b, c}, {d}. Finally we could put them all on one
plate: {a, b, c, d}. Adding these all together, we get 1 + 6 + 3 + 4 + 1 = 15 possibilities.
This is the fourth Bell number, B(4).
Bell numbers Named after Eric Temple Bell, the nth Bell number B(n) is the number
of ways of dividing up a set of n objects into subsets. The sequence of Bell numbers
begins: 1, 2, 5, 15, 52, 203, 877, 4140, … Unfortunately, there is no simple formula for
Bell numbers. But there are several complicated ones, including Dobiński's formula:
B(n) = (1/e) × the sum, over all k ≥ 0, of k^n / k!
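The Bell numbers can also be computed from the standard recurrence B(n + 1) = Σ (n choose k) B(k), a fact not stated in the entry but easy to check with a short Python sketch (an editorial addition):

from math import comb

bell = [1]                                   # B(0) = 1
for n in range(8):
    bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
print(bell[1:])                              # [1, 2, 5, 15, 52, 203, 877, 4140]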
Partitions of a number Suppose we have five identical coins. How many ways could we divide these up into smaller collections? (The question
differs from the partitions of a set described by the Bell numbers, because the coins are
identical. So if we were to divide them into one set of size 2 and three of size 1, then
we would not care which two coins go together, because they are all interchangeable.)
Ultimately, the question reduces to this: in how many ways can 5 be expressed as a
sum of positive whole numbers? The answer in this case is seven:
1+1+1+1+1, 2+1+1+1, 2+2+1, 3+1+1, 3+2, 4+1, 5
Writing P(n) to stand for the number of partitions of n, this tells us that P(5) = 7.
There is no straightforward way to write an exact formula for P (n). But these values
can be recovered from Euler’s partition function.
In 1918, G.H. Hardy and Srinivasa Ramanujan found a formula which estimates the
value of P(n):
P(n) ≈ (1 / (4n√3)) × e^(π√(2n/3))
As n gets larger and larger, this estimate gets better and better. (Technically, the true
value divided by the estimate will get closer and closer to 1, so they are asymptotically
equal.)
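There is no need to take P(5) = 7 on trust: a short recursive Python sketch (an editorial addition; the helper name partitions is illustrative) counts partitions directly:

# Count partitions of n whose parts are at most `largest` (parts taken in decreasing order).
def partitions(n, largest=None):
    if largest is None:
        largest = n
    if n == 0:
        return 1
    return sum(partitions(n - part, part) for part in range(1, min(n, largest) + 1))

print([partitions(n) for n in range(1, 8)])   # [1, 2, 3, 5, 7, 11, 15], so P(5) = 7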
Euler’s partition function Rather than an explicit formula for P (n), Leonhard Euler
found a generating function: a nice algebraic description of the series Σ P(n) x^n. From
his formula, the individual numbers can be worked out. His function is:
(1 − x)^(−1) (1 − x²)^(−1) (1 − x³)^(−1) …, that is, the product of (1 − x^r)^(−1) over all r ≥ 1.
To get at the numbers P(n), we have to expand each bracket using the generalized
binomial theorem:
1 + x + 2x² + 3x³ + 5x⁴ + 7x⁵ + 11x⁶ + 15x⁷ + …
This tells us that the first few partition numbers are 1, 1, 2, 3, 5, 7, 11, 15, …
GRAPH THEORY
Graphs The word ‘graph’ has several meanings in mathematics. In graph theory, all
that is meant is a collection of points or vertices, some of which are joined together by
lines or edges. A path is then a string of edges connecting one point to another.
These simple structures can capture the essence of many more complicated scenarios.
They are ubiquitous in topology and combinatorics, and form the best language for
analysing problems like the four colour theorem. Graphs feature in many optimization
problems such as the shortest path problem, Chinese postman problem, and the
travelling salesman problem. In these cases, graphs often come with an extra piece of
structure: each edge has a number, its weight, attached. These problems involve
finding a route of minimal weight.
Seven bridges of Königsberg Now part of Russia and known as Kaliningrad, the
Prussian city of Königsberg has an illustrious mathematical history. It was the home of
David Hilbert, Rudolf Lipschitz and Christian Goldbach, among others. The city itself
was also at the centre of a problem, whose solution by Leonhard Euler sparked the
birth of several mathematical disciplines. The River Pregel divides in two as it passes
through the city, and different parts of the city were connected by seven bridges, as
shown. The question was: was it possible to walk around the city, crossing each bridge
exactly once?
Euler’s first insight in 1735 was that the geometrical details were extraneous:
everything that mattered could be captured by a simplified diagram consisting only of
vertices (standing for the various landmasses) joined by edges (representing the
bridges).
This insight marked the birth of graph theory. (Technically, this arrangement is a
multi-graph, since it has some vertices which are joined by more than one edge, not
allowed for in an ordinary graph.) Euler then understood that the answer to the
problem was no. His solution came from analysing the degrees of the vertices of the
graph.
Degree of a vertex In a graph, the degree of a vertex is the number of edges coming out
of it. Leonhard Euler reasoned that, if a solution existed to the Bridges of Königsberg,
most vertices would need to have an even number of edges coming out of them: half to
act as entries, and the same number as exits. This would have to be true for all apart
from at most two vertices: the start and end of the path. Since the Königsberg graph
has four vertices, and all are of odd degree, Euler concluded that no such route is
possible.
The shortest path problem Anyone who has used a route-finding program online or a
satellite-navigation system in their car is familiar with the shortest path problem. In the
context of a weighted graph, the question is: given two vertices A and B, how do we
find the path of least weight between the two?
In applications, the weight of each edge could be its length, or it could be the time
taken to travel it (compare a 10 mile stretch of single-track road against an 11 mile
stretch of motorway). In telecommunications, weight could represent bandwidth. There
are many solutions to this problem, including
Dijkstra’s algorithm of 1959. This starts at A and calculates minimum weight paths
first to the vertices adjacent to A, then to those increasingly distant from A, until it
reaches B.
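A minimal sketch of Dijkstra's algorithm in Python follows; the graph used is a made-up example, not the one pictured in the book, and the helper name dijkstra is illustrative:

import heapq

def dijkstra(graph, start, goal):
    queue = [(0, start)]                     # (best known weight, vertex)
    best = {start: 0}
    while queue:
        dist, vertex = heapq.heappop(queue)
        if vertex == goal:
            return dist
        if dist > best.get(vertex, float('inf')):
            continue                          # stale queue entry
        for neighbour, weight in graph[vertex]:
            new_dist = dist + weight
            if new_dist < best.get(neighbour, float('inf')):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return float('inf')

graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)],
         'C': [('D', 1)], 'D': []}
print(dijkstra(graph, 'A', 'D'))              # 4, along the route A-B-C-D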
The Chinese postman problem A postman needs to deliver letters to every street in his
area. What is the shortest route he can take? This is the Chinese postman problem
(‘Chinese’ because it was first studied by Mei-ko Kwan in 1962, though related to
Euler’s older bridges of Königsberg problem).
A suitable model of the problem is a weighted graph, where the weight of each edge
represents its length. The idea is to find a circuit of minimum overall weight which
passes along each edge. If the graph has only vertices of even degree (that is, with an
even number of edges joined) then it will have an Eulerian circuit, a route that goes
along each edge exactly once, and this will be the optimal solution. If it does not, some
edges will have to be traversed more than once. But which ones, and in which order?
Kwan came up with an algorithm to solve the problem. Essentially, it modifies the
graph into one in which all vertices have even degree, and then takes an Eulerian
circuit of the new graph.
Which of the shapes pictured can you draw without lifting your pen from the paper or
repeating any edges?
Considering each picture as a graph, the question becomes: which of these graphs have
an Eulerian path, that is, a route which crosses every edge exactly once? Based on
Leonhard Euler’s work on the Bridges of Königsberg, the answer is graphs that have at
most two vertices of odd degree.
The travelling salesman problem A salesman needs to travel to ten different towns:
what is the shortest route he can take? As for the Chinese postman problem, this can be
formalized graph-theoretically, with the towns represented by vertices, and the roads
between them as weighted edges. This time, the best case scenario is a Hamiltonian
path: a route around the graph which passes through each vertex exactly once. But,
again, such a path may not exist.
When the number of vertices is small, trial and error may produce an optimal result.
With bigger graphs this quickly becomes impractical. Unlike the Chinese postman
problem, there is no simple algorithm to solve the general case: as Merrill Flood
commented in 1956, ‘the problem is fundamentally complex’. In the language of
modern complexity theory the problem is NP-complete. However, special cases of the
problem are intensively studied, and are used in the optimization of countless
processes, from train scheduling to genome-sequencing. The largest instance of the
problem solved to date is an 85,900-city challenge posed by Gerhard Reinelt in 1991,
and completed in 2006 by David Applegate and colleagues, requiring a total of 136
years of CPU time.
The three utilities problem
I have received an extraordinary number of letters respecting the ancient puzzle that I
have called ‘Water, Gas and Electricity’. It is much older than electric lighting, or even
gas, but the new dress brings it up to date. The puzzle is to lay on water, gas and
electricity, from W, G and E, to each of the three houses, A, B and C, without any pipe
crossing another.
So wrote Henry Dudeney, Britain’s pre-eminent puzzlist, in 1913. It is not too difficult
to get to a situation where just one more pipe is required, say from W to B.
Dudeney himself provided a solution, of sorts, in which this missing pipeline passes
underneath house A. This clever answer shows the importance of making the rules of
the game crystal clear (which explains why mathematicians have such a reputation for
pedantry). In mathematical terms, the question concerns a graph consisting of two sets
of three vertices (W, G, E and A, B, C), where each vertex is connected to all three
opposing vertices, but none on its own side.
This graph goes by the name of K 3,3. The question becomes whether K 3,3 can be
drawn on a flat sheet of paper. Dudeney’s answer, with its ingenious detour through
the third dimension, is now ruled out. Indeed the puzzle, correctly stated, has no
solutions at all: the task is impossible. However, it can be solved on a torus (see
topological graph theory).
Planar graphs When can a graph be drawn on a sheet of paper without any of its edges
crashing through each other? Graphs which can be drawn in this way are called planar,
meaning they can be represented in the 2-dimensional plane. A graph is complete if
every pair of vertices is connected by an edge. The complete graph on four vertices is
known as K4: it is planar.
But K5, the complete graph on five vertices, is not: however you arrange it, two edges
will always collide.
The three utilities problem gives us another non-planar graph, called K3,3. In
unpublished work in 1928, Lev Pontryagin proved that every non-planar graph encodes
either K5 or K3,3 (though not necessarily in a straightforward fashion). Surprisingly,
these two small graphs are the only obstacles to planarity. This celebrated fact is
usually known as Kuratowski's theorem, after its first publication by Kazimierz
Kuratowski in 1930.
Topological graph theory Planar graphs are defined by the fact that they do not encode
either K5 or K3,3. Both these graphs can be drawn on a torus, as indeed can K7 and
K4,4. It is still an open question exactly what graphs may be drawn on a torus, or on
other surfaces. A complete answer to this question is a goal of topological graph
theory.
[Illustrations: Dudeney's cheat; the planar graphs K4 and K2,3; the non-planar graphs K5 and K3,3.]
Erdős numbers The roaming Hungarian mathematician Paul Erdős was one of the
most prolific in history: at his death in 1996 he had published around 1500 papers,
more even than Leonhard Euler (although page-for-page Euler wrote more). Shunning
worldly possessions, Erdős travelled the globe, sleeping on friends' sofas. He is
reputed to have believed that 'a mathematician is a machine for turning coffee into
theorems'.
In 1969, in tribute to his eccentric friend, Casper Goffman playfully coined the idea of
an Erdős number, which quickly became folklore. The idea is based on a graph whose
vertices are individual mathematicians (past and present), where two are joined by an
edge if they have co-authored a paper together. The resulting graph is not connected:
there are pairs of mathematicians who can never be joined by a path (for instance, if
one of them only ever wrote solo papers.)
People who co-authored a paper with Erdős have an Erdős number of 1. (There are
511 such people known.) Mathematicians who wrote a paper with one of these, but not with
Erdős, have Erdős number 2, and so on. (Erdős, uniquely, has an Erdős number of
0.) People who can never be reached by this process have infinite Erdős number.
Calculating an individual's Erdős number amounts to solving the shortest path
problem between them and Paul Erdős.
RAMSEY THEORY
Ramsey's theorem Frank Ramsey was a mathematician whose insights produced not
just a single theorem, but an entire subject, Ramsey theory.
He is like Évariste Galois in this regard and, like Galois, he died tragically young, at
the age of 26. Ramsey's theorem of 1930 is a classic result on finding pattern and structure
amid disorder.
Look at sets of five distinct whole numbers: {1, 13, 127, 789, 1001} is one example.
Every such set is assigned one of four colours: red, white, green or blue. So {1, 13,
127, 789, 1001} might be red, and {2, 13, 104, 789, 871} green. It does not matter on
what basis these colours are assigned: the critical thing is that every collection of five
numbers has one, and only one, colour. Ramsey’s theorem says that there will always
be some infinite subcollection of whole numbers (call it A) which is monochromatic:
all sets of five elements from A are the same colour.
There is nothing magic about the numbers 4 and 5. The theorem guarantees that, for
any numbers n and m, if all sets of n distinct numbers are assigned one of m colours,
then there will be some infinite monochromatic subset.
The dinner party problem How many people need to be invited to dinner to guarantee
that there will be three guests who are all either mutually acquainted, or mutual
strangers?
To translate this into mathematics, we can express the dinner party as a coloured graph.
Draw a dot to represent each guest, and join two with a green edge if they are
acquainted, or a
199
red edge if they are not. The answer to the problem is not five, since it is possible to
concoct a 5-vertex graph which contains neither a red triangle nor a green one.
However, any such graph of six vertices must contain a monochromatic triangle. This
translates back to a solution to the original problem. This innocent question is the
starting point for one of the most intractable problems in mathematics: Ramsey
numbers.
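The claim that six guests always suffice (while five need not) can be brute-forced; the Python sketch below, an editorial illustration, checks every red/green colouring of the relevant graphs:

from itertools import combinations, product

def has_mono_triangle(n, colouring):
    edges = list(combinations(range(n), 2))
    colour = dict(zip(edges, colouring))
    return any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
               for a, b, c in combinations(range(n), 3))

for n in (5, 6):
    edges = list(combinations(range(n), 2))
    always = all(has_mono_triangle(n, c) for c in product('RG', repeat=len(edges)))
    print(n, always)   # 5 False (some colouring avoids a monochromatic triangle), 6 True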
Ramsey numbers It is useful for mathematical purposes (if not for planning parties) to
extend the dinner party problem as follows: how many people need to be invited to
guarantee that there will either be m guests who are mutually acquainted, or n who are
mutual strangers?
A finite version of Ramsey’s theorem asserts that this problem does always have a
solution. Let R (m, n) be the minimum number of guests required to solve the problem.
The values R (m, n), for different values of m and n, are called Ramsey numbers.
Though simple to define, the exact values of Ramsey numbers remain deeply
mysterious. Even the value of R (5, 5) is as yet unknown, though established to be
between 43 and 49. Ramsey theory is the study of this problem, and the many
variations on this theme. It remains an active topic of research, with relevance to
computer science and game theory.
After the happy ending You might wonder where the happy ending was in the happy
ending problem. Actually, the name does not come from the mathematics, but from the
problem’s human history. Having solved the original problem, Esther Klein set to work
on generalizing it. The new, much more difficult question was: how many points in
general position are needed to guarantee there will be five which form a convex
pentagon? And similarly for a convex hexagon, or in general, a convex n-gon?
Klein and George Szekeres worked on this with Paul Erdős, and were able to prove
that these questions do have a solution. For any number n, there is some number K (n),
so that any collection of K (n) points in general position must contain a convex n-gon.
Klein and Szekeres' collaboration not only culminated in a proof of this theorem, but
in their marriage. This was the happy ending, and it was Erdős who named the
problem so.
Golomb rulers A ruler is a straight, notched stick, where each notch is marked with a
number, representing its distance from the end. Of course, the distance between any
two notches is the difference between the two numbers.
While an ordinary ruler has marks at 0 cm, 1 cm, 2 cm, 3 cm, 4 cm, and so on, a
Golomb ruler has only some of these markings. The principle, conceived by Solomon
Golomb, is that no two notches should be the same distance apart as any other pair.
So a ruler notched at (0, 1, 2) is not Golomb, as 0 and 1 are the same distance apart as
1 and 2. The ruler (0, 1, 3) is Golomb, however. Moreover, this is an optimal Golomb
ruler as it is the shortest possible, with three marks. (0, 2, 3) is also an optimal Golomb
ruler of length 3. But, as it is the mirror image of (0, 1, 3), they are not counted as
genuinely different.
Optimal Golomb rulers (0, 1, 3) is the unique optimal Golomb ruler with three notches.
If we try to add an extra mark: (0, 1, 3, 4) and (0, 1, 3, 5) both fail, and (0, 1, 3, 6) is
not Golomb either since 0 and 3 are the same distance apart as 3 and 6. (0, 1, 3, 7) is
Golomb, but it is not optimal: there is a shorter ruler with four notches, (0, 1, 4, 6).
There are two optimal Golomb rulers with five notches: (0, 1, 4, 9, 11) and (0, 2, 7, 8,
11).
This suggests that locating optimal Golomb rulers is difficult. You cannot simply
extend one to find the next. For longer rulers, there is an extra difficulty: how do you
know if your n-marked ruler is optimal? This is compounded by there being no
obvious rule to tell how many optimal rulers there should be. The only approach, it
seems, is to compare all possible rulers with n notches, and pick the shortest.
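Checking whether a given ruler is Golomb is easy by machine, even though finding optimal rulers is not; a Python sketch (an editorial addition, with an illustrative helper name) follows:

from itertools import combinations

def is_golomb(notches):
    distances = [b - a for a, b in combinations(sorted(notches), 2)]
    return len(distances) == len(set(distances))

print(is_golomb((0, 1, 3)))          # True
print(is_golomb((0, 1, 3, 6)))       # False: 3 - 0 equals 6 - 3
print(is_golomb((0, 1, 4, 9, 11)))   # True, an optimal five-notch ruler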
Finding Golomb rulers Current methods for searching for optimal Golomb rulers
require serious amounts of computing time. A distributed computing project run by
distributed.net has found the longest currently known optimal Golomb ruler, with 26
notches:
(0, 1, 33, 83, 104, 110, 124, 163, 185, 200, 203, 249, 251, 258, 314, 318, 343, 356,
386, 430, 440, 456, 464, 475, 487, 492)
ANALYSIS
IF YOU DRAW A GRAPH of a moving object’s distance from a point against time,
the gradient of this graph represents its speed, or the rate of change of distance. This
relationship had been known since the time of Archimedes. What was missing was an
understanding of the underlying mathematical laws of gradients and rates of change.
This void was filled in the 17th century, with the development of differential calculus.
It was a mathematical triumph, but a human tragedy. Isaac Newton and Gottfried
Leibniz were perhaps the two greatest scientists of the age, and both claimed the
discovery. The ensuing dispute was one of the most divisive and acrimonious in the
history of science, and culminated with Newton, in his capacity as President of the
Royal Society, publicly denouncing Leibniz for plagiarism.
In any event, it was not until the 19th century that calculus was given a truly firm
foundation, through the work of Augustin-Louis Cauchy and Karl Weierstrass. Newton
and Leibniz had both relied on fictional objects called ‘infinitesimals’. In Weierstrass’
hands these were replaced with the notion of a limit. This new approach was more
robust, but also more technically demanding, opening up very delicate issues of
convergence and continuity. The new subject became known as analysis to indicate its
level of difficulty.
One reward for this hard work is a much richer picture of the complex numbers.
Taking pride of place in this scene is the exponential function and in particular one of
mathematics’ crown jewels: Euler’s formula.
SEQUENCES
Sequences A sequence is an infinite list of numbers: 1, 1, 2, 5, 15, 52, 203, 877, 4140,
21147, … This example is an integer sequence, as only whole numbers are involved.
One can also have real or complex-valued sequences. The sequences that we are most
familiar with are those which follow some discernible pattern or rule, allowing us to
predict what comes next. Mathematically, there is a formula for their nth term.
However, any infinite list of numbers qualifies as a valid sequence, even if there is no
rule apparent. (This is formalized by considering a real-valued sequence as a function
f: N → R.)
Sequences feature in every branch of mathematics, as well as the wider world. Their
cousins are series: sequences which we add up as we go along.
Arithmetic progression An arithmetic progression is a sequence in which the difference
between one term and the next is always the same, such as 5, 8, 11, 14, 17, …, where the
common difference d is 3.
If we write a for the first term of the sequence (5 in the case above), then a general
arithmetic progression is:
a, a + d, a + 2d, a + 3d, a + 4d, a + 5d, …
The nth term is a_n = a + (n − 1)d, and the sum of the first n terms is na + n(n − 1)d/2.
So, if we add up the first 100 terms of the first sequence, we will get:
100 × 5 + (100 × 99 × 3)/2 = 15,350
Geometric progressions A geometric progression is a sequence in which each term is a
fixed multiple of the one before, such as 2, 6, 18, 54, 162, …, where the common ratio is 3.
If we write r for the common ratio, and a for the first term, then a general geometric
progression can be written as:
a, ar, ar², ar³, ar⁴, …
So the nth term is ar^(n − 1), and in the above example the 16th term will be 2 × 3^15 = 28,697,814.
Because the common ratio is greater than 1, this sequence grows very quickly. Indeed,
this is a classic example of exponential growth.
If the common ratio is smaller than 1, then the sequence rapidly tends towards its limit
of 0 (see limits of sequences). An example is:
4, 4/3, 4/9, 4/27, 4/81, 4/243, …
Sessa’s chessboard The beginnings of the game of chess are a matter of historical
conjecture. The game is believed to date back to ancient India, some time before AD
600. Although no-one can be certain, there is a legend about an original creator, the
wise man Sessa.
One day, Sessa visited the King to demonstrate his invention. The King was so
delighted with the clever game that he decided to reward Sessa with whatever he
should ask for. Sessa stated that he would like 1 grain of wheat on the first square of
his chessboard; on the second square, he would like 2 grains, on the third 4 grains, and
on the fourth 8 grains. On each square he wanted double the number of grains on the
previous square.
The King was angry that his generous offer should be so insulted, and strode out of the
room, commanding a courtier to see to it that Sessa got his wish. The courtier set to
work. On the 11th square he had to put 1024 grains, on the 21st over a million. By the
51st square, the courtier would have had to place the entire global wheat harvest for
2007; and there are 64 squares on the chessboard.
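The courtier's predicament is easy to quantify; the following Python sketch (an editorial addition) tallies the grains square by square:

squares = [2 ** (n - 1) for n in range(1, 65)]   # 2^(n-1) grains on the nth square
print(squares[10])       # 1024 grains on the 11th square
print(squares[20])       # 1,048,576 grains on the 21st square
print(sum(squares))      # 18,446,744,073,709,551,615 grains in total, i.e. 2^64 - 1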
What comes next? What comes next in the sequence 1, 2, …? The obvious answer is 3,
but on reflection it could equally well be 4. If we take a longer sequence, such as 1, 2,
4, 8, 16, is there still ambiguity?
Counting the number of pieces into which the pictured circles overleaf are divided, we
get a surprising answer: 31. Of course it would not be incorrect to say that the next
number is 32,
if we instead interpret the sequence as powers of 2. The conclusion is that there is no
predetermined ‘next term’, until we understand the mathematics behind the sequence.
Similarly, 1, 2, 4, 8, 16, 1001, … is another perfectly good start to a sequence, as is 1,
2, 4, 8, 16, 31, …
The moral is that just writing the first few terms of a sequence followed by ‘…’ is
ambiguous and inadequate. As the philosopher Ludwig Wittgenstein noted, this sober
assessment does not come naturally to humans. When presented with the start of a
sequence such as 1, 5, 11, 19, 29, …, our tendency is to analyse it and believe that we
have ‘understood’ it when we can see how to continue it.
The circle sequence illustrates the risks of relying too heavily on this sort of insight. To
avoid this ambiguity, it is better to give the underlying rule directly. The standard way
to do this is by giving the nth term.
a_1, a_2, a_3, a_4, …
The most famous sequence of all begins 1, 2, 3, 4, … Here the first term is 1, the
second is 2, the third is 3, and, for each n, the nth term is n. We can write this as a_n = n.
An even simpler sequence is 1, 1, 1, 1, … In this case, the first term is 1, the second
term is 1. Indeed, for every n, the nth term is 1, or a_n = 1. Consider the more
complicated sequence 3, 5, 7, 9, 11, … This can be captured by saying that, for every
n, a_n = 2n + 1. This is an arithmetic progression.
It may seem pedantic to keep repeating the phrase 'for every n'. But many sequences
are too complicated to be pinned down by just one formula. For example, consider 2,
0, 8, 0, 32, 0, 128, 0, … Here the sequence is described by a_n = 2^n when n is odd, but
a_n = 0 when n is even.
(The circle sequence 1, 2, 4, 8, 16, 31, …, for example, has nth term
(1/24)n⁴ − (1/4)n³ + (23/24)n² − (3/4)n + 1.)
In the story of Sessa's chessboard, the courtier needed to place 2^(n − 1) grains of wheat on
the nth square. The resulting sequence is defined by a_n = 2^(n − 1). Because n features as a
power, this is an example of exponential growth, much faster than polynomial growth.
If the nth term is instead something like (1/3)^n, then we have exponential decline, which falls away to zero very quickly.
Faster sequences can be built using towers of exponentials, and so on.
Limits of sequences Take the harmonic sequence 1, 1/2, 1/3, 1/4, …, whose nth term is 1/n.
As we move along this sequence, we get closer and closer to 0, but of course
never arrive. Zero is called the limit of this sequence.
Convergence Like the epsilon and delta definition of continuity, the formal definition
of a limit of a sequence is intricate. Consider the harmonic sequence, which has a limit
of 0. We can formalize this by saying that, given any positive number, no matter how
small (call it ε), there is some point (N) along the sequence, after which every remaining
term is within ε of 0. That is, for every n > N, we have |1/n| < ε.
Mathematicians usually use the Greek letter ε (epsilon) to stand for indeterminate,
but very small, numbers. (Paul Erdős used to call children epsilons for this reason.)
In general, a sequence a_1, a_2, a_3, … converges to a limit l if:
for any positive number ε (no matter how small), there is some point (N) along the
sequence, after which every remaining term is within a distance ε of l. That is, for every
n > N, we have |a_n − l| < ε.
A consequence is that a sequence can only ever have one limit. Those which have a
limit are said to converge.
The sequence 1/2, 2/3, 3/4, 4/5, 5/6, … has a limit, namely 1; we also write
n/(n + 1) → 1 as n → ∞.
On the other hand, Sessa's sequence 1, 2, 4, 8, 16, 32, … does not have a limit. It
simply grows and grows. This sequence is said to diverge to infinity, written 2^(n − 1) → ∞ as
n → ∞.
Many sequences neither converge nor diverge. For example, the sequence 1, −1, 1, −1, 1,
−1, … cycles endlessly between 1 and −1.
SERIES
Series Take a sequence such as 1, 1/2, 1/4, 1/8, … It converges if it gets ever closer to
some fixed number (in this case 0). A series is obtained when we add the numbers up as
we go along: 1 + 1/2 + 1/4 + 1/8 + … Again, this converges if it gets ever closer to some
fixed number. This series converges to 2.
We use the large sigma notation to describe series, as for finite sums. Since this series
consists of successive powers of 1/2, we can write it as:
Σ (from n = 0 to ∞) of 1/2^n = 2
Series occupy a central place in modern analysis, but they are quirky characters.
Telling whether or not a series converges can be tricky, as shown by the deceptive
harmonic series. Even when we know that a series does converge, actually finding the
limit can be extremely difficult. It is by no means obvious that the series
1 − 1/3 + 1/5 − 1/7 + 1/9 − … should converge to π/4.
Geometric series Adding up the terms of a geometric progression produces a geometric
series, such as 3 + 3/4 + 3/16 + 3/64 + …, whose nth term is 3/4^(n − 1). Writing S_n for the
sum of the first n terms, we have S_1 = 3, S_2 = 3 + 3/4, S_3 = 3 + 3/4 + 3/16,
S_4 = 3 + 3/4 + 3/16 + 3/64, and so on. For a general geometric series with first term a and
common ratio r, the sum of the first n terms is:
S_n = a(1 − r^n) / (1 − r)
In the example above this gives S_n = 4 − 1/4^(n − 1).
This general formula for S_n is valid for any geometric series. But, when r is between 0
and 1, the geometric series converges to a limit of:
Σ (from n = 0 to ∞) of ar^n = a / (1 − r)
In the example, Σ 3/4^n = 4.
The above formula for S_n is not too difficult to derive. Since we know that:
S_n = a + ar + ar² + ar³ + … + ar^(n − 1)
rS_n = ar + ar² + ar³ + … + ar^(n − 1) + ar^n
Subtracting these two equations, most of the terms cancel out. So S_n − rS_n = a − ar^n,
which simplifies to the formula we want.
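A few lines of Python (an editorial sketch) show the partial sums of the example series creeping up towards the limit of 4:

a, r = 3, 0.25
partial = 0.0
for n in range(1, 11):
    partial += a * r ** (n - 1)
    print(n, partial)     # approaches 4, and agrees with 4 - 1/4**(n - 1)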
Harmonic series It is obvious that some series diverge to infinity. For example,
1 + 2 + 3 + 4 + 5 + … has no hope of settling on a finite limit. For other series it is not so easy to tell.
The harmonic series is obtained by adding together the terms of the harmonic sequence
1, 1/2, 1/3, 1/4, 1/5, 1/6, …:
1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + …, or Σ 1/n.
Here, the individual terms certainly get closer and closer to zero. This is a necessary
condition for the series to converge, but as it happens it is not sufficient.
Oresme proved the unexpected fact that the harmonic series actually diverges to
infinity. Start with the original series:
1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + …
If we decrease each term, and still get something which diverges, then the original
must diverge too. So we decrease each term as follows:
1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 + 1/10 + 1/11 + 1/12 + 1/13 + 1/14 + 1/15 + 1/16 + 1/17 + …
becomes
1 + 1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + 1/16 + 1/16 + 1/16 + 1/16 + 1/16 + 1/16 + 1/16 + 1/16 + 1/32 + …
Now we can group the terms of this new series into a procession of halves:
1 + 1/2 + (1/4 + 1/4) + (1/8 + 1/8 + 1/8 + 1/8) + … = 1 + 1/2 + 1/2 + 1/2 + …
This new series will keep adding half and half and half, and so it will eventually grow
bigger than any number you care to name. Therefore the harmonic series will do the
same.
This is a very surprising result, because the terms of the harmonic sequence get so
small that the series hardly seems to be growing at all. In fact, as Leonhard Euler
showed, if we add up the first n terms of the series, we get approximately ln n. To reach
just 10, we have to add the first 12,367 terms. To reach 100, we have to add around
1.5 × 10^43 terms. So although this series diverges to infinity, it does so very slowly
indeed.
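The figure of 12,367 terms is easy to confirm numerically; the Python sketch below (an editorial addition) simply adds terms until the total passes 10:

from math import log

total, n = 0.0, 0
while total < 10:
    n += 1
    total += 1 / n
print(n)          # 12367
print(log(n))     # about 9.42, in line with the ln n approximation (plus a constant)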
The prime series What happens if we add up the reciprocals of the prime numbers?
1/2 + 1/3 + 1/5 + 1/7 + 1/11 + …
If this series converged, its limit would certainly be of interest. However, Euler
was able to prove that it does not converge. Like the harmonic series, this prime series
diverges to infinity. (This provides an alternative proof that there are infinitely many
prime numbers.)
In 1919, Viggo Brun looked at what would happen to this series if he focused on twin
primes, instead (that is, prime numbers which are 2 apart). Brun’s series comes from
adding up the reciprocals of all pairs of twin primes:
(1/3 + 1/5) + (1/5 + 1/7) + (1/11 + 1/13) + (1/17 + 1/19) + …
Brun’s theorem was that this series does indeed converge to a finite limit. This
remarkable number is known as Brun’s constant, and has been pinned down to
approximately 1.90216.
Does this mean that the twin prime conjecture is false? Not necessarily, although that
is certainly possible. It does imply that twins are very sparse among the primes, even if
there are infinitely many of them.
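Even crude computation shows how slowly Brun's series grows; the sketch below (an editorial Python illustration using a simple sieve, with the quoted output only approximate) sums over twin primes up to a million:

def primes_up_to(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

prime_set = set(primes_up_to(1_000_000))
total = sum(1 / p + 1 / (p + 2) for p in prime_set if p + 2 in prime_set)
print(total)      # roughly 1.7 -- still well short of Brun's constant, about 1.902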
When we write 1 + 1/2 + 1/4 + 1/8 + 1/16 + … = 2, ultimately this is short-hand which
bypasses the crucial concept, that of a limit.
There are several ways in which series are unlike ordinary addition. One is that when
we add finitely many numbers together, the order of addition does not make a
difference. For series this is not true. For example, the series below converges to ln 2.
1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − 1/8 + 1/9 − …
We can re-order this in a clever way:
1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + 1/5 − 1/10 − 1/12 + …
Grouping the terms in brackets,
(1 − 1/2) − 1/4 + (1/3 − 1/6) − 1/8 + (1/5 − 1/10) − 1/12 + …
this works out as
1/2 − 1/4 + 1/6 − 1/8 + 1/10 − 1/12 + …
which is exactly half of the original series. So, simply by re-ordering its terms, we have
changed the sum from ln 2 to half of ln 2.
The same is true for any conditionally convergent series, that is, one which does not
converge when all its terms are made positive. When the terms of this series are made
positive, it produces the divergent harmonic series. Happily, absolutely
convergent series are better behaved.
The binomial theorem tells us how to expand brackets such as (1 + z)^17 without having
to work through sixteen intermediate stages.
The answer is a sum of terms of the form (17 r) z^r, as r takes the values from 0 to 17.
Here
(17 r) = (17 × 16 × 15 × … × (17 − r + 1)) / r!
On the face of it, there is no reason to expect this theorem to have anything to say
about (1 + z)^(−17), for example. This is something different and cannot be expressed as a
sum of positive powers of z in any obvious way.
Yet, nothing ventured, nothing gained. Isaac Newton experimented to see what would
happen if the exponent was replaced with a negative number. He might have ended up
with nonsense; in fact he found a powerful generalization of the theorem. First, he had
to define the generalized binomial coefficient for any number a:
(a r) = (a × (a − 1) × (a − 2) × … × (a − r + 1)) / r!
With this in place, the theorem reads:
(1 + z)^(−17) = Σ (from r = 0 to ∞) of (−17 r) z^r
and, more generally,
(1 + z)^a = Σ (from r = 0 to ∞) of (a r) z^r
However, the issue of convergence is delicate here. For most values of z, this series
will not converge, and the theorem will fail. But whenever z is a complex number with
|z| < 1, this series will converge, and this generalized binomial theorem holds.
CONTINUITY
Zeno of Elea assembled a list of paradoxes. The most famous purport to prove that
physical motion is impossible.
The first paradox involves the mythical hero Achilles. When this great warrior sets out
to race a tortoise, he encounters an unexpected problem. Agreeing to give the tortoise a
head-start, he has no trouble running to where the tortoise started. But when he gets
there, he finds the tortoise has moved on slightly. He can run to the tortoise’s new
position, but again once he arrives it has moved ahead, and so on. He can never reach
the tortoise without first arriving at its previous position, but every time he does, it
moves on slightly. Therefore he can never catch the tortoise.
Achilles and the tortoise have further paradoxical adventures in Lewis Carroll’s logical
dialogue What the tortoise said to Achilles, and then in one of the great books of the
20th century, Douglas Hofstadter’s Gödel, Escher, Bach.
It is possible that Zeno genuinely believed that all motion and change is illusory.
However, the enduring importance of his paradoxes, from a mathematical perspective,
is in illustrating the subtleties in the relationship between discrete and continuous
systems.
Zeno’s dichotomy paradox Zeno’s dichotomy paradox makes a similar point to the one
of Achilles and the tortoise, but even more forcefully. Suppose Achilles is training for
his big race by running a mile, alone this time. Before he can cross the finish line, he
must first reach the half-way mark. But, before he can reach that, he must reach the
point half-way to that, the quarter mark. Before he can get there, though, he must pass
the eighth mark, and so on, ad infinitum. Suddenly he is faced with an infinity of tasks
to do, and no first step. This time Achilles cannot even get started.
In the case of Achilles’ race against the tortoise, if each phase of the race took the same
length of time, then it really would be true that he could never catch the creature. But
this is not the case; successive phases take ever smaller quantities of time, which
together form a convergent series. The limit of this series will be the moment Achilles
moves into the lead. What is misleading about the paradox is the suggestion that the
time taken for each step remains the same, which would form a divergent series.
The rational numbers have gaps Take a piece of paper with a horizontal line across the
middle of the page. Put one mark in the bottom half of the page, and another in the top.
It seems obvious that any curve you can draw between the two marks, without your
pen leaving the paper, must cross the dividing line at some point (leaving aside tricks
such as going around the back of the page).
This is not a mathematical scenario, of course, but it is a model that any sensible theory
of continuous functions should follow. Unfortunately, when working in the rational
numbers, this does not happen. Take the function f(x) = x² − 2. This seems an innocuous,
continuous function. If we plot the graph of y = f(x), then at x = 0, y is negative, and when
x = 2, y is positive. So the graph should cross the horizontal line y = 0 somewhere. But,
working in the rational numbers, it does not: there is no rational number x such that
f(x) = 0. There is a gap in the rational line at √2.
The intermediate value theorem The beauty of the real numbers is that all the gaps in
the rational numbers have been filled in. The intermediate value theorem asserts this
once and for all. If you have a continuous function f on the real numbers, which is
negative at some point (a) and positive somewhere else (b), then there is guaranteed to
be an intermediate point (c) where f crosses the horizontal line, that is to say f(c) = 0.
This theorem shows that the real numbers capture our intuition about points, curves
and continuity, and form the right setting for geometry and analysis.
Discreteness and continuity The whole numbers are discrete, meaning that they come
in separate packages, with gaps in between. We jump from 1 to 2 and then from 2 to 3.
The real numbers, in contrast, are the classic example of a continuous set. Here we
glide smoothly between 1 and 2, covering an infinite number of intermediate points.
The intermediate value theorem shows that, even on the finest imaginable scale, there
are no gaps.
Discreteness and continuity are the north and south poles of mathematics. Each
provides the setting for a great deal of fascinating work. Topology and analysis are
concerned with continuous functions, whereas combinatorics and graph theory are
wholly discrete in nature. These two areas have very different feels, and the tension
between them poses many technical and conceptual challenges, of which Zeno’s
paradoxes were an early taste. When discrete and continuous situations collide, the
fireworks can be spectacular. Examples include the number theoretical heights of
Diophantine geometry and the mysterious quantum phenomenon of wave–particle
duality.
Continuous functions A real function is a function which takes real numbers as inputs
and gives real numbers as outputs. These are hugely useful for modelling all sorts of
physical processes. For example, walking down a road can be modelled by a real
function which takes numbers representing time as input, and gives numbers
representing distance as output. In this and in many applications, the function we get is
continuous, meaning that it doesn’t contain any gaps or jumps. (A discontinuous
function would be needed to model someone walking down the road and then suddenly
teleporting 10 feet forwards.)
Being continuous need not imply that the curve is smooth, however. It may be very
jagged, as long as it doesn’t contain any gaps. So every differentiable function (see
differentiability) is certainly continuous, but the reverse does not hold (an example
being the Koch snowflake).
Weierstrass' epsilon–delta definition of continuity confuses everyone on first meeting.
But it illustrates the level of rigour that Augustin Cauchy and Karl Weierstrass
introduced into mathematical analysis in the 19th century, replacing the old fallacious
reasoning about infinitesimals. It is remarkable that such an intuitive notion as
continuity should require such a technical formulation.
Suppose, for example, that f is a function with f(0) = 1. To say that f is continuous at this
point intuitively means that it does not have a gap there. The idea is simple enough: as x
gets closer to 0, then f(x) gets closer to 1.
To formalize this idea of 'getting closer', Weierstrass rephrased it as follows: for any
number ε (no matter how small), whenever x is close enough to 0, then f(x) will be
within ε of 1.
This version still contains the undefined notion of being 'close enough', however. This
can be removed as follows: for any number ε, there exists another positive number δ, so
that, whenever x is within δ of 0, then f(x) will be within ε of 1. This is now the modern
definition of continuity. It can be written out more quickly using logical quantifiers:
(∀ε > 0)(∃δ > 0)(|x| < δ ⟹ |f(x) − 1| < ε)
DIFFERENTIAL CALCULUS
Differentiability The curve pictured here has a sharp point, or cusp. Everywhere else the curve is smooth, but not at this particular point.
define what it means to be smooth? Differentiation provides an answer. This is a subtle
process, but what it amounts to is finding the tangent to the curve. The problem is that
you cannot always do this. At its cusp, the curve does not have a unique tangent. Any
line you can draw is as good as any other, so the tangent is undefined, and the curve is
not differentiable at this point.
Of course, the pictured curve is smooth everywhere except at its cusp. For many years,
mathematicians thought this was typical: every continuous function should be
differentiable, except possibly at a few points. However, in 1872 Karl Weierstrass
shocked the establishment by producing a function which is continuous everywhere,
but not differentiable anywhere. The Koch snowflake is another such curve. The theory
of smooth functions works much more cleanly in the complex numbers.
[Figure: a curve with a cusp]
When we have a curve instead of a straight line, this process is trickier. To start with, it
is not obvious what the gradient of a curve should mean. The agreed meaning is that
the gradient is that of the tangent to the curve, the straight line which touches it at
exactly one point.
Choosing this point makes a huge difference. Unlike straight lines, the gradient of a
curve varies from place to place. Dating back to Archimedes, the basic method for
calculating this gradient is as follows: locate the chosen point on the curve, draw a
tangent by hand, and
calculate its gradient. The problem with this is that it relies on being able to draw
curves and straight lines with perfect accuracy. In reality, this method will only ever
produce an estimate of the true value. For an exact procedure mathematicians had to
wait for the development of the derivative.
Suppose we want to calculate the gradient of the curve y = x² at the point (1, 1). Archimedes’ method would be to draw the tangent at this point, and calculate it. Ideally, however, the accuracy of our mathematics should not be hostage to our artistic skill. To draw an exact tangent at a point is difficult. But to draw an approximate tangent is easy: pick another nearby point on the curve and join the two with a straight line (called a secant). We could pick (2, 2²), that is, (2, 4), as our second point. We can join these with a straight line, and easily calculate its gradient:
(4 − 1)/(2 − 1) = 3
If we had chosen a second point nearer (1, 1) we would get a better approximation. The point (1.1, 1.21), for example, gives a gradient of (1.21 − 1)/(1.1 − 1) = 2.1.
As the second point moves ever closer to (1, 1), it seems the gradient is getting closer to 2. So we could guess that this is what we really wanted, the gradient of the tangent at (1, 1). To prove this rigorously we need a little algebra.
Take the second point to be (1 + h, (1 + h)²), for some small number h. We want to know the gradient of the line passing through this and (1, 1). As always this is the change in height divided by the change in base:
((1 + 2h + h²) − 1)/((1 + h) − 1) = (2h + h²)/h = 2 + h
It is now clear what happens as the second point gets closer to (1, 1). This corresponds to h getting smaller and smaller, meaning that the gradient 2 + h gets closer and closer to 2.
If we had chosen the point (3, 9) we would have found that the curve has gradient 2 × 3 = 6. Similarly at (4, 16) it has gradient 2 × 4 = 8. In general, at the point (x, x²), the curve has gradient 2x. This shows that the derivative of the function y = x² is dy/dx = 2x.
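A quick numerical experiment makes the limiting process concrete. The short Python sketch below (the step sizes are illustrative choices, not anything from the text) computes the gradient of the secant through (1, 1) and (1 + h, (1 + h)²) for ever smaller h, and watches it approach 2.

```python
# Approximate the gradient of y = x^2 at x = 1 using secants through (1, 1).
def f(x):
    return x ** 2

a = 1.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    secant_gradient = (f(a + h) - f(a)) / h   # rise over run
    print(h, secant_gradient)
# printed gradients 3.0, 2.1, 2.01, 2.001, 2.0001 approach the true value 2
```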
Derivative Suppose we have a function f which has real numbers as its inputs and outputs. The derivative of f is a new function (written f′) which describes the rate of change of f. In graphical terms, if we have the graph of y = f(x), then we can look at its gradient at the point a. This is the number given by f′(a). How is this defined?
The gradient of the secant joining the points (a, f(a)) and (a + h, f(a + h)) on the graph is (f(a + h) − f(a))/h, for some small number h. So, to calculate the exact gradient, we take the limit of this quantity as h tends to 0:
f′(a) = limit as h → 0 of (f(a + h) − f(a))/h
This is the definition of the derivative. (It is an important caveat that this limit must exist uniquely, otherwise we have a non-differentiable function on our hands.) When y = f(x), the derivative is also written
dy/dx = f′(x)
The derivative song Tom Lehrer is best known for his satirical songs such as
‘Poisoning Pigeons in the Park’. He is also a mathematician, and taught at Harvard
University. In one of his lesser known works, Lehrer set the definition of a derivative
to music. Very appropriately, the tune was ‘There’ll Be Some Changes Made’ by W.
Benton Overstreet (with original lyrics by Billy Higgins).
You take a function of x, and you call it y,
Take any x₀ that you care to try,
You make a little change and call it Δx,
The corresponding change in y is what you find nex’,
And then you take the quotient, and now carefully
Send Δx to zero and I think you’ll see
That what the limit gives us, if our work all checks,
Is what we call dy/dx. It’s just dy/dx!
Here x₀ (‘x nought’) is the point at which we are calculating the derivative (a in the example above). Similarly Δx (‘delta x’) is the small change denoted by h above. So the ‘corresponding change in y’ is f(x₀ + Δx) − f(x₀).
Standard derivatives The definition of the derivative can be tricky to apply directly.
Happily, there are several standard facts we can use instead:
y          dy/dx
xⁿ         n xⁿ⁻¹
Cˣ         Cˣ ln C (valid for C > 0)
log_c x    1/(x ln c) (valid for c > 0, c ≠ 1)
eˣ         eˣ
ln x       1/x
sin x      cos x
cos x      −sin x
tan x      sec²x
sec x      sec x tan x
cosec x    −cosec x cot x
cot x      −cosec²x
sinh x     cosh x
cosh x     sinh x
tanh x     sech²x
coth x     −cosech²x
sin⁻¹x     1/√(1 − x²) (for −1 < x < 1)
cos⁻¹x     −1/√(1 − x²) (for −1 < x < 1)
tan⁻¹x     1/(1 + x²)
sinh⁻¹x    1/√(1 + x²)
cosh⁻¹x    1/√(x² − 1) (for x > 1)
tanh⁻¹x    1/(1 − x²) (for −1 < x < 1)
Together with the product rule, quotient rule, and chain rule, these standard derivatives
are enough to differentiate a wealth of common functions, without having to worry
about tricky limiting processes.
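The table can be spot-checked numerically: for a small step h, the quotient (f(x + h) − f(x))/h should be close to the tabulated derivative. Here is a minimal sketch; the sample point x = 0.7 and the step size are arbitrary choices made purely for illustration.

```python
import math

x, h = 0.7, 1e-6   # an arbitrary sample point and a small step

table = [
    ("sin x", math.sin, math.cos(x)),            # table says: derivative is cos x
    ("e^x",   math.exp, math.exp(x)),            # e^x is its own derivative
    ("ln x",  math.log, 1 / x),                  # derivative 1/x
    ("tan x", math.tan, 1 / math.cos(x) ** 2),   # derivative sec^2 x
]
for name, f, claimed in table:
    numeric = (f(x + h) - f(x)) / h              # crude difference quotient
    print(name, round(numeric, 5), round(claimed, 5))
```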
Product rule The table of standard derivatives tells us how to differentiate y = x³ and y = sin x. But how do we differentiate y = x³ sin x? This is an instance of the general question of how to differentiate the product of two functions: y = f(x) g(x).
A common mistake is to differentiate the two separately and multiply the results together. This is wrong (as can be seen by taking f(x) = g(x) = x, for example). The correct answer, known as the product rule or Leibniz’ law, is:
dy/dx = f(x) g′(x) + f′(x) g(x)
So, if y = x³ sin x, then
dy/dx = x³ cos x + 3x² sin x
Proof of the product rule Why should the product rule be true? The reason comes from
manipulating the definition of the derivative of f(x) g(x). It must be the limit, as h → 0, of:
(f(x + h) g(x + h) − f(x) g(x))/h
By adding and subtracting f(x + h) g(x) to the top row, we can rewrite this as:
(f(x + h) g(x + h) − f(x + h) g(x) + f(x + h) g(x) − f(x) g(x))/h
This equals
f(x + h) × (g(x + h) − g(x))/h + (f(x + h) − f(x))/h × g(x)
As h → 0, f(x + h) → f(x), and so the whole thing approaches f(x) g′(x) + f′(x) g(x).
The chain rule Using the table of standard derivatives we can differentiate y = x³ and y = sin x, and thanks to the product rule, we can also differentiate y = x³ sin x. But what if we combine these functions in a different way, such as y = sin(x³)? The general question is to find dy/dx when y = f(g(x)). The answer, known as the chain rule, is:
dy/dx = f′(g(x)) × g′(x)
For example, if y = e^(sin x), then dy/dx = e^(sin x) × cos x.
Iterated applications of the chain rule can allow even more complicated functions to be differentiated. For instance, if y = e^(sin(x³)), then dy/dx = e^(sin(x³)) × cos(x³) × 3x².
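As a sanity check of the iterated example, the sketch below compares the formula e^(sin(x³)) × cos(x³) × 3x² against a crude difference quotient at an arbitrary sample point; the point and step size are assumptions made only for illustration.

```python
import math

def y(x):
    return math.exp(math.sin(x ** 3))

def dy_dx(x):
    # chain rule applied twice: outer exp, middle sin, inner x^3
    return math.exp(math.sin(x ** 3)) * math.cos(x ** 3) * 3 * x ** 2

x, h = 1.2, 1e-7
print(dy_dx(x))
print((y(x + h) - y(x)) / h)   # agrees with the formula to several decimal places
```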
Proof of the chain rule The chain rule is extremely useful, as it enormously increases
the number of functions we can differentiate. To see why it should be true requires
getting our hands dirty with the technical definition of the derivative.
If y = f(g(x)) then dy/dx is the limit, as h → 0, of:
(f(g(x + h)) − f(g(x)))/h
The top line almost looks like f(g(x) + h) − f(g(x)), in which case the whole thing would approach f′(g(x)). This isn’t quite right though.
We can manoeuvre ourselves into this position by introducing a new small number j, defined as j = g(x + h) − g(x). As h becomes very small, so too does j (since g is continuous). Now f(g(x + h)) = f(g(x) + j). So dy/dx is the limit of:
(f(g(x) + j) − f(g(x)))/h
To arrive at f′(g(x)), we would like to have j on the bottom row, instead of h. We can arrange this by multiplying the whole thing by j/j. Now dy/dx is the limit of:
(f(g(x) + j) − f(g(x)))/j × j/h
The first fraction does indeed approach f′(g(x)). So we need to understand the second, j/h. But j/h = (g(x + h) − g(x))/h, and as h → 0 this approaches g′(x). Putting the two together gives the chain rule: dy/dx = f′(g(x)) × g′(x).
The quotient rule If the product rule allows us to differentiate x³ ln x, and the chain rule ln(x³), then how do we differentiate y = x³/ln x? The general question is to differentiate y = f(x)/g(x). The answer is the quotient rule:
dy/dx = (g(x) f′(x) − f(x) g′(x))/g(x)²
So, if y = x³/ln x, then
dy/dx = (ln x × 3x² − x³ × 1/x)/(ln x)²
which is
3x²/ln x − x²/(ln x)²
Rather than being a rule in its own right, the quotient rule follows from applying the chain rule to differentiate (g(x))⁻¹ to get −g′(x)/g(x)², and then the product rule to y = f(x) g(x)⁻¹.
Implicit differentiation So far, we have begun with y given explicitly in terms of x, as y = f(x), and differentiated to get dy/dx = f′(x). However, there is no law that says we have to begin with y alone on one side of the equation: an equation relating x and y can be differentiated term by term as it stands.
This process of implicit differentiation works in exactly the same way as before. We just have to remember that whenever we differentiate y, we get dy/dx (just as whenever we differentiate sin x, we get cos x). So, if we have y x² = sin x, we can differentiate the left-hand side using the product rule, to get x² dy/dx + 2xy, while the right-hand side gives cos x.
Maxima and minima Looking at the graph of y = x³ − 3x, there are two places where the graph is perfectly horizontal. Let’s find them exactly. A horizontal graph corresponds to a gradient of 0, which means that at these two points dy/dx = 0.
So we start by differentiating: dy/dx = 3x² − 3.
If dy/dx = 0, then 3x² − 3 = 0, which means x = 1 or x = −1, giving the two points (1, −2) and (−1, 2).
In some sense these points represent maximum and minimum values of y. But (−1, 2) is not really the maximum value of y (when x = 10, y = 970, for example). It is a local maximum, however. There is no point in its immediate vicinity which is greater.
Local maxima and minima represent a curve’s turning points, and they can be found by finding the points where dy/dx = 0. Not every such point is a turning point, though. For the curve y = x³, dy/dx = 0 at the origin. Although the graph is flat here, this is not a turning point, it is a stationary inflection point.
Maxima, minima, and stationary inflection points can be distinguished by the second derivative test.
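The turning points found above can also be located by brute force: scan along the curve y = x³ − 3x, look for sign changes of the derivative 3x² − 3, and classify each using the sign of the second derivative 6x. A small illustrative sketch (the scanning range and step are arbitrary choices):

```python
def dy(x):    # derivative of y = x^3 - 3x
    return 3 * x ** 2 - 3

def d2y(x):   # second derivative
    return 6 * x

step, x = 0.001, -3.0
while x < 3.0:
    if dy(x) * dy(x + step) <= 0:                 # gradient changes sign here
        t = x + step / 2
        kind = "maximum" if d2y(t) < 0 else "minimum"
        print("local", kind, "near x =", round(t, 2), "y =", round(t ** 3 - 3 * t, 2))
    x += step
```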
The second derivative If we start with a function f and differentiate it, we arrive at the derivative f′ of f. For example, if f(x) = sin x, then f′(x) = cos x. This function f′ describes the rate of change of f. What happens if we differentiate again? We get the second derivative f″. In this case f″(x) = −sin x. When the variable y is defined by y = f(x), we write the second derivative as d²y/dx².
The derivative measures the gradient of the graph y = f(x), so what does the second derivative mean? Of course, it must be the ‘rate of change of the gradient’, but what geometric meaning does this have? If d²y/dx² is positive, that means the curve is getting steeper and appears convex from the left. If you draw a tangent, it will appear below the curve. If d²y/dx² is negative, then the gradient is decreasing and the curve appears concave from the left. If you draw a tangent it will be above the curve. If d²y/dx² = 0, then the gradient is unchanging and the tangent runs along the curve.
The second derivative test The second derivative provides a useful test for identifying maxima, minima, and stationary inflection points. At a maximum, the gradient is decreasing, meaning that d²y/dx² < 0 there. At a minimum, the gradient is increasing, so d²y/dx² > 0, and at a stationary inflection point d²y/dx² = 0. So a point where dy/dx = 0 is a local maximum if d²y/dx² < 0 there, and a local minimum if d²y/dx² > 0 there.
Partial differentiation For a function of one variable, such as y = x², the derivative dy/dx = 2x has a nice geometrical interpretation, as the gradient of the curve. So, at the point x = 4, the curve has gradient 8. More precisely, the derivative is the gradient of the tangent to the curve, the straight line which touches the curve exactly once, at the point (x, y).
Often more than two variables are involved. For example, the formula z = 2xy describes a surface (specifically a hyperbolic paraboloid) in three dimensions. We can think of this as a function which takes a pair of numbers (x and y) as input, and produces a single number (z) as output. We can return to a familiar scenario by fixing a value of y, such as y = 5. Geometrically, this corresponds to taking a slice through the surface, to get a curve with equation z = 10x. We can now differentiate this as usual, to get dz/dx = 10.
In fact, whatever value of y we fix, we will get twice that number as the derivative. This says that ∂z/∂x = 2y. Slicing the surface the other way, by fixing a value of x, similarly gives ∂z/∂y = 2x. These are the partial derivatives of z with respect to x and z with respect to y, respectively.
The basic laws for partial differentiation are exactly the same as for ordinary differentiation. For example, suppose we have a formula z = x² + 3xy + sin y, which we want to differentiate with respect to x. The only new rule is that y is treated in exactly the same way as a constant term (as if we had fixed it at some value). So we get ∂z/∂x = 2x + 3y and ∂z/∂y = 3x + cos y.
Similarly if y = 4t²x, then ∂y/∂x = 4t² and ∂y/∂t = 8tx.
For an ordinary function y = f(x), we can differentiate twice to obtain d²y/dx², the second derivative. We can do the same with partial differentiation, but there is more choice. If we have z = x² + 3xy + sin y, we can differentiate with respect to x to get ∂z/∂x = 2x + 3y, and then with respect to y to get ∂²z/∂y∂x = 3. Alternatively, starting from ∂z/∂y = 3x + cos y and differentiating with respect to x gives ∂²z/∂x∂y = 3. It is no coincidence that these two come out the same. Clairaut’s theorem guarantees that ∂²z/∂x∂y = ∂²z/∂y∂x always holds. So the order of variables by which we differentiate does not matter.
We can continue to give higher partial derivatives, such as ∂³z/∂y³ = −cos y, ∂³z/∂x∂y² = 0, and so on.
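Clairaut’s theorem can be illustrated numerically: approximate both mixed partial derivatives of z = x² + 3xy + sin y with finite differences and both should come out close to 3. A rough sketch, where the sample point and step size are arbitrary choices for illustration:

```python
import math

def z(x, y):
    return x ** 2 + 3 * x * y + math.sin(y)

h = 1e-4

def dz_dx(x, y):                  # central-difference estimate of the partial derivative in x
    return (z(x + h, y) - z(x - h, y)) / (2 * h)

def dz_dy(x, y):                  # and in y
    return (z(x, y + h) - z(x, y - h)) / (2 * h)

x0, y0 = 0.8, 0.3                 # an arbitrary sample point
mixed_xy = (dz_dx(x0, y0 + h) - dz_dx(x0, y0 - h)) / (2 * h)
mixed_yx = (dz_dy(x0 + h, y0) - dz_dy(x0 - h, y0)) / (2 * h)
print(mixed_xy, mixed_yx)         # both close to 3, as Clairaut's theorem predicts
```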
INTEGRAL CALCULUS
The expression
∫₁⁴ x² dx
represents the area enclosed by the curve y = x² and the x-axis, between the points 1 and 4.
The term d x indicates that integration is being performed with respect to the variable
x. In a subtle way, this tells us how the area is being measured. In this case, the area we
want is that swept out as x moves smoothly between its two limits, 1 and 4. The
fundamental theorem of calculus provides the method for calculating this area.
Step functions
It is not obvious how to evaluate the area under a curve. Some functions are easier to
manage though. Take the curve which is equal to 2 between 1 and 4, and 0 elsewhere.
We can write this as:
f(x) = 2 if 1 ≤ x ≤ 4, and 0 otherwise.
The area beneath this is a rectangle of width 3 and height 2, and therefore with area 3 × 2 = 6. Similarly the function
f(x) = 2 if 1 ≤ x < 4, 3 if 4 ≤ x ≤ 5, and 0 otherwise
has an area comprising two rectangles, giving a total area of 9. These are two examples of step functions, which are easy to integrate.
The definition of integration for more general curves is technical, and was first fully
worked out by Bernhard Riemann and Henri Lebesgue in the late 19th century. But the
idea is simple enough: it comes from approximating the target function ever more
closely by step functions. Almost every function can be approximated this way, and is
therefore integrable.
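The idea of approximating a curve by step functions can be tried directly. The sketch below chops the interval from 1 to 4 into thin strips, replaces y = x² by a constant on each strip, and adds up the rectangles; as the strips narrow, the total approaches the exact area of 21 calculated later in this section. The strip counts are arbitrary choices.

```python
def step_area(n):
    """Total area of n rectangular steps approximating y = x^2 between 1 and 4."""
    width = (4 - 1) / n
    total = 0.0
    for i in range(n):
        left = 1 + i * width          # the step takes the curve's height at the left edge
        total += left ** 2 * width
    return total

for n in (10, 100, 1000, 10000):
    print(n, step_area(n))
# roughly 18.8, 20.8, 20.98, 20.998, creeping up towards the exact area of 21
```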
Happily, awkward sequences of step functions remain firmly in the background for
most practical problems. If we want to integrate a curve such as f(x) = x², the
fundamental theorem of calculus provides a much easier method.
Definite integrals Integrals come in two forms: definite, in which the endpoints are
specified, and indefinite, without endpoints. The integral
∫₁⁴ x² dx
is definite, because it has endpoints 1 and 4. The result of a definite integral should be a specific number (in this case 21), which gives the area under the curve.
When the curve passes below the x-axis, the area comes out as a negative number. For example, integrating y = −x between 0 and 4:
∫₀⁴ (−x) dx = −8
If we integrate y = x between −3 and 3, the positive and negative areas cancel each other out:
∫₋₃³ x dx = 0
Indefinite integrals Definite integrals produce numbers, representing areas. But how
can we calculate these numbers? There is a function which gives them, called the
indefinite integral. For example, consider the definite integral:
∫₁⁴ x² dx
The corresponding indefinite integral is ∫ x² dx = x³/3 (using the fundamental theorem of calculus, and omitting the constant of integration). This function now allows any definite integral to be calculated. If the indefinite integral of f is F, then any definite integral is calculated according to the rule:
∫ₐᵇ f(x) dx = F(b) − F(a)
In this case that gives 4³/3 − 1³/3 = 63/3 = 21.
The next question is how to evaluate indefinite integrals. The fundamental theorem of
calculus provides the answer.
The fundamental theorem of calculus
The subject of calculus has two components: differentiation and integration. The
fundamental theorem of calculus relates the two: differentiation and integration are inverse procedures; one is the other ‘done backwards’.
If we differentiate a function f to get f′, and then integrate that, we arrive back at f (along with a constant of integration). Writing this out formally, we get:
∫ f′(x) dx = f(x) + C
Going the other way, if ∫ f(x) dx = F(x), then F′(x) = f(x). To see why, think of F(x) as the area under the curve y = f(x) up to the point x.
If we increase the value of x by a small amount, h, the area increases from F(x) to F(x + h). Geometrically, a small strip has been added, which is h wide and approximately f(x) high. This strip has an area of approximately h × f(x). (Of course the strip is not really a perfect rectangle; this is why this is only a sketch of a proof!) Therefore the total new area should satisfy F(x + h) − F(x) ≈ h × f(x).
Dividing by h gives (F(x + h) − F(x))/h ≈ f(x). This looks like the definition of the derivative of F; as h tends to zero, the left-hand side approaches F′(x), as required.
For example, differentiating x³/3 gives the answer 3x²/3 = x². So ∫ x² dx = x³/3 + C.
More generally:
∫ xⁿ dx = xⁿ⁺¹/(n + 1) + C
Other standard integrals can be found by working backwards from the table of standard
derivatives. For example:
∫ cos x dx = sin x + C
Constants of integration Constant functions such as y = 1 and y = 2 have a gradient of zero. So when we differentiate any such function, we always get dy/dx = 0.
Now consider ∫ cos x dx. The fundamental theorem of calculus tells us to search for a function y that differentiates to give dy/dx = cos x. One answer is y = sin x. But y = sin x + 1 also satisfies dy/dx = cos x, as the constant term disappears. Similarly y = sin x + 2 has derivative cos x, as does y = sin x + C for any number C.
So there is not just one possible answer: y = sin x + C is a valid solution, for every possible value of C.
It is a good idea to present the most general answer possible, so the best way to write it is as sin x + C. The unknown number C is called a constant of integration.
Evaluating definite integrals A definite integral should produce a numerical value for an area. For example, ∫₁⁴ x² dx is the area bounded by the curve y = x² and the axis, between the points 1 and 4.
The first step to calculating this is to evaluate the corresponding indefinite integral. In this case ∫ x² dx = x³/3. (We may ignore the constant of integration, as it would cancel out in the next step anyway.) The second step is to substitute the two limit values for x (in this case 1 and 4) into this function. So we get 4³/3 = 64/3 and 1³/3 = 1/3. Finally, we subtract the value for the lower limit (1) from that for the higher (4) to give our answer. In this case 64/3 − 1/3 = 63/3 = 21.
∫₁⁴ x² dx = [x³/3]₁⁴ = 4³/3 − 1³/3 = 64/3 − 1/3 = 63/3 = 21
Similarly:
∫₀^(π/2) cos x dx = [sin x]₀^(π/2) = sin(π/2) − sin 0 = 1 − 0 = 1
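Both evaluations can be mirrored in code by applying the rule ‘antiderivative at the top limit minus antiderivative at the bottom limit’ directly. A minimal sketch:

```python
import math

def definite(F, a, b):
    """Evaluate a definite integral from an indefinite integral (antiderivative) F."""
    return F(b) - F(a)

print(definite(lambda x: x ** 3 / 3, 1, 4))      # 21.0 (up to rounding)
print(definite(math.sin, 0, math.pi / 2))        # 1.0
```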
Integration by parts According to the product rule for differentiation, if we have two functions of x, say f(x) and g(x), then when we differentiate fg, we get (fg)′ = f′g + fg′. So fg′ = (fg)′ − f′g. If we reintegrate this, we get
∫ f g′ dx = fg − ∫ f′ g dx
This technique of integrating by parts is often useful where no other method seems available. To evaluate ∫ x cos x dx, we might spot that this is in the correct form, with f(x) = x and g(x) = sin x. So, applying the formula above, this integral becomes
x sin x − ∫ 1 × sin x dx
which is x sin x + cos x + C.
Integration by substitution The chain rule for differentiation says that the derivative with respect to x of f(u) is f′(u) du/dx. This provides a method for integrating anything of the form f′(u) du/dx. Namely:
∫ f′(u) (du/dx) dx = ∫ f′(u) du
We can think of the two dx terms on the left as cancelling each other out (though we should not take this too literally). This is a useful technique for integration; the skill is in spotting a suitable substitution to make for u: an integral that can be rewritten as ∫ cos u du, for instance, immediately evaluates to sin u + C.
Differentiation makes sense only for suitably smooth functions. Integration, however, is a much more robust procedure. It does not require the function to be smooth, or even continuous, just that it should be approximable by step functions. It is almost
correct to say that every function satisfies this condition. Certainly, any function you can write down explicitly can be integrated. But is every function integrable?
Ultimately, this question closely depends on the logical structure underlying the real
numbers. At the root of this structure is the prickly problem of the axiom of choice. If
we take this to be true then the existence of a non-measurable function automatically
follows. Such a function cannot be integrated, and there is no satisfactory way it can
even be written down. (This existence of non-measurable functions is also the cause of
the Banach–Tarski paradox.)
This is not a problem which impacts people’s lives very often, since every function
encountered in a practical situation will certainly be integrable. Far more troublesome
is the problem of non-elementary integrals.
Students of integration are often presented with problems of the form: evaluate ∫ f(x) dx. An example is
∫ e^(−x²) dx
It turns out that this integral cannot be expressed using any finite combination of elementary functions. This is rather inconvenient, since evaluating ∫ e^(−x²) dx is the key to working with the normal distribution, amongst other things. Nor is this an isolated case. The logarithmic integral function which occurs in the prime number theorem is also always expressed as an integral. It too cannot be expressed by any combination of elementary functions. Similarly (sin x)/x, though integrable, has no elementarily expressible integral.
Numerical analysis Non-elementary integrals are one area where the comforting
certainty provided by a single mathematical formula is lost. But this problem is much
more widespread. As we investigate more complicated differential equations, this
phenomenon recurs, or seems to, all too often. A notable instance is the Navier–Stokes
problem. This is a moment when the interests of pure mathematicians diverge from
those of engineers, for example. While mathematicians worry about the theoretical
existence and uniqueness of exact solutions, most applications require something
which works well enough for practical purposes.
Catastrophe theory According to geologists, at some moment within the next few
thousand years, the earth’s magnetic poles will spontaneously switch. This
phenomenon of geomagnetic reversal is an example of what mathematicians call a
catastrophe. This is not to suggest that the consequences for humanity will be
disastrous, rather it is an example of a process which carries along perfectly smoothly,
and then changes abruptly and without warning.
Closely related to chaos, catastrophes are widespread in both mathematics and the
natural world, and are even invoked by sports psychologists to explain sudden slumps
in performance. Such occurrences cause a headache for numerical analysts, as even
slight errors in calculation risk putting them on the wrong side of a catastrophe,
invalidating their answer.
COMPLEX ANALYSIS
Real and imaginary parts What do complex numbers look like and how can we
perform arithmetic with them? The first example of a complex number is i (the square root of −1). Others are 2 + 3i, and 1/3 + 1001i. Every complex number can be written in the form a + bi, where a and b are real numbers (indeed this is the definition of a complex number).
The numbers a and b are called, respectively, the real part and the imaginary part of z, written as Re(z) = a and Im(z) = b. For example, if z = 2 + 3i, then Re(z) = 2 and Im(z) = 3. (Notice that, somewhat confusingly, the ‘imaginary part’ of z is in fact a real number.)
It is very easy to add or subtract complex numbers presented in this form: simply add
or subtract their real and imaginary parts. So:
(1 + 3i) − (1 + i) = 2i and (3 + i) + 6i − 2 = 1 + 7i
Numbers written in modulus and argument form are even easier to multiply:
3e^(iπ/2) × 4e^(iπ/3) = (3 × 4) e^((π/2 + π/3)i) = 12 e^((5π/6)i)
Dividing complex numbers Every complex number can be written in the form a + bi, where a and b are real numbers. But complex numbers can also be divided: a quotient such as (2 + 3i) divided by another complex number can always be rewritten in the form a + bi, by multiplying the top and bottom of the fraction by the complex conjugate of the denominator. The result is then in the usual format, with its real and imaginary parts clearly visible.
Modulus and argument The complex number z = √3 + i is presented in the ordinary way, via its real and imaginary parts. If we think of the complex numbers as a plane (called the Argand diagram), then the real and imaginary parts correspond to the Cartesian coordinates of the point z, namely (√3, 1).
There is an alternative: we could use polar coordinates instead. This means that we need to measure the distance of the point (√3, 1) from 0, and find its angle from the real axis (measured in radians). Pythagoras’ theorem and a little trigonometry reveal these to be 2 and π/6 respectively. These are called the modulus and argument of the complex number, written |z| = 2 and Arg z = π/6.
Putting these together, the second way of writing the number z is as 2e^(iπ/6). In general the complex number with modulus r and argument θ can be written as re^(iθ).
The appearance of the number e here is due to Euler’s trigonometric formula. This theorem, along with a little triangular geometry, is all that is needed to switch between the modulus–argument representation re^(iθ) and the real–imaginary representation a + ib. According to Euler’s trigonometric formula, re^(iθ) = r(cos θ + i sin θ) = r cos θ + i r sin θ, so a = r cos θ and b = r sin θ.
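Python’s built-in complex numbers and its cmath module can switch between the two representations just described. A small sketch using the same number √3 + i; the numbers in the final division line are arbitrary, included only to show that ordinary arithmetic works directly.

```python
import cmath, math

z = math.sqrt(3) + 1j           # the number sqrt(3) + i in real-imaginary form
r, theta = cmath.polar(z)       # its modulus and argument
print(r, theta, math.pi / 6)    # r = 2.0 and theta = pi/6 = 0.5235...

print(cmath.rect(r, theta))     # converting back recovers (1.732... + 1j)

print((2 + 3j) / (1 - 1j))      # division also works directly: (-0.5 + 2.5j)
```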
Complex analysis At the end of the 18th century, Carl Friedrich Gauss established the
profound importance of the complex numbers, by proving the fundamental theorem of
algebra. In the early 19th century, Augustin-Louis Cauchy set about investigating the
spectacular feats which this opened up. In particular, he pursued the ideas of calculus,
and discovered how elegantly they work in this setting, more smoothly in fact than in
the real numbers. This realization marked the dawn of complex analysis, perhaps the
moment that mathematics entered full-blown adulthood. The central objects of study
are complex functions.
Complex functions The fundamental theorem of algebra tells us that the theory of
polynomials works much more smoothly at the level of the complex numbers than in
the real numbers. The same holds true when we look at the more general theory of
functions. Complex functions take complex numbers as inputs, and give complex
numbers as outputs,
f: ℂ → ℂ. The difficulty is that while we can ‘see’ a real-valued function as a 2-
dimensional graph, our human minds are incapable of visualizing the 4-dimensional
graph that a complex function demands. Despite this inconvenience, we can learn a
great deal about them. The most important complex functions are the smooth
functions.
In the real numbers, smoothness comes by degrees: some functions can be differentiated only once, others twice, three times, and so on. At the end of this sequence are the infinitely differentiable functions, the smoothest of all.
A small subsection of these are the analytic functions, essentially those which can be
written as power series. These are the functions which are easiest to work with.
All told, it is a messy picture, and it is easy to get lost in this confusing hierarchy.
Augustin-Louis Cauchy’s theory of analytic functions shows that, in the complex
numbers, the situation is infinitely simpler.
Analytic functions One of Cauchy’s most important theorems was a highly unexpected
result about smooth functions in the complex numbers: every differentiable function is
automatically infinitely differentiable. Knowing that you can differentiate f once
guarantees that you can differentiate it twice, three times, and as many times as you
like. This is emphatically not true in the real numbers. It shows what a hospitable
world the complex numbers are, once you get to know your way around.
In the complex numbers, just being differentiable guarantees that a function is analytic, that
is, it can be expressed by power series. More precisely, f is analytic if the complex
plane can be divided into overlapping discs, and the function is given by a power series
in each region. This sounds as if there is a large potential for patching together
different functions. The analytic continuation theorem shows that this is not the case.
Analytic continuation theorem When we are working with the real numbers, smooth
functions can easily be chopped up and glued together. For example, we can take the
curve y = x², cut it in two at its trough, and insert a portion of straight line 3 units long
between the two halves. This hybrid curve may not have a concise formula to define it,
but it is perfectly valid, and indeed differentiable.
The analytic continuation theorem says that the situation is utterly different in the
complex numbers. If we know the values of an analytic function on a patch of plane,
then there is exactly one way to extend it to the whole complex plane. This is two
theorems in one, and they are two
good ones! Firstly, if we are only provided with a function on a tiny patch of the plane,
we know that it automatically extends to cover the whole plane. Secondly, there is only
ever one way to do this. If two analytic functions f and g coincide, even on a tiny patch
of the plane, then they must be equal everywhere. (A ‘patch’ cannot just be a
smattering of individual points, it must have positive area, however small.)
This shows that differentiable functions in the complex numbers are far more rigid
than their slippery counterparts in the real numbers. A particularly famous application
of this fact is to the Riemann zeta function.
Picard’s theorem The theory of analytic functions, and the analytic continuation
theorem in particular, tell us that we cannot carry over our intuition about the real
numbers to understand complex functions. Consider the function f(x) = x² + 2. Viewed as a function with real numbers as inputs, what are its outputs? The answer is that every real number from 2 onwards appears as an output; those below 2 do not. The real function f(x) = sin x is even more restricted, producing outputs only between −1 and 1. The simplest of all are the constant functions, such as f(x) = 3. For every input, this produces the same output: 3.
In the late 19th century, Charles Émile Picard showed that yet again the situation is dramatically different in the complex numbers. If f is a complex analytic function and is non-constant, Picard proved that every single complex number must appear as an output of f, with possibly one solitary exception. For example, viewed as a complex function, f(z) = z² + 2 produces every complex number as an output. The exponential function f(z) = e^z misses one number: 0.
POWER SERIES
Summing powers Ordinary series consist only of numbers. But if we include a variable z, we can end up with a function. An important way to do this is to add up successive powers of z. For example:
∑ zʳ = 1 + z + z² + z³ + …
Care is needed for this series to converge. Substituting z = 2, for example, gives no chance of the series approaching a finite limit. But it will converge when |z| < 1. In this region, it produces the function (1 − z)⁻¹, as it is a geometric progression. Another power series is given by:
∑ zʳ/r = z + z²/2 + z³/3 + z⁴/4 + …
This also turns out to converge for |z| < 1. Less obviously, this converges to the function −ln(1 − z). These are both examples of power series.
Power series Generally, a power series is of the form
a₀ + a₁z + a₂z² + a₃z³ + … = ∑ aᵣzʳ
Many important functions are built in this fashion: the exponential function, the trigonometric functions, and of course all polynomials are power series where all but finitely many of the aᵣ are equal to 0.
In fact, according to the theory of analytic functions, it is almost true to say that all important functions arise this way. Certainly all well-behaved functions do, though pathological examples such as the Koch snowflake do not.
Suppose we start with the series z + z²/2 + z³/3 + z⁴/4 + …. If we differentiate this term by term, we get 1 + z + z² + z³ + …. There are some technical arguments suppressed here. There are two processes
going on: the differentiation of a function, and the limit of a series. I have assumed that
it is legitimate to swap the order of these. Happily it is valid, but Riemann’s
rearrangement theorem is a warning not to take these types of fact for granted.
Functions as power series Power series may initially seem as if they have been dreamt
up to torment the unsuspecting student. But, with a little familiarity, they really do
provide an excellent language for analysis. It is a remarkable fact that any reasonable
function can be written as a power series: this is formalized as Taylor’s theorem.
First we assume that a function, such as sin, can be written as a power series:
sin z = a₀ + a₁z + a₂z² + a₃z³ + a₄z⁴ + …
Differentiating term by term gives:
cos z = a₁ + 2a₂z + 3a₃z² + 4a₄z³ + 5a₅z⁴ + …
Differentiating again:
−sin z = 2a₂ + 6a₃z + 12a₄z² + 20a₅z³ + …
And once more:
−cos z = 6a₃ + 24a₄z + 60a₅z² + …
Setting z = 0 in each of these in turn (and remembering that sin 0 = 0 and cos 0 = 1) gives a₀ = 0, a₁ = 1, a₂ = 0 and a₃ = −1/6.
The general pattern might become clear now: aₙ = 0 whenever n is even. When n is odd, aₙ = ±1/n!, with the signs alternating. So:
sin z = z − z³/3! + z⁵/5! − z⁷/7! + …
Taylor’s theorem Taylor’s theorem guarantees that the above method for writing
functions as power series actually works. The resulting series really does converge to
sin z. In this example the series is valid for all z, although this is not true for every
function.
The sine, cosine and exponential functions are particularly nice, because only one
region is needed. These functions are given by the same series everywhere.
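The claim that the series converges to sin z can be tested by summing the first few terms and comparing with the library sine. The sketch below does this; the number of terms and the sample values are arbitrary choices for illustration.

```python
import math

def sin_series(z, terms=10):
    """Partial sum of z - z^3/3! + z^5/5! - ..."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1
        total += (-1) ** k * z ** n / math.factorial(n)
    return total

for z in (0.5, 2.0, 5.0):
    print(z, sin_series(z), math.sin(z))   # the partial sums track the true sine closely
```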
EXPONENTIATION
The exponential function Power series are central to modern mathematics, and the exponential function is the most important of all:
exp x = 1 + x + x²/2 + x³/6 + x⁴/24 + …
Written in terms of factorials, this is:
exp x = x⁰/0! + x¹/1! + x²/2! + x³/3! + x⁴/4! + …
There are several properties that make this function very special. Firstly, by
multiplying the series for exp x and exp y, we find that:
exp x × exp y = exp(x + y)
A second crucial property is what happens when we differentiate this function, term by term. The x⁴/4! term, for example, produces 4x³/4!, which is x³/3!. This shows the general pattern: the result is the same series again. That is:
d/dx (exp x) = exp x
This says that the exponential function describes its own rate of change. Indeed, the
function can be defined by this property. This is one reason that this function is so
widespread within mathematics, because it has a tendency to appear whenever calculus
is used (and mathematicians are always using calculus). Radioactive decay is one
example.
When we feed the value 1 into the exponential function, exp 1, we get the definition of
the important number e. The function is also commonly written as eˣ.
e If the imaginary number i is the cornerstone on which the complex numbers are built, then e is the key to the front door. Whereas π is a number with remarkable properties, e is more than a number: it is the public face of the exponential function. This function carries great power; e, as its most visible part, takes all the plaudits.
The number e is defined as exp 1, that is to say as the limit of the series:
1 + 1 + 1/2! + 1/3! + 1/4! + …
e can equivalently be defined as the base of the natural logarithm, or the limit of the
continuous interest sequence. The exponential function is the parent of the
trigonometric functions sin and cos, and the basis of the modulus and argument
approach to complex numbers. In fact, it is almost impossible to do mathematics
without encountering this number at every turn.
The first criteria that we should demand of complex exponentiation are that a⁰ = 1 and a¹ = a for any number a. Since exp 0 = 1 and exp 1 = e, this translates as e⁰ = 1 and e¹ = e, as required. This is a good start.
A more significant rule that we want complex exponentiation to follow is the first law of powers: aᵇ × aᶜ = aᵇ⁺ᶜ, for any a, b, c. The exponential function also satisfies this: eˣ × eʸ = eˣ⁺ʸ. We can then extend this from e to other complex powers, via the rule aᵇ = e^(b ln a). The most famous example of complex exponentiation in action is Euler’s formula.
Compound interest If you deposit $100 in a bank account which produces 5% interest,
one year later you will have the original $100, plus $5 of interest, making $105. An
elementary, but common, mistake is to believe that after two years you should have
$110. The error is that the account does not add $5 each year; it adds 5% of the amount at the beginning of the year. At the start of the second year this is $105, and 5% of this is $105 × 0.05 = $5.25. So at the end of the second year, there is $110.25 in the account.
How much money will be in the account after 25 years? Rather than working through 24 intermediate calculations, we want a short cut. Each year, the account grows by 5%, which is equivalent to multiplying the total by 1.05. At the end of the first year there is 1.05 × 100. At the end of the second, 1.05 × 1.05 × 100, that is 1.05² × 100. At the end of the third there is 1.05³ × 100, and so on. In general, at the end of the nth year, there will be 1.05ⁿ × 100. Taking n = 25, after 25 years the account will contain 1.05²⁵ × 100 ≈ $338.64.
In general, if you put $n into an account that pays m% each period, then after k periods the amount of money there will be (1 + m/100)ᵏ × n.
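The shortcut formula is easy to mirror in code; the little sketch below simply reproduces the figures from the example above.

```python
def compound(deposit, percent, periods):
    """Value after repeatedly adding `percent` interest to the running total."""
    return deposit * (1 + percent / 100) ** periods

print(round(compound(100, 5, 2), 2))    # 110.25 dollars after two years
print(round(compound(100, 5, 25), 2))   # 338.64 dollars after 25 years
```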
Continuous interest In 1689, Jacob Bernoulli discovered that beneath the arithmetic of
compound interest is some intriguing mathematics. Suppose I put $1 into a bank
account which pays 100% interest each year. After 1 year, the account will have grown
to (1 + 1)¹ = 2 dollars. Suppose instead that the account pays 50% every 6 months: after a year it will have grown to (1 + 1/2)² = 2.25. Paying 25% every 1/4 year produces (1 + 1/4)⁴ ≈ 2.44.
The interesting question is what happens if we continue this line of thought. If we split the year into tiny pieces, seconds perhaps, can we manufacture a huge yearly total? Or is there some bound that it can never exceed? The question is: what happens to the sequence (1 + 1/n)ⁿ as n gets ever larger? Bernoulli’s answer is that this sequence gets ever closer to the number e. Dividing the year into hours produces $e (to the nearest cent). In fact, this is often used as an alternative definition: e = lim as n → ∞ of (1 + 1/n)ⁿ.
In the limiting system, we no longer have discrete interest being awarded at regular periods; instead the money increases continuously (see discreteness and continuity), growing at every single moment, with e determining the rate of increase.
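Bernoulli’s limit can be watched converging: splitting the year into more and more pieces pushes (1 + 1/n)ⁿ towards e ≈ 2.71828. A minimal sketch, with an arbitrary selection of values of n:

```python
import math

for n in (1, 2, 4, 12, 365, 8760, 1_000_000):     # yearly, half-yearly, ..., hourly, ...
    print(n, (1 + 1 / n) ** n)
print("e =", math.e)                               # the values above close in on e
```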
Euler’s trigonometric formula The exponential function is defined as a power series:
e^z = 1 + z + z²/2! + z³/3! + z⁴/4! + …
Taylor’s theorem allows us to form the power series of other functions, including:
sin z = z − z³/3! + z⁵/5! − z⁷/7! + …
cos z = 1 − z²/2! + z⁴/4! − z⁶/6! + …
Leonhard Euler noticed that these three series seem to be very closely related. In fact, it almost looks as if sin z and cos z should combine together to make e^z. But simply adding them does not quite work. The trick is to feed iz into the exponential series instead:
e^(iz) = 1 + iz − z²/2! − iz³/3! + z⁴/4! + iz⁵/5! − …
The odd terms here are the series for cos z. The remaining ones are i multiplied by the series for sin z. Putting this together, we get Euler’s trigonometric formula:
e^(iz) = cos z + i sin z
Rearranging produces expressions for cos and sin themselves:
cos z = (e^(iz) + e^(−iz))/2 and sin z = (e^(iz) − e^(−iz))/2i
These are very useful formulas. Taking n = 3, 4, … in de Moivre’s theorem can produce
triple, quadruple angle trigonometric formulas, and so on.
Euler’s formula Leonhard Euler’s work appears in this book more often than that of
any other mathematician. His mathematics was extensive and decisive. He also made
major contributions to other areas of science, including astronomy and optics. But he
may be best known for something he probably never actually wrote down (although it
is an immediate consequence of his work).
Euler’s formula is an equation which beautifully unites the five fundamental constants
of mathematics, and hints at awesome, unimagined depths to the complex world:
e^(iπ) + 1 = 0
The Nobel-prize-winning physicist Richard Feynman has called this ‘the most
remarkable formula in mathematics’. Why should this exquisite equality hold true? It
follows directly from Euler’s trigonometric formula. According to that, e^(iπ) = cos π + i sin π. Since cos π = −1 and sin π = 0, Euler’s formula follows.
Natural logarithm The natural logarithm is the inverse of the exponential function: if
exp x y, then ln y x. It is written ‘ln’ but pronounced ‘log’. Equivalently, the natural
logarithm is the logarithm to base e. The natural logarithm was one of the first
glimpses that mathematicians had of the exponential function, and was first tabulated
by John Speidell in 1619.
Calculus of the natural logarithm Integrating powers is routine work: the integral of x² is x³/3 + C, for example. The awkward case is 1/x: following the usual pattern, its integral would be x⁰/0 + C, which is meaningless. Instead:
∫ 1/x dx = ln x + C
Indeed, this is what makes the natural logarithm ‘natural’. To see why this should be true, we approach the problem from the other side, and show that if y = ln x, then dy/dx = 1/x. If y = ln x, then by definition e^y = x. Now we can differentiate this with respect to x, using the chain rule: e^y × dy/dx = 1. So dy/dx = 1/e^y, which says that dy/dx = 1/x.
Hyperbolic trigonometry To make Euler’s trigonometric formula work, we had to
introduce complex numbers. A way to avoid this is by using the following hyperbolic
functions:
cosh z = (e^z + e^(−z))/2 and sinh z = (e^z − e^(−z))/2
Their power series are cosh z = 1 + z²/2! + z⁴/4! + z⁶/6! + … and sinh z = z + z³/3! + z⁵/5! + z⁷/7! + …
Straight from this definition, we get cosh x + sinh x = eˣ, the hyperbolic equivalent of
Euler’s trigonometric theorem.
All standard trigonometric facts and formulas have hyperbolic counterparts. For
instance, instead of cos²x + sin²x = 1, we have cosh²x − sinh²x = 1. This fact provides a clue to their name too. In the real numbers, if you plot a graph of the points (cos θ, sin θ) as θ varies, the result is a circle. Plotting (cosh θ, sinh θ) produces a hyperbola.
Hyperbolic functions are excellent examples of the complex numbers at work. In the
geometry of the real numbers, the graphs of sinh and cosh had been encountered, in the
Bernoullis’ study of catenaries. They do not resemble the sine and cosine waves,
however, being non-periodic. Nevertheless, the complex numbers reveal them as the
close cousins of the usual trigonometric functions: cosh z = cos(iz) and sinh z = −i sin(iz).
FRACTALS
Self-similarity The word fractal does not have a formal definition. But these
fascinating shapes share a property of self-similarity: a scale-defying tendency to look
the same, however far you zoom in. For example, the Sierpinski triangle is obtained by
taking an equilateral triangle, dividing it into four smaller triangles, and removing the
central one. Then the same process is repeated with each of the three remaining
triangles, and so on. Once this is completed the set that remains is self-similar: if you
shrink its width by a half, then it exactly fits into one of the corners of the original.
Other fractals arise from the study of dynamical systems, including the most famous fractal of all, the
Mandelbrot set, and the closely related Julia sets, which show how fractals appear as
strange attractors in chaos theory.
Pentaflakes The only regular polygons which tessellate (see tessellations) are
equilateral triangles, squares and hexagons; pentagons do not. Unperturbed by this, in
his 1525 book Instruction in Measurement the German artist and polymath Albrecht
Dürer proceeded to lay six regular pentagons edge to edge creating a new shape,
resembling a pentagon, but with a notch cut out of each side. Then he laid six of these
notched pentagons edge to edge, to obtain a more complicated figure. The beautiful
shape which results from repeating this process is now known as a pentaflake, and has
a strong claim to being history’s first fractal.
Koch snowflake Fractal patterns often appear when familiar objects are subjected to
minor changes, repeated infinitely often. One such was the Koch snowflake,
discovered by Helge von Koch in 1906. The idea is simple: first draw an equilateral
triangle. Then take each side in turn, divide it into three pieces, and build an equilateral
triangle on the middle section. Now, delete the original portion of line, just leaving the
two new sides. At this stage, we have a six-pointed star.
Now repeat this process on every straight section of the new shape, and keep repeating.
The Koch snowflake is defined to be the curve generated by this process.
The resulting curve is infinitely long, but encloses a finite area (in fact 8/5 times the area of the original triangle). The fractal dimension of this curve is log 4/log 3. As well as being aesthetically appealing, mathematically this curve aroused
interest as an example of something continuous (it doesn’t have any jumps or gaps),
but not differentiable: it is not smooth anywhere; you cannot draw a tangent to it at any
point.
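The ‘infinitely long but finite area’ claim can be watched happening: at each step every edge is replaced by four edges a third as long, so the perimeter keeps growing while the added area shrinks geometrically. A sketch, taking the original triangle to have side 1 (an arbitrary normalization):

```python
import math

side, edges = 1.0, 3                 # start from an equilateral triangle with unit sides
area = math.sqrt(3) / 4              # its area
for step in range(1, 8):
    new_triangles = edges            # every edge sprouts one new small triangle
    side /= 3
    area += new_triangles * math.sqrt(3) / 4 * side ** 2
    edges *= 4
    print(step, "perimeter =", round(edges * side, 3), "area =", round(area, 5))

print("limit of the area:", 8 / 5 * math.sqrt(3) / 4)   # 8/5 of the original triangle
```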
Cantor dust Start with a segment of line, 1 unit long. Divide it into three equal parts,
and throw away the middle section. Now repeat the process with each of two
remaining pieces: chop each into thirds and throw away the middle section. Keep
repeating this process. The collection of points which never get thrown out, and still
remain after infinitely many steps, is called Cantor dust.
Georg Cantor himself constructed this fractal object to illustrate that the connection between his new infinities (see countable and uncountable infinities) and geometry is not at all straightforward. From a set-theoretic point of view, Cantor dust contains a lot of points. In fact it is in one-to-one correspondence with the entire set of real numbers. But, from a geometric point of view, there is barely anything there. At the start, the segment has length 1. After the first step, it has length 2/3, and then 4/9, 8/27, and so on, shrinking away towards 0.
Fractal dimension Covering a line 1 m long with smaller lines, each 1/3 as long, will require three of them. Beginning instead with a square 1 m × 1 m, and covering it in smaller squares, each 1/3 × 1/3 m, will require nine: in each case the number needed is 3 raised to the power of the dimension (see powers). Looking at the top part of the Koch curve, if we want to cover it with smaller copies of itself, 1/3 as long, we need four of them. So, if its dimension is D, this should satisfy 4 = 3^D. Anyone adept at logarithms can solve this as D = log 4/log 3,
approximately 1.26. With fractal dimension between 1 and 2, Koch’s curve can be
thought of as intermediate between a line and a surface. Cantor dust has fractal
dimension log 2/log 3, which is around 0.63, lying between a point and a line.
Fractal dimension was first discovered by Lewis Fry Richardson, in his investigation
of the coastline problem.
The coastline problem In the early 20th century, the wide-ranging British scientist
Lewis Fry Richardson was gathering data on how the lengths of national boundaries
affect the likelihood of war, when he hit an inconvenient obstacle: the Spanish
measured their border with Portugal as being 987 km, but the Portuguese put it at 1214
km. Contemplating this, Richardson realized that whereas the length of a straight line
is
unambiguously defined, for a very wiggly one the answer would depend on the scale
you used to measure it. This work was rediscovered by Benoît Mandelbrot, who in
1967 wrote an article: ‘How Long is the Coast of Britain?’
To try to answer this question, you could first take a map from a schoolbook and
measure the outline in straight 100 km sections. But using a higher-resolution map and
a shorter scale of say 10 km, taking into account all the extra kinks and wiggles, will
produce a larger result. In the extreme case, you could set out yourself with a ruler, and
try to measure the exact line of high tide by hand. The extra distance could push the
estimate up to millions of times your original answer; this is the so-called Richardson
effect.
Richardson had found that ‘At one extreme, D = 1.00 for a frontier that looks straight on the map. For the other extreme, the west coast of Britain was selected because it looks like one of the most irregular in the world; it was found to give D = 1.25’.
Because Peano’s curve covers the whole square, it has fractal dimension 2.
Subsequently generalizations of Peano’s curve have been found which can fill a 3-
dimensional cube, or any n-dimensional hypercube.
Kakeya’s needle Imagine you have a needle lying on a table-top, and you want to
rotate it a full 360° by sliding it around (you are not allowed to pick it up). But there’s
a catch: the needle is covered in ink, and as you push it around it leaves its path painted
out behind it. The question Soichi Kakeya asked in 1917 is: what is the minimum
possible area for the resulting shape?
The surprising answer, supplied by Abram Besicovitch in 1928, is that you can make
the area as small as you like. There is no way to do it leaving a shape with zero area,
but it can be done by leaving an area of 0.1 square units, or 0.001, or as small an area
as required.
Kakeya’s conjecture The sets left by Kakeya’s needle have an interesting property:
they contain a line 1 unit long pointing in every single direction. Shapes like this are
known as Kakeya sets. The needle problem concerns such sets on a 2-dimensional
plane, but the same definition makes sense in higher dimensions.
Kakeya’s conjecture concerns the fractal dimension of Kakeya sets. It says that a
Kakeya set in n-dimensional space must have the maximum possible fractal dimension,
that is to say n. This means that the sets are substantial, unlike Cantor dust. At time of
writing, Kakeya’s conjecture remains open.
DYNAMICAL SYSTEMS
Dynamical systems One of the joys of mathematics is the way fabulous beauty and
flabbergasting complexity can arise from seemingly simple situations. Often, all that is
needed is the right angle of approach.
The formula z² + 0.1 is hardly a wonder of the mathematical world. If we substitute the value z = 0 into it we get 0² + 0.1 = 0.1. What happens if we put this value back into the formula? We get 0.1² + 0.1 = 0.11. Putting this back in again, we get 0.11² + 0.1 = 0.1121. If we keep doing this, after about 13 iterations the result settles down to a number close to 0.112701665.
A dynamical system arises when the output of a function is repeatedly fed back in as
its input. This example is a quadratic system, as it is based on the quadratic function z → z² + c.
The Mandelbrot set records, for each value of c, whether the resulting sequence starting at z = 0 remains bounded. Real values of c produce bounded sequences only when c lies between −2 and 0.25. This does not seem very exciting, until we also consider complex values of c, when the spectacular fractal discovered by Benoît Mandelbrot in 1980 is fully revealed.
The different bulbs of the Mandelbrot set correspond to different types of attracting
cycles of the function z → z² + c. The central heart-shaped region (called the main
cardioid) comprises the set of values of c for which the system has a unique attracting
fixed point. The largest circular disc then represents those where there is an attracting
2-cycle (see attracting cycles). The smaller bulbs correspond to attracting cycles of
different lengths.
Julia sets The number −1 lies in the Mandelbrot set, because the sequence we get by repeatedly applying z → z² − 1, starting at z = 0, remains bounded (oscillating for ever between −1 and 0). But what happens if we start the sequence at somewhere other than 0, say at z = 2? This time the sequence does rush off to infinity. This means that 0 is in the Julia set of z² − 1, but 2 is not.
We can replace −1 with any other complex number c, and look at the Julia set for z² + c. Julia sets typically form fantastically intricate patterns. In fact for some values of c the Julia set of z² + c resembles the Mandelbrot set itself (such as 1 − φ, where φ is the golden ratio). The Mandelbrot set serves as a map of all these patterns, with every point in it producing its own Julia set (and those outside producing Cantor dust).
More generally, we can replace z² + c with any function f, and ask which starting
numbers cause the system obtained by repeatedly applying f to remain bounded. Many
extraordinarily organic patterns can be formed this way. This idea was discovered by
Gaston Julia in 1915, before the term ‘fractal’ had been coined, and without the aid of
modern computers to display them in their full glory.
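Membership of these sets can be probed with a simple ‘escape’ test: iterate z → z² + c and see whether the values stay small. The sketch below checks the points discussed above; the iteration count and the escape radius of 2 are conventional choices, not anything prescribed by the text.

```python
def stays_bounded(c, z0=0, steps=100):
    """Crude escape test for the iteration z -> z^2 + c."""
    z = z0
    for _ in range(steps):
        z = z * z + c
        if abs(z) > 2:        # once |z| exceeds 2 the sequence is guaranteed to escape
            return False
    return True

print(stays_bounded(-1))           # True:  -1 lies in the Mandelbrot set
print(stays_bounded(1))            # False: 1 does not
print(stays_bounded(-1, z0=0))     # True:  0 lies in the Julia set of z^2 - 1
print(stays_bounded(-1, z0=2))     # False: 2 does not
```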
The logistic map The Mandelbrot set is a picture of the simplest non-linear system, the quadratic system, x → x² + c. Only slightly more complicated is the logistic map:
x → rx(1 − x)
Again, just one number needs to specify this system, this time known as r. With the
logistic map, the characteristic properties of a dynamical system can be seen without
leaving the real numbers.
We form a sequence by repeatedly applying this function, and watch how it evolves. The starting value of the sequence does not matter too much (see attracting cycles), as long as it is between 0 and 1. For convenience, we will begin at x = 1/3. The evolution of the sequence depends on r.
If r = 1, the sequence beginning at 1/3 (or any other randomly selected number between 0 and 1) quickly settles down to a single value of 0. If r = 2, this sequence proceeds: 0.33333, 0.44444, 0.49383, 0.49992, … to five decimal places. In this case the sequence never arrives at a fixed value, but it does get ever closer to the number 1/2.
When r = 3.2, the sequence does not get close to a single value, but flickers alternately between values near 0.51304 and 0.79946. This is called an attracting 2-cycle. The values of the 2-cycle depend closely on r, but not on the starting value: the same thing happens to the sequence beginning at 1/10 (and almost every other point between 0 and 1). Similarly when r = 3.5, the system has an attracting 4-cycle at approximately: 0.38282, 0.82694, 0.50088, 0.87500. Not all attractors are finite cycles like this. In higher-dimensional situations, the sequence may instead be drawn towards a strange attractor.
If r = 5, on the other hand, the sequence beginning at 1/3 proceeds 0.33, 1.11, −0.62, −4.99, −149.54, −112557.70, …, quickly diverging to −∞. It is between the values of r = 0 and r = 4 that the interesting action happens: these values produce a sequence of bifurcations, followed by chaos.
Bifurcations When r is between 0 and 3, the logistic map has a single attracting point.
From 3 to around 3.45, it has an attracting 2-cycle. Between approximately 3.45 and
3.54, it has an attracting 4-cycle, and between approximately 3.54 and 3.56, an
attracting 8-cycle. As r gets closer to 3.57, the attracting cycle keeps doubling in
length: 16, 32, and so on. These bifurcations are illustrated in the diagram overleaf.
When r is bigger than 3.57, we see a new form of behaviour, namely chaos. The
threshold of chaos is called the Feigenbaum number, and is approximately
3.5699.
Chaos The logistic map is one of the most commonly studied dynamical systems, as it
illustrates the basic aspects of chaos theory. Between r = 3 and around 3.57, the logistic map exhibits a sequence of bifurcations. At this point, chaos takes over. A sequence, starting at 1/3, with r = 3.58, begins as follows: 0.333, 0.796, 0.582, 0.871, 0.403, 0.861, 0.428,
0.876, 0.388, … One might hope that after some time this sequence would settle down
into a discernible pattern. But the essence of chaos is that this is not so. After r = 3.57
most sequences will not be drawn into an attracting cycle, but will jump around for
ever, apparently at random.
Of course in the real world, it is impossible to know the value of r with infinite
precision.
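The behaviour described above is easy to reproduce: iterate x → rx(1 − x) from x = 1/3 and look at the trailing values for a few choices of r. A minimal sketch; the warm-up length and number of values shown are arbitrary choices.

```python
def logistic_tail(r, x=1/3, warm_up=200, show=4):
    """Iterate x -> r*x*(1-x) and return the last few values."""
    for _ in range(warm_up):
        x = r * x * (1 - x)
    tail = []
    for _ in range(show):
        x = r * x * (1 - x)
        tail.append(round(x, 5))
    return tail

print(logistic_tail(2.0))    # settles at 0.5
print(logistic_tail(3.2))    # flickers between roughly 0.51304 and 0.79946
print(logistic_tail(3.5))    # an attracting 4-cycle
print(logistic_tail(3.58))   # chaos: no repeating pattern emerges
```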
Three-cycle theorem The bifurcations of the logistic map suggest that attracting cycles
tend to come with lengths that are a power of 2. This system does also have a 3-cycle,
however, the first occurring at r 1
8. The logistic map is an example of a 1-dimensional system, since it only involves one
variable x. In 1975, Tien-Yien Li and James Yorke proved that for any such system a
3-cycle is, paradoxically, a sure indicator of chaos.
In their paper ‘Period Three Implies Chaos’, they proved that if a 1-dimensional
system has an attracting 3-cycle anywhere, then it must also have attracting cycles of
length 2, 4, 5, and every other finite length, as well as chaotic cycles.
The phrase coined by Edward Lorenz to describe this was the butterfly effect. The
equations which describe the Earth’s weather are also believed to be chaotic. So even a
tiny change in air flow, such as a butterfly fluttering its wings in Brazil can trigger
dramatic changes in weather patterns, eventually leading to tornadoes in Texas.
Chaotic systems The logistic map was the first simple, chaotic process studied in
depth. In the 1940s, John von Neumann first saw the potential of the system x → 4x(1 − x) as a pseudo-random number generator. In the 1970s, the system was revisited by the
biologist Robert May. The logistic map was intended as a simple model of the changes
of a population of fish from year to year. Of course, the dynamics of the fish
population turned out to be rather less simple than had been hoped.
Many chaotic systems have been identified throughout science. The origins of the
subject date back to Newtonian physics and Henri Poincaré’s work on the chaotic three
body problem. Subsequently, chaotic systems have been used to analyse phenomena
from epileptic seizures to stock-market crashes. A lava lamp illustrates the potential for
chaos in fluid dynamics.
Strange attractors The quadratic system and the logistic map are both examples of 1-
dimensional dynamical systems, based on functions which take one number as input
and one as output.
247
Higher-dimensional systems are also possible. For example, the Henon phase is a 2-
dimensional function which takes as input a pair of numbers, x and y, and applies the
following rule:
x → x cos θ − (y − x²) sin θ
y → x sin θ + (y − x²) cos θ
Here θ is any fixed angle. In the resulting dynamical system, points are gradually drawn towards a particular subset of the plane, called the system’s attractor. The exact geometry of this attractor will depend on the choice of θ.
As the Henon phase illustrates, attractors of dynamical systems are a rich source of
beautiful imagery. If the fractal dimension of the attractor is not a whole number, it is
called a strange attractor.
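A minimal Python sketch of this kind of experiment (the angle of 1.3 radians and the starting point are arbitrary choices made for this illustration) iterates the rule above and collects the points visited:

import math

def orbit(theta=1.3, x=0.05, y=0.05, n=5000):
    """Iterate the two-dimensional rule above and return the points visited."""
    c, s = math.cos(theta), math.sin(theta)
    points = []
    for _ in range(n):
        x, y = x * c - (y - x * x) * s, x * s + (y - x * x) * c
        if abs(x) > 1e6 or abs(y) > 1e6:   # some starting points escape to infinity
            break
        points.append((x, y))
    return points

pts = orbit()
print(len(pts), pts[:3])   # plotting the points reveals the geometry of the orbit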
DIFFERENTIAL
EQUATIONS
A differential equation is an equation that involves a derivative dy/dx, such as dy/dx = 6x. One solution of this is y = 3x² + 5, and indeed y = 3x² + C is a solution for any constant C. So, instead of getting a single solution, we get a space of solutions.
Boundary conditions Suppose a boy is cycling down a hill with constant acceleration of 2 m/s². This corresponds to a differential equation d²s/dt² = 2, where s is the distance travelled down the hill, and t is the time since he started travelling. Integrating twice produces a 2-dimensional space of solutions:
(i) ds/dt = 2t + C
(ii) s = t² + Ct + D
Some extra data can specify a solution more precisely. If we are additionally told that the boy’s speed at the start is 4 m/s, this translates as saying that, at t = 0, ds/dt = 4. Substituting this into equation (i), we find that C = 4. So our solution space has been narrowed to a 1-dimensional space:
(iii) s = t² + 4t + D
Suppose we have a second bit of data: at time t = 0 the boy is at the top of the hill, that is, s = 0. Then we can put this into equation (iii) and find that D = 0, so now we have a unique solution:
(iv) s = t² + 4t
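For readers who like to check such calculations by machine, here is a small sketch using the SymPy library (an assumption of this example, not something the text relies on) to solve the same equation with the same two boundary conditions:

import sympy as sp

t = sp.symbols('t')
s = sp.Function('s')

# d^2 s / dt^2 = 2, with s(0) = 0 and s'(0) = 4
equation = sp.Eq(s(t).diff(t, 2), 2)
solution = sp.dsolve(equation, s(t), ics={s(0): 0, s(t).diff(t).subs(t, 0): 4})
print(solution)   # Eq(s(t), t**2 + 4*t)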
Consider the equation 2y dy/dx = cos x. The left-hand side is the derivative with respect to x of y², so integrating both sides gives y² = sin x + C.
More generally, if we have a differential equation f′(y) dy/dx = g′(x), we recognize the left-hand side as the derivative with respect to x of f(y), and solve it to get:
f(y) = g(x) + C
It is very tempting to think of this as ‘multiplying both sides by dx’, and then integrating: ∫f′(y) dy = ∫g′(x) dx. But some caution is needed, as it is by no means clear what, if anything, ‘multiplying both sides by dx’ actually means. Ultimately, this procedure relies on spotting applications of the chain rule. But this short-hand of separating the variables is useful.
Another example is e^y dy/dx = 4x. Then ∫e^y dy = ∫4x dx, and e^y = 2x² + C.
If we let y be the mass of the lump and let t stand for the time since the clock started, then the rate of change of mass is dy/dt, and the equation becomes dy/dt = y. By separating the variables (or by treating this as a homogeneous equation; see higher-order differential equations) we end up with a formula y = Ce^t. If we additionally have a boundary condition, say at t = 0 the lump had an initial mass of 2 kg, then we can solve the equation fully: y = 2e^t. This formula tells us the mass of the lump after any length of time.
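The same SymPy sketch (again an illustration only, assuming that library) handles this separable equation and its boundary condition directly:

import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# dy/dt = y, with y(0) = 2
solution = sp.dsolve(sp.Eq(y(t).diff(t), y(t)), y(t), ics={y(0): 2})
print(solution)   # Eq(y(t), 2*exp(t))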
Second and higher-order equations involve higher derivatives such as d²y/dx². A homogeneous equation, such as d²y/dx² + 5 dy/dx = 0, is one whose left-hand side consists only of multiples of y, dy/dx and d²y/dx² added together, and whose right-hand side is zero. These homogeneous equations are more manageable.
This procedure works just as well for higher-order homogeneous equations, and other
forms of second and higher-order equations have different methods of solution.
However, many cannot be solved exactly at all, and rely on numerical analysis for
approximate solutions.
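As a small illustration of that last point (a sketch, not a method described in the text), here is a classical fourth-order Runge–Kutta step applied to the pendulum equation d²θ/dt² = −sin θ, which has no elementary closed-form solution:

import math

def pendulum_rk4(theta=1.0, omega=0.0, dt=0.01, steps=1000):
    """Approximate d^2(theta)/dt^2 = -sin(theta) by rewriting it as two
    first-order equations and taking Runge-Kutta steps."""
    def deriv(th, om):
        return om, -math.sin(th)          # (d theta/dt, d omega/dt)
    for _ in range(steps):
        k1 = deriv(theta, omega)
        k2 = deriv(theta + dt * k1[0] / 2, omega + dt * k1[1] / 2)
        k3 = deriv(theta + dt * k2[0] / 2, omega + dt * k2[1] / 2)
        k4 = deriv(theta + dt * k3[0], omega + dt * k3[1])
        theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        omega += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, omega

print(pendulum_rk4())   # approximate angle and angular velocity after 10 seconds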
Partial differential equations are of great importance in physics, where they are often expressed in terms of vector calculus. Many physical phenomena are described by such equations. Examples include the heat equation, Maxwell’s equations for electromagnetic fields, the Navier–Stokes equations for fluid flow, and Schrödinger’s equation, which governs quantum mechanics.
If, say, ∂z/∂x = 0, then z = 4 works as a solution, as does z = 3y, and z = sin y. In fact, z = f(y) is a solution for any function f. When differentiating with respect to x, any term involving y alone is treated as a constant and therefore disappears.
This illustrates that partial differential equations may have much bigger families of
solutions than ordinary differential equations. Finding precise descriptions of these
families can pose a major challenge. In other cases it is not clear whether the equation
has any solutions at all.
A notable example is the Navier–Stokes problem.
FOURIER ANALYSIS
Sine waves To a mathematician’s ear, the purest sound is that carried by a sine wave or
sinusoid. For a sine wave at concert pitch A (a frequency of 440 Hz), and volume
around that of ordinary conversation (an amplitude of around 1/2,000,000 metres), the equation of the wave will be y = sin(440t)/2,000,000. One of the advantages of working with sine and cosine waves is that their harmonics have easy equations. If y = sin t is the first harmonic, then y = sin 2t is the second, y = sin 3t the third, and so on.
(Figure: graphs of y = sin t and y = cos t.)
Musical instruments produce much more complex waveforms than the plain sine wave.
In 1807, Joseph Fourier made the magnificent discovery that all such waves can be
built from sine waves. This is the subject of Fourier analysis, and is of incalculable
importance in modern technology.
The basic waveforms we have are the sine wave, with its equation y = sin t, and its family of harmonics: y = sin 2t, y = sin 3t, y = sin 4t, and so on. We can form many more interesting waveforms by adding together or superposing multiples of these: y = sin t + sin 3t, or y = sin t + (3/2) sin 2t + (1/2) sin 4t.
Two superposed waves will reinforce each other at some places, and cancel each other
out at others; this is known as interference, and has the result that the superposition of
the two waves will have a much more complicated appearance than either of the
original waves.
Fourier series Waves come in all shapes and sizes. A saw-tooth wave (pictured top)
looks radically different to the mathematician’s favourite, the sine wave. It sounds
different too. It also looks different from the smooth waves that can be built from sine
waves by simple addition. We can get something quite close to it though:
y = sin x + (sin 2x)/2 + (sin 3x)/3 + (sin 4x)/4
Once we have spotted the pattern, it is clear what needs to happen: next we need to add on (sin 5x)/5 and then (sin 6x)/6. As we add more and more such terms, we get closer to the saw-tooth wave, without ever quite getting there.
(Figures: a square wave and a triangle wave.)
By taking the infinite limit of these terms, we can produce an exact formula for the saw-tooth wave, as a Fourier series:
sin x + (sin 2x)/2 + (sin 3x)/3 + (sin 4x)/4 + … = Σ (sin nx)/n
(the sum running over n = 1, 2, 3, …).
Similar tricks work for other waveforms such as the square wave
sin x + (sin 3x)/3 + (sin 5x)/5 + (sin 7x)/7 + … = Σ sin((2n − 1)x)/(2n − 1)
and the triangular wave:
sin x − (sin 3x)/9 + (sin 5x)/25 − (sin 7x)/49 + … = Σ (−1)^(n+1) sin((2n − 1)x)/(2n − 1)²
These series are all of the form Σ aₙ sin nx, where the aₙ are carefully chosen numbers. In general a Fourier series can also include harmonics of cosine waves, giving the general formula:
Σ aₙ sin nx + Σ bₙ cos nx
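To see the convergence numerically, a short Python sketch (illustrative only) can compare partial sums of the saw-tooth series against the value (π − x)/2, the saw-tooth shape this particular series builds on the interval 0 < x < 2π; that identity is a standard fact assumed here rather than taken from the text:

import math

def sawtooth_partial_sum(x, terms):
    """Sum the first few terms sin(nx)/n of the saw-tooth Fourier series."""
    return sum(math.sin(n * x) / n for n in range(1, terms + 1))

x = 1.0
target = (math.pi - x) / 2          # the saw-tooth value at x, for 0 < x < 2*pi
for terms in (5, 50, 500):
    print(terms, round(sawtooth_partial_sum(x, terms), 4), round(target, 4))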
Fourier’s theorem Objects similar to Fourier series had been studied before Joseph
Fourier turned his mind to them in 1807. His great contribution was to realize that
every reasonable periodic function can be given by a Fourier series. This was Fourier’s
theorem.
In such a series, aₙ and bₙ are some numbers. The challenge is to find the correct values of aₙ and bₙ.
Fourier’s formulas To express a waveform f as a Fourier series we need to find the right values for aₙ and bₙ. Fourier proved that the answers are:
aₙ = (1/π) ∫ f(t) sin nt dt and bₙ = (1/π) ∫ f(t) cos nt dt
(the integrals being taken over one period of f). To split any reasonable waveform into sine and cosine waves, all that needs to be done is to evaluate these two integrals. In some cases this may be more easily said than done, of course, and can require sophisticated techniques of numerical analysis.
Using the relationship between the exponential and trigonometric functions, each pair of terms can be combined: aₙ sin nx + bₙ cos nx can be rewritten as cₙ e^(inx) + c₋ₙ e^(−inx), for suitable complex numbers cₙ. Now we get a much slicker expression for the complex Fourier series:
f(x) = Σ cₙ e^(inx)
To write a function f(x) in this form, we have to evaluate the complex version of Fourier’s formula:
cₙ = (1/2π) ∫ f(t) e^(−int) dt
Although more abstract than Fourier’s original series, this discovery has been no less important.
In the continuous case, f̂ is a function on the real numbers, and the series becomes an integral:
f(x) = ∫ f̂(t) e^(itx) dt
Starting with f, how do we get at f̂? The answer is again given by Fourier’s formula, slightly tweaked:
f̂(t) = (1/2π) ∫ f(x) e^(−itx) dx
Passing between f and f̂ often exposes symmetry which was not observable before. Exploiting this symmetry allows many difficult
problems to be rephrased in ways that make them manageable; important examples are
tricky partial differential equations. The Fourier transform is a powerful weapon in the
modern mathematician’s arsenal, and has wide applications, from representation theory
to quantum mechanics.
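As a tiny numerical illustration of Fourier’s formulas (a sketch only; the saw-tooth function and the sampling step are chosen for this example rather than taken from the text), aₙ can be approximated by a simple Riemann sum, and for the saw-tooth wave the answers come out close to 1, 1/2, 1/3, …:

import math

def fourier_a(f, n, samples=100000):
    """Approximate a_n = (1/pi) * integral of f(t) sin(nt) dt over one period."""
    total = 0.0
    dt = 2 * math.pi / samples
    for k in range(samples):
        t = (k + 0.5) * dt
        total += f(t) * math.sin(n * t) * dt
    return total / math.pi

sawtooth = lambda t: (math.pi - t) / 2     # one period of a saw-tooth wave on (0, 2*pi)
for n in (1, 2, 3, 4):
    print(n, round(fourier_a(sawtooth, n), 4))   # close to 1, 1/2, 1/3, 1/4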
LOGIC
Shortly after Boole’s work, logic was unexpectedly propelled into the limelight, by
Georg Cantor’s work on set theory. His astonishing revelation that there are different
levels of infinity precipitated what Hermann Weyl called a ‘new foundational crisis’ in
mathematics. Were there firm logical foundations on which the rest of mathematics
could depend? Russell’s paradox brought special urgency to this question.
The challenge was articulated most clearly by David Hilbert, who set out the requirements he thought a logical foundation for mathematics should satisfy (see Hilbert’s program).
During the 20th century, Hilbert’s dream was shattered first by the incompleteness
theorems of Kurt Gödel, and then by Alonzo Church and Alan Turing’s work on the
Entscheidungsproblem.
Although the mathematical realm turned out to be infinitely more complex than Hilbert
expected, from the wreckage of his program sprang a whole new world. Most
importantly, Turing machines formed the theoretical basis on which physical
computers would later be built. Beyond this, new branches of mathematical logic
blossomed, bringing powerful new tools. These include proof theory, model theory,
complexity theory and computability theory.
BASIC LOGIC
Necessary and sufficient ‘Socrates is mortal’ is a necessary condition for the statement
‘Socrates is human’, because he must be mortal in order to be human; there is no other
way around it. On the other hand, it is not a sufficient condition; being mortal does not
on its own guarantee being human. Socrates could be a dog, a duck or a demigod.
However, ‘Socrates is a man’ is a sufficient condition for ‘Socrates is human’.
Necessity and sufficiency are two sides of the same coin: if statement P implies
statement Q, then P is a sufficient condition for Q, and Q a necessary condition for P.
When P is both a necessary and sufficient condition for Q to hold, the two are logically
equivalent, and mathematicians often use the phrase ‘if and only if’ to express this. For
example, Socrates is a bachelor if and only if he is an unmarried man.
Consider the two implications:
1 If Socrates is human, then Socrates is a mammal.
2 If Socrates is not a mammal, then Socrates is not human.
These two implications have essentially the same meaning: that humans form a subset of mammals. Statement 2 is called the contrapositive of 1. In general the contrapositive of ‘P implies Q’ is ‘not Q implies not P’. The contrapositive is a rephrasing of the original implication, a different way of expressing the same thing. The same is not true of the converse.
The converse of statement 1 is ‘If Socrates is a mammal, then he is human’. This is false (Socrates might be a dog, for example). If both ‘P implies Q’ and its converse ‘Q implies P’ hold, then P and Q are logically equivalent: P is true if and only if Q is true.
The connectives not, and and or are sometimes written as ¬, ∧ and ∨ respectively. In broader mathematics, as well as within logic, ‘or’ is interpreted inclusively. That is, ‘a or b’ always means ‘a or b or both’. (When the exclusive or is needed, it is written xor.)
While the earliest formal logic, propositional calculus, was being developed by George Boole in the 19th century, Augustus de Morgan formulated two laws relating not, and and or:
1 ‘not (a and b)’ is equivalent to ‘(not a) or (not b)’
2 ‘not (a or b)’ is equivalent to ‘(not a) and (not b)’
A little thought shows that these hold in everyday language, and they can easily be
verified using truth tables.
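A quick exhaustive check in Python (just an illustration of the truth-table idea) confirms both laws:

from itertools import product

for a, b in product([True, False], repeat=2):
    law1 = (not (a and b)) == ((not a) or (not b))
    law2 = (not (a or b)) == ((not a) and (not b))
    print(a, b, law1, law2)     # both checks read True in every row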
∀ and ∃ These two symbols are called quantifiers. ∀ stands for ‘for all’ (or ‘for every’ or ‘for each’) and ∃ for ‘there exists’ (or ‘there is’). These phrases feature heavily in any self-respecting mathematician’s vocabulary. To illustrate how to use them, consider the statement ‘Everyone has a parent’. To avoid ambiguity, what is really meant is ‘For every person in the world (living or dead), there is another person who is/was their biological parent’.
First we bring in some symbols, ‘For every person x, there exists y, such that y is a parent of x’. Now we introduce a predicate: P(y, x), to stand for ‘y is a parent of x’. Then the sentence becomes:
∀x ∃y P(y, x)
The order of the quantifiers is important here. If we swap them round, ∃y ∀x P(y, x), we get the false assertion that there exists some person who is the parent of everyone.
Another delicate issue is that we have to specify the domain we’re working in. The statement is not true (or at least not meaningful) if x is the Andromeda Galaxy, or a number. So everything has to be restricted to the collection of people (living and dead). This is the range of quantification. It becomes increasingly difficult to discern the meaning of a statement the more alternating quantifiers it contains. (See the epsilon–delta definition of continuity as an example. This is implicitly of the form ∀ε ∃δ ∀x Q(ε, δ, x).) More than three alternating quantifiers are very difficult to unravel.
Syllogisms Logic received its first thorough, formal analysis in the fourth century bc,
at the hands of Plato’s student Aristotle. His six-volume Organon laid out the basic
rules of logical deduction. Syllogisms played a central role. A syllogism is an
argument which proceeds from two premises to reach a conclusion. The most famous
is not due to Aristotle, but to the later thinker Sextus Empiricus:
All men are mortal.
Socrates is a man.
Therefore:
Socrates is mortal.
Sextus was a sceptic who thought this type of reasoning useless, arguing that, unless
we already know Socrates to be mortal, we are in no position to assert that all men are
mortal.
Categorical sentences The first premise of the ‘All men are mortal’ syllogism is of the form:
(a) Every X is Y.
(With a slight mental contortion, the second and third can also be understood in this form.) The opposite form to (a) is known as (o):
(o) Some X is not Y.
The remaining two forms are:
(i) Some X is Y.
(e) No X is Y.
Aristotle held that these were the four possible forms of categorical sentence, relating
the categories X and Y. He used them as a basis for his classification of categorical
syllogisms.
Each of a syllogism’s two premises and its conclusion takes one of the four forms, and its terms can be arranged in four different ways (the ‘figures’), giving 4 × 4 × 4 × 4 = 256 possible syllogisms. Aristotle’s analysis concluded that exactly 15 of these are valid, and a further four are valid so long as they do not apply to an empty category
(that is, make statements about X, when there is no such thing as an X). The medieval
names for the 19 valid syllogisms are: Barbara, Celarent, Darii, Ferio, Cesare,
Camestres, Festino, Baroco, Darapti, Disamis, Datisi, Felapton, Bocardo, Ferison,
Bramantip, Camenes, Dimaris, Fesapo and Fresison. These are mnemonics: the order
of the vowels in each name gives the order of the categorical sentences a, o, i, e.
Dodgson’s soriteses The 19th century British mathematician and novelist Charles
Dodgson, better known as Lewis Carroll, considered soriteses: longer arguments
resembling Aristotle’s categorical syllogisms, but containing more than two premises.
In his book Symbolic Logic, he challenged the reader to come up with the logically strongest conclusion that follows from a given set of premises. A solution involves breaking the argument down into a sequence of valid syllogisms.
Froggy’s problem Dodgson set his readers the task of finding the strongest possible
conclusion which can legitimately be deduced from this set of premises:
1 When the day is fine, I tell Froggy ‘You’re quite the dandy, old chap!’;
2 Whenever I let Froggy forget that 10 dollars he owes me, and he begins to strut about like a peacock, his mother declares ‘He shall not go out a-wooing!’;
3 Now that Froggy’s hair is out of curl, he has put away his gorgeous waistcoat;
4 Whenever I go out on the roof to enjoy a quiet cigar, I’m sure to discover that my purse is empty;
5 When my tailor calls with his little bill, and I remind Froggy of that 10 dollars he owes me, he does not grin like a hyena;
6 When it is very hot, the thermometer is high;
7 When the day is fine, and I’m not in the humour for a cigar, and Froggy is grinning like a hyena, I never venture to hint that he’s quite the dandy;
8 When my tailor calls with his little bill and finds me with an empty pocket, I remind Froggy of that 10 dollars he owes me;
9 My railway shares are going up like crazy!
10 When my purse is empty, and when, noticing that Froggy has got his gorgeous waistcoat on, I venture to remind him of that 10 dollars he owes me, things are apt to get rather warm;
11 Now that it looks like rain, and Froggy is grinning like a hyena, I can do without my cigar;
12 When the thermometer is high, you need not trouble yourself to take an umbrella;
13 When Froggy has his gorgeous waistcoat on, but is not strutting about like a peacock, I betake myself to a quiet cigar;
14 When I tell Froggy that he’s quite a dandy, he grins like a hyena;
15 When my purse is tolerably full, and Froggy’s hair is one mass of curls, and when he is not strutting about like a peacock, I go out on the roof;
16 When my railway shares are going up, and when it’s chilly and looks like rain, I have a quiet cigar;
17 When Froggy’s mother lets him go a-wooing, he seems nearly mad with joy, and puts on a waistcoat that is gorgeous beyond words;
18 When it is going to rain, and I am having a quiet cigar, and Froggy is not intending to go a-wooing, you had better take an umbrella;
19 When my railway shares are going up, and Froggy seems nearly mad with joy, that is the time my tailor always chooses for calling with his little bill;
20 When the day is cool and the thermometer low, and I say nothing to Froggy about his being quite the dandy, and there’s not the ghost of a grin on his face, I haven’t the heart for my cigar!
Sadly Lewis Carroll died before publishing Froggy's solution. He did, however, hint
that the problem ‘contains a beautiful “trap”.’
Formal systems Longer logical deductions than syllogisms, such as the solutions to
Dodgson’s sorites, can be built up as longer sequences of categorical sentences. We
would like an analysis of these compound arguments. However, given that there is no
limit to their possible length, how is this possible? Gottfried Leibniz had the first idea
in the 1680s, but his work was not picked up, and the subject did not come to fruition
until the 19th century, when George Boole, Augustus de Morgan and others developed
the first formal system for logic.
Their philosophy was to excise from logic all human subjectivity and intuition, and
build up logical arguments from the ground. As Boole put it, ‘the validity of the
processes of analysis does not depend upon the interpretation of the symbols which are
employed, but solely upon the laws of combination’.
Any formal system has three ingredients:
1 a language, which is a list of permitted symbols, together with a grammar which says how to combine these into legitimate formulas;
2 some axioms, which are specified formulas, taken as the starting points for logical deduction;
3 laws of deduction, which say how to deduce a formula from previous ones.
Then a formula is judged valid, and called (rather grandiosely) a theorem, precisely when it can be deduced from the axioms by a sequence of applications of the laws of deduction.
Propositional calculus The first formal system, propositional calculus, was intended to
provide a complete framework for logical deduction, such as longer sequences of
categorical sentences.
1 There are variables p, q, r, etc. (We think of each of these as standing for a
categorical sentence, such as ‘all men are mortal’.)
The language also includes brackets ‘(’ and ‘)’ and the connective symbols ∧, ∨, → (which we think of as and, or and implies, respectively), as well as ¬ (which we think of as not).
262
LOGIC
The grammar says how to combine these symbols into legitimate formulas, intended to mirror meaningful (though not necessarily true) statements. So we get ‘(p ∧ q) → r’, but not gibberish like ‘∧ ¬ p ∨ q’.
2 The axioms are a collection of specific formulas. For every formula P, the formula P → P is an axiom, as is ((¬Q) → (¬P)) → (P → Q) (see contrapositive). Other axioms are needed to encode the intended meaning of the logical symbols; these include de Morgan’s laws, as well as general logical principles deriving from Aristotle’s laws of thought. When written down, each has an obvious feel to it.
3 For propositional calculus we only need one law of deduction, modus ponens.
The question is what does all this produce? What is the list of formulas that can
legitimately be deduced from the axioms?
There is a very neat description of these via the informal method of truth tables. That
this provides a complete answer is guaranteed by the adequacy and soundness
theorems.
Not In ordinary language, as opposed to a logical formal system, the function of the word ‘not’ is usually to negate whatever comes next, changing the sentence into its opposite. So ‘not P’ (or ‘¬P’ as logicians write it) should be true when P is false, and false when P is true.
P    ¬P
T    F
F    T
The left-hand column of this truth table shows the possible truth values for P (either True or False), and the right-hand column gives the resulting value for ¬P.
Truth tables We can use a larger truth table to analyse ‘and’, which is sometimes denoted by the wedge symbol ∧. This time we need two columns on the left to list the possible truth values for P and Q, and the right-hand column gives the corresponding truth values for P ∧ Q.
P  Q    P ∧ Q
T  T      T
T  F      F
F  T      F
F  F      F
The same trick works for ‘or’ (∨). Notice that in mathematics ‘or’ is inclusive. So, ‘P or Q’ always means ‘P or Q or both’.
P  Q    P ∨ Q
T  T      T
T  F      T
F  T      T
F  F      F
Another symbol is ‘↔’, standing for ‘if and only if’. It has the truth table:
P  Q    P ↔ Q
T  T      T
T  F      F
F  T      F
F  F      T
We can apply the truth table rules to particular examples. For example, to assess the statement P → P, we use the truth table:
P    P → P
T      T
F      T
Because the column for P → P contains only Ts, it is always true no matter what its
inputs. Such a statement is known as a tautology. In ordinary language, ‘If Sophocles
is an unmarried man then he is a bachelor’ is a tautology and is true whatever
Sophocles’ actual marital status.
Using this idea we can build up truth tables for more complicated formulas:
P  Q    P → Q    (¬P) ∨ Q    (P → Q) ↔ ((¬P) ∨ Q)
T  T      T         T                 T
T  F      F         F                 T
F  T      T         T                 T
F  F      T         T                 T
Again the column for (P → Q) ↔ ((¬P) ∨ Q) contains only Ts, so it is a tautology. In this case, this means that P → Q and (¬P) ∨ Q are logically equivalent.
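The same check can be automated. Here is a small Python sketch (illustrative, not from the text) which runs through every combination of truth values and confirms that (P → Q) ↔ ((¬P) ∨ Q) is a tautology:

from itertools import product

def implies(p, q):
    return (not p) or q          # the truth-table behaviour of P -> Q

def is_tautology(formula):
    """Test a two-variable formula on every row of its truth table."""
    return all(formula(p, q) for p, q in product([True, False], repeat=2))

print(is_tautology(lambda p, q: implies(p, q) == ((not p) or q)))   # True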
NAND, NOR, XOR and XNOR The commonest logical connectives are ‘and’, ‘or’,
‘implies’ and ‘if and only if’. But there are four others, which are best described by
their truth tables.
Nand stands for ‘not … and’. So ‘P nand Q’ means ‘not (P and Q)’. Nor is ‘not … or’, as in ‘not (P or Q)’. Xor is the exclusive ‘or’: ‘P xor Q’ means ‘P or Q, but not both’. Finally, xnor is the exclusive ‘nor’: ‘P xnor Q’ means ‘not (P xor Q)’. Their truth tables are:
P  Q    P NAND Q
T  T        F
T  F        T
F  T        T
F  F        T
P  Q    P NOR Q
T  T       F
T  F       F
F  T       F
F  F       T
P  Q    P XOR Q
T  T       F
T  F       T
F  T       T
F  F       F
P  Q    P XNOR Q
T  T        T
T  F        F
F  T        F
F  F        T
Logic gates The relations that logicians put in truth tables are important in electronics,
as logic gates. A logic gate is a component with inputs (typically two of them) and one
output. The output of the component is on or off, according to its two inputs and the
rule coming from the truth table. So a nor gate will emit an output when both its inputs
are off, and not otherwise.
In this context nand and nor are the primary gates. In fact, everything can be built up
just from nand. For example, ‘not P’ can be obtained as ‘P nand P’, and then ‘P and Q’
as ‘not (P nand Q)’. Using this idea, more sophisticated rules for the behaviour of a
device can be hardwired, as logical deductions.
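A few lines of Python (an illustration only) show the same constructions: with a single nand function in hand, not, and and or can all be built from it:

def nand(p, q):
    return not (p and q)

def not_(p):
    return nand(p, p)                      # 'P nand P' behaves as 'not P'

def and_(p, q):
    return not_(nand(p, q))                # 'not (P nand Q)' behaves as 'P and Q'

def or_(p, q):
    return nand(not_(p), not_(q))          # by de Morgan: '(not P) nand (not Q)'

for p in (True, False):
    for q in (True, False):
        assert and_(p, q) == (p and q) and or_(p, q) == (p or q)
print('all gates agree')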
Truth tables do suffer from one problem, namely that they grow exponentially larger the more variables are included. The full truth table for Froggy’s problem would need 131,072 (2^17) rows. Nevertheless truth tables encapsulate human intuition about logical operations.
If the system of propositional calculus is to do its job properly, every tautology should
be a theorem of that formal system. This is indeed true, and is known as the adequacy
theorem. The converse is also true: every theorem of the formal system is a tautology.
This is the soundness theorem.
Beyond propositional logic There are two ways to view logic: George Boole’s highly
formal propositional calculus and the more intuitive approach of truth tables. The
adequacy and soundness theorems say that the two perspectives produce the same
result. This effectively renders propositional calculus obsolete. Everything it can
express can be arrived at more quickly through common sense and truth tables.
On the other hand, truth tables also vindicate Boole’s highly formal approach to logic.
Although in the context of propositional calculus the result has not turned out to be
especially useful, it does demonstrate that formal systems do work. When turbo-
charged and applied in broader settings (predicate calculus, Peano arithmetic and
axiomatic set theory), this approach would go on to transform mathematics.
(Diagram: the standard circuit symbols for the AND, OR, NOT, NAND, NOR, XOR and XNOR logic gates, each with inputs and a single output.)
1 x is greater than y.
2 y is greater than z.
Therefore:
3 x is greater than z.
We would like a logical system to be able to cope with a deduction such as this. But not every argument of this form works:
A x is one more than y.
B y is one more than z.
Therefore:
C x is one more than z.
The overall form of these arguments is not enough to judge whether they are valid. It
depends on the detailed behaviour of the relations ‘… is greater than …’ and ‘… is one
more than…’.
Propositional calculus is too simplistic to cope with this. The solution is to introduce
subtler predicates, which can represent more delicate relationships between objects.
The result is predicate calculus.
For any remnant of a predicate’s intended meaning to survive, we must axiomatize it. Hilbert’s
program expressed the hope that the whole of mathematics could be incorporated into
just one system of predicate calculus.
We might also want to say that there is no largest number. We can express that by
saying that, for every number, there exists another which is greater. So we also include
the axiom:
∀y ∃x G(x, y)
Continuing in this fashion will produce a system of predicate calculus for describing
orderings with no maximum element. Every theorem of the system will be a true fact
about such a structure. A similar approach allows building formal systems for number
systems, groups and other structures. A key example is Peano arithmetic, which
axiomatizes the natural numbers.
Peano’s axioms were a landmark. But, like so many great discoveries, his system
opened up as many questions as it answered. First: are there any models of Peano
arithmetic? Secondly: can every statement in number theory really be derived within
this system? This was the challenge posed by Hilbert’s program, and later answered by
Gödel’s incompleteness theorems.
Peano’s system set out the basic properties of the successor function:
3 Adding 0 to any number does not change it: ∀x x + 0 = x
4 The successor of the sum of two numbers is the same as adding the first to the successor of the second: ∀x, y x + S(y) = S(x + y)
6 For any numbers x and y, multiplying x by y + 1 gives the same result as multiplying x by y and then adding x: ∀x, y x × S(y) = (x × y) + x
The final axiom is the most complicated. It builds into the system the principle of mathematical induction:
7 This says, if φ is any property of numbers such that 0 satisfies φ, and whenever x satisfies φ then x + 1 also does, then every number satisfies φ. So, for every formula φ, we have an axiom:
(φ(0) ∧ ∀x (φ(x) → φ(S(x)))) → ∀x φ(x)
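To get a feel for how much arithmetic the successor function carries, here is a small Python sketch (an illustration, not part of Peano’s work) which defines addition and multiplication by the recursions in axioms 4 and 6 above, together with the standard base cases x + 0 = x and x × 0 = 0:

def add(x, y):
    """x + 0 = x, and x + S(y) = S(x + y)."""
    if y == 0:
        return x
    return add(x, y - 1) + 1        # y = S(y-1), so x + y = S(x + (y-1))

def mul(x, y):
    """x * 0 = 0, and x * S(y) = (x * y) + x."""
    if y == 0:
        return 0
    return add(mul(x, y - 1), x)

print(add(3, 4), mul(3, 4))   # 7 12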
Models of Peano arithmetic Peano’s axiomatization of the natural numbers was a real
breakthrough. But the rulebook alone does not make a game of football. Could the
same care and precision which produced the axioms also yield a model of them, that is,
a structure which obeys them?
Since the dawn of humanity, the primary use of natural numbers has been to count
things, that is, to represent the sizes of sets of objects. The task facing logicians,
therefore, was first to develop a theory of sets, to abstract the notion of the size of a set,
and then to show that these sizes do behave as Peano arithmetic says they should.
Various approaches were taken, notably Principia Mathematica and then Zermelo–
Fraenkel set theory. Peano himself, however, gave up mathematics, to work on a new
international language in which he published the final version of his logical work.
Bertrand Russell and Alfred North Whitehead set themselves a monumental task: to
deduce the entirety of mathematics from purely logical foundations. Gottlob Frege had
already tried this (see logicism) but was undone at the last moment, when Russell
wrote to him with the news of Russell’s paradox. Frege had been too permissive
regarding what can classify as
a set, and allowed in paradoxical monsters. In their
three-volume masterpiece Principia Mathematica, Russell and Whitehead were more
cautious, and in the first volume they painstakingly developed type theory.
In the second, they defined numbers within the theory, and proved that their system
does indeed conform to Peano’s rules. (It takes until page 83 of the second volume to
deduce that 1 + 1 = 2, accompanied by the comment ‘The above proposition is
occasionally useful’.) The third volume develops higher-level mathematics, including
the real numbers, Cantor’s infinities and the rudiments of analysis. A fourth volume on
geometry was planned, but never finished.
Type theory Invented by Bertrand Russell, type theory is an approach for the
foundations of mathematics, intended to avoid Russell’s paradox. Objects are not of
one type, but are stratified. In simple terms, atomic objects are assigned level 0. A set
of such objects has level 1, and sets of such sets have level 2, and so on. Any object
can only refer to others on lower levels than itself. This prevents any form of Russell’s
paradox creeping in.
In 1937, Willard Van Orman Quine proposed a new type theory known as New
Foundations, which greatly simplifies Russell’s version, and continues to be studied as
an alternative and comparator to Zermelo–Fraenkel set theory. As the subject of
computer science flourished, type theory assumed a new importance for describing
computational objects. It was advanced in 1970 by Per Martin-Löf, who developed a
powerful new framework initially intended to act as a foundation for constructive
mathematics. This type theory had properties which made it exciting to computer
scientists, including the ability to encode logical deductions inside itself. Several
programming languages, including proof checking software, have subsequently grown
out of it.
SET THEORY
Sets A set is a collection of objects, called its elements, or members. So the set of
natural numbers (usually denoted N) has elements 0, 1, 2, 3, … We might informally
write N = {0, 1, 2, 3, …}, but set theory involves being extremely careful about the
meaning of ‘…’.
It is to be expected that sets are everywhere in mathematics. Every number system and
algebraic structure is a set, with some additional properties. But it might be surprising
that much mathematics can be done at such a general level. The origins of the subject
lie in the pioneering work of Georg Cantor, who published his first work on infinite
sets in 1874. His paper of 1891 famously contained Cantor’s diagonal argument and
Cantor’s theorem. In the early 20th century, following the discovery of Russell’s
paradox, the search began for a version of axiomatic set theory to act as a foundation
for the whole of mathematics. The search culminated in 1922 with the discovery of
Zermelo–Fraenkel set theory, but not without leaving loose ends such as the axiom of
choice and continuum hypothesis.
Set membership ‘a ∈ B’ means that the object a is a member of the set B. The symbol ‘∈’ is a disfigured Greek epsilon.
So ‘1 ∈ N’ says that the object 1 is an element of the set N. If x is a person, and Y is the set of all Spanish speakers, then ‘x ∈ Y’ asserts that x is a member of the set of Spanish speakers, that is, x can speak Spanish.
Intersection Suppose A and B are two sets. The intersection of A and B is their overlap: that is, the collection of all objects included in both A and B. This is written as A ∩ B. So, if A is the collection of English speakers and B is the collection of Spanish speakers, then A ∩ B is the set of people who speak both Spanish and English. If A is the collection of even numbers and B is the collection of odd numbers, then A ∩ B = ∅ (the empty set). This means that the two sets are disjoint, having no elements in common.
Union The union of two sets A and B is the set obtained by taking the two together. It is denoted by A ∪ B. So A ∪ B is the collection of objects in either A or B (or both). If A is the collection of English speakers, and B is the collection of Spanish speakers, then A ∪ B is the collection of people who speak either English or Spanish (or both). When both sets are finite, there is a formula for the size of the union.
If A is the collection of even numbers, and B is the collection of odd numbers, then A ∪ B = Z, the set of all integers.
Because A and B cover Z entirely, and are also disjoint, they form a partition of Z.
Subsets If A is a set, then a subset of A is any collection of elements from A. The set of
even numbers is a subset of the set of natural numbers (N). Similarly {1, 2, 3, 4, 5} is a
subset of N, as are the set of prime numbers, the set of square numbers, and every other
possible collection of natural numbers. Of course N is also a subset of itself. The
empty set is a subset of every set.
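These operations are built into many programming languages. A short Python sketch, using small finite sets purely for illustration:

evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}
primes = {2, 3, 5, 7}

print(evens & odds)              # intersection: set(), the two sets are disjoint
print(sorted(evens | odds))      # union: the numbers 0 to 9
print(primes <= {0, 1, 2, 3, 4, 5, 6, 7, 8, 9})   # subset test: True
print(set() <= primes)           # the empty set is a subset of every set: True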
Functions Every area of mathematics involves the notion of a function. This can be
thought of as a process into which you feed inputs, to get outputs. It is common to call
a function ‘f’, to write ‘x’ for an input and ‘f (x)’ for the corresponding output.
The formula ‘f(x) = x² + 2’ describes a function which accepts any number as an input, and produces an output of 2 more than its square. So f(2) = 6, for example. If we wanted to draw this function as a graph, we would set y = f(x), and plot points (x, y) for different values of x.
As in this example, inputs and outputs are often numerical, and the function can be
written out explicitly with a formula. But there are more abstract types of function too.
(Figure: a function pictured as a machine turning each input x into an output f(x), together with the graph of y = f(x).)
Domain and range When working with a function, care is always needed to specify its domain. This is the set of all allowable inputs. We write f: A → B to indicate that f is a function from A to B. That is, its domain is the set A, and all the outputs are in the set B. For example, we might consider a function f: R → R, given by the formula f(x) = x² + 2.
The range is the set of all outputs of the function. If we have a function f: A → B, the range may not be the whole of B, but can be a subset. In the above example, the range is the set of all real numbers from 2 onwards.
In 1874, Cantor published a demonstration that the natural numbers, N, and the real
numbers, R, must have different cardinalities: that is, there can never be a one to one
correspondence between them. In fact he showed that just between 0 and 1 there are
more real numbers than in the whole of N. His paper of 1891 contained a new and
disarmingly slick proof of the same thing, a classic result now known as Cantor’s
diagonal argument. Though no longer controversial within mathematics, this result still
attracts a great deal of scepticism from those encountering it for the first time. One
instance even ended up in court in Wisconsin, USA: the 1996 case of Dilworth versus
Dudley.
In 1998 the mathematician and editor of the journal Bulletin of Symbolic Logic,
Wilfrid Hodges, wrote about various fallacious refutations that he had been sent, and
wondered ‘why so many people devote so much energy to refuting this harmless little
argument – what had it done to make them angry with it?’ Hodges went on to suggest
that ‘This argument is often the first
mathematical argument that people meet in which the conclusion bears no relation to
anything in their practical experience or their visual imagination’.
Now, suppose for a contradiction, that there is a one to one correspondence between
the natural numbers and the real numbers between 0 and 1. Then we can write out this
correspondence in a grid:
1 ↔ b^1 = 0.b_1^1 b_2^1 b_3^1 …
2 ↔ b^2 = 0.b_1^2 b_2^2 b_3^2 …
3 ↔ b^3 = 0.b_1^3 b_2^3 b_3^3 …
4 ↔ b^4 = 0.b_1^4 b_2^4 b_3^4 …
…
n ↔ b^n = 0.b_1^n b_2^n b_3^n …
The right-hand column is the supposed complete enumeration of the real numbers between 0 and 1, written out as decimals. (Here b_k^n denotes the kth decimal digit of the nth number; the superscripts are labels to discriminate between the numbers, they do not represent powers.)
Cantor proceeded to construct a number x = 0.x_1 x_2 x_3 … which is missing from the list. To begin with, either b_1^1 is 7 or it is not. If it is, then let x_1 = 4, but if not, then let x_1 = 7. Either way, x_1 ≠ b_1^1. Similarly, if b_2^2 is 7 then let x_2 = 4, and otherwise let x_2 = 7, so that x_2 ≠ b_2^2. Continuing in this way, the nth digit of x is always chosen so that x_n ≠ b_n^n. So x is a real number between 0 and 1 which is not on the list, and that is a contradiction. Therefore there can be no one to one correspondence between the natural numbers and the real numbers between 0 and 1.
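The digit-switching rule is easy to mimic on a finite fragment of such a list. This Python sketch (purely illustrative; the real proof needs the full infinite list) builds a decimal that differs from each listed number in at least one decimal place:

def diagonal_escape(decimals):
    """Given a list of decimal strings like '0.500000', return a number
    between 0 and 1 that differs from the nth entry in its nth digit."""
    new_digits = []
    for n, d in enumerate(decimals):
        digit = d[2 + n]                       # the (n+1)th digit after '0.'
        new_digits.append('4' if digit == '7' else '7')
    return '0.' + ''.join(new_digits)

listing = ['0.500000', '0.141592', '0.718281', '0.123456']
print(diagonal_escape(listing))   # 0.7777, which differs from every entry on the diagonal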
Countable infinities Cantor’s proof of the uncountability of the real numbers cleaved the world of infinity in two: countable sets, which can be put in one to one correspondence with the natural numbers, and uncountable sets, which cannot.
Every infinite subset of the natural numbers, such as the prime numbers, is countable.
Write them out in a list: 2, 3, 5, 7, 11, … Then counting them sets up a
correspondence, 1 with 2, 2 with 3, 3 with 5, 4 with 7, 5 with 11, and so on. The
integers are also countable, since we can list them like this: 0, 1, −1, 2, −2, 3, −3, …
More surprisingly, the positive rational numbers are countable, as Cantor proved in
1873. These are the positive fractions and they can be set out in a grid as shown. Then,
by taking a cleverly snaking path, we can count them all once, though we have to take
care to skip any we’ve already listed. Counting along the snaking path sets up the correspondence we want.
(Figure: the positive fractions laid out in a grid, with a snaking path passing through every one.)
He had not finished yet, however. Cantor’s theorem of 1891 subdivided infinity again.
In a second shock, he showed that there is not just one level of uncountable infinity,
but infinitely many. Many of these have names, given by cardinal numbers. The tool
Cantor used was the notion of a power set.
Power sets If A is a set, then its power set P(A) is the collection of all its subsets (making the power set a set of sets). To take an example, the power set of {1, 2, 3} is {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}}. In this example the original set has three elements, and its power set has 2³ = 8 elements. This illustrates the general pattern. The power set of a set with n elements contains 2ⁿ elements. Remarkably, this rule extends to infinite sets, as well as the finite ones. This is the content of Cantor’s theorem.
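For finite sets the power set is easy to generate by machine; a brief Python sketch using the standard itertools library shows the 2ⁿ pattern directly:

from itertools import combinations

def power_set(elements):
    """Return every subset of the given collection."""
    items = list(elements)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

subsets = power_set({1, 2, 3})
print(len(subsets))   # 8, that is 2**3
print(subsets)        # starts with the empty set set() and ends with {1, 2, 3}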
Cantor’s theorem In 1891, armed with just two simple tools, the idea of a power set,
and the notion of a one to one correspondence, Georg Cantor tore apart the
mathematics of his time. With one short argument (whose form anticipated Russell’s
paradox) he proved that there can never be a one to one correspondence between any
set and its power set.
The consequences were dramatic: start with an infinite set, such as that of the natural
numbers N, and take its power set, P (N). By Cantor’s argument, the two can never be
the same size. The power set is always bigger. Now start again. Taking the power set
of the new infinite set P (N) reveals a still bigger set P (P (N)). By repeatedly taking
power sets, infinity opened up before Cantor’s eyes. No longer did it seem a monolith,
but an infinite hierarchy, with each level infinitely surpassing the last.
Cardinal numbers Cantor’s theorem demonstrated that there are infinitely many levels
of infinity, so a way to measure and compare them was needed. Cantor constructed a
system of cardinal numbers, special sets which could measure any other: every set will
be in one to one correspondence with exactly one cardinal, called its cardinality.
The first cardinals are the ordinary natural numbers: 0, 1, 2, 3, … But then come the infinite cardinals, the first of which is called ℵ₀ (that is ‘aleph zero’, aleph being the first letter of the Hebrew alphabet). ℵ₀ is the cardinality of the set of natural numbers N, and so also the cardinality of every countable set. The next cardinal number is ℵ₁, then ℵ₂, ℵ₃, and so on, as far as ℵ_ℵ₀, and way beyond. For any cardinal number it always makes sense to talk about the next one: the smallest cardinal which is bigger than the one you already have.
Another way to find bigger cardinals is by taking power sets. The cardinality of the power set of ℵ₀ is written as 2^ℵ₀, and then we can take the power set of that, 2^(2^ℵ₀), and so on. This sequence is also written as ℶ₀ (= ℵ₀), ℶ₁ (= 2^ℶ₀), ℶ₂ (= 2^ℶ₁), ℶ₃ (= 2^ℶ₂), and so on (where ℶ, ‘beth’, is the second letter of the Hebrew alphabet).
Russell’s paradox With a few short words in 1901, the mathematician and philosopher
Bertrand Russell seemed to sound the death knell of the set theory that had blossomed
in the years following Cantor’s theorem. If a set is any collection of objects, then the
set of all sets seems to make perfectly good sense. Furthermore, this special set must
contain itself as a member, by definition. This illustrates that some sets contain
themselves as members, while others do not. Russell’s devastating move was to define
a set X: the set of exactly those sets which do not contain themselves as members. The
paradox is this: is X a member of itself? Either assumption, that it is a member of
itself, or that it is not, ends in contradiction.
Although Russell and others struggled to find a way around this paradox, there is no
way to resolve it, other than by being far more discriminating about what is classed as
a set. Russell’s paradox killed off the naïve set theory which admitted any collection of
objects as a set. In a few years it would be replaced with the more technically
demanding and rigorous subject of axiomatic set theory, in which paradoxical monsters
such as X and the set of all sets cannot exist.
The barber and librarian paradoxes are real-world analogies for Russell’s paradox, and
Grelling’s paradox transplants the argument into linguistics.
The barber paradox The barber paradox is a translation of Russell’s paradox from the
domain of set theory to the real world. It was used by Russell in discussion of his
work. There is a village, in which there lives a barber, who shaves some of the village
men. To be exact, he shaves all the men who do not shave themselves (and only those
men). The unanswerable question is: who shaves the barber?
The librarian paradox The librarian paradox is another analogy for Russell’s paradox.
A librarian is indexing all the books in her library. She compiles two lists, A and B;
every book is entered into one or the other (but not both). In A she lists all the books
which reference themselves. In B she lists those which do not. Once she has completed
the main collection, she has two new books, the books A and B, which need to be
indexed too. But where can she list B? If she puts it in A, then it cannot also go in B.
This would make it one of those books which do not reference themselves, and so it
should be listed in B, not A. But if she does list it in B, then it does reference itself, and
so should be listed in A, not B. Again, either way leads to a contradiction.
Axiomatic set theory ‘No-one shall expel us from the paradise that
Cantor has created’, declared the influential German mathematician David Hilbert. But
Russell’s paradox demonstrated that the set theory of the time contained dangerous
contradictions. Would it bring the whole edifice of cardinal numbers crashing down?
What was needed was a secure logical grounding for the theory of sets, to replace the
informal idea of a set as a ‘collection of objects’. With this in place, set theory itself
could act as foundation for the whole of mathematics, and safely incorporate Cantor’s
system of cardinal numbers.
Two main contenders appeared for the role: the type theory of Russell and
Whitehead’s Principia Mathematica (published between 1910 and 1913), and, by the
1920s, the axioms of Zermelo–Fraenkel set theory. Today, several variants of set
theory are still studied, including intuitionistic formulations (see intuitionism) and
Quine’s New Foundations (see type theory). Alternative approaches to the foundations
of mathematics come from category theory. However, the industry standard remains
ZFC, Zermelo–Fraenkel set theory plus the axiom of choice.
Empty set, ∅
‘I got plenty o’ nothin’ and nothin’s plenty for me’ — George Gershwin
Written as ‘∅’ or sometimes ‘{ }’, the empty set is the most trivial object in
mathematics. It is simply a set with nothing in it. More accurately, it is the set with
nothing in it. If two sets have exactly the same elements then they are really the same
set. Two empty sets certainly have the same elements (namely none at all). So the set of square prime numbers is the same as the set of regular heptahedra (see Platonic solids): both are equal to ∅.
The empty set plays a central role in Zermelo–Fraenkel set theory (or one of its alternatives). For set theory to underpin the whole of mathematics, the natural numbers must first be encoded in it. This is usually achieved with ∅ playing the role of 0, {∅} that of 1, {∅, {∅}} that of 2, {∅, {∅}, {∅, {∅}}} that of 3, and so on. Not only the natural numbers, but a whole set-theoretic universe, and essentially the whole of mainstream mathematics, can then be built from the empty set in this way.
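This encoding is concrete enough to program. A small Python sketch (illustrative only) builds the first few numbers as nested frozensets, so that each number is literally the set of its predecessors:

def encode(n):
    """Return the set encoding of n: 0 is the empty set, and each
    successor is the previous number together with itself as an element."""
    number = frozenset()                       # 0 = {}
    for _ in range(n):
        number = frozenset(number | {number})  # n+1 = n together with {n}
    return number

three = encode(3)
print(len(three))           # 3: the encoding of n has exactly n elements
print(encode(2) in three)   # True: 2 is an element of 3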
Before this situation could arise, a proper formalization of set theory was needed, avoiding Russell’s paradox (see type theory). In 1905, Ernst Zermelo had begun the search for such a
list of axioms, and in 1922 Thoralf Skolem and Abraham Fraenkel were able to
complete the search. The resulting framework, called Zermelo–Fraenkel Set Theory (or
ZF) posits the existence of the empty set,
and axiomatizes the process of taking power sets, as well as unions and intersections.
Everything in this universe is a set: they are no longer constructed from more basic
objects. So sets may contain other sets, but not every collection of sets itself counts as
a set. In particular, the collection of all sets is not a set, and no set is permitted to
include itself. Thus Russell’s paradox is avoided.
The axiom of choice says that if you have any collection of non-empty sets A, B, C, D, … then you can form a new set by taking one element from A, one from B, one
from C, and so on. This principle may appear uncontroversial at first sight, but all
attempts to prove it from the axioms of Zermelo–Fraenkel set theory (ZF) floundered.
In 1940 Kurt Gödel at least managed to prove that this principle did not introduce any
contradictions into ZF.
The problem is that the collection of sets A, B, C, D, … may go on for ever, in which
case an infinite number of choices have to be made. This is particularly awkward when
there is no method for choosing one element from each set. Bertrand Russell’s analogy
was that if you have infinitely many pairs of shoes, you can apply a general rule: take
the left shoe from each pair. But for infinitely many pairs of identical socks, no such
rule exists, so infinitely many arbitrary choices are needed.
The status of the axiom was eventually resolved in 1962, when Paul Cohen used
forcing to show that the axiom of choice is independent of ZF.
Banach–Tarski paradox In 1924, the analyst Stefan Banach and the logician Alfred
Tarski joined forces to prove a very disconcerting fact: if you have a 3-dimensional
ball, then the axiom of choice allows you to chop it into six pieces which can be slid
around (using only rotations and translations) and reassembled into two balls each
identical to the original. This looks very much as if it violates the conservation of the original ball’s volume. Banach and Tarski dodged this problem by cutting the ball into non-
measurable sets, which don’t have a volume in any meaningful sense. These ghostly
sets are impossible to imagine, and their existence is only guaranteed by the axiom of
choice.
This strikingly counterintuitive consequence of the axiom of choice has caused some
people to question its validity. It is also the source of a favourite mathematical joke,
that the catchiest anagram of ‘banach–tarski’ is ‘banach–tarski banach–tarski’.
Cantor’s system of cardinal numbers works perfectly for measuring the size of sets:
every set is in one to one correspondence with a cardinal. However, comparing these
cardinals is not always as straightforward as might be hoped. You might expect that,
for any two cardinals A and B, one of the following should hold: either A < B, or A = B, or A > B. Cantor certainly believed in this trichotomy principle and was happy to use it
without proof in his work in 1878. Later, however, he came to realize that it was not
quite as self-evident as it had first seemed. In fact, this principle is logically equivalent
to the Axiom of Choice. If that fails, then there will exist cardinal numbers which just
cannot be compared to each other.
Many important sets have cardinality 2^ℵ₀, including the set of real numbers, and the set of complex numbers.
The question that had so frustrated Cantor was whether or not there is another cardinal number between ℵ₀ and 2^ℵ₀. That is, whether 2^ℵ₀ = ℵ₁. The statement that the two
are indeed equal is known as the continuum hypothesis. It was finally settled in 1963
when Paul Cohen used forcing to show that this statement is independent of the axioms
of ZFC (Zermelo–Fraenkel set theory plus the axiom of choice).
Forcing Paul Cohen invented a powerful method, called forcing, for creating made-to-
measure universes of sets satisfying the Zermelo–Fraenkel axioms (ZF), as well as
extra requirements he could choose to impose. Cohen showed off his new technique in
the most spectacular fashion. In 1962 he constructed a model of ZF in which the axiom
of choice failed. With Gödel’s earlier work showing that the axiom of choice is
consistent with ZF, the independence of the axiom of choice was established.
The following year Cohen built another model of ZF in which the axiom of choice was
true, but the continuum hypothesis was false. Again, when taken with Gödel’s work,
the independence of the continuum hypothesis from ZFC was shown. Cohen’s forcing
remains the standard way of constructing new models of set theory, in particular in the
study of large cardinals.
New models of set theory When Cantor and Zermelo laid the foundations of set theory
at the end of the 19th century, they had in mind a unified world of sets to be described
and axiomatized, and then for these to act as the base for the whole of mathematics.
Gödel’s incompleteness theorems of 1931 meant that this goal had to be radically
reassessed: there would always be statements X for which neither X nor its negation
(‘not X’) could be deduced from ZF.
However, for some time this remained only a theoretical possibility; the hope was that any
such X would be an arcane curiosity, not of any real mathematical importance. The
triumph of forcing dashed that hope. As the logician Andrzej Mostowski said in 1967,
‘Such results show that axiomatic set theory is hopelessly incomplete … Of course if
there are a multitude of set-theories then none of them can claim the central place in
mathematics’. The unavoidable conclusion was that mathematicians would have to
treat ZF more like the group axioms: not as the axiomatization of one unique, categorical structure, but as axioms admitting an infinite variety of models. Some of these are explored
by large cardinal axioms.
Large cardinals After ℵ₀, the most familiar infinite cardinal is 2^ℵ₀. Already, its
properties are not clear, as the independence of the continuum hypothesis shows all too
well.
However, when we look at bigger cardinals, their behaviour becomes ever more
opaque. (A rare exception is the extraordinary result of Saharon Shelah that, under a minor additional hypothesis, 2^(ℵ_ω₀) < ℵ_(ω₄).) When we look at much larger cardinals, very
often even their existence is independent of ZFC. As the set theorist Dana Scott put it,
‘if you want more you have to assume more’.
HILBERT’S PROGRAM
Hilbert’s program In the early 20th century, mathematics was suffering from what
Hermann Weyl called a ‘new foundational crisis’. The search had been on for a
framework in which the whole of mathematics could systematically be deduced. In the
1920s, David Hilbert set out what he thought was required from such a system. It
would be based around Peano arithmetic and should satisfy three criteria:
1 Consistency. The system should never produce a contradiction (such as 1 + 1 = 3). That
the foundations of mathematics should be consistent was the second problem Hilbert
posed in his 1900 address.
2 Completeness. Every true statement about natural numbers should be deducible
within the system.
3 Decidability. There should be a procedure which can determine whether any given
statement about natural numbers is true or false.
Hilbert’s program must have seemed like a natural, realistic goal. As it turned out,
mathematics was far more slippery than he expected. Gödel’s first and second
incompleteness theorems killed off the possibilities of criteria 1 and 2, and Church and
Turing’s solution of the Entscheidungsproblem similarly demolished criterion 3.
Gödel’s method was to encode the self-referential statement ‘this statement has no
proof’. If it was false, then the system would be inconsistent; if true, the system would
be incomplete.
Gödel’s second incompleteness theorem, published in the same paper as his first, says
that no logical axiomatization for arithmetic can ever be proved consistent under its
own rules, unless it is in fact inconsistent. This was the starting point for the subject of
proof theory. Both Gödel’s theorems had a huge impact within mathematics, and their
philosophical implications continue to be discussed today.
Church’s thesis There are various technical formulations of the notion of an algorithm,
notably in terms of theoretical Turing machines and type theory. On first sight these
approaches seem very different. However, a network of theorems by Alonzo Church,
Alan Turing, Stephen Kleene and John Barclay Rosser showed that the differences are
ultimately superficial. All reasonable approaches to formalizing algorithms produce the
same results. This idea, known as Church’s thesis, means that the notion of an
algorithm is mathematically robust.
Turing machines
A Turing machine involves a tape (as long as required), which serves as its memory.
The tape is divided into segments, each of which may contain a symbol. (Only finitely
many symbols are allowed; these make up the machine’s alphabet.) The machine
comes equipped with a finite number of possible states. The combination of its current
state and the symbol in the current segment determines what the machine does next. It
may erase or write on the tape, then move to a neighbouring segment, and finally
assume a new state.
This is where the digital computer was born. With a suitable choice of alphabet and
states, a Turing machine is theoretically capable of everything a modern machine can
do. As Time magazine wrote in 1999, ‘everyone who taps at a keyboard, opening a
spreadsheet or a word-processing program, is working on an incarnation of a Turing
machine’.
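The description above translates almost line for line into code. Here is a minimal Python simulator (an illustration only; the example machine, which writes three 1s and halts, is invented for this sketch):

def run_turing_machine(rules, state='start', halt='halt', steps=100):
    """rules maps (state, symbol) -> (new_symbol, move, new_state).
    The tape is a dictionary from positions to symbols; blanks are '_'."""
    tape, pos = {}, 0
    for _ in range(steps):
        if state == halt:
            break
        symbol = tape.get(pos, '_')
        write, move, state = rules[(state, symbol)]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
    return ''.join(tape[p] for p in sorted(tape))

# A tiny machine: write '1' three times, then halt.
rules = {
    ('start', '_'): ('1', 'R', 'two'),
    ('two', '_'):   ('1', 'R', 'three'),
    ('three', '_'): ('1', 'R', 'halt'),
}
print(run_turing_machine(rules))   # 111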
The halting problem Some algorithms will run for ever (or until the universe ends, or
someone intervenes with Ctrl-Alt-Delete). Other algorithms stop once they have
achieved their goal. The halting problem is to decide whether a given algorithm halts,
or does not. Dry as it sounds, this is a question of huge mathematical significance. For
instance, it is easy to write an algorithm which runs through the natural numbers,
stopping if it finds an odd perfect number. Knowing whether or not this algorithm halts
amounts to knowing whether such a number exists, a major open problem in number
theory.
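
As a sketch (in Python, with all names my own), the algorithm described above might look like this; whether the loop ever terminates is exactly the open question.

def is_perfect(n):
    # A number is perfect if it equals the sum of its proper divisors.
    return n == sum(d for d in range(1, n // 2 + 1) if n % d == 0)

def search_for_odd_perfect_number():
    n = 1
    while True:
        if is_perfect(n):
            return n        # the algorithm halts only if an odd perfect number exists
        n += 2              # step through the odd numbers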
If someone was able to come up with a method to tell in advance whether or not any
algorithm will halt, they would settle at a stroke this, and innumerable other questions
of mathematics and computer science. However, no-one will find such a method, as
none can exist. This seminal result was proved by Alan Turing, in 1936.
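
Turing's argument can be caricatured in a few lines of Python. The function halts below is purely hypothetical: the point of the argument is that no such function can be written.

def halts(program, argument):
    # Hypothetical oracle: supposed to return True if program(argument) eventually stops.
    raise NotImplementedError("no such function can exist - that is Turing's theorem")

def contrary(program):
    if halts(program, program):
        while True:          # if told it halts, loop for ever...
            pass
    return "halted"          # ...if told it loops, halt at once.

# Asking whether contrary(contrary) halts contradicts whatever halts() reports,
# so no correct halts() can ever be written.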
The Entscheidungsproblem The great theorems of mathematics have required wildly differing techniques, and the individual insight and hard work of
many mathematicians, to prove. Was Hilbert right that there should be one overarching
perspective which would reduce mathematics to the simple application of one
mechanical process?
Closely related to the halting problem, this is the Entscheidungsproblem (or decision
problem). Alonzo Church in 1936 and Alan Turing in 1937 independently proved that
it does not have a solution. Echoing and supplementing Gödel’s theorems, they showed
that every logical system powerful enough to incorporate Peano’s axioms for the
natural numbers is undecidable as well as incomplete. That is, there can be no
algorithm to determine whether or not an arbitrary statement is true within the system.
A happy consequence is that computers will not be putting mathematicians out of work
for the foreseeable future.
In 1926, Alfred Tarski set about the task of translating Euclid’s postulates into the
language of modern logic. The resulting system allows all the usual analysis of points
and lines, but not set-theoretic constructions (such as Cantor dust). In 1930, Tarski
managed to show that his system does not conceal any Gödelian horrors. It is
unambiguously consistent (it contains no contradictions) and, unlike arithmetic, it is
complete (every statement in the language will be definitively true or false). Even
better, it is decidable: there is an algorithm which will take any statement about points
on the plane, and produce a result of ‘yes’ or ‘no’, depending on whether the statement
is true or false. The foundations of geometry, it seems, are more solid than those of
number theory.
Proof theory In 1936, Gerhard Gentzen proved that the ordinary arithmetic of the
natural numbers, as laid down in Peano arithmetic, is consistent: following the rules of
the system will never produce a contradiction. This seems to fly in the face of Gödel’s
second incompleteness theorem, where it was shown that such a system can never be
proved consistent under its own rules. The resolution of this apparent contradiction is that
Gentzen was not working within Peano arithmetic. He was working inside another
system. Although the new system could not prove itself consistent, it could prove
Peano arithmetic consistent.
Does this mean for certain that no contradictions lurk within Peano arithmetic?
Although most mathematicians would subscribe to this conclusion, Gentzen’s proof
relies on the assumption that his new, stronger system is consistent. Of course, this
cannot be taken for granted. Proof theory compares the relative strengths of different
logical systems (including intuitionistic and modal logic). Any system can be assigned
an ordinal, an infinite quantity closely related to a cardinal number, which measures its
strength.
Reverse mathematics The standard process of mathematical research is to start with
some basic axioms, and deduce interesting conclusions from them.
It is good practice to be as sparing with the initial assumptions as possible. The worst
case would be if your work depended on a conjectural result, such as the Riemann
hypothesis or, even worse, the P ≠ NP problem. Beyond this, some axioms are more
controversial than others. If your theorem absolutely requires the axiom of choice (or,
worse, the continuum hypothesis) this is worth recording. Otherwise, if you can do
without them that would usually be preferable. Similarly, if you can prove your
theorem from first principles, instead of relying on monumental results such as the
classification of finite simple groups or the four colour theorem, it will usually be
better (and more illuminating) to do so.
Mathematicians generally try to follow this guiding principle (without getting too hung
up on it). But the logician Harvey Friedman turned it into a full-blown logical
programme. Starting with classical theorems such as the intermediate value theorem,
Friedman’s question is: what are the absolute bare minimum assumptions needed to
prove it? This question demands a logical answer, in terms of a minimal proof-
theoretic system. In 1999, Stephen Simpson identified the five commonest such
systems, in order of strength. Reverse mathematicians analyse which of these is the
appropriate foundation for a given theorem.
Hilbert’s 10th problem For thousands of years people had been investigating equations
such as Fermat’s last theorem and Catalan’s conjecture, trying to determine whether or
not they have any solutions which are whole numbers. David Hilbert believed that this
ad hoc approach to Diophantine equations was inadequate. In his 1900 address, he
called upon the mathematical community ‘to devise a process according to which it
can be determined by a finite number of operations whether the equation is solvable in
rational integers’. This process should involve inputting an equation, following some
predetermined steps, and arriving at an answer: ‘yes this equation has a whole number
solution’, or ‘no it does not’. In modern terms, what was required was an algorithm.
The study of Hilbert’s 10th problem led to astonishing bridges between number theory
and mathematical logic. It became clear that Diophantine equations and Turing
machines are really two perspectives of the same underlying subject. Hilbert’s hopes
were finally dashed by Matiyasevich’s theorem.
The key concept from logic is that of a collection of natural numbers being computably
enumerable, which means that there is an algorithm which lists it. The set of prime
numbers is computably enumerable, as there is an algorithm which takes the natural
numbers in turn, testing each one for primality, and including or excluding it as
appropriate. It is not too difficult to prove that every Diophantine set of integers is
computably enumerable. In 1970, building on earlier work by Julia Robinson, Martin
Davis and Hilary Putnam, Yuri Matiyasevich proved the deep and counterintuitive
result that the converse is also true: every computably enumerable set of integers is
Diophantine. Because we know there are enumerable sets that are not computable,
Matiyasevich’s theorem immediately implies the existence of uncomputable equations.
Matiyasevich’s theorem killed off Hilbert’s hoped-for algorithm, by showing that every computably
enumerable set is Diophantine. The critical point is that the class of computably
enumerable sets is much broader than that of outright computable sets. This followed
from Turing’s work on the halting problem. Therefore, by Matiyasevich’s theorem,
there are a great many Diophantine sets which are uncomputable.
COMPLEXITY THEORY
Complexity theory In practical computing applications, not all algorithms are equally
efficient. An expert programmer might write a slick piece of code to perform a given
task quickly, while an amateur’s effort could take hundreds of times longer to do the
same thing. This is the art of algorithm design. However, not everything comes down
to the ingenuity of the programmer. It is likely that even the greatest programmers will
never build a quick algorithm for solving the travelling salesman problem, because
probably no such algorithm can exist. (Whether or not this is true depends on the
biggest problem in the subject: P = NP.)
This is the topic of complexity theory, which straddles the boundary of mathematics
and theoretical computer science. It studies the inherent difficulty of a task, as
measured by the minimum length of time any algorithm will take to solve it.
If the number of steps an algorithm requires to process n pieces of data is given by a polynomial such as n² or n³, then the task is said to run in polynomial
time. The collection of all such tasks forms a class known as P (standing for
‘polynomial’). Cobham’s thesis identifies this with the set of tasks which can be
completed fast enough for practical purposes.
On the other hand, some functions grow much faster than polynomials. If the algorithm
has to carry out 2ⁿ steps on n pieces of data, then this very quickly explodes out of
control. At n = 100, even the fastest modern processor will be defeated. Tasks like this are said to
run in exponential time, and the class of them is known as exptime.
Cobham’s thesis Alan Cobham was one of the early post-war workers in complexity
theory. He viewed the complexity class P as the best theoretical description of
problems that can feasibly be solved on a physical computer. Cobham held that for
most purposes ‘polynomial time’ means ‘fast enough’. Problems which are not in P
may technically be computable, but any algorithm is likely to be so slow as to be of no
practical use.
Cobham’s thesis is valuable as a rule of thumb, but is not quite the whole story. In
truth, an algorithm which has to carry out n^1000 operations when n pieces of data are
entered is of little use (though technically still polynomial time). On the other hand,
while most non-polynomial algorithms are of limited practical value, one which grows
at the rate 2^(n/1000) could well be worth implementing for small sets of data.
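
A quick numerical illustration of this caveat; the running times compared here are my own illustrative choices.

for n in (10, 100, 1000):
    print(f"n = {n:4}:  n^3 = {n ** 3:.1e},  "
          f"n^1000 has {len(str(n ** 1000))} digits,  "
          f"2^(n/1000) = {2 ** (n / 1000):.3f}")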
There is – or there seems to be – a large difference between the time needed to solve a
problem and the time it takes to check the solution. For example, there is no known
algorithm which can factorize large integers such as 10,531,532,731 in polynomial
time, suggesting that it is an inherently tough problem. On the other hand, once a
solution is provided, it is quick work to check that it is correct. If someone tells me that
101,149 × 104,119 = 10,531,532,731, it is straightforward to verify that she is right.
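
The asymmetry is easy to see in a couple of lines of Python; the trial-division routine below is simply the naive method, included for illustration.

# Checking the claimed factorization is a single multiplication, essentially instant:
print(101149 * 104119 == 10531532731)        # True

# Recovering the factors naively means trial division, grinding through candidates:
def smallest_factor(n):
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n

print(smallest_factor(10531532731))          # 101149, after roughly 50,000 divisions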
So there is a short route through the problem; the hard work is in finding it. Although it
may not be quickly solvable by an ordinary Turing machine, a machine which is
allowed to make guesses during the algorithm could solve it fast, if it got lucky. Such a
theoretical device is known as a non-deterministic Turing machine.
The class of problems which can be checked in polynomial time is known as NP,
standing for ‘non-deterministic polynomial time’, meaning that a non-deterministic
machine may be able to solve it quickly (if it is lucky). Similarly, exptime and other
complexity classes have their non-deterministic equivalents. The relationship of the
classes P and NP is the subject of the famous P = NP problem. The toughest problems in
NP make up the class of NP-complete problems.
AKS primality test How do we decide whether or not a particular number is prime?
The simplest method is just to try dividing n by every smaller number (this can be
slightly improved by recognizing that we only need to test prime numbers smaller than
√n). All the same, if n is a large number (say hundreds of digits long), this is an
impossibly slow process. The subject of primality testing has deep roots, and some
better methods have been found. But for a long time it was unclear where the
theoretical barriers lie. The Lucas–Lehmer test runs in polynomial time, but it only
applies to a very special class of inputs, namely Mersenne numbers. No-one had been
able to construct an efficient primality test which could work for any integer, nor had
anyone demonstrated that it could never be done.
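
For illustration, here is the naive trial-division test in Python; this is emphatically not the AKS algorithm, just the simple method described above.

def is_prime_by_trial_division(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:       # only candidates up to the square root are needed
        if n % d == 0:
            return False
        d += 2
    return True

print([p for p in range(2, 50) if is_prime_by_trial_division(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]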
In 2002, Manindra Agrawal, Neeraj Kayal and Nitin Saxena at the Indian Institute of
Technology in Kanpur stunned the world with their paper ‘Primes is in P’, in which
they described a new primality test which works for any number, and does run in
polynomial time.
Integer factorization problem If someone gives you a large number, and asks you to
break it down into its constituent prime factors, what is the most efficient process to
use? As well as being of mathematical interest, this is a problem of great importance in
cryptography theory. Modern public key systems rely on large integers being very
difficult to factorize (which is why RSA Laboratories set up the RSA factoring
challenge).
So far, this seems to be the case. Although some special types of integer can be
factorized quickly (that is, in polynomial time), the best algorithm for factorizing
general large integers (say, longer than 150 digits) is the number field sieve, conceived
by John Pollard in 1988 (and developed by Manasse, Lenstra and Lenstra), which does
not run in polynomial time.
Integer factorization can be checked very quickly just by multiplying the relevant
numbers together, so this problem certainly lies in the complexity class NP. The big
question is whether it is in P, that is, whether a polynomial time algorithm might in
principle exist. If the P = NP question were answered positively, then there would have to
be a polynomial time algorithm for integer factorization. Such an algorithm could have
a devastating impact on internet security.
The RSA factoring challenge In 1991, the network security company RSA
Laboratories published a list of 54 numbers between 100 and 617 digits long. They
challenged the world to factorize them, offering prizes of up to $200,000. All of the
numbers are semiprimes, that is, multiples of exactly two prime numbers, as these have
the most significance for cryptography. RSA declared the challenge inactive in 2007
and retracted the remaining prizes. However, the distributed computing project
distributed.net continues to work on these challenges, and is offering privately
sponsored prizes for successful participation.
At time of writing, the world record for the largest factorization is an RSA number,
namely, the 200-digit number known as RSA-200:
279978339112213278708294676387226016210704467869554285375600099293261284
456710529553608560618223519109513657886371059544820065767750985805576135
50144178863178946295187237869221823983
This was factorized in 2005 by F. Bahr, M. Boehm, J. Franke and T. Kleinjung into
two 100-digit primes:
353246193440277012127260497819846436867119740019762502364930346877612125
0058547956528088349
and
792586995447833303334708584148005968773797585736421996073433034145576787
5381409304740185467
The effort required around 55 years of computer time, and employed the number field
sieve algorithm.
The P = NP question Can every problem which can be checked quickly also be solved
quickly? This is, roughly, the meaning of the P = NP problem, one of the outstanding
open problems in mathematics today, and carrying a $1,000,000 price tag, courtesy of
the Clay Institute. A solution would be worth vastly more, however, because of the
potentially seismic repercussions for integer factorization, the travelling salesman
problem and numerous other questions of algorithm design.
The class P is the collection of all problems which can be solved by an algorithm in
polynomial time, such as telling whether a number is prime. The class NP is those
which can be checked in polynomial time, such as factorizing a number. It is
straightforward to see that everything in P is also in NP, so P ⊆ NP. The million dollar
question is whether NP ⊆ P.
Stephen Cook and Leonid Levin independently posed the P versus NP problem in
1971. Although no proof exists either way, the suspicion among mathematicians and
computer scientists is largely that P ≠ NP. A third possibility is that the question itself
could be independent of all our standard mathematical assumptions.
But what if P ≠ NP, as most people suspect? This seems not to take us any further
forward, as these problems could still either be in P or not. In the case of the travelling
salesman problem however, there is an extra detail. This problem is NP-complete,
meaning that, out of all the problems in the class NP, it is among the most difficult to
compute by algorithm. If P ≠ NP, then there is some problem in NP which cannot be
computed in polynomial time. Being NP-complete, the travelling salesman problem
must be at least as difficult as this problem, and so cannot lie in P.
Quantum computing A quantum system can exist in a superposition of several states at once. If this could be exploited for computing purposes, it could lead to massively faster
machines. All that remains is for a functioning quantum computer to be built. Although
this is a daunting challenge, progress has been made. In June 2009, a team at Yale
University led by Robert Schoelkopf succeeded in developing a 2-qubit (‘quantum
bit’) processor. It successfully ran Grover’s reverse phone book algorithm.
Grover’s algorithm Suppose you want to search an unsorted list, such as looking up a number in a reverse phone book: classically, there is little to do but check the entries one by one. However, if you happen to have a quantum computer to hand, the problem can be
solved faster. In 1996, Lov Grover designed a quantum algorithm, which exploits a
quantum computer’s ability to adopt different states, and thus check different numbers,
simultaneously. If the phone book contains 10,000 entries, the classical algorithm will
take approximately 10,000 steps to find the answer. Grover’s algorithm reduces this to
around 100. (In general, it will take around √N steps, instead of N.) The algorithm was successfully run on a 2-qubit quantum
processor in 2009. Of course it is not really useful for checking phone numbers. But
for searching for the keys to ciphers, for example, it would be a powerful tool.
Quantum complexity classes Exactly how quantum complexity classes relate to their
classical counterparts is a topic of current research, and largely a mystery. An added
complication is that quantum computations are probabilistic; they only probably
decohere on the right answer. Repeatedly running the algorithm can increase this
likelihood to any required level of certainty, but this slows the process down again,
partially counteracting its benefits. Significantly, Grover’s algorithm was found to be
nearly optimal; no other quantum algorithm would solve the problem significantly
faster. This shows that, although quantum computers are powerful, they are not
omnipotent.
It is unknown whether every problem in NP can be solved in quantum polynomial time
(BQP, bounded-error quantum polynomial time). However, in 1994 Peter Shor
discovered a quantum algorithm for integer factorization which does run in polynomial
time, something no classical algorithm is known to achieve (and perhaps none can).
This result is of potentially huge importance, should a fully functional quantum
computer successfully be built.
COMPUTABILITY THEORY
Computability What are the possible collections of natural numbers? There are the
prime numbers, the numbers 1 to 100, the triangular numbers, and all sorts of other
collections of endless fascination to mathematicians. But these are vastly outnumbered
by a swathe of unstructured, random-seeming sets which are nearly impossible to
describe. Is there any way to tell which is which?
The first attempt to split the interesting sets from the morass is the computability
concept. Suppose that A is a set of numbers, and I want to know whether or not 57 and
1001 are in A.
If there is an algorithm which will give yes/no answers to such questions, then A is
said to be computable. (Whether the corresponding algorithms are fast enough to be
practically useful is a matter for complexity theorists to worry about.) This is only the
start, however. Through the introduction of Turing oracles, computability becomes a
relative concept, with some sets being more uncomputable than others. This opens the
door to the study of what it means to be truly random. Computability theory can also
be understood in terms of computable real numbers.
Computably enumerable sets A subtly different notion from being computable is the
notion of being computably enumerable. This means that there is an algorithm which
lists the set. The catch is that it may not list it in any comprehensible order.
Every computable set is certainly enumerable, but the reverse does not hold. Suppose
B is an enumerable set, and I want to know whether or not 7 is in it. I run the algorithm
which lists B, and it begins: 1, 207, 59, 10003, 6, … If 7 appears in the list, then I
know it is in. But if I wait for half an hour, and 7 has not appeared, I cannot conclude
that 7 is not in the list. It may yet be listed. I could let the algorithm run all night, or for
a million years, but there is no moment when I can be certain that 7 is definitely out.
By Turing’s work on the halting problem, we know that there is no way around this
obstacle. The class of enumerable sets really is strictly bigger than that of truly
computable sets. Computably enumerable sets appear in number theory as the
Diophantine sets, courtesy of Matiyasevich’s theorem.
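
The asymmetry can be illustrated with a Python generator; the particular set listed below is an arbitrary choice of mine.

def listing():
    # Lists an (arbitrarily chosen) infinite set, in whatever order it pleases.
    n = 1
    while True:
        yield n * n + n + 41
        n += 1

def appears_within(target, steps):
    gen = listing()
    return any(next(gen) == target for _ in range(steps))

print(appears_within(43, 1000))    # True: membership confirmed once 43 is listed
print(appears_within(44, 1000))    # False - but this does not prove 44 is absent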
Encoding sets in binary Real numbers are commonly written as decimals, but can also
be written out in binary notation. An example might look like r = 0.011010100010100010… Binary representation provides an excellent method for
encoding sets of natural numbers as real numbers.
We can construct a set of natural numbers from r as follows: list the natural numbers 1,
2, 3, 4, 5, 6, 7, … Now line these up with the bits of r, with 1 meaning in the set, and 0
meaning out of it:
n              1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
nth bit of r   0  1  1  0  1  0  1  0  0  0  1  0  1  0  0  0  1  0
So in this case r encodes the set {2, 3, 5, 7, 11, 13, 17, …}. The reverse procedure
produces a real number from a set. So {1, 2, 3, 5, 8, 13, 21, …} defines the real
number 0.111010010000100000001…
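
The correspondence is easy to make concrete in code; this Python sketch simply reproduces the two examples above.

def bits_to_set(bits):
    # The nth bit (counting from 1) is 1 exactly when n belongs to the set.
    return {n for n, bit in enumerate(bits, start=1) if bit == "1"}

def set_to_bits(numbers, length):
    return "".join("1" if n in numbers else "0" for n in range(1, length + 1))

print(bits_to_set("011010100010100010"))                  # {2, 3, 5, 7, 11, 13, 17}
print("0." + set_to_bits({1, 2, 3, 5, 8, 13, 21}, 21))    # 0.111010010000100000001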
We can at least gesture in the direction of some individual uncomputable real numbers.
The best way to do this is to derive them from known uncomputable problems.
The halting number K The prime example of uncomputability is the halting problem.
By encoding this problem as a string of binary, we get our first uncomputable real
number, known as K.
Chaitin’s Ω During the 1980s, the computer scientist Gregory Chaitin travelled as close
to an individual random, uncomputable real number as it is possible to go. His number
‘Ω’ can be understood as the halting probability. Every Turing machine will either
complete its task and halt, or it will run for ever. Deciding in advance which will
happen is the halting problem, and is uncomputable.
It makes sense to talk about the probability of a randomly chosen Turing machine
halting. This, roughly, is the definition of the number Ω. It is a compressed version of the
halting number K, with all the non-randomness stripped out. If Ω could be pinned down
exactly (which of course it cannot) it would serve as an oracle for the halting problem,
and many other uncomputable problems.
Oracles By definition, the only sets which can be calculated by Turing machines are
the computable ones. By cheating, and endowing our machines with superpowers, we
can manufacture a much larger range. A Turing oracle is a Turing machine with an
extra component: an oracle with magical access to some specified set (A). On top of its
usual functions, the machine can consult the oracle for information about A. It is
almost certain that such a device will never be built (even a quantum computer would
not suffice). But, as an abstraction, it throws up some fascinating possibilities. The
question is, what sets can be computed now?
Turing degrees
With the idea of an oracle, the concept of computability became relative. Even if A and
B are uncomputable sets (or equivalently uncomputable real numbers), it makes sense
to say that A computes B. This means that there is an oracle with access to A which
can compute B. This is an ingenious way of capturing the idea that A contains all the
information provided by B (and possibly more).
If A and B can each compute each other, then they contain exactly the same
information, and are said to have the same
Turing degree. (Of course A and B might appear very different indeed; there are many
different ways to encode the same information.) If A can compute B, but not vice
versa, we say A has a higher Turing degree than B. This relationship seems natural
enough, but the pattern formed by the Turing degrees is enormously complicated.
Many pairs of degrees are incomparable, with neither able to compute the other
(although there is always a third degree which can compute both).
At the bottom sits the degree of computable sets. Above it, almost anything can
happen. There is no top degree which can compute all others; above every degree you
can always find another. Similarly, there are infinite descending chains: A can compute
B, which can compute C, and so on for ever, but with none of the reverse computations
possible. There are degrees which are infinitely nested, in fact almost any
configuration you can dream up can be found somewhere within the Turing degrees.
Normal numbers What does a typical real number look like? Émile
Borel, in 1909, formulated the definition of a normal number. If you write out the
decimal expansion, then each digit 0–9 should appear equally often. This does not have
to happen over any short stretch, but over the course of the whole infinite decimal
expansion, everything eventually averages out.
More is required, however. There are 100 different possible two-digit combinations: 00
to 99. Each of these should also appear equally often in the long run, and similarly for
three-digit combinations, and so on. In general every finite string of digits should
appear with the same frequency as every other string of the same length. There is one
final requirement for normality. The above definition starts with the number written
out in base 10. If you translate it into base 2, 36, or any other base, the same properties
should still hold.
Although Borel was able to prove that almost every real number is normal, known
examples are rare. The first was provided by Waclaw Sierpinski in 1916. It is
conjectured that e and π are both normal, but this is not known for sure. Chaitin’s Ω is
known to be normal. The definition of a random real implies normality.
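
Digit frequencies can at least be sampled empirically. The Python sketch below counts digits in an initial segment of Champernowne’s constant 0.123456789101112…, which is known to be normal in base ten; a finite check like this is only suggestive, since normality concerns the whole infinite expansion.

from collections import Counter
from itertools import count, islice

def champernowne_digits(how_many):
    # Concatenate 1, 2, 3, ... and keep the first how_many decimal digits.
    digits = "".join(str(n) for n in islice(count(1), how_many))
    return digits[:how_many]

frequencies = Counter(champernowne_digits(100000))
for digit in "0123456789":
    print(digit, frequencies[digit] / 100000)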
Random real numbers The real numbers that we have the easiest access to are the
integers, rationals and algebraic numbers. Although some individual transcendental
numbers, such as e and π, are known, these are at least computable. A typical real number
will look very different from any of these. Chaitin’s Ω is the closest we have to what a
random real should look like.
Here are two definitions of what it means for a real number to be random: 1 An
approach based on Kolmogorov complexity. If the bits of a real number contain any
pattern, this can be exploited to compress the sequence. A random real is one which is
incompressible, and therefore completely patternless.
2 Suppose bits of the number were to be revealed one by one: 0.10111001…, and you
were to place bets on whether the next digit will be 0 or 1. If the real is computable,
you can run an algorithm to predict the next digit exactly, and so win every time. But
for a random real, there is no strategy you can adopt to win more than 50% of the bets.
Randomness The two definitions of a random real number above turn out to be
equivalent, as do several other variants. The second shows that randomness
automatically implies normality: otherwise you could produce a strategy to win money
based on combinations of digits which occur more or less frequently. Randomness,
however, is stronger than normality. Random real numbers are scattered throughout the
Turing degrees (though of course none are computable). This means that they can
encode a lot of information, which has led some researchers to delve further into the
question of what randomness is.
Stupidity tests The definition of random real numbers works well. But it has a
somewhat paradoxical consequence. Many random numbers, such as Chaitin’s Ω, have
high Turing degrees. This means that they encode a great deal of information. In fact,
according to the Kucera–Gacs theorem, almost any information whatsoever can be
encoded within the bits of a random real number. This rather goes against some
people’s intuition of what randomness should mean.
The computability theorist Denis Hirschfeldt likens randomness to stupidity. There are
two ways to pass a test for stupidity: first, if you are really stupid; second, if you are
clever enough to predict how a stupid person would answer. Numbers like Ω are the
clever sort of random. More stringent tests only allow genuinely stupid numbers
through, which contain little information. This idea leads to a whole new hierarchy of
randomness.
MODEL THEORY
Gödel’s incompleteness theorems do not mean that it is impossible to write down a complete, consistent set of
axioms for any mathematical theory. They say that you cannot do this for the natural
numbers, where addition and multiplication are too complicated to be captured by one
formal system.
Lou van den Dries contrasts such ‘tame’ theories with ‘wild’ ones which exhibit
Gödelian phenomena, and describes model theory as the ‘geography of tame
mathematics’.
An early pioneer in the subject was Alfred Tarski, who provided elegant logical
analyses of the real and complex numbers, totally avoiding incompleteness or
undecidability.
Gödel’s incompleteness theorems show that there can be a chasm between a language
and its referents. But, even if we restrict ourselves to non-Gödelian, complete
settings, the situation remains far from straightforward. Of course, there are many
different logical systems, but the Löwenheim–Skolem theorem applies in many cases.
This says that a logical theory can never determine the size of the structure it is talking
about. This has dramatic consequences in terms of non-standard models of familiar
mathematical structures which can exist with size given by any infinite cardinal
number.
Beyond this, there is huge variety. Some logical theories can pin down a model exactly
(once its size is fixed). This is where the relationship between semantics and syntax is
tightest. For others, this breaks down badly, with the theory having a huge spectrum of
models. Determining which category a given logical theory is in is called classification
theory.
Non-standard models Start with a structure, such as the set of real numbers. Look at its
logical theory, that is, the collection of all true statements about it, in some chosen
formal language. (An example of such a statement is ‘For any positive number, you
can find a smaller one, still not equal to zero.’) Now ask what other structures you can
find which obey all the same logical rules. Remarkably, you are suddenly faced with a
bewildering army of structures, other than the one you first thought of. These are
known as non-standard models, and were first discovered by the model theorist
Abraham Robinson in 1960. In the case of the real numbers, Robinson found a non-
standard model he called the hyperreal numbers, which include infinitesimals, objects
previously discredited as superstition.
Non-standard models are not mere curiosities, but form the setting for non-standard
analysis. They also raise some philosophical questions. As the logician H. Jerome
Keisler wrote, ‘we have no way of knowing what a line in physical space is really like.
It might be like the hyperreal line, the real line, or neither.’
Infinitesimals When Archimedes discovered the formula for the volume of a sphere, he
did it by cutting the shape into infinitely many infinitely thin slices, measuring the
infinitesimal volume (that is to say infinitely small) of each, and then adding all these
together to get an ordinary, finite number. Mathematicians throughout the ages
followed
his lead, including Newton and Leibniz, who each relied on infinitesimal numbers in
their parallel developments of the calculus.
In the 19th century, Karl Weierstrass finally showed how to set calculus on a solid
footing, using limits of sequences of ordinary numbers, and avoiding infinitesimals. By
1900, infinitesimals had been completely banished from the mathematical repertoire,
and for good reason: there are no such things. At least, the set of real numbers, the
basic setting for geometry and analysis, does not include any infinitesimals.
Every real number x can be written as a decimal. Either every digit is zero (x =
0.0000000…), in which case x = 0, or we eventually reach a digit which is not (e.g. x =
0.00000…000007612415…). In this case, the number has positive size, and is not
infinitely small. It may be unimaginably tiny, of course, but if you zoom in close
enough, it will always be a positive, measurable distance away from zero, and
therefore not infinitesimal.
The transfer principle Despite being based on a fiction, the old infinitesimal approach
to calculus certainly had an uncanny ability to conjure up the correct answers. An
explanation came from a cross-pollination of Weierstrass’ analysis with mathematical
logic. In the 1960s, the model theorist Abraham Robinson discovered non-standard
models of the real numbers which do contain infinitesimal elements.
This discovery made the infinitesimal respectable again, and indeed the primary tool of
non-standard analysis.
Non-standard models first arose as structures satisfying the same logical rules as the
ordinary set of real numbers (in some formal language). Turning this idea around, it
provides a useful tool. If we can deduce that something is true in a non-standard
model, this is often enough to show that it must also hold true for the ordinary real
numbers. This transfer principle is the foundation of non-standard analysis.
The subject was begun in 1960 by Abraham Robinson, whom Kurt Gödel described as
‘the one mathematical logician who accomplished incomparably more than anybody
else in making this science fruitful for mathematics’. Since then, non-standard analysis
has successfully been applied to problems in mathematical physics and probability
theory, as well as within analysis.
Classification theory With over 960 papers to his name, the Israeli logician Saharon
Shelah is one of the most prolific mathematicians currently working. One of his
triumphs is the solution of the classification problem.
Some sets of axioms are satisfied by very many structures indeed. At the opposite end,
other theories have just one model, once you fix its size (you can always find models
of different sizes; this is the Löwenheim–Skolem theorem). The classification problem,
roughly speaking, is to tell the difference.
Model theoretic algebra The scraps from the classification theoretic banquet have
fuelled many further feasts. Shelah’s deep and abstract techniques can provide real
insights into structures within mainstream mathematics, such as rings and fields. This
is the subject of model-theoretic algebra. Stable groups, for example, form the setting
for an ongoing attempt to mirror the classification of finite simple groups in an infinite
setting.
Aristotle’s three laws of thought Mathematical logic is concerned with deducing true
statements from true statements. At the bottom, however, something has to be taken as
given, or axiomatic. Three laws of thought attributed to Aristotle and posited as non-
negotiable axioms are:
1 The law of identity: a thing is the same as itself.
2 The law of non-contradiction: no statement can be both true and false.
3 The law of the excluded middle: every statement is either true or false.
The first was so self-evident to Aristotle that he barely mentioned it, other than to
opine that ‘why a thing is itself is a meaningless inquiry’. It was elevated to a law by
later thinkers. Although all three laws persevere, systems have been developed which
dispense with 2 and 3, namely paraconsistent logic and intuitionism respectively.
Law of the excluded middle The law of the excluded middle is an axiom, not of
mathematics, but of logic: the broader framework in which the rest of mathematics
takes place. It says that, for any suitably well-posed statement (call it P) either P is
true, or the statement ‘not P’ is true: there is no middle ground. Another way to say this
is that
P is logically equivalent to ‘not not P’. The law of the excluded middle is the
foundation of proof by contradiction.
The liar paradox The mother of all logical paradoxes, the liar paradox (‘this statement is false’) is attributed to Eubulides in the
fourth century BC, who was also responsible for the bald man and other paradoxes. Though not
strictly mathematical, it has echoes throughout logic, most notably in Russell’s
paradox, and the proofs of Gödel’s incompleteness theorems and the halting problem.
It illustrates the consequences of a language sophisticated enough to refer to itself.
There have been many approaches to resolving the paradox. Bertrand Russell’s theory
of types sets up a hierarchy whereby objects in the language are only permitted to refer
to lower-level objects, so no legitimate sentence may refer to itself. A paraconsistent
approach simply takes it at face value, and allows that the statement is both true and
false.
Grelling’s paradox This paradox, which asks whether the adjective ‘heterological’ (meaning ‘not describing itself’) is itself heterological, was cooked up by Kurt Grelling and Leonard Nelson, and closely
resembles Russell’s paradox, in a linguistic setting.
Uncertain reasoning What happens if we are 90% confident that P is true, and believe that P implies Q 75%
of the time? This is a problem in uncertain reasoning, a skill that human beings are
naturally adept at. In a marriage of logic and probability theory, various approaches
have been taken to formalize the rules of uncertain systems. A major motivation is
artificial intelligence, in particular the development of expert systems, such as a
machine which can make medical diagnoses. Being able to reason with uncertain
information, such as conflicting or unclear symptoms, would be a key component.
Fuzzy sets Books about mathematics are full of carefully phrased definitions. To the
non-mathematician these can seem painfully pedantic, but concepts must be given with
the utmost precision for the subject to work. This is particularly true of mathematical
sets: any object must be either in or out. We cannot allow set membership to be
ambiguous.
In the human world, however, this is not true. The set of tall people is certainly
meaningful, but not precisely defined. At 6 feet, do I qualify? Maybe. Fuzzy set theory
was devised by Lotfi Zadeh, in an attempt to extend set-theoretic reasoning to cope
with sets which have fuzzy edges. Instead of being given by a crisp in/out, set
membership is assigned a number between 0 and 1 (corresponding to definitely out and
definitely in, respectively).
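
A possible membership function for ‘tall’, in Python. The thresholds, and the choice of min, max and complement for fuzzy ‘and’, ‘or’ and ‘not’ (one common convention due to Zadeh), are illustrative assumptions rather than anything prescribed here.

def tall(height_in_feet):
    # Membership in the fuzzy set of tall people, rising from 0 at 5 feet to 1 at 7 feet.
    if height_in_feet <= 5:
        return 0.0
    if height_in_feet >= 7:
        return 1.0
    return (height_in_feet - 5) / 2

def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1 - a

print(tall(6))                   # 0.5 - a borderline case
print(fuzzy_not(tall(6.5)))      # 0.25 - 'not tall' only to degree 0.25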
Fuzzy logic An extension of fuzzy set theory is fuzzy logic. This is related to uncertain
reasoning and deals with logical inference involving similarly fuzzy concepts. An
example of Lotfi Zadeh’s is: If P is small, and Q is approximately equal to P, then Q is
more or less small. Just as numbers can be defined within traditional set theory, so
fuzzy numbers have been defined from fuzzy sets. Several parts of algebra have
subsequently been fuzzified. Although this approach is not without its sceptics, fuzzy
set theory has found applications in the social sciences and information processing.
Multivalued logic In classical logic, a statement can be assigned one of two truth
values: ‘true’ or ‘false’. Multivalued logics extend this to three, four, or infinitely many
possible truth values. Intuitionistic logic can be formalized as a 3-valued logic, by
adding the value ‘unknown’ or ‘undetermined’.
Some databases work on a 4-valued relevance logic. To a query ‘is X true?’ there are
four possible outcomes: that the database contains ‘no relevant information’,
‘information suggests X is true’, ‘information suggests X is false’ or ‘conflicting
information’ (see also paraconsistent logic).
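
One standard way of computing with a third truth value is Kleene’s strong three-valued logic, sketched below in Python with None standing for ‘unknown’; the intuitionistic formalization mentioned above differs in detail, so treat this purely as an illustration.

def not3(p):
    return None if p is None else not p

def and3(p, q):
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def or3(p, q):
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

print(or3(True, None))     # True: settled regardless of the unknown value
print(and3(True, None))    # None: still depends on the unknown value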
Fuzzy logic, and approaches to uncertain reasoning usually involve infinitely many
truth values, with the truth of a statement being measured on a scale between 0 (false)
and 1 (true). Other systems of continuous logic have been developed which also allow
truth values between 0 and 1, to measure how close two functions are to being
identical.
Intuitionism Some mathematicians viewed with suspicion the debate about foundations
of mathematics which had developed since the late 19th century, in works such as
Principia Mathematica, and the movement spawned by Hilbert’s program. These
intuitionists, such as L.E.J. Brouwer, saw mathematics as an activity of the human
mind, rather than the unthinking logical consequences of some formal system. If the
formalization of mathematical language led to weird and counterintuitive constructs
such as Cantor’s hierarchy of infinities, this should be taken as a hint that a mistake
had been made somewhere, and the language of contemporary logic was clouding,
rather than clarifying, the underlying mathematics.
In 1908, Brouwer investigated more closely, trying to isolate the part of logic
responsible for the damage. In The Unreliability of the Logical Principles, he targeted
the law of the excluded middle as the basis on which mathematicians had been
welcoming in strange constructs for which there was no direct evidence.
Intuitionistic logic is logic without the law of the excluded middle. The project begun
by Brouwer to rebuild mathematics from this base is called constructive mathematics.
Explosion What happens to a logical system when a contradiction creeps in, that is to
say, there is some statement P where both P and ‘not P’ are deemed to be true? The
usual answer is that this acts as a pin to a balloon, and the whole set-up collapses into
inconsistency and meaninglessness. Such an event is known as an explosion.
To see why this happens, suppose that Q is any statement whatsoever. Now, the two
statements ‘P implies Q’ and ‘(not P) or Q’ are logically equivalent (see tautology and
logical equivalence). But ‘not P’ is assumed to be true, and so ‘(not P) or Q’ is
certainly true. Therefore so is ‘P implies Q’. But P is also assumed true, and so it
follows that Q is true. But Q was a completely arbitrary statement. Once a
contradiction enters the system, then, anything and everything can be deduced from it.
(See also paraconsistent logic.)
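
The classical fact underlying the explosion, that ‘(P and not P) implies Q’ holds for every Q, can be checked by brute force over the four truth assignments:

from itertools import product

def implies(a, b):
    return (not a) or b          # classical material implication

# '(P and not P) implies Q' comes out true under all four assignments:
print(all(implies(p and (not p), q) for p, q in product([True, False], repeat=2)))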
Paraconsistent logic A wild cousin of intuitionistic logic, the term paraconsistency was
coined in 1976 by Francisco Miró Quesada. While intuitionism allows the possibility
of a statement P where neither P nor ‘not P’ holds, paraconsistent logic does the
opposite: it permits both P and ‘not P’ to hold simultaneously.
This would be the kiss of death for any ordinary system, which would immediately
explode with the introduction of any such contradiction. The rules of a paraconsistent
system, however, are weakened to tolerate some local inconsistency. Not every
proposition can be both simultaneously true and false, but a limited number may,
without bringing the system crashing down. A philosophical motivation is dialetheism,
which holds that some statements are both true and false, necessitating some sort of
paraconsistent logic. (The liar paradox is an example of a statement which some might
believe both true and false.) Aside from ideology, paraconsistency has been used to
provide methods for large software systems to handle inconsistencies in their data.
Modal logics After Aristotle’s conquest of the syllogism, he turned his attention to a
thornier problem. In real life, we may not simply assert that a fact X is true. There are
many ways we may qualify this statement: X is necessarily true, possibly true, believed
to be true, or will eventually be true. Modal logics are systems which incorporate
different ways of being true. The standard modal logic introduces two new symbols, □
and ◇. ‘□A’ is interpreted as ‘A is necessarily true’ and ‘◇A’ as ‘A is possibly true’. They
are related, with ‘◇A’ (‘A is possibly true’) meaning ‘A is not necessarily untrue’.
However, there are many variants of modal logic. Kurt Gödel began the study of
provability logic, which includes the modality ‘provably true’. Meanwhile doxastic
logics attempt to formalize the logic of belief systems, and temporal logics are
designed to cope with changes of tense. These have to contend with the past being a
domain of solid fact, and the future a realm of uncertain possibility. (This caused
Aristotle himself to question whether the law of the excluded middle is valid, when
applied to statements about the future.)
Possible Worlds In the 1960s Saul Kripke and others elevated the status of modal
logics by demonstrating how to build mathematical models of them. The important fact
about these structures is that they include different worlds. Here world is a technical term,
corresponding to a part of an abstract mathematical structure. Nevertheless these
different domains are indeed intended to mirror parallel universes, or possible worlds,
in which different truths may hold. A statement is then necessarily true if it holds in
every possible world, and possibly true if it holds in at least one possible world.
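
This simplified semantics is easy to sketch in code. The worlds and the facts holding in them are invented for illustration, and a full Kripke model would also carry an accessibility relation between worlds, which is omitted here.

worlds = {
    "w1": {"it is raining", "the match is cancelled"},
    "w2": {"it is raining"},
    "w3": {"the match is cancelled"},
}

def necessarily(statement):
    return all(statement in facts for facts in worlds.values())

def possibly(statement):
    return any(statement in facts for facts in worlds.values())

print(possibly("it is raining"))       # True: it holds in worlds w1 and w2
print(necessarily("it is raining"))    # False: it fails in w3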
METAMATHEMATICS
The question of what mathematical objects really are was considered by the ancient Greek philosopher Plato, whose
metaphysical answer was provided, in metaphorical form, in Plato’s cave. But in the
late 19th century, this question returned to prominence, principally as a result of Georg
Cantor’s spectacular work on infinite sets. Gottlob Frege’s
logicism was an attempt to deduce the whole of mathematics from pure logic.
Meanwhile David Hilbert adopted a formalist approach, in which mathematics was no
more than a set of rules for manipulating symbols on a page. Other thinkers reacted
against Cantor’s work on set theory. To the constructivist school, it was strong
evidence that mathematics had gone badly wrong.
In more recent times, any discussion of the nature of mathematics must also consider
its symbiotic relationship with technology. Mathematics was fundamental in the
development of the computer, and later the internet. In return, these inventions have
fundamentally changed the way that mathematics is studied and applied.
WHAT MATHEMATICIANS DO
Proof Proof is the ultimate goal of mathematics, on which the whole thing stands or
falls. A proof is a logically watertight argument, starting with some initial assumptions
(and underlying axioms), and ending with the theorem to be proved. It is the possibility
of proof that sets mathematics apart from all other subjects.
Proofs come in many different forms, and mathematicians have numerous strategies
available: proof by contradiction and proof by induction are two important examples.
Among the proofs included in this book are the irrationality of √2 and Euclid’s proof of the infinity of the primes.
Aside from all these technical terms, mathematicians have a lexicon of words to
describe types of questions they consider.
A lemma is a minor mathematical result, not of particular interest in its own right
(except when misnamed), but a stepping stone towards a bigger result.
It is possible to prove the uniqueness of a type of object without proving its existence.
This amounts to showing that there is at most one such object.
Proof by contradiction Suppose I want to prove that some statement (call it X) is true.
The most straightforward method is to start from things I already know, and attempt to
deduce X directly. An alternative is to turn the whole process on its head, and begin by
supposing that X is not true. If I can demonstrate that this leads inescapably to the
conclusion that 1 = 2, then it must be that the assumption was false. So X has to be true
after all.
Examples of this technique are Euclid’s infinity of the primes, and the irrationality of
√2.
Proof by contradiction rests upon the law of the excluded middle and, although it is a
standard piece in the mathematician’s repertoire, intuitionist and constructivist
mathematicians restrict themselves to working without it.
Also known by its Latin title of reductio ad absurdum, this technique has a curious by-
product: mathematicians can spend much of their time studying things that don’t exist.
Things that don’t exist The extent to which mathematical objects truly exist is a matter
of philosophical debate. Leaving this aside, mathematicians have undoubtedly devoted
a great deal of time to studying things whose non-existence is a matter of indisputable
fact.
For example, there is a well-known conjecture which says that there is no odd perfect
number. A natural way to try to prove this would be by contradiction. Such an attempt
would begin by supposing that it is not true, and so assume that there is an odd perfect
number, x. In the hunt for an ultimate contradiction, it would be necessary to study the
properties of x in great depth. In the course of this work, you might become the world-
expert on odd perfect numbers, even though, in all likelihood, there is no such thing.
Often, when contemplating a question, mathematicians will adopt twin approaches of
trying to prove it, and searching for a counterexample. Obviously, only one approach
can ultimately succeed, but the obstacles encountered in one attempt can be turned to
advantage in the other.
Independence results This book contains many theorems: mathematical statements for
which someone has provided a watertight proof. But mathematics is not alchemy; you
cannot conjure something out of nothing. At the base of every proof are some initial
assumptions, either made implicitly, or stated explicitly as axioms. Depending on the
field of mathematics, there are various standard starting points from which other
results are derived. Sometimes, however, these usual assumptions are not adequate
either to prove or disprove a particular statement. This is made concrete with an
independence result, which demonstrates that the statement and its opposite are both
equally compatible with the axioms.
The first significant result of this kind was the one showing that the parallel postulate is
independent of Euclid’s other axioms. After Kurt Gödel proved his incompleteness
theorems in 1931, the possibility of finding independence results within mainstream
mathematics became genuine. Might it turn out for example that the Riemann
hypothesis is independent of the usual axioms of mathematics? No such result has
appeared within number theory yet, but through phenomena such as Friedman’s TREE
sequence, incompleteness is slowly encroaching on mainstream mathematics.
As the foundations of mathematics came under more scrutiny in the 20th century, two
important independence results were proved by Paul Cohen: that of the axiom of
choice and the continuum hypothesis.
Abstraction In different ways, all abstract theories of mathematics seek to identify and
analyse features of the mathematical landscape. They do this by identifying an
important phenomenon, such as multiplication, and then axiomatizing it. This strips
away all the extraneous noise that a particular multiplicative system may carry, and
allows the phenomenon under investigation to be studied in isolation.
Once a phenomenon has been axiomatized, the ultimate goal is often to describe every structure satisfying the axioms. For this reason, classification theorems can be major mathematical events, which mark
the end of centuries of enquiry. Notable examples include the classification of finite
simple groups, the classification of surfaces, the classification of simple Lie groups,
Shelah’s classification of countable first-order theories, the classifications of frieze and
wallpaper groups, and the geometrization theorem.
Hilbert’s problems In his address to the International Congress of Mathematicians in Paris in 1900, David Hilbert posed 23 problems to set the agenda for the century ahead. Among them were:
1 Establish the truth or otherwise of Cantor’s continuum hypothesis.
2 Prove that the
axioms of arithmetic are consistent. (This was addressed by Gödel’s second
incompleteness theorem and Gentzen’s proof theory.)
4 Systematically construct new non-Euclidean geometries through analysis of
geodesics.
(This problem is generally considered too vague to be answerable. But our knowledge
of non-Euclidean geometries is certainly well developed.)
5 Is there a difference between Lie groups which are assumed differentiable and those
which are not? (The answer is essentially ‘no’. For large classes of Lie groups,
differentiability is automatic.)
6 Fully axiomatize physics. (Our best efforts so far are the standard model of particle
physics and Einstein’s field equation, but the search for a Theory of Everything goes
on.)
7 Understand transcendental number theory. The precise problem Hilbert posed was
answered by the Gelfond–Schneider theorem, but the subject as a whole remains
mysterious.
10 Find an algorithm for solving Diophantine equations. This task was proved
impossible in Matiyasevich’s theorem.
11–13 Problems in Galois theory, which for the most part remain unresolved.
14–17 Problems in algebraic geometry, which have been partially answered.
18 i) Are there only finitely many space groups in each dimension? (See n-dimensional
honeycombs.)
19–21 and 23 Problems in the theory of partial differential equations, which have
partially been resolved.
In the year 2000, the Clay Institute assembled a list of major open problems, echoing
David Hilbert’s from a century earlier. Each of their seven millennium problems was
assigned a $1,000,000 prize. They are:
1 The Birch & Swinnerton-Dyer conjecture
2 The Hodge conjecture
3 The Poincaré conjecture
4 The Navier–Stokes equations
5 P = NP?
6 The Riemann hypothesis
7 The Yang–Mills problem
To date only the Poincaré conjecture has been settled. It was
proved in 2003 by Grigori Perelman. He declined both the prize and the Fields Medal.
The Fields Medal The Swedish chemist Alfred Nobel died in 1896, leaving most of his
money to found the Nobel Prizes in literature, physics, chemistry, physiology and
medicine, and peace. In 1968, the Swedish central bank endowed a new Nobel
Memorial Prize in economics. There has been some speculation about why Nobel did
not include mathematics. A legend says that he found himself the rival of a famous
mathematician (possibly Gosta Mittag-Leffler) for a woman’s love. A more prosaic
theory is that he was simply uninterested in the subject.
In 1924, the International Congress of Mathematicians sought to fill this gap, with the
Canadian mathematician John Charles Fields contributing the funds for the prize. The
first Fields Medal was awarded in 1936, and since 1966 up to four have been awarded every
four years.
The Fields Medal is traditionally awarded to mathematicians under the age of 40.
When Andrew Wiles completed his proof of Fermat’s last theorem he was therefore
ineligible. He was, however, awarded a commemorative silver plaque in 1998. The
medal itself is gold, and depicts the Greek mathematician Archimedes together with a
Latin quotation from him: Transire suum pectus mundoque potiri (‘Rise above oneself
and grasp the world’).
Computational mathematics A common belief about mathematicians and technology is
summed up by an old joke about a university chancellor. He hated the physics and
chemistry faculty, because they were constantly requesting money for expensive
equipment. But he liked the mathematicians because all they ever needed were pens,
papers and dustbins (garbage cans). (His favourite faculty was philosophy, because
they didn’t even need dustbins.) Perhaps there was once a grain of truth to the idea that
mathematicians need few tools beyond their brains, but the dawn of the personal
computer has had major implications for mathematics. It is striking that Gaston Julia
discovered Julia sets in 1915, but it took another half century for these beautiful
fractals to be seen by means of computer.
There are now several integrated computer packages available for mathematics,
including Mathematica, Maple, Mathcad, Maxima, Sage and MATLAB. These are not
only sophisticated numerical calculators and graphics packages, but they can
manipulate symbols in very advanced ways. Algebraic procedures which previously
took a long time, and ran the risk of human error, can now be automated. Such
programs are invaluable tools for mathematical education, and are assuming an ever
more important role in research. Computer simulations are essential in much of applied
mathematics (such as fluid dynamics). Other mathematical technologies include
distributed computing projects, and proof-checking programs.
Mathematics once had a reputation of being fusty and inaccessible; the internet has
gone a long way to opening the subject up. A great deal of mathematics is now
available online. Sites such as Wolfram Mathworld (mathworld.wolfram.com),
Wikipedia (wikipedia.org) and The On-Line Encyclopedia of Integer Sequences
(www.research.att.com/~njas/sequences/) are searchable depositories of large amounts
of information.
Increasingly mathematicians use the web to disseminate their own work, and discuss
the work of others. It has become common practice to upload papers onto personal
websites or arXiv.org, as well as submitting them to peer-reviewed journals. The
mathematical blogosphere is growing rapidly, and websites such as Math Overflow
form online discussion rooms for professional researchers. Innovations such as
polymath projects are beginning to turn this interaction into active collaborative
research.
Distributed computing In the internet age, you do not have to be a scientist to help with
scientific research. There are numerous projects which run on the idle time of home
computers and games consoles around the world, from analysing the structure of
protein molecules to scanning the skies for evidence of extraterrestrial life. As
computers get faster, and more people around the world connect to the internet,
distributed computation has become a hugely powerful resource. Among these projects
are an increasing number of mathematical investigations.
NFS@home tackles the integer factorization problem, particularly for numbers of the
form bⁿ ± 1 for b between 2 and 12, as n becomes large.
Polymath projects In 2009, Timothy Gowers posed a problem from Ramsey theory on his blog, and invited anyone who
wanted to join in to help prove it. This was the start of the first polymath project,
inspired by the success of open source projects such as Wikipedia.
The idea was brand new. Mathematicians have always collaborated, but usually only in
twos or threes. Asking the question ‘Is massively collaborative mathematics possible?’,
Gowers wrote, ‘if a large group of mathematicians could connect their brains
efficiently, they could perhaps solve problems very efficiently as well’. So a small
network of blogs and wikis was created to connect brains, discuss strategy, share ideas
and collect results.
The experiment was a success; through the input of around 27 contributors across the
world, the theorem was proved, and indeed extended, within two months. Several
further polymath projects have since been run. It is hard to imagine that such projects
will not form an increasingly powerful force for research in the years ahead.
Inspiration and perspiration Many people have theorized about the psychology of
mathematics, including the relationship between occasional momentary flashes of
inspiration, and the many hours of sheer hard work. Wittingly or not, mathematicians
themselves have helped cultivate this mystery, by following the lead of
A fresh perspective on this question comes through polymath projects, which leave
behind them a complete, unabridged record of the thought processes which led to the
proof, complete with mistakes and cul-de-sacs. As Timothy Gowers and Michael
Nielsen wrote in Nature, ‘It shows vividly how ideas grow, change, improve and are
discarded, and how advances in understanding may come not in a single giant leap, but
through the aggregation and refinement of many smaller insights’.
Proof-checking programs become really valuable with proofs which are too long to be easily
digestible by humans. A triumph of computerized proof checking came in 2004, when
Georges Gonthier at Microsoft Research and Benjamin Werner of INRIA used a
system called Coq to verify the four colour theorem. At the time of writing, a major
enterprise in this area is the ongoing Flyspeck project, aiming towards a fully
formalized and verified proof of Kepler’s conjecture.
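To give a flavour of what a proof assistant does, here is a minimal sketch in Lean (a different system from Coq, used here purely for illustration): the statement and its proof are both written as code, and the checker accepts the file only if every inference is logically valid.

```lean
-- A tiny machine-checked proof: addition of natural numbers is commutative.
-- The proof simply appeals to a lemma from Lean's library; the system
-- verifies that the lemma really does establish the stated theorem.
theorem add_comm_example (m n : Nat) : m + n = n + m :=
  Nat.add_comm m n
```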
Closely related to theorem proving programs are systems for software creation. These
are programs which design algorithms to solve particular problems. Again, they can
outperform humans in certain cases, and intelligent systems are today employed by
NASA as well as several commercial technology companies.
METAMATHEMATICS
PHILOSOPHIES OF MATHEMATICS
The frivolous theorem of arithmetic
Almost all natural numbers are very, very, very large.
Coined in 1990 by Peter Steinbach, this joke has a serious point behind it. Those
numbers which are accessible to humans with our finite minds form a tiny and utterly
unrepresentative sample of the set of natural numbers. This remains true even factoring
in the most powerful distributed computing projects.
The frivolous theorem gives some perspective on what we are looking at, however: one
snowflake on the tip of an infinitely tall iceberg.
Plato’s cave The Greek philosopher Plato held that there are two planes of existence:
the transitory, imperfect, physical world that we inhabit, and the eternal, unchanging
world of forms. In his work The Republic, Plato wrote an analogy to illustrate how the
two realms are related. He imagined a group of prisoners who spent their lives chained
up in a cave, able only to stare at the wall in front of them. Behind the prisoners is a
fire, which illuminates the wall. Various objects from the outside world are brought
into the cave, but the prisoners are never permitted to see them directly, only their
silhouettes on the wall.
According to this analogy, these shadows represent the physical realm. They are all we
can see, and we mistakenly accept them as the ultimate reality. According to Plato,
however, they are only fleeting and imperfect representations of a truer reality, that of
forms.
Platonism Plato had more than mathematics in mind with his theory of forms, but it
does have a particular resonance here. Philosophers of mathematics are particularly
concerned with the existence of entities such as numbers. It is not hard to conceive of
seven bananas and seven books as being two imperfect shadows of the true number
seven. Plato also believed that our only access to the world of forms comes through the
application of reason. This too chimes with mathematical methodology.
Whatever the ultimate validity of Platonism as a philosophy, it is fair to say that many
mathematicians subscribe to it, at least as a working assumption. It offers a
reassuringly unequivocal answer to the question of the existence of mathematical
entities, and one which agrees with the experiences of those who spend their days in
the company of abstract structures. As such, Platonism is perhaps the default position
against which other philosophies of mathematics are judged.
Mathematics as logic
The fundamental thesis … that mathematics and logic are identical, is one which I
have never since seen any reason to modify.— Bertrand Russell
Logicism can be interpreted more generally as the view that mathematics is reducible
to logic. However, unlike Frege’s original logicism, ‘logic’ now takes on a broader
meaning and is allowed to posit axioms other than analytic truths.
This principle acted as a guiding light to many mathematicians over the 20th century,
with Russell and Whitehead’s Principia Mathematica being an early triumph.
It would be difficult today to reject this neo-logicism completely, given the successes it
has brought. Many objects and methods of pure mathematics can certainly be
conceived as logical in nature. What is more, this realization has hugely enriched our
interactions with them, through the technical branches of mathematical logic such as
computability theory, proof theory and model theory. On the other hand, this neo-
logicist principle does not address the fundamental problems of mathematical
philosophy, such as the ontological status of mathematical objects; rather, it relocates
them to the philosophy of logic.
Formalism In its most basic terms, formalism is hard to reject, given the successful and rigorous
axiomatization of so much mathematics. Pure mathematicians do indeed spend their
time manipulating symbols in accordance with certain rules. The philosophical
question is whether this is all they are doing, or whether there is some deeper meaning
to their work. It seems that some other consideration is necessary, as formalism alone
cannot answer a key question: which rules should we choose? This question became
acute when Kurt Gödel proved his incompleteness theorems, with their implication that
no available axiomatization of the natural numbers is adequate to answer every
question.
Empiricism Rather than trying to reduce everything to the analytic truths of pure logic,
or a symbol-manipulating formal game, an empirical approach to the natural numbers
observes that they exist in our world as properties of physical objects. This is, after all,
how every child begins mathematics: if he has two toys and is given two more toys,
then he has four toys in total.
How can we investigate the behaviour of these natural numbers? The empiricist school
says that we should treat this in the same way as we treat other scientific matters: we
investigate, experiment, make hypotheses, test them against the evidence and draw
conclusions.
Empiricism is set against formalism, and received a boost from the works of Gödel and
Turing, which showed that the truth of the natural numbers lies beyond anything
accessible by a formal system. The computability theorist Gregory Chaitin is an
outspoken empiricist, who cites the randomness at the heart of mathematics as
evidence that formal systems are forever doomed to fail, and that a more direct line of
enquiry is needed. Axioms (and of course proof) still play a role, but they are closer in
spirit to the standard model of particle physics. They represent our best estimate, for
now. We must expect them to change and grow, as we inch towards the truth.
Proof by contradiction offers a non-constructive alternative to explicitly constructing an object, by taking advantage of
the law of the excluded middle. First assume that an object of the type you want does
not exist, and then deduce a contradiction from this assumption. This is the main way
that non-constructive results arise in mathematics. For this reason constructivism is
closely bound up with intuitionistic logic (of which, somewhat ironically, L.E.J.
Brouwer was a founder). The ultimate example of non-constructive mathematics is the
axiom of choice, which is why some people have reservations about it.
Finitism
God made the integers, all else is the work of man.— Leopold Kronecker
Kronecker was aghast at the turn that mathematics was taking in the late 19th century,
most especially by the work of his own student Georg Cantor (see Cantor’s theorem).
Kronecker’s quote became a motto for finitism, a strict take on the constructivist
school, which rejects the reality of all infinite quantities. Finitists hold that infinity is
only ever accessible as a potential. Certainly the set of natural numbers is potentially
infinite. But we can never access the whole collection, only its finite subcollections.
These finite sets are expressed in the physical reality of our universe, in a way that
Cantor’s infinite sets are not.
Ultrafinitism Some thinkers push finitist philosophy further. If the reality of infinite
sets is cast into doubt because they are not part of our physical universe or accessible
to our human minds, then the same should go for enormous finite numbers such as
Harvey Friedman’s TREE(3). Perhaps the most outspoken proponent of ultrafinitism is
Alexander Yessenin-Volpin, who is also well known as a poet, moral philosopher and
human rights activist, and who spent many years as a political prisoner in the Soviet
Union. For Yessenin-Volpin, even numbers such as 2¹⁰⁰ are out of reach of the
human mind, and therefore of doubtful validity, let alone monsters such as Friedman’s.
The question is: if 2 is to be accepted, but 2¹⁰⁰ is not, where is the line drawn?
Harvey Friedman tells the story of when the two men, whose ideologies could hardly
be more different, met:
‘I raised just this objection with the (extreme) ultrafinitist Yessenin-Volpin during a
lecture of his. He asked me to be more specific. I then proceeded to start with 2¹ and
asked him whether this is ‘real’ or something to that effect. He virtually immediately
said yes. Then I asked about 2², and he again said yes, but with a perceptible delay. Then 2³, and yes, but with
more delay. This continued for a couple of more times, till it was obvious how he was
handling this objection. Sure, he was prepared to always answer yes, but he was going
to take 2¹⁰⁰ times as long to answer yes to 2¹⁰⁰ than he would to answering 2¹.
There is no way that I could get very far with this.’
PROBABILITY AND STATISTICS
… of probability. Here simple experiments such as rolling dice and flipping coins can provide a great deal of insight into the mathematics of chance.
Every business and government in the world employs mathematics daily, in the form
of statistics. This provides the tools for analysing any form of numerical data.
It is no surprise that probability and statistics are intimately related. Statistical data
from an experiment will provide clues about its underlying probability distribution.
Many powerful and elegant results can be proved about such
distributions, such as the law of large numbers and the central limit theorem. It is
striking that such mathematical calculations often violently disagree with human
intuition, famous examples being the Monty Hall problem and the prosecutor’s fallacy.
(Some have even suggested that this trait may be evolutionarily ingrained.) To combat
this tendency towards irrationality, people in many walks of life apply
techniques of Bayesian inference to enhance their estimation of risk.
Mean Suppose we have been out in the field and collected a set of data:
3, 3, 4, 3, 4, 5, 3, 9, 4
Call this set A. It could be the number of leaves on dandelions in a garden, or the
number of residents in houses on a street. Whatever it represents, we might wish to
calculate its average. The mean, median, mode and mid-range are different
formulations of the average. Typically they do not produce the same result.
The mean is the most well-known of the averages. To calculate it, we add up all the
numbers (3 + 3 + 4 + 3 + 4 + 5 + 3 + 9 + 4 = 38), and divide by how many of them there are (in this case
9). So in this case, the mean is 38/9 ≈ 4.22.
The mean can have unexpected consequences. For instance, the mean number of arms
of people worldwide is not 2, but around 1.999. (Most people have two arms, some
have none or one, and very few have three or more.) The overwhelming majority of
people, therefore, have an above average number of arms, when the mean is used.
Using the median or mode, this will no longer be true.
Median To find the median, we first arrange the data in ascending order: 3, 3, 3, 3, 4, 4, 4, 5, 9. The median is the middle value; here there are nine data points, so it is the fifth one, namely 4.
A complication arises when there are an even number of data points, in which case the
median is taken as the mean of the central two. For example, suppose we want to
calculate the median of the data set: 9, 10, 12, 14. There is no middle number here, but
the middle two are 10 and 12. So we take the mean of these: (10 + 12)/2 = 11.
Mode and mid-range The mode is the value which occurs most often. In the case of the
set A above, this is 3. Some sets of data do not have a meaningful mode: 1, 2, 3, 4, 5,
for example, or 2, 2, 50, 1001, 1001.
The crudest form of average is the mid-range: the mean of the highest and lowest
points. In the case of A above it is (3 + 9)/2 = 6.
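A quick way to check these figures is with a few lines of Python (the language and library are my choice of illustration, not the book’s):

```python
from statistics import mean, median, mode

A = [3, 3, 4, 3, 4, 5, 3, 9, 4]

print(mean(A))                 # 4.222...  (the mean, 38/9)
print(median(A))               # 4         (the median)
print(mode(A))                 # 3         (the mode)
print((min(A) + max(A)) / 2)   # 6.0       (the mid-range)
```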
Frequency tables The data set A discussed above was easy to handle, since it just
contained nine data points. In many practical applications the data set will be much
larger.
The underlying mathematics for calculating the mean, median and mode remains the
same, but the manner of presenting it is slightly different.
Suppose we have the data for the number of people living in each house or apartment
in a chosen town. We can set the results up in a frequency table, recording against each number of occupants (n) the frequency (f), that is, the number of properties with that many occupants:
n: 0, 1, 2, 3, …
f: 292, 5745, 8291, 4703, …
This means that there are 292 unoccupied properties, 5745 single-occupier properties,
and so on, making a total of 22,631 properties in the town. The easiest average to
identify is the mode, which can be read straight from the table: it is 2, as this is the
number with the highest frequency.
Mean from frequency tables To calculate the mean in the above frequency table, we
need to add up the total number of people in all the houses, and divide by the number
of houses. The frequency column tallies the number of houses, making a total of
22,631. But how can we find the total number of people?
In the unoccupied houses there are a total of 0 people (obviously). There are 5745
single-occupier properties, which contribute 5745 people to the total. In the two-person
properties, there are a total of 8291 × 2 = 16,582 people, and in the three-person properties,
a total of 4703 × 3 = 14,109 people.
So to calculate the total number of people we need to multiply together the first two
columns of the table. It is convenient to abbreviate the first as n, the frequency as f.
Then we can add a new column to the table, for n × f; its first few entries are 0, 5745, 16,582 and 14,109.
The third column represents the total number of people in each class of house. Adding
up this column we find the total number of people in the town: 52,859. So we can
finally calculate the mean: 52,859/22,631 ≈ 2.34.
Using the summation sign Σ, we can write the mean in a new way, as (Σ n × f)/(Σ f).
Cumulative frequency The median in the above frequency table will be the middle
property in the list, that is the 11,316th. So we need to work out which class this fits
into. A convenient method is to add a third column which counts the cumulative
frequency.
cumulative frequency: 292, 6037, 14,328, …
The first entry in the cumulative frequency column is 292, the same as in the plain
frequency column. The next entry is different. While 5745 is the number of single
occupier residences, 6037 is the number of residences with either 0 or 1 people in
them. Similarly the next entry 14,328 is the number of residences with 0, 1, or 2 people
in them.
Cumulative frequency is often useful. In particular it makes the median easy to find. In
this case, we are looking for the 11,316th house. Since the class of two-person
residences covers the 6038th house till the 14,328th, the 11,316th must be in this class,
therefore the median is 2.
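A sketch of the same calculations in Python, applied to a small hypothetical frequency table of my own (the town’s full table is not reproduced here):

```python
# Mean and median from a frequency table: n (occupants) -> f (frequency).
freq = {0: 2, 1: 5, 2: 8, 3: 4, 4: 1}   # hypothetical example data

total_houses = sum(freq.values())
total_people = sum(n * f for n, f in freq.items())
mean = total_people / total_houses

# Median: walk through the cumulative frequencies until we pass the midpoint.
midpoint = (total_houses + 1) / 2
cumulative = 0
for n in sorted(freq):
    cumulative += freq[n]
    if cumulative >= midpoint:
        median = n
        break

print(mean, median)   # 1.85 and 2 for this hypothetical table
```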
Interquartile range The different averages of a set of data are ways to judge its centre.
Another important aspect is its spread. Again there are different ways to formalize this.
The range is the full distance between the maximum and minimum data points. This is
not a particularly useful measure as it is dictated completely by outliers, points that lie
far outside the general trend. Another simple measure is the interquartile range.
The median is found by putting all the data in ascending order, and picking the point
exactly half way between the minimum and maximum. We may also look at the points one quarter and three quarters of the
way along; these are the quartiles. The distance between them is the
interquartile range. For example, if our data set is 1, 1, 3, 5, 7, 9, 10, 15, 18, 20, 50,
then the median is 9, and the quartiles are 3 and 18, making the interquartile range 18 − 3 = 15. The full range is 50 − 1 = 49. For larger data sets we can read the quartiles off the
cumulative frequency table:
The first quartile will be the 5658th house, which lies in the class of single-occupier
properties. The third quartile is the 16,974th house, which lies in the class of three-
person residences. So the interquartile range is 3 − 1 = 2. The full range is 6 − 0 = 6.
Sample variance Like the interquartile range, the sample variance is a measure of the
spread of a set of data. It is the most natural one to consider when we take the mean as
the centre. Suppose our set X of data contains n data points, and has mean μ. We can
measure the distance of any data point x from the mean as x − μ. We then square this to
make it positive, giving (x − μ)².
If we do this for each value of x and then take the mean of the results, we get the
sample variance of X, Var X. That is:
Var X = (1/n) Σ (x − μ)²
For the data set A above, the mean is μ ≈ 4.222. Take the deviation x − μ of each data point from this mean. Now square each of these, and take the mean of the squares: the result is the sample variance, 3.284 to three decimal places, and its square root, 1.812, is the standard deviation. (See also expectation and variance of a probability
distribution.)
Moments The above procedure for calculating the variance works perfectly well, but
there is a slightly quicker method. It turns out that the variance is equal to the mean of
the squares minus the square of the mean. That is: Var X = (Σ x²)/n − μ².
Starting with the data set 3, 3, 3, 3, 4, 4, 4, 5, 9, we calculate the mean of the squares: 190/9 ≈ 21.111. Subtracting the square of the mean, (38/9)² ≈ 17.827, leaves the variance, 3.284, as before.
The mean is also known as the first moment, and the mean of the squares, Σ x²/n, as the second moment.
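The shortcut is easy to verify numerically; a quick sketch in Python (my own illustration):

```python
A = [3, 3, 4, 3, 4, 5, 3, 9, 4]
n = len(A)
mu = sum(A) / n

# Route 1: mean of the squared deviations from the mean.
var1 = sum((x - mu) ** 2 for x in A) / n

# Route 2 (the 'moments' shortcut): mean of the squares minus square of the mean.
var2 = sum(x ** 2 for x in A) / n - mu ** 2

print(round(var1, 3), round(var2, 3))   # 3.284 3.284
```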
Correlation Are smokers more likely to get cancer? Do people in rainy countries have
more children? Do heavier beetles live longer? There are many occasions in which
scientists want to test for a relationship between two phenomena. The statistical tool
which quantifies this is correlation.
Suppose we have data on the weight and longevity of some beetles, and want to test if
the two are related. We could start by plotting the data on a graph of weight against
longevity. If the resulting points seem randomly scattered, then there is no
correlation between the two factors. On the other hand, if they lie very close to a line,
then the two are strongly correlated. Between these two situations is weak correlation.
Suppose that there is some degree of correlation. If an increase in weight tends to come
with an increase in longevity, then they are positively correlated. If an increase in
weight tends to come with a decrease in longevity, they are negatively correlated.
(Figure: a scatter plot illustrating strong positive correlation.)
There are various methods by which statisticians can assign a number, called a
correlation coefficient to this situation. (A common one is Spearman’s rank correlation
coefficient.) In all cases, the output is a number between −1 and 1, where a result near −1
means strong negative correlation, a result near 1 means strong positive correlation,
and a result of 0 means no correlation.
If our analysis does reveal positive correlation between the weight and longevity of
beetles, this does not tell us whether healthier, longer-lived beetles tend to be heavier,
or whether extra weight helps protect beetles from injury, or whether there is a third
factor we have missed such as female beetles being both heavier and longer-lived than
males. The warning is correlation does not imply causation.
Spearman’s rank correlation There are several approaches to testing for correlation.
Charles Spearman’s rank correlation coefficient has the advantage that it does not
assume linear correlation to work.
Suppose that scientists discover a new type of luminous plant. They want to investigate
whether the height of the plants is correlated with their luminosity. To illustrate the
method, I will take a small sample (each row gives a plant’s label, its height and its luminosity):
B 4.5 0.37
C 5.0 0.36
D 5.9 0.31
E 7.3 0.45
F 6.2 0.38
The first thing to do is rank the plants by height and by luminosity (1 being the
tallest/brightest):
Now we can forget about the actual data and just work with the two ranks. The next
thing to do is to calculate the difference (d) in ranks for each plant, and square it (d²):
Next we sum up the d² column; in this case Σd² = 10. Finally, we insert this into
Spearman’s formula for the coefficient:
r = 1 − 6Σd² / (n(n² − 1))
where n is the number of plants, in this case 6. (Notice that if correlation is perfect then
all the ranks are the same, every d is 0, and the coefficient will be 1, as we would hope.) In this
case, we get a rank correlation coefficient of 1 − 60/210 ≈ 0.71 to two decimal places, which is moderate
positive correlation.
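The whole procedure can be automated; here is a sketch in Python. The heights and luminosities below are illustrative stand-ins of my own, since the full table from the text is not reproduced here (and ties in the data are ignored for simplicity):

```python
def ranks(values):
    # Rank 1 goes to the largest value (tallest/brightest), as in the text.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(xs)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

heights = [6.8, 4.5, 5.0, 5.9, 7.3, 6.2]          # plants A-F (A is invented)
luminosities = [0.40, 0.37, 0.36, 0.31, 0.45, 0.38]

print(round(spearman(heights, luminosities), 2))   # 0.77 for this invented data
```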
The first digit phenomenon Suppose we take a set of data from the real world, the
account books of a company perhaps, or the heights of a mountain range. Then we
tally up the number of times each of the digits 1 to 9 occur as the first digit (ignoring
any leading zeros). Most people would expect that the nine digits should be equally
common, each occurring as the leading digit around 1/9th of the time. Remarkably, this usually fails, as two scholars, Simon
Newcomb and Frank Benford, both noticed. Benford investigated further, counting the
leading digits in large quantities of data from baseball statistics to river basins. He
found that the digit 1 appears as a leading digit around 30% of the time, 2 around 17%
of the time, decreasing to 9 which occurs around 5% of the time.
Benford’s law Benford’s law provides a formula for the first digit phenomenon. It
states that the proportion of the time that n occurs as a leading digit is around log₁₀(1 + 1/n). We have to be careful with Benford’s law. Truly random numbers (that is to say
uniformly distributed data) will not usually satisfy it (so it is of no help in selecting
lottery numbers). Similarly data within too narrow a range will not obey it (not many
US presidents have an age beginning with 1). Between these two extremes, however,
are innumerable social and naturally occurring situations where it does apply. (Indeed,
Benford’s law has been of great value in fraud investigations in the USA.)
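The predicted proportions are easy to tabulate, and data such as powers of 2 can be checked against them; a quick sketch in Python (my own illustration):

```python
import math
from collections import Counter

# Benford's law: predicted share of each leading digit d is log10(1 + 1/d).
for d in range(1, 10):
    print(d, round(math.log10(1 + 1/d) * 100, 1), '%')
# 1: 30.1%, 2: 17.6%, 3: 12.5%, ... , 9: 4.6%

# Leading digits of the first 1000 powers of 2 follow the law closely.
counts = Counter(str(2 ** k)[0] for k in range(1, 1001))
print(sorted(counts.items()))
```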
Hill’s theorem Benford’s law has been justified in various ways since its discovery.
Consider just the numbers 1 to 9. In this set, of course, the leading digits are perfectly
uniform. Now create a new set by multiplying everything by two: 2, 4, 6, 8, 10, 12, 14,
16, 18. Now over 50% of the results have leading digit 1. This illustrates the instability
of the uniform distribution on leading digits. Many arithmetical procedures will skew it
towards Benford’s distribution.
(Chart: relative frequency of each leading digit under Benford’s law.)
This is the idea, but it was only in 1998 that Theodore Hill provided the first rigorous
explanation. The key observation is that Benford’s law is not base dependent. The
statement above was for data presented in base 10, but a similar thing applies in any
other base b, with the average frequency of the leading digit n given by log_b(1 + 1/n). Hill was
able to show that Benford’s is the only probability distribution on leading digits which
satisfies this property of base-invariance.
PROBABILITY
For example, a fair coin has a probability of 1/2 of coming out heads, and the probability of rolling a number between 1 and 5 on a
fair die is 5/6.
This reading of probability works well for many purposes, when there are finitely
many outcomes. But it is too simplistic for some applications. It is better to interpret P
(X)=0 as meaning that X is infinitely unlikely. Suppose I (somehow) pick, completely
at random, a real number between 0 and 10. What is the probability that I will select π
(not to 10 or 100 decimal places, but exactly)? Since this is one of infinitely many
possibilities, the probability must be 0. This situation is covered by continuous
probability distributions.
Successes and outcomes When you roll a fair die, there are six possible outcomes, all
equally likely. Suppose we are interested in the probability of getting a 6. That means
that exactly one of the six possible outcomes is classed as a success, giving a
probability of 1/6.
Suppose instead we roll two dice, A and B, and ask for the probability that the total score is 7. Now there are 6 × 6 = 36 equally likely outcomes, of which exactly six count as successes:
A: 1 2 3 4 5 6
B: 6 5 4 3 2 1
So the probability is 6/36 = 1/6.
This simple idea can be the basis of some highly involved problems, such as the
birthday problem. Calculating such probabilities often involves subtle combinatorics to
count outcomes.
Adding probabilities Suppose we know the probability of two events X and Y. What is
the probability of the new event ‘X or Y’? The rule of thumb is that ‘or’ means ‘add
the probabilities’.
For example, the probability of rolling a 4 on a fair die is 1/6, and the same probability
holds for rolling a 5. So the probability of rolling a 4 or 5 is 1/6 + 1/6 = 1/3. Extreme caution is
required here! This can easily lead to errors. Suppose I flip two coins (A and B). What
is the probability that I get a head on A or a head on B? The probability that I get a
head on A is 1/2, and the same goes for B. Adding these two, the answer should be 1/2 + 1/2 = 1. This
would imply that it is a certainty. Obviously, this is nonsense: I may get two tails. In
general, the rule only holds when we have two mutually exclusive events.
Multiplying probabilities Suppose I know the probability of two events X and Y. What
is the probability of the new event ‘X and Y’? Here, the rule of thumb is that ‘and’
means ‘multiply the probabilities’. For example, if I flip two coins, the probability that
they will both come up heads is 1/2 × 1/2 = 1/4.
As with adding probabilities, misapplying this rule easily produces nonsense. Suppose
I flip one fair coin, once. The probability that I will get a head is 1/2, and similarly for a tail. So the probability that I get a head and a tail (on the same flip) should be 1/2 × 1/2 = 1/4. This is obviously
absurd, since that scenario is impossible, with a probability of 0.
The rule of thumb is valuable, nevertheless. To know when it applies and when it does
not, we need to understand independent events.
Mutually exclusive events Two events X and Y are mutually exclusive if they cannot
both occur: if X happens then Y does not, and vice versa. If I roll a die, then the events
of my getting a 2 and my getting a 5 are mutually exclusive.
If X and Y are mutually exclusive then to find the probability of ‘X or Y’, we add the
probabilities:
P(X or Y) = P(X) + P(Y)
For example, rolling a 2 and rolling a 5 on a fair die each have probability 1/6. So the probability of rolling either a 2 or a 5 is 1/6 + 1/6 = 1/3. However, if I roll two dice (A and B), scoring 2 on A and 5 on B are not mutually
exclusive events. They can both occur. So in this case we cannot just add the
probabilities.
Independent events Two events X and Y are independent if they do not affect each
other. Whether or not X occurs has no impact on the probability of Y, and vice versa.
A classic example: if I roll a die and flip a coin, then rolling a 6 and flipping
a head are independent events.
If X and Y are independent, then to find the probability of ‘X and Y’ we multiply the
probabilities:
P(X and Y) = P(X) × P(Y)
So the probability of rolling a 6 and flipping a head, in the example above, is 1/6 × 1/2 = 1/12.
It is almost true to say that two events cannot be both independent and mutually
exclusive. There is one exception, namely if one of the events is impossible. The
events of flipping a head on a coin and rolling a 7 on an ordinary die are certainly
independent. They are also mutually exclusive, in the trivial sense that they cannot both
happen, because rolling a 7 itself can never happen.
The birthday problem How many people do there need to be in a room so that there is
at least a 50% chance that two will share the same birthday? To answer this, it is
convenient to turn the question on its head, and, for different numbers of people,
calculate the probability that everyone has a different birthday. When this probability
first drops below 50%, we will have the answer to the problem.
The model used here comes with some built-in assumptions which should be noted.
Most obviously it ignores leap years; more subtly it assumes that every day of the year
is as common a birthday as every other, although this is not quite true in practice.
The birthday theorem To solve the birthday problem, suppose first that there are just
two people in the room. Then the total number of possible arrangements of birthdays is
365 × 365. On the other hand, if they are to have different birthdays, then the first person
may have his on any day of the year (365 possibilities) and the second may have hers
on any day except the first’s birthday (364 possibilities). So the number of possible
pairs of distinct birthdays is 365 × 364. The probability of one occurring is therefore
(365 × 364)/(365 × 365) = 364/365.
The same approach generalizes to when there are n people in the room. The number of
total possible arrangements of birthdays is 365ⁿ. If all the people are to have distinct
birthdays, we reason as before: the first person may have her birthday on any day (365
possibilities).
The next may have his on any day except the first’s (364 possibilities); the third must
avoid the birthdays of the first two (363 possibilities), and so on, until the nth person
who must avoid the first (n − 1) birthdays (366 − n possibilities). So the probability of this
occurring is
(365 × 364 × 363 × … × (366 − n)) / 365ⁿ, which equals 365! / ((365 − n)! × 365ⁿ)
So, what is the first value of n for which this is below 0.5?
A little experimentation with different values of n provides the answer: 23, at which
point the probability is around 0.493.
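That experimentation takes only a few lines of Python (a sketch of my own, not from the text):

```python
def all_distinct_prob(n):
    # Probability that n people all have different birthdays (365-day year).
    p = 1.0
    for k in range(n):
        p *= (365 - k) / 365
    return p

n = 1
while all_distinct_prob(n) >= 0.5:
    n += 1
print(n, round(all_distinct_prob(n), 3))   # 23 0.493
```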
Conditional probability In a particular city, 48% of houses have broadband internet
installed, and 6% of houses have both cable television and broadband internet. The
question is: what is the probability that a particular house has cable TV, given that it
has broadband?
This is the conditional probability of X given Y, written P(X | Y) and given by the formula P(X | Y) = P(X & Y) / P(Y).
In the above example, we take X to be the event that the house has cable TV, and Y to
be the event that it has broadband. Notice that we do not need to know P(X) to
calculate the answer: P(X | Y) = 0.06/0.48 = 0.125, so the probability is 12.5%.
Bayes’ theorem In 1764, an important paper by the Reverend Thomas Bayes was
published posthumously. In it he gives a compelling account of conditional
probabilities. The basis was Bayes’ theorem, which states that, for any events X and Y:
P(X | Y) = P(Y | X) × P(X) / P(Y)
In a sense, this formula is not deep. It follows directly from the definition of
conditional probability: P(X | Y) × P(Y) = P(X & Y) = P(Y | X) × P(X).
Splitting an event Suppose we pick a person at random off the street, and want to know
the probability that this person wears glasses? In our city, the only statistics available
say that 65% of females and 40% of males wear glasses. We also know that 51% of the
population are female and 49% male. How can we combine these to get the probability
we want?
Call Y the event that our selected person is female. The event we are really interested
in is X, that our selected person wears glasses. We have split this into two smaller
events: X & Y and X & not Y. These two events are mutually exclusive, so:
P(X) = P(X & Y) + P(X & not Y) = P(X | Y) × P(Y) + P(X | not Y) × P(not Y)
At this point we can use the data we have: P(X | Y) = 0.65, P(X | not Y) = 0.4, P(Y) =
0.51 and P(not Y) = 0.49. Putting this together: P(X) = 0.65 × 0.51 + 0.4 × 0.49 = 0.5275.
False positives A test for a certain disease has the following accuracy: if someone has
the disease, the test will produce a positive result 99% of the time, and give a false
negative 1% of the time. If someone does not have the disease, the test gives a negative
result 95% of the time, and gives a false positive 5% of the time. The disease itself is
quite rare, occurring in just 0.03% of the population. A person, Harold, is picked at
random from the population and tested. He tests positive. The critical question is: what
is the probability that he has the disease?
We need to translate this into mathematics: let X be the event that Harold has the
disease. Before we factor in the test data, P(X) = 0.0003. Let Y be the event that
Harold tests positive. We can split this event to write P(Y) = P(Y | X) × P(X) + P(Y | not X) × P(not X), which comes out as 0.99 × 0.0003 + 0.05 × 0.9997 = 0.050282.
We are really interested in P (X | Y), the probability that he has the disease, given that
he has tested positive, and can work this out using Bayes’ theorem:
P(X | Y) = P(Y | X) × P(X) / P(Y) = 0.000297 / 0.050282 ≈ 0.00591
So the probability that Harold has the disease, given that he has tested positive for it,
is a little under 0.6%. The explanation for this surprising result is that the true positives
form a high proportion of the very small number of disease sufferers. These are greatly
outnumbered by the false positives, a fairly small proportion of a hugely larger number
of non-sufferers. So, despite the seeming accuracy of the test, when randomly chosen
people are tested, a large majority of the positive results will be false.
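The whole calculation fits comfortably into a few lines of Python (a sketch of the arithmetic above, using my own variable names):

```python
p_disease = 0.0003          # prior probability of having the disease
p_pos_given_disease = 0.99  # true positive rate
p_pos_given_healthy = 0.05  # false positive rate

# Total probability of testing positive.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_pos, 6), round(p_disease_given_pos, 5))   # 0.050282 0.00591
```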
The prosecutor’s fallacy Suppose that a suspect is on trial for burglary. At the scene of the crime, police found a strand of the burglar’s hair. Forensic tests
showed that it matched the suspect’s own hair. The forensic scientist testified that the
chance of a random person producing such a match is 1/2000. The prosecutor’s fallacy is to conclude that the probability of the suspect being
guilty must therefore be 1999/2000.
This is certainly incorrect. In a city of 6 million people, the number of people with
matching hair samples will be (1/2000) × 6,000,000 = 3000. On the basis of this evidence alone, the probability of the
suspect being guilty is a mere 1/3000.
The term ‘prosecutor’s fallacy’ was coined by William Thompson and Edward
Schumann, in their 1987 article ‘Interpretation of Statistical Evidence in Criminal
Trials’. They documented how readily people made this mistake, including at least one
professional prosecuting attorney.
The defence attorney’s fallacy Thompson and
Schumann also considered an opposite mistake to the prosecutor’s fallacy, which they
dubbed the defence attorney’s fallacy.
In the above example, the defence attorney might argue that the hair-match evidence is
worthless, since it only increases the probability of the defendant’s guilt by a tiny
amount, to 1/3000. If the hair is the only evidence against the suspect, then the pool of potential
suspects before the forensic evidence is taken into account is the entire population of
the city, 6,000,000, which the new evidence reduces by a factor of 3000, to 2000.
However, one would expect that this is not the only evidence, in which case the initial
pool of potential suspects will be much smaller. If it is 4000, say, then the forensic
evidence may reduce this by a factor of 2000, to 2, increasing the probability of guilt from 1/4000 to 1/2. This
is valuable evidence.
The phenomenon of false positives and the prosecutor’s fallacy may seem surprising.
Mathematically speaking, the two situations are similar; the ultimate fallacy in both
cases is that of confusing P(X | Y) with P(Y | X). When one of these is very high, people commonly assume that the other must be too.
In the example above of the medical test, P(positive result | disease) = 0.99, while P(disease | positive result) ≈ 0.0059. These examples show how wrong this assumption can be.
This fallacy is widespread (even in doctors’ surgeries and law courts). Some have
argued that it is not merely a common mathematical mistake, but an ingrained
cognitive bias within human nature. Either way, an appreciation of this issue is
essential for making sense of statistical data in the real world.
Frequentism What is the ontological status of probability? That is, to what extent does
it really exist in the world? There are two broad schools of thought: frequentism and
Bayesianism.
For a frequentist, randomness is taken as an intrinsic part of reality, which probability
quantifies. To say that event A has a probability of 1/2 means that if the experiment were repeated many times then A would occur 1/2 of the time. In other words, the probability of A is a measure of the frequency with
which A happens, given the initial conditions. (This would only be approximate
after finitely many repetitions, but would be exact in the limit.)
As this shows, the principle does not apply very easily to one-off events, but is best
suited to repetitive occurrences.
Bayesianism In contrast to frequentists, for a Bayesian, probability does not exist in the
external world. It is purely a way for humans to quantify our degree of certainty on the
basis of incomplete information. In other words, probability is a subjective concept.
People will make different assessments of probability, based on the different data they
have available.
When we say that a coin flip has a probability of 1/2 of resulting in a head, this is because we know little about it. More data about the
weighting of the coin, its initial position, and the technique of the flipper would allow
us to modify our probability. If we knew these things in great detail, we would be able
to predict the outcome with some certainty. (The mathematician John Conway is
reputed to have mastered the art of flipping coins to order.)
Aumann’s agreement theorem There are three parts to Bayesian inference: a prior
probability distribution, some new data, and a posterior probability distribution
produced from these.
In 1976, Robert Aumann considered two Bayesian reasoners with identical prior
probabilities for some event X. The two people are then provided with different pieces
of data. Of course, conditional probability is likely to produce two different posterior
probabilities for X. Aumann’s question was what would happen if they then share their
posteriors (technically, elevate them to common knowledge), without sharing their
private data. His agreement theorem gives a surprising answer: the two posterior probabilities must then be equal. Rational Bayesians with a common prior cannot agree to disagree.
The Monty Hall problem Monty Hall is the former presenter of the TV Quiz Let’s
Make A Deal, in which contestants had to choose between three doors, concealing
different prizes. This scenario was the inspiration for the most infamous of all
probability puzzles, the Monty Hall problem, concocted by Steve Selvin in 1975.
There are three doors, A, B and C. Behind one of them is a brand new sports car.
Behind the other two are wooden spoons. The contestant chooses one door, let’s say A.
In doing so he has a 1/3 probability of hitting the jackpot. Next, Monty Hall, who knows where the car is,
says ‘I’m not going to tell you what’s behind door A, not yet. But I can reveal that
behind door B is a wooden spoon. Now will you keep with door A, or swap to C?’
The natural assumption is that the odds are now 50/50 between A and C, and swapping
makes no difference. In fact, this is incorrect: C now has a probability of 2/3, but A just 1/3.
To switch or not to switch? To say that the solution to the Monty Hall problem often
comes as a surprise is an understatement. Readers of Parade magazine felt so
passionately that when Marilyn Vos Savant discussed the problem in 1990 she was
inundated with complaints, including many from professional mathematicians,
protesting her public display of innumeracy.
To see why she was right, it may help to increase the number of doors, say to 100.
Suppose the contestant chooses door 54, with a 1% probability of finding the car.
Monty then reveals that doors 1–53, 55–86, and 88–100 all contain wooden spoons.
Should the contestant swap to 87, or stick with 54? The key point is that the probability
that door 54 contains the car remains 1%, as Monty was careful not to reveal any
information which affects this. The remaining 99%, instead of being dispersed
around all the other doors, becomes concentrated at door 87. So she should certainly
swap.
The Monty Hall problem hinges on a subtlety. It is critical that Monty knows where the
car is. If he doesn’t, and opens one of the other doors at random (risking revealing the
car but in fact finding a wooden spoon), then the probability has indeed shifted to 1/2. But in the original problem, he opens whichever of the two remaining doors he
knows to contain a wooden spoon, and the contestant’s initial probability of 1/3 is unaffected.
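For the sceptical, a simulation settles the matter quickly; here is a minimal sketch in Python (my own code, not part of the original text):

```python
import random

def play(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)
    choice = random.choice(doors)
    # Monty opens a door that is neither the contestant's choice nor the car.
    opened = random.choice([d for d in doors if d != choice and d != car])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == car

trials = 100_000
print(sum(play(True) for _ in range(trials)) / trials)    # about 0.667
print(sum(play(False) for _ in range(trials)) / trials)   # about 0.333
```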
Count Buffon’s needle If you drop a needle at random onto a piece of lined paper,
what is the probability that it will land crossing one of the lines? That was the question
investigated by Georges Leclerc (le Comte de Buffon) in 1777, who experimented by
chucking sticks over his shoulder onto his tiled floor.
The answer depends on the length (l) of the needle and the distance (d) between the
lines. If l ≤ d then the answer turns out to be 2l/(πd). (The case l > d is slightly more complex.) So if the needle is 1 cm long, and the lines
are 2 cm apart, then the answer comes out very neatly as 1/π, which provides what is
known as a Monte Carlo method for calculating π: perform the experiment as many
times as you like, and then divide the total number of drops by the number of times the
needle lands on a line, to obtain an approximation to π.
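Throwing sticks is laborious; simulating the throws is not. A sketch in Python, using the l = 1, d = 2 set-up above (we cheat slightly by using math.pi to pick a random angle):

```python
import math
import random

def buffon_estimate(drops, l=1.0, d=2.0):
    crossings = 0
    for _ in range(drops):
        centre = random.uniform(0, d / 2)        # distance of needle's centre to nearest line
        angle = random.uniform(0, math.pi / 2)   # angle of needle to the lines
        if centre <= (l / 2) * math.sin(angle):
            crossings += 1
    return drops / crossings   # approximates pi when l = 1 and d = 2

print(buffon_estimate(1_000_000))   # typically between 3.13 and 3.15
```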
In 1777, Comte de Buffon also attempted to apply conditional probability to the study
of philosophy, by trying to quantify the likelihood that the sun will rise tomorrow,
given that it has risen for the last n days.
The law of truly large numbers The following useful principle was first named by Persi
Diaconis and Frederick Mosteller in 1989, although the phenomenon had been known
for a long time: with a large enough sample, any outrageous thing is likely to happen.
Lotteries usually do not have good odds; if I buy a ticket, I have a very low chance
(perhaps one in ten million) of winning. If I do win, I will be astonished that such an
unlikely event has happened. But viewed from the lottery’s perspective, if several
million tickets are sold, there is a good chance that someone will win.
A still more outrageous example is the case of the US woman who won the New Jersey
lottery twice. From the winner’s perspective this is a truly incredible occurrence.
Diaconis and Mosteller framed a broader question: ‘What is the chance that some
person, out of all the millions and millions who buy lottery tickets in the United States,
hits a lottery twice in a lifetime?’ Following Stephen Samuels and George McCabe,
they report the answer: ‘practically a sure thing’.
Coincidence The law of truly large numbers is a useful tool for understanding
coincidence, when highly improbable events take place. Suppose we call an event rare
if it has a probability of less than one in a million. In 1953 Littlewood observed that, in
the USA’s population of 250 million people, hundreds of people should experience
rare events every day. Scaling this up to the entire world, even supremely unlikely
occurrences of one in a billion should be expected to occur daily.
Other sources of apparent coincidence derive from our poor intuition about probability.
The surprising solutions to the birthday problem and the phenomenon of false positives
show how unreliable this can be.
PROBABILITY DISTRIBUTIONS
When we roll a fair die, each of the numbers 1 to 6 comes up with a probability of 1/6.
To refine the picture, we need a little terminology. The sample space is the collection
of all possible outcomes of an experiment. In the case of a roll of the die, this will be
the numbers 1 to 6. A random variable is an assignment of probabilities to these
outcomes. (Technically it is a function, from the sample space to the numbers between
0 and 1.)
There are all sorts of possible functions. The random variable which assigns number 6
probability 1, and the numbers 1 to 5 all probability 0, corresponds to a die which is
certain to show a 6. The random variable which assigns each number a probability of 1/6 corresponds to an ordinary fair die.
Probability distributions There are a large variety of probability distributions, but they
come in two fundamentally different forms. In a discrete distribution, the outcomes are
separated from each other, as in the example of a die, which
can take the value 4 or 5, but not 4½. In a continuous distribution this is not the case (for
take any value between 4 and 5 feet).
Given a probability distribution X, two important pieces of data are where it is centred,
and how widely spread it is. These are measured by two numbers: the mean E (X) and
variance V (X), respectively. The E in E (X) stands for the expectation or expected
value of the distribution, an alternative (and slightly misleading) name for the mean.
In experiments, the mean and sample variance of a set of data should correspond to the
theoretical mean and variance of the underlying distribution. This will happen ever
more closely as larger sets of data are used, a consequence of the law of large numbers.
Expectation and variance Given a discrete random variable X, the mean is defined as
the sum of all the possible outcomes multiplied by their respective probabilities. If X
represents the distribution of the roll of a fair die, the possible outcomes are the
numbers 1 to 6, each with probability 1/6, so the mean is (1 + 2 + 3 + 4 + 5 + 6) × 1/6 = 3.5.
The formal definition of the mean (expectation) is E(X) = Σ x P(X = x), where x ranges
over all possible outcomes, and P(X = x) is the corresponding probability.
For continuous random variables, this becomes E(X) = ∫ x f(x) dx, where f is the
probability density function (see continuous probability distributions).
Suppose that E(X) = μ. Then the variance V(X), measuring the spread of the probability
distribution, is defined as V(X) = E((X − μ)²). An easier way to calculate it is V(X) = E(X²) −
E(X)². For a die roll:
V(X) = (1² + 2² + 3² + 4² + 5² + 6²) × 1/6 − 3.5² = 91/6 − 12.25 ≈ 2.92
The spread is also measured by the standard deviation, which is defined by √V(X). For a die roll this is √2.92 ≈ 1.71.
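The same numbers, checked with a few lines of Python (my own illustration):

```python
outcomes = range(1, 7)
probs = {x: 1/6 for x in outcomes}          # a fair die

mean = sum(x * probs[x] for x in outcomes)                    # E(X)
variance = sum(x**2 * probs[x] for x in outcomes) - mean**2   # E(X^2) - E(X)^2

print(mean, round(variance, 2), round(variance ** 0.5, 2))    # 3.5 2.92 1.71
```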
The Bernoulli distribution may not seem much on its own, but by combining Bernoulli
trials together, more sophisticated distributions can be built. An important example is
the binomial distribution.
(Chart: a Bernoulli trial takes the value 1, ‘success’, with probability p, and the value 0, ‘failure’, with probability 1 − p.)
Binomial trials Suppose we roll 100 fair dice. What is the probability of getting exactly
17 sixes? Since the dice are fair, this problem can be solved by counting successes and
outcomes, as follows. If we specify 17 dice, then the probability that those dice show
sixes is (1/6)¹⁷ (since the 17 rolls are independent). We also require that the other 83 dice do not
show sixes: this probability is (5/6)⁸³. So the probability that our 17 named dice, and only these, roll sixes is (1/6)¹⁷ × (5/6)⁸³.
For each possible choice of 17 dice, we get this probability. To answer the original
question, therefore, we need to multiply this probability by the number of possible
choices of 17 dice from the 100 (since the resulting events are all mutually exclusive).
The number of choices of 17 from 100 is given by the combination C(100, 17), making the final answer C(100, 17) × (1/6)¹⁷ × (5/6)⁸³, which comes out at around 0.105.
More generally, if X ~ B(n, p), then X measures the total number of successes from n
independent Bernoulli trials, each with probability p of success. Then, for each number k between 0 and
n, the probability that X = k is:
P(X = k) = C(n, k) × pᵏ × (1 − p)ⁿ⁻ᵏ
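A short Python check of the dice example (a sketch of my own):

```python
from math import comb

def binomial_pmf(n, k, p):
    # Probability of exactly k successes in n independent trials of probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binomial_pmf(100, 17, 1/6), 3))   # about 0.105
```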
Poisson processes Between 9a.m. and 5p.m., an office telephone receives on average
three calls per hour. In any particular hour it might receive no calls, or 1, 2, 3, 4, 5, 6,
… calls. There is no obvious cutting-off point, but higher numbers of calls become
progressively unlikely. If we assume that the calls come randomly, with no difference
across the day, or between days, then this is an example of a Poisson process.
Poisson distribution We would like a random variable which assigns probabilities to all
the possible outcomes 0, 1, 2, 3, 4, … of a Poisson process. The distribution commonly
used is the Poisson distribution, discovered in 1838 by Siméon-Denis Poisson, via the
law of rare events.
One number is needed to specify a Poisson distribution, usually called the intensity, λ. In
the above example, λ = 3. We write X ~ Po(3) to express that X is a random variable with
this distribution.
Then the probability that the phone line receives 0 calls in a particular hour is P(X = 0) = e⁻³. The probability that it receives one is P(X = 1) = 3e⁻³, and for two calls, it is P(X = 2) = (3²/2!) × e⁻³. In general, the
probability that it receives k calls is P(X = k) = (3ᵏ/k!) × e⁻³.
The expectation and the variance of a Poisson distribution with parameter λ are both equal to λ.
The law of rare events Suppose a factory produces peanut-flavoured ice-cream. The
average number of peanuts in each tub is 18. A natural way to model this situation
would be using the Poisson distribution, X ~ Po(18).
There is another approach, however. Suppose there are 36,000 peanuts in the ice-cream
vat, and the probability that a particular peanut will end up in a particular tub is 0.0005.
Then we could model the scenario as a binomial distribution, X ~ B(36,000, 0.0005).
The first model is certainly more convenient, but the second may be more accurate. We
need not worry too much though. The law of rare events guarantees that these two
models will produce very similar answers. Technically, the law of rare events says that,
if n grows large and p becomes small, and they do this in such a way that the average
number of successes np remains constant, then the distribution B(n, p) becomes
increasingly close to a Poisson distribution, Po(np). Indeed, this is how Siméon-Denis
Poisson discovered his distribution.
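The closeness of the two models is easy to see numerically; a sketch in Python (my own illustration of the peanut example):

```python
from math import comb, exp, factorial

def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

# Peanuts per tub: Binomial(36000, 0.0005) versus Poisson(18).
for k in (10, 18, 25):
    print(k, round(binomial_pmf(36000, k, 0.0005), 5),
             round(poisson_pmf(18, k), 5))   # the two columns agree closely
```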
Continuous probability distributions If I pick someone at random from my city, and measure their height, then distributions
such as the binomial and Poisson distributions cannot apply. The problem is that the
possible heights do not form a discrete list like the whole numbers: 0, 1, 2, 3, …
but a continuous range. Continuous probability distributions apply in situations such as
this. Each comes with a curve, called the probability density function, which represents
the distribution.
For a discrete distribution, we demand that all the probabilities add up to 1. In the
continuous case, we require that the area under the whole curve should be 1.
The simplest continuous distribution is the uniform one. Occupying pride of place at
the heart of modern probability theory is the normal distribution.
In general, if X ~ U(a, b), the probability density function has a constant value of 1/(b − a) between a and b, and 0 elsewhere. In the above example, this gives the graph
a height of 1/4. The archetypal example is the standard uniform distribution, U(0, 1). The general
uniform distribution has expectation (a + b)/2 and variance (b − a)²/12.
The fundamental shape of the bell curve is given by the equation y = e^(−x²). However, this
needs to be rescaled to ensure the area underneath the curve is 1, to move the centre to
μ, and stretch it according to the value of σ². Putting these together, we get the
probability density function
y = (1/(σ√(2π))) × e^(−(x − μ)²/(2σ²))
In the case of the standard normal distribution, N(0, 1), this simplifies slightly, to y = (1/√(2π)) × e^(−x²/2).
The normal distribution is the mother of all probability distributions, in a precise sense
given by the central limit theorem.
This standard version is the only one we really need, since every normal distribution
can be standardized. If X ~ N(μ, σ²), it is standardized by defining Y = (X − μ)/σ, which then has the standard distribution N(0, 1).
Independent, identical random variables
A common thing to do is to take the sample mean of the first n rounds of the
experiment. We do this by defining a new random variable Yₙ = (X₁ + X₂ + … + Xₙ)/n,
where Yₙ represents the mean of the outcomes of the first n experiments (such as the
average score of the first n rolls of the die). The law of large numbers and the central
limit theorem both describe this random variable.
Law of large numbers Roll a fair die 10, 100 or 1000 times, and take the mean score in
each case. What would you expect to find? The law of large numbers predicts that, as
the sample gets ever larger, we should expect the mean of the sample result to get ever
closer to the theoretical mean of 3.5.
Although informal versions of this had been known for many years, Jacob Bernoulli
was the first person to frame it as a rigorous theorem, in 1713. The theorem refers to a
sequence X₁, X₂, X₃, X₄, … of independent, identical random variables, each with
mean μ. Then we define new random variables Yₙ = (X₁ + X₂ + … + Xₙ)/n, as above.
The law of large numbers asserts that, as n grows large, the random variable Yₙ gets
ever closer to the fixed number μ.
This law is refined by the central limit theorem. However, this law has broader
applicability, since the central limit theorem requires an additional assumption about
the variance of the random variables X
i.
Central limit theorem In 1733, Abraham de Moivre used a normal distribution to
model the total number of heads in a long sequence of coin flips. Something seems
wrong with this: coin flips are discrete, not continuous. The binomial distribution, and
not the normal, is the appropriate model.
However, de Moivre had not made an elementary error. Rather this was the first
inkling of a fundamental result in probability theory: the central limit theorem.
It does not matter what the distribution of X is, beyond its having some mean μ and variance σ². It could be
uniform, Poisson, or some as yet undiscovered distribution. The central limit theorem
says that, if this experiment is repeated many times, the average result is approximately
given by a normal distribution.
The central limit theorem tells us that, as n gets bigger and bigger, the distribution of Yₙ is approximately N(μ, σ²/n). Equivalently, we may standardize, setting Zₙ = (Yₙ − μ)/(σ/√n). Then, as n gets larger, the random variables Zₙ get ever closer to the standard
normal distribution N(0, 1).
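A simulation shows the effect: averages of many rolls of a die (a decidedly non-normal, discrete distribution) pile up in a bell shape around 3.5. A sketch in Python (my own illustration):

```python
import random
from collections import Counter

def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Take 10,000 sample means, each of 50 die rolls, and tally them coarsely.
means = [sample_mean(50) for _ in range(10_000)]
histogram = Counter(round(m * 4) / 4 for m in means)   # bin to nearest 0.25

for value in sorted(histogram):
    print(f"{value:5.2f} {'#' * (histogram[value] // 50)}")   # a rough bell curve
```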
Gambler’s fallacy The law of averages may be well known outside of mathematics, but
you will find no theorem of this name in any book on probability theory. Appeals to
this law, when they are valid, are likely to refer to the law of large numbers. When they
are erroneous, they are usually instances of the gambler’s fallacy. Suppose a gambler
has seen black win at roulette six times in a row. He might consider that red is now
‘due’, and therefore more likely to occur next spin.
Of course, a long run of identical outcomes may still be worth considering as evidence
that the experiments are not as they seem (either the probability is not as claimed, or
the events are not really independent).
STOCHASTIC PROCESSES
Stochastic processes Start in the middle of a road, and flip a fair coin.
If it comes up heads walk 1 metre north; if it comes up tails, walk 1 metre south. Then
flip again. Where will you end up after 10 or 100 flips? This experiment is an example
of a random walk.
For a 2-dimensional random walk, consider Manhattan, where the streets are laid out
as a square grid. At each crossroads you take an equal probability of walking north,
south, east or west for one block. (We model the city as an infinite grid, ignoring the
possibility that you might reach the edge.)
Random walks are the simplest examples of stochastic processes, processes which
develop over time according to probabilistic rules, rather than along predetermined
lines. More sophisticated examples include Markov chains and Brownian motion.
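To get a feel for the simplest case, here is a minimal simulation of the 1-dimensional walk in Python (a sketch of my own):

```python
import random

def random_walk(steps):
    position = 0
    for _ in range(steps):
        position += random.choice([1, -1])   # 1 metre north or south
    return position

# Where do we end up after 10 or 100 flips?  Repeat a few times to see the spread.
for steps in (10, 100):
    print(steps, [random_walk(steps) for _ in range(5)])
```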
A famous theorem about random walks is due to George Pólya, who in 1921 analysed the 1- and 2-dimensional cases described above.
His question was as follows: pick a point on the grid at the beginning. Now, what is
the probability that the random walker will reach it eventually? A simpler question,
which amounts to the same thing, is: what is the probability that the walker will
eventually return to his starting point?
Pólya showed that the answer is 1, in both cases, making it a virtual certainty. The 1-
dimensional case is sometimes known as gambler’s ruin. A gambler playing a fair,
random game against a casino one chip at a time, has a probability of 1 of losing all his
chips eventually. This may seem unsurprising. But Pólya showed that this fails in
higher dimensions. A random walk on a 3-dimensional lattice has a lower probability
of returning to its starting point, subsequently pinned down to around 0.34. Rather than
covering the whole lattice, higher-dimensional random walks exhibit a striking fractal
appearance as they grow.
Markov chains At each stage of a Manhattan random walk, you flip a coin to decide in
which direction to go next. In probability theory, these coin flips are modelled by a
random variable of a simple kind. A Markov chain is a sequence of random variables,
like a random walk. The difference is that these random variables may be of more
sophisticated types, such as a random walk on a grid which contains random
teleporters or other booby traps. Conceived by the 19th-century probabilist Andrey
Markov, the defining characteristic of a Markov chain is that the probability
distribution at each stage depends only on the present, and not on the past. (In a
random walk, all that matters is where you are now, not how you got there.)
Markov chains are an excellent framework for modelling many phenomena, including
population dynamics and stock-market fluctuations. To determine the eventual
behaviour of a Markov process is a deep problem, as Pólya’s 3-dimensional random
walk illustrates.
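Here is a toy Markov chain in Python; the two-state ‘weather’ model and its transition probabilities are invented purely for illustration:

import random

# Each row gives the distribution of the next state given the current one:
# this dependence on the present state alone is the Markov property.
transitions = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state):
    r, cumulative = random.random(), 0.0
    for next_state, p in transitions[state]:
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state

state, counts = "sunny", {"sunny": 0, "rainy": 0}
for _ in range(100_000):
    state = step(state)
    counts[state] += 1

print(counts)   # the long-run proportions settle towards the chain's eventual behaviour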
Kinetic theory of heat Robert Brown was a botanist, and a pioneer of microscopy
within the biological sciences. In 1827 he turned his microscope on a primrose pollen
grain suspended in water. Floating inside were tiny specks of matter, darting around in
a very haphazard manner. This subsequently became known as Brownian motion.
Brown initially
thought the particles were tiny living organisms, but further investigations revealed
that the same characteristic irregular movement occurred in finely powdered rock left
in water.
In 1905, Albert Einstein realized that the particles were being bumped around by water
molecules, too tiny to see. Significantly, the hotter the water was, the faster the visible
particles moved. Einstein recognized this as powerful indirect evidence for the
molecular theory of heat. As we now know, heat energy in matter is nothing more than
the combined kinetic energy of its constituent molecules.
Brownian motion To flesh out the details of Einstein’s work on the kinetic theory of
heat, a mathematical model of a particle’s Brownian motion was needed. Since each
change of course of the particle is random, and independent of its previous motion, it
resembles a stochastic process such as a Markov chain. However, in random walks and
Markov chains, time comes in discrete steps. Each step in the process takes place after
a fixed period of time. In Brownian motion, the particle is constantly changing
direction. The path looks like a random walk, zoomed out so that the individual legs of
the journey shrink to zero.
A random walk or Markov chain is modelled by a sequence of random variables (Xi), where i varies over the natural numbers. In contrast, Brownian motion can be modelled by a continuous family of random variables (Xt), where the index t ranges over all non-negative real numbers, representing time.
What happens as this system develops? Einstein showed that after any length of time,
the position of the particle is modelled by a 3-dimensional normal distribution (its
position in each dimension is normally distributed, and the three are independent).
CRYPTOGRAPHY
We can use this key to encrypt our message. Before enciphering, the message is known
as plaintext (and will be written in lower case letters). Suppose our plaintext reads
‘meet me in the park at three a.m.’. We produce the ciphertext (which will be written in
capital letters) by substituting the letters according to the key: ‘DTTZ DT OF
ZIT HQKA QZ ZIKTT Q.D.’.
Once we have sent this to our contact, she can decipher it using the same key. Of
course, there is no reason to be limited to the letters of the alphabet. Any 26 symbols will
do equally well. To make a sentence less guessable, we might also want to omit the
punctuation marks and spaces.
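A monoalphabetic cipher is only a few lines of Python; the key below is a randomly shuffled alphabet rather than the key used in the example above:

import random
import string

alphabet = string.ascii_lowercase
key = dict(zip(alphabet, random.sample(alphabet, 26)))   # each letter maps to a unique substitute
inverse = {v: k for k, v in key.items()}                 # the receiver applies the same key in reverse

def encipher(plaintext):
    return "".join(key.get(c, c) for c in plaintext.lower()).upper()

def decipher(ciphertext):
    return "".join(inverse.get(c, c) for c in ciphertext.lower())

secret = encipher("meet me in the park at three a.m.")
print(secret)
print(decipher(secret))   # recovers the original plaintext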
Cryptanalysis Imagine that you intercept an encrypted message intended for your
enemy:
Let us assume that the sender used monoalphabetic encryption. Then we can attack it with frequency analysis, the fundamental technique discovered by the ninth-century scientist Abū al-Kindī. The basis of
frequency analysis is the observation that the letters of the alphabet are not equally
common. The first step is to analyse the most common letters which occur in the
ciphertext:
Letter: U  R  F  K  B  G  D  J  P
Count:  36 29 26 25 23 19 18 17 15
The basic idea is to try to replace these with the most frequent letters which arise in
English, which are (in order) ETAOINSHRDLU.
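The counting step is easy to automate; a minimal Python sketch, applied here to the short ciphertext from the previous entry:

from collections import Counter

def letter_frequencies(ciphertext):
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return Counter(letters).most_common()

# The commonest ciphertext letters are tentatively matched against
# ETAOINSHRDLU, the commonest letters of English.
for letter, count in letter_frequencies("DTTZ DT OF ZIT HQKA QZ ZIKTT QD")[:5]:
    print(letter, count)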
Frequency analysis In the above example, we might start by replacing the commonest
two letters in the message, U and R, with the commonest two in English, e and t. This
takes us to:
WKtKtPeEBtXEeGtJetJFBGDtFtBKGFtGBGetPBJtXFOKGtPetZeGtXAKIJt
PKAOFXOFVeDIJeXKIFJeGKtAKQQKZeLXKIJEKGtFEtBDFQeSFDPeZBQQ
PFTeFZPBteEFJGFtBKGBGPeJPFGLXKIJEKLeZKJLBDDBDXYPIDDPeZB
QQWBTeXKIFVeXtPeOFteJBFQDFJeBGMKSGIOMeJDBSeBWPtABTeLeDtJ
KXtPBDOeDDFWeFAteJXKIPFTeOeOKJBDeLBtLKGKtEKGtFEtOeFWFB
GeGLKAOeDDFWe
Now we can open up a second line of attack. At several places, the plaintext letter t is
followed by the encrypted letter P. From our knowledge of English, it seems likely that
P represents h. This gives us:
WKtKtheEBtXEeGtJetJFBGDtFtBKGFtGBGethBJtXFOKGthetZeGtXAKIJt
hKAOFXOFVeDIJeXKIFJeGKtAKQQKZeLXKIJEKGtFEtBDFQeSFDheZBQ
QhFTeFZhBteEFJGFtBKGBGheJhFGLXKIJEKLeZKJLBDDBDXYhIDDheZ
BQQWBTeXKIFVeXtheOFteJBFQDFJeBGMKSGIOMeJDBSeBWhtABTeLe
DtJKXthBDOeDDFWeFAteJXKIhFTeOeOKJBDeLBtLKGKtEKGtFEtOeFW
FBGeGLKAOeDDFWe
The phrase ETAOIN SHRDLU lists the first 12 letters in order of frequency. It was
well known in the age of linotype printing presses, where the letters on the keyboard
were arranged in approximate order of frequency. Sometimes the phrase would
accidentally appear in newspapers.
Frequency analysis does not just use individual letters. Some combinations of letters
(such as ‘th’) are more common than others (‘qz’).
Polyalphabetic encryption, codes and spelling mistakes There are several techniques to
make monoalphabetic encryption more difficult to crack. One is to use polyalphabetic
encryption, where each letter is enciphered in more than one way. So, the key might
use a 52-letter alphabet, with each letter of plaintext enciphered by a choice of two
symbols. For extra complexity, we could also introduce dummy symbols which have
no meaning, and will be deleted by our contact, but may confuse any interceptor:
(Key table: each plaintext letter a–z is assigned a choice of two ciphertext symbols, together with some dummy symbols.)
Another technique is to use a code, in which certain whole words, such as ‘bank’ or ‘dollar’, are replaced by prearranged symbols or code words.
A third method is to introduce deliberate spelling mistakes into the message, to further
throw out frequency analysis. All of these produce messages which are much harder to
crack than plain monoalphabetic encryption:
‘{DXC}C;EX28HXS$}B5@ZX3~!ATRX~K£X}@KXI;3,(E{}P?
CX2K1~S^,P$5X38}£2O;V,}M%’
One-time pad The disadvantage of monoalphabetic encryption is that the same letter is
encrypted in the same way each time, leaving it open to frequency analysis. This
problem is ameliorated through polyalphabetic encryption, spelling mistakes and other
devices. But, ultimately, an expert cryptanalyst may overcome these hurdles,
particularly when armed with a computer to test out different possibilities.
The one-time pad is an alternative method which works with a string of letters as its
key. Suppose in this case that the key begins ‘mathematical’.
First the letters of the plaintext and key are each converted into numbers, according to
their alphabetical positions.
Plaintext:  a  b  o  r  t  m  i  s  s  i  o  n
            1  2  15 18 20 13 9  19 19 9  15 14
Key:        m  a  t  h  e  m  a  t  i  c  a  l
            13 1  20 8  5  13 1  20 9  3  1  12
The ciphertext is then obtained by adding together the two numbers in corresponding
positions, and converting them back to letters. Where the result exceeds 26, 26 is first
subtracted. That is to say, addition is performed modulo 26 (see modular arithmetic).
Ciphertext: 14 3  9  26 25 26 10 13 2  12 16 26
            N  C  I  Z  Y  Z  J  M  B  L  P  Z
Our contact can decipher the message by reversing this process (as long as she has the
key). This form of encryption is equivalent to enciphering successive letters according
to different monoalphabetic ciphers. Its high level of security, and its name, derives
from the fact each key is used for encrypting just one message, and is then discarded.
So agents would have matching pads, with a key on each page, and turn a new page
with each message.
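The arithmetic is easily mechanized; a Python sketch following the same a = 1, …, z = 26 convention as the table above:

def to_number(c):
    return ord(c.lower()) - ord('a') + 1        # a = 1, ..., z = 26

def to_letter(n):
    return chr(ord('a') + n - 1).upper()

def one_time_pad(plaintext, key):
    cipher = []
    for p, k in zip(plaintext, key):
        total = to_number(p) + to_number(k)
        if total > 26:                           # addition modulo 26, kept in the range 1..26
            total -= 26
        cipher.append(to_letter(total))
    return "".join(cipher)

print(one_time_pad("abortmission", "mathematical"))   # NCIZYZJMBLPZ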
Public keys Although the one-time pad is theoretically unbreakable, it nevertheless has
an Achilles heel: it is extremely expensive in keys. The sender and receiver need a new
key for every message. Exchanging these keys inevitably involves risk. In the one-time
pad and other traditional ciphers, encryption and decryption are symmetric procedures.
In particular, the sender and receiver require access to the same key.
In public key cryptography, this symmetry is broken. Now, the key comes in two parts:
a private part which is kept by the owner and never shared, and a public part, freely
accessible to all.
Anyone can encrypt a message using the public key, and send it to the owner. Only the
owner has the power to decipher it, however, as this requires the private key.
Public key cryptography is the backbone of modern internet security. The key consists
of two large prime numbers, say p and q. These are kept private, while their product pq is made public. The security of this system relies on the inherent difficulty of
reversing this process, known as the integer factorization problem.
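The asymmetry can be felt even at toy scale. In the Python sketch below the primes are tiny compared with the hundreds of digits used in practice, yet multiplying them is instantaneous, while undoing the multiplication by trial division already takes hundreds of thousands of steps:

p, q = 1_000_003, 1_000_033        # two small primes (real keys use far larger ones)
n = p * q                          # forming the public part of the key is immediate

def factorize(n):
    # naive trial division: the work grows with the square root of n
    if n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2

print(factorize(n))                # (1000003, 1000033), after roughly half a million trial divisions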
To begin with, Claude Shannon pioneered the use of binary as the natural language of information.
Then, for the first time, he analysed the theoretical basis of information transmission.
He probed its limits, analysing the maximum rates at which a system can transmit data.
The answer depends on the source of information, and in particular on a quantity called
its entropy. This precisely quantifies the unpredictability of successive bits (binary
digits) in a stream of binary code, by modelling it as a Markov process. Shannon’s
entropy was later found to be equivalent to Kolmogorov complexity.
The two cases that Shannon considered are noiseless systems and noisy systems
(into which errors can creep). In the latter case, he considered the theoretical limits of
the powers of error-correcting codes.
The sequence 111111… is easily described, containing very little information. It would
be a waste of disc space to store a string of a million 1s. To save space, archiving
software can dramatically compress this sequence, repackaging the information as
instructions for writing out one million 1s.
In the 1960s, Ray Solomonoff and Andrey Kolmogorov used this idea as a way to
quantify the information content of a string of bits. The Kolmogorov complexity of a
string is the minimum length to which it can be compressed. Strings which carry a lot
of information are incompressible and thus have high complexity. Strings with little
information, such as the million 1s, can be hugely compressed and therefore have low
complexity. Kolmogorov complexity is essentially equivalent to Shannon’s notion of
the entropy of a string of bits.
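A quick illustration in Python, using the general-purpose zlib compressor as a crude stand-in for the ideal (and uncomputable) compressor behind Kolmogorov complexity:

import os
import zlib

ones = b"1" * 1_000_000             # highly predictable, so low complexity
noise = os.urandom(1_000_000)       # random bytes carry maximal information

print(len(zlib.compress(ones)))     # a few kilobytes at most
print(len(zlib.compress(noise)))    # essentially no smaller than a million bytes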
Error-correcting codes When sending information along a noisy channel, errors can
creep in. Error-correcting codes are ciphers which allow the message to survive some
level of corruption.
The simplest method is plain repetition: instead of sending COME NOW, we send
CCCOOOMMMEEE NNNOOOWWW. If one letter is corrupted, the blocks of three
come to our aid: a received message such as IIICCCAAANNQNNNOOOTTT is still easily read as ‘I CANNOT’, by taking a majority vote within each block of three.
If there is more than one error, we may be in trouble again. We could use longer
repeating blocks, say 100 of each letter, but this will start to slow the process down.
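A minimal Python sketch of the repetition scheme, decoding by majority vote within each block of three:

from collections import Counter

def encode(message, k=3):
    return "".join(c * k for c in message)

def decode(received, k=3):
    blocks = [received[i:i + k] for i in range(0, len(received), k)]
    # a majority vote within each block corrects any single corrupted letter
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)

print(encode("COME NOW"))                 # CCCOOOMMMEEE   NNNOOOWWW (the space is repeated too)
print(decode("IIICCCAAANNQNNNOOOTTT"))    # ICANNOT -- the stray Q is outvoted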
Shannon showed, remarkably, that this trade-off between accuracy and speed is not
inevitable. With some mathematical technology, it is possible to find codes which are
both quick and as accurate as required. Even if 99% of the message is corrupted, it may
still be possible for it to be reliably deciphered.
One method is based on Latin squares, which are natural error correctors. If an entry in
a Latin square is corrupted it is easily identifiable and correctable, by checking all the
rows and columns. More sophisticated methods employ the algebraic structures of
finite fields.
MATHEMATICAL PHYSICS
‘Philosophy is written in this grand book, the universe, which stands continually open
to our gaze. But the book cannot be understood unless one first learns to comprehend
the language and interpret the alphabet in which it is composed. It is written in the
language of mathematics.’ Galileo Galilei
Galileo’s words are as valid today as when he wrote them, at the genesis of the first
detailed physical theory, Newtonian mechanics. Even today, Newton’s theory remains
adequate for many purposes, but cannot cope with the motion of light on the
astronomical scale. This was addressed by Albert Einstein’s theory of special
relativity, bringing many unexpected consequences, including the celebrated
equivalence of mass and energy. Missing from this story, however, was an account of
gravity. This Einstein tackled in his second great theory, general relativity.
On the subatomic scale, too, Newtonian mechanics broke down and again light was the
obstacle. An old question asked whether light consists of waves or particles. The
eventual answer was very troubling: both. A completely new model of matter was built
to describe this, quantum mechanics.
Since the early 20th century, the challenge has been to find a new model uniting
general relativity and quantum mechanics. Although this dream remains unfulfilled,
the approach of quantum field theory has brought huge advances.
NEWTONIAN MECHANICS
Newton’s laws Newtonian mechanics concerns the behaviour of objects which are
subjected to forces pulling or pushing them around. A boy kicking a football is one
example of a force; gravity causing the boy to fall down is another. Newton’s second
and third laws model these situations. The first law addresses something more basic:
what happens to an object which is not subject to a force?
This question is not as easy as it sounds. Since the time of Aristotle, the belief had
been that such an object would gradually slow down as its ‘inertia’ waned, until it
became stationary. It was Galileo Galilei who first corrected this misconception.
Galileo’s principle became Newton’s first law.
Newton’s first law A moving object will not slow down in the absence of any force.
Imagine a stone sliding across a perfectly smooth frozen lake. There is no force acting
on it, and it simply continues to slide, at a constant speed, in a fixed direction. (In
reality, the stone will eventually slow down, of course, because ice is not perfectly
smooth, and so there will be a small force acting, namely friction.)
Galileo’s principle, also known as Newton’s first law, says that an object will remain at
rest, or travelling at constant speed in a fixed direction, unless disturbed by a force.
The law can better be seen in the near vacuum of deep space. Here, a rock moving
through space in a given direction will simply continue on its path indefinitely, until
disturbed.
Newton’s second law Newton’s first law tells us that objects can move even without forces acting. So what
difference does the force really make?
The answer is that, without any force acting, an object can travel in a straight line at a
constant speed. A force will cause it to speed up, slow down or change direction. In
other words, forces produce acceleration.
How much acceleration? This depends on another factor. A golf shot produces a
significant acceleration in a golf ball (over a very short time). To produce the same
acceleration in a house-brick would require a massively larger force. To produce a
given acceleration in an object, the force required is proportional to the object’s mass.
The heavier the object, the greater the force needed. In equation form, we have
F = ma
where F is the force measured in Newtons, m is the object’s mass (in kilograms) and a
is the resulting acceleration (in metres per second per second, or m/ s 2).
Newton’s third law In billiards or pool, the cue ball is used to move the other balls.
When it hits a stationary ball, during the short time that they are in contact, the cue ball
exerts a force on the target ball, causing it to accelerate according to Newton’s second
law. However, the cue ball does not simply continue on its former path. If the target
ball moves left, then the cue ball will be deflected right. In what pool players call a
‘stun shot’, the cue ball hits the target head-on, and then decelerates to stationary.
According to Newton’s first law, the cue ball must also have been subjected to a force.
Newton’s third law of motion says that these two forces are of equal magnitude, and in
opposite directions. The general statement is: when an object A exerts a force F on
object B, then B exerts a force −F on A.
Equal and opposite reactions Newton’s third law is often quoted, slightly misleadingly,
as ‘every action has an equal and opposite reaction’. This is reasonable when we
contemplate pool balls colliding, but seems to run counter to intuition at the level of
human affairs. When we hit a nail with a hammer, we do not ordinarily think of the
stationary nail as exerting a force on the moving hammer. Nevertheless the nail does
cause the hammer to decelerate.
More complex situations can be confusing. If a man is pushing a car along a flat road,
and the car applies an equal force to the man, why does it not push him backwards
along the road? In such scenarios there are often more forces at work than are
immediately apparent. The man is not only pushing the car forwards, he is also pushing
back with his feet, and taking advantage of the equal and opposite force exerted on him
by the ground. On sheet ice, he could not do this. Here, if he gave the car a push, he
would indeed find himself sliding backwards.
Momentum The momentum of an object is its mass multiplied by its velocity. Often
the letter p is used for momentum (because m is already taken for mass) and v stands
for velocity. So the defining formula is p = mv. If a soccer ball weighing 0.5 kg is
travelling at 6 m/s, then it has a momentum of 0.5 × 6 = 3 kg m/s.
Suppose, for example, that a 3 kg body travelling at 10 m/s strikes a stationary 1 kg body, and that after the collision the 3 kg body continues at 5 m/s while the 1 kg body moves off with some unknown velocity v. The total momentum before the collision is given by (1 × 0) + (3 × 10) = 30 kg m/s. The total momentum after is (1 × v) + (3 × 5) = v + 15. Since momentum is conserved, it must be that the two are equal: v + 15 = 30. So v = 15 m/s.
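The same bookkeeping in Python, with the masses and velocities of this example:

m1, m2 = 1.0, 3.0          # masses in kg
u1, u2 = 0.0, 10.0         # velocities before the collision, in m/s
v2 = 5.0                   # the 3 kg body's velocity after the collision

total_before = m1 * u1 + m2 * u2       # 30 kg m/s
v1 = (total_before - m2 * v2) / m1     # conservation of momentum fixes the remaining velocity
print(v1)                              # 15.0 m/s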
Conservation of momentum A closed system is a collection of objects which interact
with each other but are insulated from all external forces and objects. In such a system,
the total momentum of the system remains fixed and unchanging.
This useful fact is a direct consequence of Newton’s laws. The simplest case is where
two particles, of fixed mass m1 and m2, inflict forces F1 and F2 on each other. By
Newton’s third law, it must be that F1 = −F2. If the two accelerations are a1 and a2,
then, by Newton’s second law, it follows that m1a1 = −m2a2, and so m1a1 + m2a2 = 0.
Suppose the velocities of the two particles are v1 and v2. To get from the acceleration
to the velocity, we need to integrate (see rates of change of position). If we integrate
the above equation, we get m1v1 + m2v2 = C, for some constant of integration C. This
says that, even as F1, F2, a1, a2, v1 and v2 all vary, the total momentum m1v1 + m2v2 remains fixed at the value C.
Bodies with constant velocity Suppose a cyclist is travelling along a straight line with a
constant velocity of 3 m/s. If s represents her displacement from the origin, then her
velocity is given by ds/dt (see rates of change of position). In this case ds/dt = 3, and integrating gives s = 3t + C for some constant of integration C. Usually we will orient ourselves so that the origin is the starting position. This amounts to the boundary condition that s = 0 when t = 0. Hence C = 0. Now we have the equation s = 3t, which happily tells us the cyclist’s displacement at any time. For example, after 60 seconds, her displacement is 180 m.
Falling bodies An important type of motion is that of an object moving under constant
acceleration. A common example is of an object falling under gravity. The moral of
Galileo’s cannonballs is that, if we ignore the effects of air resistance, any two objects,
even of widely differing masses, will fall to earth at the same rate. This constant
acceleration is determined by earth’s gravity, and known as g (approximately 9.8
m/s²).
If s is the distance the object has fallen after time t, then its constant acceleration is expressed by:
(1) d²s/dt² = g
Integrating this gives ds/dt = gt + C. Assuming the object is simply dropped, its initial velocity is zero: when t = 0, ds/dt = 0, and so C = 0. So:
(2) ds/dt = gt
Integrating again gives s = (g/2)t² + D. If we have organized matters so that the object was dropped from the origin,
then D = 0, and we get:
(3) s = (g/2)t²
So, after 10 seconds, the object has a velocity of 98 m/s (by equation 2) and has
dropped 490 metres (by equation 3). This is one example of a body with constant
acceleration.
Bodies with constant acceleration If an object has constant acceleration, say of a, then:
(1) d²s/dt² = a
Integrating this we get ds/dt = at + C. Now if we say that the object has an initial velocity of u, then when t = 0, we have u = ds/dt = C,
so we get:
(2) v = at + u
It is important to recognize here that t and v are variables, while a and u are fixed
constants. Integrating once more, with s = 0 when t = 0, gives:
(3) s = ½at² + ut
Equations 2 and 3 are often used for calculating the velocity and displacement of an
object under constant acceleration, after a certain amount of time. It can also be useful
to connect v and s directly, without needing to go via t. A little manipulation of
equations 2 and 3 produces:
(4) v² = u² + 2as
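Translated into Python, with the dropped object of the previous entry (so a = g = 9.8 m/s² and u = 0):

a, u = 9.8, 0.0                                # constant acceleration and initial velocity

def velocity(t):
    return a * t + u                           # equation (2)

def displacement(t):
    return 0.5 * a * t**2 + u * t              # equation (3)

t = 10.0
v, s = velocity(t), displacement(t)
print(v, s)                                    # 98.0 m/s and 490.0 m
print(abs(v**2 - (u**2 + 2 * a * s)))          # equation (4) holds: this is essentially zero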
Kinetic energy In Newtonian mechanics, a moving body has a certain amount of
energy by virtue of its motion. If a mass m is moving at a velocity v, its kinetic energy
is K = ½mv². For example, if a 1 kg body is travelling at 10 m/s and a 2 kg body at 20 m/s, their total kinetic energy is (½ × 1 × 10²) + (½ × 2 × 20²) = 450 J. After they collide, this kinetic energy should remain the same. So, as an easy example, if the heavier body is at rest after the collision, and the lighter one is travelling with speed v, then we must have ½ × 1 × v² = 450, and so v = 30 m/s. (The conservation rule will not hold exactly in the real world, where some energy will always be lost as heat and sound.)
Potential energy It requires energy to lift a heavy object, because you are fighting
against the gravitational force. By doing this you are imbuing the object with
gravitational potential energy. This energy remains stored in the object, and can be
converted to kinetic energy by dropping it. The formula for an object’s gravitational
potential energy is V = mgh, where m is its mass, g is the earthly gravitational constant,
and h is the height to which it has been lifted. Potential energy is not limited to gravity,
but can exist in relation to other forces, such as electromagnetism.
WAVES
Waves Many physical phenomena come in waves. Common examples are sound, light
and the ripples on a pond. In each case, some property perpetuates through a medium
over time, and does so in a repetitive way.
The simplest waves are 1-dimensional, such as those carried along a violin string.
Sound and light travelling through space spread out in three directions, while the
displacement
waves on a pond’s surface are 2-dimensional. (These can be modelled by scalar fields.)
But the 1-dimensional model remains relevant in each case.
The two fundamental attributes of any wave are its frequency and its amplitude.
Frequency A wave such as sound or light is subject to the ordinary formula: speed = distance/time. The distance from the start of one cycle to the start of the next is called the
wavelength. If this has a value of L, the wave has a speed v, and it takes a time of t to
complete one cycle, then this produces the relationship v = L/t.
The frequency (f) of a wave is the number of complete cycles per second (measured in
Hertz, Hz). This means f = 1/t, and therefore v = fL.
For sound waves, frequency is interpreted by our brains as pitch. A taxi’s squeaky
brakes have a high frequency of around 5000 Hz (meaning that 5000 cycles are
completed every second), while blue whales sing at around 20 Hz, around the lower
limit of the human ear. The upper limit is approximately 20,000 Hz (though bats can
hear up to 100,000 Hz).
For visible light, frequency determines colour, the visible spectrum being from 4.3 × 10¹⁴ Hz to 7.5 × 10¹⁴ Hz.
In many cases (such as sound and light) the wavelength and the frequency can vary,
while the speed is fixed by the ambient conditions. If we rescale, so that this fixed
speed is v = 1, this sets up an inverse relationship between wavelength and frequency: f = 1/L.
Amplitude Drawn as a graph, a wave appears as a regular succession of peaks and troughs. The closer together the peaks are, the higher the frequency. The height of those peaks is
given by the amplitude. (More precisely, the amplitude is half the height from a peak
to a trough, or the distance from the centre to a peak.)
The amplitude of a sound wave determines its volume. On a stringed instrument, the
harder the string is plucked, the greater the amplitude of the resulting wave, and the
louder the sound. (In fact, this story is slightly more complicated, as humans have a
tendency to hear high-frequency noises as psychologically louder than low ones. So we
do not usually use metres as the measure of loudness, but decibels, which also
incorporate some measure of frequency.)
The amplitude of a wave of visible light defines its brightness, and the amount of
energy it carries.
Until the 1990s, there were two different ways in which sound could be encoded into
electromagnetic waves for broadcast. These were amplitude modulation (AM), first
used in 1906, and frequency modulation (FM), first used in 1933. The difference is that AM
encodes the information into
the amplitude of the electromagnetic wave (keeping the frequency constant), while FM
keeps the amplitude constant, and encodes the information into the frequency.
Generally FM is better resistant to noise, with a bigger range to play with (AM has to
contend with the limits to the range of amplitudes which it is practical to broadcast and
receive).
Digital radio Like FM, digital radio uses frequency modulation to carry information.
The difference between this and standard FM is that the sound is first encoded as a
stream of bits (see information theory), and these bits are broadcast via frequency
modulation. Each digital station essentially needs just two frequencies, to represent 0
and 1, meaning that more stations can be accommodated in a narrower range of
frequencies. Because error-correcting codes are built into the data stream (see information theory), digital radio
is better resistant to noise than either AM or ordinary FM.
Stringed instruments Players of stringed instruments, such as banjos, cellos and pianos,
create sounds by plucking, bowing or hitting the strings. This causes the string to
vibrate, and the resulting pitch (or frequency) of the note created is determined by the
string’s length, mass and tension. A double bass produces a deeper pitch than a
ukulele, because its strings are longer and heavier. When you press on the string, you
effectively shorten it, raising the pitch. As you loosen a tuning peg, the tension lessens,
causing the pitch to drop.
The principal note that the string produces is called its fundamental frequency or first
harmonic. This is produced by a wave which is exactly double the length of the string.
But there are other pitches produced too, namely the higher harmonics.
Harmonics The first harmonic on a vibrating string is the fundamental frequency (or
root), a wave where only the ends of the string are stationary.
The second harmonic leaves an additional stationary point in the middle of the string.
(Instrumentalists produce this by gently touching the centre of the string.) This has
wavelength half that of the first harmonic. Halving the wavelength corresponds to
doubling the frequency, so the second harmonic has double the frequency of the
fundamental note. To human ears, this sounds an octave higher than the root.
The third harmonic has stationary points one third and two thirds of the way along the
string. Its wavelength is a third of that of the fundamental, and its frequency is three
times higher. In musical terms, this sounds an octave plus a fifth above the root.
The fourth harmonic divides the string into four parts, producing a frequency four
times that of the first, and a sound two octaves higher.
Scalar and vector functions The commonest form of function takes in a number x as
input and gives out another number u(x) as output. In three dimensions, however, the input
might be a triple of numbers (x, y, z), representing the coordinates of a point in 3-
dimensional space. The output u (x, y, z) might describe the temperature at the
specified point, for example. Such a function is called a scalar field.
Alternatively, the output might itself be a vector, describing, say, the velocity of a fluid at each point of 3-dimensional space.
Often bold letters such as u are used for such vector functions. For example, if
u(0, 4, 1) = (2, 0, 0),
then the fluid is travelling rightwards at 2 m/s at the point (0, 4, 1). A value such as (3, 0, −1) describes fluid moving at 3 m/s in the positive direction of the x-axis, and 1 m/s in the negative direction along the z-axis. (This produces a total magnitude of √(3² + 1²) = √10 m/s.) The three components of u are written u_x, u_y and u_z respectively. So u(x, y, z) = (u_x, u_y, u_z).
Vector fields Fluid flow is an example of a vector field. This is a function which
assigns a vector to every point in space (not to be confused with the algebraic structure
called a field). Usually, we require the vector field to be smooth, so that small
movements in space produce small changes in the corresponding vectors.
Just as with scalar functions, we are interested in the rate of change of vector fields.
This is the topic of vector calculus. In the case of fluid flow, the vector field is
governed by the Euler and Navier–Stokes equations. Vector fields may also be
interpreted as forces, such as in Maxwell’s equations for electromagnetic fields.
The hairy ball theorem says that every smooth vector field on a sphere must be equal to
zero somewhere.
Vector calculus To investigate scalar and vector fields, we want to look at the way they
vary from point to point, and moment to moment. This involves calculus. But with
three spatial coordinates, and one of time, by which we might want to differentiate, the
notation can become very messy and the equations very long.
In the 19th century, mathematicians doing vector calculus started to use a new symbol ∇,
called ‘del’ or ‘nabla’ (not to be confused with a capital Greek delta, Δ), as a useful short-hand for the following operator:
∇ = (∂/∂x, ∂/∂y, ∂/∂z)
Grad Suppose that f is a scalar field, which assigns a number to every point of 3-dimensional space. (An example is temperature.) Then:
∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)
This is a vector, called the gradient of f (commonly abbreviated to ‘grad f’). Like the
ordinary derivative, ∇f expresses the rate at which f increases. The vector ∇f points in the
direction of greatest increase, and its magnitude quantifies the rate of increase.
Div Although ∇ is an operator, we can pretend it is an ordinary vector, and take a sort of
‘dot product’:
∇ · u = ∂u_x/∂x + ∂u_y/∂y + ∂u_z/∂z
This is pronounced ‘div u’, short for the divergence of u. This is now a scalar quantity,
which quantifies the total inflow and outflow at each point.
If ∇ · u > 0 at some particular point, this means that the net effect of flow at that point is
outwards. That is to say, the point acts as a source of flow. If ∇ · u < 0, then the net flow is
inwards at that point, meaning that it acts as a sink.
Curl Just as the divergence is obtained by taking the dot product with ∇, so the curl of a
vector field u is defined by taking the cross product. If u = (u_x, u_y, u_z), then:
∇ × u = (∂u_z/∂y − ∂u_y/∂z, ∂u_x/∂z − ∂u_z/∂x, ∂u_y/∂x − ∂u_x/∂y)
The curl is another vector field, which quantifies the extent and direction of the
rotation of u. If ∇ × u = 0, then u is said to be irrotational.
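These operators can be checked symbolically; a sketch using the sympy library, with scalar and vector fields chosen purely for illustration:

import sympy as sp

x, y, z = sp.symbols('x y z')

f = x**2 * y + sp.sin(z)              # a scalar field
u = (x * y, y * z, z * x)             # a vector field (u_x, u_y, u_z)

grad_f = [sp.diff(f, v) for v in (x, y, z)]
div_u = sum(sp.diff(u[i], v) for i, v in enumerate((x, y, z)))
curl_u = (sp.diff(u[2], y) - sp.diff(u[1], z),
          sp.diff(u[0], z) - sp.diff(u[2], x),
          sp.diff(u[1], x) - sp.diff(u[0], y))

print(grad_f)   # [2*x*y, x**2, cos(z)]
print(div_u)    # x + y + z
print(curl_u)   # (-y, -z, -x)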
The Laplacian Applying the del operator twice, we can also use it as short-hand for
second derivatives. An important case is the Laplacian ∇², named after its inventor
Pierre-Simon Laplace. If f is a scalar field, then:
∇²f = ∇ · (∇f) = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²
This is a scalar quantity. The Laplacian at a point expresses how the value of f there compares with the average value of f in a
small neighbourhood of that point (technically, the limit of the difference between the two, suitably rescaled, as the
neighbourhood shrinks to zero). It features in Laplace’s equation, the heat equation and
in the quantum mechanical Hamiltonian, among many other places in mathematics.
A vector form of the Laplacian is central to the Navier–Stokes equations. Starting with
a vector field u as above, we define ∇²u = (∇²u_x, ∇²u_y, ∇²u_z).
Writing this out fully really demonstrates the space-saving benefits of the notation!
Laplace’s equation The simplest equation which can be built from the Laplacian is:
∇²f = 0
According to this equation, f is a scalar field whose value at every point agrees with its average
value in a small neighbourhood around it. In one dimension, this is only satisfied by fields corresponding to straight
lines. In higher dimensions there are subtler possibilities, the harmonic functions
studied in Hodge’s Theorem.
Hodge’s theorem In the 1930s, William Hodge performed some exceptionally deep
analysis on the Laplace equation and its possible solutions, or harmonic functions as
they became known. In particular, Hodge showed that solving this equation on a
manifold is equivalent to calculating certain homology groups for it (see algebraic
topology).
This profound and unexpected connection had major repercussions from topology to
group theory, and was described by Hermann Weyl as ‘one of the great landmarks in
the history of science in the present century’. The breakthrough presented Hodge with
the right language to formulate the famous Hodge conjecture. It also provides the
template for solutions of more complicated equations such as the heat equation, in
which the scalar field is no longer fixed, but now additionally depends on time,
becoming a flow.
The heat equation In 1811, Joseph Fourier published a theory on the flow of heat in a
solid. According to this, heat is a scalar quantity. The heat f at a point depends on the
time t, as well as the point’s coordinates in 3-dimensional space (x, y, z). So we model
f as a function of four variables: t and x, y, z. By considering the way that heat diffuses
over time, Fourier derived the fundamental partial differential equation for modelling
the flow of heat:
∂f/∂t = ∇²f
The physical interpretation of this equation is that the rate at which f changes at a
single point is determined by the average temperature of the points around it.
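A rough numerical sketch of the 1-dimensional version in Python, replacing the Laplacian by a finite difference on a rod of 50 points (the grid, time step and initial hot spot are invented for the illustration):

f = [0.0] * 50
f[25] = 100.0                                  # an initial hot spot in the middle of the rod
dt = 0.1                                       # time step, small enough to keep the scheme stable

for _ in range(200):
    laplacian = [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, len(f) - 1)]
    for i in range(1, len(f) - 1):
        f[i] += dt * laplacian[i - 1]          # df/dt equals the (discretized) Laplacian of f

print(round(max(f), 2))                        # the peak has dropped as the heat spreads out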
Solutions to the heat equation We cannot hope for a unique solution to the heat
equation ∂f/∂t = ∇²f. The general theory of partial differential equations is firmly set against this; but,
more than this, common sense dictates that the way that heat flows must depend on the
initial distribution of heat around the solid.
To deal with more complicated initial conditions, we can build solutions from series of
simpler solutions, just as in Fourier analysis. Indeed the study of heat was Fourier’s
initial motivation.
The equation ∂f/∂t = ∇²f carries far beyond models of heat flow. This equation occupies a foundational
place in several subjects, including modelling stock options in financial mathematics.
Along with Laplace’s equation, it forms the basis upon which other partial differential
equations are built, especially those modelling any form of flow. A notable example is
the Navier–Stokes equation.
The Laplace and heat equations are not limited to ordinary 3-dimensional space, but
can apply on any manifold. This is useful for modelling diverse physical phenomena,
and also within pure mathematics. Ricci flow is a striking example of an abstract flow
on a manifold, the analysis of which led to the proof of the geometrization theorem and
Poincaré conjecture.
Fluid dynamics How do fluids flow? This continuous form of mechanics is far more
difficult to model than the discrete particles of Newtonian mechanics. In the 18th
century, Leonhard Euler considered this question. Euler’s formula for fluid flow is
essentially a statement of Newton’s second law, transferred to an idealized fluid.
His work was built on during the 19th century, separately by Claude-Louis Navier and
George Stokes. Both worked with essentially the same mathematical model of fluid
flow.
As well as assuming that the fluid is incompressible, Euler assumed that it has no
viscosity, that is to say, that it can move freely over itself without any internal
frictional forces. Navier and Stokes introduced a new constant ν to quantify the
viscosity of the fluid. Mathematically, this made the resulting equations even more
difficult to solve. This is the celebrated Navier–Stokes problem.
The fluid model We assume that our fluid is distributed throughout some region.
Mathematically, the central quantity in which we are interested is a vector u,
describing the velocity of the fluid at a particular point (x, y, z) and at some moment t,
say u = (u_x, u_y, u_z). The values of u_x, u_y and u_z will each depend on the time t as well as the spatial
coordinates (x, y, z). In mathematical terms, this constitutes a vector field u(t, x, y, z).
The principal factor affecting the flow is the pressure of the liquid, p. This may also
vary from point to point and moment to moment. So we write it as a function p (t, x, y,
z). Assuming that there are no other forces acting on the liquid, what Euler’s work and
that of Navier and Stokes provides is a formula relating u and p.
In their analysis, a fundamental assumption is that the fluid is incompressible. When a
force is applied, the liquid may move in some direction but cannot contract or expand
to fill a different volume. This amounts to asserting that ∇ · u = 0. In other words, no point
may act as either a source or sink of flow at any time.
Euler’s fluid flow formula Suppose for a moment that we are interested in a fluid
flowing in a 1-dimensional space, so we only need to consider u_x, with u_y = u_z = 0.
Euler’s formula for fluid flow says that the flow must satisfy:
∂u_x/∂t + u_x ∂u_x/∂x = −∂p/∂x
Euler also derived two other similar equations, for acceleration in the y- and z-directions. In these, u_y and u_z replace u_x as the object to be differentiated on the left-hand side, with ∂p/∂y and ∂p/∂z respectively replacing ∂p/∂x.
These three equations can be expressed more succinctly as one equation, using vector
calculus:
∂u/∂t + (u · ∇)u = −∇p
When the liquid is subject to an additional external force f (such as gravity), the
equation becomes
∂u/∂t + (u · ∇)u = −∇p + f
The critical ingredient which Navier and Stokes added was a constant ν to quantify the
viscosity of the fluid. If no external force is acting on the fluid, the Navier–Stokes
equations then say that its velocity u should satisfy the following:
∂u/∂t + (u · ∇)u = ν∇²u − ∇p
Here ∇²u is the Laplacian of u. In the presence of an external force f, the equation
becomes
∂u/∂t + (u · ∇)u = ν∇²u − ∇p + f
The Navier–Stokes problem The Navier–Stokes equations are a triumph of the power
of mathematics to model nature. They have received extensive, detailed experimental
verification, in a wide variety of contexts. They occupy a central position in fluid
dynamics, and their study has led to technological advances from aircraft wings to
artificial heart valves.
It is all the more surprising, then, that it has never been established that these equations
have mathematical solutions. To be more precise, it is possible to find mathematical
formulas for u which satisfy an Euler or Navier–Stokes equation over a short period of
time. But often these solutions break down at a later stage, ceasing to be smooth
functions (an impossible condition for a physical fluid).
No-one has yet been able to find a single formula which remains valid for all values of
t, which solves either the Euler or Navier–Stokes equations (although their 2-
dimensional analogues can be solved). This is certainly frustrating, since evidence
from computer simulations (as well as the natural world) suggests that there should be
a great many such solutions.
In 2000, the Clay Institute announced a prize of $1,000,000 for a solution of a Navier–
Stokes equation, as one of their millennium problems.
Reversing this idea, Faraday was also able to build an electric motor for converting electric
energy into mechanical energy.
These discoveries rely upon the intertwining of magnetic and electric fields. If an
electric field E causes current to pass through a wire, this sets up a magnetic field B
wrapping around the wire, at right angles to it.
The precise geometry is subtle; pinning it down exactly involved the work of
physicists such as André-Marie Ampère, and mathematicians such as Carl Friedrich
Gauss. In 1864 James Clerk Maxwell was finally able to write four partial differential
equations which perfectly capture the geometry of these two fields. This combined
electromagnetic force is now counted as one of the fundamental forces of nature.
Maxwell’s equations James Clerk Maxwell’s equations relate two vector fields: a
magnetic field B and an electric field E. Their exact geometry depends on the ambient
matter and the distribution of electrons around it. This can be quantified by two pieces
of data: a scalar field ρ, called the charge density, and a vector field J, called the
current density. The speed of light c is also involved. After rescaling through a careful
choice of units, Maxwell’s equations may be written using vector calculus as:
∇ · E = ρ
∇ · B = 0
∇ × E = −(1/c) ∂B/∂t
∇ × B = (1/c) (J + ∂E/∂t)
SPECIAL RELATIVITY
Inertial frames of reference In deep space, thousands of light-years from the nearest
star, float two rocks. Rock 1 is stationary, and rock 2 drifts past it at 5 m/s. Or was it
the other way around? Perhaps rock 2 was stationary, and rock 1 floated by at 5 m/s.
Or possibly they sailed past each other, each travelling at 2.5 m/s. Or maybe, even,
rock 1 was whizzing along at 996 m/s, and was overtaken by rock 2 flying at 1001 m/s.
How can we tell which of these descriptions is right? The answer is that we cannot.
Each of the above is a perfectly legitimate characterization of the situation. Which one
we need depends on our choice of inertial frame of reference.
Because neither rock has any force acting on it, both of them are said to have inertial
motion, and, according to Galilean relativity, all such are fundamentally equivalent. If
we have a particular interest in rock 1, it might make sense to consider that as
stationary, and rock 2 as moving. Making this decision fixes our inertial frame of
reference, and the velocities of other bodies can then be measured relative to it.
Galilean relativity We are familiar with a certain level of relativity, according to which
different positions are equivalent. Imagine two identical sealed rooms in different parts
of a city. There is no conceivable experiment by which you could determine which you
are in. The laws of physics are identical in the two cases. The only difference is in their
relative locations: from the first room, the second is 6 miles north. Equivalently, from
the second, the first is 6 miles south.
Galilean relativity goes further, asserting that the same equivalence holds for motion at constant velocity. If the two rooms are travelling at different
constant speeds, then again there is no conceivable experiment that could determine
which you are in. The two scenarios are identical, even if one is travelling at 10,000
m/s, relative to the other.
This idea does not come naturally to humans, since we spend our lives firmly rooted in
one inertial frame (as far as our senses can detect, anyway). But it follows from
Newton’s first law. However, this equivalence does not extend to acceleration. If one
room is in an accelerating train, and the other is stationary, then an experiment (such as
dropping a ball) can distinguish the two.
Space is relative The implication of Galilean relativity is that space is relative. If there
is no possible way to distinguish between objects which are stationary and those which
are travelling with constant velocity, then we have to acknowledge that the notion of
absolute stationarity is neither useful nor meaningful.
A consequence is that we have no way to cling onto individual points in space. If a cup
is on a table in an inertial frame, if no-one touches it, is it in the same position 10
seconds later? Relative to the chosen inertial frame it is; relative to another it is not.
But which is true, really? It sounds like a simple question: is the cup occupying the
same region of space as before, or is it not? However frustrating it may be, there is no
valid answer which can be given. For some time, physicists, including Newton,
theorized the existence of an all-pervading ether, which would have fixed one inertial
frame as the true measure of stationarity. The demise of ether theory with the
Michelson–Morley experiment took with it the last hope for a universal default frame
of reference.
Spacetime The attempt to unify the three dimensions of space and one of time into a 4-
dimensional geometric spacetime dates back to at least Joseph-Louis Lagrange in the
late 18th century. A first attempt to formalize this might involve quadruples of
coordinates (t, x, y, z) where t represents time, and (x, y, z) are the ordinary 3-
dimensional spatial coordinates. We could write this more briefly as (t, x), where x is
short-hand for (x, y, z).
This is naïve, because it fails to take into account that space is relative. If we pick a
point in space, say A, then (2, A) and (15, A) represent the same point A thirteen
seconds apart. But this is exactly what Galilean relativity warns us against! Galilean
spacetime is a solution to this problem.
Galilean spacetime Spacetime should not come equipped with a built-in notion of ‘the same point at
different times’, but should be flexible enough to allow us to specify possible inertial
frames by which to judge this.
Mathematicians have a device, called a fibre-bundle, for arranging this. It looks very
much like the naïve spacetime, with one-dimension of time, and three of space. The
difference is that there is no pre-assigned correspondence between the different layers
of
3-dimensional space. Various paths through these slices are equally valid as constant
velocities, called world-lines, represented as straight arrows. A family of parallel
arrows corresponds to picking an inertial frame and measuring everything relative to it.
The eclipses of Io When you light a candle, does the light take time to travel to the
corners of the room, or does it get there instantaneously? With no obvious way to
answer this question, scientists have debated and disagreed over the centuries. Galileo
(1564–1642) believed that light travels at a finite speed, and tried to demonstrate this
experimentally. All he could conclude was that, if he was right, light’s speed must be
extremely high.
Among his more successful work was the discovery, in 1610, of Io, the nearest moon of Jupiter.
Later in the 17th century, Ole Rømer found evidence that light travels at a
finite speed, by observing Io, which orbits Jupiter once every 42.5 hours. For some of
that time it is eclipsed from our view by Jupiter. Rømer noticed that the time between
successive eclipses varied, depending on the relative motion of earth and Jupiter. As
they neared each other, this time decreased; as they drew apart, it increased. This could
only be because light takes longer to reach earth from Jupiter the further apart the two
planets are.
Speed of light The eclipses of Io were the first solid evidence that light moves at a
finite speed. Subsequent experiments have pinned this down to 299,792,458 m/s, in a
vacuum. (This number is exact, since a metre is now defined as the distance travelled
by light in 1
1 299,792,458 of a second.)
The letter c is usually used to stand for the speed of light. The term ‘light’ is a little
misleading; it is the speed of all electromagnetic radiation in a vacuum, of which
visible light is just a small sliver.
Once scientists knew that light travels at a finite speed, it was expected that this would
slot neatly into Galilean relativity. If all velocities are relative, and all inertial frames
equivalent, then this should apply equally to the speed of light (c). If you were
travelling with speed c, alongside a beam of light, it would appear stationary, like two
cars travelling side by side at the same speed. If you exceed c, the beam of light would
seem to move backwards (just as happens when one car overtakes another). Two
beams of light fired at each other should have a relative velocity of 2c. Experiment, however, stubbornly disagrees: the measured speed of light is the same in every inertial frame, however the source and the observer are moving.
To explain this paradox required a complete reworking of the basic conception of time
and space.
Minkowski spacetime Galilean spacetime, in which all inertial frames of reference are
equivalent, cannot handle the constancy of the speed of light. Around 1907, Hermann
Minkowski built on earlier work of Hendrik Lorentz, Henri Poincaré and Albert
Einstein, to formulate a new spacetime to cope with this. Through some inventive
mathematics, notably the Lorentz transformation, every point of spacetime could be
endowed with a double cone, with one half pointing to the past, and the other to the
future.
These light cones describe the history of every possible flash of light passing through
the point. Paths inside the cones represent lower speeds, the possible world-lines of a
massive particle passing through the point. Outside the cone are points inaccessible to
both light and matter, as travel that is faster than light would be needed to reach them.
Minkowski spacetime is endowed with a new notion of special relativity. The basic
principle is that, below the speed of light, Galilean relativity holds. But, once c is
reached, it breaks down. More precisely, within the light cone, every future-pointing
arrow is as good as every other. Despite appearances, there is no unique line passing
through the centre of the cone (so there is no notion of being stationary). Also,
however close to the boundary an arrow appears to be, from its own perspective the
edge is no closer. (This corresponds to the speed of light remaining constant, no matter
how fast you travel.)
This is made rigorous by endowing the insides of the cone with hyperbolic geometry.
This has the properties we want: on the Poincaré disc, there is no well-defined centre,
and the boundary of the disc is infinitely far away from each point inside the disc.
For two events lying within each other’s light cones, every observer agrees which of them happened first. But if A and B are outside each other’s light cones, they cannot influence each other by
any means. Are they simultaneous?
A natural way to answer this would be to wait for a third event C which is in the
common futures of A and B, and then calculate the times that have elapsed since both.
If these times are the same, we judge A and B to have been simultaneous. However,
because of the constancy of the speed of light, this answer will not be unique. It will
depend on the choice of C. For this reason, we must give up on the idea of absolute
time.
Mass is relative Suppose an object is travelling at 99% of the speed of light and is
subject to a large force causing it to accelerate further. Newtonian mechanics predicts
that after some time it should exceed the speed of light (according to v = u + at, see
bodies with constant acceleration). In reality this never happens: the closer the object comes to the speed of light, the greater the force needed to accelerate it further, as if its mass were increasing without limit.
There are now two values of mass that we might want to assign to an object. One is the
observed mass m, as it seems to a stationary observer. As the object’s speed gets close
to the speed of light, m tends towards infinity. The second is the rest mass, the mass of
the object in its own inertial frame, where it is judged stationary. The rest mass is
usually denoted m 0.
Special relativity 1905 was Albert Einstein’s annus mirabilis, or miraculous year. As
well as completing his doctoral thesis on molecular physics, he published four seminal
scientific papers: one on quantum mechanical photons, one on the kinetic theory of
heat, and two on relativity theory. ‘On the Electrodynamics of Moving Bodies’
introduced his theory of special relativity, building on earlier work of Lorentz and
Poincaré. This picture was completed in 1907, when Hermann Minkowski introduced
his mathematical model of spacetime. The theory was quickly welcomed by the
scientific community.
If A and B are two events (which we think of as points in Minkowski spacetime), then
there are three possibilities:
1 There is a time-like path between them. This means that they lie within each other’s
light cones, and there is an inertial frame which carries one to the other. So they can be
conceived as two events happening at the same point of space, at different times.
2 There is a space-like path between them. They lie outside each other’s light cones,
and cannot influence each other. Since time is relative, they can be conceived as two
events happening at the same moment, in different places.
3 There is a light-like path between them. This means that they lie on the boundary of
each other’s light cones. Light can pass from one to the other, but matter cannot.
Energy is relative The Newtonian concept of kinetic energy ceases to be absolute, even
in Galilean relativity. Since all velocities are equivalent, there is no meaningful number
which can be assigned to v in the formula ½mv².
In special relativity, this gets worse. Since mass is relative, the value of m is also
dependent on the choice of inertial frame. As a result, we have to conclude that kinetic
energy is also relative, and depends on the choice of inertial frame. This realization
prompted Einstein to revisit the whole concept of energy, in his final paper of 1905.
E = mc 2 Albert Einstein’s final paper of 1905 was entitled ‘Does the Inertia of a Body
Depend on its Energy Content?’ It was only three pages long, and consisted of one
fairly simple mathematical deduction. But its stunning conclusion would again shatter
our understanding of the workings of the physical universe.
It concerned the question of energy in special relativity. Just as the mass of an object is
relative, so is its energy. The faster an object travels, the greater both its mass and
energy become. Einstein realized that these two quantities increase in fixed proportion
to each other. This ratio is expressed by the number c 2, where c is the speed of light.
Conversely, as the object slows down, its relativistic mass decreases. But it never
reaches zero. There is a fixed lower limit, given by its mass at rest (m). Similarly,
Einstein found, the energy of an object decreases with its speed as expected. But again,
there is a baseline which is never crossed: its energy at rest (E).
What is the meaning of this energy at rest? The unavoidable conclusion was that any
object has energy, simply by virtue of having mass, and the two are related by perhaps
the most famous equation of all:
E = mc²
Amid all the relativistic quicksand, these three numbers E, m and c are absolute, not
depending on any choice of inertial frame. Einstein’s equation therefore provided a
new firm foothold from which to view the universe.
The equivalence of mass and energy The message here could not be clearer: energy and mass are equivalent. More
precisely, mass is energy, frozen into material form. Einstein realized that the ultimate
test of this bold claim would come from nuclear reactions.
These come in two main forms: in radioactive decay, an atom of radioactive material
flies apart to leave a lighter atom (or, in the case of nuclear fission, more than one such
atom). In nuclear fusion, two hydrogen atoms collide to form one heavier helium atom.
Crucially, in both processes, the end result weighs less than the initial ingredients. A
small amount of mass has been lost, and released as energy.
The study of such reactions subsequently provided detailed experimental support for
Einstein’s formula. With our usual units of seconds and metres, the number c²
expresses the exchange rate between mass and energy. Because c² is so large
(89,875,517,873,681,764), a tiny drop in mass can buy an enormous amount of energy.
In 1945, this was demonstrated to dreadful effect at Hiroshima and Nagasaki. It also
underlies the energy on which we all ultimately depend: that of the sun.
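The exchange rate is easy to evaluate; a one-gram illustration in Python:

c = 299_792_458          # speed of light in m/s
m = 0.001                # one gram, expressed in kilograms

E = m * c**2
print(E)                 # about 9 * 10**13 joules, if the whole gram could be converted to energy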
GRAVITY
Galileo’s cannonballs Galileo Galilei is said to have climbed the leaning tower of Pisa,
armed with two cannonballs of different masses. By dropping them, he demonstrated
once and for all that objects of different masses fall at the same rate. Whether Galileo
actually performed this particular experiment is a matter of doubt. Nevertheless, he did
make the discovery, and it was enough to contradict the view of gravity that had
predominated since Aristotle, that heavier bodies fall faster.
Galileo was a supporter of the ideas of Nicholas Copernicus, who a century earlier had
posited that the sun, and not the earth, is at the centre of the solar system. Building on
Galileo’s work, Robert Hooke and Isaac Newton developed a theory of universal
gravitation, according to which gravity is not limited to earth, but an attractive force
between all masses. This insight brought the necessary tools to flesh out the
Copernican theory. The backbone of their theory is the inverse square law.
Newton’s inverse square law In ordinary earthly life, we think of gravity as being
constant, and producing a fixed acceleration. Taking a broader perspective, this is not
true. When Neil Armstrong stepped out of the Apollo 11 lunar module, he did not
hurtle towards earth at 9.8 m/s². We know that more massive objects create greater
gravitational fields, and the earth is 80 times more massive than the moon, so why did
he not?
The answer is that, as you move away from a planet or star, its gravitational effects
diminish. At what rate does this happen? Johann Kepler thought that the gravitational
effect between two objects was inversely proportional to the distance between them.
This was not quite right. In Isaac Newton’s Philosophiæ Naturalis Principia
Mathematica (known as the Principia) the correct answer was provided. Newton said
that at a distance r away from a mass, the gravity was proportional to 1/r².
Since then the details have been filled in: the gravitational force between two objects of mass m₁ and m₂, a distance r apart, is Gm₁m₂/r², where G is the universal gravitational constant.
If we take m₁ as the mass of the earth (5.97 × 10²⁴ kg), and r as the radius of the earth (6.37 × 10⁶ m), then Gm₁/r² comes out at approximately 9.8 m/s², the familiar acceleration due to gravity at the earth's surface.
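Readers who want to check the arithmetic can do so in a couple of lines; the value of G used below (about 6.674 × 10⁻¹¹ in SI units) is the standard one, an assumption not quoted in the passage above.

    G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
    m_earth = 5.97e24      # mass of the earth, kg
    r_earth = 6.37e6       # radius of the earth, m

    g = G * m_earth / r_earth ** 2
    print(round(g, 2))     # approximately 9.82 m/s^2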
The two-body problem Two bodies in space will attract each other according to
Newton’s inverse square law. We idealize these bodies as points in space endowed
with mass. If they are stationary to start with, the two bodies will simply fall towards
each other and collide.
F = Gm₁m₂/r²
If they are moving to start with, more complex outcomes are possible. Examples are
two stars cycling around interlocking ellipses, an asteroid which spirals around a star
before plunging into it, or a single-apparition comet, which does a parabolic U-turn
around a star and then flies off. These two-body problems amount to systems of
differential equations with boundary conditions given by the initial positions and
velocities; in all cases the problem can be solved easily, unlike the three-body problem.
It had come out of the works of Newton, who had considered the motion of the earth,
moon and sun. He wrote that when more than two masses interact, ‘to define these
motions by exact laws admitting of easy calculation exceeds, if I am not mistaken, the
force of any human mind’.
Henri Poincaré devoted much attention to the problem, and analysed exactly what a
solution would require. It would involve ten separate integrals, which Poincaré
believed could not be evaluated exactly without profound mathematical
advances. With no complete solution on the horizon, Poincaré was awarded King
Oscar’s prize.
A huge development in the subject was Sundman’s series, but the three-body problem
(and n-body for n ≥ 4) remains an active topic of research today.
Why do all objects fall at the same rate, when Newton's second law says that the acceleration produced by a force is inversely proportional to the object's mass? The answer is that, uniquely in the case of gravity, the magnitude of the force also
depends on the object’s mass: F = mg, where g is the earthly gravitational constant.
These two considerations exactly cancel each other out, to give an acceleration of a = g, whatever the mass.
This equivalence principle between gravity and acceleration would not receive a
satisfactory explanation until Albert Einstein began to develop his theory of general
relativity, in 1907.
Einsteinian frames of reference To simulate life in zero gravity, trainee astronauts often
travel in special aeroplanes. When their plane goes into freefall, the sealed room inside
acquires the conditions of zero gravity. In other words, it resembles an inertial frame.
Of course the room frame is not inertial, since it is not moving at constant velocity, but
accelerating at 9.8 m/s². However, the acceleration and gravity cancel each other out
exactly.
From the perspective of the trainee astronaut in his room, the effects of gravity depend
on the acceleration of the plane. When it is flying level, gravity is as we feel it on
earth. If the aeroplane points vertically downwards and switches on its afterburner to
descend even faster than gravity, the astronaut would feel that ‘up’ and ‘down’ had
flipped. Only during freefall is life inside the room free of all gravitational effects.
What is more, it is impossible to separate frames which are subject to gravity, from
those which are subject to acceleration. If you are travelling in deep space with no
gravity and your spacecraft accelerates at 9.8 m/s², life inside would be
indistinguishable from that in earthly gravitation. In general relativity, gravity and
acceleration are one and the same.
Gravitational tides Only in the gravity-free environs of deep space do inertial and
Einsteinian frames coincide; in an empty universe, special and general relativity are the
same. The idea of general relativity is that if an entire room is in freefall then, relative
to the walls and the floor, the contents of the room are totally free from the effects of
gravity. The complicating factor is that gravity is not a constant force. It varies,
depending on your distance from the earth, or other nearby mass. In our hypothetical
falling room, the floor will be subject to a stronger gravitational force than the ceiling,
because it is slightly nearer the earth. From the perspective of a point in the centre, the
floor will be pulled down and the ceiling will be pulled up.
Furthermore, because gravity is directed towards the centre of the earth rather than
downwards in parallel lines, the walls will be pulled slightly inwards. If we imagine
that our
Gravitational force relative to the centre of the room, with its deforming effect.
room begins as a sphere, and is made of some compliant material, as it falls it will be
deformed to a prolate spheroid (see ellipsoid).
In the case of the planet earth, the effect of this gravitational tide is imperceptible. Near
stronger centres of gravity, it will be more dramatic. In the extreme example of a black
hole, if someone is unlucky enough to fall in, the difference between the gravitational
force on their head and that on their feet would pull them into spaghetti.
Einsteinian spacetime
In general relativity, Einsteinian frames take the place of inertial frames. So, in
Einsteinian spacetime, the path traced out by a particle (its world-line) should be
straight if it is in gravitational freefall. What can we make of gravitational tides? The
world-lines of nearby particles seem to veer apart in a way that straight lines do not.
The best solution is to see spacetime itself as curved, with the world-lines of free
particles being given by its geodesics. Not every geodesic is a viable world-line, only
those which pass through the future light cones at each point. (There may be other
geodesics which represent faster-than-light travel.)
Einsteinian spacetime, the setting for general relativity, is therefore a deformed version
of Minkowski spacetime. This provides an elegant description of gravity: it is the
curvature in spacetime. In the absence of any other force, particles travel along a
geodesic. Where there is no gravity, spacetime is flat and these will be genuine straight
lines, exactly as in Minkowski spacetime.
In 1915, Einstein published a system of equations which describe exactly how the
presence of mass deforms spacetime. The equations are best expressed in the language
of tensor calculus, a technically demanding extension of vector calculus. With this
machinery in place, and working in the appropriate units, Einstein’s equations can be
condensed into one field equation:
G_ab = T_ab
Here G_ab is the Einstein tensor, which measures the curvature of a region of space. On
the other side, T_ab is the stress–energy tensor, which quantifies the amount of energy
or mass in the region. The solutions to this equation are the possible geometries of
spacetime in the theory of general relativity.
Black holes Einstein’s field equation embodies his theory of general relativity.
Unfortunately, its superficial simplicity belies a considerable complexity. The equation
is very difficult to solve, a fact which Einstein himself found disheartening. One
possible solution does present itself, namely the Minkowski spacetime of special
relativity. This corresponds to a universe devoid of all matter and gravity, and
therefore completely flat. Over the 20th century, other solutions have been found, and
these match well with astronomical observations, providing a strong evidence base for
general relativity.
An unusual solution was found in 1960 by Martin Kruskal, building on earlier work by
others. In this solution, there was a region of spacetime so steeply curved that even
light could not escape. Black holes such as this were considered geometric anomalies
until 1971, when the astronomer Charles Thomas Bolton studied the star system
Cygnus X-1 and realized that the large star HDE226868 was locked in orbit with
another very massive, but invisible object. Further calculations showed that this could
be nothing but a black hole.
Subsequently dozens more suspected black holes have been found, and it is believed
that many stars collapse to form black holes after their death.
QUANTUM MECHANICS
Young’s double-slit experiment Around 1801, the physicist Thomas Young placed a
light source behind a barrier pierced with two slits. On the other side was a screen. The
pattern that Young saw would provide an important insight into the nature of light. The
two dominant theories about light at that time were the particulate theory and the wave
theory. If light consists of particles, then the number of particles reaching any point on
the screen should be simply the number which arrive through the first slit plus the
number which arrive via the second. This should produce a smooth picture.
However, if light is a wave, then Young should see something else. When two waves
meet, they may reinforce each other at some places, and cancel each other out at
others. The result is an interference pattern: brighter in some places, but darker in
others. When Young performed the experiment, he saw an interference pattern on the
screen. This provided powerful evidence for the wave theory of light. Over a hundred
years later, the particle theory of light was to reappear under the guise of photons,
when Young’s experiment would take on a new significance.
Photons Isaac Newton, among others, had believed that light consists of particles.
His ‘corpuscular’ theory was killed off by Young’s double-slit experiment, but was
revived in the early 20th century.
In 1900, Max Planck analysed black-body radiation, the electromagnetic radiation
produced inside a black box as it is heated up. Previous attempts to understand this
situation had predicted that the energy inside the box should become infinite, as light
of every frequency was emitted. Planck found a way to side-step this problem, by
supposing that energy was emitted in small clumps, or quanta. This bold assumption
led Planck to a formula which matched well with experimental measurements.
In 1905, Albert Einstein took Planck’s argument further and used it to explain the
mysterious photoelectric effect. In 1887, Heinrich Hertz had shone light at a metallic
surface and noticed that it caused electrons to be emitted from the metal. In 1902,
Philipp Lenard made the puzzling observation that the energy of these electrons was
not affected by changes in the intensity of the light, as would be expected under a wave
model of light. Einstein was able to explain this effect, using Planck’s quanta of light,
later dubbed photons. It was for this work at the birth of quantum theory that Einstein
was awarded the Nobel Prize, in 1921.
With the re-emergence of photons, the results of Young’s double-slit experiment posed
a fresh conundrum.
With more sophisticated technology than Young had available, it was possible to refine
the experiment by emitting particles from the source, one at a time. One particle could
be released, and then detected at one point on the screen. Then the next would be
released. Over time, a truly astonishing picture emerged.
With just one slit open, the final locations of the particles were distributed smoothly.
But, with both slits open, interference patterns again appear. Opening the second slit
seems to prevent particles from taking a previously permitted route through the first!
This experiment has been replicated with several different types of particles, including
electrons, neutrons and even molecules as large as buckminsterfullerene, composed
of 60 carbon atoms. It provides strong evidence that matter comes neither in classical
waves, nor Newtonian particles, but has a different quantum character. This has
become known as wave–particle duality and is modelled by probability amplitudes.
Interference pattern
In the absence of a better term, we use the word particle for these dual wave–particle
entities, but it should always be borne in mind that their behaviour is far from that of
familiar Newtonian particles. The de Broglie relations translate between the language
of waves and particles.
Quantum mechanics
If quantum mechanics hasn’t profoundly shocked you, you haven’t understood it yet.
— Niels Bohr
At first sight, wave–particle duality may not appear too strange. Suppose we place a
ball at the top of a hill, and want to know how far it will have rolled after 5 seconds
(treating this as a question in Newtonian mechanics). Even using state-of-the-art
equipment, we can never know the starting position of the ball with perfect precision,
so there is a range of possible initial states. From this range we could concoct a
probability distribution for the position of the ball after 5 seconds, to fulfill a predictive
role similar to that of the quantum wave.
There are two major differences between these scenarios. The first is the nature of the
wave.
The second is that uncertainty about the state of a quantum particle is not simply a
failure of our measuring equipment, but is an innate property of the system.
Heisenberg’s uncertainty principle imposes limits beyond which we can never hope to
pin a quantum particle down.
The Planck constant When Max Planck first investigated black-body radiation, he
made the radical supposition that light energy comes in discrete clumps, later dubbed
photons. More precisely, if the light waves have frequency f then each photon has
energy hf, where h is a particular fixed number, now known as the Planck constant.
Numerically, h is approximately 6.626 × 10⁻³⁴ joule seconds.
The Planck constant is now considered one of the fundamental constants of nature. It is
because h is so tiny that we do not see quantum phenomena in ordinary life.
Planck’s constant is crucial for moving between the languages of waves and particles,
via the de Broglie relations. Often we work with Paul Dirac’s variant on the Planck
constant, the reduced Planck constant ħ (‘h-bar’), defined as h/2π.
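To get a feel for the sizes involved, the energy of a single photon of visible light can be estimated from E = hf. The 500-nanometre wavelength below is an illustrative choice of mine, roughly corresponding to green light.

    h = 6.626e-34            # Planck constant, joule seconds
    c = 2.998e8              # speed of light, m/s
    wavelength = 500e-9      # 500 nanometres, in metres
    f = c / wavelength       # frequency of the light
    print(f)                 # about 6 x 10^14 Hz
    print(h * f)             # about 4 x 10^-19 joules per photon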
De Broglie relations Max Planck and Albert Einstein had arrived at the conclusion that
light exhibits particle-like as well as wave-like behaviour. Going the opposite way, the
French aristocrat Louis de Broglie suggested that ordinary matter should have wave-
like as well as particle-like properties.
His prediction was supported by subsequent investigation, including two-slit
experiments performed with molecules in place of beams of light. De Broglie
formulated two fundamental rules for translating between the languages of particles
and waves. If a wave has frequency f, then each corresponding particle should have
energy E, where:
E = hf
(Here h is Planck’s constant.) The second formula relates the momentum p of a particle
to the wavelength λ of the corresponding wave:
p = h/λ
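The same relation gives a feel for why the wave-like character of matter goes unnoticed in everyday life. The electron speed below is an arbitrary illustrative value, not one taken from the text.

    h = 6.626e-34          # Planck constant, joule seconds
    m_e = 9.109e-31        # mass of the electron, kg
    v = 1.0e6              # an illustrative speed, m/s
    p = m_e * v            # momentum of the electron
    print(h / p)           # de Broglie wavelength: about 7 x 10^-10 m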
Neither classical waves nor Newtonian particles are adequate to describe quantum
phenomena such as that revealed by the double-slit experiment. The distribution of the
particles on the screen seems to be best modelled as a probability distribution.
However, classical probability distributions do not interfere as waves do. A new
mathematical device was needed.
Probability amplitudes The state of a quantum particle is described by a wave function ψ. Its input is a coordinate x of a point in 3-dimensional space, at time t. The output ψ(x, t) is a complex number, called a probability amplitude.
If we fix t = 5, say, then the function x ↦ ψ(x, 5) provides a complete description of the state of the particle 5 seconds after the clock was started. Included in this is the probability of finding the particle at the point x.
So, a probability amplitude is rather like a probability density function (see continuous probability distributions). The difference is that probability density functions are always real-valued functions, whereas ψ is complex. If we take the modulus of the probability amplitude, and square it, we do get an ordinary probability density function, |ψ(x, t)|². This gives the probability of finding the particle at the point x.
One reason for quantum theory’s unfamiliar feel is that the outputs of this function are complex numbers rather than real numbers. It is not immediately obvious what physical meaning such complex values can have.
However, this function does encode the familiar properties of the particle, such as
position, momentum and energy, even if they take on radically new flavours. In
particular, these properties do not have unique values, but are probabilistic in nature.
Such properties are called observables.
The simplest observable is position. This has a straightforward formula, given just by
the coordinates of the point: x. Accompanying this is a probability distribution which
gives the particle’s probability of having this position, at time t. This is given by the
real number |ψ(x, t)|².
In quantum mechanics, the momentum of the particle in the x-direction is represented by the operator p_x = −iħ ∂/∂x. This looks extremely strange! We expect momentum to be a number, but this formula
defines it as something else, a differential operator. However, in certain instances we
can identify this with a numerical value. For example, suppose ψ = e^(6ix) (this is not quite a
permissible wave function since it is not normalized, but it illustrates the point). Then:
p_x ψ = −iħ ∂/∂x (e^(6ix)) = 6ħ e^(6ix) = 6ħψ
Comparing the first and final terms of this equation, it makes sense to assign a
numerical value to the momentum p_x, namely 6ħ.
This argument only worked because of the choice of ψ. In many cases we will not get a
numerical value for momentum. Momentum is the first example of a quantized
observable, meaning that it is an operator which crystallizes to a numerical value only
under certain special circumstances.
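Readers with a computer algebra system to hand can reproduce the calculation above. This sketch uses the sympy library and the same illustrative wave function e^(6ix); it is a check of the algebra rather than anything from the original discussion.

    import sympy as sp

    x, hbar = sp.symbols('x hbar', positive=True)
    psi = sp.exp(6 * sp.I * x)                # the illustrative, non-normalized wave function
    p_psi = -sp.I * hbar * sp.diff(psi, x)    # apply the momentum operator -i*hbar*d/dx
    print(sp.simplify(p_psi / psi))           # prints 6*hbar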
There are similar operators for the momentum in the y- and z-directions. Using vector calculus, we can write these three formulas as one, to give the definition
of the momentum operator: p = −iħ∇, where ∇ is the del operator.
Just as for the position x, we may also extract a corresponding probability distribution
which gives the probability of the particle having momentum of a particular value.
There are two ways of viewing a wave function. One is for the function ψ to take
position as the primary consideration, and deduce information about momentum from
it. It is equally possible to take the opposite perspective. These two viewpoints are
related by the Fourier transform, and provide a beautiful symmetry to the mathematics
of quantum mechanics.
Non-commuting operators Position and momentum are the two principal observables
of a quantum particle. Mathematically, they are given by two operators: x for position
and p for momentum. Focusing on the x-direction, these are x and p_x = −iħ ∂/∂x.
However, the order in which these are applied makes a difference. A simple
application of the product rule for differentiation shows why. Applying x first and then p_x to a wave function ψ gives −iħψ − iħx ∂ψ/∂x, but applying p_x first and then x gives only −iħx ∂ψ/∂x.
Writing p_x for this momentum operator and x for the position operator, we find that:
p_x x = x p_x − iħ
In particular, then, p_x x ≠ x p_x, which says that the two operators do not commute.
Similar formulas hold in the y-and z-directions.
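The failure to commute can also be checked symbolically. The sketch below applies the two operators to a generic wave function ψ(x) in each order; the printed difference, −iħψ, is exactly the discrepancy described above.

    import sympy as sp

    x, hbar = sp.symbols('x hbar', positive=True)
    psi = sp.Function('psi')(x)

    def p(f):
        # momentum operator in the x-direction: -i*hbar times d/dx
        return -sp.I * hbar * sp.diff(f, x)

    difference = p(x * psi) - x * p(psi)      # (p_x x - x p_x) applied to psi
    print(sp.simplify(difference))            # -I*hbar*psi(x)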
The non-commuting of the position and momentum operators was first noticed by
Werner Heisenberg in 1925. At first it may have seemed no more than a minor
technical inconvenience. Within a year, Heisenberg had realized its extraordinary
repercussions, with major consequences for experimental physics and the philosophy
of science.
Heisenberg’s uncertainty principle Focusing on the x-direction, the position is simply given by the coordinate x. The
particle does not have a unique position, however; it is smeared out across all the
possible values of x. How wide this spread is can be quantified by a number, Δx
(technically the standard deviation of the positional probability distribution).
Similarly, the momentum of the particle in the x-direction is given by p x. This is also
not uniquely defined, and the number Δp x determines how widely spread the
momentum is.
The uncertainty principle states that these two spreads can never both be made small: Δx × Δp_x ≥ ħ/2. Here ħ
is the reduced Planck constant. Similar inequalities hold along the y- and z-directions. If the position is pinned down to a very narrow spread Δx, then the momentum spread Δp_x must be correspondingly large.
The converse is also true: if the momentum is narrowed down to a small number, then
it must be that the position is widely spread. In other words, it is impossible to pin
down both the position and the momentum, simultaneously.
The Hamiltonian Apart from position and momentum, another important observable is
the total energy of the system. Just as for momentum, this is quantized and given by an
operator. This is known as the Hamiltonian, H.
If the particle is free, that is, not subject to any external force, then kinetic energy is the
particle’s only form of energy, and the Hamiltonian H is simply the kinetic-energy operator E.
If there are external forces acting, the particle will have additional potential energy. At
the point x and time t, say, this energy is given by V(x, t). Then the total energy of the
system is given by the Hamiltonian:
H = E + V
The Schrödinger equation The way the wave function ψ evolves over time is governed by the Schrödinger equation: iħ ∂ψ/∂t = Hψ. Here i = √−1, ħ is the reduced Planck constant, and H is the Hamiltonian. This says that the
way the wave function changes is determined solely by its energy.
At its most basic level, quantum mechanics investigates the possible solutions to this
partial differential equation. From a mathematical perspective, this is not too
problematic. Mathematicians are well used to analysing such equations. (Certainly it is
easier to tackle than the Navier–Stokes equations.)
A difficulty arises, however, when we ask what happens when the position of the particle is actually observed. The problem is this: if such an observation is being made, and the particle is located at
a specific point, then the probability of it being found at another point at the same
moment
disappears. So the original wave function is no longer a valid description of the state of
the particle. It is difficult to avoid the conclusion that whenever someone (or perhaps
something) takes a measurement, the quantum system mysteriously jumps from being
smoothly spread out, to crystallizing at a specific position.
This is known as the collapse or decoherence of the wave function. The measurement
paradox is that measurement apparently triggers this decoherence. Scientists continue
to debate the meaning of this phenomenon. However, the evidence for it is compelling
and includes the double-slit experiment with extra measurements.
The double-slit experiment with extra measurements The measurement paradox is not
a conclusion that scientists would concede without serious evidence for it. Some such
evidence is provided by revisiting the quantum double-slit experiment. What might
happen if extra sensors are added to the two slits, to measure which route a photon
takes? The answer is that the interference patterns disappear. It seems that the
additional measurement causes the particle to decohere at one slit. There is then
nothing to prevent it going on to reach a previously forbidden section of the screen.
The probability of it passing through the second slit has reduced to zero, and so no
interference will occur.
Schrödinger’s cat In a famous thought experiment, Erwin Schrödinger imagined a cat sealed in a box together with a radioactive atom and a vial of poison, which is released if the atom decays. The measurement paradox would suggest that the wave function of the particle should
remain spread out, embracing both the possibility of radioactive decay and no
radioactive decay. This situation endures until the box is opened, at which stage it
crystallizes into a firm state of either having decayed or not.
The implications for the cat are that it is neither firmly alive nor dead, but smeared out
in a quantum living–dead state, until the box is opened. Schrödinger considered this
conclusion ridiculous, but its correct resolution continues to be debated.
Quantum entanglement Quantum particles do not exist in isolation, but interact with each other as part of larger systems. Two particles A and B might collide head on, or
slightly deflect each other, or travel straight past each other, for example.
This is true for classical Newtonian particles and quantum particles. The difficulty is
that quantum wave functions may be blurred across all three possibilities. When this
happens the wave function of A becomes entangled with that of B. Two entangled
wave functions can no longer be considered separately, but only as two aspects of a
combined wave function for the two particle system. When measured, this system
decoheres as a whole.
The Schrödinger equation is not limited to the wave functions of single particles, but
governs those of larger systems too, including potentially the wave function of the
entire universe. (It requires a lot of technical work to reinterpret this equation in a
suitable higher-dimensional Hilbert space, to coordinatize all the particles, and possible
correspondences between them.)
There is nothing to prevent two particles, having become entangled, from then parting
and travelling far away from each other. In 1935, Albert Einstein, Boris Podolsky and
Nathan Rosen (EPR) wrote a paper drawing out some of the seemingly paradoxical
consequences of entanglement over large distances.
Suppose particles A and B become entangled and then fly apart. Now Albert will
perform some measurement on particle A, and then Boris on particle B. If Albert
measures the position of A, this collapses the wave function of the pair, pinning down
the position of B. If Boris then measures B’s position, his result is nearly a foregone
conclusion.
Suppose instead that Albert measures A’s momentum instead of position. Then,
according to Heisenberg’s uncertainty principle, the positional distributions of both A
and B must become broadly spread out. So Boris’ positional measurement gains a
much wider range of possible readings.
The upshot of this is that Albert’s choice of measurement fundamentally alters the positional
probability distribution associated to Boris’ particle, possibly a large distance away. It is as if some
‘quantum information’ passes between the two particles, and does so instantaneously.
Even more striking EPR-scenarios have subsequently been dreamt up, involving
measurements whose outcomes are just ‘yes’ or ‘no’, rather than spread over a
probability range. In each case, the measurement on one particle has a definite,
quantifiable impact on the second, irrespective of the distance between them.
These predictions certainly run counter to common sense. Even worse, they threaten
the inviolability of the speed of light. However, they have been confirmed
experimentally, not over interplanetary distances, admittedly, but up to several
kilometres.
The quantum field Quantum mechanics is a powerful theory for modelling the
behaviour of subatomic particles and is supported by a large body of experimental
evidence. On its own, however, it is not a complete description of our universe. The
missing ingredients are accounts of the fundamental forces of nature. One of the
greatest challenges in science is to construct a combined model of all four. The
approach that emerged in the early 20th century was that of quantum field theories.
Quantum field theory A quantum field theory is a mathematical model of the quantum
field. Such theories are extremely challenging from a mathematical perspective.
Nevertheless, considerable progress was made over the 20th century, in a sequence of
increasingly sophisticated field theories, most significantly quantum electrodynamics,
electroweak theory and quantum chromodynamics.
2 The strong nuclear force. Two protons carry the same positive electric charge, so,
according to the theory of electromagnetism, they should repel each other. Yet they
coexist peacefully
inside the nucleus of an atom. How can this be? The answer is that there is another
force which attracts them, with enough muscle to overpower the electromagnetic
repulsion. This is the strong nuclear force.
3 The weak nuclear force. There is another force at work within atomic nuclei, which
explains why they sometimes fly apart in radioactive decay. This is the weak nuclear
force, which, unlike the strong nuclear force, is repellent.
Relativistic quantum theory In the early 20th century, two major physical theories were
developing: quantum mechanics and relativity theory. Albert Einstein was heavily
involved with both, but the man who took the first significant step to uniting the two
was Paul Dirac. In 1930, Dirac successfully built a quantum model of the electron,
which was compatible with special relativity. Central to his work was the Dirac
equation, a relativistic counterpart of Schrödinger’s equation.
Dirac’s ideas formed the basis on which further relativistic field theories would be
built, most immediately quantum electrodynamics. Scientists are continuing to search
for a theory which embraces general relativity. Dirac’s work also made a remarkable
prediction: the existence of antimatter.
Dirac saw a way through this conflict, when he noticed that the Dirac equation for the
electron allowed a second solution. In this case, the mass was the same, but electric
charge was reversed. He thus predicted the existence of the positron. In high-energy
situations, electrons and positrons could now appear in pairs, as their charges cancel
each other out.
The positron was discovered by Carl Anderson in 1932, in line with Dirac’s prediction.
Both men won Nobel prizes for their work. Subsequently it was found that other
particles too have corresponding antiparticles of the same mass, but with the electric
charge and other properties reversed.
Quantum electrodynamics Building on Dirac’s relativistic quantum theory, in the
1940s Richard Feynman, Julian Schwinger and Shin-Itiro Tomonaga assembled a
quantum field theory to describe the first of the forces of nature: electromagnetism.
The resulting theory is quantum electrodynamics (or QED), which incorporates a
quantized version of Maxwell’s equations. QED was a stunning success, able to predict
the outcome of laboratory experiments to within one part in a trillion, an unheard of
level of accuracy.
Electroweak theory In the 1960s, the physicists Sheldon Glashow, Abdus Salam and Steven Weinberg showed that electromagnetism and the weak nuclear force are really two aspects of a single, unified force. It certainly does not seem this way on planet earth, but electroweak theory predicts that
at very high energies (such as existed at the beginning of the universe) the two forces
are fully unified.
Electroweak theory was the first quantum field theory to exploit the work of Yang and
Mills. This provided an invaluable technical tool, but came at a cost. The particles
carrying the weak nuclear force are called the W and Z bosons. These are massive
particles in contrast to the massless photon which carries the electromagnetic force.
But if the two forces are ultimately one, what explains this disparity? To answer this question,
it was necessary to revisit the whole question of where mass comes from. Electroweak
theory provided a brand new answer: the Higgs field.
The Higgs boson What is mass? The Newtonian answer is that it is an innate property
of all matter, and the mass of an object corresponds to the amount of matter contained
in it. This became less convincing in the early 20th century, with the discovery of
many subatomic particles with different masses, and especially photons, the first
massless particles.
According to the electroweak theory, mass derives from the Higgs field. This
permeates everything, and the mass of different particles is an expression of their
interaction with it.
If the Higgs field exists, it should be evidenced by a corresponding particle, called the
Higgs boson. The discovery of the W and Z bosons at the CERN particle accelerator in
1983 was a triumph for the electroweak theory. The Higgs boson, however, remains as
yet unobserved.
Quantum chromodynamics At the same time that some physicists were working on the
electroweak theory, others were applying quantum field theory to the next force: the
strong nuclear force.
The result was quantum chromodynamics (QCD), developed in the early 1970s by
Harald Fritzsch, Heinrich Leutwyler and Murray Gell-Mann. They used QED as a
template, but there were two additional challenges to overcome.
A second challenge was to account for the surge of particles called hadrons that
experimental physicists had been finding since the late 1940s. In QCD, these two
phenomena were both explained through a new fundamental particle: the quark.
Like QED, the predictions of QCD were subsequently validated in the laboratory with
astonishing accuracy.
Hadrons Particle physics had come a long way since the ancient elemental theory of
Earth, Air, Fire and Water. The theory of the atom developed over the 19th century,
with Brownian motion providing important early evidence. Its name comes from the
Greek word atomos, meaning indivisible. But, around 1912, Ernest Rutherford and
Niels Bohr suggested that the atom was composed of negatively charged electrons
orbiting a positively charged nucleus. In 1919 Rutherford found that this nucleus was
itself composed of positively charged protons, and in 1932 James Chadwick also found
neutral particles called neutrons within the nucleus.
Protons, neutrons and electrons were for some time taken to be the fundamental units
of matter. But in the 1940s, physicists armed with more powerful tools began to unveil
a startling array of new particles. Primary among them was the large family of hadrons.
By the mid 1960s over one hundred new particles had been identified.
Suspicions grew that hadrons (including the proton and neutron) must be composed of
smaller building blocks. Quantum chromodynamics eventually provided the answer:
quarks.
Quarks In the early 1960s, Murray Gell-Mann and others proposed that protons and
neutrons are not indivisible but are each composed of three more basic particles. These
he called quarks, a word taken from James Joyce’s novel Finnegans Wake: ‘Three
quarks for Muster Mark’. Just as electrons are electromagnetically charged, so quarks
are charged by the strong nuclear force. Each may carry a charge of red, green or blue.
Antiquarks, meanwhile, carry an anticolour.
Quarks carry other intrinsic properties, namely electric charge, spin, and mass, and
come in a total of six flavours known as up, down, strange, charm, top and bottom
(there are additionally six corresponding antiquarks). These six varieties have all been
identified in particle accelerator experiments. The last to be seen was the top quark, in 1995.
Standard model of particle physics
Putting together the electroweak theory (of electromagnetism and the weak nuclear
force) and quantum chromodynamics (for describing the strong nuclear force) gives
the standard model of particle physics. Since the 1970s, this framework has
represented our best understanding of the particles which make up matter. Major
questions remain, however:
1 Even on its own terms, the standard model is not a complete theory. There are
several numerical constants which are currently unexplained; these have to be
observed from nature and written into the theory.
3 The electroweak theory demonstrated that the weak nuclear force and
electromagnetism are not separate but two aspects of a single fundamental force. The
strong force, however, is treated separately. Many physicists believe it should be
possible to unite all three, in a Grand Unified Theory.
Gauge groups It was Hermann Weyl who first realized the crucial role that symmetry
groups would play in understanding fields such as electromagnetism. Unlike the
symmetry groups of spacetime, these gauge groups do not describe global symmetries
but arise from the underlying algebra.
In quantum electrodynamics, the same thing holds, and the electromagnetic field is
unaffected by multiplication by a complex number of modulus 1 (a phase). The gauge group corresponds to taking together all
these gauge symmetries. In this case, we get the circle group, known as U(1).
Yang–Mills theory A major technical step in quantum field theory was made by Chen-
Ning Yang and Robert Mills in 1954, when they had the bold idea of replacing U(1)
with a larger Lie group, in the first instance SU(2), a group of 2 x 2 complex matrices.
This group allowed extra hidden symmetries within the system, but a major technical
difficulty is that it may be non-Abelian. This means that combining two symmetries g
and h could produce different results, depending on the order: gh ≠ hg.
Yang and Mills wrote down two equations that any non-Abelian gauge theory should
satisfy. These equations are the source of some of the thorniest questions in
mathematical physics: the Yang–Mills problems.
The Yang–Mills problems Given the central role that Yang–Mills theory has played in physics over the last half century, it is remarkable
that the Yang–Mills equations themselves have never been fully solved. The existence
problem addresses this:
1 For any simple Lie group G, show that a quantum field theory can be built which has G as its gauge group.
2 Show that there is some number Δ > 0 so that every excitation of the vacuum must have energy at least Δ.
In 2000, the Clay Institute offered a prize of $1,000,000 for a solution to these two
questions. The mathematical physicist Edward Witten was involved in the selection of
the Millennium problems, and wrote that the existence problem ‘would essentially
mean making sense of the standard model of particle physics’.
There has been a great deal of research into the Yang–Mills problems, some of which
has had repercussions elsewhere in mathematics. A notable example is in differential
topology, in the discovery of aliens from the fourth dimension.
Quark confinement A third problem coming from Yang–Mills theory (though without
the million dollar price tag) is to show that quarks are confined. This would explain
why we can never extract individual quarks from protons or neutrons; they only come
in threes, or in quark–antiquark pairs.
When two electrically charged particles draw apart, the magnitude of the force between
them drops. The same happens with gravity and the weak nuclear force. This is not
true of the strong nuclear force however. It remains constant over distance. The result
is that when you try to separate a quark from its antiquark, for example, the energy
required is so high as to bring another quark into existence to replace it. In terms of
mathematics, the problem is to show that all possible particle states in QCD are SU(3)-
invariant.
GAMES AND RECREATION
The growth of game theory over the 20th century means that mathematics has much to
contribute to ancient games such as Chess and Go. Highlights include the growth of
games-playing machines such as Deep Blue, and the solution of checkers in 2007. But
the importance of game theory extends far beyond board games, to any context where
strategy is needed, from economics to artificial intelligence. Today, game theorists are
even consulted on military and political strategy.
GAME THEORY
Noughts and crosses Noughts and crosses (or tic-tac-toe) is one of the simplest games there is. The action takes place on a 3 × 3 grid, with
one player inserting noughts (Os) and the other crosses (Xs). They take turns, and if
either player gets three in a row, she has won. After some practice, an intelligent player
should never lose at this game, whether playing first or second. There is a strategy that
either player can use to force a draw.
The central question in game theory is to identify when a strategy exists either to win
or draw. Game theory began with reasoning about traditional board games, but has
hugely outgrown these shoes. Game theorists were consulted on strategy during the
Cold War, and the subject has great significance to the study of stock markets.
The blue-eyed suicides A remote island is inhabited by 1000 islanders, 100 of whom have blue eyes, and 900 of whom have brown. However, there are no
mirrors on the island, and the local religion forbids all discussion of eye-colour. Even
worse, anyone who inadvertently discovers their own eye-colour must commit suicide
that same day.
One day, an explorer lands on the island, and is invited to speak before the whole
population. Ignorant of the local customs, he commits a faux pas. ‘How pleasant it is’,
he says, ‘to see another pair of blue-eyes, after all these months at sea’. What happens
next?
(We must assume that the islanders follow their religion unerringly. An islander could
not, for example, simply decide to disobey the suicide law. Despite their crazy religion,
the islanders are also assumed to be hyperlogical: if there is some way by which
someone can deduce their eye-colour, they will do so instantly.)
The blue-eyed theorem The solution to the blue-eyed suicides puzzle depends on the
number of blue-eyed islanders. In the original version, there are 100 of them. But it is
easier to start with the case where there is just one, call him A. Then A knows from the
explorer’s words that there is at least one blue-eyed islander. Since he can see none, A
concludes that it must be him. He commits suicide on the first day.
Suppose now that there are two, A and B. A can see B, and so A knows that there is at
least one blue-eyed islander. However, come the second day, A observes that B has not
committed suicide, and deduces that B must also be able to see someone with blue
eyes. Since A can see no blue-eyed islander other than B, he concludes that it must be
himself. He commits suicide on the second day, and B does likewise.
The general statement is the following: if there are n blue-eyed islanders, they will all
commit suicide on the n th day. This can be proved quite simply by induction. So the
solution to the original question is that all the blue-eyed islanders will commit suicide
on the 100th day. (The next day the brown-eyed islanders will realize what has
happened, and do the same.)
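The induction can be mimicked by a tiny simulation: the only information that changes from day to day is the smallest number of blue-eyed islanders consistent with nobody yet having acted. The function name below is my own.

    def day_of_deduction(num_blue):
        # Day on which every blue-eyed islander deduces their own eye colour.
        day = 1
        while True:
            seen = num_blue - 1       # blue-eyed islanders each blue-eyed person can see
            if seen < day:            # 'there must be at least this many, and I see fewer'
                return day
            day += 1

    for n in (1, 2, 3, 100):
        print(n, 'blue-eyed islanders act on day', day_of_deduction(n))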
Orders of knowledge If everyone in a group knows a fact X, then X is first-order knowledge within the group. If, furthermore, everyone knows that everyone knows X, then it is second-order knowledge, and so on. Not all first-order knowledge is second order. For example, the people in a room may
each individually notice that the clock on the wall has stopped, but they will have no
idea whether the others have also noticed, until someone mentions it. At this point it
becomes second-order knowledge (in fact common knowledge).
What is remarkable about the problem of the blue-eyed suicides is that it involves
orders of knowledge up to 100 (admittedly under somewhat artificial assumptions). Let
X be the statement ‘at least one islander has blue eyes’. The case that is easiest to
understand is when there is only one blue-eyed islander. Here, the explorer directly
increases the stock of first-order knowledge, by telling everyone X.
When there are two blue-eyed islanders (A and B), everyone knows X, which is first-
order knowledge. But X is not second-order knowledge, because A does not know that
B knows this, until the explorer speaks.
The first case which is difficult to imagine is where there are three blue-eyed islanders,
call them A, B and C. So each of them knows that there are at least two blue-eyed
islanders; this much is first-order knowledge. Also, each knows that each of the others
can see at least one blue-eyed islander. So X is second-order knowledge. But X is not
third-order knowledge (until the explorer speaks), because A cannot know that B
knows that C knows this.
In the original puzzle, the explorer’s seemingly harmless words actually increase the
stock of 100th-order knowledge. Amazingly, this is enough to doom the island.
Common knowledge We say that X is known to a group of people if each of them knows it. Within game
theory, this is first-order knowledge. Common knowledge has a much stronger
meaning. It is required not
only that everyone knows X, but that everyone knows that everyone knows X, and
furthermore that everyone knows that everyone knows that everyone knows X, and so
on. That is to say, common knowledge is knowledge of every order.
The typical way for a piece of information to become common knowledge is for it to
be announced publicly.
Although the roots of the idea go back to the philosopher David Hume in 1740,
common knowledge has been studied in detail only more recently. It is a central idea in
game theory, notably in the work of Robert J. Aumann, via whom the subject entered
economics. The path of a piece of data from first-order knowledge, up through the
orders of knowledge to common knowledge is believed to be highly significant for
financial markets.
The prisoner’s dilemma Alex and Bobby are arrested for serious fraud, and held in
separate cells. The prosecutor makes the same offer to each:
1 If you confess, and your accomplice does not, then you will go free. With your
testimony they will be convicted of fraud and sent to prison for ten years.
2 If your accomplice confesses and you do not, the opposite will happen: they will
walk free, and you will go to prison for ten years.
3 If you both confess, then you will both be convicted of fraud, but your sentence will
be cut to seven years.
4 If neither of you confess, then you will both be convicted on lesser charges, and each
sentenced to prison for six months.
(We assume that neither of the prisoners has any ethical concerns, or cares for the
welfare of the other; each is just interested in minimizing their own prison time.)
The optimal solution seems to be option 4, as this minimizes the total amount of prison
time. However, Alex and Bobby could each reason as follows: ‘whatever the other
does, I am better off confessing’. If they both follow this strategy, they will arrive at
option 3, arguably the worst for the pair of them.
In a game with two or more players, an equilibrium is defined as a situation where no-
one has any incentive to make a unilateral change of strategy, even with full
knowledge of the others’ intentions. The prisoner’s dilemma illustrates that an
equilibrium by no means necessarily represents an optimal outcome.
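The claim that mutual confession is the only equilibrium can be checked mechanically. In the sketch below the payoffs are the prison terms listed above (lower is better), and the helper function is my own naming.

    # Years in prison for (Alex, Bobby); 'C' = confess, 'S' = stay silent.
    payoff = {
        ('C', 'C'): (7, 7),
        ('C', 'S'): (0, 10),
        ('S', 'C'): (10, 0),
        ('S', 'S'): (0.5, 0.5),
    }

    def is_equilibrium(a, b):
        # Neither player can reduce their own sentence by a unilateral change of strategy.
        best_a = all(payoff[(a, b)][0] <= payoff[(alt, b)][0] for alt in 'CS')
        best_b = all(payoff[(a, b)][1] <= payoff[(a, alt)][1] for alt in 'CS')
        return best_a and best_b

    for a in 'CS':
        for b in 'CS':
            if is_equilibrium(a, b):
                print('Equilibrium:', a, b)    # only ('C', 'C') is printed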
John Nash considered non-cooperative games, where players cannot enter into binding
agreements. In 1950, he proved his celebrated theorem, that equilibria always exist in
any such game. Typically,
the equilibrium involves mixed strategies in which each player assigns probabilities to
his various possible moves. His proof rested on a generalization of Brouwer’s fixed
point theorem.
Deep Blue The first chess-playing automaton was devised in the 18th century, by
Wolfgang von Kempelen. Known as ‘The Turk’, it gained a reputation for defeating
all-comers, including Napoleon Bonaparte and Benjamin Franklin. In 1820, the
workings of the Turk were finally revealed. It involved a human chess-master sitting
inside a cabinet, moving the pieces with levers. As many had suspected, the Turk was
an ingenious hoax.
During the 20th century, however, the chess-playing computer became a reality. After
the theoretical groundwork was laid by Claude Shannon, Alan Turing and others, the
first fully functional chess program was developed in 1958. In 1980, Edward Fredkin
of Carnegie Mellon University offered a prize of $100,000 to the programmers of the
first chess computer to defeat a reigning world champion. This prize was claimed in
1997 by programmers at IBM, whose machine Deep Blue defeated Garry Kasparov
3½–2½.
Games-playing machines The descendants of Deep Blue can now, for the most part,
defeat humans at Chess. Other games have similarly succumbed to the dominance of
computers. Maven is a computerized Scrabble® player developed by Brian Sheppard
which can defeat even world champions. In the game of Othello® (or Reversi),
Michael Buro’s Logistello program can defeat the best. In the ancient Chinese game of
Go, however, humans continue to rule the roost, for the time being.
Complete solutions to Chess and Go remain a distant dream. The games are simply too
intricate, with too many possible scenarios. In his important 1950 paper ‘Programming
a Computer for Playing Chess’, Claude Shannon estimated the number of distinct
games of Chess to be at least 10¹²⁰. Dwarfing the number of atoms in the universe,
this is far too large for any computer to handle directly.
Some simpler games have been solved, however, notably Connect Four (in which the
first player can always force a win), Gomoku (or Connect Five), Awari (an ancient
African game), the ancient Roman game Nine Men’s Morris, Nim and checkers.
Checkers is solved ‘Checkers is Solved’ was the title of a 2007 paper by a team of
computer scientists led by Jonathan Schaeffer. By some distance the most complex
game to have been fully solved, this was a triumph of mathematical analysis, and a
milestone in the development of artificial intelligence.
What Schaeffer discovered was a strategy for playing which would never lose. If two
computers played against each other, each using this perfect strategy, they would draw
(much as two competent players of noughts and crosses, or tic-tac-toe, will always
draw). This result had been conjectured by the top checkers players many years earlier.
Proving the result was a mammoth operation, which involved a complete analysis of
all possible endgames with ten pieces on the board, and a sophisticated search
algorithm to determine how starting positions in the game relate to the possible
endgames. To do this entailed in Schaeffer’s words ‘one of the longest running
computations completed to date’, involving up to 200 computer processors operating
continuously from 1989 till 2007.
Nim Claude Bachet was a 17th-century European nobleman, and the first person to
make a serious study of recreational mathematics. One of his finds, Bachet’s game, is
even simpler than noughts and crosses (tic-tac-toe). A pile of counters is placed on the
table, and two players take turns to remove counters. Each can choose to remove 1, 2
or 3. The winner is the person who takes the final counter.
The outcome of the game depends on the number of starting counters (between 15 and
25 is normal), and which player goes first.
Bachet’s is the simplest of the family of Nim games. In other variants such as
Marienbad, there may be more than one pile of counters, and players can choose from
which pile to take counters. In misère versions, the player who takes the last counter
loses. Other complicating factors may involve increasing the number of counters a
player may remove, or forbidding players from removing the same number of counters
as the previous player.
These additional factors can turn Nim into a complex game of strategy. But Nim has
more than recreational value, as the Sprague–Grundy theorem demonstrates.
Mex To analyse the game of Nim, we can assign numbers called nimbers to the
different states of the game. The key device is that of the minimum excluded number
(or mex) of a set of natural numbers.
For the set {0, 1, 3} the mex is 2, as this is the smallest number missing from the set.
For {0, 1, 2} the mex is 3, and from {1, 2, 3} the mex is 0.
In Bachet’s game, reducing the pile to zero is a winning move, so a pile of size zero
has nimber 0.
We calculate the nimber of a position X as follows: list the nimbers of the positions you
could move to from X. The nimber of X is then defined as the mex of this list.
A pile of size 1 can only move to a pile of size 0 which has nimber 0. So the nimber for
the 1-counter position is the mex of {0} which is 1. Similarly piles of two and three
counters have nimbers 2 and 3 respectively. But from four counters, the possible
moves are to one, two or three, so the nimber will be the mex of {1, 2, 3} which is 0.
This accords with the experience of playing the game, that leaving four counters is a
winning move. Then we continue:
Counters 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Nimber 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
More complicated variants of Nim have more complex patterns of nimbers. But in each
case the nimber 0 represents a winning position, so the strategy is always to aim for
these.
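The calculation of nimbers is easy to automate. The short sketch below computes the mex and then the nimbers of piles in Bachet's game (where a move removes 1, 2 or 3 counters), reproducing the repeating pattern 0, 1, 2, 3.

    def mex(s):
        # minimum excluded natural number of a set
        m = 0
        while m in s:
            m += 1
        return m

    def bachet_nimbers(max_pile):
        nimbers = [0]                              # a pile of 0 counters has nimber 0
        for n in range(1, max_pile + 1):
            reachable = {nimbers[n - k] for k in (1, 2, 3) if k <= n}
            nimbers.append(mex(reachable))
        return nimbers

    print(bachet_nimbers(15))    # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]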
In any game of Nim, one or other player will always have a winning strategy.
Sprague–Grundy theorem In Chess and checkers, each player has their own set of
pieces, with which they battle their opponent’s. Therefore, from the same position on
the board, the moves that the two players could make are entirely different. Nim, on
the other hand, is an impartial game. The two players play with the same pieces and,
from the same position, what would count as a good move for one would be equally
good for the other. The only difference between the players, then, is who goes first. The Sprague–Grundy theorem states that any position in such an impartial game behaves exactly like a Nim pile of some size, and so can be assigned a nimber.
FIBONACCI
In his book Liber Abaci of 1202, Leonardo of Pisa, known as Fibonacci, asked how many rabbits a garden would contain after a year, if a single pair of baby rabbits is placed there at the start. To investigate, he modelled this situation, making several simplifying assumptions:
1 Rabbits do not die
2 Rabbits come in pairs
3 Rabbits have two forms: baby and adult
4 Baby rabbits cannot breed
5 Baby rabbits become adult after one month
6 Each pair of adult rabbits produces one pair of babies every month
According to these, the garden contains one pair of baby rabbits in the first month. In the second, there is one pair of adult rabbits. In the third month these have reproduced, so there is one adult pair and one new baby pair, and so on.
This sequence continues 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, and in the twelfth month
there are 144 pairs. So the answer to Fibonacci’s original question is 288, subject to his
assumptions. The sequence he had discovered became known as the Fibonacci
sequence and is one of the most famous in science.
Each term of the sequence is the sum of the two before it: Fₙ₊₂ = Fₙ₊₁ + Fₙ. In terms of rabbits, the total number of rabbits in July (Fₙ₊₂) is the number of adults
plus the number of babies. The adults are those rabbits which have been alive since
June (Fₙ₊₁). July’s baby rabbits are equal in number to their parents, namely June’s
adults, who are all of May’s rabbits (Fₙ).
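The recurrence translates directly into a few lines of code; the sketch below generates the first twelve terms and recovers the 144 pairs, or 288 rabbits, of the twelfth month.

    def fibonacci(n):
        # the first n Fibonacci numbers, starting 1, 1
        fibs = [1, 1]
        while len(fibs) < n:
            fibs.append(fibs[-1] + fibs[-2])
        return fibs[:n]

    months = fibonacci(12)
    print(months)             # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
    print(2 * months[-1])     # 288 rabbits in the twelfth month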
Fibonacci spiral On a piece of squared paper, draw a 1 × 1 square. Next to it, draw
another. Adjoining these, draw a 2 × 2 square, and then a 3 × 3. Spiralling round, you can
keep drawing squares whose sides are given by the Fibonacci sequence.
Once this is done, drawing arcs between the meeting points of the squares produces a
Fibonacci spiral, a good approximation to a logarithmic spiral.
The rabbit pairs in months 1 to 5: each adult pair gives birth to a new pair, while the same rabbits continue into the next month.
Ratios of Fibonacci numbers We can form a new sequence from the ratios of
successive Fibonacci numbers: 1/1, 2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, …
The interesting thing about this is that it converges. While the Fibonacci sequence
grows bigger and bigger without limit, this sequence of ratios gets ever closer to some
fixed number.
If a and b are successive Fibonacci numbers (a long way along the sequence), then a/b
and (a + b)/a should be very close together. This is reminiscent of the definition of the golden
section ϕ, where a line is divided into lengths a and b so that (a + b)/a = a/b. The
sequence of ratios does indeed tend to ϕ, intimately linking the two topics.
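The convergence is quick, as a short computation shows: thirty or so terms already pin the ratio down to the full precision of ordinary floating-point arithmetic.

    fibs = [1, 1]
    for _ in range(30):
        fibs.append(fibs[-1] + fibs[-2])

    ratios = [fibs[i + 1] / fibs[i] for i in range(len(fibs) - 1)]
    print(ratios[:5])     # 1.0, 2.0, 1.5, 1.666..., 1.6
    print(ratios[-1])     # about 1.618033988749895, the golden section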
Binet’s formula What is the 100th term in the Fibonacci sequence? Is there a way to
find it, without trudging through the first 99? There is: Binet’s formula expresses the nth
Fibonacci number directly as Fₙ = (ϕⁿ − ψⁿ)/√5. Here ϕ is the golden section (1 + √5)/2, and ψ = 1 − ϕ = (1 − √5)/2.
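A direct check of the formula is simple, with the caveat that ordinary floating-point arithmetic only keeps the answer exact for moderate values of n.

    from math import sqrt

    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2

    def binet(n):
        # n-th Fibonacci number via Binet's formula, rounded to the nearest integer
        return round((phi ** n - psi ** n) / sqrt(5))

    print([binet(n) for n in range(1, 13)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]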
Nevertheless the Fibonacci sequence undeniably does appear in the natural world. It is
a better model for the family trees of honeybees than for rabbits. If you ask how many
parents, grandparents or great-grandparents an individual bee has, the answers tend to
be Fibonacci numbers. (For humans, the answers are powers of 2, but the solitary
queen changes this for bees.)
It is also striking that the numbers of petals on many flowers are often Fibonacci
numbers, as are the number of spirals of fruitlets on pine-cones and pineapples.
There is certainly something in the simple iteration of the sequence which reflects
natural growth. It is what biomathematicians term a Lindenmayer system, after the
botanist Aristid Lindenmayer who first used them to model plant growth. Lindenmayer
systems are logically stripped down dynamical systems. They describe many familiar
fractals such as the Koch curve and Cantor dust, as well as producing excellent models
of plant growth.
THE GOLDEN SECTION
The golden section In Proposition 6.30 of his Elements, Euclid showed how to take a
segment of line, and divide it into two lengths, so that the ratio of the whole line to the
longer part is the same as that of the longer to the shorter. This special ratio is known
as the golden section or golden ratio. It is denoted by the Greek letter phi (ϕ) after the
sculptor Phidias who, in around 450 bc, first exploited its aesthetic qualities.
Starting with a line 1 unit long, we divide it into two lengths according to Euclid’s
rules for the golden section, resulting in a longer part a, and a shorter piece b. First we
can express a and b in terms of ϕ. Since the ratio of 1 to a is ϕ, this says that 1/a = ϕ.
So a = 1/ϕ.
This means b = 1 − 1/ϕ.
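These relationships are simple to verify numerically, as the following short sketch shows.

    from math import sqrt

    phi = (1 + sqrt(5)) / 2       # the golden section, about 1.618
    a = 1 / phi                   # the longer part of the unit line, about 0.618
    b = 1 - a                     # the shorter part, about 0.382

    print(1 / a)                  # whole-to-longer ratio: phi
    print(a / b)                  # longer-to-shorter ratio: also phi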
The golden section in the arts Phidias’ works around 450 bc, including his sculptures
for the Parthenon in Athens, probably mark the start of the relationship between the
golden section and the arts (although it is difficult to be certain). In 1509, the
mathematician Luca Pacioli published a three-volume treatise on ϕ, Divina Proportione
(‘the Divine Proportion’), with illustrations by his friend and fellow-enthusiast,
Leonardo da Vinci.
A golden rectangle is often said to be among the most aesthetically pleasing of shapes.
Psychologists have tested this claim, with conflicting results. This figure does appear
in the decorative arts (whether through conscious design or not). This happens most
explicitly in the work of the 20th-century painter Salvador Dali, and the architect Le
Corbusier, who developed a system of scales called ‘The Modulor’ based on ϕ.
Although the place of the golden section in art and architecture is assured, many
specific examples remain controversial. Egypt’s Great Pyramid of Giza, Paris’ Notre
Dame Cathedral and the work of the Italian renaissance architect Andrea Palladio are
all cases where the involvement of the golden section is a matter of dispute.
Golden rectangle A golden rectangle is one whose sides are in proportion given by the
golden section. It has an elegant defining property: if you start with a golden rectangle,
construct a square from the shorter side and remove it, what is left is a smaller golden
rectangle. Golden rectangles are often said to be the most aesthetic of figures and form
the basis of much art and architecture.
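The defining property is simple to verify with a few lines of Python (an added check, assuming the value ϕ = (1 + √5)/2 for the golden section).

```python
phi = (1 + 5 ** 0.5) / 2

long_side, short_side = phi, 1.0        # a golden rectangle
new_long, new_short = 1.0, phi - 1.0    # what remains after removing a 1-by-1 square

print(long_side / short_side)   # approx. 1.618...
print(new_long / new_short)     # approx. 1.618... -- again a golden rectangle
```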
Kepler triangle Johannes Kepler wrote: ‘Geometry has two great treasures: one is the
theorem of Pythagoras, the other the division of a line into mean and extreme ratio.
The first we may compare to a mass of gold, the second we may call a precious jewel’.
In the phrase ‘mean and extreme ratio’, Kepler was echoing Euclid’s terminology for
the golden section. In a piece of mathematical art, Kepler brought the two treasures
together by constructing a right-angled triangle whose side lengths are 1, √ϕ and ϕ.
Pythagoras’ theorem holds for these sides precisely because ϕ satisfies 1 + ϕ = ϕ².
Paper size Since the 18th century, manufacturers of writing paper have appreciated the
benefits of scalability. This has become even more important since the advent of the
computer and home-printing. If you want to print a document, you might first want to
print a draft at half size. This means that the smaller-size paper must be in the same
proportions, or similar, to the original. It would be particularly convenient if the
smaller paper was exactly half the size of the original, so that two small pages could be
printed on one large one.
But this is not true for most shapes. Starting with a square and cutting it in half
produces two rectangles, not two squares. What was wanted was a special rectangle
which, when cut in half, produces a similar rectangle. (This makes an interesting
comparison to the defining property of the golden rectangle.) Taking the shorter side as
1, and the longer as a, the requirement is that the ratio of a to 1 should be the same as
the ratio of 1 to a/2 (after the cut, the longer side of each half is 1 and its shorter side is a/2).
That is, a = 2/a, so a² = 2, and the special rectangle has sides in the ratio 1 to √2.
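The standard A series of paper sizes is built on exactly this ratio. The short check below (an added illustration, using the familiar 210 mm × 297 mm dimensions of an A4 sheet) shows that halving such a sheet preserves its proportions.

```python
short, long = 210.0, 297.0          # an A4 sheet, approximately 1 : sqrt(2)
print(long / short)                  # approx. 1.414

half_short, half_long = long / 2, short   # cut across the longer side
print(half_long / half_short)        # approx. 1.414 -- the half sheet is similar
print(2 ** 0.5)                      # sqrt(2) itself, approx. 1.4142
```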
PUZZLES AND PERPLEXITIES
Magic squares Around 2250 bc, Emperor Yu of China discovered a turtle in the
Yellow River. On its shell were some curious markings. On closer inspection, they
formed a 3 × 3 square, with the numbers 1 to 9 inside.
At least, so goes the story of the first magic square, known as the Lo Shu or Yellow
River Writing. The magic is that each row and each column add up to the same
number, 15, as do the two main diagonals.
Leaving aside rotations and reflections, the Lo Shu is the only 3 × 3 magic square. There
are 880 different 4 × 4 magic squares, including Dürer’s. These were listed in 1693 by
Bernard Frénicle de Bessy. In 1973, Richard Schroeppel calculated that there are
275,305,224 different 5 × 5 magic squares.
It is not known how to calculate the exact number of n × n magic squares, but in 1998
Pinn and Wieczerkowski used statistical methods to estimate the number of 6 × 6 magic
squares at around 1.77 × 10^19.
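In one common orientation the Lo Shu reads 4, 9, 2 / 3, 5, 7 / 8, 1, 6, and its magic property is easy to confirm by machine; the sketch below is an added illustration, not part of the legend.

```python
lo_shu = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]

totals = ([sum(row) for row in lo_shu] +                 # the three rows
          [sum(col) for col in zip(*lo_shu)] +           # the three columns
          [sum(lo_shu[i][i] for i in range(3)),          # the two diagonals
           sum(lo_shu[i][2 - i] for i in range(3))])

print(totals)   # every total is 15
```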
Dürer’s magic square In his picture Melancholia, Albrecht Dürer paid tribute to his
love of mathematics. As well as a mysterious polyhedron (known as his melancholy
octahedron), the engraving also contains Europe’s first magic square.
Dürer’s is truly a connoisseur’s square. Not only do the rows, columns and diagonals
sum to 34, but so do the four quadrants, the four central numbers, the four corners and
several other significant groupings. Even more, the central numbers on the bottom row
date the picture: 1514. Outside these are the numbers 4 and 1, alphanumeric code for D
and A, Dürer’s initials.
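The usual transcription of Dürer’s square is reproduced below, together with a short Python check of its many totals (an added illustration).

```python
durer = [[16,  3,  2, 13],
         [ 5, 10, 11,  8],
         [ 9,  6,  7, 12],
         [ 4, 15, 14,  1]]

groups = ([sum(row) for row in durer] +                             # rows
          [sum(col) for col in zip(*durer)] +                       # columns
          [sum(durer[i][i] for i in range(4)),                      # the two diagonals
           sum(durer[i][3 - i] for i in range(4)),
           sum(sum(durer[r][c] for r in (1, 2)) for c in (1, 2)),   # central four
           durer[0][0] + durer[0][3] + durer[3][0] + durer[3][3]])  # four corners

print(set(groups))   # {34}
```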
Generalized magic squares Magic squares are the oldest of all recreational
mathematics, and it is not surprising that many variations on the theme have
developed. One such is a multiplication magic square, where, instead of adding, the
numbers are multiplied together. In this case we no longer insist that the numbers are
consecutive, but they do all have to be different.
In 1955, Walter Horner found an 8 × 8 square which functions both as an ordinary magic
square and as a multiplication magic square. Another 8 × 8 addition–multiplication
magic square, as well as a 9 × 9 example, was found in 2005 by Christian Boyer. It is not
known if there are any smaller examples.
Magic cubes In 1640, Pierre Fermat lifted the principle of magic into higher
dimensions. A magic cube is a cube where all rows, columns, pillars and the four body
diagonals add up to the same number. The cube is perfect if the diagonals on each
layer also add up to the same number, meaning that the cube is built from magic
squares. It was an open question for many years what the smallest perfect magic cube
was. In 2003 Christian Boyer and Walter Trump found it: a 5 × 5 × 5 cube, built from the
numbers 1 to 125.
Needless to say, mathematicians have not stopped at dimension 3. In the 1990s John
Hendricks produced perfect magic hypercubes in dimensions 4 and 5, as well as
studying magic hypercubes in higher dimensions.
In 2003, Boyer also discovered a gigantic tetramagic cube: an 8192 × 8192 × 8192 magic
cube. Astonishingly, this remains magic (and perfect) when each entry is squared,
cubed or raised to the fourth power.
Knight’s tours In Chess, most pieces move either in horizontal, vertical or diagonal
lines. A knight’s move is the simplest move not covered by these possibilities. A
knight can move two squares forwards or backwards, and then one right or left.
Alternatively, he can move two squares left or right, and then one forward or back,
making eight possible moves.
A knight’s tour is a path that a knight can take around the chessboard, visiting each
square exactly once. A closed tour is one where the knight loops back to his starting
point. The smallest board where this is possible is the 6 × 6 board. Here, there are 9862
closed tours. Of particular interest are tours which have some level of symmetry. On
the 6 × 6 board, there are five closed tours which have rotational symmetry of order 4,
discovered by Paul de Hijo in 1882.
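The eight possible moves make knight’s-tour questions natural targets for computer search. As a small added sketch (not a reproduction of de Hijo’s method), the Python function below checks whether a proposed sequence of squares really is a closed knight’s tour.

```python
KNIGHT_MOVES = {(2, 1), (1, 2), (-1, 2), (-2, 1),
                (-2, -1), (-1, -2), (1, -2), (2, -1)}

def is_closed_tour(tour, size):
    # 'tour' is a list of (row, column) squares on a size-by-size board.
    # It must visit every square exactly once, by knight's moves, and the
    # final square must be a knight's move from the first.
    if sorted(tour) != [(r, c) for r in range(size) for c in range(size)]:
        return False
    pairs = zip(tour, tour[1:] + tour[:1])    # consecutive squares, wrapping round
    return all((r2 - r1, c2 - c1) in KNIGHT_MOVES
               for (r1, c1), (r2, c2) in pairs)
```

A backtracking search built on the same list of moves will find closed tours on small boards, though counting all of them, as later computer searches did, takes rather more patience.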
Magic knight’s tours In a knight’s tour, if we label the knight’s starting square 1, the
next 2, and so on, then we can hope that not only will the knight visit each square
exactly once, but his trail of numbers will form a magic square.
The first magic knight’s tour was discovered in 1848 by William Beverley, on a
standard 8 × 8 chessboard. In 2003, Stertenbrink, Meyrignac and Mackay used a
computer to show that there are exactly 140 such tours. None of these, however, are
perfect magic squares as their diagonals do not sum to the same totals as the rows and
columns; there is no perfect magic knight’s tour on a standard chessboard.
On a 12 × 12 board, though, there are perfect magic tours, including an example
discovered by Awani Kumar. Recently Kumar has extended the problem into
higher dimensions, and discovered magic knight’s tours of cubes, and even of
hypercubes in up to five dimensions.
Latin squares Despite their name, Latin squares originate in the medieval Islamic
world, where they were considered mystic and engraved on amulets. The appeal is in
their simplicity and symmetry. To create one, fill a 3 × 3 grid with the numbers 1, 2, 3 so
that each number appears exactly once in each column and each row. The challenge
extends to 4 × 4, 5 × 5 and all n × n squares.
These squares were picked up by Leonhard Euler who considered them ‘a new type of
magic square’, and they are of genuine mathematical significance. The Cayley table of
a finite group is a Latin square, for example (although the converse is not always true).
A Latin square is reduced if the first row and first column both consist of 1, 2, 3, …, n
in the correct order. Every Latin square can be reduced by swapping around its
columns and rows. There is just one reduced 2 × 2 Latin square, and similarly one 3 × 3
example. There are four distinct reduced 4 × 4 Latin squares, and, as Leonhard Euler
showed, 56 reduced 5 × 5 examples. In 1900, Gaston Tarry showed that there are 9408
reduced 6 × 6 squares, which led him to the solution of the 36 officers problem.
There are several variations on the Latin theme, most notably Sudoku and Euler’s
Graeco-Latin squares.
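For small sizes these counts can be recovered by brute force; the Python sketch below (added for illustration, and hopeless beyond about 5 × 5) fixes the first row and column and tries every possibility for the rest.

```python
from itertools import permutations, product

def reduced_latin_squares(n):
    first_row = tuple(range(1, n + 1))
    # Each later row must start with its own row number and be a permutation
    # of 1..n; we then keep only the arrangements whose columns also work.
    row_choices = [[(r,) + rest
                    for rest in permutations(x for x in first_row if x != r)]
                   for r in range(2, n + 1)]
    count = 0
    for rows in product(*row_choices):
        square = [first_row] + list(rows)
        if all(len(set(column)) == n for column in zip(*square)):
            count += 1
    return count

print([reduced_latin_squares(n) for n in (2, 3, 4, 5)])   # [1, 1, 4, 56]
```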
Sudoku Sudoku was first dreamt up in 1979 by Howard Garns in New York, and
published in Dell Pencil Puzzles and Word Games magazine under the name ‘Number
Place’. Sudokus then became fashionable in Japan when published by Nikoli puzzle
magazines, where they picked up their current name, an abbreviation of ‘Suuji wa
dokushin ni kagiru’, meaning ‘numbers should be unmarried’. Since then, they have
achieved worldwide popularity.
Underlying the puzzle is a 9 × 9 Latin square, into which the digits 1 to 9 must be
written, each appearing exactly once in each row and column. The extra rule comes
from the grid being subdivided into nine 3 × 3 blocks. Each of these too must contain the
numbers 1 to 9.
The Sudoku begins with a few numbers already in place: these are the clues. The
challenge is to complete the whole grid. Importantly, it is designed to have a unique
solution. Typically this can be arrived at by a process of elimination, step by step.
More difficult puzzles might present the solver with a choice of ways forward, both of
which need to be investigated in greater depth before either can be ruled out.
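A computer can of course carry out the elimination by brute force. Here is a minimal backtracking solver in Python, given by way of illustration; it is not how human setters or solvers work, but it will complete (or report as impossible) any grid handed to it, with 0 standing for an empty cell.

```python
def allowed(grid, r, c, digit):
    # May 'digit' legally be placed at row r, column c?
    if digit in grid[r]:
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)   # top-left corner of the 3-by-3 block
    return all(grid[br + i][bc + j] != digit for i in range(3) for j in range(3))

def solve(grid):
    # Fill the first empty cell with each legal digit in turn, recursing;
    # undo the choice and backtrack whenever a dead end is reached.
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if allowed(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False
    return True   # no empty cells remain: the grid is complete
```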
Sudoku clues Setters of Sudoku need to ensure that their puzzle has a solution, and that
it only has one solution. This is a classic existence and uniqueness problem. For
existence, of course an empty grid (that is 0 clues) has a solution, as does a completed
puzzle (81 clues). To ensure that your puzzle has a solution, you just need to avoid
contradictory configurations.
It is a thornier problem to guarantee uniqueness. A basic question is: how many clues
are required (that is, how many numbers present at the start)? Surprisingly, the answer
is not known. The lowest number of clues which is known to generate a unique Sudoku
is 17, and it is widely suspected that this is the lowest possible answer.
The 36 officers problem Leonhard Euler considered 36 army officers, from six
different regiments, and of six different ranks. The question he posed was whether it
was possible to arrange these soldiers in a 6 × 6 grid so that each rank and regiment
appears exactly once in each row and column.
What this amounts to is finding a 6 × 6 Graeco-Latin square. Euler wrote that ‘after
spending much effort to resolve this problem, we must acknowledge that such an
arrangement is absolutely impossible, though we cannot give a rigorous proof’. In
1901 Gaston Tarry listed the 9408 reduced 6 × 6 Latin squares, and showed that there
was no way to combine any two of them without some pair being repeated, thereby
proving Euler’s conjecture.
Graeco-Latin Squares Leonhard Euler was interested in ways to put Latin squares
together. For example, can we form a 3 × 3 Latin square with the symbols 1, 2, 3 and
another with A, B, C, and then put them together so that no two cells contain the
same pair of symbols? If so, the result is a Graeco-Latin square.
The answer in this case is yes. But if you try the same thing for 2 × 2 squares, you will
not succeed. Euler conjectured that no Graeco-Latin square can exist for squares of
side 2, 6, 10, 14, 18, and so on. He was correct for 2 and for 6, as evidenced by Tarry’s
solution to the 36 officers problem. But in 1959 Parker, Bose and Shrikhande (known
as ‘Euler’s spoilers’) did create a Graeco-Latin square of side 10, and showed how to
construct one of sides 14, 18, … and so on, refuting Euler’s conjecture.
The name comes from the fact that Euler used the Latin and Greek alphabets for the
two labellings. They have broad applications for producing optimal matchings between
different sets of objects, such as sports contests and experiment design.
Sports contests and experiment design
Suppose two teams of five tennis players have a contest in which each player plays
every player from the other team.
Labelling the players from the first team A, B, C, D, E, and giving those from the second
team labels of their own, an optimal schedule of matches is provided by a 5 × 5
Graeco-Latin square, one construction of which is sketched below.
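For a prime side length such as 5, a Graeco-Latin square is easy to manufacture; the Python sketch below is an added example, using one standard construction rather than any particular published schedule, and confirms that all 25 pairs of labels are different.

```python
n = 5   # a prime size, for which this simple construction always works

latin = [[(i + j) % n + 1 for j in range(n)] for i in range(n)]                  # 1..5
greek = [[chr(ord("A") + (i + 2 * j) % n) for j in range(n)] for i in range(n)]  # A..E

pairs = {(latin[i][j], greek[i][j]) for i in range(n) for j in range(n)}
print(len(pairs))   # 25 -- every pairing occurs exactly once
```

Reading the rows as rounds and the columns as courts, the entry in row i and column j pairs one player from each team, and orthogonality guarantees that every possible pairing happens exactly once.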
Gardner’s logician Martin Gardner, who died in 2010, was the world’s foremost expert
on recreational mathematics. In over 65 books, and 25 years of columns for the
Scientific American magazine, he brought to public attention a wealth of brain-bending
puzzles, delightful curiosities, and deep mathematics including Penrose tilings, fractals
and public key cryptography. In his earliest columns for the Scientific American,
Gardner introduced a fictitious logician whose exploits have become classic.
The logician is travelling on an island inhabited by two tribes, one of which always
lies, while the other always tells the truth. He is walking to a village but comes to a
fork in the road. Not knowing which path to take, he consults a local man resting under
a tree nearby. Unfortunately, he cannot tell whether the local belongs to the lying tribe
or the truthful tribe. Nevertheless, he asks a single question, and from the answer he
knows which way to go. What question could he have asked?
The unexpected hanging A logician was condemned to be hanged. The judge informed
him that the sentence was to be carried out at noon, one day next week between
Monday and Friday. But it would be unexpected: he would not know which day it
would be, before it happened. In his cell, contemplating his doom, the logician
reasoned as follows:
‘Friday is the final day available for my hanging. If I am still alive on Thursday
afternoon, then I can be certain that Friday must be the day. Since it is to be
unexpected, this is impossible. So I can rule out Friday.
‘Therefore Thursday is the last possible day on which the sentence can be carried out.
If I am still here on Wednesday afternoon, I must expect to die on Thursday. Again,
this conflicts with the execution’s unexpectedness, and is therefore impossible. I can
rule out Thursday.’
Repeating the same argument, the logician was able to rule out Wednesday, Tuesday,
and Monday, and went to sleep slightly cheered. On Tuesday morning the hangman
arrived at his cell, completely unexpectedly. As he stood on the gallows, the logician
reflected that the dreadful sentence was being carried out exactly as the judge had
promised.
Logic and reality The unexpected hanging is one of the most troubling of all logical
paradoxes, as it separates pure deduction from real life in a very disconcerting fashion.
As the philosopher Michael Scriven put it ‘The logician goes pathetically through the
motions that have always worked the spell before, but somehow the monster, Reality,
has missed the point and advances still’. The paradox is of uncertain origin, but has
been contemplated by logicians and philosophers including Kurt Gödel and Willard
Quine. It entered popular consciousness in Martin Gardner’s 1969 book The
Unexpected Hanging and Other Mathematical Diversions.
The impossible sentence To analyse the core of the paradox of the unexpected hanging,
it is useful to shorten the judge’s sentence. What happens if we reduce it to a single
day? In this case, the sentence amounts to ‘You will be hanged on Monday, but you do
not know that’. This is already problematic. If the logician can be certain that he will
be hanged on Monday, then this knowledge renders the judge’s statement false. But if
the judge’s statement is false, this removes his basis for believing he will be hanged on
Monday at all.
As with the liar paradox, the logician is unable to accept the judge’s statement at face
value, since it imposes conflicting restrictions about what is true, and what he can
know to be true. But while the liar paradox is a perpetual absurdity, the passing of time
changes the situation for the doomed logician, and may reveal the judge to have
spoken with complete truth all along.
The bald man paradox This is not a genuine mathematical result, but a warning that
mathematics and ordinary language do not always mix easily. It uses mathematical
induction to ‘prove’ something untrue: that every man is bald.
The base case is a man with 0 hairs on his head: he is self-evidently bald. The
inductive step is to suppose that a man with n hairs on his head is bald. Then the next
man, who has n + 1 hairs on his head, is either bald or not. But one single hair cannot be
the difference between baldness and hirsuteness. So the man with n + 1 hairs must be
bald too. Hence, by induction, a man with any number of hairs on his head is bald.
The resolution of the paradox comes from the fact that ‘bald’ is not a rigorously
defined term: men are more or less bald, rather than rigidly either bald or not. A man
with no hair is indisputably bald, but as individual hairs are added to his head, he
becomes incrementally less bald, until around 100,000, when he is no longer bald at
all.
The 1089 puzzle Write down any 3-digit number; the only rule is that it must not read
the same forwards as backwards (not 474, for example). I will pick 621. First, reverse
the digits: 126. Next, subtract the smaller of these two from the larger: 621 − 126 = 495.
Now reverse the digits of this new number: 594. Add the two new numbers together:
495 + 594 = 1089.
Whatever number you start with, the final answer is always 1089. With a bit of
panache this can be exploited to give the illusion of psychic powers. (The only place to
be careful is to make sure you always treat the numbers as 3-digit numbers. So if you
get 099 after the second stage, this should be reversed to give 990.)
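Sceptics can let a computer run through every admissible starting number; the following added Python check confirms that 1089 is the only possible outcome.

```python
def trick(n):
    # Treat n as a three-digit string throughout, so 99 is written 099.
    reverse = int(str(n).zfill(3)[::-1])
    difference = abs(n - reverse)
    return difference + int(str(difference).zfill(3)[::-1])

outcomes = {trick(n) for n in range(100, 1000) if str(n) != str(n)[::-1]}
print(outcomes)   # {1089}
```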
The 1089 theorem The 1089 problem relies on a quirk of the decimal system. Why
does it work? If we start with a number abc, this really means 100a + 10b + c. If we then
subtract cba (which is really 100c + 10b + a), we get 99a − 99c. The important thing
about this number is that it is a multiple of 99. Such numbers are of the form d9e,
where d + e = 9. (This is the divisibility test for 99, similar to that for 9.) Now, when you
add d9e + e9d, the units (or ones) digit will be 9 (because d + e = 9). Then the two 9s in
the tens column produce 8, with 1 to carry. So in the hundreds column we have
d + e + 1 = 10, and the grand total is always 1089.
Hailstone numbers Pick a whole number, any whole number. We will apply the
following rule: if the number is even, halve it; if it is odd, triple it and add 1. So,
starting with 3, we first move to 10. Applying the rule again takes us to 5, and from
there to 16. Then we go 8, 4, 2, 1. Once we hit 1 we stop. We call 3 a hailstone number
because it eventually falls to the ground, that is to say, it ends up at 1.
Some numbers have more complicated sequences. Starting at 7 we get: 7, 22, 11, 34,
17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1. So 7 is also a hailstone number.
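The rule is easily mechanized; the short Python function below (an added sketch, which of course assumes the sequence really does reach 1) reproduces the hailstone sequence for any starting number.

```python
def hailstone(n):
    # Apply the rule -- halve if even, triple and add 1 if odd -- until reaching 1.
    sequence = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

print(hailstone(7))
# [7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```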
The question Lothar Collatz posed in 1937 is: is every number a hailstone?
It would be false if one of two things happened: some sequence ends up not at 1, but in
an infinitely repeating cycle, or some sequence simply grows and grows for ever.
In 1985, Jeffrey Lagarias proved that if there is a cycle, then it must have length at
least 275,000. The question as a whole remains wide open, but in 2009, Tomás
Oliveira e Silva verified it by computer for all numbers up to 5.76 × 10^18.
Author Acknowledgements
I would like to thank Dugald Macpherson, Jessica Meyer, Ben Abramson, Matthew
Daws, Tom Lehrer, Nic Infante, George Barmpalias, David Pauksztello, Mark Ryten,
Peter Tallack, David Knapp, Anne Oddy, Elizabeth Meenan, Michael Gurney, Mark
Reed, Wayne Davies, Emma Heyworth-Dunn, Mairi Sutherland, Haruka Okura Elwes,
Jessica Elwes, Colin and Jessica Russell, as well as the staff and students at the
University of Leeds, Dixons City Academy, and Calder High School.