
Foundations of Mathematics

Martin Ward

March 22, 2019

Contents

Contents i

I LOGIC 3

1 FORMAL THEORIES 5
A Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
B Formal theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
C Comparing formal theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2 SENTENTIAL LOGIC 31
A A language for Sentential Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 31
B Examples of formal proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
C Deductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
D Three “it follows that” relationships . . . . . . . . . . . . . . . . . . . . . . . . 49
E Subdeductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
F Some theorems and deductions . . . . . . . . . . . . . . . . . . . . . . . . . . 55
G Decidability of Sentential Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 73
H Independence of the axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
I Other axiomatisations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

3 PREDICATE LOGIC AND FIRST-ORDER THEORIES 89


A A language for first-order theories . . . . . . . . . . . . . . . . . . . . . . . . . 89
B Deduction and subdeduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
C Using theorems and deductions in proofs . . . . . . . . . . . . . . . . . . . . . 119
D Some theorems and metatheorems . . . . . . . . . . . . . . . . . . . . . . . . 121
E Quantifying non-free variable symbols . . . . . . . . . . . . . . . . . . . . . . 124
F Deductions involving choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
G Qualified quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
H Consistency of PL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
I First-order theories compared with PL . . . . . . . . . . . . . . . . . . . . . . 137
J Finitely-axiomatisable first-order theories . . . . . . . . . . . . . . . . . . . . 138

4 SOME FIRST-ORDER THEORIES 139


A First-order theories with equality . . . . . . . . . . . . . . . . . . . . . . . . . 139
B Projective planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
C Peano Arithmetic (PA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


D A stripped-down version of elementary arithmetic (RA) . . . . . . . . . . . . 161


E The Elementary Theory of Groups . . . . . . . . . . . . . . . . . . . . . . . . 168
F Unbounded dense linear orders (UDLO) . . . . . . . . . . . . . . . . . . . . . 170
G Functions versus relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

5 MODELS 185
A Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
B Interpretations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
C Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
D Models which respect equality . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
E Example: more models of the projective plane . . . . . . . . . . . . . . . . . . 208
F PL is not complete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
G Example: Models for RA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

II MATHEMATICS 215

6 MORSE-KELLEY SET THEORY 217


A Morse-Kelley Axioms for Set Theory . . . . . . . . . . . . . . . . . . . . . . . 217
B The first five axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
C Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
D The Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
E Well-ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
F The Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
G Zorn’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
H The last two axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

7 TRANSFINITE ARITHMETIC 287


A Ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
B Induction and ordinals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
C Ordinal functions and continuity . . . . . . . . . . . . . . . . . . . . . . . . . 302
D Arithmetic of ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 308
E Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
F Cardinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

8 MODEL THEORY 347


A Structures, interpretations, models again . . . . . . . . . . . . . . . . . . . . . 347
B Adequacy of first-order theories . . . . . . . . . . . . . . . . . . . . . . . . . . 350
C Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370

III COMPUTABILITY 375

9 RECURSIVE FUNCTIONS 377


A Partial recursive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
B Primitive recursive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
C Specifying algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
D The Ackermann function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
E Two fundamental theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418

F Some important negative results . . . . . . . . . . . . . . . . . . . . . . . . . 427


G Recursively enumerable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434

10 GÖDEL’S THEOREM 439


A Gödel numbering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
B Expressibility and representability . . . . . . . . . . . . . . . . . . . . . . . . 444
C Gödel’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
D PL is not decidable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

APPENDICES 460

A CONSTRUCTION OF THE BASIC NUMBER SYSTEMS 461


A Quotient sets and algebraic operations . . . . . . . . . . . . . . . . . . . . . . 461
B The Natural Numbers, N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
C The Integers, Z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
D The Rationals, Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
E The Reals, R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
F The Complex Numbers, C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

B ZF SET THEORY 485


A Zermelo-Fraenkel Axioms for Set Theory . . . . . . . . . . . . . . . . . . . . . 485
B The first six axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
C The Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
D Well-ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
E The Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
F The Axioms of Foundation and Formation . . . . . . . . . . . . . . . . . . . . 490
G Ordinal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491

C GENERAL ALGORITHMS 493


A Defining a general algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
B Some useful facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
C A more convenient notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
D An important workspace structure . . . . . . . . . . . . . . . . . . . . . . . . 527
E Algorithmic ⇒ partial recursive . . . . . . . . . . . . . . . . . . . . . . . . . . 537
F Turing machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554

D SOME ALGORITHMS 557


A Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
B Algorithms for Gödel’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 559

E THE EXPONENTIAL FUNCTION IN PA 581

F INDEX 585

Preface
Please read this: it won’t take long.
In these notes I use coloured text a lot. Everything makes perfect sense without the colour-
coding, however it is there to make it easy to see what kind of text it is; I hope that will
make the notes easier to understand. Here is how the colours will be used . . .
The plain black text like this is the serious core of the book. It is most easily explained as
being what the other-coloured text is not.
The text like this is for informal discussion and comments. It will contain, for instance,
suggestions of useful ways to visualise things, informal outlines of proofs, warnings about
easily-made mistakes and so on. In short, the sort of overview which I would normally
provide in lectures. In a number of places I mention mathematical results that will not be
established until later in the notes. So be aware that anything in this colour which looks like
a mathematical statement with or without proof cannot be used seriously until it is actually
proved (in another colour) later in the notes.

The green like this is used for expressions in the various formal languages we will be studying.
(That’s apart from this paragraph of course.) Such expressions will usually be mostly
symbols. This usage will be explained properly when it becomes relevant.
Anything in blue like this is a hot link to another place in the notes. Hot links are usually
in the margin where they won’t get in the way of the text.
Part I

LOGIC

1. FORMAL THEORIES

A Formal languages
A.1 Discussion: The idea of a formal language
Let us suppose we have decided to develop a self-contained theory of the natural numbers,
treating the subject with full mathematical rigour. Our writings would have informal discur-
sive portions, which would be written in English (or whatever language we choose) peppered
with some mathematical technical terms, and there would be highly formal portions, such
as equations, proofs and so on, which would not look like a natural language at all. To
be entirely rigorous, the full logical development of the theory should be contained in the
formally written part, which would consist for the most part of definitions and proofs. The
discursive portions would be limited to discussion of motivation, pointing out interesting
connections, history and so on and could be dispensed with without affecting the logical
structure of the theory. Our focus in this book will be on the formal language (which is all
that is logically necessary).

To treat the natural numbers in this formal way, we would need some symbols: some letters
to refer to variables, some numeric digits such as 0 and 1, and some logical and mathematical
symbols such as +, =, ≤, ∀ and so on; perhaps quite a short list. (We will have good reason
to do exactly this in Section 4.C and the list is indeed short.)

The statements we make in our formal language will consist of strings of these symbols.
Some strings will be just plain rubbish, such as

)∀ + (=== .

Of course we hope we will not write such strings down in earnest. The kind of strings we
will be writing are ones which are meaningful, which we will call expressions, for instance

(∀x) ( x + 1 > x )
(∀x) ( x + y > x )
2+2=5

While an expression must make sense, it need not be true. The first example above is true,
the last one is false and the middle one occupies an intermediate position — its truth or
falsity depends upon the value of y.
We expect that it should be a fairly simple matter to decide whether a string is an expression
or not. In most areas of mathematics you only have to glance at an equation to see if it is
well-formed or not (assuming you are already familiar with the notation used in that area).
You might have trouble with a very complicated equation which extended over several lines,
and then you could engage in a more careful procedure — checking that parentheses match
and so on. In short, one has an algorithm (a straightforward computational procedure) for
determining whether a string is an expression or not; one hopes it is simple enough that it
can be applied to short expressions very easily, even subconsciously.
Putting this together, we have a formal language. In general, a formal language consists of

• A defined set of symbols, which we will call its alphabet, even though it is likely to
contain symbols other than ordinary letters.

We will be interested in strings, by which we mean strings of members of the alphabet.

• A defined set of expressions, which is a subset of the set of all strings. (Expressions
are sometimes called well-formed formulas or just wffs.)

• An algorithm which will decide whether a string is an expression or not.
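To make these three ingredients concrete, here is a deliberately tiny example, sketched in Python. The language itself is invented for illustration (it is not one of the languages studied in these notes): the alphabet is the ten digits together with + and =, and the expressions are strings such as 2+2=5, well-formed whether or not they are true.

    import re

    # Alphabet: a finite set of symbols.
    ALPHABET = set("0123456789+=")

    # Expressions: sums of numerals equated to a numeral, e.g. "2+2=4".
    EXPRESSION = re.compile(r"\d+(\+\d+)*=\d+")

    def is_expression(s):
        """The deciding algorithm: a string is an expression exactly when
        it uses only alphabet symbols and matches the pattern."""
        return set(s) <= ALPHABET and EXPRESSION.fullmatch(s) is not None

    print(is_expression("2+2=5"))   # True:  an expression, though a false one
    print(is_expression(")++=42"))  # False: uses a symbol outside the alphabet
    print(is_expression("++1=2"))   # False: right symbols, wrong shape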

Things become much more interesting, perhaps useful, when amongst the expressions we
have a distinguished subset which we consider “good” in some sense: they are the important
centre of our investigation when using this language. If the language is designed to do
mathematics or logic, then these “good” expressions will probably be “theorems”, in the
sense that they can be derived from a given set of “axioms” by application of some well-
defined set of rules. When we have this extra structure, we have an “axiomatic theory”.

So, in general, an axiomatic theory consists of

• A formal language, as described above, and also —

• A defined set of axioms, which is a subset of the set of expressions of the language.

• A well-defined set of deductive rules, which allow new expressions to be “derived” or
“deduced” from old ones.

• This then defines the set of theorems, another subset of the set of expressions, as those
expressions which can be derived from the given axioms by the given rules.

Some of these ideas, especially how theorems can be derived using rules, need to be explained
more carefully. This will be done later in this chapter.
Returning to our number theory example, our theorems should be the expressions which say
something that is in fact true, or at least a big subset of them. We could start with Peano’s
Axioms and give a sufficient set of deductive rules to derive the theorems we want. Figuring
out what these rules should be is of course a big undertaking, and one with a long history.

[ In these notes, the emphasis will be on topics in mathematics and logic, however it
is worth pointing out that these ideas and methods can be of considerable interest in
other areas. It is possible, for instance, to define a language in which the expressions
specify four-part musical pieces and then the theorems are those which specify pieces
which obey the rules of classical (common practice) harmony. For another example,
we might define a language in which the expressions specify positions in the game of
Chess and the theorems are those which specify the winning positions. For perhaps a
more obvious example, the language could be a computer language, C for instance; in
this case the expressions would be programs obeying the syntax of the language and
the theorems those programs which do not crash (in some sense which we would have
to specify carefully), for instance the well-known “Hello world”. The expressions in
this language (= programs) tend to be long and very complicated and the algorithm to
distinguish expressions from non-expressions correspondingly intricate: this is (part
of) what a compiler does.]

It should be pointed out that the description given above of an axiomatic theory might give
the impression that one starts with a formal language with axioms and deductive rules and
then figures out what theorems follow from that. In fact, from a historical perspective, this
is quite back-to-front. Most important formal theories in use started from a body of known
facts that was crying out for codification, a language was invented to describe them and then
the difficult process of discovering axioms and rules which will distinguish the required facts
from the dross was undertaken. As one obvious example, the important general facts about
the natural numbers were well known for thousands of years before Giuseppe Peano came
up with his axioms — but others abound: the invention of group theory to investigate the
properties of symmetry, the long drawn out invention of axiomatic bases for mathematics
which more or less started with Bertrand Russell’s Principia Mathematica and the more
recent invention of Category Theory to (amongst other things) codify properties in common
between diverse mathematical theories. Although this approach is obviously important, we
will not be following it in these notes; rather we are interested in the properties of formal
theories in general, and several particularly important ones, leading up to the axiomatisation
of mathematics itself. By and large our axiomatic theories will be taken ready-made. There
is quite enough here to keep us busy!
We will however from time to time want to consider such ideas as

• What different sets of axioms might there be for a given theory?

• What happens if we add extra axioms to a given theory?

• Under what circumstances might one theory be stronger (say more) than another
theory (and what does that mean anyway)?

To discuss such ideas, we need the notion of a “theory” which does not necessarily involve
the notion of axioms for that theory. One property that a theory really should have is that
it should logically hold together. We can make this idea more clear-cut by stating it thus:
assuming we already have a language and a set of deductive rules,
Any expression which can be derived (using the given rules) from expressions
in the theory must also be in the theory.

This property could be conveniently expressed “A theory must be closed under the deductive
rules”. This idea turns out to work very nicely; we will define a theory to be a set of
expressions which is closed under the deductive rules in this way. (The definition assumes
that we have already decided on a language and set of deductive rules.) Here is why this
definition is nice: we will be able to prove

(1) Any axiomatic theory (that is, any set of statements that can be derived from a set
of axioms as described above) is in fact a theory (as just defined), so we get no ambiguity
of words.
(2) Any theory (as just defined) is axiomatic, that is, there does exist a set of axioms
from which that theory can be derived.

So the neat way to proceed will be to start with the idea of a theory in general, then define
an axiomatic one.
Some examples of theories:

• The natural numbers (axiomatic — given by Peano’s axioms).

• Finite set theory (facts about finite sets. What would the axioms be?).

• Real numbers (facts about real numbers. Something like the “axioms” given in ele-
mentary calculus textbooks?).

• “All” of mathematics (axiomatisations are MK and ZF, which we will discuss in detail later).

• Simple formal theories such as the one described in the next section.

When we come to investigating a particular language, for instance that for Mathematics, we
will define it completely formally. We do this because we are interested in proving things
about mathematical proofs, provability, and theories in general, some of them subtle and
some surprising so we need a firm basis to work from. This means that we must idealise
our language somewhat — it must differ to some extent from common usage (because
common usage, even in mathematical texts, is quite sloppy). Given that we are going to do
this, we may as well idealise it in a way which is convenient for our present purposes. It
should be understood however that, whatever we do, anything correctly stated in ordinary
mathematical discourse should be translatable into our formalised version of the language
and vice versa. Having granted ourselves this freedom, we will define the formal language
as simply as possible. This will mean that it is (relatively) easy to prove things about it.
On the other hand, as we will see, it will be very difficult to read and most impracticable
to use. We will usually employ a number of abbreviations to make it easier to understand
— thus creating a “user friendly”, semi-formal version of the language.
Many formal theories embed logic in some sense. There are many different logics and
many ways they could be incorporated into a theory, however two forms of logic stand
out as most important: Sentential (or Propositional) Logic and Predicate (or First-Order)
Logic. Sentential Logic is concerned with sentences (otherwise known as propositions or
statements), that is, expressions which are either true or false (but not both) and the
logical relations between them, the most common being negation, conjunction, disjunction,
implication and equivalence. Predicate Logic includes all of this and adds the quantifiers
(“For all x . . . ” and “there exists x such that . . . ”). That, of course, was a rough description,
not a definition.
Sentential and Predicate Logic will be described in detail in the next two chapters, however
to continue these introductory remarks, let us assume that we now know what Predicate
Logic is (at least roughly), and we propose to incorporate it into some formal theory. There
are several ways we could do this. For a start, there is a choice of symbols: we do not need
all of ¬ ∧ ∨ ⇒ ⇔ ∀ ∃ , because some of these symbols can be defined in terms of the
others. When we have chosen our symbols and the rules for forming expressions, we have
some choices about the sets of axioms and rules. If you take the trouble to look up some
textbooks on logic, or just look the topic up on the internet, you will discover that there are
many different ways of axiomatising both sentential and predicate logic. Of course, all these
different axiomatisations should be equivalent, in the sense that they give rise to the same
theory. To a large extent, the choice of axiomatisation is simply a matter of convenience:
any choice will be acceptable provided it yields a language and theory which is equivalent to
“standard” Predicate Logic. (In Section C.4 we give a careful definition of what we mean
here by “equivalence”). In these notes we adopt a particularly simple and fairly common
axiomatisation of these logics.

You will have noticed that above I wrote “should be equivalent”. It is an unfortunate
fact that some authors have slightly different ideas as to what Predicate Logic actually
is, so their axiomatisations might give rise to a different set of theorems. Possibly the
most common divergence is that some authors incorporate the notion of equality into
their logic, whereas in these notes it is treated as a separate idea. Even more unfortu-
nately, a number of treatments contain errors, either large or small, the most common
being errors of omission. (Once you have worked through to the end of the chapter on
First-order Theories, you might find it interesting to look up some other expositions
and see how they deal with the subtle but important topics of acceptability, binding
and universal generalisation.)

Suppose now we have some formal theory in which we are interested. Here are some funda-
mental questions we might ask about it:

• Is the theory decidable? That is, is there an algorithm which will decide whether any
given expression is a theorem or not?

• What sort of interpretations does it have, perhaps quite different from whatever led
us to think it up in the first place?

Those questions can be asked whether the theory has logic embedded in it or not. Now
suppose that the theory contains (at least) sentential logic.

• Is the theory consistent? A theory is inconsistent if it contains two theorems of the
forms P and NOT–P (in whatever symbology the language employs), in other words,
two theorems which flatly contradict each other. In an inconsistent theory, it would
then follow from the rules of Sentential Logic that every expression is a theorem (and
therefore so would the negation of every expression), and so the whole thing becomes
trivial. Indeed, worse than that, a theory which contained such theorems would also
contain (one meaning) P AND NOT P , and so would violate the spirit of sentential
logic anyway. In short, we really do not want any of our theories to be inconsistent.

• Is the theory complete? A theory is complete if, given any sentence P , either P or
NOT–P is a theorem. Since a sentence (in the sense we will use the term) is meant
to represent some proposition which is either true or false, one might hope that the
theory was strong enough to say which of these was the case: completeness means that
this is so.

• Is the theory categorical? A theory is categorical if all interpretations of it are iso-
morphic. If we have an axiomatisation of the Natural Numbers, for instance, we could
imagine building several different models of this theory, but we would hope that the
differences between the different versions would be trivial, that the axiomatisation had
indeed captured all the structure of the Natural Numbers. (In Sections 8.B.17 and
8.C.3 we will see, alarmingly, that this is not so).
tise Classical Harmony for music in such a way that it would yield normal harmony
in the standard twelve-note scale and something quite different in a fifteen-note scale
(with the same axioms) — this would be not categorical. By the second example I am
hoping to suggest that failure to be categorical, in some cases, may be a good thing.

A.2 Example: A "toy" axiomatic theory


In this theory there are three symbols: i a b
All nonempty strings are expressions. (This is unusual amongst languages, but it certainly
makes it easy to decide whether a string is an expression or not.)

There is only one axiom, namely i


There are four rules:
(1) If an expression contains the symbol i, then it may be removed (provided the result
is not empty). Conversely an i may be inserted anywhere.

XiY ↔ XY .

(2) The string aaa may be replaced by i anywhere, and vice versa. That is

XaaaY ↔ XiY .

(3) And the same for bbb:


XbbbY ↔ XiY .

(4) The string abab may be replaced by bbaa anywhere, and vice versa:

XababY ↔ XbbaaY .

(In these rules, X and Y stand for any subexpressions, possibly empty — though they may
not both be empty in (1).)
An expression is part of the theory if it can be “deduced” from the axiom by a sequence of
none or more of the rules. For example, we could have a sequence

i → aaa by Rule 2
→ iaaa by Rule 1
→ bbbaaa by Rule 3
→ bababa by Rule 4

which would “prove” that bababa is a theorem.


In discussing this theory, it is very tempting to use notations such as a³ and (ab)². There
is no harm in doing this, so long as one remembers that such notation is not part of the
formal language we are discussing, but rather just a convenient abbreviation to use when
discussing it. We say that it is not part of the formal language, but part of a semi-formal
version of it.
Using these abbreviations, the rules become a bit easier to read:

XiY ↔ XY            Rule 1
Xa³Y ↔ XiY          Rule 2
Xb³Y ↔ XiY          Rule 3
X(ab)²Y ↔ Xb²a²Y    Rule 4

and the example proof:

i → a³ → ia³ → b³a³ → (ba)³

Here are some questions to try. I believe that they are in increasing order of difficulty.

(Q1) Is (ba²b²aba)⁴ a theorem?

(Q2) Is (abab²)³ a theorem?

(Q3) Is (ab²)³ a theorem?


The point of these questions is to illustrate that, even with quite a simple theory, the
decidability question may not be at all straightforward.

And this leads to the more general . . .

(Q4) Is this theory decidable or not? In other words, is there an algorithm (a routine
computational process) which will decide, for any expression of the language, whether it is
a member of the theory or not?
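Although (Q4) is left open here, the search for derivations can at least be mechanised. The sketch below (Python; all names are mine) runs a breadth-first search from the axiom i, applying the four rules in both directions. Because Rule 1 can insert an i anywhere, the space of strings is infinite, so the search is cut off at a maximum string length: a True answer exhibits a genuine derivation, while a False answer only means that none was found within the bound, which is exactly the gap that makes (Q4) non-trivial.

    from collections import deque

    # The rules as bidirectional substring rewrites: (lhs, rhs) means that an
    # occurrence of lhs may be replaced by rhs.  ("", "i") is the insertion
    # direction of Rule 1.
    REWRITES = [("i", ""), ("", "i"), ("aaa", "i"), ("i", "aaa"),
                ("bbb", "i"), ("i", "bbb"), ("abab", "bbaa"), ("bbaa", "abab")]

    def neighbours(s):
        """All strings reachable from s by one application of one rule."""
        out = set()
        for lhs, rhs in REWRITES:
            if lhs == "":                   # insert an i anywhere
                for k in range(len(s) + 1):
                    out.add(s[:k] + "i" + s[k:])
            else:
                k = s.find(lhs)
                while k != -1:
                    t = s[:k] + rhs + s[k + len(lhs):]
                    if t:                   # removal may not empty the string
                        out.add(t)
                    k = s.find(lhs, k + 1)
        return out

    def provable(target, max_len=8):
        """Bounded breadth-first search for a derivation of target from i."""
        seen, queue = {"i"}, deque(["i"])
        while queue:
            s = queue.popleft()
            if s == target:
                return True
            for t in neighbours(s):
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
        return False

    print(provable("bababa"))   # True, matching the derivation given earlier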

Note that we could have defined this theory slightly differently. We could have replaced the
lone axiom i by an extra rule which says
In a proof, you can write down i as a step any time you like.
Then we would have a perfectly good theory based on five rules and no axioms at all. (The
theorems would be the same as before.)

A.3 Symbols and alphabet


Here is where the story really starts. We now set about defining a (formal) language. As
discussed above, it is defined in terms of a given set of symbols, from which we get the
strings, which are just finite sequences of symbols, amongst which there is a defined subset,
the expressions or wffs.
We start with a set, which we call the alphabet and whose members are called the symbols.
Normally we would think of our alphabet as being a set of recognisable characters we could
write on a page, but there is no need to be restrictive. We could set up our toy theory above
with the three symbols being represented by pebbles of three different colours, or by ASCII
codes in a computer or by different notes played on a musical instrument. Since we will be
using the language and methods of ordinary mathematics to discuss our formal theories, let
us just say that the alphabet can be any set. We just call it an “alphabet” to indicate how
we are going to use it.

The alphabet is normally either finite or countably infinite. In occasional (unusual) cases in
which we want to deal with an uncountable set of symbols, this will be made clear.

A.4 Strings
Assume now that we have decided on the set of symbols.
In terms of these, a string is simply a finite sequence of symbols.
For example, if our alphabet contains the symbols a, c and t, one of our strings is {c, a, t}.
For legibility, we mostly write these sequences without the braces and commas, thus cat.
Using a set of symbols commonly used in mathematics, we can now make strings such as
2 + 2 = 4 and xy∀+− . Note that nonsensical sequences of symbols are permitted as
strings. To facilitate discussion, we also include the empty sequence, which I will denote by
the name e.

Pause for nitpicking. We cannot write the empty string if we think in terms of marks
on a page, except I suppose to display a blank space on the paper. In this notation,
one could argue whether there is such a thing as the empty string at all. It suffices
to say that anything we will say below which mentions the empty string explicitly or
implicitly can also be said without it; we are simply using the idea as a notational or
terminological convenience. In any case, the symbol e which I use to represent the
empty string is not the empty string itself but a name for it, and so is not part of the
formal language.

We note that two strings can be concatenated, i.e. joined end to end. We can concatenate
abc and pqa to get abcpqa. We could concatenate the and cat to get thecat. This turns
our set of strings into an algebraic system, consisting of a set S of strings, with two
operations
the empty string e (nullary)
concatenation (binary)
It is easy to see that the empty string acts as an identity and that concatenation is associa-
tive, i.e. that the following laws hold:
ue = u and eu = u for any string u,
u(vw) = (uv)w for any strings u, v and w.
This means that S is a monoid, that is, a semigroup with identity. This particular semigroup
has some very special properties, most of which are really really obvious. However, here is
some terminology that we should get clear, because the ideas are used frequently.

A.5 Definition
(i) A string u is a substring of a string v if there are strings x and y such that v = xuy.
(Note that either x or y or both may be empty here, so in particular, any string is a
substring of itself. Also the empty string is a substring of every string.)
(ii) An occurrence of the string u in the string v is a triple (x, u, y) such that v = xuy.

For example:

• The string bab is a substring of the string cbabcbababd.


• There are three occurrences of the string bab in the string cbabcbababd (the second and
third occurrences overlap).
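Definition A.5(ii) is mechanical enough to compute with directly. A minimal sketch in Python (the function name is mine):

    def occurrences(u, v):
        """All occurrences of u in v, as triples (x, u, y) with v == x + u + y."""
        return [(v[:k], u, v[k + len(u):])
                for k in range(len(v) - len(u) + 1) if v[k:k + len(u)] == u]

    print(occurrences("bab", "cbabcbababd"))
    # [('c', 'bab', 'cbababd'), ('cbabc', 'bab', 'abd'), ('cbabcba', 'bab', 'd')]
    # Three occurrences, the second and third overlapping, as claimed above.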

A.6 Expressions
The set of expressions is simply a defined subset of the set of strings.
Normally the set of expressions is decidable, that is, there is an effective procedure — an
algorithm or rule — to decide whether any given string is an expression or not.

Let me put that more strongly: it is hardly ever the case that a formal language does not
have such an algorithm.

A.7 Decidability
The above raises the question of what exactly an algorithm is. For the time being, it is
sufficient to think of an algorithm as a rule, or set of rules, which can be applied to answer
such a question (or compute values of a function) in a completely routine and mechanical
way — without any call for creative thinking. Algorithms will be defined and discussed in
eye-watering detail in Chapter 9.

B Formal theories
Now we have the idea of a formal language, we are able to define a formal theory.

B.1 Definition: Deductive rule


A deductive rule (in a language L) is of the form: from a certain finite set of expressions,
A1 , A2 , . . . , Ar say, one can deduce another given expression B.

More precisely, the rule simply consists of the specified expressions. The phrase “one can
deduce” is, so far, a meaningless convenience. We will give it a meaning in the next paragraph
or so. Such a rule is often written in the convenient form

A1, A2, . . . , Ar
──────────────────
B

Here the expressions A1 , A2 , . . . , Ar are called the premises and B the conclusion of the
rule. Once again, there may be an infinite number of rules but, as with the axioms, the set
of rules will normally be decidable, for the same reason.

When discussing relationships between different theories, possible sets of axioms for theories,
and so on, we normally (but not always) work with a fixed language and set of rules. Thus
it is natural to think of a given language and set of rules as a sort of deductive framework
within which we work.

B.2 Definition: Formal theory


Given a language L and a set R of deductive rules (a deductive framework), a formal theory
is a set, T say, of expressions in the language L with the property:
For any rule A1, A2, . . . , Ar / B in R such that A1, A2, . . . , Ar are all members of T, B must be a
member of it too. It is natural to describe this property as “T is closed under the rules R”.

For a theory, it will always be assumed that the set R of deductive rules is decidable.
Note that the definition of a formal theory assumes that it is done in terms of a language
and set of rules, presumably already chosen. To be more careful, perhaps we should talk of
“the formal theory T in the language L with rules R”, or “the formal theory T for framework
L and R.” To be even more careful we could consider the whole thing together as a triple
⟨L, R, T⟩. Most of the time we will not need to be so explicit especially since, from Chapter
3 on, the language will hardly ever change and the rules never.
In any case, the language L is called the underlying language of the theory.
I now give what may look like a strange definition of axioms for a theory — it does not
mention proofs. However, it is neat and useful, and we will come to proofs shortly. First a
proposition which makes the definition work.

B.3 Proposition (and definition)


Given a deductive framework L and R, as described above, let A be any set of expressions
in the language. Then there is a unique theory Th(A) with these properties

(i) A ⊆ Th(A),
(ii) If T is any theory which contains A as a subset, then Th(A) ⊆ T .
We will say that A generates the theory Th(A).

Proof. First notice that, given a set of theories (with the same framework), their inter-
section is also a theory. This is because, given any rule A1, A2, . . . , Ar / B such that the premises
A1 , A2 , . . . , Ar are all members of this intersection, they must all be members of every one
of the set of theories. But then so must B be.

Notice also that the entire language — the set of all its expressions — is a theory.
It follows that, setting Th(A) to be the intersection of all theories containing A, it has the
required properties.
That Th(A) is unique follows immediately from (i) and (ii). □
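When everything in sight is finite, Th(A) can also be computed bottom-up, as a least fixed point, which is the computational counterpart of the top-down intersection used in the proof. A sketch in Python (the miniature rule set is invented purely for illustration):

    def generated_theory(axioms, rules):
        """Smallest set containing the axioms and closed under the rules.
        Each rule is a pair (premises, conclusion), premises a frozenset."""
        theory = set(axioms)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= theory and conclusion not in theory:
                    theory.add(conclusion)
                    changed = True
        return theory

    rules = [(frozenset({"A"}), "B"),
             (frozenset({"A", "B"}), "C"),
             (frozenset({"D"}), "E")]
    print(generated_theory({"A"}, rules))   # {'A', 'B', 'C'}: "E" would need "D"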

B.4 Corollary
Given a deductive framework L and R, let A and B be two sets of expressions in the language
such that A ⊆ B. Then Th(A) ⊆ Th(B).

B.5 Definition: Axiomatic theory


A set of axioms for a theory is a set of expressions which generates the theory, as defined
in the previous proposition.
An axiomatic theory is a theory together with a specified set of axioms.

B.6 Comments
(1) Note that any theory has at least one set of axioms (according to this definition),
namely itself. Thus there is no point in defining an axiomatic theory to be a theory which
happens to have a set of axioms — because then every theory would be axiomatic. Rather,
we define an axiomatic theory to be a theory together with a particular, usually chosen set
of axioms, the pair ⟨T, A⟩ if you like.

(2) Nearly all the theories we will deal with in this book will be axiomatic theories.
Nearly.
(3) There is a difference between a “plain” formal theory, as defined in B.2 above, and
an axiomatic theory that has an empty set of axioms. A plain formal theory is just a
set of expressions, closed under the chosen rules. It usually expresses everything we know
about some subject, or at least we hope so. Frequently we would be interested in finding a
convenient set of axioms which would make it into an axiomatic theory. Until we do that
we cannot identify theorems by giving proofs. In an axiomatic theory for which the set of
axioms is empty, on the other hand, theorems can be proved by proofs which involve just the
rules alone. We made a version of the Toy Theory at the end of A.2 which is such a theory.
Axiomatic theories with empty sets of axioms are unusual, but not particularly rare. For
instance, there are formal theories of logic which are entirely rules-based, with no axioms.

(4) It is normally assumed that the set of axioms must be decidable and certainly for
any useful formal theory this will be so. However very occasionally we may want to consider
a theory in which the set of axioms is not (or need not be) decidable, so we do not want
to make it a hard and fast rule. In the rest of these notes, you can assume that all sets of
axioms are decidable unless I say otherwise — however it is probably a good thing, each
time a new theory is introduced, to look at the axioms and ask whether the set is decidable
or not. (If we have only a finite set of axioms, that is automatically decidable. However we
often want the set of axioms to be infinite, in which case the decidability criterion is not
quite so trivial.)
A formal theory for which a decidable set of axioms exists is called a recursively axiomatisable
theory for reasons which will become apparent much later in this book.

(5) I mention again: the set of deductive rules is always decidable.

B.7 Theorems and proofs


Given a set of deductive rules and perhaps a set of axioms, we will now show that the theory
so generated can be defined in the way you probably expected — in terms of theorems and
proofs.

Most of the time we will be dealing with axiomatic theories, where there is a chosen set of
axioms to work with which generates the theory, but just occasionally we will deal with a
“plain” theory, where we have rules but no particular set of chosen axioms. So I need to
word the definitions here carefully to cover both cases.
The set of theorems can be built up as follows: a theorem is an expression which has a proof.
A proof is a finite sequence of expressions, S1, S2, . . . , Sn say (which we call the steps of
the proof), such that every one of the individual expressions Si is either an axiom (if there
are any), or follows from earlier members of the sequence by one of the deductive rules.
This corresponds pretty closely to what we would ordinarily think of as a theorem and its
proof, except that in the definition of the formal theory a proof is expected to be written
out in much more detail than we would normally bother with in practice.

More generally, we can have a proof from hypotheses, in which case we start with a set of
expressions (the hypotheses), which can be any set of expressions at all, and then a proof
from them is just like the one defined above except that a step may also be one of the
hypotheses.

In the case of a theory which is not axiomatic (that is, we do not have a defined subset of
axioms) we can use these definitions by simply assuming that the set of axioms is empty.

B.8 Theorems and proofs


We start with the notion of a proof from hypotheses H, where H can be any set of expressions.
(i) In an axiomatic theory, a proof is a finite sequence of expressions

S1, S2, . . . , Sn    (n ≥ 1)

(called the steps in the proof) in which, for each i, one of the following holds:
(a) Si is an axiom,
(b) Si follows by one of the rules of the theory from its predecessors S1, S2, . . . , Si−1.
Is it obvious what it means in (b) to say that, in this sequence, the step Si
follows from earlier steps by the rule A1, A2, . . . , Ar / B ?
It means that, among the preceding members S1, S2, . . . , Si−1 of the sequence,
all the expressions A1, A2, . . . , Ar occur, though not necessarily in that order,
and Si is the same as B.
(ii) In an axiomatic theory, a theorem is any expression T which occurs as the final step
of a proof. And of course we say that that proof is a proof of T .
We also have the notion of a proof from hypotheses H, where H can be any set of expressions.
In this case I will use the word deduction to emphasise the difference.

(iii) In an axiomatic theory, a deduction from or based on hypotheses H is a finite sequence
of expressions
S1, S2, . . . , Sn    (n ≥ 1)
(called the steps in the deduction) in which, for each i, one of the following holds:
(a) Si is an axiom,
(b) Si is a member of H,
(c) Si follows by one of the rules of the theory from its predecessors S1, S2, . . . , Si−1.
So the only difference between a deduction from hypotheses and an ordinary proof of a
theorem is that, in a deduction from hypotheses, any one of the hypotheses may be asserted
as a step at any time. This corresponds to making the assumption that that hypothesis is
true. Looking at this the other way around: a proof is just a deduction from no hypotheses
(H is empty).
The idea of proof of a theorem is not usually useful in a plain formal system, but the idea
of a deduction from hypotheses is. Of course, in this case there are no axioms to appeal to,
so the definition looks like this:

(iv) In a plain formal theory (that is, one without a given set of axioms), a deduction
from or based on hypotheses H is a finite sequence of expressions

S1, S2, . . . , Sn    (n ≥ 1)

(called the steps in the deduction) in which, for each i, one of the following holds:
(a) Si is a member of H,
(b) Si follows by one of the rules of the theory from its predecessors S1, S2, . . . , Si−1.

The last step Sn is usually called the conclusion (since that is usually what we are proving).
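These definitions are completely mechanical, and it may help to see them as code. The sketch below (Python; all names are mine) checks clause (iii) for a finite list of rule instances; a rule schema, of the kind discussed in B.12 below, would be represented by a predicate rather than a finite list.

    def is_deduction(steps, axioms, hypotheses, rules):
        """B.8(iii): each step must be an axiom, a hypothesis, or follow
        from strictly earlier steps by some rule (premises, conclusion)."""
        if len(steps) < 1:
            return False
        for i, s in enumerate(steps):
            earlier = set(steps[:i])
            if s in axioms or s in hypotheses:
                continue
            if any(c == s and ps <= earlier for ps, c in rules):
                continue
            return False
        return True

    # A single instance of modus ponens, as an example rule.
    rules = [(frozenset({"P", "P→Q"}), "Q")]
    print(is_deduction(["P→Q", "P", "Q"], set(), {"P", "P→Q"}, rules))  # True
    print(is_deduction(["Q"], set(), {"P", "P→Q"}, rules))              # False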

B.9 Comments
(i) This definition of a proof or deduction includes sequences S1 of length 1. In this case
S1 must be an axiom (if there are any) or a hypothesis (if there are any).
(ii) For an ordinary proof (that is, not a deduction from hypotheses) I will often say
“proof of a theorem” in an attempt to avoid confusion.
(iii) You may have noticed a rather odd thing about Part (iii) of the definition: there
doesn’t seem to be any difference between axioms and hypotheses — at least as far as this
definition is concerned. And that is right: the difference is in how we use them. With
an axiomatic theory, the axioms are fixed (for that theory) and won’t change, whereas we
may play with various sets of hypotheses. In the next chapter, for instance, we investigate
the axiomatic theory SL. The axioms are part of the definition of SL, are given near the
beginning and stay fixed for the whole chapter, while we look at proofs from many sets of
hypotheses.
Also, it would be an irritating waste of time to have to write out all the axioms like hy-
potheses at the start of the proof of every theorem.
(iv) It does follow from this that, if we are prepared to change axioms in our theory, we
can sort of swap axioms and hypotheses. Now I will state this more carefully.
Suppose we have an axiomatic system with axioms set A, and we also have a chosen set H
of hypotheses, and we are looking at a possible proof

S1, S2, . . . , Sn    (n ≥ 1). (–1)

Then the following five statements about it are equivalent (in the sense that if one is true
they all are).

• In the axiomatic theory with axioms A, (–1) is a deduction from hypotheses H.


• In the axiomatic theory with axioms A ∪ H, (–1) is a proof of a theorem.
• In the axiomatic theory with an empty set of axioms, (–1) is a deduction from hy-
potheses A ∪ H.
• In a plain formal theory, (–1) is a proof from hypotheses A ∪ H.
• Sn ∈ Th(A ∪ H).

B.10 Proposition
In an axiomatic theory, the theory is just the set of all theorems.
Which is exactly what you would expect: that’s the whole point of a proof.

Proof. Let T be the set of theorems based on A; I will show that it satisfies the definition
of the theory generated by A as given in B.3.

(i) The remark above about proofs of length 1 tells us that all the axioms are theorems,
so A ⊆ T . To see that T is a theory, let A1, A2, . . . , Ar / B be a rule such that A1, A2, . . . , Ar ∈ T .
Then all those Ai have proofs, so, for each i let

proofi be the proof of Ai . (–1)

Now, make a big proof by placing all these individual proofs end-to-end, and finishing off
with B, thus:
proof1 , proof2 , . . . , proofr , B (–2)
Then this is a proof of B because every step except the last is justified in (–2) by the same
rule as in (–1), and the last step B is justified by the rule we started out with, because all
the Ai occur as the last steps of proofi in (–2). This shows that B ∈ T as required.
(ii) Suppose that U is another theory such that A ⊆ U. Any T ∈ T has a proof,
S1, S2, . . . , Sn say, with Sn = T . Then, by induction over i, every Si ∈ U. This proves that
T ⊆ U, as required. □
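The end-to-end construction in part (i) is itself a small algorithm. As a sketch, in the style of the checker given after B.8:

    def combine_proofs(proofs, conclusion):
        """The construction (-2): lay the given proofs end to end and append
        the conclusion.  Each original step keeps its old justification, and
        the final step is justified by the rule whose premises appear as the
        last steps of the individual proofs."""
        return [step for proof in proofs for step in proof] + [conclusion]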

B.11 Comments
There are several simple facts about proofs that should be pointed out here.
(1) The remark above about proofs of length 1 tells us that, in an axiomatic theory, all
the axioms are theorems.
(2) Note that we could have defined a theorem alternatively as any expression which
occurs as a step anywhere within any proof. This follows from the obvious fact that, if
S1, S2, . . . , Sn is a proof and 1 ≤ m ≤ n, then S1, S2, . . . , Sm is also a proof.
(3) If an expression appears as a step in a proof, it may always be repeated at a later
point, since whatever justified its use the first time will still justify it later on.

[Figure: nested regions, the Strings containing the Expressions, which contain the Theorems, which contain the Axioms, with arrows running up from the axioms.]

This is a mental image I find useful for thinking about an axiomatic theory. The arrows
are supposed to represent (sort of) the rules, and suggest that the theorems are built up
from the axioms by repeated applications of the rules.

B.12 Schemata
Returning to our simple i − a − b theory, we have a single axiom, i. Consider the second
rule, which I wrote as Xa³Y ↔ XiY . This is of course just a convenient shorthand for
two rules, Xa³Y → XiY and XiY → Xa³Y . According to our latest notation, we should
write these

Xa³Y / XiY    and    XiY / Xa³Y .
But this does not exactly fit the definition of a deductive rule given above: the premise
and conclusion are not expressions. They represent, rather, forms of expressions. Must we
extend the definition above to include such rules?
No. What we do is regard this, not as a single rule, but rather as a recipe for producing
a whole set of rules, one for every possible pair of “values” of X and Y . More precisely,
substitution of any pair of expressions for X and Y (possibly empty and same substitutions
top and bottom) in the recipe above produces a rule.

Such a recipe is called a schema and the individual rules which it defines are called instances
of the schema. It is not a new kind of deductive rule, but simply a convenient way of
specifying an infinite set of genuine deductive rules.
This is a deductive rule schema. Shortly we will see that an axiom schema is also a useful
thing.

(By the way, the plural of “schema” is “schemata”, if you want to be pedantic. However
“schemas” will do as long as you are far enough away from the Classics Department.)
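One way to make a schema computationally concrete is to recognise its instances rather than try to enumerate them. Here is a sketch for the Rule 2 schema Xa³Y / XiY (Python; the name is mine): it tests whether a given premise and conclusion match for some choice of X and Y.

    def rule2_instance(premise, conclusion):
        """Is premise / conclusion an instance of Xa³Y / XiY ?  Try every
        way of reading the premise as X + "aaa" + Y."""
        for k in range(len(premise) - 2):
            if premise[k:k + 3] == "aaa" and \
               conclusion == premise[:k] + "i" + premise[k + 3:]:
                return True
        return False

    print(rule2_instance("baaab", "bib"))   # True: X = "b", Y = "b"
    print(rule2_instance("baaab", "ib"))    # False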

It is not difficult to see that an infinite number of deductive rules is essential for an interesting
theory. Observe that the only things that can turn up as expressions in proofs are axioms
and the conclusions of rules. Therefore, if a formal theory had only a finite number of
deductive rules (I refer to genuine rules, not schemata now), then the theory would consist
of only the axioms and a subset of this finite set of conclusions. In this case, we might as
well add those conclusions to the axioms in the first place, and have a theory which has
been specified simply by writing down all the theorems, and no deduction necessary.

B.13 Decidability again


To say that a theory is decidable is a short way of saying that its set of theorems is decidable,
that is, that there is an algorithm to decide whether any expression in the underlying
language is a theorem or not.
It has been mentioned above that, in a formal theory, the sets of axioms and rules are
normally decidable. In such a theory it follows that proofs are also decidable: it is easy
to construct an algorithm which will recognise whether any given sequence of expressions
is a proof or not. This does not mean that the theory itself is decidable: there is no
guarantee that an algorithm exists which will decide whether or not a proof exists for any
given expression.
Here is an interesting undertaking.

(i) It is not hard to make an algorithm which will list all the strings. Suppose first that
there is only a finite number of symbols. To make the list, simply start with the empty
string, then list all the strings of length 1 in dictionary order, followed by all the strings
of length 2, followed by all the strings of length 3 and so on. It is not hard to extend
this method to cover the infinite number of variables case.
(ii) Next, let us perform this algorithm, but as we go, apply the given method of deciding
whether strings are expressions or not, and so throw away every string which is not
an expression. We now have an algorithmic way of producing a list of all expressions.
(iii) Now go back to step (i), but change it in the obvious way to list all finite sequences
of strings. Then apply the trick of step (ii) to turn it into a method of listing all finite
sequences of expressions.
(iv) Tune up the last algorithm one further stage by testing each sequence of expressions,
as it is produced, to see if it is a proof or not, and throwing away all non-proofs. We
now have an algorithm which will list all proofs.

(v) Now use this algorithm to produce proofs, but as each one is produced, select its last
expression and discard the rest. We have an algorithm which will list all the theorems.

How then can the set of theorems possibly be undecidable?


(A detailed answer to this important question will be given in Chapter 9.)
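Step (i) of the undertaking is easy to realise for a finite alphabet; here is a sketch in Python. Composing it with the expression-checker, the proof-checker and a last-step selector gives steps (ii) to (v). Notice what this does and does not deliver: it lists all the theorems, but for an expression that is not a theorem the listing simply never tells you so.

    from itertools import count, product

    def all_strings(alphabet):
        """Yield every string over alphabet: by length, then dictionary order."""
        letters = sorted(alphabet)
        for n in count(0):
            for tup in product(letters, repeat=n):
                yield "".join(tup)

    gen = all_strings("ab")
    print([next(gen) for _ in range(7)])
    # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']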

B.14 Deduction
(i) Where there is a deduction of a conclusion C based on a set H of hypotheses, there
is a standard symbol for this:
H ⊢ C.
We can also say that H entails C.

Clearly this idea (what it means and whether it is true or not) depends upon
what theory we are working in. The definition does not make much sense unless
we know what the rules are, whether we are working in an axiomatic system or
not and, if so, what the axioms are. Thus we only use this useful symbol when
we already know what theory we are talking about. Occasionally we might
be talking about several different theories at the same time, or the particular
theory might be in doubt for some other reason. Then we can indicate that we
are referring to a proof in the theory T by writing

H ⊢ C (in T ).

(ii) In the case where H has only one member, H = {H} say, we really should write
{H} ⊢ C, but usually, as a slight abuse of notation, we write simply

H ⊢ C

In the same way, when there is only a small number of hypotheses, H = {H1, H2, . . . , Hn}
say, we usually dispense with the brackets and write

H1, H2, . . . , Hn ⊢ C

(iii) And in the same way, in the case where H = ∅, we really should write ∅ ⊢ C, but
again, as a slight abuse of notation, we write simply

⊢ C.

This means of course that C is a theorem.


(iv) It is also convenient to write H ⊢ K, where H and K are both sets of expressions,
to mean that H entails every member of K, that is:

H ⊢ H for every H ∈ K.

(v) Note that the symbol ⊢ is not part of the formal language we are discussing. It
is not even a convenient abbreviation for something that can be said in the language —
that is, it is not even part of our semi-formal version. Statements involving this symbol
are statements about the language, in other words they are part of the metalanguage (see
below).

This symbol ⊢ I believe comes from a T for “theorem” turned on its side, but I
could be wrong. If you want a name for it, it is usually called “turnstile” (because it
looks a bit like one of those things you run into as you leave a library).

(vi) We will also write H ⊣⊢ K to mean that both H ⊢ K and K ⊢ H and say that H
and K are deductively equivalent. (And of course, if H and K contain one expression each,
we may write H ⊣⊢ K, and so on.)

There is much more discussion of deductions in the next couple of Chapters.


Proofs are of course very important in the study of mathematics and of formal theories in
general. We pause here to demonstrate a few basic properties of proofs which we will need
from time to time.

B.15 Proposition
(i) Perhaps the most basic fact about deduction (in an axiomatic theory) is that

H ⊢ C if and only if C ∈ Th(A ∪ H)

(where A is the set of axioms) and so, for a set K of expressions,

H ⊢ K if and only if K ⊆ Th(A ∪ H)

(ii) Deductions can be chained

If H ⊢ K and K ⊢ L then H ⊢ L.

(iii) If we add more hypotheses to a deduction, the conclusion remains a conclusion.


If H ⊢ C and H ⊆ H′ then H′ ⊢ C also.

And of course the same thing for a set of conclusions,

If H ⊢ K and H ⊆ H′ then H′ ⊢ K also.

(iv) This one says, roughly, that deductive rules immediately give deductions.
If A1, A2, . . . , Ak / B is a rule, then A1, A2, . . . , Ak ⊢ B.

(v) The set of hypotheses can always be pared down to a finite set (but only when the
conclusion is a single expression).

If H ⊢ C then there is a finite subset {H1, H2, . . . , Hk} of H such that H1, H2, . . . , Hk ⊢ C.


(vi) And, in any theory, deductions made from members of the theory are also members
of the theory.
If H ⊆ the theory and H ⊢ K then K ⊆ the theory also.
For an axiomatic theory with axioms A this becomes
If H ⊆ Th(A) and H ⊢ K then K ⊆ Th(A) also.
A special case of this is worth noting: if, in some axiomatic theory, all the members of H
are theorems and H ⊢ C, then C is a theorem also.
Another way of looking at this: we know that formal theories are closed under deductive
rules; this says that they are closed under proofs too.

Proof. (i) This is just a restatement of the comment (v) in B.9 above.
(vi) K ⊆ Th(A ∪ H) (by Part (i)) and Th(A ∪ H) ⊆ Th(A) (by Proposition B.3).
(ii) This is just a restatement of Part (vi) using Part (i).
(iii) From H ⊢ C we have C ∈ Th(A ∪ H) and from H ⊆ H′ we have Th(A ∪ H) ⊆
Th(A ∪ H′ ), so C ∈ Th(A ∪ H′ ) .
(iv) B ∈ Th(A ∪ {A1 , A2 , . . . , Ak }) by the definition of a theory.
(v) A proof of H ⊢ B consists of a finite sequence of steps S1 , S2 , . . . , Sn , where
Sn is B and each Si is either an axiom, a member of H or follows from earlier steps by a
deductive rule. So, if we set {H1 , H2 , . . . , Hk } to be all the members of H which occur as
steps in that proof (there must only be a finite number of them), then it is also a proof of
{H1 , H2 , . . . , Hk } ⊢ B. □

Does it strike you as odd to be proving things about proofs?

B.16 Metalanguage and the green font


In these notes I will make many statements in one or other of the various formal languages
we discuss; more often than not, however, I will make statements about them. Such statements
will contain ideas, words or symbols not expressible in the language itself.
Statements which are made about a formal language, but which contain words or symbols
which are not part of that language, are said to be made in the metalanguage.
As things get more complicated, the metalanguage tends to get more formal-looking itself.
When talking about formal logic, we make statements in (or are doing) metalogic, when
talking about mathematics we make statements in (or are doing) metamathematics and so
on.
In the sections on deductions above, all the statements are in the metalanguage. As remarked
there, even though H ⊢ B for instance looks pretty formal, it is not part of the object
language because the symbol ⊢ is not a symbol of that language.

For an everyday example, the phrase «Il pleut» is a statement in the French language. In
the statements “It is possible to say ‘It is raining’ in French” and “«Il pleut» is French for
‘It is raining’ ”, we are using English as metaFrench.
For the kind of discussions we will be having, it is essential to distinguish between expressions
of the object language and statements made in the metalanguage. We will be dealing with
the language of mathematics and languages which express parts of mathematics. These
languages of course contain symbols and expressions which are the same as the ones we use
in the metalanguage, providing a rich source of confusion.
For example, suppose we are dealing with a language which contains symbols for the
numerals 0, 1, 2, . . . , + for addition and = for equality. What are we to make of this?

2+2=4

Is it a statement in the object language, saying that 2+2 is equal to 4 (true) or is it a


statement in the metalanguage, saying that the expressions “2+2” and “4” are the same
string (false: the first string has three characters, the second one only one)? The confusion
arises of course because there is an ambiguity here about the equality symbol: it could be a
symbol of the language itself, or we could be using it as part of our metalanguage.

What we do in everyday English, if we are being careful, is use quotes, something like this:

“2+2=4” the single expression (true)


“2+2”= “4” the two strings are the same (false).

Sprinkling mathematics with quote marks like this does not work very well: it quickly
becomes unreadable. Another way to deal with the problem is to use a noticeably different
font. In these notes I will use a different font colour, so that expressions in the formal
language are easily distinguished — that accounts for the green-coloured symbols in the
text above.
Also, because equality and inequality of expressions as strings is often discussed, I will
occasionally use a special symbol for this, a heavy circled equals sign:

X ⊜ Y    X and Y are the same string,

X ⊜̸ Y    X and Y are not the same string.

So here is how I would write the “2+2=4” examples:

2+2 = 4    the single expression (true)

2+2 ⊜ 4    the two strings are the same (false).

When we look at important formal languages in detail, we will find that they are enormously
long-winded. So we introduce many abbreviations to make them easier to read and deal
with. We have already done so with our simple i-a-b language with the abbreviation
a³ for aaa. Such abbreviations will also be written in the “formal language font colour”. In
discussing that simple theory, we also used expressions such as Xa³Y . Here the X and Y

are not actually symbols of the language; but they stand for strings in the language. We
write them in the formal font colour because they represent something that is entirely in
the formal language. Discussions above of strings X and Y follow the same pattern.
So here is the general rule followed in these notes:

• Anything that is entirely in the formal font colour is, or represents, a string in the
formal language itself.
• Anything that is entirely in any other colour is in the metalanguage (or just generally
rabbiting on).
• Anything that is a mixture of colours is in the metalanguage, discussing the indicated
bits of the formal language.

Note that the symbols ⊜ and ⊜̸ are never part of the formal language. They belong to our
metalanguage — just like the symbols ⊢ and ⊣⊢ discussed earlier.
So, excuse me if I go on a bit about this coloured font business. The point of using it is to
make clear, as far as possible, what is an expression in the formal language under discussion
and what is not. For example, when writing about the toy formal language of Section A.2,
I wrote

• iaaa in green, because it is a genuine expression of the language;


• (ab²)³ in green, because it is an abbreviation for a genuine expression of the language,
namely abbabbabb;
• XaaaY in green, because the letters X and Y are stated to represent expressions, and
so the whole thing does.

On the other hand, suppose that we give the toy language a name, T say; that would not
be coloured green. It is not a formal expression, nor does it represent one: it represents the
language itself.
I have already pointed out that, in statements such as
x > 0 ⊢ x + y > y
there are two formal expressions connected by a metasymbol. The entire statement is not
a formal expression. Where colour coding is used, the whole statement must be coloured
green to be a formal expression.
I will try to use this colour convention as consistently as possible throughout these notes,
except for Chapters 6 and 7. Virtually everything that looks formal in those chapters is
formal (or semiformal), so the colouring would just be a nuisance there.

[The prefix “meta-” is used a lot in various contexts, often sloppily. As used in these
notes it has very little to do with meta-analyses (referring to clinical trials) or meta-
narratives (in postmodern discourse) or more or less any other popular use of the
prefix.]

B.17 Theorems, propositions, lemmas


Throughout these notes, subsections labelled “Theorem”, “Proposition”, “Lemma”
or “Corollary” will occur. They will all in fact head theorems, in that they are things that
have a proof, either in the formal theory being studied or about it (that is, metatheorems).
The difference between the four is just in the attitude we have to them. But you know that.

C Comparing formal theories


We will occasionally want to compare different, but related, formal theories. In this section
we look at a couple of definitions and propositions relating to this. These results are de-
scribed here because they belong in this chapter; however, on first reading of these notes, I
would recommend you skip this section and only come back to it when it becomes relevant.

If we want to discuss different theories or sets of axioms, while keeping the same language
and set of rules, the ideas of the previous section are usually quite adequate. However
sometimes we want to compare different theories which have different languages or different
sets of rules (or both). Two common cases are

(1) Some new symbols are added to the language, thus extending the language, and with
it usually the theory. The rules don’t change, but (if the theories are axiomatic) usually
some new axioms are added to define the behaviour of the new symbols.
(2) The two languages are completely different, but the theories say the same things in
those different languages. We say that the two theories are equivalent.

C.1 Definition: extension


Let A and B be two formal theories. Then B is an extension of A if
(i) The underlying language of A ⊆ the underlying language of B .

(ii) The set of theorems of A ⊆ the set of theorems of B .


Perhaps this is a case where being super-careful would be a good idea. If we consider the
two theories to be triples, A = ⟨L1 , R1 , T1 ⟩ and B = ⟨L2 , R2 , T2 ⟩, where the Li are the
underlying languages, the Ri are the rules and the Ti are the sets of theorems, then the
above conditions are that L1 ⊆ L2 , R1 = R2 and T1 ⊆ T2 .

In the particular case in which the underlying languages are the same, we say that we have
a simple extension.
Note that, in the case that A and B are axiomatic theories, we do not require that all the
axioms of A must also be axioms of B. However, the axioms of A must at least be theorems
of B, since we know that they are theorems of A.

(We could also of course consider what might happen if we change the rules as well. We
don’t bother here, because that is hardly ever done.)
The following result can be useful in proving that one axiomatic theory is an extension of
another.

C.2 Proposition
Let A and B be two axiomatic theories. Then B is an extension of A if and only if
(i) The underlying language of A ⊆ the underlying language of B

(ii) And every axiom of A is a theorem of B.



Proof. Suppose first that the definition holds. Then Part (i) of this proposition holds because
the definition says so, and Part (ii) holds because every axiom of A is a theorem of A.
Conversely, suppose that Parts (i) and (ii) of this proposition hold. Then Part (i) of the
definition is immediate, and Part (ii) of the definition follows from Proposition B.15(v). □

C.3 Equivalent theories


In the case of an extension, as above, we can consider the theory B to be “stronger than” A,
because everything that can be said in A can also be said in B and everything that can be
proved in A can also be proved in B — arguments in A are mirrored by arguments in B.
However we sometimes want to consider theories which say the same things and prove the
same things but in completely different languages.
Since the languages are different nothing will make sense unless we have a method of trans-
lating back and forth: to every expression in A there must correspond an expression in B
and vice versa. Putting this more carefully, we will need translating functions

ϕ:A → B and ψ : B → A.

Even more carefully, these functions are from the underlying language of A to that of B and
vice versa.
But we want more than that. Given an expression A in A, if we translate it into B and
then translate back again we might expect to get the expression we started with; that is,
ψ(ϕ(A)) = A. Well, this turns out to be too much to ask for in general; in fact this is not
going to work even for natural languages. As an example, suppose we had translations

ϕ : English → French and ψ : French → English

Now French has no single-word translation for the English word “shallow”; it would normally
be translated as “pas profond”. Then translating this back again into English would normally
yield something like “not deep”. Thus we would have

ψ(ϕ(shallow)) = not deep .

The double translation brings us back, not to the same thing, but at least to something
which means the same. We will see that much the same thing happens in formal theories and
that the best we can expect is that the double translation back and forth yields something
equivalent to the original. Thus we have

C.4 Definition: equivalent theories


Let A and B be two formal theories. We will say that A and B are equivalent if there
are functions ϕ : A → B and ψ : B → A (more carefully, to each expression A in the
underlying language of A there is defined a corresponding expression ϕ(A) in the underlying
language of B and to each expression B in the underlying language of B there is defined a
corresponding expression ψ(B) in the underlying language of A) with these properties:

(1) For any expression A in the underlying language of A, A ⊣⊢ ψ(ϕ(A)) (in A).

(1′) For any expression B in the underlying language of B, B ⊣⊢ ϕ(ψ(B)) (in B).
(2) If H1 , H2 , . . . , Hk ⊢ A (in A), then ϕ(H1 ), ϕ(H2 ), . . . , ϕ(Hk ) ⊢ ϕ(A) (in B).
(2′) If K1 , K2 , . . . , Kk ⊢ B (in B), then ψ(K1 ), ψ(K2 ), . . . , ψ(Kk ) ⊢ ψ(B) (in A).
It is a straightforward, if tedious, exercise to prove that equivalence of formal theories, as
just defined, is an equivalence relation in the ordinary sense.

C.5 Note
Part (2) of the definition, in the case k = 0, tells us that if A is a theorem of A, then ϕ(A) is
a theorem of B. In the same way, if B is a theorem of B, then ψ(B) is a theorem of A.

C.6 Proposition
Let A and B be two axiomatic theories. Then A and B are equivalent if and only if there
are functions ϕ : A → B and ψ : B → A (that is, between their underlying languages)
with these properties:
(1) For any expression A in the underlying language of A, A ⊣⊢ ψ(ϕ(A)) (in A).
(1′) For any expression B in the underlying language of B, B ⊣⊢ ϕ(ψ(B)) (in B).

(2) For any axiom A of A, ϕ(A) is a theorem of B.

(2′) For any axiom B of B, ψ(B) is a theorem of A.

(3) If  H1 , H2 , . . . , Hk / A  is any rule of A, then ϕ(H1 ), ϕ(H2 ), . . . , ϕ(Hk ) ⊢ ϕ(A) (in B).

(3′) If  K1 , K2 , . . . , Kk / B  is any rule of B, then ψ(K1 ), ψ(K2 ), . . . , ψ(Kk ) ⊢ ψ(B) (in A).

Proof. Obvious. 

Note that, since the languages are different, the rules will probably have to be different too.

Remark This proposition looks more complicated than the definition, however it is useful
because its conditions are usually easier to check than those of the definition. In particular,
for many theories based on logic, the rules are the same for both theories and only the
axioms differ; in this case Conditions (3) and (3′) hold automatically.
2. SENTENTIAL LOGIC

A A language for Sentential Logic


A.1 Formal and semi-formal theories
“Everything should be made as simple as possible, but no simpler.”
—attrib. to Albert Einstein

In this chapter, the next one and Chapter 6 we will develop Logic and then Mathematics
as axiomatic theories. The definitions will be made as simple as possible. There are several
reasons for this.

Firstly, Mathematics, as it is actually written, uses a vast number of different symbols


and notational conventions, and these are growing all the time. Moreover, it often uses
the same symbol with different meanings, leaving it to the reader to decide, from context,
which meaning is intended. (For example, the vertical bars in |x| have different meanings
depending upon whether x is a complex number, a vector or a matrix.) To develop a formal
theory that would embrace all this mess, as is, would be formidably complicated.

Secondly, we are going to prove things about Logic and Mathematics. Some of these things
are quite startling, some have occasioned major shifts in the way mathematicians and Lo-
gicians think about their subjects. So it is important that the proofs of these facts are as
watertight as possible, and to do this requires that the subject matter, Logic or Mathe-
matics, be defined precisely. To do this it helps a lot to make the definitions as simple as
possible — but no simpler.
So we head towards developing a formal theory for Mathematics in Chapter 6. It will use
a surprisingly small number of symbols and so the underlying language will be correspond-
ingly simple. Nevertheless, it is designed so that everything that can be said in Logic or
Mathematics can be said in this language, though usually in an enormously long-winded
and obscure way. In the same way, anything that can be proved in Logic or Mathematics
can be proved in our formal theory, though at the expense of long, tedious and extremely
detailed proofs.
Having made our clean and simple definition, we will allow ourselves to introduce new
symbols and methods of proof to increase the convenience of the theory — shorter and more
legible expressions, smarter and easier to understand proofs and so on. As we progress, the
theory will become more and more like Logic and Mathematics as they are usually written
(though perhaps with a little more attention to detail) until eventually we can see how any
logical or mathematical proof can be performed in our defined language.
For safety’s sake (and in order to be able to describe what is going on) it is best to think of
this as two theories. Firstly there is the stripped-down one we are about to define in terms
of expressions, axioms, rules and so on; this I will call the fully-formal theory. We won’t


mess with this one — after it has been defined, we will add no symbols or methods of proof
at all. Then we will have the more convenient theory, got by starting with the fully-formal
one, but adding new symbols and methods of proof as we go along. This I will call the
semi-formal theory.
Imagine trying to write out a proof of a theorem from First-Year Calculus (the Intermediate
Value Theorem say) using only the facilities of the fully-formal language. Such a (fully-
formal) proof was defined in 1.B.8, and involves starting from the axioms and proceeding by
the (tiny) steps allowed by the formal rules of the theory. It would have to contain the whole
necessary infrastructure — the definition of the Reals, with its algebra, infs and sups, lots of
elementary stuff about continuous functions, ideas about the natural numbers and induction
and so on and on. If it is not obvious already, the Deduction Theorem (proved later in this
chapter) will make it so, that such a proof would be stupendously long; it certainly wouldn’t
fit into a fat book. And then, when you move on to the next theorem (the Heine-Borel
Theorem say), you would have to write out the entire infrastructure all over again. This is
clearly not the way mathematicians actually proceed. For one thing (among many), they
build upon theorems already proved and so avoid re-inventing the wheel over and over again.

We have already made a start in allowing this facility. Proposition 1.B.15(vi) says that we
can use a set H of theorems already proved in the proof of a new one C. Well, we do not
add this new method of proof to our fully-formal theory; we leave that in its pristine state.
We add it as an allowed method of proof to our semi-formal theory, and then the corollary
tells us that anything that can be proved in our semi-formal theory can also be proved in
the fully-formal one.

This is the way we will proceed. We will make many incremental changes to our semi-formal
theory. Any time we change the underlying language, for instance by adding a new symbol,
we will show that
whatever can be expressed in the newly extended semi-formal language can also be expressed
in the formal one.
In the same way, if we add a new proof method to the semi-formal theory, we show that
anything that can be proved in the newly extended semi-formal language can also be proved
in the formal one.

Since we will be growing our semi-formal theory by many small incremental changes, the
easiest way of doing this is by showing, for each such change, that whatever can be expressed
or shown after the change could already be expressed or shown before it.

A.2 Sentential logic: Discussion


In this chapter and the next we set up Logic as a formal theory. We do this in two stages,
starting with Sentential Logic in this chapter and extending to Predicate Logic in the next.
Predicate Logic is important: it is the logic which is incorporated into most useful formal
theories, including the usual axiomatisations of Mathematics. I use the two-stage process
here because many of the important facts about logic can be dealt with in the context of
Sentential Logic before adding the extra complications required for Predicate Logic.

Sentential Logic can be described as the logic of sentences, that is, statements which are
simply true or false (but not both), such as 2 + 2 = 4 (true) or 2 + 2 = 31 (false). Predicate
Logic extends this by adding the idea of variables, so we can deal with expressions such
as x + y < z − 1, for which the truth or falsity depends upon the values of the variables
involved.

For Sentential Logic, we combine sentences with the operations ¬ (NOT), ∧ (AND), ∨ (OR),
⇒ (IMPLIES) and ⇔ (IF AND ONLY IF). These operations are usually called connectives.
Here I have given the symbols that will be used for them in these notes in the formal theories
and the names that are usually used for them.

The symbols used here are becoming the ones most commonly used in formal logic;
however be aware, if you consult other books on logic, that you may find other symbols
being used. Common alternatives are ∼ for NOT and either ⊃ or → for IMPLIES.

Some care has to be taken with the words NOT, AND, OR, IMPLIES, IF as their use
in Logic is rather different from their use in the ordinary English language. For a start,
it is important to realise that when sentences are combined using these operations, the
truth-value (truth or falsity) of the combined sentence is determined by the truth-values
of the simpler sentences from which it is constructed, and nothing else. It might help in
understanding these operations to think of them as follows: for any sentences X and Y ,
¬X (NOT X) is true when X is false and false when X is true.
X ∧Y (X AND Y ) is true when both X and Y are true, false otherwise.
X ∨Y (X OR Y ) is true when at least one of X and Y is true, false otherwise.
X⇒Y (X IMPLIES Y ) is only false in the case that X is true and Y false;
it is true otherwise.
X⇔Y (X IFF Y ) is true when X and Y have the same truth-value
(i.e. both true or both false).
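To see these rules mechanically, here is a small illustrative sketch in Python (my own, not part of the formal development): the five connectives encoded as truth functions, with the truth table for ⇒ printed out.

    # Illustrative sketch: the five connectives as truth functions.
    from itertools import product

    def NOT(x): return not x
    def AND(x, y): return x and y
    def OR(x, y): return x or y
    def IMPLIES(x, y): return (not x) or y   # false only when x is true and y is false
    def IFF(x, y): return x == y             # true when x and y have the same truth-value

    for x, y in product([True, False], repeat=2):
        print(x, y, '->', IMPLIES(x, y))

Running it prints exactly the four cases for ⇒ just described.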
Warning I have found that mathematics students often use the symbol ⇒ to mean
“therefore” in the course of a proof, like this:
a line of proof
⇒ another line of proof
What they are (usually) trying to say here is that the first line is true and the second
follows from that, so they are both true. But this is not what the ⇒ symbol means — see
the description above. The point is, that “implies” and “therefore” are two different ideas,
though they are related of course. If you really want to use a quick symbol for “therefore”,
there is a perfectly good one:
a line of proof
∴ another line of proof
Please do not misuse the ⇒ symbol in your writings for this course: it is used very frequently
in its proper meaning and misuse can cause much confusion.

For those who like to use TeX to typeset mathematics, this symbol can be set using
the unsurprising command \therefore

We now set up Sentential Logic, or SL, as a formal theory.

A.3 The symbols


The symbols are of three kinds:

The connective symbols ¬ ⇒


The proposition symbols p q r . . . (countably infinitely many)
The punctuation symbols ( )

A.4 Comments
(1) What happened to the other connectives, ∧, ∨ and ⇔? They have disappeared in our
program of simplifying the (fully-formal) language. All these other connectives of Sentential
Logic are redundant, since they can be expressed in terms of the two above, for example
p ∨ q is the same as (¬p) ⇒ q. We will shortly re-introduce them in the semi-formal theory.
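To see the redundancy concretely, here is a sketch in Python (an ad-hoc encoding of my own — expressions as nested tuples — not anything defined in these notes) which rewrites ∨, ∧ and ⇔ in terms of ¬ and ⇒. It uses one standard choice for ∧ and ⇔; if you would rather puzzle those out for yourself, as suggested a little later in this chapter, skip those two branches.

    # Sketch: eliminating ∨, ∧ and ⇔ in favour of ¬ and ⇒.
    # An expression is a proposition symbol (a string) or a tuple:
    # ('not', A), ('imp', A, B), ('and', A, B), ('or', A, B), ('iff', A, B).
    def eliminate(e):
        if isinstance(e, str):
            return e                                    # a proposition symbol
        op = e[0]
        args = [eliminate(a) for a in e[1:]]
        if op in ('not', 'imp'):
            return (op, *args)                          # already in the core language
        if op == 'or':                                  # A ∨ B  as  (¬A) ⇒ B
            return ('imp', ('not', args[0]), args[1])
        if op == 'and':                                 # A ∧ B  as  ¬(A ⇒ ¬B)
            return ('not', ('imp', args[0], ('not', args[1])))
        # 'iff': A ⇔ B  as  (A ⇒ B) ∧ (B ⇒ A), then eliminate that ∧ as well
        return eliminate(('and', ('imp', args[0], args[1]),
                          ('imp', args[1], args[0])))

    print(eliminate(('or', 'p', 'q')))   # ('imp', ('not', 'p'), 'q')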
(2) It is helpful to think of the proposition symbols as standing for sentences (otherwise
known as propositions). For instance, if p stands for the sentence 2 + 2 = 4 and q for the
sentence 3 + 5 = 42 then p ∧ q stands for 2 + 2 = 4 AND 3 + 5 = 42 (which is false because
q is).
(3) What do we mean by “. . . ” for the proposition symbols? We assume that there is a
countably infinite number of proposition symbols. Does this make sense? Can we actually
have an infinite number of different symbols, each of which is recognisably different from
every other one? Yes we can: here is an example:

V VV VVV VVVV and so on.

Alternatively, we could simply use two symbols, for instance p and ′ , which would allow us
our unending set of proposition symbols

p  p′  p′′  p′′′  . . .

In a sense the last suggestion is the simplest of all, so why don’t we make this part of the
original definition, in keeping with our program? We want the language to be simple in the
sense that it will be easy to define and easy to prove things about it. Saying that we have
an infinite number of proposition symbols if we like makes the language easier to read and
write and does not in fact put any extra difficulties in our way. Indeed, there is really no
problem here — while we normally think of the symbols as being characters written down,
it was pointed out in the previous chapter that they can come from any kind of set at all.
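As a tiny sketch of the p-and-prime suggestion (purely illustrative):

    # Sketch: an unending supply of proposition symbols built from p and the prime.
    from itertools import islice

    def proposition_symbols():
        name = 'p'
        while True:
            yield name
            name += "'"

    print(list(islice(proposition_symbols(), 4)))   # ['p', "p'", "p''", "p'''"]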
So we see that there are in fact many versions of the language SL, depending upon which
particular set of symbols we choose to use. But the difference between these versions of the
language is entirely trivial, so we ignore it and speak of the language SL.
In these notes I will use lower-case letters from the ordinary English alphabet for proposition
symbols. Expressions will usually be represented by upper-case letters.

A.5 Expressions
(S1) Any proposition symbol (by itself) constitutes an expression.

(S2) If A is an expression, then so is (¬A).


(S3) If A and B are expressions, then so is (A ⇒ B).

A.6 Examples
For a start the proposition symbols p, q, r and so on are all expressions.

From these, using (S2) and (S3) we can construct more expressions such as (¬p) and (q ⇒ r).
Then we can keep going, thus ((¬p) ⇒ (q ⇒ r)) and ((¬(q ⇒ p)) ⇒ ((¬(¬p)) ⇒ (q ⇒ (¬p)))) .

A.7 Comments
(1) This definition should perhaps be put slightly more carefully. It is meant to state
that expressions can be built up from simpler ones in this way, and only in this way. One
can think of this as a set of construction rules, telling one how it is allowed to build up
expressions. This way of looking at it does not make it obvious that we have a decision
algorithm here. Perhaps a better approach is to notice that, in the second and third rules,
the new expression is longer than the one(s) from which it is formed. We can therefore
reword the definition in a recursive fashion:

A string C is an expression if and only if it is of one of the forms


(S1′) It is a lone proposition symbol.
(S2′) There is a shorter expression A such that C = (¬A).
(S3′) There are shorter expressions A and B such that C = (A ⇒ B).
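The recursive wording really does give a decision algorithm. Here is a sketch of one way to implement it in Python (my own encoding: expressions as ordinary strings, with ~ standing in for ¬, > for ⇒, and any lower-case letter as a proposition symbol):

    # Sketch of the decision algorithm implicit in (S1')-(S3').
    def is_expression(s):
        if len(s) == 1:
            return s.islower()                     # (S1'): a lone proposition symbol
        if s.startswith('(~') and s.endswith(')') and is_expression(s[2:-1]):
            return True                            # (S2'): s is (¬A)
        if s.startswith('(') and s.endswith(')'):
            depth = 0
            for i, c in enumerate(s):
                if c == '(':
                    depth += 1
                elif c == ')':
                    depth -= 1
                elif c == '>' and depth == 1:      # a top-level ⇒
                    return (is_expression(s[1:i])
                            and is_expression(s[i + 1:-1]))   # (S3'): s is (A ⇒ B)
        return False

    print(is_expression('((~p)>(q>r))'))   # True
    print(is_expression('(p>q>r)'))        # False: parentheses missing

Each recursive call is on a strictly shorter string, so the procedure always terminates.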

(2) The rules introduce more parentheses than we usually use. However if the parenthe-
ses in (S3) were omitted, for instance, we would be allowed to write p ⇒ q ⇒ r, which is
meaningless (since implication is not associative). Our normal use of parentheses depends
on complicated rules of precedence in the application of operators, which presumably we
could include in our rules, at the expense of making our definition much more complicated.
The easier approach is to insist on a complete set of parentheses every time, at the expense
of legibility — for example, the expression we would normally write (p ⇒ ¬p) ⇒ ¬p must
now be written ((p ⇒ (¬p)) ⇒ (¬p)).
(3) In our semi-formal version, we will of course allow ourselves to dispense with paren-
theses according to the usual conventions. Shortly we will introduce the other connectives
into our semi-formal language as abbreviations for the appropriate constructions in the fully-
formal language. For instance, when in the semi-formal language we write p ∨ q we are only
using a convenient shorthand for the true formal ((¬p) ⇒ q).
Shortly we will define formal equivalents for the semi-formal p ∧ q and p ⇔ q. In the mean-
time, you might be interested in trying to figure them out for yourself.
That completes the definition of the language. Of course we have a theory in mind, namely
the set of all tautologies, the logical formulas which are always true, irrespective of the truth

or falsity of the propositions the individual proposition symbols represent. Well-known


examples are

p∧q ⇔q∧p
(p ⇒ (q ∧ ¬q)) ⇒ ¬p
¬(p ∧ q) ⇔ (¬p) ∨ (¬q) .

[These are, of course, presented in semi-formal version. Try translating them into fully-formal
language to see why we don’t want to use it much.]

Comment (1) above tells us how we are allowed to build up expressions. But there is
another aspect to this: any particular expression can be built up in only one way. Putting
this another way, given an expression X, we can break it down into its component parts
in only one way. This is important, because it means that any given expression cannot be
understood to mean two different things; our expressions are not ambiguous.

This property of expressions, known as uniqueness of parsing, relies on the way we have
included parentheses in the definition. If we leave out parentheses, for example write
p ⇒ q ⇒ r then the result is indeed ambiguous: it could mean either (p ⇒ q) ⇒ r or
p ⇒ (q ⇒ r), and these are two quite different things.

A.8 Proposition: Unique parsing


Let X be an expression. Then one and only one of the following is the case:
(1) X is a single proposition symbol (letter).

(2) X is of the form (¬A) for some (shorter) expression A; and in this case there is only
one such possible A.
(3) X is of the form (A ⇒ B) for some (shorter) expressions A and B and in this case
there is only one possible A and one possible B.
I will not give the rather tiresome proof of this here. However, to point out that there is
definitely something to be proved, consider the expression

((p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)))

Is it obvious that it decomposes uniquely?

If you want to prove it for yourself, let me encourage you. Here are some hints. For any
expression, assign numbers to all the symbols in it by starting at the beginning (left-hand
end) with 0, adding 1 every time you encounter an opening parenthesis and subtracting 1
every time you pass a closing parenthesis, thus:
(₁(₂p₂⇒₂(₃(₄p₄⇒₄p₄)₄⇒₃p₃)₃)₂⇒₁(₂(₃p₃⇒₃(₄p₄⇒₄p₄)₄)₃⇒₂(₃p₃⇒₃p₃)₃)₂)₁

(here the number assigned to each symbol is written as a subscript after it).

Notice that opening and closing parentheses are treated slightly differently: you add one as
you reach an opening parenthesis, but subtract one just after you pass a closing parenthesis.
If you now erase all the little numbers, except for the ones over parentheses, it is clear how

they match up:


(₁(₂p ⇒ (₃(₄p ⇒ p)₄ ⇒ p)₃)₂ ⇒ (₂(₃p ⇒ (₄p ⇒ p)₄)₃ ⇒ (₃p ⇒ p)₃)₂)₁

Now all you have to do is work out how this numbering of parentheses defines the way the
expression has been put together, then prove that it works for all expressions. Ha! (If you
study, or have studied, Computer Science, this proof should be very familiar to you.)
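If you want to experiment, here is a sketch in Python of the numbering scheme just described (again with > standing in for ⇒):

    # Sketch: add 1 as you reach an opening parenthesis,
    # subtract 1 just after you pass a closing one.
    def depth_numbers(s):
        depth, out = 0, []
        for c in s:
            if c == '(':
                depth += 1
            out.append(depth)
            if c == ')':
                depth -= 1
        return out

    s = '((p>((p>p)>p))>((p>(p>p))>(p>p)))'
    print(list(zip(s, depth_numbers(s))))

Each closing parenthesis then matches the nearest preceding unmatched opening parenthesis bearing the same number.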

A.9 Induction over construction of an expression


Continuing the idea of Comment (1) above, it is possible to prove things about expressions
by induction over their length. To prove something is true of all expressions of SL, it is
sufficient to prove:
(1) It is true of all expressions which consist of a single proposition symbol;
(2) for any expression A, if it is true of A then it is also true of (¬A); and

(3) for any expressions A and B, if it is true of both of them then it is also true of
(A ⇒ B).
This process is called induction over the construction of the expression. We will be using it
often.

A.10 The axioms


Continuing our definition of the theory, we now define the axioms.
There are three axiom schemata:

(SL1) (A ⇒ (B ⇒ A))

(SL2) ((A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C)))

(SL3) (((¬A) ⇒ (¬B)) ⇒ (((¬A) ⇒ B) ⇒ A))

As before, to say that these are axiom schemata means that each represents an infinite
number of actual axioms, created by substituting any actual expressions for the symbols A,
B and C used here. For example, instances of (SL1) include the actual axioms

(p ⇒ (q ⇒ p))
(p ⇒ (p ⇒ p))
(((¬q) ⇒ (¬p)) ⇒ ((p ⇒ (¬p)) ⇒ ((¬q) ⇒ (¬p))))

and of course infinitely many others. It is a straightforward, if tiresome, process to decide


whether any given expression is an axiom, that is, an instance of one of the three schemata
above. Perhaps a simpler way of looking at this is to read (SL1) as saying that any expression
of the form (A ⇒ (B ⇒ A)) is an axiom, and to read (SL2) and (SL3) in a similar fashion.

The axioms are a little easier to comprehend if we remove some parentheses and write them
semi-formally:

(SL1′)  A ⇒ (B ⇒ A)

(SL2′)  (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))

(SL3′)  (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)

(There are a few other visual clues here too — extra space and some larger parentheses.)

A.11 The rules


Finally, we need our deductive rules. In fact, we need only one, called Modus Ponens. It is
also a schema:

A , (A ⇒ B)
(MP)
B

that is, given A and (A ⇒ B) one can deduce B. The rôle of rules in defining theories and
proofs has been discussed in the previous chapter.

A.12 Comments on the axioms and rules


(i) This is not the only way to axiomatise Sentential Logic. A brief look at textbooks on
logic, especially symbolic logic or mathematical logic, will yield many other ways of doing
it. It may even seem to be a bit strange — after all, the axioms (SL2) and (SL3) are
not enormously immediately compellingly obvious, especially compared with such gems as
(A ∧ B) ⇔ (B ∧ A). It is, however, wonderfully economical: we get the whole of Sentential
Logic in three reasonably simple axiom schemata and one very simple rule of deduction.
This is a great help later. (We look at some other axiomatisations in Section I.)
(ii) It has already been pointed out that we do not (yet) have a symbol for the logical
“and” connective; we will introduce it later in Section F.7. When we do, we will also prove
that A ⇒ (B ⇒ C) is equivalent to A ∧ B ⇒ C (that’s in Theorem F.10).

However, if we allow ourselves to look forward to this notation, we can rewrite the three
axioms thus:
(SL1′′)  A ∧ B ⇒ A
(SL2′′)  (A ∧ B ⇒ C) ∧ (A ⇒ B) ⇒ (A ⇒ C)

(SL3′′)  (¬A ⇒ ¬B) ∧ (¬A ⇒ B) ⇒ A


They perhaps make better sense in this form.
(iii) It is by no means obvious that we can in fact deduce the whole of Sentential Logic
from this basis. Try, for instance, discovering a proof of the very simple theorem p ⇒ p
(proof in B.1: no peeking!).

(iv) It is possible to avoid the “schemata” way of setting up axioms. An alternative


approach is to define (SL1), (SL2), (SL3) as axioms as follows (here p, q and r are proposition
symbols):
(SL1 alt) (p ⇒ (q ⇒ p))

(SL2 alt) ((p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r)))


(SL3 alt) (((¬p) ⇒ (¬q)) ⇒ (((¬p) ⇒ q) ⇒ p))
at the cost of adding one more substitution rule, which states that substituting any ex-
pression for any proposition symbol throughout any theorem (consistently) yields another
theorem. (MP must remain a schema, for reasons explained above.)

When we move on to Predicate Logic in the next chapter, setting up the axioms in this
manner becomes very messy. So we won’t pursue this any further.
(v) It is possible to define this theory semantically; we will look at this later. Also
the “truth table” method yields a decision process without too much trouble: an algorithm
which will decide whether any given expression is a theorem or not. Once we have checked
this, we will have proved that the theory is decidable.
It is in fact usually easier to check whether an expression in SL is a theorem or not by using
the truth-table method than by looking for a semi-formal proof. However, there is no such
nice procedure when we move on to Predicate Logic, so we need to prepare for that already.
That is: Predicate Logic contains Sentential Logic, so if we don’t do SL by proofs now, we
will just have to do it by proofs in the next chapter.
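As a preview of that decision process, here is a sketch in Python of a brute-force tautology checker, reusing the nested-tuple encoding from my earlier sketch. (That theoremhood and tautology coincide is, of course, exactly what remains to be checked later.)

    # Sketch: the truth-table decision procedure for SL.
    from itertools import product

    def evaluate(e, v):                      # v maps proposition symbols to True/False
        if isinstance(e, str):
            return v[e]
        if e[0] == 'not':
            return not evaluate(e[1], v)
        return (not evaluate(e[1], v)) or evaluate(e[2], v)    # ('imp', A, B)

    def symbols(e):
        if isinstance(e, str):
            return {e}
        return set().union(*(symbols(x) for x in e[1:]))

    def is_tautology(e):
        syms = sorted(symbols(e))
        return all(evaluate(e, dict(zip(syms, vals)))
                   for vals in product([True, False], repeat=len(syms)))

    print(is_tautology(('imp', 'p', 'p')))                  # True
    print(is_tautology(('imp', 'p', ('imp', 'q', 'p'))))    # True: an instance of (SL1)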

B Examples of formal proofs


First, a formal proof of a theorem.

B.1 Proposition
(p ⇒ p)
Proof

((p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)))


(p ⇒ ((p ⇒ p) ⇒ p))
((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p))
(p ⇒ (p ⇒ p))
(p ⇒ p)

That is the full formal proof. It is extremely unfriendly to human readers — on the other
hand a computer would have little trouble with it. Below it is presented again in a less
formal form. Parentheses have been removed, some spacing has been inserted, line numbers
have been added, and a gloss on the right indicating which rules or axioms have been used.
In Line 1, the gloss “Ax 2” means that this is an instance of Axiom Schema (SL2) and, in
line 3, the gloss “MP: 1 and 2” means that this is the result of applying Modus Ponens to
Lines 1 and 2.

1 (p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)) Ax 2


2 p ⇒ ((p ⇒ p) ⇒ p) Ax 1
3 (p ⇒ (p ⇒ p)) ⇒ (p ⇒ p) MP: 1 and 2
4 p ⇒ (p ⇒ p) Ax 1
5 p⇒p MP: 3 and 4 

With this proof and the next one, it is a very good idea to check it through line by line and
make sure you understand just how each line is an instance of an axiom or an application
of MP, as claimed.
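Checking a fully-formal proof is mechanical enough that a few lines of Python suffice. Here is a sketch (my own encoding again: ('imp', A, B) for (A ⇒ B)) which verifies the five-line proof above: lines 1, 2 and 4 are built as axiom instances, and lines 3 and 5 are checked to follow by Modus Ponens from earlier lines.

    # Sketch: mechanically checking the proof of (p ⇒ p).
    def imp(a, b): return ('imp', a, b)
    def ax1(a, b): return imp(a, imp(b, a))                       # schema (SL1)
    def ax2(a, b, c): return imp(imp(a, imp(b, c)),
                                 imp(imp(a, b), imp(a, c)))       # schema (SL2)

    p = 'p'
    proof = [
        ax2(p, imp(p, p), p),                # 1: Ax 2 with A = p, B = (p ⇒ p), C = p
        ax1(p, imp(p, p)),                   # 2: Ax 1 with A = p, B = (p ⇒ p)
        imp(imp(p, imp(p, p)), imp(p, p)),   # 3: MP, 1 and 2
        ax1(p, p),                           # 4: Ax 1 with A = p, B = p
        imp(p, p),                           # 5: MP, 3 and 4
    ]

    def follows_by_mp(line, earlier):
        # line follows by MP if some earlier A and (A => line) both occur
        return any(imp(a, line) in earlier for a in earlier)

    for i, line in enumerate(proof):
        assert i in (0, 1, 3) or follows_by_mp(line, proof[:i]), f'line {i + 1}'
    print('all five lines check out')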
Because the axioms and the rule are all schemata, it follows that we can substitute any
expression for p in the above theorem and its proof and we will still have a valid theorem
and proof. In other words, we immediately get a schematic version of the theorem:

B.2 Proposition
A⇒A
(Here A stands for any expression.)

Now let us look at a proof of a deduction.

B.3 Proposition
p ⇒ q , q ⇒ r ⊢ p ⇒ r

Proof.

1 (q ⇒ r) ⇒ (p ⇒ (q ⇒ r)) Ax 1
2 q⇒r Hyp
3 p ⇒ (q ⇒ r) MP: 1 and 2
4 (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r)) Ax 2
5 (p ⇒ q) ⇒ (p ⇒ r) MP: 3 and 4
6 p⇒q Hyp
7 p⇒r MP: 6 and 5


Recall that, in a proof of a deduction, we are allowed to write down any one of the hypotheses
as a step at any time. Here I have done this in Steps 2 and 6.
As with Theorem B.2, we could just as easily have proved a schematic version:

B.4 Proposition
A ⇒ B , B ⇒ C ⊢ A ⇒ C

Proof. Substitute A, B and C for p, q and r throughout the proof above. 

We can make this kind of substitution in any theorem and proof in SL. Since a schematic the-
orem is more general than an ordinary one, we will present and prove theorems in schematic
form throughout this chapter.

B.5 Proposition
Another way of looking at the axioms.
(SL1d)  A ⊢ B ⇒ A

(SL2d)  A ⇒ (B ⇒ C) ⊢ (A ⇒ B) ⇒ (A ⇒ C)
(SL2d′)  A ⇒ (B ⇒ C) , A ⇒ B ⊢ A ⇒ C
(SL3d)  ¬A ⇒ ¬B , ¬A ⇒ B ⊢ A

Proof. (SL1d)

1 A hyp
2 A ⇒ (B ⇒ A) Ax SL1
3 B⇒A MP: 1 and 2

(SL2d)

1   A ⇒ (B ⇒ C)                                       hyp
2   (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))               Ax SL2
3   (A ⇒ B) ⇒ (A ⇒ C)                                 MP: 1 and 2

(SL2d′)

1   A ⇒ (B ⇒ C)                                       hyp
2   A ⇒ B                                             hyp
3   (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))               Ax SL2
4   (A ⇒ B) ⇒ (A ⇒ C)                                 MP: 1 and 3
5   A ⇒ C                                             MP: 2 and 4

(SL3d)

1   ¬A ⇒ ¬B                                           hyp
2   ¬A ⇒ B                                            hyp
3   (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)                        Ax SL3
4   (¬A ⇒ B) ⇒ A                                      MP: 1 and 3
5   A                                                 MP: 2 and 4


You will notice that all these follow in the same way from the axioms, using Modus Ponens
to justify replacing an implication (⇒) with a turnstile (⊢).

C Deductions
C.1 Using deductions as rules
One important thing about deductions is that, once proved, we can henceforth use them
in the same way as rules (in our semi-formal theory). For example, having proved the last
deduction above (Proposition B.4), we can henceforth proceed as though we had the new
deductive rule
A⇒B , B⇒C
.
A⇒C
What we are doing here is extending our semi-formal language, and making it more con-
venient. We are not changing the formal language in any way. Using the deduction just
proved, for instance, we now allow ourselves to put a line in one of our semi-formal proofs
of the form
A⇒C By Proposition B.4
provided we already have two lines A ⇒ B and B ⇒ C somewhere. The justification for
this new freedom is that there is a recipe for converting such a proof-step into a full formal
version for the object language. Here it is:

Replace the line A ⇒ C in the semi-formal version with the entire proof of the de-
duction B.4, as given above. The result is a valid fully-formal proof. The new lines
added to the proof include A ⇒ C, and so the new proof contains all the lines of the
old proof in the same order. Therefore all the old lines are still valid, for the same
reasons (except for A ⇒ C itself, for which we need a proper reason). We now justify
each of the new steps in the proof (and that includes A ⇒ C). If a new step was an
axiom, then it still is, so it is valid. If a new step followed by Modus Ponens from
two earlier steps in the proof of the deduction, then these earlier steps have just been
added to our new formal proof: the step being examined is valid. Finally, if the new
step was a hypothesis of the deduction, then it is a repetition of an earlier step of the
old proof, and this is also valid, as remarked in 1.B.11.

C.2 The Deduction Theorem


Note We will now prove the Deduction Theorem. Despite its name, it is not a theorem of
SL. It is a theorem about this theory and so is a metatheorem. As remarked above, there
is nothing in the language SL which allows the idea symbolised by ⊢ to be expressed, let
alone a formal proof being supplied (within SL).
The Deduction theorem is very important to our program. It provides a great convenience
in constructing proofs, and we will use it often. It represents a common form of argument
in mathematics and elsewhere, as follows. Suppose we wish to prove some statement of the
form H ⇒ C. What we usually do is make an argument of this form
Suppose H;
proof step
proof step
..
.
proof step
Therefore C

and from this argument conclude that H ⇒ C.


That this works, in the sense that we may add it to our arsenal of proof techniques in the
semi-formal language, is what the Deduction Theorem tells us.

C.3 The Deduction Theorem (simple form)


H ⊢ C    if and only if    ⊢ H ⇒ C .

Note It is easy to prove the “reverse direction” (that if ⊢ H ⇒ C then H ⊢ C) — this


is almost a restatement of Modus Ponens. The “forward direction”

If H ⊢ C then ⊢ H ⇒ C .

is the important one, and the harder to prove. Notice that it tells us that, in some sense,
the symbol ⇒ says, within the formal language, what the symbol ⊢ says outside it (in
the metalanguage). This corresponds to what we are used to: that one can do formal
manipulations with the symbol ⇒, yet it can also be interpreted as saying that one thing
can be deduced from another. For our present purposes it is important to keep the two ideas
distinct, though the Deduction Theorem states the close relationship between them.

Proof. We assume that we have a proof of H ⊢ C; let’s call this the “old proof”. Using
this we will construct a proof of ⊢ H ⇒ C, the “new proof”. We start by putting “H ⇒”
in front of every line of the old proof:

Old proof New proof


S1 H ⇒ S1
S2 H ⇒ S2
S3 H ⇒ S3
.. ..
. .

This new proof is not finished yet. Next we are going to inject some extra steps into the
new proof, a few before each of the statements just created, like this:

Old proof New proof


X1
S1 H ⇒ S1
X2
S2 H ⇒ S2
X3
S3 H ⇒ S3
.. ..
. .

(Here, each of these Xi represents several steps.)



For each i, the extra steps Xi we insert depend upon what sort of step Si was in the old
proof. From the definition of a deduction, each step of the old proof is either an axiom, the
hypothesis H or follows from two earlier steps by modus ponens. Let us take these cases
one at a time.
Suppose first that the step in the old proof is an axiom, A say. Then, in the new proof we
replace it by A , A ⇒ (H ⇒ A) , H ⇒ A, so that part of the two proofs looks like this:
Old proof New proof
.. ..
. .
..
. A
..
. A ⇒ (H ⇒ A)
A H⇒A
.. ..
. .
The first step here is OK because it is an axiom. The second step is also an axiom — an
instance of Axiom SL1. And now the third step follows from these two by Modus Ponens.
Now suppose that the step in the old proof we are fixing is the hypothesis H.
Old proof New proof
.. ..
. .
H H⇒H
.. ..
. .

This is OK because it is a theorem (Proposition B.2).


Finally, suppose the step we are fixing follows from two earlier steps by Modus Ponens. So
we start with this
Old proof New proof
.. ..
. .
A H⇒A ← Already fixed
.. ..
. .
A⇒B H ⇒ (A ⇒ B) ← Already fixed
.. ..
. .
B H⇒B ← Now we fix this
.. ..
. .
We fix up the new step H ⇒ B by inserting two lines before it, as follows

Old proof New proof


.. ..
. .
A H⇒A (1)
.. ..
. .
A⇒B H ⇒ (A ⇒ B) (2)
.. ..
. .
(H ⇒ (A ⇒ B)) ⇒ ((H ⇒ A) ⇒ (H ⇒ B)) (3)
(H ⇒ A) ⇒ (H ⇒ B) (4)
B H⇒B (5)
.. ..
. .

This is not as bad as it looks. Steps (1) and (2) are the ones we have already fixed. The
horrible Step (3) is simply an instance of Axiom SL2. Steps (4) and (5) now follow by Modus
Ponens.

Finally observe that the last step in the old proof was C, so the last step in the new one is
H ⇒ C, as required.
Oh yes, it is an “if and only if” theorem. We should also prove the reverse form, that

If  ⊢ H ⇒ C  then  H ⊢ C .

This is easy:

H hyp
H⇒C theorem 
C MP
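The forward-direction recipe is mechanical enough to be carried out by a program. Here is a sketch in Python of the simple (single-hypothesis) case, in my own ad-hoc encoding from the earlier sketches: a proof is a list of (formula, justification) pairs, where a justification is 'axiom', 'hyp' (the step is the hypothesis H itself), or ('mp', i, j), meaning Modus Ponens applied to earlier steps i and j, with step j the implication.

    # Sketch: transforming a proof of H |- C into a proof of |- H => C.
    def imp(a, b): return ('imp', a, b)

    def deduction_theorem(old_proof, H):
        new = []
        for S, just in old_proof:
            if just == 'axiom':
                new.append((S, 'axiom'))
                new.append((imp(S, imp(H, S)), 'Ax SL1'))
                new.append((imp(H, S), 'MP'))
            elif just == 'hyp':                        # the step is H itself
                new.append((imp(H, H), 'theorem B.2'))
            else:                                      # ('mp', i, j): S from A and A => S
                A = old_proof[just[1]][0]
                new.append((imp(imp(H, imp(A, S)),
                                imp(imp(H, A), imp(H, S))), 'Ax SL2'))
                new.append((imp(imp(H, A), imp(H, S)), 'MP'))
                new.append((imp(H, S), 'MP'))
        return new

    # Example: the three-step proof of  A |- B => A  (with A = p, B = q).
    old = [('p', 'hyp'),
           (imp('p', imp('q', 'p')), 'axiom'),
           (imp('q', 'p'), ('mp', 0, 1))]
    for f, why in deduction_theorem(old, 'p'):
        print(f, why)

The last formula printed is p ⇒ (q ⇒ p), that is H ⇒ C — which here happens to be an instance of Axiom SL1 itself.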

We now look at the more general form of the theorem, in which there may be a few extra
hypotheses.

C.4 The Deduction Theorem (general form)


If K1 , K2 , . . . , Kk , H ⊢ C (with k ≥ 0) then K1 , K2 , . . . , Kk ⊢ H ⇒ C .

Proof. This is a straightforward extension of the proof just given. There is only one more
case to consider: where the step in the old proof is one of the extra hypotheses Ki (that is
a perfectly valid step in such a proof). Then the corresponding step in the new proof which
needs fixing is H ⇒ Ki . This is left as an exercise. □

There are a couple of things worth noting about this proof.


Firstly, it is a classic example of showing that a new proof technique works. We take a
“proof” using the new technique and give a recipe for converting it, line by line, into a proof
which does not use the new technique and which we know already works. We will use this
technique several times.

Secondly, notice that we are using the method of incrementally improving our semi-formal
language. Our first increment was to show that we could use previously proved theorems as
steps in proofs. Now we add the Deduction Theorem as our second incremental improvement.
Note that our recipe above does not convert the given proof into a fully formal one; it only
converts it into a “first increment” one (because we appeal to the H ⇒ H theorem). But that’s
OK because we already know that such a proof works.

Note If we have H1 , H2 ⊢ B then two applications of the Deduction Theorem yield

⊢ H1 ⇒ (H2 ⇒ B) .

In the same way, three applications of the theorem to H1 , H2 , H3 ⊢ B eventually yield

⊢ H1 ⇒ (H2 ⇒ (H3 ⇒ B)) .

And, as mentioned several times already, when we come to defining ∧, we can write these
in a form that is easier to understand:

⊢ H1 ∧ H2 ⇒ B
⊢ H1 ∧ H2 ∧ H3 ⇒ B .

There is no particular reason to suppose that there is only a finite number of extra hypotheses
in the above version of the Deduction Theorem. Any set of extra hypotheses can be used:

C.5 The Deduction Theorem (even more general form)


If  H ∪ {H} ⊢ B  then  H ⊢ H ⇒ B .

Proof. Same as before. 

C.6 Example
Earlier in this chapter, we proved that

A ⇒ B , B ⇒ C ⊢ A ⇒ C . (–1)

Using the Deduction Theorem we may now conclude that

A ⇒ B ⊢ (B ⇒ C) ⇒ (A ⇒ C)

and that
⊢ (A ⇒ B) ⇒ ((B ⇒ C) ⇒ (A ⇒ C)) .
It is a useful exercise to use the recipe given above to construct fully-formal proofs for the
last two statements. It then becomes clear what a great labour-saving device the Deduction
Theorem is.

C.7 A better example


In Section F we will prove a number of the basic theorems of Sentential Logic. One of the
first of these is
⊢ (A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C)) . (–1)
This is a classic example of the use of the Deduction Theorem. We will be using this
technique a lot, so I will describe it carefully here.
The expression to the right of the turnstile here is of the form X ⇒ Y , so we move the X
part to the left side of the turnstile, thus:

A ⇒ (B ⇒ C) ⊢ B ⇒ (A ⇒ C) . (–2)

Now the expression to the right of the turnstile is again of the form X ⇒ Y , so we repeat the
operation:
A ⇒ (B ⇒ C) , B ⊢ A ⇒ C . (–3)
And again:
A ⇒ (B ⇒ C) , B , A ⊢ C . (–4)
The point of these manipulations is, firstly, that (–1), (–2), (–3) and (–4) are all equivalent
in the sense that if any one of them is a valid statement, then they all are. And, secondly,
that the final form (–4) is much more likely to be easy to prove than the first one — the
thing to be proved is much simpler and there are more hypotheses which can be employed.
Here is the easy proof of (–4)

1 A ⇒ (B ⇒ C) hyp
2 B hyp
3 A hyp
4 B⇒C MP: 3, 1
5 C MP: 2, 4

Having proved (–4) then (–3), then (–2), then (–1) all follow in order by direct applications
of the Deduction Theorem.

Should we be interested in the actual equivalence of these statements, the argument in the
forwards direction from (–1) to (–2) to (–3) to (–4) does not require the Deduction Theorem,
just applications of Modus Ponens. For example, to get from (–1) to (–2), we assume that
(–1) has been proved and exhibit a proof of (–2):

1 A ⇒ (B ⇒ C) hyp
2 (A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C)) Thm (–1)
3 B ⇒ (A ⇒ C) MP: 1 and 2

The other steps are proved in the same way.



D Three “it follows that” relationships


D.1 Discussion
You will have noticed that we have been dealing with three different ways an expression B
can “follow from” an expression A:
⊢ A ⇒ B ,        A ⊢ B        and        the rule  A / B .
What is the difference between them? Answer: not a lot. Let us see first what they say. We
are talking about SL here:

⊢ A ⇒ B    says that the expression A ⇒ B is a theorem.

A ⊢ B    says that there is a proof (of a deduction) leading from A to B.

A / B    says that, in any proof in which A has already occurred as a step,
it is now allowed to write down B as a step. More precisely, in the original
definition of the formal theory, one of the given rules is that B may occur as
a step in any formal proof in which A has already occurred as a step.

Firstly, the first two are certainly equivalent; that is, if either one of them is true, then so is
the other. That is what the Deduction Theorem tells us.
Secondly, if A / B is true, then it is easy to see that A ⊢ B must be too: here is a proof

A        hyp
B        because A / B

However this does not work the other way around. For example, in SL, we have A ⊢ B
whenever B is an axiom, irrespective of what A is, and this is certainly not one of the given
rules of the theory (Modus Ponens is the only one such).

E Subdeductions

In this section we introduce another extension to our proof methods. This one is a great
convenience, and we will use it a lot. We start with an example.
We can clean up the proof of A ⇒ B , B ⇒ C ⊢ A ⇒ C even further. Informally, we are
tempted to argue thus:

We are given the hypotheses A ⇒ B and B ⇒ C. Now suppose that A (a temporary hy-
pothesis). Then, from the first hypothesis we have B and then, from that and the second
hypothesis, we have C. Since we got this on the supposition A, we have proved that A ⇒ C.
It will be very useful to make this kind of subsidiary deduction part of our semi-formal
theory. Let us try to make it look more formal.

1 A⇒B hyp
2 B⇒C hyp
3 A subhyp Proof 1
4 B MP: 3 & 1
5 C MP: 4 & 2
6 A⇒C Ded: 3–5

Here I have enclosed the subsidiary deduction in an inner box, and for both the main deduc-
tion and the subsidiary one have drawn a line to separate the hypotheses from the proper
proof steps. It is very important in this kind of proof to know where our subdeductions
start and end, and exactly which lines are hypotheses and which are the proper proof steps.
These boxes seem to me to be a good way of doing it.

So here we have a subsidiary deduction (Lines 3–5) based on a temporary (or subsidiary)
hypothesis A. When that subsidiary proof is finished, we can conclude that A ⇒ C. The
steps in the subsidiary proof are allowed to use any steps that came before, whether in the
subsidiary proof or the main one — for example, Line 5 makes use of Lines 4 and 2.

How do we justify this? As usual, by showing that a proof which uses such a subsidiary
deduction can be replaced by one which does not. The recipe is to make the subsidiary
proof into a separate deduction which uses all the preceding steps of the main proof as
hypotheses. Like this:

1 A⇒B hyp
2 B⇒C hyp Proof 2
3 A hyp
4 B MP: 3 & 1
5 C MP: 4 & 2

Note that the proper steps of the new proof (Lines 4 and 5 in this example) are justified by

the same things as they were in Proof 1 above. Proof 2 is, of course, a proof of the deduction

A ⇒ B , B ⇒ C , A ⊢ C

And now the Deduction Theorem gives us

A ⇒ B , B ⇒ C ⊢ A ⇒ C

as required.
Now let us prove that this works in general. Suppose then that we have a proof with a
subsidiary deduction of this form:

H1 , H2 , . . . , Hh hypotheses for the main proof


S1
..
.
Sp−1 Proof 3
K subsidiary hypothesis
T1
..
.
Tt
K ⇒ Tt (This is Step Sp )
Sp+1
..
.
Ss

Here the usual rules apply: the hypotheses H 1 , H 2 , . . . , H h and K can be any expressions
at all.
Each of the “main” steps S 1 , S 2 , . . . , S s can be an axiom, one of the main hypotheses
H 1 , H 2 , . . . , H h or follow by Modus Ponens from any of its preceding main steps Si . It
may not use K or any of T1 , T2 , . . . , Tt for the purposes of Modus Ponens.
Each of the steps T 1 , T 2 , . . . , T t of the subsidiary proof can be an axiom, one of the main
hypotheses H 1 , H 2 , . . . , H h or the subsidiary one K or follow by Modus Ponens from any
of the preceding main steps Si or subsidiary ones Ti .
The process of going from the subproof to the line K ⇒ Tt is often called “discharging the
hypothesis K”.

Now this purports to be a proof of H1 , H2 , . . . , Hh ⊢ Ss ; we must show that this is so, by


showing that it can be replaced by a proof which does not involve the subsidiary deduction.
Following the idea already used above, we set down a new deduction which is the same as
our subsidiary one, except that all of H 1 , H 2 , . . . , H h , S 1 , S 2 , . . . , S p−1 and K are used as
hypotheses:—

H1 , H2 , . . . , Hh hypotheses
S1 , S2 , . . . , Sp−1 more hypotheses
K yet another hypothesis Proof 4
T1
..
.
Tt

Once again, note that anything which justified one of the lines T 1 , T 2 , . . . , T t in Proof 3 still
justifies that line in Proof 4, so this is then a valid proof of the deduction

H1 , H2 , . . . , Hh , S1 , S2 , . . . , Sp−1 , K ⊢ Ti

(for any i = 1, 2, . . . , t).

Then from the Deduction Theorem, we have

H1 , H2 , . . . , Hh , S1 , S2 , . . . , Sp−1 ⊢ K ⇒ Ti

so it is now valid to write down


K ⇒ Ti
as any of the steps Sp+1 , Sp+2 , . . . , Ss , as required.
We will see how useful this method is in the next few sections. For now, let us just note
that it can be made even more powerful in several (obvious?) ways; the proofs that they
are valid are easy modifications of the one given above.
(1) The step K ⇒ Tt discharging the subsidiary hypothesis K does not have to occur
immediately following the subproof: it can occur anywhere after it. However there is never
any good reason to defer it in this way and to do so would only confuse the reader. Best
avoided.

(2) Similarly, the step discharging the hypothesis does not have to involve the last step
of the subproof: K ⇒ Tu , where Tu is any of the substeps, is equally valid. But again, why
do this? It only renders the last few steps of the subproof a waste of time. Except . . .
(3) It is OK to discharge the hypothesis several times, for instance
K ⇒ Tu (Step Sp )
K ⇒ Tv (Step Sp+1 )
K ⇒ Tt (Step Sp+2 )
Sp+3

(Useful if you want to show that several things follow from the same hypothesis.)

(4) More usefully, one may have multiple hypotheses for a subproof, for example:

H1 , H2 , . . . , Hh main hyps
S1
..
.
Sp−1
K1 , K2 , K3 subhyp
T1
..
.
Tt
K1 ⇒ (K2 ⇒ (K3 ⇒ Tt ))
Sp+1
..
.
Ss

Once we have the ∧ connective to work with, the discharging step can be written K1 ∧ K2 ∧ K3 ⇒ Tt .
(5) One can have multiple subdeductions, any number, either sequentially or nested, for
examples:

Sequentially:

H1 , H2 , . . . , Hh     main hyps
S1
 ..
Sp−1
|  K     subhyp
|  T1
|   ..
|  Tt
K ⇒ Tt
|  L     subhyp
|  U1
|   ..
|  Uu
L ⇒ Uu
Sp+1
 ..
Ss

Nested:

H1 , H2 , . . . , Hh     main hyps
S1
 ..
Sp−1
|  K     subhyp
|  T1
|   ..
|  |  L     subsubhyp
|  |  U1
|  |   ..
|  |  Uu
|  L ⇒ Uu
|   ..
|  Tt
K ⇒ Tt
Sp+1
 ..
Ss

E.1 Example
To some extent one can view the use of subproofs as a convenient way of folding uses of the
Deduction Theorem into one’s proofs. For example, we can prove the result discussed in
Example C.7,
(A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C))
using this technique.

1 A ⇒ (B ⇒ C) hyp
2 B subhyp
3 A subsubhyp
4 B⇒C MP: 3 and 1
5 C MP: 2 and 4
6 A⇒C Ded: 3–5
7 B ⇒ (A ⇒ C)    Ded: 2–6

F Some theorems and deductions

F.1 Discussion
In this section I give a number of theorems and deductions and their proofs. This is by no
means an exhaustive list of the theorems of SL, not even a list of all the more useful ones.
There are enough theorems here, however, to supply a working basis: from this further
theorems can be proved without difficulty. The proofs illustrate the techniques discussed
above.
Do not feel you have to read all these proofs in detail — but if you are interested enough
to do so, don’t let me discourage you! One thread of this chapter, the next and Chapter 6
is to provide enough detail to show that the whole of mathematics, as usually done, can be
set up as a formal theory on the axiomatic basis provided here. The point of this being that
we can then prove things about mathematics, some of them quite surprising.
Nevertheless, the examples of proof techniques provided here are important, and you should
read and digest enough of them to understand well what is going on. A few of the more
straightforward proofs are left as exercises; you should supply these for yourself.

As already mentioned, SL is designed to deal with sentences, by which we mean statements


which are either true or false. Traditional logic has worked with a grab-bag of such procedures (syllogisms and so on) for millennia, and SL is meant to codify and systematise this.
Thus there are two ways to look at any of the theorems of SL: as a theorem of the formal
theory, which must have a formal proof, or as a statement about the relationships between
sentences, which should make sense according to our informal understanding of the meaning
of “true” and “false” and which can be used in the course of a proof.
A safe way of looking at this is to think of SL as an abstract system — a game if you like —
with a hard and fast set of rules. We can play this game to produce theorems. But then, as
well, all the expressions can be interpreted as statements about the relationships between
sentences. The theorems will then be the ones where this interpretation is in fact correct.
As this chapter proceeds, I will point out a number of these interpretations, since most of
them are useful in formal arguments.
We start with some theorems about the implication connective ⇒. There are some common
misconceptions about this connective, the main one being that in a statement such as A ⇒ B,
there should be some sort of argument or causality leading from A to B. But this is not so,
the truth or falsity (“truth value”) of the statement depends only upon the truth values of A
and B; they may be entirely unrelated. This dependence on truth values may be summed
up:
true ⇒ true the statement is true
true ⇒ false the statement is false
false ⇒ true the statement is true
false ⇒ false the statement is true
For an example using everyday facts, the statement

London is the capital of France ⇒ The blue whale is the largest living creature

is true because the first part (about London) is false. From the truth-value table given
above, we can see that this is true whether or not the second part (about blue whales) is
true (it isn’t by the way).
Warning We will see later (Section G) that calculations involving truth-values in this way can be used to decide whether an expression of SL is a theorem or not. But the fact that that technique works needs proof, and that proof requires a lot of the theorems developed in this section. So to use truth-value calculations to verify any of the proofs in this section would be circular; in short, they must not be used yet.
So why does one usually think of implication as requiring some sort of argument or proof?
I think that this is because, in mathematics, we seldom want to prove implications between
sentences (like, say, 2 + 2 = 4 ⇒ 2 + 3 = 5), but rather implications which are true for an
infinite number of values ((∀x)(∀y)(∀z)(x < y ⇒ x + z < y + z)). But you cannot check
the truth of an infinite number of cases one by one — you need a general argument.
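This dependence on truth-values is easy to tabulate mechanically. Here is a minimal sketch in Python (the function name implies is my own choice, not anything in the formal theory), offered purely as an illustration of the informal reading — per the warning above, it is not a proof device.

# Illustration only: the informal truth-value reading of the connective "⇒".
# Per the warning above, this must not be used to verify proofs in this section.

def implies(a: bool, b: bool) -> bool:
    """Truth-value of A ⇒ B: false only when A is true and B is false."""
    return (not a) or b

for a in (True, False):
    for b in (True, False):
        print(a, "⇒", b, ":", implies(a, b))

# The London/blue-whale example: a false antecedent makes the
# implication true regardless of the consequent.
assert implies(False, True) and implies(False, False)

Running it reproduces the four-line table above, with the two false-antecedent rows coming out true.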

F.2 Proposition
A⇒A

Proof. This has already been provided. 

“If something is true, then it is true” is about as obvious as you can get. But we do need to
check that it is a theorem.

F.3 Proposition
(a) (A ⇒ B) ⇒ ((B ⇒ C) ⇒ (A ⇒ C))

(b) (A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C))


(c) A ⇒ (¬A ⇒ B)
(a) Note that this is equivalent to
(a′) A ⇒ B, B ⇒ C ⊢ A ⇒ C
(as discussed above). This is chaining implications.

(b) This is a little logical manipulation that is useful from time to time. It is equivalent to
(b′) A ⇒ (B ⇒ C) ⊢ B ⇒ (A ⇒ C)

(c) is the really interesting one here. Noting first that it is equivalent to
(c′) A, ¬A ⊢ B
it says: if there is a statement somewhere amongst the expressions which can be proved both
true and false, then every expression becomes a theorem and therefore so does its negation.

Put this another way: every expression in the theory becomes both true and false. The
theory then is inconsistent and probably quite useless.
We will prove in Section G that this cannot happen for SL and in Section 3.H that it cannot happen for plain predicate logic, which will be reassuring. But the possibility remains that
by extending these theories by adding more symbols and axioms, as we do to make various
important formal theories, including mathematics, we may introduce an expression which
can be proved both true and false. Of course, we really want to avoid this if possible.

Proof of (a). This has been done to death above in Section E.

Proof of (b). This has already been given (twice) in C.7 and E.1.
Proof of (c). We prove (c′) and then (c) follows by two applications of the Deduction Theorem.

1 A hyp
2 ¬A hyp
3 ¬B ⇒ A By Axiom SL1d on 1
4 ¬B ⇒ ¬A By Axiom SL1d on 2
5 (¬B ⇒ ¬A) ⇒ ((¬B ⇒ A) ⇒ B) Ax 3
6 (¬B ⇒ A) ⇒ B MP: 4 & 5
7 B MP: 3 & 6


Note SL1d is in Proposition B.5.


Comment We have used Axiom SL1 (A ⇒ (B ⇒ A)) in the form A ⊢ (B ⇒ A) to get
from Line 1 to Line 3. This is quite a common trick and we will use it repeatedly: where
an axiom or theorem is in the form of an implication, we can always use it as a deduction
with no further ado.

F.4 Proposition: Double negation


In this section, nearly every proposition will have a “theorem” form and a “deduction” form.
They will always be equivalent, in the sense that if either one of them is true then they both
are. One can be proved from the other by the methods discussed above, so in general I will
only give the proof of one form; usually the “deduction” form, because that is easier.
Here I place the two forms side by side for comparison:
(a) A ⇒ ¬¬A        (a′) A ⊢ ¬¬A
(b) ¬¬A ⇒ A        (b′) ¬¬A ⊢ A

and (a′) and (b′) together give

A ⊣⊢ ¬¬A

This is the result usually known as “double negation”. To say that it is not true that A is
not true is the same as saying that A is true. In other words, two negatives nested in this
way cancel each other out.
Double negation is not often used in this bare-faced way in ordinary language:

“You told me you are not coming to the party tonight.”


“You’re wrong. I’m not not coming to the party tonight.”
However it is often used in disguised forms, such as
“George is not unlike his brother.”

But beware of ordinary-language uses where a double negative does not cancel out, but
is used to reinforce the negativeness — or just ungrammatically. Consider I can’t get no
satisfaction by you-know-who or the (ironic?) We don’t need no education by Pink Floyd.
Or the deplorable word irregardless (where both the prefix "ir-" and the suffix "-less" should denote negation).

Proof. We prove the deduction forms: ¬¬A ⊢ A and A ⊢ ¬¬A.

1 ¬¬A hyp
2 ¬A ⇒ ¬¬A SL1d on 1
3 (¬A ⇒ ¬¬A) ⇒ ((¬A ⇒ ¬A) ⇒ A) Ax 3
4 (¬A ⇒ ¬A) ⇒ A    MP: 2 and 3
5 ¬A ⇒ ¬A Proposition F.2
6 A MP: 4 and 5

And

1 A hyp
2 ¬¬¬A ⇒ ¬A Part 1 of this proof
3 (¬¬¬A ⇒ ¬A) ⇒ ((¬¬¬A ⇒ A) ⇒ ¬¬A) Ax 3
4 (¬¬¬A ⇒ A) ⇒ ¬¬A    MP: 2 and 3
5 ¬¬¬A ⇒ A SL1d on 1
6 ¬¬A MP: 5 and 4


F.5 Proposition: Contrapositive


Placing the two forms side by side again:
(a) (A ⇒ B) ⇒ (¬B ⇒ ¬A)        (a′) A ⇒ B ⊢ ¬B ⇒ ¬A
(b) (¬A ⇒ ¬B) ⇒ (B ⇒ A)        (b′) ¬A ⇒ ¬B ⊢ B ⇒ A
(c) (A ⇒ ¬B) ⇒ (B ⇒ ¬A)        (c′) A ⇒ ¬B ⊢ B ⇒ ¬A
(d) (¬A ⇒ B) ⇒ (¬B ⇒ A)        (d′) ¬A ⇒ B ⊢ ¬B ⇒ A

(a′) and (b′) together give

(a″) A ⇒ B ⊣⊢ ¬B ⇒ ¬A

Remembering that all these theorems are given as schemas — so, for instance, A ⇒ ¬B ⊢ B ⇒ ¬A is in fact the same schema as B ⇒ ¬A ⊢ A ⇒ ¬B — (c′) and (d′) give

(c″) A ⇒ ¬B ⊣⊢ B ⇒ ¬A
(d″) ¬A ⇒ B ⊣⊢ ¬B ⇒ A
Note that (b) is the standard way of arguing by “contrapositive”. We prove “if A then B”
by showing that if B is false then A must be too. The expression ¬B ⇒ ¬A is called the
contrapositive of A ⇒ B.
(a), (c) and (d) are just variations on this idea.
Note also that (a′) can be rewritten in the form
(a‴) A ⇒ B, ¬B ⊢ ¬A

This is one of the versions of proof by contradiction (a.k.a. reductio ad absurdum). It says that if A implies something that we know to be false, then A must itself be false. The same manipulation can be applied to (b′), (c′) and (d′) to produce some other variations on the proof-by-contradiction theme.

Proof of (b).

1 ¬B ⇒ ¬A hyp
2 (¬B ⇒ ¬A) ⇒ ((¬B ⇒ A) ⇒ B) Ax 3
3 (¬B ⇒ A) ⇒ B MP: 1 and 2
4 A subhyp
5 ¬B ⇒ A    SL1d on 4
6 B MP: 5 and 3
7 A⇒B Ded: 4–6

Proof of (a)

1 A⇒B hyp
2 ¬¬A ⇒ A Proposition F.4
3 ¬¬A ⇒ B Proposition F.3(a) on 1 and 2
4 B ⇒ ¬¬B Proposition F.4 again
5 ¬¬A ⇒ ¬¬B F.3(a) on 3 and 4
6 (¬¬A ⇒ ¬¬B) ⇒ (¬B ⇒ ¬A) First part of this proof
7 ¬B ⇒ ¬A MP: 5 and 6

Proofs of (c) and (d) are left as exercises. □

From now on I will not give both forms of the propositions, except where there is good
reason to. But don’t forget that they exist.

F.6 Proposition
(a) B ⇒ (A ⇒ B)
(b) ¬A ⇒ (A ⇒ B)

(c) ¬(A ⇒ B) ⇒ A
(d) ¬(A ⇒ B) ⇒ ¬B
Proofs as easy exercises. (a) of course is Axiom SL1, so needs no proof. It is included
here because it fits in with the pattern of the other parts.

F.7 Notation
Now let us introduce the other connectives. As already stated, they are not part of the
fully-formal language, but are used in the semi-formal one as abbreviations for the more
complicated expressions which they represent.

A∧B stands for ¬(A ⇒ ¬B)


A∨B stands for ¬A ⇒ B
A⇔B stands for (A ⇒ B) ∧ (B ⇒ A) .

The simplest way to think of this is as three new rules. The first says that you can substitute
any expression of the form A ∧ B for one of the form ¬(A ⇒ ¬B) and vice versa (same A
and B of course). The other two are to be read the same way.
These substitutions can even be carried out on subexpressions. For instance, it is allowed
to substitute

(A ∧ B) ⇒ (C ∨ D) for ¬(A ⇒ ¬B) ⇒ (¬C ⇒ D)

and vice versa.


In the literature, these connectives are often called conjunction, disjunction and equivalence.
It is usually more comfortable to read them as and, or and if and only if, this last often
being shortened to iff.
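Because these are purely syntactic abbreviations, the unfolding can be mechanised. Here is a minimal sketch, assuming a home-made representation of expressions as nested tuples (the representation is mine, not part of the formal language); it rewrites ∧, ∨ and ⇔ into the official ¬/⇒ core exactly as defined above.

# Sketch: the abbreviations of F.7 as syntactic rewriting.
# Expressions are nested tuples: ('not', A), ('imp', A, B), ('and', A, B), ...
# The tuple representation is my own; the expansions are exactly those above.

def expand(e):
    """Rewrite an expression into the core language, using only 'not' and 'imp'."""
    if isinstance(e, str):                       # a proposition symbol
        return e
    op, *args = e
    args = [expand(a) for a in args]
    if op in ('not', 'imp'):
        return (op, *args)
    a, b = args
    if op == 'and':                              # A ∧ B  stands for  ¬(A ⇒ ¬B)
        return ('not', ('imp', a, ('not', b)))
    if op == 'or':                               # A ∨ B  stands for  ¬A ⇒ B
        return ('imp', ('not', a), b)
    if op == 'iff':                              # A ⇔ B  stands for  (A ⇒ B) ∧ (B ⇒ A)
        return expand(('and', ('imp', a, b), ('imp', b, a)))
    raise ValueError(f"unknown connective {op!r}")

print(expand(('or', 'p', 'q')))                  # ('imp', ('not', 'p'), 'q')

Note that expand works on subexpressions too, which is exactly the substitution-on-subexpressions freedom described above.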

F.8 Proposition
(a) A∧B ⇒A
(b) A∧B ⇒B

(c) A ⇒ (B ⇒ (A ∧ B))

(c) is probably easier to read in its deduction form:

(c′) A, B ⊢ A ∧ B
These very basic properties of conjunction make sense if you think of A ∧ B as saying that
A, B are both true.

Proof of (a′). We have to show that ¬(A ⇒ ¬B) ⊢ A. Note that Proposition F.3(c) can be rewritten ¬A, A ⊢ B which, by the Deduction Theorem, gives ⊢ ¬A ⇒ (A ⇒ B). Substituting ¬B for B in this starts the proof:

1 ¬(A ⇒ ¬B) hyp


2 ¬A ⇒ (A ⇒ ¬B) As above
3 ¬(A ⇒ ¬B) ⇒ A    Proposition F.5(d) as deduction on 2
4 A MP: 1 and 3

Proof of (b′). (Note that this is not obvious, because we have not yet proved that ∧ is commutative.) We want to prove that ¬(A ⇒ ¬B) ⊢ B.

1 ¬(A ⇒ ¬B) hyp


2 ¬B ⇒ (A ⇒ ¬B) Ax 1
3 ¬(A ⇒ ¬B) ⇒ B    Proposition F.5(d) as deduction on 2
4 B MP: 1 and 3

Proof of (c) as an exercise. 

F.9 Proposition
(a) A⇒A∨B
(b) B ⇒A∨B

These basic properties of disjunction make sense if you think of A ∨ B as saying that at least
one of A, B is true.

Proof of (a′). This is A ⊢ ¬A ⇒ B and so, by the Deduction Theorem, it is enough to prove that A, ¬A ⊢ B; this is Proposition F.3(c).

Proof of (b) This is Axiom SL1 in disguise. 

F.10 Proposition
(a) (A ⇒ (B ∧ C)) ⇒ (A ⇒ B)
(b) (A ⇒ (B ∧ C)) ⇒ (A ⇒ C)

(c) (A ⇒ B) ⇒ ((A ⇒ C) ⇒ (A ⇒ (B ∧ C)))



(d) (A ⇒ (B ⇒ C)) ⇒ (A ∧ B ⇒ C)
(e) (A ∧ B ⇒ C) ⇒ (A ⇒ (B ⇒ C))
These are more properties of conjunction. Again, they make sense if you think in terms of
truth-values.

(c) is probably easier to read in its deduction form:

(c′) A ⇒ B, A ⇒ C ⊢ A ⇒ (B ∧ C)

Note that (d)/(e) is the result, promised back in Section A.12, which we used there to make the axioms a bit more readable.

Proof. (a) We prove that A ⇒ (B ∧ C), A ⊢ B.

1. B∧C MP from both hyps


2. B Proposition F.8(a) on 1 

(b) and (c) follow in the same way from Propositions F.8(b) and (c).
(d) and (e) together form an equivalence, so as usual there are two things to prove. First we prove A ⇒ (B ⇒ C) ⊢ A ∧ B ⇒ C.

1 A ⇒ (B ⇒ C) hyp
2 A∧B subhyp
3 A Proposition F.8(a) on 2
4 B⇒C MP: 3 and 1
5 B Proposition F.8(b) on 2
6 C MP: 5 and 4
7 A∧B ⇒C Ded: 2–6

Now we prove A ∧ B ⇒ C ⊢ A ⇒ (B ⇒ C). For the first time we will use a doubly-nested subdeduction.

1 A∧B ⇒ C hyp
2 A subhyp
3 B subsubhyp
4 A∧B Proposition F.8(c) on 2 and 3
5 C MP: 4 and 1
6 B⇒C Ded: 3–5
7 A ⇒ (B ⇒ C) Ded: 2–6

F.11 Proposition
(a) ((A ∨ B) ⇒ C) ⇒ (A ⇒ C)
(b) ((A ∨ B) ⇒ C) ⇒ (B ⇒ C)
(c) (A ⇒ C) ⇒ ((B ⇒ C) ⇒ (A ∨ B ⇒ C))

(d) (A ∨ B) ⇒ ((A ⇒ C) ⇒ ((B ⇒ C) ⇒ C))


(e) (A ∨ B) ⇒ (¬A ⇒ B)
(d) is probably easier to understand in its deduction form:

(d′) A ∨ B, A ⇒ C, B ⇒ C ⊢ C

Proof. (a) We want to show that (¬A ⇒ B) ⇒ C ⊢ A ⇒ C, which we do as usual by showing that (¬A ⇒ B) ⇒ C, A ⊢ C.

1 A ⇒ (¬A ⇒ B) Apply the Deduction Theorem to Proposition F.3(c)


2 ¬A ⇒ B MP: hyp and 1
3 C    MP: hyp and 2

(b) We show that (¬A ⇒ B) ⇒ C ⊢ B ⇒ C.

1 (¬A ⇒ B) ⇒ C hyp
2 B subhyp
3 ¬A ⇒ B    SL1d on 2
4 C MP: 3 and 1
5 B⇒C Ded: 2–4

(c) We show that A ⇒ C, B ⇒ C ⊢ (¬A ⇒ B) ⇒ C.

1 A⇒C hyp
2 B⇒C hyp
3 ¬A ⇒ B subhyp
4 ¬A ⇒ C Proposition F.3(a) on 2 and 3
5 ¬C ⇒ A    Proposition F.5(d) on 4
6 ¬C ⇒ ¬A Proposition F.5(a) on 1
7 C Ax 3 as deduction on 5 and 6
8 (¬A ⇒ B) ⇒ C Ded: 3–7

(d) is an immediate corollary of (c). Part (e) follows immediately from the fact that A ∨ B is defined to mean ¬A ⇒ B. □

F.12 Comment
The last two groups of deductions, specifically the first three of each of F.10 and F.11
encapsulate the functions of ∧ and ∨

A ⇒ (B ∧ C) ⊢ A ⇒ B               (A ∨ B) ⇒ C ⊢ A ⇒ C
A ⇒ (B ∧ C) ⊢ A ⇒ C               (A ∨ B) ⇒ C ⊢ B ⇒ C
A ⇒ B, A ⇒ C ⊢ A ⇒ (B ∧ C)        A ⇒ C, B ⇒ C ⊢ (A ∨ B) ⇒ C

From here on, all properties of these connectives follow from these six deductions and no
further reference to the rather strange definitions is required (though occasional proofs
straight from the definitions do turn out to be quicker). Observe the pleasing symmetry
between ∧ and ∨; this flows on to their various properties.
Proposition F.11(d) is traditionally called the Constructive Dilemma; I prefer the more
obvious name Proof by Cases, because that’s what it is. It is a very common way of arguing.
Here A and B represent the two cases, and the proposition says that if one or other of the
two cases is true and C can be proved in either case, then C is true. It is easily extended to
cope with three or more cases.
As an exercise, try proving the three-case version

A ∨ B ∨ C, A ⇒ D, B ⇒ D, C ⇒ D ⊢ D

Oops! Not so fast: the notation A ∨ B ∨ C is (so far) undefined and ambiguous. We have
a pretty good idea what it means, but this sort of notation assumes that the operation ∨
is associative and we haven’t proved that yet. We will do that, but until that time and for
the purposes of this challenge, take A ∨ B ∨ C to be convenient shorthand for (A ∨ B) ∨ C.

Note that Deductions F.8(a) to (c) can be applied to the definition of ⇔ to yield a rather
similar group of deductions involving it . . .

F.13 Proposition
(a) (A ⇔ B) ⇒ (A ⇒ B)

(b) (A ⇔ B) ⇒ (B ⇒ A)
(c) (A ⇒ B) ⇒ ((B ⇒ A) ⇒ (A ⇔ B))
And some basic properties of equivalence

F.14 Proposition
(a) A⇔A
(b) (A ⇔ B) ⇒ (B ⇔ A)
(c) (A ⇔ B) ⇒ ((B ⇔ C) ⇒ (A ⇔ C))
(c) is probably easier to understand in its deduction form:

(c′) A ⇔ B, B ⇔ C ⊢ A ⇔ C


From here on many of the proofs are quite easy and will not be given.

F.15 Comments: another way of looking at all this


The last proposition tells us that the equivalence connective ⇔ behaves the way an equivalence relation is supposed to, which is just as well.

Now look back at Propositions F.2, F.3 and F.13(c):

A ⊢ A
A ⇒ B, B ⇒ A ⊢ A ⇔ B
A ⇒ B, B ⇒ C ⊢ A ⇒ C
These can be regarded as saying that the connective ⇒ is a partial order — well, a preorder
for which the equivalence relation is the equivalence in our language (if you don’t know what
a preorder is, there's a rundown later in these notes in Section 6.C).
Of course, we have to agree to interpret our implication and equivalence symbols as relations
on expressions to say this.
Now, if you happen to know something about lattices, you will recognise the deductions
collected in Comment F.12 above as saying that ∧ and ∨ behave as meet and join operations
with respect to this order. And if you haven’t made the acquaintance of lattices yet don’t
worry, we are not going to go further into this here.

F.16 Substitution of equivalents


Now we set about proving the Substitution of Equivalents Theorem.
In ordinary mathematics we know that if two things are equal, then we can substitute one for the other anywhere and get equal or equivalent things. For example, if x = x′, then we can say without further ado that
sin x + y² = sin x′ + y²    and    log x > 3y ⇔ log x′ > 3y .
The Substitution of Equivalents Theorem says that we can do much the same thing in SL once we know that two expressions are equivalent. For example, if we know that ⊢ A ⇔ A′, then we can say without further ado that (for instance)
⊢ (B ⇒ (¬A ∧ C)) ⇔ (B ⇒ (¬A′ ∧ C))
Or at least we will be able to after we have proved the theorem.

Mental picture of A occurring as a subexpression of X: a large box labelled X with a smaller box labelled A somewhere inside it.

Mental picture of substituting A′ for the occurrence of A, and so changing X to X′: the same large box, now labelled X′, with A′ inside it where A used to be.

Note that it is possible that there may be several occurrences of A in X. In this case the theorem does not require that the substitution should be carried out consistently on all occurrences. It may be carried out on some and others left as they were, and the conclusion still holds.

Mental picture of A occurring three times as a subexpression of X: a box X containing three boxes A.

Mental picture of substituting A′ for some but not all of the occurrences of A, and so changing X to X′: a box X′ containing A′, A, A′.

Note well that we now have two very different styles of substitution. In substitution of
equivalents, as above, we may substitute some but not necessarily all of the subexpression in
question. Compare with making a substitution into one of the axioms to form an instance of
the axiom; in this case we must make the same substitution consistently to all occurrences
of the subexpression in question. For instance, given the first axiom A ⇒ (B ⇒ A), we may substitute any expression for A (whether it is equivalent to A or not), for example Q ⇒ (B ⇒ Q), but we must substitute for both occurrences of the subexpression: Q ⇒ (B ⇒ A) is not OK.
We need to prove a lemma first. This basically covers a few simple cases and is easy to
prove.

F.17 Lemma

(a) (A ⇔ A′) ⇒ (¬A ⇔ ¬A′)

(b) (A ⇔ A′) ⇒ ((A ⇒ B) ⇔ (A′ ⇒ B))
(c) (A ⇔ A′) ⇒ ((B ⇒ A) ⇔ (B ⇒ A′))

Proof (a).

1 A ⇔ A′         hyp
2 A ⇒ A′         F.13(a) on 1
3 ¬A′ ⇒ ¬A       F.5(a) on 2
4 A′ ⇒ A         F.13(b) on 1
5 ¬A ⇒ ¬A′       F.5(a) on 4
6 ¬A ⇔ ¬A′       Definition of ⇔, steps 3 and 5

(b)

1 A ⇔ A′                      hyp
2 A ⇒ A′                      F.13(a) on 1
3 A′ ⇒ B                      subhyp
4 A ⇒ B                       Proposition F.3(a) on 2 and 3
5 (A′ ⇒ B) ⇒ (A ⇒ B)          Ded: 3–4
6 A′ ⇒ A                      F.13(b) on 1
7 A ⇒ B                       subhyp
8 A′ ⇒ B                      Proposition F.3(a) on 6 and 7
9 (A ⇒ B) ⇒ (A′ ⇒ B)          Ded: 7–8
10 (A ⇒ B) ⇔ (A′ ⇒ B)         Definition of ⇔, steps 5 and 9

(c) Proof is much the same. 

F.18 Substitution of Equivalents theorem — actually a metatheorem


If A, A′ and X are expressions such that ⊢ A ⇔ A′ and A occurs as a subexpression of X, and we write X′ for the result of substituting A′ for an occurrence of A in X, then ⊢ X ⇔ X′.

Proof. We prove this by induction over the construction of X (that is, we assume the
theorem is true in all cases where X is shorter).
Since A is a subexpression of X, one of the following cases must hold:—

(1) X = A.
In this case X′ = A′ and the result is trivially true.
(2) X = ¬P and A is a subexpression of P.
In this case X′ = ¬P′, where P′ is the result of substituting A′ for A in P. But, by the inductive hypothesis, P′ ⇔ P, and then X′ ⇔ X by Part (a) of the lemma.
(3) X = P ⇒ Q and A is a subexpression of P.
In this case X′ = P′ ⇒ Q, where P′ is the result of substituting A′ for A in P. But, by the inductive hypothesis, P′ ⇔ P, and then X′ ⇔ X by Part (b) of the lemma.
(4) X = P ⇒ Q and A is a subexpression of Q.
Same proof, using Part (c) of the lemma. □
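The substitution operation in this theorem is nothing more than structural recursion over the expression, mirroring the case analysis in the proof. Here is a sketch, reusing the home-made tuple representation from the earlier sketch (my own convention, not the text's notation); it replaces a single (the first) occurrence, since the theorem does not require consistency across occurrences.

# Sketch: substitute A' for one occurrence of A inside X, by recursion over
# the construction of X -- the same case analysis as in the proof above.

def substitute_once(x, a, a_new):
    """Return (x', done): x with its first occurrence of a replaced by a_new."""
    if x == a:                                   # case (1): X = A
        return a_new, True
    if isinstance(x, str):                       # a proposition symbol other than A
        return x, False
    op, *parts = x                               # cases (2)-(4): recurse into the parts
    out, done = [], False
    for part in parts:
        if not done:
            part, done = substitute_once(part, a, a_new)
        out.append(part)
    return (op, *out), done

x = ('imp', 'p', ('not', ('imp', 'p', 'q')))
print(substitute_once(x, ('imp', 'p', 'q'), 'r'))   # (('imp', 'p', ('not', 'r')), True)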

Here are a few useful special cases.

F.19 Corollary
These are probably all a little easier to read in their deduction forms.

(a) (A ⇔ A′) ⇒ ((A ∧ B) ⇔ (A′ ∧ B))        (a′) A ⇔ A′ ⊢ (A ∧ B) ⇔ (A′ ∧ B)
(b) (A ⇔ A′) ⇒ ((B ∧ A) ⇔ (B ∧ A′))        (b′) A ⇔ A′ ⊢ (B ∧ A) ⇔ (B ∧ A′)
(c) (A ⇔ A′) ⇒ ((A ∨ B) ⇔ (A′ ∨ B))        (c′) A ⇔ A′ ⊢ (A ∨ B) ⇔ (A′ ∨ B)
(d) (A ⇔ A′) ⇒ ((B ∨ A) ⇔ (B ∨ A′))        (d′) A ⇔ A′ ⊢ (B ∨ A) ⇔ (B ∨ A′)
(e) (A ⇔ A′) ⇒ ((A ⇔ B) ⇔ (A′ ⇔ B))        (e′) A ⇔ A′ ⊢ (A ⇔ B) ⇔ (A′ ⇔ B)
(f) (A ⇔ A′) ⇒ ((B ⇔ A) ⇔ (B ⇔ A′))        (f′) A ⇔ A′ ⊢ (B ⇔ A) ⇔ (B ⇔ A′)

F.20 Corollary
Multiple applications of the Substitution of Equivalents theorem allow us to make multiple
substitutions, for example:

(a) (A ⇔ A′) ⇒ (¬A ⇔ ¬A′)
(a′) A ⇔ A′ ⊢ ¬A ⇔ ¬A′
(b) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ⇒ B) ⇔ (A′ ⇒ B′)))
(b′) A ⇔ A′, B ⇔ B′ ⊢ (A ⇒ B) ⇔ (A′ ⇒ B′)
(c) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ∧ B) ⇔ (A′ ∧ B′)))
(c′) A ⇔ A′, B ⇔ B′ ⊢ (A ∧ B) ⇔ (A′ ∧ B′)
(d) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ∨ B) ⇔ (A′ ∨ B′)))
(d′) A ⇔ A′, B ⇔ B′ ⊢ (A ∨ B) ⇔ (A′ ∨ B′)
(e) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ⇔ B) ⇔ (A′ ⇔ B′)))
(e′) A ⇔ A′, B ⇔ B′ ⊢ (A ⇔ B) ⇔ (A′ ⇔ B′)

Anecdote: Bertrand Russell, in his autobiography, tells us that around the time he
was working on Principia Mathematica he was puzzled by the following idea. Consider
the true statement

Sir Walter Scott is the author of Ivanhoe.

That means that “Sir Walter Scott” and “the author of Ivanhoe” are the same thing
— equal. That being the case, one can be substituted for the other in any sentence
without altering its meaning. Do that to the sentence displayed above and we get

Sir Walter Scott is Sir Walter Scott.

The two displayed statements should both say the same thing — but they clearly
don’t. (OK. Is this a problem or isn’t it?)

F.21 Proposition: Commutativity of ∧ and ∨


It is now easy to prove some basic algebraic-style properties of conjunction and disjunction.

Most of these express really obvious facts if you think in terms of truth-values. Firstly, they
are commutative. — If A and B are both true, then B and A are both true. And if at least
one of A and B is true, then at least one of B and A is true. How’s that for obvious?
(a) (A ∧ B) ⇔ (B ∧ A)

(b) (A ∨ B) ⇔ (B ∨ A)

F.22 Proposition: Associativity of ∧ and ∨


They are associative. — (A ∧ B) ∧ C says that all three are true; so does A ∧ (B ∧ C).
Similarly (A ∨ B) ∨ C says that at least one of the three is true; so does A ∨ (B ∨ C).

(a) ((A ∧ B) ∧ C) ⇔ (A ∧ (B ∧ C))


(b) ((A ∨ B) ∨ C) ⇔ (A ∨ (B ∨ C))
Because of these laws we can perform the usual operations on expressions involving repeated
conjunctions and disjunctions. We also adopt the usual abbreviations in our semi-formal
language:

A∧B∧C stands for (A ∧ B) ∧ C


A∧B∧C ∧D stands for ((A ∧ B) ∧ C) ∧ D and so on,
and the same for disjunction.

Now that we have associativity we may write things like A ∧ B ∧ C ∧ D without fear of
ambiguity, because no matter how we bracket the expression, we get equivalent expressions.
The same goes of course for ∨. This fact about associativity should be familiar from ordinary
algebra.
Put this together with commutativity and we can change the order in such expressions at
will too. For instance
A∧B∧C ∧D ⇔ D∧B∧A∧C
This, combined with substitution of equivalents, gives us much the same freedom in manip-
ulating expressions that we are used to with ordinary algebra.
Where you have expressions involving both conjunctions and disjunctions, the conjunctions
are traditionally supposed to take precedence, so an expression such as A ∧ B ∨ C ∧ D is
to be interpreted as meaning (A ∧ B) ∨ (C ∧ D) and not A ∧ (B ∨ C) ∧ D. However it is
not a good idea to leave parentheses out in such expressions, mainly because it makes them
difficult to read. I won’t do it in these notes.

F.23 Proposition: Distributivity of ∧ and ∨


Conjunction and disjunction distribute over each other:
(a) A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C) and (B ∨ C) ∧ A ⇔ (B ∧ A) ∨ (C ∧ A)

(b) A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C) and (B ∧ C) ∨ A ⇔ (B ∨ A) ∧ (C ∨ A)

That this works both ways is a little startling. Compare with ordinary algebra (of numbers).
We are used to multiplication distributing over addition:

x(y + z) = xy + xz

but not the other way round

x + yz = (x + y)(x + z) ??!

F.24 Proposition: De Morgan’s Laws


Now we look at some useful laws which are not like any ordinary algebraic laws that apply
to numbers. The first of these are De Morgan’s Laws:
(a) ¬(A ∧ B) ⇔ ¬A ∨ ¬B (b) ¬(A ∨ B) ⇔ ¬A ∧ ¬B

F.25 Corollary
Substituting ¬A for A and/or ¬B for B gives us some variations on these, which are usually
also referred to as De Morgan’s Laws.
(c) ¬(¬A ∧ B) ⇔ A ∨ ¬B

(d) ¬(A ∧ ¬B) ⇔ ¬A ∨ B


(e) ¬(¬A ∧ ¬B) ⇔ A ∨ B
(f ) ¬(¬A ∨ B) ⇔ A ∧ ¬B

(g) ¬(A ∨ ¬B) ⇔ ¬A ∧ B


(h) ¬(¬A ∨ ¬B) ⇔ A ∧ B
Note that De Morgan’s laws give us slight, but useful, variations on a number of the foregoing
propositions. Two such we will need are

F.26 Corollary
(a) ¬A ⊢ ¬(A ∧ B)  and  ¬B ⊢ ¬(A ∧ B)

(b) ¬A, ¬B ⊢ ¬(A ∨ B)

F.27 Proposition: Absorption


Slightly weird-looking are the Absorptive Laws. They are occasionally useful for simplifying complicated expressions.
(a) A ∨ (A ∧ B) ⇔ A

(b) A ∧ (A ∨ B) ⇔ A

F.28 Proposition: Excluded Middle and Dichotomy


(Nearly) finally, there is the Law of the Excluded Middle and the Law of Dichotomy. At
the beginning of this chapter it was stated that Sentential Logic is designed to deal with
truth-function relationships between statements which are either true or false, but not both.
These two laws express, in the formal language itself, these facts about sentences.
(a) ¬(A ∧ ¬A)
(b) A ∨ ¬A

F.29 Proposition: Idempotence


The Idempotence laws are a couple more exercises in saying the obvious, if you think in
terms of truth-values. To say that both A and A are true is the same as saying that A is
true and to say that at least one of A and A is true is also the same as saying that A is true.

(a) A∧A⇔ A
(b) A∨A⇔ A

F.30 Fun with T and F


For the purposes of this discussion it will be convenient to define an antitheorem to be an expression that is provably false; that is, B is an antitheorem if ⊢ ¬B. For example, B ∧ ¬(A ⇒ B) is an antitheorem.

Now, choose your favourite theorem, preferably a very simple one (for instance, I would
choose p ⇒ p, where p is some fixed chosen letter of the alphabet); call it T for short. (The
symbol is supposed to suggest “true”.)
In the same way, choose a nice simple antitheorem (¬(p ⇒ p) is convenient), and call it F
(to suggest “false”).

Now, if A is any theorem, we have both ⊢ A and ⊢ T. From this it is easy to prove that ⊢ A ⇔ T. Using Substitution of Equivalents (F.18), we now know that, wherever any theorem occurs as a subexpression of some larger expression, we may replace it by T. For example, we know that ⊢ (A ∧ B) ⇒ A and so we can replace it by T in, say,

(A ∧ B) ⇒ (((A ∧ B) ⇒ A) ∧ C)    (–1A)

to get the equivalent expression

(A ∧ B) ⇒ (T ∧ C) . (–1B)

In the same way, we may substitute F for any antitheorem, wherever it occurs as a subexpression. For example, we can prove ⊢ ¬(A ∧ ¬(B ⇒ A)), which is to say that A ∧ ¬(B ⇒ A) is an antitheorem. So, making the substitution in

(A ⇒ (A ∧ ¬(B ⇒ A))) ⇒ ¬A (–2A)



we find that it is equivalent to


(A ⇒ F) ⇒ ¬A . (–2B)

A good deal of the manipulation we do of expressions in Sentential Logic is concerned with


simplifying expressions — replacing expressions by simpler but equivalent ones. In this
process it is often useful to replace theorems by T and antitheorems by F, as described
above, and then use the various simplifications listed in the next theorem.

For example, using the next theorem we see that the expression (–1B) above simplifies to

(A ∧ B) ⇒ C

which is not a theorem, so neither was (–1A). Similarly expression (–2B) simplifies to

¬A ⇒ ¬A ,

which we immediately recognise as a theorem, so (–2A) was a theorem also.

F.31 Proposition: Useful T and F manipulations


(a) ¬T ⇔ F and ¬F ⇔ T
(b) (T ⇒ A) ⇔ A and (F ⇒ A) ⇔ T
(c) (A ⇒ T) ⇔ T and (A ⇒ F) ⇔ ¬A
(d) A∧T ⇔ A and A∧F ⇔ F
(e) A∨T ⇔ T and A∨F ⇔ A
(f ) (A ⇔ T) ⇔ A and (A ⇔ F) ⇔ ¬A

Exercise Prove some of these. (Hint: the proofs are all very easy.)

G Decidability of Sentential Logic


Throughout the previous section I made comments based on thinking about expressions of
SL in terms of the truth-values of the proposition symbols of which they are made up. So far
I have emphasised that, while this way of looking at expressions, theorems and deductions
is helpful, it could not actually be used to prove anything — or, at least, until it had been
codified and shown to work.

In this section we will do just that: I describe an organised method, the Truth Table method, of computing the truth-value of an expression in terms of the truth-values of its component proposition symbols. This will in fact be an algorithm to decide whether any given expression is a theorem or not; that this is so is also proved in this section.
The section is in two parts. In the first part the Truth Table method is described. In the
second, we prove the fact that it performs as claimed: that the description of how to read
off whether an expression is a theorem or not is actually correct.
The upshot of all this is that we will be proving that SL is a decidable theory.
The first part is well worth reading and understanding in detail; it is not difficult. The second
part, the proof that it works, is long and fairly complicated; it is included for completeness
because this is a fairly significant result. However, I would only recommend you read it
through if you are particularly interested in seeing the proof of this result. (That said, there
are some techniques and subsidiary results here that should be of interest to anyone who
intends proceeding to further studies of logic in general.)

G.1 Preliminary
Consider a theorem of SL, for example

p ∧ (q ∨ r) ⇔ (p ∧ q) ∨ (p ∧ r) .

Note that we are talking about an actual theorem here, not a theorem schema such as we
have been working with most of the time so far. Here p, q and r are letters of the alphabet
of the theory. And now suppose we have a proof of this,

S1, S2, . . . , Sn

Now, the expression being proved uses only the three proposition symbols p, q, r, however we
must consider the possibility that the proof might need to contain some extra proposition
symbols, other than these three. Now this seems unlikely, and a glance through the previous
section will show that none of the proofs given there require any “extra” proposition symbols,
that is, ones that didn’t already occur in the expression to be proved. In what follows we
need to know that no such extra symbols are ever necessary, so we now prove this obvious
fact.

G.2 Theorem
Let A be an expression (of SL) and suppose that p1, p2, ..., pk are the proposition symbols (letters) of the language which occur in A. Then ⊢ A if and only if there is a proof of it in which the only letters which occur anywhere in the proof are these same p1, p2, ..., pk.

Proof. A is a theorem, so it has a proof,


S1, S2, ..., Sn .
Now this proof may contain some "extra" letters, q1, q2, ..., qh say. Replace the proof above by a new one,
S′1, S′2, ..., S′n
in which every occurrence of any of the extra letters q1, q2, ..., qh is replaced by p1. We show that this new proof is also a proof of A. This is easy.
If Si is an instance of an axiom, then S′i is also an instance of the same axiom.
If Si follows by Modus Ponens from two earlier steps, Su and Sv, then S′i follows by Modus Ponens from S′u and S′v.
And the final step in both proofs is the same A, because it doesn't contain any of the extra letters. Done.
Finally, obviously the only letters mentioned in the new proof are p1, p2, ..., pk. □

G.3 Truth tables


Here I describe the algorithm for deciding whether an expression of SL is a theorem or
not. The description will be informal, and the proof that it works as claimed will be given
subsequently, occupying the remainder of this section.
Consider first the connective ∧. The motivating idea of this connective is that the expression
p ∧ q should be true when both p and q are true, and false otherwise. We can encapsulate
this in a table.
p q p∧q
1 1 1
1 0 0
0 1 0
0 0 0
Here I have used 1 and 0 to denote TRUE and FALSE.
Why use 1 and 0 instead of the more obvious T and F? Because it is easier in these tables
to see at a glance the difference between the symbols 1 and 0 than between T and F. This
is particularly so when the table becomes larger (see examples below).
Here are the tables for the five basic connectives:
p ¬p
1 0
0 1

p q p∧q p q p∨q
1 1 1 1 1 1
1 0 0 1 0 1
0 1 0 0 1 1
0 0 0 0 0 0

p q p⇒q p q p⇔q
1 1 1 1 1 1
1 0 0 1 0 0
0 1 1 0 1 0
0 0 1 0 0 1
One can construct a table for more complicated expressions by a mechanical process. For
example, to construct a truth table for the expression (p ⇒ q) ⇒ (¬q ⇒ ¬p), start with
an empty table, in which there is a column corresponding to each of its sub-expressions,
building up from left to right:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1
1 0
0 1
0 0

Now we fill in the missing entries in the table, working from left to right. The first three
columns can be filled in by consulting the basic tables given above:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0
1 0 0 0 1
0 1 1 1 0
0 0 1 1 1
Now the ¬q ⇒ ¬p column can be filled in, using the values in the ¬p and ¬q columns and
the basic table for ⇒:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0 1
1 0 0 0 1 0
0 1 1 1 0 1
0 0 1 1 1 1
Finally the last column can be filled in, using the values in the p ⇒ q and ¬q ⇒ ¬p columns
and the basic table for ⇒ again:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 0 1 1
0 0 1 1 1 1 1

Here are the important facts.

• A tautology is an expression which is true whatever the truth-values of its letters.


(That is a definition)

• An expression is a tautology if and only if every entry in its column in a truth-table is a 1. (That is obvious from the way we have constructed our table.)

• An expression is a tautology if and only if it is a theorem of SL. (That is the big theorem we are just about to prove.)

The upshot of this is that we now have an algorithm (a completely routine method) to decide
whether any expression is a theorem of SL or not. In other words, the theory is decidable.
One more example. Let us ask if the connective ⇒ is associative, that is, whether
((p ⇒ q) ⇒ r) ⇔ (p ⇒ (q ⇒ r))
is a theorem of SL or not. Here is the table:
p q r p⇒q (p ⇒ q) ⇒ r q⇒r p ⇒ (q ⇒ r) The expression
1 1 1 1 1 1 1 1
1 1 0 1 0 0 0 1
1 0 1 0 1 1 1 1
1 0 0 0 1 1 1 1
0 1 1 1 1 1 1 1
0 1 0 1 0 0 1 0
0 0 1 1 1 1 1 1
0 0 0 1 0 1 1 0
First notice that the leftmost columns, corresponding to the letters p, q and r, must contain every combination of truth-values. Consequently, if the expression being investigated has n letters, the table must have 2ⁿ rows. Second, observe that some of the entries in the final column are not 1 in this example. From this we conclude that the expression ((p ⇒ q) ⇒ r) ⇔ (p ⇒ (q ⇒ r)) is not a tautology, and hence not a theorem of SL.
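The whole procedure is easily mechanised. Here is a minimal sketch of the truth-table algorithm in Python, again using the home-made tuple representation of expressions (my own convention, nothing official); is_tautology simply runs through all 2ⁿ rows.

from itertools import product

# Sketch of the truth-table algorithm of this section; truth-values are 1 and 0.

OPS = {
    'not': lambda a: 1 - a,
    'and': lambda a, b: a & b,
    'or':  lambda a, b: a | b,
    'imp': lambda a, b: 0 if (a, b) == (1, 0) else 1,
    'iff': lambda a, b: 1 if a == b else 0,
}

def value(e, row):
    """Truth-value of expression e when its letters take the values in row."""
    if isinstance(e, str):
        return row[e]
    op, *args = e
    return OPS[op](*(value(a, row) for a in args))

def letters(e, acc=None):
    """Collect the proposition symbols occurring in e."""
    acc = set() if acc is None else acc
    if isinstance(e, str):
        acc.add(e)
    else:
        for a in e[1:]:
            letters(a, acc)
    return acc

def is_tautology(e):
    ps = sorted(letters(e))                      # n letters, so 2**n rows
    return all(value(e, dict(zip(ps, bits))) == 1
               for bits in product((1, 0), repeat=len(ps)))

assoc = ('iff', ('imp', ('imp', 'p', 'q'), 'r'),
                ('imp', 'p', ('imp', 'q', 'r')))
print(is_tautology(assoc))                       # False: not a theorem of SL

Fed the three axiom schemas (with letters in place of the schematic A, B, C), the same checker reports each one a tautology — exactly what the first half of the big theorem below requires.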
Here we embark upon the process of proving that these claims are correct. Read on if you
are hard enough!

We may re-interpret the columns of these tables as tables of functions. For example, take the table of p ⇒ q above. Think of it as a table of a function of two variables (the values of p and q), each of which can take values from the set B = {0, 1}. It is better not to confuse the expression p ⇒ q with its truth-table function, so let us call the function Tp⇒q. It is thus a function B² → B. Doing the same thing for the first big table above, we have another way of looking at it, as a table of values of the functions involved:
e Tp⇒q (e) T¬p (e) T¬q (e) T¬q⇒¬p (e) T(p⇒q)⇒(¬q⇒¬p) (e)
(1,1) 1 0 0 1 1
(1,0) 0 0 1 0 1
(0,1) 1 1 0 1 1
(0,0) 1 1 1 1 1

This interpretation will be used in the remainder of this section to prove that the Truth-table
Algorithm works as claimed.

G.4 Definition
Let us write B for the set {0, 1}, so that Bᵏ = { (e1, e2, ..., ek) : each ei = 0 or 1 }. In Bᵏ we will write O for the all-0 k-tuple and I for the all-1 one: O = (0, 0, ..., 0) and I = (1, 1, ..., 1).
We will be working with functions Bᵏ → B. We define operations on these functions corresponding to the connectives; if f and g are such functions, then ¬f, f ∧ g, f ∨ g, f ⇒ g and f ⇔ g are defined as follows:

(¬f)(e) = 1 if f(e) = 0, and 0 otherwise;
(f ∧ g)(e) = 1 if f(e) = g(e) = 1, and 0 otherwise;
(f ∨ g)(e) = 0 if f(e) = g(e) = 0, and 1 otherwise;
(f ⇒ g)(e) = 0 if f(e) = 1 and g(e) = 0, and 1 otherwise;
(f ⇔ g)(e) = 1 if f(e) = g(e), and 0 otherwise.

To each expression C we define its truth-table function TC : Bᵏ → B by induction over the construction of C (see A.9), as follows:

(i) If C is a proposition symbol pi, then TC(e1, e2, ..., ek) = ei.

(ii) If C is ¬A, then TC(e) = ¬TA(e) for all e ∈ Bᵏ.
(iii) If C is A ∧ B, then TC(e) = TA(e) ∧ TB(e) for all e ∈ Bᵏ.

(iv) If C is A ∨ B, then TC(e) = TA(e) ∨ TB(e) for all e ∈ Bᵏ.

(v) If C is A ⇒ B, then TC(e) = TA(e) ⇒ TB(e) for all e ∈ Bᵏ.
(vi) If C is A ⇔ B, then TC(e) = TA(e) ⇔ TB(e) for all e ∈ Bᵏ.
Finally, we say that A is a truth-table tautology (TTT) if TA(e) = 1 for all e ∈ Bᵏ. Note that all of this is just a complicated, but precise, way of describing the functions which are tabulated by a truth table, as described in the preceding section.

The remainder of this section is devoted to a proof of

G.5 Theorem
⊢ A if and only if A is a TTT.

This theorem will establish the decidability of the theory, because whether an expression is a TTT or not is clearly decidable. (For each of the 2ᵏ "vectors" e, compute TA(e) according to the recipe given by the definition, and see if all the values are 1. The truth-table method, as described in G.3, is simply an organised way of doing this.)

One half of the proof of this theorem is straightforward:



G.6 Half of the theorem


If ⊢ A then A is a TTT.

Proof. A is a theorem if and only if it is one of the lines in a formal proof. Therefore it is
enough to show that the three axioms are TTTs and that, if A and A ⇒ B are TTTs, then
so is B.
Consider Axiom SL1, A ⇒ (B ⇒ A). Let e ∈ Bᵏ and check all possible combinations of values of TA(e) and TB(e).
If TA (e) = 1 and TB (e) = 1 then TB⇒A (e) = 1 and so TA⇒(B⇒A) (e) = 1.

If TA (e) = 1 and TB (e) = 0 then TB⇒A (e) = 1 and so TA⇒(B⇒A) (e) = 1.


If TA (e) = 0 and TB (e) = 1 then TB⇒A (e) = 0 and so TA⇒(B⇒A) (e) = 1.
If TA (e) = 0 and TB (e) = 0 then TB⇒A (e) = 1 and so TA⇒(B⇒A) (e) = 1.
The proofs for Axioms SL2 and SL3 are the same, except that eight cases must be checked
for each.
Now consider Modus Ponens. Suppose that A and A ⇒ B are TTTs. Let e ∈ Bᵏ. Then
TA (e) = 1 (since A is a TTT) and TA (e) = 1 ⇒ TB (e) = 1 (by definition of TA⇒B ) and so
TB (e) = 1 as required. 

G.7 Definition
(i) A conjunctive term is an expression of the form Q1 ∧ Q2 ∧ ... ∧ Qk where, for each i, Qi is either pi or ¬pi. (So every proposition symbol occurs exactly once, either with or without a preceding not-sign.)

(ii) A disjunctive form is an expression of one of the two forms

F
or C1 ∨ C2 ∨ ... ∨ Cs, where C1, C2, ..., Cs are distinct conjunctive terms.

G.8 Notes

(1) We will assume that some fixed order has been chosen for the conjunctive terms in
the second form of the expression. It does not matter what it is, so long as we choose
one (say, lexicographic) and stick to it.

(2) The form F is used to act as the "gold-plated" always-false expression, such as ¬(p1 ⇒ p1) (see Section F.30).

(3) We will want to talk about “the conjunctive terms which occur in” any given disjunc-
tive form. It is obvious what this means for the second form above. We will simply
define F to contain no conjunctive terms.

(4) Having made these clarifications, we can now say that, given any set of conjunctive
terms, there is exactly one disjunctive form which contains just these terms and no
others.

G.9 Definition
To any member e of Bᵏ there corresponds a conjunctive term Ke = Q1 ∧ Q2 ∧ ... ∧ Qk, where Qi = pi if ei = 1, and Qi = ¬pi if ei = 0.

Then, for any expression A, we define its disjunctive form ADF by

ADF = C1 ∨ C2 ∨ ... ∨ Cs

where C1, C2, ..., Cs are just the conjunctive terms Ke for which TA(e) = 1, in our chosen order. (Note: in the case where there are no such terms — that is, TA(e) = 0 for every e — we interpret this to mean that ADF is F.)
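The definition of ADF is itself an algorithm: list the e with TA(e) = 1 and write down the corresponding Ke. Here is a sketch, with the evaluator repeated from the earlier truth-table sketch so that it stands alone (the tuple representation remains my own convention).

from itertools import product

# Sketch: compute the disjunctive form ADF of an expression, per the definition.

OPS = {'not': lambda a: 1 - a,
       'imp': lambda a, b: 0 if (a, b) == (1, 0) else 1,
       'and': lambda a, b: a & b,
       'or':  lambda a, b: a | b}

def value(e, row):
    if isinstance(e, str):
        return row[e]
    op, *args = e
    return OPS[op](*(value(a, row) for a in args))

def disjunctive_form(e, ps):
    """ps: the letters p1, ..., pk in the fixed chosen order."""
    terms = []
    for bits in product((1, 0), repeat=len(ps)):
        row = dict(zip(ps, bits))
        if value(e, row) == 1:                   # keep K_e exactly when T_A(e) = 1
            lits = [p if row[p] == 1 else ('not', p) for p in ps]
            term = lits[0]
            for lit in lits[1:]:                 # K_e = Q1 ∧ Q2 ∧ ... ∧ Qk
                term = ('and', term, lit)
            terms.append(term)
    if not terms:                                # no terms at all: the form F
        return 'F'
    form = terms[0]
    for t in terms[1:]:
        form = ('or', form, t)
    return form

print(disjunctive_form(('imp', 'p', 'q'), ['p', 'q']))   # three conjunctive terms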

G.10 Lemma

(i) A conjunctive term C = Ke satisfies

TC(e) = 1 and TC(e′) = 0 for all e′ ≠ e

(ii) If D = C1 ∨ C2 ∨ ... ∨ Cs is a disjunctive form, then TD(e) = 1 for exactly those e for which Ke is one of the terms Ci of D.

Proof. (i) follows immediately from the definition and (ii) follows easily from (i). 

This lemma has several useful immediate corollaries:

G.11 Corollaries

(i) Any disjunctive form is its own disjunctive form, that is, if D is a disjunctive form, then
DDF = D.

(ii) TA = TADF for all expressions A.

(iii) Suppose that D is a disjunctive form. Then D is a TTT if and only if it contains all 2ᵏ conjunctive terms.

(iv) Distinct disjunctive forms have distinct truth-table functions.



G.12 Lemma
Let A = C1 ∨ C2 ∨ ... ∨ Cs, where C1, C2, ..., Cs are all the 2ᵏ conjunctive terms. Then ⊢ A.

Proof. We have ⊢ pi ∨ ¬pi for each i (Proposition F.28). By Proposition F.8(c) then,

⊢ (p1 ∨ ¬p1) ∧ (p2 ∨ ¬p2) .

Now, by repeated applications of distributivity, commutativity and associativity (Propositions F.21–F.23),

⊢ (p1 ∧ p2) ∨ (p1 ∧ ¬p2) ∨ (¬p1 ∧ p2) ∨ (¬p1 ∧ ¬p2) ,

which is the lemma in the case k = 2. If k ≥ 3, we also have ⊢ p3 ∨ ¬p3 and so, by Proposition F.8(c) again,

⊢ ((p1 ∧ p2) ∨ (p1 ∧ ¬p2) ∨ (¬p1 ∧ p2) ∨ (¬p1 ∧ ¬p2)) ∧ (p3 ∨ ¬p3)

which, after umpteen applications of the distributive, associative and commutative laws, expands out to the required eight-term expression. And so on, inductively, for larger k. □

G.13 Corollary
Let D be a disjunctive form. Then ⊢ D if and only if D is a TTT.

G.14 Lemma
pi ⊢ C1 ∨ C2 ∨ ... ∨ Cs ,
where C1, C2, ..., Cs are all the 2ᵏ⁻¹ conjunctive terms of the form Q1 ∧ Q2 ∧ ... ∧ Qk in which Qi = pi.

Proof. The same as for the preceding lemma. □

G.15 Lemma
Let A = A1 ∨ A2 ∨ ... ∨ As be a disjunctive form in which A1, A2, ..., As are some of the available conjunctive terms, and let B = B1 ∨ B2 ∨ ... ∨ Bt be another disjunctive form in which the conjunctive terms B1, B2, ..., Bt are exactly the ones which are not involved in A. (In the special case in which A1, A2, ..., As happens to be all the conjunctive terms, we interpret this to mean B = F. And vice versa.) Then ¬A ⊣⊢ B.

Proof. We show first that ¬A ⊢ B. Using Modus Ponens, it is enough to show that ⊢ ¬A ⇒ B. From the definition of the disjunction symbol, this is the same as ⊢ ¬¬A ∨ B. Now, by Proposition F.4 and Substitution of Equivalents, this is the same as proving that ⊢ A ∨ B. Because of the assumed form of A and B, this follows from Lemma G.12.
Next we prove that B ⊢ ¬A. Again we ring some changes: using Modus Ponens it is enough to prove that ⊢ B ⇒ ¬A. This is the same as ⊢ ¬¬(B ⇒ ¬A) which, by the definition of conjunction, is the same as ⊢ ¬(B ∧ A), so this is what we will establish.

Now B ∧ A is of the form (B1 ∨ B2 ∨ ... ∨ Bt) ∧ (A1 ∨ A2 ∨ ... ∨ As) and so, by repeated use of the distributive, associative and commutative laws,

B ∧ A ⊣⊢ (B1 ∧ A1) ∨ (B1 ∧ A2) ∨ ... ∨ (Bt ∧ As) ,    (1)

the terms on the right being all expressions of the form Bq ∧ Ap (for q = 1, 2, ..., t and p = 1, 2, ..., s). In each such term, Bq and Ap are distinct and so there is at least one index i such that Ap contains pi and Bq contains ¬pi, or vice versa. Then, using commutativity, there is an expression E such that Bq ∧ Ap ⊣⊢ pi ∧ ¬pi ∧ E. Using the tricks in Section F.30 (specifically F.31(d) and (e)), we get ⊢ ¬(Bq ∧ Ap) and, since this is true of every one of the terms in the expansion (1) above, ⊢ ¬(B ∧ A). □

G.16 Lemma
Let A = A1 ∨ A2 ∨ ... ∨ As and B = B1 ∨ B2 ∨ ... ∨ Bt be disjunctive forms in which A1, A2, ..., As is a subset of B1, B2, ..., Bt. Then A ⊢ B.

Proof. If we write E1, E2, ..., Eu for those conjunctive terms which appear in B but not in A, then, using distributivity and associativity, A ∨ E1 ∨ E2 ∨ ... ∨ Eu ⊣⊢ B. By Proposition F.9(a), A ⊢ A ∨ E1 ∨ E2 ∨ ... ∨ Eu and we are done. □

G.17 Lemma
For any expression A, A ⊣⊢ ADF.

Proof. By induction over the construction of A.

Suppose first that A = pi, a proposition symbol. Now TA(e) = ei. Therefore ADF = A1 ∨ A2 ∨ ... ∨ As, where A1, A2, ..., As are just the conjunctive terms of the form Q1 ∧ Q2 ∧ ... ∧ Qk in which Qi = pi. The result follows by Lemma G.14.

Suppose next that A = ¬B. Let BDF = B1 ∨ B2 ∨ ... ∨ Bt and, by the inductive hypothesis, B ⊣⊢ BDF. Then

A ⊣⊢ ¬BDF ⊣⊢ ¬(B1 ∨ B2 ∨ ... ∨ Bt) ⊣⊢ A1 ∨ A2 ∨ ... ∨ As

where A1, A2, ..., As are all the conjunctive terms which are not amongst B1, B2, ..., Bt (the last step being by Lemma G.15). Since BDF = B1 ∨ B2 ∨ ... ∨ Bt, TB(e) = 1 for just those e such that Ke is amongst B1, B2, ..., Bt. Thus TA(e) = T¬B(e) = 1 for exactly those e such that Ke is not amongst B1, B2, ..., Bt, that is, for those e such that Ke is amongst A1, A2, ..., As. Therefore ADF = A1 ∨ A2 ∨ ... ∨ As and we have proved that A ⊣⊢ ADF in this case.

Finally suppose that A = B ⇒ C. Let BDF = B1 ∨ B2 ∨ ... ∨ Bt and CDF = C1 ∨ C2 ∨ ... ∨ Cu. Then ADF = A1 ∨ A2 ∨ ... ∨ As, where A1, A2, ..., As are all conjunctive terms except for those which occur in BDF but not in CDF (in other words, all conjunctive terms which occur in CDF or do not occur in BDF). By the inductive hypothesis, B ⊣⊢ BDF and C ⊣⊢ CDF and so B ⇒ C ⊣⊢ BDF ⇒ CDF, that is,

A ⊣⊢ (B1 ∨ B2 ∨ ... ∨ Bt) ⇒ (C1 ∨ C2 ∨ ... ∨ Cu) .

By the definition of disjunction,

(B1 ∨ B2 ∨ ... ∨ Bt) ⇒ (C1 ∨ C2 ∨ ... ∨ Cu) ⊣⊢ ¬(B1 ∨ B2 ∨ ... ∨ Bt) ∨ (C1 ∨ C2 ∨ ... ∨ Cu)

and then, by Lemma G.15,

(B1 ∨ B2 ∨ ... ∨ Bt) ⇒ (C1 ∨ C2 ∨ ... ∨ Cu) ⊣⊢ (D1 ∨ D2 ∨ ... ∨ Dv) ∨ (C1 ∨ C2 ∨ ... ∨ Cu)

where D1, D2, ..., Dv is the set of conjunctive terms which do not occur amongst B1, B2, ..., Bt. We now have

A ⊣⊢ (D1 ∨ D2 ∨ ... ∨ Dv) ∨ (C1 ∨ C2 ∨ ... ∨ Cu)

and, using commutativity and idempotence of ∨ (F.29) to remove repetitions, this expression is ADF. □

G.18 Theorem G.5 — second half


It remains to prove that, if A is a TTT then ⊢ A.

Proof. If A is a TTT then, by Corollary G.11(ii), so is ADF. Then, by Corollary G.13, ⊢ ADF. Then, by Lemma G.17, ⊢ A. □

G.19 Corollary

SL is consistent.

Proof. We must show that the theory contains no theorem of the form A ∧ ¬A. By the
definition of a truth-table function, TA∧¬A (e) = 0 for all e and so A ∧ ¬A is not a TTT.
The result follows. 

G.20 Corollary

SL is not complete.

Proof. We must demonstrate the existence of a expression A such that neither A nor ¬A
is a theorem. This is so with A = p1 because

Tp1 (O) = 0 so p1 is not a TTT.

T¬p1 (I) = 0 so ¬p1 is not a TTT.




H Independence of the axioms


"As simple as possible, but no simpler." Various rather complicated sets of axioms for Sentential Logic have been slowly whittled down over the last hundred years or so to finally arrive at the rather simple set we have been working with. This raises the question: have we gone as far as possible? What if only two of SL1, SL2, SL3 would suffice?
Suppose, for instance, that SL1 and SL2, without SL3, would suffice to give all the theorems
of Sentential Logic. In that case SL3, being an axiom of the old version and therefore a
theorem, would be provable from the other two.
We will now verify that this sort of thing cannot happen by proving, for each of the three
axioms in turn, that it cannot be proved from the other two.

H.1 Independence of Axiom SL1


To show this we use a modified truth table procedure, but with three values, 0, 1 and 2
instead of two. (This can be thought of as using a three-valued logic.)

We assume that values have been assigned to all the proposition symbols, p, q, . . . and
define consequent values for all expressions. We will call an expression which takes the value
2, no matter what values are assigned to its proposition symbols, a strange tautology. We
then show that all instances of Axioms SL2 and SL3 are strange tautologies and that, if P
and P ⇒ Q both have value 2, then Q must have value 2 also. It follows from this that
every expression which can be proved from SL2 and SL3 is a strange tautology.

Finally, we show that Axiom SL1 is not a strange tautology; it follows that it cannot be
proved from the other two axioms.
To define how values are calculated for expressions, we use tables like those used in Section
G.3:
p ¬p
2 2
1 1
0 1

p q p⇒q
2 2 2
2 1 0
2 0 0
1 2 0
1 1 0
1 0 2
0 2 2
0 1 2
0 0 2

(We won’t need to bother with the other connectives here.)


Construct truth tables for Axioms SL2 and SL3 and verify that they are both strange tautologies. (SL2 has three variable symbols and so requires 3³ = 27 rows. Constructing this table is tiresome, so a direct proof is easier: assume that some value of SL2 is not 2 and work backwards to a contradiction. SL3 has only 9 rows and is most easily dealt with by a table.)

Inspection of the table for ⇒ above shows that, if P and P ⇒ Q both have value 2, then
so does Q. It now follows that every expression that can be proved from SL2 and SL3 is a
strange tautology.
Verify that, when p has value 1 and q has value 0, SL1 has value 0; thus SL1 is not a strange
tautology and so SL1 cannot be proved from the other two axioms.
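Finite checks of this kind are ideally suited to a machine. Here is a sketch verifying the claims of this subsection, with the two three-valued tables above hard-coded (the dictionary encoding is my own).

from itertools import product

# Sketch: the three-valued tables of H.1 and the checks described above.

NEG = {2: 2, 1: 1, 0: 1}
IMP = {(2, 2): 2, (2, 1): 0, (2, 0): 0,
       (1, 2): 0, (1, 1): 0, (1, 0): 2,
       (0, 2): 2, (0, 1): 2, (0, 0): 2}

def sl1(a, b):                    # A ⇒ (B ⇒ A)
    return IMP[a, IMP[b, a]]

def sl2(a, b, c):                 # (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))
    return IMP[IMP[a, IMP[b, c]], IMP[IMP[a, b], IMP[a, c]]]

def sl3(a, b):                    # (¬B ⇒ ¬A) ⇒ ((¬B ⇒ A) ⇒ B)
    return IMP[IMP[NEG[b], NEG[a]], IMP[IMP[NEG[b], a], b]]

V = (0, 1, 2)
assert all(sl2(a, b, c) == 2 for a, b, c in product(V, repeat=3))  # strange tautology
assert all(sl3(a, b) == 2 for a, b in product(V, repeat=2))        # strange tautology
assert all(q == 2 for p in V for q in V
           if p == 2 and IMP[p, q] == 2)                           # MP preserves value 2
print(sl1(1, 0))                  # 0, not 2 -- SL1 is not a strange tautology

The same skeleton, with the tables of H.2 or H.3 substituted, mechanises those two arguments as well.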

H.2 Independence of Axiom SL2


To show that Axiom SL2 is independent of the other two, the argument in this case is exactly
as the one above, but with the tables
p ¬p
2 1
1 2
0 1

p q p⇒q
2 2 2
2 1 0
2 0 1
1 2 2
1 1 0
1 0 2
0 2 2
0 1 2
0 0 2

H.3 Independence of Axiom SL3


To show that Axiom SL3 is independent of the other two is easier because it only requires
two values, 0 and 1 (a two-valued logic). We will now call any expression that takes the
value 1 no matter what values are assigned to its proposition symbols a strange tautology.
Use these tables:
p ¬p
1 0
0 0

p q p⇒q
1 1 1
1 0 0
0 1 1
0 0 1

Since we are using the standard table for ⇒, it follows immediately that Axioms SL1 and
SL2 are strange tautologies and that if P and P ⇒ Q both have value 1, then so does Q.
Therefore any expression that can be proved from SL1 and SL2 is also a strange tautology.
When p and q both have value 0, SL3 has value 0. Therefore SL3 is not a strange tautology and cannot be proved from SL1 and SL2.

I Other axiomatisations
There are many other ways of setting up Sentential Logic as a formal theory. Here are a few.
They are all equivalent to SL in the sense of Definition 1.C.4 — which is a careful way of saying that, while the new theory has different symbols and therefore different expressions,
there is a (usually pretty obvious) way of translating back and forth between them in such
a way that theorems correspond to theorems (both ways).

I.1 Use of prefix notation


This is simply a different notation for expressions, one which allows parentheses to be done
away with. It is more difficult for humans to read, probably easier for computers. Certain
elementary facts, such as uniqueness of parsing, are more easily proved.
The symbols are the same, except that parentheses are not required:

Connectives ¬ ⇒
Proposition symbols p q r and so on.

The rules for building expressions are

Any proposition symbol constitutes an expression.
If A is an expression, then so is ¬A.
If A and B are expressions, then so is ⇒AB.

The axioms are the same, translated into this notation:

⇒A⇒BA
⇒⇒A⇒BC⇒⇒AB⇒AC
⇒⇒¬A¬B⇒⇒¬ABA

The only rule of inference is Modus Ponens, which now looks like this:

A , ⇒AB
B
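A recogniser for this notation makes the easier-parsing claim concrete: one left-to-right pass, no backtracking and no parentheses. Here is a minimal sketch, using lower-case letters p, q, ... as proposition symbols (the tuple output is my own convention).

# Sketch: parse a prefix-notation expression by a single left-to-right pass.

def parse(s, i=0):
    """Parse one expression starting at s[i]; return (tree, next index)."""
    c = s[i]
    if c == '¬':
        a, j = parse(s, i + 1)
        return ('not', a), j
    if c == '⇒':
        a, j = parse(s, i + 1)
        b, k = parse(s, j)
        return ('imp', a, b), k
    if c.isalpha() and c.islower():
        return c, i + 1
    raise ValueError(f"unexpected symbol {c!r} at position {i}")

def is_expression(s):
    try:
        _, end = parse(s)
        return end == len(s)      # exactly one expression, with nothing left over
    except (ValueError, IndexError):
        return False

print(parse('⇒p⇒qp'))             # the first axiom with A = p, B = q
print(is_expression('⇒p'))        # False: the string runs out too soon

Because each symbol determines how many further expressions must follow, the parse is forced at every step — which is the uniqueness-of-parsing fact mentioned above.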

I.2 An axiomatisation using only disjunction and negation


There are a number of commonly given axiomatisations using ∧, ∨ or both, with ¬. Here is
one.
The symbols are:

Connectives ¬ ∨
Proposition symbols p q r and so on.
Punctuation ( )

The rules for building expressions are the obvious ones:

Any proposition symbol constitutes an expression.


If A is an expression, then so is ¬A.
If A and B are expressions then so is (A ∨ B).

In the semi-formal version of the language, we omit parentheses in the usual way and use
A ⇒ B as an abbreviation for ¬A ∨ B. The axioms are:
(1) A∨A⇒A
(2) A⇒A∨B

(3) A∨B ⇒B∨A


(4) (B ⇒ C) ⇒ (A ∨ B ⇒ A ∨ C)
As before, the only rule is Modus Ponens. Semi-formally, it looks the same; formally, it
looks like this:

A , (¬A ∨ B)
B

I.3 Formalising definitions


We can always make a symbol used as an abbreviation in the semi-formal version of the
language a part of the formal language by adding it to the symbol list and adding an axiom
or axioms sufficient to make it equivalent to its definition.
For example, if we start with our original theory, defined at the beginning of this chapter,
we can add the ∧ symbol and the axioms

A∧B ⇒A
A∧B ⇒B
A ⇒ (B ⇒ A ∧ B).

Alternatively, we can add the new symbol and some new rules of inference. For example,
again starting from our original theory, we can add the symbol ∧ and two rules of inference:

¬(A ⇒ ¬B)
—————————
A ∧ B

and

A ∧ B
—————————
¬(A ⇒ ¬B)

I.4 Minimal axiomatisations


It is possible to axiomatise Sentential Logic using the symbols ¬ and ⇒ and only one axiom, to wit:

[ ( ((A ⇒ B) ⇒ (¬C ⇒ ¬D)) ⇒ C ) ⇒ E ] ⇒ [ (E ⇒ A) ⇒ (D ⇒ A) ]

(Here I have used several different types of parentheses to make the axiom easier to read.)
Even more stripped down is an axiomatisation using the Sheffer stroke, otherwise known as
the nand function (short for “not-and”). It has the following truth-function table:
p q p|q
1 1 0
1 0 1
0 1 1
0 0 1

Only a single axiom is needed:

(A|(B|C)) | [ (D|(D|D)) | ((E|B) | ((A|E)|(A|E))) ]

I.5 More about nand


The nand connective can of course be added as another connective to ordinary Sentential Logic, just as we added ∧, ∨ and ⇔. We would define A|B to be an abbreviation for ¬(A ∧ B).

This connective is more important than you might at first think. Firstly, note that all the
other connectives can be defined in terms of nand, as follows
¬p as p|p , p ⇒ q as p|(q|q) , p ∧ q as (p|q)|(p|q) , p ∨ q as (p|p)|(q|q)
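Each of these four identities is a finite check over the truth-values, so they can be confirmed mechanically. A quick sketch:

from itertools import product

# Sketch: check that the nand definitions above reproduce the other connectives.

def nand(p, q):
    return 0 if (p, q) == (1, 1) else 1

for p, q in product((1, 0), repeat=2):
    assert nand(p, p) == 1 - p                                    # ¬p  as  p|p
    assert nand(p, nand(q, q)) == (0 if (p, q) == (1, 0) else 1)  # p ⇒ q
    assert nand(nand(p, q), nand(p, q)) == (p & q)                # p ∧ q
    assert nand(nand(p, p), nand(q, q)) == (p | q)                # p ∨ q
print("all four nand identities check out")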

Consider digital circuitry — computers, calculators, mobile phones and so on. The adjec-
tive “digital” means that most of the circuitry inside implements logical operations. Each
particular brand of circuitry defines two voltage levels to represent true and false (or 1 and
0). For instance the (now old-fashioned) TTL series uses 5 volts and 0 volts for these; my
Mac, which is not old-fashioned yet, uses 1 volt and 0 volts.

Note that, when binary notation is used, arithmetic operations such as addition and multi-
plication can be implemented as (rather complicated) logical operations.
So called “gates” implement the various logical operators. For example, you can go down
to your local electronics shop and buy an and-gate. It will have two inputs, p and q say,
and an output which will give p ∧ q. (Well, actually what you can buy is an IC chip which
probably contains at least four and-gates, plus connections for its power supply.) If you
have the time and money, some electronics know-how and a soldering iron, you can build
almost any logical network you choose.
Inside the IC, these operations are implemented by tiny transistors. By far the simplest
operation to implement is the nand operation, which only takes one transistor. Other
operations are usually built up from combinations of nand gates according to the rules given
above.
So more complicated IC circuits, such as the CPU in a computer, are packages containing
a large number of these gates pre-wired together to implement the required behaviour.
Nowadays a CPU chip will contain many millions of microscopic nand-gates and very little
else.
3. PREDICATE LOGIC AND FIRST-ORDER THEORIES

A A language for first-order theories


A.1 Discussion
We now extend our languages and theories to deal with variables and quantification. Con-
sider for example the statement

for all x, y and z, if x + y = x + z then y = z

which we might expect to be a theorem in a theory about the natural numbers. Expressing
it more symbolically,
(∀x)(∀y)(∀z)(x + y = x + z ⇒ y = z) .
Consider the component parts of this.
• First there are the variable symbols, x, y and z. No surprises here — so far.

• Then there is the function symbol + representing addition, a binary function (that
is, a function of two variables). Mathematics, as usually written, is full of different kinds of
notation for functions, some quite bizarre. We have the usual f (x, y) kind of notation but
also “infix” notation such as is used for addition and subtraction, the superscript notation
$x^y$ for exponentiation, things like $\binom{n}{r}$ for the binomial coefficient; and what are we to make
of $xy$ for multiplication? — scarcely a notation at all.
We wish to set up a general formal language, and for this we will avoid all these special
types of function notation; we will insist that for the fully-formal language all functions are
written in the same “generic” f (x, y) way. For instance, we will write addition as +(x, y)
instead of x + y, and our example expression becomes

(∀x)(∀y)(∀z)(+(x, y) = +(x, z) ⇒ y = z) .

• The statement contains a relation, equality, also binary. In order to preserve flexibility
in our formal language, we will also insist that all relations are written in generic r(x, y)
form. Accordingly, we will write = (x, y) instead of x = y and our expression becomes

(∀x)(∀y)(∀z)(=(+(x, y), +(x, z)) ⇒ =(y, z)) .

• And we have the quantifiers, (∀x), (∀y) and (∀z), of which much more anon.
Here we have the essential components of a first-order language (that is, the language for
any theory based upon Predicate Logic): variable symbols, function and relation symbols,
and the logical connectives augmented by quantifiers.


A.2 More discussion


A first-order theory will contain
• Predicate Logic axioms, which are there to make the logic work and which are
common to all first-order theories. An example is
(∀x)(P ⇒ Q) ⇒ ((∀x)P ⇒ (∀x)Q) .
As with most formal theories, there are various equivalent ways to axiomatise predicate logic.
In these notes we will use a set of six axiom schemata, the ones usually used in expositions
of mathematical logic. They are listed in A.21.
In setting up a first-order theory, we usually have a particular interpretation in mind, such
as Number Theory, Set Theory, Group Theory or whatever. When we do this we will also
have:
• Proper axioms, specific to the particular theory, usually designed to encapsulate the
properties of the structure being described. For instance, for elementary Number Theory
we might have (reverting to semi-formal notation)
(∀x)(x + 0 = x) .

There are several different ways we can think about a first-order theory.
(1) We might have a specific interpretation in mind, such as elementary Number Theory,
Group Theory or even one of the axiomatisations of Set Theory (which is powerful enough
to contain all of mathematics) — the main ones are Morse-Kelley (MK), Von Neumann-
Bernays-Gödel (VBG) and Zermelo-Fraenkel (ZF). In this case we will have a (usually
small) number of functions and relations, each of which has a definite interpretation in our
target structure (such as addition, equality, set-membership etc.). We will discuss several
specific theories in Chapter 4 and MK in detail in Chapter 6.
(2) We might want to discuss first-order theories in general, for instance in order to
discover properties common to them all. Then we will have functions and relations, and
perhaps some axioms, but no specific axioms or interpretation in mind.
(3) We might want to consider Predicate Logic itself, in much the same way as we
considered Sentential Logic in the previous chapter. Then we will have no proper axioms
at all. It is tempting to call such a theory “Pure Predicate Logic”, but this won’t do;
unfortunately this term is already used for a specific kind of Predicate Logic (one without
any function symbols at all either) so I will use the term Plain Predicate Logic for this —
a first-order theory with no proper axioms.

Plain Predicate Logic is not uniquely defined — one can create different plain predicate
logics by making different choices of the sets of function and relation symbols. However
it turns out that the difference between different predicate logics is entirely inessential, in
that, for any property of theories we are actually interested in, if it is true for one predicate
logic then it is true for all of them. Therefore there is no harm in choosing one gold plated
example as a representative and calling it Plain Predicate Logic (with capitals) or just PL.
This is what we will do.

Most of the important properties of PL are properties of all first-order theories. There
is, however, one important theorem about PL in this chapter which is not shared by all
first-order theories: it is consistent (proof later).

My mental picture of these theories: two nested diagrams, one labelled “Plain Predicate
Logic” and one labelled “A first-order theory”. In each, the axioms sit inside the set of
theorems, which in turn sits inside the set of expressions; in Plain Predicate Logic the only
axioms are the PL axioms, while in a first-order theory the proper axioms sit alongside the
PL axioms.

Now we start defining the theory in the usual way, by specifying what our alphabet is.

A.3 Symbols and strings


There are several kinds of symbols:

The connective symbols ¬ ⇒ ∀


The variable symbols x y z ... (infinitely many, usually countably).
The function symbols See notes below.
The relation symbols See notes below.
The punctuation symbols ( ) ,

A string is, as usual, any finite sequence of these symbols.



A.4 Comments

(i) When we come to specific first-order theories, the function symbols will be meant to
have permanently assigned meanings. In ordinary mathematics, examples might be symbols
such as + or ∩. In an example such as “let f be a differentiable function . . . ” we would
not include the letter f among the function symbols above — other arrangements are made
for this usage (that is, functions created on the fly). The actual list of function symbols
would depend on which particular language we are considering. In the same way, the
relation symbols also will be meant to have permanently assigned meanings. In ordinary
mathematics, examples would be = or ∈ . Again, the actual list of relation symbols would
depend on the particular example of a language we were considering. In a first-order theory
there would usually be only a finite number of function and relation symbols (though this
is not necessary); in some theories one or other of these sets might even be empty.
(ii) In Sentential Logic we have already made a distinction between the formal symbols
of the language, those which are part of the basic definition of the language, such as ¬ and
⇒, and the others, which we might call semi-formal, those which are introduced later, for
instance as abbreviations, such as ∧ and ⇔.
In the same way, in first-order theories, there are formal functions and relations which
are part of the definition of the language and (usually) some semi-formal ones which are
introduced later in various ways. For example, most first-order theories have the symbol =
for equality, nearly always introduced as a formal relation. Then the symbol ≠ for inequality
can be defined later (x ≠ y to mean ¬(x = y)) and so is a semi-formal symbol.
(iii) Note that we assume that there is an infinite number of variable symbols at our
disposal. In a number of proofs it will be necessary to introduce, at some point, a new
variable symbol or symbols which has not appeared in the proof so far (a “fresh” variable).
It is necessary to know that this can always be done. In nearly all first-order theories
a countably infinite set of variable symbols is big enough, however we will never need to
assume this for our investigations. That releases us from the problem of defining countability
at this early stage: the remark about countability is simply a comment.
(iv) In Plain Predicate Logic, the function and relation symbols are not supposed to have
any specific meanings, and we usually use non-specific symbols such as f and r to stand
for them. It is usual to suppose that we have a finite (or at most denumerably infinite)
collection of each, though this is not necessary. It is always supposed that the symbols
have been chosen so that variable, function and relation symbols can be recognised as such.
(Thinking of them as sets, the sets are disjoint.)
(v) Function symbols have an arity, which is a natural number (≥ 0) — a function of
arity n is simply a function of n variables. (A function of arity n is called n-ary. Common
terminology is nullary for 0-ary, unary for 1-ary, binary for 2-ary and ternary for 3-ary.)
Note that nullary functions are just constants, so this allows us to pick out permanent names
for important constants in our theory, examples in ordinary mathematics being 0 and ∅.
Relation symbols also have arity. A nullary relation represents a truth value that does
not depend on any variables, so nullary relations have the same rôle in Predicate Logic as
proposition symbols have in Sentential Logic.

The arity of a function or relation symbol is also meant to be permanently attached to it


and be instantly recognisable from the symbol itself. We will use the terms function domain
and relation domain to refer to such a set of function or relation symbols with prescribed
arities. The function domain and relation domain, taken together, are often referred to as
the signature of the language.

(vi) As with Sentential Logic, other connectives could be added to our list. Conspicuous
by their absence are ∧, ∨, ⇔ and ∃. These connectives can be defined in terms of the ones
given above, so we happily leave them out of the formal language and then introduce them
in the semi-formal one. (We have already done this with ∧, ∨ and ⇔ and will do it with ∃
in this chapter.)

A.5 Discussion
We now set about defining the language. The existence of function and relation symbols
means that this language is a bit more complicated than that for Sentential Logic.
There are two kinds of “meaningful” strings in this language: terms and expressions.
A term names something. For example, in a language describing the natural numbers, we
would expect terms like 3 and 1 + 1 and x² + 3y + 2, each of which names some natural
number. Observe that the possible presence of variable symbols means that we may not
know exactly which thing is being named unless we know which things the variable symbols
name.
An expression is a statement. For example, in the language for natural numbers, we would
expect expressions like 2 + 2 = 4 and 13 + 5 < 3 and x² + 3y − 2 = 0. It is convenient to
think of an expression as a sentence which is either true or false. Note that expressions may
well express facts which are false and that, if they contain variable symbols, their truth or
falsity may depend upon the values of the variables involved.
We need to define terms first and then, using this, define expressions.

A.6 Definition: Terms


A string is a term if it is of one of the forms
(T1) A variable symbol.
(T2) f (s1 , s2 , . . . , sn ) , where f is an n-ary function symbol, n ≥ 1 and s1 , s2 , . . . , sn are
terms, or simply f in the case where f is a nullary function symbol.

A.7 Comments
(i) As remarked above, think of terms as behaving like nouns and noun phrases in
natural languages: they name things.
(ii) As already mentioned, a special case is a nullary function, which can be thought of
as a constant. Examples are 0 (in a language for the natural numbers) and ∅ (in a language
for set theory).

(iii) Notice that this definition is recursive: Item (T2) allows us to substitute functions
inside functions to any depth. For example, we would construct the term x² + 3y in our
formal language thus:
+( ·(x, x) , ·(3, y) )

(here I am using a dot for the multiplication function).

A.8 Definition: Expressions (wffs)


A string is an expression (otherwise known as a well-formed formula or wff ) if it is of one
of the forms
(E1) r(t1 , t2 , . . . , tn ) where r is an n-ary relation symbol, n ≥ 1 and t1 , t2 , . . . , tn are terms,
or simply r in the case where r is a nullary relation symbol.
(E2) (¬P ) where P is an expression,

(E3) (P ⇒ Q) where P and Q are expressions, or


(E4) (∀x P ) where x is a variable symbol and P is an expression.

A.9 Comments
(i) Expressions of the form (E1) are called atomic.

(ii) Expressions normally represent sentences (e.g. 2 + 2 = 4), which are either true or
false, or predicates (e.g. x + y = z), whose truth or falsity depends upon what the variables
represent.
(iii) As usual, the fully-formal language uses lots of parentheses to make sure that am-
biguous constructions cannot arise. And, also as usual, we will omit them in the semi-formal
language whenever possible. An expression of the form (∀x P ) will usually be written (∀x)P
in our semi-formal language whenever possible without confusion, because it is easier to read
that way — compare (∀x x + 0 = x) with (∀x) x + 0 = x .
The (∀x) construction is usually called a universal quantifier. (This word is also sometimes
used for the single symbol ∀, but I won’t do that in these notes.)

The universal quantifier (∀x) expresses in our formal language the frequently-used preamble
“For all x . . . ”.
(iv) We will add the connectives ∧ , ∨ and ⇔ to our semi-formal language as before; later
in this chapter we will also add the existential quantifier (∃x) in the same way. It stands
for the preamble “There exists an x such that . . . ”.
(v) If we want to include the symbols T and F (for true and false), they are nullary
relations, and so are expressions.
(vi) Where the forms (E1), (E2) and (E3) above occur in the semi-formal language we
will usually leave off the outer parentheses where no ambiguity can arise, and write them
¬P , P ⇒ Q and (∀x)P .

Again we have a unique parsing lemma; we had one before for SL in 2.A.8.

A.10 Proposition: Unique parsing for first-order theories


(i) A string cannot be both a term and an expression.

(ii) Let A be a term. Then only one of cases (T1) or (T2) of the definition above holds. If
(T1) holds, there is a unique variable symbol x such that A = x. If (T2) holds, then there
is a unique function symbol f and terms s1 , s2 , . . . , sn such that A = f (s1 , s2 , . . . , sn ).
(iii) Let A be an expression. Then only one of cases (E1) to (E4) of the definition above
holds. If (E1) holds, then there is a unique relation symbol r and terms t1 , t2 , . . . , tn such
that A = r(t1 , t2 , . . . , tn ). If (E2) holds, then A is unique, if (E3) holds then P and Q are
unique and if (E4) holds then x and P are unique.
Once again I will leave the proof of this as a tiresome and completely optional exercise.
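
For the computationally minded, Definitions A.6 and A.8 amount to saying that terms and expressions are trees. Here is a minimal sketch in Python (the class names are mine; nothing here is part of the formal theory):

from dataclasses import dataclass

# Terms: (T1) a variable, or (T2) a function symbol applied to terms.
@dataclass(frozen=True)
class Var:
    name: str                  # e.g. "x"

@dataclass(frozen=True)
class Func:
    symbol: str                # e.g. "+"; a nullary Func is a constant
    args: tuple = ()           # a tuple of terms

# Expressions: (E1) atomic, (E2) negation, (E3) implication, (E4) universal.
@dataclass(frozen=True)
class Atom:
    symbol: str                # a relation symbol, e.g. "="
    args: tuple = ()           # a tuple of terms

@dataclass(frozen=True)
class Not:
    body: object               # an expression

@dataclass(frozen=True)
class Implies:
    left: object               # an expression
    right: object              # an expression

@dataclass(frozen=True)
class ForAll:
    var: str                   # the quantified variable symbol
    body: object               # an expression

# The example from A.1: (∀x)(∀y)(∀z)( =(+(x,y), +(x,z)) ⇒ =(y,z) )
x, y, z = Var("x"), Var("y"), Var("z")
example = ForAll("x", ForAll("y", ForAll("z",
    Implies(Atom("=", (Func("+", (x, y)), Func("+", (x, z)))),
            Atom("=", (y, z))))))

Unique parsing then just says that every term or expression is such a tree in exactly one way: each node has exactly one of the forms (T1), (T2), (E1)–(E4), with uniquely determined components.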

A.11 Discussion: Binding


The quantifier (∀x) “binds” the variable x. What does this mean?
First of all, let me say that the concept of binding is very important here and we will be
referring to it constantly. It is a good idea to get your head around it before we go any
further.
Second of all, the general idea should not be new to you. It occurs all the time in ordinary
algebra and calculus, where variables are bound by other constructions. So let us start with
a few familiar examples of binding, and then come back to quantifiers. Here are a couple:
$$\int_0^{\pi} \cos x \, dx = 0 \,, \qquad \sum_{i=1}^{10} i^2 = 385 \qquad (–1)$$

In the first of these the variable x is bound, in the second the variable i. (A bound variable
is often called a “dummy” variable.)

There are several ways to recognise a bound (or dummy) variable.


(1) It can be recognised by the construction that binds it. For example, $\int \ldots \, dx$ binds any
occurrence of the variable x within it; similarly $\sum_i$ binds the i shown and any occurrence of i
in whatever subexpression immediately follows. Similarly, in Logic, (∀x) and (∃x) also bind
the occurrence of x shown and any occurrence of x in whatever subexpression immediately
follows them.
(2) Whatever is being said with the dummy variable can be said just as well without it.
For example, the two statements in (–1) above can equally well be expressed:

The integral of the cos function from 0 to π is 0.


The sum of the squares of the integers from 1 to 10 is 385.

This is what the word “dummy” is meant to suggest: that the dummy variable is just a
passing convenience. However if the statement is at all complicated, saying it without
using the dummy variable might require a truly horrible sentence. For instance, try saying
$\int_0^{\pi} e^x \cos x \, dx = -\frac{1}{2}(1 + e^{\pi})$ without mentioning x. Dummy variables are indeed convenient.
(3) The dummy variable can be changed to another variable symbol without changing
the meaning of the statement. For instance, the two statements (–1) above say exactly the
same things as
$$\int_0^{\pi} \cos z \, dz = 0 \,, \qquad \sum_{r=1}^{10} r^2 = 385 \qquad (–1A)$$

In larger, more complicated statements which might contain several variables, things can
get nasty if you substitute a variable which is already being used elsewhere in the statement.
We will have to look at this a bit more carefully below, but as a general rule, if you are
going to substitute a new variable for a dummy, it is best to substitute a “fresh” one, that
is, one which does not already occur elsewhere in the expression.

Let us look at

(∀x)(x + 0 = x) in other words, for every x, x + 0 = x .

in these same ways:


(1) We recognise that x is a dummy here because it is bound by the construction (∀x).
(2) This statement may be rephrased, “Adding 0 to any number does not change it.”
(3) The statement is equivalent to (∀v)(v + 0 = v), amongst many other similar things.

Scope
Each of these constructions which bind a variable symbol has a scope, that is, an area within
the expression in which that dummy variable has meaning. In an integral or a quantified
expression, the scopes are as shown here by the boxes.
$$\int_0^{\pi} \boxed{\cos x \, dx} = 0 \,, \qquad (\forall x)\boxed{(x + 0 = x)}$$

This may not look very illuminating as it stands: it becomes important when such an
expression is part of a larger one. For example, here is an elementary property of integration:
$$\int (f(x) + g(x)) \, dx = \int f(x) \, dx + \int g(x) \, dx \,. \qquad (–2A)$$

If we draw in the boxes,


$$\int \boxed{(f(x) + g(x)) \, dx} = \int \boxed{f(x) \, dx} + \int \boxed{g(x) \, dx} \,,$$

we see that the variable symbol x in fact has three different scopes, which can be thought
of informally as different “meanings”, here. The equation could equally well be written
$$\int (f(x) + g(x)) \, dx = \int f(y) \, dy + \int g(z) \, dz \,, \qquad (–2B)$$

making it much more obvious that there are actually three different dummies here.

At first glance, the second form (–2B) looks better, at least for a formal theory, in that
it does not require recognition of the various different scopes of the same variable symbol.
On the other hand, the first form (–2A) is more in line with common usage. Furthermore,
recognising different scopes is algorithmically quite simple (even if it allows us to write
rather human-unfriendly expressions — see below) and, more importantly, legislating against
expressions such as (–2A) in our formal language would require a lot of awkward little rules
to be added. Therefore multiple different bindings of the same variable symbol are allowed
in the formal language from the word go.
Moreover, if you really used a different letter for every different scope, you would probably
run out of letters by the end of the first page.

Mathematics teachers are at pains to drum into school and university students that
one should never use the same variable symbol with two different meanings in the
same expression or proof, because that way leads to invalid arguments and so to
incorrect results. But then we turn around and flagrantly disregard this rule in
statements such as (–2A) above. The point is that it is OK to use the same symbol
for different dummies, provided that their scopes don’t overlap; it is just the unbound
(“free”) variables we should apply this rule to.

The sigma notation for sums is a bit of a worry, because it is sometimes not obvious what
the scope is. Consider $\sum_{i=1}^{10} i^2 + 1$. Is the 1 at the end included in the scope or not? If not,
the value is 386 and if it is, 395.
$$\sum_{i=1}^{10} \boxed{i^2} + 1 = 386 \qquad\text{but}\qquad \sum_{i=1}^{10} \boxed{i^2 + 1} = 395$$

(These boxes aren’t quite right: they should include the i at the bottom of the Σ-sign too.)
Well, we should have used parentheses to remove the ambiguity in these statements
$$\Big( \sum_{i=1}^{10} i^2 \Big) + 1 = 386 \qquad\text{but}\qquad \sum_{i=1}^{10} (i^2 + 1) = 395$$

This is just a mild problem with the sigma notation which won’t occur with our logic, so
we will ignore it here.
Now, for an example involving quantifiers, consider the deduction

A, B ⊢ A ∧ B

Suppose that P (x) and Q(x) are expressions which mention the variable x. By substitution,
the above allows

(∀x)P(x) , (∀x)Q(x) ⊢ (∀x)P(x) ∧ (∀x)Q(x) (–2A)

Boxing these to see the scopes gives

$(\forall x)\boxed{P(x)} \,,\; (\forall x)\boxed{Q(x)} \;\vdash\; (\forall x)\boxed{P(x)} \wedge (\forall x)\boxed{Q(x)}$



and so (–2A) could just as well have been written

(∀w)P(w) , (∀x)Q(x) ⊢ (∀y)P(y) ∧ (∀z)Q(z) (–2B)

For computer science geeks only: This sort of thing should be familiar to you. There
is no harm in different local variables being denoted by the same symbol so long as
their scopes do not overlap. This is almost exactly the same idea.

The discussion so far should be enough for the definition to make sense.

A.12 Definition: Binding


Let X be an expression and consider some occurrence of the variable symbol x in X. Then
that occurrence of x is bound in X if it is part of some occurrence of an expression of the
form ((∀x)P ) in X. Otherwise it is free.
If we want to be careful, we can define this by induction over the construction of X, using
Definition A.8:

• Every occurrence of any variable in an atomic expression r(t1 , t2 , . . . , tn ) is free in that


expression.

• An occurrence of a variable in ¬P is free or bound according as it is free or bound in


P.

• An occurrence of a variable in P ⇒ Q is free or bound according as it is free or bound


in P or Q (whichever it occurs in).

• All occurrences of a variable x are bound in (∀x)P ; any occurrence of any other variable
is free or bound in (∀x)P according as it is free or bound in P .
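
The four clauses above translate directly into a function over the tree representation sketched after A.10 (again only an illustration):

def term_vars(t):
    """All variable symbols occurring in a term (terms contain no binders)."""
    if isinstance(t, Var):
        return {t.name}
    return set().union(*(term_vars(s) for s in t.args))

def free_vars(e):
    """The set of variable symbols with a free occurrence in expression e."""
    if isinstance(e, Atom):
        return set().union(*(term_vars(t) for t in e.args))
    if isinstance(e, Not):
        return free_vars(e.body)
    if isinstance(e, Implies):
        return free_vars(e.left) | free_vars(e.right)
    if isinstance(e, ForAll):
        return free_vars(e.body) - {e.var}   # (∀x) binds x in its scope
    raise TypeError(e)

# x is free in x + 1 > 0 but bound in (∀x)(x + 1 > 0):
p = Atom(">", (Func("+", (Var("x"), Func("1"))), Func("0")))
assert free_vars(p) == {"x"}
assert free_vars(ForAll("x", p)) == set()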

A.13 Comments on Binding


(i) Note that it does not make sense to talk about a variable itself being bound or free;
it makes sense only for occurrences of a variable in a given expression. For example, x is
free in x + 1 > 0 and bound in (∀x)(x + 1 > 0).
(ii) Another way of looking at this: a quantifier (∀x) binds all occurrences of x within
its scope.

(iii) Note that the definition refers to occurrences of a variable. It is perfectly possible,
though rather unusual, for an expression to contain both bound and free occurrences of the
same variable. For example, in

(x² + x < 6) ∧ (∀x)(cos x > −5)

the first two occurrences of x are free, the third and fourth bound.

A.14 More discussion: Binding


Now we come to a couple of rather weird-looking manifestations of binding.
(i) In quantified expressions such as (∀x)P , it is usually the case that the quantified
variable turns up somewhere in the expression P . However this is not a hard and fast rule:
it is perfectly possible to have an expression of the form (∀x)P in which P has no occurrences
of x at all. The basic reason is that the definition of an expression allows this. Here is an
example:
(∀x)(2 + 2 = 4)
What can such a statement mean? The short answer is that in an expression, (∀x)P where
P does not mention x, the whole statement means the same as P — more carefully, we will
prove that in such a case, (∀x)P is equivalent to plain P .

Maybe this isn’t so strange after all. Consider:


$$\int_0^{\pi} 2 \, dx = 2\pi \qquad\text{and}\qquad \sum_{i=1}^{10} 3 = 30 \,.$$

Another place such expressions tend to turn up is in simplifications. Here is an example


using the other quantifier (∃x). Suppose we have been working on some kind of equation,
and have reduced the problem to

(∃x)(x + y = x + 3)
that is, there exists some x such that x + y = x + 3

Then we can say immediately that y = 3. But we have jumped over a step here. The proof,
with the subconscious missing step goes

(∃x)(x + y = x + 3) given,
(∃x)(y = 3) because x + y = x + 3 ⇔ y = 3,
y = 3 because y = 3 does not contain x free.

(In the middle line here we are substituting equivalents.)


(ii) Now we come to a couple of kinds of expression that are definitely weird, but which
are allowed by the definition of an expression. Firstly, variable symbols are allowed to occur
both free and bound in the same expression, as in

P (x) ∨ (∀x)Q(x) . (–2A)

Here the first occurrence of x is free, the second and third are bound. This sort of thing is
unusual in ordinary mathematical notation, but is in fact allowed, for example
$$g(x) = x^2 + \int_0^1 f(x) \, dx \,. \qquad (–3A)$$

As mentioned above, this kind of confusing notation is allowed in the formal language simply
because legislating against it causes more problems than it solves. While these are actually

allowed, in expressions like these it is far more sensible to change the name of the dummy
so that the meaning of the expression is easier to see:
P (x) ∨ (∀y)Q(y) , (–2B)
$$g(x) = x^2 + \int_0^1 f(t) \, dt \,, \qquad (–3B)$$

(iii) Things become really hard to read when we have overlapping scopes with the same
dummy. For example one is allowed to take expressions such as (–2A) and (–3A) and
quantify them again to get, for instance,

$(\exists x)\big( P(x) \vee (\forall x)Q(x) \big) \,, \qquad (–4A)$
$$g(x) = \int_0^1 \Big( x^2 + \int_0^1 f(x) \, dx \Big) \, dx \,, \qquad (–5A)$$
When confronted with horrible expressions such as these, probably the best thing to do is
change the letters used for dummies, working from the innermost outward:

$(\exists x)\big( P(x) \vee (\forall y)Q(y) \big) \,, \qquad (–4B)$
$$g(x) = \int_0^1 \Big( x^2 + \int_0^1 f(t) \, dt \Big) \, dx \,, \qquad (–5B)$$
(Or just draw in the “scope boxes”.)
Challenge: Here is the definition of a function R → R. Figure out what the function is in a
more straightforward notation.
$$h(x) = \int_0^x \int_0^x x^2 \, dx \, dx \qquad\text{for all } x \,.$$

How many different dummies, all called x, are there here anyway?
This kind of notation can nearly always be avoided, and I will do so in these notes. But one
should be aware that it is possible, being legal under the definition of an expression.

A.15 Definition: Sentences


A sentence is an expression which contains no free occurrences of any variables. It is also
called a closed expression.
In ordinary calculus, we might write
(∀x)(sin² x + cos² x = 1) ;
That is a sentence (x is bound). It is true.
We might also write
cos³ x + sin³ x = 1 ;
That is not a sentence (x is free here). Whether it is true or not depends upon the value of
x. Putting that another way, it says something about the variable x.

A.16 Discussion: Substitution


We know that
sin² x + cos² x = 1 for all x,
and so we can substitute any term for x in this to get a true statement, for example
sin² π + cos² π = 1 or sin² (log 42) + cos² (log 42) = 1
or even, if we are part-way through an argument involving some number k,
sin² (3k + eᵏ) + cos² (3k + eᵏ) = 1 .
In general we will have a principle that, given a true statement of the form
(∀x)P
then we can substitute any term we like for x in P and nearly always get a true statement.
(That “nearly always” is a bit of a worry, and we’ll deal with it shortly.)
Notation First we need a convenient and watertight notation for substituting in a general
expression P . One rather straightforward way is to make the variable explicit: if the ex-
pression P contains some variable x which we are going to play with, let us (at least for the
time being) call the expression P (x); then if we want to substitute, say π + 1, for x in it, we
would write the result P (π + 1). This is just like the notation we normally use for functions,
so I’ll call it functional notation. It is convenient and easy to read, but it does have some
problems, mainly when the expression has several variables, especially when some are free
and others bound, so generally I will use a safer notation. Here it is:
The result of substituting a term t for the variable x in the expression P will be written
P [x/t].
Now, as usual, we must be a little careful here. Let us consider the slightly strange (but
valid) definition of a function
$$g(x) = x^2 + \int_0^1 f(x) \, dx \qquad\text{for all } x. \qquad (–1A)$$

Now suppose we want to substitute 3 for x here. Do we get this?


$$g(3) = 3^2 + \int_0^1 f(3) \, d3 \qquad\text{WRONG!!}$$

Of course not. For a start, that doesn’t even make sense. What we should have got is
$$g(3) = 3^2 + \int_0^1 f(x) \, dx \qquad\text{CORRECT.}$$

The point is that in the expression (–1A) there are both free and bound occurrences of x,
and we should have only changed the free ones. If we had rewritten (-1A) in the safer form
$$g(x) = x^2 + \int_0^1 f(t) \, dt \qquad\text{for all } x. \qquad (–1B)$$

before trying the substitution, this would have been obvious.

So we arrive at:

A.17 Definition: Substitution


Let s be any term, x a variable and t a term. We define s[x/t], the result of substituting t
for x in s to be the term formed from s by replacing every free occurrence of x by t.
In the same way, let P be any expression, x a variable and t a term. We define P [x/t], the
result of substituting t for x in P to be the expression formed from P by replacing every
free occurrence of x by t.
Here is a careful (and boring) definition, by induction over the construction of the term or
expression.
First we define substitution into a term s.

• If s is a variable symbol, there are two possibilities:

(i) s is the same as x, in which case s[x/t] is just t (in other words, x[x/t] is
always t);
(ii) s is a different variable from x, in which case s[x/t] is s (it is not changed).
• If s is f (s1 , s2 , . . . , sn ), where f is a function symbol and s1 , s2 , . . . , sn are terms, then

s[x/t] is f (s1 [x/t], s2 [x/t], . . . , sn [x/t]) .

Now we define substitutions in expressions.

• If P is r(s1 , s2 , . . . , sn ), where r is a relation symbol and s1 , s2 , . . . , sn are terms, then

P [x/t] is r(s1 [x/t], s2 [x/t], . . . , sn [x/t]) .

• If P is ¬Q, then P [x/t] is ¬Q[x/t].


• If P is (Q ⇒ R), then P [x/t] is (Q[x/t] ⇒ R[x/t]).
• If P is (∀yQ), where y and x are different variables, then P [x/t] is (∀yQ[x/t]).
• If P is (∀xQ), then P [x/t] is P .

It follows immediately from these that substitution behaves in the same way with the other
connectives (the proof is easy and obvious):

• If P is (Q ∧ R), then P [x/t] is (Q[x/t] ∧ R[x/t]).


• If P is (Q ∨ R), then P [x/t] is (Q[x/t] ∨ R[x/t]).
• If P is (Q ⇔ R), then P [x/t] is (Q[x/t] ⇔ R[x/t]).
• If P is (∃yQ), where y and x are different variables, then P [x/t] is (∃yQ[x/t]).
• If P is (∃xQ), then P [x/t] is P .
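
In code, over the same tree representation as before, the whole definition is one short recursion (only the formal connectives are handled; as just noted, the semi-formal ones behave the same way):

def subst_term(s, x, t):
    """s[x/t] for a term s: replace every occurrence of the variable x by t."""
    if isinstance(s, Var):
        return t if s.name == x else s
    return Func(s.symbol, tuple(subst_term(a, x, t) for a in s.args))

def subst(p, x, t):
    """P[x/t] for an expression P: replace free occurrences of x by t."""
    if isinstance(p, Atom):
        return Atom(p.symbol, tuple(subst_term(a, x, t) for a in p.args))
    if isinstance(p, Not):
        return Not(subst(p.body, x, t))
    if isinstance(p, Implies):
        return Implies(subst(p.left, x, t), subst(p.right, x, t))
    if isinstance(p, ForAll):
        if p.var == x:                 # (∀x)Q: x has no free occurrences
            return p
        return ForAll(p.var, subst(p.body, x, t))
    raise TypeError(p)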

All this is just saying in a complicated way that, to make the substitution [x/t] in any term
or expression, simply go through it and change every free occurrence of the variable x to
the term t. However occasionally, when we want to prove something about substitution, it
helps to pick the idea apart in this way.
(If we were being really careful, at this stage we should verify that the result of such a
substitution in a term is guaranteed to always yield a genuine term and that such a substi-
tution in an expression is guaranteed to always yield a genuine expression. This is not hard
to verify, but tiresome, so let’s leave it at that.)

A.18 Discussion: Acceptability


There is one more subtlety to worry about when applying substitution — sorry, can’t be
helped!
Suppose we have defined a function f thus:
$$f(x) = \int_0^1 x \sin y \, dy \qquad\text{for all } x \,. \qquad (–1A)$$

We may substitute z or eᵏ + 1 for x here to get
$$f(z) = \int_0^1 z \sin y \, dy \qquad\text{or}\qquad f(e^k + 1) = \int_0^1 (e^k + 1) \sin y \, dy \,,$$

but substituting y for x is not acceptable:


$$f(y) = \int_0^1 y \sin y \, dy \qquad (–1X)$$

is a radically different statement (the right hand side is now a constant!). What went wrong
here? The problem was that the two occurrences of x in (–1A) were both free and referred
to the same thing, whereas in (–1X) the first y is free, but the second one is bound: the
variable y here has two different meanings in the same statement: this won’t do.
In short, the problem is that we are trying the substitution [x/y], when x occurs within the
scope of the binding of y by the integral.
The same sort of thing occurs in logic. Suppose we are talking about the integers Z. Here
is an expression:
(∃y)(x + y = 1) . (–2)
This is a statement about the integer x, which is in fact true of all such x, so we could write

(∀x)(∃y)(x + y = 1) . (–3)

Suppose we substitute z for x here:

(∃y)(z + y = 1) .

This says the same thing about z as (–2) does about x and we have

(∀z)(∃y)(z + y = 1) .

All this is OK, however if we make the unacceptable substitution [x/y] in (–2) we get

(∃y)(y + y = 1) .

which means something completely different — it has no free variable and is false (for
integers).
So far we have looked at substitutions of one variable for another. We are also allowed to
substitute terms for variables. Here are the results of the substitutions [x/3] and [x/x² + 3]
in (–2):
(∃y)(3 + y = 1)
(∃y)(x² + 3 + y = 1) .
These are OK, but the unacceptable substitution [x/y² + 3], yielding

(∃y)(y² + 3 + y = 1) .

is not.
In short, an expression with a free variable x can be thought of as saying something about
that variable. An acceptable substitution [x/t] yields an expression which says the same
thing about the term t; an unacceptable substitution yields an expression which says some-
thing entirely different. And what makes this unacceptable is if the term t contains a variable
which changes from free to bound when t is substituted for some occurrence of x.

A.19 Definition: Acceptability


Given an expression P , a variable x and a term t, the substitution [x/t] is acceptable in
P provided that there is no variable symbol y which occurs in t and such that some free
occurrence of x in P is within the scope of a (∀y) quantifier in P .
There are a few special cases, more or less common, in which this definition simplifies.

(i) (One variable is being substituted for another.) If x and y are variable symbols, then
the substitution [x/y] is acceptable in P provided no free occurrence of x in P is within the
scope of a (∀y) quantifier.
(ii) (A variable is being substituted for itself.) As you would expect, the substitution
[x/x] is always acceptable in P , and P [x/x] is just P .

(iii) (An ineffective substitution.) If the expression P happens not to contain any occur-
rence of the variable x at all, then any substitution [x/t] is acceptable in P and again P [x/t]
is the same as P .
Here is a careful definition, by induction over the construction of the expression:

• Any substitution into an atomic expression is acceptable.


• A substitution in ¬Q is acceptable if and only if it is acceptable in Q.
• A substitution in (Q ⇒ R) is acceptable if and only if it is acceptable in both Q and
R.

• A substitution [x/t] in (∀xQ) is always acceptable (note, the same x in both the
substitution and the quantifier).

• A substitution [x/t] in (∀yQ), where x and y are different variables, is acceptable if
and only if it is acceptable in Q and t contains no free occurrence of y.
In the fully-formal language, and mostly in any semi-formal versions, all variables
which occur in terms are free, so the word “free” in that last dot point is mostly
unnecessary. In certain semi-formal constructions a term might contain some kind of
expression — the example which springs to mind is the construction {x : P (x)} in set
theory, but other constructions involving definition by description might occur to you
— so I’ve included the word “free” there just to be on the safe side.
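
The inductive clauses translate directly into code over the same tree representation, with term_vars as in the earlier sketches (a sketch only; it implements the clauses exactly as stated):

def acceptable(p, x, t):
    """Is the substitution [x/t] acceptable in expression p?"""
    if isinstance(p, Atom):
        return True                    # atomic: always acceptable
    if isinstance(p, Not):
        return acceptable(p.body, x, t)
    if isinstance(p, Implies):
        return acceptable(p.left, x, t) and acceptable(p.right, x, t)
    if isinstance(p, ForAll):
        if p.var == x:                 # (∀x)Q: always acceptable
            return True
        # (∀y)Q with y different from x: acceptable in Q, and t must not contain y
        return acceptable(p.body, x, t) and p.var not in term_vars(t)
    raise TypeError(p)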

A.20 “Functional” notation


I have also mentioned that we sometimes write P (x) for an expression in which the variable
x may or may not occur free, and then write P (t) for the result of substituting t for x in
P (that is, to mean P [x/t]). This notation is often easier to read than the P [x/t] notation,
because it is more in line with the notation we use for ordinary functions.
It needs to be handled with care because it can be ambiguous. Because of this, we won’t
even think of it as semi-formal; at best it is informal.

One of the problems with this notation is that, with many of the results we discuss, an
expression P may have any number of variables, over and above the ones we are actually
interested in. But the P (x) notation leads us to think that that is the only variable in P .
But if we remember that P (x) might have other variables besides, we are allowed to write
things like
P(x) for x² + 3y ;
Now let’s do a couple of substitutions and see what happens. Here’s what we would expect
(with this notation).

Start with P(x)
Substitute y for x to get P(y)
Now substitute x for y to get P(x) back again.

Well, no. What actually happens is this

Start with x² + 3y
Substitute y for x to get y² + 3y
Now substitute x for y to get x² + 3x and that isn't P(x)!

What we are seeing here is a shortcoming of the “functional” notation for expressions: it just
isn’t up to the job when repeated substitutions occur (and in some other circumstances).
So I will use it occasionally for illustrative purposes, but we should not rely on it.

We are finally ready to describe the . . .



A.21 Axioms and rules for Predicate Logic


These axioms are schemas: A, B and C stand for any expressions, x for any variable symbol
and t for any term.
(PL1) A ⇒ (B ⇒ A)
(PL2) (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))

(PL3) (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)

(PL4) (∀x)A ⇒ A[x/t] provided that the substitution [x/t] is acceptable in A.


(PL5) (∀x)(S ⇒ A) ⇒ (S ⇒ (∀x)A) provided that x does not occur free in S.
The Axiom PL4 is called Particularisation.

There are two rules:


A , A ⇒ B
B (MP, Modus Ponens)

A
(∀x)A (UG, Universal Generalisation)
When discussing UG, x is referred to as the “affected” variable.
In the rough-and-ready functional notation described above, the new axioms look like this:
(PL4) (∀x)A(x) ⇒ A(t) provided that the substitution [x/t] is acceptable in A.

(PL5) (∀x)(S ⇒ A(x)) ⇒ (S ⇒ (∀x)A(x)) provided that x does not occur free in S.
Universal generalisation looks like this:

A(x)
(∀x)A(x)

A.22 Comments on the first three axioms


These axioms look exactly like the three axioms of Sentential Logic. Regarding these I have
some good news and some bad news.
First the bad news. They are not the same as the axioms of SL, even though they look
like them. The reason for this is that they are actually axiom schemas. With the axiom
schemas of SL, we could get actual axioms — instances of the schemas — by substituting
any expressions of SL for the letters A, B and C used there. With the new ones for PL we
get actual axioms by substituting expressions of PL instead, and these are different. So the
actual axioms of SL, rather than the schemas, are different from the axioms of PL.
Now the good news: the bad news hardly matters at all. This is because all the theorems of
SL we proved in the previous chapter were schemas too. Every one of them still holds in PL
if we interpret them in the obvious way: we are now allowed to substitute any expression

of PL for the various letters in those theorems. And that, in turn, is because all the proofs
were schematic.
Put this another way: you could, if you wished, simply copy all the theorems of the SL
chapter and paste them into this one here, and they would stand as perfectly good theorems
and proofs for PL. (I don’t recommend actually doing this of course!)

These comments do not apply to the two big metatheorems of the previous chapter (the
Deduction Theorem and the Substitution of Equivalents Theorem). They need to have their
statements readjusted and be proved again for the new context.

A.23 Comments on Axiom PL4


This is argument from the general to the particular. If something (A) is true of all x, then
it must be true of any particular one (t).
But don’t forget that there is an acceptability condition here. In the discussion of accept-
ability in A.18 above we saw a couple of examples of what can go wrong if you ignore this
condition.

There is a simple special case of this axiom which is used quite often. In this version no
substitution is made, that is, the variable x is left unchanged. So this special form of PL4 is

(∀x) A ⇒ A

(To get this special case from the general version of PL4, simply use the substitution [x/x].
We know that this kind of substitution is always acceptable, so we don’t need to bother
checking acceptability.)

A.24 Comments on Axiom PL5


First, a warning: BEWARE OF AXIOM PL5!

It can only be used when S does not contain x free. I have used the letter S here in an
attempt to make this stand out by suggesting “sentence”. This means that it is of very
limited usefulness, for instance in proving weird things like

If (∀x)(2 + 2 = 4 ⇒ A) then (∀x)A

It is however used as a crucial step in the proof of the Deduction Theorem for PL and so,
of course, occurs as a “hidden” step in many proofs.
It was mentioned above that, if the expression A has no free occurrence of the variable x,
then (∀x)A is equivalent to A. So PL5 should be an equivalence, and it is; we will prove
this below.
In the last chapter we saw that expressions of the form X ⇒ (Y ⇒ Z) are equivalent to
X ∧ Y ⇒ Z and that the axioms of SL are probably easier to understand if you change
them to the latter form. (As before, Axiom PL5 is written in this double-implication form
because we want to write the axioms in fully-formal form, and the ∧ symbol is only available
in the semi-formal language.)

If we change Axiom PL5 in the same way to make it more readable, it becomes

(∀x)(S ⇒ A) ∧ S ⇒ (∀x)A (provided x does not occur free in S) .

or even
(∀x)(S ⇒ A) , S ⊢ (∀x)A (Same proviso) .

A.25 Free variables in proofs and theorems


How are we to think about free variables which turn up in theorems? For example, consider
the following, which ought to be a theorem in a theory of the natural numbers:

x + y = y + x.

(By the way, we have already seen that every expression which turns up as a step in the
proof of a theorem is itself a theorem, and we know that all axioms are also theorems, so
what we are talking about here is free variables in any theorem, any step in the proof of a
theorem, or any axiom.)

Think of an expression (theorem, step, axiom) such as the one above as being true for any
values of its free variables. Let us have a look at an example of how this might work in a
proof.
When we come to the axiomatic development of mathematics, we will define the subset
relation thus:

A⊆B is an abbreviation for (∀x)(x ∈ A ⇒ x ∈ B)

and then of course we want to prove some of the elementary properties of this relation. Let
us prove
A ⊆ B , B ⊆ C ⊢ A ⊆ C .
Using the definition above, this is the same as

(∀x)(x ∈ A ⇒ x ∈ B) , (∀x)(x ∈ B ⇒ x ∈ C) ⊢ (∀x)(x ∈ A ⇒ x ∈ C) .

Let us first write out a proof in plain language:


For any x, we have

x ∈ A ⇒ x ∈ B by the first hypothesis
and x ∈ B ⇒ x ∈ C by the second hypothesis
and so x ∈ A ⇒ x ∈ C by SL Proposition F.3(a)

Since this is true for any x

(∀x) (x ∈ A ⇒ x ∈ C)

Now let us write this out as a formal proof.



1 (∀x)(x ∈ A ⇒ x ∈ B) hyp
2 (∀x)(x ∈ B ⇒ x ∈ C) hyp
3 x∈A ⇒ x∈B PL4 on 1
4 x∈B ⇒ x∈C PL4 on 2
5 x∈A ⇒ x∈C SL Proposition F.3(a) on 3 and 4
6 (∀x)(x ∈ A ⇒ x ∈ C) UG on 5

Notice that Steps 3, 4 and 5 in this proof contain a free variable (x). It is a little clearer now
what is meant by such steps: the free variable stands for an arbitrary x. You might say that
these lines are supposed to be true “for any x”, and in fact we do say just that explicitly in
the plain language version of the proof.
We do not write “Now, for any x” between Steps 2 and 3 of the formal proof for two reasons.
Firstly, because any occurrence of x as a free variable in a proof of a theorem represents
“any x” (and the same of course for any other variable symbol which occurs free). And
secondly, because there is no expression in the formal language to say, “From now on every
free occurrence of x stands for any x”.
Notice also what these tricks allow us to do. Between Steps 2 and 3 we strip the quantifiers
off the hypotheses. Then, in Steps 3, 4 and 5 we do some manipulations with x which
would be difficult, perhaps impossible, to do if the expressions were still surrounded by the
quantifiers. Finally, having done these manipulations, we put the quantifier back in Step 6.
(This makes sense because Step 5 still holds for any x.) It is UG which allows us to perform
this final step — and that is the whole point of UG.
In a fully formal proof, this is the only way a new free variable symbol may appear: the
only justifications for the appearance of a new free variable symbol in a fully formal proof
are (1) that it is in an instance of one of the axioms or (2) that it is in the result of an
application of PL4 (the term t in P [x/t] is, or contains, a new variable symbol).
However, mostly we will be using semi-formal proofs and for them we will introduce tech-
niques which will allow new free variable symbols to appear out of thin air in two more
ways: (3) in a hypothesis to a deduction and (4) as a choice variable in an application of
the Choice Rule. We will deal with these later in this chapter.

A.26 Comments on Universal generalisation


(i) On first sight, UG appears to be (invalid) argument from the particular to the general,
but it isn’t really. This has just been discussed above.

(ii) Important warning This form of Universal Generalisation (UG) can only be used
in the formal proof of a theorem. When we come to look at deductions in PL, we will see
that there is an extra condition necessary for UG to be valid in the proof of a deduction.

A.27 A remark
Mathematics is a first-order theory and so is built on the basis of predicate logic. All of the
considerations we have discussed so far in this chapter turn up everywhere in mathematics.

For example, consider the definition of “f is a continuous function” in an elementary calculus


course. Stated formally, it is

(∀a)(∀ε > 0)(∃δ > 0)(∀x)( |x − a| < δ ⇒ |f(x) − f(a)| < ε )

so it is clear that some heavy predicate logic is involved here. In fact, I have simplified the
definition somewhat: it really should specify that a, ε, δ and x are all real numbers, and
I have not defined yet how notations such as (∀ε > 0) are defined. Adding these necessary
refinements would make the definition even longer and harder to read.
Nevertheless, Calculus 101 courses around the world all expect pass-level students to be able
to work with such definitions, without any preliminary training in predicate logic and usually
without any mention being made of the numerous subtleties and pitfalls I have discussed in
this chapter.
Go figure!

B Deduction and subdeduction

I mentioned above that UG cannot be used with complete freedom when proving a deduction.
Let us look at an example to see why a further condition is needed. Suppose we are working
in a formal theory to deal with the natural numbers, and we have got as far as defining
prime numbers. We might now want to prove

x is prime ⊢ x = 2 ∨ x is odd (–1)

Such a proof would look like this in outline

x is prime hyp
⋮ (proof steps) (Good proof)
x = 2 ∨ x is odd

Such a proof certainly exists; I have just left out the tiresome details. And this is all perfectly
correct so far.
HOWEVER, if I now use UG to add a further line to the bottom

(∀x)( x = 2 ∨ x is odd ) UG (Bad step)

I have “proved” something which is obviously absurd — that every natural number is either
2 or odd.
It is fairly clear what the problem is here. In the good part of the proof, every step follows
from the hypothesis. So any reference to the variable x (as a free variable) in the proof is
assuming that the hypothesis is true — that x is a prime number. In other words, in this
proof, x is not just any natural number, it is any prime number. And that means that UG
cannot be used in this way.
So the way out of this dilemma is to say that, in this kind of proof of a deduction from
hypotheses, UG must not be used to generalise any variable which occurs free in any hy-
pothesis. Remember that a deduction from hypotheses is not part of our fully formal theory.
It is a convenient addition in our semi-formal theory. So we are allowed to define a deduc-
tion in any way we please — but then it is incumbent on us to prove that it does what it is
supposed to do, that is, that the Deduction Theorem works.
Now there is another kind of argument which works from hypotheses and which looks very
similar. This occurs when we treat our hypotheses just like theorems. (We might in fact
already know that they are theorems, or we might want to ask what would happen if they
were theorems.) For the purposes of the present discussion, let us call such an argument an
entailment.
So the difference between an entailment and a deduction is the difference between two
possible ways of looking at the idea of “C follows from hypotheses H”:

Entailment: We suppose that the expressions of H have been


added to the set of theorems (for example, by adding them as new
axioms, or perhaps they were there already) then C can be proved
as a theorem.
Deduction: Considering the expressions in H as statements de-
scribing the sorts of things their free variables can refer to, then
C can be proved for those kinds of variables.

And what this boils down to, in practical terms, is . . .

In the proof of an entailment you can use UG freely, as described


in A.21.
In the proof of a deduction, you may not use UG if the variable
being generalised occurs free in any of the hypotheses*.

(* We will make this condition a bit more subtle shortly.)


Now, looking at our example proof above and the discussion surrounding it, it is obvious
that it is the second way of looking at such a proof that we are mostly interested in. In fact,
the “proof of theorem” interpretation would be truly weird in this context. What would
happen if we were to add “x is a prime” to our list of theorems (as an extra axiom, say)?
We know that, in this situation, the statement is supposed true for any x, so we are in effect
adding a new axiom which says that every natural number is a prime. This is plainly false,
and we can prove it, so we end up with an inconsistent theory. This is not at all what we
had in mind.
Here are a couple more ways of looking at the difference between these two ideas.

Entailment: for any free variable in any hypothesis, it is as-


sumed that that hypothesis is true for all values of that variable.
We treat the hypotheses just as though they were theorems or
axioms.
Deduction: for any free variable in any hypotheses, it is as-
sumed that the hypothesis restricts those values of the variable to
the ones for which the hypothesis is true.

And

Entailment: We use entailment when discussing comparisons


between formal theories — what happens when we extend a theory
by adding extra axioms and so on.
Deduction: We use deduction when making ordinary logical
or mathematical arguments of the kind that use the Deduction
Theorem.

We will put some mild restrictions on a proof H ⊢ C so that the Deduction Theorem will
work, that is, so that it follows from such a proof that H ⇒ C. I will use the symbol ⊢
for such a restricted proof. We can also of course still deal with more general entailments
H1, H2, . . . , Hh ⊢ C if we want to.

One way of dealing with the problem is to insist that in a proof of a deduction H ⊢ C,
we do not allow any UG step in which the variable being generalised occurs free in H. More
generally, in a proof of a deduction H1, H2, . . . , Hh ⊢ C, we do not allow any UG step in
which the variable being generalised occurs free in any of the hypotheses H1, H2, . . . , Hh.
This restriction will indeed do the trick — with it the Deduction Theorem follows — and
we will prove it.

However, this is overkill: the Deduction Theorem still works with a milder condition. In
order for a UG step from S to (∀x)S to be valid, all that is necessary is that x should not
occur free in those hypotheses which are actually involved in the proof of the step S (and
that might well be not all of them).
So how do we decide which of the set of hypotheses are involved in the proof of some given
step? The easiest way to do this is to change the wording slightly as follows: Let us suppose

we have a deduction H ⊢ C with a purported proof S1, S2, . . . , Sn. Then a subset K of
H is sufficient for step Si if there is a subsequence of the given proof S1, S2, . . . , Sn giving
a valid proof of K ⊢ Si. Then we can say that the UG step from S to (∀x)S is allowed
provided there is some subset of the hypotheses which is sufficient for S, none of which
contain x free.

This wording happily covers a slightly confusing possibility. In writing down a proof, we
usually annotate each step with how it comes about (“UG from Step 3” or “MP from Steps 4
and 5” and so on). But the fact is that these annotations are not part of the proof but just
a kindness to the reader. It is perfectly possible (though uncommon) for a step to have two
different justifications (“UG from Step 3” and “MP from Steps 4 and 5”), but we only bother
to point out one. And this in turn means that a step might have more than one sufficient
set of hypotheses. If this does occur, then it is only necessary for one of those sets to have
no free occurrences of x. As just stated, this strange possibility is covered by the wording
above.

B.1 Definition: A deduction and its proof


Let H be a set of expressions and C another expression. We say that there is a deduction

from H to C, denoted H ⊢ C, if there is a proof of this deduction, as defined next.

We say that a sequence S1, S2, . . . , Sn of expressions is a proof of the deduction H ⊢ C if
the sequence satisfies the condition that, for each i, at least one of the following hold:
(i) Si is an axiom;
(ii) Si is one of the hypotheses (members of H);
(iii) Si follows from two earlier steps by Modus Ponens, that is, there are j and k, both
< i, such that Sk = Sj ⇒ Si ;
(iv) There is j < i and a variable symbol x such that Si = (∀x)Sj and, moreover,
(ug∗) there is a subset of H which is sufficient for Sj , no member of which
contains x free.
Here, to say that the subset K is sufficient for Sj means that there is a
subsequence of S 1 , S 2 , . . . , S n which constitutes a proof of the deduction

K Sj .
Because of Part (iv) here, this is a definition by induction over the length of the proof; in
other words, one checks the validity of the proof by working from left to right.
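For readers who like to see such definitions mechanised, here is a minimal sketch in Python of a checker for proofs of deductions. The tuple encoding of expressions and all the function names are mine, purely illustrative, and the sketch implements the simpler condition (ug∗∗) of the comments below rather than the full (ug∗).

    # Expressions as nested tuples (an illustrative encoding, not the book's):
    #   ('atom', 'p', 'x', 'y') for p(x, y); ('not', P); ('imp', P, Q); ('all', 'x', P)

    def free_vars(e):
        tag = e[0]
        if tag == 'atom':
            return set(e[2:])            # for the sketch, all arguments are variables
        if tag == 'not':
            return free_vars(e[1])
        if tag == 'imp':
            return free_vars(e[1]) | free_vars(e[2])
        if tag == 'all':
            return free_vars(e[2]) - {e[1]}

    def check_deduction(hyps, steps, is_axiom):
        # steps: list of (expression, justification); a justification is
        # ('axiom',), ('hyp',), ('mp', j, k) or ('ug', j, 'x') with j, k < i.
        hyp_free = set().union(*map(free_vars, hyps)) if hyps else set()
        for i, (e, just) in enumerate(steps):
            if just[0] == 'axiom':
                assert is_axiom(e)                                  # condition (i)
            elif just[0] == 'hyp':
                assert e in hyps                                    # condition (ii)
            elif just[0] == 'mp':                                   # condition (iii)
                j, k = just[1:]
                assert j < i and k < i and steps[k][0] == ('imp', steps[j][0], e)
            else:                                                   # condition (iv), with (ug**)
                j, x = just[1:]
                assert j < i and e == ('all', x, steps[j][0])
                assert x not in hyp_free, 'UG on a variable free in a hypothesis'
        return steps[-1][0]

Checking the full (ug∗) would instead search, at each UG step, for a sufficient subset of H containing no free occurrence of x: a more expensive but equally mechanical test.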

B.2 Comments on this


(i) The only difference between the definitions of entailment and deduction is the extra
condition (ug∗) above.
(ii) The condition
(ug∗∗) Wherever UG is used in the proof, the variable x being generalised
must not occur free in any of the hypotheses
is more restrictive than (ug∗); that is, if (ug∗∗) holds, then so does (ug∗) (but not vice
versa). This condition (ug∗∗) is obviously much easier to check and in fact holds for most
valid proofs, so this is the one we will usually use.

(iii) In the condition (ug∗) it is possible that the empty set of hypotheses might be
sufficient for Sj . If this happens, it means that there is a subsequence of the given proof
which constitutes a proof of Sj on the basis of no hypotheses at all — that is, that Sj is a
theorem.
(iv) If the hypotheses contain no free variables at all, then the proviso (ug∗) automatically
holds and so can be ignored.
(v) As a special case of this, where there are no hypotheses (that is, H is empty: this is
the proof of a theorem), the proviso (ug∗) is automatically true and can be ignored.
(vi) Where the proof does not use UG at all, the proviso does not apply at all.

(vii) From this it follows that all proofs of deductions given in the previous chapter on Sentential Logic are equally valid as proofs of deductions in PL or any first-order theory.
(viii) The extra condition (ug∗) is the one that will make the Deduction Theorem work,
as discussed informally above and proved below.
(ix) Here is what this means for our small example proof above.

The short proof, labelled “Good proof” above, is a valid proof of the deduction

x is prime ⊩ x = 2 ∨ x is odd . (–2)

The extra step, labelled “Bad step” is invalid. It is not allowed in a proof of such a deduction
because it fails the (ug∗) condition. As already remarked, this is a good thing because the
conclusion is obviously false.
The Deduction Theorem, which we prove next, will allow us to conclude from (–2) that
x is prime ⇒ x = 2 ∨ x is odd
and then, since this is a theorem (and not a step in the proof of a deduction) we may now
if we wish apply UG to conclude
(∀x)( x is prime ⇒ x = 2 ∨ x is odd )
All this is just what we want from deductions and the Deduction Theorem.
Earlier I said that all the theorems and proofs in Sentential Logic in Chapter 2 held just as
well for Predicate Logic in this chapter — that you could just cut and paste any theorem
from that chapter into this one, but that we will need to prove the Deduction Theorem
again. The reason is that, in the earlier proof, we looked at the various kinds of steps which
could occur in a deduction. Now that we are dealing with Predicate Logic, we have one
more kind of step, namely a UG step, and we need to deal with this as well. But the proof
for first-order theories is almost exactly the same as the proof for Sentential Logic, with just
a small addition to cope with UG steps.

In order to prove the Deduction Theorem we must first observe that Proposition 2.B.2 and Deduction 2.B.4 of Chapter 2 still hold good for Predicate Logic: simply look at the proofs given there and observe that they obey the rules for a proof in Predicate Logic.

B.3 The Deduction Theorem for PL


Simple form

If H ⊩ C , then ⊢ H ⇒ C .

More general form

If H1, H2, . . . , Hh ⊩ C , then ⊢ H1 ∧ H2 ∧ · · · ∧ Hh ⇒ C .

Proof. The simple form is just a special case of the more general form (the case h = 1), so we need only prove the latter. We do this by proving that if

H1, H2, . . . , Hh ⊩ C ,

then

H1, H2, . . . , Hh−1 ⊩ Hh ⇒ C ,

from which the general form follows by induction.


The proof is very similar to the corresponding one for SL. Recall, we assumed there is a valid “old” proof of H1, H2, . . . , Hh ⊩ C,

S1, S2, . . . , Sn (with Sn = C), (–1)

and showed how to construct a “new” proof

Hh ⇒ S1, Hh ⇒ S2, . . . , Hh ⇒ Sn (–2)

with a few extra steps thrown in, which is a valid proof of (–2). We did this by working step by step along the proof, which is the same as working by induction over n.

So, for some particular step Sk (with 1 ≤ k ≤ n) we assumed all previous steps Hh ⇒ Si in the new proof were valid and proved that Hh ⇒ Sk is valid also. We did this by looking at the various ways that Sk can be justified (as an axiom, as one of the hypotheses and as following by Modus Ponens from earlier steps), giving a short proof of validity in each case. All those short proofs still hold, word for word, in this new environment, so it remains only to check the fourth, new, case, that Sk follows from an earlier step by UG:

there is some p < k such that Sk is (∀x)Sp

and, since this is part of the old valid proof, there is a subset, K say, of H1, H2, . . . , Hh which is sufficient for Sp, no member of which contains x free. And that means that there is some subsequence U1, U2, . . . , Um of S1, S2, . . . , Sp which constitutes a valid proof of K ⊩ Sp.
Now we look at two subcases:

(i) Hh does not contain x free.

We have K ⊩ Sp
so K ⊩ Hh ⇒ Sp (Axiom PL1 and MP)
so K ⊩ (∀x)(Hh ⇒ Sp) (UG; OK since no member of K contains x free)
so K ⊩ (∀x)(Hh ⇒ Sp) ⇒ (Hh ⇒ (∀x)Sp) (Axiom PL5; n.b. its proviso holds, x not being free in Hh)
so K ⊩ Hh ⇒ (∀x)Sp (MP)

(ii) Hh contains x free.

Then Hh is not a member of K.
Now add that entire subsequence U1, U2, . . . , Um (above) to the new proof, just before the step (∀x)Sp. This is a valid proof from K and so is still valid as part of the new proof. The last member of the subsequence we have just added is Sp and no member of K contains x free, so we may apply Case (iv) of the definition and conclude that (∀x)Sp is a valid step in the new proof. But then we have Hh ⇒ (∀x)Sp by Axiom PL1 and MP, as required. □
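To see the construction in miniature, here is a toy example of mine (not from the text; it assumes the usual forms PL1: P ⇒ (Q ⇒ P) and PL2: (P ⇒ (Q ⇒ R)) ⇒ ((P ⇒ Q) ⇒ (P ⇒ R))). The deduction A ⇒ B, A ⊩ B has the old proof S1 = A ⇒ B (hyp), S2 = A (hyp), S3 = B (MP: 2,1). Taking Hh = A, prefixing each step with A ⇒ and inserting a few extra steps gives a proof of A ⇒ B ⊩ A ⇒ B:

1 A ⇒ B hyp
2 (A ⇒ B) ⇒ (A ⇒ (A ⇒ B)) PL1
3 A ⇒ (A ⇒ B) MP: 1,2 (this is A ⇒ S1)
4 A ⇒ A Proposition 2.B.2 (this is A ⇒ S2)
5 (A ⇒ (A ⇒ B)) ⇒ ((A ⇒ A) ⇒ (A ⇒ B)) PL2
6 (A ⇒ A) ⇒ (A ⇒ B) MP: 3,5
7 A ⇒ B MP: 4,6 (this is A ⇒ S3)

No UG occurs here, so the new fourth case does not arise; it is only that case which distinguishes the PL proof of the theorem from the SL one.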

B.4 Subdeduction

Subdeductions work for PL in the same way as for SL. The only difference is that the (ug∗)
restriction on the affected variables in a Universal Generalisation step must be observed. As
remarked above, the stronger but simpler condition (ug∗∗) is nearly always sufficient, which
is nice because it is much more easily seen to be obeyed in a proof.
To conform with the stronger condition (ug∗∗): wherever a Universal Generalisation step is
used, the affected variables must not occur free in any of the hypotheses of any box (subproof
or main proof) which encloses that step.

For example (drawing the boxes schematically by indentation):

Main proof: hypotheses H1, H2, . . . , Hh
    ⋮
    Subproof: subsidiary hypotheses J1, J2, . . . , Jj
        ⋮
        UG step here
        ⋮
        Subproof: subsidiary hypotheses K1, K2, . . . , Kk (ignore)
            ⋮
        ⋮
        UG step here
        ⋮
    Subproof: subsidiary hypotheses L1, L2, . . . , Ll (ignore)
        ⋮

The two UG steps shown must take into account free variables in H1, H2, . . . , Hh and J1, J2, . . . , Jj but may ignore those in K1, K2, . . . , Kk and L1, L2, . . . , Ll.

Where the more general condition (ug∗) is used, it is almost always hidden in the substitution of the statement of a deduction or theorem, as justified in Theorem C.1 below. In the rare cases where (ug∗) is used explicitly, one should be careful to specify the sufficient set of hypotheses in the comments.

B.5 Two alternative axioms


Another common presentation of PL uses a slightly different set of axioms: PL5 is replaced
by the following two. They are generally useful, so let us prove them now as theorems. This
also makes a nice example of the use of subdeductions, as described above.

(PL6) (∀x)(A ⇒ B) ⇒ ((∀x)A ⇒ (∀x)B)


(PL7) S ⇒ (∀x)S provided x does not occur free in S.

Proof of PL6.

1 (∀x)(A ⇒ B) hyp
2 A⇒B PL4
3 (∀x)A subhyp
4 A PL4
5 B MP:4,2
6 (∀x)B UG
7 (∀x)A ⇒ (∀x)B Ded:3–6

Proof of PL7

1 S hyp
2 (∀x)S UG


It is an interesting exercise to prove that, using this alternative axiomatisation, PL5 can be
proved (that is what is needed to show that the two sets of axioms are equivalent). The
slightly tricky bit is that, if you use the Deduction Theorem in this proof (hard not to!),
then you have to first prove that it works for the new axiomatisation. Note that the proof
above of the Deduction Theorem does use PL5, so this is necessary.

C Using theorems and deductions in proofs


In Sentential Logic, as part of our semiformal theory, we are always allowed to drop a theorem as a step into any proof (and we did this often). As a recipe for formalising this action, simply insert before this step its full formal proof.

In Sentential Logic we also allowed deductions to be used in any proof. If lines H1, H2, . . . , Hk already appear in a proof and the deduction H1, H2, . . . , Hk ⊩ P has already been proved, then we may now insert the line P.

That was fine for Sentential Logic, where UG is not involved with its special condition on
free variables in the hypotheses. (And where there was no difference between deduction
and entailment). But now, with Predicate Logic it seems that a problem can arise with
deductions. Consider this: suppose that we have already proved a deduction of the form

H1, H2, . . . , Hh ⊩ P (–1)

and we are now trying to prove a new one

K1, K2, . . . , Kk ⊩ C (–2)

Suppose further that in the course of the proof of (–2) we have already proved H1, H2, . . . , Hh, so now we feel justified in writing down P as a new step, because of (–1).
But are we?
Our semiformal proof, up to the point where we assert P, looks like this

S1, S2, . . . , Sn, P (–3)

(Here S1, S2, . . . , Sn are the steps so far.) Now, using the justification we used in the previous chapter, all we have to do is insert the proof of (–1) before P to get a valid formal proof:

S1, S2, . . . , Sn, T1, T2, . . . , Tm, P (–4)
But not so fast! We are dealing with deductions now. It is possible that one of the steps
we have just inserted is the result of applying UG, and that pesky condition (ug∗) must
be checked. Now, the steps we have just inserted constitute a valid proof of (–1), so that
variable does not turn up free in any of the hypotheses H 1 , H 2 , . . . , H h of (–1). However,
now we are constructing a proof of (–2), and there is nothing at all to say that the variable
does not turn up free in one of its hypotheses K 1 , K 2 , . . . , K k . (This really can happen —
it’s not too hard to think up an example.) So this proof that it is always OK to insert P in
the semi-formal proof as in (–3) appears to be incorrect. That’s the bad news.

The good news is that it would only be incorrect if the simpler condition (ug∗∗) were all we had to work with; with the more subtle (ug∗), which is in fact the definition, the insertion is correct. We prove this now.

C.1 Theorem: Substituting one deduction into another


Let H1, H2, . . . , Hh, P be expressions in a first-order language and

H1, H2, . . . , Hh ⊩ P (–1)

(there is a proof of the deduction from H1, H2, . . . , Hh to P). Then

in any proof of any deduction in which H1, H2, . . . , Hh appear, P may be asserted. (–2)

What this means is that, in constructing a proof, we can always add a new step (P ), provided
that step has already been proved to follow as a deduction from earlier steps in this proof.

Proof. From (–1) and the Deduction Theorem, there is a theorem

⊢ H1 ∧ H2 ∧ . . . ∧ Hh ⇒ P ;

let U1, U2, . . . , Um be a proof of this (so that Um = H1 ∧ H2 ∧ . . . ∧ Hh ⇒ P).

Let S1, S2, . . . , Sn be the proof under construction from hypotheses K1, K2, . . . , Kk, which contains all the steps H1, H2, . . . , Hh in some order.

Now place the two proofs end-to-end followed by P thus:

S1, S2, . . . , Sn, U1, U2, . . . , Um, P .

This is a valid proof from hypotheses K because S1, S2, . . . , Sn is and every step in U1, U2, . . . , Um follows from an earlier one by an approved rule — even UG steps. (Note that it is possible for one of the steps in U1, U2, . . . , Um to follow from an earlier one by a UG generalising a variable which occurs free in K, but this is not a problem, because U1, U2, . . . , Um is the proof of a theorem, so the empty set of hypotheses is sufficient for every step in it.) The final step P now follows by SL since the steps H1, H2, . . . , Hh occur amongst S1, S2, . . . , Sn and Um is H1 ∧ H2 ∧ . . . ∧ Hh ⇒ P. □

Since a theorem is the same as a deduction from the empty set of hypotheses, it follows
from this that, in constructing a proof, we are always allowed to insert the statement of a
theorem (that is proved elsewhere) with no further ado.
This is good, because this is something we want to do often.

D Some theorems and metatheorems


D.1 Theorem
I have already mentioned . . .
Any theorem or deduction schema of SL is also a theorem or deduction schema of PL. (In
PL one substitutes expressions of PL for the expression-letters in the schema.) Such proofs
only use Axioms PL1, PL2 and PL3 and the rule MP.

Proof. Inspection of the definitions of proofs in SL and first-order languages shows that a proof in SL is a proof in any first-order language, without need for any change. □

In proofs and arguments henceforth, it will be assumed that you are familiar with Sentential
Logic. Steps in proofs which depend on sentential logic arguments alone are simply notated
“by SL” or something similar.
Back in the early discussion of binding (A.11), I stated that one of the ways of recognising a bound (= dummy) variable is that it can be replaced by another variable without changing the meaning of the expression (the substitution as usual should be acceptable). We are now in a position to prove this.

D.2 Proposition: Substitution of dummies


(∀x)P ⇒ (∀y)P [x/y]
provided the substitution [x/y] is acceptable in P .
(In the alternative “functional” notation this is (∀x)P (x) ⇒ (∀y)P (y) .)


Proof. Using the Deduction Theorem, we prove (∀x)P ⊩ (∀y)P [x/y]

1 (∀x)P hyp
2 P [x/y] PL4 and MP
3 (∀y)P [x/y] UG


Note that x occurs in the hypothesis, but it does not occur free there; so y does not occur
free in the hypothesis, even if it is the same as x. Therefore the UG step is OK.

Exercise Think up a couple of simple examples to show that ignoring the provisos in this
theorem can lead to silly results.
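(For instance, one possible such example: take P to be (∃y)(x < y), read in arithmetic. The substitution [x/y] is not acceptable in P, since the substituted y would be captured by the quantifier (∃y). Ignoring this, the proposition would yield (∀x)(∃y)(x < y) ⇒ (∀y)(∃y)(y < y): the antecedent is true of the natural numbers, while the consequent, which is equivalent to (∃y)(y < y) by E.1 below, is false.)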

D.3 The other connectives


We introduce the connectives ∧, ∨ and ⇔ in exactly the same way as in SL (see 2.F.7).
We introduce the existential quantifier ∃ thus: for any variable symbol x and expression P ,

(∃xP ) stands for (¬(∀x¬P ))



That is the fully formal version, a bit awkward to read as usual. In semi-formal notation
we usually write
(∃x)P stands for ¬(∀x)¬P

From SL we immediately have the following:

(∃x)P ⇔ ¬(∀x)¬P (–1)


(∃x)¬P ⇔ ¬(∀x)P (–2)
¬(∃x)P ⇔ (∀x)¬P (–3)
¬(∃x)¬P ⇔ (∀x)P (–4)

Note that these all make sense when expressed in plain English:
(1) To say that there exists some x such that P is true is the same as saying that P is
not false for all x.

(2) To say that there exists some x such that P is false is the same as saying that P is
not true for all x.
(3) To say that there does not exist any x such that P is true is the same as saying that
P is false for all x.

(4) To say that there does not exist any x such that P is false is the same as saying that
P is true for all x.
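Read over a finite domain, these four equivalences are just the duality between Python’s any and all. A quick sanity check, with an arbitrary property of mine standing in for P (an illustration about one particular interpretation, not a proof in PL; truth in a model is the business of the Models chapter):

    domain = range(8)
    P = lambda x: x % 3 == 0

    assert any(P(x) for x in domain) == (not all(not P(x) for x in domain))        # (-1)
    assert any(not P(x) for x in domain) == (not all(P(x) for x in domain))        # (-2)
    assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)        # (-3)
    assert (not any(not P(x) for x in domain)) == all(P(x) for x in domain)        # (-4)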
Now we prove another obvious property of this quantifier. Suppose we know that the
expression P is true when some particular value (t say) is substituted for the variable x,
then we can say that there exists an x such that P is true. Stating this properly:

D.4 Proposition: Existential Generalisation


Suppose that the substitution [x/t] is acceptable in P (where t is a term). Then

P [x/t] ⇒ (∃x)P .

Proof. Using the definition of ∃, this is P [x/t] ⇒ ¬(∀x)¬P . The proof is easy:

1 (∀x)¬P ⇒ ¬P [x/t] PL4


2 P [x/t] ⇒ ¬(∀x)¬P SL


D.5 Proposition
(∀x)(P ⇔ Q) ⇒ ((∀x)P ⇔ (∀x)Q)
We are proving this here because it is needed for the next theorem.


Proof. As usual we prove (∀x)(P ⇔ Q) ⊩ ((∀x)P ⇔ (∀x)Q) .

1 (∀x)(P ⇔ Q) hyp
2 P ⇔Q PL4 on 1
3 P ⇒Q Definition of ⇔ on 2
4 (∀x)(P ⇒ Q) UG
5 (∀x)P ⇒ (∀x)Q PL6
6 (∀x)Q ⇒ (∀x)P Similarly
7 (∀x)P ⇔ (∀x)Q Definition of ⇔ on 5,6


Substitution of equivalents in the context of Sentential Logic was proved in Theorem 2.F.18. The theorem still works for Predicate Logic, assuming that care is taken to avoid acceptability problems (we should expect that by now). It is also a very useful theorem — we tend to use it all the time without thinking much about it. The proof is almost the same as the one for the SL version, but I will give the updated one because of its importance.

D.6 Theorem: Substitution of Equivalents


Suppose that A, A′ and X are expressions such that A occurs as a subexpression of X, and we write X′ for the result of substituting A′ for an occurrence of A in X.
It seems reasonable that, if A ⇔ A′ then X ⇔ X′. There are two ways of looking at this.

(i) If ⊢ A ⇔ A′ then ⊢ X ⇔ X′.

(ii) Suppose further that this substitution is “acceptable” in the sense that, if A lies within the scope of a quantifier (∀x) in X, then neither A nor A′ contains a free occurrence of x. Then

(A ⇔ A′) ⇒ (X ⇔ X′) (–1)

Proof (i). The proof is by induction over the construction of X and is exactly the same as the proof of the SL version, referred to above, except that one extra case is needed to cover a quantifier, that is, where X is (∀x)P and A is a subexpression of P. Then X′ is (∀x)P′, where P′ is the result of replacing A by A′ in P.

We have P ⇔ P′ by the inductive hypothesis.

Then (∀x)(P ⇔ P′) by UG (this is a theorem, so (ug∗) can be ignored).

Then (∀x)P ⇔ (∀x)P′ by PL6 (twice), and this is X ⇔ X′.

(ii) Using the Deduction Theorem, we prove (A ⇔ A′) ⊩ (X ⇔ X′). Again, the proof is the same as that for the SL version except for the UG case. So again assume that X is (∀x)P and A is a subexpression of P, so that X′ is (∀x)P′, where P′ is the result of replacing A by A′ in P.

By the inductive hypothesis, we have (A ⇔ A′) ⊩ (P ⇔ P′), which means that there is a proof of a deduction from A ⇔ A′ to P ⇔ P′. But now we can add the step (∀x)(P ⇔ P′) to the end of this proof, justified by (ug∗∗), since x does not occur free in A ⇔ A′. Finally, (∀x)P ⇔ (∀x)P′ follows by PL6 (twice). □

E Quantifying non-free variable symbols


The propositions in this section are special. They deal with the cases in which an expression
occurs inside a quantifier, but does not mention the quantified variable. A simple example
is an expression such as
(∀x)( 2 + 2 = 4 ) . (–1A)

Such expressions turn up from time to time (often while simplifying more complicated
expressions), so we need to be able to deal with them. Luckily, that is usually quite simple.
Basically, a statement such as the one above is equivalent to the same one without the
quantifier, that is
2 + 2 = 4. (–1B)

More generally, these remarks apply to expressions in which the variable in question does
not occur free. For example, in

(∀x)(∀y)( x + y = y + x ) . (–2A)

neither x nor y is free. Consequently this expression is equivalent to

(∀x)(∀x)(∀y)( x + y = y + x ) . (–2B)

and to
(∀y)(∀x)(∀y)( x + y = y + x ) . (–2C)

This is what Proposition E.1 below tells us.


More generally still, similar simplifications occur when a quantifier is applied to an expression
part of which does not mention the quantified variable: usually that part of the expression
can simply be “taken outside” of the scope of the quantifier. The other propositions in this
section are of this nature.
Warning In the following propositions, the expression called S does not contain the vari-
able x free; the other expression P might contain x free. These propositions do not hold
if S contains x free.

E.1 Proposition
Suppose that the variable x does not occur free in S. Then
(i) S ⇔ (∀x)S
(ii) S ⇔ (∃x)S .

Proof. (i) S ⇒ (∀x)S by PL7. Conversely (∀x)S ⇒ S by PL4. The result follows.
(ii) Applying (i) to ¬S we have ¬S ⇔ (∀x)¬S. But (∀x)¬S ⇔ ¬(∃x)S and so we have ¬S ⇔ ¬(∃x)S. The result now follows by SL. □

E.2 Proposition

Suppose that the variable x does not occur free in S (but may occur free in P ). Then
(i) (∀x)(S ∧ P ) ⇔ S ∧ (∀x)P
(ii) (∀x)(S ∨ P ) ⇔ S ∨ (∀x)P
(iii) (∃x)(S ∧ P ) ⇔ S ∧ (∃x)P

(iv) (∃x)(S ∨ P ) ⇔ S ∨ (∃x)P


(This may look clearer in the alternative notation: suppose that the variable x does not
occur free in S. Then

(i) (∀x)(S ∧ P (x)) ⇔ S ∧ (∀x)P (x)


(ii) (∀x)(S ∨ P (x)) ⇔ S ∨ (∀x)P (x)
(iii) (∃x)(S ∧ P (x)) ⇔ S ∧ (∃x)P (x)
(iv) (∃x)(S ∨ P (x)) ⇔ S ∨ (∃x)P (x))


Proof. (i) First we prove that (∀x)(S ∧ P) ⊩ S ∧ (∀x)P.

1 (∀x)(S ∧ P ) hyp
2 S∧P PL4 on 1
3 S SL on 2
4 P SL on 2
5 (∀x)P UG on 4
6 S ∧ (∀x)P SL on 3 and 5


Next we prove that S ∧ (∀x)P ⊩ (∀x)(S ∧ P).

1 S ∧ (∀x)P hyp
2 S SL on 1
3 (∀x)P SL on 1
4 P PL4 on 3
5 S∧P SL on 2 and 4
6 (∀x)(S ∧ P ) UG on 5

The result now follows.


(ii) First we prove that (∀x)(S ∨ P) ⊩ S ∨ (∀x)P, in the form (∀x)(¬S ⇒ P) ⊩ ¬S ⇒ (∀x)P

1 (∀x)(¬S ⇒ P ) hyp
2 ¬S ⇒ P PL4 on 1
3 ¬S subhyp
4 P MP: 3 and 2
5 (∀x)P UG on 4
6 ¬S ⇒ (∀x)P Ded: 3–5


Now we prove S ∨ (∀x)P ⊩ (∀x)(S ∨ P).

1 S ∨ (∀x)P hyp
2 S subhyp
3 S∨P SL on 2
4 (∀x)(S ∨ P ) UG on 3
5 S ⇒ (∀x)(S ∨ P ) Ded: 2–4
6 (∀x)P subhyp
7 P PL4 on 6
8 S∨P SL on 7
9 (∀x)(S ∨ P ) UG on 8
10 (∀x)P ⇒ (∀x)(S ∨ P ) Ded: 6–9
11 (∀x)(S ∨ P ) Proof by Cases: 1,5 and 10

The result now follows.


(iii) and (iv) follow from (i) and (ii) by De Morgan’s laws. □

E.3 Proposition
Suppose that the variable x does not occur free in S (but may occur free in P ). Then
(i) (∀x)(S ⇒ P ) ⇔ (S ⇒ (∀x)P )

(ii) (∀x)(P ⇒ S) ⇔ ((∃x)P ⇒ S)

Proof. (i) Using the theorem of SL that (A ⇒ B) ⇔ (¬A ∨ B), this is the same as

(∀x)(¬S ∨ P ) ⇔ ¬S ∨ (∀x)P

and this is a special case of Proposition E.2(ii) above.



Abbreviated proof of (ii)

(∀x)(P ⇒ S) ⇔ (∀x)(¬S ⇒ ¬P ) by SL
⇔ ¬S ⇒ (∀x)¬P just proved
⇔ ¬S ⇒ ¬(∃x)P see Section D.3(–3)
⇔ (∃x)P ⇒ S by SL

Note that the forward direction of (ii),

(∀x)(P ⇒ S) ⇒ ((∃x)P ⇒ S)

can be rewritten
(∀x)(P ⇒ S) , (∃x)P ⊩ S
which is a sort of quantified version of the Constructive Dilemma.

F Deductions involving choice


F.1 Example
Consider the (ordinary mathematical) proof of the result in group theory: Let A and B be
subgroups of a group G such that A ∪ B is a subgroup also. Then either A ⊆ B or B ⊆ A.

Proof Suppose not, that is, that neither A nor B is a subset of the other.
Then there are group members a ∈ A and b ∈ B such that a ∉ B and b ∉ A.
Since a and b are members of the subgroup A ∪ B, we have ab ∈ A ∪ B and so
either ab ∈ A or ab ∈ B; suppose without loss of generality that ab ∈ A. Now
remember that a⁻¹ ∈ A also, so b = a⁻¹·ab ∈ A, a contradiction.

There are a number of interesting aspects of this proof (proof by contradiction, use of the Constructive Dilemma, apparently unavoidable use of Reductio ad Absurdum and so on). The part of interest now is the assertion of the existence of a and b. Making the assertion of the existence of a a little bit more formal, it goes

A ⊈ B
Therefore ¬(∀x)(x ∈ A ⇒ x ∈ B) from the definition of A ⊆ B
and so (∃x)(x ∈ A ∧ x ∉ B) by PL manipulation.
Now let a be an element such that a ∈ A and a ∉ B.
etc . . .

This is a common form of argument in mathematics. In general, it is as follows. Suppose


that a line (∃x)C has been established in a proof. We then go on to say something like “let
a be an object such that C[x/a]” and continue the proof, finally arriving at a conclusion
that does not mention a. There is no rule discussed so far which (directly) justifies such a
step in a proof. We now show that such a rule can be introduced, and how to do it. There
are two things to do: first, decide exactly what the rule should be and, second, prove (by a
metatheorem) that the rule is a genuine derived rule.
Observe first that our “object a” is treated like a temporary constant, not a variable —
the formal difference between how constants and variables may be treated is simply that
constants may not be quantified. If we are to introduce a new constant into our language on
the fly, this involves extending the language (and then we immediately have to extend our
sets of strings, expressions and axiom schemas). This is a bit unwieldy, so instead we adopt
the artifice of using a fresh variable symbol (that is, a variable symbol not used in this proof
so far) and adding a condition to the new rule that such a variable must be treated like a
constant, that is, it may not be quantified. Since we have a countably infinite number of
variable symbols at our disposal, there always is a fresh one, so this does not involve us in
extending the language.
Our new method will be: In the course of a deduction, if a statement (∃x)C has been
asserted, we are now allowed to assert C[x/a], where a is a fresh variable symbol (not used
before in this proof) and the whole is subject to some good behaviour conditions. We will
say that the Choice Rule has been applied to (∃x)C to yield C[x/a] and that a is the choice
variable involved.

F.2 Definition: Deduction involving choice


Our new definition of a deduction (involving choice) is as follows. We say that

H ⊩ A (with choice)

if there is a sequence S1, S2, . . . , Sn of expressions such that Sn is A and, for each i = 1, 2, . . . , n, one of the following holds:
(i) Si is an axiom of the theory,

(ii) Si is one of the hypotheses H, or


(iii) Si follows by MP from two earlier expressions in the sequence.
(iv) Si follows by UG from an earlier expression in the sequence, with the condition (ug∗)
holding as usual.

(v) Si follows by the Choice Rule from an earlier expression in the sequence, that is, it
is of the form C[x/a] where
(a) There is an earlier step of the form (∃x)C (same C and x).
(b) a is a variable symbol which does not occur in the proof before the step C[x/a] (and in particular, it must not occur, free or bound, in any hypothesis of the proof).
(c) The variable a must not be the variable being generalised in any application
of UG in this proof.
(d) The variable a must not occur free in the “target” expression. This means that, when this proof with choice, H1, H2, . . . , Hh ⊩ A, is used to prove H1 ∧ H2 ∧ . . . ∧ Hh ⇒ A, the expression A must not contain a free.

F.3 Notes
(i) The conditions effectively make a choice variable behave like a new constant.
(ii) Under this definition, a deduction (with choice) in which the Choice Rule is not in
fact used, is just an ordinary deduction. Therefore this new kind of deduction is an extension
of the old kind.
(iii) Note that the Choice Rule may be used several times in the same deduction (we used it twice in the introductory example F.1 above); the wording of the definition takes this into account.
(iv) Regarding (d): the expression A may however contain the quantified expression (∃x)C, and in fact it is likely to. Together with (b), this tells us that a cannot occur free anywhere in H1 ∧ H2 ∧ . . . ∧ Hh ⇒ A. The point of (d) is that this kind of proof subverts the variable a to act like a constant for the duration of this proof. But we do not want a to be stuck in this rôle for all time; once the proof is over, it reverts to being a plain variable again.

F.4 Theorem

H ⊩ A (with choice) (–1)

if and only if

H ⊩ A (without choice) (–2)

The theorem is important. Make sure you understand it and how to use the Choice Rule.
The proof is not so important to understand, and is included here as usual for completeness.
On the other hand, it is not too long and quite nice, so don’t let me discourage you from
working through it.

Proof. It has just been observed that a deduction without choice is a special case of a deduction with it, so if (–2) then (–1). We now prove that if (–1) then (–2), by induction over the number of applications of the Choice Rule in the assumed proof with choice

B1, B2, . . . , Bn

of (–1). If the Choice Rule is not used in the proof, then it is already a proof without choice and (–2) is the case. Now suppose that the Choice Rule is used at least once; suppose further that the first use of the rule in the proof is to (∃x)C to yield C[x/c]. Now note that

H ∪ {C[x/c]} ⊩ A (with choice). (–3)

This uses exactly the same proof, except that the line C[x/c] is now justified by “Hyp” instead of “Choice Rule”. [Not so fast. We must check that B1, B2, . . . , Bn does indeed satisfy the conditions in the definition of a proof with choice of (–3). A careful reading of the conditions shows that this is in fact so.] Therefore the Choice Rule is used one less time in B1, B2, . . . , Bn as a proof of (–3) than it was as a proof of (–1). We may therefore use induction to conclude that

H ∪ {C[x/c]} ⊩ A (without choice) (–4)

and then apply the (ordinary) Deduction Theorem to conclude

H ⊩ C[x/c] ⇒ A (without choice).

Since one of the conditions on the choice variable c was that it should comply with (ug∗), we may use Universal Generalisation to conclude further that

H ⊩ (∀c)(C[x/c] ⇒ A) (without choice).

Another condition on c was that it should not occur in A either, and so by E.3(ii)

H ⊩ (∃c)C[x/c] ⇒ A (without choice). (–5)

We also note that (∃x)C occurs in the proof before C[x/c], and so that part of the proof from the beginning up to and including (∃x)C constitutes a proof of that expression in which the Choice Rule is not used at all. In other words

H ⊩ (∃x)C (without choice).

Yet another condition on c was that it should not occur in this step (∃x)C of the proof, and so

H ⊩ (∃c)C[x/c] (without choice). (–6)

The result now follows from (–5) and (–6). □

The next proposition is important; it deals with when we can or cannot change the order in
which quantifiers occur. The proof of Part (iii) is a nice example of the use of the Choice
Rule.

F.5 Proposition
(i) (∀x)(∀y)P ⇔ (∀y)(∀x)P
(ii) (∃x)(∃y)P ⇔ (∃y)(∃x)P

(iii) (∃x)(∀y)P ⇒ (∀y)(∃x)P


Note The third statement here is an implication, not an equivalence. The reverse impli-
cation is not a theorem of PL.


Proof. (i) First we prove (∀x)(∀y)P ⊩ (∀y)(∀x)P.

1 (∀x)(∀y)P hyp
2 (∀y)P PL4 on 1
3 P PL4 on 2
4 (∀x)P UG on 3
5 (∀y)(∀x)P UG on 4

The proof of the converse is the same.

(ii) This follows from (i) by De Morgan’s laws.



(iii) We prove (∃x)(∀y)P ⊩ (∀y)(∃x)P.
In this proof, c is a variable symbol that does not appear in P .

1 (∃x)(∀y)P Hyp
2 (∀y)P [x/c] Choice rule on 1
3 P [x/c] PL4 on 2
4 (∃x)P Existential Generalisation on 3
5 (∀y)(∃x)P UG (Note that y does not occur free in Line 1)
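To see that the converse of (iii) genuinely fails, a two-element interpretation suffices (a model-theoretic aside of mine, anticipating the Models chapter). Reading P as x = y over the domain {0, 1}: every y equals some x, but no single x equals every y. A brute-force check in Python:

    D = [0, 1]
    P = lambda x, y: x == y

    forall_exists = all(any(P(x, y) for x in D) for y in D)   # (forall y)(exists x)P : True
    exists_forall = any(all(P(x, y) for y in D) for x in D)   # (exists x)(forall y)P : False
    assert forall_exists and not exists_forall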


F.6 Remarks
In calculus, continuity of a function f : R → R is defined thus:

(∀a ∈ R)(∀ε > 0)(∃δ > 0)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε )

and uniform continuity is defined

(∀ε > 0)(∃δ > 0)(∀a ∈ R)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε ) .

The only difference between these definitions is the order of the first three quantifiers. Consequently the theorem

f is uniformly continuous ⇒ f is continuous

is more a theorem of logic than of calculus.



F.7 Another remark


Through the sixties, Dean Martin had a big hit with the song Everybody Loves Somebody
Sometime, a title which is crying out for a predicate logic treatment.
Let us write L(x, y, t) for the expression “Person x loves person y at time t”. Then the title
of the song translates to
(∀x)(∃y)(∃t) L(x, y, t) .
Notice what happens if we mess around with the quantifiers. It is OK to swap the order of
the two existentials:

(∀x)(∃t)(∃y) L(x, y, t)
“Everybody, at some time, loves somebody”

but it is not OK to swap the existentials with the universal — the meaning will change:

(∃y)(∀x)(∃t) L(x, y, t)
“There is a particular person that everybody loves at some time”

and

(∃t)(∀x)(∃y) L(x, y, t)
“There is/was/will be a particular time at which everybody loves someone” .

Changing quantifiers from existential to universal and vice versa has an even more violent
effect:

(∀x)(∃y)(∀t) L(x, y, t)
“Everybody loves someone forever” .

And because I try to be a good guy: this song is copyright © 1948, 1975 Sinatra Songs, Inc. and written by Sam Coslow, Irving Taylor and Ken Lane, 1947. It was the theme song of The Dean Martin Show 1965–1974.

G Qualified quantifiers
G.1 Discussion
Consider the definition of continuity in elementary calculus: a function f : R → R is continuous if
(∀a ∈ R)(∀ε > 0)(∃δ > 0)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε )
We see that the quantifiers here are slightly more complicated than the ones we have discussed so far: they contain qualifications within the quantifiers themselves. (∀x ∈ R) means “for all real x”, not “for all x whatsoever”. Similarly (∃δ > 0) means “there exists a real δ > 0 such that”. Other similar ways of qualifying a quantifier will probably occur to you.
If we expand these slightly in an obvious way,
(∀a ∈ R) means (∀a such that a ∈ R)
(∃δ > 0) means (∃δ such that δ is real and δ > 0)

and so on, we see that all such notations are shorthand for one or other of
(∀x such that Q(x)) or (∃x such that Q(x))
where Q(x) is some expression involving x. And now we can make a general definition which
covers all such cases:
(∀x such that Q(x)) P (x) means (∀x)(Q(x) ⇒ P (x))
(∃x such that Q(x)) P (x) means (∃x)(Q(x) ∧ P (x))

I have used the slightly sloppy “functional” notation here because it is perhaps easier to see
what is going on. Here is a properly written definition:

G.2 Definition: Qualified quantifiers

(∀x such that Q) P means (∀x)(Q ⇒ P )


(∃x such that Q) P means (∃x)(Q ∧ P )
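Note the pairing here: ∀ goes with ⇒, while ∃ goes with ∧. The other two combinations say the wrong thing: (∃x)(Q ⇒ P) is true as soon as there is even one x for which Q fails (far too weak a statement), and (∀x)(Q ∧ P) asserts Q of every x whatsoever (far too strong).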

It is nice to know that most of the manipulations we have been verifying for working with
ordinary quantifiers still work with these qualified ones. For example, the basic relationship
between ∀ and ∃:

G.3 Proposition: Qualified ∀ and ∃

(∃x such that Q) P ⇔ ¬(∀x such that Q) ¬P (–1)


(∃x such that Q) ¬P ⇔ ¬(∀x such that Q) P (–2)
¬(∃x such that Q) P ⇔ (∀x such that Q) ¬P (–3)
¬(∃x such that Q) ¬P ⇔ (∀x such that Q) P (–4)

Proof. For (–1),

¬(∀x such that Q) ¬P ⇔ ¬(∀x) (Q ⇒ ¬P ) by the definition G.2 above


⇔ (∃x) ¬(Q ⇒ ¬P ) by Equation (–2) of D.3
⇔ (∃x) (Q ∧ P ) by SL
⇔ (∃x such that Q) P by G.2 again.

The other equations follow from this by double-negation fiddles.
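Again a finite-domain illustration may help fix ideas. With Q selecting the even numbers and P an arbitrary property (both choices mine):

    D = range(10)
    Q = lambda x: x % 2 == 0
    P = lambda x: x > 5

    lhs = any(Q(x) and P(x) for x in D)          # (exists x such that Q) P
    rhs = not all(not P(x) for x in D if Q(x))   # not (forall x such that Q) not P
    assert lhs == rhs                            # equation (-1)

The generator filter "for x in D if Q(x)" is exactly the “such that” qualification.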


When we come to consider changing the order of multiple quantifiers, we must take a little
care. Suppose we have something like this:

(∀x such that Q)(∀y such that R)X

Normally Q would mention x but not y and R would mention y but not x. But there is no
hard and fast rule about this. It is perfectly possible to have such an expression as

(∀x > 0)(∀y > x)X

In this case it would not be valid to swap the two quantifiers: (∀y > x)(∀x > 0)X. However, so long as we avoid this kind of overlapping of quantifiers, the rules for swapping their order (F.5) still hold.

G.4 Proposition
Provided Q does not mention y free and R does not mention x free, the following hold:

(i) (∀x such that Q)(∀y such that R)A ⇔ (∀y such that R)(∀x such that Q)A
(ii) (∃x such that Q)(∃y such that R)A ⇔ (∃y such that R)(∃x such that Q)A
(iii) (∃x such that Q)(∀y such that R)A ⇒ (∀y such that R)(∃x such that Q)A

Note As before, the third statement here is an implication, not an equivalence. The
reverse implication is not a theorem of PL.

Proof (i).

(∀x such that Q)(∀y such that R)A
⇔ (∀x)(Q ⇒ (∀y)(R ⇒ A)) by definition
⇔ (∀x)(∀y)(Q ⇒ (R ⇒ A)) by E.3
⇔ (∀x)(∀y)(R ⇒ (Q ⇒ A)) by SL
⇔ (∀y such that R)(∀x such that Q)A similarly

(ii) and (iii) are proved the same way. □



H Consistency of PL
H.1 Theorem
Plain Predicate Logic PL is consistent.

Proof. We show that Predicate Logic contains no expression P such that both ⊢ P and ⊢ ¬P.

We define, for each expression P, a value ‖P‖ which will be either 0 or 1. This is by induction over the construction of P.

• If P is atomic, that is of the form r(t1, t2, . . . , tn), then ‖P‖ = 1.

• If P is ¬Q then ‖P‖ = 1 − ‖Q‖.

• If P is Q ⇒ R then ‖P‖ = 1 if ‖Q‖ ≤ ‖R‖, and ‖P‖ = 0 otherwise.

• If P is (∀x)Q then ‖P‖ = ‖Q‖.
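Before the lemmas, here is the same valuation written out as a short Python sketch (the tuple encoding of expressions is the illustrative one used earlier, not part of the theory):

    def val(e):
        tag = e[0]
        if tag == 'atom':
            return 1
        if tag == 'not':
            return 1 - val(e[1])
        if tag == 'imp':
            return 1 if val(e[1]) <= val(e[2]) else 0
        if tag == 'all':                 # quantifiers are simply ignored
            return val(e[2])

    # Spot checks: an instance of PL1, P => (Q => P), gets value 1,
    # and P and not-P can never both have value 1.
    p, q = ('atom', 'p'), ('not', ('atom', 'q'))
    assert val(('imp', p, ('imp', q, p))) == 1
    assert val(p) + val(('not', p)) == 1

Lemma H.3 below is, in effect, the statement that val returns 1 on every theorem.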

H.2 Lemma
If the substitution [x/t] is acceptable in P, then ‖P[x/t]‖ = ‖P‖.

Proof. This follows the construction of P, using the “careful” description of acceptability given in A.19.

If P is atomic, then so is P[x/t] and so both have value 1.

If P is ¬Q and [x/t] is acceptable in P, then it is acceptable in Q also and we have ‖P[x/t]‖ = ‖¬Q[x/t]‖ = ‖¬Q‖ = ‖P‖.

Similarly, if P is (Q ⇒ R) and [x/t] is acceptable in P, then it is acceptable in Q and R also and we have

‖P[x/t]‖ = ‖Q[x/t] ⇒ R[x/t]‖ = 1 if and only if ‖Q[x/t]‖ ≤ ‖R[x/t]‖

and so, since ‖Q[x/t]‖ = ‖Q‖ and ‖R[x/t]‖ = ‖R‖, we get ‖P[x/t]‖ = 1 if and only if ‖Q‖ ≤ ‖R‖, that is, ‖P[x/t]‖ = ‖Q ⇒ R‖ = ‖P‖.

If P is (∀y)Q, where y is different from x and [x/t] is acceptable in P, then it is acceptable in Q also and the variable symbol y does not occur in t. Then

‖P[x/t]‖ = ‖(∀y)Q[x/t]‖ = ‖Q[x/t]‖ = ‖Q‖ = ‖(∀y)Q‖ = ‖P‖ .

Finally, if P is (∀x)Q then P[x/t] is just P itself (there is no free x to substitute for), so [x/t] is acceptable in P, ‖P[x/t]‖ = ‖P‖ and the result is trivially true. □

H.3 Lemma
The value of every theorem is 1.

Proof. This is by induction over the length of the proof of the theorem. It is enough to prove that the value of every axiom is 1 and that, if the expression P follows by Modus Ponens or Universal Generalisation from expressions of value 1, then ‖P‖ = 1 also.

The proofs for Axioms PL1, PL2 and PL3 are simply by checking the various cases for values of ‖P‖, ‖Q‖ and ‖R‖.

• Axiom PL4 [(∀x)P ⇒ P[x/t], where [x/t] is acceptable in P].
Then ‖(∀x)P‖ = ‖P‖ = ‖P[x/t]‖ by Lemma H.2, and the result is trivial.

• Axiom PL5 [(∀x)(P ⇒ Q) ⇒ (P ⇒ (∀x)Q), where x does not occur free in P].
Since ‖(∀x)(P ⇒ Q)‖ = ‖P ⇒ Q‖ = 1 if and only if ‖P‖ ≤ ‖Q‖, and ‖P ⇒ (∀x)Q‖ = 1 if and only if ‖P‖ ≤ ‖(∀x)Q‖, that is, if and only if ‖P‖ ≤ ‖Q‖, we have ‖(∀x)(P ⇒ Q)‖ = ‖P ⇒ (∀x)Q‖ and so ‖(∀x)(P ⇒ Q) ⇒ (P ⇒ (∀x)Q)‖ = 1, as required. (The same calculation gives value 1 to the alternative axioms PL6 [(∀x)(P ⇒ Q) ⇒ ((∀x)P ⇒ (∀x)Q)] and PL7 [P ⇒ (∀x)P, where x does not occur free in P].)

• Modus Ponens [there are expressions Q and Q ⇒ P, both of value 1].
Since ‖Q ⇒ P‖ = 1, we have ‖Q‖ ≤ ‖P‖. But also ‖Q‖ = 1, so ‖P‖ = 1.

• Universal Generalisation [P = (∀y)Q[x/y], where ‖Q‖ = 1, [x/y] is acceptable in Q and, if y ≠ x, y is not free in Q].
Then ‖P‖ = ‖(∀y)Q[x/y]‖ = ‖Q[x/y]‖ = ‖Q‖ (by Lemma H.2). □

Proof of the theorem. If P is a theorem then ‖P‖ = 1, therefore ‖¬P‖ = 0 and therefore ¬P is not a theorem. □

H.4 Comment
In Section 5.F we will prove that Plain Predicate Logic is not complete. In fact we could do it right now with a slight modification of the above technique, but it sits better in the Models chapter.

Towards the end of these notes we will prove that Plain Predicate Logic is not decidable.
There are various ways of doing this and they all require a fair amount of background. We
will show that the result follows without too much trouble from Gödel’s Incompleteness
Theorem, which we are going to prove anyway.

I First-order theories compared with PL


The following result is quite simple but also quite useful.

I.1 Proposition
Let K be a first-order theory with proper axioms A. Then, for any expression X,

⊢ X in K (–1)

if and only if

A ⊢ X in PL . (–2)

Note well: in (–2) the plain turnstile symbol ⊢ is used, not ⊩. This means that this is an entailment, not a deduction. (This is exactly the kind of fact for which the idea of entailment is used.)

Proof. The proofs of the two things are the same, the only difference being that where a member of A turns up as a line in the proof, in (–1) it is justified by being an axiom and in (–2) by being a hypothesis. □

J Finitely-axiomatisable first-order theories


The axioms for PL are all schemata, so each one of PL1–PL5 actually represents an infinite
number of axioms. On the other hand, some of the first-order theories we will investigate
only have a finite number of proper axioms. It turns out that the question of whether
a theory has only a finite number of proper axioms is occasionally quite important. In
particular, this idea will be important in the proof that PL is undecidable and therefore
incomplete.

This is a good place to define the idea properly, so let us do that now.

J.1 Definition: Finitely axiomatisable


A first-order theory T is finitely axiomatisable if there exists a finite set of axioms which
(together with the axioms and rules of PL) generate the theory T .
In other words T is the set of all expressions entailed by that set of axioms.

Note the wording here. It might happen that we define a first-order theory by a convenient
but infinite set of axioms, and then find out that there is another, perhaps less convenient,
finite set which generates the same set of theorems. In that case the theory is finitely
axiomatisable. This will in fact occur with the very first set of axioms we investigate in the
next chapter.

In general it might be far from obvious whether any given theory is finitely axiomatisable
or not.
4. SOME FIRST-ORDER THEORIES

A First-order theories with equality


A.1 Introduction
In this chapter we will investigate several interesting first-order theories. Before looking at
specific theories, we should discuss the relation of equality. It is not part of the definition
of a first-order theory that it must contain the relation of equality, however it is a fact that
almost all useful ones do so (for I suppose obvious reasons) — and that includes all the
examples in this chapter.

The relation of equality is usually of course denoted by =. It will be accompanied by axioms


which ensure that it has the expected properties. These are: firstly, it is an equivalence
relation

• For any term t, t = t.


• For any terms s and t, s=t ⇒ t = s.
• For any terms s, t and u, s=t ∧ t=u ⇒ s = u.

and, secondly, equals may be substituted for equals in terms or expressions

• If s and t are terms and s = t, u is a term containing s as a subterm (possibly several


times) and v is a term formed from u by replacing some (possibly not all) of the
occurrences of s in u by t, then u = v. (In other words, substituting equals for equals
in a term results in an equal term.)
• If s and t are terms and s = t, P is an expression containing s as a term (possibly
several times) and Q is an expression formed from P by replacing some (possibly not
all) of the occurrences of s in P by t, then P ⇔ Q. (In other words, substituting equals
for equals in an expression results in an equivalent expression.)

For an example of the last two items here, suppose we are solving the equation

cos x + cos 2x = 1 (–1)

A natural way to proceed would be to use the fact that we know that (have already proved
that)
cos 2x = 2 cos² x − 1 (–2)
so we can substitute this equality into the term cos x + cos 2x to get

cos x + cos 2x = cos x + 2 cos² x − 1


or, better still, substitute this equality (–2) into the equation (–1) (an expression) to get the equivalent expression

cos x + 2 cos² x − 1 = 1

after which it is all downhill. If we write P for Equation (–1), then it would be natural to write the result of this substitution as

P[ cos 2x / 2 cos² x − 1 ] (–3)

but there is a problem with this notation: in a substitution [x/y], y may be any term, but x must be a variable symbol. Thus the notation in (–3) is not allowed.

A binary relation which did not have all these properties would not be called “equality” and
it would be foolhardy to use the symbol = for such a relation.
We now look at a pair of axioms which will give all these properties.

A.2 Definition
A first-order theory with equality is a first-order theory with an extra binary relation = and
the extra axioms:

(Eq1) (∀x)( x = x )

(Eq2) (∀x)(∀y) x = y ⇒ (P ⇔ P [x/y]) provided the substitution is acceptable.
Eq2 is an axiom schema: one axiom for every choice of expression P . On the other hand,
Eq1 is a plain axiom.

As usual with axioms, we can use either the fully quantified version, as I have done above,
or the unquantified version:
(Eq1u) x=x
(Eq2u) x = y ⇒ (P ⇔ P [x/y]) provided the substitution is acceptable.

(You can deduce the unquantified versions from the fully quantified ones by using PL4, and
go in the opposite direction by UG.)
It is not at all obvious that all the properties of equality listed above can be deduced from
just these two axioms. Nevertheless they can, and we will prove this now.

A.3 Proposition: equality is an equivalence relation


(i) (∀x)(x = x)

(ii) (∀x)(∀y)(x = y ⇒ y = x)
(iii) (∀x)(∀y)(∀z)(x = y ∧ y = z ⇒ x = z)

Proof. (i) is given as Axiom Eq1.



(ii) Let P be the expression y = x. Then P [x/y] is the expression y = y and Axiom Eq2
gives
x = y ⇒ (y = x ⇔ y = y)
∴ x = y ⇒ (y = x ⇔ T) by Eq1
∴ x = y ⇒ y = x.

(iii) Let P be the expression x = z. Then P [x/y] is the expression y = z and Axiom Eq2
gives
x = y ⇒ (x = z ⇔ y = z)
∴ x = y ⇒ (y = z ⇒ x = z)
which is the same as x = y ∧ y = z ⇒ x = z. □

When we come to substituting equals for equals we have a problem, one we can fix without
too much trouble, but we have to notice that it is there. The thing is, we have a new kind
of substitution here.

With the kind of substitution we have been dealing with in the last chapter (for example
P [x/t]) the thing being substituted for (x in the example) must be a variable symbol and
every free occurrence of x must be replaced by t.
On the other hand, with substitution of equals we can substitute one term for another, and,
as stated above, it is OK to substitute for some but not all of the occurrences.

This raises the spectre that we might need to invent a new notation for this kind of substitution. Luckily, we don’t. With a little fancy footwork we can make ordinary substitution (as we have been using it) work fine.
The next proposition is the crucial one which makes it work, and the following discussion
shows how.

A.4 Proposition
(i) For any term t and variable z

(∀x)(∀y) x = y ⇒ (t[z/x] = t[z/y])

(ii) For any expression P and variable z,



(∀x)(∀y) x = y ⇒ (P [z/x] ⇔ P [z/y]) provided both substitutions are acceptable.

Proof. (ii) Suppose that x = y. Apply Axiom Eq2 twice, once to P [z/x] and once to
P [z/y]:
P [z/x] ⇔ P [z/x][x/y]
and P [z/y] ⇔ P [z/y][x/y] .
It remains only to observe that P [z/x][x/y] and P [z/y][x/y] are the same expression. This
is simply because of the properties of substitution. In the first substitution P [z/x][x/y],

• All free occurrences of x in P are left alone by [z/x] and then changed to y by [x/y].

• All free occurrences of y in P are left alone by both substitutions.

• All free occurrences of z in P are changed to x by [z/x] and then changed to y by


[x/y].

We can summarise this in a diagram:

[z/x] [x/y]
x → x → y
y → y → y
z → x → y

Summarising the second substitution in the same way:

[z/y] [x/y]
x → x → y
y → y → y
z → y → y

The result is the same in both cases.

(i) Let P be the expression t[z/x] = t[z/y] and note that

P [x/y] = (t[z/x] = t[z/y])[x/y]


= t[z/x][x/y] = t[z/y][x/y] by definition of a substitution

and, by the calculations just done above, the two sides of this equation are in fact the same
term, so P [x/y] is true by Eq1.
Now Eq2 gives

x = y ⇒ (P ⇔ P[x/y])
which is x = y ⇒ (P ⇔ T)
and so x = y ⇒ P
which is x = y ⇒ (t[z/x] = t[z/y])

as required. □

A.5 Partial substitution


In order to establish the last two properties, “substituting equals for equals” in the introduction A.1, we must first solve a couple of notational problems.
The first is how to define, in a watertight fashion, the operation of substituting for some,
but not necessarily all, occurrences of a variable or term.

The second is how to define the operation of substituting one term for another.

We will see that the last proposition above in fact solves both these problems for us.

Let us suppose we have proved that s = s′, where s and s′ are terms, we are confronted with a larger term, t say, which contains several occurrences of s and we want to substitute s′ for some of these occurrences, but perhaps not all.

Mental picture of s occurring three times as a subterm of t:

t :   · · · s · · · s · · · s · · ·

Mental picture of substituting s′ for some but not all of the s occurring and so changing t to t′:

t′ :   · · · s′ · · · s · · · s′ · · ·

Given s = s′, we want to prove that t = t′.

Now, take a fresh variable, z say (that is, one which does not occur anywhere in t or s′); then there is a term which is t with just those occurrences of s which we want to change replaced by z; let us call it t∗:

t∗ :   · · · z · · · s · · · z · · ·

Now, notice that both t and t′ are got by ordinary substitutions in t∗:

t = t∗[z/s] and t′ = t∗[z/s′] .

So now what we want to prove becomes: t∗[z/s] = t∗[z/s′]; but this is just what Part (i) of the previous Proposition (A.4) tells us.
The proof that such partial substitution of equals for equals in an expression yields an
equivalent expression is almost the same, and should be obvious by now. Here are the
details anyway.
Let us suppose we have proved that s = s′, where s and s′ are terms, we are confronted with an expression, P say, which contains several occurrences of s and we want to substitute s′ for some of these occurrences, but perhaps not all.

Mental picture of s occurring three times as a subterm of P:

P :   · · · s · · · s · · · s · · ·

Mental picture of substituting s′ for some but not all of the s occurring and so changing P to P′:

P′ :   · · · s′ · · · s · · · s′ · · ·

Given s = s′, we want to prove that P ⇔ P′.

Take a fresh variable, z say. Then there is an expression which is P with just those occurrences of s which we want to change replaced by z; let us call it P∗:

P∗ :   · · · z · · · s · · · z · · ·

Then

P = P∗[z/s] and P′ = P∗[z/s′] ,

so now what we want to prove becomes: P∗[z/s] ⇔ P∗[z/s′]; and this is just what Part (ii) of the previous Proposition (A.4) tells us.
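The fresh-variable trick is easy to mechanise. Here is a small Python sketch (terms encoded as nested tuples with variables as plain strings; the encoding and function names are mine, purely illustrative):

    def subst(t, z, s):
        # Ordinary substitution t[z/s]: replace every occurrence of the variable z.
        if isinstance(t, str):
            return s if t == z else t
        return (t[0],) + tuple(subst(a, z, s) for a in t[1:])

    def mark(t, s, chosen, z, counter=None):
        # Build t*: replace just those occurrences of the subterm s whose
        # left-to-right index lies in `chosen` by the fresh variable z.
        if counter is None:
            counter = [0]
        if t == s:
            counter[0] += 1
            return z if counter[0] - 1 in chosen else t
        if isinstance(t, str):
            return t
        return (t[0],) + tuple(mark(a, s, chosen, z, counter) for a in t[1:])

    s = ('f', 'x')                             # s = f(x)
    t = ('g', s, ('h', s), s)                  # t = g(f(x), h(f(x)), f(x))
    t_star = mark(t, s, {0, 2}, 'z')           # t* = g(z, h(f(x)), z)
    assert subst(t_star, 'z', s) == t          # t  = t*[z/s]
    t_prime = subst(t_star, 'z', ('f', 'y'))   # t' = t*[z/s'] with s' = f(y)

The assertion is the first of the two displayed equations above, and t_prime is the desired partial substitution of f(y) for the first and third occurrences of f(x).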

A.6 Unique existence


New topic.
We can now formally define the idea expressed by “There is a unique x such that P (x)”.
Other ways of expressing the same idea are “There is exactly one x such that P (x)” or
“There is one and only one x such that P (x)”. This is commonly symbolised (∃!x)P (x) and
is defined thus

Definition

(∃!x)P(x) means (∃x)P(x) ∧ (∀x)(∀y)( P(x) ∧ P(y) ⇒ x = y ) .

(Here y is the first variable symbol which does not occur in P (x).) Some writers use the
notation (∃1 x)P (x) or (∃1x)P (x) for this idea.
It seems to me to be quite natural to split this up into two ideas:

(∃x)P(x)   there is at least one x such that P(x),

(!x)P(x)   there is at most one x such that P(x),

the latter being defined to mean (∀x)(∀y)( P(x) ∧ P(y) ⇒ x = y ).
Then (∃!x)P (x) means both, that is (∃x)P (x) ∧ (!x)P (x).

Note that the uniqueness (!x) part of this definition requires equality to be defined.
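Over a finite domain the split reads directly in Python (an illustration only; the domain and property are arbitrary choices of mine, with unique witness 7):

    D = range(10)
    P = lambda x: x * x == 49

    at_least_one = any(P(x) for x in D)                  # (exists x)P(x)
    at_most_one = all(not (P(x) and P(y)) or x == y      # (!x)P(x)
                      for x in D for y in D)
    assert at_least_one and at_most_one                  # (exists! x)P(x)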

A.7 Definition: Definition by description


A very common mathematical construction is as follows. Suppose we have an expression
P (u, x1 , x2 , . . . , xn ) and we have proved that, for any x1 , x2 , . . . , xn , there is a unique u
such that P (u, x1 , x2 , . . . , xn ). Then we can think of this as defining u as a function of
x1 , x2 , . . . , xn : we introduce a new function symbol, f say, and write

u = f (x1 , x2 , . . . , xn ) to mean P (u, x1 , x2 , . . . , xn ) .

This procedure is valid, in a sense which is made precise in the next proposition. We will
call it defining the function f by description.
This way of defining a function is important: it is used frequently in mathematics and other first-order theories with equality. On the other hand, you will see that the proof is quite long and complicated. I would suggest that you read the statement of the theorem and make sure you understand it. The proof is provided for completeness (and because you might find it difficult to find elsewhere in the literature), so I would recommend working through it only if you are interested in the gory details or are suspicious.

A.8 Theorem: Definition by description is valid.


Let A be a theory with equality and suppose that P (u, x1 , x2 , . . . , xn ) is an expression in A
with n + 1 free variables (n ≥ 0) such that

(∀x1 )(∀x2 ) . . . (∀xn )(∃!u)P (u, x1 , x2 , . . . , xn ) in A .

Let us construct a new theory B by

• Adding a new n-ary function symbol f to the symbols of the language A and extending the sets of terms and expressions accordingly, as dictated by Definitions 3.A.6 and 3.A.8,

• Extending the axiom schemas PL1–PL5 and Eq2 and the rules Modus Ponens and
Universal Generalisation to include all the new expressions so formed,

• Adding a new axiom



(∀x1 )(∀x2 ) . . . (∀xn )(∀u) u = f (x1 , x2 , . . . , xn ) ⇒ P (u, x1 , x2 , . . . , xn ) .

Then the new theory B is equivalent to the old one A (in the sense of Definition 1.C).

Proof. We observe that B is an extension of A, in the sense that the symbols of A are a
subset of the symbols of B and the same goes for strings, terms, expressions, axioms, rules
and therefore theorems.

Observe that the new axiom (∀u)( u = f(x1, x2, . . . , xn) ⇒ P(u, x1, x2, . . . , xn) ) gives rise in the usual way to theorems: for any terms t1, t2, . . . , tn of B,

⊢B (∀u)( u = f(t1, t2, . . . , tn) ⇒ P(u, t1, t2, . . . , tn) ) .

Here “⊢B” means “the following is a theorem of B”. In the same way, the assumed theorem (∃!u)P(u, x1, x2, . . . , xn) of A gives rise to theorems:

⊢A (∃!u)P(u, t1, t2, . . . , tn) for any terms t1, t2, . . . , tn of A, provided the substitutions [x1/t1], [x2/t2], . . . , [xn/tn] are all acceptable in P.

Since all theorems of A are theorems of B, these are also theorems of B.


We now show that ⊢ P(f(t1, t2, . . . , tn), t1, t2, . . . , tn) in B, for all terms t1, t2, . . . , tn, provided that these substitutions are acceptable in P.


1. (∀u) u = f (t1 , t2 , . . . , tn ) ⇒ P (u, t1 , t2 , . . . , tn ) New axiom
2. f (t1 , t2 , . . . , tn ) = f (t1 , t2 , . . . , tn ) ⇒ P (f (t1 , t2 , . . . , tn ), t1 , t2 , . . . , tn ) PL4 on 1
3. f (t1 , t2 , . . . , tn ) = f (t1 , t2 , . . . , tn ) Eq10
4. P (f (t1 , t2 , . . . , tn ), t1 , t2 , . . . , tn ) MP: 3, 2

We now define the functions ϕ : A → B and ψ : B → A (required by Definition 1.C). The definition of ϕ is easy: it is the identity, that is, to each expression A of A, ϕ(A) is A.

Define a “simple f -term” to be a term in B of the form f (t1 , t2 , . . . , tn ), where the terms
t1 , t2 , . . . , tn do not mention f .
Let B be any atomic expression in B. We define ψ(B) by induction over the number of occurrences of the symbol f in B. If f does not occur in B, then ψ(B) = B. Otherwise B must contain at least one simple f-term; write B∗ for the result of replacing the leftmost occurrence of a simple f-term, f(t1, t2, . . . , tn) say, in B by the first variable u which does not occur in B or P. Then

ψ(B) = ψ( (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) ) .

We observe that (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) has one less occurrence of the symbol f than does B, and so this inductive definition is good.

We now extend ψ to all expressions in B in the obvious way:—

ψ(¬B) = ¬ψ(B)
ψ(B ⇒ B′) = ψ(B) ⇒ ψ(B′)
and ψ((∀x)B) = (∀x)ψ(B) .

We now show that, for any expression B in B, we have ⊢ B ⇔ ψ(B) in B.


Consider atomic expressions first. If f does not occur in B, then ψ(B) is B and the result is trivial. Otherwise let f(t1, t2, . . . , tn) be the leftmost occurrence of a simple f-term in B, so that ψ(B) is ψ( (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) ). Using induction, it is clearly sufficient to show that

⊢ B ⇔ (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) in B .

Proof of B ⊢ (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) in B.

1. P(f(t1, t2, . . . , tn), t1, t2, . . . , tn)   Proved above (note that the substitution is acceptable)
2. B   Hyp
3. P(f(t1, t2, . . . , tn), t1, t2, . . . , tn) ∧ B   SL, 1 and 2
4. (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗)   ExGen

Proof of (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) ⊢ B in B.

1. P(u, t1, t2, . . . , tn) ∧ B∗   Choice on Hyp
2. P(u, t1, t2, . . . , tn)   SL on 1
3. B∗   SL on 1
4. P(f(t1, t2, . . . , tn), t1, t2, . . . , tn)   Proved above (note, acceptable)
5. (∃!u)P(u, t1, t2, . . . , tn)   The assumed theorem
6. (∀x)(∀y)( P(x, t1, t2, . . . , tn) ∧ P(y, t1, t2, . . . , tn) ⇒ x = y )   From 5 by SL
7. P(u, t1, t2, . . . , tn) ∧ P(f(t1, t2, . . . , tn), t1, t2, . . . , tn)
   ⇒ u = f(t1, t2, . . . , tn)   PL4 on 6
8. u = f(t1, t2, . . . , tn)   SL, 2, 4 and 7
9. B   Subst equals, 8 into 3.

Now it follows easily that, for all expressions B in B, ⊢ B ⇔ ψ(B) in B.


We can now prove the four conditions of Definition 1.C. Using the fact that ϕ(A) is A
for every expression A of A, and that both languages contain sentential logic, these can be
rewritten:
(i) If ⊢A A ⇒ A′ then ⊢B ϕ(A) ⇒ ϕ(A′) .
(ii) If ⊢B B ⇒ B′ then ⊢A ψ(B) ⇒ ψ(B′) .
(iii) If A is any expression of A, then ⊢A A ⇔ ψ(ϕ(A)) .
(iv) If B is any expression of B, then ⊢B B ⇔ ϕ(ψ(B)) .
We know (i) already, (iii) is trivial, since ψ(ϕ(A)) = A in this case, and (iv) has just been
proved above.
It remains to prove (ii), which we do by proving that, if ⊢B B, then ⊢A ψ(B) .

Let L1, L2, . . . , Ln be a proof of B in B; we show that ψ(L1), ψ(L2), . . . , ψ(Ln), with perhaps
some extra lines inserted, forms a proof of ψ(B) in A. We observe that Ln is B, so ψ(Ln) = ψ(B).
Suppose the line L arises by Modus Ponens in the old proof. Then there are two previous
lines of the form K and K ⇒ L in that proof. But then the lines ψ(K) and ψ(K) ⇒ ψ(L)
occur before ψ(L) in the new proof, and so it arises by Modus Ponens in the new proof also.
Suppose the line L arises by UG in the old proof. Then it is of the form (∀x)A(x) and there
is an earlier line, K say, of the form A(x). But then ψ(L) is (∀x)ψ(A(x)) and follows by
UG from ψ(K) which is ψ(A(x)) — note that we are talking about the proof of a theorem,
not a deduction, here, so there is no proviso regarding the appearance of the variable x in
hypotheses to check.
Finally suppose that L is an axiom of B. Then there are two possibilities: either it is one
of the axioms of A, perhaps extended to include the symbol f , or else it is the new axiom.
If L is an instance of an Axiom of A then ψ(L) is another instance of the same Axiom in A.

It remains to consider the possibility that L is the new axiom

(∀u)(u = f (x1 , x2 , . . . , xn ) ⇒ P (u, x1 , x2 , . . . , xn )) .

This is a single axiom, not an axiom scheme, so the symbol f occurs exactly once and
f (x1 , x2 , . . . , xn ) is a simple f -term. If v is the first variable symbol which does not occur

here, we have:

ψ(L) = (∃v)( P(v, x1, x2, . . . , xn) ∧ (∀u)(u = v ⇒ P(u, x1, x2, . . . , xn)) ) .

This is a theorem of A (proof below), so all we need do is insert its proof before the line
ψ(L) in the new proof.
Here is a proof:

1. (∃!u)P(u, x1, x2, . . . , xn)   Assumed to be a theorem
2. (∃u)P(u, x1, x2, . . . , xn)   From 1 by SL
3. P(v, x1, x2, . . . , xn)   Choice
4. u = v ⇒ ( P(v, x1, x2, . . . , xn) ⇒ P(u, x1, x2, . . . , xn) )   Eq2
5. P(v, x1, x2, . . . , xn) ⇒ ( u = v ⇒ P(u, x1, x2, . . . , xn) )   SL on 4
6. u = v ⇒ P(u, x1, x2, . . . , xn)   MP, 3 and 5
7. (∀u)(u = v ⇒ P(u, x1, x2, . . . , xn))   UG on 6
8. P(v, x1, x2, . . . , xn) ∧ (∀u)(u = v ⇒ P(u, x1, x2, . . . , xn))   SL, 3 and 7
9. (∃v)( P(v, x1, x2, . . . , xn) ∧ (∀u)(u = v ⇒ P(u, x1, x2, . . . , xn)) )   ExGen on 8   □

We now turn to another topic related to equality.


For the investigations we are doing it is obvious that axioms for equality are important; and
it has already been remarked that finite axiomatisability will turn out to be important too.
So it is good to know that there is an alternative set of axioms for equality that is finite.
Well — this is true provided the theory has only a finite number of function and relation
symbols.
This will mean that, given a first-order theory with equality that has only a finite number
of function and relation symbols, we can show that it is finitely axiomatisable, by showing
that it can be axiomatised by

The axioms of PL + the axioms of equality + a finite set of extra axioms.

(The proviso that there should be only a finite number of function and relation symbols is
not a big problem: nearly all the theories we are interested in are of this kind.)
We prove this now.

A.9 Finite axioms for equality


Here is an alternative set of axioms for equality:
(EqF1) (∀x)( x = x )
(EqF2) (∀x)(∀y)( x = y ⇒ y = x )

(EqF3) (∀x)(∀y)(∀z)( x = y ∧ y = z ⇒ x = z )

(EqFf) For each formal function symbol f of the language, of any arity k,

(∀x1)(∀x2) . . . (∀xk)(∀x′1)(∀x′2) . . . (∀x′k)
( x1 = x′1 ∧ x2 = x′2 ∧ · · · ∧ xk = x′k ⇒ f(x1, x2, . . . , xk) = f(x′1, x′2, . . . , x′k) )

(EqFr) For each formal relation symbol r of the language, of any arity k,

(∀x1)(∀x2) . . . (∀xk)(∀x′1)(∀x′2) . . . (∀x′k)
( x1 = x′1 ∧ x2 = x′2 ∧ · · · ∧ xk = x′k ⇒ (r(x1, x2, . . . , xk) ⇔ r(x′1, x′2, . . . , x′k)) )


The last two “axioms” can be many axioms each. However, if the theory has only a finite
number of function and relation symbols, then these each represent only a finite number of
axioms.
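To see quite concretely that (EqFf) and (EqFr) contribute only finitely many axioms over a finite signature, here is a small illustrative sketch that generates one instance per symbol. The signature used (f binary, g unary, r binary) is hypothetical, and the output is an ASCII rendering: (A x) for (∀x), & for ∧, => and <=> for ⇒ and ⇔.

    def eq_axiom(sym, k, relation=False):
        xs = ['x%d' % i for i in range(1, k + 1)]
        ys = ["x%d'" % i for i in range(1, k + 1)]
        quants = ''.join('(A %s)' % v for v in xs + ys)
        hyp = ' & '.join('%s = %s' % (x, y) for x, y in zip(xs, ys))
        lhs = '%s(%s)' % (sym, ', '.join(xs))
        rhs = '%s(%s)' % (sym, ', '.join(ys))
        concl = '(%s <=> %s)' % (lhs, rhs) if relation else '%s = %s' % (lhs, rhs)
        return '%s( %s => %s )' % (quants, hyp, concl)

    # Two function symbols and one relation symbol: exactly three axioms.
    print(eq_axiom('f', 2))
    print(eq_axiom('g', 1))
    print(eq_axiom('r', 2, relation=True))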

A.10 Proposition
The sets of axioms given in Sections A.2 and A.9 are equivalent.
Proof. Since they both use the same language, it is enough to show that (1) the axioms of
A.9 follow as theorems from the axioms of A.2 and (2) the axioms of A.2 follow as theorems
from those of A.9.
(1) Assuming the theory with the axioms of A.2, we have already proved EqF1–3. The
other two, EqFf and EqFr, follow by k applications of the Partial Substitution Proposition
A.4.

(2) Now assume the theory with the axioms of A.9. Axiom Eq1 is given as Axiom
EqF1.
To prove Axiom Eq2 we must first prove a similar result for terms: for any term t,

x = y ⇒ t = t[x/y] .

This follows easily from EqFf by induction over the construction of the term t.


Now Axiom Eq2 follows by an easy induction over the construction of the expression P .


B Projective planes
B.1 Discussion
A projective plane is a geometric structure consisting of lines and points, in which there is
one line through every pair of distinct points and every pair of distinct lines have one point
in common.
If we start from the point of view of Euclidean geometry, then the requirement that every two
points have a line through them is no surprise, but to stipulate that every pair of distinct
lines has a point in common can be looked at in two ways: either there are no parallel lines
at all or there must be extra “points at infinity” at which parallel lines meet. In any case, it
raises the question whether this whole idea makes sense at all.
This question can be looked at two ways:

• If we axiomatise this idea, is the resulting theory consistent?

• Euclidean Geometry can be realised by a structure (the points and straight lines in
R2 ); is there any such structure which realises the axioms of projective geometry?

We will answer both questions in the affirmative in the next chapter. For the meantime, let
us see how to define Projective Geometry as a first-order theory.

B.2 Definition — informal version


A projective plane consists of points and lines. (Everything is either a point or a line, and
nothing is both.)
There is a relation of incidence between points and lines: a point may or may not lie on
any particular line.
To any two distinct points, there is a unique line that both those points lie on.

To any two distinct lines, there is a unique point which lies on both of them.
There exist four points, no three of which all lie on the same line. (This is called the
Non-triviality axiom.)

B.3 Definition — formal version


This is a first-order theory with equality. It has three relation symbols

two unary relations P and L


one binary relation i

(Think: P (p) means p is a point, L(l) means l is a line and i(p, l) means point p lies on line
l).

There are no function symbols.



There are five proper axioms:



(PP1) (∀x)( (P(x) ∨ L(x)) ∧ ¬(P(x) ∧ L(x)) )
(PP2) i(x, a) ⇒ P(x) ∧ L(a)
(PP3) P(x) ∧ P(y) ∧ x ≠ y ⇒ (∃!a)(i(x, a) ∧ i(y, a))

(PP4) L(a) ∧ L(b) ∧ a ≠ b ⇒ (∃!x)(i(x, a) ∧ i(x, b))

(PP5) (∃x1)(∃x2)(∃x3)(∃x4)(∀a)( P(x1) ∧ P(x2) ∧ P(x3) ∧ P(x4)
∧ ¬(i(x1, a) ∧ i(x2, a) ∧ i(x3, a)) ∧ ¬(i(x1, a) ∧ i(x2, a) ∧ i(x4, a))
∧ ¬(i(x1, a) ∧ i(x3, a) ∧ i(x4, a)) ∧ ¬(i(x2, a) ∧ i(x3, a) ∧ i(x4, a)) )

B.4 An example
This is called the seven-point plane for obvious reasons.

This structure has seven points (shown as grey blobs) and seven lines — six of them straight
and one curved (no one said the lines must be straight!). That the axioms all hold for this
structure can be checked directly by observation.

A structure which obeys the axioms (and therefore the theorems) of a theory like this is
called a model. We will look at models in detail in the next chapter.
Another way of defining the same model (or, at least, an isomorphic one) is to give an
incidence matrix. There is a row for each point p1 , p2 , . . . , p7 and a column for each line
l1 , l2 , . . . , l7 ; the following matrix has a 1 where the point lies on the line and a zero otherwise.
In this form it is easier to check Axioms (PP3) and (PP4).

         l1  l2  l3  l4  l5  l6  l7
    p1    1   0   1   0   1   0   0
    p2    1   0   0   1   0   1   0
    p3    0   1   0   1   1   0   0
    p4    0   1   1   0   0   1   0
    p5    1   1   0   0   0   0   1
    p6    0   0   1   1   0   0   1
    p7    0   0   0   0   1   1   1
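Checking the axioms against this matrix is a finite computation, and can be done mechanically. Here is a small sketch that does it (the matrix M is copied from the table above, with M[p][l] = 1 when the (p+1)-th point lies on the (l+1)-th line):

    from itertools import combinations

    M = [
        [1, 0, 1, 0, 1, 0, 0],
        [1, 0, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0, 1, 0],
        [1, 1, 0, 0, 0, 0, 1],
        [0, 0, 1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1, 1, 1],
    ]

    # (PP3): every two distinct points lie on exactly one common line.
    assert all(sum(M[p][l] and M[q][l] for l in range(7)) == 1
               for p, q in combinations(range(7), 2))

    # (PP4): every two distinct lines pass through exactly one common point.
    assert all(sum(M[p][a] and M[p][b] for p in range(7)) == 1
               for a, b in combinations(range(7), 2))

    # (PP5): some four points have no three of them on a common line.
    assert any(all(sum(M[p][l] for p in quad) < 3 for l in range(7))
               for quad in combinations(range(7), 4))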

C Peano Arithmetic (PA)


Our aim here is to axiomatise elementary arithmetic as a first-order theory with equality plus
Peano’s axioms. This axiomatisation is kept simple and, in particular, avoids introducing
set theoretic ideas.
Peano’s axioms, stated informally, are:—
(1) 0 is a natural number
(2) For every natural number n, there is defined a natural number suc(n), called the
successor of n. (Think: suc(n) is a shorthand notation for n + 1.)
(3) 0 is not the successor of any natural number,
(4) The successors of different natural numbers are different.
(5) The only set of natural numbers which contains 0 and is closed under successors is
the set of all the natural numbers itself.
Alternatively (and much better, since we are not allowed to talk about sets in this theory):
if P(x) is an expression such that P(0) is true and (∀x)(P(x) ⇒ P(suc(x))), then (∀x)P(x).
To set this up as a bona fide formal theory, we make it a first-order theory with equality,
with the functions mentioned above, and axioms as in the following definition.

C.1 Definition
A first-order theory with arithmetic is a first-order theory with equality with the extra
function symbols

one constant (nullary function) 0̄


one unary function suc
two binary functions + and .

We use a bar over the zero symbol to distinguish between 0̄, the constant symbol in the
language, and 0, the actual zero in N. We will also use a special notation for the successor
function, writing x+ for suc(x). Finally, we will use the usual infix notation for addition and
multiplication, writing x + y for +(x, y) and x.y or xy for .(x, y) .
I write the function suc all squashed up like that to suggest that it is a single symbol, not
three letters. We won’t be using it much anyway.

The proper axioms are


(PA1) (∀x)¬(x+ = 0̄)
(PA2) (∀x)(∀y) (x+ = y + ⇒ x = y)
(PA3) Schema: for every expression P and variable symbol x,
P(0̄) ⇒ ( (∀x)(P(x) ⇒ P(x+)) ⇒ (∀x)P(x) )

(PA4) (∀x) (x + 0̄ = x)
(PA5) (∀x)(∀y) (x + y + = (x + y)+ )
(PA6) (∀x) (x.0̄ = 0̄)
(PA7) (∀x)(∀y) (x.(y + ) = (x.y) + x)

Note The theory defined by exactly the functions, relations and axioms specified above
(and no others) is Peano Arithmetic; we will denote it by PA. (It is often called the
Elementary Theory of Arithmetic.)
Later we will be interested in extending the theory by adding more function symbols, relation
symbols and axioms than the minimum specified above.

C.2 Remarks
(i) The first three axioms here obviously encapsulate Peano’s axioms and are sufficient
to define the Natural Numbers. The third axiom looks a bit more friendly if we rewrite it
thus:
( P(0̄) ∧ (∀x)(P(x) ⇒ P(x+)) ) ⇒ (∀x)P(x)
Note how it expresses the Principle of Induction without the need for any set theory notation.
It is frequently used in the form of a deduction:

P(0̄) , (∀x)(P(x) ⇒ P(x+)) ⊢ (∀x)P(x)   (PA3a)

(ii) Since we are avoiding set theory, we have no easy mechanism for introducing new
functions. Thus the two basic functions, addition and multiplication, are built in to the
definition of the theory itself. Axioms (PA4) and (PA5) constitute an inductive definition
of addition, Axioms (PA6) and (PA7) do the same for multiplication.
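Read at the metalevel, (PA4)–(PA7) are exactly the recursion equations by which addition and multiplication are computed in N. A minimal sketch, with ordinary Python integers standing in for the numerals (this is an illustration about N, not a derivation inside PA):

    def suc(n):
        return n + 1

    def add(x, y):
        if y == 0:
            return x                     # (PA4)  x + 0 = x
        return suc(add(x, y - 1))        # (PA5)  x + y+ = (x + y)+

    def mul(x, y):
        if y == 0:
            return 0                     # (PA6)  x.0 = 0
        return add(mul(x, y - 1), x)     # (PA7)  x.(y+) = (x.y) + x

    assert add(3, 4) == 7 and mul(3, 4) == 12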

Herein lies a major shortcoming of this theory — it is difficult to introduce further functions,
without extending the language. For example, how can you define exponentiation and prove
the index laws in this language?
(iii) We have allowed the possibility of adding extra function and relation symbols and
extra axioms. Thus we can approach this theory in two ways. We can consider PA, the bare
theory of arithmetic, having only the functions, relations and axioms stated above, and see
what we can find out about it. Alternatively, we can use this definition to discuss any theory
which contains arithmetic (and that includes mathematics). This latter approach is useful
when we come to Gödel’s Theorem, which applies to any theory which contains arithmetic.
(iv) In general, if we take an idea normally defined in mathematics by a set of axiom-like
statements, and extract those to form a first-order theory by themselves, we form what is
usually called the “Elementary Theory of” whatever it is. We will look briefly at elementary
theories of groups and certain kinds of orders later in this chapter.
Such elementary theories do not normally contain the full power of Mathematics — set
theoretic notation, functions and so on — so there are usually many things we would like
to say about such theories which simply cannot be said, let alone proved, within them. For

example, there is no way one can express “there is an uncountable number of subsets of the
natural numbers” in the language of this theory, far less prove it.
(v) The language contains terms which describe each of the natural numbers. For in-
stance the term 0̄+ represents the number 1, the term 0̄++ represents the number 2 and so
on. So it is a natural abbreviation to write 1̄ for 0̄+ , 2̄ for 0̄++ and so on. In fact, let us now
add these notations to our semi-formal theory, making it precise with a proper inductive
definition.

C.3 Definition
For any natural number ξ, we define the term ξ¯ by

0̄ is already given as part of the definition of PA.


(ξ + 1)¯ = (ξ¯)+ for all natural numbers ξ.
Thus 1̄ = 0̄+, 2̄ = 0̄++, 3̄ = 0̄+++ and so on.
Note that this notation is not part of the fully-formal language PA. For each (actual)
natural number ξ, the notation ξ¯ is a new symbol in the semi-formal language. And, if we
should want to think of this construction as a function ξ ↦ ξ¯, then that is definitely part
of the metalanguage (because there is no way the ξ ↦ part can be expressed in the formal
language).
It will be convenient to use the notation ξ+ for ξ + 1 in N also, in which case the second
line of the definition above becomes
(ξ+)¯ = (ξ¯)+

C.4 An inductive proof


There are many properties of the natural numbers which are not stated explicitly in the
axioms given above, but which nevertheless we hope we could prove from them. Obvious
ones are commutativity, associativity and the other usual ring properties. It would also be
nice if we could define the usual order relation and prove some of its important properties.
Let us start by proving commutativity. In fact, let us start by seeing just what we need to
prove (writing the steps out somewhat informally for clarity).
We want to prove that x + y = y + x for all x and y. The only proof technique given to
us by the axioms is induction, so we more or less have to use that. It clearly makes no
difference whether we use induction over x or y, so let us use induction over x. That means
we want to prove two things:
0 + y = y + 0   for all y   (–1)
x + y = y + x ⇒ x+ + y = y + x+   for all x and y   (–2)
Now neither of these is given by the axioms, so we step back and try to prove these first
(hoping that we don’t get into some kind of infinite regress here!). To prove (–1), we note
first that by PA4 it is the same as
0 + y = y   for all y .   (–1a)

So we try to prove this by induction over y. For that we need

0 + 0 = 0   (–1b)
0 + y = y ⇒ 0 + y+ = y+   for all y   (–1c)

Here (–1b) is a special case of PA4 and, with the hypothesis 0 + y = y, we can prove
0 + y + = y + using PA5. We should be able to write this out to make a proper proof of (–1);
so now let us look at (–2). If we want to prove this by induction as it stands, we have a
choice of induction over x or y. Either way yields a couple of horrible tasks; for induction
over y for instance we need:

x + 0 = 0 + x ⇒ x+ + 0 = 0 + x+   for all x   (–2x)

(x + y = y + x ⇒ x+ + y = y + x+)
   ⇒ (x + y+ = y+ + x ⇒ x+ + y+ = y+ + x+)   for all x and y   (–2y)

However things are not as bad as they seem: we can prove the right hand side of (–2) without
needing the left hand side, that is, we can prove x+ + y = y + x+ for all x and y without
need of the hypothesis, and that will do. To do this by induction over y we need to prove

x+ + 0 = 0 + x+   for all x   (–2a)

x+ + y = y + x+ ⇒ x+ + y+ = y+ + x+   for all x and y.   (–2b)

But (–2a) is just an instance of (–1), already proved, and (–2b) follows from several appli-
cations of PA5. So now all that is needed is to convert this into a proper semi-formal proof,
which I’ll write out just to show what it looks like.

C.5 Proposition
(∀x)(∀y)( x + y = y + x )

Proof. First we prove


(∀x)( 0̄ + x = x ) (–1)

1   0̄ + 0̄ = 0̄   PA4
2   0̄ + x = x   subhyp
3   0̄ + x+ = (0̄ + x)+   PA5
4   0̄ + x+ = x+   Subst of equals (eq3): 2 into 3
5   0̄ + x = x ⇒ 0̄ + x+ = x+   Ded: 2–4
6   (∀x)(0̄ + x = x ⇒ 0̄ + x+ = x+)   UG
7   (∀x)( 0̄ + x = x )   Induction (PA3a) on 1 and 6

Next we prove
(∀x)(∀y)( x+ + y = (x + y)+ ) (–2)
C. PEANO ARITHMETIC (PA) 157

This is by induction over y. Writing the proof out slightly less formally, firstly,
x+ + 0̄ = x+   by PA4
       = (x + 0̄)+   by PA4 again (x = x + 0̄ and substitute equals).
Now for any y, suppose that x+ + y = (x + y)+. Then
x+ + y+ = (x+ + y)+   by PA5
        = (x + y)++   by IH (“IH” means “Inductive hypothesis”)
        = (x + y+)+   by PA5 again.   □
In case you have any doubts about this, here is the proof again, written out more formally:

1    x+ + 0̄ = x+   PA4
2    x+ = (x + 0̄)+   PA4 again
3    x+ + 0̄ = (x + 0̄)+   A.3 on 1 and 2
4    (∀x)(x+ + 0̄ = (x + 0̄)+)   UG
5    x+ + y = (x + y)+   subhyp
6    x+ + y+ = (x+ + y)+   PA5
7    (x+ + y)+ = (x + y)++   By the subhyp
8    x+ + y+ = (x + y)++   A.3 on 6 and 7
9    (x + y)++ = (x + y+)+   PA5 again
10   x+ + y+ = (x + y+)+   A.3 on 8 and 9
11   x+ + y = (x + y)+ ⇒ x+ + y+ = (x + y+)+   Ded: 5–10
12   (∀x)(x+ + y = (x + y)+ ⇒ x+ + y+ = (x + y+)+)   UG on x
13   (∀y)(∀x)(x+ + y = (x + y)+ ⇒ x+ + y+ = (x + y+)+)   UG on y
14   (∀y)(∀x)( x+ + y = (x + y)+ )   Induction over y, 4 and 13
15   (∀x)(∀y)( x+ + y = (x + y)+ )   F.5 of Ch.3

Finally we prove the main result by induction over y, writing the proof out less formally.
We have
x + 0̄ = x = 0̄ + x   by PA4 and (–1).
Now, for any y, suppose that x + y = y + x. Then
x + y+ = (x + y)+ = (y + x)+ = y+ + x   by PA5, IH and (–2).   □

C.6 Remarks
That was pretty unpleasant for two reasons. The first was that the result we set out to
prove (commutativity of addition) actually required two lemmas first. The second was that
I was at pains to show how an inductive proof really could be expressed within the formal
theory.
But, once one has the hang of how it goes, it is OK to write out inductive proofs in a more
relaxed manner, knowing that they can be translated into formal form if necessary. Here
follows an example.

C.7 Proposition
We prove that addition is associative:

(∀x)(∀y)(∀z)( (x + y) + z = x + (y + z) ) .

Proof. The proof is by induction over z. For the base case,
(x + y) + 0̄ = x + y   by PA4
            = x + (y + 0̄)   by PA4 again
(Notice that the second step also uses substitution of equals.)


For the inductive step we assume that (x + y) + z = x + (y + z) and prove the corresponding
result with z+ in place of z.
(x + y) + z+ = ((x + y) + z)+   by PA5
             = (x + (y + z))+   by IH
             = x + (y + z)+   by PA5 again
             = x + (y + z+)   by PA5 again .

(Again there are a couple of quiet uses of substitution of equals.) “IH” is an abbreviation
for “Inductive hypothesis”. 

C.8 Remarks
(i) It is possible to continue in this vein and prove the usual algebraic properties of
addition and multiplication — associativity, distributivity, cancellation and so on up to prime
number factorisation and beyond. We will not follow this up here, but restrict ourselves to
a few results which are relevant to the discussion of Gödel’s Theorem later.
(ii) While it is difficult to introduce new functions in any useful way into this language
(except by formally extending the language by adding new function symbols and axioms),
there is no problem with adding new relations semiformally. For example, we can define

x ≠ y   as an abbreviation for   ¬(x = y)
x ≤ y   as an abbreviation for   (∃u)(x + u = y)
and x < y   as an abbreviation for   (∃u)(x + u+ = y).
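Interpreted in N these abbreviations agree with the familiar order, and since any witness u must satisfy u ≤ y, the existential search is finite. A quick metalevel spot-check, reusing suc and add from the sketch in Remark C.2(ii):

    def le(x, y):                        # x <= y  iff  (exists u)(x + u = y)
        return any(add(x, u) == y for u in range(y + 1))

    def lt(x, y):                        # x < y   iff  (exists u)(x + u+ = y)
        return any(add(x, suc(u)) == y for u in range(y + 1))

    assert le(3, 3) and lt(2, 5) and not lt(5, 5)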

C.9 Proposition
(∀x)(∀y)( x < y ⇒ x+ < y + )

Proof. From the definition of <, there exists u such that x + u+ = y. Then
x+ + u+ = (x + u+)+ = y+   (using (–2) of Proposition C.5),   and so x+ < y+ .   □

C.10 Proposition
(∀x)( x = 0̄ ∨ (∃y)(x = y + ) )

Proof is by induction over x. Let P(x) be the statement to be proved. Then 0̄ = 0̄ and so
P(0̄) is true.

Now for any x, x+ = x+


and so (∃y)(y + = x+ ) .
Therefore x+ = 0̄ ∨ (∃y)(y + = x+ ) ,
in other words, P (x+ ) .
But then, regardless of the truth of P (x), we have P (x) ⇒ P (x+ ) .
Now, by UG we have (∀x)(P (x) ⇒ P (x+ )) .
But then, by induction, we have (∀x)P(x) .   □

C.11 Corollary
(∀x)( x ≠ 0̄ ⇒ (∃y)(x = y+) )

(Not so much a corollary as a restatement.)

C.12 Proposition: Trichotomy


(∀x)(∀y)( (x < y) ∨ (x = y) ∨ (y < x) ) .
Notice that this is the same as

(∀x)(∀y)( (∃z)(x + z + = y) ∨ (x = y) ∨ (∃z)(y + z + = x) ) .

Proof is by induction over x. Suppose first that x = 0̄. Then either y = 0̄, in which case
x = y and the result is true, or else y ≠ 0̄, in which case by Corollary C.11 there is a u such
that y = u+. But then y = x + u+, so x < y and again the result is true.
Now we want to prove (x+ < y) ∨ (x+ = y) ∨ (y < x+). Either y = 0̄, in which case
x+ = y + x+, so y < x+, or else y ≠ 0̄. Then, by Corollary C.11, there is a u such that
y = u+. By IH, (x < u) ∨ (x = u) ∨ (u < x) and now, using Proposition C.9, the result
follows.   □

C.13 Proposition: The usual full-order properties


(i) (∀x)( x ≤ x ) (Reflexivity)
(ii) (∀x)(∀y)( x ≤ y ∧ y ≤ x ⇒ x = y ) (Antisymmetry)
(iii) (∀x)(∀y)(∀z)( x ≤ y ∧ y ≤ z ⇒ x ≤ z ) (Transitivity)

(iv) (∀x)(∀y)( x ≤ y ∨ y ≤ x ) (Dichotomy)



Proof. (i) x + 0̄ = x.
(ii) and (iii) are left as an interesting exercise.
(iv) follows easily from Trichotomy, above. 

Question: is it possible in this language to express the fact that the Natural Numbers are
well-ordered?

C.14 The Sorites Paradox


Here is something from Wikipedia (https://en.wikipedia.org/wiki/Continuum_fallacy):

Imagine grains of sand in a bag. I can lift the bag when it contains one grain
of sand. If I can lift the bag with N grains of sand then I can certainly lift it
with N+1 grains of sand (for it is absurd to think that I can lift N grains but
adding a single grain makes it too heavy to lift). Therefore, I can lift the bag
when it contains any number of grains of sand, even if it contains five tons of
sand.

I would prefer that the second sentence here was “I can lift the bag when it is empty”, since
we start counting at zero in this chapter, but that doesn’t really matter.

The argument is easily formalised. Write P (N ) for the statement “I can lift the bag with N
grains of sand”. Then it goes

P (0)
(∀N )(P (N ) ⇒ P (N + 1))
therefore (∀N )P (N )

The conclusion is obviously wrong — so where is the problem?



D A stripped-down version of elementary arithmetic (RA)

D.1 Discussion
In Section C we introduced the first-order theory PA, Peano Arithmetic, otherwise known
as the Elementary Theory of Arithmetic, which describes the natural numbers and basic
arithmetic based on Peano’s axioms. In this section we introduce a simpler theory, which I
will call Robinson Arithmetic, RA (after its proposer Raphael Robinson). It uses the same
language as PA but different axioms, giving rise to a weaker theory (that is, all the theorems
of RA are theorems of PA, but there are theorems of PA which are not theorems of RA).
Why bother? Well, there are two reasons why this theory is important. Firstly, it is strong
enough to prove the Representability Theorem of Chapter 10. Gödel’s Incompleteness The-
orem applies to any first-order theory strong enough to prove the Representability Theorem.
Thus showing that the Representability Theorem can be proved in a weaker theory means
that Gödel’s theorem has wider applicability. Secondly, we will also see that, if there
exists a first-order theory in which the Representability Theorem can be proved and which
is finitely axiomatisable, then plain Predicate Logic is incomplete and undecidable. RA is
finitely axiomatisable, which will establish these important facts, whereas PA is not.

D.2 Definition: RA
RA is a first-order theory with equality and four formal function symbols:
0̄ zero nullary
suc successor unary
+ addition binary
. multiplication binary

Just as with PA, we use a bar over the zero symbol to distinguish between 0̄, the constant
symbol in the language, and 0, the actual zero in N. We also use the notation x+ for the
successor function suc(x) and the usual infix notation for addition and multiplication, writing
x + y for +(x, y) and x.y or xy for .(x, y) .
Here are the axioms of the theory RA.
(PL1 – PL5) The ordinary axioms of Predicate Logic.

Axioms of equality (finite version):


(E1) (∀x)( x = x )
(E2) (∀x)(∀y)( x = y ⇒ y = x )
(E3) (∀x)(∀y)(∀z)( x = y ∧ y = z ⇒ x = z )

(E4) (∀x)(∀x′)( x = x′ ⇒ x+ = x′+ )

(E5) (∀x)(∀y)(∀x′)(∀y′)( x = x′ ∧ y = y′ ⇒ x + y = x′ + y′ )
(E6) (∀x)(∀y)(∀x′)(∀y′)( x = x′ ∧ y = y′ ⇒ xy = x′y′ )

And the special axioms for Robinson Arithmetic:



(RA1) (∀x)¬(x+ = 0̄)


(RA2) (∀x)(∀y) (x+ = y + ⇒ x = y)
(RA3) (∀x) (x ≠ 0̄ ⇒ (∃u)(u+ = x))
(RA4) (∀x) (x + 0̄ = x)

(RA5) (∀x)(∀y) (x + y + = (x + y)+ )


(RA6) (∀x) (x.0̄ = 0̄)
(RA7) (∀x)(∀y) (x.(y + ) = (x.y) + x)

There are equivalent forms of Axioms (E5) and (E6), namely:


(E5′) (∀x)(∀y)(∀z)( x = y ⇒ x + z = y + z ∧ z + x = z + y )
(E6′) (∀x)(∀y)(∀z)( x = y ⇒ xz = yz ∧ zx = zy )
Notice that there is hardly any difference between this and the axioms of PA. I have given
the Axioms of Equality in their finite form, but they could just as well have been given
in the ordinary form. I only gave them this way to emphasise that this theory is finitely
axiomatisable.
The main difference is that Axiom PA3 has been replaced by Axiom RA3. The loss of
the Induction Axiom robs RA of a very powerful proof method, so we should expect many
theorems of PA not to be theorems of RA.
Note that Axiom RA2 is actually an equivalence, the reverse implication being given by
substitution of equals; also Axiom RA1 tells us that Axiom RA3 is also an equivalence. We
can express these:
(RA2′) (∀x)(∀y) (x+ = y+ ⇔ x = y)
(RA3′) (∀x) (x ≠ 0̄ ⇔ (∃u)(u+ = x))

D.3 Order
The axioms of RA do not imply that addition is commutative, so we must be a bit careful
about how we define the order relation here. We actually define it “on the opposite side”
from the way we defined it for PA:
We define
x < y   to mean   (∃u)(u+ + x = y)
x ≤ y   to mean   x < y ∨ x = y.

D.4 Warning
Be very careful with formal proofs in this theory. There is no reason to suppose that the usual
laws, such as associativity and commutativity of addition and multiplication, are provable
in this theory. Consequently, many of the simple arithmetic manipulations we are used to
doing are not justified.
At the end of the next chapter (Section 5.G) a model is constructed (and another related
one outlined) which prove that nearly all the standard manipulations we are used to are
not justified by theorems in this theory: these include commutativity, associativity and
cancellation of addition and even the behaviour of zero. (0̄ + x = x is not a theorem!)
Note also that proof by induction is not justified by these axioms, so induction may not be
used in any formal proof within the theory. We are, however, allowed to make inductive
arguments in metaproofs about the theory (because for metaproofs we are using ordinary
mathematics).

Nevertheless, the theory does have some familiar theorems. We now set about developing
just enough results in this theory for the requirements of the Representability Theorem.

D.5 Proposition: some properties of order


(i)
(∀x)( ¬(x < 0̄) )
that is, no x is < 0̄.

(ii)
(∀x)(∀y)( x ≤ y ∧ x ≠ y ⇒ x < y ).
Note that the opposite implication is not a theorem of RA.

(iii)
(∀x)(∀y)( x < y ⇔ x+ < y+ )
(∀x)(∀y)( x ≤ y ⇔ x+ ≤ y+ )

(iv)
(∀x)( x ≤ 0̄ ⇒ x = 0̄ )

Proof (i). Suppose there is an x such that x < 0̄. Then there is u such that u+ + x = 0̄.
But then x ≠ 0̄, because x = 0̄ would give u+ = 0̄ (by RA4), contradicting RA1. Then, by
RA3, there is y such that x = y+. But now we have u+ + y+ = 0̄, that is, (u+ + y)+ = 0̄
(by RA5) and this contradicts RA1 again.
(ii) This follows immediately from the definition of x ≤ y.
(iii) For any x and y,

x < y ⇔ (∃u)( u+ + x = y )   defn. of <
      ⇔ (∃u)( (u+ + x)+ = y+ )   by RA2′
      ⇔ (∃u)( u+ + x+ = y+ )   by RA5
      ⇔ x+ < y+   defn. of < again

and
x ≤ y ⇔ x < y ∨ x = y   defn. of ≤
      ⇔ x+ < y+ ∨ x+ = y+   by the last proof and RA2′
      ⇔ x+ ≤ y+   defn. of ≤ again.

(iv) Since x ≤ 0̄ ⇔ x < 0̄ ∨ x = 0̄, this follows from (i). 

D.6 Definition: bar notation


Just as with Peano Arithmetic, for any natural number ξ, we define the term ξ¯ by induction
as follows:

0̄ is already defined
(ξ + 1)¯ = (ξ¯)+

and note that the second line above can be written

(ξ+)¯ = (ξ¯)+

The next proposition shows that the function ξ ↦ ξ¯ preserves all the relevant structure of
N.

D.7 Proposition
For any natural numbers µ and ν,
(i) (µ+)¯ = µ̄+ ;
(ii) (µ + ν)¯ = µ̄ + ν̄ ;
(iii) (µν)¯ = µ̄ ν̄ ;
(iv) For all µ ∈ N, µ̄ ≠ µ̄+ ;
(v) The map µ ↦ µ̄ (from N into the language of RA) is one-to-one.
(vi) µ̄ < ν̄ if and only if µ < ν.
(vi′) µ̄ ≤ ν̄ if and only if µ ≤ ν.
(vii) For any ν ∈ N, (∀x)(x ≤ ν̄ ∨ ν̄ ≤ x) .
(vii′) For any ν ∈ N, (∀x)(x < ν̄ ∨ x = ν̄ ∨ ν̄ < x) .
(viii) For any ν ∈ N, (∀x)(∀y)(x + ν̄ = y + ν̄ ⇒ x = y) .

Proof (i). This is part of the definition of bar notation.



(ii) is proved by induction over ν. I will give the proof in detail to show that we are not
making any unjustified assumptions. First we prove the zero case (µ + 0)¯ = µ̄ + 0̄:
(µ + 0)¯ = µ̄   since µ + 0 = µ in N
         = µ̄ + 0̄   by RA4
For the inductive step we assume that (µ + ν)¯ = µ̄ + ν̄ and prove that (µ + ν+)¯ = µ̄ + (ν+)¯ :
(µ + ν+)¯ = ((µ + ν)+)¯   since µ + ν+ = (µ + ν)+ in N
          = ((µ + ν)¯)+   by definition of bar notation
          = (µ̄ + ν̄)+   by the inductive hypothesis
          = µ̄ + ν̄+   by RA5
          = µ̄ + (ν+)¯   by definition of bar notation

(iii) The proof is similar to that of (ii).

(iv) Suppose that there is some µ ∈ N such that µ̄ = µ̄+. Then let µ be the least such
number. Then µ ≠ 0, since 0̄ ≠ 0̄+ by RA1. So, writing ν = µ − 1, we have µ̄ = ν̄+,
ν̄++ = µ̄+ = µ̄ = ν̄+ and then ν̄ = ν̄+ by RA2, contradicting the choice of µ.
(v) Proof by contradiction. Suppose that there are µ, ν ∈ N such that µ ≠ ν but µ̄ = ν̄.
We may assume without loss of generality that µ < ν. We may suppose further that,
amongst all such pairs, we have chosen one so that ν − µ is the smallest. Now, if ν ≥ 2 + µ
we have µ < 1 + µ < ν and then, by choice of µ and ν, µ̄ = (1 + µ)¯ = ν̄, a contradiction.
Therefore ν = µ+, contradicting (iv).
(vi) If µ < ν then there is some ξ ∈ N such that ξ+ + µ = ν (a property of N). But then
(ξ¯)+ + µ̄ = ν̄ by (ii) and (i), so µ̄ < ν̄.
For the converse, suppose that µ̄ < ν̄. We prove that µ < ν by induction over µ.
If µ = 0 we have 0̄ < ν̄, that is, (∃u)(u+ + 0̄ = ν̄). Then (∃u)(u+ = ν̄), so ν̄ ≠ 0̄
by RA1 and thus ν ≠ 0, so µ = 0 < ν.
If µ ≠ 0, noting that ν ≠ 0 too by D.5(i), there are ξ and η in N such that µ = ξ+
and ν = η+. Thus (ξ¯)+ < (η̄)+, in other words, (∃u)(u+ + (ξ¯)+ = (η̄)+). But then
(∃u)((u+ + ξ¯)+ = (η̄)+)   by RA5, from which
(∃u)(u+ + ξ¯ = η̄)   by RA2. But then
ξ¯ < η̄
and, by the inductive hypothesis, ξ < η and so µ < ν.
(vi′)
µ̄ ≤ ν̄ if and only if µ̄ < ν̄ ∨ µ̄ = ν̄
      if and only if µ < ν ∨ µ = ν
      if and only if µ ≤ ν .

(vii) The proof is by induction over ν. If ν = 0, the result is immediate: (∀x)(0̄ ≤ x) .

Now assume that ν = µ+ and that the result is true for µ, that is, that
(∀x)(x ≤ µ̄ ∨ µ̄ ≤ x) .
Now, when x = 0̄ the result is immediate also, so we may suppose that x ≠ 0̄ and thus that
there is y such that x = y+. But then our inductive hypothesis tells us that either y ≤ µ̄ or
µ̄ ≤ y, in which case either y+ ≤ µ̄+ or µ̄+ ≤ y+, that is, x ≤ ν̄ or ν̄ ≤ x, as required.
(vii′) This follows immediately from (vii) using the definition of ≤.

(viii) Proof is by induction over ν. If ν = 0, this is (∀x)(∀y)(x + 0̄ = y + 0̄ ⇒ x = y),
which is given by RA4.
Assuming the result is true for ν as given, for any x and y, x + ν̄+ = y + ν̄+ gives
(x + ν̄)+ = (y + ν̄)+ by RA5, and then x + ν̄ = y + ν̄ by RA2, from which x = y
by the inductive hypothesis.   □

D.8 Proposition
For any ν ∈ N,

x ≤ ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄   (–1)
and, if ν > 0,   x < ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = (ν − 1)¯   (–2)

And note that these can be restated:

If x ≤ ν̄ there is ξ ≤ ν such that x = ξ¯,
and, for ν > 0, if x < ν̄ there is ξ < ν such that x = ξ¯.

Proof. For (–1): this is by induction over ν. In the case ν = 0 we want x ≤ 0̄ ⇒ x = 0̄
and this is Proposition D.5(iv).
For the inductive step we assume that
For the inductive step we assume that

x ≤ ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄

and prove that

x ≤ ν̄ + ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄ ∨ x = ν̄ +

If x = 0̄ this is immediate, so we assume that x ≠ 0̄. But then (by RA3) there is a y such
that x = y+. We now have y+ ≤ ν̄+ and so, by Proposition D.5(iii), y ≤ ν̄. The inductive
hypothesis now gives
y = 0̄ ∨ y = 1̄ ∨ . . . ∨ y = ν̄
from which
y + = 0̄+ ∨ y + = 1̄+ ∨ . . . ∨ y + = ν̄ +
which is
x = 1̄ ∨ x = 2̄ ∨ . . . ∨ x = ν̄ +
and we are done.

The proof of (–2) is exactly the same. 



D.9 Proposition
For any expression P (x) and any natural number ν,

P(0̄) ∧ P(1̄) ∧ . . . ∧ P(ν̄) ⇔ (∀x ≤ ν̄)P(x)   (–1)

and, if ν > 0,   P(0̄) ∧ P(1̄) ∧ . . . ∧ P((ν − 1)¯) ⇔ (∀x < ν̄)P(x) .   (–2)

Proof.

(∀x ≤ ν̄)P(x) ⇔ (∀x)( x ≤ ν̄ ⇒ P(x) )   (–1)
  ⇔ (∀x)( (x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄) ⇒ P(x) )   (–2)
  ⇔ (∀x)( (x = 0̄ ⇒ P(x)) ∧ (x = 1̄ ⇒ P(x)) ∧ . . . ∧ (x = ν̄ ⇒ P(x)) )   (–3)
  ⇔ (∀x)( P(0̄) ∧ P(1̄) ∧ . . . ∧ P(ν̄) )   (–4)
  ⇔ P(0̄) ∧ P(1̄) ∧ . . . ∧ P(ν̄)   (–5)

These steps perhaps need some explanation.


(–1) is just by the definition of the (∀x ≤ ν̄) notation.

(–2) is by the preceding Proposition.


(–3) is more interesting. Back in 2.F.11(i) and (ii) we had a couple of results which could
be summarised
(A ∨ B ⇒ C) ⇔ (A ⇒ C) ∧ (B ⇒ C)
Using induction, this easily generalises to

(A1 ∨ A2 ∨ · · · ∨ An ⇒ C) ⇔ (A1 ⇒ C) ∧ (A2 ⇒ C) ∧ · · · ∧ (An ⇒ C)

(Note the change from ∨ to ∧ here). This is what is used in this step.
(–4) is because (x = 0̄ ⇒ P(x)) ⇔ P(0̄) and so on for the other terms.
(–5) is one of the rare cases in which we use the fact that, in an expression of the form
(∀x)(An expression which does not contain x free), one can simply remove the (∀x) quantifier.
(See 3.E.)
The proof of (–2) is the same. 

E The Elementary Theory of Groups


E.1 The theory
The Elementary theory of groups is a first-order theory with equality, three function symbols:

one nullary, e the identity


one unary, n the inverse
one binary, m multiplication

no relation symbols (other than equality) and the following proper axioms:—

(G1) (∀x)(∀y)(∀z) m(x, m(y, z)) = m(m(x, y), z)

(G2) (∀x) m(x, e) = x

(G3) (∀x) m(x, n(x)) = e .
If we employ a more usual notation,

e is the identity 1
n(x) is the inverse x−1
m(x, y) is the product xy

then the axioms look more familiar:


(G1) x(yz) = (xy)z   for all x, y and z

(G2) x1 = x   for all x

(G3) xx⁻¹ = 1   for all x.
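Any concrete structure in which (G1)–(G3) hold is, in the terminology of the next chapter, a model of this theory. A quick metalevel spot-check sketch using the integers mod 5 (additive notation, as mentioned in E.3 below), with e = 0, n(x) = −x mod 5 and m(x, y) = x + y mod 5:

    G = range(5)
    e = 0
    n = lambda x: (-x) % 5
    m = lambda x, y: (x + y) % 5

    # (G1) associativity, (G2) right identity, (G3) right inverse:
    assert all(m(x, m(y, z)) == m(m(x, y), z) for x in G for y in G for z in G)
    assert all(m(x, e) == x for x in G)
    assert all(m(x, n(x)) == e for x in G)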

E.2 Remarks
(i) This is a stripped-down version of the axioms. In particular we do not assert

(G2′)   1x = x
or (G3′)   x⁻¹x = 1

as axioms. These can be proved as theorems. To find the proofs is an interesting exercise
(hint: prove G3′ first).
(ii) This theory cannot encompass the whole of Group Theory as we know it. As it is set
up, all variables must be elements of the same group and so the theory can only talk about
one group at a time (a single group constitutes the whole “universe of discourse”). Thus the
theory is not categorical, in the sense of the question posed at the end of Section 1.A. This
will be made precise in Chapter 5.
This also means that the theory cannot talk about homomorphisms, subgroups, direct prod-
ucts etc. etc. It is pretty limited.

E.3 Alternative axiomatisations


There are many other (equivalent) ways of axiomatising the Elementary Theory of Groups.
I will only mention one here, because it corresponds to what is perhaps the most common
way of defining groups in introductory texts. In this form, there is only one function,
multiplication, and the axioms change to

(GA1) (∀x)(∀y)(∀z)( x(yz) = (xy)z )

(GA2) (∃e)(∀x)( xe = x ∧ (∃y)(xy = e) ) .

In developing the theory from this base, the first task is to show that the elements e and
y mentioned in GA2 are in fact unique. Then they give rise to the identity and inverse as
functions defined by description.
And, of course, you can use additive notation. It uses the same axioms, but different symbols.

F Unbounded dense linear orders (UDLO)

F.1 Discussion
It seems reasonable to expect that any complete theory should be categorical. Look at it
this way: to say that a theory is complete means that any sentence, that is, any simple
statement with no free variables, is either provably true or provably false. And to say that
a theory is categorical means that any two structures that can be described by the language
and for which all the theorems of the theory are true must be isomorphic. Completeness
means that anything that the language can express that is true in one of the structures must
be true in the other also. How then can they possibly fail to be isomorphic?
Well, they can, and the theory described in this section is complete but not categorical.
The definition of “categorical” is at present a bit vague (what do I mean by a “structure the
language can describe” and by “isomorphic”?); it will be made precise in the next chapter
and then the fact that UDLO is non-categorical will be clear. In this section we will prove
the harder assertion, that UDLO is complete.
It is the fact that UDLO is both complete and non-categorical that is important here. The
proof is longish and quite intricate, and is given here for completeness. I would recommend
skimming through it to get a rough idea how it goes (the argument is quite interesting) but
do not feel you have to read it in detail unless you become fascinated by it.

F.2 The theory


The Elementary Theory of Unbounded Dense Linear Orders (UDLO) is a first-order theory
with equality, no function symbols and one extra relation < which will (semi-formally) be
written in infix notation, as is usual. It has the following proper axioms:—
(UDLO1) (∀x)(∀y)( x < y ∨ x = y ∨ y < x )

(UDLO2) (∀x)(∀y)¬( x < y ∧ y < x )


(UDLO3) (∀x)(∀y)(∀z)( x < y ∧ y < z ⇒ x < z )
(UDLO4) (∀x)(∀y)( x < y ⇒ (∃z)(x < z ∧ z < y) )
(UDLO5) (∀x)( (∃z)(z < x) ∧ (∃z)(x < z) )
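The rationals satisfy these axioms: density (UDLO4) is witnessed by the midpoint, and unboundedness (UDLO5) by x − 1 and x + 1. A tiny metalevel sketch, checking one sample pair:

    from fractions import Fraction

    def density_witness(x, y):           # UDLO4: some z with x < z < y
        assert x < y
        z = (x + y) / 2
        assert x < z < y
        return z

    def unbounded_witnesses(x):          # UDLO5: some z < x and some z > x
        return x - 1, x + 1

    print(density_witness(Fraction(1, 3), Fraction(1, 2)))   # 5/12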

F.3 Remarks
(i) Axioms UDLO1–UDLO3 express the fact that the system is fully ordered, in terms
of a strict order relation <. Well, almost. They need to imply transitivity, and this is done
by UDLO3. They also need to imply that, for any x and y, exactly one of x < y, x = y or
y < x is true. Now UDLO1 implies that at least one of these is true and UDLO2 implies
that x < y and y < x cannot both be true. It remains to ensure that x < y and x = y
cannot both be true, for then it follows that x = y and y < x cannot both be true either.
But this is given with the help of UDLO4 (proof as an easy exercise).

In fact, UDLO4 is there for another reason (see the next remark), but it does this as a side
effect, so there is no need for an extra axiom.
(ii) Axiom UDLO4 expresses density: between any two distinct elements there is another.
Axiom UDLO5 says that the system is unbounded: it has neither a least nor a greatest
element.

(iii) In our semi-formal exposition we allow the usual notation, ≠, >, ≤ and ≥, defined
in the usual way.
(iv) In what follows it will be very convenient to allow the symbols T and F for “true”
and “false”. We can either add these as part of the logical background or add T as a new
nullary relation together with the defining axiom

T

and then define F as a semi-formal abbreviation for ¬T.

F.4 Proposition
UDLO is not categorical.
Proof (sort of ). The Rationals Q and the Reals R are both systems described by UDLO;
they are clearly not isomorphic. 

As already remarked, the proof above is a wee bit vague, due mainly to the slight vagueness
in our definition of “categorical”. What we are really saying here is that Q and R are non-
isomorphic models of UDLO. In Chapter 5 we will define all these terms properly and this
theorem and its proof will become watertight.

The remainder of this section is devoted to a proof that UDLO is complete. Throughout,
the symbol ⊢ means, of course, provable in UDLO.

F.5 Proposition
Let x1, x2, . . . , xn, z be variable symbols (n ≥ 1). Then

⊢ (∃z)( x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xn ≤ z ∧ (x1 = z ∨ x2 = z ∨ . . . ∨ xn = z) ) .   (1)

This expresses the idea that the set {x1, x2, . . . , xn} has a maximum member, using the
language available to us. In the same way we can express the fact that the set has a
minimum member:

⊢ (∃z)( z ≤ x1 ∧ z ≤ x2 ∧ . . . ∧ z ≤ xn ∧ (z = x1 ∨ z = x2 ∨ . . . ∨ z = xn) ) .   (2)

Proof. We prove the first statement only, by induction over n. In the case n = 1,

(∃z)( x1 ≤ z ∧ x1 = z )

is obvious (set z = x1 ).

Now in the case n ≥ 2, and slipping into more informal style, we assume as inductive
hypothesis that there exists z′ such that

x1 ≤ z′ ∧ x2 ≤ z′ ∧ . . . ∧ xn−1 ≤ z′ ∧ (x1 = z′ ∨ x2 = z′ ∨ . . . ∨ xn−1 = z′) .   (3)

Now we have two possibilities: either
z′ ≤ xn   (4)
or
xn ≤ z′ .   (5)
In the first case, set
z = xn   so that   z′ ≤ z   (6)
and we have x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xn−1 ≤ z from (3) and (6). Also z = xn , so we
have

x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xn ≤ z ∧ (x1 = z ∨ x2 = z ∨ . . . ∨ xn = z) .

and (1) follows. In the second case, set

z = z′   (7)

and we have

x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xn−1 ≤ z ∧ (x1 = z ∨ x2 = z ∨ . . . ∨ xn−1 = z)

from (3) and (7); and since xn ≤ z by (5), (1) follows again.   □

You will notice that we have used both the Choice Rule and the Constructive Dilemma in
this proof.

F.6 Note
The atomic expressions of UDLO are those of one of the forms

T , x=y or x<y

where x and y are variable symbols. By simple expressions we will mean expressions which
are either atomic expressions or negations of atomic expressions, so these are expressions of
one of the forms

T , ¬T , x=y , ¬(x = y) , x<y or ¬(y < x) .

These are the same, or equivalent to, expressions of one of the forms

T , F , x = y , x ≠ y , x < y or x ≤ y .

By a C-term we will mean an expression of the form

A1 ∧ A2 ∧ . . . ∧ Am (m ≥ 1)

where each Ai is an atomic expression and there are no repetitions (that is, Ai ≠ Aj
whenever i ≠ j).
By a DC-form we will mean an expression which is either T or F or of the form

B1 ∨ B2 ∨ . . . ∨ Bn (n ≥ 1)

where each Bi is a C-term and again there are no repetitions (that is, Bi ≠ Bj whenever
i ≠ j).
For the next few propositions, we are going to have to discuss the set of free variables which
occur in our expressions. To save repeating the same phrase over and over,

For any expression P we will write v(P ) for the set of free variables in P .

F.7 Proposition
Let P be any expression in UDLO which contains no quantifiers. Then

⊢ P ⇔ P∗ ,

where P ∗ is a DC-form and v(P ∗ ) ⊆ v(P ) .

Proof. In this proof, when I say that two expressions are equivalent, that means of course
that they can be proved equivalent in UDLO. Also, the associativity of ∧ and ∨ will be
used frequently without comment.
Step 0 First some trivialities:
(a) Any atomic expression is a C-term.

(b) Any C-term is a DC-form.


(c) Consequently, any atomic expression is a DC-form.
Step 1 If P1, P2, . . . , Pn are atomic expressions or C-terms, then P1 ∨ P2 ∨ . . . ∨ Pn is
equivalent to a DC-form.

Proof. By Step (0a), P1 ∨P2 ∨ . . . ∨Pn is a list of C-terms OR-ed together. If there are no
repetitions, this is already a DC-form. Otherwise, using commutativity of ∨, we can rear-
range the list so that identical atomic expressions are side by side and then use idempotence
of ∨ to remove the repetitions.
Step 2 If P1, P2, . . . , Pn are DC-forms, then P1 ∨ P2 ∨ . . . ∨ Pn is equivalent to a DC-form.

Proof. If n = 1 then P1 ∨ P2 ∨ . . . ∨ Pn is just P1 and is already a DC-form. Now suppose
that n ≥ 2.
If any Pi is T, then P1 ∨P2 ∨ . . . ∨Pn is equivalent to T, which is a DC-form.
If any Pi is F, then P1 ∨P2 ∨ . . . ∨Pn is equivalent to the same list with that F removed, and
the result follows by induction over n.

Otherwise P1 ∨P2 ∨ . . . ∨Pn is a list of C-terms OR-ed together. The proof now goes as for
Step 1.
Step 3 If P1, P2, . . . , Pn are C-terms, then P1 ∧ P2 ∧ . . . ∧ Pn is equivalent to a C-term.
Proof. (The proof is similar to that for Step 1). P1 ∧P2 ∧ . . . ∧Pn is a list of atomic expres-
sions AND-ed together. If there are no repetitions, this is already a C-term. Otherwise,
using commutativity of ∧, we can rearrange the list so that identical atomic expressions are
side by side and then use idempotence to remove the repetitions.
Step 4a If P and Q are DC-forms, then P ∧ Q is equivalent to a DC-form.
Proof. If P is T, then P ∧ Q is equivalent to Q, a DC-form. The proof is the same if Q is
T.
If either P or Q is F, then P ∧ Q is equivalent to F, again a DC-form.
Otherwise both P and Q are lists of C-terms OR-ed together,
P = C1 ∨C2 ∨ . . . ∨Cm
and Q = D1 ∨D2 ∨ . . . ∨Dn say.
Then, using distributivity, P ∧ Q is equivalent to
(C1 ∧ D1) ∨ (C1 ∧ D2) ∨ . . . ∨ (Cm ∧ Dn) ,   (–1)
where the list contains all of the pairs (Ci ∧ Dj). But each of these pairs is equivalent to a
C-term, by Step 3, so (–1) is equivalent to a list of C-terms OR-ed together. But then
this is equivalent to a DC-form, by Step 1.

Step 4b If P1, P2, . . . , Pn are DC-forms, then P1 ∧ P2 ∧ . . . ∧ Pn is equivalent to a DC-form.

Proof. This follows from Step 4a by induction over n.
Step 5 If A is an atomic expression, then ¬A is equivalent to a DC-form.
Proof. Look at the cases.
¬T ⇔ F
¬(x < y) ⇔ (y < x) ∨ (y = x)
¬(x = y) ⇔ (x < y) ∨ (y < x)

Step 6 If P is a C-term, then ¬P is equivalent to a DC-form.

Proof. Let P be A1 ∧ A2 ∧ . . . ∧ An, where the Ai are atomic expressions. Then, by De
Morgan’s law, ¬P is equivalent to ¬A1 ∨ ¬A2 ∨ . . . ∨ ¬An. By Step 5, each ¬Ai is equivalent
to a DC-form and then, by Step 1, ¬P is equivalent to a DC-form.
Step 7 If P is a DC-form, then ¬P is equivalent to a DC-form.
Proof. Let P be C1 ∨ C2 ∨ . . . ∨ Cn, where the Ci are C-terms. Then, by De Morgan’s law,
¬P is equivalent to ¬C1 ∧ ¬C2 ∧ . . . ∧ ¬Cn. By Step 6, each ¬Ci is equivalent to a DC-form
and then, by Step 4b, ¬P is equivalent to a DC-form.
F. UNBOUNDED DENSE LINEAR ORDERS (UDLO) 175

Step 8 (The main result at last) If P is any expression with no quantifiers, then it is
equivalent to a DC-form P∗ .
Proof is by induction over the construction of P .
If P is an atomic expression, then it is already a DC-form.

If P is ¬Q where, by the inductive hypothesis, Q is equivalent to a DC-form, then the result
is given by Step 7 above.
If P is Q ⇒ R where, by the inductive hypothesis, we may assume that both Q and R are
equivalent to DC-forms, Q∗ and R∗ say, then P is equivalent to Q∗ ⇒ R∗ . But this means
that P is equivalent to ¬Q∗ ∨ R∗ . Now, by Step 7, ¬Q∗ is equivalent to a DC-form, Q∗∗
say. This in turn means that P is equivalent to Q∗∗ ∨ R∗ , which is equivalent to a DC-form,
P∗ say, by Step 2.
It only remains to point out that at no stage in this process have any new variables been
introduced, so v(P ∗ ) ⊆ v(P ) . 
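Steps 1–8 amount to the familiar conversion to disjunctive normal form. Here is a compact metalevel sketch under a hypothetical tuple encoding: atoms are strings, and compound expressions are tuples ('not', P), ('and', P, Q), ('or', P, Q) or ('imp', P, Q) for ⇒. A result is a list of C-terms, each a list of literals; Step 5's rewriting of negated atoms back into atoms is left as a stub, marked in the comments.

    def dnf(p):
        if isinstance(p, str):
            return [[p]]                              # an atom is a C-term
        op = p[0]
        if op == 'imp':                               # P => Q  is  not P or Q
            return dnf(('or', ('not', p[1]), p[2]))
        if op == 'or':                                # Step 2
            return dnf(p[1]) + dnf(p[2])
        if op == 'and':                               # Steps 3, 4a and 4b
            return [c + d for c in dnf(p[1]) for d in dnf(p[2])]
        q = p[1]                                      # here op == 'not'
        if isinstance(q, str):
            return [['not ' + q]]     # Step 5 would rewrite this literal
        if q[0] == 'not':
            return dnf(q[1])                          # double negation
        if q[0] == 'imp':
            return dnf(('and', q[1], ('not', q[2])))  # not(P => Q)
        inner = 'or' if q[0] == 'and' else 'and'      # De Morgan: Steps 6, 7
        return dnf((inner, ('not', q[1]), ('not', q[2])))

    print(dnf(('not', ('and', 'x<y', ('or', 'y<z', 'x=z')))))
    # [['not x<y'], ['not y<z', 'not x=z']]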

F.8 Proposition
Let P be any expression in UDLO which contains no quantifiers. Then

⊢ (∃z)P ⇔ Q

where Q is a DC-form and v(Q) ⊆ v(P) ∖ {z} .

Proof. By the previous proposition we may assume that P is a DC-form, that is,

P = C1 ∨C2 ∨ . . . ∨Cn

where each Ci is a C-term. But then (by Predicate Logic)

(∃z)P = (∃z) C1 ∨ (∃z)C2 ∨ . . . ∨ (∃z)Cn

so it is sufficient to prove the result in the case that P is in fact a C-term. So let us assume
that
P = A1 ∧ A2 ∧ · · · ∧ Am ,
where the Ai are all atomic expressions (let’s call them its factors). The proof is by induction
over m, the number of factors in P .
Firstly note that, if none of the factors Ai mention z then the result is trivially true with Q
the same as P . So from now on we may assume that at least one of the factors mentions z.

Next observe that if any one of the factors Ai does not mention z, then it may be “taken
outside”; for instance, if A1 does not mention z, then

(∃z)(A1 ∧ A2 ∧ · · · ∧ An ) ⇔ A1 ∧ (∃z)(A2 ∧ · · · ∧ An )

and then the inductive hypothesis gives the result. So from now on we may assume that all
the factors A1 , A2 , . . . , An mention z.

Now the factors of P are each of one of the following forms (where here x stands for any
variable symbol which is different from z).

z = x , x = z , z < x , x < z , z = z , z < z .

If P contains a factor of the form z = z, then that factor may be simply eliminated, since it
is equivalent to T, and then the inductive hypothesis applies. If P contains a factor of the
form z < z, then P ⇔ F since that factor is equivalent to F.

Suppose now that P contains a factor of the form z = x or x = z. Rearranging the factors
if necessary, P is equivalent to an expression of the form (z = x) ∧ R(z), where R(z) is the
conjunction of all the remaining factors. (I have made the variable z explicit here because
we are going to substitute for it.) Now

(∃z)P ⇔ (∃z)( (z = x) ∧ R(z) )
      ⇔ R(x)

and R(x) is a conjunction of m − 1 atomic expressions, so again the inductive hypothesis


applies.

From now on we may assume that all of the factors of P are of one of the forms z < x and
x < z.
Suppose now that P contains two such factors with the same variable x. If the two factors
are in fact identical, then they can be replaced by one copy and the inductive hypothesis
applies. On the other hand, if the two factors are different, they must be of the form z < x
and x < z, with the same x. But

z<x ∧ x<z ⇔ F

and then (∃z)P ⇔ F , as required.


From now on we may assume that the variable symbol x which turns up in each factor of P
is different. We now need to write P out more explicitly. There are variable symbols

x1 , x2 , . . . , xp , y1 , y2 , . . . , yq ,

all distinct, such that

P = (x1 < z) ∧ (x2 < z) ∧ · · · ∧ (xp < z) ∧ (z < y1) ∧ (z < y2) ∧ · · · ∧ (z < yq)   (1)

I will prove that then (∃z)P is equivalent to the expression Q which is the conjunction of
all the simple expressions

xi < yj for i = 1 . . . p and j = 1 . . . q (2)

(if there are no such expressions at all, we take this to mean that Q is T).
It is not difficult to see that, in ordinary mathematics, (1) and (2) are equivalent (the proof
is “stare hard”). The point here is to check that they are provably equivalent with only
the limited resources of UDLO.

To see that (∃z)P ⇒ Q we argue as follows. For any i and j, xi < z and z < yj are factors
of P , so P ⇒ xi < z ∧ z < yj and so P ⇒ xi < yj and thus (∃z)P ⇒ xi < yj .
Repeating this for each i and j, we see that (∃z)P implies every one of the factors of Q
listed above and so it implies their conjunction, which is Q.
It remains to show that Q ⇒ (∃z)P .

Suppose first that P has no terms of the form z < yj ; then Q is T, so we want to prove
T ⇒ (∃z)P , that is, that ⊢ (∃z)P . By the previous proposition (F.5),
(∃z)( x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xp ≤ z ∧ ( z = x1 ∨ z = x2 ∨ . . . ∨ z = xp ) ) .   (3)
Using a choice argument, we choose such a z, and so deduce that xi ≤ z for all i. Using
the axiom UDLO5, there is z′ such that z < z′, from which we deduce xi < z′ for all the
xi , and then (∃z)P follows immediately.

A similar argument proves the proposition in the case in which P has no terms of the forms
xi < z.
Finally, we assume that P has at least one term of the form xi < z and at least one term
of the form z < yj . For this we need both forms of the previous proposition: there are
theorems of the form (3) above and
(∃z)( z ≤ y1 ∧ z ≤ y2 ∧ . . . ∧ z ≤ yq ∧ ( z = y1 ∨ z = y2 ∨ . . . ∨ z = yq ) ) .   (4)
Using the Choice Rule (and changing letters for convenience), there are theorems of the
forms
xi ≤ u (5)
u = x1 ∨ u = x2 ∨ . . . ∨ u = xp (6)
v ≤ yj (7)
v = y1 ∨ v = y2 ∨ . . . ∨ v = yq (8)
((5) is a list of theorems, one for each i = 1, 2, . . . , p, and (7) is a list of theorems, one for
each j = 1, 2, . . . , q.)
Applying Proof by Cases (the Constructive dilemma) to (6) and (8), we prove a heap of
special cases, one for each i and j, as follows.

For any particular i and j, we have a case u = xi ∧ v = yj , and we use (2) to get u < v.
Then the axiom UDLO4 tells us that (∃z)(u < z < v) and then, using (5) and (7), for
each i′ and j′,
xi′ < z
z < yj′

which immediately gives (∃z)P , as required.


Observe that throughout this process, no new variables have been introduced and z has
been removed. 
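The heart of the proof is this final case: eliminating (∃z) over a conjunction of strict bounds on z. A tiny metalevel sketch of just that step (variables as strings; an empty result list plays the role of T), which is sound precisely because of density and unboundedness, as shown above:

    def eliminate_exists_z(lowers, uppers):
        # (exists z)( AND x < z for x in lowers, AND z < y for y in uppers )
        # becomes the conjunction of x < y over all pairs, with z removed.
        return [(x, y) for x in lowers for y in uppers]

    # (exists z)(a < z and b < z and z < c)  becomes  a < c and b < c:
    print(eliminate_exists_z(['a', 'b'], ['c']))   # [('a', 'c'), ('b', 'c')]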

F.9 Corollary
Let P be any expression in UDLO which contains no quantifiers. Then
⊢ (∀z)P ⇔ Q
where Q is a DC-form and v(Q) ⊆ v(P) ∖ {z} .

Proof. By Predicate Logic we have


(∀z)P ⇔ ¬(∃z)¬P
But by F.7 above, ¬P ⇔ P ∗ , where P ∗ is a DC-form, and so
(∀z)P ⇔ ¬(∃z)P ∗ .
Then, by F.8 above, (∃z)P ∗ ⇔ P ∗∗ , where P ∗∗ is a DC-form, and so
(∀z)P ⇔ ¬P ∗∗ .
Finally, by F.7 again, ¬P ∗∗ ⇔ Q, where Q is a DC-form, and so
(∀z)P ⇔ Q .
In this process, v(Q) ⊆ v(P ∗∗ ) ⊆ v(P ∗ ) ⊆ v(P ) . 

The next proposition allows us to remove all quantifiers from P .

F.10 Proposition
Let P be any expression in UDLO. Then
⊢ P ⇔ Q
where Q is a DC-form and v(Q) ⊆ v(P ) .

Proof. We proceed by induction over the construction of P .


If P is an atomic expression, then it is already of the required form.
If P is of the form ¬R, then we may assume inductively that R ⇔ R′ , where R′ is a
DC-form and v(R′) ⊆ v(R) = v(P) . But then P ⇔ ¬R′ , and ¬R′ is an expression
with no quantifiers, so by Proposition F.7 there is a DC-form Q such that ¬R′ ⇔ Q and
v(Q) ⊆ v(¬R′) ⊆ v(P) .
The same argument can be used when P is of the form R ⇒ S.
Finally, suppose that P is (∀z)R, so that v(P) = v(R) ∖ {z} . Then, by the inductive
hypothesis,
R ⇔ R′   (R′ a DC-form, v(R′) ⊆ v(R)) ,
so P ⇔ (∀z)R′   (v((∀z)R′) ⊆ v(R′) ∖ {z} ⊆ v(P)) ,
and, by F.9, (∀z)R′ ⇔ Q   (Q a DC-form, v(Q) ⊆ v((∀z)R′) ⊆ v(P)) .   □

F.11 Corollary
UDLO is complete.

Proof. Let P be any sentence (closed expression) in UDLO. By Proposition F.10, P ⇔ Q,
where Q is a DC-form with no free variables; from the definition of a DC-form, the only
possibilities are Q = T and Q = F. The result follows. □

G Functions versus relations


G.1 No need for function symbols
In Definition A.7 and the following Theorem A.8 we looked at the definition of functions “by
description”. It was shown there that, to define a new function, it is enough to define a new
relation symbol, so long as we add a new axiom that the relation behaves like a function
— or, better, if possible, prove a theorem to that effect. In the theorem we developed a few
tricks so that things we might need to do, such as substituting a function in another one or
substituting it in an expression, could be carried out in this alternative relational notation.
This means that, in order to define a first-order theory, we can if we wish define it entirely
in terms of relations, with no functions at all. Looking at this another way, given a first-
order theory such as The Elementary Theory of Groups or PA, we can if we wish replace all
the functions by relations and rewrite the axioms accordingly to get an entirely equivalent
theory.
Let us now do just that with the Elementary Theory of Groups: it will be a good illustration
of how to replace functions by relations.

G.2 Relations-only Group Theory


The theory is a first-order one with equality, so we start, of course, with the Axioms of
Predicate Logic and the axioms of equality. Equality is a relation, so we just leave it alone
(and use the usual notation).

In the ordinary theory there are three function symbols


1           the identity       nullary
x ↦ x⁻¹     inverse            unary
·           multiplication     binary
so we will replace them by three relations
id     unary      id(u) says u is the identity.
inv    binary     inv(u, x) says u = x⁻¹ .
mul    ternary    mul(u, x, y) says u = x.y .
The first thing we must do is add axioms which cause these relations to be function-defining:
(1) (∃!u)id(u)
(2) (∀x)(∃!u)inv(u, x)
(3) (∀x)(∀y)(∃!u)mul(u, x, y)

And now we translate the axioms (G1)–(G3) of The Elementary Theory of Groups into
relational form. Let us start with (G1), the Axiom of Associativity. It is
(G1) (∀x)(∀y)(∀z)( (xy)z = x(yz) ) .
In order to convert this one into an expression using the relational forms above, we use the
tricks described in the proof of Theorem A.8. The basic idea is to use extra variables to pull
G. FUNCTIONS VERSUS RELATIONS 181

(G1) apart into manageable fragments. First, we introduce variables u and v to stand for
(xy)z and x(yz), breaking the axiom up thus:

(∀x)(∀y)(∀z)(∀u)(∀v)( u = (xy)z ∧ v = x(yz) ⇒ u = v )

We still need to break up the fragments (xy)z and x(yz) because they still involve the
functional form. For the first one, we introduce a further variable w to stand for xy and
use the fact that

u = (xy)z is the same as (∃w)( w = xy ∧ u = wz )


which is the same as (∃w)( mul(w, x, y) ∧ mul(u, w, z) ) .

In the same way

v = x(yz) is the same as (∃w)( mul(w, y, z) ∧ mul(v, x, w) )

Substituting back, we have our relations-only version of Axiom (G1):


(G1r)   (∀x)(∀y)(∀z)(∀u)(∀v)(
            (∃w)( mul(w, x, y) ∧ mul(u, w, z) ) ∧ (∃w)( mul(w, y, z) ∧ mul(v, x, w) )
                ⇒ u = v )

Using these same techniques, (G2), (∀x)( x·1 = x ), becomes


(G2r) (∀x)(∀u)( id(u) ⇒ mul(x, x, u) )
and (G3), (∀x)( x·x⁻¹ = 1 ), becomes

(G3r) (∀x)(∀u)(∀v)( id(u) ∧ inv(v, x) ⇒ mul(u, x, v) ) .
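None of this is hard to check in a concrete group. The following sketch (the Python encoding
is mine, and no part of the theory) tests the function-defining axioms (1)–(3) and the axioms
(G1r)–(G3r) by brute force in the cyclic group of order 3:

    from itertools import product

    G = range(3)                                   # the cyclic group Z3
    def id_(u):       return u == 0                # id(u):      u is the identity
    def inv(u, x):    return (x + u) % 3 == 0      # inv(u,x):   u = x^-1
    def mul(u, x, y): return u == (x + y) % 3      # mul(u,x,y): u = x.y

    # (1)-(3): the relations are function-defining (exactly one u each time).
    assert sum(id_(u) for u in G) == 1
    assert all(sum(inv(u, x) for u in G) == 1 for x in G)
    assert all(sum(mul(u, x, y) for u in G) == 1 for x in G for y in G)

    # (G1r), with u and v quantified as well:
    for x, y, z, u, v in product(G, repeat=5):
        lhs = (any(mul(w, x, y) and mul(u, w, z) for w in G) and
               any(mul(w, y, z) and mul(v, x, w) for w in G))
        assert not lhs or u == v

    # (G2r) and (G3r):
    assert all(not id_(u) or mul(x, x, u) for x in G for u in G)
    assert all(not (id_(u) and inv(v, x)) or mul(u, x, v)
               for x, u, v in product(G, repeat=3))
    print("the relational group axioms all hold in Z3")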

G.3 PAr, Relations-only Peano Arithmetic


Again the theory is a first-order one with equality, so we start with the Axioms of Predicate
Logic and the axioms of equality; equality and its axioms can be left as is.
The four function symbols are replaced by four relations
zero    unary      zero(u) says u = 0̄ .
suc     binary     suc(u, x) says u = x⁺ .
sum     ternary    sum(u, x, y) says u = x + y .
prod    ternary    prod(u, x, y) says u = x.y .
and the rest can safely be left as an exercise.
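As a start on the exercise, here is one case, done exactly as (G2r) was: the PA axiom
(∀x)( x + 0̄ = x ) becomes

(∀x)(∀u)( zero(u) ⇒ sum(x, x, u) )

since sum(x, x, u) says x = x + u. The other axioms go the same way, the induction axioms
requiring the most rewriting.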

G.4 Symbols for partial functions


It is interesting to know that we can always replace function symbols by relations in any
first-order theory, but it is not obvious that it is a useful thing to do. The examples above

show that the resulting axioms are usually more complicated-looking than the ones using
the function symbols.
However, there is a useful variation on this. A partial function can also be treated as a
relation of a particular kind.

Suppose, for example, we have a theory within which we want to deal with a partial function
f of two variables. Consider the expression

u = f (x, y) .

This means that the pair hx, yi of arguments is in the domain of f and that u is the (unique)
value of the function for those arguments. If we want to replace this by a relation, F (u, x, y)
say, we know that

• there does not have to be such a u for every pair x, y, however


• if such a u exists, then it is unique.

Thus the condition that F define a partial function is simply

(∀x)(∀y)(!u)F (u, x, y)

where (!u) means “there is at most one u such that”.

This can be useful. We might wish to deal with a first-order theory in which one of our defin-
ing function symbols stands for a partial function. But the definition of a first-order theory
does not allow function symbols for partial functions (in the formal language). However,
this technique allows us to introduce a partial function as a relation, and then use ordinary
functional notation in the semi-formal language (which is usually more convenient).
Where a partial function is to be introduced in this way, one usually needs to add an
axiom, similar to the one above, which ensures that the relation is in fact a partial function.
Another axiom to define its domain may be required also. (It is always possible that both
these statements might follow as theorems from other axioms of the theory.)
An obvious example of this is the inverse operation in a field. Let us look at this now.

G.5 The Elementary Theory of Fields


The axioms for a field, as usually written, look like this:
The operations (function symbols) are zero and identity (nullary), negation and inverse
(unary) and addition and multiplication (binary).
The inverse is a partial function, x⁻¹ being defined if and only if x ≠ 0 .

The axioms are


(F1) (∀x)(∀y)(∀z)( (x + y) + z = x + (y + z) )
(F2) (∀x)( x + 0 = x )

(F3) (∀x)( x + (−x) = 0 )



(F4) (∀x)(∀y)( x + y = y + x )
(F5) (∀x)(∀y)(∀z)( (xy)z = x(yz) )
(F6) (∀x)( x1 = x )
(F7) (∀x)(∀y)( xy = yx )

(F8) (∀x)(∀y)(∀z)( x(y + z) = xy + xz )


(F9) (∀x ≠ 0)( x·x⁻¹ = 1 )
Most of this looks like a perfectly good description of a first-order theory. The only problem
is the inverse function, which is only a partial function, and as such does not fit in with the
definition of a first-order theory.
So, to define the “true” formal theory, we define the inverse by a binary relation inv. Here
is what is needed: replace (F9) by
(F9a) (∀x)(!u)inv(u, x)     (note (!u) rather than (∃!u): 0 has no inverse)

(F9b) (∀x)( (∃u)inv(u, x) ⇔ x ≠ 0 )

(F9c) (∀x)(∀u)( inv(u, x) ⇒ x·u = 1 ) .
So, replacing the problematical partial function symbol for the inverse by the relation inv,
and (F9) by (F9a)–(F9c), we do have a proper definition of a first-order system. And now we
can go back to the original version, involving the partial function and (F9) as a semi-formal
version, knowing, from our discussion above, that we can proceed from this basis in the
manner we are used to and probably prefer.
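To see that (F9a)–(F9c) say what they should, one can test them in a small field. A sketch
(mine, and only an illustration) for GF(5), the field of integers modulo 5:

    # Checking (F9a)-(F9c) in GF(5), with inv encoded as a binary relation.
    F = range(5)
    def inv(u, x): return x != 0 and (x * u) % 5 == 1   # inv(u,x): u = x^-1

    # (F9a): at most one u with inv(u, x) -- the (!u) condition of G.4.
    assert all(sum(inv(u, x) for u in F) <= 1 for x in F)
    # (F9b): an inverse exists exactly when x != 0.
    assert all(any(inv(u, x) for u in F) == (x != 0) for x in F)
    # (F9c): inv(u, x) implies x.u = 1.
    assert all(not inv(u, x) or (x * u) % 5 == 1 for x in F for u in F)
    print("(F9a)-(F9c) hold in GF(5)")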
5. MODELS

A Structures

A.1 Discussion
In the informal discussions up to here in these notes, I have often made remarks about what
terms in our formal language “represent” and what expressions “mean”, leaving it to you to
understand as well as you can what I mean. For straightforward theories like PA I think
this is pretty obvious, but for less well-known ones like, say, Projective Geometry it might
be less so. In this chapter we will discuss the definitions of structures and models, which
clears up what it means for a formal language or theory to “talk about” something in these
ways. (And about time!)
A language seldom exists in isolation. It is usually used to describe something, perhaps a
whole class of similar things. If the language has function symbols F and relation symbols
R, then the thing the language describes presumably has (actual) functions and relations to
which these symbols refer. This means that this thing must be some kind of set (or class)
upon which these functions and relations are defined: we will call this a structure. Once we
know this, we can ascertain which members of the structure all the terms of the language
refer to and thus determine which of the statements in the language are true or false, as
they refer to that structure.

In this section we will consider only first-order theories and the languages which underlie
them (first-order languages). Some of the results of interest in this context will involve
languages which are (possibly) uncountably infinite. So in this section we will allow languages
to have arbitrarily large sets of variable, function and relation symbols. As usual, the number
of variable symbols will be at least countably infinite.

(Recall that a first-order language L has a function domain, that is, a set F of function
symbols, each with a prescribed arity, a relation domain, that is, a set R of relation symbols,
also each with a prescribed arity, and a set of variable symbols, which we will call vL. These
three things define the language. Once we have such a language its set of terms is defined;
we will call this set tL.)

Then an (F,R)-structure is a set M upon which there are (actual) functions and relations
defined corresponding to the members of F and R. More precisely,

A.2 Definition: Structure


An (F, R)-structure M consists of
(i) A nonempty class |M |, the underlying class or universe of M . In many, but not
all, of the structures we consider, |M | is in fact a set, in which case of course we call it the
underlying set.


What is a “class”?! — A class can be described roughly as something just like a set, only
it might be much bigger. For a couple of examples, the class of all topological spaces, the
class of all groups. (Both these things behave like sets in many respects, but we will prove
later that they are actually “too big” to actually be sets.) Classes will be defined properly
in the next chapter and dealt with there at length. In the meantime, it will do no harm in
this chapter to think “set” when you read “class”.

(ii) To each function symbol f of arity n in F, a function fM : |M |n → |M | (that is, an


n-ary operation defined upon |M |),
(iii) To each relation symbol r of arity n in R, an n-ary relation rM defined upon |M | .
Let L be a first-order language. Then a structure for the language L is just an (F, R)-
structure, where F and R are the function and relation domains of the language L.
It is important to notice that a structure consists not just of a set or class |M | and the
functions and relations on it, but that it also contains a specification of which function
corresponds to which function symbol and which relation to which relation symbol. In other
words, the functions f 7→ fM and r 7→ rM are part of the structure (we don’t bother to give
these functions names because we won’t need them). In other words again, if you take a
structure M , keep its underlying class |M | but change which actual functions and relations
on |M | some of the symbols of the language refer to, you have changed the structure. For
this reason I am making a distinction here between M , the class with structure, and |M |,
the underlying plain old class. It is necessary to make this distinction in case we should
ever want to define several alternative structures on the same underlying class. Since we do
not often need to do this, we usually ignore this distinction and say, for instance, “m is a
member of M ” or “g is a function defined on M ”.
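In programming terms, a structure is nothing more than a universe together with two lookup
tables, one sending each function symbol to an actual function and one sending each relation
symbol to an actual relation. A sketch in Python (the representation is mine):

    from dataclasses import dataclass, field

    @dataclass
    class Structure:
        universe: frozenset                            # |M|
        functions: dict = field(default_factory=dict)  # f -> f_M
        relations: dict = field(default_factory=dict)  # r -> r_M

    # Two different structures on the same underlying set {0, 1, 2, 3, 4}:
    M1 = Structure(frozenset(range(5)), {"+": lambda x, y: (x + y) % 5},
                   {"=": lambda x, y: x == y})
    M2 = Structure(frozenset(range(5)), {"+": lambda x, y: max(x, y)},
                   {"=": lambda x, y: x == y})
    # M1 and M2 have the same universe |M| but are different structures,
    # which is exactly the distinction drawn above.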

When I think of a formal language and a structure for it, I usually have a mental picture
something like this:

[Diagram: two “eggs” side by side. The left egg is the language L, containing a function
symbol f and a relation symbol r; the right egg is the structure M , containing the actual
function fM and the actual relation rM ; an arrow leads from each symbol to its counterpart.]

A.3 Example: PA and N


The theory PA is obviously designed in the first place to describe the natural numbers N.
Consequently we expect N to be a structure for the language of PA, and so it is.
We have the correspondences

0̄ (the symbol for zero) → 0 (the actual zero of N)


= (the equality symbol) → = (actual equality)
suc (the successor symbol) → suc (the actual successor function x 7→ x + 1)
+ (the addition symbol) → + (actual addition)
. (the multiplication symbol) → . (actual multiplication)

Note that we sometimes use the same sign (on paper) to stand for both a relation symbol
in the language and the corresponding actual relation on the structure. This can obviously
be a rich source of confusion, which I alleviate as far as possible in these notes with the
coloured font convention.

[Diagram: the language PA (left egg), with its symbols 0̄ , = , suc , + and . connected by
arrows to the corresponding actual 0 , = , suc , + and . of the structure N (right egg).]

It is now fairly obvious that any sentence in the language (expression with no free variables)
corresponds in the same way to a statement about the structure which is either true or false.
For example

the sentence 2+2=4 in PA   →   the statement 2+2=4 about N

It is important to notice that the idea of a structure for a language pays no attention to any
axioms or theorems that might be defined. So sentences in the language may just as well
correspond to false statements about the structure as to true ones:

the sentence 2+2=37 in PA   →   the statement 2+2=37 about N

Obviously, it will usually be a good thing if the theorems in our theory correspond to true
statements about the structure. A structure which has this property is called a model and
we will discuss these in the next section.

A.4 A fundamental assumption


About these “egg diagrams”: In my “mental picture” diagrams above, the left “egg”
represents the language. Everything inside the egg is meant to be the formal language —
sequences of symbols and the relationships between them only.

Everything outside that egg is meant to be the language we use to describe it and its
properties, the metalanguage. We will use (and have been using) ordinary mathematics,
perhaps supplemented by plain English when necessary, as that language.
Why choose ordinary mathematics? Well, we have to use some sort of language to describe
what is going on and to establish facts about our various theories. It should be obvious by

now that often we have to make quite sophisticated and complicated arguments, the kind
for which ordinary everyday language is just not good enough. Mathematics is designed for
this sort of thing.
However, this does raise a fundamental question: how can we be sure that arguments made
in ordinary mathematics are indeed correct? Indeed, how can we be sure that ordinary
mathematics is at least consistent? This question really becomes important later when we
set about proving that certain other theories are consistent.
One could imagine perhaps using some other, simpler, system of argument to prove that
ordinary mathematics is consistent, but that would only raise the question of the consistency
of that simpler system. With such an approach, the best one could hope for is some sort
of infinite regress, which doesn’t really prove anything. So throughout these notes we make
the fundamental assumption: Ordinary mathematics is a consistent theory.
More specifically, we will use the system of mathematics defined by the Morse-Kelley Axioms
in these notes. This is defined and discussed in detail in Chapter 6. So our fundamental
assumption is:

Morse-Kelley mathematics, as defined in Chapter 6 is consistent.

All the arguments we use can equally well be made using the other major system, Zermelo-
Fraenkel (see Appendix B), and all or most of them can be made using somewhat weaker
assumptions. We will not go into this in detail here.

There is an alternative, less gung-ho, approach. One can be explicit about this assumption
and simply prefix virtually everything proved in these notes with “If ordinary mathematics
is consistent then . . . ”. For example, in the last chapter we proved that UDLO is complete.
Taking this approach, we can say that that was just an abbreviation for what we really
proved, which was

If ordinary mathematics is consistent then UDLO is complete.

This way of looking at proofs of consistency in particular is usually called relative consistency.
Now, returning to the egg diagrams above, everything outside the left-hand egg is meant to
take place in ordinary mathematics. In particular the right-hand egg, that is the structure,
is constructed in ordinary mathematics. In the above example, in which the structure is N,
that is the ordinary mathematical Natural Numbers which we all know and love, and when
we talk about functions and relations there we mean ordinary functions and relations in the
sense we have always been used to.

A.5 Example: PA and modular arithmetic


There is no reason why a structure for a given language must be the one the language
was originally thought up for. Just so long as it is equipped with the requisite number of
relations and functions of the requisite arities, any structure will do.

Let us look at the structure Z5 , the ring of integers modulo 5, which can act as a structure
for PA. To see this let us write Z5 as the set { [0] , [1] , [2] , [3] , [4] } and use the notation

=     [0]     csuc     ⊕     ⊙

for the various relations and functions on it, defined in the usual way. (Here csuc is the
“cyclic successor” function [0] ↦ [1] ↦ [2] ↦ [3] ↦ [4] ↦ [0]; ⊕ and ⊙ are addition and
multiplication modulo 5.)

We make this into a structure for the language of PA by making the formal relation and
function symbols of the language correspond to these relations and functions in the obvious
way:
=     →    =
0̄     →    [0]
suc   →    csuc
+     →    ⊕
.     →    ⊙

In this structure, the sentence 3̄ + 4̄ = 2̄ corresponds to the statement [3] ⊕ [4] = [2], which
happens to be true.

A.6 The interpretation of sentences


This correspondence we have been discussing, which can be viewed as giving a “meaning”
to expressions of the language — these meanings being statements about the structure —
is called an interpretation. So far we have only discussed the interpretation of sentences
(expressions with no free variables); the interpretation of expressions (which might or might
not have free variables) is discussed in the next section.
Once we have defined the interpretation in our structure of the various function and relation
symbols of the language, we can combine them together in the obvious way to find the
interpretation of terms. I have already snuck this in in the last example above. Remember
that 3̄ is not a symbol of the fully-formal language; it is an abbreviation in the semi-formal
language for 0̄⁺⁺⁺ , or using the generic notation, suc(suc(suc(0̄))). We can treat 4̄ and 2̄ the
same way to find that 3̄ + 4̄ = 2̄ is really an abbreviation for

suc(suc(suc(0̄))) + suc(suc(suc(suc(0̄)))) = suc(suc(0̄)) .

Combining the interpretations of the various symbols here together in the obvious way, this
should be interpreted as

csuc(csuc(csuc([0]))) ⊕ csuc(csuc(csuc(csuc([0])))) = csuc(csuc([0])) .

Using the definition of csuc we can simplify this to [3] ⊕ [4] = [2] as claimed, and then verify
that it is true.
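The whole calculation takes a couple of lines to mechanise. A sketch, with my own encoding
of the numerals:

    def csuc(x):                # the cyclic successor on Z5
        return (x + 1) % 5

    def numeral(n):             # interpretation of the term n-bar:
        x = 0                   # 0-bar -> [0], suc -> csuc
        for _ in range(n):
            x = csuc(x)
        return x

    # + -> addition mod 5, so 3-bar + 4-bar = 2-bar becomes [3] (+) [4] = [2]:
    assert (numeral(3) + numeral(4)) % 5 == numeral(2)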
This gives us a pretty good idea about how to interpret terms. But how do we interpret
expressions? (Remember that in this section we will only look at sentences.)
The first principle here is that the interpretation does not mess around with Sentential Logic,
so the not-symbol ¬ of the language is interpreted as the ordinary not ¬ of mathematics

and the implication symbol ⇒ of the language as the ordinary if-then implication ⇒ of
mathematics. It is straightforward then to verify that the same thing happens to the other
connectives of Sentential Logic: we have the following interpretations
¬ → ¬
⇒ → ⇒
∧ → ∧
∨ → ∨
⇔ → ⇔
When we come to interpret the quantifier symbols, we have a second principle: where the
language is being interpreted in a structure M , then the members of M are all that the theory
can see — effectively, M is its entire universe (what is called its “universe of discourse”).
Consider our first example above, PA being interpreted in the structure N as usual. Then,
when PA4 says
(∀x)(x + 0̄ = x)

it doesn’t mean that this is true for all possible things in the universe of mathematics —
sets, real numbers, complex functions etc. etc. — it simply means that it is true for all
natural numbers, that is, everything in its own private universe. In the same way, in our
second example above, where PA is being interpreted in Z5 , this axiom would be interpreted
as saying that this was true for all five members of Z5 .

In general, where a language L is being interpreted in a structure M , the quantifiers have


the following interpretations:
(∀x) → (∀x ∈ M )
(∃x) → (∃x ∈ M )
This all boils down to a very simple recipe for finding the interpretation of sentences in
the language: simply work through the sentence, symbol by symbol, replacing function and
relation symbols by the corresponding functions and relations in the structure, and logical
symbols as just described. For instance, the following theorem of PA

(∀x)(∀y)(∀z)( x + y = x + z ⇒ y = z)

is interpreted in N as

(∀x ∈ N)(∀y ∈ N)(∀z ∈ N)( x + y = x + z ⇒ y = z)

and in Z5 as

(∀x ∈ Z5 )(∀y ∈ Z5 )(∀z ∈ Z5 )( x ⊕ y = x ⊕ z ⇒ y = z).

Note that this recipe breaks down if the expression contains free variables. If we try to
interpret (for example) x + 3̄.y = 7̄ this way, we can replace the + by +, 3̄ and 7̄ by 3 and
7, but what are we to replace x and y by? (We will sort this out in the next section.)

A.7 Two very simple examples


(i) The trivial language, which has no function symbols and no relation symbols at all
(not even equality): F = R = ∅. Any nonempty class whatsoever can act as a structure.
(ii) The Language of Pure Identity, in which there are no function symbols and only
one relation, equality: F = ∅ and R = {=}. Any nonempty class, with the equality symbol
representing actual equality, can act as a structure.

A.8 Example: Projective planes


We can now see that the picture of the seven-point plane displayed in 4.B.4 is a diagrammatic
way of describing a structure for the theory of projective planes. Using the notation there
(following the picture), the underlying set of the structure is {p1 , p2 , . . . , p7 , l1 , l2 , . . . , l7 },
the unary relations defining points and lines are given by
P (pi ) is true for i = 1, 2, . . . , 7 (All the pi are points)
L(pi ) is false for i = 1, 2, . . . , 7 (None of the pi are lines)
P (li ) is false for i = 1, 2, . . . , 7 (None of the li are points)
L(li ) is true for i = 1, 2, . . . , 7 (All the li are lines)
and the binary relation defining incidence is given by the table displayed in that example.

A.9 Example: Groups


It should be obvious by now that any individual group is a structure for the Elementary
Theory of Groups (Section 4.E). For a specific example, the symmetric group S4 is one of
the many possible structures for this language; for another example, choose your favourite
group.
Note that, when a particular group is chosen as the structure for this theory, that group
becomes its entire universe — “everything that it can see”. Consequently, there is no way
(in this theory) to talk about more than one group at a time, and so there is no way of
discussing homomorphisms, quotient groups, subgroups etc. etc. This theory is really very
limited.

A.10 Interpretation of definitions


In developing our theories, we hardly ever stick to the fully-formal version. Normally we
work most of the time in a semi-formal version which usually contains many extra notations,
usually defined as abbreviations for more complicated expressions (or subexpressions). Thus
it is important to know how to interpret these extra notations.
Example Suppose we are interpreting PA in N (the usual interpretation). Here is the
Dichotomy Law, one of the theorems:
(∀x)(∀y)( x ≤ y ∨ y ≤ x ) . (–1)
This contains the symbol ≤, which has been introduced into the semi-formal version of the
language by the definition
x≤y = (∃z)(x + z = y) . (–2)

From here there are two approaches to interpreting (–1). One obvious thing to do is simply
use the definition (–2) to convert (–1) back into the fully-formal version it is short for,

(∀x)(∀y)( (∃z)(x + z = y) ∨ (∃z)(y + z = x) ) ,

then interpret that according to the method described above. For this example, we would
get
(∀x ∈ N)(∀y ∈ N)( (∃z ∈ N)(x + z = y) ∨ (∃z ∈ N)(y + z = x) ) . (–3)
This method is tiresome, especially when the definitions are complicated — and becomes
truly horrible when we deal with definitions which depend on earlier definitions which in
turn depend on even earlier ones, and so on. A much better approach is to interpret the
definition itself directly once and for all. In this example, we know that,

wherever x ≤ y occurs as part of a sentence, we can replace it by (∃z)(x + z = y)


and (∃z)(x + z = y) is interpreted as (∃z ∈ N)(x + z = y),
so any occurrence of x ≤ y can be interpreted directly as (∃z ∈ N)(x + z = y).
But we also know that (in N) (∃z ∈ N)(x + z = y) is just a complicated way of saying x ≤ y.
So we have a new rule: x ≤ y can be interpreted as x ≤ y. (–4)

(This, of course, is why the symbol ≤ was defined the way it was in PA.)
Having determined this rule once, we can then interpret (–1) in a single step as

(∀x ∈ N)(∀y ∈ N)( x ≤ y ∨ y ≤ x ) (–5)

much easier!
By the way, you can see that (–3) simplifies to (–5).

B Interpretations
B.1 Remarks leading to the definition of an interpretation
Now we set about defining what an interpretation of an expression is in general — when
that expression might contain free variables.
Before we go any further, I should point out that there are three ways of looking at an
expression with free variables, only one of which we will officially call an interpretation.
(1) When an expression with free variables occurs as a theorem (and that includes as an
axiom or a step in the proof of a theorem) it is meant to imply that the expression is true
for all values of those free variables. For example, in PA we proved the theorem
(∀x)(∀y)( x + y = y + x ) (–1)
but we could just as well have written this theorem in the shorter form
x + y = y + x. (–2)
((–2) follows from (–1) by PL4 and (–1) follows from (–2) by UG.) So (–2) is the kind of
expression we are talking about here.
(2) When an expression with free variables occurs as a step in the proof of a deduction
from hypotheses, it is meant to imply that the expression is true for all values of those
variables which satisfy all the hypotheses in force for it.

(3) But what are we to make of any old expression which does not turn up as a theorem
or a step in a proof? Let us take a simple example (in PA again):
x + 3̄.y = 7̄ . (–3)
Normally, what one would think about an expression like this is, “Whether it is true or false
depends upon the values given to x and y”. It is this way of looking at an expression that
is called an interpretation, so we now set about making it precise.
So, in order to interpret an expression, one must decide first on an assignment of values to
all the free variables. Assign the values
x → 1 and y → 2
and the interpretation is 1 + 3.2 = 7 which is true (the dot here represents multiplication,
not a decimal point), or assign the values
x → 2 and y → 1
and the interpretation is 2 + 3.1 = 7 which is false.
Another example: consider the expression
(∃y)(x + y = z) . (–4A)
This one has two free variables and one bound one. To interpret it in N, we assign particular
values for x and z in our structure N, for example
x → a and z → c

and then the interpretation is


(∃y ∈ N)(a + y = c) . (–4B)
(Notice that we only need to assign values to x and z; the variable y is bound and taken
care of as described earlier.) For this example, we see that whether this interpretation is
true in N depends on the particular choices we make for a and c.
We could just as well have interpreted this expression (–4A) in the structure Z5 . To do that,
we would first have to assign values a and c in Z5 to x and z, and then the interpretation
would be
(∃y ∈ Z5 )(a ⊕ y = c) . (–4C)
This one happens to be true for all choices of a and c, but that’s just a special property of
Z5 .

Suppose now we add another quantifier to our expression,

(∀x)(∃y)(x + y = z) . (–5A)

To interpret this we need only assign a value, c say, to z in our structure and then the
interpretation in N would be

(∀x ∈ N)(∃y ∈ N)(x + y = c) (–5B)

(which is false whatever value we assigned to z) and in Z5 it would be

(∀x ∈ Z5 )(∃y ∈ Z5 )(x ⊕ y = c) (–5C)

(which is true for any value assigned to z).


Note what has happened here: in the passage from (–4A) to (–5A), we have added one
more quantifier (∀x). In the corresponding passages from (–4B) to (–5B) and from (–4C) to
(–5C), we have forgotten our chosen value a for x, and replaced it by an ordinary variable
which is quantified. Perhaps the best way to think about this is that the rôle of the symbol
x has changed — from a free variable to a bound one — and so the way we interpret it also
changes, from a symbol which needs an assigned value before we can interpret the expression
to one that doesn’t.
To make this work smoothly, it is convenient to assume that we have assigned values,
members of the structure, to all the variable symbols: we call this an assignment. (Then,
when interpreting expressions with quantifiers like (–4A) and (–5A) above, we simply ignore
the chosen values of the quantified variables along with any which do not occur in our
expressions.) Once this has been done all terms automatically take on corresponding values,
also members of the structure, and all expressions automatically translate into statements
in ordinary mathematics about the structure. What’s more, where the same variable occurs
free in different expressions, it is assigned the same value everywhere, so everything works
together properly.
The upshot of all this is that, for any given language L and structure M for L (and except
in utterly trivial cases), there are lots of interpretations L → M , corresponding to different
assignments of the variables. Under any particular interpretation, every expression is either
true or false, even ones with free variables. However, expressions with free variables will

have truth-values which depend on the choice of assignment of the variables and thus on
the interpretation, whereas it is obvious that sentences (closed expressions) will have truth-
values independent of the particular assignment/interpretation chosen.
While it is clear how this goes, it is necessary to have a proper definition of the process,
so that further definitions may be made and facts proved about it, including the important
fact that proofs in the language translate to valid proofs about the structure in ordinary
mathematics.

B.2 Definition: Assignment


Let M be a structure for the language L.
An assignment of the variables of L in M is simply a function θ : vL → M , where vL is
the set of variables of the language. (Think of θ as assigning a value in M to each variable
symbol in the language.)
Next we look at how an interpretation works as far as the terms of a language goes.

B.3 Definition: Interpretation on the terms of a language


Let M be a structure for the language L.
An interpretation of the terms of the language L in the structure M is a function θ : tL → M ,
where tL is the set of terms of the language, which satisfies the property

(I1) For every function symbol f of the language and terms t1 , t2 , . . . , tn ,

θ(f (t1 , t2 , . . . , tn )) = fM (θ(t1 ), θ(t2 ), . . . , θ(tn ))

(where n of course is the arity of the function symbol f .)


There are several important things to notice about this definition.
First, the property (I1) simply says that θ behaves like what in algebra would be called a
homomorphism.
Second, any assignment of the variables of L can be extended uniquely to an interpretation
of all of its terms. This is because, given an assignment, (I1) amounts to an inductive
definition of the values θ takes on all other terms.

Third, if two interpretations on the terms agree on all the variables, then they are the same
(i.e. agree on all terms). This is for the same reason as the second observation above.
Fourth, this is merely a more precise definition of the action of an interpretation on the
terms as described in B.1 above.
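The second observation is easy to see in code. A sketch of the inductive extension, with
terms encoded, in my own convention, as nested tuples (f, t1, . . . , tn) and variables as
strings:

    def interpret_term(t, theta, functions):
        """Extend the assignment theta to all terms, by the rule (I1)."""
        if isinstance(t, str):        # a variable symbol: look up its value
            return theta[t]
        f, *args = t                  # a function symbol applied to subterms
        return functions[f](*(interpret_term(s, theta, functions) for s in args))

    # In the structure N for PA: with x -> 2 and y -> 4,
    # the term x + suc(y) receives the value 2 + suc(4) = 7.
    funcs = {"suc": lambda n: n + 1, "+": lambda a, b: a + b}
    assert interpret_term(("+", "x", ("suc", "y")), {"x": 2, "y": 4}, funcs) == 7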
Before we can make a precise definition of the interpretation of expressions, we need the
following strange little definition.

B.4 Definition: Substitution in an interpretation


Let L be a language, M be a structure for L and θ : tL → M an interpretation of the terms
of L. Also, let v be any variable symbol in vL and m any member of M .
Then θ[v/m], the result of substituting m for v in θ is the interpretation whose action on
the variables vL is given by
θ[v/m](x) = m      if x is v
θ[v/m](x) = θ(x)   otherwise

This definition is not as complicated as it looks. It says that, given an old interpretation θ,
we make the new one by assigning the value m to the variable v (instead of whatever it was
assigned to) and leaving all the other variables as they were. (Since this specifies what the
new interpretation does to all the variables, it completely specifies it.)
This can be said in a more useful way: given θ, we can calculate θ[v/m](t) for any term t by

• replacing every occurrence of the variable symbol v by m,


• replacing every other variable symbol u by θ(u) and
• replacing every function symbol f by fM .

B.5 Definition: Interpretation of an expression


Let θ : tL → |M | be an interpretation and P be an expression in L. Then the interpretation
of P under θ is a statement θ(P ) (in ordinary mathematics) defined by induction over P
as follows:—
(i) If P is an atomic expression, P = r(t1 , t2 , . . . , tn ) then

θ(P ) is rM (θ(t1 ), θ(t2 ), . . . , θ(tn )) .

(ii) If P = ¬Q then
θ(P ) is ¬θ(Q) .

(iii) If P = Q ⇒ R then

θ(P ) is θ(Q) ⇒ θ(R) .

(iv) If P = (∀x)Q then

θ(P ) is (∀m ∈ |M |)θ[x/m](Q) .

All this is pretty straightforward, except perhaps for Part (iv). An example should explain
what is going on here. Let us consider the expression

2.x = y (–1)

and an interpretation θ which takes


x → a and y → b
(where a, b ∈ M of course; we can ignore whatever θ does to other variables). Then θ takes
the expression (–1) to
2.a = b
Now let us see what happens if we add a quantifier to (–1):
(∀x)( 2.x = y ) (–2)
By our original simpler description, θ should take this expression to
(∀x ∈ M )( 2.x = b) (–3)
By the last, more complicated looking, definition above, θ should take this expression (–2)
to
(∀m ∈ M )θ[x/m]( 2.x = y ) (–4)
Take this one step at a time. Firstly, θ[x/m] is the assignment which agrees with θ, except
that it takes x to m instead of a, that is, it takes
x → m and y → b
and consequently, applying this to (–1),
θ[x/m]( 2.x = y ) is the expression 2.m = b
and thus (–4) is just the expression
(∀m ∈ M )( 2.m = b)
which is the same as (–3), except for having m instead of x, which doesn’t matter.
What this all boils down to is that the definition of an interpretation (of expressions) given
in B.5 is the same as the earlier description in B.1; the B.5 version is more complicated but
more precise. Here are a couple of recommendations:
If you want to apply an interpretation to an expression, use the simpler description in B.1.
If you need to prove something about interpretations, use the more precise one B.5.
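When the universe is finite, Definition B.5 is itself an algorithm. Here is a sketch implementing
clauses (i)–(iv), including the θ[x/m] of B.4; the tuple encoding of expressions is mine, and
interpret_term is as in the sketch in B.3:

    def interpret_term(t, theta, functions):          # as sketched in B.3
        if isinstance(t, str):
            return theta[t]
        f, *args = t
        return functions[f](*(interpret_term(s, theta, functions) for s in args))

    def interpret(P, theta, M, functions, relations):
        tag = P[0]
        if tag == "rel":                              # clause (i): atomic
            _, r, *ts = P
            return relations[r](*(interpret_term(t, theta, functions) for t in ts))
        if tag == "not":                              # clause (ii)
            return not interpret(P[1], theta, M, functions, relations)
        if tag == "imp":                              # clause (iii)
            return (not interpret(P[1], theta, M, functions, relations)
                    or interpret(P[2], theta, M, functions, relations))
        if tag == "all":                              # clause (iv): theta[x/m]
            _, x, Q = P
            return all(interpret(Q, {**theta, x: m}, M, functions, relations)
                       for m in M)
        raise ValueError(tag)

    # (forall x)( x + y = y + x ) in Z5, under an assignment with y -> 3:
    M = range(5)
    P = ("all", "x", ("rel", "=", ("+", "x", "y"), ("+", "y", "x")))
    assert interpret(P, {"y": 3}, M, {"+": lambda a, b: (a + b) % 5},
                     {"=": lambda a, b: a == b})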

A couple of useful words:—

B.6 Definition: Satisfaction, true for M


Let P be an expression in a first-order language L and M a structure for L.

(i) For any interpretation θ : L → M , we say P is satisfied by θ to mean that θ(P ) is


true and P is falsified by θ to mean that θ(P ) is false.
(ii) We say P is true for M if P is satisfied by every interpretation L → M and P is
false for M if P is falsified by every interpretation L → M .

And a couple of useful facts:—



B.7 Proposition
Suppose that P is an expression in a first-order language L and M is a structure for L.
(i) If θ and ϕ are two interpretations of L in M which agree on all the free variables of
P , then the interpretations of P under θ and ϕ are the same; therefore P is satisfied by θ if
and only if it is satisfied by ϕ.
(ii) Suppose that P is a sentence (no free variables). Then the interpretation of P under
all interpretations in M is the same. Therefore if P is satisfied by any one interpretation
then it is satisfied by every interpretation (so then it is true for M ). By the same token, if
P is falsified by any one interpretation then it is falsified by every interpretation.

Proof. Obvious. 

C Models
In the discussion of interpretations of a language in a structure in the last section, no
attention was paid to theorems or proofs. In fact, examples were given where theorems
of a theory were interpreted as false statements about the structure. For this reason, we
could discuss interpretations of a language in a structure, because theorems and proofs were
irrelevant.
But of course, the interesting and useful kind of structure is one in which the theorems are
in fact interpreted as true statements. Such a structure is called a model. In discussing
models, then, theorems become relevant, so we discuss models for theories, not languages.
The definition is now just what you would expect.

C.1 Definition: Model


Let A be a first-order theory. Then a model for A is a structure M for it such that every
theorem of A is true for M .
In other words, M is a model for A if every theorem of A is satisfied by every interpretation
of it in M .
In a slightly more general context, given a first-order language L and a set E of expressions
in it, we can say that a structure M for L is a model for E if every member of E is true for
M.
In this same general setting, if E is a single expression in the language L, we will abuse
language slightly and say that M is a model for E if it is a model for {E}. This is then the
same thing as saying that E is true for M .

C.2 Proposition
(i) Let L be a first-order language and M a structure for L. Let E be a set of expressions
and X an expression in L such that E ⊢ X.

If E is true for M , so is X.
In other words, if M is a model for E, then it is a model for X also.
Note that we are talking about entailment here.

(ii) Suppose that L is a first-order language, E a set of expressions in L and M is a


structure for which there exists an interpretation θ : L → M which satisfies not only E but
all expressions entailed by E. Then M is a model for E (and therefore for all expressions
entailed by E).
The point of this is that all one needs is a single interpretation θ which satisfies E . Then,
provided also that all expressions entailed by E are satisfied by θ too, we can conclude that
they are all satisfied by every interpretation in M .

Proof (i). We want to prove that X is true for M , given the assumption that every member
of E is true for M . We also have the assumption that E ⊢ X which tells us that there is a

proof, S1 , S2 , . . . , Sn say, of X from E. As usual we work our way from left to right in this
list, proving that each step Si is true for M (assuming that this holds for all earlier steps).
So let S be any one of these steps and suppose that all earlier steps have already been proved
to be true for M .

If S is a member of E then we are given that it is true for M .


Now suppose that S follows from two earlier steps by Modus Ponens; then those steps are
of the forms P and P ⇒ S. Since they are earlier steps, they are both true for M . Let θ
be any interpretation of L in M ; then θ(P ) and θ(P ⇒ S) are both true. But θ(P ⇒ S) is
just θ(P ) ⇒ θ(S), and then θ(S) is true (by MP in ordinary mathematics). Since this is so
for any such interpretation θ, S is true for M .
Now suppose that S follows from an earlier step by Universal Generalisation. If that step was
P then S must be (∀y)P [x/y] (for some variables x and y). Now let θ be any interpretation
of L in M ; we want to prove that (∀y)P [x/y] is true under θ, that is, that θ((∀y)P [x/y])
is true. But, using B.5(iv), this is (∀m ∈ M )θ[y/m](P [x/y]). Also, checking the double
substitution here, we see that θ[y/m](P [x/y]) simplifies to θ[x/m](P ). Thus our aim is to
prove that (∀m ∈ M )θ[x/m](P ) is true. But this is in fact easy: we have the assumption
that P is true for M , that is, that it is satisfied by every interpretation of L in M — and
that goes for all the interpretations θ[x/m] for all m ∈ M .
Notice how the tricky little definition B.5(iv) is used here to good effect. It is difficult to
see how this useful result could be proved any other way.

(ii) Let E be a member of E; we want to show that it is true under every interpretation
θ : L → M . If E contains no free variables, then this follows immediately from B.7(ii) above.
Suppose now that E contains free variables, x1 , x2 , . . . , xk say. Let F be the statement
obtained from E by universally quantifying all these variables, i.e. (∀x1 )(∀x2 ) . . . (∀xk )E.
Then by UG, F is entailed by E and so is satisfied by θ. Also it contains no free variables
and so is true for every interpretation A → M , that is, M is a model for F . But F ⇒ E by
Predicate Logic Axiom PL4. Therefore M is a model for E. 

C.3 Proposition
(i) Let A be a first-order theory and M be a structure for A. Then M is a model for A
if and only if it is a model for its axioms.
(ii) Let A be a first-order theory and M a structure for A. Then every theorem of PL is
true for M .

Well, that’s good to know!


(iii) Suppose that A is a first-order theory and M is a structure for which there exists an
interpretation θ : A → M which satisfies all the theorems of A. Then M is a model for A.
(iv) Suppose that A is a first-order theory in which all the axioms are sentences (closed
expressions) and M is a structure for which there exists an interpretation θ : A → M which
satisfies all the axioms of A. Then M is a model for A.

Proof (i). This is an immediate corollary of C.2(i) above.


(ii) Given (i), it is enough to show that all the axioms of PL are true for M .
Axioms PL1–PL6 are

P ⇒ (Q ⇒ P )
(P ⇒ (Q ⇒ R)) ⇒ ((P ⇒ Q) ⇒ (P ⇒ R))
(¬P ⇒ ¬Q) ⇒ ((¬P ⇒ Q) ⇒ P )

(∀x)(P ⇒ Q) ⇒ ( (∀x)P ⇒ (∀x)Q )
P ⇒ (∀x)P , where x does not occur free in P
(∀x)P ⇒ P [x/t] , provided that [x/t] is acceptable in P

and, under any interpretation θ : L → M , these translate to

θ(P ) ⇒ (θ(Q) ⇒ θ(P ))


(θ(P ) ⇒ (θ(Q) ⇒ θ(R))) ⇒ ((θ(P ) ⇒ θ(Q)) ⇒ (θ(P ) ⇒ θ(R)))
(¬θ(P ) ⇒ ¬θ(Q)) ⇒ ((¬θ(P ) ⇒ θ(Q)) ⇒ θ(P ))

(∀x ∈ M )(θ(P ) ⇒ θ(Q)) ⇒ ( (∀x ∈ M )θ(P ) ⇒ (∀x ∈ M )θ(Q) )
θ(P ) ⇒ (∀x ∈ M )θ(P ) , where x does not occur free in P
(∀m ∈ M )θ[x/m](P ) ⇒ θ(P [x/t]) , provided that [x/t] is acceptable in P

These are all obviously true in ordinary mathematics, except for the last one (the instantiation
axiom), which perhaps requires explanation. The left hand side of this states that θ[x/m](P ) is true for
all m ∈ M . Now t is a term of L, so θ(t) ∈ M . The left hand side therefore implies the
particular case in which θ(t) is substituted for m, that is θ[x/θ(t)](P ). It remains only to
check the definitions to see that this is the same as θ(P [x/t]). We have shown that the six
axioms of PL are all satisfied by every interpretation L → M ; that means they are all true
for M .
(iii) This is an immediate corollary of C.2(ii) above.
(iv) For each axiom of A, the fact that it is a sentence and is satisfied by an interpretation
I B.7 means that it is satisfied by all interpretations (see B.7), and is therefore true for M . Since
this is true for all the axioms of A, the result follows from Part(i). 

C.4 The fundamental theorem of Model Theory


Now we are in position to state an important theorem about models; it is probably not an
exaggeration to say that it is the Fundamental Theorem of Model Theory. It states:

A first-order theory has a model if and only if it is consistent.

We will prove one direction of this theorem now and look at a couple of ways it can be used.
The other direction (that if a theory is consistent then it has a model) will be dealt with in
our second helping of Model Theory, Chapter 8.

C.5 Theorem
If a first-order theory has a model, then it is consistent.

Proof. Suppose that the first-order theory A has a model M . If A were inconsistent, then
it would have a theorem of the form P ∧ ¬P where P is a sentence. Writing PM for the
interpretation of P in this model, the interpretation of the theorem is PM ∧ ¬PM , which
cannot be true in M because of our fundamental assumption above. 

C.6 Comments
This gives us a straightforward way of proving a theory is consistent: just build a model.
For example N is a model of the theory PA, therefore

PA is consistent.

For another example, consider the theory of projective planes, as described in section 4.B.3.
The Seven-point plane, defined immediately after that is easily seen to be a model. Therefore

The theory of projective planes is consistent.

The celebrated proof by Paul Cohen (and the simplified version by J. Barkley Rosser) that
the negation of the Axiom of Choice is consistent with Mathematics (as a formal theory)
depends upon using ordinary mathematics to build a model of itself plus the negation of
the Axiom of Choice. (The construction of the model and the proof that everything works
is very long and complicated and beyond the scope of these notes.)
Now suppose that we have proved a theory to be consistent. The next thing we are likely to
ask is: are the axioms we have chosen independent? (To say that the axioms are independent
means that no one of them can be proved from the others.)
Normally, we want our axiomatic basis to be as simple as possible, at least not redundant
— it is just a nuisance if one of them can be proved from the others. Using models, we can
check independence, using the following theorem.

C.7 Definition: Independence of axioms


(i) Let A be a first-order theory with proper axioms A. To say that one of the axioms,
B say, is independent of the others means that B cannot be proved in the first-order theory
defined by the other axioms A ∖ {B}. Equivalently, A ∖ {B} ⊬ B in PL.

(ii) (Saying the same thing in slightly different notation.) Let A be a first-order theory
with proper axioms A ∪ {B}. To say that B is independent of the others means that B
cannot be proved in the first-order theory defined by the other axioms A. Equivalently,
A ⊬ B in PL.
(iii) And, of course, to say that the axioms of a first-order theory are independent means
that every one of them is independent of the others.

C.8 Proposition
Suppose we have a first-order theory A with proper axioms A∪{B}. If the first-order theory
with axioms A ∪ {¬B} is consistent, then, in A, the axiom B is independent of the others.

Proof. To say that the theory A is consistent means that there is no expression X such
that ⊢ X ∧ ¬X in A, and this is the same thing as saying that there is no expression X
such that A ∪ {B} ⊢ X ∧ ¬X in PL.

Suppose then that B is not independent of the other axioms. That means that A ⊢ B in
PL and then of course A ∪ {¬B} ⊢ B in PL. But also (trivially) A ∪ {¬B} ⊢ ¬B in
PL, contradicting the consistency of A ∪ {¬B}. □

C.9 Comment
This gives a neat way of showing that axioms are independent. Given a first-order theory
A with proper axioms A, we can show that any one of them, B say, is independent of the
others by constructing a model of the theory A in which axiom B is replaced by ¬B. And
doing this for each axiom separately would prove that the set of axioms is independent.
For example, using Z5 as a structure for PA, as we did in Section A.5, it is quite straight-
forward to prove that all of the axioms except for PA1 are true for Z5 , but that PA1 fails in
this structure. It follows that, for this theory, PA1 is independent of the other six axioms.
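Because Z5 is finite, “true for Z5 ” can be checked exhaustively. A sketch, assuming (as the
failure just described suggests) that PA1 is the axiom (∀x)( x⁺ ≠ 0̄ ):

    # PA1, assumed here to be (forall x)( suc(x) != 0-bar ), tested in two
    # structures: N (a finite sample) and Z5 (exhaustively).
    def pa1_holds(universe, suc, zero):
        return all(suc(x) != zero for x in universe)

    assert pa1_holds(range(10**6), lambda n: n + 1, 0)        # true in N (sampled)
    assert not pa1_holds(range(5), lambda x: (x + 1) % 5, 0)  # csuc([4]) = [0]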

C.10 Exercise
Show that the other six axioms of PA are independent. (This involves building six different
models, one for each axiom).

D Models which respect equality


D.1 Comments
If L is a first-order language with equality and M is a structure for L, then (by the definition
of a structure) M must have a binary relation =M .

In particular, if A is a first-order theory with equality and M is a model for A, then again
M must have a binary relation =M .
But there is nothing in the definition of either a structure or a model which says that =M
must be ordinary equality!

In a model, the axioms of equality force the relation =M to be an equivalence relation; not
only that, but an equivalence relation with special properties. It must, for instance behave
well towards the functions of the structure, in the sense that, for any n-ary function symbol
f and members x1 , x2 , . . . , xn , x′1 , x′2 , . . . , x′n ,

if x1 =M x′1 , x2 =M x′2 , . . . , xn =M x′n then fM (x1 , x2 , . . . , xn ) =M fM (x′1 , x′2 , . . . , x′n ) .

It must also behave well in a similar fashion towards all relations and indeed to all expres-
sions.

However none of this guarantees that =M must actually be equality itself. In many cases it
does of course and this is clearly a very useful property of a model, so we give it a name:

D.2 Definition
A model M respects equality if the relation =M is ordinary equality.

(We could say, if =M = =, but that would be silly!)


To show that we are not wasting our time with a meaningless distinction...

D.3 Example: a model which does not respect equality


Choose a natural number n ≠ 0; it doesn’t matter which one, but keep it fixed for the rest
of this example.
Strip down our theory PA by removing Axiom PA1. So now we have a theory, let’s call it
PAo , with all the functions and relations of PA and all the axioms except for PA1.
Now let us make a structure for PAo by starting with N as before, interpreting all the
functions 0̄, suc, + and . in the usual way but interpreting equality = as congruence modulo
n.
It is straightforward to check that this modified version of N is indeed a model for PAo , but
that equality is not respected, since it is interpreted as congruence, which is different from
equality.

I specified n ≠ 0 to make the congruence different from equality, and excluded Axiom PA1
because it is not true for this model.
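For a concrete n the claims are easy to spot-check. A sketch (mine) with n = 3:

    n = 3
    def eq(a, b):           # the interpretation of "=": congruence modulo n
        return a % n == b % n

    # eq behaves well towards suc and + (a sample of the equality axioms):
    for a in range(30):
        for b in range(30):
            if eq(a, b):
                assert eq(a + 1, b + 1)                          # respects suc
                assert all(eq(a + c, b + c) for c in range(10))  # respects +
    # ... but it is certainly not actual equality:
    assert eq(0, n) and 0 != n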

D.4 Remark
In Chapter 8, when we prove that every consistent first-order theory has a model, we will
also prove that if a theory with equality has a model then it has one which respects equality.
Then it will follow that every consistent first-order theory with equality has a model which
respects equality. That is reassuring.

D.5 Definition: Isomorphic structures


Let M and N be two structures for the same language L. Then M and N are isomorphic if
there is a bijection ϕ : M → N which preserves the actions of all functions and relations of
L, that is,

(i) For any formal function symbol f of the language of arity n and members x1 , x2 , . . . , xn , y
of M ,
fM (x1 , x2 , . . . , xn ) = y ⇔ fN (ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xn )) = ϕ(y)

(ii) For any formal relation symbol r of the language of arity n and elements x1 , x2 , . . . , xn
of M ,
rM (x1 , x2 , . . . , xn ) ⇔ rN (ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xn )) .

This is really just the obvious definition. It is laid out carefully here just to be on the safe
side. It is usually only of interest where M and N are models.
We can now define the idea of a categorical theory properly.

D.6 Definition: Categorical


A first-order theory L with equality is categorical if all its models which respect equality are
isomorphic.

D.7 Discussion
(i) What’s equality got to do with this? Why not make the much simpler definition:
Any formal theory is categorical if all its models are isomorphic?

The reason is that the simpler definition is useless. Every formal theory that has any models
at all has models which are not isomorphic to one another.
(ii) We saw above that the axioms given for equality are not sufficient to ensure that
=M is in fact equality in a model M . (For this reason we needed to introduce the extra idea
of a model which respects equality.) So why not add some extra axioms to fix this, to make
sure that =M must be true equality?
The reason is that, no matter what extra axioms one adds, so long as the resulting theory
has a model at all, then it has one which does not respect equality, that is, one for which
=M is not true equality.

The next example establishes both these awkward facts.



D.8 Example
Let A be any first-order theory and M be any model of A.
Let P be any nonempty set and p0 be some particular member of P . Make N = P × M
into a structure for A as follows:

(i) For any formal function symbol f of A of arity n and members ⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩
of N , define

fN (⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩) = ⟨p0 , fM (m1 , m2 , . . . , mn )⟩

(ii) For any formal relation symbol r of A of arity n and members ⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩
of N , define

rN (⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩) if and only if rM (m1 , m2 , . . . , mn ) .

Check, as an exercise, that this is indeed another model for A.


Note that, by taking the set P to be large enough (the power set of M for instance) we can
ensure that there can be no one-to-one correspondence between M and N , so they cannot
be isomorphic.
Note also that, if A is a first-order theory with equality and if P contains at least two
members, then N does not respect equality.
This example tells us two interesting things.
First, every first-order theory with equality which has a model also has a model which does
not respect equality. And, when we have proved in Chapter 8 that every consistent first-order
theory does have a model, we will then know that every consistent first-order theory with
equality has a model which does not respect equality.
Second, this means in turn that there are no extra axioms one can add to the axioms for
equality that will force it to be interpreted as actual equality in any model.

E Example: more models of the projective plane


E.1 The Real Projective Plane
We work on the surface of a sphere in three-dimensional Euclidean space (i.e. an ordinary
sphere). The things corresponding to lines in the theory will be great circles on the sphere
and the things corresponding to points in the theory will be pairs of opposite points on the
sphere. Incidence corresponds to ordinary incidence on the sphere — there is no ambiguity
here, if one point of the sphere lies on a great circle, then so does the opposite point, and
vice versa.
So, to be more precise, choose any sphere in three dimensional Euclidean space, for instance
the unit sphere { x : ||x|| = 1 }.
The model consists of the set of all pairs of opposite points on this sphere,

{ ⟨x, −x⟩ : ||x|| = 1 }

together with all great circles on this sphere.


Obviously, we interpret the relations of the theory thus:
P (x) is true if and only if x is one of the pairs of opposite points,
L(a) is true if and only if a is one of the great circles and
i(x, a) is true if and only if the pair x lies on the great circle a.

It is not difficult to see that this structure is a model for the Projective Plane. It is also
easy to see that it has an infinite number of point-pairs, and so is not isomorphic to the
Seven-point Plane.

E.2 Lots of models


Let V be a vector space of dimension 3 over any field K. We construct a structure by
making points in the theory correspond to 1-dimensional subspaces and lines in the model
to 2-dimensional subspaces.
More carefully, the model consists of all 1- and 2-dimensional subspaces of V , with
P (x) true if and only if x is a 1-dimensional subspace of V ,
L(a) is true if and only if a is a 2-dimensional subspace of V , and
i(x, a) is true if and only if x is contained in a (that is, x ⊆ a).

From elementary vector space theory, we see that

• Any two distinct 1-dimensional subspaces generate a 2-dimensional one, so Axiom (PP3) is satisfied;

• Any two distinct 2-dimensional subspaces intersect in a 1-dimensional one, so Axiom (PP4) is satisfied; and

• If {b1, b2, b3} is a basis for V, then any three of {b1, b2, b3, b1 + b2 + b3} generate the whole space, so the 1-dimensional subspaces generated by these four vectors satisfy the requirements of Axiom (PP5).

It is of interest to observe that this construction gives a large number of different (i.e., non-
isomorphic) projective planes, by simply choosing different fields for the field K of scalars
for V . This shows that the theory is not categorical.
Choosing K to be the Galois field GF(2) of two elements (which is the field of integers
modulo 2), this construction gives the Seven-point plane discussed above (well, something
isomorphic to it).
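Here is a small Python sketch (illustrative encoding) that enumerates the 1- and 2-dimensional subspaces of GF(2)³ and confirms the claim: seven points, seven lines, three points on each line, and any two distinct points on exactly one common line.

    from itertools import product

    # Nonzero vectors of GF(2)^3; over GF(2) each 1-dimensional subspace
    # is {0, v}, so points correspond to nonzero vectors.
    vectors = [v for v in product((0, 1), repeat=3) if any(v)]

    def add(u, v):
        return tuple((a + b) % 2 for a, b in zip(u, v))

    points = vectors
    # A 2-dimensional subspace has nonzero part {u, v, u+v} for any two
    # distinct nonzero u, v (distinct nonzero vectors are independent here).
    lines = {frozenset({u, v, add(u, v)}) for u in vectors for v in vectors
             if u != v}

    assert len(points) == 7 and len(lines) == 7
    assert all(len(line) == 3 for line in lines)
    # Any two distinct points lie on exactly one common line (Axiom (PP3)).
    assert all(sum(1 for l in lines if p in l and q in l) == 1
               for p in points for q in points if p != q)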
Choosing K to be the reals R yields the Real Projective Plane, a construction isomorphic to
the “spherical” one given above. To see this isomorphism, note that if the field of scalars is
R, then V is Euclidean 3-space. Make the subspaces of this construction correspond to their
intersections with the unit sphere: each 1-dimensional subspace is a straight line through
the origin and so intersects the sphere in a pair of opposite points; and each 2-dimensional
subspace is a plane through the origin and so intersects the sphere in a great circle.

F PL is not complete
F.1 Discussion
In Section 3.H we proved that Plain Predicate Logic is consistent by a technique of attaching
a value of 1 or 0 to every statement in such a way that all theorems have value 1 and all
antitheorems value 0.

We can now reveal that this technique was actually the use of a (particularly simple) model
in disguise. The way it works is this.
Our model is a one-member set, M = {m} say, on which the actions of the functions and
relations are defined thus:

for any n-ary function symbol f , fM (m, m, . . . , m) = m ,


for any n-ary relation symbol r, rM (m, m, . . . , m) is true .

Now, looking at the technique of 3.H, you can see that we defined

    ‖P‖ = 1 if P is true for this model and
    ‖P‖ = 0 if P is false for it.

Now we can alter this model slightly to get another fundamental result for Plain PL.

Make a second model, let's call it M′, defined in exactly the same way as M above, with the one change that the actions of the relation symbols are defined

    for any n-ary relation symbol r, rM′(m, m, . . . , m) is false .

Everything still works (check it!) and all theorems of PL are still true for this model.

Choose a unary relation, r say, and consider the sentence P = (∀x)r(x). Now P is false for the second model M′, so it cannot be a theorem of PL. On the other hand, ¬P is false for the first model M, so it cannot be a theorem of PL either. Since P is a sentence, this
shows that
PL is not complete.
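The argument is simple enough to mechanise. Here is a tiny Python sketch (the formula encoding is illustrative) that evaluates sentences over a one-element domain and checks both halves of the argument above.

    # Evaluate simple sentences over the one-element domain {m}, where the
    # single atomic statement r(m) has a fixed truth value.
    def evaluate(formula, atom_value):
        if formula == "atom":            # the atomic statement r(m)
            return atom_value
        op, sub = formula
        if op == "not":
            return not evaluate(sub, atom_value)
        if op == "forall":               # only one instance to check: x = m
            return evaluate(sub, atom_value)

    P = ("forall", "atom")               # P = (forall x) r(x)
    assert not evaluate(P, atom_value=False)           # P false in M'
    assert not evaluate(("not", P), atom_value=True)   # not-P false in M
    # So neither P nor not-P can be a theorem of PL.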

G Example: Models for RA


G.1 The first model
A model for RA which breaks the cancellation laws is not hard to manufacture. All we need
to do is add an extra element to N, to which we’ll give the suggestive name of ∞. We will
extend the usual definitions of successor, addition and multiplication to include this new
element in what is (I suppose) the obvious way.

G.2 Definition of the first model


The underlying set is N ∪ {∞}, where ∞ is any element which does not belong to N already.
The operations are defined as follows: for any m, n ∈ N,

    m⁺ , m + n , m.n are all defined as usual in N ,   ∞⁺ = ∞ ,
    ∞ + m = ∞ ,   m + ∞ = ∞ ,   ∞ + ∞ = ∞ ,
    ∞.0 = 0 ,   0.∞ = 0 ,
    ∞.m = ∞ and m.∞ = ∞ for all m ≠ 0 ,   ∞.∞ = ∞ .

G.3 Theorem
None of the cancellation laws of ordinary arithmetic is a theorem of RA. In other words, these are all non-theorems:

    x + y = x + z ⇒ y = z
    y + x = z + x ⇒ y = z
    x.y = x.z ∧ x ≠ 0 ⇒ y = z
    y.x = z.x ∧ x ≠ 0 ⇒ y = z

Proof. This is really just a matter of checking cases. All the axioms of RA hold for N, so we
just have to check that they also hold when one or more of the variables involved is ∞. For
example, to check RA5, it is enough to check that
    (∀x)( x + ∞⁺ = (x + ∞)⁺ )
    (∀y)( ∞ + y⁺ = (∞ + y)⁺ )
    and ∞ + ∞⁺ = (∞ + ∞)⁺ .
This can be done just by checking the definitions above. The same goes for all the other
axioms. (And there is no need to check the axioms of PL or equality, because they hold
automatically). Now, to show that the “untheorems” above really don’t hold in the model,
all we need do is look for an example — and it will have to involve ∞, so the search is not
hard.
    ∞ + 0 = ∞ + ∞   but   0 ≠ ∞
    0 + ∞ = ∞ + ∞   but   0 ≠ ∞
    ∞.1 = ∞.∞       but   1 ≠ ∞
    1.∞ = ∞.∞       but   1 ≠ ∞     □
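Here is a small Python sketch of this model (representing ∞ by a string is just an illustrative encoding). It spot-checks instances of axiom RA5 and exhibits the failed cancellations.

    INF = "inf"   # the extra element, infinity

    def suc(x):
        return INF if x == INF else x + 1

    def add(x, y):
        return INF if INF in (x, y) else x + y

    def mul(x, y):
        if x == 0 or y == 0:
            return 0
        return INF if INF in (x, y) else x * y

    # RA5-style check, x + y+ = (x + y)+, including the new element:
    for x in [0, 1, 2, INF]:
        for y in [0, 1, 2, INF]:
            assert add(x, suc(y)) == suc(add(x, y))

    # The cancellation laws fail:
    assert add(INF, 0) == add(INF, INF) and 0 != INF
    assert mul(INF, 1) == mul(INF, INF) and 1 != INF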

G.4 A second model


Here we look at a model for RA which shows that most of the other theorems of PA do
not hold in it (the cancellation theorems do hold in this one). The general strategy is
fairly simple: again we just add another element to N (let's call it λ). Then of course we immediately must have lots of other things, λ + 1 and λ.λ + λ and (λ + λ).(1 + λ) and so on. In the first model we made these all the same; now we will make them all different,
unless the axioms of RA force them to be the same. So we look at which ones of these are
forced to be the same (such as (λ + λ) + 1 and λ + (λ + 1)). It turns out that hardly any
of them are.
We can thus isolate a set of expressions which we can guarantee are all allowed to be different
by the axioms. Then we figure out how to apply the operations (successor, addition and
multiplication) to these results in a model where most of the algebraic laws which hold for
N fail. In the model below, I use special symbols so that expressions can be recognised –
and we will know that if they look different, they are different.
There is one tricky bit, forced on us by Axiom RA3. It tells us, amongst other things, that
there must be some x such that x + 1 = λ. It seems reasonable to call this λ − 1. Then
of course we get λ − 2, λ − 3, . . . , and some kind of subtraction creeps in which must be
accommodated in the model.
To cut a long story short, here is the outcome of this process. We end up with a number of
special cases to look at, so the definition looks much more complicated than it actually is.
In the same way, checking the axioms boils down to checking lots of special cases too.

G.5 The model


We start by making a simple language. The alphabet consists of all natural numbers, together with the symbols λ , ⊕ , ⊖ , ⊗ , [ and ] .
The expressions are

(i) λ and any number n ∈ N.

(ii) [X ⊕ Y] , [X ⊖ Y] and [X ⊗ Y] , where X and Y are shorter expressions.

Now we define a basic expression recursively:

    λ is basic (and n ∈ N is not),
    [X ⊕ Y] , [X ⊖ Y] and [X ⊗ Y] are all basic provided that Y is basic.


Now we define an M-expression as any of:

(a) any n ∈ N,

(b) any basic expression,

(c) any expression of the forms [X ⊕ y] or [X ⊖ y] , where X is basic, y ∈ N and y ≠ 0.

We will make a model out of the set of M-expressions.

For any M-expression X, define the successor X⁺ as follows:

    If X = n ∈ N, X⁺ = n⁺, the usual successor in N.
    If X is basic, X⁺ = [X ⊕ 1].
    If X = [X′ ⊕ y] (with X′ basic, y ∈ N ∖ {0}), X⁺ = [X′ ⊕ y+1].
    If X = [X′ ⊖ y] (with X′ basic, y ∈ N ∖ {0}),
        X⁺ = X′ if y = 1,
        X⁺ = [X′ ⊖ y−1] otherwise, i.e. if y ≥ 2.

For any M-expressions X and Y, define the sum X + Y as follows:

    If Y is basic, then X + Y = [X ⊕ Y].
    If Y = [Y′ ⊕ y] (with Y′ basic and y ∈ N ∖ {0}), then X + Y = [[X ⊕ Y′] ⊕ y].
    If Y = [Y′ ⊖ y] (with Y′ basic and y ∈ N ∖ {0}), then X + Y = [[X ⊕ Y′] ⊖ y].
    If Y = 0, then X + Y = X.
    Otherwise Y = y ∈ N ∖ {0} and

        If X is basic, then X + y = [X ⊕ y].
        If X = [X′ ⊕ x] (with X′ basic and x ∈ N ∖ {0}), then X + y = [X′ ⊕ x+y].
        If X = [X′ ⊖ x] (with X′ basic and x ∈ N ∖ {0}), then
            X + y = [X′ ⊕ y−x]   if x < y ,
            X + y = X′           if x = y ,
            X + y = [X′ ⊖ x−y]   if x > y .
        If X = x ∈ N, then X + Y = x + y is the usual sum in N.


Before defining multiplication we define some notation, [X ⊕ⁿ Y] and [X ⊖ⁿ Y], for any n ∈ N:

    [X ⊕⁰ Y] = X ,   [X ⊕ⁿ⁺¹ Y] = [[X ⊕ⁿ Y] ⊕ Y] ,
    [X ⊖⁰ Y] = X ,   [X ⊖ⁿ⁺¹ Y] = [[X ⊖ⁿ Y] ⊖ Y] .

For any M-expressions X and Y, define the product X.Y as follows:

    If Y is basic, then X.Y = [X ⊗ Y].
    If Y = y ∈ N, then X.Y = 0 if y = 0; otherwise X.Y = [X ⊕ʸ⁻¹ X].
    If Y = [Y′ ⊕ y] (with Y′ basic and y ∈ N ∖ {0}), then X.Y = [[X ⊗ Y′] ⊕ʸ X].
    If Y = [Y′ ⊖ y] (with Y′ basic and y ∈ N ∖ {0}), then X.Y = [[X ⊗ Y′] ⊖ʸ X].
It is now just a tedious process of checking cases to confirm that this structure is indeed a
model for RA.
Using the fact that these expressions are different if they are written differently, it is easy
to check that many of the usual algebraic properties of N do not hold in this model, and so
the corresponding theorems do not hold in RA.

G.6 Proposition
The following are not theorems of RA:
Associativity and commutativity of addition.

Associativity and commutativity of multiplication.


Distributivity of multiplication over addition.
Behaviour of zero on the left.

Proof. First we check that all the axioms of RA are satisfied by this model, just a matter
of checking lots of cases. Then, noting that λ is basic, we have

    λ + (λ + λ) = [λ ⊕ [λ ⊕ λ]]   but   (λ + λ) + λ = [[λ ⊕ λ] ⊕ λ] ,

contradicting both associativity and commutativity. The same trick works for multiplication:

    λ.(λ.λ) = [λ ⊗ [λ ⊗ λ]]   but   (λ.λ).λ = [[λ ⊗ λ] ⊗ λ] ,

contradicting both associativity and commutativity again. For distributivity,

    λ.(λ + λ) = [λ ⊗ [λ ⊕ λ]]   and   (λ + λ).λ = [[λ ⊕ λ] ⊗ λ]
    but   λ.λ + λ.λ = [[λ ⊗ λ] ⊕ [λ ⊗ λ]] ,

contradicting distributivity on both sides. For behaviour of zero,

    0 + λ = [0 ⊕ λ] ≠ λ   and   0.λ = [0 ⊗ λ] ≠ 0 .   □

This model satisfies the usual cancellation laws, but a small modification to it breaks those laws too. All that is needed is to define λ + λ = λ instead of [λ ⊕ λ] and λ.λ = λ instead of [λ ⊗ λ]. However it was nicer to break those laws with the much simpler first model.
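For anyone who would like to see the second model in action without checking cases by hand, here is a Python sketch (the tuple encoding of M-expressions is illustrative). It transcribes the definition of the sum and confirms the failure of associativity exhibited in the proof above.

    # M-expressions: ints for N, the string "lam" for the new element λ, and
    # ("+", X, Y), ("-", X, Y), ("*", X, Y) for [X ⊕ Y], [X ⊖ Y], [X ⊗ Y].
    LAM = "lam"

    def is_basic(e):
        if e == LAM:
            return True
        return isinstance(e, tuple) and is_basic(e[2])

    def add(X, Y):
        # Direct transcription of the cases defining X + Y; arguments are
        # assumed to be M-expressions, so a non-basic tuple has the form
        # (op, X', x) with X' basic, x a positive integer, op in "+", "-".
        if is_basic(Y):
            return ("+", X, Y)
        if isinstance(Y, tuple):
            op, Y1, y = Y
            return (op, add(X, Y1), y)        # [[X ⊕ Y'] op y]
        if Y == 0:
            return X
        y = Y                                 # Y is a positive integer
        if is_basic(X):
            return ("+", X, y)
        if isinstance(X, tuple):
            op, X1, x = X
            if op == "+":
                return ("+", X1, x + y)
            if x < y:                         # X = [X' ⊖ x]
                return ("+", X1, y - x)
            return X1 if x == y else ("-", X1, x - y)
        return X + y                          # both ordinary numbers

    left = add(add(LAM, LAM), LAM)    # (λ + λ) + λ = [[λ ⊕ λ] ⊕ λ]
    right = add(LAM, add(LAM, LAM))   # λ + (λ + λ) = [λ ⊕ [λ ⊕ λ]]
    assert left != right              # addition in RA is not associative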
Part II

MATHEMATICS

6. MORSE-KELLEY SET THEORY

A Morse-Kelley Axioms for Set Theory


In this chapter I expound the Morse-Kelley axioms for Set Theory. This first-order theory
is sufficiently powerful to form a basis for all mathematics (if we exclude branches which
investigate what can be done with other logical systems, such as intuitionistic mathematics
and metamathematics, as we must).
While it is obviously impossible to actually develop the whole of mathematics explicitly
here in this book, I will provide the beginning of this program, enough so that I hope it will
become obvious that it can indeed be carried through. The plan is that, whatever area of
mathematics you are reasonably familiar with, you will be able to see how it is underpinned
by this logical foundation without any trouble.
In this chapter we define and discuss the first-order theory MK. Every expression, term
and symbol discussed will be part of this language, and no complicated meta-arguments will
be engaged in. Consequently, the use of the green font to distinguish carefully between the
formal language and the metalanguage is unnecessary here and would just be a nuisance.
I will not bother with green font-colouring in this chapter and the next, otherwise almost
everything would be in colour.
I will start by defining the theory, presenting the axioms without comment. Most of them
will be difficult to understand at this stage, but that is only because they are being expressed
here in the formal language — it is, of course, important that this can be done. So I would
suggest you just scan the definition at this stage to get a rough feel for it (and convince
yourself we are indeed defining a first-order theory here) and then refer back to parts of the
definition as they become relevant later in the chapter. What each of them actually says is
fairly straightforward, and this will be explained as we proceed through this chapter.

A.1 Definition: Morse-Kelley Set Theory, MK


This is a first-order theory with equality. It has one binary relation ∈ and no functions. I
will use a couple of abbreviations to simplify the statements of these axioms somewhat:

SET(x) will mean (∃s)(x ∈ s)


x⊆y will mean (∀u)( u ∈ x ⇒ u ∈ y ) .

Read the first one as saying “x is a set”. This may look a little strange; the reason for it will
be explained soon.


The proper axioms are:—


(MK1) Extension

    (∀a)(∀b)( (∀x)(x ∈ a ⇔ x ∈ b) ⇒ a = b )

(MK2) Specification
Schema: an axiom for every expression P with at least one free variable x (here y1, y2, . . . , yn are all the other free variables of P — if any).

    (∀y1)(∀y2) . . . (∀yn)(∃w)(∀x)( x ∈ w ⇔ SET(x) ∧ P )

(MK3) Unordered Pairs

    (∀a)(∀b)( SET(a) ∧ SET(b) ⇒ (∃w)( SET(w) ∧ a ∈ w ∧ b ∈ w ) )

(MK4) Unions

    (∀a)( SET(a) ⇒ (∃w)( SET(w) ∧ (∀x)( (∃y)(x ∈ y ∧ y ∈ a) ⇒ x ∈ w ) ) )

(MK5) Power Set

    (∀a)( SET(a) ⇒ (∃w)( SET(w) ∧ (∀x)( x ⊆ a ⇒ x ∈ w ) ) )

(MK6) Infinity

    (∃w)( SET(w) ∧ (∃u)( u ∈ w ∧ (∀x)¬(x ∈ u) )
        ∧ (∀x)( x ∈ w ⇒ (∃y)( y ∈ w ∧ (∀u)( u ∈ y ⇔ u ∈ x ∨ u = x ) ) ) )

(MK7) Formation

    (∀a)(∀b)(∀c)( SET(a)
        ∧ (∀u)( u ∈ a ⇒ (∃v)(∃w)( v ∈ b ∧ w ∈ c ∧ u ∈ w ∧ v ∈ w ) )
        ∧ (∀u)(∀w1)(∀w2)(∀v1)(∀v2)( u ∈ a ∧ w1 ∈ c ∧ w2 ∈ c ∧ v1 ∈ b ∧ v2 ∈ b
            ∧ u ∈ w1 ∧ v1 ∈ w1 ∧ u ∈ w2 ∧ v2 ∈ w2 ⇒ v1 = v2 )
        ∧ (∀v)( v ∈ b ⇒ (∃u)(∃w)( u ∈ a ∧ w ∈ c ∧ u ∈ w ∧ v ∈ w ) )
        ⇒ SET(b) )

(MK8) Foundation

    (∀w)( (∀a)( (SET(a) ∧ a ⊆ w) ⇒ a ∈ w ) ⇒ (∀a)( SET(a) ⇒ a ∈ w ) )

Note the Axiom of Choice is not an axiom of MK. It will be discussed below in Section F.

A.2 Discussion: Sets and classes


I am assuming that if you are reading these notes, you are already reasonably familiar with
the idea of a set. On the other hand, you might not be so conversant with the idea of a class.
MK set theory deals with sets and classes, the fundamental idea being that of a class. The
theory, of course, defines precisely and in detail the logical relationships between classes, the
various manipulations that can be performed on them and the things that can be proved
about them.
Nevertheless, it is very helpful to have some sort of mental picture of a class. I shall try to
provide this now.
Firstly, a class is very like a set. A class has members and is in fact defined by its membership,
in the sense that if class A and class B have exactly the same members, then A = B. Classes
have intersections and unions and one class can be a subclass of another, all these things
being defined in terms of their members, just as with sets. One can define functions from
one class to another, and so on and so on.
So what is the difference?

Well, firstly we will see that a set is just a special kind of class, so the interesting question
is: what other sorts of classes are there? What we really want is some idea of what a proper
class might be — that is, a class which is not a set.
The best answer I can give here is to think of them as things like sets, but which are “too
big” to be sets. Examples are: the class of all sets, the class of all topological spaces, the
class of all groups, the class of all three-dimensional C ∞ manifolds, etc. We will see later
that the assumption that any one of these examples I have given of proper classes is a set
leads to a contradiction. In other words, any theory which contains enough set theory to
define, say, the set of all groups will be inconsistent.
So, for MK, the basic object is a class. Everything is a class. For instance, when we write

(∀x) . . .

take this as meaning


For every class x . . .
Putting this another way: classes constitute the universe of discourse for MK.
Of course there are many mathematical objects that we don't normally think of as classes, or even as sets: the number √2, any point in the plane, the logarithm function and so on.
Nevertheless in this theory they are all defined as classes (in fact as sets) — the only reason
we don’t think of them as sets is that we are not interested in their members.

Some of the notation becomes a little clearer if we use uppercase letters for variables which
we think of as classes or sets (that is, whose members are of interest to us) and lowercase
letters for the others — as in “let x be a member of the group G”. I will do this where
appropriate as far as possible.
It should be pointed out here that there are other ways of axiomatising set theory and with
it the “whole of mathematics”. There is one pre-eminent one, known as Zermelo-Fraenkel

Set Theory or ZF, which is probably more popular than MK, in the sense that of those texts
which take the trouble to describe the foundations they assume, I believe currently more of
them specify ZF than specify MK.
So why am I subjecting you to MK instead of the better known and possibly more widespread
ZF?

The fundamental object in ZF is the set, not the class. In fact, there is no way in ZF to talk
directly about classes. But for a number of areas of modern mathematics this can be quite
a drawback. In algebraic topology one discusses functors, and these are functions from the
class of all topological spaces to the class of all groups. In group theory, “abelianisation” is a
function from the class of all groups to the class of all abelian groups. And category theory
can hardly be done at all without using classes extensively.
In this chapter, many statements are made and some of them have quite obvious and easy
proofs; in these cases the proofs will usually be omitted to keep the narrative going. Where
proofs are not obvious they will usually be given. There are occasional (very few) state-
ments whose proofs are truly enormous and which will not be given — the consistency and
independence of the Axiom of Choice being two examples.

The remainder of this chapter will be devoted to showing how mathematics, as we know it, is done within this formal theory. The axioms will be introduced one at a time, what they say will be explained in plain language, and an overview given of what they provide for us.
Towards the end I will discuss the Axiom of Choice, which perhaps can be thought of as an
optional extra axiom, and the extra facilities it provides us with.

B The first five axioms


B.1 MK1: The Axiom of Extension
As given above:

    (∀a)(∀b)( (∀x)(x ∈ a ⇔ x ∈ b) ⇒ a = b )
Restating this slightly less formally: for any two classes A and B,

if (∀x)(x ∈ A ⇔ x ∈ B) then A = B .

Or even less formally: if two classes have exactly the same members, then they are equal.

First observe that the opposite implication, A = B ⇒ (∀x)(x ∈ A ⇔ x ∈ B), is given to us by Substitution of Equals, and so from this axiom follows the slightly stronger form

    A = B if and only if (∀x)(x ∈ A ⇔ x ∈ B) ,

in other words, two classes are equal if and only if they have the same members.

B.2 Definition of a set


A set is defined as anything that can be a member of something else. That is

x is a set means there is some s such that x ∈ s .

Thus the (semi-formal) notation SET(x) introduced right at the beginning of A.1 above says
that x is a set.

A proper class is a class which is not a set.


It might help to think: A set is small enough to be a member of something else, perhaps
another set or a class, but a proper class is far too big to be a member of anything.

B.3 MK2: The Axiom of Specification and the Empty Set


The Axiom of Specification is sometimes called The Axiom of Comprehension.

It is a schema: an axiom for every expression P with at least one free variable x (here
y1 , y2 , . . . , yn are all the other free variables of P — if any).

(∀y1 )(∀y2 ) . . . (∀yn )(∃w)(∀x)(x ∈ w ⇔ SET(x) ∧ P )

This might be easier to understand if we use the less formal “functional” notation for the
expression P and write it P (x, y1 , y2 , . . . , yn ) — with the understanding that the variables
listed are all the free variables in P . The axiom can be restated: for any such expression P
and given any classes y1 , y2 , . . . , yn , there is a class w such that

(∀x)( x ∈ w ⇔ SET(x) ∧ P (x, y1 , y2 , . . . , yn ) )

and this last line can be restated:

w is the class consisting of all those sets x for which P (x, y1 , y2 , . . . , yn ) is true.

This axiom gives us a powerful tool for defining new sets and classes, and we will be using
it frequently. For now, let us look at a straightforward example of the way it can be used.
Suppose we have two sets (or classes, it doesn’t matter) A and B and we want to define
their intersection. We know what we want — we want a set (or class), W say, with this
property
for any x, x ∈ W ⇔ x ∈ A and x ∈ B . (–1)
So, take the right hand part of this and call it P (or, if you prefer, P(x, A, B)), thus:

P (x, A, B) is the expression x ∈ A ∧ x ∈ B

Applying the axiom to this, it tells us that there is a class W such that

(∀x)( x ∈ W ⇔ SET(x) ∧ x ∈ A ∧ x ∈ B )

Now, notice that if x ∈ A then SET(x) is automatically true (by the definition of SET(x)),
so this simplifies to
(∀x)( x ∈ W ⇔ x ∈ A ∧ x ∈ B )
which is just (–1) written slightly more formally.
The point of this example is that it shows how we can define new sets and classes: As long
as we can describe the members of the class we want by some expression P , then the Axiom
of Specification will tell us that that class exists.

What is the point of that part “SET(x)” of the axiom? It is there just to ensure that the
members of W must always be sets — it is possible that you could choose a bad sort of P
which was true for some proper classes x. We will see an example of this shortly. However,
in most cases the expression P implies that x is a set, and then the axiom simplifies. This
occurs frequently enough for us to state it properly:—

B.4 Simplified version of the Axiom of Specification


If the expression P (x, y1 , y2 , . . . , yn ) has the property that

(∀y1 )(∀y2 ) . . . (∀yn )( P (x, y1 , y2 , . . . , yn ) ⇒ SET(x) )

then the Axiom of Specification simplifies to

(∀y1 )(∀y2 ) . . . (∀yn )(∃w)(∀x)(x ∈ w ⇔ P),

in other words, for any y1 , y2 , . . . , yn there is a class w whose members are just those x for
which P is true.

B.5 A more useful version of the Axiom of Specification


Given an expression P and any y1 , y2 , . . . , yn , the class w is actually defined uniquely by the
Axiom of Specification. That is, the axiom can be strengthened to:
Let P be an expression with at least one free variable x (and let y1 , y2 , . . . , yn be all the
other free variables of P — if any). Then

(∀y1 )(∀y2 ) . . . (∀yn )(∃!w)(∀x)(x ∈ w ⇔ SET(x) ∧ P )



That is, given any y1 , y2 , . . . , yn , there is a unique class w such that

(∀x)( x ∈ w ⇔ SET(x) ∧ P (x, y1 , y2 , . . . , yn ) )

Proof. This is an immediate consequence of the Axiom of Extension. □

This unique existence allows us to define a function by description (see 4.A.7). The usual
notation for this is { x : P } or, if you would like to be explicit about the variables involved,
{ x : P (x, y1 , y2 , . . . , yn ) }. According to the way definition by description works, this is
defined by
W = {x : P } ⇔ (∀x)( x ∈ W ⇔ SET(x) ∧ P ) (–1)

Returning to our intersection example above. This latest result proves that, not only does
there exist a class with the required property of an intersection, but there is only one such.
That allows us to write the intersection of two classes as a function, which is exactly what
we do in the notation A ∩ B ( ∩ is a function of two variables; we are using infix notation).
Note by the way that the x in { x : P } is a dummy: it is a bound variable, bound by this
notation, and its scope is everything inside the two curly braces.
This means, amongst other things, that we can change the dummy to another variable
without changing the meaning of the notation,

{ x : P } = { u : P [x/u] }

provided, as usual, that the substitution is acceptable in P . But note, there is also a “hidden”
acceptability condition here: u must not even occur free in P (unless u is the same as x, in
which case nothing changes). To see why this is so, look at what the definition becomes in
this case:
w = { u : P [x/u] } ⇔ (∀u)( u ∈ w ⇔ P [x/u] ) ;
you can see that, if u occurs free in P then it is definitely unacceptable on the right hand
side of this definition.
Using this notation for our intersection example, this all boils down to the more compact
definition
A ∩ B = {x : x ∈ A ∧ x ∈ B }
and, as an example of changing the dummy, this is the same as

A ∩ B = {v : v ∈ A ∧ v ∈ B }

(or any other letter you would like to use except for A or B). To see the sort of thing that
can happen if you violate the “hidden” acceptability condition, consider

A ∩ B = {A : A ∈ A ∧ A ∈ B } WRONG!

(I expect that you are very familiar with this notation.)

Now let us use this axiom to get some useful basic facts about sets and classes.

Is there any guarantee that there are any classes at all? Well, yes. Substitute any predicate
you like into the Axiom of Specification, for instance x = x, and we have

(∃W )(∀x)(x ∈ W ⇔ SET(x) ∧ x = x)

which tells us that there is a class (W ) with certain properties, which we won’t bother about
just now. Suffice it for now to know that at least one class actually exists.
Well then, are there any sets at all? Again yes. A glance at the Axiom of Infinity (MK6)
shows us that it states that a class (W again) exists with certain complicated-looking prop-
erties, the first of which is that it is a set. For the moment, that is enough: there exists at
least one set.
But of course we can do a lot better than that.

B.6 The empty set


Now let us use the Axiom of Specification once more with the expression x ≠ x (in case it is not obvious by now, x ≠ x is just short for ¬(x = x)):

    (∃W)(∀x)( x ∈ W ⇔ SET(x) ∧ x ≠ x )

But it is far more convenient to use this to define the empty class; since x is the only variable
in the expression, this defines a constant.
Definition
The empty set, ∅, is defined
∅ = { x : x ≠ x } .   (–1)

This certainly defines ∅, but so far all it tells us is that ∅ is a class. We will show that it
is indeed a set shortly, but first. . .

This definition tells us that

    (∀x)( x ∈ ∅ ⇔ x ≠ x )

and, since x ≠ x is false for all x (by the first Axiom of Equality), this is the same as

    (∀x)( x ∉ ∅ ) .   (–2)

Using the Axiom of Extension, it is easy to show that this also defines ∅ uniquely; it is
probably the better-known definition of the empty set. That means also that any set u
which has the property (∀x)¬(x ∈ u) must be the empty set.
Now, to prove that it is indeed a set, we peek once more at the Axiom of Infinity. Notice
that it has the form

    (∃w)( stuff ∧ (∃u)( u ∈ w ∧ (∀x)¬(x ∈ u) ) ∧ more stuff )

and so it says, amongst other things and in a very formal way, that there is a class w such
that ∅ ∈ w. But then, by the definition of a set above, ∅ is a set.

B.7 Some useful notation (definitions)

    x ∉ A means ¬(x ∈ A)
    A ⊆ B means (∀x)(x ∈ A ⇒ x ∈ B)
    A ⊂ B means A ⊆ B ∧ A ≠ B
    A ⊇ B means B ⊆ A
    A ⊃ B means B ⊂ A
    (∀x ∈ A)P(x) means (∀x)(x ∈ A ⇒ P(x))
    (∃x ∈ A)P(x) means (∃x)(x ∈ A ∧ P(x))

Note that all these definitions hold whenever A and B are classes — there is no requirement
that they be sets. With regard to the last two notations, observe that if the class A happens
to be empty, (∀x ∈ A)P (x) is automatically true regardless of the expression P (x). We say
that the statement is vacuously true. Similarly, if A is empty, the statement (∃x ∈ A)P (x)
is automatically false.
Also:
A is a proper class means ¬ SET(A) ,

that is, A is not a set.

B.8 Properties of subclasses


The following are now easily proved

(i) The empty set is a subset of every class:

∅⊆A

and the only subclass of the empty set is itself:

A⊆∅ ⇔ A=∅

(ii) The subclass relation has the properties of a partial order:

Reflexivity A⊆A

Antisymmetry A⊆B & B⊆A ⇒ A=B

Transitivity A⊆B & B⊆C ⇒ A⊆C

Easy exercise: prove all these statements.

We will prove shortly that any subclass of a set is a set.



B.9 The class of all sets


The class of all sets is denoted Sets and is defined by

Sets = { x : x = x } .

From this it follows that


x ∈ Sets ⇔ SET(x) .
We will prove shortly that Sets is not itself a set — it is a proper class.
It is easy to prove now that every class is a subclass of Sets:

for every class X, X ⊆ Sets

The class of all sets is occasionally called “the universe”. I will not use the word in this sense
because it clashes with the way we use it elsewhere in this book — recall, we have used the
term “universe of discourse” to mean everything that a variable can represent. For MK, the
universe of discourse is all classes, not just all sets.

B.10 MK5: The Axiom of Power Sets


It is convenient now to skip ahead to the Axiom of Power Sets; we will come back to the
Axioms of Unordered Pairs and Unions shortly.

Let A be a class. Then the power class of A is defined to be the class

P (A) = { X : X ⊆ A } ,
that is, the class whose members are exactly the subclasses of A which are sets — the subsets
of A.
If you are used to the idea of a power set, then this definition seems a little strange. Why
not define P (A) to consist of all the subclasses of A? The reason is that any subclasses of
A which are not sets are not allowed to be members of anything, so cannot be members of
P (A).
In fact, this is one of the uncommon cases in which the definition of this notation:

    X ∈ P(A) ⇔ SET(X) ∧ X ⊆ A

cannot be simplified immediately to

    X ∈ P(A) ⇔ X ⊆ A

because X ⊆ A does not imply that X is a set.


So we don’t need an axiom to tell us that the power class of a class exists. The Axiom
of Power Sets is going to tell us, in a slightly roundabout way, that the power set of a set
exists.
The Axiom of Power Sets states that, for every set A, there is a set W such that

(∀X)(X ⊆ A ⇒ X ∈ W) ,

that is, there is a set W such that every subclass of A is a member of W ; in other words,
such that P (A) ⊆ W .
There are several important things to notice here:
(1) If A is any set, then every subclass of A must also be a set, since it is a member of
this set W . This is the result promised in B.8 above.
(2) If A is a set, then its power class P (A) is in fact a set, since it is a subclass of the
set W . We therefore call it the power set of A.
(3) A corollary of all this is that the Zermelo-Fraenkel version of the Axiom of Specifi-
cation holds, that is, if a is a set and P (x) an expression, then
(∃w ∈ Sets)(∀x)(x ∈ w ⇔ x ∈ a ∧ P (x)) .

(4) A couple of special power sets are worth knowing:


P (∅) = {∅} and P ({a}) = { ∅, {a} }.
We cannot present the second one officially yet, because we have not defined the notation
involved; however I’m sure you know what it means.

B.11 The Russell Paradox


In the early days of attempts to set mathematics on a firm foundation by axiomatising Set
Theory, it was generally assumed that one could define a set as any collection of things (or
at least, any collection of whatever things mathematics can talk about). But mathematics
can (it is assumed) talk about sets, and so it is then natural to consider the set of all sets.
Bertrand Russell showed that assuming the existence of the set of all sets gives rise to an
inconsistency (at that time called a paradox), and this gave rise to a serious re-think of the
foundations of Set Theory. Russell’s solution was to come up with a multi-tiered approach
(his “Theory of Types”) but more recently it has become clear that a simpler two-tiered
approach is sufficient. In MK the two tiers are sets and classes, and the paradox is avoided
by the fact that, in this theory, there is a class of all sets but not a set of all sets.

For anyone interested in category theory: actually the two-tiered approach starts to
fall a little short here. Given categories A and B, there is a concept of the category of
functors A → B and natural transformations between them. If A and B are classes
(as they usually are) then the members of the functor category are classes too, and the
whole thing cannot be accommodated within MK without some serious double-speak.
(There are ways of getting around the problem, but they are a bit of a nuisance.) If
you are not interested in category theory, ignore this comment.

In the context of MK, Russell’s argument shows that the class of all sets cannot be a set,
and is therefore a proper class.
The Russell class is defined to be the class of all sets which are not members of themselves:

    RUS = { x : x ∉ x } .

The “Russell Paradox” argument shows that this is not a set. From the results above, we
see that RUS ⊆ Sets, and so Sets is not a set either.
Here is the Russell Paradox argument: note first that the definition of RUS is equivalent to

    x ∈ RUS ⇔ SET(x) ∧ x ∉ x .   (–1)

We suppose that RUS is a set and deduce a contradiction.

Now either RUS is a member of itself or not.


If RUS ∈ RUS, then it fails the condition on x in the definition (–1) above, and so RUS ∉ RUS — a contradiction.
On the other hand, if RUS ∉ RUS (and remembering our assumption that RUS is a set), then it satisfies the condition on x in the definition (–1) above, and so RUS ∈ RUS — a contradiction again.
Thus our basic assumption that RUS is a set leads to an inescapable contradiction (whether
it is a member of itself or not). The only conclusion is that RUS is not a set: it is a proper
class.

The important thing to take away from this fun little proof is

Sets is not a set, it is a proper class.

The Russell Paradox is sometimes called the “Barber Paradox”, from a popularisation which
starts off, “In a certain town there is a barber who shaves everyone who does not shave
himself. Does the barber shave himself?” This is not a true paradox because the logical
conclusion is that no such barber exists.

B.12 The calculus of classes


All the usual operations and relations that we know and love with sets (subset, intersection, union, set-theoretic difference and so on) are defined for classes too and work the same way. We also get an operation for classes that is not available for sets — the complement.
The complement of a class A, usually denoted Aᶜ, is defined by Aᶜ = { x : x ∉ A }. If A is a set, then Aᶜ is not a set, it is a proper class (because, as we will see shortly, A ∪ Aᶜ = Sets and the union of two sets is a set).
Properties of the complement are

(i) ∅ᶜ = Sets and Setsᶜ = ∅ .

(ii) For any class A, (Aᶜ)ᶜ = A .

(iii) For any classes A and B, A ⊆ B ⇒ Bᶜ ⊆ Aᶜ .

(iv) For any classes A and B, A = B ⇒ Aᶜ = Bᶜ .



The class-difference A ∖ B of two classes A and B is defined by A ∖ B = { x : x ∈ A ∧ x ∉ B }.

If A is a set, then A ∖ B is also a set (whether or not B is a set!). When A and B are both sets, this is often called the set-theoretic difference.
Properties of the class-difference are:

(v) For any classes A and B, A ∖ B ⊆ A .

(vi) For any class A, A ∖ ∅ = A , A ∖ Sets = ∅ , ∅ ∖ A = ∅ and Sets ∖ A = Aᶜ .

(vii) For any classes A and B, A ⊆ B ⇔ A ∖ B = ∅ .

The union and intersection of two classes are defined by A ∪ B = { x : x ∈ A ∨ x ∈ B } and A ∩ B = { x : x ∈ A ∧ x ∈ B }. There are many properties which can be listed now, of which the most important are:

(viii) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

(ix) A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C

(x) A ∪ B = B ∪ A and A ∩ B = B ∩ A

(xi) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

(xii) A ∪ A = A and A ∩ A = A

(xiii) A ∪ ∅ = A , A ∪ Sets = Sets , A ∩ ∅ = ∅ and A ∩ Sets = A

(xiv) A ∪ Aᶜ = Sets and A ∩ Aᶜ = ∅

(xv) A ⊆ A ∪ B and A ∩ B ⊆ A

(xvi) A ⊆ B ⇔ A ∪ B = B and A ⊆ B ⇔ A ∩ B = A .

(xvii) . . . and so on.

Proof. All these proofs go much the same way and are quite easy. The trick is to first use
the definitions of complement, intersection, or whatever operations are relevant, to convert
the equation into a problem in logic; this then turns out to be a well-known result in SL.

For example, consider the first equation of (xi). Using the Axiom of Extension, this is equivalent to

    (∀x)( x ∈ A ∩ (B ∪ C) ⇔ x ∈ (A ∩ B) ∪ (A ∩ C) )
Working on the left-hand side,

x ∈ A ∩ (B ∪ C) ⇔ x ∈ A ∧ x ∈ (B ∪ C) Defn of intersection
⇔ x ∈ A ∧ (x ∈ B ∨ x ∈ C) Defn of union

and on the right

x ∈ ((A ∩ B) ∪ (A ∩ C)) ⇔ (x ∈ A ∩ B) ∨ (x ∈ A ∩ C) Defn of union


⇔ (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C) Defn of intersection

and these are equivalent by SL (2.F.23). Now we have a proof, but it is written out sort of
backwards. Putting it the right way round, we get

x ∈ A ∩ (B ∪ C) ⇔ x ∈ A ∧ x ∈ (B ∪ C) Defn of intersection
⇔ x ∈ A ∧ (x ∈ B ∨ x ∈ C) Defn of union
⇔ (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C) By SL
⇔ (x ∈ A ∩ B) ∨ (x ∈ A ∩ C) Defn of union
⇔ x ∈ ((A ∩ B) ∪ (A ∩ C)) Defn of intersection

It is probably no accident that the symbols for logical AND and OR are so similar to the symbols for intersection and union. □

B.13 MK3: The Axiom of Unordered Pairs


Let a and b be classes. Then {a} and {a, b} are the classes defined by

{a} = { x : x = a } and {a, b} = { x : x = a ∨ x = b } .

We note that if a is a proper class, then {a} is empty; similarly, if a and b are proper classes,
then {a, b} is empty. Therefore these definitions are usually only used when a and b are sets.
In this case they are not empty, and their defining properties are

x ∈ {a} ⇔ x=a and x ∈ {a, b} ⇔ x=a ∨ x=b

In this case, {a} is called a singleton and {a, b} is called a doubleton or an unordered pair.
As with the power set, we don’t need the new axiom to tell us that the pair and the singleton
exist as classes and that they have the requisite properties. The new axiom tells us, in the
usual roundabout way, that they are actually sets.

The Axiom of Unordered Pairs tells us that, for any sets a and b, there is a set W such
that {a, b} ⊆ W . It follows from this that {a, b} is itself a set. Also, from the definition,
{a} = {a, a} and so it follows that {a} is a set also.
So for any set a, there is a set whose only member is a; for any sets a and b there is a set
whose members are a and b (and no others).

From these definitions we can easily prove lots of simple facts about unordered pairs and
singletons. For some examples (here a, b, c, d and x are assumed to be sets)
(i) {a, b} = {c, d} if and only if (a = c ∧ b = d) or (a = d ∧ b = c) .
(ii) {a} = {b} if and only if a=b .

(iii) {a} = {b, c} if and only if a=b=c .


(iv) x∈A if and only if {x} ⊆ A .
(v) {a, b} = { x : x = a ∨ x = b } and {a} = { x : x = a } .

(vi) {a, a} = {a} .



Proof. As an interesting exercise. □

Note that it will shortly become apparent that these simple-looking facts are more important
than they look.

It is a further interesting little exercise to figure out which of these statements remain true
if the restriction that a, b, etc. must be sets is removed.
Before we leave singletons, note a trap for young players: the set {∅} is not the empty set.
It has in fact one member, namely ∅.

B.14 MK4: Unions


There are two “kinds” of unions that turn up in mathematical discourse:

• the union of two classes A ∪ B (from which idea we can define the union of three classes A ∪ B ∪ C and so on) and

• the union of a class of sets: if A is a class of sets this may variously be written ⋃_{Y ∈ A} Y or ⋃{ Y : Y ∈ A } or, most simply, ⋃A . (I prefer the last and simplest notation and will use it in what follows.)

Of course, in the world of MK, the members of classes are always sets, and so any class is
automatically a class of sets. When I say above that A is a “class of sets”, it is merely a
signal to the human reader that we will here actually be interested in the members of A as
sets themselves — the members of its members are relevant.

Both these kinds of union are easily defined:

    A ∪ B = { x : x ∈ A ∨ x ∈ B }
    ⋃A = { x : (∃Y)(x ∈ Y ∧ Y ∈ A) }

Axiom MK4 tells us that, if A is a set of sets, then ⋃A is also a set. (More precisely, it works in the usual roundabout way, telling us that, if A is a set of sets, then there is a set W such that ⋃A ⊆ W; and from this it follows that ⋃A is a set.)

These definitions can be rewritten

    x ∈ A ∪ B ⇔ x ∈ A or x ∈ B ,
    x ∈ ⋃A ⇔ there is some member Y of A such that x ∈ Y .

We can now prove various standard results, such as

    ⋃∅ = ∅
    ⋃{A} = A

from which, in particular,

    ⋃{∅} = ∅ .

Also note that, if A and B are sets, then

    A ∪ B = ⋃{A, B}

which, with the Axiom of Unordered Pairs above, tells us that, if A and B are sets, then so
is A ∪ B. You cannot use this trick if A and B are proper classes because then {A, B} is
empty.

Note that {a, b} = {a} ∪ {b} and so we can extend this notation in a natural manner: {a, b, c} = {a} ∪ {b} ∪ {c} (defined as ({a} ∪ {b}) ∪ {c}) and then of course we have A ∪ B ∪ C = ⋃{A, B, C} (provided A, B and C are sets) and so on.

B.15 Intersections
We define the intersection of two classes and of a class of sets in a similar way to their unions:

    A ∩ B = { x : x ∈ A ∧ x ∈ B } ,
    ⋂A = { x : (∀Y ∈ A)(x ∈ Y) } .

We don't need another axiom to deal with intersections of sets. If A is nonempty, then its intersection is a set, even when A is a proper class! To see this, note that if A is nonempty, then it must contain at least one member, Y say, and this must be a set. It is now easy to prove that ⋂A ⊆ Y, and so the intersection is a set also.
On the other hand, the intersection of the empty set is not a set:

    ⋂∅ = Sets

The proof is an interesting exercise in PL. You should try it.


Warning: it is not usual to use this notation when A is empty. In this case many authors
consider the intersection to be undefined — as it is, in fact, in Zermelo-Fraenkel Set Theory.
If you really want to use the intersection of the empty set, you should state explicitly what
you take it to be.

The intersection of two classes is a class (possibly a set, but not always). The intersection
of two sets is always a set because of the easily proved fact that A ∩ B ⊆ A.
It is now worth looking again at all the basic results listed in Section B.12. All these results
hold just as well for sets as classes (for the simple reason that sets are just a kind of class),
however we now know also that, provided A, B and C are sets, all the things listed in that
Section are also sets — with the exception of those that mention Sets or complements.

B.16 Ordered pairs


Next we define an ordered pair. Notation for ordered pairs varies in the literature; possibly
the most common notation is (x, y), which can be very confusing since that notation is also
used for other things, notably the arguments of functions. So in these notes I will use the
angle-bracket notation ha, bi.
The crucial property of such an ordered pair is that

    ⟨a, b⟩ = ⟨c, d⟩ ⇔ a = c ∧ b = d .

We will actually define ordered pairs twice — in two different ways. I will call the first kind a primitive ordered pair.
So, the primitive ordered pair ⟨a, b⟩ₚ is defined by

    ⟨a, b⟩ₚ = {{a}, {a, b}} .

As with unordered pairs, this definition makes sense when a and b are classes, but is only useful when they are both sets. Note that, if a and b are both sets, the Axiom of Unordered Pairs tells us that this ordered pair ⟨a, b⟩ₚ is also a set.
Using the properties of unordered pairs listed above ((i) to (iii)), we can prove the main property of primitive ordered pairs:

B.17 Proposition
If a, b, c and d are sets, then

    ⟨a, b⟩ₚ = ⟨c, d⟩ₚ if and only if a = c ∧ b = d .

Proof. As an interesting exercise. □

We now also define (primitive) ordered triples by ⟨a, b, c⟩ₚ = ⟨⟨a, b⟩ₚ, c⟩ₚ and prove the crucial property that, if a, b, c, d, e and f are all sets, then so are ⟨a, b, c⟩ₚ and ⟨d, e, f⟩ₚ, and

    ⟨a, b, c⟩ₚ = ⟨d, e, f⟩ₚ if and only if a = d ∧ b = e ∧ c = f

and so on.

The problem with this "primitive" definition is that it only really works when the elements of the pair are sets: if A and B are proper classes, then it is not difficult to check that ⟨A, B⟩ₚ = {∅}, irrespective of the values of A and B, and so the crucial property just stated and proved for sets no longer holds. We will shortly have need of ordered pairs of classes — well actually an ordered triple ⟨A, G, B⟩ defined as ⟨⟨A, G⟩, B⟩ — so we now make a more powerful definition, which works for classes as well as sets.

The (ordinary) ordered pair ⟨a, b⟩ is defined by

    ⟨a, b⟩ = { ⟨∅, x⟩ₚ : x ∈ a } ∪ { ⟨{∅}, x⟩ₚ : x ∈ b }



What is going on here? We are replacing a and b by sets in which all their elements are "tagged" so we can tell them apart. Think of this as

    ⟨a, b⟩ = a′ ∪ b′

where a′ and b′ are the tagged versions. We tag every member of a with ∅ and every member of b with {∅}:

    a′ = { ⟨∅, x⟩ₚ : x ∈ a }
    b′ = { ⟨{∅}, x⟩ₚ : x ∈ b } .

Now it really doesn't matter what tags we use here, so long as they are different so that we can tell them apart. For example, if we had already defined the numbers 0 and 1 it would have been natural to use them for tags, something like this:

    a′ = { ⟨0, x⟩ₚ : x ∈ a }
    b′ = { ⟨1, x⟩ₚ : x ∈ b } .

But we can't do that, because we haven't defined 0 and 1 yet. So I have chosen to use ∅ and {∅} here because they are the simplest things we already know are different (∅ has no members and {∅} has one member, namely ∅ itself). There is another reason, which will emerge later, why this choice of tags is nice.
It is important to know that if a and b are sets then ⟨a, b⟩ is a set too.
Here is the proof. This proof is included for completeness. Do not bother to wade through it unless you are interested.
Suppose that a and b are both sets. Then ⟨a, b⟩ is also a set. To see this, note that {a} and {a, b} are both subsets of {a, b} and hence members of P({a, b}). Hence ⟨a, b⟩ₚ = {{a}, {a, b}} ⊆ P({a, b}). In particular, for any x ∈ a, {∅, x} ⊆ {∅} ∪ a and so

    ⟨∅, x⟩ₚ ⊆ P({∅, x}) ⊆ P({∅} ∪ a)   and so   ⟨∅, x⟩ₚ ∈ PP({∅} ∪ a) .

Thus { ⟨∅, x⟩ₚ : x ∈ a } ⊆ PP({∅} ∪ a). In the same way { ⟨{∅}, x⟩ₚ : x ∈ b } ⊆ PP({{∅}} ∪ b) and so

    ⟨a, b⟩ = { ⟨∅, x⟩ₚ : x ∈ a } ∪ { ⟨{∅}, x⟩ₚ : x ∈ b } ⊆ PP({∅} ∪ a) ∪ PP({{∅}} ∪ b)

is a set.

Now, for the crucial property:

B.18 Proposition
If a, b, c and d are any classes, then

    ⟨a, b⟩ = ⟨c, d⟩ if and only if a = c ∧ b = d .

Proof. As exercise. However, here is a hint.
Write X = ⟨a, b⟩. We want to show that, given X, there is only one possible choice of a and b that will give it. For a start we can recover a and b thus:

    a = { x : ⟨∅, x⟩ₚ ∈ X }
    b = { x : ⟨{∅}, x⟩ₚ ∈ X } .   □
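Here is a small Python sketch (frozensets standing in for sets; the encoding is illustrative and, of course, only makes sense for sets, not proper classes). It checks the crucial property over a small universe and carries out the recovery of the first component suggested in the hint.

    from itertools import product

    EMPTY = frozenset()                  # ∅
    TAG1 = frozenset({EMPTY})            # {∅}

    def kpair(a, b):                     # primitive pair {{a}, {a, b}}
        return frozenset({frozenset({a}), frozenset({a, b})})

    def opair(A, B):                     # tag each member, then unite
        return (frozenset(kpair(EMPTY, x) for x in A)
                | frozenset(kpair(TAG1, x) for x in B))

    universe = [frozenset(), frozenset({1}), frozenset({1, 2})]

    # Crucial property: the pair determines and is determined by its parts.
    for A, B, C, D in product(universe, repeat=4):
        assert (opair(A, B) == opair(C, D)) == (A == C and B == D)

    def first(X, candidates):            # a = { x : <∅, x>_p ∈ X }
        return frozenset(x for x in candidates if kpair(EMPTY, x) in X)

    X = opair(frozenset({1}), frozenset({1, 2}))
    assert first(X, [1, 2]) == frozenset({1})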

Our next major goal is to define functions in such a way that we can create them conveniently
when we want to; also so that we can define functions from one class to another (as one
wants to do all the time in algebraic topology and category theory to mention just two
areas). We will, of course, have to define functions as sets or classes. In order to do this we
must first discuss cartesian products and relations.

B.19 Cartesian products


The cartesian product A × B is just the set of all ordered pairs ⟨a, b⟩, where a ∈ A and b ∈ B. So we have the definition:

    A × B = { ⟨a, b⟩ : a ∈ A ∧ b ∈ B }

This is a bit informal; what it really means is

    A × B = { x : x = ⟨a, b⟩ for some a ∈ A and b ∈ B }

or even more formally

    A × B = { x : (∃a ∈ A)(∃b ∈ B)(x = ⟨a, b⟩) }

As usual, we can rewrite the definition in the form

    x ∈ A × B ⇔ (∃a ∈ A)(∃b ∈ B)(x = ⟨a, b⟩)

I have used the general form of the ordered pair here, but the primitive form could have
been used just as well (since here a and b must be sets). In any case, this definition is good
for defining the cartesian product of two classes A and B. (This is important.) Nevertheless,
we also need to know that

If A and B are both sets, then A × B is also a set.

As before, the proof is given here for completeness. Don’t feel you have to work through it
unless you really want to.

Remember that we showed that, for any a and b, ⟨a, b⟩ ⊆ PP({∅} ∪ a) ∪ PP({{∅}} ∪ b).
Now, for any a ∈ A, a ⊆ ⋃A and so {∅} ∪ a ⊆ {∅} ∪ ⋃A. It follows that PP({∅} ∪ a) ⊆ PP({∅} ∪ ⋃A). In the same way PP({{∅}} ∪ b) ⊆ PP({{∅}} ∪ ⋃B). Thus

    ⟨a, b⟩ ⊆ PP({∅} ∪ ⋃A) ∪ PP({{∅}} ∪ ⋃B)

and so

    ⟨a, b⟩ ∈ P( PP({∅} ∪ ⋃A) ∪ PP({{∅}} ∪ ⋃B) ) .

Since this is true for every ⟨a, b⟩ ∈ A × B, we have A × B ⊆ P( PP({∅} ∪ ⋃A) ∪ PP({{∅}} ∪ ⋃B) ) and so A × B is a set.
We can now prove the usual properties of a cartesian product. The cartesian product is
not in general commutative or associative (an interesting exercise to prove this). It does
distribute over both union and intersection:

A × (B ∪ C) = (A × B) ∪ (A × C) , A × (B ∩ C) = (A × B) ∩ (A × C).

Here are some other random properties

    A × ∅ = ∅ ,   ∅ × A = ∅ ,   A × B = ∅ ⇒ A = ∅ or B = ∅
    U ⊆ A and V ⊆ B ⇒ U × V ⊆ A × B
    {a} × {b} = { ⟨a, b⟩ }

and so on.

We could now go on to define cartesian products of more than two sets, for instance A × B × C to mean (A × B) × C, but we won't do that because later we will have a more satisfactory way to define general cartesian products. For the time being however, the notation A² for A × A is useful.

B.20 Relations
We now wish to define relations as mathematical objects, rather than as the relation symbols
of our language. This will allow us great flexibility in creating new relations of various
kinds. It would be possible to define new relations by adding new symbols to the language
for them, extending the language to include them, and also adding new axioms to define
their meanings. This process clearly would be very messy; instead we show how to define
relations as sets, which can be defined at will by the means already at our disposal without
having to do any violence to the language itself. This method, it will be seen, is neat and
very flexible.
For the time being, we will only discuss binary relations (the extension to other arities will
become obvious). If R is a relation, we will denote “a stands in relation R to b” by aRb
rather than R(a, b). This is more in line with the notation used for most binary relations
such as ≤ , < , ⊆ and so on.

Suppose now that R is a relation from a class A to a class B. That means that aRb makes
sense and is either true or false whenever a ∈ A and b ∈ B. Then there are three important
classes here:—
A, the domain of R. We denote this dom(R).

B, the codomain of R. We denote this cod(R).


The graph of R, which is the set of all ordered pairs ⟨a, b⟩ for which aRb is true,

    gr(R) = { ⟨a, b⟩ : a ∈ A , b ∈ B and aRb } .



Clearly these classes between them define the relation: gr(R) is a subclass of the cartesian product A × B, and any subclass of A × B will define some relation from A to B. Thus we have a method of constructing relations as classes.
Definition A relation is a triple ⟨A, G, B⟩, where A and B are classes and G is a subclass of the cartesian product A × B. In this case A is called the domain of R, dom(R); B is the codomain of R, cod(R); and G is the graph of R, gr(R). We say that R is a relation from A to B. We write aRb to mean ⟨a, b⟩ ∈ G.
When we define such a relation, A and B will usually be sets, and in this case, gr(R) and
R itself will be sets also. However it all works fine if A and B are classes.
We can now define equivalence relations:—

An equivalence relation on a class A is a binary relation ≡ on A (that is, a relation from A


to itself, as defined above) with the properties

Reflexivity a≡a for all a ∈ A

Symmetry a≡b ⇒ b≡a for all a, b ∈ A

Transitivity a ≡ b and b ≡ c ⇒ a≡c for all a, b, c ∈ A

We can now also define order relations of various kinds (partial, full orders etc.). We will do so in Section C.
Various other kinds of relations (preorders and full orders being a couple of important ones)
can be defined the same way.
It is important to notice that now we have three distinct but related ideas, all called “rela-
tion”.
(1) The relation symbols of the fully formal language. In mathematics, as defined here,
there are only two of these, ∈ and =.
(2) Any expression with two free variables defines a relation, for example the expression

P (a, b) = (∀x)(x ∈ a ⇒ x ∈ b)

defines the subset relation.


(3) Relations defined by choosing subclasses of cartesian products, as just described.

Also, of course, where a useful relation has been defined by either methods (2) or (3), it may
be given a symbol which can henceforth be used as though it was of type (1): the subset
relation symbol ⊆ arose this way.

B.21 Functions
We will now define a function as a special kind of relation. If f is a function A → B, then
we can think of f (a) = b as defining a relation from A to B. If (temporarily) we write this

af b to stay in line with the notation of the previous paragraph, we see that the relation f
has to have a special property to qualify as a function, and arrive at the following definition.

Definition A function from A to B (where A and B are classes) is a relation f from A


to B with the property:

For every a ∈ A there is a unique b ∈ B such that af b ,


in other words,
(∀a ∈ A)(∃!b ∈ B) af b .

This uniqueness is of course exactly what we need to use functional notation (definition by
description): for any a ∈ A, we write f (a) for the unique b such that af b.
Note This definition implies that dom f = A. Thus we are here adopting the convention
used in virtually all of mathematics except elementary calculus that, when we say that f is
a function A → B, we imply that it is defined on all of A; in other words, its domain is A,
not just a subset of A.
Musing Calculus 101 really is a worry. Look in any first-year calculus door-stop textbook
at the definitions of limit and continuity and you will find that the authors get themselves
into all kinds of knots regarding the domain of the functions involved (if they don’t simply
ignore the problems completely). In my experience, most such textbooks end up getting the
definitions plain wrong.

More musing It would appear that we now have two different meanings of the word
“graph” — the one defined here, and the one we all know well from elementary mathematics.

Now consider an example, the squaring function on the reals; let us call it f. It is a function f : R → R defined by f(x) = x² for all x ∈ R. So, if we use the definition of "graph" in this section, its graph will be the subset of R × R consisting of all points of the form ⟨x, x²⟩. But R × R is of course the Euclidean plane, so we can draw a picture of this set (the orange curve is the graph):

[Figure: the parabola consisting of the points ⟨x, x²⟩, drawn in orange in the plane.]

AHA! The two things are actually the same. (Well, I suppose our new use of the word is a generalisation from functions R → R to functions defined on general sets A → B.)

Note also In this definition, every function has a well-defined domain and codomain. The
fact that a function defines its domain is not a surprise, but not everyone would agree that
a function defines its codomain. (The codomain is what you might think of as the “target”
set or class — it contains the range.) Consider this example:

The absolute value function x ↦ |x| as a function C → R≥0 (the non-negative reals);
The absolute value function as a function C → R (all the reals);
The absolute value function as a function C → C
(Makes a nice example of a non-analytic function).

Some folk would consider these to be all the same function. According to the definition
above however they are three different functions because they have different codomains.
They are obviously closely related of course.
The more precise definition of a function used here is in line with some branches of modern mathematics, in particular category theory and any branch that uses category-theory ideas. Its use in general does not cause any problems.
Note that, if A and B are sets, then any function A → B is also a set.
It is now straightforward, if tedious, to verify all the usual basic properties of functions.
We will say that two functions f and g are equal if they are equal as classes. This turns out
to be the same as the usual meaning of equality of functions: f and g are equal if they have
the same domain and codomain and f (a) = g(a) for all a in their domain.
The composite of two functions and the identity function idA on a class A are defined in the
obvious ways.

Composition is associative where the composites are defined and if f : A → B then idB ◦f =
f and f ◦ idA = f .
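Here is a small Python sketch of functions-as-triples (the encoding is illustrative): a function carries its domain, graph and codomain; equality is equality of triples; and the identity laws just stated hold. It also shows that the same rule with a different codomain gives a different function.

    class Fn:
        def __init__(self, dom, graph, cod):
            # graph is a dict a -> b; check totality and the codomain
            assert set(graph) == set(dom) and set(graph.values()) <= set(cod)
            self.dom, self.graph, self.cod = set(dom), dict(graph), set(cod)

        def __call__(self, a):
            return self.graph[a]

        def __eq__(self, other):         # equal as triples
            return (self.dom == other.dom and self.graph == other.graph
                    and self.cod == other.cod)

    def compose(g, f):                   # g ∘ f, needs cod f = dom g
        assert f.cod == g.dom
        return Fn(f.dom, {a: g(f(a)) for a in f.dom}, g.cod)

    def identity(A):
        return Fn(A, {a: a for a in A}, A)

    f = Fn({1, 2}, {1: "x", 2: "y"}, {"x", "y", "z"})
    assert compose(identity(f.cod), f) == f
    assert compose(f, identity(f.dom)) == f
    g = Fn({1, 2}, {1: "x", 2: "y"}, {"x", "y"})
    assert f != g    # same rule, smaller codomain: a different function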
The proper definition of a function as a special type of relation allows us to clear up a point it
is possible you have been wondering about for ages: whether there are functions f : A → B
where A and/or B may be empty, and if so, what they are. Perusal of the definition tells us
that
• For any class B there is a unique function ∅ → B. Its graph is the empty set, so we
call it the empty function.
• If A is nonempty, then there is no function A → ∅ .

If f is a function A → B and X is a subclass of A, we write f [X] for the class of images of


members of X under f , that is

f [X] = { y : y ∈ B ∧ (∃x ∈ X)(y = f (x)) }.

The notation { f (x) : x ∈ X } is also often used and is more readable.


The range of f is just the class f[A] and is often written ran f.
Suppose that f is a function A → B and that D is a subset of A. Then the restriction of
f to D, which we will write f↾D, is the function D → B which agrees with f on D. (Some
books use f|D for this.)

B.22 Two kinds of function and Definition by Description again


We now have two ways of defining a function.
(1) By description. If we have an expression P (u, x) such that

(∃!u)P (u, x) for all x

then we can introduce a new function symbol, defining

u = f (x) to mean the same thing as P (u, x).

(2) By specifying it as a class, as in the definition just given. If we have classes A and
B and a subclass G of A × B such that, for all a ∈ A there is a unique b ∈ B such that
⟨a, b⟩ ∈ G, then this defines a function f = ⟨A, G, B⟩.
These two ways of dealing with functions are equivalent, in the sense that, given any function
of type (1), we can define an equivalent function of type (2) and vice versa.
However, method (2) has one outstanding advantage: if A and B are sets, then any function
A → B is, as we have seen, a set, and so we are able to speak of the “set of all functions
A → B”. With method (1) this construction just doesn’t make sense. Therefore, whenever
new functions are introduced, it is usually tacitly understood that they are defined as sets
(or classes).
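If you like to experiment, here is a minimal Python sketch of method (2): a function modelled as the triple of domain, graph and codomain. It is entirely my own gloss (the book works purely inside set theory), it is restricted to finite sets, and the class name Fn is made up; but it makes the earlier point about codomains concrete.

    class Fn:
        def __init__(self, domain, codomain, rule):
            self.domain = frozenset(domain)
            self.codomain = frozenset(codomain)
            # The graph: the set of all pairs (a, rule(a)) with a in the domain.
            self.graph = frozenset((a, rule(a)) for a in self.domain)
            assert all(b in self.codomain for (_, b) in self.graph)

        def __call__(self, a):
            return dict(self.graph)[a]

        def __eq__(self, other):
            # Equality "as classes": same domain, same graph, same codomain.
            return ((self.domain, self.graph, self.codomain)
                    == (other.domain, other.graph, other.codomain))

    abs_narrow = Fn({-2, -1, 0, 1, 2}, {0, 1, 2}, abs)          # codomain like R>=0
    abs_wide   = Fn({-2, -1, 0, 1, 2}, {-2, -1, 0, 1, 2}, abs)  # a bigger codomain
    print(abs_narrow == abs_wide)   # False: same graph, different codomains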

B.23 Cartesian powers


For any two sets A and B we define B^A to be the set of all functions A → B.

Note the reversal of order of the symbols here!


Also note: the set A^∅ has exactly one member, the empty function ∅ → A. This includes
the case ∅^∅. On the other hand, if B is nonempty, then ∅^B = ∅.
We only consider the cartesian power in the case where A and B are both sets. When the
definition is applied to classes it is just not useful.
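As a quick finite sanity check (a Python sketch of mine, valid only for finite sets), we can enumerate B^A directly as the set of all functions A → B, including the edge cases just discussed:

    from itertools import product

    def cartesian_power(B, A):
        # All functions A -> B, each represented by its graph (here, a dict).
        A, B = list(A), list(B)
        return [dict(zip(A, values)) for values in product(B, repeat=len(A))]

    print(len(cartesian_power({0, 1, 2}, {'x', 'y'})))  # 9 = 3^2 functions
    print(cartesian_power({0, 1, 2}, set()))            # [{}]: just the empty function
    print(cartesian_power(set(), {'x'}))                # []: no functions at all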

B.24 Sequences
Consider a sequence ⟨a0, a1, a2, . . .⟩, whose elements belong to some class A. If we were to
write its elements as ⟨a(0), a(1), a(2), . . .⟩ instead, it would be clear that we were simply
listing the values of a function N → A. Looking at it this way then, a sequence is nothing
new; it is simply a function N → A (where A is the class containing the elements of the
sequence), but with a different notation which we sometimes choose to use for one reason
or another.
Well ... there is something new here: we haven’t got around to defining N yet. But when
we do, we will have sequences ready made. We will also have n-tuples of course: an n-tuple
⟨a1, a2, . . . , an⟩ with elements in a set A is simply a function from the subset {1, 2, . . . , n}
of N to A.

Oops! Now we have three different definitions of an ordered pair, the ones in B.16 and B.17

above and the one just given as a 2-tuple. In the same way we have three definitions of
a triple. The definitions just given (as special cases of n-tuples) are the most convenient.
Unfortunately, the more primitive definitions in B.16 cannot be dispensed with — they are
used to define a function, which in turn is used to define an n-tuple. Perhaps the best plan
is to call both the kinds of ordered pairs and triples defined in B.16 “primitive”, use them
only to define relations and functions and use the more general and convenient definition of
an n-tuple in all other cases. In fact, the distinction is almost never important in practice.
When we index members of some set A in this way it is very common for the index set to
be N (or some subset thereof). But there is no reason why this has to be so. It is perfectly
possible to have something like a sequence, but indexed by some other set, say I. In this
case we use the word family instead of sequence. Also, the notation ⟨a0, a1, a2, . . .⟩ is no
longer useful; we write something like ⟨ai⟩i∈I instead.
Without needing to wait for N to be defined, we can define families. A family ⟨xi⟩i∈I in
a class A indexed by the set I is simply a function x : I → A. (There is no reason why I
cannot be a class here too, but it would be unusual for this to be so.)

In particular, if A is a set, the cartesian power A^I can be thought of as the set of all families
in A indexed by I.
More generally, if ⟨Ai⟩i∈I is a family of sets, we can define its cartesian product to be the
set of all families ⟨ai⟩i∈I, where ai ∈ Ai for all i ∈ I. This is denoted ∏i∈I Ai or ∏⟨Ai⟩i∈I.

C Order
The various kinds of orders, especially partial and full orders, should be familiar to you.
This is a short section just to get our symbols and terminology straight.

C.1 Definition: Various kinds of orders


An order on a set X is a binary relation on that set with certain properties (see below). It
is usually written ≤, but other more exotic symbols are used from time to time. An ordered
set is a set, X say, together with an order relation on that set: if we want to be careful, we
define it as a pair ⟨X, ≤⟩.

It is perfectly possible to define an order relation on a class and to discuss ordered classes;
everything in this section applies to ordered classes just as well as to sets. (Example: ⊆ is
a partial order on the class of all sets.)
There are various kinds of orders, depending upon which of the following conditions they
satisfy:

(O1) Reflexivity For all x ∈ X, x ≤ x.

(O2) Transitivity For all x, y, z ∈ X, x ≤ y and y ≤ z ⇒ x ≤ z.

(O3) Antisymmetry For all x, y ∈ X, x ≤ y and y ≤ x ⇒ x = y.

(O4) Dichotomy For all x, y ∈ X, either x ≤ y or y ≤ x.

(O5) Well-order Every nonempty subset of X has a least member.
A pre-order is a relation which satisfies Conditions O1 and O2. Pre-orders are sometimes
called quasi-orders.

A partial order is a relation which satisfies Conditions O1, O2 and O3.


A full order is a relation which satisfies Conditions O1, O2, O3 and O4. Full orders are
sometimes called total orders or linear orders. The term “linear” is best avoided in this
context, since it can be confusing.

A well-order is a relation which satisfies all five conditions O1 – O5.


So we have a hierarchy of order types here with stronger and stronger conditions on them
as we go from pre-orders to partial orders to full orders to well-orders.
Pre-orders are not met with very often, but are occasionally useful. They are very helpful in
the development of the standard number systems outlined in Appendix A. However we will
not be considering them in the present section: here every order will be at least a partial
order.
Probably the most obvious example of a partial order which is not a full order is the subset
relation ⊆. This can be used as a relation on any set (or class) of sets. It is obvious that it is
reflexive (X ⊆ X always), antisymmetric (if X ⊆ Y and Y ⊆ X then X = Y ) and transitive
(if X ⊆ Y and Y ⊆ Z then X ⊆ Z). On the other hand it is not a full order except in very

special circumstances (there are sets X and Y such that neither X ⊆ Y nor Y ⊆ X). We
will deal with other partial orders from time to time.
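To see conditions O1 to O4 in action, here is a small brute-force Python sketch (mine, not part of the text) checking them for ⊆ on the power set of a two-element set; as expected, dichotomy is the one that fails.

    from itertools import combinations

    def powerset(s):
        s = list(s)
        return [frozenset(c) for r in range(len(s) + 1)
                for c in combinations(s, r)]

    P = powerset({0, 1})
    leq = lambda x, y: x <= y   # <= on frozensets is the subset relation

    print(all(leq(x, x) for x in P))                                     # O1: True
    print(all(leq(x, z) for x in P for y in P for z in P
              if leq(x, y) and leq(y, z)))                               # O2: True
    print(all(x == y for x in P for y in P if leq(x, y) and leq(y, x)))  # O3: True
    print(all(leq(x, y) or leq(y, x) for x in P for y in P))             # O4: False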
Notation: In a partially ordered class, we write

x ≰ y to mean ¬(x ≤ y)

and x < y to mean (x ≤ y) ∧ (x ≠ y) — which is the same as (x ≤ y) ∧ (y ≰ x).

The relations x ≥ y, x ≱ y and x > y are defined in the obvious ways.


It is easy to prove that, in a partially ordered class,

x < y ⇒ y ≰ x for all x and y

but the reverse implication does not hold in general. In fact, if the reverse implication does
hold for all x and y, then the relation is a full order (another easy proof).

Full orders are probably the most familiar kind: the standard order relations on the natural
numbers, the integers, the rationals and the reals are all full orders.
We will look at well-orders in Section E.

C.2 Definition of some terms


(i) Two elements a and b of a partially ordered class are comparable if either a ≤ b
or b ≤ a. Thus a fully ordered class is a partially ordered class in which all elements are
comparable.
(ii) Let X be a subclass of a partially ordered class A. Then

a is a lower bound for X (in A) if

a∈A and for all x ∈ X, a ≤ x.

a is a minimum or least element of X if

a∈X and for all x ∈ X, a ≤ x,

that is, if it is a lower bound for X which is also a member of X.

a is a minimal element of X if

a∈X and for all x ∈ X, x ≮ a.

Upper bounds, maximum (= greatest) elements and maximal elements are defined in the
obvious ways.
A least upper bound or supremum for X is, as its (first) name implies, a least element in the
set of all upper bounds of X. The notations lub X and sup X are used. Similarly a greatest
lower bound or infimum, denoted glb X or inf X, is a greatest element in the set of all lower
bounds of X.

(iii) A chain in a partially ordered class is a subclass which is fully ordered, that is, a
subclass in which all elements are comparable.
(iv) A subclass X of a partially ordered class A is initial or an initial segment if
x∈X and a ≤ x ⇒ a∈X .
For any member a of A, the set I(a) = {x : x < a} is an initial segment (the initial segment
defined by a), as is the class Ī(a) = {x : x ≤ a}. Also, A is an initial segment of itself. A
partially ordered class may well have initial segments other than ones of these forms.

C.3 Suprema
The idea of a supremum should be familiar to you from calculus and analysis. Suprema and
their properties will be important to us (in this chapter, but particularly in the context of
ordinal numbers in Chapter 7) so here are a number of facts worth knowing.

(i) The supremum of a subset depends also on what it is a subset of. For example, let
X be the set of all rational numbers x such that x2 < 2. Then:

• as a subset of R, sup X exists and is equal to √2;
• as a subset of Q, sup X does not exist.

This suggests that the usual notation could be a bit misleading, and we should write some-
thing like, say, supR X and supQ X. It usually doesn’t matter, because we know what the
“main” set is.

This same example shows that the inf does not always exist.
(ii) In any fully ordered set, m = sup X if and only if the following are both true:
(a) x ≤ m for all x ∈ X.

(b) If a < m, then there is some x ∈ X such that a < x.


If you have a set X and element m and you want to prove that m = sup X (and there is no
other obvious easy way to do it) the standard approach is to prove these two things.
(iii) In any ordered set, if the subset X has a maximum member m, then that is also its
supremum. In other words, if max X exists, then sup X exists also and sup X = max X.

(iv) If a subset X happens to have a supremum and that supremum is a member of the
set (that is, sup X exists and sup X ∈ X), then it is also the maximum member of the set.
(v) Consider the empty set as a subset of some given ordered set A.

• If A has a minimum member, then sup ∅ exists and is that minimum member,
• otherwise sup ∅ does not exist.

For example, as a subset of R, sup ∅ is undefined, but as a subset of N, sup ∅ = 0.



C.4 Proposition: Initial segments of fully ordered classes


Let x and y be two members of a fully ordered class A. Then
(i) I(x) ⊆ I(y) if and only if x ≤ y, and I(x) ⊂ I(y) if and only if x < y.

(ii) Either I(x) ⊆ I(y) or I(y) ⊆ I(x).

Proof. An easy exercise. 



D The Natural Numbers

D.1 MK6: The Axiom of Infinity


So far there is no axiom which allows us to create an infinite set. The Axiom of Infinity tells
us that there is one; more specifically, it will allow us to construct the Natural Numbers N.
Using the notation so far developed we can make the Axiom of Infinity look much more
friendly. It says that there is a set W with these properties:

∅∈W

for all x ∈ W , x ∪ {x} ∈ W also.

We now set about creating the Natural Numbers N. We will see that most of the engineering
required to manufacture this set is already available to us, with one linchpin left to be
provided by this axiom.
The way we treat the Natural Numbers here is rather different from the approach used in
Section 4.C. Now we set them up as a set. This allows us to use general mathematical
methods to investigate them and, conversely, to use them as a tool in general mathematics.

What we want is the set N to which Peano’s Axioms apply in this form:
(P1) 0 ∈ N.
(P2) for all n ∈ N, n+ ∈ N.
(P3) for all n ∈ N, 0 ≠ n+.

(P4) for all m, n ∈ N, m+ = n+ ⇒ m = n.


(P5) If X is a subset of N with the properties

0∈X and (∀x)(x ∈ X ⇒ x+ ∈ X) ,

then X = N.
We will use the beautiful definition originally due to John von Neumann. Imagine that we
are attempting to come up with a definition of the natural numbers, based on the sort of
set theory we have done so far. Then every natural number is going to be a set. It would
be nice if the set n actually had n members, so let’s see if we can arrange this. For a start,
0 must have no members and so must be the empty set. To go further than this, look at
the natural numbers
0, 1, 2, 3, 4, ...

Notice that here each natural number n is preceded by exactly n smaller ones. For example,
3 is preceded by 0, 1 and 2. So we try defining each natural number as the set of its
predecessors. Can we make this circular-sounding definition work? Well, yes, if we organise
it properly. We are proposing to define each natural number n to be the set {0, 1, . . . , n − 1}.

But then we will have (for each n)

n+ = {0, 1, 2, . . . , n}
= {0, 1, 2, . . . , n − 1} ∪ {n}
= n ∪ {n}

So we have

0 = ∅
n+ = n ∪ {n} for all n

This is starting to look very like an inductive definition; in fact it would be one, if only we
had induction already. (We had induction as an axiom of PA, but it is not an axiom of MK.
We are about to make that work too.)
You will notice that the Axiom of Infinity is suddenly starting to look very appropriate: the
set W above will contain our brand new N, but it might be too big. We can see straight
off that it contains 0 and, if it contains x then it contains x+ also, so we are on our way.
There is nothing in the axiom, however, to say that the set W does not contain a whole lot
of other rubbish. So we must define N to be a subset of W in a crafty way. There will also
be some other technical details to fix as we go. These will occupy the rest of this section.
The main trick in this discussion is to observe that the set W given to us by the axiom is
not unique — there will be many such sets. If we take the intersection of all of them we
can show that this has all the required properties above. So we use this intersection as the
definition of N. For convenience, we will call these sets “inductive”.

D.2 Definition
A set A is inductive if it has these properties:

0∈A and (∀x)(x ∈ A ⇒ x+ ∈ A)

(We could define an inductive class the same way if we wished, but we don’t need this
generality, so why bother?)
We now define N by specification:

N = { n : n is a member of every inductive set }.

This is obviously the same as

N = ⋂{ X : X is an inductive set }.
This clearly defines N to be a class. To see that it is a set, note that N ⊆ W (since W is
inductive) and the axiom says that W is a set.

We define a natural number to be a member of N.



D.3 Most of Peano’s Axioms


We can now prove four out of the five axioms quite easily. In class I sometimes ask which
of the five axioms is going to be hardest to prove. The vote usually goes to P5. But,
surprisingly, that one is really easy. It is P4 that gives the most trouble.

(P1) ∅ is a member of every inductive set by definition. Therefore it is a member of N.


(P2) If n ∈ N then n is a member of every inductive set. But then so is n+ , and so n+ ∈ N
also.
(P3) We note that n ∈ n ∪ {n} = n+. Therefore n+ ≠ ∅.

(P5) If X is a subset of N with the given properties, then of course X ⊆ N. But also X
is inductive, so N ⊆ X.
This leaves (P4), which is surprisingly tricky . . .

D.4 Definition
A class X is transitive if every member is also a subset, that is,
(∀x)(x ∈ X ⇒ x ⊆ X) .
In this context then, each such x is both a member and a subset of X.

The word transitive comes from the alternative (equivalent) definition: a class X is transitive
if
(∀x)(∀y)(x ∈ y and y ∈ X ⇒ x ∈ X) .

At present we are only interested in transitive sets. Later, when we come to consider ordinal
numbers, it will be useful to observe that the class of all ordinal numbers is transitive. But
we are getting ahead of ourselves.

D.5 Proposition
All natural numbers are transitive.

Proof. The empty set is transitive (vacuously). Since we have already proved (P5), it
remains to show that, if n is a transitive natural number, then so is n+ . Suppose then that
n is such a number and that x ∈ n+ ; we want to show that x ⊆ n+ . We have x ∈ n ∪ {n}, so
either x ∈ n or x ∈ {n}. If x ∈ n then x ⊆ n (by the inductive hypothesis) and then, since
n ⊆ n+ we have x ⊆ n+ . On the other hand, if x ∈ {n} then x = n and again x ⊆ n+ . 

Proof of P4. First note a couple of facts:


(1) If n is a natural number, then n ∈ n+ .
(2) If m and n are natural numbers and m ∈ n+ then m ⊆ n.
Here (1) follows immediately from n+ = n ∪ {n}. For (2), we have m ∈ n ∪ {n} so either
m ∈ n or m = n; if m = n we are done and if m ∈ n we have m ⊆ n by transitivity.

Now to prove P4. Given m+ = n+ , we have m ∈ m+ by (1), so m ∈ n+ and then m ⊆ n by


(2). Similarly n ⊆ m. Done. 

One would expect there to be a theorem stating that no set can be a member of itself. This
is in fact so, the result following from the Axiom of Foundation, and we will prove it in
Corollary H.3.
It is nice however to know that this esoteric axiom is not needed to prove such a basic fact
for the natural numbers; it follows from what we have just proved:

D.6 Proposition
No natural number is a member of itself (that is, if n ∈ N then n ∉ n).

Proof. The empty set has no members, so it is not a member of itself. It now suffices to
show that if n is a natural number which is not a member of itself, then so is n+. Suppose
then that n is such a number, n ∉ n. We want to show that n+ ∉ n+.
Proof is by contradiction: suppose that n+ ∈ n+ , that is, n+ ∈ n ∪ {n}. Then either
n+ ∈ n or n+ ∈ {n}. But if n+ ∈ n then n+ ⊆ n , since n is transitive. If, on the other
hand, n+ ∈ {n}, then n+ = n. So in either case n+ ⊆ n. But n ∈ n+ , so then n ∈ n, a
contradiction. 

D.7 Fun and games


This definition of N has some very pretty properties. We can list the first few natural
numbers as
0 = ∅ , 1 = ∅+ , 2 = ∅++ , 3 = ∅+++ , etc.
Expanding this out gives

0=∅ , 1 = {∅} , 2 = {∅, {∅}} , 3 = {∅, {∅}, {∅, {∅}}} , ...

and we have already mentioned the pretty fact that

0=∅ , 1 = {0} , 2 = {0, 1} , 3 = {0, 1, 2} , 4 = {0, 1, 2, 3} , ...

and in general, n = {0, 1, 2, . . . , n − 1}.


Also (assuming that we have sorted out what counting means), each n has exactly n mem-
bers. Indeed, we can define counting by saying that a set has n members if it can be placed
in one-to-one correspondence with (the set) n, and we can define a finite set as one that can
be so counted (by some n).
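Here is a little Python sketch (my own toy; frozensets stand in for sets, since Python sets cannot contain sets) of the von Neumann construction, exhibiting these facts for small n:

    def von_neumann(n):
        # The von Neumann numeral for n: 0 = {} and k+ = k U {k}.
        k = frozenset()
        for _ in range(n):
            k = k | frozenset([k])   # successor
        return k

    three = von_neumann(3)
    print(len(three))                                            # 3: the numeral n has n members
    print(von_neumann(2) in three)                               # True: 2 is a member of 3
    print(three == frozenset(von_neumann(i) for i in range(3)))  # True: 3 = {0, 1, 2}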

D.8 Proof by (ordinary) induction


There are two ways of looking at a proof by induction, either as (P5) above:
(A) If X is a subset of N with the properties

0∈X and (∀x)(x ∈ X ⇒ x+ ∈ X) ,



then X = N.
or in terms of an expression to be proved, as was given when we discussed Peano Arithmetic:
(B) If P is an expression and x a variable symbol such that,

P (0̄) and (∀x)(P (x) ⇒ P (x+ ))

then (∀x)P (x).


We should point out here that these are equivalent. The proof is easy.

To prove that (A) ⇒ (B), define X to be the set { x : P (x) }.

To prove that (B) ⇒ (A), define P to be the expression x ∈ X.
We should also point out that, in (B), I have used the rough and ready “functional” notation.
In “correct” notation it would be

(B) If P is an expression and x a variable symbol such that,

P [x/0] and (∀x)(P ⇒ P [x/x+ ])

then (∀x)P .

D.9 Definition by (ordinary) induction


The Induction axiom (P5) allows us to prove things about the natural numbers by induction.
We still have to see how to define things by induction. In the simplest case we want to be
able to define a function f : N → B, where B is some set, by specifying f (0) ∈ B and f (n+ )
in terms of n and f (n).

Thus we specify f (0) = b, where b is a member of B, and, for all n, f (n+ ) = h(n, f (n)) where
h is some function, already defined. (Notice that here, h must be a function N × B → B.)
There is no reason why B should not be a class here, and occasionally we will want that.
So we make the definition in this generality: it is no harder.

Definition by Induction, no parameters


Let B be a class, b a member of B and h : N × B → B be a function. Then there is a
unique function f : N → B satisfying

f (0) = b

and
f (n+ ) = h(n, f (n)) for all n ∈ N.
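Read computationally, this definition is a recipe. Here is a Python sketch of mine (finite arguments only; it says nothing about existence or uniqueness, which is exactly what the set-theoretic argument below supplies):

    def define_by_induction(b, h):
        # Return the f with f(0) = b and f(n+) = h(n, f(n)).
        def f(n):
            value = b
            for k in range(n):
                value = h(k, value)
            return value
        return f

    # Example: factorial, with b = 1 and h(n, v) = (n + 1) * v.
    factorial = define_by_induction(1, lambda n, v: (n + 1) * v)
    print([factorial(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120]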

More generally, there may be other variables involved and we want to define a function of
several variables by induction over one of them — define f (a1 , a2 , . . . , ak , n) by induction
over n. In this case we can of course consider the n-tuple ha1 , a2 , . . . , ak i to be a single
variable, and so our general problem boils down to defining a function f : A × N → B by

specifying f (a, 0) for every a ∈ A and specifying f (a, n+ ) in terms of a, n and f (a, n) for
all a ∈ A and n ∈ N. Here A and B can be any classes.
Thus we have

D.10 Proposition: Definition by Induction, with parameters


Let A and B be classes, g : A → B and h : A × N × B → B be functions. Then there is
a unique function f : A × N → B satisfying

f(a, 0) = g(a) for all a ∈ A and (–1A)

f(a, n+) = h(a, n, f(a, n)) for all a ∈ A and n ∈ N. (–1B)

Proof. The proof that f is unique is a simple application of proof by induction and will be
left as an exercise. The fact that such an f exists is what will be proved here.
The most powerful method for defining a new function A × N → B that we have at our
disposal so far is to define it as a triple hA × N, G, Bi, where G is the graph of the function,
so we will do that. Now G must be a subset of (A × N) × B, which is the same as A × N × B.
Translating conditions (–1A) and (–1B) above on the function into corresponding conditions
on the graph, we have

⟨a, 0, g(a)⟩ ∈ G for all a ∈ A (–2A)

if ⟨a, n, b⟩ ∈ G then ⟨a, n+, h(a, n, b)⟩ ∈ G for all a ∈ A, n ∈ N and b ∈ B (–2B)

and we mustn’t forget the extra condition to ensure that this relation is a function:

For all a ∈ A and n ∈ N there is a unique b ∈ B such that ⟨a, n, b⟩ ∈ G. (–2C)

Any subset of A × N × B which satisfies these three conditions is the graph of the required
function; so now all we need do is show that such a subset exists. We employ a similar trick
to the one we used when creating N above.
For the purposes of this proof, let us call any subset of A × N × B which satisfies (–2A)
and (–2B) “good”. We will show that the intersection of all good subsets satisfies all three
conditions and so gives the required function. So let us write G for the intersection of all
good subsets of A × N × B.
Firstly, observe that A × N × B itself is good, so G ⊆ A × N × B.
• Given any a ∈ A, ⟨a, 0, g(a)⟩ is a member of every good set by (–2A) and so is a
member of G. Thus G satisfies (–2A) also.

• Given any a ∈ A, n ∈ N and b ∈ B such that ⟨a, n, b⟩ ∈ G, ⟨a, n+, h(a, n, b)⟩ is a
member of every good set by (–2B) and so is a member of G. Thus G satisfies (–2B) also.

• Now observe that, for any n ∈ N, if ⟨a, n+, c⟩ ∈ G then there is some b ∈ B with
⟨a, n, b⟩ ∈ G such that c = h(a, n, b). This is so because, if no such b existed, then the triple
⟨a, n+, c⟩ could simply be removed from G and the result would still be good (think about
it!). And that would contradict the definition of G as the intersection of all good sets.

And now we can prove (–2C), which we will do by induction over n. That’s not being
circular: we are in the process of showing that definition by induction works; we already
know that proof by induction works.
First we want to show that, given any a ∈ A, there is a unique b ∈ B such that ⟨a, 0, b⟩ ∈ G.
Now we know (by (–2A) for G) that ⟨a, 0, g(a)⟩ ∈ G, so it exists. For uniqueness, suppose
that there is some b ≠ g(a) such that ⟨a, 0, b⟩ ∈ G. Then this triple can simply be removed
from G and it will still be good, so that cannot happen.
Finally, given any a ∈ A and n ∈ N, assume that there is a unique b ∈ B such that
⟨a, n, b⟩ ∈ G; we want to show that there is a unique c ∈ B such that ⟨a, n+, c⟩ ∈ G. Such
a c exists, namely c = h(a, n, b). But it is also unique, because we have just seen that any
such c must equal h(a, n, b′) for some b′ ∈ B with ⟨a, n, b′⟩ ∈ G, and our inductive assumption
is that there is only one such b′, namely b. □

D.11 Addition, multiplication, order etc.


In the section on Peano Arithmetic (C.1.4) addition and multiplication were set up ax-
iomatically. With the more powerful methods available to us in MK we can now define
functions such as these easily whenever we like. In particular, the axioms for addition and
multiplication in PA can simply be copied to here, where they become inductive definitions.

x + 0 = x for all x ∈ N and x + y+ = (x + y)+ for all x, y ∈ N

x.0 = 0 for all x ∈ N and x.y+ = (x.y) + x for all x, y ∈ N

indeed, we can now define (for example) exponentiation in the same way

x^0 = 1 for all x ∈ N and x^(y+) = (x^y).x for all x, y ∈ N
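Transcribed into a Python sketch (mine; the successor n+ is played by n + 1 on machine integers), these three inductive definitions compute as you would hope:

    def add(x, y):
        return x if y == 0 else add(x, y - 1) + 1         # x + y+ = (x + y)+

    def mul(x, y):
        return 0 if y == 0 else add(mul(x, y - 1), x)     # x.y+ = (x.y) + x

    def power(x, y):
        return 1 if y == 0 else mul(power(x, y - 1), x)   # x^(y+) = (x^y).x

    print(add(3, 4), mul(3, 4), power(3, 4))              # 7 12 81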

Also, of course, we are now allowed to talk about sets and classes, so we can say much more.
We can also define the order on N as in Section C.1, or else we can ignore addition and
define it directly: take a hint from the fact noted above that n = {0, 1, 2, . . . , n − 1}, and define
a ≤ b to mean a ∈ b ∨ a = b, or perhaps better just a ⊆ b. It is an interesting exercise to
prove the usual order properties from this basis.
More ambitiously, we can now develop all the well-known algebraic and order properties of
N. Having done this, we can construct in turn The Integers Z, The Rationals Q, The Reals
R and The Complex Numbers C and prove their defining properties.

These ideas will be taken up in Appendix A.

D.12 Some thoughts about the Natural Numbers and definitions in general
You may find the fact that we have three separate definitions of an ordered pair, and the
fact that we often don’t care much about which one we use, a little unsettling. But this
should not be new: we customarily treat the Natural Numbers the same way. We have a
basic construction of N, à la Von Neumann as in this book, or any other way you please;

then later a construction of the Rationals Q which “contains” N and later still a construction
of the Reals R which “contains” Q and along with it N again. Look at the actual definitions
(they are given in Appendix A) and you will see that these three versions of N are in fact
quite different. However we are happy to swap back and forth between these constructions
according to convenience. Oh, and did I mention N as a subset of the Complex Numbers?
Of the Algebraic Numbers? Of rings of polynomials? We are even happy to call all these
things “the” Natural Numbers. Why is this OK?
What do we want the Natural Numbers to provide for us? At the most basic level, probably
two things. Firstly to act as a way of measuring the size of finite sets — one element exactly
of N corresponding to each finite size — and, one would hope, some useful operations to
help figure out what happens when one adds an extra element to a finite set or merges two
of them. Secondly as something to keep track of repetitive operations or occurrences. The
uses seem to be intertwined; in particular, it does not seem easy to deal with the question
of the size without dealing with repetitive operations also — think counting.
Analysis of these ideas leads to the requirements encapsulated in the axioms of PA (or
something very like it). In a richer context such as mathematics in general (MK or ZF for
examples), the first three axioms are enough and the final four then follow.
Now, to return to the original question, anything we construct that has these properties will
do just fine. It is seldom relevant whether we are using the Natural Numbers as constructed
according to the Von Neumann method or in some other way. All these constructions yield
structures which are isomorphic as far as relevant properties go.

[It should be mentioned here that the Von Neumann construction does become relevant
again when we look at transfinite arithmetic, because it extends nicely to a very convenient
construction of the ordinal numbers.]
These same general considerations apply to the construction of an ordered pair. The whole
point of an ordered pair is its property: it is a function of two variables (let us use here the
generic notation p(x, y)) such that

if p(x1, y1) = p(x2, y2) then x1 = x2 and y1 = y2.

[It should be pointed out that of the three definitions given above, the “ordinary” ordered
pair is not exactly equivalent to the other two (the “primitive” one and as a sequence of
length 2) because it can be used when x and y are classes, whereas for the other two they
must be sets (for the defining property to work).]
So why then do we bother with the construction at all (since we are hardly going to pay
any attention to it)? It is because we need to know that the thing (Natural Numbers, the
ordered pair or whatever) does in fact exist.
These general ideas apply to many of the constructions of mathematics. For another exam-
ple, there are two standard ways of constructing the Real Numbers, one employing Cauchy
sequences and the other Dedekind cuts. These are used to show that the Real Numbers
(with its required properties) actually exists and from then on are pretty well ignored.

E Well-ordering
E.1 Notes
Well-order was defined in Section C.1. (The various terms that were defined in that section
will be used here.) With regard to that definition, note that:—

(i) Properties O3 and O5 together imply the others, so in order to verify that a relation
is a well-order, it is sufficient to verify these two properties.
(ii) When we come to discuss the Axiom of Choice, we will see that that axiom is
equivalent to the Well-Ordering Principle: Every set can be well-ordered.

(iii) N (with the usual ordering) is well-ordered.

Proof (i). To prove O1, note that {x} has a least member, so x ≤ x.
To prove O2, note that {x, y, z} has a least member. If this is x then of course x ≤ z. If it
is y, then y ≤ x; since we already know that x ≤ y then x = y (by O3), so x ≤ z. If the
least member is z the proof is similar.
To prove O4, note that {x, y} has a least element, which must be either x or y.
(iii) The proof is by contradiction: suppose that X is a subset of N which has no least
element; we show that X is empty.

Let P(n) be the expression (∀x ≤ n)(x ∉ X).

We prove (by ordinary induction) that P(n) is true for all n ∈ N.

Firstly, P(0) is (∀x ≤ 0)(x ∉ X), which is the same as 0 ∉ X, and this is true, for
otherwise 0 would obviously be the least member of X.
Now suppose that P (n) is true, that is, that none of the numbers 0, 1, . . . , n is a member of
X. Then n+ cannot be a member either, because if it was it would be the least member of
X. But that means that P (n+ ) is true too, as required. 

E.2 Examples of well-ordered sets


Perhaps the two most important examples are N, discussed above, and the ordinal numbers
which will be discussed in the next chapter.
In the meantime, here are a few other examples.
Firstly, any subset of a well-ordered set is also well-ordered (with the inherited order of
course), and so any subset of N is well-ordered. That includes the empty set (the definition
holds vacuously). The proof of this is very easy.
Now let us try extending N by adding an extra element at the far end — it would be natural
I suppose to call it ∞. So, choose any element which is not a member of N and call it ∞ (it
does not matter much what it is). Add it to N to form a new set N∞ = N ∪ {∞}. Order it in
the obvious way: natural numbers are ordered in the usual way and n < ∞ for every natural

number n. Again, it is easy to prove that this is well-ordered. (Any nonempty subset X is
either {∞}, in which case ∞ is its least member, or else X ∖ {∞} is a nonempty subset of
N and so has a least member, which is also the least member of X.)
You can also put two copies of N side by side. The best way to look at this is to start with
two disjoint sets which are ordered like N, A = {ai}i∈N and B = {bi}i∈N. Now order the
disjoint union A ∪ B by specifying ai < aj and bi < bj whenever i < j, and ai < bj for all
i, j. Picture:

{ a0 < a1 < a2 < . . . < b0 < b1 < b2 < . . . }

Any nonempty subset X either contains at least one ai, in which case the least such ai is
the least member of X, or else it is entirely contained in the B part, in which case it has a
least member because B is well ordered.
We can well-order the set N2 thus:

All the pairs which start with 0 first, h0, 0i < h0, 1i < h0, 2i < . . .
followed by all the pairs which start with 1, h1, 0i < h1, 1i < h1, 2i < . . .
followed by all the pairs which start with 2, h2, 0i < h2, 1i < h2, 2i < . . .
and so on.

This is called the lexicographic order for I suppose obvious reasons; more carefully, it is the
lexicographic order from the left. This order is better defined thus:

⟨a1, a2⟩ < ⟨b1, b2⟩ if either a1 < b1, or else a1 = b1 and a2 < b2.

To see that this is a well-order, let X be a nonempty subset of N2. Then there is at least
one member ⟨a1, a2⟩. Let a1 be in fact the least member of N such that such a pair is in X.
Keep that a1 fixed. Now there is at least one member a2 of N such that ⟨a1, a2⟩ ∈ X (with
that already fixed a1). Choose the least such a2. Then ⟨a1, a2⟩ is the least member of X.

This construction works for n-tuples (for any fixed n). Define an order on Nn by

⟨a1, a2, . . . , an⟩ < ⟨b1, b2, . . . , bn⟩ if there is k such that 1 ≤ k ≤ n, ak < bk and ai = bi for all i < k. (–1)

Proof as easy exercise.
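A small Python sketch (mine) of this left-lexicographic recipe; conveniently, Python's built-in tuple comparison is exactly this order, so min finds the least member of a finite subset of N2 (or of Nn):

    def lex_less(a, b):
        # a < b iff a_k < b_k at the first index k where the tuples differ.
        for x, y in zip(a, b):
            if x != y:
                return x < y
        return False

    X = {(2, 5), (1, 9), (1, 3), (4, 0)}
    print(min(X))                      # (1, 3): the least member of X
    print(lex_less((1, 3), (1, 9)))    # True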


Obviously, these orders can be defined from the other end, resulting in the lexicographic
order from the right. For pairs:

⟨a1, a2⟩ < ⟨b1, b2⟩ if either a2 < b2, or else a2 = b2 and a1 < b1.

and for n-tuples:

⟨a1, a2, . . . , an⟩ < ⟨b1, b2, . . . , bn⟩ if there is k such that 1 ≤ k ≤ n, ak < bk and ai = bi for all i > k. (–2)

It is not so easy to define a well-order for the set of all finite sequences of natural numbers
(of any length), that is, the set ⋃n∈N Nn. If you try to do it like the last definition above
you run into trouble with what to do with the ends of n-tuples of different lengths. The trick
is to order them by length first and then, for ones of the same length, order them by recipe
(–1) above.
Proof as easy exercise.
Another approach to ordering this set would be to extend all finite sequences to be infinitely
long by adding zeros at the end. In other words, we look at all infinite sequences ha0 , a1 , . . .i
(indexed by N) which have only a finite number of nonzero entries and order them by recipe
(–1).
As a medium exercise, show that this does not work: this recipe defines a full order but not
a well-order. As a slightly harder exercise, define a (different) well-order for this set (and
prove it of course).
This raises an obvious question: what about well-ordering the set of all such infinite se-
quences, whether or not they have an infinite number of nonzero entries.
Impossibly hard exercise: define a well-order for this set and prove it.
In the next proposition I use the notation

I(a) = { x : x < a } (the initial segment defined by a)
Ī(a) = { x : x ≤ a }

which was defined in C.2. (It is quite useful in the context of well-orders.)

E.3 Proposition: Initial segments of well-ordered classes


(i) Let w be a member of a fully ordered class W. Then w is the least member of
W ∖ I(w).
(ii) An initial segment of a well-ordered class W is either W itself or else of the form
I(w) for some w ∈ W .

Proof. An easy exercise. 

E.4 Theorem: Strong induction


There are a number of ways of looking at strong induction: I will define and prove them
first and discuss them afterwards.

(i) Let W be a well-ordered class and E be a subclass of W such that

I(x) ⊆ E ⇒ x∈E for all x ∈ W .

Then E = W .
(ii) Let W be a well-ordered class and P (x) be an expression such that

(∀x < a)P (x) ⇒ P (a) for all a ∈ W .

Then P (a) is true for all a ∈ W .


(iii) Let P (x) be an expression about members of a well-ordered class W and suppose
that there is some member x of W such that P (x) is not true. Then there exists a least
member of W such that P (x) is not true.

Proof. (i) By contradiction: suppose E ≠ W. Then W ∖ E is nonempty and so has a
least element, x say. Then I(x) ⊆ E but x ∉ E, contradicting the assumption.
(ii) is an immediate corollary of (i): let E be the class { x : P (x) }.
(iii) The set of members of W for which P is not true is nonempty, therefore it has a
least member. 

Discussion
All these methods work not only for N but also for any well-ordered set or indeed any well-
ordered class, however let us discuss them in the context of N here — not forgetting their
wider applicability.

In a proof of P (a) by ordinary induction, you are allowed to assume P (a − 1) in order to


prove P (a). In a proof by strong induction, you are allowed to assume that P (x) is true
for all x < a. This is a much stronger assumption, so the proof is often accordingly easier.
Here is the method ((ii) above) in all its formal glory: for any expression P (n),

(∀a ∈ N)( ((∀x < a)P (x)) ⇒ P (a) ) ⇒ (∀a ∈ N)P (a)

What this means in practice is, to use this method to prove that P (n) is true for all n ∈ N,
it is sufficient to prove, for any a ∈ N,

if P (x) is true for all x < a, then P (a) . (–1)

In order to compare this with ordinary induction, we could rewrite the ordinary induc-
tion method thus: to prove that P (a) is true for all a ∈ N,it is sufficient to prove that
P (0) (–2A)
and, for all a ≥ 1 ∈ N,

if P (x) is true for all x = a − 1, then P (a) is true. (–2B)


One aspect of strong induction that often mystifies beginners is that you don’t have to prove
an initial step (like (–2A)). The reason is that the initial case is taken care of as part of (–1).
Proving (–1) for all a includes proving it in the case a = 0. Now, writing out (–1) in the
special case a = 0, we get

if P(x) is true for all x < 0, then P(0) is true.

But, since (∀x < 0)P(x) is vacuously true, this is the same as T ⇒ P(0), which in turn is
the same as P(0). In practice, when strong induction is an appropriate method, this special
case very often just comes out in the wash.

(iii) is another form of this principle which is often used in proofs.


If P (n) is false for some n ∈ N, then there is a least n ∈ N for which it is false.
This is usually used to prove that P (n) is true for all n by contradiction: assume not. Then
by this principle, there is a least n for which P (n) is false. Now argue from there.

E.5 Definition
Let A and B be two partially ordered classes.
(i) A function f : A → B is order-preserving if, for any a1 , a2 ∈ A

a1 ≤ a2 ⇒ f (a1 ) ≤ f (a2 ) .

(ii) An order-isomorphism is a bijection ϕ : A → B such that, for any a1 , a2 ∈ A

a1 ≤ a2 ⇔ ϕ(a1 ) ≤ ϕ(a2 ) .

An alternative description of an order isomorphism is as a bijection such that both it and its
inverse are order-preserving. Note that, if both A and B are known to be fully ordered, then
either ⇒ or ⇐ is enough in (ii) above. Putting that another way, if both A and B are known
to be fully ordered, then an order-preserving bijection must be an order-isomorphism.
The next proposition proves some useful tricks for dealing with order-preserving functions.

E.6 Proposition
(i) If X and Y are fully ordered classes and f : X → Y is an order-preserving
bijection, then it is an order-isomorphism. Also f[I(x)] = I(f(x)) for all x ∈ X.

(ii) Let W be a well-ordered class and f : W → W be one-to-one and order-preserving.


Then x ≤ f (x) for all x ∈ W .
(iii) Let W be a well-ordered class. Then there is only one order isomorphism W → W ,
namely the identity function.
(iv) Let W be a well-ordered class. Then W is not order isomorphic to I(x) for any
x ∈ W.

(v) If X and Y are well-ordered classes and f is an order-isomorphism from X onto an


initial segment of Y , then it is unique (in the sense that, if g is any order-isomorphism from
X onto any initial segment of Y , then g = f ).

Proof. (i) Suppose that y1 < y2 in Y. Since f is a bijection, there are x1 and x2 in
X such that f(x1) = y1 and f(x2) = y2. If we had x1 ≥ x2, this would give y1 ≥ y2, a
contradiction; so x1 < x2, and the inverse of f is order-preserving too.
Suppose y ∈ f[I(x)]. Then there is u ∈ I(x) such that y = f(u). Then u < x and, since f is
order-preserving and one-to-one, f(u) < f(x), i.e. y ∈ I(f(x)).
Conversely, suppose that y ∈ I(f(x)). Since f is onto, there is u ∈ X such that y = f(u).
Then u ∈ I(x), for otherwise (X being fully ordered) x ≤ u, in which case we would have
y = f(u) ≥ f(x), contradicting y ∈ I(f(x)).

(ii) Observe first that the conditions on f tell us that, if x < y in W, then f(x) < f(y)
(f is strictly order-preserving). Let E be the class of all members x of W such that x ≤ f(x);
we want to prove that E = W. Suppose not; then W ∖ E is nonempty and so has a least
member, w say. Then f(w) < w, from which we deduce two things: first that f(w) ∈ E, so
that f(w) ≤ f(f(w)), and second that f(f(w)) < f(w). But these contradict one another.

(iii) and (iv) are immediate corollaries of (i) and (ii).


(v) Suppose that f 6= g. Then there is some a ∈ X such that f (a) 6= g(a). Suppose that
a is the least such; we may suppose further (without loss of generality) that f (a) < g(a).
Then, for x < a, g(x) = f(x) < f(a) and, for x ≥ a, g(x) ≥ g(a) > f(a) — so there is no
x such that g(x) = f(a). However f(a) < g(a), which is in the range of g, so the range of g
cannot be an initial segment. □

This next theorem and its corollary will be important.

E.7 Theorem: Comparison theorem for well-ordered classes


Let A and B be two well-ordered classes. Then either

(i) A is order-isomorphic to an initial segment of B or


(ii) B is order-isomorphic to an initial segment of A.
(This includes the possibility that A and B are order-isomorphic to each other, in which case
both of the above are true.)

Proof. For the sake of this proof, let us call a function which is an order-isomorphism from
an initial segment of A to an initial segment of B an initial function. Let U be the subset of
A consisting of all members a ∈ A such that there exists an initial function Ī(a) → B. If
such a function exists it is unique (by Part (v) of the previous proposition), so we can call
it fa. It is obvious that the range fa[Ī(a)] must be Ī(fa(a)).
(1) U is an initial segment.
(1) U is an initial segment.

To see this, let u ∈ U, a ∈ A and a ≤ u. Then it is easy to check that the function fu↾Ī(a)
is an initial function Ī(a) → B, so a ∈ U also.

(2) These functions fit together nicely, in the sense that, for any a1 ≤ a2 in A, fa1 =
fa2↾Ī(a1).

This is because fa2↾Ī(a1) is initial, so it must be fa1.

(3) So we can fit them all together to make a single inclusive function f : U → B. The
easiest way to do this is to define f by f (u) = fu (u) for all u ∈ U .

We need to check that this is an initial function too.


Write V for the range of f, that is, V = f[U]. Suppose v ∈ V, b ∈ B with b ≤ v. Then
there is u ∈ U such that f(u) = v. Now f maps Ī(u) onto Ī(v), so there is some a ≤ u such
that f(a) = b. But then a ∈ U so b ∈ V.

(4) Now we prove the theorem. If either U = A or V = B we are done; we show that
one or other (or both) must be true. For suppose not. Then we have initial segments U of
A and V of B such that U ≠ A and V ≠ B. Then there are members a of A and b of B
such that U = I(a) and V = I(b). Define g : U ∪ {a} → B by

g(a) = b and g(x) = f(x) for all x ∈ U.

But U ∪ {a} = Ī(a) and V ∪ {b} = Ī(b), so g is an initial function mapping Ī(a) onto Ī(b);
this means a ∈ U, whereas a ∉ U since U = I(a), contradicting the definition of U. □

Another useful way of stating this theorem is:

E.8 Corollary: Trichotomy theorem for well-ordered classes


Let A and B be two well-ordered classes. Then exactly one of the following holds
(i) A and B are order-isomorphic.

(ii) There is some y ∈ B such that A is order-isomorphic to I(y).


(iii) There is some x ∈ A such that B is order-isomorphic to I(x).
Furthermore,
in Case (i), the order-isomorphism A → B is unique,

in Case (ii), the element y of B is unique as is the order-isomorphism A → I(y) and


in Case (iii), the element x of A is unique as is the order-isomorphism B → I(x).

E.9 Corollary
From this theorem we can say that, if A and B are two well-ordered classes, then either A
is order-isomorphic to an initial segment of B or B is order-isomorphic to an initial segment
of A. Moreover, if both, then A and B are themselves order-isomorphic.

E.10 Definition by strong induction


Let us consider an example. Some algebraic systems have a binary operation which is not
associative. The most obvious ones I can think of are the Lie operation in a Lie algebra
and commutation in a group, but let us write it as xy for simplicity. In such an algebra a
product (or whatever you want to call it) of more than two factors (abc, abcd etc.) does not
make sense until it is fully bracketed, because different ways of bracketing may give different
values. The problem arises: what is the number of ways of bracketing a given string of
length n? Call this number P(n). Here are some examples.
n=1 a P (1) = 1
n=2 (ab) P (2) = 1
n=3 a(bc) , (ab)c P (3) = 2
n=4 a(b(cd)) , a((bc)d) , (ab)(cd) , (a(bc))d , ((ab)c)d P (4) = 5
It is not difficult to see that this function satisfies

P(1) = 1 (–1)

P(n) = ∑_{r=1}^{n−1} P(r)P(n − r) for all n ≥ 2. (–2)

It is clear that these equations are sufficient to define P(n) for all n, along these lines:
P (1) = 1
P (2) = P (1).P (1) = 1.1 = 1
P (3) = P (1).P (2) + P (2).P (1) = 1.1 + 1.1 = 2
P (4) = P (1).P (3) + P (2).P (2) + P (3).P (1) = 1.2 + 1.1 + 2.1 = 5
P (5) = P (1).P (4) + P (2).P (3) + P (3).P (2) + P (4).P (1) = 1.5 + 1.2 + 2.1 + 5.1 = 14
and so on.
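This computation is easy to mechanise. Here is a short memoised Python sketch of mine implementing (–1) and (–2); as it happens, the values produced are the Catalan numbers, shifted by one.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def P(n):
        if n == 1:
            return 1                                       # (-1)
        return sum(P(r) * P(n - r) for r in range(1, n))   # (-2)

    print([P(n) for n in range(1, 8)])   # [1, 1, 2, 5, 14, 42, 132]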
This looks very like an inductive definition, except that the “inductive step” (–2) involves
all the previous values of P , not just P (n − 1). It looks like a combination of an inductive
definition with the idea of strong induction. And so it is: it is an example of a definition by
strong induction. To define a function, f say, this way, we define f (n) not just in terms of
f (n − 1) but in terms of the whole function so far.
The general idea is not terribly complicated, but describing it properly (and in full generality)
is rather tricky. The important thing to notice is that, when defining a function f : N → B
in this way, we define f(n) in terms of the entire function so far, that is, in terms of f↾I(n)
— recall, this is our notation for f restricted to I(n) = {0, 1, . . . , n − 1}.

To define a function f : A × N → B by ordinary induction, we need something like this:

f(a, 0) = g(a)
f(a, n+) = h(a, n, f(a, n))

so to define such a function by strong induction we expect (by analogy with inductive proofs)
that we won’t need the n = 0 step, and the required definition will look something like this:

f(a, n) = h(a, n, f↾I(n));

well, the second argument n is not required here, because if we need n it is given by f↾I(n)
anyway. So what we require is actually

f(a, n) = h(a, f↾I(n)).

So this defining function h is a function A × ? → B, where the question mark represents
some set or class which we have yet to figure out. The thing that goes in here, f↾I(n), is, as
far as we can know when setting up the definition, some unpredictable function I(n) → B,
so maybe the question mark should represent all such functions. Well, yes, but it also has
to work for all possible values of n too, so what we want here is

the set of all functions I(n) → B for all n ∈ N.

For the present purposes I will call this set I (that’s a big curly letter I).
To see how this notation includes our bracketing example above, first we have to give P(0) a
value. Since we don’t care much what it is, let’s just arbitrarily set P(0) = 1. (Also there are
no extra parameters here, so the parameter class A plays no role.) Then our function h : I → N is

h(θ) = 1 if the domain of θ is either I(0) or I(1),
h(θ) = ∑_{r=1}^{n−1} θ(r)θ(n − r) where the domain of θ is I(n) and n > 1.

Now we have enough to write out a proper definition of definition by strong induction, and
to prove that it works. Moreover, all this works for any well-ordered class in place of N
with no extra complication. Since we will eventually need this kind of definition in other
well-ordered cases, we treat the general case here. For the time being, there is no harm in
thinking of W as being the natural numbers, then glancing back at this method when we
come to discuss ordinal numbers later.
Read the statement of this theorem and make sure you understand what it says, even though
the notation is a bit complicated — the theorem is important. On the other hand, it is not
so important to work through the rather horrible little proof. It is included mainly for
completeness and because you might have difficulty finding it anywhere else. As usual, feel
free to examine the proof if you are interested, or suspicious.

E.11 Theorem
Let A and B be classes, W a well-ordered class and h be a function h : A × W × I → B,
where I is the class of all functions A × I(w) → B for all w ∈ W. Then there is a unique
function f : A × W → B such that

f(a, x) = h(a, x, f↾A×I(x)) for all a ∈ A and x ∈ W. (–1)

Proof. For the purposes of this proof, define a starter to be a function σ : A × Ī(w) → B,
where w ∈ W, which satisfies (–1) “as far as it goes”, that is,

σ(a, x) = h(a, x, σ↾A×I(x)) for all a ∈ A and x ∈ Ī(w). (–1a)

We show first that, for any w ∈ W,

if there is a starter σ : A × Ī(w) → B, then it is unique. (–2)

Suppose then that we have two starters σ, σ′ : A × Ī(w) → B with σ ≠ σ′. Then there is
some a ∈ A and x ∈ Ī(w) such that σ(a, x) ≠ σ′(a, x); using well-order, we may suppose
that x is in fact the least such. Then we have σ(a, z) = σ′(a, z) for all a ∈ A and z < x; in
other words, σ↾A×I(x) = σ′↾A×I(x). But then

σ(a, x) = h(a, x, σ↾A×I(x)) = h(a, x, σ′↾A×I(x)) = σ′(a, x),

contradicting the choice of x and so proving (–2).


Now let S be the subclass of all w ∈ W such that there does exist a starter A × Ī(w) → B.
From the uniqueness just proved we are justified in using the notation: if w ∈ S, denote the
unique starter A × Ī(w) → B by σw.

Now suppose that w ∈ S and v ≤ w, and consider the function σ′ = σw↾A×Ī(v). Noting
that, for any x ≤ v we have σ′(a, x) = σw(a, x), so that σ′↾A×I(x) = σw↾A×I(x), we see that,
for any x ≤ v,

σ′(a, x) = σw(a, x) = h(a, x, σw↾A×I(x)) = h(a, x, σ′↾A×I(x))

and so σ′ is a starter, and therefore the starter A × Ī(v) → B, that is σ′ = σv. From this
we see several things. Firstly, if w ∈ S and v ≤ w, then v ∈ S also, that is, S is an initial
segment of W. Secondly, if v ≤ w ∈ S,

σv = σw↾A×Ī(v).

And finally, again if v ≤ w ∈ S,

σv(a, x) = σw(a, x) for all a ∈ A and x ≤ v.

Now let us define the function f : A × S → B by

f(a, w) = σw(a, w).

We note that, for v ≤ w,

f(a, v) = σv(a, v) = σw(a, v)

which tells us that

f↾A×Ī(w) = σw.
Noting that I(w) ⊆ Ī(w), we have σw↾A×I(w) = f↾A×I(w), so

f(a, w) = σw(a, w) (by definition of f)
= h(a, w, σw↾A×I(w)) (since σw is a starter)
= h(a, w, f↾A×I(w)) (just noted)

so f satisfies the main equation in the form

f(a, w) = h(a, w, f↾A×I(w)) for all a ∈ A and w ∈ S.



It remains only to show that S = W. Suppose not. Then W ∖ S ≠ ∅ so, using well-ordering,
let w be the least member of W ∖ S. Then there is a starter σx : A × Ī(x) → B for all
x < w, which means that I(w) ⊆ S. Define σ : A × Ī(w) → B by

σ(a, x) = f(a, x) if x < w,
σ(a, x) = h(a, w, f↾A×I(w)) if x = w.

But this σ is also a starter. To see this we check (–1a). Firstly, for x < w, σ(a, x) = f(a, x),
which tells us that σ↾A×I(w) = f↾A×I(w), and so

σ(a, w) = h(a, w, f↾A×I(w)) = h(a, w, σ↾A×I(w)).

Noting also that, for any x < w, σ(a, x) = f(a, x) = σx(a, x), so σ↾A×I(x) = σx↾A×I(x), we have

σ(a, x) = f(a, x) (by definition of σ)
= σx(a, x) (by definition of f)
= h(a, x, σx↾A×I(x)) (since σx is a starter)
= h(a, x, σ↾A×I(x)) (just noted).

This proves that σ is a starter A × Ī(w) → B, contradicting the choice of w and thus proving
that S = W, as required. □
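To see the content of the theorem computationally, here is a Python sketch of mine for the special case W = N with no parameter class A (and finite n only; the theorem itself is what guarantees existence and uniqueness in general):

    def define_by_strong_induction(h):
        # Return the f with f(n) = h(f restricted to I(n)), I(n) = {0, ..., n-1}.
        def f(n):
            so_far = []                  # f restricted to I(k), built up as a list
            for _ in range(n + 1):
                so_far.append(h(so_far))
            return so_far[n]
        return f

    # Example: each value is computed from the whole function so far.
    fib = define_by_strong_induction(
        lambda prev: 1 if len(prev) < 2 else prev[-1] + prev[-2])
    print([fib(n) for n in range(8)])    # [1, 1, 2, 3, 5, 8, 13, 21]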

F The Axiom of Choice


F.1 The axiom
The Axiom of Choice, as usually stated, is
(AC) Let 𝒜 be a set of nonempty sets. Then there is a function f : 𝒜 → ⋃𝒜 such that
f(A) ∈ A for all A ∈ 𝒜.

Think of f as “choosing” one element of each set in 𝒜. It is called a choice function: hence
the name of the axiom.

F.2 Discussion
The Axiom of Choice is not one of the axioms of Morse-Kelley Set Theory (MK) nor is
it one of the axioms of Zermelo-Fraenkel Set Theory (ZF). I will call Set Theory with the
Axiom of Choice added MK+AC.

Note
(1) AC is consistent with Set Theory: if MK is consistent, then so is MK+AC.
(2) AC is independent of Set Theory: if MK is consistent, then so is MK+¬AC.

The Axiom of Choice has a special status: some mathematicians are happy to use it routinely,
others regard it with deep suspicion. If you use an argument which involves AC then it is a
very good idea to say so. Moreover, if it is possible to find an alternative argument which
does not involve AC, this is generally considered to be a good thing.
Given the controversial nature of AC, the first consistency result above is reassuring. Even
if AC does not have the acceptance of the other axioms, at least its use cannot involve any
more risk of a self-contradiction than mathematics without it. The second consistency result
shows that the search for a proof of AC from the other axioms is futile (unless, of course,
you suspect that MK itself is inconsistent, and that this might be a good way to prove it!).
Because of the status of the Axiom of Choice, I will decorate results in these notes which
depend upon it with a superscript thus AC .
Here are some observations which I find helpful in thinking about AC.
You don’t ever need AC to make a finite number of choices.
If there is some reason to choose a particular member of each nonempty subset, then you
don’t need the AC either. A standard example of this is: given a (finite or infinite) number
of subsets of N, you can choose one from each simply by choosing its least member. What
is happening here is that the general rule for any nonempty subset X of N

X ↦ least member of X,

which we already know is a perfectly good function, is in fact a choice function.
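Here is that rule as an explicit Python sketch (mine, and for finite sets only), to emphasise that no choice principle is involved:

    def choice(X):
        # The least-member rule: a choice function on nonempty subsets of N.
        return min(X)

    family = [{3, 7}, {10}, {0, 2, 9}]
    print([choice(X) for X in family])   # [3, 10, 0]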


Another example. Consider the set of all nonempty open intervals (a, b) in R. It is easy to
construct a choice function for this set: just choose the midpoint of each one. However, if

you consider the set of all nonempty subsets of R, you will find that you cannot come up
with a definition of a choice function. (Choosing the midpoint won’t do: plenty of nonempty
sets don’t contain their midpoint. And so on for any other rule you like to come up with.)
Here the AC comes to the rescue — it tells you that such a choice function does in fact
exist, even though you cannot possibly know exactly what function it is.

This is one of the characteristics of AC (and the equivalent principles we will look at shortly):
they usually tell us that something exists, without supplying the least clue as to what that
something actually is.
There is a joke about the Axiom of Choice (the kind only a mathematician would laugh at).
Suppose you have an infinite number of pairs of socks. Then it requires the AC to choose
one sock from every pair. On the other hand, if you have an infinite number of pairs of
shoes, it does not require the AC to choose one from every pair. (Because the two socks in
a pair are identical, so there is no particular rule to make a choice; but for shoes there is a
choice function which doesn’t require AC: pair of shoes ↦ left one.)

F.3 Equivalent “axioms”


There are several very useful statements which are equivalent to the Axiom of Choice (that
is, they are formally equivalent to AC in MK.) Thus any proof which requires any of these
“axioms” can be taken as involving the use of AC.
Firstly, here are three other ways the Axiom of Choice is sometimes stated. They are not
enormously different and can easily be proved to be equivalent.
(AC1)ᴬᶜ Let A be a set. Then there is a function c : P(A) ∖ {∅} → A such that c(X) ∈ X
for every nonempty subset X of A.
The function c here is sometimes called a choice function for A. That clashes slightly with
the definition just given above, so I won’t use the word this way.
(AC2)ᴬᶜ The cartesian product of any family of nonempty sets is nonempty.

That this is another way of stating the Axiom of Choice becomes more obvious if you look
at the definition of the cartesian product.
(AC3)ᴬᶜ Let A be a set of pairwise disjoint nonempty sets. Then there is a set C which
contains exactly one element from every member of A.
(The set C here is called a choice set for A.)

The last form is the one which is most easily expressed in the formal language (without need
for preliminary definitions). It can be stated:

    (∀u)(u ∈ a ⇒ u ≠ ∅) ∧ (∀u)(∀v)( u ∈ a ∧ v ∈ a ∧ (∃x)(x ∈ u ∧ x ∈ v) ⇒ u = v )
        ⇒ (∃c)( (∀u)( u ∈ a ⇒ (∃x)(x ∈ c ∧ x ∈ u) )
              ∧ (∀x)(∀y)( x ∈ c ∧ y ∈ c ∧ (∃u)(x ∈ u ∧ y ∈ u ∧ u ∈ a) ⇒ x = y ) ).
I will leave the proof that these four formulations of the Axiom of Choice are equivalent as
a fairly straightforward exercise.
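As a concrete, strictly finite illustration of AC3 (a sketch of mine, where of course no axiom is needed), here is a choice set for a pairwise disjoint family in Python:

    # A pairwise disjoint family of nonempty sets...
    family = [{0, 1}, {2, 3}, {4}]

    # ...and a choice set C containing exactly one element from each member.
    C = {next(iter(block)) for block in family}

    assert all(len(C & block) == 1 for block in family)
    print(sorted(C))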

It is considerably more difficult to prove the following three equivalent. They are important
and very useful results, especially Zorn’s Lemma and the Well-ordering Principle. I will
give some applications of the use of Zorn’s Lemma in the next section; the Well-ordering
Principle will come into its own when we discuss ordinals and cardinals in the next chapter.

In the meantime, we will prove their equivalence by proving implications round in a circle:
    AC ⇒ Maximal Principle ⇒ Zorn's Lemma ⇒ Well-Ordering Principle ⇒ AC
We first prove a lemma to help with the first step; this lemma has the hardest proof in the
circle. While it is important to understand Zorn’s Lemma and the Well-ordering Principle
(and I will discuss them in some detail as we proceed) it is not imperative to work through
the rather awful proofs. They are included here for completeness and so you can consult
them if you are interested. These proofs occupy the remainder of this section.
Maximal Principleᴬᶜ Every partially ordered set has a maximal chain.

Zorn's Lemmaᴬᶜ Let P be a nonempty partially ordered set in which every chain has
an upper bound. Then P has a maximal element.
The Well-Ordering Principleᴬᶜ For every set there is a well-order.
(This is often expressed: “Every set can be well-ordered”. This is a tad dangerous, if you
interpret it to mean that you can explicitly specify or construct a particular well-order for
the set. For many sets you cannot. The AC, as is its wont, simply says that there is a
well-order there without giving any hint as to what it actually is.)
In order to prove these are equivalent to the Axiom of Choice, we start with another lemma,
also equivalent, but not particularly interesting in its own right. (Its proof does however
contain most of the hard work here.)

F.4 Lemmaᴬᶜ
Let X be a set and S be a collection of subsets of X (which we will consider to be partially
ordered by inclusion), with the properties:
(1) ∅ ∈ S
(2) If V ∈ S and U ⊆ V, then U ∈ S also.
(3) If C is a chain in S, then ⋃C ∈ S.

Then S has a maximal member.

Proof that AC ⇒ this lemma. By the Axiom of Choice, there is a choice function c :
P(X) ∖ {∅} → X (so that, for every nonempty subset U of X, c(U) ∈ U).
For each member S of S, define a set S̄ by
S̄ = { x ∈ X : S ∪ {x} ∈ S }
Obviously S ⊆ S̄ for all such S.

Suppose that S ≠ S̄. Then S ⊂ S̄, so there is some x ∉ S such that S ∪ {x} ∈ S. This
means that S is not maximal in S. Conversely, suppose that S is not maximal. Then there
is some T ∈ S such that S ⊂ T. Then there is some x ∈ T ∖ S, and then S ∪ {x} ⊆ T, so
S ∪ {x} ∈ S by Condition (2). But then x ∈ S̄. Since x ∉ S, S ≠ S̄. We have just proved
that S ≠ S̄ if and only if S is not maximal. In other words, S is maximal if and only if
S = S̄. Our task becomes: show that S contains a member S such that S = S̄.

For each set S ∈ S, define a set S* by

    S* = S ∪ {c(S̄ ∖ S)}   if S̄ ∖ S ≠ ∅;
    S* = S                 otherwise.

We note that, if S ∈ S then S* ∈ S also. Moreover, if S is maximal, S̄ ∖ S = ∅ and so
S* = S. If S is not maximal, then S ⊂ S* and S* ∖ S is a singleton. Our task becomes:
show that S contains a member S such that S = S*.
We make a definition for the purposes of this proof: a subcollection T of S is a tower if

(i) ∅ ∈ T
(ii) T ∈ T ⇒ T* ∈ T
(iii) if C is a chain in T, then ⋃C ∈ T.
There is at least one tower, namely S itself. Also the intersection of any collection of towers
is a tower. Therefore there is a “smallest” tower, T0 say — define T0 to be the intersection
of all towers and then it is a tower which is a subset of any other tower.
We will show that T0 is a chain; this will prove the lemma: for then, writing W for ⋃T0, we
have W ∈ T0 by (iii), so W* ∈ T0 by (ii); but then W* ⊆ ⋃T0 = W, and hence W* = W.
Also T0 ⊆ S, so W ∈ S. To this end, we will say that a member C of T0 is central if it is
comparable with every member of T0, that is,

    for all T ∈ T0, either T ⊆ C or C ⊆ T.

It is now enough to show that every member of T0 is central and, to do this, it is sufficient
to show that the set of all central members of T0 is a tower; this is now our task.
Firstly, ∅ is obviously central. Now let K be a chain of central elements of T0; we show
that ⋃K is central. Let T be any member of T0; we want to show that T and ⋃K are
comparable. If every member of K is a subset of T then so is its union. Otherwise there is
some K ∈ K such that K ⊈ T. But then, since K is central, T ⊆ K and then T ⊆ ⋃K.
It remains to show that if C is central then so is C*. From now on, then, suppose that C
is a fixed central member of T0. We consider the collection U of all sets U in T0 such that
either U ⊆ C or C* ⊆ U. We will show that U is a tower. This completes the proof, for then
U = T0 (by definition of T0) and so, for every member T of T0 we have T ⊆ C or C* ⊆ T;
since C ⊆ C*, this makes C* central.
It remains to show that U is a tower. Clearly ∅ ∈ U. Now let K be a chain in U. Then
either every member of K is a subset of C, in which case so is ⋃K, or else K has a member,
K say, which is not a subset of C, and then C* ⊆ K so C* ⊆ ⋃K.

It now remains to show that, if U ∈ U then U* ∈ U. If C* ⊆ U then C* ⊆ U* and we are
done. If U = C then U* = C* and again we are done. Finally suppose U ⊂ C. Since C is
central, either U* ⊆ C or C ⊂ U*. If U* ⊆ C then once again we are done; and C ⊂ U* is
impossible, for then we would have U ⊂ C ⊂ U*, while U* ∖ U is a singleton. □

Proof that Lemma F.4 ⇒ the Maximal Principle. Let X be a partially ordered set
and let S be the collection of all chains in X. Now ∅ is a chain in X, so ∅ ∈ S. If V is a
chain in X and U ⊆ V then U is a chain also. Now suppose that C is a chain in S (under
⊆ as usual); we prove that ⋃C ∈ S, that is, that its union is a chain in X.
Let a, b ∈ ⋃C. Then there are sets A, B ∈ C such that a ∈ A and b ∈ B. Since C is a chain,
A and B are comparable; suppose without loss of generality that A ⊆ B. Then a, b ∈ B.
Since B is a chain, a and b are comparable. The lemma now provides a maximal member
of S, that is, a maximal chain in X. □

Proof that the Maximal Principle ⇒ Zorn's Lemma. By the Maximal Principle,
P has a maximal chain, C say. By hypothesis, C has an upper bound, m say. We show
that m is a maximal member of P. Suppose not. Then there is p ∈ P such that m < p.
Then p ∉ C (since m is an upper bound for C), and C ∪ {p} is also a chain, contradicting
the maximality of C. □

Proof that Zorn’s Lemma ⇒ the Well-Ordering Principle. (I will only give an out-
line of the proof here. The main structure of the proof is a classic example of how Zorn’s
Lemma is used. The details are straightforward but a bit tedious, and can safely be left as
an exercise.)
Let X be any set; we show that it can be well-ordered. Let W be the collection of all
well-ordered subsets of X, that is, all pairs of the form ⟨W, ≤⟩, where W is a subset of X
and ≤ is a well-order of W. We now order W in the obvious (I think!) way:

    ⟨W1, ≤1⟩ ≤ ⟨W2, ≤2⟩ if and only if ⟨W1, ≤1⟩ is an initial segment of ⟨W2, ≤2⟩,

that is, if
    W1 ⊆ W2,
    ≤1 is the restriction of ≤2 to W1,
and
    u ≤2 v and v ∈ W1 ⇒ u ∈ W1.
We now use Zorn's Lemma to show that W has a maximal member. First note that W is
nonempty since it contains ⟨∅, ∅⟩. Now we suppose that C is a chain in W and construct
an upper bound for C. This upper bound is the union of all the sets in C with the "obvious"
order. More precisely, it is the pair ⟨M, ≤′⟩ defined by

    M = ⋃{ W : ⟨W, ≤⟩ ∈ C }   (the union of all the sets "in" the chain)

and, for any x, y ∈ M,

    x ≤′ y if and only if there is some ⟨W, ≤⟩ ∈ C such that x, y ∈ W and x ≤ y.



The "straightforward but tedious" details that need checking here are that ⟨M, ≤′⟩ is indeed
a member of W, that is, M is a subset of X and ≤′ is a well-order of M, and that it is an
upper bound for C.
Having established that W does have a maximal member, ⟨W, ≤⟩ say, it is enough to show
that then W = X. Suppose not. Then there is some x ∈ X such that x ∉ W. But we
can extend the well-order of W to W ∪ {x} by making x a new greatest element — more
precisely, define an order ≤′ on W ∪ {x} by

    u ≤′ v in W ∪ {x} if and only if either (i) u, v ∈ W and u ≤ v in W, or (ii) v = x.

It is now easy to check that ≤′ well-orders W ∪ {x} and that ⟨W, ≤⟩ is an initial segment of
⟨W ∪ {x}, ≤′⟩, contradicting the maximality of ⟨W, ≤⟩. □

Proof that the Well-Ordering Principle ⇒ the Axiom of Choice. Let X be any set.
We show that there is a choice function c : P(X) ∖ {∅} → X. There is a well-order, ≤ say,
of X. Now define the function c by defining its graph:

    gr(c) = { ⟨A, least member of A⟩ : A ∈ P(X) ∖ {∅} }.

Since ≤ is a well-order, every nonempty A does have a least member, so this is indeed a
choice function. □
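A finite Python sketch of this last step (an illustration of mine, with a made-up ranking standing in for the well-order): once a well-order of X is given, "least member" defines the choice function.

    X = {"a", "b", "c"}
    rank = {"a": 0, "b": 1, "c": 2}   # a (hypothetical) well-order of X

    def c(A):
        """Choice function on nonempty subsets of X: the least member under rank."""
        return min(A, key=rank.__getitem__)

    print(c({"b", "c"}))   # "b": the rank-least member of {b, c}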



G Zorn’s Lemma
In this section I give several examples of applications of Zorn’s Lemma, together with a bit
of advice on how to use it.

G.1 BASES FOR VECTOR SPACES


In an introductory course on Linear Algebra you will have learnt about linearly independent
sets and bases, but almost certainly in the context of finite-dimensional vector spaces. Here
I prove the fundamental result that every vector space whatsoever has a basis — finite-
dimensional or not.
I start out by giving the background — the definitions of the basic ideas as they are used in
the infinite-dimensional case and a couple of easy basic results we will need. I am assuming
that you are familiar with these ideas in the finite-dimensional case. Zorn's Lemma is used
in the proof of the main theorem.

Even though we are dealing with the infinite-dimensional case, the definitions are necessarily
couched in terms of finite sums, because in a general vector space there is no definition of the
sum of an infinite number of vectors. (Where infinite sums occur in analysis some notion
of convergence is required, and that means that the vector spaces involved must have some
sort of additional topological structure: you would be working in a Hilbert or Banach space
or something of that nature.) But one needs the notion of a basis in a general vector space,
and this notion is important even in analysis.

G.2 Definitions
(1) A subset A of a vector space V is linearly independent if, for any finite set
a1, a2, . . . , an of distinct members of A and scalars x1, x2, . . . , xn such that

    x1 a1 + x2 a2 + · · · + xn an = 0,

it follows that x1 = x2 = · · · = xn = 0.
(2) A member v of the vector space V is a linear combination of the subset A of V if
there are finitely many members a1, a2, . . . , an of A and scalars x1, x2, . . . , xn such that
v = x1 a1 + x2 a2 + · · · + xn an.
(3) Let A be a subset of a vector space V. The set of all linear combinations of A is
denoted sp(A) and called the subspace of V spanned by A. If sp(A) = V we say that A
spans V.

(4) A subset A of a vector space V is a basis for V if it is both linearly independent and
spans V.
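In the finite-dimensional case Definition (1) can be checked mechanically. Here is a small sanity check of mine using numpy (not part of the text's development): vectors are linearly independent exactly when the matrix with those columns has full column rank.

    import numpy as np

    a1 = np.array([1.0, 0.0, 0.0])
    a2 = np.array([0.0, 1.0, 0.0])
    a3 = np.array([1.0, 1.0, 0.0])   # a3 = a1 + a2, so dependence is expected

    M = np.column_stack([a1, a2, a3])
    independent = np.linalg.matrix_rank(M) == M.shape[1]
    print(independent)   # False: 1*a1 + 1*a2 - a3 = 0 is a nontrivial combination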
Now, a couple of lemmas and a corollary with easy proofs which you should supply: they
are almost the same as in the finite-dimensional case.

G.3 Lemma
The subset sp(A) of the vector space V, as defined in (3) above, is indeed a subspace.

G.4 Lemma
If A is a linearly independent subset of the vector space V and b ∈ V r sp(A), then A ∪ {b}
is linearly independent also.

G.5 Corollary
A maximal linearly independent subset of a vector space V is a basis for V.

Here “maximal” of course means maximal with respect to the order ⊆, in other words we
are looking at a linearly independent subset which is not a proper subset of any other one.
A result like this clearly suggests the use of Zorn’s Lemma. It tells us that, in order to
show that a basis exists, it is enough to show that the collection of all linearly independent
subsets (ordered by ⊆) has a maximal member. To use Zorn’s Lemma, we need to prove:

• The collection of all linearly independent subsets of the space is nonempty (that is,
that there exists a linearly independent subset of the space), and
• Every chain of linearly independent subsets is bounded above (and to do this, the
union of the chain is likely to be the bound we need).

A couple of comments before starting. Zorn’s Lemma applies to any partial order, and here
we are going to use ⊆ as this order. In this context a chain is a set, C say, of sets such that
any pair of them is comparable, in the sense that, given any C1 and C2 ∈ C , either C1 ⊆ C2
or C2 ⊆ C1 . Also, an upper bound for such a chain is a set, U say, which contains every
member of C, in the sense that C ⊆ U for every C ∈ C. A likely candidate is the union of
the chain, ⋃C, and this works for this theorem.

G.6 Theorem
Every vector space has a basis.

Proof. Let L be the collection of all linearly independent subsets of V, ordered by ⊆. By
the last lemma it is enough to show that L has a maximal member. By Zorn's Lemma it is
enough to show that L is nonempty and that every chain in L has an upper bound.
That L is nonempty is easy: ∅ is linearly independent, so ∅ ∈ L.
Now let C be a chain in L. We show that ⋃C is an upper bound for C (in L). That every
member of C is ⊆ ⋃C is trivial. It remains to show that ⋃C ∈ L, that is, that ⋃C is also
linearly independent.
So, let a1, a2, . . . , an be distinct members of ⋃C and x1, x2, . . . , xn scalars such that

    x1 a1 + x2 a2 + · · · + xn an = 0 .
Then there are members C1 , C2 , . . . , Cn of C such that a1 ∈ C1 , a2 ∈ C2 , . . . , an ∈ Cn .
Since all the Ci are comparable (and there are only a finite number of them), there is one
that contains all the others. In other words, there is some k such that Ci ⊆ Ck for all i. But
then a1 , a2 , . . . , an all ∈ Ck . But Ck is linearly independent, so x1 = x2 = · · · = xn = 0, as
required. 
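The one slightly slippery step here is the observation about finitely many comparable sets. In miniature (a sketch of mine, not from the text), it is just the fact that a finite chain of sets has a largest member, the Ck of the proof:

    # In a chain (any two members comparable under ⊆), the set of greatest
    # size contains all the others.
    chain = [{1}, {1, 2}, {1, 2, 3}]

    Ck = max(chain, key=len)
    assert all(C <= Ck for C in chain)   # every member is a subset of Ck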

G.7 Note
The same argument can be used to prove the stronger and more useful theorem: Let A
and C be subsets of a vector space V such that A is linearly independent, C spans V and
A ⊆ C. Then there is a basis B for V such that A ⊆ B ⊆ C.
To prove this use the proof above with L the collection of all linearly independent subsets
L such that A ⊆ L ⊆ C.

G.8 Some general advice


To prove something using Zorn’s Lemma you need to

1. Choose a collection of things, P say, and partially order it.

2. Check that a maximal member of P is what you want.

3. Check that P is nonempty.

4. Prove that every chain is bounded above. Note that this upper bound must be a
member of P . This is usually the main part of the proof.

Here (1) and (2) are the parts where you need to use some creativity, to work out what P
and its order should be. The usual starting point is (2): you have something you need to
prove exists, and you figure out that a maximal sort of something will do the trick.
P will usually (but not always) be a collection of sets, subsets of something, often with
some extra structure. The partial order we place on these is usually ⊆, but if the sets have
extra structure, we will probably also need the structures to agree. The upper bound in (4)
is usually the union of the chain; if we need to create a structure on it, we will generally
need the agreement mentioned above.
In the vector space basis example above, we did not need to create any extra structure on
our subsets: the only structure needed is that of the vector space itself, and that is there
already.
In the next example we do need to create some structure, the full order we are trying to
show exists.

G.9 FULLY ORDERED SETS


We'll prove here that every set can be fully ordered — more precisely, for every set there is
a full order.
To do this (using Zorn’s Lemma of course) we will look at fully-ordered subsets of the given
set. We will need to prove two things: firstly, that a maximal fully-ordered subset must
be the whole set and, secondly, that the union of a chain of fully-ordered subsets is fully
ordered.

So what we will be working with are fully-ordered subsets: not just plain subsets, but subsets
together with a full order. It is best to think of these things as pairs:

    ⟨X, ≤⟩ where X ⊆ A and ≤ is a full order on X .

Now we will have not only all these subsets with the full orders on them, but also the set
of all such pairs, upon which we want to define the partial order to apply Zorn's Lemma to.
I will use a nice exotic sort of order symbol for this: if ⟨X1, ≤1⟩ and ⟨X2, ≤2⟩ are two such
fully-ordered sets, we define
    ⟨X1, ≤1⟩ ≼ ⟨X2, ≤2⟩
. . . well, how are we to define it? It is not good enough to define this to mean simply that
X1 ⊆ X2 (try it; you'll see that you can't prove that the union of a chain is fully ordered).
What we need is for the two orders ≤1 and ≤2 to match up as well. So we define this to
mean that X1 ⊆ X2 and ≤2 extends ≤1, that is, that ≤1 and ≤2 agree on X1.

(If we view the orders as subsets of X1 × X1 and X2 × X2, then to say that ≤2 extends ≤1
is the same as to say that ≤1 is a subset of ≤2 — if you like, ≤1 ⊆ ≤2 — but this notation
might be a bit hard to handle.)

G.10 Theorem
For every set there is a full order.

Proof. Let A be the given set. Let P be the set of all pairs ⟨X, ≤⟩, where X ⊆ A and ≤
is a full order on X. We partially order this set by the order ≼ defined by

    ⟨X1, ≤1⟩ ≼ ⟨X2, ≤2⟩ if and only if X1 ⊆ X2 and ≤2 extends ≤1.

Step 1 This relation ≼ is a partial order on the class of fully ordered subsets of A.
Proof This is very straightforward; however, here is some of it.
Let ⟨X, ≤⟩ be any fully ordered set. Then X ⊆ X and ≤ ⊆ ≤, so ⟨X, ≤⟩ ≼ ⟨X, ≤⟩.

Let ⟨X1, ≤1⟩ and ⟨X2, ≤2⟩ be two fully ordered sets such that ⟨X1, ≤1⟩ ≼ ⟨X2, ≤2⟩ and
⟨X2, ≤2⟩ ≼ ⟨X1, ≤1⟩. Then X1 ⊆ X2 and X2 ⊆ X1, so X1 = X2, and the same goes for the
two orders.
Proof of transitivity is much the same.

Step 2 This set P of fully-ordered subsets of A is nonempty.

Proof Because the empty set with the empty order is a member. There are some nit-picking
details to check here.
Step 3 A maximal member of P is the given set A with a full order on it.
Proof We show that, if ⟨X, ≤⟩ is a member of P for which X ≠ A, then it is not maximal,
by constructing a greater member.

There is some a ∈ A such that a ∉ X. So we have a set X ∪ {a}, which we may order by
extending the order on X so that a is a new greatest element (that is, x ≤′ y in X ∪ {a} if
either x ≤ y in X or else y = a). Now clearly this new set is a fully ordered subset of A and
⟨X, ≤⟩ ≺ ⟨X ∪ {a}, ≤′⟩.
So now we want to show that such a maximal member exists. To apply Zorn’s Lemma we
will want to construct an upper bound for a chain of ordered sets. This is in fact fairly
straightforward.
Step 4 Let C be a chain of ordered sets, that is, a collection of ordered sets in which every
pair of members is comparable under our relation ≼. Define a new pair ⟨M, ≤m⟩ by

    M = ⋃{ X : ⟨X, ≤⟩ ∈ C } ,
    ≤m = ⋃{ ≤ : ⟨X, ≤⟩ ∈ C } .

(Another way of looking at the definition of ≤m is that, for x, y ∈ M,

    x ≤m y ⇔ there is some ⟨X, ≤⟩ ∈ C such that x, y ∈ X and x ≤ y .)

THEN ⟨M, ≤m⟩ is indeed a fully ordered set as claimed and is an upper bound for C under
the order ≼.
Proof First we must check that ⟨M, ≤m⟩ is a member of P. That M ⊆ A is obvious. To
see that ≤m is a full order is straightforward, as follows.

Reflexivity: Let x ∈ M. Then there is some ⟨X, ≤⟩ ∈ C such that x ∈ X. Then x ≤ x and
so x ≤m x.
Antisymmetry: Let x, y ∈ M be such that x ≤m y and y ≤m x. Then there are ⟨X1, ≤1⟩ ∈ C
and ⟨X2, ≤2⟩ ∈ C such that x, y ∈ X1, x ≤1 y, x, y ∈ X2 and y ≤2 x. Now, since C is a
chain, we have either

    ⟨X1, ≤1⟩ ≼ ⟨X2, ≤2⟩ or ⟨X2, ≤2⟩ ≼ ⟨X1, ≤1⟩ ;

without loss of generality we may assume the former. But then x ≤2 y also and, since ≤2
is an order, x = y.
Transitivity: The proof is much the same.
Trichotomy: Let x, y ∈ M. Then there is some ⟨X1, ≤1⟩ ∈ C such that x ∈ X1 and some
⟨X2, ≤2⟩ ∈ C such that y ∈ X2. Then x and y are both members of the "bigger" of these two
ordered subsets, so are comparable in that subset, and therefore comparable in ⟨M, ≤m⟩.

Now for the proof that ⟨M, ≤m⟩ is an upper bound. Referring to the construction of M and
≤m at the beginning of this step, this is easy. Let ⟨X, ≤⟩ be any member of C. Then, by
construction, X ⊆ M. Also ≤ is a subset of ≤m, and this means that ≤m extends ≤.
Step 5 That’s it, the proof is finished except perhaps to write something like

By Zorn’s Lemma the theorem is proved. 



G.11 Extending a partial order


We show that, given a partially ordered set, we can extend the given order to a full order.
More precisely, given a partial order ≤ on a set A, there is a full order ≤∗ say, on the same
set which extends the given order, in the sense that

    x ≤ y ⇒ x ≤* y    for all x, y ∈ A.

Following the previous example, one might think that a good approach would be to look
at fully ordered subsets of A on which the full order extends the given partial order. This
idea turns out to lead to a horrible mess (try it!) so we use a different approach. We order
partial orders themselves (!). Apart from the rather weird fact that we are going to define
an order relation on a set of orders (so we need to be very careful with notation), this works
nicely. In fact we find that we don’t run into the “creating structure” kind of complications
that we did with the last example.

To use Zorn’s Lemma, we order the partial orders on A which extend the given one and
show that a maximal partial order must be a full order.
How to order the orders? The easy way is to think of an order as a subset of A × A — the
set of all pairs (x, y) such that x ≤ y. Then we simply order them by set inclusion.

Note that, to say that the order ≤1 is “less than or equal to” the order ≤2 , which we could
denote by ≤1 ⊆ ≤2 if that notation is not too confusing, means that

x ≤1 y ⇒ x ≤2 y for all x, y ∈ A

So what we want is a full order ≤* such that ≤ ⊆ ≤*.


We first prove

G.12 Lemma
If ≤ is a partial order on A which is not a full order, then it is not maximal; that is, there
is some partial order ≤′ on A such that ≤ ⊂ ≤′.

Proof. Suppose then that ≤ is partial but not full. Then there must be incomparable
a, b ∈ A. The trick is that we can now order those two in any way we like, as long as we
take care of the further relationships that follow from that. (If we put a ≤′ b, then anything
less than a must become less than b and anything greater than b must become greater than
a. It turns out that that is all we need.)

Define ≤′ thus:

    x ≤′ y if either x ≤ y
           or x ≤ a and b ≤ y .

(Note that this includes the case a ≤′ b.) It is obvious that ≤ is a proper subset of ≤′. It is
necessary to prove that ≤′ is indeed a partial order. This is not hard, but requires checking
a few cases; I'll leave that as an exercise, so as not to interrupt the flow. □
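Here is a finite Python sketch of the lemma's construction (my own illustration, with hypothetical data), representing an order as its set of pairs:

    A = {1, 2, 3, 4}
    # 1 <= 2 and 3 <= 4, with 2 and 3 incomparable.
    order = {(x, x) for x in A} | {(1, 2), (3, 4)}

    def extend(order, a, b):
        """Declare a <=' b: add (x, y) whenever x <= a and b <= y."""
        below_a = {x for x in A if (x, a) in order}
        above_b = {y for y in A if (b, y) in order}
        return order | {(x, y) for x in below_a for y in above_b}

    order2 = extend(order, 2, 3)   # now 2 <=' 3, and hence 1 <=' 4 as well
    assert (2, 3) in order2 and (1, 4) in order2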

Main proof
Now we look at the collection P of all partial orders on A which contain the given order ≤,
that is, all partial orders ≤0 such that ≤ ⊆ ≤0 .
This is nonempty, since ≤ is a member of P.

So, to apply Zorn’s Lemma, we show that any chain in P is bounded above. As is often the
case with Zorn’s Lemma, we take the union of the chain. That is clearly an upper bound;
so it remains only to prove that it is a partial order.
So let C be a chain in P. What is that? It is a collection of partial orders on A with the
property that, for any two of them, one must be contained in the other.

Let ≤◦ be the union of the chain. This is the relation defined by

    x ≤◦ y if and only if there is some ≤′ in the chain such that x ≤′ y.

So to prove that it is a partial order:

(O1) (Reflexive) For any x, x ≤◦ x because x ≤ x in the given order.

(O2) (Antisymmetric) If x ≤◦ y and y ≤◦ x, then there are orders ≤1 and ≤2 in the chain
such that x ≤1 y and y ≤2 x. Since they are in the chain, one of them contains the other.
Suppose wlog that ≤1 ⊆ ≤2. Then both x ≤2 y and y ≤2 x, and then x = y since ≤2 is a
partial order.
(O3) (Transitive) Much the same proof. If x ≤◦ y and y ≤◦ z, then there are orders ≤1
and ≤2 in the chain such that x ≤1 y and y ≤2 z. Since they are in the chain, one of them
contains the other. Suppose wlog that ≤1 ⊆ ≤2. Then both x ≤2 y and y ≤2 z, and then
x ≤2 z since ≤2 is a partial order, so x ≤◦ z.

(A new kind of proof here: proof by cutting and pasting!)


Finally, that it extends (contains) the given order is obvious, so we are done.

G.13 Theorem: Consistent and complete theories


We now look at an important theorem which we will use for some interesting results in
Chapter 8. It is proved here because the proof is another good example of the use of Zorn's
Lemma.
If A is a consistent first-order theory, then it has a consistent complete theory extension.
Note that the extension so formed is not guaranteed to have a decidable set of axioms, even
if A does.

Proof. For the sake of this proof we will say that a set G of expressions is “good” if it has
these three properties.

(a) It is a theory, that is, it is deductively closed,



(b) It contains A, that is, A ⊆ G


(c) It is consistent, that is, there is no expression P such that P and ¬P both ∈ G.
Let us write G for the collection of all good sets. We will order it by ⊆. We will prove
(1) that any maximal member of G is complete (and so is the complete consistent theory
extending A that we require) and
(2) that G does indeed have a maximal member
This will establish the theorem.
Proof of (1) is by contradiction. We assume that G is a member of G but not complete
and show that it is not maximal. Since G is not complete, there is a sentence P such that
neither P nor ¬P ∈ G. Let H be the deductive closure of G ∪ {P } (so H is the set of
all expressions which are entailed by G ∪ {P }). Now H has Property (a) by definition and
Property (b) since A ⊆ G. To see that it is consistent, suppose not. Then there is some
expression Q such that both Q and ¬Q ∈ H. This means that

    G ∪ {P} ⊢ Q and G ∪ {P} ⊢ ¬Q,

but then

    G ∪ {P} ⊢ Q ∧ ¬Q and so G ⊢ P ⇒ Q ∧ ¬Q.

(This last step is by the Deduction Theorem. Note that P is a sentence, so there is no
trouble with UG.) Now, by SL, G ⊢ ¬P, contradicting our choice of P. This contradiction
proves that H is consistent. We have just proved that H is good. But G ⊂ H, since P ∈ H
but P ∉ G. Therefore G is not maximal.

Proof of (2) is by Zorn's Lemma. Now G is nonempty, since A ∈ G. Let C be a nonempty
chain in G; we show that ⋃C ∈ G.
(a) To see that ⋃C is a theory, we show that it is deductively closed. Let A1, A2, . . . , Ak / B
be a rule (with premisses A1, . . . , Ak and conclusion B) such that A1, A2, . . . , Ak ∈ ⋃C.
Then there are members C1, C2, . . . , Ck of C such that Ai ∈ Ci for i = 1, 2, . . . , k. But C
is a chain, so one of these sets contains all the others; call that one Cn. Then A1, A2, . . . , Ak
all ∈ Cn. But Cn is deductively closed, so B ∈ Cn also; but then B ∈ ⋃C, as required.
(b) Also C is nonempty, so there is at least one member of this chain, C say. But then
C is good, so A ⊆ C and therefore A ⊆ ⋃C.
(c) We now show that ⋃C is consistent. Suppose not: then there is an expression P
such that both P and ¬P ∈ ⋃C. But then there are members, C1 and C2 say, of C
such that P ∈ C1 and ¬P ∈ C2. But C is a chain, so one of these sets contains both P and
¬P, contradicting its consistency.
Now Zorn’s Lemma tells us that G has a maximal member, and we are finished. 

H The last two axioms


H.1 MK8: The Axiom of Foundation
There are several equivalent formulations of the Axiom of Foundation, of which three (and
a half) will interest us here. They are:—

The Axiom of Foundation, version 1


The version given in the formal axiom at the beginning of this chapter can be rewritten

(∀a ∈ Sets)(a ⊆ w ⇒ a ∈ w) ⇒ w = Sets

There is another way of looking at this statement, so little different that we will call it . . .

The Axiom of Foundation, version 1A (∈-induction)


A schema: an axiom for every expression P (x)

    (∀a ∈ Sets)( (∀x ∈ a)P(x) ⇒ P(a) ) ⇒ (∀a ∈ Sets)P(a)

The Axiom of Foundation, version 2


There is no sequence m0 , m1 , m2 , . . . of sets (indexed by N) such that
(i) m0 is a singleton set
(ii) for every i ∈ N and x ∈ mi, x ∩ mi+1 ≠ ∅.

The Axiom of Foundation, version 3



    (∀a)( a = ∅ or (∃x ∈ a)(x ∩ a = ∅) ).
(Note: in Version 3 we do not start with (∀a ∈ Sets) — this is true for all classes a.)
I will prove first that these various versions of the axiom are equivalent (in the presence of
the preceding axioms of MK), then make some comments afterwards.

Proof that Version 1 ⇒ Version 1A. Let w be the class w = { x : P(x) }. The hypothesis
of 1A says exactly that, for every set a, a ⊆ w ⇒ a ∈ w, so Version 1 gives w = Sets, that
is, (∀a ∈ Sets)P(a). □

Proof that Version 1A ⇒ Version 2. Let us call a sequence m0 , m1 , m2 , . . . satisfying


Conditions (i) and (ii) of Version 2 a “bad” sequence. If m0 = {a}, it is a bad sequence
“starting with” {a}.
Our aim is to prove that there are no bad sequences; in other words, for every set a, there
is no bad sequence starting with {a}. If we write P (a) for the expression “there is no bad
sequence starting with {a}”, then we wish to prove (∀a ∈ Sets)P (a). By virtue of Version
1A, it is enough to prove that

    (∀a ∈ Sets)( (∀x ∈ a)P(x) ⇒ P(a) ).

So let us suppose that a is any set, and prove that (∀x ∈ a)P (x) ⇒ P (a). This we do by
proving that ¬P (a) ⇒ ¬(∀x ∈ a)P (x), which is the same as ¬P (a) ⇒ (∃x ∈ a)¬P (x).
So now we assume ¬P (a), that is, that there is some bad sequence starting with {a}; our
aim is to prove that there is some x ∈ a and a bad sequence starting with {x}.
Let m0 , m1 , m2 , . . . be the assumed bad sequence starting with {a}. We have

(i) m0 = {a} .
(ii) For every i ∈ N and x ∈ mi , x ∩ mi+1 6= ∅.
In particular, since a ∩ m1 ≠ ∅, there is an element b ∈ a ∩ m1. But then the sequence
{b}, m2, m3, . . . is bad (b ∈ m1, so b ∩ m2 ≠ ∅), and b ∈ a. □

Proof that Version 2 ⇒ Version 3. We will prove that Version 3 false ⇒ Version 2
false. Our assumption is then that there is a class a such that a ≠ ∅ and (∀x ∈ a)(x ∩ a ≠ ∅).
We choose b ∈ a and use it to construct a bad sequence. Define it inductively:

    m0 = {b},
    mi+1 = ⋃{ x ∩ a : x ∈ mi }.

We must first check that each mi is in fact a set, which we do by induction. That m0 is a
set is obvious. Now suppose that mi is a set; then so is ⋃mi. But mi+1 = a ∩ ⋃mi must
then be a set also.
Now observe that mi ⊆ a for all i. (m0 = {b} and b ∈ a, so {b} ⊆ a. Also, for any y ∈ mi+1 ,
there is some x ∈ mi such that y ∈ x ∩ a and then y ∈ a; thus mi+1 ⊆ a.)

Finally, check property (ii) of a bad sequence. Let i ∈ N and x ∈ mi. Then x ∈ a, so by
assumption x ∩ a ≠ ∅. Thus there is some y ∈ x ∩ a. But then y ∈ mi+1. Also y ∈ x. Then
y ∈ x ∩ mi+1, so x ∩ mi+1 is nonempty. □

Proof that Version 3 ⇒ Version 1. We will prove that Version 1 false ⇒ Version 3 false.
Our assumption then is that Version 1 fails, in other words, that there is a class w such that

(∀a ∈ Sets)(a ⊆ w ⇒ a ∈ w) (1)

but w 6= Sets . (2)


Let b be the complement of w. Then (1) becomes

    (∀x)(x ∩ b = ∅ ⇒ x ∉ b),

in other words

    (∀x)(x ∈ b ⇒ x ∩ b ≠ ∅),    (1′)

and in other words again

    (∀x ∈ b)(x ∩ b ≠ ∅).    (1″)

Also (2) becomes

    b ≠ ∅.    (2′)
The last two expressions constitute the negation of Version 3. 

H.2 Remarks
Version 1 is of course the version given at the start of this chapter. If you compare it with
the statements of Strong Induction in E.4(ii), you will see that Versions 1 and 1A say that
you can do something very much like strong induction on the class of all sets, using the
relation ∈. Hence the subheading “∈-induction” above.
Version 2 has several immediate and useful corollaries:

H.3 Corollaries
(i) There is no sequence a0, a1, a2, . . . of sets (indexed by N) such that a0 ∋ a1 ∋ a2 ∋ . . .
This probably accounts for the name of the axiom. To see that this follows from Version 2,
define m0, m1, m2, . . . by mi = {ai} for all i ∈ N. Another way of looking at this result is
that, if you start with some set, then take a member of it, then a member of that and so
on, eventually you must come to the empty set. More precisely,
(i′) Any sequence a0 ∋ a1 ∋ a2 ∋ . . . of sets must terminate after a finite number of steps
with the empty set. (There is a small illustration of this just after these corollaries.)
Putting this another way again, "everything, at the bottom, is empty". If you are familiar
with Henri le Chat Noir you will recognise this as just his kind of theorem.
And following from this we have
(ii) No set is a member of itself.
Because, if a ∈ a, that would generate an infinite sequence a ∋ a ∋ a ∋ . . . .

(iii) There are no sets a and b such that a ∈ b ∈ a


for much the same reason.
You will recall that we have already proved these results for natural numbers. This axiom
tells us that they are true for all sets whatsoever.
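Here is the promised illustration of (i′), a hereditarily finite sketch of mine in Python: keep taking members and you bottom out at ∅ after finitely many steps.

    def descend(a):
        """Follow memberships a ∋ x ∋ ... until the empty set is reached."""
        chain = [a]
        while chain[-1]:                         # the empty frozenset is falsy
            chain.append(next(iter(chain[-1])))  # pick any member
        return chain

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})
    print(descend(two))   # ends with frozenset(), i.e. ∅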

We will need the rather technical-looking Version 3 when we come to discuss ordinal numbers.

In the section on Ordinal numbers, we will prove an interesting result which follows from
the Axiom of Foundation. For the time being we prove a couple of facts about transitivity:

H.4 Proposition
For every set there is a unique smallest transitive set which contains it (as a subset). In more
detail: let A be any set. Then there is a transitive set T such that A ⊆ T and, moreover, if
T 0 is any other transitive set such that A ⊆ T 0 then T ⊆ T 0 . This set T is uniquely defined
by A.

(It is called the transitive closure of A.)



Proof. We prove that T exists; uniqueness is obvious.


Define a family ⟨Bi⟩i∈N of sets, indexed by the natural numbers, as follows:

    B0 = A,
    for each i ∈ N, Bi+1 = Bi ∪ (⋃Bi).

(This means that x ∈ Bi+1 if and only if either x ∈ Bi or there is some y such that
x ∈ y ∈ Bi.) Now define T = ⋃i∈N Bi. It is easy to check that T is transitive and that
A ⊆ T. It remains to check the minimality condition. So, given a transitive T′ such that
A ⊆ T′, we must prove that T ⊆ T′; it is enough to show that every Bi ⊆ T′, and this we do
by induction over i. Firstly (zerothly?) B0 ⊆ T′ because B0 = A. Now suppose (i-thly?)
that Bi ⊆ T′; we show that Bi+1 ⊆ T′. Let x ∈ Bi+1. Then either x ∈ Bi, in which case
x ∈ T′ trivially, or else there is some y such that x ∈ y ∈ Bi, in which case y ∈ T′ and then
x ∈ T′ by transitivity. □
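For hereditarily finite sets the construction in this proof can be run directly. The following Python sketch (mine, not from the text) computes the transitive closure by repeatedly collecting members of members:

    def transitive_closure(A):
        """Smallest transitive set containing A as a subset.

        A is a frozenset whose members are frozensets, and so on down."""
        T = set(A)
        frontier = set(A)
        while frontier:
            # members of current elements that we have not yet collected
            frontier = {x for y in frontier for x in y} - T
            T |= frontier
        return frozenset(T)

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})
    print(transitive_closure(frozenset({two})))   # {∅, {∅}, {∅, {∅}}}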

H.5 Remark
When dealing with classes of sets with some kind of structure, it is normally of interest to
consider functions between those sets which preserve the structure in some obvious way: for
example, continuous functions between topological spaces, linear functions between vector
spaces, homomorphisms between groups, order-preserving functions between ordered sets
and so on. Now a plain old set has a built-in structure: since all its members are also sets,
it has a defined relation ∈ between its members. It seems natural to look at functions which
preserve this structure. Such a function f : A → B would have the property that, for all
x, y ∈ A, x ∈ y ⇒ f (x) ∈ f (y). In this context, the following proposition says that two
transitive sets are isomorphic if and only if they are the same.

H.6 Proposition
Let A and B be transitive sets and suppose there exists a bijection f : A → B such that

    for all x, y ∈ A, x ∈ y ⇔ f(x) ∈ f(y).

Then A = B.

Proof. Left as an interesting exercise. □

H.7 MK7: The Axiom of Formation


The Axiom of Formation tells us that, if the domain of a function is a set, then so is its
range. In a little more detail:

(a) Suppose that f is a surjection A → B, where A is a set and B is any class. Then B is
also a set.
Or, in an obviously equivalent form:
(b) Suppose that f is a function A → B, where A is a set and B is any class. Then f[A]
is also a set.

It is not obvious that this is what the axiom, as stated in Section A.1, gives us, so let
us deconstruct the axiom now. (Why not state the axiom in the simple form just stated?
Because the axioms, as given at the beginning of this Chapter, are all in formal form; the
simple version above would involve the definition of a function and thus the definitions
of unordered and ordered pairs; this could be done, but the axiom would then become
distressingly long.)

This is another section where the proofs have been included for completeness. Feel free to
skip these proofs unless you are interested in how they go. We return to the main story
with H.10 below.
First, let us change some of the variable letters to more helpful ones:

    (∀A)(∀B)(∀F)( SET(A)
        ∧ (∀a)( a ∈ A ⇒ (∃b)(∃f)(b ∈ B ∧ f ∈ F ∧ a ∈ f ∧ b ∈ f) )
        ∧ (∀a)(∀f1)(∀f2)(∀b1)(∀b2)( a ∈ A ∧ f1 ∈ F ∧ f2 ∈ F ∧ b1 ∈ B ∧ b2 ∈ B
              ∧ a ∈ f1 ∧ b1 ∈ f1 ∧ a ∈ f2 ∧ b2 ∈ f2 ⇒ b1 = b2 )
        ∧ (∀b)( b ∈ B ⇒ (∃a)(∃f)(a ∈ A ∧ f ∈ F ∧ a ∈ f ∧ b ∈ f) )
        ⇒ SET(B) )

Now, make this more readable:


The following is true for all sets A and classes B and F . If
(1) For all a ∈ A there exists b ∈ B and f ∈ F such that a, b ∈ f ,

(2) For all a ∈ A, b1 , b2 ∈ B and f1 , f2 ∈ F , if a, b1 ∈ f1 and a, b2 ∈ f2 then b1 = b2 .


(3) For all b ∈ B there exists an a ∈ A and f ∈ F such that a, b ∈ f
then B is a set.

H.8 Theorem: The usual form of the Axiom of Formation


Let A be a set, B a class and g a surjection A → B. Then B is also a set.

Proof. Define A1 = g⁻¹(A ∩ B) and A2 = g⁻¹(B ∖ A).

Define F to be the class of all (unordered) pairs {a, g(a)} with a ∈ A2.

It is now easy to check that Conditions (1), (2) and (3) hold for F (with respect to the sets
A2 and B ∖ A). For Condition (2), note that, if {a, b} ∈ F, then a ∈ A2 and b ∉ A2.
Therefore, using the axiom, B ∖ A is a set. But A ∩ B is also a set, being a subset of A.
Therefore their union, which is B, is a set. □

H.9 This is an equivalence


Because the axiom, as given above, is in an unusual form, I give here a proof that, if we
assume the axiom in its usual form (as just given in the statement of the theorem above),
then the form in Section A.1 follows from that.

Suppose then that A is a set and B and F classes satisfying Conditions (1), (2) and (3). We
must prove, using the standard form, that B is a set.
Let G be the class of all ordered pairs ⟨a, b⟩ such that there exists some f ∈ F such that
a, b ∈ f:

    G = { ⟨a, b⟩ : (∃f ∈ F)(a ∈ f ∧ b ∈ f) }.
Then Condition (1) says

    for all a ∈ A there exists b ∈ B such that ⟨a, b⟩ ∈ G

and Condition (2) says

    for all a ∈ A and b1, b2 ∈ B, if ⟨a, b1⟩ ∈ G and ⟨a, b2⟩ ∈ G then b1 = b2,

so, between them, they say that G is the graph of a function, g say, A → B. Finally,
Condition (3) says that g is onto B.

The standard form now tells us that B is a set, as required.

H.10 Corollaries
(i) Let X be a class, Y a set and f : X → Y an injection. Then X is a set too.
(ii) Let X be a proper class, Y a class and f : X → Y an injection. Then Y is a proper
class too.

Proof (i). Consider the range R of f :

R = { f (x) : x ∈ X } .

This is a subclass of Y and so is a set too. But, since f is an injection, its inverse
f⁻¹ : R → X exists, and this is a surjection (onto), so X is also a set.
(ii) This is just the contrapositive of (i). 

H.11 A useful application


An interesting (and important) application of the Axiom of Formation occurs when a set
of sets, indexed by N, is defined inductively. For example, we can define the sets

∅ , {∅} , {{∅}} , {{{∅}}} , . . .

inductively as a sequence, and then the class of all of them,

{∅ , {∅} , {{∅}} , {{{∅}}} , . . .}



will be a set. Here are the details, in case they aren't obvious already. We want to define
these sets as a sequence, S0, S1, S2, . . . say, indexed by N, that is, a function S with domain
N. This is easy: we define the function S : N → Sets by induction, specifying S0 = ∅
and Sn+ = {Sn}. The collection of all these sets is of course the range of the function S,
and the Axiom of Formation tells us that this is a set.
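A quick sketch of mine of this sequence using Python frozensets (which, conveniently, can only be nested in a well-founded way):

    def S(n):
        """S_0 = ∅ and S_{n+1} = {S_n}."""
        s = frozenset()
        for _ in range(n):
            s = frozenset({s})
        return s

    print([S(n) for n in range(4)])   # ∅, {∅}, {{∅}}, {{{∅}}}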

H.12 Another application


Near the beginning of this chapter (in A.2) I claimed, without proof, that the class of all
topological spaces, the class of all groups and other things of this nature are proper classes.
Here I will show how this can be proved. The proof must be on a case-by-case basis: I will
give proofs for two or three well-known classes; it is not hard to see how the general idea
can be used for other such classes.

Vector spaces
Let X be any set. We make a real vector space VX . It is the set of all functions f : X → R
with the operations defined thus (in the obvious way?):

0 is the zero function 0 (x) = 0 for all x ∈ X

For functions f, g : X → R, f + g is the function

(f + g)(x) = f (x) + g(x) for all x ∈ X

For function f : X → R, −f is the function

(−f )(x) = −(f (x)) for all x ∈ X

For function f : X → R and r ∈ R, rf is the function

    (rf)(x) = r·f(x) for all x ∈ X

It is easy to check that


(a) For any set X this defines a vector space as claimed and
(b) different sets give different vector spaces, that is, the function X ↦ VX is an injection.
That the class of all real vector spaces is a proper class now follows from Corollary (ii) above.

Topological spaces
Let X be any set. Make a topological space TX out of X by simply defining all of its subsets
to be open (this is known as a discrete space).
It is easy to see that TX is then a topological space, as claimed, and trivial to see that
the function X ↦ TX is injective. By the same argument as before, then, the class of all
topological spaces is a proper one.

We get more from this example: since discrete spaces are metrisable, it also proves that the
class of all metric spaces is proper and, since discrete spaces are Hausdorff, the class of all
Hausdorff spaces is also proper. And so on.

Groups
Use the same construction as we used for vector spaces above, but ignore the multiplication-
by-scalars part; use only the additive group part. This gives us an injection from the class
of all sets to the class of all abelian groups. So this proves that the classes of all abelian
groups and of all groups are proper.
7. TRANSFINITE ARITHMETIC

In this chapter we will investigate cardinality, that is, the way in which the sizes of infinite
sets can be compared. You are probably familiar with the idea that the set R of real
numbers is uncountable and therefore "larger" than the countable set N of natural numbers.
Cardinality is the systematic investigation of the "sizes" of all infinite sets (finite ones too,
but that's nothing new).

One useful outcome of this is the notion of cardinal numbers. We know that, for each finite
set, there is a unique natural number which represents its size (“The set X has n members;
its size is n.”). The cardinal numbers extend the natural numbers so that every set, finite
or infinite, has a unique cardinal number representing its size.

We will also look at ordinal numbers, which also extend the natural numbers past the finite,
but do so in a way which takes into account ordering, so for instance, just like the natural
numbers, every ordinal number will have a successor. We will see that the ordinal numbers
are well-ordered and that every well-ordered set is order-isomorphic to exactly one ordinal
number. Ordinal numbers will also allow us to do transfinite induction, a process which is
occasionally very useful.

While cardinal numbers sound more generally useful than ordinal numbers, it turns out
that the discussion of cardinality and cardinal numbers is based on well-orders and ordinal
numbers. Consequently, we investigate ordinal numbers first.

The words “cardinal” and “ordinal” have been stolen from the ordinary English lan-
guage and given slightly different and more specialised meanings. (Mathematicians
have a bad habit of doing that, to the confusion of non-specialists.) In ordinary En-
glish usage, the cardinal numbers are ‘zero’, ‘one’, ‘two’, . . . and are used to represent
the size of sets (“I watched four movies last night”); the ordinal numbers are ‘first’,
‘second’, ‘third’, . . . and are used to represent the order in which things occur (“But
my sister only watched the second and third one”). So the mathematical meaning is
not so very different from the everyday one in this case.

A Ordinal numbers
A.1 Discussion
Suppose we want to extend the natural numbers to include infinite numbers in some way.
The first step might be to append a first infinite number at the far end, something like this:

0, 1, 2, 3, ... , ω

(This notation is meant to indicate that all the natural numbers come first, then ω at the
end.) This first infinite number is called ω here because that’s what it is always called in
this context.


Well, that’s nice, but what is this new thing ω going to be exactly? It has to be a set of
some sort, because everything’s a set in MK. Here we can use what I like to think of as the
“Von Neumann insight”: every natural number is just the set of its predecessors, so let us
define ω to be the set of its predecessors too:

ω = {0 , 1 , 2 , 3 , . . . } .

So we have actually defined ω to be N: we have two official names for this set. Mathemati-
cians normally use one or other of these symbols depending upon the way we are looking at
the set. If we are thinking of it as the first infinite ordinal number, we usually call it ω.
Back when we were defining the natural numbers, we defined the successor a+ = a ∪ {a} of
any set. At that point we were only interested in the successors of natural numbers, but the
definition works for any set at all. So now we can extend our new set of numbers by adding
in successors to get
    0, 1, 2, 3, . . . , ω, ω⁺, ω⁺⁺, ω⁺⁺⁺, . . .
We will see that we will be able to define addition and multiplication of these new ordinal
numbers (in much the same way as we did for natural numbers) and that we can use
the notation
0, 1, 2, 3, ... , ω, ω + 1, ω + 2, ω + 3, ...
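The successor operation itself is easy to play with concretely; here is a small sketch of mine with frozensets:

    def successor(a):
        """a⁺ = a ∪ {a}."""
        return frozenset(a | {a})

    zero = frozenset()
    one = successor(zero)    # {∅}
    two = successor(one)     # {∅, {∅}}
    three = successor(two)

    assert two == frozenset({zero, one})   # 2 = {0, 1}: the set of its predecessors
    assert all(k in three for k in (zero, one, two))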
Now of course we can pop a new number at the end of all this (I will call it ω2 because
that’s what it is usually called)

0 , 1 , 2 , 3 , . . . , ω , ω + 1 , ω + 2 , ω + 3 , . . . , ω2

Here, as before, we just define ω2 to be the set of all its predecessors. But why stop
there? We can follow ω2 by a lot of successors and eventually append ω3 at the far end and
keep going like that. The resulting set is

0, 1, 2, 3, ...
ω, ω + 1, ω + 2, ω + 3, ...
ω2 , ω2 + 1 , ω2 + 2 , ω2 + 3 , . . .
ω3 , ω3 + 1 , ω3 + 2 , ω3 + 3 , . . .
..
.

Then we can put a new "number" at the end of all this — we'll call it ωω or just ω² —
and start all over again with things like ω² + 1, ω² + ω2, etc.
At this stage you may be wondering if this is all an amusing but pointless mathematical
game. To which I reply: no, stay with it; as this chapter progresses I hope to demonstrate
how very useful these ordinal numbers are.
What we need at this stage is a workable definition of the class of ordinal numbers (we will
see that it is a proper class). Looking at the examples above, we see that, just like the
natural numbers, this class has the properties
(1) 0 is an ordinal number, and

(2) If α is an ordinal number, then so is α+ .



Recalling that we defined N as the "smallest" class which had these properties (in other
words, the "closure" of these properties), it seems that what we need now is a third property
to ensure that the class contains numbers like ω, ω2, . . . , ω² and all the other ones of this
nature that will turn up. Here it is useful to notice a couple of things: firstly, that every one
of these numbers is the union of all its predecessors and secondly that so is every natural
number. It turns out (we will prove this, it’s not hard) that the same is true for all the
other numbers we have just created. So it seems we should add a third closure property to
the two above, that the union of every initial subset of this class is a member of the class.
There is one rather awkward thing about this property (in this form at least): to talk about
“before”, “initial sets” and so on one needs a definition of order. Now, while it is pretty
obvious how we want to order this set, to try to define an order on it while we are in the
process of creating it could be quite messy.
However, this class has a very pretty property that gets us out of this difficulty: the union
of any subset whatsoever is also the union of some initial subset. (Again, we will prove this,
and again it will not be hard to do so.) So our third property will be:
(3) If X is any set of ordinal numbers, then ⋃X is also an ordinal number.

Now we define the class Ord of all ordinal numbers to be the “smallest” class with this
property. It is tempting to define it as
    Ord = ⋂{ X : X is a class having properties (1), (2) and (3) above }    WRONG!
but this won’t do: all those X’s are proper classes, so they cannot be members of anything:
we are actually taking the intersection of the empty set here, which is not at all what we
have in mind. But we already found a way of avoiding this problem when we defined N, so
we’ll use it again.
An ordinal number is a set which is a member of every class which has
properties (1), (2) and (3) above .

It is now time to make this a proper definition.

A.2 Definition 1 of ordinal numbers


(i) A class T is transfinite-inductive if it has the following three properties:
(1) ∅∈T
(2) For all x ∈ T , x+ ∈ T also.
(3) For any subset X of T, ⋃X ∈ T.
(ii) An ordinal number is a set which is a member of every transfinite-inductive class.
The class of all ordinal numbers will be denoted Ord.

And now I give a second definition of ordinal numbers which looks rather different.

A.3 Definition 2 of ordinal numbers


An ordinal number is a set α such that

(i) α is transitive (that is, every member of α is also a subset of α).


(ii) For all x, y ∈ α, one of the following hold: x ∈ y, x = y or y ∈ x.
The class of all ordinal numbers will be denoted Ord.
Note that, since no class can be a member of itself (see 6.H.3), (i) above is equivalent to

(i′) x ∈ α ⇒ x ⊂ α.
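Definition 2 is directly checkable for hereditarily finite sets; here is a small Python sanity check of mine (certainly not a substitute for the formal development):

    def is_ordinal(a):
        """Transitive (every member is a subset) and ∈-trichotomous: Definition 2."""
        transitive = all(x <= a for x in a)          # x ⊆ a for each x ∈ a
        trichotomy = all(x in y or x == y or y in x
                         for x in a for y in a)
        return transitive and trichotomy

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})

    assert is_ordinal(two)
    assert not is_ordinal(frozenset({one}))   # {{∅}} fails: its member {∅} is not a subset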
We will use this second definition as the basic one, work from that and eventually prove the
two definitions equivalent. Why work from the second, less obvious one? For two reasons.
Firstly, the basic properties of ordinal numbers are more easily derived from the second
definition. Secondly, and perhaps more importantly, the first definition does not make sense
in the language of ZF (because there is no way of expressing the idea of “for every class” in
that language) whereas the second definition does: because ZF is used quite extensively, it
is good to maintain compatibility where we can.

A.4 Definition: Ordering an ordinal number


Let α be an ordinal. We define the relation ≤ on α by

    x ≤ y if and only if x ∈ y or x = y.

Using (i′) above, this is the same as

    x ≤ y if and only if x ⊆ y.

A.5 Proposition
This relation is in fact a well-order on α.

Proof. As was observed in the note 6.E.1, it is enough to prove that this relation is
antisymmetric and that every nonempty subset of α has a least element.
For antisymmetry, suppose that x ≤ y and y ≤ x in α. Then x ⊆ y and y ⊆ x so x = y.
Now suppose that E is a nonempty subset of α. By Version 3 of the Axiom of Foundation,
there is z ∈ E such that z ∩ E = ∅. We will show that z is the least member of E, as
required. We already know that z ∈ E, so it is enough to show that it is a lower bound.
Suppose then that x is any member of E. Then x ∉ z. But then, by A.3(ii) above, either
z ∈ x or z = x, that is, z ≤ x, as required. □

A.6 Remark
Recall the definition of an initial segment in an ordered set: I(a) = { x : x < a }. Ordinal
numbers have the interesting property that the initial segment of a member is equal to that
member, that is
If α is an ordinal and x ∈ α, then I(x) = x.
To see this, simply observe that z ∈ x ⇔ z < x ⇔ z ∈ I(x).

We will be using the idea of order-isomorphism quite often in connection with ordinal num-
bers. It will be convenient to have a symbol for this relation. If A and B are ordered sets,
we will write A ' B to mean that A is order-isomorphic to B.

A.7 Proposition
If two ordinal numbers are order-isomorphic, then they are equal.

Proof. By symmetry, it is enough to prove that, if α and β are ordinal numbers and α ' β,
then α ⊆ β. Suppose not. Let ϕ : α → β be the assumed order-isomorphism. Then there is
some x ∈ α such that x ≠ ϕ(x), and thus the subset E = { x : x ≠ ϕ(x) } of α is nonempty.
Since α is well-ordered, the set E has a least element, e say. Then x = ϕ(x) for all x < e in
α. This means that Iα(e) = ϕ[Iα(e)]. On the other hand, since ϕ is an order-isomorphism,
ϕ[Iα(e)] = Iβ(ϕ(e)) (by Proposition 6.E.6). Therefore Iα(e) = Iβ(ϕ(e)) and so, by the
Remark A.6 above, e = ϕ(e). But this contradicts the choice of e. □

A.8 Proposition
Every initial segment of an ordinal number is an ordinal number.
Therefore every member of an ordinal number is an ordinal number.

Proof. Let I be an initial segment of the ordinal number α. First we check that I is
transitive. Let x ∈ I. Then x ∈ α so x ⊆ α. But now, for any z ∈ x, we have z < x and so
z ∈ I, proving that x ⊆ I and so that I is transitive. Also, if x, y ∈ I, then x, y ∈ α and so
one of x ∈ y, x = y, or y ∈ x must hold. 

A.9 Theorem
(i) The class Ord is transitive (that is, if α ∈ Ord, then α ⊆ Ord).

(ii) If α, β ∈ Ord then one of α ∈ β, α = β or β ∈ α must hold.


(iii) Every nonempty subclass of Ord has a least element.

Proof. (i) is the second statement of A.8.


(ii) Here α and β are well-ordered sets and so, by the Trichotomy Theorem (6.E.8), one
of three things must hold. Firstly, α ' β, in which case they are equal, by Proposition A.7
above. Secondly, there is some y ∈ β such that α ' I(y). In this case α = y ∈ β. Lastly,
there is some x ∈ α such that β ' I(x), in which case β = x ∈ α.
(iii) Let A be a nonempty subclass of Ord. Then there is some α ∈ A. If α is the least
member of A, then we are done. Otherwise it is enough to show that the class of all members
of A which are < α has a least element. But this is the set of all members of A which are
∈ α and so is a nonempty subset of the ordinal α; the result follows. 

A.10 Remarks
Part (iii) of this theorem tells us that Ord is a well-ordered class.
This theorem also tells us that Ord has all the defining properties of an ordinal number —
except for one . . .

A.11 Proposition
The class Ord of all ordinals is a proper class; that is, it is not a set.
However, every proper initial segment of Ord is a set (in fact an ordinal number).

Proof. If Ord was a set, Parts (i) and (ii) of the previous theorem would mean that it was
an ordinal number itself. Then we would have Ord ∈ Ord, which is impossible.
Now let J be any proper initial segment of Ord. Since it is proper, Ord ∖ J is nonempty.
Let α be the least member of Ord ∖ J. Then J = I(α) = α. □

The next theorem will turn out to be much more important than it looks at first sight.

A.12 Theorem
Every well-ordered set is order-isomorphic to exactly one ordinal number.

Proof. Let W be the given well-ordered set. Since ordinal numbers are well-ordered sets,
by the Trichotomy Theorem (6.E.8) ordinal numbers are of three kinds:

(i) Ones which are order-isomorphic to some initial segment I(w) of W .


(ii) Ones which are order-isomorphic to W itself.
(iii) Ordinals α such that W is order-isomorphic to some I(ξ), where ξ ∈ α.

Looking first at (i), note that the initial segments of W form a set; by A.7, then, the ordinals
of Type (i) form a set. Since Ord is a proper class, there must exist an ordinal α which is
either of Type (ii) or of Type (iii). If it is of Type (ii) we are done. If it is of Type (iii),
note that ξ is then an ordinal and I(ξ) = ξ, so W is order-isomorphic to the ordinal ξ.
Uniqueness is easy. If W is order-isomorphic to ordinals α and β, then α and β are
order-isomorphic to each other and so, by A.7, they are equal. 

A.13 Definition
If A is a well-ordered set, then the unique ordinal number to which it is order-isomorphic is
called the order-type of A.
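For example, the set { 1 − 1/(n + 1) : n ∈ N } ∪ {1} of rational numbers, with its usual
order, is well-ordered and its order-type is ω + 1: the order-isomorphism from ω + 1 sends
each n to 1 − 1/(n + 1) and sends ω to 1. (Here I am using the notation ω + 1 = ω+ from
the preamble to this chapter; the arithmetic itself is developed in Section D below.)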

A.14 Proposition
(i) The set N of natural numbers is an ordinal number.
(ii) Every natural number is an ordinal number. In other words, N ⊂ Ord.

Remark This proves one of the results mentioned in the preamble to this chapter. I have
already mentioned that, in the context of ordinal numbers, the set N is usually denoted ω.

Proof. (i) To prove that ω is transitive, we assume that m ∈ n ∈ ω and prove that
then m ∈ ω. This we do by induction over n. If n = 0, the result is vacuously true. Now
assume that the result is true for n and prove it for n+. We are assuming that m ∈ n ∪ {n};
but then either m ∈ n, in which case m ∈ N by the inductive hypothesis, or else m = n ∈ N
in which case it is true trivially.
Now we must prove that, for any m, n ∈ ω, one of m ∈ n, m = n or n ∈ m holds. Let us write
m ∼ n to mean that one of m ∈ n, m = n or n ∈ m holds. We now wish to prove that
m ∼ n for all m, n ∈ ω. The proof goes in several steps.

(1) ∅ ∈ n+ for all n ∈ ω.


Proof is by induction over n. Firstly ∅ ∈ {∅} = ∅+ . Now suppose ∅ ∈ n+ . Then ∅ ∈ n++
since n+ ⊆ n++ .
(2) ∅ ∼ n for all n ∈ ω. (Corollary of (1).)

(3) If m ∈ n then m+ = n or m+ ∈ n. Proof is by induction over n. If n = ∅, the result


is vacuously true. Now suppose m ∈ n+ ; we want to prove that m+ = n+ or m+ ∈ n+ . We
have m ∈ n ∪ {n}, so m ∈ n or m = n. If m = n then m+ = n+ . If m ∈ n then (by the
inductive hypothesis), either m+ = n or m+ ∈ n. In either case m+ ∈ n+ .
(4) Now we prove the main result, that m ∼ n, by induction over n. The case n = ∅ is
given by Step (2) above. Now suppose that m ∼ n; we want to prove that m ∼ n+ also. Let
us check the cases. If m ∈ n then m ∈ n+ also, since n ⊆ n+. If m = n then m ∈ n+, since
n ∈ n+. If n ∈ m then, by (3), either n+ = m or n+ ∈ m, and in either case m ∼ n+.
(ii) is a corollary of (i). 

A.15 Proposition
(i) 0 is an ordinal number.
(ii) If α is an ordinal, then so is α+ .

(iii) If A is a set of ordinal numbers, then its union is an ordinal number.


(iv) α+ ≠ 0 for any ordinal number α.
(v) If α and β are ordinal numbers, then α+ = β+ ⇒ α = β.

(vi) α+ is the successor of α, in the sense that it is the “next” ordinal: there is no ordinal
ξ such that α < ξ < α+ .

(vii) A corollary of this is that, if α and β are ordinals and α < β, then α+ ≤ β.
(viii) Every set A of ordinals has a supremum, that is, an ordinal σ such that

for all α ∈ A, α ≤ σ,
and, if ξ is an ordinal such that α ≤ ξ for all α ∈ A, then σ ≤ ξ.
It is given by σ = ⋃A. In the case that A has a maximum element, σ is this maximum
element.

Proof. (i) The definition is satisfied vacuously for 0 = ∅.

(ii) and (iii) follow immediately from the definition of an ordinal number.
(iv) α ∈ α+ .
(v) If α+ = β + then α ∈ β + = β ∪ {β}, so α ∈ β or α = β, that is α ≤ β. But, in the
same way, β ≤ α.

(vi) (and (vii)) If such a ξ existed, we would have α ∈ ξ ∈ α∪{α}, so that either α ∈ ξ ∈ α
or α ∈ ξ = α, both of which contradict the axiom of foundation.
(viii) Let σ = ⋃A. Then σ is an ordinal (by (iii)) and, for any α ∈ A, α ⊆ σ, that is,
α ≤ σ.
Now suppose that β is some ordinal < σ, that is, β ⊂ ⋃A. Then there is an α ∈ A such
that α ⊈ β. Then β < α, so β is not an upper bound. 
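For example, the set A = { n+ : n ∈ ω } = {1, 2, 3, . . .} has sup A = ⋃A = ω, and ω ∉ A:
when a set of ordinals has no maximum element, its supremum lies outside the set. By
contrast, sup{0, 1, ω} = ω, its maximum element.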

A.16 Definition: Successor and limit ordinals


It is clear by now that ordinal numbers are of two distinct types.
An ordinal number α is a successor ordinal if it is the successor of some ordinal, that is, if
it is of the form ξ + for some ordinal ξ.

Otherwise it is a limit ordinal. In other words, an ordinal number λ is a limit ordinal if it
is not the successor of any ordinal — that is, not of the form ξ+ for any ordinal ξ.
Successor ordinals are sometimes called non-limit ordinals.
Examples of limit ordinals are 0, ω, ω2, ω3.

Examples of successor ordinals are: all the nonzero natural numbers (n = (n − 1)+ ) and the
ordinals ω + , ω ++ , ω +++ and so on.
The following useful characterisations of limit ordinals should be proved as an exercise:

A.17 Proposition
(i) An ordinal number λ is a limit ordinal if and only if

ξ<λ ⇒ ξ+ < λ for all ordinals ξ.



(ii) An ordinal number λ is a limit ordinal if and only if λ = sup{ξ : ξ < λ}.
This last probably accounts for its name.
We now develop a few techniques for dealing with ordinal numbers.

A.18 Proposition: Notes on suprema


We will be using suprema frequently in this chapter, referring to the basic facts set out in
Section 6.C.3 — and we will also use the following facts about suprema of sets of ordinal numbers.
(i) For ordinals only: if X is a nonempty set of ordinals and sup X is either zero or a
successor ordinal, then sup X ∈ X and so is a maximum value of X.

(ii) Let f be a function X → Ord, where X is any set. Then µ = sup{ f(x) : x ∈ X } if
and only if the following two things are true:
(a) f(x) ≤ µ for all x ∈ X.
(b) If α < µ then there is some x ∈ X such that α < f(x).

Proof. (i) If sup X = 0 then, since X ≠ ∅, we have X = {0} and the result is trivially
true.
If sup X is a successor ordinal, sup X = α+ say, then α < α+ = sup X so, by the definition
of supremum, there is some ξ ∈ X such that α < ξ ≤ α+. But then ξ = α+, so sup X ∈ X.
(ii) follows immediately from 6.C.3(i). 

A.19 Definition: Normal function


A function f : Ord → Ord is normal if

(i) It is strictly increasing and


(ii) f(λ) = sup{ f(ξ) : ξ < λ } for every nonzero limit ordinal λ.
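For contrast, a quick non-example: the function f(ξ) = ξ+ is strictly increasing but not
normal, since sup{ ξ+ : ξ < ω } = ω while f(ω) = ω+.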
The point of this definition is that lots of important functions on the ordinals are normal.
In particular, the standard arithmetic operations of addition, multiplication and exponenti-
ation, which we will define below, are all normal in their second variable. (Hence I suppose
the use of that particular word.) Also, we have the very useful next proposition.

A.20 Proposition: Intermediate Value Theorem


Let f : Ord → Ord be a normal function and α be any ordinal ≥ f (0). Then there is an
ordinal µ such that f (µ) ≤ α < f (µ+ ).

Proof. Let θ be the least ordinal such that f(θ) > α. (There is at least one such ordinal:
a strictly increasing ordinal function satisfies f(ξ) ≥ ξ for every ξ, so f(α+) > f(α) ≥ α.)
Now θ is not zero, by the assumption that α ≥ f(0); nor is it a nonzero limit ordinal
because, if it were, we would have
α < f(θ) = sup{ f(ξ) : ξ < θ } ,
in which case α would not be an upper bound for the set { f(ξ) : ξ < θ }, so there would be
ξ < θ such that α < f(ξ), contradicting the choice of θ.
So θ is a successor ordinal. Set θ = µ+. Then µ < θ so, by the choice of θ, f(µ) ≤ α; and we
already know that α < f(θ) = f(µ+). 

Notice what this says: because f takes values on either side of α, it takes a value as close
below α as one could hope. (Because the ordinals form a sort of discrete class — lots of little
jumps — you could not expect always to find a µ such that f(µ) = α.)
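For a concrete instance, take f(ξ) = ωξ (multiplication as defined in Section D below; by
D.5 this function is normal, and f(0) = 0 ≤ α for every α) and take α = ω3 + 5. Then µ = 3
works: f(3) = ω3 ≤ α < ω4 = f(3+). The function lands as close below α as its jumps
allow, though never exactly on α.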

A.21 Proposition: Normal functions and subsets


Let f : Ord → Ord be a normal function and let X be any nonempty subset of the ordinals.
Then
sup f [X] = f (sup(X)) .
(Here f [X] as usual denotes { f (x) : x ∈ X }.)

Proof. Write µ = sup X. Then, for every x ∈ X, x ≤ µ and so, since f is increasing,
f(x) ≤ f(µ). Therefore sup f[X] ≤ f(µ). It remains to prove that f(µ) ≤ sup f[X].
If X has a maximum member, then it must be µ. We then have µ ∈ X so that f (µ) ∈ f [X]
and then f (µ) ≤ sup f [X].

If µ is 0 or a successor ordinal, then it must be the maximum member of X and we have


just proved the result in this case.
Finally, suppose that µ is a nonzero limit ordinal. Then, for every ξ < µ, ξ is not an upper
bound for X, so there is some x ∈ X such that ξ < x. Since f is strictly increasing,
f(ξ) < f(x) ≤ sup f[X]. But this is true for all ξ < µ, so
sup{ f(ξ) : ξ < µ } ≤ sup f[X] .
But sup{ f(ξ) : ξ < µ } = f(µ) (since f is normal) and the result is proved. 

B Induction and ordinals


It is probably obvious by now that induction will play a big part in dealing with ordinals.
Because the class is well-ordered, strong induction is the most natural, and neatest, style of
induction to use. However, there is another inductive method which looks very much like
ordinary induction, so we will call it that.
We can do induction over the class Ord, thus proving things (or defining them) for all ordinal
numbers. Alternatively, we can work within a single ordinal number, α say, proving things
or defining them for all members of α — which are all the ordinal numbers < α. The process
in either case looks very similar. Let us start with induction over Ord itself.

B.1 Proposition: Inductive properties of Ord


(i) (Ordinary induction) Let X be a subclass of Ord with the properties:
For any ordinal ξ, ξ ∈ X ⇒ ξ + ∈ X

For any limit ordinal λ, I(λ) ⊆ X ⇒ λ ∈ X.


Then X = Ord.
(ii) (Strong induction) Let X be a subclass of Ord with the property:
For any ordinal ξ, I(ξ) ⊆ X ⇒ ξ ∈ X.

Then X = Ord.

Proof. (ii) is just the statement that Ord is well-ordered (see 6.E.4).

(i) If X ≠ Ord then there is a least ordinal which is not a member of X. If it is a
non-limit ordinal, write it as ξ+; then ξ ∈ X but ξ+ ∉ X, violating the first condition. If it
is a limit ordinal λ, then I(λ) ⊆ X but λ ∉ X, violating the second. 

B.2 Corollary: Inductive proofs on Ord


(i) (Ordinary induction) Let P(ξ) be a predicate defined on Ord with the properties:
For any ordinal ξ, P (ξ) ⇒ P (ξ + ).

For any limit ordinal λ, (∀ξ < λ)(P (ξ)) ⇒ P (λ).


Then (∀ξ ∈ Ord)(P (ξ)).
(ii) (Strong induction) Let P (ξ) be a predicate defined on Ord with the property:

For any ordinal ξ, (∀θ < ξ)(P (θ)) ⇒ P (ξ).


Then (∀ξ ∈ Ord)(P (ξ)).

Proof. Let X be the class { ξ ∈ Ord : P(ξ) } — over to you. 



B.3 Note
In (ordinary) inductive proofs it is very often the case that the zero case is treated separately
from the nonzero limit ordinals. In that case, the structure of the proof is

Let P (ξ) be a predicate defined on Ord with the properties:


P (0).
For any ordinal ξ, P (ξ) ⇒ P (ξ + ).
For any nonzero limit ordinal λ, (∀ξ < λ)(P (ξ)) ⇒ P (λ).

Then (∀ξ ∈ Ord)(P (ξ)).


As stated above, these inductive principles over Ord all have corresponding forms for induc-
tion over an ordinal number. They, and their proofs, are almost exactly the same, but it is
worth listing them here for reference.

B.4 Proposition: Inductive properties of an ordinal number


(i) (Ordinary induction) Let α be an ordinal number and X be a subset of α with the
properties:
For any ξ < α, ξ ∈ X ⇒ ξ + ∈ X

For any limit ordinal λ < α, I(λ) ⊆ X ⇒ λ ∈ X.


Then X = α.
(ii) (Strong induction) Let X be a subset of α with the property:

For any ξ < α, I(ξ) ⊆ X ⇒ ξ ∈ X.


Then X = α.

Proof. (ii) is just the statement that α is well-ordered (see 6.E.4).
(i) If X 6= α then there is a member of α, which is therefore an ordinal, which is not a
member of X. Consider the first such. If it is a non-limit ordinal, call it ξ + and it violates
the first condition; and if it is a limit ordinal, call it λ and it violates the second. 

B.5 Corollary: Inductive proofs on an ordinal number


(i) (Ordinary induction) Let α be an ordinal and P (ξ) be a predicate defined on α with
the properties:
For any ordinal ξ < α, P (ξ) ⇒ P (ξ + ).
For any limit ordinal λ < α, (∀ξ < λ)(P (ξ)) ⇒ P (λ).

Then (∀ξ < α)(P (ξ)).



(ii) (Strong induction) Let α be an ordinal and P (ξ) be a predicate defined on α with
the property:
For any ordinal ξ < α, (∀θ < ξ)(P (θ)) ⇒ P (ξ).
Then (∀ξ < α)(P (ξ)).

B.6 Definition by induction


In the light of the last remarks we would expect that definition by induction over an ordinal
number, or even over the class of all ordinals, is allowed; even more so if we compare Remark
A.10 with the remarks preceding Theorem 6.E.11.
Of course Ord is a well-ordered class, so the kind of inductive definition described in Theorem
6.E.11 is automatically valid.
Translating this into the notation appropriate to ordinal numbers, we have forms of inductive
definition which look more like the strong and weak kinds of inductive definition over N.
Suppose that we want to define a function f : Ord → B, where B is some class.
The strong form is straightforward: for each ordinal ξ, define f (ξ) in terms of ξ and f |I(ξ) .
In other words, in terms of ξ and the whole of that part of the function which has been
defined so far.

Ordinary or weak induction is very like ordinary induction over the natural numbers, with
just a slight addition to make it “get past” limit ordinals. To define a function f using this
kind of induction, separate definitions are given of

f(0) ,
f(ξ+) in terms of ξ and f(ξ) ,
and, for any nonzero limit ordinal λ, f(λ) in terms of λ and f|I(λ) .

Those were the cases where there are no parameters. Where there are parameters we are
defining a function f : A × Ord → B, where A and B are classes. The weak version looks
like this: separate definitions are given of

f(a, 0) ,
f(a, ξ+) in terms of a, ξ and f(a, ξ) ,
and, for any nonzero limit ordinal λ, f(a, λ) in terms of a, λ and f|A×I(λ) .

And the strong version looks like this:

For each ordinal ξ, define f(a, ξ) in terms of a, ξ and f|A×I(ξ) .

Where B is Ord, in the majority of cases (I would say), the definition in the limit ordinal
case is
f(θ, λ) = ⋃{ f(θ, ξ) : ξ < λ } , which is the same as f(θ, λ) = sup{ f(θ, ξ) : ξ < λ } ,
or something very like it.



The definitions of addition, multiplication and so on given in the next section are simple
examples of the application of this kind of inductive definition. It will be seen that these def-
initions are really straightforward extensions of the definitions for natural numbers already
considered, extended to the ordinals.
We have just considered three kinds of inductive definition over Ord. We already know that
the first kind is valid. It is a straightforward exercise to use this to prove that the other two
kinds are valid also.
We conclude this section with an interesting example of a definition by induction over
ordinals.

B.7 Definition: the Cumulative Hierarchy


Define a set Vα , for each ordinal α by induction:—
(i) V0 = ∅ .

(ii) For every ordinal α, Vα+1 = P (Vα ) (the power set of Vα ) .


(iii) For every nonzero limit ordinal λ, Vλ = ⋃{ Vξ : ξ < λ } .
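The first few levels are easily computed: V1 = P(∅) = {∅}, V2 = {∅, {∅}}, and
V3 = {∅, {∅}, {{∅}}, {∅, {∅}}}; in general Vn+1 has 2^|Vn| members. The level
Vω = ⋃{ Vn : n < ω } then collects exactly the sets that can be built from ∅ in finitely
many steps (these are often called the hereditarily finite sets).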

B.8 Theorem: Cumulative Hierarchy


(i) Vα is transitive, for every ordinal α.
(ii) If α < β, then Vα ⊂ Vβ .
(iii) For every set X, there is an ordinal α such that X ⊆ Vα (and then X ∈ Vα+1 ).

And note that (iii) here says that
Sets = ⋃{ Vα : α ∈ Ord } .

That is, every set belongs in this hierarchy somewhere.

Proof. (i) Proof is by induction over α. First notice that V0 = ∅ is transitive (vacu-
ously).
Now suppose that Vα is transitive and that x ∈ y ∈ Vα+1 . Then x ∈ y ⊆ Vα so x ∈ Vα .
But Vα is transitive (as just assumed) and so x ⊆ Vα . Then x ∈ Vα+1 .
Finally suppose that λ is a nonzero limit ordinal, that all the Vξ for ξ < λ are transitive
and that x ∈ y ∈ Vλ . Then there is some ξ < λ such that y ∈ Vξ . But then x ∈ y ∈ Vξ and
Vξ is transitive (as just assumed); so x ∈ Vξ and then x ∈ Vλ .
(ii) We show first, by induction over β, that if α ≤ β then Vα ⊆ Vβ. If α = β the result
is trivial. Now suppose that α ≤ β and Vα ⊆ Vβ; we prove that Vα ⊆ Vβ+1. Let x ∈ Vα.
Then x ∈ Vβ ∈ Vβ+1, which is transitive, so x ∈ Vβ+1. Finally, suppose that α < λ and λ
is a nonzero limit ordinal. Then α + 1 < λ also, and Vα+1 ⊆ Vλ by the definition of Vλ. But
Vα ⊆ Vα+1, so Vα ⊆ Vα+1 ⊆ Vλ.

To see that, if α < β then Vα ⊂ Vβ, it is now enough to show that Vα ⊂ Vα+1 for all α.
But this is easy: we know that Vα ⊆ Vα+1 and the sets cannot be equal because Vα ∈ Vα+1
but Vα ∉ Vα.
(iii) Let us say “X is in the hierarchy” to mean “there exists some ordinal α such that X ∈
Vα ”. We want to prove that every set is in the hierarchy. Using the Axiom of Foundation,
Version 1A, it is enough to prove that, for all sets X,

all of the members of X are in the hierarchy ⇒ X is in the hierarchy.

Suppose then that X is such a set, that is, for all x ∈ X there is an ordinal ξ such that
x ∈ Vξ. Then, for each such x, write ξ(x) for the least ordinal such that x ∈ Vξ(x). Now, by
the Axiom of Formation, the class { ξ(x) : x ∈ X } is in fact a set and so, by Proposition
A.15(viii), there is an ordinal α such that ξ(x) ≤ α for all x ∈ X. Then, for all x ∈ X,
x ∈ Vξ(x) ⊆ Vα; thus X ⊆ Vα and so X ∈ Vα+1. 

C Ordinal functions and continuity


In this section I collect some technical results which are needed from time to time in the
discussion of ordinal arithmetic which follows. I would recommend you skip over this section,
coming back to check results when they become relevant to the later discussion.

C.1 Some definitions


We consider functions f : Ord → Ord, in particular several properties of such functions
related to what is known as continuity. These are
(1) The function f is continuous if, for every nonzero limit ordinal λ and ordinal θ < f (λ)
there is an ordinal α < λ such that f maps (α, λ] into (θ, f (λ)] .
(2) The function f has the sup property if, for every nonzero limit ordinal λ,
sup{ f(ξ) : ξ < λ } = f(λ) .
(3) The function f has the minsup property if, for every nonzero limit ordinal λ,
min{ sup{ f(ξ) : α < ξ < λ } : α < λ } = f(λ) .
(4) The function f has the intermediate value or IV property if, for all ordinals α1, α2
and γ such that α1 < α2 and
f(α1) ≤ γ < f(α2) ,
there is an ordinal β such that
α1 ≤ β < α2 (–1)
and f(β) ≤ γ < f(β+) . (–2)

We show here that

continuity ⇒ the minsup property ⇒ the IV property .

In general, no other implications exist between these properties.


However, in the case of weakly order-preserving functions, all four properties are equivalent.

C.2 Proposition
Suppose that f has the minsup property. Then, for any nonzero limit ordinal λ, there is an
ordinal α0 < λ such that
sup{ f(ξ) : α < ξ < λ } = f(λ) for all α with α0 ≤ α < λ (–1)
and sup{ f(ξ) : α < ξ < λ } > f(λ) for all α < α0 . (–2)

Proof of (–1). From the minsup property, there is some ordinal α0 < λ such that
sup{ f(ξ) : α0 < ξ < λ } = f(λ) ; (–3)
suppose that α0 is the least such. Then, for α0 ≤ α < λ, we have (α, λ) ⊆ (α0, λ), so by (–3)
sup{ f(ξ) : α < ξ < λ } ≤ sup{ f(ξ) : α0 < ξ < λ } = f(λ) .
On the other hand, sup{ f(ξ) : α < ξ < λ } ≥ f(λ), since by the minsup property f(λ) is the
minimum of all such suprema.

Proof of (–2). Suppose that α < α0. Then (α, λ) ⊇ (α0, λ) so, as before,
sup{ f(ξ) : α < ξ < λ } ≥ f(λ). On the other hand, sup{ f(ξ) : α < ξ < λ } ≠ f(λ), by the
minimality of α0. 

C.3 Proposition
(i) Continuity ⇒ the minsup property.
(ii) The minsup property ⇒ the IV property.

Proof. (i) Let f be a continuous function.

Let λ be a nonzero limit ordinal and θ any ordinal such that θ < f(λ). Then there is β < λ
such that f maps (β, λ] into (θ, f(λ)].
Given any α such that β ≤ α < λ, we have (α, λ] ⊆ (β, λ], so f maps (α, λ] into (θ, f(λ)],
that is,
α < ξ ≤ λ ⇒ θ < f(ξ) ≤ f(λ) .
On the other hand, given any ν < f(λ), there is µ < λ such that f maps (µ, λ] into (ν, f(λ)].
But, since λ is a limit ordinal and µ, α < λ, we have µ+ < λ and α+ < λ; so, writing
ξ = max{µ+, α+}, we have ξ ∈ (α, λ) and ξ ∈ (µ, λ], so f(ξ) > ν. This shows that
f(λ) = sup{ f(ξ) : α < ξ < λ } for each α such that β ≤ α < λ .
But if α < β then the index set is enlarged, so (trivially)
f(λ) ≤ sup{ f(ξ) : α < ξ < λ } .
This shows that
f(λ) = min{ sup{ f(ξ) : α < ξ < λ } : α < λ } . 

(ii) Suppose that f has the minsup property and that α1 , α2 and γ are ordinals such
that α1 < α2 and
f (α1 ) ≤ γ < f (α2 ) ;
we want to show that there is an ordinal β such that (–1) and (–2) above hold.

Let µ be the least ordinal such that µ > α1 and f (µ) > γ. There is at least one such ordinal,
namely α2 , so µ exists and µ ≤ α2 .

Suppose that µ is a limit ordinal. Since µ > α1, it is a nonzero limit ordinal, so, by the
previous proposition, there is an ordinal α0 < µ such that
sup{ f(ξ) : α < ξ < µ } = f(µ) for all α such that α0 ≤ α < µ .
Setting α′ = max{α0, α1}, we have α1 ≤ α′ < µ, so
sup{ f(ξ) : α′ < ξ < µ } = f(µ) ;
and γ < f(µ), so there is an ordinal ξ such that α′ < ξ < µ and f(ξ) > γ. Since ξ > α1 and
ξ < µ, this contradicts the choice of µ.

So µ is not a limit ordinal: there is β such that µ = β + . Then β ≥ α1 since µ > α1 and
β < µ ≤ α2 .
Also, by choice of µ, either β ≤ α1 or f (β) ≤ γ. But if β ≤ α1 then, since β + > α1 ,
we have β = α1 and so f (β) = f (α1 ) ≤ γ, so f (β) ≤ γ in either case. But we also have
γ < f (µ) = f (β + ), and we are done.

C.4 Proposition
(i) The IV property ⇏ the minsup property.
(ii) The IV property ⇏ continuity.
(iii) Continuity ⇏ the sup property.
(iv) The minsup property ⇏ continuity.
(v) The minsup property ⇏ the sup property.
(vi) The IV property ⇏ the sup property.
(vii) The sup property ⇏ the IV property.
(viii) The sup property ⇏ continuity.
(ix) The sup property ⇏ the minsup property.
Finding counterexample functions to show these is left as an exercise, but only if you are
interested.

C.5 Proposition
Let f be a weakly order-preserving function. Then all four properties above are equivalent
for f .

Proof. We already know from Proposition C.3 that
continuity ⇒ the minsup property ⇒ the IV property



so now it is enough to prove that

the IV property ⇒ the sup property ⇒ continuity

under the added condition that f is weakly increasing.


(1) the IV property + weakly increasing ⇒ the sup property .
Suppose that f has the IV property and is weakly increasing. Let λ be a nonzero limit
ordinal. Then, for all ξ < λ, f(ξ) ≤ f(λ), so sup{ f(ξ) : ξ < λ } ≤ f(λ).
Now suppose that θ < f(λ): we want to show that sup{ f(ξ) : ξ < λ } > θ. If f(0) > θ then
of course sup{ f(ξ) : ξ < λ } ≥ f(0) > θ and we are done. Otherwise f(0) ≤ θ < f(λ) and,
by the IV property, there is β such that β < λ and f(β) ≤ θ < f(β+). But, since λ is a
limit ordinal, β+ < λ, so (from f(β+) > θ) we have sup{ f(ξ) : ξ < λ } > θ.

(2) The sup property + weakly increasing ⇒ continuity.

Suppose that f has the sup property and is weakly increasing. Let λ be a nonzero limit
ordinal and θ < f(λ). We want to show that there is α < λ such that f maps (α, λ] into
(θ, f(λ)].
Since sup{ f(ξ) : ξ < λ } = f(λ) and θ < f(λ), there is α < λ such that θ < f(α) ≤ f(λ).
But then
ξ ∈ (α, λ] ⇒ α < ξ ≤ λ ⇒ f(α) ≤ f(ξ) ≤ f(λ) ⇒ f(ξ) ∈ (θ, f(λ)] ,
since θ < f(α). 

C.6 A useful proposition


Suppose that f is a function Ord → Ord which is continuous or has the minsup property
or has the sup property. Then
(i) if also
f (ξ) ≤ f (ξ + ) for all ordinals ξ
then f is weakly increasing,
(ii) if also
f (ξ) < f (ξ + ) for all ordinals ξ
then f is strictly increasing.

Proof. The proof of (i) is given; the proof of (ii) is the same.
First let us note that, if f has the minsup property then, by Proposition C.2, for any nonzero
limit ordinal λ there exists an ordinal α < λ such that
sup{ f(ξ) : α < ξ < λ } = f(λ) . (–1)

But the same is obviously true if f has the sup property (put α = 0) and continuity implies
the minsup condition, so (–1) holds in any case.

Suppose then that f is not weakly increasing. Then there are ordinals α < β such that
f(α) > f(β); let β be the least ordinal for which such an α exists. Then β must be a limit
ordinal: otherwise there would be ξ with β = ξ+, and then f(ξ) ≤ f(β) < f(α); since α < β
we have α ≤ ξ, and α = ξ is impossible (it would give f(α) ≤ f(β)), so α < ξ and ξ is a
smaller ordinal with the same defect, contradicting the choice of β.
So β is a nonzero limit ordinal (since β > α) and so, as noted above, there is θ < β such
that
sup{ f(ξ) : θ < ξ < β } = f(β) . (–2)
We may assume without loss of generality that θ ≥ α. Then α < θ+ < β and, by the
minimality of β, f(θ+) ≥ f(α) > f(β), contradicting (–2). 

C.7 Note
The proposition above does not hold if the IV property is used instead of the others. To see
this, consider the function f defined by
f(ξ) = ξ if ξ < ω, and f(ξ) = 0 otherwise.

C.8 Lemma
If a function f : Ord → Ord is weakly order-preserving and continuous, then, for any
nonempty set X of ordinals

sup{ f (ξ) : ξ ∈ X } = f (sup X) . (–1)

Proof. Set µ = sup X.


If µ is 0 or a successor ordinal, then it is the maximum value of X and then

max{ f (ξ) : ξ ∈ X } = f (max X)

so (–1) follows.
Now suppose that µ is a nonzero limit ordinal.
For any ξ ∈ X, ξ ≤ µ so, since f is order-preserving, f (ξ) ≤ f (µ); it follows that

sup{ f (ξ) : ξ ∈ X } ≤ f (µ) = f (sup X) .

To see that equality holds, we suppose that sup{ f (ξ) : ξ ∈ X } < f (µ) and deduce a
contradiction. Put

θ = sup{ f (ξ) : ξ ∈ X } so that θ < f (sup X) . (–2)

By continuity, there is an α < sup X such that f maps (α, sup X] into (θ, f (sup X)], that is

θ < f (ξ) ≤ f (sup X) for all ξ such that α < ξ ≤ sup X . (–3)

But, since α < sup X, there is some ξ ∈ X such that α < ξ. Then ξ ∈ (α, sup X], so
f(ξ) > θ by (–3); but also θ ≥ f(ξ) by (–2). This is the required contradiction. 

D Arithmetic of ordinal numbers


D.1 Definition: Addition of ordinals
Let α and β be two ordinals. We define the ordinal α + β by induction over β:

α + 0 = α.
α + β + = (α + β)+

and, for any nonzero limit ordinal λ,

α + λ = sup{α + ξ : ξ < λ}

Notice that the first two lines of this definition are the same as the definition of addition on
the natural numbers. Consequently our new definition extends the old one.
The third line allows the definition to extend to all ordinal numbers, and does so in the
usual way.

We can now check some of the notational results I used in the introductory remarks to this
chapter. For example

ω + 1 = ω + 0+ = (ω + 0)+ = ω + ,
ω + 2 = ω + 1+ = (ω + 1)+ = ω ++ and so on.

Also,
ω + ω = sup{ω + ξ : ξ < ω}
= sup{ω + 0, ω + 1, ω + 2, . . .}
= ⋃{ω, ω+, ω++, . . .}
= {0, 1, 2, . . . , ω, ω+, ω++, . . .}
= (what I then called) ω2
and
ω3 = ω2 + ω
and so on.
By an easy induction,
0+α = α for every ordinal α .

Warning! All this is as expected so far, but note well that addition is not commutative.
For example 1 + ω = ω. To see this,

1 + ω = sup{1 + ξ : ξ < ω} = sup{1, 2, 3, . . .} = ω ≠ ω + 1 .

This example also shows that addition is not cancellative on the right (1 + ω = 0 + ω but
1 ≠ 0). There are a number of other standard algebraic properties that we might expect of
addition but that fail for ordinal addition, so one should tread with care when doing ordinal
arithmetic.

D.2 Proposition: Order properties of addition


(i) Addition is strict order-preserving in its second argument: ξ < η ⇒ α+ξ < α+η.
It is weak order-preserving in its first argument: ξ≤η ⇒ ξ + α ≤ η + α.

(It is not strict order-preserving in its first argument: there are examples of ordinals ξ, η
and α such that ξ < η but ξ + α = η + α.)
(ii) Addition is normal in its second argument, and so . . .
(ii′) For any ordinal α and any nonempty set X of ordinals,

sup{ α + ξ : ξ ∈ X } = α + sup X .

(iii) If α and β are ordinals and α ≤ β, then there is a unique ordinal γ such that
α + γ = β.
(This rule does not hold on the other side: there may not be any γ such that γ + α = β; for
example, there is no γ such that γ + 1 = ω.)

Proof. (i) That addition is strictly order-preserving in its second argument follows from
Proposition C.6.
To see that it is weakly order-preserving in its first argument, suppose that α, ξ and η are
ordinals and that ξ ≤ η; we want to show that ξ + α ≤ η + α. This we do by induction over
α.

If α = 0 the result is trivial. If α is a non-limit ordinal, α = θ+ say, then

ξ + α = ξ + θ+ = (ξ + θ)+ ≤ (η + θ)+ = η + θ+ = η + α .

Now suppose that α is a nonzero limit ordinal. Then, for any θ < α we have

ξ + θ ≤ η + θ ≤ sup{η + ζ : ζ < α} = η + α .

Therefore ξ + α = sup{ξ + ζ : ζ < α} ≤ η + α.


(ii) By its definition.
(ii′) By A.21.

(iii) Addition is weakly order-preserving in its first argument, so we have β < β + =


0 + β + ≤ α + β + , so α + 0 ≤ β < α + β + . Applying the IV property to the second argument,
there is γ such that 0 ≤ γ < β + and α + γ ≤ β < α + γ + . But α + γ + = (α + γ)+ and so
β = α + γ.
Then γ is unique by (i). 

D.3 Proposition: Algebraic properties of addition


(i) Addition is associative: for any ordinals α, β and γ, (α + β) + γ = α + (β + γ).

(ii) Zero is a true zero: for any ordinal α, 0 + α = α + 0 = α.



(iii) Addition is cancellative on the left: α + ξ = α + η ⇒ ξ = η
(but not on the right: there are examples of ordinals ξ, η and α such that ξ + α = η + α
but ξ ≠ η).

Proof. (i) We prove this by induction over γ. In the case γ = 0 we have


(α + β) + 0 = α + β = α + (β + 0) .
If γ is a non-limit ordinal, γ = θ+ , say, we have
(α + β) + θ+ = ((α + β) + θ)+ = (α + (β + θ))+ = α + (β + θ)+ = α + (β + θ+ ) .
If γ is a nonzero limit ordinal, we have
(α + β) + γ = sup{(α + β) + ξ : ξ < γ} by definition of addition
= sup{α + (β + ξ) : ξ < γ} by inductive hypothesis
= α + sup{β + ξ : ξ < γ} by D.2(ii′), proved above
= α + (β + γ) by the definition of addition again.


(ii) α + 0 = α by definition. We prove that 0 + α = α by induction over α. In


the case α = 0 we have 0 + 0 = 0. If α is a non-limit ordinal, α = θ+ , say, we have
0 + α = 0 + θ+ = (0 + θ)+ = θ+ = α. Finally, if α is a nonzero limit ordinal, 0 + α =
sup{0 + ξ : ξ < α} = sup{ξ : ξ < α} = α.
(iii) This is because addition is strictly order-preserving in its second argument (Part (i)
of the preceding proposition).
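As a small worked example combining these facts: since 3 + ω = sup{3 + n : n < ω} = ω
(the same computation as for 1 + ω above), associativity gives
(ω + 3) + (ω + 5) = ω + (3 + ω) + 5 = ω + ω + 5 = ω2 + 5 .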

D.4 Definition: Multiplication of ordinals


We define αβ by induction over β:

α0 = 0 ,
αβ+ = αβ + α
and, for any nonzero limit ordinal λ,
αλ = sup{ αξ : ξ < λ } .

Once again we observe that this definition extends the usual definition of multiplication
from the natural numbers to all the ordinals. Also that the third line extends the definition
for nonzero limit ordinals in the usual way.
We see that
α1 = α0+ = α0 + α = 0 + α = α,
α2 = α1+ = α1 + α = α + α,
α3 = α2+ = α2 + α = α + α + α,

and so on, as you would expect. But once again there are a few surprises, for instance,

2ω = sup{2ξ : ξ < ω} = sup{2, 4, 6, . . .} = ω

so 2ω 6= ω2 and multiplication is not commutative. This accounts for the slightly unusual
notation ω2, ω3 etc. used earlier in this chapter.

D.5 Proposition: Order properties of multiplication


(i) Multiplication is strict order-preserving on the right: for any ordinal numbers ξ, η
and β,
if β ≠ 0 and ξ < η then βξ < βη ,
and weak order-preserving on the left: for any ordinal numbers α (zero or not), ξ and η,
if ξ ≤ η then ξα ≤ ηα .
(ii) For any ordinal α ≠ 0 the function ξ ↦ αξ is normal. (Think: multiplication is
normal most of the time.)
(ii′) For any ordinal α and any nonempty set X of ordinals, sup{ αξ : ξ ∈ X } = α sup X.

Proof. (i) That multiplication is strictly order-preserving in its second argument follows
from Proposition C.6.
Now we show that multiplication is weak order-preserving on the left, ξ ≤ η ⇒ ξβ ≤ ηβ,
by induction over β.

If β = 0, both sides are zero and the result is trivial.


If β is a non-limit ordinal, β = θ+ , say, we have

ξβ = ξθ+ = ξθ + ξ ≤ ηθ + η = ηθ+ = ηβ

(we are using here the fact that addition is order-preserving on both sides, proved above).
Suppose β is a nonzero limit ordinal. For any θ < β we have ξθ ≤ ηθ ≤ sup{ηθ : θ < β} = ηβ.
As this is true for any θ < β, we have ξβ = sup{ξθ : θ < β} ≤ ηβ.
(ii) By its definition.
(ii′) When α ≠ 0 this is given by A.21; when α = 0 it is trivial. 

D.6 Proposition: Algebraic properties of multiplication


(i) Multiplication is associative: for any ordinal numbers α, β and γ, (αβ)γ = α(βγ).
(ii) Zero behaves the way you would expect: for any ordinal number α, α0 = 0α = 0.
(iii) One behaves the way you would expect: for any ordinal number α, α1 = 1α = α.

(iv) Multiplication is not commutative: there are ordinals α and β such that αβ 6= βα.

(v) Multiplication distributes on the left but not on the right: for any ordinal numbers
α, β and γ,
α(β + γ) = αβ + αγ

however there are ordinal numbers α, β and γ such that

(α + β)γ 6= αγ + βγ .

(vi) Multiplication is cancellative on the left but not on the right: for any ordinal numbers
α, ξ and η,
α ≠ 0 and αξ = αη ⇒ ξ = η ;
however there are ordinal numbers α, ξ and η such that
α ≠ 0 and ξα = ηα but ξ ≠ η .

(vii) The division algorithm for ordinals. Let α and β be ordinals with α 6= 0. Then there
are unique ordinals δ and ρ such that

β = αδ + ρ and ρ < α.

I think that it is quite surprising that the well-known division algorithm for natural numbers
holds without any change at all for ordinals.

Proof. (ii) We need only prove that 0β = 0, since β0 = 0 by definition. Proof is
by induction over β and is very easy, as you would expect. In the case β = 0 we have
0β = 0·0 = 0. Now suppose that 0θ = 0 and β = θ+. Then 0β = 0θ+ = 0θ + 0 = 0 + 0 = 0.
And if β is a nonzero limit ordinal, 0β = sup{0ξ : ξ < β} = sup{0} = 0.
(iii) To see that α1 = α: α1 = α0+ = α0 + α = 0 + α = α.
We prove that 1β = β as usual by induction over β. In the case β = 0 this is just 1.0 = 0,
which we know already. If β = θ+ then 1β = 1θ+ = 1θ + 1 = θ + 1 = θ+ = β. And if β is a
limit ordinal, then 1β = sup{1ξ : ξ < β} = sup{ξ : ξ < β} = β.

(iv) We have seen above that 2ω = ω ≠ ω2.


(v) We prove that α(β + γ) = αβ + αγ by induction over γ. This follows the by now
well-trodden path.
In the case γ = 0 we have α(β + 0) = αβ = αβ + 0 = αβ + α0.

If the result holds for γ then, for γ+, we have
α(β + γ+) = α((β + γ)+) = α(β + γ) + α = αβ + αγ + α = αβ + αγ+ .



And if γ is a nonzero limit ordinal,
α(β + γ) = α sup{β + ξ : ξ < γ} by definition of addition
= sup{α(β + ξ) : ξ < γ} since multiplication is normal in its second argument
= sup{αβ + αξ : ξ < γ} by inductive hypothesis
= αβ + sup{αξ : ξ < γ} since addition is normal in its second argument
= αβ + αγ by definition of multiplication.

To see that multiplication does not distribute on the other side, observe that (1 + 1)ω =
2ω = ω but 1ω + 1ω = ω + ω = ω2 and we already know that these are not equal.
(vi) We prove this result in the form: α ≠ 0 ∧ ξ ≠ η ⇒ αξ ≠ αη.
If ξ ≠ η then either ξ < η or ξ > η, in which case αξ < αη or αξ > αη, since multiplication
is strict order-preserving on that side.
strict order-preserving on that side.
To see that multiplication is not cancellative on the other side, observe that 2ω = 1ω.
(i) We prove that (αβ)γ = α(βγ) by induction over γ.
In the case γ = 0 we have (αβ)0 = 0 = α0 = α(β0) .

Now suppose that γ is a non-limit ordinal, γ = θ+, say. Then
(αβ)γ = (αβ)θ+
= (αβ)θ + αβ by definition of multiplication
= α(βθ) + αβ by inductive hypothesis
= α(βθ + β) by distribution, proved above
= α(βθ+) by definition of multiplication
= α(βγ) .

Finally suppose that γ is a nonzero limit ordinal. Then, using normality and the inductive
hypothesis, (αβ)γ = sup{ (αβ)ξ : ξ < γ } = sup{ α(βξ) : ξ < γ } = α sup{ βξ : ξ < γ } = α(βγ).

(vii) We have α ≥ 1, so β < β + 1 ≤ α(β + 1); hence there is an ordinal θ such that
αθ > β. Let θ in fact be the least such ordinal.
Now θ cannot be a limit ordinal because then we would have β < αθ = sup{ αξ : ξ < θ }
and (by the definition of a sup) there would be ξ < θ such that αξ > β, contradicting the
choice of θ. Therefore θ is a successor: set θ = δ + .

Since δ < θ, we have αδ ≤ β, so there is ρ such that αδ + ρ = β. Then, if ρ were ≥ α, there
would be σ such that ρ = α + σ, and then we would have β = αδ + α + σ = αδ+ + σ =
αθ + σ ≥ αθ > β, contradicting the choice of θ. Therefore ρ < α.
Uniqueness is easy: if αδ + ρ = αδ′ + ρ′ with ρ, ρ′ < α, then δ = δ′ and ρ = ρ′, by the order
properties and cancellation of addition and multiplication. 
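For instance, dividing β = ωω + ω3 + 5 by α = ω: take δ = ω + 3 and ρ = 5. Left
distributivity gives ω(ω + 3) = ωω + ω3, so β = αδ + ρ with ρ < α, and by the proposition
this is the only such pair δ, ρ.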

D.7 Definition: Exponentiation of ordinal numbers


Let α and β be ordinal numbers. We define α^β by induction over β:
α^0 = 1 ,
α^(β+) = α^β α
and, for any nonzero limit ordinal λ,
α^λ = sup{ α^ξ : 0 < ξ < λ } .
Once again we observe that this definition extends the usual definition of exponentiation
from the natural numbers to all the ordinals. Also, the third line extends the definition for
nonzero limit ordinals in the usual way.
We see that
α^1 = α^(0+) = α^0 α = 1α = α,
α^2 = α^(1+) = α^1 α = αα,
α^3 = α^(2+) = α^2 α = ααα,
and so on, as you would expect. But once again there are a few surprises, for instance,
2^ω = sup{ 2^ξ : 0 < ξ < ω } = sup{2, 4, 8, . . .} = ω .

D.8 Proposition: Order properties of exponentiation


(i) A couple of special cases: when α = 0 we have
0^β = 1 if β = 0, and 0^β = 0 otherwise;
and when α = 1 we have 1^β = 1 for all β.
(ii) Exponentiation is (mostly) strict order-preserving in its second argument:
if α ≥ 2 and ξ < η, then α^ξ < α^η .
(iii) Exponentiation is (mostly) weakly order-preserving in both arguments:
if β ≤ γ then α^β ≤ α^γ (except in the case that α = 0, β = 0 and γ ≠ 0),
and
if α ≤ β then α^γ ≤ β^γ (always).
(iv) Exponentiation is (mostly) normal in its second argument:
for any ordinal α ≥ 2, the function ξ ↦ α^ξ is normal.
(iv′) For any ordinal α and nonempty set X of ordinals, and provided that it is not the
case that α = 0 and X contains both 0 and some nonzero member,
α^(sup X) = sup{ α^ξ : ξ ∈ X } .

Proof. (i) This follows directly from the definition.
(ii) That follows from Proposition C.6.
(iii) That exponentiation is weakly order-preserving in its second argument follows from
(i) and (ii) above.
To see that it is weakly order-preserving in its first argument, we assume that α ≤ β and
show that α^γ ≤ β^γ, as usual by induction over γ. Firstly, α^0 = 1 = β^0. Next, if
α^θ ≤ β^θ, then α^(θ+) = α^θ α ≤ β^θ β = β^(θ+). And last, if γ is a nonzero limit ordinal,
α^γ = sup{ α^ξ : 0 < ξ < γ } ≤ sup{ β^ξ : 0 < ξ < γ } = β^γ .
(iv) By its definition.
(iv′) This follows from Proposition A.21, with a little fiddle to get around the single value
(0^0) at which the exponential function fails to be order-preserving. The easiest fiddle is to
define a function
f(α, β) = α^β if α ≠ 0, and f(α, β) = α^(β+1) if α = 0,
and apply the proposition to that. 

Most of the arithmetic properties are left as an amusing exercise. We will state and prove
just one, which is interesting enough to rate a theorem all to itself.

D.9 Proposition: Base-α notation


Let α and β be ordinals with α ≥ 2. Then there is a natural number n and ordinals
ε1, ε2, . . . , εn and γ1, γ2, . . . , γn such that ε1 < ε2 < · · · < εn, for each i, 0 < γi < α, and
β = α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 .
(When β = 0, interpret this as meaning that n = 0, so that 0 is represented as an empty
sum.)
Moreover, this representation is unique: given α and β, there is only one choice of n,
ε1, ε2, . . . , εn and γ1, γ2, . . . , γn for which this equation holds.

Proof. Because of the special case of the statement when β = 0, we may assume now that
β ≠ 0. For the purposes of this proof, let us say that an expression α^(εn) γn +
α^(εn−1) γn−1 + · · · + α^(ε1) γ1 is in “standard form” if n is a natural number, ε1, ε2, . . . , εn
are ordinals such that ε1 < ε2 < · · · < εn, and γ1, γ2, . . . , γn are ordinals such that
0 < γi < α for each i = 1, 2, . . . , n. Our aim is to prove that β can be represented uniquely
by such a standard form. This is proved by induction over β.
Since α ≥ 2, by D.8(ii) and (iv) above, ξ ↦ α^ξ is strictly order-preserving and normal. Using
the Intermediate Value Theorem (A.20), since α^0 ≤ β there exists an ordinal e such that
α^e ≤ β < α^(e+).
Now we use the division algorithm: there are ordinals g and ρ such that
β = α^e g + ρ and ρ < α^e .
(Note that 0 < g < α: g ≥ 1 because β ≥ α^e, and if g ≥ α we would have
α^e g ≥ α^e α = α^(e+) > β.)
Now if α > β, we have e = 0, g = β and ρ = 0, so we are finished: β = α^e g, and we set
n = 1, ε1 = e and γ1 = g. Otherwise we have α ≤ β, so that e ≥ 1 and g ≥ 1. Then
ρ < α^e ≤ β and we can apply the inductive hypothesis: ρ can be expressed in standard form
ρ = α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 .
Then, setting εn+1 = e and γn+1 = g, we have
β = α^(εn+1) γn+1 + α^(εn) γn + · · · + α^(ε1) γ1
and, in order to show that this is also in standard form, it only remains to show that
εn+1 > εn, that is, that e > εn. But, if εn ≥ e, we would have ρ ≥ α^(εn) ≥ α^e, whereas we
already have ρ < α^e above.
This establishes the existence of the representation for β.
Before proving uniqueness, we show that
α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 < α^(εn) γn+ ≤ α^(εn+)
by induction over n. If n = 1, then α^(ε1) γ1 < α^(ε1) γ1+ ≤ α^(ε1) α = α^(ε1+), since
γ1 < α.
Otherwise n ≥ 2 and, applying the inductive hypothesis to the tail and using εn−1+ ≤ εn,
α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 < α^(εn) γn + α^(εn−1+) ≤ α^(εn) γn + α^(εn)
= α^(εn) γn+ ≤ α^(εn) α = α^(εn+) .
To prove uniqueness, suppose that β has two standard representations
β = α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1
= α^(ε′m) γ′m + α^(ε′m−1) γ′m−1 + · · · + α^(ε′1) γ′1 .
Suppose that εn ≠ ε′m. Then, without loss of generality, εn < ε′m, and we have
α^(εn) γn + · · · + α^(ε1) γ1 < α^(εn+) ≤ α^(ε′m) ≤ α^(ε′m) γ′m + · · · + α^(ε′1) γ′1 ,
contradicting the assumption that both sides equal β. Therefore εn = ε′m. Suppose now that
γn ≠ γ′m. Then, without loss of generality, γn < γ′m, and
α^(εn) γn + · · · + α^(ε1) γ1 < α^(εn) γn+ ≤ α^(ε′m) γ′m ≤ α^(ε′m) γ′m + · · · + α^(ε′1) γ′1 ,
again contradicting the assumption. We now know that εn = ε′m and γn = γ′m, so the leading
terms of the two representations are the same. But then, so are their tails:
α^(εn−1) γn−1 + α^(εn−2) γn−2 + · · · + α^(ε1) γ1 = α^(ε′m−1) γ′m−1 + α^(ε′m−2) γ′m−2 + · · · + α^(ε′1) γ′1
(by left cancellation of addition) and then, by induction, these tails are identical expressions
also. 
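These base-α standard forms (for α = ω, usually called Cantor normal forms) make ordinal
arithmetic quite concrete, at least for ordinals whose towers of exponents bottom out at 0
(those below the first fixed point of ξ ↦ ω^ξ). As an illustration only — this is not part of
the formal development, and the names are my own — here is a small Python sketch. An
ordinal is represented as a tuple of (exponent, coefficient) pairs with strictly decreasing
exponents, the exponents being ordinals in the same representation, and the addition rule
absorbs exactly the terms that the inequality proved above says must be absorbed.

    # Ordinals in base-omega standard form: a tuple of (exponent,
    # coefficient) pairs with strictly decreasing exponents and positive
    # integer coefficients; the empty tuple () represents 0.
    ZERO = ()
    ONE = ((ZERO, 1),)       # omega^0 . 1
    OMEGA = ((ONE, 1),)      # omega^1 . 1

    def cmp_ord(a, b):
        """Compare two standard forms; return -1, 0 or 1."""
        for (e1, c1), (e2, c2) in zip(a, b):
            s = cmp_ord(e1, e2)          # leading exponents decide first
            if s != 0:
                return s
            if c1 != c2:                 # then leading coefficients
                return -1 if c1 < c2 else 1
        # One form is an initial run of the other; the longer one is larger.
        return (len(a) > len(b)) - (len(a) < len(b))

    def add(a, b):
        """The sum a + b in standard form."""
        if not b:
            return a
        e = b[0][0]
        # A whole standard form with leading exponent below e is < omega^e,
        # so those terms of a are absorbed by the leading term of b.
        keep = [t for t in a if cmp_ord(t[0], e) >= 0]
        if keep and cmp_ord(keep[-1][0], e) == 0:
            return tuple(keep[:-1]) + ((e, keep[-1][1] + b[0][1]),) + b[1:]
        return tuple(keep) + b

    assert add(ONE, OMEGA) == OMEGA                    # 1 + omega = omega
    assert add(OMEGA, ONE) == ((ONE, 1), (ZERO, 1))   # omega + 1, not omega

The comparison function is just the uniqueness argument above read as an algorithm: two
standard forms are compared by their leading terms first.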

D.10 Discussion: Adding well-ordered sets


Here we start looking at an application of ordinal arithmetic — that addition and multipli-
cation correspond to natural ways of combining well-ordered sets.
Suppose that we have two ordered sets, A and B, both of order-type ω (which, as we have
seen, is just a complicated way of saying that they are ordered like the natural numbers).

A = {a0 , a1 , a2 , . . .} with a0 < a1 < a2 < . . .


B = {b0 , b1 , b2 , . . .} with b0 < b1 < b2 < . . .

Let us suppose that A and B are disjoint, to keep things simple. We can visualise the
order-isomorphisms ω → A and ω → B thus:

ω: 0 1 2 ... ω: 0 1 2 ...
↓ ↓ ↓ ↓ ↓ ↓
A: a0 a1 a2 ... B: b0 b1 b2 ...

Now suppose we combine A and B to form A ∪ B, ordering the union in what I suppose is
the obvious way: keep the orders of A and B as is and make every member of A less than
every member of B:

A ∪ B = {a0 , a1 , a2 , . . . , b0 , b1 , b2 , . . .} with a0 < a1 < a2 < · · · < b0 < b1 < b2 < . . .

Now we have an order-isomorphism with ω + ω (that is ω2):

ω+ω: 0 1 2 ... ω ω+1 ω+2 ...


↓ ↓ ↓ ↓ ↓ ↓
A∪B: a0 a1 a2 ... b0 b1 b2 ...

This works for any well-ordered sets A and B. The important things to notice here are
(1) A and B must be well-ordered in the first place so that each of them has an order-
type, that is, each is order-isomorphic to some ordinal.
(2) Given two well-ordered sets A and B, ordering their union as we have done above
always results in another well-ordered set.
(3) Any well-ordered set is order-isomorphic to one and only one ordinal, so there is no
choice about the order type of the union.
Accepting this (and we will prove all this below), this construction using unions could have
been used as the definition of ordinal addition. However I think it is clear that this way

of proceeding would have been much messier than the neat inductive definition we have
actually used.
Before moving on to general definitions and proofs, let us look at a couple more examples.
First, a nice small finite one. Suppose now that we have two ordered sets, A = {a0 , a1 } of
order-type 2 and B = {b0 , b1 , b2 } of order-type 3. As before, let us suppose that A and B
are disjoint, to keep things simple. We can visualise the order-isomorphisms 2 → A and
3 → B thus:

2: 0 1 3: 0 1 2
↓ ↓ ↓ ↓ ↓
A: a0 a1 B: b0 b1 b2

Combining A and B to form A ∪ B, ordered as before, we have

2 + 3 = 5: 0 1 2 3 4
↓ ↓ ↓ ↓ ↓
A∪B: a0 a1 b0 b1 b2

Now suppose that we have two disjoint ordered sets, A = {a0 } of order-type 1 and B =
{b0 , b1 , b2 , . . .} of order-type ω:

1: 0 ω: 0 1 2 ...
↓ ↓ ↓ ↓
A: a0 B: b0 b1 b2 ...

Combining A and B to form A ∪ B, ordered as before, we have

1+ω = ω: 0 1 2 3 ...
↓ ↓ ↓ ↓
A∪B: a0 b0 b1 b2 ...

Whereas combining A and B the other way round to form B ∪ A, we have

ω + 1 (≠ ω) : 0 1 2 3 ... ω
↓ ↓ ↓ ↓ ↓
B ∪ A: b0 b1 b2 b3 ... a0

Important. It seems that all the argument above will break down if A and B are not
disjoint: we cannot very well “put them side-by-side” the way we have been doing if they
have members in common. We wriggle out of this problem by working with identical copies
of A and B rather than the originals, and making these identical copies in such a way that
they are guaranteed to be disjoint. To do this is not hard — we just “tag” the members of
A with 0 and the members of B with 1, so, instead of working with A and B directly, we
work with
Ā = { ⟨a, 0⟩ : a ∈ A } and B̄ = { ⟨b, 1⟩ : b ∈ B } .

Going back to the first example, in which A and B are both of order-type ω, we first create
Ā and B̄, ordering them in the obvious way:

ω: 0 1 2 ... ω: 0 1 2 ...
↓ ↓ ↓ ↓ ↓ ↓
A: a0 a1 a2 ... B: b0 b1 b2 ...
↓ ↓ ↓ ↓ ↓ ↓
Ā : ⟨a0 , 0⟩ ⟨a1 , 0⟩ ⟨a2 , 0⟩ ... B̄ : ⟨b0 , 1⟩ ⟨b1 , 1⟩ ⟨b2 , 1⟩ ...

then put them together

ω+ω: 0 1 2 ... ω ω+1 ω+2 ...


↓ ↓ ↓ ↓ ↓ ↓
Ā ∪ B̄ : ⟨a0 , 0⟩ ⟨a1 , 0⟩ ⟨a2 , 0⟩ ... ⟨b0 , 1⟩ ⟨b1 , 1⟩ ⟨b2 , 1⟩ ...

At this stage it would be a very good idea to revisit the order and especially the algebraic
properties of addition (which we proved using the inductive definition in D.2 and D.3)
and prove them again, at least roughly, using the ordered-union way of looking at things
just discussed.

D.11 Definition: Sum of ordered sets


In what follows we want to manufacture the disjoint union of two (or more) sets and, if
those sets were ordered, to define the “natural” order on the disjoint union. The disjoint
union is often called the sum of the sets; we will see that for our present purposes this is a
particularly friendly terminology, so I will use it henceforth here. It is simplest if we make
the construction in the two obvious stages.
The sum (= disjoint union) of two sets

As described above, to create the disjoint union of sets A and B, we first make “tagged”
versions:

Ā = { ⟨a, 0⟩ : a ∈ A }
B̄ = { ⟨b, 1⟩ : b ∈ B } .

Now Ā and B̄ are disjoint copies of A and B. Then define the sum of A and B (which we
will denote by A + B) to be the ordinary union of Ā and B̄:

A + B = { ⟨a, 0⟩ : a ∈ A } ∪ { ⟨b, 1⟩ : b ∈ B } .

The maps A → Ā and B → B̄ given by a ↦ ⟨a, 0⟩ and b ↦ ⟨b, 1⟩ obviously map A and B
one-to-one onto Ā and B̄ respectively.

The ordered sum of two ordered sets


This construction becomes more interesting to us when the sets A and B are ordered. We
order Ā and B̄ in the same way as A and B and then extend this to the whole disjoint union

by specifying that every member of Ā is < every member of B̄. To write this out properly,
note that every member of A + B is of the form
⟨x, i⟩ where (i = 0 and x ∈ A) or (i = 1 and x ∈ B).
Let ⟨x, i⟩ and ⟨y, j⟩ be two members of A + B. Then
⟨x, i⟩ ≤ ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j = 0 and x ≤ y in A, or
(c) i = j = 1 and x ≤ y in B.

Another, slightly more convenient, way of writing this is
⟨x, i⟩ ≤ ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j and x ≤ y
(where of course x ≤ y in the second line means that it holds in either A or B, as appropriate).
Note that the corresponding strict order is given by:
⟨x, i⟩ < ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j and x < y .
Where A and B are ordered sets, the sum A + B will be considered to be ordered as just
defined.
Note also that this definition makes perfectly good sense for any kind of order. However,
we will only be using it for well-orders.
The sum (= disjoint union) of a family of sets

We extend the construction just discussed to any family (indexed collection) of sets. In this
case it is best to use the members of the index set as tags.
Let ⟨Ai⟩i∈I be a family of sets indexed by the set I. Then the sum (or disjoint union) of
this family is the set
Σ⟨Ai⟩i∈I = ⋃{ Āi : i ∈ I } where Āi = { ⟨a, i⟩ : a ∈ Ai } for each i.
Each Ai is mapped one-to-one onto Āi by the natural injection ηi : Ai → Āi given by
ηi(a) = ⟨a, i⟩ and (as is the point of this construction) Āi ∩ Āj = ∅ for all i ≠ j.
One can ring the usual changes with this notation. For instance, we could write it as
Σi∈I Ai .

The sum of a family of ordered sets


From now on, whenever the index set I and all the sets Ai are ordered, we will assume that
Σ⟨Ai⟩i∈I has the order defined as follows.
Let ⟨x, i⟩ and ⟨y, j⟩ be members of Σ⟨Ai⟩i∈I. Then
⟨x, i⟩ ≤ ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j and x ≤ y in Ai .

Note that the corresponding strict order is given by:
⟨x, i⟩ < ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j and x < y in Ai .

D.12 Proposition: Properties of ordered sums


Properties of the ordered sum of two ordered sets
(i) If the given orders on A and B are both partial orders, then so is the new one defined
on A + B. If the given orders are both full orders, then so is the new one defined on A + B.
And, most importantly, if the given orders are both well-orders, then so is the new one
defined on A + B.
(ii) A + B is the union of two disjoint sets, Ā = { ⟨a, 0⟩ : a ∈ A } and B̄ = { ⟨b, 1⟩ : b ∈ B },
which are order-isomorphic to A and B respectively, and every member of Ā is less than
(<) every member of B̄. (Which is of course the whole point of this construction.)

(iii) If A and B are order-isomorphic to A′ and B′ respectively, then A + B ≅ A′ + B′.


(iv) (Converse of (ii)) If the ordered set X is the union of two disjoint sets, Ā and B̄,
which are order-isomorphic to A and B respectively, and every member of Ā is less than
(<) every member of B̄, then X ≅ A + B.

Properties of the sum of a family of sets


(i′) If the given orders on I and all the Ai are partial orders, then so is the new one
defined on Σ⟨Ai⟩i∈I. If the given orders are all full orders, then so is the new one defined
on Σ⟨Ai⟩i∈I. And, most importantly, if the given orders are all well-orders (including the
one on the index set I), then so is the new one defined on Σ⟨Ai⟩i∈I.

(ii′) Σ⟨Ai⟩i∈I is the union of the sets Āi = { ⟨a, i⟩ : a ∈ Ai } (for each i ∈ I); these are all
disjoint (that is, Āi ∩ Āj = ∅ whenever i ≠ j), Āi ≅ Ai for every i ∈ I, and, if i < j, every
member of Āi is less than (<) every member of Āj .
(iii′) If Ai ≅ A′i for every i ∈ I, then Σ⟨Ai⟩i∈I ≅ Σ⟨A′i⟩i∈I .
P P

(iv′) (Converse of (ii′)) If the ordered set X is the union of pairwise disjoint sets Āi, for
all i ∈ I, which are order-isomorphic to the Ai respectively, and, for every i < j, every member
of Āi is less than (<) every member of Āj , then X ≅ Σ⟨Ai⟩i∈I .

Proof. These proofs can be left as a straightforward, if tedious, exercise. Clearly (i) – (iv)
are all special cases of (i′) – (iv′) and so it is only necessary to prove the latter statements. 

D.13 Remarks
Given two ordered sets A0 and A1, the sum A0 + A1 is exactly the same as Σ⟨Ai⟩i∈2 —
just compare the definitions, remembering that 2 = {0, 1}. Thus the definition of the sum
of two ordered sets is just a special case of the definition of the sum of a family.

We can use this to define the sum of any finite number of ordered sets. For example, given
three sets, A, B and C, we can index them by I = {0, 1, 2} and so define the sum
A + B + C = { ⟨a, 0⟩ : a ∈ A } ∪ { ⟨b, 1⟩ : b ∈ B } ∪ { ⟨c, 2⟩ : c ∈ C } ,
with the order defined exactly as before:
⟨x, i⟩ ≤ ⟨y, j⟩ if and only if (a) i < j, or
(b) i = j and x ≤ y in the corresponding set.

But note: this is different from A + (B + C) and (A + B) + C, since applying our definitions
yields
A + (B + C) = { ⟨a, 0⟩ : a ∈ A } ∪ { ⟨⟨b, 0⟩, 1⟩ : b ∈ B } ∪ { ⟨⟨c, 1⟩, 1⟩ : c ∈ C }
(A + B) + C = { ⟨⟨a, 0⟩, 0⟩ : a ∈ A } ∪ { ⟨⟨b, 1⟩, 0⟩ : b ∈ B } ∪ { ⟨c, 1⟩ : c ∈ C } .

This is irritating. It would be nice to have a definition which would make the sum (dis-
joint union) associative (as one feels it ought to be), but that doesn’t seem to be possible.
The problem occurs both for the ordered sum of ordered sets and the plain sum of plain
sets. However the natural mappings between these sets are bijections for sets and order-
isomorphisms for ordered sets, so we have an associative law (up to isomorphism) which is
good enough for most purposes:
(A + B) + C ≅ A + (B + C) ≅ A + B + C .

There is another notational problem to consider. We have the ordered sum of two ordered
sets, which we write A + B, and we also have the sum α + β of two ordinal numbers,
as defined for ordinal arithmetic. The trouble is that ordinal numbers are also sets; well,
everything in mathematics is a set, but the point is that we sometimes treat them as sets,
whose members are of interest. So now we actually have two different definitions of α + β,
one by ordinal arithmetic and the other as the ordered disjoint union of two sets. If you
check the definitions, you will see that these things are in fact different.
This problem is only going to get worse. When we introduce cardinal arithmetic shortly,
we will have a third definition of addition, different again, exacerbated by the fact that a
cardinal is a special kind of ordinal, which in turn is a set.

One way to deal with this problem would be to use different notations for the different kinds
of addition (+, ⊕ and ⊞, say). This doesn't work very well because the ordinary notation is
so commonly used that it would be perverse to use another.
What I will do in this book is make clear which kind of addition is being used by simply
saying so. Occasionally two kinds of addition are used in the same sentence or even in the
same equation. Then I will decorate the plus sign as follows:
α +set β : the disjoint sum of the sets α and β;
α +ord β : the sum of the ordinals α and β by ordinal addition;
α +crd β : the sum of the cardinals α and β by cardinal addition.
We will have occasion to do this in the next proposition.

D.14 Proposition
The promised useful application of ordinal addition.
(i) Let α and β be two ordinal numbers. Then the sums α + β as ordered sets and by
ordinal arithmetic are order-isomorphic:
α +set β ≅ α +ord β .
If you prefer, you could write this
α + β (sum of ordered sets) ≅ α + β (ordinal arithmetic) .

(ii) Let A and B be two well-ordered sets of order types α and β respectively. Then
A + B is of order type α + β. Here I don’t bother decorating the plus signs, because it is
obvious which kind of addition is referred to in each case.

This generalises in the expected way to sums of any number of terms.


(i′) Let α1, α2, . . . , αn be a finite number of ordinal numbers. Then
α1 +set α2 +set · · · +set αn ≅ α1 +ord α2 +ord · · · +ord αn
and this of course means
α1 + α2 + · · · + αn (ordinal arithmetic) ≅ α1 + α2 + · · · + αn (sum of ordered sets) .

(ii′) Let A1, A2, . . . , An be a finite number of well-ordered sets of order types α1, α2, . . . , αn
respectively. Then A1 + A2 + · · · + An is of order type α1 + α2 + · · · + αn.

Proof. First we note that (ii) is an immediate corollary of (i), so it remains to prove (i).
Observing that the ordinal α + β is the union of α and the set {α + ξ : ξ ∈ β}, that these
subsets are disjoint and that every member of α is less than every member of
{α + ξ : ξ ∈ β}, we conclude (by D.12(iv)) that α +ord β ≅ α +set {α + ξ : ξ ∈ β}. Also,
the map β → {α + ξ : ξ ∈ β} defined by ξ ↦ α + ξ is an order-isomorphism, so
β ≅ {α + ξ : ξ ∈ β} and therefore α +ord β ≅ α +set β.
Proofs of (i′) and (ii′) are left as an exercise. 

D.15 Definition: Ordering a product


Multiplication of ordinals also corresponds to a common construction with ordered sets.
Recall that (in 6.B.19) we defined the cartesian product of two sets A and B as the set of
ordered pairs
A × B = { ⟨a, b⟩ : a ∈ A and b ∈ B } .
Recall also that (in 6.E.2), when A and B were ordered, we defined the lexicographic order

on A × B; well, there were two lexicographic orders, one from the left and the other from
the right. For our present purposes it is the lexicographic order from the right that we use:

⟨a, b⟩ < ⟨a′, b′⟩ if and only if either b < b′, or
b = b′ and a < a′ .

This is the appropriate order to use for well-ordered sets, so from now on, whenever a
cartesian product of well-ordered sets is mentioned, it will be assumed that it is endowed
with the lexicographic order from the right.
We know that, if A and B are well-ordered then A × B is also well-ordered. We will prove
below that if A and B are of order-types α and β respectively, then A × B is of order-type
αβ.

At least we don’t have an ambiguity of notation in this case. However, when we come to
cardinal arithmetic we will find that there are different definitions of the product for ordinal
and cardinal arithmetic, and that the same notation αβ is commonly used for both. Where
necessary I will disambiguate by writing α ·ᵒʳᵈ β and α ·ᶜʳᵈ β.
This generalises just as one would hope to the product of any number of ordered sets.
If A1 , A2 , . . . , An are well-ordered sets of order-types α1 , α2 , . . . , αn respectively, then the
cartesian product A1 × A2 × · · · × An is well-ordered of order-type α1 α2 . . . αn . We won’t
need to follow this up in these notes.

D.16 Proposition
(i) Let α and β be two ordinal numbers. Then α × β ≅ αβ. Here α × β is the cartesian
product and αβ is the product according to ordinal arithmetic.
(ii) Let A and B be two well-ordered sets of order types α and β respectively. Then
A × B is of order type αβ.

Proof. (ii) is an immediate corollary of (i), so it is enough to prove (i). (Throughout this
proof, α + β and αβ of course refer to ordinal arithmetic.)
We define a function f : α × β → αβ by f (ξ, η) = αη + ξ . We will show that this is an
order-isomorphism.
First we must check that its range is within αβ as claimed. So, for hξ, ηi ∈ α × β we have
ξ < α and η < β and thus

    f(ξ, η) = αη + ξ < αη + α = αη⁺ ≤ αβ

so f (ξ, η) ∈ αβ as required.
To see that f is surjective, suppose that θ is any member of αβ, that is, θ < αβ. By the
division algorithm, there are ξ, η such that θ = αη + ξ with ξ < α. Then also η < β, because
otherwise we would have η ≥ β and then αη + ξ ≥ αη ≥ αβ. Thus we have ξ < α and η < β
such that f(ξ, η) = αη + ξ = θ. This shows that f maps α × β onto αβ.

It remains to show that f is strictly increasing. So suppose that ⟨ξ, η⟩ < ⟨ξ′, η′⟩. Then either
η < η′ or else η = η′ and ξ < ξ′.
In the first case, η⁺ ≤ η′, so
    f(ξ, η) = αη + ξ < αη + α = αη⁺ ≤ αη′ ≤ αη′ + ξ′ = f(ξ′, η′)
and in the second case f(ξ, η) = αη + ξ < αη + ξ′ = f(ξ′, η′). ∎
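For finite ordinals the map in this proof is just positional arithmetic, and the whole proposition can be checked by machine. Here is a small Python sketch (my own illustration, not part of the development; the names are arbitrary):

    # Finite instance of D.16: f(xi, eta) = a*eta + xi is an order-isomorphism
    # from a x b (lexicographic order from the right) onto the ordinal a*b.
    from itertools import product

    a, b = 4, 3
    pairs = sorted(product(range(a), range(b)), key=lambda p: (p[1], p[0]))
    images = [a * eta + xi for (xi, eta) in pairs]
    assert images == list(range(a * b))   # onto a*b, and strictly increasing
    print(images)                         # [0, 1, 2, ..., 11]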

D.17 Proposition: Properties of the ordered product


(i) If A and B are well-ordered sets, then so is A × B.
(And, as usual, if A and B are both partially ordered or both fully ordered, then A × B is
partially or fully ordered respectively too.)

(ii) If A, B, A′ and B′ are ordered sets and A ≅ A′ and B ≅ B′ then A × B ≅ A′ × B′.
(iii) A sort of associative law: if A, B and C are ordered sets, then (A × B) × C ≅
A × (B × C) ≅ A × B × C.
(iv) A sort of distributive law: if A, B and C are ordered sets, then A × (B + C) ≅
(A × B) + (A × C).
(v) If A is an ordered set and {p} is a singleton, then A × {p} ≅ A.

Proof. Another interesting exercise. 

D.18 Remarks
Notice the similarities between the two definitions of A + B and A × B. In the case A + A
we have
A + A = A × {0, 1} = A × 2
and in general

A + A + · · · + A (n copies) = A × {0, 1, . . . , n − 1} = A × n

which is another way of establishing that α + α = α2, α + α + α = α3 and so on. (Here we
are assuming that {0, 1, 2, . . . , n} has the usual order.)
Let ⟨αᵢ⟩ᵢ<β be a sequence of ordinals, indexed by the ordinal β. We define the sum ∑_{i<β} αᵢ
by induction over β as follows (this is more ordinal arithmetic):
    ∑_{i<0} αᵢ = 0
    ∑_{i<β⁺} αᵢ = ( ∑_{i<β} αᵢ ) + α_β
    and, for a limit ordinal λ,  ∑_{i<λ} αᵢ = sup{ ∑_{i<ξ} αᵢ : ξ < λ }

This is great. There is no reason here why β should not be an infinite ordinal, so we have
defined the sum of an infinite sequence of ordinals (some or all of which may themselves be
infinite) with hardly more trouble than we took to define plain addition! There is quite a
bit of nice mathematics to be discovered in following this up, but we have other fish to fry,
so we won’t.

It is easy to prove that, in the case where all the αᵢ are the same, αᵢ = α say,
    ∑_{i<β} α = αβ ,

which we could express, rather roughly, as “the sum of β copies of α is αβ” — corresponding
exactly to a result about whole numbers familiar since primary school.
Exponentiation of ordinals also corresponds to a construction with well-ordered sets; however,
this construction is rather technical and we will not be needing it in what we are going to
do here, so we will leave this field undisturbed.

E Cardinality

E.1 Discussion
In this section we will consider the problem of comparing the sizes of infinite sets. It depends
on the simple idea that two sets are the same size if there is a one-to-one correspondence
between them. One might think of this as a pairing up of the members of one set with those
of the other.
This idea of determining if two sets of things are the same size appears to be more primitive
than the process of counting. Certain societies which have no general notion of number are
perfectly able to compare the sizes of sets by matching; small children who have not yet
learnt to count can do the same.
The idea of determining the size of a set by counting its members may seem simple to us,
but it is really quite sophisticated. Its apparent simplicity is possibly due to the fact that
we learnt it early in our lives and it has become second nature to most of us. Basically,
to determine that a set has size n (that is, n members) we place the members in one-
to-one correspondence with the set {1, 2, 3, . . . , n} by some, often mental, process. And
then we recognise that two sets are the same size if they can both be placed in one-to-one
correspondence with the same set of natural numbers in this way. This involves recognising
(at least subconsciously) that if two sets are in one-to-one correspondence with a third one
(the set of numbers), then they must be in one-to-one correspondence with each other. And
of course the process of counting breaks down if you cannot ensure that every member of the
set does get counted and that no member gets counted twice. This can sometimes be quite
difficult. For example, determining the number of cattle in a field is not straightforward if
the number is large and the animals keep moving around. Farmers often solve this problem
by passing the animals through a narrow opening between two fields, counting them as they
go.

This counting method breaks down for infinite sets. All one can do with the natural numbers
is count and compare the sizes of finite sets and recognise when a set is infinite (because
it cannot be so counted). I find it remarkable that the more primitive method of placing
sets in one-to-one correspondence can actually leap-frog counting and provide us with a
good method of comparing the size of infinite sets. Using this method, for example, we can
say that N is not the same size as R, because we can prove that there does not exist any
one-to-one correspondence between them. Further, we can say that N is smaller than R
because N is in one-to-one correspondence with a subset of R.
This topic obviously revolves around the idea of a one-to-one correspondence and related
ideas. So let us review these briefly before starting.

E.2 Injective, surjective and bijective functions


There are a number of different words current for these ideas, a fact which might give rise
to some confusion. Here is a short list of . . .
Some synonyms (for a function f : A → B):

(1) f is an injection, f is injective, f is mono, f is one-to-one.



(2) f is a surjection, f is surjective, f is epi, f is onto B.


(3) f is a bijection, f is bijective, f is iso, f is a one-to-one correspondence.
In these notes I will use the first two words in these lists (the ones that end in “-jection”) as
far as possible.

An injection is defined to be a function f : A → B such that, for all a₁, a₂ ∈ A,
f(a₁) = f(a₂) ⇒ a₁ = a₂.
A surjection is defined to be a function f : A → B such that, for all b ∈ B, there exists some
a ∈ A such that f (a) = b.

A bijection is defined to be a function which is both an injection and a surjection.


Appeal to these definitions is probably the most common way of proving that a function
has these properties.
Since bijections will be important here, please note:

An alternative way of proving that a function is a bijection is to appeal to the fact that a
function f is a bijection if and only if it has an inverse, that is, a function g : B → A such
that g(f (a)) = a for all a ∈ A and f (g(b)) = b for all b ∈ B. In this case the inverse is
usually denoted f −1 and is unique.
It is useful to know that the composite of two injections is an injection, of two surjections is a
surjection and of two bijections is a bijection. Also that the identity function on any set is a
bijection, and therefore also both an injection and a surjection. (Proofs are straightforward.)
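For finite sets all of these facts can be checked mechanically. The following short Python sketch (my own illustration; the encodings and names are not from the text) implements the three definitions for functions given as dicts:

    # Injective, surjective, bijective for finite functions encoded as dicts.
    def is_injective(f):
        return len(set(f.values())) == len(f)

    def is_surjective(f, codomain):
        return set(f.values()) == set(codomain)

    def is_bijective(f, codomain):
        return is_injective(f) and is_surjective(f, codomain)

    def inverse(f):
        """Defined exactly when f is a bijection onto its codomain."""
        return {b: a for a, b in f.items()}

    f = {0: 'x', 1: 'y', 2: 'z'}
    assert is_bijective(f, {'x', 'y', 'z'})
    g = inverse(f)
    assert all(g[f[a]] == a for a in f) and all(f[g[b]] == b for b in g)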

E.3 Comparison of size


We say that the sets A and B are the same size, denoted A ≈ B, if there is a bijection
A → B.
(Synonyms are: A is equipollent to B, A has the same cardinality as B.)
We say that set A is no larger than set B, denoted A ≼ B, if there is an injection A → B.
(Synonyms are: B is no smaller than A, A is dominated by B, A has cardinality not exceeding
that of B.)
We say that set A is smaller than set B, denoted A ≺ B, if A ≼ B and A ≉ B.
(Synonyms are: B is larger than A, B strictly dominates A, A has cardinality less than that
of B.)

We will start by establishing some basic properties of these concepts before we look at some
more interesting applications.

E.4 Proposition
This proposition establishes the most basic properties that we would expect the notion of
“the same size” to have. Since we are extending this notion from the familiar domain of the

finite to the less familiar one of infinite sets, these properties do need proving.
Being the same size (equipollence) is an equivalence relation on the class of all sets. That
is, this relation satisfies
(i) A ≈ A for all sets A.
(ii) If A ≈ B then B ≈ A, for all sets A and B.
(iii) If A ≈ B and B ≈ C then A ≈ C, for all sets A, B and C.
The relation ≼ is a full preorder and ≈ is the corresponding equivalence relation. That is,
this relation satisfies
(iv) A ≼ A for all sets A.
(v) If A ≼ B and B ≼ A then A ≈ B, for all sets A and B.
(vi) If A ≼ B and B ≼ C then A ≼ C, for all sets A, B and C.
(vii) (AC) For any two sets A and B, either A ≼ B or B ≼ A.
The following obvious fact fits nicely in here too.
(viii) If A ⊆ B then A ≼ B.

Proof. (i) The identity function A → A is a bijection.

(ii) If A ≈ B there is a bijection f : A → B and then f⁻¹ : B → A is a bijection B → A.


(iii) If A ≈ B and B ≈ C there are bijections f : A → B and g : B → C and then g ◦ f
is a bijection A → C.
(iv) The identity function on A is an injection.

(v) is the Schröder-Bernstein Theorem, proved below.


(vi) If A ≼ B and B ≼ C then there are injections f : A → B and g : B → C. But
then g ◦ f is an injection A → C.
(vii) We invoke the Axiom of Choice in its avatar as the Well-Ordering Principle, and
assume that both A and B can be well-ordered. Then, by D.11, either there is an order-
isomorphism from A to an initial segment of B, or there is an order-isomorphism from B to
an initial segment of A. But such an order-isomorphism is injective.
The Axiom of Choice has appeared. As we will see, its use is necessary for many useful
results in this topic.

(viii) The inclusion function A → B is injective. 

E.5 Proposition
(i) If A ≈ A′ and B ≈ B′ then

(a) A + B ≈ A′ + B′ (disjoint unions);
(b) A × B ≈ A′ × B′;
(c) B^A ≈ B′^A′;
(d) P(A) ≈ P(A′).
(Note that A ∖ B need not be the same size as A′ ∖ B′, even if B ⊆ A and B′ ⊆ A′. What
about A ∪ B ?)
(ii) If A ≼ A′ and B ≼ B′ then
(a) A + B ≼ A′ + B′ (disjoint unions);
(b) A × B ≼ A′ × B′;
(c) B^A ≼ B′^A′, except in the case A = B = B′ = ∅ and A′ ≠ ∅;
(d) P(A) ≼ P(A′).

Proofs. As a straightforward exercise. 

Here are some “identities” for sets which look very much like similar ones for numbers.

E.6 Proposition
For all sets A, B and C, (here + means disjoint union)
(i) (A + B) + C ≈ A + B + C ≈ A + (B + C) .
(ii) A + B ≈ B + A.

(iii) A + ∅ ≈ A ≈ ∅ + A , in other words A + 0 ≈ A ≈ 0 + A .


(iv) (A × B) × C ≈ A × B × C ≈ A × (B × C) .
(v) A × B ≈ B × A.

(vi) A × ∅ ≈ ∅ ≈ ∅ × A , in other words A × 0 ≈ 0 ≈ 0 × A .


(vii) A × {b} ≈ A ≈ {b} × A , in particular, A × 1 ≈ A ≈ 1 × A .
(viii) (A^B)^C ≈ A^(B×C) ≈ A^(C×B) .

Proofs. These are all straightforward with the possible exception of (viii), and the only
difficulty with this is in getting the right notation. Here are the details. Firstly, notice that
A^(B×C) ≈ A^(C×B) by Part (v) and the preceding proposition. So it is sufficient to prove
that (A^B)^C ≈ A^(C×B).
Now both (A^B)^C and A^(C×B) are sets of functions, so, for each function f ∈ (A^B)^C, we
want to define a function f* ∈ A^(C×B) and then show that this f ↦ f* is a bijection.
Now (A^B)^C is the set of all functions C → A^B, so f is such a function. Given any c ∈ C,
f(c) is itself a function B → A. So, given any b ∈ B, the action of this function on b is
f(c)(b), and this is a member of A. But now it is (sort of) obvious what f* must be: for
any ⟨c, b⟩ ∈ C × B, f*(c, b) = f(c)(b).
So the mapping f ↦ f* can be described simply as “replacing the inner pair of parentheses
by a comma”. It remains to show that this mapping is one-to-one and onto A^(C×B). But
these are now easy.
One for the computer scientists: you will recognise this relationship as currying/uncurrying.
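Indeed, for small finite sets the whole bijection can be exhibited by machine. A sketch of my own (the choices of A, B and C and the helper names are invented for the occasion):

    # The currying bijection (A^B)^C = A^(C x B) on small finite sets,
    # with functions encoded as dicts.  Illustrative sketch only.
    from itertools import product

    A, B, C = [0, 1], ["b0", "b1"], ["c0", "c1"]

    def all_functions(dom, cod):
        """Every function dom -> cod, each one a dict."""
        return [dict(zip(dom, vals)) for vals in product(cod, repeat=len(dom))]

    AB = all_functions(B, A)                                # the set A^B
    ABC = [dict(zip(C, choice)) for choice in product(AB, repeat=len(C))]

    def star(f):
        """f* in A^(C x B), given by f*(<c, b>) = f(c)(b)."""
        return {(c, b): f[c][b] for c in C for b in B}

    images = {tuple(sorted(star(f).items())) for f in ABC}
    assert len(images) == len(ABC)                          # f |-> f* is injective
    assert len(ABC) == len(all_functions(list(product(C, B)), A))
    # Both sides have 16 = |A|^(|B|.|C|) members here.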


It is time for a few examples comparing the size of sets we often work with.

E.7 Examples
(i) X ≈ ∅ if and only if X = ∅.
Proof Obvious.
(ii) Normal counting of finite sets: the set A has n members if and only if A ≈ n. (This
probably should be the definition.)

Here we are placing the set A in one-to-one correspondence with the set {0, 1, 2, . . . , n − 1}
instead of the more everyday {1, 2, . . . , n}. Using {0, 1, 2, . . . , n − 1} in this way should be
familiar to computer programmers.
Proof not required: this is a definition.

(iii) For m, n ∈ N, m ≈ n if and only if m = n.


The natural numbers would not be much use for counting if this were not so.
Proof by induction (pretty obvious anyway).
(iv) A set A is finite if and only if there is some natural number n such that A ≈ n.

Proof not required: this is a definition.


(v) Consider the set N ∖ {0}. The function x ↦ x + 1 is a bijection N → N ∖ {0}, so
these sets are the same size. Thus it is perfectly possible for a set to be the same size as one
of its proper subsets.

Proof Obvious.
(vi) (AC) A set is infinite (that is, not finite in the sense defined above) if and only if it has
a subset ≈ N.
(vii) (AC) A set is infinite if and only if it is the same size as one of its proper subsets.

Proof of (vi) and (vii) together. Call the set A.



Proof that A infinite ⇒ A has a subset ≈ N: First, well-order A (using the Axiom of
Choice). Then there is an order-isomorphism f from some ordinal α onto A. Since A
is infinite, α ≥ ω. Then f [ω] is the subset you want.
Proof that if A has a subset ≈ N then it is the same size as a proper subset of itself:
use the trick from Part (v) on that subset.
Proof that if A is the same size as a proper subset of itself, then it is infinite: show first (by
induction) that, for any n ∈ N (thought of as the set of its predecessors), n is not the
same size as a proper subset of itself. Consequently, by (iv), no finite set can be the
same size as a proper subset of itself either.
(viii) Let E be the set of even natural numbers, E = {0, 2, 4, 6, . . .}. Then E ≈ N.
Proof n ↦ 2n.

(ix) Any two nonempty open intervals (a, b) and (c, d) in R are the same size. The same
is true for any two nonempty closed intervals [a, b] and [c, d] (as long as a < b and c < d).
Proof Use the linear function which takes a ↦ c and b ↦ d.
(x) Any nonempty open interval (a, b) in R is the same size as R itself.

Proof To see that R ≈ (−1, +1), use the function x ↦ x/√(1 + x²).
Then use (ix) for any other interval.
(xi) The open interval (−1, +1) is the same size as the closed interval [−1, +1].
Proof is tricky. For a direct proof, treat numbers of the form ±1/2ⁿ separately from
all the others. For an easy proof, wait until we have proved the Schröder-Bernstein
Theorem.
Many more complicated examples are much more easily proved using the Schröder-Bernstein
Theorem (see below). Example (xi) above is the first such.

E.8 Definition
A set is countable if it is either finite or ≈ N. It is countably infinite if it is ≈ N. (Denumerable
and enumerable are common synonyms for countably infinite.)

E.9 Proposition
For any set X, X ≉ P(X).
In fact, X ≺ P(X).
This says that no set is the same size as its power set; moreover it is in fact smaller than its
power set. You will recognise the proof as being very much the same as the Russell Paradox
proof.
At first sight it might seem surprising that this theorem holds even for the empty set. But
the empty set has no members, whereas its power set has one, namely ∅.

Proof. By contradiction: suppose X ≈ P(X). Then there is a bijection



F : X → P(X). Let R be the subset of X defined by R = { x : x ∈ X and x ∉ F(x) }.
Then R ⊆ X and so, since F is surjective, there is some r ∈ X such that R = F(r). Now we
ask: does r ∈ R or not?
    r ∈ R ⇒ r ∈ F(r) ⇒ r ∉ R ,  which proves that r ∉ R ;
    r ∉ R ⇒ r ∉ F(r) ⇒ r ∈ R ,  which proves that r ∈ R .

This is a contradiction.
For the rest it is now enough to prove that X ≼ P(X). This is easy: the function X → P(X)
defined by x ↦ {x} is an injection. ∎
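The diagonal manoeuvre can even be watched on a small finite set. A quick Python sketch (my own; F is just some arbitrary function X → P(X)):

    # E.9 in miniature: whatever F : X -> P(X) we choose, the diagonal set
    # R = { x : x not in F(x) } is missed by F.  Illustrative sketch only.
    X = {0, 1, 2}
    F = {0: set(), 1: {0, 1}, 2: {1, 2}}        # an arbitrary function X -> P(X)

    R = {x for x in X if x not in F[x]}          # here R = {0}
    assert all(F[x] != R for x in X)             # R is not a value of F

Of course, for finite X this is no surprise, since P(X) is strictly bigger than X anyway; the point of the proposition is that the same argument works for every set.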

E.10 Proposition
For any set A, P(A) ≈ 2^A.
(Here 2^A is the set of all functions A → 2, that is, all functions A → {0, 1}.)

Proof. For each subset X of A we define its characteristic function χ_X by
    χ_X(x) = 1  if x ∈ X ;
    χ_X(x) = 0  if x ∉ X .
It is easy to see that the function X ↦ χ_X is a bijection P(A) → 2^A. ∎
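Here is the bijection made concrete for a small finite A (a sketch of my own; the encodings are arbitrary):

    # E.10 in miniature: subsets of A correspond to their characteristic
    # functions A -> {0, 1}.  Illustrative sketch only.
    from itertools import product

    A = ["a", "b", "c"]
    subsets = [{x for x, bit in zip(A, bits) if bit}
               for bits in product((0, 1), repeat=len(A))]

    def chi(X):
        """The characteristic function of X as a dict."""
        return {x: (1 if x in X else 0) for x in A}

    functions = {tuple(chi(X)[x] for x in A) for X in subsets}
    assert len(functions) == len(subsets) == 2 ** len(A)   # X |-> chi_X is a bijection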

E.11 The Schröder-Bernstein Theorem


If A ≼ B and B ≼ A then A ≈ B.

Proof. (This is, I think, a very pretty proof.)


From the assumptions, there are injections f : A → B and g : B → A. If either of these
were a bijection, then there would be nothing left to prove. So we need to deal with the
possibility that neither is a bijection, that is, neither is a surjection. We also need to take
into account that there might be no nice neat relationship between the two functions.
Consider a member b of B. There is no assumption that the function f is surjective, so
there may or may not be an element a of A such that f (a) = b. If there is such an a, then
there is only one (since f is injective), and we will call it the parent of b. If there is no such
a, we will say that b is an orphan. Thus B is the disjoint union of f [A] (the members of B
which possess parents) and the set of orphans in B.
We can break up A in exactly the same way: let a be any member of A. We will say that
a member b of B is the parent of a if g(b) = a. If a has a parent, then it has only one (since
g is injective). Otherwise a is an orphan in A. Thus A is the disjoint union of g[B] (the
members of A which possess parents) and the set of orphans in A.

[Figure: A is divided into g[B] and the orphans in A; B is divided into f[A] and the orphans
in B; f maps A into B and g maps B into A.]

Now consider a member a of A, and trace its ancestry back as far as possible. It may have
a parent in B, which in turn may have a parent in A, which in turn . . . and so on. One of
three things must happen:

• Following ancestry back terminates with an orphan in A, what we might think of as


its “ultimate ancestor”. We will say that such elements are “descended from A”.
• Following ancestry back terminates with an orphan in B, so its ultimate ancestor is a
member of B. We will say that such elements are “descended from B”.
• Following ancestry back never terminates at all. We will say that such elements are
“infinitely descended”.

Note that an orphan in A is its own ultimate ancestor and so is descended from A. In the
same way an orphan in B is its own ultimate ancestor and so is descended from B.
Because the functions f and g are injective, it is easy to see that these three possibilities
are mutually exclusive. Therefore A is the disjoint union of three subsets:
    A_A = members of A which are descended from A,
    A_B = members of A which are descended from B, and
    A_∞ = members of A which are infinitely descended.
Every member of A is a member of exactly one of these subsets. Note also that the set of
orphans in A is a subset of A_A.
We can trace back the ancestry of members of B in the same way, thus dividing it up into
three corresponding subsets:
    B_A = members of B which are descended from A,
    B_B = members of B which are descended from B, and
    B_∞ = members of B which are infinitely descended.
Again, every member of B is a member of exactly one of these subsets, and the set of orphans
in B is a subset of B_B.

[Figure: the corresponding subdivisions, A into A_A, A_B, A_∞ and B into B_A, B_B, B_∞.]

Now consider the restrictions f_A, f_B and f_∞ of the function f to the sets A_A, A_B and A_∞
respectively.
Since f is injective, so are these three restrictions. By the definitions of B_A, B_B and B_∞,
we see that
• f_A is onto B_A, and so is a bijection A_A → B_A.
• f_∞ is onto B_∞, and so is a bijection A_∞ → B_∞.
• f_B is probably not onto B_B — there are probably some orphans in B_B.

[Figure: f_A : A_A → B_A is bijective; f_B : A_B → B_B is not bijective (Drat!);
f_∞ : A_∞ → B_∞ is bijective.]

However, by a similar argument, the restriction g_B of g to B_B is a bijection B_B → A_B, and
so it has an inverse g_B⁻¹ : A_B → B_B, which must be a bijection also.

[Figure: f_A : A_A → B_A is bijective; g_B⁻¹ : A_B → B_B is bijective (Yay!);
f_∞ : A_∞ → B_∞ is bijective.]

We can now glue the last three bijections together to get a single bijection φ : A → B;
define it by
    φ(a) = f_A(a)     if a ∈ A_A ;
    φ(a) = g_B⁻¹(a)   if a ∈ A_B ;
    φ(a) = f_∞(a)     if a ∈ A_∞ .
It follows from the observations above that φ is a bijection A → B, as required. ∎

That seems like a lot of work to establish such an obvious fact: that if A is no bigger than
B and B is no bigger than A then they must be the same size.
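The construction in the proof is perfectly effective whenever the ancestry chains can be traced. Here is a small Python sketch of it (my own illustration: A = B = N with the injections f(a) = a + 2 and g(b) = b + 2, for which no element is infinitely descended):

    # The Schroder-Bernstein construction, traced computationally.
    # A = B = N, f(a) = a + 2, g(b) = b + 2; neither injection is surjective.
    def f(a): return a + 2                        # orphans in B: 0 and 1
    def g(b): return b + 2                        # orphans in A: 0 and 1

    def parent_in_B(a): return a - 2 if a >= 2 else None   # unique b with g(b) = a
    def parent_in_A(b): return b - 2 if b >= 2 else None   # unique a with f(a) = b

    def descended_from(a):
        """Trace the ancestry of a (in A) back to its ultimate ancestor."""
        side, x = "A", a
        while True:
            p = parent_in_B(x) if side == "A" else parent_in_A(x)
            if p is None:
                return side                        # chain stops at an orphan here
            x, side = p, ("B" if side == "A" else "A")

    def phi(a):
        """The glued bijection: f on A_A (and A_inf), g inverse on A_B."""
        return f(a) if descended_from(a) == "A" else parent_in_B(a)

    print([phi(a) for a in range(8)])     # [2, 3, 0, 1, 6, 7, 4, 5] -- a bijection

(In this example there are no infinitely descended elements; in general those are handled by f as well, though detecting them may require knowing the two functions globally rather than by finite tracing.)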

E.12 Proposition (AC)
Let A and B be sets, A ≠ ∅. Then A ≼ B if and only if there is a surjection B → A.
(The proof that the existence of a surjection implies A ≼ B requires the Axiom of Choice;
the converse does not.)

Another result which one feels should “obviously” be true. If there is a function from A onto
B, then “surely” B can be no bigger than A. This does require proof — and the fact that
the proof in one direction requires the big gun of the Axiom of Choice seems surprising.

Proof. Suppose first that A ≼ B. Then there is an injection f : A → B. Also, since A
is nonempty, we can choose a particular element, a₀ say, of A. Now we can construct a
surjection B → A as follows. For any b in the range of f, let g(b) be the unique a ∈ A for
which f(a) = b. For any b not in the range of f, define g(b) = a₀. It is easy to see that then
g is a surjection as claimed.

Now suppose that there exists a surjection g : B → A. Consider the sets
    g⁻¹(a) = { b : b ∈ B and g(b) = a } ,
defined for each a ∈ A. These sets form a partition of B, so they are disjoint and all
nonempty. Invoking the Axiom of Choice, there is a choice function c : P(B) ∖ {∅} → B such
that c(X) ∈ X for every nonempty subset X of B. Define f(a) = c(g⁻¹(a)) for all a ∈ A. This
defines a function A → B and, since all the sets g⁻¹(a) are disjoint, it is an injection. ∎

And what happens when A = ∅? I’ll leave you to figure that out.
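For finite sets, both directions of the proof can be carried out explicitly, and the choice function is then unproblematic (below, min stands in for it). A sketch of my own:

    # E.12 in miniature: turning a surjection g : B -> A into an injection
    # A -> B by picking one member of each fibre, and back again.
    A = {1, 2}
    B = {"p", "q", "r"}
    g = {"p": 1, "q": 1, "r": 2}                        # a surjection B -> A

    fibres = {a: {b for b in B if g[b] == a} for a in A}
    f = {a: min(fibres[a]) for a in A}                  # min plays the choice function
    assert len(set(f.values())) == len(A)               # f is an injection A -> B

    # Conversely, an injection f yields a surjection B -> A (strays go to a0).
    a0 = min(A)
    g2 = {b: next((a for a in A if f[a] == b), a0) for b in B}
    assert set(g2.values()) == A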

E.13 Examples
(i) A set is countable if and only if it is ≼ N.
Proof (outline). This is not quite as obvious as it looks. If a set is countable, then it
is ≼ N: that is just a matter of checking the definitions. On the other hand, if a set
is ≼ N, then it is in one-to-one correspondence with some subset of N. So the proof
boils down to showing that every subset of N is ≈ either some n ∈ N or N itself, an
interesting little exercise in induction.
(ii) (AC) A set A is infinite if and only if N ≼ A.
Proof This is just a restatement of Example E.7(vi).
(iii) It is now easy to prove Example E.7(xi), that the two intervals [−1, +1] and (−1, +1)
in R are the same size.
Proof Firstly (−1, +1) ⊆ [−1, +1] and so (−1, +1) ≼ [−1, +1].
Also, by E.7(ix), [−1, +1] ≈ [−½, +½] ≼ (−1, +1), and the Schröder-Bernstein Theorem
now finishes the job.
(iv) By a similar argument, any nontrivial interval in R , open, closed, half-open, finite
or infinite (in length) is the same size as R. (By “nontrivial” here I mean nonempty and not
a singleton interval [a, a]).

(v) 2^N ≈ R.
(Consequently R is uncountable. By the previous examples, so is every nontrivial
interval in R.)

Proof It is easiest to show that 2^N is the same size as the interval [0, 1) of reals.
The trick is to represent the reals in this interval as infinite binary expansions. If
we write x = 0.d₀d₁d₂… in this notation, we mean
    x = d₀/2 + d₁/2² + d₂/2³ + ⋯
where each dᵢ is either 0 or 1. If we disallow sequences of digits which terminate with
an infinite number of 1’s, the representation is unique. Thus we have an injection
from [0, 1) into the set of all sequences in {0, 1}, indexed by N, that is, 2^N. This
proves that R ≼ 2^N.

To prove the opposite relation, observe that, to every such sequence of 0’s and 1’s,
there is a real number x in [0, 1) defined by
    x = d₀/3 + d₁/3² + d₂/3³ + ⋯
and now, because the denominators are powers of 3, distinct sequences define distinct
reals; therefore this defines an injection from 2^N into [0, 1), proving that 2^N ≼ [0, 1).
Trying to prove this result without the help of the Schröder-Bernstein Theorem is pretty
awful.
(vi) (Corollary) N ≺ R.
(vii) N² ≈ N (and so N² is countably infinite).
Proof There is an obvious injection N → N² given by n ↦ ⟨0, n⟩. There is also an
injection N² → N given by ⟨m, n⟩ ↦ 2^m 3^n.
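Both injections are easily tested by machine; uniqueness of prime factorisation does the work in the second. A quick sketch (mine):

    # The injection <m, n> |-> 2^m 3^n of (vii), checked on a finite window.
    pairs = [(m, n) for m in range(20) for n in range(20)]
    codes = [2 ** m * 3 ** n for (m, n) in pairs]
    assert len(set(codes)) == len(pairs)     # no two pairs share a code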

(viii) R² ≈ R.
Proof From [0, 1) ≈ R it is easy to prove that [0, 1) × [0, 1) ≈ R × R, that is,
[0, 1)² ≈ R². It is now sufficient to show that [0, 1)² ≈ [0, 1). That [0, 1) ≼ [0, 1)² is
obvious, so we show that [0, 1)² ≼ [0, 1).
Given a pair ⟨x, y⟩ in [0, 1)², write each as its proper decimal expansion
    x = 0.d₁d₂d₃…
    y = 0.e₁e₂e₃…
Here “proper” means we never use an expansion which ends with an infinite sequence
of 9’s. Now define z by alternating the digits in these expansions:
    z = 0.d₁e₁d₂e₂d₃e₃… .
Then the function ⟨x, y⟩ ↦ z (where z is defined as above) defines an injection
[0, 1)² → [0, 1).
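On finite decimal strings the interleaving map is a one-liner; here is a sketch (mine — genuine reals would of course need the whole infinite expansion):

    # Interleaving the digits of two decimal expansions, as in (viii).
    def interleave(x_digits, y_digits):
        """0.d1 d2 d3...  and  0.e1 e2 e3...  |->  0.d1 e1 d2 e2 d3 e3..."""
        return "0." + "".join(d + e for d, e in zip(x_digits, y_digits))

    print(interleave("123", "456"))          # 0.142536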
(ix) Z is countably infinite.
Proof Define a bijection f : Z → N by
    f(n) = 2n         if n ≥ 0 ;
    f(n) = −2n − 1    if n < 0 .
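A two-line machine check of this bijection on a window of Z (a sketch, mine):

    # The bijection Z -> N of (ix): negatives go to odds, non-negatives to evens.
    f = lambda n: 2 * n if n >= 0 else -2 * n - 1
    assert sorted(f(n) for n in range(-10, 10)) == list(range(20))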

(x) N × R ≈ R.
Proof This is nice. All the steps below are from things we have proved above.
    R ≈ {0} × R ≼ N × R ≼ R × R ≈ R .

(xi) 2^R ≈ N^R ≈ R^R.
Proof This is even nicer.
    2^R ≼ N^R ≼ R^R ≈ (2^N)^R ≈ 2^(N×R) ≈ 2^R .
Some more examples as exercises

(xii) Q (the set of rational numbers) is countably infinite.
(xiii) A set A is infinite if and only if N ≼ A.
(xiv) N^N ≈ 2^N.
(xv) R^N ≈ R.
(xvi) The set of all finite sequences of natural numbers is countably infinite.
(xvii) The set of all functions R → R is R^R and therefore ≻ R. What about the set of all
continuous functions R → R?
(xviii) Hard. Is the set of all bijections R → R the same size as R or as R^R?
(xix) Much harder, I think. Is the set of all increasing functions R → R the same size as
R or as R^R?

E.14 The Continuum Hypothesis


We know (Example E.13(i) above) that there is no infinite set which is smaller than N. We also
know that N is smaller than R. The question arises: is there any set of intermediate size
here, that is, any set X such that N ≺ X ≺ R?

The Continuum Hypothesis states that there is no such set.


As its name implies, the Continuum Hypothesis is not a theorem of MK. It has the same
status as the Axiom of Choice: it is independent of MK, in the sense that (writing CH for
this Hypothesis) both MK+CH and MK+¬CH are consistent theories (assuming, as always,
that MK itself is consistent).
Given the fact that the Axiom of Choice is required for a number of useful results in this area,
one might wonder if it could be used to prove the Continuum Hypothesis or its negation.
But no, the Continuum Hypothesis remains independent, even in the presence of the Axiom
of Choice. More precisely, both MK+AC+CH and MK+AC+¬CH are consistent (again,
assuming that MK is).
The Generalised Continuum Hypothesis (commonly known as the GCH) is that, for any
infinite set X, there is no set Y such that X ≺ Y ≺ 2^X. It has the same independence
status as the ordinary Continuum Hypothesis.

The Continuum Hypothesis gets its name from the historical fact that R used to be
called “the Continuum”. Following the famous Cantor proof that the reals are not
countable (in our language, N ≺ R — and note that R is the same size as 2^N) the
question arose as to whether there was a subset of the reals of a strictly intermediate
size.

F Cardinal numbers
One can use the natural numbers to measure the size of finite sets, in the sense that every
finite set is the same size as exactly one natural number. It would be nice if there was some
set of canonical “numbers” which we could use to measure the size of infinite sets in the
same way.
If we are prepared to accept the Axiom of Choice then this can be done, for then every set
X can be well-ordered and so made order-isomorphic to an ordinal number, its order-type.
But then, since an order-isomorphism is a bijection, it is the same size as its order type. Now
forget the well-ordering: we are left knowing that every set X is the same size as at least
one ordinal number. Define its cardinal number or cardinality as the least ordinal number
which is the same size as it.

#X = min{ α : α is an ordinal and X ≈ α } .

You might wonder whether any given set can be the same size as more than one ordinal
number — maybe the above definition is worrying about a possibility that doesn’t arise?
Certainly any finite set is the same size as only one ordinal (the n which is its size) but
every infinite set will be the same size as many ordinals. This follows from the fact that,
if α is an infinite ordinal, then all the sets α + n, for all natural numbers n, are the same
size. So, if a set is the same size as an infinite ordinal α, then it is the same size as all of
α, α + 1, α + 2, . . . . So, yes, it is necessary to make the definition the way it is.
It is not difficult now to see that the ordinal numbers which can turn up in this rôle (as the
sizes of sets) are just those ordinals which are not the same size as any of their predecessors.
Now let us define all this in a more logical order.

F.1 Definition (AC)
A cardinal number is an ordinal number which is not the same size as any of its predecessors.
In other words, it is an ordinal number α with the property that

For all ordinals ξ, ξ < α ⇒ ξ ≉ α .

Since, if ξ < α, we have ξ ⊆ α, this condition can be rewritten

For all ordinals ξ, ξ < α ⇒ ξ ≺ α.

Now let X be any set. We define its cardinality to be the least ordinal number which is the
same size as X:
#X = min{ α : α is an ordinal and X ≈ α } ,
which is, of course, the unique cardinal number which is the same size as X. This is also
called the cardinal number of X, or even the size of X.
(The definition of a cardinal number does not require the Axiom of Choice. The definition
of the cardinality of a set requires the Axiom of Choice in its tacit assumption that every
set does have a cardinality. Should we take this definition to mean that we are defining
cardinality for sets which happen to be the same size as some ordinal number, accepting the
possibility that some sets may not qualify, then the Axiom of Choice is not involved.)

F.2 Examples
(i) No natural number is the same size as any of its predecessors. Therefore every natural
number is a cardinal number.
(ii) The ordinal ω = N is infinite and therefore not the same size as any of its predecessors,
which are the natural numbers. Therefore ω is the first infinite cardinal number, and in this
context it is traditionally denoted ℵ0 (read “aleph-null”).
We now have three different names and symbols for the set of natural numbers. They are
all in common use, so one needs to know them. Which one is used depends upon how the
set is being employed — whether it is being used as the good old natural numbers we use
for counting etc., or as the first infinite ordinal or as the first infinite cardinal.
(iii) The ordinals ω + 1, ω + 2, . . . , 2ω, . . . , 3ω, . . . , ω², . . . are all countable (countably
infinite of course). Therefore they are all the same size as ω, which means that none of them
are cardinal numbers.
(iv) The cardinality of R is the same as that of the set of all functions ℵ₀ → 2 and so is
denoted 2^ℵ₀ (more on this notation below).
(v) The Continuum Hypothesis is that there is no cardinal number strictly between ℵ₀
and 2^ℵ₀. The Generalised Continuum Hypothesis is that, for any infinite cardinal ℶ, there
is no cardinal number strictly between ℶ and 2^ℶ.

For some reason, when dealing with cardinal numbers, mathematicians started using Hebrew
letters. Aleph (ℵ) and beth (ℶ) are the first two letters of the Hebrew alphabet. As far as
I know, no other letters of this alphabet are in common use.
(vi) A ≈ B if and only if #A = #B;
A ≼ B if and only if #A ≤ #B;
and A ≺ B if and only if #A < #B.
Thus we can easily compare the sizes of sets if we know their cardinalities.
(vii) Every cardinal number is its own size (cardinality). In other words, if α is a cardinal,
thought of as a set itself, then
    #α = α .

F.3 Cardinal arithmetic


As with ordinal numbers, we can define and investigate an arithmetic of cardinal numbers.
For any two cardinal numbers α and β, we can define α + β, αβ and β^α as the cardinalities
of the sets A + B, A × B and B^A, where A and B are sets of cardinality α and β respectively.
(We will actually use a slightly different definition which comes to the same thing.)
Warning! As already remarked, we must be careful here because these definitions of
addition, multiplication and exponentiation are different from those made for ordinals in
Section D and different again from those made for sets. This problem is exacerbated by the
fact that cardinals are special kinds of ordinals, which in turn are ordered sets.

Here is a run-down on the various uses of these notations:

Sum       A + B (or A +ˢᵉᵗ B)   The sum = disjoint union of two sets.
          α + β (or α +ᵒʳᵈ β)   The sum of two ordinal numbers (ordinal arithmetic).
          α + β (or α +ᶜʳᵈ β)   The sum of two cardinal numbers (cardinal arithmetic).
Product   A × B                 The cartesian product of two sets.
          αβ (or α ·ᵒʳᵈ β)      The product of two ordinal numbers (ordinal arithmetic).
          αβ (or α ·ᶜʳᵈ β)      The product of two cardinal numbers (cardinal arithmetic).
Exponent  A^B                   The set of all functions B → A.
          α^β                   The exponent of two ordinal numbers (ordinal arithmetic).
          α^β                   The exponent of two cardinal numbers (cardinal arithmetic).
We don’t need a special notation to distinguish ordinal and cardinal exponentiation because
we will not be using ordinal exponentiation at all from now on.

F.4 Definition: cardinal arithmetic operations


Let α and β be cardinals. Then
α + β (cardinal arithmetic) is the cardinality of the set α + β (disjoint sum);
αβ (cardinal arithmetic) is the cardinality of the set α × β (cartesian product);
α^β is the cardinality of the set of all functions β → α.
In other words,
    α +ᶜʳᵈ β = #(α +ˢᵉᵗ β)   and   α ·ᶜʳᵈ β = #(α × β) .
As an immediate corollary of Example E.5(i), we have: if A and B are sets with cardinalities
α and β respectively, then
    #(A + B) = α + β ,   #(A × B) = αβ   and   #(B^A) = β^α .

This is of course the point of these definitions.


Another warning: α + β (cardinal arithmetic) and α + β (ordinal arithmetic) are hardly ever
the same as one another, even if α and β are both cardinals. The same goes for multiplication
and exponentiation.

F.5 Proposition: two simple properties of cardinals


(i) For any ordinal α, #α ≤ α .

(ii) Every infinite cardinal is a limit ordinal.



Proof. (i) This follows immediately from the definition of a cardinal.


(ii) Any infinite successor ordinal α⁺ = α ∪ {α} is ≈ α, so it cannot be a cardinal. ∎

F.6 Example
The sets ω, ω + ω and ω × ω are all the same size (all countably infinite). Well, since we are
discussing cardinals here, we should say that the sets ℵ₀, ℵ₀ +ˢᵉᵗ ℵ₀ and ℵ₀ × ℵ₀ are all the
same size. Therefore, in cardinal arithmetic,
    ℵ₀ = ℵ₀ + ℵ₀ = ℵ₀² .

We know that these are all different in ordinal arithmetic. So cardinal arithmetic is different
from ordinal arithmetic.

F.7 Example: More on ℵ notation


The first few infinite cardinals are written ℵ₀, ℵ₁, ℵ₂, ℵ₃, … . This extends to ℵ_α, for
any ordinal α, defined in the obvious way:
    ℵ₀ = ω ;
    ℵ_{α⁺} = the first cardinal > ℵ_α ;
    for any nonzero limit ordinal λ,  ℵ_λ = sup{ ℵ_ξ : ξ < λ } .

It is not difficult to prove that this defines a cardinal for every ordinal α and that these are
all the infinite cardinals there are.
For any set A of cardinality α, its power set P(A) is of cardinality 2^α. Thus the Continuum
Hypothesis can be restated:
    ℵ₁ = 2^ℵ₀
and the Generalised Continuum Hypothesis can be restated: for every ordinal α,
    ℵ_{α+1} = 2^{ℵ_α} .

F.8 Proposition: Algebraic properties of cardinal arithmetic


Let α, β and γ be any cardinal numbers. Then the following hold:
(i) α + (β + γ) = (α + β) + γ
(ii) α+β =β+α
(iii) α+0=α

(iv) α(βγ) = (αβ)γ


(v) αβ = βα
(vi) 0α = 0, 1α = α, 2α = α + α and so on.
(vii) α(β + γ) = αβ + αγ

(viii) α^(β+γ) = α^β α^γ
(ix) (αβ)^γ = α^γ β^γ
(x) (α^β)^γ = α^(βγ)
(xi) 1^α = 1
(xii) 0^α = 0 (provided α ≠ 0); 0⁰ = 1.
(xiii) α⁰ = 1, α¹ = α, α² = αα, and so on.

Proofs. These all follow from E.5. ∎




F.9 Proposition: Order properties of cardinal arithmetic


Let α, α′, β and β′ be cardinal numbers such that α ≤ α′ and β ≤ β′. Then the following hold:
(i) α + β ≤ α′ + β′
(ii) αβ ≤ α′β′
(iii) α^β ≤ (α′)^β′ (except in the case that α = α′ = β = 0 and β′ > 0).

Proofs. These are again straightforward. 

F.10 Proposition: Finite cardinal arithmetic


For finite cardinals (natural numbers), cardinal, ordinal and ordinary arithmetic are the
same.

So far so good. Now we look at a proposition and theorem which tell us that, where infinite
cardinals are involved, cardinal arithmetic is mostly trivial.

F.11 Proposition (AC)
Let α be an infinite cardinal. Then α² = α (cardinal arithmetic).

Proof. Suppose not. Then let α be the least infinite cardinal such that α² ≠ α. Now α ≥ 1
(trivially), so α² ≥ α·1 = α and therefore α² > α.
Consider the cartesian product α × α. We define a new order ◁ on it by
    ⟨ξ₁, ξ₂⟩ ◁ ⟨η₁, η₂⟩  iff  max{ξ₁, ξ₂} < max{η₁, η₂}
                          or  max{ξ₁, ξ₂} = max{η₁, η₂} and ξ₁ < η₁
                          or  max{ξ₁, ξ₂} = max{η₁, η₂} and ξ₁ = η₁ and ξ₂ < η₂ .
Note that it follows immediately from this definition that
    if ⟨ξ₁, ξ₂⟩ ◁ ⟨η₁, η₂⟩ then max{ξ₁, ξ₂} ≤ max{η₁, η₂} .

It is easy to show that this order ◁ is a well-order on α × α; let θ be its order-type, so we
have an order-isomorphism f : α × α → θ. Now we have
    θ ≥ #θ           by F.5(i)
      = #(α × α)     since f is a bijection
      = α·α = α²     by definition of cardinal multiplication
      > α            by choice of α above.

But now, because of our order-isomorphism f : α × α → θ, there are ξ₁, ξ₂ ∈ α such that
f(ξ₁, ξ₂) = α. Then, again using the fact that f is an order-isomorphism, it must map the
set of all pairs ⟨η₁, η₂⟩ which are ◁ ⟨ξ₁, ξ₂⟩ onto the set of all ordinals < α, that is, onto α
itself. In other words,
    f[ { ⟨η₁, η₂⟩ : ⟨η₁, η₂⟩ ◁ ⟨ξ₁, ξ₂⟩ } ] = α .

Now set µ = max{ξ₁, ξ₂} + 1. Since ξ₁, ξ₂ ∈ α, they are both < α and so, since α is a limit
ordinal, µ < α. We now have
    { ⟨η₁, η₂⟩ : ⟨η₁, η₂⟩ ◁ ⟨ξ₁, ξ₂⟩ } ⊆ { ⟨η₁, η₂⟩ : max{η₁, η₂} ≤ max{ξ₁, ξ₂} }
                                      = { ⟨η₁, η₂⟩ : max{η₁, η₂} < µ }
                                      = { ⟨η₁, η₂⟩ : η₁ < µ and η₂ < µ }
                                      = { η₁ : η₁ < µ } × { η₂ : η₂ < µ }
                                      = µ × µ ,

and now
    α = #{ ⟨η₁, η₂⟩ : ⟨η₁, η₂⟩ ◁ ⟨ξ₁, ξ₂⟩ }
      ≤ #(µ × µ) = (#µ)² .                        (–1)

Now µ cannot be finite because, if it were, µ × µ would also be finite, in which case #(µ × µ) =
(#µ)² would be finite, contradicting (–1), since α is infinite.
So µ must be infinite, in which case #µ is infinite too. Now we saw that µ < α and so
#µ < α also. Therefore, by the original choice of α, (#µ)² = #µ. But then (#µ)² < α,
again contradicting (–1). ∎

F.12 Theorem (AC)
Let α and β be cardinal numbers, α ≤ β and β infinite. Then (cardinal arithmetic)

(i) α + β = β .
(ii) Provided α ≥ 1 also, αβ = β.
(iii) Provided α ≥ 2 also, α^β = β^β.

Proof. (i) 0 ≤ α, so β = 0 + β ≤ α + β ≤ β + β = β·2 ≤ β² = β (using F.11).

(ii) 1 ≤ α, so β = 1·β ≤ αβ ≤ β² = β .

(iii) 2 ≤ α ≤ β, so 2^β ≤ α^β ≤ β^β ≤ (2^β)^β = 2^(ββ) = 2^β . ∎




F.13 Examples (AC)
We now have enough artillery to sort out the sizes of many of the infinite sets we deal with
constantly. Here are some examples:

(i) N + N ≈ N,    that is, #(N + N) = ℵ₀ .
(ii) N × N ≈ N,    that is, #(N × N) = ℵ₀ .
(iii) N^N ≈ R,    that is, #(N^N) = ℵ₀^ℵ₀ = 2^ℵ₀ .
(iv) N + R ≈ R,    that is, #(N + R) = ℵ₀ + 2^ℵ₀ = 2^ℵ₀ .
(v) N^R ≈ 2^R,    that is, #(N^R) = ℵ₀^(2^ℵ₀) = 2^(2^ℵ₀) .
(vi) R^N ≈ R,    that is, #(R^N) = (2^ℵ₀)^ℵ₀ = 2^ℵ₀ .
(vii) R + R ≈ R,    that is, #(R + R) = 2^ℵ₀ + 2^ℵ₀ = 2^ℵ₀ .
(viii) R × R ≈ R,    that is, #(R × R) = 2^ℵ₀ · 2^ℵ₀ = (2^ℵ₀)² = 2^ℵ₀ .
(ix) R^R ≈ 2^R,    that is, #(R^R) = (2^ℵ₀)^(2^ℵ₀) = 2^(2^ℵ₀) .
8. MODEL THEORY

A Structures, interpretations, models again


A.1 Example: A universe of finite sets
Let us again consider the language of MK, but now take as our structure the class of all finite
sets, with ∈ and = interpreted as ∈ and =. Let us call this structure F.
A warning before we start: the members of a finite set need not also be finite. For example,
the set {Z, N} is finite (2 members) whereas its members Z and N are not.

What does SET(x) correspond to? As before, to

(∃s ∈ F)(x ∈ s) .

That is always true: simply set s = {x}, which is finite.


What about x ⊆ y, that is (∀u)( u ∈ x ⇒ u ∈ y) ? This corresponds to

(∀u ∈ F)(u ∈ x ⇒ u ∈ y) ,

which says that every finite set which is a member of x is also a member of y. This is quite
different from x ⊆ y. For example, in an interpretation in which x corresponds to {Z, 1, 2}
and y to {1, 2, 3} (let’s call these sets x and y), we see that the formal statement x ⊆ y is
true in F (the only members of x lying in F are 1 and 2, and both are members of y),
whereas x ⊆ y is not true of the actual sets (Z ∉ y).

In the light of this, one would expect MK1 to fail, and so it does. It is interpreted as
    (∀a ∈ F)(∀b ∈ F)( (∀x ∈ F)(x ∈ a ⇔ x ∈ b) ⇒ a = b ) .
But this is false: for instance take a = {Z} and b = ∅ (neither has any members lying in F,
yet a ≠ b).


What about MK2? Given the predicate P (x), it corresponds to

(∃w ∈ F)(∀x ∈ F)( x ∈ w ⇔ P (x) )

where P (x) is the predicate corresponding to P (x). Once more, if we set P (x) to be, say,
x = x, this is false: it says that there is a finite set w which contains every finite set x.
The failure of MK1 is a big drawback because it makes the apparatus of definition by
description almost unusable — and this is used to define unordered pairs, unions, power
sets and, via them, almost everything else. Suppose that nevertheless we are serious about
trying to see what the axioms of mathematics have to say in a universe of finite sets; is there
any way we can resurrect this example? There are two ways we could go about this:
1 We could redefine F as the class of all finite sets whose members are also finite, whose
members of members are also finite and so on. We simply define a set to be recursively finite


if it is finite and all its members are recursively finite. This definition looks a bit alarming
at first sight, but the Axiom of Foundation tells us that it is valid.
2 We could keep F as the class of all finite sets, but redefine the interpretation of = .
There is nothing in the definition of an interpretation to say that = must be interpreted
as ordinary equality. It is a binary relation symbol and so must be interpreted as a binary
relation. The axioms of equality imply that it must be an equivalence relation and “behave
well” towards functions and other relations in an obvious way. So, suppose we define an
equivalence ∼ on F by:

a ∼ b ⇔ (∀x ∈ F)( x ∈ a ⇔ x ∈ b )

and use this as the interpretation of = . Then, having checked that the axioms of equality
hold in the new structure, we have more or less forced MK1 to be true.

It would be difficult, if not impossible, to arrange for MK2 to be true in such a structure.
As an alternative, we could decide to investigate the ZF axioms instead. (Note that ZF uses
the same language as MK; it is only the axioms that are different, leading of course to a
different theory.) Axiom ZF2 (corresponding to MK2) states (again for any predicate P(x)):

(∀a)(∃w)(∀x)( x ∈ w ⇔ x ∈ a ∧ P (x) )

This translates to

(∀a ∈ F)(∃w ∈ F)(∀x ∈ F)( x ∈ w ⇔ x ∈ a ∧ P (x) )

and this is true in F: given any a ∈ F, set w = { x : x ∈ a ∧ P (x) }. Since w is a subset of


a, it is also finite, and has the required property.

A.2 Definition: Semantic implication


Let P and Q be two expressions in our language L. We will say that P semantically implies
Q if, for every structure M for L, if P is true for M, then Q is true for M also. This is
written P ⊨ Q.
More generally, let P be a set of expressions in our language L. We say that P semantically
implies Q if, for every structure M for L, if every expression in P is true for M, then Q is
true for M also. This is written P ⊨ Q.
More generally again, let P and Q be sets of expressions in our language L. We say that P
semantically implies Q if, for every structure M for L, if every expression in P is true for
M, then every expression in Q is true for M also. This is written P ⊨ Q.

(We note here that the symbol ⊨ is not part of the language L, even informally. It is a
metasymbol.)
If Q is a statement in the language L which is true for every structure for the language, we
say that Q is semantically valid and we write ⊨ Q. This is the same as saying that ∅ ⊨ Q.

(Note that Proposition 5.C.3(ii) says that all the theorems of PL are semantically valid.)

A.3 Theorem
For any expressions P and Q in any language L,
    P ⊨ Q   if and only if   ⊨ P ⇒ Q .

Proof. Suppose first that P ⊨ Q, and let M be any structure for L and θ any interpretation
L → M. Then it cannot be the case that θ(P) is true and θ(Q) is false, because this would
contradict P ⊨ Q. But that means that θ(P) ⇒ θ(Q) in M, that is, that θ(P ⇒ Q) is true
in M. Since that is true for any M and any θ, this proves that ⊨ P ⇒ Q.
Conversely, if ⊨ P ⇒ Q, then θ(P) ⇒ θ(Q) for every M and θ. But then, if θ(P) is true,
then so is θ(Q), proving that P ⊨ Q. ∎

B Adequacy of first-order theories


B.1 The Fundamental Theorem of Model Theory again
Back in Section 5.C.4 we stated this theorem, that a first-order theory has a model if and
only if it is consistent, but at that time only proved the “forwards” direction (if the theory
has a model, then it is consistent). The time has come to complete the proof by proving

If a first-order theory is consistent, then it has a model.

This is a big proof which goes in several stages, each of which is quite complicated.
Nevertheless the proof itself is of considerable interest, so I plan here to explain it carefully in as
leisurely a fashion as I can manage, starting with an outline.
In Chapter 5 we looked at several models for theories, but they don’t suggest any obvious
systematic way of creating a model for a general first-order theory. Given an arbitrary
theory, we don’t have much to work with — except for the theory itself. So what we do
is manufacture a model out of the theory itself, or at least part of it. Let us start with a
couple of examples to see how this might go.
If we were given PA to work with, but had no pre-existing idea about N, how could we
proceed? Well, first we would notice that there is a constant 0̄, so we must have a corresponding
member of our model, and that gives us 0, or rather, we can use the term 0̄ itself as that
member. Next we notice that there is a function suc, so we have to also have suc(0̄) and then
suc(suc(0̄)) and then suc(suc(suc(0̄))) and so on. Using our other notation for successors, we
now have
    0̄ , 0̄⁺ , 0̄⁺⁺ , 0̄⁺⁺⁺ , . . .
and it looks like we have all of N (or at least something we can use to manufacture it)
already.
But note how we got this far — we started with the “constant” zero, which we had to have,
and then noticed that applying the function suc to it should give us more members which we
have to have also. But we cannot stop there: there are other functions too, giving us other
terms. We have to have things like

    0̄⁺ + 0̄⁺⁺ ,   0̄⁺⁺·0̄⁺⁺⁺   and even   (0̄⁺ + 0̄⁺⁺)·(0̄⁺⁺ + 0̄⁺) .

Now we know that lots of these should be the same as one another, and that which are the
same as which should be sorted out by the theorems of the theory. Actually doing this is
quite messy, however there is an easy way out: at present we are trying to manufacture a
plain model, and we don’t really care if it respects equality. We will press ahead, treating
the equality symbol (if it exists in our theory) just like any other binary relation. Basically,
creating a model without worrying about respecting equality will be hard enough without
that extra complication.
(Of course, we do want to know that, if the theory happens to be one with equality, then
there is a model which respects it. We will prove that after proving the main theorem by
showing that, if the theory happens to be a theory with equality which has a model, then
from that we can manufacture a model which does respect equality.)

So let us try this: Our model (let’s call it M ) should consist of all “constant” terms in the
language. By that I mean all terms of the language which do not contain any variables, so
terms like x + 0̄+ won’t be members of our model.
( Now I have a problem with my coloured fonts. So far I have adopted a convention that
members of the formal language are written in green and members of the model in black,
or perhaps brown. The trouble is, now the terms of the language are both. I think that the
best thing to do is leave them in green.)
We need to define the actions of the functions on our model. Let us start with addition: we
want to define its action +_M on the model. For example, we have two members of the model
(terms) 0̄⁺ and 0̄⁺⁺; how are we to define their sum? Well, there’s an obvious candidate, the
sum as already defined in the formal language. So we define
    0̄⁺ +_M 0̄⁺⁺ = 0̄⁺ + 0̄⁺⁺ .

Here, on the left hand side, we have the addition function of the model acting on two
members of the model and, on the right hand side, a single term of the language in its
capacity as a single member of the model. It is easy to see that we can do the same thing
for the other functions of the language (the successor function and multiplication); in fact
we can make this into a definition that will work for any first-order language at all.
Let f be a function symbol of the formal language (of arity n say); we define the action f_M
of f on the model M as follows. For any constant terms t₁, t₂, …, tₙ of the language (which
are therefore members of the model M also),
    f_M(t₁, t₂, …, tₙ) = f(t₁, t₂, …, tₙ) .

As above, on the left hand side, we have the function f_M of the model acting on n members
of the model and, on the right hand side, a single term of the language in its capacity as a
single member of the model. It is not hard to show that this definition satisfies the required
property for the interpretation of a function symbol.
We also have to define how relations behave in the model. So, supposing we have a relation
symbol r of arity n and we want to define a corresponding relation r_M on the model, I think
that there is a natural way to do it: for any constant terms t₁, t₂, …, tₙ of the language
(which are therefore members of the model M also),
    r_M(t₁, t₂, …, tₙ)   if and only if   r(t₁, t₂, …, tₙ) in the theory .

(This definition turns out to work nicely provided the theory is complete, but gives trouble
otherwise. We will cross this bridge (and we can) when we come to it.)
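The term-model idea is concrete enough to programme. Here is a small Python sketch (my own illustration for a PA-like language; the representation of terms as strings is arbitrary) of the two ingredients just described, the constant terms and the purely formal action f_M of a function symbol:

    # A toy term model: the members are the variable-free terms, and each
    # function symbol acts on them formally.  Illustrative sketch only.
    from itertools import islice, product

    CONSTANTS = ["0"]
    FUNCTIONS = {"suc": 1, "+": 2}              # symbol -> arity

    def constant_terms():
        """Generate the variable-free terms in rounds of increasing depth."""
        seen = list(CONSTANTS)
        yield from seen
        while True:
            new = []
            for sym, arity in FUNCTIONS.items():
                for args in product(seen, repeat=arity):
                    t = sym + "(" + ",".join(args) + ")"
                    if t not in seen and t not in new:
                        new.append(t)
            seen += new
            yield from new

    def f_M(sym, *members):
        """The action of a function symbol on the model: build the new term."""
        return sym + "(" + ",".join(members) + ")"

    print(list(islice(constant_terms(), 5)))
    # ['0', 'suc(0)', '+(0,0)', 'suc(suc(0))', 'suc(+(0,0))']
    print(f_M("+", "suc(0)", "suc(suc(0))"))    # '+(suc(0),suc(suc(0)))'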
So far, so good. But there is still another major hurdle to overcome. Consider the first-order
theory of Projective Planes, as discussed in Section 5.B. Looking at the definition there, we
see that this theory has no constant terms at all and so the method we used with PA above
is not going to work. So how are we to build a model from this theory?
The answer to this question is that this theory, like many others, has some existence theorems
— theorems of the form (∃x)P , where P is some expression. We will see that we only need
to deal with such theorems in which x is the only free variable in P (so the theorem itself

is a sentence). Since such a theorem says that something exists with the given property, it
had better exist in the model. So we just add a new constant term to our language, c say,
and fix things so that the sentence P [x/c] is true for the model. (These new constants we
add are usually called witnesses.) Of course, we will have to add a witness for every such
sentence (∃x)P and there will probably be an infinite number of them.

Here’s a bit of a dialog about this . . .


— How are you going to do all that?
— In two steps. First we’ll add the new witness constants and then we’ll fix
things so those sentences are true for the model.
— Can you just add a whole lot of constants like that and get away with it?
— Yes, we’ll prove that, assuming you start with a consistent theory, just
adding constants cannot make it inconsistent.
— OK, so how do you fix things so those sentences are true for the model?
— Easy, we just add them as new axioms (and we won’t be able to resist calling
them “witness statements”).
— Really? How do you know that adding a whole bunch of new axioms won’t
make the theory inconsistent?
— We are only going to do it when we already have a theorem (∃x)P , and we
will prove in that case that adding the new axiom will not cause an inconsistency.
— But there is still a problem. This process involves extending the theory by
adding new constants and axioms. But it’s not the new big theory that we
want a model for, but the old one we started with.
— True, but that’s the easy part. We prove that when we extend a theory,
any model for the extension is a model for the original. (In fact this is pretty
obvious already, if you think about the definition of a model.)
— I’ve just looked at the axioms for Projective Planes and I don’t see any
sentence of the form (∃x)P .
— Sorry, Axiom PP5 is of that form, and there are lots of theorems of that
form which are not in the list of axioms. Let us look at this properly now . . .

Axiom PP5 is of the form
    (∃x₁)(∃x₂)(∃x₃)(∃x₄)Q
where Q is a complicated-looking expression whose only free variables are x₁, x₂, …, x₄.
(There is no harm here in using the sloppy functional notation and writing Q(x₁, x₂, …, x₄)
for Q.) So we will be adding a constant, let’s call it c₁, and a new axiom
    (∃x₂)(∃x₃)(∃x₄) Q(c₁, x₂, x₃, x₄) .
Being an axiom, this is also a theorem, so we treat it the same way to get another constant
c₂ and axiom
    (∃x₃)(∃x₄) Q(c₁, c₂, x₃, x₄) ,
and so on. Eventually we end up with four new constants, c₁, c₂, …, c₄ and the axiom
Q(c₁, c₂, …, c₄), which is what we wanted.

So, our proof goes in several stages. We will deal with the case in which the language is
countable first, since that is complicated enough. When that is done, I will go through the
small changes necessary to make the whole thing work for a language of any cardinality.
Stage 1 We start with our given consistent theory, A say, and add a countable set of
new constants; this creates an extension B of the original theory. We will prove before
starting that making an extension by adding constants in this way to a consistent theory
creates a new theory which is also consistent.
Stage 2 We add new axioms (the witness statements) to B to turn the new constants into
the required witnesses. This does not add any new symbols, so the language stays the same;
however, it does create new theorems, so the theory is extended in that sense. We prove that
this new theory, which we will call C, is also consistent.
Stage 3 Then we invoke a theorem (that we have already proved: Theorem 6.G.13) that
every consistent theory has an extension which is both consistent and complete, to create
such an extension D of C.
Stage 4 We construct the model from the constant terms of D using the technique de-
scribed above and prove that it is indeed a model of D and therefore of A also.
Stage 5 We prove the theorem that if a first-order theory with equality has a model, then
it has one which respects equality.

[Diagram: the chain of theories built in Stages 1–4 and the two models. Given theory A → (Stage 1: add b1 , b2 , . . . to get B) → (Stage 2: add witness statements to get C) → (Stage 3: extend to complete to get D) → (Stage 4: model of D, which may not respect equality) → (Stage 5: model which does respect equality).]



B.2 Stage 1
This stage, adding plain constants, is not quite as simple as one might think. The point is
that, having added more constants, one then immediately gets lots more expressions and,
since the axioms of PL are all schemas, we get lots more instances of these too.

B.3 Symbol replacement


We are about to use a new kind of substitution of one symbol for another in expressions.
This will be much simpler than the [v/t] kind of substitution defined in 3.A.16, because now,
when substituting t for v, we simply substitute for every occurrence of v. In particular, if
v happens to be a variable symbol we pay no attention to whether it is bound or free, we
substitute for both kinds.
This kind of substitution is much easier to deal with than the [v/t] kind because we do not
have to worry about either binding or acceptability. Since it is rather different, let us give
it a different name and call it symbol replacement.
For example, we could replace x by u in the expression

P (x) ⇒ (∀x)Q(x) (–1)

to get
P (u) ⇒ (∀u)Q(u). (–2)

If u is a variable symbol, the result (–2) is also an expression, but if it is (for instance) a
constant symbol the result is no longer an expression (because then (∀u) is not allowed). Of
course, more bizarre replacements are allowed; for example, we could replace ) by ¬ in (–1)
to get
P (x¬ ⇒ (∀x¬Q(x¬ (–3)

which makes no sense at all.

However, the symbol replacements we are going to be concerned with here are only of one
simple kind, namely replacing a constant symbol by a variable symbol.
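As a quick operational illustration (a toy Python sketch of my own, treating an expression as a flat list of symbol tokens; this is not the book's formal inductive definition):

    # Replace every occurrence of one symbol by another, paying no
    # attention whatsoever to binding, freeness or acceptability.
    def replace_symbol(expression, old, new):
        return [new if token == old else token for token in expression]

    # Replacing x by u in  P(x) => (Ax)Q(x)  hits bound and free alike:
    expr = ["P", "(", "x", ")", "=>", "(", "A", "x", ")", "Q", "(", "x", ")"]
    print(replace_symbol(expr, "x", "u"))
    # ['P', '(', 'u', ')', '=>', '(', 'A', 'u', ')', 'Q', '(', 'u', ')']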

B.4 Lemma
Let L be a first-order language in which P is an expression, c a constant symbol and v a
variable symbol. Then the result of symbol-replacing c by v in P is also an expression.
(In other words, the kind of replacement we are interested in here does not convert expres-
sions into rubbish strings.)

Proof. As exercise. It would be a good idea to convince yourself that you can make a
careful proof of this. First define replacement by induction over the construction of terms
and expressions (this is almost the same as the "careful" part of Definition 3.A.16 but much
simpler because questions of acceptability do not arise). Then prove this proposition by a
similar induction. 

B.5 Lemma
Let B be a first-order theory, P be a theorem of B and let c1 , c2 , . . . , ck be distinct constant
symbols none of which occur anywhere in the proper axioms of B. Then there are
variable symbols v1 , v2 , . . . , vk such that the result of symbol-replacing ci by vi for i =
1, 2, . . . , k in P is also a theorem of B.

The variable symbols vi can be found thus: take any proof L1 , L2 , . . . , Ln of P and let the
vi be any variable symbols which occur nowhere in this proof. (Since a proof is of finite
length, such variable symbols must exist.)
Note Here the constant symbols ci may appear in some of the axioms PL1 – PL5 of Predicate
Logic; indeed, as pointed out above, they must do so.

Proof. Assume that the proof L1 , L2 , . . . , Ln of P and variable symbols v1 , v2 , . . . , vk have
been chosen as suggested above. Let L′1 , L′2 , . . . , L′n be the results of making the symbol-
replacements ci → vi for all i in this given proof. Since L′n is the result of making these
replacements in P , it is now enough to show that this new proof is indeed a valid proof.
We prove this in the usual way, by showing that each L′k is either an axiom, or follows from
earlier steps by MP or UG.
If Lk is one of the axioms PL1 – PL5 of Predicate Logic then the result of the replacement in
it is to convert it to another instance of the same axiom schema. Perhaps we need to look
at this a little more closely. If Lk is an instance of Axiom PL1,

Lk = P ⇒ (Q ⇒ P )

then
L′k = P ′ ⇒ (Q′ ⇒ P ′ )
(where P ′ and Q′ are defined in the obvious way) and this is obviously another instance of
PL1. The proofs for Axioms PL2 and PL3 are the same.
If Lk is an instance of PL4,

Lk = (∀x)A ⇒ A[x/t] where the substitution [x/t] is acceptable in A.

Now all the ci are different from x but they might or might not occur in t. In any case we
have
L′k = (∀x)A′ ⇒ A′ [x/t′ ]
(where t′ is the result of making the ci → vi replacements in t) and it is not difficult to see
that the substitution [x/t′ ] is acceptable in A′ also.
If Lk is an instance of PL5,

Lk = (∀x)(S ⇒ A) ⇒ (S ⇒ (∀x)A) where x does not occur free in S

then (again since x and the ci are different)

L′k = (∀x)(S ′ ⇒ A′ ) ⇒ (S ′ ⇒ (∀x)A′ )

and it is easy to see that x does not occur free in S ′.



That disposes of the axioms PL1 – PL5. If Lk is a proper axiom, then none of the ci occur
in it and so L′k = Lk and the result is trivial.
Now suppose that Lk follows from two earlier steps by MP. That means that there are two
earlier steps of the forms Li and Lj = Li ⇒ Lk . But then the new proof contains earlier
steps of the forms L′i and L′j = L′i ⇒ L′k and the result is again trivially true.

Finally, suppose that Lk follows from an earlier step Li by UG. (Note, we are talking about
a formal proof of a theorem here: the condition (ug∗) can be ignored.) That means that Lk
is of the form (∀x)Li . So L′k is (∀x)L′i and follows from L′i by UG. 

B.6 Proposition
Let A be a consistent first-order theory. Create a new theory B by adding a set of new
constant symbols, C say, and extending its expressions and the Axioms PL1 – PL5 as
necessary to embrace the new constants. Then B is a consistent first-order theory also.

Note As pointed out above, adding new constants to the set of symbols will result in
enlarging the set of expressions, and in a first-order theory, the PL axioms PL1 – PL5 must
be schemata, and so now we have more instances of them too. On the other hand, there is
no necessity to enlarge the set of proper axioms, and we are not going to do that here.

Proof. B is consistent, for if not then there is an expression P of B such that ⊢ P ∧ ¬P in
B. The proof of this is finite and so contains only a finite number of variable and constant
symbols of B. We may therefore replace each constant c which appears in this proof by a
variable symbol which did not appear in the proof. By the lemma above, this transforms
the proof into a new correct proof of a statement of the form P ′ ∧ ¬P ′ in A, contradicting
the consistency of A. 

B.7 Stage 2
We are now going to add the witness statements as new axioms. For each theorem of the
form (∃x)P we want to have a constant c such that ⊢ P [x/c].

We cannot just add all such statements as axioms, because as we add axioms we will get
more theorems and, for all we know, lots of these theorems might be new ones of the same
form (∃x)P (with a new x and P ), so we would get into a complicated vicious circle.
We wriggle out of this problem by starting with all expressions P with exactly one free
variable (whether they are theorems or not) and adding axioms

(∃x)P ⇒ P [x/c] (–1)

for all of them. Then, if (∃x)P turns out to be a theorem, the P [x/c] will be too, as we
want, and if (∃x)P turns out not to be a theorem, the new axiom gives us nothing. (The
set of all such P does not change as we add axioms, so we don’t get the circle problem.)
There is one more adjustment to make. The proof should work with our fully-formal lan-
guage, which does not have the ∃ symbol. So (–1) is semi-formal for

¬(∀x)¬P ⇒ P [x/c]

which is equivalent to
¬P [x/c] ⇒ (∀x)¬P
Now we are going to add an axiom of this form for every expression P with one free variable
and we can simplify this a bit by replacing P by ¬P :

P [x/c] ⇒ (∀x)P . (–2)

This simpler form of the witness statements makes the proof go more easily, so this is what
we’ll use.

B.8 Lemma
Let A be a countable consistent first-order theory. Then there is a consistent extension C
of A which has the property that, for every expression P with exactly one free variable x,
there is a constant c and a theorem P [x/c] ⇒ (∀x)P in C.

Proof. First, we use Stage 1: Create a new theory B by adding a countably infinite set of
new constant symbols b1 , b2 , b3 , . . . to A and extending its expressions and Axioms PL1 –
PL5 as necessary to embrace the new constants. By Proposition B.6, B is consistent.
(And A ⊆ B of course.)
It will be convenient in this proof to mention axioms for these theories. Starting with A,
we will assume that it has a set of proper axioms, 𝒜 say. (If it does not come already equipped
with axioms, we can always use the whole of A as its own set of proper axioms: 𝒜 = A. This may
seem a strange thing to do, but nobody said that you can't repeat the PL axioms as proper
axioms too.) When we add the constants b1 , b2 , . . . to form B, as mentioned above, the
logical axioms of PL extend to encompass these new constants, but the proper axioms do
not change: so we can take 𝒜 as a set of proper axioms for B too.
Let P1 , P2 , . . . be an enumeration of all the expressions of B that contain exactly one free
variable, and let x1 , x2 , . . . be their free variables. (We number them the same way so that,
for each i, xi is the unique free variable of Pi . Note that the Pi are all different, but the xi
will not be.)
Now we add the witness statements as described above. But first, we choose the constants
ci from among the bi we already have — but they have to be chosen craftily.

Choose a subsequence c1 , c2 , . . . from the new constants b1 , b2 , . . . in such a way that, for
each i,
(i) ci does not occur in any of P1 , P2 , . . . , Pi
(ii) ci is different from all its predecessors c1 , c2 , . . . , ci−1 .

(This can be done for any i because there is an infinite number of bi to choose from, and
the two conditions only rule out a finite number of them.)
For each i, let Wi be the expression

Wi = Pi [xi /ci ] ⇒ (∀xi )Pi . (–1)



(These are the witness statements.) Note that, since xi was the only free variable in Pi , Wi
is a sentence — no free variables. To prove consistency of the result, we will add them one
at a time.
For each n, let Bn be the first-order theory with proper axioms 𝒜 ∪ {W1 , W2 , . . . , Wn }.
Also, let C be the one with proper axioms 𝒜 ∪ {W1 , W2 , . . .}. It is obvious that

B0 = B
For every n, Bn is an extension of B.
For every n ≥ 1, Bn is the extension of Bn−1 generated by it together with Wn .

Hence these theories form a chain.
C is an extension of every Bn and in fact C = ⋃{ Bn : n ∈ N }.
We prove that every Bn is consistent by induction; then the fact that C is consistent follows
immediately.

Firstly, we know that B0 = B is consistent by Stage 1. We now show that, for any n ≥ 1, if
Bn−1 is consistent, then so is Bn ; this we do by showing that, if Bn is inconsistent then so
is Bn−1 . Making the assumption that Bn is inconsistent, it follows that any expression at
all is provable in Bn ; in particular,

⊢ ¬Wn (in Bn ).

But Bn is generated by Bn−1 together with Wn . Therefore

Wn ⊢ ¬Wn (in Bn−1 ),

and so
⊢ Wn ⇒ ¬Wn (in Bn−1 ).
Uh oh! I’ve just used the deduction theorem and the line it came from was an entailment,
not a deduction. Is that OK? Yes, because the hypothesis Wn has no free variables, and in
that case a deduction and an entailment are the same thing.
From this it follows that
⊢ ¬Wn (in Bn−1 ).
Using (–1), the definition of Wn (and some SL), we see that

⊢ Pn [xn /cn ] ∧ ¬(∀xn )Pn (in Bn−1 )

and so both
⊢ Pn [xn /cn ] (in Bn−1 ) (–2)
and
⊢ ¬(∀xn )Pn (in Bn−1 ). (–3)
Now the constant cn is not a symbol of A and so does not occur in any of its axioms. By
its choice, it also occurs nowhere in any of P1 , P2 , . . . , Pn or c1 , c2 , . . . , cn−1 . Therefore it
occurs nowhere in any of W1 , W2 , . . . , Wn−1 . This shows that cn occurs nowhere in any of
the proper axioms of Bn−1 .

Using (–2) and Proposition B.5 above, there is a variable symbol v such that the result of
replacing cn by v in Pn [xn /cn ] is a theorem of Bn−1 .
Now Pn contains no occurrences of cn and Pn [xn /cn ] is the result of replacing the free
occurrences of xn in Pn by cn . Consequently, the result of replacing these occurrences of
cn in turn by v is just the same as replacing all free occurrences of xn in Pn by v (think
about it!) — and that is just Pn [xn /v]. This tells us that the result of replacing cn by v
in Pn [xn /cn ], which we have just seen is a theorem of Bn−1 , is just Pn [xn /v]. So we have
proved that
⊢ Pn [xn /v] (in Bn−1 ).

We now apply UG and conclude that

⊢ (∀v)Pn [xn /v] (in Bn−1 );

but (–3) gives (using Proposition 3.D.2)

⊢ ¬(∀v)Pn [xn /v] (in Bn−1 )

showing that Bn−1 is inconsistent, as required. This completes the induction, showing that
every Bn is consistent.
It follows that C is consistent also. 

Stage 3
Stage 3 is now easy. We have already done the work.

B.9 Lemma
Let A be a countable consistent first-order theory. Then there is a complete consistent
extension D of A which has the property that, for every expression P with exactly one free
variable x, there is a constant c and a theorem P [x/c] ⇒ (∀x)P in D.

Proof. Apply Theorem 6.G.13 to the previous Lemma. 

B.10 Stage 4
Now we build the model, let us call it M . Its underlying set will be the set of all constant
terms of C (which is the same as the set of all constant terms of D), since this is a simple
extension.
We must make this into a structure for the language by defining the actions of the functions
and relations of the language on this set.
We must then show that this structure is a model by showing that all sentences in A are
true for M .

B.11 Theorem
Every countable consistent first-order theory has a countable model.
Remarks This theorem is in fact true if we replace “countable” by any cardinality. I
prove the countable case here first for two reasons. Firstly, the proof in the countable case is
complicated enough without adding the extra tricks needed to get an arbitrary-cardinality
version to work and, secondly, the countable version does not require the Axiom of Choice
for its proof and the general version does. In the next theorem following I prove the general
theorem by adding these tricks.

Proof. Let A be the given theory. Create the theory D as in Lemma B.9 above. Let M
be the set of all constant terms in D (which is the same as all constant terms in B or in C
since they share the same language).
We make M into a structure for this language as follows:
for any n-ary function symbol f , define the function fM : M n → M by

fM (s1 , s2 , . . . , sn ) = f (s1 , s2 , . . . , sn ) (–1)

(and note that, in particular, each constant symbol bi is interpreted as itself, (bi )M = bi ),
for any n-ary relation symbol r, define the n-ary relation rM on M by

rM (s1 , s2 , . . . , sn ) is true if and only if ⊢ r(s1 , s2 , . . . , sn ) (in D). (–2)

We now show that M is a model for D. This will complete the proof, since then it will be a
model for A and is obviously countable. To do this, we show that for any sentence (no free
variables) C in D,

⊢ C (in D) if and only if C is true for M

and this we do by induction over the construction of C.


• If C is an atomic expression, then this is true by the definition of the action of
relations on M , (–2) above.

• Suppose now that C is ¬A, for some expression A. Then A is a sentence also, so by
the inductive hypothesis, A is true for M if and only if ⊢ A (in D).
If C is true for M , then A is false for M and so ⊬ A (in D).
But D is complete, so ⊢ ¬A (in D), that is, ⊢ C (in D), as required.

On the other hand, if C is not true for M , then A is true for M and so ⊢ A (in D).
Since D is consistent, ¬A is not provable in D, that is, ⊬ C (in D).
(Here is where we use the consistency of D.)
• Suppose next that C is A ⇒ B. (The proof is much the same as for the last case,
but here it is anyway.) Since C is a sentence, so are A and B. Then, by the inductive
hypothesis, A is true for M if and only if ⊢ A (in D), and the same is true for B.

If C is false for M , then A is true and B false for M .
Therefore ⊢ A and ⊬ B (in D).
But then (by completeness) ⊢ ¬B and so ⊢ A ∧ ¬B, from which ⊢ ¬C (in D)
and so (by consistency again), ⊬ C.
If C is true for M , then either A is false or B is true for M .
If A is false for M then ⊬ A (in D) and so, by completeness, ⊢ ¬A (in D),
from which ⊢ A ⇒ B.
If B is true for M , then ⊢ B (in D) and again we have ⊢ A ⇒ B.
• Finally suppose that C is (∀x)A. Since C is a sentence, no variable symbols other
than x can occur in A.

If x does not occur in A, then ⊢ (∀x)A if and only if ⊢ A (in D), and C is true for M if and only if A is,
so the result follows immediately by induction.
We may now suppose that A has exactly one free variable x. Then it is one of the expressions
P1 , P2 , . . . listed at the beginning of the proof: there is some k such that C = (∀xk )Pk .

Let C be true for M . Then, by the definition of a model, Pk is true for every interpretation
in M and so, in particular, Pk [xk /ck ] is true in M .
Then, by the inductive hypothesis, ⊢ Pk [xk /ck ] (in D).
But D contains, as an axiom, Wk , that is, Pk [xk /ck ] ⇒ (∀xk )Pk , and so ⊢ C (in D).
Let C be false for M . Then, again by the definition of a model, there is some interpretation
in M under which Pk is false.
Suppose this interpretation maps xk to t (a member of M and so a closed term of the
language); then Pk [xk /t] is false in M and so, by the inductive hypothesis, is not provable in
D. But then, by PL4, (∀xk )Pk , that is C, is not provable either. 
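As a toy illustration of the construction (my own example, not one from the text): if the language of D were to contain just one constant symbol 0̄ and one unary function symbol s, the underlying set of the model would be

M = { 0̄ , s(0̄) , s(s(0̄)) , . . . } ,

with sM (t) = s(t) for each constant term t, and with rM (t) true exactly when ⊢ r(t) (in D).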

This innocuous-looking theorem has a most extraordinary corollary . . .

B.12 Example: A countable model of MK Set Theory!


Assuming that MK is consistent, it has a countable model. The same is true of ZF.

Proof. Because they are both countable theories. 

This seems paradoxical — Set Theory contains assertions of the existence of uncountable
infinite sets, P (N) for example. How can such assertions be true in a countable model?
A model, M say, of MK contains members corresponding to sets and a binary relation
corresponding to the relation symbol ∈. However this binary relation on M need not itself
be set membership — it probably won't be — so let us denote it ∈M . Let P be the member
of M corresponding to P (N). Since M is countable the set of all x in M such that x ∈M P
is countable. Thus, standing outside the theory, we can construct a one-to-one function
between N and these “members” x of P . However there is nothing in the model which does
the job of this function.

This theorem, together with 5.C.5 above, gives us . . .



B.13 Corollary: the Fundamental Theorem of Model Theory (countable case)


A countable first-order theory is consistent if and only if it has a model.
Finally!

And here comes the arbitrary cardinality version. We can prove this version by some minor
tinkering with the proof of the countable one. All we need to do is replace the (countable)
sequence b1 , b2 , . . . by one big enough to be the same size as the set of all those P (x)
statements. We will also want it to be a sequence, indexed by ordinals, so that we can
define the c1 , c2 , . . . subsequence.

AC
B.14 The “Downward” Löwenheim-Skolem Theorem
Every consistent first-order theory in a language K has a model of cardinality no greater
than that of K.

Proof. Let A be the given theory in a language K of cardinality κ. Create a set B = {bi }i<κ
of the same cardinality as K and disjoint from it. Let L be the new language formed from
K by adding (all the members of) B as new constants to the language and extending the
language as necessary. Observe that L is also of cardinality κ. Let B be the correspondingly
extended theory. Now B is consistent by Proposition B.6.
The set of all expressions in B which have at most one free variable is easily seen to be of
cardinality κ, so let us index it thus: {Pi (xi )}i<κ . Define a sequence {ci }i<κ in B by: for
each i < κ, ci is the first member of B that does not occur in any of the {Pj (xj )} for j < i
and is different from all the {cj } for j < i.
(How do we know that such a ci exists at all? Let i∗ be the cardinality of i and note that
i∗ ≤ i < κ. The set of all members of B which occur in any of the {Pj (xj )} for j < i is the
union of i∗ finite sets, and the set of all the {cj } for j < i is of cardinality i∗. So the set of
all members of B which may not be chosen, being the union of these two sets, is finite when
i is finite and of cardinality i∗ otherwise; either way its cardinality is less than κ. But B is
of cardinality κ, so there must be members left over.)
Now, for each i < κ, let Wi be the expression

Wi = Pi [xi /ci ] ⇒ (∀xi )Pi

Define a sequence of theories {Bν }ν≤κ (all with the same language L) by: Bν is the theory
obtained by adding all of {Wi }i<ν to the axioms of B, and set C = Bκ . It is obvious that
B0 = B

Every Bν is an extension of B.
For every ν, Bν+1 is the extension of Bν obtained by adding the extra axiom Wν .
If ν is a nonzero limit ordinal, then Bν = ⋃{ Bi : i < ν }.

For all i and j such that i < j ≤ κ, Bj is an extension of Bi .



Hence these theories form a chain.


And so C = Bκ is an extension of every Bi .
Next we check that every Bν is consistent, by induction over ν. We have already proved
that B0 = B is. Suppose that Bν is consistent; the proof that then so is Bν+1 is exactly
the same as in the countable version of the proof. If ν is a nonzero limit ordinal and Bi is
consistent for all i < ν, then Bν , being the union of these, is also consistent by the usual
argument (any proof of P ∧ ¬P in Bν must involve only a finite number of axioms, and they
must all belong to some Bµ with µ < ν.)
We now know that C = Bκ is consistent. Let D be a consistent complete extension of C. The
remainder of the proof is exactly the same as for the countable version of the theorem above,
noting that M , being a subset of L, is of cardinality no greater than κ. 

(This proof only requires the Axiom of Choice in its tacit assumption that every language
does in fact have a cardinality. If we do not accept the Axiom of Choice, and so do not
accept that every set necessarily has a cardinality, then we can interpret this theorem to
apply only to languages that do happen to have a cardinality. In that case the Axiom of
Choice is not involved.)
This theorem, together with 5.C.5 above, gives us the big result . . .

AC
B.15 Corollary: the Fundamental Theorem of Model Theory (general case)
A first-order theory is consistent if and only if it has a model.
And, in case we should ever want any really big models, we have another version of the
theorem that is just the thing.

AC
B.16 The “Upward” Löwenheim-Skolem Theorem
(Also called the Löwenheim-Skolem-Tarski Theorem or the LST Theorem.)
Every consistent first-order theory in a language K has a model of every cardinality greater
than or equal to that of K.

Proof. Let us suppose that the cardinality of the language K is κ and that λ is some
cardinal ≥ κ; we construct a model of cardinality λ.

Repeat the proof of Theorem B.14 above with one change: where we create the set B at the
very beginning, now create it {bi }i<λ of cardinality λ. The whole of the rest of the proof
proceeds unchanged.
Observe at the end that, since B ⊆ M ⊆ L, it follows that M has cardinality λ. 

(It is possible to rewrite this theorem in a way which does not require the Axiom of Choice:
Every consistent first-order theory in a language K which has a cardinality has a model of
every cardinality greater than or equal to that of K.)

This theorem also has some surprising corollaries, for instance:

B.17 Corollary
There is a model of First-Order Number Theory (PA) which is uncountable. 
(Note that this one does not require the Axiom of Choice, because the theory is countable
and therefore does have a cardinality.)
There is an important consideration which has been ignored so far in this section: we have
not considered equality at all and in particular, there has been no attempt made to ensure
that the models we have been creating respect equality. Now first-order theories do not
necessarily have a symbol for equality, and our theorems so far have been perfectly general:
they apply to any theory, whether it is a theory with equality or not, by the simple expedient
of treating equality just like any other relation. But the result of this is that our models
so far need not respect equality — indeed, they almost certainly will not. However, most
of the first-order theories we are interested in are theories with equality, so the question of
whether such a theory has a model which respects equality is begging for an answer.

The next theorem fills that gap.

B.18 Theorem
If a first-order theory with equality has a model then it has a model which respects equality.
More specifically, if it has a model of cardinality κ, then it has a model of cardinality no
greater than κ which respects equality.

If you are familiar with the notion of a quotient group or a quotient ring from algebra — or
better still, the notion of a quotient of any kind of algebra — then this proof should look
very familiar. In fact, if we ignore the rôle of the relation symbols, a model is just a kind of
algebra, and what we are about to do is create a quotient algebra, at the same time checking
that relations behave the way we need them to.

Proof. Let A be a first-order theory in a language L with equality and M be a model for
A. We use M to construct another model M ′ which respects equality.
Step 1 Define a relation ∼ on M by: a ∼ b if either a = b or, for every interpretation
θ : tL → M , there are terms s, t ∈ tL such that ⊢ s = t, θ(s) = a and θ(t) = b.
Note that it is obvious that ∼ is reflexive and symmetric, however it may not be transitive.

Now suppose that a1 , a2 , . . . , an and b1 , b2 , . . . , bn are members of M and that ai ∼ bi for


i = 1, 2, . . . , n. Then, for any n-ary function symbol f ,

fM (a1 , a2 , . . . , an ) ∼ fM (b1 , b2 , . . . , bn )

and, for any n-ary relation symbol r,

rM (a1 , a2 , . . . , an ) ⇔ rM (b1 , b2 , . . . , bn )

Proof of this: Let θ be any interpretation of A in M . Then, for each i = 1, 2, . . . , n, either


ai = bi or there are terms si , ti such that,

θ(si ) = ai , θ(ti ) = bi and ⊢ si = ti .

Then, by the axioms of equality,

⊢ f (s1 , s2 , . . . , sn ) = f (t1 , t2 , . . . , tn ) and ⊢ r(s1 , s2 , . . . , sn ) ⇔ r(t1 , t2 , . . . , tn ) .

Since fM (a1 , a2 , . . . , an ) = θ(f (s1 , s2 , . . . , sn )) and fM (b1 , b2 , . . . , bn ) = θ(f (t1 , t2 , . . . , tn ))


the definition of ∼ tells us that

fM (a1 , a2 , . . . , an ) ∼ fM (b1 , b2 , . . . , bn ) .

Also rM (a1 , a2 , . . . , an ) ⇔ rM (b1 , b2 , . . . , bn ) by the definition of rM .


Step 2 Now we define another relation ≡ on M by: a ≡ b if there is a finite sequence
u0 , u1 , . . . , uk (k ≥ 0) in M such that u0 = a, uk = b and ui−1 ∼ ui for i = 1, 2, . . . , k. This
is the “closure of ∼ to an equivalence relation”: it is an easy matter to check that this is
indeed an equivalence relation. Further, it is a congruence relation, in the sense that it has
the following properties inherited from ∼: If a1 , a2 , . . . , an and b1 , b2 , . . . , bn are members of
M such that ai ≡ bi for i = 1, 2, . . . , n then, for any n-ary function symbol f ,

fM (a1 , a2 , . . . , an ) ≡ fM (b1 , b2 , . . . , bn )

and, for any n-ary relation symbol r,

rM (a1 , a2 , . . . , an ) ⇔ rM (b1 , b2 , . . . , bn ) .

Proof of this: From the definition of ≡, there are natural numbers k1 , k2 , . . . , kn , all ≥ 0, and
members ui,j of M for i = 1, 2, . . . , n and j = 0, 1, . . . , ki such that ai = ui,0 and bi = ui,ki
for i = 1, 2, . . . , n and ui,j−1 ∼ ui,j for i = 1, 2, . . . , n and j = 1, 2, . . . , ki . The trickery with
subscripts here is because the chains of u’s connecting ai to bi may be of different lengths
for different i. However, we can always make any of these chains longer, by simply repeating
the last u, using the fact that ∼ is reflexive. So we may, in fact, assume that all these chains
are the same length, which is the same as assuming that all the ki are the same. So let
us do this; now we have one natural number k and members ui,j of M for i = 1, 2, . . . , n
and j = 0, 1, . . . , k such that ai = ui,0 and bi = ui,k for i = 1, 2, . . . , n and ui,j−1 ∼ ui,j for
i = 1, 2, . . . , n and j = 1, 2, . . . , k. But now, from what was proved above about ∼,

fM (u1,j−1 , u2,j−1 , . . . , un,j−1 ) ∼ fM (u1,j , u2,j , . . . , un,j )


and rM (u1,j−1 , u2,j−1 , . . . , un,j−1 ) ⇔ rM (u1,j , u2,j , . . . , un,j )
for j = 1, 2, . . . , k.

From this

fM (a1 , a2 , . . . , an ) = fM (u1,0 , u2,0 , . . . , un,0 ) ≡ fM (u1,k , u2,k , . . . , un,k ) = fM (b1 , b2 , . . . , bn )


and
rM (a1 , a2 , . . . , an ) = rM (u1,0 , u2,0 , . . . , un,0 ) ⇔ rM (u1,k , u2,k , . . . , un,k ) = rM (b1 , b2 , . . . , bn )

as required.
For you algebraists: what we have just done is show that this relation ≡ is a congruence.
Now all we need to do is “quotient it out” to get the model we want.
Step 3 Now, let M ′ be the set of all congruence classes in M (equivalence classes under
the congruence ≡). For any member x of M , we will write [x] for the congruence class of x.
We also write π : M → M ′ for the "natural projection" x ↦ [x].
We make M ′ into a structure for L as follows. For any n-ary function symbol f , define
fM ′ : (M ′)n → M ′ thus: let X1 , X2 , . . . , Xn be members of M ′, that is, congruence classes in
M . Choose representatives x1 , x2 , . . . , xn from these classes. Define fM ′ (X1 , X2 , . . . , Xn ) to
be the congruence class containing fM (x1 , x2 , . . . , xn ). The results just proved above tell us
that the particular choice of representatives x1 , x2 , . . . , xn in their classes is irrelevant. The
definition then can be written succinctly:

fM ′ ([x1 ], [x2 ], . . . , [xn ]) = [fM (x1 , x2 , . . . , xn )].

In the same way we define

rM ′ ([x1 ], [x2 ], . . . , [xn ]) ⇔ rM (x1 , x2 , . . . , xn )

It follows immediately from this definition that, if θ is an interpretation L → M , then π ◦ θ
is an interpretation L → M ′.
Step 4 We now show that, if P is any expression in L and θ any interpretation L → M ,

P is true in M under θ if and only if it is true in M ′ under π ◦ θ.

We do this by induction over the construction of P using the formal definition in 5.B.5.
(i) If P = r(t1 , t2 , . . . , tn ) then it is true in M under θ if and only if

rM (θ(t1 ), θ(t2 ), . . . , θ(tn )) is true

and it is true in M ′ under π ◦ θ if and only if

rM ′ (π◦θ(t1 ), π◦θ(t2 ), . . . , π◦θ(tn )) is true, that is, rM ′ ([θ(t1 )], [θ(t2 )], . . . , [θ(tn )]) is true

and we proved these things equivalent in the previous step.


(ii) Suppose that P is ¬Q and (inductively), for every interpretation θ : L → M , Q is
true in M under θ if and only if it is true in M ′ under π ◦ θ. Then, for any such θ,

P is true in M under θ ⇔ Q is not true in M under θ
⇔ Q is not true in M ′ under π ◦ θ
⇔ P is true in M ′ under π ◦ θ

(iii) In the case in which P is Q ⇒ R the argument is the same.



(iv) Suppose that P is (∀v)Q and, as usual, that, for every interpretation θ : L → M ,
Q is true in M under θ if and only if it is true in M ′ under π ◦ θ. Then for any such θ,

P is true in M under θ ⇔ Q is true in M under θ[v/m] for every m ∈ M
⇔ Q is true in M ′ under π ◦ θ[v/m] for every m ∈ M
⇔ Q is true in M ′ under (π ◦ θ)[v/[m]] for every m ∈ M
⇔ Q is true in M ′ under (π ◦ θ)[v/m′ ] for every m′ ∈ M ′
⇔ P is true in M ′ under π ◦ θ

In the third equivalence here we use the fact that π ◦ θ[v/m] = (π ◦ θ)[v/[m]] , which can be
checked by verifying that both these interpretations have the same restrictions to vL. The
fourth equivalence depends upon the fact that π is onto M ′.
Step 5 Now we show that M ′ is a model for A. Suppose ⊢ P in A and θ is any interpre-
tation of A in M ′. Let us write α for the restriction of θ to the set vL of variables of L (this is
the assignment which defines θ). Since the natural projection π is onto M ′, we may "factor
α through π", that is, there is a function β : vL → M such that α = π ◦ β. Now β is an
assignment vL → M , and so can be extended uniquely to an interpretation, ψ say, L → M .
Now ψ, restricted to vL, is β, so π ◦ ψ, restricted to vL, is π ◦ β = α. Since an interpretation
is defined by its restriction to the set of variable symbols, it follows that π ◦ ψ = θ. Since
M is a model, P is true in M under ψ. Now it follows from the previous step that it is also
true in M ′ under π ◦ ψ = θ.
Step 6 The new model M ′ respects equality: if ⊢ s = t and θ is any interpretation
L → M ′, construct ψ : L → M so that θ = π ◦ ψ, as in the previous step. Then ψ(s) ∼ ψ(t),
so ψ(s) ≡ ψ(t), so [ψ(s)] = [ψ(t)], that is, θ(s) = θ(t) .
Also, the function π : M → M ′ is onto, so the cardinality of M ′ is no greater than that of
M.
(Note that the Axiom of Choice is used in this last statement. It has also been used earlier
in the proof — in Step 5 it is required to factor α through π.) 

AC
B.19 Remark

It follows from this theorem that any consistent theory has a model which respects equality,
however the theorem as proved is not so specific about the cardinality of the model.

For example, a countable theory has a countable (ordinary) model M , and then the theorem
constructs the model which respects equality as the quotient model M ′, however this may
be much smaller than M . In other words, it might be countably infinite or it might be finite.
Here is an easy example. Choose any particular positive integer, say 3. Then it is easy enough
to construct a theory with axioms which ensure that a model which respects equality must
have exactly three members. For example, include three (different) constant symbols, a, b

and c say and the axioms

¬(a = b)
¬(a = c)
¬(b = c)
(∀x)( (x = a) ∨ (x = b) ∨ (x = c) )

If the model is not required to respect equality, we may construct an infinite model as above.
The only effect axioms such as these would have is to force every member of the model to
stand in the relation =M to one of the three elements which interpret a, b and c — but this
relation is not actual equality.
In setting up a theory to describe a model (for example PA to describe N), we choose some
function and relation symbols and axioms governing them. It is always possible that, either
unwittingly or on purpose, we fail to supply enough structure or axioms to completely specify
the model. In this case our theory will have several non-isomorphic models.
Now, if our theory does have non-isomorphic models, that means that there are some things
which are true in some models and false in others (that’s what non-isomorphism means).
Now these things may or may not be expressible in the language of the theory.

For example, the usual model of Peano Arithmetic is the countable set N. But the Upward
Löwenheim-Skolem Theorem tells us that it also has an uncountable model. This “count-
ability” fact, true in one model and false in the other, cannot be expressed in the language
of Peano Arithmetic.
An example of a theory which has non-isomorphic models and a fact, which is expressible in
the language, which is true for some models and false for others is not hard to manufacture.
Take any consistent first-order theory which has an independent set of axioms (for instance
PA). Make a weaker theory by choosing one of the axioms (call it A) and removing it. This
weaker theory has a model in which A is true, and another in which ¬A is true. (Why? Since
the original axioms were independent, both the original theory and the theory in which A
is replaced by ¬A are consistent, and therefore have models.)

We have seen that a first-order theory may well have a number of non-isomorphic models.
This raises the likelihood that there may be things that can be said in the language that
are true in some models and false in others. (In the next section we will see some more
interesting, perhaps unexpected, examples of non-isomorphic models.)
In view of all this, it would be nice if at least any fact that can be expressed in the language
and is true in all models has to be a theorem. A theory which has this property is called
adequate. The next theorem tells us that all first-order theories have this desirable property.
(The proof is now easy, but depends upon the heavy work we have been doing in this
chapter.)

B.20 The Adequacy Theorem (often called "Gödel's Completeness Theorem")


Any first-order theory is adequate.
In other words, in any first-order theory, the theorems are exactly those expressions which
are semantically implied by the axioms.

In other words again, if A is a first-order theory with axioms 𝒜 and P is any expression in
A, then
𝒜 ⊢ P if and only if 𝒜 ⊨ P .

Proof. That if 𝒜 ⊢ P then 𝒜 ⊨ P is just a restatement of Proposition 5.C.3(i).


Now suppose that P is not provable in A (that is, that it is not the case that 𝒜 ⊢ P ). Form
the extension B of A by adding ¬P as a new axiom. We know that B is consistent.
By Theorem B.14, B has a model. This is also a model for A, and in it P is not true. 

B.21 The Adequacy Theorem for first-order theories with equality


If A is a first-order theory with equality and axioms 𝒜 and P is any expression in A, then

𝒜 ⊢ P if and only if 𝒜 ⊨= P

(where ⊨= means semantically implied in every structure which respects equality).

Proof. To see that, if 𝒜 ⊢ P then 𝒜 ⊨= P , observe that, if 𝒜 ⊢ P then, by the previous
theorem, 𝒜 semantically implies P in every structure, including those which respect equality.

The other part of the argument is the same as for the previous theorem. Suppose that P is
not provable in A (that is, that it is not the case that 𝒜 ⊢ P ). Form the extension B of A
by adding ¬P as a new axiom. We know that B is consistent. By Theorem B.18, B
has a model which respects equality. This is also a model for A, and in it P is not true. 

C Compactness
The following theorem is easy to prove and looks innocuous, but it has a number of fasci-
nating (and very useful) consequences, some of which we will explore in this section.

C.1 The Compactness Theorem, Version 1


Let L be a first-order language, A be a set of statements in L. If every finite subset of A
generates a consistent theory then so does A .

Proof. This is straightforward — we have met this idea before. If A does not generate a
consistent theory, then there is a statement P such that A ⊢ P ∧ ¬P . There can be only a
finite number of members of A involved in the proof of this, so that finite subset generates
an inconsistent theory. 

The Fundamental Theorem of Model Theory (B.14) immediately gives us an equivalent form
of this theorem.

C.2 The Compactness Theorem


Let L be a first-order language, A be a set of statements in L . If every finite subset of A
has a model then so does A .

Proof. This is a corollary of the previous theorem, using B.14. 

C.3 Example: A model of arithmetic with an infinite number


In Section 4.C we discussed the Elementary Theory of Arithmetic, PA. The structure which
immediately springs to mind for this theory is the natural numbers, together with the usual
zero, successor, addition and multiplication functions, which we could call (N, 0̄, ⁺ , +, ·).
Let us now extend this language by adding a constant ω̄ and a countably infinite set of new
axioms
0̄ < ω̄ , 1̄ < ω̄ , 2̄ < ω̄ , . . .
(where, as usual, 1̄, 2̄, 3̄, . . . are abbreviations for 0̄+ , 0̄++ , 0̄+++ , . . . and a < b is an abbre-
viation for (∃x)(a + x+ = b) ). We could call this new theory PAω .
I have called this extra member ω̄ to suggest the notation for ordinals, since it comes at
the end of the structure, after all the ordinary natural numbers. I have put a bar above it
because that’s what we have done for all the constants in this language.
It is important to observe here that we are extending the Elementary Theory of Arithmetic:
we make the language by using all the symbols of that theory and adding the new one ω̄ and
also using all of its seven axioms (PA1) – (PA7) and adding the new ones above. Because
of this we are retaining the axiom of induction amongst other things.
In fact, we need to be a bit more careful in explaining exactly what we are doing here. It is
best to take the extension process in two steps.

First we add the new symbol ω̄. This automatically extends the language to include all the
new expressions containing ω̄. And, since this is a first-order theory, the logic axioms, which
are schemata, must also be extended to encompass all these new expressions. The induction
axiom PA3 is a schema, but it is a proper axiom, so we don’t extend it: it only refers to
expressions in the original language. Now, since we have not yet added the new axioms, this
extended theory is also consistent. (This is by Proposition B.6.)

Next we add all the new axioms.


Now if B is any finite subset of these axioms, then it can only contain a finite number of the
new axioms listed above. Thus there is a last one, n̄ < ω̄ say, which occurs in B . Then
we have a model for B: the (ordinary) natural numbers, with the usual interpretation of all
the symbols other than ω̄, and interpreting ω̄ as n + 1.
It follows that the theory itself has a model. But now the new axioms, taken together, state
that ω is greater than every one of the “ordinary” natural numbers. Note that, because of
the other axioms, the model must contain a whole host of “infinite” numbers, corresponding
to ω̄⁺ , 2̄ · ω̄ , ω̄^ω̄ and so on. Note also that, while a model for N together with an infinite
number ω is not surprising, one in which the principle of (ordinary) induction holds for the
whole set is!
How can this be? Consider, for instance, the expression x < ω̄ . Now 0̄ < ω̄ in our theory,
and a glance at the new axioms above tells us that (∀x)(x < ω̄ ⇒ x+ < ω̄) also. So the
principle of induction tells us that (∀x)(x < ω̄); and therefore ω̄ < ω̄, which is absurd
(a contradiction). SO, there is something wrong with this argument — what is it?

The answer to this question is that, in the argument above, I jumped too quickly to a
conclusion, namely that
(∀x)(x < ω̄ ⇒ x+ < ω̄) . (!!)

What we have, from the axioms, is that

for every x ∈ N, ⊢ x̄ < ω̄

and to get from here to the supposed theorem would involve all those infinite number of
axioms, and so an infinitely long proof: and there is no such thing. Now one might imagine
that there was some sneaky way of getting around this problem, and proving the theorem
labelled (!!) some other way. But this cannot be, because if this were a theorem, the
rest of the argument above is watertight and we would indeed have ω̄ < ω̄, which is an
antitheorem (by the definition of <). And that would show our new theory PAω to be
inconsistent, contradicting the Compactness Theorem.

This is one example of a number of similar games that can be played with PA, probably the
best-known one. They are generally called non-standard arithmetic.

C.4 Example: A model of the reals with infinite numbers and infinitesimals
We can apply a similar idea to the real numbers to add infinitesimals. To do this we are
going to represent R by a language having uncountably infinite sets of function and relation
symbols and an uncountably infinite number of axioms.

We define our language, L say, to have a function symbol fˆ of arity n corresponding to every
actual function f : Rn → R — the whole 2^(2^ℵ0 ) of them. Also a relation symbol r̂ of arity n
corresponding to every actual relation of arity n, that is, every subset of Rn , again 2^(2^ℵ0 ) of
them.
For our theory we simply take every expression in this language which is true in R, and we
take the axioms to be the same set of expressions (or better, just the closed ones): there’s
no harm in that, even if it is a bit unusual.
So now we have an enormous first-order theory. Nevertheless, it is in many ways quite easy
to deal with. For example, it is obvious, from the way it was built, that R (with all its
functions and relations) is a model.
Now we add an infinitesimal in much the same way as we added an infinity to PA.
We add a new constant (nullary function) ι and axioms

0̂ < ι
ι < x̂ for every positive real number x .

(We can think of x as being a nullary function, and so get x̂.)

As with the previous example, R is a model for every finite set of these axioms. Consequently
the extended language has a model.
This model is the “nonstandard reals” and is used for Nonstandard Analysis. This is lots
of fun. Because of the existence of infinitesimals, things like dx and dy become first class
citizens and you can do calculus the way you’ve always secretly wanted to. Well, almost.

C.5 Theorem
If a first-order theory has arbitrarily large finite models which respect equality, then it has
an infinite model which respects equality.

Proof. Suppose that the theory A has arbitrarily large finite models which respect equality.
Since it has models, it is consistent.
From this form a new theory A′ by adding a countable number of new constant symbols
b1 , b2 , . . . (and extending the axiom schemata of predicate logic as necessary to encompass
the new symbols). Then A′ is consistent by Proposition B.6.
Now, for each natural number n, extend the theory A′ by adding new axioms

bi ≠ bj for all i, j such that 1 ≤ i < j ≤ n .

(These axioms “say” that b1 , b2 , . . . , bn are all different.) Call the theory with these new
axioms Bn .
We now prove that Bn is consistent for each n. Our original theory A has a model, M
say, of size at least n. Thus, we can make it into a structure, M ′ say, for A′ by defining
new constants in it, b̄1 , b̄2 , . . . say. Choose these so that b̄1 , b̄2 , . . . , b̄n are all different; the
remaining ones b̄n+1 , b̄n+2 , . . . can be chosen any way at all — for example they could all be

equal. Now any interpretation of A in M extends uniquely to an interpretation of Bn in
M ′. Consequently M ′ is a model for Bn , and so it is consistent.
Now let B∞ be A′ with all the new axioms bi ≠ bj for all 1 ≤ i < j < ∞. Then every finite
subset of these axioms is contained in the axioms for some Bn and so, by the compactness
theorem, B∞ is consistent and has a model. But because of these bi ≠ bj axioms, that
model must be infinite.
Is it obvious that any interpretation of A in M extends uniquely to an interpretation of Bn
in M ′? Here's why. First note what we have done. Starting with A we created A′ by adding
the new constants, but no new axioms. Then from that we created the Bn and B∞ , all
with the same language as A′, by adding new axioms. We can consider Bn and B∞ to be
axiomatic theories whose axioms are just all the theorems of A together with the new ones
we've added.
Now we know that any function from the set of variables of A to M extends uniquely to an
interpretation (see 5.B.3), and, in the same way, any function from the set of variables of
Bn to M ′ extends uniquely to an interpretation. But the sets of variables are the same —
we only added constants. Also, the underlying sets of M and M ′ are the same — we only
designated some of its members to be constants. So, if it is not obvious already, given any
interpretation of A to M , look at its restriction to the set of variables, which then gives an
interpretation of Bn to M ′. And then by uniqueness, the new interpretation must extend the
old one.
Now M was a model for A, so every theorem of A is true for the model. Also, all the new
axioms are true for the model, because we just defined it that way. Consequently every
theorem of Bn is true for M ′ also. 

C.6 Remark
This remark applies only to theories with equality.
For any natural number n we like to choose, we can give an axiom which ensures that any
model has at least n members. Here is the example for n = 3:
(∃x)(∃y)(∃z)(x ≠ y ∧ x ≠ z ∧ y ≠ z)

In a similar fashion, for any natural number n we can give an axiom which ensures that any
model has at most n members. Here is the example for n = 3:
(∃x)(∃y)(∃z)(∀w)(w = x ∨ w = y ∨ w = z)
I think it is obvious how to construct such axioms for any n.
These two forms can be combined into an axiom which ensures that the model contains
exactly n members: Here is the example for n = 3:

(∃x)(∃y)(∃z)( x ≠ y ∧ x ≠ z ∧ y ≠ z ∧ (∀w)(w = x ∨ w = y ∨ w = z) )

If a theory has this last axiom and no more, then any model of it which respects equality
must have exactly n members. If we add this axiom to an existing theory, then either the
resulting theory will be inconsistent or any model will have exactly n members.
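As claimed above, constructing these axioms for arbitrary n is purely mechanical. Here is a small generator sketch (my own illustration, in an ad hoc ASCII syntax where E stands for ∃, A for ∀, & for ∧, | for ∨ and != for ≠):

    # Generate the "at least n members" axiom (sensible for n >= 2).
    def at_least(n):
        vs = [f"x{i}" for i in range(1, n + 1)]
        prefix = "".join(f"(E{v})" for v in vs)
        pairs = [f"{a} != {b}" for i, a in enumerate(vs) for b in vs[i + 1:]]
        return prefix + "(" + " & ".join(pairs) + ")"

    # Generate the "at most n members" axiom.
    def at_most(n):
        vs = [f"x{i}" for i in range(1, n + 1)]
        prefix = "".join(f"(E{v})" for v in vs) + "(Aw)"
        return prefix + "(" + " | ".join(f"w = {v}" for v in vs) + ")"

    print(at_least(3))   # (Ex1)(Ex2)(Ex3)(x1 != x2 & x1 != x3 & x2 != x3)
    print(at_most(3))    # (Ex1)(Ex2)(Ex3)(Aw)(w = x1 | w = x2 | w = x3)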

Let Fn be the axiom above which states that the model has exactly n members. Then ¬Fn
states that the model does not have n members, and so the countably infinite set of axioms

¬F0 , ¬F1 , ¬F2 , . . . ,

taken together, ensure that any model is infinite.

We can also force a model to be infinite with only one axiom if we add a new binary function
symbol, f say. Use the axiom

(∀x)(∀y)(∀u)(∀v)(f (x, y) = f (u, v) ⇒ x = u ∧ y = v) .

This simply states that f is injective. But that makes fM an injective function M ×M → M
and the only way this can happen (since M is nonempty by definition) is for M to be infinite
or to have exactly one member; conjoining the axiom (∃x)(∃y)(x ≠ y) rules out the latter.
On the other hand, the last theorem above tells us that there is no set of axioms (finite or
infinite and irrespective of what relations or functions we allow ourselves) which say “the
model is finite” in the same way.

In the context of Theorem B.18 and the following remarks, the next corollary is of interest.

C.7 Corollary
(i) Let T be a countable theory with equality which has an infinite model which respects
equality. Then it has a countably infinite model which respects equality.

Proof. Extend T by adding a countably infinite set of constants b0 , b1 , b2 , . . . together with


axioms
bi ≠ bj for all i ≠ j (–1)
(that’s a different axiom for each i and j). Call the extended theory T0 . By the compactness
I B.18 theorem it is consistent. Now, by Theorem B.18 it has a finite or countable model which
respects equality. But, from (–1), which must hold (with genuine equality) in the model, it
cannot be finite. 
Part III

COMPUTABILITY

9. RECURSIVE FUNCTIONS

A Partial recursive functions

A.1 Discussion
In this chapter we examine the notion of an algorithm. Think of an algorithm as a cut-and-
dried method of computing something, specified by a recipe of some kind which can be
followed blindly, without any creative thought required. A computer program makes quite
a good analogy (or example).
Algorithms are not confined to working with numbers; they can work with other kinds of
data, though it will be assumed that the data can be represented symbolically in some way.
In Appendix C we will look briefly at algorithms in their widest applications, but in this
one we will start with algorithms for computing functions involving natural numbers only,
that is, functions N → N or more widely Nm → Nn .
The arithmetic processes of addition and multiplication of large numbers and of long division
are examples of algorithms which should be familiar from schooldays.

Numerical algorithms, such as we will be discussing mostly, are expected to work for numbers
of any size. Consequently they will typically contain some looping arrangement, something
like “Keep doing these instructions until so-and-so happens”.
So an algorithm will consist of a list of instructions, each one of which is quite unambiguous
and can always be performed, together with an arrangement to prescribe, again unambigu-
ously, which instruction to perform next. (Typically, if written in natural language, the
instructions will be followed line-by-line down the page except where “branching instruc-
tions” which create loops are encountered.) There will, of course, be one or more “End”
instructions of the form “OK, we are finished now. Here is your answer”.
One aspect of algorithms needs to be mentioned: any set of instructions, however stupid or
ridiculous, will define an algorithm, so long as those instructions are clear and unambiguous
and can be followed.
So, if one designs an algorithm to compute some function — for instance the nth prime,
given n — the first question one might ask is, “Does it actually compute the function we
want?”, followed quickly by, “Are there perhaps any special values which fail for reasons we
have overlooked?”
But there is a more basic and immediate problem with any algorithm, whether we are trying
to design one to compute a known function, or are simply writing down some interesting
instructions to see what will happen: the algorithm may fail to produce an answer at all.
Given that, as specified above, each instruction can always be performed and the algorithm
always specifies which instruction to perform next (except for “End” ones), the only way an


algorithm can fail to produce an answer is that it never gets to such an “End” instruction.
In that circumstance the algorithm will just keep going forever, usually called “getting into
an infinite loop”. (Computer programmers are familiar with this possibility.)
So, for any algorithm and in particular the ones to compute functions Nm → Nn discussed
in this chapter, we have to allow for the possibility that, for some input values, there will
be no output value. In other words, we will be dealing with partial functions Nm ⇢ Nn . I
will use, as much as possible, a special arrow to denote this: Nm ⇢ Nn .
In this chapter I start with the most common definition of partial functions Nm ⇢ Nn
which can be computed algorithmically. The standard word for these is partial recursive
functions. In fact, we start with functions Nm ⇢ N and then make the more general
definition later.
The definition we start with really does not look as though it will capture every algorithmically
computable function, but this is the standard definition, so let us run with it. As we progress
through the chapter it will become more apparent that pretty well any way of computing
a function you can think up will be covered by this definition. But that of course is not a
proof that we are indeed covering any algorithmically computable function this way; that
we will establish in Appendix C.

A.2 Notation
(1) Partial functions. We deal with partial functions Nn ⇢ N, that is, functions
D → N, where D is a subset of Nn . The subset D may be equal to Nn , in which case the
function is total, or it may be empty, in which case the function has no values at all and we
say the function is empty. Wherever possible I will use a dotted arrow for partial functions,
as above.

(2) Substitution By substitution of such functions, we mean the process of obtaining,


from a partial function f : Nm ⇢ N and partial functions g1 , g2 , . . . , gm : Nn ⇢ N the new
partial function h : Nn ⇢ N defined by
h(x) = f (g1 (x), g2 (x), . . . , gm (x)) .
This is a kind of composition operation; in fact, when n = 1 it is exactly ordinary compo-
sition. It is assumed that h is defined wherever the right hand side of this equation makes
sense — that is, x is in the domain of h iff it is in the intersection of the domains of all the
gi and, moreover, ⟨g1 (x), g2 (x), . . . , gm (x)⟩ is in the domain of f .

A useful notation for this kind of substitution is to write the function simply
h = f (g1 , g2 , . . . , gm )
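For readers who like to see this operationally, here is a minimal Python sketch (my own illustration, not part of the formal development), modelling a partial function as an ordinary function that returns None where it is undefined:

    # Substitution h = f(g1, ..., gm): h(x) = f(g1(x), ..., gm(x)),
    # defined only where every gi(x) is defined and the resulting
    # tuple lies in the domain of f (undefinedness signalled by None).
    def substitute(f, *gs):
        def h(*xs):
            vals = [g(*xs) for g in gs]
            if any(v is None for v in vals):
                return None
            return f(*vals)
        return h

For example, substitute(lambda a, b: a + b, g1, g2) is then the partial function x ↦ g1(x) + g2(x).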

(3) Minimalisation Now let g be a partial function Nn+1 ⇢ N. The operation of
minimalisation applied to g yields a new partial function Nn ⇢ N, often written µg, and
defined
µg(y) = the smallest value of x such that g(x, y) = 0 and g(ξ, y) is defined for all ξ < x,
provided such an x exists,
and is undefined if no such x exists.

This is also often written

µg(y) = min_x { g(x, y) = 0 }.

Be careful of this notation; it can be misleading because it obscures the important fact that,
for x to be the value of µg(y), not only must x be the first zero of g(x, y), but all preceding
values must exist.

It might help to think of this operation as looking for the first zero value of f in a very
straightforward algorithmic sort of way. One just computes f (0), f (1), f (2), . . . in order,
stopping at the first n for which f (n) = 0 and then returning n as the value. This can fail
to return a value for two reasons: firstly, there may be no such n, that is, f (n) might never
be zero and, secondly, even if there is such an n one might encounter some i < n at which
f (i) fails to exist, in which case the whole searching process fails right there.
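As an informal illustration (not part of the formal development), here is how that search might be sketched in Python; the names Undefined and minimalise are mine, and partial functions are modelled as callables that raise Undefined where they have no value.

    class Undefined(Exception):
        """Raised where a partial function has no value."""

    def minimalise(g):
        # mu-g : y |-> least x with g(x, y) = 0, all earlier values defined.
        def mg(*y):
            x = 0
            while True:              # if g never takes the value 0, this loops
                if g(x, *y) == 0:    # forever; an Undefined raised at an earlier
                    return x         # x propagates -- exactly the two failure
                x += 1               # modes described above
        return mg

    # Example: g(x, y) = |x*x - y| first hits 0 iff y is a perfect square.
    sqrt_exact = minimalise(lambda x, y: abs(x * x - y))
    assert sqrt_exact(9) == 3
    # sqrt_exact(2) would run forever: the function is genuinely partial.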
(4) Natural subtraction. This is the operation ∸ defined by
    x ∸ y = x − y if x ≥ y,   and   x ∸ y = 0 otherwise.
This is the closest we can expect to get to ordinary subtraction and still have a function N2 → N, that is, to avoid getting any negative values.

A.3 Definition: Partial recursive function
A partial recursive function is a partial function Nn 99K N which can be obtained by a finite number of applications of the operations of substitution and minimalisation from the list:—
(i) x 7→ x + 1, the successor function suc : N → N;
(ii) ⟨x1, x2, . . . , xn⟩ 7→ xi, the projection function πn,i : Nn → N;
(iii) ⟨x, y⟩ 7→ x + y, addition N2 → N;
(iv) ⟨x, y⟩ 7→ x ∸ y, natural subtraction N2 → N;
(v) ⟨x, y⟩ 7→ xy, multiplication N2 → N.
A function is recursive if it is both partial recursive and total.
Note that all these basic functions (i)–(v) are total and that applying the operation of sub-
stitution to total functions yields a total function. Consequently, the only way a truly partial
function can arise is if the process of building it up involves minimalisation somewhere.
It is comforting to observe that all of these basic functions can be computed algorithmi-
cally (with very simple algorithms) and that applying substitution and minimalisation to
functions we already know how to compute can be done algorithmically also. Thus all our
partial recursive functions are algorithmic (at least according to the slightly vague idea of
an algorithm we have so far). The other way round is far from obvious.
A.4 Examples: of partial recursive and recursive functions
(i) The basic functions mentioned in Parts (i) to (v) of the definition above are all recursive. (They are obtained from the basic list by no applications of substitution and minimalisation, and zero is a finite number.)
(ii) The identity function idN : N → N given by n 7→ n is just the projection function π1,1.
(iii) The zero constant function 0 : N0 → N (which takes the empty sequence ⟨ ⟩ to 0) is obtained from the identity function id : N → N by minimalisation (the first x such that id(x) = 0 is 0).
The zero constant function Nn → N, for any n, can be obtained by minimalisation of πn+1,1 : Nn+1 → N. If n ≠ 0 it can be obtained alternatively as πn,1 ∸ πn,1.
Other constant functions Nn → N are obtained by composing the appropriate zero function with the successor function a suitable number of times. For example, the function Nn → N which takes every ⟨x1, x2, . . . , xn⟩ to 2 is given by suc(suc(0(x1, x2, . . . , xn))).
(iv) The difference function ⟨x, y⟩ 7→ |x − y| is obtained by |x − y| = (x ∸ y) + (y ∸ x).
(v) Here are a few simple “test functions” which check for some condition and return 1 for true or 0 for false.
The zero test function N → N and the equality test function N2 → N, defined by
    zero(x) = 1 if x = 0, and 0 otherwise;
    eq(x, y) = 1 if x = y, and 0 otherwise,
are obtained as zero(x) = 1 ∸ x and eq(x, y) = 1 ∸ |x − y|. And of course we have a nonzero test function N → N and the inequality test function N2 → N, defined by
    nonzero(x) = 1 if x ≠ 0, and 0 otherwise;
    neq(x, y) = 1 if x ≠ y, and 0 otherwise,
obtained as 1 ∸ (1 ∸ x) and 1 ∸ (1 ∸ |x − y|).
In the same way we can define tests for various inequalities:
    less(x, y) = 1 if x < y, and 0 otherwise;
    gtr(x, y) = 1 if x > y, and 0 otherwise;
    leq(x, y) = 1 if x ≤ y, and 0 otherwise;
    geq(x, y) = 1 if x ≥ y, and 0 otherwise,
obtained as less(x, y) = 1 ∸ ((x + 1) ∸ y), gtr(x, y) = 1 ∸ ((y + 1) ∸ x), leq(x, y) = 1 ∸ (x ∸ y) and geq(x, y) = 1 ∸ (y ∸ x).
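For readers who like to check such formulas mechanically, here is a small Python sketch (natsub standing in for the dotted minus; the helper names are mine, not the text's):

    def natsub(x, y):                       # the dotted minus
        return x - y if x >= y else 0

    def less(x, y): return natsub(1, natsub(x + 1, y))
    def gtr(x, y):  return natsub(1, natsub(y + 1, x))
    def leq(x, y):  return natsub(1, natsub(x, y))
    def geq(x, y):  return natsub(1, natsub(y, x))

    for x in range(8):
        for y in range(8):
            assert less(x, y) == (1 if x < y else 0)
            assert gtr(x, y) == (1 if x > y else 0)
            assert leq(x, y) == (1 if x <= y else 0)
            assert geq(x, y) == (1 if x >= y else 0)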
(vi) The squaring function x 7→ x^2 is obtained by substituting the identity function into multiplication: mult(id(x), id(x)). Higher powers x 7→ x^k (for fixed k) are obtained by repeating this construction in the obvious way. We cannot get general exponentiation ⟨x, y⟩ 7→ x^y as easily as this. It requires, dare I say it, a higher-powered method.
(vii) An integer square root, x 7→ ⌊√x⌋ (the largest natural number ≤ √x), can be described as that natural number y for which y ≤ √x < y + 1, which is the same as y^2 ≤ x < (y + 1)^2. It is thus the smallest natural number y for which (y + 1)^2 > x; thus
    ⌊√x⌋ = min_y { (y + 1)^2 > x }
         = min_y { leq((y + 1)^2, x) = 0 } .
This looks back-to-front, but it is correct because it is the function leq, not gtr, which yields 0 if and only if (y + 1)^2 > x. We are sneakily getting round the fact that, for tests like these, minimalisation searches for the first 0 = false instead of 1 = true.

(viii) The function ⟨x, y⟩ 7→ ⌊x/y⌋, defined as:
    ⌊x/y⌋ = the greatest natural number ≤ x/y if y ≠ 0, and undefined if y = 0.
In the case y ≠ 0, this is the natural number z such that z ≤ x/y < z + 1, that is, such that yz ≤ x < y(z + 1). It is thus the smallest value of z for which y(z + 1) > x, so
    ⌊x/y⌋ = min_z { y(z + 1) > x }
          = min_z { leq(y(z + 1), x) = 0 } .
As luck would have it, this also works when y = 0, whether or not x = 0 too, because then there is no z such that y(z + 1) > x, so the minimalisation fails to return a value.
(ix) The function
    Rem(x, y) = the remainder upon dividing x by y if y ≠ 0, and undefined if y = 0
is obtained by Rem(x, y) = x ∸ y⌊x/y⌋ = x − y⌊x/y⌋ (since x ≥ y⌊x/y⌋ always).
Remark. Integer division and the remainder function are not recursive as defined, only partial recursive (both being undefined when y = 0). However, as we will be using them, they usually give rise to recursive functions.

Suppose we have recursive functions f and g and define a new one h by
    h(x) = ⌊f(x)/g(x)⌋ .
This clearly defines h as a partial recursive function. However, if we happen to know that g(x) is never zero, then h is also a total function and so, in fact, recursive. The same idea holds good for the remainder function.
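As a quick illustration of the last two examples, here is integer division obtained by minimalisation, sketched in Python on top of the minimalise and natsub helpers from the earlier sketches:

    def leq(x, y):
        return natsub(1, natsub(x, y))      # 1 if x <= y, else 0

    # floor(x/y) = least z with y*(z+1) > x, i.e. with leq(y*(z+1), x) = 0.
    div = minimalise(lambda z, x, y: leq(y * (z + 1), x))

    assert div(7, 2) == 3 and div(6, 3) == 2
    # div(5, 0) would search forever: 0*(z+1) > 5 never holds, so the
    # function is undefined at y = 0, just as stated in the Remark.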
A.5 And note well
If f : N 99K N is partial recursive but not recursive (that is, not total) then x 7→ f(x) ∸ f(x) is not the zero function. Rather, it is the function
    f(x) ∸ f(x) = 0 if f(x) is defined, and undefined otherwise.
Why? Algorithms, remember, are not supposed to be smart; they just do exactly what they are told. In this case the algorithm says, “Compute f(x). Now compute f(x) again. Now natural-subtract the second value from the first.” If f(x) is undefined, the algorithm fails to complete the first step. The same goes, of course, for functions of several variables.

Question
Which of the following are true for any partial functions f, g, h : Nm 99K N and x ∈ Nm?
(i) If f(x) + g(x) = h(x) then f(x) = h(x) ∸ g(x).
(ii) If f(x) = h(x) ∸ g(x) then f(x) + g(x) = h(x).
(iii) If f(x) + g(x) = f(x) + h(x) then g(x) = h(x).

A.6 Formulas and the projection functions
In some of the examples above we built up formulas, and it may not have been obvious how this can be done. So let us illustrate this with a simple example: the function N2 → N defined by ⟨x, y⟩ 7→ x^2 + y^2.
It is easier to see how this goes if we employ a “generic” style of notation:
    ⟨x, y⟩ 7→ +(×(x, x), ×(y, y))
We see that most of this is built up by substituting the basic functions listed in the definition into one another, but the question is, where do the x and y come from to substitute in here? This is where the projection functions come in: we have x = π2,1(x, y) and y = π2,2(x, y), so we can write the function as
    ⟨x, y⟩ 7→ +(×(π2,1(x, y), π2,1(x, y)), ×(π2,2(x, y), π2,2(x, y)))
or, more readably,
    +(×(π2,1, π2,1), ×(π2,2, π2,2)) .
So the projection functions have an important rôle in dissecting out the individual variables before combining them in a formula. Here is another example ((iv) above): the difference function |x − y| = (x ∸ y) + (y ∸ x) is
    +(∸(π2,1, π2,2), ∸(π2,2, π2,1)) .
Here now is a more challenging problem: the integer square root given in (vii) above. It was defined as
    ⌊√x⌋ = min_y { leq((y + 1)^2, x) = 0 } .
Before we can go further, we must realise that this is an “easy to read” version of the minimalisation operation, which should strictly be written
    ⌊√x⌋ = µg(x) where g is the function g(y, x) = leq((y + 1)^2, x) .
Here we have two functions to sort out, leq and 1. Note that here we need to treat the constant 1 as the constant function N2 → N. We can get this constant function by composing the zero function with the successor function, and we get the zero function as π2,1 ∸ π2,1, so the constant function 1 : N2 → N is
    suc(∸(π2,1, π2,1)) .
The leq function is given as leq(x, y) = 1 ∸ (x ∸ y), so it is
    ∸(1, ∸(π2,1, π2,2)) ,
the 1 here being the constant function 1 : N2 → N, as discussed above. I think it is now clear how to substitute these various functions into one another to get the integer square root.
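The dissect-and-recombine process can be sketched directly in Python; subst and proj below are my names for the substitution operation and the projections, and the block simply rebuilds the x^2 + y^2 example above.

    def proj(n, i):
        """The projection pi_{n,i} : N^n -> N (1-indexed)."""
        return lambda *xs: xs[i - 1]

    def subst(f, *gs):
        """Substitution: h = f(g1, ..., gm)."""
        return lambda *xs: f(*(g(*xs) for g in gs))

    add = lambda x, y: x + y
    mult = lambda x, y: x * y

    # <x, y> |-> x^2 + y^2, as +( x(pi_2,1, pi_2,1), x(pi_2,2, pi_2,2) ):
    p21, p22 = proj(2, 1), proj(2, 2)
    sum_sq = subst(add, subst(mult, p21, p21), subst(mult, p22, p22))
    assert sum_sq(3, 4) == 25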

A.7 A small extension


So far we have been discussing (partial) functions Nm 99K N. Now we want to extend the discussion to cover partial functions Nm 99K Nn. This is quite easy, but there is a pesky detail to be got out of the way first.
Any partial function f : Nm 99K Nn can be written in the form
    f(x) = ⟨f1(x), f2(x), . . . , fn(x)⟩ for all x ∈ Nm, (–1)
where f1, f2, . . . , fn are partial functions Nm 99K N. (I am using the boldface x as shorthand for the m-tuple ⟨x1, x2, . . . , xm⟩.)

Now there are two ways of looking at this.


(A) We start with (maybe are given) the main function f and use that to define the
component functions fi thus
fi = πn,i ◦ f
and then (–1) will hold.

(B) We start with (or are given) the individual functions f1 , f2 , . . . , fn and use (–1) to
define f .
The problem is that there is a slight difference between these two approaches. Consider approach (B). The given functions f1, f2, . . . , fn are all partial and may well have different domains. But then f(x) will exist exactly when all the fi(x) exist; that is, the domain of f will be the intersection of the domains of all the fi.
On the other hand, if we use approach (A), we are defining all the component functions fi to have the same domain as f.
The point is that now we want to define what it means to say that a partial function Nm 99K Nn is partial recursive, in a way which will work even when n ≠ 1. So we have a question:

should we use approach (A) and define such a function to be partial recursive if its component
functions are or should we use approach (B) and say that there are some partial recursive
functions f1 , f2 , . . . , fn such that (–1) holds?
Happily, we do not need to make a choice: the two approaches are equivalent: a partial
function f : Nm 99K Nn is partial recursive or not irrespective of which definition we use.

So let us choose approach (A) as our definition, since it is likely to be the easiest one to
verify, and prove that approach (B) is equivalent.

A.8 Definition: Partial recursive functions Nm 99K Nn


(i) Let f be a partial function Nm 99K Nn . Then its components are the partial functions
f1 , f2 , . . . , fn : Nm 99K N given by

fi = πn,i ◦ f for each i

(ii) A partial function f : Nm 99K Nn is partial recursive if all its components are.

A.9 Proposition
Let f1, f2, . . . , fn be partial recursive functions Nm 99K N. Then the function f : Nm 99K Nn defined by f = ⟨f1, f2, . . . , fn⟩ is partial recursive also. (The same result holds trivially for recursive functions.)
The point here is that the functions f1, f2, . . . , fn may not all have the same domain and, in this case, as pointed out above, they are not the components of f.

Proof. Firstly, note that dom(f) = dom(f1) ∩ dom(f2) ∩ . . . ∩ dom(fn). Let us write g1, g2, . . . , gn for the genuine components of f. These are the restrictions of the fi to dom(f). We want to show that they are all partial recursive also.
For each i = 1, 2, . . . , n let us write di for the partial function Nm 99K N given by
    di(x) = 1 if x ∈ dom(fi), and undefined otherwise.
Now di(x) = (fi(x) ∸ fi(x)) + 1, so these functions are all partial recursive; here we are using the trick in A.5. Therefore so is their product d(x) = d1(x)d2(x) . . . dn(x), and this is the function
    d(x) = 1 if x ∈ dom(f), and undefined otherwise.
But now the components of f are given by gi(x) = d(x)fi(x) for each i, so these are partial recursive, as required. ∎

Observe that the requirement of “closure under the operation of substitution” in our original definition of partial recursive functions may be replaced by closure under the two slightly simpler operations:
(a) Presubstitution: given partial functions f1, f2, . . . , fn, obtain f = ⟨f1, f2, . . . , fn⟩.
(b) Ordinary composition.

A.10 Proposition: Pairs


There is a recursive function P : N2 → N which has a recursive inverse.

(We will denote this inverse x 7→ ⟨L(x), R(x)⟩.)
This function associates a single number P(m, n) with every pair ⟨m, n⟩. Moreover, since it has an inverse, no two pairs are associated with the same number, and every number is P(m, n) for some pair.
If P takes the pair ⟨m, n⟩ to x, then of course the inverse takes x to ⟨m, n⟩, and we have m = L(x) and n = R(x). The names P, L and R are chosen to suggest “pair”, “left” and “right”.
Of course, we know already that N2 is countably infinite and so the existence of a bijection
N2 → N is nothing new. The fact that there is such a function which both is recursive and
has a recursive inverse is the point here.

The proof is interesting, but not particularly illuminating. Read it if you are interested.
The important fact to take away is the fact that P , L and R exist: we will be using them
often.

Proof. We must define the two functions, show that they are inverses and show that they are recursive. We define P by the usual way of enumerating N2 (running along the “back diagonals”):

                x
            0   1   2   3
        0   0   2   5   9
    y   1   1   4   8
        2   3   7
        3   6   etc.

It is not difficult to see that
    P(x, y) = (x + y)(x + y + 1)/2 + x (–1)
and, since (x + y)(x + y + 1)/2 must be a natural number, we have
    P(x, y) = ⌊(x + y)(x + y + 1)/2⌋ + x
and so P is recursive (see the remark at the end of Proposition A.4).
Suppose now that P(x, y) = z, so that x = L(z) and y = R(z). By simple algebraic manipulations we have

    From (–1):  (x + y)(x + y + 1) + 2x = 2z
    that is     (x + y)^2 + 3x + y = 2z (–2)
    so          (2x + 2y + 1)^2 + 8x = 8z + 1
    and so      (2x + 2y + 1)^2 ≤ 8z + 1 < (2x + 2y + 3)^2 .
    Then        2x + 2y + 1 ≤ √(8z + 1) < 2x + 2y + 3
    so          ⌊√(8z + 1)⌋ = 2x + 2y + 1 or 2x + 2y + 2
    Thus        ⌊(⌊√(8z + 1)⌋ + 1)/2⌋ − 1 = x + y

and, since the right hand side here must be ≥ 0, we have
    x + y = ⌊(⌊√(8z + 1)⌋ + 1)/2⌋ ∸ 1 = Q1(z), say. (–3)
We see that Q1 is recursive. Substituting back into (–2),
    3x + y = 2z ∸ Q1(z)^2 = Q2(z), say,
another recursive function. We solve the last two equations in the usual way to get
    x = (Q2(z) − Q1(z))/2 .
Since x is a natural number we must have Q2(z) − Q1(z) ≥ 0 and even. Thus
    x = ⌊(Q2(z) ∸ Q1(z))/2⌋ .
Since this is true for all x, it gives us a formula for L as a recursive function:
    L(z) = ⌊(Q2(z) ∸ Q1(z))/2⌋ .
Substituting back into (–3) gives R(z) = Q1(z) ∸ L(z).
Note for later use: if z and z′ are two natural numbers such that L(z) = L(z′) and R(z) = R(z′), then z = z′.
Also we have
    P(L(z), R(z)) = z for all z
and
    L(P(x, y)) = x and R(P(x, y)) = y for all x and y. ∎
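The formulas in this proof translate directly into a short Python sketch, which also makes the inverse property easy to test (math.isqrt plays the role of the integer square root; the helper name Q1 follows the proof):

    from math import isqrt

    def P(x, y):
        return (x + y) * (x + y + 1) // 2 + x

    def Q1(z):                                     # Q1(z) = x + y
        return (isqrt(8 * z + 1) + 1) // 2 - 1

    def L(z):
        return (2 * z - Q1(z) ** 2 - Q1(z)) // 2   # (Q2(z) - Q1(z)) / 2

    def R(z):
        return Q1(z) - L(z)

    for z in range(500):
        assert P(L(z), R(z)) == z
    for x in range(20):
        for y in range(20):
            assert L(P(x, y)) == x and R(P(x, y)) == y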

We now embark on a series of technical lemmas whose main object is to prove that any
function defined by ordinary induction is partial recursive (provided we start from partial
recursive functions of course).
A.11 Lemma
Let ν be any natural number which is divisible by all of 1, 2, . . . , n. Then the n + 1 numbers
    1 + ν(i + 1), i = 0, 1, 2, . . . , n,
are pairwise coprime.
Proof. Any divisor of 1 + ν(i + 1), other than 1, must be greater than n (a divisor d with 2 ≤ d ≤ n would divide ν and hence divide 1). Now suppose that 0 ≤ i < j ≤ n and d divides both 1 + ν(i + 1) and 1 + ν(j + 1). Then d divides (j + 1)(1 + ν(i + 1)) − (i + 1)(1 + ν(j + 1)) = j − i. Thus d ≤ j − i ≤ n, so d = 1. ∎

A.12 The Chinese Remainder Theorem


This is a standard result in elementary number theory and will not be proved here. It states:
Let m0 , m1 , . . . , mk be natural numbers which are pairwise coprime and let a0 , a1 , . . . , ak be
any natural numbers. Then there is a natural number x such that
x ≡ ai mod mi for i = 0, 1, . . . , k .
We need to make a slight modification:
Let m0 , m1 , . . . , mk be nonzero natural numbers which are pairwise coprime and let a0 , a1 , . . . , ak
be any natural numbers. Then there is a nonzero natural number x such that
x ≡ ai mod mi for i = 0, 1, . . . , k .
To see this, suppose that x should be zero in the original version (which can only happen if
all the ai are zero). Then we may replace x by the product m0 m1 . . . mk and the result is
still true.

A.13 Lemma
Let a0 , a1 , . . . , an be a finite sequence of natural numbers. Then there are natural numbers
u and v such that
Rem(u + 1, 1 + v(i + 1)) = ai for i = 0, 1, . . . , n .

Proof. Let A = max{a0, a1, . . . , an} and v = (A + 1)·n!. By Lemma A.11 above, the numbers 1 + v(i + 1) are pairwise coprime. Also ai < v < 1 + v(i + 1) for all i. By the modified version of the Chinese Remainder Theorem, there is a nonzero natural number x such that
    x ≡ ai mod 1 + v(i + 1) for i = 0, 1, . . . , n
and so, setting x = u + 1, there is a natural number u such that
    u + 1 ≡ ai mod 1 + v(i + 1) for i = 0, 1, . . . , n ,
that is,
    Rem(u + 1, 1 + v(i + 1)) = Rem(ai, 1 + v(i + 1)) = ai, since ai < 1 + v(i + 1). ∎

The next “lemma” is interesting enough to stand as a proposition in its own right.
A.14 Proposition: Sequences


There is a recursive function T : N2 → N such that, for any finite sequence a0 , a1 , . . . , an of
natural numbers, there is a number w such that
T (w, i) = ai for i = 0, 1, . . . , n.

Proof. Define
    T(w, i) = Rem(L(w) + 1, 1 + R(w)(i + 1)) .
Then T is obviously recursive (see the remark at the end of Proposition A.4). Now, given a0, a1, . . . , an, by the previous lemma there are natural numbers u and v such that ai = Rem(u + 1, 1 + v(i + 1)) for i = 0, 1, . . . , n. Setting w = P(u, v) does the trick. ∎

Note what this says. Suppose we display the values of the function T as a two-dimensional array:

            0       1       2       3       4
    0   T(0,0)  T(0,1)  T(0,2)  T(0,3)  T(0,4)  ...
    1   T(1,0)  T(1,1)  T(1,2)  T(1,3)  T(1,4)  ...
    2   T(2,0)  T(2,1)  T(2,2)  T(2,3)  T(2,4)  ...
    3   T(3,0)  T(3,1)  T(3,2)  T(3,3)  T(3,4)  ...
    4   T(4,0)  T(4,1)  T(4,2)  T(4,3)  T(4,4)  ...
        ...

Then every finite sequence of natural numbers turns up somewhere in this array as the
beginning of a row:
T (w, 0) , T (w, 1) , T (w, 2) , . . . , T (w, n)
for some w.
This proposition will play an important rôle in the next section.
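Here is an illustrative Python sketch of T and of actually finding a suitable w for a given sequence, following Lemma A.13; it reuses P, L and R from the pairing sketch, and the crt helper (my name) does the Chinese Remainder computation directly rather than the blind search an algorithm would be content with.

    from math import factorial, prod

    def T(w, i):                 # Rem(L(w) + 1, 1 + R(w)(i + 1))
        return (L(w) + 1) % (1 + R(w) * (i + 1))

    def crt(residues, moduli):
        """Least x with x = r_i (mod m_i), moduli pairwise coprime."""
        x, m = 0, 1
        for r, mi in zip(residues, moduli):
            x += m * (((r - x) * pow(m, -1, mi)) % mi)
            m *= mi
        return x

    def encode(seq):
        n = len(seq) - 1
        v = (max(seq) + 1) * factorial(max(n, 1))    # divisible by 1, ..., n
        moduli = [1 + v * (i + 1) for i in range(len(seq))]
        x = crt(seq, moduli) or prod(moduli)         # nonzero CRT solution
        return P(x - 1, v)                           # w = P(u, v), u = x - 1

    w = encode([3, 1, 4, 1, 5])
    assert [T(w, i) for i in range(5)] == [3, 1, 4, 1, 5]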

A.15 Remarks
It would be nice if we could construct any recursive function as in Definition A.3, but in such a way that all intermediate functions involved are also recursive, i.e. total. But we observe that the definition allows the construction of a recursive function in such a way that one or more of the intermediate functions involved are in fact partial; it is by no means obvious that such a construction can be modified to avoid these intermediate partial functions.
A regular function is a total function which yields a total function when minimalised. More precisely, a function f : Nn+1 → N is regular if it is total and, for every y ∈ Nn, there is some x ∈ N such that f(x, y) = 0.
To return to our wishful thinking: if in Definition A.3 we restrict the operation of minimalisation to apply only to regular functions, then all functions so produced will be total and so recursive. It would be nice if all recursive functions could be formed in this way, but, as stated above, it is not at all obvious. Nevertheless it is so. We will prove this as a corollary to a rather big theorem in the Appendix, Corollary C.E.18.
A.16 Definition: Recursive sets and predicates


It seems natural to define a “recursive” subset of N as one for which there is an algorithm to
decide whether any natural number belongs to it or not. More generally, we could define a
recursive subset of Nn in the same way. We will see that this idea turns out to be extremely
useful (important) also.

As a first attempt, we might ask for an algorithm that will yield the answer “true” or “false”,
that is, a function f : Nn → {true, false}. But this won’t do: we need a function Nn → N.
But there is an obvious way to fix this: let the function take values 1 for “true” and 0 for
“false”. Thus, given a subset A of Nn , we want a function f : Nn → N such that f (x) = 1 if
x ∈ A and f (x) = 0 otherwise. That is easy: it is called the characteristic function of the
set A.
At this stage you might pause to wonder what a “partially recursive” subset might be. But
if you try to write down a definition of this idea, you will find it doesn’t make any sense (try
it). So the concept is not defined. (The closest thing is “recursively enumerable”, which we
look at later.)

Given the very close relationship between n-ary relations on N and subsets of Nn , it is
natural to also define recursive relations. The word predicate is synonymous with relation
and tends to be the one usually used in this context, so I will use it here.
(i) A subset A of Nn is recursive if its characteristic function χA : Nn → N is recursive.
Here the characteristic function is defined in the usual way:
    χA(x) = 1 if x ∈ A, and 0 otherwise.

(ii) An n-ary predicate P (predicate on Nn) is recursive if its characteristic function χP is recursive. Here the characteristic function is defined in the obvious way:
    χP(x) = 1 if P(x) is true, and 0 if P(x) is false;
or better:
    χP(x) = 1 if P(x), and 0 if ¬P(x).
It follows that a subset A of Nn is recursive if and only if the predicate “x ∈ A” is recursive.
Similarly a predicate P on Nn is recursive if and only if the subset { x : x ∈ Nn and P (x) }
is recursive.
Recursive predicates have a number of useful properties.

A.17 Proposition
(i) Let P and Q be two n-ary predicates (defined on Nn ). If P and Q are both recursive
then so are ¬P , P ∧ Q , P ∨ Q , P ⇒ Q and P ⇔ Q.
Further, if f1 , f2 , . . . , fn are recursive functions Nm → N, then P (f1 (x), f2 (x), . . . , fn (x)) is
an m-ary recursive predicate.

(ii) Let P be a unary predicate (on N). If P is recursive, then so are the unary predicates
(∀ξ ≤ x)P (ξ) , (∃ξ ≤ x)P (ξ) , (∀ξ < x)P (ξ) and (∃ξ < x)P (ξ)
More generally, suppose P is a predicate on Nn+1 . If P is recursive, then so are the (n+1)-ary
predicates

(∀ξ ≤ x)P (ξ, y) , (∃ξ ≤ x)P (ξ, y) , (∀ξ < x)P (ξ, y) and (∃ξ < x)P (ξ, y).

You will have noticed that all the quantifiers here are bounded. There is no guarantee that
the unbounded versions, such as (∀ξ)P (ξ) or (∃ξ)P (ξ) will be recursive.

Roughly speaking, even if the predicate P is recursive, trying to compute either (∀ξ)P(ξ) or (∃ξ)P(ξ) will involve an infinite search, which might well end up in an infinite loop. Suppose, for instance, we are trying to decide whether (∀ξ)P(ξ) is true or not by the straightforward method of computing the truth of P(ξ) for ξ = 0, 1, 2, . . . . If (∀ξ)P(ξ) is in fact false, then there will be some value of ξ for which P(ξ) is false; our algorithm will eventually reach this value and halt with the answer 0 (for “false”). But if (∀ξ)P(ξ) is in fact true, we will never get a definitive answer: even if we check the first 10^100 values of ξ, we won’t know whether the (10^100 + 1)th yields true or not.
Those remarks are not a proof of the assertion that there are recursive predicates for which
the quantified versions are not recursive; they are simply an indication of the sort of thing
that can go wrong. Actual examples to back up this assertion will appear later in these
notes.
(iii) If f and g are recursive functions Nn → N, then the n-ary predicates

f (x) = g(x) , f (x) 6= g(x) , f (x) ≤ g(x) and f (x) < g(x)

are all recursive also.

(iv) If P(x, y) is an (n + 1)-ary recursive predicate, then the function f : Nn 99K N given by
    f(y) = min_x { P(x, y) }
is partial recursive. (P(x, y) is defined for all x and y, so there are no subtleties about min to worry about here.)

Proof. (i) The characteristic functions χP and χQ : Nn → N are recursive. Now
    χ¬P(x) = 1 ∸ χP(x) and
    χP∧Q(x) = χP(x) · χQ(x) .
That P ∨ Q, P ⇒ Q and P ⇔ Q are recursive now follows immediately.
The characteristic function of P (f1 (x), f2 (x), . . . , fn (x)) is χP (f1 (x), f2 (x), . . . , fn (x)).
(iii) The characteristic functions of these predicates are
    x 7→ eq(f(x), g(x)), x 7→ neq(f(x), g(x)), x 7→ leq(f(x), g(x)) and x 7→ less(f(x), g(x))
(using the functions given in Section A.4). Alternatively, the predicate f(x) = g(x) is eq(f(x), g(x)) and, of course, f(x) ≠ g(x) is ¬(f(x) = g(x)).
(iv) f(y) = min_x { 1 ∸ χP(x, y) = 0 }, which is a straight minimalisation.

We note, for use in the next part of the proof, that if P is such that, for every y ∈ Nn, there is some x such that P(x, y), then f is recursive.
(ii) The characteristic function χP : Nn+1 → N is recursive. Note that (∀ξ < x)P(ξ, y) if and only if
    min_ξ { ¬P(ξ, y) or ξ = x } = x .
Note also that the predicate being minimised here is recursive (by (i) and the first half of (iii)). Thus (using the note at the end of the proof of (iv) above), the left hand side of this equation is a recursive function. Then the whole equation (as a predicate) is recursive, by (iii) again.
The other three results now follow immediately. ∎

Question
What does all this mean for recursive sets?
B Primitive recursive functions


B.1 Definition
Given total functions g : Nn → N and h : Nn+2 → N, the operation of primitive recursion
obtains the (total) function f : Nn+1 → N given by

f (0, y) = g(y) (1A)


f (x + 1, y) = h(x, f (x, y), y) for all x ∈ N. (1B)

This is just defining a function by ordinary induction, so we already know that the function
f so defined exists, is total and is uniquely defined by g and h. In the context of recursive
functions, this form of definition of a function is called “primitive recursion”, so we’ll stick
with that terminology here.
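Viewed computationally, the scheme (1A)/(1B) is just a bounded loop. Here is a minimal Python sketch (primrec is my name for the combinator, not the text's):

    def primrec(g, h):
        """From g : N^n -> N and h : N^(n+2) -> N build f : N^(n+1) -> N."""
        def f(x, *y):
            value = g(*y)                  # f(0, y) = g(y)
            for i in range(x):             # f(i+1, y) = h(i, f(i, y), y)
                value = h(i, value, *y)
            return value
        return f

    fact = primrec(lambda: 1, lambda i, acc: (i + 1) * acc)
    assert fact(5) == 120                  # 5! by primitive recursion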

We already know that this method of constructing functions is important. We can now
see that it involves a sort of “searching along” process, rather like minimalisation (in both
processes we must compute f (0), f (1), f (2), f (3), . . . ).
So it is of interest to see what happens if we replace the operation of minimalisation by
primitive recursion in the original definition of a partial recursive function. The functions
created this way are called primitive recursive.
We see a couple of things straight away. Firstly, some of the basic functions in the definition become redundant: given the constant function 0, the successor function and the projections, addition, multiplication and natural subtraction follow immediately using primitive recursion. So the definition simplifies somewhat.

Secondly, we cannot possibly get all the partial recursive functions that way. In fact we can
only get total functions, since the basic functions are total and applying primitive recursion
to total functions always produces total functions. In other words all primitive recursive
functions are recursive. This raises the question: are all recursive functions primitive recur-
sive?

Now we have a hierarchy of functions Nn → N:
    all primitive recursive functions Nn → N
        ⊂ all recursive functions Nn → N
            ⊂ all (total) functions Nn → N
How do we know that these relationships are strict (⊂ instead of ⊆)?


The search for a recursive function which is not primitive recursive is much more difficult
than one might at first suppose. Nearly all the recursive functions that we deal with daily
are in fact primitive recursive: that goes, for instance, for all recursive functions defined so
far in this chapter. In Section D we will settle this by defining the Ackermann Function and
proving that it is recursive but not primitive recursive.
To see that there are ordinary functions Nn → N which are not recursive, it is enough to observe that there is an uncountable number of ordinary functions Nn → N but only a countable number of recursive ones. I can give a rough argument here that the number of recursive functions is countable: it must be possible to specify such a function using a finite number of instructions, each of which is composed of a finite number of symbols from a finite alphabet. A more watertight proof will be given later.

B.2 Proposition
With the notation of the above definition, if g and h are recursive functions, then so is f .
This proof is the culmination of the series of technical lemmas in the last section leading to Proposition A.14. It is also technical. Read it if you dare!

Proof. Let T be the function defined in Proposition A.14. Then, for any x and y, there is
a w so that
T (w, ξ) = f (ξ, y) for all ξ = 0, 1, . . . , x. (2)
Then, substituting in Equations (1A) and (1B) above,

T (w, 0) = g(y) (3A)


and T (w, ξ + 1) = h(ξ, T (w, ξ), y) for ξ = 0, 1, . . . , x − 1 . (3B)

Also conversely, if w is any natural number such that Equations (3A) and (3B) hold, then
T (w, ξ) = f (ξ, y) for all ξ = 0, 1, . . . , x.
Now, for any x and y, let w0 be the least w satisfying these equations. Then, for every x
and y, w0 is uniquely defined, so we may consider it to be a function of x and y and write
it as w0 (x, y). We have in fact defined it as

    w0(x, y) = min_w { T(w, 0) = g(y) and (∀ξ < x) T(w, ξ + 1) = h(ξ, T(w, ξ), y) }
and we know that Equation (2) holds for it in the sense that, for any x and y,
    T(w0(x, y), ξ) = f(ξ, y) for all ξ = 0, 1, . . . , x
and, in particular, that
    T(w0(x, y), x) = f(x, y) . (4)
Since T is recursive, it follows from Proposition A.17 that so is w0, and then from Equation (4) that so is f. ∎

B.3 Definition
A function f : Nn → N is primitive recursive if it can be obtained by a finite number of
applications of the operations of substitution and primitive recursion from the functions in
the following list:—
(i) x 7→ x + 1, the successor function suc : N → N,
(ii) 0 the constant zero function N0 → N,
(iii) (x1 , x2 , . . . , xn ) 7→ xi the projection function πn,i : Nn → N.
From the last proposition we see that every primitive recursive function is recursive.
Having said that all the recursive functions defined so far in this chapter are in fact primitive
recursive, this fact really needs proving. To do this, we need to go over all the constructions
of these functions and, wherever a minimalisation is used, show that it can be replaced
somehow with a primitive recursion. I will do that for those functions we want to know
really are primitive recursive.

B.4 Some examples of primitive recursive functions


(i) The identity function id : N → N is still the projection function π1,1.
(ii) Addition N2 → N:
    x + 0 = x (add(x, 0) = id(x))
    x + (y + 1) = (x + y) + 1 (add(x, y + 1) = suc(add(x, y)))
(iii) The constant function x 7→ 0 (as a function N → N) is obtained from 0 (the constant zero, as a function N0 → N) and π2,2 by primitive recursion.
The constant zero functions Nn → N for n > 1 can now be obtained by composing the constant zero function N → N, just constructed, with the projection functions πn,1.
Other constant functions (either N0 → N or Nn → N) are obtained by composing these zero functions with the successor function a suitable number of times.
(iv) Multiplication N2 → N:
    x · 0 = 0 (mul(x, 0) = constant zero function N → N)
    x(y + 1) = (xy) + x (mul(x, y + 1) = add(mul(x, y), x))
(v) Natural subtraction N2 → N:
Let us first temporarily define the natural predecessor function to be p(x) = x ∸ 1. It is primitive recursive, because it is given by p(0) = 0 and p(x + 1) = x. Now we can define natural subtraction by
    x ∸ 0 = x
    x ∸ (y + 1) = p(x ∸ y) .
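Using the primrec combinator sketched after B.1, the constructions above can be written out and tested directly; note that, to match the scheme, the recursion variable comes first, so add_(y, x) computes x + y.

    add_ = primrec(lambda x: x,                     # x + 0 = x
                   lambda y, acc, x: acc + 1)       # x + (y+1) = (x+y) + 1
    mul_ = primrec(lambda x: 0,                     # x . 0 = 0
                   lambda y, acc, x: add_(x, acc))  # x(y+1) = xy + x
    pred = primrec(lambda: 0,                       # p(0) = 0
                   lambda x, acc: x)                # p(x+1) = x
    sub_ = primrec(lambda x: x,                     # x -. 0 = x
                   lambda y, acc, x: pred(acc))     # x -. (y+1) = p(x -. y)

    assert add_(4, 3) == 7      # 3 + 4
    assert mul_(4, 3) == 12     # 3 * 4
    assert sub_(2, 5) == 3      # 5 -. 2
    assert sub_(5, 2) == 0      # 2 -. 5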

B.5 Remark
From these examples we see that an alternative definition of primitive recursive functions
would be to make the following changes to the definition of partial recursive functions:
replace the operation of minimalisation by that of primitive recursion and add the zero
function to the basic list.
By the same token, an alternative definition of partial recursive functions is afforded by
starting with the definition of primitive recursive function and adding the operation of
minimalisation.
B.6 Examples continued
(vi) The factorial function N → N:
    0! = 1
    (x + 1)! = x! · (x + 1) .
(vii) Exponentiation N2 → N:
    x^0 = 1
    x^(y+1) = x^y · x .
(viii) The functions zero, eq, nonzero, neq and ⟨x, y⟩ 7→ |x − y| are now defined as in A.4(v) above.

B.7 Proposition
If f : Nn+1 → N is recursive or primitive recursive, then so (respectively) are
    g : Nn+1 → N defined by g(x, y) = Σ_{i=0}^{x} f(i, y)
    h : Nn+1 → N defined by h(x, y) = Π_{i=0}^{x} f(i, y)
Proof. g(0, y) = f(0, y) and g(x + 1, y) = g(x, y) + f(x + 1, y);
h(0, y) = f(0, y) and h(x + 1, y) = h(x, y) · f(x + 1, y). ∎
In the case that f is recursive, this proposition would have been awkward to prove without the help of Proposition B.2.

B.8 Proposition
If f : Nn+1 → N is a recursive or primitive recursive function, then so is the function obtained from it by bounded minimalisation, that is,
    h(m, y) = the least value of x such that x ≤ m and f(x, y) = 0, if such an x exists; and 0 otherwise.
Proof. Noting that
    nonzero(f(x, y)) = 1 if f(x, y) ≠ 0, and 0 otherwise,
we have
    Π_{i=0}^{x} nonzero(f(i, y)) = 1 if f(i, y) ≠ 0 for all i ≤ x, and 0 otherwise.
Also
    (x + 1) · zero(f(x + 1, y)) = x + 1 if f(x + 1, y) = 0, and 0 otherwise,
so, multiplying the last two functions,
    (x + 1) · zero(f(x + 1, y)) · Π_{i=0}^{x} nonzero(f(i, y))
        = x + 1 if f(x + 1, y) = 0 and f(i, y) ≠ 0 for all i ≤ x, and 0 otherwise,
and this is primitive recursive. So, defining a function θ : Nn+1 → N by
    θ(0, y) = 0
    θ(x + 1, y) = (x + 1) · zero(f(x + 1, y)) · Π_{i=0}^{x} nonzero(f(i, y)),
θ is also primitive recursive and
    θ(x, y) = x if f(x, y) = 0 and f(i, y) ≠ 0 for all i < x, and 0 otherwise.
So
    h(m, y) = Σ_{x=0}^{m} θ(x, y)
is primitive recursive also. ∎

Notation. It will be convenient to use the notation min_{x≤m} { f(x, y) = 0 } for the result of applying bounded minimalisation to the function f.
Thus we could, if we felt so inclined, add the operation of bounded minimalisation to the list of basic operations in the definition of primitive recursive functions.
In a rough sort of way, this encapsulates the difference between recursive functions and
primitive recursive ones. The ordinary minimalisation can be thought of as incorporating
into our definition of a partial recursive function the process of searching through a set of
values until something or other occurs to stop the search — acknowledging the possibility
that that something or other might never occur. (This is where an infinite loop might arise.)
With bounded minimalisation, we have a limit on how far the search is to proceed, that
limit being known at the outset of the search. This precludes the possibility of an infinite
loop.
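Here is the θ-construction of the proof written out as a Python sketch (bounded_min, zero and nonzero are my names); it makes visible that the search is a plain bounded loop with no possibility of running forever.

    def zero(x):    return 1 if x == 0 else 0
    def nonzero(x): return 1 if x != 0 else 0

    def bounded_min(f):
        """h(m, y) = least x <= m with f(x, y) = 0, else 0; always total."""
        def theta(x, *y):
            if x == 0:
                return 0
            prefix = 1
            for i in range(x):             # nonzero(f(i, y)) for all i < x
                prefix *= nonzero(f(i, *y))
            return x * zero(f(x, *y)) * prefix
        return lambda m, *y: sum(theta(x, *y) for x in range(m + 1))

    # least x <= m with x*x >= y:
    first_root = bounded_min(lambda x, y: 1 if x * x < y else 0)
    assert first_root(10, 17) == 5
    assert first_root(3, 17) == 0          # bound too small: default 0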

B.9 Corollary
Now we see that all the functions given as examples of recursive functions so far (the total ones only!) are in fact primitive recursive, because in each case where minimalisation was used, bounded minimalisation could have been used instead. For example, integer division and integer square root can be redefined:
    ⌊x/y⌋ = min_{z≤x} { leq(y(z + 1), x) = 0 }
    ⌊√x⌋ = min_{y≤x} { leq((y + 1)^2, x) = 0 } .
In the same way, the functions P, L and R used to set up a recursive bijection between N and N2 are primitive recursive.

B.10 Definition: Primitive recursive sets and predicates


The definitions here are the same as for recursive sets and predicates, but I give them just
to be on the safe side.
A subset A of Nn is primitive recursive if its characteristic function χA : Nn → N is primitive
recursive.
An n-ary predicate P (predicate on Nn ) is primitive recursive if its characteristic function
χP is primitive recursive.

It follows that a subset A of Nn is primitive recursive if and only if the predicate “x ∈ A” is


primitive recursive. Similarly a predicate P on Nn is primitive recursive if and only if the
subset { x : x ∈ Nn and P (x) } is primitive recursive.

B.11 Proposition
This corresponds quite closely with Proposition A.17.
(i) Let P and Q be two n-ary predicates (defined on Nn). If P and Q are both primitive recursive then so are ¬P, P ∧ Q, P ∨ Q, P ⇒ Q and P ⇔ Q.
Further, if f1 , f2 , . . . , fn are primitive recursive functions Nm → N, then P (f1 (x), f2 (x), . . . , fn (x))
is an m-ary primitive recursive predicate.
(ii) Let P be a unary predicate (on N). If P is primitive recursive, then so are the unary
predicates

(∀ξ ≤ x)P (ξ) , (∃ξ ≤ x)P (ξ) , (∀ξ < x)P (ξ) and (∃ξ < x)P (ξ)

More generally, suppose P is a predicate on Nn+1 . If P is primitive recursive, then so are


the (n+1)-ary predicates

(∀ξ ≤ x)P (ξ, y) , (∃ξ ≤ x)P (ξ, y) , (∀ξ < x)P (ξ, y) and (∃ξ < x)P (ξ, y).

(iii) If f and g are primitive recursive functions Nn → N, then the n-ary predicates

f (x) = g(x) , f (x) 6= g(x) , f (x) ≤ g(x) and f (x) < g(x)

are all primitive recursive also.


Proof. The proofs of (i) and (iii) are exactly the same as the corresponding proofs of Proposition A.17. The proof there of (ii) involves minimalisation, so cannot be used here, so . . .
(ii) The characteristic function of (∀ξ ≤ x)P(ξ, y) is
    Π_{ξ=0}^{x} χP(ξ, y)
and now the other three follow immediately. ∎

B.12 Finite sequences again


It will be useful to have some primitive recursive functions relating members of Nn and N, in the way the functions P, L and R relate N2 and N. More specifically, for each n we want primitive recursive functions Pn : Nn → N and En,1, En,2, . . . , En,n, all : N → N, such that
    En,i(Pn(x1, x2, . . . , xn)) = xi for every sequence (x1, x2, . . . , xn) and i = 1, 2, . . . , n
    Pn(En,1(w), En,2(w), . . . , En,n(w)) = w for every w ∈ N .
However, now we can define recursive and primitive recursive functions N → Nn, so it is neater to wrap these E functions up into a single inverse En : N → Nn.
For every n we will create primitive recursive functions Pn : Nn → N and En : N → Nn which are inverses of one another, so
    En(Pn(x1, x2, . . . , xn)) = ⟨x1, x2, . . . , xn⟩ for all ⟨x1, x2, . . . , xn⟩ ∈ Nn
    Pn(En(y)) = y for all y ∈ N .

Instead of performing a complicated multi-dimensional version of the construction of our original pairing functions P and ⟨L, R⟩ (which would be horrible), we can define the new ones by induction, using the original P, L and R.

B.13 Definition
(i) We define the function Pn : Nn → N by induction over n:
    P0() = 0 ; P1(x) = x ;
    Pn+1(x1, x2, . . . , xn+1) = P(x1, Pn(x2, x3, . . . , xn+1)) for all n ≥ 1,
where P is the function N2 → N defined earlier.
(ii) We define the function En : N → Nn, also by induction over n:
    E0(w) = () ; E1(w) = (w) ;
    En+1(w) = (L(w), LR(w), LR^2(w), . . . , LR^(n−1)(w), R^n(w)) for all n ≥ 1 .
(Here I am using a simplified notation for composites. For instance, LR(w) means L(R(w)), LR^2(w) means L(R(R(w))), and so on.)
Some examples:
    E2(w) = (L(w), R(w))
    E3(w) = (L(w), LR(w), R^2(w))
    E4(w) = (L(w), LR(w), LR^2(w), R^3(w))

It will be convenient to write En,i for the various components of En, so that
    En,1(w) = L(w) for all n ≥ 2,
    En,i(w) = LR^(i−1)(w) for all i, n such that 2 ≤ i ≤ n − 1,
    En,n(w) = R^(n−1)(w) for all n ≥ 1.

B.14 Proposition
With the notation of the last definition, the functions Pn and En are all primitive recursive
(to say that En is primitive recursive means, as usual, that all its component functions En,i
are primitive recursive).
Furthermore, for any n ≠ 0, these functions are inverse bijections, that is,
    En(Pn(x)) = x for all x ∈ Nn and
    Pn(En(w)) = w for all w ∈ N .

The functions P0 : N0 → N and E0 : N → N0 are exceptions to this, because N0 has only


one member, the empty sequence. The first equation above still holds for n = 0, but the
second changes to
P0 (E0 (w)) = 0 for all w ∈ N .

Proof. This is all easy. 
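In Python, the inductive definitions of Pn and En come out as a few lines on top of the pairing sketch given earlier (here En takes the arity explicitly as its first argument):

    def Pn(*xs):
        if len(xs) == 0: return 0
        if len(xs) == 1: return xs[0]
        return P(xs[0], Pn(*xs[1:]))

    def En(n, w):
        if n == 0: return ()
        if n == 1: return (w,)
        return (L(w),) + En(n - 1, R(w))

    assert En(3, Pn(7, 2, 9)) == (7, 2, 9)
    assert Pn(*En(4, 123456)) == 123456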

B.15 Proposition
Let f be any function Nm 99K Nn, partial or total. Then there is a partial function f^O : N 99K N such that
    f^O ◦ Pm = Pn ◦ f (–1)
If m ≠ 0 this function is unique.
Moreover,
    f^O is partial recursive if and only if f is partial recursive,
    f^O is recursive if and only if f is recursive,
    f^O is primitive recursive if and only if f is primitive recursive.
            f
    Nm ----------> Nn
    |              |
    Pm             Pn
    v              v
    N  ----------> N
            f^O

Proof. This is quite easy (and pretty).
Given f, define f^O by
    f^O = Pn ◦ f ◦ Em (–2)
Composing both sides on the right with Pm and using the fact that Em ◦ Pm is the identity function, we have (–1).
For uniqueness, suppose that m ≠ 0 and f* is another function satisfying (–1), that is, f* ◦ Pm = Pn ◦ f. Then, composing both equations on the right with Em (and using the fact that Pm ◦ Em is the identity when m ≠ 0), we have
    f* = f* ◦ Pm ◦ Em = Pn ◦ f ◦ Em
    f^O = f^O ◦ Pm ◦ Em = Pn ◦ f ◦ Em .
Also, composing (–1) on the left with En and using the fact that En ◦ Pn is the identity function (always, even if n = 0), we have
    En ◦ f^O ◦ Pm = f (–3)

            f                              f
    Nm ----------> Nn              Nm ----------> Nn
    ^              |               |              ^
    Em             Pn              Pm             En
    |              v               v              |
    N  ----------> N               N  ----------> N
            f^O                            f^O

Since Pm, Pn, Em and En are all primitive recursive, and therefore also recursive and partial recursive, Equations (–2) and (–3) give us the three equivalence statements. ∎

B.16 Discussion
In the Appendix (C) and this chapter we will need to work with the set of all finite sequences of natural numbers. This is the set
    N0 ∪ N1 ∪ N2 ∪ . . . = ⋃_{n=0}^{∞} Nn .
Let us call this set N[∞] to have a simpler name for it.
We will want to consider functions from this set to itself, from it to Nn and from Nm to it.
There are many such functions which we would like to think of as recursive, because there
are obvious algorithms to compute them. Examples would be: finding the length of one of
these sequences, finding its first member, reversing its order, removing its first entry and so
on.
Now to ask for such a function to be partial recursive, recursive or primitive recursive is
problematical: we have as yet no definition of partial recursiveness (etc.) for functions
defined on this domain. So the job is now to supply one. What we do is take a hint from
the previous proposition: define a pair of inverse functions

P[∞] : N[∞] → N and E[∞] : N → N[∞]

which we can accept (intuitively) as being algorithmically computable, then use the above
proposition (sort of in reverse) to define partial recursive, recursive and primitive recursive
functions.
The basic idea for the function P[∞] is fairly simple: we can easily describe a sequence of any length by two numbers, its length, n say, and Pn of the sequence. In other words, any sequence x = ⟨x1, x2, . . . , xn⟩ is specified by the pair ⟨n, Pn(x)⟩. We want however to describe the sequence by a single number; but this is now easy: use the function P = P2 on this pair. So we get (as a first approximation) P[∞](x) = P2(n, Pn(x)).
However, there is a small bug in this process. We really need these two functions to be inverses, but this P[∞] is not surjective (onto), so it cannot have an inverse. The trouble is with the N0 bit, which only contains the empty sequence ⟨ ⟩, for which P[∞]() = P2(0, P0()) = P2(0, 0) = 0; none of the numbers P2(0, 1), P2(0, 2), P2(0, 3), . . . occur as values. Nothing else goes wrong, so we make a slight adjustment to the definition of P[∞] to finesse our way around this problem. Having ensured that P[∞] is in fact a bijection, the inverse E[∞] must exist, and a little messy tinkering tells us what its definition must be. We can then prove that they are both bijections by showing that they are inverses, a reasonably straightforward calculation, all things considered. So here is the outcome.

B.17 Definition
(i) We define the function
    P[∞] : N[∞] → N
as follows: for any sequence x = (x1, x2, . . . , xn) of length n,
    P[∞](x) = 0 if n = 0;
    P[∞](x) = P2(0, x1 + 1) if n = 1;
    P[∞](x) = P2(n − 1, Pn(x)) if n ≥ 2.

Since, in the case n = 1, we have Pn(x) = x1, this definition can be made to look a bit more regular by rewriting the second case thus:
    P[∞](x) = 0 if n = 0;
    P[∞](x) = P2(n − 1, Pn(x) + 1) if n = 1;
    P[∞](x) = P2(n − 1, Pn(x)) if n ≥ 2.

Looking at the definitions in the previous section we see a neater way of expressing the unadjusted version of this definition: before the adjustment, the “first approximation” is just P[∞](x1, x2, . . . , xn) = Pn+1(n, x1, x2, . . . , xn).
(ii) We define the function
    E[∞] : N → N[∞]
as follows: for any natural number w,
    E[∞](w) = ⟨ ⟩ if w = 0;
    E[∞](w) = ⟨R(w) − 1⟩ if w ≠ 0 and L(w) = 0;
    E[∞](w) = E_{L(w)+1}(R(w)) if w ≠ 0 and L(w) ≥ 1.

(iii) Let f : Nm 99K Nn be a function, where either m or n or both might be [∞]. Then we define the function f^O : N 99K N by
    f^O = Pn ◦ f ◦ Em .

Now everything to do with these functions works just the way we want.
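Here is a Python sketch of the two encodings, following the case definitions above and reusing P, L, R, Pn and En from the earlier sketches; the loop at the end checks the inverse property of B.18 on a small range.

    def P_inf(xs):
        n = len(xs)
        if n == 0: return 0
        if n == 1: return P(0, xs[0] + 1)
        return P(n - 1, Pn(*xs))

    def E_inf(w):
        if w == 0:    return ()
        if L(w) == 0: return (R(w) - 1,)
        return En(L(w) + 1, R(w))

    for xs in [(), (0,), (5,), (3, 1), (2, 7, 1, 8)]:
        assert E_inf(P_inf(xs)) == xs
    for w in range(300):
        assert P_inf(E_inf(w)) == w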

B.18 Proposition
The functions P[∞] : N[∞] → N and E[∞] : N → N[∞] are inverse functions (and therefore
both bijections).

Proof. (1) Suppose that P[∞](x) = w; we prove that E[∞](w) = x.
From the definition of P[∞] there are three possibilities:
• n = 0, that is, x = ⟨ ⟩.
  Then P[∞](x) = 0, so w = 0.
  But then E[∞](w) = E[∞](0) = ⟨ ⟩, as required.
• n = 1, so that x = ⟨x1⟩.
  Then w = P[∞](x) = P2(0, x1 + 1).
  But then w ≠ 0 and L(w) = 0.
  Therefore E[∞](w) = ⟨R(w) − 1⟩ = ⟨x1⟩ = x.
• n ≥ 2.
  Then w = P2(n − 1, Pn(x)), so L(w) = n − 1 and R(w) = Pn(x).
  Also w ≠ 0 and L(w) ≥ 1, and so we have
  E[∞](w) = E_{L(w)+1}(R(w)) = En(Pn(x)) = x.
(2) Suppose now that E[∞](w) = x; we prove that P[∞](x) = w.
From the definition of E[∞] there are three possibilities:
• w = 0.
  In this case x = E[∞](w) = ⟨ ⟩ and so the length n of x is 0.
  Thus P[∞](x) = 0, as required.
• w ≠ 0 and L(w) = 0.
  In this case x = E[∞](w) = ⟨R(w) − 1⟩, a sequence of length n = 1.
  Then P[∞](x) = P2(0, (R(w) − 1) + 1) = P2(L(w), R(w)) = w.
• w ≠ 0 and L(w) ≥ 1.
  In this case x = E[∞](w) = E_{L(w)+1}(R(w)) = En(R(w)), a sequence of length n = L(w) + 1. From this also Pn(x) = R(w). Then P[∞](x) = P2(n − 1, Pn(x)) = P2(L(w), R(w)) = w. ∎

B.19 Proposition
Proposition B.15 above holds true even when m or n or both are [∞].
Proof. From the definition of f^O in the general case and the fact that P[∞] and E[∞] are inverses, Equations (–1), (–2) and (–3) follow. Then the equivalence statements follow as in the proof of that Proposition. ∎

B.20 Proposition
Given two functions f : Nl → Nm and g : Nm → Nn, where any of l, m, n might be [∞], if f and g are both partial recursive / recursive / primitive recursive, then so (respectively) is the composite g ◦ f.
Proof. Just follow the arrows around on this diagram, using the fact that Pl, Pm and Pn are bijective, and the fact (which follows from this) that (g ◦ f)^O = g^O ◦ f^O.

            f              g
    Nl ----------> Nm ----------> Nn
    |              |              |
    Pl             Pm             Pn
    v              v              v
    N  ----------> N  ----------> N
            f^O            g^O

B.21 Proposition
There are primitive recursive functions
    len : N → N
    ent : N2 → N
    del : N → N
    adj : N2 → N
    rep : N3 → N
    concat : N2 → N
    zeros : N → N
with these properties:
(i) len(w) gets the length of the sequence: if x = (x1, x2, . . . , xn) and P[∞](x) = w, then len(w) = n.
(ii) ent(i, w) gets the ith entry of the sequence: if x = (x1, x2, . . . , xn), P[∞](x) = w and 1 ≤ i ≤ n, then ent(i, w) = xi.
(iii) del(w) represents removing the first entry of the sequence: if n ≥ 1, x = (x1, x2, . . . , xn), x′ = (x2, x3, . . . , xn), P[∞](x) = w and P[∞](x′) = w′, then del(w) = w′.
(iv) adj(z, w) represents adjoining a new first entry z to the sequence: if x = (x1, x2, . . . , xn), x′ = (z, x1, x2, . . . , xn), P[∞](x) = w and P[∞](x′) = w′, then adj(z, w) = w′.
(v) rep(r, w, y) represents replacing the rth entry of the sequence by the value y: if x = (x1, x2, . . . , xn), x′ = (x1, x2, . . . , xr−1, y, xr+1, . . . , xn), P[∞](x) = w and P[∞](x′) = w′, then rep(r, w, y) = w′.
(vi) concat(u, v) represents the concatenation of the sequences represented by u and v: concat(u, v) = P[∞](⟨x1, x2, . . . , xm, y1, y2, . . . , yn⟩), where u = P[∞](⟨x1, x2, . . . , xm⟩) and v = P[∞](⟨y1, y2, . . . , yn⟩).
(vii) zeros(n) represents an all-zero sequence of length n: zeros(n) = P[∞](⟨0, 0, . . . , 0⟩).

Proof. (i) This is trivial: len(w) = L(w).
(ii) Given w = P[∞](x1, x2, . . . , xn) = Pn+1(n, x1, x2, . . . , xn), we have ent(i, w) = xi = En+1,i+1(w).
(iii) Given w = P[∞](x1, x2, . . . , xn) = Pn+1(n, x1, x2, . . . , xn), and noting that then Pn−1(x2, x3, . . . , xn) = R^2(w), we have
    del(w) = P[∞](x2, x3, . . . , xn) = Pn(n − 1, x2, x3, . . . , xn) = P(L(w) − 1, R^2(w)) .
(iv) Given w = P[∞](x1, x2, . . . , xn) = Pn+1(n, x1, x2, . . . , xn),
    adj(z, w) = P[∞](z, x1, x2, . . . , xn)
              = P(n + 1, Pn+1(z, x1, x2, . . . , xn))
              = P(n + 1, P(z, Pn(x1, x2, . . . , xn)))
              = P(L(w) + 1, P(z, R(w))) .
(v) Given w = P[∞](x1, x2, . . . , xn) = Pn+1(n, x1, x2, . . . , xn), in the case r = 1,
    rep(1, w, y) = P[∞](y, x2, x3, . . . , xn)
                 = P(n, Pn(y, x2, x3, . . . , xn))
                 = P(n, P(y, Pn−1(x2, x3, . . . , xn)))
                 = P(L(w), P(y, R^2(w)))
and in the case r ≥ 2,
    rep(r, w, y) = P[∞](x1, . . . , xr−1, y, xr+1, . . . , xn)
                 = P(n, Pn(x1, . . . , xr−1, y, xr+1, . . . , xn))
                 = P(n, P(x1, Pn−1(x2, . . . , xr−1, y, xr+1, . . . , xn)))
but del(w) = P[∞](x2, . . . , xn), and so
    rep(r − 1, del(w), y) = P[∞](x2, . . . , xr−1, y, xr+1, . . . , xn)
    so rep(r, w, y) = P(L(w), P(x1, rep(r − 1, del(w), y))) .
(vi) We define a preliminary function f : N3 → N by
    f(0, u, v) = v ,
    f(r + 1, u, v) = adj(ent(len(u) − r, u), f(r, u, v)) for all r ≥ 0 .
Then f(r, u, v) = P[∞](⟨xm−r+1, xm−r+2, . . . , xm, y1, y2, . . . , yn⟩), and so concat(u, v) = f(len(u), u, v).
(vii) The function zeros is defined by
    zeros(0) = 0 (the P[∞]-value of the empty sequence),
    zeros(r + 1) = adj(0, zeros(r)) for all r ≥ 0.
It will be observed that Cases (v) – (vii) use definition by primitive recursion over r, and otherwise every step in all these cases is an operation we already know to be primitive recursive. ∎
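For illustration only, here is a decode–modify–encode sketch of some of these operations in Python, built on the P_inf and E_inf sketches above; the text's point, of course, is that the direct formulas in the proof avoid this detour and are primitive recursive.

    def length(w):    return len(E_inf(w))
    def ent(i, w):    return E_inf(w)[i - 1]          # entries are 1-indexed
    def delete(w):    return P_inf(E_inf(w)[1:])
    def adjoin(z, w): return P_inf((z,) + E_inf(w))
    def concat(u, v): return P_inf(E_inf(u) + E_inf(v))

    w = P_inf((4, 7, 1))
    assert length(w) == 3 and ent(2, w) == 7
    assert E_inf(delete(w)) == (7, 1)
    assert E_inf(adjoin(9, w)) == (9, 4, 7, 1)
    assert E_inf(concat(w, P_inf((5,)))) == (4, 7, 1, 5)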
C Specifying algorithms
C.1 Algorithms in general
One of the main points about the definition of (partial) recursive functions is that this
definition should cover functions which can be defined by any algorithm whatsoever. At
this point there are (or appear to be) some serious problems with the approach taken so far
in this chapter.
Firstly, it is not at all clear that any algorithm we might think up can be recast into the methods we have been using so far. The basic definition, A.3, is not very helpful on its own, but in the last two sections we saw how, by use of much trickery, we can usually do this. But to go on and claim that any algorithm can be recast in this manner appears to require more of a leap of faith than a mathematician should be happy with.
Secondly, the definition only applies to functions involving natural numbers. Worse, as we
have seen, it does not apply to functions defined on tuples of numbers of variable length.
For example, a function taking any sequence of natural numbers to its sum,
    ⟨x1, x2, . . . , xn⟩ 7→ x1 + x2 + . . . + xn (the same function works for all n),
can clearly be computed by an algorithm, but is not covered by Definition A.3. Even worse,
there are algorithmic procedures which do not involve natural numbers at all, for example
reversing the letters in any word, as in

“CAT” 7→ “TAC” .

So, we want something a bit more general.


The idea of an algorithm, as a recipe for calculating with numerals, has been around infor-
mally since the late middle ages. The idea of defining an algorithm precisely, in a mathemat-
ical way, dates from the work of Alan Turing, Alonzo Church and Kurt Gödel in the 1930s.
At this time several different definitions of an algorithm emerged, perhaps the most famous
being the idea of Turing computability, but there were others such as Church’s lambda
calculus.
A fundamental problem with all this is that of trying to establish an equivalence between a
mathematically defined notion such as Turing computable or computable by lambda calcu-
lus (etc.) with the not-so-precise real-world notion of being computable by any algorithm
whatsoever.

C.2 About the appendix


It is probably a good idea at this juncture to take a look at Appendix C. There is quite a bit of material there and it leads us a little away from strictly numerical calculations, so I would suggest simply skimming the material to get an idea of what it is all about. Of course, if you find it fascinating and want to dig in, feel free.

Here is a quick overview.


The first (and main) part of Appendix C is devoted to developing the notion of a completely
general algorithm (as far as that can be done). Such an algorithm is treated as a general
method for manipulating patterns of symbols. So in particular, numbers are treated as
strings of symbols (for example, a string of 0s and 1s for binary code).
The main facts and results that will be found in the appendix are as follows.

(1) That a partial function Nn 99K N is partial recursive if and only if it is computable by an algorithm of the very general sort described in the appendix.
(2) Considering different possible codes for describing natural numbers: given some very simple and obvious restrictions on the form of the code, it does not matter which code you use — a function is computable in one code if and only if it is computable in the other.
These facts make the strongest argument I know for the assertion that the definition of
a partial recursive function at the beginning of this chapter characterises those functions
Nn → N which can be computed by any algorithm whatsoever. Which of course was the
point of everything we have done so far in this chapter.

(3) Out of the proof of this comes the important result (Theorem C.E.17) that:
For any partial recursive function f : Nn 99K N, there are primitive recursive functions p, q : Nn+1 → N such that
    f(x) = p(min_i { q(i, x) = 0 }, x) for all x ∈ Nn .

Back at the beginning of this chapter (Subsections A.2 and A.3), when partial recursive functions were being defined, it was pointed out that care should be taken with the definition of minimalisation (µg(y) = min_x { g(x, y) = 0 }), because it is easy to overlook the fact that it can be applied to partial functions, for which there is a special way of reading this definition. Since a partial recursive function is built up by repeated use of the operations of substitution and minimalisation, this can be a bit of a problem when trying to prove things about them. The result just quoted means that this problem can be avoided, and we have:—
(4) The original definition (in A.3) of a partial recursive function Nn 99K N remains true if we replace the word “minimalisation” with “minimalisation applied to a regular function”. (This is Corollary C.E.18 to the last-mentioned theorem.)
(5) A “programming language” is developed there which is designed to provide a fairly easy way to describe algorithms, and so to provide proofs that particular functions are in fact partial recursive. (The methods we have used so far have been fairly painful.)

The language developed in the appendix is aimed at the general form of algorithm, involving
symbol-by-symbol manipulation. This is working at too detailed a level for what we will be
wanting to do in this chapter, so our next job here is to describe a subset of that language,
sufficient for all the numerical manipulations we will want to do.
(6) There is a brief run-down on Turing machines.

C.3 A simple programming language


One of the ways of defining a partial recursive function is to give an algorithm to compute it.
Similarly, to prove that a given function is partial recursive, a standard method is to show it
can be computed by an algorithm. In many cases the algorithm can be quite complicated,
so it is good to have a way of describing algorithms as precisely and readably as possible.
I describe here a way of specifying an algorithm which is quite similar to a number of
computer languages, notably C. It is much simpler than those languages, because we do not
have to worry about many of the real-life problems that working on an actual computer
involves.
This language is a subset of the more general one given in the appendix — simpler since
we will not need to get down to the symbol-by-symbol kind of manipulation dealt with
there. However, that means that the proof that this language does define partial recursive
functions Nⁿ → N and, conversely, that every such function can be specified in this language,
is contained in the similar (and lengthy!) proof in the appendix.
A program has the following form: one or more functions of which the first one is designated
the main one.
function; function; . . . function;

There must be at least one function. The non-main functions are called subfunctions.
A function has the following form

name(x₁, x₂, . . . , xₘ)(u₁, u₂, . . . , uₙ) {S₁;S₂;. . . ;Sₖ} (–1)

where x₁, x₂, . . . , xₘ; u₁, u₂, . . . , uₙ are the variable symbols, of which x₁, x₂, . . . , xₘ are the
arguments and u₁, u₂, . . . , uₙ the local variable symbols. Either or both of these lists may be
empty, that is, m ≥ 0 and n ≥ 0. The variable symbols are all distinct. The S₁, S₂, . . . , Sₖ
are statements, of which the last one must be a return statement (so there is at least one
statement in the list). There may be return statements elsewhere in the function also. The
names of the functions which make up a program must all be different.
Here is a simple example. It computes the exponential function (for natural numbers) as
defined inductively by

mⁿ = 1 if n = 0,   mⁿ = mⁿ⁻¹·m otherwise.

The program is:


1 exp(m,n)() {
2 if (n=0) return 1;

3 return exp(m,n−1) * m;
4 }

As you can see, it is pretty well self-explanatory. It has only one function, the main one,
whose name is exp. In fact most of our programs will only need one function. The arguments
are m and n and there are no local variables.

The return 1 statement on line 2 does two things: it tells the function to “return” the value
1 (output it if you like) and it tells the algorithm to stop, that it is finished. (So in the case
n = 0 it does not go on to line 3.)
In the third line the function “calls” itself recursively, thereby taking advantage of the
inductive nature of the definition. (This is a great boon.)

The line numbers on the left are not part of the program. They are just shown to make it
easier to discuss what is going on.
Here is another example. This time we compute the same exponential function by repeated
multiplication. The program is:
1 exp(m,n)(r,u) {
2 r := 1; u := n;
3 while (u>0) {
4 r := r*m;

5 u := u−1;
6 }
7 return r
8 }

Here there are a couple of local variables. Think of them as private variables, which the
function can use for its own purposes. In this case, r is used to build up the value and u is
used as a count-down to the finish.

The while statement causes the statements in braces following (lines 4 and 5) to be repeated
over and over until the condition u>0 becomes false (as it will eventually because of the
countdown in line 5). In general, several statements can be grouped together to form a
single unit using curly braces { and }.
Note that spaces and newlines can be incorporated anywhere (except in the middle of names
of things). It is a good idea to use this and indenting to make your programs easier to
understand.
Note the difference between the plain equals sign in the first program (n=0) and the colon-
equals sign in the second (r:=1). The plain equals sign is used for a condition (is n equal
to zero?) and produces true or false as its value. The colon-equals sign means compute the
value on the right hand side and set the left hand variable equal to this. Note the difference
here:

x = x+1 asks “is x equal to x + 1?”; this is always false.
x := x+1 says “add 1 to the number x”.

These “colon-equal” statements are called assignment statements (because the value calculated
on the right hand side is “assigned” to the variable on the left). So the left hand side
must be a simple variable name. The right hand side can be a quite complicated expression.
(The word “expression” is used here in a different sense from its use in the context of formal
languages for theories, i.e. in other chapters of this book.)

Expressions
So what is allowed for an expression?
• Any variable name.

• Any numeral representing a natural number. For example 0 or 113.


• A value calculated by a subfunction. For instance, suppose we have included a
subfunction to compute exponentiation, as per either of the examples above. Then exp(x,5)
is a good expression.

• Expressions may be built up using the operations + (addition), − (natural subtraction)
and * (multiplication) using parentheses and the usual rules. Expressions may
be substituted into function calls, including substituting functions into one another. For
example
a*x + b*exp(x,(2*n+1))
is a good expression.

Note that it is OK to give one’s variables names which are several letters long (e.g., fred +
nurk). For this reason, use * for multiplication: as far as the programming language is
concerned, xy is a single variable name and has nothing to do with x or y.
• Variables can only have natural number values, so we use 1 and 0 to stand for true
and false. This allows us to write things like

x := (a<b) .

Here, x is set equal to either 1 or 0, depending upon whether the value of a is less than that
of b or not. The allowed comparison operations are the usual ones,

< , > , ≤ , ≥ , = , ≠

with the usual meanings.


• And now we add to our ways of building up expressions the logical connectives

¬ , ∧ , ∨ , ⇒ , ⇔

Statements
Firstly, note that a statement may be made up of a number of simpler statements by
enclosing them in curly braces thus: {S₁;S₂; . . . ;Sₖ} (this occurs in the definition of a
function, (–1), above). The individual statements are often written one under another for
legibility. A statement made up of simpler statements in this way is called a compound
statement; compound statements may be included in larger compound statements to any
depth. A statement which is not compound is called a simple statement.
Here are the kinds of simple statement allowed.

• An assignment statement is of the form

x := expression

where x is a variable name. These statements have pretty well been dealt with above.
• A return statement is of the form

return expression

It tells the function to compute the value of the expression and return it as the value of
the function. Every function must have a return statement at the end, and it is allowed to
have others elsewhere if it makes sense to return prematurely (this is done in the first exp
example above).

• An if statement is of the form

if (condition) T-statement ;
else F-statement

Here, a condition is any expression which evaluates to either 1 (for true) or 0 (for false).
If the condition is true, the program executes the T-statement, skips the F-statement and
proceeds to what follows. If the condition is false, the program skips the T-statement,
executes the F-statement and proceeds to what follows.
The else part may be omitted (as in the first exp example above).
The T- and F-statements here are frequently compound.
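For instance, here is a hypothetical little function (not one we will need later) whose body
is a single if statement with an else part; it returns the larger of its two arguments:

1 max(a,b)() {
2 if (a>b) return a;
3 else return b;
4 }

Whichever return statement is executed ends the computation, so exactly one value is
returned.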

• A while statement is of the form

while (condition) statement

The statement is executed repeatedly so long as the condition is true. The statement is
usually compound and in any case had better modify the condition somewhere because
otherwise it will continue repeating forever.
That is all we need to create quite complicated algorithms for partial recursive functions
conveniently. As usual, there are a few abbreviations that may be used to make things
easier.

Re-using functions
Having written out a program for the exponential function, as we have done above, we
can now use it in any other program we like. Strictly, the exponential function should be
written out as a subfunction of the new program; but since we know that it can be included,
there is no point in actually rewriting the whole thing every time we want to use it — that
would just be a waste of time.
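For instance, here is a hypothetical program computing mⁿ + nᵐ, with exp (copied from the
first example above) included as a subfunction:

1 sumexp(m,n)() {
2 return exp(m,n) + exp(n,m);
3 }
4 exp(m,n)() {
5 if (n=0) return 1;
6 return exp(m,n−1) * m;
7 }

From now on we would simply write the first function and note that exp is available.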

And the same goes, of course, for any other function we have already programmed.
But we can do better than that.
The big theorem of the appendix tells us that any partial recursive function is programmable
in this way. So, when we want to include such a function in a program, we can save a lot
of trouble by just noting that a program to compute it exists, give it a name, and forthwith
use it as a subfunction without bothering to write it out explicitly.

For instance, suppose we wanted to write out a program which involved the “T-function”
T(w, i) of Proposition A.14. The proof of the proposition tells us how to program it, but
that involves the functions Rem, L and R, which would also have to be programmed. But
there is no real need to bother. Just give it a name (T comes to mind) and use it.

Where partialness arises


Where a function is defined by a program as described here, it may be properly partial —
that is, it may have missing values. There is only one way this can occur, and that is by
getting into a “while-loop” that fails to terminate. Such unpleasantnesses can happen quite
easily. Consider, for instance, this program fragment
x := 1;
while (x ≠ 0) x := x+1;

Of course many infinite loops are not so easy to spot as this one. In fact we will see that
there is no way, in general, to recognise infinite loops.

Minimalisation
Here is how to implement minimalisation in this language. Suppose we want to implement

min_y{f(x, y) = 0},

where we already know how to implement the function f. So we can suppose we have a
subfunction
F(x₁, x₂, . . . , xₙ,y)
to compute it. Then the minimalisation is implemented thus:

y := 0;
while (F(x₁, x₂, . . . , xₙ,y) ≠ 0) y := y+1;
return y;

Easy!
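As a concrete (made-up) instance of this search pattern: the integer square root function
x ↦ ⌊√x⌋, which is min_y{(y+1)·(y+1) > x}, can be computed by

y := 0;
while ((y+1)*(y+1) ≤ x) y := y+1;
return y;

Here the condition is a comparison rather than an equation, but the pattern — count y
upwards until the condition changes — is exactly the same, and since a square root is always
found, this particular search always terminates.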

D The Ackermann function

D.1 Remarks
Are there any recursive functions which are not primitive recursive? In this section we answer
this question with “yes”, by exhibiting a recursive function and proving that it is not primitive
recursive. We will discuss the Ackermann function, but then work with a closely related
function which gives the result with a simpler proof.
Consider the definitions of addition, multiplication and exponentiation (which I will give in
a back-to front sort of way, defining x + y and xy by induction over x instead of the more
usual y):

0 + y = y        (x + 1) + y = suc(x + y)
0·y = 0          (x + 1)·y = x·y + y
y⁰ = 1           y^(x+1) = y^x·y

We could continue this pattern by defining a few new functions,

f(0, y) = 1      f(x + 1, y) = y^f(x,y)
g(0, y) = 1      g(x + 1, y) = f(g(x, y), y)

and so on.
So let us define a sequence aₘ of functions N² → N inductively, thus:

a₀(x, y) = x + y
a₁(0, y) = 0,     a₁(x + 1, y) = a₀(a₁(x, y), y)
aₘ₊₁(0, y) = 1,   aₘ₊₁(x + 1, y) = aₘ(aₘ₊₁(x, y), y) for all m ≥ 1.
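As a quick check that a₂ really is exponentiation: a₂(0, y) = 1 and a₂(x + 1, y) =
a₁(a₂(x, y), y) = a₂(x, y)·y, so a₂(x, y) = yˣ by induction over x.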

Then a₀ is addition, a₁ is multiplication and a₂ exponentiation, then a₃ is a sort of
super-exponentiation, a₄ a sort of super-super-exponentiation, and so on. Pretty soon, these
functions grow very fast indeed. We see that all these functions are primitive recursive
and so recursive. Now let us put the subscript in as another argument: define a function
F : N³ → N by F(m, x, y) = aₘ(x, y). We can provide a cleaner looking “recursive” definition
of this:
of this:

F (0, x, y) = x + y
F (1, 0, y) = 0
F (m, 0, y) = 1 for all m ≥ 2
F (m + 1, x + 1, y) = F (m, F (m + 1, x, y), y) for all m and x.

This function is recursive but not primitive recursive (that it is recursive is obvious; that it
is not primitive recursive is not).

The Ackermann function is a closely related (and slightly simpler) function Ak, given by. . .

D.2 Definition: The Ackermann function and a related one


(i) Define a sequence of functions aₘ : N → N for each m ∈ N by induction over m:

a₀(x) = x + 2
aₘ₊₁(0) = aₘ(aₘ(0)),     aₘ₊₁(x + 1) = aₘ(aₘ₊₁(x)).

The actual Ackermann function is the function Ak of two variables got by setting Ak(m, x) =
aₘ(x), so looking at it that way we have:
Ak(0, x) = x + 2                            (Ak1)
Ak(m + 1, 0) = Ak(m, Ak(m, 0))              (Ak2)
Ak(m + 1, x + 1) = Ak(m, Ak(m + 1, x)).     (Ak3)
However we will not work with this function. Rather, we will work with a closely related
function with similar properties and for which the proof of the important result is somewhat
simpler.

(ii) Define a sequence of functions αₘ : N → N for each m ∈ N by induction over m:

α₀(x) = x + 1             for all x ∈ N,
αₘ₊₁(x) = αₘ^[x+2](x)     for all m, x ∈ N.

Here the superscript [x + 2] means the function αₘ applied x + 2 times, that is, composed
with itself x + 1 times. For example,
αₘ^[4](2) means αₘ(αₘ(αₘ(αₘ(2)))).
In this section, we will write A for the function of two variables got by setting A(m, x) =
αₘ(x). (Giving a recursive-type definition like the equations Ak1–Ak3 above is not hard
but messy; we won’t be needing it.)
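To get a feel for the growth, it is a routine induction (easily checked from the definition,
though not done in the text) that α₁(x) = α₀^[x+2](x) = 2x + 2 and α₂(x) = α₁^[x+2](x) =
2^(x+2)·(x + 2) − 2, and that α₃ already grows roughly like an exponential tower of height x.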

D.3 Lemma
The Ackermann-like function A defined above is recursive.

Proof. That it is partial recursive is easily seen by writing a program for it:
1 A(m,x)(n,v) {
2 if (m=0) {return x+1};
3 n := x+2;
4 v := x;
5 while (n > 0) {
6 v := A(m−1,v);
7 n := n−1;
8 }
9 return v;
10 }

That it always has a value follows immediately from the definition of the α functions above
using induction over m.

D.4 Lemma
For the α functions defined above,
(i) αm (x) > x for all m, x ∈ N.

(ii) For any m ∈ N, the function x 7→ αm (x) is strictly increasing.


(iii) For any x ∈ N, the function m 7→ αm (x) is strictly increasing.
(iv) αₘ₊₁(x) ≥ αₘ(αₘ(x)) for all m, x ∈ N.

Proof. This is easy: prove (ii) first by induction over m. (If αₘ is strictly increasing, then
so is αₘ^[x+1].) Then (i) follows immediately, as does

αₘ₊₁(x) > αₘ(x) for all x ∈ N,

which gives (iii). Now (iv) follows immediately from the definition above. □

D.5 Lemma
For every primitive recursive function f : Nⁿ → N, there is an m such that αₘ “dominates”
f, in the sense that

f(x₁, x₂, . . . , xₙ) ≤ αₘ(max{x₁, x₂, . . . , xₙ}) for all x₁, x₂, . . . , xₙ ∈ N.

Proof. It is going to save space to use “vector” style notation: rewrite the above inequality
as
f(x) ≤ αₘ(max(x)) for all x ∈ Nⁿ.
We will prove the result by induction over the construction of the function f, as given in
Definition 9.B.3.

If f is one of the base functions (the successor function, the zero constant function or one
of the projection functions πₙ,ᵢ), then it is dominated by α₀.
Now suppose that f(x) = h(g₁(x), g₂(x), . . . , gₖ(x)), but write it

f(x) = h(y₁, y₂, . . . , yₖ) where yᵢ = gᵢ(x) for i = 1, 2, . . . , k.

We assume inductively that the functions h, g₁, g₂, . . . , gₖ are so dominated. Thus we are
assuming that there are m₀, m₁, . . . , mₖ ∈ N such that

h(y) ≤ α_{m₀}(max(y)) for all y ∈ Nᵏ
gᵢ(x) ≤ α_{mᵢ}(max(x)) for all i = 1, 2, . . . , k and x ∈ Nⁿ.

Writing m = max{m₀, m₁, . . . , mₖ} and using the preceding lemma, we have

h(y) ≤ αₘ(max(y)) for all y ∈ Nᵏ
yᵢ = gᵢ(x) ≤ αₘ(max(x)) for all i = 1, 2, . . . , k and x ∈ Nⁿ.

From the second of these, max(y) ≤ αm (max(x)) and then from the first one, h(y) ≤
αm (αm (max(x))) ≤ αm+1 (max(x)). Since f (x) = h(y) we have our result.
Finally, suppose that f is given by primitive recursion,

f (0, y) = g(y)
f (x + 1, y) = h(x, f (x, y), y) ,

where we assume inductively that g and h are each dominated by one of the α functions,
and so as above, they are both dominated by the same one, αm say.
Now we show, by induction over x, that

f(x, y) ≤ αₘ^[x+1](max(x, y)) for all x ∈ N and y ∈ Nⁿ.

We are given f(0, y) ≤ αₘ(max(y)). Now, assuming f(x, y) ≤ αₘ^[x+1](max(x, y)), we also
have x, y ≤ αₘ^[x+1](max(x, y)) (using the preceding lemma), so that

max(x, f(x, y), y) ≤ αₘ^[x+1](max(x, y)) and therefore

f(x + 1, y) = h(x, f(x, y), y) ≤ αₘ(αₘ^[x+1](max(x, y))) = αₘ^[x+2](max(x, y)),

completing the induction.

Put z = max(x, y). Then, for all x and y,

f(x, y) ≤ αₘ^[x+1](z) ≤ αₘ^[z+2](z) = αₘ₊₁(z) = αₘ₊₁(max(x, y))

as required. □
as required. 

D.6 Theorem
There is a function f : N → N which is recursive but not primitive recursive.

Proof. We have seen that the Ackermann-like function A is recursive. Consequently so is
the function f : N → N defined by f(x) = A(x, x) + 1. This is not primitive recursive because
if it were, we would get the usual “diagonal” kind of contradiction: there would be m ∈ N
such that f is dominated by αₘ, and then

f(m) ≤ αₘ(m) < αₘ(m) + 1 = A(m, m) + 1 = f(m). □

D.7 Discussion
The proof using the (genuine) Ackermann function goes much the same way. The main
part is to show that every primitive recursive function Nn → N is dominated by one of the
functions am . This part of the proof is more ticklish than the one given above, probably
because the a-functions do not grow as fast as the α-functions. The “diagonal” argument
then goes as above.

Let us see what can happen when using the definition to (attempt to) compute Ak(4, 4):

Ak(4, 4)
= Ak(3, Ak(4, 3)) by (Ak3)
= Ak(3, Ak(3, Ak(4, 2))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(4, 1)))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(4, 0))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, 0)))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(2, 0))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(1, 0)))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(0, Ak(0, 0))))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(0, 2)))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, 4))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(1, 3)))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(1, 2))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(1, 1)))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(1, 0))))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, 0)))))))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, 2))))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, 4)))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, 6))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, 8)))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, 10))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, 12)))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(2, 11))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(1, Ak(2, 10)))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(1, Ak(1, Ak(2, 9))))))))) by (Ak3)

and so on and on. It certainly does not look as though it is going to terminate any time
soon.

A rather mean trick that some teachers of computer science play on their students when they
are learning about recursive programming is to give them the definition of the Ackermann
function and ask them to write a small program to, say, compute Ak(6, 6). The task looks
easy, innocuous even. What the students find is that the process runs out of memory fast,
and there is not much they can do about it.
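Indeed, in the language of Section C.3 the innocent-looking program is just a direct
transcription of (Ak1)–(Ak3) (a sketch, not from the text):

1 Ak(m,x)() {
2 if (m=0) return x+2;
3 if (x=0) return Ak(m−1,Ak(m−1,0));
4 return Ak(m−1,Ak(m,x−1));
5 }

The program is perfectly correct; it is the sheer depth of the nested recursive calls, as
illustrated by the computation above, that defeats any actual computer.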

E Two fundamental theorems


We start with a lemma which shows that partial recursive functions N → N can be defined
without reference to functions Nⁿ → N for other n. It is not specially interesting in itself;
however, it is a building block for the next few theorems.

E.1 Lemma
The set of partial recursive functions N ⇢ N is the closure of the following functions

(i) x ↦ x + 1
(ii) L
(iii) R
(iv) x ↦ L(x) + R(x)
(v) x ↦ L(x) ∸ R(x)
(vi) x ↦ L(x)·R(x)

under the following operations

(a) composition;
(b) from g and h obtain f given by f(x) = P(g(x), h(x));
(c) from g obtain f given by f(x) = min_y{g(P(y, x)) = 0}.

Proof. Let us temporarily refer to the set of functions defined above as P and show that
this set is in fact the set of partial recursive functions. It is obvious that all such functions
are partial recursive, so it is only the converse that we need to prove.
We will use the inverse functions Pₙ : Nⁿ → N and Eₙ : N → Nⁿ defined in Section B.13.
We will also use two results, both easily proved by induction:
Firstly, if g₁, g₂, . . . , gₙ are all members of P (n ≥ 1), then so is Pₙ(g₁, g₂, . . . , gₙ).
Secondly, for n ≥ 1, Eₙ₊₁(P(y, w)) = (y, Eₙ(w)).

Now we prove the proposition by proving the slightly stronger result that, if f is any partial
recursive function Nⁿ ⇢ N (for any n), then f ∘ Eₙ : N ⇢ N is a member of P. This
contains the required result, since E₁ is the identity. We prove this by induction over the
construction of f as given in Definition A.3.
(1) If f is the successor function, then it is a member of P by definition.
(2) If f is the projection function πₙ,ᵢ : Nⁿ → N, that is, (x₁, x₂, . . . , xₙ) ↦ xᵢ, then
(from the definition of Eₙ),
f ∘ Eₙ = L R^(i−1) if i < n and f ∘ Eₙ = R^(n−1) if i = n.
In either case f ∘ Eₙ is a member of P.
In either case f ◦ En is a member of P.

(3) If f is addition, f(x, y) = x + y, then f ∘ E₂(w) = L(w) + R(w) and this is a member
of P by its definition.
(4) and (5) The same argument holds if f is natural subtraction or multiplication.
(6) Now suppose that f is obtained by substitution, f = h(g₁, g₂, . . . , gₘ) : Nⁿ ⇢ N
say, where h ∘ Eₘ and g₁ ∘ Eₙ, g₂ ∘ Eₙ, . . . , gₘ ∘ Eₙ are all members of P. Then
f ∘ Eₙ = h(g₁ ∘ Eₙ, g₂ ∘ Eₙ, . . . , gₘ ∘ Eₙ) = h ∘ Eₘ ∘ Pₘ(g₁ ∘ Eₙ, g₂ ∘ Eₙ, . . . , gₘ ∘ Eₙ).
This is a member of P by the first of our preliminary results above.

(7) Finally, suppose that f is obtained by minimalisation, f(x) = min_y{g(y, x) = 0}
say, where f : Nⁿ ⇢ N and g : Nⁿ⁺¹ ⇢ N and g ∘ Eₙ₊₁ is a member of P. But then so is
the function

x ↦ min_y{g ∘ Eₙ₊₁(P(y, x)) = 0} = min_y{g(y, Eₙ(x)) = 0} = f ∘ Eₙ(x). □

E.2 Theorem
There is a partial recursive function ϕ : N² ⇢ N such that the functions
ϕ(0, _) , ϕ(1, _) , ϕ(2, _) , . . .
are all the partial recursive functions N → N. That is, given any partial recursive function
f : N → N, there is an i such that f(x) = ϕ(i, x) for all x.
Note It is usual to write this function ϕᵢ(x) instead of ϕ(i, x). One can then think of
ϕ₀, ϕ₁, ϕ₂, . . . as being a listing of all the partial recursive functions N → N.
A useful way of looking at this result is: suppose we visualise the values of ϕ as a
two-dimensional array:

        0         1         2         3         4
0    ϕ(0, 0)   ϕ(0, 1)   ϕ(0, 2)   ϕ(0, 3)   ϕ(0, 4)   . . .
1    ϕ(1, 0)   ϕ(1, 1)   ϕ(1, 2)   ϕ(1, 3)   ϕ(1, 4)   . . .
2    ϕ(2, 0)   ϕ(2, 1)   ϕ(2, 2)   ϕ(2, 3)   ϕ(2, 4)   . . .
3    ϕ(3, 0)   ϕ(3, 1)   ϕ(3, 2)   ϕ(3, 3)   ϕ(3, 4)   . . .
4    ϕ(4, 0)   ϕ(4, 1)   ϕ(4, 2)   ϕ(4, 3)   ϕ(4, 4)   . . .
⋮       ⋮         ⋮         ⋮         ⋮         ⋮

Then the ith row of this table is a listing of the values of ϕi . And that means that every
partial recursive function will turn up listed as a row in this array somewhere. In fact, any
such partial recursive function will turn up as such a row an infinite number of times (we
will prove this later).
Note well that we are listing partial recursive functions here, with an emphasis on the
“partial”. There will be many values ϕ(i, j) which do not exist — so there will be lots of
“holes” in this array.

This function ϕ : N² ⇢ N is called a universal partial function, for obvious reasons.

Proof. Define ϕₙ thus:
Firstly, ϕ₀, ϕ₁, . . . , ϕ₅ are the six basic functions listed in the lemma above, in that order.
Then, for n ≥ 6,
If n = 3m + 6 then ϕₙ is the function got from ϕ_{L(m)} and ϕ_{R(m)} by composition.
If n = 3m + 7 then ϕₙ is the function got from g = ϕ_{L(m)} and h = ϕ_{R(m)} by ϕₙ(x) =
P(g(x), h(x)).
If n = 3m + 8 then ϕₙ is the function got from g = ϕₘ by ϕₙ(x) = min_y{g(P(y, x)) = 0}.
It is clear, from the lemma, that this will indeed list all the partial recursive functions
N ⇢ N. It remains to show that ϕ is itself partial recursive, as a function N² ⇢ N. To do
this we write a program.
this we write a program.

F(n,x)(m,r,v) {
if (n=0) {return x+1};
if (n=1) {return L(x)};
if (n=2) {return R(x)};
if (n=3) {return L(x)+R(x)};
if (n=4) {return L(x)−R(x)};
if (n=5) {return L(x)*R(x)};
if (n>5) {
m := ⌊n/3⌋ − 2;
r := Rem(n,3);
if (r=0) return F(L(m),F(R(m),x));
if (r=1) return P(F(L(m),x),F(R(m),x));
if (r=2) {
v := 0;
while (F(m,P(v,x)) ≠ 0) {v := v+1};
return v;
}
};
return 0;
}


E.3 Remarks
(i) The method of constructing partial recursive functions N ⇢ N given in Lemma
E.1 above effectively constitutes a way of making algorithms, because it tells us that any
such function can be built up in a finite sequence of steps, starting with the basic functions
listed there and combining them with the operations listed until, at the last step, the required
function is produced. A listing of these steps then is one (perhaps inefficient) way of defining
an algorithm.

The proof of the last theorem shows us how to associate a number i with such a listing.
And conversely, given such a number i, it tells us how to break it down to find the listing.

What this means is that the subscripts i on the functions ϕᵢ can be taken to correspond to
algorithms to compute them according to the method given by the lemma. And, conversely,
if we know how to build up the function according to the lemma, then we can find such a
subscript i.
Now consider the original definition of a recursive function back in Definition A.3. This
also gives us a method, effectively an algorithm, for specifying and also computing any partial
recursive function, by listing the steps whereby the function can be built up according to
that definition.
But the proof of Lemma E.1 above tells us how to convert back and forth between its way
of defining partial recursive functions and that of the original definition.

So that means that we can convert back and forth between a subscript i and an algorithm,
in the sense of a listing of steps according to the original definition.
Now we take this idea one step further. Suppose we have a partial recursive function specified
by a general algorithm, in the sense of Section C.A.1. Then the big long proof of Section E.1
tells us how we can rewrite that algorithm as a sequence of steps according to the original
definition. Actually doing so would be horrible, but a scan through that long proof will show
that, at every step, the way of building up the requisite function is stated explicitly. And,
conversely, given an algorithm in the sense of a sequence of steps building up a function
according to the original definition, the proof of Proposition C.E.14 tells us how to convert
it into a general algorithm.

This in turn means that we can convert back and forth between indices i and general
algorithms.
And, putting this all together, no matter which way we like to write an algorithm, we can
convert back and forth algorithms and indices i on the functions ϕi .

Another way of looking at this is that these functions ϕi make the indices i into a nice, very
compact, way of referring to algorithms.
This word “effective”. In the context of recursive functions and algorithms, to say some
procedure is “effective” means that there is an algorithm to do it, that is, that it can definitely
actually be done. To say that ϕ0 , ϕ1 , ϕ2 , . . . is an effective listing means that we can go
back and forth between the indices i and the algorithms for computing the functions — that
there are actually algorithms for so doing.
(ii) We now have a “gold-plated” effective listing of all partial recursive functions N → N,

ϕ0 , ϕ1 , ϕ2 , . . .

Using the recursive bijections Pₘ : Nᵐ → N we can list the partial recursive functions
Nᵐ ⇢ N,

ϕ₀^(m), ϕ₁^(m), ϕ₂^(m), . . .

simply by defining ϕᵢ^(m) = ϕᵢ ∘ Pₘ. In fact we can list the partial recursive functions
Nᵐ ⇢ Nⁿ,

ϕ₀^(n,m), ϕ₁^(n,m), ϕ₂^(n,m), . . .
just as simply, by defining ϕᵢ^(n,m) = Eₙ ∘ ϕᵢ ∘ Pₘ.

In the particular ordering given by the chosen proof above, ϕ₀, ϕ₁, . . . , ϕ₅ are the functions

ϕ₀(x) = x + 1
ϕ₁(x) = L(x)
ϕ₂(x) = R(x)
ϕ₃(x) = L(x) + R(x)
ϕ₄(x) = L(x) ∸ R(x)
ϕ₅(x) = L(x)·R(x).

Also, for any i, j and x,

ϕᵢϕⱼ(x) = ϕ_{3P(i,j)+6}(x)
P(ϕᵢ(x), ϕⱼ(x)) = ϕ_{3P(i,j)+7}(x)

and

min_y{ϕᵢ(P(y, x)) = 0} = ϕ_{3i+8}(x).

We notice that, as a consequence of the proof of Lemma E.1, there is a recursive function
χ : N² → N such that
ϕⱼϕᵢ = ϕ_{χ(j,i)} for all i and j
(here ϕⱼϕᵢ denotes composition).

More generally, because of the definitions just made, if ϕᵢ^(m,l) : Nˡ → Nᵐ and
ϕⱼ^(n,m) : Nᵐ → Nⁿ, then the indices compose by the same function: ϕⱼ^(n,m)ϕᵢ^(m,l) = ϕ_{χ(j,i)}^(n,l).

Another point worth noting is that, because of the definition of Pₙ,

P_{m+n}(x₁, x₂, . . . , xₘ, y₁, y₂, . . . , yₙ) = P_{m+1}(x₁, x₂, . . . , xₘ, Pₙ(y₁, y₂, . . . , yₙ))

and so
ϕᵢ^(m+n)(x₁, x₂, . . . , xₘ, y) = ϕᵢ^(m+1)(x₁, x₂, . . . , xₘ, Pₙ(y)).

In what follows it will be convenient to assume that we have chosen the particular listing
of partial recursive functions given in the theorem. However it is not necessary to do this:
most of the results still hold whatever effective listing is used (but then some of the proofs
must become more general).
One should observe that the listing is not one-to-one; in fact it is easy to see that any partial
recursive function appears in the list infinitely often. Less obvious is the fact that this must
be so in any effective listing: it is just not possible to effectively list the partial recursive
functions in such a way that every function appears just once. This explains some of the
choices of words in the theorems which follow. Rice’s Theorem (F.9 below) shows just how
bad the situation must be in this regard.
The next two lemmas and theorem are technical. They allow us to play some fancy games
with the subscripts i.

E.4 Lemma
For every partial recursive function f : Nⁿ → N there is a recursive function f̂ : Nⁿ → N
(not merely partial recursive!) such that

f ∘ (ϕ_{i₁}, ϕ_{i₂}, . . . , ϕ_{iₙ}) = ϕ_{f̂(i₁,i₂,...,iₙ)} for all i₁, i₂, . . . , iₙ.

(The function on the left is x ↦ f(ϕ_{i₁}(x), ϕ_{i₂}(x), . . . , ϕ_{iₙ}(x)). This is the notation
introduced in Subsection A.7.)

Proof. The proof is by induction over the construction of f as given in Definition A.3.


Suppose first that f is the successor function, f(x) = x + 1. Then

f(ϕᵢ(x)) = ϕᵢ(x) + 1 = ϕ₀ϕᵢ(x) = ϕ_{χ(0,i)}(x)

so the result is true with f̂(i) = χ(0, i).


Suppose next that f is the projection function f(x₁, x₂, . . . , xₙ) = xₖ. Then

f ∘ (ϕ_{i₁}, ϕ_{i₂}, . . . , ϕ_{iₙ}) = ϕ_{iₖ} = ϕ_{f(i₁,i₂,...,iₙ)}

so in this case f̂ = f.
Suppose next that f is addition, f(x, y) = x + y : N² → N. Then

f ∘ E₂(w) = f(L(w), R(w)) = L(w) + R(w), so f = ϕ₃^(2);

also

f ∘ (ϕᵢ, ϕⱼ) = f ∘ E₂ ∘ P ∘ (ϕᵢ, ϕⱼ) = ϕ₃ϕ_{3P(i,j)+7} = ϕ_{χ(3,3P(i,j)+7)}

so the result is true with f̂(i, j) = χ(3, 3P(i, j) + 7).


The same argument holds if f is natural subtraction or multiplication.

Suppose now that f = P : N² → N; then f ∘ (ϕᵢ, ϕⱼ) = P ∘ (ϕᵢ, ϕⱼ) = ϕ_{3P(i,j)+7} and so
the result is true with f̂(i, j) = 3P(i, j) + 7. Using this, we prove the proposition in the case
f = Pₙ : Nⁿ → N by induction over n. If n = 1 then f = P₁ = id, so f(ϕᵢ) = ϕᵢ, and so f̂ is
the identity also: f̂ = ϕ_d, where d is the index of the identity function. Now, assuming the
result is true for n,

P_{n+1} ∘ (ϕ_{i₁}, ϕ_{i₂}, . . . , ϕ_{i_{n+1}}) = P ∘ (ϕ_{i₁}, Pₙ(ϕ_{i₂}, ϕ_{i₃}, . . . , ϕ_{i_{n+1}}))
  = P ∘ (ϕ_{i₁}, ϕ_{P̂ₙ(i₂,i₃,...,i_{n+1})})   by the inductive hypothesis
  = ϕ_{3P(i₁,P̂ₙ(i₂,i₃,...,i_{n+1}))+7}

and so the result is true with f̂(i₁, . . . , i_{n+1}) = 3P(i₁, P̂ₙ(i₂, i₃, . . . , i_{n+1})) + 7.


Suppose now that f is given by substitution, f = h ∘ (g₁, g₂, . . . , gₘ) : Nⁿ → N. Then

f ∘ (ϕ_{i₁}, ϕ_{i₂}, . . . , ϕ_{iₙ}) =
  = h ∘ (g₁ ∘ (ϕ_{i₁}, . . . , ϕ_{iₙ}), g₂ ∘ (ϕ_{i₁}, . . . , ϕ_{iₙ}), . . . , gₘ ∘ (ϕ_{i₁}, . . . , ϕ_{iₙ}))
  = h ∘ (ϕ_{ĝ₁(i₁,...,iₙ)}, ϕ_{ĝ₂(i₁,...,iₙ)}, . . . , ϕ_{ĝₘ(i₁,...,iₙ)})
  = ϕ_{ĥ(ĝ₁(i₁,...,iₙ),ĝ₂(i₁,...,iₙ),...,ĝₘ(i₁,...,iₙ))}

so f̂ = ĥ ∘ (ĝ₁, ĝ₂, . . . , ĝₘ).


Finally, suppose that f is given by minimalisation,

f(x₁, x₂, . . . , xₙ) = min_y{g(y, x₁, x₂, . . . , xₙ) = 0}.

Then, putting w = P(y, x),

g(y, ϕ_{i₁}(x), ϕ_{i₂}(x), . . . , ϕ_{iₙ}(x)) = g(L(w), ϕ_{i₁}R(w), ϕ_{i₂}R(w), . . . , ϕ_{iₙ}R(w))
  = g(ϕ₁(w), ϕ_{i₁}ϕ₂(w), ϕ_{i₂}ϕ₂(w), . . . , ϕ_{iₙ}ϕ₂(w))   since L = ϕ₁ and R = ϕ₂
  = g(ϕ₁(w), ϕ_{χ(i₁,2)}(w), ϕ_{χ(i₂,2)}(w), . . . , ϕ_{χ(iₙ,2)}(w))
  = ϕ_θ(w) = ϕ_θ(P(y, x))

where θ = ĝ(1, χ(i₁, 2), χ(i₂, 2), . . . , χ(iₙ, 2)). So

f(ϕ_{i₁}(x), ϕ_{i₂}(x), . . . , ϕ_{iₙ}(x)) = min_y{ϕ_θ(P(y, x)) = 0} = ϕ_{3θ+8}(x)

and the result is true again with f̂(i₁, . . . , iₙ) = 3ĝ(1, χ(i₁, 2), χ(i₂, 2), . . . , χ(iₙ, 2)) + 8. □

E.5 Lemma
For any partial recursive function f : N² → N there is a recursive function f̃ : N → N such
that f(x, y) = ϕ_{f̃(x)}(y) for all x and y.

Proof. Consider first the case of the function π(x, y) = x for all x and y. Define π̃ inductively:

π̃(0) = z where z is the index of the zero function: ϕ_z(y) = 0 for all y,
π̃(x + 1) = χ(0, π̃(x)) where χ is the composition function defined above.

Then ϕ_{π̃(x)} is the constant function with value x, for every x.

Let f be any partial recursive function N² → N. Then, by the previous lemma, there is a
recursive function f̂ : N² → N such that ϕ_{f̂(i,j)}(y) = f(ϕᵢ(y), ϕⱼ(y)) for all i, j and y. Set
f̃(x) = f̂(π̃(x), d), where d is the index of the identity function, ϕ_d(y) = y. Then

ϕ_{f̃(x)}(y) = ϕ_{f̂(π̃(x),d)}(y) = f(ϕ_{π̃(x)}(y), ϕ_d(y)) = f(x, y)

as required. □

E.6 The “s-m-n Theorem”

For each m ≥ 1 and n ≥ 1 there is a recursive function s_{m,n} : Nᵐ⁺¹ → N such that

ϕᵢ^(m+n)(x, y) = ϕ_{s_{m,n}(i,x)}^(n)(y) for all i ∈ N, x ∈ Nᵐ and y ∈ Nⁿ.
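In words: from an index i for a partial recursive function of m + n variables, together with
fixed values x for the first m of them, we can compute recursively an index for the resulting
function of the remaining n variables. For instance (a made-up illustration), if ϕᵢ^(2)(x, y) =
x·y + 1 then ϕ_{s_{1,1}(i,3)} is the one-variable function y ↦ 3y + 1.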

Proof. We first prove the result in the case m = n = 1; the general case then follows easily.
So we prove: there is a recursive function s : N² → N such that

ϕᵢ^(2)(x, y) = ϕ_{s(i,x)}(y) for all i, x, y ∈ N.

We show how to define s recursively by giving a description of how to write a program for
s (with comments).
If i ≤ 5 then ϕᵢ is one of the six basic functions listed in Lemma E.1. Therefore ϕᵢ^(2) = ϕᵢ ∘ P
is one of the functions

(x, y) ↦ P(x, y) + 1 , x , y , x + y , x ∸ y or xy.

Whichever it is, call it f. Set s(i, x) = f̃(x) (where f̃ is the function defined in the preceding
lemma) and then ϕ_{s(i,x)}(y) = ϕ_{f̃(x)}(y) = f(x, y).

Otherwise i ≥ 6. Write m = ⌊(i − 6)/3⌋, so that i = 3m + 6, 3m + 7 or 3m + 8.

If i = 3m + 6 then ϕᵢ = ϕ_{L(m)}ϕ_{R(m)} and so

ϕᵢ^(2)(x, y) = ϕᵢ(P(x, y))
  = ϕ_{L(m)}ϕ_{R(m)}(P(x, y))
  = ϕ_{L(m)}(ϕ_{R(m)}^(2)(x, y))
  = ϕ_{L(m)}(ϕ_{s(R(m),x)}(y))
  = ϕ_{χ(L(m),s(R(m),x))}(y)

and so we set s(i, x) = χ(L(m), s(R(m), x)).


If i = 3m + 7 then ϕᵢ = P ∘ (ϕ_{L(m)}, ϕ_{R(m)}) and so

ϕᵢ^(2)(x, y) = ϕᵢ(P(x, y))
  = P(ϕ_{L(m)}(P(x, y)), ϕ_{R(m)}(P(x, y)))
  = P(ϕ_{L(m)}^(2)(x, y), ϕ_{R(m)}^(2)(x, y))
  = P(ϕ_{s(L(m),x)}(y), ϕ_{s(R(m),x)}(y))
  = ϕ_{3k+7}(y)

where k = P(s(L(m), x), s(R(m), x)), so we set s(i, x) = 3P(s(L(m), x), s(R(m), x)) + 7.
Before moving on, note that the function N³ → N³ given by (z, x, y) ↦ (x, z, y) is recursive,
so let its index be r: ϕ_r^(3,3)(z, x, y) = (x, z, y). Then, defining ρ : N → N by ρ(i) = χ(i, r),
we have ϕ_{ρ(i)}^(3)(z, x, y) = ϕᵢ^(3)(x, z, y) for all i, x, y and z.

If i = 3m + 8 then ϕᵢ(w) = min_z{ϕₘ(P(z, w)) = 0} for all w. Then

ϕᵢ^(2)(x, y) = ϕᵢ(P(x, y))
  = min_z{ϕₘ(P(z, P(x, y))) = 0}
  = min_z{ϕₘ^(3)(z, x, y) = 0}
  = min_z{ϕ_{ρ(m)}^(3)(x, z, y) = 0}
  = min_z{ϕ_{ρ(m)}^(2)(x, P(z, y)) = 0}
  = min_z{ϕ_{s(ρ(m),x)}(P(z, y)) = 0}
  = ϕ_{3s(ρ(m),x)+8}(y)

so set s(i, x) = 3s(ρ(m), x) + 8.

This completes the proof when m = n = 1.

Now we prove the result for m = 1 and any n. We want to prove that, for any n ≥ 1, there
is a recursive function s_{1,n} : N² → N such that ϕᵢ^(n+1)(x, y) = ϕ_{s_{1,n}(i,x)}^(n)(y) for all i, x ∈ N
and y ∈ Nⁿ. This is easy: the function s already defined will do:

ϕᵢ^(n+1)(x, y) = ϕᵢ^(2)(x, Pₙ(y)) = ϕ_{s(i,x)}(Pₙ(y)) = ϕ_{s(i,x)}^(n)(y).

Finally we prove the result for any m ≥ 1 and n ≥ 1, by induction over m. The case m = 1
has been proved above. Then, for the case m + 1,

ϕᵢ^(m+n+1)(x₁, x₂, . . . , x_{m+1}, y) = ϕ_{s(i,x₁)}^(m+n)(x₂, x₃, . . . , x_{m+1}, y)   (just proved above)
  = ϕ_{s_{m,n}(s(i,x₁),(x₂,x₃,...,x_{m+1}))}^(n)(y)

and so the result is true with s_{m+1,n}(i, x₁, x₂, . . . , x_{m+1}) = s_{m,n}(s(i, x₁), (x₂, x₃, . . . , x_{m+1})). □

E.7 Remark
It was remarked above that the listing of partial recursive functions given by Theorem
E.2 is by no means the only possible one. Many useful results about such effective listings
can be proved using the two main theorems of this section (Theorem E.2, as an existence
theorem, and the s-m-n Theorem).

F Some important negative results


In this section we look at a number of things which cannot, in general, be done. Both
the results here and their proofs are of considerable importance, and I would recommend
reading them with care. Given what we have already covered in these notes, the proofs
are not long or hard and contain interesting techniques. I will head each theorem with an
informal description of what it tells us.

In my opinion, these results are both alarming and fun, becoming gradually more so as you
progress through the section.
Throughout this section we will restrict our attention to partial recursive functions N ⇢ N,
but that is only to make the discussion simpler. All of these results can be generalised easily
to functions Nᵐ ⇢ Nⁿ.

F.1 Proposition
There is no universal recursive function. (Compare with the existence of a universal partial
function, proved in Theorem E.2.)
There is no recursive function ψ : N² → N such that the functions

ψ(0, _) , ψ(1, _) , ψ(2, _) , . . .

are all the recursive functions N → N (i.e. such that, for every recursive function f : N → N,
there is some i ∈ N such that f(x) = ψ(i, x) for all x).
The proof is a classic “diagonal” argument.

Proof. Suppose that ψ exists. Define g : N → N by

g(x) = ψ(x, x) + 1 for all x ∈ N.

Then g is recursive, so there is some i ∈ N such that g = ψ(i, _), that is,

g(x) = ψ(i, x) for all x ∈ N.

In the case x = i, the last two displayed equations give ψ(i, i) = g(i) = ψ(i, i) + 1, a
contradiction. □

One of the points of Theorem E.2 is that it shows how, given an algorithm for a partial
recursive function f, to find a subscript i such that f = ϕᵢ; and conversely, given i, to find an
algorithm for f . Thus we can think of the subscript i as a convenient way of encapsulating
the algorithm for ϕi . So one way of asking what we can tell about a partial recursive function
from a given algorithm to compute it is to ask what we can tell about the function ϕi from
a knowledge of the index i. As we will see, the answer is “pretty well nothing”.
In this context note that the fact that, for any given partial recursive function f , there are
an infinite number of algorithms to compute it, corresponds to the fact that there are an
infinite number of indices i such that ϕi = f . Also note that, to ask whether such-and-such

a question about a partial recursive function can be answered from a knowledge of a given
algorithm to compute it is to ask whether the answer to the question about ϕi is a recursive
predicate of the subscript i. The next Proposition answers an obvious such question in the
negative.

F.2 Proposition
Given an algorithm for a partial recursive function, one cannot in general determine whether
it is recursive or not.

Of course, for some algorithms one can certainly determine this from the algorithm. The
proposition says that you can’t always do it.
The set { i : ϕi is recursive } is not recursive.

Proof. Suppose that this set is recursive. Then its characteristic function

C(i) = 1 if ϕᵢ is recursive,   0 otherwise

is recursive. Define ψ : N² → N by

ψ(i, x) = ϕᵢ(x) if C(i) = 1,   0 otherwise

and we have a function which contradicts Proposition F.1. □

Next we look at the Halting Problem. This problem is to find a method whereby, given
any algorithm for a partial recursive function N ⇢ N and an argument x, one can decide
whether the algorithm halts or not for that x.

That this question is important, even central, is suggested by two things: firstly, that if such
a method exists, then it can easily be seen to extend to functions Nᵐ ⇢ Nⁿ and, secondly,
that it can be interpreted as asking whether there is any general and reliable method for
debugging computer programs (or at least for checking whether or not they get into an
“infinite loop”).

F.3 Proposition
The Halting Problem is not solvable.

The set dom ϕ is not recursive.

Proof. Suppose that dom ϕ is recursive. Then so is its characteristic function

C(i, x) = 1 if ϕ(i, x) is defined,   0 otherwise.

Now define ψ : N² → N by

ψ(i, x) = ϕ(i, x) if C(i, x) = 1,   0 otherwise

and we have another function which contradicts Proposition F.1. □

There is a subtle point here. This proposition says that there is no general method that will
work for every algorithm. No matter what method we try, there will be an algorithm out
there somewhere which defeats it. But there remains the possibility that there might exist a
(separate) method for each algorithm which would allow one to determine, for each value of
x, whether that algorithm halted or not. The existence of an infinite number of such methods
does not mean that they could all be combined together into one single “algorithm to rule
them all”.
The next proposition settles that question: there are algorithms for which no method exists.

F.4 Proposition
There is a particular partial recursive function f whose halting problem is not solvable.

Proof. There are in fact many. I will give two examples.


Define the function f by f = ϕ(L, R), that is, f(x) = ϕ(L(x), R(x)) for all x. Suppose that
dom f is recursive. Then so is its characteristic function,

C(x) = 1 if ϕ(L(x), R(x)) is defined,   0 otherwise.

Define ψ : N² → N by

ψ(i, x) = ϕ(i, x) if C(P(i, x)) = 1,   0 otherwise.

This is the ψ from the last proposition, so dom f is not recursive.


For a second example, define f(x) = ϕ(x, x). Suppose that dom f is recursive, with
characteristic function C. Define g : N → N by

g(x) = ϕ(x, x) + 1 if C(x) = 1,   0 otherwise.

Then g is recursive. So there is i ∈ N such that g = ϕᵢ, that is, g(x) = ϕ(i, x) for all x. Put
x = i. Noting that ϕ(i, i) = g(i), which is defined, so that i ∈ dom f and C(i) = 1, we have
the contradiction
ϕ(i, i) = g(i) = ϕ(i, i) + 1. □

F.5 Proposition
You cannot tell if any particular number occurs as a value either.
Given any y ∈ N, the set S = { (i, x) : ϕ(i, x) = y } is not recursive.

Proof. Suppose that S is recursive. Then we will solve the Halting Problem. Define
ψ : N² ⇢ N by ψ(i, x) = ϕ(i, x) ∸ ϕ(i, x) + y. Then

ψ(i, x) = y if (i, x) ∈ dom ϕ,   undefined otherwise

and (by Lemma E.5) there is a recursive α : N → N such that ψ(i, x) = ϕ(α(i), x) for all
i, x. Then

(i, x) ∈ dom ϕ ⇔ ψ(i, x) = y ⇔ ϕ(α(i), x) = y ⇔ (α(i), x) ∈ S.

The last set here is recursive and the first set is not. □

To see why the last set here is recursive, see Part (ii) of the Remark below. To see how
Lemma E.5 is applied, use it in the case n = 1, with f(u) = u ∸ u + y and α = f̃.

F.6 Remark
Suppose that R is a recursive subset of Nⁿ.

(i) Then so is Nⁿ ∖ R.
(ii) Let the function h : Nᵐ → Nⁿ be recursive also. Then so is the set { x : h(x) ∈ R } =
h⁻¹[R].

Proof. Let C be the characteristic function of R. Then the characteristic function of Nⁿ ∖ R
is 1 ∸ C, and the characteristic function of h⁻¹[R] is C ∘ h. □

F.7 Proposition
There is no effective procedure to decide what algorithm gives what function.
(Corollary: if a computer-science lecturer gives you an assignment to write a program to
compute some given function, then there is no way he/she can be sure of being able to mark
your answer right or wrong.)
Let g be any partial recursive function N ⇢ N. Then the set { i : ϕᵢ = g } is not recursive.

Proof. Suppose first that g ≠ ∅, i.e. that g has nonempty domain. Let θ be a partial
recursive function N ⇢ N with nonrecursive domain (e.g. one of the examples of Proposition
F.4).

Suppose that the set R = { i : ϕᵢ = g } is recursive.




Define a new function f : N² ⇢ N by f(u, x) = θ(u) ∸ θ(u) + g(x). This is partial recursive
and is the function

f(u, x) = g(x) if u ∈ dom θ,   undefined otherwise.

Now, by Lemma E.5 there is a recursive α : N → N such that f(u, x) = ϕ_{α(u)}(x) for all
u, x. Thus

ϕ_{α(u)}(x) = g(x) if u ∈ dom θ,   undefined otherwise,

which is to say

ϕ_{α(u)} = g if u ∈ dom θ,   ∅ otherwise.

Therefore (and here is where we use g ≠ ∅),

u ∈ dom θ ⇔ ϕ_{α(u)} = g ⇔ α(u) ∈ R ⇔ u ∈ α⁻¹[R].

This is a contradiction, since α⁻¹[R] is recursive but dom θ is not.


It remains to show that the set { i : ϕᵢ = ∅ } is not recursive.

Choose any nonempty partial recursive function g : N ⇢ N you like (say, the zero function).
Now define f and α as above, so again we have

ϕ_{α(u)} = g if u ∈ dom θ,   ∅ otherwise.

Then
u ∈ dom θ ⇔ ϕ_{α(u)} ≠ ∅ ⇔ α(u) ∉ R ⇔ u ∈ α⁻¹[N ∖ R]

and we have a contradiction as before. □

F.8 Corollary
There is no way of telling, from an algorithm, whether or not its function has any values at
all.

The set { i : ϕi = ∅ } is not recursive.

F.9 Rice’s Theorem

And if you thought things were bad so far, try this theorem and its corollary.
Let R be a recursive subset of N with the property that

if i ∈ R and ϕᵢ = ϕⱼ then j ∈ R also.

Then R = ∅ or R = N.
We have been looking at properties of algorithms which are “functional”, in the sense that
they are really properties of the functions they define. (Is the function total? Does it have

any zeros? Does it have any values at all?) One can define a functional property, P say, of
algorithms as one such that
For any two algorithms which generate the same function,
if one of them has property P then so does the other.
Or, equivalently,
For any two algorithms which generate the same function,
if one of them fails to have property P then so does the other.
Another way of looking at this is to regard two algorithms as being “equivalent” if they
generate the same function (a very natural idea, also easily seen to be a genuine equivalence
relation). Then a property P is functional if and only if it respects equivalence, in the sense
that two equivalent algorithms either both have property P or both don’t.

Given the relationship between algorithms and the indices i on the ϕ-functions, which has
underlain the results in this section so far, it is natural to consider functional properties of
these indices. Thus we would consider a predicate (unary relation) P on N as functional if,
whenever ϕi = ϕj , i and j either both have the property or both don’t.

I think that functional properties defined on N are clearly of some importance, and it is of
considerable interest to be able to determine whether any particular property is functional
or not.
We have already looked at the close relationship between predicates on N and subsets of N.
(Given the predicate P, define the subset to be { x : P(x) }.) So we define a subset R of N
as being functional if

For any i, j ∈ N such that ϕᵢ = ϕⱼ,
either both i and j are members of R or both are not.

Thus the question can be rephrased: can one determine whether any given subset of N is
functional or not? The theorem tells us that the only functional subsets which are recursive
are ∅ and N itself.

Proof. We will suppose that R is neither ∅ nor N and deduce a contradiction.


First note that the complement N ∖ R has the same properties: it is recursive and, if i ∈ N ∖ R
and ϕᵢ = ϕⱼ, then j ∈ N ∖ R also.

Let θ be a partial recursive function N ⇢ N with nonrecursive domain (e.g. one of the
examples of Proposition F.4).
Now the empty function is partial recursive, so there is an index z such that ϕ_z = ∅. It is
also either a member of R or its complement. We may suppose without loss of generality
that z ∈ N ∖ R. Since R is nonempty, it contains at least one number, k say, and then
ϕₖ ≠ ϕ_z (for otherwise the functional property would put z in R). Summary so far:

k ∈ R ,   z ∈ N ∖ R ,   ϕₖ ≠ ϕ_z = ∅.


Now define f : N² ⇢ N by f(u, x) = θ(u) ∸ θ(u) + ϕₖ(x). Then f is partial recursive and is
the function

f(u, x) = ϕₖ(x) if u ∈ dom θ,   undefined otherwise.

Now there is a recursive α : N → N such that f(u, x) = ϕ_{α(u)}(x) for all u, x. Thus

ϕ_{α(u)}(x) = ϕₖ(x) if u ∈ dom θ,   undefined otherwise,

which is to say

ϕ_{α(u)} = ϕₖ if u ∈ dom θ,   ∅ = ϕ_z otherwise.

Now
if u ∈ dom θ then ϕ_{α(u)} = ϕₖ and so α(u) ∈ R
and
if u ∉ dom θ then ϕ_{α(u)} = ϕ_z and so α(u) ∈ N ∖ R.
It follows that dom θ = { u : α(u) ∈ R }, which is recursive — contradicting the choice of θ. □

Let us say that a property of partial functions N ⇢ N is “trivial” if either all functions
have the property or all functions don’t. (Thus a trivial property tells us nothing about the
function.) Examples of trivial properties are “f is a partial function N ⇢ N”, “if f(x) exists,
then f(x) = f(x)”, and so on. So we are really only interested in non-trivial properties.

F.10 Corollary
There is no way of telling, from its algorithm, whether a function has any nontrivial property
at all.
Let P be any nontrivial property of functions N 99K N. Then the set { i : ϕi has property P }
is neither ∅ nor N and so is not recursive.
Is that fact depressing or exciting? Possibly both.

G Recursively enumerable sets

G.1 Definition

A subset of N is recursively enumerable (often abbreviated to r.e.) if it is the range of some
partial recursive function N ⇢ N.
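For example, the set of even numbers is recursively enumerable, being the range of the
(total, indeed primitive) recursive function x ↦ 2x. The interesting examples, of course,
come from genuinely partial functions.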

G.2 Discussion

So what is the difference between a recursive set and a recursively enumerable set?
For a recursive set R, there is an algorithm to decide, for any n ∈ N, whether n ∈ R or not.
One way I find helpful to look at this is: think of the algorithm as a machine. We feed any
numbers we like into it, and it spits out answers “Yes” or “No”.

With a recursively enumerable set S, there is an algorithm such that S is all the values it can
compute. Thinking of the algorithm as a machine, we feed it with all the natural numbers
0, 1, 2, . . . and collect all the numbers it spits out. After this infinite process is completed,
the collection of spat out numbers is S.
It is not hard to see that a recursive set must be recursively enumerable (details below), but
the other way round is problematical. Given our machine to generate the set S as above,
how could we determine whether any particular number n is in it or not? We could set the
machine going, with the numbers 0, 1, 2, . . . as input and wait to see if n is spat out. Now
if n happens to be in S, we will eventually see it appear, and our question is answered with
“Yes”. But if it is not in S, then we must wait forever for our answer — we have effectively
got ourselves into an infinite loop.

That is not a proof that there are recursively enumerable sets which are not recursive. The
fact that the suggested decision method doesn’t work does not mean that there isn’t another,
trickier, method which will. However we will prove below that there are such sets, including
some very important ones. In the meantime, I suppose that the existence of the two different
words for describing subsets suggests strongly that there is some difference between the two
types.
Why not use “recursive” instead of “partial recursive” in the definition? For two reasons.
Firstly, because we are interested in the output of algorithms, and as we have seen in the last
section, deciding whether an algorithm produces a recursive function or not is not always
possible. And secondly, as the next proposition shows, we can indeed define a recursively
enumerable set to be the range of a recursive function — with one small adjustment. Perhaps
more surprisingly, we can even define it as the range of a primitive recursive function (with
the same adjustment).
So what is that small adjustment? The empty set is recursively enumerable — we know
that because it is easy to define an algorithm which has no values at all. But the empty set
cannot possibly be the range of any total function. We will see that that is the only thing
that can go wrong: we can define a r.e. set as either ∅ or the range of a recursive function.

G.3 Proposition
For a subset R of N, the following are equivalent:
(i) R is recursively enumerable (i.e. the range of a partial recursive function N → N).

(ii) R is empty or the range of a recursive function N → N.


(iii) R is empty or the range of a primitive recursive function N → N.

Proof. Clearly (iii) ⇒ (ii) ⇒ (i), so it remains to prove (i) ⇒ (iii).

Suppose then that R is the range of a partial recursive function f and is nonempty; we
show that it is the range of a primitive recursive function. Using Theorem C.E.17, there are
primitive recursive functions p and q : N² → N such that

f(x) = p(min_t{q(t, x) = 0}, x).

Then the function

q̄(t, x) = 1 if t is a first zero of q,   0 otherwise

is primitive recursive also. Here by “a first zero” I mean that q(t, x) = 0 but q(t′, x) ≠ 0 for
all t′ < t. (In case it is not obvious that q̄ is primitive recursive, it is proved at the end of
the main part of this proof, so as not to halt the flow.)
From the definition of q̄ we see that q̄(t, x) = 1 if and only if t = min_u{q(u, x) = 0} and so

R = { p(t, x) : q̄(t, x) = 1 }.

Now R is nonempty, so there is some n₀ ∈ R. Using this, define a function r : N² → N by

r(t, x) = q̄(t, x)·p(t, x) + (1 ∸ q̄(t, x))·n₀.

With this definition it is obvious that r is primitive recursive. But, since q̄ only takes values
0 and 1, the definition may be restated

r(t, x) = p(t, x) if q̄(t, x) = 1,   n₀ otherwise.

Then R = ran r with r primitive recursive. It remains to replace r by a primitive recursive
function N → N. But
g(u) = r(L(u), R(u))
will do the trick, because the functions L and R are primitive recursive.
Appendix To see that q̄ is primitive recursive, we manufacture it from q using only
operations that we know will lead from primitive recursive functions to primitive recursive
functions. Given q, define q₁ by

q₁(t, x) = ∏_{u=0}^t q(u, x).

Now q₁ has the property that

q₁(t, x) = 0 if there is u ≤ t such that q(u, x) = 0,   nonzero otherwise.

Next define q₂ by
q₂(t, x) = 1 ∸ q₁(t, x).

Then q₂ is the function

q₂(t, x) = 1 if there is u ≤ t such that q(u, x) = 0,   0 otherwise.

Next define
q₃(t, x) = ∑_{u=0}^t q₂(u, x)

and then

q₃(t, x) = 1 if t is a first zero of q,   0 or ≥ 2 otherwise.

Now set q̄(t, x) = eq(q₃(t, x), 1). (Here eq is the Equality Test function defined in A.4(iv).) □

G.4 Remark
Since we have primitive recursive one-to-one correspondences between N and Nⁿ for each n,
functions N → N can be replaced by functions Nⁿ → N in the above proposition. Using the
same idea, recursively enumerable subsets of Nⁿ can be defined.
The next proposition is surprising, but nice. (IM¬HO anyway.)

G.5 Proposition
A subset R of N is recursively enumerable if and only if it is the domain of a partial recursive
function N → N.

Proof. First, suppose that R is recursively enumerable. If R is empty it is the domain of
the empty partial recursive function, so suppose R is nonempty. Then, by Proposition G.3,
it is the range of a recursive function f : N → N. Define g : N ⇢ N by

g(x) = min_i{f(i) = x}.

Then R is the domain of g.

Conversely, suppose that R is the domain of a partial recursive function g : N ⇢ N. Define
f : N ⇢ N by
f(x) = x·1(g(x))
where 1 is the constant function t ↦ 1. Then 1(g(x)) is 1 if g(x) is defined and undefined
otherwise. Therefore R is the range of f. □
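As a program in the language of Section C.3: supposing (hypothetically) that a subfunction
F computing f has already been written, the function g of the first half of the proof is just
an unbounded search:

1 g(x)(i) {
2 i := 0;
3 while (F(i) ≠ x) i := i+1;
4 return i;
5 }

The search fails to terminate exactly when x is never a value of F — which is how the
domain of g comes to be R.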

We now have four different characterisations of a recursively enumerable set. When trying
to prove things about these sets, the choice of which one to use can make a big difference
to the difficulty of finding a proof. I would recommend always contemplating whether it is
going to be easier to use a “range” or “domain” definition at the outset when embarking on
such a proof.

Now for the basic “easy” result promised above.

G.6 Proposition
If a subset is recursive then it is recursively enumerable.

Proof. Let R be a recursive set, C its characteristic function. Then R is the domain of the
partial recursive function
f(x) = min_i{C(x) = 1}. □

The point of that was that f(x) is defined if and only if x ∈ R.

G.7 Proposition
A subset R of N is recursive if and only if both it and its complement N r R are recursively
enumerable.

Proof. If R is recursive then so is its complement and then they are both recursively
enumerable by the previous proposition.

Now suppose both R and its complement are recursively enumerable. If either one is empty
the result is trivial, so we will suppose they are both nonempty. Then there are recursive
functions f and g such that R is the range of f and its complement is the range of g. Now
define a new function h : N → N by

    h(x) = f(⌊x/2⌋)        if x is even
           g(⌊(x−1)/2⌋)    otherwise.

The point of this is that the sequence of values of h consists of alternate values of f and g:

h(0) = f (0) h(1) = g(0)


h(2) = f (1) h(3) = g(1)
h(4) = f (2) h(5) = g(2)
h(6) = f (3) h(7) = g(3)

and so on. In particular, the set of even terms of h is R and the set of odd terms is its
complement. Therefore the function

    θ(x) = min_i{ h(i) = x }

is recursive (total!) and has the property that

    θ(x) is even if x ∈ R and θ(x) is odd if x ∉ R.

Consequently, the characteristic function of R can be computed as


    C(x) = 1   if θ(x) is even
           0   otherwise

and this is recursive. 
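Here is the whole decision procedure in Python (a sketch; f and g are assumed to be total
functions enumerating R and its complement respectively):

    def characteristic(f, g):
        def h(x):
            return f(x // 2) if x % 2 == 0 else g((x - 1) // 2)
        def C(x):
            i = 0
            while h(i) != x:   # halts: x occurs among the values of f or of g
                i += 1
            return 1 if i % 2 == 0 else 0
        return C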

And just in case you’ve been wondering whether there are in fact any recursively
enumerable sets which are not recursive, the answer is yes. In Section F we found several
partial recursive functions whose domains were not recursive. These domains however must
be recursively enumerable. Perhaps the most obvious example is the domain of ϕ.

G.8 Proposition
Let A and B be recursively enumerable sets. Then so are A ∪ B and A ∩ B.

Proof. If either A or B is empty the result is trivial, so now we may assume that A and
B are the ranges of recursive functions f and g respectively. Define h as in the previous
proposition. Then h is recursive and A ∪ B is the range of h, so it is recursively enumerable.
A and B are also the domains of partial recursive functions a and b respectively and then
A ∩ B is the domain of a + b. 

Note that, if A is a recursively enumerable set, then its complement need not be. Indeed,
if A is not actually recursive, then its complement cannot be recursively enumerable (by
Proposition G.7 above).

In the next chapter we will need a slight generalisation of Proposition G.7:

G.9 Proposition
Let A and B be disjoint recursively enumerable sets whose union is recursive. Then they
are both recursive.

Proof. By symmetry, it is enough to show that A is recursive. Since it is recursively
enumerable, it is enough to show that N ∖ A is also recursively enumerable. Now N ∖ (A ∪ B)
is recursive and so recursively enumerable. Therefore B ∪ (N ∖ (A ∪ B)) is also recursively
enumerable. But this is the set N ∖ A. ∎
10. GÖDEL’S THEOREM

A Gödel numbering
It will be assumed throughout this chapter that we are dealing with a fixed first-order theory
which “contains Robinson Arithmetic”, in the sense that it is an extension of the theory RA
described in Section 4.D.
Since PA is itself an extension of RA, everything in this chapter is true for any
first-order theory S which contains Peano Arithmetic PA, and that includes MK and ZF.
In any case, such a language has the relation = (equality, binary) and the functions 0̄ (zero,
nullary), x ↦ x⁺ (successor, unary), + and × (addition and multiplication, binary), but,
since it is an extension of RA, it may contain other relations and functions as well.
Recall that, to any natural number ξ, we defined a corresponding term ξ̄ in S so that

    1̄ = 0̄⁺ ,  2̄ = 0̄⁺⁺ ,  3̄ = 0̄⁺⁺⁺

and so on (see Definition 4.C.3).

A.1 Numbering of sequences again


In Definition 9.B.17 we had a numbering of all finite sequences

    P[∞] : ⋃_{n=0}^{∞} Nⁿ → N

which was very useful to us there. We proved that the function P[∞] is bijective.

A.2 Definition: Gödel numbering


The basic idea here is to assign numbers to various parts of the language S — to symbols,
expressions and proofs — and so allow arithmetic to talk about itself to some extent. These
numberings are called Gödel numberings.

First we number the symbols in our language. It does not matter much how we do this,
provided that the numbers given to different symbols are different; we will assume that they
are numbered in a one-to-one fashion. For example, suppose there are m (formal) function
symbols, f1 , f2 , . . . , fm of arities α1 , α2 , . . . , αm respectively, n (formal) relation symbols,
r1, r2, . . . , rn of arities β1, β2, . . . , βn respectively and the variable symbols are v1, v2, . . . ;
then we could number the symbols as follows:

    ¬                    0
    ⇒                    1
    ∀                    2
    (                    3
    )                    4
    ,                    5
    f1, f2, . . . , fm   6, 7, . . . , m+5
    r1, r2, . . . , rn   m+6, m+7, . . . , m+n+5
    v1, v2, . . .        m+n+6, m+n+7, . . .

We may call this the Gödel numbering of symbols and denote it gnSym. Thus, for example,
gnSym(∀) = 2 and gnSym(f2 ) = 7 (assuming f2 actually exists).
We can now use this numbering to represent each string in the language by a sequence of
natural numbers. For example, let us see how we would represent the axiom (∀x)(x⁺ ≠ 0̄).
Firstly, we have to rewrite this axiom in its fully formal form, according to the definitions of
Chapter 3; this is (∀x(¬ = (s(x), 0̄))) (here I am using s for the successor function). Now we
need to know what the Gödel numbers of the symbols 0̄, s, x and = are; this would depend
on how many extra functions and relations we have in our language S; let us suppose that

gnSym(0̄) = 7 , gnSym(s) = 8 , gnSym(=) = 10 and gnSym(x) = 11 .

Then our string would correspond to the sequence (3, 2, 11, 3, 0, 10, 3, 8, 3, 11, 4, 5, 7, 4, 4, 4).
(So far, this is quite closely analogous to the way a computer deals with text. Each symbol
of the text is stored as a number — its ASCII code — and words and expressions are stored
as sequences of these numbers. But next we go further.)

(In a formal language, none of the expressions are empty strings, so we need not deal with
the empty string at all.)
We can now use our P[∞] numbering of nonempty sequences just defined above to make
each string correspond to a single number. In the case of our example this would be
P[∞] (3, 2, 11, 3, 0, 10, 3, 8, 3, 11, 4, 5, 7, 4, 4, 4). This gives us a Gödel numbering of strings.
More precisely, it gives us a function gnStr from the set of all (nonempty) strings in the lan-
guage S to N defined thus: for the string a1 a2 . . . ak (where a1 , a2 , . . . , ak are the individual
characters (symbols) in the string)

    gnStr(a1 a2 . . . ak) = P[∞](gnSym(a1), gnSym(a2), . . . , gnSym(ak)) .

Since gnSym and P[∞] are both bijective, so is gnStr and so any string in the language may
be identified with its Gödel number. This allows us to make statements such as ...
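The two-stage numbering is easy to imitate in code (a toy sketch: the pairing below is the
Cantor bijection N × N → N, standing in for the text’s P[∞], and the resulting string code
is merely injective rather than bijective):

    def pair(a, b):
        # Cantor pairing, a bijection N x N -> N
        return (a + b) * (a + b + 1) // 2 + b

    gnSym = {'¬': 0, '⇒': 1, '∀': 2, '(': 3, ')': 4, ',': 5}  # etc.

    def gnStr(string):
        # fold the symbol numbers into one code; the length disambiguates
        code = 0
        for s in string:
            code = pair(code, gnSym[s])
        return pair(len(string), code)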

A.3 Proposition
The set of all expressions in S is recursive.
By this I mean that the set of all string Gödel numbers of expressions in the language is a
recursive subset of N.
Informal proof. Consider the definition of an expression given in Chapter 3. This pro-
vides a completely mechanical method for checking whether any given string is an expression
or not. Indeed, using the programming language described in Chapter 9, it is tedious but not
difficult to write a program to compute the characteristic function of this set, thus showing
that it is recursive.
Note Here and in what follows I will give a number of “informal proofs” that certain sets
are recursive or recursively enumerable. A detailed proof would consist of the presentation of
the appropriate algorithm. The algorithms for these proofs are indeed presented in Appendix
D in case you should feel suspicious about any of the informal ones.

A.4 Proposition
The set of all sentences in S is recursive.
Informal proof. As for the last proposition, noticing also that deciding whether an
expression contains any free variables is itself a mechanical procedure.

A.5 Proposition
The set of all axioms of PL is recursive.
By this I mean of course that the set of all string Gödel numbers of axioms of PL is a
recursive subset of N.
Remark The six “axioms” of PL are of course axiom schemas and represent between them
an infinite number of actual axioms. Basically they tell us that any expression with one of
the six prescribed kinds of structure is an axiom.
Informal proof. Consider the definition of an expression given in Chapter 3. This also
provides a completely mechanical method for breaking up an expression into its component
parts, so its structure can be compared with those of the six axiom schemas. Again it is
tedious but not difficult to write a program to compute the characteristic function of this
set, thus showing that it is recursive.

A.6 Remark: recursive axioms


Most first-order theories use the six axiom schemas of PL plus a finite number of proper
axioms. In this case of course their axioms form a recursive set. In cases where there are an
infinite number of proper axioms, it is nearly always the case that these axioms are also
defined in such a way as to form a recursive set.
Thus we may suppose that the set of axioms of a first-order theory is usually recursive; a
non-recursive set of axioms is a rather strange thing to have — consider, what is the use of
a set of axioms if you have no way of deciding what is an axiom and what isn’t?
Occasionally non-recursive sets of axioms are useful for theoretical reasons. For example,
the set of axioms created in the proof of Theorem 3.G.13 is (usually) non-recursive.
At this point we want to introduce the idea that a theory may have alternative sets of
axioms. We should get these ideas straight now.

Recall that in an axiomatic theory, we have the set of theorems which may be deduced
from a chosen set of axioms. Given a set A of expressions, we write Th(A) for the theory
generated by A, that is, the set of all expressions X such that A ⊢ X. Thus we say that
A is a set of axioms for the theory T if T = Th(A). It may be the case that A is different
from the axioms we chose to generate T in the first place.

We will say that a theory is recursively axiomatisable if it has a recursive set of axioms
(whether they were the ones chosen to generate it or not).

A.7 Remark: decidability


The idea of a theory being decidable – defined up to now somewhat informally by saying
that there is an algorithm for deciding whether or not an expression is a theorem – may
now be made precise: a theory is decidable if it (that is, the set of theorems) is recursive.
We know that, because of Universal Generalisation, an expression is a theorem if and only
if its fully quantified form is a theorem. It follows that a theory is decidable if and only if
the set of all sentences (closed expressions) which are theorems is recursive.

A.8 Gödel numbering continued


Proofs are sequences of expressions satisfying certain conditions. Thus they are sequences of
strings (sequences of sequences if you like). It only takes one more step to form a numbering
of such sequences of strings.
Suppose then that s1 , s2 , . . . , sk is a sequence of strings. This corresponds in a bijective
fashion to the sequence (gnStr(s1), gnStr(s2), . . . , gnStr(sk)) of natural numbers which
in turn corresponds to the single number P[∞](gnStr(s1), gnStr(s2), . . . , gnStr(sk)) .
This numbering of sequences of strings we will call the sequence Gödel numbering and
denote it gnSeq. We will be interested in the sequences of strings which constitute proofs.

A.9 Proposition
In a recursively axiomatisable theory, the set of all proofs is recursive.

By this we mean of course that the set of all sequence Gödel numbers of proofs in the
language is a recursive subset of N.
Informal proof. Consider the definition of a proof given in Chapter 3. This provides a
completely mechanical method for checking whether any given sequence of strings is a proof
or not.

Consider what is involved. First we must step through the individual strings checking that
each is indeed an expression: we recover their string Gödel numbers using the function ent,
and test them as guaranteed by Proposition A.3 above. We must also check that each string
arises by one of the rules for forming a proof – that it is a case of an axiom (checkable by
assumption), that it follows from two earlier steps by MP or from one earlier step by UG.
All these are mechanical things to check, though the algorithms might be tiresome to write
out.

And along the same lines:

A.10 Proposition
The “Proof of” relation

    ⟨ξ, η⟩ ↦ “ξ is the string Gödel number of some expression and η is the sequence Gödel number of a proof of that expression”

is recursive.
Informal proof. Same as for the previous proposition.

A.11 Theorem
In a recursively axiomatisable theory, the set of all theorems is recursively enumerable.

Proof. Let a be the string Gödel number of some fixed theorem, say the first instance of the
first axiom of PL. Now consider the function f : N → N defined by the following algorithm:
for each w ∈ N, compute whether w is the Gödel number of a valid proof or not. If it is the
Gödel number of a valid proof, output the string number of its last entry, ent(len(w), w) —
which is the string Gödel number of the theorem it proves. If w is not the Gödel number of
a valid proof, then output a.
Then the set of all theorems is just the range of f . 
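In code the enumeration looks like this (a sketch; is_proof and last_entry are hypothetical
stand-ins for the deciders guaranteed by Propositions A.9 and A.10):

    def make_f(a, is_proof, last_entry):
        # a = string Goedel number of one fixed theorem
        def f(w):
            return last_entry(w) if is_proof(w) else a
        return f

    # the theorems are then exactly { f(0), f(1), f(2), ... }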

A.12 Theorem
A recursively axiomatisable complete theory is decidable.

Proof. Let T be the set of all sentences (closed expressions) in the language which are the-
orems and let U be the set of all “antitheorems” — sentences whose negation is a theorem (A
is an antitheorem means ⊢ ¬A). The set of antitheorems is clearly recursively enumerable
too — simply enumerate the theorems as described above and negate each one as you go.
If T and U are not disjoint then the theory is inconsistent; then every expression is a theorem
and so T is the set of all sentences, which is recursive (by Proposition A.4).
Suppose now that T and U are disjoint. They are both recursively enumerable and their
union is the set of all sentences (that is the definition of “complete”), which is recursive.
Therefore (by 9.G.9) T is recursive. ∎
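Concretely, the decision procedure runs the two enumerations side by side (a sketch;
enum_T and enum_U are hypothetical total functions enumerating the Gödel numbers of the
theorems and the antitheorems, and the input is the Gödel number of a sentence):

    def decide(sentence_gn, enum_T, enum_U):
        i = 0
        while True:   # completeness: every sentence appears on one side
            if enum_T(i) == sentence_gn:
                return True
            if enum_U(i) == sentence_gn:
                return False
            i += 1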

B Expressibility and representability


B.1 Definition: Expressible, representable
(i) An n-ary relation r on N is expressible in S if there is an expression R(x1, x2, . . . , xn)
in S with exactly n free variables x1, x2, . . . , xn such that, for any ξ1, ξ2, . . . , ξn in N,

    if r(ξ1, ξ2, . . . , ξn) is true then ⊢ R(ξ̄1, ξ̄2, . . . , ξ̄n) in S
    and if r(ξ1, ξ2, . . . , ξn) is false then ⊢ ¬R(ξ̄1, ξ̄2, . . . , ξ̄n) in S.

We then say that “R expresses r”.

(ii) A function f : Nⁿ → N is representable in S if there is an expression F(x1, x2, . . . , xn, y)
in S with exactly n + 1 free variables x1, x2, . . . , xn, y such that, for any ξ1, ξ2, . . . , ξn, η in N,

    if f(ξ1, ξ2, . . . , ξn) = η then ⊢ F(ξ̄1, ξ̄2, . . . , ξ̄n, η̄) in S

and, for all ξ1, ξ2, . . . , ξn in N, ⊢ (!y) F(ξ̄1, ξ̄2, . . . , ξ̄n, y).

We then say that “F represents f”.

Recall that the expression (!y) F(ξ̄1, ξ̄2, . . . , ξ̄n, y) here is short for

    (∀y)(∀y′)( F(ξ̄1, ξ̄2, . . . , ξ̄n, y) ∧ F(ξ̄1, ξ̄2, . . . , ξ̄n, y′) ⇒ y = y′ ) .

B.2 Remark
If the function f : Nⁿ → N is representable, then the (n + 1)-ary relation given by
f(ξ1, ξ2, . . . , ξn) = η is expressible. The first of the two requirements of Part (i) of the
definition above is given explicitly. As for the second, suppose that f(ξ1, ξ2, . . . , ξn) ≠ η.
Then, writing θ for the true value, that is, defining θ = f(ξ1, ξ2, . . . , ξn), we have θ ≠ η and
so, from these last two equations,

    ⊢ F(ξ̄1, ξ̄2, . . . , ξ̄n, θ̄) and ⊢ θ̄ ≠ η̄ .

From this and the last line of the definition above it follows that ⊢ ¬F(ξ̄1, ξ̄2, . . . , ξ̄n, η̄).
However, note that the notion of representability of the function says more than that the
relation f (ξ1 , ξ2 , . . . , ξn ) = η is expressible. The last line of the definition above says that,
moreover, this relation is of the kind that defines a function, well, at least when the argu-
ments represent natural numbers.

B.3 The Representability Theorem


(i) If a function f : Nⁿ → N is recursive, then it is representable.

(ii) If an n-ary relation on N is recursive then it is expressible.


Preliminary remarks
We know that, in a formal proof, if we have two lines of the forms (∀x)P (x) and (∀x)Q(x)
then it is valid to deduce that (∀x)(P (x) ∧ Q(x)). On the other hand, if we have two lines
of the forms (∃x)P (x) and (∃x)Q(x) then it is not valid to deduce that (∃x)(P (x) ∧ Q(x)).

(Think: (∃x)(x = 2) and (∃x)(x = 3) are both true, but (∃x)(x = 2 ∧ x = 3) is not.) It is
however valid to deduce that (∃x)(∃x′)(P(x) ∧ Q(x′)).
When using the choice rule, as we will in the proof below, we have a similar consideration.
If we have lines of the forms (∃x)P (x) and (∃x)Q(x), it is valid to apply the choice rule to
the first line to get P (x), but if we apply the choice rule again to the second line, we must
introduce a new choice variable and write, say, Q(x′).

Proof. First we show that any recursive function f : Nⁿ → N is representable.


The proof is by induction over the construction of f as a recursive function, as described in
Corollary 9.C.2 Item (3) — the construction that is like the original one, except that we
always use recursive functions, never truly partial ones.

(i) The successor function f(ξ) = ξ + 1 is represented by

    F(x, y) :  y = x⁺

Suppose that f(ξ) = η, that is, that ξ⁺ = η. Then ⊢ ξ̄⁺ = η̄ by the definition of the bar
notation.
For uniqueness, suppose that F(ξ̄, y) ∧ F(ξ̄, y′), that is, y = ξ̄⁺ ∧ y′ = ξ̄⁺; then
y = y′ by substitution of equals.
(ii) The projection function πn,i(ξ1, ξ2, . . . , ξn) = ξi is represented by

    Πn,i(x1, x2, . . . , xn, y) :  y = xi ∧ x1 = x1 ∧ x2 = x2 ∧ . . . ∧ xn = xn

The apparently redundant parts of this expression are required to satisfy the stipulation in
the definition that there be exactly n + 1 variables.
Suppose that πn,i(ξ1, ξ2, . . . , ξn) = η, that is, ξi = η. Then ⊢ ξ̄i = η̄ and,
trivially, for i = 1, 2, . . . , n, ⊢ ξ̄i = ξ̄i. Therefore ⊢ Πn,i(ξ̄1, ξ̄2, . . . , ξ̄n, η̄).
For uniqueness, suppose that Πn,i(ξ̄1, ξ̄2, . . . , ξ̄n, y) ∧ Πn,i(ξ̄1, ξ̄2, . . . , ξ̄n, y′). Then y = ξ̄i
and y′ = ξ̄i, from which y = y′.
(iii) Addition is represented by

    A(x1, x2, y) :  x1 + x2 = y .

Suppose that ξ1 + ξ2 = η. Then ⊢ ξ̄1 + ξ̄2 = η̄ by Proposition 4.D.7(ii).
For uniqueness, suppose that ξ̄1 + ξ̄2 = y ∧ ξ̄1 + ξ̄2 = y′. Then y = y′ as required.

(iv) Multiplication is represented by

    M(x1, x2, y) :  x1 x2 = y .

and the proof is the same as for addition (using 4.D.7(iii)).

(v) Natural subtraction is represented by

    S(x1, x2, y) :  (x1 < x2 ∧ y = 0̄) ∨ (x1 = y + x2) .

Suppose that ξ1 ∸ ξ2 = η. Then either ξ1 < ξ2 and η = 0, or else ξ2 ≤ ξ1 and η + ξ2 = ξ1.

Suppose first that ξ1 < ξ2 and η = 0. Then ⊢ ξ̄1 < ξ̄2 by 4.D.7(vi) and η̄ = 0̄ by the
definition of bar notation. But then ⊢ S(ξ̄1, ξ̄2, η̄) by plain SL.

Suppose on the other hand that ξ2 ≤ ξ1 and η + ξ2 = ξ1. Then ⊢ ξ̄2 ≤ ξ̄1 by 4.D.7(vi′) and
⊢ η̄ + ξ̄2 = ξ̄1 by the definition of bar notation. Then again ⊢ S(ξ̄1, ξ̄2, η̄) by plain SL.
Note that here we have used Proof by Cases, but this has been carried out in the meta-
language; we have not used the fact that such a proof is valid in RA (even though it is).

Note also that ⊢ ξ̄1 = y + ξ̄2 ⇒ ξ̄2 ≤ ξ̄1 and so S(ξ̄1, ξ̄2, y) is equivalent to

    (ξ̄1 < ξ̄2 ∧ y = 0̄) ∨ (ξ̄2 ≤ ξ̄1 ∧ ξ̄1 = y + ξ̄2) .

For uniqueness, suppose that S(ξ̄1, ξ̄2, y) ∧ S(ξ̄1, ξ̄2, y′) so that

    (ξ̄1 < ξ̄2 ∧ y = 0̄) ∨ (ξ̄2 ≤ ξ̄1 ∧ ξ̄1 = y + ξ̄2)             (–1)
    and (ξ̄1 < ξ̄2 ∧ y′ = 0̄) ∨ (ξ̄2 ≤ ξ̄1 ∧ ξ̄1 = y′ + ξ̄2) .     (–2)

Suppose now that ξ1 < ξ2. Then ⊢ ξ̄1 < ξ̄2 and ⊢ ¬(ξ̄2 ≤ ξ̄1) (from 4.D.7(vi) and (vi′))
and so (–1) and (–2) reduce to y = 0̄ and y′ = 0̄, from which y = y′.
Otherwise ξ2 ≤ ξ1, in which case (–1) and (–2) reduce similarly to

    ξ̄1 = y + ξ̄2 and ξ̄1 = y′ + ξ̄2

from which y = y′ by 4.D.7(viii).

(vi) Now suppose that f : Nⁿ → N is obtained by substitution, f = h ∘ (g1, g2, . . . , gm),
and that h and g1, g2, . . . , gm are all representable. Then we have expressions
H(z1, z2, . . . , zm, y) and Gi(x1, x2, . . . , xn, zi) for i = 1, 2, . . . , m such that, for any
ζ1, ζ2, . . . , ζm and η,

    if h(ζ1, ζ2, . . . , ζm) = η then ⊢ H(ζ̄1, ζ̄2, . . . , ζ̄m, η̄)
    and ⊢ (!y) H(ζ̄1, ζ̄2, . . . , ζ̄m, y)

and, for any ξ1, ξ2, . . . , ξn, ζ and i = 1, 2, . . . , m,

    if gi(ξ1, ξ2, . . . , ξn) = ζ then ⊢ Gi(ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄)
    and ⊢ (!z) Gi(ξ̄1, ξ̄2, . . . , ξ̄n, z) .

We show that f is represented by the (obvious?) expression F(x1, x2, . . . , xn, y), defined
to be

    (∃z1)(∃z2) . . . (∃zm)( H(z1, z2, . . . , zm, y)
        ∧ G1(x1, x2, . . . , xn, z1)
        ∧ G2(x1, x2, . . . , xn, z2)
        . . .
        ∧ Gm(x1, x2, . . . , xn, zm) )

Suppose that f(ξ1, ξ2, . . . , ξn) = η. Define ζ1, ζ2, . . . , ζm by

    ζi = gi(ξ1, ξ2, . . . , ξn) for i = 1, 2, . . . , m.

Then h(ζ1, ζ2, . . . , ζm) = η, and so we have

    ⊢ Gi(ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄i) for each i and ⊢ H(ζ̄1, ζ̄2, . . . , ζ̄m, η̄) .

Therefore

    ⊢ H(ζ̄1, ζ̄2, . . . , ζ̄m, η̄)
        ∧ G1(ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄1)
        ∧ G2(ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄2)
        . . .
        ∧ Gm(ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄m)

and so

    ⊢ (∃z1)(∃z2) . . . (∃zm)( H(z1, z2, . . . , zm, η̄)
        ∧ G1(ξ̄1, ξ̄2, . . . , ξ̄n, z1)
        ∧ G2(ξ̄1, ξ̄2, . . . , ξ̄n, z2)
        . . .
        ∧ Gm(ξ̄1, ξ̄2, . . . , ξ̄n, zm) )

and this is F(ξ̄1, ξ̄2, . . . , ξ̄n, η̄), as required.


For uniqueness we must now show that, for any ξ1, ξ2, . . . , ξn in N,

    ⊢ (∀y)(∀y′)( F(ξ̄1, ξ̄2, . . . , ξ̄n, y) ∧ F(ξ̄1, ξ̄2, . . . , ξ̄n, y′) ⇒ y = y′ ) .

Using Universal Generalisation and the Deduction Theorem, it is enough to show that

    F(ξ̄1, ξ̄2, . . . , ξ̄n, y) , F(ξ̄1, ξ̄2, . . . , ξ̄n, y′) ⊢ y = y′ .

The proof goes as follows: the two hypotheses are

    (∃z1)(∃z2) . . . (∃zm)( H(z1, z2, . . . , zm, y)
        ∧ G1(ξ̄1, ξ̄2, . . . , ξ̄n, z1)
        ∧ G2(ξ̄1, ξ̄2, . . . , ξ̄n, z2)
        . . .
        ∧ Gm(ξ̄1, ξ̄2, . . . , ξ̄n, zm) )

and

    (∃z1)(∃z2) . . . (∃zm)( H(z1, z2, . . . , zm, y′)
        ∧ G1(ξ̄1, ξ̄2, . . . , ξ̄n, z1)
        ∧ G2(ξ̄1, ξ̄2, . . . , ξ̄n, z2)
        . . .
        ∧ Gm(ξ̄1, ξ̄2, . . . , ξ̄n, zm) ) .

Using the choice rule on these (and this is where the preliminary remark at the beginning
of the proof is relevant), we have

    H(z1, z2, . . . , zm, y)               (a)
    G1(ξ̄1, ξ̄2, . . . , ξ̄n, z1)            (a1)
    G2(ξ̄1, ξ̄2, . . . , ξ̄n, z2)            (a2)
        . . .
    Gm(ξ̄1, ξ̄2, . . . , ξ̄n, zm)            (am)
    H(z′1, z′2, . . . , z′m, y′)           (b)
    G1(ξ̄1, ξ̄2, . . . , ξ̄n, z′1)           (b1)
    G2(ξ̄1, ξ̄2, . . . , ξ̄n, z′2)           (b2)
        . . .
    Gm(ξ̄1, ξ̄2, . . . , ξ̄n, z′m)           (bm)

(as separate lines in our proof). Now, from Lines (a1) and (b1), we have z′1 = z1 and, from
the other similar pairs of lines, z′2 = z2, . . . , z′m = zm. Then substitution of equals in Line
(b) gives H(z1, z2, . . . , zm, y′) which, with Line (a), gives y′ = y, as required.

(vii) Finally suppose that f is obtained by minimalisation of a regular function,

    f(ξ1, ξ2, . . . , ξn) = min_η{ g(η, ξ1, ξ2, . . . , ξn) = 0 }

where g is regular and representable, so there is an expression G(y, x1, x2, . . . , xn, z) such
that
    if g(η, ξ1, ξ2, . . . , ξn) = ζ then ⊢ G(η̄, ξ̄1, ξ̄2, . . . , ξ̄n, ζ̄)
and, for all η, ξ1, ξ2, . . . , ξn in N,

    ⊢ (∀z)(∀z′)( G(η̄, ξ̄1, ξ̄2, . . . , ξ̄n, z) ∧ G(η̄, ξ̄1, ξ̄2, . . . , ξ̄n, z′) ⇒ z = z′ ) .

I will show that f is represented by the expression F(x1, x2, . . . , xn, y) defined to be

    G(y, x1, x2, . . . , xn, 0̄) ∧ (∀u < y)¬G(u, x1, x2, . . . , xn, 0̄) ∧ (∀u)(u < y ∨ u = y ∨ y < u)

Firstly, suppose that f(ξ1, ξ2, . . . , ξn) = η. Then g(η, ξ1, ξ2, . . . , ξn) = 0 and
g(ν, ξ1, ξ2, . . . , ξn) ≠ 0 for all ν < η. We therefore have

    ⊢ G(η̄, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄) ;                         (–3)

and, for any ν < η, using Remark B.2 above, we have

    ⊢ ¬G(ν̄, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄) .

Now this is true for every ν < η, so we have

    ⊢ ¬G(0̄, ξ̄1, . . . , ξ̄n, 0̄) ∧ ¬G(1̄, ξ̄1, . . . , ξ̄n, 0̄) ∧ . . . ∧ ¬G((η−1)‾, ξ̄1, . . . , ξ̄n, 0̄)

which, using 4.D.9, gives

    ⊢ (∀u < η̄)¬G(u, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄) .                 (–4)

Also,
    ⊢ (∀u)(u < η̄ ∨ u = η̄ ∨ η̄ < u)                           (–5)

by 4.D.7(vii′). Now (–3), (–4) and (–5) together give ⊢ F(ξ̄1, ξ̄2, . . . , ξ̄n, η̄), as
required.

Finally we must prove that, for all ξ1, ξ2, . . . , ξn in N,

    ⊢ (∀y)(∀y′)( F(ξ̄1, ξ̄2, . . . , ξ̄n, y) ∧ F(ξ̄1, ξ̄2, . . . , ξ̄n, y′) ⇒ y = y′ ) .

Using Universal Generalisation and the Deduction Theorem as before, it is enough to prove
that
    F(ξ̄1, ξ̄2, . . . , ξ̄n, y) , F(ξ̄1, ξ̄2, . . . , ξ̄n, y′) ⊢ y = y′ .
Now let us give an argument in S. We have

    G(y, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄)                    (–6)
    (∀u < y)¬G(u, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄)           (–7)
    (∀u)(u < y ∨ u = y ∨ y < u)                    (–8)
    G(y′, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄)                   (–6′)
    (∀u < y′)¬G(u, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄)          (–7′)
    (∀u)(u < y′ ∨ u = y′ ∨ y′ < u)                 (–8′)

and from (–8)

    y < y′ ∨ y = y′ ∨ y′ < y                       (–8″)

Suppose that y < y′. Then (–7′) gives ¬G(y, ξ̄1, ξ̄2, . . . , ξ̄n, 0̄), which contradicts (–6).
Therefore ¬(y < y′). In the same way ¬(y′ < y). From (–8″) now, y = y′, as required.
(viii) Suppose that the n-ary relation r is recursive: we want to show that it is expressible.
By definition of recursiveness of a relation, its characteristic function

    c(ξ1, ξ2, . . . , ξn) = 1   if r(ξ1, ξ2, . . . , ξn) is true,
                           0   if r(ξ1, ξ2, . . . , ξn) is false

is recursive and so, as just proved above, is representable. Therefore there is an expression
C(x1, x2, . . . , xn, y) which represents it. We will show that r is expressed by the (once again
obvious?) expression C(x1, x2, . . . , xn, 1̄).
If r(ξ1, ξ2, . . . , ξn) is true, then c(ξ1, ξ2, . . . , ξn) = 1 and so ⊢ C(ξ̄1, ξ̄2, . . . , ξ̄n, 1̄). On the
other hand, if r(ξ1, ξ2, . . . , ξn) is false, then c(ξ1, ξ2, . . . , ξn) = 0 and so, using Remark B.2
above, ⊢ ¬C(ξ̄1, ξ̄2, . . . , ξ̄n, 1̄), as required. ∎

C Gödel’s theorem
Recall that in Proposition 4.D.8 the following was shown to be a theorem of S:

    ⊢ P(0̄) ∧ P(1̄) ∧ . . . ∧ P(ν̄) ⇔ (∀x ≤ ν̄)P(x)

This will be used in the proof of Gödel’s Theorem below.

C.1 Preliminary: a trap for young players


Suppose you have a formal theory S for which N is a model. Let P(x) be an expression in
S with just the one free variable x, and suppose that

    for all ξ ∈ N,  ⊢ P(ξ̄).

You cannot infer from this that

    ⊢ (∀x)P(x) .

However, for any particular ν ∈ N, you can argue from

    for all ξ ≤ ν,  ⊢ P(ξ̄)

to
    ⊢ (∀x ≤ ν̄)P(x) ;

here’s how:
given any particular natural number ξ ≤ ν, our assumption gives us

    ⊢ P(ξ̄) ,

so we have (separate) theorems

    ⊢ P(0̄) , ⊢ P(1̄) , . . . , ⊢ P(ν̄) ,

from which
    ⊢ P(0̄) ∧ P(1̄) ∧ · · · ∧ P(ν̄)

from which (by the proposition quoted above),

    ⊢ (∀x ≤ ν̄)P(x)

as required.

C.2 Another preliminary


For the proof of the theorem, we will choose a particular pair of variable symbols x and y
(different), which will remain fixed for the whole proof.
We assign Gödel numbers to the expressions of S which contain x as their only free variable;
we write Pξ(x) for the expression (if any) whose Gödel number is ξ. Thus

    P0(x) , P1(x) , P2(x) , . . .

are all such expressions.

We will be using Gödel numbers so often in this proof that I will call them simply “GNs”.

C.3 Gödel’s Incompleteness Theorem


If S is consistent then it is not complete.

Proof. We assume that S is consistent and show that it is not complete.


We start by defining two relations W(ξ, η) and V(ξ, η) on N as follows

    W(ξ, η) :  “η is the Gödel number of a proof of Pξ(ξ̄)”      (–1)
    V(ξ, η) :  “η is the Gödel number of a proof of ¬Pξ(ξ̄)”     (–2)

Notice that we are already doing something rather peculiar: substituting into Pξ(x) its own
GN.

These relations are recursive (Informal proof: As for Proposition A.10, with a few more
obviously algorithmic checks). They are therefore expressible. Let Ŵ(x, y) and V̂(x, y) be
the expressions which express them, thus:

    if W(ξ, η) is true then      ⊢ Ŵ(ξ̄, η̄)
    if W(ξ, η) is false then     ⊢ ¬Ŵ(ξ̄, η̄)
    if V(ξ, η) is true then      ⊢ V̂(ξ̄, η̄)
    and if V(ξ, η) is false then ⊢ ¬V̂(ξ̄, η̄)

Because S is consistent, these are actually equivalences. For,

    Suppose that                          ⊢ Ŵ(ξ̄, η̄)
    Then, by consistency                  ⊬ ¬Ŵ(ξ̄, η̄)
    Then, by the second statement above   W(ξ, η) is not false
    so                                    W(ξ, η) is true.

The same argument goes for the other three statements. Thus we have

    W(ξ, η) is true if and only if      ⊢ Ŵ(ξ̄, η̄)      (–3A)
    W(ξ, η) is false if and only if     ⊢ ¬Ŵ(ξ̄, η̄)     (–3B)
    V(ξ, η) is true if and only if      ⊢ V̂(ξ̄, η̄)      (–4A)
    and V(ξ, η) is false if and only if ⊢ ¬V̂(ξ̄, η̄)     (–4B)

Now, define W*(x) to be the expression (∀y)¬Ŵ(x, y).

This (sort of) says that a certain statement has no proof. Being more careful:—
Let ξ be any natural number. Because N is a model for S, we can infer from

    ⊢ W*(ξ̄)    that is, from    ⊢ (∀y)¬Ŵ(ξ̄, y)

that
    (∀η ∈ N)¬W(ξ, η)

and this says that, for any η ∈ N, η is not the GN of a proof of Pξ(ξ̄). But since every
proof has a GN, this says that

    Pξ(ξ̄) has no proof.                            (–5)

Now W*(x) is an expression with exactly one free variable x, so it has a GN, which we will
call µ. That is, it is the expression Pµ(x).

    W*(x) = Pµ(x)                                   (–6)

Now let us see what happens if we substitute µ̄ for x in this, to get W*(µ̄).

In (–5) we have just seen that, if ⊢ W*(µ̄), then Pµ(µ̄) has no proof, and in (–6) that
Pµ(µ̄) is the same as W*(µ̄). So this tells us that

    if ⊢ W*(µ̄) then ⊬ W*(µ̄)

from which it follows that ⊬ W*(µ̄) .

At this point it seems we should be able to finish the proof off reasonably quickly: argue
the other way to get

    if ⊬ W*(µ̄) then ⊢ W*(µ̄)

from which ⊢ W*(µ̄), and so S is not complete.
But this is where we hit the trap for young players: to say that ⊬ W*(µ̄) (that is, Pµ(µ̄) has
no proof) is to say that

    (∀η ∈ N)( η is not the GN of a proof of Pµ(µ̄) ) ,

that is,    (∀η ∈ N) ¬W(µ, η)

and from this, using (–3B), we get

    for all η ∈ N,  ⊢ ¬Ŵ(µ̄, η̄)

but from this we cannot get to

    ⊢ (∀y)¬Ŵ(µ̄, y)

which is the ⊢ W*(µ̄) we want.
Instead, we do some fancy footwork which allows us to use the C.1 trick
mentioned at the beginning of this section.

Now we use that other relation V(ξ, η) defined at the beginning in (–2). Let V*(x) be the
expression
    (∀y)( Ŵ(x, y) ⇒ (∃z ≤ y)V̂(x, z) )

Now let ν be the GN of V*(x) so that

    V*(x) = Pν(x) .

We will consider V*(ν̄); note that this is

    (∀y)( Ŵ(ν̄, y) ⇒ (∃z ≤ y)V̂(ν̄, z) ) .

We now show that neither V ∗ (ν̄) nor ¬V ∗ (ν̄) is a theorem, and so S is not complete.
Both facts are proved by contradiction.

(A) Assume that ⊢ V*(ν̄), that is, that

    ⊢ (∀y)( Ŵ(ν̄, y) ⇒ (∃z ≤ y)V̂(ν̄, z) )            (–7)

Since this is a theorem, it has a proof. Let α be the GN of a proof of this. (n.b. There may
be many; pick one.)
But V*(ν̄) is Pν(ν̄) and, since α is the GN of a proof of this, (using (–1)) we have

    W(ν, α) is true

and therefore (by (–3A))

    ⊢ Ŵ(ν̄, ᾱ)

which, with (–7) above, gives us

    ⊢ (∃z ≤ ᾱ)V̂(ν̄, z)                               (–8)

Noting that we are working with the assumptions that ⊢ V*(ν̄) and that S is consistent,
we have
    ⊬ ¬V*(ν̄)
in other words: there is no proof of ¬V*(ν̄). In particular, for any natural number η, η is
not the GN of a proof of ¬V*(ν̄) — which is ¬Pν(ν̄). In other words again, V(ν, η) is false
for all η. Thus (by (–4B)) we have

    ⊢ ¬V̂(ν̄, η̄)  for all η ∈ N.

In particular
    ⊢ ¬V̂(ν̄, 0̄) , ⊢ ¬V̂(ν̄, 1̄) , . . . , ⊢ ¬V̂(ν̄, ᾱ)
so, since there is only a finite number of these,

    ⊢ ¬V̂(ν̄, 0̄) ∧ ¬V̂(ν̄, 1̄) ∧ · · · ∧ ¬V̂(ν̄, ᾱ)

and so (by the proposition quoted at the beginning)

    ⊢ (∀z ≤ ᾱ)¬V̂(ν̄, z)

and
    ⊢ ¬(∃z ≤ ᾱ)V̂(ν̄, z)

which contradicts (–8).

(B) Now assume that ⊢ ¬V*(ν̄), that is, that

    ⊢ ¬(∀y)( Ŵ(ν̄, y) ⇒ (∃z ≤ y)V̂(ν̄, z) )           (–9)

which, by the usual Logic manipulations, is equivalent to

    ⊢ (∃y)( Ŵ(ν̄, y) ∧ (∀z ≤ y)¬V̂(ν̄, z) )

Let β be the GN of a proof of ¬V*(ν̄). Then, by (–2), V(ν, β) is true, so by (–4A),

    ⊢ V̂(ν̄, β̄) .                                     (–10)

Also, we are assuming that ⊢ ¬V*(ν̄) so, by consistency,

    ⊬ V*(ν̄)

and that means that there is no proof of V*(ν̄) in S. Then (from (–1)) W(ν, η) is false for
all η ∈ N. Then, using (–3B),

    for all η ∈ N,  ⊢ ¬Ŵ(ν̄, η̄)

in particular
    ⊢ ¬Ŵ(ν̄, 0̄) , ⊢ ¬Ŵ(ν̄, 1̄) , . . . , ⊢ ¬Ŵ(ν̄, β̄)
and so
    ⊢ (∀y ≤ β̄)¬Ŵ(ν̄, y)                              (–11)

Recalling that this is just an abbreviation for (∀y)(y ≤ β̄ ⇒ ¬Ŵ(ν̄, y)), we have

    ⊢ (∀y)( Ŵ(ν̄, y) ⇒ y > β̄ )

from which, trivially,

    ⊢ (∀y)( Ŵ(ν̄, y) ⇒ y ≥ β̄ ) .

But we also have (–10), which gives (also trivially)

    ⊢ (∀y)( y ≥ β̄ ⇒ (∃z ≤ y)V̂(ν̄, z) ) .

From the last two

    ⊢ (∀y)( Ŵ(ν̄, y) ⇒ (∃z ≤ y)V̂(ν̄, z) ) ,

which contradicts (–9). ∎

C.4 A fixed-point theorem


Let P(x) be an expression with one free variable x. Then there is a sentence Q with Gödel
number ψ such that ⊢ Q ⇔ P(ψ̄).

Proof. Let f be the function N → N defined thus: f(ξ) is the Gödel number of the statement
obtained by finding the expression with Gödel number ξ and replacing all free variables in
it by ξ̄. This is clearly recursive, so it is represented by an expression of exactly two free
variables, F(x, y) say:
    if f(ξ) = η then ⊢ F(ξ̄, η̄)
and
    ⊢ (∀x)(!y)F(x, y) .                             (–1)
Given the expression P(x), let R(x) be the expression

    R(x) = (∃u)( P(u) ∧ F(x, u) )

and suppose that R(x) has Gödel number ρ. Let Q be R(ρ̄) and suppose that Q has Gödel
number ψ. From this and the definition of f, we see that ψ = f(ρ) and therefore that

    ⊢ F(ρ̄, ψ̄) .                                     (–2)

From the definition of Q we also see that it is in fact the expression (∃u)( P(u) ∧ F(ρ̄, u) ).
From this, (–1) and (–2) it follows (by ordinary logic) that

    ⊢ Q ⇔ P(ψ̄)

as required. ∎
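The diagonal function f is easy to picture in code (a sketch; expr_of, gn and subst_numeral
are hypothetical helpers translating between expressions and their Gödel numbers):

    def f(xi, expr_of, gn, subst_numeral):
        e = expr_of(xi)                  # the expression with Goedel number xi
        return gn(subst_numeral(e, xi))  # number of e with the numeral for xi
                                         # substituted for its free variables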

C.5 Church’s Theorem


If S is consistent then it is not decidable.
In other words, the set { ν : ν ∈ N and ν is the Gödel number of a theorem in S } is not
recursive.

Note Since S is any first-order theory which contains Robinson Arithmetic, this theorem
says that no such theory can be both consistent and decidable.

Proof. We suppose that this set, T say, is recursive and prove that then S is not consistent.
So we are supposing that the unary relation r on N defined by

    r(ξ) ⇔ ξ is the Gödel number of a theorem

is recursive. It is then expressible, which means that there is an expression R(x) in S
with exactly one free variable such that, for all ξ ∈ N,

    if ξ is the Gödel number of a theorem, then ⊢ R(ξ̄)      (–1)
    and if not, then ⊢ ¬R(ξ̄) .                               (–2)

Applying the fixed-point theorem above to ¬R(x), there is a sentence Q with Gödel number
ψ such that
    ⊢ Q ⇔ ¬R(ψ̄) .                                            (–3)
Suppose now that Q is not a theorem; then ψ is not the Gödel number of a theorem, so
⊢ ¬R(ψ̄) by (–2) and then ⊢ Q by (–3). But this is a contradiction, so Q is a theorem, that
is ⊢ Q. But then, by (–1), ⊢ R(ψ̄). From (–3) again, ⊢ ¬Q and so S is inconsistent. ∎

C.6 Gödel-Rosser Theorem


If S is recursively axiomatisable and consistent then it is not complete.

Proof. Because if it were complete, then by A.12 it would be decidable, contradicting
Church’s Theorem above. ∎

This is the famous Gödel theorem, as strengthened by Rosser. It says that Peano’s axioms
are not sufficient to “answer every question about N”: PA, being consistent, cannot also be
complete. But it also says much more than that: it is impossible to achieve this result by
adding more axioms, provided those axioms are such that it is possible to decide what is an
axiom and what isn’t.

D PL is not decidable
D.1 Discussion
We have seen in this chapter, among other things, that Mathematics as a whole is not
decidable, whether we do it from a basis of MK or of ZF — or of pretty well anything that
allows counting for that matter. I suppose that is just as well, otherwise we mathematicians
would soon be out of a job.
We did see however that Sentential Logic SL is decidable (back in Section 2.G). That means
that, once you get the hang of it, SL is completely routine.

Considering the Truth Table method, it feels very much as though just a bit of tweaking
would upgrade it to a decision method for Predicate Logic PL. Any given expression contains
only a finite number of functions, relations and variables. Surely, it seems, one could look
at all possible combinations of truth-values the relations could have and assign a variable
to each one — or something like that. Well, to cut a long story short, that cannot work.
We can now prove, fairly easily, that PL is not decidable. While the proof is now short and
easy, that does not mean it is simple, because it builds on a number of the big results in
these notes.
These are the ideas and results we will call on:

• PL is consistent (Theorem 3.H.1)

• Recursively axiomatisable and complete ⇒ decidable (Theorem 10.A.12)

• Definition of finitely axiomatisable (Definition 3.J)

• RA is finitely axiomatisable (Definition 4.D.2)

• RA is not decidable (Church’s Theorem in this chapter)

We need a short preliminary lemma:

D.2 Lemma
If PL is decidable, then so is any finitely axiomatisable theory.

Proof. Let us suppose that PL is decidable and that T is a finitely axiomatisable theory,
that is, a first-order theory with a finite set of proper axioms A1, A2, . . . , An.
We need an algorithm to decide whether or not any given expression P is a theorem of T .

But P is a theorem of T if and only if

A1 ∧ A2 ∧ · · · ∧ An ⇒ P

is a theorem of PL — and (given our supposition) we can decide that. ∎



D.3 Theorem
Predicate Logic PL is not decidable.

Proof. We know that RA is finitely axiomatisable. So, by the previous lemma, if PL were
decidable, then RA would be too. But it’s not. ∎
A. CONSTRUCTION OF THE BASIC NUMBER SYSTEMS

In this appendix I describe the construction of the most important number systems of
mathematics, The Natural Numbers N, The Integers Z, The Rational Numbers Q, The Real
Numbers R and, briefly, the complex numbers C.
Proofs will mostly not be given; however, the results are given in such an order that each
can be proved quite easily from what has gone before.

A Quotient sets and algebraic operations


The ideas of a quotient set and, more specifically, a quotient algebra will be used in several
of the constructions that follow. It will make the ensuing discussions simpler if we describe
them in general first.

A.1 Quotient sets


Let ∼ be an equivalence relation on a set A (see Section 6.B.20) and let a be a member of
A. Then the equivalence class of a is the set of all members of A which are equivalent to a.
It is denoted [a], so we have

[a] = { x : x ∈ A and x ∼ a } .

It is easy to check that the equivalence classes form a partition of A, that is,

(i) Each equivalence class is nonempty,


(ii) Distinct equivalence classes are disjoint and
(iii) A is the union of all its equivalence classes.

((ii) and (iii) may be restated: every member of A is a member of exactly one equivalence
class.) Another useful property is:
(iv) For any a, b ∈ A, the following three things are equivalent: a ∼ b , [a] = [b] and
[a] ∩ [b] 6= ∅.
Given an equivalence relation ∼ on a set A, we define the quotient set A/∼ to be the set of
all equivalence classes under ∼:

A/∼ = { [a] : a ∈ A } .

The function a 7→ [a] is called the natural projection A → A/∼.


A.2 Algebraic operations


An algebraic operation of arity n on a set A is simply a function Aⁿ → A. Examples of
algebraic operations are addition and multiplication on N (both binary), and indeed addition
and multiplication on the rest of our number systems (Z, Q, R, C). Negation is a unary
operation on all of these systems except for N. Inversion and division are not algebraic
operations because they are not everywhere defined (x/y and y −1 are undefined when y is
zero). Subtraction is a binary algebraic operation on most of our number systems, but not
on N.

A.3 Respect
We say that an equivalence relation ≡ on a set A respects the algebraic operation θ if
(assuming θ is n-ary), for any members a1, a2, . . . , an and a′1, a′2, . . . , a′n of A such that
a1 ≡ a′1, a2 ≡ a′2, . . . , an ≡ a′n,

    θ(a1, a2, . . . , an) ≡ θ(a′1, a′2, . . . , a′n) .

An equivalence relation which respects all the algebraic operations we are interested in is
usually called a congruence.

Observe that any nullary operation is automatically respected by any equivalence relation
(the definition above is vacuously satisfied).
The most familiar example of a congruence is ordinary congruence modulo n on the integers,
for any fixed integer n. It respects all the ordinary operations of a ring with unity: addition,
negation, zero, multiplication and unity (one).
Now let ≡ be an equivalence relation on an algebra A which respects the n-ary algebraic
operation θ. Then θ can also be defined on the quotient set in a natural way as follows:
let X1 , X2 , . . . , Xn be members of A/≡, that is, equivalence classes. Choose any members
x1 , x2 , . . . , xn of these classes (x1 ∈ X1 , x2 ∈ X2 , . . . , xn ∈ Xn ). Then θ(X1 , X2 , . . . , Xn ) is
defined to be the equivalence class containing θ(x1 , x2 , . . . , xn ).
It must of course be checked that this definition makes sense: that the definition just given of
θ(X1 , X2 , . . . , Xn ) is independent of the particular choice of the representatives x1 , x2 , . . . , xn
of the classes. That this is so follows quite easily from the fact that ≡ respects θ.
Once this has been established we can rewrite the definition in a much more convenient
form: given ≡ and θ as above, let a1 , a2 , . . . , an be members of A; then [a1 ], [a2 ], . . . , [an ]
are equivalence classes and we want to define θ([a1 ], [a2 ], . . . , [an ]). Well then,

θ([a1 ], [a2 ], . . . , [an ]) = [θ(a1 , a2 , . . . , an )] .
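A familiar instance, transcribed into Python as a quick sanity check (a sketch using
congruence mod 5, with classes held by their canonical representatives):

    n = 5

    def cls(a):
        return a % n              # canonical representative of [a]

    def respects(theta):
        # check the "respect" condition on a finite sample of integers
        sample = range(-6, 7)
        return all(cls(theta(a, b)) == cls(theta(a2, b2))
                   for a in sample for b in sample
                   for a2 in sample for b2 in sample
                   if cls(a) == cls(a2) and cls(b) == cls(b2))

    assert respects(lambda x, y: x + y) and respects(lambda x, y: x * y)

    def add_classes(A, B):        # the induced operation: [a] + [b] = [a + b]
        return cls(A + B)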

A.4 Laws
A law (in the sense in which we will be using the word here) is an equation involving the
defined operations which must be true for all values of the variables involved. Examples are
the commutative law
(∀x ∈ A)(∀y ∈ A) ( xy = yx )

and the associative law

(∀x ∈ A)(∀y ∈ A)(∀z ∈ A) ( (xy)z = x(yz) ) .

Because of what we are about to do, it is a good idea to make sure that this definition
is precisely understood. Firstly, we define terms as in Chapter 3: a term is either a variable
symbol x, y, z . . . or of the form θ(t1, t2, . . . , tn), where θ is an n-ary operation and
t1 , t2 , . . . , tn are (simpler) terms. Next, we define an equation as an expression of the form

term1 = term2

where term1 and term2 are of course terms. Finally, a law is an expression of the form

    (∀x1 ∈ A)(∀x2 ∈ A) · · · (∀xn ∈ A) equation ,

where x1, x2, . . . , xn are all the variable symbols which appear in the equation.

Note that there are a number of facts which are often called laws which are not in fact laws
in the sense just defined. For instance, the “cancellative law of addition” for Z states:

(∀x ∈ Z)(∀y ∈ Z)(∃z ∈ Z) ( x + z = y ) .

The existential quantifier here stops this from being a law. In the same way, facts about
fields involving division fail to be laws; for example

(∀x ∈ Q) ( x 6= 0 ⇒ x/x = 1 ) .

The implication here is not allowed.


The point of all this, as far as this appendix goes, is the following very useful theorem.

A.5 Theorem
Let A be an algebra and ≡ be an equivalence relation on A. Then any law of A which
involves only operations which are respected by ≡ is also a law of A/≡.

A.6 Orders
Most of the above discussion applies to relations in the same way. We are only interested
in order relations here, so I will summarise the relevant facts for orders; the generalisations
to arbitrary relations, should you be interested, are obvious.
Recall that a preorder on a set A is a binary relation, ≤ say, with the properties:
Reflexivity: For all a ∈ A, a ≤ a.

Transitivity: For all a, b, c ∈ A, a ≤ b ∧ b ≤ c ⇒ a ≤ c.


A partial order is a preorder with the extra property:
Antisymmetry: For all a, b ∈ A, a ≤ b ∧ b ≤ a ⇒ a = b.
A full order is a partial order with the extra property:

Dichotomy: For all a, b ∈ A, either a ≤ b or b ≤ a.


We may also speak of a “full preorder”, which is a relation which has all of these properties
except possibly antisymmetry.
It is easy to check that, if ≤ is a preorder on the set A, then the relation ∼ defined by

x∼y ⇔ x ≤ y and y ≤ x

is an equivalence relation. We call this the equivalence relation defined by (or corresponding
to) ≤.
Now let A be a set and ≤ be a preorder on A. Let ∼ be an equivalence relation (not
necessarily the one defined by ≤). We say that ∼ respects the order relation ≤ if, for all
a, b, a′, b′ ∈ A,

    a ∼ a′ and b ∼ b′ and a ≤ b ⇒ a′ ≤ b′ .
In this case a corresponding preorder can be defined on the quotient set A/∼ in the obvious
way: let X and Y be equivalence classes in A. Choose any members x of X and y of Y .
Then define X ≤ Y if and only if x ≤ y.
Having checked, as before, that this definition makes sense, that is, that the definition of
X ≤ Y is independent of the particular members x and y chosen, we may rewrite the
definition in the more convenient form: for all a, b ∈ A, [a] ≤ [b] if and only if a ≤ b.
Now the following facts are easily checked:
(1) If the order ≤ was a partial order, a full order or a full preorder on A, then the
corresponding order on A/∼ is an order of the same kind.

(2) If the order ≤ was a preorder on A and the relation ∼ happens to be the equivalence
relation defined by ≤, then the corresponding order on A/∼ is a partial order.
Note that, combining (1) and (2), if the order ≤ was a full preorder and the relation ∼ is
the equivalence relation defined by ≤, then the corresponding order on A/∼ is a full order.

B The Natural Numbers, N


This has already been defined and discussed in Section 6.D. Here we will introduce its I 6.D
algebraic and order properties.

B.1 Fundamental properties of The Natural Numbers


Addition, multiplication and the order relation are defined on N as follows (a code sketch
follows the list):
(N1a) For all a ∈ N, a + 0 = a.
(N1b) For all a, b ∈ N, a + b⁺ = (a + b)⁺.
(N2a) For all a ∈ N, a·0 = 0.
(N2b) For all a, b ∈ N, ab⁺ = (ab) + a.
(N3) For all a, b ∈ N, a ≤ b ⇔ (∃u)(a + u = b)
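Transcribed into Python (a sketch; the successor b⁺ is modelled by b + 1, and the search
in (N3) may be bounded by b, since a + u = b forces u ≤ b):

    def add(a, b):
        return a if b == 0 else add(a, b - 1) + 1        # a + b+ = (a + b)+

    def mul(a, b):
        return 0 if b == 0 else add(mul(a, b - 1), a)    # a.b+ = (a.b) + a

    def leq(a, b):
        return any(add(a, u) == b for u in range(b + 1)) # (N3)

    assert add(3, 4) == 7 and mul(3, 4) == 12 and leq(3, 7)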

Now we can prove the following laws by induction.


(N4) Addition is associative. For all a, b, c ∈ N, (a + b) + c = a + (b + c).

(N5) Addition is commutative. For all a, b ∈ N, a + b = b + a.


(N6) Multiplication is associative. For all a, b, c ∈ N, (ab)c = a(bc).
(N7) Multiplication distributes over addition. For all a, b, c ∈ N, a(b + c) = ab + ac.
(N8) Multiplication is commutative. For all a, b ∈ N, ab = ba.

(N9) Addition is cancellative. If a + x = a + y then x = y.


(N10) Multiplication is cancellative. If ax = ay and a 6= 0 then x = y.

The relation ≤ is a well-order:


(N11a) For all a ∈ N, a ≤ a.
(N11b) For all a, b, c ∈ N, if a ≤ b and b ≤ c then a ≤ c.
(N11c) For all a, b ∈ N, if a ≤ b and b ≤ a then a = b.

(N11d) For all a, b ∈ N, a ≤ b or b ≤ a.

(N11w) Every nonempty subset of N has a least element.

C The Integers, Z

C.1 Preliminary
We are about to construct the Integers Z from the Natural Numbers N. Let us suppose for
a moment that we already have Z in some sense to look at. How can we specify a member
z of Z using natural numbers? Well we can write any integer z = m − n, where m and
n are natural numbers, and so we can use the pair ⟨m, n⟩ to represent z. Thus, as a first
approximation, we might consider defining z to actually be the pair ⟨m, n⟩, and so construct
Z as the set N × N of pairs of natural numbers.
There is a problem with this however: any given integer z can be represented by a pair of
natural numbers in many ways: z = m − n = m′ − n′. To accommodate this we consider
each integer z to actually be the set of pairs of natural numbers which represent it. Thus
we will define Z to be a set of appropriately defined subsets of N × N, upon which we will
define algebraic operations and order.
In order to define which subsets of N × N are going to represent integers, we define a relation
between pairs: ⟨m, n⟩ ∼ ⟨m′, n′⟩ if they represent the same integer, that is, if m − n = m′ − n′.
This turns out to be an equivalence relation, and so we can define Z conveniently as the
quotient set (N × N)/∼.
There is one last small problem to be got out of the way: we need to be able to define
the relation ∼ in order to construct Z. But before Z has been constructed, the equation
m − n = m′ − n′ has no meaning (certainly not in N when m < n). Using our hindsight,
we observe that this equation is the same as m + n′ = m′ + n and that this one does make
sense in N, so we use this as our definition.
We are going to construct Z as a quotient set of N × N. It will be convenient to take this in
two steps, and discuss N × N, with some useful structure added, as a sort of half-way-there
construction.

C.2 Construction of The Integers, Step 1


Let us write J for the set N × N, with some operations defined as follows (a code sketch
follows the table).

    Zero            0J = ⟨0, 0⟩
    Addition        ⟨a, b⟩ + ⟨c, d⟩ = ⟨a + c, b + d⟩
    Negation        −⟨a, b⟩ = ⟨b, a⟩
    One             1J = ⟨1, 0⟩
    Multiplication  ⟨a, b⟩⟨c, d⟩ = ⟨ac + bd, ad + bc⟩

And the relations

    Equivalence     ⟨a, b⟩ ∼ ⟨c, d⟩ ⇔ a + d = b + c
    Preorder        ⟨a, b⟩ ≤ ⟨c, d⟩ ⇔ a + d ≤ b + c
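Here is the sketch (Python; a pair (a, b) stands for the integer a − b):

    def add(p, q):
        (a, b), (c, d) = p, q
        return (a + c, b + d)

    def neg(p):
        a, b = p
        return (b, a)

    def mul(p, q):
        (a, b), (c, d) = p, q
        return (a*c + b*d, a*d + b*c)

    def equiv(p, q):
        (a, b), (c, d) = p, q
        return a + d == b + c      # "a - b = c - d", stated inside N

    # (2 - 5) * (7 - 3) = -12, up to equivalence:
    assert equiv(mul((2, 5), (7, 3)), (0, 12))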



J1 We now prove that all the usual algebraic and order laws for Z hold in J “up to
equivalence”, as follows:
(i) (⟨a, b⟩ + ⟨c, d⟩) + ⟨e, f⟩ ∼ ⟨a, b⟩ + (⟨c, d⟩ + ⟨e, f⟩)
(ii) ⟨a, b⟩ + 0J ∼ ⟨a, b⟩
(iii) ⟨a, b⟩ + (−⟨a, b⟩) ∼ 0J
(iv) ⟨a, b⟩ + ⟨c, d⟩ ∼ ⟨c, d⟩ + ⟨a, b⟩
(v) (⟨a, b⟩⟨c, d⟩)⟨e, f⟩ ∼ ⟨a, b⟩(⟨c, d⟩⟨e, f⟩)
(vi) ⟨a, b⟩(⟨c, d⟩ + ⟨e, f⟩) ∼ ⟨a, b⟩⟨c, d⟩ + ⟨a, b⟩⟨e, f⟩
(vii) ⟨a, b⟩·1J ∼ ⟨a, b⟩
(viii) ⟨a, b⟩⟨c, d⟩ ∼ ⟨c, d⟩⟨a, b⟩
(ix) The relation ≤ defined on J is a preorder and its corresponding equivalence relation
is ∼.
(x) ⟨a, b⟩ ≤ ⟨c, d⟩ ⇒ ⟨a, b⟩ + ⟨e, f⟩ ≤ ⟨c, d⟩ + ⟨e, f⟩
(xi) ⟨a, b⟩ ≤ ⟨c, d⟩ and 0J ≤ ⟨e, f⟩ ⇒ ⟨a, b⟩⟨e, f⟩ ≤ ⟨c, d⟩⟨e, f⟩

Now (ix) above tells us that ∼ is an equivalence relation. Next we prove that it respects all
the operations we have defined on J:
J2 If ⟨a, b⟩ ∼ ⟨a′, b′⟩ and ⟨c, d⟩ ∼ ⟨c′, d′⟩ then
(i) ⟨a, b⟩ + ⟨c, d⟩ ∼ ⟨a′, b′⟩ + ⟨c′, d′⟩
(ii) −⟨a, b⟩ ∼ −⟨a′, b′⟩
(iii) ⟨a, b⟩⟨c, d⟩ ∼ ⟨a′, b′⟩⟨c′, d′⟩
(iv) ⟨a, b⟩ ≤ ⟨c, d⟩ ⇔ ⟨a′, b′⟩ ≤ ⟨c′, d′⟩.

Now the scene is set to construct Z itself.

C.3 Construction of The Integers, Step 2


We define Z to be the quotient set: Z = (N × N)/∼ = J/∼ .
We use the notation [a, b] for the equivalence class of ⟨a, b⟩, so that [a, b] = [c, d] ⇔
a + d = b + c.

Z1 From (J2) it follows that we can define corresponding operations on Z by

(o) 0Z = [0, 0] and 1Z = [1, 0]
(i) [a, b] + [c, d] = [a + c, b + d] (= the equivalence class of ⟨a, b⟩ + ⟨c, d⟩).
(ii) −[a, b] = [b, a] (= the equivalence class of −⟨a, b⟩).
(iii) [a, b][c, d] = [ac + bd, ad + bc] (= the equivalence class of ⟨a, b⟩⟨c, d⟩).
(iv) [a, b] ≤ [c, d] ⇔ a + d ≤ b + c (⇔ ⟨a, b⟩ ≤ ⟨c, d⟩).
(The point here is that the results of (J2) mean that the definitions here are independent of
the particular members chosen to represent the equivalence classes.)

Z2 Applying (Z1) to (J1) gives all the basic laws for Z.


(i) ([a, b] + [c, d]) + [e, f ] = [a, b] + ([c, d] + [e, f ])
(ii) [a, b] + 0Z = [a, b]
(iii) [a, b] + (−[a, b]) = 0Z

(iv) [a, b] + [c, d] = [c, d] + [a, b]


(v) ([a, b][c, d])[e, f ] = [a, b]([c, d][e, f ])
(vi) [a, b]([c, d] + [e, f ]) = [a, b][c, d] + [a, b][e, f ]

(vii) [a, b].1Z = [a, b]


(viii) [a, b][c, d] = [c, d][a, b]
(ix) The relation ≤ defined on Z is a full order.
(x) [a, b] ≤ [c, d] ⇒ [a, b] + [e, f ] ≤ [c, d] + [e, f ]

(xi) [a, b] ≤ [c, d] and 0Z ≤ [e, f ] ⇒ [a, b][e, f ] ≤ [c, d][e, f ]


Z3 All these terms in (Z2) above are simply arbitrary integers, so the laws look more
familiar in more normal notation:
(i) (x + y) + z = x + (y + z)

(ii) x+0=x
(iii) x + (−x) = 0
(iv) x+y =y+x

(v) (xy)z = x(yz)


(vi) x(y + z) = xy + xz
(vii) x.1 = x
(viii) xy = yx

(ix) The relation ≤ defined on Z is a full order.


(x) x≤y ⇒ x+z ≤y+z
(xi) x ≤ y and 0 ≤ z ⇒ xz ≤ yz.

Z4 Finally, N is embedded in Z by the map n ↦ [n, 0]. In other words, the function
ι : N → Z defined by ι(n) = [n, 0] for all n ∈ N is an injection which preserves all the
operations and the order. Thus the image of N in Z under this mapping is an identical copy
of N in all salient respects. From now on we may, and often do when it suits us, think of N
as this subset of Z.

D The Rationals, Q

D.1 Preliminary
Now we construct The Rationals Q from The Integers Z. The general method is exactly
the same as for the previous construction, with only a few small changes of detail. Using
hindsight as before, we know that any rational q can be written as a quotient a/b, where a
and b are integers and b ≠ 0. Of course, any particular rational can be written as such a
quotient in many ways, and this gives rise to an equivalence relation on Z × (Z ∖ {0}):
⟨a, b⟩ ∼ ⟨a′, b′⟩ if and only if a/b = a′/b′. This can be defined within Z by ab′ = a′b.
We are going to construct Q as a quotient set of Z × (Z ∖ {0}), the set of all pairs ⟨a, b⟩
of integers such that b ≠ 0. As before, it will be convenient to take the construction in
two steps, and first discuss this product set, with some useful structure added, as a sort of
half-way-there construction.

D.2 Construction of The Rationals, Step 1


Let us write K for the set Z × (Z ∖ {0}), with some operations defined as follows (a code
sketch follows the table).

    Zero            0K = ⟨0, 1⟩
    Addition        ⟨a, b⟩ + ⟨c, d⟩ = ⟨ad + bc, bd⟩
    Negation        −⟨a, b⟩ = ⟨−a, b⟩
    One             1K = ⟨1, 1⟩
    Multiplication  ⟨a, b⟩⟨c, d⟩ = ⟨ac, bd⟩
    Inversion       If a ≠ 0, then ⟨a, b⟩⁻¹ = ⟨b, a⟩

And the relations

    Equivalence     ⟨a, b⟩ ∼ ⟨c, d⟩ ⇔ ad = bc
    Preorder        ⟨a, b⟩ ≤ ⟨c, d⟩ ⇔ (bd > 0 and ad ≤ bc) or (bd < 0 and ad ≥ bc).
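The sketch (Python; a pair (a, b) with b ≠ 0 stands for the rational a/b):

    def add(p, q):
        (a, b), (c, d) = p, q
        return (a*d + b*c, b*d)

    def mul(p, q):
        (a, b), (c, d) = p, q
        return (a*c, b*d)

    def equiv(p, q):
        (a, b), (c, d) = p, q
        return a*d == b*c

    def leq(p, q):
        (a, b), (c, d) = p, q
        return a*d <= b*c if b*d > 0 else a*d >= b*c

    # 1/2 + 1/3 = 5/6 up to equivalence, and -1/-2 = 1/2 <= 2/3:
    assert equiv(add((1, 2), (1, 3)), (5, 6)) and leq((-1, -2), (2, 3))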

K1 We prove that all the usual algebraic and order laws for Q hold in K “up to equiva-
lence”, as follows:
(i) (⟨a, b⟩ + ⟨c, d⟩) + ⟨e, f⟩ ∼ ⟨a, b⟩ + (⟨c, d⟩ + ⟨e, f⟩)
(ii) ⟨a, b⟩ + 0K ∼ ⟨a, b⟩
(iii) ⟨a, b⟩ + (−⟨a, b⟩) ∼ 0K
(iv) ⟨a, b⟩ + ⟨c, d⟩ ∼ ⟨c, d⟩ + ⟨a, b⟩
(v) (⟨a, b⟩⟨c, d⟩)⟨e, f⟩ ∼ ⟨a, b⟩(⟨c, d⟩⟨e, f⟩)
(vi) ⟨a, b⟩(⟨c, d⟩ + ⟨e, f⟩) ∼ ⟨a, b⟩⟨c, d⟩ + ⟨a, b⟩⟨e, f⟩
(vii) ⟨a, b⟩·1K ∼ ⟨a, b⟩
(viii) ⟨a, b⟩⟨c, d⟩ ∼ ⟨c, d⟩⟨a, b⟩
(ix) If ⟨a, b⟩ ≁ 0K, then ⟨a, b⟩⟨a, b⟩⁻¹ ∼ 1K
(x) The relation ≤ defined on K is a preorder and its corresponding equivalence relation
is ∼.
(xi) ⟨a, b⟩ ≤ ⟨c, d⟩ ⇒ ⟨a, b⟩ + ⟨e, f⟩ ≤ ⟨c, d⟩ + ⟨e, f⟩
(xii) ⟨a, b⟩ ≤ ⟨c, d⟩ and 0K ≤ ⟨e, f⟩ ⇒ ⟨a, b⟩⟨e, f⟩ ≤ ⟨c, d⟩⟨e, f⟩

Now (x) above tells us that ∼ is an equivalence relation. Next we prove that it respects all
the operations we have defined on K:
K2 If ⟨a, b⟩ ∼ ⟨a′, b′⟩ and ⟨c, d⟩ ∼ ⟨c′, d′⟩ then
(i) ⟨a, b⟩ + ⟨c, d⟩ ∼ ⟨a′, b′⟩ + ⟨c′, d′⟩
(ii) −⟨a, b⟩ ∼ −⟨a′, b′⟩
(iii) ⟨a, b⟩⟨c, d⟩ ∼ ⟨a′, b′⟩⟨c′, d′⟩
(iv) a ≠ 0 if and only if a′ ≠ 0, and then ⟨a, b⟩⁻¹ ∼ ⟨a′, b′⟩⁻¹
(v) ⟨a, b⟩ ≤ ⟨c, d⟩ ⇔ ⟨a′, b′⟩ ≤ ⟨c′, d′⟩.
Now we are ready to construct Q itself.

D.3 Construction of The Rationals, Step 2


We define Q to be the quotient set: Q = (Z × (Z ∖ {0}))/∼ = K/∼ .
We use the notation a/b for the equivalence class of ⟨a, b⟩, so that a/b = c/d ⇔ ad = bc.
Q1 From (K2) it follows that we can define corresponding operations on Q by
(o) 0Q = 0/1 and 1Q = 1/1
(i) a/b + c/d = (ad + bc)/bd (= the equivalence class of ⟨a, b⟩ + ⟨c, d⟩).
(ii) −(a/b) = (−a)/b (= the equivalence class of −⟨a, b⟩).
(iii) (a/b)(c/d) = ac/bd (= the equivalence class of ⟨a, b⟩⟨c, d⟩).
(iv) If a/b ≠ 0Q, then (a/b)⁻¹ = b/a (= the equivalence class of ⟨a, b⟩⁻¹).
(v) a/b ≤ c/d ⇔ (bd > 0 and ad ≤ bc) or (bd < 0 and ad ≥ bc) (⇔ ⟨a, b⟩ ≤ ⟨c, d⟩).
(The point here is that the results of (K2) mean that the definitions here are independent
of the particular members chosen to represent the equivalence classes.)
Q2 Applying (Q1) to (K1) gives all the basic laws for Q.

(i) (a/b + c/d) + e/f = a/b + (c/d + e/f)
(ii) a/b + 0 = a/b
(iii) a/b + (−(a/b)) = 0
(iv) a/b + c/d = c/d + a/b
(v) ((a/b)(c/d))(e/f) = (a/b)((c/d)(e/f))
(vi) (a/b)(c/d + e/f) = (a/b)(c/d) + (a/b)(e/f)
(vii) (a/b)·1 = a/b
(viii) (a/b)(c/d) = (c/d)(a/b)
(ix) If a/b ≠ 0 then (a/b)(a/b)⁻¹ = 1
(x) The relation ≤ defined on Q is a full order.
(xi) a/b ≤ c/d ⇒ a/b + e/f ≤ c/d + e/f
(xii) a/b ≤ c/d and 0 ≤ e/f ⇒ (a/b)(e/f) ≤ (c/d)(e/f)

Q3 All these terms in (Q2) above are simply arbitrary rationals, so the laws look more
familiar in more normal notation:
(i) (x + y) + z = x + (y + z)
(ii) x+0=x
(iii) x + (−x) = 0

(iv) x+y =y+x


(v) (xy)z = x(yz)
(vi) x(y + z) = xy + xz

(vii) x·1 = x
(viii) xy = yx
(ix) If x ≠ 0 then xx⁻¹ = 1
(x) The relation ≤ defined on Q is a full order.

(xi) x≤y ⇒ x+z ≤y+z


(xii) x ≤ y and 0 ≤ z ⇒ xz ≤ yz
Q4 Finally, Z is embedded in Q by the map n ↦ n/1. In other words, the function
ι : Z → Q defined by ι(n) = n/1 for all n ∈ Z is an injection which preserves all the operations
and the order. Thus the image of Z in Q under this mapping is an identical copy of Z in
all salient respects. From now on we may, and often do when it suits us, think of Z as this
subset of Q.

E The Reals, R
E.1 Preliminary
Once again we use hindsight and consider what we know about the reals to decide how to
construct them from the rationals. There are two standard ways of doing this and I will
describe them both here; the first is the “completion” method, in which real numbers are
obtained as limits of convergent sequences of rational numbers, and the second is the method
of Dedekind cuts.
First we set out to define a real, in terms of rational numbers, as the limit of a convergent
sequence. The usual definition of convergence, that the sequence ⟨a_0, a_1, a_2, . . .⟩ converges
to the real x if

(∀ real ε > 0)(∃n ∈ N)(∀i ∈ N)(i ≥ n ⇒ |a_i − x| < ε),

won’t do, because we need to be able to identify a convergent sequence of rationals before
the reals have been constructed — that is, we need to be able to recognise, before the
reals have been constructed, those sequences of rationals which are going to converge after
the reals have been constructed. The definition above mentions reals in two places: firstly
in (∀ real ε > 0) and secondly in the limit x. The first occurrence is no real problem:
(∀ rational ε > 0) will do just as well. The second occurrence is more of a problem: the
definition is fine if you already have x to work with, but we are here trying to create x out
of thin air.
The solution is to use Cauchy sequences. A sequence ⟨a_0, a_1, a_2, . . .⟩ of real numbers is a
Cauchy sequence if

(∀ real ε > 0)(∃n ∈ N)(∀i, j ∈ N)(i, j ≥ n ⇒ |a_i − a_j| < ε).    (1)

That is the usual definition, but it is easy to see that replacing “real” by “rational” in
(∀ real ε > 0) results in an equivalent definition. Also, we know two crucial things about
Cauchy sequences: (1) Cauchy sequences are convergent (in the ordinary sense above) and
(2) Every real number is the limit of a Cauchy sequence of rationals. Just to make sure
there is no confusion, for the rest of this section a Cauchy sequence will mean a Cauchy
sequence of rational numbers, unless otherwise stated, that is, a sequence ⟨a_0, a_1, a_2, . . .⟩ of
rational numbers satisfying

(∀ rational ε > 0)(∃n ∈ N)(∀i, j ∈ N)(i, j ≥ n ⇒ |a_i − a_j| < ε).    (2)

Just as in our previous constructions, we must observe that each real number is going
to be the limit of many Cauchy sequences. This defines an equivalence relation on the
Cauchy sequences, but we need a way of defining it a priori — without reference to real
numbers. This is not too difficult: two Cauchy sequences ⟨a_0, a_1, a_2, . . .⟩ and ⟨b_0, b_1, b_2, . . .⟩
are equivalent in this sense if and only if

(∀ rational ε > 0)(∃n ∈ N)(∀i ∈ N)(i ≥ n ⇒ |a_i − b_i| < ε).

Having made these observations, the construction of R follows the now familiar pattern.

We are going to construct R as a quotient set of the set of all Cauchy sequences of rationals.
As before, it will be convenient to take this in two steps, and discuss the set of all Cauchy
sequences of rationals, with some useful structure added, as a half-way-there construction.

E.2 Construction of The Reals by completion, Step 1


Let us write L for the set of all Cauchy sequences of rationals, with operations defined as
follows.
Zero            0L = ⟨0, 0, 0, . . .⟩
Addition        ⟨a_0, a_1, a_2, . . .⟩ + ⟨b_0, b_1, b_2, . . .⟩ = ⟨a_0 + b_0, a_1 + b_1, a_2 + b_2, . . .⟩
Negation        −⟨a_0, a_1, a_2, . . .⟩ = ⟨−a_0, −a_1, −a_2, . . .⟩
One             1L = ⟨1, 1, 1, . . .⟩
Multiplication  ⟨a_0, a_1, a_2, . . .⟩⟨b_0, b_1, b_2, . . .⟩ = ⟨a_0 b_0, a_1 b_1, a_2 b_2, . . .⟩
Inversion       ⟨a_0, a_1, a_2, . . .⟩⁻¹ = ⟨a_0*, a_1*, a_2*, . . .⟩
                (where for each i, a_i* = a_i⁻¹ if a_i ≠ 0 and a_i* = 0 if a_i = 0)

and the relations

Equivalence     ⟨a_0, a_1, a_2, . . .⟩ ∼ ⟨b_0, b_1, b_2, . . .⟩ ⇔
                (∀ rational ε > 0)(∃n ∈ N)(∀i ∈ N)(i ≥ n ⇒ |a_i − b_i| < ε).
Preorder        ⟨a_0, a_1, a_2, . . .⟩ ≤ ⟨b_0, b_1, b_2, . . .⟩ ⇔
                (∀ rational ε > 0)(∃n ∈ N)(∀i ∈ N)(i ≥ n ⇒ a_i − b_i < ε).
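One concrete way to play with L — purely my own illustration, with invented names — is to represent a sequence by a Python function n ↦ a_n returning exact rationals, and define the operations termwise as above:

```python
from fractions import Fraction

def const(q):                        # the constant sequence <q, q, q, ...>
    return lambda n: Fraction(q)

zero_L, one_L = const(0), const(1)

def add_seq(x, y):                   # termwise addition
    return lambda n: x(n) + y(n)

def neg_seq(x):
    return lambda n: -x(n)

def mul_seq(x, y):                   # termwise multiplication
    return lambda n: x(n) * y(n)

def inv_seq(x):                      # a_i* = 1/a_i if a_i != 0, else 0
    return lambda n: 1 / x(n) if x(n) != 0 else Fraction(0)

def sqrt2(n):                        # a Cauchy sequence converging to sqrt(2):
    a = Fraction(1)                  # Newton's method, in exact arithmetic
    for _ in range(n + 1):
        a = (a + 2 / a) / 2
    return a

# mul_seq(sqrt2, sqrt2) is equivalent to const(2) in the sense of ~ above:
print(float(mul_seq(sqrt2, sqrt2)(5)))   # prints 2.000000... (approximately)
```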


We prove that all of these operations, when applied to Cauchy sequences, yield Cauchy
sequences (with the exception of inversion when applied to a sequence which is equivalent
to 0L ).

L1 As before, we prove all the basic algebraic and order laws on L “up to equivalence”.
Let X, Y and Z be any three Cauchy sequences in L. Then
(i) (X + Y ) + Z ∼ X + (Y + Z)
(ii) X + 0L ∼ X

(iii) X + (−X) ∼ 0
(iv) X +Y ∼Y +X
(v) (XY )Z ∼ X(Y Z)
(vi) X(Y + Z) ∼ XY + XZ

(vii) X·1L ∼ X
(viii) XY ∼ Y X
(ix) If X ≁ 0L then XX⁻¹ ∼ 1L
(x) The relation ≤ defined on L is a full preorder whose corresponding equivalence
relation is ∼.

(xi) X≤Y ⇒ X +Z ≤Y +Z
(xii) X ≤ Y and 0 ≤ Z ⇒ XZ ≤ Y Z

Now (x) above tells us that ∼ is an equivalence relation. Next we prove that it respects all the
operations we have defined on L:
L2 If X, Y, X′ and Y′ are Cauchy sequences and X ∼ X′ and Y ∼ Y′ then
(i) X + Y ∼ X′ + Y′
(ii) −X ∼ −X′
(iii) XY ∼ X′Y′
(iv) X ≁ 0L if and only if X′ ≁ 0L, and then X⁻¹ ∼ X′⁻¹.
(v) X ≤ Y if and only if X′ ≤ Y′.

E.3 Construction of The Reals by completion, Step 2


We define R to be the quotient set: R = L/∼ .
R1 From (L2) it follows that we can define corresponding operations on R by

Zero            0R = [0L]
Addition        [X] + [Y] = [X + Y]
Negation        −[X] = [−X]
One             1R = [1L]
Multiplication  [X][Y] = [XY]
Inversion       If [X] ≠ 0R, then [X]⁻¹ = [X⁻¹].
Preorder        [X] ≤ [Y] ⇔ X ≤ Y.

(As usual, the point here is that the results of (L2) mean that the definitions here are
independent of the particular members chosen to represent the equivalence classes.)
R2 Applying (R1) to (L1) gives all the basic algebraic laws for R.
(i) (x + y) + z = x + (y + z)

(ii) x+0=x

(iii) x + (−x) = 0
(iv) x+y =y+x
(v) (xy)z = x(yz)
(vi) x(y + z) = xy + xz

(vii) x·1 = x
(viii) xy = yx
(ix) If x ≠ 0 then xx⁻¹ = 1

(x) The relation ≤ defined on R is a full order.


(xi) x≤y ⇒ x+z ≤y+z
(xii) x ≤ y and 0 ≤ z ⇒ xz ≤ yz
R3 We now prove the completeness property of the reals: any Cauchy sequence of real
numbers converges to a unique real number. (Note: we are talking here about a sequence of
real numbers, which is Cauchy in the sense of Equation (1) of this section — ε is real also.
This is therefore quite a complicated result.)
R4 Finally, Q is embedded in R by the map q ↦ [⟨q, q, q, . . .⟩]. This function is an
injection which preserves all the operations and the order. Thus the image of Q in R under
this mapping is an identical copy of Q in all salient respects. From now on we may, and
often do when it suits us, think of Q as this subset of R.

E.4 Construction of The Reals by Dedekind cuts


The basic idea of this method is to obtain a real number a by looking at the set

{ x : x ∈ Q, x < a }

of all rationals less than it; such sets are called cuts. We must first define them in a way
which does not require prior reference to real numbers.

E.5 Definition: Cut


A cut is a subset A of Q with these properties:
(i) It is initial, that is, if a ∈ A, x ∈ Q and x ≤ a then x ∈ A also.

(ii) Neither A nor its complement Q r A is empty.


(iii) A does not have a greatest element (that is, it is open).
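Computationally, a cut is just a membership test on Q. Here is a sketch (my own illustration, not part of the construction) of the cut which will become √2:

```python
from fractions import Fraction

def in_sqrt2_cut(x):
    """Membership test for A = { x in Q : x < 0 or x*x < 2 }: the set of all
    rationals less than sqrt(2).  It is initial, neither it nor its complement
    is empty, and (since sqrt(2) is irrational) it has no greatest element."""
    x = Fraction(x)
    return x < 0 or x * x < 2

# 7/5 is in the cut, 3/2 is in the complement:
assert in_sqrt2_cut(Fraction(7, 5)) and not in_sqrt2_cut(Fraction(3, 2))
```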

E.6 Proposition
If A is a cut, then its complement Q r A has these properties:

(i*) It is final, that is, if a ∈ Q r A, x ∈ Q and x ≥ a then x ∈ Q r A also.
(ii*) Neither Q r A nor its complement A is empty.

However it may not be open — it may have a least element.
Let us write Aco for Q r A with its least element (if any) removed. Then Aco satisfies (i*)
and (ii*) above and is open, that is,

(i−) It is final, that is, if a ∈ Aco, x ∈ Q and x ≥ a then x ∈ Aco also.
(ii−) Neither Aco nor its complement is empty.
(iii−) Aco does not have a least element.

It follows that the set { −b : b ∈ Aco } is a cut.

Now we can define the reals with their additive structure.

E.7 Definition
The Real Numbers R is just the set of all cuts in Q, as defined above (there will be no need
to form a quotient structure with this approach).
The zero real is 0R = Q− , the set of all negative rationals.
For any reals (cuts) A and B, A + B = { a + b : a ∈ A and b ∈ B } .

For any real (cut) A, −A = { −b : b ∈ Aco } .

E.8 Proposition
Let A be a cut and r be any rational number > 0. Then there is some a ∈ A such that
a + r ∈ Aco .

Proof. We construct sequences a_0, a_1, a_2, . . . and b_0, b_1, b_2, . . . of rationals inductively as
follows:
Choose any a_0 ∈ A and (since Aco is nonempty) b_0 ∈ Aco. Since A is initial and Aco ⊆ Q r A,
a_0 < b_0. Now, assuming that a_n and b_n have been defined and a_n ∈ A and b_n ∈ Aco,
so that a_n < b_n, we define a_{n+1} and b_{n+1}. Consider (a_n + b_n)/2. If (a_n + b_n)/2 ∈ A,
define a_{n+1} = (a_n + b_n)/2 and b_{n+1} = b_n; if (a_n + b_n)/2 ∈ Aco, define a_{n+1} = a_n and
b_{n+1} = (a_n + b_n)/2; otherwise (a_n + b_n)/2 must be the least element of Q r A, in which case
define a_{n+1} = (a_n + b_n)/3 and b_{n+1} = b_n.
From this definition we can see that every a_n ∈ A, every b_n ∈ Aco and 0 < b_{n+1} − a_{n+1} ≤
(2/3)(b_n − a_n), whence 0 < b_n − a_n ≤ (2/3)^n (b_0 − a_0). Thus we can choose n so that 0 < b_n − a_n ≤ r.
Set a = a_n, so a ∈ A and a + r = a_n + r ≥ b_n ∈ Aco, so a_n + r ∈ Aco. 
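The inductive construction in this proof is really a bisection algorithm, and it runs essentially as written. A sketch of mine, reusing in_sqrt2_cut from the earlier snippet (so not self-contained) and ignoring the least-element case, which cannot arise for that particular cut:

```python
from fractions import Fraction

def find_gap(in_A, a0, b0, r):
    """Given a membership test for a cut A, some a0 in A, some b0 outside A
    and a rational r > 0, return a in A with a + r outside A.  This is the
    proof's bisection, minus the least-element case (a sketch only)."""
    a, b = Fraction(a0), Fraction(b0)
    while b - a > r:
        m = (a + b) / 2
        if in_A(m):
            a = m        # midpoint in A: raise the lower end
        else:
            b = m        # midpoint outside A: lower the upper end
    return a             # now b - a <= r, so a + r >= b lies outside A

a = find_gap(in_sqrt2_cut, 1, 2, Fraction(1, 1000))
assert in_sqrt2_cut(a) and not in_sqrt2_cut(a + Fraction(1, 1000))
```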

E.9 Proposition
With these definitions, R is a commutative group with respect to addition.

Proof. Associativity and commutativity of addition are trivial.



Let A be any cut; we show that A + 0R = A.


First, suppose that x ∈ A + 0R . Then there is a ∈ A and r ∈ 0R such that x = a + r. But
then r < 0, so x < a and then x ∈ A.
Conversely, suppose that a ∈ A. Since A has no greatest element, there is b ∈ A such that
a < b. Then a = b + (a − b) with b ∈ A and a − b ∈ 0R, so a ∈ A + 0R.
Now let us show that A + (−A) = 0R .
First, let x ∈ A + (−A). Then there are a ∈ A and b ∈ Aco such that x = a − b. But then
b > a, so x < 0, so x ∈ 0R .

Conversely, suppose that x ∈ 0R . Set x = −r, where r is a positive rational. Using


the proposition above, let a ∈ A be such that a + r ∈ Aco . Then −a − r ∈ −A and
x = −r = a + (−a − r) ∈ A + (−A) . 

We now define the order on R. This is easy.

E.10 Definition
For cuts A and B,
A≤B if and only if A ⊆ B .

It is easy to check that this is a full order. Also, for any cuts A, B and C,

if A ≤ B then − B ≤ −A
and A + C ≤ B + C .

E.11 Proposition
For any cuts A, B and C,
(i) If A ≤ B then A + C ≤ B + C.
(ii) If A < B then A + C < B + C.
(iii) If A ≤ B then −A ≥ −B.
(iv) If A < B then −A > −B.

Proof. (i) follows trivially from the definition of the order and the others follow easily from
this by ordinary group theory. 

Now we can define the multiplicative structure for non-negative cuts.



E.12 Definition
Suppose that A ≥ 0R and B ≥ 0R. Then

AB = { ab : a ∈ A, a ≥ 0, b ∈ B, b ≥ 0 } ∪ 0R ,
A⁻¹ = { x : x > 0, x⁻¹ ∈ Aco } ∪ {0} ∪ 0R ,
1R = { x : x ∈ Q, x < 1 } .

E.13 Proposition
Let A be a cut, A > 0R , and let r be any rational > 1. Then there is an a ∈ A such that
ar ∈ Aco .

Proof. Since A > 0R , there is c ∈ A such that c > 0. Let s = c(r − 1). Then, by the earlier
proposition, there is d ∈ A such that d + s ∈ Aco . Now let a = max{c, d}. Since c, d ∈ A,
we have a ∈ A. Also ar = a + a(r − 1) ≥ a + c(r − 1) = a + s ≥ d + s and d + s ∈ Aco so
ar ∈ Aco . 

We can now show that the non-negative cuts obey all the usual field laws involving multi-
plication.

E.14 Proposition
Let A, B and C be cuts, all ≥ 0R . Then
(i) (AB)C = A(BC) ,

(ii) AB = BA ,
(iii) A0R = 0R ,
(iv) A(B + C) = AB + AC ,
(v) A1R = A ,

(vi) If A ≠ 0R then AA⁻¹ = 1R.

Proof. (i), (ii) and (iii) follow immediately from the definition.
(iv) Given (ii) and (iii) above, this result is trivially true if any of A, B or C are 0R ; so
we assume now that they are all > 0R . Suppose first that x ∈ A(B + C). If x < 0, then
x ∈ AB + AC automatically. Otherwise, there are a ∈ A, a ≥ 0 and y ∈ B + C, y ≥ 0 such
that x = ay, and then b ∈ B and c ∈ C such that y = b + c.
If b ≥ 0 and c ≥ 0 we are done, for then x = ab + ac with ab ∈ AB and ac ∈ AC. Since
b + c = y ≥ 0 we cannot have both b and c negative. We may therefore suppose that b < 0
and c > 0, the proof in the other case being the same. But then y < c, so y ∈ C and now
x = 0 + ay with 0 ∈ AB and ay ∈ AC, so x ∈ AB + AC as required.

Conversely, suppose that x ∈ AB + AC and x ≥ 0. Then x = ab + a′c, where a, a′ ∈ A,
b ∈ B, c ∈ C and a, a′, b and c are all ≥ 0. Let a″ = (ab + a′c)/(b + c). This is a weighted
average of a and a′ (b and c are ≥ 0) and so lies between a and a′; therefore a″ ∈ A and
a″ ≥ 0. But now x = a″(b + c) ∈ A(B + C), as required.


(v) Suppose first that x ∈ A1R and x ≥ 0. Then x = ab with a ∈ A, a ≥ 0 and 0 ≤ b < 1.
But then 0 ≤ x ≤ a, so x ∈ A.
Conversely let a ∈ A with a ≥ 0. Since A has no greatest element, there is some b ∈ A with
b > a. Then b ≠ 0 and 0 ≤ a/b < 1, so a = b·(a/b) with b ∈ A, b ≥ 0 and a/b ∈ 1R, a/b ≥ 0.
(vi) First, let x ∈ AA⁻¹, x > 0. Then x = ab with a ∈ A, a ≥ 0, b⁻¹ ∈ Aco and b > 0.
But then b⁻¹ > a, so 0 < b < a⁻¹ and so ab < aa⁻¹ = 1. Thus x ∈ 1R.

Conversely, suppose that x ∈ 1R, x > 0, that is, 0 < x < 1. Set r = x⁻¹ so r > 1. By
the previous proposition, there is an a ∈ A such that ar ∈ Aco. Then (ar)⁻¹ ∈ A⁻¹ so
a(ar)⁻¹ = r⁻¹ = x ∈ AA⁻¹, as required. 

E.15 Proposition
For any cuts A, B and C,

(i) If 0R ≤ A ≤ B and C ≥ 0R then AC ≤ BC.
(ii) If 0R < A ≤ B then 0R < B⁻¹ ≤ A⁻¹.

Proof. (i) follows immediately from the definition of the order and (ii) follows from that
by ordinary ring theory. 

We can now stop messing around with cuts. The story so far, converting to ordinary
notation, is that we have our set R of reals, with addition, negation and zero defined on R,
multiplication and identity defined on the set of non-negative reals and inversion defined on
the set of positive reals. These satisfy
For any x, y and z in R,
(Ai) (x + y) + z = x + (y + z)

(Aii) x + y = y + x
(Aiii) x + 0 = x
(Aiv) x + (−x) = 0
(Av) If x ≤ y then x + z ≤ y + z and −x ≥ −y.

(Avi) If x < y then x + z < y + z and −x > −y.

For any non-negative reals x, y and z,

(Mi) (xy)z = x(yz)



(Mii) xy = yx
(Miii) x0 = 0
(Miv) x(y + z) = xy + xz
(Mv) x1 = x

(Mvi) If x > 0 then xx⁻¹ = 1


(Mvii) If x ≤ y then xz ≤ yz
(Mviii) If 0 < x ≤ y then 0 < y⁻¹ ≤ x⁻¹.

We now extend the definitions of multiplication and inversion to negative reals.

E.16 Definition
(i) Let x be any real. Then its absolute value |x| is defined in the usual way:

    |x| = x    if x ≥ 0,
    |x| = −x   if x < 0.

(ii) Let x and y be any reals. Then their product xy is defined:

    xy = (as already defined)   if x ≥ 0 and y ≥ 0,
    xy = −(x|y|)                if x ≥ 0 and y < 0,
    xy = −(|x|y)                if x < 0 and y ≥ 0,
    xy = |x||y|                 if x < 0 and y < 0.

(iii) Let x be any real. Then its inverse x⁻¹ is defined:

    x⁻¹ = (as already defined)  if x ≥ 0,
    x⁻¹ = −(−x)⁻¹               if x < 0.
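The sign cases translate directly into code. In this sketch (all names mine), mul_nonneg stands for the multiplication already defined on the non-negative values, neg for negation, and is_nonneg for the sign test:

```python
def extend_mul(mul_nonneg, neg, is_nonneg, x, y):
    """Extend a multiplication given only on non-negative values to all
    values, by the four sign cases of the definition above (a sketch)."""
    def abs_val(z):                                  # |z|, as in part (i)
        return z if is_nonneg(z) else neg(z)
    if is_nonneg(x) and is_nonneg(y):
        return mul_nonneg(x, y)                      # both non-negative
    if is_nonneg(x):
        return neg(mul_nonneg(x, abs_val(y)))        # x >= 0, y < 0
    if is_nonneg(y):
        return neg(mul_nonneg(abs_val(x), y))        # x < 0, y >= 0
    return mul_nonneg(abs_val(x), abs_val(y))        # x < 0, y < 0

# Sanity check with ordinary integers standing in for the reals:
assert extend_mul(lambda a, b: a * b, lambda a: -a, lambda a: a >= 0, -3, 4) == -12
```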

Observe that, because of M(iii) above, the overlapping boundary cases agree, and so the last
three displayed definitions can be rewritten:

    |x| = x    if x ≥ 0,
    |x| = −x   if x ≤ 0,

    xy = (as already defined)   if x ≥ 0 and y ≥ 0,
    xy = −(x|y|)                if x ≥ 0 and y ≤ 0,
    xy = −(|x|y)                if x ≤ 0 and y ≥ 0,
    xy = |x||y|                 if x ≤ 0 and y ≤ 0,

and

    x⁻¹ = (as already defined)  if x ≥ 0,
    x⁻¹ = −(−x)⁻¹               if x ≤ 0.

It follows from these definitions that (−x)(−y) = xy, x(−y) = (−x)y = −(xy) and
(−x)⁻¹ = −x⁻¹, irrespective of the signs of x and y.

We now show that M(i)–M(vii) extend to negative numbers as well.

E.17 Proposition
For any reals x, y and z,
(i) (xy)z = x(yz)
(ii) xy = yx
(iii) x0 = 0

(iv) x(y + z) = xy + xz
(v) x1 = x
(vi) If x ≠ 0 then xx⁻¹ = 1

(vii) If x ≤ y and z ≥ 0 then xz ≤ yz

Proof. First note that, while we must still be careful about multiplicative manipulation,
we have already proved that R is a group with respect to addition and so we can freely use
ordinary manipulation of addition, subtraction and zero.
(i), (ii) and (iii) all follow immediately from the definitions above.
(iv) We must consider the various cases according as x, y and z are positive or negative.
We already know that x(y + z) = xy + xz in the case where x, y and z are all ≥ 0, and we
use this freely below.
Suppose now that x ≥ 0, y ≥ 0 and z ≤ 0. We must consider two subcases, depending on the
sign of y + z. If y + z ≥ 0 then, since −z ≥ 0 also, we have x(y + z + (−z)) = x(y + z) + x(−z)
which is xy = x(y + z) − xz and so the required result. In the other subcase, y + z ≤ 0,
we have −z ≥ 0 and so xy = x(−z + (y + z)) = x(−z) + x(y + z) by the previous subcase
= −xz + x(y + z), which gives the required result again.
Suppose now that x ≥ 0, y ≤ 0 and z ≥ 0; by commutativity of addition, this is the same as
the previous case.
Suppose now that x ≥ 0, y ≤ 0 and z ≤ 0. Then y + z ≤ 0 so x(y + z) = −x(−y − z) =
−(x(−y) + x(−z)) = −(−xy − xz) = xy + xz as required.

This disposes of all cases in which x ≥ 0. But if x ≤ 0 we have x(y + z) = −(−x)(y + z) =


−((−x)y + (−x)z) by one or other of the cases above, and this = −(−xy − xz) = xy + xz
as required.
(v) We already know that x1 = x in the case x ≥ 0. And if x ≤ 0 we have x1 =
−((−x)1) = −(−x) (since −x ≥ 0) = x as required.

(vi) We already know that xx⁻¹ = 1 in the case that x > 0. But if x < 0 we have x⁻¹ < 0
also, so xx⁻¹ = (−x)(−x⁻¹) = (−x)((−x)⁻¹) = 1 as required.
(vii) We already know this in the case 0 ≤ x ≤ y. In the case x ≤ 0 ≤ y we have xz ≤
0 ≤ yz and in the case x ≤ y ≤ 0 we have −y ≤ −x, so xz = −(−x)z ≤ −(−y)z = yz. 

This completes the proof of the ordered field structure of R. As usual we observe

E.18 Proposition
The function from Q to R given by

r ↦ { x : x ∈ Q, x < r }

is one-to-one and preserves the order and field operations of Q. In other words, this function
embeds Q into R.
And finally, a result which is easy to prove but important enough to be called

E.19 Theorem: Completion


Let X be a nonempty subset of R which is bounded above. Then it has a least upper bound.

Proof. Let m be the given upper bound. Define s = ∪X (the union of this set of reals,
thought of as cuts). We will show that s is the required least upper bound.
Firstly we must show that it is in fact a real number, that is, a cut. To see that it is initial,
let r ∈ s, q ∈ Q and q ≤ r. Then there is some member x of X such that r ∈ x. But then x
is a cut and so q ∈ x also; and then q ∈ ∪X = s. To see that s is nonempty, note that X is
nonempty and every member of X, being a cut, is nonempty. To see that the complement of
s is nonempty, note that every member x of X is ≤ m, that is, ⊆ m, and so s = ∪X ⊆ m
— and the complement of m is nonempty. Finally, s is the union of sets x none of which
have a greatest element, so neither does s.
Every member of X is a subset of ∪X, that is, every member of X is ≤ s; and that says
that s is an upper bound for X.
Suppose that u is any other upper bound for X. That means that every member of X is a
subset of u. But then s = ∪X ⊆ u, that is s ≤ u. This tells us that s is in fact the least
upper bound for X. 
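In the predicate-style picture of cuts used in the earlier sketches, the proof really is one line: the least upper bound is the union, that is, the pointwise "or" of the membership tests. (Illustrative only; names mine.)

```python
def sup_cut(cuts):
    """Least upper bound of a nonempty family of predicate-style cuts:
    their union, exactly as in the proof above.  'cuts' is a list of
    membership tests on the rationals (a sketch only)."""
    return lambda x: any(in_A(x) for in_A in cuts)

# E.g. the sup of the cuts for 1, for 7/5 and for sqrt(2) would be the
# cut for sqrt(2) itself (assuming those membership tests are defined).
```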

E.20 Corollary
Every Cauchy sequence in R converges to a real number.

Proof by intimidation. This is now first year work. 



F The Complex Numbers, C


The construction of the Complex Numbers is now easy. Since each complex number z can
be written uniquely in the form z = x + iy, where x and y are real, it follows that we
can construct C as the set of all pairs ⟨x, y⟩ of real numbers with the operations defined
appropriately. Since the representation z = x + iy is unique, there is no need to define an
equivalence relation and a quotient set: in this sense the construction of C is much simpler
than the other ones just discussed.

The proper way to define the operations is obvious: the zero and identity are ⟨0, 0⟩ and
⟨1, 0⟩. The sum and product of ⟨a, b⟩ and ⟨c, d⟩ are ⟨a + c, b + d⟩ and ⟨ac − bd, ad + bc⟩. The
negative and inverse of ⟨a, b⟩ are ⟨−a, −b⟩ and

⟨ a/(a² + b²) , −b/(a² + b²) ⟩.

There is no standard order defined on C, so we do not need to worry about constructing
one. We can, however, define the absolute value of a complex number z = x + iy = ⟨x, y⟩ to
be √(x² + y²) and then prove that the Complex Numbers are complete, that is, that every
Cauchy sequence of complex numbers converges (to a complex number).
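Because no quotient is involved, the whole construction fits comfortably into a few lines of code. A sketch of mine (all names invented), with floats standing in for the reals:

```python
import math

def c_add(z, w):
    (a, b), (c, d) = z, w
    return (a + c, b + d)                    # <a,b> + <c,d> = <a+c, b+d>

def c_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)    # <ac-bd, ad+bc>

def c_neg(z):
    a, b = z
    return (-a, -b)

def c_inv(z):
    a, b = z
    m = a * a + b * b                        # |z|^2; requires z != <0,0>
    return (a / m, -b / m)

def c_abs(z):
    a, b = z
    return math.sqrt(a * a + b * b)

assert c_mul((0.0, 1.0), (0.0, 1.0)) == (-1.0, 0.0)   # i * i = -1
```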

One of the most important properties of the Complex Numbers is expressed
by the Fundamental Theorem of Algebra: Every non-constant polynomial (with real or
complex coefficients) has at least one (complex) root. A corollary of this theorem is that
every polynomial can be factorised completely into linear factors, provided those factors are
allowed to have complex coefficients. There are two standard proofs of this theorem; one
involves a considerable amount of complex analysis (contour integration and so on), the
other a goodly amount of topology (homotopy theory). Both are beyond the scope of these
notes (sorry!).
B. ZF SET THEORY

A Zermelo-Fraenkel Axioms for Set Theory


In this appendix I outline the Zermelo-Fraenkel axioms for Set Theory.

The structure of the appendix follows that of Chapter 6 as closely as possible, noting the
differences with MK Set Theory as it progresses. In many places the development is identical,
and then it is not done all over again here.
The major difference between the two treatments is that in MK Set Theory the basic objects
are classes and in ZF Set Theory they are sets. There is no direct way of talking about proper
classes in ZF.

A.1 Definition: Zermelo-Fraenkel Set Theory, ZF


This is a first-order theory with equality. It has one binary relation ∈ and no functions. For
simplicity I will use an abbreviation for the “subset” predicate which occurs in these axioms:

x⊆y will mean (∀u)( u ∈ x ⇒ u ∈ y ) .

The proper axioms are:—

(ZF1, Extension) (∀x)(x ∈ a ⇔ x ∈ b) ⇒ a = b

(ZF2, Specification) Schema: an axiom for every expression P(x, y_1, y_2, . . . , y_n)

(∀y_1)(∀y_2) . . . (∀y_n)(∀a)(∃w)(∀x)(x ∈ w ⇔ x ∈ a ∧ P(x, y_1, y_2, . . . , y_n))

(ZF3, Unordered Pairs) (∀a)(∀b)(∃w)(a ∈ w ∧ b ∈ w)

(ZF4, Unions) (∀a)(∃w)(∀x)((∃y)(x ∈ y ∧ y ∈ a) ⇒ x ∈ w)

(ZF5, Power Set) (∀a)(∃w)(∀x)(x ⊆ a ⇒ x ∈ w)

(ZF6, Infinity) (∃w)( (∃u)(u ∈ w ∧ (∀x)¬(x ∈ u)) ∧
(∀x)( x ∈ w ⇒ (∃y)(y ∈ w ∧ (∀u)(u ∈ y ⇔ u ∈ x ∨ u = x)) ) )

(ZF7, Formation) Schema: an axiom for every expression P(x, y)

(∀x)(x ∈ a ⇒ (∃!y)P(x, y)) ⇒ (∃w)(∀y)(y ∈ w ⇔ (∃x)(x ∈ a ∧ P(x, y)))

(ZF8, Foundation) Schema: an axiom for every expression P(x)

(∀a)( (∀x)(x ∈ a ⇒ P(x)) ⇒ P(a) ) ⇒ (∀a)P(a)

Note that the Axiom of Choice is not an axiom of ZF.


A.2 Sets and classes in ZF


In ZF all mathematical objects are sets. As with MK, numbers, functions, relations and so
on are all actually constructed as sets (and in much the same way).
Most of the axioms of ZF differ from the corresponding ones of MK only in that they make
no mention of the predicate SET( ); this is unnecessary since everything in ZF is a set
anyway.
Even though ZF does not allow us to talk about classes in the formal language, it is often
highly desirable to use those sorts of ideas and there are various ways of translating notions
which we would naturally think of as being about classes into valid ZF. The most common
way of doing this translation is by replacing the idea of the class by the predicate which
defines it. For a simple example, one could write 𝒢 for the “class of all groups” and write
G ∈ 𝒢, which can be translated into genuine ZF as “G is a group”, in the sense “G satisfies
the definition of a group”. This kind of terminology and notation is rather dangerous and
liable to lead one astray; the acid test is whether anything one says using the language of
classes can be translated into genuine ZF-speak.

Various kinds of class-like construction are acceptable in ZF. For example, suppose we write
ℱ for the class of finite groups and 𝒜 for the class of abelian groups. Then we may write
G ∈ ℱ ∩ 𝒜 because we can translate this as “G is a finite abelian group”. For another example,
for any group G, its centre, usually written Z(G), is a uniquely defined subgroup and so a
group in its own right. Thus we can describe Z as a “function 𝒢 → 𝒢”. This is perfectly
proper, provided we take this to be a definition of a function by description, as in Definition
4.A.7 — the definition of a function as a class, as in 6.B.21, is not available to us. This can
be quite a nuisance in areas of mathematics which make a lot of use of ideas which are most
naturally expressed as functions between classes: functors in algebraic topology, universal
algebra and more generally category theory spring to mind.

The kind of argument which must not be used in ZF is one which quantifies classes, for
instance one which talks about all classes with some given property. The reason this is
outlawed is that, in trying to translate this using the predicates which define the classes,
one ends up talking about all predicates of such and such a form, and this is just not part
of first-order logic. For example, there is no way at all of expressing the idea of “for all
functions 𝒢 → 𝒢” in ZF. It is always possible, of course, that one might be able to replace
the argument with another completely different one which is valid ZF and does come to the
same conclusion, however this is not automatically possible.
The Axiom of Foundation provides an example of straightforward translation. As expressed
in MK, the axiom mentions a class w which turns out to be a proper class; indeed w = Sets.
This cannot be said in this way in (formal) ZF, so the ZF version of the axiom uses the
usual way of getting around this by using a predicate instead. The idea of the “class of all
sets” is not directly expressible in ZF, however a construction of the form “for all sets a,
such and such is true” is.
By far the most important difference in the axioms is in the Axiom of Specification. In MK
the axiom tells us that, for any predicate P (x) there is a corresponding class { x : P (x)}.
In ZF the corresponding axiom is much more limited; it tells us that, for any set a and any
predicate P (x) there is a corresponding set { x : x ∈ a ∧ P (x)}.

B The first six axioms


B.1 ZF1: The Axiom of Extension
As with MK, the opposite implication is given to us by Substitution of Equals, and so this
axiom tells us that
A=B if and only if (∀x)(x ∈ A ⇔ x ∈ B) ,
in other words, two sets are equal if and only if they have the same members.

B.2 ZF2: The Axiom of Specification and the Empty Set


Is there any guarantee that there are any sets at all? The Axiom of Specification does not
help with this basic question, since it now requires a set before stating the existence of
another one. However an appeal to the Axiom of Infinity guarantees the existence of a set,
just as with MK.
To define the empty set, we may proceed as follows. Define it as the set ∅ with this property:
x ∈ ∅ ⇔ x ≠ x .
From this it is easy to deduce the usual properties of the empty set, such as the alternative
definition
(∀x)¬(x ∈ ∅)
all except for the fact that it exists at all. To show that the empty set exists, appeal to
the mysterious set, w say, which the Axiom of Infinity tells us exists; then the Axiom of
Specification tells us that the set { x : x ∈ w ∧ x ≠ x } exists also. Then show that this
set is in fact the empty set.

The notation { x : P (x) } can be used fairly freely in ZF, however it is essential that
the expression P (x) implies that x is a member of some already-known set, otherwise the
construction is meaningless in ZF. For example, the construction
L = {x : x ∈ R ∧ x < 0}
is OK, whereas
G = { G : G is a group }
is not.

B.3 Some useful notation (definitions)


The definitions of x ∉ A, A ⊆ B, A ⊂ B, A ⊇ B, A ⊃ B, (∀x ∈ A)P(x) and (∃x ∈ A)P(x)
are exactly the same as for MK, however of course now everything must be a set.

B.4 Properties of subsets


The following are now easily proved
(i) The empty set is a subset of every set: ∅ ⊆ A , and the only subset of the empty
set is itself: A ⊆ ∅ ⇔ A = ∅ .

(ii) The subset relation has the properties of a partial order.



B.5 Sets
The class Sets of all sets does not exist in ZF, nor anything that does the job of “all sets”.
Sorry about that.
(The Russell Paradox tells us that there is no set of all sets.)

B.6 ZF5: The Axiom of Power Sets


This axiom tells us that, for any A, the power set P (A) = { X : X ⊆ A } exists.
Because this is ZF, A must be a set and then so is P (A). As with MK, the axiom, as stated,
only says that, for any set A, there is a set, W say, which contains all the subsets of A
as members (that is, X ⊆ A ⇒ X ∈ W ). To see that the power set P itself exists, the
argument is slightly simpler than the one needed in MK: the Axiom of Specification can be
used directly to define
P = {X : X ∈ W ∧ X ⊆ A}

B.7 ZF3: The Axiom of Unordered Pairs


This axiom tells us that, for any a and b, the unordered pair {a, b} = { x : x = a ∨ x = b },
exists. It follows that the singleton {a} = { x : x = a } exists also, since {a} = {a, a}.
As usual, a and b must be sets and then so are {a, b} and {a}. And once again, the usual
properties follow easily.

B.8 ZF4: The Axiom of Unions


The union of two sets and the union of a set of sets are defined in the same way as for MK.

A ∪ B = { x : x ∈ A or x ∈ B }   and   ∪A = { x : (∃W)(x ∈ W and W ∈ A) }

Axiom ZF4 tells us that, for any A, ∪A exists. As usual, since this is ZF, A must be a
set and then so is ∪A. (Note that, in MK, the existence of the union of classes is given by
the Axiom of Specification. However in ZF, we need Axiom ZF4 in order to be able to talk
about unions at all.) It follows that the union of two sets, and hence the union of any finite
number of sets, is also defined.
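The finite fragment of these axioms can be experimented with directly, since Python's frozenset is a reasonable model of the hereditarily finite sets. A sketch (purely illustrative; all names mine):

```python
from itertools import chain, combinations

EMPTY = frozenset()

def pair(a, b):                        # ZF3: the unordered pair {a, b}
    return frozenset({a, b})

def union(A):                          # ZF4: the union of a set of sets
    return frozenset(x for W in A for x in W)

def power(A):                          # ZF5: the power set of A
    elems = list(A)
    all_subsets = chain.from_iterable(
        combinations(elems, k) for k in range(len(elems) + 1))
    return frozenset(frozenset(s) for s in all_subsets)

s = pair(EMPTY, pair(EMPTY, EMPTY))    # the set { 0, {0} }
assert union(power(s)) == s            # a tiny exercise: U P(s) = s
```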

B.9 Intersections
The definition of intersections of sets is rather more tricky than is the case with classes in
MK. The intersection of two sets can be defined by
A ∩ B = { x : x ∈ A ∧ x ∈ B }.
The intersection of any (nonzero) finite number of sets can be defined similarly, and then
the usual properties are easily proved.

The intersection of any nonempty set of sets can be defined in the usual way: if A is a
nonempty set of sets, then its intersection is:

∩A = { x : (∀W ∈ A)(x ∈ W) }

Here is where we must be careful: it is not enough to simply write this definition down.
The axioms of ZF do not guarantee the intersection's existence as easily as that, because
the definition does not conform to the ZF form of the Axiom of Specification. So we must
verify that the intersection exists and is unique.
For a start, if A is empty, this intersection is in fact undefined. If it existed, the
definition can be seen to be equivalent to that of the universe, which does not exist in ZF.
If A is nonempty, then it must contain a member, B say. Then we observe (with a small
proof) that the definition above is equivalent to

∩A = { x : x ∈ B ∧ (∀W ∈ A)(x ∈ W) }

which does conform to the ZF Axiom of Specification. Then finish off by checking uniqueness.

B.10 The calculus of sets


We have to be more careful with the calculus of sets than is the case with MK.

For a start, the complement of a set is undefined.


For two sets A and B, the set difference A r B is defined in the same way as in MK:
A r B = { x : x ∈ A ∧ x ∉ B }, and the definition is also good in ZF. The usual properties
are easily proved.

The union and intersection of two sets are defined as discussed above and have all the usual
properties.

B.11 Ordered pairs


The primitive ordered pair ⟨a, b⟩ of two sets is defined in exactly the same way as for MK.
Its properties, and the consequent definition of the (ordinary) ordered pair go exactly as for
MK.

Since we cannot speak of classes in ZF, there is no need for the more general definition of
an ordered pair which is used in MK to define a pair of classes: we use the above definition
everywhere. So we don't need the special notation ⟨a, b⟩_p, and I haven't used it above.

B.12 Cartesian products


The cartesian product is defined, and its properties proved, in the same way as for MK.

Note that the horrid little proof given in Chapter 6 that the cartesian product of two sets is
a set works just as well in ZF, but here is used to tell us that the cartesian product exists.

B.13 Relations, functions, cartesian powers and sequences


Everything here is exactly the same as for MK, except of course that everything in sight
must be a set.

C The Natural Numbers


This chapter holds good in ZF with hardly any changes.
In Definition 6.D.2 the bald definition

N = { n : n is a member of every inductive set }.

won’t do. Instead we show that there exists a unique set N with the property that

for all x, x ∈ N if and only if x is a member of every inductive set.

Noting that the set W of the axiom is not unique, we show that N exists by using the fact
that the axiom states that at least one such set W exists, and so there exists a set

N = { n : n ∈ W and n is a member of every inductive set }.

We then show (easy) that N has the property stated above. Then it is also easy to prove
that it is unique.
In Definition 6.D.4, of course, we only define a transitive set. This is not a problem for the
rest of the section.
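For finite examples the construction can be watched in miniature. A sketch of mine (illustrative only) of the first few von Neumann naturals as Python frozensets:

```python
EMPTY = frozenset()

def succ(n):                  # the successor n+ = n U {n}
    return n | frozenset({n})

zero = EMPTY
one, two, three = succ(zero), succ(succ(zero)), succ(succ(succ(zero)))

assert two == frozenset({zero, one})     # 2 = {0, 1}
assert all(m <= three for m in three)    # 3 is transitive: members are subsets
```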

D Well-ordering
and . . .

E The Axiom of Choice


and . . .

F The Axioms of Foundation and Formation


No surprises here. Everything in these sections of Chapter 6 holds good in ZF word for word
— so long as references to classes are replaced by references to sets wherever they appear.
The same remarks apply to the treatment of Cardinality in Chapter 7.

Dealing with ordinal numbers when one cannot talk about classes at all can be done, but
can be a bit tiresome. In the next section I will present the main results of the section on
Ordinal numbers in Chapter 7 translated into ZF-speak.

G Ordinal numbers
G.1 Definition
An ordinal number is a set α such that
(i) α is transitive (that is, every member of α is also a subset of α).
(ii) For all x, y ∈ α, one of the following holds: x ∈ y, x = y or y ∈ x.
Note that (i) above is equivalent to

(i′) x ∈ α ⇒ x ⊂ α.

G.2 Ordering an ordinal number


Let α be an ordinal. We define the relation ≤ on α by

x≤y if and only if x∈y or x = y .

This relation is in fact a well-order on α.

G.3 Proposition
If two ordinal numbers are order-isomorphic, then they are equal.

G.4 Proposition
Every initial segment of an ordinal number is an ordinal number.
Therefore every member of an ordinal number is an ordinal number.

G.5 Theorem
(i) If α is an ordinal number and x ∈ α then x is an ordinal number also.

(ii) If α and β are ordinal numbers then one of α ∈ β, α = β or β ∈ α must hold.


(iii) Every nonempty set of ordinal numbers has a least element.

G.6 Remark
This theorem tells us that, if there were such a thing as the set of all ordinal numbers, then
it would itself be an ordinal number, and thus a member of itself. Therefore ...

G.7 Theorem
There is no such thing as the set of all ordinal numbers.

G.8 Theorem
Every well ordered set is order-isomorphic to exactly one ordinal number.

G.9 Definition
If A is a well ordered set, then the unique ordinal number to which it is order-isomorphic is
called the order type of A.

G.10 Example
N is an ordinal number.

G.11 Remark
In the context of ordinal numbers, the set N is usually denoted ω.

G.12 Example
Every natural number is an ordinal number.

G.13 Theorem
If α is an ordinal, then so is α⁺.

G.14 Examples
As examples of the last theorem, the following are ordinal numbers.
ω⁺ = {0, 1, 2, . . .} ∪ {ω}
ω⁺⁺ = {0, 1, 2, . . .} ∪ {ω, ω⁺}
ω⁺⁺⁺ = {0, 1, 2, . . .} ∪ {ω, ω⁺, ω⁺⁺}

and so on.

G.15 Theorem
Let A be a set of ordinals. Then ∪A is an ordinal.
C. GENERAL ALGORITHMS

A Defining a general algorithm


A.1 Any algorithm at all?
We are about to construct a precise model of a very general form of algorithm. We will end
up constructing a programming language modelled very loosely on such languages as Pascal
or C. The idea is that this language should be able to model and perform any algorithm
whatsoever. This is rather a tall order, considering the wide range of types of algorithm
available.
We mostly (I imagine) think of an algorithm as being performed with pen or pencil on paper,
or perhaps with manipulations of states in a computer’s memory, but it could perfectly well
consist of operations performed with the beads of an abacus or with pebbles on a beach, or
pretty well anything. So what we need to do here is decide what the essential properties of
an algorithm are and then build a mathematical model powerful enough to simulate it in all
its generality.

Note that here, when I write “model”, I am using the word only very roughly in the
sense of Chapter 5. Here I mean a mathematical structure which imitates accurately
the operation of any general algorithm, as we are about to investigate it.

Note that we will restrict ourselves to algorithms for calculation of the kinds of things
that can be written down — ruling out physical things like recipes for making plum
pudding or instructions for changing a tyre on your car. Symbol manipulation in
other words.

Basically an algorithm consists of three main parts. The first is what we might call the
workspace, where the operations are carried out. It could be the paper we work on, a
blackboard, the computer’s memory, the abacus with its beads, the beach with its pebbles
or whatever. This idea carries along with it the individual symbols which may be used:
numerical digits or letters when working on paper, 0s and 1s in a computer’s memory,
positions of beads in an abacus and so on.
The second part is what is actually written on this workspace. This will usually change
from time to time as the algorithm progresses. I will call this a configuration.
The third part is a set of instructions for implementing the algorithm in this workspace,
which we can understand to be carried out by some kind of an operator or processor. The
important thing here is that the individual instructions should be completely cut and dried:
they should be able to be followed without any creative thought and without the need for
more than a finite amount of memory on the part of the processor. The instructions are
memorised too, so there can only be a finite number of them (for any given algorithm). We


would normally think of the processor as a human — or, nowadays, as a computer CPU —
but in any case, the requirements of finite memory and no creative thought mean that we
can model the processor as a (finite-state) automaton.
Note that this requirement of finite memory on the part of the processor means that the set
of symbols available for use, the alphabet, must be finite too — it must be able to recognise
the different symbols and know what to do when encountering them.
At first sight this seems like too open-ended a problem. Think of all the possible instructions
that could be given — “add these two numbers”, “rewrite that number in reverse order”,
“convert that number to Roman notation”, “write a row of that many zeros” and so on —
and that’s only dealing with numerical notation. However that’s because we are looking
at too high a level here. All these kinds of instruction can be broken up into a fairly
small number of “primitive” instructions. Let us consider an example. Consider adding two
multidigit numbers, say:

2 6 8
6 7 9

What do you actually do to add these numbers (using the usual algorithm)? You look at
the top right hand digit (8), then, remembering it, go down to the one below (9). You know
what to do with this: put a 7 in the third row (empty so far) and remember to carry 1.
Now go to the top of the next column to the left, . . . and so on, working leftwards across
the diagram.
Analysing this down to the very simplest mini-operations, one proceeds as follows:

Head state                                      What to do next

0   Ready to go                                 Look at top-right digit
1   Seen an 8                                   Look at digit below
2   Seen an 8 and a 9                           Move to space below, write 7
3   Done bottom row, remember to carry          Look at top, one column left
4   Seen a 6, remembering carry                 Look at digit below
5   Seen a 6 and a 7, remembering carry         Move to space below, write 4
etc.                                            etc.

There is only a reasonably small (finite!) number of different head-states involved here. The
largest number come from dealing with cases like rows 2 and 5, where there are 200 states,
depending on the two digits seen and whether a carry is to be made or not. If you are like
me, you spent some dreary time in primary school having the “what to do nexts” for each
of these states beaten into your brain.
There are a few extra kinds of head-states, not displayed in the list above, to do with what
to do when a blank is encountered (reaching the far end of one or other of the numbers).

And there’s the notion of “keep going until you reach one of these blanks”. But that’s about
it.
An important point of this is that all you need to “know” (remember) to perform this
algorithm on any pair of numbers, no matter how big, is a small finite number of instructions
of a few very specific, very simple kinds; assuming you are in head-state s and looking at a
cell containing character c:

Decide what to do next, that is, enter into a new head-state s′; which actual head-state
you change to depends on the character c you are seeing.

Write a new character at the place you are looking at, just erase whatever is there
(which we could think of as writing a blank), or make no change (which we could
think of as writing the character or blank that was already there).

Move one step in any of the obvious directions (in this case right, left, up or down).

That’s about it. We will need to clean this up a little and we will see shortly that we will
need to add a couple more of the primitive instructions: they will be equally simple.

Here is an interesting exercise to try if you are crazy enough: do the same sum in hexadecimal
(base-16) notation.

1 0 C
2 A 7

Don’t cheat by converting the numbers back into base-10 first. Don’t even convert to binary.
Now, unless you are some super-geek who already knows how to do arithmetic operations in
hexadecimal, you will find that you are teaching yourself something much like the algorithm
I have laid out above, with perhaps a few short cuts you can devise. Well, you don’t have
to do the whole sum, but at least look at the first (rightmost) column and convince yourself
that the bottom symbol should be a 3 (with a carry).
The point of all this is that, if you consider any algorithm and analyse all its operations
down to the simplest “atomic” ones, by which I mean ones that are so simple that they
cannot be broken down any further, then those instructions will be of the simple kind I
described above. There may be a lot of them to define any particular algorithm, but of only
a very few kinds.
We must think about the workspace a bit more. As mentioned above, it can be much more
general that sheets of paper organised into rows and columns of characters. However there
must be some structure. You cannot do our multidigit sum above (decimal version) with a
layout like this:

8
6
2
7

9
because how could you tell which digit belongs to which number and in what order?
Typically, one arranges symbols in rows and columns, perhaps using extra sheets of paper for
subsidiary calculations and so on. This workspace is finite, but is allowed to grow as much
as is needed for any particular calculation (you can always buy some more paper or another
hard disk). The particular physical medium used for the workspace is not really relevant,
so we model it by a mathematical structure. But we will want to define this structure to be
general enough to encompass any well-defined way of arranging our symbols on our medium.

At the very least, the workspace, or rather what is written into it, will consist of single
symbols arranged in some kind of definite pattern which allows the processor to move around
it in some defined and predictable way. Let us look at a few examples of such patterns.

• A natural number written in binary notation can be represented by a sequence of the


symbols 0 and 1; a horizontal row. In fact numbers in almost any standard notation
can be represented in the same way, but using a different alphabet.

• An addition or multiplication algorithm, as usually taught, involves digits (members


of the alphabet) arranged in rows and columns.

• Calculations involving matrices involve numbers arranged on a 2-dimensional grid.


Actually, it is more complicated than that. Each number has horizontal and vertical
neighbours, defining a rectangular 2-dimensional grid. If we are to represent numbers
as sequences of digits, as we usually do, then each digit has neighbouring digits within
the number also. So we really have what would be better represented by a three-
dimensional array here.

• In the well-known triangular array for calculating binomial coefficients, each entry has
up to six neighbours.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

So basically what we need here is the idea of a “next” symbol, or more generally, the “next”
symbol in some given direction (such as left, right, up or down). A workspace will be
equipped with a finite set of directions, and we can conveniently think of its state at any time
during a calculation as consisting of a finite number of symbols from the alphabet, connected
together by arrows labelled with the names of the directions. Even more conveniently for
visualisation, we can enclose each symbol in a little box, representing somewhere it is OK
to write a symbol, and connect the boxes with the labelled arrows. I will call these boxes
cells.
We are now closing in on the general properties needed for a workspace, close enough for us to
construct a mathematical model of one which is general enough to cover all manifestations.
For example, consider a matrix. Decorating it with arrows to represent the directions and
little boxes to represent the cells, we would have something like this.

[Diagram: a 3 × 3 grid of cells containing the digits 3 7 2 / 4 0 4 / 1 9 8, with labelled
pointers R, L, U and D connecting each cell to its neighbours.]
A three-by-three matrix as a workspace. It uses four directions,


L, R, U, D, for left, right, up and down. The top middle cell
contains the symbol 7 and “owns” the L, R and D pointers com-
ing out of it; its U pointer is null.

You will (I hope) notice that I have cheated here by only having single-digit entries. To
accommodate more general, multidigit, numbers we would need a couple more directions to
move back and forth along the digits of each number. (If I drew that diagram it would be
unpleasantly complicated.)
A couple of points should be made about workspaces before moving on.

(1) Think of the workspace as being like a blank sheet of paper, ready for the calculations
to be written on it. It doesn’t change during the calculation. At any stage during the
calculation, the workspace will contain a set of symbols (in cells) connected by the directions.
I will call that a configuration, and that will change as the calculation progresses.

So the plain workspace is entirely defined by the available directions (a finite set) and the
available alphabet (another finite set). For the matrix example above, for instance, the
set of directions would be {L, R, U, D, l, r} — here I have used l and r for left and right
directions within a numeral — and the alphabet would consist of the decimal digits together
with perhaps a minus sign and a decimal point, depending upon what sort of numbers you
were interested in and what notation you wanted to use.

And, in this matrix example, the picture above shows a particular configuration in the
workspace, representing a 3 × 3 matrix. If an algorithm was being performed on this matrix,
diagonalisation for example, the configuration would change from time to time, but the
workspace itself would not.
(2) Not all the pointers (representing the directions) have to point to anything. In the
matrix example, the U pointers of the top row don’t point to anything (so I haven’t drawn
them in). The same goes for the other edges of the matrix.
(3) Any configuration is finite, but is allowed to grow by adding cells in any direction.
(Think: in any arithmetic calculation, you are likely to add new digits in places where before
there was nothing. There’s nothing mysterious about this.) By the same token, removing
cells is OK too.
Note that adding or removing a cell is a change to the configuration, not to the workspace.
(4) At any time during the progress of an algorithm, the processor’s attention is fixed
on one particular cell. It is convenient (for our discussion) to think of that cell as being
distinguished by a marker of some kind. Actually, there is no harm in thinking of an
algorithm as having several markers, each representing a cell which the processor may focus
on at some time; the processor might shift focus from one marker to another, and therefore
from one cell in the configuration to another. Using this it would, for instance, be possible
to add two numbers which were widely separated in the workspace, using a marker for each.

(5) We will of course be interested primarily in algorithms which can be used for calcu-
lations involving natural numbers. For our theoretical purposes we need algorithms which
will work for all natural numbers. (An algorithm to deal with, say, multiplication which is
restricted to handling numbers not exceeding some maximum size M is trivial: just write
out all possible M 2 cases as a long list and use it as a look-up table.) And that means that,
amongst other things, our algorithms will have to cope with numbers that are inconceivably
vast.
Consider, for instance, the apparently trivial task of copying a number, let’s call it x. If
the number x is of “normal” size, say not bigger than ten digits long, I can simply look at
x, memorise it and write it into the new place. But what if the number is of the order of
10^1000? There is no way I can remember a 1000-digit number. I would have to copy it in
pieces, perhaps a digit at a time — thinking things like “I now have to copy digit 573 across”.
Still fairly straightforward, though the task is starting to look more complicated. But now
what if the number is of the order of 10^(10^1000)? Now there is no way I can even keep what
digit I am up to in my head: that is a 1000-digit number itself. The obvious way to deal
with this is to use some kind of moveable marker, say place a small pebble on the next digit
to be copied. All I now have to do is scan along x until I reach the pebble, push it one digit
to the right, memorise the one revealed underneath, go over to the new partial copy and
write that digit at the end. Keep doing this until my pebble falls off the end of x. Simple,
if time-consuming.
Or, as remarked above, we could think of this as using two markers, keeping one of them
on the place we are up to in the old number and the other on the place we are up to in the
new copy.

Of course, when I say “I” here, I am really thinking of the processor of the algorithm. It has
finite memory, so the above discussion applies to it just as well. Moreover, most algorithms
need to deal with data (configurations) of arbitrary size, including truly enormous ones, so
the problems just discussed are quite general.
The moral of this is that all the algorithm’s most basic operations must act locally, on a single
symbol or one of its neighbours (what the neighbours are being defined by the structure of
the workspace). Operations which act at a distance might have to operate at enormous
distances and so must be implemented as repetitions of these basic local operations.
Since these markers we have been discussing simply distinguish certain cells as ones being
“looked at”, it is suggestive to call them “eyes” (of the processor). I shall rather call them
variables, since that is the way they are used: they refer to a piece of the data (the configu-
ration) which may change from time to time, as may the part of the configuration they are
referring to. It is then natural to call the cell that a variable refers to its value.
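Before turning to the examples, here is one possible way — entirely my own illustration, with invented names — of realising cells, configurations and variables as a data structure:

```python
class Cell:
    """One cell of a configuration: a symbol plus named direction pointers."""
    def __init__(self, symbol=None):
        self.symbol = symbol   # a member of the finite alphabet, or None (blank)
        self.ptr = {}          # direction name -> neighbouring Cell

    def move(self, direction):
        """Follow a pointer; if it is null, grow the configuration by
        creating a fresh blank cell in that direction."""
        if direction not in self.ptr:
            self.ptr[direction] = Cell()
        return self.ptr[direction]

# A variable ("eye") is simply a reference to some cell:
x = Cell("1")                  # the cell the processor is looking at
left = x.move("L")             # step left, creating a blank cell if necessary
left.symbol = "0"              # write a 0 there
```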

An example of addition, first version

Here is an example of an addition to be performed in binary notation:

[Diagram: the binary numbers 101 and 111 written one above the other, with a small
triangle labelled x pointing at the top right-hand digit.]

This is the initial set up. The little triangle represents the algorithm’s lone variable, its
value being the top right-hand digit. (I have called it x.) Now here is the situation part-way
through the performance:

[Diagram: the same sum part-way through: the digits 0 0 of the sum now occupy a third
row under the two rightmost columns, and the variable x sits at the top of the third column
from the right.]

The first two columns have been dealt with and the processor is just about to start on the
third. All it needs to remember at this stage is that there is a carry.

[Diagram: a flowchart for the addition algorithm, from a Start node to a Fin node. Decision
nodes branch on the symbol seen (blank, 0 or 1), signpost nodes make moves such as D or
UUL, and rectangular nodes write a symbol; the left half of the diagram handles the case
where no carry is pending, the right half the case where a carry is.]

I think that this diagram is fairly self-explanatory. The circles and rectangles represent
what I have so far been referring to as “states” or “head-states” — I will henceforth call
them nodes. The blue circles represent nodes where a decision as to what to do next is
being made, based on the value of its variable. Here the symbol  represents a blank, that
is, seeing nothing or an empty cell. The yellow signposts, D etc., represent moves in the
directions down, up or left. The signpost UUL represents three moves, up, up and left:
a convenient abbreviation for three nodes. The green rectangles, 0 etc., represent writing
a symbol.
And, if it’s not obvious, the arrows represent the node gone to next. From the “decision”
nodes there are several arrows, each labelled by the symbol or symbols which caused that
particular transition.

The left side of the diagram is what happens when a carry hasn’t been seen, for instance at
the start, and the right side is what is done when a carry has been seen.
The above diagram may look a bit complicated, but addition is complicated if analysed
down to its simplest components, as we have done here.
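For comparison, here is the same algorithm flattened into ordinary code — a sketch of mine, not part of the model, since it indexes strings rather than moving a variable around a cell structure. The rows become strings and the only state carried from column to column is the carry, exactly as in the two halves of the diagram:

```python
def add_binary(top, bottom):
    """Column-by-column binary addition, right to left.  The carry plays
    the role of the head state; a missing digit (running off the end of a
    row) plays the role of a blank."""
    result, carry = [], 0
    for i in range(max(len(top), len(bottom))):
        a = int(top[-1 - i]) if i < len(top) else 0      # blank counts as 0
        b = int(bottom[-1 - i]) if i < len(bottom) else 0
        total = a + b + carry
        result.append(str(total % 2))                    # write the sum digit
        carry = total // 2                               # the new "head state"
    if carry:
        result.append("1")
    return "".join(reversed(result))

assert add_binary("101", "111") == "1100"                # the worked example
```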

An example of addition, second version


This version is good as far as it goes, though we will see that there are some problems
associated with the assumption that we are working in a “squared paper” style of workspace.
However it is of interest to have an algorithm which works when the two numbers to be added
have not necessarily been lined up neatly under and above one another, and the sum is to be

developed perhaps in another place entirely. For such an algorithm the processor’s attention
will need to move between the two given numbers and the sum, needing three variables. To
deal with this we will have to name the variables (say, x and y for the two given numbers
and s for the sum). Then the various basic operations, deciding based upon the value of
its variable, moving the variable or writing in the cell it refers to, each depend upon which
variable is being used. So each of our nodes in the diagram must be labelled by which
variable it refers to. Given this the diagram turns out to be simpler than the last one
(because there is not so much moving about).

[Diagram: the flowchart for the second version. The decision nodes are tagged x? or y? and
branch on blank, 0 or 1; the writing nodes are tagged s := 0 or s := 1; the move nodes at
the bottom are tagged xys with an L signpost.]

There’s not much new here, only the x, y and s tags on the nodes to indicate which variable
they refer to. The xys on the bottom “move” nodes means that all three variables move one
step to the left (it is actually shorthand for three simpler move nodes).

More thoughts about the workspace


Any particular configuration is finite (as for example, the addition sum being developed
above). But we often do not know in advance how much space any particular calculation is
going to take up, so our configurations are allowed to grow without bounds. The example
above shows how this happens as the sum is developed: as a new digit is added to the sum,
the variable moves into empty space, and a connection in the correct direction is established.
(In the first version above, the variable moves down into empty space, thus creating a new
cell with a D-pointer to it; in the second version the s variable moves left and establishes
an L pointer.)
It is tempting to stipulate that an entire set of empty cells exists right from the start,
extending infinitely in all directions. (Imagine, for instance, an infinite sheet of squared
paper). This would be fine for such a square array of cells, but we want to allow any kind
of arrangement of direction pointers.
For a start, we cannot assume that the workspace is already provided with a complete set of
empty cells connected together in any way at all, because in that case a workspace could be
arranged to encode any function at all. For example, the following workspace would allow

the calculation of the function f (n) = the nth Fibonacci number, simply by looking along
the top row to the nth cell, then counting the cells below it.

[Diagram: a grid of blank cells; the top row extends to the right by R pointers, and below the nth cell of the top row hangs a column of cells, joined by D pointers, whose length is the nth Fibonacci number.]

(Here we are using R and D pointers only; note that both pointers out of every cell are
defined.) All the cells are blank, so here we are simply using an empty space with a predefined
structure. It should be obvious from this example that, if we are allowed to set up the empty
space with any kind of structure in advance, then, given any function N → N at all, we can
set up the structure so that values of that function can be looked up just by following
pointers around in the empty space, rather like looking up an entry in an infinite telephone book. That
is clearly not what we mean by an algorithm: for a difficult function it is going to be difficult
to set up the space thus and for a non-computable function, impossible.
Even a fairly straightforward arrangement like our assumption of a “squared paper” one in
our first version of the addition example above has its problems. If we follow the movements
of the variable around the configuration in this algorithm, this is what happens:

□ 1 0 1
□ 1 1 1
1 1 0 0

[Diagram: arrows trace the variable's path, snaking up and down each column in turn, moving leftwards over the two summands and the sum row.]

Presumably the two given numbers (the upper two rows in the diagram) are both internally
connected by pointers, probably both L and R, since they are given as numbers. But the
bottom row, the sum, has been created by the algorithm, and only the pointers shown in
and out of those cells have been created explicitly by the variable moving around. But that
bottom row has to end up connected to itself by L and R pointers, or at least L pointers.
How does this come about? It seems that there are two possibilities: either these connections
are made automatically by the workspace, due to the “squared paper” assumption, or else
they are made by the processor as part of the algorithm (which has not been made explicit
in our discussion above).

The “squared paper” approach seems natural and straightforward, so let us look at that first.
Let us assume that a binary representation of a number is connected together by both R
and L pointers, and that the two given numbers are connected that way when given. The
black part of the diagram below shows what is given. Then adding the extra connections
that occur because of variable movements give rise to the part shown in red.
Something must give rise to the L and R connections in the bottom line. At present we are
considering that this is done by the workspace. This means that the workspace must at least
have encoded in it somehow the local structure: that L and R are inverse directions, that
going UR has the same outcome as going RU, and so on. In fact it must encode these facts somehow:

    LR = I ,  RL = I ,  UD = I ,  DU = I ,
    RU = UR ,  RD = DR ,  LU = UL ,  LD = DL .

(I am using I to mean "no movement". Note that these relations are not all independent.)

[Diagram: the given numbers □ 1 0 1 and □ 1 1 1 shown in black with their given pointers; the sum row 1 1 0 0, and the connections created by the variable's movements, shown in red.]
Remembering that "squared paper" is just one of many ways of structuring the workspace,
one can think about some others. For example the triangular sort of structure used in the
calculation of binomial coefficients has six directions and a correspondingly more complicated
set of relations to define it.
So it seems that we might be able to define a workspace structure by giving a set of directions
and a few relations between them such as these, then taking these into account automatically
somehow as the configuration grows. That this is not entirely trivial, even for the well-known
“squared paper” type of arrangement is shown by a case such as the following.
[Diagram: a chain of cells built starting at A, looping right, down and back around to a cell at C; B lies on the top row, directly above the position where the new cell will be created.]
Suppose that the algorithm has started at A and built the configuration shown here,
progressing around to the cell at C. Now it is going to move up, to create another
cell above the one at C. But it has to know somehow that this new cell must also be
connected upward to the one at B. It is clear from this example that, even with a
simple local definition of the structure of the workspace, there can be implications
afar off which require some computation to recognise.
In this example, recognising that a connection should be made between the new cell and
the one at B is tantamount to recognising that RRRDDDLLLUU = D follows from
the relations listed above. To a group theorist this is starting to look like solving the word
problem for a group given by generators and relations, or, more generally, for a semigroup
so given. And so it should: it is not difficult to see that, given any semigroup defined
by generators and relations, we can set up a workspace where the directions correspond
to those generators, the “local” structure is defined by those relations and the question of
determining other more complicated resulting relations is in fact that of solving the word
problem. It is known that the word problem for groups, and therefore for semigroups also, is
in general algorithmically unsolvable. There are of course many groups for which the word
problem is solvable, but this is not the case for all of them. Indeed, there are many groups

with innocuous looking generator-and-relations definitions for which the word problem is
solvable but takes an enormous amount of computation.
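As a small aside, the word problem for this particular "squared paper" structure is trivial, because the relations above just say that the directions commute and cancel in pairs. The following Python sketch (mine, purely illustrative) reduces any word this way and confirms the reduction used above; the difficulty described in the text arises only for general semigroup presentations.

    def reduce_word(word):
        # Reduce a word over L, R, U, D using the relations above:
        # opposite directions cancel and all moves commute, so only the
        # net horizontal and vertical displacements matter.
        h = word.count("R") - word.count("L")
        v = word.count("U") - word.count("D")
        return ("R" * h if h >= 0 else "L" * -h) + \
               ("U" * v if v >= 0 else "D" * -v)

    assert reduce_word("RRRDDDLLLUU") == "D"   # the relation in the example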
What this boils down to is that the idea of defining a workspace with anything more complicated
than just a "bare" set of directions, with no relations assumed between them, is fraught with
problems, not the least of which is that quite a large amount of the computation
of the algorithm ends up hidden in the structure of the workspace. So we go back
to the original definition: the workspace consists of a set of symbols and a set of directions,
and that is all. If we want some further structure, of the kind we have just been considering,
that is made the responsibility of the program the processor is following. If that program
involves following up on local relations (which are now built in to the program, not the
workspace) and that turns out to be trying to solve an unsolvable word problem, then at
some point the algorithm may fail — but that is not a problem, algorithms are allowed to
fail.
As far as the workspace is concerned, we have come full circle here. We are back with the
extremely simple definition of the workspace we started with, but now we have some good
reasons why. Now any structure added, in the sense of relations between the directions,
must be part of the program of instructions the processor follows. So . . .

More thoughts about the processor and instructions


Looking at the two addition examples above, the second version certainly looked simpler.
The assumption of more than one variable means that complicated algorithms can be per-
formed without the necessity for a single variable to go crawling stepwise all over the place.
Also, we will see in the next paragraph that more than one variable is pretty well essential
anyway. Now, at times it will be necessary to set a variable to refer to some cell. For in-
stance, at the beginning of an algorithm, the several variables that will be used will probably
need to be initialised. To set a variable to refer to some cell, that cell must be already known
to the algorithm and, at this low level, the only way the processor can specify a particular
cell is as the value of some other variable. Thus we need an atomic instruction of the form

Given two variables x and y, set x to refer to whatever cell y is already referring to.
We will symbolise this operation as x :≡ y.

It should now be obvious that the first version, the "squared paper" one, is even more com-
plicated than it looked, because the extra connections which must be made by the processor
require instructions which were not made explicit in the node diagram above. They were
hidden by the undisclosed assumption that the structure of the workspace would take care
of it, an assumption we have now discarded. But, to make the extra connections we need
an extra low level (“atomic”) instruction that will make such a connection. A connection is
normally made between two cells (though it is possible to make it from a cell to itself). To
do that of course the processor must know which cells to connect: it needs to have variables
referring to both of them (here is where the assumption of more than one variable becomes
essential). So the atomic instruction is of the form

Connect the cell being referred to by one variable (x say) to the one being referred
to by the other (y) in direction D. We will symbolise this operation as x →D y.

Note that, in any algorithm, each variable is named explicitly somewhere in the program and
the program is finite (contains only a finite number of these atomic instructions). Therefore,
only a finite number of variables is ever needed.
Time to make all this official.

A.2 Definition of a general algorithm

A cell may be connected, in certain directions, to no other cell. A cell may contain no
symbol. A variable may not actually be referring to anything. So it will be convenient to
have a symbol to represent “nothing”, or “blank”. I shall use □ for this.

(i) A workspace is a pair of finite sets:


(a) A finite alphabet A, whose members will be called symbols.
(b) A second finite set D , whose members will be called directions.
(To avoid confusion, these two sets will be assumed to be disjoint.)

(ii) A configuration C (“in” some given workspace) consists of:


(a) A finite set C of cells.
(b) Each cell in C “contains” a symbol of the alphabet, or perhaps nothing.
This gives a function sym : C → A ∪ {□}.
(It will be convenient to assume that this function has been extended to include □ by
defining sym(□) = □, so it becomes a function sym : C ∪ {□} → A ∪ {□}.)
(c) To each cell c ∈ C and direction d, a cell (or blank) con(c, d).
This gives a function con : C × D → C ∪ {□}.
(It will also be convenient to assume that this function has been extended to include □ by
defining con(□, d) = □, so it becomes a function con : (C ∪ {□}) × D → C ∪ {□}.)
We can think of a configuration as being the triple ⟨C , sym, con⟩.
(iii) A program, which consists of a finite set N of nodes and a finite set V of variables .
Each node is of one of the kinds (a)–(h) listed below.

Each node n specifies one of the variables, x = var(n). This gives a function var : N → V .
This determines which cell the algorithm is “looking at” (the value of that variable) when it performs the operation.
Each node n also specifies a next node, next(n). This gives a function next : N → N . This
defines which node the algorithm normally goes to after completing the operation defined
by this one. “Normally” because the first three kinds of nodes are “branch” nodes which can
make a decision to go to an alternative node.

(a) x ≈ a?  A symbol-compare node n specifies a symbol (or blank) a = sym(n) and
also a second next node, nextT(n). This gives two functions,
    sym : {symbol-compare nodes} → A ∪ {□},
    nextT : {symbol-compare nodes} → N .
(In this context the ordinary next function may be called nextF.)
In this node the algorithm compares the contents of whichever cell x is
referring to with the node's own symbol a. If they are the same, the algorithm
goes to the nextT(n) node, otherwise it goes to the ordinary nextF(n) node.
(b) x ≡ y?  A variable-compare node n specifies a secondary variable y = var′(n)
and also a second next node, nextT(n). This gives two functions,
    var′ : {variable-compare nodes} → V ,
    nextT : {variable-compare nodes} → N .
(In this context the ordinary variable may be called the “primary” one,
and again the ordinary next function may be called nextF.)
The algorithm compares the value of x with the value of y. If they are
the same cell (not contents of cell!), the algorithm goes to the nextT(n)
node, otherwise it goes to the ordinary nextF(n) node.

(c) x →d ?  A cell-exists node n specifies a direction d = dir(n) and also a second
next node, nextT(n). This gives two functions,
    dir : {cell-exists nodes} → D ,
    nextT : {cell-exists nodes} → N .
(In this context again the ordinary next function may be called nextF.)
The algorithm checks whether a cell in direction d from the one x is
referring to actually exists. If so the algorithm goes to the nextT(n)
node, otherwise it goes to the ordinary nextF(n) node.
The next three nodes actually change the configuration in some way.
(d) x :≈ a  A write node n specifies a symbol (or blank), a = write(n).
This gives a function write : {write nodes} → A ∪ {□}.
The algorithm writes the symbol a into the cell x is referring to.
(e) x →d y  A join node n specifies a direction d = dir(n) and also a (secondary)
variable y = var′(n). This gives two functions,
    dir : {join nodes} → D ,
    var′ : {join nodes} → V ∪ {□}.
The algorithm connects the cell that is the value of variable x to the cell
that is the value of variable y, in direction d.
(f ) x new A create-cell node. The algorithm creates a new empty cell and sets the
value of variable x to be it.
The next two nodes move the variable around.
(g) (x →d)  A (weak) move node n specifies a direction, d = weakmove(n).
This gives a function weakmove : {weak-move nodes} → D .
The algorithm moves the variable x in direction d, if possible: if there
is no cell there, it doesn't move.

(h) x :≡ y  A variable-set node n specifies a secondary variable, y = var′(n).
This gives a function var′ : {variable-set nodes} → V ∪ {□}.
The algorithm sets the value of variable x to be equal to the value of
variable y (that is, variable x now refers to the same cell as variable y).

(i) Fin  A fin node. In a fin node, next(n) = n.
This is where the algorithm goes when it is finished. It doesn't do anything,
it just stays there.
Exactly one node is defined as the initial node, which we will also call the “start”.
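For readers who find code easier than formal clauses, here is one possible rendering of these definitions in Python (a sketch under my own naming conventions; the blank □ becomes None):

    from dataclasses import dataclass, field

    @dataclass
    class Cell:
        sym: str | None = None                   # contents: a symbol of A, or blank
        con: dict = field(default_factory=dict)  # direction -> Cell; absent = blank

    @dataclass
    class Node:
        kind: str               # one of: 'sym?', 'var?', 'exists?', 'write',
                                #         'join', 'new', 'move', 'set', 'fin'
        var: str = ""           # the primary variable x = var(n)
        nxt: int = 0            # next(n); for branch nodes this is nextF(n)
        nxt_true: int = 0       # nextT(n), used only by the three branch kinds
        sym: str | None = None  # the symbol a, for 'sym?' and 'write' nodes
        var2: str = ""          # the secondary variable, for 'var?', 'join', 'set'
        dir: str = ""           # the direction d, for 'exists?', 'join', 'move'

    # A program is a list of nodes; node 0 plays the role of the start node,
    # and a fin node has nxt equal to its own index (next(n) = n).
    Program = list[Node]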

A.3 Running
Running an algorithm results in a sequence of steps

⟨C0 , n0 , ref0 ⟩ , ⟨C1 , n1 , ref1 ⟩ , ⟨C2 , n2 , ref2 ⟩ , . . .

The ith step ⟨Ci , ni , refi ⟩ represents the current configuration Ci = ⟨Ci , symi , coni ⟩, the
current node ni in the program and the current ref-function refi at “time” i. The ref
function defines the values of all the program's variables at that time, and so is a function
refi : V → Ci ∪ {□}.

Initialising an algorithm then consists of


(1) Selecting a particular configuration C0 = ⟨C0 , sym0 , con0 ⟩ to be the initial configuration. This is the input data.
(2) Specifying which cell of C0 each of the variables of the program is referring to (possibly
nothing). So this is defining an initial ref-function ref0 : V → C0 ∪ {□}.
Running an algorithm, once it has been initialised, results in the sequence of steps displayed
above. Here I use the obvious notation

Stepi = ⟨Ci , ni , refi ⟩   where   Ci = ⟨Ci , symi , coni ⟩

The initial step Step0 = ⟨C0 , n0 , ref0 ⟩ is defined as described above. Then all subsequent steps are defined as follows. Given Stepi = ⟨Ci , ni , refi ⟩, the next step Stepi+1 =
⟨Ci+1 , ni+1 , refi+1 ⟩ is defined as follows:
(a) If ni is a symbol-compare node, then
that node specifies a variable, e = var(ni), and a symbol (or □), a = sym(ni).
The variable defines a cell (or □), c = refi(e), which in turn defines another
symbol, b = sym(c).
The node is the only thing that changes, so the next step is given by:
    Ci+1 = Ci ,    ni+1 = nextT(ni) if a = b, nextF(ni) otherwise ,    refi+1 = refi .

(b) If ni is a variable-compare node, then
that node specifies primary and secondary variables, e = var(ni) and e′ = var′(ni).
The node is the only thing that changes, so the next step is given by:
    Ci+1 = Ci ,    ni+1 = nextT(ni) if refi(e) = refi(e′), nextF(ni) otherwise ,    refi+1 = refi .

(c) If ni is a cell-exists node, then
that node specifies a variable, e = var(ni), and a direction, d = dir(ni).
The variable defines a cell (or □), c = refi(e), which in turn defines a second
cell, c′ = coni(c, d).
The node is the only thing that changes, so the next step is given by:
    Ci+1 = Ci ,    ni+1 = nextT(ni) if c′ ≠ □, nextF(ni) if c′ = □ ,    refi+1 = refi .

(d) If ni is a write node, then
it has a variable, e = var(ni), and a symbol (or blank), a = write(ni), to write.
The variable specifies the cell it is currently referring to, c = refi(e).
The contents of the cell c is the only thing that changes, so the next step is given by:
    ni+1 = next(ni) ,    refi+1 = refi ,
and Ci+1 is the configuration given by
    Ci+1 = Ci ,    symi+1(c) = a ,  symi+1(u) = symi(u) for all u ≠ c ,    coni+1 = coni .

(e) If ni is a join node, then
it has primary and secondary variables, e = var(ni) and e′ = var′(ni), and a
direction d = dir(ni). The variables define cells c = refi(e) and c′ = refi(e′).
Cell c is connected in direction d to cell c′, so the next step is:
    ni+1 = next(ni) ,    refi+1 = refi ,
and Ci+1 is the configuration given by
    Ci+1 = Ci ,    symi+1 = symi ,    coni+1(c, d) = c′ ,  coni+1(u, v) = coni(u, v) for all (u, v) ≠ (c, d) .

(f) If ni is a create-cell node, then
it has a variable, e = var(ni). A new cell c′ is added to the configuration, and e is
set to refer to it.
Ci+1 is the configuration given by
    Ci+1 = Ci ∪ {c′} ,
    symi+1(c′) = □ ,  symi+1(u) = symi(u) for all u ≠ c′ ,
    coni+1(c′, d) = □ for all directions d ,  coni+1(u, v) = coni(u, v) for all (u, v) such that u ≠ c′ ,
and then
    ni+1 = next(ni) ,    refi+1(e) = c′ ,  refi+1(x) = refi(x) for all variables x ≠ e .

(g) If ni is a weak move node, then
it has a variable, e = var(ni), which defines a cell c = refi(e), and a direction
to move it in, d = weakmove(ni). The next cell is the one in direction d from the
current one, c′ = coni(c, d), provided that is not □. The cell that e is referring
to is the only thing that changes, so the next step is given by:
    Ci+1 = Ci ,    ni+1 = next(ni) ,
    refi+1(e) = c′ if c′ ≠ □ ,  refi+1(e) = c if c′ = □ ,  refi+1(u) = refi(u) for all u ≠ e .

(h) If ni is a variable-set node, then
it has primary and secondary variables, e = var(ni) and e′ = var′(ni).
Variable e is changed to refer to whatever cell e′ is currently referring to, so
the next step is:
    Ci+1 = Ci ,    ni+1 = next(ni) ,
    refi+1(e) = refi(e′) ,  refi+1(u) = refi(u) for all u ≠ e .

(i) If ni is a fin node, then nothing changes: Stepi+1 = Stepi .


That is it: we’ve defined an algorithm in (we hope) complete generality. The definition
looks complicated, but that is because we have analysed the ideas down to such fine detail,
and also because the definition in the last couple of subsections has been given with rather
obsessive attention to these details.
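Continuing the Python sketch begun after the definition in A.2, the whole of A.3 collapses into one small step function (again my own illustrative rendering, with □ as None; Cell and Node are the classes of the earlier sketch):

    def step(prog, node_id, ref):
        # One step of the run, following A.3. Returns the next node id;
        # mutates the cells and ref (a dict: variable name -> Cell or None).
        n = prog[node_id]
        c = ref.get(n.var)                   # the cell the variable refers to
        if n.kind == "sym?":                 # (a) symbol-compare
            b = c.sym if c is not None else None
            return n.nxt_true if b == n.sym else n.nxt
        if n.kind == "var?":                 # (b) variable-compare: same cell?
            return n.nxt_true if c is ref.get(n.var2) else n.nxt
        if n.kind == "exists?":              # (c) cell-exists
            ok = c is not None and n.dir in c.con
            return n.nxt_true if ok else n.nxt
        if n.kind == "write":                # (d) write (assumes c is a cell)
            c.sym = n.sym
        elif n.kind == "join":               # (e) join c to c' in direction dir
            c.con[n.dir] = ref.get(n.var2)
        elif n.kind == "new":                # (f) create-cell
            ref[n.var] = Cell()
        elif n.kind == "move":               # (g) weak move: only if a cell is there
            if c is not None and n.dir in c.con:
                ref[n.var] = c.con[n.dir]
        elif n.kind == "set":                # (h) variable-set
            ref[n.var] = ref.get(n.var2)
        # (i) fin: nothing changes, and nxt points back at the node itself
        return n.nxt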

B Some useful facilities


Now, as has been done several times in this book, we will add some facilities to this bare-
bones definition which will make it much easier to use. Note that, as usual, these facilities
do not involve changing the basic definition at all; they are simply introduced as convenient
abbreviations for algorithmic structures that can be built using the definitions above. These
facilities may then be referred to by a single word or symbol, because we will already know
how to expand them out to be an algorithm in the form defined above. We’ll start with a
couple of easy ones.

B.1 Better move operations


A strong move operation makes the move whether the target cell exists or not. If it doesn’t
exist, a blank cell is created, connected up and then moved to. This operation requires a
second variable, which I will here call x̂, that refers to the same cell as x all the time, except
that during the move it lags behind a little and then catches up again (I think I’ll call it
a doppelgänger). The operation is so much more useful than the weak move, that I will
simply call it a move:

[Node diagram: the strong move x →d is built from basic operations using the doppelgänger x̂: a cell-exists test x →d ?; on its F branch a create-cell node x new followed by a join x̂ →d x, so that the missing target cell is created and connected; and a weak move x̂ →d to let the doppelgänger catch up.]

. . . and reversible directions


For many purposes it is useful to have directions which are the opposite of each other (like
left and right or up and down). This is used so often that it is useful to have it happen
automatically (whenever we want it) without having to write out the small steps needed to
make it happen every time.
The changes needed to make this happen are small: basically, whenever a connection between
two cells is made, the reverse connection has to be made also. So, if directions d and d∗ are
to be reverses, then instead of the single join

    x →d y

we perform the pair of joins

    x →d y   and   y →d∗ x .

The move operation must be upgraded too by making the same change to the “better move
operation” above.
So, if we want this to happen in an algorithm we are about to specify, all we need do is say
at the outset something like “in this algorithm the directions L and R will be reverses, as
will U and D”, and then assume all the join operations are upgraded as above.
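In code the upgrade is a single extra line: whenever a pointer is written, write the reverse pointer as well. A minimal Python sketch of the idea (the reverse table and names are mine):

    class Cell:
        def __init__(self):
            self.con = {}          # direction -> Cell

    reverse = {"L": "R", "R": "L", "U": "D", "D": "U"}

    def join(x, d, y):
        # The upgraded join: the connection x -> y in direction d is made,
        # and the reverse connection y -> x is made automatically.
        x.con[d] = y
        y.con[reverse[d]] = x

    a, b = Cell(), Cell()
    join(a, "R", b)
    assert b.con["L"] is a         # the reverse pointer came for free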

Why are we using the strange notations ≡ and ≈ , when a plain equals sign should serve for
one of them? That is because we are saving the = symbol for the more important equality
of numbers, as represented by their codes.

B.2 Multi-way decisions


Then we had the more general decision making operation at the beginning of our addition
examples, where there were several branches according to various symbols. In that case the
symbols were 0, 1 (and □). So

[Node diagram: the three-way decision x? (with exits □, 0 and 1) is an abbreviation for a chain of symbol-compare nodes: first x ≈ 0?, whose T exit is the 0 branch; its F exit leads to x ≈ 1?, whose T exit is the 1 branch and whose F exit is the □ branch.]

The same trick can be used to make an x ≈ y? operation, which checks whether the contents
of the cells that variables x and y refer to are the same, and an x :≈ y operation, which
sets the contents of the cell referred to by x equal to the contents of the one referred to by
y.

Here a comment about the notation might be helpful.


The symbol ≡ is to do with what cell the variables are referring to, whereas the symbol ≈ has
to do with the contents of those cells; and the same goes for :≡ and :≈ .
The symbols ≈ and ≡ (without the colon) are to do with whether two things are equal or
not (true or false), whereas the symbols :≈ and :≡ are to do with making a change, changing
the thing on the left to become equal to the thing on the right.

B.3 Sub-algorithms (subroutines)


Suppose we are writing an algorithm to compute something or other using rational numbers.
We would have a lot of choice about the notation we might use for rationals, but let us say

for sake of argument that we represent integers in binary notation, requiring 0 and 1 symbols
in our alphabet together with an extra symbol to denote negatives. Then we could represent
rationals as pairs of integers — with some convenient algorithmic structure to denote pairs.

Now, representing a rational number by a pair of integers (numerator and denominator) is


not unique unless reduced to lowest form. It doesn’t matter much what we are doing with our
rational numbers, we are going to need to do this reduction many times and in many places.
This will require computing the highest common factor of the two integers, then dividing
them by that. A highest common factor sub-algorithm will involve sub-sub-algorithms for
integer division, subtraction and probably comparison. And so on.

Supposing that we use the algorithm for binary addition given as a diagram above (the
second, better one, with three variables); we certainly don’t want to write it out in full the
dozens of times it turns up in our main algorithm. It is pretty obvious what to do: having
designed the sub-algorithm, we can simply represent it by a box with only the things the
rest of the algorithm needs to know shown on the outside — that is, the entry and exit
points and the three variables. Perhaps something like this

    bAd(x, y, s)

Here I have given the sub-algorithm the name “bAd” (for “binary addition”).

Things get a bit better if this box can be shared. Let’s suppose our algorithm requires this
addition sub-algorithm in three places, like this

something          something          something

bAd(x, y, s)       bAd(x, y, s)       bAd(x, y, s)

something          something          something

(Only the relevant bits of the main algorithm are shown.)

Rather than have umpteen copies of the same sub-algorithm, a single one can be shared.
Wherever it is needed, the algorithm just jumps to the single copy. All we need to do is
arrange that, when that sub-algorithm is finished, it jumps back to the correct place. This
we do by supplying the sub-algorithm with one more variable (here I’ll call it r, for “return”).
This is set to refer to a new cell, unconnected to any other part of the configuration. Then
we add new symbols to the alphabet, to stand for the places we want the algorithm to return
to when finished. Write the corresponding symbol to the new cell before jumping to the
sub-algorithm and, when it is finished, refer to that cell to decide where to jump back to.
(For this diagram there are only three such points, and I will call the corresponding symbols
p1 , p2 and p3 .)
Since the sub-algorithm is being shared, its variables x, y and s must be set to refer to the
appropriate cells before it is used. This can be done by variable-set operations. So the whole
thing becomes . . .

something something something

x :≡ a1 x :≡ a2 x :≡ a3
y :≡ b1 y :≡ b2 y :≡ b3
s :≡ c1 s :≡ c2 s :≡ c3

r :≈ p1 r :≈ p2 r :≈ p3

bAd(x, y, s)

r?
p1 p2 p3

something something something

This doesn’t seem to be getting much simpler, but in fact it is, in two ways. Firstly, from the
point of view of the actual complexity of the algorithm. Remember that the sub-algorithm
box conceals quite a complicated construction, and this now only has to occur once. However
we are not interested in that kind of simplicity or efficiency here, we are only interested in
whether computations can be performed by an algorithm at all.
More importantly for us, it will allow a useful conceptual simplification, because everything
that is in colour in the above diagram is quite standard, repeated in an obvious way whenever
any sub-algorithm is shared in this way. We can simply rewrite this diagram as follows,
because all the hidden details are known.

something something something

call bAd(a1 , b1 , c1 ) call bAd(a2 , b2 , c2 ) call bAd(a3 , b3 , c3 )

something something something

Here the “call” box encapsulates setting up the coloured stuff in the previous diagram and
setting the variables of the bAd sub-algorithm to refer to a1 , b1 and c1 (or whatever).

If you’ve done any computer programming, you will recognise what is going on here. We
have just developed the idea of a subroutine — well, the corresponding thing in terms of
the kind of diagrams we have been using — but it corresponds pretty accurately. So I think
I will call them subroutines from now on.
We are now basically up to the point that computer science had reached by about the early
1960s. We have “plain vanilla” subroutines (= sub-algorithms), but our way of implementing
them will not yet allow recursive calls. And these will be important for us.

For our main interests (recursive functions) it will in fact be necessary for us to be able
to create recursive subroutines, that is, subroutines which can call themselves, or more
generally, can call another routine which can call another routine, . . . , which can call the
original one.

Subroutine S1 calls Subroutine S2 calls . . . calls Subroutine Sn calls Subroutine S1 .

Our implementation of subroutines above is not sophisticated enough to deal with this
(basically because the variables of the subroutine will get confused about what to refer to).
But it doesn’t take a great deal to make such recursive subroutines work.

Recursive subroutines
Let’s start by assuming we have got recursiveness to work and we have an algorithm which
is in the middle of its calculation; currently we might have a situation like this:

Main algorithm
called Subroutine A
called Subroutine B
called Subroutine A
called Subroutine C
called Subroutine A

Here we have three invocations of the one subroutine, A, so this one at least has to be
recursive. When the bottom invocation of Subroutine A is finished, and then the invocation
of C, the algorithm needs to know somehow how to continue the middle invocation of
Subroutine A from where it left off. That means that something must remember somehow
enough information for this to happen. (And again when the algorithm eventually gets back
to the top invocation of A.) We cannot tell in general how deep these piles of invocations
might go, and that means that we cannot know in advance how much of this kind of recovery
information needs to be kept. That in turn means that this information cannot be kept in
the program itself. And that means that it must be kept in the configuration somehow.
So what information is necessary to define where a routine was “up to” at the moment we
left it to go to another subroutine? This is pretty simple: for a start, we can arrange to
jump back to the place in the program where we left it by the same method as was used for
a plain subroutine above. We assign a new symbol (in the alphabet) to each place it might
want to go back to and, at the end of the subroutine, use a decision cell (as above) to go
there.
The only other thing necessary is a list of whatever the routine’s variables were referring to
just before the subroutine was called. The required information will look like this:

[Diagram: a frame of four cells in a row; the first contains a return symbol p, and each of the other three has a pointer down into the configuration.]

This is the structure that would be used if the calling routine has three variables. The
first cell contains the symbol that defines the return “address” in the program. The other
three cells have pointers that point to what the variables were referring to.
The trick that makes the subroutines work recursively is that we keep a copy of this list at
every new invocation. So, for the cascade of invocations in the example above, we would
have a configuration like this:

Main algorithm                    p1
  called Subroutine A             p2
    called Subroutine B           p3
      called Subroutine A         p4
        called Subroutine C       p5  ← σ
          called Subroutine A

[Diagram: each invocation has pushed one frame p1 , . . . , p5 onto the stack; the stack variable σ refers to the bottom-leftmost cell.]

This extra bit of configuration we are using to keep track of where subroutines were up to
is called the stack. Each of its rows is called a frame. Note that the stack is furnished with
its own variable which, for the present discussion, I will call σ. Its “normal” position (when
not in the process of calling or returning) is, as shown, looking at the bottom-leftmost cell
of the stack.

To reiterate: for each subroutine, there are a finite number of places in the program it may
be called from, and hence a finite number of places it might want to jump back to when finished. We
give each such “return address” a different symbol from the alphabet, and use a decision
operation at the end of the subroutine to jump back to the proper place.
All this can be accomplished with the simple operations originally defined. Our new im-
proved call box will contain the following. (Here I assume that the routine A that is doing
the calling has three variables, x, y and z, and the routine, B say, that is being called has
two, u and v. We will want to set those two variables to look at something before setting B
going, and the most likely way to do this is to set them to refer to things that a couple of
variables of A, a and b say, are already referring to.) I will just write the operations out in
longhand, rather than draw a diagram.

Before the jump to subroutine B:



σ move D (Start a new frame by moving down; this creates a new bottom-left cell.)
σ write p6 (Whatever the return symbol is.)
σ move R (The next few steps fill in the new frame.)
σ join →D x
σ move R
σ join →D y
σ move R
σ join →D z
σ move LLL (Back to the start of the frame.)
u look at a (Set subroutine B's variables to refer to whatever a and b are referring to.)
v look at b

After the subroutine has finished, it uses the symbol p6 to jump back to here.

σ move RD
x look at σ
σ move URD
y look at σ
σ move URD
z look at σ
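The essence of this machinery, stripped of cells and pointers, is just a stack of (return symbol, saved variables) pairs. A small Python sketch (my own illustration) may make the bookkeeping clearer:

    def call(stack, return_symbol, caller_vars):
        # Push a frame: the return symbol plus a record of what the
        # caller's variables were referring to, as in the diagram above.
        stack.append((return_symbol, dict(caller_vars)))

    def ret(stack, caller_vars):
        # Pop the top frame, restore the caller's variables and report
        # which return symbol to use for the jump back.
        return_symbol, saved = stack.pop()
        caller_vars.clear()
        caller_vars.update(saved)
        return return_symbol

    # Nested invocations simply pile frames on the stack, so the same
    # subroutine may safely be entered again before it has finished.
    stack, vars_A = [], {"x": "cell1", "y": "cell2", "z": "cell3"}
    call(stack, "p6", vars_A)      # subroutine B is entered here ...
    vars_A["x"] = "clobbered"      # ... and is free to disturb the variables
    assert ret(stack, vars_A) == "p6" and vars_A["x"] == "cell1"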

C A more convenient notation

C.1 Discussion

We will clean up the notation used above and make it easy to use. Firstly, we write each of
the basic operations (nodes) in a form like move(x,d) or write(x,a) and so on.
Then, noting that most of these basic operations go on to a unique next operation, we can
simply write these operations one after another, or one under another. For example
move(x,d); write(x,a); move(x,u); write(x,b);

or
move(x,d);
write(x,a);
move(x,u);
write(x,b);

(we separate individual operations (which we will now call “statements”) by semicolons for
safety’s sake.)
Of course our “programs” are not usually going to be written out in a long simple string like
this; we will need loops and jumps. We can allow connections to be made in any
way we like by allowing a label on a statement and a jump to such a labelled statement.
Here is an example of a loop (we have labelled the first statement “fred”):
fred: move(x,d);
write(x,a);
move(x,u);
write(x,b);
jump fred;
In fact, using jump statements like this is considered very bad practice. We will have much
better ways of connecting our statements up, and we won’t in fact be using jumps at all.
But it is nice to know that they are there, so that any connection we like can be made
directly.

Another useful feature is to be able to collect a group of statements together into a larger
“compound” statement. We use curly braces for this:
{move(x,d); write(x,a); move(x,u); write(x,b)}

This can be done hierarchically, with compound statements being collected together into
larger ones. We will see shortly why this is useful. From now on, when I write “statement”
that will include these compound ones.

So how do we deal with the three kinds of branch operation, for instance x ≈ a? ?

This is dealt with by a statement of the form



if (x ≈ a) T-statement;
else F-statement;
following statements
The T-statement represents the nextT connection. The algorithm proceeds to this if the x
≈ a? test is true. Having done that it then proceeds to the following statements (unless
the T-statement jumps elsewhere).
In the same way, the F-statement represents the nextF connection. The algorithm proceeds
to this if the x ≈ a? test is false. Having done that it then proceeds to the following
statements (unless the F-statement jumps elsewhere).

The whole thing can be represented diagrammatically by

x ≈ a?
F T

F-statement T-statement

following-statements

The T-statement can be a simple jump statement, but more often it is a compound state-
ment, made up of several (perhaps many) simpler statements.
The else portion can optionally be omitted entirely, thus
if (x ≈ a) T-statement;
following statements
and this can be represented diagrammatically by

x ≈ a?
F T

T-statement

following-statements

The other two branch operations are dealt with in exactly the same way,
by an if (x ≡ y) or an if (x →d) statement.

Another kind of statement which is very useful is the while statement, which is used for
building loops. It looks like this:
while (x ≈ a) loop-statement;
following statements
This results in the loop-statement being executed over and over again, so long as the con-
dition x ≈ a holds true. The loop-statement is nearly always a compound, made up of a
number of simpler statements. Also, one of these statements had better change x, or the
contents of the cell it refers to, or else the loop will never end.
As soon as the condition x ≈ a goes false, the loop ends and the algorithm proceeds to the
following-statements.
This can be represented diagrammatically by:

x ≈ a? T loop-statement
F

following-statements

And, as with the if statement, we can use the other two kinds of conditions:
while (x ≡ y) or while (x →d)
The last basic construction we need to consider here concerns how to deal with sub-
routines. But we have already dealt with this, and all we need to do is specify how to write
it in our new programming language.

Suppose we have a subroutine that we have decided to call Fred. It will probably have some
variables that must be set by the routine which calls it, before it is set going, its arguments.
It will also possibly have some that it uses for its own private purposes, its local variables.
The whole thing then will look like this:

Fred(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) body

Here x1 , x2 , . . . , xm are its arguments, y1 , y2 , . . . , yn are its local variables and body is a
statement, usually a compound of several statements, which does the work of the subroutine.
So it will usually look more like this:

Fred(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) { S1 ; S2 ; . . . ; Sk }

where the Si are the simpler statements which make up the body.
When another routine calls Fred, it does so thus:

call Fred(a1 , a2 , . . . , am );

The whole arrangement, described above, of calling a subroutine (recursively) is handled


automatically by these two kinds of statement. We do not have to worry, when writing an
algorithm this way, about the messy details of maintaining the stack, return addresses and
so on.
One last thing about subroutines. A subroutine normally returns when it reaches its last
(bottommost) statement, but we often would like it to return from other places. We can
make this happen by including a return statement, which looks like this:

return

If you have done any computer programming, you will see what is happening here. I am
describing a language to write our algorithms in which is modelled fairly closely on the
standard computer language C. This is an elegant and battle-hardened language for writing
the sort of simple algorithms we are interested in here.
Simple? For a start, we are not interested in efficiency — either in time or in space — only
in whether an algorithm works at all. Thus we will write some very wasteful algorithms, for
the good reason that they are simple to write and to understand.
For the same reason, there are many aspects of “real” computer programming that (thank-
fully) will not bother us here, including such things as input/output, error reporting, garbage
collecting, multi-threading and re-entrant subroutines and so on and on.

Now it is time to give a more exact description of the language.

C.2 The language (notation)


A program is defined for a particular workspace, that is, in terms of a given alphabet and a
given set of directions. The program can refer to these in various ways — it has a symbol
for each letter of the alphabet and one for each direction (and we assume that these are all
different so no confusion can arise).
The variables and direction pointers of cells are all pointers to cells; in what follows it is
essential to not confuse these pointers to cells with the contents of those cells. I will as far
as possible use letters near the end of the alphabet for variables, . . . , x, y, z. The contents of
cells are letters (or □); as far as possible I will use letters near the beginning of the alphabet
for these a, b, c, . . . .. The language also contains symbols for directions (left, right and so
on); I will use uppercase letters for these, L, R, . . . .
A program has the following form: one or more routines of which the first one is designated
the main one.
routine; routine; . . . routine;

There must be at least one routine. The non-main routines are usually called subroutines.
A routine has the following form

name(x1 , x2 , . . . , xm )(u1 , u2 , . . . , un ) {S1 ;S2 ;. . . ;Sk }



where x1 , x2 , . . . , xm ; u1 , u2 , . . . , un are the variable symbols, of which x1 , x2 , . . . , xm are the
arguments and u1 , u2 , . . . , un the local variable symbols. Either or both of these may be
empty, that is, m ≥ 0 and n ≥ 0. The variable symbols are all distinct. The S1 , S2 , . . . , Sk
are statements, of which the last one must be a return statement (so there is at least one
statement in the list). There may be return statements elsewhere in the routine also. The
names of the routines which make up a program must all be different.

A statement is of one of the following forms. First I list the ones which correspond exactly
to the operations used in the diagrammatic form discussed above; on the left is the diagram
form, on the right the statement form.

(a) x ≈ a?      if (x ≈ a) T-statement
                else F-statement
    The else-part is optional.

(b) x ≡ y?      if (x ≡ y) T-statement
                else F-statement
    The else-part is optional.

(c) x →d ?      if (move?(x,d)) T-statement
                else F-statement
    The else-part is optional. This form of statement is seldom used.

(d) x :≈ a      x :≈ a

(e) x →d y      join(x,d,y)

(f) x new       x new

(g) (x →d)      weakmove(x,d)
    We won't be using this one.

(h) x :≡ y      x :≡ y

(i) Fin         Fin
    We won't be using this one either. Instead we use a return statement
    in the main routine.

Next there are a few convenience operations, which we have constructed from a few of the
basic operations above, and defined diagram symbols for.

x →d            move(x,d)
    This is the strong move; we will be using this one.

x ≈ y?          if (x ≈ y) T-statement
                else F-statement
    The else-part is optional.
And now for a few other facilities that we will assume built in to our language.

Jump jump label


Don’t use this.

Routine name(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) body

Call call name(a1 , a2 , . . . , am )

Return return

While    while (condition) S

Here condition can be any of x ≈ a, x ≡ y or move?(x,d).
The behaviour of all these operations has been described above.

C.3 Comments
(i) We can of course represent natural numbers by strings in various ways, for instance
we could use the usual binary code, involving the alphabet { 0 , 1 }. Or we could use decimal
code, involving a slightly larger alphabet.
For example, the number 3718 would be represented by the configuration

x
▽
3 —R→ 7 —R→ 1 —R→ 8

(The small triangle indicates that the variable x refers to the cell containing digit 3.)
We will see that, subject to a couple of very simple and reasonable restrictions on the code,
it does not matter what code we use for natural numbers. We will also see that a function
N^m → N can be computed by a program in this language if and only if it is partial recursive.
This will take some doing; for the time being, until this is proved, let us call these functions
algorithmic.
(ii) Clearly this appendix defines “algorithmic” functions more widely: on things that
can be represented by a workspace structure of any kind. For example, we could expect to
write programs that would operate in various ways on the strings used in one of our formal
theories. We can write simple programs to check, for instance, whether a string is a valid
expression, or to decide whether a sequence of strings is a valid proof.

Functions
You will have noticed that the routines which make up a program do not return values, they
simply make changes to the current configuration. However a “function”, that is, a routine
which does return a value, is a very useful thing. We can implement functions without
making any additions to our basic language.
A function is written in much the same way as a routine: it has the form
name(x1 , x2 , . . . , xn )(u1 , u2 , . . . , ul ) {S1 ;S2 ;. . . ;Sk } (–1)
As with normal routines, x1 , x2 , . . . , xn ; u1 , u2 , . . . , ul are its variables, of which x1 , x2 , . . . , xn
are its arguments and u1 , u2 , . . . , ul the local variables. Either or both of these may be empty,

that is, n ≥ 0 and l ≥ 0. The variables are all distinct. The S1 , S2 , . . . , Sk are statements.
The last statement must be a return statement, so there is at least one statement.
The only difference from the form of a normal routine is that any return statement must
have the form

• return r
where r is one of the variables of the function.
The function can be used in an assignment statement in the following way:

z :≈ F (y1 , y2 , . . . , ym ) (–2)

(where F is the name of the function). This statement executes F just like a normal routine,
with the one exception that, when it encounters a return r statement, it sets z:≡ r before
exiting.
We don’t need to extend the basic definition of a program as given above to include this
facility. It can be implemented as a routine as follows: replace the function definition (–1)
above by
F(v,x1 , x2 , . . . , xn )(u1 , u2 , . . . , ul ){S1 ,S2 ,...Sk }

(note the extra argument v, for “value”), in which all the statements are the same except
that any return statement
    return r;
is replaced by
    v :≈ r;
    return;

Then the assignment statement (–2) is replaced by

F (z, y1 , y2 , . . . , ym );
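The same transformation is easy to mimic in an ordinary programming language. In this Python sketch (illustrative only), the extra “value” argument is a one-slot box that the routine fills in place of returning:

    def f(x, y):                   # a value-returning function ...
        return x + y

    def f_routine(v, x, y):        # ... becomes a routine with an extra
        v[0] = x + y               # "value" argument: return r turns into
        return                     # v := r; return

    z = [None]                     # the assignment z :≈ F(2, 3) becomes:
    f_routine(z, 2, 3)
    assert z[0] == 5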

C.4 Truth values


We have three basic decision-making operations, x ≈ a , x ≡ y and move?(x,d), but
they are pretty primitive in their present forms. Most often we want to make a go/no-go
decision based on some more complicated condition. Something like

if (condition) {statements}

where condition is something we can test for which is either true or false. If the condition
is true then the statements are executed, otherwise they are not.
As a first step, it will be useful to have structures to stand for “true” and “false”, and this is
easily done using single-cell structures containing an appropriate letter:

  [T]          [F]

  true         false

It doesn’t matter much what symbols we use for these, so long as they are different. We
can add two new symbols to the alphabet or we can re-use existing ones (for the way we use
them, no confusion will arise).
We will represent these simple structures by the words true and false.

C.5 Conditions
Here I describe ways of dealing with conditions which are more complicated than the three
basic ones mentioned above. The idea of a condition here is that it is something that can
be true or false, so it will be something that we can compute a value either true or false
for.

For a start, we can compute T or F values for the three basic conditions. Consider
p :≈ (x ≈ y) .
This asks whether the contents of the cells looked at by x and y are the same; if so, it sets
the contents of p to be T, otherwise F. This is easily implemented:
if (x≈ y) p:≈ T;
else p:≈ F;

Boolean operations

We can combine conditions together using boolean operations to form more complicated
ones, as follows; here, if p and q are conditions, then z will be one too.
• z :≈ ¬p
• z :≈ p ∧ q
• z :≈ p ∨ q

These have the obvious meanings and are implemented by functions thus:
not(p)(r) {
if(p) {r :≈ false}
else {r :≈ true}
return r;
}

and(p,q)(r) {
if(p) {
if(q) {r :≈ true;}
else {r :≈ false;}
}
else {r :≈ false}
return r;
}

or(p,q)(r) {
if(p) {r :≈ true;}
else {
if(q) {r :≈ true;}
else {r :≈ false;}
}
return r;
}
We can then write the function calls thus:
z :≈ not(p)
z :≈ and(p,q)
z :≈ or(p,q)
but for these particular function calls it will be convenient to allow the notation
z :≈ ¬p
z :≈ p ∧ q
z :≈ p ∨ q

C.6 Expressions
As soon as we have assignment operations such as the ones discussed in the last few sections,
we can combine them into more complicated expressions. This usually requires the use of
one or more extra unused local variables. For example, we can implement

if((x 6≈ □) ∧ (y ≈ a)) {statements}

as
p :≈ (x ≈ □);
q :≈ ¬p;
r :≈ (y ≈ a);
s :≈ q ∧ r;
if(s) {statements};
The point of this is that we can now write things like

if((x 6≈ □) ∧ (y ≈ a)) {statements}

knowing that it is just a shorthand for the group of simpler statements above. We can also
write
p :≈ (x 6≈ □) ∧ (y ≈ a)
and so on.

We can also substitute functions into other functions and routines. For instance, given
functions
F(x,y)
G(x)
H(x,y,z)
we can write

z :≈ F(G(a),H(a,b,c))
knowing that it is just shorthand for something of the form
u :≈ G(a);
v :≈ H(a,b,c);
z :≈ F(u,v);

D An important workspace structure


In this section we deal with a particularly simple type of workspace: there are only two
allowed directions, R (for “right”) and D (for “down”), so we will call them RD-workspaces.
Note that we place no restriction on the alphabet.
We will see that quite a lot can be done with RD-workspaces, in particular

• They can deal with sequences of symbols, that is, strings, in the sense that they can
represent strings and perform all the usual operations on them, adding and removing
entries, concatenating strings, comparing them and so on. This allows operations such
as checking whether a given string in one of our formal languages is an expression or
not, whether it is an axiom or not and so on.

• Moreover, they can deal with “nested” sequences: sequences of sequences of sequences
and so on to any depth. So, for instance, proofs can be checked.

• In the same way, algorithms involving trees can be implemented.

• Natural numbers can be represented by strings of digits, so they can be dealt with
also. All the basic functions and operations defining partial recursive functions (as
in Definition 9.A.3) can be computed. This leads to the conclusion that all partial
recursive functions are algorithmic, and in fact can be computed in this rather simple
workspace.

• Any configuration in any kind of workspace at all can be represented by one in the
RD-workspace with the same alphabet; moreover any algorithm that can be performed
using the more general workspace structures can be mimicked by one using the RD-
workspace with the same alphabet. This means that anything that can be computed
in any way can be computed using an RD-workspace.

D.1 Strings
We have already seen a neat way to deal with strings whose entries come from any given
alphabet using only R pointers. For example, the number 3718 would be represented by the
structure

x      y
▽      ▽
3 —R→ 7 —R→ 1 —R→ 8

(The small triangles indicate that there are variables x and y referring to the digits 3 and
7.)
We can represent it (on paper) more compactly thus:
x y
▽ ▽
[ 3 7 1 8 ] .

In a computer program, for efficiency, one would normally have L (“left”) pointers (which
are the reverse of R pointers) as well, but they are not really necessary, so to keep things
simple we will not use them. Note however that this means that we must always refer to
strings by their leftmost entries, for there is no way of finding entries to the left of the one
a variable references, unless there is already another variable pointing there. So here “the
string x” means “the string whose leftmost entry is pointed to by x”.
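A Python rendering of this representation may be helpful (a sketch, with names of my choosing): each entry is an object holding a symbol and an optional R pointer, and a string is passed around as a pointer to its leftmost cell.

    class Cell:
        def __init__(self, sym, r=None):
            self.sym = sym         # the symbol in this cell
            self.r = r             # the R pointer (None = nothing to the right)

    def make_string(symbols):
        # Build the cells from right to left; return the leftmost one.
        head = None
        for s in reversed(symbols):
            head = Cell(s, head)
        return head

    def rightmost(x):
        # Follow R pointers to the last entry, as in the routine below.
        while x.r is not None:
            x = x.r
        return x

    s = make_string("3718")
    assert rightmost(s).sym == "8"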

What are the things we might want to do with strings?


• z :≈ rightmost(x)
This sets z to point to the rightmost entry of the string x. The function is implemented as
rightmost(x)(r) {
r :≡ x;
while(move?(r,R)) move(r,R);
return r;
}
• z :≈ [ ]
Set z to point to a new empty string. This is the same as z new .

• appendL(x,y)
• appendR(x,y)
Append a new entry to the left or right end of the string x: a new cell containing
whatever symbol y's cell contains. Make sure that x still points to the left-hand end of the
string, as it should.
The appendL function is implemented as
appendL(x,y)(u) {
  u new; u :≈ y;
  if(x 6≡ □) join(u,R,x);
  x :≡ u;
  return;
}
and the appendR by
appendR(x,y)(u,v) {
  u new; u :≈ y;
  if(x ≡ □) x :≡ u;
  else {
    v :≈ rightmost(x);
    join(v,R,u);
  }
  return;
}
• z :≈ concat(x,y)

Concatenate the strings x and y and set z to point to the leftmost entry of the result.
Implemented as
concat(x,y)(r,u) {
  if(x ≡ □) {r :≡ y;}
  else {
    r :≡ x;
    u :≈ rightmost(x);
    join(u,R,y);
  }
  return r;
}
The statement x :≡ y sets x to point at the same cell that y does; it does not create anything
new. The statement x :≈ copy(y) creates a new cell identical to the one that y points at.
We also want a statement that will copy a whole string. This is

• z :≈ copyString(x)
and it is implemented as
copyString(x)(r,u) {
  r :≡ □;
  u :≡ x;
  while(u 6≡ □) {
    appendR(r,u);
    if(move?(u,R)) move(u,R);
    else u :≡ □;
  }
  return r;
}
Of course we want to check whether two strings are equal. The condition x ≈ y won't do,
because it only checks whether the individual cells that x and y point to have the same
contents; the rest of the two strings is not examined.

• z :≈ equalStrings(x,y)
is implemented as
equalStrings(x,y)(u,v) {
  u :≡ x;
  v :≡ y;
  while(u 6≡ □ ∧ v 6≡ □) {
    if(u 6≈ v) return false;
    if(move?(u,R)) move(u,R); else u :≡ □;
    if(move?(v,R)) move(v,R); else v :≡ □;
  }
  if(u ≡ □ ∧ v ≡ □) return true;
  return false;
}

D.2 Sequences of sequences (and trees)


One of our reasons for looking at sequences is that we will use them to code natural numbers.
But we will also want to code up sequences of natural numbers, and there are two obvious
ways to do this: one is to introduce two more letters to stand for parentheses and the
other is to introduce a new direction. Representing the sequence {{2, 7}, {5, 3}, {108}} as a
string using extra letters (the parentheses) gives [ ( 2 7 ) ( 5 3 ) ( 1 0 8 ) ]:

x
▽
( 2 7 ) ( 5 3 ) ( 1 0 8 )      (one string of symbols, joined by R pointers)

but much better is to use the D (for “down”) pointers,

[Diagram: a top row of three blank cells joined by R pointers, with x referring to the leftmost; each top cell has a D pointer down to the leftmost cell of one of the strings 2 7, 5 3 and 1 0 8.]

Here all the cells in the first row have null contents and those in the second row have null
down pointers. Clearly, using this idea we can nest sequences to any depth — or represent
tree diagrams.
We will need a function to copy an entire tree (or subtree):

• z :≈ copyTree(x)
This is implemented as follows (here u.D is shorthand for the cell in direction D from the one u refers to, and copy is the single-cell copy described in the previous subsection):
copyTree(x)(u,p,q,r,s) {
  u :≡ x;
  r :≡ □;
  s :≡ □;
  while(u 6≡ □) {
    p :≈ copy(u);
    q :≈ copyTree(u.D);
    join(p,D,q);
    if(s 6≡ □) join(s,R,p);
    else r :≡ p;
    s :≡ p;
    if(move?(u,R)) move(u,R);
    else u :≡ □;
  }
  return r;
}
This is the first example we have of a routine which calls itself recursively.

D.3 Natural numbers


We will now set out to represent natural numbers as workspace structures, and then show
that all the basic operations defining a partial recursive function (as in Definition A.3) can
be computed in this language. We will see that all that is required of our representation of
natural numbers is that

• the representations of different natural numbers are different;

• the representation of zero is known, that is, there is a nullary algorithm zero() which
will compute the representation of zero;

• the successor function can be computed in this representation, that is, there is a unary
algorithm suc(x) which will compute the successor of any number in this representa-
tion;

• there is an “equality” algorithm, equal(x,y) say, which will determine whether two
numbers are equal or not; and

• we should be able to represent sequences of natural numbers in some way so that


individual entries in the sequence can be recovered;

then all else follows.
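In programming terms, the conditions say that a number code is anything supplying zero, suc and equal, and that everything else can be built on top of them. The following Python sketch (mine, with a unary code standing in for "any code") shows addition defined from these three operations alone:

    def zero(): return []                    # the known representation of zero
    def suc(x): return x + [1]               # the successor operation
    def equal(x, y): return len(x) == len(y) # the equality test for this code

    def add(x, y):
        # Addition built from zero, suc and equal alone: step the result
        # up once for each step needed to count from zero up to y.
        r, c = x, zero()
        while not equal(c, y):
            r, c = suc(r), suc(c)
        return r

    three = suc(suc(suc(zero())))
    two = suc(suc(zero()))
    assert equal(add(three, two), suc(suc(three)))   # 3 + 2 = 5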


These are very easily met conditions. They are also obviously necessary to be able to do any
sort of calculations in that notation. So we will call any code that satisfies these requirements
“algorithmic”. Here we look at a couple of useful ones as examples.

D.4 Binary code


Normal binary notation can be represented as a string. For example eleven is represented
as the string [ 1 0 1 1 ]. One slight difference from normal notation will be used: zero will
be represented by the empty string [ ] instead of [ 0 ]. We can also represent sequences of
natural numbers using the “down pointer” method described above.
(There is a slight possibility of confusion with this notation. Here is how I am avoiding it:
The number 0 is represented by the string [ ],
The number 1 is represented by the string [ 1 ],
The number 2 is represented by the string [ 1 0 ],

and so on. The letters used inside the strings are written in a small slanted font to distinguish
them from actual numbers, which will be written normally. After the example routine for
the successor function below, we will not be looking inside the strings for natural numbers
again anyway.)
That different natural numbers have different representations in this coding is easily proved
and we have just defined the representation of zero.

The successor function for this coding is implemented as follows:



suc(x)(r,carry,dig) {
  r :≈ [ ];
  carry :≈ true;
  dig :≈ rightmost(x);
  while(dig ≉ ) {
    if(carry) {
      if(dig ≈ 0) {appendL(r,1); carry :≈ false;}
      else {appendL(r,0); carry :≈ true;}
    }
    else {
      if(dig ≈ 0) appendL(r,0);
      else appendL(r,1);
    }
    move(dig,L);
  }
  if(carry) appendL(r,1);
  return r;
}
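As a cross-check of the carry logic (note that the leftover carry is appended only after the loop has consumed every digit), here is the same algorithm over a Python list of bits; the list representation is my stand-in for the workspace string.

def suc_binary(bits):
    # successor of a binary numeral, most significant bit first;
    # zero is the empty list, as in the text
    result = []
    carry = True
    for digit in reversed(bits):          # scan right to left
        if carry:
            result.insert(0, 1 - digit)   # 0 becomes 1 (carry stops), 1 becomes 0 (carry goes on)
            carry = (digit == 1)
        else:
            result.insert(0, digit)
    if carry:                             # the carry ran off the left-hand end
        result.insert(0, 1)
    return result

assert suc_binary([]) == [1]                        # suc(0) = 1
assert suc_binary([1, 0, 1, 1]) == [1, 1, 0, 0]     # suc(eleven) = twelve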
Equality is given by equalStrings.

D.5 Unary code

This is possibly the simplest notation of all. We choose a symbol (say, 1 ) and represent
each natural number by a string of that many copies of the symbol. So, for example, five is
represented by the string [ 1 1 1 1 1 ] .
That different natural numbers have different representations in this coding is obvious. Zero
is represented by the empty string. The successor function is implemented by:
suc(x)(r) {
  r :≈ copyString(x);
  appendR(r,1);
  return r;
}
Equality is again given by equalStrings.

D.6 Lemma: Interchangeability of number codes

Given any two algorithmic codes for natural numbers (that is, ones which satisfy the
conditions listed in section D.3), there are algorithms to convert back and forth between
the two codes.
For, suppose we have two such codes — let us call them Code 1 and Code 2. Then, according
to the conditions assumed, there are algorithms zero1(), suc1(x) and equal1(x,y) for
Code 1 and corresponding ones zero2(), suc2(x) and equal2(x,y) for Code 2.

Then an algorithm for converting a number x in Code 1 to Code 2 is:



convert(x)(r,u) {
  u :≈ zero1();
  r :≈ zero2();
  while(¬equal1(u,x)) {
    u :≈ suc1(u);
    r :≈ suc2(r);
  }
  return r;
}
It follows that any function Nn → N which can be algorithmically computed in Code 1
can be computed in Code 2. For, given an algorithm which works for Code 1, an algorithm
for Code 2 can be created by simply converting all the arguments from Code 2 to Code
1 first, then applying the given Code 1 algorithm and finally converting the Code 1 result
back to Code 2.
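As a quick illustration (mine, with a unary list code and plain Python integers standing in for two hypothetical codes), the conversion really is just counting up through both codes in step:

# Code 1: unary, n is a list of n ones.  Code 2: ordinary Python ints.
zero1, suc1, equal1 = (lambda: []), (lambda x: x + [1]), (lambda x, y: x == y)
zero2, suc2 = (lambda: 0), (lambda x: x + 1)

def convert(x):
    # Code 1 -> Code 2, using only the five assumed primitives
    u, r = zero1(), zero2()
    while not equal1(u, x):
        u, r = suc1(u), suc2(r)
    return r

assert convert([1, 1, 1, 1, 1]) == 5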

Summary
So, from now on we will just assume that we have a workspace structure representing the
number zero, which I will call 0, and functions for computing the successor and deciding
equality, which I will call suc and equal. (As usual, I will abbreviate z := equal(x,y) to
the more usual z := (x=y).)
This is where we get to use the = symbol.
So we can use more or less any reasonable notation we like, binary, decimal, Roman, unary,
gaol-wall — choose your favourite.

D.7 Entries in a sequence


Now it makes sense to ask for an algorithm entry(i,x) which will return the ith entry of
the sequence x. (Here i points to the code for a natural number.) Here it is:
entry(i,x)(r,u,count) {
  u :≈ x;
  count :≈ 1;
  while(count ≉ i) {
    u :≈ u.R;
    count :≈ suc(count);
  }
  r :≈ u.D;
  return r;
}
This function returns a pointer to (variable that refers to) the entry, where it sits in the given
sequence. To get a separate copy of the entry, use the function copyEntry, which is exactly
the same as entry except that the last statement is changed to return copyTree(r).

For example, z :≈ copyEntry(2,x), applied to



[Figure: the two-row workspace for the sequence {{2, 7}, {5, 3}, {108}}, with x pointing at the first top-row cell]

will yield

[Figure: the same workspace, together with a separate copy of the second entry: a new top cell pointed at by z with the string 5 3 hanging below it]

Note that this function will work with sequences, or in fact trees of any depth, just as well.
Note also that we are indexing the entries by 1,2,3,. . . , not 0,1,2,. . . .
Finally, as remarked above, this will work no matter what code we use for natural numbers,
so long as it is algorithmic. We will not assume that we are using the particular decimal
code exhibited in the pictures above: that was done just for illustration.

D.8 Proposition
All partial recursive functions are algorithmic.
Moreover, given any code for numbers, provided the number code is algorithmic, any partial
recursive function can be computed by an algorithm using that code.

Proof. We work our way through the various parts of Definition A.3.
(i) We are assuming that an algorithm suc for the successor function is given.
(ii) We need to give an algorithm for the projection function πn,i : Nn → N. Note that,
for this proof, it would be sufficient to give a different algorithm for each n and i; however,
it is easy to give a single algorithm which works for all n and i. The function πn,i (x) is given
by the algorithm entry(i,x) above (the subscript n is irrelevant).
(iii) Addition, z :≈ x+y, is implemented as
add(x,y)(r,u) {
  r :≈ x;
  u :≈ 0;
  while(u ≉ y) {
    r :≈ suc(r);
    u :≈ suc(u);
  }
  return r;
}

(iv) Natural subtraction, z :≈ x ∸ y, is implemented as
natSubtract(x,y)(r) {
  r :≈ 0;
  while(x+r ≉ y ∧ y+r ≉ x) r :≈ suc(r);
  if(y+r ≈ x) return r;
  return 0;
}
(v) Multiplication, z :≈ x*y, is implemented as
multiply(x,y)(r,u) {
  r :≈ 0;
  u :≈ 0;
  while(u ≉ y) {
    r :≈ r+x;
    u :≈ suc(u);
  }
  return r;
}
For most notations these would be ridiculously inefficient algorithms. But that doesn't
matter here.
(vi) Substitution. Suppose that we have functions

f : Nm → N and g1 , g2 , . . . , gm : Nn → N

all algorithmic by programs whose main functions are (respectively)

F(x1 , x2 , . . . , xm ) , G1 (x1 , x2 , . . . , xn ), . . . , Gm (x1 , x2 , . . . , xn ) .

Then we can implement the function h = f (g1 , g2 , . . . , gm ), that is, h : Nn → N defined by

h(x1 , x2 , . . . , xn ) = f (g1 (x1 , x2 , . . . , xn ), g2 (x1 , x2 , . . . , xn ), . . . , gm (x1 , x2 , . . . , xn ))

as follows:
H(x1,x2,...,xn)(r,u1,u2,...,um) {
  u1 :≈ G1(x1,x2,...,xn);
  u2 :≈ G2(x1,x2,...,xn);
  ...
  um :≈ Gm(x1,x2,...,xn);
  return F(u1,u2,...,um);
}
In whatever program this occurs, it must be accompanied by all the subroutines in the
programs for F, G1 , G2 , . . . , Gm , so that the above assignment statements work.
(vii) Minimalisation. Suppose we have a function f : Nn → N defined by minimalisation
thus:

f (x1 , x2 , . . . , xn ) = min_u { g(u, x1 , x2 , . . . , xn ) = 0 },

where g is a function Nn+1 → N which is algorithmic by a program whose main routine is

G(u,x1 ,x2 ,...,xn ) .

Then we can implement the function f as follows:


F(x1,x2,...,xn)(u) {
  u :≈ 0;
  while(G(u,x1,x2,...,xn) ≠ 0) u :≈ suc(u);
  return u;
}
Again, this must be accompanied by the routine for G and any routines it uses. 
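The proof really uses nothing beyond zero, successor, equality and unbounded search. Here is a Python mirror of parts (iii) to (vii) (my sketch; plain ints with suc(x) = x + 1 stand in for an arbitrary algorithmic code):

def suc(x):
    return x + 1

def add(x, y):
    r, u = x, 0
    while u != y:                 # count u up to y, stepping r alongside
        r, u = suc(r), suc(u)
    return r

def nat_subtract(x, y):
    r = 0
    while add(x, r) != y and add(y, r) != x:
        r = suc(r)
    return r if add(y, r) == x else 0

def multiply(x, y):
    r, u = 0, 0
    while u != y:                 # add x into r, y times
        r, u = add(r, x), suc(u)
    return r

def minimalise(g, xs):
    # least u with g(u, *xs) = 0; may loop forever, exactly as a
    # partial recursive function may be undefined at some arguments
    u = 0
    while g(u, *xs) != 0:
        u = suc(u)
    return u

assert nat_subtract(3, 5) == 0 and nat_subtract(5, 3) == 2
assert minimalise(lambda u, x: nat_subtract(x, multiply(u, u)), (10,)) == 4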

E Algorithmic ⇒ partial recursive


E.1 Discussion
In this section we will prove that every algorithmic function Nn → N is partial recursive.
(We have just proved the converse.) This is a fairly major undertaking; in outline
it involves showing:

• We know that the overall state of the algorithmic calculation at any point in its
progress (what we called a step in the running of the algorithm) consists of the current
configuration together with where the computation is up to in the program. There is
a way of representing every such step as a single natural number (which we will call
its Gödel number ) which allows the following.
• Converting any natural number to the Gödel number of its representation as a config-
uration is a recursive function.
• The action of each kind of statement in our algorithmic programming language is
reflected by a recursive function of the Gödel numbers of the before-and-after states.
• Recognising when the program has terminated is also reflected in a recursive function
of the Gödel number.
• Reading off the “answer”, that is, converting the final state to the natural number it
represents, is also reflected in a recursive function.
• Consequently, starting the program with any given number as its input and running it
until it terminates and then reading off the answer can be reflected as a partial recursive
function, defined by a minimalisation involving the aforementioned functions.

This proof will have a couple of interesting side-effects (unintended corollaries):

Since the minimalisation step here occurs only once, it follows that any partial recursive
function can be built up, in the manner of its basic definition, with a minimalisation step
used at most once.
Further, we will see as we go along that all the recursive functions mentioned above (in
every dot-point except the last) are in fact primitive recursive. This results in the surprising
fact (Theorem E.17) that any partial recursive function can be computed in three steps:
(1) Apply a primitive recursive function,
(2) Do a minimalisation (once only!) on the result,

(3) Apply another primitive recursive function to the result of that.


As you would expect, this is a long and technical process, involving most of this section.
As often, I have provided all the gory details as a matter of completeness. I would not
recommend that you study all the down-and-dirty details carefully unless you are really
interested. The overall argument is however of some interest — and the results important
— so I would recommend reading through the whole thing lightly, with a view to getting
an idea of the techniques involved. And start paying proper attention again at E.14!

E.2 Gödel numbers

In this section we will show how to assign natural numbers to complicated structures in such
a way that operations we perform on those structures correspond to recursive operations on
these numbers. This process in general is called Gödel numbering.

Our aim will be to assign a Gödel numbering to the steps in the running of an algorithm in
such a way that the operation of going from one step to the next is mirrored by a recursive
function N → N. (In fact it will be primitive recursive, which gives another interesting
result.)

The definition of a step is quite complicated. We will work up to it by first Gödel numbering
the alphabet A, the set D of directions and the set N of nodes. Next we will Gödel number
the possible cells . . . and so on, working our way up through levels of complexity.
In this process we will be dealing with the set N[∞] of all finite sequences of natural numbers;
this and the function P[∞] : N[∞] → N defined in 9.B.17 will be used many times. So often
in fact that I will use the less complicated notation γ for this function. (The symbol γ will
be used for this function and for nothing else throughout this appendix.)
The properties of this function (and some related ones) will be re-stated here in this notation.
It will become apparent that this is our first example of a Gödel numbering.
In Proposition 9.B.21 a number of associated functions are defined and their relevant
properties proved. Since we will be using them, here they are again:

• len : N → N mirrors the operation of finding the length of a sequence:
for any sequence x of length n we have len(γ(x)) = n.

• ent : N2 → N mirrors the operation of finding the ith entry of a sequence:
for any sequence x of length n we have ent(i, γ(x)) = xi.
(We don't care what its values are for i > n: we won't be using them.)

• del : N → N mirrors the operation of deleting the first entry of a sequence:
del(γ(x)) = γ(y), where y is the sequence x with its first entry removed.
(We don't care what its value is for the empty sequence: we won't be using it.)

• adj : N2 → N mirrors the operation of adjoining a new first entry to a sequence:
adj(z, γ(x)) = γ(y), where y is the sequence x with a new first entry z tacked on.

• rep : N3 → N mirrors the operation of replacing an entry with a new value:
rep(r, z, γ(x)) = γ(y), where y is the sequence x with its rth entry replaced by z.
(We don't care what its values are for r > n: we won't be using them.)

• concat : N2 → N mirrors the operation of concatenating two sequences:
if xy is the concatenation of the sequences x and y, then concat(γ(x), γ(y)) = γ(xy).

• zeros : N → N mirrors the operation of creating a sequence 0n = ⟨0, 0, . . . , 0⟩ of n zeros:
for any n ∈ N, zeros(n) = γ(0n).
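To make these properties concrete, here is one possible Python realisation (a sketch under an assumption: the book's γ = P[∞] of 9.B.17 is not reproduced here, so a Cantor pairing is used instead; any coding with the stated properties would serve).

def pair(a, b):
    # Cantor pairing, a bijection N x N -> N
    return (a + b) * (a + b + 1) // 2 + a

def unpair(n):
    # invert the pairing; start slightly low and step up to dodge float error
    w = max(int((((8 * n + 1) ** 0.5) - 1) / 2) - 1, 0)
    while (w + 1) * (w + 2) // 2 <= n:
        w += 1
    a = n - w * (w + 1) // 2
    return a, w - a

def gamma(seq):
    # empty sequence -> 0; otherwise pair the head with the coded tail, shifted by 1
    return 0 if not seq else pair(seq[0], gamma(seq[1:])) + 1

def length(g):                             # the book's len
    return 0 if g == 0 else 1 + length(unpair(g - 1)[1])

def ent(i, g):                             # entries are indexed from 1
    h, t = unpair(g - 1)
    return h if i == 1 else ent(i - 1, t)

def delete_first(g):                       # the book's del ("del" is reserved in Python)
    return unpair(g - 1)[1]

def adj(z, g):
    return pair(z, g) + 1

def rep(r, z, g):
    h, t = unpair(g - 1)
    return adj(z, t) if r == 1 else adj(h, rep(r - 1, z, t))

def concat(g1, g2):
    if g1 == 0:
        return g2
    h, t = unpair(g1 - 1)
    return adj(h, concat(t, g2))

def zeros(n):
    return gamma([0] * n)

g = gamma([2, 7, 5])
assert length(g) == 3 and ent(2, g) == 7
assert rep(2, 9, g) == gamma([2, 9, 5]) and concat(gamma([2]), gamma([7, 5])) == g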

E.3 The game plan


Here is an outline of what we do. The running of a program results in a sequence of steps,
Step0 , Step1 , . . . , Stepn . To compute the value, f (x) say, of its function algorithmically we
first encode x into the initial configuration, giving us Step0 , then we set the algorithm
running. It stops as soon as it reaches a fin operation (at Stepn say). At this stage we
decode the value from the configuration at that step. Every step in this progress will be
mirrored by its Gödel number, so we can picture the whole running of the algorithm thus:

x --encode--> Step0 ---> Step1 ---> ... ---> Stepn --decode--> f(x)
 |              |          |                   |                 ^
 | enc          v          v                   v                 | dec
 `--------->   g0 ------> g1 ------> ... ---> gn ---------------'

The gi are all Gödel numbers. We will show that the red arrows (the left and right end
diagonal ones and all the ones in the bottom row) are primitive recursive functions. The
process of stepping continues until a step to the Fin operation occurs (which can also be
tested by a primitive recursive function). Therefore the whole process of getting from x to
f (x) is partial recursive. (Partial because the algorithm might get into an infinite loop and
never reach the Fin node.)

E.4 The basic Gödel numbers


The first thing we do is assign Gödel numbers to the symbols (members of the alphabet),
the directions, the nodes in the program and the cells in a configuration.
To do this, first we index the symbols, so that the alphabet A is

A = {s1 , s2 , . . . , sσ }, allocating 0 to , so that  = s0 .

Then we index the directions, so the set D of directions is

D = {d1 , d2 , . . . , dδ } .

Next we index the nodes in the program, so the set N of nodes is

N = {n1 , n2 , . . . , nν } ,

and the program's variables

V = {v1 , v2 , . . . , vϕ } .

Finally, for any configuration, we will assume its cells are indexed, so the set of all cells in
the configuration is

C = {c1 , c2 , . . . , cn }, allocating 0 to , so that  = c0 .

About how the cells are indexed


As the algorithm progresses step by step, the configuration will change. In this process the
set of cells is likely to change too. In order for the calculations we are about to do to make
sense, it is essential that the way the cells are indexed behaves well, that is, that the same
cell in different configurations has the same index.
This is in fact easy to arrange. First note that cells are never destroyed (because we are not
interested in saving space, and clearing up any mess we make would add some pretty nasty
complications) and the only operation that ever creates cells is the “x new” operation. This
operation creates one new cell, so we will simply give it a new index which places it at the
end of our list.

E.5 Cell Gödel numbers


Having indexed the letters, directions and cells, as above, we can describe the state of any
cell ck , that is, what letter it contains and what its direction pointers point to, by a sequence
of length δ + 1:

⟨s, c1 , c2 , . . . , cδ ⟩ ,

where s is the index (as above) of the symbol it contains, possibly 0 for empty, and each ci
is the index of the cell that it is connected to in direction i, again 0 if it is not connected.
Having done this, we can represent this state by a single number gnCell, given by

gnCell(ck ) = γ⟨s, c1 , c2 , . . . , cδ ⟩ .

We will call this number the cell Gödel number for the cell.

The things we will want to do with a cell are reflected in its Gödel number; let us give
these functions helpful names. (We suppose we are working with a cell which has cell Gödel
number g.)

• getContentsC (g) = ent(1, g) ;
this gives the contents of the cell (as a symbol index).

• setContentsC (g, s) = rep(1, s, g) ;
this gives the Gödel number of the cell which results from setting the contents of the
given one to the symbol of index s.

• neighbourC (g, d) = ent(d + 1, g) ;
this gives the index of the neighbour in the direction of index d.

• connectC (g, d, k) = rep(d + 1, k, g) ;
this sets the direction pointer of the cell in the direction of index d to point to the cell
with index k, and gives the Gödel number of the new cell.

• emptyC () = γ⟨0, 0, 0, . . . , 0⟩ (where the sequence consists of δ + 1 zeros);
this is the cell Gödel number of an empty cell.

All these functions are primitive recursive.


We will be building up to the important functions incrementally, so there will be lots of
functions like these. I will give them subscripts to indicate at what level they operate. Here
the subscript C indicates the cell level.
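As a small worked example (my own, taking δ = 2 and reading direction 1 as D and direction 2 as R, an assumption for illustration only): a cell containing the symbol s3, with no D neighbour and its R pointer at cell c7, has state ⟨3, 0, 7⟩ and cell Gödel number g = γ⟨3, 0, 7⟩. Then getContentsC (g) = ent(1, g) = 3, neighbourC (g, 2) = ent(3, g) = 7, and connectC (g, 1, 4) = rep(2, 4, g) = γ⟨3, 4, 7⟩, the Gödel number of the same cell now joined to c4 in direction D.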

E.6 Configuration Gödel numbers


Consider a configuration C . It can be specified by giving a list of all its cells together with
what letters they contain and what their direction pointers point to. Thus it is specified by
the sequence

⟨ gc1 , gc2 , . . . , gcn ⟩ ,

where, for each i, gci is the cell Gödel number of cell ci .

Consequently the state of the entire configuration can be represented by the single number

gnConfig(C ) = γ⟨ gc1 , gc2 , . . . , gcn ⟩ ,

the “configuration Gödel number” of the configuration.
The things we will want to do with the configuration are also mirrored by its Gödel number;
and we will give these functions helpful names also. Let us now suppose we are working
with a configuration with Gödel number g.

• getCellK (g, i) = ent(i, g) ; this returns the cell Gödel number of the cell with index i.

• setCellK (g, i, j) = rep(i, j, g) ; this replaces the cell with index i by a cell with Gödel
number j and returns the Gödel number of the resulting configuration.

• getContentsK (g, i) = getContentsC (getCellK (g, i)) ;
this returns the contents of the cell with index i.

• setContentsK (g, i, l) = rep(i, setContentsC (getCellK (g, i), l), g) ;
this finds the cell at index i, replaces its contents with the symbol of index l and returns
the Gödel number of the resulting configuration.

• neighbourK (g, i, d) = neighbourC (getCellK (g, i), d) ;
if i is the index of a cell in the configuration g, this gives the index of its neighbour in
direction d.

• joinK (g, i, d, j) = setCellK (g, i, connectC (getCellK (g, i), d, j)) ;
if i is the index of a cell in the configuration g, this connects it in direction d to the
cell with index j.

• newCellK (g) = concat(g, γ⟨emptyC ()⟩) ;
this adds a new empty cell to the end of configuration g.

All these functions are primitive recursive.


I couldn’t use the obvious subscript C for the configuration level, because that is already
taken; so I’ve used K.

We are heading for a Gödel number of a step, encapsulating the current states of the
configuration and the processor. So now we must turn our attention to the processor.

E.7 Describing the program


Now we can assume that the program has been described diagrammatically and that the
only kinds of nodes used are the nine basic ones listed in the original definition of a general
algorithm (A.2).
This diagram is static — it does not change while the algorithm is running, so we won’t
need a Gödel number for it (though using the techniques we have used so far this would not
be hard) but we do need a way of describing what its nodes are and how they are connected
up using just numbers — which we will do using sequences and their Gödel numbers.
In that definition of a general algorithm it was specified that there are nine kinds of node and
six functions which, taken together, completely specify the program, its “wiring diagram”.
Some of these functions are defined for all of the nodes and some of them are only defined
for certain kinds of node. Here is a table showing which of the functions are defined for
which kind of node.

kind            var   next   nextT   sym   dir   var′

1   x ≈ a?       Y     Y      Y       Y     0     0
2   x ≡ y?       Y     Y      Y       0     0     Y
3   move?        Y     Y      Y       0     Y     0
4   x :≈ a       Y     Y      0       Y     0     0
5   join         Y     Y      0       0     Y     Y
6   new          Y     Y      0       0     0     0
7   (move)       Y     Y      0       0     Y     0
8   (x :≡ y)     Y     Y      0       0     0     Y
0   Fin          0     Y      0       0     0     0

Here a Y means that the function is defined for that kind of node,
a zero that it is not.

It will make our notation much simpler if we assume these functions are in fact defined
for all nodes by giving them some arbitrary value on the nodes for which they are not yet
defined. We won’t be using those values, so this won’t cause any problems. A natural value
to give them would be , which has subscript 0 in each set.
All these functions are between finite sets which we have already indexed, so we can represent
them by Gödel numbers using a trick.
So the function
var : {n1 , n2 , . . . , nν } → {v1 , v2 , . . . , vϕ }

is mirrored by a function defined on the subscripts

var : {1, 2, . . . , ν} → {1, 2, . . . , ϕ} ,

which is conveniently defined by

vari = k ⇔ var(ni ) = vk .

In the same way,


the function next : {n1 , n2 , . . . , nν } → {n1 , n2 , . . . , nν }
is mirrored by next : {1, 2, . . . , ν} → {1, 2, . . . , ν} ,
which is defined by nexti = k ⇔ next(ni ) = nk

the function nextT : {n1 , n2 , . . . , nν } → {n1 , n2 , . . . , nν }


is mirrored by nextT : {1, 2, . . . , ν} → {1, 2, . . . , ν} ,
which is defined by nextT i = k ⇔ nextT (ni ) = nk

the function sym : {n1 , n2 , . . . , nν } → {s1 , s2 , . . . , sσ , }


is mirrored by sym : {1, 2, . . . , ν} → {1, 2, . . . , σ, 0} ,
which is defined by symi = k ⇔ sym(ni ) = sk

the function dir : {n1 , n2 , . . . , nν } → {d1 , d2 , . . . , dδ }


is mirrored by dir : {1, 2, . . . , ν} → {1, 2, . . . , δ} ,
which is defined by diri = k ⇔ dir(ni ) = dk

the function var′ : {n1 , n2 , . . . , nν } → {v1 , v2 , . . . , vϕ , }
is mirrored by var′ : {1, 2, . . . , ν} → {1, 2, . . . , ϕ, 0} ,
which is defined by var′i = k ⇔ var′(ni ) = vk ;

and, not forgetting of course,

the function kind : {n1 , n2 , . . . , nν } → {0, 1, . . . , 8}
is mirrored by kind : {1, 2, . . . , ν} → {0, 1, . . . , 8} ,
which is defined by kindi = k ⇔ kind(ni ) = k
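In code, this “functions between indexed finite sets become number sequences” trick is just table lookup. A hypothetical Python fragment (the two-node program and all names here are mine, purely for illustration):

# a two-node program: node 1 is of kind 1 (x ≈ a?), node 2 is Fin (kind 0);
# each wiring function is a list indexed by node number (entry 0 unused),
# with 0 as the arbitrary filler value on nodes where it is "not defined"
kind   = [None, 1, 0]
var    = [None, 1, 0]      # node 1 tests variable v1
next_  = [None, 2, 2]      # on "no", go to node 2
nextT  = [None, 2, 0]      # on "yes", also go to node 2
sym    = [None, 3, 0]      # node 1 compares against symbol s3
dir_   = [None, 0, 0]      # dir is not needed by either kind
varP   = [None, 0, 0]      # var' likewise

def describe(p):
    return (kind[p], var[p], next_[p], nextT[p], sym[p], dir_[p], varP[p])

assert describe(1) == (1, 1, 2, 2, 3, 0, 0)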

E.8 Process Gödel numbers


The state of the process, that is, where the processor is up to, is easily described. It is
defined by two things: the current node np , which can be specified by its index p and the
values of its variables.
The values of all its variables constitute a function ref : V → C . We can use the same trick
as above:
the function ref : {v1 , v2 , . . . , vϕ } → {c1 , c2 , . . . , cn }
is mirrored by ref : {1, 2, . . . , ϕ} → {1, 2, . . . , n} ,
which is defined by ref i = k ⇔ ref(vi ) = ck

and this has Gödel number gnRef = γ⟨ref1 , ref2 , . . . , refϕ ⟩; the current state of the processor
is then defined by this and the index p of the current node:

gnProc = γ⟨p, gnRef⟩ .

The things we will want to do with the state of the process are as follows. Suppose we are
working with a process state with Gödel number g = gnProc.

• getNodeP (g) = ent(1, g) ; this returns the node index p.

• setNodeP (g, p) = rep(1, p, g) ; this replaces the node index by p and returns the Gödel
number of the resulting process state.

• getVarP (g, i) = ent(i, ent(2, g)) ;
this returns the value of the variable with index i, that is, the index of the cell it refers
to.

• setVarP (g, i, v) = rep(2, rep(i, v, ent(2, g)), g) ;
this replaces the value of the variable with index i by v and returns the Gödel number
of the resulting process state.

All these functions are primitive recursive.

E.9 Step Gödel numbers


A step in the running of the algorithm is a pair S = ⟨C, P ⟩ where C is a configuration and
P is a state of the process, as just described.

The step can be represented by the pair ⟨gnConfig(C), gnProc(P )⟩, and so we have a Gödel
number

gnStep(S) = γ⟨gnConfig(C), gnProc(P )⟩ .

Now the various things we want to do at this level, both to the workspace and to the
program state, are easily defined: suppose we are working with a step-level Gödel number
g = gnStep(S).

• getCellS (g, i) = getCellK (ent(1, g), i) ;
this returns the cell Gödel number of the cell with index i.

• setCellS (g, i, c) = rep(1, setCellK (ent(1, g), i, c), g) ;
this replaces the cell at index i by the cell with Gödel number c and returns the
resulting top-level Gödel number.

• getContentsS (g, i) = getContentsK (ent(1, g), i) ;
this returns the contents of the cell with index i.

• setContentsS (g, i, s) = rep(1, setContentsK (ent(1, g), i, s), g) ;
this finds the cell at index i, replaces its contents with the symbol of index s and
returns the resulting top-level Gödel number.

• neighbourS (g, i, d) = neighbourK (ent(1, g), i, d) ;
if i is the index of a cell in the workspace g, this gives the index of its neighbour in
direction d.

• joinS (g, i, d, j) = rep(1, joinK (ent(1, g), i, d, j), g) ;
this connects cell ci in direction d to cell cj and returns the resulting top-level Gödel
number.

• getNodeS (g) = getNodeP (ent(2, g)) .
This gets the index of the current node.

• setNodeS (g, p) = rep(2, setNodeP (ent(2, g), p), g) .
This sets the current node-index to p.

• getVarS (g, i) = getVarP (ent(2, g), i) .
This gets the value of the processor's ith variable.

• setVarS (g, i, v) = rep(2, setVarP (ent(2, g), i, v), g) .
This sets the value of the processor's ith variable to v.

All these functions are primitive recursive.

E.10 Action of individual kinds of node


The Gödel number gnStep above represents the state of the entire computation at any time
(step in the computation). We are not finished yet: we need to track the process of the
computation from initialisation (setting up the initial configuration and stack, based on the
supplied input) to the output (extracting the result from the workspace). In between we

track the computation step-by-step until we recognise that it is time to stop. Tracking the
computation is done by seeing what a single step (execution of a single statement) does
to the top-level Gödel number, and simply repeating this until the “finished” condition is
recognised. All these computations are primitive recursive except for the “keep going until
finished” part which is partial recursive (because of the possibility of never getting to the
“finished” condition).

So our next job is to look at how a single statement step is reflected in the top-level-Gödel
number. To do this we look first at the way the execution of the different kinds of node
change the step.
The execution of a single node results in going from one step to the next, so a function from
the set of all possible steps to itself. It is clear that this is a well-defined function. The point
of the present subsection is to look at it closely enough to show that it is in fact mirrored by
a primitive recursive operation on the Gödel numbers of the steps. This is a slightly tedious
process of checking the behaviour of each of the nine kinds of node. The proofs are pretty
repetitive, but here they are.

Assume that the Gödel number of the current step is g.


• Kind=1, x ≈ a?
The index of the node is p = getNodeS (g);
the index of the symbol a is s = sym(p);
the index of the variable x is v = var(p);
the index of the cell it refers to is i = getVarS (g, v);
the index of the contents of this cell is j = getContentsS (g, i).
Therefore the index of the next node is q = nextT (p) if j = s, and q = next(p) otherwise;
and the Gödel number of the new step is setNodeS (g, q)
• Kind=2, x ≡ y?
The index of the node is p = getNodeS (g);
the indices of the variables x and y are u = var(p) and v = var′(p);
the indices of the cells they refer to are i = getVarS (g, u) and j = getVarS (g, v).
Therefore the index of the next node is q = nextT (p) if i = j, and q = next(p) otherwise;
and the Gödel number of the new step is setNodeS (g, q)
• Kind=3, x →d ? (can x move in direction d?)
The index of the node is p = getNodeS (g);
the index of the direction is d = dir(p);
the index of the variable x is v = var(p);
the index of the cell it refers to is i = getVarS (g, v);
the index of the neighbouring cell is j = neighbourS (g, i, d).
Therefore the index of the next node is q = nextT (p) if j ≠ 0, and q = next(p) otherwise;
and the Gödel number of the new step is setNodeS (g, q)

For the following steps two things happen: a change is made to the configuration (the result
of which I shall call the intermediate step) and then the current node is updated to the next
one, resulting in the new step.

• Kind=4, x :≈ a
The index of the node is p = getNodeS (g);
the index of the variable x is v = var(p);
the index of the cell it refers to is i = getVarS (g, v);
the index of the symbol a is s = sym(p).
Therefore the Gödel number of the intermediate step is gint = setContentsS (g, i, s)
and of the new step is setNodeS (gint , next(p))
• Kind=5, x →d y (join x to y in direction d)
The index of the node is p = getNodeS (g);
the index of the direction is d = dir(p);
the index of the variable x is u = var(p), and the index of the cell it refers to is i = getVarS (g, u);
the index of the variable y is v = var′(p), and the index of the cell it refers to is j = getVarS (g, v).
Therefore the Gödel number of the intermediate step is gint = joinS (g, i, d, j)
and of the new step is setNodeS (gint , next(p))
• Kind=6, x new
The index of the node is p = getNodeS (g);
the index of the variable x is v = var(p);
the new cell is appended at the end of the configuration, so its index is k = len(ent(1, g)) + 1.
Therefore the Gödel number of the intermediate step is gint = setVarS (rep(1, newCellK (ent(1, g)), g), v, k)
(the configuration gains the new cell and x is set to refer to it)
and of the new step is setNodeS (gint , next(p))
• Kind=7, (x →d ) (move x in direction d)
The index of the node is p = getNodeS (g);
the index of the direction is d = dir(p);
the index of the variable x is v = var(p);
the index of the cell x refers to is i = getVarS (g, v);
the index of the neighbouring cell is j = neighbourS (g, i, d).
Therefore the index of the cell that x now refers to is k = j if j ≠ 0, and k = i otherwise;
so the Gödel number of the intermediate step is gint = setVarS (g, v, k)
and of the new step is setNodeS (gint , next(p))
• Kind=8, x :≡ y
The index of the node is p = getNodeS (g);
the indices of the variables x and y are u = var(p) and v = var′(p);
the index of the cell y refers to is j = getVarS (g, v).
Therefore the Gödel number of the intermediate step is gint = setVarS (g, u, j)
(x now refers to the same cell as y)
and of the new step is setNodeS (gint , next(p))
• Kind=0, Fin
No change occurs. Therefore
the Gödel number of the new step is g
All the functions in this section are primitive recursive. We have defined one such function
for each kind k = 0, 1, . . . , 8. Let us call these functions Nextk (g).

E.11 The Next step function

In the “game plan” I laid out the overall progress of an algorithm.

x --encode--> Step0 ---> Step1 ---> ... ---> Stepn --decode--> f(x)
 |              |          |                   |                 ^
 | enc          v          v                   v                 | dec
 `--------->   g0 ------> g1 ------> ... ---> gn ---------------'

Here the progress from one step to the next, Stepi → Stepi+1 , is mirrored by a function
Next : gi → gi+1 which is now easy to describe.
All that is required is to extract the kind of the current node from the Gödel number g and
use this to decide which of the actions just described to perform.

For the Gödel number g of any step S, the Gödel number of the next step is given by

NextStep(g) = Nextk (g) ,   where k = kind(getNodeS (g)) ∈ {0, 1, . . . , 8}.
This, as usual, is primitive recursive.
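The only new ingredient here is the nine-way dispatch; as a Python skeleton (the toy stand-ins and all names are mine):

def next_step(g, kind_of, next_for_kind):
    # one case split on the kind of the current node picks the right action
    return next_for_kind[kind_of(g)](g)

# toy usage with stand-in functions (the real Next_k are spelled out above):
toy_kind = lambda g: g % 9
toy_table = {k: (lambda g: g + 1) for k in range(9)}
toy_table[0] = lambda g: g                 # kind 0 (Fin): no change
assert next_step(9, toy_kind, toy_table) == 9      # 9 % 9 = 0, a Fin step
assert next_step(4, toy_kind, toy_table) == 5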

We have now dealt with most of the game plan diagram above. We still need to look at the
question of getting numbers into and out of the configuration and the corresponding Gödel
numbers. This is looking at the left and right hand triangles in the diagram.
Since we have proved that it makes no difference which code we use (so long as it is algo-
rithmic) we will use the unary code as being by far the simplest to deal with.

E.12 Lemma: The function UnaryEncode


Here we look at the left hand triangle:

[Diagram: x goes to Step0 along the top (encode) and to g0 along the diagonal (enc)]

The input x is a sequence, so we encode this into unary code.


Note that the top “encode” arrow here is the function which takes an n-tuple x to Step0 ,
which is a triple ⟨C, p, ref⟩. We want to show that the composite of this with the vertical
arrow gnStep is primitive recursive. It is clearly enough to show that the composite function
x ↦ C ↦ gnConfig(C) is primitive recursive.
Let us call the first function here (making a configuration from x) con, and write gc for the
composite function, so we are now looking at the triangle

[Diagram: x goes to C by con, C goes to gc(x) by gnConfig, and gc is the composite along the diagonal]

We now show that this function gc is primitive recursive.

Proof. This will be done in several easy steps. As an illustrative example, consider the
sequence ⟨3, 0, 2⟩. This is represented by the configuration

[Figure: a top row of three cells with null contents, indexed 1, 5 and 6; below cell 1 hangs the unary string 1 1 1 (cells 2, 3, 4) and below cell 6 the string 1 1 (cells 7, 8); the small numbers beside the cells show the way we will index them]
Prelim

To get the index of the last cell in the bottom row (8 in the example above) as a function
of the input sequence x = ⟨x1 , x2 , . . . , xn ⟩, call this function LB. Then

LB(x1 , x2 , . . . , xn ) = x1 + x2 + · · · + xn + n .

We also want the index of the last cell in the top row (6 in the example). Call this
function LT. Then

LT(x1 , x2 , . . . , xn ) = x1 + x2 + · · · + xn−1 + n .

These are both primitive recursive functions.


We now show that the function enc : x ↦ gc(x) is primitive recursive by induction over the
length of x.

Step 0
In the case of length zero, ⟨⟩ ↦ gc(⟨⟩) is just a constant and so is primitive recursive.
Now we assume that, for length n, the function ⟨x1 , x2 , . . . , xn ⟩ ↦ gc(x1 , x2 , . . . , xn ) is
primitive recursive and prove that ⟨x1 , x2 , . . . , xn , y⟩ ↦ gc(x1 , x2 , . . . , xn , y) is also primitive
recursive, by induction over y.

Step 1
We show first that the function ⟨x1 , x2 , . . . , xn ⟩ ↦ gc(x1 , x2 , . . . , xn , 0) is primitive recursive.

[Figure: the previous configuration with one more null-content cell, index 9, appended to the top row]

Writing g for gc(x1 , x2 , . . . , xn ), we have

gc(x1 , x2 , . . . , xn , 0) = gnConfig(con(x1 , x2 , . . . , xn , 0))
                            = joinK (newCellK (g), LT(x), 2, LB(x) + 1) .

Step 2

Now note that

gc(x1 , x2 , . . . , xn , y + 1) = joinK (newCellK (g), LT(x, y), 1, LB(x, y) + 1)   if y = 0,
gc(x1 , x2 , . . . , xn , y + 1) = joinK (newCellK (g), LB(x, y), 2, LB(x, y) + 1)   if y ≠ 0,

where g = gc(x1 , x2 , . . . , xn , y), and LB(x, y), LT(x, y) abbreviate LB(x1 , . . . , xn , y) and
LT(x1 , . . . , xn , y).

Is it obvious that this makes gc primitive recursive? It means that, if we rewrite the right
hand side of this equation as a function of x, y and another variable z thus,

h(x1 , x2 , . . . , xn , y, z) = joinK (newCellK (z), LT(x, y), 1, LB(x, y) + 1)   if y = 0,
h(x1 , x2 , . . . , xn , y, z) = joinK (newCellK (z), LB(x, y), 2, LB(x, y) + 1)   if y ≠ 0,

then we have

gc(x1 , x2 , . . . , xn , y + 1) = h(x1 , x2 , . . . , xn , y, gc(x1 , x2 , . . . , xn , y))

and here h is primitive recursive. □

E.13 Lemma: The function UnaryDecode


This is much easier because we have a single number rather than an n-tuple to decode.

[Diagram: Stepn goes to f(x) along the top (decode) and gn goes to f(x) along the diagonal (dec)]

The function UnaryDecode : N → N, which takes the step Gödel number of a workspace
representing a single number in unary code to that number, is primitive recursive.

Proof. Suppose (to simplify the notation) that S is the last step in the sequence, with step
Gödel number g, and that this represents a single number n:

[Diagram: S goes to n by decode, and g goes to n by dec]

Then

S = ⟨C, p, ref⟩ as usual,

so g = gnStep(S) = γ⟨gnConfig(C), gnProc(P )⟩

and so gnConfig(C) = ent(1, g) .

Now the configuration C represents the number n in unary code, so it is

[Figure: a single row of n cells, each containing the symbol 1 and linked to the next by an R pointer, indexed 1 to n]

This workspace is represented by the number

ent(1, g) = γ( γ⟨1, 0, 2⟩ , γ⟨1, 0, 3⟩ , γ⟨1, 0, 4⟩ , . . . , γ⟨1, 0, n⟩ , γ⟨1, 0, 0⟩ ) ,

a sequence of length n. Consequently the required function is given by

dec(g) = len(ent(1, g)) ,

and this is primitive recursive. □

E.14 Proposition: Algorithmic ⇒ partial recursive


Every algorithmic function Nn → N is partial recursive.
Here all this stuff comes together!

Proof. First note that, for any step S with step Gödel number g = gnStep(S), it is possible
to test g to see whether the process is in a Fin state or not, as follows:

S = ⟨C, P ⟩ = ⟨C, ⟨p, ref⟩⟩,

so g = gnStep(S) = γ⟨gnConfig(C), γ⟨p, ref⟩⟩,

and so p = ent(1, ent(2, g)) ;

and then the kind of the current node is given by kind, so we have:

the process is in a Fin state if and only if kind(ent(1, ent(2, g))) = 0 .

Now let us redraw the “game plan” diagram to make the data x and the step numbers
explicit:

x --encode--> Step(0,x) ---> Step(1,x) ---> ... ---> Step(n,x) --decode--> f(x)
 |                |               |                       |                  ^
 | enc            v               v                       v                  | dec
 `--------->   g(0,x) ------>  g(1,x) ------> ... --->  g(n,x) -------------'

We know that

g(0, x) = enc(x) and g(i + 1, x) = NextStep(g(i, x))

and that both enc and NextStep are primitive recursive; therefore so is g.

The step number at which the process first reaches a Fin node is given by

halt(x) = min{ i : kind(ent(1, ent(2, g(i, x)))) = 0 }



which is a straightforward minimalisation. Therefore the halt function is partial recursive.

Then

f (x) = dec(g(halt(x), x)) for all x,

and so f is partial recursive. □

E.15 Theorem: Algorithmic ⇔ partial recursive


A function Nn → N is algorithmic if and only if it is partial recursive.

Proof. This follows immediately from Propositions D.8 and E.14. □
E.16 Remark: some synonyms
As a result of the theorem above we can use the words partial recursive, algorithmic and
computable as synonyms — completely interchangeable. Well, at least as far as functions
Nm → Nn go; algorithms in general can be applied to a wider range of data structures.
In the same way we can use the word recursive for any algorithm that is guaranteed to
terminate.
The following result is an immediate corollary of this proof, but it is important enough to
be called a theorem.

E.17 Theorem
For any partial recursive function f : Nn ⇢ N, there are primitive recursive functions
p, q : Nn+1 → N such that

f (x) = p( min_i { q(i, x) = 0 }, x )   for all x ∈ Nn .

Proof. With the notation of the last few lines of Proposition E.14 above, set

p(u, x) = dec(g(u, x))   and   q(i, x) = kind(ent(1, ent(2, g(i, x)))) . □
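A toy illustration of this three-step shape (my example, not the book's): the integer square root, computed as primitive recursive post-processing of a single minimalisation.

# f(x) = isqrt(x) in the shape p(min_i{ q(i, x) = 0 }, x)

def q(i, x):
    # primitive recursive; zero exactly when (i+1)^2 > x
    return max(x + 1 - (i + 1) * (i + 1), 0)

def p(u, x):
    # primitive recursive; here the search witness is already the answer
    return u

def f(x):
    i = 0
    while q(i, x) != 0:            # the single minimalisation
        i += 1
    return p(i, x)

assert [f(x) for x in range(10)] == [0, 1, 1, 1, 2, 2, 2, 2, 2, 3]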


E.18 Corollary
In the definition of partial recursive functions, 9.A.3, the operation of minimalisation can
be replaced by minimalisation applied only to regular functions.

This was discussed in the Remarks, 9.A.15. The theorem in fact says something much
stronger.

Proof. Given a recursive function (not partial) f : Nn → N, write it in terms of primitive
recursive functions p and q, as in the theorem above. Since f is total, so must x ↦
min_i { q(i, x) = 0 } be. But if this is total, q must be regular. Since that is the only application
of minimalisation (p and q are primitive recursive), the corollary is proved. □

F Turing machines
F.1 Discussion
The definition of a Turing machine and the results about them are not essential to the studies
we are doing. Nevertheless, the idea is intimately bound up with that of an algorithm, and
Turing machines and Turing computability are so frequently mentioned in this context that
it is pretty well essential to discuss them briefly here.

Luckily, with the work of this appendix behind us we can cover this subject adequately
enough in fairly short order.

F.2 Definition of a Turing machine


A Turing machine is just an algorithm of the same kind we have been discussing in this
appendix, but with a strictly limited definition.
(1) The workspace is particularly simple. It is perhaps reasonable to describe it as “one-
dimensional”. There are only two directions, L and R (for “left” and “right”) and these are
reverses of each other (so that a move left followed by a move right cancel out to no move,
and the same of course for the other order). The alphabet can be any finite set, including
a “blank” symbol.
The workspace is usually visualised as an endless row of cells, side by side in a line.

In discussing Turing machines, the workspace is usually called the “tape”, for obvious reasons.
In the usual way of defining a Turing machine the tape extends infinitely in both directions,
but only a finite number of the cells may be non-blank. (This is equivalent to the formulation
of a general algorithm in which the workspace is finite, if a blank cell is considered to be “off
the edge” or, if you like, empty space.) In these formulations, the blank cell is often denoted
0.
(2) The processor is also particularly simple. For a start it has only one variable, which
can be thought of as an eye, poised above the tape, and only able to look at one cell at a
time.
The processor can have many nodes, but they are only of a limited number of kinds. These
are
(a) x ≈ a? (Do I see the symbol a?)
(d) x :≈ a. (Write the symbol a.)
(g) x →L and x →R. (Move one cell left or right.)
(i) Fin. (Stop work.)
All the other kinds are irrelevant because of the simplicity of the workspace (tape).
In a sense the definition of a general algorithm at the beginning of this appendix is just a
generalisation of that of a Turing machine.

In many descriptions of Turing machines, the four kinds of nodes are collapsed into one
rather complicated one which does all of the following:
(1) Look at the symbol in the current cell.
(2) Write a symbol or not, depending upon the one just seen.
(3) Move L or R or stand still, also depending upon the symbol seen.
There is also some provision for knowing when the process has finished. This is equivalent
to the previous formulation in terms of four kinds of node.

F.3 Programming a Turing machine


Because of its simplicity, programming a Turing machine can be very difficult. Because
all numbers involved have to be strung out in a line (along the tape) and there is only one
variable (eye), a great deal of carefully managed moving backwards and forwards is required.
Another thing causes pain: it is not enough to define an addition
routine (say) on the assumption that the two numbers involved are the only things on
the tape. Normally the addition will be part of a larger calculation, which will involve lots of
data on the tape on both sides of where the addition is being worked on. Now, whenever the
addition operation needs a little more space (where the sum is being developed for instance)
it will have to push all the data on either side away a bit to make room, without messing it
up and without the processor losing track of which bit of data is where.
In my opinion, programming Turing machines is a nightmare. If you don’t believe me, try
programming addition, taking into account the preservation of any amount of data on the
tape on either side. You can use any number code you like and any alphabet that suits you.

This is not to denigrate Turing machines. They are historically very important, and have
had a great influence on the kind of mathematics and logic we have been studying here.
Moreover, though the idea may seem a bit obvious looking back from the twenty-first century,
the original formulation by Alan Turing must have involved profound insights.
It is good to know what it means to say that a function has been shown to be Turing
computable, even though one might suspect that this term is often airily thrown around by
people who have not actually defined and proved the actual Turing machine program they
are suggesting. It is comforting to know that the notion is equivalent to “algorithmic”, in the
sense of this appendix, and so one can verify that a function is Turing machine computable
by the rather easier methods described here without having to actually program the machine.

I have stated here that Turing machine computable is equivalent to algorithmically com-
putable which we already know is equivalent to partial recursive. I do not propose to prove
this here, since we will not be using Turing machines.
One way round is easy: if a function is computable by a Turing machine then, since such a
machine is a simple type of algorithm, it is computable by a general algorithm and we have
proved that then it is partial recursive.

The other way around involves showing that the basic functions of the definition of a partial
recursive function (Definition 9.A.3) are all Turing computable — those basic functions
include addition and, even worse, multiplication — and that Turing computability is main-
tained under the operations of substitution and minimalisation. This is an unpleasant and
lengthy undertaking.
APPENDIX D. SOME ALGORITHMS

In this section I explain in more detail the algorithms required for some of the constructions
in Chapter 10.

A Preliminaries
A.1 Symbols and their arities
Here are some basic functions, defined more for convenience than anything else. As presented
here, I am assuming that the functions and relations are as given in Section 4.C.1 and no
others, with the following Gödel numbers assigned; for a system S with more functions and
relations, make the obvious changes.

¬   0        0̄   6
⇒   1        s   7   (the successor function)
∀   2        +   8
(   3        ×   9
)   4        =   10
,   5        vi  11 + i   (the ith variable symbol)

First some functions which test for various kinds of symbols from their Gödel numbers.

isNotSym(n) { if (n=0) return 1; else return 0 ; }

isImpSym(n) { if (n=1) return 1; else return 0 ; }

isForAllSym(n) { if (n=2) return 1; else return 0 ; }

isOpenSym(n) { if (n=3) return 1; else return 0 ; }

isCloseSym(n) { if (n=4) return 1; else return 0 ; }

isComma(n) { if (n=5) return 1; else return 0 ; }

isFunSym(n) { if (6≤n≤9) return 1; else return 0 ; }

isRelSym(n) { if (n=10) return 1; else return 0 ; }

isVarSym(n) { if (n≥11) return 1; else return 0 ; }

and a function to return arities (for both functions and relations)


arity(n) {
  if (n=6) return 0;
  if (n=7) return 1;
  if (8 ≤ n ≤ 10) return 2;
  return 0;
}

Note that if n is not the Gödel number of any function or relation symbol this function
returns 0; this value is never used, but we define it so that the function is recursive, not partial.
In what follows we will also use the functions defined at the beginning of Chapter 10.
Some of the functions we are about to look at can best be thought of as “scanning” a string,
checking out symbols and substrings as they go. Typically they will work with two numbers,
x, the Gödel number of the string and p, the index of the symbol currently being looked at
in the string. Some functions will check out a substring (usually a subexpression) in some
way; typically such a function will return the index of the next symbol after the substring,
making it easy for the function that called it to proceed.

A.2 String manipulations


We will need some functions which perform basic manipulations with strings.

Firstly, to get the substring of a string; given a string with Gödel number x, the function
Substring(p,q,x) finds the substring which extends from its pth to its qth entry (inclusive)
and returns the Gödel number of that substring. It assumes that 1 ≤ p ≤ q ≤ len(x)
(we will only be using it when these inequalities hold, so we are not interested in what the
function does in other cases).

Substring(p,q,x; z,r) {
  z := ent(q,x);                     The substring will be built up in z,
  r := q-1;                          one symbol at a time, starting from
  while (r ≥ p) {                    the right hand end of the substring in x.
    add(ent(r,x),z);
    r := r-1;
  }
  return z;
}

We will need to concatenate two strings. Given strings with Gödel numbers x and y, the
function Concat(x,y) returns the Gödel number of the concatenated string x followed by
y.

Concat(x,y; z,p) {
  z := y;                            The new string will be built up in z,
  p := len(x);                       starting with y, then adding
  while (p ≥ 1) {                    one symbol at a time from x.
    add(ent(p,x),z);
    p := p-1;
  }
  return z;
}

We will also need to replace a single entry by a substring. Given a string with Gödel number
x, an index p and another string s, the function Replace(x,p,s) replaces the pth entry of
x with the string s and returns the Gödel number of the resulting string. We do this by
using the Substring function to pull x apart and then the Concat function to reassemble
it differently.

Replace(x,p,s; ) {
  if (len(x)=1) return s;
  if (p=1) return Concat(s,Substring(2,len(x),x));
  if (p=len(x)) return Concat(Substring(1,len(x)-1,x),s);
  else return Concat(Concat(Substring(1,p-1,x),s),Substring(p+1,len(x),x));
}
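For intuition, here are list-based Python mirrors of these three manipulations (my sketch: plain Python lists replace Gödel numbers, with the same 1-indexed, inclusive conventions):

def substring(p, q, x):
    # entries p..q of x, inclusive, 1-indexed (assumes 1 <= p <= q <= len(x))
    return x[p - 1:q]

def concat(x, y):
    return x + y

def replace(x, p, s):
    # replace the pth entry of x by the whole string s
    return x[:p - 1] + s + x[p:]

x = list("abcde")
assert substring(2, 4, x) == list("bcd")
assert replace(x, 3, list("XY")) == list("abXYde")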

B Algorithms for Gödel’s Theorem

B.1 Recognising terms


Here we set about defining a function which will decide whether a given string is a valid
expression or not — more precisely, it decides whether a given number x is the string Gödel
number of a valid expression or not.
Before doing that we need a function isTerm which will decide whether a given substring is
a term or not.
More correctly: the function isTerm(x,p) will test to see if the pth entry of the string with
Gödel number x is the first character of a valid term or not. It returns 1 for yes and 0 for
no.

We also want a function which simply skips a term. Given the Gödel number x of an
expression and the index p of an entry in it, SkipTerm(x,p) will return the index of the
first entry after that term. We will only use this function when it is already known that
(x,p) is indeed the start of a valid term.
For example, consider the distributive law (∀x)(∀y)(∀z)( (x + y)z = xz + yz ) written out in
its fully formal form:

(∀x(∀y(∀z = (×(+(x, y), z), +(×(x, z), ×(y, z))))))



and let us suppose that its Gödel number is x. In this there is a term ×(+(x, y), z) whose
first character (the ×) is the 12th entry in the expression, so our functions should give

isTerm(x,12) = 1
SkipTerm(x,12) = 23

since the first entry after this term is the 23rd (a comma). Similarly, its 8th entry (a ∀
symbol) is not the first character of a term, so

isTerm(x,8) = 0
SkipTerm(x,8) = Don’t care.

Here are the functions:

SkipTerm(x,p; ch,ar,count) {
  ch := ent(p,x);
  if (isVarSym(ch)) return p+1;
  if (isFunSym(ch)) {
    ar := arity(ch);
    p := p+2;
    count := 0;
    while (count < ar) {
      p := SkipTerm(x,p);
      count := count + 1;
      p := p+1;
    }
    return p;
  }
  return 0;
}

and

isTerm(x,p; ch,ar,count) {
  ch := ent(p,x);                             Get the first character.
  if (isVarSym(ch)) return 1;                 If it is a variable, OK.
  if (isFunSym(ch)) {                         If it is a function symbol ...
    ar := arity(ch);                          Get its arity.
    p := p+1; ch := ent(p,x);                 Get the next character.
    if (not isOpenSym(ch)) return 0;          Should be an opening parenthesis.
    p := p+1;                                 Move to the first argument.
    count := 0;                               Start counting arguments.
    while (count < ar) {                      Loop for the correct number of args.
      if (not isTerm(x,p)) return 0;          Each argument should be a term.
      p := SkipTerm(x,p);                     Skip over it
      count := count + 1;                     and update the count.
      ch := ent(p,x);                         Get the next character.
      if (count < ar) {                       For all but the last argument,
        if (not isComma(ch)) return 0;        this should be a comma.
        p := p+1;
      }
    }
    if (isCloseSym(ch)) return 1;             After the last argument the next
    else return 0;                            character should be a ).
  }
  return 0;                                   The first character was not a variable
}                                             or a function symbol; not OK.

Shortly we will need a function which gets a term: given the Gödel number x of an expression
and the index p of the first entry of a term in it, it returns the Gödel number of that term
as a string.

GetTerm(x,p; q) {
  q := SkipTerm(x,p) - 1;
  return Substring(p,q,x);
}
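The same recursive descent in Python (an illustrative sketch over 0-indexed token lists rather than Gödel numbers, with the arity table of A.1 hard-coded; * stands for × and v1, v2, ... for variables):

ARITY = {"0": 0, "s": 1, "+": 2, "*": 2}       # function symbols and their arities

def is_var(tok):
    return tok.startswith("v")

def skip_term(toks, p):
    # given a valid term starting at p, return the index just past it (SkipTerm)
    if is_var(toks[p]):
        return p + 1
    n = ARITY[toks[p]]                          # a function symbol
    p += 2                                      # skip the symbol and "("
    for i in range(n):
        p = skip_term(toks, p) + 1              # skip each argument and the "," or ")"
    return p

def is_term(toks, p):
    # does a valid term start at index p? (isTerm)
    try:
        if is_var(toks[p]):
            return True
        n = ARITY.get(toks[p])
        if n is None or toks[p + 1] != "(":
            return False
        p += 2
        for i in range(n):
            if not is_term(toks, p):
                return False
            p = skip_term(toks, p)
            if i < n - 1:
                if toks[p] != ",":
                    return False
                p += 1
        return toks[p] == ")"
    except IndexError:
        return False

t = ["*", "(", "+", "(", "v1", ",", "v2", ")", ",", "v3", ")"]   # the term ×(+(x,y),z)
assert is_term(t, 0) and skip_term(t, 0) == len(t)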

B.2 Recognising expressions (for 10.A.3)


Next we define a very similar pair of algorithms for checking subexpressions. The function
isSubexpression(x,p) will test to see if the pth entry of the string with Gödel number x
is the first character of a valid subexpression or not, returning 1 for yes and 0 for no. The
function SkipSubexpression(x,p) will simply skip the subexpression starting at (x,p).

Recalling that an expression must be of one of the forms



• r(t1 , t2 , . . . , tn ) where r is an n-ary relation symbol and t1 , t2 , . . . , tn are terms,

• (¬P ) where P is an expression,

• (P ⇒ Q) where P and Q are expressions,

• (∀xP ) where x is a variable symbol and P is an expression,

we have:

SkipSubexpression(x,p; ch,ar,count) {
  ch := ent(p,x);
  if (isRelSym(ch)) {
    ar := arity(ch);
    p := p+2;
    count := 0;
    while (count < ar) {
      p := SkipTerm(x,p)+1;
      count := count + 1;
    }
    return p;
  }
  p := p+1; ch := ent(p,x);
  if (isNotSym(ch)) {
    p := p+1;
    return SkipSubexpression(x,p)+1;
  }
  if (isForAllSym(ch)) {
    p := p+2;
    p := SkipSubexpression(x,p);
    return p+1;
  }
  p := SkipSubexpression(x,p);
  p := p+1;
  p := SkipSubexpression(x,p);
  return p+1;
}

and

isSubexpression(x,p; ch,ar,count) {
  ch := ent(p,x);                              Get the first character.
  if (isRelSym(ch)) {                          Starts with a relation symbol.
    ar := arity(ch);                           This section is almost exactly the same
    p := p+1; ch := ent(p,x);                  as the isTerm function above.
    if (not isOpenSym(ch)) return 0;
    p := p+1;
    count := 0;
    while (count < ar) {
      if (not isTerm(x,p)) return 0;
      p := SkipTerm(x,p);
      count := count + 1;
      ch := ent(p,x);
      if (count < ar) {
        if (not isComma(ch)) return 0;
        p := p+1;
      }
    }
    if (not isCloseSym(ch)) return 0;
    return 1;
  }
  if (not isOpenSym(ch)) return 0;             All other forms must start with an
  p := p+1; ch := ent(p,x);                    opening parenthesis.
  if (isNotSym(ch)) {                          A Not symbol.
    p := p+1;                                  Move to the next character.
    if (not isSubexpression(x,p)) return 0;    It should start a subexpression.
    p := SkipSubexpression(x,p);
    ch := ent(p,x);                            Get the next character.
    if (not isCloseSym(ch)) return 0;          It should be a closing parenthesis.
    return 1;                                  If we got to here, all is OK.
  }
  if (isForAllSym(ch)) {                       A For All symbol.
    p := p+1; ch := ent(p,x);                  Get the next character.
    if (not isVarSym(ch)) return 0;            It should be a variable symbol.
    p := p+1;                                  Move to the next character.
    if (not isSubexpression(x,p)) return 0;    It should start a subexpression.
    p := SkipSubexpression(x,p);
    ch := ent(p,x);                            Get the next character.
    if (not isCloseSym(ch)) return 0;          It should be a closing parenthesis.
    return 1;                                  If we got to here, all is OK.
  }
  if (not isSubexpression(x,p)) return 0;      Otherwise this must be an implication:
  p := SkipSubexpression(x,p);                 the current character should start a
  ch := ent(p,x);                              subexpression; get the next character.
  if (not isImpSym(ch)) return 0;              It should be an implication symbol.
  p := p+1;                                    Move to the next character.
  if (not isSubexpression(x,p)) return 0;      It should start a subexpression.
  p := SkipSubexpression(x,p);
  ch := ent(p,x);                              Get the next character.
  if (not isCloseSym(ch)) return 0;            It should be a closing parenthesis.
  return 1;                                    If we got to here, all is OK.
}

Now it is easy to make a function isExpression(x), which tests whether an entire string is
a valid expression or not; more correctly, it tests whether its argument is the string Gödel
number of an expression or not.

isExpression(x; p) {
  if (not isSubexpression(x,1)) return 0;
  p := SkipSubexpression(x,1);
  if (p = len(x)+1) return 1;
  else return 0;
}

We will need a function which gets a subexpression: given the Gödel number x of an
expression and the index p of the first entry of a subexpression, it returns the Gödel number
of that subexpression.
B. ALGORITHMS FOR GÖDEL’S THEOREM 565

GetSubexpression(x,p; q) {
  q := SkipSubexpression(x,p) - 1;
  return Substring(p,q,x);
}

B.3 Recognising sentences (for 10.A.4)


Next we wish to write a function to test if a given string is a sentence. To do this, we first
write a function which decides whether a given variable occurs free in a given expression.
And before that in turn we write a function to decide whether a given variable occurs in a
term or not.

varInTerm(x,p,v) = 1 if the term starting at character number p in the string with
Gödel number x contains the variable with Gödel number v, and 0 otherwise.

We will only use this function when we already know that x is the Gödel number of a valid
expression, that p is the index of the first character of a valid term in that expression and
that v is the Gödel number of a variable symbol; we do not care what nonsense the function
may get up to in any other case.

1 varInTerm(x,p,v; ch,ar,count) {
2 ch := ent(x,p); Get first character.
3 if (ch=v) return 1; If it's the variable v, we are done.
4 if (isVarSym(ch)) return 0; Any other variable: v does not occur.
5 ar := arity(ch); Must be a function symbol; get arity.
6 p := p+2; count := 0; Skip the symbol and the open parenthesis.
7 while (count < ar) { Count through the arguments.
8 if (varInTerm(x,p,v)=1) return 1; Does v occur in the argument?
9 p := SkipTerm(x,p);
10 count:=count+1; p:=p+1; Skip the comma.
11 }
12 return 0; v does not occur in the term.
13 }

Now we can test whether a variable occurs free in an expression. Referring to the definition
of binding in 3.A.12 we see that the variable v occurs free in an expression if one of the
following cases obtains:

• The expression is atomic, r(t1, t2, . . . , tn) say, and v occurs in one of the terms t1, t2, . . . , tn.
• The expression is of the form (¬P) and v occurs free in P.
• The expression is of the form (P ⇒ Q) and v occurs free in at least one of P or Q.
• The expression is of the form (∀xP), v is not x and v occurs free in P.

We define a function FreeInSubexpression(x,p,v): if x is the string Gödel number of
a valid expression, p is the index of a character in that expression which starts a valid
subexpression and v is the Gödel number of a variable symbol, the function will return 1
for yes if that variable occurs free in that subexpression and 0 for no.

1 FreeInSubexpression(x,p,v; ch,ar,count) {
2 ch := ent(x,p);

3 if (isRelSym(ch)) {
4 ar := arity(ch);
5 p := p+2; count := 0; Skip the symbol and the open parenthesis.
6 while (count < ar) {
7 if (varInTerm(x,p,v)=1) return 1;
8 p := SkipTerm(x,p);
9 count := count+1; p := p+1; Skip the comma.
10 }
11 return 0;
12 }

13 p := p+1; ch := ent(x,p);

14 if (isNotSym(ch)) {
15 p := p+1;
16 return FreeInSubexpression(x,p,v);
17 }

18 if (isForAllSym(ch)) {
19 p:=p+1; ch:=ent(x,p);
20 if (ch=v) return 0; Here v is bound, not free.
21 p:=p+1;
22 return FreeInSubexpression(x,p,v);
23 }

24 if (FreeInSubexpression(x,p,v)=1) return 1; Does v occur free in P?
25 p := SkipSubexpression(x,p); Skip P.
26 p := p+1; Skip the implication symbol.
27 if (FreeInSubexpression(x,p,v)=1) return 1; Does v occur free in Q?
28 else return 0;

29 }

It is now easy to write a function which checks whether a variable occurs free in a whole
expression or not. If x is the string Gödel number of a valid expression and v is the Gödel
number of a variable symbol, freeInExpression(x,v) returns 1 for yes if that variable
occurs free in the expression and 0 for no.

1 freeInExpression(x,v) {
2 return freeInSubexpression(x,1,v);
3 }

And now we can write a function which checks if a string x is a valid sentence or not. This
is fairly easy. First check that it is an expression, then check that it has no free variables
by checking that variable symbols with Gödel numbers up to x do not occur free in it. If
nothing has gone wrong up to there, it is a sentence, so return 1.
(Referring back to the original definition of the function P , we see that, for all x and y,
P (x, y) ≥ x and P (x, y) ≥ y and that P is strictly increasing in both variables. It follows
that the Gödel number of a string is greater than the Gödel number of any of its substrings
and greater than the Gödel number of any of its characters. So, in order to test whether
the expression of Gödel number x contains any free variables or not it is enough to test it
for free occurrences of all variables of Gödel number from 11 up to x. Why 11? Because
that is the Gödel number of the first variable symbol. This is obviously not very efficient,
but we are not interested in efficiency here, and it is simple.)

1 isSentence(x; v) {
2 if (not isExpression(x)) return 0;

3 v:=11;
4 while(v ≤ x) {
5 if (isVarSym(v) & freeInExpression(x,v)) return 0;
6 v := v+1;
7 }
8 return 1;
9 }

B.4 Substitution
Given the Gödel numbers x of a term, v of a variable symbol and t of a term, the
function SubstInTerm(x,v,t) returns the Gödel number of the term made by the
substitution x[v/t].

1 SubstInTerm(x,v,t; ch,p,ar,count,y,s) {
2 ch := ent(x,1);

3 if (ch=v) return t;
4 if (len(x) = 1) return x;

5 ar := arity(ch);
6 y := Substring(x,1,2); The function symbol and open parenthesis.
7 p := 3; count := 1;
8 while (count ≤ ar) {
9 s := GetTerm(x,p);
10 y := Concat(y, SubstInTerm(s,v,t));
11 p := SkipTerm(x,p);
12 y := Concat(y, Substring(x,p,p)); The comma or close parenthesis.
13 p := p+1; count := count + 1;
14 }
15 return y;
16 }

Now we can write a function which will make a similar substitution in an expression. The
code is lengthy because it has to deal with the various ways an expression can be constructed,
but the logic is very similar to that of the last function.

1 SubstInExpression(x,v,t; ch,p,ar,count,y,s) {
2 ch := ent(x,1);

3 if (isRelSym(ch)) {
4 ar := arity(ch);
5 y := Substring(x,1,2);
6 p := 3; count := 1;
7 while (count ≤ ar) {
8 s := GetTerm(x,p);
9 y := Concat(y, SubstInTerm(s,v,t));
10 p := SkipTerm(x,p);
11 y := Concat(y, Substring(x,p,p));
12 p := p+1; count := count + 1;
13 }
14 return y;
15 }

16 ch := ent(x,2);

17 if (isNotSym(ch)) {
18 y := Substring(x,1,2);
19 p := 3;
20 s := GetSubexpression(x,p);
21 y := Concat(y, SubstInExpression(s,v,t));
22 p := SkipSubexpression(x,p);
23 y := Concat(y, Substring(x,p,p));
24 return y;
25 }

26 if (isForAllSym(ch)) {
27 ch := ent(x,3);
28 if (ch = v) return x; Here v is bound: no substitution occurs.
29 y := Substring(x,1,3);
30 p := 4;
31 s := GetSubexpression(x,p);
32 y := Concat(y, SubstInExpression(s,v,t));
33 p := SkipSubexpression(x,p);
34 y := Concat(y, Substring(x,p,p));
35 return y;
36 }

37 y := Substring(x,1,1);
38 p := 2;
39 s := GetSubexpression(x,p);
40 y := Concat(y, SubstInExpression(s,v,t));
41 p := SkipSubexpression(x,p);
42 y := Concat(y, Substring(x,p,p)); The implication symbol.
43 p := p+1;
44 s := GetSubexpression(x,p);
45 y := Concat(y, SubstInExpression(s,v,t));
46 p := SkipSubexpression(x,p);
47 y := Concat(y, Substring(x,p,p)); The close parenthesis.
48 return y;

49 }

B.5 Acceptability
Testing for acceptability is not difficult. We use the inductive definition: the substitution
[v/t] is acceptable in . . .

• any atomic expression r(s1, s2, . . . , sn), always;
• (¬P) iff it is acceptable in P;
• (P ⇒ Q) iff it is acceptable in both P and Q; and
• (∀uP) iff it is acceptable in P and, if P contains v free, t does not contain u.

1 Acceptable(x,v,t; ch,p,u,y) {
2 ch := ent(x,1);

3 if (isRelSym(ch)) return 1;

4 ch := ent(x,2);

5 if (isNotSym(ch)) {
6 y := GetSubexpression(x,3);
7 return Acceptable(y,v,t);
8 }

9 if (isForAllSym(ch)) {
10 u := ent(x,3);
11 y := GetSubexpression(x,4);
12 if (not Acceptable(y,v,t)) return 0;
13 if (FreeInSubexpression(x,4,v) and
14 varInTerm(t,1,u)) return 0;
15 else return 1;
16 }

17 y := GetSubexpression(x,2);
18 if (not Acceptable(y,v,t)) return 0;
19 p := SkipSubexpression(x,2)+1; Skip P and the implication symbol.
20 y := GetSubexpression(x,p);
21 if (not Acceptable(y,v,t)) return 0;
22 else return 1;

23 }

B.6 Recognising axioms (for 10.A.5)


First we test a string to see if it is an instance of Axiom PL1. The function isPL1(x) returns
1 if string x is an instance of Axiom PL1, 0 otherwise. This axiom in its formal form is
(P ⇒ (Q ⇒ P )), where P and Q are expressions.

In this implementation we first check that x is the Gödel number of a valid expression. Then
the main part of the listing is concerned with scanning the expression in the now-familiar
way to check that it is of the form (P ⇒ (Q ⇒ R)), remembering the subexpressions P and
R (as P1 and P2) as it goes. Finally (Line 21) it checks that P and R are equal.

1 isPL1(x; p,ch,P1,P2) {

2 if (not isExpression(x)) return 0;

3 p := 1; ch := ent(x,p);
4 if (not isOpenSym(ch)) return 0;
5 p := p+1;
6 if (not isSubexpression(x,p)) return 0;
7 P1 := GetSubexpression(x,p);
8 p := SkipSubexpression(x,p);
9 ch := ent(x,p);
10 if (not isImpSym(ch)) return 0;
11 p := p+1; ch := ent(x,p);
12 if (not isOpenSym(ch)) return 0;
13 p := p+1;
14 if (not isSubexpression(x,p)) return 0;
15 p := SkipSubexpression(x,p); Skip over Q.
16 ch := ent(x,p);
17 if (not isImpSym(ch)) return 0;
18 p := p+1;
19 if (not isSubexpression(x,p)) return 0;
20 P2 := GetSubexpression(x,p);
21 if (P1 ≠ P2) return 0;
22 return 1;
23 }

Axioms PL2 to PL5 are checked in much the same way; implementation of functions isPL2
to isPL5 can now be left as an exercise. Axiom PL6 however raises some other concerns: it
is necessary to check whether a substitution [x/t] is acceptable in an expression P and, if
so, check whether another expression is of the form P [x/t] or not.
In order to recognise a valid instance of PL6 we need to be able to decide, given Gödel
numbers x and y of expressions and v of a symbol, whether the expression y is the result
of an acceptable substitution of any term for v in the expression x or not. Referring back
to the original definition of the function P , we see that, for all x and y, P (x, y) ≥ x and
P (x, y) ≥ y and that P is strictly increasing in both variables. It follows that the Gödel
number of a string is greater than the Gödel number of any of its substrings and greater than
the Gödel number of any of its characters. So, in order to perform the test just described, it
is enough to test for substitutions [v/t] for all t of Gödel number less than that of y. This
is obviously very inefficient, but we are not interested in efficiency here, and it is simple.

1 anySubst(x,v,y; t) {
2 t := 1;
3 while (t ≤ y) {
4 if (isTerm(t) &
5 Acceptable(x,v,t) &
6 y=SubstInExpression(x,v,t)) return 1;
7 t := t + 1;
8 }
9 return 0;
10 }
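In Line 4 here, isTerm is used with a single argument: it should return 1 if t is the Gödel
number of a complete term and 0 otherwise. This whole-string variant was not written out
above, but it can be built from isTerm(x,p) and SkipTerm in exactly the way isExpression
was built from isSubexpression; a minimal sketch:

1 isTerm(x; p) {
2 if (not isTerm(x,1)) return 0; Does a term start at position 1?
3 p := SkipTerm(x,1);
4 if (p = len(x)+1) return 1; Does it take up the whole string?
5 else return 0;
6 }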

Now we can test a string to see if it is an instance of Axiom PL6. The function isPL6(x)
returns 1 if string x is an instance of Axiom PL6, 0 otherwise. This axiom in its formal
form is ((∀vP) ⇒ Q), where P is an expression and Q is the result of an acceptable
substitution of some term t for the variable v in P.

1 isPL6(x; p,ch,v,y,z) {
2 if (not isExpression(x)) return 0;

3 ch := ent(x,1);
4 if (not isOpenSym(ch)) return 0;
5 ch := ent(x,2);
6 if (not isOpenSym(ch)) return 0;
7 ch := ent(x,3);
8 if (not isForAllSym(ch)) return 0;
9 v := ent(x,4);
10 if (not isVarSym(v)) return 0;
11 y := GetSubexpression(x,5);
12 p := SkipSubexpression(x,5);
13 ch := ent(x,p);
14 if (not isCloseSym(ch)) return 0;
15 p := p+1; ch := ent(x,p);
16 if (not isImpSym(ch)) return 0;
17 p := p+1;
18 z := GetSubexpression(x,p);
19 p := SkipSubexpression(x,p);
20 ch := ent(x,p);
21 if (not isCloseSym(ch)) return 0;
22 if (p ≠ len(x)) return 0;

23 if (not anySubst(y,v,z)) return 0;

24 else return 1;
25 }

Now, of course, checking whether a string is an axiom of PL is trivial:

1 isAxiomOfPL(x) {
2 if (isPL1(x) or isPL2(x) or isPL3(x) or
3 isPL4(x) or isPL5(x) or isPL6(x)) return 1;
4 else return 0;
5 }

B.7 Recognising proofs (for 10.A.9)


We want to be able to recognise a proof in an axiomatisable theory. The theory may have
axioms other than the six of PL, but we are assuming that the set of all axioms is recursive.
That means we may assume the existence of a function isAxiom(x) which returns 1 for true
if the string x is an axiom of the system and 0 for false otherwise.

Consider first recognising Modus Ponens. This is fairly simple. Given three expressions x,
y and z, we wish to recognise when they are of the form P , (P ⇒ Q) and Q respectively;
that amounts to simply recognising when y is of the form (x ⇒ z). Assuming that we have
already checked that x, y and z are valid expressions,

1 isModusPonens(x,y,z; p,ch) {
2 ch := ent(y,1);
3 if (not isOpenSym(ch)) return 0;
4 if (not isSubexpression(y,2)) return 0;
5 if (x ≠ GetSubexpression(y,2)) return 0;
6 p := SkipSubexpression(y,2);
7 ch := ent(y,p);
8 if (not isImpSym(ch)) return 0;
9 p := p+1;
10 if (not isSubexpression(y,p)) return 0;
11 if (z ≠ GetSubexpression(y,p)) return 0;
12 p := SkipSubexpression(y,p);
13 ch := ent(y,p);
14 if (not isCloseSym(ch)) return 0;
15 if (p ≠ len(y)) return 0;
16 else return 1;
17 }

For Universal Generalisation we must recognise when two strings x and y are of the form P
and (∀v P), that is, when y is of the form (∀v x). This can safely be left as an exercise; one
possible solution, in the same style as isModusPonens, is sketched below.
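Again we assume that x and y have already been checked to be valid expressions; the
quantified variable may be any variable symbol.

1 isUniversalGeneralisation(x,y; p,ch) {
2 ch := ent(y,1);
3 if (not isOpenSym(ch)) return 0;
4 ch := ent(y,2);
5 if (not isForAllSym(ch)) return 0;
6 ch := ent(y,3);
7 if (not isVarSym(ch)) return 0;
8 if (not isSubexpression(y,4)) return 0;
9 if (x ≠ GetSubexpression(y,4)) return 0;
10 p := SkipSubexpression(y,4);
11 ch := ent(y,p);
12 if (not isCloseSym(ch)) return 0;
13 if (p ≠ len(y)) return 0;
14 else return 1;
15 }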

Before we define isProof we need to define the function isValidStep. Given the (sequence)
Gödel number S of a sequence of expressions and a number n less than or equal to its length,
isValidStep(S,n) will return 1 for true if the nth step in the sequence is a valid proof-step,
and 0 for false otherwise.

1 isValidStep(S,n; x,i,j) {
2 x := ent(S,n);
3 if (isAxiom(x)) return 1;
4 i := 1;
5 while (i < n) {
6 j := 1;
7 while (j < n) {
8 if (isModusPonens(ent(S,i), ent(S,j), x)) return 1;
9 j := j+1;
10 }
11 if (isUniversalGeneralisation(ent(S,i), x)) return 1;
12 i := i+1;
13 }
14 return 0;
15 }

Now the function isProof is easy. Given a number S, isProof(S) will return 1 for true if
it is the (sequence) Gödel number of a valid proof and 0 for false otherwise.

1 isProof(S; length,stepNum,x) {
2 length := len(S); The number of steps in the proof.

3 stepNum := 1;
4 while (stepNum ≤ length) { Loop through the steps of the proof.
5 x := ent(S,stepNum); The current step.
6 if (not isExpression(x)) return 0; Check it out.
7 if (not isValidStep(S,stepNum)) return 0;
8 stepNum := stepNum + 1;
9 }
10 return 1;
11 }

B.8 The “Proof of” relation (for 10.A.10)


We want to define a partial recursive function ProofOf(S,x) which returns value 1 when
S is the Gödel number of a proof of the expression with Gödel number x and returns 0
otherwise. This is now easy.

1 ProofOf(S,x) {
2 if (not isProof(S)) return 0;
3 if (x ≠ ent(S,len(S))) return 0;
4 else return 1;
5 }

B.9 Enumerating theorems (for 10.A.11)


We want to define a partial recursive function theoremNumber which enumerates all theo-
rems, that is, whose range is the set of the Gödel numbers of all theorems. This is now
simple.

1 theoremNumber(S) {
2 if (isProof(S)) return ent(S,len(S));
3 }

Here, if S is the Gödel number of a valid proof, the function returns its last entry, which is
the theorem it proves; otherwise it does not return any value. It is not hard to make this a
recursive function if you prefer. One way is to choose the Gödel number n0 of your favourite
simple theorem, say P ⇒ P , and redefine the function thus:

1 theoremNumber(S) {
2 if (isProof(S)) return ent(S,len(S));
3 else return n0;
4 }

Programs and routines, as we have defined them so far, simply return a single number. But
suppose we had something like a “print” statement in our language. Then we could set our
machine to print out the Gödel numbers of all theorems thus:

1 AllTheorems(;S) {
2 S := 0;
3 while (1) {
4 if (isProof(S)) print(ent(S,len(S)));
5 S := S+1;
6 }
7 }

Now all you have to do is decode the number ent(S,len(S)) back into its actual string
of characters (not hard!) and you have a program which will type out every theorem of
mathematics. Just sit back and wait for your favourite theorem to turn up.
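Decoding is indeed not hard: len and ent already take a string Gödel number apart character
by character. If we also allow ourselves a printChar statement which types the single
character with a given symbol Gödel number (an assumed extension of our language, just
like print above), a minimal sketch is

1 PrintString(x; p) {
2 p := 1;
3 while (p ≤ len(x)) { Run through the entries of x.
4 printChar(ent(x,p)); Type the character with this Gödel number.
5 p := p+1;
6 }
7 }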

B.10 The W and V functions (for Gödel’s Theorem)


In order to compute these functions we will need to know how to convert an ordinary number
ξ into the Gödel number of the string ξ̄ which represents it. For example, if ξ is 2, then ξ̄
is the string s(s(0̄)) of length 7. The algorithm will use the following constants:

The Gödel number of 0̄ as a string of length 1 is P(0, 6) = 21,
The Gödel number of ) as a string of length 1 is P(0, 4) = 10,
The Gödel number of the string s( of length 2 is P(1, P(7, 3)) = 2017.
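(As a quick check, recall that P(x, y) = ½(x + y)(x + y + 1) + x, so P(0, 6) = ½·6·7 = 21,
P(0, 4) = ½·4·5 = 10, and P(7, 3) = ½·10·11 + 7 = 62, giving P(1, P(7, 3)) = P(1, 62) =
½·63·64 + 1 = 2017.)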

With these constants we have a recursive algorithm:

1 NumberToString(n; x) {
2 if (n=0) return 21; The Gödel number of 0̄.
3 else {
4 x := NumberToString(n-1);
5 return Concat(Concat(2017,x),10); Wrap s( and ) around the string for n-1.
6 }
7 }

For example, NumberToString(1) returns the Gödel number of s(0̄), and NumberToString(2)
that of s(s(0̄)).

Now for the W function.

1 W(xi,eta; x,v,xix) {

2 if (not isExpression(xi)) return 0;

3 x := 0;
4 v := 11;
5 while (v ≤ xi) {
6 if (isVarSym(v) & freeInExpression(xi,v)) {
7 if (x=0) x := v; Remember the first free variable found.
8 else if (v ≠ x) return 0; A second free variable: fail.
9 }
10 v := v+1;
11 }
12 if (x=0) return 0; No free variable at all: fail.

13 xix := SubstInExpression(xi,x,NumberToString(xi));

14 if (not ProofOf(eta,xix)) return 0;

15 return 1;
16 }

For the V function, we use the constant

The Gödel number of the string (¬ of length 2 is P(1, P(3, 0)) = 56.

Now modify the algorithm for W by replacing Line 14 by

14 nxix := Concat(Concat(56,xix),10);
15 if (not ProofOf(eta,nxix)) return 0;

(with nxix added to the local variables).
E. THE EXPONENTIAL FUNCTION IN PA

This appendix assumes you have read Sections 4.C on Peano Arithmetic, 9.B on primitive
recursive functions and 10.B.3, The Representability Theorem. With all this background,
we are now in a position to answer a question raised in 4.C.2: is it possible to define the
exponential function in PA?
We cannot define a new function in PA by giving its graph, as in 6.B.21, because this
technique involves set-theoretic operations not available in PA. Rather, we must define it
by description, that is, define an expression, E(z, x, y) say, which is true if and only if z = y^x.
The exponential function (on N) is defined inductively thus:

y^0 = 1   (–1A)
y^(x+) = y^x.y   (–1B)
From this we can easily write down what is required of the expression E(z, x, y):

(∀y)E(1, 0, y)
(∀x)(∀y)(∀z)( E(z, x, y) ⇒ E(z.y, x+, y) )
(∀x)(∀y)(∃!z)E(z, x, y)
(Here, the third line is the general requirement for the expression to define a function.)
But note that this is a set of conditions the expression must satisfy if it is to define the
exponential function; it in no way guarantees that such an expression exists.
The Representability Theorem gives us a short answer to this: from this definition, the
exponential function is obviously primitive recursive. Therefore it is recursive and the Rep-
resentability Theorem tells us that such an expression E(z, x, y) exists.
It would be nice to know exactly what this expression is. Fortunately, the proof of the
Representability Theorem is constructive: it tells us how to find the required expression.
Even more fortunately, the process is not too involved.

However, there is one hurdle to be got over. The definition above is a primitive recursive
one, and the Representability Theorem works from the basic definition of the function as a
recursive function, as given in Definition 9.A.3, that is, in terms of some basic functions
which are all available to us in PA and the operations of substitution and minimalisation.
Thus we need the services of Proposition 9.B.2 to convert the definition (–1A,–1B) above
into this recursive form.

To whip this into shape to make the conversion, let us temporarily call the function e(x, y),
that is, e(x, y) = y^x. Then rewrite (–1A) and (–1B) in the form

e(0, y) = g(y) where g(y) = 1,
e(x+, y) = h(x, e(x, y), y) where h(x, u, y) = u.y.


Then 9.B.2 tells us that

e(x, y) = T(w0(x, y), x)

where

w0(x, y) = min_w { T(w, 0) = g(y) ∧ (∀ξ < x)(T(w, ξ + 1) = T(w, ξ).y) }
T(w, i) = Rem( L(w) + 1 , 1 + R(w)(i + 1) ).

(Here w codes up the finite sequence of values e(0, y), e(1, y), . . . , e(x, y), the function
T(w, ξ) extracts its ξth entry, and w0(x, y) is the least such code.)

Here L and R are the functions defined in Proposition 9.A.10 with the function P.

Now we start at the bottom and work back up, defining expressions which represent these
functions as we go, and starting with P.
P is given by

P(x, y) = ½(x + y)(x + y + 1) + x

which means that

2P(x, y) = (x + y)(x + y + 1) + 2x,

in other words,

u = P(x, y) ⇔ 2u = (x + y)(x + y + 1) + 2x.

So the function P is represented by the expression

P̂(u, x, y) :  2u = (x + y)(x + y + 1) + 2x.   (–2)

Now the functions L and R are given by

x = P(u, v) ⇔ u = L(x) and v = R(x).

So

u = L(x) ⇔ there is some v such that x = P(u, v)
and v = R(x) ⇔ there is some u such that x = P(u, v).

This gives us expressions L̂(u, x) and R̂(v, x) representing these:

L̂(u, x) is (∃v)(x = P(u, v)) and R̂(v, x) is (∃u)(x = P(u, v)),

that is,

L̂(u, x) is (∃v)(P̂(x, u, v)) and R̂(v, x) is (∃u)(P̂(x, u, v)).   (–3)

Next we represent the function Rem. It is defined

Rem(x, y) = the remainder upon dividing x by y, if y ≠ 0,
Rem(x, y) = 0, if y = 0.

This is represented by the expression

R̂em(r, x, y) :  (y = 0 ∧ r = 0) ∨ (y ≠ 0 ∧ (∃d)(x = dy + r ∧ r < y)).   (–4)

We can now find an expression T̂(u, w, i) representing the function T. We have

u = T(w, i) ⇔ u = Rem( L(w) + 1 , 1 + R(w)(i + 1) )
⇔ (∃l)(∃r)( u = Rem( l + 1 , 1 + r(i + 1) ) ∧ l = L(w) ∧ r = R(w) )
⇔ (∃l)(∃r)( R̂em( u , l + 1 , 1 + r(i + 1) ) ∧ L̂(l, w) ∧ R̂(r, w) )   (–5)

so the last line here is the expression T̂(u, w, i).
Now we find an expression ŵ0(u, x, y) to represent w0(x, y). To stop the notation getting
out of hand, write X(w, x, y) for the expression

T(w, 0) = 1 ∧ (∀ξ < x)(T(w, ξ + 1) = T(w, ξ).y)

rewritten to use T̂ instead of T:

T̂(1, w, 0) ∧ (∀ξ < x)(∃θ)( T̂(θ, w, ξ) ∧ T̂(θ.y, w, ξ + 1) )   (–6)

so that then

w0(x, y) = min_w X(w, x, y)

and then

u = w0(x, y) ⇔ X(u, x, y) ∧ (∀z < u)¬X(z, x, y).   (–7)

Therefore ŵ0(u, x, y) is this last expression, X(u, x, y) ∧ (∀z < u)¬X(z, x, y).
Finally

y^x = e(x, y) = T(w0(x, y), x)

so

u = y^x ⇔ u = T(w0(x, y), x)
⇔ T̂(u, w0(x, y), x)
⇔ (∃v)( T̂(u, v, x) ∧ v = w0(x, y) )
⇔ (∃v)( T̂(u, v, x) ∧ ŵ0(v, x, y) )   (–8)
and we are done: this last line is the expression which represents y^x and can be used in PA
to define exponentiation by description.
You can, if you wish, substitute back to expand this expression into one which uses nothing
but the formal language of PA, but if you try you will find it expands to several lines long
and is not easy to understand that way.
Let’s try that.
First substitute for ŵ0(v, x, y) in (–8), using (–7):

(∃v)( T̂(u, v, x) ∧ X(v, x, y) ∧ (∀z < v)¬X(z, x, y) ).

Next, substitute for X in this, using (–6):

(∃v)( T̂(u, v, x)
∧ T̂(1, v, 0) ∧ (∀ξ < x)(∃θ)( T̂(θ, v, ξ) ∧ T̂(θ.y, v, ξ + 1) )
∧ (∀z < v)¬( T̂(1, z, 0) ∧ (∀ξ < x)(∃θ)( T̂(θ, z, ξ) ∧ T̂(θ.y, z, ξ + 1) ) ) )

Now we need to substitute for all seven occurrences of T̂ here from (–5), making the appro-
priate changes in variables. Even using cut-and-paste this looks quite unpleasant. And we
are still only half-way there: we have yet to substitute for R̂em, L̂ and R̂, which will lead us
to substituting for P̂. I think it is time to give up on this project — but you get the idea.
F. INDEX

Logic symbols

⊢ 1.B.14, 2.C.2
⊣⊢ 1.B.14
⊨ 8.A.2
¬ 2.A.3, 3.A.3
⇒ 2.A.3, 3.A.3

∧ 2.F.6, 3.A.4
∨ 2.F.6, 3.A.4
⇔ 2.F.6, 3.A.4

F 2.F.30, 3.A.9
T 2.F.30, 3.A.9
∀ 3.A.3
∃ 3.A.4

∃! 4.A.6
! 4.A.6
P [x/t] see Substitution

TTT 2.G.4

Set theory: relations

≤ 4.C.8, 6.C.1
≥ 6.C.1
≰ 6.C.1
≱ 6.C.1
≠ 4.C.8
< 4.C.8, 6.C.1


∈ 6.A.1
∈-induction 6.H.1
⊂ 6.B.7
⊆ 6.B.7

⊃ 6.B.7
⊇ 6.B.7

/ 6.B.7

≈ 7.E.3
≼ 7.E.3

Set theory: functions and constants

SET 6.A.1
∅ 6.B.3

Sets 6.B.9
N see Natural numbers
∖ 6.B.12

∪ 6.B.14
⋃ 6.B.14
∪̇ 7.D.11
⋃̇ 7.D.11

∩ 6.B.15
⋂ 6.B.15
⟨a, b⟩, ⟨a, b⟩ₚ 6.B.16

× 6.B.19
dom 6.B.20
cod 6.B.20
gr 6.B.20

◦ 6.B.21
id 6.B.21

B^A 6.B.23
⟨a0, a1, . . .⟩ 6.B.24
{xi}i∈I 6.B.24
Πi∈I Ai, Π{Ai}i∈I 6.B.24

# 7.F.1
ℵ, ℵ0 7.F.2, 7.F.7
⌊x/y⌋ 9.A.4

x+ 4.C.1, 6.D.1
ξ̄ 4.C.2
I(x), Ī(x) 6.C.1
ω, ω+, ω++ etc. 7.A.1

Σ 7.D.18
fM , rM 5.A.2
coloured symbols 1.B.16
θ[v/m] 5.B.4

πn,i 9.A.3

∸ 9.A.2
0 9.A.4
zero(x), eq(x, y), nonzero(x), ≠(x, y), less(x, y), gtr(x, y), leq(x, y), geq(x, y) 9.A.4

⌊√x⌋, ⌊x/y⌋ 9.A.4
Rem(x, y) 9.A.4
P (x, y), L(x), R(x) 9.A.10

T (w, i) 9.A.14
Pn , En , En,i 9.B.13

N[∞] 9.B.16

P[∞] 9.B.17
len, ent, del, adj, rep 9.B.21
ϕ, ϕi 9.E.2
f̂ 9.E.4
f̃ 9.E.5
ϕ^s_{m,n} 9.E.6
gnSym 10.A.2

V (Function used in proof of Gödel’s theorem) 10.C.3


W (Function used in proof of Gödel’s theorem) 10.C.3
A/∼ A.A

[a] A.A

A

Absorptive laws of ∧ and ∨ 2.F.27


Acceptable substitution 3.A.19 ( 3.A.18 )
Ackermann function 9.D

Addition of cardinals 7.F.4


Addition of ordinals 7.D.1
Adequacy, adequacy Theorem 8.B.20 ( 8.B.19 )
Affected variable symbol 3.A.21

Aleph, aleph-null 7.F.2, 7.F.7


Algebraic operation A.A.2
Algorithm 9.A.1 ( C.A.1 )

Algorithmic function ( C.C.3 )

Alphabet (of an algorithm) C.A.2 ( C.A.1 )


Alphabet (of a formal language) 1.A.3 ( 1.A.1 )
And (connective) see conjunction

Antisymmetric, antisymmetry 6.C.1


Antitheorem 2.F.30, 10.A.12
Argument 9.C.3
Arity 3.A.4

Assignment 5.B.2 ( 5.B.1 )



Assignment statement 9.C.3


Associativity of ∧ and ∨ 2.F.22
Atomic expression 3.A.9
Axiom 1.B.5 ( 1.A.1 )

Axiom of Choice 6.F


Axiom of Comprehension 6.A.1, 6.B.3
Axiom of Extension 6.A.1, 6.B.1

Axiom of Formation 6.A.1, 6.H.7


Axiom of Foundation 6.A.1, 6.H.1
Axiom of Infinity ( 6.A.1, 6.D.1 )

Axiom of Power Sets 6.A.1, 6.B.10

Axiom of Specification 6.A.1, 6.B.3


Axiom of Unions 6.A.1, 6.B.14
Axiom of Unordered Pairs 6.A.1, 6.B.13

Axiomatic theory 1.B.5 ( 1.A.1 )


Axioms of SL 2.A.10
Axioms of PL 3.A.21

B
Base-α notation 7.D.9
Basis (for a vector space) 6.G.2
Bijection, bijective function 7.E.2

Binary 3.A.4
Binary code C.D.4
Binding 3.A.12 ( 3.A.11 )

Bound variable 3.A.12 ( 3.A.11 )


Bounded minimalisation 9.B.8

C

C-term (in UDLO) 4.F.6


Calculus of classes 6.B.12
Cardinal number 7.F, 7.F.1
Cardinality 7.E, 7.E.3

Cartesian power 6.B.23


Cartesian product 6.B.19, B.B.12
Categorical theory 5.D.6 ( 1.A.1 )

Cauchy sequence A.E.1


Cell (of an algorithm) C.A.2 ( C.A.1 )
Cell Gödel number C.E.5
Chain (in an ordered set) 6.C.2

Characteristic function (of a predicate) 9.A.16


Characteristic function (of a set) 7.E.10, 9.A.16

Choice function 6.F


Choice rule 3.F ( 3.F.1 )
Choice set 6.F.3
Choice variable 3.F ( 3.F.1 )

Church’s Theorem 10.C.5


Class 6.A.2, 5.A, B.A.2 ( 5.A.2, 6.A.2 )
Class-difference 6.B.12
Closed expression 3.A.15

Codomain of a relation 6.B.20


Coloured font Preface, 1.B.16
Commutativity of ∧ and ∨ 2.F.21

Compactness Theorem 8.C.2


Comparable 6.C.2
Complement of a class 6.B.12
Complete (complex numbers) A.F

Complete theory ( 1.A.1 )

Complex numbers, construction of A.F


Components (of a partial recursive function) 9.A.8
Composition of functions 6.B.21

Conclusion (of a rule) 1.B.1


Condition 9.C.3
Configuration (for an algorithm) C.A.2 ( C.A.1 )

Configuration Gödel number C.E.6


Congruence A.A.3
Conjunction (connective) 2.F.7
Conjunctive term 2.G.7

Connective, connective symbol 2.A.3, 3.A.3 ( 2.A.2 )


Consistency of PL 3.H
Consistent theory ( 1.A.1 )

Constant term 8.B.11


Continuity, continuous (function on ordinals) 7.C.1
Continuum Hypothesis 7.E.14
Contrapositive 2.F.5

Constructive dilemma 2.F.12


Countable, countably infinite 7.E.8
Cumulative Hierarchy 7.B.8

D
DC-form (in UDLO) 4.F.6

Dean Martin ( 3.F.7 )

De Morgan’s laws 2.F.24


Decidable theory 1.A.7, 1.B.13, 10.A.7 ( 1.A.1 )

Decidability of SL 2.G
Decidability of PL C.6.D.3

Dedekind cut A.E.5


Deduction 3.B
Deduction Theorem (for PL) 3.B.3
Deduction Theorem (for SL) 2.C.2, 2.C.4

Deductively equivalent 1.B.14


Deductive framework 1.B.1
Deductive rule see rule

Definition by description 4.A.7


Definition by induction (ordinary) 6.D.10 ( 6.D.9 )
Definition by induction (strong) 6.E.10, 7.B.6
Dense, density 4.F.3

Denumerable 7.E.8
Description, definition by 4.A.7
Dichotomy, Law of (theorem of SL) 2.F.28

Dichotomy (property of order) 6.C.1


Difference function 9.A.4
Directions (of an algorithm) C.A.2 ( C.A.1 )
Disjoint union of sets, of ordered sets 7.D.11

Disjunction (connective) 2.F.7


Disjunctive form 2.G.7
Distributivity of ∧ and ∨ 2.F.23
Domain of a relation 6.B.20

Dominate (size of sets) 7.E.3


Double negation 2.F.4
Doubleton 6.B.13

Downward Löwenheim-Skolem Theorem 8.B.14


Dummy variable ( 3.A.11 )

E

Elementary Theory of Arithmetic 4.C


Elementary Theory of Groups 4.E
Elementary Theory of Unbounded Dense Linear Orders 4.F
Empty function 6.B.21, 9.A.2

Empty set 6.B.3, 6.B.6


Entail, entailment 3.B
Epi function 7.E.2

Equality 4.A
Equality test function 9.A.4
Equation A.A.4
Equipollent 7.E.3

Equivalence (connective) 2.F.7


Equivalence class A.A
Equivalence relation 6.B.20

Equivalent theories 1.C.4 ( 1.C.3 )


Equivalents, substitution of 2.F.18, 3.D.6
Excluded middle, Law of 2.F.28
Existential quantifier 3.A.9, 3.D.3

Existential generalisation 3.D.4


Exponentiation of cardinals 7.F.4
Exponentiation of ordinals 7.D.7
Expressible 10.B.1

Expression (in a formal language) 1.A.6 ( 1.A.1 )


Expression (in PL) 3.A.8
Expression (in SL) 2.A.5

Extension of a theory 1.C.1

F
F, symbol for “false” 2.F.30

Factorial function 9.B.6


False (for a structure) 5.B.6
Falsification, falsify 5.B.6
Family 6.B.24

Finitely axiomatisable 3.J


First-order language ( 3.A.1 )

First-order logic Chapter 3 ( 1.A.1 )

First-order theory 3.A


First-order theory with arithmetic 4.C
First-order theory with equality 4.A, 4.A.2
Formal language see Language

Formal theory see Theory


Frame ( C.A.4 )

Free variable 3.A.12 ( 3.A.11 )

Fresh variable 3.A.4


Full order 6.C.1
Fully formal ( 2.A.1 )

Function 6.B.21

Function domain 3.A.4


Function symbol 3.A.3
Functional notation 3.A.20
Fundamental Theorem of Algebra A.F

The fundamental theorem of Model Theory 5.B.1 ( 5.C.4 )

G
GCH 7.E.14

Generalised Continuum Hypothesis 7.E.14


Generate 1.B.3
Gödel Completeness Theorem 8.B.20

Gödel’s Incompleteness Theorem 10.C.3


Gödel number C.E.1, 10.A ( C.E.2 )
Gödel Numbering 10.A.2
Gödel-Rosser Theorem C.6.C.5

Graph of a relation 6.B.20


Greatest member 6.C.2
Greatest lower bound 6.C.2

Groups (Elementary theory of) 4.E

H
Halting Problem 9.F.3
Head-state (of execution of an algorithm) ( C.A.1 )

I
Idempotence of ∧ and ∨ 2.F.29
Identity function 6.B.21
If and only if, iff (connective) see equivalence

Incidence ( 4.B.2 )

Inconsistent theory ( 1.A.1 )

Independent axioms 5.C.7

Induction ( 4.C.2 )

Induction over construction 2.A.9


Inductive definition (ordinary) 6.D.10
Inductive definition (strong) 6.E.10, 6.B.6

Inductive proof (ordinary) 6.D.8


Inductive proof (strong) 6.E.4, 6.B.2
Inductive set 6.D.2

Inequality test function 9.A.4


Initial, initial segment 6.C.2

Infimum 6.C.2
Injection, injective function 7.E.2
Instance (of a schema) 1.B.12
Integers (construction of) A.C

Integer square root function 9.A.4


Intermediate value property 7.C.1
Interpretation 5.B ( 5.B.1 )

Interpretation of an expression 5.B.5


Interpretation of a term 5.B.3
Intersection 6.B.15
Iso function 7.E.2

Isomorphic structures 5.D.5


IV property 7.C.1

J
Join ( 2.F.15 )

L
Language ( 1.A.1 )

Language of Pure Identity 5.A.7

Larger (size of sets) 7.E.3


Law A.A.4
Least member 6.C.2

Least upper bound 6.C.2


Lexicographic order 6.E.2, 7.D.15
Limit ordinal 7.A.16
Linear combination 6.G.2

Linear order 6.C.1


Linearly independent 6.G.2
Löwenheim-Skolem Theorem 8.B.14
Lower bound 6.C.2

M
Maximal 6.C.2
Maximal principle 6.F.3

Maximum 6.C.2
Meet ( 2.F.15 )

Metalanguage 1.B.16

Minsup property 7.C.1


Minimal 6.C.2
Minimalisation 9.A.2
Minimum 6.C.2

MK Set Theory see Morse-Kelley Set Theory


Model 5.C.1
Modular arithmetic 5.A.5

Modus ponens 2.A.11


Mono function 7.E.2
Morse-Kelley Set Theory 6.A.1
Multiplication of cardinals 7.F.4

Multiplication of ordinals 7.D.4

N
N the Natural numbers, q.v.
Nand 2.I.4
Natural number, The Natural numbers 6.D, 6.D.2, A.B, B.C



Natural predecessor function 9.B.4


Natural projection A.A
Natural subtraction 9.A.2
Non-limit ordinal 7.A.16

Non-standard arithmetic 8.C.3


Non-triviality axiom ( 4.B.2 )

Nonzero test function 9.A.4

Nullary 3.A.4

O

Occurrence (of a string) 1.A.5


One-to-one function, one-to-one correspondence 7.E.2
Onto function 7.E.2

Or (connective) see disjunction


Ord see Ordinal numbers
Order 6.C
Order-preserving function 6.E.5

Order-isomorphism 6.E.5
Order-type 7.A.13
Ordered pair 6.B.16

Ordered set 6.C.1


Ordered triple 6.B.16
Ordinal number 7.A, 7.A.2, 7.A.3, B.G
Orphan 7.E.11

P

Parent 7.E.11
Partial function 9.A.2 ( 9.A.1 )
Partial order 6.C.1

Partial recursive function 9.A.3 ( 9.A.1 )


Partial substitution 4.A.5
Particularisation 3.A.21
Peano arithmetic 4.C

Peano’s axioms 6.D.1


PL see Predicate logic
Power class 6.B.10

Power set 6.B.10, B.B.6


Pre-order 6.C.1
Predicate ( 9.A.16 )

Predicate logic Ch.3 ( 1.A.1 )

Prefix notation 2.I.1


Premise (of a rule) 1.B.1
Presubstitution 9.A.9

Primitive ordered triple 6.B.16


Primitive ordered pair 6.B.16
Primitive recursion ( 9.B.1 )

Primitive recursive function 9.B, 9.B.3 ( 9.B.1 )

Primitive recursive predicate 9.B.10


Primitive recursive set 9.B.10
Principia Mathematica ( 2.F.19 )

Program C.A.2 ( C.A.1 )

Programming language (for an algorithm) 9.C.2


Projective planes 4.B
Proof 1.B.8 ( 1.A.1 )

Proof by contradiction 2.F.6


Proof by cases 2.F.12
Proper axiom 3.A.2
Proper class 6.B.2 ( 6.A.2 )

Proposition ( 1.A.1 )

Proposition symbol 2.A.3


Propositional logic Ch.2 ( 1.A.1 )
Punctuation symbol 2.A.3, 3.A.3

Q

Qualified quantifiers 3.G.2


Quasiorder 6.C.1

Quotient set A.A

R
R.e. 9.G
Range of a function 6.B.21

Rational numbers (construction of) A.C


Real numbers (construction of) A.E
The Real Projective Plane 5.E.1
Recursive function 9.A.3

Recursive predicate 9.A.16


Recursive set 9.A.16
Recursively axiomatisable 1.B.5, 10.A.6
Recursively enumerable 9.G.1

Recursively finite 8.A.1


Reductio ad absurdum 2.F.6
Relation symbol 3.A.3

Relative consistency ( 5.A.4 )

Reflexive, reflexivity 6.C.1


Regular function ( 9.A.15 )

Relation 6.B.20

Relation domain 3.A.4





Representability Theorem 10.B.3
Representable 10.B.1
Respect (an algebraic operation) A.A.3

Respects equality 5.D.2


Restriction of a function 6.B.21
Return statement C.C.2

Rice’s Theorem 9.F.9


Robinson Arithmetic 4.D, 4.D.2
Routine C.C.2
Rule 1.B.1 ( 1.A.1 )

Rules of SL 2.A.11
Russell, Bertrand ( 2.F.19 )

Russell Class ( 6.B.11 )

Russell Paradox ( 6.B.11 )

S
S (A first-order language with arithmetic) 10.A
s-m-n Theorem 9.E.6

Satisfaction, satisfy 5.B.6


Schema, schemata 1.B.12
Schröder-Bernstein Theorem 7.E.11
Scope ( 3.A.11 )

Semantic implication 8.A.2


Semi-formal language 2.A.7 ( 1.A.1, 2.A.1 )
Sentence 3.A.15 ( 1.A.1, 2.A.2 )

Sentential logic Ch.2 ( 1.A.1, 2.A.2 )


Sequence 6.B.24
Set 6.B.2, B.A.2 ( 6.A.2 )

Sets 6.B.9
Set-theoretic difference 6.B.12
Seven-point plane 4.B.4
Sheffer stroke 2.I.4

Signature 3.A.4
Simple expression (in UDLO) 4.F.6
Simple extension of a theory 1.C.1

Singleton 6.B.13
Size 7.E.3 ( 7.E.1 )
SL see Sentential logic
Smaller (size of sets) 7.E.3

Span 6.G.2
Stack ( C.B.3 )

Starter function 6.E.11

State (of execution of an algorithm) ( C.A.1 )

Statement (in an algorithm) C.C.2


Statement (in a formal language) ( 1.A.1 )

Step (in a proof) ( 1.B.7 )

Strict order-preserving function 6.E.5


String 1.A.4 ( 1.A.1 )
Strong induction see Induction (strong)
Structure 5.A.2 ( 5.A.1 )

Subdeductions 2.E
Subspace (of a vector space) 6.G.2
Substitution 3.A.16 ( 3.A.17, 4.A.5 )

Substitution (of functions) 9.A.2


Substitution of dummies 3.D.2
Substitution of Equivalents (in SL) 2.F.18 ( 2.F.16 )
Substitution of Equivalents (in PL) 3.D.6

Substitution rule 2.A.12


Substring 1.A.5
Successor function 4.C
Successor ordinal 7.A.16

Sum of ordered sets 7.D.11


Sup property 7.C.1
Supremum 6.C.2

Surjection, surjective function 7.E.2


Symbol (of a formal language) 1.A.3
Symbol replacement 8.B.3
Symbols (for SL) 2.A.3

Symmetry 6.B.20

T
T, symbol for “true” 2.F.30
Tautology 2.A.7, 2.G.2
Term 3.A.6

Ternary 3.A.4
Theorem 1.B.5, 1.B.8 ( 1.A.1 )
Theory 1.B.2 ( 1.A.1 )

tL 5.A.1
Total function 9.A.2
Total order 6.C.1
Toy theory 1.A.2

Transfinite induction 7.A.2


Transitive class 6.D.4
Transitive, transitivity (property of order) 6.C.1



Transitive closure 6.H.4

Trichotomy Theorem 6.E.8


Trivial language 5.A.7
True (for a structure) 5.B.6
Truth table, truth table method 2.G.2

Truth table tautology 2.G.4

U
UDLO 4.F
UG see Universal generalisation
Unary 3.A.4

Unary code C.D.5


Unary decode C.E.13
Unbounded dense linear order 4.F

Underlying class (of a structure) 5.A.2


Unique existence 4.A.6
Uniqueness of parsing ( 2.A.7, 3.A.9 )

Union 6.B.14

Universal generalisation 3.A.21 ( 3.A.26 )


Universal partial function 9.E.2
Universal quantifier 3.A.9

Universe (of a structure) 5.A.2


Universe, the 6.B.9, B.B.5
Unordered pair 6.B.13
Upper bound 6.C.2

V

Variable symbol 3.A.3


vL 5.A.1

W
Well-formed formula 3.A.8
Well-order 6.C.1, 6.E
Well-Ordering Principle 6.F.3

Wff 3.A.8
Witness ( 8.B.1 )

Workspace C.A.2 ( C.A.1 )

Z

Zermelo-Fraenkel Set Theory Appendix B ( 6.A.2 )


Zermelo-Fraenkel Axioms B.A.1
Zero test function 9.A.4

Zorn’s Lemma 6.F.3
