Foundations
Martin Ward

Contents
I LOGIC
1 FORMAL THEORIES
A Formal languages
B Formal theories
C Comparing formal theories
2 SENTENTIAL LOGIC
A A language for Sentential Logic
B Examples of formal proofs
C Deductions
D Three “it follows that” relationships
E Subdeductions
F Some theorems and deductions
G Decidability of Sentential Logic
H Independence of the axioms
I Other axiomatisations
5 MODELS
A Structures
B Interpretations
C Models
D Models which respect equality
E Example: more models of the projective plane
F PL is not complete
G Example: Models for RA
II MATHEMATICS
III COMPUTABILITY
APPENDICES
F INDEX
Preface
Please read this: it won’t take long.
In these notes I use coloured text a lot. Everything makes perfect sense without the
colour-coding; however, it is there to make it easy to see what kind of text it is, and I hope
that will make the notes easier to understand. Here is how the colours will be used . . .
The plain black text like this is the serious core of the book. It is most easily explained as
being what the other-coloured text is not.
The text like this is for informal discussion and comments. It will contain, for instance,
suggestions of useful ways to visualise things, informal outlines of proofs, warnings about
easily-made mistakes and so on. In short, the sort of overview which I would normally
provide in lectures. In a number of places I mention mathematical results that will not be
established until later in the notes. So be aware that anything in this colour which looks like
a mathematical statement with or without proof cannot be used seriously until it is actually
proved (in another colour) later in the notes.
The green like this is used for expressions in the various formal languages we will be studying.
(That’s apart from this paragraph of course.) Such expressions will usually be mostly
symbols. This usage will be explained properly when it becomes relevant.
Anything in blue like this is a hot link to another place in the notes. Hot links are usually
in the margin where they won’t get in the way of the text.
Part I
LOGIC
1. FORMAL THEORIES
A Formal languages
A.1 Discussion: The idea of a formal language
Let us suppose we have decided to develop a self-contained theory of the natural numbers,
treating the subject with full mathematical rigour. Our writings would have informal discur-
sive portions, which would be written in English (or whatever language we choose) peppered
with some mathematical technical terms, and there would be highly formal portions, such
as equations, proofs and so on, which would not look like a natural language at all. To
be entirely rigorous, the full logical development of the theory should be contained in the
formally written part, which would consist for the most part of definitions and proofs. The
discursive portions would be limited to discussion of motivation, pointing out interesting
connections, history and so on and could be dispensed with without affecting the logical
structure of the theory. Our focus in this book will be on the formal language (which is all
that is logically necessary).
To treat the natural numbers in this formal way, we would need some symbols: some letters
to refer to variables, some numeric digits such as 0 and 1, and some logical and mathematical
symbols such as +, =, ≤, ∀ and so on; perhaps quite a short list. (We will have good reason
to do exactly this in Section 4.C and the list is indeed short.)
The statements we make in our formal language will consist of strings of these symbols.
Some strings will be just plain rubbish, such as
)∀ + (=== .
Of course we hope we will not write such strings down in earnest. The kind of strings we
will be writing are ones which are meaningful, which we will call expressions, for instance
(∀x) ( x + 1 > x )
(∀x) ( x + y > x )
2+2=5
While an expression must make sense, it need not be true. The first example above is true,
the last one is false and the middle one occupies an intermediate position — its truth or
falsity depends upon the value of y.
We expect that it should be a fairly simple matter to decide whether a string is an expression
or not. In most areas of mathematics you only have to glance at an equation to see if it is
well-formed or not (assuming you are already familiar with the notation used in that area).
You might have trouble with a very complicated equation which extended over several lines,
and then you could engage in a more careful procedure — checking that parentheses match
and so on. In short, one has an algorithm (a straightforward computational procedure) for
determining whether a string is an expression or not; one hopes it is simple enough that it
can be applied to short expressions very easily, even subconsciously.
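To make this concrete, here is a minimal sketch (in Python, with an invented mini-check of
my own, not part of the formal development) of the sort of routine, mechanical test meant
here. It checks only one well-formedness condition, matching parentheses, but it illustrates
how such a check can be carried out by a single mechanical scan, with no creative thinking
required.

    def parens_balanced(s):
        # Scan left to right, keeping count of unmatched "(".
        depth = 0
        for ch in s:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:          # a ")" with no matching "("
                    return False
        return depth == 0              # every "(" must have been closed

    print(parens_balanced("(∀x) ( x + 1 > x )"))   # True
    print(parens_balanced(")∀ + (==="))            # False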
Putting this together, we have a formal language. In general, a formal language consists of
• A defined set of symbols, which we will call its alphabet, even though it is likely to
contain symbols other than ordinary letters.
• A defined set of expressions, which is a subset of the set of all strings. (Expressions
are sometimes called well-formed formulas or just wffs.)
Things become much more interesting, perhaps useful, when amongst the expressions we
have a distinguished subset which we consider “good” in some sense: they are the important
centre of our investigation when using this language. If the language is designed to do
mathematics or logic, then these “good” expressions will probably be “theorems”, in the
sense that they can be derived from a given set of “axioms” by application of some well-
defined set of rules. When we have this extra structure, we have an “axiomatic theory”.
• A defined set of axioms, which is a subset of the set of expressions of the language.
• This then defines the set of theorems, another subset of the set of expressions, as those
expressions which can be derived from the given axioms by the given rules.
Some of these ideas, especially how theorems can be derived using rules, need to be explained
more carefully. This will be done later in this chapter.
Returning to our number theory example, our theorems should be the expressions which say
something that is in fact true, or at least a big subset of them. We could start with Peano’s
Axioms and give a sufficient set of deductive rules to derive the theorems we want. Figuring
out what these rules should be is of course a big undertaking, and one with a long history.
[ In these notes, the emphasis will be on topics in mathematics and logic, however it
is worth pointing out that these ideas and methods can be of considerable interest in
other areas. It is possible, for instance, to define a language in which the expressions
specify four-part musical pieces and then the theorems are those which specify pieces
which obey the rules of classical (common practice) harmony. For another example,
we might define a language in which the expressions specify positions in the game of
Chess and the theorems are those which specify the winning positions. For perhaps a
more obvious example, the language could be a computer language, C for instance; in
this case the expressions would be programs obeying the syntax of the language and
the theorems those programs which do not crash (in some sense which we would have
to specify carefully), for instance the well-known “Hello world”. The expressions in
this language (= programs) tend to be long and very complicated and the algorithm to
distinguish expressions from non-expressions correspondingly intricate: this is (part
of) what a compiler does.]
It should be pointed out that the description given above of an axiomatic theory might give
the impression that one starts with a formal language with axioms and deductive rules and
then figures out what theorems follow from that. In fact, from a historical perspective, this
is quite back-to-front. Most important formal theories in use started from a body of known
facts that was crying out for codification, a language was invented to describe them and then
the difficult process of discovering axioms and rules which will distinguish the required facts
from the dross was undertaken. As one obvious example, the important general facts about
the natural numbers were well known for thousands of years before Giuseppe Peano came
up with his axioms — but others abound: the invention of group theory to investigate the
properties of symmetry, the long drawn out invention of axiomatic bases for mathematics
which more or less started with Bertrand Russell’s Principia Mathematica and the more
recent invention of Category Theory to (amongst other things) codify properties in common
between diverse mathematical theories. Although this approach is obviously important, we
will not be following it in these notes; rather we are interested in the properties of formal
theories in general, and several particularly important ones, leading up to the axiomatisation
of mathematics itself. By and large our axiomatic theories will be taken ready-made. There
is quite enough here to keep us busy!
We will however from time to time want to consider such ideas as
• Under what circumstances might one theory be stronger (say more) than another
theory (and what does that mean anyway)?
To discuss such ideas, we need the notion of a “theory” which does not necessarily involve
the notion of axioms for that theory. One property that a theory really should have is that
it should logically hold together. We can make this idea more clear-cut by stating it thus:
assuming we already have a language and a set of deductive rules,
Any expression which can be derived (using the given rules) from expressions
in the theory must also be in the theory.
This property could be conveniently expressed “A theory must be closed under the deductive
rules”. This idea turns out to work very nicely; we will define a theory to be a set of
expressions which is closed under the deductive rules in this way. (The definition assumes
that we have already decided on a language and set of deductive rules.) Here is why this
definition is nice: we will be able to prove
(1) Any axiomatic theory (that is, any set of statements that can be derived from a set
of axioms as described above) is in fact a theory (as just defined), so we get no ambiguity
of words.
(2) Any theory (as just defined) is axiomatic, that is, there does exist a set of axioms
from which that theory can be derived.
So the neat way to proceed will be to start with the idea of a theory in general, then define
an axiomatic one.
Some examples of theories:
• Finite set theory (facts about finite sets. What would the axioms be?).
• Real numbers (facts about real numbers. Something like the “axioms” given in ele-
mentary calculus textbooks?).
• Simple formal theories such as the one described in the next section.
When we come to investigating a particular language, for instance that for Mathematics, we
will define it completely formally. We do this because we are interested in proving things
about mathematical proofs, provability, and theories in general, some of them subtle and
some surprising so we need a firm basis to work from. This means that we must idealise
our language somewhat — it must differ to some extent from common usage (because
common usage, even in mathematical texts, is quite sloppy). Given that we are going to do
this, we may as well idealise it in a way which is convenient for our present purposes. It
should be understood however that, whatever we do, anything correctly stated in ordinary
mathematical discourse should be translatable into our formalised version of the language
and vice versa. Having granted ourselves this freedom, we will define the formal language
as simply as possible. This will mean that it is (relatively) easy to prove things about it.
On the other hand, as we will see, it will be very difficult to read and most impracticable
to use. We will usually employ a number of abbreviations to make it easier to understand
— thus creating a “user friendly”, semi-formal version of the language.
Many formal theories embed logic in some sense. There are many different logics and
many ways they could be incorporated into a theory, however two forms of logic stand
out as most important: Sentential (or Propositional ) Logic and Predicate (or First-Order )
Logic. Sentential Logic is concerned with sentences (otherwise known as propositions or
statements), that is, expressions which are either true or false (but not both) and the
logical relations between them, the most common being negation, conjunction, disjunction,
implication and equivalence. Predicate Logic includes all of this and adds the quantifiers
(“For all x . . . ” and “there exists x such that . . . ”). That, of course, was a rough description,
not a definition.
Sentential and Predicate Logic will be described in detail in the next two chapters, however
to continue these introductory remarks, let us assume that we now know what Predicate
Logic is (at least roughly), and we propose to incorporate it into some formal theory. There
are several ways we could do this. For a start, there is a choice of symbols: we do not need
all of ¬ ∧ ∨ ⇒ ⇔ ∀ ∃ , because some of these symbols can be defined in terms of the
others. When we have chosen our symbols and the rules for forming expressions, we have
some choices about the sets of axioms and rules. If you take the trouble to look up some
textbooks on logic, or just look the topic up on the internet, you will discover that there are
many different ways of axiomatising both sentential and predicate logic. Of course, all these
different axiomatisations should be equivalent, in the sense that they give rise to the same
theory. To a large extent, the choice of axiomatisation is simply a matter of convenience:
any choice will be acceptable provided it yields a language and theory which is equivalent to
“standard” Predicate Logic. (In Section C.4 we give a careful definition of what we mean
here by “equivalence”.) In these notes we adopt a particularly simple and fairly common
axiomatisation of these logics.
You will have noticed that above I wrote “should be equivalent”. It is an unfortunate
fact that some authors have slightly different ideas as to what Predicate Logic actually
is, so their axiomatisations might give rise to a different set of theorems. Possibly the
most common divergence is that some authors incorporate the notion of equality into
their logic, whereas in these notes it is treated as a separate idea. Even more unfortu-
nately, a number of treatments contain errors, either large or small, the most common
being errors of omission. (Once you have worked through to the end of the chapter on
First-order Theories, you might find it interesting to look up some other expositions
and see how they deal with the subtle but important topics of acceptability, binding
and universal generalisation.)
Suppose now we have some formal theory in which we are interested. Here are some funda-
mental questions we might ask about it:
• Is the theory decidable? That is, is there an algorithm which will decide whether any
given expression is a theorem or not?
• What sort of interpretations does it have, perhaps quite different from whatever led
us to think it up in the first place?
Those questions can be asked whether the theory has logic embedded in it or not. Now
suppose that the theory contains (at least) sentential logic.
• Is the theory consistent? A theory is inconsistent if, for some expression P , both P
and NOT P are theorems. If that happened, it would then follow from the rules of
Sentential Logic that every expression is a theorem (and therefore so would the negation
of every expression), and so the whole thing becomes trivial. Indeed, worse than that,
a theory which contained such theorems would also contain (one meaning of) P AND
NOT P , and so would violate the spirit of sentential logic anyway. In short, we really
do not want any of our theories to be inconsistent.
• Is the theory complete? A theory is complete if, given any sentence P , either P or
NOT–P is a theorem. Since a sentence (in the sense we will use the term) is meant
to represent some proposition which is either true or false, one might hope that the
theory was strong enough to say which of these was the case: completeness means that
this is so.
A.2 Example: a toy theory
Here is a small artificial example, the Toy Theory. Its alphabet consists of the three symbols
i, a and b, and its expressions are the strings of these symbols. There is a single axiom, the
one-symbol expression i, and there are four deductive rules:
(1) The string i may be inserted or deleted anywhere. That is,
XiY ↔ XY .
(2) The string aaa may be replaced by i anywhere, and vice versa. That is
XaaaY ↔ XiY .
(3) The string bbb may be replaced by i anywhere, and vice versa:
XbbbY ↔ XiY .
(4) The string abab may be replaced by bbaa anywhere, and vice versa:
XababY ↔ XbbaaY .
(In these rules, X and Y stand for any subexpressions, possibly empty — though they may
not both be empty in (1).)
An expression is part of the theory if it can be “deduced” from the axiom by a sequence of
none or more of the rules. For example, we could have a sequence
i → aaa by Rule 2
→ iaaa by Rule 1
→ bbbaaa by Rule 3
→ bababa by Rule 4
Using the abbreviation Xⁿ for n copies of the string X, the four rules can be written more
compactly:
XiY ↔ XY (Rule 1)
Xa³Y ↔ XiY (Rule 2)
Xb³Y ↔ XiY (Rule 3)
X(ab)²Y ↔ Xb²a²Y (Rule 4)
and the example deduction above becomes
i → a³ → ia³ → b³a³ → (ba)³ .
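Since membership in this theory is defined by the existence of a finite sequence of rule
applications, one can experiment with it mechanically. The following Python sketch (an
illustration of my own, not part of the formal development) searches breadth-first for
derivations from the axiom i. Note that the length bound is what makes the search stop,
and a too-small bound can miss derivations whose intermediate strings grow long; this is
precisely why (Q4) below, decidability, is not settled by this kind of brute force.

    from collections import deque

    # The four rules as pairs; each may be applied in either direction,
    # anywhere in the string (but rule 1 may not delete the lone symbol of i).
    RULES = [("i", ""), ("aaa", "i"), ("bbb", "i"), ("abab", "bbaa")]

    def neighbours(s):
        # All strings reachable from s by one application of one rule.
        for lhs, rhs in RULES:
            for a, b in ((lhs, rhs), (rhs, lhs)):
                if a == "":                          # inserting i anywhere
                    for k in range(len(s) + 1):
                        yield s[:k] + b + s[k:]
                else:
                    if a == "i" and b == "" and s == "i":
                        continue                     # X and Y not both empty
                    for k in range(len(s) - len(a) + 1):
                        if s[k:k + len(a)] == a:
                            yield s[:k] + b + s[k + len(a):]

    def derivable(target, max_len=8):
        # Breadth-first search from the axiom i, pruning over-long strings.
        seen, queue = {"i"}, deque(["i"])
        while queue:
            s = queue.popleft()
            if s == target:
                return True
            for t in neighbours(s):
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
        return False

    print(derivable("bababa"))   # True: i -> aaa -> iaaa -> bbbaaa -> bababa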
Here are some questions to try. I believe that they are in increasing order of difficulty.
(Q4) Is this theory decidable or not? In other words, is there an algorithm (a routine
computational process) which will decide, for any expression of the language, whether it is
a member of the theory or not?
Note that we could have defined this theory slightly differently. We could have replaced the
lone axiom i by an extra rule which says
In a proof, you can write down i as a step any time you like.
Then we would have a perfectly good theory based on five rules and no axioms at all. (The
theorems would be the same as before.)
A.3 The alphabet
The alphabet is normally either finite or countably infinite. In occasional (unusual) cases in
which we want to deal with an uncountable set of symbols, this will be made clear.
A.4 Strings
Assume now that we have decided on the set of symbols.
In terms of these, a string is simply a finite sequence of symbols.
For example, if our alphabet contains the symbols a, c and t, one of our strings is {c, a, t}.
For legibility, we mostly write these sequences without the braces and commas, thus cat.
Using a set of symbols commonly used in mathematics, we can now make strings such as
2 + 2 = 4 and xy∀+− . Note that nonsensical sequences of symbols are permitted as
strings. To facilitate discussion, we also include the empty sequence, which I will denote by
the name e.
Pause for nitpicking. We cannot write the empty string if we think in terms of marks
on a page, except I suppose to display a blank space on the paper. In this notation,
one could argue whether there is such a thing as the empty string at all. It suffices
to say that anything we will say below which mentions the empty string explicitly or
implicitly can also be said without it; we are simply using the idea as a notational or
terminological convenience. In any case, the symbol e which I use to represent the
empty string is not the empty string itself but a name for it, and so is not part of the
formal language.
We note that two strings can be concatenated, i.e. joined end to end. We can concatenate
abc and pqa to get abcpqa. We could concatenate the and cat to get thecat. This turns
our set of expressions into an algebraic system, consisting of a set S of strings, with two
operations
the empty string e (nullary)
concatenation (binary)
It is easy to see that the empty string acts as an identity and that concatenation is associa-
tive, i.e. that the following laws hold:
ue = u and eu = u for any string u,
u(vw) = (uv)w for any strings u, v and w.
This means that S is a monoid, that is, a semigroup with identity. This particular semigroup
has some very special properties, most of which are really really obvious. However, here is
some terminology that we should get clear, because the ideas are used frequently.
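These two laws are easy to test concretely: Python's built-in strings form exactly this
monoid, with "" as the identity and + as concatenation.

    u, v, w = "abc", "pq", "a"
    assert u + "" == u and "" + u == u      # the empty string is an identity
    assert u + (v + w) == (u + v) + w       # concatenation is associative
    print(u + v + w)                        # abcpqa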
A.5 Definition
(i) A string u is a substring of a string v if there are strings x and y such that v = xuy.
(Note that either x or y or both may be empty here, so in particular, any string is a
substring of itself. Also the empty string is a substring of every string.)
(ii) An occurrence of the string u in the string v is a triple (x, u, y) such that v = xuy.
For example, the string aa has two distinct occurrences in the string aaa, namely (e, aa, a)
and (a, aa, e).
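A sketch of this definition as code may help: an occurrence records where u sits inside v,
so equal substrings at different positions count as different occurrences. (A small Python
illustration; the function name is my own.)

    def occurrences(u, v):
        # All triples (x, u, y) with v == x + u + y.
        return [(v[:k], u, v[k + len(u):])
                for k in range(len(v) - len(u) + 1)
                if v[k:k + len(u)] == u]

    print(occurrences("aa", "aaa"))   # [('', 'aa', 'a'), ('a', 'aa', '')]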
A.6 Expressions
The set of expressions is simply a defined subset of the set of strings.
Normally the set of expressions is decidable, that is, there is an effective procedure — an
algorithm or rule — to decide whether any given string is an expression or not.
Let me put that more strongly: it is hardly ever the case that a formal language does not
have such an algorithm.
A.7 Decidability
The above raises the question of what exactly an algorithm is. For the time being, it is
sufficient to think of an algorithm as a rule, or set of rules, which can be applied to answer
such a question (or compute values of a function) in a completely routine and mechanical
way — without any call for creative thinking. Algorithms will be defined and discussed in
eye-watering detail in Chapter 9.
B Formal theories
Now that we have the idea of a formal language, we are able to define a formal theory.
More precisely, the rule simply consists of the specified expressions. The phrase “one can
deduce” is, so far, a meaningless convenience. We will give it a meaning in the next paragraph
or so. Such a rule is often written in the convenient form
A1, A2, . . . , Ar
------------------
B
Here the expressions A1 , A2 , . . . , Ar are called the premises and B the conclusion of the
rule. Once again, there may be an infinite number of rules but, as with the axioms, the set
of rules will normally be decidable, for the same reason.
When discussing relationships between different theories, possible sets of axioms for theories,
and so on, we normally (but not always) work with a fixed language and set of rules. Thus
it is natural to think of a given language and set of rules as a sort of deductive framework
within which we work.
For a theory, it will always be assumed that the set R of deductive rules is decidable.
Note that the definition of a formal theory assumes that it is done in terms of a language
and set of rules, presumably already chosen. To be more careful, perhaps we should talk of
“the formal theory T in the language L with rules R”, or “the formal theory T for framework
L and R.” To be even more careful we could consider the whole thing together as a triple
hL, R, T i. Most of the time we will not need to be so explicit especially since, from Chapter
3 on, the language will hardly ever change and the rules never.
In any case, the language L is called the underlying language of the theory.
I now give what may look like a strange definition of axioms for a theory — it does not
mention proofs. However, it is neat and useful, and we will come to proofs shortly. First a
proposition which makes the definition work.
B.3 Proposition
Given a deductive framework (a language and a set of deductive rules) and any set A of
expressions of the language, there is a unique theory Th(A) such that:
(i) A ⊆ Th(A),
(ii) If T is any theory which contains A as a subset, then Th(A) ⊆ T .
We will say that A generates the theory Th(A).
Proof. First notice that, given a set of theories (with the same framework), their inter-
section is also a theory. This is because, given any rule A1, A2, . . . , Ar / B such that the
premises A1, A2, . . . , Ar are all members of this intersection, they must all be members of
every one of the set of theories. But then so must B be.
Notice also that the entire language — the set of all its expressions — is a theory.
It follows that, setting Th(A) to be the intersection of all theories containing A, it has the
required properties.
That Th(A) is unique follows immediately from (i) and (ii).
B.4 Corollary
Given a deductive framework L and R, let A and B be two sets of expressions in the language
such that A ⊆ B. Then Th(A) ⊆ Th(B).
B.6 Comments
(1) Note that any theory has at least one set of axioms (according to this definition),
namely itself. Thus there is no point in defining an axiomatic theory to be a theory which
happens to have a set of axioms — because then every theory would be axiomatic. Rather,
we define an axiomatic theory to be a theory together with a particular, usually chosen set
of axioms, the pair hT , Ai if you like.
(2) Nearly all the theories we will deal with in this book will be axiomatic theories.
Nearly.
(3) There is a difference between a “plain” formal theory, as defined in B.2 above, and
an axiomatic theory that has an empty set of axioms. A plain formal theory is just a
set of expressions, closed under the chosen rules. It usually expresses everything we know
about some subject, or at least we hope so. Frequently we would be interested in finding a
convenient set of axioms which would make it into an axiomatic theory. Until we do that
we cannot identify theorems by giving proofs. In an axiomatic theory for which the set of
axioms is empty, on the other hand, theorems can be proved by proofs which involve just the
rules alone. We made a version of the Toy Theory at the end of A.2 which is such a theory.
Axiomatic theories with empty sets of axioms are unusual, but not particularly rare. For
instance, there are formal theories of logic which are entirely rules-based, with no axioms.
(4) It is normally assumed that the set of axioms must be decidable and certainly for
any useful formal theory this will be so. However very occasionally we may want to consider
a theory in which the set of axioms is not (or need not be) decidable, so we do not want
to make it a hard and fast rule. In the rest of these notes, you can assume that all sets of
axioms are decidable unless I say otherwise — however it is probably a good thing, each
time a new theory is introduced, to look at the axioms and ask whether the set is decidable
or not. (If we have only a finite set of axioms, that is automatically decidable. However we
often want the set of axioms to be infinite, in which case the decidability criterion is not
quite so trivial.)
A formal theory for which a decidable set of axioms exists is called a recursively axiomatisable
theory for reasons which will become apparent much later in this book.
Most of the time we will be dealing with axiomatic theories, where there is a chosen set of
axioms to work with which generates the theory, but just occasionally we will deal with a
“plain” theory, where we have rules but no particular set of chosen axioms. So I need to
word the definitions here carefully to cover both cases.
The set of theorems can be built up as follows: a theorem is an expression which has a proof.
A proof is a finite sequence of expressions, S 1 , S 2 , . . . , S n say, (which we call the steps of
the proof), such that every one of the individual expressions Si is either an axiom (if there
are any), or follows from earlier members of the sequence by one of the deductive rules.
This corresponds pretty closely to what we would ordinarily think of as a theorem and its
proof, except that in the definition of the formal theory a proof is expected to be written
out in much more detail than we would normally bother with in practice.
More generally, we can have a proof from hypotheses, in which case we start with a set of
expressions (the hypotheses), which can be any set of expressions at all, and then a proof
from them is just like the one defined above except that a step may also be one of the
hypotheses.
In the case of a theory which is not axiomatic (that is, we do not have a defined subset of
axioms) we can use these definitions by simply assuming that the set of axioms is empty.
B.8 Definition
(i) In an axiomatic theory, a proof is a finite sequence of expressions S1, S2, . . . , Sn
(called the steps in the proof) in which, for each i, one of the following holds:
(a) Si is an axiom,
(b) Si follows by one of the rules of the theory from its predecessors S1, S2, . . . , Si−1.
Is it obvious what it means in (b) to say that, in this sequence, the step Si
follows from earlier steps by the rule
A1, A2, . . . , Ar
------------------
B
?
It means that, among the preceding members S 1 , S 2 , . . . , S i−1 of the sequence,
all the expressions A1 , A2 , . . . , Ar occur, though not necessarily in that order,
and Si is the same as B.
(ii) In an axiomatic theory, a theorem is any expression T which occurs as the final step
of a proof. And of course we say that that proof is a proof of T .
We also have the notion of a proof from hypotheses H, where H can be any set of expressions.
In this case I will use the word deduction to emphasise the difference.
(iii) In an axiomatic theory, a deduction from or based on hypotheses H is a finite
sequence of expressions (called the steps in the deduction) in which, for each i, one of the
following holds:
(a) Si is an axiom,
(b) Si is a member of H,
(c) Si follows by one of the rules of the theory from its predecessors S1, S2, . . . , Si−1.
So the only difference between a deduction from hypotheses and an ordinary proof of a
theorem is that, in a deduction from hypotheses, any one of the hypotheses may be asserted
as a step at any time. This corresponds to making the assumption that that hypothesis is
true. Looking at this the other way around: a proof is just a deduction from no hypotheses
(H is empty).
The idea of proof of a theorem is not usually useful in a plain formal system, but the idea
of a deduction from hypotheses is. Of course, in this case there are no axioms to appeal to,
so the definition looks like this:
(iv) In a plain formal theory (that is, one without a given set of axioms), a deduction
from or based on hypotheses H is a finite sequence of expressions
(called the steps in the deduction) in which, for each i, one of the following holds:
(a) Si is a member of H ,
(b) Si follows by one of the rules of the theory from its predecessors S 1 , S 2 , . . . , S i−1 .
The last step Sn is usually called the conclusion (since that is usually what we are proving).
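The definition is mechanical enough to transcribe directly into code. Here is a Python
sketch of a checker for deductions in the sense just defined; representing a rule as an
explicit (premises, conclusion) pair is my own simplification, since a real theory specifies
its rules by schemata and a practical checker would pattern-match instead.

    def is_deduction(steps, axioms, hypotheses, rules):
        # Each step must be an axiom, a hypothesis, or the conclusion of a
        # rule all of whose premises occur among the earlier steps.
        for i, step in enumerate(steps):
            earlier = set(steps[:i])
            ok = (step in axioms
                  or step in hypotheses
                  or any(conclusion == step
                         and all(p in earlier for p in premises)
                         for premises, conclusion in rules))
            if not ok:
                return False
        return True

    # A toy check: one genuine rule with premises A and A -> B, conclusion B.
    rules = [(["A", "A -> B"], "B")]
    print(is_deduction(["A -> B", "A", "B"], set(), {"A", "A -> B"}, rules))
    # True
    print(is_deduction(["B"], set(), {"A", "A -> B"}, rules))
    # False

Setting hypotheses to the empty set turns the same function into a checker for ordinary
proofs of theorems, exactly as the definition says.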
B.9 Comments
(i) This definition of a proof or deduction includes sequences of length 1, consisting of a
single step S1. In this case S1 must be an axiom (if there are any) or a hypothesis (if there
are any).
(ii) For an ordinary proof (that is, not a deduction from hypotheses) I will often say
“proof of a theorem” in an attempt to avoid confusion.
(iii) You may have noticed a rather odd thing about Part (iii) of the definition: there
doesn’t seem to be any difference between axioms and hypotheses — at least as far as this
definition is concerned. And that is right: the difference is in how we use them. With
an axiomatic theory, the axioms are fixed (for that theory) and won’t change, whereas we
may play with various sets of hypotheses. In the next chapter, for instance, we investigate
the axiomatic theory SL. The axioms are part of the definition of SL, are given near the
beginning and stay fixed for the whole chapter, while we look at proofs from many sets of
hypotheses.
Also, it would be an irritating waste of time to have to write out all the axioms like hy-
potheses at the start of the proof of every theorem.
(iv) It does follow from this that, if we are prepared to change axioms in our theory, we
can sort of swap axioms and hypotheses. Now I will state this more carefully.
Suppose we have an axiomatic system with axioms set A, and we also have a chosen set H
of hypotheses, and we are looking at a possible proof
Then the following five statements about it are equivalent (in the sense that if one is true
they all are).
B.10 Proposition
In an axiomatic theory, the theory is just the set of all theorems.
Which is exactly what you would expect: that’s the whole point of a proof.
Proof. Let T be the set of theorems based on A; I will show that it satisfies the definition
of the theory generated by A as given in B.3.
(i) The remark above about proofs of length 1 tells us that all the axioms are theorems,
so A ⊆ T . To see that T is a theory, let A1, A2, . . . , Ar / B be a rule such that
A1, A2, . . . , Ar ∈ T .
Then all those Ai have proofs so, for each i, let
proofi (–1)
be a proof of Ai . Now, make a big proof by placing all these individual proofs end-to-end,
and finishing off with B, thus:
proof1 , proof2 , . . . , proofr , B (–2)
Then this is a proof of B because every step except the last is justified in (–2) by the same
rule as in (–1), and the last step B is justified by the rule we started out with, because all
the Ai occur as the last steps of proofi in (–2). This shows that B ∈ T as required.
(ii) Suppose that U is another theory such that A ⊆ U. Any T ∈ T has a proof,
S 1 , S 2 , . . . , S n say, with Sn = T . Then, by induction over i, every Si ∈ U. This proves that
T ⊆ U, as required.
B.11 Comments
There are several simple facts about proofs that should be pointed out here.
(1) The remark above about proofs of length 1 tells us that, in an axiomatic theory, all
the axioms are theorems.
(2) Note that we could have defined a theorem alternatively as any expression which
occurs as a step anywhere within any proof. This follows from the obvious fact that, if
S 1 , S 2 , . . . , S n is a proof and 1 ≤ m ≤ n, then S 1 , S 2 , . . . , S m is also a proof.
(3) If an expression appears as a step in a proof, it may always be repeated at a later
point, since whatever justified its use the first time will still justify it later on.
[Diagram: three nested regions: the Theorems inside the Expressions, inside the set of all
Strings.]
B.12 Schemata
Returning to our simple i − a − b theory, we have a single axiom, i. Consider the second
rule, which I wrote as Xa³Y ↔ XiY . This is of course just a convenient shorthand for
two rules, Xa³Y → XiY and XiY → Xa³Y . According to our latest notation, we should
write these as
Xa³Y / XiY and XiY / Xa³Y .
But this does not exactly fit the definition of a deductive rule given above: the premise
and conclusion are not expressions. They represent, rather, forms of expressions. Must we
extend the definition above to include such rules?
No. What we do is regard this, not as a single rule, but rather as a recipe for producing
a whole set of rules, one for every possible pair of “values” of X and Y . More precisely,
substitution of any pair of expressions for X and Y (possibly empty, and with the same
substitutions
top and bottom) in the recipe above produces a rule.
Such a recipe is called a schema and the individual rules which it defines are called instances
of the schema. It is not a new kind of deductive rule, but simply a convenient way of
specifying an infinite set of genuine deductive rules.
This is a deductive rule schema. Shortly we will see that an axiom schema is also a useful
thing.
(By the way, the plural of “schema” is “schemata”, if you want to be pedantic. However
“schemas” will do as long as you are far enough away from the Classics Department.)
It is not difficult to see that an infinite number of deductive rules is essential for an interesting
theory. Observe that the only things that can turn up as expressions in proofs are axioms
and the conclusions of rules. Therefore, if a formal theory had only a finite number of
deductive rules (I refer to genuine rules, not schemata now), then the theory would consist
of only the axioms and a subset of this finite set of conclusions. In this case, we might as
well add those conclusions to the axioms in the first place, and have a theory which has
been specified simply by writing down all the theorems, and no deduction necessary.
B.13 Listing the theorems
Suppose our axiomatic theory has decidable sets of axioms and rules, and a decidable set
of expressions. Then, whether or not the theory itself is decidable, there is an algorithm
which will list all its theorems, built up as follows.
(i) It is not hard to make an algorithm which will list all the strings. Suppose first that
there is only a finite number of symbols. To make the list, simply start with the empty
string, then list all the strings of length 1 in dictionary order, followed by all the strings
of length 2, followed by all the strings of length 3 and so on. It is not hard to extend
this method to cover the case of a countably infinite alphabet.
(ii) Next, let us perform this algorithm, but as we go, apply the given method of deciding
whether strings are expressions or not, and so throw away every string which is not
an expression. We now have an algorithmic way of producing a list of all expressions.
(iii) Now go back to step (i), but change it in the obvious way to list all finite sequences
of strings. Then apply the trick of step (ii) to turn it into a method of listing all finite
sequences of expressions.
(iv) Tune up the last algorithm one further stage by testing each sequence of expressions,
as it is produced, to see if it is a proof or not, and throwing away all non-proofs. We
now have an algorithm which will list all proofs.
(v) Now use this algorithm to produce proofs, but as each one is produced, select its last
expression and discard the rest. We have an algorithm which will list all the theorems.
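Steps (i) and (ii) are easy to realise; here is a Python sketch (steps (iii)–(v) filter and
transform the output streams in exactly the same style). Note that this gives a procedure
for listing the theorems, not for deciding theoremhood: the list never ends, so a
non-theorem is never definitively rejected.

    from itertools import count, islice, product

    def all_strings(alphabet):
        # Step (i): the empty string, then length 1 in dictionary order,
        # then length 2, and so on.
        for n in count(0):
            for tup in product(sorted(alphabet), repeat=n):
                yield "".join(tup)

    def all_expressions(alphabet, is_expression):
        # Step (ii): the same listing, discarding the non-expressions.
        return (s for s in all_strings(alphabet) if is_expression(s))

    print(list(islice(all_strings("ab"), 7)))
    # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']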
B.14 Deduction
(i) Where there is a deduction of a conclusion C based on a set H of hypotheses, there
is a standard symbol for this:
H ⊢ C .
We can also say that H entails C.
Clearly this idea (what it means and whether it is true or not) depends upon
what theory we are working in. The definition does not make much sense unless
we know what the rules are, whether we are working in an axiomatic system or
not and, if so, what the axioms are. Thus we only use this useful symbol when
we already know what theory we are talking about. Occasionally we might
be talking about several different theories at the same time, or the particular
theory might be in doubt for some other reason. Then we can indicate that we
are referring to a proof in the theory T by writing
H ⊢ C (in T ).
(ii) In the case where H has only one member, H = {H} say, we really should write
{H} ⊢ C, but usually, as a slight abuse of notation, we write simply
H ⊢ C
and similarly, when H is a finite set {H1, H2, . . . , Hn}, we write
H1, H2, . . . , Hn ⊢ C .
(iii) And in the same way, in the case where H = ∅, we really should write ∅ ⊢ C, but
again, as a slight abuse of notation, we write simply
⊢ C.
(iv) Where K is a set of expressions, we write H ⊢ K to mean that
H ⊢ H for every H ∈ K .
(v) Note that the symbol ⊢ is not part of the formal language we are discussing. It
is not even a convenient abbreviation for something that can be said in the language —
that is, it is not even part of our semi-formal version. Statements involving this symbol
are statements about the language, in other words they are part of the metalanguage (see
below).
This symbol ⊢ I believe comes from a T for “theorem” turned on its side, but I
could be wrong. If you want a name for it, it is usually called “turnstile” (because it
looks a bit like one of those things you run into as you leave a library).
(vi) We will also write H ⊣⊢ K to mean that both H ⊢ K and K ⊢ H and say that H
and K are deductively equivalent. (And of course, if H and K contain one expression each,
we may write H ⊣⊢ K, and so on.)
B.15 Proposition
(i) Perhaps the most basic fact about deduction (in an axiomatic theory) is that
If H ⊢ K and K ⊢ L then H ⊢ L.
(iv) This one says, roughly, that deductive rules immediately give deductions.
If A1, A2, . . . , Ak / B is a rule, then A1, A2, . . . , Ak ⊢ B.
(v) The set of hypotheses can always be pared down to a finite set (but only when the
conclusion is a single expression).
Proof. (i) This is just a restatement of the comment (v) in B.9 above.
(vi) K ⊆ Th(A ∪ H) (by Part (i)) and Th(A ∪ H) ⊆ Th(A) (by Proposition B.3).
(ii) This is just a restatement of Part (vi) using Part (i).
(iii) From H ⊢ C we have C ∈ Th(A ∪ H) and from H ⊆ H′ we have Th(A ∪ H) ⊆
Th(A ∪ H′), so C ∈ Th(A ∪ H′) .
(iv) B ∈ Th(A ∪ {A1 , A2 , . . . , Ak }) by the definition of a theory.
(v) A proof of H ⊢ B consists of a finite sequence of steps S1, S2, . . . , Sn, where
Sn is B and each Si is either an axiom, a member of H or follows from earlier steps by a
deductive rule. So, if we set {H1, H2, . . . , Hk} to be all the members of H which occur as
steps in that proof (there must only be a finite number of them), then it is also a proof of
{H1, H2, . . . , Hk} ⊢ B.
For an everyday example, the phrase «Il pleut» is a statement in the French language. In
the statements “It is possible to say ‘It is raining’ in French” and “«Il pleut» is French for
‘It is raining’ ”, we are using English as metaFrench.
For the kind of discussions we will be having, it is essential to distinguish between expressions
of the object language and statements made in the metalanguage. We will be dealing with
the language of mathematics and languages which express parts of mathematics. These
languages of course contain symbols and expressions which are the same as the ones we use
in the metalanguage, providing a rich source of confusion.
For example, suppose we are dealing with a language which contains symbols for the nu-
merals 0, 1, 2, . . . , + for addition and = for equality. What are we to make of this?
2+2=4
What we do in everyday English, if we are being careful, is use quotes, something like this:
“2 + 2 = 4” is a theorem.
Sprinkling mathematics with quote marks like this does not work very well: it quickly
becomes unreadable. Another way to deal with the problem is to use a noticeably different
font. In these notes I will use a different font colour, so that expressions in the formal
language are easily distinguished — that accounts for the green-coloured symbols in the
text above.
Also, because equality and inequality of expressions as strings is often discussed, I will
occasionally use a special symbol for this, a heavy circled equals sign: X ⊜ Y means that
the strings X and Y are identical.
When we look at important formal languages in detail, we will find that they are enormously
long-winded. So we introduce many abbreviations to make them easier to read and deal
with. We have already done so with our simple i − a − b language with the abbreviation
a³ for aaa. Such abbreviations will also be written in the “formal language font colour”. In
discussing that simple theory, we also used expressions such as Xa³Y . Here the X and Y
are not actually symbols of the language; but they stand for strings in the language. We
write them in the formal font colour because they represent something that is entirely in
the formal language. Discussions above of strings X and Y follow the same pattern.
So here is the general rule followed in these notes:
• Anything that is entirely in the formal font colour is, or represents, a string in the
formal language itself.
• Anything that is entirely in any other colour is in the metalanguage (or just generally
rabbiting on).
• Anything that is a mixture of colours is in the metalanguage, discussing the indicated
bits of the formal language.
Note that the symbols = and ≠ are never part of the formal language. They belong to our
metalanguage — just like the symbols ⊢ and ⊜ discussed earlier.
So, excuse me if I go on a bit about this coloured font business. The point of using it is to
make clear, as far as possible, what is an expression in the formal language under discussion
and what is not. For example, when writing about the toy formal language of Section A.2,
I wrote strings of that language, such as aaa and bababa, in the formal font colour.
On the other hand, supposing that we give the toy language a name, T say; that would not
be coloured green. It is not a formal expression, nor does it represent one: it represents the
language itself.
I have already pointed out that, in statements such as
x > 0 ⊢ x + y > y
there are two formal expressions connected by a metasymbol. The entire statement is not
a formal expression. Where colour coding is used, the whole statement must be coloured
green to be a formal expression.
I will try to use this colour convention as consistently as possible throughout these notes,
except for Chapters 6 and 7. Virtually everything that looks formal in those chapters is
formal (or semiformal), so the colouring would just be a nuisance there.
[The prefix “meta-” is used a lot in various contexts, often sloppily. As used in these
notes it has very little to do with meta-analyses (referring to clinical trials) or meta-
narratives (in postmodern discourse) or more or less any other popular use of the
prefix.]
C Comparing formal theories
If we want to discuss different theories or sets of axioms, while keeping the same language
and set of rules, the ideas of the previous section are usually quite adequate. However
sometimes we want to compare different theories which have different languages or different
sets of rules (or both). Two common cases are
(1) Some new symbols are added to the language, thus extending the language, and with
it usually the theory. The rules don’t change, but (if the theories are axiomatic) usually
some new axioms are added to define the behaviour of the new symbols.
(2) The two languages are completely different, but the theories say the same things in
those different languages. We say that the two theories are equivalent.
In the particular case in which the underlying languages are the same, we say that we have
a simple extension.
Note that, in the case that A and B are axiomatic theories, we do not require that all the
axioms of A must also be axioms of B. However, the axioms of A must at least be theorems
of B, since we know that they are theorems of A.
(We could also of course consider what might happen if we change the rules as well. We
don’t bother here, because that is hardly ever done.)
The following result can be useful in proving that one axiomatic theory is an extension of
another.
C.2 Proposition
Let A and B be two axiomatic theories. Then B is an extension of A if and only if
(i) the underlying language of A ⊆ the underlying language of B, and
(ii) every axiom of A is a theorem of B.
Proof. Suppose first the definition holds. Then Part (i) of this proposition holds because
the definition says so and Part (ii) holds because every axiom of A is a theorem of A.
Conversely, suppose that Parts (i) and (ii) of this proposition hold. Then Part (i) of the
definition is immediate, and Part (ii) of the definition follows from Proposition B.15(v).
ϕ:A → B and ψ : B → A.
Even more carefully, these functions are from the underlying language of A to that of B and
vice versa.
But we want more than that. Given an expression A in A, if we translate it into B and
then translate back again we might expect to get the expression we started with; that is,
ψ(ϕ(A)) = A. Well, this turns out to be too much to ask for in general; in fact this is not
going to work even for natural languages. As an example, suppose we had translations
between English and French.
Now French has no single-word translation for the English word “shallow”; it would normally
be translated as “pas profond”. Then translating this back again into English would normally
yield something like “not deep”. Thus we would have
shallow → pas profond → not deep .
The double translation brings us back, not to the same thing, but at least to something
which means the same. We will see that much the same thing happens in formal theories and
that the best we can expect is that the double translation back and forth yields something
equivalent to the original. Thus we have
(1) For any expression A in the underlying language of A, A ⊣⊢ ψ(ϕ(A)) (in A).
(1′) For any expression B in the underlying language of B, B ⊣⊢ ϕ(ψ(B)) (in B).
(2) If H1, H2, . . . , Hk ⊢ A (in A), then ϕ(H1), ϕ(H2), . . . , ϕ(Hk) ⊢ ϕ(A) (in B).
(2′) If K1, K2, . . . , Kk ⊢ B (in B), then ψ(K1), ψ(K2), . . . , ψ(Kk) ⊢ ψ(B) (in A).
It is a straightforward, if tedious, exercise to prove that equivalence of formal theories, as
just defined, is an equivalence relation in the ordinary sense.
C.5 Note
Part (2) of the theorem, in the case k = 0, tells us that if A is a theorem of A, then ϕ(A) is
a theorem of B. In the same way, if B is a theorem of B, then ψ(B) is a theorem of A.
C.6 Proposition
Let A and B be two axiomatic theories. Then A and B are equivalent if and only if there
are functions ϕ : A → B and ψ : B → A (that is, between their underlying languages)
with these properties:
(1) For any expression A in the underlying language of A, A ⊣⊢ ψ(ϕ(A)) (in A).
(1′) For any expression B in the underlying language of B, B ⊣⊢ ϕ(ψ(B)) (in B).
Proof. Obvious.
Note that, since the languages are different, the rules will probably have to be different too.
Remark This proposition looks more complicated than the definition, however it is useful
because its conditions are usually easier to check than those of the definition. In particular,
for many theories based on logic, the rules are the same for both theories and only the
axioms differ; in this case Conditions (3) and (3′) hold automatically.
2. SENTENTIAL LOGIC
In this chapter, the next one and Chapter 6 we will develop Logic and then Mathematics
as axiomatic theories. The definitions will be made as simple as possible. There are several
reasons for this.
Secondly, we are going to prove things about Logic and Mathematics. Some of these things
are quite startling, some have occasioned major shifts in the way mathematicians and Lo-
gicians think about their subjects. So it is important that the proofs of these facts are as
watertight as possible, and to do this requires that the subject matter, Logic or Mathe-
matics, be defined precisely. To do this it helps a lot to make the definitions as simple as
possible — but no simpler.
So we head towards developing a formal theory for Mathematics in Chapter 6. It will use
a surprisingly small number of symbols and so the underlying language will be correspond-
ingly simple. Nevertheless, it is designed so that everything that can be said in Logic or
Mathematics can be said in this language, though usually in an enormously long-winded
and obscure way. In the same way, anything that can be proved in Logic or Mathematics
can be proved in our formal theory, though at the expense of long, tedious and extremely
detailed proofs.
Having made our clean and simple definition, we will allow ourselves to introduce new
symbols and methods of proof to increase the convenience of the theory — shorter and more
legible expressions, smarter and easier to understand proofs and so on. As we progress, the
theory will become more and more like Logic and Mathematics as they are usually written
(though perhaps with a little more attention to detail) until eventually we can see how any
logical or mathematical proof can be performed in our defined language.
For safety’s sake (and in order to be able to describe what is going on) it is best to think of
this as two theories. Firstly there is the stripped-down one we are about to define in terms
of expressions, axioms, rules and so on; this I will call the fully-formal theory. We won’t
mess with this one — after it has been defined, we will add no symbols or methods of proof
at all. Then we will have the more convenient theory, got by starting with the fully-formal
one, but adding new symbols and methods of proof as we go along. This I will call the
semi-formal theory.
Imagine trying to write out a proof of a theorem from First-Year Calculus (the Intermediate
Value Theorem say) using only the facilities of the fully-formal language. Such a (fully-
formal) proof was defined in 1.B.8, and involves starting from the axioms and proceeding by
the (tiny) steps allowed by the formal rules of the theory. It would have to contain the whole
necessary infrastructure — the definition of the Reals, with its algebra, infs and sups, lots of
elementary stuff about continuous functions, ideas about the natural numbers and induction
and so on and on. If it is not obvious already that such a proof would be stupendously
long, the Deduction Theorem (proved later in this chapter) will make it so; it certainly wouldn't
fit into a fat book. And then, when you move on to the next theorem (the Heine-Borel
Theorem say), you would have to write out the entire infrastructure all over again. This is
clearly not the way mathematicians actually proceed. For one thing (among many), they
build upon theorems already proved and so avoid re-inventing the wheel over and over again.
We have already made a start in allowing this facility. Proposition 1.B.15(vi) says that we
can use a set H of theorems already proved in the proof of a new one C. Well, we do not
add this new method of proof to our fully-formal theory; we leave that in its pristine state.
We add it as an allowed method of proof to our semi-formal theory, and then the corollary
tells us that anything that can be proved in our semi-formal theory can also be proved in
the fully-formal one.
This is the way we will proceed. We will make many incremental changes to our semi-formal
theory. Any time we change the underlying language, for instance by adding a new symbol,
we will show that
whatever can be expressed in the newly extended semi-formal language can also be expressed
in the formal one.
In the same way, if we add a new proof method to the semi-formal theory, we show that
anything that can be proved in the newly extended semi-formal language can also be proved
in the formal one.
Since we will be growing our semi-formal theory by many small incremental changes, the
easiest way of doing this is by showing, for each such change, that whatever can be expressed
or shown after the change could already be expressed or shown before it.
Sentential Logic can be described as the logic of sentences, that is, statements which are
simply true or false (but not both), such as 2 + 2 = 4 (true) or 2 + 2 = 31 (false). Predicate
Logic extends this by adding the idea of variables, so we can deal with expressions such
as x + y < z − 1, for which the truth or falsity depends upon the values of the variables
involved.
For Sentential Logic, we combine sentences with the operations ¬ (NOT), ∧ (AND), ∨ (OR),
⇒ (IMPLIES) and ⇔ (IF AND ONLY IF). These operations are usually called connectives.
Here I have given the symbols that will be used for them in these notes in the formal theories
and the names that are usually used for them.
The symbols used here are becoming the ones most commonly used in formal logic;
however be aware, if you consult other books on logic, that you may find other symbols
being used. Common alternatives are ∼ for NOT and either ⊃ or → for IMPLIES.
Some care has to be taken with the words NOT, AND, OR, IMPLIES, IF as their use
in Logic is rather different from their use in the ordinary English language. For a start,
it is important to realise that when sentences are combined using these operations, the
truth-value (truth or falsity) of the combined sentence is determined by the truth-values
of the simpler sentences from which it is constructed, and nothing else. It might help in
understanding these operations to think of them as follows: for any sentences X and Y ,
¬X (NOT X) is true when X is false and false when X is true.
X ∧Y (X AND Y ) is true when both X and Y are true, false otherwise.
X ∨Y (X OR Y ) is true when at least one of X and Y is true, false otherwise.
X⇒Y (X IMPLIES Y ) is only false in the case that X is true and Y false;
it is true otherwise.
X⇔Y (X IFF Y ) is true when X and Y have the same truth-value
(i.e. both true or both false).
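The table can be transcribed directly into Boolean functions; a Python rendering, for
concreteness:

    def NOT(x): return not x
    def AND(x, y): return x and y
    def OR(x, y): return x or y
    def IMPLIES(x, y): return (not x) or y   # false only when x true, y false
    def IFF(x, y): return x == y

    for x in (True, False):
        for y in (True, False):
            print(x, y, "->", IMPLIES(x, y))
    # True True -> True;   True False -> False;
    # False True -> True;  False False -> True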
Warning I have found that mathematics students often use the symbol ⇒ to mean
“therefore” in the course of a proof, like this:
a line of proof
⇒ another line of proof
What they are (usually) trying to say here is that the first line is true and the second
follows from that, so they are both true. But this is not what the ⇒ symbol means — see
the description above. The point is, that “implies” and “therefore” are two different ideas,
though they are related of course. If you really want to use a quick symbol for “therefore”,
there is a perfectly good one:
a line of proof
∴ another line of proof
Please do not misuse the ⇒ symbol in your writings for this course: it is used very frequently
in its proper meaning and misuse can cause much confusion.
For those who like to use TeX to typeset mathematics, this symbol can be set using
the unsurprising command \therefore
A.4 Comments
(1) What happened to the other connectives, ∧, ∨ and ⇔? They have disappeared in our
program of simplifying the (fully-formal) language. All these other connectives of Sentential
Logic are redundant, since they can be expressed in terms of the two above, for example
p ∨ q is the same as (¬p) ⇒ q. We will shortly re-introduce them in the semi-formal theory.
(2) It is helpful to think of the proposition symbols as standing for sentences (otherwise
known as propositions). For instance, if p stands for the sentence 2 + 2 = 4 and q for the
sentence 3 + 5 = 42 then p ∧ q stands for 2 + 2 = 4 AND 3 + 5 = 42 (which is false because
q is).
(3) What do we mean by “. . . ” for the proposition symbols? We assume that there is a
countably infinite number of proposition symbols. Does this make sense? Can we actually
have an infinite number of different symbols, each of which is recognisably different from
every other one? Yes we can: for example, we could simply use two symbols, say p and ′ ,
which would give us our unending set of proposition symbols p , p′ , p′′ , p′′′ , . . .
In a sense the last suggestion is the simplest of all, so why don’t we make this part of the
original definition, in keeping with our program? We want the language to be simple in the
sense that it will be easy to define and easy to prove things about it. Allowing ourselves
an infinite number of proposition symbols makes the language easier to read and
write and does not in fact put any extra difficulties in our way. Indeed, there is really no
problem here — while we normally think of the symbols as being characters written down,
it was pointed out in the previous chapter that they can come from any kind of set at all.
So we see that there are in fact many versions of the language SL, depending upon which
particular set of symbols we choose to use. But the difference between these versions of the
language is entirely trivial, so we ignore it and speak of the language SL.
In these notes I will use lower-case letters from the ordinary English alphabet for proposition
symbols. Expressions will usually be represented by upper-case letters.
A.5 Expressions
(S1) Any proposition symbol (by itself) constitutes an expression.
(S2) If A is an expression, then so is (¬A).
(S3) If A and B are expressions, then so is (A ⇒ B).
A.6 Examples
For a start the proposition symbols p, q, r and so on are all expressions.
From these, using (S2) and (S3) we can construct more expressions such as (¬p) and (q ⇒ r).
Then we can keep going, thus ((¬p) ⇒ (q ⇒ r)) and ((¬(q ⇒ p)) ⇒ ((¬(¬p)) ⇒ (q ⇒ (¬p)))) .
A.7 Comments
(1) This definition should perhaps be put slightly more carefully. It is meant to state
that expressions can be built up from simpler ones in this way, and only in this way. One
can think of this as a set of construction rules, telling one how it is allowed to build up
expressions. This way of looking at it does not make it obvious that we have a decision
algorithm here. Perhaps a better approach is to notice that, in the second and third rules,
the new expression is longer than the one(s) from which it is formed. We can therefore
reword the definition in a recursive fashion: an expression is either a proposition symbol
standing alone, or of the form (¬A) where A is a (shorter) expression, or of the form
(A ⇒ B) where A and B are (shorter) expressions. Since the components are always shorter
than the whole, this does give a decision algorithm (there is a programmed sketch of it at
the end of these comments).
(2) The rules introduce more parentheses than we usually use. However if the parenthe-
ses in (S3) were omitted, for instance, we would be allowed to write p ⇒ q ⇒ r, which is
meaningless (since implication is not associative). Our normal use of parentheses depends
on complicated rules of precedence in the application of operators, which presumably we
could include in our rules, at the expense of making our definition much more complicated.
The easier approach is to insist on a complete set of parentheses every time, at the expense
of legibility — for example, the expression we would normally write (p ⇒ ¬p) ⇒ ¬p must
now be written ((p ⇒ (¬p)) ⇒ (¬p)).
(3) In our semi-formal version, we will of course allow ourselves to dispense with paren-
theses according to the usual conventions. Shortly we will introduce the other connectives
into our semi-formal language as abbreviations for the appropriate constructions in the fully-
formal language. For instance, when in the semi-formal language we write p ∨ q we are only
using a convenient shorthand for the fully-formal ((¬p) ⇒ q).
Shortly we will define formal equivalents for the semi-formal p ∧ q and p ⇔ q. In the mean-
time, you might be interested in trying to figure them out for yourself.
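For readers who know a little programming, here is the promised sketch of the decision
algorithm, in Python. It is my own illustration, not part of the formal development, and it
assumes that proposition symbols are single lower-case letters and that expressions carry
their full complement of parentheses:

    def is_expression(s: str) -> bool:
        s = s.replace(" ", "")            # ignore spacing
        # (S1) a proposition symbol by itself
        if len(s) == 1:
            return s.isalpha() and s.islower()
        # (S2) (¬A) for some shorter expression A
        if s.startswith("(¬") and s.endswith(")") and is_expression(s[2:-1]):
            return True
        # (S3) (A ⇒ B): look for a ⇒ at parenthesis-depth 0 inside the outer pair
        if s.startswith("(") and s.endswith(")"):
            inner, depth = s[1:-1], 0
            for i, ch in enumerate(inner):
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                elif ch == "⇒" and depth == 0:
                    return is_expression(inner[:i]) and is_expression(inner[i + 1:])
        return False

    # is_expression("((¬p) ⇒ (q ⇒ r))")  ->  True
    # is_expression("(p ⇒ q ⇒ r)")       ->  False

Notice that in a well-formed expression the function can find at most one ⇒ at depth 0;
that is uniqueness of parsing (discussed below) at work.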
That completes the definition of the language. Of course we have a theory in mind, namely
the set of all tautologies, the logical formulas which are always true, irrespective of the
truth-values of the proposition symbols involved. Some examples:
p ∧ q ⇔ q ∧ p
(p ⇒ (q ∧ ¬q)) ⇒ ¬p
¬(p ∧ q) ⇔ (¬p) ∨ (¬q) .
[These are, of course, presented in semi-formal version. Try translating them into fully-formal
language to see why we don’t want to use it much.]
Comment (1) above tells us how we are allowed to build up expressions. But there is
another aspect to this: any particular expression can be built up in only one way. Putting
this another way, given an expression X, we can break it down into its component parts
in only one way. This is important, because it means that any given expression cannot be
understood to mean two different things; our expressions are not ambiguous.
This property of expressions, known as uniqueness of parsing, relies on the way we have
included parentheses in the definition. If we leave out parentheses, for example write
p ⇒ q ⇒ r then the result is indeed ambiguous: it could mean either (p ⇒ q) ⇒ r or
p ⇒ (q ⇒ r), and these are two quite different things.
More precisely: for any expression X, exactly one of the following holds.
(1) X is a proposition symbol standing alone.
(2) X is of the form (¬A) for some (shorter) expression A; and in this case there is only
one such possible A.
(3) X is of the form (A ⇒ B) for some (shorter) expressions A and B and in this case
there is only one possible A and one possible B.
I will not give the rather tiresome proof of this here. However, to point out that there is
definitely something to be proved, consider the expression
If you want to prove it for yourself, let me encourage you. Here are some hints. For any
expression, assign numbers to all the symbols in it by starting at the beginning (left-hand
end) with 0, adding 1 every time you encounter an opening parenthesis and subtracting 1
every time you pass a closing parenthesis, thus:
1 2 2 2 3 4 4 4 4 4 3 3 3 2 1 2 3 3 3 4 4 4 4 4 3 2 3 3 3 3 3 2 1
( ( p ⇒ ( ( p ⇒ p ) ⇒ p ) ) ⇒ ( ( p ⇒ ( p ⇒ p ) ) ⇒ ( p ⇒ p ) ) )
Notice that opening and closing parentheses are treated slightly differently: you add one as
you reach an opening parenthesis, but subtract one just after you pass a closing parenthesis.
If you now erase all the little numbers, except for the ones over parentheses, it is clear how
the parentheses pair up: each closing parenthesis matches the most recent opening
parenthesis which carries the same number.
Now all you have to do is work out how this numbering of parentheses defines the way the
expression has been put together, then prove that it works for all expressions. Ha! (If you
study, or have studied, Computer Science, this proof should be very familiar to you.)
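Here is the numbering process as a few lines of Python (again my own sketch, not part of
the notes), in case you want to experiment:

    def paren_numbers(expr: str):
        # Add 1 on reaching an opening parenthesis; subtract 1 just after
        # passing a closing one; every symbol gets the current number.
        numbers, n = [], 0
        for ch in expr.replace(" ", ""):
            if ch == "(":
                n += 1
                numbers.append(n)
            elif ch == ")":
                numbers.append(n)
                n -= 1
            else:
                numbers.append(n)
        return numbers

    # paren_numbers("((p ⇒ p) ⇒ p)")  ->  [1, 2, 2, 2, 2, 2, 1, 1, 1]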
To prove that some property is true of every expression, then, it is enough to show:
(1) it is true of each proposition symbol (by itself);
(2) for any expression A, if it is true of A then it is also true of (¬A);
(3) for any expressions A and B, if it is true of both of them then it is also true of
(A ⇒ B).
This process is called induction over the construction of the expression. We will be using it
often.
(SL1) (A ⇒ (B ⇒ A))
(SL2) ((A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C)))
(SL3) (((¬A) ⇒ (¬B)) ⇒ (((¬A) ⇒ B) ⇒ A))
As before, to say that these are axiom schemata means that each represents an infinite
number of actual axioms, created by substituting any actual expressions for the symbols A,
B and C used here. For example, as instances of (SL1) are the actual axioms
(p ⇒ (q ⇒ p))
(p ⇒ (p ⇒ p))
(((¬q) ⇒ (¬p)) ⇒ ((p ⇒ (¬p)) ⇒ ((¬q) ⇒ (¬p))))
The axioms are a little easier to comprehend if we remove some parentheses and write them
semi-formally:
(SL1′) A ⇒ (B ⇒ A)
(SL2′) (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))
(SL3′) (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)
(There are a few other visual clues here too — extra space and some larger parentheses.)
            A , (A ⇒ B)
(MP)        ────────────
                 B
that is, given A and (A ⇒ B) one can deduce B. The rôle of rules in defining theories and
proofs has been discussed in the previous chapter.
However, if we allow ourselves to look forward to this notation, we can rewrite the three
axioms thus:
(SL1″) A ∧ B ⇒ A
(SL2″) (A ∧ B ⇒ C) ∧ (A ⇒ B) ⇒ (A ⇒ C)
(SL3″) (¬A ⇒ ¬B) ∧ (¬A ⇒ B) ⇒ A
When we move on to Predicate Logic in the next chapter, setting up the axioms in this
manner becomes very messy. So we won’t pursue this any further.
(v) It is possible to define this theory semantically; we will look at this later. Also
the “truth table” method yields a decision process without too much trouble: an algorithm
which will decide whether any given expression is a theorem or not. Once we have checked
this, we will have proved that the theory is decidable.
It is in fact usually easier to check whether an expression in SL is a theorem or not by using
the truth-table method than by looking for a semi-formal proof. However, there is no such
nice procedure when we move on to Predicate Logic, so we need to prepare for that already.
That is: Predicate Logic contains Sentential Logic, so if we don’t do SL by proofs now, we
will just have to do it by proofs in the next chapter.
B.1 Proposition
(p ⇒ p)
Proof
((p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)))
(p ⇒ ((p ⇒ p) ⇒ p))
((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p))
(p ⇒ (p ⇒ p))
(p ⇒ p)
That is the full formal proof. It is extremely unfriendly to human readers — on the other
hand a computer would have little trouble with it. Below it is presented again in a less
formal form. Parentheses have been removed, some spacing has been inserted, line numbers
have been added, and a gloss on the right indicating which rules or axioms have been used.
1   (p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p))     Ax 2
2   p ⇒ ((p ⇒ p) ⇒ p)                                    Ax 1
3   (p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)                              MP: 1 and 2
4   p ⇒ (p ⇒ p)                                          Ax 1
5   p ⇒ p                                                MP: 3 and 4
In Line 1, the gloss “Ax 2” means that this is an instance of Axiom Schema (SL2) and, in
line 3, the gloss “MP: 1 and 2” means that this is the result of applying Modus Ponens to
Lines 1 and 2.
With this proof and the next one, it is a very good idea to check it through line by line and
make sure you understand just how each line is an instance of an axiom or an application
of MP, as claimed.
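As an aside for programmers: checking such a proof really is mechanical. Here is a Python
sketch of the Modus Ponens part of a checker (my own illustration; the tuple representation
is an invention for this sketch, and recognising axiom instances is left out). Expressions are
nested tuples, ('¬', A) or ('⇒', A, B), with strings for proposition symbols:

    def check_proof(lines):
        # lines: list of (formula, justification) pairs, in order.
        # justification is ('ax',) or ('hyp',) — both unchecked here —
        # or ('mp', i, j): Modus Ponens applied to earlier lines i and j.
        for n, (formula, just) in enumerate(lines):
            if just[0] == 'mp':
                i, j = just[1], just[2]
                if not (i < n and j < n):
                    return False          # may only use earlier lines
                a = lines[i][0]
                if lines[j][0] != ('⇒', a, formula):
                    return False          # line j must be (a ⇒ formula)
        return True

    # the deduction p , p ⇒ q ⊢ q :
    # check_proof([('p', ('hyp',)),
    #              (('⇒', 'p', 'q'), ('hyp',)),
    #              ('q', ('mp', 0, 1))])   ->  True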
Because the axioms and the rule are all schemata, it follows that we can substitute any
expression for p in the above theorem and its proof and we will still have a valid theorem
and proof. In other words, we immediately get a schematic version of the theorem:
B.2 Proposition
A⇒A
(Here A stands for any expression.)
B.3 Proposition
p ⇒ q , q ⇒ r ⊢ p ⇒ r
Proof.
1 (q ⇒ r) ⇒ (p ⇒ (q ⇒ r)) Ax 1
2 q⇒r Hyp
3 p ⇒ (q ⇒ r) MP: 1 and 2
4 (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r)) Ax 2
5 (p ⇒ q) ⇒ (p ⇒ r) MP: 3 and 4
6 p⇒q Hyp
7 p⇒r MP: 6 and 5
Recall that, in a proof of a deduction, we are allowed to write down any one of the hypotheses
as a step at any time. Here I have done this in Steps 2 and 6.
As with Theorem B.2, we could just as easily have proved a schematic version:
B.4 Proposition
A ⇒ B , B ⇒ C ⊢ A ⇒ C
We can make this kind of substitution in any theorem and proof in SL. Since a schematic the-
orem is more general than an ordinary one, we will present and prove theorems in schematic
form throughout this chapter.
B.5 Proposition
Another way of looking at the axioms.
(SL1d) A ⊢ B ⇒ A
(SL2d) A ⇒ (B ⇒ C) ⊢ (A ⇒ B) ⇒ (A ⇒ C)
(SL2d′) A ⇒ (B ⇒ C) , A ⇒ B ⊢ A ⇒ C
(SL3d) ¬A ⇒ ¬B , ¬A ⇒ B ⊢ A
Proof. (SL1d)
1 A hyp
2 A ⇒ (B ⇒ A) Ax SL1
3 B⇒A MP: 1 and 2
(SL2d)
1 A ⇒ (B ⇒ C) hyp
2 (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C)) Ax SL2
3 (A ⇒ B) ⇒ (A ⇒ C) MP: 1 and 2
(SL2d′)
1 A ⇒ (B ⇒ C) hyp
2 A⇒B hyp
3 (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C)) Ax SL2
4 (A ⇒ B) ⇒ (A ⇒ C) MP: 1 and 3
5 A⇒C MP: 2 and 4
(SL3d)
1 ¬A ⇒ ¬B hyp
2 ¬A ⇒ B hyp
3 (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A) Ax SL3
4 (¬A ⇒ ¬B) ⇒ A MP: 1 and 3
5 A MP: 2 and 4
You will notice that all these follow in the same way from the axioms, using Modus Ponens
to justify replacing an implication (⇒) with a turnstile (⊢).
C Deductions
C.1 Using deductions as rules
One important thing about deductions is that, once proved, we can henceforth use them
in the same way as rules (in our semi-formal theory). For example, having proved the last
deduction above (Proposition B.4), we can henceforth proceed as though we had the new
deductive rule
        A ⇒ B , B ⇒ C
        ──────────────
            A ⇒ C
What we are doing here is extending our semi-formal language, and making it more con-
venient. We are not changing the formal language in any way. Using the deduction just
proved, for instance, we now allow ourselves to put a line in one of our semi-formal proofs
of the form
A⇒C By Proposition B.4
provided we already have two lines A ⇒ B and B ⇒ C somewhere. The justification for
this new freedom is that there is a recipe for converting such a proof-step into a full formal
version for the object language. Here it is:
Replace the line A ⇒ C in the semi-formal version with the entire proof of the de-
duction B.4, as given above. The result is a valid fully-formal proof. The new lines
added to the proof include A ⇒ C, and so the new proof contains all the lines of the
old proof in the same order. Therefore all the old lines are still valid, for the same
reasons (except for A ⇒ C itself, for which we need a proper reason). We now justify
each of the new steps in the proof (and that includes A ⇒ C). If a new step was an
axiom, then it still is, so it is valid. If a new step followed by Modus Ponens from
two earlier steps in the proof of the deduction, then these earlier steps have just been
added to our new formal proof: the step being examined is valid. Finally, if the new
step was a hypothesis of the deduction, then it is a repetition of an earlier step of the
old proof, and this is also valid, as remarked in 1.B.11.
If H ⊢ C then ⊢ H ⇒ C.
This direction is the important one, and the harder to prove. Notice that it tells us that, in some sense,
the symbol ⇒ says, within the formal language, what the symbol ⊢ says outside it (in
the metalanguage). This corresponds to what we are used to: that one can do formal
manipulations with the symbol ⇒, yet it can also be interpreted as saying that one thing
can be deduced from another. For our present purposes it is important to keep the two ideas
distinct, though the Deduction Theorem states the close relationship between them.
Proof. We assume that we have a proof of H ⊢ C; let’s call this the “old proof”. Using
this we will construct a proof of ⊢ H ⇒ C, the “new proof”. We start by putting “H ⇒”
in front of every line of the old proof:
This new proof is not finished yet. Next we are going to inject some extra steps into the
new proof, a few before each of the statements just created, like this:
For each i, the extra steps Xi we insert depend upon what sort of step Si was in the old
proof. From the definition of a deduction, each step of the old proof is either an axiom, the
hypothesis H or follows from two earlier steps by modus ponens. Let us take these cases
one at a time.
Suppose first that the step in the old proof is an axiom, A say. Then, in the new proof we
replace it by A , A ⇒ (H ⇒ A) , H ⇒ A, so that part of the two proofs looks like this:
Old proof        New proof
    ⋮                ⋮
    A                A
                     A ⇒ (H ⇒ A)
                     H ⇒ A
    ⋮                ⋮
The first step here is OK because it is an axiom. The second step is also an axiom — an
instance of Axiom SL1. And now the third step follows from these two by Modus Ponens.
Now suppose that the step in the old proof we are fixing is the hypothesis H.
Old proof        New proof
    ⋮                ⋮
    H                H ⇒ H
    ⋮                ⋮
The new step H ⇒ H is valid here because it is a theorem we have already proved
(Proposition B.2), so we may use it as a step.
Finally, suppose that the step being fixed followed by Modus Ponens from two earlier steps
of the old proof, A and A ⇒ B say, so that the step itself is B. Those earlier steps have
already been fixed, becoming (1) H ⇒ A and (2) H ⇒ (A ⇒ B), and we replace the line
H ⇒ B in the new proof by the three steps
(3) (H ⇒ (A ⇒ B)) ⇒ ((H ⇒ A) ⇒ (H ⇒ B))
(4) (H ⇒ A) ⇒ (H ⇒ B)
(5) H ⇒ B
This is not as bad as it looks. Steps (1) and (2) are the ones we have already fixed. The
horrible Step (3) is simply an instance of Axiom SL2. Steps (4) and (5) now follow by Modus
Ponens.
Finally observe that the last step in the old proof was C, so the last step in the new one is
H ⇒ C, as required.
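The proof just given is itself a recipe, and programmers may find it clearer written as a
program. Here is a Python sketch of the conversion (my own illustration, using the nested-
tuple representation from the earlier sketch; the helper proof_of_self_implication, which
should return the formal proof of H ⇒ H provided by Proposition B.2, is an assumed
ingredient passed in as a parameter):

    def imp(a, b):
        return ('⇒', a, b)

    def deduction_transform(old, H, proof_of_self_implication):
        # old: list of (S, kind); kind is 'ax', 'hyp' (the hypothesis H),
        # or ('mp', i, j) meaning S follows from lines i and j = (Si ⇒ S).
        # Returns the list of formulas of the new proof of H ⇒ C.
        new = []
        for S, kind in old:
            if kind == 'ax':
                new += [S, imp(S, imp(H, S)), imp(H, S)]   # axiom, SL1, MP
            elif kind == 'hyp':                            # S is H itself
                new += proof_of_self_implication(H)        # ends with H ⇒ H
            else:                                          # ('mp', i, j)
                A = old[kind[1]][0]
                new += [imp(imp(H, imp(A, S)),
                            imp(imp(H, A), imp(H, S))),    # instance of SL2
                        imp(imp(H, A), imp(H, S)),         # MP
                        imp(H, S)]                         # MP
        return new

The last formula produced is H ⇒ C, just as the theorem requires.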
Oh yes, it is an “if and only if” theorem. We should also prove the reverse form, that
If ⊢ H ⇒ C then H ⊢ C.
This is easy:
H hyp
H⇒C theorem
C MP
We now look at the more general form of the theorem, in which there may be a few extra
hypotheses.
Proof. This is a straightforward extension of the proof just given. There is only one more
case to consider: where the step in the old proof is one of the extra hypotheses Ki (that is
a perfectly valid step in such a proof). Then the corresponding step in the new proof which
needs fixing is H ⇒ Ki . This is left as an exercise.
Secondly, notice that we are using the method of incrementally improving our semi-formal
language. Our first increment was to show that we could use previously proved theorems as
steps in proofs. Now we add the Deduction Theorem as our second incremental improvement.
Note that our recipe above does not convert the given proof into a fully formal one; it only
converts into a “first increment” one (because we appeal to the H ⇒ H theorem). But that’s
OK because we already know that such a proof works.
H1 , H2 ⊢ B if and only if ⊢ H1 ⇒ (H2 ⇒ B) .
And, as mentioned several times already, when we come to defining ∧, we can write these
in a form that is easier to understand:
H1 ∧ H2 ⇒ B
H1 ∧ H2 ∧ H3 ⇒ B .
There is no particular reason to suppose that there is only a finite number of extra hypotheses
in the above version of the Deduction Theorem. Any set of extra hypotheses can be used:
C.6 Example
Earlier in this chapter, we proved that
A ⇒ B ⊢ (B ⇒ C) ⇒ (A ⇒ C)
and that
⊢ (A ⇒ B) ⇒ ((B ⇒ C) ⇒ (A ⇒ C)) .
It is a useful exercise to use the recipe given above to construct fully-formal proofs for the
last two statements. It then becomes clear what a great labour-saving device the Deduction
Theorem is.
A ⇒ (B ⇒ C) ⊢ B ⇒ (A ⇒ C) . (–2)
Now the expression to the right of the turnstile is again of the form X ⇒ Y , so we repeat the
operation:
A ⇒ (B ⇒ C) , B ⊢ A ⇒ C . (–3)
And again:
A ⇒ (B ⇒ C) , B , A ⊢ C . (–4)
The point of these manipulations is, firstly, that (–1), (–2), (–3) and (–4) are all equivalent
in the sense that if any one of them is a valid statement, then they all are. And, secondly,
that the final form (–4) is much more likely to be easy to prove than the first one — the
thing to be proved is much simpler and there are more hypotheses which can be employed.
Here is the easy proof of (–4)
1 A ⇒ (B ⇒ C) hyp
2 B hyp
3 A hyp
4 B⇒C MP: 3, 1
5 C MP: 2, 4
Having proved (–4) then (–3), then (–2), then (–1) all follow in order by direct applications
of the Deduction Theorem.
Should we be interested in the actual equivalence of these statements, the argument in the
forwards direction from (–1) to (–2) to (–3) to (–4) does not require the Deduction Theorem,
just applications of Modus Ponens. For example, to get from (–1) to (–2), we assume that
(–1) has been proved and exhibit a proof of (–2):
1 A ⇒ (B ⇒ C) hyp
2 (A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C)) Thm (–1)
3 B ⇒ (A ⇒ C) MP: 1 and 2
The rule
        A
        ───
        B
says that, in any proof in which A has already occurred as a step, it is now allowed to
write down B as a step. More precisely, in the original definition of the formal theory, one
of the given rules is that B may occur as a step in any formal proof in which A has already
occurred as a step.
Firstly, the first two are certainly equivalent; that is, if either one of them is true, then so is
the other. That is what the Deduction Theorem tells us.
Secondly, if the rule A/B holds, then it is easy to see that A ⊢ B must hold too: here is a
proof
        A        hyp
        B        by the rule A/B
However this does not work the other way around. For example, in SL, we have A ⊢ B
whenever B is an axiom, irrespective of what A is, and this is certainly not one of the given
rules of the theory (Modus Ponens is the only such rule).
E Subdeductions
In this section we introduce another extension to our proof methods. This one is a great
convenience, and we will use it a lot. We start with an example.
We can clean up the proof of A ⇒ B , B ⇒ C ⊢ A ⇒ C even further. Informally, we are
tempted to argue thus:
We are given the hypotheses A ⇒ B and B ⇒ C. Now suppose that A (a temporary hy-
pothesis). Then, from the first hypothesis we have B and then, from that and the second
hypothesis, we have C. Since we got this on the supposition A, we have proved that A ⇒ C.
It will be very useful to make this kind of subsidiary deduction part of our semi-formal
theory. Let us try to make it look more formal.
Proof 1
    1   A ⇒ B      hyp
    2   B ⇒ C      hyp
    3   A          subhyp
    4   B          MP: 3 & 1
    5   C          MP: 4 & 2
    6   A ⇒ C      Ded: 3–5
Here I have enclosed the subsidiary deduction in an inner box, and for both the main deduc-
tion and the subsidiary one have drawn a line to separate the hypotheses from the proper
proof steps. It is very important in this kind of proof to know where our subdeductions
start and end, and exactly which lines are hypotheses and which are the proper proof steps.
These boxes seem to me to be a good way of doing it.
So here we have a subsidiary deduction (Lines 3–5) based on a temporary (or subsidiary)
hypothesis A. When that subsidiary proof is finished, we can conclude that A ⇒ C. The
steps in the subsidiary proof are allowed to use any steps that came before, whether in the
subsidiary proof or the main one — for example, Line 5 makes use of Lines 4 and 2.
How do we justify this? As usual, by showing that a proof which uses such a subsidiary
deduction can be replaced by one which does not. The recipe is to make the subsidiary
proof into a separate deduction which uses all the preceding steps of the main proof as
hypotheses. Like this:
Proof 2
    1   A ⇒ B      hyp
    2   B ⇒ C      hyp
    3   A          hyp
    4   B          MP: 3 & 1
    5   C          MP: 4 & 2
Note that the proper steps of the new proof (Lines 4 and 5 in this example) are justified by
the same things as they were in Proof 1 above. Proof 2 is, of course, a proof of the deduction
A ⇒ B , B ⇒ C , A ⊢ C
and then the Deduction Theorem converts this into
A ⇒ B , B ⇒ C ⊢ A ⇒ C
as required.
Now let us prove that this works in general. Suppose then that we have a proof with a
subsidiary deduction of this form:
Here the usual rules apply: the hypotheses H1 , H2 , . . . , Hh and K can be any expressions
at all.
Each of the “main” steps S1 , S2 , . . . , Ss can be an axiom, one of the main hypotheses
H1 , H2 , . . . , Hh or follow by Modus Ponens from any of its preceding main steps Si . It
may not use K or any of T1 , T2 , . . . , Tt for the purposes of Modus Ponens.
Each of the steps T1 , T2 , . . . , Tt of the subsidiary proof can be an axiom, one of the main
hypotheses H1 , H2 , . . . , Hh or the subsidiary one K, or follow by Modus Ponens from any
of the preceding main steps Si or subsidiary ones Ti .
The process of going from the subproof to the line K ⇒ Tt is often called “discharging the
hypothesis K”.
Proof 4
    H1 , H2 , . . . , Hh        hypotheses
    S1 , S2 , . . . , Sp−1      more hypotheses
    K                           yet another hypothesis
    T1
    ⋮
    Tt
Once again, note that anything which justified one of the lines T1 , T2 , . . . , Tt in Proof 3 still
justifies that line in Proof 4, so this is then a valid proof of the deduction
H1 , H2 , . . . , Hh , S1 , S2 , . . . , Sp−1 , K ⊢ Tt
and then the Deduction Theorem gives
H1 , H2 , . . . , Hh , S1 , S2 , . . . , Sp−1 ⊢ K ⇒ Tt
(2) Similarly, the step discharging the hypothesis does not have to involve the last step
of the subproof: K ⇒ Tu , where Tu is any of the substeps, is equally valid. But again, why
do this? It only renders the last few steps of the subproof a waste of time. Except . . .
(3) It is OK to discharge the hypothesis several times, for instance
K ⇒ Tu (Step Sp )
K ⇒ Tv (Step Sp+1 )
K ⇒ Tt (Step Sp+2 )
Sp+3
(Useful if you want to show that several things follow from the same hypothesis.)
(4) More usefully, one may have multiple hypotheses for a subproof, for example:
    H1 , H2 , . . . , Hh        main hyps
    S1
    ⋮
    Sp−1
    K1 , K2 , K3                subhyp
    T1
    ⋮
    Tt
    K1 ⇒ (K2 ⇒ (K3 ⇒ Tt ))
    Sp+1
    ⋮
    Ss
Once we have the and connective to work with, the discharging step can be written
K1 ∧ K2 ∧ K3 ⇒ Tt .
(5) One can have multiple subdeductions, any number, either sequentially or nested, for
examples:
Sequential subdeductions:
    H1 , H2 , . . . , Hh        main hyps
    S1
    ⋮
    Sp−1
    K                           subhyp
    T1
    ⋮
    Tt
    K ⇒ Tt
    L                           subhyp
    U1
    ⋮
    Uu
    L ⇒ Uu
    Sp+1
    ⋮
    Ss
Nested subdeductions:
    H1 , H2 , . . . , Hh        main hyps
    S1
    ⋮
    Sp−1
    K                           subhyp
    T1
    ⋮
    L                           subsubhyp
    U1
    ⋮
    Uu
    L ⇒ Uu
    ⋮
    Tt
    K ⇒ Tt
    Sp+1
    ⋮
    Ss
E.1 Example
To some extent one can view the use of subproofs as a convenient way of folding uses of the
Deduction Theorem into one’s proofs. For example, we can prove the result discussed in
Example C.7,
(A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C))
using this technique.
1 A ⇒ (B ⇒ C) hyp
2 B subhyp
3 A subsubhyp
4 B⇒C MP: 3 and 1
5 C MP: 2 and 4
6 A⇒C Ded: 3–5
7 B ⇒ (A ⇒ C) Ded: 2–6
F Some theorems and deductions
F.1 Discussion
In this section I give a number of theorems and deductions and their proofs. This is by no
means an exhaustive list of the theorems of SL, not even a list of all the more useful ones.
There are enough theorems here, however, to supply a working basis: from this further
theorems can be proved without difficulty. The proofs illustrate the techniques discussed
above.
Do not feel you have to read all these proofs in detail — but if you are interested enough
to do so, don’t let me discourage you! One thread of this chapter, the next and Chapter 6
is to provide enough detail to show that the whole of mathematics, as usually done, can be
set up as a formal theory on the axiomatic basis provided here. The point of this being that
we can then prove things about mathematics, some of them quite surprising.
Nevertheless, the examples of proof techniques provided here are important, and you should
read and digest enough of them to understand well what is going on. A few of the more
straightforward proofs are left as exercises; you should supply these for yourself.
Consider, for example, the implication
London is the capital of France ⇒ The blue whale is the largest living creature
which
is true because the first part (about London) is false. From the truth-value table given
above, we can see that this is true whether or not the second part (about blue whales) is
true (it isn’t by the way).
Warning We will see later (Section G) that calculations involving truth-values can be
used to decide whether an expression of SL is a theorem or not. But the
fact that that technique works needs proof, and that proof requires a lot of the theorems
developed in this section. So to use truth-value calculations to verify any of the proofs in
this section would be circular; in short, they must not be used yet.
So why does one usually think of implication as requiring some sort of argument or proof?
I think that this is because, in mathematics, we seldom want to prove implications between
sentences (like, say, 2 + 2 = 4 ⇒ 2 + 3 = 5), but rather implications which are true for an
infinite number of values ((∀x)(∀y)(∀z)(x < y ⇒ x + z < y + z)). But you cannot check
the truth of an infinite number of cases one by one — you need a general argument.
F.2 Proposition
A⇒A
“If something is true, then it is true” is about as obvious as you can get. But we do need to
check that it is a theorem.
F.3 Proposition
(a) (A ⇒ B) ⇒ ((B ⇒ C) ⇒ (A ⇒ C))
(b) (A ⇒ (B ⇒ C)) ⇒ (B ⇒ (A ⇒ C))
(c) ¬A ⇒ (A ⇒ B)
(c′) ¬A , A ⊢ B
(d) A ⇒ (¬B ⇒ A)
Part (c) is the alarming one: it says that, if some expression and its negation could both
be proved, then every expression could be proved.
Put this another way: every expression in the theory becomes both true and false. The
theory then is inconsistent and probably quite useless.
We will prove in Section G that this cannot happen for SL and in Section 3.H that it cannot
happen for plain Predicate Logic, which will be reassuring. But the possibility remains that
by extending these theories by adding more symbols and axioms, as we do to make various
important formal theories, including mathematics, we may introduce an expression which
can be proved both true and false. Of course, we really want to avoid this if possible.
Proof of (b) This has already been given (twice) in C.7 and E.1.
Proof of (c) We prove (c′) and then (c) follows by two applications of the Deduction
Theorem.
1 A hyp
2 ¬A hyp
3 ¬B ⇒ A SL1d on 1
4 ¬B ⇒ ¬A SL1d on 2
5 (¬B ⇒ ¬A) ⇒ ((¬B ⇒ A) ⇒ B) Ax 3
6 (¬B ⇒ A) ⇒ B MP: 4 & 5
7 B MP: 3 & 6
F.4 Proposition
A ⊣⊢ ¬¬A
This is the result usually known as “double negation”. To say that it is not true that A is
not true is the same as saying that A is true. In other words, two negatives nested in this
way cancel each other out.
Double negation is not often used in this bare-faced way in ordinary language:
But beware of ordinary-language uses where a double negative does not cancel out, but
is used to reinforce the negativeness — or just ungrammatically. Consider I can’t get no
satisfaction by you-know-who or the (ironic?) We don’t need no education by Pink Floyd.
Or the deplorable word irregardless (where both the prefix “ir-” and the suffix “-less” should
denote negation).
1 ¬¬A hyp
2 ¬A ⇒ ¬¬A SL1d on 1
3 (¬A ⇒ ¬¬A) ⇒ ((¬A ⇒ ¬A) ⇒ A) Ax 3
4 (¬A ⇒ ¬A) ⇒ A MP: 2 and 3
5 ¬A ⇒ ¬A Proposition F.2
6 A MP: 4 and 5
And
1 A hyp
2 ¬¬¬A ⇒ ¬A Part 1 of this proof
3 (¬¬¬A ⇒ ¬A) ⇒ ((¬¬¬A ⇒ A) ⇒ ¬¬A) Ax 3
4 (¬¬¬A ⇒ A) ⇒ ¬¬A MP: 2 and 3
5 ¬¬¬A ⇒ A SL1d on 1
6 ¬¬A MP: 5 and 4
(c″) A ⇒ ¬B ⊢ B ⇒ ¬A
(d″) ¬A ⇒ B ⊢ ¬B ⇒ A
Note that (b) is the standard way of arguing by “contrapositive”. We prove “if A then B”
by showing that if B is false then A must be too. The expression ¬B ⇒ ¬A is called the
contrapositive of A ⇒ B.
(a), (c) and (d) are just variations on this idea.
Note also that (a′) can be rewritten in the form
(a‴) A ⇒ B , ¬B ⊢ ¬A
This is one of the versions of proof by contradiction (a.k.a. reductio ad absurdum). It says
that if A implies something that we know to be false, then A must itself be false. The same
manipulation can be applied to (b′), (c′) and (d′) to produce some other variations on the
proof-by-contradiction theme.
Proof of (b).
1 ¬B ⇒ ¬A hyp
2 (¬B ⇒ ¬A) ⇒ ((¬B ⇒ A) ⇒ B) Ax 3
3 (¬B ⇒ A) ⇒ B MP: 1 and 2
4 A subhyp
5 ¬B ⇒ A Proposition F.3(d) on 4
6 B MP: 5 and 3
7 A⇒B Ded: 4–6
Proof of (a)
1 A⇒B hyp
2 ¬¬A ⇒ A Proposition F.4
3 ¬¬A ⇒ B Proposition F.3(a) on 1 and 2
4 B ⇒ ¬¬B Proposition F.4 again
5 ¬¬A ⇒ ¬¬B F.3(a) on 3 and 4
6 (¬¬A ⇒ ¬¬B) ⇒ (¬B ⇒ ¬A) First part of this proof
7 ¬B ⇒ ¬A MP: 5 and 6
From now on I will not give both forms of the propositions, except where there is good
reason to. But don’t forget that they exist.
F.6 Proposition
(a) B ⇒ (A ⇒ B)
(b) ¬A ⇒ (A ⇒ B)
(c) ¬(A ⇒ B) ⇒ A
(d) ¬(A ⇒ B) ⇒ ¬B
The proofs are easy exercises. (a) of course is Axiom SL1, so needs no proof. It is included
here because it fits in with the pattern of the other parts.
F.7 Notation
Now let us introduce the other connectives. As already stated, they are not part of the
fully-formal language, but are used in the semi-formal one as abbreviations for the more
complicated expressions which they represent:
A ∧ B    is an abbreviation for    ¬(A ⇒ ¬B)
A ∨ B    is an abbreviation for    (¬A) ⇒ B
A ⇔ B    is an abbreviation for    (A ⇒ B) ∧ (B ⇒ A)
The simplest way to think of this is as three new rules. The first says that you can substitute
any expression of the form A ∧ B for one of the form ¬(A ⇒ ¬B) and vice versa (same A
and B of course). The other two are to be read the same way.
These substitutions can even be carried out on subexpressions. For instance, it is allowed
to substitute p ∧ q for the subexpression ¬(p ⇒ ¬q) inside r ⇒ ¬(p ⇒ ¬q), giving r ⇒ (p ∧ q).
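For the programmatically-minded, the unabbreviation process is a simple recursion. Here
is a Python sketch of it (my own illustration, using the nested-tuple representation of
expressions from the earlier sketches):

    def expand(e):
        # Translate a semi-formal expression into the fully-formal
        # language over ¬ and ⇒ alone, using the abbreviations above.
        if isinstance(e, str):                    # a proposition symbol
            return e
        op = e[0]
        args = [expand(a) for a in e[1:]]
        if op == '¬':
            return ('¬', args[0])
        if op == '⇒':
            return ('⇒', args[0], args[1])
        A, B = args
        if op == '∧':
            return ('¬', ('⇒', A, ('¬', B)))      # A ∧ B  is  ¬(A ⇒ ¬B)
        if op == '∨':
            return ('⇒', ('¬', A), B)             # A ∨ B  is  (¬A) ⇒ B
        if op == '⇔':                             # (A ⇒ B) ∧ (B ⇒ A)
            return ('¬', ('⇒', ('⇒', A, B), ('¬', ('⇒', B, A))))
        raise ValueError('unknown connective: %r' % (op,))

    # expand(('∨', 'p', 'q'))  ->  ('⇒', ('¬', 'p'), 'q')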
F.8 Proposition
(a) A ∧ B ⇒ A
(b) A ∧ B ⇒ B
(c) A ⇒ (B ⇒ (A ∧ B))
Proof of (a′). We have to show that ¬(A ⇒ ¬B) ⊢ A. Note that Proposition F.3(c) can
be rewritten ¬A , A ⊢ B which, by the Deduction Theorem, gives ⊢ ¬A ⇒ (A ⇒ B).
Substituting ¬B for B in this starts the proof:
Proof of (b′) (Note that this is not obvious because we have not yet proved that ∧ is
commutative.) We want to prove that ¬(A ⇒ ¬B) ⊢ B.
F.9 Proposition
(a) A ⇒ A ∨ B
(b) B ⇒ A ∨ B
These basic properties of disjunction make sense if you think of A ∨ B as saying that at least
one of A, B is true.
F.10 Proposition
(a) (A ⇒ (B ∧ C)) ⇒ (A ⇒ B)
(b) (A ⇒ (B ∧ C)) ⇒ (A ⇒ C)
(c) (A ⇒ B) ⇒ ((A ⇒ C) ⇒ (A ⇒ (B ∧ C)))
(d) (A ⇒ (B ⇒ C)) ⇒ (A ∧ B ⇒ C)
(e) (A ∧ B ⇒ C) ⇒ (A ⇒ (B ⇒ C))
These are more properties of conjunction. Again, they make sense if you think in terms of
truth-values.
(b) and (c) follow in the same way from Propositions F.8(b) and (c).
(d) and (e) together form an equivalence, so as usual there are two things to prove. First
we prove
A ⇒ (B ⇒ C) ⊢ A ∧ B ⇒ C.
1 A ⇒ (B ⇒ C) hyp
2 A∧B subhyp
3 A Proposition F.8(a) on 2
4 B⇒C MP: 3 and 1
5 B Proposition F.8(b) on 2
6 C MP: 5 and 4
7 A∧B ⇒C Ded: 2–6
Now we prove A ∧ B ⇒ C ⊢ A ⇒ (B ⇒ C). For the first time we will use a doubly-
nested subdeduction.
1 A∧B ⇒ C hyp
2 A subhyp
3 B subsubhyp
4 A∧B Proposition F.8(c) on 2 and 3
5 C MP: 4 and 1
6 B⇒C Ded: 3–5
7 A ⇒ (B ⇒ C) Ded: 2–6
F.11 Proposition
(a) ((A ∨ B) ⇒ C) ⇒ (A ⇒ C)
(b) ((A ∨ B) ⇒ C) ⇒ (B ⇒ C)
(c) (A ⇒ C) ⇒ ((B ⇒ C) ⇒ (A ∨ B ⇒ C))
(d) A ∨ B , A ⇒ C , B ⇒ C ⊢ C
Proof of (b). Recalling that A ∨ B abbreviates ¬A ⇒ B:
1 (¬A ⇒ B) ⇒ C hyp
2 B subhyp
3 ¬A ⇒ B Proposition F.3(d) on 2
4 C MP: 3 and 1
5 B⇒C Ded: 2–4
Proof of (c).
1 A ⇒ C hyp
2 B⇒C hyp
3 ¬A ⇒ B subhyp
4 ¬A ⇒ C Proposition F.3(a) on 2 and 3
5 ¬C ⇒ A Proposition F.5(c) on 4
6 ¬C ⇒ ¬A Proposition F.5(a) on 1
7 C Ax 3 as deduction on 5 and 6
8 (¬A ⇒ B) ⇒ C Ded: 3–7
(d) is an immediate corollary of (c). Parts (e) and (f) follow immediately from the
fact that A ∨ B is defined to mean ¬A ⇒ B.
F.12 Comment
The last two groups of deductions, specifically the first three of each of F.10 and F.11
encapsulate the functions of ∧ and ∨
A ⇒ (B ∧ C) ⊢ A ⇒ B                    (A ∨ B) ⇒ C ⊢ A ⇒ C
A ⇒ (B ∧ C) ⊢ A ⇒ C                    (A ∨ B) ⇒ C ⊢ B ⇒ C
A ⇒ B , A ⇒ C ⊢ A ⇒ (B ∧ C)            A ⇒ C , B ⇒ C ⊢ (A ∨ B) ⇒ C
From here on, all properties of these connectives follow from these six deductions and no
further reference to the rather strange definitions is required (though occasional proofs
straight from the definitions do turn out to be quicker). Observe the pleasing symmetry
between ∧ and ∨; this flows on to their various properties.
Proposition F.11(d) is traditionally called the Constructive Dilemma; I prefer the more
obvious name Proof by Cases, because that’s what it is. It is a very common way of arguing.
Here A and B represent the two cases, and the proposition says that if one or other of the
two cases is true and C can be proved in either case, then C is true. It is easily extended to
cope with three or more cases.
As an exercise, try proving the three-case version
A ∨ B ∨ C , A ⇒ D , B ⇒ D , C ⇒ D ⊢ D
Oops! Not so fast: the notation A ∨ B ∨ C is (so far) undefined and ambiguous. We have
a pretty good idea what it means, but this sort of notation assumes that the operation ∨
is associative and we haven’t proved that yet. We will do that, but until that time and for
the purposes of this challenge, take A ∨ B ∨ C to be convenient shorthand for (A ∨ B) ∨ C.
Note that Deductions F.8(a) to (c) can be applied to the definition of ⇔ to yield a rather
similar group of deductions involving it . . .
F.13 Proposition
(a) (A ⇔ B) ⇒ (A ⇒ B)
(b) (A ⇔ B) ⇒ (B ⇒ A)
(c) (A ⇒ B) ⇒ ((B ⇒ A) ⇒ (A ⇔ B))
And some basic properties of equivalence
F.14 Proposition
(a) A⇔A
(b) (A ⇔ B) ⇒ (B ⇔ A)
(c) (A ⇔ B) ⇒ ((B ⇔ C) ⇒ (A ⇔ C))
(c) is probably easier to understand in its deduction form:
A ⇔ B , B ⇔ C ⊢ A ⇔ C
Mental picture of A occurring as a subexpression of X, and of X′, the result of substituting
A′ for that occurrence:
X:  ( . . . A . . . )        X′:  ( . . . A′ . . . )
Note that it is possible that there may be several occurrences of A in X. In this case the
theorem does not require that the substitution should be carried out consistently on all
occurrences. It may be carried out on some and others left as they were, and the conclusion
still holds.
Mental picture of A occurring three times as a subexpression of X:
X:  ( . . . A . . . A . . . A . . . )
Mental picture of substituting A′ for some but not all of the occurrences of A and so changing
X to X′:
X′:  ( . . . A′ . . . A . . . A′ . . . )
Note well that we now have two very different styles of substitution. In substitution of
equivalents, as above, we may substitute some but not necessarily all of the subexpression in
question. Compare with making a substitution into one of the axioms to form an instance of
the axiom; in this case we must make the same substitution consistently to all occurrences
of the subexpression in question. For instance, given the first axiom A ⇒ (B ⇒ A), we may
substitute any expression for A (whether it is equivalent to A or not), for example
Q ⇒ (B ⇒ Q); but we must substitute for both occurrences of the subexpression:
Q ⇒ (B ⇒ A) is not OK.
We need to prove a lemma first. This basically covers a few simple cases and is easy to
prove.
F.17 Lemma
(a) A ⇔ A′ ⊢ ¬A ⇔ ¬A′
(b) A ⇔ A′ ⊢ (A ⇒ B) ⇔ (A′ ⇒ B)
(c) A ⇔ A′ ⊢ (B ⇒ A) ⇔ (B ⇒ A′)
Proof (a).
1 A ⇔ A′ hyp
2 A ⇒ A′ F.13(a)
3 ¬A′ ⇒ ¬A F.5(a) on 2
4 A′ ⇒ A F.13(b)
5 ¬A ⇒ ¬A′ F.5(a) on 4
6 ¬A ⇔ ¬A′ Definition of ⇔, steps 3 and 5
(b)
1 A ⇔ A′ hyp
2 A ⇒ A′ F.13(a) on hyp
3 A′ ⇒ B subhyp
4 A ⇒ B Proposition F.3(a) on 2 and 3
6 (A′ ⇒ B) ⇒ (A ⇒ B) Subded, 3–4
7 A′ ⇒ A F.13(b) on hyp
8 A ⇒ B subhyp
9 A′ ⇒ B Proposition F.3(a) on 7 and 8
10 (A ⇒ B) ⇒ (A′ ⇒ B) Subded, 8–9
11 (A ⇒ B) ⇔ (A′ ⇒ B) Definition of ⇔, steps 6 and 10
F.18 Theorem (Substitution of Equivalents)
Suppose that A is a subexpression of X and let X′ be the result of substituting A′ for that
occurrence of A. Then A ⇔ A′ ⊢ X ⇔ X′.
Proof. We prove this by induction over the construction of X (that is, we assume the
theorem is true in all cases where X is shorter).
Since A is a subexpression of X, one of the following cases must hold:—
(1) X = A.
In this case X′ = A′ and the result is trivially true.
(2) X = ¬P and A is a subexpression of P .
In this case X′ = ¬P′, where P′ is the result of substituting A′ for A in P . But, by the
inductive hypothesis, P′ ⇔ P , and then X′ ⇔ X by Part (a) of the lemma.
(3) X = P ⇒ Q and A is a subexpression of P .
In this case X′ = P′ ⇒ Q, where P′ is the result of substituting A′ for A in P . But, by the
inductive hypothesis, P′ ⇔ P , and then X′ ⇔ X by Part (b) of the lemma.
(4) X = P ⇒ Q and A is a subexpression of Q.
This case is just like (3), using Part (c) of the lemma instead.
F.19 Corollary
These are probably all a little easier to read in their deduction forms.
F.20 Corollary
Multiple applications of the Substitution of Equivalents theorem allow us to make multiple
substitutions, for example:
(b) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ⇒ B) ⇔ (A′ ⇒ B′)))
(b′) A ⇔ A′ , B ⇔ B′ ⊢ (A ⇒ B) ⇔ (A′ ⇒ B′)
(c) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ∧ B) ⇔ (A′ ∧ B′)))
(c′) A ⇔ A′ , B ⇔ B′ ⊢ (A ∧ B) ⇔ (A′ ∧ B′)
(d) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ∨ B) ⇔ (A′ ∨ B′)))
(d′) A ⇔ A′ , B ⇔ B′ ⊢ (A ∨ B) ⇔ (A′ ∨ B′)
(e) (A ⇔ A′) ⇒ ((B ⇔ B′) ⇒ ((A ⇔ B) ⇔ (A′ ⇔ B′)))
(e′) A ⇔ A′ , B ⇔ B′ ⊢ (A ⇔ B) ⇔ (A′ ⇔ B′)
Anecdote: Bertrand Russell, in his autobiography, tells us that around the time he
was working on Principia Mathematica he was puzzled by the following idea. Consider
the true statement
Sir Walter Scott is the author of Ivanhoe.
That means that “Sir Walter Scott” and “the author of Ivanhoe” are the same thing
— equal. That being the case, one can be substituted for the other in any sentence
without altering its meaning. Do that to the sentence displayed above and we get
Sir Walter Scott is Sir Walter Scott.
The two displayed statements should both say the same thing — but they clearly
don’t. (OK. Is this a problem or isn’t it?)
Most of these express really obvious facts if you think in terms of truth-values. Firstly, they
are commutative — if A and B are both true, then B and A are both true. And if at least
one of A and B is true, then at least one of B and A is true. How’s that for obvious?
(a) (A ∧ B) ⇔ (B ∧ A)
(b) (A ∨ B) ⇔ (B ∨ A)
Now that we have associativity we may write things like A ∧ B ∧ C ∧ D without fear of
ambiguity, because no matter how we bracket the expression, we get equivalent expressions.
The same goes of course for ∨. This fact about associativity should be familiar from ordinary
algebra.
Put this together with commutativity and we can change the order in such expressions at
will too. For instance
A ∧ B ∧ C ∧ D ⇔ D ∧ B ∧ A ∧ C
This, combined with substitution of equivalents, gives us much the same freedom in manip-
ulating expressions that we are used to with ordinary algebra.
Where you have expressions involving both conjunctions and disjunctions, the conjunctions
are traditionally supposed to take precedence, so an expression such as A ∧ B ∨ C ∧ D is
to be interpreted as meaning (A ∧ B) ∨ (C ∧ D) and not A ∧ (B ∨ C) ∧ D. However it is
not a good idea to leave parentheses out in such expressions, mainly because it makes them
difficult to read. I won’t do it in these notes.
(a) A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C) and (B ∨ C) ∧ A ⇔ (B ∧ A) ∨ (C ∧ A)
(b) A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C) and (B ∧ C) ∨ A ⇔ (B ∨ A) ∧ (C ∨ A)
That this works both ways is a little startling. Compare with ordinary algebra (of numbers).
We are used to multiplication distributing over addition:
x(y + z) = xy + xz
but certainly not to addition distributing over multiplication:
x + yz = (x + y)(x + z) ??!
F.25 Corollary
Substituting ¬A for A and/or ¬B for B gives us some variations on these, which are usually
also referred to as De Morgan’s Laws.
(c) ¬(¬A ∧ B) ⇔ A ∨ ¬B
F.26 Corollary
(a) ¬A ⊢ ¬(A ∧ B) and ¬B ⊢ ¬(A ∧ B)
(b) ¬A , ¬B ⊢ ¬(A ∨ B)
(a) A ∨ (A ∧ B) ⇔ A
(b) A ∧ (A ∨ B) ⇔ A
(a) A ∧ A ⇔ A
(b) A ∨ A ⇔ A
Now, choose your favourite theorem, preferably a very simple one (for instance, I would
choose p ⇒ p, where p is some fixed chosen letter of the alphabet); call it T for short. (The
symbol is supposed to suggest “true”.)
In the same way, choose a nice simple antitheorem (¬(p ⇒ p) is convenient), and call it F
(to suggest “false”).
Now, if A is any theorem, we have both ⊢ A and ⊢ T. From this it is easy to prove
that ⊢ A ⇔ T. Using Substitution of Equivalents (F.18), we now know that, wherever any
theorem occurs as a subexpression of some larger expression, we may replace it by T.
For example, we know that ⊢ (A ∧ B) ⇒ A and so we can replace it by T in, say,
(A ∧ B) ⇒ (((A ∧ B) ⇒ A) ∧ C) (–1A)
obtaining
(A ∧ B) ⇒ (T ∧ C) . (–1B)
In the same way, we may substitute F for any antitheorem, wherever it occurs as a subex-
pression. For example, we can prove that ⊢ ¬(A ∧ ¬(B ⇒ A)), which is to say that
(A ∧ ¬(B ⇒ A)) is an antitheorem. So, making the substitution in
For example, using the next theorem we see that the expression (–1B) above simplifies to
(A ∧ B) ⇒ C
which is not a theorem, so neither was (–1A). Similarly expression (–2B) simplifies to
¬A ⇒ ¬A ,
which is a theorem, so the expression it came from was a theorem too.
Exercise Prove some of these. (Hint: the proofs are all very easy.)
G Decidability of Sentential Logic
In this section I describe an organised method, the Truth Table method, for computing
the truth-value of an expression in terms of the truth-values of its component proposition
symbols. This will in fact give an algorithm to decide whether any given expression is a
theorem or not; that this is so is also proved in this section.
The section is in two parts. In the first part the Truth Table method is described. In the
second, we prove the fact that it performs as claimed: that the description of how to read
off whether an expression is a theorem or not is actually correct.
The upshot of all this is that we will be proving that SL is a decidable theory.
The first part is well worth reading and understanding in detail; it is not difficult. The second
part, the proof that it works, is long and fairly complicated; it is included for completeness
because this is a fairly significant result. However, I would only recommend you read it
through if you are particularly interested in seeing the proof of this result. (That said, there
are some techniques and subsidiary results here that should be of interest to anyone who
intends proceeding to further studies of logic in general.)
G.1 Preliminary
Consider a theorem of SL, for example
p ∧ (q ∨ r) ⇔ (p ∧ q) ∨ (p ∧ r) .
Note that we are talking about an actual theorem here, not a theorem schema such as we
have been working with most of the time so far. Here p, q and r are letters of the alphabet
of the theory. And now suppose we have a proof of this,
S1, S2, . . . , Sn
Now, the expression being proved uses only the three proposition symbols p, q, r, however we
must consider the possibility that the proof might need to contain some extra proposition
symbols, other than these three. Now this seems unlikely, and a glance through the previous
section will show that none of the proofs given there require any “extra” proposition symbols,
that is, ones that didn’t already occur in the expression to be proved. In what follows we
need to know that no such extra symbols are ever necessary, so we now prove this obvious
fact.
G.2 Theorem
Let A be an expression (of SL) and suppose that p1 , p2 , . . . , pk are the proposition symbols
(letters) of the language which occur in A. Then ⊢ A if and only if there is a proof of it
in which the only letters which occur anywhere in the proof are these same p1 , p2 , . . . , pk .
p q p∧q p q p∨q
1 1 1 1 1 1
1 0 0 1 0 1
0 1 0 0 1 1
0 0 0 0 0 0
p q p⇒q p q p⇔q
1 1 1 1 1 1
1 0 0 1 0 0
0 1 1 0 1 0
0 0 1 0 0 1
One can construct a table for more complicated expressions by a mechanical process. For
example, to construct a truth table for the expression (p ⇒ q) ⇒ (¬q ⇒ ¬p), start with
an empty table, in which there is a column corresponding to each of its sub-expressions,
building up from left to right:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1
1 0
0 1
0 0
Now we fill in the missing entries in the table, working from left to right. The first three
columns can be filled in by consulting the basic tables given above:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0
1 0 0 0 1
0 1 1 1 0
0 0 1 1 1
Now the ¬q ⇒ ¬p column can be filled in, using the values in the ¬p and ¬q columns and
the basic table for ⇒:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0 1
1 0 0 0 1 0
0 1 1 1 0 1
0 0 1 1 1 1
Finally the last column can be filled in, using the values in the p ⇒ q and ¬q ⇒ ¬p columns
and the basic table for ⇒ again:
p q p⇒q ¬p ¬q ¬q ⇒ ¬p (p ⇒ q) ⇒ (¬q ⇒ ¬p)
1 1 1 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 0 1 1
0 0 1 1 1 1 1
The upshot of this is that we now have an algorithm (a completely routine method) to decide
whether any expression is a theorem of SL or not. In other words, the theory is decidable.
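For the curious, the whole decision procedure fits in a few lines of Python. This sketch is
my own illustration (the encoding of the example expression is invented for it), not part of
the formal development:

    from itertools import product

    def implies(a, b):
        return 0 if a == 1 and b == 0 else 1     # the basic table for ⇒

    def neg(a):
        return 1 - a                             # the basic table for ¬

    def is_tautology(f, n):
        # f maps an n-tuple of truth-values (0 or 1) to a truth-value.
        return all(f(e) == 1 for e in product((0, 1), repeat=n))

    # (p ⇒ q) ⇒ (¬q ⇒ ¬p), with e = (p, q):
    contrapositive = lambda e: implies(implies(e[0], e[1]),
                                       implies(neg(e[1]), neg(e[0])))
    # is_tautology(contrapositive, 2)  ->  True, as in the table above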
One more example. Let us ask if the connective ⇒ is associative, that is, whether
((p ⇒ q) ⇒ r) ⇔ (p ⇒ (q ⇒ r))
is a theorem of SL or not. Here is the table:
p q r p⇒q (p ⇒ q) ⇒ r q⇒r p ⇒ (q ⇒ r) The expression
1 1 1 1 1 1 1 1
1 1 0 1 0 0 0 1
1 0 1 0 1 1 1 1
1 0 0 0 1 1 1 1
0 1 1 1 1 1 1 1
0 1 0 1 0 0 1 0
0 0 1 1 1 1 1 1
0 0 0 1 0 1 1 0
First notice that the leftmost columns, corresponding to the letters p, q and r, must con-
tain every combination of truth-values. Consequently, if the expression being investigated
has n letters, the table must have 2ⁿ rows. Second, observe that some of the entries in
the final column are not 1 in this example. From this we conclude that the expression
((p ⇒ q) ⇒ r) ⇔ (p ⇒ (q ⇒ r)) is not a tautology, and hence not a theorem of SL.
Here we embark upon the process of proving that these claims are correct. Read on if you
are hard enough!
We may re-interpret the columns of these tables as tables of functions. For example, take
the table of p ⇒ q above. Think of it as a table of a function of two variables (the values of p
and q), each of which can take values from the set B = {0, 1}. It is better not to confuse the
expression p ⇒ q with its truth-table function, so let us call the function Tp⇒q . It is thus a
function B² → B. Doing the same thing for the first big table above, we have another way
of looking at it, as a table of values of the functions involved:
e Tp⇒q (e) T¬p (e) T¬q (e) T¬q⇒¬p (e) T(p⇒q)⇒(¬q⇒¬p) (e)
(1,1) 1 0 0 1 1
(1,0) 0 0 1 0 1
(0,1) 1 1 0 1 1
(0,0) 1 1 1 1 1
This interpretation will be used in the remainder of this section to prove that the Truth-table
Algorithm works as claimed.
G.4 Definition
Let us write B for the set {0, 1} so that Bᵏ = { (e1 , e2 , . . . , ek ) : each ei = 0 or 1 }.
In Bᵏ we will write O for the all-0 k-tuple and I for the all-1 one: O = (0, 0, . . . , 0) and
I = (1, 1, . . . , 1).
We will be working with functions Bᵏ → B. We define operations on these functions
corresponding to the connectives; if f and g are such functions, then ¬f , f ∧ g, f ∨ g, f ⇒ g
and f ⇔ g are defined as follows:
(¬f )(e) = 1 if f (e) = 0, and 0 otherwise.
(f ∧ g)(e) = 1 if f (e) = g(e) = 1, and 0 otherwise.
(f ∨ g)(e) = 0 if f (e) = g(e) = 0, and 1 otherwise.
(f ⇒ g)(e) = 0 if f (e) = 1 and g(e) = 0, and 1 otherwise.
(f ⇔ g)(e) = 1 if f (e) = g(e), and 0 otherwise.
In this way every expression A in the letters p1 , p2 , . . . , pk gets a truth-table function
TA : Bᵏ → B, built up by these operations from the coordinate functions Tpi (e) = ei . We
call A a truth-table tautology (TTT for short) if TA (e) = 1 for every e ∈ Bᵏ.
G.5 Theorem
⊢ A if and only if A is a TTT.
This theorem will establish the decidability of the theory, because whether an expression is a
TTT or not is clearly decidable. (For each of the 2ᵏ “vectors” e, compute TA (e) according to
the recipe given by their definition, and see if all the values are 1. The truth-table method,
as described in G.4, is simply an organised way of doing this.)
Proof. A is a theorem if and only if it is one of the lines in a formal proof. Therefore it is
enough to show that the three axioms are TTTs and that, if A and A ⇒ B are TTTs, then
so is B.
Consider Axiom SL1, A ⇒ (B ⇒ A). Let e ∈ Bk and check all possible combinations of
values of TA (e) and TB (e).
If TA (e) = 1 and TB (e) = 1 then TB⇒A (e) = 1 and so TA⇒(B⇒A) (e) = 1. The other
combinations of values, and the other axioms and Modus Ponens, are checked in the same
routine way.
G.7 Definition
(i) A conjunctive term is an expression of the form Q1 ∧Q2 ∧ . . . ∧Qk where, for each i, Qi
is either pi or ¬pi . (So every proposition symbol occurs exactly once, either with or
without a preceding not-sign.)
(ii) A disjunctive form is an expression which is either F
or C1 ∨C2 ∨ . . . ∨Cs where C1 , C2 , . . . , Cs are distinct conjunctive terms.
G.8 Notes
(1) We will assume that some fixed order has been chosen for the conjunctive terms in
the second form of the expression. It does not matter what it is, so long as we choose
one (say, lexicographic) and stick to it.
(2) The form F is used to act as the “gold-plated” always-false expression, such as ¬(p1 ⇒ p1 )
(see Section F.30).
(3) We will want to talk about “the conjunctive terms which occur in” any given disjunc-
tive form. It is obvious what this means for the second form above. We will simply
define F to contain no conjunctive terms.
(4) Having made these clarifications, we can now say that, given any set of conjunctive
terms, there is exactly one disjunctive form which contains just these terms and no
others.
G.9 Definition
To any member e of Bᵏ there corresponds a conjunctive term Ke = Q1 ∧Q2 ∧ . . . ∧Qk ,
where Qi = pi if ei = 1 and Qi = ¬pi if ei = 0.
And to any expression A (in the letters p1 , p2 , . . . , pk ) there corresponds a disjunctive form
ADF = C1 ∨C2 ∨ . . . ∨Cs
where C1 , C2 , . . . , Cs are just the conjunctive terms Ke for which TA (e) = 1, in our chosen
order. (Note: if there are no such e, that is, if TA (e) = 0 for every e, we interpret this to
mean that ADF is F.)
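This construction, too, is completely mechanical. Here is a Python sketch of it (my own
illustration, with an expression represented by its truth-table function as in the earlier
sketch, where implies was defined):

    from itertools import product

    def conjunctive_term(e):
        # K_e = Q1 ∧ Q2 ∧ ... ∧ Qk : Qi is pi if ei = 1 and ¬pi if ei = 0
        return "∧".join(("p%d" if ei == 1 else "¬p%d") % (i + 1)
                        for i, ei in enumerate(e))

    def disjunctive_form(f, k):
        # The disjunction of those K_e for which T_A(e) = 1, in a fixed
        # order; the always-false form F if there are none.
        terms = [conjunctive_term(e)
                 for e in product((0, 1), repeat=k) if f(e) == 1]
        return " ∨ ".join(terms) if terms else "F"

    # disjunctive_form(lambda e: implies(e[0], e[1]), 2)
    #   ->  '¬p1∧¬p2 ∨ ¬p1∧p2 ∨ p1∧p2'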
G.10 Lemma
(i) For any e and e′ in Bᵏ, TKe (e′) = 1 if and only if e′ = e.
(ii) If D = C1 ∨C2 ∨ . . . ∨Cs is a disjunctive form, then TD (e) = 1 if and only if Ke is one
of C1 , C2 , . . . , Cs .
Proof. (i) follows immediately from the definition and (ii) follows easily from (i).
G.11 Corollaries
(i) Any disjunctive form is its own disjunctive form, that is, if D is a disjunctive form, then
DDF = D.
(ii) TADF = TA ; in particular, A is a TTT if and only if ADF is a TTT.
(iii) Suppose that D is a disjunctive form. Then D is a TTT if and only if it contains all
2ᵏ conjunctive terms.
G.12 Lemma
Let A = C1 ∨C2 ∨ . . . ∨Cs , where C1 , C2 , . . . , Cs are all the 2ᵏ conjunctive terms. Then
⊢ A.
Proof. We have ⊢ pi ∨ ¬pi for each i (Proposition F.28). By Proposition F.8(c) then,
⊢ (p1 ∨ ¬p1 ) ∧ (p2 ∨ ¬p2 ) .
Now, by repeated applications of distributivity, commutativity and associativity (Proposi-
tions F.21–F.23),
⊢ (p1 ∧ p2 ) ∨ (p1 ∧ ¬p2 ) ∨ (¬p1 ∧ p2 ) ∨ (¬p1 ∧ ¬p2 ) ,
which is the lemma in the case k = 2. If k ≥ 3, we also have ⊢ p3 ∨ ¬p3 and so, by
Proposition F.8(c) again,
⊢ ((p1 ∧ p2 ) ∨ (p1 ∧ ¬p2 ) ∨ (¬p1 ∧ p2 ) ∨ (¬p1 ∧ ¬p2 )) ∧ (p3 ∨ ¬p3 )
which, after umpteen applications of the distributive, associative and commutative laws,
expands out to the required eight-term expression. AND SO ON.
G.13 Corollary
Let D be a disjunctive form. Then ⊢ D if and only if D is a TTT.
G.14 Lemma
pi ⊣⊢ C1 ∨C2 ∨ . . . ∨Cs ,
where C1 , C2 , . . . , Cs are all the 2ᵏ⁻¹ conjunctive terms of the form Q1 ∧Q2 ∧ . . . ∧Qk in
which Qi = pi .
G.15 Lemma
Let A = A1 ∨A2 ∨ . . . ∨As be a disjunctive form in which A1 , A2 , . . . , As are some of the
available conjunctive terms and let B = B1 ∨B2 ∨ . . . ∨Bt be another disjunctive form in
which the conjunctive terms B1 , B2 , . . . , Bt are exactly the ones which are not involved in
A. (In the special case in which A1 , A2 , . . . , As happens to be all the conjunctive terms, we
interpret this to mean B = F. And vice versa.) Then ¬A ⊣⊢ B.
Proof. We show first that ¬A ⊢ B. Using Modus Ponens, it is enough to show that
⊢ ¬A ⇒ B. From the definition of the disjunction symbol, this is the same as ⊢ ¬¬A ∨ B.
Now by Proposition F.4 and Substitution of Equivalents, this is the same as proving that
⊢ A ∨ B. Because of the assumed form of A and B, this is the preceding lemma.
Next we prove that B ⊢ ¬A. Again we ring some changes: using Modus Ponens it is enough
to prove that ⊢ B ⇒ ¬A. This is the same as ⊢ ¬¬(B ⇒ ¬A) which, by the definition
of conjunction, is the same as ⊢ ¬(B ∧ A), so this is what we will establish.
Now B ∧ A is of the form (B1 ∨B2 ∨ . . . ∨Bt )∧(A1 ∨A2 ∨ . . . ∨As ) and so, by repeated use of
the distributive, associative and commutative laws,
B ∧ A ⊣⊢ (B1 ∧A1 ) ∨ (B1 ∧A2 ) ∨ . . . ∨ (Bt ∧As ) (1)
the terms on the right being all expressions of the form Bq ∧Ap (for q = 1, 2, . . . , t and
p = 1, 2, . . . , s). In each such term, Bq and Ap are distinct and so there is at least one index
i such that Ap contains pi and Bq contains ¬pi , or vice versa. Then, using commutativity,
there is an expression E such that Bq ∧Ap ⊣⊢ pi ∧¬pi ∧E. Using the tricks in Section F.30
(specifically F.31(d) and (e)), we get ⊢ ¬(Bq ∧Ap ) and, since this is true of every one of the
terms in the expansion (1) above, ⊢ ¬(B ∧ A).
G.16 Lemma
Let A = A1 ∨A2 ∨ . . . ∨As and B = B1 ∨B2 ∨ . . . ∨Bt be disjunctive forms in which
A1 , A2 , . . . , As is a subset of B1 , B2 , . . . , Bt . Then A ⊢ B.
Proof. If we write E1 , E2 , . . . , Eu for those conjunctive terms which appear in B but not
in A, then using distributivity and associativity, A∨E1 ∨E2 ∨ . . . ∨Eu ⊣⊢ B. By Proposition
F.9(a), A ⊢ A∨E1 ∨E2 ∨ . . . ∨Eu and we are done.
G.17 Lemma
For any expression A, A ⊣⊢ ADF .
Proof. The proof is by induction over the construction of A. Consider, for instance, the
case A = ¬B, where by the inductive hypothesis B ⊣⊢ BDF = B1 ∨B2 ∨ . . . ∨Bt . Then
A ⊣⊢ A1 ∨A2 ∨ . . . ∨As
where A1 , A2 , . . . , As are all the conjunctive terms which are not amongst B1 , B2 , . . . , Bt
(the last step being by Lemma G.15). Since BDF = B1 ∨B2 ∨ . . . ∨Bt , TB (e) = 1 for just
those e such that Ke is amongst B1 , B2 , . . . , Bt . Thus TA (e) = T¬B (e) = 1 for exactly those
e such that Ke is not amongst B1 , B2 , . . . , Bt , that is, for those e such that Ke is amongst
A1 , A2 , . . . , As . Therefore ADF = A1 ∨A2 ∨ . . . ∨As and we have proved that A ⊣⊢ ADF in
this case.
In the case A = B ⇒ C we have, similarly,
A ⊣⊢ (B1 ∨B2 ∨ . . . ∨Bt ) ⇒ (C1 ∨C2 ∨ . . . ∨Cu ) .
(B1 ∨B2 ∨ . . . ∨Bt ) ⇒ (C1 ∨C2 ∨ . . . ∨Cu ) ⊣⊢ ¬(B1 ∨B2 ∨ . . . ∨Bt ) ∨ (C1 ∨C2 ∨ . . . ∨Cu )
G.18 Theorem
If A is a TTT, then ⊢ A.
Proof. If A is a TTT then, by Corollary G.11(ii), so is ADF . Then, by Corollary G.13,
⊢ ADF . Then, by Lemma G.17, ⊢ A.
G.19 Corollary
SL is consistent.
Proof. We must show that the theory contains no theorem of the form A ∧ ¬A. By the
definition of a truth-table function, TA∧¬A (e) = 0 for all e and so A ∧ ¬A is not a TTT.
The result follows.
G.20 Corollary
SL is not complete.
Proof. We must demonstrate the existence of an expression A such that neither A nor ¬A
is a theorem. This is so with A = p1 because neither p1 nor ¬p1 is a TTT: Tp1 (O) = 0 and
T¬p1 (I) = 0. By Theorem G.5, then, neither is a theorem.
H Independence of the axioms
We now show that no one of the three axiom schemata can be proved from the other two.
The trick is to replace the two truth-values by three values, 0, 1 and 2.
We assume that values have been assigned to all the proposition symbols, p, q, . . . and
define consequent values for all expressions. We will call an expression which takes the value
2, no matter what values are assigned to its proposition symbols, a strange tautology. We
then show that all instances of Axioms SL2 and SL3 are strange tautologies and that, if P
and P ⇒ Q both have value 2, then Q must have value 2 also. It follows from this that
every expression which can be proved from SL2 and SL3 is a strange tautology.
Finally, we show that Axiom SL1 is not a strange tautology; it follows that it cannot be
proved from the other two axioms.
To define how values are calculated for expressions, we use tables like those used in Section
G.3:
p   ¬p            p  q  p ⇒ q
2    2            2  2    2
1    1            2  1    0
0    1            2  0    0
                  1  2    0
                  1  1    0
                  1  0    2
                  0  2    2
                  0  1    2
                  0  0    2
Inspection of the table for ⇒ above shows that, if P and P ⇒ Q both have value 2, then
so does Q. It now follows that every expression that can be proved from SL2 and SL3 is a
strange tautology.
Verify that, when p has value 1 and q has value 0, SL1 has value 0; thus SL1 is not a strange
tautology and so SL1 cannot be proved from the other two axioms.
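Checking that SL2 and SL3 always take the value 2 is tedious by hand but trivial by
machine. Here is a Python sketch of the brute-force check (my own illustration, encoding
the tables just given):

    from itertools import product

    NEG = {2: 2, 1: 1, 0: 1}
    IMP = {(2, 2): 2, (2, 1): 0, (2, 0): 0,
           (1, 2): 0, (1, 1): 0, (1, 0): 2,
           (0, 2): 2, (0, 1): 2, (0, 0): 2}

    def strange(f, n):
        # True if f takes the value 2 for every assignment of 0, 1, 2
        return all(f(e) == 2 for e in product((0, 1, 2), repeat=n))

    sl1 = lambda e: IMP[e[0], IMP[e[1], e[0]]]
    sl2 = lambda e: IMP[IMP[e[0], IMP[e[1], e[2]]],
                        IMP[IMP[e[0], e[1]], IMP[e[0], e[2]]]]
    sl3 = lambda e: IMP[IMP[NEG[e[0]], NEG[e[1]]],
                        IMP[IMP[NEG[e[0]], e[1]], e[0]]]

    # strange(sl2, 3) -> True;  strange(sl3, 2) -> True
    # strange(sl1, 2) -> False  (it takes the value 0 at p = 1, q = 0)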
A similar trick shows that Axiom SL3 cannot be proved from SL1 and SL2. This time we
keep the ordinary two truth-values and the standard table for ⇒, but give ¬ a non-standard
table (for instance, let ¬p take the same value as p), and again call an expression which
always takes the value 1 a strange tautology.
Since we are using the standard table for ⇒, it follows immediately that Axioms SL1 and
SL2 are strange tautologies and that if P and P ⇒ Q both have value 1, then so does Q.
Therefore any expression that can be proved from SL1 and SL2 is also a strange tautology.
When p and q both have value 0, SL3 has value 0. Therefore SL3 is not a strange tautology
and cannot be proved from SL1 and SL2.
I Other axiomatisations
There are many other ways of setting up Sentential Logic as a formal theory. Here are a few.
They are all equivalent to SL in the sense of Definition 1.C.4 — which is a careful way of
saying that, while the new theory has different symbols and therefore different expressions,
there is a (usually pretty obvious) way of translating back and forth between them in such
a way that theorems correspond to theorems (both ways).
Connectives ¬ ⇒
Proposition symbols p q r and so on.
There is no punctuation at all: expressions are written in parenthesis-free (“Polish”) prefix
notation, each connective preceding its operands. The axioms are:
⇒A⇒BA
⇒⇒A⇒BC⇒⇒AB⇒AC
⇒⇒¬A¬B⇒⇒¬ABA
The only rule of inference is Modus Ponens, which now looks like this:
A , ⇒AB
B
Connectives ¬ ∨
Proposition symbols p q r and so on.
Punctuation ( )
In the semi-formal version of the language, we omit parentheses in the usual way and use
A ⇒ B as an abbreviation for ¬A ∨ B. The axioms are:
(1) A ∨ A ⇒ A
(2) A ⇒ A ∨ B
        A , (¬A ∨ B)
        ─────────────
             B
For example, starting from our original theory, we can add the symbol ∧ together with
three new axioms:
A ∧ B ⇒ A
A∧B ⇒B
A ⇒ (B ⇒ A ∧ B).
Alternatively, we can add the new symbol and some new rules of inference. For example,
again starting from our original theory, we can add the symbol ∧ and two rules of inference:
(Here I have used several different types of parentheses to make the axiom easier to read.)
Even more stripped down is an axiomatisation using the Sheffer stroke, otherwise known as
the nand function (short for “not-and”). It has the following truth-function table:
p q p|q
1 1 0
1 0 1
0 1 1
0 0 1
This connective is more important than you might at first think. Firstly, note that all the
other connectives can be defined in terms of nand, as follows
¬p as p|p , p ⇒ q as p|(q|q) , p ∧ q as (p|q)|(p|q) , p ∨ q as (p|p)|(q|q)
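These four definitions are easily verified by truth table; here, for instance, is a throwaway Python check (nothing here is part of the formal theory, of course).

    from itertools import product

    def nand(p, q):          # the Sheffer stroke p|q
        return not (p and q)

    for p, q in product((False, True), repeat=2):
        assert nand(p, p) == (not p)                      # ¬p  as  p|p
        assert nand(p, nand(q, q)) == ((not p) or q)      # p ⇒ q  as  p|(q|q)
        assert nand(nand(p, q), nand(p, q)) == (p and q)  # p ∧ q  as  (p|q)|(p|q)
        assert nand(nand(p, p), nand(q, q)) == (p or q)   # p ∨ q  as  (p|p)|(q|q)
    print("all four definitions check out")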
Consider digital circuitry — computers, calculators, mobile phones and so on. The adjec-
tive “digital” means that most of the circuitry inside implements logical operations. Each
particular brand of circuitry defines two voltage levels to represent true and false (or 1 and
0). For instance the (now old-fashioned) TTL series uses 5 volts and 0 volts for these; my
Mac, which is not old-fashioned yet, uses 1 volt and 0 volts.
Note that, when binary notation is used, arithmetic operations such as addition and multi-
plication can be implemented as (rather complicated) logical operations.
So called “gates” implement the various logical operators. For example, you can go down
to your local electronics shop and buy an and-gate. It will have two inputs, p and q say,
and an output which will give p ∧ q. (Well, actually what you can buy is an IC chip which
probably contains at least four and-gates, plus connections for its power supply.) If you
have the time and money, some electronics know-how and a soldering iron, you can build
almost any logical network you choose.
Inside the IC, these operations are implemented by tiny transistors. By far the simplest
operation to implement is the nand operation, which only takes one transistor. Other
operations are usually built up from combinations of nand gates according to the rules given
above.
So more complicated IC circuits, such as the CPU in a computer, are packages containing
a large number of these gates pre-wired together to implement the required behaviour.
Nowadays a CPU chip will contain many millions of microscopic nand-gates and very little
else.
3. PREDICATE LOGIC AND FIRST-ORDER THEORIES
Consider, for example, the cancellation law for addition: "for any natural numbers x, y
and z, if x + y = x + z then y = z", which we might expect to be a theorem in a theory
about the natural numbers. Expressing it more symbolically,

(∀x)(∀y)(∀z)(x + y = x + z ⇒ y = z) .
Consider the component parts of this.
• First there are the variable symbols, x, y and z. No surprises here — so far.
• Then there is the function symbol + representing addition, a binary function (that
is, a function of two variables). Mathematics, as usually written, is full of different kinds of
notation for functions, some quite bizarre. We have the usual f (x, y) kind of notation but
also "infix" notation such as is used for addition and subtraction, the superscript notation
xʸ for exponentiation, things like the stacked symbol for the binomial coefficient; and what
are we to make of xy for multiplication? — scarcely a notation at all.
We wish to set up a general formal language, and for this we will avoid all these special
types of function notation; we will insist that for the fully-formal language all functions are
written in the same “generic” f (x, y) way. For instance, we will write addition as +(x, y)
instead of x + y, and our example expression becomes
(∀x)(∀y)(∀z)(+(x, y) = +(x, z) ⇒ y = z) .
• The statement contains a relation, equality, also binary. In order to preserve flexibility
in our formal language, we will also insist that all relations are written in generic r(x, y)
form. Accordingly, we will write = (x, y) instead of x = y and our expression becomes

(∀x)(∀y)(∀z)( =( +(x, y), +(x, z) ) ⇒ =(y, z) ) .
• And we have the quantifiers, (∀x), (∀y) and (∀z), of which much more anon.
Here we have the essential components of a first-order language (that is, the language for
any theory based upon Predicate Logic): variable symbols, function and relation symbols,
and the logical connectives augmented by quantifiers.
There are several different ways we can think about a first-order theory.
(1) We might have a specific interpretation in mind, such as elementary Number Theory,
Group Theory or even one of the axiomatisations of Set Theory (which is powerful enough
to contain all of mathematics) — the main ones are Morse-Kelley (MK), Von Neumann-
Bernays-Gödel (VBG) and Zermelo-Fraenkel (ZF). In this case we will have a (usually
small) number of functions and relations, each of which has a definite interpretation in our
target structure (such as addition, equality, set-membership etc.). We will discuss several
specific theories in Chapter 4 and MK in detail in Chapter 6.
(2) We might want to discuss first-order theories in general, for instance in order to
discover properties common to them all. Then we will have function and relation symbols,
and perhaps some axioms, but no specific interpretation in mind.
(3) We might want to consider Predicate Logic itself, in much the same way as we
considered Sentential Logic in the previous chapter. Then we will have no proper axioms
at all. It is tempting to call such a theory “Pure Predicate Logic”, but this won’t do;
unfortunately this term is already used for a specific kind of Predicate Logic (one with
no function symbols at all) so I will use the term Plain Predicate Logic for this —
a first-order theory with no proper axioms.
Plain Predicate Logic is not uniquely defined — one can create different plain predicate
logics by making different choices of the sets of function and relation symbols. However
it turns out that the difference between different predicate logics is entirely inessential, in
that, for any property of theories we are actually interested in, if it is true for one predicate
logic then it is true for all of them. Therefore there is no harm in choosing one gold-plated
example as a representative and calling it Plain Predicate Logic (with capitals) or just PL.
This is what we will do.
Most of the important properties of PL are properties of all first-order theories. There
is, however, one important theorem about PL in this chapter which is not shared by all
first-order theories: it is consistent (proof later in this chapter).
[Diagram: PL and a general first-order theory side by side. Both have the same PL axioms
and the same expressions, but the first-order theory adds proper axioms and so has more
theorems.]
Now we start defining the theory in the usual way, by specifying what our alphabet is.
A.4 Comments
(i) When we come to specific first-order theories, the function symbols will be meant to
have permanently assigned meanings. In ordinary mathematics, examples might be symbols
such as + or ∩. In an example such as “let f be a differentiable function . . . ” we would
not include the letter f among the function symbols above — other arrangements are made
for this usage (that is, functions created on the fly). The actual list of function symbols
would depend on which particular language we are considering. In the same way, the
relation symbols also will be meant to have permanently assigned meanings. In ordinary
mathematics, examples would be = or ∈ . Again, the actual list of relation symbols would
depend on the particular example of a language we were considering. In a first-order theory
there would usually be only a finite number of function and relation symbols (though this
is not necessary); in some theories one or other of these sets might even be empty.
(ii) In Sentential Logic we have already made a distinction between the formal symbols
of the language, those which are part of the basic definition of the language, such as ¬ and
⇒, and the others, which we might call semi-formal, those which are introduced later, for
instance as abbreviations, such as ∧ and ⇔.
In the same way, in first-order theories, there are formal functions and relations which
are part of the definition of the language and (usually) some semi-formal ones which are
introduced later in various ways. For example, most first-order theories have the symbol =
for equality, nearly always introduced as a formal relation. Then the symbol ≠ for inequality
can be defined later (x ≠ y to mean ¬(x = y)) and so is a semi-formal symbol.
(iii) Note that we assume that there is an infinite number of variable symbols at our
disposal. In a number of proofs it will be necessary to introduce, at some point, a new
variable symbol or symbols which have not appeared in the proof so far (a "fresh" variable).
It is necessary to know that this can always be done. In nearly all first-order theories
a countably infinite set of variable symbols is big enough, however we will never need to
assume this for our investigations. That releases us from the problem of defining countability
at this early stage: the remark about countability is simply a comment.
(iv) In Plain Predicate Logic, the function and relation symbols are not supposed to have
any specific meanings, and we usually use non-specific symbols such as f and r to stand
for them. It is usual to suppose that we have a finite (or at most denumerably infinite)
collection of each, though this is not necessary. It is always supposed that the symbols
have been chosen so that variable, function and relation symbols can be recognised as such.
(Thinking of them as sets, the sets are disjoint.)
(v) Function symbols have an arity, which is a natural number (≥ 0) — a function of
arity n is simply a function of n variables. (A function of arity n is called n-ary. Common
terminology is nullary for 0-ary, unary for 1-ary, binary for 2-ary and ternary for 3-ary.)
Note that nullary functions are just constants, so this allows us to pick out permanent names
for important constants in our theory, examples in ordinary mathematics being 0 and ∅.
Relation symbols also have arity. A nullary relation represents a truth value that does
not depend on any variables, so nullary relations have the same rôle in Predicate Logic as
proposition symbols have in Sentential Logic.
(vi) As with Sentential Logic, other connectives could be added to our list. Conspicuous
by their absence are ∧, ∨, ⇔ and ∃. These connectives can be defined in terms of the ones
given above, so we happily leave them out of the formal language and then introduce them
in the semi-formal one. (We have already done this with ∧, ∨ and ⇔ and will do it with ∃
in this chapter.)
A.5 Discussion
We now set about defining the language. The existence of function and relation symbols
means that this language is a bit more complicated than that for Sentential Logic.
There are two kinds of “meaningful” strings in this language: terms and expressions.
A term names something. For example, in a language describing the natural numbers, we
would expect terms like 3 and 1 + 1 and x2 + 3y + 2, each of which names some natural
number. Observe that the possible presence of variable symbols means that we may not
know exactly which thing is being named unless we know which things the variable symbols
name.
An expression is a statement. For example, in the language for natural numbers, we would
expect expressions like 2 + 2 = 4 and 13 + 5 < 3 and x2 + 3y − 2 = 0. It is convenient to
think of an expression as a sentence which is either true or false. Note that expressions may
well express facts which are false and that, if they contain variable symbols, their truth or
falsity may depend upon the values of the variables involved.
We need to define terms first and then, using this, define expressions.
A.7 Comments
(i) As remarked above, think of terms as behaving like nouns and noun phrases in
natural languages: they name things.
(ii) As already mentioned, a special case is a nullary function, which can be thought of
as a constant. Examples are 0 (in a language for the natural numbers) and ∅ (in a language
for set theory).
(iii) Notice that this definition is recursive: Item (T2) allows us to substitute functions
inside functions to any depth. For example, we would construct the term x² + 3y in our
formal language thus:

+( ·(x, x) , ·(3, y) )
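The recursive structure of terms is easy to mirror in a program. Here is a minimal Python sketch; the representation (a pair of classes Var and Fn) is purely my own illustrative choice and not part of the formal definitions.

    from dataclasses import dataclass

    @dataclass
    class Var:
        name: str            # a variable symbol

    @dataclass
    class Fn:
        symbol: str          # a function symbol such as + or ·
        args: list           # its argument terms, built by clause (T2)

    def show(t):
        # Display a term in the generic f(x, y) style of the formal language.
        if isinstance(t, Var):
            return t.name
        if not t.args:       # nullary functions are constants
            return t.symbol
        return t.symbol + "(" + ", ".join(show(a) for a in t.args) + ")"

    x, y, three = Var("x"), Var("y"), Fn("3", [])
    term = Fn("+", [Fn("·", [x, x]), Fn("·", [three, y])])   # the term x² + 3y
    print(show(term))        # +(·(x, x), ·(3, y))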
A.9 Comments
(i) Expressions of the form (E1) are called atomic.
(ii) Expressions normally represent sentences (e.g. 2 + 2 = 4), which are either true or
false, or predicates (e.g. x + y = z), whose truth or falsity depends upon what the variables
represent.
(iii) As usual, the fully-formal language uses lots of parentheses to make sure that am-
biguous constructions cannot arise. And, also as usual, we will omit them in the semi-formal
language whenever possible. An expression of the form (∀x P ) will usually be written (∀x)P
in the semi-formal language when no confusion can arise, because it is easier to read
that way — compare (∀x x + 0 = x) with (∀x) x + 0 = x .
The (∀x) construction is usually called a universal quantifier. (This word is also sometimes
used for the single symbol ∀, but I won’t do that in these notes.)
The universal quantifier (∀x) expresses in our formal language the frequently-used preamble
“For all x . . . ”.
(iv) We will add the connectives ∧ , ∨ and ⇔ to our semi-formal language as before; later
in this chapter we will also add the existential quantifier (∃x) in the same way. It stands
for the preamble “There exists an x such that . . . ”.
(v) If we want to include the symbols T and F (for true and false), they are nullary
relations, and so are expressions.
(vi) Where the forms (E1), (E2) and (E3) above occur in the semi-formal language we
will usually leave off the outer parentheses where no ambiguity can arise, and write them
¬P , P ⇒ Q and (∀x)P .
Again we have a unique parsing lemma; we had one before for SL in 2.A.8.
(ii) Let A be a term. Then exactly one of cases (T1) or (T2) of the definition above holds. If
(T1) holds, there is a unique variable symbol x such that A = x. If (T2) holds, then there
is a unique function symbol f and terms s1 , s2 , . . . , sn such that A = f (s1 , s2 , . . . , sn ).
(iii) Let A be an expression. Then exactly one of cases (E1) to (E4) of the definition above
holds. If (E1) holds, then there is a unique relation symbol r and terms t1 , t2 , . . . , tn such
that A = r(t1 , t2 , . . . , tn ). If (E2) holds, then A is unique, if (E3) holds then P and Q are
unique and if (E4) holds then x and P are unique.
Once again I will leave the proof of this as a tiresome and completely optional exercise.
In the first of these the variable x is bound, in the second the variable i. (A bound variable
is often called a “dummy” variable.)
This is what the word “dummy” is meant to suggest: that the dummy variable is just a
passing convenience. However if the statement is at all complicated, saying it without using
the dummy variable might require a truly horrible sentence. For instance, try saying

∫₀^π eˣ cos x dx = −½(1 + e^π)

without mentioning x. Dummy variables are indeed convenient.
(3) The dummy variable can be changed to another variable symbol without changing
the meaning of the statement. For instance, the two statements (–1) above say exactly the
same things as

∫₀^π cos z dz = 0 ,    ∑_{r=1}^{10} r² = 385    (–1A)
In larger, more complicated statements which might contain several variables, things can
get nasty if you substitute a variable which is already being used elsewhere in the statement.
We will have to look at this a bit more carefully below, but as a general rule, if you are
going to substitute a new variable for a dummy, it is best to substitute a “fresh” one, that
is, one which does not already occur elsewhere in the expression.
Scope
Each of these constructions which bind a variable symbol has a scope, that is, an area within
the expression in which that dummy variable has meaning. In an integral or a quantified
expression, the scopes are as shown here by the boxes.
∫₀^π cos x dx = 0 ,    (∀x)(x + 0 = x)
This may not look very illuminating as it stands: it becomes important when such an
expression is part of a larger one. For example, here is an elementary property of integration:
∫ (f (x) + g(x)) dx = ∫ f (x) dx + ∫ g(x) dx .    (–2A)

Here we see that the variable symbol x in fact has three different scopes, which can be
thought of informally as three different "meanings". The equation could equally well be
written

∫ (f (x) + g(x)) dx = ∫ f (y) dy + ∫ g(z) dz ,    (–2B)
making it much more obvious that there are actually three different dummies here.
At first glance, the second form (–2B) looks better, at least for a formal theory, in that
it does not require recognition of the various different scopes of the same variable symbol.
On the other hand, the first form (–2A) is more in line with common usage. Furthermore,
recognising different scopes is algorithmically quite simple (even if it allows us to write
rather human-unfriendly expressions — see below) and, more importantly, legislating against
expressions such as (–2A) in our formal language would require a lot of awkward little rules
to be added. Therefore multiple different bindings of the same variable symbol are allowed
in the formal language from the word go.
Moreover, if you really used a different letter for every different scope, you would probably
run out of letters by the end of the first page.
Mathematics teachers are at pains to drum into school and university students that
one should never use the same variable symbol with two different meanings in the
same expression or proof, because that way leads to invalid arguments and so to
incorrect results. But then we turn around and flagrantly disregard this rule in
statements such as (–2A) above. The point is that it is OK to use the same symbol
for different dummies, provided that their scopes don’t overlap; it is just the unbound
(“free”) variables we should apply this rule to.
The sigma notation for sums is a bit of a worry, because it is sometimes not obvious what
the scope is. Consider ∑_{i=1}^{10} i² + 1 . Is the 1 at the end included in the scope or
not? If not, the value is 386 and if it is, 395.

∑_{i=1}^{10} [i²] + 1 = 386    but    ∑_{i=1}^{10} [i² + 1] = 395
(The brackets here stand for the scope boxes, and they aren't quite right: they should
include the i at the bottom of the Σ-sign too.)
Well, we should have used parentheses to remove the ambiguity in these statements
( ∑_{i=1}^{10} i² ) + 1 = 386    but    ∑_{i=1}^{10} (i² + 1) = 395
This is just a mild problem with the sigma notation which won’t occur with our logic, so
we will ignore it here.
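The two readings correspond to two different little programs, which makes the ambiguity vivid:

    print(sum(i**2 for i in range(1, 11)) + 1)   # (Σ i²) + 1  →  386
    print(sum(i**2 + 1 for i in range(1, 11)))   # Σ (i² + 1)  →  395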
Now, for an example involving quantifiers, consider the deduction
A , B ⊢ A ∧ B

Suppose that P (x) and Q(x) are expressions which mention the variable x. By substitution,
the above allows

P (x) , Q(x) ⊢ P (x) ∧ Q(x)
For computer science geeks only: This sort of thing should be familiar to you. There
is no harm in different local variables being denoted by the same symbol so long as
their scopes do not overlap. This is almost exactly the same idea.
The discussion so far should be enough for the definition to make sense.
• All occurrences of a variable x are bound in (∀x)P ; any occurrence of any other variable
is free or bound in (∀x)P according as it is free or bound in P .
(iii) Note that the definition refers to occurrences of a variable. It is perfectly possible,
though rather unusual, for an expression to contain both bound and free occurrences of the
same variable. For example, in an expression such as

x + x = 2 ∨ (∀x)(x > 0)

the first two occurrences of x are free, the third and fourth bound.
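The definition of free occurrences translates directly into a recursive program. Here is a sketch in Python, using a throwaway encoding of expressions in which an atom simply carries the set of variables occurring in it; the encoding is my own, not part of the notes.

    def free_vars(e):
        tag = e[0]
        if tag == 'atom':                    # (E1): every variable occurs free
            return set(e[1])
        if tag == 'not':                     # (E2): the same as in P
            return free_vars(e[1])
        if tag == 'imp':                     # (E3): the union from both sides
            return free_vars(e[1]) | free_vars(e[2])
        if tag == 'forall':                  # (E4): the quantified variable becomes bound
            return free_vars(e[2]) - {e[1]}

    # x + y = x + 3 has x and y free; quantifying over x binds it:
    e = ('atom', {'x', 'y'})
    print(free_vars(e))                      # {'x', 'y'}
    print(free_vars(('forall', 'x', e)))     # {'y'}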
(∃x)(x + y = x + 3)
that is, there exists some x such that x + y = x + 3
Then we can say immediately that y = 3. But we have jumped over a step here. The proof,
with the subconscious missing step goes
(∃x)(x + y = x + 3) given,
(∃x)(y = 3) because x + y = x + 3 ⇔ y = 3,
y=3 because y = 3 does not contain x free.
Here the first occurrence of x is free, the second and third are bound. This sort of thing is
unusual in ordinary mathematical notation, but is in fact allowed, for example
g(x) = x² + ∫₀¹ f (x) dx .    (–3A)
As mentioned above, this kind of confusing notation is allowed in the formal language simply
because legislating against it causes more problems than it solves. While these are actually
allowed, in expressions like these it is far more sensible to change the name of the dummy
so that the meaning of the expression is easier to see:
P (x) ∨ (∀y)Q(y) , (–2B)
g(x) = x² + ∫₀¹ f (t) dt ,    (–3B)
(iii) Things become really hard to read when we have overlapping scopes with the same
dummy. For example one is allowed to take expressions such as (–2A) and (–3A) and
quantify them again to get, for instance,
(∃x)( P (x) ∨ (∀x)Q(x) ) ,    (–4A)

g(x) = x² + ∫₀¹ ∫₀¹ f (x) dx dx ,    (–5A)
When confronted with horrible expressions such as these, probably the best thing to do is
change the letters used for dummies, working from the innermost outward:
(∃x)( P (x) ∨ (∀y)Q(y) ) ,    (–4B)

g(x) = x² + ∫₀¹ ∫₀¹ f (t) dt dx ,    (–5B)
(Or just draw in the “scope boxes”.)
Challenge: Here is the definition of a function R → R. Figure out what the function is in a
more straightforward notation.
h(x) = ∫₀ˣ ∫₀ˣ x² dx dx    for all x .
How many different dummies, all called x, are there here anyway?
This kind of notation can nearly always be avoided, and I will do so in these notes. But one
should be aware that it is possible, being legal under the definition of an expression.
Of course not. For a start, that doesn’t even make sense. What we should have got is
g(3) = 3² + ∫₀¹ f (x) dx    CORRECT.
The point is that in the expression (–1A) there are both free and bound occurrences of x,
and we should have only changed the free ones. If we had rewritten (–1A) in the safer form

g(x) = x² + ∫₀¹ f (t) dt    for all x.    (–1B)
So we arrive at:
It follows immediately from these that substitution behaves in the same way with the other
connectives (the proof is easy and obvious):
All this is just saying in a complicated way that, to make the substitution [x/t] in any term
or expression, simply go through it and change every free occurrence of the variable x to
the term t. However occasionally, when we want to prove something about substitution, it
helps to pick the idea apart in this way.
(If we were being really careful, at this stage we should verify that the result of such a
substitution in a term is guaranteed to always yield a genuine term and that such a substi-
tution in an expression is guaranteed to always yield a genuine expression. This is not hard
to verify, but tiresome, so let’s leave it at that.)
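That "go through it and change every free occurrence" recipe is easily mechanised. Continuing with the same sort of throwaway encoding as before (atoms now list the terms they mention, so the replacement is visible), here is a sketch:

    def subst(e, x, t):
        # P[x/t]: replace every free occurrence of x by t.
        tag = e[0]
        if tag == 'atom':
            return ('atom', e[1], [t if a == x else a for a in e[2]])
        if tag == 'not':
            return ('not', subst(e[1], x, t))
        if tag == 'imp':
            return ('imp', subst(e[1], x, t), subst(e[2], x, t))
        if tag == 'forall':
            if e[1] == x:          # x is bound below this point: change nothing
                return e
            return ('forall', e[1], subst(e[2], x, t))

    # The free occurrence of x is changed; the bound one under (∀x) is not:
    P = ('imp', ('atom', '=', ['x', 'y']),
                ('forall', 'x', ('atom', '=', ['x', '0'])))
    print(subst(P, 'x', 't'))
    # ('imp', ('atom', '=', ['t', 'y']), ('forall', 'x', ('atom', '=', ['x', '0'])))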
is a radically different statement (the right hand side is now a constant!). What went wrong
here? The problem was that the two occurrences of x in (–1A) were both free and referred
to the same thing, whereas in (–1X) the first y is free, but the second one is bound: the
variable y here has two different meanings in the same statement: this won’t do.
In short, the problem is that we are trying the substitution [x/y], when x occurs within the
scope of the binding of y by the integral.
The same sort of thing occurs in logic. Suppose we are talking about the integers Z. Here
is an expression:
(∃y)(x + y = 1) . (–2)
This is a statement about the integer x, which is in fact true of all such x, so we could write
(∀x)(∃y)(x + y = 1) . (–3)
(∃y)(z + y = 1) .
This says the same thing about z as (–2) does about x and we have
(∀z)(∃y)(z + y = 1) .
All this is OK, however if we make the unacceptable substitution [x/y] in (–2) we get
(∃y)(y + y = 1) .
which means something completely different — it has no free variable and is false (for
integers).
So far we have looked at substitutions of one variable for another. We are also allowed to
substitute terms for variables. Here are the results of the substitutions [x/3] and [x/x² + 3]
in (–2):

(∃y)(3 + y = 1)
(∃y)(x² + 3 + y = 1) .

These are OK, but the unacceptable substitution [x/y² + 3], yielding

(∃y)(y² + 3 + y = 1) .
is not.
In short, an expression with a free variable x can be thought of as saying something about
that variable. An acceptable substitution [x/t] yields an expression which says the same
thing about the term t; an unacceptable substitution yields an expression which says some-
thing entirely different. And what makes this unacceptable is if the term t contains a variable
which changes from free to bound when t is substituted for some occurrence of x.
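The acceptability test can be mechanised too. The sketch below handles the variable-for-variable case [x/y]; for a general term t one runs the same test for every variable occurring in t. Again the encoding of expressions is just an illustrative choice.

    def acceptable(e, x, y, inside_y=False):
        # Is the substitution [x/y] acceptable in e? We track whether we
        # are inside the scope of a quantifier binding y.
        tag = e[0]
        if tag == 'atom':
            return not (x in e[2] and inside_y)  # a free x here would be captured
        if tag == 'not':
            return acceptable(e[1], x, y, inside_y)
        if tag == 'imp':
            return (acceptable(e[1], x, y, inside_y) and
                    acceptable(e[2], x, y, inside_y))
        if tag == 'forall':
            if e[1] == x:                        # no free x below this point
                return True
            return acceptable(e[2], x, y, inside_y or e[1] == y)

    # (∃y)(x + y = 1), written with ∀ and ¬ as ¬(∀y)¬(x + y = 1):
    P = ('not', ('forall', 'y', ('not', ('atom', 'sum1', ['x', 'y']))))
    print(acceptable(P, 'x', 'z'))   # True:  [x/z] is acceptable
    print(acceptable(P, 'x', 'y'))   # False: [x/y] would capture the free x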
(i) (One variable is being substituted for another.) If x and y are variable symbols, then
the substitution [x/y] is acceptable in P provided no free occurrence of x in P is within the
scope of a (∀y) quantifier.
(ii) (A variable is being substituted for itself.) As you would expect, the substitution
[x/x] is always acceptable in P , and P [x/x] is just P .
(iii) (An ineffective substitution.) If the expression P happens not to contain any occur-
rence of the variable x at all, then any substitution [x/t] is acceptable in P and again P [x/t]
is the same as P .
Here is a careful definition, by induction over the construction of the expression:
• A substitution [x/t] in (∀xQ) is always acceptable (note, the same x in both the
substitution and the quantifier).
One of the problems with this notation is that, with many of the results we discuss, an
expression P may have any number of variables, over and above the ones we are actually
interested in. But the P (x) notation leads us to think that that is the only variable in P .
But if we remember that P (x) might have other variables besides, we are allowed to write
things like
P (x) for x² + 3y ;
Now let’s do a couple of substitutions and see what happens. Here’s what we would expect
(with this notation).
Start with x² + 3y
Substitute y for x to get y² + 3y
Now substitute x for y to get x² + 3x, and that isn't P (x)!
What we are seeing here is a shortcoming of the “functional” notation for expressions: it just
isn’t up to the job when repeated substitutions occur (and in some other circumstances).
So I will use it occasionally for illustrative purposes, but we should not rely on it.
(PL5) (∀x)(S ⇒ A(x)) ⇒ (S ⇒ (∀x)A(x)) provided that x does not occur free in S.
Universal generalisation looks like this:
A(x)
(∀x)A(x)
of PL for the various letters in those theorems. And that, in turn, is because all the proofs
were schematic.
Put this another way: you could, if you wished, simply copy all the theorems of the SL
chapter and paste them into this one here, and they would stand as perfectly good theorems
and proofs for PL. (I don’t recommend actually doing this of course!)
These comments do not apply to the two big metatheorems of the previous chapter (the
Deduction Theorem and the Substitution of Equivalents Theorem). They need to have their
statements readjusted and be proved again for the new context.
There is a simple special case of this axiom which is used quite often. In this version no
substitution is made, that is, the variable x is left unchanged. So this special form of PL4 is
(∀x)A ⇒ A
(To get this special case from the general version of PL4, simply use the substitution [x/x].
We know that this kind of substitution is always acceptable, so we don’t need to bother
checking acceptability.)
It can only be used when S does not contain x free. I have used the letter S here in an
attempt to make this stand out by suggesting "sentence". This restriction means that PL5
is of very limited direct usefulness. It is, however, used as a crucial step in the proof of the
Deduction Theorem for PL and so,
of course, occurs as a “hidden” step in many proofs.
It was mentioned above that, if the expression A has no free occurrence of the variable x,
then (∀x)A is equivalent to A. So PL5 should be an equivalence, and it is; we will prove
this below.
In the last chapter we saw that expressions of the form X ⇒ (Y ⇒ Z) are equivalent to
X ∧ Y ⇒ Z and that the axioms of SL are probably easier to understand if you change
them to the latter form. (As before, Axiom PL5 is written in the double-implication form
because we want to write the axioms in fully-formal form, and the ∧ symbol is only available
in the semi-formal language.)

If we change Axiom PL5 in the same way to make it more readable, it becomes

(∀x)(S ⇒ A) ∧ S ⇒ (∀x)A    (same proviso)

or even

(∀x)(S ⇒ A) , S ⊢ (∀x)A    (same proviso) .
A theorem of a first-order theory may well contain free variables; in a theory of arithmetic,
for example, we would expect something like

x + y = y + x .
(By the way, we have already seen that every expression which turns up as a step in the
proof of a theorem is itself a theorem, and we know that all axioms are also theorems, so
what we are talking about here is free variables in any theorem, any step in the proof of a
theorem, or any axiom.)
Think of an expression (theorem, step, axiom) such as the one above as being true for any
values of its free variables. Let us have a look at an example of how this might work in a
proof.
When we come to the axiomatic development of mathematics, we will define the subset
relation thus:

A ⊆ B means (∀x)(x ∈ A ⇒ x ∈ B)

and then of course we want to prove some of the elementary properties of this relation. Let
us prove

A ⊆ B , B ⊆ C ⊢ A ⊆ C .

Using the definition above, this is the same as

(∀x)(x ∈ A ⇒ x ∈ B) , (∀x)(x ∈ B ⇒ x ∈ C) ⊢ (∀x)(x ∈ A ⇒ x ∈ C)
1 (∀x)(x ∈ A ⇒ x ∈ B) hyp
2 (∀x)(x ∈ B ⇒ x ∈ C) hyp
3 x∈A ⇒ x∈B PL4 on 1
4 x∈B ⇒ x∈C PL4 on 2
5 x∈A ⇒ x∈C SL Proposition F.3(a) on 3 and 4
6 (∀x)(x ∈ A ⇒ x ∈ C) UG on 5
Notice that Steps 3,4 and 5 in this proof contain a free variable (x). It is a little clearer now
what is meant by such steps: the free variable stands for an arbitrary x. You might say that
these lines are supposed to be true “for any x”, and in fact we do say just that explicitly in
the plain language version of the proof.
We do not write “Now, for any x” between Steps 2 and 3 of the formal proof for two reasons.
Firstly, because any occurrence of x as a free variable in a proof of a theorem represents
“any x” (and the same of course for any other variable symbol which occurs free). And
secondly, because there is no expression in the formal language to say, “From now on every
free occurrence of x stands for any x”.
Notice also what these tricks allow us to do. Between Steps 2 and 3 we strip the quantifiers
off the hypotheses. Then, in Steps 3, 4 and 5 we do some manipulations with x which
would be difficult, perhaps impossible, to do if the expressions were still surrounded by the
quantifiers. Finally, having done these manipulations, we put the quantifier back in Step 6.
(This makes sense because Step 5 still holds for any x.) It is UG which allows us to perform
this final step — and that is the whole point of UG.
In a fully formal proof, this is the only way a new free variable symbol may appear: the
only justifications for the appearance of a new free variable symbol in a fully formal proof
are (1) that it is in an instance of one of the axioms or (2) that it is in the result of an
application of PL4 (the term t in P [x/t] is, or contains, a new variable symbol).
However, mostly we will be using semi-formal proofs and for them we will introduce tech-
niques which will allow new free variable symbols to appear out of thin air in two more
ways: (3) in a hypothesis to a deduction and (4) as a choice variable in an application of
the Choice Rule. We will deal with these later in this chapter.
(ii) Important warning This form of Universal Generalisation (UG) can only be used
in the formal proof of a theorem. When we come to look at deductions in PL, we will see
that there is an extra condition necessary for UG to be valid in the proof of a deduction.
A.27 A remark
Mathematics is a first-order theory and so is built on the basis of predicate logic. All of the
considerations we have discussed so far in this chapter turn up everywhere in mathematics.
so it is clear that some heavy predicate logic is involved here. In fact, I have simplified the
definition somewhat: it really should specify that a, ε, δ and x are all real numbers, and
I have not yet said how notations such as (∀ε > 0) are defined. Adding these necessary
refinements would make the definition even longer and harder to read.
Nevertheless, Calculus 101 courses around the world all expect pass-level students to be able
to work with such definitions, without any preliminary training in predicate logic and usually
without any mention being made of the numerous subtleties and pitfalls I have discussed in
this chapter.
Go figure!
I mentioned above that UG cannot be used with complete freedom when proving a deduction.
Let us look at an example to see why a further condition is needed. Suppose we are working
in a formal theory to deal with the natural numbers, and we have got as far as defining
prime numbers. We might now want to prove

x is prime ⊢ x = 2 ∨ x is odd

and a proof might look like this:

x is prime    hyp
  ⋮
  (proof steps)    (Good proof)
  ⋮
x = 2 ∨ x is odd
Such a proof certainly exists; I have just left out the tiresome details. And this is all perfectly
correct so far.
HOWEVER, if I now use UG to add a further line to the bottom

(∀x)( x = 2 ∨ x is odd )    (Bad step)

I have "proved" something which is obviously absurd — that every natural number is either
2 or odd.
It is fairly clear what the problem is here. In the good part of the proof, every step follows
from the hypothesis. So any reference to the variable x (as a free variable) in the proof is
assuming that the hypothesis is true — that x is a prime number. In other words, in this
proof, x is not just any natural number, it is any prime number. And that means that UG
cannot be used in this way.
So the way out of this dilemma is to say that, in this kind of proof of a deduction from
hypotheses, UG must not be used to generalise any variable which occurs free in any hy-
pothesis. Remember that a deduction from hypotheses is not part of our fully formal theory.
It is a convenient addition in our semi-formal theory. So we are allowed to define a deduc-
tion in any way we please — but then it is incumbent on us to prove that it does what it is
supposed to do, that is, that the Deduction Theorem works.
Now there is another kind of argument which works from hypotheses and which looks very
similar. This occurs when we treat our hypotheses just like theorems. (We might in fact
already know that they are theorems, or we might want to ask what would happen if they
were theorems.) For the purposes of the present discussion, let us call such an argument an
entailment.
So the difference between an entailment and a deduction is the difference between two
possible ways of looking at the idea of "C follows from hypotheses H".

We will put some mild restrictions on a proof H ⊢ C so that the Deduction Theorem will
work, that is, so that it follows from such a proof that ⊢ H ⇒ C. I will use the symbol ⊢*
for such a restricted proof. We can also of course still deal with more general entailments
H1 , H2 , . . . , Hh ⊢ C if we want to.
One way of dealing with the problem is to insist that in a proof of a deduction H ⊢* C,
we do not allow any UG step in which the variable being generalised occurs free in H. More
generally, in a proof of a deduction H1 , H2 , . . . , Hh ⊢* C , we do not allow any UG step in
which the variable being generalised occurs free in any of the hypotheses H1 , H2 , . . . , Hh .
This restriction will indeed do the trick — with it the Deduction Theorem follows — and
we will prove it.
However, this is overkill: the Deduction Theorem still works with a milder condition. In
order for a UG step from S to (∀x)S to be valid, all that is necessary is that x should not
occur free in those hypotheses which are actually involved in the proof of the step S (and
that might well be not all of them).
So how do we decide which of the set of hypotheses are involved in the proof of some given
step? The easiest way to do this is to change the wording slightly as follows. Let us suppose
we have a deduction H ⊢* C with a purported proof S1 , S2 , . . . , Sn . Then a subset K of
H is sufficient for step Si if there is a subsequence of the given proof S1 , S2 , . . . , Sn giving
a valid proof of K ⊢* Si . Then we can say that the UG step from S to (∀x)S is allowed
provided there is some subset of the hypotheses which is sufficient for S, none of which
contains x free.
This wording happily covers a slightly confusing possibility. In writing down a proof, we
usually annotate each step with how it comes about (“UG from Step 3” or “MP from Steps 4
and 5” and so on). But the fact is that these annotations are not part of the proof but just
a kindness to the reader. It is perfectly possible (though uncommon) for a step to have two
different justifications (“UG from Step 3” and “MP from Steps 4 and 5”), but we only bother
to point out one. And this in turn means that a step might have more than one sufficient
set of hypotheses. If this does occur, then it is only necessary for one of those sets to have
no free occurrences of x. As just stated, this strange possibility is covered by the wording
above.
(iii) In the condition (ug∗) it is possible that the empty set of hypotheses might be
sufficient for Sj . If this happens, it means that there is a subsequence of the given proof
which constitutes a proof of Sj on the basis of no hypotheses at all — that is, that Sj is a
theorem.
(iv) If the hypotheses contain no free variables at all, then the proviso (ug∗) automatically
holds and so can be ignored.
(v) As a special case of this, where there are no hypotheses (that is, H is empty: this is
the proof of a theorem), the proviso (ug∗) is automatically true and can be ignored.
(vi) Where the proof does not use UG at all, the proviso does not apply at all.
(vii) From this it follows that all proofs of deductions given in the previous chapter on
Sentential Logic are equally valid as proofs of deductions in PL or any first-order theory.
(viii) The extra condition (ug∗) is the one that will make the Deduction Theorem work,
as discussed informally above and proved below.
(ix) Here is what this means for our small example proof above.
The short proof, labelled “Good proof” above, is a valid proof of the deduction
x is prime ⊢* x = 2 ∨ x is odd .    (–2)
The extra step, labelled “Bad step” is invalid. It is not allowed in a proof of such a deduction
because it fails the (ug∗) condition. As already remarked, this is a good thing because the
conclusion is obviously false.
The Deduction Theorem, which we prove next, will allow us to conclude from (–2) that
x is prime ⇒ x = 2 ∨ x is odd
and then, since this is a theorem (and not a step in the proof of a deduction) we may now
if we wish apply UG to conclude
(∀x)( x is prime ⇒ x = 2 ∨ x is odd )
All this is just what we want from deductions and the Deduction Theorem.
Earlier I said that all the theorems and proofs in Sentential Logic in Chapter 2 held just as
well for Predicate Logic in this chapter — that you could just cut and paste any theorem
from that chapter into this one, but that we will need to prove the Deduction Theorem
again. The reason is that, in the earlier proof, we looked at the various kinds of steps which
could occur in a deduction. Now that we are dealing with Predicate Logic, we have one
more kind of step, namely a UG step, and we need to deal with this as well. But the proof
for first-order theories is almost exactly the same as the proof for Sentential Logic, with just
a small addition to cope with UG steps.
In order to prove the Deduction Theorem we must first observe that Proposition 2.B.2 and
Deduction 2.B.4 of Chapter 2 still hold good for Predicate Logic: simply look at the proofs
given there and observe that they obey the rules for a proof in Predicate Logic.
Proof. The simple form is just a special case of the more general form (the case h = 1), so
we need only prove the latter. We do this by proving that if

H1 , H2 , . . . , Hh ⊢* C ,    (–1)

then

H1 , H2 , . . . , Hh−1 ⊢* Hh ⇒ C .    (–2)

Recall how this was done for Sentential Logic: given a proof S1 , S2 , . . . , Sn of (–1), we
formed the new sequence

Hh ⇒ S1 , Hh ⇒ S2 , . . . , Hh ⇒ Sn

with a few extra steps thrown in, which is a valid proof of (–2). We did this by working step
by step along the proof, which is the same as working by induction over n.
So, for some particular step Sk (with 1 ≤ k ≤ n) we assumed all previous steps Hh ⇒ Si
in the new proof were valid and proved that Hh ⇒ Sk is valid also. We did this by looking
at the various ways that Sk can be justified (as an axiom, as one of the hypotheses and as
following by Modus Ponens from earlier steps), giving a short proof of validity in each case.
All those short proofs still hold, word for word, in this new environment, so it remains only
to check the fourth, new, case: that Sk follows from an earlier step Sp by UG, that is, that
Sk is (∀x)Sp . Since this is part of the old valid proof, there is a subset, K say, of
H1 , H2 , . . . , Hh which is sufficient for Sp , no member of which contains x free. And that
means that there is some subsequence U1 , U2 , . . . , Um of S1 , S2 , . . . , Sp which constitutes a
valid proof of K ⊢* Sp .
Now we look at two subcases:

(i) Hh does not contain x free.

We have   K ⊢* Sp
so   K ⊢* Hh ⇒ Sp    (Axiom PL1 and MP)
so   K ⊢* (∀x)(Hh ⇒ Sp )    (UG: no member of K contains x free)
so   K ⊢* (∀x)(Hh ⇒ Sp ) ⇒ (Hh ⇒ (∀x)Sp )    (Axiom PL5: Hh does not contain x free)
so   K ⊢* Hh ⇒ (∀x)Sp    (MP)
B.4 Subdeduction
Subdeductions work for PL in the same way as for SL. The only difference is that the (ug∗)
restriction on the affected variables in a Universal Generalisation step must be observed. As
remarked above, the stronger but simpler condition (ug∗∗) is nearly always sufficient, which
is nice because it is much more easily seen to be obeyed in a proof.
To conform with the stronger condition (ug∗∗): wherever a Universal Generalisation step is
used, the affected variables must not occur free in any of the hypotheses of any box (subproof
or main proof) which encloses that step.
For example
The two UG steps shown must take into account free variables in H1 , H2 , . . . , Hh and
J1 , J2 , . . . , Jj but may ignore those in K1 , K2 , . . . , Kk and L1 , L2 , . . . , Ll .
Where the more general condition (ug∗) is used, it is almost always hidden in the substitution
of the statement of a deduction or theorem, as justified in Theorem C.1 above. In the rare
cases where (ug∗) is used explicitly, one should be careful to specify the sufficient set of
hypotheses in the comments.
Proof of PL6.
1 (∀x)(A ⇒ B) hyp
2 A⇒B PL4
3 (∀x)A subhyp
4 A PL4
5 B MP:4,2
6 (∀x)B UG
7 (∀x)A ⇒ (∀x)B Ded:3–6
Proof of PL7
1 S hyp
2 (∀x)S UG
It is an interesting exercise to prove that, using this alternative axiomatisation, PL5 can be
proved (that is what is needed to show that the two sets of axioms are equivalent). The
slightly tricky bit is that, if you use the Deduction Theorem in this proof (hard not to!),
then you have to first prove that it works for the new axiomatisation. Note that the proof
above of the Deduction Theorem does use PL5, so this is necessary.
That was fine for Sentential Logic, where UG is not involved with its special condition on
free variables in the hypotheses (and where there was no difference between deduction
and entailment). But now, with Predicate Logic, it seems that a problem can arise with
deductions. Consider this: suppose that we have already proved a deduction of the form
H1 , H2 , . . . , Hh ⊢* P    (–1)

and we are now trying to prove a new one

K1 , K2 , . . . , Kk ⊢* C    (–2)
Suppose further that in the course of the proof of (–2) we have already proved H 1 , H 2 , . . . , H h ,
so now we feel justified in writing down P as a new step, because of (–1).
But are we?
Our semiformal proof, up to the point where we assert P , looks like this
S1, S2, . . . , Sn , P (–3)
(Here S 1 , S 2 , . . . , S n are the steps so far.) Now, using the justification we used in the
previous chapter, all we have to do is insert the proof of (–1) before P to get a valid formal
proof:
S1, S2, . . . , Sn , T 1, T 2, . . . , T m , P (–4)
But not so fast! We are dealing with deductions now. It is possible that one of the steps
we have just inserted is the result of applying UG, and that pesky condition (ug∗) must
be checked. Now, the steps we have just inserted constitute a valid proof of (–1), so that
variable does not turn up free in any of the hypotheses H 1 , H 2 , . . . , H h of (–1). However,
now we are constructing a proof of (–2), and there is nothing at all to say that the variable
does not turn up free in one of its hypotheses K 1 , K 2 , . . . , K k . (This really can happen —
it’s not too hard to think up an example.) So this proof that it is always OK to insert P in
the semi-formal proof as in (–3) appears to be incorrect. That’s the bad news.
The good news is that it would only be incorrect if the simpler condition (ug∗∗) was all
we had to work with, but with the more subtle (ug∗), which is in fact the definition, the
insertion is correct. We prove this now.
Then

H1 ∧ H2 ∧ . . . ∧ Hh ⇒ P .

What this means is that, in constructing a proof, we can always add a new step (P ), provided
that step has already been proved to follow as a deduction from earlier steps in this proof.
Since a theorem is the same as a deduction from the empty set of hypotheses, it follows
from this that, in constructing a proof, we are always allowed to insert the statement of a
theorem (that is proved elsewhere) with no further ado.
This is good, because this is something we want to do often.
Proof. Inspection of the definitions of proofs in SL and first-order languages shows that a
proof in SL is a proof in any first-order language, without need for any change.
In proofs and arguments henceforth, it will be assumed that you are familiar with Sentential
Logic. Steps in proofs which depend on sentential logic arguments alone are simply notated
“by SL” or something similar.
Back in the early discussion of binding (A.11), I stated that one of the ways of recognising a
bound (= dummy) variable is that it can be replaced by another variable without changing
the meaning of the expression (the substitution as usual should be acceptable). We are now
in a position to prove this.
Proof. Using the deduction theorem, we prove (∀x)P ⊢* (∀y)P [x/y] .
1 (∀x)P hyp
2 P [x/y] PL4 and MP
3 (∀y)P [x/y] UG
Note that x occurs in the hypothesis, but it does not occur free there; so y does not occur
free in the hypothesis, even if it is the same as x. Therefore the UG step is OK.
Exercise Think up a couple of simple examples to show that ignoring the provisos in this
theorem can lead to silly results.
That is the fully formal version, a bit awkward to read as usual. In semi-formal notation
we usually write
(∃x)P stands for ¬(∀x)¬P
From this definition we immediately get the basic relationships between the two quantifiers:

(–1) (∃x)P ⇔ ¬(∀x)¬P
(–2) (∃x)¬P ⇔ ¬(∀x)P
(–3) ¬(∃x)P ⇔ (∀x)¬P
(–4) ¬(∃x)¬P ⇔ (∀x)P

Note that these all make sense when expressed in plain English:
(1) To say that there exists some x such that P is true is the same as saying that P is
not false for all x.
(2) To say that there exists some x such that P is false is the same as saying that P is
not true for all x.
(3) To say that there does not exist any x such that P is true is the same as saying that
P is false for all x.
(4) To say that there does not exist any x such that P is false is the same as saying that
P is true for all x.
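Over a finite domain these four relationships can be checked directly, with Python's any and all playing the roles of ∃ and ∀; the domain and the predicate P below are arbitrary illustrative choices.

    D = range(10)
    P = lambda x: x % 3 == 0

    ex, ex_not = any(P(x) for x in D), any(not P(x) for x in D)
    al, al_not = all(P(x) for x in D), all(not P(x) for x in D)

    print(ex == (not al_not))        # (–1): (∃x)P  ⇔ ¬(∀x)¬P
    print(ex_not == (not al))        # (–2): (∃x)¬P ⇔ ¬(∀x)P
    print((not ex) == al_not)        # (–3): ¬(∃x)P ⇔ (∀x)¬P
    print((not ex_not) == al)        # (–4): ¬(∃x)¬P ⇔ (∀x)P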
Now we prove another obvious property of this quantifier. Suppose we know that the
expression P is true when some particular value (t say) is substituted for the variable x,
then we can say that there exists an x such that P is true. Stating this properly:
P [x/t] ⇒ (∃x)P .
Proof. Using the definition of ∃, this is P [x/t] ⇒ ¬(∀x)¬P . The proof is easy:
D.5 Proposition
(∀x)(P ⇔ Q) ⇒ ((∀x)P ⇔ (∀x)Q)
We are proving this here because it is needed for the next theorem.
Proof. As usual we prove (∀x)(P ⇔ Q) ⊢* ((∀x)P ⇔ (∀x)Q) .
1 (∀x)(P ⇔ Q) hyp
2 P ⇔Q PL4 on 1
3 P ⇒Q Definition of ⇔ on 2
4 (∀x)(P ⇒ Q) UG
5 (∀x)P ⇒ (∀x)Q PL6
6 (∀x)Q ⇒ (∀x)P Similarly
7 (∀x)P ⇔ (∀x)Q Definition of ⇔ on 5,6
Substitution of equivalents in the context of Sentential Logic was proved in Theorem 2.F.18.
The theorem still works for Predicate Logic, assuming that care is taken to avoid accept-
ability problems (we should expect that by now). It is also a very useful theorem — we tend
to use it all the time without thinking much about it. The proof is almost the same as the
one for the SL version, but I will give the updated one because of its importance.
(i) If A ⇔ A′ then X ⇔ X′ .

(ii) Suppose further that this substitution is "acceptable" in the sense that, if A lies
within the scope of a quantifier (∀x) in X, then neither A nor A′ contains a free occurrence
of x. Then

(A ⇔ A′) ⇒ (X ⇔ X′)    (–1)
Proof (i). The proof is by induction over the construction of X and is exactly the same
as the proof of the SL version, referred to above, except that one extra case is needed to
cover a quantifier, that is, where X is (∀x)P and A is a subexpression of P . Then X′
is (∀x)P′ , where P′ is the result of replacing A by A′ in P .

We have P ⇔ P′ by the inductive hypothesis.
Such expressions turn up from time to time (often while simplifying more complicated
expressions), so we need to be able to deal with them. Luckily, that is usually quite simple.
Basically, a statement such as

(∀x)( 2 + 2 = 4 )    (–1A)

is equivalent to the same one without the quantifier, that is

2 + 2 = 4 .    (–1B)
More generally, these remarks apply to expressions in which the variable in question does
not occur free. For example,

(∀x)(∀y)( x + y = y + x )    (–2A)

is equivalent to

(∀x)(∀x)(∀y)( x + y = y + x )    (–2B)

and to

(∀y)(∀x)(∀y)( x + y = y + x ) .    (–2C)
E.1 Proposition
Suppose that the variable x does not occur free in S. Then
(i) S ⇔ (∀x)S
(ii) S ⇔ (∃x)S .
Proof. (i) S ⇒ (∀x)S by PL7. Conversely (∀x)S ⇒ S by PL4. The result follows.
(ii) Applying (i) to ¬S we have ¬S ⇔ (∀x)¬S. But (∀x)¬S ⇔ ¬(∃x)S and so we
have ¬S ⇔ ¬(∃x)S. The result now follows by SL.
E.2 Proposition
Suppose that the variable x does not occur free in S (but may occur free in P ). Then
(i) (∀x)(S ∧ P ) ⇔ S ∧ (∀x)P
(ii) (∀x)(S ∨ P ) ⇔ S ∨ (∀x)P
(iii) (∃x)(S ∧ P ) ⇔ S ∧ (∃x)P
Proof. (i) First we prove that (∀x)(S ∧ P ) ⊢* S ∧ (∀x)P .
1 (∀x)(S ∧ P ) hyp
2 S∧P PL4 on 1
3 S SL on 2
4 P SL on 2
5 (∀x)P UG on 4
6 S ∧ (∀x)P SL on 3 and 5
Next we prove that S ∧ (∀x)P ⊢* (∀x)(S ∧ P ).
1 S ∧ (∀x)P hyp
2 S SL on 1
3 (∀x)P SL on 1
4 P PL4 on 3
5 S∧P SL on 2 and 4
6 (∀x)(S ∧ P ) UG on 5
(ii) First we prove that (∀x)(S ∨ P ) ⊢* S ∨ (∀x)P , writing S ∨ P in the equivalent form
¬S ⇒ P (by SL):

1 (∀x)(¬S ⇒ P ) hyp
2 ¬S ⇒ P PL4 on 1
3 ¬S subhyp
4 P MP: 3 and 2
5 (∀x)P UG on 4
6 ¬S ⇒ (∀x)P Ded: 3–5
Now we prove S ∨ (∀x)P ⊢* (∀x)(S ∨ P ).
1 S ∨ (∀x)P hyp
2 S subhyp
3 S∨P SL on 2
4 (∀x)(S ∨ P ) UG on 3
5 S ⇒ (∀x)(S ∨ P ) Ded: 2–4
6 (∀x)P subhyp
7 P PL4 on 6
8 S∨P SL on 7
9 (∀x)(S ∨ P ) UG on 8
10 (∀x)P ⇒ (∀x)(S ∨ P ) Ded: 6–9
11 (∀x)(S ∨ P ) Proof by Cases: 1,5 and 10
E.3 Proposition
Suppose that the variable x does not occur free in S (but may occur free in P ). Then
(i) (∀x)(S ⇒ P ) ⇔ (S ⇒ (∀x)P )
(ii) (∀x)(P ⇒ S) ⇔ ((∃x)P ⇒ S)
Proof. (i) Using the theorem of SL that (A ⇒ B) ⇔ (¬A ∨ B), this is the same as

(∀x)(¬S ∨ P ) ⇔ ¬S ∨ (∀x)P

which is E.2(ii) (with ¬S in place of S).

(ii)    (∀x)(P ⇒ S) ⇔ (∀x)(¬S ⇒ ¬P )    by SL
⇔ ¬S ⇒ (∀x)¬P just proved
⇔ ¬S ⇒ ¬(∃x)P see Section D.3(–3)
⇔ (∃x)P ⇒ S by SL
The forward implication here,

(∀x)(P ⇒ S) ⇒ ((∃x)P ⇒ S) ,

can be rewritten

(∀x)(P ⇒ S) , (∃x)P ⊢ S
which is a sort of quantified version of the Constructive Dilemma.
Proof Suppose not, that is, that neither A nor B is a subset of the other.
Then there are group members a ∈ A and b ∈ B such that a ∉ B and b ∉ A.
Since a and b are members of the subgroup A ∪ B, we have ab ∈ A ∪ B and so
either ab ∈ A or ab ∈ B; suppose without loss of generality that ab ∈ A. Now
remember that a⁻¹ ∈ A also, so b = a⁻¹(ab) ∈ A, a contradiction.
There are a number of interesting aspects of this proof (proof by contradiction, use of the
Constructive Dilemma, apparently unavoidable use of Reductio ad Absurdum and so on).
The part of interest now is the assertion of the existence of a and b. Making the assertion
of the existence of a a little bit more formal, it goes
A ⊈ B
Therefore ¬(∀x)(x ∈ A ⇒ x ∈ B)    from the definition of A ⊆ B
and so (∃x)(x ∈ A ∧ x ∉ B)    by PL manipulation.
Now let a be an element such that a ∈ A and a ∉ B.
etc . . .
(v) Si follows by the Choice Rule from an earlier expression in the sequence, that is, it
is of the form C[x/a] where
(a) There is an earlier step of the form (∃x)C (same C and x).
(b) a is a variable symbol which does not occur in the proof before the step
C[x/a] (and in particular, it must not occur, free or bound, in any hypothesis of the proof).
(c) The variable a must not be the variable being generalised in any application
of UG in this proof.
(d) The variable a must not occur free in the “target” expression. This means that,
when this proof with choice, H1 , H2 , . . . , Hh ⊢* A , is used to prove H1 ∧ H2 ∧ . . . ∧ Hh ⇒ A,
the expression A must not contain a free.
F.3 Notes
(i) The conditions effectively make a choice variable behave like a new constant.
(ii) Under this definition, a deduction (with choice) in which the Choice Rule is not in
fact used, is just an ordinary deduction. Therefore this new kind of deduction is an extension
of the old kind.
(iii) Note that the Choice Rule may be used several times in the same deduction (we used
it twice in the introductory example F.1 above); the wording of the definition takes this into
account.
(iv) Regarding (d). The expression A may however contain a (∃x)C quantifier, in fact is
likely to. Together with (b) it tells us that a cannot occur free anywhere in H1 ∧H2 ∧ . . . ∧Hh ⇒ A.
The point of (d) is that this kind of proof subverts the variable a to act like a constant for
the duration of this proof. But once the proof is over, we do not want a to be stuck in this
rôle for all time; once the proof is over, it reverts to being a plain variable again.
F.4 Theorem
H ⊢* A (with choice)    (–1)

if and only if

H ⊢* A (without choice)    (–2)
The theorem is important. Make sure you understand it and how to use the Choice Rule.
The proof is not so important to understand, and is included here as usual for completeness.
On the other hand, it is not too long and quite nice, so don’t let me discourage you from
working through it.
Proof. It has just been observed that a deduction without choice is a special case of a
deduction with it, so if (–2) then (–1). We now prove that if (–1) then (–2), by induction
over the number of applications of the Choice Rule in the assumed proof with choice
B1, B2, . . . , Bn
of (–1). If the Choice Rule is not used in the proof, then it is already a proof without choice
and (–2) is the case. Now suppose that the Choice Rule is used at least once; suppose further
that the first use of the rule in the proof is to (∃x)C to yield C[x/c]. Now note that
H ∪ {C[x/c]} ⊢* A (with choice).    (–3)
This uses exactly the same proof, except that the line C[x/c] is now justified by “Hyp”
instead of “choice rule”. [Not so fast. We must check that B 1 , B 2 , . . . , B n does indeed
satisfy the conditions in the definition of a proof with choice of (–3). A careful reading of
the conditions shows that this is in fact so.] Therefore the Choice Rule is used one less time
in B 1 , B 2 , . . . , B n as a proof of (–3) than it was as a proof of (–1). We may therefore use
induction to conclude that
H ∪ {C[x/c]} ⊢* A (without choice)    (–4)
and then apply the (ordinary) Deduction Theorem to conclude
H ⊢* C[x/c] ⇒ A (without choice).
Since one of the conditions on the choice variable c was that it should comply with (ug∗),
we may use Universal Generalisation to conclude further that
H ⊢* (∀c)(C[x/c] ⇒ A) (without choice).
Another condition on c was that it should not occur in A either, and so by E.3(ii)

H ⊢* (∃c)C[x/c] ⇒ A (without choice).    (–5)
We also note that (∃x)C occurs in the proof before C[x/c], and so that part of the proof
from the beginning up to and including (∃x)C constitutes a proof of that expression in which
the Choice Rule is not used at all. In other words
H ⊢* (∃x)C (without choice).
Yet another condition on c was that it should not occur in this step (∃x)C of the proof, and
so
H ⊢* (∃c)C[x/c] (without choice).    (–6)
The result now follows from (–5) and (–6).
The next proposition is important; it deals with when we can or cannot change the order in
which quantifiers occur. The proof of Part (iii) is a nice example of the use of the Choice
Rule.
F.5 Proposition
(i) (∀x)(∀y)P ⇔ (∀y)(∀x)P
(ii) (∃x)(∃y)P ⇔ (∃y)(∃x)P
(iii) (∃x)(∀y)P ⇒ (∀y)(∃x)P
Proof. (i) First we prove (∀x)(∀y)P ⊢* (∀y)(∀x)P .
1 (∀x)(∀y)P hyp
2 (∀y)P PL4 on 1
3 P PL4 on 2
4 (∀x)P UG on 3
5 (∀y)(∀x)P UG on 4
The reverse deduction is proved in the same way, and so is (ii). For (iii) we prove
(∃x)(∀y)P ⊢* (∀y)(∃x)P :

1 (∃x)(∀y)P Hyp
2 (∀y)P [x/c] Choice rule on 1
3 P [x/c] PL4 on 2
4 (∃x)P Existential Generalisation on 3
5 (∀y)(∃x)P UG (Note that y does not occur free in Line 1)
F.6 Remarks
In calculus, continuity of a function f : R → R is defined thus:
(∀a ∈ R)(∀ε > 0)(∃δ > 0)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε )
while uniform continuity is defined thus:
(∀ε > 0)(∃δ > 0)(∀a ∈ R)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε ) .
The only difference between these definitions is the order of the first three quantifiers.
Consequently the two definitions are not equivalent. For another example, consider the
sentence
(∀x)(∃t)(∃y) L(x, y, t)
"Everybody, at some time, loves somebody".
The theorem above tells us that it is OK to swap the two existential quantifiers with one
another, but it is not OK to swap the existentials with the universal — the meaning will change:
(∃y)(∀x)(∃t) L(x, y, t)
“There is a particular person that everybody loves at some time”
and
(∃t)(∀x)(∃y) L(x, y, t)
“There is/was/will be a particular time at which everybody loves someone” .
Changing quantifiers from existential to universal and vice versa has an even more violent
effect:
(∀x)(∃y)(∀t) L(x, y, t)
“Everybody loves someone forever” .
And because I try to be a good guy: this song is copyright © 1948, 1975 Sinatra Songs, Inc.
and written by Sam Coslow, Irving Taylor and Ken Lane, 1947. It was the theme song of
The Dean Martin Show 1965–1974.
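Since these claims are about meaning rather than formal proof, they are easy to sanity-check semantically. Here is a small Python sketch (my own illustration, not part of the formal development): over a tiny finite domain of people and times, the "somebody is loved by everybody" form implies the "everybody loves somebody" form for every possible loves relation, while the converse fails.

# Quantifier order over a finite "loves" relation L(x, y, t):
# (exists y)(forall x)(exists t) implies (forall x)(exists t)(exists y),
# but not conversely. People and times are small integer ranges.
from itertools import product

P, T = range(2), range(2)

def somebody_loved_by_all(L):    # (exists y)(forall x)(exists t) L(x,y,t)
    return any(all(any(L(x, y, t) for t in T) for x in P) for y in P)

def everybody_loves_somebody(L): # (forall x)(exists t)(exists y) L(x,y,t)
    return all(any(any(L(x, y, t) for y in P) for t in T) for x in P)

# the implication holds for every relation on this domain ...
for bits in product([False, True], repeat=len(P) * len(P) * len(T)):
    L = lambda x, y, t, bits=bits: bits[(x * len(P) + y) * len(T) + t]
    assert not somebody_loved_by_all(L) or everybody_loves_somebody(L)

# ... but the converse fails: here everyone loves only themselves
L = lambda x, y, t: x == y
assert everybody_loves_somebody(L) and not somebody_loved_by_all(L)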
G Qualified quantifiers
G.1 Discussion
Consider the definition of continuity in elementary calculus: a function f : R → R is contin-
uous if
(∀a ∈ R)(∀ε > 0)(∃δ > 0)(∀x ∈ R)( |x − a| < δ ⇒ |f(x) − f(a)| < ε )
We see that the quantifiers here are slightly more complicated than the ones we have dis-
cussed so far: they contain qualifications within the quantifiers themselves. (∀x ∈ R) means
“for all real x”, not “for all x whatsoever”. Similarly (∃δ > 0) means “there exists a real δ > 0
such that”. Other similar ways of qualifying a quantifier will probably occur to you.
If we expand these slightly in an obvious way,
(∀a ∈ R) means (∀a such that a ∈ R)
(∃δ > 0) means (∃δ such that δ is real and δ > 0)
and so on, we see that all such notations are shorthand for one or other of
(∀x such that Q(x)) or (∃x such that Q(x))
where Q(x) is some expression involving x. And now we can make a general definition which
covers all such cases:
(∀x such that Q(x)) P (x) means (∀x)(Q(x) ⇒ P (x))
(∃x such that Q(x)) P (x) means (∃x)(Q(x) ∧ P (x))
I have used the slightly sloppy “functional” notation here because it is perhaps easier to see
what is going on. Here is a properly written definition: for any expressions P and Q,
(∀x such that Q) P means (∀x)(Q ⇒ P)
(∃x such that Q) P means (∃x)(Q ∧ P) .
It is nice to know that most of the manipulations we have been verifying for working with
ordinary quantifiers still work with these qualified ones. For example, the basic relationship
between ∀ and ∃:
¬(∀x such that Q)P ⇔ (∃x such that Q)(¬P) .
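For a finite domain these abbreviations can be checked semantically. The following Python fragment (an informal illustration only; the particular predicates Q and P are arbitrary choices of mine) unpacks the two definitions with all and any and confirms the ∀/∃ relationship just stated.

# Qualified quantifiers over a finite domain: (forall x s.t. Q)P unpacks
# with an implication, (exists x s.t. Q)P with a conjunction, and negating
# the first gives the second applied to not-P.
D = range(-5, 6)
Q = lambda x: x > 0          # the qualification
P = lambda x: x * x >= x     # some property

forall_such_that = all(not Q(x) or P(x) for x in D)   # (forall x)(Q => P)
exists_such_that = any(Q(x) and P(x) for x in D)      # (exists x)(Q and P)

assert forall_such_that == (not any(Q(x) and not P(x) for x in D))
print(forall_such_that, exists_such_that)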
Normally Q would mention x but not y and R would mention y but not x. But there is no
hard and fast rule about this. It is perfectly possible to have such an expression as
(∀x > 0)(∀y > x) X .
In this case it would not be valid to swap the two quantifiers: (∀y > x)(∀x > 0)X. However,
so long as we avoid this kind of overlapping of quantifiers, the rules for swapping their order
(F.5) still hold.
G.4 Proposition
Provided Q does not mention y free and R does not mention x free, the following hold:
(i) (∀x such that Q)(∀y such that R)A ⇔ (∀y such that R)(∀x such that Q)A
(ii) (∃x such that Q)(∃y such that R)A ⇔ (∃y such that R)(∃x such that Q)A
(iii) (∃x such that Q)(∀y such that R)A ⇒ (∀y such that R)(∃x such that Q)A
Note As before, the third statement here is an implication, not an equivalence. The
reverse implication is not a theorem of PL.
Proof (i).
H Consistency of PL
H.1 Theorem
Plain Predicate Logic PL is consistent.
Proof. We show that Predicate Logic contains no expression P such that both ⊢ P and
⊢ ¬P.
We define, for each expression P, a value ‖P‖ which will be either 0 or 1. This is by
induction over the construction of P.
• If P is atomic then ‖P‖ = 1.
• If P is ¬Q then ‖P‖ = 1 − ‖Q‖.
• If P is Q ⇒ R then ‖P‖ = 1 if ‖Q‖ ≤ ‖R‖, and ‖P‖ = 0 otherwise.
• If P is (∀x)Q then ‖P‖ = ‖Q‖.
H.2 Lemma
If the substitution [x/t] is acceptable in P, then ‖P[x/t]‖ = ‖P‖.
Proof. This follows the construction of P, using the "careful" description of acceptability
given in A.19.
If P is atomic, then so is P[x/t] and so both have value 1.
If P is ¬Q and [x/t] is acceptable in P, then it is acceptable in Q also and we have ‖P[x/t]‖ =
‖¬Q[x/t]‖ = ‖¬Q‖ = ‖P‖.
If P is Q ⇒ R and [x/t] is acceptable in P, then it is acceptable in Q and in R also
and so, since ‖Q[x/t]‖ = ‖Q‖ and ‖R[x/t]‖ = ‖R‖, ‖P[x/t]‖ = 1 if and only if ‖Q‖ ≤ ‖R‖, that is,
‖P[x/t]‖ = ‖Q ⇒ R‖ = ‖P‖.
If P is (∀y)Q, where y is different from x and [x/t] is acceptable in P, then it is acceptable
in Q also and the variable symbol y does not occur in t. Then
‖P[x/t]‖ = ‖(∀y)(Q[x/t])‖ = ‖Q[x/t]‖ = ‖Q‖ = ‖P‖.
Finally, if P is (∀x)Q then P[x/t] is P itself (x is not free in P), so ‖P[x/t]‖ = ‖P‖ and the
result is trivially true.
H.3 Lemma
The value of every theorem is 1.
Proof. This is by induction over the length of the proof of the theorem. It is enough to
prove that the value of every axiom is 1 and that, if the expression P follows by Modus
Ponens or Universal Generalisation from expressions of value 1, then ‖P‖ = 1 also.
The proofs for Axioms PL1, PL2 and PL3 are simply by checking the various cases for values
of ‖P‖, ‖Q‖ and ‖R‖.
Given the lemma, the theorem follows: ‖P‖ and ‖¬P‖ = 1 − ‖P‖ cannot both be 1, so no
expression P can have both ⊢ P and ⊢ ¬P.
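The case-checking for the axioms is easy to mechanise. Here is a Python sketch of the valuation ‖·‖ and the check; note that the exact forms of PL1–PL3 are not restated in this section, so the three schemas below are the common ones and are my assumption — substitute whichever forms these notes use.

# The 0/1 valuation from the consistency proof. Expressions are nested
# tuples: ("atom", name), ("not", P), ("imp", P, Q). Treating P, Q, R as
# atoms with variable values mirrors "checking the various cases".
from itertools import product

def val(P, env):
    """The value ||P||, with atoms given values by env (name -> 0 or 1)."""
    tag = P[0]
    if tag == "atom":
        return env[P[1]]
    if tag == "not":
        return 1 - val(P[1], env)
    if tag == "imp":                        # ||Q => R|| = 1 iff ||Q|| <= ||R||
        return 1 if val(P[1], env) <= val(P[2], env) else 0
    raise ValueError(tag)

def imp(p, q): return ("imp", p, q)
def neg(p):    return ("not", p)

P, Q, R = ("atom", "P"), ("atom", "Q"), ("atom", "R")
axioms = [
    imp(P, imp(Q, P)),                                   # PL1 (assumed form)
    imp(imp(P, imp(Q, R)), imp(imp(P, Q), imp(P, R))),   # PL2 (assumed form)
    imp(imp(neg(Q), neg(P)), imp(P, Q)),                 # PL3 (assumed form)
]
for A in axioms:
    assert all(val(A, dict(zip("PQR", vs))) == 1
               for vs in product((0, 1), repeat=3))
print("PL1-PL3 all take value 1, as Lemma H.3 requires")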
H.4 Comment
In Section 5.F we will prove that Plain Predicate Logic is not complete. In fact we could
do it right now with a slight modification of the above technique, but it sits better in the
Models chapter.
Towards the end of these notes we will prove that Plain Predicate Logic is not decidable.
There are various ways of doing this and they all require a fair amount of background. We
will show that the result follows without too much trouble from Gödel’s Incompleteness
Theorem, which we are going to prove anyway.
I First-order theories compared with PL
I.1 Proposition
Let K be a first-order theory with proper axioms A. Then, for any expression X,
⊢ X in K (–1)
if and only if
A ⊢ X in PL . (–2)
Note well: in (–2) the plain turnstile symbol ⊢ is used, not ⊢∗. This means that this is an
entailment, not a deduction. (This is exactly the kind of fact for which the idea of entailment
is used.)
Proof. The proofs of the two things are the same, the only difference being that where a
member of A turns up as a line in the proof, in (–1) it is justified by being an axiom and in
(–2) by being a hypothesis.
This is a good place to define the idea properly, so let us do that now.
Definition. A first-order theory is finitely axiomatisable if there is a finite set of its
expressions which, taken as the proper axioms, generates exactly the same set of theorems.
Note the wording here. It might happen that we define a first-order theory by a convenient
but infinite set of axioms, and then find out that there is another, perhaps less convenient,
finite set which generates the same set of theorems. In that case the theory is finitely
axiomatisable. This will in fact occur with the very first set of axioms we investigate in the
next chapter.
In general it might be far from obvious whether any given theory is finitely axiomatisable
or not.
4. SOME FIRST-ORDER THEORIES
For an example of the last two items here, suppose we are solving the equation
cos x + cos 2x = 1 . (–1)
A natural way to proceed would be to use the fact that we know that (have already proved
that)
cos 2x = 2 cos² x − 1 (–2)
so we can substitute this equality into the term cos x + cos 2x to get
cos x + 2 cos² x − 1
or, better still, substitute this equality (–2) into the equation (–1) (an expression) to get the
equivalent expression
cos x + 2 cos² x − 1 = 1
after which it is all downhill. If we write P for Equation (–1) then it would be natural to
write Equation (–2) as
P[ cos 2x / 2 cos² x − 1 ] (–3)
but there is a problem with this notation: in a substitution [x/y], y may be any term, but
x must be a variable symbol. Thus the notation in (–3) is not allowed.
A binary relation which did not have all these properties would not be called “equality” and
it would be foolhardy to use the symbol = for such a relation.
We now look at a pair of axioms which will give all these properties.
A.2 Definition
A first-order theory with equality is a first-order theory with an extra binary relation = and
the extra axioms:
(Eq1) (∀x)( x = x )
(Eq2) (∀x)(∀y)( x = y ⇒ (P ⇔ P[x/y]) ) provided the substitution is acceptable.
Eq2 is an axiom schema: one axiom for every choice of expression P . On the other hand,
Eq1 is a plain axiom.
As usual with axioms, we can use either the fully quantified version, as I have done above,
or the unquantified version:
(Eq1u) x = x
(Eq2u) x = y ⇒ (P ⇔ P[x/y]) provided both substitutions are acceptable.
(You can deduce the unquantified versions from the fully quantified ones by using PL4, and
go in the opposite direction by UG.)
It is not at all obvious that all the properties of equality listed above can be deduced from
just these two axioms. Nevertheless they can, and we will prove this now.
A.3 Proposition
(i) (∀x)( x = x )
(ii) (∀x)(∀y)( x = y ⇒ y = x )
(iii) (∀x)(∀y)(∀z)( x = y ∧ y = z ⇒ x = z )
Proof. (i) is immediate from Axiom Eq1.
(ii) Let P be the expression y = x. Then P[x/y] is the expression y = y and Axiom Eq2
gives
x = y ⇒ (y = x ⇔ y = y)
∴ x = y ⇒ (y = x ⇔ T) by Eq1
∴ x = y ⇒ y = x.
(iii) Let P be the expression x = z. Then P [x/y] is the expression y = z and Axiom Eq2
gives
x = y ⇒ (x = z ⇔ y = z)
∴ x = y ⇒ (y = z ⇒ x = z)
which is the same as x = y ∧ y = z ⇒ x = z.
When we come to substituting equals for equals we have a problem, one we can fix without
too much trouble, but we have to notice that it is there. The thing is, we have a new kind
of substitution here.
With the kind of substitution we have been dealing with in the last chapter (for example
P [x/t]) the thing being substituted for (x in the example) must be a variable symbol and
every free occurrence of x must be replaced by t.
On the other hand, with substitution of equals we can substitute one term for another, and,
as stated above, it is OK to substitute for some but not all of the occurrences.
This raises the spectre that we might need to invent a new notation for this kind of substi-
tution. Luckily, we don’t. With a little fancy footwork we can make ordinary substitution
(as we have been using it) work fine.
The next proposition is the crucial one which makes it work, and the following discussion
shows how.
A.4 Proposition
(i) For any term t and variable z,
(∀x)(∀y)( x = y ⇒ t[z/x] = t[z/y] )
(ii) For any expression P and variable z, provided the substitutions are acceptable,
(∀x)(∀y)( x = y ⇒ (P[z/x] ⇔ P[z/y]) )
Proof. (ii) Suppose that x = y. Apply Axiom Eq2 twice, once to P[z/x] and once to
P [z/y]:
P [z/x] ⇔ P [z/x][x/y]
and P [z/y] ⇔ P [z/y][x/y] .
It remains only to observe that P[z/x][x/y] and P[z/y][x/y] are the same expression. This
is simply because of the properties of substitution. In the first substitution P[z/x][x/y]:
• All free occurrences of x in P are left alone by [z/x] and then changed to y by [x/y].
• All free occurrences of y in P are left alone by both substitutions.
• All free occurrences of z in P are changed to x by [z/x] and then to y by [x/y].
In the second substitution P[z/y][x/y] the final effect is the same. In summary:
  [z/x] [x/y]            [z/y] [x/y]
x → x → y              x → x → y
y → y → y              y → y → y
z → x → y              z → y → y
and, by the calculations just done above, the two sides of this equation are in fact the same
term, so P [x/y] is true by Eq1.
Now Eq2 gives
x = y ⇒ (P ⇔ P [x/y])
which is x = y ⇒ (P ⇔ T )
and so x=y ⇒ P
which is x = y ⇒ (t[z/x] = t[z/y])
as required.
The first problem raised above was one of notation; the second is how to define the operation
of substituting one term for another.
We will see that the last proposition above in fact solves both these problems for us.
Let us suppose we have proved that s = s′, where s and s′ are terms, we are confronted with
a larger term, t say, which contains several occurrences of s and we want to substitute s′ for
some of these occurrences, but perhaps not all.
Mental picture of s occurring three times as a subterm of t:
t :   s   s   s
Mental picture of substituting s′ for some but not all of the s occurring and so changing t
to t′:
t′ :   s′   s   s′
Now choose a variable symbol z which does not occur in t and let t∗ be the result of replacing
just those occurrences of s by z:
t∗ :   z   s   z
Then t = t∗[z/s] and t′ = t∗[z/s′].
So now what we want to prove becomes: t∗[z/s] = t∗[z/s′]; but this is just what Part (i) of
the previous Proposition (A.4) tells us.
The proof that such partial substitution of equals for equals in an expression yields an
equivalent expression is almost the same, and should be obvious by now. Here are the
details anyway.
Let us suppose we have proved that s = s′, where s and s′ are terms, we are confronted with
an expression, P say, which contains several occurrences of s and we want to substitute s′
for some of these occurrences, but perhaps not all.
Mental picture of s occurring three times as a subterm of P:
P :   s   s   s
Mental picture of substituting s′ for some but not all of the s occurring and so changing P
to P′:
P′ :   s′   s   s′
Again, choose a variable symbol z which does not occur in P and let P∗ be the result of
replacing just those occurrences of s by z:
P∗ :   z   s   z
Then
P = P∗[z/s] and P′ = P∗[z/s′] ,
so now what we want to prove becomes: P∗[z/s] ⇔ P∗[z/s′]; and this is just what Part (ii)
of the previous Proposition (A.4) tells us.
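The z-trick is easy to visualise mechanically. Here is a small Python illustration (strings standing in for terms, which are really trees, and the particular terms are made up by me):

# Partial substitution of equals via a fresh variable z: to replace only
# some occurrences of s by s2 in t, first replace exactly those occurrences
# by z; then t is t*[z/s] and the changed term is t*[z/s2].
s, s2 = "cos(2*x)", "(2*cos(x)**2 - 1)"
t      = "cos(2*x) + cos(2*x) * cos(2*x)"     # three occurrences of s
t_star = "z + cos(2*x) * z"                   # keep the middle occurrence

assert t_star.replace("z", s) == t            # t  is  t*[z/s]
t2 = t_star.replace("z", s2)                  # t' is  t*[z/s']
print(t2)   # s replaced in the first and third places only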
Definition
(∃!x)P(x) means (∃x)P(x) ∧ (∀x)(∀y)( P(x) ∧ P(y) ⇒ x = y ) .
(Here y is the first variable symbol which does not occur in P (x).) Some writers use the
notation (∃1 x)P (x) or (∃1x)P (x) for this idea.
It seems to me to be quite natural to split this up into two ideas: existence, (∃x)P(x), and
uniqueness, (!x)P(x), which stands for (∀x)(∀y)( P(x) ∧ P(y) ⇒ x = y ); then (∃!x)P(x) is
their conjunction.
Note that the uniqueness (!x) part of this definition requires equality to be defined.
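Over a finite domain the two ideas can be checked directly. A quick Python illustration (the domain and predicates here are arbitrary choices of mine):

# (exists!x)P(x) over a finite domain: the existence part and the
# uniqueness part, checked separately and then conjoined.
def exists_unique(P, domain):
    exists = any(P(x) for x in domain)
    unique = all(not (P(x) and P(y)) or x == y
                 for x in domain for y in domain)
    return exists and unique

assert exists_unique(lambda x: x * x == 9, range(10))       # only x = 3
assert not exists_unique(lambda x: x % 2 == 0, range(10))   # many witnesses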
This procedure is valid, in a sense which is made precise in the next proposition. We will
call it defining the function f by description.
This way of defining a function is important: it is used frequently in mathematics and other
first-order theories with equality. On the other hand, you will see that the proof is quite long
and complicated. I would suggest that you read the statement of the theorem and make
sure you understand it. On the other hand, the proof is provided for completeness (and
because you might find it difficult to find elsewhere in the literature) so I would recommend
working through it only if you are interested in the gory details or are suspicious.
• Adding a new n-ary function symbol f to the symbols of the language A and extending
the sets of terms and expressions accordingly, as dictated by Definitions 3.A.6 and
3.A.8,
• Extending the axiom schemas PL1–PL5 and Eq2 and the rules Modus Ponens and
Universal Generalisation to include all the new expressions so formed,
• Adding the new axiom (∀u)( u = f(x1, x2, . . . , xn) ⇒ P(u, x1, x2, . . . , xn) ).
Then the new theory B is equivalent to the old one A (in the sense of Definition 1.C).
Proof. We observe that B is an extension of A, in the sense that the symbols of A are a
subset of the symbols of B and the same goes for strings, terms, expressions, axioms, rules
and therefore theorems.
Observe that the new axiom (∀u)( u = f(x1, x2, . . . , xn) ⇒ P(u, x1, x2, . . . , xn) ) gives rise
in the usual way to theorems: for any terms t1, t2, . . . , tn of B,
⊢B (∀u)( u = f(t1, t2, . . . , tn) ⇒ P(u, t1, t2, . . . , tn) ) .
Here "⊢B" means "the following is a theorem of B". In the same way, the assumed theorem
(∃!u)P (u, x1 , x2 , . . . , xn ) of A gives rise to theorems: for any terms t1 , t2 , . . . , tn of A, such
that the substitutions [x1 /t1 ], [x2 /t2 ] . . . [xn /tn ] are all acceptable in P ,
1. (∀u)( u = f(t1, t2, . . . , tn) ⇒ P(u, t1, t2, . . . , tn) ) New axiom
2. f(t1, t2, . . . , tn) = f(t1, t2, . . . , tn) ⇒ P(f(t1, t2, . . . , tn), t1, t2, . . . , tn) PL4 on 1
3. f(t1, t2, . . . , tn) = f(t1, t2, . . . , tn) Eq1u
4. P(f(t1, t2, . . . , tn), t1, t2, . . . , tn) MP: 3, 2
We now define the functions ϕ : A → B and ψ : B → A (required by Definition 1.C). The
definition of ϕ is easy: it is the identity, that is, to each expression A of A, ϕ(A) is A.
Define a “simple f -term” to be a term in B of the form f (t1 , t2 , . . . , tn ), where the terms
t1 , t2 , . . . , tn do not mention f .
Let B be any atomic expression in B. We define ψ(B) by induction over the number of
occurrences of the symbol f in B. If f does not occur in B, then ψ(B) = B. Otherwise
B must contain at least one simple f -term; write B ∗ for the result of replacing the leftmost
occurrence of a simple f -term, f (t1 , t2 , . . . , tn ) say, in B by the first variable u which does
not occur in B or P . Then
ψ(B) = ψ( (∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) ) .
We observe that (∃u)(P (u, t1 , t2 , . . . , tn ) ∧ B ∗ ) has one less occurrence of the symbol f
than does B and so this inductive definition is good.
For the remaining expressions, ψ is defined by
ψ(¬B) = ¬ψ(B)
ψ(B ⇒ B 0 ) = ψ(B) ⇒ ψ(B 0 )
and ψ((∀x)B) = (∀x)ψ(B) .
show that
(∃u)(P(u, t1, t2, . . . , tn) ∧ B∗) ⊢∗ B in B:
1. P(u, t1, t2, . . . , tn) ∧ B∗ Choice rule
2. P(u, t1, t2, . . . , tn) SL on 1
3. B∗ SL on 1
4. P (f (t1 , t2 , . . . , tn ), t1 , t2 , . . . , tn ) Proved above (note, acceptable)
5. (∃!u)P (u, t1 , t2 , . . . , tn ) The assumed theorem
6. (∀x)(∀y)(P (x, t1 , t2 , . . . , tn ) ∧ P (y, t1 , t2 , . . . , tn ) ⇒ x = y From 5 by SL
7. P (u, t1 , t2 , . . . , tn ) ∧ P (f (t1 , t2 , . . . , tn ), t1 , t2 , . . . , tn )
⇒ u = f (t1 , t2 , . . . , tn ) PL4 on 6
8. u = f (t1 , t2 , . . . , tn ) SL, 2, 4 and 7
9. B Subst equals, 8 into 3.
This is a single axiom, not an axiom scheme, so the symbol f occurs exactly once and
f (x1 , x2 , . . . , xn ) is a simple f -term. If v is the first variable symbol which does not occur
here, we have:
This is a theorem of A (proof below), so all we need do is insert its proof before the line
ψ(L) in the new proof.
Here is a proof:
(The proviso that there should be only a finite number of function and relation symbols is
not a big problem: nearly all the theories we are interested in are of this kind.)
We prove this now.
A.9 Definition
An alternative axiomatisation of equality:
(EqF1) (∀x)( x = x )
(EqF2) (∀x)(∀y)( x = y ⇒ y = x )
(EqF3) (∀x)(∀y)(∀z)( x = y ∧ y = z ⇒ x = z )
(EqFf) For each formal function symbol f of the language of any arity k,
(∀x1) . . . (∀xk)(∀y1) . . . (∀yk)( x1 = y1 ∧ . . . ∧ xk = yk ⇒ f(x1, . . . , xk) = f(y1, . . . , yk) )
(EqFr) For each formal relation symbol r of the language of any arity k,
(∀x1) . . . (∀xk)(∀y1) . . . (∀yk)( x1 = y1 ∧ . . . ∧ xk = yk ⇒ (r(x1, . . . , xk) ⇔ r(y1, . . . , yk)) )
The last two "axioms" can be many axioms each. However, if the theory has only a finite
number of function and relation symbols, then these each represent only a finite number of
axioms.
A.10 Proposition
The sets of axioms given in Sections A.2 and A.9 are equivalent.
Proof. Since they both use the same language, it is enough to show that (1) the axioms of
A.9 follow as theorems from the axioms of A.2 and (2) the axioms of A.2 follow as theorems
from those of A.9.
(1) Assuming the theory with the axioms of A.2, we have already proved EqF1–3. The
other two, EqFf and EqFr, follow by k applications of the Partial Substitution Proposition
A.4.
(2) Now assume the language with the axioms of A.9. Axiom Eq1 is given as Axiom
EqF1.
To prove Axiom Eq2 we must first prove a similar result for terms: for any term t,
x = y ⇒ t = t[x/y]
B Projective planes
B.1 Discussion
A projective plane is a geometric structure consisting of lines and points, in which there is
one line through every pair of distinct points and every pair of distinct lines have one point
in common.
If we start from the point of view of Euclidean geometry, then the requirement that every two
points have a line through them is no surprise, but to stipulate that every pair of distinct
lines has a point in common can be looked at in two ways: either there are no parallel lines
at all or there must be extra “points at infinity” at which parallel lines meet. In any case, it
raises the question whether this whole idea makes sense at all.
This question can be looked at two ways:
• Euclidean Geometry can be realised by a structure (the points and straight lines in
R2 ); is there any such structure which realises the axioms of projective geometry?
We will answer both questions in the affirmative in the next chapter. For the meantime, let
us see how to define Projective Geometry as a first-order theory.
(PP3) To any two distinct lines, there is a unique point which lies on both of them.
(PP4) There exist four points, no three of which all lie on the same line. (This is called the
Non-triviality axiom.)
(Think: P (p) means p is a point, L(l) means l is a line and i(p, l) means point p lies on line
l).
B.4 An example
This is called the seven-point plane for obvious reasons.
This structure has seven points (shown as grey blobs) and seven lines — six of them straight
and one curved (no one said the lines must be straight!). That the axioms all hold for this
structure can be checked directly by observation.
A structure which obeys the axioms (and therefore the theorems) of a theory like this is
called a model. We will look at models in detail in the next chapter.
Another way of defining the same model (or, at least, an isomorphic one) is to give an
incidence matrix. There is a row for each point p1 , p2 , . . . , p7 and a column for each line
l1 , l2 , . . . , l7 ; the following matrix has a 1 where the point lies on the line and a zero otherwise.
In this form it is easier to check Axioms (PP3) and (PP4).
l1 l2 l3 l4 l5 l6 l7
p1 1 0 1 0 1 0 0
p2 1 0 0 1 0 1 0
p3 0 1 0 1 1 0 0
p4 0 1 1 0 0 1 0
p5 1 1 0 0 0 0 1
p6 0 0 1 1 0 0 1
p7 0 0 0 0 1 1 1
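Indeed the check can be done mechanically. Here is a Python sketch (my own illustration) which verifies, from the incidence matrix above, that any two distinct points lie on exactly one common line, that any two distinct lines meet in exactly one point, and that the non-triviality axiom holds:

# Checking the seven-point plane's axioms from its incidence matrix.
from itertools import combinations

M = [  # rows p1..p7, columns l1..l7
    [1,0,1,0,1,0,0],
    [1,0,0,1,0,1,0],
    [0,1,0,1,1,0,0],
    [0,1,1,0,0,1,0],
    [1,1,0,0,0,0,1],
    [0,0,1,1,0,0,1],
    [0,0,0,0,1,1,1],
]

# every pair of distinct points lies on exactly one common line
assert all(sum(M[p][l] and M[q][l] for l in range(7)) == 1
           for p, q in combinations(range(7), 2))
# every pair of distinct lines meets in exactly one point
assert all(sum(M[p][l] and M[p][m] for p in range(7)) == 1
           for l, m in combinations(range(7), 2))
# non-triviality: some four points have no three of them collinear
assert any(all(sum(M[p][l] for p in triple) < 3
               for l in range(7) for triple in combinations(quad, 3))
           for quad in combinations(range(7), 4))
print("All incidence axioms check out")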
C Peano Arithmetic (PA)
C.1 Definition
A first-order theory with arithmetic is a first-order theory with equality with the extra
function symbols
0̄ zero nullary
suc successor unary
+ addition binary
. multiplication binary
We use a bar over the zero symbol to distinguish between 0̄, the constant symbol in the
language, and 0, the actual zero in N. We will also use a special notation for the successor
function, writing x+ for suc(x). Finally, we will use the usual infix notation for addition and
multiplication, writing x + y for +(x, y) and x.y or xy for .(x, y) .
I write the function suc all squashed up like that to suggest that it is a single symbol, not
three letters. We won’t be using it much anyway.
(PA1) (∀x)( x⁺ ≠ 0̄ )
(PA2) (∀x)(∀y)( x⁺ = y⁺ ⇒ x = y )
(PA3) ( P[x/0̄] ∧ (∀x)(P ⇒ P[x/x⁺]) ) ⇒ (∀x)P, for every expression P
(PA4) (∀x)( x + 0̄ = x )
(PA5) (∀x)(∀y)( x + y⁺ = (x + y)⁺ )
(PA6) (∀x)( x.0̄ = 0̄ )
(PA7) (∀x)(∀y)( x.(y⁺) = (x.y) + x )
Note The theory defined by exactly the functions, relations and axioms specified above
(and no others) is Peano Arithmetic; we will denote it by PA. (It is often called the
Elementary Theory of Arithmetic).
Later we will be interested in extending the theory by adding more function symbols, relation
symbols and axioms than the minimum specified above.
C.2 Remarks
(i) The first three axioms here obviously encapsulate Peano’s axioms and are sufficient
to define the Natural Numbers. The third axiom looks a bit more friendly if we rewrite it
thus:
( P(0̄) ∧ (∀x)(P(x) ⇒ P(x⁺)) ) ⇒ (∀x)P(x)
Note how it expresses the Principle of Induction without the need for any set theory notation.
It is frequently used in the form of a deduction:
P(0̄), (∀x)( P(x) ⇒ P(x⁺) ) ⊢ (∀x)P(x) (PA3a)
(ii) Since we are avoiding set theory, we have no easy mechanism for introducing new
functions. Thus the two basic functions, addition and multiplication, are built in to the
definition of the theory itself. Axioms (PA4) and (PA5) constitute an inductive definition
of addition, Axioms (PA6) and (PA7) do the same for multiplication.
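To see how the four axioms pin the operations down, here is a Python sketch (the tuple encoding of numerals is my own choice) in which + and . are computed by recursion on the second argument, one clause per axiom:

# PA4-PA7 as an inductive definition: numerals are iterated successors
# of zero, + and . recurse on the second argument.
def num(n):                        # the numeral n-bar
    return ("0",) if n == 0 else ("suc", num(n - 1))

def add(x, y):
    if y == ("0",):                # PA4:  x + 0 = x
        return x
    return ("suc", add(x, y[1]))   # PA5:  x + y+ = (x + y)+

def mul(x, y):
    if y == ("0",):                # PA6:  x . 0 = 0
        return ("0",)
    return add(mul(x, y[1]), x)    # PA7:  x . y+ = (x . y) + x

assert add(num(2), num(3)) == num(5)
assert mul(num(2), num(3)) == num(6)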
Herein lies a major shortcoming of this theory — it is difficult to introduce further functions,
without extending the language. For example, how can you define exponentiation and prove
the index laws in this language?
(iii) We have allowed the possibility of adding extra function and relation symbols and
extra axioms. Thus we can approach this theory in two ways. We can consider PA, the bare
theory of arithmetic, having only the functions, relations and axioms stated above, and see
what we can find out about it. Alternatively, we can use this definition to discuss any theory
which contains arithmetic (and that includes mathematics). This latter approach is useful
when we come to Gödel’s Theorem, which applies to any theory which contains arithmetic.
(iv) In general, if we take an idea normally defined in mathematics by a set of axiom-like
statements, and extract those to form a first-order theory by themselves, we form what is
usually called the “Elementary Theory of” whatever it is. We will look briefly at elementary
theories of groups and certain kinds of orders later in this chapter.
Such elementary theories do not normally contain the full power of Mathematics — set
theoretic notation, functions and so on — so there are usually many things we would like
to say about such theories which simply cannot be said, let alone proved, within them. For
example, there is no way one can express “there is an uncountable number of subsets of the
natural numbers” in the language of this theory, far less prove it.
(v) The language contains terms which describe each of the natural numbers. For in-
stance the term 0̄+ represents the number 1, the term 0̄++ represents the number 2 and so
on. So it is a natural abbreviation to write 1̄ for 0̄+ , 2̄ for 0̄++ and so on. In fact, let us now
add these notations to our semi-formal theory, making it precise with a proper inductive
definition.
C.3 Definition
For any natural number ξ, we define the term ξ̄ by
0̄ is already defined
(ξ+1)‾ = (ξ̄)⁺ .
C.4 Discussion
The first substantial result we prove is that addition is commutative. The natural attack is
by induction, and for that we will want
(∀x)( 0̄ + x = x ) (–1)
and
x + y = y + x ⇒ x⁺ + y = y + x⁺ for all x and y. (–2)
To prove (–1) by induction we need
0̄ + 0̄ = 0̄ (–1b)
0̄ + y = y ⇒ 0̄ + y⁺ = y⁺ for all y (–1c)
Here (–1b) is a special case of PA4 and, with the hypothesis 0̄ + y = y, we can prove
0̄ + y⁺ = y⁺ using PA5. We should be able to write this out to make a proper proof of (–1);
so now let us look at (–2). If we want to prove this by induction as it stands, we have a
choice of induction over x or y, and either way yields a couple of horrible tasks.
However things are not as bad as they seem: we can prove the right hand side of (–2) without
needing the left hand side, that is, we can prove x⁺ + y = y + x⁺ for all x and y without
need of the hypothesis, and that will do. To do this by induction over y we need to prove
x⁺ + 0̄ = 0̄ + x⁺ (–2a)
and x⁺ + y = y + x⁺ ⇒ x⁺ + y⁺ = y⁺ + x⁺ . (–2b)
But (–2a) is just an instance of (–1), already proved, and (–2b) follows from several appli-
cations of PA5. So now all that is needed is to convert this into a proper semi-formal proof,
which I’ll write out just to show what it looks like.
C.5 Proposition
(∀x)(∀y)( x + y = y + x )
Proof. First we prove
(∀x)( 0̄ + x = x ) . (–1)
1 0̄ + 0̄ = 0̄ PA4
2 0̄ + x = x subhyp
3 0̄ + x⁺ = (0̄ + x)⁺ PA5
4 0̄ + x⁺ = x⁺ Subst of equals (eq3): 2 into 3
5 0̄ + x = x ⇒ 0̄ + x⁺ = x⁺ Ded: 2–4
6 (∀x)(0̄ + x = x ⇒ 0̄ + x⁺ = x⁺) UG
7 (∀x)(0̄ + x = x) Induction (PA3a) on 1 and 6
Next we prove
(∀x)(∀y)( x⁺ + y = (x + y)⁺ ) (–2)
This is by induction over y. Writing the proof out slightly less formally, firstly,
x⁺ + 0̄ = x⁺ by PA4
= (x + 0̄)⁺ by PA4 again (x = x + 0̄ and substitute equals).
Now for any y, suppose that x⁺ + y = (x + y)⁺. Then
x⁺ + y⁺ = (x⁺ + y)⁺ = (x + y)⁺⁺ = (x + y⁺)⁺ by PA5, the supposition and PA5 again.
Here is the same proof written out formally:
1 x⁺ + 0̄ = x⁺ PA4
2 x⁺ = (x + 0̄)⁺ PA4 again
3 x⁺ + 0̄ = (x + 0̄)⁺ A.3 on 1 and 2
4 (∀x)( x⁺ + 0̄ = (x + 0̄)⁺ ) UG
5 x⁺ + y = (x + y)⁺ subhyp
6 x⁺ + y⁺ = (x⁺ + y)⁺ PA5
7 (x⁺ + y)⁺ = (x + y)⁺⁺ By the subhyp
8 x⁺ + y⁺ = (x + y)⁺⁺ A.3 on 6 and 7
9 (x + y)⁺⁺ = (x + y⁺)⁺ PA5 again
10 x⁺ + y⁺ = (x + y⁺)⁺ A.3 on 8 and 9
11 x⁺ + y = (x + y)⁺ ⇒ x⁺ + y⁺ = (x + y⁺)⁺ Ded: 5–10
12 (∀x)( x⁺ + y = (x + y)⁺ ⇒ x⁺ + y⁺ = (x + y⁺)⁺ ) UG on x
13 (∀y)(∀x)( x⁺ + y = (x + y)⁺ ⇒ x⁺ + y⁺ = (x + y⁺)⁺ ) UG on y
14 (∀y)(∀x)( x⁺ + y = (x + y)⁺ ) Induction over y, 4 and 13
15 (∀x)(∀y)( x⁺ + y = (x + y)⁺ ) F.5 of Ch.3
Finally we prove the main result by induction over y, writing the proof out less formally.
We have
x + 0̄ = x = 0̄ + x by PA4 and (–1).
Now, for any y, suppose that x + y = y + x. Then
x + y⁺ = (x + y)⁺ = (y + x)⁺ = y⁺ + x by PA5, IH and (–2).
C.6 Remarks
That was pretty unpleasant for two reasons. The first was that the result we set out to
prove (commutativity of addition) actually required two lemmas first. The second was that
I was at pains to show how an inductive proof really could be expressed within the formal
theory.
But, once one has the hang of how it goes, it is OK to write out inductive proofs in a more
relaxed manner, knowing that they can be translated into formal form if necessary. Here
follows an example.
C.7 Proposition
We prove that addition is associative:
(∀x)(∀y)(∀z)( (x + y) + z = x + (y + z) ) .
Proof is by induction over z. Firstly,
(x + y) + 0̄ = x + y by PA4
= x + (y + 0̄) by PA4 again.
Then, supposing (x + y) + z = x + (y + z) (IH), we have
(x + y) + z⁺ = ((x + y) + z)⁺ by PA5
= (x + (y + z))⁺ by IH
= x + (y + z)⁺ by PA5
= x + (y + z⁺) by PA5 again.
(Again there are a couple of quiet uses of substitution of equals.) “IH” is an abbreviation
for “Inductive hypothesis”.
C.8 Remarks
(i) It is possible to continue in this vein and prove the usual algebraic properties of
addition and multiplication — associativity, distributivity, cancellation and so on up to prime
number factorisation and beyond. We will not follow this up here, but restrict ourselves to
a few results which are relevant to the discussion of Gödel’s Theorem later.
(ii) While it is difficult to introduce new functions in any useful way into this language
(except by formally extending the language by adding new function symbols and axioms),
there is no problem with adding new relations semiformally. For example, we can define
x < y to mean (∃u)( x + u⁺ = y ) .
C.9 Proposition
(∀x)(∀y)( x < y ⇒ x⁺ < y⁺ )
Proof. From the definition of <, there exists u such that x + u⁺ = y. Then
x⁺ + u⁺ = (x + u⁺)⁺ = y⁺ and so x⁺ < y⁺.
C.10 Proposition
(∀x)( x = 0̄ ∨ (∃y)(x = y⁺) )
Proof is by induction over x. Let P(x) be the statement to be proved. Then 0̄ = 0̄ and so
P(0̄) is true; and for any x we have x⁺ = x⁺, so P(x⁺) is true too (take y to be x). The
result follows by induction.
C.11 Corollary
(∀x)( x ≠ 0̄ ⇒ (∃y)(x = y⁺) )
Proof. Immediate from the preceding proposition.
C.12 Proposition (Trichotomy)
(∀x)(∀y)( x < y ∨ x = y ∨ y < x )
Proof is by induction over x. Suppose first that x = 0̄. Then either y = 0̄, in which case
x = y and the result is true, or else y ≠ 0̄, in which case, by the corollary above, there is a u
such that y = u⁺. But then y = x + u⁺, so x < y and again the result is true.
Now we want to prove (x⁺ < y) ∨ (x⁺ = y) ∨ (y < x⁺). Either y = 0̄, in which case
x⁺ = y + x⁺, so y < x⁺, or else y ≠ 0̄. Then, by the corollary above, there is a u such that
y = u⁺. By IH, (x < u) ∨ (x = u) ∨ (u < x) and now, using Proposition C.9, the result
follows.
Proof. (i) x + 0̄ = x.
(ii) and (iii) as an interesting exercise.
(iv) follows easily from Trichotomy, above.
Question: is it possible in this language to express the fact that the Natural Numbers are
well-ordered?
Imagine grains of sand in a bag. I can lift the bag when it contains one grain
of sand. If I can lift the bag with N grains of sand then I can certainly lift it
with N+1 grains of sand (for it is absurd to think that I can lift N grains but
adding a single grain makes it too heavy to lift). Therefore, I can lift the bag
when it contains any number of grains of sand, even if it contains five tons of
sand.
I would prefer that the second sentence here was “I can lift the bag when it is empty”, since
we start counting at zero in this chapter, but that doesn’t really matter.
The argument is easily formalised. Write P (N ) for the statement “I can lift the bag with N
grains of sand”. Then it goes
P (0)
(∀N )(P (N ) ⇒ P (N + 1))
therefore (∀N )P (N )
D A stripped-down version of Elementary Arithmetic (RA)
D.1 Discussion
In Section C we introduced the first-order theory PA, Peano Arithmetic, otherwise known
as the Elementary Theory of Arithmetic, which describes the natural numbers and basic
arithmetic based on Peano's axioms. In this section we introduce a simpler theory, which I
will call Robinson Arithmetic, RA (after its proposer Raphael Robinson). It uses the same
language as PA but different axioms, giving rise to a weaker theory (that is, all the theorems
of RA are theorems of PA, but there are theorems of PA which are not theorems of RA).
Why bother? Well, there are two reasons why this theory is important. Firstly, it is strong
enough to prove the Representability Theorem of Chapter 10. Gödel’s Incompleteness The-
orem applies to any first-order theory strong enough to prove the Representability Theorem.
Thus showing that the Representability Theorem can be proved in a weaker theory means
that Gödel’s theorem has wider applicability. Secondly, we will also see that, if there
exists a first-order theory in which the Representability Theorem can be proved and which
is finitely axiomatisable, then plain Predicate Logic is incomplete and undecidable. RA is
finitely axiomatisable, which will establish these important facts, whereas PA is not.
D.2 Definition: RA
RA is a first-order language with equality and four formal function symbols:
0̄ zero nullary
suc successor unary
+ addition binary
. multiplication binary
Just as with PA, we use a bar over the zero symbol to distinguish between 0̄, the constant
symbol in the language, and 0, the actual zero in N. We also use the notation x+ for the
successor function suc(x) and the usual infix notation for addition and multiplication, writing
x + y for +(x, y) and x.y or xy for .(x, y) .
Here are the axioms of the theory RA.
(PL1 – PL5) The ordinary axioms of Predicate Logic, together with the equality axioms
Eq1 and Eq2, and the proper axioms:
(RA1) (∀x)( x⁺ ≠ 0̄ )
(RA2) (∀x)(∀y)( x⁺ = y⁺ ⇒ x = y )
(RA3a) (∀x)( x = 0̄ ∨ (∃y)(x = y⁺) )
(RA4) (∀x)( x + 0̄ = x )
(RA5) (∀x)(∀y)( x + y⁺ = (x + y)⁺ )
(RA6) (∀x)( x.0̄ = 0̄ )
(RA7) (∀x)(∀y)( x.(y⁺) = (x.y) + x )
D.3 Order
The axioms of RA do not imply that addition is commutative, so we must be a bit careful
about how we define the order relation here. We actually define it “on the opposite side”
from the way we defined it for PA:
We define
x < y to mean (∃u)( u⁺ + x = y )
and x ≤ y to mean (∃u)( u + x = y ) .
D.4 Warning
Be very careful with formal proofs in this theory. There is no reason to suppose that the usual
laws, such as associativity and commutativity of addition and multiplication are proveable
in this theory. Consequently, many of the simple arithmetic manipulations we are used to
doing are not justified.
At the end of the next chapter (Section 5.G) a model is constructed (and another related
one outlined); these prove that nearly all the standard manipulations we are used to are
not justified by theorems in this theory: these include commutativity, associativity and
cancellation of addition and even the behaviour of zero. (0̄ + x = x is not a theorem!)
Note also that proof by induction is not justified by these axioms, so induction may not be
used in any formal proof within the theory. We are, however, allowed to make inductive
arguments in metaproofs about the theory (because for metaproofs we are using ordinary
mathematics).
Nevertheless, the theory does have some familiar theorems. We now set about developing
just enough results in this theory for the requirements of the Representability Theorem.
D.5 Proposition
(i) (∀x)¬( x < 0̄ )
(ii) (∀x)(∀y)( x ≤ y ∧ x ≠ y ⇒ x < y ).
Note that the opposite implication is not a theorem of RA.
(iii) (∀x)(∀y)( x⁺ ≤ y⁺ ⇒ x ≤ y )
(iv) (∀x)( x ≤ 0̄ ⇒ x = 0̄ )
Proof (i). Suppose there is an x such that x < 0̄. Then there is u such that u⁺ + x = 0̄.
But then x ≠ 0̄, because x = 0̄ would give u⁺ = 0̄ (by RA4), contradicting RA1. Then, by
RA3a, there is y such that x = y⁺. But now we have u⁺ + y⁺ = 0̄, that is, (u⁺ + y)⁺ = 0̄
(by RA5) and this contradicts RA1 again.
(ii) This follows immediately from the definition of x ≤ y.
(iii) For any x and y, x⁺ ≤ y⁺ means (∃u)( u + x⁺ = y⁺ ), that is, (∃u)( (u + x)⁺ = y⁺ )
by RA5, and then (∃u)( u + x = y ) by RA2, which is x ≤ y.
D.6 Definition
Just as in PA, for each natural number ξ there is a term ξ̄ of RA representing it:
0̄ is already defined
(ξ+1)‾ = (ξ̄)⁺ .
The next proposition shows that the function ξ ↦ ξ̄ preserves all the relevant structure of
N.
D.7 Proposition
For any natural numbers µ and ν,
(i) (µ⁺)‾ = (µ̄)⁺ ;
(ii) (µ + ν)‾ = µ̄ + ν̄ ;
(iii) (µν)‾ = µ̄ ν̄ ;
(iv) µ̄ ≠ (µ̄)⁺ ;
(v) the map µ ↦ µ̄ (from N into the language of RA) is one-to-one;
(vi) µ < ν if and only if µ̄ < ν̄, and (vi′) µ ≤ ν if and only if µ̄ ≤ ν̄ ;
(vii) (∀x)( x ≤ ν̄ ∨ ν̄ ≤ x ), and (vii′) (∀x)( x < ν̄ ∨ ν̄ ≤ x ).
(ii) is proved by induction over ν. I will give the proof in detail to show that we are not
making any unjustified assumptions. First we prove the zero case (µ + 0)‾ = µ̄ + 0̄:
(µ + 0)‾ = µ̄ since µ + 0 = µ in N
= µ̄ + 0̄ by RA4 .
Then, supposing (µ + ν)‾ = µ̄ + ν̄, the definition of the numerals, the supposition and RA5
give
(µ + (ν+1))‾ = ((µ + ν)‾)⁺ = (µ̄ + ν̄)⁺ = µ̄ + (ν̄)⁺ = µ̄ + (ν+1)‾ .
(iv) Suppose that there is some µ ∈ N such that µ̄ = (µ̄)⁺. Then let µ be the least such
number. Then µ ≠ 0, since 0̄ ≠ (0̄)⁺ by RA1. So, writing ν = µ − 1, we have µ̄ = (ν̄)⁺,
(ν̄)⁺⁺ = (µ̄)⁺ = µ̄ = (ν̄)⁺ and then ν̄ = (ν̄)⁺ by RA2, contradicting the choice of µ.
(v) Proof by contradiction. Suppose that there are µ, ν ∈ N such that µ ≠ ν but µ̄ = ν̄.
We may assume without loss of generality that µ < ν. We may suppose further that,
amongst all such pairs, we have chosen one so that ν − µ is the smallest. Now, if ν ≥ 2 + µ
we have µ < 1 + µ < ν and then, by choice of µ and ν, µ̄ = (1 + µ)‾ = ν̄, a contradiction.
Therefore ν = µ + 1, so that µ̄ = ν̄ = (µ̄)⁺, contradicting (iv).
(vi) If µ < ν then there is some ξ ∈ N such that ξ⁺ + µ = ν (a property of N). But then
(ξ̄)⁺ + µ̄ = ν̄ by (i) and (ii), so µ̄ < ν̄.
For the converse, suppose that µ̄ < ν̄. We prove that µ < ν by induction over µ.
If µ = 0 we have 0̄ < ν̄, that is, (∃u)( u⁺ + 0̄ = ν̄ ). Then (∃u)( u⁺ = ν̄ ) by RA4, so ν̄ ≠ 0̄
by RA1 and thus ν ≠ 0, so µ = 0 < ν.
If µ ≠ 0, noting that ν ≠ 0 too by D.5(i), there are ξ and η in N such that µ = ξ⁺
and ν = η⁺. Thus (ξ̄)⁺ < (η̄)⁺, in other words, (∃u)( u⁺ + (ξ̄)⁺ = (η̄)⁺ ). But then
(∃u)( (u⁺ + ξ̄)⁺ = (η̄)⁺ ) by RA5, from which
(∃u)( u⁺ + ξ̄ = η̄ ) by RA2. But then
ξ̄ < η̄
and, by the inductive hypothesis, ξ < η and so µ < ν.
(vi′)
µ̄ ≤ ν̄ if and only if µ̄ < ν̄ ∨ µ̄ = ν̄
if and only if µ<ν ∨ µ=ν
if and only if µ≤ν.
Now, when x = 0̄ the result is immediate also, so we may suppose that x ≠ 0̄ and thus that
there is y such that x = y⁺. But then our inductive hypothesis tells us that either y ≤ µ̄ or
µ̄ ≤ y, in which case either y⁺ ≤ (µ̄)⁺ or (µ̄)⁺ ≤ y⁺, that is x ≤ ν̄ or ν̄ ≤ x, as required.
(vii′) This follows immediately from (vii) using the definition of ≤.
D.8 Proposition
For any ν ∈ N,
x ≤ ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄ (–1)
and, if ν > 0, x < ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = (ν−1)‾ . (–2)
Proof. (–1) is proved by induction over ν in the metatheory. Assuming
x ≤ ν̄ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄
we must prove
x ≤ (ν̄)⁺ ⇒ x = 0̄ ∨ x = 1̄ ∨ . . . ∨ x = ν̄ ∨ x = (ν̄)⁺ .
If x = 0̄ this is immediate, so we assume that x ≠ 0̄. But then (by RA3a) there is a y such
that x = y⁺. We now have y⁺ ≤ (ν̄)⁺ and so, by Proposition D.5(iii), y ≤ ν̄. The inductive
hypothesis now gives
y = 0̄ ∨ y = 1̄ ∨ . . . ∨ y = ν̄
from which
y⁺ = 0̄⁺ ∨ y⁺ = 1̄⁺ ∨ . . . ∨ y⁺ = (ν̄)⁺
which is
x = 1̄ ∨ x = 2̄ ∨ . . . ∨ x = (ν̄)⁺
and we are done.
D.9 Proposition
For any expression P(x) and any natural number ν,
(∀x)( x ≤ ν̄ ⇒ P(x) ) ⇔ P(0̄) ∧ P(1̄) ∧ . . . ∧ P(ν̄) (–1)
and, if ν > 0,
(∀x)( x < ν̄ ⇒ P(x) ) ⇔ P(0̄) ∧ P(1̄) ∧ . . . ∧ P((ν−1)‾) . (–2)
Proof.
(Note the change from ∨ to ∧ here). This is what is used in this step.
(–4) is because (x = 0̄ ⇒ P (x)) ⇔ P (0̄) and so on for the other terms.
(–5) is one of the rare cases in which we use the fact that, in an expression of the form
(∀x)(An expression which does not contain x free), one can simply remove the (∀x) quanti-
fier. (See 3.E.)
The proof of (–2) is the same.
E The Elementary Theory of Groups
E.1 Definition
The Elementary Theory of Groups is a first-order theory with equality whose language has
a nullary function symbol e, a unary function symbol n, a binary function symbol m,
no relation symbols (other than equality) and the following proper axioms:—
(G1) (∀x)(∀y)(∀z) m(x, m(y, z)) = m(m(x, y), z)
(G2) (∀x) m(x, e) = x
(G3) (∀x) m(x, n(x)) = e .
If we employ a more usual notation,
e is the identity 1
n(x) is the inverse x⁻¹
m(x, y) is the product xy
E.2 Remarks
(i) This is a stripped-down version of the axioms. In particular we do not assert
(G2′) 1x = x
or (G3′) x⁻¹x = 1
as axioms. These can be proved as theorems. To find the proofs is an interesting exercise
(hint: prove G3′ first).
(ii) This theory cannot encompass the whole of Group Theory as we know it. As it is set
up, all variables must be elements of the same group and so the theory can only talk about
one group at a time (a single group constitutes the whole “universe of discourse”). Thus the
I 1.A theory is not categorical, in the sense of the question posed at the end of Section 1.A. This
will be made precise in the Chapter 5.
This also means that the theory cannot talk about homomorphisms, subgroups, direct prod-
ucts etc. etc. It is pretty limited.
In developing the theory from this base, the first task is to show that the elements e and
y mentioned in GA2 are in fact unique. Then they give rise to the identity and inverse as
functions defined by description.
And, of course, you can use additive notation. It uses the same axioms, but different symbols.
F Unbounded dense linear orders (UDLO)
F.1 Discussion
It seems reasonable to expect that any complete theory should be categorical. Look at it
this way: to say that a theory is complete means that any sentence, that is, any simple
statement with no free variables, is either provably true or provably false. And to say that
a theory is categorical means that any two structures that can be described by the language
and for which all the theorems of the theory are true must be isomorphic. Completeness
means that anything that the language can express that is true in one of the structures must
be true in the other also. How then can they possibly fail to be isomorphic?
Well, they can, and the theory described in this section is complete but not categorical.
The definition of “categorical” is at present a bit vague (what do I mean by a “structure the
language can describe” and by “isomorphic”?); it will be made precise in the next chapter
and then the fact that UDLO is non-categorical will be clear. In this section we will prove
the harder assertion, that UDLO is complete.
It is the fact that UDLO is both complete and non-categorical that is important here. The
proof is longish and quite intricate, and is given here for completeness. I would recommend
skimming through it to get a rough idea how it goes (the argument is quite interesting) but
do not feel you have to read it in detail unless you become fascinated by it.
F.2 Definition
UDLO is the first-order theory with equality, one extra binary relation symbol < and the
following proper axioms:
(UDLO1) (∀x)(∀y)( x < y ∨ x = y ∨ y < x )
(UDLO2) (∀x)(∀y)( x < y ⇒ ¬(y < x) )
(UDLO3) (∀x)(∀y)(∀z)( x < y ∧ y < z ⇒ x < z )
(UDLO4) (∀x)(∀y)( x < y ⇒ (∃z)( x < z ∧ z < y ) )
(UDLO5) (∀x)(∃y)(∃z)( y < x ∧ x < z )
F.3 Remarks
(i) Axioms UDLO1–UDLO3 express the fact that the system is fully ordered, in terms
of a strict order relation <. Well, almost. They need to imply transitivity, and this is done
by UDLO3. They also need to imply that, for any x and y, exactly one of x < y, x = y or
y < x is true. Now UDLO1 implies that at least one of these is true and UDLO2 implies
that x < y and y < x cannot both be true. It remains to ensure that x < y and x = y
cannot both be true, for then it follows that x = y and y < x cannot both be true either.
But this is given with the help of UDLO4 (proof as an easy exercise).
In fact, UDLO4 is there for another reason (see the next remark), but it does this as a side
effect, so there is no need for an extra axiom.
(ii) Axiom UDLO4 expresses density: between any two distinct elements there is another.
Axiom UDLO5 says that the system is unbounded: it has neither a least nor a greatest
element.
(iii) In our semi-formal exposition we allow the usual notation, 6=, >, ≤ and ≥, defined
in the usual way.
(iv) In what follows it will be very convenient to allow the symbols T and F for "true"
and "false". We can either add these as part of the logical background or add T as a new
nullary relation together with a defining axiom, writing F for ¬T.
F.4 Proposition
UDLO is not categorical.
Proof (sort of ). The Rationals Q and the Reals R are both systems described by UDLO;
they are clearly not isomorphic.
As already remarked, the proof above is a wee bit vague, due mainly to the slight vagueness
in our definition of “categorical”. What we are really saying here is that Q and R are non-
isomorphic models of UDLO. In Chapter 5 we will define all these terms properly and this
theorem and its proof will become watertight.
The remainder of this section is devoted to a proof that UDLO is complete. Throughout,
the symbol ⊢ means, of course, proveable in UDLO.
F.5 Proposition
Let x1 , x2 , . . . , xn , z be variable symbols (n ≥ 1). Then
⊢ (∃z)( x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xn ≤ z ∧ (x1 = z ∨ x2 = z ∨ . . . ∨ xn = z) ) . (1)
This expresses the idea that the set {x1, x2, . . . , xn} has a maximum member, using the
language available to us. In the same way we can express the fact that the set has a
minimum member:
⊢ (∃z)( z ≤ x1 ∧ z ≤ x2 ∧ . . . ∧ z ≤ xn ∧ (z = x1 ∨ z = x2 ∨ . . . ∨ z = xn) ) . (2)
Proof. We prove the first statement only, by induction over n. In the case n = 1,
(∃z)( x1 ≤ z ∧ x1 = z )
is obvious (set z = x1 ).
Now in the case n ≥ 2, and slipping into more informal style, we assume as inductive
hypothesis that there exists z′ such that
x1 ≤ z′ ∧ x2 ≤ z′ ∧ . . . ∧ xn−1 ≤ z′ ∧ (x1 = z′ ∨ x2 = z′ ∨ . . . ∨ xn−1 = z′)
and compare z′ with xn. In the case xn ≤ z′ we take
z = z′ (7)
and we have
You will notice that we have used both the Choice Rule and the Constructive Dilemma in
this proof.
F.6 Note
The atomic expressions of UDLO are those of one of the forms
T , x = y or x < y
where x and y are variable symbols. By simple expressions we will mean expressions which
are either atomic expressions or negations of atomic expressions, so these are expressions of
one of the forms
T , x = y , x < y , ¬T , ¬(x = y) or ¬(x < y) .
These are the same, or equivalent to, expressions of one of the forms
T , F , x = y , x ≠ y , x < y or x ≤ y .
By a C-term we will mean an expression of the form
A1 ∧ A2 ∧ . . . ∧ Am (m ≥ 1)
where each Ai is an atomic expression and there are no repetitions (that is, Ai ≠ Aj
whenever i ≠ j).
By a DC-form we will mean an expression which is either T or F or of the form
B1 ∨ B2 ∨ . . . ∨ Bn (n ≥ 1)
where each Bi is a C-term and again there are no repetitions (that is, Bi ≠ Bj whenever
i ≠ j).
For the next few propositions, we are going to have to discuss the set of free variables which
occur in our expressions. To save repeating the same phrase over and over,
For any expression P we will write v(P ) for the set of free variables in P .
F.7 Proposition
Let P be any expression in UDLO which contains no quantifiers. Then
⊢ P ⇔ P∗ ,
where P∗ is a DC-form and v(P∗) ⊆ v(P).
Proof. In this proof, when I say that two expressions are equivalent, that means of course
that they can be proved equivalent in UDLO. Also, the associativity of ∧ and ∨ will be
used frequently without comment.
Step 0 First some trivialities:
(a) Any atomic expression is a C-term.
Step 1 If P1, P2, . . . , Pn are C-terms, then P1 ∨ P2 ∨ . . . ∨ Pn is equivalent to a DC-form.
Proof. By Step (0a), P1 ∨ P2 ∨ . . . ∨ Pn is a list of C-terms OR-ed together. If there are no
repetitions, this is already a DC-form. Otherwise, using commutativity of ∨, we can rear-
range the list so that identical atomic expressions are side by side and then use idempotence
of ∨ to remove the repetitions.
Step 2 If P1, P2, . . . , Pn are DC-forms, then P1 ∨ P2 ∨ . . . ∨ Pn is equivalent to a DC-form.
Proof. If any of the Pi is T, then the whole expression is equivalent to T, a DC-form, and
any Pi which is F may simply be dropped (if every Pi is F, the whole is equivalent to F).
Otherwise P1 ∨ P2 ∨ . . . ∨ Pn is a list of C-terms OR-ed together. The proof now goes as for
Step 1.
Step 3 If P1, P2, . . . , Pn are C-terms, then P1 ∧ P2 ∧ . . . ∧ Pn is equivalent to a C-term.
Proof. (The proof is similar to that for Step 1). P1 ∧P2 ∧ . . . ∧Pn is a list of atomic expres-
sions AND-ed together. If there are no repetitions, this is already a C-term. Otherwise,
using commutativity of ∧, we can rearrange the list so that identical atomic expressions are
side by side and then use idempotence to remove the repetitions.
Step 4a If P and Q are DC-forms, then P ∧ Q is equivalent to a DC-form.
Proof. If P is T, then P ∧ Q is equivalent to Q, a DC-form. The proof is the same if Q is
T.
If either P or Q is F, then P ∧ Q is equivalent to F, again a DC-form.
Otherwise both P and Q are lists of C-terms OR-ed together,
P = C1 ∨C2 ∨ . . . ∨Cm
and Q = D1 ∨D2 ∨ . . . ∨Dn say.
Then, using distributivity, P ∧ Q is equivalent to
(C1 ∧ D1) ∨ (C1 ∧ D2) ∨ . . . ∨ (Cm ∧ Dn) , (–1)
where the list contains all of the pairs (Ci ∧ Dj). But each of these pairs is equivalent to a
C-term, by Step 3, so (–1) is equivalent to a list of C-terms OR-ed together. But then
this is equivalent to a DC-form, by Step 1.
Step 7 (The main result at last) If P is any expression with no quantifiers, then it is
equivalent to a DC-form P ∗ .
Proof is by induction over the construction of P .
If P is an atomic expression, then it is already a DC-form.
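The whole normalisation is mechanical, and a toy version is easy to write down. The Python sketch below is my own illustration; it handles only ∧ and ∨ over atoms (the simple expressions absorb negation), mirroring Steps 1–4a: flatten and deduplicate disjunctions, and distribute ∧ over ∨.

# A toy DC-form normaliser. A formula is a string atom or a tuple
# ("or", ...) / ("and", ...); the result is a set of frozensets of atoms,
# i.e. a disjunction of C-terms with repetitions removed.
from itertools import product as cartesian

def dc_form(F):
    if isinstance(F, str):                 # an atomic expression
        return {frozenset([F])}
    op, *args = F
    parts = [dc_form(a) for a in args]
    if op == "or":                         # Steps 1 and 2: flatten, dedupe
        return set().union(*parts)
    if op == "and":                        # Steps 3 and 4a: distribute
        return {frozenset().union(*combo) for combo in cartesian(*parts)}
    raise ValueError(op)

# (A or B) and C  ->  (A and C) or (B and C)
print(dc_form(("and", ("or", "A", "B"), "C")))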
F.8 Proposition
Let P be any expression in UDLO which contains no quantifiers. Then
⊢ (∃z)P ⇔ Q ,
where Q is a DC-form and v(Q) ⊆ v(P) ∖ {z}.
Proof. By the previous proposition we may assume that P is a DC-form, that is,
P = C1 ∨C2 ∨ . . . ∨Cn
so it is sufficient to prove the result in the case that P is in fact a C-term. So let us assume
that
P = A1 ∧ A2 ∧ · · · ∧ Am ,
where the Ai are all atomic expressions (let’s call them its factors). The proof is by induction
over m, the number of factors in P .
Firstly note that, if none of the factors Ai mention z then the result is trivially true with Q
the same as P . So from now on we may assume that at least one of the factors mentions z.
Next observe that if any one of the factors Ai does not mention z, then it may be "taken
outside"; for instance, if A1 does not mention z, then
(∃z)(A1 ∧ A2 ∧ · · · ∧ Am) ⇔ A1 ∧ (∃z)(A2 ∧ · · · ∧ Am)
and then the inductive hypothesis gives the result. So from now on we may assume that all
the factors A1, A2, . . . , Am mention z.
Now the factors of P are each of one of the following forms (where here x stands for any
variable symbol which is different from z):
z = z , z < z , z = x , x = z , x < z or z < x .
If P contains a factor of the form z = z, then that factor may be simply eliminated, since it
is equivalent to T, and then the inductive hypothesis applies. If P contains a factor of the
form z < z, then P ⇔ F since that factor is equivalent to F.
Suppose now that P contains a factor of the form z = x or x = z. Rearranging the factors
if necessary, P is equivalent to an expression of the form (z = x) ∧ R(z), where R(z) is the
conjunction of all the remaining factors. (I have made the variable z explicit here because
we are going to substitute for it.) Now
(∃z)P ⇔ (∃z)( (z = x) ∧ R(z) )
⇔ R(x) .
From now on we may assume that all of the factors of P are of one of the forms z < x and
x < z.
Suppose now that P contains two such factors with the same variable x. If the two factors
are in fact identical, then they can be replaced by one copy and the inductive hypothesis
applies. On the other hand, if the two factors are different, they must be of the form z < x
and x < z, with the same x. But
z<x ∧ x<z ⇔ F
So finally we may assume that the factors of P mention distinct variable symbols
x1, x2, . . . , xp, y1, y2, . . . , yq ,
so that
P = (x1 < z) ∧ (x2 < z) ∧ · · · ∧ (xp < z) ∧ (z < y1) ∧ (z < y2) ∧ · · · ∧ (z < yq) . (1)
I will prove that then (∃z)P is equivalent to the expression Q which is the conjunction of
all the simple expressions
xi < yj for i = 1, 2, . . . , p and j = 1, 2, . . . , q (2)
(if there are no such expressions at all, we take this to mean that Q is T).
It is not difficult to see that, in ordinary mathematics, (1) and (2) are equivalent (the proof
is "Stare hard"). The point here is to check that they are proveably equivalent with only
the limited resources of UDLO.
To see that (∃z)P ⇒ Q we argue as follows. For any i and j, xi < z and z < yj are factors
of P , so P ⇒ xi < z ∧ z < yj and so P ⇒ xi < yj and thus (∃z)P ⇒ xi < yj .
Repeating this for each i and j, we see that (∃z)P implies every one of the factors of Q
listed above and so it implies their conjunction, which is Q.
It remains to show that Q ⇒ (∃z)P .
Suppose first that P has no terms of the form z < yj. Then Q is T, so we want to prove
that T ⇒ (∃z)P, that is, that (∃z)P. By the previous proposition (F.5),
⊢ (∃z)( x1 ≤ z ∧ x2 ≤ z ∧ . . . ∧ xp ≤ z ∧ ( z = x1 ∨ z = x2 ∨ . . . ∨ z = xp ) ) . (3)
Using a choice argument, we choose such a z, and so deduce that xi ≤ z for all i. Using
the axiom UDLO5, there is z′ such that z < z′, from which we deduce xi < z′ for all the
xi, and then (∃z)P follows immediately.
A similar argument proves the proposition in the case in which P has no terms of the forms
xi < z.
Finally, we assume that P has at least one term of the form xi < z and at least one term
of the form z < yj . For this we need both forms of the previous proposition: there are
theorems of the form (3) above and
⊢ (∃z)( z ≤ y1 ∧ z ≤ y2 ∧ . . . ∧ z ≤ yq ∧ ( z = y1 ∨ z = y2 ∨ . . . ∨ z = yq ) ) . (4)
Using the Choice Rule (and changing letters for convenience), there are theorems of the
forms
xi ≤ u (5)
u = x1 ∨ u = x2 ∨ . . . ∨ u = xp (6)
v ≤ yj (7)
v = y1 ∨ v = y2 ∨ . . . ∨ v = yq (8)
((5) is a list of theorems, one for each i = 1, 2, . . . , p, and (7) is a list of theorems, one for
each j = 1, 2, . . . , q.)
Applying Proof by Cases (the Constructive dilemma) to (6) and (8), we prove a heap of
special cases, one for each i and j, as follows.
For any particular i and j, we have a case u = xi ∧ v = yj , and we use (2) to get u < v.
Then the axiom UDLO4 tells us that (∃z)(u < z < v) and then, using (5) and (7), for
each i′ and j′,
xi′ < z and z < yj′ ,
so that (∃z)P holds in this case. The Constructive Dilemma over all the cases now gives
(∃z)P outright, as required.
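The combinatorial heart of this proof, stripped of the formal bookkeeping, is the replacement of (∃z) by the conjunction (2). Here is a Python sketch of just that step (constraint sets are simply lists of variable names; this is an illustration of the idea, not the proof itself):

# Quantifier elimination step from F.8: (exists z) over constraints
# x_i < z and z < y_j becomes the conjunction of all x_i < y_j.
# Density and unboundedness are what make the step sound.
def eliminate_z(lowers, uppers):
    """Return the z-free conjunction as (x, y) pairs; [] means T."""
    return [(x, y) for x in lowers for y in uppers]

# (exists z)( a < z and b < z and z < c )  becomes  a < c and b < c
print(eliminate_z(["a", "b"], ["c"]))   # [('a', 'c'), ('b', 'c')]
# no upper constraints: Q is T (unboundedness supplies a big enough z)
print(eliminate_z(["a", "b"], []))      # []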
F.9 Corollary
Let P be any expression in UDLO which contains no quantifiers. Then
⊢ (∀z)P ⇔ Q
where Q is a DC-form and v(Q) ⊆ v(P) ∖ {z} .
(For the proof, apply the previous proposition to ¬P, noting that (∀z)P ⇔ ¬(∃z)¬P.)
F.10 Proposition
Let P be any expression in UDLO. Then
⊢ P ⇔ Q
where Q is a DC-form and v(Q) ⊆ v(P) .
Proof is by induction over the construction of P. If P has no quantifiers, this is Proposition
F.7. Suppose P is ¬R. By the inductive hypothesis there is a DC-form R′ such that R ⇔ R′
and v(R′) ⊆ v(R). Now ¬R′ is an expression
with no quantifiers, so by Proposition F.7 there is a DC-form Q such that ¬R′ ⇔ Q and
v(Q) ⊆ v(¬R′) ⊆ v(P) .
The same argument can be used when P is of the form R ⇒ S.
Finally, suppose that P is (∀z)R, so that v(P) = v(R) ∖ {z} . Then, by the inductive
hypothesis,
R ⇔ R′ where R′ is a DC-form and v(R′) ⊆ v(R) ,
so P ⇔ (∀z)R′ with v((∀z)R′) ⊆ v(R′) ∖ {z} ⊆ v(P) ,
and (∀z)R′ ⇔ Q where Q is a DC-form and v(Q) ⊆ v((∀z)R′) ⊆ v(P), by Corollary F.9.
F.11 Corollary
UDLO is complete.
G Functions versus relations
And now we translate the axioms (G1)–(G3) of The Elementary Theory of Groups into
relational form. Let us start with (G1), the Axiom of Associativity. It is
(G1) (∀x)(∀y)(∀z)( (xy)z = x(yz) ) .
In order to convert this one into an expression using the relational forms above, we use the
tricks described in the proof of Theorem A.8. The basic idea is to use extra variables to pull
(G1) apart into manageable fragments. First, we introduce variables u and v to stand for
(xy)z and x(yz), breaking the axiom up thus:
(∀x)(∀y)(∀z)(∀u)(∀v)( u = (xy)z ∧ v = x(yz) ⇒ u = v )
We still need to break up the fragments (xy)z and x(yz) because they still involve the
functional form. For the first one, we introduce a further variable w to stand for xy and
use the fact that
u = (xy)z ⇔ (∃w)( w = xy ∧ u = wz ) .
Treating x(yz) in the same way, the axiom becomes
(∀x)(∀y)(∀z)(∀u)(∀v)(
(∃w)( mul(w, x, y) ∧ mul(u, w, z) ) ∧ (∃w)( mul(w, y, z) ∧ mul(v, x, w) )
⇒ u = v )
These examples show that the resulting axioms are usually more complicated-looking than
the ones using the function symbols.
However, there is a useful variation on this. A partial function can also be treated as a
relation of a particular kind.
Suppose, for example, we have a theory within which we want to deal with a partial function
f of two variables. Consider the expression
u = f (x, y) .
This means that the pair hx, yi of arguments is in the domain of f and that u is the (unique)
value of the function for those arguments. If we want to replace this by a relation, F (u, x, y)
say, we know that
(∀x)(∀y)(!u)F (u, x, y) .
This can be useful. We might wish to deal with a first-order theory in which one of our defin-
ing function symbols stands for a partial function. But the definition of a first-order theory
does not allow function symbols for partial functions (in the formal language). However,
this technique allows us to introduce a partial function as a relation, and then use ordinary
functional notation in the semi-formal language (which is usually more convenient).
Where a partial function is to be introduced in this way, one usually needs to add an
axiom, similar to the one above, which ensures that the relation is in fact a partial function.
Another axiom to define its domain may be required also. (It is always possible that both
these statements might follow as theorems from other axioms of the theory.)
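As a concrete illustration of the trick (my own toy example, not from the text): the multiplicative inverse in arithmetic mod 5, treated as a binary relation INV(u, x) standing for u = x⁻¹, checked against the two axioms just described.

# A partial function as a relation: inverse mod 5. INV(u, x) plays the
# role of u = f(x); we check the "partial function" axiom (at most one
# u per x) and the "domain" axiom (exactly the nonzero x have inverses).
M = range(5)
INV = {(u, x) for u in M for x in M if (u * x) % 5 == 1}

assert all(sum(1 for u in M if (u, x) in INV) <= 1 for x in M)
assert all(any((u, x) in INV for u in M) == (x != 0) for x in M)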
An obvious example of this is the inverse operation in a field. Let us look at this now.
(F4) (∀x)(∀y)( x + y = y + x )
(F5) (∀x)(∀y)(∀z)( (xy)z = x(yz) )
(F6) (∀x)( x1 = x )
(F7) (∀x)(∀y)( xy = yx )
5. MODELS
A Structures
A.1 Discussion
In the informal discussions up to here in these notes, I have often made remarks about what
terms in our formal language “represent” and what expressions “mean”, leaving it to you to
understand as well as you can what I mean. For straightforward theories like PA I think
this is pretty obvious, but for less well-known ones like, say, Projective Geometry it might
be less so. In this chapter we will discuss the definitions of structures and models, which
clears up what it means for a formal language or theory to “talk about” something in these
ways. (And about time!)
A language seldom exists in isolation. It is usually used to describe something, perhaps a
whole class of similar things. If the language has function symbols F and relation symbols
R, then the thing the language describes presumably has (actual) functions and relations to
which these symbols refer. This means that this thing must be some kind of set (or class)
upon which these functions and relations are defined: we will call this a structure. Once we
know this, we can ascertain which members of the structure all the terms of the language
refer to and thus determine which of the statements in the language are true or false, as
they refer to that structure.
In this section we will consider only first-order theories and the languages which underlie
them (first-order languages). Some of the results of interest in this context will involve
languages which are (possibly) uncountably infinite. So in this section we will allow languages
to have arbitrarily large sets of variable, function and relation symbols. As usual, the number
of variable symbols will be at least countably infinite.
(Recall that a first-order language L has a function domain, that is, a set F of function
symbols, each with a prescribed arity, a relation domain, that is, a set R of relation symbols,
also each with a prescribed arity, and a set of variable symbols, which we will call vL. These
three things define the language. Once we have such a language its set of terms is defined;
we will call this set tL.)
Then an (F,R)-structure is a set M upon which there are (actual) functions and relations
defined corresponding to the members of F and R. More precisely, for each function symbol
in F, of arity n say, there is an actual n-ary function defined on M and, for each relation
symbol in R, of arity n say, an actual n-ary relation on M.
What is a “class”?! — A class can be described roughly as something just like a set, only
it might be much bigger. For a couple of examples, the class of all topological spaces, the
class of all groups. (Both these things behave like sets in many respects, but we will prove
later that they are actually “too big” to actually be sets.) Classes will be defined properly
in the next chapter and dealt with there at length. In the meantime, it will do no harm in
this chapter to think “set” when you read “class”.
When I think of a formal language and a structure for it, I usually have a mental picture
something like this:
Note that we sometimes use the same sign (on paper) to stand for both a relation symbol
in the language and the corresponding actual relation on the structure. This can obviously
be a rich source of confusion, which I alleviate as far as possible in these notes with the
coloured font convention.
[Figure: the language (left-hand egg) and the structure (right-hand egg), with the correspondence 0̄ → 0, = → =, suc → suc, + → +, . → . between the formal symbols and the actual functions and relations on N.]
It is now fairly obvious that any sentence in the language (expression with no free variables)
corresponds in the same way to a statement about the structure which is either true or false.
For example, the sentence 2̄ + 2̄ = 4̄ corresponds to the statement 2 + 2 = 4 about N, which happens to be true.
It is important to notice that the idea of a structure for a language pays no attention to any
axioms or theorems that might be defined. So sentences in the language may just as well
correspond to false statements about the structure as to true ones: the sentence 2̄ + 2̄ = 5̄, for instance, corresponds to the false statement 2 + 2 = 5.
Obviously, it will usually be a good thing if the theorems in our theory correspond to true
statements about the structure. A structure which has this property is called a model and
we will discuss these in the next section.
Everything outside that egg is meant to be the language we use to describe it and its
properties, the metalanguage. We will use (and have been using) ordinary mathematics,
perhaps supplemented by plain English when necessary, as that language.
Why choose ordinary mathematics? Well, we have to use some sort of language to describe
what is going on and to establish facts about our various theories. It should be obvious by
now that often we have to make quite sophisticated and complicated arguments, the kind
for which ordinary everyday language is just not good enough. Mathematics is designed for
this sort of thing.
However, this does raise a fundamental question: how can we be sure that arguments made
in ordinary mathematics are indeed correct? Indeed, how can we be sure that ordinary
mathematics is at least consistent? This question really becomes important later when we
set about proving that certain other theories are consistent.
One could imagine perhaps using some other, simpler, system of argument to prove that
ordinary mathematics is consistent, but that would only raise the question of the consistency
of that simpler system. With such an approach, the best one could hope for is some sort
of infinite regress, which doesn’t really prove anything. So throughout these notes we make
the fundamental assumption: Ordinary mathematics is a consistent theory.
More specifically, we will use the system of mathematics defined by the Morse-Kelley Axioms
in these notes. This is defined and discussed in detail in Chapter 6. So our fundamental
assumption is: The Morse-Kelley system of mathematics is consistent.
All the arguments we use can equally well be made using the other major system, Zermelo-
Fraenkel (see Appendix B), and all or most of them can be made using somewhat weaker
assumptions. We will not go into this in detail here.
There is an alternative, less gung-ho, approach. One can be explicit about this assumption
and simply prefix virtually everything proved in these notes with “If ordinary mathematics
is consistent then . . . ”. For example, in the last chapter we proved that UDLO is complete.
Taking this approach, we can say that that was just an abbreviation for what we really
proved, which was: if ordinary mathematics is consistent, then UDLO is complete.
This way of looking at proofs of consistency in particular is usually called relative consistency.
Now, returning to the egg diagrams above, everything outside the left-hand egg is meant to
take place in ordinary mathematics. In particular the right-hand egg, that is the structure,
is constructed in ordinary mathematics. In the above example, in which the structure is N,
that is the ordinary mathematical Natural Numbers which we all know and love, and when
we talk about functions and relations there we mean ordinary functions and relations in the
sense we have always been used to.
Let us look at the structure Z5 , the ring of integers modulo 5, which can act as a structure
for PA. To see this let us write Z5 as the set { [0] , [1] , [2] , [3] , [4] } and use the notation
= , [0] , csuc , ⊕ , ⊙
for the various relations and functions on it, defined in the usual way. (Here csuc is the
“cyclic successor” function [0] ↦ [1] ↦ [2] ↦ [3] ↦ [4] ↦ [0], and ⊕ and ⊙ are addition and
multiplication modulo 5.)
We make this into a structure for the language of PA by making the formal relation and
function symbols of the language correspond to these relations and functions in the obvious
way:
= → =
0̄ → [0]
suc → csuc
+ → ⊕
. → ⊙
In this structure, the sentence 3̄ + 4̄ = 2̄ corresponds to the statement [3] ⊕ [4] = [2], which
happens to be true.
Combining the interpretations of the various symbols here together in the obvious way, this
should be interpreted as
csuc(csuc(csuc([0]))) ⊕ csuc(csuc(csuc(csuc([0])))) = csuc(csuc([0])) .
Using the definition of csuc we can simplify this to [3] ⊕ [4] = [2] as claimed, and then verify
that it is true.
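If it helps to see this interpretation mechanised, here is a quick and entirely informal sketch in Python (all the names are mine; nothing here is part of the formal development). It interprets closed PA terms in the structure Z5 exactly as described above:

# Closed terms as nested tuples: ('zero',), ('suc', t), ('plus', t, u), ('times', t, u).
def interpret(term):
    """Return the member of Z5 = {0,1,2,3,4} to which a closed term refers."""
    op = term[0]
    if op == 'zero':   # 0-bar -> [0]
        return 0
    if op == 'suc':    # suc   -> csuc, the cyclic successor
        return (interpret(term[1]) + 1) % 5
    if op == 'plus':   # +     -> addition modulo 5
        return (interpret(term[1]) + interpret(term[2])) % 5
    if op == 'times':  # .     -> multiplication modulo 5
        return (interpret(term[1]) * interpret(term[2])) % 5
    raise ValueError(op)

def numeral(n):
    """The term n-bar = suc suc ... suc 0-bar (n sucs)."""
    t = ('zero',)
    for _ in range(n):
        t = ('suc', t)
    return t

# The sentence 3-bar + 4-bar = 2-bar is true for Z5:
assert interpret(('plus', numeral(3), numeral(4))) == interpret(numeral(2))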
This gives us a pretty good idea about how to interpret terms. But how do we interpret
expressions? (Remember that in this section we will only look at sentences.)
The first principle here is that the interpretation does not mess around with Sentential Logic,
so the not-symbol ¬ of the language is interpreted as the ordinary not ¬ of mathematics
and the implication symbol ⇒ of the language as the ordinary if-then implication ⇒ of
mathematics. It is straightforward then to verify that the same thing happens to the other
connectives of Sentential Logic: we have the following interpretations
¬ → ¬
⇒ → ⇒
∧ → ∧
∨ → ∨
⇔ → ⇔
When we come to interpret the quantifier symbols, we have a second principle: where the
language is being interpreted in a structure M , then the members of M are all that the theory
can see — effectively, M is its entire universe (what is called its “universe of discourse”).
Consider our first example above, PA being interpreted in the structure N as usual. Then,
when PA4 says
(∀x)(x + 0̄ = x)
it doesn’t mean that this is true for all possible things in the universe of mathematics —
sets, real numbers, complex functions etc. etc. — it simply means that it is true for all
natural numbers, that is, everything in its own private universe. In the same way, in our
second example above, where PA is being interpreted in Z5 , this axiom would be interpreted
as saying that this was true for all five members of Z5 .
Similarly, the cancellation law
(∀x)(∀y)(∀z)( x + y = x + z ⇒ y = z )
is interpreted in N as
(∀x ∈ N)(∀y ∈ N)(∀z ∈ N)( x + y = x + z ⇒ y = z )
and in Z5 as
(∀x ∈ Z5)(∀y ∈ Z5)(∀z ∈ Z5)( x ⊕ y = x ⊕ z ⇒ y = z ) .
Note that this recipe breaks down if the expression contains free variables. If we try to
interpret (for example) x + 3̄.y = 7̄ this way, we can replace the + by +, 3̄ and 7̄ by 3 and
7, but what are we to replace x and y by? (We will sort this out in the next section.)
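Continuing the informal Python sketch from above: because the structure is the theory's entire universe, a quantifier is interpreted as ranging over the members of the structure, which for a finite structure like Z5 is a computation we can actually carry out. (Again, this is only an illustration of the idea.)

Z5 = range(5)

# PA4, (Ax)(x + 0-bar = x), interpreted in Z5:
pa4 = all((x + 0) % 5 == x for x in Z5)

# The cancellation law (Ax)(Ay)(Az)( x + y = x + z  =>  y = z ), interpreted in Z5:
cancellation = all(((x + y) % 5 != (x + z) % 5) or y == z
                   for x in Z5 for y in Z5 for z in Z5)

print(pa4, cancellation)   # True True -- both sentences are true for Z5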
From here there are two approaches to interpreting (–1). One obvious thing to do is simply
use the definition (–2) to convert (–1) back into the fully-formal version it is short for,
then interpret that according to the method described above. For this example, we would
get
(∀x ∈ N)(∀y ∈ N)( (∃z ∈ N)(x + z = y) ∨ (∃z ∈ N)(y + z = x) ) . (–3)
This method is tiresome, especially when the definitions are complicated — and becomes
truly horrible when we deal with definitions which depend on earlier definitions which in
turn depend on even earlier ones, and so on. A much better approach is to interpret the
definition itself directly once and for all. In this example, we know that, under the interpretation in N,
x ≤ y is interpreted as x ≤ y in the ordinary sense. (–4)
(This, of course, is why the symbol ≤ was defined the way it was in PA.)
Having determined this rule once, we can then interpret (–1) in a single step as
(∀x ∈ N)(∀y ∈ N)( x ≤ y ∨ y ≤ x ) , (–5)
much easier!
By the way, you can see that (–3) simplifies to (–5).
B Interpretations
B.1 Remarks leading to the definition of an interpretation
Now we set about defining what an interpretation of an expression is in general — when
that expression might contain free variables.
Before we go any further, I should point out that there are three ways of looking at an
expression with free variables, only one of which we will officially call an interpretation.
(1) When an expression with free variables occurs as a theorem (and that includes as an
axiom or a step in the proof of a theorem) it is meant to imply that the expression is true
for all values of those free variables. For example, in PA we proved the theorem
(∀x)(∀y)( x + y = y + x ) (–1)
but we could just as well have written this theorem in the shorter form
x + y = y + x. (–2)
((–2) follows from (–1) by PL4 and (–1) follows from (–2) by UG.) So (–2) is the kind of
expression we are talking about here.
(2) When an expression with free variables occurs as a step in the proof of a deduction
from hypotheses, it is meant to imply that the expression is true for all values of those
variables which satisfy all the hypotheses in force for it.
(3) But what are we to make of any old expression which does not turn up as a theorem
or a step in a proof? Let us take a simple example (in PA again):
x + 3̄.y = 7̄ . (–3)
Normally, what one would think about an expression like this is, “Whether it is true or false
depends upon the values given to x and y”. It is this way of looking at an expression that
is called an interpretation, so we now set about making it precise.
So, in order to interpret an expression, one must decide first on an assignment of values to
all the free variables. Assign the values
x → 1 and y → 2
and the interpretation is 1 + 3.2 = 7 which is true (the dot here represents multiplication,
not a decimal point), or assign the values
x → 2 and y → 1
and the interpretation is 2 + 3.1 = 7 which is false.
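In computational terms, an interpretation of an expression with free variables is simply a function of an assignment, a point this informal sketch may make vivid (the names are mine):

def interpretation(assignment):
    """Interpret x + 3-bar.y = 7-bar in N under an assignment of x and y."""
    x, y = assignment['x'], assignment['y']
    return x + 3 * y == 7

print(interpretation({'x': 1, 'y': 2}))   # True:  1 + 3.2 = 7
print(interpretation({'x': 2, 'y': 1}))   # False: 2 + 3.1 is 5, not 7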
Another example: consider the expression
(∃y)(x + y = z) . (–4A)
This one has two free variables and one bound one. To interpret it in N, we assign particular
values for x and z in our structure N, for example
x → a and z → c
and the interpretation in N is (∃y ∈ N)(a + y = c), which is true or false depending upon the
values chosen for a and c. Yet another example:
(∀x)(∃y)(x + y = z) . (–5A)
To interpret this we need only assign a value, c say, to z in our structure and then the
interpretation in N would be
(∀x ∈ N)(∃y ∈ N)(x + y = c) .
In general, then, expressions with free variables will have truth-values which depend on the choice of assignment of the variables and thus on
the interpretation, whereas it is obvious that sentences (closed expressions) will have truth-
values independent of the particular assignment/interpretation chosen.
While it is clear how this goes, it is necessary to have a proper definition of the process,
so that further definitions may be made and facts proved about it, including the important
fact that proofs in the language translate to valid proofs about the structure in ordinary
mathematics.
Third, if two interpretations on the terms agree on all the variables, then they are the same
(i.e. agree on all terms). This is for the same reason as the second observation above.
Fourth, this is merely a more precise definition of the action of an interpretation on the
terms as described in B.1 above.
Before we can make a precise definition of the interpretation of expressions, we need the
following strange little definition.
This definition is not as complicated as it looks. It says that, given an old interpretation θ,
we make the new one by assigning the value m to the variable v (instead of whatever it was
assigned to) and leaving all the other variables as they were. (Since this specifies what the
new interpretation does to all the variables, it completely specifies it.)
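If we model an interpretation's action on the variables as a dictionary, the "strange little definition" is just a one-entry update, as in this informal sketch:

def reassign(theta, v, m):
    """theta[v/m]: assign m to the variable v, leave every other variable alone."""
    new = dict(theta)   # copy theta, so the old interpretation is untouched
    new[v] = m          # ...and redirect v to m
    return new

theta = {'x': 1, 'y': 2}
print(reassign(theta, 'x', 4))   # {'x': 4, 'y': 2}
print(theta)                     # {'x': 1, 'y': 2} -- unchanged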
This can be said in a more useful way: given θ, we can calculate θ[v/m](t) for any term t by
(ii) If P = ¬Q then
θ(P ) is ¬θ(Q) .
(iii) If P = Q ⇒ R then
θ(P ) is θ(Q) ⇒ θ(R) .
All this is pretty straightforward, except perhaps for Part (iv). An example should explain
what is going on here. Let us consider the expression
2.x = y (–1)
B.7 Proposition
Suppose that P is an expression in a first-order language L and M is a structure for L.
(i) If θ and ϕ are two interpretations of L in M which agree on all the free variables of
P , then the interpretations of P under θ and ϕ are the same; therefore P is satisfied by θ if
and only if it is satisfied by ϕ.
(ii) Suppose that P is a sentence (no free variables). Then the interpretation of P under
all interpretations in M is the same. Therefore if P is satisfied by any one interpretation
then it is satisfied by every interpretation (so then it is true for M ). By the same token, if
P is falsified by any one interpretation then it is falsified by every interpretation.
Proof. Obvious.
C Models
In the discussion of interpretations of a language in a structure in the last section, no
attention was paid to theorems or proofs. In fact, examples were given where theorems
of a theory were interpreted as false statements about the structure. For this reason, we
could discuss interpretations of a language in a structure, because theorems and proofs were
irrelevant.
But of course, the interesting and useful kind of structure is one in which the theorems are
in fact interpreted as true statements. Such a structure is called a model. In discussing
models, then, theorems become relevant, so we discuss models for theories, not languages.
The definition is now just what you would expect.
C.2 Proposition
(i) Let L be a first-order language and M a structure for L. Let E be a set of expressions
and X an expression in L such that E ⊢ X.
If E is true for M , so is X.
In other words, if M is a model for E, then it is a model for X also.
Note that we are talking about entailment here.
Proof (i). We want to prove that X is true for M , given the assumption that every member
of E is true for M . We also have the assumption that E ⊢ X which tells us that there is a
proof, S1 , S2 , . . . , Sn say, of X from E. As usual we work our way from left to right in this
list, proving that each step Si is true for M (assuming that this holds for all earlier steps).
So let S be any one of these steps and suppose that all earlier steps have already been proved
to be true for M .
(ii) Let E be a member of E; we want to show that it is true under every interpretation
θ : L → M . If E contains no free variables, then this follows immediately from B.7(ii) above.
Suppose now that E contains free variables, x1 , x2 , . . . , xk say. Let F be the statement
obtained from E by universally quantifying all these variables, i.e. (∀x1 )(∀x2 ) . . . (∀xk )E.
Then by UG, F is entailed by E and so is satisfied by θ. Also it contains no free variables
and so is true for every interpretation L → M , that is, M is a model for F . But F ⇒ E by
Predicate Logic Axiom PL4. Therefore M is a model for E.
C.3 Proposition
(i) Let A be a first-order theory and M be a structure for A. Then M is a model for A
if and only if it is a model for its axioms.
(ii) Let A be a first-order theory and M a structure for A. Then every theorem of PL is
true for M .
Proof (ii). The six axioms of PL are:
P ⇒ (Q ⇒ P )
(P ⇒ (Q ⇒ R)) ⇒ ((P ⇒ Q) ⇒ (P ⇒ R))
(¬P ⇒ ¬Q) ⇒ ((¬P ⇒ Q) ⇒ P )
(∀x)(P ⇒ Q) ⇒ ((∀x)P ⇒ (∀x)Q)
P ⇒ (∀x)P , where x does not occur free in P
(∀x)P ⇒ P [x/t] , provided that [x/t] is acceptable in P
These are all obviously true in ordinary mathematics, except for the last one (PL6), which
perhaps requires explanation. The left hand side of this states that θ[x/m](P ) is true for
all m ∈ M . Now t is a term of L, so θ(t) ∈ M . The left hand side therefore implies the
particular case in which θ(t) is substituted for m, that is θ[x/θ(t)](P ). It remains only to
check the definitions to see that this is the same as θ(P [x/t]). We have shown that the six
axioms of PL are all satisfied by every interpretation L → M ; that means they are all true
for M .
(iii) This is an immediate corollary of C.2(ii) above.
(iv) For each axiom of A, the fact that it is a sentence and is satisfied by an interpretation
means that it is satisfied by all interpretations (see B.7), and is therefore true for M . Since
this is true for all the axioms of A, the result follows from Part (i).
We will prove one direction of this theorem now and look at a couple of ways it can be used.
The other direction (that if a theory is consistent then it has a model) will be dealt with in
our second helping of Model Theory, Chapter 8.
C.5 Theorem
If a first-order theory has a model, then it is consistent.
Proof. Suppose that the first-order theory A has a model M . If A were inconsistent, then
it would have a theorem of the form P ∧ ¬P where P is a sentence. Writing PM for the
interpretation of P in this model, the interpretation of the theorem is PM ∧ ¬PM , which
cannot be true in M because of our fundamental assumption above.
C.6 Comments
This gives us a straightforward way of proving a theory is consistent: just build a model.
For example N is a model of the theory PA, therefore
PA is consistent.
For another example, consider the theory of projective planes, as described in section 4.B.3.
The Seven-point plane, defined immediately after that, is easily seen to be a model. Therefore
the theory of projective planes is consistent.
The celebrated proof by Paul Cohen (and the simplified version by J. Barkley Rosser) that
the negation of the Axiom of Choice is consistent with Mathematics (as a formal theory)
depends upon using ordinary mathematics to build a model of itself plus the negation of
the Axiom of Choice. (The construction of the model and the proof that everything works
is very long and complicated and beyond the scope of these notes.)
Now suppose that we have proved a theory to be consistent. The next thing we are likely to
ask is: are the axioms we have chosen independent? (To say that the axioms are independent
means that no one of them can be proved from the others.)
Normally, we want our axiomatic basis to be as simple as possible, at least not redundant
— it is just a nuisance if one of them can be proved from the others. Using models, we can
check independence, using the following theorem.
(ii) (Saying the same thing in slightly different notation.) Let A be a first-order theory
with proper axioms A ∪ {B}. To say that B is independent of the others means that B
cannot be proved in the first-order theory defined by the other axioms A. Equivalently,
A ⊬ B in PL.
(iii) And, of course, to say that the axioms of a first-order theory are independent means
that every one of them is independent of the others.
C.8 Proposition
Suppose we have a first-order theory A with proper axioms A∪{B}. If the first-order theory
with axioms A ∪ {¬B} is consistent, then, in A, the axiom B is independent of the others.
Proof. To say that the theory A is consistent means that there is no expression X such
that ⊢ X ∧ ¬X in A, and this is the same thing as saying that there is no expression X
such that A ∪ {B} ⊢ X ∧ ¬X in PL.
Suppose then that B is not independent of the other axioms. That means that A ⊢ B in
PL and then of course A ∪ {¬B} ⊢ B in PL. But also (trivially) A ∪ {¬B} ⊢ ¬B in
PL, contradicting the consistency of A ∪ {¬B}.
C.9 Comment
This gives a neat way of showing that axioms are independent. Given a first-order theory
A with proper axioms A, we can show that any one of them, B say, is independent of the
others by constructing a model of the theory A in which axiom B is replaced by ¬B. And
doing this for each axiom separately would prove that the set of axioms is independent.
For example, using Z5 as a structure for PA, as we did in Section A.5, it is quite straight-
forward to prove that all of the axioms except for PA1 are true for Z5 , but that PA1 fails in
this structure. It follows that, for this theory, PA1 is independent of the other six axioms.
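For a finite structure such as Z5 this independence check is a finite computation. Here is an informal Python sketch; I am assuming that PA1 is stated as (∀x)( ¬ suc x = 0̄ ), so treat the details as illustrative rather than authoritative:

Z5 = range(5)
csuc = lambda x: (x + 1) % 5

pa1 = all(csuc(x) != 0 for x in Z5)       # fails in Z5: csuc(4) = 0
pa4 = all((x + 0) % 5 == x for x in Z5)   # holds in Z5
print(pa1, pa4)                           # False True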
C.10 Exercise
Show that the other six axioms of PA are independent. (This involves building six different
models, one for each axiom.)
D Models which respect equality
In particular, if A is a first-order theory with equality and M is a model for A, then again
M must have a binary relation =M .
But there is nothing in the definition of either a structure or a model which says that =M
must be ordinary equality!
In a model, the axioms of equality force the relation =M to be an equivalence relation; not
only that, but an equivalence relation with special properties. It must, for instance behave
well towards the functions of the structure, in the sense that, for any n-ary function symbol
f and terms x1 , x2 , . . . , xn , x′1 , x′2 , . . . , x′n ,
x1 =M x′1 ∧ x2 =M x′2 ∧ · · · ∧ xn =M x′n ⇒ fM (x1 , . . . , xn ) =M fM (x′1 , . . . , x′n ) .
It must also behave well in a similar fashion towards all relations and indeed to all expres-
sions.
However none of this guarantees that =M must actually be equality itself. In many cases it
does of course and this is clearly a very useful property of a model, so we give it a name:
D.2 Definition
A model M respects equality if the relation =M is ordinary equality.
I specified n ≠ 0 to make the congruence different from equality, and excluded Axiom PA1
because it is not true for this model.
D.4 Remark
In Chapter 8, when we prove that every consistent first-order theory has a model, we will
also prove that if a theory with equality has a model then it has one which respects equality.
Then it will follow that every consistent first-order theory with equality has a model which
respects equality. That is reassuring.
(i) For any formal function symbol f of the language of arity n and members x1 , x2 , . . . , xn , y
of M ,
fM (x1 , x2 , . . . , xn ) = y ⇔ fN (ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xn )) = ϕ(y)
(ii) For any formal relation symbol r of the language of arity n and elements x1 , x2 , . . . , xn
of M ,
rM (x1 , x2 , . . . , xn ) ⇔ rN (ϕ(x1 ), ϕ(x2 ), . . . , ϕ(xn )) .
This is really just the obvious definition. It is laid out carefully here just to be on the safe
side. It is usually only of interest where M and N are models.
We can now define the idea of a categorical theory properly.
D.7 Discussion
(i) What’s equality got to do with this? Why not make the much simpler definition:
Any formal theory is categorical if all its models are isomorphic?
The reason is that the simpler definition is useless. Every formal theory that has any models
at all has models which are not isomorphic to one another.
(ii) We saw above that the axioms given for equality are not sufficient to ensure that
=M is in fact equality in a model M . (For this reason we needed to introduce the extra idea
of a model which respects equality.) So why not add some extra axioms to fix this, to make
sure that =M must be true equality?
The reason is that, no matter what extra axioms one adds, so long as the resulting theory
has a model at all, then it has one which does not respect equality, that is, one for which
=M is not true equality.
D.8 Example
Let A be any first-order theory and M be any model of A.
Let P be any nonempty set and p0 be some particular member of P . Make N = P × M
into a structure for A as follows:
(i) For any formal function symbol f of A of arity n and members ⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩
of N , define
fN (⟨p1 , m1 ⟩, . . . , ⟨pn , mn ⟩) = ⟨p0 , fM (m1 , m2 , . . . , mn )⟩ .
(ii) For any formal relation symbol r of A of arity n and members ⟨p1 , m1 ⟩, ⟨p2 , m2 ⟩, . . . , ⟨pn , mn ⟩
of N , define
rN (⟨p1 , m1 ⟩, . . . , ⟨pn , mn ⟩) ⇔ rM (m1 , m2 , . . . , mn ) .
It is not difficult to see that this structure is a model for the Projective Plane. It is also
easy to see that it has an infinite number of point-pairs, and so is not isomorphic to the
Seven-point Plane.
It is of interest to observe that this construction gives a large number of different (i.e., non-
isomorphic) projective planes, by simply choosing different fields for the field K of scalars
for V . This shows that the theory is not categorical.
Choosing K to be the Galois field GF(2) of two elements (which is the field of integers
modulo 2), this construction gives the Seven-point plane discussed above (well, something
isomorphic to it).
Choosing K to be the reals R yields the Real Projective Plane, a construction isomorphic to
the “spherical” one given above. To see this isomorphism, note that if the field of scalars is
R, then V is Euclidean 3-space. Make the subspaces of this construction correspond to their
intersections with the unit sphere: each 1-dimensional subspace is a straight line through
the origin and so intersects the sphere in a pair of opposite points; and each 2-dimensional
subspace is a plane through the origin and so intersects the sphere in a great circle.
F PL is not complete
F.1 Discussion
In Section 3.H we proved that Plain Predicate Logic is consistent by a technique of attaching
a value of 1 or 0 to every statement in such a way that all theorems have value 1 and all
antitheorems value 0.
We can now reveal that this technique was actually the use of a (particularly simple) model
in disguise. The way it works is this.
Our model is a one-member set, M = {m} say, on which the actions of the functions and
relations are defined thus: every function maps everything to m (there is no other choice),
and every relation is defined to be always true.
Now, looking at the technique of 3.H, you can see that we defined the value of a statement
to be 1 exactly when it is true for this model M (and 0 when it is false).
Now we can alter this model slightly to get another fundamental result for Plain PL.
Make a second model, let’s call it M ′ , defined in exactly the same way as M above, with
the one change that the actions of the relation symbols are defined to be always false.
Everything still works (check it!) and all theorems of PL are still true for this model.
Choose a unary relation, r say, and consider the sentence P = (∀x)r(x). Now P is false
for the second model M 0 , so it cannot be a theorem of PL. On the other hand, ¬P is false
for the first model M , so it cannot be a theorem of PL either. Since P is a sentence, this
shows that
PL is not complete.
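Since the two models have a single member, the checking is trivial enough to spell out mechanically. An informal sketch (the names are mine):

universe = ['m']                 # the one-member set M = {m}

r_true  = lambda x: True         # relation r in the first model M
r_false = lambda x: False        # relation r in the second model M'

def P(r):                        # interpret P = (Ax) r(x)
    return all(r(x) for x in universe)

print(P(r_true))    # True  -- so not-P is false for M,  hence not-P is not a theorem
print(P(r_false))   # False -- so P is false for M',     hence P is not a theorem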
G Example: Models for RA
G.3 Theorem
RA satisfies none of the cancellative laws of ordinary arithmetic. In other words, these are
all non-theorems:
⊬ x + y = x + z ⇒ y = z
⊬ y + x = z + x ⇒ y = z
⊬ x.y = x.z ∧ x ≠ 0 ⇒ y = z
⊬ y.x = z.x ∧ x ≠ 0 ⇒ y = z
Proof. Is really just a matter of checking cases. All the axioms of RA hold for N, so we
just have to check that they also hold when one or more of the variables involved is ∞. For
example, to check RA5, it is enough to check that
(∀x)( x + ∞+ = (x + ∞)+ )
(∀y)( ∞ + y + = (∞ + y)+ )
and ∞ + ∞+ = (∞ + ∞)+ .
This can be done just by checking the definitions above. The same goes for all the other
axioms. (And there is no need to check the axioms of PL or equality, because they hold
automatically). Now, to show that the “untheorems” above really don’t hold in the model,
all we need do is look for an example — and it will have to involve ∞, so the search is not
hard.
∞ + 0 = ∞ + ∞ but 0 ≠ ∞
0 + ∞ = ∞ + ∞ but 0 ≠ ∞
∞.1 = ∞.∞ but 1 ≠ ∞
1.∞ = ∞.∞ but 1 ≠ ∞
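A rough computational picture of this model: N together with one extra element ∞. The precise definitions of + and . on ∞ are the ones given earlier; the sketch below is my guess at them (in particular the x.0 cases), so take it as illustration only:

INF = float('inf')   # standing in for the extra element

def add(x, y):
    return INF if INF in (x, y) else x + y

def mul(x, y):
    if x == 0 or y == 0:         # assumed: x.0 = 0.x = 0, even for infinity
        return 0
    return INF if INF in (x, y) else x * y

# The counterexamples to cancellation:
print(add(INF, 0) == add(INF, INF), 0 == INF)   # True False
print(mul(INF, 1) == mul(INF, INF), 1 == INF)   # True False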
[X ⊕0 Y ] = X , [X ⊕n+1 Y ] = [[X ⊕n Y ] ⊕ Y ] ,
[X ⊗0 Y ] = X , [X ⊗n+1 Y ] = [[X ⊗n Y ] ⊕ Y ] .
G.6 Proposition
The following are not theorems of RA:
Associativity and commutativity of addition.
Proof. First we check that all the axioms of RA are satisfied by this model, just a matter
of checking lots of cases. Then, noting that λ is basic, we have
0 + λ = [0 ⊕ λ] ≠ λ ,
contradicting both associativity and commutativity. The same trick works for multiplication:
0.λ = [0 ⊗ λ] ≠ 0 .
This model satisfies the usual cancellation laws, but a small modification to it breaks those
laws too. All that is needed is to define λ + λ = λ instead of [λ ⊕ λ] and λ.λ = λ instead
of [λ ⊗ λ]. However it was nicer to break those laws with the much simpler first model.
Part II
MATHEMATICS
6. MORSE-KELLEY SET THEORY
Read the first one as saying “x is a set”. This may look a little strange; the reason for it will
be explained soon.
(MK2) Specification
Schema: an axiom for every expression P with at least one free variable x (here y1 , y2 , . . . , yn
are all the other free variables of P — if any).
(∀y1 )(∀y2 ) . . . (∀yn )(∃w)(∀x)( x ∈ w ⇔ SET(x) ∧ P )
(MK4) Unions
(∀a) SET(a) ⇒ (∃w) SET(w) ∧ (∀x)((∃y)(x ∈ y ∧ y ∈ a) ⇒ x ∈ w)
(MK6) Infinity
(MK7) Formation
(∀a)(∀b)(∀c) SET(a)
∧ (∀u) u ∈ a ⇒ (∃v)(∃w)(v ∈ b ∧ w ∈ c ∧ u ∈ w ∧ v ∈ w)
∧ (∀u)(∀w1 )(∀w2 )(∀v1 )(∀v2 ) u ∈ a ∧ w1 ∈ c ∧ w2 ∈ c ∧ v1 ∈ b ∧ v2 ∈ b
∧ u ∈ w1 ∧ v1 ∈ w1 ∧ u ∈ w2 ∧ v2 ∈ w2 ⇒ v1 = v2
∧ (∀v) v ∈ b ⇒ (∃u)(∃w)(u ∈ a ∧ w ∈ c ∧ u ∈ w ∧ v ∈ w)
⇒ SET(b)
(MK8) Foundation
(∀w) (∀a) (SET(a) ∧ a ⊆ w) ⇒ a ∈ w ⇒ (∀a)(SET(a) ⇒ a ∈ w)
Note the Axiom of Choice is not an axiom of MK. It will be discussed below in Section F.
Well, firstly we will see that a set is just a special kind of class, so the interesting question
is: what other sorts of classes are there? What we really want is some idea of what a proper
class might be — that is, a class which is not a set.
The best answer I can give here is to think of them as things like sets, but which are “too
big” to be sets. Examples are: the class of all sets, the class of all topological spaces, the
class of all groups, the class of all three-dimensional C ∞ manifolds, etc. We will see later
that the assumption that any one of these examples I have given of proper classes is a set
leads to a contradiction. In other words, any theory which contains enough set theory to
define, say, the set of all groups will be inconsistent.
So, for MK, the basic object is a class. Everything is a class. For instance, when we write
(∀x) . . .
we mean “for all classes x”.
Some of the notation becomes a little clearer if we use uppercase letters for variables which
we think of as classes or sets (that is, whose members are of interest to us) and lowercase
letters for the others — as in “let x be a member of the group G”. I will do this where
appropriate as far as possible.
It should be pointed out here that there are other ways of axiomatising set theory and with
it the “whole of mathematics”. There is one pre-eminent one, known as Zermelo-Fraenkel
Set Theory or ZF, which is probably more popular than MK, in the sense that of those texts
which take the trouble to describe the foundations they assume, I believe currently more of
them specify ZF than specify MK.
So why am I subjecting you to MK instead of the better known and possibly more widespread
ZF?
The fundamental object in ZF is the set, not the class. In fact, there is no way in ZF to talk
directly about classes. But for a number of areas of modern mathematics this can be quite
a drawback. In algebraic topology one discusses functors, and these are functions from the
class of all topological spaces to the class of all groups. In group theory, “abelianisation” is a
function from the class of all groups to the class of all abelian groups. And category theory
can hardly be done at all without using classes extensively.
In this chapter, many statements are made and some of them have quite obvious and easy
proofs; in these cases the proofs will usually be omitted to keep the narrative going. Where
proofs are not obvious they will usually be given. There are occasional (very few) state-
ments whose proofs are truly enormous and which will not be given — the consistency and
independence of the Axiom of Choice being two examples.
The remainder of this chapter will be devoted to showing how mathematics, as we know
it, is done within this formal theory. The axioms will be introduced one at a time; what
they say will be explained in plain language, together with an overview of what they provide
for us.
Towards the end I will discuss the Axiom of Choice, which perhaps can be thought of as an
optional extra axiom, and the extra facilities it provides us with.
B The first five axioms

The Axiom of Extension says, a little less formally,
if (∀x)(x ∈ A ⇔ x ∈ B) then A = B .
Or even less formally: if two classes have exactly the same members, then they are equal.
In other words, two classes are equal if and only if they have the same members.
Thus the (semi-formal) notation SET(x) introduced right at the beginning of A.1 above says
that x is a set.
It is a schema: an axiom for every expression P with at least one free variable x (here
y1 , y2 , . . . , yn are all the other free variables of P — if any).
This might be easier to understand if we use the less formal “functional” notation for the
expression P and write it P (x, y1 , y2 , . . . , yn ) — with the understanding that the variables
listed are all the free variables in P . The axiom can be restated: for any such expression P
and given any classes y1 , y2 , . . . , yn , there is a class w such that
w is the class consisting of all those sets x for which P (x, y1 , y2 , . . . , yn ) is true.
This axiom gives us a powerful tool for defining new sets and classes, and we will be using
it frequently. For now, let us look at a straightforward example of the way it can be used.
Suppose we have two sets (or classes, it doesn’t matter) A and B and we want to define
their intersection. We know what we want — we want a set (or class), W say, with this
property
for any x, x ∈ W ⇔ x ∈ A and x ∈ B . (–1)
So, take the right hand part of this and call it P (or, if you prefer, P (x, A, B)), thus:
P = x ∈ A ∧ x ∈ B .
Applying the axiom to this, it tells us that there is a class W such that
(∀x)( x ∈ W ⇔ SET(x) ∧ x ∈ A ∧ x ∈ B )
Now, notice that if x ∈ A then SET(x) is automatically true (by the definition of SET(x)),
so this simplifies to
(∀x)( x ∈ W ⇔ x ∈ A ∧ x ∈ B )
which is just (–1) written slightly more formally.
The point of this example is that it shows how we can define new sets and classes: As long
as we can describe the members of the class we want by some expression P , then the Axiom
of Specification will tell us that that class exists.
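For programmers: the Axiom of Specification behaves very much like a set comprehension, except that in MK the variable x ranges over all sets rather than over some pre-given collection. A loose analogy in Python (ranging over A ∪ B as a stand-in universe):

A = {1, 2, 3, 4}
B = {3, 4, 5}

# "{ x : x in A and x in B }" -- describe the members and the class exists
W = {x for x in A | B if x in A and x in B}
print(W == A & B)   # True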
What is the point of that part “SET(x)” of the axiom? It is there just to ensure that the
members of W must always be sets — it is possible that you could choose a bad sort of P
which was true for some proper classes x. We will see an example of this shortly. However,
in most cases the expression P implies that x is a set, and then the axiom simplifies. This
occurs frequently enough for us to state it properly:— if the expression P implies SET(x),
then
(∀y1 )(∀y2 ) . . . (∀yn )(∃w)(∀x)( x ∈ w ⇔ P ) ;
in other words, for any y1 , y2 , . . . , yn there is a class w whose members are just those x for
which P is true.
(By the Axiom of Extension, the class w in each of these statements is unique.) This unique
existence allows us to define a function by description (see 4.A.7). The usual
notation for this is { x : P } or, if you would like to be explicit about the variables involved,
{ x : P (x, y1 , y2 , . . . , yn ) }. According to the way definition by description works, this is
defined by
W = {x : P } ⇔ (∀x)( x ∈ W ⇔ SET(x) ∧ P ) (–1)
Returning to our intersection example above. This latest result proves that, not only does
there exist a class with the required property of an intersection, but there is only one such.
That allows us to write the intersection of two classes as a function, which is exactly what
we do in the notation A ∩ B ( ∩ is a function of two variables; we are using infix notation).
Note by the way that the x in { x : P } is a dummy: it is a bound variable, bound by this
notation, and its scope is everything inside the two curly braces.
This means, amongst other things, that we can change the dummy to another variable
without changing the meaning of the notation,
{ x : P } = { u : P [x/u] }
provided, as usual, that the substitution is acceptable in P . But note, there is also a “hidden”
acceptability condition here: u must not even occur free in P (unless u is the same as x, in
which case nothing changes). To see why this is so, look at what the definition becomes in
this case:
w = { u : P [x/u] } ⇔ (∀u)( u ∈ w ⇔ P [x/u] ) ;
you can see that, if u occurs free in P then it is definitely unacceptable on the right hand
side of this definition.
Using this notation for our intersection example, this all boils down to the more compact
definition
A ∩ B = {x : x ∈ A ∧ x ∈ B }
and, as an example of changing the dummy, this is the same as
A ∩ B = {v : v ∈ A ∧ v ∈ B }
(or any other letter you would like to use except for A or B). To see the sort of thing that
can happen if you violate the “hidden” acceptability condition, consider
A ∩ B = {A : A ∈ A ∧ A ∈ B } WRONG!
Now let us use this axiom to get some useful basic facts about sets and classes.
Is there any guarantee that there are any classes at all? Well, yes. Substitute any predicate
you like into the Axiom of Specification, for instance x = x, and we have
(∃W )(∀x)( x ∈ W ⇔ SET(x) ∧ x = x ) ,
which tells us that there is a class (W ) with certain properties, which we won’t bother about
just now. Suffice it for now to know that at least one class actually exists.
Well then, are there any sets at all? Again yes. A glance at the Axiom of Infinity (MK6)
shows us that it states that a class (W again) exists with certain complicated-looking prop-
erties, the first of which is that it is a set. For the moment, that is enough: there exists at
least one set.
But of course we can do a lot better than that.
But it is far more convenient to use this to define the empty class; since x is the only variable
in the expression, this defines a constant.
Definition
The empty set, ∅, is defined
∅ = { x : x ≠ x } . (–1)
This certainly defines ∅, but so far all it tells us is that ∅ is a class. We will show that it
is indeed a set shortly, but first. . .
(∀x)( x ∉ ∅ ) . (–2)
Using the Axiom of Extension, it is easy to show that this also defines ∅ uniquely; it is
probably the better-known definition of the empty set. That means also that any set u
which has the property (∀x)¬(x ∈ u) must be the empty set.
Now, to prove that it is indeed a set, we peek once more at the Axiom of Infinity. Notice
that it has the form
(∃w) stuff ∧ (∃u)(u ∈ w ∧ (∀x)¬(x ∈ u)) ∧ more stuff
and so it says, amongst other things and in a very formal way, that there is a class w such
that ∅ ∈ w. But then, by the definition of a set above, ∅ is a set.
x ∉ A means ¬(x ∈ A)
A ⊆ B means (∀x)(x ∈ A ⇒ x ∈ B)
A ⊂ B means A ⊆ B ∧ A ≠ B
A ⊇ B means B ⊆ A
A ⊃ B means B ⊂ A
(∀x ∈ A)P (x) means (∀x)(x ∈ A ⇒ P (x))
(∃x ∈ A)P (x) means (∃x)(x ∈ A ∧ P (x))
Note that all these definitions hold whenever A and B are classes — there is no requirement
that they be sets. With regard to the last two notations, observe that if the class A happens
to be empty, (∀x ∈ A)P (x) is automatically true regardless of the expression P (x). We say
that the statement is vacuously true. Similarly, if A is empty, the statement (∃x ∈ A)P (x)
is automatically false.
Also:
A is a proper class means ¬ SET(A) .
Some basic facts, all easily proved from the definitions:
∅ ⊆ A
A ⊆ ∅ ⇔ A = ∅
Reflexivity: A ⊆ A
The class of all sets is defined
Sets = { x : x = x } .
The class of all sets is occasionally called “the universe”. I will not use the word in this sense
because it clashes with the way we use it elsewhere in this book — recall, we have used the
term “universe of discourse” to mean everything that a variable can represent. For MK, the
universe of discourse is all classes, not just all sets.
The power class of a class A is defined
P (A) = { X : X ⊆ A } ,
that is, the class whose members are exactly the subclasses of A which are sets — the subsets
of A.
If you are used to the idea of a power set, then this definition seems a little strange. Why
not define P (A) to consist of all the subclasses of A? The reason is that any subclasses of
A which are not sets are not allowed to be members of anything, so cannot be members of
P (A).
In fact, this is one of the uncommon cases in which the definition of this notation,
X ∈ P (A) ⇔ SET(X) ∧ X ⊆ A ,
does not simplify to
X ∈ P (A) ⇔ X ⊆ A .
There is an axiom which tells us that, for any set A, there is a set W such that
(∀X)(X ⊆ A ⇒ X ∈ W ) ,
that is, there is a set W such that every subclass of A is a member of W ; in other words,
such that P (A) ⊆ W .
There are several important things to notice here:
(1) If A is any set, then every subclass of A must also be a set, since it is a member of
this set W . This is the result promised in B.8 above.
(2) If A is a set, then its power class P (A) is in fact a set, since it is a subclass of the
set W . We therefore call it the power set of A.
(3) A corollary of all this is that the Zermelo-Fraenkel version of the Axiom of Specifi-
cation holds, that is, if a is a set and P (x) an expression, then
(∃w ∈ Sets)(∀x)(x ∈ w ⇔ x ∈ a ∧ P (x)) .
For anyone interested in category theory: actually the two-tiered approach starts to
fall a little short here. Given categories A and B, there is a concept of the category of
functors A → B and natural transformations between them. If A and B are classes
(as they usually are) then the members of the functor category are classes too, and the
whole thing cannot be accommodated within MK without some serious double-speak.
(There are ways of getting around the problem, but they are a bit of a nuisance.) If
you are not interested in category theory, ignore this comment.
In the context of MK, Russell’s argument shows that the class of all sets cannot be a set,
and is therefore a proper class.
The Russell class is defined to be the class of all sets which are not members of themselves:
RUS = { x : x ∉ x } .
The “Russell Paradox” argument shows that this is not a set. From the results above, we
see that RUS ⊆ Sets, and so Sets is not a set either.
Here is the Russell Paradox argument: note first that the definition of RUS is equivalent to
x ∈ RUS ⇔ SET(x) ∧ x ∉ x . (–1)
Now put RUS itself in for x: if RUS were a set, (–1) would give RUS ∈ RUS ⇔ RUS ∉ RUS,
which is impossible; so ¬ SET(RUS).
The important thing to take away from this fun little proof is that there is no paradox here,
just a proof that RUS is a proper class.
The Russell Paradox is sometimes called the “Barber Paradox”, from a popularisation which
starts off, “In a certain town there is a barber who shaves everyone who does not shave
himself. Does the barber shave himself?” This is not a true paradox because the logical
conclusion is that no such barber exists.
Proof. All these proofs go much the same way and are quite easy. The trick is to first use
the definitions of complement, intersection, or whatever operations are relevant, to convert
the equation into a problem in logic; this then turns out to be a well-known result in SL.
For example, consider the first equation of (xi). Using the Axiom of Extension, this is
equivalent to
(∀x)( x ∈ A ∩ (B ∪ C) ⇔ x ∈ (A ∩ B) ∪ (A ∩ C) )
Working on the left-hand side,
x ∈ A ∩ (B ∪ C) ⇔ x ∈ A ∧ x ∈ (B ∪ C) Defn of intersection
⇔ x ∈ A ∧ (x ∈ B ∨ x ∈ C) Defn of union
and these are equivalent by SL (2.F.23). Now we have a proof, but it is written out sort of
backwards. Putting it the right way round, we get
x ∈ A ∩ (B ∪ C) ⇔ x ∈ A ∧ x ∈ (B ∪ C) Defn of intersection
⇔ x ∈ A ∧ (x ∈ B ∨ x ∈ C) Defn of union
⇔ (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C) By SL
⇔ (x ∈ A ∩ B) ∨ (x ∈ A ∩ C) Defn of intersection
⇔ x ∈ ((A ∩ B) ∪ (A ∩ C)) Defn of union
It is probably no accident that the symbols for logical AND and OR are so similar to the
symbols for intersection and union.
We note that if a is a proper class, then {a} is empty; similarly, if a and b are proper classes,
then {a, b} is empty. Therefore these definitions are usually only used when a and b are sets.
In this case they are not empty, and their defining properties are
x ∈ {a} ⇔ x = a and x ∈ {a, b} ⇔ ( x = a ∨ x = b ) .
In this case, {a} is called a singleton and {a, b} is called a doubleton or an unordered pair.
As with the power set, we don’t need the new axiom to tell us that the pair and the singleton
exist as classes and that they have the requisite properties. The new axiom tells us, in the
usual roundabout way, that they are actually sets.
The Axiom of Unordered Pairs tells us that, for any sets a and b, there is a set W such
that {a, b} ⊆ W . It follows from this that {a, b} is itself a set. Also, from the definition,
{a} = {a, a} and so it follows that {a} is a set also.
So for any set a, there is a set whose only member is a; for any sets a and b there is a set
whose members are a and b (and no others).
From these definitions we can easily prove lots of simple facts about unordered pairs and
singletons. For some examples (here a, b, c, d and x are assumed to be sets)
(i) {a, b} = {c, d} if and only if (a = c ∧ b = d) or (a = d ∧ b = c) .
(ii) {a} = {b} if and only if a=b .
Note that it will shortly become apparent that these simple-looking facts are more important
than they look.
It is a further interesting little exercise to figure out which of these statements remain true
if the restriction that a, b, etc. must be sets is removed.
Before we leave singletons, note a trap for young players: the set {∅} is not the empty set.
It has in fact one member, namely ∅.
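Python's frozensets make a convenient informal playground for sets whose members are sets, and they exhibit exactly this trap:

empty = frozenset()               # the empty set
single = frozenset({empty})       # the singleton of the empty set

print(empty == single)    # False: the singleton is not the empty set
print(len(single))        # 1 -- it has exactly one member...
print(empty in single)    # True -- ...namely the empty set itself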
B.14 Unions
We will define two kinds of union:
• the union of two classes A ∪ B (from which idea we can define the union of three classes
A ∪ B ∪ C and so on) and
• the union of a class of sets: if A is a class of sets this may variously be written ⋃Y ∈A Y
or ⋃{ Y : Y ∈ A } or, most simply, ⋃ A . (I prefer the last and simplest notation
and will use it in what follows.)
Of course, in the world of MK, the members of classes are always sets, and so any class is
automatically a class of sets. When I say above that A is a “class of sets”, it is merely a
signal to the human reader that we will here actually be interested in the members of A as
sets themselves — the members of its members are relevant.
A ∪ B = {x : x ∈ A ∨ x ∈ B }
⋃ A = { x : (∃Y )(x ∈ Y ∧ Y ∈ A) }
Axiom MK4 tells us that, if A is a set of sets, then ⋃ A is also a set. (More precisely, it
works in the usual roundabout way, telling us that, if A is a set of sets, then there is a set
W such that ⋃ A ⊆ W ; and from this it follows that ⋃ A is a set.)
x ∈ A ∪ B ⇔ x ∈ A or x ∈ B ,
x ∈ ⋃ A ⇔ there is some member Y of A such that x ∈ Y .
It is easy to check that A ∪ B = ⋃ {A, B} which, with the Axiom of Unordered Pairs above,
tells us that, if A and B are sets, then so is A ∪ B. You cannot use this trick if A and B are
proper classes because then {A, B} is empty.
Note that {a, b} = {a} ∪ {b} and so we can extend this notation in a natural manner:
{a, b, c} = {a} ∪ {b} ∪ {c} (defined as ({a} ∪ {b}) ∪ {c}) and then of course we have
A ∪ B ∪ C = ⋃ {A, B, C} (provided A, B and C are sets) and so on.
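Both kinds of union are easy to mimic informally with frozensets, including the fact, noted above, that A ∪ B = ⋃{A, B}:

def big_union(A):
    """U A = { x : x is a member of some member of A }."""
    return frozenset(x for Y in A for x in Y)

a, b, c = frozenset({1}), frozenset({2}), frozenset({1, 3})
print(big_union(frozenset({a, b, c})) == frozenset({1, 2, 3}))   # True
print(a | b == big_union(frozenset({a, b})))                     # True: A u B = U{A,B}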
B.15 Intersections
We define the intersection of two classes and of a class of sets in a similar way to their
unions:
A ∩ B = { x : x ∈ A ∧ x ∈ B }.
⋂ A = { x : (∀Y ∈ A)(x ∈ Y ) }
We don’t need another axiom to deal with intersections of sets. If A is nonempty, then its
intersection is a set, even when A is a proper class! To see this, note that if A is nonempty,
then it must contain at least one member, Y say, and this must be a set. It is now easy to
prove that ⋂ A ⊆ Y , and so the intersection is a set also.
On the other hand, the intersection of the empty set is not a set:
⋂ ∅ = Sets .
The intersection of two classes is a class (possibly a set, but not always). The intersection
of two sets is always a set because of the easily proved fact that A ∩ B ⊆ A.
It is now worth looking again at all the basic results listed in Section B.12. All these results
hold just as well for sets as classes (for the simple reason that sets are just a kind of class),
however we now know also that, provided A, B and C are sets, all the things listed in that
Section are also sets — with the exception of those that mention Sets or complements.
We will actually define ordered pairs twice — in two different ways. I will call the first kind
a primitive ordered pair.
So, the primitive ordered pair ⟨a, b⟩p is defined by
⟨a, b⟩p = { {a} , {a, b} } .
As with unordered pairs, this definition makes sense when a and b are classes, but is only
useful when they are both sets. Note that, if a and b are both sets, the Axiom of Unordered
Pairs tells us that this ordered pair ⟨a, b⟩p is also a set.
Using the properties of unordered pairs listed above ((i) to (iii)), we can prove the main
property of primitive ordered pairs:
B.17 Proposition
If a, b, c and d are sets, then
⟨a, b⟩p = ⟨c, d⟩p if and only if a = c ∧ b = d .
We now also define (primitive) ordered triples by ⟨a, b, c⟩p = ⟨⟨a, b⟩p , c⟩p and prove the
crucial property, that, if a, b, c, d, e and f are all sets, then so are ⟨a, b, c⟩p and ⟨d, e, f ⟩p ,
and
⟨a, b, c⟩p = ⟨d, e, f ⟩p if and only if a = d ∧ b = e ∧ c = f
and so on.
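Frozensets let us test the crucial property informally: over a small stock of sets, the primitive pair {{a}, {a, b}} really does determine its components.

def ppair(a, b):
    """The primitive ordered pair <a,b>p = {{a},{a,b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

stock = [frozenset(), frozenset({frozenset()})]   # two small, distinct sets
ok = all((ppair(a, b) == ppair(c, d)) == (a == c and b == d)
         for a in stock for b in stock for c in stock for d in stock)
print(ok)   # True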
The problem with this “primitive” definition is that it only really works when the elements
of the pair are sets: if A and B are proper classes, then it is not difficult to check that
⟨A, B⟩p = {∅}, irrespective of the values of A and B, and so the crucial property just stated
and proved for sets no longer holds. We will shortly have need of ordered pairs of classes —
well actually an ordered triple ⟨A, G, B⟩ defined as ⟨⟨A, G⟩, B⟩ — so we now make a more
powerful definition, which works for classes as well as sets.
What is going on here? We are replacing a and b by sets in which all their elements are
“tagged” so we can tell them apart. Think of this as
⟨a, b⟩ = a′ ∪ b′
where a′ and b′ are the tagged versions. We tag every member of a with ∅ and every member
of b with {∅}:
a′ = { ⟨∅, x⟩p : x ∈ a }
b′ = { ⟨{∅}, x⟩p : x ∈ b } .
Now it really doesn’t matter what tags we use here, so long as they are different so that we
can tell them apart. For example, if we had already defined the numbers 0 and 1 it would
have been natural to use them for tags, something like this:
a′ = { ⟨0, x⟩p : x ∈ a }
b′ = { ⟨1, x⟩p : x ∈ b } .
But we can’t do that, because we haven’t defined 0 and 1 yet. So I have chosen to use ∅
and {∅} here because they are the simplest things we already know are different (∅ has no
members and {∅} has one member, namely ∅ itself). There is another reason which will
emerge later that this choice of tags is nice.
It is important to know that if a and b are sets then ⟨a, b⟩ is a set too.
Here is the proof. This proof is included for completeness. Do not bother to wade through
it unless you are interested.
Suppose that a and b are both sets. To see that ⟨a, b⟩ is also a set, note that {a}
and {a, b} are both subsets of {a, b} and hence members of P ({a, b}). Hence ⟨a, b⟩p =
{{a}, {a, b}} ⊆ P ({a, b}). In particular, for any x ∈ a, {∅, x} ⊆ {∅} ∪ a and so
⟨∅, x⟩p = { {∅} , {∅, x} } ⊆ P ({∅} ∪ a) , that is, ⟨∅, x⟩p ∈ PP ({∅} ∪ a) .
Thus { ⟨∅, x⟩p : x ∈ a } ⊆ PP ({∅} ∪ a). In the same way { ⟨{∅}, x⟩p : x ∈ b } ⊆
PP ({{∅}} ∪ b) and so
⟨a, b⟩ = a′ ∪ b′ ⊆ PP ({∅} ∪ a) ∪ PP ({{∅}} ∪ b)
is a set.
B.18 Proposition
If a, b, c and d are any classes, then
⟨a, b⟩ = ⟨c, d⟩ if and only if a = c ∧ b = d .
Proof. Write X = ⟨a, b⟩. We want to show that, given X, there is only one possible choice of a and
b that will give it. For a start we can recover a and b thus:
a = { x : ⟨∅, x⟩p ∈ X }
b = { x : ⟨{∅}, x⟩p ∈ X } .
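The tagging construction and the recovery of a and b can also be played with informally; the decoding step below (taking apart {{t},{t,x}} to find the tag t and the member x) is my own, but the pairs themselves follow the definitions above.

EMPTY = frozenset()
TAG_A, TAG_B = EMPTY, frozenset({EMPTY})          # the two tags

def ppair(t, x):
    return frozenset({frozenset({t}), frozenset({t, x})})

def pair(a, b):                                   # <a,b> = a' u b'
    return (frozenset(ppair(TAG_A, x) for x in a) |
            frozenset(ppair(TAG_B, x) for x in b))

def unpair(X):
    """Recover (a, b) from X = <a,b>."""
    a, b = set(), set()
    for p in X:                                   # p = {{t},{t,x}}
        small = min(p, key=len)                   # {t}
        big = max(p, key=len)                     # {t,x} ({t} itself if t = x)
        t = next(iter(small))
        rest = big - small
        x = next(iter(rest)) if rest else t
        (a if t == TAG_A else b).add(x)
    return frozenset(a), frozenset(b)

a = frozenset({EMPTY})
b = frozenset({TAG_B})
print(unpair(pair(a, b)) == (a, b))   # True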
Our next major goal is to define functions in such a way that we can create them conveniently
when we want to; also so that we can define functions from one class to another (as one
wants to do all the time in algebraic topology and category theory to mention just two
areas). We will, of course, have to define functions as sets or classes. In order to do this we
must first discuss cartesian products and relations.
The cartesian product of two classes A and B is defined
A × B = { ⟨a, b⟩ : a ∈ A ∧ b ∈ B }
I have used the general form of the ordered pair here, but the primitive form could have
been used just as well (since here a and b must be sets). In any case, this definition is good
for defining the cartesian product of two classes A and B. (This is important.) Nevertheless,
we also need to know that, if A and B are sets, then A × B is a set.
As before, the proof is given here for completeness. Don’t feel you have to work through it
unless you really want to.
For any a ∈ A and b ∈ B we have a ⊆ ⋃A and b ⊆ ⋃B, so, just as in the previous proof,
⟨a, b⟩ ⊆ PP ({∅} ∪ ⋃A) ∪ PP ({{∅}} ∪ ⋃B)
and so
⟨a, b⟩ ∈ P ( PP ({∅} ∪ ⋃A) ∪ PP ({{∅}} ∪ ⋃B) ) .
Thus A × B is contained in a set, and so is a set itself.
A × (B ∪ C) = (A × B) ∪ (A × C) , A × (B ∩ C) = (A × B) ∩ (A × C).
U ⊆ A and V ⊆ B ⇒ U ×V ⊆A×B
{a} × {b} = { ⟨a, b⟩ }
and so on.
We could now go on to define cartesian products of more than two sets, for instance A×B×C
to mean (A × B) × C, but we won’t do that because later we will have a more satisfactory
way to define general cartesian products. For the time being however, the notation A2 for
A × A is useful.
B.20 Relations
We now wish to define relations as mathematical objects, rather than as the relation symbols
of our language. This will allow us great flexibility in creating new relations of various
kinds. It would be possible to define new relations by adding new symbols to the language
for them, extending the language to include them, and also adding new axioms to define
their meanings. This process clearly would be very messy; instead we show how to define
relations as sets, which can be defined at will by the means already at our disposal without
having to do any violence to the language itself. This method, it will be seen, is neat and
very flexible.
For the time being, we will only discuss binary relations (the extension to other arities will
become obvious). If R is a relation, we will denote “a stands in relation R to b” by aRb
rather than R(a, b). This is more in line with the notation used for most binary relations
such as ≤ , < , ⊆ and so on.
Suppose now that R is a relation from a class A to a class B. That means that aRb makes
sense and is either true or false whenever a ∈ A and b ∈ B. Then there are three important
classes here:—
A, the domain of R, denoted dom(R); B, the codomain of R, denoted cod(R); and
{ ⟨a, b⟩ : aRb }, the graph of R, denoted gr(R).
Clearly these classes between them define the relation: gr(R) is a subclass of the cartesian
product A × B and any subclass of A × B will define some relation from A to B. Thus we
have a method of constructing relations as classes.
Definition A relation is a triple ⟨A, G, B⟩, where A and B are classes and G is a subclass
of the cartesian product A × B. In this case A is called the domain of R, dom(R); B is the
codomain of R, cod(R) and G is the graph of R, gr(R). We say that R is a relation from A
to B. We write aRb to mean ⟨a, b⟩ ∈ G.
When we define such a relation, A and B will usually be sets, and in this case, gr(R) and
R itself will be sets also. However it all works fine if A and B are classes.
We can now define equivalence relations:—
We can now also define order relations of various kinds (partial, full orders etc.). We will
do so in Section C.
Various other kinds of relations (preorders and full orders being a couple of important ones)
can be defined the same way.
It is important to notice that now we have three distinct but related ideas, all called “rela-
tion”.
(1) The relation symbols of the fully formal language. In mathematics, as defined here,
there are only two of these, ∈ and =.
(2) Any expression with two free variables defines a relation, for example the expression
P (a, b) = (∀x)(x ∈ a ⇒ x ∈ b)
Also, of course, where a useful relation has been defined by either methods (2) or (3), it may
be given a symbol which can henceforth be used as though it was of type (1): the subset
relation symbol ⊆ arose this way.
B.21 Functions
We will now define a function as a special kind of relation. If f is a function A → B, then
we can think of f (a) = b as defining a relation from A to B. If (temporarily) we write this
af b to stay in line with the notation of the previous paragraph, we see that the relation f
has to have a special property to qualify as a function, and arrive at the following definition:
a function from A to B is a relation f = ⟨A, G, B⟩ such that, for each a ∈ A, there is exactly
one b ∈ B with af b.
This uniqueness is of course exactly what we need to use functional notation (definition by
description): for any a ∈ A, we write f (a) for the unique b such that af b.
Note This definition implies that dom f = A. Thus we are here adopting the convention
used in virtually all of mathematics except elementary calculus that, when we say that f is
a function A → B, we imply that it is defined on all of A; in other words, its domain is A,
not just a subset of A.
Musing Calculus 101 really is a worry. Look in any first-year calculus door-stop textbook
at the definitions of limit and continuity and you will find that the authors get themselves
into all kinds of knots regarding the domain of the functions involved (if they don’t simply
ignore the problems completely). In my experience, most such textbooks end up getting the
definitions plain wrong.
More musing It would appear that we now have two different meanings of the word
“graph” — the one defined here, and the one we all know well from elementary mathematics.
Now consider an example, the squaring function on the reals; let us call it f . It is a function
f : R → R defined by f (x) = x2 for all x ∈ R. So, if we use the definition of “graph” in
this section, its graph will be the subset of R × R consisting of all points of the form ⟨x, x2 ⟩.
But R × R is of course the Euclidean plane, so we can draw a picture of this set (the orange
curve is the graph):
Note also In this definition, every function has a well-defined domain and codomain. The
fact that a function defines its domain is not a surprise, but not everyone would agree that
a function defines its codomain. (The codomain is what you might think of as the “target”
set or class — it contains the range.) Consider this example:
The absolute value function x ↦ |x| as a function C → R≥0 (the non-negative reals);
The absolute value function as a function C → R (all the reals);
The absolute value function as a function C → C
(Makes a nice example of a non-analytic function).
Some folk would consider these to be all the same function. According to the definition
above however they are three different functions because they have different codomains.
They are obviously closely related of course.
The more precise definition of a function used here is in line with some branches of modern
mathematics, in particular category theory and any branch that uses category-theory ideas;
its use in general does not cause any problems.
Note that, if A and B are sets, then any function A → B is also a set.
It is now straightforward, if tedious, to verify all the usual basic properties of functions.
We will say that two functions f and g are equal if they are equal as classes. This turns out
to be the same as the usual meaning of equality of functions: f and g are equal if they have
the same domain and codomain and f (a) = g(a) for all a in their domain.
The composite of two functions and the identity function idA on a class A are defined in the
obvious ways.
Composition is associative where the composites are defined and, if f : A → B, then
id_B ∘ f = f and f ∘ id_A = f.
The proper definition of a function as a special type of relation allows us to clear up a point
you may have been wondering about for ages: whether there are functions f : A → B
where A and/or B may be empty, and if so, what they are. Perusal of the definition tells us
that
that
• For any class B there is a unique function ∅ → B. Its graph is the empty set, so we
call it the empty function.
• If A is nonempty, then there is no function A → ∅ .
There are two ways of dealing with functions: (1) informally, as in elementary mathematics, by giving a rule which assigns to each a ∈ A a value f(a) ∈ B; (2) by specifying it as a class, as in the definition just given. If we have classes A and
B and a subclass G of A × B such that, for all a ∈ A, there is a unique b ∈ B such that
⟨a, b⟩ ∈ G, then this defines a function f = ⟨A, G, B⟩.
These two ways of dealing with functions are equivalent, in the sense that, given any function
of type (1), we can define an equivalent function of type (2) and vice versa.
However, method (2) has one outstanding advantage: if A and B are sets, then any function
A → B is, as we have seen, a set, and so we are able to speak of the “set of all functions
A → B”. With method (1) this construction just doesn’t make sense. Therefore, whenever
new functions are introduced, it is usually tacitly understood that they are defined as sets
(or classes).
B.24 Sequences
Consider a sequence ⟨a_0, a_1, a_2, . . .⟩, whose elements belong to some class A. If we were to
write its elements as ⟨a(0), a(1), a(2), . . .⟩ instead, it would be clear that we were simply
listing the values of a function N → A. Looking at it this way then, a sequence is nothing
new; it is simply a function N → A (where A is the class containing the elements of the
sequence), but with a different notation which we sometimes choose to use for one reason
or another.
Well ... there is something new here: we haven’t got around to defining N yet. But when
we do, we will have sequences ready made. We will also have n-tuples of course: an n-tuple
⟨a_1, a_2, . . . , a_n⟩ with elements in a set A is simply a function from the subset {1, 2, . . . , n}
of N to A.
Oops! Now we have three different definitions of an ordered pair, the ones in B.16 and B.17
above and the one just given as a 2-tuple. In the same way we have three definitions of
a triple. The definitions just given (as special cases of n-tuples) are the most convenient.
Unfortunately, the more primitive definitions in B.16 cannot be dispensed with — they are
used to define a function, which in turn is used to define an n-tuple. Perhaps the best plan
is to call both kinds of ordered pairs and triples defined in B.16 “primitive”, use them
only to define relations and functions and use the more general and convenient definition of
an n-tuple in all other cases. In fact, the distinction is almost never important in practice.
When we index members of some set A in this way it is very common for the index set to
be N (or some subset thereof). But there is no reason why this has to be so. It is perfectly
possible to have something like a sequence, but indexed by some other set I, say. In this
case we use the word family instead of sequence. Also, the notation ⟨a_0, a_1, a_2, . . .⟩ is no
longer useful; we write something like ⟨a_i⟩_{i∈I} instead.
Without needing to wait for N to be defined, we can define families. A family ⟨x_i⟩_{i∈I} in
a class A indexed by the set I is simply a function x : I → A. (There is no reason why I
cannot be a class here too, but it would be unusual for this to be so.)
In particular, if A is a set, the cartesian power AI can be thought of as the set of all families
in A indexed by I.
More generally, if ⟨A_i⟩_{i∈I} is a family of sets, we can define its cartesian product to be the
set of all families ⟨a_i⟩_{i∈I}, where a_i ∈ A_i for all i ∈ I. This is denoted ∏_{i∈I} A_i or ∏⟨A_i⟩_{i∈I}.
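For finite index sets the definition can be made completely concrete; here is an illustrative Python sketch of mine (the helper name cartesian_product is hypothetical):

```python
# The cartesian product of a family of sets, realised as the set of all
# families: here dicts i -> a_i with a_i drawn from A_i.
from itertools import product

def cartesian_product(family):
    indices = list(family)                 # family maps each index i to A_i
    return [dict(zip(indices, choice))
            for choice in product(*(family[i] for i in indices))]

fam = {"x": {0, 1}, "y": {0, 1, 2}}
assert len(cartesian_product(fam)) == 6    # 2 · 3 families, as expected
```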
C Order
The various kinds of orders, especially partial and full orders, should be familiar to you.
This is a short section just to get our symbols and terminology straight.
It is perfectly possible to define an order relation on a class and to discuss ordered classes;
everything in this section applies to ordered classes just as well as to sets. (Example: ⊆ is
a partial order on the class of all sets.)
There are various kinds of orders, depending upon which of the following conditions they
satisfy:
O1 (reflexivity) x ≤ x for all x;
O2 (transitivity) x ≤ y and y ≤ z ⇒ x ≤ z;
O3 (antisymmetry) x ≤ y and y ≤ x ⇒ x = y;
O4 (comparability) for all x and y, either x ≤ y or y ≤ x;
O5 (well-ordering) every nonempty subclass has a least member.
A relation satisfying O1–O3 is a partial order, one satisfying O1–O4 is a full order and one
satisfying all five is a well-order. The basic example of a partial order is ⊆, which is not a full order except in
special circumstances (there are sets X and Y such that neither X ⊆ Y nor Y ⊆ X). We
will deal with other partial orders from time to time.
Notation: In a partially ordered class, we write x < y to mean x ≤ y and x ≠ y. It is easy
to prove that x < y ⇒ ¬(y ≤ x),
but the reverse implication does not hold in general. In fact, if the reverse implication does
hold for all x and y, then the relation is a full order (another easy proof).
Full orders are probably the most familiar kind: the standard order relations on the natural
numbers, the integers, the rationals and the reals are all full orders.
We will look at well-orders in Section E.
a is a minimal element of X if a ∈ X and there is no x ∈ X such that x < a.
Upper bounds, maximum (= greatest) elements and maximal elements are defined in the
obvious ways.
A least upper bound or supremum for X is, as its (first) name implies, a least element in the
set of all upper bounds of X. The notations lub X and sup X are used. Similarly a greatest
lower bound or infimum, denoted glb X or inf X, is a greatest element in the set of all lower
bounds of X.
(iii) A chain in a partially ordered class is a subclass which is fully ordered, that is, a
subclass in which all elements are comparable.
(iv) A subclass X of a partially ordered class A is initial or an initial segment if
x∈X and a ≤ x ⇒ a∈X .
For any member a of A, the class I(a) = {x : x < a} is an initial segment (the initial segment
defined by a), as is the class Ī(a) = {x : x ≤ a}. Also, A is an initial segment of itself. A
partially ordered class may well have initial segments other than ones of these forms.
C.3 Suprema
The idea of a supremum should be familiar to you from calculus and analysis. Suprema and
their properties will be important to us (in this chapter, but particularly in the context of
ordinal numbers in Chapter 7) so here are a number of facts worth knowing.
(i) The supremum of a subset depends also on what it is a subset of. For example, let
X be the set of all rational numbers x such that x² < 2. Then:
• as a subset of R, sup X exists and is equal to √2;
• as a subset of Q, sup X does not exist.
This suggests that the usual notation could be a bit misleading, and we should write some-
thing like, say, supR X and supQ X. It usually doesn’t matter, because we know what the
“main” set is.
This same example shows that the inf does not always exist.
(ii) In any ordered set, m = sup X if and only if the following are both true:
(a) x ≤ m for all x ∈ X;
(b) m ≤ u for every upper bound u of X.
(iv) If a subset X happens to have a supremum and that supremum is a member of the
set (that is, sup X exists and ∈ X), then it is also the maximum member of the set.
(v) Consider the empty set as a subset of some given ordered set A.
• If A has a minimum member, then sup ∅ exists and is that minimum member,
• otherwise sup ∅ does not exist.
D The Natural Numbers
The Axiom of Infinity There is a set W such that
∅ ∈ W and (∀x)(x ∈ W ⇒ x⁺ ∈ W),
where x⁺ denotes the successor x ∪ {x} of x.
We now set about creating the Natural Numbers N. We will see that most of the engineering
required to manufacture this set is already available to us, with one linchpin left to be
provided by this axiom.
The way we treat the Natural Numbers here is rather different from the approach used in
Section 4.C. Now we set them up as a set. This allows us to use general mathematical
methods to investigate them and, conversely, to use them as a tool in general mathematics.
What we want is the set N to which Peano’s Axioms apply in this form:
(P1) 0 ∈ N.
(P2) For all n ∈ N, n⁺ ∈ N.
(P3) For all n ∈ N, 0 ≠ n⁺.
(P4) For all m, n ∈ N, m⁺ = n⁺ ⇒ m = n.
(P5) If X is a subset of N such that 0 ∈ X and, for all n, n ∈ X ⇒ n⁺ ∈ X,
then X = N.
We will use the beautiful definition originally due to John von Neumann. Imagine that we
are attempting to come up with a definition of the natural numbers, based on the sort of
set theory we have done so far. Then every natural number is going to be a set. It would
be nice if the set n actually had n members, so let’s see if we can arrange this. For a start,
0 must have no members and so must be the empty set. To go further than this, look at
the natural numbers
0, 1, 2, 3, 4, ...
Notice that here each natural number n is preceded by exactly n smaller ones. For example,
3 is preceded by 0, 1 and 2. So we try defining each natural number as the set of its
predecessors. Can we make this circular-sounding definition work? Well, yes, if we organise
it properly. We are proposing to define each natural number n to be the set {0, 1, . . . , n − 1}.
Then the successor of n will be
n⁺ = {0, 1, 2, . . . , n} = {0, 1, 2, . . . , n − 1} ∪ {n} = n ∪ {n}.
So we have
0 = ∅
n⁺ = n ∪ {n} for all n.
This is starting to look very like an inductive definition; in fact it would be one, if only we
had induction already. (We had induction as an axiom of PA, but it is not an axiom of MK.
We are about to make that work too.)
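If you like to experiment, the construction can be imitated in Python using frozensets (so that sets can be members of other sets). This is an illustration of mine, not part of the formal development:

```python
# A hands-on check of 0 = ∅ and n+ = n ∪ {n}.
def successor(n):
    return n | frozenset([n])          # n+ = n ∪ {n}

numbers = [frozenset()]                # 0 = ∅
for _ in range(4):
    numbers.append(successor(numbers[-1]))

for i, n in enumerate(numbers):
    assert len(n) == i                 # the numeral n has exactly n members
assert numbers[2] in numbers[4]        # 2 ∈ 4: each number contains its predecessors
```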
You will notice that the Axiom of Infinity is suddenly starting to look very appropriate: the
set W above will contain our brand new N, but it might be too big. We can see straight
off that it contains 0 and, if it contains x then it contains x+ also, so we are on our way.
There is nothing in the axiom, however, to say that the set W does not contain a whole lot
of other rubbish. So we must define N to be a subset of W in a crafty way. There will also
be some other technical details to fix as we go. These will occupy the rest of this section.
The main trick in this discussion is to observe that the set W given to us by the axiom is
not unique — there will be many such sets. If we take the intersection of all of them we
can show that this has all the required properties above. So we use this intersection as the
definition of N. For convenience, we will call these sets “inductive”.
D.2 Definition
A set A is inductive if it has these properties: ∅ ∈ A and, for all x, x ∈ A ⇒ x⁺ ∈ A.
(We could define an inductive class the same way if we wished, but we don’t need this
generality, so why bother?)
We now define N by specification:
N = { x : x ∈ A for every inductive set A }.
This clearly defines N to be a class. To see that it is a set, note that N ⊆ W , (since W is
inductive) and the axiom says that W is a set.
It is now easy to check most of Peano's axioms: (P1) and (P2) hold because N, being the
intersection of all inductive sets, is itself inductive, and (P3) holds because n⁺ = n ∪ {n}
always has n as a member and so is never 0 = ∅.
(P5) If X is a subset of N with the given properties, then of course X ⊆ N. But also X
is inductive, so N ⊆ X.
This leaves (P4), which is surprisingly tricky . . .
D.4 Definition
A class X is transitive if every member is also a subset, that is,
(∀x)(x ∈ X ⇒ x ⊆ X) .
In this context then, each such x is both a member and a subset of X.
The word transitive comes from the alternative (equivalent) definition: a class X is transitive
if
(∀x)(∀y)(x ∈ y and y ∈ X ⇒ x ∈ X) .
At present we are only interested in transitive sets. Later, when we come to consider ordinal
numbers, it will be useful to observe that the class of all ordinal numbers is transitive. But
we are getting ahead of ourselves.
D.5 Proposition
All natural numbers are transitive.
Proof. The empty set is transitive (vacuously). Since we have already proved (P5), it
remains to show that, if n is a transitive natural number, then so is n+ . Suppose then that
n is such a number and that x ∈ n+ ; we want to show that x ⊆ n+ . We have x ∈ n ∪ {n}, so
either x ∈ n or x ∈ {n}. If x ∈ n then x ⊆ n (by the inductive hypothesis) and then, since
n ⊆ n+ we have x ⊆ n+ . On the other hand, if x ∈ {n} then x = n and again x ⊆ n+ .
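Continuing the illustrative frozenset sketch from above, D.5 can be spot-checked mechanically:

```python
# Every member of a von Neumann numeral is also one of its subsets (D.5).
successor = lambda n: n | frozenset([n])
n = frozenset()
for _ in range(6):
    assert all(x <= n for x in n)   # frozenset <= is the subset relation
    n = successor(n)
```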
One would expect there to be a theorem stating that no set can be a member of itself. This
is in fact so, the result following from the Axiom of Foundation, and we will prove it in
Corollary H.3.
It is nice however to know that this esoteric axiom is not needed to prove such a basic fact
for the natural numbers; it follows from what we have just proved:
D.6 Proposition
No natural number is a member of itself (that is, if n ∈ N then n ∉ n).
Proof. The empty set has no members, so it is not a member of itself. It now suffices to
show that if n is a natural number which is not a member of itself, then so is n⁺. Suppose
then that n is such a number, n ∉ n. We want to show that n⁺ ∉ n⁺.
Proof is by contradiction: suppose that n⁺ ∈ n⁺, that is, n⁺ ∈ n ∪ {n}. Then either
n⁺ ∈ n or n⁺ ∈ {n}. But if n⁺ ∈ n then n⁺ ⊆ n, since n is transitive. If, on the other
hand, n⁺ ∈ {n}, then n⁺ = n. So in either case n⁺ ⊆ n. But n ∈ n⁺, so then n ∈ n, a
contradiction.
We can now restate (P5) as the principle of proof by induction, in terms of a set:
(A) If X is a subset of N such that 0 ∈ X and, for all n ∈ N, n ∈ X ⇒ n⁺ ∈ X,
then X = N,
or in terms of an expression to be proved, as was given when we discussed Peano Arithmetic:
(B) If P is an expression and x a variable symbol such that
P(0) and (∀x)(P ⇒ P(x⁺)),
then (∀x)P.
Thus we specify f (0) = b, where b is a member of B, and, for all n, f (n+ ) = h(n, f (n)) where
h is some function, already defined. (Notice that here, h must be a function N × B → B.)
There is no reason why B should not be a class here, and occasionally we will want that.
So we make the definition in this generality: it is no harder.
f(0) = b
and
f(n⁺) = h(n, f(n)) for all n ∈ N.
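It may help to see the scheme computationally first. Here is a small Python sketch of mine (illustrative only; the names by_induction and factorial are not from the text):

```python
# Definition by induction: given b in B and h : N × B -> B, build the unique
# f with f(0) = b and f(n+) = h(n, f(n)).
def by_induction(b, h):
    def f(n):
        value = b
        for k in range(n):             # climb 0, 1, ..., n-1 applying h
            value = h(k, value)
        return value
    return f

factorial = by_induction(1, lambda n, v: (n + 1) * v)   # f(n+) = (n+1)·f(n)
assert [factorial(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]
```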
More generally, there may be other variables involved and we want to define a function of
several variables by induction over one of them — define f(a_1, a_2, . . . , a_k, n) by induction
over n. In this case we can of course consider the k-tuple ⟨a_1, a_2, . . . , a_k⟩ to be a single
variable, and so our general problem boils down to defining a function f : A × N → B by
specifying f (a, 0) for every a ∈ A and specifying f (a, n+ ) in terms of a, n and f (a, n) for
all a ∈ A and n ∈ N. Here A and B can be any classes.
Thus we have the general theorem on definition by induction: given classes A and B and
functions g : A → B and h : A × N × B → B, there is a unique function f : A × N → B
such that
f(a, 0) = g(a) for all a ∈ A (–1A)
f(a, n⁺) = h(a, n, f(a, n)) for all a ∈ A and n ∈ N. (–1B)
Proof. The proof that f is unique is a simple application of proof by induction and will be
left as an exercise. The fact that such an f exists is what will be proved here.
The most powerful method for defining a new function A × N → B that we have at our
disposal so far is to define it as a triple ⟨A × N, G, B⟩, where G is the graph of the function,
so we will do that. Now G must be a subset of (A × N) × B, which is the same as A × N × B.
Translating conditions (–1A) and (–1B) above on the function into corresponding conditions
on the graph, we have
⟨a, 0, g(a)⟩ ∈ G for all a ∈ A (–2A)
⟨a, n, b⟩ ∈ G ⇒ ⟨a, n⁺, h(a, n, b)⟩ ∈ G for all a ∈ A, n ∈ N, b ∈ B (–2B)
and we mustn't forget the extra condition to ensure that this relation is a function:
for all a ∈ A and n ∈ N there is a unique b ∈ B such that ⟨a, n, b⟩ ∈ G. (–2C)
Any subset of A × N × B which satisfies these three conditions is the graph of the required
function; so now all we need do is show that such a subset exists. We employ a similar trick
to the one we used when creating N above.
For the purposes of this proof, let us call any subset of A × N × B which satisfies (–2A)
and (–2B) “good”. We will show that the intersection of all good subsets satisfies all three
conditions and so gives the required function. So let us write G for the intersection of all
good subsets of A × N × B.
Firstly, observe that A × N × B itself is good, so G ⊆ A × N × B.
• Given any a ∈ A, ⟨a, 0, g(a)⟩ is a member of every good set by (–2A) and so is a
member of G. Thus G satisfies (–2A) also.
• Similarly, if ⟨a, n, b⟩ ∈ G then ⟨a, n, b⟩ is in every good set, so ⟨a, n⁺, h(a, n, b)⟩ is in
every good set by (–2B), and hence in G. Thus G satisfies (–2B) also.
And now we can prove (–2C), which we will do by induction over n. That’s not being
circular: we are in the process of showing that definition by induction works; we already
know that proof by induction works.
First we want to show that, given any a ∈ A, there is a unique b ∈ B such that ⟨a, 0, b⟩ ∈ G.
Now we know (by (–2A) for G) that ⟨a, 0, g(a)⟩ ∈ G, so it exists. For uniqueness, suppose
that there is some b ≠ g(a) such that ⟨a, 0, b⟩ ∈ G. Then this triple can simply be removed
from G and it will still be good, so that cannot happen.
Finally, given any a ∈ A and n ∈ N, assume that there is a unique b ∈ B such that
⟨a, n, b⟩ ∈ G; we want to show that there is a unique c ∈ B such that ⟨a, n⁺, c⟩ ∈ G. Such
a c exists, namely c = h(a, n, b). But it is also unique: any triple ⟨a, n⁺, c⟩ with c not of
the form h(a, n, b), for some b with ⟨a, n, b⟩ ∈ G, could be removed from G leaving it still
good; so every such c equals h(a, n, b) for such a b, and our inductive assumption is that
there is only one such b.
Indeed, we can now define (for example) exponentiation in the same way:
x⁰ = 1 for all x ∈ N and x^(y⁺) = (x^y)·x for all x, y ∈ N.
Also, of course, we are now allowed to talk about sets and classes, so we can say much more.
We can also define the order on N as in Section C.1, or else we can ignore addition and
define it directly: take a hint from the fact noted above that n = {0, 1, . . . , n − 1}, and define
a ≤ b to mean a ∈ b ∨ a = b, or perhaps better just a ⊆ b. It is an interesting exercise to
prove the usual order properties from this basis.
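As a quick illustrative check (again with Python frozensets standing in for sets, an experiment of mine rather than part of the development), both suggested definitions agree with the usual order on small numerals:

```python
# On von Neumann numerals: a < b iff a ∈ b, and a ≤ b iff a ⊆ b.
successor = lambda n: n | frozenset([n])
nums = [frozenset()]
for _ in range(5):
    nums.append(successor(nums[-1]))

for i in range(6):
    for j in range(6):
        assert (i < j) == (nums[i] in nums[j])
        assert (i <= j) == (nums[i] <= nums[j])   # <= on frozensets is ⊆
```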
More ambitiously, we can now develop all the well-known algebraic and order properties of
N. Having done this, we can construct in turn The Integers Z, The Rationals Q, The Reals
R and The Complex Numbers C and prove their defining properties.
D.12 Some thoughts about the Natural Numbers and definitions in general
You may find the fact that we have three separate definitions of an ordered pair, and the
fact that we often don’t care much about which one we use, a little unsettling. But this
should not be new: we customarily treat the Natural Numbers the same way. We have a
basic construction of N, à la Von Neumann as in this book, or any other way you please;
then later a construction of the Rationals Q which “contains” N and later still a construction
of the Reals R which “contains” Q and along with it N again. Look at the actual definitions
(they are given in Appendix A) and you will see that these three versions of N are in fact
quite different. However we are happy to swap back and forth between these constructions
according to convenience. Oh, and did I mention N as a subset of the Complex Numbers?
Of the Algebraic Numbers? Of rings of polynomials? We are even happy to call all these
things “the” Natural Numbers. Why is this OK?
What do we want the Natural Numbers to provide for us? At the most basic level, probably
two things. Firstly to act as a way of measuring the size of finite sets — one element exactly
of N corresponding to each finite size — and, one would hope, some useful operations to
help figure out what happens when one adds an extra element to a finite set or merges two
of them. Secondly as something to keep track of repetitive operations or occurrences. The
uses seem to be intertwined; in particular, it does not seem easy to deal with the question
of size without dealing with repetitive operations also — think counting.
Analysis of these ideas leads to the requirements encapsulated in the axioms of PA (or
something very like it). In a richer context such as mathematics in general (MK or ZF for
examples), the first three axioms are enough and the final four then follow.
Now, to return to the original question, anything we construct that has these properties will
do just fine. It is seldom relevant whether we are using the Natural Numbers as constructed
according to the Von Neumann method or in some other way. All these constructions yield
structures which are isomorphic as far as relevant properties go.
[It should be mentioned here that the Von Neumann construction does become relevant
again when we look at transfinite arithmetic, because it extends nicely to a very convenient
construction of the ordinal numbers.]
These same general considerations apply to the construction of an ordered pair. The whole
point of an ordered pair is its property: it is a function of two variables (let us use here the
generic notation p(x, y)) such that
p(x, y) = p(u, v) ⇒ x = u and y = v.
[It should be pointed out that of the three definitions given above, the “ordinary” ordered
pair is not exactly equivalent to the other two (the “primitive” one and as a sequence of
length 2) because it can be used when x and y are classes, whereas for the other two they
must be sets (for the defining property to work).]
So why then do we bother with the construction at all (since we are hardly going to pay
any attention to it)? It is because we need to know that the thing (Natural Numbers, the
ordered pair or whatever) does in fact exist.
These general ideas apply to many of the constructions of mathematics. For another exam-
ple, there are two standard ways of constructing the Real Numbers, one employing Cauchy
sequences and the other Dedekind cuts. These are used to show that the Real Numbers
(with their required properties) actually exist and from then on are pretty well ignored.
E Well-ordering
E.1 Notes
Well-order was defined in Section C.1. (The various terms that were defined in that section
will be used here.) With regard to that definition, note that:—
(i) Properties O3 and O5 together imply the others, so in order to verify that a relation
is a well-order, it is sufficient to verify these two properties.
(ii) When we come to discuss the Axiom of Choice, we will see that that axiom is
equivalent to the Well-Ordering Principle: Every set can be well-ordered.
Proof (i). To prove O1, note that {x} has a least member, so x ≤ x.
To prove O2, note that {x, y, z} has a least member. If this is x then of course x ≤ z. If it
is y, then y ≤ x; since we already know that x ≤ y then x = y (by O3), so x ≤ z. If the
least member is z the proof is similar.
To prove O4, note that {x, y} has a least element, which must be either x or y.
(iii) The proof is by contradiction: suppose that X is a subset of N which has no least
element; we show that X is empty. (One proves by induction that, for every n, no member
of X is less than n: if no member of X were less than n but n itself belonged to X, then n
would be the least member. Hence X has no members at all.)
A simple example of another well-ordered set: adjoin to N one extra element ∞, decreed to
be greater than every natural number n. Again, it is easy to prove that this is well-ordered. (Any nonempty subset X is
either {∞}, in which case ∞ is its least member, or else X ∖ {∞} is a nonempty subset of
N and so has a least member.)
You can also put two copies of N side by side. The best way to look at this is to start with
two disjoint sets which are ordered like N, A = {a_i}_{i∈N} and B = {b_i}_{i∈N}. Now order the
disjoint union A ∪ B by specifying a_i < a_j and b_i < b_j whenever i < j, and a_i < b_j for all
i, j. Picture:
{ a_0 < a_1 < a_2 < . . . < b_0 < b_1 < b_2 < . . . }
Any nonempty subset X either contains at least one a_i, in which case the least such a_i is
the least member of X, or else it is entirely contained in the B part, in which case it has a
least member because B is well-ordered.
We can well-order the set N2 thus:
All the pairs which start with 0 first, ⟨0, 0⟩ < ⟨0, 1⟩ < ⟨0, 2⟩ < . . .
followed by all the pairs which start with 1, ⟨1, 0⟩ < ⟨1, 1⟩ < ⟨1, 2⟩ < . . .
followed by all the pairs which start with 2, ⟨2, 0⟩ < ⟨2, 1⟩ < ⟨2, 2⟩ < . . .
and so on.
This is called the lexicographic order for I suppose obvious reasons; more carefully, it is the
lexicographic order from the left. This order is better defined thus:
⟨a_1, a_2⟩ < ⟨b_1, b_2⟩ if either a_1 < b_1, or else a_1 = b_1 and a_2 < b_2.
To see that this is a well-order, let X be a nonempty subset of N². Then there is at least
one member ⟨a_1, a_2⟩ of X. Let a_1 be in fact the least member of N such that such a pair is in X.
Keep that a_1 fixed. Now there is at least one member a_2 of N such that ⟨a_1, a_2⟩ ∈ X (with
that already fixed a_1). Choose the least such a_2. Then ⟨a_1, a_2⟩ is the least member of X.
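As it happens, Python tuples compare lexicographically from the left, so the least-member recipe of this proof can be watched in action on a finite sample (an illustration of mine only):

```python
# Finding the least member of a finite subset of N × N, exactly as in the proof.
X = {(3, 7), (2, 5), (9, 1), (2, 0)}
a1 = min(a for (a, _) in X)                 # least first coordinate occurring in X
a2 = min(b for (a, b) in X if a == a1)      # then least second coordinate with that a1
assert (a1, a2) == min(X) == (2, 0)         # agrees with Python's built-in order
```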
This construction works for n-tuples (for any fixed n). Define an order on Nⁿ by:
⟨a_1, . . . , a_n⟩ < ⟨b_1, . . . , b_n⟩ if a_i < b_i at the first coordinate i at which the tuples differ.
Recall also the initial segment notation I(a), which was defined in C.2. (It is quite useful in the context of well-orders.)
E.4 Theorem (Strong Induction)
(i) Let W be a well-ordered class and E a subclass of W such that, for every w ∈ W,
I(w) ⊆ E ⇒ w ∈ E. Then E = W.
(ii) Let W be a well-ordered class and P(x) be an expression such that, for every w ∈ W,
(∀x < w)P(x) ⇒ P(w). Then P(w) holds for every w ∈ W.
Discussion
All these methods work not only for N but also for any well-ordered set or indeed any well-
ordered class, however let us discuss them in the context of N here — not forgetting their
wider applicability.
What this means in practice is, to use this method to prove that P(a) is true for all a ∈ N,
it is sufficient to prove, for any a ∈ N,
(∀k < a)P(k) ⇒ P(a). (–1)
In order to compare this with ordinary induction, we could rewrite the ordinary induction
method thus: to prove that P(a) is true for all a ∈ N, it is sufficient to prove that
P(0) (–2A)
and, for all a ≥ 1 in N,
P(a − 1) ⇒ P(a). (–2B)
Proving (–1) for all a includes proving it in the case a = 0. Now, writing out
(–1) in the special case a = 0, we get
(∀k < 0)P(k) ⇒ P(0).
But, since (∀k < 0)P(k) is vacuously true, this is the same as T ⇒ P(0),
which in turn is the same as P (0). In practice, when strong induction is an
appropriate method, this special case very often just comes out in the wash.
E.5 Definition
Let A and B be two partially ordered classes.
(i) A function f : A → B is order-preserving if, for any a1 , a2 ∈ A
a1 ≤ a2 ⇒ f (a1 ) ≤ f (a2 ) .
(ii) A bijection ϕ : A → B is an order isomorphism if, for any a_1, a_2 ∈ A,
a_1 ≤ a_2 ⇔ ϕ(a_1) ≤ ϕ(a_2).
An alternative description of an order isomorphism is as a bijection such that both it and its
inverse are order-preserving. Note that, if both A and B are known to be fully ordered, then
either ⇒ or ⇐ is enough in (ii) above. Putting that another way, if both A and B are known
to be fully ordered, then an order-preserving bijection must be an order-isomorphism.
The next proposition proves some useful tricks for dealing with order-preserving functions.
E.6 Proposition
(i) If X and Y are fully ordered classes and f : X → Y is an order-preserving bijection,
then it is an order-isomorphism. Also f[I(x)] = I(f(x)) for all x ∈ X.
(ii) If W is a well-ordered class and f : W → W is an order-preserving injection, then
x ≤ f(x) for all x ∈ W.
Proof. (i) Suppose that y1 < y2 in Y . Since f is a bijection, there are x1 and x2 in
X such that f (x1 ) = y1 and f (x2 ) = y2 . But then x1 ≥ x2 gives y1 ≥ y2 , a contradiction.
Suppose y ∈ f [I(x)]. Then there is u ∈ I(x) such that y = f (u). Then u < x and, since f is
one-to-one, f (u) < f (x), i.e. y ∈ I(f (x)).
Conversely, suppose that y ∈ I(f (x)). Since f is onto, there is u ∈ X such that y = f (u).
Then u ∈ I(x), for otherwise (X being fully ordered) x ≤ u, in which case we would have
y = f (u) ≥ f (x), contradicting y ∈ I(f (x)).
(ii) Observe first that the conditions on f tell us that, if x < y in W, then f(x) < f(y)
(f is strictly order-preserving). Let E be the class of all members x of W such that x ≤ f(x);
we want to prove that E = W. Suppose not; then W ∖ E is nonempty and so has a least
member, w say. Then f(w) < w, from which we deduce two things, first that f(w) ∈ E and
second that f(f(w)) < f(w). But these contradict one another.
E.8 Theorem Let A and B be well-ordered classes. Then there are initial segments U of A
and V of B, at least one of which is the whole class, and an order-isomorphism U → V.
Proof. For the sake of this proof, let us call a function which is an order-isomorphism from
an initial segment of A to an initial segment of B an initial function. Let U be the subclass of
A consisting of all members a ∈ A such that there exists an initial function Ī(a) → B. If
such a function exists it is unique (by Part (v) of the previous proposition), so we can call
it f_a. It is obvious that the range f_a[Ī(a)] must be Ī(f_a(a)).
(1) U is an initial segment.
To see this, let u ∈ U, a ∈ A and a ≤ u. Then it is easy to check that the restriction f_u|Ī(a)
is an initial function Ī(a) → B, so a ∈ U also.
(2) These functions fit together nicely, in the sense that, for any a_1 ≤ a_2 in A,
f_{a_1} = f_{a_2}|Ī(a_1).
(3) So we can fit them all together to make a single inclusive function f : U → B. The
easiest way to do this is to define f by f (u) = fu (u) for all u ∈ U .
(4) Now we prove the theorem. If either U = A or V = B we are done; we show that
one or other (or both) must be true. For suppose not. Then we have initial segments U of
A and V of B such that U ≠ A and V ≠ B. Then there are members a of A and b of B
such that U = I(a) and V = I(b). Define g : U ∪ {a} → B by
g(x) = f(x) for all x ∈ U, and g(a) = b.
Then g is an initial function Ī(a) → B, so a ∈ U = I(a), a contradiction.
E.9 Corollary
From this theorem we can say that, if A and B are two well-ordered classes, then either A
is order-isomorphic to an initial segment of B or B is order-isomorphic to an initial segment
of A. Moreover, if both hold, then A and B are themselves order-isomorphic.
Next we look at definition by strong induction. Consider, as an example, the number P(n)
of ways of bracketing a product of n factors. Convincing yourself that
P(1) = 1 (–1)
P(n) = P(1)P(n − 1) + P(2)P(n − 2) + · · · + P(n − 1)P(1) for all n > 1 (–2)
is a pleasant exercise. It is clear that these equations are sufficient to define P(n) for all n, along these lines:
P (1) = 1
P (2) = P (1).P (1) = 1.1 = 1
P (3) = P (1).P (2) + P (2).P (1) = 1.1 + 1.1 = 2
P (4) = P (1).P (3) + P (2).P (2) + P (3).P (1) = 1.2 + 1.1 + 2.1 = 5
P (5) = P (1).P (4) + P (2).P (3) + P (3).P (2) + P (4).P (1) = 1.5 + 1.2 + 2.1 + 5.1 = 14
and so on.
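For what it is worth, the recurrence can be transcribed directly into Python and it reproduces the table above (an illustration of mine, not part of the development; note that P(n) really does depend on all of P(1), . . . , P(n − 1)):

```python
# The bracketing numbers, computed exactly as the recurrence reads.
def P(n):
    if n == 1:
        return 1
    return sum(P(r) * P(n - r) for r in range(1, n))   # uses all earlier values

assert [P(n) for n in range(1, 6)] == [1, 1, 2, 5, 14]
```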
This looks very like an inductive definition, except that the “inductive step” (–2) involves
all the previous values of P , not just P (n − 1). It looks like a combination of an inductive
definition with the idea of strong induction. And so it is: it is an example of a definition by
strong induction. To define a function, f say, this way, we define f (n) not just in terms of
f (n − 1) but in terms of the whole function so far.
The general idea is not terribly complicated, but describing it properly (and in full generality)
is rather tricky. The important thing to notice is that, when defining a function f : N → B
in this way, we define f(n) in terms of the entire function so far, that is, in terms of f|I(n)
— recall, this is our notation for f restricted to I(n) = {0, 1, . . . , n − 1}.
Well, the second argument n is not required here, because if we need n it is given by f|I(n)
anyway (n is just the domain of that restriction). So what we require is actually
f(n) = h(f|I(n)).
So this defining function h is a function A×? → B, where the question mark represents
some set or class which we have yet to figure out. The thing that goes in here, f|I(n), is, as
far as we can know when setting up the definition, some unpredictable function I(n) → B,
so maybe the question mark should represent all such functions. Well, yes, but it also has
to work for all possible values of n too, so what we want here is
the class of all functions I(n) → B, for all n ∈ N together.
For the present purposes I will call this set I (that’s a big curly letter I).
To see how this notation includes our bracketing example above, first we have to give P (0) a
value. Since we don’t care much what it is, let’s just arbitrarily set P (0) = 1. (Also A = ∅
since there are no extra parameters.) Then our function h : I → N is
h(θ) = 1 if the domain of θ is I(0) or I(1),
h(θ) = Σ_{r=1}^{n−1} θ(r)θ(n − r) if the domain of θ is I(n) with n > 1.
Now we have enough to write out a proper definition of definition by strong induction, and
to prove that it works. Moreover, all this works for any well-ordered class in place of N
with no extra complication. Since we will eventually need this kind of definition in other
well-ordered cases, we treat the general case here. For the time being, there is no harm in
thinking of W as being the natural numbers, then glancing back at this method when we
come to discuss ordinal numbers later.
Read the statement of this theorem and make sure you understand what it says, even though
the notation is a bit complicated — the theorem is important. On the other hand, it is not
so important to work through the rather horrible little proof. It is included mainly for
completeness and because you might have difficulty finding it anywhere else. As usual, feel
free to examine the proof if you are interested, or suspicious.
E.11 Theorem
Let A and B be classes, W a well-ordered class and h be a function h : A × W × I → B,
where I is the class of all functions A × I(w) → B for all w ∈ W . Then there is a unique
function f : A × W → B such that
f(a, w) = h(a, w, f|A×I(w)) for all a ∈ A and w ∈ W. (–1)
Proof. For the purposes of this proof, define a starter to be a function σ : A × Ī(w) → B,
where w ∈ W, which satisfies (–1) “as far as it goes”, that is,
σ(a, x) = h(a, x, σ|A×I(x)) for all a ∈ A and x ∈ Ī(w).
Suppose then that we have two starters σ, σ′ : A × Ī(w) → B and σ ≠ σ′. Then there is
some a ∈ A and x ∈ Ī(w) such that σ(a, x) ≠ σ′(a, x); using well-order, we may suppose
that x is in fact the least such. Then we have σ(a, z) = σ′(a, z) for all z < x; in other words,
σ|A×I(x) = σ′|A×I(x). But then
σ(a, x) = h(a, x, σ|A×I(x)) = h(a, x, σ′|A×I(x)) = σ′(a, x),
a contradiction. So, for each w ∈ W, there is at most one starter A × Ī(w) → B; when it
exists, call it σ_w, and write S for the class of all w ∈ W such that σ_w exists.
Now suppose that w ∈ S and v ≤ w, and consider the function σ′ = σ_w|A×Ī(v). Noting
that, for any x ≤ v, we have σ′(a, x) = σ_w(a, x), so that σ′|A×I(x) = σ_w|A×I(x), we see that,
for any x ≤ v,
σ′(a, x) = σ_w(a, x) = h(a, x, σ_w|A×I(x)) = h(a, x, σ′|A×I(x)),
and so σ′ is a starter and therefore the starter A × Ī(v) → B, that is σ′ = σ_v. From this
we see several things. Firstly, if w ∈ S and v ≤ w, then v ∈ S also, that is, S is an initial
segment of W. Secondly, if v ≤ w ∈ S,
σ_v = σ_w|A×Ī(v).
The starters therefore agree wherever their domains overlap, so they fit together to give a
single function f : A × S → B, defined by
f(a, w) = σ_w(a, w) for all a ∈ A and w ∈ S.
It remains only to show that S = W. Suppose not. Then W ∖ S ≠ ∅ so, using well-ordering,
let w be the least member of W ∖ S. Then there is a starter σ_x : A × Ī(x) → B for all
x < w, which means that I(w) ⊆ S. Define σ : A × Ī(w) → B by
σ(a, x) = f(a, x) if x < w,
σ(a, x) = h(a, w, f|A×I(w)) if x = w.
But this σ is also a starter. To see this we check (–1). Firstly, for x < w, σ(a, x) = f(a, x),
which tells us that σ|A×I(w) = f|A×I(w), and so
σ(a, w) = h(a, w, f|A×I(w)) = h(a, w, σ|A×I(w)).
Noting also that, for any x < w, σ(a, x) = f(a, x) = σ_x(a, x), so σ|A×I(x) = σ_x|A×I(x), we
have, for any x < w,
σ(a, x) = σ_x(a, x) = h(a, x, σ_x|A×I(x)) = h(a, x, σ|A×I(x)).
This proves that σ is a starter A × Ī(w) → B, contradicting the choice of w and thus proving
that S = W, as required.
F The Axiom of Choice
F.1 The Axiom of Choice (AC) Let A be a set of nonempty sets. Then there is a function
f : A → ∪A such that f(X) ∈ X for every X ∈ A.
Think of f as “choosing” one element of each set in A. It is called a choice function: hence
the name of the axiom.
F.2 Discussion
The Axiom of Choice is not one of the axioms of Morse-Kelley Set Theory (MK) nor is
it one of the axioms of Zermelo-Fraenkel Set Theory (ZF). I will call Set Theory with the
Axiom of Choice added MK+AC.
Note
(1) AC is consistent with Set Theory: if MK is consistent, then so is MK+AC.
(2) AC is independent of Set Theory: if MK is consistent, then so is MK+¬AC.
The Axiom of Choice has a special status: some mathematicians are happy to use it routinely,
others regard it with deep suspicion. If you use an argument which involves AC then it is a
very good idea to say so. Moreover, if it is possible to find an alternative argument which
does not involve AC, this is generally considered to be a good thing.
Given the controversial nature of AC, the first consistency result above is reassuring. Even
if AC does not have the acceptance of the other axioms, at least its use cannot involve any
more risk of a self-contradiction than mathematics without it. The second consistency result
shows that the search for a proof of AC from the other axioms is futile (unless, of course,
you suspect that MK itself is inconsistent, and that this might be a good way to prove it!).
Because of the status of the Axiom of Choice, I will decorate results in these notes which
depend upon it with a superscript, thus: ^AC.
Here are some observations which I find helpful in thinking about AC.
You don’t ever need AC to make a finite number of choices.
If there is some reason to choose a particular member of each nonempty subset, then you
don’t need the AC either. A standard example of this is: given a (finite or infinite) number
of subsets of N, you can choose one from each simply by choosing its least member. What
is happening here is that the general rule, for any nonempty subset X of N,
X ↦ least member of X,
is itself an explicitly definable choice function, so AC is not needed. On the other hand, if
you consider the set of all nonempty subsets of R, you will find that you cannot come up
with a definition of a choice function. (Choosing the midpoint won’t do: plenty of nonempty
sets don’t contain their midpoint. And so on for any other rule you like to come up with.)
Here the AC comes to the rescue — it tells you that such a choice function does in fact
exist, even though you cannot possibly know exactly what function it is.
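To make the contrast concrete, here is the N case as (hedged, illustrative) code; nothing similar can even be written down for arbitrary subsets of R:

```python
# The definable choice rule for nonempty subsets of N: "least member" picks
# the element for us, so no choice principle is needed here.
def choose_from_subset_of_N(X):
    assert X, "choice only makes sense for nonempty sets"
    return min(X)

assert choose_from_subset_of_N({7, 3, 12}) == 3
```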
This is one of the characteristics of AC (and the equivalent principles we will look at shortly):
they usually tell us that something exists, without supplying the least clue as to what that
something actually is.
There is a joke about the Axiom of Choice (the kind only a mathematician would laugh at).
Suppose you have an infinite number of pairs of socks. Then it requires the AC to choose
one sock from every pair. On the other hand, if you have an infinite number of pairs of
shoes, it does not require the AC to choose one from every pair. (Because the two socks in
a pair are identical, so there is no particular rule to make a choice; but for shoes there is a
choice function which doesn’t require AC: pair of shoes 7→ left one.)
(AC2)^AC The cartesian product of a family of nonempty sets is nonempty.
That this is another way of stating the Axiom of Choice becomes more obvious if you look
at the definition of the cartesian product.
(AC3)^AC Let A be a set of pairwise disjoint nonempty sets. Then there is a set C which
contains exactly one element from every member of A.
(The set C here is called a choice set for A.)
The last form is the one which is most easily expressed in the formal language (without need
for preliminary definitions). It can be stated:
(∀u)(u ∈ a ⇒ u ≠ ∅) ∧ (∀u)(∀v)(u ∈ a ∧ v ∈ a ∧ (∃x)(x ∈ u ∧ x ∈ v) ⇒ u = v)
⇒ (∃c)( (∀u)(u ∈ a ⇒ (∃x)(x ∈ c ∧ x ∈ u))
∧ (∀x)(∀y)(x ∈ c ∧ y ∈ c ∧ (∃u)(x ∈ u ∧ y ∈ u ∧ u ∈ a) ⇒ x = y) ).
I will leave the proof that these four formulations of the Axiom of Choice are equivalent as
a fairly straightforward exercise.
It is considerably more difficult to prove the following three equivalent. They are important
and very useful results, especially Zorn’s Lemma and the Well-ordering Principle. I will
give some applications of the use of Zorn’s Lemma in the next section; the Well-ordering
Principle will come into its own when we discuss ordinals and cardinals in the next chapter.
In the meantime, we will prove their equivalence by proving implications round in a circle:
AxC ⇒ Max Principle ⇒ Zorn’s Lemma ⇒ W-O Principle ⇒ AxC
We prove a lemma to help with the first step and this lemma has the hardest proof of the
circle. While it is important to understand Zorn’s Lemma and the Well-ordering Principle
(and I will discuss them in some detail as we proceed) it is not imperative to work through
the rather awful proofs. They are included here for completeness and so you can consult
them if you are interested. These proofs occupy the remainder of this section.
Maximal Principle^AC Every partially ordered set has a maximal chain.
Zorn's Lemma^AC Let P be a nonempty partially ordered set in which every chain has
an upper bound. Then P has a maximal element.
The Well-Ordering Principle^AC For every set there is a well-order.
(This is often expressed: “Every set can be well-ordered”. This is a tad dangerous, if you
interpret it to mean that you can explicitly specify or construct a particular well-order for
the set. For many sets you cannot. The AC, as is its wont, simply says that there is a
well-order there without giving any hint as to what it actually is.)
In order to prove these are equivalent to the Axiom of Choice, we start with another lemma,
also equivalent, but not particularly interesting in its own right. (Its proof does however
contain most of the hard work here.)
F.4 Lemma^AC
Let X be a set and S be a collection of subsets of X (which we will consider to be partially
ordered by inclusion), with the properties:
(1) ∅ ∈ S;
(2) if V ∈ S and U ⊆ V, then U ∈ S also;
(3) if C is a chain in S, then ∪C ∈ S.
Then S has a maximal member.
Proof that AC ⇒ this lemma. By the Axiom of Choice, there is a choice function c :
P(X) ∖ {∅} → X (so that, for every nonempty subset U of X, c(U) ∈ U).
For each member S of S, define a set S̄ by
S̄ = { x ∈ X : S ∪ {x} ∈ S }
Obviously S ⊆ S̄ for all such S.
Suppose that S ≠ S̄. Then S ⊂ S̄, so there is some x ∉ S such that S ∪ {x} ∈ S. This
means that S is not maximal in S. Conversely, suppose that S is not maximal. Then there
is some T ∈ S such that S ⊂ T. Then there is some x ∈ T ∖ S and then S ∪ {x} ⊆ T, so
S ∪ {x} ∈ S by Condition (2). But then x ∈ S̄. Since x ∉ S, S ≠ S̄. We have just proved
that S ≠ S̄ if and only if S is not maximal. In other words, S is maximal if and only if
S = S̄. Our task becomes: show that S contains a member S such that S = S̄.
For each S ∈ S, define
S* = S if S = S̄, and otherwise S* = S ∪ {c(S̄ ∖ S)}
(so that S* ∈ S always, S ⊆ S*, S* ∖ S has at most one member, and S* = S exactly when
S is maximal). Call a subcollection T of S a tower if:
(i) ∅ ∈ T
(ii) T ∈ T ⇒ T* ∈ T
(iii) if C is a chain in T, then ∪C ∈ T.
There is at least one tower, namely S itself. Also the intersection of any collection of towers
is a tower. Therefore there is a “smallest” tower, T0 say — define T0 to be the intersection
of all towers and then it is a tower which is a subset of any other tower.
We will show that T0 is a chain; this will prove the lemma, for then, writing W for ∪T0, we
have W ∈ T0 by (iii). Then from (ii), W* = W. Also, T0 is a chain in S, so W ∈ S. To this
end, we will say that a member C of T0 is central if it is comparable with every member of
T0 , that is
for all T ∈ T0 , either T ⊆ C or C ⊆ T.
It is now enough to show that every member of T0 is central and, to do this, it is sufficient
to show that the set of all central members of T0 is a tower; this is now our task.
Firstly, ∅ is obviously central. Now let K be a chain of central elements of T0; we show
that ∪K is central. Let T be any member of T0; we want to show that T and ∪K are
comparable. If every member of K is a subset of T then so is its union. Otherwise there is
some K ∈ K such that K ⊈ T. But then, since K is central, T ⊆ K and then T ⊆ ∪K.
It remains to show that if C is central then so is C*. From now on then suppose that C
is a fixed central member of T0. We consider the collection U of all sets U in T0 such that
either U ⊆ C or C* ⊆ U. We will show that U is a tower. This completes the proof, for then
U = T0 (by definition of T0) and so, for every member T of T0, we have T ⊆ C or C* ⊆ T;
since C ⊆ C*, this makes C* central.
It remains to show that U is a tower. Clearly ∅ ∈ U. Now let K be a chain in U. Then
either every member of K is a subset of C, in which case so is ∪K, or else K has a member,
K say, which is not a subset of C, and then C* ⊆ K so C* ⊆ ∪K. Finally, let U ∈ U; we
check that U* ∈ U. If C* ⊆ U then C* ⊆ U ⊆ U*. Otherwise U ⊆ C; since C is central it
is comparable with U* ∈ T0, so either U* ⊆ C (and we are done) or C ⊆ U*. In the latter
case U ⊆ C ⊆ U*; since U* ∖ U has at most one member, either C = U, in which case
U* = C* and so C* ⊆ U*, or C = U*, in which case U* ⊆ C. In every case U* ∈ U.
Proof that Lemma F.4 ⇒ the Maximal Principle. Let X be a partially ordered set
and let S be the collection of all chains in X. Now ∅ is a chain in X, so ∅ ∈ S. If V is a
chain in X and U ⊆ V then U is a chain also. Now suppose that C is a chain in S (under
⊆ as usual); we prove that ∪C ∈ S, that is, its union is a chain in X.
Let a, b ∈ ∪C. Then there are sets A, B ∈ C such that a ∈ A and b ∈ B. Since C is a chain,
A and B are comparable; suppose without loss of generality that A ⊆ B. Then a, b ∈ B.
Since B is a chain, a and b are comparable. So ∪C is a chain, and Lemma F.4 now gives a
maximal member of S, that is, a maximal chain in X.
Proof that the Maximal Principle ⇒ Zorn's Lemma. By the Maximal Principle,
P has a maximal chain, C say. By hypothesis, C has an upper bound, m say. We show
that m is a maximal member of P . Suppose not. Then there is p ∈ P such that m < p. It
follows that C ∪ {p} is also a chain, contradicting the maximality of C.
Proof that Zorn’s Lemma ⇒ the Well-Ordering Principle. (I will only give an out-
line of the proof here. The main structure of the proof is a classic example of how Zorn’s
Lemma is used. The details are straightforward but a bit tedious, and can safely be left as
an exercise.)
Let X be any set; we show that it can be well-ordered. Let W be the collection of all
well-ordered subsets of X, that is, all pairs of the form ⟨W, ≤⟩, where W is a subset of X
and ≤ is a well-order of W. We now order W in the obvious (I think!) way:
⟨W_1, ≤_1⟩ ≼ ⟨W_2, ≤_2⟩ if ⟨W_1, ≤_1⟩ is an initial segment of ⟨W_2, ≤_2⟩,
that is, if
W_1 ⊆ W_2,
≤_1 is the restriction of ≤_2 to W_1
and
u ≤_2 v and v ∈ W_1 ⇒ u ∈ W_1.
We now use Zorn’s Lemma to show that W has a maximal member. First note that W is
nonempty since it contains h∅, ∅i. Now we suppose that C is a chain in W and construct
an upper bound for C. This upper bound is the union of all the sets in C with the “obvious”
order. More precisely, it is the pair hM, ≤0 i defined by
[
M = { W : hW, ≤i ∈ C } (the union of all the sets “in” the chain)
The “straightforward but tedious” details that need checking here are that ⟨M, ≤′⟩ is indeed
a member of W, that is, M is a subset of X and ≤′ is a well-order of M, and that it is an
upper bound for C.
Having established that W does have a maximal member, ⟨W, ≤⟩ say, it is enough to show
that then W = X. Suppose not. Then there is some x ∈ X such that x ∉ W. But we
can extend the well-order of W to W ∪ {x} by making x a new greatest element — more
precisely, define an order ≤′ on W ∪ {x} by
u ≤′ v if either u, v ∈ W and u ≤ v, or else v = x.
It is now easy to check that ≤′ well-orders W ∪ {x} and that ⟨W, ≤⟩ is an initial segment of
⟨W ∪ {x}, ≤′⟩, contradicting the maximality of ⟨W, ≤⟩.
Proof that the Well-Ordering Principle ⇒ the Axiom of Choice. Let X be any set.
We show that there is a choice function c : P(X) ∖ {∅} → X. There is a well-order, ≤ say,
of X. Now define the function c by defining its graph:
G = { ⟨U, u⟩ : U is a nonempty subset of X and u is the ≤-least member of U }.
G. ZORN’S LEMMA 271
G Zorn’s Lemma
In this section I give several examples of applications of Zorn’s Lemma, together with a bit
of advice on how to use it.
Even though we are dealing with the infinite-dimensional case, the definitions are necessarily
couched in terms of finite sums because in a general vector space there is no definition of the
sum of an infinite number of vectors. (Where infinite sums occur in analysis some notion
of convergence is required and that means that the vector spaces involved must have some
sort of additional topological structure: you would be working in a Hilbert or Banach space
or something of that nature.) But one needs the notion of a basis in a general vector space,
and this notion is important even in analysis.
G.2 Definitions
(1) A subset A of a vector space V is linearly independent if, for any finite subset
a1 , a2 , . . . , an of distinct members of A and scalars x1 , x2 , . . . , xn such that
x1 a1 + x2 a2 + · · · + xn an = 0
it follows that x1 = x2 = · · · = xn = 0
(2) A member v of the vector space V is a linear combination of the subset A of V if
there is a finite subset a_1, a_2, . . . , a_n of A and scalars x_1, x_2, . . . , x_n such that v = x_1 a_1 + x_2 a_2 +
· · · + x_n a_n.
(3) Let A be a subset of a vector space V. The set of all linear combinations of A is
denoted sp(A) and called the subspace of V spanned by A. If sp(A) = V we say that A
spans V.
(4) A subset A of a vector space V is a basis for V if it is both linearly independent and
spans V.
Now, a couple of lemmas and a corollary with easy proofs which you should supply: they
are almost the same as in the finite-dimensional case.
G.3 Lemma
The subset sp(A) of the vector space V, as defined in 3 above, is indeed a subspace.
G.4 Lemma
If A is a linearly independent subset of the vector space V and b ∈ V ∖ sp(A), then A ∪ {b}
is linearly independent also.
G.5 Corollary
A maximal linearly independent subset of a vector space V is a basis for V.
Here “maximal” of course means maximal with respect to the order ⊆, in other words we
are looking at a linearly independent subset which is not a proper subset of any other one.
A result like this clearly suggests the use of Zorn’s Lemma. It tells us that, in order to
show that a basis exists, it is enough to show that the collection of all linearly independent
subsets (ordered by ⊆) has a maximal member. To use Zorn’s Lemma, we need to prove:
• The collection of all linearly independent subsets of the space is nonempty (that is,
that there exists a linearly independent subset of the space), and
• Every chain of linearly independent subsets is bounded above (and to do this, the
union of the chain is likely to be the bound we need).
A couple of comments before starting. Zorn’s Lemma applies to any partial order, and here
we are going to use ⊆ as this order. In this context a chain is a set, C say, of sets such that
any pair of them is comparable, in the sense that, given any C1 and C2 ∈ C , either C1 ⊆ C2
or C2 ⊆ C1 . Also, an upper bound for such a chain is a set, U say, which contains every
member of C, in the sense that C ⊆ U for every C ∈ C. A likely candidate is the union of
the chain, ∪C, and this works for this theorem.
G.6 Theorem
Every vector space has a basis.
Proof. Let L be the collection of all linearly independent subsets of V, partially ordered by
⊆. By G.5 and Zorn's Lemma, it is enough to check that L is nonempty (it is: ∅ ∈ L) and
that every chain C in L is bounded above; as usual we try its union ∪C, and what must be
checked is that ∪C is linearly independent. So suppose that a_1, a_2, . . . , a_n are distinct
members of ∪C and x_1, x_2, . . . , x_n are scalars such that
x_1 a_1 + x_2 a_2 + · · · + x_n a_n = 0.
Then there are members C1 , C2 , . . . , Cn of C such that a1 ∈ C1 , a2 ∈ C2 , . . . , an ∈ Cn .
Since all the Ci are comparable (and there are only a finite number of them), there is one
that contains all the others. In other words, there is some k such that C_i ⊆ C_k for all i. But
then a_1, a_2, . . . , a_n all lie in C_k. But C_k is linearly independent, so x_1 = x_2 = · · · = x_n = 0, as
required. Thus ∪C ∈ L, it is an upper bound for C, and Zorn's Lemma applies.
G. ZORN’S LEMMA 273
G.7 Note
The same argument can be used to prove the stronger and more useful theorem: Let A
and C be subsets of a vector space V such that A is linearly independent, C spans V and
A ⊆ C. Then there is a basis B for V such that A ⊆ B ⊆ C.
To prove this use the proof above with L the collection of all linearly independent subsets
L such that A ⊆ L ⊆ C.
The general recipe for using Zorn's Lemma runs like this:
1. Decide on a suitable partially ordered set P.
2. Check that a maximal member of P would yield the thing you are trying to prove exists.
3. Prove that P is nonempty.
4. Prove that every chain is bounded above. Note that this upper bound must be a
member of P. This is usually the main part of the proof.
Here (1) and (2) are the parts where you need to use some creativity, to work out what P
and its order should be. The usual starting point is (2): you have something you need to
prove exists, and you figure out that a maximal sort of something will do the trick.
P will usually (but not always) be a collection of sets, subsets of something, often with
some extra structure. The partial order we place on these is usually ⊆, but if the sets have
extra structure, we will probably also need the structures to agree. The upper bound in (4)
is usually the union of the chain; if we need to create a structure on it, we will generally
need the agreement mentioned above.
In the vector space basis example above, we did not need to create any extra structure on
our subsets: the only structure needed is that of the vector space itself, and that is there
already.
In the next example we do need to create some structure, the full order we are trying to
show exists.
So what we will be working with are fully-ordered subsets: not just plain subsets, but subsets
together with a full order. It is best to think of these things as pairs ⟨X, ≤⟩, where X is a
subset of the given set and ≤ is a full order on X.
Now we will have not only all these subsets with the full orders on them, but also the set
of all such pairs upon which we want to define the partial order to apply Zorn’s Lemma to.
I will use a nice exotic sort of order symbol for this: if ⟨X_1, ≤_1⟩ and ⟨X_2, ≤_2⟩ are two such
fully-ordered sets, we define
⟨X_1, ≤_1⟩ ≼ ⟨X_2, ≤_2⟩
. . . well, how are we to define it? It is not good enough to define this to mean simply that
X_1 ⊆ X_2 (try it; you'll see that you can't prove that the union of a chain is fully ordered).
What we need is for the two orders ≤_1 and ≤_2 to match up as well. So we define this to
mean that X_1 ⊆ X_2 and ≤_2 extends ≤_1, that is, that ≤_1 and ≤_2 agree on X_1.
(If we view the orders as subsets of X1 × X1 and X2 × X2 , then to say that ≤2 extends ≤1
is the same as to say that ≤1 is a subset of ≤2 , or if you like ≤1 ⊆ ≤2 — but this notation
might be a bit hard to handle.)
G.10 Theorem
For every set there is a full order.
Proof. Let A be the given set. Let P be the set of all pairs ⟨X, ≤⟩, where X ⊆ A and ≤
is a full order on X. We partially order this set by the order ≼ defined above.
Step 1 This relation ≼ is a partial order on the class of fully ordered subsets of A.
Proof This is very straightforward, however here is some of it.
Let ⟨X, ≤⟩ be any fully ordered set. Then X ⊆ X and ≤ ⊆ ≤, so ⟨X, ≤⟩ ≼ ⟨X, ≤⟩.
Let ⟨X_1, ≤_1⟩ and ⟨X_2, ≤_2⟩ be two fully ordered sets such that ⟨X_1, ≤_1⟩ ≼ ⟨X_2, ≤_2⟩ and
⟨X_2, ≤_2⟩ ≼ ⟨X_1, ≤_1⟩. Then X_1 ⊆ X_2 and X_2 ⊆ X_1, so X_1 = X_2, and the same goes for
the two orders.
Proof of transitivity is much the same.
Step 2 If ⟨X, ≤⟩ ∈ P is maximal, then X = A (so that ≤ is then a full order on the whole
of A). Proof Suppose instead that X ≠ A. Then there is some a ∈ A such that a ∉ X. So we have
a set X ∪ {a} which we may order by extending the order on X so that a is a new greatest
element (that is, x ≤′ y in X ∪ {a} if either x ≤ y in X or else y = a). Now clearly this new
set is a fully ordered subset of A and ⟨X, ≤⟩ ≺ ⟨X ∪ {a}, ≤′⟩, so ⟨X, ≤⟩ is not maximal.
So now we want to show that such a maximal member exists. To apply Zorn’s Lemma we
will want to construct an upper bound for a chain of ordered sets. This is in fact fairly
straightforward.
Step 4 Let C be a chain of ordered sets, that is, a collection of ordered sets in which every
pair of members is comparable under our relation ≼. Define a new pair ⟨M, ≤_m⟩ by
M = ∪{ X : ⟨X, ≤⟩ ∈ C },
≤_m = ∪{ ≤ : ⟨X, ≤⟩ ∈ C }.
Then ⟨M, ≤_m⟩ is indeed a fully ordered set as claimed and is an upper bound for C under
the order ≼.
Proof First we must check that ⟨M, ≤_m⟩ is a member of P. That M ⊆ A is obvious. To
see that ≤_m is a full order is straightforward, as follows.
Reflexivity: Let x ∈ M. Then there is some ⟨X, ≤⟩ ∈ C such that x ∈ X. Then x ≤ x and
so x ≤_m x.
Antisymmetry: Let x, y ∈ M be such that x ≤_m y and y ≤_m x. Then there are ⟨X_1, ≤_1⟩ ∈ C
and ⟨X_2, ≤_2⟩ ∈ C such that x, y ∈ X_1, x ≤_1 y, x, y ∈ X_2 and y ≤_2 x. Now, since C is a
chain, we have either
⟨X_1, ≤_1⟩ ≼ ⟨X_2, ≤_2⟩ or ⟨X_2, ≤_2⟩ ≼ ⟨X_1, ≤_1⟩;
without loss of generality we may assume the former. But then x ≤_2 y also and, since this
is an order, x = y.
Transitivity: The proof is much the same.
Trichotomy: Let x, y ∈ M. Then there is some ⟨X_1, ≤_1⟩ ∈ C such that x ∈ X_1 and some
⟨X_2, ≤_2⟩ ∈ C such that y ∈ X_2. Then x and y are both members of the “bigger” of these two
ordered subsets, so are comparable in that subset, and therefore comparable in ⟨M, ≤_m⟩.
Now for the proof that ⟨M, ≤_m⟩ is an upper bound. Referring to the construction of M and
≤_m at the beginning of this step, this is easy. Let ⟨X, ≤⟩ be any member of C. Then, by
construction, X ⊆ M. Also ≤ is a subset of ≤_m, and this means that ≤_m extends ≤.
Step 5 That’s it, the proof is finished except perhaps to write something like
The next theorem strengthens this: any partial order on a set can be extended to a full
order on that set.
Following the previous example, one might think that a good approach would be to look
at fully ordered subsets of A on which the full order extends the given partial order. This
idea turns out to lead to a horrible mess (try it!) so we use a different approach. We order
partial orders themselves (!). Apart from the rather weird fact that we are going to define
an order relation on a set of orders (so we need to be very careful with notation), this works
nicely. In fact we find that we don’t run into the “creating structure” kind of complications
that we did with the last example.
To use Zorn’s Lemma, we order the partial orders on A which extend the given one and
show that a maximal partial order must be a full order.
How to order the orders? The easy way is to think of an order as a subset of A × A — the
set of all pairs (x, y) such that x ≤ y. Then we simply order them by set inclusion.
Note that, to say that the order ≤1 is “less than or equal to” the order ≤2 , which we could
denote by ≤1 ⊆ ≤2 if that notation is not too confusing, means that
x ≤1 y ⇒ x ≤2 y for all x, y ∈ A
G.12 Lemma
If ≤ is a partial order on A which is not a full order, then it is not maximal, that is, there
is some partial order ≤′ on A such that ≤ ⊂ ≤′.
Proof. Suppose then that ≤ is partial but not full. Then there must be incomparable
a, b ∈ A. The trick is that we can now order those two in any way we like, as long as we
take care of the further relationships that follow from that. (If we put a ≤0 b, then anything
less than a must become less than b and anything greater than b must become greater than
a. It turns out that that is all we need.)
Define ≤′ thus:
x ≤′ y if either x ≤ y, or x ≤ a and b ≤ y.
(Note that this includes the case a ≤′ b.) It is obvious that ≤ is a proper subset of ≤′. It is
necessary to prove that ≤0 is indeed a partial order. This is not hard, but requires checking
a few cases; I’ll leave that as an exercise, so as not to interrupt the flow.
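To see the extension step of the lemma behave as claimed, here is a tiny, purely illustrative check in Python (the helper extend and the example order are mine, not the text's):

```python
# An order represented as its set of pairs (x, y), meaning x ≤ y.
def extend(order, a, b):
    # decree a ≤' b, together with everything that then follows:
    # x ≤' y whenever x ≤ a and b ≤ y
    return order | {(x, y) for (x, p) in order if p == a
                           for (q, y) in order if q == b}

leq = {(1, 1), (2, 2), (3, 3), (1, 2)}   # partial order on {1,2,3}: just 1 ≤ 2
leq2 = extend(leq, 2, 3)                 # 2 and 3 were incomparable; put 2 ≤' 3
assert (2, 3) in leq2 and (1, 3) in leq2 # 1 ≤' 3 comes along automatically
```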
G. ZORN’S LEMMA 277
Main proof
Now we look at the collection P of all partial orders on A which contain the given order ≤,
that is, all partial orders ≤′ such that ≤ ⊆ ≤′.
This is nonempty, since ≤ is a member of P.
So, to apply Zorn’s Lemma, we show that any chain in P is bounded above. As is often the
case with Zorn’s Lemma, we take the union of the chain. That is clearly an upper bound;
so it remains only to prove that it is a partial order.
So let C be a chain in P. What is that? It is a collection of partial orders on A with the
property that, for any two of them, one must be contained in the other. Checking that the
union of such a chain is again a partial order on A containing ≤ is straightforward; it is then
an upper bound for C, so Zorn's Lemma provides a maximal member of P, which by Lemma
G.12 must be a full order.
Proof. For the sake of this proof we will say that a set G of expressions is “good” if it has
these three properties.
G ∪ {P} ⊢ Q and G ∪ {P} ⊢ ¬Q
but then
G ∪ {P} ⊢ Q ∧ ¬Q and so G ⊢ P ⇒ Q ∧ ¬Q
(This last step is by the Deduction Theorem. Note that P is a sentence, so there is no
trouble with UG.) Now, by SL, G ⊢ ¬P, contradicting our choice of P. This contradiction
proves that H is consistent. We have just proved that H is good. But G ⊂ H, since P ∈ H
but P ∉ G. Therefore G is not maximal.
There is another way of looking at this statement, so little different that we will call it . . .
So let us suppose that a is any set, and prove that (∀x ∈ a)P (x) ⇒ P (a). This we do by
proving that ¬P (a) ⇒ ¬(∀x ∈ a)P (x), which is the same as ¬P (a) ⇒ (∃x ∈ a)¬P (x).
So now we assume ¬P (a), that is, that there is some bad sequence starting with {a}; our
aim is to prove that there is some x ∈ a and a bad sequence starting with {x}.
Let m0 , m1 , m2 , . . . be the assumed bad sequence starting with {a}. We have
(i) m0 = {a} .
(ii) For every i ∈ N and x ∈ m_i, x ∩ m_{i+1} ≠ ∅.
In particular, since a ∩ m_1 ≠ ∅, there is an element b ∈ a ∩ m_1. But then the sequence
{b}, m_2, m_3, . . . is bad (b ∈ m_1, so b ∩ m_2 ≠ ∅, and the later conditions are inherited from
the original sequence), and b ∈ a.
Proof that Version 2 ⇒ Version 3. We will prove that Version 3 false ⇒ Version 2
false. Our assumption is then that there is a class a such that a ≠ ∅ and (∀x ∈ a)(x ∩ a ≠ ∅).
We choose b ∈ a and use it to construct a bad sequence. Define it inductively:
m_0 = {b}
m_{i+1} = ∪{ x ∩ a : x ∈ m_i }.
Proof that Version 3 ⇒ Version 1. We will prove that Version 1 false ⇒ Version 3 false.
Our assumption then is that Version 1 fails, in other words, that there is a class b such that
(∀x)(x ∩ b = ∅ ⇒ x ∉ b)
H.2 Remarks
Version 1 is of course the version given at the start of this chapter. If you compare it with
the statements of Strong Induction in E.4(ii), you will see that Versions 1 and 1A say that
you can do something very much like strong induction on the class of all sets, using the
relation ∈. Hence the subheading “∈-induction” above.
Version 2 has several immediate and useful corollaries:
H.3 Corollaries
(i) There is no sequence a0, a1, a2, . . . of sets (indexed by N) such that a0 ∋ a1 ∋ a2 ∋ · · ·
This probably accounts for the name of the axiom. To see that this follows from Version 2,
define m0 , m1 , m2 , . . . by mi = {ai } for all i ∈ N. Another way of looking at this result is
that, if you start with some set, then take a member of it, then a member of that and so
on, eventually you must come to the empty set. More precisely,
(i′) Any sequence a0 ∋ a1 ∋ a2 ∋ · · · of sets must terminate after a finite number of steps
with the empty set.
Putting this another way again, “everything, at the bottom, is empty”. If you are familiar
with Henri le Chat Noir you will recognise this as just his kind of theorem.
And following from this we have
(ii) No set is a member of itself.
Because, if a ∈ a, that would generate an infinite sequence a ∋ a ∋ a ∋ · · ·.
We will need the rather technical-looking Version 3 when we come to discuss ordinal numbers.
In the section on Ordinal numbers, we will prove an interesting result which follows from
the Axiom of Foundation. For the time being we prove a couple of facts about transitivity:
H.4 Proposition
For every set there is a unique smallest transitive set which contains it (as a subset). In more
detail: let A be any set. Then there is a transitive set T such that A ⊆ T and, moreover, if
T′ is any other transitive set such that A ⊆ T′ then T ⊆ T′. This set T is uniquely defined
by A.
Proof. Define
B0 = A
and, for each i ∈ N, Bi+1 = Bi ∪ ⋃Bi.
(This means that x ∈ Bi+1 if and only if either x ∈ Bi or there is some y such that
x ∈ y ∈ Bi.) Now define T = ⋃i∈N Bi. It is easy to check that T is transitive and that
A ⊆ T. It remains to check the minimality condition. So, given transitive T′ such that
A ⊆ T′, we must prove that T ⊆ T′; it is enough to show that every Bi ⊆ T′, and this we do
by induction over i. Firstly (zerothly?) B0 ⊆ T′ because B0 = A. Now suppose (i-thly?)
that Bi ⊆ T′; we show that Bi+1 ⊆ T′. Let x ∈ Bi+1. Then either x ∈ Bi, in which case
x ∈ T′ trivially, or else there is some y such that x ∈ y ∈ Bi, in which case y ∈ T′ and then
x ∈ T′ by transitivity.
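For the concretely minded, here is a small Python sketch of the same construction restricted to hereditarily finite sets, modelled as nested frozensets (the code and its names are my own illustration, not part of the formal development):

    def transitive_closure(A):
        # B_0 = A; B_{i+1} = B_i together with all members of members of B_i.
        B = frozenset(A)
        while True:
            B_next = B | frozenset(x for y in B for x in y)
            if B_next == B:        # for hereditarily finite sets the B_i stabilise
                return B
            B = B_next

    e = frozenset()                # 0 = {}
    one = frozenset({e})           # 1 = {0}
    two = frozenset({e, one})      # 2 = {0, 1}
    T = transitive_closure(frozenset({two}))   # start from A = {2}
    assert T == frozenset({e, one, two})       # T = {0, 1, 2}, the smallest transitive superset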
H.5 Remark
When dealing with classes of sets with some kind of structure, it is normally of interest to
consider functions between those sets which preserve the structure in some obvious way, for
example, continuous functions between topological spaces, linear functions between vector
spaces, homomorphisms between groups, order-preserving functions between ordered sets
and so on. Now a plain old set has a built-in structure: since all its members are also sets,
it has a defined relation ∈ between its members. It seems natural to look at functions which
preserve this structure. Such a function f : A → B would have the property that, for all
x, y ∈ A, x ∈ y ⇒ f (x) ∈ f (y). In this context, the following proposition says that two
transitive sets are isomorphic if and only if they are the same.
H.6 Proposition
Let A and B be transitive sets and suppose there exists a bijection f : A → B such that,
for all x, y ∈ A, x ∈ y ⇔ f(x) ∈ f(y). Then A = B.
(a) Suppose that f is a surjection A → B, where A is a set and B is any class; then B is
also a set.
Or, an obviously equivalent form:
(b) Suppose that f is a function A → B, where A is a set and B is any class; then f[A]
is also a set.
It is not obvious that this is what the axiom, as stated in Section A.1, gives us, so let
us deconstruct the axiom now. (Why not state the axiom in the simple form just stated?
Because the axioms, as given at the beginning of this Chapter, are all in formal form; the
simple version above would involve the definition of a function and thus the definitions
of unordered and ordered pairs; this could be done, but the axiom would then become
distressingly long.)
This is another section where the proofs have been included for completeness. Feel free to
skip these proofs unless you are interested in how they go. We return to the main story
with H.10 below.
First, let us change some of the variable letters to more helpful ones:
(∀A)(∀B)(∀F) ( SET(A)
∧ (∀a)( a ∈ A ⇒ (∃b)(∃f)(b ∈ B ∧ f ∈ F ∧ a ∈ f ∧ b ∈ f) )
∧ (∀a)(∀f1)(∀f2)(∀b1)(∀b2)( a ∈ A ∧ f1 ∈ F ∧ f2 ∈ F ∧ b1 ∈ B ∧ b2 ∈ B
∧ a ∈ f1 ∧ b1 ∈ f1 ∧ a ∈ f2 ∧ b2 ∈ f2 ⇒ b1 = b2 )
∧ (∀b)( b ∈ B ⇒ (∃a)(∃f)(a ∈ A ∧ f ∈ F ∧ a ∈ f ∧ b ∈ f) )
⇒ SET(B) )
Suppose then that A is a set and B and F classes satisfying Conditions (1), (2) and (3). We
must prove, using the standard form, that B is a set.
Let G be the class of all ordered pairs ha, bi such that there exists some f ∈ F such that
a, b ∈ f :
G = { ⟨a, b⟩ : (∃f ∈ F)(a ∈ f ∧ b ∈ f) }.
Then Condition (1) says that every a ∈ A is related by G to at least one b ∈ B, and
Condition (2) says that it is related to at most one; so, between them they say that G is
the graph of a function, g say, A → B. Finally, Condition (3) says that g is onto B.
Condition (3) says that g is onto B.
H.10 Corollaries
(i) Let X be a class, Y a set and f : X → Y an injection. Then X is a set too.
(ii) Let X be a proper class, Y a class and f : X → Y an injection. Then Y is a proper
class too.
R = { f (x) : x ∈ X } .
This is a subclass of Y and so is a set too. But, since f is an injection, its inverse exists,
f⁻¹ : R → X, and this is a surjection (onto), so X is also a set.
(ii) This is just the contrapositive of (i).
will be a set. Here are the details, in case they aren’t obvious already. We want to define
these sets as a sequence, S0 , S1 , S2 , . . . say, indexed by N, that is, a function S with domain
N. This is easy: we define the function S : N → Sets by induction, specifying S0 = ∅
and Sn+ = {Sn}. The collection of all these sets is of course the range of the function S,
and the Axiom of Formation tells us that this is a set.
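A tiny Python sketch of this sequence, using frozensets (again just an illustration of mine):

    def S(n):
        # S_0 = {} and S_{n+1} = {S_n}
        s = frozenset()
        for _ in range(n):
            s = frozenset({s})
        return s

    first_five = [S(n) for n in range(5)]
    assert len(set(first_five)) == 5   # the terms are pairwise distinct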
Vector spaces
Let X be any set. We make a real vector space VX . It is the set of all functions f : X → R
with the operations defined pointwise (the obvious way): (f + g)(x) = f(x) + g(x) and
(λf)(x) = λ f(x) for all x ∈ X and λ ∈ R.
Topological spaces
Let X be any set. Make a topological space TX out of X by simply defining all of its subsets
to be open (this is known as a discrete space).
It is easy to see that TX is then a topological space, as claimed, and trivial to see that
the function X 7→ TX is injective. By the same argument as before, then, the class of all
topological spaces is a proper one.
We get more from this example: since discrete spaces are metrisable, it also proves that the
class of all metric spaces is proper and, since discrete spaces are Hausdorff spaces, the class
of all Hausdorff spaces is also proper. And so on.
Groups
Use the same construction as we used for vector spaces above, but ignore the multiplication-
by-scalars part; use only the additive group part. This gives us an injection from the class
of all sets to the class of all abelian groups. So this proves that the classes of all abelian
groups and of all groups are proper.
7. TRANSFINITE ARITHMETIC
In this chapter we will investigate cardinality, that is the way in which the sizes of infinite
sets can be compared. You are probably familiar with the idea that the set R of real
numbers is uncountable and therefore “larger” than the countable set N of natural numbers.
Cardinality is the systematic investigation of “sizes” of all infinite sets (finite ones too, but
that’s nothing new).
One useful outcome of this is the notion of cardinal numbers. We know that, for each finite
set, there is a unique natural number which represents its size (“The set X has n members;
its size is n.”). The cardinal numbers extend the natural numbers so that every set, finite
or infinite, has a unique cardinal number representing its size.
We will also look at ordinal numbers, which also extend the natural numbers past the finite,
but do so in a way which takes into account ordering, so for instance, just like the natural
numbers, every ordinal number will have a successor. We will see that the ordinal numbers
are well-ordered and that every well-ordered set is order-isomorphic to exactly one ordinal
number. Ordinal numbers will also allow us to do transfinite induction, a process which is
occasionally very useful.
While cardinal numbers sound more generally useful than ordinal numbers, it turns out
that the discussion of cardinality and cardinal numbers is based on well-orders and ordinal
numbers. Consequently, we investigate ordinal numbers first.
The words “cardinal” and “ordinal” have been stolen from the ordinary English lan-
guage and given slightly different and more specialised meanings. (Mathematicians
have a bad habit of doing that, to the confusion of non-specialists.) In ordinary En-
glish usage, the cardinal numbers are ‘zero’, ‘one’, ‘two’, . . . and are used to represent
the size of sets (“I watched four movies last night”); the ordinal numbers are ‘first’,
‘second’, ‘third’, . . . and are used to represent the order in which things occur (“But
my sister only watched the second and third one”). So the mathematical meaning is
not so very different from the everyday one in this case.
A Ordinal numbers
A.1 Discussion
Suppose we want to extend the natural numbers to include infinite numbers in some way.
The first step might be to append a first infinite number at the far end, something like this:
0, 1, 2, 3, ... , ω
(This notation is meant to indicate that all the natural numbers come first, then ω at the
end.) This first infinite number is called ω here because that’s what it is always called in
this context.
Well, that’s nice, but what is this new thing ω going to be exactly? It has to be a set of
some sort, because everything’s a set in MK. Here we can use what I like to think of as the
“Von Neumann insight”: every natural number is just the set of its predecessors, so let us
define ω to be the set of its predecessors too:
ω = {0 , 1 , 2 , 3 , . . . } .
So we have actually defined ω to be N: we have two official names for this set. Mathemati-
cians normally use one or other of these symbols depending upon the way we are looking at
the set. If we are thinking of it as the first infinite ordinal number, we usually call it ω.
Back when we were defining the natural numbers, we defined the successor a+ = a ∪ {a} of
any set. At that point we were only interested in the successors of natural numbers, but the
definition works for any set at all. So now we can extend our new set of numbers by adding
in successors to get
0, 1, 2, 3, . . . , ω, ω+, ω++, ω+++, . . .
We will see that we will be able to define addition and multiplication of these new ordinal
numbers (in much the same way as we did for natural numbers) and that we can use
the notation
0, 1, 2, 3, ... , ω, ω + 1, ω + 2, ω + 3, ...
Now of course we can pop a new number at the end of all this (I will call it ω2 because
that’s what it is usually called)
0 , 1 , 2 , 3 , . . . , ω , ω + 1 , ω + 2 , ω + 3 , . . . , ω2
Here, as before, we just define ω2 to be the set of all its predecessors. But why stop
there? We can follow ω2 by a lot of successors and eventually append ω3 at the far end and
keep going like that. The resulting set is
0, 1, 2, 3, ...
ω, ω + 1, ω + 2, ω + 3, ...
ω2 , ω2 + 1 , ω2 + 2 , ω2 + 3 , . . .
ω3 , ω3 + 1 , ω3 + 2 , ω3 + 3 , . . .
⋮
Then we can put a new “number” at the end of all this — we’ll call it ωω or just ω² —
and start all over again with things like ω² + 1, ω² + ω2 etc. etc.
At this stage you may be wondering if this is all an amusing but pointless mathematical
game. To which I reply: no, stay with it; as this chapter progresses I hope to demonstrate
how very useful these ordinal numbers are.
What we need at this stage is a workable definition of the class of ordinal numbers (we will
see that it is a proper class). Looking at the examples above, we see that, just like the
natural numbers, this class has the properties
(1) 0 is an ordinal number, and
(2) if α is an ordinal number, then so is α+.
Recalling that we defined N as the “smallest” class which had these properties (in other
words, the “closure” of these properties), it seems that what we need now is a third property
to ensure that the class contains numbers like ω, ω2, . . . , ω² and all the other ones of this
nature that will turn up. Here it is useful to notice a couple of things: firstly, that every one
of these numbers is the union of all its predecessors and secondly that so is every natural
number. It turns out (we will prove this, it’s not hard) that the same is true for all the
other numbers we have just created. So it seems we should add a third closure property to
the two above, that the union of every initial subset of this class is a member of the class.
There is one rather awkward thing about this property (in this form at least): to talk about
“before”, “initial sets” and so on one needs a definition of order. Now, while it is pretty
obvious how we want to order this set, to try to define an order on it while we are in the
process of creating it could be quite messy.
However, this class has a very pretty property that gets us out of this difficulty: the union
of any subset whatsoever is also the union of some initial subset. (Again, we will prove this,
and again it will not be hard to do so.) So our third property will be:
(3) If X is any set of ordinal numbers, then ⋃X is also an ordinal number.
Now we define the class Ord of all ordinal numbers to be the “smallest” class with this
property. It is tempting to define it as
Ord = ⋂{ X : X is a class having properties (1), (2) and (3) above }   WRONG!
but this won’t do: all those X’s are proper classes, so they cannot be members of anything:
we are actually taking the intersection of the empty set here, which is not at all what we
have in mind. But we already found a way of avoiding this problem when we defined N, so
we’ll use it again.
An ordinal number is a set which is a member of every class which has
properties (1), (2) and (3) above.
And now I give a second definition of ordinal numbers which looks rather different.
(i′) x ∈ α ⇒ x ⊂ α.
We will use this second definition as the basic one, work from that and eventually prove the
two definitions equivalent. Why work from the second, less obvious one? For two reasons.
Firstly, the basic properties of ordinal numbers are more easily derived from the second
definition. Secondly, and perhaps more importantly, the first definition does not make sense
in the language of ZF (because there is no way of expressing the idea of “for every class” in
that language) whereas the second definition does: because ZF is used quite extensively, it
is good to maintain compatibility where we can.
A.5 Proposition
This relation is in fact a well-order on α.
Proof. As was observed in the note 6.E.1, it is enough to prove that this relation is
antisymmetric and that every nonempty subset of α has a least element.
For antisymmetry, suppose that x ≤ y and y ≤ x in α. Then x ⊆ y and y ⊆ x so x = y.
Now suppose that E is a nonempty subset of α. By Version 3 of the Axiom of Foundation,
there is z ∈ E such that z ∩ E = ∅. We will show that z is the least member of E, as
required. We already know that z ∈ E, so it is enough to show that it is a lower bound.
Suppose then that x is any member of E. Then x ∉ z. But then, by A.3(ii) above, either
z ∈ x or z = x, that is z ≤ x, as required.
A.6 Remark
Recall the definition of an initial segment in an ordered set: I(a) = { x : x < a }. Ordinal
numbers have the interesting property that the initial segment of a member is equal to that
member, that is
If α is an ordinal and x ∈ α, then I(x) = x.
To see this, simply observe that z ∈ x ⇔ z < x ⇔ z ∈ I(x).
We will be using the idea of order-isomorphism quite often in connection with ordinal num-
bers. It will be convenient to have a symbol for this relation. If A and B are ordered sets,
we will write A ≃ B to mean that A is order-isomorphic to B.
A.7 Proposition
If two ordinal numbers are order-isomorphic, then they are equal.
Proof. By symmetry, it is enough to prove that, if α and β are ordinal numbers and α ≃ β,
then α ⊆ β. Suppose not. Let ϕ : α → β be the assumed order-isomorphism. Then there is
some x ∈ α such that x ≠ ϕ(x) and thus the subset E = { x : x ≠ ϕ(x) } of α is nonempty.
Since α is well-ordered, the set E has a least element, e say. Then x = ϕ(x) for all x < e in
α. This means that Iα(e) = ϕ[Iα(e)]. On the other hand, since ϕ is an order-isomorphism,
ϕ[Iα(e)] = Iβ(ϕ(e)) (by Proposition 6.E.6). Therefore Iα(e) = Iβ(ϕ(e)) and so, by the
Remark A.6 above, e = ϕ(e). But this contradicts the choice of e.
A.8 Proposition
Every initial segment of an ordinal number is an ordinal number.
Therefore every member of an ordinal number is an ordinal number.
Proof. Let I be an initial segment of the ordinal number α. First we check that I is
transitive. Let x ∈ I. Then x ∈ α so x ⊆ α. But now, for any z ∈ x, we have z < x and so
z ∈ I, proving that x ⊆ I and so that I is transitive. Also, if x, y ∈ I, then x, y ∈ α and so
one of x ∈ y, x = y, or y ∈ x must hold.
A.9 Theorem
(i) The class Ord is transitive (that is, if α ∈ Ord, then α ⊆ Ord).
A.10 Remarks
Part (iii) of this theorem tells us that Ord is a well-ordered class.
This theorem also tells us that Ord has all the defining properties of an ordinal number —
except for one . . .
A.11 Proposition
The class Ord of all ordinals is a proper class; that is, it is not a set.
However, every proper initial segment of Ord is a set (in fact an ordinal number).
Proof. If Ord was a set, Parts (i) and (ii) of the previous theorem would mean that it was
an ordinal number itself. Then we would have Ord ∈ Ord, which is impossible.
Now let J be any proper initial segment of Ord. Since it is proper, Ord \ J is nonempty.
Let α be the least member of Ord \ J. Then J = I(α) = α.
The next theorem will turn out to be much more important than it looks at first sight.
A.12 Theorem
Every well-ordered set is order-isomorphic to exactly one ordinal number.
Proof. Since ordinal numbers are well-ordered sets, by the Trichotomy Theorem (6.E.8)
ordinal numbers are of three kinds,
Looking first at (i), note that the initial segments of W form a set; by A.7 then, the ordinals
of Type (i) form a set. Therefore there exists an ordinal α which is either of Type (ii) or
of Type (iii). If it is of Type (ii) we are done. If it is of Type (iii), note that ξ is then an
ordinal and I(ξ) = ξ, so W is order-isomorphic to the ordinal ξ.
Uniqueness is easy. If W is order-isomorphic to ordinals α and β, then α and β are order-
isomorphic to each other and so, by A.7, they are equal.
A.13 Definition
If A is a well-ordered set, then the unique ordinal number to which it is order-isomorphic is
called the order-type of A.
A.14 Proposition
(i) The set N of natural numbers is an ordinal number.
(ii) Every natural number is an ordinal number. In other words, N ⊂ Ord.
Remark This proves one of the results mentioned in the preamble to this chapter. I have
already mentioned that, in the context of ordinal numbers, the set N is usually denoted ω.
Proof. (i) To prove that ω is transitive, we assume that m ∈ n ∈ ω and prove that
then m ∈ ω. This we do by induction over n. If n = 0, the result is vacuously true. Now
assume that the result is true for n and prove it for n+. We are assuming that m ∈ n ∪ {n};
but then either m ∈ n, in which case m ∈ N by the inductive hypothesis, or else m = n ∈ N
in which case it is true trivially.
Now we must prove that, for any m, n ∈ ω, one of m ∈ n, m = n or n ∈ m holds. Let us write
m ∼ n to mean that one of m ∈ n, m = n or n ∈ m holds. We now wish to prove that
m ∼ n for all m, n ∈ ω. The proof goes in several steps.
A.15 Proposition
(i) 0 is an ordinal number.
(ii) If α is an ordinal, then so is α+ .
(vi) α+ is the successor of α, in the sense that it is the “next” ordinal: there is no ordinal
ξ such that α < ξ < α+ .
(vii) A corollary of this is that, if α and β are ordinals and α < β, then α+ ≤ β.
(viii) Every set A of ordinals has a supremum, that is, an ordinal σ such that
for all α ∈ A, α ≤ σ
and, if ξ is an ordinal such that, for all α ∈ A, α ≤ ξ, then σ ≤ ξ.
It is given by σ = ⋃A. In the case that A has a maximum element, σ is this maximum
element.
Proof. (ii) and (iii) follow immediately from the definition of an ordinal number.
(iv) α ∈ α+ .
(v) If α+ = β + then α ∈ β + = β ∪ {β}, so α ∈ β or α = β, that is α ≤ β. But, in the
same way, β ≤ α.
(vi) (and (vii)) If such a ξ existed, we would have α ∈ ξ ∈ α∪{α}, so that either α ∈ ξ ∈ α
or α ∈ ξ = α, both of which contradict the axiom of foundation.
(viii) Let σ = ⋃A. Then σ is an ordinal (by (iii)) and, for any α ∈ A, α ⊆ σ, that is,
α ≤ σ.
Now suppose that β is some ordinal < σ, that is, β ⊂ ⋃A. Then there is an α ∈ A such
that α ⊈ β. Then β < α, so β is not an upper bound.
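Two quick illustrations of (viii), which are mine rather than the text’s: if A = {0, 1, 5} then σ = ⋃A = {0, 1, 2, 3, 4} = 5, the maximum element of A; if A = ω = {0, 1, 2, . . .} then σ = ⋃ω = ω, which is an upper bound for A but not a member of it, so a supremum need not be a maximum.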
Examples of successor ordinals are: all the nonzero natural numbers (n = (n − 1)+) and the
ordinals ω+, ω++, ω+++ and so on.
The following useful characterisations of limit ordinals should be proved as an exercise:
A.17 Proposition
(i) An ordinal number λ is a limit ordinal if and only if
(ii) An ordinal number λ is a limit ordinal if and only if λ = sup{ξ : ξ < λ}.
This last probably accounts for its name.
We now develop a few techniques for dealing with ordinal numbers.
(ii) Let f be a function X → Ord, where X is any set. Then µ = sup{ f (x) : x ∈ X } if
and only if the following two things are true:
(a) f (ξ) ≤ µ for all ξ ∈ X .
(b) If α < µ then there is some ξ ∈ X such that α < f (ξ) .
Proof. (i) If sup X = 0 then, since X ≠ ∅, we have X = {0} and the result is trivially
true.
If sup X is a successor ordinal, sup X = α+ say, then α < α+ so, by (ib) above, there is
some ξ ∈ X such that α < ξ. But then α+ = ξ.
(i) follows immediately from 6.C.3(i).
Proof. Let θ be the least ordinal such that f (θ) > α (there is at least one such, namely
f (α), since f is strictly increasing). Now θ is not zero, by assumption, nor is it a nonzero
limit ordinal because, if it were, we would have
α < f (θ) = sup{ f (ξ) : ξ < θ } ,
in which case α would not be an upper bound for the set { f (ξ) : ξ < θ } so there would be
ξ < θ such that α < f (ξ), contradicting the choice of θ.
So θ is a successor ordinal. Set θ = µ+ . Then µ < θ, so, by the choice of θ, µ ≤ α, and we
already know that α < θ = µ+ .
Notice what this says. Because f has values on either side of α, it has a value as close to
α as you could expect to get. (Because the ordinals are a sort of discrete set — lots of little
jumps — you couldn’t expect always to get a value µ such that f(µ) = α.)
Proof. Write µ = sup X. Then, for every x ∈ X, x ≤ µ and so, since f is increasing,
f (x) ≤ f (µ). Therefore sup f [X] ≤ f (µ). It remains to prove that µ ≤ sup f [X].
If X has a maximum member, then it must be µ. We then have µ ∈ X so that f (µ) ∈ f [X]
and then f (µ) ≤ sup f [X].
But sup{ f (ξ) : ξ < µ } = f (µ) (since f is normal) and the result is proved.
Then X = Ord.
Proof. (ii) is just the statement that Ord is well-ordered (see 6.E.4).
B.3 Note
In (ordinary) inductive proofs it is very often the case that the zero case is treated separately
from the nonzero limit ordinals. In that case, the structure of the proof is
Proof (ii). This is just the statement that α is well-ordered (see 6.E.4).
(i) If X ≠ α then there is a member of α, which is therefore an ordinal, which is not a
member of X. Consider the first such. If it is a non-limit ordinal, call it ξ+, and it violates
the first condition; if it is a limit ordinal, call it λ, and it violates the second.
(ii) (Strong induction) Let α be an ordinal and P (ξ) be a predicate defined on α with
the property:
For any ordinal ξ < α, (∀θ < ξ)(P (θ)) ⇒ P (ξ).
Then (∀ξ < α)(P (ξ)).
Ordinary or weak induction is very like ordinary induction over the natural numbers, with
just a slight addition to make it “get past” limit ordinals. To define a function f using this
kind of induction, separate definitions are given of
f(0)
f(ξ+) in terms of ξ and f(ξ)
and, for any non-zero limit ordinal λ, f(λ) in terms of λ and f↾I(λ).
Those were the cases where there are no parameters. Where there are parameters we are
defining a function f : A × Ord → B, where A and B are classes. The weak version looks
like this: separate definitions are given of
f(a, 0)
f(a, ξ+) in terms of a, ξ and f(a, ξ)
and, for any nonzero limit ordinal λ, f(a, λ) in terms of a, λ and f↾(A × I(λ)).
Where B is Ord, in the majority of cases (I would say), the definition in the limit ordinal
case is
f(θ, λ) = ⋃{ f(θ, ξ) : ξ < λ } , which is the same as f(θ, λ) = sup{ f(θ, ξ) : ξ < λ } .
The definitions of addition, multiplication and so on given in the next section are simple
examples of the application of this kind of inductive definition. It will be seen that these def-
initions are really straightforward extensions of the definitions for natural numbers already
considered, extended to the ordinals.
We have just considered three kinds of inductive definition over Ord. We already know that
the first kind is valid. It is a straightforward exercise to use this to prove that the other two
kinds are valid also.
We conclude this section with an interesting example of a definition by induction over
ordinals.
Proof. (i) Proof is by induction over α. First notice that V0 = ∅ is transitive (vacu-
ously).
Now suppose that Vα is transitive and that x ∈ y ∈ Vα+1 . Then x ∈ y ⊆ Vα so x ∈ Vα .
But Vα is transitive (as just assumed) and so x ⊆ Vα . Then x ∈ Vα+1 .
Finally suppose that λ is a nonzero limit ordinal, that all the Vξ for ξ < λ are transitive
and that x ∈ y ∈ Vλ . Then there is some ξ < λ such that y ∈ Vξ . But then x ∈ y ∈ Vξ and
Vξ is transitive (as just assumed); so x ∈ Vξ and then x ∈ Vλ .
(ii) We show first, by induction over β, that if α ≤ β then Vα ⊆ Vβ . If α = β the result
is trivial. Now suppose that α ≤ β and Vα ⊆ Vβ ; we prove that Vα ⊆ Vβ+1 . Let x ∈ Vα .
Then x ∈ Vβ ∈ Vβ+1 , which is transitive, so x ∈ Vβ+1 . Finally, suppose that α < λ and λ
is a nonzero limit ordinal. Then α + 1 < λ also and Vα+1 ⊆ Vλ by the definition of Vλ. But
Vα ⊆ Vα+1 so Vα ⊆ Vα+1 ⊆ Vλ.
To see that, if α < β then Vα ⊂ Vβ , it is now enough to show that Vα ⊂ Vα+1 for all α.
But this is easy: we know that Vα ⊆ Vα+1 and the sets cannot be equal because Vα ∈ Vα+1
but Vα ∉ Vα.
(iii) Let us say “X is in the hierarchy” to mean “there exists some ordinal α such that X ∈
Vα ”. We want to prove that every set is in the hierarchy. Using the Axiom of Foundation,
Version 1A, it is enough to prove that, for all sets X, if every member of X is in the hierarchy
then so is X.
Suppose then that X is such a set, that is, for all x ∈ X there is an ordinal ξ such that
x ∈ Vξ. Then, for all such x, write ξ(x) for the least ordinal such that x ∈ Vξ(x). Now, by
the Axiom of Formation, the class { ξ(x) : x ∈ X } is in fact a set and so, by Proposition
A.15(viii), there is an ordinal α such that ξ(x) ≤ α for all x ∈ X. Then, for all x ∈ X,
x ∈ Vξ(x) ⊆ Vα; thus X ⊆ Vα, so X ∈ Vα+1 and X is in the hierarchy.
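If it helps to see a few levels of the hierarchy computed, here is a small Python sketch (mine; it assumes the definition V0 = ∅, Vα+1 = P(Vα) used in the proof above, and of course only computes finite levels):

    from itertools import chain, combinations

    def powerset(s):
        # all subsets of s, each as a frozenset
        return frozenset(frozenset(c)
                         for c in chain.from_iterable(
                             combinations(tuple(s), r) for r in range(len(s) + 1)))

    V = [frozenset()]              # V_0 = {}
    for _ in range(4):             # V_{n+1} = P(V_n)
        V.append(powerset(V[-1]))

    assert [len(v) for v in V] == [0, 1, 2, 4, 16]
    assert all(V[i] <= V[i + 1] for i in range(4))   # V_0 ⊆ V_1 ⊆ V_2 ⊆ ...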
(3) The function f has the minsup property if, for every nonzero limit ordinal λ,
(4) The function f has the intermediate value or IV property if, for all ordinals α1, α2
and γ such that α1 < α2 and
f(α1) ≤ γ < f(α2),
there is an ordinal β such that
α1 ≤ β < α2 (–1)
and f(β) ≤ γ < f(β+). (–2)
C.2 Proposition
Suppose that f has the minsup property. Then, for any nonzero limit ordinal λ, there is an
ordinal α0 < λ such that
Proof of (–1). From the minsup property, there is some ordinal α0 < λ such that
suppose that α0 is the least such. Then, for α0 ≤ α < λ, by (–3), we have (α, λ) ⊆ (α0 , λ)
so
sup{ f(ξ) : α < ξ < λ } ≤ sup{ f(ξ) : α0 < ξ < λ } = f(λ).
C.3 Proposition
(i) Continuity ⇒ the minsup property.
(ii) The minsup property ⇒ the IV property.
Let λ be a limit ordinal and θ any ordinal such that θ < f (λ). Then there is β < λ such
that f maps (β, λ] into (θ, f (λ)].
Given any α such that β ≤ α < λ, we have (α, λ] ⊆ (β, λ], so f maps (α, λ] into (θ, f (λ)] ,
that is,
α < ξ ≤ λ ⇒ θ < f (ξ) ≤ f (λ) .
On the other hand, given any ν < f (λ), there is µ < λ such that f maps (µ, λ] into (ν, f (λ)] .
But, since λ is a limit ordinal, µ < λ and α < λ, we have µ+ < λ and α+ < λ, so, writing
ξ = max{ µ+ , α+ } , we have ξ ∈ (α, λ) and ξ ∈ (µ, λ] so f (ξ) > ν . This shows that
(ii) Suppose that f has the minsup property and that α1 , α2 and γ are ordinals such
that α1 < α2 and
f (α1 ) ≤ γ < f (α2 ) ;
we want to show that there is an ordinal β such that (–1) and (–2) above hold.
Let µ be the least ordinal such that µ > α1 and f (µ) > γ. There is at least one such ordinal,
namely α2 , so µ exists and µ ≤ α2 .
Suppose that µ is a limit ordinal. Since µ > α1 it is a nonzero limit ordinal, so, by the
previous proposition, there is an ordinal α0 < µ such that
and γ < f (µ), so there is an ordinal ξ such that α0 < ξ < µ and f (ξ) > γ, contradicting the
choice of µ.
So µ is not a limit ordinal: there is β such that µ = β + . Then β ≥ α1 since µ > α1 and
β < µ ≤ α2 .
Also, by choice of µ, either β ≤ α1 or f (β) ≤ γ. But if β ≤ α1 then, since β + > α1 ,
we have β = α1 and so f (β) = f (α1 ) ≤ γ, so f (β) ≤ γ in either case. But we also have
γ < f (µ) = f (β + ), and we are done.
C.4 Proposition
(i) The IV property ⇏ the minsup property.
(ii) The IV property ⇏ continuity.
(iii) Continuity ⇏ the sup property.
(iv) The minsup property ⇏ continuity.
C.5 Proposition
Let f be a weakly order-preserving function. Then all four properties above are equivalent
for f .
Now suppose that θ < f(λ): we want to show that sup{ f(ξ) : ξ < λ } ≥ θ. If f(0) > θ then of
course sup{ f(ξ) : ξ < λ } ≥ f(0) > θ and we are done. Otherwise f(0) ≤ θ < f(λ) and there is β
such that β < λ and f(β) ≤ θ < f(β+). But, since λ is a limit ordinal, β+ < λ, so (from
f(β+) > θ) we have sup{ f(ξ) : ξ < λ } > θ.
Proof. The proof of (i) is given; the proof of (ii) is the same.
First let us note that, if f has the minsup property, then by Proposition C.2, for any nonzero
limit ordinal λ, there exists an ordinal α < λ such that
But the same is obviously true if f has the sup property (put α = 0) and continuity implies
the minsup condition, so (–1) holds in any case.
Suppose then that f is not weakly increasing. Then there are ordinals α < β such that
f (α) > f (β). Let β be the least ordinal for which there exists α < β such that f (α) > f (β).
Then β must be a limit ordinal, for otherwise there would be ξ such that β = ξ + and then
f (ξ) ≤ f (β) < f (α) with ξ ≥ α and then ξ > α, contradicting the choice of β.
So β is a nonzero limit ordinal (since β > α) and so, as noted above, there is θ < β such
that
sup f (ξ) = f (β) . (–2)
θ<ξ<β
We may assume without loss of generality that θ ≥ α. Then α < θ+ < β and f (θ+ ) ≥
f (α) > f (β) contradicting (–2).
C.7 Note
The proposition above does not hold if the IV property is used instead of the others. To see
this, consider the function f defined by
f(ξ) = ξ if ξ < ω, and f(ξ) = 0 otherwise.
C.8 Lemma
If a function f : Ord → Ord is weakly order-preserving and continuous, then, for any
nonempty set X of ordinals
so (–1) follows.
Now suppose that µ is a nonzero limit ordinal.
For any ξ ∈ X, ξ ≤ µ so, since f is order-preserving, f (ξ) ≤ f (µ); it follows that
To see that equality holds, we suppose that sup{ f (ξ) : ξ ∈ X } < f (µ) and deduce a
contradiction. Put
By continuity, there is an α < sup X such that f maps (α, sup X] into (θ, f (sup X)], that is
θ < f (ξ) ≤ f (sup X) for all ξ such that α < ξ ≤ sup X . (–3)
But, since α < sup X, there is some ξ ∈ X such that α < ξ. Then θ ≥ f(ξ) by (–2), while
f(ξ) > θ by (–3): the contradiction.
α + 0 = α
α + β+ = (α + β)+
and, for any nonzero limit ordinal λ, α + λ = sup{α + ξ : ξ < λ}.
Notice that the first two lines of this definition are the same as the definition of addition on
the natural numbers. Consequently our new definition extends the old one.
The third line allows the definition to extend to all ordinal numbers, and does so in the
usual way.
We can now check some of the notational results I used in the introductory remarks to this
chapter. For example
ω + 1 = ω + 0+ = (ω + 0)+ = ω+,
ω + 2 = ω + 1+ = (ω + 1)+ = ω++ and so on.
also
ω + ω = sup{ω + ξ : ξ < ω}
= sup{ω + 0, ω + 1, ω + 2, . . .}
= ⋃{ω, ω+, ω++, . . .}
= {0, 1, 2, . . . , ω, ω+, ω++, . . .}
= (what I then called) ω2
ω3 = ω2 + ω
and so on.
By an easy induction,
0+α = α for every ordinal α .
Warning! All this is as expected so far, but note well that addition is not commutative.
For example 1 + ω = ω. To see this,
1 + ω = sup{1 + n : n < ω} = sup{1, 2, 3, . . .} = ω.
This example also shows that addition is not cancellative (1 + ω = 0 + ω but 1 ≠ 0). There are
a number of other standard algebraic properties that we might expect of addition that fail
for ordinal addition, so one should tread with care when doing ordinal arithmetic.
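If you want to experiment with this non-commutativity, here is a small Python sketch of ordinal addition restricted to ordinals below ω², writing ω·a + b as the pair (a, b). (The sketch and the name oadd are mine; it is only an illustration of the rules just given.)

    def oadd(x, y):
        # (a, b) stands for the ordinal ω·a + b, with a and b natural numbers.
        a, b = x
        c, d = y
        if c == 0:
            return (a, b + d)      # ω·a + b + d: only the finite parts add
        return (a + c, d)          # b + ω·c = ω·c, so b is absorbed

    omega, one = (1, 0), (0, 1)
    assert oadd(one, omega) == omega       # 1 + ω = ω
    assert oadd(omega, one) == (1, 1)      # ω + 1 = ω+ ≠ ω
    assert oadd(omega, omega) == (2, 0)    # ω + ω = ω2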
(It is not strictly order-preserving in its first argument: there are examples of ordinals ξ, η
and α such that ξ < η but ξ + α = η + α.)
(ii) Addition is normal in its second argument, and so . . .
(ii0 ) For any ordinal α and any nonempty set X of ordinals,
sup{ α + ξ : ξ ∈ X } = α + sup X .
(iii) If α and β are ordinals and α ≤ β, then there is a unique ordinal γ such that
α + γ = β.
(This rule does not hold on the other side: there may not be any γ such that γ + α = β; for
example, there is no γ such that γ + 1 = ω.)
Proof. (i) That addition is strictly order-preserving in its second argument follows from
Proposition C.6.
To see that it is weakly order-preserving in its first argument, suppose that α, ξ and η are
ordinals and that ξ ≤ η; we want to show that ξ + α ≤ η + α. This we do by induction over
α. If α = 0 this is just ξ ≤ η, which is given. If α = θ+, then
ξ + α = ξ + θ+ = (ξ + θ)+ ≤ (η + θ)+ = η + θ+ = η + α.
Now suppose that α is a nonzero limit ordinal. Then, for any θ < α we have
ξ + θ ≤ η + θ ≤ sup{η + ζ : ζ < α} = η + α .
α0 = 0
αβ+ = αβ + α
and, for any nonzero limit ordinal λ, αλ = sup{αξ : ξ < λ}.
Once again we observe that this definition extends the usual definition of multiplication
from the natural numbers to all the ordinals. Also that the third line extends the definition
for nonzero limit ordinals in the usual way.
We see that
α1 = α0+ = α0 + α = 0 + α = α,
α2 = α1+ = α1 + α = α + α,
α3 = α2+ = α2 + α = α + α + α,
and so on, as you would expect. But once again there are a few surprises, for instance
2ω = sup{2ξ : ξ < ω} = sup{0, 2, 4, . . .} = ω ,
so 2ω ≠ ω2 and multiplication is not commutative. This accounts for the slightly unusual
notation ω2, ω3 etc. used earlier in this chapter.
if ξ ≤ η then ξα ≤ ηα.
(ii′) For any ordinal α and any nonempty set X of ordinals, sup{ αξ : ξ ∈ X } = α sup X.
Proof. (i) That multiplication is strictly order-preserving in its second argument follows
from Proposition C.6.
Now we show that multiplication is weakly order-preserving on the left, ξ ≤ η ⇒ ξβ ≤ ηβ,
by induction over β. If β = θ+, then
ξβ = ξθ+ = ξθ + ξ ≤ ηθ + η = ηθ+ = ηβ
(we are using here the fact that addition is order-preserving on both sides, proved above).
Suppose β is a nonzero limit ordinal. For any θ < β we have ξθ ≤ ηθ ≤ sup{ηθ : θ < β} = ηβ.
As this is true for any θ < β, we have ξβ = sup{ξθ : θ < β} ≤ ηβ.
(ii) By its definition.
(iv) Multiplication is not commutative: there are ordinals α and β such that αβ ≠ βα.
(v) Multiplication distributes on the left but not on the right: for any ordinal numbers
α, β and γ,
α(β + γ) = αβ + αγ ,
but in general (α + β)γ ≠ αγ + βγ.
(vi) Multiplication is cancellative on the left but not on the right: for any ordinal numbers
α, ξ and η,
α ≠ 0 and αξ = αη ⇒ ξ = η ,
but there are ordinals α ≠ 0 and ξ ≠ η such that ξα = ηα.
(vii) The division algorithm for ordinals. Let α and β be ordinals with α 6= 0. Then there
are unique ordinals δ and ρ such that
β = αδ + ρ and ρ < α.
I think that it is quite surprising that the well-known division algorithm for natural numbers
holds without any change at all for ordinals.
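Two quick instances (my own, to fix ideas): with β = ω2 + 3 and α = ω we get δ = 2 and ρ = 3, since β = ω·2 + 3 and 3 < ω; with β = ω and α = 2 we get δ = ω and ρ = 0, since 2ω = ω.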
If γ = θ+, we have
α(β + θ+) = α((β + θ)+) = α(β + θ) + α = (αβ + αθ) + α = αβ + (αθ + α) = αβ + αθ+.
To see that multiplication does not distribute on the other side, observe that (1 + 1)ω =
2ω = ω but 1ω + 1ω = ω + ω = ω2 and we already know that these are not equal.
(vi) We prove this result in the form: α ≠ 0 ∧ ξ ≠ η ⇒ αξ ≠ αη.
Then either ξ < η or ξ > η, in which case αξ < αη or αξ > αη, since multiplication is
strictly order-preserving on that side.
To see that multiplication is not cancellative on the other side, observe that 2ω = 1ω.
(i) We prove that (αβ)γ = α(βγ) by induction over γ.
In the case γ = 0 we have (αβ)0 = 0 = α0 = α(β0).
If γ = θ+, then (αβ)θ+ = (αβ)θ + αβ = α(βθ) + αβ = α(βθ + β) = α(βθ+), using (v) and
the inductive hypothesis.
Finally suppose that γ is a nonzero limit ordinal. Then, using normality and the inductive
hypothesis,
(αβ)γ = sup{ (αβ)ξ : ξ < γ } = sup{ α(βξ) : ξ < γ } = α sup{ βξ : ξ < γ } = α(βγ).
(vii) We have α ≥ 1 so β < β + 1 ≤ α(β + 1), so there is an ordinal θ such that αθ > β;
let θ in fact be the least such ordinal.
Now θ cannot be a limit ordinal because then we would have β < αθ = sup{ αξ : ξ < θ }
and (by the definition of a sup) there would be ξ < θ such that αξ > β, contradicting the
choice of θ. Therefore θ is a successor: set θ = δ + .
α^0 = 1
α^(β+) = α^β · α
and, for any nonzero limit ordinal λ, α^λ = sup{ α^ξ : ξ < λ }.
Once again we observe that this definition extends the usual definition of exponentiation
from the natural numbers to all the ordinals. Also, the third line extends the definition for
nonzero limit ordinals in the usual way.
We see that
α^1 = α^(0+) = α^0 · α = 1α = α,
α^2 = α^(1+) = α^1 · α = αα,
α^3 = α^(2+) = α^2 · α = ααα,
and so on, as you would expect. But once again there are a few surprises, for instance
2^ω = sup{ 2^ξ : ξ < ω } = ω ,
and
if α ≤ β then α^γ ≤ β^γ (always).
(iv′) For any ordinal α and nonempty set X of ordinals, and provided that it is not the
case that α = 0 and X contains both 0 and some nonzero member, sup{ α^ξ : ξ ∈ X } = α^(sup X).
To see that it is weakly order-preserving in its first argument, we assume that α ≤ β and
show that α^γ ≤ β^γ, as usual by induction over γ. Firstly α^0 = 1 = β^0. Next
α^(θ+) = α^θ · α ≤ β^θ · β = β^(θ+). And last, if γ is a nonzero limit ordinal,
α^γ = sup{ α^ξ : ξ < γ } ≤ sup{ β^ξ : ξ < γ } = β^γ.
Most of the arithmetic properties are left as an amusing exercise. We will state and prove
just one, which is interesting enough to rate a theorem all to itself.
Proof. Because of the special case of the statement when β = 0, we may assume now that
β ≠ 0. For the purposes of this proof, let us say that an expression α^(εn) γn + α^(εn−1) γn−1 +
· · · + α^(ε1) γ1 is in “standard form” if n is a natural number, ε1, ε2, . . . , εn are ordinals such
that ε1 < ε2 < · · · < εn, and γ1, γ2, . . . , γn are ordinals such that 0 < γi < α for each
i = 1, 2, . . . , n. Our aim is to prove that β can be represented uniquely by such a standard
form. This is proved by induction over β.
Since α ≥ 2, by D.8(ii) above, ξ ↦ α^ξ is strictly order-preserving and normal. Using
the Intermediate Value Theorem (A.20), since α^0 ≤ β there exists an ordinal e such that
α^e ≤ β < α^(e+).
Now we use the division algorithm: there are ordinals g and ρ such that
β = α^e g + ρ and ρ < α^e.
and, in order to show that this is also in standard form, it only remains to show that
εn+1 > εn, that is, that e > εn. But, if εn ≥ e, we would have ρ ≥ α^e, but we already have
ρ < α^e above.
This establishes the existence of the representation for β.
Before proving uniqueness, we show that α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 < α^(εn) γn+ ≤ α^(εn+)
by induction over n. If n = 1, then α^(εn) γn < α^(εn) γn+ ≤ α^(εn) α = α^(εn+) since γn < α.
Otherwise n ≥ 2 and
α^(εn) γn + α^(εn−1) γn−1 + · · · + α^(ε1) γ1 < α^(εn) γn + α^(εn−1+) ≤ α^(εn) γn + α^(εn) = α^(εn) γn+ ≤ α^(εn) α = α^(εn+).
For uniqueness, one checks (using the inequality just proved) that two standard forms
representing the same β must have the same leading term, so that
α^(εn−1) γn−1 + α^(εn−2) γn−2 + · · · + α^(ε1) γ1 = α^(ε′m−1) γ′m−1 + α^(ε′m−2) γ′m−2 + · · · + α^(ε′1) γ′1
and then, by induction, these tails are identical expressions also.
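Two examples of standard forms, mine rather than the text’s: with α = ω, the expression ω^2·3 + ω·2 + 5 is in standard form (exponents 2 > 1 > 0, coefficients 3, 2 and 5 all nonzero and < ω); this is the familiar Cantor normal form. With α = 2, the standard form of a natural number is just its binary expansion, for instance 13 = 2^3·1 + 2^2·1 + 2^0·1.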
Let us suppose that A and B are disjoint, to keep things simple. We can visualise the
order-isomorphisms ω → A and ω → B thus:
ω: 0 1 2 ... ω: 0 1 2 ...
↓ ↓ ↓ ↓ ↓ ↓
A: a0 a1 a2 ... B: b0 b1 b2 ...
Now suppose we combine A and B to form A ∪ B, ordering the union in what I suppose is
the obvious way: keep the orders of A and B as is and make every member of A less than
every member of B:
This works for any well-ordered sets A and B. The important things to notice here are
(1) A and B must be well-ordered in the first place so that each of them has an order-
type, that is, each is order-isomorphic to some ordinal.
(2) Given two well-ordered sets A and B, ordering their union as we have done above
always results in another well-ordered set.
(3) Any well-ordered set is order-isomorphic to one and only one ordinal, so there is no
choice about the order type of the union.
Accepting this (and we will prove all this below), this construction using unions could have
been used as the definition of ordinal addition. However I think it is clear that this way
of proceeding would have been much messier than the neat inductive definition we have
actually used.
Before moving on to general definitions and proofs, let us look at a couple more examples.
First, a nice small finite one. Suppose now that we have two ordered sets, A = {a0 , a1 } of
order-type 2 and B = {b0 , b1 , b2 } of order-type 3. As before, let us suppose that A and B
are disjoint, to keep things simple. We can visualise the order-isomorphisms 2 → A and
3 → B thus:
2: 0 1 3: 0 1 2
↓ ↓ ↓ ↓ ↓
A: a0 a1 B: b0 b1 b2
2 + 3 = 5: 0 1 2 3 4
↓ ↓ ↓ ↓ ↓
A∪B: a0 a1 b0 b1 b2
Now suppose that we have two disjoint ordered sets, A = {a0 } of order-type 1 and B =
{b0 , b1 , b2 , . . .} of order-type ω:
1: 0 ω: 0 1 2 ...
↓ ↓ ↓ ↓
A: a0 B: b0 b1 b2 ...
1+ω = ω: 0 1 2 3 ...
↓ ↓ ↓ ↓
A∪B: a0 b0 b1 b2 ...
ω + 1 (≠ ω) : 0 1 2 3 ... ω
↓ ↓ ↓ ↓ ↓
B ∪ A: b0 b1 b2 b3 ... a0
Important. It seems that all the argument above will break down if A and B are not
disjoint: we cannot very well “put them side-by-side” the way we have been doing if they
have members in common. We wriggle out of this problem by working with identical copies
of A and B rather than the originals, and making these identical copies in such a way that
they are guaranteed to be disjoint. To do this is not hard — we just “tag” the members of
A with 0 and the members of B with 1, so, instead of working with A and B directly, we
work with
Ā = { ha, 0i : a ∈ A } and B̄ = { hb, 1i : b ∈ B } .
Going back to the first example, in which A and B are both of order-type ω, we first create
Ā and B̄, ordering them in the obvious way:
ω: 0 1 2 ... ω: 0 1 2 ...
↓ ↓ ↓ ↓ ↓ ↓
A: a0 a1 a2 ... B: b0 b1 b2 ...
↓ ↓ ↓ ↓ ↓ ↓
Ā : ha0 , 0i ha1 , 0i ha2 , 0i ... B̄ : hb0 , 1i hb1 , 1i hb2 , 1i ...
At this stage it would be a very good idea to revisit the order and especially the algebraic
properties of addition (which we proved using the inductive definition in D.2 and D.3)
and prove them again, at least roughly, using the ordered-union way of looking at it just
discussed.
As described above, to create the disjoint union of sets A and B, we first make “tagged”
versions:
Ā = { ha, 0i : a ∈ A }
B̄ = { hb, 1i : b ∈ B } .
Now Ā and B̄ are disjoint copies of A and B. Then define the sum of A and B (which we
will denote by A + B) to be the ordinary union of Ā and B̄:
A + B = { ha, 0i : a ∈ A } ∪ { hb, 1i : b ∈ B } .
The maps A → Ā and B → B̄ given by a 7→ ha, 0i and b 7→ hb, 1i obviously map A and B
one-to-one onto Ā and B̄ respectively.
We order A + B by keeping the given orders within Ā and B̄ and by specifying that every
member of Ā is < every member of B̄. To write this out properly,
note that every member of A + B is of the form
hx, ii where (i = 0 and x ∈ A) or (i = 1 and x ∈ B)
Let hx, ii and hy, ji be two members of A + B. Then
hx, ii ≤ hy, ji if and only if (a) i < j or
(b) i = j = 0 and x ≤ y in A or
(c) i = j = 1 and x ≤ y in B.
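Here is a small Python sketch of this tagged construction (the names ordered_sum and leq are mine, and the sketch assumes the members of A and of B are comparable by Python’s own <=, which stands in for the given orders):

    def ordered_sum(A, B):
        # tagged copies: (a, 0) for a in A, (b, 1) for b in B
        elements = [(a, 0) for a in A] + [(b, 1) for b in B]
        def leq(p, q):
            (x, i), (y, j) = p, q
            return i < j or (i == j and x <= y)   # compare tags first
        return elements, leq

    S, leq = ordered_sum(range(2), range(3))      # order type 2 + 3 = 5
    assert leq((1, 0), (0, 1))                    # every member of Ā is below every member of B̄
    ordered = sorted(S, key=lambda p: (p[1], p[0]))
    assert [x for x, _ in ordered] == [0, 1, 0, 1, 2]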
We extend the construction just discussed to any family (indexed collection) of sets. In this
case it is best to use the members of the index set as tags.
Let hAi ii∈I be a family of sets indexed by the set I. Then the sum (or disjoint union) of
this family is the set
∑⟨Ai⟩i∈I = ⋃i∈I Āi where Āi = { ⟨a, i⟩ : a ∈ Ai } for each i.
Each Ai is mapped one-to-one onto Āi by the natural injection ηi : Ai → Āi given by
ηi(a) = ⟨a, i⟩ for each i and (as is the point of this construction) Āi ∩ Āj = ∅ for all i ≠ j.
One can ring the usual changes with this notation. For instance, we could write it as
∑i∈I Ai.
(ii′) ∑⟨Ai⟩i∈I is the union of the sets Āi = { ⟨a, i⟩ : a ∈ Ai } (for each i ∈ I); these are all
disjoint (that is, Āi ∩ Āj = ∅ whenever i ≠ j), Āi ≃ Ai for every i ∈ I, and, if i < j, every
member of Āi is less than (<) every member of Āj.
(iii′) If Ai ≃ A′i for every i ∈ I, then ∑⟨Ai⟩i∈I ≃ ∑⟨A′i⟩i∈I.
(iv′) (Converse of (ii′)) If the ordered set X is the union of pairwise disjoint sets Āi, for
all i ∈ I, which are order-isomorphic to Ai respectively, and, for every i < j, every member
of Āi is less than (<) every member of Āj, then X ≃ ∑⟨Ai⟩i∈I.
Proof. These proofs can be left as a straightforward, if tedious, exercise. Clearly (i)–(iv)
are all special cases of (i′)–(iv′) and so it is only necessary to prove the latter statements.
D.13 Remarks
Given two ordered sets A0 and A1, the sum A0 + A1 is exactly the same as ∑⟨Ai⟩i∈2 —
just compare the definitions, remembering that 2 = {0, 1}. Thus the definition of the sum
of two ordered sets is just a special case of the definition of the sum of a family.
We can use this to define the sum of any finite number of ordered sets. For example, given
three sets, A, B and C, we can index them by I = {0, 1, 2} and so define the sum
A + B + C = {ha, 0i : a ∈ A} ∪ {hb, 1i : b ∈ B} ∪ {hc, 2i : c ∈ C} ,
with the order defined exactly as before:
hx, ii ≤ hy, ji if and only if (a) i < j or
(b) i = j and x ≤ y in Ai
But note: this is different from A + (B + C) and (A + B) + C, since applying our definitions
yields
A + (B + C) = {ha, 0i : a ∈ A} ∪ {hhb, 0i, 1i : b ∈ B} ∪ {hhc, 1i, 1i : c ∈ C}
(A + B) + C = {hha, 0i, 0i : a ∈ A} ∪ {hhb, 1i, 0i : b ∈ B} ∪ {hc, 1i : c ∈ C} .
This is irritating. It would be nice to have a definition which would make the sum (dis-
joint union) associative (as one feels it ought to be), but that doesn’t seem to be possible.
The problem occurs both for the ordered sum of ordered sets and the plain sum of plain
sets. However the natural mappings between these sets are bijections for sets and order-
isomorphisms for ordered sets, so we have an associative law (up to isomorphism) which is
good enough for most purposes:
(A + B) + C ≃ A + (B + C) ≃ A + B + C.
There is another notational problem to consider. We have the ordered sum of two ordered
sets, which we write A + B, and we also have the sum α + β of two ordinal numbers,
as defined for ordinal arithmetic. The trouble is that ordinal numbers are also sets; well,
everything in mathematics is a set, but the point is that we sometimes treat them as sets,
whose members are of interest. So now we actually have two different definitions of α + β,
one by ordinal arithmetic and the other as the ordered disjoint union of two sets. If you
check the definitions, you will see that these things are in fact different.
This problem is only going to get worse. When we introduce cardinal arithmetic shortly,
we will have a third definition of addition, different again, exacerbated by the fact that a
cardinal is a special kind of ordinal, which in turn is a set.
One way to deal with this problem would be to use different notations for the different kinds
of addition (+, ⊕ and so on, say). This doesn’t work very well because the ordinary notation is
so commonly used that it would be perverse to use another.
What I will do in this book is make clear which kind of addition is being used by simply
saying so. Occasionally two kinds of addition are used in the same sentence or even in the
same equation. Then I will decorate the plus sign as follows:
α +(set) β   the disjoint sum of the sets α and β;
α +(ord) β   the sum of ordinals α and β by ordinal addition;
α +(crd) β   the sum of cardinals α and β by cardinal addition.
We will have occasion to do this in the next proposition.
D.14 Proposition
The promised useful application of ordinal addition.
(i) Let α and β be two ordinal numbers. Then the sums α + β as ordered sets and by
ordinal arithmetic are order-isomorphic:
α +(set) β ≃ α +(ord) β.
(ii) Let A and B be two well-ordered sets of order types α and β respectively. Then
A + B is of order type α + β. Here I don’t bother decorating the plus signs, because it is
obvious which kind of addition is referred to in each case.
α1 + α2 + · · · + αn (ordinal arithmetic) ≃ α1 + α2 + · · · + αn (sum of ordered sets).
Proof. First we note that (ii) is an immediate corollary of (i), so it remains to prove (i).
Observing that α +(ord) β is the union of α and the set {α + ξ : ξ ∈ β}, that these subsets
are disjoint and that every member of α is less than every member of {α + ξ : ξ ∈ β}, we
conclude that α +(ord) β ≃ α +(set) {α + ξ : ξ ∈ β}. Also, the map β → {α + ξ : ξ ∈ β} defined by
ξ ↦ α + ξ is an order-isomorphism, so β ≃ {α + ξ : ξ ∈ β} and therefore α +(ord) β ≃ α +(set) β.
Proofs of (i′) and (ii′) are left as an exercise.
Recall that we have already defined the lexicographic order on A × B; well, there were two
lexicographic orders, one from the left and the other from the right. For our present purposes
it is the lexicographic order from the right that we use:
⟨a, b⟩ ≤ ⟨a′, b′⟩ if and only if b < b′, or b = b′ and a ≤ a′.
This is the appropriate order to use for well-ordered sets, so from now on, whenever a
cartesian product of well-ordered sets is mentioned, it will be assumed that it is endowed
with the lexicographic order from the right.
We know that, if A and B are well-ordered then A × B is also well-ordered. We will prove
below that if A and B are of order-types α and β respectively, then A × B is of order-type
αβ.
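A quick Python illustration (mine) of the lexicographic order from the right, for the finite case α = 2, β = 3:

    from itertools import product

    # compare second coordinates first: the lexicographic order from the right
    pairs = sorted(product(range(2), range(3)), key=lambda p: (p[1], p[0]))
    assert pairs == [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
    # six pairs in a row: order type 2·3 = 6, as the proposition below asserts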
At least we don’t have an ambiguity of notation in this case. However, when we come to
cardinal arithmetic we will find that there are different definitions of the product for ordinal
and cardinal arithmetic, and that the same notation αβ is commonly used for both. Where
necessary I will disambiguate by decorating the product in the same way as the plus sign,
writing α ·(ord) β and α ·(crd) β.
This generalises just as one would hope to the product of any number of ordered sets.
If A1 , A2 , . . . , An are well-ordered sets of order-types α1 , α2 , . . . , αn respectively, then the
cartesian product A1 × A2 × · · · × An is well-ordered of order-type α1 α2 . . . αn . We won’t
need to follow this up in these notes.
D.16 Proposition
(i) Let α and β be two ordinal numbers. Then α × β ≃ αβ. Here α × β is the cartesian
product and αβ is the product according to ordinal arithmetic.
(ii) Let A and B be two well-ordered sets of order types α and β respectively. Then
A × B is of order type αβ.
Proof. (ii) is an immediate corollary of (i), so it is enough to prove (i). (Throughout this
proof, α + β and αβ of course refer to ordinal arithmetic.)
We define a function f : α × β → αβ by f (ξ, η) = αη + ξ . We will show that this is an
order-isomorphism.
First we must check that its range is within αβ as claimed. So, for ⟨ξ, η⟩ ∈ α × β we have
ξ < α and η < β and thus
f(ξ, η) = αη + ξ < αη + α = αη+ ≤ αβ
so f(ξ, η) ∈ αβ as required.
To see that f is surjective, suppose that θ is any member of αβ, that is, θ < αβ. By the
division algorithm, there are ξ and η such that θ = αη + ξ with ξ < α. Then also η < β,
because otherwise we would have η ≥ β and then αη + ξ ≥ αη ≥ αβ. Thus we have ξ < α
and η < β such that f(ξ, η) = αη + ξ = θ. This shows that f maps α × β onto αβ.
It remains to show that f is strictly increasing. So suppose that ⟨ξ, η⟩ < ⟨ξ′, η′⟩. Then either
η < η′ or else η = η′ and ξ < ξ′.
In the first case, η+ ≤ η′ so
f(ξ, η) = αη + ξ < αη + α = αη+ ≤ αη′ ≤ αη′ + ξ′ = f(ξ′, η′).
(ii) If A, B, A′ and B′ are ordered sets and A ≃ A′ and B ≃ B′ then A × B ≃ A′ × B′.
(iii) A sort of associative law: if A, B and C are ordered sets, then (A × B) × C ≃
A × (B × C) ≃ A × B × C.
(iv) A sort of distributive law: if A, B and C are ordered sets, then A × (B + C) ≃
(A × B) + (A × C).
(v) If A is an ordered set and {p} is a singleton, then A × {p} ≃ A.
D.18 Remarks
Notice the similarities between the two definitions of A + B and A × B. In the case A + A
we have
A + A = A × {0, 1} = A × 2
and in general
A + A + · · · + A (n copies) = A × {0, 1, . . . , n − 1} = A × n
This is great. There is no reason here why β should not be an infinite ordinal, so we have
defined the limit of an infinite sequence of ordinals (some or all of which may themselves be
infinite) with hardly more trouble than we took to define plain addition! There is quite a
bit of nice mathematics to be discovered in following this up, but we have other fish to fry,
so we won’t.
It is easy to prove that, in the case where all the αi are the same, αi = α say,
∑i<β α = αβ,
which we could express, rather roughly, as “the sum of β copies of α is αβ” — corresponding
exactly to a result about whole numbers familiar since primary school.
Exponentiation of ordinals also corresponds to a construction with well-ordered sets, however
this construction is rather technical and we will not be needing it in what we are going to
do here, so we will leave this field undisturbed.
E Cardinality
E.1 Discussion
In this section we will consider the problem of comparing the sizes of infinite sets. It depends
on the simple idea that two sets are the same size if there is a one-to-one correspondence
between them. One might think of this as a pairing up of the members of one set with those
of the other.
This idea of determining if two sets of things are the same size appears to be more primitive
than the process of counting. Certain societies which have no general notion of number are
perfectly able to compare the sizes of sets by matching; small children who have not yet
learnt to count can do the same.
The idea of determining the size of a set by counting its members may seem simple to us,
but it is really quite sophisticated. Its apparent simplicity is possibly due to the fact that
we learnt it early in our lives and it has become second nature to most of us. Basically,
to determine that a set has size n (that is, n members) we place the members in one-
to-one correspondence with the set {1, 2, 3, . . . , n} by some, often mental, process. And
then we recognise that two sets are the same size if they can both be placed in one-to-one
correspondence with the same set of natural numbers in this way. This involves recognising
(at least subconsciously) that if two sets are in one-to-one correspondence with a third one
(the set of numbers), then they must be in one-to-one correspondence with each other. And
of course the process of counting breaks down if you cannot ensure that every member of the
set does get counted and that no member gets counted twice. This can sometimes be quite
difficult. For example, determining the number of cattle in a field is not straightforward if
the number is large and the animals keep moving around. Farmers often solve this problem
by passing the animals through a narrow opening between two fields, counting them as they
go.
This counting method breaks down for infinite sets. All one can do with the natural numbers
is count and compare the sizes of finite sets and recognise when a set is infinite (because
it cannot be so counted). I find it remarkable that the more primitive method of placing
sets in one-to-one correspondence can actually leap-frog counting and provide us with a
good method of comparing the size of infinite sets. Using this method, for example, we can
say that N is not the same size as R, because we can prove that there does not exist any
one-to-one correspondence between them. Further, we can say that N is smaller than R
because N is in one-to-one correspondence with a subset of R.
This topic obviously revolves around the idea of a one-to-one correspondence and related
ideas. So let us review these briefly before starting.
An alternative way of proving that a function is a bijection is to appeal to the fact that a
function f is a bijection if and only if it has an inverse, that is, a function g : B → A such
that g(f (a)) = a for all a ∈ A and f (g(b)) = b for all b ∈ B. In this case the inverse is
usually denoted f⁻¹ and is unique.
It is useful to know that the composite of two injections is an injection, of two surjections is a
surjection and of two bijections is a bijection. Also that the identity function on any set is a
bijection, and therefore also both an injection and a surjection. (Proofs are straightforward.)
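For finite sets all of these notions can be checked mechanically, which makes them easy to
experiment with. Here is a small Python sketch (purely illustrative; the function names are
invented for this example), with a function represented as a dictionary from its domain to
its images:

    def is_injective(f):
        # no two members of the domain share an image
        return len(set(f.values())) == len(f)

    def is_surjective(f, codomain):
        # every member of the codomain is an image
        return set(f.values()) == set(codomain)

    def is_bijective(f, codomain):
        return is_injective(f) and is_surjective(f, codomain)

    def inverse(f):
        # defined (as a function) exactly when f is a bijection
        return {b: a for a, b in f.items()}

    f = {0: 'x', 1: 'y', 2: 'z'}
    assert is_bijective(f, {'x', 'y', 'z'})
    g = inverse(f)
    assert all(g[f[a]] == a for a in f)      # g(f(a)) = a for all a
    assert all(f[g[b]] == b for b in g)      # f(g(b)) = b for all b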
We will start by establishing some basic properties of these concepts before we look at some
more interesting applications.
E.4 Proposition
This proposition establishes the most basic properties that we would expect the notion of
“the same size” to have. Since we are extending this notion from the familiar domain of the
finite to the less familiar one of infinite sets, these properties do need proving.
Being the same size (equipollence) is an equivalence relation on the class of all sets. That
is, this relation satisfies
(i) A ≈ A for all sets A;
(ii) if A ≈ B then B ≈ A, for all sets A and B;
(iii) if A ≈ B and B ≈ C then A ≈ C, for all sets A, B and C.
E.5 Proposition
(i) If A ≼ A′ and B ≼ B′ then
(a) A + B ≼ A′ + B′
(b) A × B ≼ A′ × B′
(c) B^A ≼ B′^A′, except in the case A = B = B′ = ∅ and A′ ≠ ∅
(d) P(A) ≼ P(A′)
Here are some “identities” for sets which look very much like similar ones for numbers.
E.6 Proposition
For all sets A, B and C, (here + means disjoint union)
(i) (A + B) + C ≈ A + B + C ≈ A + (B + C) .
(ii) A + B ≈ B + A.
Proofs. These are all straightforward with the possible exception of (viii), and the only
difficulty with this is in getting the right notation. Here are the details. Firstly, notice that
A^{B×C} ≈ A^{C×B} by Part (v) and the preceding proposition. So it is sufficient to prove
that (A^B)^C ≈ A^{C×B}.
Now both (A^B)^C and A^{C×B} are sets of functions, so, for each function f ∈ (A^B)^C, we
want to define a function f* ∈ A^{C×B} and then show that this f ↦ f* is a bijection.
Now (A^B)^C is the set of all functions C → A^B, so f is such a function. Given any c ∈ C,
f(c) is itself a function B → A. So, given any b ∈ B, the action of this function on b is
f(c)(b), and this is a member of A. But now it is (sort of) obvious what f* must be: for
any ⟨c, b⟩ ∈ C × B, f*(c, b) = f(c)(b).
So the mapping f ↦ f* can be described simply as “replacing the inner pair of parentheses
by a comma”. It remains to show that this mapping is one-to-one and onto A^{C×B}. But
these are now easy.
One for the computer scientists: you will recognise this relationship as currying/uncurrying.
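For those same computer scientists, the correspondence is short enough to write out as a
Python sketch (the names are mine; the code is just an illustration of the bijection
(A^B)^C ≈ A^{C×B}, not part of the formal development):

    def curry(f_star):
        # f* : C x B -> A  becomes  f : C -> (B -> A)
        return lambda c: (lambda b: f_star(c, b))

    def uncurry(f):
        # f : C -> (B -> A)  becomes  f* : C x B -> A
        return lambda c, b: f(c)(b)

    power = lambda c, b: b ** c              # a sample member of A^(C x B)
    assert curry(power)(3)(2) == power(3, 2) == 8
    assert uncurry(curry(power))(3, 2) == 8

The two maps just shuffle parentheses and commas, which is exactly the point of the proof
above.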
It is time for a few examples comparing the size of sets we often work with.
E.7 Examples
(i) X ≈ ∅ if and only if X = ∅.
Proof Obvious.
(ii) Normal counting of finite sets: the set A has n members if and only if A ≈ n. (This
probably should be the definition.)
Here we are placing the set A in one-to-one correspondence with the set {0, 1, 2, . . . , n − 1}
instead of the more everyday {1, 2, . . . , n}. Using {0, 1, 2, . . . , n − 1} in this way should be
familiar to computer programmers.
Proof not required: this is a definition.
AC
(vi) A set is infinite (that is, not finite in the sense defined above) if and only if it has
a subset ≈ N.
AC
(vii) A set is infinite if and only if it is the same size as one of its proper subsets.
Proof that A infinite ⇒ A has a subset ≈ N: First, well-order A (using the Axiom of
Choice). Then there is an order-isomorphism f from some ordinal α onto A. Since A
is infinite, α ≥ ω. Then f [ω] is the subset you want.
Proof that if A has a subset ≈ N then it is the same size as a proper subset of itself:
use the trick from Part (v) on that subset.
Proof that if A is the same size as a proper subset of itself, then it is infinite: show first (by
induction) that, for any n ∈ N (thought of as the set of its predecessors), n is not the
same size as a proper subset of itself. Consequently, by (iv), no finite set can be the
same size as a proper subset of itself either.
(viii) Let E be the set of even natural numbers, E = {0, 2, 4, 6, . . .}. Then E ≈ N.
Proof n ↦ 2n.
(ix) Any two nonempty open intervals (a, b) and (c, d) in R are the same size. The same
is true for any two nonempty closed intervals [a, b] and [c, d] (as long as a < b and c < d).
Proof Use the linear function which takes a ↦ c and b ↦ d.
(x) Any nonempty open interval (a, b) in R is the same size as R itself.
Proof To see that R ≈ (−1, +1), use the function x ↦ x/√(1 + x²).
Then use (ix) for any other interval.
(xi) The open interval (−1, +1) is the same size as the closed interval [−1, +1].
Proof is tricky. For a direct proof, treat numbers of the form ±1/2^n separately from
all the others. For an easy proof, wait until we have proved the Schröder-Bernstein
Theorem.
Many more complicated examples are much more easily proved using the Schröder-Bernstein
Theorem (see below). Example (xi) above is the first such.
E.8 Definition
A set is countable if it is either finite or ≈ N. It is countably infinite if it is ≈ N. (Denumerable
and enumerable are common synonyms for countably infinite.)
E.9 Proposition
For any set X, X ≉ P(X).
In fact, X ≺ P (X).
This says that no set is the same size as its power set; moreover it is in fact smaller than its
power set. You will recognise the proof as being very much the same as the Russell Paradox
proof.
At first sight it might seem surprising that this theorem holds even for the empty set. But
the empty set has no members, whereas its power set has one, namely ∅.
Proof. Suppose, for a contradiction, that there is a bijection (or even just a surjection)
f : X → P(X). Let D = { x ∈ X : x ∉ f(x) }. Then D ∈ P(X), so D = f(d) for some d ∈ X.
But then d ∈ D if and only if d ∉ f(d) = D. This is a contradiction.
For the rest it is now enough to prove that X ≼ P(X). This is easy: the function X → P(X)
defined by x ↦ {x} is an injection.
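When X is finite the diagonal set in this proof can be computed explicitly. A Python sketch
(the particular f is an arbitrary choice, made up for the example): whichever function
f : X → P(X) we pick, the set D never appears among its values.

    X = {0, 1, 2}
    f = {0: set(), 1: {0, 1}, 2: {1}}        # some function X -> P(X)

    D = {x for x in X if x not in f[x]}      # here D = {0, 2}
    assert all(f[x] != D for x in X)         # D is missed by f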
E.10 Proposition
For any set A, P(A) ≈ 2^A.
(Here 2^A is the set of all functions A → 2, that is, all functions A → {0, 1}.)
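The bijection here sends each subset to its characteristic (indicator) function, and
presumably that is how the omitted proof goes. A Python sketch of the two mutually inverse
maps (names invented):

    A = ['a', 'b', 'c']

    def to_indicator(S):
        # the subset S corresponds to a function A -> {0, 1}
        return {x: (1 if x in S else 0) for x in A}

    def to_subset(chi):
        return {x for x in A if chi[x] == 1}

    S = {'a', 'c'}
    assert to_subset(to_indicator(S)) == S   # the maps are inverse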
E.11 The Schröder-Bernstein Theorem
If A ≼ B and B ≼ A then A ≈ B.
Proof. Let f : A → B and g : B → A be injections. Say that a member a of A has a parent
in B if a ∈ g[B], its parent being the unique b with g(b) = a; the members of A outside g[B]
are the orphans in A. Similarly, b ∈ B has a parent in A if b ∈ f[A], and the members of B
outside f[A] are the orphans in B.
[Diagram: the sets A and B, with g[B] inside A and f[A] inside B, and the orphans in A
and the orphans in B marked.]
Now consider a member a of A, and trace its ancestry back as far as possible. It may have
a parent in B, which in turn may have a parent in A, which in turn . . . and so on. One of
three things must happen: the ancestry may stop at an orphan in A, it may stop at an
orphan in B, or it may go back forever.
Note that an orphan in A is its own ultimate ancestor and so is descended from A. In the
same way an orphan in B is its own ultimate ancestor and so is descended from B.
Because the functions f and g are injective, it is easy to see that these three possibilities
are mutually exclusive. Therefore A is the disjoint union of three subsets: A_A (the members
of A descended from A), A_B (those descended from B) and A_∞ (those with infinite ancestry);
and in the same way B is the disjoint union of the corresponding subsets B_A, B_B and B_∞.
Now consider the restrictions f_A, f_B and f_∞ of the function f to the sets A_A, A_B and A_∞
respectively. Since f is injective, so are these three restrictions. By the definitions of
B_A, B_B and B_∞, we see that f_A : A_A → B_A is bijective and f_∞ : A_∞ → B_∞ is bijective,
but f_B maps A_B into B_B without necessarily being onto it (Drat!).
However, the restriction of g to B_B is a bijection B_B → A_B, and so its inverse
g_B⁻¹ : A_B → B_B is a bijection (Yay!). We now have three bijections: f_A : A_A → B_A,
g_B⁻¹ : A_B → B_B and f_∞ : A_∞ → B_∞.
We can now glue the last three bijections together to get a single bijection ϕ : A → B ;
define it by
ϕ(a) = f_A(a)     if a ∈ A_A ;
       g_B⁻¹(a)   if a ∈ A_B ;
       f_∞(a)     if a ∈ A_∞ .
That seems like a lot of work to establish such an obvious fact: that if A is no bigger than
B and B is no bigger than A then they must be the same size.
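The construction is concrete enough to run. Here is a Python sketch (my own framing: the
injections and their partial inverses are supplied as callables, and the loop simply assumes
that every ancestry terminates, which is true for the sample maps below, so no cycle
detection is included):

    def sb(a, f, g, f_inv, g_inv):
        # phi(a), following the proof: trace the ancestry of a; if its
        # ultimate ancestor is an orphan in B, use g^{-1}, otherwise f.
        side, x = 'A', a
        while True:
            if side == 'A':
                if g_inv(x) is None:         # x is an orphan in A
                    return f(a)
                side, x = 'B', g_inv(x)
            else:
                if f_inv(x) is None:         # x is an orphan in B
                    return g_inv(a)
                side, x = 'A', f_inv(x)

    f     = lambda n: n + 1                  # injection N -> N missing 0
    g     = lambda n: n + 1                  # likewise, the other way
    f_inv = lambda n: n - 1 if n >= 1 else None
    g_inv = lambda n: n - 1 if n >= 1 else None

    # the glued-together bijection swaps 2k with 2k+1:
    assert [sb(a, f, g, f_inv, g_inv) for a in range(8)] == [1, 0, 3, 2, 5, 4, 7, 6]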
AC
E.12 Proposition
Let A and B be sets, A ≠ ∅. Then A ≼ B if and only if there is a surjection B → A.
(The proof that the existence of a surjection implies A ≼ B requires the Axiom of Choice;
the converse does not.)
Another result which one feels should “obviously” be true. If there is a function from A onto
B, then “surely” B can be no bigger than A. This does require proof — and the fact that
the proof in one direction requires the big gun of the Axiom of Choice seems surprising.
Proof. Suppose first that there is a surjection g : B → A, and consider the sets
g⁻¹(a) = { b ∈ B : g(b) = a } ,
defined for each a ∈ A. These sets form a partition of B, so they are disjoint and all
nonempty. Invoking the Axiom of Choice, there is a choice function c : P(B) ∖ {∅} → B such
that c(X) ∈ X for every nonempty subset X of B. Define f(a) = c(g⁻¹(a)) for all a ∈ A. This
defines a function A → B and, since all the sets g⁻¹(a) are disjoint, it is an injection;
so A ≼ B.
Conversely, suppose f : A → B is an injection and (A being nonempty) pick some a₀ ∈ A. Then
the function B → A which sends each f(a) to a, and every other member of B to a₀, is a
surjection.
And what happens when A = ∅? I’ll leave you to figure that out.
E.13 Examples
(i) A set is countable if and only if it is ≼ N.
Proof (outline). This is not quite as obvious as it looks. If a set is countable, then it
is ≼ N: that is just a matter of checking the definitions. On the other hand, if a set
is ≼ N, then it is in one-to-one correspondence with some subset of N. So the proof
boils down to showing that every subset of N is ≈ either some n ∈ N or N itself, an
interesting little exercise in induction.
AC
(ii) A set A is infinite if and only if N ≼ A.
Proof This is just a restatement of Example E.7(vi).
(iii) It is now easy to prove Example E.7(xi), that the two intervals [−1, +1] and (−1, +1)
in R are the same size.
Proof Firstly (−1, +1) ⊆ [−1, +1] and so (−1, +1) ≼ [−1, +1].
Also, by E.7(ix), [−1, +1] ≈ [−1/2, +1/2] ⊆ (−1, +1), so [−1, +1] ≼ (−1, +1). Now apply the
Schröder-Bernstein Theorem.
(iv) By a similar argument, any nontrivial interval in R, open, closed, half-open, finite
or infinite (in length), is the same size as R. (By “nontrivial” here I mean nonempty and
not a singleton interval [a, a].)
(v) 2^N ≈ R.
(Consequently R is uncountable; and then, by the previous examples, so is every nontrivial
interval in R.)
Proof It is easiest to show that 2^N is the same size as the interval [0, 1) of reals.
The trick is to represent the reals in this interval as an infinite binary expansion. If
we write x = 0.d0 d1 d2 . . . in this notation, we mean
x = d0/2 + d1/2² + d2/2³ + · · ·
where each di is either 0 or 1. If we disallow sequences of digits which terminate with
an infinite number of 1’s, the representation is unique. Thus we have an injection
from [0, 1) into the set of all sequences in {0, 1}, indexed by N, that is, 2^N. This
proves that R ≼ 2^N.
To prove the opposite relation, observe that, to every such sequence of 0’s and 1’s,
there is a real number x in [0, 1) defined by
x = d0/3 + d1/3² + d2/3³ + · · ·
and now, because the denominators are powers of 3, distinct sequences define distinct
reals; therefore this defines an injection from 2^N into [0, 1), proving that 2^N ≼ [0, 1).
Trying to prove this result without the help of the Schröder-Bernstein Theorem is pretty
awful.
(vi) (Corollary) N ≺ R.
(vii) N² ≈ N (and so N² is countably infinite).
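The notes do not exhibit a bijection for (vii) here, but one standard choice is the Cantor
pairing function, which numbers the pairs diagonal by diagonal. A Python sketch:

    def pair(x, y):
        # the pairs with x + y = d form the d-th diagonal
        return (x + y) * (x + y + 1) // 2 + y

    def unpair(z):
        # invert by locating the diagonal containing z
        d = 0
        while (d + 1) * (d + 2) // 2 <= z:
            d += 1
        y = z - d * (d + 1) // 2
        return d - y, y

    assert sorted(pair(x, y) for x in range(10) for y in range(10)
                  if pair(x, y) < 30) == list(range(30))   # no gaps, no repeats
    assert all(unpair(pair(x, y)) == (x, y)
               for x in range(10) for y in range(10))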
(viii) R² ≈ R.
Proof From [0, 1) ≈ R it is easy to prove that [0, 1) × [0, 1) ≈ R × R, that is
[0, 1)² ≈ R². It is now sufficient to show that [0, 1)² ≈ [0, 1). That [0, 1) ≼ [0, 1)² is
obvious, so we show that [0, 1)² ≼ [0, 1).
Given a pair ⟨x, y⟩ in [0, 1)², write each as its proper decimal expansion
x = 0.d1 d2 d3 . . .
y = 0.e1 e2 e3 . . .
Here “proper” means we never use an expansion which ends with an infinite sequence
of 9’s. Now define z by alternating digits in these expansions
z = 0.d1 e1 d2 e2 d3 e3 . . . .
Then the function ⟨x, y⟩ ↦ z (where z is defined as above) defines an injection
[0, 1)² → [0, 1).
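On finite prefixes of the decimal expansions the interleaving map is one line of Python (a
sketch only: real numbers are stood in for by strings of digits):

    def interleave(x_digits, y_digits):
        # x = 0.d1 d2 d3 ..., y = 0.e1 e2 e3 ...  ->  z = 0.d1 e1 d2 e2 d3 e3 ...
        return ''.join(d + e for d, e in zip(x_digits, y_digits))

    assert interleave('250', '125') == '215205'    # 0.250, 0.125 -> 0.215205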
(ix) Z is countably infinite.
Proof Define a bijection Z → N by
f(n) = 2n        if n ≥ 0 ;
     = −2n − 1   if n < 0 .
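A quick, purely illustrative Python check that this f really does spread the integers
injectively over the natural numbers:

    def f(n):
        return 2 * n if n >= 0 else -2 * n - 1

    # 0, 1, 2, ... go to the evens and -1, -2, ... to the odds
    assert sorted(f(n) for n in range(-5, 6)) == list(range(11))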
(x) N × R ≈ R.
Proof This is nice. All the steps below are from things we have proved above.
R ≈ {0} × R ≼ N × R ≼ R × R ≈ R .
(xi) 2^R ≈ N^R ≈ R^R.
Proof This is even nicer.
2^R ≼ N^R ≼ R^R ≈ (2^N)^R ≈ 2^(N×R) ≈ 2^R .
Some more examples as exercises
(xiv) N^N ≈ 2^N
(xv) R^N ≈ R
(xvi) The set of all finite sequences of natural numbers is countably infinite.
(xvii) The set of all functions R → R is R^R, and is therefore strictly bigger than R (by
(xi) and Cantor’s Theorem). What about the set of all continuous functions R → R?
(xviii) Hard. Is the set of all bijections R → R the same size as R or as R^R?
(xix) Much harder I think. Is the set of all increasing functions R → R the same size as
R or as R^R?
The Continuum Hypothesis gets its name from the historical fact that R used to be
called “the Continuum”. Following the famous Cantor proof that the reals are not
countable (in our language, N ≺ R — and note that R is the same size as 2^N) the
question arose as to whether there was a subset of the reals of a strictly intermediate
size.
F Cardinal numbers
One can use the natural numbers to measure the size of finite sets, in the sense that every
finite set is the same size as exactly one natural number. It would be nice if there was some
set of canonical “numbers” which we could use to measure the size of infinite sets in the
same way.
If we are prepared to accept the Axiom of Choice then this can be done, for then every set
X can be well-ordered and so made order-isomorphic to an ordinal number, its order-type.
But then, since an order-isomorphism is a bijection, it is the same size as its order type. Now
forget the well-ordering: we are left knowing that every set X is the same size as at least
one ordinal number. Define its cardinal number or cardinality as the least ordinal number
which is the same size as it.
You might wonder whether any given set can be the same size as more than one ordinal
number — maybe the above definition is worrying about a possibility that doesn’t arise?
Certainly any finite set is the same size as only one ordinal (the n which is its size) but
every infinite set will be the same size as many ordinals. This follows from the fact that,
if α is an infinite ordinal, then all the sets α + n, for all natural numbers n, are the same
size. So, if a set is the same size as an infinite ordinal α, then it is the same size as all of
α, α + 1, α + 2, . . . . So, yes, it is necessary to make the definition the way it is.
It is not difficult now to see that the ordinal numbers which can turn up in this rôle (as the
sizes of sets) are just those ordinals which are not the same size as any of their predecessors.
Now let us define all this in a more logical order.
AC
F.1 Definition
A cardinal number is an ordinal number which is not the same size as any of its predecessors.
In other words, it is an ordinal number α with the property that there is no ordinal β < α
which is the same size as α.
Now let X be any set. We define its cardinality to be the least ordinal number which is the
same size as X:
#X = min{ α : α is an ordinal and X ≈ α } ,
which is, of course, the unique cardinal number which is the same size as X. This is also
called the cardinal number of X, or even the size of X.
(The definition of a cardinal number does not require the Axiom of Choice. The definition
of the cardinality of a set requires the Axiom of Choice in its tacit assumption that every
set does have a cardinality. Should we take this definition to mean that we are defining
cardinality for sets which happen to be the same size as some ordinal number, accepting the
possibility that some sets may not qualify, then the Axiom of Choice is not involved.)
F.2 Examples
(i) No natural number is the same size as any of its predecessors. Therefore every natural
number is a cardinal number.
(ii) The ordinal ω = N is infinite and therefore not the same size as any of its predecessors,
which are the natural numbers. Therefore ω is the first infinite cardinal number, and in this
context it is traditionally denoted ℵ0 (read “aleph-null”).
We now have three different names and symbols for the set of natural numbers. They are
all in common use, so one needs to know them. Which one is used depends upon how the
set is being employed — whether it is being used as the good old natural numbers we use
for counting etc., or as the first infinite ordinal or as the first infinite cardinal.
(iii) The ordinals ω + 1, ω + 2, . . . , ω2, . . . , ω3, . . . , ω², . . . are all countable (countably
infinite of course). Therefore they are all the same size as ω, which means that none of them
are cardinal numbers.
(iv) The cardinality of R is the same as that of the set of all functions ℵ0 → 2 and so is
denoted 2^ℵ0 (more on this notation below).
(v) The Continuum Hypothesis is that there is no cardinal number strictly between ℵ0
and 2^ℵ0. The Generalised Continuum Hypothesis is that, for any infinite cardinal ℶ, there
is no cardinal number strictly between ℶ and 2^ℶ.
For some reason, when dealing with cardinal numbers, mathematicians started using Hebrew
letters. Aleph (ℵ) and beth (ℶ) are the first two letters of the Hebrew alphabet. As far as
I know, no other letters of this alphabet are in common use.
(vi) A ≈ B if and only if #A = #B,
A ≼ B if and only if #A ≤ #B.
F.6 Example
The sets ω, ω + ω and ω × ω are all the same size (all countably infinite). Well, since we are
discussing cardinals here, we should say that the sets ℵ0, ℵ0 ⊕ ℵ0 and ℵ0 × ℵ0 are all the
same size. Therefore, in cardinal arithmetic
ℵ0 = ℵ0 + ℵ0 = ℵ0² .
We know that these are all different in ordinal arithmetic. So cardinal arithmetic is different
from ordinal arithmetic.
The infinite cardinals can now be enumerated by the ordinals, using definition by transfinite
recursion:
ℵ0 = ω ;
ℵ_{α+1} = the first cardinal > ℵ_α ;
ℵ_λ = sup{ ℵ_ξ : ξ < λ } for any nonzero limit ordinal λ.
It is not difficult to prove that this defines a cardinal for every ordinal α and that these are
all the cardinals there are.
For any set A of cardinality α, its power set P(A) is of cardinality 2^α. Thus the Continuum
Hypothesis can be restated:
ℵ1 = 2^ℵ0
and the Generalised Continuum Hypothesis can be restated: for every ordinal α,
ℵ_{α+1} = 2^{ℵ_α} .
(viii) α^{β+γ} = α^β α^γ
(ix) (αβ)^γ = α^γ β^γ
(x) (α^β)^γ = α^{βγ}
(xi) 1^α = 1
So far so good. Now we look at a proposition and theorem which tell us that, where infinite
cardinals are involved, cardinal arithmetic is mostly trivial.
AC
F.11 Proposition
Let α be an infinite cardinal. Then α2 = α (cardinal arithmetic).
Proof. Suppose not. Then let α be the least infinite cardinal such that α² ≠ α. Now α ≥ 1
(trivially), so α² ≥ α·1 = α and therefore α² > α.
Consider the cartesian product α × α. We define a new order ◁ on it by
⟨ξ1, ξ2⟩ ◁ ⟨η1, η2⟩ iff max{ξ1, ξ2} < max{η1, η2}
or ( max{ξ1, ξ2} = max{η1, η2} and ξ1 < η1 )
or ( max{ξ1, ξ2} = max{η1, η2} and ξ1 = η1 and ξ2 < η2 ).
Note that it follows immediately from this definition that
if ⟨ξ1, ξ2⟩ ◁ ⟨η1, η2⟩ then max{ξ1, ξ2} ≤ max{η1, η2} .
Then ◁ is a well-ordering of α × α, so there is an order-isomorphism f from α × α onto some
ordinal θ. Then
θ ≥ #θ          by F.5(i)
  = #(α × α)    since f is a bijection
  = α·α = α²    by definition of cardinal multiplication
  > α           by choice of α above.
So θ > α, which means that α ∈ θ. Let ⟨ξ1, ξ2⟩ = f⁻¹(α), a member of α × α. Now set
µ = max{ξ1, ξ2} + 1. Since ξ1, ξ2 ∈ α, they are both < α and so, since α (being an infinite
cardinal) is a limit ordinal, µ < α. We now have that every ◁-predecessor of ⟨ξ1, ξ2⟩ lies
in µ × µ (by the note above), while the set of these predecessors is order-isomorphic under
f to the ordinal α; and now
α ≤ #(µ × µ) = (#µ)² . (–1)
Now µ cannot be finite because, if it were, µ × µ would also be finite, in which case
#(µ × µ) = (#µ)² would be finite, contradicting (–1), since α is infinite.
So µ must be infinite, in which case #µ is infinite too. Now we saw that µ < α and so
#µ < α also. Therefore, by the original choice of α, (#µ)² = #µ. But then (#µ)² < α,
again contradicting (–1).
AC
F.12 Theorem
Let α and β be cardinal numbers, α ≤ β and β be infinite. Then (cardinal arithmetic)
(i) α + β = β .
(ii) Provided α ≥ 1 also, αβ = β.
(iii) Provided α ≥ 2 also, α^β = β^β.
(ii) 1 ≤ α, so β = 1·β ≤ αβ ≤ β² = β .
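Part (i), whose proof is not shown here, can be obtained by a similar squeeze using
Proposition F.11; a sketch:

    β ≤ α + β ≤ β + β = 2·β ≤ β·β = β² = β ,

so all these quantities are equal and, in particular, α + β = β.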
AC
F.13 Examples
We now have enough artillery to sort out the sizes of many of the infinite sets we deal with
constantly. Here are some examples:
(∃s ∈ F)(x ∈ s) .
(∀u ∈ F)(u ∈ x ⇒ u ∈ y) ,
which says that every finite set which is a member of x is also a member of y. This is quite
different from x ⊆ y. For example, in an interpretation in which x corresponds to {Z, 1, 2}
and y to {1, 2, 3} (let’s call these sets x and y), we see that the formal statement x ⊆ y
is true in F (the only finite members of x are 1 and 2, and both are members of y), whereas
the actual inclusion x ⊆ y fails (Z is a member of x but not of y).
In the light of this, one would expect MK1 to fail, and so it does. It is interpreted as
(∀a ∈ F)(∀b ∈ F)( (∀x ∈ F)(x ∈ a ⇔ x ∈ b) ⇒ a = b ) ,
and this is false: the sets {Z, 1, 2} and {1, 2}, for instance, have exactly the same finite
members but are not equal. Axiom MK2 fares no better: it is interpreted as
(∃w ∈ F)(∀x ∈ F)( x ∈ w ⇔ P(x) ) ,
where P(x) is the predicate corresponding to P(x). Once more, if we set P(x) to be, say,
x = x, this is false: it says that there is a finite set w which contains every finite set x.
The failure of MK1 is a big drawback because it makes the apparatus of definition by
description almost unusable — and this is used to define unordered pairs, unions, power
sets and, via them, almost everything else. Suppose that nevertheless we are serious about
trying to see what the axioms of mathematics have to say in a universe of finite sets; is there
any way we can resurrect this example? There are two ways we could go about this:
1 We could redefine F as the class of all finite sets whose members are also finite, whose
members of members are also finite and so on. We simply define a set to be recursively finite
if it is finite and all its members are recursively finite. This definition looks a bit alarming
at first sight, but the Axiom of Foundation tells us that it is valid.
2 We could keep F as the class of all finite sets, but redefine the interpretation of = .
There is nothing in the definition of an interpretation to say that = must be interpreted
as ordinary equality. It is a binary relation symbol and so must be interpreted as a binary
relation. The axioms of equality imply that it must be an equivalence relation and “behave
well” towards functions and other relations in an obvious way. So, suppose we define an
equivalence ∼ on F by:
a ∼ b ⇔ (∀x ∈ F)( x ∈ a ⇔ x ∈ b )
and use this as the interpretation of = . Then, having checked that the axioms of equality
hold in the new structure, we have more or less forced MK1 to be true.
It would be difficult, if not impossible, to arrange for MK2 to be true in such a structure.
As an alternative, we could decide to investigate the ZF axioms instead. (Note that ZF uses
the same language as MK, it is only the axioms that are different, leading of course to a
different theory.) Axiom ZF2 (corresponding to MK2) states (again for any predicate P (x):
(∀a)(∃w)(∀x)( x ∈ w ⇔ x ∈ a ∧ P (x) )
This translates to
(∀a ∈ F)(∃w ∈ F)(∀x ∈ F)( x ∈ w ⇔ x ∈ a ∧ P(x) ) ,
which is true in F: those members of a finite set a which satisfy P(x) do form a finite set.
(We note here that the symbol ⊨ is not part of the language L, even informally. It is a
metasymbol.)
If Q is a statement in the language L which is true for every structure for the language, we
say that Q is semantically valid and we write ⊨ Q. This is the same as saying that ∅ ⊨ Q.
(Note that Proposition 5.C.3(ii) says that all the theorems of PL are semantically valid.)
A.3 Theorem
For any expressions P and Q in any language L,
P ⊨ Q if and only if ⊨ P ⇒ Q .
Proof. Suppose first that P ⊨ Q, and let M be any structure for L and θ any interpretation
L → M. Then it cannot be the case that θ(P) is true and θ(Q) is false, because this would
contradict P ⊨ Q. But that means that θ(P) ⇒ θ(Q) in M, that is, that θ(P ⇒ Q) is true
in M. Since that is true for any M and any θ, this proves that ⊨ P ⇒ Q.
Conversely, if ⊨ P ⇒ Q, then θ(P) ⇒ θ(Q) for every M and θ. But then, if θ(P) is true,
then so is θ(Q), proving that P ⊨ Q.
B Adequacy of first-order theories
The centrepiece of this section is the theorem that every consistent first-order theory has
a model. This is a big proof which goes in several stages, each of which is quite complicated.
Nevertheless the proof itself is of considerable interest, so I plan here to explain it
carefully in as leisurely a fashion as I can manage, starting with an outline.
In Chapter 5 we looked at several models for theories, but they don’t suggest any obvious
systematic way of creating a model for a general first-order theory. Given an arbitrary
theory, we don’t have much to work with — except for the theory itself. So what we do
is manufacture a model out of the theory itself, or at least part of it. Let us start with a
couple of examples to see how this might go.
If we were given PA to work with, but had no pre-existing idea about N, how could we pro-
ceed? Well, first we would notice that there is a constant 0̄, so we must have a corresponding
member of our model, and that gives us 0, or rather, we can use the term 0̄ itself as that
member. Next we notice that there is a function suc, so we have to also have suc(0̄) and then
suc(suc(0̄)) and then suc(suc(suc(0̄))) and so on. Using our other notation for successors, we
now have
0̄ , 0̄+ , 0̄++ , 0̄+++ , . . .
and it looks like we have all of N (or at least something we can use to manufacture it)
already.
But note how we got this far — we started with the “constant” zero, which we had to have,
and then noticed that applying the function suc to it should give us more members which we
have to have also. But we cannot stop there: there are other functions too, giving us other
terms. We have to have things like
0̄+ + 0̄++ , 0̄++ .0̄+++ and even (0̄+ + 0̄++ ).(0̄++ + 0̄+ ) .
Now we know that lots of these should be the same as one another, and that which are the
same as which should be sorted out by the theorems of the theory. Actually doing this is
quite messy, however there is an easy way out: at present we are trying to manufacture a
plain model, and we don’t really care if it respects equality. We will press ahead, treating
the equality symbol (if it exists in our theory) just like any other binary relation. Basically,
creating a model without worrying about respecting equality will be hard enough without
that extra complication.
(Of course, we do want to know that, if the theory happens to be one with equality, then
there is a model which respects it. We will prove that after proving the main theorem by
showing that, if the theory happens to be a theory with equality which has a model, then
from that we can manufacture a model which does respect equality.)
So let us try this: Our model (let’s call it M ) should consist of all “constant” terms in the
language. By that I mean all terms of the language which do not contain any variables, so
terms like x + 0̄+ won’t be members of our model.
( Now I have a problem with my coloured fonts. So far I have adopted a convention that
members of the formal language are written in green and members of the model in black,
or perhaps brown. The trouble is, now the terms of the language are both. I think that the
best thing to do is leave them in green.)
We need to define the actions of the functions on our model. Let us start with addition: we
want to define its action +M on the model. For example, we have two members of the model
(terms) 0̄+ and 0̄++; how are we to define their sum? Well, there’s an obvious candidate, the
sum as already defined in the formal language. So we define
0̄+ +M 0̄++ = 0̄+ + 0̄++ .
Here, on the left hand side, we have the addition function of the model acting on two
members of the model and, on the right hand side, a single term of the language in its
capacity as a single member of the model. It is easy to see that we can do the same thing
for the other functions of the language (the successor function and multiplication); in fact
we can make this into a definition that will work for any first-order language at all.
Let f be a function symbol of the formal language (of arity n say); we define the action fM
of f on the model M as follows. For any constant terms t1 , t2 , . . . , tn of the language (which
are therefore members of the model M also),
fM (t1 , t2 , . . . , tn ) = f (t1 , t2 , . . . , tn )
As above, on the left hand side, we have the function fM of the model acting on n members
of the model and, on the right hand side, a single term of the language in its capacity as a
single member of the model. It is not hard to show that this definition satisfies the required
property for the interpretation of a function symbol.
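There is nothing mysterious going on in this definition, as a little Python sketch may help
to show. Here are the term-model operations for the language of PA, with constant terms
represented as nested tuples (the representation is my own; the point is only that fM builds
a new term out of old ones):

    ZERO = ('0',)                            # the constant 0-bar

    def suc_M(t):                            # suc_M(t) is the term suc(t)
        return ('suc', t)

    def add_M(t1, t2):                       # t1 +_M t2 is the term t1 + t2
        return ('+', t1, t2)

    def mul_M(t1, t2):
        return ('.', t1, t2)

    one, two = suc_M(ZERO), suc_M(suc_M(ZERO))
    # the "sum" of two members of the model is just another term:
    assert add_M(one, two) == ('+', ('suc', ('0',)), ('suc', ('suc', ('0',))))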
We also have to define how relations behave in the model. So, supposing we have a relation
symbol r of arity n and we want to define a corresponding relation rM on the model, I think
that there is a natural way to do it: For any constant terms t1 , t2 , . . . , tn of the language
(which are therefore members of the model M also), define rM(t1, t2, . . . , tn) to be true
if and only if r(t1, t2, . . . , tn) is a theorem of the theory.
(This definition turns out to work nicely provided the theory is complete, but gives trouble
otherwise. We will cross this bridge (and we can) when we come to it.)
So far, so good. But there is still another major hurdle to overcome. Consider the first-order
theory of Projective Planes, as discussed in Section 5.B. Looking at the definition there, we
see that this theory has no constant terms at all and so the method we used with PA above
is not going to work. So how are we to build a model from this theory?
The answer to this question is that this theory, like many others, has some existence theorems
— theorems of the form (∃x)P , where P is some expression. We will see that we only need
to deal with such theorems in which x is the only free variable in P (so the theorem itself
is a sentence). Since such a theorem says that something exists with the given property, it
had better exist in the model. So we just add a new constant term to our language, c say,
and fix things so that the sentence P [x/c] is true for the model. (These new constants we
add are usually called witnesses.) Of course, we will have to add a witness for every such
sentence (∃x)P and there will probably be an infinite number of them.
For example, suppose that the theory has an axiom of the form
(∃x1)(∃x2)(∃x3)(∃x4)Q(x1, x2, x3, x4) .
Being a theorem of the form (∃x)P, this acquires a witness: a new constant c1 and a new
axiom (∃x2)(∃x3)(∃x4)Q(c1, x2, x3, x4). Being an axiom, this is also a theorem, so we treat
it the same way to get another constant c2 and axiom
(∃x3)(∃x4)Q(c1, c2, x3, x4) ,
and so on. Eventually we end up with four new constants, c1, c2, c3, c4 and the axiom
Q(c1, c2, c3, c4), which is what we wanted.
So, our proof goes in several stages. We will deal with the case in which the language is
countable first, since that is complicated enough. When that is done, I will go through the
small changes necessary to make the whole thing work for a language of any cardinality.
Stage 1 We start with our given consistent theory, A say, and add a countable set of
new constants; this creates an extension B of the original theory. We will prove before
starting that making an extension by adding constants in this way to a consistent theory
creates a new theory which is also consistent.
Stage 2 We add new axioms (the witness statements) to B to turn the new constants into
the required witnesses. This does not add any new symbols, so the language stays the same,
however it does create new theorems, so the theory is extended in that sense. We prove that
this new theory, which we will call C, is also consistent.
Stage 3 Then we invoke a theorem (that we have already proved: Theorem 6.G.13) that
every consistent theory has an extension which is both consistent and complete to create
such an extension D of C.
Stage 4 We construct the model from the constant terms of D using the technique de-
scribed above and prove that it is indeed a model of D and therefore of A also.
Stage 5 We prove the theorem that if a first-order theory with equality has a model, then
it has one which respects equality.
[Diagram: starting from the given theory A, Stage 1 adds the constants b1, b2, . . . to get B;
Stage 2 adds the witness axioms to get C; Stage 3 extends to a consistent complete theory D;
Stage 4 constructs the model from D.]
B.2 Stage 1
This stage, adding plain constants, is not quite as simple as one might think. The point is
that, having added more constants, one then immediately gets lots more expressions and,
since the axioms of PL are all schemas, we get lots more instances of these too.
Consider, for example, the expression
P(x) ⇒ (∀x)Q(x) (–1)
and suppose we symbol-replace x by u throughout, to get
P(u) ⇒ (∀u)Q(u). (–2)
If u is a variable symbol, the result (–2) is also an expression, but if it is (for instance) a
constant symbol the result is no longer an expression (because then (∀u) is not allowed). Of
course, more bizarre replacements are allowed; for example, we could replace ) by ¬ in (–1)
to get
P(x¬ ⇒ (∀x¬Q(x¬
However, the symbol replacements we are going to be concerned with here are only of one
simple kind, namely replacing a constant symbol by a variable symbol.
B.4 Lemma
Let L be a first-order language in which P is an expression, c a constant symbol and v a
variable symbol. Then the result of symbol-replacing c by v in P is also an expression.
(In other words, the kind of replacement we are interested in here does not convert expres-
sions into rubbish strings.)
Proof. As exercise. It would be a good idea to convince yourself that you can make a
careful proof of this. First define replacement by induction over the construction of terms
and expressions (this is almost the same as the “careful” part of Definition 3.A.16 but much
simpler because questions of acceptability do not arise). Then prove this proposition by a
similar induction.
B.5 Lemma
Let B be a first-order theory, P be a theorem of B and let c1 , c2 , . . . , ck be distinct constant
symbols none of which occur anywhere in the proper axioms of B. Then there are
variable symbols v 1 , v 2 , . . . , v k such that the result of symbol-replacing ci by vi for i =
1, 2, . . . , k in P is also a theorem of B.
The variable symbols vi can be found thus: take any proof L1 , L2 , . . . , Ln of P and let the
vi be any variable symbols which occur nowhere in this proof. (Since a proof is of finite
length, such variable symbols must exist.)
Note Here the constant symbols ci may appear in some of the axioms PL1 – PL5 of Predicate
Logic; indeed, as pointed out above, they must do so.
Proof. Take any proof L1, L2, . . . , Ln of P and choose variable symbols v1, v2, . . . , vk which
occur nowhere in it. For each step Lk, let L′k be the result of symbol-replacing each ci by
vi; we check that the new sequence L′1, L′2, . . . , L′n is also a proof. If Lk is an instance
of PL1, say
Lk = P ⇒ (Q ⇒ P)
then
L′k = P′ ⇒ (Q′ ⇒ P′)
(where P′ and Q′ are defined in the obvious way) and this is obviously another instance of
PL1. The proofs for Axioms PL2 and PL3 are the same.
If Lk is an instance of PL4,
Lk = (∀x)A ⇒ A[x/t] .
Now all the ci are different from x but they might or might not occur in t. In any case we
have
L′k = (∀x)A′ ⇒ A′[x/t′]
(where t′ is the result of making the ci → vi substitutions in t) and it is not difficult to see
that the substitution [x/t′] is acceptable in A′ also.
If Lk is an instance of PL5, the argument is similar.
That disposes of the axioms PL1 – PL5. If Lk is a proper axiom, then none of the ci occur
in it and so L′k = Lk and the result is trivial.
Now suppose that Lk follows from two earlier steps by MP. That means that there are two
earlier steps of the forms Li and Lj = Li ⇒ Lk. But then the new proof contains earlier
steps of the forms L′i and L′j = L′i ⇒ L′k and the result is again trivially true.
Finally, suppose that Lk follows from an earlier step Li by UG. (Note, we are talking about
a formal proof of a theorem here: the condition (ug∗) can be ignored.) That means that Lk
is of the form (∀x)Li. So L′k is (∀x)L′i and follows from L′i by UG.
B.6 Proposition
Let A be a consistent first-order theory. Create a new theory B by adding a set of new
constant symbols, C say, and extending its expressions and the Axioms PL1 – PL5 as
necessary to embrace the new constants. Then B is a consistent first-order theory also.
Note As pointed out above, adding new constants to the set of symbols will result in
enlarging the set of expressions, and in a first-order theory, the PL axioms PL1 – PL5 must
be schemata, and so now we have more instances of them too. On the other hand, there is
no necessity to enlarge the set of proper axioms, and we are not going to do that here.
B.7 Stage 2
We are now going to add the witness statements as new axioms. For each theorem of the
form (∃x)P we want to have a constant c such that ⊢ P[x/c].
We cannot just add all such statements as axioms, because as we add axioms we will get
more theorems and, for all we know, lots of these theorems might be new ones of the same
form (∃x)P (with a new x and P ), so we would get into a complicated vicious circle.
We wriggle out of this problem by starting with all expressions P with exactly one free
variable (whether they are theorems or not) and adding axioms
(∃x)P ⇒ P[x/c] (–1)
for all of them. Then, if (∃x)P turns out to be a theorem, the P[x/c] will be too, as we
want, and if (∃x)P turns out not to be a theorem, the new axiom gives us nothing. (The
set of all such P does not change as we add axioms, so we don’t get the circle problem.)
There is one more adjustment to make. The proof should work with our fully-formal lan-
guage, which does not have the ∃ symbol. So (–1) is semi-formal for
¬(∀x)¬P ⇒ P [x/c]
which is equivalent to
¬P [x/c] ⇒ (∀x)¬P
Now we are going to add an axiom of this form for every expression P with one free variable
and we can simplify this a bit by replacing P by ¬P:
P[x/c] ⇒ (∀x)P .
This simpler form of the witness statements makes the proof go more easily, so this is what
we’ll use.
B.8 Lemma
Let A be a countable consistent first-order theory. Then there is a consistent extension C
of A which has the property that, for every expression P with exactly one free variable x,
there is a constant c and a theorem P [x/c] ⇒ (∀x)P in C.
Proof. First, we use Stage 1: Create a new theory B by adding a countably infinite set of
new constant symbols b1 , b2 , b3 , . . . to A and extending its expressions and Axioms PL1 –
PL5 as necessary to embrace the new constants. By Proposition B.6, B is consistent.
(And A ⊆ B of course.)
It will be convenient in this proof to mention axioms for these theories. Starting with A,
we will assume that it has proper axioms A; (if it does not come already equipped with
axioms, we can always use the whole of A as its own proper axioms: A = A. This may
seem a strange thing to do, but nobody said that you can’t repeat the PL axioms as proper
axioms too.) When we add the constants b1 , b2 , . . . to form B, as mentioned above, the
axioms of PL extend to encompass these new constants, but the proper axioms do
not change: so we can take A as a set of proper axioms for B too.
Let P1 , P2 , . . . be an enumeration of all the expressions of B that contain exactly one free
variable, and let x1 , x2 , . . . be their free variables. (We number them the same way so that,
for each i, xi is the unique free variable of Pi . Note that the Pi are all different, but the xi
will not be.)
Now we add the witness statements as described above. But first, we choose the constants
ci from among the bi we already have — but they have to be chosen craftily.
Choose a subsequence c1 , c2 , . . . from the new constants b1 , b2 , . . . in such a way that, for
each i,
(i) ci does not occur in any of P 1 , P 2 , . . . , P i
(ii) ci is different from all its predecessors c1 , c2 , . . . , ci−1 .
(This can be done for any i because there is an infinite number of bi to choose from, and
the two conditions only rule out a finite number of them.)
For each i, let Wi be the expression
Pi[xi/ci] ⇒ (∀xi)Pi . (–1)
(These are the witness statements.) Note that, since xi was the only free variable in Pi, Wi
is a sentence — no free variables. To prove consistency of the result, we will add them one
at a time.
For each n, let Bn be the first-order theory with proper axioms A ∪ {W 1 , W 2 , . . . , W n }.
Also, let C be the one with proper axioms A ∪ {W1 , W2 , . . .}. It is obvious that
B0 = B
For every n, Bn is an extension of B.
For every n ≥ 1, Bn is the extension of Bn−1 generated by it together with Wn .
Firstly, we know that B0 = B is consistent by Stage 1. We now show that, for any n ≥ 1, if
Bn−1 is consistent, then so is Bn ; this we do by showing that, if Bn is inconsistent then so
is Bn−1 . Making the assumption that Bn is inconsistent, it follows that any expression at
all is provable in Bn ; in particular,
⊢ ¬Wn (in Bn)
and so
⊢ Wn ⇒ ¬Wn (in Bn−1).
Uh oh! I’ve just used the deduction theorem and the line it came from was an entailment,
not a deduction. Is that OK? Yes, because the hypothesis Wn has no free variables, and in
that case a deduction and an entailment are the same thing.
From this it follows that
⊢ ¬Wn (in Bn−1).
Using (–1), the definition of Wn (and some SL), we see that
⊢ ¬( Pn[xn/cn] ⇒ (∀xn)Pn ) (in Bn−1)
and so both
⊢ Pn[xn/cn] (in Bn−1) (–2)
and
⊢ ¬(∀xn)Pn (in Bn−1). (–3)
Now the constant cn is not a symbol of A and so does not occur in any of its axioms. By
its choice, it also occurs nowhere in any of P 1 , P 2 , . . . , P n or c1 , c2 , . . . , cn−1 . Therefore it
occurs nowhere in any of W 1 , W 2 , . . . , W n−1 . This shows that cn occurs nowhere in any of
the proper axioms of Bn−1 .
Using (–2) and Lemma B.5 above, there is a variable symbol v such that the result of
replacing cn by v in Pn [xn /cn ] is a theorem of Bn−1 .
Now Pn contains no occurrences of cn and Pn [xn /cn ] is the result of replacing the free
occurrences of xn in Pn by cn . Consequently, the result of replacing these occurrences of
cn in turn by v is just the same as replacing all free occurrences of xn in Pn by v (think
about it!) — and that is just Pn [xn /v]. This tells us that the result of replacing cn by v
in Pn [xn /cn ], which we have just seen is a theorem of Bn−1 , is just Pn [xn /v]. So we have
proved that
Pn [xn /v] (in Bn−1 ).
showing that Bn−1 is inconsistent, as required. This completes the induction, showing that
every Bn is consistent.
It follows that C is consistent also.
Stage 3
Stage 3 is now easy. We have already done the work.
B.9 Lemma
Let A be a countable consistent first-order theory. Then there is a complete consistent
extension D of A which has the property that, for every expression P with exactly one free
variable x, there is a constant c and a theorem P [x/c] ⇒ (∀x)P in D.
B.10 Stage 4
Now we build the model, let us call it M . Its underlying set will be the set of all constant
terms of C (which is the same as the set of all constant terms of D), since this is a simple
extension.
We must make this into a structure for the language by defining the actions of the functions
and relations of the language on this set.
We must then show that this structure is a model by showing that all sentences in A are
true for M .
B.11 Theorem
Every countable consistent first-order theory has a countable model.
Remarks This theorem is in fact true if we replace “countable” by any cardinality. I
prove the countable case here first for two reasons. Firstly, the proof in the countable case is
complicated enough without adding the extra tricks needed to get an arbitrary-cardinality
version to work and, secondly, the countable version does not require the Axiom of Choice
for its proof and the general version does. In the next theorem following I prove the general
theorem by adding these tricks.
Proof. Let A be the given theory. Create the theory D as in Lemma B.9 above. Let M
be the set of all constant terms in D (which is the same as all constant terms in B or in C
since they share the same language).
We make M into a structure for this language as follows:
for any n-ary function symbol f, define the function fM : Mⁿ → M by
fM(t1, t2, . . . , tn) = f(t1, t2, . . . , tn)
(and note that, in particular, each constant symbol bi is interpreted as itself, (bi)M = bi);
for any n-ary relation symbol r, define the n-ary relation rM on M by
rM(t1, t2, . . . , tn) is true if and only if ⊢ r(t1, t2, . . . , tn) (in D).
We now show that M is a model for D. This will complete the proof, since then it will be a
model for A and is obviously countable. To do this, we show that for any sentence (no free
variables) C in D,
C is true for M if and only if ⊢ C (in D).
This is proved by induction over the construction of C. For atomic C the two sides agree
by the definition of rM above.
• Suppose now that C is ¬A, for some expression A. Then A is a sentence also, so by
the inductive hypothesis, A is true for M if and only if ⊢ A (in D).
If C is true for M, then A is false for M and so ⊬ A (in D).
But D is complete, so ⊢ ¬A (in D), that is, ⊢ C (in D), as required.
On the other hand, if C is not true for M, then A is true for M and so ⊢ A (in D).
Since D is consistent, ¬A is not provable in D, that is, ⊬ C (in D).
(Here is where we use the consistency of D.)
• Suppose next that C is A ⇒ B. (The proof is much the same as for the last case,
but here it is anyway.) Since C is a sentence, so are A and B. Then, by the inductive
hypothesis, A is true for M if and only if ⊢ A (in D), and the same is true for B.
If C is true for M, then either A is false for M or B is true for M. In the first case
⊬ A (in D) and so, D being complete, ⊢ ¬A (in D); in the second, ⊢ B (in D). Either way
a little SL gives ⊢ A ⇒ B, that is, ⊢ C (in D).
On the other hand, if C is not true for M, then A is true and B false for M, so ⊢ A and
⊬ B (in D). If we had ⊢ A ⇒ B, then MP would give ⊢ B: so ⊬ C (in D).
• Suppose finally that C is (∀x)A. If x does not occur in A, then (∀x)A and A are provably
equivalent (in D), and C is true for M if and only if A is, so the result follows immediately
by induction.
We may now suppose that A has exactly one free variable x. Then it is one of the expressions
P1 , P2 , . . . listed at the beginning of the proof: there is some k such that C = (∀xk )Pk .
Let C be true for M. Then, by the definition of a model, Pk is true for every interpretation
in M and so, in particular, Pk[xk/ck] is true in M.
Then, by the inductive hypothesis, ⊢ Pk[xk/ck] (in D).
But D contains, as an axiom, Wk, that is, Pk[xk/ck] ⇒ (∀xk)Pk, and so ⊢ C (in D).
Let C be false for M . Then, again by the definition of a model, there is some interpretation
in M under which Pk is false.
Suppose this interpretation maps xk to t (a member of M and so a constant term of the
language); then Pk[xk/t] is false in M and so, by the inductive hypothesis, is not provable
in D. But then, by PL4, (∀xk)Pk, that is C, is not provable either.
This seems paradoxical — Set Theory contains assertions of the existence of uncountable
infinite sets, P (N) for example. How can such assertions be true in a countable model?
A model, M say, of MK contains members corresponding to sets and a binary relation
corresponding to the relation symbol ∈. However this binary relation on M need not itself
be set membership — it probably won’t be — so let us denote it ◁. Let P be the member
of M corresponding to P(N). Since M is countable, the set of all x in M such that x ◁ P
is countable. Thus, standing outside the theory, we can construct a one-to-one function
between N and these “members” x of P. However there is nothing in the model which does
the job of this function.
And here comes the arbitrary cardinality version. We can prove this version by some minor
tinkering with the proof of the countable one. All we need to do is replace the (countable)
sequence b1 , b2 , . . . by one big enough to be the same size as the set of all those P (x)
statements. We will also want it to be a sequence, indexed by ordinals, so that we can
define the c1 , c2 , . . . subsequence.
AC
B.14 The “Downward” Löwenheim-Skolem Theorem
Every consistent first-order theory in a language K has a model of cardinality no greater
than that of K.
Proof. Let A be the given theory in a language K of cardinality κ. Create a set B = {bi }i<κ
of the same cardinality as K and disjoint from it. Let L be the new language formed from
K by adding (all the members of) B as new constants to the language and extending the
language as necessary. Observe that L is also of cardinality κ. Let B be the correspondingly
extended theory. Now B is consistent by Proposition B.6.
The set of all expressions in B which have at most one free variable is easily seen to be of
cardinality κ, so let us index it thus: {Pi (xi )}i<κ . Define a sequence {ci }i<κ in B by: for
each i < κ, ci is the first member of B that does not occur in any of the {Pj (xj )} for j < i
and is different from all the {cj } for j < i.
(How do we know that such a ci exists at all? Let i∗ be the cardinality of i and note that
i∗ ≤ i < κ. The set of all members of B which occur in any of the {Pj(xj)} for j < i is the
union of i∗ finite sets and so is of cardinality at most max{i∗, ℵ0}. The set of all the {cj}
for j < i is of cardinality i∗. So the set of all members of B which may not be chosen, being
the union of these two sets, is of cardinality at most max{i∗, ℵ0} < κ, since κ is infinite.
But B is of cardinality κ, so there must be members left over.)
Now, for each i < κ, let Wi be the expression
Pi[xi/ci] ⇒ (∀xi)Pi .
Define a sequence of theories {Bν }ν≤κ (all with the same language L) by: Bν is the theory
obtained by adding all of {Wi }i<ν to the axioms of B, and set C = Bκ . It is obvious that
B0 = B
Every Bν is an extension of B.
For every ν, Bν+1 is the extension of Bν obtained by adding the extra axiom Wν .
If ν is a nonzero limit ordinal, then Bν = ⋃{ Bi : i < ν }.
(This proof only requires the Axiom of Choice in its tacit assumption that every language
does in fact have a cardinality. If we do not accept the Axiom of Choice, and so do not
accept that every set necessarily has a cardinality, then we can interpret this theorem to
apply only to languages that do happen to have a cardinality. In that case the Axiom of
Choice is not involved.)
This theorem, together with 5.C.5 above, gives us the big result . . .
AC
B.15 Corollary: the Fundamental Theorem of Model Theory (general case)
A first-order theory is consistent if and only if it has a model.
And, in case we should ever want any really big models, we have another version of the
theorem that is just the thing.
AC
B.16 The “Upward” Löwenheim-Skolem Theorem
(Also called the Löwenheim-Skolem-Tarski Theorem or the LST Theorem.)
Every consistent first-order theory in a language K has a model of every cardinality greater
than or equal to that of K.
Proof. Let us suppose that the cardinality of the language K is κ and that λ is some
cardinal ≥ κ; we construct a model of cardinality λ.
Repeat the proof of Theorem B.14 above with one change: where we create the set B at the
very beginning, now create it {bi }i<λ of cardinality λ. The whole of the rest of the proof
proceeds unchanged.
Observe at the end that, since B ⊆ M ⊆ L, it follows that M has cardinality λ.
(It is possible to rewrite this theorem in a way which does not require the Axiom of Choice:
Every consistent first-order theory in a language K which has a cardinality has a model of
every cardinality greater than or equal to that of K.)
B.17 Corollary
There is a model of First-Order Number Theory (PA) which is uncountable.
(Note that this one does not require the Axiom of Choice, because the theory is countable
and therefore does have a cardinality.)
There is an important consideration which has been ignored so far in this section: we have
not considered equality at all and in particular, there has been no attempt made to ensure
that the models we have been creating respect equality. Now first-order theories do not
necessarily have a symbol for equality, and our theorems so far have been perfectly general:
they apply to any theory, whether it is a theory with equality or not, by the simple expedient
of treating equality just like any other relation. But the result of this is that our models
so far need not respect equality — indeed, they almost certainly will not. However, most
of the first-order theories we are interested in are theories with equality, so the question of
whether such a theory has a model which respects equality is begging for an answer.
B.18 Theorem
If a first-order theory with equality has a model then it has a model which respects equality.
More specifically, if it has a model of cardinality κ, then it has a model of cardinality no
greater than κ which respects equality.
If you are familiar with the notion of a quotient group or a quotient ring from algebra — or
better still, the notion of a quotient of any kind of algebra — then this proof should look
very familiar. In fact, if we ignore the rôle of the relation symbols, a model is just a kind of
algebra, and what we are about to do is create a quotient algebra, at the same time checking
that relations behave the way we need them to.
Proof. Let A be a first-order theory in a language L with equality and M be a model for
A. We use M to construct another model M′ which respects equality.
Step 1 Define a relation ∼ on M by: a ∼ b if either a = b or there are an interpretation
θ : L → M and terms s, t ∈ tL such that ⊢ s = t, θ(s) = a and θ(t) = b.
Note that it is obvious that ∼ is reflexive and symmetric, however it may not be transitive.
One can check that, if a1 ∼ b1, a2 ∼ b2, . . . , an ∼ bn, then, for every n-ary function symbol
f and every n-ary relation symbol r,
fM(a1, a2, . . . , an) ∼ fM(b1, b2, . . . , bn)
and
rM(a1, a2, . . . , an) ⇔ rM(b1, b2, . . . , bn) .
Step 2 Now let ≡ be the transitive closure of ∼: a ≡ b if there is a finite chain
a = u0 ∼ u1 ∼ · · · ∼ uk = b. Then ≡ is an equivalence relation on M, and it enjoys the same
compatibility properties: if a1 ≡ b1, a2 ≡ b2, . . . , an ≡ bn then
fM(a1, a2, . . . , an) ≡ fM(b1, b2, . . . , bn)
and
rM(a1, a2, . . . , an) ⇔ rM(b1, b2, . . . , bn) .
Proof of this: From the definition of ≡, there are natural numbers k1 , k2 , . . . , kn , all ≥ 0, and
members ui,j of M for i = 1, 2, . . . , n and j = 0, 1, . . . , ki such that ai = ui,0 and bi = ui,ki
for i = 1, 2, . . . , n and ui,j−1 ∼ ui,j for i = 1, 2, . . . , n and j = 1, 2, . . . , ki . The trickery with
subscripts here is because the chains of u’s connecting ai to bi may be of different lengths
for different i. However, we can always make any of these chains longer, by simply repeating
the last u, using the fact that ∼ is reflexive. So we may, in fact, assume that all these chains
are the same length, which is the same as assuming that all the ki are the same. So let
us do this; now we have one natural number k and members ui,j of M for i = 1, 2, . . . , n
and j = 0, 1, . . . , k such that ai = ui,0 and bi = ui,k for i = 1, 2, . . . , n and ui,j−1 ∼ ui,j for
i = 1, 2, . . . , n and j = 1, 2, . . . , k. But now, from what was proved above about ∼,
fM(u1,j−1, u2,j−1, . . . , un,j−1) ∼ fM(u1,j, u2,j, . . . , un,j) for each j = 1, 2, . . . , k.
From this
fM(a1, a2, . . . , an) ≡ fM(b1, b2, . . . , bn)
as required. (The argument for a relation symbol r is exactly the same.)
For you algebraists: what we have just done is show that this relation ≡ is a congruence.
Now all we need to do is “quotient it out” to get the model we want.
Step 3 Now, let M′ be the set of all congruence classes in M (equivalence classes under
the congruence ≡). For any member x of M , we will write [x] for the congruence class of x.
We also write π : M → M′ for the “natural projection” x ↦ [x].
We make M′ into a structure for L as follows. For any n-ary function symbol f, define
fM′ : (M′)ⁿ → M′ thus: let X1, X2, . . . , Xn be members of M′, that is, congruence classes in
M. Choose representatives x1, x2, . . . , xn from these classes. Define fM′(X1, X2, . . . , Xn) to
be the congruence class containing fM(x1, x2, . . . , xn). The results just proved above tell us
that the particular choice of representatives x1, x2, . . . , xn in their classes is irrelevant. The
definition then can be written succinctly:
fM′([x1], [x2], . . . , [xn]) = [fM(x1, x2, . . . , xn)] .
Relations are treated in the same way: rM′([x1], [x2], . . . , [xn]) ⇔ rM(x1, x2, . . . , xn).
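For algebraists and programmers alike, the quotienting itself is short enough to sketch in
Python (a toy example with one unary function and a made-up congruence, with frozensets
standing in for congruence classes):

    M = {0, 1, 2, 3}
    f_M = lambda x: (x + 1) % 4              # a sample unary function on M

    # a congruence on M: x and y are identified when they have the same
    # parity (compatible with f_M, since the parity of x determines that of x+1)
    cls = lambda x: frozenset(y for y in M if y % 2 == x % 2)
    M_prime = {cls(x) for x in M}            # the quotient set M'

    def f_M_prime(X):
        x = next(iter(X))                    # choose any representative
        return cls(f_M(x))                   # f_M'([x]) = [f_M(x)]

    # well-defined: the class of the evens maps to the class of the odds
    assert f_M_prime(cls(0)) == cls(1) == f_M_prime(cls(2))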
Step 4 Next we check that, for every expression P and every interpretation θ : L → M,
P is true in M under θ if and only if it is true in M′ under π ◦ θ. We do this by induction
over the construction of P using the formal definition in 5.B.5.
(i) If P = r(t1, t2, . . . , tn) then it is true in M under θ if and only if
rM(θ(t1), θ(t2), . . . , θ(tn)) is true, that is (by the definition of rM′), if and only if
rM′([θ(t1)], [θ(t2)], . . . , [θ(tn)]) is true; and this last says exactly that P is true in M′
under π ◦ θ.
(iv) Suppose that P is (∀v)Q and, as usual, that, for every interpretation θ : L → M,
Q is true in M under θ if and only if it is true in M′ under π ◦ θ. Then for any such θ,
P is true in M under θ
⇔ Q is true in M under θ[v/m] for every m ∈ M
⇔ Q is true in M′ under π ◦ θ[v/m] for every m ∈ M
⇔ Q is true in M′ under (π ◦ θ)[v/[m]] for every m ∈ M
⇔ Q is true in M′ under (π ◦ θ)[v/X] for every X ∈ M′
⇔ P is true in M′ under π ◦ θ.
In the third equivalence here we use the fact that π ◦ θ[v/m] = (π ◦ θ)[v/[m]], which can be
checked by verifying that both these interpretations have the same restrictions to vL. The
fourth equivalence depends upon the fact that π is onto M′.
Step 5 Now we show that M′ is a model for A. Suppose ⊢ P (in A) and θ is any
interpretation L → M′. Let us write α for the restriction of θ to the set vL of variables of
L (this is
the assignment which defines θ). Since the natural projection π is onto M 0 , we may “factor
α through π”, that is, there is a function β : vL → M such that α = π ◦ β. Now β is an
assignment vL → M , and so can be extended uniquely to an interpretation, ψ say, L → M .
Now ψ, restricted to vL, is β, so π ◦ ψ, restricted to vL, is π ◦ β = α. Since an interpretation
is defined by its restriction to the set of variable symbols, it follows that π ◦ ψ = θ. Since
M is a model, P is true in M under ψ. Now it follows from the previous step that it is also
true in M′ under π ◦ ψ = θ, as required.
Step 6 The new model M′ respects equality: if ⊢ s = t and θ is any interpretation
L → M′, construct ψ : L → M so that θ = π ◦ ψ, as in the previous step. Then ψ(s) ∼ ψ(t),
so ψ(s) ≡ ψ(t), so [ψ(s)] = [ψ(t)], that is, θ(s) = θ(t).
Also, the function π : M → M′ is onto, so the cardinality of M′ is no greater than that of
M.
(Note that the Axiom of Choice is used in this last statement. It has also been used earlier
in the proof — in Step 5 it is required to factor α through π.)
AC
B.19 Remark
It follows from this theorem that any consistent theory has a model which respects equality,
however the theorem as proved is not so specific about the cardinality of the model.
For example, a countable theory has a countable (ordinary) model M , and then the theorem
constructs the model which respects equality as the quotient model M′, however this may
be much smaller than M . In other words, it might be countably infinite or it might be finite.
Here is an easy example. Choose any particular positive integer, say 3. Then it is easy enough
to construct a theory with axioms which ensure that a model which respects equality must
have exactly three members. For example, include three (different) constant symbols, a, b
and c, and the axioms
¬(a = b)
¬(a = c)
¬(b = c)
(∀x)( (x = a) ∨ (x = b) ∨ (x = c) )
If the model is not required to respect equality, we may construct an infinite model as above.
The only effect axioms such as these would have is to force every member of the model to
stand in the relation =M to one of the three elements which interpret a, b and c — but this
relation is not actual equality.
In setting up a theory to describe a model (for example PA to describe N), we choose some
function and relation symbols and axioms governing them. It is always possible that, either
unwittingly or on purpose, we fail to supply enough structure or axioms to completely specify
the model. In this case our theory will have several non-isomorphic models.
Now, if our theory does have non-isomorphic models, that means that there are some things
which are true in some models and false in others (that’s what non-isomorphism means).
Now these things may or may not be expressible in the language of the theory.
For example, the usual model of Peano Arithmetic is the countable set N. But the Upward
Löwenheim-Skolem Theorem tells us that it also has an uncountable model. This “count-
ability” fact, true in one model and false in the other, cannot be expressed in the language
of Peano Arithmetic.
An example of a theory which has non-isomorphic models and a fact, expressible in the language, which is true for some models and false for others is not hard to manufacture. Take any consistent first-order theory which has an independent set of axioms (for instance PA). Make a weaker theory by choosing one of the axioms (call it A) and removing it. This weaker theory has a model in which A is true, and another in which ¬A is true. (Why? Since the original axioms were independent, both the original theory and the theory in which A is replaced by ¬A are consistent, and therefore have models.)
We have seen that a first-order theory may well have a number of non-isomorphic models. This raises the likelihood that there may be things that can be said in the language that are true in some models and false in others.
In the next section we will see some more interesting examples of non-isomorphic (perhaps
unexpected) models.
In view of all this, it would be nice if at least any fact that can be expressed in the language
and is true in all models has to be a theorem. A theory which has this property is called
adequate. The next theorem tells us that all first-order theories have this desirable property.
(The proof is now easy, but depends upon the heavy work we have been doing in this
chapter.)
In other words again, if A is a first-order theory with axioms A and P is any expression in
A, then
A ⊢ P if and only if A ⊨ P .
The other part of the argument is the same as for the previous theorem. Suppose that P is not provable in A (that is, that it is not the case that A ⊢ P). Form the extension B of A by adding ¬P as a new axiom. We know that B is consistent. By Theorem B.18, B has a model which respects equality. This is also a model for A, and in it P is not true.
C Compactness
The following theorem is easy to prove and looks innocuous, but it has a number of fasci-
nating (and very useful) consequences, some of which we will explore in this section.
The Compactness Theorem
If every finite subset of a set of expressions A generates a consistent theory, then A itself generates a consistent theory.

Proof. This is straightforward — we have met this idea before. If A does not generate a consistent theory, then there is a statement P such that A ⊢ P ∧ ¬P. Only a finite number of members of A can be involved in the proof of this, so that finite subset generates an inconsistent theory.
The Fundamental Theorem of Model Theory (B.14) immediately gives us an equivalent form of this theorem.
First we add the new symbol ω̄. This automatically extends the language to include all the
new expressions containing ω̄. And, since this is a first-order theory, the logic axioms, which
are schemata, must also be extended to encompass all these new expressions. The induction
axiom PA3 is a schema, but it is a proper axiom, so we don’t extend it: it only refers to
expressions in the original language. Now, since we have not yet added the new axioms, this
extended theory is also consistent. (This is by Lemma B.6.)
The answer to this question is that, in the argument above, I jumped too quickly to a conclusion, namely that

(∀x)(x < ω̄ ⇒ x⁺ < ω̄) .    (!!)

Each individual instance of this is easy, but to get from the new axioms to this supposed theorem would involve all of those infinitely many axioms, and so an infinitely long proof: and there is no such thing. Now one might imagine that there was some sneaky way of getting around this problem, and proving the statement labelled (!!) some other way. But this cannot be, because if it were a theorem, the rest of the argument above is watertight and we would indeed have ω̄ < ω̄, which is an antitheorem (by the definition of <). And that would show our new theory PAω̄ to be inconsistent, contradicting the Compactness Theorem.
This is one example of a number of similar games that can be played with PA, probably the
best-known one. They are generally called non-standard arithmetic.
C.4 Example: A model of the reals with infinite numbers and infinitesimals
We can apply a similar idea to the real numbers to add infinitesimals. To do this we are
going to represent R by a language having uncountably infinite sets of function and relation
symbols and an uncountably infinite number of axioms.
We define our language, L say, to have a function symbol f̂ of arity n corresponding to every actual function f : Rⁿ → R — the whole 2^(2^ℵ₀) of them. Also a relation symbol r̂ of arity n corresponding to every actual relation of arity n, that is, every subset of Rⁿ — again 2^(2^ℵ₀) of them.
For our theory we simply take every expression in this language which is true in R, and we
take the axioms to be the same set of expressions (or better, just the closed ones): there’s
no harm in that, even if it is a bit unusual.
So now we have an enormous first-order theory. Nevertheless, it is in many ways quite easy
to deal with. For example, it is obvious, from the way it was built, that R (with all its
functions and relations) is a model.
Now we add an infinitesimal in much the same way as we added an infinity to PA. We add a new constant (nullary function) ι and axioms

0̂ < ι
ι < x̂ for every positive real number x .

As with the previous example, R (with ι interpreted as a small enough positive real) is a model for every finite set of these axioms. Consequently, by the Compactness Theorem, the extended theory has a model.
This model is the “nonstandard reals” and is used for Nonstandard Analysis. This is lots
of fun. Because of the existence of infinitesimals, things like dx and dy become first class
citizens and you can do calculus the way you’ve always secretly wanted to. Well, almost.
C.5 Theorem
If a first-order theory has arbitrarily large finite models which respect equality, then it has
an infinite model which respects equality.
Proof. Suppose that the theory A has arbitrarily large finite models which respect equality. Since it has models, it is consistent.
From this form a new theory A′ by adding a countable number of new constant symbols b1, b2, . . . (and extending the axiom schemata of predicate logic as necessary to encompass the new symbols). Then A′ is consistent by Proposition B.6.
Now, for each natural number n, extend the theory A′ by adding new axioms

¬(bi = bj) for all i and j with 1 ≤ i < j ≤ n .

(These axioms “say” that b1, b2, . . . , bn are all different.) Call the theory with these new axioms Bn.
We now prove that Bn is consistent for each n. Our original theory A has a model, M say, of size at least n. Thus, we can make it into a structure, M′ say, for A′ by defining new constants in it, b̄1, b̄2, . . . say. Choose these so that b̄1, b̄2, . . . , b̄n are all different; the remaining ones b̄n+1, b̄n+2, . . . can be chosen any way at all — for example they could all be equal to b̄1.
C.6 Remark
This remark applies only to theories with equality.
For any natural number n we like to choose, we can give an axiom which ensures that any model has at least n members. Here is the example for n = 3:

(∃x)(∃y)(∃z)(x ≠ y ∧ x ≠ z ∧ y ≠ z)

In a similar fashion, for any natural number n we can give an axiom which ensures that any model has at most n members. Here is the example for n = 3:

(∃x)(∃y)(∃z)(∀w)(w = x ∨ w = y ∨ w = z)

I think it is obvious how to construct such axioms for any n.
These two forms can be combined into an axiom which ensures that the model contains exactly n members. Here is the example for n = 3:

(∃x)(∃y)(∃z)( x ≠ y ∧ x ≠ z ∧ y ≠ z ∧ (∀w)(w = x ∨ w = y ∨ w = z) )
If a theory has this last axiom and no more, then any model of it which respects equality
must have exactly n members. If we add this axiom to an existing theory, then either the
resulting theory will be inconsistent or any model will have exactly n members.
Let Fn be the axiom above which states that the model has exactly n members. Then ¬Fn states that the model does not have n members, and so the countably infinite set of axioms

¬F1 , ¬F2 , ¬F3 , . . .

forces any model which respects equality to be infinite.
We can also force a model to be infinite with only one axiom if we add a new binary function symbol, f say. Use the axiom

(∀x)(∀y)(∀z)(∀w)( (f (x, y) = f (z, w)) ⇒ (x = z ∧ y = w) )

This simply states that f is injective. But that makes the interpretation of f an injective function M × M → M and, apart from the trivial case of a one-element M (easily excluded by one more axiom), the only way this can happen is for M to be infinite: a finite M with at least two members has more members in M × M than in M.
On the other hand, the last theorem above tells us that there is no set of axioms (finite or infinite, and irrespective of what relations or functions we allow ourselves) which says “the model is finite” in the same way.
In the context of Theorem B.18 and the following remarks, the next corollary is of interest.
C.7 Corollary
(i) Let T be a countable theory with equality which has an infinite model which respects
equality. Then it has a countably infinite model which respects equality.
Part III

COMPUTABILITY
9. RECURSIVE FUNCTIONS

A Partial recursive functions

A.1 Discussion
In this chapter we examine the notion of an algorithm. Think of an algorithm as a cut-and-dried method of computing something, specified by a recipe of some kind which can be followed blindly, without any creative thought required. A computer program makes quite a good analogy (or example).
Algorithms are not confined to working with numbers; they can work with other kinds of data, though it will be assumed that the data can be represented symbolically in some way. In Appendix C we will look briefly at algorithms in their widest applications, but in this chapter we will start with algorithms for computing functions involving natural numbers only, that is, functions N → N or more widely Nᵐ → Nⁿ.
The arithmetic processes of addition and multiplication of large numbers and of long division
are examples of algorithms which should be familiar from schooldays.
Numerical algorithms, such as we will be discussing mostly, are expected to work for numbers
of any size. Consequently they will typically contain some looping arrangement, something
like “Keep doing these instructions until so-and-so happens”.
So an algorithm will consist of a list of instructions, each one of which is quite unambiguous
and can always be performed, together with an arrangement to prescribe, again unambigu-
ously, which instruction to perform next. (Typically, if written in natural language, the
instructions will be followed line-by-line down the page except where “branching instruc-
tions” which create loops are encountered.) There will, of course, be one or more “End”
instructions of the form “OK, we are finished now. Here is your answer”.
One aspect of algorithms needs to be mentioned: any set of instructions, however stupid or
ridiculous, will define an algorithm, so long as those instructions are clear and unambiguous
and can be followed.
So, if one designs an algorithm to compute some function — for instance the nth prime,
given n — the first question one might ask is, “Does it actually compute the function we
want?”, followed quickly by, “Are there perhaps any special values which fail for reasons we
have overlooked?”
But there is a more basic and immediate problem with any algorithm, whether we are trying
to design one to compute a known function, or are simply writing down some interesting
instructions to see what will happen: the algorithm may fail to produce an answer at all.
Given that, as specified above, each instruction can always be performed and the algorithm
always specifies which instruction to perform next (except for “End” ones), the only way an
algorithm can fail to produce an answer is that it never gets to such an “End” instruction.
In that circumstance the algorithm will just keep going forever, usually called “getting into
an infinite loop”. (Computer programmers are familiar with this possibility.)
So, for any algorithm, and in particular the ones to compute functions Nᵐ → Nⁿ discussed in this chapter, we have to allow for the possibility that, for some input values, there will be no output value. In other words, we will be dealing with partial functions. I will use, as much as possible, a special dashed arrow to denote this: Nᵐ ⇢ Nⁿ.
In this chapter I start with the most common definition of partial functions Nᵐ ⇢ Nⁿ which can be computed algorithmically. The standard word for these is partial recursive functions. In fact, we start with functions Nᵐ ⇢ N and then make the more general definition later.
The definition we start with really does not look as though it will cover every algorithmically computable function, but it is the standard definition, so let us run with it. As we progress through the chapter it will become more apparent that pretty well any way of computing a function you can think up will be covered by this definition. But that of course is not a proof that we are indeed covering every algorithmically computable function this way; that we will establish in Appendix C.
A.2 Notation
(1) Partial functions. We deal with partial functions Nⁿ ⇢ N, that is, functions D → N, where D is a subset of Nⁿ. The subset D may be equal to Nⁿ, in which case the function is total, or it may be empty, in which case the function has no values at all and we say the function is empty. Wherever possible I will use a dotted arrow for partial functions, as above.

(2) Substitution. Given a partial function f : Nᵐ ⇢ N and partial functions g1, g2, . . . , gm : Nⁿ ⇢ N, substitution yields the partial function h : Nⁿ ⇢ N defined by h(x) = f(g1(x), g2(x), . . . , gm(x)), which is defined exactly where all the gi(x) are defined and f is defined at the resulting values. A useful notation for this kind of substitution is to write the function simply

h = f (g1 , g2 , . . . , gm )
(3) Minimalisation. Now let g be a partial function Nⁿ⁺¹ ⇢ N. The operation of minimalisation applied to g yields a new partial function Nⁿ ⇢ N, often written µg, and defined

µg(y) = the smallest value of x such that g(x, y) = 0 and g(ξ, y) is defined for all ξ < x, provided such an x exists, and is undefined if no such x exists.
Be careful of this notation; it can be misleading because it obscures the important fact that,
for x to be the value of µg(y), not only must x be the first zero of g(x, y), but all preceding
values must exist.
It might help to think of this operation as looking, for each fixed y, for the first zero of the function x ↦ g(x, y) in a very straightforward algorithmic sort of way. One just computes g(0, y), g(1, y), g(2, y), . . . in order, stopping at the first x for which g(x, y) = 0 and then returning that x as the value. This can fail to return a value for two reasons: firstly, there may be no such x, that is, g(x, y) might never be zero and, secondly, even if there is such an x, one might encounter some i < x at which g(i, y) fails to exist, in which case the whole searching process fails right there.
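Since this searching process is itself a perfectly definite algorithm, it may help to see it written out as a program. Here is a small Python sketch (Python rather than anything official in these notes; modelling an undefined value of g by the return value None is a convention of this sketch only, since a genuinely undefined value would simply make the computation of g run forever):

    def minimalise(g):
        # Given a partial function g(x, y1, ..., yn), with None standing
        # for "undefined", return the partial function mu_g(y1, ..., yn).
        def mu_g(*y):
            x = 0
            while True:
                v = g(x, *y)     # compute g(0,y), g(1,y), g(2,y), ... in order
                if v is None:    # a preceding value fails to exist:
                    return None  # the whole search fails right there
                if v == 0:       # first zero found: return its position
                    return x
                x += 1           # otherwise keep going (possibly forever)
        return mu_g

    # Example: the least x with x + y >= 5
    g = lambda x, y: max(5 - (x + y), 0)
    print(minimalise(g)(3))      # prints 2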
(4) Natural subtraction is the operation ∸ defined

x ∸ y = x − y if x ≥ y, and 0 otherwise.

This is the closest we can expect to get to ordinary subtraction and still be a function N² → N, that is, to avoid getting any negative values.
(ii) The identity function idN : N → N given by n ↦ n is just the projection function π1,1.
(iii) The zero constant function 0 : N⁰ → N (which takes the empty sequence ⟨ ⟩ to 0) is obtained from the identity function id : N → N by minimalisation (the first x such that id(x) = 0 is 0).
The zero constant function Nⁿ → N, for any n, can be obtained by minimalisation of πn+1,1 : Nⁿ⁺¹ → N. If n ≠ 0 it can be obtained alternatively as πn,1 ∸ πn,1.
Other constant functions Nⁿ → N are obtained by composing the appropriate zero function with the successor function a suitable number of times. For example, the function Nⁿ → N which takes every ⟨x1, x2, . . . , xn⟩ to 2 is given by suc(suc(0(x1, x2, . . . , xn))).
(iv) The difference function ⟨x, y⟩ ↦ |x − y| is obtained by |x − y| = (x ∸ y) + (y ∸ x).
(v) Here are a few simple “test functions” which check for some condition and return 1
for true or 0 for false.
The zero test function N → N and the equality test function N² → N, defined

zero(x) = 1 if x = 0, and 0 otherwise;    eq(x, y) = 1 if x = y, and 0 otherwise

are obtained as zero(x) = 1 ∸ x and eq(x, y) = 1 ∸ |x − y|. And of course we have a nonzero test function N → N and the inequality test function N² → N, defined

nonzero(x) = 1 if x ≠ 0, and 0 otherwise;    neq(x, y) = 1 if x ≠ y, and 0 otherwise

obtained as nonzero(x) = 1 ∸ (1 ∸ x) and neq(x, y) = 1 ∸ (1 ∸ |x − y|).
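These little identities are easy to machine-check. A Python sketch of mine, with natural subtraction written out explicitly:

    def nsub(x, y):                  # natural subtraction
        return x - y if x >= y else 0

    def absdiff(x, y):               # |x - y| = (x nsub y) + (y nsub x)
        return nsub(x, y) + nsub(y, x)

    def zero(x):    return nsub(1, x)                # 1 nsub x
    def eq(x, y):   return nsub(1, absdiff(x, y))    # 1 nsub |x - y|
    def nonzero(x): return nsub(1, nsub(1, x))       # 1 nsub (1 nsub x)
    def neq(x, y):  return nsub(1, nsub(1, absdiff(x, y)))

    assert [zero(0), zero(7), eq(3, 3), eq(3, 5)] == [1, 0, 1, 0]
    assert [nonzero(0), nonzero(7), neq(3, 3), neq(3, 5)] == [0, 1, 0, 1]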
by repeating this construction in the obvious way. We cannot get general exponentiation ⟨x, y⟩ ↦ xʸ as easily as this. It requires, dare I say it, a higher-powered method.
(vii) An integer square root, x ↦ ⌊√x⌋ (the largest natural number ≤ √x), can be described as that natural number y for which y ≤ √x < y + 1, which is the same as y² ≤ x < (y + 1)². It is thus the smallest natural number y for which (y + 1)² > x; thus

⌊√x⌋ = min_y{ (y + 1)² > x }
     = min_y{ leq((y + 1)², x) = 0 } .

This looks back-to-front, but it is correct because it is the function leq, not gtr, which yields 0 if and only if (y + 1)² > x. We are sneakily getting round the fact that, for tests like these, minimalisation searches for the first 0 = false instead of 1 = true.
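To see this definition in action, here is the search carried out literally in Python (an illustration of mine, not part of the formal development):

    def leq(a, b):
        return 1 if a <= b else 0

    def int_sqrt(x):
        # minimalisation: try y = 0, 1, 2, ... until leq((y+1)^2, x) = 0,
        # i.e. until (y+1)^2 > x
        y = 0
        while leq((y + 1) ** 2, x) != 0:
            y += 1
        return y

    print([int_sqrt(x) for x in range(10)])   # [0, 1, 1, 1, 2, 2, 2, 2, 2, 3]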
(viii) Integer division, ⟨x, y⟩ ↦ ⌊x/y⌋. In the case y ≠ 0, this is the natural number z such that z ≤ x/y < z + 1, that is, such that yz ≤ x < y(z + 1). It is thus the smallest value of z for which y(z + 1) > x, so

⌊x/y⌋ = min_z{ leq(y(z + 1), x) = 0 } .

As luck would have it, this also works when y = 0, whether or not x = 0 too, because then there is no z such that y(z + 1) > x, so the minimalisation fails to return a value.
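The same search, written out in Python, makes the partiality at y = 0 vivid: the loop genuinely never terminates there, exactly as the minimalisation fails to return a value (again a sketch of mine; do not call it with y = 0 unless you have time on your hands):

    def leq(a, b):
        return 1 if a <= b else 0

    def int_div(x, y):
        # least z with y*(z+1) > x, i.e. with leq(y*(z+1), x) = 0;
        # for y = 0 no such z exists and the loop runs forever
        z = 0
        while leq(y * (z + 1), x) != 0:
            z += 1
        return z

    print(int_div(17, 5))   # prints 3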
(ix) The function

Rem(x, y) = the remainder upon dividing x by y if y ≠ 0, and is undefined if y = 0

is obtained by Rem(x, y) = x ∸ y⌊x/y⌋ = x − y⌊x/y⌋ (since x ≥ y⌊x/y⌋ always).
Remark Integer division and the Remainder function are not recursive as defined, only partial recursive (both being undefined when y = 0). However, as we will be using them, they usually give rise to recursive functions. For instance, suppose that f and g are recursive functions and define

h(x) = ⌊f (x)/g(x)⌋ .

This clearly defines h as a partial recursive function. However, if we happen to know that g(x) is never zero, then h is also a total function and so, in fact, recursive. The same idea holds good for the Remainder function.
Why? Algorithms, remember, are not supposed to be smart; they just do exactly what they
are told. In this case the algorithm says, “Compute f (x). Now compute f (x) again. Now
natural subtract the second value from the first.” If f (x) is undefined, the algorithm fails
to complete the first step. The same goes of course for functions of several variables.
Question
Which of the following are true for any partial functions f, g, h : Nᵐ ⇢ N and x ∈ Nᵐ?
(i) If f (x) + g(x) = h(x) then f (x) = h(x) ∸ g(x).
(ii) If f (x) = h(x) ∸ g(x) then f (x) + g(x) = h(x).
(iii) If f (x) + g(x) = f (x) + h(x) then g(x) = h(x).
We see that most of this is built up by substituting the basic functions listed in the definition
into one another, but the question is, where do the x and y come from to substitute in here?
This is where the projection functions come in: we have x = π2,1 (x, y) and y = π2,2 (x, y) so
we can write the function as
⟨x, y⟩ ↦ +(×(π2,1 (x, y), π2,1 (x, y)), ×(π2,2 (x, y), π2,2 (x, y)))
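If we make substitution itself into an explicit (higher-order) operation, this bookkeeping can be carried out mechanically. A Python sketch of mine:

    def proj(n, i):
        # the projection function pi_{n,i}
        return lambda *x: x[i - 1]

    def subst(f, *gs):
        # substitution: produce x |-> f(g1(x), ..., gm(x))
        return lambda *x: f(*(g(*x) for g in gs))

    add = lambda a, b: a + b
    mul = lambda a, b: a * b
    p21, p22 = proj(2, 1), proj(2, 2)

    # the displayed expression: add of mul(p21, p21) and mul(p22, p22),
    # i.e. <x, y> |-> x^2 + y^2
    sum_of_squares = subst(add, subst(mul, p21, p21), subst(mul, p22, p22))
    print(sum_of_squares(3, 4))   # prints 25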
Here now is a more challenging problem: the integer square root given in (vii) above. It was defined as

⌊√x⌋ = min_y{ leq((y + 1)², x) = 0 } .
Before we can go further, we must realise that this is an “easy to read” version of the minimalisation operation, which should strictly be written

⌊√x⌋ = µg(x) where g is the function g(y, x) = leq((y + 1)², x) .
Here we have two functions to sort out, leq and 1. Note that here we need to treat the constant 1 as the constant function N² → N. We can get this constant function by composing the zero function with the successor function, and we get the zero function as π2,1 ∸ π2,1, so

1 = suc(∸(π2,1 , π2,1)) .

The leq function is given as leq(x, y) = 1 ∸ (x ∸ y), so it is

leq = ∸(1, ∸(π2,1 , π2,2)) ,

the 1 here being the constant function 1 : N² → N, as discussed above. I think it is now clear how to substitute these various functions into one another to get the integer square root.
There are two ways of looking at the equation

f (x) = ⟨f1 (x), f2 (x), . . . , fn (x)⟩ ,    (–1)

where f1 , f2 , . . . , fn are partial functions Nᵐ ⇢ N. (I am using the boldface x as shorthand for the m-tuple ⟨x1 , x2 , . . . , xm ⟩.)
(A) We start with (or are given) a partial function f : Nᵐ ⇢ Nⁿ and use (–1) to define the component functions f1 , f2 , . . . , fn .
(B) We start with (or are given) the individual functions f1 , f2 , . . . , fn and use (–1) to define f .
The problem is that there is a slight difference between these two approaches. Consider
approach (B). The given functions f1 , f2 , . . . , fn are all partial and may well have different
domains. But then f (x) will exist exactly when all the fi (x) exist; that is, the domain of
f will be the intersection of the domains of all the fi .
On the other hand, if we use approach (A), we are defining all the component functions fi
to have the same domain as f .
The point is that now we want to define what it means to say that a partial function Nᵐ ⇢ Nⁿ is partial recursive in a way which will work even when n ≠ 1. So we have a question:
should we use approach (A) and define such a function to be partial recursive if its component functions are, or should we use approach (B) and say that there are some partial recursive functions f1 , f2 , . . . , fn such that (–1) holds?
Happily, we do not need to make a choice, because the two approaches are equivalent: a partial function f : Nᵐ ⇢ Nⁿ is partial recursive or not irrespective of which definition we use. So let us choose approach (A) as our definition, since it is likely to be the easiest one to verify, and prove that approach (B) is equivalent.
(i) Given a partial function f : Nᵐ ⇢ Nⁿ, its component functions f1 , f2 , . . . , fn : Nᵐ ⇢ N are the functions defined by Equation (–1); they all have the same domain as f .
(ii) A partial function f : Nᵐ ⇢ Nⁿ is partial recursive if all its components are.
A.9 Proposition
Let f1 , f2 , . . . , fn be partial recursive functions Nᵐ ⇢ N. Then the function f : Nᵐ ⇢ Nⁿ defined by f = ⟨f1 , f2 , . . . , fn ⟩ is partial recursive also. (The same result holds trivially for recursive functions.)
The point here is that the functions f1 , f2 , . . . , fn may not all have the same domain, and in this case, as pointed out above, they are not the components of f .
Proof. Firstly, note that dom(f ) = dom(f1 ) ∩ dom(f2 ) ∩ . . . ∩ dom(fn ). Let us write
g1 , g2 , . . . , gn for the genuine components of f . These are the restrictions of the fi to
dom(f ). We want to show that they are all partial recursive also.
For each i = 1, 2, . . . , n let us write di for the partial function Nᵐ ⇢ N given by

di (x) = 1 if x ∈ dom(fi ), and is undefined otherwise.

Now di (x) = fi (x) ∸ fi (x) + 1, so these functions are all partial recursive. Here we are using the trick in A.5. Therefore so is their product d(x) = d1 (x)d2 (x) . . . dn (x), and this is the function

d(x) = 1 if x ∈ dom(f ), and is undefined otherwise.
But now the components of f are given by gi (x) = d(x)fi (x) for each i, so these are partial
recursive, as required.
Observe that the requirement of closure under the operation of substitution in our original definition of partial recursive functions may be replaced by closure under the two slightly simpler operations:
A.10 Proposition
There are recursive functions P : N² → N and L, R : N → N such that P is a bijection, with inverse given by z ↦ ⟨L(z), R(z)⟩.

The proof is interesting, but not particularly illuminating. Read it if you are interested. The important fact to take away is that P , L and R exist: we will be using them often.
Proof. We must define the two functions, show that they are inverses and are recursive. We define P by the usual way of enumerating N² (running along the “back diagonals”):

            x
            0   1   2   3
        0   0   2   5   9
    y   1   1   4   8
        2   3   7
        3   6               etc.

This gives the formula

P (x, y) = ½(x + y)(x + y + 1) + x    (–1)

and so P is recursive (see the remark at the end of Proposition A.4).
Suppose now that P (x, y) = z, so that x = L(z) and y = R(z). By simple algebraic manipulations we obtain two further recursive functions Q1 and Q2 of z, satisfying x + y = Q1 (z) (this is equation (3)) and Q2 (z) − Q1 (z) = 2x. Since x is a natural number we must have Q2 (z) − Q1 (z) ≥ 0 and even. Thus

x = (Q2 (z) − Q1 (z))/2 .

Since this is true for all x, it gives us a formula for L as a recursive function:

L(z) = (Q2 (z) − Q1 (z))/2 .

Substituting back into (3) gives R(z) = Q1 (z) − L(z).
Note for later use: if z and z′ are two natural numbers such that L(z) = L(z′) and R(z) = R(z′), then z = z′.
Also we have
P (L(z), R(z)) = z for all z
and
L(P (x, y)) = x and R(P (x, y)) = y for all x and y.
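Here are P , L and R in Python (a sketch of mine: the notes recover L and R via the recursive functions Q1 and Q2 above, whereas this sketch takes the shortcut of inverting P with an integer square root):

    from math import isqrt

    def P(x, y):
        # run along the back diagonals
        return (x + y) * (x + y + 1) // 2 + x

    def diag(z):
        # the number s = x + y of the diagonal on which z lies
        return (isqrt(8 * z + 1) - 1) // 2

    def L(z):
        return z - diag(z) * (diag(z) + 1) // 2

    def R(z):
        return diag(z) - L(z)

    assert all(P(L(z), R(z)) == z for z in range(1000))
    assert all(L(P(x, y)) == x and R(P(x, y)) == y
               for x in range(30) for y in range(30))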
We now embark on a series of technical lemmas whose main object is to prove that any
function defined by ordinary induction is partial recursive (provided we start from partial
recursive functions of course).
A.11 Lemma
Let ν be any natural number which is divisible by all of 1, 2, . . . , n. Then the n + 1 numbers

1 + ν(i + 1) ,    i = 0, 1, 2, . . . , n

are pairwise coprime.

Proof. Any divisor of 1 + ν(i + 1), other than 1, must be greater than n, since every prime p ≤ n divides ν and so cannot divide 1 + ν(i + 1). Now suppose that 0 ≤ i < j ≤ n and d divides both 1 + ν(i + 1) and 1 + ν(j + 1). Then d divides (j + 1)(1 + ν(i + 1)) − (i + 1)(1 + ν(j + 1)) = j − i. Thus d ≤ j − i ≤ n, so d = 1.
A.13 Lemma
Let a0 , a1 , . . . , an be a finite sequence of natural numbers. Then there are natural numbers
u and v such that
Rem(u + 1, 1 + v(i + 1)) = ai for i = 0, 1, . . . , n .
Proof. Let A = max{a0 , a1 , . . . , an } and v = (A + 1)n!. By Lemma A.11 above, the numbers 1 + v(i + 1) are pairwise coprime. Also ai < v < 1 + v(i + 1) for all i. By the modified version of the Chinese Remainder Theorem, there is a nonzero natural number x such that

x ≡ ai (mod 1 + v(i + 1)) for i = 0, 1, . . . , n

and so, setting x = u + 1, there is a natural number u such that

u + 1 ≡ ai (mod 1 + v(i + 1)) for i = 0, 1, . . . , n ,

that is

Rem(u + 1, 1 + v(i + 1)) = Rem(ai , 1 + v(i + 1)) = ai , since ai < 1 + v(i + 1).
The next “lemma” is interesting enough to stand as a proposition in its own right.
A.14 Proposition
There is a recursive function T : N² → N with the property that, for every finite sequence a0 , a1 , . . . , an of natural numbers, there is a natural number w such that T (w, i) = ai for i = 0, 1, . . . , n.

Proof. Define

T (w, i) = Rem(L(w) + 1, 1 + R(w)(i + 1)).

Then T is obviously recursive (see the remark at the end of Proposition A.4). Now, given a0 , a1 , . . . , an , by the previous lemma there are natural numbers u and v such that ai = Rem(u + 1, 1 + v(i + 1)) for i = 0, 1, . . . , n. Setting w = P (u, v) does the trick.
Note what this says. Suppose we display the values of the function T as a two-dimensional
array:
         0        1        2        3        4
    0  T(0,0)   T(0,1)   T(0,2)   T(0,3)   T(0,4)   . . .
    1  T(1,0)   T(1,1)   T(1,2)   T(1,3)   T(1,4)   . . .
    2  T(2,0)   T(2,1)   T(2,2)   T(2,3)   T(2,4)   . . .
    3  T(3,0)   T(3,1)   T(3,2)   T(3,3)   T(3,4)   . . .
    4  T(4,0)   T(4,1)   T(4,2)   T(4,3)   T(4,4)   . . .
       . . .
Then every finite sequence of natural numbers turns up somewhere in this array as the
beginning of a row:
T (w, 0) , T (w, 1) , T (w, 2) , . . . , T (w, n)
for some w.
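Here is the whole arrangement in Python (my own sketch: L and R are inverted by the integer-square-root shortcut rather than via Q1 and Q2, and Python's pow(M, -1, m) carries out the Chinese Remainder step in the proof of Lemma A.13):

    from math import isqrt, factorial

    def P(x, y): return (x + y) * (x + y + 1) // 2 + x
    def diag(z): return (isqrt(8 * z + 1) - 1) // 2
    def L(z):    return z - diag(z) * (diag(z) + 1) // 2
    def R(z):    return diag(z) - L(z)

    def T(w, i):
        # T(w, i) = Rem(L(w) + 1, 1 + R(w)(i + 1))
        return (L(w) + 1) % (1 + R(w) * (i + 1))

    def encode(seq):
        # follow the proof of Lemma A.13: v = (A+1)n!, then solve
        # u + 1 = a_i  (mod 1 + v(i+1))  for i = 0, ..., n by CRT
        n = len(seq) - 1
        v = (max(seq) + 1) * factorial(n)
        x, M = 0, 1
        for i, a in enumerate(seq):
            m = 1 + v * (i + 1)
            x += M * (((a - x) * pow(M, -1, m)) % m)
            M *= m
        if x == 0:
            x = M                  # make sure x is nonzero
        return P(x - 1, v)         # w = P(u, v) with u = x - 1

    w = encode([3, 1, 4, 1, 5])
    print([T(w, i) for i in range(5)])   # prints [3, 1, 4, 1, 5]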
This proposition will play an important rôle in the next section.
A.15 Remarks
It would be nice if we could construct any recursive function as in Definition A.3, but in such
a way that all intermediate functions involved are also recursive, i.e. total. But we observe
that the definition allows the construction of a recursive function in such a way that one or
more of the intermediate functions involved are in fact partial; it is by no means obvious
that such a construction can be modified to avoid these intermediate partial functions.
A regular function is a total function which yields a total function when minimalised. More
precisely, a function f : Nn+1 → N is regular if it is total and, for every y ∈ Nn , there is
some x ∈ N such that f (x, y) = 0.
To return to our wishful thinking: if in Definition A.3 we restrict the operation of minimalisation to apply only to regular functions, then all functions so produced would be total and so recursive. It would be nice if all recursive functions could be formed in this way, but, as stated above, it is not at all obvious. Nevertheless it is so. We will prove this as a corollary to a rather big theorem in the Appendix, Corollary C.E.18.
As a first attempt, we might ask for an algorithm that will yield the answer “true” or “false”,
that is, a function f : Nn → {true, false}. But this won’t do: we need a function Nn → N.
But there is an obvious way to fix this: let the function take values 1 for “true” and 0 for
“false”. Thus, given a subset A of Nn , we want a function f : Nn → N such that f (x) = 1 if
x ∈ A and f (x) = 0 otherwise. That is easy: it is called the characteristic function of the
set A.
At this stage you might pause to wonder what a “partially recursive” subset might be. But
if you try to write down a definition of this idea, you will find it doesn’t make any sense (try
it). So the concept is not defined. (The closest thing is “recursively enumerable”, which we
look at later.)
Given the very close relationship between n-ary relations on N and subsets of Nn , it is
natural to also define recursive relations. The word predicate is synonymous with relation
and tends to be the one usually used in this context, so I will use it here.
(i) A subset A of Nn is recursive if its characteristic function χA : Nn → N is recursive.
Here the characteristic function is defined in the usual way:

χA (x) = 1 if x ∈ A, and 0 otherwise.

(ii) An n-ary predicate P on N is recursive if the corresponding subset { x ∈ Nⁿ : P (x) } of Nⁿ is recursive, that is, if its characteristic function χP is recursive.
A.17 Proposition
(i) Let P and Q be two n-ary predicates (defined on Nn ). If P and Q are both recursive
then so are ¬P , P ∧ Q , P ∨ Q , P ⇒ Q and P ⇔ Q.
Further, if f1 , f2 , . . . , fn are recursive functions Nm → N, then P (f1 (x), f2 (x), . . . , fn (x)) is
an m-ary recursive predicate.
(ii) Let P be a unary predicate (on N). If P is recursive, then so are the unary predicates
(∀ξ ≤ x)P (ξ) , (∃ξ ≤ x)P (ξ) , (∀ξ < x)P (ξ) and (∃ξ < x)P (ξ)
More generally, suppose P is a predicate on Nn+1 . If P is recursive, then so are the (n+1)-ary
predicates
(∀ξ ≤ x)P (ξ, y) , (∃ξ ≤ x)P (ξ, y) , (∀ξ < x)P (ξ, y) and (∃ξ < x)P (ξ, y).
You will have noticed that all the quantifiers here are bounded. There is no guarantee that
the unbounded versions, such as (∀ξ)P (ξ) or (∃ξ)P (ξ) will be recursive.
Roughly speaking, even if the predicate P is recursive, trying to compute either (∀ξ)P (ξ) or (∃ξ)P (ξ) will involve an infinite search, which might well end up in an infinite loop. Suppose, for instance, we are trying to decide whether (∀ξ)P (ξ) is true or not by the straightforward method of computing the truth of P (ξ) for ξ = 0, 1, 2, . . . . If (∀ξ)P (ξ) is in fact false, then there will be some value of ξ for which P (ξ) is false; our algorithm will eventually reach this value and halt with the answer 0 (for “false”). But if (∀ξ)P (ξ) is in fact true, we will never get a definitive answer: even if we check the first 10¹⁰⁰ values of ξ, we won’t know whether the (10¹⁰⁰ + 1)th yields true or not.
Those remarks are not a proof of the assertion that there are recursive predicates for which
the quantified versions are not recursive; they are simply an indication of the sort of thing
that can go wrong. Actual examples to back up this assertion will appear later in these
notes.
(iii) If f and g are recursive functions Nⁿ → N, then the n-ary predicates

f (x) = g(x) , f (x) ≠ g(x) , f (x) ≤ g(x) and f (x) < g(x)

are recursive: their characteristic functions are

x ↦ eq(f (x), g(x)) , x ↦ neq(f (x), g(x)) , x ↦ leq(f (x), g(x)) and x ↦ less(f (x), g(x)) .

For example, the characteristic function of f (x) = g(x) is eq(f (x), g(x)) and, of course, f (x) ≠ g(x) is ¬(f (x) = g(x)).

(iv) If P is a recursive (n + 1)-ary predicate, then the function f : Nⁿ ⇢ N, where f (y) is the least x such that P (x, y) holds, is partial recursive. P (x, y) is defined for all x and y, so there are no subtleties about min to worry about here.
We note, for use in the next part of the proof, that if P is such that, for every y ∈ Nn , there
is some x such that P (x, y), then f is recursive.
(ii) The characteristic function χP is recursive. Note that (∀ξ < x)P (ξ, y) holds if and only if

min_ξ{ P̄ (ξ, y) = 0 or ξ = x } = x .
Note also that the predicate being minimised here is recursive (by (i) and the first half of
(iii)). Thus (using the note at the end of the proof of (iv) above), the left hand side of this
equation is a recursive function. Then the whole equation (as a predicate) is recursive, by
(iii) again.
The other three results now follow immediately.
Question
What does all this mean for recursive sets?
B Primitive recursive functions

B.1 Definition
Given functions g : Nⁿ → N and h : Nⁿ⁺² → N, define f : Nⁿ⁺¹ → N by

f (0, y) = g(y)    (1A)
f (x + 1, y) = h(x, f (x, y), y)    (1B)

This is just defining a function by ordinary induction, so we already know that the function
f so defined exists, is total and is uniquely defined by g and h. In the context of recursive
functions, this form of definition of a function is called “primitive recursion”, so we’ll stick
with that terminology here.
We already know that this method of constructing functions is important. We can now
see that it involves a sort of “searching along” process, rather like minimalisation (in both
processes we must compute f (0), f (1), f (2), f (3), . . . ).
So it is of interest to see what happens if we replace the operation of minimalisation by
primitive recursion in the original definition of a partial recursive function. The functions
created this way are called primitive recursive.
We see a couple of things straight away. Firstly, some of the basic functions in the definition
become redundant. Given we have the constant function 0 and the successor function, ad-
dition, multiplication and natural subtraction follow immediately using primitive recursion.
So the definition simplifies somewhat.
Secondly, we cannot possibly get all the partial recursive functions that way. In fact we can
only get total functions, since the basic functions are total and applying primitive recursion
to total functions always produces total functions. In other words all primitive recursive
functions are recursive. This raises the question: are all recursive functions primitive recur-
sive?
The answer is no: there are uncountably many functions N → N, but only a countable number of recursive ones. I can give a rough argument here that the number of recursive functions is countable: it must be possible to specify such a function using a finite number of instructions, each of which is composed of a finite number of symbols from a finite alphabet. A more watertight proof will be given later.
B.2 Proposition
With the notation of the above definition, if g and h are recursive functions, then so is f .
This proof is the culmination of the series of technical lemmas in the last section leading to Proposition A.14. It is also technical. Read it if you dare!
Proof. Let T be the function defined in Proposition A.14. Then, for any x and y, there is
a w so that
T (w, ξ) = f (ξ, y) for all ξ = 0, 1, . . . , x. (2)
Then, substituting in Equations (1A) and (1B) above,

T (w, 0) = g(y)    (3A)
T (w, ξ + 1) = h(ξ, T (w, ξ), y) for all ξ < x.    (3B)

Also conversely, if w is any natural number such that Equations (3A) and (3B) hold, then T (w, ξ) = f (ξ, y) for all ξ = 0, 1, . . . , x.
Now, for any x and y, let w0 be the least w satisfying these equations. Then, for every x
and y, w0 is uniquely defined, so we may consider it to be a function of x and y and write
it as w0 (x, y). We have in fact defined it as
w0 (x, y) = min_w{ T (w, 0) = g(y) and (∀ξ < x) T (w, ξ + 1) = h(ξ, T (w, ξ), y) }

and we know that Equation (2) holds for it in the sense that, for any x and y,

f (x, y) = T (w0 (x, y), x).
B.3 Definition
A function f : Nn → N is primitive recursive if it can be obtained by a finite number of
applications of the operations of substitution and primitive recursion from the functions in
the following list:—
(i) x ↦ x + 1, the successor function suc : N → N,
(ii) 0, the constant zero function N⁰ → N,
(iii) ⟨x1 , x2 , . . . , xn ⟩ ↦ xi , the projection functions πn,i : Nⁿ → N.
From the last proposition we see that every primitive recursive function is recursive.
All the recursive functions defined so far in this chapter are in fact primitive recursive, but this really needs proving. To do this, we need to go over all the constructions of these functions and, wherever a minimalisation is used, show that it can be replaced somehow with a primitive recursion. I will do that for those functions we want to know really are primitive recursive.
(ii) Addition N² → N:

x + 0 = x    (add(x, 0) = id(x))
x + (y + 1) = (x + y) + 1    (add(x, y + 1) = suc(add(x, y)))
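Primitive recursion, too, can be written as an executable schema. A Python sketch of mine (a loop stands in for the induction, and the argument being recursed on is put first, as in Equations (1A) and (1B)):

    def prim_rec(g, h):
        # f(0, y) = g(y) ;  f(x+1, y) = h(x, f(x, y), y)
        def f(x, *y):
            value = g(*y)
            for i in range(x):
                value = h(i, value, *y)
            return value
        return f

    suc = lambda n: n + 1

    # addition as defined above, recursing on its second argument:
    # add(x, 0) = x ;  add(x, y+1) = suc(add(x, y))
    rec_add = prim_rec(lambda x: x, lambda i, prev, x: suc(prev))
    add = lambda x, y: rec_add(y, x)

    print(add(3, 4))   # prints 7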
B.5 Remark
From these examples we see that an alternative definition of primitive recursive functions
would be to make the following changes to the definition of partial recursive functions:
replace the operation of minimalisation by that of primitive recursion and add the zero
function to the basic list.
By the same token, an alternative definition of partial recursive functions is afforded by
starting with the definition of primitive recursive function and adding the operation of
minimalisation.
(vii) Exponentiation N² → N:

x⁰ = 1
x^(y+1) = x^y · x .

(viii) The functions zero, eq, nonzero, neq and ⟨x, y⟩ ↦ |x − y| are now defined as in A.4(v) above.
B.7 Proposition
If f : Nⁿ⁺¹ → N is recursive or primitive recursive, then so (respectively) are

g : Nⁿ⁺¹ → N defined by g(x, y) = Σ_{i=0}^{x} f (i, y)

h : Nⁿ⁺¹ → N defined by h(x, y) = Π_{i=0}^{x} f (i, y)

In the case that f is recursive, this proposition would have been awkward to prove without the help of Proposition B.2.
B.8 Proposition
If f : Nn+1 → N is a recursive or primitive recursive function, then so is the function obtained
from it by bounded minimalisation, that is,
h(m, y) = the least value of x such that x ≤ m and f (x, y) = 0 if such an x exists, and 0 otherwise.
Proof. The function

Π_{i=0}^{x} nonzero(f (i, y)) = 1 if f (i, y) ≠ 0 for all i ≤ x, and 0 otherwise

is recursive or primitive recursive with f , by Proposition B.7. Also

(x + 1) · zero(f (x + 1, y)) = x + 1 if f (x + 1, y) = 0, and 0 otherwise

so, multiplying the last two functions,

(x + 1) · zero(f (x + 1, y)) · Π_{i=0}^{x} nonzero(f (i, y))
    = x + 1 if f (x + 1, y) = 0 and f (i, y) ≠ 0 for all i ≤ x, and 0 otherwise
Now define θ by

θ(0, y) = 0
θ(x + 1, y) = (x + 1) · zero(f (x + 1, y)) · Π_{i=0}^{x} nonzero(f (i, y)) ,

so

h(m, y) = Σ_{x=0}^{m} θ(x, y) is primitive recursive also.
Notation It will be convenient to use the notation min_{x≤m}{ f (x, y) = 0 } for the result of applying bounded minimalisation to the function f .
Thus we could, if we felt so inclined, add the operation of bounded minimalisation to the
list of basic operations in the definition of primitive recursive functions.
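Both the direct reading of bounded minimalisation and the θ-trick used in the proof are easy to render in Python, and checking that they agree is a useful exercise (a sketch of mine):

    def zero(v):    return 1 if v == 0 else 0
    def nonzero(v): return 1 if v != 0 else 0

    def bmin(f, m, *y):
        # min_{x<=m}{ f(x, y) = 0 }, with value 0 if there is no such x
        for x in range(m + 1):
            if f(x, *y) == 0:
                return x
        return 0

    def bmin_theta(f, m, *y):
        # the construction in the proof: theta(x, y) is x at the first
        # zero of f and 0 everywhere else; h(m, y) is the sum of thetas
        def theta(x):
            if x == 0:
                return 0
            prod = 1
            for i in range(x):
                prod *= nonzero(f(i, *y))
            return x * zero(f(x, *y)) * prod
        return sum(theta(x) for x in range(m + 1))

    f = lambda x, y: max(y - x * x, 0)    # first zero: least x with x*x >= y
    print(bmin(f, 10, 17), bmin_theta(f, 10, 17))   # prints 5 5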
In a rough sort of way, this encapsulates the difference between recursive functions and
primitive recursive ones. The ordinary minimalisation can be thought of as incorporating
into our definition of a partial recursive function the process of searching through a set of
values until something or other occurs to stop the search — acknowledging the possibility
that that something or other might never occur. (This is where an infinite loop might arise.)
With bounded minimalisation, we have a limit on how far the search is to proceed, that
limit being known at the outset of the search. This precludes the possibility of an infinite
loop.
B.9 Corollary
Now we see that all the functions given as examples of recursive functions so far (the total
ones only!) are in fact primitive recursive, because in each case where minimalisation was
used, bounded minimalisation could have been used instead. For example, integer division and integer square root can be redefined:

⌊x/y⌋ = min_{z≤x}{ leq(y(z + 1), x) = 0 }

⌊√x⌋ = min_{y≤x}{ leq((y + 1)², x) = 0 } .
In the same way, the functions P , L and R used to set up a recursive bijection between N
and N2 are primitive recursive.
B.11 Proposition
This corresponds quite closely with Proposition A.17.
(i) Let P and Q be two n-ary predicates (defined on Nⁿ). If P and Q are both primitive recursive then so are ¬P , P ∧ Q , P ∨ Q , P ⇒ Q and P ⇔ Q.
Further, if f1 , f2 , . . . , fn are primitive recursive functions Nᵐ → N, then P (f1 (x), f2 (x), . . . , fn (x)) is an m-ary primitive recursive predicate.
(ii) Let P be a unary predicate (on N). If P is primitive recursive, then so are the unary predicates

(∀ξ ≤ x)P (ξ) , (∃ξ ≤ x)P (ξ) , (∀ξ < x)P (ξ) and (∃ξ < x)P (ξ)

More generally, if P is a primitive recursive predicate on Nⁿ⁺¹, then so are the (n + 1)-ary predicates

(∀ξ ≤ x)P (ξ, y) , (∃ξ ≤ x)P (ξ, y) , (∀ξ < x)P (ξ, y) and (∃ξ < x)P (ξ, y).

(iii) If f and g are primitive recursive functions Nⁿ → N, then the n-ary predicates

f (x) = g(x) , f (x) ≠ g(x) , f (x) ≤ g(x) and f (x) < g(x)

are primitive recursive.
Proof. The proofs of (i) and (iii) are exactly the same as the corresponding proofs of
Proposition A.17. The proof there of (ii) involves minimalisation, so cannot be used here,
so . . .
(ii) The characteristic function of (∀ξ ≤ x)P (ξ, y) is

Π_{ξ=0}^{x} χP (ξ, y)
B.13 Definition
(i) We define the function Pn : Nⁿ → N by induction over n:

P0 () = 0 ;  P1 (x) = x
Pn+1 (x1 , x2 , . . . , xn+1 ) = P (x1 , Pn (x2 , x3 , . . . , xn+1 )) for all n ≥ 1 ,
(Here I am using a simplified notation for composites. For instance, LR(w) means L(R(w)),
LR2 (w) means L(R(R(w))) and so on.)
Some examples:
B.14 Proposition
With the notation of the last definition, the functions Pn and En are all primitive recursive
(to say that En is primitive recursive means, as usual, that all its component functions En,i
are primitive recursive).
Furthermore, for any n ≠ 0, these functions are inverse bijections, that is
B.15 Proposition
Let f be any function Nᵐ ⇢ Nⁿ, partial or total. Then there is a partial function f^O : N ⇢ N such that

f^O ◦ Pm = Pn ◦ f    (–1)

If m ≠ 0 this function is unique.
Moreover,
f^O is partial recursive if and only if f is partial recursive,
f^O is recursive if and only if f is recursive,
f^O is primitive recursive if and only if f is primitive recursive.
    Nᵐ --f--> Nⁿ
    Pm |       | Pn
       v       v
    N --f^O--> N

Composing (–1) on the right with Em (at least when m ≠ 0, so that Pm ◦ Em is the identity) gives

f^O = Pn ◦ f ◦ Em    (–2)

Also, composing (–1) on the left with En and using the fact that En ◦ Pn is the identity function (always, even if n = 0), we have

En ◦ f^O ◦ Pm = f    (–3)
(Two more diagrams: in the first, Em goes up from N to Nᵐ and Pn down from Nⁿ to N, exhibiting f^O as Pn ◦ f ◦ Em; in the second, Pm goes down and En up, exhibiting f as En ◦ f^O ◦ Pm.)
Since Pm , Pn , Em and En are all primitive recursive and therefore also recursive and partial
recursive, Equations (–2) and (–3) give us statement (ii).
B.16 Discussion
In Appendix C and in this chapter we will need to work with the set of all finite sequences
of natural numbers. This is the set
N⁰ ∪ N¹ ∪ N² ∪ · · · = ⋃_{n=0}^{∞} Nⁿ .
Let us call this set N[∞] to have a simpler name for it.
We will want to consider functions from this set to itself, from it to Nn and from Nm to it.
There are many such functions which we would like to think of as recursive, because there
are obvious algorithms to compute them. Examples would be: finding the length of one of
these sequences, finding its first member, reversing its order, removing its first entry and so
on.
Now to ask for such a function to be partial recursive, recursive or primitive recursive is
problematical: we have as yet no definition of partial recursiveness (etc.) for functions
defined on this domain. So the job is now to supply one. What we do is take a hint from
the previous proposition: define a pair of inverse functions

P[∞] : N[∞] → N and E[∞] : N → N[∞]

which we can accept (intuitively) as being algorithmically computable, then use the above
proposition (sort of in reverse) to define partial recursive, recursive and primitive recursive
functions.
The basic idea for the function P[∞] is fairly simple: we can easily describe a sequence of
any length by two numbers, its length, n say, and Pn of the sequence. In other words,
any sequence x = ⟨x1 , x2 , . . . , xn ⟩ is specified by the pair ⟨n, Pn (x)⟩. We want however to
describe the sequence by a single number; but this is now easy, use the function P = P2 on
this pair. So we get (as a first approximation) P[∞] (x) = P2 (n, Pn (x)).
However, there is a small bug in this process. We really need these two functions to be
inverses, but P[∞] is not surjective (onto), so it cannot have an inverse. The trouble is with
the N⁰ bit, which only has the empty sequence ⟨ ⟩, for which P[∞] () = P2 (0, P0 ()) = P2 (0, 0) =
0, but none of the numbers P2 (0, 1) , P2 (0, 2) , P2 (0, 3) , . . . occur as values. Nothing else
goes wrong, so we make a slight adjustment to the definition of P[∞] to finesse our way
around this problem. Having ensured that P[∞] is in fact a bijection, the inverse E[∞] must
exist, and a little messy tinkering tells us what its definition must be. We can then prove
that they are both bijections by showing that they are inverses, a reasonably straightforward
calculation, all things considered. So here is the outcome.
B.17 Definition
(i) We define the function P[∞] : N[∞] → N as follows: for x = ⟨x1 , . . . , xn ⟩,

P[∞] (x) = 0                       if n = 0,
P[∞] (x) = P2 (0, x1 + 1)          if n = 1,
P[∞] (x) = P2 (n − 1, Pn (x))      if n ≥ 2.
Since, in the case n = 1, we have P1 (x) = x1 , this definition can be made to look a bit more regular by rewriting the second case thus:

P[∞] (x) = 0                           if n = 0,
P[∞] (x) = P2 (n − 1, Pn (x) + 1)      if n = 1,
P[∞] (x) = P2 (n − 1, Pn (x))          if n ≥ 2.
Looking at the definitions in the previous section we see that we have an alternative (and
neater) way of expressing this definition: P[∞] (x1 , x2 , . . . , xn ) = Pn+1 (n, x1 , x2 , . . . , xn ).
(ii) We define the function E[∞] : N → N[∞] as follows: for any natural number w,

E[∞] (w) = ⟨ ⟩                    if w = 0,
E[∞] (w) = ⟨R(w) − 1⟩             if w ≠ 0 and L(w) = 0,
E[∞] (w) = E_{L(w)+1} (R(w))      if w ≠ 0 and L(w) ≥ 1.

(iii) For a partial function f : Nᵐ ⇢ Nⁿ, where now m or n or both may be [∞], we define f^O : N ⇢ N as before by

f^O = Pn ◦ f ◦ Em .
Now everything to do with these functions works just the way we want.
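Here are P[∞] and E[∞] in Python (a sketch of mine, again inverting P2 by the integer-square-root shortcut; lists stand for finite sequences):

    from math import isqrt

    def P2(x, y): return (x + y) * (x + y + 1) // 2 + x
    def diag(z):  return (isqrt(8 * z + 1) - 1) // 2
    def L(z):     return z - diag(z) * (diag(z) + 1) // 2
    def R(z):     return diag(z) - L(z)

    def P_inf(seq):
        n = len(seq)
        if n == 0: return 0
        if n == 1: return P2(0, seq[0] + 1)
        code = seq[-1]                  # Pn(seq), folding P2 from the right
        for x in seq[-2::-1]:
            code = P2(x, code)
        return P2(n - 1, code)

    def E_inf(w):
        if w == 0:    return []
        if L(w) == 0: return [R(w) - 1]
        n, code = L(w) + 1, R(w)        # En(R(w)): peel off L, keep R
        seq = []
        for _ in range(n - 1):
            seq.append(L(code)); code = R(code)
        return seq + [code]

    assert all(E_inf(P_inf(s)) == s
               for s in ([], [0], [5], [1, 2], [3, 1, 4, 1, 5]))
    assert all(P_inf(E_inf(w)) == w for w in range(500))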
B.18 Proposition
The functions P[∞] : N[∞] → N and E[∞] : N → N[∞] are inverse functions (and therefore
both bijections).
Proof. (1) Suppose that P[∞] (x) = w; we prove that E[∞] (w) = x.
From the definition of P[∞] there are three possibilities:
• n = 0, that is, x = ⟨ ⟩.
Then P[∞] (x) = 0, so w = 0.
But then E[∞] (w) = E[∞] (0) = ⟨ ⟩, as required.
• n = 1, so that x = ⟨x1 ⟩.
Then w = P[∞] (x) = P2 (0, x1 + 1).
But then w ≠ 0 and L(w) = 0.
Therefore E[∞] (w) = ⟨R(w) − 1⟩ = ⟨x1 ⟩ = x.
• n ≥ 2.
Then w = P[∞] (x) = P2 (n − 1, Pn (x)) with n − 1 ≥ 1, so w ≠ 0 and L(w) = n − 1 ≥ 1.
Therefore E[∞] (w) = E_{L(w)+1} (R(w)) = En (Pn (x)) = x, as required.

(2) Suppose now that E[∞] (w) = x; we prove that P[∞] (x) = w. From the definition of E[∞] there are three possibilities:
• w = 0.
In this case x = ⟨ ⟩ and P[∞] (x) = 0 = w.
• w ≠ 0 and L(w) = 0.
In this case x = E[∞] (w) = ⟨R(w) − 1⟩, a sequence of length n = 1.
Then P[∞] (x) = P2 (0, R(w)) = P2 (L(w), R(w)) = w.
• w ≠ 0 and L(w) ≥ 1. In this case x = E[∞] (w) = E_{L(w)+1} (R(w)) = En (R(w)), a sequence of length n = L(w) + 1. From this also Pn (x) = R(w). Then P[∞] (x) = P2 (n − 1, Pn (x)) = P2 (L(w), R(w)) = w.
B.19 Proposition
Proposition B.15 above holds true even when m or n or both are [∞].
Proof. From the definition of f^O in the general case and the fact that P[∞] and E[∞] are inverses, Equations (–1), (–2) and (–3) follow. Then Part (ii) follows as in the proof of that Proposition.
B.20 Proposition
Given two functions f : Nˡ → Nᵐ and g : Nᵐ → Nⁿ, where any of l, m, n might be [∞], if f and g are both partial recursive / recursive / primitive recursive then the composite g ◦ f is the same.

Proof. Just follow the arrows around on this diagram, using the fact that Pl , Pm and Pn are bijective, and the consequent fact that (g ◦ f )^O = g^O ◦ f^O.
    Nˡ --f--> Nᵐ --g--> Nⁿ
    Pl |      Pm |       | Pn
       v         v       v
    N --f^O--> N --g^O--> N
B.21 Proposition
There are primitive recursive functions

len : N → N
ent : N² → N
del : N → N
adj : N² → N
rep : N³ → N
concat : N² → N
zeros : N → N

with the following properties.
(i) len(w) represents the length of the sequence represented by w: if x = (x1 , x2 , . . . , xn ) and P[∞] (x) = w, then len(w) = n.
(ii) ent(r, w) represents the rth entry of the sequence: with x and w as above, ent(r, w) = xr .
(iii) del(w) represents deleting the first entry of the sequence: with x and w as above, del(w) = P[∞] (x2 , x3 , . . . , xn ).
(iv) adj(y, w) represents adjoining y to the front of the sequence: with x and w as above, adj(y, w) = P[∞] (y, x1 , x2 , . . . , xn ).
(v) rep(r, w, y) represents replacing the rth entry of the sequence by the value y: if x = (x1 , x2 , . . . , xn ), x′ = (x1 , x2 , . . . , xr−1 , y, xr+1 , . . . , xn ), P[∞] (x) = w and P[∞] (x′) = w′, then rep(r, w, y) = w′.
(vi) concat(u, v) represents the concatenation of the sequences represented by u and v: concat(u, v) = P[∞] (⟨x1 , x2 , . . . , xm , y1 , y2 , . . . , yn ⟩), where u = P[∞] (⟨x1 , x2 , . . . , xm ⟩) and v = P[∞] (⟨y1 , y2 , . . . , yn ⟩).
(vii) zeros(n) represents an all-zero sequence of length n: zeros(n) = P[∞] (⟨0, 0, . . . , 0⟩).
(iii) Given w = P[∞] (x1 , x2 , . . . , xn ) = Pn+1 (n, x1 , x2 , . . . , xn ), and noting that then Pn−1 (x2 , x3 , . . . , xn ) = R²(w), we have

f (r, u, v) = v ,
f (r + 1, u, v) = adj(ent(len(u) − r, u), f (r, u, v)) for all r ≥ 0 .
It will be observed that Cases (v) – (vii) use definition by primitive recursion over r and
otherwise every step in all these cases is an operation we already know to be primitive
recursive.
C Specifying algorithms
C.1 Algorithms in general
One of the main points about the definition of (partial) recursive functions is that this
definition should cover functions which can be defined by any algorithm whatsoever. At
this point there are (or appear to be) some serious problems with the approach taken so far
in this chapter.
Firstly, it is not at all clear that any algorithm we might think up can be recast into the
methods we have been using so far. The basic definition, A.3, is not very helpful, but in the
last two sections we saw how, by use of much trickery, we can usually do this. But to go
on and claim that any algorithm can be recast in this manner appears to require more of a
leap of faith than a mathematician should be happy with.
Secondly, the definition only applies to functions involving natural numbers. Worse, as we
have seen, it does not apply to functions defined on tuples of numbers of variable length.
For example, a function taking any sequence of natural numbers to its sum,

⟨x1 , x2 , . . . , xn ⟩ ↦ x1 + x2 + · · · + xn ,

can clearly be computed by an algorithm, but is not covered by Definition A.3. Even worse, there are algorithmic procedures which do not involve natural numbers at all, for example reversing the letters in any word, as in

“CAT” ↦ “TAC” .
The first (and main) part of Appendix C is devoted to developing the notion of a completely
general algorithm (as far as that can be done). Such an algorithm is treated as a general
method for manipulating patterns of symbols. So in particular, numbers are treated as
strings of symbols (for example, a string of 0s and 1s for binary code).
The main facts and results that will be found in the appendix are as follows.
(1) That a partial function Nⁿ ⇢ N is partial recursive if and only if it is computable
by an algorithm of the very general sort described in the appendix.
(2) Considering different possible codes for describing natural numbers: given some very
simple and obvious restrictions on the form of the code, it does not matter which code you
use — a function is computable in one code if and only if it is computable in the other.
These facts make the strongest argument I know for the assertion that the definition of
a partial recursive function at the beginning of this chapter characterises those functions
Nn → N which can be computed by any algorithm whatsoever. Which of course was the
point of everything we have done so far in this chapter.
(3) Out of the proof of this comes the important result (Theorem C.E.17) that for any partial recursive function f : Nⁿ ⇢ N, there are primitive recursive functions p, q : Nⁿ⁺¹ → N such that

f (x) = p(min_i{ q(i, x) = 0 }, x) for all x ∈ Nⁿ .
Back at the beginning of this chapter (Subsections A.2 and A.3), when partial recursive functions were being defined, it was pointed out that care should be taken with the definition of minimalisation (µg(y) = min_x{ g(x, y) = 0 }), because it is easy to overlook the fact that it can be applied to partial functions, for which there is a special way of reading this definition.
Since a partial recursive function is built up by repeated use of the operations of substitution
and minimalisation, this can be a bit of a problem when trying to prove things about them.
The result just quoted means that this problem can be avoided, and we have:—
(4) The original definition (in A.3) of a partial recursive function Nⁿ ⇢ N defines the same class of functions if we replace the word “minimalisation” with “minimalisation applied to a regular function”. (This is Corollary C.E.18 to the last-mentioned theorem.)
(5) A “programming language” is developed there which is designed to provide a fairly
easy way to describe algorithms and so provide proofs that partial recursive functions are
in fact partial recursive. (The methods we have used so far have been fairly painful.)
The language developed in the appendix is aimed at the general form of algorithm, involving
symbol-by-symbol manipulation. This is working at too detailed a level for what we will be
wanting to do in this chapter, so our next job here is to describe a subset of that language,
sufficient for all the numerical manipulations we will want to do.
(6) There is a brief run-down on Turing machines.
Similarly, to prove that a given function is partial recursive, a standard method is to show it
can be computed by an algorithm. In many cases the algorithm can be quite complicated,
so it is good to have a way of describing algorithms as precisely and readably as possible.
I describe here a way of specifying an algorithm which is quite similar to a number of
computer languages, notably C. It is much simpler than those languages, because we do not
have to worry about many of the real-life problems that working on an actual computer
involves.
This language is a subset of the more general one given in the appendix — simpler since
we will not need to get down to the symbol-by-symbol kind of manipulation dealt with
there. However, that means that the proof that this language does define partial recursive
functions Nn → N and, conversely, that every such function can be specified in this language,
is contained in the similar (and lengthy!) proof in the appendix.
A program has the following form: one or more functions of which the first one is designated
the main one.
function; function; . . . function;
There must be at least one function. The non-main functions are called subfunctions.
A function has the following form:

name(arguments)(local variables) {
    statement; statement; . . . statement
}    (–1)

Here is a first example, a program to compute the exponential function ⟨m, n⟩ ↦ mⁿ from its inductive definition (m⁰ = 1 and mⁿ⁺¹ = mⁿ · m):

1  exp(m,n)() {
2     if (n=0) return 1;
3     return m*exp(m,n−1)
4  }

As you can see, it is pretty well self-explanatory. It has only one function, the main one, whose name is exp. In fact most of our programs will only need one function. The arguments are m and n and there are no local variables.
The return 1 statement on line 2 does two things: it tells the function to “return” the value
1 (output it if you like) and it tells the algorithm to stop, that it is finished. (So in the case
n = 0 it does not go on to line 3.)
In the third line the function “calls” itself recursively, thereby taking advantage of the in-
ductive nature of the definition. (This is a great boon).
The line numbers on the left are not part of the program. They are just shown to make it
easier to discuss what is going on.
Here is another example. This time we compute the same expo-
nential function by repeated multiplication. The program is:
1 exp(m,n)(r,u) {
2 r := 1; u := n;
3 while (u>0) {
4 r := r*m;
5 u := u−1;
6 }
7 return r
8 }
Here there are a couple of local variables. Think of them as private variables, which the
function can use for its own purposes. In this case, r is used to build up the value and u is
used as a count-down to the finish.
The while statement causes the statements in braces following (lines 4 and 5) to be repeated
over and over until the condition u>0 becomes false (as it will eventually because of the
countdown in line 5). In general, several statements can be grouped together to form a
single unit using curly braces { and }.
Note that spaces and newlines can be incorporated anywhere (except in the middle of names
of things). It is a good idea to use this and indenting to make your programs easier to
understand.
Note the difference between the plain equals sign in the first program (n=0) and the colon-equals sign in the second (r:=1). The plain equals sign is used for a condition (is n equal to zero?) and produces true or false as its value. The colon-equals sign means: compute the value on the right hand side and set the left hand variable equal to this. Note the difference here:

x = x+1 asks “is x equal to x + 1?”; this is always false.
x := x+1 says “add 1 to the number x”.
These “colon-equal” statements are called assignment statements (because the value calculated on the right hand side is “assigned” to the variable on the left). So the left hand side must be a simple variable name. The right hand side can be a quite complicated expression.
(The word “expression” is used here in a different sense from its use in the context of formal
languages for theories, i.e. in other chapters of this book.)
Expressions
So what is allowed for an expression?
• Any variable name.
Note that it is OK to give one’s variables names which are several letters long (e.g., fred + nurk). For this reason, use * for multiplication, because, for instance, xy is a single variable name. As far as the programming language is concerned it has nothing to do with x or y.
• Variables can only have natural number values, so we use 1 and 0 to stand for true
and false. This allows us to write things like
x := (a<b) .
Here, x is set equal to either 1 or 0, depending upon whether the value of a is less than that
of b or not. The allowed comparison operations are the usual ones,

< , > , ≤ , ≥ , = , ≠

and conditions may be combined using the usual logical operations

¬ , ∧ , ∨ , ⇒ , ⇔
Statements
Firstly, note that a statement may be made up of a number of simpler statements by enclosing them in curly braces thus: {S1 ;S2 ; . . . ;Sk } (this occurs in the definition of a function (–1) above). The individual statements are often written one under another for
legibility. A statement made up of simpler statements in this way is called a compound
statement; compound statements may be included in larger compound statements to any
depth. A statement which is not compound is called a simple statement.
Here are the kinds of simple statement allowed.
• An assignment statement is of the form

x := expression
where x is a variable name. These statements have pretty well been dealt with above.
• A return statement is of the form
return expression
It tells the function to compute the value of the expression and return it as the value of
the function. Every function must have a return statement at the end, and it is allowed to
have others elsewhere if it makes sense to return prematurely (this is done in the second exp
example above).
• A conditional statement is of the form
if (condition) T-statement ;
else F-statement
Here, a condition is any expression which evaluates to either 1 (for true) or 0 (for false).
If the condition is true, the program executes the T-statement, skips the F-statement and
proceeds to what follows. If the condition is false, the program skips the T-statement,
executes the F-statement and proceeds to what follows.
The else part may be omitted (as in the first exp example above).
The T- and F-statements here are frequently compound.
• A while statement is of the form
while (condition) statement
The statement is executed repeatedly so long as the condition is true. The statement is usually compound and in any case had better modify the condition somewhere, because otherwise it will continue repeating forever.
That is all we need to create quite complicated algorithms for partial recursive functions
conveniently. As usual, there are a few abbreviations that may be used to make things
easier.
Re-using functions
Having written out a program for the exponential function, as we have done above, we can now use it in any other program we like. There is no point in writing out the whole exponential function as a subfunction again each time: we know that it can be included, so rewriting the whole thing every time we want to use it would just be a waste of time.
And the same goes, of course, for any other function we have already programmed.
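For instance (a sketch of my own in the same language, assuming the program for exp above is available as a subfunction), here is a program computing m^n + n^m:
1 sumexp(m,n)(a,b) {
2 a := exp(m,n);
3 b := exp(n,m);
4 return a+b
5 }
As before, the line numbers on the left are not part of the program.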
But we can do better than that.
The big theorem of the appendix tells us that any partial recursive function is programmable in this way. So, when we want to include such a function in a program, we can save a lot of trouble by just noting that a program to compute it exists, give it a name, and forthwith use it as a subfunction without bothering to write it out explicitly.
For instance, suppose we wanted to write out a program which involved the “T-function” T (w, i) of Proposition A.14. The proof of the proposition tells us how to program it, but that involves the functions Rem, L and R, which would also have to be programmed. But there is no real need to bother. Just give it a name (T comes to mind) and use it.
Of course many infinite loops are not so easy to spot as this one. In fact we will see that
there is no way, in general, to recognise infinite loops.
Minimalisation
Here is how to implement minimalisation in this language. Suppose we want to implement
min_y{ f (x, y) = 0 } ,
where we already know how to implement the function f . So we can suppose we have a
subfunction
F(x1 , x2 , . . . , xn ,y)
to compute it. Then the minimalisation is implemented thus:
y := 0;
while (F(x1 , x2 , . . . , xn , y) ≠ 0) y := y+1;
return y;
Easy!
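As a concrete instance (my own example, not from the notes): taking f (x, y) = (x + 1) ∸ (y + 1)(y + 1), the minimalisation min_y{ f (x, y) = 0 } picks out the smallest y with (y + 1)² > x, which is ⌊√x⌋. The scheme above then specialises to
1 isqrt(x)(y) {
2 y := 0;
3 while ( (x+1) − ((y+1)*(y+1)) ≠ 0 ) y := y+1;
4 return y
5 }
where − is the language’s (natural) subtraction.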
D The Ackermann function
D.1 Remarks
Are there any recursive functions which are not primitive recursive? In this section we answer
this question with “yes” by constructing the Ackermann function as a recursive function and
proving that it is not primitive recursive — well, we will discuss the Ackermann function
but then use a related function which gives the result with a simpler proof.
Consider the definitions of addition, multiplication and exponentiation (which I will give in a back-to-front sort of way, defining x + y and xy by induction over x instead of the more usual y):
0 + y = y          (x + 1) + y = suc(x + y)
0y = 0             (x + 1)y = xy + y
y^0 = 1            y^(x+1) = y^x · y
Continuing the pattern, we could go on to define
f (0, y) = 1       f (x + 1, y) = y^f(x,y)
g(0, y) = 1        g(x + 1, y) = f (g(x, y), y)
and so on.
So let us define a sequence a_m of functions N^2 → N inductively, thus:
a_0(x, y) = x + y
a_1(0, y) = 0          a_1(x + 1, y) = a_0(a_1(x, y), y)
a_(m+1)(0, y) = 1      a_(m+1)(x + 1, y) = a_m(a_(m+1)(x, y), y) for all m ≥ 1.
These can be gathered into a single function F : N^3 → N, F (m, x, y) = a_m(x, y), defined directly by
F (0, x, y) = x + y
F (1, 0, y) = 0
F (m, 0, y) = 1 for all m ≥ 2
F (m + 1, x + 1, y) = F (m, F (m + 1, x, y), y) for all m and x.
This function is recursive but not primitive recursive (that it is recursive is obvious; that it is not primitive recursive is not).
The Ackermann function is a closely related (and slightly simpler) function A given by . . .
D.3 Lemma
The Ackermann-like function A defined above is recursive.
Proof. That it is partial recursive is easily seen by writing a program for it:
1 A(m,x)(n,v) {
2 if (m=0) {return x+2};
3 n := m+2;
4 v := x;
5 while (n > 0) {
6 v := A(m-1,v);
7 n := n-1
8 }
9 return v
10 }
That it always has a value follows immediately from the definition of the α functions above
using induction over m.
D.4 Lemma
For the α functions defined above,
(i) α_m(x) > x for all m, x ∈ N.
Proof. Is easy: prove (ii) first by induction over m. (If α_m is strictly increasing, then so is α_m^[x+1].) Then (i) follows immediately, as does the inequality which gives (iii). Now (iv) follows immediately from the definition above.
D.5 Lemma
For every primitive recursive function f : N^n → N, there is an m such that α_m “dominates” f , in the sense that
f (ξ1 , ξ2 , . . . , ξn) ≤ α_m(max(ξ1 , ξ2 , . . . , ξn)) for all ξ1 , ξ2 , . . . , ξn ∈ N.
Proof. It is going to save space to use “vector” style notation: rewrite the above inequality as
f (x) ≤ α_m(max(x)) for all x ∈ N^n .
We will prove the result by induction over the construction of the function f , as given in
Definition 9.B.3.
If f is one of the base functions (the successor function, the zero constant function or one of the projection functions π_(n,i)), then it is dominated by α_0 .
Now suppose that f (x) = h(g_1(x), g_2(x), . . . , g_k(x)), but write it
f (x) = h(y_1 , y_2 , . . . , y_k)
where y_i = g_i(x) for i = 1, 2, . . . , k .
We may assume inductively that h and each of the g_i are dominated by α functions, and hence (taking the largest subscript) that they are all dominated by the same one, α_m say. From the second of these displays, max(y) ≤ α_m(max(x)) and then from the first one, h(y) ≤ α_m(α_m(max(x))) ≤ α_(m+1)(max(x)). Since f (x) = h(y) we have our result.
Finally, suppose that f is given by primitive recursion,
f (0, y) = g(y)
f (x + 1, y) = h(x, f (x, y), y) ,
where we assume inductively that g and h are each dominated by one of the α functions,
and so as above, they are both dominated by the same one, αm say.
Now we show, by induction over x, that
f (x, y) ≤ α_m^[x+1](max(x, y)) for all x ∈ N and y ∈ N^n .
We are given f (0, y) ≤ α_m(max(y)). Now, assuming f (x, y) ≤ α_m^[x+1](max(x, y)), we also have y ≤ α_m^[x+1](max(x, y)) (using the preceding lemma), and so
f (x + 1, y) = h(x, f (x, y), y) ≤ α_m(α_m^[x+1](max(x, y))) = α_m^[x+2](max(x, y))
as required.
D.6 Theorem
There is a function f : N → N which is recursive but not primitive recursive.
D.7 Discussion
The proof using the (genuine) Ackermann function goes much the same way. The main
part is to show that every primitive recursive function Nn → N is dominated by one of the
functions am . This part of the proof is more ticklish than the one given above, probably
because the a-functions do not grow as fast as the α-functions. The “diagonal” argument
then goes as above.
Let us see what can happen when using the definition to (attempt to) compute Ak(4, 4):
Ak(4, 4)
= Ak(3, Ak(4, 3)) by (Ak3)
= Ak(3, Ak(3, Ak(4, 2))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(4, 1)))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(4, 0))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, 0)))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(2, 0))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(1, 0)))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(0, Ak(0, 0))))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, Ak(0, 2)))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(1, 4))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(1, 3)))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(1, 2))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(1, 1)))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(1, 0))))))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, 0)))))))))))) by (Ak2)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, Ak(0, 2))))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, Ak(0, 4)))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, Ak(0, 6))))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, Ak(0, 8)))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, Ak(0, 10))))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(2, 12)))))) by (Ak1)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(2, 11))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(1, Ak(2, 10)))))))) by (Ak3)
= Ak(3, Ak(3, Ak(3, Ak(3, Ak(3, Ak(1, Ak(1, Ak(1, Ak(2, 9))))))))) by (Ak3)
and so on and on. It certainly does not look as though it is going to terminate any time
soon.
A rather mean trick that some teachers of computer science play on their students when they
are learning about recursive programming is to give them the definition of the Ackermann
function and ask them to write a small program to, say, compute A(6, 6). The task looks
easy, innocuous even. What the students find is that the process runs out of memory fast,
and there is not much they can do about it.
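From the expansion above one can read off the rules being used: (Ak1) Ak(0, x) = x + 2; (Ak2) Ak(m + 1, 0) = Ak(m, Ak(m, 0)); (Ak3) Ak(m + 1, x + 1) = Ak(m, Ak(m + 1, x)). The innocuous-looking program the students write is then something like this (a sketch of my own, in the language of Section C):
1 Ak(m,x)() {
2 if (m=0) {return x+2};
3 if (x=0) {return Ak(m-1, Ak(m-1, 0))};
4 return Ak(m-1, Ak(m, x-1))
5 }
It is a faithful transcription of the definition; the trouble is the colossal depth of nested recursive calls, as the expansion of Ak(4, 4) suggests.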
E Two fundamental theorems
E.1 Lemma
The set of partial recursive functions N ⇢ N is the closure of the following functions
(i) x ↦ x + 1
(ii) L
(iii) R
(iv) x ↦ L(x) + R(x)
(v) x ↦ L(x) ∸ R(x)
(vi) x ↦ L(x)R(x)
under the following operations
(a) Composition
(b) From g and h obtain f given by f (x) = P (g(x), h(x))
(c) From g obtain f given by f (x) = min_y{ g(P (y, x)) = 0 }
Proof. Let us temporarily refer to the set of functions defined above as P and show that
this set is in fact the set of partial recursive functions. It is obvious that all such functions
are partial recursive, so it is only the converse that we need to prove.
We will use the inverse functions P_n : N^n → N and E_n : N → N^n defined in Section B.13.
We will also use two results, both easily proved by induction:
Firstly if g1 , g2 , . . . , gn are all members of P (n ≥ 1), then so is Pn (g1 , g2 , . . . , gn ).
(3) If f is addition, f (x, y) = x + y, then f ◦ E_2(w) = L(w) + R(w) and this is a member of P by its definition.
(4) and (5) The same argument holds if f is natural subtraction or multiplication.
(6) Now suppose that f is obtained by substitution, f = h(g_1 , g_2 , . . . , g_m) : N^n ⇢ N say, where h ◦ E_m and g_1 ◦ E_n , g_2 ◦ E_n , . . . , g_m ◦ E_n are all members of P. Then
f ◦ E_n = h(g_1 ◦ E_n , g_2 ◦ E_n , . . . , g_m ◦ E_n) = h ◦ E_m ◦ P_m(g_1 ◦ E_n , g_2 ◦ E_n , . . . , g_m ◦ E_n).
This is a member of P by the first of our preliminary results above.
E.2 Theorem
There is a partial recursive function ϕ : N^2 ⇢ N such that the functions
ϕ(0, _) , ϕ(1, _) , ϕ(2, _) , . . .
are all the partial recursive functions N → N. That is, given any partial recursive function
f : N → N, there is an i such that f (x) = ϕ(i, x) for all x.
Note It is usual to write this function ϕi (x) instead of ϕ(i, x). One can then think of
ϕ0 , ϕ1 , ϕ2 , . . . as being a listing of all the partial recursive functions N → N.
A useful way of looking at this result is: suppose we visualise the values of ϕ as a two-
dimensional array:
        0         1         2         3         4
0    ϕ(0, 0)   ϕ(0, 1)   ϕ(0, 2)   ϕ(0, 3)   ϕ(0, 4)   . . .
1    ϕ(1, 0)   ϕ(1, 1)   ϕ(1, 2)   ϕ(1, 3)   ϕ(1, 4)   . . .
2    ϕ(2, 0)   ϕ(2, 1)   ϕ(2, 2)   ϕ(2, 3)   ϕ(2, 4)   . . .
3    ϕ(3, 0)   ϕ(3, 1)   ϕ(3, 2)   ϕ(3, 3)   ϕ(3, 4)   . . .
4    ϕ(4, 0)   ϕ(4, 1)   ϕ(4, 2)   ϕ(4, 3)   ϕ(4, 4)   . . .
        ⋮         ⋮         ⋮         ⋮         ⋮
Then the ith row of this table is a listing of the values of ϕi . And that means that every
partial recursive function will turn up listed as a row in this array somewhere. In fact, any
such partial recursive function will turn up as such a row an infinite number of times (we
will prove this later).
Note well that we are listing partial recursive functions here, with an emphasis on the
“partial”. There will be many values ϕ(i, j) which do not exist — so there will be lots of
“holes” in this array.
This function ϕ : N2 99K N is called a universal partial function for obvious reasons.
Proof. Here is a program for such a universal function:
F(n,x)(m,r,v) {
if (n=0) {return x+1};
if (n=1) {return L(x)};
if (n=2) {return R(x)};
if (n=3) {return L(x)+R(x)};
if (n=4) {return L(x)−R(x)};
if (n=5) {return L(x)*R(x)};
if (n>5) {
m := ⌊n/3⌋ - 2;
r := Rem(n,3);
if (r=0) return F(L(m),F(R(m),x));
if (r=1) return P(F(L(m),x),F(R(m),x));
if (r=2) {
v := 0;
while (F(m,P(v,x)) ≠ 0) {v := v+1};
return v;
}
};
return 0;
}
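To see the encoding at work, here is a worked example of my own (assuming, as the program’s r=2 branch suggests, that L(P (v, x)) = v): take the index n = 11. Then m = ⌊11/3⌋ − 2 = 1 and r = Rem(11, 3) = 2, so
ϕ_11(x) = min_v{ F(1, P (v, x)) = 0 } = min_v{ v = 0 } = 0 ,
that is, ϕ_11 is the zero constant function.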
E.3 Remarks
(i) The method of constructing partial recursive functions N ⇢ N given in Lemma
E.1 above effectively constitutes a way of making algorithms, because it tells us that any
such function can be built up in a finite sequence of steps, starting with the basic functions
listed there and combining them with the operations listed until, at the last step, the required
function is produced. A listing of these steps then is one (perhaps inefficient) way of defining
an algorithm.
The proof of the last theorem shows us how to associate a number i with such a listing.
And conversely, given such a number i, it tells us how to break it down to find the listing.
E. TWO FUNDAMENTAL THEOREMS 421
What this means is that the subscripts i on the functions ϕ(i) can be taken to correspond to
algorithms to compute them according to the method given by the lemma. And, conversely,
if we know how to build up the function according to the lemma, then we can find such a
subscript i.
Now consider the original definition of a recursive function back in Definition A.A.3. This
also gives us a method, effectively an algorithm, for specifying and also computing any partial
recursive function, by listing the steps whereby the function can be built up according to
that definition.
But the proof of Lemma E.1 above tells us how to convert back and forth between its way
of defining partial recursive functions and that of the original Definition.
So that means that we can convert back and forth between a subscript i and an algorithm,
in the sense of a listing of steps according to the original definition.
Now we take this idea one step further. Suppose we have a partial recursive function specified
by a general algorithm, in the sense of Section C.A.1. Then the big long proof of Section E.1
tells us how we can rewrite that algorithm as a sequence of steps according to the original
definition. Actually doing so would be horrible, but a scan through that long proof will show
that, at every step the way of building up the requisite function is stated explicitly. And,
conversely, given an algorithm in the sense of a sequence of steps building up a function
according to the original definition, the proof of Proposition C.E.14 tells us how to convert
it into a general algorithm.
This in turn means that we can convert back and forth between indices i and general
algorithms.
And, putting this all together, no matter which way we like to write an algorithm, we can
convert back and forth between algorithms and the indices i on the functions ϕ_i .
Another way of looking at this is that these functions ϕi make the indices i into a nice, very
compact, way of referring to algorithms.
This word “effective”. In the context of recursive functions and algorithms, to say some
procedure is “effective” means that there is an algorithm to do it, that is, that it can definitely
actually be done. To say that ϕ0 , ϕ1 , ϕ2 , . . . is an effective listing means that we can go
back and forth between the indices i and the algorithms for computing the functions — that
there are actually algorithms for so doing.
(ii) We now have a “gold-plated” effective listing of all partial recursive functions N → N,
ϕ_0 , ϕ_1 , ϕ_2 , . . .
Using the recursive bijections P_m : N^m → N we can list the partial recursive functions N^m → N,
ϕ_0^(m) , ϕ_1^(m) , ϕ_2^(m) , . . .
simply by defining ϕ_i^(m) = ϕ_i P_m . In fact we can list the partial recursive functions N^m → N^n just as simply, by defining ϕ_i^(n,m) = E_n ϕ_i P_m .
In the particular ordering given by the chosen proof above, ϕ_0 , ϕ_1 , . . . , ϕ_5 are the functions
ϕ_0(x) = x + 1
ϕ_1(x) = L(x)
ϕ_2(x) = R(x)
ϕ_3(x) = L(x) + R(x)
ϕ_4(x) = L(x) ∸ R(x)
ϕ_5(x) = L(x)R(x) .
We notice that, as a consequence of the proof of Lemma E.1, there is a recursive function χ : N^2 → N such that
ϕ_j ϕ_i = ϕ_χ(j,i) for all i and j
(here ϕ_j ϕ_i denotes composition), and also that
ϕ_i^(m+n)(x_1 , x_2 , . . . , x_m , y) = ϕ_i^(m+1)(x_1 , x_2 , . . . , x_m , P_n(y)) .
In what follows it will be convenient to assume that we have chosen the particular listing
of partial recursive functions given in the theorem. However it is not necessary to do this:
most of the results still hold whatever effective listing is used (but then some of the proofs
must become more general).
One should observe that the listing is not one-to-one, in fact it is easy to see that any partial
recursive function appears in the list infinitely often. Less obvious is the fact that this must
be so in any effective listing: it is just not possible to effectively list the partial recursive
functions in such a way that every function appears just once. This explains some of the
choices of words in the theorems which follow. Rice’s Theorem (F.9 below) shows just how
bad the situation must be in this regard.
The next two lemmas and theorem are technical. They allow us to play some fancy games
with the subscripts i.
E.4 Lemma
For every partial recursive function f : Nn → N there is a recursive function fˆ : Nn → N
(not merely partial recursive!) such that
f ◦ (ϕ_i1 , ϕ_i2 , . . . , ϕ_in) = ϕ_f̂(i1,i2,...,in) for all i_1 , i_2 , . . . , i_n .
(The function on the left is x ↦ f (ϕ_i1(x), ϕ_i2(x), . . . , ϕ_in(x)). This is the notation introduced in Subsection A.7.)
so in this case fˆ = f .
Suppose next that f is addition, f (x, y) = x + y : N^2 → N. Then
f E_2(w) = f (L(w), R(w)) = L(w) + R(w) , so f = ϕ_3^(2) .
If f is obtained by minimalisation, so that
f (x_1 , x_2 , . . . , x_n) = min_y{ g(y, x_1 , x_2 , . . . , x_n) = 0 } ,
then, with w = P (y, x),
g(y, ϕ_i1(x), ϕ_i2(x), . . . , ϕ_in(x)) = g(L(w), ϕ_i1 R(w), ϕ_i2 R(w), . . . , ϕ_in R(w))
= g(ϕ_1(w), ϕ_i1 ϕ_2(w), ϕ_i2 ϕ_2(w), . . . , ϕ_in ϕ_2(w))    since L = ϕ_1 and R = ϕ_2
= g(ϕ_1(w), ϕ_χ(i1,2)(w), ϕ_χ(i2,2)(w), . . . , ϕ_χ(in,2)(w))
= ϕ_θ(w) = ϕ_θ P (y, x)
and the result is true again with f̂ = 3ĝ(1, χ(i_1 , 2), χ(i_2 , 2), . . . , χ(i_n , 2)) + 8.
E.5 Lemma
For any partial recursive function f : N^2 → N there is a recursive function f̃ : N → N such that f (x, y) = ϕ_f̃(x)(y) for all x and y.
Proof. Consider first the case of the function π(x, y) = x for all x and y. Define π̃ induc-
tively:
π̃(0) = z where z is the index of the zero function: ϕz (y) = 0 for all y,
π̃(x + 1) = χ(0, π̃(x)) where χ is the composition function defined above.
Let f be any partial recursive function N^2 → N. Then, by the previous lemma, there is a recursive function f̂ : N^2 → N such that ϕ_f̂(i,j)(y) = f (ϕ_i(y), ϕ_j(y)) for all i, j and y. Set f̃(x) = f̂(π̃(x), d), where d is the index of the identity function, ϕ_d(y) = y. Then
ϕ_f̃(x)(y) = f (ϕ_π̃(x)(y), ϕ_d(y)) = f (x, y)
as required.
E.6 Theorem (the s-m-n theorem)
For all m, n ≥ 1 there is a recursive function s_(m,n) : N^(m+1) → N such that
ϕ_i^(m+n)(x, y) = ϕ_(s_(m,n)(i,x))^(n)(y) for all i ∈ N, x ∈ N^m and y ∈ N^n.
Proof. We first prove the result in the case m = n = 1; the general case then follows easily.
So we prove: There is a recursive function s : N2 → N such that
ϕ_i^(2)(x, y) = ϕ_s(i,x)(y) for all i, x, y ∈ N.
We show how to define s recursively by giving a description of how to write a program for
s (with comments).
If i ≤ 5 then ϕ_i is one of the six basic functions listed in Lemma E.1. Therefore ϕ_i^(2) = ϕ_i P is one of the functions
(x, y) ↦ P (x, y) + 1 , x , y , x + y , x ∸ y or xy.
Whichever it is, call it f . Set s(i, x) = f̃(x) (where f̃ is the function defined in the preceding proposition) and then ϕ_s(i,x)(y) = ϕ_f̃(x)(y) = f (x, y).
where k = P (s(L(m), x), s(R(m), x)), so we set s(i, x) = 3P (s(L(m), x), s(R(m), x)) + 7.
Before moving on, note that the function N^3 → N^3 given by (z, x, y) ↦ (x, z, y) is recursive, so let its index be r: ϕ_r^(3,3)(z, x, y) = (x, z, y). Then, defining ρ : N → N by ρ(i) = χ(i, r), we have ϕ_ρ(i)(z, x, y) = ϕ_i(x, z, y) for all i, x, y and z.
= min_z{ ϕ_m^(3)(z, x, y) = 0 }
= min_z{ ϕ_ρ(m)(x, z, y) = 0 }
= min_z{ ϕ_ρ(m)^(2)(x, P (z, y)) = 0 }
= min_z{ ϕ_s(ρ(m),x) P (z, y) = 0 }
= ϕ_(3s(ρ(m),x)+8)(y)
Now we prove the result for m = 1 and any n. We want to prove that, for any n ≥ 1, there is a recursive function s_(1,n) : N^2 → N such that ϕ_i^(n+1)(x, y) = ϕ_(s_(1,n)(i,x))^(n)(y) for all i, x ∈ N and y ∈ N^n. This is easy: the function s already defined will do:—
ϕ_i^(n+1)(x, y) = ϕ_i^(2)(x, P_n(y)) = ϕ_s(i,x) P_n(y) = ϕ_s(i,x)^(n)(y) .
Finally we prove the result for any m ≥ 1 and n ≥ 1, by induction over m. The case m = 1 has been proved above. Then, for the case m + 1,
ϕ_i^(m+n+1)(x_1 , x_2 , . . . , x_(m+1) , y) = ϕ_s(i,x_1)^(m+n)(x_2 , x_3 , . . . , x_(m+1) , y)    just proved above,
= ϕ_(s_(m,n)(s(i,x_1),(x_2,x_3,...,x_(m+1))))^(n)(y)    by the inductive hypothesis,
and so the result is true with s_(m+1,n)(i, x_1 , x_2 , . . . , x_(m+1)) = s_(m,n)(s(i, x_1), (x_2 , x_3 , . . . , x_(m+1))).
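As an illustration of what the theorem buys us (my own example): if i is an index with ϕ_i^(2)(x, y) = x + y, then s(i, 3) is an index of the unary function y ↦ 3 + y, since ϕ_s(i,3)(y) = ϕ_i^(2)(3, y) = 3 + y. In other words, s lets us “freeze” some of the arguments of a function into its index, effectively.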
E.7 Remark
It was remarked above that the listing of partial recursive functions given by Proposition
I E.2 E.2 is by no means the only possible one. Many useful results about such effective listings
can be proved using the two main theorems of this section (Theorem E.2, as an existence
theorem, and the s-m-n Theorem).
F Some important negative results
In my opinion, these results are both alarming and fun, becoming gradually more so as you
progress through the section.
Throughout this section we will restrict our attention to partial recursive functions N ⇢ N, but that is only to make the discussion simpler. All of these results can be generalised easily to functions N^m ⇢ N^n.
F.1 Proposition
There is no universal recursive function. (Compare with the existence of a universal partial
function, proved in Theorem E.2.)
There is no recursive function ψ : N^2 → N such that the functions
ψ(0, _) , ψ(1, _) , ψ(2, _) , . . .
are all the recursive functions N → N (i.e. such that, for every recursive function f : N → N, there is some i ∈ N such that f (x) = ψ(i, x) for all x).
Proof. The proof is a classic “diagonal” argument. Suppose there were such a ψ and define g : N → N by
g(x) = ψ(x, x) + 1 .
Then g is recursive, so there is some i ∈ N such that g = ψ(i, _), that is,
g(x) = ψ(i, x) for all x.
In the case x = i, the last two displayed equations give ψ(i, i) = g(i) = ψ(i, i) + 1, a contradiction.
One of the points of Theorem E.2 is that it shows how, given an algorithm for a partial recursive function f , to find a subscript i such that f = ϕ_i ; and conversely, given i to find an
algorithm for f . Thus we can think of the subscript i as a convenient way of encapsulating
the algorithm for ϕi . So one way of asking what we can tell about a partial recursive function
from a given algorithm to compute it is to ask what we can tell about the function ϕi from
a knowledge of the index i. As we will see, the answer is “pretty well nothing”.
In this context note that the fact that, for any given partial recursive function f , there are
an infinite number of algorithms to compute it, corresponds to the fact that there are an
infinite number of indices i such that ϕi = f . Also note that, to ask whether such-and-such
a question about a partial recursive function can be answered from a knowledge of a given
algorithm to compute it is to ask whether the answer to the question about ϕi is a recursive
predicate of the subscript i. The next Proposition answers an obvious such question in the
negative.
F.2 Proposition
Given an algorithm for a partial recursive function, one cannot in general determine whether
it is recursive or not.
Of course, for some algorithms one can certainly determine this. The proposition says that you can’t always do it.
The set { i : ϕi is recursive } is not recursive.
Proof. Suppose that this set is recursive. Then its characteristic function
C(i) = 1 if ϕ_i is recursive, 0 otherwise
is recursive. Define ψ : N^2 → N by
ψ(i, x) = ϕ_i(x) if C(i) = 1, 0 otherwise.
Then ψ is recursive, and every recursive function N → N appears as ψ(i, _) for some i, contradicting Proposition F.1.
Next we look at the Halting Problem. This problem is to find a method whereby, given
any algorithm for a partial recursive function N 99K N and an argument x, one can decide
whether the algorithm halts or not for that x.
That this question is important, even central, is suggested by two things: firstly, that if such
a method exists, then it can easily be seen to extend to functions Nm 99K Nn and, secondly,
that it can be interpreted as asking whether there is any general and reliable method for
debugging computer programs (or at least for checking whether or not they get into an
“infinite loop”).
F.3 Proposition
The Halting Problem is not solvable.
Proof. Suppose it were solvable, so that the characteristic function C of the set { (i, x) : ϕ(i, x) is defined } is recursive. Now define ψ : N^2 → N by
ψ(i, x) = ϕ(i, x) if C(i, x) = 1, 0 otherwise.
Then ψ is a universal recursive function, contradicting Proposition F.1.
There is a subtle point here. This proposition says that there is no general method that will
work for every algorithm. No matter what method we try, there will be an algorithm out
there somewhere which defeats it. But there remains the possibility that there might exist a
(separate) method for each algorithm which would allow one to determine, for each values of
x, whether that algorithm halted or not. The existence of an infinite number of algorithms
does not mean that they could all be combined together into one single “algorithm to rule
them all”.
The next proposition settles that question: there are algorithms for which no method exists.
F.4 Proposition
There is a particular partial recursive function f whose halting problem is not solvable.
Proof. Consider the partial recursive function f : N ⇢ N given by f (w) = ϕ(L(w), R(w)), and suppose that its halting problem were solvable, so that the characteristic function C of dom f is recursive. Define ψ : N^2 → N by
ψ(i, x) = ϕ(i, x) if C(P (i, x)) = 1, 0 otherwise,
and then define g : N → N by g(x) = ψ(x, x) + 1.
Then g is recursive. So there is i ∈ N such that g = ϕ_i , that is, g(x) = ϕ(i, x) for all x. Put x = i. Noting that ϕ(i, i) = g(i), which is defined, so that P (i, i) ∈ dom f and C(P (i, i)) = 1, we have the contradiction
ϕ(i, i) = g(i) = ψ(i, i) + 1 = ϕ(i, i) + 1 .
F.5 Proposition
You cannot tell if any particular number occurs as a value either.
Given any y ∈ N, the set S = { (i, x) : ϕ(i, x) = y } is not recursive.
Proof. Suppose that S is recursive. Then we will solve the Halting Problem. Define
ψ : N^2 → N by ψ(i, x) = ϕ(i, x) ∸ ϕ(i, x) + y. Then
ψ(i, x) = y if (i, x) ∈ dom ϕ, and is undefined otherwise,
and (by Proposition E.5) there is recursive α : N → N such that ψ(i, x) = ϕ(α(i), x) for all i, x. Then
dom ϕ = { (i, x) : ψ(i, x) = y } = { (i, x) : (α(i), x) ∈ S } .
The last set here is recursive and the first set is not.
To see why the last set here is recursive, see Part (ii) of the Remark below. To see how Proposition E.5 is applied, use it in the case n = 1, with f (u) = u ∸ u + y and α = f̃.
F.6 Remark
Suppose that R is a recursive subset of Nn .
(i) Then so is N r R.
(ii) Let the function h : Nm → Nn be recursive also. Then so is the set { x : h(x) ∈ R } =
h−1 [R].
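(For Part (ii), note that the characteristic function of { x : h(x) ∈ R } is C ◦ h, where C is the characteristic function of R, and a composition of recursive functions is recursive.)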
F.7 Proposition
There is no effective procedure to decide what algorithm gives what function.
(Corollary: If a computer-science lecturer gives you an assignment to write a program to compute some given function, then there is no way he/she can be sure of being able to mark your answer right or wrong.)
Let g be any partial recursive function N → N. Then the set { i : ϕi = g } is not recursive.
Proof. Suppose first that g ≠ ∅, i.e. that g has nonempty domain. Let θ be a partial recursive function N → N with nonrecursive domain (e.g. one of the examples of Proposition F.4).
Define a new function f : N^2 → N by f (u, x) = θ(u) ∸ θ(u) + g(x). This is partial recursive and is the function
f (u, x) = g(x) if u ∈ dom θ, undefined otherwise.
Now, by Proposition E.5 there is a recursive α : N → N such that f (u, x) = ϕ_α(u)(x) for all u, x. Thus
ϕ_α(u)(x) = g(x) if u ∈ dom θ, undefined otherwise,
which is to say
ϕ_α(u) = g if u ∈ dom θ, ∅ otherwise.
Therefore (and here is where we use g ≠ ∅),
u ∈ dom θ ⇔ ϕ_α(u) = g ⇔ α(u) ∈ { i : ϕ_i = g } ,
so dom θ = α⁻¹[{ i : ϕ_i = g }]. If the set { i : ϕ_i = g } were recursive then, by the Remark above, dom θ would be recursive too, which it is not.
It remains to deal with the case g = ∅, so suppose that the set R = { i : ϕ_i = ∅ } is recursive.
Choose any nonempty partial recursive function g : N → N you like (say, the zero function).
Now define f and α as above, so again we have
ϕ_α(u) = g if u ∈ dom θ, ∅ otherwise.
Then
u ∈ dom θ ⇔ ϕ_α(u) ≠ ∅ ⇔ α(u) ∉ R ⇔ u ∈ α⁻¹[N ∖ R]
and we have a contradiction as before.
F.8 Corollary
There is no way of telling, from an algorithm, whether or not its function has any values at all.
F.9 Proposition (Rice’s Theorem)
Let R be a recursive subset of N which is functional, in the sense discussed below. Then R = ∅ or R = N.
We have been looking at properties of algorithms which are “functional”, in the sense that
they are really properties of the functions they define. (Is the function total? Does it have
any zeros? Does it have any values at all?) One can define a functional property, P say, of
algorithms as one such that
For any two algorithms which generate the same function,
if one of them has property P then so does the other.
Or, equivalently,
For any two algorithms which generate the same function,
if one of them fails to have property P then so does the other.
Another way of looking at this is to regard two algorithms as being “equivalent” if they
generate the same function (a very natural idea, also easily seen to be a genuine equivalence
relation). Then a property P is functional if and only if it respects equivalence, in the sense
that two equivalent algorithms either both have property P or both don’t.
Given the relationship between algorithms and the indices i on the ϕ-functions, which has
underlain the results in this section so far, it is natural to consider functional properties of
these indices. Thus we would consider a predicate (unary relation) P on N as functional if,
whenever ϕi = ϕj , i and j either both have the property or both don’t.
I think that functional properties defined on N are clearly of some importance, and it is of
considerable interest to be able to determine whether any particular property is functional
or not.
We have already looked at the close relationship between predicates on N and subsets of N.
(Give the predicate P , define the subset to be { x : P (x) } ). So we define a subset R of N
as being functional if
For any i, j ∈ N such that ϕ_i = ϕ_j ,
either both i and j are members of R or both are not.
Thus the question can be rephrased: can one determine whether any given subset of N is
functional or not? The proposition tells us that the only such subsets are ∅ and N itself.
Proof. Suppose, for a contradiction, that R is recursive and functional and that it is neither ∅ nor N. Let θ be a partial recursive function N → N with nonrecursive domain (e.g. one of the examples of Proposition F.4).
Now the empty function is partial recursive, so there is an index z such that ϕ_z = ∅. This z is a member either of R or of its complement; we may suppose without loss of generality that z ∈ N ∖ R. Since R is nonempty, it contains at least one number, k say, and then (R being functional) ϕ_k ≠ ϕ_z . Summary so far:—
k ∈ R , z ∈ N ∖ R , ϕ_k ≠ ϕ_z = ∅ .
Now define f : N^2 → N by f (u, x) = θ(u) ∸ θ(u) + ϕ_k(x). Then f is partial recursive and is the function
f (u, x) = ϕ_k(x) if u ∈ dom θ, undefined otherwise.
Now there is a recursive α : N → N such that f (u, x) = ϕ_α(u)(x) for all u, x. Thus
ϕ_α(u)(x) = ϕ_k(x) if u ∈ dom θ, undefined otherwise,
which is to say
ϕ_α(u) = ϕ_k if u ∈ dom θ, ∅ = ϕ_z otherwise.
Now
if u ∈ dom θ then ϕ_α(u) = ϕ_k and so α(u) ∈ R
and
if u ∉ dom θ then ϕ_α(u) = ϕ_z and so α(u) ∈ N ∖ R .
It follows that dom θ = { u : α(u) ∈ R } = α⁻¹[R], which is recursive (by the Remark above), contradicting the choice of θ.
Let us say that a property of partial functions N ⇢ N is “trivial” if either all functions have the property or all functions don’t. (Thus a trivial property tells us nothing about the function.) Examples of trivial properties are “f is a partial function N ⇢ N”, “if f (x) exists, then f (x) = f (x)” and so on. So we are really only interested in non-trivial properties.
F.10 Corollary
There is no way of telling, from its algorithm, whether a function has any nontrivial property
at all.
Let P be any nontrivial property of functions N ⇢ N. Then the set { i : ϕ_i has property P } is clearly functional and is neither ∅ nor N, and so (by Rice’s Theorem) it is not recursive.
Is that fact depressing or exciting? Possibly both.
G Recursively enumerable sets
G.1 Definition
A subset of N is recursively enumerable if it is the range of a partial recursive function N ⇢ N.
G.2 Discussion
So what is the difference between a recursive set and a recursively enumerable set?
For a recursive set R, there is an algorithm to decide, for any n ∈ N whether n ∈ R or not.
One way I find helpful to look at this is: think of the algorithm as a machine. We feed any
numbers we like into it, and it spits out answers “Yes” or “No”.
With a recursively enumerable set S, there is an algorithm such that S is all the values it can
compute. Thinking of the algorithm as a machine, we feed it with all the natural numbers
0, 1, 2, . . . and collect all the numbers it spits out. After this infinite process is completed,
the collection of spat out numbers is S.
It is not hard to see that a recursive set must be recursively enumerable (details below), but
the other way round is problematical. Given our machine to generate the set S as above,
how could we determine whether any particular number n is in it or not? We could set the
machine going, with the numbers 0, 1, 2, . . . as input and wait to see if n is spat out. Now
if n happens to be in S, we will eventually see it appear, and our question is answered with
“Yes”. But if it is not in S, then we must wait forever for our answer — we have effectively
got ourselves into an infinite loop.
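To make the machine picture concrete, here is a sketch of my own in the programming language of Chapter 9 (assuming S is the range of a recursive subfunction f):
1 inS(n)(x) {
2 x := 0;
3 while (f(x) ≠ n) x := x+1;
4 return 1
5 }
This returns 1 whenever n ∈ S, and runs forever otherwise — exactly the one-sided behaviour just described.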
That is not a proof that there are recursively enumerable sets which are not recursive. The
fact that the suggested decision method doesn’t work does not mean that there isn’t another,
trickier, method which will. However we will prove below that there are such sets, including
some very important ones. In the meantime, I suppose that the existence of the two different
words for describing subsets suggests strongly that there is some difference between the two
types.
Why not use “recursive” instead of “partial recursive” in the definition? For two reasons.
Firstly, because we are interested in the output of algorithms, and as we have seen in the last
section, deciding whether an algorithm produces a recursive function or not is not always
possible. And secondly, as the next proposition shows, we can indeed define a recursively
enumerable set to be the range of a recursive function — with one small adjustment. Perhaps
more surprisingly, we can even define it as the range of a primitive recursive function (with
the same adjustment).
So what is that small adjustment? The empty set is recursively enumerable — we know
that because it is easy to define an algorithm which has no values at all. But the empty set
cannot possibly be the range of any total function. We will see that that is the only thing
that can go wrong: we can define a r.e. set as either ∅ or the range of a recursive function.
G.3 Proposition
For a subset R of N, the following are equivalent:
(i) R is recursively enumerable (i.e. the range of a partial recursive function N ⇢ N).
(ii) R is either ∅ or the range of a recursive function N → N.
(iii) R is either ∅ or the range of a primitive recursive function N → N.
Suppose then that R is the range of a partial recursive function f and is nonempty; we
show that it is the range of a primitive recursive function. Using Theorem C.E.17, there are
primitive recursive functions p and q : N2 → N such that
f (x) = p( min_t{ q(t, x) = 0 } , x ) .
R = { p(t, x) : q̄(t, x) = 1 } .
With this definition it is obvious that r is primitive recursive. But, since q̄ only takes values 0 and 1, the definition may be restated
r(t, x) = p(t, x) if q̄(t, x) = 1 , n_0 otherwise,
where n_0 is some fixed member of R. Here
q_1(t, x) = 0 if there is u ≤ t such that q(u, x) = 0 , and is nonzero otherwise.
G.4 Remark
Since we have primitive recursive one-to-one correspondences between N and Nn for each n,
functions N → N can be replaced by functions Nn → N in the above proposition. Using the
same idea, recursively enumerable subsets of Nn can be defined.
The next proposition is surprising, but nice. (IM¬HO anyway.)
G.5 Proposition
A subset R of N is recursively enumerable if and only if it is the domain of a partial recursive
function N → N.
Proof. First, suppose that R is recursively enumerable. If R = ∅ then it is the domain of the empty function, which is partial recursive; so suppose R ≠ ∅. Then it is the range of a recursive function f : N → N. Define g : N → N by
g(x) = min_i{ f (i) = x } .
Then g(x) is defined exactly when x occurs as a value of f , that is, dom g = R, as required.
We now have four different characterisations of a recursively enumerable set. When trying
to prove things about these sets, the choice of which one to use can make a big difference
to the difficulty of finding a proof. I would recommend always contemplating whether it is
going to be easier to use a “range” or “domain” definition at the outset when embarking on
such a proof.
G.6 Proposition
If a subset is recursive then it is recursively enumerable.
Proof. Let R be a recursive set, C its characteristic function. Then R is the domain of the partial recursive function
f (x) = min_i{ C(x) = 1 } .
G.7 Proposition
A subset R of N is recursive if and only if both it and its complement N r R are recursively
enumerable.
Proof. If R is recursive then so is its complement and then they are both recursively
enumerable by the previous proposition.
Now suppose both R and its complement are recursively enumerable. If either one is empty
the result is trivial, so we will suppose they are both nonempty. Then there are recursive
functions f and g such that R is the range of f and its complement is the range of g. Now
define a new function h : N → N by
h(x) = f (⌊x/2⌋) if x is even, g(⌊(x − 1)/2⌋) otherwise.
The point of this is that the sequence of values of h consists of alternate values of f and g:
h(0) = f (0) , h(1) = g(0) , h(2) = f (1) , h(3) = g(1) ,
and so on. In particular, the set of even terms of h is R and the set of odd terms is its complement. Therefore the function
θ(x) = min_i{ h(i) = x }
is recursive (it is total, since every number occurs as a value of h), and x ∈ R if and only if θ(x) is even; so the characteristic function of R is recursive and R is recursive.
And just in case you’ve been wondering whether there are in fact any recursively enumerable sets which are not recursive, the answer is yes. In Section F we found several partial recursive functions whose domains were not recursive. These domains however must be recursively enumerable. Perhaps the most obvious example is the domain of ϕ.
G.8 Proposition
Let A and B be recursively enumerable sets. Then so are A ∪ B and A ∩ B.
Proof. If either A or B is empty the result is trivial, so now we may assume that A and
B are the ranges of recursive functions f and g respectively. Define h as in the previous
proposition. Then h is recursive and A ∪ B is the range of h, so it is recursively enumerable.
A and B are also the domains of partial recursive functions a and b respectively and then
A ∩ B is the domain of a + b.
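(Here a + b denotes the function x ↦ a(x) + b(x); its value at x exists exactly when both a(x) and b(x) exist, so its domain is indeed A ∩ B.)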
Note that, if A is a recursively enumerable set, then its complement need not be. Indeed,
if A is not actually recursive, then its complement cannot be recursively enumerable (by
Proposition G.7 above).
G.9 Proposition
Let A and B be disjoint recursively enumerable sets whose union is recursive. Then they
are both recursive.
10. GÖDEL’S THEOREM
A Gödel numbering
It will be assumed throughout this chapter that we are dealing with a fixed first-order theory
which “contains Robinson Arithmetic”, in the sense that it is an extension of the theory RA
described in Section 4.D.
Noting that PA is itself an extension of RA, everything in this chapter is true for any first-order theory S which contains Peano Arithmetic PA, and that includes MK and ZF.
In any case, such a language has the relation = (equality, binary) and the functions 0̄ (zero, nullary), x ↦ x⁺ (successor, unary), + and × (addition and multiplication, binary), but,
since it is an extension of RA, it may contain other relations and functions.
Recall that, to any natural number ξ, we defined a corresponding term ξ̄ in S (the “bar notation”), which was very useful to us then. We proved then that the function P[∞] is bijective.
First we number the symbols in our language. It does not matter much how we do this,
provided that the numbers given to different symbols are different; we will assume that they
are numbered in a one-to-one fashion. For example, suppose there are m (formal) function
symbols, f1 , f2 , . . . , fm of arities α1 , α2 , . . . , αm respectively, n (formal) relation symbols,
r1 , r2 , . . . , rn of arities β1 , β2 , . . . , βn respectively and the variable symbols are v1 , v2 , . . . ;
then we could number the symbols as follows:
¬ 0
⇒ 1
∀ 2
( 3
) 4
, 5
f1 , f2 , . . . , fm 6,7,. . . ,m+5
r1 , r2 , . . . , rn m+6,m+7,. . . ,m+n+5
v1 , v2 , . . . m+n+6,m+n+7,. . .
We may call this the Gödel numbering of symbols and denote it gnSym. Thus, for example,
gnSym(∀) = 2 and gnSym(f2 ) = 7 (assuming f2 actually exists).
We can now use this numbering to represent each string in the language by a sequence of
natural numbers. For example, let us see how we would represent the axiom (∀x)(x⁺ ≠ 0̄) .
Firstly, we have to rewrite this axiom in its fully formal form, according to the definitions of
Chapter 3; this is (∀x(¬ = (s(x), 0̄))) (here I am using s for the successor function). Now we
need to know what the Gödel numbers of the symbols 0̄, s, x and = are; this would depend
on how many extra functions and relations we have in our language S; let us suppose that
gnSym(0̄) = 7 , gnSym(s) = 8 , gnSym(=) = 10 and gnSym(x) = 11 .
Then our string would correspond to the sequence (3, 2, 11, 3, 0, 10, 3, 8, 3, 11, 4, 5, 7, 4, 4, 4).
(So far, this is quite closely analogous to the way a computer deals with text. Each symbol
of the text is stored as a number — its ASCII code — and words and expressions are stored
as sequences of these numbers. But next we go further.)
(In a formal language, none of the expressions are empty strings, so we need not deal with
the empty string at all.)
We can now use our P[∞] numbering of nonempty sequences just defined above to make
each string correspond to a single number. In the case of our example this would be
P[∞] (3, 2, 11, 3, 0, 10, 3, 8, 3, 11, 4, 5, 7, 4, 4, 4). This gives us a Gödel numbering of strings.
More precisely, it gives us a function gnStr from the set of all (nonempty) strings in the language S to N defined thus: for the string a1 a2 . . . ak (where a1 , a2 , . . . , ak are the individual characters (symbols) in the string)
gnStr(a1 a2 . . . ak) = P[∞](gnSym(a1), gnSym(a2), . . . , gnSym(ak)) .
Since gnSym and P[∞] are both bijective, so is gnStr and so any string in the language may
be identified with its Gödel number. This allows us to make statements such as ...
A.3 Proposition
The set of all expressions in S is recursive.
By this I mean that the set of all string Gödel numbers of expressions in the language is a
recursive subset of N.
Informal proof. Consider the definition of an expression given in Chapter 3. This provides a completely mechanical method for checking whether any given string is an expression
or not. Indeed, using the programming language described in Chapter 9, it is tedious but not
difficult to write a program to compute the characteristic function of this set, thus showing
that it is recursive.
Note Here and in what follows I will give a number of “informal proofs” that certain sets
are recursive or recursively enumerable. A detailed proof would consist of the presentation of
the appropriate algorithm. The algorithms for these proofs are indeed presented in Appendix
D in case you should feel suspicious about any of the informal ones.
A.4 Proposition
The set of all sentences in S is recursive.
Informal proof. As for the last proposition but notice also that deciding whether an
expression contains any free variables or not is also a mechanical procedure.
A.5 Proposition
The set of all axioms of PL is recursive.
By this I mean of course that the set of all string Gödel numbers of axioms of PL is a
recursive subset of N.
Remark The six “axioms” of PL are of course axiom schemas and represent between them an infinite number of actual axioms. Basically they tell us that any expression with one of the six prescribed kinds of structure is an axiom.
Informal proof. Consider the definition of an expression given in Chapter 3. This also
provides a completely mechanical method for breaking up an expression into its component
parts, so its structure can be compared with those of the six axiom schemas. Again it is
tedious but not difficult to write a program to compute the characteristic function of this
set, thus showing that it is recursive.
Recall that in an axiomatic theory, we have the set of theorems which may be deduced
from a chosen set of axioms. Given a set A of expressions, we write Th(A) for the theory
generated by A, that is, the set of all expressions X such that A ⊢ X. Thus we say that
A is a set of axioms for the theory T if T = Th(A) . It may be the case that A is different
from the axioms we chose to generate T in the first place.
We will say that a theory is recursively axiomatisable if it has a recursive set of axioms
(whether they were the ones chosen to generate it or not).
A.9 Proposition
In a recursively axiomatisable theory, the set of all proofs is recursive.
By this we mean of course that the set of all sequence Gödel numbers of proofs in the
language is a recursive subset of N.
Informal proof. Consider the definition of a proof given in Chapter 3. This provides a
completely mechanical method for checking whether any given sequence of strings is a proof
or not.
Consider what is involved. First we must step through the individual strings checking that
each is indeed an expression: we recover their string Gödel numbers using the function ent,
and test them as guaranteed by Proposition A.3 above. We must also check that each string
arises by one of the rules for forming a proof – that it is a case of an axiom (checkable by
assumption), that it follows from two earlier steps by MP or from one earlier step by UG.
All these are mechanical things to check, though the algorithms might be tiresome to write
out.
A.10 Proposition
The “Proof of” relation
⟨ξ, η⟩ ↦ ξ is the number of some expression and η is the number of a proof of that expression
is recursive.
Informal proof. Same as for the previous proposition.
A.11 Theorem
In a recursively axiomatisable theory, the set of all theorems is recursively enumerable.
Proof. Let a be the string Gödel number of some fixed theorem, say the first instance of the
first axiom of PL. Now consider the function f : N → N defined by the following algorithm:
for each w ∈ N, compute whether w is the Gödel number of a valid proof or not. If it is the
Gödel number of a valid proof, output the string number of its last entry, ent(len(w), w) —
which is the string Gödel number of the theorem it proves. If w is not the Gödel number of
a valid proof, then output a.
Then the set of all theorems is just the range of f .
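In the programming language of Chapter 9 this f might be sketched as follows (my own sketch, assuming a subfunction isproof computing the characteristic function guaranteed by Proposition A.9, the functions len and ent, and the fixed number a):
1 f(w)() {
2 if (isproof(w) = 1) {return ent(len(w),w)};
3 return a
4 }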
A.12 Theorem
A recursively axiomatisable complete theory is decidable.
Proof. Let T be the set of all sentences (closed expressions) in the language which are the-
orems and let U be the set of all “antitheorems” — sentences whose negation is a theorem (A is an antitheorem means ⊢ ¬A). The set of antitheorems is clearly recursively enumerable
too — simply enumerate the theorems as described above and negate each one as you go.
If T and U are not disjoint then the theory is inconsistent; then every expression is a theorem
and so the set of theorems is recursive (the set of its Gödel numbers is N).
Suppose now that T and U are disjoint. They are both recursively enumerable and their
union is the set of all sentences (that is the definition of “complete”), which is recursive.
Therefore (by 9.G.9) T is recursive.
Recall that the expression (!y) F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y) here is short for
(∃y)( F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y) ∧ (∀y′)( F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y′) ⇒ y′ = y ) ) .
B.2 Remark
If the function f : Nn → N is representable, then the (n + 1)-ary relation given by
f (ξ1 , ξ2 , . . . , ξn ) = η is expressible. The first of the two requirements of Part (i) of the
definition above is given explicitly. As for the second, suppose that f (ξ1 , ξ2 , . . . , ξn) ≠ η. Then, writing θ for the true value, that is, defining θ = f (ξ1 , ξ2 , . . . , ξn), we have θ ≠ η and so, from these last two equations,
From this and the last line of the definition above it follows that ⊢ ¬F (ξ̄1 , ξ̄2 , . . . , ξ̄n , η̄) .
However, note that the notion of representability of the function says more than that the
relation f (ξ1 , ξ2 , . . . , ξn ) = η is expressible. The last line of the definition above says that,
moreover, this relation is of the kind that defines a function, well, at least when the argu-
ments represent natural numbers.
(Think: (∃x) (x = 2) and (∃x) (x = 3) are both true, but (∃x) (x = 2 ∧ x = 3) is not.) It is
however valid to deduce that (∃x)(∃x′)(P (x) ∧ Q(x′)).
When using the choice rule, as we will in the proof below, we have a similar consideration.
If we have lines of the forms (∃x)P (x) and (∃x)Q(x), it is valid to apply the choice rule to
the first line to get P (x), but if we apply the choice rule again to the second line, we must
introduce a new choice variable and write, say, Q(x′).
(i) The successor function f (ξ) = ξ⁺ is represented by the expression
F (x, y) : y = x⁺ .
Suppose that f (ξ) = η, that is, that ξ⁺ = η. Then ⊢ ξ̄⁺ = η̄ by the definition of the bar notation, and so ⊢ F (ξ̄, η̄).
For uniqueness, suppose that F (ξ̄, y) ∧ F (ξ̄, y′), that is, y = ξ̄⁺ ∧ y′ = ξ̄⁺ ; then y = y′ by substitution of equals.
(ii) The projection function πn,i (ξ1 , ξ2 , . . . , ξn ) = ξi is represented by
Πn,i (x1 , x2 , . . . , xn , y) : y = xi ∧ x1 = x1 ∧ x2 = x2 ∧ . . . ∧ xn = xn
The apparently redundant parts of this expression are required to satisfy the stipulation in
the definition that there be exactly n + 1 variables.
Suppose that π_(n,i)(ξ1 , ξ2 , . . . , ξn) = η, that is, ξi = η. Then ⊢ ξ̄i = η̄ and, trivially, for i = 1, 2, . . . , n, ⊢ ξ̄i = ξ̄i . Therefore ⊢ Π_(n,i)(ξ̄1 , ξ̄2 , . . . , ξ̄n , η̄).
For uniqueness, suppose that Π_(n,i)(ξ̄1 , ξ̄2 , . . . , ξ̄n , y) ∧ Π_(n,i)(ξ̄1 , ξ̄2 , . . . , ξ̄n , y′). Then y = ξ̄i and y′ = ξ̄i , from which y = y′ .
(iii) Addition is represented by
A(x1 , x2 , y) : x1 + x2 = y .
Multiplication is represented by
M (x1 , x2 , y) : x1 x2 = y .
and the proof is the same as for addition (using 4.D.7(iii)).
446 CHAPTER 10. GÖDEL’S THEOREM
Suppose first that ξ1 < ξ2 and η = 0. Then ⊢ ξ̄1 < ξ̄2 by 4.D.7(vi) and η̄ = 0̄ by the definition of bar notation. But then ⊢ S(ξ̄1 , ξ̄2 , η̄) by plain SL.
Suppose on the other hand that ξ2 ≤ ξ1 and η + ξ2 = ξ1. Then ⊢ ξ̄2 ≤ ξ̄1 by 4.D.7(vi′) and ⊢ η̄ + ξ̄2 = ξ̄1 by the definition of bar notation. Then again ⊢ S(ξ̄1 , ξ̄2 , η̄) by plain SL.
Note that here we have used Proof by Cases, but this has been carried out in the metalanguage; we have not used the fact that such a proof is valid in RA (even though it is).
Note also that ⊢ ξ̄1 = y + ξ̄2 ⇒ ξ̄2 ≤ ξ̄1 , and so S(ξ̄1 , ξ̄2 , y) is equivalent to the disjunction of the two cases above; write the corresponding hypotheses for y and y′ as (–1) and (–2).
Suppose now that ξ1 < ξ2 . Then ⊢ ξ̄1 < ξ̄2 and ⊢ ¬(ξ̄2 ≤ ξ̄1) (from D.7(vi) and (vi′)) and so (–1) and (–2) reduce to y = 0̄ and y′ = 0̄, from which y = y′ .
Otherwise ξ2 ≤ ξ1 , in which case (–1) and (–2) reduce similarly to ξ̄1 = y + ξ̄2 and ξ̄1 = y′ + ξ̄2 , from which y = y′ again.
Next, suppose that f is obtained by substitution and that f (ξ1 , ξ2 , . . . , ξn) = η; define ζi to be
ζi = gi(ξ1 , ξ2 , . . . , ξn) for i = 1, 2, . . . , m.
Then
⊢ Gi(ξ̄1 , ξ̄2 , . . . , ξ̄n , ζ̄i) for each i and ⊢ H(ζ̄1 , ζ̄2 , . . . , ζ̄m , η̄)
and so
⊢ F (ξ̄1 , ξ̄2 , . . . , ξ̄n , η̄) .
Using Universal Generalisation and the Deduction Theorem, it is enough to show that
F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y) , F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y′) ⊢ y = y′ .
Using the choice rule on these (and this is where the preliminary remark at the beginning of the proof is relevant), we have
H(z1 , z2 , . . . , zm , y) (a)
G1(ξ̄1 , ξ̄2 , . . . , ξ̄n , z1) (a1)
G2(ξ̄1 , ξ̄2 , . . . , ξ̄n , z2) (a2)
. . .
Gm(ξ̄1 , ξ̄2 , . . . , ξ̄n , zm) (am)
H(z′1 , z′2 , . . . , z′m , y′) (b)
G1(ξ̄1 , ξ̄2 , . . . , ξ̄n , z′1) (b1)
G2(ξ̄1 , ξ̄2 , . . . , ξ̄n , z′2) (b2)
. . .
Gm(ξ̄1 , ξ̄2 , . . . , ξ̄n , z′m) (bm)
(as separate lines in our proof). Now, from Lines (a1) and (b1), we have z′1 = z1 and from the other similar pairs of lines, z′2 = z2 , . . . , z′m = zm . Then substitution of equals in Line (b) gives H(z1 , z2 , . . . , zm , y′) which, with Line (a), gives y′ = y, as required.
and, for any ν < η, using Remark B.2 above, we have
⊢ ¬G(0̄, ξ̄1 , ξ̄2 , . . . , ξ̄n , 0̄) ∧ ¬G(1̄, ξ̄1 , ξ̄2 , . . . , ξ̄n , 0̄) ∧ . . . ∧ ¬G(\overline{η−1}, ξ̄1 , ξ̄2 , . . . , ξ̄n , 0̄) .
Also,
⊢ (∀u) (u < η̄ ∨ u = η̄ ∨ η̄ < u) (–5)
by 4.D.7(vii′). Now equations (–3), (–4) and (–5) together give ⊢ F (ξ̄1 , ξ̄2 , . . . , ξ̄n , η̄), as required.
Using Universal Generalisation and the Deduction Theorem as before, it is enough to prove that
F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y) , F (ξ̄1 , ξ̄2 , . . . , ξ̄n , y′) ⊢ y = y′ .
Now let us give an argument in S. We have
Suppose that y < y′. Then (–7′) gives ¬G(u, ξ̄1 , ξ̄2 , . . . , ξ̄n , 0̄), which contradicts (–6). Therefore ¬(y < y′). In the same way ¬(y′ < y). From (–8′′) now, y = y′, as required.
(vii) Suppose that the n-ary relation r is recursive: we want to show that it is expressible.
By definition of recursiveness of a relation, its characteristic function
c(ξ1 , ξ2 , . . . , ξn) = 1 if r(ξ1 , ξ2 , . . . , ξn) is true, 0 if r(ξ1 , ξ2 , . . . , ξn) is false.
is recursive and so, as just proved above, is representable. Therefore there is an expression
C(x1 , x2 , . . . , xn , y) which represents it. We will show that r is expressed by the (once again
obvious?) expression C(x1 , x2 , . . . , xn , 1̄).
If r(ξ1 , ξ2 , . . . , ξn) is true, then c(ξ1 , ξ2 , . . . , ξn) = 1 and so ⊢ C(ξ̄1 , ξ̄2 , . . . , ξ̄n , 1̄). On the other hand, if r(ξ1 , ξ2 , . . . , ξn) is false, then c(ξ1 , ξ2 , . . . , ξn) = 0 and so, using Remark B.2 above, ⊢ ¬C(ξ̄1 , ξ̄2 , . . . , ξ̄n , 1̄), as required.
C Gödel’s theorem
Recall that, in Proposition 4.D.8, the following was shown to be a theorem of S:
P (0̄) ∧ P (1̄) ∧ . . . ∧ P (ν̄) ⇔ (∀x ≤ ν̄)P (x)
This will be used in the proof of Gödel’s Theorem below.
We will be using Gödel numbers so often in this proof that I will call them simply “GNs”.
Notice that we are already doing something rather peculiar: substituting into P (x) its own
GN.
These relations are recursive (Informal proof: As for Proposition A.10, with a few more obviously algorithmic checks). They are therefore expressible. Let Ŵ (x, y) and V̂ (x, y) be the expressions which express them, thus:
The same argument goes for the other three statements. Thus we have
This (sort of) says that a certain statement has no proof. Being more careful:—
Let ξ be any natural number. Because N is a model for S, we can infer from
⊢ W*(ξ̄) , that is, from ⊢ (∀y)¬Ŵ (ξ̄, y) ,
that
(∀η ∈ N) ¬W (ξ, η)
and this says that, for any η ∈ N, η is not the GN of a proof of P_ξ(ξ̄). But since every proof has a GN, this says that
P_ξ(ξ̄) has no proof. (–5)
Now W ∗ (x) is an expression with exactly one free variable x, so it has a GN, which we will
call µ. That is, it is the expression Pµ (x).
Now let us see what happens if we substitute µ for x in this, to get W*(µ̄).
In (–5) we have just seen that, if ⊢ W*(µ̄), then P_µ(µ̄) has no proof, and in (–6) that P_µ(µ̄) is the same as W*(µ̄). So this tells us that
if ⊢ W*(µ̄) then ⊬ W*(µ̄) .
At this point it seems we should be able to finish the proof off reasonably quickly: argue
the other way to get
if ⊬ W*(µ̄) then ⊢ W*(µ̄)
from which ⊢ W*(µ̄), and so S is not complete.
But this is where we hit the trap for young players: to say that W*(µ̄) is true (that is, that P_µ(µ̄) has no proof) is to say that ⊬ W*(µ̄), which is a statement about S rather than a statement proved in S.
Now we use that other statement V (ξ, η) defined at the beginning in (–2). Let V*(x) be the expression
(∀y)( Ŵ (x, y) ⇒ (∃z ≤ y)V̂ (x, z) ) .
This has exactly one free variable x, so it has a GN, ν say; that is,
V*(x) = P_ν(x) .
We now show that neither V ∗ (ν̄) nor ¬V ∗ (ν̄) is a theorem, and so S is not complete.
Both facts are proved by contradiction.
Suppose first that ⊢ V*(ν̄). Since this is a theorem, it has a proof. Let α be the GN of a proof of this. (n.b. There may be many; pick one.)
But V ∗ (ν̄) is Pν (ν̄) and, since α is the GN of a proof of this (using (–1)) we have
W (ν, α) is true
Noting that we are working with the assumptions that ⊢ V*(ν̄) and that S is consistent, we have
⊬ ¬V*(ν̄) ,
in other words: there is no proof of ¬V*(ν̄). In particular, for any natural number η, η is not the GN of a proof of ¬V*(ν̄), which is ¬P_ν(ν̄). In other words again, V (ν, η) is false for all η. Thus (by (–4B)) we have
⊢ ¬V̂ (ν̄, η̄) for every η ∈ N.
In particular
⊢ ¬V̂ (ν̄, 0̄) , ⊢ ¬V̂ (ν̄, 1̄) , . . . , ⊢ ¬V̂ (ν̄, ᾱ)
so, since there is only a finite number of these,
⊢ ¬V̂ (ν̄, 0̄) ∧ ¬V̂ (ν̄, 1̄) ∧ . . . ∧ ¬V̂ (ν̄, ᾱ)
and
⊢ ¬(∃z ≤ ᾱ)V̂ (ν̄, z)
which contradicts (–6).
Suppose on the other hand that ⊢ ¬V*(ν̄), and let β be the GN of a proof of this. From the consistency of S,
⊬ V*(ν̄)
and that means that there is no proof of V*(ν̄) in S. Then (from (–1)) W (ν, η) is false for all η ∈ N. Then, using (–3B),
⊢ ¬Ŵ (ν̄, η̄) for every η ∈ N ;
in particular
⊢ ¬Ŵ (ν̄, 0̄) , ⊢ ¬Ŵ (ν̄, 1̄) , . . . , ⊢ ¬Ŵ (ν̄, β̄)
and so
⊢ (∀y ≤ β̄) ¬Ŵ (ν̄, y) (–11)
Recalling that this is just an abbreviation for (∀y)(y ≤ β̄ ⇒ ¬Ŵ (ν̄, y)), we have
Proof. Let f be the function N → N defined thus: f (ξ) is the Gödel number of the statement obtained by finding the expression with Gödel number ξ and replacing all free variables in it by ξ̄. This is clearly recursive, so it is represented by an expression of exactly two free variables, F (x, y) say:
if f (ξ) = η then ⊢ F (ξ̄, η̄)
and
⊢ (∀x)(!y)F (x, y) . (–1)
Given the expression P (x), let R(x) be the expression
(∃u)( P (u) ∧ F (x, u) )
and suppose that R(x) has Gödel number ρ. Let Q be R(ρ̄) and suppose that Q has Gödel number ψ. From this and the definition of f , we see that ψ = f (ρ) and therefore that
⊢ F (ρ̄, ψ̄) . (–2)
From the definition of Q we also see that it is in fact the expression (∃u)( P (u) ∧ F (ρ̄, u) ). From this, (–1) and (–2) it follows (by ordinary logic) that
⊢ Q ⇔ P (ψ̄)
as required.
Note Since S is any first-order theory which contains Peano Arithmetic, this theorem says
that no such theory can be both consistent and decidable.
Proof. We suppose that this set, T say, is recursive and prove that then S is not consistent. So we are supposing that the unary relation r on N defined by
r(ξ) is true ⇔ ξ ∈ T
is recursive, and therefore expressible, by R(x) say:
if ξ ∈ T then ⊢ R(ξ̄) (–1)
and if ξ ∉ T then ⊢ ¬R(ξ̄) . (–2)
Applying the fixed-point theorem above to ¬R(x), there is a sentence Q with Gödel number ψ such that
⊢ Q ⇔ ¬R(ψ̄) . (–3)
Suppose now that Q is not a theorem; then ψ is not the Gödel number of a theorem, so ¬R(ψ̄) is a theorem by (–2) and then so is Q by (–3). But this is a contradiction, so Q is a theorem. But then, by (–1), R(ψ̄) is a theorem and so, from (–3) again, so is ¬Q; thus S is inconsistent.
Proof. Because if it were complete, then by A.12 it would be decidable, contradicting Church's Theorem above.
This is the famous Gödel theorem, as strengthened by Rosser. It says that Peano's axioms are not sufficient to “answer every question about N”, in the sense that, if PA is consistent, then it is not complete. But it also says much more than that: it is impossible to achieve this result by adding more axioms, provided those axioms are such that it is possible to decide what is an axiom and what isn't.
D PL is not decidable
D.1 Discussion
We have seen in this chapter, among other things, that Mathematics as a whole is not
decidable, whether we do it from a basis of MK or of ZF — or of pretty well anything that
allows counting for that matter. I suppose that is just as well, otherwise we mathematicians
would soon be out of a job.
We did see however that Sentential Logic SL is decidable (back in Section 2.G). That means that, once you get the hang of it, SL is completely routine.
Considering the Truth Table method, it feels very much as though just a bit of tweaking
would upgrade it to a decision method for Predicate Logic PL. Any given expression contains
only a finite number of functions, relations and variables. Surely, it seems, one could look
at all possible combinations of truth-values the relations could have and assign a variable
to each one — or something like that. Well, to cut a long story short, that cannot work.
We can now prove, fairly easily, that PL is not decidable. While the proof is now short and
easy, that does not mean it is simple, because it builds on a number of the big results in
these notes.
These are the ideas and results we will call on:
D.2 Lemma
If PL is decidable, then so is any finitely axiomatisable theory.
Proof. Let us suppose that PL is decidable, and let T be a finitely axiomatisable theory, that is, a first-order theory with a finite set of proper axioms A1, A2, . . . , An.
We need an algorithm to decide whether or not any given expression P is a theorem of T. By the Deduction Theorem, P is a theorem of T if and only if
A1 ∧ A2 ∧ · · · ∧ An ⇒ P
is a theorem of PL, and that is something we can decide, since we are supposing PL decidable.
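In more concrete terms, here is a minimal sketch of that reduction in Python (decide_pl is a hypothetical oracle for theoremhood in PL, and the formula syntax is invented for illustration; no such oracle exists, as the next theorem shows):

    # Sketch only: IF PL had a decision procedure decide_pl, theoremhood in
    # any finitely axiomatised theory T would be decidable as well.
    def decide_t(axioms, p, decide_pl):
        # P is a theorem of T  iff  (A1 & A2 & ... & An) => P is a theorem of PL
        conjunction = " & ".join("(" + a + ")" for a in axioms)
        return decide_pl("(" + conjunction + ") => (" + p + ")")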
D.3 Theorem
Predicate Logic PL is not decidable.
Proof. We know that RA is finitely axiomatisable. So, by the previous lemma, if PL were
decidable, then RA would be too. But it’s not.
A. CONSTRUCTION OF THE BASIC
NUMBER SYSTEMS
In this appendix I describe the construction of the most important number systems of mathematics: The Natural Numbers N, The Integers Z, The Rational Numbers Q, The Real Numbers R and, briefly, The Complex Numbers C.
Proofs will mostly not be given, however the results are given in such an order that each
can be proved quite easily from what has gone before.
[a] = { x : x ∈ A and x ∼ a } .
It is easy to check that the equivalence classes form a partition of A, that is,
(i) every equivalence class is nonempty;
(ii) every member of A is a member of some equivalence class;
(iii) any two distinct equivalence classes are disjoint.
((ii) and (iii) may be restated: every member of A is a member of exactly one equivalence class.) Another useful property is:
(iv) For any a, b ∈ A, the following three things are equivalent: a ∼ b , [a] = [b] and [a] ∩ [b] ≠ ∅.
Given an equivalence relation ∼ on a set A, we define the quotient set A/∼ to be the set of
all equivalence classes under ∼:
A/∼ = { [a] : a ∈ A } .
A.3 Respect
We say that an equivalence relation ≡ on a set A respects the algebraic operation θ if (assuming θ is n-ary), for any members a1, a2, . . . , an and a′1, a′2, . . . , a′n of A such that
a1 ≡ a′1 , a2 ≡ a′2 , . . . , an ≡ a′n ,
we have θ(a1, a2, . . . , an) ≡ θ(a′1, a′2, . . . , a′n).
An equivalence relation which respects all the algebraic operations we are interested in is
usually called a congruence.
Observe that any nullary operation is automatically respected by any equivalence relation
(the definition above is vacuously satisfied).
The most familiar example of a congruence is ordinary congruence modulo n on the integers,
for any fixed integer n. It respects all the ordinary operations of a ring with unity: addition,
negation, zero, multiplication and unity (one).
Now let ≡ be an equivalence relation on an algebra which respects the n-ary algebraic
operation θ. Then θ can also be defined on the quotient set in a natural way as follows:
let X1 , X2 , . . . , Xn be members of A/≡, that is, equivalence classes. Choose any members
x1 , x2 , . . . , xn of these classes (x1 ∈ X1 , x2 ∈ X2 , . . . , xn ∈ Xn ). Then θ(X1 , X2 , . . . , Xn ) is
defined to be the equivalence class containing θ(x1 , x2 , . . . , xn ).
It must of course be checked that this definition makes sense: that the definition just given of
θ(X1 , X2 , . . . , Xn ) is independent of the particular choice of the representatives x1 , x2 , . . . , xn
of the classes. That this is so follows quite easily from the fact that ≡ respects θ.
Once this has been established we can rewrite the definition in a much more convenient form: given ≡ and θ as above, let a1, a2, . . . , an be members of A; then [a1], [a2], . . . , [an] are equivalence classes and we want to define θ([a1], [a2], . . . , [an]). Well then,
θ([a1], [a2], . . . , [an]) = [θ(a1, a2, . . . , an)] .
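As a small illustration, here is a sketch in Python using congruence modulo n, the congruence mentioned above (classes are cut down to a finite window of representatives purely so that they can be compared):

    # Z/nZ as a quotient set: classes of the congruence x == y (mod n).
    n = 4
    def cls(a):
        # the equivalence class [a], restricted to the window 0 .. 4n-1
        return frozenset(x for x in range(4 * n) if (x - a) % n == 0)

    quotient = {cls(a) for a in range(4 * n)}
    assert len(quotient) == n              # exactly n classes

    def add(X, Y):
        # the induced operation: [a] + [b] = [a + b], computed by choosing
        # representatives; because the congruence respects +, the choice
        # of representatives is irrelevant
        return cls(min(X) + min(Y))

    # spot check of representative-independence
    assert all(add(cls(a), cls(b)) == cls(a + b)
               for a in range(8) for b in range(8))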
A.4 Laws
A law (in the sense in which we will be using the word here) is an equation involving the
defined operations which must be true for all values of the variables involved. Examples are
the commutative law
(∀x ∈ A)(∀y ∈ A) ( xy = yx )
Because of what we are about to do, it is a good idea to make sure that this definition is precisely understood. Firstly, we define terms as in Chapter 3: a term is either a variable symbol x, y, z . . . or of the form θ(t1, t2, . . . , tn), where θ is an n-ary operation and t1, t2, . . . , tn are (simpler) terms. Next, we define an equation as an expression of the form
term1 = term2
where term1 and term2 are of course terms. Finally, a law is an expression of the form
(∀x1 ∈ A)(∀x2 ∈ A) . . . (∀xn ∈ A) ( term1 = term2 )
where x1, x2, . . . , xn are all the variable symbols which appear in the equation.
Note that there are a number of facts which are often called laws which are not in fact laws
in the sense just defined. For instance, the “cancellative law of addition” for Z states:
The existential quantifier here stops this from being a law. In the same way, facts about
fields involving division fail to be laws; for example
(∀x ∈ Q) ( x ≠ 0 ⇒ x/x = 1 ) .
A.5 Theorem
Let A be an algebra and ≡ be an equivalence relation on A. Then any law of A which
involves only operations which are respected by ≡ is also a law of A/≡.
A.6 Orders
Most of the above discussion applies to relations in the same way. We are only interested
in order relations here, so I will summarise the relevant facts for orders; the generalisations
to arbitrary relations, should you be interested, are obvious.
Recall that a preorder on a set A is a binary relation, ≤ say, with the properties:
Reflexivity: For all a ∈ A, a ≤ a.
Transitivity: For all a, b, c ∈ A, if a ≤ b and b ≤ c then a ≤ c.
Given a preorder ≤ on A, it is easy to check that the relation ∼ defined by
x ∼ y ⇔ x ≤ y and y ≤ x
is an equivalence relation. We call this the equivalence relation defined by (or corresponding
to) ≤.
Now let A be a set and ≤ be a preorder on A. Let ∼ be an equivalence relation (not
necessarily the one defined by ≤). We say that ∼ respects the order relation ≤ if, for all
a, b, a′, b′ ∈ A
a ∼ a′ and b ∼ b′ and a ≤ b ⇒ a′ ≤ b′ .
In this case a corresponding preorder can be defined on the quotient set A/∼ in the obvious
way: let X and Y be equivalence classes in A. Choose any members x of X and y of Y .
Then define X ≤ Y if and only if x ≤ y.
Having checked, as before, that this definition makes sense, that is, that the definition of
X ≤ Y is independent of the particular members x and y chosen, we may rewrite the
definition in the more convenient form: for all a, b ∈ A, [a] ≤ [b] if and only if a ≤ b.
Now the following facts are easily checked:
(1) If the order ≤ was a partial order, a full order or a full preorder on A, then the
corresponding order on A/∼ is an order of the same kind.
(2) If the order ≤ was a preorder on A and the relation ∼ happens to be the equivalence
relation defined by ≤, then the corresponding order on A/∼ is a partial order.
Note that, combining (1) and (2), if the order ≤ was a full preorder and the relation ∼ is
the equivalence relation defined by ≤, then the corresponding order on A/∼ is a full order.
C The Integers, Z
C.1 Preliminary
We are about to construct the Integers Z from the Natural Numbers N. Let us suppose for
a moment that we already have Z in some sense to look at. How can we specify a member
z of Z using natural numbers? Well, we can write any integer z = m − n, where m and n are natural numbers, and so we can use the pair ⟨m, n⟩ to represent z. Thus, as a first approximation, we might consider defining z to actually be the pair ⟨m, n⟩, and so construct
Z as the set N × N of pairs of natural numbers.
There is a problem with this however: any given integer z can be represented by a pair of natural numbers in many ways: z = m − n = m′ − n′. To accommodate this we consider
each integer z to actually be the set of pairs of natural numbers which represent it. Thus
we will define Z to be a set of appropriately defined subsets of N × N, upon which we will
define algebraic operations and order.
In order to define which subsets of N × N are going to represent integers, we define a relation between pairs: ⟨m, n⟩ ∼ ⟨m′, n′⟩ if they represent the same integer, that is, if m − n = m′ − n′.
This turns out to be an equivalence relation, and so we can define Z conveniently as the
quotient set (N × N)/∼.
There is one last small problem to be got out of the way: we need to be able to define
the relation ∼ in order to construct Z. But before Z has been constructed, the equation
m − n = m′ − n′ has no meaning (certainly not in N when m < n). Using our hindsight, we observe that this equation is the same as m + n′ = m′ + n and that this one does make sense in N, so we use this as our definition.
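A sketch of this construction in Python (each infinite equivalence class is represented by its canonical member, ⟨k, 0⟩ or ⟨0, k⟩, purely so that classes can be compared):

    # Z as (N x N)/~, where <m,n> ~ <m',n'>  iff  m + n' = m' + n.
    def equivalent(p, q):
        (m, n), (m2, n2) = p, q
        return m + n2 == m2 + n           # the definition made within N

    def cls(p):
        m, n = p                          # canonical representative of [p]
        return (m - n, 0) if m >= n else (0, n - m)

    def add(P, Q):                        # [<a,b>] + [<c,d>] = [<a+c, b+d>]
        (a, b), (c, d) = P, Q
        return cls((a + c, b + d))

    assert equivalent((5, 2), (7, 4))     # both pairs represent 3
    assert cls((5, 2)) == cls((7, 4))
    assert add(cls((1, 3)), cls((4, 0))) == cls((2, 0))   # (-2) + 4 = 2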
We are going to construct Z as a quotient set of N × N. It will be convenient to take this in
two steps, and discuss N × N, with some useful structure added, as a sort of half-way-there
construction.
Zero 0J = ⟨0, 0⟩
Addition ⟨a, b⟩ + ⟨c, d⟩ = ⟨a + c, b + d⟩
J1 We now prove that all the usual algebraic and order laws for Z hold in J “up to
equivalence”, as follows:
(i) (⟨a, b⟩ + ⟨c, d⟩) + ⟨e, f⟩ ∼ ⟨a, b⟩ + (⟨c, d⟩ + ⟨e, f⟩)
(ii) ⟨a, b⟩ + 0J ∼ ⟨a, b⟩
Now (ix) above tells us that ∼ is an equivalence relation. Next we prove that it respects all
the operations we have defined on J:
J2 If ⟨a, b⟩ ∼ ⟨a′, b′⟩ and ⟨c, d⟩ ∼ ⟨c′, d′⟩ then
(i) ⟨a, b⟩ + ⟨c, d⟩ ∼ ⟨a′, b′⟩ + ⟨c′, d′⟩
(iii) [a, b][c, d] = [ac + bd, ad + bc] (= the equivalence class of ⟨a, b⟩⟨c, d⟩).
(iv) [a, b] ≤ [c, d] ⇔ a + d ≤ b + c (⇔ ⟨a, b⟩ ≤ ⟨c, d⟩).
(The point here is that the results of (J2) mean that the definitions here are independent of
the particular members chosen to represent the equivalence classes.)
(ii) x+0=x
(iii) x + (−x) = 0
(iv) x+y =y+x
Z4 Finally, N is embedded in Z by the map n ↦ [n, 0]. In other words, the function
ι : N → Z defined by ι(n) = [n, 0] for all n ∈ N is an injection which preserves all the
operations and the order. Thus the image of N in Z under this mapping is an identical copy
of N in all salient respects. From now on we may, and often do when it suits us, think of N
as this subset of Z.
D The Rationals, Q
D.1 Preliminary
Now we construct The Rationals Q from The Integers Z. The general method is exactly
the same as for the previous construction, with only a few small changes of detail. Using
hindsight as before, we know that any rational q can be written as a quotient a/b, where a and b are integers and b ≠ 0. Of course, any particular rational can be written as such a quotient in many ways, and this gives rise to an equivalence relation on Z × (Z ∖ {0}): ⟨a, b⟩ ∼ ⟨a′, b′⟩ if and only if a/b = a′/b′. This can be defined within Z by ab′ = a′b.
We are going to construct Q as a quotient set of Z × (Z ∖ {0}), the set of all pairs ⟨a, b⟩ of integers such that b ≠ 0. As before, it will be convenient to take the construction in
two steps, and first discuss this product set, with some useful structure added, as a sort of
half-way-there construction.
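A sketch of this construction, in the same style as the one for Z (classes represented by their lowest-terms member with positive second component):

    # Q as (Z x (Z \ {0}))/~, where <a,b> ~ <a',b'> iff a*b' = a'*b.
    from math import gcd

    def equivalent(p, q):
        (a, b), (a2, b2) = p, q
        return a * b2 == a2 * b           # the definition made within Z

    def cls(p):
        a, b = p                          # canonical representative of [p]
        if b < 0:
            a, b = -a, -b
        g = gcd(a, b)
        return (a // g, b // g)

    def add(P, Q):                        # <a,b> + <c,d> = <ad + bc, bd>
        (a, b), (c, d) = P, Q
        return cls((a * d + b * c, b * d))

    assert equivalent((1, 2), (3, 6))                     # both represent 1/2
    assert add(cls((1, 2)), cls((1, 3))) == cls((5, 6))   # 1/2 + 1/3 = 5/6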
Zero 0K = ⟨0, 1⟩
Addition ⟨a, b⟩ + ⟨c, d⟩ = ⟨ad + bc, bd⟩
K1 We prove that all the usual algebraic and order laws for Q hold in K “up to equiva-
lence”, as follows:
(i) (⟨a, b⟩ + ⟨c, d⟩) + ⟨e, f⟩ ∼ ⟨a, b⟩ + (⟨c, d⟩ + ⟨e, f⟩)
(ii) ⟨a, b⟩ + 0K ∼ ⟨a, b⟩
Now (x) above tells us that ∼ is an equivalence relation. Next we prove that it respects all the operations we have defined on K:
K2 If ⟨a, b⟩ ∼ ⟨a′, b′⟩ and ⟨c, d⟩ ∼ ⟨c′, d′⟩ then
(i) a/b + c/d = (ad + bc)/bd (= the equivalence class of ⟨a, b⟩ + ⟨c, d⟩).
(ii) −(a/b) = (−a)/b (= the equivalence class of −⟨a, b⟩).
(iii) (a/b)(c/d) = ac/bd (= the equivalence class of ⟨a, b⟩⟨c, d⟩).
(iv) If a/b ≠ 0Q, then (a/b)−1 = b/a (= the equivalence class of ⟨a, b⟩−1).
(v) a/b ≤ c/d ⇔ (bd > 0 and ad ≤ bc) or (bd < 0 and ad ≥ bc) (⇔ ⟨a, b⟩ ≤ ⟨c, d⟩).
(The point here is that the results of (K2) mean that the definitions here are independent
of the particular members chosen to represent the equivalence classes.)
Q2 Applying (Q1) to (K1) gives all the basic laws for Q.
(i) (a/b + c/d) + e/f = a/b + (c/d + e/f)
(ii) a/b + 0 = a/b
(iii) a/b + (−(a/b)) = 0
(iv) a/b + c/d = c/d + a/b
(v) ((a/b)(c/d))(e/f) = (a/b)((c/d)(e/f))
(vi) (a/b)(c/d + e/f) = (a/b)(c/d) + (a/b)(e/f)
(vii) (a/b).1 = a/b
(viii) (a/b)(c/d) = (c/d)(a/b)
(ix) If a/b ≠ 0 then (a/b)(a/b)−1 = 1
(x) The relation ≤ defined on Q is a full order.
(xi) a/b ≤ c/d ⇒ a/b + e/f ≤ c/d + e/f
(xii) a/b ≤ c/d and 0 ≤ e/f ⇒ (a/b)(e/f) ≤ (c/d)(e/f)
Q3 All these terms in (Q2) above are simply arbitrary rationals, so the laws look more
familiar in more normal notation:
(i) (x + y) + z = x + (y + z)
(ii) x+0=x
(iii) x + (−x) = 0
(vii) x.1 = x
(viii) xy = yx
(ix) If x 6= 0 then x(x)−1 = 1
(x) The relation ≤ defined on Q is a full order.
E The Reals, R
E.1 Preliminary
Once again we use hindsight and consider what we know about the reals to decide how to
construct them from the rationals. There are two standard ways of doing this and I will
describe them both here; the first is the “completion” method, in which real numbers are
obtained as limits of convergent sequences of rational numbers, and the second is the method
of Dedekind cuts.
First we set out to define a real, in terms of rational numbers, as the limit of a convergent sequence. The usual definition of convergence, that the sequence ⟨a0, a1, a2, . . .⟩ converges to the real x if
(∀ real ε > 0)(∃N)(∀n ≥ N) |an − x| < ε ,
won’t do, because we need to be able to identify a convergent sequence of rationals before
the reals have been constructed — that is, we need to be able to recognise, before the
reals have been constructed, those sequences of rationals which are going to converge after
the reals have been constructed. The definition above mentions reals in two places: firstly in (∀ real ε > 0) and secondly in the limit x. The first occurrence is no real problem: (∀ rational ε > 0) will do just as well. The second occurrence is more of a problem: the
definition is fine if you already have x to work with, but we are here trying to create x out
of thin air.
The solution is to use Cauchy sequences. A sequence ⟨a0, a1, a2, . . .⟩ of real numbers is a Cauchy sequence if
(∀ real ε > 0)(∃N)(∀m, n ≥ N) |am − an| < ε .
That is the usual definition, but it is easy to see that replacing “real” by “rational” in (∀ real ε > 0) results in an equivalent definition. Also, we know two crucial things about
Cauchy sequences: (1) Cauchy sequences are convergent (in the ordinary sense above) and
(2) Every real number is the limit of a Cauchy sequence of rationals. Just to make sure
there is no confusion, for the rest of this section a Cauchy sequence will mean a Cauchy
sequence of rational numbers, unless otherwise stated; that is, a sequence ⟨a0, a1, a2, . . .⟩ of rational numbers satisfying
(∀ rational ε > 0)(∃N)(∀m, n ≥ N) |am − an| < ε .
Just as in our previous constructions, we must observe that each real number is going
to be the limit of many Cauchy sequences. This defines an equivalence relation on the
Cauchy sequences, but we need a way of defining it a priori — without reference to real
numbers. This is not too difficult: two Cauchy sequences ⟨a0, a1, a2, . . .⟩ and ⟨b0, b1, b2, . . .⟩ are equivalent in this sense if and only if
(∀ rational ε > 0)(∃N)(∀n ≥ N) |an − bn| < ε .
Having made these observations, the construction of R follows the now familiar pattern.
We are going to construct R as a quotient set of the set of all Cauchy sequences of rationals.
As before, it will be convenient to take this in two steps, and discuss the set of all Cauchy
sequences of rationals, with some useful structure added, as a half-way-there construction.
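By way of illustration, here is a sketch in Python: a Cauchy sequence of rationals is modelled as a function n ↦ an, with operations defined term by term (the particular sequence sqrt2 is an invented example, not part of the construction itself):

    from fractions import Fraction

    def sqrt2(n):
        # Newton iterates from 1: a Cauchy sequence of rationals whose
        # "limit-to-be" is the square root of 2
        a = Fraction(1)
        for _ in range(n):
            a = (a + 2 / a) / 2
        return a

    def seq_sum(s, t):
        return lambda n: s(n) + t(n)      # addition, term by term

    def seq_mul(s, t):
        return lambda n: s(n) * t(n)      # multiplication, term by term

    two = seq_mul(sqrt2, sqrt2)
    # the product sequence is equivalent to the constant sequence <2, 2, ...>:
    assert abs(two(6) - 2) < Fraction(1, 10**20)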
One 1L = ⟨1, 1, 1, . . .⟩
Multiplication ⟨a0, a1, a2, . . .⟩⟨b0, b1, b2, . . .⟩ = ⟨a0 b0, a1 b1, a2 b2, . . .⟩
Inversion ⟨a0, a1, a2, . . .⟩−1 = ⟨a0*, a1*, a2*, . . .⟩
L1 As before, we prove all the basic algebraic and order laws on L “up to equivalence”.
Let X, Y and Z be any three Cauchy sequences in L. Then
(i) (X + Y ) + Z ∼ X + (Y + Z)
(ii) X + 0L ∼ X
(iii) X + (−X) ∼ 0
(iv) X +Y ∼Y +X
(v) (XY )Z ∼ X(Y Z)
(vi) X(Y + Z) ∼ XY + XZ
(vii) X.1L ∼ X
(viii) XY ∼ Y X
(ix) If X ≁ 0 then X(X)−1 ∼ 1L
(x) The relation ≤ defined on L is a full preorder whose corresponding equivalence relation is ∼.
(xi) X≤Y ⇒ X +Z ≤Y +Z
(xii) X ≤ Y and 0 ≤ Z ⇒ XZ ≤ Y Z
Now (x) above tells us that ∼ is an equivalence relation. Now we prove it respects all the
operations we have defined on L:
L2 If X, Y, X′ and Y′ are Cauchy sequences and X ∼ X′ and Y ∼ Y′ then
(i) X + Y ∼ X′ + Y′
(ii) −X ∼ −X′
(iii) XY ∼ X′Y′
(iv) X ≁ 0L if and only if X′ ≁ 0L, and then X−1 ∼ X′−1.
(v) X ≤ Y if and only if X′ ≤ Y′.
Zero 0R = [0L ]
Addition [X] + [Y ] = [X + Y ]
Negation −[X] = [−X]
One 1R = [1L ]
(As usual, the point here is that the results of (L2) mean that the definitions here are
independent of the particular members chosen to represent the equivalence classes.)
R2 Applying (R1) to (L1) gives all the basic algebraic laws for R.
(i) (x + y) + z = x + (y + z)
(ii) x+0=x
(iii) x + (−x) = 0
(iv) x+y =y+x
(v) (xy)z = x(yz)
(vi) x(y + z) = xy + xz
(vii) x.1 = x
(viii) xy = yx
(ix) If x 6= 0 then x(x)−1 = 1
{ x : x ∈ Q, x < a }
of all rationals less than it; such sets are called cuts. We must first define them in a way which does not require prior reference to real numbers: a cut is a subset A of Q which is nonempty, has nonempty complement, is initial (if r ∈ A, q ∈ Q and q ≤ r, then q ∈ A) and has no greatest member.
E.6 Proposition
If A is a cut, then its complement Q ∖ A has these properties:
E.7 Definition
The Real Numbers R is just the set of all cuts in Q, as defined above (there will be no need
to form a quotient structure with this approach).
The zero real is 0R = Q− , the set of all negative rationals.
For any reals (cuts) A and B, A + B = { a + b : a ∈ A and b ∈ B } .
E.8 Proposition
Let A be a cut and r be any rational number > 0. Then there is some a ∈ A such that
a + r ∈ Aco .
E.9 Proposition
With these definitions, R is a commutative group with respect to addition.
E.10 Definition
For cuts A and B,
A≤B if and only if A ⊆ B .
It is easy to check that this is a full order. Also, for any cuts A, B and C,
if A ≤ B then − B ≤ −A
and A + C ≤ B + C .
E.11 Proposition
For any cuts A, B and C,
(i) If A ≤ B then A + C ≤ B + C.
(ii) If A < B then A + C < B + C.
(iii) If A ≤ B then −A ≥ B.
iv If A < B then −A > B.
Proof. (i) follows trivially from the definition of the order and the others follow easily from
this by ordinary group theory.
E.12 Definition
Suppose that A ≥ 0R and B ≥ 0R . Then
AB = { ab : a ∈ A, a ≥ 0, b ∈ B, b ≥ 0 } ∪ 0R ,
A−1 = { x : x > 0, x−1 ∈ Aco } ∪ {0} ∪ 0R ,
1R = { x : x ∈ Q, x < 1 } .
E.13 Proposition
Let A be a cut, A > 0R , and let r be any rational > 1. Then there is an a ∈ A such that
ar ∈ Aco .
Proof. Since A > 0R , there is c ∈ A such that c > 0. Let s = c(r − 1). Then, by the earlier
proposition, there is d ∈ A such that d + s ∈ Aco . Now let a = max{c, d}. Since c, d ∈ A,
we have a ∈ A. Also ar = a + a(r − 1) ≥ a + c(r − 1) = a + s ≥ d + s and d + s ∈ Aco so
ar ∈ Aco .
We can now show that the non-negative cuts obey all the usual field laws involving multi-
plication.
E.14 Proposition
Let A, B and C be cuts, all ≥ 0R . Then
(i) (AB)C = A(BC) ,
(ii) AB = BA ,
(iii) A0R = 0R ,
(iv) A(B + C) = AB + AC ,
(v) A1R = A ,
Proof. (i), (ii) and (iii) follow immediately from the definition.
(iv) Given (ii) and (iii) above, this result is trivially true if any of A, B or C are 0R ; so
we assume now that they are all > 0R . Suppose first that x ∈ A(B + C). If x < 0, then
x ∈ AB + AC automatically. Otherwise, there are a ∈ A, a ≥ 0 and y ∈ B + C, y ≥ 0 such
that x = ay, and then b ∈ B and c ∈ C such that y = b + c.
If b ≥ 0 and c ≥ 0 we are done, for then x = ab + ac with ab ∈ AB and ac ∈ AC. Since
b + c = y ≥ 0 we cannot have both b and c negative. We may therefore suppose that b < 0
and c > 0, the proof in the other case being the same. But then y < c, so y ∈ C and now
x = 0 + ay with 0 ∈ AB and ay ∈ AC, so x ∈ AB + AC as required.
Conversely, suppose that x ∈ 1R, x > 0, that is, 0 < x < 1. Set r = x−1 so r > 1. By the previous proposition, there is an a ∈ A such that ar ∈ Aco. Then (ar)−1 ∈ A−1 so a(ar)−1 = r−1 = x ∈ AA−1, as required.
E.15 Proposition
For any cuts A, B and C,
Proof. (i) follows immediately from the definition of the order and (ii) follows from that
by ordinary ring theory.
We can now stop messing around with cuts. The story so far, converting to ordinary
notation, is that we have our set R of reals, with addition, negation and zero defined on R,
multiplication and identity defined on the set of non-negative reals and inversion defined on
the set of positive reals. These satisfy
For any x, y and z in R,
(Ai) (x + y) + z = x + (y + z)
(Aii) x + y = y + x
(Aiii) x + 0 = x
(Aiv) x + (−x) = 0
(Av) If x ≤ y then x + z ≤ y + z and −x ≥ −y.
(Mii) xy = yx
(Miii) x0 = 0
(Miv) x(y + z) = xy + xz
(Mv) x1 = x
E.16 Definition
(i) Let x be any real. Then its absolute value |x| is defined in the usual way:
|x| = x if x ≥ 0, −x if x < 0.
(ii) The product of reals of arbitrary sign is defined by cases:
xy = xy as already defined if x ≥ 0 and y ≥ 0; −x |y| if x ≥ 0 and y < 0; −|x| y if x < 0 and y ≥ 0; |x| |y| if x < 0 and y < 0.
(iii) Inversion is extended to negative reals by:
x−1 = x−1 as already defined if x > 0; −(−x)−1 if x < 0.
Observe that, because of M(iii) above, the last three displayed equations can be rewritten with overlapping cases:
|x| = x if x ≥ 0, −x if x ≤ 0,
xy = as already defined if x ≥ 0 and y ≥ 0; −x |y| if x ≥ 0 and y ≤ 0; −|x| y if x ≤ 0 and y ≥ 0; |x| |y| if x ≤ 0 and y ≤ 0,
and
x−1 = as already defined if x ≥ 0; −(−x)−1 if x ≤ 0.
It follows from these definitions that (−x)(−y) = xy , x(−y) = (−x)y = −(xy) and
(−x)−1 = −x−1 irrespective of the signs of x and y.
E.17 Proposition
For any reals x, y and z,
(i) (xy)z = x(yz)
(ii) xy = yx
(iii) x0 = 0
(iv) x(y + z) = xy + xz
(v) x1 = x
(vi) If x ≠ 0 then xx−1 = 1
(vii) If x ≤ y and 0 ≤ z then xz ≤ yz
Proof. First note that, while we must still be careful about multiplicative manipulation,
we have already proved that R is a group with respect to addition and so we can freely use
ordinary manipulation of addition, subtraction and zero.
(i), (ii) and (iii) all follow immediately from the definitions above.
(iv) We must consider the various cases according as x, y and z are positive or negative.
We already know that x(y + z) = xy + xz in the case where x, y and z are all ≥ 0, and we
use this freely below.
Suppose now that x ≥ 0, y ≥ 0 and z ≤ 0. We must consider two subcases, depending on the
sign of y + z. If y + z ≥ 0 then, since −z ≥ 0 also, we have x(y + z + (−z)) = x(y + z) + x(−z)
which is xy = x(y + z) − xz and so the required result. In the other subcase, y + z ≤ 0,
we have −z ≥ 0 and so xy = x(−z + (y + z)) = x(−z) + x(y + z) by the previous subcase
= −xz + x(y + z), which gives the required result again.
Suppose now that x ≥ 0, y ≤ 0 and z ≥ 0; by commutativity of addition, this is the same as
the previous case.
Suppose now that x ≥ 0, y ≤ 0 and z ≤ 0. Then y + z ≤ 0 so x(y + z) = −x(−y − z) = −(x(−y) + x(−z)) = −(−xy − xz) = xy + xz as required.
(vi) We already know that xx−1 = 1 in the case that x > 0. But if x < 0 we have x−1 < 0
also, so xx−1 = (−x)(−x−1 ) = (−x)((−x)−1 ) = 1 as required.
(vii) We already know this in the case 0 ≤ x ≤ y. In the case x ≤ 0 ≤ y we have xz ≤
0 ≤ yz and in the case x ≤ y ≤ 0 we have −y ≤ −x, so xz = −(−x)z ≤ −(−y)z = yz.
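A quick numerical spot check of this proposition (a sketch only: Fraction stands in for the constructed reals, and mul implements the case definition of E.16):

    from fractions import Fraction
    from itertools import product

    def mul(x, y):
        # the case definition: everything is reduced to a product of
        # non-negative numbers, "as already defined"
        if x >= 0 and y >= 0:
            return x * y
        if x >= 0:
            return -mul(x, -y)            # -x|y|
        if y >= 0:
            return -mul(-x, y)            # -|x|y
        return mul(-x, -y)                # |x||y|

    samples = [Fraction(n, 2) for n in range(-4, 5)]
    for x, y, z in product(samples, repeat=3):
        assert mul(x, mul(y, z)) == mul(mul(x, y), z)      # (i)
        assert mul(x, y) == mul(y, x)                      # (ii)
        assert mul(x, y + z) == mul(x, y) + mul(x, z)      # (iv)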
This completes the proof of the ordered field structure of R. As usual we observe
E.18 Proposition
The function from Q to R given by
r ↦ { x : x ∈ Q, x < r }
is one-to-one and preserves the order and field operations of Q. In other words, this function
embeds Q into R.
And finally, a result which is easy to prove but important enough to be called a theorem:
E.19 Theorem
Every nonempty set X of reals which has an upper bound has a least upper bound.
Proof. Let m be the given upper bound. Define s = ∪X (the union of this set of reals, thought of as cuts). We will show that s is the required least upper bound.
Firstly we must show that it is in fact a real number, that is, a cut. To see that it is initial, let r ∈ s, q ∈ Q and q ≤ r. Then there is some member x of X such that r ∈ x. But then x is a cut and so q ∈ x also; and then q ∈ ∪X = s. To see that s is nonempty, note that X is nonempty and every member of X, being a cut, is nonempty. To see that the complement of s is nonempty, note that every member x of X is ≤ m, that is, ⊆ m, and so s = ∪X ⊆ m, and the complement of m is nonempty. Finally, s is the union of sets x none of which have a greatest element, so neither does s.
Every member of X is a subset of ∪X, that is, every member of X is ≤ s; and that says that s is an upper bound for X.
Suppose that u is any other upper bound for X. That means that every member of X is a subset of u. But then s = ∪X ⊆ u, that is s ≤ u. This tells us that s is in fact the least upper bound for X.
E.20 Corollary
Every Cauchy sequence in R converges to a real number.
The proper way to define the operations is obvious: the zero and identity are ⟨0, 0⟩ and ⟨1, 0⟩. The sum and product of ⟨a, b⟩ and ⟨c, d⟩ are ⟨a + c, b + d⟩ and ⟨ac − bd, ad + bc⟩. The negative and inverse of ⟨a, b⟩ are ⟨−a, −b⟩ and
⟨ a/(a² + b²) , −b/(a² + b²) ⟩ .
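In Python, with Fraction standing in for the constructed reals, these operations look like this (a sketch, with invented names):

    from fractions import Fraction

    def c_add(p, q):
        (a, b), (c, d) = p, q
        return (a + c, b + d)

    def c_mul(p, q):
        (a, b), (c, d) = p, q
        return (a * c - b * d, a * d + b * c)

    def c_inv(p):
        a, b = p
        m = a * a + b * b                 # p must not be <0, 0>
        return (a / m, -b / m)

    i = (Fraction(0), Fraction(1))
    z = (Fraction(3), Fraction(4))
    assert c_mul(i, i) == (Fraction(-1), Fraction(0))     # i² = -1
    assert c_mul(z, c_inv(z)) == (Fraction(1), Fraction(0))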
One of the most important properties of the Complex Numbers is expressed
by the Fundamental Theorem of Algebra: Every non-constant polynomial (with real or
complex coefficients) has at least one (complex) root. A corollary of this theorem is that
every polynomial can be factorised completely into linear factors, provided those factors are
allowed to have complex coefficients. There are two standard proofs of this theorem; one
involves a considerable amount of complex analysis (contour integration and so on), the
other a goodly amount of topology (homotopy theory). Both are beyond the scope of these
notes (sorry!).
B. ZF SET THEORY
The structure of the appendix follows that of Chapter 6 as closely as possible, noting the
differences with MK Set Theory as it progresses. In many places the development is identical,
and then it is not done all over again here.
The major difference between the two treatments is that in MK Set Theory the basic objects
are classes and in ZF Set Theory they are sets. There is no direct way of talking about proper
classes in ZF.
Various kinds of class-like construction are acceptable in ZF. For example, suppose we write
F for the class of finite groups and A for the class of abelian groups. Then we may write
G ∈ F ∩ A because we can translate this as G is a finite abelian group. For another example,
for any group G, its centre, usually written Z(G), is a uniquely defined subgroup and so a
group in its own right. Thus we can describe Z as a “function G → G”. This is perfectly
proper, provided we take this to be a definition of a function by description, as in Definition 4.A.7; the definition of a function as a class, as in 6.B.21, is not available to us. This can be quite a nuisance in areas of mathematics which make a lot of use of ideas which are most
naturally expressed as functions between classes: functors in algebraic topology, universal
algebra and more generally category theory spring to mind.
The kind of argument which must not be used in ZF is one which quantifies classes, for
instance one which talks about all classes with some given property. The reason this is
outlawed is that, in trying to translate this using the predicates which define the classes,
one ends up talking about all predicates of such and such a form, and this is just not part
of first-order logic. For example, there is no way at all of expressing the idea of “for all
functions G → G” in ZF. It is always possible, of course, that one might be able to replace
the argument with another completely different one which is valid in ZF and does come to the same conclusion; however, this is not automatically possible.
The Axiom of Foundation provides an example of straightforward translation. As expressed
in MK, the axiom mentions a class w which turns out to be a proper class; indeed w = Sets.
This cannot be said in this way in (formal) ZF, so the ZF version of the axiom uses the
usual way of getting around this by using a predicate instead. The idea of the “class of all
sets” is not directly expressible in ZF, however a construction of the form “for all sets a,
such and such is true” is.
By far the most important difference in the axioms is in the Axiom of Specification. In MK
the axiom tells us that, for any predicate P (x) there is a corresponding class { x : P (x)}.
In ZF the corresponding axiom is much more limited; it tells us that, for any set a and any
predicate P (x) there is a corresponding set { x : x ∈ a ∧ P (x)}.
The notation { x : P (x) } can be used fairly freely in ZF, however it is essential that
the expression P (x) implies that x is a member of some already-known set, otherwise the
construction is meaningless in ZF. For example, the construction
L = {x : x ∈ R ∧ x < 0}
is OK, whereas
G = { G : G is a group }
is not.
B.5 Sets
The class Sets of all sets does not exist in ZF, nor anything that does the job of “all sets”.
Sorry about that.
(The Russell Paradox tells us that there is no set of all sets.)
Axiom ZF4 tells us that, for any A, ∪A exists. As usual, since this is ZF, A must be a set and then so is ∪A. (Note that, in MK, the existence of the union of classes is given by the Axiom of Specification. However in ZF, we need Axiom ZF4 in order to be able to talk about unions at all.) It follows that the union of two sets, and hence the union of any finite number of sets, is also defined.
B.9 Intersections
The definition of intersections of sets is rather more tricky than is the case with classes in MK. The intersection of two sets can be defined by
A ∩ B = { x : x ∈ A ∧ x ∈ B }.
The intersection of any (nonzero) finite number of sets can be defined similarly, and then
the usual properties are easily proved.
The intersection of any nonempty set of sets can be defined in the usual way: if A is a
nonempty set of sets, then its intersection is:
∩A = { x : (∀W ∈ A)(x ∈ W) }
Here is where we must be careful: it is not enough simply to write this definition down. The axioms of ZF do not guarantee the intersection's existence as easily as that, because the definition does not conform to the ZF form of the Axiom of Specification. So we must verify that the intersection exists and is unique.
For a start, if A is empty, this intersection is in fact undefined: if it existed, the definition can be seen to be equivalent to that of the universe, which does not exist in ZF.
If A is nonempty, then it must contain a member, B say. Then we observe (with a small proof) that the definition above is equivalent to
∩A = { x : x ∈ B ∧ (∀W ∈ A)(x ∈ W) }
which does conform to the ZF Axiom of Specification. Then finish off by checking uniqueness.
The union and intersection of two sets are defined as discussed above and have all the usual
properties.
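The manoeuvre is easy to imitate in Python (a sketch, with invented data): the intersection of a nonempty family A is carved out of one chosen member by a condition, which is exactly the shape the ZF Axiom of Specification allows.

    A = [{1, 2, 3}, {2, 3, 4}, {2, 5, 3}]
    assert A, "the intersection of an empty family is undefined"
    B = A[0]                                   # any member of A will do
    intersection = {x for x in B if all(x in W for W in A)}
    assert intersection == {2, 3}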
Since we cannot speak of classes in ZF, there is no need for the more general definition of an ordered pair which is used in MK to define a pair of classes: we use the above definition everywhere. So we don't need the special notation ⟨a, b⟩p, and I haven't used it above.
Note that the horrid little proof given in Chapter 6 that the cartesian product of two sets is a set works just as well in ZF, but here it is used to tell us that the cartesian product exists.
won’t do. Instead we show that there exists a unique set N with the property that
Noting that the set W of the axiom is not unique, we show that N exists by using the fact
that the axiom states that at least one such set W exists, and so there exists a set
We then show (easy) that N has the property stated above. Then it is also easy to prove
that it is unique.
In Definition 6.D.4, of course, we only define a transitive set. This is not a problem for the rest of the section.
D Well-ordering
and . . .
Dealing with ordinal numbers when one cannot talk about classes at all can be done, but
can be a bit tiresome. In the next section I will present the main results of the section on Ordinal numbers in Chapter 7 translated into ZF-speak.
G Ordinal numbers
G.1 Definition
An ordinal number is a set α such that
(i) α is transitive (that is, every member of α is also a subset of α).
(ii) For all x, y ∈ α, one of the following holds: x ∈ y, x = y or y ∈ x.
Note that (i) above is equivalent to
(i′) x ∈ α ⇒ x ⊂ α.
G.3 Proposition
If two ordinal numbers are order-isomorphic, then they are equal.
G.4 Proposition
Every initial segment of an ordinal number is an ordinal number.
Therefore every member of an ordinal number is an ordinal number.
G.5 Theorem
(i) If α is an ordinal number and x ∈ α then x is an ordinal number also.
G.6 Remark
This theorem tells us that, if there were such a thing as the set of all ordinal numbers, then
it would itself be an ordinal number, and thus a member of itself. Therefore ...
G.7 Theorem
There is no such thing as the set of all ordinal numbers.
G.8 Theorem
Every well ordered set is order-isomorphic to exactly one ordinal number.
G.9 Definition
If A is a well ordered set, then the unique ordinal number to which it is order-isomorphic is
called the order type of A.
G.10 Example
N is an ordinal number.
G.11 Remark
In the context of ordinal numbers, the set N is usually denoted ω.
G.12 Example
Every natural number is an ordinal number.
G.13 Theorem
If α is an ordinal, then so is α+ .
G.14 Examples
As examples of the last theorem, the following are ordinal numbers.
ω+ = {0, 1, 2, . . .} ∪ {ω}
ω++ = {0, 1, 2, . . .} ∪ {ω, ω+}
ω+++ = {0, 1, 2, . . .} ∪ {ω, ω+, ω++}
and so on.
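Finite ordinals are easy to play with directly; here is a sketch in Python (frozensets, so that sets can be members of sets):

    # Finite von Neumann ordinals: 0 is the empty set and a+ = a U {a}.
    zero = frozenset()

    def suc(a):
        return a | {a}                    # the successor a+ = a U {a}

    one, two, three = suc(zero), suc(suc(zero)), suc(suc(suc(zero)))
    assert three == frozenset({zero, one, two})     # 3 = {0, 1, 2}

    def is_transitive(a):
        # condition (i): every member of a is also a subset of a
        return all(x <= a for x in a)

    assert all(is_transitive(a) for a in (zero, one, two, three))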
G.15 Theorem
Let A be a set of ordinals. Then ∪A is an ordinal.
C. GENERAL ALGORITHMS
Note that here, when I write “model”, I am using the word only very roughly in the
sense of Chapter 5. Here I mean a mathematical structure which imitates accurately
the operation of any general algorithm, as we are about to investigate it.
Note that we will restrict ourselves to algorithms for calculation of the kinds of things
that can be written down — ruling out physical things like recipes for making plum
pudding or instructions for changing a tyre on your car. Symbol manipulation in
other words.
Basically an algorithm consists of three main parts. The first is what we might call the
workspace, where the operations are carried out. It could be the paper we work on, a
blackboard, the computer’s memory, the abacus with its beads, the beach with its pebbles
or whatever. This idea carries along with it the individual symbols which may be used:
numerical digits or letters when working on paper, 0s and 1s in a computer’s memory,
positions of beads in an abacus and so on.
The second part is what is actually written on this workspace. This will usually change
from time to time as the algorithm progresses. I will call this a configuration.
The third part is a set of instructions for implementing the algorithm in this workspace,
which we can understand to be carried out by some kind of an operator or processor. The
important thing here is that the individual instructions should be completely cut and dried:
they should be able to be followed without any creative thought and without the need for
more than a finite amount of memory on the part of the processor. The instructions are
memorised too, so there can only be a finite number of them (for any given algorithm). We
would normally think of the processor as a human — or, nowadays, as a computer CPU —
but in any case, the requirements of finite memory and no creative thought mean that we
can model the processor as a (finite-state) automaton.
Note that this requirement of finite memory on the part of the processor means that the set
of symbols available for use, the alphabet, must be finite too — it must be able to recognise
the different symbols and know what to do when encountering them.
At first sight this seems like too open-ended a problem. Think of all the possible instructions
that could be given — “add these two numbers”, “rewrite that number in reverse order”,
“convert that number to Roman notation”, “write a row of that many zeros” and so on —
and that’s only dealing with numerical notation. However that’s because we are looking
at too high a level here. All these kinds of instruction can be broken up into a fairly
small number of “primitive” instructions. Let us consider an example: adding two multidigit numbers, say:
2 6 8
6 7 9
What do you actually do to add these numbers (using the usual algorithm)? You look at
the top right hand digit (8), then, remembering it, go down to the one below (9). You know
what to do with this: put a 7 in the third row (empty so far) and remember to carry 1.
Now go to the top of the next column to the left, . . . and so on, working leftwards across
the diagram.
Analysing this down to the very simplest mini-operations, one proceeds as follows:
There is only a reasonably small (finite!) number of different head-states involved here. The
largest number comes from dealing with cases like rows 2 and 5, where there are 200 states,
depending on the two digits seen and whether a carry is to be made or not. If you are like
me, you spent some dreary time in primary school having the “what to do nexts” for each
of these states beaten into your brain.
There are a few extra kinds of head-states, not displayed in the list above, to do with what
to do when a blank is encountered (reaching the far end of one or other of the numbers).
And there’s the notion of “keep going until you reach one of these blanks”. But that’s about
it.
An important point of this is that all you need to “know” (remember) to perform this
algorithm on any pair of numbers, no matter how big, is a small finite number of instructions
of a few very specific, very simple kinds; assuming you are in head-state s and looking at a
cell containing character c:
Decide what to do next, that is, enter into a new head-state s0 ; which actual head-state
you change to depends on the character c you are seeing.
Write a new character at the place you are looking at, just erase whatever is there
(which we could think of as writing a blank), or make no change (which we could
think of as writing the character or blank that was already there).
Move one step in any of the obvious directions (in this case right, left, up or down).
That’s about it. We will need to clean this up a little and we will see shortly that we will
need to add a couple more of the primitive instructions: they will be equally simple.
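To see how little the processor needs to remember between columns, here is a sketch in Python of the column-by-column addition just described (names invented; the blank-handling states for numbers of different lengths are omitted, so the two numerals are assumed to have the same length):

    # The only head-state carried from column to column is the carry.
    def add_columns(top, bottom):
        result, carry = [], 0
        for t, b in zip(reversed(top), reversed(bottom)):
            s = int(t) + int(b) + carry    # look at the two digits seen
            result.append(str(s % 10))     # write a digit in the third row
            carry = s // 10                # decide the next head-state
        if carry:
            result.append("1")
        return "".join(reversed(result))

    assert add_columns("268", "679") == "947"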
Here is an interesting exercise to try if you are crazy enough: do the same sum in hexadecimal
(base-16) notation.
1 0 C
2 A 7
Don’t cheat by converting the numbers back into base-10 first. Don’t even convert to binary.
Now, unless you are some super-geek who already knows how to do arithmetic operations in
hexadecimal, you will find that you are teaching yourself something much like the algorithm
I have laid out above, with perhaps a few short cuts you can devise. Well, you don’t have
to do the whole sum, but at least look at the first (rightmost) column and convince yourself
that the bottom symbol should be a 3 (with a carry).
The point of all this is that, if you consider any algorithm and analyse all its operations
down to the simplest “atomic” ones, by which I mean ones that are so simple that they
cannot be broken down any further, then those instructions will be of the simple kind I
described above. There may be a lot of them to define any particular algorithm, but of only
a very few kinds.
We must think about the workspace a bit more. As mentioned above, it can be much more general than sheets of paper organised into rows and columns of characters. However there
must be some structure. You cannot do our multidigit sum above (decimal version) with a
layout like this:
8
6
2
7
9
because how could you tell which digit belongs to which number and in what order?
Typically, one arranges symbols in rows and columns, perhaps using extra sheets of paper for
subsidiary calculations and so on. This workspace is finite, but is allowed to grow as much
as is needed for any particular calculation (you can always buy some more paper or another
hard disk). The particular physical medium used for the workspace is not really relevant,
so we model it by a mathematical structure. But we will want to define this structure to be
general enough to encompass any well-defined way of arranging our symbols on our medium.
At the very least, the workspace, or rather what is written into it, will consist of single
symbols arranged in some kind of definite pattern which allows the processor to move around
it in some defined and predictable way. Let us look at a few examples of such patterns.
• In the well-known triangular array for calculating binomial coefficients, each entry has
up to six neighbours.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
So basically what we need here is the idea of a “next” symbol, or more generally, the “next”
symbol in some given direction (such as left, right, up or down). A workspace will be
equipped with a finite set of directions, and we can conveniently think of its state at any time
during a calculation as consisting of a finite number of symbols from the alphabet, connected
together by arrows labelled with the names of the directions. Even more conveniently for
visualisation, we can enclose each symbol in a little box, representing somewhere it is OK
to write a symbol, and connect the boxes with the labelled arrows. I will call these boxes
cells.
We are now closing in on the general properties needed for a workspace, close enough for us to
construct a mathematical model of one which is general enough to cover all manifestations.
For example, consider a matrix. Decorating it with arrows to represent the directions and
little boxes to represent the cells, we would have something like this.
[Diagram: a 3 × 3 matrix, with rows 3 7 2, 4 0 4 and 1 9 8, each entry enclosed in a little box and joined to its neighbours by arrows labelled L and R (horizontally) and U and D (vertically).]
You will (I hope) notice that I have cheated here by only having single-digit entries. To
accommodate more general, multidigit, numbers we would need a couple more directions to
move back and forth along the digits of each number. (If I drew that diagram it would be
unpleasantly complicated.)
A couple of points should be made about workspaces before moving on.
(1) Think of the workspace as being like a blank sheet of paper, ready for the calculations
to be written on it. It doesn’t change during the calculation. At any stage during the
calculation, the workspace will contain a set of symbols (in cells) connected by the directions.
I will call that a configuration, and that will change as the calculation progresses.
So the plain workspace is entirely defined by the available directions (a finite set) and the
available alphabet (another finite set). For the matrix example above, for instance, the
set of directions would be {L, R, U, D, l, r} — here I have used l and r for left and right
directions within a numeral — and the alphabet would consist of the decimal digits together
with perhaps a minus sign and a decimal point, depending upon what sort of numbers you
were interested in and what notation you wanted to use.
And, in this matrix example, the picture above shows a particular configuration in the
workspace, representing a 3 × 3 matrix. If an algorithm was being performed on this matrix,
diagonalisation for example, the configuration would change from time to time, but the
workspace itself would not.
(2) Not all the pointers (representing the directions) have to point to anything. In the
matrix example, the U pointers of the top row don’t point to anything (so I haven’t drawn
them in). The same goes for the other edges of the matrix.
(3) Any configuration is finite, but is allowed to grow by adding cells in any direction.
(Think: in any arithmetic calculation, you are likely to add new digits in places where before
there was nothing. There’s nothing mysterious about this.) By the same token, removing
cells is OK too.
Note that adding or removing a cell is a change to the configuration, not to the workspace.
(4) At any time during the progress of an algorithm, the processor’s attention is fixed
on one particular cell. It is convenient (for our discussion) to think of that cell as being
distinguished by a marker of some kind. Actually, there is no harm in thinking of an
algorithm as having several markers, each representing a cell which the processor may focus
on at some time; the processor might shift focus from one marker to another, and therefore
from one cell in the configuration to another. Using this it would, for instance, be possible
to add two numbers which were widely separated in the workspace, using a marker for each.
(5) We will of course be interested primarily in algorithms which can be used for calcu-
lations involving natural numbers. For our theoretical purposes we need algorithms which
will work for all natural numbers. (An algorithm to deal with, say, multiplication which is
restricted to handling numbers not exceeding some maximum size M is trivial: just write
out all possible M 2 cases as a long list and use it as a look-up table.) And that means that,
amongst other things, our algorithms will have to cope with numbers that are inconceivably
vast.
Consider, for instance, the apparently trivial task of copying a number, let’s call it x. If
the number x is of “normal” size, say not bigger than ten digits long, I can simply look at
x, memorise it and write it into the new place. But what if the number is of the order of 10^1000? There is no way I can remember a 1000-digit number. I would have to copy it in
pieces, perhaps a digit at a time — thinking things like “I now have to copy digit 573 across”.
Still fairly straightforward, though the task is starting to look more complicated. But now what if the number is of the order of 10^(10^1000)? Now there is no way I can even keep what
digit I am up to in my head: that is a 1000-digit number itself. The obvious way to deal
with this is to use some kind of moveable marker, say place a small pebble on the next digit
to be copied. All I now have to do is scan along x until I reach the pebble, push it one digit
to the right, memorise the one revealed underneath, go over to the new partial copy and
write that digit at the end. Keep doing this until my pebble falls off the end of x. Simple,
if time-consuming.
Or, as remarked above, we could think of this as using two markers, keeping one of them
on the place we are up to in the old number and the other on the place we are up to in the
new copy.
Of course, when I say “I” here, I am really thinking of the processor of the algorithm. It has
finite memory, so the above discussion applies to it just as well. Moreover, most algorithms
need to deal with data (configurations) of arbitrary size, including truly enormous ones, so
the problems just discussed are quite general.
The moral of this is that all the algorithm’s most basic operations must act locally, on a single
symbol or one of its neighbours (what the neighbours are being defined by the structure of
the workspace). Operations which act at a distance might have to operate at enormous
distances and so must be implemented as repetitions of these basic local operations.
Since these markers we have been discussing simply distinguish certain cells as ones being
“looked at”, it is suggestive to call them “eyes” (of the processor). I shall rather call them
variables, since that is the way they are used: they refer to a piece of the data (the configu-
ration) which may change from time to time, as may the part of the configuration they are
referring to. It is then natural to call the cell that a variable refers to its value.
[Diagram: the initial configuration, the binary numbers 101 and 111 written one above the other, with a small triangle labelled x above the top right-hand digit.]
This is the initial set up. The little triangle represents the algorithm’s lone variable, its
value being the top right-hand digit. (I have called it x.) Now here is the situation part-way
through the performance:
[Diagram: the same configuration part-way through: 101 above 111, with the partial sum 00 written below and the triangle labelled x now over the third column.]
The first two columns have been dealt with and the processor is just about to start on the
third. All it needs to remember at this stage is that there is a carry.
[Diagram: the node program for this binary addition, with Start and Fin nodes; its left half deals with the no-carry case and its right half with the carry case.]
I think that this diagram is fairly self-explanatory. The circles and rectangles represent
what I have so far been referring to as “states” or “head-states” — I will henceforth call
them nodes. The blue circles represent nodes where a decision as to what to do next is
being made, based on the value of its variable. Here the symbol represents a blank, that
is, seeing nothing or an empty cell. The yellow signposts, D etc., represent moves in the
directions down, up or left. The signpost UUL represents three moves, up, up and left:
a convenient abbreviation for three nodes. The green rectangles, 0 etc., represent writing
a symbol.
And, if it’s not obvious, the arrows represent the node gone to next. From the “decision”
nodes there are several arrows, each labelled by the symbol or symbols which caused that
particular transition.
The left side of the diagram is what happens when a carry hasn’t been seen, for instance at
the start, and the right side is what is done when a carry has been seen.
The above diagram may look a bit complicated, but addition is complicated if analysed
down to its simplest components, as we have done here.
developed perhaps in another place entirely. For such an algorithm the processor’s attention
will need to move between the two given numbers and the sum, needing three variables. To
deal with this we will have to name the variables (say, x and y for the two given numbers
and s for the sum). Then the various basic operations, deciding based upon the value of
its variable, moving the variable or writing in the cell it refers to, each depend upon which
variable is being used. So each of our nodes in the diagram must be labelled by which
variable it refers to. Given this the diagram turns out to be simpler than the last one
(because there is not so much moving about).
[Diagram: the node program for the three-variable version, with Start and Fin nodes, decision nodes tagged x? and y?, write nodes s := 0 and s := 1, and move nodes tagged xys which step all three variables one cell to the left.]
There’s not much new here, only the x, y and s tags on the nodes to indicate which variable
they refer to. The xys on the bottom “move” nodes means that all three variables move one
step to the left (it is actually shorthand for three simpler move nodes).
the calculation of the function f (n) = the nth Fibonacci number, simply by looking along
the top row to the nth cell, then counting the cells below it.
[Diagram: a workspace of blank cells: a top row linked by R pointers, with a column of cells, linked by D pointers, hanging below each cell of the top row.]
(Here we are using R and D pointers only; note that both pointers out of every cell are
defined.) All the cells are blank, so here we are simply using an empty space with a predefined
structure. It should be obvious from this example that, if we are allowed to set up the empty
space with any kind of structure in advance, then, given any function N → N at all, we can
set up the structure so that values of that function can be looked up just by following
pointers around in the empty space, rather like looking up an infinite telephone book. That
is clearly not what we mean by an algorithm: for a difficult function it is going to be difficult
to set up the space thus and for a non-computable function, impossible.
Even a fairly straightforward arrangement like our assumption of a “squared paper” one in
our first version of the addition example above has its problems. If we follow the movements
of the variable around the configuration in this algorithm, this is what happens:
[Diagram: the configuration 101 above 111 with the sum 1100 below, showing the pointers created in and out of the bottom row as the variable moved around.]
Presumably the two given numbers (the upper two rows in the diagram) are both internally
connected by pointers, probably both L and R, since they are given as numbers. But the
bottom row, the sum, has been created by the algorithm, and only the pointers shown in
and out of those cells have been created explicitly by the variable moving around. But that
bottom row has to end up connected to itself by L and R pointers, or at least L pointers.
How does this come about? It seems that there are two possibilities: either these connections
are made automatically by the workspace, due to the “squared paper” assumption, or else
they are made by the processor as part of the algorithm (which has not been made explicit
in our discussion above).
The “squared paper” approach seems natural and straightforward, so let us look at that first.
Let us assume that a binary representation of a number is connected together by both R
and L pointers, and that the two given numbers are connected that way when given. The
black part of the diagram below shows what is given. Then adding the extra connections
that occur because of variable movements give rise to the part shown in red.
[Diagram: the configuration 101 above 111 with the sum 1100 below; the given connections are shown in black and the connections created by variable movements in red.]
Something must give rise to the L and R connections in the bottom line. At present we are considering that this is done by the workspace. This means that the workspace must at least have encoded in it somehow the local structure: that L and R are inverse directions, that going UR has the same outcome as going RU, and so on. In fact it must encode these facts somehow:
LR = I , RL = I , UD = I , DU = I ,
RU = UR , RD = DR , LU = UL , LD = DL .
(I am using I to mean “no movement”. Note that these relations are not all independent.)
Remembering that “squared paper” is just one of many ways of structuring the workspace, one can think about some others. For example, the triangular sort of structure used in the calculation of binomial coefficients has six directions and a correspondingly more complicated set of relations to define it.
So it seems that we might be able to define a workspace structure by giving a set of directions and a few relations between them such as these, then taking these into account automatically somehow as the configuration grows. That this is not entirely trivial, even for the well-known “squared paper” type of arrangement, is shown by a case such as the following.
[Diagram: a loop of cells built starting at A, running right along the top row past B, down the right-hand side, back along the bottom and up again to C, which sits directly below B with a one-cell gap between them.]
Suppose that the algorithm has started at A and built the configuration shown here, progressing around to the cell at C. Now it is going to move up, to create another cell above the one at C. But it has to know somehow that this new cell must also be connected upward to the one at B. It is clear from this example that, even with a simple local definition of the structure of the workspace, there can be implications afar off which require some computation to recognise.
In this example, recognising that a connection should be made between the new cell and the one at B is tantamount to recognising that RRRDDDLLLUU = D follows from the relations listed above. To a group theorist this is starting to look like solving the word
problem for a group given by generators and relations, or, more generally, for a semigroup
so given. And so it should: it is not difficult to see that, given any semigroup defined
by generators and relations, we can set up a workspace where the directions correspond
to those generators, the “local” structure is defined by those relations and the question of
determining other more complicated resulting relations is in fact that of solving the word
problem. It is known that the word problem for groups, and therefore for semigroups also, is
in general algorithmically unsolvable. There are of course many groups for which the word
problem is solvable, but this is not the case for all of them. Indeed, there are many groups
with innocuous-looking generator-and-relations definitions for which the word problem is
solvable but takes an enormous amount of computation.
What this boils down to is that defining a workspace with anything more complicated than just a "bare" set of directions, with no relations assumed between them, is fraught with problems, not the least of which is that quite a large amount of the computation of the algorithm is then assumed to be hidden in the structure of the workspace. So we go back
to the original definition: the workspace consists of a set of symbols and a set of directions,
and that is all. If we want some further structure, of the kind we have just been considering,
that is made the responsibility of the program the processor is following. If that program
involves following up on local relations (which are now built in to the program, not the
workspace) and that turns out to be trying to solve an unsolvable word problem, then at
some point the algorithm may fail — but that is not a problem, algorithms are allowed to
fail.
As far as the workspace is concerned, we have come full circle here. We are back with the
extremely simple definition of the workspace we started with, but now we have some good
reasons why. Now any structure added, in the sense of relations between the directions,
must be part of the program of instructions the processor follows. So . . .
Given two variables x and y, set x to refer to whatever cell y is already referring to.
We will symbolise this operation as x :≡ y.
It should now be obvious that the first version, the "squared paper" one, is even more complicated than it looked, because the extra connections which must be made by the processor
require instructions which were not made explicit in the node diagram above. They were
hidden by the undisclosed assumption that the structure of the workspace would take care
of it, an assumption we have now discarded. But, to make the extra connections we need
an extra low level (“atomic”) instruction that will make such a connection. A connection is
normally made between two cells (though it is possible to make it from a cell to itself). To
do that of course the processor must know which cells to connect: it needs to have variables
referring to both of them (here is where the assumption of more than one variable becomes
essential). So the atomic instruction is of the form
Connect the cell being referred to by one variable (x say) to the one being referred to by the other (y) in direction D. We will symbolise this operation as x →D y.
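For instance, in the binary-addition picture above, once the processor has created a new bottom-row sum cell it could make the missing connections itself. A sketch, assuming the variable x is still referring to the previous sum cell and y to the newly created one:

x →R y
y →L x

After these two atomic instructions the new cell is reachable from its left neighbour and vice versa, with no help from the workspace.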
Note that, in any algorithm, each variable is named explicitly somewhere in the program and
the program is finite (contains only a finite number of these atomic instructions). Therefore,
only a finite number of variables is needed.
Time to make all this official.
A cell may be connected, in certain directions, to no other cell. A cell may contain no symbol. A variable may not actually be referring to anything. So it will be convenient to have a symbol to represent "nothing", or "blank". I shall use □ for this.
Each node n specifies one of the variables, x = var(n). This gives a function var : N → E .
This defines which cell the algorithm is “looking at” when it performs that operation.
Each node n also specifies a next node, next(n). This gives a function next : N → N . This
defines which node the algorithm normally goes to after completing the operation defined
by this one. “Normally” because the first three kinds of nodes are “branch” nodes which can
make a decision to go to an alternative node.
A.3 Running
Running an algorithm results in a sequence of steps.
The ith step ⟨Ci , ni , refi ⟩ represents the current configuration Ci = ⟨Ci , symi , coni ⟩, the current node ni in the program and the current ref-function refi at "time" i. The ref function defines the values of all the program's variables at that time, and so is a function refi : V → Ci ∪ {□}.

The initial step Step0 = ⟨C0 , n0 , ref0 ⟩ is defined as described above. Then all subsequent steps are defined inductively: given Stepi = ⟨Ci , ni , refi ⟩, the next step Stepi+1 = ⟨Ci+1 , ni+1 , refi+1 ⟩ is defined as follows:
(a) If ni is a symbol-compare node, then that node specifies a variable, e = var(ni ), and a symbol (or □), a = sym(ni ). The variable defines a cell (or □), c = refi (e), which in turn defines another symbol, b = sym(c). This then defines the next node: ni+1 = nextT (ni ) if a = b, and ni+1 = nextF (ni ) otherwise.

The node is the only thing that changes, so the next step is given by:

Ci+1 = Ci ,   ni+1 = nextT (ni ) if a = b and nextF (ni ) otherwise ,   refi+1 = refi .
and then

ni+1 = next(ni ) ,   refi+1 (e) = c′ ,   refi+1 (x) = refi (x) for all variables x ≠ e .
[Diagram: the move operation x →D written as an abbreviation: a branch tests whether a cell already lies in direction D; if not, a new cell is created (x new) and joined in direction D before the move is made. A second diagram: for a direction d with reverse d∗, instead of the single join x →d y we have the pair x →d y and y →d∗ x.]
The move operation must be upgraded too by making the same change to the “better move
operation” above.
So, if we want this to happen in an algorithm we are about to specify, all we need do is say
at the outset something like “in this algorithm the directions L and R will be reverses, as
will U and D”, and then assume all the join operations are upgraded as above.
Why are we using the strange notations ≡ and ≈ , when a plain equals sign should serve for
one of them? That is because we are saving the = symbol for the more important equality
of numbers, as represented by their codes.
x? is an abbreviation for

[Diagram: a branch x ≈ 0? whose T edge exits at the point labelled 0; its F edge leads to a branch x ≈ 1? whose T edge exits at the point labelled 1.]
The same trick can be used to make an x ≈ y? operation, which checks whether the contents of the cells that variables x and y refer to are the same, and an x :≈ y operation which sets the contents of the cell referred to by x equal to the contents of the one referred to by y.
Suppose for the sake of argument that we represent integers in binary notation, requiring 0 and 1 symbols
in our alphabet together with an extra symbol to denote negatives. Then we could represent
rationals as pairs of integers — with some convenient algorithmic structure to denote pairs.
Suppose that we use the algorithm for binary addition given as a diagram above (the second, better one, with three variables); we certainly don't want to write it out in full the dozens of times it turns up in our main algorithm. It is pretty obvious what to do: having designed the sub-algorithm, we can simply represent it by a box with only the things the rest of the algorithm needs to know shown on the outside, that is, the entry and exit points and the three variables. Perhaps something like this:

[Diagram: the addition sub-algorithm as a single box, with its entry and exit points and its three variables shown on the outside.]
Things get a bit better if this box can be shared. Let's suppose our algorithm requires this addition sub-algorithm in three places, like this:

[Diagram: the main algorithm containing three copies of the addition box.]
Rather than have umpteen copies of the same sub-algorithm, a single one can be shared.
Wherever it is needed, the algorithm just jumps to the single copy. All we need to do is
arrange that, when that sub-algorithm is finished, it jumps back to the correct place. This
we do by supplying the sub-algorithm with one more variable (here I’ll call it r, for “return”).
This is set to refer to a new cell, unconnected to any other part of the configuration. Then
we add new symbols to the alphabet, to stand for the places we want the algorithm to return
to when finished. Write the corresponding symbol to the new cell before jumping to the
sub-algorithm and, when it is finished, refer to that cell to decide where to jump back to.
(For this diagram there are only three such points, and I will call the corresponding symbols
p1 , p2 and p3 .)
Since the sub-algorithm is being shared, its variables x, y and z must be set to refer to the
appropriate cells before it is used. This can be done by variable-set operations. So the whole
thing becomes . . .
[Diagram: the three call sites. The ith of them sets x :≡ ai , y :≡ bi , s :≡ ci and r :≈ pi , then jumps to the single shared box bAd(x, y, s); at its exit a decision r? jumps back to the point labelled p1 , p2 or p3 as appropriate.]
This doesn’t seem to be getting much simpler, but in fact it is, in two ways. Firstly, from the
point of view of the actual complexity of the algorithm. Remember that the sub-algorithm
box conceals quite a complicated construction, and this now only has to occur once. However
we are not interested in that kind of simplicity or efficiency here, we are only interested in
whether computations can be performed by an algorithm at all.
More importantly for us, it will allow a useful conceptual simplification, because everything
that is in colour in the above diagram is quite standard, repeated in an obvious way whenever
any sub-algorithm is shared in this way. We can simply rewrite this diagram as follows, because all the hidden details are known.

[Diagram: each of the three call sites reduced to a single "call bAd" box.]

Here the "call" box encapsulates setting up the coloured stuff in the previous diagram and setting the variables of the bAd sub-algorithm to refer to a1 , b1 and c1 (or whatever).
If you’ve done any computer programming, you will recognise what is going on here. We
have just developed the idea of a subroutine — well, the corresponding thing in terms of
the kind of diagrams we have been using — but it corresponds pretty accurately. So I think
I will call them subroutines from now on.
We are now basically up to the point that computer science had reached by about the early
1960s. We have “plain vanilla” subroutines (= sub-algorithms), but our way of implementing
them will not yet allow recursive calls. And these will be important for us.
For our main interests (recursive functions) it will in fact be necessary for us to be able
to create recursive subroutines, that is, subroutines which can call themselves, or more
generally, can call another routine which can call another routine, . . . , which can call the
original one.
Our implementation of subroutines above is not sophisticated enough to deal with this
(basically because the variables of the subroutine will get confused about what to refer to).
But it doesn’t take a great deal to make such recursive subroutines work.
Recursive subroutines
Let’s start by assuming we have got recursiveness to work and we have an algorithm which
is in the middle of its calculation; currently we might have a situation like this:
Main algorithm
called Subroutine A
called Subroutine B
called Subroutine A
called Subroutine C
called Subroutine A
Here we have three invocations of the one subroutine, A, so this one at least has to be
recursive. When the bottom invocation of Subroutine A is finished, and then the invocation
of C, the algorithm needs to know somehow how to continue the middle invocation of
Subroutine A from where it left off. That means that something must remember somehow
enough information for this to happen. (And again when the algorithm eventually gets back
to the top invocation of A.) We cannot tell in general how deep these piles of invocations
might go, and that means that we cannot know in advance how much of this kind of recovery
information needs to be kept. That in turn means that this information cannot be kept in
the program itself. And that means that it must be kept in the configuration somehow.
So what information is necessary to define where a routine was “up to” at the moment we
left it to go to another subroutine? This is pretty simple: for a start, we can arrange to
jump back to the place in the program where we left it by the same method as was used for
a plain subroutine above. We assign a new symbol (in the alphabet) to each place it might
want to go back to and, at the end of the subroutine, use a decision cell (as above) to go
there.
The only other thing necessary is a list of whatever the routine’s variables were referring to
just before the subroutine was called. The required information will look like this:
[Diagram: the pile of invocations as before (Main algorithm, then Subroutines A, B, A, C, A), now with a stack alongside: one frame per call, carrying the return symbols p1 , p2 , p3 , p4 , p5 and the saved variable references; the stack variable σ is shown looking at the bottom-left cell.]
This extra bit of configuration we are using to keep track of where subroutines were up to
is called the stack. Each of its rows is called a frame. Note that the stack is furnished with
its own variable which, for the present discussion, I will call σ. Its “normal” position (when
not in the process of calling or returning) is, as shown, looking at the bottom-leftmost cell
of the stack.
To reiterate: for each subroutine, there are a finite number of places in the program it may
be called from, and hence a finite number of places it might want to jump back to when finished. We
give each such “return address” a different symbol from the alphabet, and use a decision
operation at the end of the subroutine to jump back to the proper place.
All this can be accomplished with the simple operations originally defined. Our new im-
proved call box will contain the following. (Here I assume that the routine A that is doing
the calling has three variables, x, y and z, and the routine, B say, that is being called has
two, u and v. We will want to set those two variables to look at something before setting B
going, and the most likely way to do this is to set them to refer to things that a couple of
variables of A, a and b say, are already referring to.) I will just write the operations out in
longhand, rather than draw a diagram.
σ move D (Start a new frame by moving down; this creates a new bottom-left cell.)
σ write p6 (Whatever the return symbol is.)
σ move R (The next few steps fill in the new frame.)
σ join →D x
σ move R
σ join →D y
σ move R
σ join →D z
σ move LL
u look at a (Set subroutine B's variables to refer to whatever a and b are referring to.)
v look at b

After the subroutine has finished, it uses the symbol p6 to jump back to here.

σ move RD
x look at σ
σ move URD
y look at σ
σ move URD
z look at σ
C.1 Discussion
We will clean up the notation used above and make it easy to use. Firstly, we write each of
the basic operations (nodes) in a form like move(x,d) or write(x,a) and so on.
Then, noting that most of these basic operations go on to a unique next operation, we can
simply write these operations one after another, or one under another. For example
move(x,d); write(x,a); move(x,u); write(x,b);
or
move(x,d);
write(x,a);
move(x,u);
write(x,b);
(We separate individual operations, which we will now call "statements", by semicolons for safety's sake.)
Of course our "programs" are not usually going to be written out in a long simple string like this; we will need loops and jumps. We can allow connections to be made in any
way we like by allowing a label on a statement and a jump to such a labelled statement.
Here is an example of a loop (we have labelled the first statement “fred”):
fred: move(x,d);
write(x,a);
move(x,u);
write(x,b);
jump fred;
In fact, using jump statements like this is considered very bad practice. We will have much
better ways of connecting our statements up, and we won’t in fact be using jumps at all.
But it is nice to know that they are there, so that any connection we like can be made
directly.
Another useful feature is to be able to collect a group of statements together into a larger
“compound” statement. We use curly braces for this:
{move(x,d); write(x,a); move(x,u); write(x,b)}
This can be done hierarchically, with compound statements being collected together into
larger ones. We will see shortly why this is useful. From now on, when I write “statement”
that will include these compound ones.
So how do we deal with the three kinds of branch operation, for instance x ≈ a?
if (x ≈ a) T-statement;
else F-statement;
following statements
The T-statement represents the nextT connection. The algorithm proceeds to this if the x
≈ a? test is true. Having done that it then proceeds to the following statements (unless
the T-statement jumps elsewhere).
In the same way, the F-statement represents the nextF connection. The algorithm proceeds
to this if the x ≈ a? test is false. Having done that it then proceeds to the following
statements (unless the F-statement jumps elsewhere).
[Diagram: the branch x ≈ a? with its T edge leading to the T-statement and its F edge to the F-statement; both then lead on to the following-statements.]
The T-statement can be a simple jump statement, but more often it is a compound state-
ment, made up of several (perhaps many) simpler statements.
The else portion can optionally be omitted entirely, thus
if (x ≈ a) T-statement;
following statements
and this can be represented diagrammatically by
[Diagram: the branch x ≈ a? with its T edge leading through the T-statement and its F edge leading directly on; both rejoin at the following-statements.]
The other two branch operations are dealt with in exactly the same way, by an if (x ≡ y) or an if (x →d) statement.
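For example, the two-way decision x? described earlier, which branches on whether the cell contains 0 or 1, comes out as nested ifs. A sketch, with S0 and S1 standing for whatever is to be done in each case:

if (x ≈ 0) {S0;}
else {
  if (x ≈ 1) {S1;}
}
following statements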
Another kind of statement which is very useful is the while statement, which is used for
building loops. It looks like this:
while (x ≈ a) loop-statement;
following statements
This results in the loop-statement being executed over and over again, so long as the con-
dition x ≈ a holds true. The loop-statement is nearly always a compound, made up of a
number of simpler statements. Also, one of these statements had better modify either x or
a, or else the loop will never end.
As soon as the condition x ≈ a goes false, the loop ends and the algorithm proceeds to the
following-statements.
This can be represented diagrammatically by:
[Diagram: the branch x ≈ a? with its T edge leading to the loop-statement and back round to the test; the F edge leads on to the following-statements.]
And, as with the if statement, we can use the other two kinds of conditions: while (x ≡ y) or while (x →d).
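For instance, here is a sketch of a loop using the direction condition. It overwrites the cell x is looking at, and every cell to its right, with the symbol 0:

while (x →R) {
  write(x,0);
  move(x,R);
}
write(x,0);

Because the test guarantees that there really is a cell in direction R before each move, the strong move never creates any new cells here.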
The last basic construction we need to consider here concerns how to deal with subroutines. But we have already dealt with this, and all we need to do is specify how to write
it in our new programming language.
Suppose we have a subroutine that we have decided to call Fred. It will probably have some variables that must be set by the routine which calls it, before it is set going: its arguments. It will also possibly have some that it uses for its own private purposes: its local variables. The whole thing then will look like this:

Fred(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) { body }
Here x1 , x2 , . . . , xm are its arguments, y1 , y2 , . . . , yn are its local variables and body is a
statement, usually a compound of several statements, which does the work of the subroutine.
So it will usually look more like this:
Fred(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) { S1 ; S2 ; . . . ; Sk }
where the Si are the simpler statements which make up the body.
When another routine calls Fred, it does so thus:
call Fred(a1 , a2 , . . . , am );
return
If you have done any computer programming, you will see what is happening here. I am
describing a language to write our algorithms in which is modelled fairly closely on the
standard computer language C. This is an elegant and battle-hardened language for writing
the sort of simple algorithms we are interested in here.
Simple? For a start, we are not interested in efficiency — either in time or in space — only
in whether an algorithm works at all. Thus we will write some very wasteful algorithms, for
the good reason that they are simple to write and to understand.
For the same reason, there are many aspects of “real” computer programming that (thank-
fully) will not bother us here, including such things as input/output, error reporting, garbage
collecting, multi-threading and re-entrant subroutines and so on and on.
There must be at least one routine. The non-main routines are usually called subroutines. A routine has the following form:

name(x1 , x2 , . . . , xm )(y1 , y2 , . . . , yn ) { S1 ; S2 ; . . . ; Sk }
A statement is of one of the following forms. First I list the ones which correspond exactly to the operations used in the diagrammatic form discussed above.

(a) x ≈ a? is written if(x ≈ a) T-statement else F-statement. The else-part is optional.

(b) x ≡ y? is written if(x ≡ y) T-statement else F-statement. The else-part is optional.

(e) x →d y is written join(x,d,y).

(f) x new is written x new.

(g) (x →d) is written weakmove(x,d). We won't be using this one.

(h) x :≡ y is written x :≡ y.

x →d is written move(x,d). This is the strong move; we will be using this one.

x ≈ y? is written if(x ≈ y) T-statement else F-statement. The else-part is optional.

And now for a few other facilities that we will assume built in to our language.

Return: return
C.3 Comments
(i) We can of course represent natural numbers by strings in various ways, for instance
we could use the usual binary code, involving the alphabet { 0 , 1 }. Or we could use decimal
code, involving a slightly larger alphabet.
For example, the number 3718 would be represented by the configuration
[Diagram: four cells containing the digits 3, 7, 1, 8, joined left to right by R pointers; a small triangle indicates that the variable x refers to the cell containing the digit 3.]
We will see that, subject to a couple of very simple and reasonable restrictions on the code,
it does not matter what code we use for natural numbers. We will also see that a function
N^m → N can be computed by a program in this language if and only if it is partial recursive.
This will take some doing; for the time being, until this is proved, let us call these functions
algorithmic.
(ii) Clearly this appendix defines "algorithmic" functions more widely: on things that can be represented by a workspace structure of any kind. For example, we could expect to write programs that would operate in various ways on the strings used in one of our formal theories. We can write simple programs to, for instance, check whether a string is a valid expression, or to decide whether a sequence of strings is a valid proof.
Functions
You will have noticed that the routines which make up a program do not return values, they
simply make changes to the current configuration. However a “function”, that is, a routine
which does return a value, is a very useful thing. We can implement functions without
making any additions to our basic language.
A function is written in much the same way as a routine: it has the form
name(x1 , x2 , . . . , xn )(u1 , u2 , . . . , ul ) {S1 ;S2 ;. . . ;Sk } (–1)
As with normal routines, x1 , x2 , . . . , xn ; u1 , u2 , . . . , ul are its variables, of which x1 , x2 , . . . , xn are its arguments and u1 , u2 , . . . , ul the local variables. Either or both of these may be empty, that is, n ≥ 0 and l ≥ 0. The variables are all distinct. The S1 , S2 , . . . , Sk are statements.
The last statement must be a return statement, so there is at least one statement.
The only difference from the form of a normal routine is that any return statement must
have the form
• return r
where r is one of the variables of the function.
The function can be used in an assignment statement in the following way:
z :≈ F(y1 , y2 , . . . , yn ) (–2)
(where F is the name of the function). This statement executes F just like a normal routine,
with the one exception that, when it encounters a return r statement, it sets z:≡ r before
exiting.
We don’t need to extend the basic definition of a program as given above to include this
facility. It can be implemented as a routine as follows: replace the function definition (–1)
above by
F(v,x1 , x2 , . . . , xn )(u1 , u2 , . . . , ul ){S1 ,S2 ,...Sk }
(note the extra argument v, for "value"), in which all the statements are the same except that any return statement

return r;

is replaced by

v :≈ r;
return;

and each use (–2) of the function is replaced by the routine call

F(z, y1 , y2 , . . . , yn );
Consider statements of the form

if (condition) {statements}
where condition is something we can test for which is either true or false. If the condition
is true then the statements are executed, otherwise they are not.
As a first step, it will be useful to have structures to stand for “true” and “false”, and this is
easily done using single-cell structures containing an appropriate letter:
[Diagram: two single-cell structures, one containing the symbol T (representing true) and one containing F (representing false).]
It doesn’t matter much what symbols we use for these, so long as they are different. We
can add two new symbols to the alphabet or we can re-use existing ones (for the way we use
them, no confusion will arise).
We will represent these simple structures by the words true and false.
C.5 Conditions
Here I describe ways of dealing with conditions which are more complicated than the three
basic ones mentioned above. The idea of a condition here is that it is something that can
be true or false, so it will be something that we can compute a value either true or false
for.
For a start, we can compute T or F values for the three basic conditions. Consider
p :≈ (x ≈ y) .
This asks whether the contents of the cells looked at by x and y are the same; if so, it sets
the contents of p to be T, otherwise F. This is easily implemented:
if (x ≈ y) p :≈ T;
else p :≈ F;
Boolean operations
We can combine conditions together using boolean operations to form more complicated
ones, as follows; here, if p and q are conditions, then z will be one too.
• z :≈ ¬p
• z :≈ p ∧ q
• z :≈ p ∨ q
These have the obvious meanings and are implemented by functions thus:
not(p)(r) {
  if(p) {r :≈ false;}
  else {r :≈ true;}
  return r;
}
and(p,q)(r) {
  if(p) {
    if(q) {r :≈ true;}
    else {r :≈ false;}
  }
  else {r :≈ false;}
  return r;
}
or(p,q)(r) {
  if(p) {r :≈ true;}
  else {
    if(q) {r :≈ true;}
    else {r :≈ false;}
  }
  return r;
}
We can then write the function calls thus:
z :≈ not(p)
z :≈ and(p,q)
z :≈ or(p,q)
but for these particular function calls it will be convenient to allow the notation
z :≈ ¬p
z :≈ p ∧ q
z :≈ p ∨ q
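Any other boolean operation can be built up in just the same way. For instance an exclusive-or, as a sketch (the name xor is chosen for illustration; it is not defined above):

xor(p,q)(r) {
  if(p) {
    if(q) {r :≈ false;}
    else {r :≈ true;}
  }
  else {
    if(q) {r :≈ true;}
    else {r :≈ false;}
  }
  return r;
}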
C.6 Expressions
As soon as we have assignment operations such as the ones discussed in the last few sections,
we can combine them into more complicated expressions. This usually requires the use of
one or more extra unused local variables. For example, we can implement

if ((x ≉ □) ∧ (y ≈ a)) {statements}

as

p :≈ (x ≈ □);
q :≈ ¬p;
r :≈ (y ≈ a);
s :≈ q ∧ r;
if(s) {statements};

The point of this is that we can now write the compound condition directly, knowing that it is just a shorthand for the group of simpler statements above. We can also write

p :≈ (x ≉ □) ∧ (y ≈ a)
and so on.
We can also substitute functions into other functions and routines. For instance, given
functions
F(x,y)
G(x)
H(x,y,z)
we can write
z :≈ F(G(a),H(a,b,c))
knowing that it is just shorthand for something of the form
u :≈ G(a);
v :≈ H(a,b,c);
z :≈ F(u,v);
• They can deal with sequences of symbols, that is, strings, in the sense that they can
represent strings and perform all the usual operations on them, adding and removing
entries, concatenating strings, comparing them and so on. This allows operations such
as checking whether a given string in one of our formal languages is an expression or
not, whether it is an axiom or not and so on.
• Moreover, they can deal with “nested” sequences: sequences of sequences of sequences
and so on to any depth. So, for instance, proofs can be checked.
• Natural numbers can be represented by strings of digits, so they can be dealt with also. All the basic functions and operations defining partial recursive functions (as in Definition 9.A.3) can be computed. This leads to the conclusion that all partial recursive functions are algorithmic, and in fact can be computed in this rather simple workspace.
• Any configuration in any kind of workspace at all can be represented by one in the
RD-workspace with the same alphabet; moreover any algorithm that can be performed
using the more general workspace structures can be mimicked by one using the RD-
workspace with the same alphabet. This means that anything that can be computed
in any way can be computed using an RD-workspace.
D.1 Strings
We have already seen a neat way to deal with strings whose entries come from any given
alphabet using only R pointers. For example, the number 3718 would be represented by the
structure
[Diagram: four cells containing the digits 3, 7, 1, 8, joined left to right by R pointers; small triangles indicate that there are variables x and y referring to the digits 3 and 7.]
We can represent it (on paper) more compactly thus: [ 3 7 1 8 ] , with the markers for x and y written above the entries they refer to.
In a computer program, for efficiency, one would normally have L (“left”) pointers (which
are the reverse of R pointers) as well, but they are not really necessary, so to keep things
simple we will not use them. Note however that this means that we must always refer to
strings by their leftmost entries, for there is no way of finding entries to the left of the one
a variable references, unless there is already another variable pointing there. So here "the
string x” means “the string whose leftmost entry is pointed to by x”.
• appendL(x,y)
• appendR(x,y)
Append a new entry to the left or right end of the string x: a new cell containing whatever symbol y is looking at. Make sure that x still points to the left-hand end of the string afterwards, as it should.
The appendL function is implemented as
appendL(x,y)(u) {
  u new; u :≈ y;
  if (x ≉ □) join(u,R,x);
  x :≡ u;
  return;
}
and the appendR by
appendR(x,y)(u,v) {
  u new; u :≈ y;
  if(x ≈ □) x :≡ u;
  else {
    v :≈ rightmost(x);
    join(v,R,u);
  }
  return;
}
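The function rightmost used here has not been written out. A minimal sketch of it in the same notation, assuming x refers to the leftmost cell of a non-empty string:

rightmost(x)(u) {
  u :≡ x;
  while (u →R) move(u,R);
  return u;
}

(Since the condition u →R holds only when a cell already lies in direction R, the strong move never creates anything here.)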
• z :≈ concat(x,y)
Concatenate the strings x and y and set z to point to the leftmost entry of the result.
Implemented as
concat(x,y)(r,u) {
  if(x ≈ □) {r :≡ y;}
  else {
    r :≡ x;
    u :≈ rightmost(x);
    join(u,R,y);
  }
  return r;
}
The statement x :≡ y sets x to point at the same cell that y does; it does not create anything new. The statement x :≈ copy(y) creates a new cell identical to the one that y points at.
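The routine copy has not been written out either. A minimal sketch, copying the contents only and not the direction pointers, which is all the routines below require of it:

copy(y)(r) {
  r new;
  r :≈ y;
  return r;
}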
We also want a statement that will copy a whole string. This is
• z :≈ copyString(x)
and it is implemented as
copyString(x)(r,u) {
  r :≈ □;
  u :≡ x;
  while(u ≉ □) {
    appendR(r,u);
    move(u,R);
  }
  return r;
}
Of course we want to check whether two strings are equal. The condition x ≈ y won't do, because it only checks whether the individual cells x and y point to are the same (in the sense of having the same contents; their direction pointers may be different).
• z :≈ equalStrings(x,y)
is implemented as
equalStrings(x,y)(u,v) {
  u :≡ x;
  v :≡ y;
  while(u ≉ □ ∧ v ≉ □) {
    if(u ≉ v) return false;
    move(u,R);
    move(v,R);
  }
  if(u ≈ □ ∧ v ≈ □) return true;
  return false;
}
[Diagram: the nested sequence ⟨⟨2 7⟩, ⟨5 3⟩, ⟨1 0 8⟩⟩ represented in two rows: a top row of three cells joined by R pointers, with x referring to the first; below each top-row cell, reached by a D pointer, hangs the string of its entries: 2 7, then 5 3, then 1 0 8.]
Here all the cells in the first row have null contents and those in the second row have null
down pointers. Clearly, using this idea we can nest sequences to any depth — or represent
tree diagrams.
We will need a function to copy an entire tree (or subtree):
• z :≈ copyTree(x)
This is implemented as
copyTree(x)(u,p,q,r,s) {
  u :≡ x;
  r :≈ □;
  s :≈ □;
  while(u ≉ □) {
    p :≈ copy(u);
    q :≈ copyTree(u.D);
    join(p,D,q);
    if(s ≉ □) join(s,R,p);
    else r :≡ p;
    s :≡ p;
    move(u,R);
  }
  return r;
}

(Here u.D denotes the cell connected to u in direction D.)
This is the first example we have of a routine which calls itself recursively.
We will show that all the basic operations defining a partial recursive function (as in Definition A.3) can be computed in this language. We will see that all that is required of our representation of
natural numbers is that
• the representation of zero is known, that is, there is a nullary algorithm zero() which
will compute the representation of zero;
• the successor function can be computed in this representation, that is, there is a unary
algorithm suc(x) which will compute the successor of any number in this representa-
tion;
• there is an “equality” algorithm, equal(x,y) say, which will determine whether two
numbers are equal or not;
and so on. (The letters used inside the strings are written in small slanty font to distinguish
them from actual numbers, which will be written normally. After the example routine for
the successor function below, we will not be looking inside the strings for natural numbers
again anyway.)
That different natural numbers have different representations in this coding is easily proved
and we have just defined the representation of zero.
suc(x)(r,carry,dig) {
  r :≈ [ ];
  carry :≈ true;
  dig :≈ rightmost(x);
  while(dig ≉ □) {
    if(carry) {
      if(dig ≈ 0) {appendL(r,1 ); carry :≈ false;}
      else {appendL(r,0 ); carry :≈ true;}
    }
    else {
      if(dig ≈ 0) {appendL(r,0 ); carry :≈ false;}
      else {appendL(r,1 ); carry :≈ false;}
    }
    move(dig,L);
  }
  if(carry) appendL(r,1 );
  return r;
}
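To see the routine at work, take x = [ 1 0 1 1 ] (eleven): the two low-order 1s come out as 0s with the carry persisting, the 0 then becomes a 1 and clears the carry, the leading 1 is copied unchanged, and since no carry remains at the end no extra digit is appended; the result is [ 1 1 0 0 ] (twelve).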
Equality is given by equalStrings.
This is possibly the simplest notation of all. We choose a symbol (say, 1 ) and represent
each natural number by a string of that many copies of the symbol. So, for example, five is
represented by the string [ 1 1 1 1 1 ] .
That different natural numbers have different representations in this coding is obvious. Zero
is represented by the empty string. The successor function is implemented by:
suc(x)() {
  appendR(x,1 );
  return x;
}
Equality is again given by equalStrings.
Given any two algorithmic codes for natural numbers (that is, ones which satisfy the conditions listed in Section D.3), there are algorithms to convert back and forth between the two codes.
For, suppose we have two such codes — let us call them Code 1 and Code 2. Then, according
to the conditions assumed, there are algorithms zero1(), suc1(x) and equal1(x,y) for
Code 1 and corresponding ones zero2(), suc2(x) and equal2(x,y) for Code 2.
convert(x)(r,u) {
u :≈ zero1();
r :≈ zero2();
while(¬equal1(u,x)) {
u :≈ suc1(u);
r :≈ suc2(r);
}
return r;
}
It follows that any function N^n → N which can be algorithmically computed in Code 1
can be computed in Code 2. For, given an algorithm which works for Code 1, an algorithm
for Code 2 can be created by simply converting all the arguments from Code 2 to Code
1 first, then applying the given Code 1 algorithm and finally converting the Code 1 result
back to Code 2.
Summary
So, from now on we will just assume that we have a workspace structure representing the
number zero, which I will call 0, and functions for computing the successor and deciding equality, which I will call suc and equal. (I will abbreviate z :≈ equal(x,y) to the more usual z :≈ (x = y).)
This is where we get to use the = symbol.
So we can use more or less any reasonable notation we like, binary, decimal, Roman, unary,
gaol-wall — choose your favourite.
[Diagram: applying the entry operation to the nested sequence ⟨⟨2 7⟩, ⟨5 3⟩, ⟨1 0 8⟩⟩ with index 2 yields a new variable z referring to a copy of the second entry, 5 3, while x still refers to the original structure.]
Note that this function will work with sequences, or in fact trees of any depth, just as well.
Note also that we are indexing the entries by 1,2,3,. . . , not 0,1,2,. . . .
Finally, as remarked above, this will work no matter what code we use for natural numbers,
so long as it is algorithmic. We will not assume that we are using the particular decimal
code exhibited in the pictures above: that was done just for illustration.
D.8 Proposition
All partial recursive functions are algorithmic.
Moreover, given any code for numbers, provided the number code is algorithmic, any partial
recursive function can be computed by an algorithm using that code.
Proof. We work our way through the various parts of Definition A.3.
(i) We are assuming that an algorithm suc for the successor function is given.
(ii) We need to give an algorithm for the projection function πn,i : N^n → N. Note that, for this proof, it is sufficient to give a different algorithm for each n and i; however, it is easy to give a single algorithm which works for all n and i. The function πn,i (x) is given by the algorithm entry(x,i) above (the subscript n is irrelevant).
(iii) Addition, z :≈ x+y, is implemented as

add(x,y)(r,u) {
  r :≈ x;
  u :≈ 0;
  while(u ≠ y) {
    r :≈ suc(r);
    u :≈ suc(u);
  }
  return r;
}
(iv) Natural subtraction, z :≈ x ∸ y, is implemented as

natSubtract(x,y)(r) {
  r :≈ 0;
  while(x+r ≠ y ∧ y+r ≠ x) r :≈ suc(r);
  if(y+r = x) return r;
  return 0;
}
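For example, with x = 2 and y = 5 the loop stops at r = 3 (since 2 + 3 = 5); then y + r = 8 ≠ 2 = x, so 0 is returned, which is what 2 ∸ 5 should be. With x = 5 and y = 2 the loop stops at the same r = 3, but now y + r = x, so 3 is returned.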
(v) Multiplication, z :≈ x*y, is implemented as

multiply(x,y)(r,u) {
  r :≈ 0;
  u :≈ 0;
  while(u ≠ y) {
    r :≈ r+x;
    u :≈ suc(u);
  }
  return r;
}
For most notations these would be ridiculously inefficient algorithms. But that doesn't matter here.
(vi) Substitution. Suppose that we have functions

f : N^m → N and g1 , g2 , . . . , gm : N^n → N ,

computed by routines F, G1 , G2 , . . . , Gm . Then the function obtained from them by substitution is implemented as follows:
H(x1 ,x2 ,...,xn )(r,u1 ,u2 ,...,um ) {
  u1 :≈ G1 (x1 ,x2 ,...,xn );
  u2 :≈ G2 (x1 ,x2 ,...,xn );
  ...
  um :≈ Gm (x1 ,x2 ,...,xn );
  r :≈ F(u1 ,u2 ,...,um );
  return r;
}
In whatever program this occurs, it must be accompanied by all the subroutines in the
programs for F, G1 , G2 , . . . , Gm , so that the above assignment statements work.
(vii) Minimalisation. Suppose we have a function f : N^n ⇢ N defined by minimalisation thus:

f(x1 , x2 , . . . , xn ) = min_u { g(u, x1 , x2 , . . . , xn ) = 0 } ,
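The routine for this is not spelled out at this point. A minimal sketch in the same notation, assuming g is computed by a function G and using the = abbreviation for number equality (the name minimalise is for illustration only):

minimalise(x1,x2,...,xn)(r,t) {
  r :≈ 0;
  t :≈ G(r,x1,x2,...,xn);
  while (t ≠ 0) {
    r :≈ suc(r);
    t :≈ G(r,x1,x2,...,xn);
  }
  return r;
}

If there is no u at all with g(u, x1 , . . . , xn ) = 0, the loop never terminates; that is exactly the partial behaviour the definition allows for.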
• We know that the overall state of the algorithmic calculation at any point in its
progress (what we called a step in the running of the algorithm) consists of the current
configuration together with where the computation is up to in the program. There is
a way of representing every such step as a single natural number (which we will call
its Gödel number ) which allows the following.
• Converting any natural number to the Gödel number of its representation as a config-
uration is a recursive function.
• The action of each kind of statement in our algorithmic programming language is
reflected by a recursive function of the Gödel numbers of the before-and-after states.
• Recognising when the program has terminated is also reflected in a recursive function
of the Gödel number.
• Reading off the “answer”, that is, converting the final state to the natural number it
represents, is also reflected in a recursive function.
• Consequently, starting the program with any given number as its input and running it
until it terminates and then reading off the answer can be reflected as a partial recursive
function, defined by a minimalisation involving the aforementioned functions.
Since the minimalisation step here occurs only once, it follows that it is possible to build up any partial recursive function in the manner of its basic definition, in which a minimalisation step is used at most once.
Further, we will see as we go along that all the recursive functions mentioned above (in every dot-point except the last) are in fact primitive recursive. This results in the surprising fact (Theorem E.17) that any partial recursive function can be computed in three steps:

(1) Apply a primitive recursive function,
(2) Do a minimalisation (once only!) on the result,
(3) Apply another primitive recursive function to the result.
In this section we will show how to assign natural numbers to complicated structures in such
a way that operations we perform on those structures correspond to recursive operations on
these numbers. This process in general is called Gödel numbering.
Our aim will be to assign a Gödel numbering to the steps in the running of an algorithm in
such a way that the operation of going from one step to the next is mirrored by a recursive
function N → N. (In fact it will be primitive recursive, which gives another interesting
result.)
The definition of a step is quite complicated. We will work up to it by first Gödel numbering
the alphabet A, the set D of directions and the set N of nodes. Next we will Gödel number
the possible cells . . . and so on, working our way up through levels of complexity.
In this process we will be dealing with the set N[∞] of all finite sequences of natural numbers; this and the function P[∞] : N[∞] → N defined in 9.B.17 will be used many times. So often, in fact, that I will use the less complicated notation γ for this function. (The symbol γ will be used for this function and for nothing else throughout this appendix.)
The properties of this function (and some related ones) will be re-stated here in this notation.
It will become apparent that this is our first example of a Gödel numbering.
In Proposition 9.B.21 a number of associated functions are defined and their relevant properties proved. Since we will be using them, here they are again:
len : N → N mirrors the operation of finding the length of a sequence. For any sequence x of length n we have len(γ(x)) = n.

ent : N^2 → N mirrors the operation of finding the ith entry of a sequence. For any sequence x of length n we have ent(i, γ(x)) = xi . (We don't care what its values are for i > n: we won't be using them.)

del : N → N mirrors the operation of deleting the first entry of a sequence. For any sequence x of length n we have del(γ(x)) = γ(y), where y is the sequence x with its first entry removed. (We don't care what its value is for the empty sequence: we won't be using it.)
adj : N^2 → N mirrors the operation of adjoining a new first entry to a sequence: adj(z, γ(x)) = γ(y), where y is the sequence x with a new first entry z tacked on.

rep : N^3 → N mirrors the operation of replacing an entry with a new value: rep(r, z, γ(x)) = γ(y), where y is the sequence x with its rth entry replaced by z. (We don't care what its values are for r > n: we won't be using them.)
[Diagrams: two further functions of the same kind, concat and zeros.]
x --encode--> Step0 --> Step1 --> ... --> Stepn --decode--> f(x)
                |enc                        |dec
                g0  -->  g1  -->  ...  -->  gn
The gi are all Gödel numbers. We will show that the red arrows (the left and right end
diagonal ones and all the ones in the bottom row) are primitive recursive functions. The
process of stepping continues until a step to the Fin operation occurs (which can also be
tested by a primitive recursive function). Therefore the whole process of getting from x to
f (x) is partial recursive. (Partial because the algorithm might get into an infinite loop and
never reach the Fin node.)
Finally, for any configuration, we will assume its cells are indexed, so the set of all cells in the configuration is

C = {c1 , c2 , . . . , cn } ,

allocating 0 to □, so that □ = c0 . The state of a cell ck is then given by the tuple ⟨s, c1 , c2 , . . . , cδ ⟩, where s is the index (as above) of the symbol it contains, possibly 0 for empty, and each ci is the index of the cell that it is connected to in direction i, again 0 if it is not connected. Having done this, we can represent this state by a single number gnCell, given by

gnCell(ck ) = γ⟨s, c1 , c2 , . . . , cδ ⟩ .

We will call this number the cell Gödel number for the cell.
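For instance, in the squared-paper workspace, where δ = 4, a cell containing the symbol with index 2 and connected only to the cell with index 5, in direction 1, would have cell Gödel number γ⟨2, 5, 0, 0, 0⟩.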
The things we will want to do with a cell are reflected in its Gödel number; let us give
these functions helpful names. (We suppose we are working with a cell which has cell Gödel
number g.)
Consequently the state of the entire configuration can be represented by the single number

gnConfig(C) = γ⟨gc1 , gc2 , . . . , gcn ⟩ , where gck = gnCell(ck ) ,

the "configuration Gödel number" of the configuration.
The things we will want to do with the configuration are also mirrored by its Gödel number;
and we will give these functions helpful names also. Let us now suppose we are working
with a configuration with Gödel number g.
• getCellK (g, i) = ent(i, g) ; this returns the cell Gödel number of the cell with index i.
• setCellK (g, i, j) = rep(i, j, g) ; this replaces the cell with index i by a cell with Gödel number j and returns the Gödel number of the resulting configuration.
We are heading for a Gödel number of a step, encapsulating the current states of the
configuration and the processor. So now we must turn our attention to the processor.
It will make our notation much simpler if we assume these functions are in fact defined
for all nodes by giving them some arbitrary value on the nodes for which they are not yet
defined. We won’t be using those values, so this won’t cause any problems. A natural value
to give them would be , which has subscript 0 in each set.
All these functions are between finite sets which we have already indexed, so we can represent
them by Gödel numbers using a trick.
So the function

var : {n1 , n2 , . . . , nν } → {v1 , v2 , . . . , vϕ }

can be represented by the sequence var = ⟨var1 , var2 , . . . , varν ⟩, where

vari = k ⇔ var(ni ) = vk .

The ref function is represented by a sequence ref in the same way, and this has Gödel number gnRef = γ(ref); the current state of the processor is then defined by this and the index p of the current node:

gnProc = γ⟨p, gnRef⟩ .
The things we will want to do with the state of the process are as follows. Suppose we are
working with a process state with Gödel number g = gnProc.
• setNodeP (g, p) = rep(1, p, g) ; this replaces the node index by p and returns the Gödel number of the resulting process state.
The step can be represented by the pair ⟨gnConfig C , gnProc⟩, and so we have a Gödel number

gnStep(S) = γ⟨gnConfig C , gnProc⟩ .
Now the various things we want to do at this level, both to the workspace and to the
program state, are easily defined: suppose we are working with a step-level Gödel number
g = gnStep(S).
We track the computation step-by-step until we recognise that it is time to stop. Tracking the computation is done by seeing what a single step (execution of a single statement) does to the top-level Gödel number, and simply repeating this until the "finished" condition is recognised. All these computations are primitive recursive except for the "keep going until finished" part, which is partial recursive (because of the possibility of never getting to the "finished" condition).
So our next job is to look at how a single statement step is reflected in the top-level Gödel number. To do this we look first at the way the execution of the different kinds of node changes the step.
The execution of a single node results in going from one step to the next, and so defines a function from the set of all possible steps to itself. It is clear that this is a well-defined function. The point
of the present subsection is to look at it closely enough to show that it is in fact mirrored by
a primitive recursive operation on the Gödel numbers of the steps. This is a slightly tedious
process of checking the behaviour of each of the nine kinds of node. The proofs are pretty
repetitive, but here they are.
For the following steps two things happen: a change is made to the configuration (the result
of which I shall call the intermediate step) and then the current node is updated to the next
one, resulting in the new step.
• Kind = 4, x :≈ a
The index of the node is p = getNodeS (g); the index of the variable x is v = var(p); the index of the cell it refers to is i = getCellS (g, v); the index of the symbol a is s = sym(p).
Therefore the Gödel number of the intermediate step is gint = setContentsS (g, i, s), and of the new step is setNodeS (gint , next(p)).
• Kind = 5, x →d y
The index of the node is p = getNodeS (g); the index of the variable x is u = var(p); the index of the cell x refers to is i = getCellS (g, u); the index of the variable y is v = var′(p); the index of the cell y refers to is j = getCellS (g, v).
Therefore the Gödel number of the intermediate step is gint = joinS (g, i, d, j), and of the new step is setNodeS (gint , next(p)).
• Kind = 6, x new
The index of the node is p = getNodeS (g).
Therefore the Gödel number of the intermediate step is gint = newCellK (g), and of the new step is setNodeS (gint , next(p)).
• Kind = 7, (x →d)
x --encode--> Step0 --> Step1 --> ... --> Stepn --decode--> f(x)
                |enc                        |dec
                g0  -->  g1  -->  ...  -->  gn
Here the progress from one step to the next, Stepi → Stepi+1 , is mirrored by a function NextStep : gi ↦ gi+1 which is now easy to describe.
All that is required is to extract the kind of the current node from the Gödel number g and
use this to decide which of the actions just described to perform.
For the Gödel number g of any step S, the Gödel number of the next step is given by

NextStep(g) = Nextk (g) , where k = kind(getNodeS (g))   (k = 0, 1, 2, . . . , 8).
We have now dealt with most of the game plan diagram above. We still need to look at the
question of getting numbers into and out of the configuration and the corresponding Gödel
numbers. This is looking at the left and right hand triangles in the diagram.
Since we have proved that it makes no difference which code we use (so long as it is algo-
rithmic) we will use the unary code as being by far the simplest to deal with.
Proof. This will be done in several easy steps. As an illustrative example, consider the
sequence ⟨3, 0, 2⟩. This is represented by the configuration
[Diagram: the sequence ⟨3, 0, 2⟩ in unary code: a top row of three cells (indexed 1, 5, 6) joined by R pointers; below the first hangs the string 1 1 1 (cells 2, 3, 4), below the second nothing, and below the third the string 1 1 (cells 7, 8). The small numbers beside the cells show the way we will index them.]
Prelim

First we get the index of the last cell in the bottom row (8 in the example above) as a function of the input sequence x = ⟨x1 , x2 , . . . , xn ⟩. Call this function LB. Then

LB(x1 , x2 , . . . , xn ) = x1 + x2 + · · · + xn + n .

We also want to get the index of the last cell in the top row (6 in the example). Call this function LT. Then

LT(x1 , x2 , . . . , xn ) = x1 + x2 + · · · + xn−1 + n .
Step 0

In the case of length zero, ⟨⟩ ↦ gc(⟨⟩) is just a constant and so is primitive recursive.

Now we assume that, for length n, the function ⟨x1 , x2 , . . . , xn ⟩ ↦ gc(x1 , x2 , . . . , xn ) is primitive recursive and prove that ⟨x1 , x2 , . . . , xn , y⟩ ↦ gc(x1 , x2 , . . . , xn , y) is also primitive recursive, by induction over y.
Step 1

We show first that the function ⟨x1 , x2 , . . . , xn ⟩ ↦ gc(x1 , x2 , . . . , xn , 0) is primitive recursive.

[Diagram: the configuration for ⟨3, 0, 2, 0⟩: the previous configuration with one new top-row cell, indexed 9, joined to the right of cell 6.]
Step 2
Is it obvious that this makes gc primitive recursive? It means that, if we rewrite the right hand side of this equation as a function of x, y and another variable z, thus:

h(x1 , x2 , . . . , xn , y, z) = join(newCellC (z), LT(z), 1, LB(z) + 1)   if y = 0 ,
                               join(newCellC (z), LB(z), 2, LB(z) + 1)   if y ≠ 0 ,

then we have
[Diagram: the right-hand triangle of the game plan: Stepn --decode--> f(x), with dec carrying gn to f(x).]
The function UnaryDecode : N → N, which takes the step Gödel number of the workspace representation of a single number in unary code to that number, is primitive recursive.
Proof. Suppose (to simplify the notation) that S is the last step in the sequence, with step Gödel number g, and that this represents a single number n:

[Diagram: S --decode--> n, with dec carrying g to n.]
Then the configuration is a single row of n cells, each containing the symbol 1, indexed 1, 2, 3, 4, . . . , n.
Proof. First note that, for any step S with step Gödel number g = gnStep(S), it is possible
to test g to see whether the process is in a Fin state or not, as follows:—
and then the kind of the current node is given by kind, so we have
Now let us redraw the “game plan” diagram to make the data x and the step numbers
explicit:—
x --encode--> Step(0, x) --> Step(1, x) --> ... --> Step(n, x) --decode--> f(x)
                  |enc                                  |dec
                g(0, x)  -->  g(1, x)  -->  ...  -->  g(n, x)
We know that

g(0, x) = enc(x)   and   g(i + 1, x) = NextStep(g(i, x)) ,

and that both enc and NextStep are primitive recursive; therefore so is g.

The step number at which the process first reaches a Fin node is given by

min_i { kind(ent(1, ent(2, g(i, x)))) = 0 } .
Proof. This follows immediately from Propositions D.8 and E.14.
E.16 Remark: some synonyms
As a result of the theorem above we can use the words partial recursive, algorithmic and computable as synonyms: completely interchangeable. Well, at least as far as functions N^m → N^n go; algorithms in general can be applied to a wider range of data structures.
In the same way we can use the word recursive for any algorithm that is guaranteed to
terminate.
The following result is an immediate corollary of this proof, but it is important enough to
be called a theorem.
E.17 Theorem
For any partial recursive function f : N^n ⇢ N, there are primitive recursive functions p, q : N^(n+1) → N such that

f(x) = p(min_i { q(i, x) = 0 }, x)   for all x ∈ N^n .
Proof. With the notation of the last few lines of Proposition E.14 above, set
p(u, x) = dec(gnStep(u, x))
and q(i, x) = kind(ent(1, ent(2, g(i, x)))) .
E.18 Corollary
In the definition of partial recursive functions, 9.A.3, the operation of minimalisation can be replaced by minimalisation applied only to regular functions.

This was discussed in the Remarks, 9.A.15. The theorem in fact says something much stronger.
Proof. Given a recursive function (not partial) f : N^n → N, write it in terms of primitive recursive functions p and q, as in the theorem above. Since f is total, so must x ↦ min_i { q(i, x) = 0 } be. But if this is total, q must be regular. Since that is the only application of minimalisation (p and q are primitive recursive), the corollary is proved.
F Turing machines
F.1 Discussion
The definition of a Turing machine and the results about them are not essential to the studies
we are doing. Nevertheless, the idea is intimately bound up with that of an algorithm, and
Turing machines and Turing computability are so frequently mentioned in this context that
it is pretty well essential to discuss them briefly here.
Luckily, with the work of this appendix behind us we can cover this subject adequately
enough in fairly short order.
In discussing Turing machines, the workspace is usually called the “tape”, for obvious reasons.
In the usual way of defining a Turing machine the tape extends infinitely in both directions, but only a finite number of the cells may be non-blank. (This is equivalent to the formulation of a general algorithm in which the workspace is finite, if a blank cell is considered to be "off the edge" or, if you like, empty space.) In these formulations, the blank cell is often denoted
0.
(2) The processor is also particularly simple. For a start it has only one variable, which
can be thought of as an eye, poised above the tape, and only able to look at one cell at a
time.
The processor can have many nodes, but they are only of a limited number of kinds. These
are
(a) x ≈ a?. (Do I see the symbol a?)
(d) x :≈ a. (Write the symbol a.)
(g) x →L and x →R. (Move one cell left or right.)
(i) FIN. Stop work.
All the other kinds are irrelevant because of the simplicity of the workspace (tape).
In a sense the definition of a general algorithm at the beginning of this appendix is just a
generalisation of that of a Turing machine.
In many descriptions of Turing machines, the four kinds of nodes are collapsed into one
rather complicated one which does all of the following:
(1) Look at the symbol in the current cell.
(2) Write a symbol or not, depending upon the one just seen.
(3) Move L or R or stand still, also depending upon the symbol seen.
There is also some provision for knowing when the process has finished. This is equivalent
to the previous formulation in terms of four kinds of node.
This is not to denigrate Turing machines. They are historically very important, and have
had a great influence on the kind of mathematics and logic we have been studying here.
Moreover, though the idea may seem a bit obvious looking back from the twenty-first century,
the original formulation by Alan Turing must have involved profound insights.
It is good to know what it means to say that a function has been shown to be Turing
computable, even though one might suspect that this term is often airily thrown around by
people who have not actually defined and proved the actual Turing machine program they
are suggesting. It is comforting to know that the notion is equivalent to “algorithmic”, in the
sense of this appendix, and so one can verify that a function is Turing machine computable
by the rather easier methods described here without having to actually program the machine.
I have stated here that Turing machine computable is equivalent to algorithmically com-
putable which we already know is equivalent to partial recursive. I do not propose to prove
this here, since we will not be using Turing machines.
One way round is easy: if a function is computable by a Turing machine then, since such a
machine is a simple type of algorithm, it is computable by a general algorithm and we have
proved that then it is partial recursive.
The other way around involves showing that the basic functions of the definition of a partial recursive function (Definition 9.A.3) are all Turing computable (those basic functions include addition and, even worse, multiplication) and that Turing computability is maintained under the operations of substitution and minimalisation. This is an unpleasant and lengthy undertaking.
D. SOME ALGORITHMS
In this section I explain in more detail the algorithms required for some of the constructions in Chapter 10.
A Preliminaries
A.1 Symbols and their arities
Here are some basic functions, defined more for convenience than anything else. As presented
here, I am assuming that the functions and relations are as given in Section 4.C.1 and no
others, with the following Gödel numbers assigned; for a system S with more functions and
relations, make the obvious changes.
¬ 0 0̄ 6
⇒ 1 s 7 (the successor function)
∀ 2 + 8
( 3 × 9
) 4 = 10
, 5 vi 11 + i (ith variable symbol)
First some functions which test for various kinds of symbols from their Gödel numbers.
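As an illustration, the test isFunSym used later might, with the numbering above (0̄, s, + and × being the function symbols), look like this sketch, returning 1 for true and 0 for false:

isFunSym(n) {
  if (6 ≤ n ≤ 9) return 1;
  return 0;
}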
arity(n) {
  if (n=6) return 0;
  if (n=7) return 1;
  if (8 ≤ n ≤ 10) return 2;
  return 0;
}

(Note that if n is not the Gödel number of any function or relation symbol this function returns 0; this is never used, but we do this so that the function is recursive, not partial.)
In what follows we will also use the functions defined at the beginning of Chapter 10.
Some of the functions we are about to look at can best be thought of as “scanning” a string,
checking out symbols and substrings as they go. Typically they will work with two numbers,
x, the Gödel number of the string and p, the index of the symbol currently being looked at
in the string. Some functions will check out a substring (usually a subexpression) in some
way; typically such a function will return the index of the next symbol after the substring,
making it easy for the function that called it to proceed.
Firstly, to get the substring of a string; given a string with Gödel number x, the function
Substring(p,q,x) finds the substring which extends from its pth to its qth entry (inclusive)
and returns the Gödel number of that substring. It assumes that 1 ≤ p ≤ q ≤ len(x)
(we will only be using it when these inequalities hold, so we are not interested in what the
function does in other cases).
1 Substring(p,q,x; z,r) {
2 z := ent(x,q); The substring will be built up in z,
3 r := q-1; one symbol at a time, starting from
4 while (r ≥ p) { the right-hand end of the substring in x.
5 add(ent(x,r),z);
6 r := r-1;
7 }
8 return z;
9 }
We will need to concatenate two strings. Given strings with Gödel numbers x and y, the
function Concat(x,y) returns the Gödel number of the concatenated string x followed by
y.
1 Concat(x,y; z,p) {
2 z := y; The new string will be built up in z,
3 p := len(x); starting as y and prepending the symbols
4 while (p ≥ 1) { of x, one at a time, from its right-hand end.
5 add(ent(x,p),z);
6 p := p-1;
7 }
8 return z;
9 }
We will also need to replace a single entry by a substring. Given a string with Gödel number
x, an index p and another string s, the function Replace(x,p,s) replaces the pth entry of
x with the string s and returns the Gödel number of the resulting string. We do this by
using the Substring function to pull x apart and then the Concat function to reassemble
it differently.
1 Replace(x,p,s) {
2 if (len(x)=1) return s;
3 if (p=1) return Concat(s,Substring(2,len(x),x));
4 if (p=len(x)) return Concat(Substring(1,len(x)-1,x),s);
5 else return Concat(Concat(Substring(1,p-1,x),s),Substring(p+1,len(x),x));
6 }
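To see the general case (line 5) in action: if x codes a string of length 5 and p = 3, the result is Concat(Concat(Substring(1,2,x),s),Substring(4,5,x)), that is, entries 1 and 2 of x, then the whole of s, then entries 4 and 5 of x; in other words x with its 3rd entry replaced by s, as required.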
We also want a function which simply skips a term. Given the Gödel number x of an
expression and the index p of an entry in it, SkipTerm(x,p) will return the index of the
first entry after that term. We will only use this function when it is already known that
(x,p) is indeed the start of a valid term.
For example, consider the distributive law (∀x)(∀y)(∀z)( (x + y)z = xz + yz ) written out in its fully formal form:
(∀x(∀y(∀z=(×(+(x,y),z),+(×(x,z),×(y,z))))))
and let us suppose that its Gödel number is x. In this there is a term ×(+(x, y), z) whose
first character (the ×) is the 12th entry in the expression, so our functions should give
isTerm(x,12) = 1
SkipTerm(x,12) = 23
since the first entry after this term is the 23rd (a comma). Similarly, its 8th entry (a ∀
symbol) is not the first character of a term, so
isTerm(x,8) = 0
SkipTerm(x,8) = Don’t care.
1 SkipTerm(x,p; ch,ar,count) {
2 ch := ent(x,p);
3 if (isVarSym(ch) or ch=6) return p+1; A variable or 0̄ is a whole term by itself.
4 if (isFunSym(ch)) {
5 ar := arity(ch);
6 p := p+2; Skip the function symbol and the (.
7 count := 0;
8 while (count < ar) {
9 p := SkipTerm(x,p);
10 count := count + 1;
11 p := p+1; Skip the comma (or the final closing parenthesis).
12 }
13 return p;
14 }
15 return 0; Not the start of a term at all.
16 }
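As a check, trace the recursion on the example above, where the term ×(+(x, y), z) starts at entry 12: the outer call reads the × at entry 12 and jumps to entry 14; a recursive call consumes +(x, y) (entries 14 to 19) and returns 20; the comma at entry 20 is skipped; a second recursive call consumes the variable z at entry 21 and returns 22; the closing parenthesis at entry 22 is skipped and the outer call returns 23, exactly the value SkipTerm(x,12) = 23 claimed above.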
and the companion test, which makes the same scan but now checks validity as it goes:
1 isTerm(x,p; ch,ar,count) {
2 ch := ent(x,p); Get the first character.
3 if (isVarSym(ch) or ch=6) return 1; A variable or 0̄ is a term by itself.
4 if (not isFunSym(ch)) return 0; Otherwise it must be a function symbol
5 if (not isOpenSym(ent(x,p+1))) return 0; followed by an opening parenthesis
6 ar := arity(ch);
7 p := p+2;
8 count := 0;
9 while (count < ar) { and the right number of arguments,
10 if (not isTerm(x,p)) return 0; each of which should be a term.
11 p := SkipTerm(x,p)+1;
12 count := count + 1;
13 }
14 return 1;
15 }
Shortly we will need a function which gets a term: given the Gödel number x of an expression
and the index p of the first entry of a term in it, it returns the Gödel number of that term
as a string.
1 GetTerm(x,p; q) {
2 q := SkipTerm(x,p) - 1;
3 return Substring(p,q,x);
4 }
Turning now to subexpressions, to skip one we have:
1 SkipSubexpression(x,p; ch,ar,count) {
2 ch := ent(x,p); Get the first character.
3 if (isRelSym(ch)) { An atomic expression r(t1,...,tn):
4 ar := arity(ch);
5 p := p+2; skip r and the opening parenthesis,
6 count := 0;
7 while (count < ar) { then each argument together with
8 p := SkipTerm(x,p)+1; the comma or parenthesis after it.
9 count := count + 1;
10 }
11 return p;
12 }
13 p := p+1; ch := ent(x,p); Look at the character after the (.
14 if (isNotSym(ch)) { A negation (¬P):
15 p := p+1;
16 return SkipSubexpression(x,p)+1; skip P and the closing parenthesis.
17 }
18 if (isForAllSym(ch)) { A quantifier (∀vP):
19 p := p+2; skip the ∀ and the variable,
20 p := SkipSubexpression(x,p); then P
21 return p+1; and the closing parenthesis.
22 }
23 p := SkipSubexpression(x,p); An implication (P ⇒ Q): skip P,
24 p := p+1; the ⇒,
25 p := SkipSubexpression(x,p); then Q
26 return p+1; and the closing parenthesis.
27 }
and the companion test, which checks validity as it goes:
1 isSubexpression(x,p; ch,ar,count) {
2 ch := ent(x,p); Get the first character.
3 if (isRelSym(ch)) { An atomic expression:
4 if (not isOpenSym(ent(x,p+1))) return 0; the relation symbol should be followed by (
5 ar := arity(ch);
6 p := p+2;
7 count := 0;
8 while (count < ar) { and the right number of arguments,
9 if (not isTerm(x,p)) return 0; each of which should be a term.
10 p := SkipTerm(x,p)+1;
11 count := count + 1;
12 }
13 return 1;
14 }
15 if (not isOpenSym(ch)) return 0; Anything else should start with (.
16 p := p+1; ch := ent(x,p);
17 if (isNotSym(ch)) { A negation:
18 p := p+1;
19 if (not isSubexpression(x,p)) return 0; ¬ should be followed by a subexpression
20 p := SkipSubexpression(x,p); ch := ent(x,p);
21 if (not isCloseSym(ch)) return 0; and a closing parenthesis.
22 return 1;
23 }
24 if (isForAllSym(ch)) { A quantifier:
25 if (not isVarSym(ent(x,p+1))) return 0; ∀ should be followed by a variable,
26 p := p+2;
27 if (not isSubexpression(x,p)) return 0; a subexpression
28 p := SkipSubexpression(x,p); ch := ent(x,p);
29 if (not isCloseSym(ch)) return 0; and a closing parenthesis.
30 return 1;
31 }
32 if (not isSubexpression(x,p)) return 0; An implication: it should start with a subexpression.
33 p := SkipSubexpression(x,p);
34 ch := ent(x,p); Get next character.
35 if (not isImpSym(ch)) return 0; It should be an implication symbol.
36 p := p + 1;
37 if (not isSubexpression(x,p)) return 0; It should start a subexpression.
38 p := SkipSubexpression(x,p);
39 ch := ent(x,p); Get next character.
40 if (not isCloseSym(ch)) return 0; It should be a closing parenthesis.
41 return 1; If we got to here, all is OK.
42 }
Now it is easy to make a function isExpression(x), which tests whether an entire string is
a valid expression or not; more correctly, it tests whether its argument is the string Gödel
number of an expression or not.
1 isExpression(x; p) {
2 if (not isSubexpression(x,1)) return 0;
3 p := SkipSubexpression(x,1); The expression should take up
4 if (p = len(x)+1) return 1; the whole string.
5 else return 0;
6 }
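As a quick check against the earlier example: the fully formal distributive law is a string of 43 symbols; isSubexpression(x,1) succeeds on it and SkipSubexpression(x,1) returns 44 = len(x)+1, so isExpression(x) returns 1, as it should.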
We will need a function which gets a subexpression: given the Gödel number x of an
expression and the index p of the first entry of a subexpression, it returns the Gödel number
of that subexpression.
1 GetSubexpression(x,p; q) {
2 q := SkipSubexpression(x,p) - 1;
3 return Substring(p,q,x);
4 }
Next, a function varInTerm(x,p,v) which tests whether the variable with Gödel number v occurs in the term starting at the pth entry of x. We will only use this function when we already know that x is the Gödel number of a valid
expression, that p is the index of the first character of a valid term in that expression and
that v is the Gödel number of a variable symbol; we do not care what nonsense the function
may get up to in any other case.
1 varInTerm(x,p,v; ch,ar,count) {
2 ch := ent(x,p); Get first character.
3 if (ch=v) return 1; If it's the variable we seek, we are done.
4 ar := arity(ch); Otherwise get its arity (0 unless a function symbol).
5 p := p+2; count := 0 ; Skip the opening parenthesis.
6 while (count < ar) { Count through the arguments.
7 if (varInTerm(x,p,v)=1) return 1; Does v occur in this argument?
8 p := SkipTerm(x,p);
9 count:=count+1; p:=p+1; Skip the comma; on to the next argument.
10 }
11 return 0; If we get to here, v does not occur in the term.
12 }
Now we can test whether a variable occurs free in an expression. Referring to the definition of binding in 3.A.12 we see that the variable v occurs free in an expression if one of the following cases obtains:
• The expression is atomic, r(t1 , t2 , . . . , tn ) say, and v occurs in one of the terms t1 , t2 , . . . , tn .
• The expression is of the form (¬P ) and v occurs free in P .
• The expression is of the form (P ⇒ Q) and v occurs free in at least one of P or Q.
• The expression is of the form (∀xP ), v is not x and v occurs free in P .
1 FreeInSubexpression(x,p,v; ch,ar,count) {
2 ch := ent(x,p);
3 if (isRelSym(ch)) { An atomic expression:
4 ar := arity(ch); p := p+2; count := 0;
5 while (count < ar) { does v occur in one of the arguments?
6 if (varInTerm(x,p,v)=1) return 1;
7 p := SkipTerm(x,p);
8 count := count+1; p := p+1;
9 }
10 return 0;
11 }
12 p := p+1; ch := ent(x,p);
13 if (isNotSym(ch)) { A negation: is v free in P?
14 p := p+1;
15 return FreeInSubexpression(x,p,v);
16 }
17 if (isForAllSym(ch)) { A quantifier:
18 p := p+1; ch := ent(x,p);
19 if (ch=v) return 0; if v is the bound variable it is not free here;
20 p := p+1;
21 return FreeInSubexpression(x,p,v); otherwise, is it free in P?
22 }
23 if (FreeInSubexpression(x,p,v)=1) return 1; An implication: is v free in P
24 p := SkipSubexpression(x,p)+1; (skip P and the ⇒)
25 return FreeInSubexpression(x,p,v); or free in Q?
26 }
It is now easy to write a function which checks whether a variable occurs free in a whole expression or not. If x is the string Gödel number of a valid expression and v is the Gödel number of a variable symbol, freeInExpression(x,v) returns 1 if that variable occurs free in the expression and 0 otherwise.
1 freeInExpression(x,v) {
2 return FreeInSubexpression(x,1,v);
3 }
And now we can write a function which checks whether a string x is a valid sentence or not. This is fairly easy. First check that it is an expression, then check that it has no free variables by checking that variable symbols with Gödel numbers up to x do not occur free in it. If nothing has gone wrong up to there, it is a sentence, so return 1.
(Referring back to the original definition of the function P, we see that, for all x and y, P(x,y) ≥ x and P(x,y) ≥ y and that P is strictly increasing in both variables. It follows that the Gödel number of a string is greater than the Gödel number of any of its substrings and greater than the Gödel number of any of its characters. So, in order to test whether the expression of Gödel number x contains any free variables or not, it is enough to test it for free occurrences of all variables of Gödel number from 11 up to x. Why 11? Because that is the Gödel number of the first variable symbol. This is obviously not very efficient, but we are not interested in efficiency here, and it is simple.)
1 isSentence(x; v) {
2 if (not isExpression(x)) return 0;
3 v := 11;
4 while (v ≤ x) {
5 if (isVarSym(v) and freeInExpression(x,v)) return 0;
6 v := v+1;
7 }
8 return 1;
9 }
B.4 Substitution
Given the Gödel numbers x of a term, v of a variable symbol and t of another term, the function SubstInTerm(x,v,t) returns the Gödel number of the term made by the substitution x[v/t].
1 SubstInTerm(x,v,t; ch,p,ar,count,y,s) {
2 ch := ent(x,1);
3 if (ch=v) return t; The whole term is just v: replace it.
4 if (len(x) = 1) return x; Some other single-symbol term: leave it.
5 ar := arity(ch);
6 y := Substring(1,2,x); Start y off as the function symbol and (.
7 p := 3; count := 1;
8 while (count ≤ ar) { For each argument:
9 s := GetTerm(x,p);
10 y := Concat(y, SubstInTerm(s,v,t)); append the substituted argument,
11 p := SkipTerm(x,p);
12 y := Concat(y, Substring(p,p,x)); then the comma or closing parenthesis.
13 p := p+1; count := count + 1;
14 }
15 return y;
16 }
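For example, if x codes the term +(v₁,v₂) and we apply [v₁/0̄], the scan starts y off as +( , appends the substituted first argument 0̄, then the comma, then the unchanged second argument v₂, then the closing parenthesis; the result codes +(0̄,v₂), as it should.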
Now we can write a function which will make a similar substitution in an expression. The
code is lengthy because it has to deal with the various ways an expression can be constructed,
but the logic is very similar to that of the last function.
1 SubstInExpression(x,v,t; ch,p,ar,count,y,s) {
2 ch := ent(x,1);
3 if (isRelSym(ch)) { An atomic expression:
4 ar := arity(ch);
5 y := Substring(1,2,x);
6 p := 3; count := 1;
7 while (count ≤ ar) { substitute in each argument,
8 s := GetTerm(x,p);
9 y := Concat(y, SubstInTerm(s,v,t));
10 p := SkipTerm(x,p);
11 y := Concat(y, Substring(p,p,x)); keeping the separators.
12 p := p+1; count := count + 1;
13 }
14 return y;
15 }
16 ch := ent(x,2);
17 if (isNotSym(ch)) { A negation (¬P):
18 y := Substring(1,2,x);
19 p := 3;
20 s := GetSubexpression(x,p);
21 y := Concat(y, SubstInExpression(s,v,t)); substitute in P,
22 p := SkipSubexpression(x,p);
23 y := Concat(y, Substring(p,p,x)); keeping the closing parenthesis.
24 return y;
25 }
26 if (isForAllSym(ch)) { A quantifier (∀uP):
27 ch := ent(x,3);
28 if (ch = v) return x; If v is the bound variable there is nothing to do;
29 y := Substring(1,3,x);
30 p := 4;
31 s := GetSubexpression(x,p);
32 y := Concat(y, SubstInExpression(s,v,t)); otherwise substitute in P.
33 p := SkipSubexpression(x,p);
34 y := Concat(y, Substring(p,p,x));
35 return y;
36 }
37 y := Substring(1,1,x); An implication (P ⇒ Q):
38 p := 2;
39 s := GetSubexpression(x,p);
40 y := Concat(y, SubstInExpression(s,v,t)); substitute in P,
41 p := SkipSubexpression(x,p);
42 y := Concat(y, Substring(p,p,x)); keep the ⇒,
43 p := p+1;
44 s := GetSubexpression(x,p);
45 y := Concat(y, SubstInExpression(s,v,t)); substitute in Q,
46 p := SkipSubexpression(x,p);
47 y := Concat(y, Substring(p,p,x)); and keep the closing parenthesis.
48 return y;
49 }
B.5 Acceptability
Testing for acceptability is not difficult. We use the inductive definition: the substitution [v/t] is acceptable in an atomic expression always; it is acceptable in (¬P) if it is acceptable in P; it is acceptable in (P ⇒ Q) if it is acceptable in both P and Q; and it is acceptable in (∀uP) if it is acceptable in P and, in the case that v occurs free in (∀uP), the variable u does not occur in t.
1 Acceptable(x,v,t; ch,p,u,y) {
2 ch := ent(x,1);
3 if (isRelSym(ch)) return 1; Always acceptable in an atomic expression.
4 ch := ent(x,2);
5 if (isNotSym(ch)) { In (¬P): acceptable iff acceptable in P.
6 y := GetSubexpression(x,3);
7 return Acceptable(y,v,t);
8 }
9 if (isForAllSym(ch)) { In (∀uP):
10 u := ent(x,3);
11 y := GetSubexpression(x,4);
12 if (not Acceptable(y,v,t)) return 0; acceptable in P and
13 if (u ≠ v and FreeInSubexpression(x,4,v)
14 and varInTerm(t,1,u)) return 0; no free v may be captured by the ∀u.
15 else return 1;
16 }
17 y := GetSubexpression(x,2); In (P ⇒ Q):
18 if (not Acceptable(y,v,t)) return 0; acceptable in P
19 p := SkipSubexpression(x,2)+1; (skip P and the ⇒)
20 y := GetSubexpression(x,p);
21 if (not Acceptable(y,v,t)) return 0; and acceptable in Q.
22 else return 1;
23 }
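For example, take the expression (∀v₂=(v₁,v₂)) and the substitution [v₁/s(v₂)]. The quantifier branch finds u = v₂ and P = =(v₁,v₂); the substitution is acceptable in P, which is atomic, but u ≠ v₁, v₁ occurs free in P and v₂ occurs in the term s(v₂), so the function returns 0. This is as it should be: making the substitution would capture the variable, which is exactly what acceptability is there to prevent.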
Now we come to recognising axioms. The function isPL1(x) tests whether x is the string Gödel number of an instance (P ⇒ (Q ⇒ P)) of Axiom PL1. In this implementation we first check that x is the Gödel number of a valid expression. Then the main part of the listing is concerned with scanning the expression in the now-familiar way to check that it is of the form (P ⇒ (Q ⇒ R)), extracting the first subexpression P (as P1) and the final subexpression R (as P2); finally (Line 13) it checks that these two are equal.
1 isPL1(x; p,ch,P1,P2) {
2 if (not isExpression(x)) return 0;
3 p := 1; ch := ent(x,p);
4 if (not isOpenSym(ch)) return 0;
5 p := p+1;
6 if (not isSubexpression(x,p)) return 0;
7 P1 := GetSubexpression(x,p);
8 p := SkipSubexpression(x,p);
9 p := p+2; Skip the ⇒ and the (.
10 p := SkipSubexpression(x,p); Skip Q.
11 p := p+1; Skip the inner ⇒.
12 P2 := GetSubexpression(x,p);
13 if (P1 ≠ P2) return 0;
14 return 1;
15 }
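As an illustration, take P to be the atomic expression =(v₁,v₁) and Q to be =(v₂,v₂); then the string (=(v₁,v₁) ⇒ (=(v₂,v₂) ⇒ =(v₁,v₁))) is an instance of PL1, and the scan duly extracts P1 = =(v₁,v₁), skips over Q, extracts P2 = =(v₁,v₁) and returns 1.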
Axioms PL2 to PL5 are checked in much the same way; implementation of functions isPL2 to isPL5 can now be left as an exercise. Axiom PL6 however raises some other concerns: it is necessary to check whether a substitution [v/t] is acceptable in an expression P and, if so, check whether another expression is of the form P[v/t] or not.
In order to recognise a valid instance of PL6 we need to be able to decide, given Gödel numbers x and y of expressions and v of a variable symbol, whether the expression y is the result of an acceptable substitution of some term for v in the expression x or not. As observed above, the Gödel number of a string is greater than the Gödel number of any of its substrings, so it is enough to test the substitutions [v/t] for all t of Gödel number less than that of y. This is obviously very inefficient, but we are not interested in efficiency here, and it is simple.
1 anySubst(x,v,y; t) {
2 t := 1;
3 while (t ≤ y) {
4 if (isTerm(t,1) and SkipTerm(t,1) = len(t)+1 Is t a (whole) term
5 and Acceptable(x,v,t) acceptably substitutable for v,
6 and y = SubstInExpression(x,v,t)) return 1; giving exactly y?
7 t := t + 1;
8 }
9 return 0;
10 }
Now we can test a string to see if it is an instance of Axiom PL6. The function isPL6(x) returns 1 if string x is an instance of Axiom PL6, 0 otherwise. This axiom in its formal form is ((∀vP) ⇒ Q), where P is an expression, Q is the result of substituting some term t for the variable v in P, and this substitution is acceptable.
1 isPL6(x; p,ch,v,y,z) {
2 if (not isExpression(x)) return 0;
3 ch := ent(x,1);
4 if (not isOpenSym(ch)) return 0;
5 ch := ent(x,2);
6 if (not isOpenSym(ch)) return 0;
7 ch := ent(x,3);
8 if (not isForAllSym(ch)) return 0;
9 v := ent(x,4);
10 if (not isVarSym(v)) return 0;
11 y := GetSubexpression(x,5); y is the expression P.
12 p := SkipSubexpression(x,5);
13 ch := ent(x,p);
14 if (not isCloseSym(ch)) return 0;
15 p := p+1; ch := ent(x,p);
16 if (not isImpSym(ch)) return 0;
17 p := p+1;
18 z := GetSubexpression(x,p); z is the expression Q.
19 p := SkipSubexpression(x,p);
20 ch := ent(x,p);
21 if (not isCloseSym(ch)) return 0;
22 if (p ≠ len(x)) return 0;
23 if (not anySubst(y,v,z)) return 0; Is Q an acceptable substitution instance of P?
24 else return 1;
25 }
A string is an axiom of PL, then, if it is an instance of one of the six axiom schemas:
1 isAxiomOfPL(x) {
2 if (isPL1(x) or isPL2(x) or isPL3(x) or
3 isPL4(x) or isPL5(x) or isPL6(x)) return 1;
4 else return 0;
5 }
Next, the two rules of deduction.
Consider first recognising Modus Ponens. This is fairly simple. Given three expressions x,
y and z, we wish to recognise when they are of the form P , (P ⇒ Q) and Q respectively;
that amounts to simply recognising when y is of the form (x ⇒ z). Assuming that we have
already checked that x, y and z are valid expressions,
1 isModusPonens(x,y,z; p,ch) {
2 ch := ent(y,1);
3 if (not isOpenSym(ch)) return 0;
4 if (not isSubexpression(y,2)) return 0;
5 if (x ≠ GetSubexpression(y,2)) return 0;
6 p := SkipSubexpression(y,2);
7 ch := ent(y,p);
8 if (not isImpSym(ch)) return 0;
9 p := p+1;
10 if (not isSubexpression(y,p)) return 0;
11 if (z ≠ GetSubexpression(y,p)) return 0;
12 p := SkipSubexpression(y,p);
13 ch := ent(y,p);
14 if (not isCloseSym(ch)) return 0;
15 if (p ≠ len(y)) return 0;
16 else return 1;
17 }
For Universal Generalisation we must recognise when two strings x and y are of the form P
and (∀v P ), that is, when y is of the form (∀v x). This can now safely be left as an exercise.
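For use in the next function it will be convenient to have this as isUniversalGeneralisation(x,y), returning 1 when y is of the form (∀v x) for some variable v. One possible sketch, along the same lines as isModusPonens (so you can still treat the details as an exercise):
1 isUniversalGeneralisation(x,y; p,ch) {
2 ch := ent(y,1);
3 if (not isOpenSym(ch)) return 0;
4 ch := ent(y,2);
5 if (not isForAllSym(ch)) return 0;
6 ch := ent(y,3);
7 if (not isVarSym(ch)) return 0;
8 if (x ≠ GetSubexpression(y,4)) return 0; The body should be exactly x,
9 p := SkipSubexpression(y,4);
10 ch := ent(y,p);
11 if (not isCloseSym(ch)) return 0; followed by a closing parenthesis
12 if (p ≠ len(y)) return 0; at the very end of y.
13 else return 1;
14 }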
Before we define isProof we need to define the function isValidStep. Given a (sequence)
Gödel number S of a sequence of expressions and a number n less than or equal to its length,
isValidStep(S,n) will return 1 for true if the nth step in the sequence is a valid proof-step,
and 0 for false otherwise.
1 isValidStep(S,n; x,i,j) {
2 x := ent(S,n);
3 if (isAxiomOfPL(x)) return 1;
4 i := 1;
5 while (i < n) { Could step n follow from steps i and j
6 j := 1; by Modus Ponens,
7 while (j < n) {
8 if (isModusPonens(ent(S,i), ent(S,j), x)) return 1;
9 j := j+1;
10 }
11 if (isUniversalGeneralisation(ent(S,i),x)) return 1; or from step i by UG?
12 i := i+1;
13 }
14 return 0;
15 }
Now the function isProof is easy. Given a number S, isProof(S) will return 1 for true if S is the (sequence) Gödel number of a valid proof and 0 for false otherwise.
1 isProof(S; length,stepNum,x) {
2 length := len(S); The number of steps in the proof.
3 stepNum := 1;
4 while (stepNum ≤ length) { Loop through the steps of the proof:
5 x := ent(S,stepNum); the current step
6 if (not isExpression(x)) return 0; should be an expression
7 if (not isValidStep(S,stepNum)) return 0; and a valid step.
8 stepNum := stepNum + 1;
9 }
10 return 1;
11 }
Then ProofOf(S,x) returns 1 for true if S is the (sequence) Gödel number of a valid proof whose last step is the expression with Gödel number x, that is, a proof of x:
1 ProofOf(S,x) {
2 if (not isProof(S)) return 0;
3 if (x ≠ ent(S,len(S))) return 0;
4 else return 1;
5 }
And one more:
1 theoremNumber(S) {
2 if (isProof(S)) return ent(S,len(S));
3 }
Here, if S is the Gödel number of a valid proof, the function returns its last entry, which is
the theorem it proves; otherwise it does not return any value. It is not hard to make this a
recursive function if you prefer. One way is to choose the Gödel number n0 of your favourite
simple theorem, say P ⇒ P , and redefine the function thus:
1 theoremNumber(S) {
2 if (isProof(S)) return ent(S,len(S));
3 else return n0;
4 }
Programs and routines, as we have defined them so far, simply return a single number. But
suppose we had something like a “print” statement in our language. Then we could set our
machine to print out the Gödel numbers of all theorems thus:
1 AllTheorems(;S) {
2 S := 0;
3 while (1) {
4 if (isProof(S)) print(ent(S,len(S)));
5 S := S+1;
6 }
7 }
Now all you have to do is decode the number ent(S,len(S)) back into its actual string
of characters (not hard!) and you have a program which will type out every theorem of
mathematics. Just sit back and wait for your favourite theorem to turn up.
One more ingredient will be needed shortly: given a number n, NumberToString(n) returns the string Gödel number of the formal numeral n̄, that is, of the term s(s(· · · s(0̄) · · · )) with n applications of s. (Reading the code, the constants must be string Gödel numbers: 21 that of the one-character string 0̄, 2017 that of the two-character string s( and 10 that of the one-character string ).)
1 NumberToString(n; x) {
2 if (n=0) return 21;
3 else {
4 x := NumberToString(n-1);
5 return Concat(Concat(2017,x),10);
6 }
7 }
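So, for instance, NumberToString(2) returns the Gödel number of the string s(s(0̄)): the recursion bottoms out with the code of 0̄ and then wraps it twice, once in s( and ) for each application of the successor.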
Finally, the relation W needed for Gödel's theorem in Chapter 10. W(ξ,η) should hold just when ξ is the Gödel number of an expression with exactly one free variable, x say, and η is the (sequence) Gödel number of a proof of the negation of ξ[x/ξ̄], the result of substituting the numeral ξ̄ into that expression. (As before, the constants are string Gödel numbers: 56 that of the two-character string (¬ and 10 that of ).)
1 W(xi,eta; x,v,xix,nxix) {
2 if (not isExpression(xi)) return 0;
3 x := 0; Look for the free variables of xi:
4 v := 11;
5 while (v ≤ xi) {
6 if (isVarSym(v) and freeInExpression(xi,v)) {
7 if (x=0) x := v; remember the first one found;
8 else if (v ≠ x) return 0; there must not be a second.
9 }
10 v := v+1;
11 }
12 if (x=0) return 0; There must be exactly one.
13 xix := SubstInExpression(xi,x,NumberToString(xi)); Form ξ[x/ξ̄],
14 nxix := Concat(Concat(56,xix),10); then its negation (¬ξ[x/ξ̄]),
15 if (not ProofOf(eta,nxix)) return 0; of which η should be a proof.
16 return 1;
17 }
E. THE EXPONENTIAL FUNCTION IN PA
This appendix assumes you have read Sections 4.C on Peano Arithmetic, 9.B on primitive recursive functions and 10.B.3, The Representability Theorem. With all this background, we are now in a position to answer a question raised in 4.C.2: is it possible to define the exponential function in PA?
We cannot define a new function in PA by giving its graph, as in 6.B.21, because this technique involves set-theoretic operations not available in PA. Rather, we must define it by description, that is, define an expression, E(z, x, y) say, which is true if and only if z = yˣ.
The exponential function (on N) is defined inductively thus:
y⁰ = 1 (–1A)
y^(x⁺) = yˣ·y (–1B)
From this we can easily write down what is required of the expression E(z, x, y):
(∀y)E(1, 0, y)
(∀x)(∀y)(∀z)( E(z, x, y) ⇒ E(z·y, x⁺, y) )
(∀x)(∀y)(∃!z)E(z, x, y)
(Here, the third line is the general requirement for the expression to define a function.)
But note that this is a set of conditions the expression must satisfy if it is to define the exponential function; it in no way guarantees that such an expression exists.
The Representability Theorem gives us a short answer to this: from this definition, the
exponential function is obviously primitive recursive. Therefore it is recursive and the Rep-
resentability Theorem tells us that such an expression E(z, x, y) exists.
It would be nice to know exactly what this expression is. Fortunately, the proof of the
Representability Theorem is constructive: it tells us how to find the required expression.
Even more fortunately, the process is not too involved.
However, there is one hurdle to be got over. The definition above is a primitive recursive one, and the Representability Theorem works from the basic definition of the function as a recursive function, as given in Definition 9.A.3, that is, in terms of some basic functions which are all available to us in PA and the operations of substitution and minimalisation. Thus we need the services of Proposition 9.B.2 to convert the definition (–1A,–1B) above into this recursive form.
To whip this into shape to make the conversion, let us temporarily call the function e(x, y), that is e(x, y) = yˣ. Then rewrite (–1A) and (–1B) in the form
e(0, y) = g(y) where g(y) = 1 ,
e(x⁺, y) = h(x, e(x, y), y) where h(x, u, y) = u·y .
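As a quick check that the rewriting says the same thing: e(1, y) = h(0, e(0, y), y) = e(0, y)·y = 1·y, which is y¹, and each further application of h multiplies in one more factor of y.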
Proposition 9.B.2 then gives e(x, y) = T(w₀(x, y), x), where
w₀(x, y) = min_w { T(w, 0) = g(y) ∧ (∀ξ < x)( T(w, ξ+1) = T(w, ξ)·y ) }
and
T(w, i) = Rem( L(w) + 1 , 1 + R(w)·(i + 1) ) .
Here L and R are the functions defined in Proposition 9.A.10, together with the pairing function P.
Now we start at the bottom and work back up, defining expressions which represent these functions as we go, starting with P. P is given by
P(x, y) = ½(x + y)(x + y + 1) + x
which means that
2·P(x, y) = (x + y)(x + y + 1) + 2x ,
in other words,
u = P(x, y) ⇔ 2u = (x + y)(x + y + 1) + 2x .
So the function P is represented by the expression P̂(x, y, u), written informally as
u + u = ((x + y) × (x + y)⁺) + (x + x) .
In the same way one writes down expressions R̂em, L̂ and R̂ representing Rem, L and R, from these an expression T̂ representing T, and finally an expression Ê(z, x, y) representing the exponential function itself. Now we need to substitute for all seven occurrences of T̂ from (–5), making the appropriate changes in variables. Even using cut-and-paste this looks quite unpleasant. And we are still only half-way there: we have yet to substitute for R̂em, L̂ and R̂, which will lead us to substituting for P̂. I think it is time to give up on this project — but you get the idea.
F. INDEX
Logic symbols
1.B.14, 2.C.2
1.B.14
8.A.2
¬ 2.A.3, 3.A.3
⇒ 2.A.3, 3.A.3
∧ 2.F.6, 3.A.4
∨ 2.F.6, 3.A.4
⇔ 2.F.6, 3.A.4
F 2.F.30, 3.A.9
T 2.F.30, 3.A.9
∀ 3.A.3
∃ 3.A.4
∃! 4.A.6
! 4.A.6
P [x/t] see Substitution
TTT 2.G.4
≤ 4.C.8, 6.C.1
≥ 6.C.1
≰ 6.C.1
≱ 6.C.1
≠ 4.C.8
< 4.C.8, 6.C.1
∈ 6.A.1
∈-induction 6.H.1
⊂ 6.B.7
⊆ 6.B.7
⊃ 6.B.7
⊇ 6.B.7
∉ 6.B.7
≈ 7.E.3
≼ 7.E.3
SET 6.A.1
∅ 6.B.3
Sets 6.B.9
N see Natural numbers
∖ 6.B.12
∪ 6.B.14
⋃ 6.B.14
∪̇ 7.D.11
⋃̇ 7.D.11
∩ 6.B.15
⋂ 6.B.15
⟨a, b⟩, ⟨a, b⟩ₚ 6.B.16
× 6.B.19
dom 6.B.20
cod 6.B.20
gr 6.B.20
◦ 6.B.21
id 6.B.21
B^A 6.B.23
⟨a₀, a₁, . . .⟩ 6.B.24
{x_i}_{i∈I} 6.B.24
Π_{i∈I} A_i , Π{A_i}_{i∈I} 6.B.24
# 7.F.1
ℵ, ℵ₀ 7.F.2, 7.F.7
⌊x/y⌋ 9.A.4
x⁺ 4.C.1, 6.D.1
ξ̄ 4.C.2
I(x), Ī(x) 6.C.1
ω, ω⁺, ω⁺⁺ etc. 7.A.1
Σ 7.D.18
f_M , r_M 5.A.2
coloured symbols 1.B.16
θ[v/m] 5.B.4
π_{n,i} 9.A.3
− 9.A.2
0 9.A.4
zero(x), eq(x, y), nonzero(x), ≠(x, y), less(x, y), gtr(x, y), leq(x, y), geq(x, y) 9.A.4
⌊√x⌋, ⌊x/y⌋ 9.A.4
Rem(x, y) 9.A.4
P (x, y), L(x), R(x) 9.A.10
T (w, i) 9.A.14
P_n , E_n , E_{n,i} 9.B.13
N[∞] 9.B.16
P[∞] 9.B.17
len, ent, del, adj, rep 9.B.21
ϕ, ϕᵢ 9.E.2
f̂ 9.E.4
f̃ 9.E.5
ϕ^s_{m,n} 9.E.6
gnSym 10.A.2
[a] A.A
B
Base-α notation 7.D.9
Basis (for a vector space) 6.G.2
Bijection, bijective function 7.E.2
Binary 3.A.4
Binary code C.D.4
Binding 3.A.12 ( 3.A.11 )
C
D
DC-form (in UDLO) 4.F.6
Decidability of SL 2.G
Decidability of PL C.6.D.3
Denumerable 7.E.8
Description, definition by 4.A.7
Dichotomy, Law of (theorem of SL) 2.F.28
E
Equality 4.A
Equality test function 9.A.4
Equation A.A.4
Equipollent 7.E.3
F
F, symbol for “false” 2.F.30
Function 6.B.21
G
GCH 7.E.14
H
Halting Problem 9.F.3
Head-state (of execution of an algorithm) ( C.A.1 )
I
Idempotence of ∧ and ∨ 2.F.29
Identity function 6.B.21
If and only if, iff (connective) see equivalence
Incidence ( 4.B.2 )
Induction ( 4.C.2 )
Infimum 6.C.2
Injection, injective function 7.E.2
Instance (of a schema) 1.B.12
Integers (construction of) A.C
J
Join ( 2.F.15 )
L
Language ( 1.A.1 )
M
Maximal 6.C.2
Maximal principle 6.F.3
Maximum 6.C.2
Meet ( 2.F.15 )
Metalanguage 1.B.16
N
N, the Natural Numbers, q.v.
Nand 2.I.4
Natural number, The Natural numbers 6.D, 6.D.2, A.B
Nullary 3.A.4
O
Order-isomorphism 6.E.5
Order-type 7.A.13
Ordered pair 6.B.16
P
Parent 7.E.11
Partial function 9.A.2 ( 9.A.1 )
Partial order 6.C.1
Proposition ( 1.A.1 )
R
R.e. 9.G
Range of a function 6.B.21
Relation 6.B.20
Rules of SL 2.A.11
Russell, Bertrand ( 2.F.19 )
S
S (A first-order language with arithmetic) 10.A
s-m-n Theorem 9.E.6
Sets 6.B.9
Set-theoretic difference 6.B.12
Seven-point plane 4.B.4
Sheffer stroke 2.I.4
Signature 3.A.4
Simple expression (in UDLO) 4.F.6
Simple extension of a theory 1.C.1
Singleton 6.B.13
Size 7.E.3 ( 7.E.1 )
SL see Sentential logic
Smaller (size of sets) 7.E.3
Span 6.G.2
Stack ( C.B.3 )
Subdeductions 2.E
Subspace (of a vector space) 6.G.2
Substitution 3.A.16 ( 3.A.17, 4.A.5 )
Symmetry 6.B.20
T
T, symbol for “true” 2.F.30
Tautology 2.A.7, 2.G.2
Term 3.A.6
Ternary 3.A.4
Theorem 1.B.5, 1.B.8 ( 1.A.1 )
Theory 1.B.2 ( 1.A.1 )
tL 5.A.1
Total function 9.A.2
Total order 6.C.1
Toy theory 1.A.2
U
UDLO 4.F
UG see Universal generalisation
Unary 3.A.4
Union 6.B.14
W
Well-formed formula 3.A.8
Well-order 6.C.1, 6.E
Well-Ordering Principle 6.F.3
Wff 3.A.8
Witness ( 8.B.1 )