Lectures in Logic and Set Theory. Volume 2 - Set Theory
CAMBRIDGE STUDIES IN
ADVANCED MATHEMATICS
EDITORIAL BOARD
B. BOLLOBAS, W. FULTON, A. KATOK, F. KIRWAN,
P. SARNAK
GEORGE TOURLAKIS
York University
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Preface page xi
I A Bit of Logic: A User’s Toolbox 1
I.1 First Order Languages 7
I.2 A Digression into the Metatheory: Informal Induction and Recursion 20
I.3 Axioms and Rules of Inference 29
I.4 Basic Metatheorems 43
I.5 Semantics 53
I.6 Defined Symbols 66
I.7 Formalizing Interpretations 77
I.8 The Incompleteness Theorems 87
I.9 Exercises 94
II The Set-Theoretic Universe, Naïvely 99
II.1 The “Real Sets” 99
II.2 A Naïve Look at Russell’s Paradox 105
II.3 The Language of Axiomatic Set Theory 106
II.4 On Names 110
III The Axioms of Set Theory 114
III.1 Extensionality 114
III.2 Set Terms; Comprehension; Separation 119
III.3 The Set of All Urelements; the Empty Set 130
III.4 Class Terms and Classes 134
III.5 Axiom of Pairing 145
III.6 Axiom of Union 149
III.7 Axiom of Foundation 156
III.8 Axiom of Collection 160
III.9 Axiom of Power Set 178
A word on approach. I use the Zermelo-Fraenkel axiom system with the axiom
of choice (AC). This is the system known as ZFC. As many other authors do, I
simplify nomenclature by allowing “proper classes” in our discussions as part
of our metalanguage, but not in the formal language.
I said earlier that this volume contains the “basics”. I mean this characterisation
in two ways: One, that all the fundamental tools of set theory as needed
elsewhere in the mathematical sciences are included in detailed exposition. Two,
that I do not present any applications of set theory to other parts of mathematics,
because space considerations, along with a decision to include certain advanced
relative consistency results, have prohibited this.
“Basics” also entails that I do not attempt to bring the reader up to speed
with respect to current research issues. However, a reader who has mastered
the advanced metamathematical tools contained here will be able to read the
literature on such issues.
The title of the book reflects two things: One, that all good short titles are
taken. Two, more importantly, it advertises my conscious effort to present the
material in a conversational, user-friendly lecture style. I deliberately employ
classroom mannerisms (such as “pauses” and parenthetical “why”s, “what if”s,
and attention-grabbing devices for passages that I feel are important). This
aims at creating a friendly atmosphere for the reader, especially one who has
decided to study the topic without the guidance of an instructor. Friendliness
also means steering clear of the terse axiom-definition-theorem recipe, and
explaining how some concepts were arrived at in their present form. In other
words, what makes things tick. Thus, I approach the development of the key
concepts of ordinals and cardinals, initially and tentatively, in the manner they
were originally introduced by Georg Cantor (paradox-laden and all). Not only
does this afford the reader an understanding of why the modern (von Neumann)
approach is superior (and contradiction-free), but it also shows what it tries to
accomplish. In the same vein, Russell’s paradox is visited no less than three
times, leaving us in the end with a firm understanding that it has nothing to do
with the “truth” or otherwise of the much-maligned statement “x ∈ x”, but is
simply the result of a diagonalization of the type Cantor originally taught us.
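To make the diagonalization concrete, here is a small finite check in Python (our choice of language; nothing in the text prescribes it). For a finite set A it runs through every function f from A to the subsets of A and confirms that the “diagonal” set D = {a ∈ A : a ∉ f(a)} is never a value of f, since D and f(a) always disagree about a. Russell’s collection {x : x ∉ x} is this same D for the “function” that sends each x to itself. This is only a sanity test on small cases, not a proof, and the function names are our own.

```python
from itertools import combinations, product


def subsets(domain):
    """All subsets of a finite domain, as frozensets."""
    items = list(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(domain, f):
    """Cantor's diagonal set D = {a in domain : a not in f(a)}."""
    return frozenset(a for a in domain if a not in f[a])


def diagonal_escapes_every_function(domain) -> bool:
    """Brute force: D is never a value of any f from domain to its subsets."""
    elems = list(domain)
    for images in product(subsets(elems), repeat=len(elems)):
        f = dict(zip(elems, images))
        if diagonal_set(elems, f) in f.values():
            return False
    return True


print(diagonal_escapes_every_function({0, 1, 2}))  # True (checks 8**3 = 512 functions)
```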
A word on coverage. Chapter I is our “Chapter 0”. It contains the tools needed
to enable us to do our job properly – a bit of mathematical logic, certainly no more
than necessary. Chapter II informally outlines what we are about to describe
axiomatically: the universe of all the “real” sets and other “objects” of our
intuition, a caricature of the von Neumann “universe”. It is explained that the
whole fuss about axiomatic set theory† is to have a formal theory derive true
statements about the von Neumann sets, thus enabling us to get to know the
nature and structure of this universe. If this is to succeed, the chosen axioms
must be seen to be “true” in the universe we are describing.
To this end I ensure via informal discussions that every axiom that is introduced
is seen to “follow” from the principle of the formation of sets by stages, or
from some similarly plausible principle devised to keep paradoxes away. In this
manner the reader is constantly made aware that we are building a meaningful
set theory that has relevance to mathematical intuition and expectations (the
“real” mathematics), and is not just an artificial choice of a contradiction-free
set of axioms followed by the mechanical derivation of a few theorems.
With this in mind, I even make a case for the plausibility of the axiom of
choice, based on a popularization of Gödel’s constructible universe argument.
This occurs in Chapter IV and is informal.
The set theory we do allows atoms (or Urelemente),‡ just like Zermelo’s.
The re-emergence of atoms has been defended aptly by Jon Barwise (1975) and
others on technical merit, especially when one does “restricted set theories”
(e.g., theory of admissible sets).
Our own motivation is not technical; rather it is philosophical and pedagogical.
We find it extremely counterintuitive, especially when addressing
undergraduate audiences, to tell them that all their familiar mathematical
objects – the “stuff of mathematics” in Barwise’s words – are just perverse
“box-in-a-box-in-a-box . . . ” formations built from an infinite supply of empty
boxes. For example, should I be telling my undergraduate students that their
familiar number “2” really is just a short name for something like “{∅, {∅}}”?
And what will I tell them about “√2”?
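For the curious, the “box-in-a-box” construction can be mimicked in Python with nested frozensets: 0 is the empty box, and n + 1 is obtained from n by adding n itself as a new member (the von Neumann successor n ∪ {n}). The names `von_neumann` and `to_braces` are ours. This only illustrates the kind of object the Preface finds counterintuitive as a first account of “2”; it is not part of the formal development.

```python
def von_neumann(n: int) -> frozenset:
    """The von Neumann numeral n, built from the empty set by n successor steps."""
    numeral = frozenset()                # 0: the empty box
    for _ in range(n):
        numeral = numeral | {numeral}    # successor: k + 1 = k ∪ {k}
    return numeral


def to_braces(s: frozenset) -> str:
    """Show an atom-free finite set with braces, smaller members first."""
    inner = sorted((to_braces(x) for x in s), key=lambda t: (len(t), t))
    return "{" + ", ".join(inner) + "}"


print(to_braces(von_neumann(2)))  # {{}, {{}}}
print(to_braces(von_neumann(3)))  # {{}, {{}}, {{}, {{}}}}
```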
† O.K., maybe not the whole fuss. Axiomatics also allow us to meaningfully ask, and attempt to
answer, metamathematical questions of derivability, consistency, relative consistency, independence.
But in this volume much of the fuss is indeed about learning set theory.
‡ Allows, but does not insist that there are any.
Some mathematicians have said that set theory (without atoms) speaks only
of sets and it chooses not to speak about objects such as cows or fish (colourful
terms for urelements). Well, it does too! Such (“atomless”) set theory is known
to be perfectly capable of constructing “artificial” cows and fish, and can then
proceed to talk about such animals as much as it pleases.
While atomless ZFC has the ability to construct or codify all the familiar
mathematical objects in it, it does this so well that it betrays the prime directive
of the axiomatic method, which is to have a theory that applies to diverse
concrete (meta – i.e., outside the theory and in the realm of “everyday math”)
mathematical systems. Group theory and projective geometry, for example,
fulfill the directive.
In atomless ZFC the opposite appears to be happening: One is asked to
embed the known mathematics into the formal system.
We prefer a set theory that allows both artificial and real cows and fish, so that
when we want to illustrate a point in an example utilizing, say, the everyday set
of integers, Z, we can say things like “let the atoms (be interpreted to) include
the members of Z . . . ”.
But how about technical convenience? Is it not hard to include atoms in a
formal set theory? In fact, not at all!
Acknowledgments. I wish to thank all those who taught me, a group that is too
large to enumerate, in which I must acknowledge the presence and influence
of my parents, my students, and the writings of Shoenfield (in particular, 1967,
1978, 1971).
The staff at Cambridge University Press provided friendly and expert support,
and I thank them. I am particularly grateful for the encouragement received
from Lauren Cowles and Caitlin Doggart at the initial (submission and refereeing)
and advanced (production) stages of the publication cycle respectively.
I also wish to record my appreciation to Zach Dorsey of TechBooks and
his team. In both volumes they tamed my English and LaTeX, fitting them to
Cambridge specifications, and doing so with professionalism and flexibility.
This has been a long project that would not have been successful without
the support and understanding – for my long leaves of absence in front of a
computer screen – that only one’s family knows how to provide.
I finally wish to thank Donald Knuth and Leslie Lamport for their typesetting
systems TeX and LaTeX that make technical writing fun (and also empower
authors to load the pages with and other signs).
George Tourlakis
Toronto, March 2002
I A Bit of Logic: A User’s Toolbox
† We drop the qualifier “mathematical” from now on, as this is the only type of logic we are about.
It is evident that we need a precise formulation of set theory, that is, we must
turn it into a mathematical object in order to make task (2), above, a meaningful
mathematical activity.§ This dictates that we develop logic itself formally, and
subsequently set theory as a formal theory.
Formalism,¶ roughly speaking, is the abstraction of the reasoning processes
(proofs) achieved by deleting any references to the “truth content” of the component
mathematical statements (formulas). What is important in formalist
reasoning is solely the syntactic form of (mathematical) statements as well as
that of the proofs (or deductions) within which these statements appear.
A formalist builds an artificial language, that is, an infinite – but finitely
specifiable# – collection of “words” (meaning symbol sequences, also called
expressions). He|| then uses this language in order to build deductions – that
is, finite sequences of words – in such a manner that, at each step, he writes
down a word if and only if it is “certified” to be syntactically correct to do so.
“Certification” is granted by a toolbox consisting of the very same rules of logic
that we will present in this chapter.
The formalist may pretend, if he so chooses, that the words that appear in a
proof are meaningless sequences of meaningless symbols. Nevertheless, such
posturing cannot hide the fact that (in any purposefully designed theory) these
† We often quote a word or cluster of related words as a warning that the crude English meaning
is not necessarily the intended meaning, or it may be ambiguous. For example, the first “true”
in the sentence where this footnote originates is technical, but in a first approximation may be
taken to mean what “true” means in English. “Obviously true” is an ambiguous term. Obvious to
whom? However, the point is – to introduce another ambiguity – that “reasonable people” will
accept the truth of the (ZFC) axioms.
‡ This is an acronym reflecting the names of Zermelo and Fraenkel – the founders of this particular
axiomatization – and the fact that the so-called axiom of choice is included.
§ Here is an analogy: It is the precision of the rules for the game of chess that makes the notion of
analyzing a chessboard configuration meaningful.
¶ The person who practises formalism is a formalist.
# The finite specification is achieved by a finite collection of “rules”, repeated applications of which
build the words.
|| By definition, “he”, “his”, “him” – and their derivatives – are gender-neutral in this volume.
words codify “true” (intuitively speaking) statements. Put bluntly, we must have
something meaningful to talk about before we bother to codify it.
Therefore, a formal theory is a laboratory version (artificial replica or simulation,
if you will) of a “real” mathematical theory of the type encountered
in mathematics,† and formal proofs do unravel (codified versions of) “truths”
beyond those embodied in the adopted axioms.
It will be reassuring for the uninitiated that it is a fact of logic that the totality
of the “universally true” statements – that is, those that hold in all of
mathematics and not only in specific theories – coincides with the totality
of statements that we can deduce purely formally from some simple universally
true assumptions such as x = x, without any reference to meaning or
“truth” (Gödel’s completeness theorem for first order logic). In short, in this
case formal deducibility is as powerful as “truth”. The flip side is that formal
deducibility cannot be as powerful as “truth” when it is applied to specific
mathematical theories such as set theory or arithmetic (Gödel’s incompleteness
theorem).
† Examples of “real” (non-formalized) theories are Euclid’s geometry, topology, the theory of
groups, and, of course, Cantor’s “naïve” or “informal” set theory.
‡ In model theory “model” means exactly the opposite of what it means here. A model airplane
abstracts the real thing. A model of a formal (i.e., abstract) theory is a “concrete” or “real” version
of the abstract theory.
§ This is where it pays to choose reasonable assumptions, assumptions that are “obviously true”.
But what about the interests of the reader who only wants to practise set
theory, and who therefore may choose to skip the parts of this volume that just
talk about set theory? Does, perchance, formalism put him into an unnecessary
straitjacket?
We think not. Actually it is easier, and safer, to reason formally than to do so
informally. The latter mode often mixes syntax and semantics (meaning), and
there is always the danger that the “user” may assign incorrect (i.e., convenient,
but not general) meanings to the symbols that he manipulates, a phenomenon
that anyone who is teaching mathematics must have observed several times
with some distress.
Another uncertainty one may encounter in an informal approach is this:
“What can we allow to be a ‘property’ in mathematics?” This is an important
question, for we often want to collect objects that share a common property,
or we want to prove some property of the natural numbers by induction or by
the least principle. But what is a property? Is colour a property? How about
mood? It is not enough to say, “no, these are not properties”, for these are
just two frivolous examples. The question is how to describe accurately and
unambiguously the infinite variety of properties that are allowed. Formalism
can do just that.†
“Formalism for the user” is not a revolutionary slogan. It was advocated
by Hilbert, the founder of formalism, partly as a means of – as he believed‡ –
formulating mathematical theories in a manner that allows one to check them
(i.e., run diagnostic tests on them) for freedom from contradiction,§ but also as
the right way to do mathematics. By this proposal he hoped to salvage mathematics
itself – which, Hilbert felt, was about to be destroyed by the Brouwer
school of intuitionist thought. In a way, his program could bridge the gap
between the classical and the intuitionist camps, and there is some evidence
that Heyting (an influential intuitionist and contemporary of Hilbert) thought
that such a rapprochement was possible. After all, since meaning is irrelevant
to a formalist, all that he is doing (in a proof ) is shuffling finite sequences of
† Well, almost. So-called cardinality considerations make it impossible to describe all “good”
properties formally. But, practically and empirically speaking, we can define all that matter for
“doing mathematics”.
‡ This belief was unfounded, as Gödel’s incompleteness theorems showed.
§ Hilbert’s metatheory – that is, the “world” or “lab” outside the theory, where the replica is
actually manufactured – was finitary. Thus – Hilbert believed – all this theory building and
theory checking ought to be effected by finitary means. This was another ingredient that was
consistent with peaceful coexistence with the intuitionists. And, alas, this ingredient was the one
that – as some writers put it – destroyed Hilbert’s program to found mathematics on his version
of formalism. Gödel’s incompleteness theorems showed that a finitary metatheory is not up to
the task.
symbols, never having to handle or argue about infinite objects – a good thing,
as far as an intuitionist is concerned.†
In support of the “formalism for the user” position we must not fail to
mention Bourbaki’s (1966a) monumental work, which is a formalization of a
huge chunk of mathematics, including set theory, algebra, topology, and theory
of integration. This work is strictly for the user of mathematics, not for the
metamathematician who studies formal theories. Yet, it is fully formalized,
true to the spirit of Hilbert, and it comes in a self-contained package, including
a “Chapter 0” on formal logic.
More recently, the proposition of employing formal reasoning as a tool has
been gaining support in a number of computer science undergraduate curricula,
where logic and discrete mathematics are taught in a formalized setting, starting
with a rigorous course in the two logical calculi (propositional and predicate),
emphasizing the point of view of the user of logic (and mathematics) – hence
with an attendant emphasis on calculating (i.e., writing and annotating formal)
proofs. Pioneering works in this domain are the undergraduate text (1994) and
the paper (1995) of Gries and Schneider.
You are urged to master the technique of writing formal proofs by studying
how we go about it throughout this volume, especially in Chapter III.‡ You will
find that writing and annotating formal proofs is a discipline very much like
computer programming, so it cannot be that hard. Computer programming is
taught in the first year, isn’t it?§
† True, a formalist applies classical logic, while an intuitionist applies a different logic where, for
example, double negation is not removable. Yet, unlike a Platonist, a formalist does not believe –
or he does not have to disclose to his intuitionist friends that he might do – that infinite sets exist
in the metatheory, as his tools are just finite symbol sequences. To appreciate the tension here,
consider this anecdote: It is said that when Kronecker – the father of intuitionism – was informed
of Lindemann’s proof (1882) that π is transcendental, while he granted that this was an interesting
result, he also dismissed it, suggesting that π – whose decimal expansion is, of course, infinite
but not periodic – “does not exist” (see Wilder (1963, p. 193)). We do not propound the tenets of
intuitionism here, but it is fair to state that infinite sets are possible in intuitionistic mathematics
as this has later evolved in the hands of Brouwer and his Amsterdam school. However, such
sets must be (like all sets of intuitionistic mathematics) finitely generated – just like our formal
languages and the set of theorems (the latter provided that our axioms are too) – in a sense that
may be familiar to some readers who have had a course in automata and language theory. See
Wilder (1963, p. 234).
‡ Many additional paradigms of formal proofs, in the context of arithmetic, are found in Chapter II
of volume 1 of these Lectures.
§ One must not gather the impression that formal proofs are just obscure sequences of symbol
sequences akin to Morse code. Just as one does in computer programming, one also uses comments
in formal proofs – that is, annotations (in English, Greek, or your favourite natural language)
that aim to explain or justify for the benefit of the reader the various proof steps. At some point,
when familiarity allows and the length of (formal) proofs becomes prohibitive, we agree to relax
the proof style. Read on!
† The term “bootstrapping” is suggestive of a person pulling himself up by his bootstraps. Reputedly,
this technique, which is pervasive, among others, in the computer programming field – as
alluded to in the term “booting” – was invented by Baron Münchhausen.
I.1. First Order Languages
I.1.1 Remark. What is a rule? We run the danger of becoming circular or too
pedantic if we overdefine this notion. Intuitively, the rules we have in mind
are string manipulation rules – that is, “black boxes” (or functions) that receive
string inputs and respond with string outputs. For example, a well-known
theorem-building rule receives as input a formula and a variable, and it returns
(essentially) the string composed of the symbol ∀, immediately followed by the
variable and, in turn, immediately followed by the formula.‡
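Taking the remark literally, a rule is just a function from strings to strings. A minimal sketch of the generalization rule in Python, assuming formulas and variables are represented as plain strings over the alphabet (the function name is ours, and ∀ is used exactly as the remark uses it):

```python
def generalization(formula: str, variable: str) -> str:
    """Generalization as a string-manipulation "black box": input a formula
    and a variable, output ∀, then the variable, then the formula."""
    return "∀" + variable + formula


# The rule neither inspects nor "understands" its inputs; it only concatenates them.
print(generalization("(x=x)", "x"))  # ∀x(x=x)
```

The sketch does not check that its inputs really are a formula and a variable; it only performs the string manipulation the remark describes.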
(1) First off, the (first order) formal language, L, where the theory is “spoken”§
is a triple (V, Term, Wff), that is, it has three important components, each
of them a set. V is the alphabet (or vocabulary) of the language. It is the
† For a less abstract, but more detailed view of theories see p. 39.
‡ This rule is usually called “generalization”.
§ We will soon say what makes a language “first order”.
I.1.2 Remark. We may think of axioms (of either logical or nonlogical type) as
being special cases of rules, that is, rules that receive no input in order to produce
an output. In this manner item (2) above is subsumed by item (3); thus we are
faithful to our abstract definition of theory (where axioms were not mentioned).
An example, outside mathematics, of an inputless rule is the rule invoked
when you type date on your computer keyboard. This rule receives no input,
and outputs the current date on your screen.
† The generous use of the term “true” here is only meant to motivate. “Provable” or “deducible”
formula, or “theorem”, will be the technically precise terminology that we will soon define to
replace the term “true statement”.
There are two parts in each first order alphabet. The first, the collection of
the logical symbols, is common to all first order languages (regardless of which
theory is spoken in them). We describe this part immediately below.
Logical Symbols.
LS.1. Object or individual variables. An object variable is any one symbol out
of the unending sequence v0, v1, v2, . . . . In practice – whether we are
using logic as a tool or as an object of study – we agree to be sloppy with
notation and use, generically, x, y, z, u, v, w with or without subscripts
or primes as names of object variables.† This is just a matter of notational
convenience. We allow ourselves to write, say, z instead of, say,
v120000000000560000009. Object variables (intuitively) “vary over” (i.e., are
allowed to take values that are) the objects that the theory studies (e.g.,
numbers, sets, atoms, lines, points, etc., as the case may be).
LS.2. The Boolean or propositional connectives. These are the symbols “¬”
and “∨”.‡ These are pronounced not and or respectively.
LS.3. The existential quantifier, that is, the symbol “∃”, pronounced exists or
for some.
LS.4. Brackets, that is, “(” and “)”.
LS.5. The equality predicate. This is the symbol “=”, which we use to indicate
that objects are “equal”. It is pronounced equals.
The logical symbols will have a fixed interpretation. In particular, “=” will
always be expected to mean equals.
The theory-specific part of the alphabet is not fixed, but varies from theory
to theory. For example, in set theory we just add the nonlogical (or special)
symbols, ∈ and U. The first is a special predicate symbol (or just predicate) of
arity 2; the second is a predicate symbol of arity 1.§
In number theory we adopt instead the special symbols S (intended meaning:
successor, or “ + 1”, function), +, ×, 0, <, and (sometimes) a symbol for the
† Conventions such as this one are essentially agreements – effected in the metatheory – on how to
be sloppy and get away with it. They are offered in the interest of user-friendliness and readability.
There are also theory-specific conventions, which may allow additional names in our informal
(metamathematical) notation. Such examples, in set theory, occur in the following chapters.
‡ The quotes are not part of the symbol. They serve to indicate clearly, e.g., in the case of “∨” here,
what is part of the symbol and what is not (the following period is not).
§ “arity” is derived from “ary” of “unary”, “binary”, etc. It denotes the number of arguments
needed by a symbol according to the dictates of correct syntax. Function and predicate symbols
need arguments.
Nonlogical Symbols.
NLS.1. A (possibly empty) set of symbols for constants. We normally use
the metasymbols† a, b, c, d, e, with or without primes or subscripts, to
stand for constants unless we have in mind some alternative “standard”
formal notation in specific theories (e.g., ∅, 0, ω).
NLS.2. A (possibly empty) set of symbols for predicate symbols or relation
symbols for each possible arity n > 0. We normally use P, Q, R,
generically, with or without primes or subscripts, to stand for predicate
symbols. Note that = is in the logical camp. Also note that theory-specific
formal symbols are possible for predicates, e.g., <, ∈, U.
NLS.3. Finally, a (possibly empty) set of symbols for functions for each possible
arity n > 0. We normally use f, g, h, generically, with or without
primes or subscripts, to stand for function symbols. Note that theory-specific
formal symbols are possible for functions, e.g., +, ×.
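The nonlogical part of an alphabet amounts to a table of symbols with their arities. Below is a minimal Python sketch, assuming each symbol is a single string; the class name `Signature` and its checks (arity n > 0, and no clash with the logical symbols of LS.2–LS.5) are our own illustration, while the set theory and number theory symbols are those listed in the text (omitting the optional extra number theory symbol).

```python
from dataclasses import dataclass, field

# Logical symbols LS.2–LS.5 (the variables of LS.1 are handled separately).
LOGICAL_SYMBOLS = {"¬", "∨", "∃", "(", ")", "="}


@dataclass
class Signature:
    """Nonlogical symbols NLS.1–NLS.3 of a first order language."""
    constants: set = field(default_factory=set)
    predicates: dict = field(default_factory=dict)  # symbol -> arity
    functions: dict = field(default_factory=dict)   # symbol -> arity

    def __post_init__(self):
        for table in (self.predicates, self.functions):
            for symbol, arity in table.items():
                if arity <= 0:
                    raise ValueError(f"{symbol!r} needs arity n > 0, got {arity}")
        used = self.constants | set(self.predicates) | set(self.functions)
        clash = used & LOGICAL_SYMBOLS
        if clash:
            raise ValueError(f"logical symbols cannot be nonlogical: {sorted(clash)}")


# Set theory: just ∈ (arity 2) and U (arity 1).
SET_THEORY = Signature(predicates={"∈": 2, "U": 1})

# Number theory: 0, S, +, ×, and <.
NUMBER_THEORY = Signature(
    constants={"0"},
    predicates={"<": 2},
    functions={"S": 1, "+": 2, "×": 2},
)
```

Rejecting “=” as a nonlogical predicate mirrors the text’s remark that = is in the logical camp.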
I.1.3 Remark. (1) We have the option of assuming that each of the logical
symbols that we named in LS.1–LS.5 has no further structure and that the
symbols are, ontologically, identical to their names, that is, they are just these
exact signs drawn on paper (or on any equivalent display medium).
In this case, changing the symbols, say, ¬ and ∃ to ∼ and E respectively
results in a “different” logic, but one that is, trivially, isomorphic to the one we
are describing: Anything that we may do in, or say about, one logic trivially
translates to an equivalent activity in, or utterance about, the other as long as
we systematically carry out the translations of all occurrences of ¬ and ∃ to ∼
and E respectively (or vice versa).
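The translation in question is purely mechanical. A minimal Python sketch using the standard `str.maketrans` and `str.translate`, assuming (as the remark implicitly does) that neither ∼ nor E already occurs in the alphabet; otherwise the back-translation would not be unique. The function names are ours.

```python
# Symbol-for-symbol translation between the two "isomorphic" logics.
TO_ALTERNATIVE = str.maketrans({"¬": "∼", "∃": "E"})
FROM_ALTERNATIVE = str.maketrans({"∼": "¬", "E": "∃"})


def to_alternative(s: str) -> str:
    """Replace every occurrence of ¬ by ∼ and of ∃ by E."""
    return s.translate(TO_ALTERNATIVE)


def from_alternative(s: str) -> str:
    """Undo to_alternative (valid only if ∼ and E are not otherwise used)."""
    return s.translate(FROM_ALTERNATIVE)


formula = "¬∃x¬(x=x)"
print(to_alternative(formula))  # ∼Ex∼(x=x)
```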
An alternative point of view is that the symbol names are not the same as
(identical with) the symbols they are naming. Thus, for example, “¬” names
the connective we pronounce not, but we do not know (or care) exactly what
the nature of this connective is (we only care about how it behaves). Thus, the
name “¬” becomes just a typographical expedient and may be replaced by other
names that name the same object, not.
This point of view gives one flexibility in, for example, deciding how the
variable symbols are “implemented”. It often is convenient to suppose that the
† Metasymbols are informal (i.e., outside the formal language) symbols that we use within “real”
mathematics – the metatheory – in order to describe, as we are doing here, the formal language.
entire sequence of variable symbols was built from just two symbols, say, “v”
and “|”.† One way to do this is by saying that vi is a name for the symbol
sequence v|...|, with i occurrences of |.
Or, preferably – see (2) below – vi might be a name for the symbol sequence
v|...|v, with i occurrences of |.
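Here is a small Python sketch of the two encodings over the alphabet {v, |}. Requirement (2), which the text cites as the reason for preferring the second encoding, is not reproduced in this excerpt, so we make no claim about its exact content; we only exhibit one visible difference. With the first encoding the name of v1 is a prefix of the name of v2, whereas no name in the second encoding is a prefix of another, so a concatenation of such names splits unambiguously at each closing v. The function names are ours.

```python
def encode_open(i: int) -> str:
    """First scheme: v followed by i strokes."""
    return "v" + "|" * i


def encode_closed(i: int) -> str:
    """Second scheme: v, then i strokes, then a closing v."""
    return "v" + "|" * i + "v"


def is_prefix_free(words) -> bool:
    """True if no word is a proper prefix of another word in the collection."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode_closed(s: str) -> list:
    """Split a concatenation of second-scheme names into their indices."""
    indices, pos = [], 0
    while pos < len(s):
        if s[pos] != "v":
            raise ValueError(f"expected 'v' at position {pos}")
        end = s.index("v", pos + 1)  # the closing v; ValueError if missing
        strokes = s[pos + 1:end]
        if strokes.strip("|"):
            raise ValueError(f"unexpected symbol in {strokes!r}")
        indices.append(len(strokes))
        pos = end + 1
    return indices


print(is_prefix_free([encode_open(i) for i in range(5)]))    # False
print(is_prefix_free([encode_closed(i) for i in range(5)]))  # True
print(decode_closed(encode_closed(3) + encode_closed(0)))    # [3, 0]
```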
† We intend these two symbols to be identical to their names. No philosophical or other purpose
will be served by allowing more indirection here (such as “v names u, which actually names w,
which actually is . . . ”).
‡ What we have stated under (2) are requirements, not metatheorems. That is, they are nothing of
the sort that we can prove about our formal language within everyday mathematics.
§ This phenomenon will be examined in some detail in what follows. By the way, any additions
are made to the nonlogical side of the alphabet, since all the logical symbols have been given,
once and for all.
Wait a minute! If formal set theory is to serve as the foundation of all mathematics,
and if the present chapter is to assist towards that purpose, then how is
it that we are already employing natural numbers like 12000000560000009 as
subscripts in the names of object variables? How is it permissible to already talk
about “sets of symbols” when we are about to found a theory of sets formally?
Surely we do not have† any of these items yet, do we?
This protestation is offered partly in jest. We have already said that we work
within real mathematics as we build the “replicas” or “simulators” of logic
and set theory. Say we are Platonists. Then the entire body of mathematics –
including infinite sets, in particular the set of natural numbers N – is available
to us as we are building whatever we are building.
We can thus describe how we assemble the simulator and its various parts
using our knowledge of real mathematics, the language of real mathematics,
and all “building blocks” available to us, including sets, infinite or otherwise,
and natural numbers. This mathematics “exists” whether or not anyone ever
builds a formal simulator for naïve set theory, or logic for that matter. Thus any
apparent circularity disappears.
Now if we are not Platonists, then our mathematical “reality” is more restricted,
but, nevertheless, building a simulator or not in this reality does not
affect the existence of the reality. We will, however, this time, revise our tools.
For example, if we prefer to think that individual natural numbers exist (up
to any size), but not so their collection N, then it is still possible to build our
formal languages (in particular, as many object variables as we want) – pretty
much as already described – in this restricted metatheory. We may have to
be careful not to say that we have an unending sequence of such variables, as
this would presume the existence of infinite sets in the metatheory.‡ We can
say instead that a variable is any object of the form vi where i is a (meaningless)
word of (meaningless) symbols, the latter chosen out of the set or list
“0, 1, 2, 3, 4, 5, 6, 7, 8, 9”.
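This finitist-friendly description of variables is easy to mechanize. A minimal Python sketch, assuming i must be a nonempty word over the digit symbols (the text does not say whether the empty word is allowed); since the digit words are meaningless, v007 and v7 come out as different variables. The names `VARIABLE` and `is_variable` are ours.

```python
import re

# A variable: the symbol v followed by a nonempty word over the ten digit symbols.
VARIABLE = re.compile(r"v[0-9]+")


def is_variable(s: str) -> bool:
    """Recognize v<word> without ever interpreting <word> as a number."""
    return VARIABLE.fullmatch(s) is not None


for candidate in ["v0", "v120", "v007", "v", "x1", "v1a"]:
    print(candidate, is_variable(candidate))
```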
Clearly the above approach works even within a metatheory that has failed
to acknowledge the existence of any natural numbers.§
In this volume we will take the normal user-friendly position that is habitual
nowadays, namely, that our metatheory is the Platonist’s (infinitary)
mathematics.
† “Do not have” in the sense of having not formally defined – or proved to exist – or both.
‡ A finitist would have none of it, although a post-Brouwer intuitionist would be content that such
a sequence is finitely describable.
§ Hilbert, in his finitistic metatheory, built whatever natural numbers he needed by repeating the
stroke symbol “|”.
In this book the symbol ≡ will be used exclusively in the metatheory as equality
of strings over some set M.
The symbol λ normally denotes the empty string, and we postulate for it the
following behaviour:
I.1.5 Definition (Terms). The set of terms, Term, is the smallest set of strings
over the alphabet V with the following two properties:
(1) Any of the items in LS.1 or NLS.1 (x, y, z, a, b, c, etc.) are included.
† A set that supplies symbols to be used in building strings is not special. It is just a set. However,
it often has a special name: “alphabet”.
‡ Punctuation, such as “.”, is not part of the string. One often avoids such footnotes by quoting
strings that are explicitly written as symbol sequences. For example, if A stands for the string
#, one writes A ≡ “#”. Note that we must not write “A”, unless we mean a string whose only
symbol is A.
§ If and only if.
I.1.6 Remark. (1) We often abuse notation and write f(t1, . . . , tn) instead of
f t1 . . . tn.
(2) Definition I.1.5 is an inductive definition.‡ It defines a more or less complicated
term by assuming that we already know what simpler terms look like.
This is a standard technique employed in real mathematics (within which we are
defining the formal language). We will have the opportunity to say more about
such inductive definitions – and their appropriateness – in a comment
later on.
(3) We relate this particular manner of defining terms to our working def-
inition of a theory (given on p. 7 immediately before Remark I.1.1 in terms
of “rules” of formation). Item (2) in I.1.5 essentially says that we build new
terms (from old ones) by applying the following general rule: Pick an arbitrary
function symbol, say f . This has a specific formation rule associated with it.
Namely, “for the appropriate number, n, of an already existing ordered list of
terms, t1 , . . . , tn , build the new term consisting of f , immediately followed by
the ordered list of the given terms”.
For example, suppose we are working in the language of number theory.
There is a function symbol + available there. The rule associated with + builds
the new term +ts for any previously obtained terms t and s. Thus, +v1v13 and
+v121+v1v13 are well-formed terms. We normally write terms of number theory
in infix notation,§ i.e., t + s, v1 + v13 and v121 + (v1 + v13) (note the intrusion of
brackets, to indicate sequencing in the application of +).
A by-product of what we have just described is that the arity of a function
symbol f is whatever number of terms the associated rule will require as
input.
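The formation rule for function symbols can be illustrated with a small recursive reader for prefix (Polish) notation. This is a minimal sketch, not the book's construction: it assumes strings have already been split into tokens, and it uses a hypothetical arity table for a fragment of the language of number theory ("0", "S", "+", "×", and variables whose names start with "v"). The arity of each symbol is "hardwired" in exactly one place: the number of terms the reader consumes after that symbol.

```python
# Hypothetical arity table for a fragment of the language of number theory.
ARITY = {"S": 1, "+": 2, "×": 2}


def parse_term(tokens, pos=0):
    """Read one prefix term starting at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("ran out of symbols while reading a term")
    tok = tokens[pos]
    if tok in ARITY:
        # Formation rule for f: the symbol f followed by exactly ARITY[f] terms.
        args, pos = [], pos + 1
        for _ in range(ARITY[tok]):
            arg, pos = parse_term(tokens, pos)
            args.append(arg)
        return (tok, args), pos
    if tok == "0" or tok.startswith("v"):  # the constant 0 or a variable
        return tok, pos + 1
    raise ValueError(f"{tok!r} cannot start a term")


def is_term(tokens):
    """True iff the whole token list is exactly one term."""
    try:
        _, end = parse_term(tokens)
    except ValueError:
        return False
    return end == len(tokens)


def to_infix(tree, top=True):
    """Render a parsed term in infix notation, adding the brackets it needs."""
    if isinstance(tree, str):
        return tree
    f, args = tree
    if ARITY[f] == 2:
        s = f"{to_infix(args[0], False)} {f} {to_infix(args[1], False)}"
        return s if top else f"({s})"
    return f"{f}({', '.join(to_infix(a) for a in args)})"


tree, _ = parse_term(["+", "v121", "+", "v1", "v13"])
print(to_infix(tree))  # v121 + (v1 + v13)
```

Note that `is_term(["¬", "¬", "("])` returns `False`, which is one way to see why such a string is not a term.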
† We will omit from now on the qualification “symbol” from terminology such as “function sym-
bol”, “constant symbol”, “predicate symbol”.
‡ Some mathematicians are adamant that we call this a recursive definition and reserve the term
“induction” for “induction proofs”. This is seen to be unwarranted hairsplitting if we consider
that Bourbaki (1966b) calls induction proofs “démonstrations par récurrence”. We will be less
dogmatic: Either name is all right.
§ Function symbol placed between the arguments.
(4) A crucial word used in I.1.5 (which recurs in all inductive definitions) is
“smallest”. It means “least inclusive” (set). For example, we may easily think of
a set of strings that satisfies both conditions of the above definition, but which
is not “smallest” by virtue of having additional elements, such as the string
“¬¬(”.
Pause. Why is “¬¬(” not in the smallest set as defined above, and therefore
not a term?
The reader may wish to ponder further on the import of the qualification
“smallest” by considering the familiar (similar) example of N. The principle of
induction in N ensures that this set is the smallest with the properties (i) 0 is
included, and (ii) if n is included, then so is n + 1.
I.1.7 Definition (Atomic Formulas). The set of atomic formulas, Af, contains
precisely:
(1) The strings t = s for every possible choice of terms t, s.
(2) The strings Pt1t2 . . . tn for every possible choice of n-ary predicate P (for
all choices of n > 0) and all possible choices of terms t1, t2, . . . , tn.
† Denotes.
I.1.9 Remark.
(1) The above is yet another inductive definition. Its statement (in the met-
alanguage) is facilitated by the use of syntactic, or meta-, variables – A and
B – used as names for arbitrary (indeterminate) formulas. We first encountered
the use of syntactic variables in Definition I.1.5.
In general, we will let calligraphic capital letters A, B, C, D, E, F, G
(with or without primes or subscripts) be syntactic variables (i.e., metalinguistic
names) denoting well-formed formulas, or just formulas, as we often say. The
definition of Wff given above is standard. In particular, it permits well-formed
formulas such as ((∃x)((∃x)x = 0)) in the interest of making the formation
rules context-free.†
(2) The rules of syntax just given do not allow us to write things such as ∃ f
or ∃P where f and P are function and predicate symbols respectively. That
quantification is deliberately restricted to act solely on object variables makes
the language first order.
(3) We have already indicated in Remark I.1.6 where the arities of function
and predicate symbols come from (Definitions I.1.5 and I.1.7 referred to them).
These are numbers that are implicit (“hardwired”) within the formation rules
for terms and atomic formulas. Each function, and each predicate symbol – e.g.,
+, ×, ∈, < – has its own unique associated formation rule. This rule “knows”
how many terms are needed (on the “input side”) in order to form a term or
atomic formula.
There is an alternative way of making arities of symbols known (in the
metatheory): Rather than embedding arities in the formation rules, we can hide
them inside the ontology of the symbols, not making them explicit in the name.
For example, a new symbol, say ∗, can be used to record arity. That is, we can
think of a predicate (or function) symbol as consisting of two parts: an arity
part and an “all the rest” part, the latter needed to render the symbol unique.
For example, ∈ may actually be the short name for the symbol “∈∗∗”, where
this latter name is identical to the symbol it denotes, or “what you see is what
you get” – see Remark I.1.3(1) and (2), p. 10. The presence of the two asterisks
declares the arity. Some people say this differently: They make available to the
metatheory a “function”, ar, from “the set of all predicate and function symbols”
(of a given language) to the natural numbers, so that for any function symbol f
or predicate symbol P, ar(f) and ar(P) yield the arity of f or P respectively.‡
† In some presentations, formation rule I.1.8(c) is context-sensitive: It requires that x be not already
quantified in A.
‡ In mathematics we understand a function as a set of input-output pairs. One can “glue” the two
parts of such pairs together, as in “∈∗∗” – where “∈” is the input part and “∗∗” is the output part,
the latter denoting “2” – etc. Thus, the two approaches are equivalent.
(4) As a consequence of the remarks in (3) the theory can go about its job of
generating, say, terms using the formation rules, at the same time being unable
to see or discuss these arities, since these are hidden inside the rules (or inside
the function or predicate names in the alternative approach). So it is not in the
theory’s “competence” to say, e.g., “hmm, this function has arity 10011132”.
Indeed, a theory cannot even say “hmm, so this is a function (or a term, or a
wff )”. A theory just generates strings. It does not test them for membership in
syntactic categories, such as variable, function, term, or wff. A human user of
the theory, on the other hand, can, of course, make such observations. Indeed,
in theories such as set theory and arithmetic, the human user can even write a
computer program that correctly makes such observations. But both of these
agents, human and computer, act in the metatheory.
(5) Abbreviations:
Abr1. The string ((∀x)A) abbreviates the string (¬((∃x)(¬A))). Thus, for
any explicitly written formula A, the former notation is informal (meta-
mathematical), while the latter is formal (within the formal language). In
particular, ∀ is a metalinguistic symbol. “∀x” is the universal quantifier.
A is its scope. The symbol ∀ is pronounced for all.
example, in set theory we will also introduce a 1-ary† predicate, U, whose job is
to test an object for “sethood”‡ (vs. atom status). Similar remedies are available
to other theories. For example, geometry will manage with one sort of variable,
and unary predicates “Point”, “Line”, and “Plane”.
Apropos language, some authors emphasize the importance of the nonlogical
symbols, taking at the same time the formation rules for granted; thus they say
that we have a language, say, “L = {∈, U}” rather than “L = (V, Term, Wff)
where V has ∈ and U as its only nonlogical symbols”. That is, they use
“language” for the nonlogical part of the alphabet.
We have said above “This rule ‘knows’ how many terms are needed (on
the ‘input side’) in order to form a term or atomic formula.” We often like to
personify rules, theories, and the like, to make the exposition more relaxed.
This runs the danger of being misunderstood on occasion. Here is how a rule
“knows”.
Syntactic definitions in the part of theoretical computer science known as
formal language theory are given by a neat notation called BNF:§ To fix ideas,
let us say that we are describing the terms of a specific first order language that
contains just one constant symbol, “c”, and just two function symbols, “ f ” and
“g”, where we intend the former to be ternary (arity 3) and the latter of arity 5.
Moreover, assume that the variables v0 , v1 , . . . are short names for vv, v|v, . . .
respectively.
Then, using the syntactic names ⟨term⟩, ⟨var⟩, ⟨strokes⟩ to stand for any
term, any variable, any string of strokes, we can recursively define these syn-
tactic categories as follows, where we read “→” as “is defined as” (the right
hand side), and the big stroke, “|” – pronounced “or” – gives alternatives in the
definition:
(1) ⟨strokes⟩ → λ | ⟨strokes⟩|
(2) ⟨var⟩ → v⟨strokes⟩v
(3) ⟨term⟩ → c | ⟨var⟩ | f⟨term⟩⟨term⟩⟨term⟩ | g⟨term⟩⟨term⟩⟨term⟩⟨term⟩⟨term⟩
For example, rule (1) says that a string of strokes is (defined as) either the empty
string λ, or a string of strokes followed by a single stroke.
Rule (3) shows clearly how the “knowledge” of the arities of f and g is
“hardwired” within the rule. For example, the third alternative of that rule says
that a term is a string composed of the symbol “ f ” followed immediately by
three strings, each of which is a term.
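The rules just described translate almost line by line into a recursive recognizer. The following Python sketch assumes terms are given as plain character strings over the alphabet {c, f, g, v, |}, with f ternary and g of arity 5 as in the text. Each alternative of the rule for terms becomes one branch, and the arities appear only as the number of recursive calls, which is one concrete reading of "the rule knows".

```python
ARITY = {"f": 3, "g": 5}  # arities fixed in the example language


def read_var(s, i):
    """<var> -> v<strokes>v, where <strokes> -> λ | <strokes>|.
    Return the index just past the variable, or None."""
    if i < len(s) and s[i] == "v":
        j = i + 1
        while j < len(s) and s[j] == "|":  # zero or more strokes
            j += 1
        if j < len(s) and s[j] == "v":
            return j + 1
    return None


def read_term(s, i=0):
    """<term> -> c | <var> | f<term><term><term> | g followed by 5 terms.
    Return the index just past one term starting at s[i], or None."""
    if i >= len(s):
        return None
    if s[i] == "c":
        return i + 1
    if s[i] == "v":
        return read_var(s, i)
    if s[i] in ARITY:
        j = i + 1
        for _ in range(ARITY[s[i]]):  # the arity is "hardwired" here
            j = read_term(s, j)
            if j is None:
                return None
        return j
    return None


def is_term(s):
    """True iff the whole string s is exactly one term."""
    return read_term(s) == len(s)


print(is_term("fcv|vgccccc"))  # True: f applied to c, v1, and g(c, c, c, c, c)
```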
I.1.11 Remark. (1) Of course, Definition I.1.10 takes care of the defined con-
nectives as well, via the obvious translation procedure.
(2) Notation. If A is a formula, then we often write A[y1, . . . , yk] to
indicate interest in the variables y1, . . . , yk, which may or may not be free in
† Recall that x and y are abbreviations of names such as v1200098 and v11009 (which name distinct
variables). However, it could be that both x and y name v101. Therefore it is not redundant to say
“and y is not the same variable as x”. By the way, x ≢ y says the same thing, by I.1.4.
A. There may be other free variables in A that we may have chosen not to
include in the list. On the other hand, if we use round brackets, as in A(y1, . . . , yk),
then we are implicitly asserting that y1, . . . , yk is the complete list of free
variables that occur in A.
I.1.12 Definition. A term or formula is closed iff no free variables occur in it.
A closed formula is called a sentence.
A formula is open iff it contains no quantifiers (thus, an open formula may
also be closed).
$$\begin{array}{cccc} y_1 & \cdots & y_n & z \\ a_1 & \cdots & a_n & a_{n+1} \\ b_1 & \cdots & b_n & b_{n+1} \\ \vdots & & \vdots & \vdots \end{array}$$
If the above rule (table) is called Q, then we use the notations $Q(a_1, \ldots, a_n, a_{n+1})$ and† $\langle a_1, \ldots, a_n, a_{n+1} \rangle \in Q$ interchangeably to indicate that the ordered
sequence or “row” $a_1, \ldots, a_n, a_{n+1}$ is present in the table. We say “$Q(a_1, \ldots, a_n, a_{n+1})$ holds” or “$Q(a_1, \ldots, a_n, a_{n+1})$ is true”, but we often also say that
“Q applied to $a_1, \ldots, a_n$ yields $a_{n+1}$”, or that “$a_{n+1}$ is a result or output of
Q, when the latter receives input $a_1, \ldots, a_n$”. We often abbreviate such inputs
using vector notation, namely, $\vec{a}_n$ (or just $\vec{a}$, if n is understood). Thus, we often
write $Q(\vec{a}_n, a_{n+1})$ for $Q(a_1, \ldots, a_n, a_{n+1})$.
A rule Q that has n + 1 columns is called (n + 1)-ary.
(1) I ⊆ S,†
(2) S is closed under every Q in R. In this case we say that S is R-closed.
We write S = Cl(I , R), and say that “S is the closure of I under R”.
We have at once: If T is any set such that
(1) I ⊆ T, and
(2) T is closed under every Q in R,
then S ⊆ T.
† From our knowledge of elementary informal set theory, we recall that A ⊆ B means that every
member of A is also a member of B.
Of course, this rephrased principle is valid, for if we let T be the set of all
objects that have property P(x) – for which set one employs the well-established
symbol {x : P(x)} – then this T satisfies (1) and (2) of the metatheorem.†
I.2.5 Remark. The following metatheorem shows that there is a way to “con-
struct” Cl(I , R) iteratively, i.e., one element at a time by repeated application
of the rules.
This result shows definitively that our inductive definitions of terms (I.1.5)
and well-formed formulas (I.1.8) fully conform with our working definition of
theory, as an alphabet and a set of rules that are used to build formulas and
theorems (p. 7).
I.2.6 Metatheorem.
Cl(I, R) = {x : x is (I, R)-derivable within some number of steps, n}
† We are sailing too close to the wind here. It turns out that not all properties P(x) lead to sets
{x : P(x)}. Our explanation was naı̈ve. However, formal set theory, which is meant to save us
from our naı̈veté, upholds the principle (a)–(b) using just a slightly more complicated explanation.
The reader can see this explanation in Chapter VII.
‡ This “or” is inclusive: (1), or (2), or both.
I.2. A Digression into the Metatheory 23
I.2.7 Example. One can see now that N = Cl(I , R), where I = {0} and R
contains just the relation y = x + 1 (input x, output y). Similarly, Z, the set
of all integers, is Cl(I , R), where I = {0} and R contains just the relations
y = x + 1 and y = x − 1 (input x, output y).
For the latter, the inclusion Cl(I, R) ⊆ Z is trivial (by I.2.3). For ⊇ we easily
see that any n ∈ Z has an (I, R)-derivation (and then we are done by I.2.6).
For example, if n > 0, then 0, 1, 2, . . . , n is a derivation, while if n < 0, then
0, −1, −2, . . . , n is one. If n = 0, then the one-term sequence 0 is a derivation.
Another interesting closure is obtained by I = {3} and the two relations
z = x + y and z = x − y. This is the set {3k : k ∈ Z} (see Exercise I.1).
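The derivations used in this example can be checked mechanically. The Python sketch below assumes the usual notion of an (I, R)-derivation: a finite nonempty sequence each of whose entries is in I or is an output of some rule of R whose inputs all occur earlier in the sequence. Representing a rule as a pair (arity, test) is a choice made here for illustration only.

```python
from itertools import product


def is_derivation(seq, initial, rules):
    """Check that seq is an (I, R)-derivation: every entry is in I or is the
    output of some rule whose inputs all occur earlier in seq.
    Each rule is a pair (arity, holds), with holds(inputs, output) -> bool."""
    if not seq:
        return False
    for k, b in enumerate(seq):
        if b in initial:
            continue
        earlier = seq[:k]
        if not any(holds(args, b)
                   for arity, holds in rules
                   for args in product(earlier, repeat=arity)):
            return False
    return True


# Z = Cl({0}, {y = x + 1, y = x - 1}), as in Example I.2.7.
Z_RULES = [(1, lambda a, y: y == a[0] + 1), (1, lambda a, y: y == a[0] - 1)]
# {3k : k in Z} = Cl({3}, {z = x + y, z = x - y}).
THREE_RULES = [(2, lambda a, z: z == a[0] + a[1]),
               (2, lambda a, z: z == a[0] - a[1])]

print(is_derivation([0, 1, 2, 3], {0}, Z_RULES))       # True
print(is_derivation([3, 0, -3, 6], {3}, THREE_RULES))  # True
```

The brute-force search over earlier entries is exponential in the arity, which is harmless for these one- and two-input rules.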
Pause. So, taking the first sentence of I.2.7 one step further, we note that we
have just proved the induction principle for N, for that is exactly what the
“equation” N = Cl(I , R) says (by I.2.3). Do you agree?
There is another way to view the iterative construction of Cl(I , R): The
set is constructed in stages. Below we are using some more notation borrowed
from informal set theory. For any sets A and B we write A ∪ B to indicate the
set union which consists of all the members found in A or B or in both. More
generally, if we have a lot of sets, $X_0, X_1, X_2, \ldots$, that is, one $X_i$ for every
integer i ≥ 0 – which we denote by the compact notation $(X_i)_{i \ge 0}$ – then we
may wish to form a set that includes all the objects found as members all over
the $X_i$, that is (using inclusive, or “logical”, “or”s below), form
$$\{x : x \in X_0 \text{ or } x \in X_1 \text{ or } \ldots\}$$
or, equivalently,
$$\{x : \text{for some } i \ge 0,\ x \in X_i\}$$
The latter is called the union of the sequence $(X_i)_{i \ge 0}$ and is often denoted by
$$\bigcup_{i \ge 0} X_i \qquad \text{or} \qquad \textstyle\bigcup_{i \ge 0} X_i$$
Correspondingly, for the union of $X_0, X_1, \ldots, X_n$ we write
$$\bigcup_{i \le n} X_i \qquad \text{or} \qquad \textstyle\bigcup_{i \le n} X_i$$
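$$X_0 = I$$
$$X_{n+1} = \bigcup_{i \le n} X_i \;\cup\; \Bigl\{ b : \text{for some } Q \in R \text{ and some } \vec{a}_n \text{ in } \bigcup_{i \le n} X_i,\ Q(\vec{a}_n, b) \Bigr\}$$
That is, to form $X_{n+1}$ we append to $\bigcup_{i \le n} X_i$ all the outputs of all the relations
in R acting on all possible inputs, the latter taken from $\bigcup_{i \le n} X_i$.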
We say that X i is built at stage i, from initial objects I and rule set R.
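The stage construction can be run directly when the rules are simple enough. The sketch below assumes every rule in R is a total function (so each output is computed rather than searched for), which holds for the rules of Example I.2.7; general relations would need a search like the one in a derivation checker. It returns the finite list of stages $X_0, \ldots, X_n$.

```python
from itertools import product


def stages(initial, rules, n):
    """Return [X_0, ..., X_n]: X_0 = I, and X_{k+1} is the union
    X_0 ∪ ... ∪ X_k together with every output of a rule applied to inputs
    taken from that union. Each rule is a pair (arity, g), g assumed total."""
    xs = [set(initial)]
    for _ in range(n):
        union = set().union(*xs)  # X_0 ∪ ... ∪ X_k
        nxt = set(union)
        for arity, g in rules:
            for args in product(union, repeat=arity):
                nxt.add(g(*args))
        xs.append(nxt)
    return xs


succ = [(1, lambda x: x + 1)]
print([sorted(x) for x in stages({0}, succ, 3)])
# [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]

plus_minus = [(2, lambda x, y: x + y), (2, lambda x, y: x - y)]
x3 = stages({3}, plus_minus, 3)[-1]  # every member is a multiple of 3
```

Each stage is finite here, and the stages increase, so the union of all of them is approached one finite step at a time, in line with the metatheorem that follows.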
$$a^0 = 1 \quad (\text{for } a \neq 0 \text{ throughout})$$
$$a^{n+1} = a \cdot a^n$$
Proof. ⊆: We do induction on Cl(I, R). For the basis, $I = X_0 \subseteq \bigcup_{i \ge 0} X_i$.
We show that $\bigcup_{i \ge 0} X_i$ is R-closed. Let Q ∈ R and $Q(\vec{a}_n, b)$ hold, for some
$\vec{a}_n$ in $\bigcup_{i \ge 0} X_i$. Thus, by definition of union, there are integers $j_1, j_2, \ldots, j_n$
such that $a_i \in X_{j_i}$, $i = 1, \ldots, n$. If $k = \max\{j_1, \ldots, j_n\}$, then $\vec{a}_n$ is in $\bigcup_{i \le k} X_i$;
hence $b \in X_{k+1} \subseteq \bigcup_{i \ge 0} X_i$.
⊇: It suffices to prove that $X_n \subseteq$ Cl(I, R), a fact we can prove by induction
on n. For n = 0 it holds by I.2.2. As an I.H. we assume the claim for all n ≤ k.
The case for k + 1: $X_{k+1}$ is the union of two sets. One is $\bigcup_{i \le k} X_i$. This is
a subset of Cl(I, R) by the I.H. The other is
$$\Bigl\{ b : \text{for some } Q \in R \text{ and some } \vec{a} \text{ in } \bigcup_{i \le k} X_i,\ Q(\vec{a}, b) \Bigr\}$$
This too is a subset of Cl(I, R), by the preceding observation and the fact that
Cl(I, R) is R-closed.
(i) It has two (or more) distinct sets of immediate P-predecessors for some
rule P.
I.2.11 Example. The pair ({00, 0}, {Q}), where Q(x, y, z) holds iff z = xy
(where “xy” denotes the concatenation of the strings x and y, in that order), is
ambiguous. For example, 0000 has the two immediate predecessor sets {00, 00}
and {0, 000}. Moreover, while 00 is an initial object, it does have immediate
predecessors – namely, the set {0, 0} (or, what amounts to the same thing, {0}).
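The ambiguity can be exhibited by listing every way of splitting a string as a concatenation. The sketch below relies on one observation: since "0" is an initial object and concatenation adds lengths, Cl({00, 0}, {Q}) is exactly the set of nonempty strings of 0s. The helper names are illustrative only.

```python
def in_closure(s):
    """Cl({"00", "0"}, {Q}) is exactly the set of nonempty strings of 0s."""
    return len(s) > 0 and set(s) == {"0"}


def predecessor_sets(z):
    """All sets {x, y} with Q(x, y, z), i.e. z = xy, x and y in the closure."""
    return {frozenset({z[:k], z[k:]})
            for k in range(1, len(z))
            if in_closure(z[:k]) and in_closure(z[k:])}


print(sorted(sorted(p) for p in predecessor_sets("0000")))
# [['0', '000'], ['00']]
print(predecessor_sets("00"))  # {frozenset({'0'})}
```

Two distinct predecessor sets for 0000, and a predecessor set for the initial object 00, are exactly the two defects named in the example.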
but also
P(b1 , o1 , . . . , bl , ol , x, z)
Uniqueness part. Let the function K also satisfy (1). We show, by induction
on Cl(I , R), that
The above clearly is valid for functions h and g Q that may fail to be defined
everywhere in their “natural” input sets. To be able to have this degree of
generality without having to state additional definitions (such as those of left
fields, right fields, partial functions, total functions, nontotal functions, and
Kleene weak equality) we have stated the recurrence (1) the way we did (to
† Cl(I, R)-derivable.
I.3. Axioms and Rules of Inference 29
keep an eye on both the input and output side of things) rather than the usual
$$f(x) = \begin{cases} h(x) & \text{if } x \in I \\ g_Q(x, f(a_1), \ldots, f(a_r)) & \text{if } Q(a_1, \ldots, a_r, x) \text{ holds} \end{cases}$$
Of course, if all the $g_Q$ and h are defined everywhere on their input sets (i.e.,
they are “total”), then f is defined everywhere on Cl(I, R) (see Exercise I.4).
† Interestingly, our myope can see the brackets and the Boolean connectives.
The above inductive definition of v̄ relies on the fact that Definition I.3.2 of
Prop is unambiguous (I.2.10, p. 25), so that a propositional formula is uniquely
readable (or parsable) (see Exercises I.5 and I.6). It employs the metatheorem
on recursive definitions (I.2.13).
The reader may think that all this about unique readability is just an annoying
quibble. Actually it can be a matter of life or death. The ancient Oracle of
Delphi had the nasty habit of issuing ambiguous – not uniquely readable, that
is – pronouncements. One famous such pronouncement, rendered in English,
went like this: “You will go you will return not dying in the war”.† Given that
† The original was “Ιξεις αφιξεις ου θνηξεις εν πολεμῳ”.
ancient Greeks did not use punctuation, the above has two diametrically opposite
meanings depending on whether you put a comma before or after “not”.
The situation with formulas in Prop would have been as disastrous in the
absence of brackets – which serve as punctuation – because unique readability
would then not be guaranteed: For example, for three distinct prime formulas
p, q, r we could find a v such that v̄(p → q → r) depended on whether we meant
to insert brackets around “p → q” or around “q → r” (can you find such a v?).
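The question just posed can be settled by brute force over all eight valuations of p, q, r. In this Python sketch True stands for t and False for f; the function names are illustrative only.

```python
from itertools import product


def implies(x, y):
    """The truth function of →."""
    return (not x) or y


def disagreements():
    """Valuations of p, q, r on which (p → q) → r and p → (q → r) differ."""
    return [(p, q, r) for p, q, r in product([True, False], repeat=3)
            if implies(implies(p, q), r) != implies(p, implies(q, r))]


print(disagreements())  # [(False, True, False), (False, False, False)]
```

So any v with v(p) = v(r) = f will do: bracketing “p → q” gives f, while bracketing “q → r” gives t.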
I.3.5 Remark (Truth Tables). Definition I.3.4 is often given in terms of truth
functions. For example, we could have defined (in the metatheory, of course)
the function F¬ : {t, f} → {t, f} by
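$$F_\neg(x) = \begin{cases} \mathrm{t} & \text{if } x = \mathrm{f} \\ \mathrm{f} & \text{if } x = \mathrm{t} \end{cases}$$
We could then say that v̄((¬A)) = F¬(v̄(A)). One can similarly take care of
all the connectives (∨ and all the abbreviations) with the help of truth functions
F∨, F∧, F→, F↔. These functions are conveniently given via so-called truth
tables as indicated below:
x  y    F∨(x, y)  F∧(x, y)  F→(x, y)  F↔(x, y)
t  t       t         t         t         t
t  f       t         f         f         f
f  t       t         f         t         f
f  f       f         f         t         t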
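The truth functions and the recursive extension v̄ of a valuation v can be sketched in Python. The tuple encoding of formulas is an assumption made for illustration: a prime formula is a string, (¬A) is ("¬", A), and (A ∘ B) is (∘, A, B) for ∘ among ∨, ∧, →, ↔, with True playing t and False playing f. The recursion is legitimate only because formulas are uniquely readable, which the nested-tuple encoding guarantees by construction.

```python
def F_not(x):
    return not x


TRUTH_FUNCTIONS = {
    "∨": lambda x, y: x or y,
    "∧": lambda x, y: x and y,
    "→": lambda x, y: (not x) or y,
    "↔": lambda x, y: x == y,
}


def vbar(formula, v):
    """Extend a valuation v (dict from prime formulas to booleans) to all
    formulas by recursion on their structure, e.g. vbar(¬A) = F¬(vbar(A))."""
    if isinstance(formula, str):
        return v[formula]
    if formula[0] == "¬":
        return F_not(vbar(formula[1], v))
    op, a, b = formula
    return TRUTH_FUNCTIONS[op](vbar(a, v), vbar(b, v))


# Print the four rows of the truth tables for ∨, ∧, →, ↔.
for x in (True, False):
    for y in (True, False):
        print(x, y, [TRUTH_FUNCTIONS[c](x, y) for c in "∨∧→↔"])
```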
We have at once
If Γ = ∅, then Γ |=Taut A says just |=Taut A, since the hypothesis “every truth
assignment v that satisfies Γ”, in the definition above, is vacuously satisfied.
For that reason we almost never write ∅ |=Taut A and write instead |=Taut A.
I.3.9 Exercise. For any formula A and any two valuations v and v′, v̄(A) =
v̄′(A) if v and v′ agree on all the propositional variables that occur in A.
In the same manner, Γ |=Taut A is oblivious to v-variations that do not
affect the variables that occur in Γ and A (see Exercise I.7).
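Because of this obliviousness, Γ |=Taut A can be decided for a finite Γ by trying only the valuations of the variables that occur in Γ and A. The sketch below assumes the same illustrative tuple encoding as above (prime formulas are strings; ("¬", A) and (∘, A, B) for the binary connectives) and is not meant to cover infinite Γ.

```python
from itertools import product

OPS = {"∨": lambda x, y: x or y, "∧": lambda x, y: x and y,
       "→": lambda x, y: (not x) or y, "↔": lambda x, y: x == y}


def value(f, v):
    """Truth value of formula f under the valuation v."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "¬":
        return not value(f[1], v)
    return OPS[f[0]](value(f[1], v), value(f[2], v))


def variables(f):
    """The propositional variables occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(sub) for sub in f[1:]))


def taut_implies(gamma, a):
    """Γ |=Taut A for a finite Γ: every valuation satisfying all of Γ also
    satisfies A. Only the variables occurring in Γ and A are varied."""
    names = sorted(variables(a).union(*(variables(g) for g in gamma)))
    for bits in product([True, False], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(value(g, v) for g in gamma) and not value(a, v):
            return False  # v satisfies Γ but not A
    return True


print(taut_implies(["p", ("→", "p", "q")], "q"))  # True
```

With Γ empty the loop checks every valuation of the variables of A, so `taut_implies([], A)` decides |=Taut A.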
† The word “lemma” has Greek origin, “λήμμα”, plural “lemmata” – many people say “lemmas” –
from “λήμματα”. It derives from the verb “λαμβάνω” (to take) and thus means “taken thing”.
In mathematical reasoning a lemma is a provable auxiliary statement that is taken and used as
a stepping stone in lengthy mathematical arguments – invoked therein by name, as in “. . . by
Lemma such and such . . . ” – much as subroutines (or procedures) are taken and used as auxiliary
stepping stones to elucidate lengthy computer programs. Thus our purpose in having lemmata is
to shorten proofs by breaking them up into modules.
I.3.11 Remark. There are a number of issues about Definition I.3.10 that need
discussion or clarification.
Any reasonable person will be satisfied with the above definition “as is”.
However, there are some obscure points (deliberately quoted, above).
(1) What is this about “capture”? Well, suppose that A ≡ (∃x)¬x = y. Let
t ≡ x.† Then, if we ignore the proviso in I.3.10, A[y ← t] ≡ (∃x)¬x =
x, which says something altogether different from the original. Intuitively,
this is unexpected (and undesirable): A codes a statement about the free
variable y, i.e., a statement about all objects which could be values (or
meanings) of y. One would have expected that, in particular, A[y ← x] –
if the substitution were allowed – would make this very same statement
about the values of x. It does not.‡ What happened is that x was captured
by the quantifier upon substitution, thus distorting A’s original meaning.
(2) Are we sure that the term “replace” is mathematically precise?
(3) Is A [x ← t] always a formula, if A is?
† Recall that in I.1.4 (p. 13) we defined the symbol “≡” to be equality on strings. No further
reminders will be issued.
‡ And that is why the substitution is not allowed. The original formula says that for any object y
there is an object that is different from it. On the other hand, A [y ← x] says that there is an
object that is different from itself.
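Capture is easy to detect mechanically. The following Python sketch computes A[y ← t] and simply refuses (raises an error) when a variable of t would fall within the scope of a quantifier on that variable, in the spirit of the proviso discussed above; it is not claimed to reproduce Definition I.3.10 clause by clause. The tuple encoding is an assumption: a term is a variable (a string) or (f, t1, ..., tn), and formulas are ("=", t, s), ("¬", A), ("∨", A, B) and ("∃", x, A).

```python
class CaptureError(ValueError):
    """Raised when A[y ← t] is undefined because t would be captured."""


def term_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(u) for u in t[1:]))


def free_vars(a):
    tag = a[0]
    if tag == "=":
        return term_vars(a[1]) | term_vars(a[2])
    if tag == "¬":
        return free_vars(a[1])
    if tag == "∨":
        return free_vars(a[1]) | free_vars(a[2])
    return free_vars(a[2]) - {a[1]}  # ("∃", x, A) binds x


def subst_term(t, y, s):
    if isinstance(t, str):
        return s if t == y else t
    return (t[0],) + tuple(subst_term(u, y, s) for u in t[1:])


def subst(a, y, t):
    """A[y ← t]: replace the free occurrences of y in A by the term t."""
    tag = a[0]
    if tag == "=":
        return ("=", subst_term(a[1], y, t), subst_term(a[2], y, t))
    if tag == "¬":
        return ("¬", subst(a[1], y, t))
    if tag == "∨":
        return ("∨", subst(a[1], y, t), subst(a[2], y, t))
    x, body = a[1], a[2]
    if x == y or y not in free_vars(body):
        return a  # no free occurrence of y below this quantifier
    if x in term_vars(t):
        raise CaptureError(f"{x} in the term would be captured by (∃{x})")
    return ("∃", x, subst(body, y, t))


A = ("∃", "x", ("¬", ("=", "x", "y")))  # (∃x)¬x = y
print(subst(A, "y", "z"))  # ('∃', 'x', ('¬', ('=', 'x', 'z')))
```

Calling `subst(A, "y", "x")` raises `CaptureError`, which is precisely the substitution the remark disallows.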
In all cases above, the left hand side is defined iff the right hand side is.
Exercise I.9 shows that we obtain the same string in (1) above, regardless of
our choice of new variables zr .
I.3.13 Definition (Axioms and Axiom Schemata). The logical axioms are all
the formulas in the group Ax1 and all the possible instances of the schemata in
the remaining groups:
or even
A [t] → (∃x)A
† Plural of schema. This is of Greek origin, σχήμα, meaning – e.g., in geometry – figure or
configuration or even formation.
The logical axioms for equality are not the strongest possible, but they are
adequate for the job. What Leibniz really proposed was the schema t = s ↔
(∀P)(P[t] ↔ P[s]), which says, intuitively, that “two objects t and s are equal
iff, for every property P, both have P or neither has P”.
Unfortunately, our system of notation (first-order language) does not allow
quantification over predicate symbols (which can have as “values” arbitrary
“properties”). But is not Ax4 read “for all formulas A” anyway? Yes, but with
one qualification: “For all formulas A that we can write down in our system of
notation”, and, alas, we cannot write all possible formulas of real mathematics
down, because they are too many.†
While the symbol “=” is suggestive of equality, it is not its shape that qual-
ifies it. It is the two axioms, Ax3 and Ax4, that make the symbol behave as we
expect equality to behave, and any other symbol of any other shape (e.g.,
Enderton (1972) uses “≈”) satisfying these two axioms qualifies as formal
equality that is intended to codify the metamathematical standard “=”.
I.3.14 Remark. In Ax2 and Ax4 we imposed the condition that t (and s) must
be substitutable in x. Here is why:
Take A to stand for (∀y)x = y and B to stand for (∃y)¬x = y. Then,
temporarily suspending the restriction on substitutability, A[x ← y] → (∃x)A is
(∀y)y = y → (∃x)(∀y)x = y
and x = y → B ↔ B[x ← y] is
x = y → (∃y)¬x = y ↔ (∃y)¬y = y
neither of which, obviously, is “valid”.‡
There is a remedy in the metamathematics: That is, move the quantified
variable(s) out of harm’s way, by renaming them so that no quantified variable
in A has the same name as any (free, of course) variable in t (or s).
This renaming is formally correct (i.e., it does not change the meaning of
the formula) as we will see in the variant (meta)theorem I.4.13. Of course,
it is always possible to effect this renaming, since we have countably many
variables, and only finitely many appear free in t (and s) and A.
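The renaming remedy can be sketched as follows. This Python illustration (same assumed tuple encoding: variables are strings, ("∃", x, A) binds x, every other tuple starts with its symbol) renames each quantified variable that belongs to a given set, typically the variables of the term about to be substituted, to a fresh name z0, z1, ... that occurs nowhere in the formula. That such a renaming preserves meaning is the content of the variant theorem I.4.13; the code does not prove it.

```python
from itertools import count


def all_vars(a):
    """Every variable occurring in A, free or bound."""
    if isinstance(a, str):
        return {a}
    if a[0] == "∃":
        return {a[1]} | all_vars(a[2])
    return set().union(*(all_vars(p) for p in a[1:]))


def replace_free(a, x, z):
    """Replace the free occurrences of the variable x in A by the variable z."""
    if isinstance(a, str):
        return z if a == x else a
    if a[0] == "∃":
        if a[1] == x:  # x is bound from here on
            return a
        return ("∃", a[1], replace_free(a[2], x, z))
    return (a[0],) + tuple(replace_free(p, x, z) for p in a[1:])


def rename_bound(a, avoid):
    """A variant of A in which no quantified variable belongs to `avoid`."""
    used = all_vars(a) | set(avoid)
    fresh = (f"z{i}" for i in count() if f"z{i}" not in used)

    def go(b):
        if isinstance(b, str):
            return b
        if b[0] == "∃":
            x, body = b[1], go(b[2])
            if x in avoid:
                z = next(fresh)  # z occurs nowhere in A, so nothing is captured
                return ("∃", z, replace_free(body, x, z))
            return ("∃", x, body)
        return (b[0],) + tuple(go(p) for p in b[1:])

    return go(a)


B = ("∃", "y", ("¬", ("=", "x", "y")))  # (∃y)¬x = y
B2 = rename_bound(B, {"y"})
print(B2)                         # ('∃', 'z0', ('¬', ('=', 'x', 'z0')))
print(replace_free(B2, "x", "y"))  # ('∃', 'z0', ('¬', ('=', 'y', 'z0')))
```

After renaming, substituting y for x no longer produces the “invalid” (∃y)¬y = y of the example above.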
† Uncountably many, in a precise technical sense that we will introduce in Chapter VII. This is
due to Cantor’s theorem, which implies that there are uncountably many subsets of N. Each such
subset A gives rise to the formula x ∈ A in the metalanguage.
On the other hand, our formal system of notation, using just ∈ and U as start-up (nonlogical)
symbols, is rich enough to write down only a countably infinite set of formulas (at some point
later, Example VII.5.17, this will be clear). Thus, our notation will fail to denote uncountably
many “real formulas” x ∈ A.
‡ Speaking intuitively is enough for now. Validity will be defined carefully pretty soon.
This trivial remedy allows us to render the conditions in Ax2 and Ax4
harmless. Essentially, a t (or s) is always substitutable after renaming.
I.3.15 Definition (Rules of Inference). The following two are the only primi-
tive† rules of inference. These rules are relations with inputs from the set Wff
and outputs also in Wff. They are written down, traditionally, as “fractions”
through the employment of syntactic (or meta-) variables. We call the “numer-
ator” the premise(s) and the “denominator” the conclusion.
We say that a rule of inference is applied to any instance of the formula
schema(ta) in the numerator, and that it yields (or results in) the corresponding
instance‡ of the formula schema in the denominator.
Inf1. Modus ponens, or MP, is the rule
A, A → B
B
Inf2. ∃-introduction – pronounced E-introduction – is the rule
A → B
(∃x)A → B
that is applicable if a side condition is met: That x is not free in B .
N.B. Recall the conventions on eliminating brackets.
It is immediately clear that the definition above meets our requirement that the
rules of inference be “algorithmic”, in the sense that whether they are applicable
can be decided and their application can be carried out in a finite number of
steps by just looking at the form of (potential input) formulas (not at their
meaning).
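The “algorithmic” character of the two rules can be made concrete: deciding whether a rule applies inspects only the shape of the input formulas, plus one free-variable check for ∃-introduction. In this Python sketch the tuple encoding is an assumption, and ("→", A, B) is treated as a node for brevity even though → is an abbreviation in the text. Each function returns the conclusion, or None when the rule does not apply.

```python
def free_vars(a):
    """Free variables; ("∃", x, A) binds x, other tuples start with a symbol."""
    if isinstance(a, str):
        return {a}
    if a[0] == "∃":
        return free_vars(a[2]) - {a[1]}
    return set().union(*(free_vars(p) for p in a[1:]))


def modus_ponens(a, a_implies_b):
    """Inf1: from A and A → B infer B."""
    if a_implies_b[0] == "→" and a_implies_b[1] == a:
        return a_implies_b[2]
    return None


def exists_intro(a_implies_b, x):
    """Inf2: from A → B infer (∃x)A → B, provided x is not free in B."""
    if a_implies_b[0] != "→":
        return None
    a, b = a_implies_b[1], a_implies_b[2]
    if x in free_vars(b):
        return None  # the side condition fails
    return ("→", ("∃", x, a), b)


P, Q = ("P", "x"), ("Q", "y")
print(exists_intro(("→", P, Q), "x"))  # ('→', ('∃', 'x', ('P', 'x')), ('Q', 'y'))
```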
We next define Γ-theorems, that is, formulas we can prove from the set of
formulas Γ (this may be empty).
† That is, given initially. Other rules can be proved to hold, and we call them derived rules.
‡ The corresponding instance is the one obtained from the schema in the denominator by replac-
ing each of its metavariables by the same specific formula, or term, used to instantiate all the
occurrences of the same metavariable in the numerator.
I.3.17 Definition (Γ-Proofs). We just saw that Thm_Γ = Cl(I, R), where I =
Λ ∪ Γ and R contains just the two rules of inference. An (I, R)-derivation is
also called a Γ-proof (or just proof, if Γ is understood).
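Since a Γ-proof is just an (I, R)-derivation, checking one is routine. In the Python sketch below, membership in Λ ∪ Γ is delegated to a caller-supplied predicate `is_axiom`, a hypothetical stand-in, because Λ is infinite (it contains all instances of the axiom schemata). The encoding is the illustrative tuple encoding used above, with → treated as a node.

```python
def free_vars(a):
    """Free variables; ("∃", x, A) binds x, other tuples start with a symbol."""
    if isinstance(a, str):
        return {a}
    if a[0] == "∃":
        return free_vars(a[2]) - {a[1]}
    return set().union(*(free_vars(p) for p in a[1:]))


def follows(c, earlier):
    """Does c follow from formulas in `earlier` by MP or by ∃-introduction?"""
    if any(p == ("→", a, c) for a in earlier for p in earlier):
        return True  # Inf1 (modus ponens)
    if c[0] == "→" and isinstance(c[1], tuple) and c[1][0] == "∃":
        x, a, b = c[1][1], c[1][2], c[2]  # c is (∃x)A → B
        return ("→", a, b) in earlier and x not in free_vars(b)  # Inf2
    return False


def is_proof(lines, is_axiom):
    """Each line is in Λ ∪ Γ (as decided by is_axiom) or follows from
    earlier lines by one of the two rules of inference."""
    return bool(lines) and all(
        is_axiom(c) or follows(c, lines[:k]) for k, c in enumerate(lines))


P, Q = ("P", "x"), ("Q", "y")
gamma = {P, ("→", P, Q)}  # a hypothetical finite set of axioms
print(is_proof([P, ("→", P, Q), Q], lambda f: f in gamma))  # True
```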
A, A → B B (i)
A → B (∃x)A → B (ii)
Some texts (e.g., Schütte (1977)) give the rules in the format of (i)–(ii) above.
The axioms and rules provide us with a calculus, that is, a means to “calcu-
late” (used synonymously with construct) proofs and theorems. In the interest
of making the calculus more user-friendly – and thus more easily applicable to
mathematical theories of interest, such as set theory – we are going to develop in
the next section a number of “derived principles”. These principles are largely
A ∈ T iff T ⊢ A (1)
Indeed, the if direction follows from closure under ⊢, while the only-if
direction is a consequence of Definition I.3.16.
T is the set of the formulas of the theory,† and we often say “a theory T ”,
taking everything else for granted.
If T = Wff, then the theory T is called inconsistent or contradictory.
Otherwise it is called consistent.
Throughout our exposition we fix Λ and I as in Definitions I.3.13 and I.3.15.
By (1), T = Thm_T. This observation suggests that we call theories – such as
the ones we have just defined – axiomatic theories, in that a set Γ always exists
such that T = Thm_Γ (if at a loss, we can just take Γ = T).
We are mostly interested in theories T for which there is a “small” set Γ
(“small” by comparison with T) such that T = Thm_Γ. We say that T is
axiomatized by Γ. Naturally, we call T the set of theorems, and Γ the set of
nonlogical axioms, of T.
If, moreover, Γ is “recognizable” (i.e., we can tell “algorithmically” whether
or not a formula A is in Γ), then we say that T is recursively axiomatized.
Examples of recursively axiomatized theories are ZFC set theory and Peano
arithmetic. On the other hand, if we take T to be all the sentences of arithmetic
that are true when interpreted “in the standard way”† over N – the so-called
complete arithmetic – then there is no recognizable Γ such that T = Thm_Γ.
We say that complete arithmetic is not recursively axiomatizable.‡
Pause. Why does complete arithmetic form a theory? Because work of the next
section – in particular, the soundness theorem – entails that it is closed under ⊢.
We tend to further abuse language and call axiomatic theories by the name
of their (set of) nonlogical axioms Γ. Thus if 𝒯 = (L, Λ, I, T) is a first order
theory and T = Thm_Γ, then we may say interchangeably “theory 𝒯”, “theory
T”, or “theory Γ”.
If Γ = ∅, then we have a pure or absolute theory (i.e., we are “just doing
logic, not math”). If Γ ≠ ∅ then we have an applied theory.
Argot. A final note on language versus metalanguage, and theory versus meta-
theory. When are we speaking the metalanguage, and when are we speaking
the formal language?
The answers are, respectively, “almost always” and “almost never”. As has
been remarked before, in principle, we are speaking the formal language exactly
when we are pronouncing or writing down a string from Term or Wff. Otherwise
we are (speaking or writing) in the metalanguage. It appears that we (and
everybody else who has written a book in logic or set theory) are speaking and
writing within the metalanguage with a frequency approaching 100%.
The formalist is clever enough to simplify notation at all times. We will
seldom be caught writing down a member of Wff in this book, and, on the rare
occasions we may do so, it will only be to serve as an illustration of why one
should avoid writing down such formulas: because they are too long and hard
to read and understand.
We will be speaking the formal language with a heavy “accent” and using
many idioms borrowed from “real” (meta-) mathematics, and English. We will
call our dialect argot, following Manin (1977).
A related, and practically more important,§ question is “When are we arguing
in the theory, and when are we arguing in the metatheory?”. That is, the question
is not about how we speak, but about what we are saying when we speak.
† That is, the symbol “0” of the language is interpreted as 0 ∈ N, “Sx” as x + 1, “(∃x)” as “there
is an x ∈ N”, etc.
‡ The trivial “solution”, that is, taking Γ = T, will not do, for T is not recognizable.
§ Important, because arguing in the theory restricts us to use only its axioms (and earlier proved
theorems; cf. I.3.18) and its rules of inference – nothing extraneous to these syntactic tools is
allowed.
The answer to this is also easy: Once we have fixed a theory T and the
nonlogical axioms Γ, we are working in the theory iff we are writing down a
(Γ-)proof of some specific formula A. It does not matter if A (and much of
what we write down during the proof) is in argot.
Two examples:
(1) One is working in formal number theory (or formal arithmetic) if one states
and proves (say, from the Peano axioms) that “every natural number n > 1
has a prime factor”. Note how this theorem is stated in argot. Below we
give its translation into the formal language of arithmetic:†
(∀n)(S0 < n → (∃x)(∃y)(n = x × y ∧ S0 < x ∧ (∀m)(∀r)(x = m × r → m = S0 ∨ m = x)))     (1)
(2) One is working in formal logic if one is writing a proof of (∃v13 )v13 = v13 .
† Well, almost. In the interest of brevity, all the variable names used in the displayed formula (1)
are metasymbols.
‡ That is, whether or not T = Wff.
§ That is, “if Γ is consistent,” – where we are naming the theory by its nonlogical axioms – “does
it stay so after we have added some formula A as a nonlogical axiom?”.
Pause. But how much of real mathematics are we allowed to use, reliably, to
study or speak about the “simulator” that the formal system is?†
For example, have we not overstepped our license by using induction (and,
implicitly, the entire infinite set N), specifically the recursive definitions of
terms, well-formed formulas, theorems, etc.?
The quibble here is largely “political”. Some people argue (a major proponent
of this was Hilbert) as follows: Formal mathematics was meant to crank out
“true” statements of mathematics, but no “false” ones, and this freedom of
contradiction ought to be verifiable.
Now, as we are verifying so in the metatheory (i.e., outside the formal sys-
tem), shouldn’t the metatheory itself be above suspicion (of contradiction, that
is)? Naturally.
Hilbert’s suggestion towards achieving this “above suspicion” status was,
essentially, to utilize in the metatheory only a small fragment of “reality” that
is so simple and close to intuition that it does not itself need any “certificate”
(via formalization) for its freedom from contradiction.
In other words, restrict the metamathematics!‡
Such a fragment of the metatheory, he said, should have nothing to do with
the “infinite”, in particular with the entire set N and all that it entails (e.g.,
inductive definitions and proofs).§
If it were not for Gödel’s incompleteness results, this position – that meta-
mathematical techniques must be finitary – might have prevailed. However,
Gödel proved it to be futile, and most mathematicians have learnt to feel com-
fortable with infinitary metamathematical techniques, or at least with N and
induction.¶ Of course, it would be imprudent to use as metamathematical tools
mathematics of suspect consistency (e.g., the full naı̈ve theory of sets).
† The methods or scope of the metamathematics that a logician uses – in the investigation of some
formal system – are often restricted for technical or philosophical reasons.
‡ Otherwise we would need to formalize the metamathematics – in order to “certify” it – and
next the metametamathematics, and so on. For if “metaM” is to authoritatively check “M” for
consistency, then it too must be consistent; so let us formalize “metaM” and let “metametaM”
check it; . . . – a never ending story.
§ See Hilbert and Bernays (1968, pp. 21–29) for an elaborate scheme that constructs “concrete
number objects” – Ziffern or “numerals” – “|”,“||”,“|||”, etc., that stand for “1”,“2”,“3”, etc.,
complete with a “concrete mathematical induction” proof technique on these objects, and even
the beginnings of their “recursion theory”. Of course, at any point, only finite sets of such objects
were considered.
¶ Some proponents of infinitary techniques in metamathematics have used very strong words in
describing the failure of Hilbert’s program. Rasiowa and Sikorski (1963) write in their intro-
duction: “However Gödel’s results exposed the fiasco of Hilbert’s finitistic methods as far as
consistency is concerned.”
It is worth pointing out that one could fit (with some effort) our inductive
definitions within Hilbert’s style. But we will not do so.
First, one would have to abandon the elegant (and now widely used) approach
with closures, and use instead the concept of derivations of Section I.2.
Then one would somehow have to effect and study derivations without the
benefit of the entire set N. Bourbaki (1966b, p. 15) does so with his construc-
tions formatives. Hermes (1973) is another author who does so, with his “term-”
and “formula-calculi” (such calculi being, essentially, finite descriptions of
derivations).
Bourbaki (but not Hermes) avoids induction over all of N. In his metamath-
ematical discussions of terms and formulas† that are derived by a derivation
d1 , . . . , dn , he restricts his induction arguments to the segment {0, 1, . . . , n},
that is, he takes an I.H. on k < n and proceeds to k + 1.
† For example, in loc. cit., p. 18, where he proves that, in our notation, A [x ← y] and t[x ← y]
are a formula and term respectively.
Proof.
(1) A → B               given
(2) ¬B → ¬A             (1) and I.4.1
(3) (∃x)¬B → ¬A         (2) and ∃-introduction
(4) A → ¬(∃x)¬B         (3) and I.4.1
(5) A → (∀x)B           (4), introducing the ∀-abbreviation
At this point, the reader may want to review our abbreviation conventions; in
particular, see Ax2 (I.3.13).
Proof.
(1) ¬A[t] → (∃x)¬A      in Λ
(2) ¬(∃x)¬A → A[t]      (1) and I.4.1
(3) (∀x)A → A[t]        (2), introducing the ∀-abbreviation
Proof. A [x ← x] ≡ A.
The above corollary motivates the following definition. It also justifies the
common mathematical practice of the “implied universal quantifier”. That is,
we often just state “. . . x . . . ” when we mean “(∀x) . . . x . . . ”.
I.4.11 Definition (Universal Closure). Let y1 , . . . , yn be the list of all free vari-
ables of A. The universal closure of A is the formula (∀y1 )(∀y2 ) · · · (∀yn )A –
often written more simply as (∀y1 y2 . . . yn )A or even (∀yn )A.
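As an illustration of Definition I.4.11, the following Python sketch computes the free variables of a formula and builds its universal closure from the primary symbols, writing (∀y)A as its abbreviation ¬(∃y)¬A. The tuple encoding of formulas, the restriction of terms to bare variable names, and the function names are hypothetical choices made for this sketch only; the free variables are quantified in sorted order, which is just one way of listing y1, . . . , yn.

```python
# Hypothetical encoding of formulas over the primary symbols:
#   ("pred", "P", ("x", "y"))  atomic formula P x y (terms are variable names here)
#   ("not", A), ("or", A, B), ("exists", "x", A)

def free_vars(A):
    """Return the set of variables that have a free occurrence in A."""
    tag = A[0]
    if tag == "pred":
        return set(A[2])
    if tag == "not":
        return free_vars(A[1])
    if tag == "or":
        return free_vars(A[1]) | free_vars(A[2])
    if tag == "exists":
        return free_vars(A[2]) - {A[1]}
    raise ValueError(f"unknown connective {tag!r}")


def forall(y, A):
    """(∀y)A as the abbreviation ¬(∃y)¬A."""
    return ("not", ("exists", y, ("not", A)))


def universal_closure(A):
    """Prefix (∀y1)...(∀yn) for the free variables y1 < ... < yn of A."""
    # Wrap the innermost quantifier first, so the smallest name ends up outermost.
    for y in sorted(free_vars(A), reverse=True):
        A = forall(y, A)
    return A
```

For example, A ≡ P x y ∨ (∃x)Q x z has free variables x, y, z, so the sketch returns (∀x)(∀y)(∀z)A, which has no free variables; a sentence is returned unchanged.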
Pause. We said the universal closure. Hopefully, the remark immediately above
is robust to permutation of (∀y1 )(∀y2 ) · · · (∀yn ). Is it? (Exercise 1.10.)
The reader may wish to review I.3.12 and the remark following it.
Proof. We illustrate the proof for n = 2. What makes it interesting is the re-
quirement to have “simultaneous substitution”. To that end we first substitute
into x1 and x2 new variables z, w – i.e., not occurring in either A or in the ti .
The proof is the following sequence. Comments justify, in each case, the pres-
ence of the formula immediately to the left by virtue of the presence of the
immediately preceding formula:
We often write this (under the stated conditions) as (∃x) A [x] ↔ (∃z) A [z].
By the way, another way to state the conditions is “if z does not occur in A
(i.e., is neither free nor bound in A ), and is different from x”. Of course, if
z ≡ x, then there is nothing to prove.
A → (∃z) A [x← z]
Hence, by ∃-introduction
(∃x)A → (∃z) A [x← z] (2)
Tautological implication from (1) and (2) concludes the argument.
I.4.14 Definition. In the sequel we will often discuss two (or more) theories at
once. Let T = (L, Λ, I, T) and T′ = (L′, Λ′, I, T′) be two theories such that
V ⊆ V′. This enables T′ to be “aware” of all the formulas of T (but not
vice versa, since L′ may contain additional nonlogical symbols – the case where
V ≠ V′).
We say that T′ is an extension of T, in symbols T ≤ T′, iff T ⊆ T′.
Let A be a formula over L (so that both theories are aware of it). The
symbols ⊢_T A and ⊢_T′ A are synonymous with A ∈ T and A ∈ T′
respectively.
Note that we did not explicitly mention the nonlogical axioms Γ or Γ′ to the
left of ⊢, since the subscript of ⊢ takes care of that information.
We say that the extension is conservative iff for any A over L, whenever
⊢_T′ A it is also the case that ⊢_T A. That is, when it comes to formulas over
the language (L) that both theories understand, the new theory does not
do any better than the old in producing theorems.
Pause. What does the restriction on the xi have to do with the claim above?
Modus ponens. Here ⊢_T B[e1, . . . , en] → A[e1, . . . , en] and ⊢_T
B[e1, . . . , en]. By I.H., ⊢_T B[y1, . . . , yn] → A[y1, . . . , yn] and ⊢_T
B[y1, . . . , yn], where y1, . . . , yn occur nowhere in B[e1, . . . , en] →
A[e1, . . . , en] as either free or bound variables. By modus ponens, ⊢_T
A[y1, . . . , yn]; hence ⊢_T A[x1, . . . , xn] by I.4.12 (and I.4.13).
∃-introduction. We have ⊢_T B[e1, . . . , en] → C[e1, . . . , en], z is not free in
C[e1, . . . , en], and A[e1, . . . , en] ≡ (∃z)B[e1, . . . , en] → C[e1, . . . , en]. By
the I.H., if w1, . . . , wn – distinct from z – occur nowhere in B[e1, . . . , en] →
C[e1, . . . , en] as either free or bound, then we get ⊢_T B[w1, . . . , wn] →
C[w1, . . . , wn]. By ∃-introduction we get ⊢_T (∃z)B[w1, . . . , wn] →
C[w1, . . . , wn]. By I.4.12 and I.4.13 we get ⊢_T (∃z)B[x1, . . . , xn] →
C[x1, . . . , xn], i.e., ⊢_T A[x1, . . . , xn].
The following corollary stems from the proof (rather than the statement)
of I.4.15 and I.4.16, and is important.
I.4.20 Remark. (1) Is the restriction that A must be closed important? Yes.
Let A ≡ x = a, where “a” is some constant. Then, even though A ⊢ (∀x)A
by generalization, it is not always true† that ⊢ A → (∀x)A. This follows from
soundness considerations (next section). Intuitively, assuming that our logic
“doesn’t lie” (that is, it proves no “invalid” formulas), we immediately infer
that x = a → (∀x)x = a cannot be absolutely provable, for it is a “lie”. It fails
at least over N, if a is interpreted to be “0”.
(2) I.4.16 adds flexibility to applications of the deduction theorem:
⊢_T (A → B)[x1, . . . , xn]     (∗)
where x1, . . . , xn is the list of all free variables just in A, is equivalent
(by I.4.16) to
⊢_T (A → B)[e1, . . . , en]     (∗∗)
where e1, . . . , en are new constants added to V (with no effect on the nonlogical
axioms: Γ′ = Γ). Now, since A[e1, . . . , en] is closed, proving
Γ + A[e1, . . . , en] ⊢ B[e1, . . . , en]
establishes (∗∗), hence also (∗).
In practice, one does not perform this step explicitly, but ensures that,
throughout the Γ + A-proof, whatever free variables were present in A
“behaved like constants”, or, as we also say, were “frozen”.
(3) In some expositions the deduction theorem is not constrained by requiring
that A be closed (e.g., Bourbaki (1966b), and more recently Enderton (1972)).
Which version is right? Both are, in their respective contexts. If all the
primary rules of inference are “propositional” (e.g., as in Bourbaki (1966b) and
Enderton (1972), who only employ modus ponens) – that is, these rules do not
meddle with quantifiers – then the deduction theorem is unconstrained. If, on
the other hand, full generalization, namely, A ⊢ (∀x)A, is a permissible rule
(primary or derived), then one cannot avoid constraining the application of the
† That is, it is not true in the metatheory that we can prove A → (∀x)A without nonlogical
axioms (absolutely).
deduction theorem, lest one want to derive (the invalid) A → (∀x)A from
the valid A ⊢ (∀x)A.
This also entails that approaches such as in Bourbaki (1966b) and Enderton
(1972) do not derive full generalization. They only allow a weaker rule, “if
⊢ A, then ⊢ (∀x)A”.†
(4) This divergence of approach in choosing rules of inference has some
additional repercussions. One has to be careful in defining the semantic counterpart
of ⊢, namely, |= (see next section). One wants the two symbols to track
each other faithfully (Gödel’s completeness theorem).‡
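The counterexample in item (1) can be checked concretely. The Python sketch below evaluates x = a → (∀x)x = a in a finite structure in which a is interpreted as 0, with the value of x passed explicitly. The domain {0, 1, 2} is only a stand-in for N (the failure needs nothing more than the two elements 0 and 1), and the function name is hypothetical.

```python
def instance_holds(x_value, domain, a_value=0):
    """Truth value of  x = a → (∀x) x = a  when x is given the value x_value
    and the constant a is interpreted as a_value."""
    antecedent = x_value == a_value
    # (∀x) x = a: every element of the domain must equal the value of a
    consequent = all(y == a_value for y in domain)
    return (not antecedent) or consequent
```

Here instance_holds(0, range(3)) is False, since 0 = 0 holds while (∀x)x = 0 fails at 1. In the one-element domain [0] the formula comes out true, which is why the text says it is not always true rather than always false.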
The following is important enough to merit stating. It follows from the type
of argument we employed in the only-if part above.
A → B ⊢ (∃x)A → (∃x)B
† Indeed, they allow a bit more generality, namely, the rule “if Γ ⊢ A with a side condition, then
Γ ⊢ (∀x)A. The side condition is that the formulas of Γ do not have free occurrences of x”. Of
course, Γ can always be taken to be finite (why?), so that this condition is not unrealistic.
‡ In Mendelson (1987) |= is defined inconsistently with ⊢.
A → B ⊢ (∀x)A → (∀x)B
(A → B) ↔ (A ∨ B ↔ B)     (i)
If we think of “A ∨ B ” as “max(A, B )”, then the right hand side in (i) above
says that B is the maximum of A and B . Or that A is “less than or equal to”
B . The above metatheorems say that both ∃ and ∀ preserve this “inequality”.
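The “inequality” reading can be tested by brute force with the truth values ordered f < t (Python’s False < True). The sketch below checks (i) row by row, reading A ∨ B as max(A, B). It then checks, for every pair of unary predicates A, B on a small finite domain with A(x) ≤ B(x) for all x, that (∃x)A ≤ (∃x)B and (∀x)A ≤ (∀x)B. This is only a semantic sanity check on finite structures, not a substitute for the metatheorems, and the function names are hypothetical.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def identity_i_holds():
    """Check (A → B) ↔ (A ∨ B ↔ B) on all four rows, with A ∨ B read as max(A, B)."""
    for a, b in product([False, True], repeat=2):
        lhs = implies(a, b)
        rhs = max(a, b) == b          # A ∨ B ↔ B
        if not (lhs == rhs == (a <= b)):
            return False
    return True


def quantifiers_preserve_order(n):
    """For all predicates A, B on an n-element domain with A ≤ B pointwise,
    check that ∃ (any) and ∀ (all) preserve the order."""
    preds = list(product([False, True], repeat=n))
    for A in preds:
        for B in preds:
            if all(a <= b for a, b in zip(A, B)):     # (∀x)(A → B) holds
                if not (any(A) <= any(B) and all(A) <= all(B)):
                    return False
    return True
```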
Proof by cases usually benefits from the application of the deduction theorem.
That is, having established A 1 ∨ · · · ∨ A n , one then proceeds to adopt,
in turn, each A i (i = 1, . . . , n) as a new nonlogical axiom (with its variables
“frozen”). In each case (A i ) one proceeds to prove B . At the end of all this
one has established B .
In practice we normally use the following argot:
“We will consider cases A i , for i = 1, . . . , n.
Case A 1 . . . . therefore, B .†
...
Case A n . . . . therefore, B .”
The technique that flows from this metatheorem is used often in practice. For
example, in projective geometry axiomatized as in Veblen and Young (1916), in
order to prove Desargues’s theorem on perspective triangles on the plane we use
some arbitrary point (this is the auxiliary constant) off the plane, having verified
that the axioms guarantee that such a point exists. It is important to note that De-
sargues’s theorem does not refer to this point at all – hence the term “auxiliary”.
I.5. Semantics
So what do all these symbols mean? We show in this section how to decode
the formal statements (formulas) into informal statements of real mathematics.
Conversely, this will entail an understanding of how to code statements of real
mathematics in our formal language.
† That is, we add the axiom A1 to Γ, freezing its variables, and we then prove B.
A^n (or A × · · · × A, with n factors)
for the set of ordered n-tuples of members of A. We will also use the symbols
⊆, ∪, ⋃_{a∈I}.‡
† One often says “The formal definition of semantics . . .”, but the word “formal” is misleading
here, for we are actually defining semantics in the metatheory (in “real” mathematics), not in
some formal theory.
‡ If we have a set of sets {Sa, Sb, Sc, . . . }, where the indices a, b, c, . . . all come out of an index
set I, then the symbol ⋃_{i∈I} Si stands for the collection of all those objects x that are found in at
least one of the sets Si. It is a common habit to write ⋃_{i=0}^∞ Si instead of ⋃_{i∈N} Si. A ∪ B is the
same as ⋃_{i∈{1,2}} Si, where we have let S1 = A and S2 = B.
§ Often the qualification “of discourse” is added to the terms “domain” and “universe”.
¶ Requiring f^I to be total is a traditional convention. By the way, total means that f^I is defined
everywhere on M^n.
# Thus P^I is an n-ary relation with inputs and outputs in M.
I.5.6 Definition. For any closed formula A in Wff(M) we define the symbol
A^I inductively. In all cases, A^I ∈ {t, f}:
(1) If A ≡ t = s, where t and s are closed M-terms, then A^I = t iff
t^I = s^I. (The last two occurrences of “=” are metamathematical.)
(2) If A ≡ Pt1 . . . tn, where P is an n-ary predicate and the ti are closed
M-terms, then A^I = t iff ⟨t1^I, . . . , tn^I⟩ ∈ P^I, or P^I(t1^I, . . . , tn^I) holds.
(Or “is true”; see p. 20. Of course, the last occurrence of “=” is metamathematical.)
(3) If A is any of the sentences ¬B, B ∨ C, then A^I is determined by
the usual truth tables (see p. 31) using the values B^I and C^I. That is,
(¬B)^I = F¬(B^I) and (B ∨ C)^I = F∨(B^I, C^I). (The last two occurrences
of “=” are metamathematical.)
(4) If A ≡ (∃x)B, then A^I = t iff (B[x ← i])^I = t for some i ∈ M.
(The last two occurrences of “=” are metamathematical.)
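When the domain M is finite, Definition I.5.6 translates directly into a recursive evaluator, since case (4) can be decided by trying every i ∈ M. In the Python sketch below, an imported constant for i ∈ M is encoded as ("elem", i), the truth values t and f are Python’s True and False, and the dictionary layout of the interpretation I (constants, functions, and predicates given as sets of tuples, mirroring ⟨t1^I, . . . , tn^I⟩ ∈ P^I) is a hypothetical choice. The substitution B[x ← i] needs no renaming of bound variables because the substituted term is closed.

```python
# Hypothetical encoding:
#   terms:    ("var", x), ("const", a), ("elem", i) for imported i, ("app", f, (t1, ..., tn))
#   formulas: ("eq", t, s), ("pred", P, (t1, ..., tn)), ("not", B), ("or", B, C), ("exists", x, B)

def subst_term(t, x, i):
    """t[x ← i], where i is an element of M imported as a closed term."""
    if t[0] == "var":
        return ("elem", i) if t[1] == x else t
    if t[0] == "app":
        return ("app", t[1], tuple(subst_term(s, x, i) for s in t[2]))
    return t  # constants and imported elements contain no variables


def subst(A, x, i):
    """A[x ← i]; only free occurrences of x are replaced."""
    tag = A[0]
    if tag == "eq":
        return ("eq", subst_term(A[1], x, i), subst_term(A[2], x, i))
    if tag == "pred":
        return ("pred", A[1], tuple(subst_term(s, x, i) for s in A[2]))
    if tag == "not":
        return ("not", subst(A[1], x, i))
    if tag == "or":
        return ("or", subst(A[1], x, i), subst(A[2], x, i))
    if tag == "exists":
        return A if A[1] == x else ("exists", A[1], subst(A[2], x, i))
    raise ValueError(f"unknown connective {tag!r}")


def term_value(t, I):
    """t^I for a closed term t."""
    if t[0] == "const":
        return I["const"][t[1]]
    if t[0] == "elem":
        return t[1]
    if t[0] == "app":
        return I["func"][t[1]](*(term_value(s, I) for s in t[2]))
    raise ValueError(f"term is not closed: {t!r}")


def value(A, M, I):
    """A^I for a closed formula A, following cases (1)-(4) of I.5.6."""
    tag = A[0]
    if tag == "eq":                                   # case (1)
        return term_value(A[1], I) == term_value(A[2], I)
    if tag == "pred":                                 # case (2)
        return tuple(term_value(s, I) for s in A[2]) in I["pred"][A[1]]
    if tag == "not":                                  # case (3)
        return not value(A[1], M, I)
    if tag == "or":                                   # case (3)
        return value(A[1], M, I) or value(A[2], M, I)
    if tag == "exists":                               # case (4)
        return any(value(subst(A[2], A[1], i), M, I) for i in M)
    raise ValueError(f"unknown connective {tag!r}")
```

For instance, with M = {0, . . . , 4}, 0^I = 0, S^I(n) = (n + 1) mod 5 and <^I the usual order, the sentence (∃x)Sx = 0 evaluates to t (take x = 4), while (∃x)x < x evaluates to f.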
Towards the soundness result‡ below we look at two tedious (but easy)
lemmata.
I.5.12 Metatheorem (Soundness). Any first order theory (identified by its nonlogical axioms) Γ, over some language L, is sound.
† For a metamathematical relation Q, as usual (p. 20), Q(a, b, . . . ) = t, or just Q(a, b, . . . ), stands
for ⟨a, b, . . .⟩ ∈ Q.
because |=Taut A , the second because after prime formulas have been
taken care of, all that remains to be done for the evaluation of A I is
to apply Boolean connectives – see I.5.6(3)).
(B[t])^I = t     (1)
and
((∃x)B)^I = f     (2)
Let t^I = i (i ∈ M). By I.5.11 and (1), (B[i])^I = t. By I.5.6(4),
|=M B → C     (3)
Let (∃x)B → C be an M-instance such that (despite expectations)
((∃x)B)^I = t but
C^I = f     (4)
Thus
(B[i])^I = t     (5)
for some i ∈ M. Since x is not free in C, B[i] → C is a false (by (4) and
(5)) M-instance of B → C, contradicting (3).
I.5.14 Corollary. Any first order theory that has a model is consistent.
⟨i1, . . . , in⟩ ∈ S iff |=M S(i1, . . . , in)
N.B. Some authors say “(first order) expressible” (Smullyan (1992)) rather
than “(first order) definable” in a structure.
In the context of M, the above definition gives precision to statements such
as “we code (or translate) an informal statement into the formal language” or
“the (formal language) formula A informally ‘says’ . . . ”, since any (informal)
“statement” (or relation) that depends on the informal variables x1, . . . , xn has
the form “⟨x1, . . . , xn⟩ ∈ S” for some (informal) set S. It also captures the
essence of the statement “The (informal) statement ⟨x1, . . . , xn⟩ ∈ S can be
written (or can be made) in the formal language.”
It must be said that translation is not just an art or skill. There are theoretical
limitations to translation. The trivial limitation is that if M is an infinite set and,
say, L has a finite set of nonlogical symbols (as is the case in arithmetic and
set theory), then we cannot define all S ⊆ M, simply because we do not have
enough first order formulas to do so.
There are non-trivial limitations too. Some sets are not first order definable
because their definitions are “far too complex” (the reader who wants more
on this comment may wish to look up the section on definability and incom-
pletableness in volume 1 of these lectures (Mathematical Logic)).
be
members of M. The notation A[[i1, . . . , in]] is an abbreviation of
(A[i1, . . . , in])^I.
This argot allows one to substitute informal objects into variables outright,
by-passing the procedure of importing formal names for such objects into the
language. It is noteworthy that mixed mode formulas can be defined directly by
induction on formulas – that is, without forming L(M) first – as follows:
Let L and M be as above. Let x1 , . . . , xn contain all the free variables that
appear in a term t or formula A over L (not over L(M)). Let i 1 , . . . , i n be
arbitrary in M.
For terms we define
t[[i1, . . . , in]] =
    i_j                                                      if t ≡ x_j (1 ≤ j ≤ n)
    a^I                                                      if t ≡ a
    f^I(t1[[i1, . . . , in]], . . . , tr[[i1, . . . , in]])       if t ≡ f t1 . . . tr
A[[i1, . . . , in]] =
    t[[i1, . . . , in]] = s[[i1, . . . , in]]                    if A ≡ t = s
    P^I(t1[[i1, . . . , in]], . . . , tr[[i1, . . . , in]])      if A ≡ Pt1 . . . tr
    ¬B[[i1, . . . , in]]                                      if A ≡ ¬B
    B[[i1, . . . , in]] ∨ C[[i1, . . . , in]]                     if A ≡ B ∨ C
    (∃a ∈ M)B[[a, i1, . . . , in]]                             if A ≡ (∃z)B[z, xn]
where “(∃a ∈ M) . . . ” is short for “(∃a)(a ∈ M ∧ . . . )”. The right hand side
of = has no free (informal) variables, thus it evaluates to t or f.
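The mixed-mode clauses above avoid importing names altogether: the values i1, . . . , in simply travel alongside the formula. A minimal Python sketch of this, assuming a hypothetical tuple encoding of terms and formulas, represents the assignment xj ↦ ij as a dictionary and, in the ∃ case, extends it by binding the quantified variable to each a ∈ M in turn, so no substitution is ever performed. The interpretation I is again a hypothetical dictionary of constants, functions, and predicates (as sets of tuples).

```python
# Hypothetical encoding:
#   terms:    ("var", x), ("const", a), ("app", f, (t1, ..., tr))
#   formulas: ("eq", t, s), ("pred", P, (t1, ..., tr)), ("not", B), ("or", B, C), ("exists", z, B)
# env maps each free variable x_j to its value i_j in M.

def term_mixed(t, I, env):
    """t[[i1, ..., in]]."""
    if t[0] == "var":
        return env[t[1]]                                    # i_j if t ≡ x_j
    if t[0] == "const":
        return I["const"][t[1]]                             # a^I
    if t[0] == "app":
        return I["func"][t[1]](*(term_mixed(s, I, env) for s in t[2]))
    raise ValueError(f"unknown term {t!r}")


def mixed(A, M, I, env):
    """A[[i1, ..., in]] as True or False."""
    tag = A[0]
    if tag == "eq":
        return term_mixed(A[1], I, env) == term_mixed(A[2], I, env)
    if tag == "pred":
        return tuple(term_mixed(s, I, env) for s in A[2]) in I["pred"][A[1]]
    if tag == "not":
        return not mixed(A[1], M, I, env)
    if tag == "or":
        return mixed(A[1], M, I, env) or mixed(A[2], M, I, env)
    if tag == "exists":
        z, B = A[1], A[2]
        # (∃a ∈ M) B[[a, i1, ..., in]]: rebind z to each a, shadowing any outer value
        return any(mixed(B, M, I, {**env, z: a}) for a in M)
    raise ValueError(f"unknown connective {tag!r}")
```

With M = {0, . . . , 4} and < the usual order, the formula (∃z)z < x holds when x is given the value 3 but not when it is given 0.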
The proof of the semantic completeness of every first order theory hinges on
the consistency theorem, which we state without proof below.† The complete-
ness theorem will then be derived as a corollary.
Proof. Only-if part. This is trivial, for a model of Γ is a model of any finite
subset.
If part. Suppose that Γ is unsatisfiable (it has no models). Then it is inconsistent
by the consistency theorem. In particular, Γ ⊢ ¬x = x. Since the
pure theory over L is consistent, a Γ-proof of ¬x = x involves a nonempty
finite sequence of nonlogical axioms (formulas of Γ), A1, . . . , An. That is,
A1, . . . , An ⊢ ¬x = x; hence {A1, . . . , An} has no model (by soundness).
This contradicts the hypothesis that every finite subset of Γ has a model.
At one extreme, ZFC set theory’s intended model is so huge that it is not
even a set (its domain, that is, is not). At the other extreme, set theory has only
two primary nonlogical symbols; hence, if we believe that it is consistent,† it has
A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B)
or
y = √x ↔ x = y · y
† In practice we state the above definition in argot, probably as “A ⊆ B means that, for all x, we
have x ∈ A → x ∈ B”.
† Uniqueness follows from extensionality, while existence follows from separation. These facts –
and the italicized terminology – are found in Chapter III.
‡ U is a 1-ary (unary) predicate. It is one of the two primitive nonlogical symbols of formal set
theory. With the help of this predicate we can test an object for set or atom status. “ U (y)” asserts
that y is an atom; thus “¬U (y)” asserts that y is a set – since we accept that sets or atoms are the
only types of objects that the formal system axiomatically characterizes.
§ “Basic” means here the language given originally, before any new symbols were added.
¶ Recall that (see Remark I.1.11, p. 19) the notation Q (xn ) asserts that xn , i.e., x1 , . . . , xn is the
complete list of the free variables of Q .
# Recall that predicate letters are denoted by non-calligraphic capital letters P, Q, R with or without
subscripts or primes.
to Γk as the defining axiom for P. “⊆” is such a defined (2-ary) predicate in set
theory.
Similarly, a new n-ary function symbol f is added into Lk (to form Lk+1) by
a definition of its behaviour. That is, we add f to Lk and also add the following
formula (ii) to Γk as a new nonlogical axiom
y = f y1 . . . yn ↔ Q(y, y1, . . . , yn)     (ii)
Depending on the theory and on the number of free variables (n ≥ 0), “f” may
take theory-specific names such as ∅, ω, √, etc. (in this illustration, for the
sake of economy of effort, we have thought of defined constants, e.g., ∅ and ω,
as 0-ary functions).
In effecting these definitions, we want to be assured of two things:
(1) Whatever we can say in the richer language Lk (for any k > 0) we can also
state in the original (basic) language L = L0 (although awkwardly, which
justifies our doing all this). “Can be stated” means that we can translate any
formula F over Lk (hopefully in a “natural” way) into a formula F∗ over
L so that the extended theory Γk can prove that F and F∗ are equivalent.†
(2) We also want to be assured that the new symbols offer no more than convenience,
in the sense that any formula F over the basic language L deducible
from Γk (k > 0), one way or another (perhaps with the help of defined symbols),
is also deducible from Γ.‡
These assurances will become available shortly, as Metatheorems I.6.1 and I.6.3.
Here are the “natural” translation rules that take us from a language stage L k+1
† Γ, spoken over L, can have no opinion, of course, since it cannot see the new symbols, nor does
it have their definitions among its “knowledge”.
‡ Trivially, any F over L that Γ can prove, any Γk (k > 0) can prove as well, since the latter
understands the language (L) and contains all the axioms of Γ. Thus Γk extends the theory Γ.
That it cannot have more theorems over L than Γ makes this extension conservative.
back to the previous, L k (so that, iterating the process, we get back to L):
Rule (1). Suppose that F is a formula over Lk+1, and that the predicate
P (whose definition took us from Lk to Lk+1, and hence is a symbol of
Lk+1 but not of Lk) occurs in F zero or more times. Assume that P has
been defined by the axiom (i) above (included in Γk+1), where Q is a
formula over Lk. We eliminate P from F by replacing all its occurrences
by Q. That is, whenever P tn is a subformula of F, all its occurrences are
replaced by Q(tn). We can always arrange by I.4.13 that the simultaneous
substitution Q[xn ← tn] is defined. This results in a formula F∗ over Lk.
Rule (2). If f is a defined n-ary function symbol as in (ii) above, introduced
into L k+1 , and if it occurs in F as F [ f t1 . . . tn ],† then this formula is
logically equivalent to‡
(∃y)(y = f t1 . . . tn ∧ F [y]) (iv)
provided that y is not free in F [ f t1 . . . tn ]. Using the definition of f
given by (ii), and I.4.13 to ensure that Q (y, tn ) is defined, we eliminate
this occurrence of f , writing (iv) as
(∃y)(Q (y, t1 , . . . , tn ) ∧ F [y]) (v)
which says the same thing as (iv) in any theory that thinks that (ii) is
true (this observation is made precise in the proof of Metatheorem I.6.1).
Of course, f may occur many times in F , even “within itself”, as in
f f z 1 . . . z n y2 . . . yn ,§ or even in more complicated configurations. Indeed,
it may occur within the scope of a quantifier. So the rule becomes: Apply the
transformation taking every atomic subformula A [ f t1 . . . tn ] of F into
the form (v) by stages, eliminating at each stage the leftmost-innermost¶
occurrence of f (in the atomic formula we are transforming at this stage),
until all occurrences of f are eliminated. We now have a formula F ∗ over
Lk.
† This notation allows for the possibility that f t1 . . . tn does not occur at all in F (see the convention
on brackets, p. 19).
‡ See (C) in the proof of Metatheorem I.6.1 below.
§ Or f ( f (z 1 , . . . , z n ), y2 , . . . , yn )), using brackets and commas to facilitate reading.
¶ A term f t1 . . . tn is innermost iff none of the ti contains “ f ”.
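Rule (2), applied to a single atomic formula, is mechanical enough to sketch in Python. The fragment below repeatedly picks the leftmost-innermost occurrence f t1 . . . tn, replaces it by a fresh variable y, and wraps the result as (∃y)(Q(y, t1, . . . , tn) ∧ . . . ), producing a string. Terms are encoded as variable names or (symbol, arguments) pairs; this encoding, the renderer, the fresh names y1, y2, . . . (assumed not to occur in the input), and the callable that stands in for the defining formula Q are all hypothetical.

```python
def show(term):
    """Render a term, writing f t1 ... tn as f(t1, ..., tn)."""
    if isinstance(term, str):
        return term
    head, args = term
    return f"{head}({', '.join(show(a) for a in args)})"


def replace_first(term, fname, y):
    """Replace the leftmost-innermost subterm headed by fname with the variable y.
    Returns (new_term, args_of_replaced_occurrence), or (term, None) if fname is absent."""
    if isinstance(term, str):
        return term, None
    head, args = term
    for k, a in enumerate(args):
        new_a, found = replace_first(a, fname, y)
        if found is not None:
            return (head, args[:k] + (new_a,) + args[k + 1:]), found
    if head == fname:          # no argument contains fname, so this occurrence is innermost
        return y, args
    return term, None


def eliminate(atom, fname, Q, counter=None):
    """Apply Rule (2) to an atomic formula (predicate, args) until fname is gone."""
    if counter is None:
        counter = [0]
    pred, args = atom
    y = f"y{counter[0] + 1}"   # candidate fresh variable, used only if an occurrence is found
    for k, a in enumerate(args):
        new_a, found = replace_first(a, fname, y)
        if found is not None:
            counter[0] += 1
            rest = eliminate((pred, args[:k] + (new_a,) + args[k + 1:]), fname, Q, counter)
            return f"(∃{y})({Q(y, [show(t) for t in found])} ∧ {rest})"
    return show(atom)
```

For F ≡ P(f(f(z))), with Q rendered as Q(y, . . . ), the sketch yields (∃y1)(Q(y1, z) ∧ (∃y2)(Q(y2, y1) ∧ P(y2))): the inner occurrence f z is eliminated first, as the leftmost-innermost strategy prescribes.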
(b) Let F[x] be over L, and let t stand for f t1 . . . tn, where f is introduced
by (ii) above as an axiom that extends Γ into Γ′. Assume that no ti contains
the letter f and that y is not free in F[t]. Then†
Γ′ ⊢ F[t] ↔ (∃y)(Q(y, tn) ∧ F[y])
Proof. First observe that this metatheorem indeed gives the assurance that, after
applying the transformations (1) and (2) to obtain F∗ from F, the extended theory
thinks that the two are equivalent.
(a): This follows immediately from the Leibniz rule (I.4.25).
(b): Start with
F [t] → t = t ∧ F [t] (by t = t and |=Taut -implication) (A)
Now, by Ax2, substitutability, and non-freedom of y in F [t],
t = t ∧ F [t] → (∃y)(y = t ∧ F [y])
Hence
F [t] → (∃y)(y = t ∧ F [y]) (B)
by (A) and |=Taut -implication.‡
Conversely,
y = t → (F [y] ↔ F [t]) (Ax4; substitutability was used here)
Hence (by |=Taut )
y = t ∧ F [y] → F [t]
Therefore, by ∃-introduction (allowed, by our assumption on y),
(∃y)(y = t ∧ F [y]) → F [t]
Finally, by (ii) (which introduces Γ′ to the left of ⊢), (C), and the Leibniz rule,
I.6.2 Remark (One Point Rule). The absolutely provable formula in (C) above
is sometimes called the one point rule (Gries and Schneider (1994), Tourlakis
(2000a, 2000b, 2001)). Its “dual”
is also given the same nickname and is easily (absolutely) provable using (C)
by eliminating ∃.
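The one point rule, together with what we take to be its dual, (∀y)(y = t → F[y]) ↔ F[t], can be sanity-checked semantically by brute force. The Python sketch below tries, on a small finite domain, every unary predicate F and every value of t. It checks truth in finite structures only and is not a proof of absolute provability; the function name is hypothetical.

```python
from itertools import product


def one_point_rule_holds(domain):
    """For every F ⊆ domain and every value t, compare F(t) with
    (∃y)(y = t ∧ F(y)) and with (∀y)(y = t → F(y))."""
    domain = list(domain)
    for bits in product([False, True], repeat=len(domain)):
        F = dict(zip(domain, bits))          # one unary predicate on the domain
        for t in domain:
            exists_form = any(y == t and F[y] for y in domain)
            forall_form = all(y != t or F[y] for y in domain)
            if not (F[t] == exists_form == forall_form):
                return False
    return True
```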
(∃!y)R(y, x1 , . . . , xn ) (∗)
y = f x1 . . . xn ↔ R(y, x1 , . . . , xn ) (∗∗)
Γ |= F     (1)
|=M Γ     (2)
We now expand the structure M into M′ = (M, I′) – without adding any new
individuals to its domain M – by adding an interpretation, P^I′, for the new
symbol P. We define for every a1, . . . , an in M
P^I′(a1, . . . , an) = t iff |=M Q(a1, . . . , an) [i.e., iff |=M′ Q(a1, . . . , an)]
Clearly then, M′ is a model of the new axiom, since, for all M′-instances of the
axiom – such as P(a1, . . . , an) ↔ Q(a1, . . . , an) – we have
(P(a1, . . . , an) ↔ Q(a1, . . . , an))^I′ = t
It follows that |=M′ Γ′, since we have |=M′ Γ, the latter by (2), due to having
made no changes to M that affect the symbols of L. Thus, Γ′ ⊢ F yields
|=M′ F; hence, since F is over L, |=M F. Along with (2), this proves (1).
|=M (∃!y)R(y, x1, . . . , xn)
We now expand the structure M into M′ = (M, I′),† so that all we add to it
is an interpretation for the new function symbol f. We let f^I′ = f.
From (2) it follows that
|=M′ Γ     (2′)
Now (∗∗), (2′) and (4) yield |=M′ Γ′, which implies |=M′ F (from Γ′ ⊢ F).
Finally, since F contains no f, |=M F. This last result and (2) give (1).
I.6.4 Remark.
(a) We note that translation rules (1) and (2) – the latter applied to atomic
subformulas – preserve the syntactic structure of quantifier prefixes. For example,
suppose that we have introduced f in set theory by
y = f x1 . . . xn ↔ Q(y, x1, . . . , xn)     (5)
which still has the ∀∃-prefix and still looks exactly like a collection axiom
hypothesis.
(b) Rather than worrying about the ontology of the function symbol formally
introduced by (5) above – i.e., the question of the exact nature of the symbol
The “z” in (8) above is a bound variable.† This new type of term is read “the
unique z such that . . . ”.
This “ι” is not one of our primitive symbols.‡ It is just meant to lead to the
friendly shorthand (8) above that avoids the ontology issue.
Thus, once one proves
which, of course, is an alias for axiom (5), using more suggestive notation for
the term f x1 , . . . , xn .
By (9), axioms (5) or (5′) can be replaced by
Q ( f x1 , . . . , xn , x1 , . . . , xn )
and
respectively. For example, from (5′) we get (10) by substitution. Now, Ax4
(with some help from |=Taut ) yields
Q((ιz)Q(z, x1, . . . , xn), x1, . . . , xn) →
    (y = (ιz)Q(z, x1, . . . , xn) → Q(y, x1, . . . , xn))
Hence, assuming (10),
The indefinite article. We often have the following situation: We have proved a
statement like
A( f y1 . . . yn , y1 , . . . , yn ) (3)
Γ + A(f y1 . . . yn, y1, . . . , yn) ⊢ B     (4)
† Cf. II.4.1.
such that f , the new function symbol, occurs nowhere in B , i.e., the latter
formula is over L. We can conclude then that
Γ ⊢ B     (5)
|=M Γ     (6)
and show
|=M B (7)
f^I′ = f
(f i1 . . . in)^I′ = f(i1, . . . , in) = a(i1, . . . , in)
and the right hand side of the above is true by the choice of a(i1, . . . , in).
Thus, |=M′ Γ + A(f y1 . . . yn, y1, . . . , yn); hence |=M′ B, by (4).
Since B contains no f, we also have |=M B; thus we have established (7)
from (6). We now have (5).
One can give a number of names to a function like f : A Skolem function,
an ε-term (Hilbert and Bernays (1968)), or a τ -term (Bourbaki (1966b)). In
the first case one may ornament the symbol f, e.g., f_∃A, to show where it is
coming from, although such mnemonic naming is not, of course, mandatory.
The last two terminologies actually apply to the term f y1 . . . yn , rather than to
the function symbol f .
Hilbert would have written
and Bourbaki
(τx)A(x, y1, . . . , yn)     (9)
each denoting f y1 . . . yn . The “x” in each of (8) and (9) is a bound variable
(different from each yi ).
The fact that a formula M(x) might formally denote a collection that is not a set
is perfectly consistent with our purposes. After all, the intended interpretation
of set theory has such a non-set collection as its universe.
† The subscript “i” is a weak attempt on my part to keep reminding us throughout this section that
L i and Ti are to implement an interpretation of L.
‡ See II.2.1.
The conditions in I.7.1(ii) and I.7.1(iii) simply say that the universe {x : M } is
closed under constants (i.e., contains the interpreting constants, a I ) and under
the interpreting functions, f I .
Some authors will not assume that L i already has enough nonlogical symbols
to effect the mapping I as plainly as in the definition above. They will instead
say that, for example, to any n-ary f of L, I will assign a formula A(y, xn )
of L i such that
Ti ⊢ M(x1) ∧ · · · ∧ M(xn) → (∃!y)(M(y) ∧ A(y, xn))
In view of our work in the previous section, this would be an unreasonably
roundabout way for us to tell the story.
Similarly, the results of Section I.6 allow us, without loss of generality, to
always assume that the formula M in an interpretation I = (. . . , M, . . . ) is
atomic, P x, where P is some unary predicate.
† We thus substitute the syntactic, or formal, requirement of provability for the semantic, or infor-
mal, concept of truth.
We next formalize the extension of I to all terms and formulas (cf. I.5.5
and I.5.6).
The two definitions I.7.2 and I.7.3 are entirely analogous with the definition
of mixed mode formulas (I.5.17). The analogy stands out if we imagine that
“A M ” is some kind of novel notation for “A [[ . . . ]]”. Particularly telling is
the last case (pretend that we have let M = {x : M(x)}, where M may or may
not be a set).
We have restricted the definition to the primary logical symbols. Thus, e.g.,
just as (∀x)A abbreviates ¬(∃x)¬A, we have that ((∀x)A)^M abbreviates
(¬(∃x)¬A)^M, i.e., ¬(∃x)(M(x) ∧ ¬A^M), or, in terms of “∀”, (∀x)(M(x) → A^M).
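Relativization is a purely syntactic transformation, so it is easy to sketch in Python. The fragment below computes A^M on the primary symbols only, sending (∃x)B to (∃x)(M(x) ∧ B^M), with ∧ itself expanded as its abbreviation ¬(¬A ∨ ¬B). Atomic formulas are left unchanged, which assumes (hypothetically) that each symbol of L is interpreted by a symbol of Li with the same name; the encoding and function names are likewise hypothetical. Applied to (∀x)A, written as ¬(∃x)¬A, the sketch produces exactly ¬(∃x)(M(x) ∧ ¬A^M).

```python
# Hypothetical encoding (terms are variable names):
#   ("pred", P, args), ("eq", t, s), ("not", A), ("or", A, B), ("exists", x, A)
# M names the unary predicate that carves out the universe of the interpretation.

def conj(A, B):
    """A ∧ B as the abbreviation ¬(¬A ∨ ¬B)."""
    return ("not", ("or", ("not", A), ("not", B)))


def forall(x, A):
    """(∀x)A as the abbreviation ¬(∃x)¬A."""
    return ("not", ("exists", x, ("not", A)))


def relativize(A, M="M"):
    """A^M: bound every existential quantifier by the formula M(x)."""
    tag = A[0]
    if tag in ("pred", "eq"):
        return A
    if tag == "not":
        return ("not", relativize(A[1], M))
    if tag == "or":
        return ("or", relativize(A[1], M), relativize(A[2], M))
    if tag == "exists":
        x, B = A[1], A[2]
        return ("exists", x, conj(("pred", M, (x,)), relativize(B, M)))
    raise ValueError(f"unknown connective {tag!r}")
```

Reading the result back through the abbreviations gives (∀x)(M(x) → A^M), the form stated in the text.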
A trivial induction (on formulas A over L) proves that A M is a formula
over L i .
is short for
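A minimal Python sketch of the syntactic operation $A \mapsto A^M$ on the primary symbols ¬, ∨, ∃ may help. The tuple encoding of formulas and the names `relativize` and `show` are hypothetical choices for illustration. The sketch also assumes, as in the set-theoretic applications of this section, that each nonlogical symbol is interpreted by itself, so atomic formulas pass through unchanged; in the general case atomic formulas are translated through the map I as well.

```python
# Hypothetical encoding of formulas built from the primary symbols:
#   ("atom", name, args)   e.g. ("atom", "P", ("x",))
#   ("not", A) | ("or", A, B) | ("exists", var, A)
# The output also uses ("and", A, B), read as the usual abbreviation.

def relativize(formula, m="M"):
    """Return A^M: every existential quantifier is guarded by the unary predicate m."""
    tag = formula[0]
    if tag == "atom":
        # Assumption: nonlogical symbols are interpreted by themselves.
        return formula
    if tag == "not":
        return ("not", relativize(formula[1], m))
    if tag == "or":
        return ("or", relativize(formula[1], m), relativize(formula[2], m))
    if tag == "exists":
        var, body = formula[1], formula[2]
        return ("exists", var, ("and", ("atom", m, (var,)), relativize(body, m)))
    raise ValueError(f"not built from primary symbols: {tag!r}")


def show(formula):
    """Render a formula in the book's notation."""
    tag = formula[0]
    if tag == "atom":
        return f"{formula[1]}({', '.join(formula[2])})"
    if tag == "not":
        return f"¬{show(formula[1])}"
    if tag in ("or", "and"):
        op = "∨" if tag == "or" else "∧"
        return f"({show(formula[1])} {op} {show(formula[2])})"
    return f"(∃{formula[1]}){show(formula[2])}"


# (∀x)P(x) abbreviates ¬(∃x)¬P(x); its relativization has the shape given in the text.
forall_p = ("not", ("exists", "x", ("not", ("atom", "P", ("x",)))))
print(show(relativize(forall_p)))  # ¬(∃x)(M(x) ∧ ¬P(x))
```

Only the ∃ clause does any work: the connectives are copied, which is why $A^M$ keeps the Boolean shape of $A$.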
I.7.5 Lemma. Let s and t be terms and A a formula, all over L. Then
$$(s[x \leftarrow t])^M \equiv s^M[x \leftarrow t^M] \quad\text{and}\quad (A[x \leftarrow t])^M \equiv A^M[x \leftarrow t^M].$$
Proof. The details of the two inductions, on terms s and formulas A, are left
to the reader (see the proof of I.5.11).
We only look at one “hard case” in each induction:
Induction on terms s. Let $s \equiv f t_1 t_2 \dots t_n$. Then
To see why (5) holds, freeze the $\vec{x}_r$ and add the axiom $B \equiv M(x_1) \wedge \dots \wedge M(x_r)$ to $T_i$. By the I.H.,
$$\vdash_{T_i + B} M(t_i^M[\vec{x}_r]) \quad \text{for } i = 1, \dots, n$$
We are ready to prove our key result in this connection, namely soundness.
Proof. We want
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(x_n) \to A^M(x_1, \dots, x_n) \qquad (6)$$
for all $A \in \Lambda$. We have several cases.
Ax1. Let $A(\vec{x}_n)$ be a tautology. As the operation $\ldots^M$ does not change the Boolean connectivity of a formula, $A^M(\vec{x}_n)$ is a tautology too. Thus, (6) follows by tautological implication.
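Ax2. Let $A(\vec{x}, \vec{y}, \vec{z}) \equiv B(\vec{x}, t(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)B(\vec{x}, w, \vec{z})$. By I.7.5,
$$A^M(\vec{x}, \vec{y}, \vec{z}) \equiv B^M(\vec{x}, t^M(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)\bigl(M(w) \wedge B^M(\vec{x}, w, \vec{z})\bigr)$$
By I.7.6,
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(y_1) \wedge \dots \to M(t^M(\vec{x}, \vec{y})) \qquad (7)$$
Since
$$M(t^M(\vec{x}, \vec{y})) \wedge B^M(\vec{x}, t^M(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)\bigl(M(w) \wedge B^M(\vec{x}, w, \vec{z})\bigr)$$
is in $\Lambda$ over $L_i$, (7) and tautological implication yield
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(y_1) \wedge \dots \to \Bigl(B^M(\vec{x}, t^M(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)\bigl(M(w) \wedge B^M(\vec{x}, w, \vec{z})\bigr)\Bigr)$$
One more tautological implication gives what we want:
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(y_1) \wedge \dots \wedge M(z_1) \wedge \dots \to A^M(\vec{x}, \vec{y}, \vec{z})$$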
Ax3. Let $A(x) \equiv x = x$. We want $\vdash_{T_i} M(x) \to x = x$, which holds by tautological implication and the fact that $x = x$ is logical over $L_i$.
Ax4. Here $A[\vec{x}_n] \equiv t = s \to (B[x \leftarrow t] \leftrightarrow B[x \leftarrow s])$, where $\vec{x}_n$ includes all the participating free variables. Thus, using I.7.5, (6) translates into
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(x_n) \to \bigl(t^M = s^M \to (B^M[x \leftarrow t^M] \leftrightarrow B^M[x \leftarrow s^M])\bigr)$$
which holds by tautological implication from the instance of the Leibniz axiom over $L_i$, $t^M = s^M \to (B^M[x \leftarrow t^M] \leftrightarrow B^M[x \leftarrow s^M])$.
By ∃-introduction,
$$\vdash_{T_i} (\exists z)\bigl(M(z) \wedge B^M\bigr) \to M(x_1) \to \cdots \to M(x_n) \to C^M$$
It is a shame to call the next result just a “corollary”, for it is the result on which we will base the various relative consistency results in this volume (the sole exceptions being those in Chapter VIII, where we work mostly in the metatheory).
The corollary simply says that if the theory, $T_i$, in which we interpret T is not “broken”, then T is consistent. This is the formal counterpart of the easy half of Gödel’s completeness theorem: If a theory T has a (metamathematical) model,† then it is consistent.
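The corollary is a statement about provability in $T_i$. Its informal, semantic counterpart can be checked on small finite structures: for a relational formula and values of its free variables chosen in M, evaluating $A^M$ in a structure gives the same answer as evaluating $A$ in the substructure carved out by M. The Python sketch below assumes a language with one binary relation (think ∈) and one unary guard predicate; the encoding, the structure, and the function names are hypothetical, and a finite check illustrates the idea without proving anything.

```python
# Hypothetical encoding: ("in", u, v) | ("pred", name, u) | ("not", A) | ("or", A, B)
#                        | ("and", A, B) | ("exists", var, A)

def relativize(f, m="M"):
    """A -> A^M on the primary connectives (atomic formulas are unchanged)."""
    tag = f[0]
    if tag in ("in", "pred"):
        return f
    if tag == "not":
        return ("not", relativize(f[1], m))
    if tag == "or":
        return ("or", relativize(f[1], m), relativize(f[2], m))
    if tag == "exists":
        return ("exists", f[1], ("and", ("pred", m, f[1]), relativize(f[2], m)))
    raise ValueError(tag)


def holds(f, domain, rel, preds, env):
    """Tarski truth of f in the finite structure (domain, rel, preds) under env."""
    tag = f[0]
    if tag == "in":
        return (env[f[1]], env[f[2]]) in rel
    if tag == "pred":
        return env[f[2]] in preds[f[1]]
    if tag == "not":
        return not holds(f[1], domain, rel, preds, env)
    if tag == "or":
        return holds(f[1], domain, rel, preds, env) or holds(f[2], domain, rel, preds, env)
    if tag == "and":
        return holds(f[1], domain, rel, preds, env) and holds(f[2], domain, rel, preds, env)
    if tag == "exists":
        return any(holds(f[2], domain, rel, preds, {**env, f[1]: d}) for d in domain)
    raise ValueError(tag)


# A small structure and a "universe" M inside it.
D = [0, 1, 2, 3]
E = {(0, 1), (1, 2), (3, 2)}
M = {1, 2}
sub_E = {(a, b) for (a, b) in E if a in M and b in M}   # E restricted to M

empty = ("not", ("exists", "y", ("in", "y", "x")))       # "x has no members"

for a in sorted(M):
    in_big = holds(relativize(empty), D, E, {"M": M}, {"x": a})
    in_sub = holds(empty, sorted(M), sub_E, {}, {"x": a})
    print(a, in_big, in_sub)   # the last two columns agree
```

The guard is what matters: element 1 has a member (namely 0) in the whole structure, but none inside M, and $A^M$ only sees the latter.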
† It is a well-established habit not to doubt the metatheory’s reliability, a habit that has had its
critics, including Hilbert, whose metatheory sought “reliability” in simplicity. But we are not
getting into that discussion again.
‡ The → half of ↔ we get for free by an application of the Leibniz axiom.
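(3) $\vdash_{T_i} M(x) \to \bigl(U^M(x) \leftrightarrow U^N(\varphi(x))\bigr)$ (“preservation of U”†)
(4) $\vdash_{T_i} M(x) \wedge M(y) \to \bigl(x \in^M y \leftrightarrow \varphi(x) \in^N \varphi(y)\bigr)$ (“preservation of ∈”)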
If L contains a constant c, then we must also have $\varphi(c^M) = c^N$. This is met in our only application later on by having $c^M = c = c^N$ and $\varphi(c) = c$.
† We write “$U^M$” rather than “$U^I$”, as this will be the habitual notation in the context of set theory.
from which
$$\vdash_T A_1 \to A_n$$
I.7.12 Lemma. Let L be a language with just U and ∈, above, as its nonlogical symbols, and let φ be a formal isomorphism of its two interpretations $\mathcal{I} = (L_i, T_i, M, I)$ and $\mathcal{J} = (L_i, T_i, N, J)$ in the sense of (1)–(4). Then, for every formula $A(\vec{x}_n)$ over L,
$$\vdash_{T_i} M(x_1) \wedge \dots \wedge M(x_n) \to \bigl(A^M(\vec{x}_n) \leftrightarrow A^N(\varphi(x_1), \dots, \varphi(x_n))\bigr)$$
Proof. Induction on formulas. For the atomic ones the statement is just (2)–(4) above. We skip the trivial ∨ and ¬ cases and look into $A(\vec{x}_n) \equiv (\exists y)B(y, \vec{x}_n)$. First,
$$A^M(\vec{x}_n) \equiv (\exists y)\bigl(M(y) \wedge B^M(y, \vec{x}_n)\bigr) \qquad (5)$$
and
$$A^N(\varphi(x_1), \dots, \varphi(x_n)) \equiv (\exists y)\bigl(N(y) \wedge B^N(y, \varphi(x_1), \dots, \varphi(x_n))\bigr) \qquad (6)$$
The top line of our calculation is (6), while the bottom is (5) (within bound
variable renaming); thus we are done by the deduction theorem.
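A finite, semantic analogue of Lemma I.7.12 can be sketched in Python: two small structures for the language {U, ∈}, a map φ that is checked to be a bijection preserving U and ∈ (compare (3) and (4)), and a check that a sample formula has the same truth value at x and at φ(x). The structures, the tuple encoding, and the function names are hypothetical; the lemma itself concerns provability in $T_i$, which no finite check replaces.

```python
from itertools import product

# A structure is (domain, atoms, member_pairs): U is read as `atoms`, ∈ as `member_pairs`.

def is_isomorphism(phi, s, t):
    """phi (a dict) is a bijection from s's domain onto t's that preserves U and ∈."""
    dom_s, u_s, e_s = s
    dom_t, u_t, e_t = t
    if sorted(phi[a] for a in dom_s) != sorted(dom_t):
        return False
    if any((a in u_s) != (phi[a] in u_t) for a in dom_s):          # preservation of U
        return False
    return all(((a, b) in e_s) == ((phi[a], phi[b]) in e_t)           # preservation of ∈
               for a, b in product(dom_s, repeat=2))


def holds(f, s, env):
    """Truth of a formula over {U, ∈} built with ¬, ∨, ∃ (hypothetical tuple encoding)."""
    dom, atoms, members = s
    tag = f[0]
    if tag == "U":
        return env[f[1]] in atoms
    if tag == "in":
        return (env[f[1]], env[f[2]]) in members
    if tag == "not":
        return not holds(f[1], s, env)
    if tag == "or":
        return holds(f[1], s, env) or holds(f[2], s, env)
    if tag == "exists":
        return any(holds(f[2], s, {**env, f[1]: d}) for d in dom)
    raise ValueError(tag)


S = ([0, 1, 2], {0}, {(0, 1), (1, 2)})
T = (["a", "b", "c"], {"a"}, {("a", "b"), ("b", "c")})
phi = {0: "a", 1: "b", 2: "c"}

# "x has a member that is an atom": (∃y)(y ∈ x ∧ U(y)), with ∧ written via ¬ and ∨.
has_atom_member = ("exists", "y",
                   ("not", ("or", ("not", ("in", "y", "x")), ("not", ("U", "y")))))

print(is_isomorphism(phi, S, T))
for a in S[0]:
    print(a, holds(has_atom_member, S, {"x": a}), holds(has_atom_member, T, {"x": phi[a]}))
```

The ∃ case is where the proof's step from (6) to (5) shows up here: since φ is onto, a witness y in one structure corresponds to the witness φ(y) in the other.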
The abbreviation $\widetilde{n}$ is pronounced the numeral n, and it stands for
$$\underbrace{S \cdots S}_{n \text{ of them}} 0$$
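As a concrete reading of the definition, here is a short Python sketch that builds the numeral of n as a string and reads back the number it denotes. Representing terms as plain strings and the names `numeral` and `denotation` are assumptions of the sketch.

```python
def numeral(n: int) -> str:
    """The numeral for n: the term S...S0 with n copies of S."""
    if n < 0:
        raise ValueError("numerals are defined only for natural numbers")
    return "S" * n + "0"


def denotation(term: str) -> int:
    """The natural number that a numeral denotes in the standard model."""
    if not term.endswith("0") or set(term[:-1]) - {"S"}:
        raise ValueError(f"not a numeral: {term!r}")
    return len(term) - 1


print(numeral(3))                 # SSS0
print(denotation(numeral(12)))    # 12
```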
I.8.2 Definition. A theory Γ (this names the set of nonlogical axioms) over $L_N$ is correct over N just in case $A \in \Gamma$ implies $\models_N A$.
That is, all its nonlogical axioms are true in N (or really true, if N happens
to be the intended model).
The term correct is used by Smullyan (1992). Some authors say “sound”, but
this is not as apt a terminology, for sound means something else: All first order
theories are sound, but some theories over L N – although sound – may fail to
be correct.
Pause. Why “sentence”? Why not define the above concepts (complete, etc.)
in terms of arbitrary formulas over L?
Thus, in the case of an incomplete theory and for any particular one of its
models – including the intended one – there is at least one sentence of the
language which is (Tarski-)true in said model, but is not provable. Such is any
undecidable sentence A or its negation, for one of A and ¬A must be true in any given model, and neither is provable.
An inconsistent theory is complete, of course.
$$\vdash_T \neg A(\widetilde{n}) \quad \text{for all } n \in \mathbb{N}$$
An ω-consistent theory fails to prove something over its language; thus it is con-
sistent. The converse is not true, a fact first observed by Tarski. This observation
is a corollary of the techniques applied to prove Gödel’s (first) incompleteness
theorem (see our companion volume for the full story).
Why is Gödel’s theorem true? The idea (in Gödel’s original proof) is very
old, based on games ancient Greek philosophers liked to play: The so-called
† A fair amount of recursion theory is covered in volume 1, Mathematical Logic, where, in partic-
ular, recursive sets are defined and studied.
Where have we used, in the above argument, the part of the assumptions that
requires the set of nonlogical axioms to be recognizable? We actually did not
use it explicitly, since our argument was too far removed from the level of detail
that would exhibit such dependences on assumptions.
Suffice it to say that, among other things, the assumption on recognizability
prevents us from cheating – thus invalidating Gödel’s theorem: Why don’t we
just add all the really true sentences to the set of axioms and form a complete
extension of Peano arithmetic? Because the recognizability assumption does not allow this. Such an extension results in a non-recognizable set of axioms (cf. volume 1).
There is another way to look at the intuitive reason behind the incompletableness phenomenon. This relies on results of recursion theory. Imagine beings who live in a world where set theorists call a set countable just in case a mechanical procedure, or algorithm, exists to enumerate all the set’s members, possibly with repetitions. Such beings call any set that fails to be enumerable in this manner uncountable. Intuitively, in the eyes of the inhabitants of this world, this latter type of set has far too many objects.
In this world the set of theorems of any extension of Peano arithmetic, by
an arbitrary recognizable set of new axioms, is countable. The reason can be
seen intuitively as a consequence of the recognizability of the set of nonlogical
† Attributed to Epimenides. He, a Cretan, said: “All Cretans are liars”. So, was his statement true?
Gödel’s proof is based on a variation of this. A person says: “I am lying.” Well, is he, or is he
not?
‡ The exact form of G depends on the extension at hand.
§ Soundness we have for free. Correctness guarantees the real truth of the nonlogical axioms.
Soundness extends this to all theorems.
Digression. Here is how. To simplify matters assume that the alphabet of the language $L_N$ is finite (for example, variables are really the strings
$$v\underbrace{|\cdots|}_{n+1}v$$
denoting what we may call $v_n$, for $n \ge 0$, built from just two symbols, “v” and “|”).
We convert every proof into a single string by adding a new symbol to our
alphabet, say #, which is used as a separator and “glue” – between formulas –
as we concatenate all the formulas of a proof into a single string, from left to
right. We will still call the result of this concatenation a “proof”.
We now form two separate infinite lists, algorithmically.† The first is the list of all strings over the alphabet of $L_N$, as the latter was augmented by the addition of #. This listing can be effected by enumerating by string length, and then, within each length group, lexicographically (alphabetically).‡
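The first list can be sketched in Python as a generator that enumerates strings by length and, within each length, lexicographically with respect to a fixed order of the symbols. The three-symbol alphabet below (v, |, #) is a toy stand-in for the finite alphabet of $L_N$ augmented by #, and the name `shortlex` is hypothetical. Each string turns up at a finite position, which is what the footnote on building an infinite list algorithmically asks for.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield all strings over `alphabet`: shorter strings first, and strings of
    equal length in lexicographic order, using the order in which `alphabet`
    lists its symbols (the empty string comes first)."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


toy_alphabet = ["v", "|", "#"]          # hypothetical stand-in, fixed order v < | < #
print(list(islice(shortlex(toy_alphabet), 6)))
# ['', 'v', '|', '#', 'vv', 'v|']
```

`itertools.product` already emits tuples in lexicographic order of its input, so ordering within a length group comes for free; the outer loop over lengths is what keeps every string at a finite position.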
The second list is built as follows. Every time a string A is put in the first
list, we test algorithmically whether or not A is a proof. We can do this, for,
firstly, we can recognize if it is of the right form, that is,
$$A_1 \# A_2 \# \dots \# A_n$$
† “One can build an infinite list algorithmically” is jargon that means the following: One has an algorithm which, for any $n \in \mathbb{N}$, will generate the nth element of the list in a finite number of steps.
‡ We assume that we have fixed an alphabetical order of the finitely many symbols of our alphabet.
While Gödel worked with the G that says “I am not a theorem”, his result
was purely syntactic. We state it without proof below.
† It is straightforward to see that if there were only finitely many really true sentences that the formal
system missed, these could be put into a finite table T , which we can check for membership,
trivially. But then, we have an algorithm that can check a formula for membership in the union of the theory’s axioms and T (just search the table; if not found there, then search the set
of nonlogical axioms). Thus, adding the formulas of T to the theory, we have an extension with
a recognizable set of axioms. This new theory trivially has all the formulas in T as theorems.
Hence it has all the really true formulas as theorems (T is all that the original theory missed),
contradicting the fact that this set is uncountable, while the set of theorems is still countable.
This fact showed that Hilbert’s finitary techniques, in the metatheory, were inadequate for his purposes: Intuitively, finitary techniques are codable by integers and are therefore expressible and usable in formal Peano arithmetic.
Now we have two conflicting situations: Hilbert’s belief that finitary tech-
niques can settle the consistency (or otherwise) of formal theories has had as
a corollary the expectation that Peano arithmetic could settle (prove) its own
consistency (via the formalized finitary tools used within the theory). On the
other hand, Gödel’s second incompleteness theorem proved that this cannot be
done.
The detailed proof takes several tens of pages to be fully spelled out (cf. vol-
ume 1). However, the proof idea is very simple: Let us fix attention on an
extension T as above, and let “Con” be a sentence whose natural interpretation (over N) says that T is consistent. Also, let G be the sentence that says “I am not a theorem of T”.
Now, Gödel’s first theorem (partly) asserts the truth (over N) of
Con → G (1)
i.e., “if T is consistent, then G is true – hence, is not provable, for it says just
that”.
The quoted sentence above is correct, for ω-consistency came into play only
to show that Gödel’s G was not refutable. This part of the first theorem is not
needed towards the proof of the second incompleteness theorem.
Imagine now that we have managed to formalize the argument leading to (1)
so that instead of truth in N we can speak of provability in T :†
$$\vdash_T \mathrm{Con} \to G$$
† While this is in principle possible – to formalize the argument that leads to the truth of (1) –
this is not exactly how one proves the deducibility of (1), and hence the second incompleteness
theorem, in practice.
I.9. Exercises
I.1. Prove that the closure of I = {3} under the two relations z = x + y and
z = x − y is the set {3k : k ∈ Z}.
I.2. The pair that effects the definition of Term (I.1.5, p. 13) is unambiguous.
I.3. The pair that effects the definition of Wff (I.1.8, p. 15) is unambiguous.
I.4. With reference to I.2.13 (p. 26), prove that if all the $g_Q$ and $h$ are defined everywhere on their input sets (i.e., they are “total”), that is, $I$ for $h$ and $A \times Y^r$ for $g_Q$ and $(r+1)$-ary $Q$, then $f$ is defined everywhere on $\mathrm{Cl}(I, R)$.
I.5. Prove that for every formula A in Prop (I.3.2, p. 29) the following is
true: Every nonempty proper prefix (I.1.4, p. 13) of the string A has an
excess of left brackets.
† Briefly, imagine that through arithmetization we have managed to represent every formula, and
every sequence of formulas, of L N by a numeral. Gödel defined a formula P (x, y) which “says”
that the formula coded x is provable by a proof coded y. Self-reference allows one to find a
natural number n such that the numeral $\widetilde{n}$ codes the formula $\neg(\exists y)P(\widetilde{n}, y)$. Clearly, this last formula says that “the formula coded by n is not a theorem”. But it is talking about itself, for n is its own code. In short, $G \equiv \neg(\exists y)P(\widetilde{n}, y)$.
‡ Recognizability is at the heart of being able to “talk about” provability within the formal
theory.
§ More concretely, and without invoking faith, one can easily show that there is an interpretation,
in the sense of Section I.7, of Peano arithmetic within ZFC. This becomes clear in Chapter V,
where the set of formal natural numbers, ω, is defined.
¶ The formal statement of the incompleteness theorems starts with the hypothesis “If ZFC is
consistent”.
I.6. Prove that any non-prime A in Prop has uniquely determined immediate
predecessors.
I.7. For any formula A and any two valuations v and v′, $\bar{v}(A) = \bar{v}'(A)$ if v and v′ agree on all the propositional variables that occur in A.
I.8. Prove that A [x ← t] is a formula (whenever it is defined) if t is a term.
I.9. Prove that Definition I.3.12 does not depend on our choice of new vari-
ables zr .
I.10. Prove that (∀x)(∀y)A ↔ (∀y)(∀x)A.
I.11. Prove I.4.23.
I.12. Prove I.4.24.
I.13. (1) Show that $x < y \vdash y < x$ (< is some binary predicate symbol; the choice of symbol here is meant to provoke).
(2) Show informally that $\nvdash x < y \to y < x$.
(Hint. Use the soundness theorem.)
(3) Does this invalidate the deduction theorem? Explain.
I.14. Prove I.4.25.
I.15. Suppose that $\vdash t_i = s_i$ for $i = 1, \dots, m$, where the $t_i, s_i$ are arbitrary terms. Let F be a formula, and F′ be obtained from it by replacing any number of occurrences of $t_i$ in F (not necessarily all) by $s_i$. Prove that $\vdash F \leftrightarrow F'$.
I.16. Suppose that $\vdash t_i = s_i$ for $i = 1, \dots, m$, where the $t_i, s_i$ are arbitrary terms. Let r be a term, and r′ be obtained from it by replacing any number of occurrences of $t_i$ in r (not necessarily all) by $s_i$. Prove that $\vdash r = r'$.
I.17. Settle the “Pause” following I.4.21.
I.18. Prove I.4.27.
I.19. Prove that x = y → y = x.
I.20. Prove that x = y ∧ y = z → x = z.
I.21. Prove (semantically, without using soundness) that A |= (∀x)A.
I.22. Suppose that x is not free in A. Prove that A → (∀x)A and
(∃x)A → A.
I.23. Prove the distributive laws:
(∀x)(A ∧ B ) ↔ (∀x)A ∧ (∀x)B and
(∃x)(A ∨ B ) ↔ (∃x)A ∨ (∃x)B .
I.24. Prove (∃x)(∀y)A → (∀y)(∃x)A with two methods: first using the
auxiliary constant method, next exploiting monotonicity.
I.26. Show that for all F and every set of formulas Γ, if $\Gamma \vdash_2 F$ holds then so does $\Gamma \vdash_1 F$.
Our aim is to see that the logics $\vdash_1$ and $\vdash_2$ are equivalent, i.e., have exactly the same theorems. In view of the trivial Exercise I.26 above, what remains to be shown is that every tautology is a theorem of $\vdash_2$. One particular way to prove this is through the following sequence of $\vdash_2$-facts.
† ¬ and ∨ are the primary symbols; →, ∧, ↔ are defined in the usual manner.
We can now prove the completeness theorem (Post’s theorem) for the “propositional segment” of $\vdash_2$, that is, the logic $\vdash_3$ – so-called propositional logic (or propositional calculus) – obtained from $\vdash_2$ by keeping only the “propositional axioms” (1)–(4) and modus ponens, dropping the remaining axioms and the ∃-introduction rule.
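I.38. Let $\Gamma \nvdash_3 A$. Prove that there is a complete $\Delta \supseteq \Gamma$ such that also $\Delta \nvdash_3 A$. This Δ is a completion of Γ.
(Hint. Let $F_0, F_1, F_2, \dots$ be an enumeration of all formulas. There is such an enumeration, right?
Define $\Delta_n$ by induction on $n$:
$$\Delta_0 = \Gamma, \qquad \Delta_{n+1} = \begin{cases} \Delta_n \cup \{F_n\} & \text{if } \Delta_n \cup \{F_n\} \nvdash_3 A \\ \Delta_n \cup \{\neg F_n\} & \text{otherwise} \end{cases}$$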
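The construction in the hint can be run on a finite list of formulas. In the Python sketch below, formulas over ¬ and ∨ are nested tuples, and `derives` decides $\Gamma \vdash_3 A$ for finite Γ by truth tables. That shortcut is an assumption: it is justified only by Post’s theorem, which this very exercise helps to prove, so the sketch illustrates the bookkeeping of the construction, not the argument. The names and the four-formula list are hypothetical, whereas the real enumeration is infinite.

```python
from itertools import product

# Hypothetical encoding of propositional formulas over ¬ and ∨:
#   a propositional variable is a string; ("not", A); ("or", A, B).

def variables(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(g) for g in f[1:]))


def value(f, v):
    """Truth value of f under the valuation v (a dict from variables to bools)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    return value(f[1], v) or value(f[2], v)


def derives(gamma, a):
    """Stand-in for Γ ⊢_3 A with finite Γ: a truth-table test for Γ ⊨ A.
    Using it as a derivability test presupposes Post's theorem."""
    vs = sorted(set().union(variables(a), *(variables(g) for g in gamma)))
    for bits in product([False, True], repeat=len(vs)):
        v = dict(zip(vs, bits))
        if all(value(g, v) for g in gamma) and not value(a, v):
            return False
    return True


def complete_along(gamma, a, formulas):
    """Follow the hint along a finite list F_0, F_1, ...: add F_n when that
    keeps A underivable, otherwise add ¬F_n."""
    delta = list(gamma)
    for f in formulas:
        delta.append(("not", f) if derives(delta + [f], a) else f)
    return delta


delta = complete_along(["p"], "q", ["q", ("or", "p", "q"), "r", ("not", "r")])
print(delta)
print(derives(delta, "q"))   # False: A is still not derivable
```

The “otherwise” branch never breaks the invariant: if both $\Delta_n \cup \{F_n\}$ and $\Delta_n \cup \{\neg F_n\}$ derived A, then so would $\Delta_n$ itself.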
† It is our experience that readers of books like this one often choose to ignore “Chapter 0” initially.
Invariably they are compelled to acknowledge its existence sooner or later in the course of the
exposition. This will probably happen as early as Chapter III in our case.
are either
(1) atomic – let us understand by this term an object that is not a collection of
other objects – such as a number or a point on a Euclidean line, or
(2) collections of mathematical objects.
Pause. Are the sets that we have displayed under L1-2 and L1-3 the same? (I mean, equal?) Same question for the sets under L1-4 and L1-5. Our “understanding” – a gentle way of saying “we postulate” – is that set equality is
† Taking for granted an understanding of the terms “atom” and “collection” as intuitively self-
explanatory, we use them to describe the objects that set theory studies. We are purposely leaving
out a description of what “mathematical” is supposed to mean. Suffice it to say that experience
provides numerous examples of mathematical objects, such as numbers of all sorts, points, lines,
vectors, matrices, groups, etc. Of course, one needs an experiential understanding of atomic
mathematical objects only, since all the others are built from those as described in II.1.1.
‡ Atoms are very often called “urelements”, pronounced “ūr-élements” – an anglicized form of the
German word Urelemente – “primeval elements”.
Note how the level of nesting of { }-brackets matches the level of the objects.
L2-1. {∅}.
L2-2. {1, {1}}.
L2-3. {{√2, 1}}.
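The match between bracket nesting and levels can be sketched as a small recursive function. The sketch assumes the convention suggested by the labels L1-… and L2-…: atoms are at level 0, ∅ is at level 1, and a set lies one level above its highest member. Atoms are modelled as Python numbers or strings and sets as frozensets; these choices and the name `level` are assumptions for illustration. The three examples are read as {∅}, {1, {1}} and {{√2, 1}}.

```python
def level(x):
    """Level of an object: an atom (anything that is not a frozenset) is at
    level 0, and a set sits one level above its highest-level member, so the
    empty set is at level 1."""
    if not isinstance(x, frozenset):
        return 0
    return 1 + max((level(m) for m in x), default=0)


empty = frozenset()
examples = [
    frozenset({empty}),                    # {∅}
    frozenset({1, frozenset({1})}),        # {1, {1}}
    frozenset({frozenset({"√2", 1})}),     # {{√2, 1}}, with "√2" standing for the atom √2
]
print([level(s) for s in examples])   # [2, 2, 2]
```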
† It cannot be emphasized strongly enough that “accepted” is a very important verb here. Different
descriptions/ontologies of sets may be possible – for example, one that denies Principle 1 below.
Compare with the similar situation in geometry. It is possible to imagine different types of
geometry – Euclidean on one hand, and various non-Euclidean ones on the other – but one is free
to say “I will accept Euclidean geometry as the ‘true’ depiction of the universe and then proceed
to learn its theorems”. All that the latter acceptance means is a decision to study a particular type
of geometry.
For this process to be effective we have to understand some of the fine points
of II.1.3. Thus we begin by “unwinding” the induction into an iteration. We
obtain the following two principles of set formation that are taken as “obvious
truths”:†
II.1.4 Remark. Principle 1 is too strong. Omitting it does not affect the ap-
plicability of set theory to mathematics, i.e., the status of the former as the
“foundation” of the latter. Of course, we cannot omit this principle unless we
modify the descriptions II.1.1 and II.1.3 (for reasons analogous to the pheno-
menon described in I.2.9).
Now, if Principle 1 holds, as it does under our assumptions, then it leads
to the foundation axiom. This comment will make much more sense later. For
now, if you have just read it you have done so at your own risk.
† Not less “obvious” than II.1.3, from which they follow directly. The reader may peek once more
into I.2.9 for motivation, forewarned though that the stages of set formation are “far too many”
to be numbered solely by natural numbers.
By the way, we do not normally speak of formation of atoms. Atoms are given outright. It is
sets that we build.
II.1.5 Remark. (1) We are not saying above that the stage in question is the “earliest” stage at which A can be built, since we have said “follows” rather than “immediately follows”.
(2) If some set is definable (“buildable”) at some stage, then we find it both convenient and intuitively acceptable to agree that it is also definable at any later stage. This corresponds to the common experience that a
theorem has proofs of various lengths; once a “short” proof has been given,
then – for example by adding redundant axioms in this proof – we can lengthen
it arbitrarily and yet still have it yield the same theorem.
(3) “If our intuition will accept . . . ”. This condition in Principle 2 creates
some difficulty. Whose intuition? What is acceptable to some might not be to
others.
This is a problem that arises when one does one’s mathematics like a
Platonist. A Platonist accepts some “obvious truths” about mathematical ob-
jects, and then proceeds to discover some more truths by employing (infor-
mal) logical deductions. Most practising mathematicians practise their craft like
Platonists (whether they are card-carrying Platonists or not).
The catch with this approach, especially when applied to something “big” –
by this I mean “foundational” – like set theory, is that one cannot always syn-
chronize the understandings of all Platonists as to what are the “obvious truths”
(about sets) – from where all reasoning begins to flow. There was a time not
too long ago, for example, that mathematicians, otherwise comfortable with
infinite sets, were not unanimous on whether the set-theoretic principle known
as the axiom of choice was valid or not.
In the end, we avoid this difficulty by adopting the axiomatic approach to
set theory. The Platonist within each of us may continue thinking of the sets
that were imperfectly described in II.1.3 as the “real sets” – the ones that,
Platonistically speaking, “exist”. However, we plan to learn about sets by argu-
ing like formalists. That is, we will translate a few obvious and important truths
about real sets into a formal language (these translations will lead to our axiom
schemata) and then employ first order logic as our reasoning tool to learn about
real sets, indirectly, by proving theorems in our formal language.†
Thus, once the imprecise set-formation-by-stages thesis has motivated the
selection of the above-mentioned “few obvious and important truths”, it will
† The indirection occurs because in this language we will use terms to represent or codify real sets,
and formulas to represent or codify properties of real sets. The reader who has read volume 1 is
by now familiar with this approach, which we applied there in Chapter II to the study of (Peano)
arithmetic. For terminology – such as “formal language”, “term”, “formula”, “metatheory” – and
tools from logic, the reader is referred to Chapter I of the present volume.
never be invoked again. Indeed, the opposite will happen. Our axioms will be
strong enough to precisely define (eventually) what stages are and what happens
at each stage, something that we are totally powerless to do now.
Another criticism of the Platonist’s approach to set theory is that it may entail
contradictions (often called antinomies or paradoxes) which are hard to work
around. Such paradoxes come about in the Platonist approach because it is not
always clear what is a safe “truth” that we can adopt as a starting point of our
reasoning. For example, is the following a “safe truth”? “For any property ‘A’
we can build a set of all the objects x that satisfy A.” We look into this question
in the next section. We also ponder briefly, through an example immediately
below, the nature of set-building, or set-defining, “properties”.
A bit on terminology here: Some people call the contradictions of naı̈ve set
theory “antinomies” (e.g., the Russell antinomy), and the harmless pleasantries
of the Berry type “paradoxes”. Others, like ourselves, use just one term, para-
doxes. The reader may wish to decide for himself on the choice of terminology
here, given that both words are rooted in Greek and a paradox is something
that is “against one’s belief” or even “against one’s knowledge” (δοκώ = “I believe”, or, “I know”) while antinomy means being “against the – here, logical or mathematical – law” (νόμος = “(the) law”).
By the way, Berry’s paradox is this: Define n by “n is a positive integer
definable using fewer than 1000 non-blank symbols of print”.† Examples of
possible values of n: “5”, “10”, “10 raised to the power 350000”, “the smallest
prime number that has at least 10 raised to the power 350000 digits”.
Now, the set of such numbers is finite, since there are finitely many ways to
write a definition employing fewer than 1000 non-blank symbols. Thus, there
are plenty of positive integers that are not so definable. Let m denote the smallest
such.
Then “m is the smallest positive integer not definable using fewer than 1000
non-blank symbols of print”.
Hey, we have just defined m in fewer than 1000 non-blank symbols of print.
A contradiction!‡
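The finiteness claim above is simple arithmetic. Over an alphabet of a symbols there are $a^k$ strings of length k, so there are at most $\sum_{k<1000} a^k = (a^{1000}-1)/(a-1)$ strings of fewer than 1000 symbols, and hence at most that many candidate definitions. The Python sketch below computes this bound; the alphabet size of 100 is a hypothetical figure in the spirit of the keyboard footnote.

```python
def strings_shorter_than(alphabet_size: int, max_length: int) -> int:
    """Count the strings of length 0, 1, ..., max_length - 1 over an alphabet
    with alphabet_size symbols."""
    return sum(alphabet_size ** k for k in range(max_length))


# Hypothetical figure: roughly 100 printable keyboard symbols.
bound = strings_shorter_than(100, 1000)
print(len(str(bound)))   # number of decimal digits of this finite bound
```

The bound is astronomically large but finite, and that is all the argument needs.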
II.1.6 Remark. It should be pointed out that our Platonist’s view of “real sets”
is informed by the work of Russell (and the later work of von Neumann),
† There is an implicit understanding that the set of all available symbols of print is finite: e.g.,
nowadays we could take as such the set of symbols on a standard English computer keyboard.
‡ Well, not really. Neither of the statements “n is a positive integer definable using fewer than
1000 non-blank symbols of print” or “m is the smallest positive integer not definable using fewer
than 1000 non-blank symbols of print” is a definition. What does “definable” mean?
namely, by his suggested “fix” for the paradox that he discovered – see next
section. Georg Cantor, the founder of set theory, did not require any particular
manner, or order, in which sets are formed. The axioms of the ZFC set theory
of Zermelo and Fraenkel describe the von Neumann universe, which is built
by stages, rather than the Cantorian universe. In the latter, as many sets can be
present at once as our thought or perception will allow.†
II.2.1 Example. ‡ Let us recall (from Chapter I or from our previous mathe-
matics courses) that the notation
S = {x : A [x]} (1)
denotes (naı̈vely) the set S of all objects x that satisfy the formula A[x].§ This
means that entrance into S is determined by
x∈S iff A[x] (2)
where, of course, by “x ∈ S” we mean “x is a member of S”.
Let us see why the “Russell set”
$$R = \{x : x \notin x\} \qquad (3)$$
is bad news for the informal approach: By (2), (3) yields
$$x \in R \text{ iff } x \notin x \qquad (4)$$
Now, since the variable x can receive as value any object of the theory, in particular it can receive the “set” R. Thus, (4) yields the contradiction
$$R \in R \text{ iff } R \notin R \qquad (5)$$
Our only way out of the contradiction (5) is to say that
R is not a set.¶
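The step from (3) to (5) is a diagonal argument, and it can be checked exhaustively on tiny finite “universes”. In the Python sketch below a universe is a finite domain with an arbitrary membership relation, an assumption made only for illustration, and the function names are hypothetical. The code confirms that no element has exactly the Russell collection {x : x ∉ x} as its members, which is the finite counterpart of “R is not a set”.

```python
from itertools import product


def russell_collection(domain, member):
    """{x in domain : not member(x, x)}"""
    return frozenset(x for x in domain if not member(x, x))


def members_of(y, domain, member):
    """The elements that `member` places inside y."""
    return frozenset(x for x in domain if member(x, y))


def some_element_is_russell(domain, member):
    r = russell_collection(domain, member)
    return any(members_of(y, domain, member) == r for y in domain)


# Exhaustive check over all 2**9 membership relations on a 3-element domain.
dom = range(3)
pairs = list(product(dom, repeat=2))
counterexamples = 0
for bits in product([False, True], repeat=len(pairs)):
    rel = {p for p, b in zip(pairs, bits) if b}
    if some_element_is_russell(dom, lambda x, y: (x, y) in rel):
        counterexamples += 1
print(counterexamples)   # 0
```

It must print 0: if some y had exactly the members of R, then y ∈ y would hold iff y ∈ R, i.e., iff y ∉ y, which is (5) with y in place of R.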
† In Cantor’s own description, a set is any collection into “a whole” of objects of our “perception
or of our thought”.
‡ Reminder: This is at the informal level.
§ The square and round bracket notation is introduced in I.1.11.
¶ This saves the theory, for now, since then it is “illegal” to plug R into the set/atom variable x;
hence (5) will not be derived from (4).
Now that we realize that some collections such as S in (1) above are sets, and
some are not, how can we tell which is which? The axiomatic approach resolves
such issues in an elegant way.
∃, ¬, ∨, =, (, )
† “Hmm”, the alert reader will say. “You are using Principle 1 here. You are saying that if x is a
non-atomic mathematical object, then it must be built at some stage.” Indeed! However, even if we
were to totally abandon Principle 1 and revise our naı̈ve picture of the universe of “mathematical
objects” to allow x ∈ x to be true (depending on the “value” of x), we could still avoid the
Russell paradox argument in exactly the same way we avoid it in the presence of Principle 1:
Namely, by restricting the circumstances where the “operation” {x : A[x]} is allowed to build
a set. In short, it is not the choice of an answer to the question “x ∈ x” that creates the Russell
paradox, rather it is a comprehension principle, {x : A[x]}, that is far too powerful for its own
good.
and object variables, that is, variables that when interpreted are interpreted to
take as values (real) sets or atoms,†
v0 , v1 , . . . , vi , . . .
Additionally, L Set has the two primitive nonlogical symbols “∈”‡ and “U ”.§
The former is a binary predicate that is intended to mean (when interpreted) “is
a member of”. The latter is a unary predicate meant to say (of its argument) “is
an atom”. All the remaining familiar symbols of set theory (e.g., ∩, ∪, ⊆, ×)
are introduced as defined nonlogical symbols as the theory progresses.
The logical axioms and rules of first order logic will be those that we have
introduced in Chapter I.
Our intended “standard model” – i.e., what we are describing by our formal
system – is the already (imperfectly) described “universe” of all sets and atoms.¶
Having here a standard model in mind, which the axiomatic theory attempts
to describe correctly and completely,# is entirely analogous to what we did in
volume 1.
There we had the standard model of arithmetic, N = (N, S, +, ×, 0, <), in mind, and the axiomatizations introduced there, ROB and PA, were successive attempts at formally deducing all the true formulas of N from a few axioms.||
† This is the implementation of our intentions regarding the nature of “mathematical objects”
(II.1.1).
‡ “∈” is a stylized form of ε (épsilon), the first letter of the ancient Greek word “εστί” (pronounced estí – with a short “i” – and meaning “is”). Thus, if y is the set of all even integers, x ∈ y says
that “x is an even integer”. Some authors still use x ε y instead of x ∈ y, but we prefer not to do
so, as “ε” is overused (e.g., empty string, epsilon number, Hilbert selector, a major contributor
to the dreaded “ε-δ” proofs of calculus, etc.).
§ “Primitive” means “primeval” or “given at the very beginning”.
¶ Known as the von Neumann universe. That this universe is not a set – it is equal to the Russell
collection R granting Principle 1, is it not? – is an issue we should not worry about, as long as
we accept that its members are all the sets and atoms.
# The terms correctness and (syntactic or simple) completeness of a theory are defined in I.8.2
and I.8.3. The former means that every theorem is true when interpreted in the standard model.
The latter means that all formulas that are true in the standard model are theorems. We have
no difficulty with the former requirement. The latter is impossible by Gödel’s incompleteness
theorems (I.8.5).
|| Again, we could not produce all such formulas, because of Gödel’s incompleteness theorems.
The definitions of terms and formulas for L Set are those given in Chapter I
(I.1.5 and I.1.8) subject to the restriction that the only primitive nonlogical
symbols are the two predicates ∈ and U .
II.3.2 Remark (Basic Language). Thus, the terms of L Set are just the variables,
v0 , v1 , v2 , . . . .
Formulas are built from the atomic ones, that is, $U v_i$, $v_i = v_j$, and $v_i \in v_j$ for all choices of $i, j$ in $\mathbb{N}$, by application of the connectives ¬, ∨, and ∃ (I.1.8).
We call L Set the basic or primitive language of set theory. The qualifiers
“basic” and “primitive” reflect the fact that the only nonlogical symbols are the
primeval ∈ and U . As the theory is being developed, we will frequently introduce
new defined symbols, thus extending L Set (cf. Section I.6). This process also
enlarges the variety of terms (adding terms such as ∅, {x : ¬x = x}, ω, etc.)
and formulas.
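How defined symbols sit on top of $L_{Set}$ can be sketched by writing each abbreviation as a function that expands back into the basic language. In the Python sketch below, only ¬, ∨, ∃ and ∈ occur in the output. The function names are hypothetical, the expansions of ∧, → and ∀ are the usual ones, and the clause for x ⊆ y is the simple version (∀z)(z ∈ x → z ∈ y), which ignores whatever the text eventually stipulates about atoms. As usual, “¬v2 ∈ v0” is read as ¬(v2 ∈ v0).

```python
# Primitive formula builders of the basic language L_Set (strings in the book's notation).
def Not(a):
    return f"¬{a}"

def Or(a, b):
    return f"({a} ∨ {b})"

def Exists(x, a):
    return f"(∃{x}){a}"

def Member(x, y):
    return f"{x} ∈ {y}"

# Defined connectives and quantifier, expanded on the spot.
def And(a, b):        # A ∧ B  is  ¬(¬A ∨ ¬B)
    return Not(Or(Not(a), Not(b)))

def Implies(a, b):    # A → B  is  ¬A ∨ B
    return Or(Not(a), b)

def Forall(x, a):     # (∀x)A  is  ¬(∃x)¬A
    return Not(Exists(x, Not(a)))

# A simplified defined predicate; z must be a variable distinct from x and y.
def Subset(x, y, z):
    return Forall(z, Implies(Member(z, x), Member(z, y)))


print(Subset("v0", "v1", "v2"))   # ¬(∃v2)¬(¬v2 ∈ v0 ∨ v2 ∈ v1)
```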
We note that the definitions of terms and formulas of L Set are strictly about
syntax – i.e., correct form. Thus they do not concern themselves with semantic
issues or provability issues. In particular, it is good form to write “v2 ∈ v2 ”,
even if one of our axioms will entail that “v2 ∈ v2 ” is a false statement.§
† “Be” is used here in formalist jargon. The Platonist terminology is “be denoted by”.
‡ A true formalist would probably declare that the sets of our intuition do not really “exist” –
mathematically speaking – and sets just are the terms of our formal language. See Bourbaki
(1966b, p. 62), where it is stated, in translation, that “[ . . . ] the word ‘set’ will be strictly considered
to be a synonym for ‘term’; in particular, phrases such as ‘let x be a set’ are, in principle, totally
superfluous, since every variable is a term; such phrases are simply introduced to help the intuitive
interpretation of [formal] texts”.
§ We have already remarked in II.2.1 that x ∈ x is false in our intended universe.
II.3.4 Example. Picking up the last comment above, we show here two exam-
ples of what the judicious use of English saves us from:
(a) We would sooner say “n is a natural number” than write the set theory
formula
(∀x)(∀y)(x ∈ y ∈ n → x ∈ n) ∧
[n = ∅ ∨ (∃x)(¬U(x) ∧ n = x ∪ {x})] ∧
(∀m)(m ∈ n → (∀x)(∀y)(x ∈ y ∈ m → x ∈ m) ∧
[m = ∅ ∨ (∃x)(¬U(x) ∧ m = x ∪ {x})])
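To see what this formula says, one can evaluate it on hereditarily finite sets. The Python sketch below assumes there are no urelements (every object is a nested frozenset, so ¬U(x) always holds), and its helper names are hypothetical. The existential quantifier in (∃x)(n = x ∪ {x}) can be decided by searching only the members of n, since any such x belongs to x ∪ {x}.

```python
EMPTY = frozenset()


def succ(x):
    """x ∪ {x}"""
    return x | frozenset({x})


def transitive(n):
    """(∀x)(∀y)(x ∈ y ∈ n → x ∈ n)"""
    return all(x in n for y in n for x in y)


def zero_or_successor(n):
    """n = ∅ ∨ (∃x)(¬U(x) ∧ n = x ∪ {x}); a witness x must be a member of n."""
    return n == EMPTY or any(n == succ(x) for x in n)


def is_natural(n):
    """The formula of Example II.3.4(a), for a hereditarily finite pure set n."""
    def clause(k):
        return transitive(k) and zero_or_successor(k)
    return clause(n) and all(clause(m) for m in n)


zero = EMPTY
one = succ(zero)      # {∅}
two = succ(one)       # {∅, {∅}}
three = succ(two)
print([is_natural(k) for k in (zero, one, two, three)])       # [True, True, True, True]
print(is_natural(frozenset({one})))                           # False: {{∅}} is not transitive
print(is_natural(frozenset({zero, one, frozenset({one})})))   # False: transitive, not a successor
```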
† “Abbreviated” is not always shorter. +x×yz is shorter than x + (y × z), and f t_1 t_2 t_3 is shorter than
f(t_1, t_2, t_3). Yet the longer forms are easier to understand. An abbreviation here is an alternative,
easier to understand form.
n > S0 ∧ (∀x)(∀y)(n = x × y → x = S0 ∨ x = n)
II.4. On Names
The reader is referred to Section I.1 (in particular see the discussion starting
with Remark I.1.3 on p. 10) so that we will not be unduly repetitive in the present
section.
† “Never” as long as all consistent augmentations of the set of axioms preserve the set’s
recursiveness – or “recognizability”.
metatheory. For example, in the metatheory we may say “the set {x : ¬x = x}”,
thus using the symbol sequence “{x : ¬x = x}” to name some appropriate real
set, the so-called empty set.†
This correspondence between certain terms‡ of the type “{x : A [x]}” and real
sets is nothing else than an application of first order definability (cf. I.5.15).
That is, if some real set A is first order definable in the standard model by a
formula A, we have
Reciprocally, in our argot, we nickname formal terms and their formal ab-
breviations by the metamathematical (often English) names of the sets that
they name. Thus, e.g., we say that {x : ¬x = x}, or ∅, is “the empty set of the
(formal) theory”.
We note two limitations of this naming apparatus below.
† I am guilty here of borrowing from the sequel. “{x : ¬x = x}” is not a term of the basic
language II.3.2; instead it is a defined term, about which we will talk soon.
‡ Russell’s paradox is fresh in our memory; thus, “certain” is an apt qualifier.
§ We cannot gloss over this shortage of names by extending L_Set by the addition of a name for
each real set. That would make our language impractical, as it would make it uncountable, and
therefore impossible to generate finitely. In an uncountable language we will not be able to write,
or even check, proofs anymore, as we will have trouble telling what symbols belong to L_Set and
which do not. As a result, we will be unable to know whether an arbitrary string of symbols is a
formula, an axiom, or just rubbish.
II.4.3 Example. This, like most examples, is in the “real (informal) realm”.
The natural numbers 0, 1, 2, . . . when collected together form a set, normally
denoted by N.
We often capture the above sentence informally by writing N = {0, 1, 2, . . . }.
† This is the original reason that prompted the development of axiomatic theories.
‡ The reader should not worry about the meaning of “isomorphic”. We will come back to this very
issue in Chapter V.
II.4.5 Remark. At the outset of Section II.3 we promised “to turn the theory
into a consistent deductive science”.
It may come as a shock to the reader that we have no (generally acceptable)
proof of consistency of ZFC. We Platonistically got around the consistency
question of either ROB or Peano arithmetic by saying “sure they are consistent;
N is a model of either”, since few reasonable people will feel uncomfortable
about N or its fitness to certify consistency (serving as a model). Notwith-
standing this, proof theorists have found alternative constructive proofs of the
consistency of Peano arithmetic and hence of ROB (such proofs can be found
in Schütte (1977) and Shoenfield (1967)). These proofs necessarily use tools
that are beyond those included in Peano arithmetic (because of Gödel’s second
incompleteness theorem).
We have no such constructive proof of the consistency of ZFC. This, of
course, is not surprising. Since ZFC satisfies the hypotheses of Gödel’s second
incompleteness theorem, a proof of its consistency (if ZFC is indeed consistent)
cannot be formalized within ZFC. Here then
is the difficulty: What will any such consistency proof “outside” or “beyond”
ZFC look like, considering that
(a) it cannot be expressed in ZFC, and yet
(b) ZFC, being the “foundation of all mathematics” (or such that “mathematics
can be embedded” in it), ought to be able to include (formalizations of ) all
mathematical tools and mathematical reasoning – including a formalization
of its consistency proof that was given “outside” ZFC.
However, most set theorists are willing to accept the consistency of ZFC.
“Evidence” (but not a proof ) of this consistency is, of course, the presence of
the standard model.
III The Axioms of Set Theory
III.1. Extensionality
Under what conditions are two sets equal?
First of all, if a and b stand for urelements, then a = b just obeys the logical
axioms of equality (Definition I.3.13, p. 35) and we have nothing to add about
their behaviour concerning equality.
For sets, however, we require that they be equal whenever they contain
exactly the same elements, regardless of whatever “structural connections”
these elements may have. In order to state this axiom formally we use the
primitive predicate of set theory, U . Thus U x is intended to mean “x is an
urelement” (therefore ¬U x will mean “x is a set”).
We use the “abbreviation”† “U (x)” for “U x”, since it is arguable that, in
general, “P(t1 , . . . , tn )” is more user-friendly than “Pt1 . . . tn ”.
III.1.2 Remark. We noted that the above axiom, (E), indicates that we want
two sets to be equal as long as they have the same elements, regardless of
the existence of inner structure in the sets (such as one-dimensional or higher-dimensional
order) and regardless of “intension”, that is, how the set originally
came about. For example, the set that contains the integers 2 and 3 is expected
to be the same as (equal to) the set of all roots of x² − 5x + 6 = 0, despite the
difference in the two descriptions. That is, we have postulated that set equality
is “forgetful of structure”.
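Python’s frozenset happens to behave extensionally, which gives a quick way to see the “forgetful of structure” point. In the sketch below (the helper name same_elements is hypothetical), the roots of x² − 5x + 6 = 0 are found by scanning a range of integers, which suffices only because the roots are known to be the integers 2 and 3. The listing [3, 2, 2] differs from the roots as a list, since order and repetition are structure, but not as a set.

```python
def same_elements(a, b):
    """(∀x)(x ∈ a ↔ x ∈ b), checked over the finitely many members of a and b."""
    return all(x in b for x in a) and all(x in a for x in b)


roots = [x for x in range(-10, 11) if x * x - 5 * x + 6 == 0]   # [2, 3]
listing = [3, 2, 2]            # another description: different order, a repetition

print(roots == listing)                         # False: lists keep order and multiplicity
print(same_elements(roots, listing))            # True: the same elements
print(frozenset(roots) == frozenset(listing))   # True: frozenset equality is extensional
```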
It is the extension of a set (i.e., its actual contents) that decides equality,
hence the name of Axiom III.1.1.
But is this axiom “true”?† Is this the condition that governs equality of “real”
sets? Well, formal or axiomatic mathematics aims at representing reality within
an artificial but formal and precise language. In this “representation” there is
always something lost, partly due to limitations of the formal language and
partly due to decisions that we make – regarding the choice of our assumptions,
or axioms – about what features of “reality” are essential (of which we create
counterparts in the formal language) and which are not.
For example, a “real” line has width no matter how you construct it, but
geometers have decided that width is irrelevant, so they invent their lines so
as to have no width. In our case, we are saying that to decide set equality we
forget all attributes of sets other than what elements they contain. This is what
we deem to be important.
Now that we have defended our choice of (E), another question arises: Is
(E) a definition? Much of the elementary literature on the theory of sets takes
the point of view that it is (see Wilder (1963, p. 58), for example), although
often somewhat casually.
A formal definition would introduce the symbol “=” by (E), if the symbol
were not part of our “logical list” of symbols. Since we already have “=” and
its basic axioms, (E), for us, is an axiom.‡
Note that in the extensionality axiom we state no more than what we need –
following the mathematician’s known propensity to assume less in order to have
the pleasure of proving more. This accounts for using “· · · → A = B” rather
than “· · · ↔ A = B”. In fact, we have
¬U (A) ∧ ¬U (B) → A = B → (∀x)(x ∈ A ↔ x ∈ B) (1)
† Remember that while we cannot give a proof of consistency of ZFC, we can at least check that its
axioms are “really true”, i.e., true in the standard model. This checking is done on the informal
level, of course.
‡ In fact, a formal definition is still an axiom, via which a new formal symbol is introduced – as
we saw in Section I.6. But this is not the case with (E).
116 III. The Axioms of Set Theory
In English, in the case where A and B are sets, this says – that is, the semantics
in the metatheory is – that A ⊆ B is a short name for the statement “every
member of A is also a member of B”.
We read “A ⊆ B” as “A is a subset of B”, or “B is a superset of A”. Instead
of A ⊆ B we sometimes write B ⊇ A. As usual, we negate ⊆ and ⊇ by writing
⊈ and ⊉ respectively.
III.1.6 Remark. In III.1.5 we chose to allow the symbol ⊆ to act on any objects
of set theory – sets or atoms. An alternative approach that is often adopted in the
literature on naïve set theory is to make A ⊆ B undefined (or meaningless) if
either A or B is an atom (this would be analogous to the situation in Euclidean
geometry, where, for example, parallelism is undefined on, say, triangles or
circles). Our choice in III.1.5, that is, (1), is technically more convenient, since
it does not require us to know the exact nature of A or B before we can use the
(formal) abbreviation A ⊆ B.
We note that, according to III.1.5, x ∈ A → x ∈ B is provable if A is an
urelement (by III.1.3), that is, A ⊆ B is provable. Indeed,
A ⊆ B ∧ B ⊆ A
↔ III.1.5 and the equivalence theorem (I.4.25, p. 52)
(∀x)(x ∈ A → x ∈ B) ∧ (∀x)(x ∈ B → x ∈ A)
↔ ∀ over ∧ distributivity (Exercise I.23, p. 95)
(∀x)((x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A))
↔ tautological equivalence and equivalence theorem
(∀x)(x ∈ A ↔ x ∈ B)
↔ extensionality, plus assumptions for “→”; Leibniz axiom for “←”
A = B
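The last step of this chain is where the assumption that A and B are sets is needed: for urelements, A ⊆ B ∧ B ⊆ A holds vacuously (by III.1.3 and III.1.5) even when A ≠ B. The Python sketch below illustrates this in a hypothetical finite model in which strings play the role of atoms and frozensets the role of sets; the function names are illustrative only.

```python
from itertools import combinations


def members(obj):
    """Atoms have no members; a set (a frozenset) has its elements."""
    return obj if isinstance(obj, frozenset) else frozenset()


def subset(a, b):
    """A ⊆ B read as (∀x)(x ∈ A → x ∈ B), for sets and atoms alike (as in III.1.5)."""
    return all(x in members(b) for x in members(a))


p, q = "p", "q"                 # two distinct atoms
print(subset(p, q) and subset(q, p), p == q)   # True False

# For sets, mutual inclusion coincides with equality: check every pair of subsets
# of a small universe containing two atoms and the empty set.
universe = [p, q, frozenset()]
sets = [frozenset(c) for r in range(len(universe) + 1) for c in combinations(universe, r)]
print(all((subset(a, b) and subset(b, a)) == (a == b) for a in sets for b in sets))  # True
```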
The reader will note that, stylistically, ⊆ and ⊂ parallel the symbols ≤ and
< (compare how a < b, for numbers, means a ≤ b and a ≠ b). It should be
mentioned, however, that it is not uncommon in the literature (e.g., in Bourbaki
(1966b), Shoenfield (1967)) to use ⊂ where we use ⊆, and then to need a
separate symbol, such as ⊊, to denote proper subset.
where we have taken the precaution that y is not free in A.† Bourbaki (1966b)
calls formulas such as A “collecting”.
Coll x A (2)
† Otherwise we would be attempting to “solve” for y in something like “x ∈ y ↔ A (x, y)”, which
is not the same as collecting in a container called y all those “values” of x that make A (x, z)
“true” for an arbitrarily chosen value of the “parameter” z. Such rather obvious remarks will
become sparser as we go along.
as an abbreviation of
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ A[x]))
Note that x is not a free variable in (1) (or in (2)), but it is, nevertheless, the
free variable of interest in A, the variable whose “values” (that “satisfy” A)
we are determined to collect. (2) says that indeed we can collect these “values”
into a set.
which is an expanded notation for (A). Intuitively, all of (A), (B), and (C) say
that if y is a set, then ZFC allows us to collect all the x for which x ∈ y is true
into a set. This is hardly surprising at the intuitive level, since this collection
that we form is just y – and we already know it is a set. Nevertheless, it is
reassuring that we have had no bad surprises here.
The formula in line (4) is, or creates, a contradiction, since also A(c, c) ↔
A(c, c).
Taking B (y) to be the special case of ¬U (y), and A(x, y) to be x ∈ y,
we have refuted Coll x (x ∉ x), that is, we have established (note the absence
of a subscript on ⊢) that ⊢ ¬Coll x (x ∉ x) without ever using any nonlogical axioms
or being aware of the question x ∈ x. Our third (and last) visit to Russell’s
paradox will show that what is at work here is a Cantor diagonalization.
Thus, Frege’s (1893) axiom of comprehension that
for all formulas A , one has Coll x A
is refutable within pure logic by providing the counterexample A ≡ x ∉ x.‡
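The refutation above is pure logic: in every structure, whatever relation interprets ∈, no object y satisfies (∀x)(x ∈ y ↔ x ∉ x), because the instance x := y would give y ∈ y ↔ y ∉ y. A finite computation cannot prove this for all structures, but the Python sketch below (function names hypothetical) illustrates the diagonal step by trying every binary relation on domains of one to three elements as the interpretation of ∈.

```python
from itertools import product


def russell_object_exists(domain, member):
    """Is there a y in the domain with (∀x)(x ∈ y ↔ ¬ x ∈ x)?"""
    return any(all(member(x, y) == (not member(x, x)) for x in domain) for y in domain)


def no_russell_object(n):
    """Interpret ∈ by each of the 2**(n*n) binary relations on {0, ..., n-1} in turn."""
    domain = range(n)
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        relation = dict(zip(pairs, bits))
        if russell_object_exists(domain, lambda x, y: relation[(x, y)]):
            return False
    return True


print([no_russell_object(n) for n in (1, 2, 3)])   # [True, True, True]
```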
Russell’s paradox was not the first paradox discovered in naïve set theory
as originally developed by Georg Cantor. The Burali-Forti antinomy had been
suggested earlier, and we will get to it at the proper place in our development.
Russell’s paradox is less technical on the one hand, and is immediately rele-
vant to our present discussion on the other hand; thus we opted for its early
presentation.
Obviously, comprehension, as stated by Frege, was too strong and allowed
some “super-collections” to be built (like R) which are not sets.
It should be noted here that in the work of Cantor the comprehension schema
was used carefully so as not to construct “too large” or “too complicated” sets,
as compared with the “ingredient” sets that entered into such constructions. For
this reason, his work did not explicitly lead to Russell’s objection, the latter
being aimed at Frege.
We still want to be able to collect into a set all “x-values” that satisfy “rea-
sonable” formulas A – leaving the unreasonable ones out. Let us work towards
identifying such reasonable formulas. But first a lemma:
III.2.3 Lemma.
Proof.
III.2.4 Remark. Let us recall the basics of introducing new function symbols
(Section I.6). Suppose that we have the following:
and
Then we may introduce a new function symbol, say f A , into the language of
T by the axiom
We also know that (3) is provably equivalent to (4) below (see p. 73), so that (4)
could serve as well as the introducing axiom:
We finally recall (p. 73) the notation (Whitehead and Russell (1912)) for the
term f A (xn ):
A special case of the above is important. Suppose that t(xn ) is a term, where we
have written “(xn )” to indicate the totality of free variables in t. Then substitution
in the logical axiom x = x yields t = t; thus the substitution axiom and modus
ponens yield
(∃y) y = t (6)
y = t ∧ z = t → y = z (7)
f t (ym ) = y ↔ t = y (8)
f t (ym ) = t (9)
We would like now to introduce a function f A that satisfies (3) (or (4)) for
precisely those xn that satisfy D . We could define f arbitrarily for those xn for
which D fails.
Let then a be some constant in the language of T . Let
We show that
Indeed,
(i) D (xn ) assume
(ii) (∃y)A(y, xn ) (i), (10) and MP
(iii) A(c, xn ) assume; c is a new
constant
(iv) D (xn ) ∧ A(c, xn ) ∨ ¬D (xn ) ∧ c = a (i), (iii) plus |=Taut
(v) (∃y)(D(xn) ∧ A(y, xn) ∨ ¬D(xn) ∧ y = a) (iv), subst. axiom
plus MP
Next consider
(i) ¬D (xn ) assume
(ii) ¬D (xn ) ∧ a = a (i), a = a and |=Taut
(iii) D (xn ) ∧ A(a, xn ) ∨ ¬D (xn ) ∧ a = a (ii) plus |=Taut
(iv) (∃y)(D(xn) ∧ A(y, xn) ∨ ¬D(xn) ∧ y = a) (iii), subst. axiom
plus MP
Thus, by the deduction theorem,
(15) and (16) yield (14) via proof by cases. (13) and (14) allow the introduction
of f A by the axiom
That is,
Since
and
we get
and
In other words, (10) and (11) allow us to introduce a new function symbol f A
that satisfies (19) and (20). (20) defines f A “arbitrarily” for those xn where D
fails.
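The construction can be pictured on a finite domain. The Python sketch below is a hypothetical illustration, not part of the formal development: given a relation A(y, x), a condition D(x) under which exactly one y satisfies A(y, x) (in the text, (10) supplies the existence part), and a default value a, it tabulates a function f with A(f(x), x) wherever D holds and f(x) = a wherever D fails.

```python
def define_by_cases(domain, A, D, a):
    """Tabulate f with D(x) → A(f(x), x) and ¬D(x) → f(x) = a.

    Assumes that whenever D(x) holds, exactly one y in the domain satisfies A(y, x).
    """
    f = {}
    for x in domain:
        if D(x):
            witnesses = [y for y in domain if A(y, x)]
            if len(witnesses) != 1:
                raise ValueError(f"D({x!r}) holds but A(y, {x!r}) has {len(witnesses)} solutions")
            f[x] = witnesses[0]
        else:
            f[x] = a          # the "arbitrary" value where D fails
    return f


# Example: A(y, x) is y * y == x, D(x) says x is a perfect square, default a = 0.
domain = range(10)
f = define_by_cases(domain,
                    A=lambda y, x: y * y == x,
                    D=lambda x: any(y * y == x for y in domain),
                    a=0)
print(f[4], f[9], f[5])   # 2 3 0
```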
It is easy to check, just as we did on p. 73, that (19) is provably equivalent to
(18) T D (xn ) → A(y, xn ) ↔ y = f A (xn ) (19 )
III.2.5 Definition (Set Terms). If ⊢ZFC Coll x F(x, z n), then Lemma III.2.3
allows us to introduce the term
(ιy)(¬U(y) ∧ (∀x)(x ∈ y ↔ F(x, z n))) (st)
We call the above a set term, defined by the formula F and the objects
z1, . . . , zn .
We (almost always) use the shorter, and standard, metamathematical abbre-
viation
{x : F (x, z n )} (sst)
The reader will recall from Section I.6 that, in actual fact, a formal defini-
tion introduces a function symbol, not a term. However, we agree to leave the
“ontology” of that function symbol, say, “ f F ”, unspecified, and we agree to
use the argot (st) or (sst) above to informally denote the term, f F (z n ), that
corresponds to f F .
Nevertheless, whenever ⊢ZFC Coll x F, either of the notations (st) or (sst)
stands for (i.e., names) a formal term of the theory.
It is important to note that set terms give rise to more complicated terms than
just variables. The latter are the only terms of the basic language L_Set (see II.3.2),
while as we enrich the language by the (formal) addition of new function sym-
bols f A , f B , etc., and constants ∅, ω, etc., we can build complicated terms
such as f A (. . . , f B (. . . , ω, . . .), ∅, . . .) (see I.1.5). Such terms we will call just
terms (or “formal terms”, to occasionally emphasize their formal status).
We immediately have
Proof. (i): This is (3) in III.2.4 above, that is, the introductory axiom for “ f A ”,
where A is “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”.
(ii): By (4) in III.2.4 and tautological implication.
III.2.7 Remark. Note that x is a bound variable in (st) of Definition III.2.5, and
hence also in (sst). Thus, if the conditions for the variant theorem are fulfilled
(I.4.13, p. 46) – that w occurs neither free nor bound in F – then we can also
write the set term as {w : F (w, z n )}. That is,
y = {x : F (x, z n )}
↔ (i) of III.2.6
¬U(y) ∧ (∀x)(x ∈ y ↔ F(x, z n))
↔ variant theorem (I.4.13, p. 46) and equivalence theorem
¬U(y) ∧ (∀w)(w ∈ y ↔ F(w, z n))
↔ (i) of III.2.6
y = {w : F (w, z n )}
Thus,
from which, substitution and the logical fact t = t for any term t yield (1).
As a corollary we have – via the equivalence theorem and (1) – the well-
known and obvious (under the usual non-occurrence restrictions on w)
Formally introduced set terms play a dual role. On one hand, formally, they are
just meaningless symbol sequences of which we have proved (or a proof exists,
in any case) that they are sets. For that reason, we often just say “. . . the set
{x : F } . . . ”.
On the other hand, the formula part of a set term first order defines (in the
standard structure) some real set; hence the term itself represents or names that
set.
The very format of the chosen symbol for set terms,
{x : F }
is suggestive of its semantics in the standard model: “the collection of all the x
that make F true”. As a matter of fact, this is more than notational suggestive-
ness: Soundness of all first order theories – and anticipating that our axioms will
be true in the standard model – implies that all ZFC theorems will be “really
true”. In particular, the formula in (iii) of III.2.6 is “true” and says that “x is in
{x : F [x]} iff F [x] is ‘true’ for this x”.
¬U(y) ⊢ZFC x ∈ {x : x ∈ y} ↔ x ∈ y
By III.2.6 (ii),
¬U(y) ⊢ZFC ¬U({x : x ∈ y})
as well; hence
is an axiom:
¬U(A) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ A ∧ P[x]))
For short,
¬U(A) → Coll x (x ∈ A ∧ P[x])
The above is a schema due to the presence of the arbitrary formula P . Every
specific choice of P leads to an axiom: An instance of the axiom schema. The
name “separation axiom” is apt since the axiom allows us to separate members
from non-members (of a set).
Why is the schema III.2.9 true (in the standard model)? Well, it says that if A
is a set, then – no matter what formula P we choose – we can also collect
all those x that make x ∈ A ∧ P [x] true (1)
into a set.
Now, all those x in (1) are in A, and we know that we have formed A at
some stage† (it is a set!) that comes after all the stages at which all the
various x in A were formed (or “given”, if atomic).
Thus, at this very same stage we can collect into a set just those x in A
that are moreover restricted to satisfy P [x].
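Read in a finite model, separation simply filters an already formed set. In the Python sketch below (hypothetical names; sets are frozensets and anything else counts as an atom) the result is always a subset of A, matching the remark that no later stage is needed. The second example anticipates III.3.7: pairing two urelements by separating them out of M.

```python
def separate(A, P):
    """{x ∈ A : P(x)}; like the schema, it needs ¬U(A), so atoms are rejected."""
    if not isinstance(A, frozenset):
        raise TypeError("separation needs a set, not an atom")
    return frozenset(x for x in A if P(x))


evens = separate(frozenset(range(10)), lambda x: x % 2 == 0)
print(sorted(evens))                        # [0, 2, 4, 6, 8]

M = frozenset({"a", "b", "c"})              # a hypothetical set of three urelements
pair = separate(M, lambda x: x == "a" or x == "b")
print(pair == frozenset({"a", "b"}))        # True: Coll x (x = a ∨ x = b) for a, b in M
```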
III.2.11 Proposition.
¬U(a), P → x ∈ a ⊢ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))
† Principle 1, p. 102, is at work here. Recall that we can take or leave this principle. However,
we have decided to take it (and hence adopt foundation, later on). It is worth stating that in the
absence of Principle 1, a “doctrine on limitation of size” would still effectively argue the “truth”
of separation.
Proof.
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ a ∧ P))
↔ equivalence theorem and A → B |=Taut A ↔ A ∧ B
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))
Done, since the top line is provable in the presence of the assumption ¬U (a)
(separation).
III.2.12 Corollary.
⊢ZFC ¬U(a) → (∀x)(P → x ∈ a) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))
† Upon reflection, there is nothing unsettling about pure logic proving that “objects” exist in set
theory. This is simply a consequence of our decision – in logic – not to allow empty structures.
This decision was also hardwired in the syntactic apparatus of logic.
since
⊢ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ U(x)))
by III.3.1.
That is, we have just introduced the (rather unimaginative) short name M for
the set term
{x : U (x)}
⊢ZFC y = M ↔ y = {x : U(x)}
⊢ZFC M = {x : U(x)}
III.3.4 Definition (Empty Set). By III.3.3 we may introduce the set term
{x : ¬x = x}
for the empty set. We can then follow this up by the axiom (definition)
∅ = {x : ¬x = x}
to introduce (using (9) in III.2.4) the new constant symbol ∅ for the empty set.
III.3.5 Remark. Referring to III.2.6 (ii) and (iii), we see that the intuitive
meaning – or “standard semantics” – of ∅ is the “set with no elements”, since
it is a set, but, moreover, x ∈ ∅ is “false” (equivalent to ¬x = x) for all x. And
this is as we hoped it would be (refer also to the discussion in Section II.4).
Syntactically we get the same understanding as follows: By III.2.6 and III.3.4,
⊢ZFC x ∈ ∅ ↔ ¬x = x
⊢ZFC ¬x ∈ ∅ ↔ x = x
III.3.7 Example. We saw how to justify the existence (as a formal mathematical
object – a set) of a “part of” M in the simple, but very important, case of ∅.
In general, III.2.12 allows us to prove Coll x A for any A for which we know
that A → x ∈ M (either as an assumption, or as a provable ZFC fact).
For example, we can show that for any a and b in M we can collect these two
elements into a set of “two” elements, intuitively denoted by “{a, b}”. Indeed,
a ∈ M ∧ b ∈ M → x = a ∨ x = b → x ∈ M (1)
In fact,
a ∈ M → x = a → x ∈ M (2)
and
b ∈ M → x = b → x ∈ M (3)
by the Leibniz axiom. Thus, proof by cases (I.4.26, p. 52) helped by |=Taut gives
x = a ∨ x = b → (a ∈ M → x ∈ M) ∨ (b ∈ M → x ∈ M)
of which (1) is a tautological consequence.
Thus,
if a ∈ M and b ∈ M, then Coll x (x = a ∨ x = b)
Can we repeat the above for any sets a and b? That is, is it true that
⊢ZFC Coll x (x = a ∨ x = b) for any objects a and b? In particular, can we say that we
can form the (real) sets {{a}} or {{a}, {b}} in the metatheory? Well, we should
hope so, since – intuitively – there is a stage after the stages when {a} and {b}
were built.
However, we need a new axiom to formally guarantee this, because all our
present axioms are true in the structure with underlying set {∅, 1, {1}} (M = {1}
here), but so is
(∀y)(¬U(y) → (∀x)(x ∈ y → U(x)))
since the members of every set in this structure are atoms. Thus,
present set of axioms + “no set has set elements” (1)
is consistent (cf. I.5.14). Hence
present set of axioms ⊬ “some set has set elements” (2)
Thus, by (2), we cannot prove yet that, in particular, a set {{a}} exists.
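The facts about the structure {∅, 1, {1}} that this argument relies on can be checked mechanically. In the Python sketch below (an illustration; the encoding is an assumption), the integer 1 plays the single atom and frozensets play sets. It checks that no set has a set as a member, that every subset of every set is again in the domain (so every instance of separation holds there, whatever P is), and that the set of all urelements, {1}, is in the domain. It does not attempt to check every axiom.

```python
from itertools import combinations

ATOM = 1                          # the single urelement, so M = {1}
EMPTY = frozenset()
M = frozenset({ATOM})
domain = [EMPTY, ATOM, M]         # underlying set {∅, 1, {1}}


def is_set(x):
    return isinstance(x, frozenset)


def subsets(s):
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


sets = [y for y in domain if is_set(y)]
# (∀y)(¬U(y) → (∀x)(x ∈ y → U(x))): no set has a set as a member
no_set_has_set_elements = all(not is_set(x) for y in sets for x in y)
# every subset of every set is in the domain, so every separation instance holds
separation_closed = all(s in domain for y in sets for s in subsets(y))
# the collection of all urelements is itself in the domain
has_M = frozenset(x for x in domain if not is_set(x)) in domain

print(no_set_has_set_elements, separation_closed, has_M)   # True True True
```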
One last comment before we leave this section: We choose not to postulate
existence of individual urelements, so it may be the case that M = ∅. This
leaves our options open in the sense that we can have the “usual” ZFC (with
no urelements) as a special case of our ZFC. We note in this connection that if,
instead of having a predicate (U) to separate sets from atoms, we adopted a two-sorted
language with two “types” of object variables, one, say a, b, c, a′, a″, . . . ,
for sets and one, say, p, q, p′, q′, . . . , for atoms, then
(∃p)(p = p)
would guarantee the existence of atoms, spoiling our present flexibility.
The collection that A names is technically called a class (cf. III.4.3). We, of
course, simply say “A is a class”.
To protect the innocent I state outright that there is no philosophical signi-
ficance in restricting attention to first order definable classes. It is not due to a
lack of belief in the existence of non-definable classes; rather it is due to a lack
of interest in them.
III.4.1 Informal Definition (Class Terms). For any formula A of ZFC, the
symbol sequence
{x : A} (1)
is called a class term.
† In I.5.15 we saw what it means to first order define a set in a structure. The notion naturally
extends to first order definability of any collection.
⊢ZFC Coll x A
then we use (1) to name a (formal) set term as per III.2.5 – thus, every set term
is also a class term.
(b) If not, then (1) can still be employed as an abbreviation of certain formal
texts described below (compare with III.2.6):
(i) y = {x : F } and {x : F } = y each stand for the formal text
¬U (y) ∧ (∀x)(x ∈ y ↔ F )
The “=” in “y = {x : F}” is not the formal “=”. We are not to parse the informal text
“y = {x : F }”, decomposing it into its ingredients. We take it in its entirety as
an alias for the formal text “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”. A similar comment
applies to informal uses of “=”, “∈”, and “U ” below.
III.4.2 Remark. (1) Ideally, we should have different notations for the symbol
{x : F } according to its status as a name for a set term or not – say, boldface type
in the former case, and lightface in the latter. However, it is typographically more
expedient to use no such notational distinctions but allow instead the context
(established by English text surrounding the use of such terms) to fend off
ambiguities.
(2) We already know that for some formulas A, ⊢ZFC ¬Coll x A. Semantically,
for such a formula A, the collection in the metatheory named by the
symbol
{x : A [x]} (∗)
is not a set.
Indeed, using III.4.1(i) above, we translate the formal “⊢ZFC ¬Coll x A”
into the theorem, written in English,
“There is no set y such that y = {x : A}” (∗∗)
Then, Platonistically, for such a formula A we know that the collection (∗) is
not a set in the metatheory, since the theorems – such as (∗∗) above – of the
formal theory are true in the standard model.
For example, we can state that “{x : x ∉ x} is not a set in the metatheory”.
The quoted fact is the translation of our formal knowledge that “There is no set
y such that y = {x : x ∉ x}”, or in full formal armor†
⊢ ¬(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∉ x))
For the semantic and informal side of things and for future reference we
state:
† The reader will recall that this is a fact of logic, whence just “⊢”.
parameters z1, . . . , zn,
If, for some choice of closed terms tn, ⊢ZFC Coll x A(x, t1, . . . , tn), then 𝔸(tn)
denotes a real set; otherwise it denotes a non-set class, called a proper class.
For the sake of convenience we will use “blackboard bold” capital letters as
short names of classes; e.g., 𝔸 abbreviates the class term {x : A} and we may
write, using the metalinguistic “=”,
𝔸 = {x : A}
These names are metavariables.† We will normally adopt the general convention
of naming a class term by the blackboard bold version of the same letter that
denotes the defining formula.
For example, 𝔸 = 𝔹 is short for {x : A} = {x : B}, 𝔸 ∈ 𝔹 is short for
{x : A} ∈ {x : B}, etc. – expressions which can be translated into the formal
language using III.4.1.
III.4.4 Remark. (1) Worth repeating: Class terms are just symbols that name
certain entities of our intuition, namely, classes. We will often abuse terminology
and say “let {x : A } be a class” rather than “let {x : A } name a class”, just
as one may say (under the assurance of ZFC Coll x A) “let {x : A} be a set”.
Properly speaking, a class term is a syntactic object, while a class is a “real”
object.
(2) What class terms and classes do for us is analogous to what number
theory argot does for us in Peano arithmetic (PA). Such argot allows us to
write, e.g., the easily understandable informal text
instead of
⊢PA (∀n)(n > 1 → (∃x)(∃y)(n = x × y ∧
x > 1 ∧ (∀m)(∀r)(x = m × r → m = 1 ∨ m = x)))
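The formal text above says that every n > 1 has a factor x > 1 whose only factorizations x = m × r have m = 1 or m = x, i.e., a prime factor. As a sanity check of the bracketing (not a proof, which must cover all n), the Python sketch below evaluates the formula literally for small n; the function name is hypothetical, and the bounded search ranges are justified in its docstring.

```python
def prime_factor_clause(n):
    """(∃x)(∃y)(n = x × y ∧ x > 1 ∧ (∀m)(∀r)(x = m × r → m = 1 ∨ m = x)).

    For n ≥ 1, any x, y with n = x × y lie in 1..n; for x > 1, any m, r with
    x = m × r lie in 1..x. So bounded search decides the formula exactly.
    """
    return any(
        n == x * y
        and x > 1
        and all(m == 1 or m == x
                for m in range(1, x + 1) for r in range(1, x + 1) if x == m * r)
        for x in range(1, n + 1)
        for y in range(1, n + 1)
    )


# The PA sentence: (∀n)(n > 1 → prime_factor_clause(n)), checked only for n < 60.
print(all(prime_factor_clause(n) for n in range(2, 60)))   # True
print(prime_factor_clause(1))                              # False, hence the hypothesis n > 1
```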
† We note that N, Z, Q, R, C are already reserved for the natural numbers, integers, rational num-
bers, reals, and complex numbers. These are metalinguistic constants. Besides, we have al-
ready called the Russell proper class “R”, and later we will use “On” and “Cn” for certain
proper classes. This does not conflict with the blackboard bold notation for indeterminate class
names.
In particular, in the context of class terms one can readily replace “stands
for” by “↔” to write – for example – something like (cf. III.4.1(i))
We can obtain (an absolute) proof of (∗) by starting with the tautology
and then abbreviating the left hand side, “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”, by
“y = {x : F }” using the translation rule III.4.1(i).
(3) Every “real” set named by some formal term t is a class, since (by III.2.8)
y = {x : x ∈ y}
and hence
t = {x : x ∈ t}
by substitution.†
III.4.5 Example.
(a)
y = 𝔸 (1)
(or 𝔸 = y) is very short text for y = {x : A}, which in turn is short for
(III.4.1(i))
Thus, whenever we claim that we can prove (1), we really mean that we
can prove (2). In particular, such a proof yields also a proof that
(i) Coll x A (by substitution axiom and modus ponens); hence {x : A} is
(i.e., can be introduced as) a set term; thus, 𝔸 is (denotes) a set.
Pause. What is all this roundabout argument for? Why don’t we just
say, “𝔸, a class, equals a set y.‡ Therefore, it is itself a set”?
(ii) x ∈ y ↔ A.
(b)
𝔸 ∈ 𝔹 (3)
Now, to say that we have a proof of (3) (or that we assume (3)) is to say that
we have a proof of (5) (or that we assume (5)). From the latter, tautological
implication along with ∃-monotonicity (I.4.23) yields
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ A))
that is, Coll x A. In words, “(3) implies that 𝔸 is a set”. This corresponds
well with our intention that class members are sets or atoms. Here 𝔸, being
a collection, is not an atom.
We have at once:
III.4.8 Proposition.
(i) 𝔸 ⊆ 𝔹 ↔ (∀x)(A → B)
(ii) 𝔸 ⊆ 𝔹 ∧ 𝔹 ⊆ 𝔸 ↔ 𝔸 = 𝔹
Distribution of ∀ over ∧ to the left of the first ↔, along with the tautology
theorem and equivalence theorem, shows that the above is a theorem of pure
logic.
III.4.9 Example. What can we learn from ⊢ZFC ¬U(y) ∧ 𝔸 ⊆ y? Well, III.2.8,
III.4.7 and III.4.8 allow us to translate the above into
⊢ZFC ¬U(y) ∧ (∀x)(A → x ∈ y)
By III.2.12 and modus ponens we get
⊢ZFC Coll x A
That is, 𝔸 is a set. Another way to say all this is
⊢ZFC ¬U(y) ∧ 𝔸 ⊆ y → Coll x A
III.4.11 Example. We translate the very common informal text “𝔸 ≠ ∅”: first,
into ¬({x : A} = ∅), and next, taking ⊢ZFC ∅ = {x : ¬x = x} into account,
¬(∀x)(A ↔ ¬x = x)
that is,
(∃x)¬(A ↔ ¬x = x) (1)
But
|=Taut ¬(Q ↔ P ) ↔ (Q ↔ ¬P )
Thus (1) is provably (within pure logic) equivalent to
(∃x)(A ↔ x = x) (2)
by the equivalence theorem. Since ⊢ x = x, (2) is in turn provably
equivalent to
(∃x)A
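The two propositional facts used in this translation can be confirmed by truth tables: the tautology ¬(Q ↔ P) ↔ (Q ↔ ¬P) that takes (1) to (2), and the collapse of Q ↔ P to Q when P is true, which is what ⊢ x = x provides. The Python sketch below (function names hypothetical) checks each formula under every assignment of truth values.

```python
from itertools import product


def iff(p, q):
    return p == q


def is_tautology(f, arity):
    """True iff f is true under every assignment of truth values to its arguments."""
    return all(f(*values) for values in product([False, True], repeat=arity))


def step_1_to_2(q, p):
    """¬(Q ↔ P) ↔ (Q ↔ ¬P), used to pass from (1) to (2)."""
    return iff(not iff(q, p), iff(q, not p))


def drop_true_side(q):
    """(Q ↔ true) ↔ Q: with ⊢ x = x, the formula A ↔ x = x collapses to A."""
    return iff(iff(q, True), q)


print(is_tautology(step_1_to_2, 2), is_tautology(drop_true_side, 1))   # True True
```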
Some authors use “∼” or even “\” for difference instead of “−”.
{x : x = a} − {x : x = a ∨ x = b} = ∅
while, provided a ≠ b,
{x : x = a ∨ x = b} − {x : x = a} = {x : x = b}
It is immediate that
(1) 𝔸 ∩ 𝔹 ⊆ 𝔸
(2) 𝔸 ∩ 𝔹 ⊆ 𝔹
(3) 𝔸 − 𝔹 ⊆ 𝔸
(∀x)(A ∧ B → A)
The above is provable in pure logic. Similarly for (2). Also, (3) translates to
(∀x)(A ∧ ¬B → A)
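Classes cannot be enumerated, but the same inclusions can be spot-checked for subsets of a finite universe, read as sets. The Python sketch below (helper names hypothetical) uses frozenset intersection, difference and subset (&, -, <=) and confirms (1)–(3) for every pair of subsets of a three-element universe, together with the two difference examples above.

```python
from itertools import combinations


def all_subsets(universe):
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def inclusions_hold(universe):
    """(1) A ∩ B ⊆ A, (2) A ∩ B ⊆ B, (3) A − B ⊆ A, for every pair of subsets."""
    family = all_subsets(universe)
    return all((A & B) <= A and (A & B) <= B and (A - B) <= A
               for A in family for B in family)


print(inclusions_hold({"a", "b", "c"}))                               # True
print(frozenset({"a"}) - frozenset({"a", "b"}))                       # frozenset()
print(frozenset({"a", "b"}) - frozenset({"a"}) == frozenset({"b"}))   # True (a ≠ b)
```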
and
⋂_{i=1}^{1} 𝔸_i stands for 𝔸_1
⋂_{i=1}^{n+1} 𝔸_i stands for (⋂_{i=1}^{n} 𝔸_i) ∩ 𝔸_{n+1}
The symbols “⋃_{i=1}^{n}” and “⋂_{i=1}^{n}” are also written as “⋃_{1≤i≤n}” and “⋂_{1≤i≤n}”
respectively.
In a moment of weakness one may find oneself writing “𝔸_1 ∪ · · · ∪ 𝔸_n” and
“𝔸_1 ∩ · · · ∩ 𝔸_n” respectively.
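The recursive clauses translate directly into recursive functions on a list of finite sets. In the hypothetical Python sketch below, the clauses for ⋃ are assumed to be the analogous ones with ∪, since only the ⋂ clauses appear above. As the footnote observes, such definitions live in the metatheory, where recursion over N is available.

```python
def big_intersection(As, n=None):
    """⋂_{i=1}^{n} A_i by recursion on n: A_1 when n = 1, else (⋂_{i=1}^{n-1} A_i) ∩ A_n."""
    n = len(As) if n is None else n
    if n < 1:
        raise ValueError("defined only for n ≥ 1")
    if n == 1:
        return As[0]                                # A_1 sits at Python index 0
    return big_intersection(As, n - 1) & As[n - 1]


def big_union(As, n=None):
    """The analogous recursion with ∪ (assumed here, mirroring the ∩ clauses)."""
    n = len(As) if n is None else n
    if n < 1:
        raise ValueError("defined only for n ≥ 1")
    if n == 1:
        return As[0]
    return big_union(As, n - 1) | As[n - 1]


As = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]
print(big_intersection(As) == frozenset({3}))        # True
print(big_union(As) == frozenset({1, 2, 3, 4, 5}))   # True
```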
† Recall that definitions, both formal and informal ones, are effected in the metatheory, where we
have tools such as natural numbers, induction, and recursion over N.
We note that x ∩ y makes sense formally even when one or both of x and y
are atoms, whereas the informal ∩ was defined only for classes.† The defining
axiom (i) and III.1.3 prove
⊢ZFC U (x) → x ∩ y = ∅
and
⊢ZFC U (y) → x ∩ y = ∅
The proof of the legitimacy of (v) is left to the reader. Note that here too
x − y, formally, makes sense for atom “arguments”. Moreover, we have the
“pathological” special cases
⊢ZFC U (x) → x − y = ∅
and
So much for the formal “∩” and “−”. Of course, the defining axiom for the
formal “∪” will still have to wait for the union axiom.
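To see the “pathological” cases concretely, here is a small Python model. It assumes (our choice, not the text's) that atoms are instances of a hypothetical Atom class with no members and that sets are frozensets. The functions meet and minus mirror the formal “∩” and “−” in being total on all objects and always returning a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """A hypothetical stand-in for an urelement; it has no members (cf. III.1.3)."""
    name: str


def members(obj) -> frozenset:
    """The members of obj: an atom has none, a set (frozenset) has its elements."""
    return frozenset() if isinstance(obj, Atom) else frozenset(obj)


def meet(x, y) -> frozenset:
    """Formal x ∩ y, total on all objects: always a set, never an atom."""
    return members(x) & members(y)


def minus(x, y) -> frozenset:
    """Formal x − y, total on all objects: the members of x that are not in y."""
    return members(x) - members(y)


sharp = Atom("#")
s = frozenset({Atom("1"), sharp})
print(meet(sharp, s), minus(sharp, s))  # frozenset() frozenset()
```

In this model minus(s, sharp) returns s itself; that case is a consequence of the sketch's own definitions and is offered only as an illustration.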
Since sooner or later we would have to extend the formal “∩”, “∪”, “−”
(recorded below) to the class versions for use in our argot, we decided to
introduce these symbols as (informal) abbreviations to begin with (as is done in,
e.g., Levy (1979)).
III.4.17 Definition. For the record, we introduce the 2-ary function symbols
“∩” and “−” formally to our language, and add the defining axioms (i) and (v)
of III.4.16 to our theory.
The context will easily tell in each use whether we are employing these
symbols formally, or as abbreviations as per III.4.12.
† It is normal practice in a first order language to insist that function symbols stand for totally
defined or total functions, upon interpretation. Thus it is appropriate that ∩ and − are defined on
all objects.
III.4.19 Proposition.
(1) U M is a proper class.
(2) ⊢ZFC U M = V M ∪ M.
Once we have the union axiom (which says that the union of two sets is a set),
we will obtain that V M is a proper class too (by (2) above and III.3.1).†
Similarly, we can define V N, the class of all sets whose construction is based
on the urelements in N ⊆ M. We write just V for V∅. We note that V N is
given by the class term {x : ¬U (x) ∧ sp(x) ⊆ N }.
(2) In many elementary developments of the subject one often works within
a “reference set”, or “relative universe”, X (a set), and the sets of interest are
subsets or members of X. With this understanding, one would write “−A” or
“Ā” for X − A (where A ⊆ X, and therefore A is a set) and call “−A” the
complement of A (with respect to X ). Note that for any set A ∈ U N , U N − A
(for any N ⊆ M) is a proper class (Exercise III.21); thus we will have little
use for complements. It is the difference (most of the time of sets, rather than
classes) that we will have use for.
III.5.1 Axiom (Unordered Pair). For any atoms or sets a and b there is a set
c such that a ∈ c and b ∈ c. Or, stated in the formal language,
(∃z)(a ∈ z ∧ b ∈ z)
† Other axiomatizations of set theory, originating with Gödel and Bernays, admit (proper) classes
as formal objects of study. See for example Monk (1969).
(∀x)(∀y)(∃z)(x ∈ z ∧ y ∈ z)
⊢ZFC Coll x (x = a ∨ x = b)
We have
(1) (∃z)(a ∈ z ∧ b ∈ z) III.5.1
(2) a ∈ A ∧ b ∈ A added; A is a new constant
(3) a ∈ A (2) plus taut. implic.
(4) b ∈ A (2) plus taut. implic.
(5) ¬U (A) (3) plus III.1.3 plus MP
(6) x = a → (x ∈ A ↔ a ∈ A) Leibniz axiom
(7) x = b → (x ∈ A ↔ b ∈ A) Leibniz axiom
(8) x = a → x ∈ A (6) and (3) and taut. impl.
(9) x = b → x ∈ A (7) and (4) and taut. impl.
(10) x = a ∨ x = b → x ∈ A (8) and (9) and taut. impl.
(11) Coll x (x = a ∨ x = b) (10) and (5) and III.2.11
Proof. See the proof above, and use (5) plus (8) and III.2.11. Alternatively
(without referring to the proof), by |=Taut x = a ↔ x = a ∨ x = a.
III.5.5 Definition (Pairs and Singletons). The above proposition and its corol-
lary allow the formal introduction of the set terms
{x : x = a ∨ x = b} (1)
and
{x : x = a} (2)
We also introduce the terms {a, b} (unordered pair) and {a} (singleton) by
the formal definitions (cf. III.2.4)
{a, b} = {x : x = a ∨ x = b}
and
{a} = {x : x = a}
III.5.6 Remark (Denoting Sets by Listing). We say that {a, b} and {a} denote
sets by explicit listing of their members. We note that the informal notation
N = {0, 1, 2, . . .} does not denote a set by explicit listing (in the metatheory).
Such notation is only possible for what we intuitively understand as “finite” sets.
The “. . . ” indicates our inability to conclude the listing and hints at a “rule”,
or understanding, of how to obtain more elements. Such understanding depends
on the context (in the case of N, just add 1 to get the next member).
III.5.7 Proposition. ⊢ZFC {a, b} = {b, a} and ⊢ZFC {a} = {a, a}.
III.5.8 Remark. (i) Why “⊢ZFC” rather than just “⊢” above? That is because
{a, b} and {a} were formally introduced as terms (sets) in III.5.5. Their intro-
duction necessitated the prior proof in (our present fragment of) ZFC of the
formulas Coll x (x = a ∨ x = b) and Coll x (x = a). As far as the class terms
{x : x = a ∨ x = b} and {x : x = a} are concerned, we have†
{x : x = a ∨ x = b} = {x : x = b ∨ x = a} (1)
and
{x : x = a} = {x : x = a ∨ x = a} (2)
† Cf. III.4.1 regarding the use of the unbracketed “=” in (1) and (2).
x = a ∨ x = b ↔ x = b ∨ x = a (3)
and
x = a ↔ x = a ∨ x = a (4)
Thus, (1) and (2) are just stating the tautologies in (3) and (4), while Proposi-
tion III.5.7 states much more in between the lines, in particular that {a, b} and
{a} are sets.
If we had introduced the pair and singleton as abbreviations of the respective
class terms instead,† then the above proposition would be provable in pure
logic – for it would be just stating (3) and (4) – and whether or not the terms
referenced are sets would be a separate issue.
This remark was necessitated by our decision not to differentiate the notations
for set terms and class terms.
(ii) Proposition III.5.7 is popularized in naı̈ve set theory by saying “when
we list the elements of a set explicitly, multiplicity or order of elements does
not matter”.
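Python's built-in set types are extensional, so the slogan of (ii) can be observed directly; lists, which record order and multiplicity, provide the contrast. The names a and b are placeholders.

```python
a, b = "a", "b"  # hypothetical names for two objects

# Sets are extensional, like {a, b} and {a} of III.5.5
assert frozenset([a, b]) == frozenset([b, a])  # order of listing is irrelevant
assert frozenset([a]) == frozenset([a, a])     # multiplicity is irrelevant

# Sequences, by contrast, record both order and multiplicity
assert [a, b] != [b, a]
assert [a] != [a, a]
print(len(frozenset([a, a])))  # 1
```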
While the axiom of pairing is not provable from the axioms that we had at our
disposal prior to its introduction (see p. 133), it becomes provable once (an
appropriate version of) collection and power set axioms are introduced (see
Exercises III.15 and III.16).
III.6.2 Example. We can easily verify that De Morgan’s laws hold for bounded
quantification, i.e.,
(∃x ∈ A)F ↔ ¬(∀x ∈ A)¬F
Indeed,
¬(∀x ∈ A)¬F
↔ by III.6.1
¬(∀x)(x ∈ A → ¬F )
↔ equiv. theorem
¬(∀x)(¬x ∈ A ∨ ¬F )
↔ “∀-De Morgan”
(∃x)¬(¬x ∈ A ∨ ¬F )
↔ “∨-De Morgan” and equiv. theorem
(∃x)(x ∈ A ∧ F )
↔ by III.6.1
(∃x ∈ A)F
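The bounded De Morgan law can be spot-checked on a finite universe by enumerating every subset A and every unary property F, with F given by its extension. The following Python sketch is a sanity check, not a proof. The universe range(3) and the helper names are illustrative assumptions.

```python
from itertools import chain, combinations, product


def subsets(universe):
    """All subsets of a finite universe, as tuples."""
    items = list(universe)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def exists_in(A, F) -> bool:
    """(∃x ∈ A)F, i.e. (∃x)(x ∈ A ∧ F)."""
    return any(F(x) for x in A)


def forall_in(A, F) -> bool:
    """(∀x ∈ A)F, i.e. (∀x)(x ∈ A → F)."""
    return all(F(x) for x in A)


def bounded_de_morgan_holds(universe) -> bool:
    """Check (∃x ∈ A)F ↔ ¬(∀x ∈ A)¬F for every A ⊆ universe and every F."""
    items = list(universe)
    for A in subsets(items):
        for truth_values in product([False, True], repeat=len(items)):
            F = dict(zip(items, truth_values)).__getitem__  # F by its extension
            if exists_in(A, F) != (not forall_in(A, lambda x: not F(x))):
                return False
    return True


print(bounded_de_morgan_holds(range(3)))  # True
```

The empty A is included: there (∃x ∈ A)F is false and (∀x ∈ A)¬F is vacuously true.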
III.6.5 Example. ⋃{#, {|}, {1, {2}}} = {|, 1, {2}}, where “#”, “|”, “1”, “2”
are names for atoms. So, in the result of the union, “loose atoms” are lost.
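A Python rendering of III.6.5, with a hypothetical Atom class standing in for urelements and frozensets for sets. The function big_union implements {x : (∃y ∈ A)x ∈ y}, so a member of A that is an atom contributes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an urelement: it has no members."""
    name: str


def members(obj) -> frozenset:
    return frozenset() if isinstance(obj, Atom) else frozenset(obj)


def big_union(obj) -> frozenset:
    """⋃obj = {x : (∃y ∈ obj) x ∈ y}; an atom y ∈ obj contributes nothing."""
    return frozenset(x for y in members(obj) for x in members(y))


sharp, bar, one, two = Atom("#"), Atom("|"), Atom("1"), Atom("2")
family = frozenset({sharp, frozenset({bar}), frozenset({one, frozenset({two})})})
result = big_union(family)
print(result == frozenset({bar, one, frozenset({two})}))  # True
print(sharp in result)  # False: the loose atom # is lost
```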
Let now A be a set, and consider ⋃A, that is, {x : (∃y ∈ A)x ∈ y}. Let A
be formed at stage Σ. Then each y ∈ A must be available before Σ, and since
x ∈ y for each x that we collect in ⋃A, a fortiori, x is available before Σ. It
follows that ⋃A itself can be built at Σ as a set, so it is a set. As in the case of
pairing, we state the following axiom of union in a “weak” form. It asserts the
existence of a set that contains the union as a subclass. This, by III.4.10, makes
the union a set.
(∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (1)
III.6.7 Remark. Formula (1) has A as its only free variable. Of course, it is
equivalent to a version that is prefixed with “(∀A)”. Now, this axiom is stated
a bit too tersely (especially for our flavour of ZFC, which allows atoms) and
needs some “parsing”.
(a) (1) is provably equivalent to
Indeed, (2) is a tautological consequence of (1). Conversely, (1) follows from (2)
and proof by cases, because we can also prove
U (A) ⊢ZFC ¬y ∈ A
U (A) ⊢ZFC x ∈ y ∧ y ∈ A → x ∈ z
To see this, let B be a z that works (we are arguing by auxiliary constant) in (1).
Thus we add (to ZFC) the assumption
(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B) (5)
Hence
¬U (B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)
and then the substitution axiom yields (4).
• Let (i.e., add) U (B). By III.1.3 and the assumption we obtain ¬x ∈ B; hence
x ∈ B → x ∈ ∅ (6)
by tautological implication. (6) and tautological implication (followed by
generalization) yield
(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)
from which
¬U (∅) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)
Once again, the substitution axiom yields (4).
III.6.8 Proposition. ⊢ZFC Coll x (∃y ∈ A)x ∈ y, where A is a free variable.
Proof. We use (4) of III.6.7(b). Add then a new constant B and the assumption
¬U (B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)
Thus
¬U (B) (i)
and, by specialization and tautological implication,
y ∈ A ∧ x ∈ y → x ∈ B (ii)
We now want
(∃y)(y ∈ A ∧ x ∈ y) → x ∈ B (iii)
which will settle the case by III.2.11 and (i). Well, (iii) follows from (ii) by
∃-introduction.
III.6.9 Definition (The Formal Big and Little ∪). For the record, we
introduce into our theory a unary (1-ary) function symbol, “⋃”, formally, by
the defining axiom
⋃A = z ↔ ¬U (z) ∧ (∀x)(x ∈ z ↔ (∃y ∈ A)x ∈ y) (1)
We also introduce a new binary (or 2-ary) function symbol, “∪”, by the defining
axiom
x ∪ y = ⋃{x, y} (2)
Worth repeating: If A is a set, then so is ⋃A. Indeed, the assumption translates
to Coll x A; hence the class term A – that is, {x : A} – is (really, names) a
formal term “t” of set theory. So is ⋃t, by the definition of terms and III.6.9.
But is it an atom? Since ⊢ZFC ¬U (⋃x) by the preceding definition, where
x is a free variable, ⊢ZFC ¬U (⋃t) by substitution.
III.6.10 Remark. By III.6.8 the function ⋃ “makes sense” for both set and
atom variables. It is trivial to see from (1) above that
⊢ZFC U (A) → ⋃A = ∅
It follows that the binary formal ∪ also makes sense for any arguments and that
⊢ZFC U (A) ∧ U (B) → A ∪ B = ∅.
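Defining the binary ∪ through ⋃ as in (2) makes it total automatically. A Python sketch in a hypothetical atoms-and-frozensets model, in which an Atom class has no members and frozensets are sets. The helpers are defined here so the block stands alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an urelement: it has no members."""
    name: str


def members(obj) -> frozenset:
    return frozenset() if isinstance(obj, Atom) else frozenset(obj)


def big_union(obj) -> frozenset:
    """⋃obj, total on atoms (the ⋃ of an atom is ∅)."""
    return frozenset(x for y in members(obj) for x in members(y))


def union(x, y) -> frozenset:
    """Formal x ∪ y := ⋃{x, y}, as in defining axiom (2)."""
    return big_union(frozenset({x, y}))


a, b = Atom("a"), Atom("b")
s = frozenset({a})
print(union(a, b))       # frozenset(): two atoms give ∅
print(union(s, b) == s)  # True: the atom argument contributes nothing
```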
III.6.11 Example. What is ⋃{a, b}? How does it relate to the informal definition
(III.4.12, p. 141)? Let us calculate using III.6.3:
The second “=” from the bottom was by application of the “one-point rule”
(I.6.2). Note that in “a ∪ b” we are using the formal “∪” to allow this term to
be meaningful for both sets and atoms a, b.†
† “⋃{a, b}” of III.6.3 is meaningful for both sets and atoms a, b. So is the formal “∪” of III.6.9,
unlike “A ∪ B” of III.4.12, which is defined only for class arguments.
III.6.13 Example. Let F = {{1, 2}, {1, 3}} (this family is a set; working in the
metatheory, apply pairing three times). Then ⋂F = {1}.
Let now G be any family, and a ∈ G. Then ⋂G ⊆ a. Indeed, the translation
of the claim (by III.6.12 and III.4.7) is
a ∈ G → (∀y)(y ∈ G → x ∈ y) → x ∈ a (1)
We can prove (1) within pure logic: Assume a ∈ G and (∀y)(y ∈ G → x ∈ y).
By specialization, a ∈ G → x ∈ a; hence (MP) x ∈ a. By the deduction theo-
rem, (1) is now settled. What happens if G = ∅? (See Exercise III.18.)
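For a nonempty finite family of sets, ⋂ is a fold of pairwise intersections. A Python sketch (big_intersection is our name, not the text's). It refuses the empty family, because there every object qualifies vacuously.

```python
from functools import reduce


def big_intersection(family: frozenset) -> frozenset:
    """⋂family = {x : (∀y)(y ∈ family → x ∈ y)}, for a nonempty family of sets."""
    if not family:
        # Every object satisfies the condition vacuously, so no set can hold the result.
        raise ValueError("the intersection of the empty family is not a set")
    return reduce(frozenset.intersection, family)


F = frozenset({frozenset({1, 2}), frozenset({1, 3})})
print(big_intersection(F))  # frozenset({1})

# ⋂G ⊆ a for every a ∈ G
assert all(big_intersection(F) <= a for a in F)
```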
Priorities of set operations. “−” (that is, difference – as we will not use comple-
ments) and “∪” have the same priority and associate right to left. “∩” is stronger
(associativity is irrelevant by Exercises III.19 and III.20). Thus A − B ∪ C =
A − (B ∪ C), while A ∩ B − C = (A ∩ B) − C, A ∩ B ∪ C = (A ∩ B) ∪ C.
When in doubt, use brackets!
† We do not feel inclined to perform acrobatics just to get around the fact that ⋂∅ cannot be a
formal term: it is not a set (see Example III.6.13 below).
Proof. We do (1) imitating the way people normally argue this type of thing,
“at the element level”. The proof uses pure logic and Definitions III.4.1, III.4.3,
III.4.8, and III.4.12.
⊆: Let x ∈ C − (A ∪ B). Then
x ∈ C (i)
and
x ∉ (A ∪ B) (ii)
By definition of ∪, (ii) yields
x ∉ A ∧ x ∉ B (iii)
Combine (i) and (iii) to get (by definition of difference)
x ∈ C − A ∧ x ∈ C − B
or (by definition of ∩)
x ∈ (C − A) ∩ (C − B)
Done, by the deduction theorem.
⊇: Let x ∈ (C − A) ∩ (C − B). Then
x ∈ C − A ∧ x ∈ C − B
Hence
x ∈ C (iv)
and
x ∉ A ∧ x ∉ B
This last one says (by definition of ∪)
x ∉ (A ∪ B)
which along with (iv) gives
x ∈ C − (A ∪ B)
Case (2) is left as Exercise III.26.
III.6.16 Example. A better way, perhaps, is to use translations and reduce the
issue to a tautology: (1) above translates to (III.4.1, III.4.3 and III.4.12)
C ∧ ¬(A ∨ B ) ↔ (C ∧ ¬A) ∧ (C ∧ ¬B )
Noting that (by propositional De Morgan’s laws)
|=Taut C ∧ ¬(A ∨ B ) ↔ C ∧ (¬A ∧ ¬B )
we are done.
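The identity C − (A ∪ B) = (C − A) ∩ (C − B) can also be checked exhaustively over the subsets of a small universe. This is a finite sanity check in Python, not a proof; the universe range(3) is an arbitrary choice.

```python
from itertools import chain, combinations


def all_subsets(universe) -> list:
    """Every subset of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def difference_law_holds(universe) -> bool:
    """Check C − (A ∪ B) = (C − A) ∩ (C − B) for all A, B, C ⊆ universe."""
    subs = all_subsets(universe)
    return all(C - (A | B) == (C - A) & (C - B)
               for A in subs for B in subs for C in subs)


print(difference_law_holds(range(3)))  # True (8**3 = 512 triples checked)
```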
     a  b  c  D  B  A  X  ...
a    i  i  i  i  i  i  i  ...
b    i  i  i  i  i  i  i  ...
c    i  i  i  i  i  i  i  ...
D    i  i  i  i  i  i  i  ...
B    i  i  i  i  i  i  i  ...
A    i  i  i  i  i  i  i  ...
X    i  i  i  i  i  i  i  ...
⋮    ⋮  ⋮  ⋮  ⋮  ⋮  ⋮  ⋮  ⋮
The a, b, c, . . . that label the columns and rows are all the sets and atoms
arranged in some fashion. We may call these labels the heads of the respective
rows or columns that they label.
Each entry, i, can have the value 0 or 1 or the name of an atom (without
loss of generality, we assume that no atom has as name 0 or 1). This value is
determined as follows:
Entry at row z and column w =
  0 if z ∈ w
  1 if z ∉ w ∧ ¬U (w)
  w if U (w) [N.B. This entails z ∉ w]
Here are a few examples: Say a is an atom. Then all the i’s in column a have
value a. Let b = {b, D}.† Then column b has 1 everywhere except on rows b
and D, where it has 0. Conversely, the head of a column sequence of 0, 1, and
atom values is determined by the sequence.‡ For example, the sequence of 1
everywhere determines ∅; the sequence
1 1 0 1 1 1 . . . (all 1 after position c)
i.e., the one that has 1 everywhere except at (row) position c, where it has a 0,
determines the set {c}.
Let us now define informally a sequence, and therefore a class that we will
call R, by going along the main diagonal (that is, along the matrix entries
(a, a), (b, b), (c, c), . . . ) and “reversing” all the i-values (specifically, nonzero to 0
and 0 to 1). That is, the sequence for R has a
  0 at position x if entry(x, x) is not a 0
  1 at position x if entry(x, x) is a 0 (1)
It follows that the sequence for R differs from the sequence for any x at position x.
Thus, R cannot occur anywhere in the matrix (as a column) – for if it occurred
as column x, it then would be “schizophrenic” at matrix entry (x, x) – so it is
not a set (recall, the matrix represents all sets and atoms as columns).
What is the connection with Russell’s paradox? Well, in (1) we are saying
that x ∈ R iff x ∉ x; hence R is precisely the Russell class! The above diagonalization
can be readily adapted to “construct” a set that is not in a given set b. All we
have to do is to think of the matrix as “listing”, or representing, just all the
atoms and sets in b rather than in U M (see Exercise III.6).
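The diagonal construction can be played out on a finite matrix. The Python sketch below is only a toy version, since no program can list all of U M. A hypothetical list heads plays the role of the given set b in the adaptation just mentioned, atoms are instances of a hypothetical Atom class, and entry follows the 0/1/atom rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an urelement."""
    name: str


def entry(z, w):
    """Matrix entry at row z, column w: 0 if z ∈ w, 1 if z ∉ w and w is a set,
    and the atom w itself if w is an atom."""
    if isinstance(w, Atom):
        return w
    return 0 if z in w else 1


def column(w, heads):
    """The column headed by w, read down the rows listed in `heads`."""
    return [entry(z, w) for z in heads]


def diagonal_column(heads):
    """Reverse the main diagonal as in (1): nonzero becomes 0, and 0 becomes 1."""
    return [0 if entry(x, x) != 0 else 1 for x in heads]


a = Atom("a")
empty = frozenset()
heads = [a, empty, frozenset({a}), frozenset({empty}), frozenset({a, empty})]

R = diagonal_column(heads)
# R differs from the column of every head x at position x
assert all(R[i] != column(x, heads)[i] for i, x in enumerate(heads))

# Read R back as a set: its members are the rows where R has a 0
d = frozenset(x for x, bit in zip(heads, R) if bit == 0)
print(d in heads)  # False: a set not among the listed ones
```

Because a Python frozenset can never be a member of itself, every diagonal entry here is nonzero, so R consists entirely of 0s and the set it determines collects all the heads.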
† In view of what we said in the preceding footnote regarding hypersets, we allow just for the sake
of this discussion the generality where b ∈ b is possible.
‡ Do not expect all sequences to appear as columns. For example the sequence whose members
are all 0 denotes U M , but our matrix heads are only sets or atoms.
We have chosen to describe (by ZFC) a standard model where, among its
properties, we have that x ∈ x is always false, by Principle 1.† Similarly,
a ∈ b ∈ a and, more generally, a ∈ b ∈ · · · ∈ a are absurd in our model, for
the leftmost a should be available before the rightmost a for such a chain of
memberships to be valid (Principle 1).
Note that if, say, a ∈ b ∈ a were possible, then we would get the “infinite”
chain
···a ∈ b ∈ a ∈ b ∈ a ∈ b ∈ a (1)
i.e., a would be “bottomless”, like an infinite regression of a “box in a box in a
box in a box . . . ”. Sets that are not bottomless are called well-founded.
A bit more can be said of the standard model. It is not only “repeating”
chains such as (1) that are not possible, but likewise non-repeating infinite
“descending” chains such as
· · · an ∈ an−1 ∈ · · · ∈ a2 ∈ a1 ∈ a0 (2)
There are no bottomless sets, period.
Towards formulating an appropriate axiom of ZFC that says “bottomless sets
do not exist”, let A be any nonempty class. Assume that it contains no atoms.
Now, there must be a set (maybe more than one) in A that was constructed no
later than any other set in A (for example, if # and | name atoms, and if {#} ∈ A
and {|} ∈ A, then {#} and {|} are two among those sets in A that are constructed
the earliest possible).
Let now, in general, y be such an earliest-constructed set in A, and let x ∈ y.
It follows that x ∉ A (for x is an atom – hence not in A – or is a set built
before y). The existence of sets like y in A captures foundation. Thus, taking
A to contain precisely the members of (2), we see that (2) is absurd.
† Note that in such a state of affairs all entries (x, x) are nonzero; thus R is the sequence composed
entirely of 0, representing U M. This is as expected, since now R = U M.
III.7.3 Remark.
(1) The foundation axiom (schema) is also called the regularity axiom.
(2) The schema version of foundation is due to Skolem (1923). It readily
implies – using A ≡ y ∈ A – and is implied† by the single-axiom
(non-schema) set version where A is a free variable other than y:
(∃y)y ∈ A → (∃y)(y ∈ A ∧ ¬(∃x ∈ y)x ∈ A)
or
¬U (A) → (A ≠ ∅ → (∃y ∈ A)(¬(∃x ∈ y)x ∈ A))
(3) The discussion that motivated III.7.2 was in terms of a class A that contained
no atoms. No such restriction is stated in III.7.2, for trivially, if A does
contain atoms, any such atom will do for y. If it is known that A is a family
of sets (i.e., that it contains no atoms), then foundation simplifies to
A ≠ ∅ → (∃y ∈ A)y ∩ A = ∅
(4) If for a minute we write < for ∈, then III.7.2 (formal language version)
reads exactly as the least number principle on N. Of course, ∈ is not an
order on all sets; however, if its scope is restricted to appropriate sets, then
it becomes an order, and III.7.2 makes it a well-ordering. More on this in
Chapter VI.
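For finite families, the foundation axiom can be witnessed by direct search: look for some y ∈ A with y ∩ A = ∅ (an atom always qualifies). The following Python sketch uses hypothetical names. Python frozensets are always well-founded, so the final error branch is never reached for a nonempty A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an urelement: it has no members."""
    name: str


def members(obj) -> frozenset:
    return frozenset() if isinstance(obj, Atom) else frozenset(obj)


def foundation_witness(A: frozenset):
    """Return some y ∈ A with no member in A, i.e. ¬(∃x ∈ y) x ∈ A."""
    if not A:
        raise ValueError("foundation says nothing about the empty class")
    for y in A:
        if not (members(y) & A):  # y ∩ A = ∅
            return y
    # Unreachable for Python frozensets, which are always well-founded.
    raise AssertionError("no ∈-minimal element found")


empty = frozenset()
one = frozenset({empty})  # {∅}
two = frozenset({one})    # {{∅}}
print(foundation_witness(frozenset({one, two})) == one)  # True
```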
or, as we say when we act like Platonists, “let B be an object such as (1) tells
us exists”. From (2) we derive B ∈ {a}; hence, by III.5.5,
B = a (3)
and
¬(∃x)(x ∈ B ∧ x ∈ {a})
By (3) and III.5.5, the latter is equivalent to
¬(∃x)(x ∈ a ∧ x = a)
that is, by the one-point rule,
¬a ∈ a
† We use the term “set” in quotes because at the time in the development of set theory that this
commentary refers there was no technical distinction between sets and proper classes. Rather,
there was a distinction between “sets” and “self-contradictory” sets; they were all sets, but some
were troublemakers and were avoided.
‡ If x could talk, it would say “I am a member of myself”.
† The reader is reminded of the nowadays acknowledged existence of such (hyper)sets (Barwise
and Moss (1991)) – not, however in ZFC.
‡ Indeed, mathematicians were suspicious of even the phrase “all sets” in something as innocent
as “. . . let us divide all possible sets into [equivalence] classes. . . ” (N.B. Just let us divide,
not attempt to collect into a “set”.) See Wilder (1963, p. 100) for further discussion, where he
speculates whether the “concept” of “all sets” might be as “self-contradictory” as the concept of
“the set of all sets”.
After this has been accomplished, we will forget Principles 0–3 and always
defer to the axioms.
III.8.1 Example. Let B be a set and let A ⊆ B. Then certainly A is not larger
than B, so A is a set by Principle 3. Thus separation (see the class form of
the schema, III.4.10) follows from the doctrine of size limitation as much as it
follows from that of set formation by stages.
Next, let U be the class of all singletons. Is this class “large” (hence proper),
or is it “small” (hence a set)? This example appears in Wilder (1963, p. 100)
(see in particular the closing remarks prior to his 4.1.2), where the argument
implies that this “set” is not “self-contradictory” (what we now call a “proper
class”), for, after all, it is far from containing “all sets”. In fact, in 4.1.2 (loc.
cit.) the “cardinal number 1” is identified with the “set” of all singletons (U )
without any adverse comment.
Well, it turns out that U is a proper class, for it has the same size as U M , as we
can readily see from the fact that each x ∈ U M corresponds to a unique {x} ∈ U
and vice versa. Thus, as a “set”, U would be every bit as “self-contradictory”
as U M . Incidentally, we must wonder to what extent the fact that, as a “set”, U
clearly satisfied U ∉ U made it more acceptable than U M back then.
Now consider a set A. Let us next “replace” every element x ∈ A by some other
object x′ (set or urelement).†
Evidently, the resulting class (let us call it A′) is not larger than the original
(and could very well be smaller, for we might have replaced several x ∈ A by
the same object); hence, by Principle 3, A′ is a set. This is the principle of (it
goes under several names) replacement or substitution or collection, and it is
very important in ZFC.
We prefer not to use the name “substitution” for this nonlogical axiom, for
that would clash with our use of the name for the logical axiom
A [x ← t] → (∃x)A
† We use “replace” in the weak sense, where it is possible that for one or more x ∈ A the replacing
object is the same as the replaced object.
III.8.3 Remark. (I) In any specific instance of the axiom (schema) of collection
the formula P [x, y] is the “agent” that effects the replacements: The hypothesis
ensures that for each x ∈ A, P suggests a y (maybe it has more than one
suggestion) – depending on x – as a possible replacement.
The conclusion says that there is a set,† z, which contains, instead of each x
that was originally in A, one (or more) replacement(s), y, among the possibly
many that were suggested by P . (All the suggestions were made to the left of
“→”.)
There is a small difficulty here: In the formal statement adopted in III.8.2 –
where we have allowed more than one possible candidate y to replace each x –
we run at once into a size and a “definability” problem: Obviously, if we are
going to argue that the size of z is small (and hence z is a set) we have to be
able to
(a) either choose a unique replacement y for each x ∈ A (and P [x, y] cannot
help us here; we have to do the choosing), or
(b) choose a “very small number” of replacements y for each x ∈ A – i.e., cut
down the size of the class of replacement values for each x – so that the
size of z is not substantially different from that of A.‡
† Well, not exactly. It says that a formal object exists, but this object could well be an atom. Since
we will prove (III.8.4) an equivalent statement to (1), which explicitly asks that z be a set, we
can pretend in this discussion that (1) already asks that z be a set, although it does so between
the lines.
‡ Clearly, it is not “safe” to collect into z all possible y that P [x, y] yields for each x. For example,
if P [x, y] ≡ x ⊆ y and A = {∅}, then allowing all the y that P yields for x ∈ A we would end
up with z = U M , not a set.
We can do better than that (avoiding the axiom of choice, which we have
not formally introduced yet) if we allow ourselves to put in z possibly more
than one y that satisfy P [x, y] for a given x ∈ A, that is, approach (b). We
do this as follows: To show (informally) that a set z as claimed by the formal
axiom exists, and that therefore the axiom is “really true”, let us consider, for
each x ∈ A, all the y such that P [x, y] is true which are built at the earliest
possible stage. There is just one such stage for each x, call it Σx. Now the class
of all such y, call it Yx, is a set, for all its elements are available at stage Σx,
and there certainly is a stage after Σx (at such a stage, Yx is formed as a set).
Thus, for each x ∈ A we ended up with a unique set Yx. Using the informal
analysis prior to the axiom, there is a set B that contains exactly all the Yx. It
is clear now that we can “well-define” z: z = ⋃B will do, and is a set by the
union axiom.
(II) The hypothesis part of the axiom is usually stated in stronger terms,
viz., (∀x ∈ A)(∃!y) P [x, y],† and in that format it usually goes under the name
replacement axiom. The present form (mostly known as the collection axiom,
e.g., Barwise (1975)) is clearly preferable, for to apply it we have to work
less hard to recognize that the hypothesis holds. All the various formulations
of collection/replacement are equivalent in ZF (even without the “C”). Some
other forms besides the ones stated so far are the following, where we are using
set term notation in the interest of readability:
(1) Bourbaki (1966b):
(∀x)(∃z)(∀y)(P [x, y] → y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y]
or in more suggestive notation
(∀x)(∃z){y : P [x, y]} ⊆ z → (∀A)Coll y (∃x ∈ A) P [x, y]
(2) Shoenfield (1967):
(∀x)(∃z)(∀y)(P [x, y] ↔ y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y]
or in more suggestive notation‡
(∀x)(∃z){y : P [x, y]} = z → (∀A)Coll y (∃x ∈ A) P [x, y]
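The informal justification given in (I) of III.8.3 can be imitated on hereditarily finite sets: for each x ∈ A keep only the earliest-built y with P [x, y], then take a union. In this Python sketch, rank is our stand-in for “the stage at which a set is built”. The finite pool of candidate sets is a hypothetical substitute for the universe, and P [x, y] ≡ x ∈ y is chosen purely for illustration.

```python
from itertools import chain, combinations


def powerset(s) -> frozenset:
    items = list(s)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def rank(s: frozenset) -> int:
    """Stand-in for 'the stage at which s is built' (pure sets only):
    ∅ gets 0, otherwise one more than the largest rank of a member."""
    return max((rank(m) + 1 for m in s), default=0)


def collect(A, P, pool) -> frozenset:
    """For each x ∈ A keep the y ∈ pool with P(x, y) that are built earliest
    (the set Yx), and return z = ⋃{Yx : x ∈ A}."""
    z = set()
    for x in A:
        candidates = [y for y in pool if P(x, y)]
        if candidates:
            least = min(rank(y) for y in candidates)
            z |= {y for y in candidates if rank(y) == least}  # this is Yx
    return frozenset(z)


empty = frozenset()
pool = powerset(powerset(powerset(powerset(empty))))  # the 16 sets of rank < 4
A = frozenset({empty, frozenset({empty})})             # {∅, {∅}}
z = collect(A, lambda x, y: x in y, pool)
print(sorted(rank(y) for y in z))  # [1, 2, 2]
```

For x = ∅ the earliest witness is {∅}; for x = {∅} there are two rank-2 witnesses, and both go into z. Without the earliest-stage filter, every set in the pool that contains {∅} would be swept in.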
† Recall that (∃!x)R says that there is a unique x satisfying R. That is,
(∃x)(R[x] ∧ (∀y)(y ≠ x → ¬R[y])).
‡ The “suggestive” notation in (1) above is 100% faithful to the formal version, since, by III.3.6,
(∀y)(P [x, y] → y ∈ z) is provably equivalent to {y : P [x, y]} ⊆ z (see also III.1.5, p. 116). Not
so for the suggestive rendering of (2) in the presence of atoms. For example, on the assumption
U (z), (∀x)(¬x = x ↔ x ∈ z) is not equivalent to {x : ¬x = x} = z. Well, on one hand
Shoenfield (1967) does not employ atoms, so that our rendering of (2) captures exactly what this
form of collection “says” in loc. cit. On the other, this is a moot point, for we prove (III.8.12)
that the formal version (2) – even in the presence of atoms – implies version (3) without adding
the qualifier “¬U (z) ∧ ” before “(∀y)”. In turn, we find out later that form (3) implies collection.
In short, versions (1) and (2), exactly as stated, are equivalent to our collection.
Put another way, we cannot “stretch” an “infinite” set A into a proper class
by the device of replacing each of A’s elements with horrendously complicated
sets – i.e., sets that are built extremely late in the stage hierarchy – in an effort
to run out of stages. Starting with A ∈ U M , no matter how far we stretch it, we
still end up inside U M . Therefore, we do have a lot of stages. Equivalently, our
“universe” is “very large”.
There is an important observation to be made here: The reason that we have
used the size doctrine to justify collection/replacement is, intuitively, precisely
the result of the game above. We felt that we could not apply Principle 2
(p. 102) reliably, or convincingly, towards arguing that “we could imagine” that
a stage existed after all the stages for the construction of all the sets Sx (our two
colleagues could not imagine either).
The reader is referred to Manin (1977, p. 46), where he states that – in
the context of the doctrine of set formation by stages – the justification of the
collection axiom goes beyond the “usual intuitively obvious”.
x ∈ A → (∃y ∈ ∅) P [x, y]
B ∈ A (7)
¬U (C) (10)
A ↔ A∗ (1)
over the basic language L Set , that we care to substitute into the metavariable
P . There is no a priori promise that the schema “works” whenever we replace
the syntactic variable P [x, y] by a specific formula, say “B ”, over a language
that is an extension of L Set by definitions.
For example,† do we have the right to expect the provability of
(∀x ∈ A)(∃y)y = t[x] → (∃z)(∀x ∈ A)(∃y ∈ z)y = t[x]
in the extended theory, if the term t contains defined function or constant
symbols?
Indeed we do, for let us look, in general, at an instance of collection obtained
in the extended language by substituting the specific formula B – that may
contain defined symbols – into the syntactic variable P :
(∀x ∈ A)(∃y)B [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B [x, y] (2)
We argue that (2) is provable in the extended theory; thus the axiom schema is
legitimately usable in any extension by definitions of set theory over L Set .
Following the technique of symbol elimination given in Section I.6 (cf. I.6.4,
p. 73) – eliminating symbols at the atomic formula level – we obtain the fol-
lowing version of (2), in the basic language L Set . This translated version has
exactly the same form as (2) (i.e., of collection), namely
(∀x ∈ A)(∃y)B ∗ [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B ∗ [x, y]
Thus – being a collection schema instance over the basic language – it is an
axiom of set theory, and hence also of its extension (by definitions).
Now, by (1), the equivalence theorem yields the following theorem of the
extended theory:
(∀x ∈ A)(∃y)B [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B [x, y]
↔
(∀x ∈ A)(∃y)B ∗ [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B ∗ [x, y]
Hence (2) is a theorem of the extended theory as claimed.
The exact same can be said of the other two schemata (foundation,
separation).‡
III.8.6 Example (Informal). We often want to collect into a set objects that
are more complicated than [“values of”] variables, subject to a condition being
true. For example, we often write things such as
(1) {n² : n ∈ N},
(2) {x + y : x ∈ R ∧ y ∈ R ∧ x² + y² = 1},
(3) {(x, y) : x ∈ R ∧ y = 2}, where “(x, y)” is the “ordered pair” (more on
which shortly) of the Cartesian coordinates of a point on the plane,
(4) {(x, y) : x ∈ R}.
We are clear on what we mean by these shorthand notations. First off, for
example, notation (1) cannot be possibly obtained in any manner by substitution
from something like {x : . . . x . . . }, since the “x” in a set term {x : A} is bound.
What we do mean is that we want to collect all objects that have the form “n²”
for some n in N. That is, notation (1) is shorthand (abbreviation or argot) for
{x : (∃n)(x = n² ∧ n ∈ N)}
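The shorthand {n² : n ∈ N} and its unabbreviated form {x : (∃n)(x = n² ∧ n ∈ N)} can be compared in Python. To work with finite data, we assume that N is cut down to the initial segment range(bound).

```python
def squares_shorthand(bound: int) -> set:
    """{n² : n ∈ N}, with N cut down to the finite segment range(bound)."""
    return {n ** 2 for n in range(bound)}


def squares_unabbreviated(bound: int) -> set:
    """{x : (∃n)(x = n² ∧ n ∈ N)}: filter candidates x by an explicit ∃n."""
    candidates = range(bound ** 2)  # contains every n² with n < bound
    # n ranges over range(bound), which plays the role of "n ∈ N"
    return {x for x in candidates if any(x == n ** 2 for n in range(bound))}


print(sorted(squares_shorthand(5)))  # [0, 1, 4, 9, 16]
```

The comprehension form corresponds to the argot, while the filter form spells out the existential quantifier over the linking variable n.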
The variables xn explicitly quantified in (1) above are precisely the ones we
list in “[xn ]” of A. We may call them “linking” variables (linking the term t
with the “condition” A) or “active” variables (Levy (1979)). All the remaining
variables other than y are free (parameters).
The notation does not always unambiguously indicate the active variables. In
such cases the context, including surrounding text, will remove any ambiguity.
Thus,†
⋃{t[x] : A [x]} = {z : (∃x)(A [x] ∧ z ∈ t[x])} (3)
Only x, not A, is the linking variable. We could have written {t[x] : (x ∈ A)[x]}
to indicate this.
Proof. By collection,
since
† Note that since we are dealing with abbreviations, this is a theorem of pure logic.
x ∈ A → t[x] ∈ B (6)
C ∈ A ∧ y = t[C]
Thus
C ∈ A (8)
and
y = t[C] (9)
Proof. Apply III.6.8 to {t[x] : x ∈ A} (a set, by III.8.9), and use (3) above
(in III.8.8).
{t[x, y] : x ∈ A ∧ y ∈ B}
III.8.12 Proposition. In the presence of the ZFC axioms that we have intro-
duced so far – less collection, except when explicitly assumed as (1) below – we
have the following chain of implications (stated conjunctively): (1) → (2) →
(3) → (4) → (5), where the statements are
Proof. (1) → (2): We assume (1) (collection version III.8.2). To prove (2)
assume the hypothesis. Hence (specialization)
(∃z)(∀y)(P [x, y] → y ∈ z)
By III.3.6,†
Coll y P [x, y]
It follows that {y : P [x, y]} can be introduced formally as a set term. Hence,
by III.8.10,
Coll y (∃x ∈ A)y ∈ {y : P [x, y]}
(2) → (3): We assume (2). To prove (3), assume the hypothesis. Hence
(specialization)
(∃z)(∀y)(P [x, y] ↔ y ∈ z)
(∀y)(P [x, y] ↔ y ∈ B)
By III.2.4 (p. 122: (10), (11), (18)), (ii) and (iii) allow us to introduce a new
function symbol, f P , into the language, and the defining axiom
into the theory. From (iv) one deduces (corresponding to (19′) and (20) of III.2.4)
and
† Observe how we did not need to insist that B is a set. This issue was the subject of a footnote on
p. 164.
P [x, y] ↔ y ∈ { f P [x]}
Case of ¬(∃y) P [x, y]. Thus, (∀y)¬P [x, y]; hence ¬P [x, y] is deriv-
able. By tautological implication, P [x, y] → y ∈ ∅.
Conversely, y ∈ ∅ → P [x, y] is a tautological consequence of the theorem
¬y ∈ ∅. Thus,
P [x, y] ↔ y ∈ ∅
and we derive (vii′) once more, by the substitution axiom. Proof by cases now
yields (vii′) solely on the hypothesis (∀x)(∀y)(∀y′)(P [x, y] ∧ P [x, y′] →
y = y′); hence we have (vii) by generalization.
Having settled (vii), we next obtain
which entails
and
x ∈ A → (P [x, y] ∧ P [x, y′] → y = y′) (ix)
Q [x, y] ≡ (x ∈ A ∧ P [x, y]) ∨ (x ∉ A ∧ y = ∅)
Q [x, y] ∧ Q [x, y′] → y = y′
Hence
Let then (a formal definition of “B”† introduced just for notational convenience)
Let also w ∈ A.
By (viii) we get (∃y) P [w, y], which allows us to add a new constant C and
the assumption
P [w, C] (xii)
and, by (xi),
C ∈ B   (xiii)
w ∈ A → (∃y ∈ B) P [w, y]
† Of course, “B” is a name for a term, and what one really defines formally here is a function f ,
by f (A, . . . ) = {y : (∃x)(x ∈ A ∧ P [x, y])}, where “. . . ” are all those free variables present
that we do not care to mention. In practice, “let B be defined as . . . ” is all one really cares to say.
Hence
III.8.14 Remark. (I) Thus, in the sequel we may use any version of collection,
as convenience dictates.
(II) Intuitively, collection versions (2)–(5) have a hypothesis that guarantees
that for each “value” of x the corresponding number of values of y that satisfy
P [x, y] is sufficiently “small” to fit into a set. In fact, in case (4) at most one
y-value is possible for each x-value, while in case (5) exactly one is possible –
albeit on the restriction that x is varying over a set A. Thus, collecting all the
values y, for all x in a set A, yields a set under all cases (2)–(5). This set is what
we call in elementary algebra or discrete mathematics courses the image of A
under the black box P [x, y]. This black box is an agent that for each “input”
value x yields zero or more (but not too many) “output” values y.
We note that collection in its version III.8.2 does not have the “not too many”
restriction on the number of outputs y for each x, and that is why one is selective
when collecting such outputs into a set. The conclusion of Axiom III.8.2
allows the possibility that many outputs y are left out of z; it says
only that some outputs are included.
(III) Collection versions (2)–(4) in III.8.12 are quite strong, even in the
absence of some of the other axioms. For example, Bourbaki (1966b) adopts the
axiom of pairing together with collection version (2), and proves both separation
and union (Exercise III.14).
Shoenfield (1967) adopts separation and proves pairing and union from col-
lection version (3) (Exercise III.15). Finally, Levy (1979) adopts union, and
proves separation and pairing from collection version (4) (Exercise III.16).
III.9.1 Informal Definition (Power Class, Power Set). For any class A, P(A)
stands for {x : ¬U (x) ∧ x ⊆ A}. We read P(A) as the power class of A.
If A is a set, then P(A) is also pronounced the power set of A.
P(∅) = {∅}
P({∅}) = {∅, {∅}}
P({0, 1}) = {∅, {0}, {1}, {0, 1}}
Also note that
(i) Since for every class A we have ∅ ⊆ A, it follows that ∅ ∈ P(A).
(ii) If a is a set, then a ⊆ a as well; hence we have a ∈ P(a).
(iii) For a set x, x ∈ P(a) iff x ⊆ a.
(iv) Even though U(x) → x ⊆ a (is provable), still U(x) → x ∉ P(a) (is prov-
able), since x must satisfy ¬U(x) for inclusion. Power classes contain no
atoms.
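The small examples above, and facts (i) and (ii), can be checked mechanically for finite sets. The following Python sketch is only an illustration under our own assumptions: finite sets are modelled as `frozenset`s, the informal numbers 0 and 1 are stood in for by Python integers, and urelements are not modelled at all (so (iv) is outside its scope). The helper name `power_set` is hypothetical, not the book's notation.

```python
from itertools import combinations


def power_set(a: frozenset) -> frozenset:
    """Return P(a) for a finite set a: the set of all subsets of a.

    Assumption: sets are modelled as Python frozensets; urelements are not
    modelled, so every member of the result is itself a (frozen)set.
    """
    members = list(a)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )


EMPTY = frozenset()
SINGLETON_EMPTY = frozenset({EMPTY})  # the set {∅}

# P(∅) = {∅}
assert power_set(EMPTY) == frozenset({EMPTY})
# P({∅}) = {∅, {∅}}: exactly two members, so "=" (not merely "⊇") holds
assert power_set(SINGLETON_EMPTY) == frozenset({EMPTY, SINGLETON_EMPTY})

# P({0, 1}), listed in a stable order for display
print(sorted(sorted(s) for s in power_set(frozenset({0, 1}))))
# prints [[], [0], [0, 1], [1]]
```

Of course, such a finite check illustrates the “=” in P({∅}) = {∅, {∅}} but is no substitute for the formal argument that follows.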
Pause. Now P({∅}) ⊇ {∅, {∅}} by (i) and (ii) above. But have we not forgotten
to include any other subsets of {∅}? Is it really “=” (rather than “⊃”) as we have
claimed above? The definitive answer that one is tempted to give is “Obviously,
the above is ‘=’ as stated”.†
† My geometry teacher in high school used to say: “. . . [T]here are many proof methods: e.g., by
contradiction, by induction, etc. Among all those proof methods the most powerful is proof by
intimidation. It starts with the word ‘obviously’. Few have the courage to challenge such a proof
or to demand details . . . ”.
III.9. Axiom of Power Set 179
¬U (x) (2)
(∀y)(y ∈ x → y = ∅) (3)
and
¬x = ∅
(∃y)y ∈ x (4)
A ∈ x   (5)
∅ ∈ x   (6)
by the Leibniz axiom and (5). The one point rule (III.6.2, p. 149) gives
∅ ∈ x ↔ (∀y)(y = ∅ → y ∈ x)
Thus, by (6),
(∀y)(y = ∅ → y ∈ x)
(∀y)(y ∈ x ↔ y = ∅)
III.9.3 Exercise. Repeat the above argument without relying on the axiom of
pairing. Thus, prove that
x ∈ P(P(∅)) ↔ x = ∅ ∨ x = P(∅)
Let be a stage after (we have no problem accepting that a stage exists
after a given stage). Then, by (1), P(A) is formed as a set at stage .
We could not reliably argue the above using the size limitation doctrine
(Principle 3, p. 161), for it is not intuitively clear whether an “exponential
growth” in set size is harmless. In fact, Principle 3 was introduced solely to
justify the replacement axiom, and, in Chapter V, the axiom of infinity.
We capture the above informal argument by the following power set axiom:
(∃y)(∀x)(x ⊆ A → x ∈ y) (1)
Actually, the axiom as stated above is not exactly what we have established in
the informal argument that preceded it. The axiom says a bit more than “the
power class of a set is a set”.
It is really true as stated, nevertheless. First off, the y of (1) above is neces-
sarily a set by x ⊆ A → x ∈ y, since (∃x)x ⊆ A is provable (A ⊆ A is a theorem
of pure logic – see III.1.5, p. 116) and hence, so is (∃x)x ∈ y by ∃-monotonicity.
Thus ¬U (y) by Axiom III.1.3. We have omitted the usual qualification “¬U (y)”
in the statement of the axiom, since it is a conclusion that the axiom forces any-
way. So the axiom says that
“There is a set y which contains as elements all x such that x ⊆ A,
without restricting x to be a set ”.
Now we see why it is really true. If A is a set, then lifting the restriction from
x adds to P(A) all the urelements (see III.1.5, p. 116). Well, there is a set y as
described above, for example, M ∪ P(A).†
If on the other hand A is an atom, then a choice for y that works is M ∪ {∅}.
As we have done on previous occasions, here too we prefer not to assert
explicitly that the objects which our axioms claim to exist are sets (nor do we
want to unnecessarily restrict our variables to be sets). We prefer to prove this
† We are using here the name M – which was earlier introduced formally to denote the set of all
atoms – to also name the set of all real atoms in the metatheory.
III.9.5 Proposition.
(∀x)(x ⊆ A → x ∈ B)
Hence
by ∀-monotonicity, since
We would like now to introduce the symbol “P” formally, in the interest of
convenience, along with its informal use. As in the cases of ∪, ∩, and −, we
will take no notational measures to distinguish between the formal and informal
occurrences of the symbol; we will rely instead on the context.
or, equivalently
III.9.7 Remark. (I) Once again we note that it is redundant to add “¬U (y) ∧ ”
in (1) (III.9.6) or “¬U (P(A)) ∧ ” in (2). Indeed,
To see this, note that the ← direction is a tautological implication. For the →
direction we have
¬U (x) ∧ x ⊆ A ↔ x ∈ y
(∃x)(¬U (x) ∧ x ⊆ A)
¬U (y)
¬U (P(A)) (3)
(a, b) = (a′, b′) → a = a′ ∧ b = b′   (1)
In particular, (a, a) is supposed to have two objects in it, a first a and a second
a, so it is not to be confused with {a, a} = {a}.
Some naı̈ve approaches to set theory take (a, b) to be a new kind of object
whose behaviour, i.e., (1), is “axiomatically accepted” (admittedly this is a
patently odd thing to do in a non-axiomatic approach). To proceed formally
within a framework that accepts sets and urelements as the only formal objects,
we must implement our new object as a set.
III.10. Pairing Functions and Products 183
There are several implementations, the simplest one (due to Kuratowski)
being {{a}, {a, b}}, or the related {a, {a, b}}.
III.10.1 Proposition. If {a, {a, b}} = {a′, {a′, b′}}, then a = a′ and b = b′.
Proof. (Presented in a “relaxed” manner, that is, in argot. See also III.5.9,
p. 148.) By foundation, a ≠ {a, b} (otherwise a ∈ a). Thus, taking the ⊆-half of
the hypothesis, and discarding the two combinations that would force a = {a, b},
we obtain
a = a′ and {a, b} = {a′, b′}   (1)
or
a = {a′, b′} and {a, b} = a′   (2)
From (2) we get a ∈ a′ ∈ a, contradicting foundation. Therefore, case (2) is
untenable. Let us further analyze case (1), which already gives us half of what
we want, namely, a = a′.
Thus,
{a, b} = {a, b′}   (3)
If a = b, then the ⊇-part of (3) gives b′ = b, and we are done. Otherwise (a ≠ b), the
⊆-part of (3) gives b ∈ {a, b′}, hence b = b′, and we are done again.
¬(a = a ∧ a = {a , b })
¬(a = {a, b} ∧ a = a )
and
¬(a = {a, b} ∧ a = {a , b })
a = a
Many of the sequel’s proofs are in the “relaxed” style. We get to be formal
whenever there is danger of missing fine points in this argot.
Proof. The only-if part is Proposition III.10.1. The if part follows from the
Leibniz axiom.
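Proposition III.10.1 can be spot-checked by brute force on a small stock of sets. The sketch below is an illustration, not a proof, and rests on our own assumptions: finite sets are modelled as Python `frozenset`s, which are well-founded by construction (a frozenset cannot contain itself), so foundation holds automatically and its role in the proof is invisible here. The names `pair`, `kuratowski`, `unordered`, and `is_injective_on` are hypothetical helpers. The check compares the pair {a, {a, b}} of III.10.1 with Kuratowski's {{a}, {a, b}} and with the unordered pair {a, b}, which is not injective.

```python
from itertools import combinations, product


def power_set(a):
    members = list(a)
    return frozenset(frozenset(c) for r in range(len(members) + 1)
                     for c in combinations(members, r))


def pair(a, b):
    """The pair of III.10.1: {a, {a, b}}."""
    return frozenset({a, frozenset({a, b})})


def kuratowski(a, b):
    """Kuratowski's pair: {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def unordered(a, b):
    """The unordered pair {a, b}; it forgets the order of its arguments."""
    return frozenset({a, b})


def is_injective_on(pairing, universe):
    """True iff pairing(a, b) == pairing(a2, b2) forces a == a2 and b == b2
    for all a, b, a2, b2 drawn from universe."""
    decoded = {}
    for a, b in product(universe, repeat=2):
        code = pairing(a, b)
        if code in decoded and decoded[code] != (a, b):
            return False
        decoded[code] = (a, b)
    return True


# V4: the 16 sets obtained by applying the power set operation four times to ∅
V4 = frozenset()
for _ in range(4):
    V4 = power_set(V4)

print(is_injective_on(pair, V4), is_injective_on(kuratowski, V4),
      is_injective_on(unordered, V4))
# prints True True False
```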
† An exception occurs in Chapter VII in our study of cardinality, where yet another pairing is
considered.
Some will say that using the above definition for “pair” is overkill, since
foundation was needed to establish its key property in III.10.1. By contrast, a definition
of ordered pair via {{a}, {a, b}} does not require this axiom (see Exercise III.36).
This is a valid criticism for a development of set theory that is constantly pre-
occupied with the question of what theorem needs what axioms. In the context
of our plan, it is a minor quibble, since we will seldom ask such questions, and
we do have foundation anyway.
x1 , . . . , xn
III.10.5 Remark. (1) Some authors will not define n-tuples for arbitrary n –
once again avoiding the set N and inductive definitions – instead, they will
“unwind” the recursion and give a definition that goes, say, up to a 5-tuple, e.g.,
J^(1)(x) = x
J^(2)(x, y) = J(J^(1)(x), y)
and leave the rest up to the imagination, invoking “. . . ”. The reader should not
forget that we are using n in the metalanguage. As far as the formal system is
Thus, from now on we denote the ordered pair by the symbol “⟨a, b⟩”, rather
than “(a, b)”, in the interest of notational uniformity.
(3) The ordered pair provides, intuitively, a pairing function – which moti-
vates the name we gave J – that, for any two objects a and b, given in that order,
“codes” them into a unique object c (= ⟨a, b⟩) in such a way that if, conversely,
we know that a given c is a “code”, then we can uniquely (by III.10.3) “decode”
it into a and b. That is, if c is an ordered pair, then (∃!x)(∃!y)⟨x, y⟩ = c holds.
The unique x is called the first projection and the unique y is called the second
projection of c – in symbols (we are using notation due to Moschovakis), π(c)
and δ(c) respectively.† More accurately, since not all sets (and no urelements)
are valid “codes” (that is, pairs; e.g., {0} is not), we must let π and δ “return”
some standard “output” when the input c is “bad” (not a pair).‡
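The decoding described in (3), including the convention that π and δ return a fixed default on inputs that are not pairs, can be sketched as follows. The assumptions are ours: pairs are the {a, {a, b}} of III.10.1, sets are Python `frozenset`s, other hashable Python values (here integers) stand in for urelements, and the default output is ∅, matching III.10.11 (2). The names `decode`, `pi`, and `delta` are hypothetical.

```python
EMPTY = frozenset()


def pair(a, b):
    """The pair {a, {a, b}} of III.10.1."""
    return frozenset({a, frozenset({a, b})})


def decode(c):
    """Return (x, y) with pair(x, y) == c, or None if c is not a pair.

    A pair {x, {x, y}} has exactly two members (x != {x, y} by foundation),
    one of which is a set containing the other.
    """
    if not isinstance(c, frozenset) or len(c) != 2:
        return None
    u, v = tuple(c)
    for x, w in ((u, v), (v, u)):
        if isinstance(w, frozenset) and x in w and len(w) <= 2:
            rest = w - {x}
            y = next(iter(rest)) if rest else x  # w == {x} means y == x
            if pair(x, y) == c:
                return (x, y)
    return None


def pi(c):
    """First projection; the default ∅ for non-pairs."""
    decoded = decode(c)
    return decoded[0] if decoded is not None else EMPTY


def delta(c):
    """Second projection; the default ∅ for non-pairs."""
    decoded = decode(c)
    return decoded[1] if decoded is not None else EMPTY


c = pair(0, 1)
print(pi(c), delta(c))              # prints 0 1
print(pi(frozenset({0})) == EMPTY)  # {0} is not a pair: prints True
```

Returning ∅ rather than raising an error keeps π and δ total, in the spirit of the footnote on total interpretations.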
We next prove
Assume the hypothesis, and add the assumptions (by auxiliary constants)
⟨x, A⟩ = z
and
⟨v, B⟩ = z
Hence
⟨x, A⟩ = ⟨v, B⟩
† Presumably, π for “πρώτη” (= first) and δ for “δεύτερη” (= second).
‡ Once again we will ensure that the defined function symbols, π and δ, have total interpretations.
This determination was also at play when we defined power set and, earlier on, the formal “∩”,
“∪”, and difference. We defined all these functions to act on all sets or atoms.
Therefore
x = v
by III.10.3.
By III.2.4 (19), (20), and (19′) one directly obtains in ZFC (using III.10.6):
III.10.7 Proposition.
and
(∃x)(∃y)⟨x, y⟩ = z → ((∃y)⟨x, y⟩ = z ↔ x = π(z))   (π-19′)
III.10.9 Proposition. The following are theorems in the presence of the defining
axiom III.10.8:
and
(∃x)(∃y)⟨x, y⟩ = z → ((∃x)⟨x, y⟩ = z ↔ y = δ(z))   (δ-19′)
We have at once:
III.10.11 Proposition. The following are ZFC theorems (in the presence of the
appropriate defining axioms):
OP(z) → ⟨π(z), δ(z)⟩ = z   (1)
and
¬OP(z) → π(z) = ∅ ∧ δ(z) = ∅ (2)
Proof. (2) is a direct consequence of III.10.7 and III.10.9 ((π -20) and (δ-20)).
As for (1), assume the hypothesis, OP(z). By III.10.7 (π -19),
(∃y)⟨π(z), y⟩ = z
while by III.10.9 (δ-19),
(∃x)⟨x, δ(z)⟩ = z
The above two allow us to assume (where A and B are new constants)
⟨π(z), A⟩ = z
and
⟨B, δ(z)⟩ = z   (3)
Hence
III.10.12 Corollary. The following are ZFC theorems (in the presence of the
appropriate defining axioms):
π(⟨x, y⟩) = x   (3)
and
δ(⟨x, y⟩) = y   (4)
Proof. We note the logical theorem ⟨x, y⟩ = ⟨x, y⟩, from which the substitution
axiom yields (∃u)(∃v)⟨u, v⟩ = ⟨x, y⟩, that is,
OP(⟨x, y⟩)   (5)
By (5) and the logical theorem (∃u)⟨x, u⟩ = ⟨x, y⟩, (6) yields
x = π(⟨x, y⟩)
Proof. By III.8.7 the left hand side abbreviates the class term
or
This A × B makes sense for all variables A and B. In particular, one can
prove
U (A) → A × B = ∅
as well as
∅ × B = ∅
III.10.19 Remark. The following proof of III.10.17 for any sets A and B is
often criticized as overkill (e.g., Barwise (1975)). Bourbaki (1966b), Levy
(1979), and Shoenfield (1967), who use the collection-based proof above, just
stay away from it without comment; Jech (1978b), by contrast, does not:
Let ⟨a, b⟩ ∈ A × B, i.e., a ∈ A and b ∈ B. Thus, {a, b} ⊆ A ∪ B, and
therefore {a, b} ∈ P(A ∪ B). It follows that {a, {a, b}} ⊆ A ∪ P(A ∪ B) and
hence
{a, {a, b}} ∈ P(A ∪ P(A ∪ B))
Thus ⟨a, b⟩ ∈ P(A ∪ P(A ∪ B)), establishing A × B ⊆ P(A ∪ P(A ∪ B)). By
separation, A × B is a set.
An additional criticism here may be that this proof needs to know the
implementation of “⟨a, b⟩”, while the collection-based one does not need
this information.
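The power-set argument of III.10.19 can be traced on small finite sets. In this sketch (our assumptions: sets are Python `frozenset`s, integers and strings stand in for urelements, pairs are {a, {a, b}} as in III.10.1; the helper names are hypothetical) we do not build P(A ∪ P(A ∪ B)), which would already have 2^34 members for the sets below. Instead we test membership in it through the criterion the proof uses: a set z lies in P(A ∪ P(A ∪ B)) iff every member of z is in A or is a subset (not an atom) of A ∪ B.

```python
def pair(a, b):
    """The pair {a, {a, b}} of III.10.1."""
    return frozenset({a, frozenset({a, b})})


def cartesian(A, B):
    """A × B as the set of all pairs <a, b> with a in A and b in B."""
    return frozenset(pair(a, b) for a in A for b in B)


def in_bound(z, A, B):
    """Decide z ∈ P(A ∪ P(A ∪ B)) without building that power set:
    every member of z must lie in A or be a subset (not an atom) of A ∪ B."""
    AB = A | B
    return all(m in A or (isinstance(m, frozenset) and m <= AB) for m in z)


A = frozenset({0, 1})
B = frozenset({"x", "y", "z"})
product_AB = cartesian(A, B)
print(len(product_AB), all(in_bound(z, A, B) for z in product_AB))
# prints 6 True

# III.10.20: the product is not commutative
print(cartesian({0}, {1}) == cartesian({1}, {0}))  # prints False
```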
The objection that the proof is overkill, on the other hand, is context-
dependent. If the foundation of set theory is going to exclude the power set
axiom (one important set theory so restricted is that of Kripke and Platek with
or without urelements – “KP” or “KPU”; e.g., Barwise (1975)), then the objec-
tion is justified. If on the other hand we do have the power set axiom, then we
certainly are going to take advantage of it, and we reserve the right to use any
axiom we please in our proofs.
III.10.20 Example (Informal). {0} × {1} = {⟨0, 1⟩} and {1} × {0} = {⟨1, 0⟩}.
Since ⟨0, 1⟩ ≠ ⟨1, 0⟩, these two products are different; hence A × B ≠ B × A
in general.
×_{i=1}^{n} Ai and ×_{1≤i≤n} Ai stand for
{⟨x1, . . . , xn⟩ : x1 ∈ A1 ∧ · · · ∧ xn ∈ An}   (∗)
Inductively,
×_{i=1}^{1} Ai stands for A1
×_{i=1}^{n+1} Ai stands for (×_{i=1}^{n} Ai) × An+1
If all the Ai are equal, say to A, then we will usually write A^n. We let A^1
mean A.
That the “. . . ” notation, in (∗) of III.10.21 above, and the inductive definition
coincide can be verified using III.10.4.
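The inductive clauses can be run directly on finite sets, which gives a quick sanity check that they agree with the set-builder form (∗). This is a sketch under our own assumptions: pairs are {a, {a, b}} as in III.10.1, n-tuples are built by ⟨x1⟩ = x1 and ⟨x1, …, xn+1⟩ = ⟨⟨x1, …, xn⟩, xn+1⟩ (the convention of III.10.4), sets are Python `frozenset`s, and the function names `ntuple` and `cross` are hypothetical.

```python
from itertools import product


def pair(a, b):
    """The pair {a, {a, b}} of III.10.1."""
    return frozenset({a, frozenset({a, b})})


def ntuple(*xs):
    """<x1> = x1 and <x1, ..., x(n+1)> = <<x1, ..., xn>, x(n+1)>."""
    if not xs:
        raise ValueError("tuples need at least one component")
    t = xs[0]
    for x in xs[1:]:
        t = pair(t, x)
    return t


def cross(*sets):
    """The n-fold product by the inductive clauses:
    the 1-fold product is A1; the (n+1)-fold one is (n-fold product) × A(n+1)."""
    acc = frozenset(sets[0])
    for A in sets[1:]:
        acc = frozenset(pair(t, a) for t in acc for a in A)
    return acc


A1, A2, A3 = {0, 1}, {"p", "q"}, {7}
inductive = cross(A1, A2, A3)
# The set-builder form (*): all <x1, x2, x3> with x1 in A1, x2 in A2, x3 in A3
set_builder = frozenset(ntuple(*xs) for xs in product(A1, A2, A3))
print(inductive == set_builder, len(inductive))  # prints True 4
```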
We can, of course, use the metalogical “=” in lieu of “stands for”. For
example, the logical theorem A1 = A1 leads to the logical theorem
×_{i=1}^{1} Ai = A1
on replacing the left A1 by the “abbreviation” ×_{i=1}^{1} Ai.
“rule”) we consider† the classes {x : A(a, x)} for each a ∈ I. I is the index
class.
We cannot in general collect all the Aa into a class, yet we can (informally)
“define” their union and intersection:
⋃_{a∈I} Aa stands for {x : (∃a ∈ I) x ∈ Aa}, that is, {x : (∃a ∈ I) A(a, x)}
and
⋂_{a∈I} Aa stands for {x : (∀a ∈ I) x ∈ Aa}, that is, {x : (∀a ∈ I) A(a, x)}
† “Consider”, not “collect”, since some Aa may be proper classes and we are unwilling to collect
other than sets or atoms into a class.
‡ This is a “theorem schema”. For each value of the informal object n we have a (different) theorem:
“A × B is a set”; “A × B × C is a set”; etc. We have suggested above a (meta)proof of all these
theorems at once by induction in the metatheory.
§ Such a table may have, intuitively, infinite length.
¶ Hence the term “one-to-many”.
III.11.2 Remark. (1) The term “binary”, understood if omitted, refers to the
fact that we have a class of (ordered) pairs in mind.
Some mathematicians – especially in the context of a discrete mathematics
course – want to have n-ary relations, for any n ≥ 1, that is, classes whose
members are n-tuples (III.10.4, p. 185). We will not spend any nontrivial amount
of time on those, since for any n ≥ 2, ⟨x1, . . . , xn⟩ = ⟨⟨x1, . . . , xn−1⟩, xn⟩ (by III.10.4), and
therefore any n-ary relation, for n ≥ 2, is a binary relation. For n = 1 we
have the unary relations, that is, classes of elements that are 1-tuples, ⟨x⟩.
Since ⟨x⟩ = x (cf. III.10.4), unary relations are just classes with no additional
requirements imposed on their elements. We will not use the terminology “unary
relations”; rather we will just call them classes (or sets, as the case may be). For
the record, when one uses n-ary relations, one usually abbreviates “⟨x1, . . . , xn⟩ ∈ T”
by “T(x1, . . . , xn)”. In particular, when n = 2, the texts “⟨x, y⟩ ∈ T”, “T(x, y)” and
“y T x” state the same thing.
An n-ary relation T may naturally arise as the extension or implementation
of a formula of n free variables, that is, as T = {xn : T (xn )} (see III.8.7). In
this case, and in view of the above comment, the texts “T(xn )” and “T (xn )”
are interchangeable in the argot of n-ary relations.
(2) The empty set is obviously a relation.
(3) Note the reversal of order in “⟨a, b⟩ ∈ T iff b T a” in III.11.1. This is
one of a variety of tricks employed in the literature in order to make notation
regarding composition consistent between relations and functions (we will re-
turn to clarify this point when composition is introduced). The trick employed
here is as in Shoenfield (1978); for a different one see Levy (1979).
(4) In the same spirit, whenever the defining formula F of a relation F is
by convention written in so-called infix notation (i.e., x F y) rather than prefix
(i.e., F (x, y)) – e.g., one writes x < y rather than <(x, y) – then we observe
this reversal by writing F = {⟨y, x⟩ : x F y} or F = {z : OP(z) ∧ δ(z) F π(z)}.
This notation has the nice side effect that a F b ↔ a F b is provable. For
example, {⟨y, x⟩ : x < y} is the relational implementation of the formula x < y.
Note. We will continue writing F = {⟨x, y⟩ : F(x, y)}, that is, there is no
reversal of variables if the defining formula F is written in the usual prefix
notation.
Whenever b T a holds, we say that T, when presented with input a, “re-
sponds” with b among its (possibly many different) outputs.
(5) A relation often inherits the name of the defining formula. Thus the
relation {⟨y, x⟩ : x ∈ y} is also denoted by “∈”, and ⟨y, x⟩ ∈ ∈ means x ∈ y.
The left “∈” in “⟨y, x⟩ ∈ ∈” is the nonlogical symbol, while the right “∈” is
the informal name of the relation that extends the formula x ∈ y. Similarly, <
is used as both the name of {⟨y, x⟩ : x < y} and that of the defining formula;
thus ⟨y, x⟩ ∈ < means x < y.
With some practice, all this will be less confusing than at first sight.
III.11.3 Example (Informal). Here are some relations: ∅, {⟨0, 1⟩}, R² (where
R is the set of all reals), {⟨0, 1, 2⟩}. According to the above remark, the last
example is both 3-ary (ternary) and binary, since ⟨0, 1, 2⟩ = ⟨⟨0, 1⟩, 2⟩.
B”. The symbol† S : A → B is read exactly like any one of the three previous
italicized sentences in quotes.
A is called the left field and B is called the right field of S.
If S ⊆ A × A, then we say that S is a relation on A rather than on A × A.
Given S : A → B and in the context of these fields, S is total iff dom(S) = A;
otherwise it is nontotal.
It is onto iff ran(S) = B. In this case we often say that S maps A onto B, or
just that S is onto B.
The converse or inverse of any class S, in symbols S−1, is the class {⟨x, y⟩ :
⟨y, x⟩ ∈ S}.
The concept of inverse (converse) applies, in particular, to relations S.
Let S : A → B, X ⊆ A, and Y ⊆ B. The image of X under S, in symbols
S[X], is the class of all the outputs that are caused by inputs in X, i.e., {y :
(∃x ∈ X)y S x}.
We have the non-standard‡ shorthand Sc for S[{c}].
The inverse image of Y under S is just S−1 [Y].
We have the non-standard shorthand S−1 c for S−1 [{c}].
III.11.5 Remark. (1) The notions of left field and right field are not absolute;
they depend on the context. It is clear that once left and right fields are chosen,
then any super-class of the left (respectively, the right) field is also a left (re-
spectively, right) field. Conversely, one can always narrow the left field until it
equals dom(S), thus rendering S total. A similar comment holds for the concept
of onto. That may create the impression that the notions “total”, “nontotal”, and
“onto” are really useless.
This is not so, for in many branches of mathematics the studied relations and
functions have “natural” associated classes (usually sets) from which inputs are
taken and into which outputs are placed. For example, in (ordinary) recursion
theory functions take inputs from N and produce outputs in N. It is a (provably)
unsolvable problem of that theory to determine for any given such function, in
general, whether it is total or onto.§ Therefore it is out of the question to make¶
left and right fields “small enough” to render the arbitrary such function total
and onto.
† The context will not allow confusion between the logical → and the one employed, as is the case
here, to mean “to”.
‡ A reason for its being non-standard becomes obvious as soon as we consider III.11.14.
§ By Rice’s theorem, proved in volume 1.
¶ “Make” with the tools of recursion theory, that is. Such tools are formalized algorithms.
III.11. Relations and Functions 197
(2) Note that for any a and b, b ∈ Sa iff b ∈ S[{a}] iff (∃x ∈ {a})b S x
iff (∃x)(x = a ∧ b S x) iff b S a. This pedantic (conjunctional) iff chain proves
(within pure logic, where we wrote the argot “iff” for “↔”) the obvious
b ∈ Sa ↔ b S a (i)
Similarly,
(4) S[X] = ⋃{Sx : x ∈ X}
III.11.6 Example (Informal). Let S = {⟨0, a⟩, ⟨0, b⟩, ⟨1, c⟩, ⟨{0, 1}, a⟩}.
Then S0 = S[{0}] = {a, b}, S[{0, 1}] = {a, b, c}. On the other hand,
S{0, 1} = S[{{0, 1}}] = {a}. Thus, S[{0, 1}] ≠ S{0, 1}.
This phenomenon occurs because dom(S) has a member, namely {0, 1},
which is also a subset of dom(S). One encounters a lot of sets like this in set
theory, so the common notation “S(X)”, which is used in naı̈ve approaches both
for the image (when X is viewed as a collection of points) and for the output(s)
when X is a single input (X now being viewed as a point), would have been
ambiguous in our setting.
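The notions of image, the shorthand Sc = S[{c}], and inverse image can be made concrete for finite relations, which also reproduces Example III.11.6. Our assumptions: a relation is a finite Python set of 2-tuples (x, y), read with the text's convention that ⟨x, y⟩ ∈ S means y S x, i.e., input x and output y; Python tuples stand in for set-theoretic pairs, and the strings "a", "b", "c" stand in for the atoms a, b, c. The function names are hypothetical.

```python
def image(S, X):
    """S[X] = {y : (∃x ∈ X) y S x}; (x, y) in S means input x, output y."""
    return frozenset(y for (x, y) in S if x in X)


def at(S, c):
    """The shorthand S c, i.e. S[{c}]: all outputs for the single input c."""
    return image(S, {c})


def inverse(S):
    """S^-1 = {<x, y> : <y, x> ∈ S}."""
    return frozenset((y, x) for (x, y) in S)


def dom(S):
    return frozenset(x for (x, _) in S)


# Example III.11.6
zero_one = frozenset({0, 1})
S = {(0, "a"), (0, "b"), (1, "c"), (zero_one, "a")}

print(sorted(at(S, 0)))            # S 0 = {a, b}: prints ['a', 'b']
print(sorted(image(S, zero_one)))  # S[{0, 1}]:    prints ['a', 'b', 'c']
print(sorted(at(S, zero_one)))     # S {0, 1}:     prints ['a']

# S[X] is the union of the sets S x for x in X
X = {0, 1}
assert image(S, X) == frozenset().union(*(at(S, x) for x in X))
# S a = ∅ exactly when a is not in dom(S)
assert at(S, 2) == frozenset() and 2 not in dom(S)
```

The input {0, 1} is both a member and a subset of dom(S), which is exactly why `image(S, zero_one)` and `at(S, zero_one)` differ.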
What does Sa = ∅ mean? By III.4.11 it translates into ¬(∃x)x ∈ Sa.
This is (logically) equivalent to a ∉ {z : (∃x)x S z}, that is, a ∉ dom(S):
S is undefined at a.
III.11.7 Example (Informal). (1) Let < and > be the usual predicates on N,
and let us use the same symbols for the relational extensions of the atomic
formulas x < y and x > y. Then, <3 = {x : x < 3} = {0, 1, 2}. Similarly,
>3 = {4, 5, 6, . . . }.
(2) Let M = {⟨0, x⟩ : x = x}. Then dom(M) = {0} and ran(M) = U_M.
Thus a relation that is a proper class can have a domain which is a set. Similar
comment for the range (think of M−1 ).
III.11.8 Proposition. If the relation S is a set, then so are dom(S) and ran(S).
and III.11.4 we get dom(S) = {π(z) : z ∈ S}. The claim follows now from
III.8.9.
The argument for ran(S) just uses δ(z) instead.
> ↾ {0, 1, 2} = {⟨0, 1⟩, ⟨0, 2⟩, . . . , ⟨1, 2⟩, ⟨1, 3⟩, . . . , ⟨2, 3⟩, ⟨2, 4⟩, . . .}.
Thus, “if F is a function, . . .” adds (1) to the axioms (of ZFC), while “. . . F is
a function . . .” claims that (1) is a theorem.
If F : A → B, then F is a partial function from A to B. The qualification
“partial” will always be understood (see above remark), and therefore will not
be mentioned again.
† We are not going to continue reminding the reader that this is argot. See III.11.12.
by III.8.7. By III.10.3 and the equivalence theorem, the above translates (is
provably equivalent) to
(∃u)(∃w)(u = x ∧ w = y ∧ F(u, w))
Two applications of the one point rule yield the logically equivalent formula
F (x, y).
With this settled, we see that (ii) and (iii) are indeed provably equivalent if
F is given by (i).
(5) Since a function is a relation, all the notions and notation defined previ-
ously for relations apply to functions as well. We have some additional notation
and concepts peculiar to functions:
† This informal definition gives the meaning of z ∈ {x : A [x]}, not that of z ∈ {t[x] : A [x]}.
We now see why we have two different notations for functions and relations
when it comes to the image of an input. F(a) is the output itself, while Fa is
the singleton {F(a)}.†
III.11.15 Example (Informal). Here are two examples of relations from “real”
mathematics: C = {⟨x, y⟩ ∈ R² : x² + y² = 1} and H = {⟨x, y⟩ ∈ R² : x² + y² =
1 ∧ x ≥ 0 ∧ y ≥ 0}.
H, but not C, is a function. Clearly H is the restriction of C to the nonnegative
reals, R≥0, in the sense H = C ∩ (R≥0)².
In particular, G(x) S a stands for (∃z)(z = G(x) ∧ z S a), and a S G(x) for
(∃z)(z = G(x) ∧ a S z).
In short, we have introduced abbreviations that extend the one point rule in
the informal domain, with informal terms such as G(x) that are not necessarily
admissible in the formal theory.
III.11.17 Remark (Informal). (1) Take the relations “=” (defined as {⟨x, y⟩ : x = y})
and “≠” (defined as {⟨x, y⟩ : ¬x = y}), both on N, and a function f : N → N. Then
† Sometimes one chooses to abuse notation and use “F(a)” for both the singleton (thinking of F
as a relation) and the “raw output” (thinking of F as a function). Of course the two uses of the
notation are inconsistent, especially in the presence of foundation, and the context is being asked
to do an unreasonable amount of fending against this. The Fx notation that we chose restores
tranquillity.
while
(4) S = T ↔ (∀x)(S(x) = T(x)) fails.
III.11.19 Exercise. For any (formal) term t(x1, . . . , xn) and class term A, the class term
F = {⟨x1, . . . , xn, t(x1, . . . , xn)⟩ : ⟨x1, . . . , xn⟩ ∈ A}
is the binary relation (see III.8.7)
F = {z : OP(z) ∧ (∃x1) . . . (∃xn)(π(z) = ⟨x1, . . . , xn⟩ ∧ δ(z) = t(x1, . . . , xn) ∧ π(z) ∈ A)}
† Indeed not just “⊢”, but “⊨Taut”. On the right hand side of “→” we expand the abbreviation into
(F(a)↑ ∧ G(b)↑) ∨ (∃x)(x = F(a) ∧ x = G(b)).
(1) F(x1, . . . , xn) = t(x1, . . . , xn) for all x1, . . . , xn (recall that we write F(x1, . . . , xn) rather than
F(⟨x1, . . . , xn⟩), since ⟨x1, . . . , xn, y⟩ = ⟨⟨x1, . . . , xn⟩, y⟩).
and that
(2) F = (x1, . . . , xn ↦ t(x1, . . . , xn)).
(3) F = λx1 . . . xn.t(x1, . . . , xn) (λ-notation).
f = λxy.y²   (2)
This function has two inputs. One, x, is ignored when the output is “computed”.
Such variables (inputs) are sometimes called “dummy variables”.
λ-notation gives us the list of variables (between λ and “.”) and the “rule”
for finding the output (after the “.”). The left and right fields (here N² and N
respectively) must be understood from the context.
In practice one omits the largely ceremonial part of introducing (1) and
writes (2) at once.
As we feel obliged after the introduction of new argot to issue the usual clar-
ifications, we state: When used in ZFC, “F is 1-1” is just short for (1) or (2)
above. Thus, to assume that F is 1-1 is tantamount to adding (1) (equivalently,
(2)) to the axioms, while to claim that F is 1-1 is the same as asserting its (ZFC)
provability.
Note that F(x) = F(y) implies that both sides of “=” are defined
(cf. III.11.16). In the opposite situation F(x) = F(y) is refutable; hence (2)
still holds.
The above definition can also be stated as u F x ∧ u F y → x = y. We can say
that a 1-1 function “distinguishes inputs”, in that distinct inputs of its domain
are mapped into distinct outputs.
Note that f = {⟨0, 1⟩, ⟨1, 2⟩} is 1-1 by III.11.22, but while f(2) ≃ f(3)
(III.11.17), it is the case that 2 ≠ 3. Nevertheless, f(2) = f(3) → 2 = 3, since
f(2) = f(3) is refutable.
1-1-ness is a notion that is independent of left or right fields (unlike the
notions total, nontotal, onto).
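For finite relations, being a function and being 1-1 can be tested directly, and the point about f(2) and f(3) can be replayed. Our assumptions: relations are finite Python sets of (input, output) tuples, and Python's `None` marks “undefined”, so the sketch only works when no defined output is itself `None`. Two equalities are kept apart: `strict_eq`, which, like F(x) = F(y) in the text, requires both sides to be defined and equal, and `kleene_eq`, which also accepts two undefined sides (the Kleene-style relation between f(2) and f(3) cited from III.11.17). All names are hypothetical.

```python
def is_function(F):
    """No input is related to two different outputs."""
    seen = {}
    for x, y in F:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


def is_one_to_one(F):
    """A function in which u F x and u F y imply x = y."""
    return is_function(F) and is_function({(y, x) for (x, y) in F})


def value(F, x):
    """F(x) if defined, else None (assumption: None is never a genuine output)."""
    for a, b in F:
        if a == x:
            return b
    return None


def strict_eq(F, x, y):
    """F(x) = F(y): both sides defined and equal."""
    vx, vy = value(F, x), value(F, y)
    return vx is not None and vy is not None and vx == vy


def kleene_eq(F, x, y):
    """Both sides undefined, or both defined and equal."""
    return value(F, x) == value(F, y)


f = {(0, 1), (1, 2)}
print(is_one_to_one(f))                        # prints True
print(kleene_eq(f, 2, 3), strict_eq(f, 2, 3))  # prints True False
# The 1-1 condition uses strict equality: F(x) = F(y) → x = y
assert all(x == y for x in range(5) for y in range(5) if strict_eq(f, x, y))
```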
Let us now re-formulate the axiom of collection with the benefit of relational
and functional notation.
Proof. (1): Let S = {x, y : S (x, y)} for some formula S of the formal lan-
guage. Let Z = dom(S). An instance of “verbose” collection (cf. III.8.4) is
(∀x ∈ Z)(∃y) S(x, y) → (∃W)(¬U(W) ∧ (∀x ∈ Z)(∃y ∈ W) S(x, y))   (i)
Now, we are told that Coll x (∃y) S (x, y) (cf. III.11.4); thus the assumption
Z = dom(S) translates into
⊢ZFC (∀x)(x ∈ Z ↔ (∃y) S(x, y))
and therefore the left hand side of (i) is provable, by tautological implication
and ∀-monotonicity (I.4.24). Thus the following is also a theorem:
(∃W)(¬U(W) ∧ (∀x ∈ Z)(∃y ∈ W) S(x, y))   (ii)
Let us translate (ii) into argot: We are told that a set W exists† such that
(∀x ∈ Z) W ∩ Sx ≠ ∅
and finally (see Remark III.11.5(5))
Z ⊆ S−1 [W ]
Since, trivially, Z ⊇ S−1 [W ] we see that W will do for the sought B.
(2) follows from (1) using S−1 instead of S.
III.11.27 Remark. Statement (1) in the theorem, and therefore (2), are “equiv-
alent” to collection. That is, if we have (1), then we also have (i) above. To see
this, let S (x, y) of the formal language satisfy the hypothesis of collection, (i):
(∀x ∈ Z )(∃y)S (x, y) (a)
for some set Z. Let us define S = {⟨x, y⟩ : S(x, y) ∧ x ∈ Z}. Then (a) yields
Z = dom(S). (1) now implies that for some set B, Z ⊆ S−1 [B], from which
the reader will have no trouble deducing
III.11.28 Proposition.
(1) If S is a function and A is a set, then so is S[A].
(2) If S is a function and dom(S) is a set, then so is ran(S).
If we let I be a name for dom(F), then we often write (Fa )a∈I to denote the
indexed family, rather than just F or λa.F(a).
In the notation “Fa ” it is not implied that F(a) might be a proper class (it
cannot be); rather we imply that the function F might be a proper class.
Note that we called F, rather than ran(F), the indexed family (of course, ran(F)
is a family of sets in the sense of III.6.3, p. 150). What is new here is the intention
to allow “multiple copies” of a set in a family with the help of F. An indexed
family allows us to talk about, say, S = {a, b, a, a, c, d} without being
obliged to collapse the multiple a-elements into one (extensionality would
dictate this if we had just a set or class {a, b, a, a, c, d}). This freedom is
achieved by thinking of the first a as, say, f(0), the second as f(2), and the
third as f(3), where
f = {⟨0, a⟩, ⟨1, b⟩, ⟨2, a⟩, ⟨3, a⟩, ⟨4, c⟩, ⟨5, d⟩}
is an indexed family with index set dom(f) = {0, 1, 2, 3, 4, 5}, and ran(f) = S.
Why is this useful?
For example, if a, b, c, d, . . . are cardinals (Chapter VII), we may want to
study sums of these where multiple summands may be equal to each other. We
can achieve this with a concept/notation like Σ_{i∈dom(f)} f(i).
This situation is entirely analogous to one that occurs in the study of series
in real analysis, where repeated terms are also allowed.
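The effect of indexing shows up in a tiny computation. In this sketch (assumptions ours) an indexed family is a Python `dict` from indices to values, and ordinary integers stand in for the summands; nothing here models cardinal arithmetic beyond finite sums. The range collapses repeated values, as extensionality demands, while a sum taken over the index set keeps every copy.

```python
# f = {<0, a>, <1, b>, <2, a>, <3, a>, <4, c>, <5, d>} with a, b, c, d = 2, 3, 5, 7
a, b, c, d = 2, 3, 5, 7
f = {0: a, 1: b, 2: a, 3: a, 4: c, 5: d}

index_set = set(f)             # dom(f) = {0, 1, 2, 3, 4, 5}
range_of_f = set(f.values())   # ran(f) = {a, b, c, d}: the copies of a collapse

sum_over_indices = sum(f[i] for i in index_set)  # counts a three times
sum_over_range = sum(range_of_f)                 # counts a once

print(len(index_set), len(range_of_f))   # prints 6 4
print(sum_over_indices, sum_over_range)  # prints 21 17
```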
III.11.32 Informal Definition. Let (Fa)_{a∈I} be an indexed family of sets. Then
⋃_{a∈I} Fa =def ⋃ ran(F)
⋂_{a∈I} Fa =def ⋂ ran(F)
If I is a set I, then
∏_{a∈I} Fa =def {f : f is a function ∧ dom(f) = I ∧ (∀a ∈ I) f(a) ∈ Fa}
If, for each a ∈ I, Fa = A, the same set, then ∏_{a∈I} Fa is denoted by ^I A. That
is, ^I A is the class of all total functions from I to A:
III.11.33 Remark. (1) ⋃_{a∈I} Fa and ⋂_{a∈I} Fa just introduce new notation (see
also the related III.10.22, p. 192). However, ∏_{a∈I} Fa is a new concept that is
related but is not identical to the “finite” Cartesian product. For example, if
A1 = {1} and A2 = {2}, then
×_{i=1}^{2} Ai = A1 × A2 = {⟨1, 2⟩}   (i)
III.11.34 Proposition. If I is a set, then ⋃_{a∈I} Fa and ∏_{a∈I} Fa are sets. If more-
over I ≠ ∅, then ⋂_{a∈I} Fa is a set as well.
Proof. Since I is a set, so is ran(F) by III.11.28. Now the cases for ⋃_{a∈I} Fa
and ⋂_{a∈I} Fa follow from III.11.32 and the axiom of union and from III.6.14,
respectively.
On the other hand, every f ∈ ∏_{a∈I} Fa consists of pairs ⟨a, f(a)⟩ with a ∈ I and
f(a) ∈ ⋃_{a∈I} Fa, so
∏_{a∈I} Fa ⊆ P(I × ⋃_{a∈I} Fa)
Hence, by separation, ∏_{a∈I} Fa is a set.
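For a finite index set, ∏_{a∈I} Fa can be enumerated outright, and the inclusion used in the proof can be checked. The sketch rests on our own assumptions: the indexed family is a Python `dict` from indices to finite sets, a function is represented as a `frozenset` of (index, value) tuples, and `big_product` is a hypothetical name. It also illustrates ^I A (all total functions from I to A) and the fact that a product of finitely many nonempty sets is nonempty.

```python
from itertools import product


def big_product(F):
    """All functions f with dom(f) = I and f(a) ∈ F[a] for every a ∈ I,
    where I is the (finite) key set of the dict F."""
    I = list(F)
    return frozenset(
        frozenset(zip(I, choice))
        for choice in product(*(F[a] for a in I))
    )


F = {0: {1, 2}, 1: {3}, 2: {4, 5, 6}}
P = big_product(F)
print(len(P))  # 2 * 1 * 3 choices: prints 6

# Each such f is a subset of I × (union of the F_a), as in the proof
union = set().union(*F.values())
I_times_union = {(i, x) for i in F for x in union}
assert all(f <= I_times_union for f in P)

# ^I A: all total functions from I to A, |A| ** |I| of them
I, A = {"u", "v", "w"}, {0, 1}
print(len(big_product({i: A for i in I})))  # prints 8

# An empty factor empties the product; the empty index set gives {∅}
assert big_product({0: {1}, 1: set()}) == frozenset()
assert big_product({}) == frozenset({frozenset()})
```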
III.11.35 Corollary. For any sets A and B, ^A B is a set.
(∃x)x ∈ A
(∃x)x ∈ B
and
(∃x)x ∈ C
a∈A
b∈B
c∈C
We can do the same with four sets, or five sets, or eleven sets, etc., or prove by
(informal) induction on n, in the metatheory, the theorem schema (see III.10.21)
if Ai ≠ ∅ for i = 1, . . . , n, then ×_{1≤i≤n} Ai ≠ ∅.   (1)
† Thinking of the informal natural numbers as urelements, I is a set by separation, since the “real”
M is a set.
It follows that
if Ai ≠ ∅ for i = 1, . . . , n, then ∏_{i∈I} Ai ≠ ∅.   (2)
(2) can be obtained within the theory, once n and N are formalized. In the
meanwhile, we can view it, formally, not as one statement, but as a compact
way of representing infinitely many statements (a theorem schema): One with
one set A (called A1 in (2)), one with two sets A, B (A1 , A2 ), one with three
sets A, B, C (A1 , A2 , A3 ), etc.
As indices we can use, for example, {∅}, {{∅}}, {{{∅}}}, {{{{∅}}}}, etc., which
are all distinct (why?).†
Does (2) extend to the case of arbitrary (and therefore possibly infinite) I ?
We consider this question in the next chapter.
We have seen the majority of nonlogical axioms for ZFC in this chapter.
The “C” of ZFC is considered in the next chapter. In Chapter V we will in-
troduce the last axiom, the axiom of infinity, which implies that infinite sets
exist.‡
III.12. Exercises
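III.1. Prove $\vdash_{ZFC} \neg U(A) \wedge \neg U(B) \to (A \subset B \to B \neq \emptyset)$.
III.2. Prove $\vdash_{ZFC} \mathrm{Coll}_x\, B \to (\forall x)(A \to B) \to \mathrm{Coll}_x\, A$.
III.3. Prove $\vdash_{ZFC} U(x) \to x \cap y = \emptyset$.
III.4. Prove $\vdash_{ZFC} U(x) \to x - y = \emptyset$.
III.5. Prove $\vdash_{ZFC} \neg U(x) \to U(y) \to x - y = x$.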
III.6. Let a be a set, and consider the class b = {x ∈ a : x ∉ x}. Show that, despite similarities with the Russell class R, b is a set. Moreover, show b ∉ a. Do not use foundation.
III.7. Show R (the Russell class) = U M .
III.8. Show that $\vdash_{ZFC} \neg U(x) \to \emptyset \subseteq x$.
III.9. Show that if a class A satisfies A ⊆ x for all sets x, then A = ∅.
III.10. Without using foundation, show that ∅ ≠ {∅}.
† We can still write these indices as “1”, “2”, “3”, “4”, etc. (essentially counting the nesting of
{}-brackets), as this is more pleasing visually.
‡ The infinity axiom does not just say that infinite sets exist. It says, essentially, that limit ordinals
exist, which is a stronger assertion.
III.11. Interpret the extensionality axiom over N so that the variables vary over
integers, not sets, and ∈ is interpreted as “less than”, <. Show that under
this interpretation the axiom is true.
III.12. Show that if we have no urelements, and if our axioms are just extensionality, separation, union, foundation, and collection, then this set theory
(1) can prove that a set exists, but
(2) cannot prove that a nonempty set exists.
(Hint: Find a model of all the above axioms augmented by the formula $(\forall y)\big(\neg U(y) \to (\forall x)\, x \notin y\big)$.)
III.13. Suppose we have all the axioms except the one for pairing and the one
that asserts the existence of a set of urelements (III.3.1). Show that these
axioms cannot prove that a set exists.
(Hint: Find a model of all the above axioms augmented by the formula
(∀x)U (x).)
III.14. (Bourbaki (1966b)) Drop collection version III.8.2, separation, and
union. Add Bourbaki’s axiom of “selection and union”, that is, collec-
tion version (2) of III.8.12, p. 173. Prove that separation and union are
now theorems.
III.15. (Shoenfield (1967)) Drop collection version III.8.2, pairing, and union.
Add collection version (3) of III.8.12, p. 173. Prove that pairing and
union are now theorems.
III.16. (Levy (1979)) Drop collection version III.8.2, pairing, and separation.
Add collection version (4) of III.8.12, p. 173. Prove that pairing and
separation are now theorems.
III.17. Prove
U (A) → P(A) = {∅}
in ZFC.
III.18. What is ∅ (and why)?
III.19. Show that
(1) A ∪ B = B ∪ A and
(2) A ∪ (B ∪ C) = (A ∪ B) ∪ C.
III.20. Show that
(1) A ∩ B = B ∩ A and
(2) A ∩ (B ∩ C) = (A ∩ B) ∩ C.
III.21. For any set A in the “restricted” universe U N (N ⊆ M), show that
U N − A is a proper class.
(2) A ∪ (B ∩ D) = (A ∪ B) ∩ (A ∪ D)
III.30. Generalized distributive laws for ∪, ∩. Prove for any class A and indexed family $(B_i)_{i\in F}$ that
(1) $A \cap \bigcup_{i\in F} B_i = \bigcup_{i\in F} (A \cap B_i)$
(2) $A \cup \bigcap_{i\in F} B_i = \bigcap_{i\in F} (A \cup B_i)$
From this chapter onward, the reader will see the “relaxed proof style” (cf. III.5.9) more and more often.
IV.1. Introduction
The previous chapter concluded with the question, can
$$\text{if } A_i \neq \emptyset \text{ for all } i \in I, \text{ then } \prod_{i\in I} A_i \neq \emptyset \qquad (1)$$
IV.1.1 Axiom (Axiom of Choice, or AC). If I and $A_a$, for all a ∈ I, are nonempty sets, then $\prod_{a\in I} A_a \neq \emptyset$.
But why “axiom”? After all, the case for finite I is provable as a theorem,
that is, (1) above. Before we address this question, let us first consider some
more down-to-earth equivalent forms of AC.
IV.1.2 Theorem. The following statements (1), (2), (3), and (4) are provably
equivalent.
(1) AC.
(2) If the set F is a nonempty family of nonempty sets, then there is a function
g such that dom(g) = F and g(x) ∈ x for all x ∈ F.
† The terms “infinite” and “finite” throughout this discussion have their intuitive metamathematical
meaning.
(3) If the set S is a relation, then there is a function f such that dom( f ) =
dom(S) and f ⊆ S.
(4) If the set F is a nonempty family of pairwise disjoint nonempty sets, then
there is a set C that consists of exactly one element out of each set of F
(i.e., for each x ∈ F, C ∩ x is a singleton).
Proof. (1) → (2): Given a set F as in (2). Define $i \overset{\text{def}}{=} \lambda x.x$ with dom(i) = F. Then F can be viewed as the indexed family $(i(x))_{x\in F}$, or $(x)_{x\in F}$. By (1) there is a $g \in \prod_{x\in F} x$. Thus, dom(g) = F and g(x) ∈ x for all x ∈ F.
(2) → (3): Given a relation S (a set). Let $F \overset{\text{def}}{=} \{Sa : a \in \mathrm{dom}(S)\}$. F is a set by III.8.9, since dom(S) is a set (III.11.8). If F = ∅, then S = ∅ and f = ∅ will do. So let F ≠ ∅. By (2), there is a choice function g, i.e., dom(g) = F and g(x) ∈ x for all x ∈ F. In terms of S, the last result reads $g(Sa) \in Sa$ for each a ∈ dom(S).† Clearly,
$$f \overset{\text{def}}{=} \big\{\langle a, g(Sa)\rangle : a \in \mathrm{dom}(S)\big\}$$
will do.
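For finite relations the construction (2) → (3) can be run directly. In this Python sketch (an illustration only) a relation is a set of pairs (a, y), Sa is computed as the set of all y with (a, y) ∈ S, and the choice function g is taken to be `min`, which works because finite sets of integers already have a least element, so no choice principle is invoked. The function names are ours.

```python
def choice_function(family):
    """g with dom(g) = family and g(x) in x; here simply g(x) = min(x)."""
    return {x: min(x) for x in family}


def uniformize(S):
    """Given a finite relation S (a set of pairs), return f subset of S with dom(f) = dom(S)."""
    dom_S = {a for (a, _) in S}
    Sa = {a: frozenset(y for (b, y) in S if b == a) for a in dom_S}  # the set Sa for each a
    F = set(Sa.values())                   # F = {Sa : a in dom(S)}
    g = choice_function(F)                 # the choice function supplied by (2)
    return {(a, g[Sa[a]]) for a in dom_S}  # f = {<a, g(Sa)> : a in dom(S)}
```

For S = {(1, 5), (1, 7), (2, 7), (3, 9)} the sketch returns {(1, 5), (2, 7), (3, 9)}, a function contained in S with the same domain.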
(3) → (4): Let F be as in (4). Define $S \overset{\text{def}}{=} \{\langle x, y\rangle : y \in x \in F\}$.‡ S is a set, since $S \subseteq F \times \bigcup F$. Now apply (3) to obtain f ⊆ S with dom(f) = dom(S) = F. Take C = ran(f), a set by III.11.28.
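The step (3) → (4) can be illustrated the same way on a finite example. This Python sketch assumes a finite family F of pairwise disjoint nonempty frozensets of integers and uses `min` in place of the function supplied by (3); `transversal` is a hypothetical name.

```python
def transversal(F):
    """Return C = ran(f), where f is a function contained in S with dom(f) = F."""
    S = {(x, y) for x in F for y in x}                    # S = {<x, y> : y in x in F}
    f = {x: min(y for (z, y) in S if z == x) for x in F}  # f subset of S, dom(f) = F
    return set(f.values())                                # C = ran(f)
```

Disjointness is what guarantees that C ∩ x contains nothing besides f(x); compare Exercise IV.1.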
To verify, let x ∈ F = dom(f). Then ⟨x, f(x)⟩ ∈ S; therefore f(x) ∈ x. This, along with f(x) ∈ C, yields f(x) ∈ C ∩ x. Let also
y ∈ C ∩ x (i)
This attitude is similar to the one that separates collections into sets and proper
classes: That some collections are not sets is a situation we are by now com-
fortable with. In this section we further narrow down what collections we will
accept as sets in defense of AC.
This is a local restriction, however, valid only in this section. In the remainder
of the volume we revert to our understanding of “real” sets as this was explained
in Chapter II (II.1.3).
† The reader will observe that all we are doing here is arguing that a proposed new axiom is
reasonable. This is a process we have been through for all the previous axioms, and it does
not constitute a proof of the axioms in the metatheory. The notion “reasonable” is not tempo-
rally stable. When Cantor introduced set theory, the entire theory was “unreasonable” to many
mathematicians of the day – including influential ones like Poincaré, who suggested that most of
Cantor’s set theory ought to be discarded. When Russell proposed to found mathematics on logic,
this too was considered as an “unreasonable” point of view. For example, Poincaré protested that
this was tantamount to suggesting that the whole body of mathematics was just a devious way
to say A ↔ A . As mathematics progresses, mathematicians become more ready to accept the
reasonableness of formerly “unreasonable” concepts or statements.
‡ “Well-definable” is just an emphatic way of saying “(first order) definable”, in the sense that
we can write these sets down as class terms. We have already remarked (cf. II.4.2(b)) that
we cannot expect all subsets of N to be first order definable, for there are far too many of them.
† The following informal definition is adequate for our informal discussion. A precise version
will be given with the help of ordinals – formal counterparts of “stages” – when we revisit the
constructible universe in Chapter VI.
In (1), D(A) denotes the set of all definable subsets of A. We will soon
make the meaning of the term D(A) precise, but for the time being let us
imagine that it is a pared-down version of P(A), that is, D(A) ⊂ P(A) for
infinite sets A.†
(iii) Immediately after each such infinite sequence of powering stages, a col-
lecting stage occurs to form the union of all the sets formed at all the
previous stages.
This process, alternating between (ii) and (iii), continues ad infinitum and con-
structs all sets (all definable sets really, but you will recall that in this section
we pretend that these are the only legitimate sets anyway).
IV.2.1 Remark. The need for collection, after each sequence of powering,
should be clear. For example, if we stop the process after the first sequence
of powering, then, even though we have constructed sets with arbitrary integer
depth of nesting of {}-brackets, we have not constructed a single set that contains
as elements sets with all possible depths of nesting of {}-brackets.
Specifically, if X 1 , X 2 , . . . , X n , . . . constitutes the first sequence of power-
ing, then
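$$\emptyset \in X_1$$
and for n ≥ 1,
$$\underbrace{\{\ldots\{}_{n}\,\emptyset\,\underbrace{\}\ldots\}}_{n} \in X_{n+1}$$
$$\underbrace{\{\ldots\{}_{n}\,\emptyset\,\underbrace{\}\ldots\}}_{n}$$
for all n.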
† Of course, in principle, we can list explicitly all subsets of any finite set, so that for finite sets A
we intuitively accept that all their subsets are definable, i.e., D(A) = P(A) in this case.
‡ A set S that satisfies a ∈ b ∈ S → a ∈ S – that is, a ∈ b ∧ b ∈ S → a ∈ S – is called transitive.
Such sets play a major role in set theory – all ordinals are transitive, to make the point. Here are
two simple examples:
(a) {#, ?}, where # and ? are urelements (a ∈ b ∈ S → a ∈ S is true for b an urelement, since
then a ∈ b ∈ S is false)
(b) {∅, {∅}}.
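The definition of transitivity in the footnote above is easy to test mechanically on hereditarily finite examples. In this Python sketch (our own illustration) sets are modelled as frozensets and urelements as any non-frozenset objects (here the strings "#" and "?"), which have no members.

```python
def members(x):
    """Members of x; an urelement (any non-frozenset object) has none."""
    return x if isinstance(x, frozenset) else frozenset()


def is_transitive(S):
    """True iff a in b in S implies a in S."""
    return all(a in S for b in S for a in members(b))


EMPTY = frozenset()
example_a = frozenset({"#", "?"})                   # (a): two urelements
example_b = frozenset({EMPTY, frozenset({EMPTY})})  # (b): {∅, {∅}}
non_example = frozenset({frozenset({EMPTY})})       # {{∅}}: ∅ ∈ {∅} ∈ S, but ∅ ∉ S
```

Both footnote examples pass the test, while {{∅}} fails it.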
We say “true” and “false” freely in this section, since we are working, like
Platonists, in the metatheory.
y ∈ M ∪Y → y ⊆ M ∪Y (2)
(i) The set Y constructed at the 0th stage has the property.
(ii) The property propagates with collecting.
(iii) The property propagates with powering.
As for (i), this is true because the Y formed at stage 0 is M, and M is transitive
(see the footnote to the claim) – or, another way of saying this, y ∈ M ∪ Y is
false (M ∪ Y = M contains only atoms, while y is a set).
As for (ii), let $Y = \bigcup\{Z, W, \ldots\}$ be formed at a collecting stage, where Z, W, . . . are all the sets formed at all the previous stages. Let y ∈ M ∪ Y.
Thus, y ∈ Y (for M contains only atoms); hence
y ∈ (say) W ⊆ M ∪ W
X ⊆ M ∪Y
We now turn to how the sets obtained at powering stages are actually “de-
fined”. Let us “sort” in an arbitrary fixed way the alphabet of the first-order
language of logic that we have been using all along (we will use the symbol ≺
A = {∀, ∃, ¬, ∧, ∨, →, ↔, =, (, ), v, |, ∈, U }
where the symbols v and | are used to build the object variables v0 , v1 , v2 , . . .
as v|v, v||v, v|||v, . . . (as usual, we can use abbreviations such as x, y, z – with
or without primes or subscripts – for variables). Let us fix the order
for A.
† It might be thought – with some justification – that we are cheating somewhat here by taking M,
the set of all urelements, to be N. Recall however that all that we are after is to
(1) give a philosophically plausible description of what sets are, and
(2) within this description argue that AC holds.
In other words, following Gödel, we are proposing an informal and plausible universe of “real”
sets.
We have chosen N as the set of urelements because AC holds on it by the least integer principle.
How well does this choice hold philosophically, i.e., how well are we serving requirement (1)
above? Well, it should not be too difficult to accept the view that the primeval “real stuff” of
mathematics – the atomic objects – is the natural numbers, and that all else in mathematics
we build starting from these numbers. After all, one of the most careful among the fathers of
foundations, Kronecker, had no trouble with this position at all. He is said to have held that “God
created the integers; all else is the work of man”. Mind you, Kronecker, the mentoring father of
intuitionism and a confirmed finitist, did not allow for the entire set of natural numbers, but only
granted you the right of having as many numbers as you wanted by simply adding one to the last
one you have had.
Even technically, one can argue that the choice of such a small set of urelements does not
restrict our ability to use sets to do mathematics, for it turns out that even a smaller set works
(i.e., leads to a set theory that is sufficiently rich for the purposes of doing mathematics). Namely,
as we shall see in Chapter V, von Neumann has shown how to build the natural numbers and,
therefore, also Kronecker’s “all else”, starting from ∅.
‡ Definable in the process that we are describing. By the way, introducing a unique name for each
“real object” of a collection is a trick that we have already used in describing the semantics of
first order languages in I.5.4.
Our goal is to extend the order ≺ from A to all names, and then to induce
it on the named objects, that is, all objects of our definable universe.
Thus, what we have set out to do is to achieve
a b iff a ≺
b (4)
$$a = \{x \in X : \mathcal{P}(x, \vec{b}_n) \text{ is true in } X\}$$
We do not append all the names c to A at once. We have only appended the
names of the atoms so far to form the alphabet (5). There are two good reasons
for this: One, we will augment our formal symbol set by stages, so that as it
grows it stays (provably) well-ordered. Two, we will add a name only after its
corresponding set has been seen to be definable; for, conceivably, not all sets
are definable.
(cf. I.1.4).
Then ≺ on S + is a well-ordering.
Transform $a_1 a_2 \ldots a_n$ to
$$a_1' a_2 \ldots a_n$$
where $a_1'$ is the ≺-smallest in S (S is well-ordered!) such that $a_1' a_2 \ldots a_n \in C$.‡ In general, assuming that $a_1' a_2' \ldots a_i' a_{i+1} \ldots a_n \in C$ has been defined, transform $a_1' a_2' \ldots a_i' a_{i+1} \ldots a_n$ to
$$a_1' a_2' \ldots a_i' a_{i+1}' \ldots a_n$$
† Of course, a string over S of length i is just a member of the i-copy Cartesian product $S^i$: $\langle x_1, \ldots, x_i\rangle$. One usually writes strings without the angular brackets, and without the comma separators, like this: $x_1 x_2 \ldots x_i$. Naturally, if the latter notation becomes ambiguous (e.g., if S = {0, 00}, then 00 might be either ⟨00⟩ or ⟨0, 0⟩ in vector notation), then we revert to the vector notation.
‡ $a_1'$ may be the same as $a_1$.
This observation validates the basis of the induction on stages that we now
embark upon.
Assume then that has been extended to a well-ordering on the set of all
objects M ∪ X defined so far, in such a way that (4) holds, where ≺ is a well-
ordering on the set of all symbols and names that we have up to now – this
augmented set is still called A – and moreover assume that the present stage
to be “executed” is a powering stage that will yield Y = D(M ∪ X ).
We now extend to Y by cases:
Let {a, b} ⊆ Y .
† For example, a defining formula P defines the same set as P ∨ P , or ¬¬P , etc., or any other
formula Q for which the equivalence P ↔ Q holds, trivially or not.
we choose the ≺-smallest in (6). Similarly, of the possible Q (v0 , vm ) that
can define the set b at this stage, we choose the ≺-smallest in (7). This
invokes IV.2.5.
(ii) Of the possible parameter strings $a_1 a_2 \ldots a_n$ that work in conjunction with
the formula P (v0 , vn ) chosen in (i) above to define a as in (6), we choose
the ≺-smallest. Similarly for b. This also invokes IV.2.5.
where the ≺ to the right of “iff” is meaningful, since the involved strings are
in A + . The “normalization” of P and Q used in (6) and (7) ensures that the
extension of (and hence of ≺) is well defined, and is still a well-ordering,
since the ≺ to the right of “iff” is.
This settles the induction step with respect to powering stages – having
extended to Y so that (4) holds.
Suppose finally that the stage we are about to execute is a collecting stage that builds Y as $\bigcup\{X, W, Z, \ldots\}$. By I.H., is a well-ordering on each of X, W, Z, . . . .
Let a, b be in Y . Then a, b are in X , say. Then a b is already defined and
satisfies (4); thus we need do nothing further.†
† Since is updated only at powering stages as new sets get constructed, it is never redefined
during the normalization (i) and (ii) above. Thus it cannot be that a b in X while ¬a b in,
say, W above. The reader will also observe that IV.2.2 validates our contention that a and b are
both in some earlier “X ”.
‡ Not just set.
IV.2.8 Remark. (1) Our notational apparatus does not allow higher order ob-
jects that are collections of (possibly) proper classes. If for a minute such objects
were allowed, and if is one of them and it happens to contain mutually dis-
joint nonempty classes (not just sets), then a higher order collection T exists
which contains exactly one element out of each x ∈ . Just use the -smallest
out of each x.
(2) Informally, we have established the acceptability (= informal “truth”,
modulo some appropriate understandings of what the real sets really are and
how they come about) of a strong choice principle, and hence of AC. We did
this under two opposing “philosophies” regarding set existence, a Platonist’s
approach (p. 217) and, subsequently, a definability or constructibility approach.
It must be conceded that under the philosophy “existence = definability”,
even though the argument itself that exists and is a well-ordering of the universe
is sound and can be promoted into a rigorous proof within formal set theory
once we learn about ordinals, the background hypotheses could be attacked on
the grounds that “real” sets might not be constructed in the manner we have
assumed. The whole argument was a “what if”.† In particular, there might be
dissent on the choice of urelements, on what is going on at stages, on the use
of exclusively first order formulas in defining sets, etc.
Let us be content with the fact that the plausibility, if not quite a “proof”, of a strong AC has been established under this philosophy, because the picture it suggests of what sets are is intuitively pleasing and natural.
(3) In an axiomatic approach to set theory one adopts certain basic axioms
which are plausible (or, more boldly, “true”) and adequately describe our a
priori perception of the nature of sets. The latter means that the axioms must
also be sufficiently strong to imply as many “true” statements about sets as
possible.
There are two difficulties regarding these requirements. The first is a technical
difficulty, pointed out by Gödel (incompleteness theorem), namely, that there
† That is, a construction of a model for ZFC. The reader will note that in this “model” we only
verified AC. Of course, one must verify all the nonlogical axioms in order to claim that a structure
is a model. However, since we will revisit the constructible universe formally we chose here to
only deal with our immediate worry: the “truth” of our newest axiom, AC.
are axiomatic theories (set theory – unfortunately – being one such) which are
incompletable, i.e., as long as they are consistent, they can never capture all the
true sentences that they are intended to capture (as theorems), no matter how
many axiom schemata we add (even an infinite number, as long as the formulas
that are axioms are recognizable as such).
The other difficulty has to do with limitations of our intuition (of course,
intuition advances and becomes more permissive as mathematical culture de-
velops). We do not know a priori what statements are supposed to be true (a
good thing this: otherwise mathematicians would be out of business), and coun-
terintuitive consequences of otherwise perfectly plausible axioms (shall I say
“true”?) may unfairly reflect badly on the axioms themselves, in any mathe-
matical culture that is not of sufficiently high order for mathematicians to know
better.
That “perfectly acceptable” axioms can lead to theorems that will seriously
challenge one’s intuition cannot be better illustrated than by Blum’s speed-up
theorem† in computational complexity theory. This theorem follows from the
only two axioms of the theory, both of which are outright “true”.
The theorem says that there is a computable function f on N with values in
{0, 1} which is so difficult to compute that for any program that computes f
there is another program that computes it significantly faster for all but finitely
many inputs – in other words, there is no “best” program for f . Now, this result
is certainly in conflict with intuition, but acceptable it must be, for the axioms
in this case are unassailable.
Acceptability of AC was initially hampered by a similar phenomenon: It
implied results that were unexpected and hard to swallow. The most notable such
result was Zermelo’s theorem that every set can be well-ordered, in particular
that the set of reals can. See also the discussion in Wilder (1963, pp. 73–74); in
particular note the concluding paragraph on p. 74.
To AC’s defense, we observe that mathematics is not entirely innocent of
counterintuitive constructions or theorems even in AC’s absence. We have al-
ready noted Blum’s theorem. Other examples are Weierstrass’s construction of
a continuous nowhere differentiable function, and Peano’s space-filling curve
(see Apostol (1957, p. 224)). Besides, we need AC because of vested interest.
Without it, much of mathematics is lost. For example, the standard fact that a
countable union of countable sets is countable crumbles if we disown AC (and
this may come as a surprise to many readers).‡
† See Blum (1967), or Tourlakis (1984), where this theorem is rehearsed in detail.
‡ Feferman and Levy (1963) have constructed a model of Zermelo-Fraenkel set theory without
AC, where the reals R, provably uncountable in ZF, are a countable union of countable sets.
(4) Two formal (i.e., syntactical) questions about AC must be settled right
away:
(a) If ZF is consistent and we add AC to its axioms, is the new theory, ZFC,
still consistent?
(b) Is AC provable in (i.e., a theorem of) ZF, assuming that ZF is consistent?
(Of course, an inconsistent ZF would prove every formula, including the
one that states AC (I.4.21).)
Gödel has answered (a) positively (1939, 1940; see also Devlin (1978)) by
two different methods, constructing in ZF the constructible universe of sets (it
is his first construction that we “popularized” within naı̈ve set theory to define
in this section). On the other hand, Fraenkel and Mostowski (see Jech (1978a))
and Cohen (1963) answered question (b) negatively.
Thus, both AC and its negation are consistent with ZF, and one can take
or leave AC without logical penalty either way. In this sense, AC has in the
context of ZF the same status that Euclid’s axiom on parallels has in the context
of axiomatic geometry. Adopting or rejecting Euclid’s axiom is just a reflection
of what kind of geometry one wants to do. Similarly, adopting AC or not reflects
the sort of set theory, and ultimately mathematics, one wants to do.
As we have indicated earlier, it makes sense to take a more direct approach to
our choice of axioms (rather than the indirect, or “results-driven”, approach), for
it is easy to be misled by strange but correct results. If at all possible, we should
adopt axioms by judging them on their own plausibility rather than on that of
their consequences. On that count, AC is nowadays generally accepted without
apology, since it is not any less plausible than, say, the axiom of replacement. It is
noteworthy that the first order logic which Bourbaki uses as the foundation of his
multi-volume work Éléments de Mathématique contains a powerful “selection”
axiom – using the τ -operator (cf. Section I.6) – that directly turns the axiom of
choice of set theory into a theorem.
IV.3. Exercises
IV.1. Show by an example that the assumption of pairwise disjointness is
essential in the proof (3) → (4) of IV.1.2.
IV.2. Show that if for two objects A and B in L the formula A ∈ B is true,
then A B is also true.
The following exercises are best approached after the reader has mastered
the concepts of order and inductive definitions on ordered sets (Chapter VI).
They are presented here because of their thematic unity with the concepts of
Show that if every set can be well-ordered, then (Hausdorff ) in every set A
ordered by, say, <, every totally ordered subset B is included (⊆) in a maximal
totally ordered subset M of A.
Note. Maximality means that if a ∈ A − M, then for some m ∈ M neither
a < m nor m < a.
(Hint. If B = A, there is nothing to prove. Else, let $<_W$ be a well-ordering of A − B (which in general has no relationship to the order < already given on A). By induction on $<_W$, partition A − B into a good and a bad set: Put the $<_W$-minimum element of A − B in the good one if it is <-comparable with all x ∈ B; else put it in the bad one. If all the elements of $\{x \in A - B : x <_W a\}$ have been so placed, then place a in the good set if it is <-comparable with all the elements of B and with all the elements already in the good set; else put it in the bad one.)
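For a finite ordered set the procedure in the hint can be run literally. In this Python sketch (an illustration under stated assumptions, not part of the exercise) the order < is given as a function `less`, the well-ordering $<_W$ of A − B is given as a list, and the returned set is B together with the good elements. The names `maximal_chain` and `divides_properly` are ours.

```python
def maximal_chain(B, less, w_order):
    """Extend the totally ordered subset B to a maximal one.

    w_order lists the elements of A - B in <_W order. An element is put in the
    good set iff it is <-comparable with every element of B and every element
    already placed in the good set; otherwise it goes to the bad set.
    """
    def comparable(x, y):
        return x == y or less(x, y) or less(y, x)

    good, bad = [], []
    for a in w_order:
        if all(comparable(a, x) for x in list(B) + good):
            good.append(a)
        else:
            bad.append(a)
    return set(B) | set(good)


def divides_properly(x, y):
    """A sample order on positive integers: x < y iff x != y and x divides y."""
    return x != y and y % x == 0


M = maximal_chain({2}, divides_properly, [1, 3, 4, 6, 12])
```

With A = {1, 2, 3, 4, 6, 12} ordered by proper divisibility and B = {2}, the sketch yields M = {1, 2, 4, 12}; the rejected elements 3 and 6 are incomparable with 2 and 4, respectively.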
IV.5. Show that the italicized statement that follows (due to Kuratowski and
Zorn; also known as “Zorn’s lemma”) is a consequence of Hausdorff’s
theorem in Exercise IV.4 above. If every totally ordered subset B of an
ordered (by <) set A has an upper bound (that is, an element b ∈ A such
Now this issue is much more complex than dealing with one (or, in any case, “finitely” many) $P^i$ at a time – like $P^2$, $P^{101}$, $P^{123005}$ – which we can define, and use, formally without the need for a formal copy of N. The trick of absorbing the informal number i inside the name so that it is invisible in the theory was done before (and discussed, for example, on p. 12). For example,
$$P^2 = \{\langle x, y\rangle : (\exists z)(yPz \wedge zPx)\}.$$
Here we need to collect all the infinitely many $P^i$ into a class, and to allow the formal system to “see” the variable i, in order to speak of “$\bigcup_{i=1}^{\infty} \ldots$”, a short form of “$\{z : (\exists i \text{ in an appropriate ZFC set of } i\text{'s})\, z \in \ldots\}$”.
Clearly, this is true even if P is a set (a restriction we want to avoid); therefore
we need to formalize the presence of the “natural number” i.†
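When P is a finite set, the union $\bigcup_{i=1}^{\infty} P^i$ can be computed outright, since only finitely many pairs are available. The following Python sketch is ours: it writes pairs as (x, y) with x the input, so that, matching the formula for $P^2$ above, (x, y) ∈ $P^2$ iff some z has (x, z) ∈ P and (z, y) ∈ P. It stops as soon as a new power contributes no new pair; from then on every further power is contained in the union already built.

```python
def compose(P, Q):
    """{(x, y) : there is z with (x, z) in P and (z, y) in Q}."""
    return {(x, y) for (x, z1) in P for (z2, y) in Q if z1 == z2}


def transitive_closure(P):
    """P+ = union of the powers P^i, i >= 1, for a finite relation P."""
    closure = set(P)
    power = set(P)
    while True:
        power = compose(power, P)  # P^{i+1} from P^i
        new = power - closure
        if not new:                # no new pairs: all later powers add nothing either
            return closure
        closure |= new
```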
A similar situation arises in computer programming: We can use “infor-
mal subscripts” 1, 2, 3, . . . to denote several unrelated variables as X 1, X 2,
† When P is a set, things are a bit easier. We can then prove existence of $P^+$ not by confronting $\bigcup_{i=1}^{\infty} P^i$ but by avoiding it. See Exercise V.16.
Of course we have to settle a few things. Does any inductive set really exist?
Is it not possible that an inductive set might contain much more than what we
would care to identify with natural numbers? In a way, the answers are “yes”
and “yes” – the first by the “axiom of infinity”, the second by the fact that
we have “limit ordinals” larger than ω. These are inductive sets that contain
much more than just copies of the intuitive natural numbers (this will be better
understood in Chapter VI).
By the remark following V.1.1, the axiom is a formula of set theory. Because
of Example V.1.2, an inductive set – if such a set “truly” exists – has as a subset
a set of “aliases” of all the members of N, so it is intuitively infinite; hence the
axiom name is appropriate. Now why is the axiom “really true”? Because we can
certainly construct each (real) set in the infinite sequence ∅, {∅}, {∅, {∅}}, . . . ,
for each integer depth of nesting n ∈ N,† and put them all in a class. Now this
class has the same size as the (real) set N (why?); hence it must be a set by
the “size limitation doctrine” of Chapter III. Alternatively, we can say that since
collection is “true” and N is a set, then ran( f ) is also a set (III.11.28), where
f is the function with domain N that for each n ∈ N “outputs” the set in the
sequence ∅, {∅}, . . . that has depth of nesting of braces equal to n.
Furthermore, by construction, ran( f ) is inductive.
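The sequence ∅, {∅}, {∅, {∅}}, . . . and the step x ↦ x ∪ {x} from the footnote below can be mimicked with Python frozensets. The sketch (helper names are ours) builds the first few terms and measures their depth of nesting of {}-brackets, taking ∅ to have depth 0; of course, only a finite initial part of ran(f) can be inspected this way.

```python
def successor(x):
    """x ∪ {x}, the step licensed by pairing and union."""
    return x | frozenset({x})


def term(n):
    """The member of the sequence ∅, {∅}, {∅, {∅}}, ... with nesting depth n."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


def nesting_depth(x):
    """Depth of nesting of {}-brackets around ∅ (so ∅ itself has depth 0)."""
    return 0 if not x else 1 + max(nesting_depth(y) for y in x)
```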
It should be noted that the negation of the axiom of infinity is “no inductive sets
exist”, not “infinite sets do not exist” (see Exercise VI.54).
Finally, we should mention that it is known that Axiom V.1.3 is not provable
by the axioms we have so far (again, see Exercise VI.54); therefore it is a
welcome addition, being intuitively readily acceptable (and necessary).
V.1.4 Lemma. If F is a nonempty family of inductive sets, then $\bigcap F$ is an inductive set.
† To reach any set in the sequence that involves depth of nesting of { }-brackets equal to n, all we
have to do is to write down a proof, of length n + 1, that starts with the statement “∅ is a set” and
repeatedly uses the lemma “since x is a set, then so is x ∪ {x}” (by union and pairing) as x runs
through ∅, {∅}, . . . .
We will call ω the set of formal natural numbers (we will drop the qualification
“formal” whenever there is no danger of confusing N and ω).
Members of ω are called (formal) natural numbers. In our metanotation
n, m, l, i, j, k – with or without primes or subscripts – default to (formal) natural
number variables unless the context dictates otherwise.
That is, we are introducing natural number typed variables in our
argot. Thus, “(∀m) P [m]” is short for “(∀x ∈ ω) P [x]”. “(∃m) P [m]” means
“(∃x ∈ ω) P [x]”.
We develop a few properties of the formal natural numbers that we will need
on one hand for our theoretical development, and on the other hand in order to
make the claim that ω is a formal counterpart of N more acceptable.
Proof. n ∈ n ∪ {n}.
Proof. We prove‡
(∀z)(∀x, y)(x ∈ y ∈ z → x ∈ z) (1)
by induction on z.
Basis. x ∈ y ∈ ∅ → x ∈ ∅ is provable, since x ∈ y ∈ ∅ is refutable.
I.H. For a frozen z assume
(∀x, y)(x ∈ y ∈ z → x ∈ z) (2)
Let now x, y be frozen variables,§ and add the assumption x ∈ y ∈ z ∪ {z}.
Case y ∈ z. Then (I.H. and specialization) x ∈ z ⊆ z ∪ {z}.
Case y = z. Then x ∈ z ⊆ z ∪ {z}. By the deduction theorem,
x ∈ y ∈ z ∪ {z} → x ∈ z ∪ {z}
Hence
Proof. We prove
(∀y)(∀x)(x ∈ y ∈ ω → x ∈ ω) (1)
by induction on y.
† The reader will recall that in “y ∈ x”, “x” is the input, for according to our conventions x, y is
a pair in ∈. A sizable part of the literature has “y ∈ x” to mean y, x is in ∈, i.e., it has y as the
input. Naturally, for them, a transitive class is not ∈-closed; instead it is ∈−1 -closed.
‡ We use the shorthand “(∀x, y)” for “(∀x)(∀y)”
§ That is, we must remember not to universally quantify them or substitute into them prior to our
intended application of the deduction theorem.
(∀x)(x ∈ y ∈ ω → x ∈ ω) (2)
To argue the case for y ∪ {y}, let now x be frozen† and add the assumption
x ∈ y ∪ {y}.
Case x ∈ y. Then (I.H. and specialization) x ∈ ω.
Case x = y. Then x ∈ ω.
Thus, we have proved (deduction theorem followed by generalization)
(∀x)(x ∈ y ∪ {y} ∈ ω → x ∈ ω)
The above two lemmata say quite a bit about the structure of natural numbers:
and we have a complete characterization of natural numbers that does not need
the axiom of infinity anymore. (See Exercise V.5.) Well, we will need infinity
sooner or later, and we will need induction and inductive definitions over ω
sooner rather than later, so it was not a bad idea to introduce the “whole” ω
now.
† In argot one often says “let x be arbitrary but fixed”, referring to the “value” of x.
(19), (20))
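$$pr(x) = y \leftrightarrow \big(x \in \omega \wedge (x = \emptyset \wedge y = \emptyset \vee x \neq \emptyset \wedge x = y \cup \{y\})\big) \vee \big(x \notin \omega \wedge y = \emptyset\big) \qquad (1)$$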
Thus (aforementioned (19) and (20) respectively)
and
$$x \notin \omega \to pr(x) = \emptyset$$
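For hereditarily finite sets, definition (1) can be evaluated directly. The Python sketch below is our own: it recognises a finite von Neumann natural x as the one equal to the numeral with |x| elements, and it computes pr(x) as ⋃x when x ∈ ω. That shortcut agrees with (1) because, for x = y ∪ {y} with y transitive, ⋃x = ⋃y ∪ y = y, while ⋃∅ = ∅.

```python
def numeral(n):
    """The von Neumann numeral for n: ∅, {∅}, {∅, {∅}}, ..."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x


def is_natural(x):
    """Exact test for finite sets: x ∈ ω iff x equals the numeral with |x| elements."""
    return isinstance(x, frozenset) and x == numeral(len(x))


def pr(x):
    """Predecessor as in (1): ⋃x for x ∈ ω (this gives pr(∅) = ∅), and ∅ for x ∉ ω."""
    if is_natural(x):
        return frozenset().union(*x)
    return frozenset()
```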
V.1.17 Theorem (The Minimality Principle for ω, and for Any n ∈ ω).
Proof. (In the metatheory.) Taking recursive (inductive) definitions over N for
granted,† a unique and total I , as defined by recursion in the statement of the
metatheorem, exists. Let us prove its other stated properties.
1-1-ness: By (metatheoretical) induction on n − m ≥ 1 (n, m in N) over N we will prove that m < n → I(m) ∈ I(n); hence m ≠ n → I(n) ≠ I(m).
Basis. If n − m = 1, then I (n) = I (m) ∪ {I (m)}.
I.H. Assume the claim for n − m = k. Case n − m = k + 1: Now I (n) =
I (m + k + 1), so that I (m + k) ∈ I (n). By I.H., I (m) ∈ I (m + k) so that
I (m) ∈ I (n), since the sets I (i) are transitive.
Ontoness: By contradiction, let n ∈ ω be ∈-minimal such that n ∉ ran(I).‡ Now, n ≠ ∅, for ∅ ∈ ran(I). Thus (V.1.15), n = pr(n) ∪ {pr(n)}. Since pr(n) ∈ n
and n is minimal with the above property, pr (n) fails the property, that is,
pr (n) = I (m) for some m ∈ N. But then I (m + 1) = n, hence n ∈ ran(I ); a
contradiction.
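The properties used in this proof can be spot-checked on an initial segment of N. In the following Python sketch (ours; it checks finitely many cases and proves nothing) I is computed by the recursion I(0) = ∅, I(n + 1) = I(n) ∪ {I(n)}, which we assume is the recursion of the metatheorem. The function reports whether the values are distinct, satisfy m < n → I(m) ∈ I(n), are transitive, and are closed under taking members.

```python
def I(n):
    """I(0) = ∅ and I(n + 1) = I(n) ∪ {I(n)} (assumed form of the recursion)."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x


def check_initial_segment(k):
    """Check the properties used in the proof for I(0), ..., I(k - 1)."""
    values = [I(n) for n in range(k)]
    return {
        "one_to_one": len(set(values)) == k,
        "m<n implies I(m) in I(n)": all(values[m] in values[n]
                                       for n in range(k) for m in range(n)),
        "each I(n) transitive": all(a in v for v in values for b in v for a in b),
        # every member of I(n) is some I(m) with m < n: the segment is closed downward
        "members are earlier values": all(x in values[:n]
                                          for n in range(k) for x in values[n]),
    }
```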
† See I.2.13, p. 26, for justification in a general setting. We will consider their formal counterparts
over ω shortly.
‡ By correctness and soundness of ZFC, the real ω satisfies the minimality principle, i.e., Theo-
rem V.1.17 is really true.
(∀m, n)(m ∈ n ∨ m = n ∨ n ∈ m)
is not one of those. That this “truth” is formally provable within set theory we
will see shortly. First, let us summarize our position vs. ω:
with using sets whose elements behave like natural numbers for purposes that
include the ones articulated at the outset in this chapter.
⊢ZFC (∀n)(∀m)(m ∈ n ∨ m = n ∨ n ∈ m)
(∃n)(∃m)(¬m ∈ n ∧ ¬m = n ∧ ¬n ∈ m)   (1)
(∃m)(¬m ∈ n₀ ∧ ¬m = n₀ ∧ ¬n₀ ∈ m)   (2)
¬m₀ ∈ n₀ ∧ ¬m₀ = n₀ ∧ ¬n₀ ∈ m₀   (3)
m₀ ∈ x ∨ m₀ = x ∨ x ∈ m₀   (4)
¬m ∈ n ∧ ¬n ∈ m ∧ ¬m = n
† The qualification “in ω” can be omitted (indeed, it will be in the remainder of the proof) in view
of the naming convention of V.1.5.
‡ This uses proof by auxiliary constant, n₀, between the lines. The reader was forewarned at
the beginning of Chapter IV that we will be increasingly using the “relaxed” proof style (see
also III.5.9, p. 148).
244 V. The Natural Numbers; Transitive Closure
V.1.21 Theorem (Recursive Definitions over ω). Let A be a set and g : ω × A → A
a total function in the sense of III.11.12. Then there exists a unique total
function f : ω → A that satisfies the following recursive definition:
f(0) = a, where a ∈ A
f(n + 1) = g(n, f(n)), for n ≥ 0      (R)
Argument. (R) gives the value of f at 0, namely a. So, if we take the I.H. that
f (n) is defined, then (R) (second equation) shows that f (n + 1) is also defined
(since g is total). By induction over ω, f is defined for all n ∈ ω, hence it exists.
The above argument is drastically off the mark. All we really have argued
about is that any f that happens to satisfy (R) also satisfies dom( f ) = ω, or,
“if an f satisfying (R) exists, then dom( f ) = ω”. Thus we have not proved
existence at all. After all, a function f does not need to be total in order to
“exist”.
The correct way to go about proving existence is to build “successive approx-
imations” of f by “finite” functions that satisfy (R) on their domain. Each of
these finite functions will have as domain some natural number n ∈ ω −{0}. To-
wards this purpose we relax the “for n ≥ 0” requirement in (R). It turns out that
these finite functions (if they exist) are pairwise consistent in that, for any two
of them, h and p, one has either h ⊆ p or p ⊆ h. Thus the union of all of them
is a function f . With a bit of extra work we show that f is total and satisfies (R).
Let
F = {f : f is a function ∧ dom(f) ∈ ω − {0} ∧ f(0) = a ∧
      (∀k ∈ dom(f))(k ≠ 0 → f(k) = g(k − 1, f(k − 1)))}      (1)
A fact used twice below is that F ≠ ∅. For example, {⟨0, a⟩} ∈ F; also,
{⟨0, a⟩, ⟨1, g(0, a)⟩} ∈ F. The first of these two functions has domain equal
to 1, the second has domain equal to 2.
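The construction of F can be imitated on a computer for concrete data. The following Python sketch is only an illustration under assumptions of our own: natural numbers are Python ints, each member of F is a dict with domain {0, . . . , n − 1}, and the names `approximation`, `compatible` and `union_of_approximations` are hypothetical. It builds the members of F with domains 1, . . . , N, checks that two of them are ⊆-comparable, and confirms that their union is single-valued.

```python
from typing import Callable, Dict, Set, Tuple

def approximation(n: int, a: int, g: Callable[[int, int], int]) -> Dict[int, int]:
    """The member h of F with dom(h) = n (n >= 1):
    h(0) = a and h(k) = g(k - 1, h(k - 1)) for 0 < k < n."""
    h = {0: a}
    for k in range(1, n):
        h[k] = g(k - 1, h[k - 1])
    return h

def compatible(h: Dict[int, int], p: Dict[int, int]) -> bool:
    """True iff h ⊆ p or p ⊆ h, viewing each function as a set of pairs."""
    hs, ps = set(h.items()), set(p.items())
    return hs <= ps or ps <= hs

def union_of_approximations(N: int, a: int, g: Callable[[int, int], int]) -> Dict[int, int]:
    """Union of the approximations with domains 1, ..., N."""
    pairs: Set[Tuple[int, int]] = set()
    for n in range(1, N + 1):
        pairs |= set(approximation(n, a, g).items())
    # Pairwise compatibility makes the union single-valued, i.e., a function.
    assert len({x for x, _ in pairs}) == len(pairs)
    return dict(pairs)

# Example: a = 1 and g(n, y) = (n + 1) * y define the factorial function.
fact = union_of_approximations(6, 1, lambda n, y: (n + 1) * y)
print(fact)
```

The union here is over finitely many approximations only; the point of the formal proof is that the same union, taken over all of F, is total on ω.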
Although we have used trichotomy in the existence part, this can be avoided.
See Exercise V.4.
and
⊢ZFC x ∈ ω → y ∈ ω → f₊(x, y ∪ {y}) = f₊(x, y) ∪ {f₊(x, y)}
⊢ZFC (∀n)(∀m)(m + n = n + m)
Proof. We do induction on n.
Basis. n = 0. We want to prove
(∀m)(m + 0 = 0 + m) (2)
Anticipating success, this will also entail (by commutativity of equality and the
Leibniz rule)
(∀m)(0 + m = m + 0)   (2′)
0 + (n + 1) = (n + 1) + 0
m + (n + 1) = (n + 1) + m (5)
(m + 1) + (n + 1) = (n + 1) + (m + 1)
Well,
(n + 1) + (m + 1) = ((n + 1) + m) + 1 by V.1.23
= (m + (n + 1)) + 1 by I.H. on m (5)
= ((m + n) + 1) + 1 by V.1.23
= ((n + m) + 1) + 1 by I.H. on n: (3) and specialization
= (n + (m + 1)) + 1 by V.1.23
= ((m + 1) + n) + 1 by I.H. on n: (3) and specialization
= (m + 1) + (n + 1) by V.1.23
The reader has just witnessed an application of the dreaded double induction.
That is, to prove
(∀n)(∀m) P(m, n)   (6)
one starts, in good faith, an induction on n. En route it turns out that in order to
get unstuck one has to do an induction on m as well, towards proving the basis
(∀m) P (m, 0) and the induction step (∀m) P (m, n) → (∀m) P (m, n + 1).
The good news is that it is not always necessary to do a double induction in
order to prove something like (6). See for example the proof of the next result.
The reader can prove the associativity of + (Exercise V.6), which we take
for a fact from now on. Since, intuitively, n = {0, 1, . . . , n −1}, then, intuitively,
n + m = {0, 1, . . . , n − 1, n + 0, n + 1, . . . , n + (m − 1)} . That is, to obtain
n + m we “concatenate” to the right of n the elements of m “shifted” by n.
Formally this is true:
V.1.25 Theorem.
⊢ZFC (∀n)(∀m) n + m = n ∪ {n + i : i ∈ m}
(∀m)(n + m = n ∪ {n + i : i ∈ m})
by induction on m.
Basis. For m = 0 (i.e., m = ∅) the claim amounts to n + 0 = n ∪ ∅, while,
trivially, ⊢ZFC n ∪ ∅ = n. Done by V.1.23.
I.H. Assume
n + m = n ∪ {n + i : i ∈ m}
for frozen m and n. We look at the case m + 1: The left hand side is
n + (m + 1) = (n + m) + 1   by V.1.23
            = (n + m) ∪ {n + m}   (expanding “+1”)      (1)
while the right hand side is
n ∪ {n + i : i ∈ m ∪ {m}} = n ∪ {n + i : i ∈ m} ∪ {n + m}
                          = (n + m) ∪ {n + m}   by I.H.
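The computation behind V.1.25 can be replayed for small numbers. The Python sketch below is an illustration under our own assumptions: hereditarily finite sets are modelled by nested `frozenset`s, `numeral(k)` builds the von Neumann numeral for the Python integer k, and `pred` picks the largest member of a nonzero numeral, which is its predecessor. All names are hypothetical. Addition is defined by recursion on the second argument, following V.1.23, and the final loop checks the equation of V.1.25 for all n, m < 5.

```python
from functools import lru_cache

def succ(x: frozenset) -> frozenset:
    """The successor x ∪ {x}."""
    return x | frozenset([x])

def numeral(k: int) -> frozenset:
    """Von Neumann numeral for the Python int k: 0 = ∅ and k + 1 = k ∪ {k}."""
    n = frozenset()
    for _ in range(k):
        n = succ(n)
    return n

def pred(m: frozenset) -> frozenset:
    """Predecessor of a nonzero numeral: its member with the most elements."""
    return max(m, key=len)

@lru_cache(maxsize=None)
def add(n: frozenset, m: frozenset) -> frozenset:
    """n + m by recursion on m: n + 0 = n and n + (m + 1) = (n + m) ∪ {n + m}."""
    if not m:
        return n
    return succ(add(n, pred(m)))

# Check V.1.25 on small numerals: n + m = n ∪ {n + i : i ∈ m}.
for p in range(5):
    for q in range(5):
        n, m = numeral(p), numeral(q)
        assert add(n, m) == n | frozenset(add(n, i) for i in m)
```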
V.1.26 Theorem.
⊢ZFC (∀n)(∀m)(n ≤ m → (∃!i) n + i = m)
Let then – invoking proof by auxiliary constant between the lines, as in the
proof of V.1.20 – n₀ be smallest such that†
(∃m)(n₀ ≤ m ∧ ¬(∃i) n₀ + i = m)   (1)
n₀ ≤ m₀ ∧ ¬(∃i) n₀ + i = m₀   (2)
† That is, “add the new constant n₀ and the assumption (1) along with
k < n₀ → ¬(∃m)(k ≤ m ∧ ¬(∃i) k + i = m)”.
n₀ + i = n₀ + j   (3)
(n₀ − 1) + i + 1 = (n₀ − 1) + j + 1
{⟨m, n, i⟩ : m = n + i}
− : ω × ω → ω
⊢ZFC n ≤ m → m − n ↓
and
⊢ZFC n ≤ m → m = n + (m − n)
while
⊢ZFC m < n → m − n ↑
(1) We are painfully aware of the multiple meanings of the symbol “−” in
set theory as set difference and, now, natural number difference, but such
“overloading” of symbol meaning is common in mathematics.
(2) The difference between natural numbers does not coincide with the set
difference of the two numbers. For example, in the former sense, 2 − 1 =
1 = {0}, while in the latter sense 2 − 1 = {0, 1} − {0} = {1}. The context
will alert the reader if we (ever) perform m − n in the “set sense” rather
than the (normally) “natural number sense”.
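The two readings of 2 − 1 in remark (2) can be compared concretely. In this Python sketch, which is an illustration of our own, numerals are nested `frozenset`s, Python's `-` on sets is set difference, and `nat_minus` is a hypothetical helper for the number difference. As a shortcut it uses the fact that the numeral k has exactly k members, so `len` recovers k.

```python
def numeral(k: int) -> frozenset:
    """Von Neumann numeral: 0 = ∅ and k + 1 = k ∪ {k}."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset([n])
    return n

def nat_minus(m: frozenset, n: frozenset):
    """Number difference m − n: the unique i with n + i = m, or None (undefined) if m < n."""
    k_m, k_n = len(m), len(n)
    return numeral(k_m - k_n) if k_n <= k_m else None

two, one = numeral(2), numeral(1)
print(nat_minus(two, one) == numeral(1))       # number sense: 2 − 1 = 1 = {0}
print(two - one == frozenset([numeral(1)]))    # set sense: {0, 1} − {0} = {1}
```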
(3) Number difference is consistent with the earlier introduction of “−” in the
context of predecessor. In the former sense, if n ≥ 1, then n −1 is the unique
number m such that m ∪ {m} = n, i.e., m + 1 = n; that m is precisely the
predecessor of n.
(1) The vector ⟨a, b⟩ is the set {a, {a, b}}, while the sequence [a, b] is the
set {⟨0, a⟩, ⟨1, b⟩} = {{0, {0, a}}, {1, {1, b}}}; thus they are different as
sets.
(2) The vector ⟨x₁, . . . , xₙ⟩ has the informal n in its name, so that n cannot
be manipulated by the formalism.† Thus ⟨x₁, . . . , xₙ⟩ is much like a set
of unrelated variables X1, . . . , Xn in a programming language, while a
sequence f = {⟨0, x₁⟩, . . . , ⟨n − 1, xₙ⟩} not only gives the same positional
information, but also behaves like an array f(i) for i = 0, . . . , n − 1 in a
programming language; for the i in f(i) has formal status.
† One can of course revisit the definition of ⟨. . .⟩ and redefine it in terms of the formal numbers n.
Such rewriting of history would be unwise in view of the commotion it would create. As it stands we
are doing fine: The original definition allowed the theory to bootstrap itself up to a point where
the present more general and more flexible definition of sequence was given.
V.2. Algebra of Relations; Transitive Closure 253
The reader who has read volume 1, Chapter II, now armed with V.1.8, V.1.9,
V.1.23, Exercise V.11 (which introduces multiplication over ω), V.1.20, V.1.32,
and V.1.33 along with the induction principle over ω, will see with a minimum
of effort or imagination that the Gödel incompleteness theorems hold for ZFC –
a fact that we took as a given in many earlier discussions.
Indeed, one need only carry out the formal Gödel numbering of volume 1,
Chapter II, within ZFC (rather than within PA) using terms t of ZFC that
(provably) satisfy t ∈ ω as Gödel numbers of formulas, terms, and proofs in
(any extension of) L_Set. In this endeavour the proved results (for ω) that we
enumerated above – suggesting them as appropriate ammunition – play the role
of ROB and induction, which – in arithmetic – were assumed axiomatically in
volume 1.
Moreover, if Γ denotes the set of individual ZFC axioms,† then it is easy
to prove that the corresponding formula Γ(x) is recursive. Everything else has
already been done in the aforementioned chapter.
or, equivalently,
y ∈ (P ◦ S)x abbreviates (∃z ∈ Sx) y ∈ Pz
We are adopting the notational convention that “P ◦ Sx” means “(P ◦ S)x”,
that is, we render the use of brackets redundant.
V.2.3 Lemma. For any relations P and S and all x, P ◦ Sx = P[Sx].
† Recall that separation and collection denote infinitely many axioms, and so does foundation in
the form we have adopted, although the latter can be replaced by a single axiom.
Proof. ⊆: Let y ∈ P ◦ Sx. Then, for some z, y ∈ Pz ∧ z ∈ Sx.† That is,
y ∈ P[Sx] (Definition III.11.4).
⊇: Let y ∈ P[Sx]. Then, for some z ∈ Sx, one has y ∈ Pz; hence
y ∈ P ◦ Sx by V.2.1.
V.2.4 Corollary. For any relations P and S and all X, P ◦ S[X] = P[S[X]].
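V.2.5 Example. Let R = {⟨1, 2⟩, ⟨1, 3⟩} and S = {⟨1, 1⟩, ⟨2, 1⟩}. Then ⟨1, 2⟩ ∈ R
and ⟨2, 1⟩ ∈ S yield ⟨1, 1⟩ ∈ S ◦ R, while ⟨1, 3⟩ ∈ R contributes nothing, since no
pair of S has first component 3.
Thus S ◦ R = {⟨1, 1⟩}. On the other hand, one similarly calculates that
R ◦ S = {⟨1, 2⟩, ⟨1, 3⟩, ⟨2, 2⟩, ⟨2, 3⟩}.
Therefore, in general, S ◦ R ≠ R ◦ S.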
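Example V.2.5 is easy to check mechanically. In the Python sketch below (an illustration of our own, not part of the formal development) a relation is a finite set of pairs, a tuple (x, y) standing for ⟨x, y⟩, and `compose` is a hypothetical helper that follows V.2.3: in P ◦ S the relation S is applied first.

```python
from typing import Hashable, Set, Tuple

Relation = Set[Tuple[Hashable, Hashable]]

def compose(P: Relation, S: Relation) -> Relation:
    """P ∘ S: (x, y) belongs to it iff (x, z) ∈ S and (z, y) ∈ P for some z,
    matching y ∈ (P ∘ S)x iff y ∈ P[Sx]."""
    return {(x, y) for (x, z) in S for (w, y) in P if z == w}

R = {(1, 2), (1, 3)}
S = {(1, 1), (2, 1)}
print(compose(S, R))            # S ∘ R
print(sorted(compose(R, S)))    # R ∘ S
```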
P ◦ (S ◦ T) = (P ◦ S) ◦ T
† The reader who may long for the earlier tediously formal proof style will note that “z” can be
thought of as the name of an auxiliary constant here.
Digression. By the (informal) definition of class equality (cf. III.4.7 and III.4.8)
and the deduction theorem, a formula A = B is proved by “letting x ∈ A”
(frozen x) and then proving x ∈ B to settle “⊆”, subsequently repeating these
steps with the roles of A and B reversed. We have already employed this tech-
nique in the proof of V.2.3.
When we deal with relations P and S and we want to prove P = S, the above
technique translates to “letting” x P y in the “⊆-direction”. The reason is that
one really “lets”
z∈P (1)
Then (cf. III.11.1), OP(z) follows, that is,
(∃u)(∃v)⟨u, v⟩ = z   (2)
Letting now x and y be auxiliary (new) constants, we can add the assumption
⟨y, x⟩ = z so that (1) becomes
x P y   (3)
With some work, one then proves x S y, that is, z ∈ S. This settles P ⊆ S.
Thus, in practice, one is indeed justified in suppressing the steps (1)–(2) and
starting by “letting” (3).
$$P^n = \underbrace{P \circ \cdots \circ P}_{n \text{ copies}}$$
This tentative definition is acceptable, but it has the drawback that it hides
n in the name, as we have already discussed in the preamble of Section V.1.
We can fix this easily, if P is a set, by making V.2.7 into a formal recursive
definition of a function n ↦ Pⁿ on ω,† replacing “n ∈ N” by “n ∈ ω”.‡
However we want to afford our exposition the generality that P may be a
proper class. Intuitively, x Pⁿ y (n ∈ ω and n ≠ 0) should mean that for some
sequence [f₀, f₁, . . . , fₙ],
f₀ P f₁ P · · · P fₙ₋₁ P fₙ
where x = f₀ and y = fₙ.
The reader already knows how to express “ f is a function” within set theory.
† Technically, we then also need a meaning for P⁰, i.e., a value of the defined function at 0. As
such we can take, for example, {⟨x, x⟩ : x ∈ field(P)}.
‡ The requirement that P be a set makes the pairs ⟨n, Pⁿ⟩ of the recursively defined function
meaningful, for the two components of a pair must be sets or atoms.
V.2.10 Example. Let A = {1, 2, 3}. Then 1A = ∆A = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}.
V.2.11 Lemma. For each P : A → A, one has P ◦ ∆ = ∆ ◦ P = P.
Proof. We have
y ∈ P ◦ ∆x iff y ∈ P[∆x]   (by V.2.3)
           iff y ∈ P[{x}]
           iff y ∈ Px
Thus P ◦ ∆ = P. Similarly, ∆ ◦ P = P.
⊢ZFC P¹ = P
⊢ZFC Pⁿ⁺¹ = Pⁿ ◦ P for any n ∈ ω − {0}
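For a finite relation P on a finite set A, the recursion P¹ = P, Pⁿ⁺¹ = Pⁿ ◦ P can be run directly, and starting from P⁰ = ∆A is harmless by V.2.11. The Python sketch is illustrative only: relations are sets of pairs, `compose(P, S)` applies S first as in V.2.3, and the helper names are hypothetical.

```python
def compose(P, S):
    """P ∘ S: pairs (x, y) with (x, z) ∈ S and (z, y) ∈ P for some z."""
    return {(x, y) for (x, z) in S for (w, y) in P if z == w}

def identity(A):
    """∆A = {(x, x) : x ∈ A}."""
    return {(x, x) for x in A}

def power(P, n, A):
    """Pⁿ on A, computed with P⁰ = ∆A and Pⁿ⁺¹ = Pⁿ ∘ P."""
    result = identity(A)
    for _ in range(n):
        result = compose(result, P)
    return result

A = {1, 2, 3}
P = {(1, 2), (2, 3)}
print(power(P, 2, A))   # pairs reachable in exactly two P-steps
```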
† The context will guard against confusing this ∆ with that of volume 1, Chapter II.
Since ⊢ZFC j < 1 ↔ j = 0 (recall that “j < 1” means “j ∈ {∅}”), the one
point rule (I.6.2) yields at once that (1) is provably equivalent to the following:
(∃f)(f is a function ∧ dom(f) = 2 ∧ f(0) = x ∧ f(1) = y ∧ f(0) P f(1))   (2)
f₀ is a function ∧ dom(f₀) = 2 ∧ f₀(0) = x ∧ f₀(1) = y ∧ f₀(0) P f₀(1)   (3)
Next, assume
n > 0   (5)
and
x Pⁿ⁺¹ y   (6)
Note that x Pⁿ⁺¹ y (cf. V.2.8) contributes the redundant (by V.1.8; hence not
included in (7)) conjunct n + 1 > 0. By V.1.33 and employing tautological
equivalences, distributivity of ∀ over ∧, and the one point rule, (7) is provably
equivalent to
n > 0 ∧ (∃f)(f is a function ∧ dom(f) = n + 2 ∧ f(0) = x ∧ f(n + 1) = y ∧
      (∀j)(j < n → f(j) P f(j + 1)) ∧ f(n) P f(n + 1))      (8)
(8) allows the introduction of a new constant h and of the accompanying
assumption
h is a function ∧ dom(h) = n + 2 ∧ h(0) = x ∧
      (∀j)(j < n → h(j) P h(j + 1)) ∧ h(n) P y      (9)
or, setting g = h ↾ (n + 1) – which implies g(n) = h(n) in particular –
g is a function ∧ dom(g) = n + 1 ∧ g(0) = x ∧ g(n) = h(n) ∧
      (∀j)(j < n → g(j) P g(j + 1)) ∧ h(n) P y
which, by (5) and the substitution axiom, yields x Pⁿ h(n) P y. Hence x Pⁿ ◦ P y.
The reader will have no trouble establishing the converse.
P⁰⁺¹ = P¹
     = P   by V.2.12
and
P⁰ ◦ P = ∆ ◦ P
       = P   by V.2.11
In the statement of the lemma, as is the normal practice, we use “implied multi-
plication”, i.e., “m j” means m · j. We will also follow the standard convention
that “·” has smaller scope (or higher “priority”) than “+”, so that m + n j means
m + (n j).
(1) It is clear that (a) above says more, namely “P is symmetric iff x P y ↔
y P x”, for the names x, y can be interchanged in the definition.
(2) All concepts except reflexivity depend only on the relation P, while reflex-
ivity is relative to a class A. If P : A → A and if it is reflexive on A, we
usually say just “reflexive”.
Reflexivity on A clearly is tantamount to ∆A ⊆ P.
a ≡ₘ b iff m | a − b
V.2.19 Example. Let R = {⟨1, 2⟩, ⟨2, 1⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}
on the set {1, 2, 3}. R is reflexive and symmetric, but is neither antisymmetric nor
transitive.
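The four claims of V.2.19 can be verified by brute force over the finitely many pairs. The following Python sketch is our own illustration; the predicate names are hypothetical, and a relation is represented as a set of pairs (x, y) standing for ⟨x, y⟩.

```python
def is_reflexive(P, A):
    return all((x, x) in P for x in A)

def is_symmetric(P):
    return all((y, x) in P for (x, y) in P)

def is_antisymmetric(P):
    # whenever both (x, y) and (y, x) are in P, x must equal y
    return all(x == y for (x, y) in P if (y, x) in P)

def is_transitive(P):
    return all((x, w) in P for (x, y) in P for (z, w) in P if y == z)

R = {(1, 2), (2, 1), (2, 3), (3, 2), (1, 1), (2, 2), (3, 3)}
print(is_reflexive(R, {1, 2, 3}), is_symmetric(R), is_antisymmetric(R), is_transitive(R))
```

Transitivity fails at the pairs (1, 2) and (2, 3), since (1, 3) ∉ R.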
Proof. For symmetry: If part. Let P = P⁻¹ and x P y. Then x P⁻¹ y as well;
therefore y P x, so that P is symmetric.
Only-if part. ⊆: Let P be symmetric and x P y. It follows that y P x; therefore
x P⁻¹ y (by the definition of P⁻¹).
† It goes almost without saying that no relation can be irreflexive and reflexive on a nonempty
class.
‡ It is usual, whenever it is typographically elegant, to denote the negation of . . . P . . . , i.e.,
¬ . . . P . . . , by . . . P̸ . . . , for any relation P.
V.2.21 Example. For any relations P and S, ⊢ZFC (P ∪ S)⁻¹ = P⁻¹ ∪ S⁻¹.
Indeed, let ⟨x, y⟩ ∈ (P ∪ S)⁻¹. Then
⟨y, x⟩ ∈ P ∪ S
Hence
⟨y, x⟩ ∈ P ∨ ⟨y, x⟩ ∈ S
Hence
⟨x, y⟩ ∈ P⁻¹ ∨ ⟨x, y⟩ ∈ S⁻¹
Hence
⟨x, y⟩ ∈ P⁻¹ ∪ S⁻¹
(1) The reflexive closure of P with respect to A, rA (P), is the ⊆-smallest relation
S that is reflexive on A such that P ⊆ S.
(2) The symmetric closure of P, s(P), is the ⊆-smallest symmetric relation S
such that P ⊆ S.
(3) The transitive closure of P, t(P), is the ⊆-smallest transitive relation S such
that P ⊆ S. The alternative notation P⁺ is often used to denote t(P).
“S is ⊆-smallest such that F holds” means that if F holds also for T, then
S ⊆ T.
† In analogy with the conjunctional use of “<” in x < y < z, one often uses an arbitrary relation
P conjunctionally, so that x P y P z stands for x P y ∧ y P z.
(a) rA(P) = P ∪ ∆A,
(b) s(P) = P ∪ P⁻¹,
(c) $P^+ = \bigcup_{i=1}^{\infty} P^i$.
$P^+ = \bigcup_{i=1}^{\infty} P^i$ is, of course, to be understood as an abbreviation of
x P⁺ y ↔ (∃i ∈ ω)(i > 0 ∧ x Pⁱ y).
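For finite relations, the three formulas above suggest a direct computation of the closures. The Python sketch below is illustrative only (hypothetical helper names; relations as sets of pairs; `compose(P, S)` applies S first). The transitive closure accumulates the powers Pⁱ until a power contributes nothing new; for a finite P this must happen, and from then on every later power is already included, since Pⁱ⁺¹ ⊆ P ∪ · · · ∪ Pⁱ implies Pⁱ⁺² ⊆ P² ∪ · · · ∪ Pⁱ⁺¹.

```python
def compose(P, S):
    """P ∘ S: pairs (x, y) with (x, z) ∈ S and (z, y) ∈ P for some z."""
    return {(x, y) for (x, z) in S for (w, y) in P if z == w}

def reflexive_closure(P, A):
    """r_A(P) = P ∪ ∆A."""
    return set(P) | {(x, x) for x in A}

def symmetric_closure(P):
    """s(P) = P ∪ P⁻¹."""
    return set(P) | {(y, x) for (x, y) in P}

def transitive_closure(P):
    """P⁺ = P ∪ P² ∪ P³ ∪ ... for a finite relation P."""
    closure, power = set(P), set(P)
    while True:
        power = compose(power, P)      # the next power Pⁱ⁺¹ = Pⁱ ∘ P
        if power <= closure:           # nothing new: all later powers are included too
            return closure
        closure |= power

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```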
Pʲ⁺¹ ⊆ T   (1)
x Pʲ⁺¹ y
(∃z)(x Pʲ z ∧ z P y)   by V.2.12
(∃z)(x T z ∧ z T y)   by the I.H. and the assumption on T
x T y   since T is transitive
which proves the induction step. $\bigcup_{i=1}^{\infty} P^i \subseteq T$ follows at once.
V.2.25 Example. Why does a class A fail to be transitive? Because some set
x ∈ A has members that are not in A. If we fix this deficiency – by adding to
A the missing members – we will turn A into a transitive class. All we have to
do is to iterate the following process, until no new elements can be added:
Add to the current iterate of A – call this Aᵢ – all those elements y, not
already included, such that y ∈ x ∈ Aᵢ for some x.
So, if we add a y, then we must add also all the z ∈ y that were not already
included, and all the w ∈ z ∈ y that were not already included, . . . . In short, we
add an element w just in case w ∈ z ∈ y ∈ · · · ∈ x ∈ A for some z, y, . . . , x.
With the help of the transitive closure – and switching notation from “∈ the
nonlogical symbol” to “∈, the relation {⟨x, y⟩ : y ∈ x}”‡ – this is simply put as:
It turns out that A ∪ ∈⁺[A] is the ⊆-smallest transitive class that has A as a
subclass – that is, it extends A to a transitive class in the most economical way
(see below).
V.2.27 Proposition.
x ∈ ∈[∈ⁱ[A]]
  = ∈ ◦ ∈ⁱ[A], by V.2.4
  = ∈ⁱ⁺¹[A], by V.2.12
  ⊆ ∈⁺[A]
  ⊆ TC(A)
† Brackets are inserted this once for the sake of clarity. They are omitted in the rest of the proof.
‡ It is rather easy to see, from the context, when “∈” stands for the relation and when for the
predicate.
Now
Hence
Therefore
x₀ ∈ y ∈ B, by (∗)
and
x₀ ∈ B, since B is transitive.
Thus, by induction on i, one can easily prove that ∈ⁱ[A] is a set (having defined
∈⁰[A] to mean A for convenience), since
∈ⁱ⁺¹[A] = ∈[∈ⁱ[A]] = ⋃∈ⁱ[A]
From the above we infer that another way to “construct” TC(A) is to throw in
all the elements of A, then all the elements of all the elements of A, then all
the elements of all the elements of all the elements of A, and so on. That is,
$$TC(A) = A \cup \bigcup A \cup \bigcup\bigcup A \cup \bigcup\bigcup\bigcup A \cup \cdots$$
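The layered description of TC(A) just given translates into a short loop for hereditarily finite sets. The Python sketch below rests on assumptions of our own: sets are nested `frozenset`s, there are no atoms (so every element can itself be unioned), and `big_union`, `tc` and `is_transitive_set` are hypothetical names. The loop may stop at the first layer that adds nothing, because if a layer lies inside the union of the earlier ones, so does the union of that layer.

```python
def big_union(X):
    """⋃X for a set X whose members are frozensets."""
    out = frozenset()
    for x in X:
        out |= x
    return out

def tc(A):
    """TC(A) = A ∪ ⋃A ∪ ⋃⋃A ∪ ... for a hereditarily finite A without atoms."""
    result, layer = frozenset(A), frozenset(A)
    while True:
        layer = big_union(layer)
        if layer <= result:     # from here on no layer adds anything new
            return result
        result |= layer

def is_transitive_set(T):
    """Every member of a member of T is a member of T."""
    return all(y in T for x in T for y in x)

e = frozenset()                    # ∅
one = frozenset([e])               # {∅}
A = frozenset([frozenset([one])])  # A = {{{∅}}}
print(len(tc(A)), is_transitive_set(tc(A)), is_transitive_set(A))
```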
V.2.28 Remark. (1) It follows from Lemma V.2.24 (if that were not already
clear from the definition) that the s- and t-closures are only dependent on the
relation we are closing, and not on any other context. On the contrary, the
reflexive closure depends on a context A.
(2) We also note that closing a relation P amounts, intuitively, to adding
pairs x, y to P until the first time it acquires the desired property (reflexivity
on some A, or symmetry or transitivity). Correspondingly, P is reflexive on A,
V.2.29 Example (Informal). If A = {1, 2, . . . , n}, n ≥ 1, then A × A has $2^{n^2}$
subsets, that is, there are $2^{n^2}$ relations P : A → A. Fix attention on one such
relation, say, R.
Clearly then, the sequence R, R², R³, . . . , Rⁱ, . . . has at most $2^{n^2}$ distinct
terms, thus
$$R^+ = \bigcup_{i=1}^{2^{n^2}} R^i$$
R + = R n−1
V.2.30 Example. Let the “higher order collection” of relations $(T_a)_{a\in I}$ be given
by a formula of set theory, T(a, x, y), in the sense that
x Tₐ y abbreviates T(a, x, y)
so that $\bigcup_{a\in I} T_a$ stands for {⟨x, y⟩ : (∃a ∈ I) T(a, x, y)}. Let S be another
relation.
Then the following two (abbreviations of) formulas are provable in ZFC:
$$S \circ \bigcup_{a\in I} T_a = \bigcup_{a\in I} (S \circ T_a) \qquad (1)$$
$$\Big(\bigcup_{a\in I} T_a\Big) \circ S = \bigcup_{a\in I} (T_a \circ S) \qquad (2)$$
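Then
$$(\exists z)\Big(x \mathrel{S} z \wedge z \Big(\bigcup_{a\in I} T_a\Big) y\Big)$$
$$(\exists a \in I)(\exists z)(x \mathrel{S} z \wedge z \mathrel{T_a} y)$$
which yields
$$(\exists a \in I)(x \mathrel{S \circ T_a} y)$$
and finally
$$x \Big(\bigcup_{a\in I} (S \circ T_a)\Big) y$$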
Thus
$$tr(P) = \bigcup_{i=0}^{\infty} P^i \qquad (4)$$
Next, look into rt(P) (really r(t(P)): the reflexive closure of the transitive
closure of P). Clearly,
$$rt(P) = \Delta \cup t(P) = \bigcup_{i=0}^{\infty} P^i \qquad (5)$$
By (4) and (5), ⊢ZFC rt(P) = tr(P). We call this relation (rt(P) or tr(P))
the reflexive-transitive closure of P. It usually goes under the symbol P∗.
Thus, intuitively, x P∗ y iff either x = y, or for some z₁, . . . , zₖ₋₁ for k ≥ 1,
x P z₁ P z₂ · · · P zₖ₋₁ P y.†
† For k = 1 the sequence z₁, . . . , zₖ₋₁ is empty by convention; thus we just have x P y in this
case.
‡ The aᵢ can be thought of as values of a function f with domain n, that is aᵢ = f(i).
V.2.33 Example (Informal). Consider R = {⟨a, b⟩, ⟨b, c⟩} on A = {a, b, c},
where a ≠ b ≠ c ≠ a. Let us arbitrarily rename a, b, c as a₁, a₂, a₃ respectively –
or, equivalently, 1, 2, 3, since the a in aᵢ is clearly cosmetic. Then,
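$$A_R = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$
$$A_{r(R)} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$
$$A_{R^+} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$
$$A_{st(R)} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
$$A_{ts(R)} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$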
† This “+” is often called Boolean addition, for if 0, 1 are thought of as the values false and true
respectively, the operation amounts to “∨”.
whereas
$$A_{R^*} = \sum_{i=0}^{n} (A_R)^i$$
From the observation that R∗ = tr(R) and from Exercise V.24 one gets
$$A_{R^*} = (A_\Delta + A_R)^{n-1}$$
a simpler formula.
The reader will be asked to pursue this a bit more in the Exercises section, where
“good” algorithms for the computation of A R + and A R ∗ will be sought.
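One well-known candidate for such a “good” algorithm is Warshall's algorithm, which obtains the matrix of R⁺ from that of R with O(n³) Boolean operations instead of summing powers. The Python sketch below is our own illustration and is not claimed to be the method the Exercises have in mind; matrices are lists of 0/1 rows, with entry (i, j) equal to 1 iff ⟨aᵢ, aⱼ⟩ ∈ R as in V.2.33, indexed from 0, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def warshall(M: Matrix) -> Matrix:
    """Boolean matrix of R⁺ from that of R (Warshall's algorithm)."""
    n = len(M)
    T = [row[:] for row in M]          # work on a copy of M
    for k in range(n):                 # now allow a_k as an intermediate point
        for i in range(n):
            if T[i][k]:
                for j in range(n):
                    if T[k][j]:
                        T[i][j] = 1
    return T

def reflexive_transitive(M: Matrix) -> Matrix:
    """Boolean matrix of R∗ = ∆ ∪ R⁺."""
    T = warshall(M)
    return [[1 if i == j else T[i][j] for j in range(len(T))] for i in range(len(T))]

A_R = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(warshall(A_R))
print(reflexive_transitive(A_R))
```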
V.3.3 Example (Informal). Let A = {a, b}, where a ≠ b, and B = {1, 2, 3, 4}.
Consider the following functions:
f₁ = {⟨a, 1⟩, ⟨b, 3⟩}
f₂ = {⟨a, 1⟩, ⟨b, 4⟩}
g₁ = {⟨1, a⟩, ⟨3, b⟩, ⟨4, b⟩}
g₂ = {⟨1, a⟩, ⟨2, b⟩, ⟨3, b⟩}
g₃ = {⟨1, a⟩, ⟨2, b⟩, ⟨3, b⟩, ⟨4, b⟩}
g₄ = {⟨1, a⟩, ⟨2, a⟩, ⟨3, b⟩, ⟨4, b⟩}
g₅ = {⟨1, a⟩, ⟨3, b⟩}
We observe that
g₁ ◦ f₁ = g₂ ◦ f₁ = g₃ ◦ f₁ = g₄ ◦ f₁ = g₅ ◦ f₁ = g₁ ◦ f₂ = g₃ ◦ f₂
        = g₄ ◦ f₂ = ∆A
What emerges is:
(1) The equation x ◦ f = ∆ A does not necessarily have unique x-solutions,
not even when only total solutions are sought.
(2) The equation x ◦ f = ∆ A can have nontotal x-solutions. Neither a total
nor a nontotal solution is necessarily 1-1.
(3) An x-solution to x ◦ f = ∆ A can be 1-1 without being total.
(4) The equation g ◦ x = ∆ A does not necessarily have unique x-solutions.
Solutions do not have to be onto.
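The compositions listed in V.3.3 can be double-checked by machine. In this Python sketch, which is only an illustration, a function is a dict, a missing key means “undefined”, and the strings "a", "b" stand for the two distinct elements of A; `compose_fn` is a hypothetical helper.

```python
def compose_fn(g, f):
    """g ∘ f for (possibly nontotal) functions stored as dicts:
    defined at a iff f(a) is defined and g is defined at f(a)."""
    return {a: g[f[a]] for a in f if f[a] in g}

delta_A = {"a": "a", "b": "b"}
f1 = {"a": 1, "b": 3}
f2 = {"a": 1, "b": 4}
g1 = {1: "a", 3: "b", 4: "b"}
g2 = {1: "a", 2: "b", 3: "b"}
g3 = {1: "a", 2: "b", 3: "b", 4: "b"}
g4 = {1: "a", 2: "a", 3: "b", 4: "b"}
g5 = {1: "a", 3: "b"}

left_inverse_pairs = [(g1, f1), (g2, f1), (g3, f1), (g4, f1), (g5, f1),
                      (g1, f2), (g3, f2), (g4, f2)]
print(all(compose_fn(g, f) == delta_A for g, f in left_inverse_pairs))
print(compose_fn(g2, f2))   # undefined at b, since g2 is undefined at 4
```

The data also exhibit observations (2) and (3): g₁ is not total on B, g₄ is total but not 1-1, and g₅ is 1-1 without being total.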
In the previous example we saw what we cannot infer about f and g from
g ◦ f = ∆ A . Let us next see what we can infer.
Proof. (1): Since g ◦ f is total, it follows that f is too (for f (a) ↑ implies
g( f (a)) ↑). Next, let f (a) = f (b). Then g( f (a)) = g( f (b)) by Leibniz axiom;
hence g ◦ f (a) = g ◦ f (b), that is, ∆A (a) = ∆A (b).
Hence a = b.
(2): For ontoness of g we argue that there exists an x-solution of the equation
g(x) = a for any a ∈ A. Indeed, x = f (a) is a solution.
V.3.5 Corollary. Not all functions f : A → B have left (or right) inverses.
V.3.6 Corollary. Functions with neither left nor right inverses exist.
Proof. Any f : A → B which is neither 1-1 nor onto fills the bill. For example,
take f = {⟨1, 2⟩, ⟨2, 2⟩} from {1, 2} to {1, 2}.
The above proofs can be thought of as argot versions of formal proofs, since 1
and 2 can be thought of as members of (the formal) ω.
a f ◦ f⁻¹ b iff (∃c)(a f c ∧ c f⁻¹ b)
            iff (∃c)(a f c ∧ b f c)
            iff a = b   (f is single-valued)
where the if part of the last iff is due to ontoness of f, while the only-if
part employs proof by auxiliary constant (let c work, i.e., a f c ∧ b f c . . . ). Thus
x = f⁻¹ solves f ◦ x = ∆B. Similarly, one can show that it solves x ◦ f = ∆A
too.
Uniqueness of solution: Let x ◦ f = ∆A. Then (x ◦ f) ◦ f⁻¹ = ∆A ◦ f⁻¹ =
f⁻¹. By associativity of ◦, this says x ◦ (f ◦ f⁻¹) = f⁻¹, i.e., x = x ◦ ∆B =
f⁻¹.
V.3.8 Corollary. If f : A → B has both left and right inverses, then it is a 1-1
correspondence, and hence the two inverses equal f⁻¹.
Proof. (1): The if part is Proposition V.3.4(1). As for the only-if part, note that
f⁻¹ : B → A is single-valued (f is 1-1) and verify that f⁻¹ ◦ f = ∆A.
(2): The if part is Proposition V.3.4(2).
Only-if part: By ontoness of g, all the sets in the family $(g^{-1}x)_{x\in A}$ are
nonempty. By AC, let $h \in \prod_{x\in A} g^{-1}x$.
Thus, h : A → B is total, and h(x) ∈ g⁻¹x for all x ∈ A. Now, for all x,
h(x) ∈ g⁻¹x iff h(x) g⁻¹ x
            iff x g h(x)
            iff x = g ◦ h(x)
That is, g ◦ h = ∆A.
(3) Recall that π, δ denote the first and second projections π (x, y) = x and
δ(x, y) = y for all x, y. Let f : C → A and g : C → B be two total
functions. Then there is a unique total function h which can label the dotted
† The reflexive, symmetric, and transitive properties have trivial translations in the formal language.
V.4. Equivalence Relations 277
Why is this intuition not valid for arbitrary relations? Well, for one thing, not
all relations are symmetric, so if element a of A started up a club of “pals”
with respect to a (non-symmetric) relation P, then a would welcome b into the
club as soon as a P b holds. Now since, conceivably, b P a is false, b would not
welcome a in his club. The two clubs would be different. Now that is contrary
to the intuitive meaning of “equivalence” according to which we would like a
and b to be in the same club.
O.K., so let us throw in symmetry. Do symmetric relations group related
elements in a way we could intuitively call “equivalence”? Take the symmetric
relation ≠. If it behaved like equivalence, then a ≠ b and b ≠ c would require
all three a, b, c to belong to the same “pals’ club”, for a and b are in the same
club, and b and c are in the same club. Alas, it is conceivable that a ≠ b ≠ c,
yet a = c, so that a and c would not be in the same club. The problem is that
≠ is not transitive.
What do we need reflexivity for? Well, without it we would have “stray”
elements (of A) which belong to no clubs at all, and this is undesirable intuitively.
For example, R = {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 1⟩, ⟨2, 2⟩} is symmetric and transitive on
A = {1, 2, 3}. We have exactly one club, {1, 2}, and 3 belongs to no club. We
fix this by adding ⟨3, 3⟩ to R, so that 3 belongs to the club {3}.
As we already said, intuitively we view related elements of an equivalence
relation as indistinguishable. We collect them in so-called equivalence classes
(the “clubs”) which are therefore viewed intuitively as a kind of “fat urelements”
(their individual members lose their “individuality”).
(1) Restricting the definition of A/P to sets P, A ensures that [x] P are sets
(why?) so that A/P makes sense as a class. Indeed, it is a set, by collection.
(2) Of course, [a]P = P[{a}] = Pa.
Pause. Are all the “sets” mentioned in the theorem indeed sets?
Observe that:
(i) ∼ is reflexive: Take any x ∈ A. By V.4.6(iii), there is an a ∈ I such that
x ∈ Fa, and hence {x, x} ⊆ Fa. Thus x ∼ x.
(ii) ∼ is, trivially, symmetric.
(iii) ∼ is transitive: Indeed, let x ∼ y ∼ z. Then {x, y} ⊆ Fa and {y, z} ⊆ Fb
for some a, b in I. Thus, y ∈ Fa ∩ Fb; hence Fa = Fb by V.4.6(ii). Hence
{x, z} ⊆ Fa; therefore x ∼ z.
So ∼ is an equivalence relation. Once we show that C(∼) = Π, i.e., A/∼ = Π,
we will have settled ontoness.
⊆: Let [x] be arbitrary in A/∼ (we use [x] for [x]∼). Take Fa such that
x ∈ Fa (it exists by V.4.6(iii)). Now let z ∈ [x]. Then z ∼ x; hence z, x are in
the same Fb, which is Fa by V.4.6(ii). Hence z ∈ Fa; therefore [x] ⊆ Fa.
Conversely, if z ∈ Fa, since also x ∈ Fa, then z ∼ x; thus z ∈ [x]. All in all,
[x] = Fa, where Fa is the unique Fa containing x. Thus [x] ∈ Π.
⊇: Let Fa be arbitrary in Π. By V.4.6(i), there is some x ∈ Fa. By the same
argument as in the ⊆-part, [x] = Fa; thus Fa ∈ A/∼.
For 1-1-ness, let Π and ∼ be as before, and let also R ∈ E such that A/R = Π.
If x R y, then [x]R = [y]R. Let [x]R = Fa for some a ∈ I; thus x and y are in
Fa, that is, x ∼ y. The argument is clearly reversible, so R = ∼.
We observe:
(a) λx.[x] is total and onto. It is called the natural projection of A onto A/R_f.†
(b) [x] ↦ f(x) is single-valued,‡ for if [x] = [y], then x R_f y and thus
f(x)↑ ∧ f(y)↑ ∨ (∃z)(z = f(x) ∧ z = f(y)).
(c) The function [x] ↦ f(x) is defined iff f(x)↓ (trivial).
(d) [x] ↦ f(x) is 1-1. For, let ⟨[x], a⟩ and ⟨[y], a⟩ be pairs of this func-
tion. The first pair implies f(x) = a, and the second implies f(y) = a; thus
f(x) = f(y), and hence f(x) ≃ f(y). It follows that x R_f y, and hence
[x] = [y].
(e) Let h = λx.[x] and g = λ[x]. f (x). Then g ◦ h(x) = g(h(x)) = g([x]) =
f (x).
This verifies the earlier claim that the above diagram is commutative.
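The factorization verified in (a)–(e) can be illustrated on a small finite example. In the Python sketch below, which rests on our own conventions, a possibly nontotal f is a dict with missing keys meaning “undefined”, equivalence classes are `frozenset`s, and `factor` is a hypothetical name. Points where f is undefined are lumped into a single class, matching x R_f y iff f(x)↑ ∧ f(y)↑ ∨ (∃z)(z = f(x) ∧ z = f(y)), and g is left undefined on that class, as in (c).

```python
def factor(f, A):
    """Split f (a dict; a missing key means f(x)↑) as g ∘ h, where
    h = λx.[x] is the natural projection onto A/R_f and g = λ[x].f(x)."""
    blocks = {}
    for x in A:
        # f.get(x) is None exactly when f(x)↑ (assumes None is never a value of f)
        blocks.setdefault(f.get(x), set()).add(x)
    h = {x: frozenset(blocks[f.get(x)]) for x in A}
    g = {frozenset(block): v for v, block in blocks.items() if v is not None}
    return h, g

A = range(6)
f = {x: x % 3 for x in range(5)}     # f(5) is undefined
h, g = factor(f, A)
print(sorted(sorted(c) for c in set(h.values())))   # the classes of A/R_f
```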
† The term applies to the general case λx.[x] : A → A/R, not just for the special R = R f above.
‡ In this context one often says “well-defined”, i.e., the image f (x) is independent of the repre-
sentative x which denotes (defines) the equivalence class [x].
V.5. Exercises 281
V.5. Exercises
V.1. Course-of-values induction. Prove that for any formula F (x),
⊢ZFC (∀n ∈ ω)((∀m < n ∈ ω)F(m) → F(n)) → (∀n ∈ ω)F(n)
or, in words, if for the arbitrary n ∈ ω we can prove F (n) on the induc-
tion hypothesis that F (m) holds for all m < n, then this is as good as
having proved (∀n ∈ ω)F (n).
(Hint. Assume (∀n ∈ ω)((∀m < n ∈ ω)F(m) → F(n)) to prove (∀n ∈
ω)F(n). Consider the formula G(n) defined as (∀m < n ∈ ω)F(m),
and apply (ordinary) induction on n to prove that (∀n ∈ ω)G (n). Take
it from there.)
V.2. The “least” number principle over ω. Prove that every ∅ ≠ A ⊆ ω has
a minimal element, i.e., an n ∈ A such that for no m ∈ A is it possible to
have m < n. Do so without foundation, using instead course-of-values
induction.
V.3. Prove that the principle of induction over ω and the least number principle
are equivalent, i.e., each implies the other. Again, do so without using
foundation.
V.4. Redo the proof of Theorem V.1.21 (existence part) so that it would go
through even if trichotomy of ∈ over ω did not hold.
V.5. Prove that a set x is a natural number iff it satisfies (1) and (2) below:
(1) It and all its members are transitive.
(2) It and all its members are successors or ∅.
V.6. Prove that for all m, n, i in ω, m + (n + i) = (m + n) + i.
V.7. Redo the proof of V.1.24 (commutativity of natural number addition) by
a single induction, relying on the associativity of addition.
V.8. Prove that for all m, n in ω, m < n implies m + 1 ≤ n (recall that ≤ on ω is
the same as ⊆).
V.9. Prove that for all m, n in ω, m < n implies m + 1 < n + 1.
V.10. Show by an appropriate example that if f, g are finite sequences, then
f ∗ g ≠ g ∗ f in general.
m·0 = 0
m · (n + 1) = m · n + m
Prove:
(1) · is associative.
(2) · is commutative.
(3) · distributes over +, i.e., for all m, n, k, (m + n) · k = (m · k) + (n · k).
V.12. Prove that m + n < (m + 1) · (n + 1) for all m, n in ω.
V.13. Let P be both symmetric and antisymmetric. Show that P ⊆ ∆A , where
A is the field of P. Conclude that P is transitive.
V.14. In view of the previous problem, explore the patterns of independence be-
tween reflexivity, symmetry, antisymmetry, transitivity, and irreflexivity.
V.15. Prove that for any relation P, (P⁻¹)⁻¹ = P.
V.16. Let R : A → A be a relation (set). Define
P = {S ⊆ A × A : R ⊆ S ∧ S is reflexive}
Q = {S ⊆ A × A : R ⊆ S ∧ S is symmetric}
T = {S ⊆ A × A : R ⊆ S ∧ S is transitive}
Show that
r(R) = ⋂P
s(R) = ⋂Q
t(R) = ⋂T
R + = R n−1
Order
This chapter contains concepts that are fundamental for the further development
of set theory, such as well-orderings and ordinals. The latter constitute the
skeleton of set theory, as they formalize the intuitive concept of “stages” and,
among other things, enable us to make transfinite constructions formally (such
as the construction of the universe of sets and atoms, U M , and the constructible
universe, L M ).
VI.1.2 Remark.
(1) The symbol < will be used to denote any unspecified order P, and it will be
pronounced “less than”. It is hoped that the context will not allow confusion
with the concrete < on numbers (say, on the reals).
(2) If the field of the order < is a subclass of A, then we say that < is an order
on A.
(3) Clearly, for any order < and any class B, < ∩ (B × B) – or < | B – is an
order on B.
VI.1. PO Classes, LO Classes, and WO Classes 285
VI.1.3 Example (Informal). The concrete “less than”, <, on N is an order, but
≤ is not (it is not irreflexive). The “greater than” relation, >, on N is also an
order, but ≥ is not.
In general, it is trivial to verify that P is an order iff P−1 is an order.
VI.1.5 Example. The relation ∈ (strictly speaking, the relation defined by the
formula x ∈ y – see III.11.2) is irreflexive by the foundation axiom. It is not
transitive, though. For example, if a is a set (or atom), then a ∈ {a} ∈ {{a}} but
a ∉ {{a}}.
Let A = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}. The relation ε = ∈ ∩ (A × A)
is transitive and irreflexive; hence it is an order (on A).
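The claims of VI.1.5 can be confirmed for the concrete four-element set A, and for the three sets a, {a}, {{a}} (taking a = ∅). The sketch is a Python illustration under our own conventions: the members of A are modelled as nested `frozenset`s (they are the numerals 0, 1, 2, 3), a pair (x, y) records x ∈ y, and `numeral` is a hypothetical helper. Irreflexivity and transitivity do not depend on which way the pairs are oriented.

```python
def numeral(k):
    """Von Neumann numeral: 0 = ∅ and k + 1 = k ∪ {k}."""
    n = frozenset()
    for _ in range(k):
        n = n | frozenset([n])
    return n

A = {numeral(k) for k in range(4)}               # the set A of VI.1.5
eps = {(x, y) for x in A for y in A if x in y}   # ∈ ∩ (A × A)
irreflexive = all(x != y for (x, y) in eps)
transitive = all((x, w) in eps for (x, y) in eps for (z, w) in eps if y == z)

a = frozenset()
B = {a, frozenset([a]), frozenset([frozenset([a])])}   # a, {a}, {{a}}
eps_B = {(x, y) for x in B for y in B if x in y}
B_transitive = all((x, w) in eps_B for (x, y) in eps_B for (z, w) in eps_B if y == z)
print(irreflexive, transitive, B_transitive)
```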
VI.1.7 Definition. Let < be a partial order on A. We use the abbreviation ≤ for
$r_A(<) = \Delta_A \cup {<}$, where $\Delta_A = \{\langle x, x\rangle : x \in A\}$. We pronounce ≤ “less than or equal”. $r_A(>)$, i.e., $r_A(<^{-1})$,
is denoted by ≥ and is pronounced “greater than or equal”.
Proof. Since
$P - \Delta_A \subseteq P$   (1)
it is clear that $P - \Delta_A$ is on A. It is also clear that it is irreflexive. We need only
verify that it is transitive.
So let
$\langle x, y\rangle$ and $\langle y, z\rangle$ be in $P - \Delta_A$   (2)
By (1),
$\langle x, y\rangle$ and $\langle y, z\rangle$ are in $P$   (3)
Hence
$\langle x, z\rangle \in P$
by transitivity of P.
Can ⟨x, z⟩ ∈ $\Delta_A$, i.e., can x = z? No, for antisymmetry of P and (3) would
imply x = y, i.e., ⟨x, y⟩ ∈ $\Delta_A$, contrary to (2).
So ⟨x, z⟩ ∈ $P - \Delta_A$.
VI.1.12 Example (Informal). Consider the order ⊂ once more. In this case
we have none of {∅} ⊂ {{∅}}, {{∅}} ⊂ {∅} or {{∅}} = {∅}. That is, {∅} and {{∅}}
are non-comparable items. This justifies the qualification partial for orders in
general (Definition VI.1.1).
On the other hand, the “natural” < on N is such that one of x = y, x < y,
y < x always holds for any x, y (trichotomy). That is, all (unordered) pairs
{x, y} of N are comparable under <. This is a concrete example of a total order.
Another example is ∈ on ω (V.1.20).
While all orders are partial orders, some are total (< above) and others are
nontotal (⊂ above).
† Formally, (A, <) is not an ordered pair . . ., for A may be a proper class. We may think then
of “(A, <)” as informal notation that simply “ties” A and < together. If we were absolutely
determined to, then we could introduce pairing with proper classes as components, for example
as (A, B) = (A × {0}) ∪ (B × {1}). For our part we will have no use for such pair types and will
consider (A, <) in the informal sense.
† Plural of minimum.
Proof. The false proof of the previous example is valid under the present cir-
cumstances.
The following type of relation has crucial importance for set theory, and
mathematics in general.
† As in the case where we wrote “<” for “P” (VI.1.15), the symbol “|” is taken to have higher
priority than “a” and “↑”; thus “P | Aa ↑” means “(P | A)a ↑”.
290 VI. Order
relation P – and proclaimed that “P has MC” – just in case we have proved the
schema†
Correspondingly, the phrase “let P have MC” is argot for the phrase “add the
axiom schema (1)”.
In the present connection, if we set A = {x : A[x]}, schema (1) translates
into
$(\exists x)A[x] \to (\exists x)\big(A[x] \wedge \neg(\exists y)(y \mathrel{P} x \wedge A[y])\big)$   (2)
and each specific formula A provides an instance (see also VI.1.25 below).
The reader will immediately note that (2) generalizes the foundation schema:
Foundation is just the formal translation of the phrase “the (relation) ∈ – i.e.,
{⟨x, y⟩ : y ∈ x} where the “∈” in “{. . .}” is the nonlogical predicate of L_Set –
has MC”.
This discussion is also meant to caution that the casualness of Definition
VI.1.22 does not hide between the lines quantification over a class term (A) –
a thing we are not allowed to do.
Clearly, ∅ has MC. So, every relation can be “cut down” to a point that it
has MC (if necessary, cut it down all the way to ∅). One interesting way to cut
down a relation is by the process of restriction.
We say that P has MC over A just in case P | A has MC.
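On a finite set, “P has MC over A” can be tested directly by checking every nonempty B ⊆ A. In this sketch (our illustration, with hypothetical helper names) a relation is passed as a Python predicate prec(y, x), read as “y P x”. This avoids committing to how pairs are stored. The sample relation on {0, 1, 2} lacks MC, yet its restriction to {0, 2} has it, which illustrates “cutting down” a relation by restriction.

```python
from itertools import combinations

def nonempty_subsets(A):
    items = list(A)
    for k in range(1, len(items) + 1):
        for B in combinations(items, k):
            yield set(B)

def minimal_elements(B, prec):
    """x is P-minimal in B iff there is no y in B with y P x."""
    return {x for x in B if not any(prec(y, x) for y in B)}

def has_mc_over(A, prec):
    """True iff every nonempty subset of the finite set A has a P-minimal element."""
    return all(minimal_elements(B, prec) for B in nonempty_subsets(A))

def cyclic(y, x):
    return y == (x + 1) % 3            # 1 P 0, 2 P 1, 0 P 2: a 3-cycle

print(has_mc_over({0, 1, 2}, cyclic))  # False: {0, 1, 2} has no minimal element
print(has_mc_over({0, 2}, cyclic))     # True: the restriction to {0, 2} has MC
```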
The term “PO set” (also “poset”) is standard. “LO set” is not much in cir-
culation, but “WO set” has occurred elsewhere (Jech (1978b)). By analogy we
have introduced the (non-standard) nomenclature PO class, LO class, and WO
class.
† Here x ∈ A; thus P | Ax = A ∩ Px – see VI.1.21. A provable schema is, of course, one such
that all its instances are (here, ZFC) theorems.
‡ Where “∅ ≠ B ⊆ A” is short for “∅ ≠ B ∧ B ⊆ A”, i.e., utilizing the connectives “≠” and “⊆”
conjunctionally.
Case B ∩ A = ∅. Then
x ∈ B → B ∩ A ∩ Px = ∅
Therefore
x ∈ B → B ∩ (P | Ax) = ∅
by VI.1.21, which yields (2) by ∃-monotonicity (I.4.23) and modus
ponens.
Case B ∩ A ≠ ∅. By (1),
$(\exists x \in B \cap A)\,(B \cap A) \cap Px = \emptyset$   (3)
since B ∩ A ⊆ A. Therefore (2) is deduced again, since (B ∩ A) ∩ Px =
B ∩ (P | Ax) and the quantification in (3) can be changed to (∃x ∈ B).
Thus under both cases, the assumption ∅ ≠ B yields (2), i.e., P has MC
over A.
Proof. (a) Add schema (1) above, and prove schema (4): Fix F , and let B =
A ∩ {x : F [x]}. We add
(∃x ∈ A)F [x] (hypothesis of (4)) (5)
(6) yields
(∃x ∈ B)¬(∃y)(y P x ∧ y ∈ B)
which in turn yields
$(\exists x \in A)\big(F[x] \wedge \neg(\exists y \in A)(y \mathrel{P} x \wedge F[y])\big)$
This concludes the proof of (4).
(b) Conversely, assume (4) and prove (1): So let ∅ ≠ B ⊆ A for fixed B. The
class B is “given” by a class term {x : B [x]}, so that the assumption yields
(∃x ∈ A)B [x]
By (4) we get
$(\exists x \in A)\big(B[x] \wedge \neg(\exists y \in A)(y \mathrel{P} x \wedge B[y])\big)$
which, in terms of B, reads
(∃x ∈ A ∩ B)A ∩ Px ∩ B = ∅
which, in view of B ⊆ A, yields exactly what we want:
(∃x ∈ B)B ∩ Px = ∅
VI.1.27 Example (Informal). (N, <), where < is the natural order, is a WO
set. (Z, <) is not. Define next ≺ on $N^{n+1}$ by
$\vec{x}_{n+1} \prec \vec{y}_{n+1}$ iff $x_1 < y_1 \wedge x_i = y_i$ for $i = 2, \ldots, n+1$
where “<” denotes the natural order on N. Then ($N^{n+1}$, ≺) is a PO set (but not
a LO set) with MC.
Indeed, ≺ is irreflexive and transitive ($\vec{x}_{n+1} \prec \vec{y}_{n+1} \prec \vec{z}_{n+1}$ means $x_1 < y_1 < z_1$
and $x_i = y_i = z_i$ for $i = 2, \ldots, n+1$; hence $\vec{x}_{n+1} \prec \vec{z}_{n+1}$); therefore
it is an order. Note that $\vec{x}_{n+1}$ and $\vec{y}_{n+1}$ are non-comparable if $x_2 \ne y_2$.
For any ∅ ≠ B ⊆ $N^{n+1}$, the minimal elements of B are the (n + 1)-tuples of B whose
first component is minimum among the members of B that agree with them in
components 2 through n + 1.
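For n = 2 the order ≺ and the minimal elements of a finite B ⊆ N³ can be computed directly. The sample B below is made up for illustration. It shows that minimality is decided separately for each choice of the components 2 through n + 1: (3, 5, 5) is minimal although its first component is not the least in B.

```python
def prec(y, x):
    """y ≺ x iff y_1 < x_1 and y_i = x_i for the remaining i (tuples are 0-indexed here)."""
    return y[0] < x[0] and y[1:] == x[1:]

def minimal_elements(B, prec):
    """x is minimal in B iff no y in B satisfies y ≺ x."""
    return {x for x in B if not any(prec(y, x) for y in B)}

B = {(0, 1, 1), (2, 1, 1), (3, 5, 5), (4, 5, 5), (7, 0, 2)}
print(sorted(minimal_elements(B, prec)))   # [(0, 1, 1), (3, 5, 5), (7, 0, 2)]

# Tuples with different tails are non-comparable under ≺
print(prec((0, 1, 1), (3, 5, 5)), prec((3, 5, 5), (0, 1, 1)))   # False False
```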
VI.2. Induction and Inductive Definitions 293
or, in English (again invoking the deduction theorem with the usual restrictions):
If P has IC, then to prove (∀x)A[x] it suffices to prove A[x] with the help
of an additional “axiom” (induction hypothesis): that (∀y)(y P x → A[y]) –
with all the free variables of this “axiom” frozen.
An elegant way to say the same thing is that “the property A propagates
with P” in the sense that if all the (“values” of ) y that are “predecessors” of x –
i.e., y P x – have the property, then so does x.†
If P is an order, then (1) will be immediately recognized in form. It gener-
alizes the well-known principle of course-of-values induction‡ over N.
One can easily verify that ∅ has IC.§ As in the case of the MC property, a relation
can be “cut down” until it acquires IC. In particular, this may come about by
the process of restriction.
VI.2.3 Definition. We say that P has IC over A just in case P | A has IC.
Proof. Only-if part. Assume that P has IC over A, i.e., add the following schema:
(∀x)(P | Ax ⊆ D → x ∈ D) → (∀x)x ∈ D (3)
To prove (2), fix B and add
(∀x ∈ A)(A ∩ Px ⊆ B → x ∈ B)
that is,
$(\forall x)\big(x \notin A \vee (A \cap Px \subseteq B \to x \in B)\big)$
or, provably equivalently,¶
$(\forall x)(P \mid A\,x \subseteq B \to x \in B \cup \overline{A})$   (4)
† Here we are just trying to employ some visually suggestive nomenclature, thus we are forgetting
the “reality” that y is an “output” of P on “input” x and thus, in the intuition of cause and effect,
it comes after x. We are simply concentrating on the visual effect: y appears to the left of x in
the expression y P x.
‡ This is the name of the induction over N that takes the I.H. on 0, . . . , n − 1 – rather than just on
n − 1 – in order to help the case for n. We have encountered this in a formal setting in Peano
arithmetic in volume 1, Chapter II. See also Exercise V.1.
§ This is hardly surprising, in view of VI.1.23 and VI.2.11 below.
¶ $\overline{A} = U_M - A$.
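Since $P \mid A\,x \subseteq A$, we have $P \mid A\,x \subseteq B$ iff $P \mid A\,x \subseteq B \cup \overline{A}$; hence (4) is provably
equivalent to
$(\forall x)(P \mid A\,x \subseteq B \cup \overline{A} \to x \in B \cup \overline{A})$
which, by (3), proves $(\forall x)\,x \in B \cup \overline{A}$, that is, (∀x ∈ A)x ∈ B. This estab-
lishes (2).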
If part. Assume (2), fix B, and calculate:
$(\forall x)(P \mid A\,x \subseteq B \to x \in B)$
↔ tautological equivalence and Leibniz rule
$(\forall x)\big((x \in A \to (P \mid A\,x \subseteq B \to x \in B)) \wedge (x \notin A \to (P \mid A\,x \subseteq B \to x \in B))\big)$
↔ distributing ∀ over ∧, VI.1.21, and simplifying using Leibniz rule
$(\forall x \in A)(A \cap Px \subseteq B \to x \in B) \wedge (\forall x)(x \notin A \to x \in B)$
→ using (2)
$(\forall x)(x \in A \to x \in B) \wedge (\forall x)(x \notin A \to x \in B)$
↔ distributing ∀ over ∧
$(\forall x)\big((x \in A \to x \in B) \wedge (x \notin A \to x \in B)\big)$
↔ tautological equivalence and Leibniz rule
$(\forall x)\,x \in B$
VI.2.6 Remark. In the above corollary (∀y ∈ A)(y P x → F [y]) is, of course,
the I.H. The following formula is the induction step:
(∀y ∈ A)(y P x → F [y]) → F [x] (a)
What happened to our familiar (from “ordinary” induction over N, or ω) basis
step? The answer is that to prove the induction step (a) with x free entails that the
proof must be valid, in particular, for all the P-minimal elements of A, if any.†
Now, when considering the case where x is P-minimal in A, (a) is prov-
ably equivalent to F [x] – which is a‡ basis case for x: Instead of proving (a),
prove F [x].
Indeed, that F [x] implies (a) is trivial. Conversely, since y ∈ A ∧ y P x is
refutable for an x that we have assumed to be P-minimal,
$(\forall y)\big(y \in A \wedge y \mathrel{P} x \to F[y]\big)$
is provable, so that, if (a) is, so is F [x] by modus ponens.
† It turns out that P has MC over A, so that A does have minimal elements – Theorem VI.2.11
below.
‡ “A” rather than “the”, since there may be many minimal elements.
Of course, the usual precautions that one takes when applying the deduction
theorem are taken.
Proof. (1): Consider the schema in VI.1.25 and the schema in VI.2.5. The
former schema is that of MC over A, while the latter is that of IC over A. It is
trivial to verify that an instance of any one of the two schemata realized with
a formula F is provably equivalent to the contrapositive of the instance of the
other realized with the formula ¬ F .
(2): Let instead f be an infinite descending P | A-chain. Then ∅ ≠ ran( f ) ⊆
A, and hence there is an a ∈ ran( f ) which is P | A-minimal. Now, a = f (n)
for some n ∈ ω, but f (n + 1)(P | A) f (n), contradicting the P | A-minimality
of a.
Proof. By VI.1.26.
VI.2.13 Corollary. Let A be a set. Then the following are provably equivalent:
(1) P has MC over A.
(2) P has IC over A.
(3) P is well-founded over A.
Proof. We only need to prove that (3) implies (1). So assume (3), and let (1)
fail. Let ∅ ≠ B ⊆ A be such that B has no P-minimal elements. Pick an a ∈ B.
Since it cannot be P-minimal, pick an a1 ∈ B such that a1 P a. Since a1 cannot
be P-minimal, pick an a2 ∈ B such that a2 P a1 .
This process can continue ad infinitum to yield an infinite descending
chain . . . a3 P a2 P a1 P a in A, contradicting (3).
This argument used AC, and more formally it goes like this: Let g be a choice
function for $\mathcal{P}(B) - \{\emptyset\}$.† Define f on ω by recursion as
$f(n) = \begin{cases} g(B) & \text{if } n = 0 \\ g\big(B \cap P f(n-1)\big) & \text{if } n > 0 \end{cases}$
f is total on ω, for B ∩ P f (n − 1) ≠ ∅ for all n > 0, by assumption (cf.
VI.1.24). By g(x) ∈ x for all x ∈ $\mathcal{P}(B) - \{\emptyset\}$, we have f (n) ∈ P f (n − 1),
i.e., f (n) P f (n − 1) for all n > 0; thus f is an infinite descending chain.
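The construction of f can be imitated on a finite B without P-minimal elements. In this sketch Python's min stands in for the choice function g. That is a hypothetical, deterministic choice that works here only because the sample elements are integers. The relation is again a predicate prec(y, x) read as “y P x”. On a finite B the “infinite” chain necessarily cycles.

```python
def descending_chain(B, prec, length, choose=min):
    """First `length` (>= 1) values of f, where f(0) = g(B) and f(n) = g(B ∩ P f(n-1)).

    `choose` plays the role of the choice function g; prec(y, x) means "y P x".
    Raises ValueError if some f(n) turns out to be P-minimal in B.
    """
    chain = [choose(B)]
    while len(chain) < length:
        predecessors = {y for y in B if prec(y, chain[-1])}   # B ∩ P f(n-1)
        if not predecessors:
            raise ValueError(f"{chain[-1]!r} is P-minimal in B")
        chain.append(choose(predecessors))
    return chain

def cyclic(y, x):
    return y == (x + 1) % 3      # 1 P 0, 2 P 1, 0 P 2: no element of {0, 1, 2} is P-minimal

f = descending_chain({0, 1, 2}, cyclic, 7)
print(f)                                                        # [0, 1, 2, 0, 1, 2, 0]
print(all(cyclic(f[n + 1], f[n]) for n in range(len(f) - 1)))  # True: f(n+1) P f(n)
```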
VI.2.14 Remark. The corollary goes through for any class A, not just a set A,
as we will establish later.
It is also noted that a weaker version of AC was used in the proof, the so-called
axiom of dependent choices, namely that “if P is a relation and B ≠ ∅ a
set such that (∀x ∈ B)(∃y ∈ B)y P x, then there is a total function f : ω → B
such that (∀n ∈ ω) f (n + 1) P f (n).”
We cannot sharpen the above to “P⁺ has MC (IC) over A”, for that means that
P⁺ | A has MC. The latter is not true, though: Let O be the odd natural numbers,
and R be defined on N by x R y iff x = y + 1; thus R⁺ = >.
Now, R has MC over O (for R | O = ∅), yet R⁺ does not, for R⁺ | O has an
infinite descending chain in O:
··· > 7 > 5 > 3 > 1
In particular, we note from this example that (P | A)⁺ ≠ P⁺ | A in general.
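The counterexample can be checked on the finite stand-in {0, …, 9} for N. The infinite chain itself cannot be materialized. In the sketch, pairs (x, y) record x R y, the transitive closure is computed by a naive fixed-point iteration, and `restrict` implements S | A = S ∩ (A × A). All names are ours.

```python
def transitive_closure(R):
    """Smallest transitive relation containing R (naive fixed-point iteration)."""
    closure = set(R)
    while True:
        new = {(x, w) for (x, y) in closure for (z, w) in closure if y == z} - closure
        if not new:
            return closure
        closure |= new

def restrict(S, A):
    """S | A = S ∩ (A × A)."""
    return {(x, y) for (x, y) in S if x in A and y in A}

N = range(10)
R = {(x, y) for x in N for y in N if x == y + 1}        # x R y iff x = y + 1
O = {n for n in N if n % 2 == 1}                        # odd numbers below 10

print(restrict(R, O))                                   # set(): R | O is empty
print(transitive_closure(restrict(R, O)))               # set(): so (R | O)+ is empty too
print(sorted(restrict(transitive_closure(R), O))[:3])   # [(3, 1), (5, 1), (5, 3)]
```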
VI.2.19 Example. We already know that the axiom of foundation yields that
∈ has MC. Therefore properties of sets can be proved by ∈-induction over U M .
VI.2.20 Example (Double Induction over ω). (See also Chapter V, p. 248.)
We often want to prove
(∀m)(∀n)F (m, n) (1)
for some formula F and m, n ranging over ω. The obvious approach, which
often works, is to do induction on, say, m only, treating n as a “parameter”. That
is (assuming the problem can be handled by “simple” induction):
(i) Prove (∀n)F (0, n).
(ii) For m ≥ 0 prove (∀n)F (m + 1, n) from the I.H. (∀n)F (m, n).
Sometimes steps (i) and/or (ii) are not easy, and can be helped by induction on
n, that is:
(iii) Prove F (0, 0).
(iv) For n ≥ 0 prove F (0, n + 1) from the I.H. (on n) F (0, n),
which settles (i) by induction on n, and then
(v) For m ≥ 0 prove F (m + 1, 0), from the I.H. of (ii) above.
(vi) For m ≥ 0, n ≥ 0 prove F (m + 1, n + 1) from the assumptions
(a) I.H. on n, namely, F (m + 1, n), and
(b) I.H. on m ((ii) above).
Let us revisit the above “cascaded” induction from a different point of view.
Define ≺ on ω × ω by
$\langle a, b\rangle \prec \langle c, d\rangle$ iff $c = a + 1 \vee (a = c \wedge d = b + 1)$
Finally prove F (m, n) from F (m, n−1) and (∀n)F (m−1, n) – this is step (vi).
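Here is one way to see that ≺ has MC. This is our observation, not necessarily the book's argument. Every ≺-step decreases a pair in the lexicographic order of ω × ω, which is a well-order, so the lexicographically least member of a nonempty B is ≺-minimal. The sketch checks that inclusion on a finite grid, relying on Python's lexicographic comparison of tuples. It also lists the ≺-predecessors of ⟨2, 3⟩ inside the grid. These are exactly the pairs covered by the two induction hypotheses of step (vi).

```python
def prec(p, q):
    """<a, b> ≺ <c, d> iff c = a + 1, or a = c and d = b + 1."""
    (a, b), (c, d) = p, q
    return c == a + 1 or (a == c and d == b + 1)

SIZE = 6
grid = [(m, n) for m in range(SIZE) for n in range(SIZE)]

# Every ≺-step is a step down in the lexicographic order (Python tuple comparison).
print(all(p < q for p in grid for q in grid if prec(p, q)))   # True

# Predecessors of <2, 3>: all <1, b> (I.H. on m) and <2, 2> (I.H. on n)
print(sorted(p for p in grid if prec(p, (2, 3))))
# [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2)]
```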
VI.2.21 Example. It is clear now that since sets such as ω − {0}, N ∪ {−3, −2,
−1} are well-ordered (by <), we can carry induction proofs over them. In the
former case the “basis” case is at 1; in the latter case it is at −3.
Proof. The proof is essentially the same as that for recursive definitions over
an inductively defined set that we carried out in the metatheory in I.2.13. See
also the proof of V.1.21.
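We then simply name the formula below “M( f, a)”, and let K = { f :
(∃a) M( f, a)}.
$\neg U(f) \wedge A(a) \wedge (\forall z)\big(z \in f \to OP(z) \wedge \pi(z) \le a \wedge X(\delta(z))\big)$
$\quad \wedge\ (\forall x)(\forall y)(\forall z)(z \mathrel{f} x \wedge y \mathrel{f} x \to y = z)$   (2′)
$\quad \wedge\ (\forall x \in {\le}a)(\forall y)\big(y \mathrel{f} x \leftrightarrow \mathcal{G}(x, \{\langle u, v\rangle : v \mathrel{f} u \wedge u < x\}, y)\big)$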
Since the uniqueness argument above does not depend on the particular left
field A, but only on the fact that < has IC over A, the same proof of uniqueness
applies to the case that the left field is ≤ a (a subset of A),† showing that
We have at once‡
because ≤x ⊆ ≤a ∩ ≤b by transitivity;§ hence f ↾ ≤x = g ↾ ≤x by (3).
Now
F = ⋃K is a function F : A → X (4)
for
f (x) = y ∧ M( f, a) (5)
and
F(a) ↑ (10)
We define h = F ↾ <a (a renaming of convenience), which is a function.
By minimality of a, the function F, and hence h, satisfy the recurrence (1) on
<a, that is
$(\forall x \in {<}a)\, h(x) \simeq G(x, h \restriction {<}x)$   (11)
The function f = h ∪ {⟨a, b⟩} satisfies (∀x ∈ ≤a) f (x) ≃ G(x, f ↾ <x),
because of (9). Hence f ⊆ F by (4). Now, f (a) = b contradicts (10).
VI.2.26 Remark. (1) Pretending that the above proof took place in the metathe-
ory, one can view it as “constructively” demonstrating the “existence” of a class
F with the stated properties. Formally, we cannot quantify over classes. Thus,
to prove “(∀A) . . . A . . . ” one proves the schema “. . . A . . .” for the arbitrary
A (that “defines” A = {x : A }). To prove “(∃A). . . A . . .” one must exhibit a
specific formula A (that gives rise to A as above) for which we can prove (the
formal translation of) “. . . A . . .”.
In particular, what we really did in the above proof were two things:
(a) We stated a formula $\mathcal{F}(x, y)$, displayed below, that was built from given
formulas:
$(\exists f)(\exists a)\big(M(f, a) \wedge \langle x, y\rangle \in f\big)$
where M is given by (2′). We then proved the theorems
$\mathcal{F}(x, y) \wedge \mathcal{F}(x, y') \to y = y'$   (∗)
† Clearly, by the direction already proved, F(a) ↓ is incompatible with the failure of (8).
and (1) of the theorem, using in the latter case the abbreviation F(x) = y for
$\mathcal{F}(x, y)$. This was our “existence” proof.
(b) The uniqueness part showed that our “solution” F is unique up to provable
equivalence: Any other formula $\mathcal{H}$ that is functional (i.e., satisfies (∗) above with
$\mathcal{F}$ replaced by $\mathcal{H}$) and “solves” (1) is provably equivalent to $\mathcal{F}$:
$A(x) \to \big(\mathcal{F}(x, y) \leftrightarrow \mathcal{H}(x, y)\big)$
The above discussion makes it clear that using class terminology and notation
was a good idea.
(2) The recursion on the natural numbers (V.1.21) is a special case of VI.2.25:
Indeed,
f (0) = a
for n ≥ 0, f (n + 1) = g(n, f (n))
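can be rewritten as
$(\forall n \in \omega)\, f(n) \simeq G(n, f \restriction n)$
where
$G(n, h) = \begin{cases} a & \text{if } n = 0 \\ g\big(n - 1, h(n - 1)\big) & \text{if } h \text{ is a function} \wedge \mathrm{dom}(h) = n > 0 \\ \uparrow & \text{otherwise} \end{cases}$
† $\mathcal{G}$ is obtained from G as in III.11.20. Conversely, starting with $G = \{\langle x, y, z\rangle : \mathcal{G}(x, y, z)\}$, a
function, we have $\mathcal{G}(x, y, z) \to \mathcal{G}(x, y, z') \to z = z'$. We can then introduce G by
$G(x, y) = z \leftrightarrow \mathcal{G}(x, y, z)$.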
Note that G on ω × U M is nontotal. In particular, if the second argument is not
of the correct type (middle case above), G will be undefined. We can still prove
that f (n) ↓ for all n ∈ ω, without using V.1.21.
Assume the claim for m < n (I.H.). For n = 0, we have f (0) ≃ G(0, ∅) = a,
defined. Let next n > 0. Now f (n) ≃ G(n, f ↾ n) and dom( f ↾ n) = n by I.H.;
hence f (n) = g(n − 1, ( f ↾ n)(n − 1)) = g(n − 1, f (n − 1)), defined, since g
is total.
(3) In view of the above, it is worth noting that a recursive definition à
la VI.2.25 can still define a total function, even if G is nontotal.
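The argument of item (2) can be replayed numerically. The sketch transcribes the three-case G, with Python's None standing for “undefined” and dicts for functions with domain n. It builds f by f(n) ≃ G(n, f ↾ n) and compares the result with the primitive-recursive form. The sample a = 0 and g(n, v) = v + n + 1 are arbitrary choices of ours, and they give the triangular numbers.

```python
def G(n, h, a, g):
    """a if n = 0; g(n - 1, h(n - 1)) if h is a function with dom(h) = n > 0;
    undefined (None) otherwise."""
    if n == 0:
        return a
    if isinstance(h, dict) and set(h) == set(range(n)):
        return g(n - 1, h[n - 1])
    return None

def by_history(count, a, g):
    """f(n) ≃ G(n, f ↾ n) for n < count; when f(n) is computed, f has domain exactly n."""
    f = {}
    for n in range(count):
        value = G(n, dict(f), a, g)
        if value is None:
            raise ValueError(f"f({n}) is undefined")
        f[n] = value
    return f

def by_primitive_recursion(count, a, g):
    """f(0) = a and f(n + 1) = g(n, f(n)), for n < count."""
    f = {}
    for n in range(count):
        f[n] = a if n == 0 else g(n - 1, f[n - 1])
    return f

def step(n, v):
    return v + n + 1

print(by_history(6, 0, step))      # {0: 0, 1: 1, 2: 3, 3: 6, 4: 10, 5: 15}
print(G(2, {0: 0}, 0, step))       # None: dom(h) = {0} is not 2
```

The explicit `is None` test matters: a legitimate value such as 0 must not be mistaken for “undefined”.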
Proof. Define $\widetilde{G} : A \times U_M \to X$ by
$\widetilde{G}(a, f) = \begin{cases} \emptyset & \text{if } f \text{ is not a function} \\ G(a, f \restriction Pa) & \text{otherwise} \end{cases}$
Let < stand for P⁺. Now < is an order on A with IC by VI.2.16. Moreover it
is left-narrow by the axiom of union, since (V.2.24)
$P^+a = \bigcup\{P^n a : n \in \omega - \{0\}\}$
and an easy induction on n shows that each $P^n a$ is a set (Exercise VI.2). Thus,
by VI.2.25, there is a unique F : A → X such that
$(\forall a \in A)\, F(a) \simeq \widetilde{G}(a, F \restriction {<}a) \simeq G\big(a, (F \restriction {<}a) \restriction Pa\big)$   (1)
Now, Pa ⊆ <a yields (F ↾ <a) ↾ Pa = F ↾ Pa; hence (1) becomes
$(\forall a \in A)\, F(a) \simeq G(a, F \restriction Pa)$
But the right hand side of ≃ is defined for all a ∈ A; thus we can use “=”
instead of “≃” in the statement of VI.2.27.
$\langle u, a\rangle \mathrel{\widetilde{P}} \langle v, b\rangle$ iff $u = v \wedge a \mathrel{P} b$
It is clear that $\widetilde{P}$ has MC. Now, (1) can be rewritten as
† “(∀⟨s, a⟩ ∈ S × A)” is argot for “(∀z)(OP(z) ∧ π(z) ∈ S ∧ δ(z) ∈ A → . . .)”, or, simply,
“(∀s ∈ S)(∀a ∈ A)”.
VI.2.30 Corollary (Recursive Definition with Parameters II). Let all as-
sumptions be as in Corollary VI.2.29, except that the recurrence now reads
$(\forall \langle s, a\rangle \in S \times A)\, F(s, a) \simeq G\big(s, a, \{\langle x, F(s, x)\rangle : x \mathrel{P} a\}\big)$   (1)
Then there exists a unique function F : S × A → X satisfying (1).
“Pure recursion” refers to the fact that G has only one argument, the “history”
of F on the segment < a.
Proof. In view of Theorem VI.2.25, we need only prove (2). So let dom(F) ≠ A.
Let a in A be <-minimal (also minimum here, since < is total) such that†
$F(a)\uparrow$, i.e., $G(F \restriction {<}a)\uparrow$   (3)
Thus <a ⊆ dom(F). We will prove that dom(F) = <a. Well, let instead
b ∈ dom(F) − <a be minimal.‡
By (3) and totalness of <, we have a < b. By choice of b,
$(\forall x)(a \le x \wedge x < b \to F(x)\uparrow)$
Thus,
$F \restriction {<}b = F \restriction {<}a$   (4)
Therefore
Therefore
yields the function F = {⟨1, 1⟩}, whose domain is neither 2 nor a segment of
2. Thus the requirement of pure recursion in VI.2.31 is essential.†
† We tacitly took advantage of the purity of recursion in the last step of the proof of VI.2.31. Imagine
what would happen if F’s argument were explicitly present in G: We would get G(b, F ↾ <b) ≃
G(b, F ↾ <a), a dead end, since what we have is G(a, F ↾ <a) ↑, not G(b, F ↾ <a) ↑.
VI.2.35 Definition (Pure Sets). A set with empty support is called a pure
set.
C( p, x) = {C( p, y) : y ∈ p ∩ x} = {C( p, y) : y ∈ p ∩ x′} = C( p, x′)
In other words, C “collapses” any two sets x and x′ if their (possible) differ-
ences cannot be witnessed inside p. That is, an inhabitant of p, aware only of
members of p but of nothing outside p, cannot tell x and x′ apart on the basis
C( p, #) = #
C( p, !) = !
C( p, ?) = ?
C( p, @) = @
C( p, x) = {C( p, y) : y ∈ p ∩ x}
= {C( p, y) : y = # ∨ y = ?} = {#, ?}
= x
while
C( p, x′) = {C( p, y) : y ∈ p ∩ x′}
= {C( p, y) : y = # ∨ y = ?} = {#, ?}
= x
and
C( p, p) = {C( p, y) : y ∈ p}
= {C( p, y) : y = # ∨ y = ! ∨ y = ? ∨ y = x ∨ y = x′}
= {#, !, ?, {#, ?}}
Note that – in the place of the two original x and x′ of p – C( p, p) (the “collapsed
p”) only contains the common collapsed element, the set C( p, x) (= C( p, x′)).
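The collapse is easy to compute when p is hereditarily finite. The sketch uses a model of our own choosing: atoms are Python strings and sets are frozensets. C(p, x) returns an atom unchanged and otherwise collapses the members of x that lie in p. The sets x and x′ are illustrative stand-ins that share the p-members # and ? and differ only outside p. They are not claimed to be the sets of the example above.

```python
def collapse(p, x):
    """C(p, x) = x for an atom x; otherwise {C(p, y) : y ∈ p ∩ x}."""
    if isinstance(x, str):                  # atoms are modelled as strings
        return x
    return frozenset(collapse(p, y) for y in x if y in p)

def is_transitive_set(s):
    """Every set-member of s is a subset of s (atoms impose no condition)."""
    return all(isinstance(b, str) or b <= s for b in s)

x = frozenset({"#", "?", "@"})              # "@" is not in p
x_prime = frozenset({"#", "?", "%"})        # "%" is not in p either
p = frozenset({"#", "!", "?", x, x_prime})

print(collapse(p, x) == collapse(p, x_prime))                   # True: p cannot tell them apart
cp = collapse(p, p)
print(cp == frozenset({"#", "!", "?", frozenset({"#", "?"})}))  # True
print(is_transitive_set(cp), is_transitive_set(p))              # True False
```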
Moreover, we note that the C( p, p) that we have just computed is transitive.
This is not a coincidence with the present p, but holds for all p:
Indeed, if C( p, p) is not an atom (case where p is an urelement), then it is a
transitive set (why set?). To verify, let next p be a set and (using conjunctional
notation)
a ∈ b ∈ C( p, p) = {C( p, x) : x ∈ p} (2)
b = {C( p, y) : y ∈ p ∩ x}
VI.2.37 Remark. What lies behind the fact that C( p, p) is transitive, intuitively
speaking? Well, by “squeezing out” those elements of x (in p – such as @ above)
which do not help to establish the “identity” of x in p, we have left in x, in
essence, only those objects which (in squeezed form, of course) p “knows
about” (i.e., are its elements). The collapsed p (i.e., C( p, p)) has the hereditary
property: If x (set) is in it, then so are the members of x, and – repeating this
observation – so are the members of the members of x, and so on.
It turns out that if p is extensional to begin with, then by collapsing it, not only
do we turn it into a transitive set, but also, the new set C( p, p) is essentially the
same as p; its elements are obtained by a judicious renaming of the elements
of p, otherwise leaving the {}-structure of p intact.
† By our belief that ZFC is consistent – cf. II.4.5 – set universes exist by the completeness theorem
of Chapter I. However, this p cannot be one of them, for extensionality fails in it.
‡ Caution: Since p is (supposed to be) the “universe”, “(∀z)” here is short for “(∀z ∈ p)”.
† Such models exist by the Löwenheim-Skolem theorem of Chapter I, since the language of ZFC
is countable (granting that ZFC is consistent).
C( p, x) = {#, ?}
and
C( p, x′) = {#, !}
and
$(\forall a \in A)\, F_2(a) \simeq G_2\big(a, \{\langle x, F_1(x)\rangle : x \mathrel{P} a\}, \{\langle x, F_2(x)\rangle : x \mathrel{P} a\}\big)$   (2)
and
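$p_2(f) = \begin{cases} \uparrow & \text{if } f \text{ is not a class of } \langle x, y, z\rangle\text{-type entries} \\ \{\langle \pi(z), \delta(\delta(z))\rangle : z \in f\} & \text{otherwise} \end{cases}$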
and set
$G = \lambda x f.\big\langle G_1(x, p_1(f), p_2(f)),\ G_2(x, p_1(f), p_2(f))\big\rangle$
and
† Strictly speaking, order isomorphism in this case, since the concept of isomorphism extends to
other mathematical structures as well. The prefix “iso” in the term comes from the Greek word
ίσο, which means “equal” or “identical”.
VI.3. Comparing Orders 317
VI.3.2 Example (Informal). (N, <) and ({−2, −1} ∪ N, <) are order-
isomorphic PO sets, where the “<” is the standard order (they are also order-
isomorphic WO sets).
Indeed, if we let f : N → {−2, −1} ∪ N be λx.x − 2, then clearly f is a
1-1 correspondence and i < j iff f (i) < f ( j) for all i, j in N.
VI.3.3 Informal Definition. Let (A, S) and (B, T) be two PO classes. A 1-1
correspondence f : A → B is an order-isomorphism just in case
VI.3.4 Informal Definition. Let (A, S) and (B, T) be two PO classes, and
f : A → B be total. If, on the assumption that x ∈ A ∧ y ∈ A, the implication
x S y → f (x) T f (y) holds, then f is called order-preserving.
x ∈ A ∧ y ∈ A → (x S y → f (x) T f (y))
If we use the more natural notation <1 and <2 for S and T respectively, then
the above definition says that x <1 y → f (x) <2 f (y) is the condition for a
total f to be order-preserving.
VI.3.8 Proposition. Let (A, <1 ) be a LO class, (B, <2 ) be a PO class, and
f : A → B be order-preserving (see VI.3.5 for the interpretation of these
assumptions). Then
isomorphic copy of a subclass of B (here ran( f )), where, of course, this subclass
is equipped with the same order as B, namely <2 .
If ran( f ) = B, then the embedding is an isomorphism.
VI.3.11 Remark. We say that the order <2 on B is induced by f (and <1 ).
Proof. Assume the contrary, and let m be minimal in B = {x ∈ A : f (x) < x}.
Thus,
VI.3.14 Remark. Another way to see the reason for the above is to observe
that if for any a ∈ A
f (a) < a
holds, then
The following two corollaries use the notion of segment in their formulation
(see VI.2.24).
Proof. Say (A, <) is a WO class and f : A → <a is an isomorphism, where
a ∈ A. Then f (a) ∈ <a, that is, f (a) < a, a contradiction.
Proof. Let instead f (a) ≠ a for some a ∈ A. If a < f (a), then applying the
order-preserving f⁻¹ to both sides, we get f⁻¹(a) < a, contradicting VI.3.13.
For the same reason, the hypothesis f (a) < a is rejected outright.
VI.3.19 Corollary. If (A, <1 ) and (B, <2 ) are isomorphic WO classes, then
there is exactly one isomorphism f : A → B.
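For finite WO sets, Corollary VI.3.19 can be checked by brute force. The sketch tries every bijection and keeps those that satisfy x <₁ y iff f(x) <₂ f(y). For contrast, a two-element PO set with the empty order (not a WO set) has two such maps onto itself. Orders are passed as Python predicates, and the helper names are illustrative.

```python
from itertools import permutations

def order_isomorphisms(A, less_A, B, less_B):
    """All bijections f : A -> B with x <_A y iff f(x) <_B f(y), for finite A, B."""
    A, B = list(A), list(B)
    if len(A) != len(B):
        return []
    found = []
    for image in permutations(B):
        f = dict(zip(A, image))
        if all(less_A(x, y) == less_B(f[x], f[y]) for x in A for y in A):
            found.append(f)
    return found

def lt(x, y):
    return x < y

def never(x, y):
    return False        # the empty order: every two distinct elements are non-comparable

print(order_isomorphisms([0, 1, 2], lt, ["x", "y", "z"], lt))  # [{0: 'x', 1: 'y', 2: 'z'}]
print(len(order_isomorphisms([0, 1], never, [0, 1], never)))   # 2
```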
The next result shows, on one hand, that if two WO classes are not isomor-
phic, then one properly contains (an isomorphic copy of) the other, i.e., the
“smaller” of the two is embeddable in the “larger”. On the other hand, it shows
that every WO class has the structure of (i.e., is isomorphic to) a segment.
VI.3.20 Theorem. Let (A, <1 ) and (B, <2 ) be any WO classes and <1 be
left-narrow. Then exactly one of the following cases obtains:
(a) The two WO classes are isomorphic,
(b) (A, <1 ) is isomorphic to a segment of (B, <2 ),
(c) (B, <2 ) is isomorphic to a segment of (A, <1 ).
Proof. By VI.3.16 no two of the above three cases are possible at once. It
remains to prove the disjunction of (a)–(c). Intuitively, we start off by pairing
min(A) with min(B). Then we pair the “next larger” element of A with that of
B. We continue in this way until either we run out of elements from A and B
simultaneously, or deplete A first, or deplete B first (these cases correspond to
the ones enumerated (a)–(c) in the theorem).
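For finite WO sets the pairing just described can be run directly. In the sketch each WO set is given as a Python list in ascending order, which is an encoding chosen for illustration. F is built by F(x) ≃ min(B − ran(F ↾ <₁x)), and the case of the theorem is read off from which side runs out first.

```python
def compare_well_orders(A, B):
    """A, B: finite WO sets listed in ascending order.
    Returns (F, case): 'a' (isomorphic), 'b' (A is isomorphic to a segment of B)
    or 'c' (B is isomorphic to a segment of A, via the inverse of F)."""
    F = {}
    for x in A:                                   # visit A in its own order
        used = set(F.values())                    # ran(F restricted to <x)
        remaining = [y for y in B if y not in used]
        if not remaining:                         # F(x) undefined: B is exhausted
            break
        F[x] = remaining[0]                       # least element of B not yet used
    if len(F) == len(A) == len(B):
        return F, "a"
    if len(F) == len(A):
        return F, "b"
    return F, "c"

print(compare_well_orders(["p", "q"], [10, 20, 30]))   # ({'p': 10, 'q': 20}, 'b')
print(compare_well_orders([10, 20, 30], ["p", "q"]))   # ({10: 'p', 20: 'q'}, 'c')
```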
Formally now, if either of A or B is ∅, then the result is trivial. So let A ≠
∅ ≠ B, and apply the pure recursion (VI.2.31) to define the function F : A → B
by
$(\forall x \in A)\, F(x) \simeq \min\{y : y \in B - \mathrm{ran}(F \restriction {<_1}x)\}$   (1)
where c is a new constant. Let $b \in {<_1}x$ (another auxiliary constant) such that
F(b) = c. By (1),
By (1) again, $F(x) \notin \mathrm{ran}(F \restriction {<_1}x)$ (in particular, $y <_1 x \to F(y) \ne F(x)$,
i.e., F is 1-1); hence $F(x) \notin \mathrm{ran}(F \restriction {<_1}b)$ by ${<_1}b \subset {<_1}x$.
Thus
by (1), (4) and 1-1-ness of F (the last property sharpens “≤2 ” to “<2 ”). This
contradicts (3) since c = F(b). We have established (2).
By VI.2.31 we have one of
dom(F) = A (5)
or
c ∈ B − ran(F) (8)
c <2 x (9)
F(y) ≤2 c
Suppose now that (5) is the case. If also ran(F) = B, then we are done in this
case. If on the other hand ran(F) ≠ B, then ran(F) is a segment by the above,
so we are done in this case as well.
Suppose finally that (6) is the case. Thus ran(F) is either all of B or a segment,
<2 b.
We will retire the proof if we show this latter subcase to be untenable: Indeed,
the function F ∪ {⟨a, b⟩} properly extends F, still satisfying (1) –
VI.3.21 Remark. The above theorem can form the basis for the comparability
of ordinals of the next section. Alternatively, one can prove the comparability
of ordinals directly and derive VI.3.20 (for WO sets) as a corollary (VI.3.23
below). We will return to this remark in the next section.
VI.3.22 Exercise.
(i) If <2 is known not to be left-narrow (the statement of the theorem allows
either possibility), then how are cases (a)–(c) affected?
(ii) Suppose that <2 is left-narrow as well, and A and B are proper classes.
What now?
VI.4. Ordinals 323
VI.3.23 Corollary. Let (A, <1 ) and (B, <2 ) be any WO sets. Then exactly one
of the following cases obtains:
VI.4. Ordinals
Let (A, <) be a WO set, where A ≠ ∅. Let a₀ = min(A). If A − {a₀} ≠ ∅, then
let a₁ = min(A − {a₀}). In general, if A − {a₀, a₁, . . . , aₙ} ≠ ∅, then define
aₙ₊₁ to be min(A − {a₀, a₁, . . . , aₙ}).
Possibly, for some smallest n ∈ N, A − {a₀, a₁, . . . , aₙ} = ∅, and thus A =
{a₀, a₁, . . . , aₙ}, so that a₀ < a₁ < · · · < aₙ.
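For a finite WO set, the naming a₀, a₁, … above is a simple loop: repeatedly remove the minimum of what is left. In the sketch the order is a predicate less(x, y). The sample order, by word length, is an arbitrary illustration.

```python
def positions(A, less):
    """Return [a_0, a_1, ..., a_n] with a_0 = min(A) and a_(k+1) = min(A - {a_0, ..., a_k})."""
    remaining, named = set(A), []
    while remaining:
        # the minimum: the element with nothing smaller among those remaining
        m = next(x for x in remaining if not any(less(y, x) for y in remaining))
        named.append(m)
        remaining.remove(m)
    return named

def shorter(x, y):
    return len(x) < len(y)

print(positions({"pear", "fig", "banana"}, shorter))   # ['fig', 'pear', 'banana']
```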
Another possibility, when A is (intuitively†) infinite, is that we will need
exactly all the natural numbers in N in order to name the positions of the elements
of A in their (ascending) <-order; that is, A = {a₀, a₁, . . .} and a₀ < a₁ < · · · .
Is it possible that a WO set is so “long” that we will run out of position
names (from N) before we run out of positions in A? The answer (affirmative)
is straightforward:
† We are going to formalize the notions “finite” and “infinite” in Chapter VII.
These position names of WO set elements are the so-called ordinals (also
called ordinal numbers). They provide (among other things) an extension of
the position-naming apparatus that N is.
In order to eventually come up with a well-motivated formal definition
of ordinals, let us speculate a bit further on their nature. Extrapolating from
the discussion of Example VI.4.1, let us imagine a sequence of position
names 0, 1, . . . , ω, ω + 1, . . . , ω · 2, ω · 2 + 1, . . . , ω · 3, ω · 3 + 1, . . . of suffi-
cient length so that the elements of any WO set (A, <) can fit, in ascending
order (with respect to the WO set’s own “<”) contiguously from left to right in
named position slots (starting with the 0th position slot).
Once we have so fitted (A, <), let the ordinal α be the first unused position
name. This α characterizes the “form” or “type” of the WO set (A, <), in the
sense that if (B, <1) is another WO set such that (A, <) ≅ (B, <1), then the
elements of B, in view of
A : a₀ < a₁ < · · · < aᵧ . . .
B : b₀ < b₁ < · · · < bᵧ . . .
will occupy exactly the same positions as the A-elements, and thus, once again,
α will be the first unused position name.
Hence, a formal definition of ordinals must ensure that they are objects of set
theory associated with WO sets in such a way that the same ordinal corresponds
to each WO set in a class of pairwise isomorphic WO sets. That is, one looks
for a function & . . . & , defined on all WO sets, such that
&(A, <1)& = &(B, <2)& iff (as WO sets) (A, <1) ≅ (B, <2)
The range of & . . . & will be the class of all ordinals – which turns out to be a
proper class.
VI.4.2 Tentative Definition. (See Wilder (1963, p. 111).) The ordinal or ordinal
number of a WO set (A, <) is the class of all WO sets (B, <1) such that
(A, <) ≅ (B, <1).
VI.4.3 Remark. The reader can readily verify that ≅ is an equivalence relation
on the class of all WO sets. Thus, the above definition adopts $(A, <) \mapsto [(A, <)]_{\cong}$
(recall the notation introduced in V.4.3) as the function & . . . &.
It turns out that the equivalence classes $[\ldots]_{\cong}$ are too big to be sets
(Exercise VI.7), so that they are inappropriate as formal objects of the
theory.
The new definition gets around the difficulty mentioned in VI.4.3. However,
it creates a great sense of uncertainty with the indefinite (“an arbitrary represen-
tative”) manner in which an ordinal is “defined”. To conclude this discussion
that peeks into the history of the development of ordinals (mostly by Cantor),
let us try and fix the latest tentative definition (VI.4.4) so that we can appreciate
that the old-fashioned way of introducing ordinals could be made to work. We
will fix the definition and follow up some of its early consequences. Once this
is done, we will have on hand enough motivational ideas to start from scratch
with von Neumann’s modern definition. The reader will benefit from knowing
both points of view.
Warning. All these tentative definitions are informal and deal with metamath-
ematical concepts.
† We avoid the term “axiom”. The reason is explained in the commentary following the definition.
VI.4.7 Remark. All along, when we wrote (A, R) for a set A equipped with a
relation R ⊆ A × A, the symbol (. . . , . . .) was used informally, simply to remind
us of the two ingredients of the situation, namely A and R (see also the footnote
to VI.1.11).
In instances such as Tentative Definitions VI.4.2–VI.4.5, for example in uses
such as &(A, R)&, one would expect to use the formal ⟨A, R⟩ instead, so that the
“pair” of A and R is an object of the theory (a set). However, we will continue
using round brackets to denote PO sets, as we have previously agreed to do.
Ordinals will be denoted by lowercase Greek letters, in general. Notation
for specific ordinals may differ (see the following example).
VI.4.8 Example (Informal). What is &({0, 1}, <)&, where < is the standard
order (∈) on ω? According to VI.4.5, it is whichever WO set of exactly two
elements (say, ({a, b}, {⟨b, a⟩}) for some a ≠ b) strong AC will pick out of
the class $[(\{0, 1\}, <)]_{\cong}$. We naturally use a standard name, the symbol “2”, to
denote the ordinal of a WO set of two elements. This is summed up as
VI.4.10 Remark (Informal). Recall that α = (A, <1 ) and β = (B, <2 ) for
some appropriate A, B, <1 , <2 . Now, intuitively, (A, <1 ) can be embedded
into (B, <2 ) as a segment iff the sequence
B : b0 <2 b1 <2 · · ·
VI.4. Ordinals 327
Proof. Let α = (A, <1 ) and β = (B, <2 ). By VI.3.23, exactly one of the follow-
ing holds:
(a) (A, <1) ≅ (B, <2),
(b) (A, <1 ) is isomorphic to a segment of (B, <2 ),
(c) (B, <2 ) is isomorphic to a segment of (A, <1 ).
(b) and (c) say α < β and β < α, respectively, by VI.4.9.
By (a), both (A, <1 ) and (B, <2 ) are in the same equivalence class. Since
strong choice picks “deterministically” a unique representative from each equiv-
alence class, and each of (A, <1 ) and (B, <2 ) is a representative, it follows that
(A, <1 ) = (B, <2 ), i.e., α = β.
Proof. By VI.4.11, < is total. By VI.3.16, < is irreflexive. The reader can verify
that it is also transitive (Exercise VI.8). Therefore, < is a linear order.
Let next ∅ ≠ A ⊆ On (A need not be a set). Let α ∈ A. If α = min(A),† then
we are done; otherwise X = {β ∈ A : β < α} is nonempty. Let α = (Y, <1 ).
Next, if β = (Z , <2 ) ∈ X , then (by VI.4.9) there is a unique
† The terms “minimum” and “minimal” are interchangeable, since < is total (VI.1.19).
βi ≅ (<1 yβi, <1)   (3)
(2) contradicts the fact that (Y, <1) is a WO set (VI.2.13), and we have shown
that X has a minimal element, as long as we manage to convince ourselves that
the inequalities in (2) indeed hold.
To this end, let β < γ in X, where β = (Z, <2), γ = (W, <3). We have
γ ≅ (<1 yγ, <1)   (4)
and
β ≅ (<1 yβ, <1)   (5)
Also, by β < γ,
β ≅ (<3 u, <3), where u ∈ W   (6)
Since we have adopted the convention that lowercase Greek letters stand for
ordinals, we will use the shorthand “{β : . . .}” for “{β ∈ On : . . .}”. Also, recall
that – for now – α = (A, <1) for some A (set), so that the <1-ingredient is
incorporated in the notation “· · · ≅ α”. In writing “{β : . . .} ≅ · · ·”, however, we are
slightly abusing notation, since we ought to have written “({β : . . .}, <) ≅ · · ·”
instead, where < is the order on On defined in VI.4.9. This type of notational
abuse is common when the order is clearly understood (this echoes the remark
following VI.3.3).
Proof. Let α = (Y, <1 ) and X = {β : β < α}.† As in the proof of VI.4.12, for
each β ∈ X we pick a yβ ∈ Y such that
β ≅ (<1 yβ, <1)   (1)
contrary to VI.3.16.
We saw in the proof of VI.4.12 that F is order-preserving (γ < β → yγ <1
yβ ); hence (by VI.3.8)
F : (X, <) ≅ (ran(F), <1)   (2)
where ran(F) ⊆ Y . Now, if y ∈ Y and β = &(<1 y, <1 )&, then β < α by VI.4.9;
hence β ∈ X and F(β) = y. This shows that F is onto Y .
α = &(On, <)&
Thus,
(On, <) ≅ α   (1)
By VI.4.13,
By the normal form theorem (VI.4.13), ({α : α < &(A, <1)&}, <), where “<”
is that of VI.4.9, is a member of [(A, <1)]≅ for all (A, <1).
Let us ponder then what would be the consequences if the principle of global
(strong) choice (invoked in VI.4.5) were to be so smart as to always pick
({α : α < &(A, <1)&}, <)   (i)
for “&(A, <1)&”, for all WO sets (A, <1). Since the order in all instances of (i)
is the same (that is, < of VI.4.9), we could go one step further and just use the
set {α : α < &(A, <1)&} as the ordinal for the WO set (A, <1), implying, rather
than including explicitly, the order <. Of course, the sets in (i) are ≅-invariants
just as they are when thought of as WO sets under <. Under the pondered
circumstances we end up with a recurrence
&(A, <1)& =def {α : α < &(A, <1)&}
where the α’s are the ordinals (according to our present speculative analysis)
assigned to the segments of (A, <1 ) by VI.4.9.
The self-referential definition above can also be written more simply as
<α = α   (ii)
Let us “compute” (i.e., find which sets are) the first few ordinals. For example,†
(∅, <1) has no segments; therefore the set {α : α < &(∅, <1)&} is empty, i.e.,
&(∅, <1)& = ∅
Now, ({∅}, <1 ) (where <1 is empty as well) has one segment only, (∅, <1 ).
Hence, exactly one ordinal, ∅, is smaller than &({∅}, <1 )&. Thus
&({∅}, <1 )& = {∅} (iii)
Next, let us compute &({a, b}, <2)&, where a ≠ b and a <2 b. The only segments
are (∅, <2 ) and ({a}, <2 ), which have ordinals ∅ and {∅} respectively.
Of course, ({a}, <2) ≅ ({∅}, <1); hence &({a}, <2)& = &({∅}, <1)& = {∅}
by (iii). Thus,†
&({a, b}, <2)& = {∅, {∅}}
Note that for the first three ordinals, at least, their order < coincides with ∈,
since ∅ ∈ {∅}, ∅ ∈ {∅, {∅}} and {∅} ∈ {∅, {∅}}.
This is true of all ordinals, for‡ β ∈ α iff (by (ii)) β ∈ <α iff β < α.
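The three computations above can be replayed mechanically. The sketch below is only an illustration. It models finite pure sets as Python frozensets, which is our own modelling choice and not the text's. It also assumes that α + 1 means α ∪ {α}, as for the natural numbers. It then checks (ii) and the coincidence of < with ∈ on the first few ordinals.

```python
from __future__ import annotations

def successor(alpha: frozenset) -> frozenset:
    """alpha + 1 = alpha ∪ {alpha}."""
    return alpha | frozenset([alpha])

def finite_ordinals(n: int) -> list[frozenset]:
    """The first n von Neumann ordinals 0, 1, ..., n - 1, starting from 0 = ∅."""
    ords: list[frozenset] = []
    current = frozenset()          # 0 = ∅
    for _ in range(n):
        ords.append(current)
        current = successor(current)
    return ords

ords = finite_ordinals(4)
empty = frozenset()
# The sets computed in the text: ∅, {∅}, {∅, {∅}}
assert ords[1] == frozenset({empty})
assert ords[2] == frozenset({empty, frozenset({empty})})
# (ii): each ordinal is the set of all smaller ordinals
assert all(ords[k] == frozenset(ords[:k]) for k in range(4))
# the order < coincides with ∈
assert all((ords[i] in ords[j]) == (i < j) for i in range(4) for j in range(4))
```

Because a frozenset can never contain itself, this model respects foundation automatically.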
Continuing our pondering on what if global choice were smart, we observe
that each ordinal is a transitive set (Definition V.1.11). Indeed, let α ∈ β ∈ γ .
By the previous remark this is equivalent to α < β < γ ; hence, by transitivity
of <, α < γ ; therefore α ∈ γ . Of course, an ordinal, being the set of all the
smaller ordinals, will have as members only transitive sets.
Von Neumann showed that, surprisingly, these transitivity properties fully
characterize the “appropriate concept” of an ordinal as a “special set”, without
any recourse to any form of AC – from which we disengage in the following
“permanent” definition – and without any a priori reliance on the concept of
well-ordering either.
The reader is now asked to consider all the preceding attempts to get ordinals
off the ground as “motivational discussion” with a historical flavour. Therefore
the definitions and consequences VI.4.2–VI.4.14 are to be discarded. Our formal
study of ordinals starts with VI.4.16 below. In particular we will show that
ordinals (as defined below) are ≅-invariants.
On will be the class of all ordinals; that is, On abbreviates the class term
{x : Ord(x)}. Lowercase Greek letters will denote arbitrary ordinals – that is,
we employ, in our argot, “ordinal-typed” variables α, β, γ , . . . , with or without
subscripts or primes, a notation that we extend to unspecified ordinal constants.
Thus in instances such as “. . . α . . . ” we will understand more: “. . . α ∧ α ∈
On . . . ”. Of course, specific ordinals (i.e., specific ordinal constants) may have
names deviating from this rule (e.g., ∅ in the lemma below).
VI.4.17 Lemma. On ≠ ∅.
VI.4.18 Example. Here are some more members of On, as the reader can
readily verify using VI.4.16: {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}.
Indeed, every natural number n, and the set of natural numbers ω, are ordinals
by V.1.12–V.1.13.
The definition of ordinals does not explicitly state that the members of an
ordinal are themselves ordinals. The following lemma says that.
Proof. Let
y ∈ x ∈ α   (1)
⊢ZFC x ∈ α ↔ x ∈ α ∧ x ∈ On
“∈”, as the context hopefully makes clear, is here the relation defined by the
predicate “∈”. We should not need to issue such warnings in the future.
Proof. By VI.4.22, it suffices to show that ∈ is total on On. To this end, let
¬ P (α0 , β0 ) (4)
Let now
γ ∈ β0 (5)
Then P (α0 , γ ) by (4) and ∈-minimality of β0 . P (α0 , γ ) (by (0)) yields one of:
Case 1. α0 = γ . That is, (by (5)) α0 ∈ β0 , contradicting (4).
Case 2. α0 ∈ γ . By (5) and transitivity of β0 , again α0 ∈ β0 ; unacceptable. We
must therefore have
Case 3. γ ∈ α0 .
Thus, by (5),
β0 ⊆ α0 (6)
Next, let
δ ∈ α0 (7)
Then ∈-minimality of α0 , and (3),† yield P (δ, β0 ). The latter yields in turn
(by (0))
Case 1. δ = β0 . That is, (by (7)) β0 ∈ α0 , contradicting (4).
Case 2. β0 ∈ δ. By (7) and transitivity of α0 , again β0 ∈ α0 ; unacceptable. This
leaves
Case 3. δ ∈ β0 .
Hence α0 ⊆ β0 , which along with (6) yields α0 = β0 . This contradicts (4).
Proof. Only-if part. By VI.4.19, an ordinal x satisfies x ⊆ On. Thus, the restric-
tion of the well-ordering ∈ of On on x well-orders the latter. Moreover, by
VI.4.16 no atom is an ordinal; thus y ∈ x → ¬U (y).
If part. Let x be a transitive set that contains no atoms, and let the restriction
of ∈ on x be a well-ordering. Let y ∈ x. First off, ¬U (y).
We need only show that y is transitive. To this end let, in conjunctional
notation,
u ∈ v ∈ y (∈ x) (1)
† ¬(∃β)¬ P (δ, β) follows; hence (∀β) P (δ, β); thus P (δ, β0 ) by specialization.
It is a trivial observation that the corollary above goes through even if well-
order(ing) were relaxed to total order(ing), as the reader can readily check.
However, the above “redundant” formulation is necessary if one desires to found
the notion of ordinal in the absence of the foundation axiom (that axiom was
used in VI.4.23 in an essential way). In that case one takes the statement of
Corollary VI.4.24 as the definition of ordinals.
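For hereditarily finite sets the characterization of Corollary VI.4.24 can be tested directly. The sketch below makes two modelling assumptions of our own: frozensets stand for sets, and any other Python object stands for an urelement. It replaces "well-ordered by ∈" with "strictly totally ordered by ∈", as the remark above permits. For finite sets the two notions agree anyway.

```python
from __future__ import annotations
from itertools import product

def is_atom(y) -> bool:
    # Modelling assumption: anything that is not a frozenset plays the role of an urelement.
    return not isinstance(y, frozenset)

def is_ordinal(x) -> bool:
    """Test the conditions of Corollary VI.4.24 on a finite x: x is a transitive,
    atom-free set on which ∈ is a strict total order (for finite sets a strict
    total order is automatically a well-ordering)."""
    if is_atom(x) or any(is_atom(y) for y in x):
        return False
    if not all(y <= x for y in x):                 # transitivity of x: y ∈ x → y ⊆ x
        return False
    elems = list(x)
    if any(a in a for a in elems):                 # irreflexivity of ∈ on x
        return False
    for a, b in product(elems, repeat=2):          # totality (trichotomy)
        if a != b and a not in b and b not in a:
            return False
    for a, b, c in product(elems, repeat=3):       # transitivity of the relation ∈ on x
        if a in b and b in c and a not in c:
            return False
    return True

e = frozenset()
one = frozenset({e})
two = frozenset({e, one})
assert is_ordinal(e) and is_ordinal(one) and is_ordinal(two)
assert not is_ordinal(frozenset({one}))            # {{∅}} is not transitive
assert not is_ordinal(frozenset({e, one, frozenset({one})}))  # transitive, but ∅ and {{∅}} are ∈-incomparable
```

Python cannot build a frozenset that contains itself, so the irreflexivity check can never fail in this model. It is kept only to mirror the definition.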
In this discussion, enclosed between double “dangerous turn” road signs, we
digress to peek into this possible avenue of founding ordinals. This discussion
is only of use in the proof of the consistency of foundation with the remaining
axioms of ZFC, and can otherwise be omitted with no loss of continuity. So
we temporarily suspend here (i.e., until the end of this “doubly dangerous”
material) the axiom of foundation, and define:
(1) α ∉ α for all ordinals. (Careful here! We cannot rely on foundation to say
that ∈ is irreflexive.) Indeed, let
α ∈ α   (i)
Since α is well-ordered by ∈,
∈ | α is irreflexive.   (ii)
By (i), the left α is a member of the right α, so that (ii) yields α ∉ α (these
are two copies of the left α). We have just contradicted (i).
(2) x ∈ α ∈ On → x ∈ On. Assume x ∈ α ∈ On. As in the proof of the if
part of VI.4.24, x is a transitive set. By transitivity of α, x ⊆ α. At once we
obtain that x is atom-free, since α is. Moreover, x is well-ordered by ∈, an
order inherited from α. Thus, the alternate Definition VI.4.25 too implies
that On is transitive, that is, ordinals only contain ordinals as members.
(3) α ⊂ β → α ∈ β. Assume α ⊂ β, and let γ ∈ β − α (set difference) be
∈-minimum (we say minimum rather than minimal because ∈ is total on β).
Therefore, if δ ∈ γ, then δ ∉ β − α. On the other hand, δ ∈ β by transitivity
of β. Thus δ ∈ α; hence
γ ⊆α (iii)
Next, let
δ∈α (iv)
α = γ ∈ (β − α) ⊆ β
In short, α ∈ β.
(4) α = β ∨ α ∈ β ∨ β ∈ α. Let α ≠ β. Observe that α ∩ β ⊆ α and α ∩ β ⊆ β.
Also, α ∩ β is transitive (verify) and well-ordered by ∈ (as a subset of α)
as well as atom-free (hence an ordinal). By hypothesis, at least one of the two
inclusions (⊆) must be proper (⊂), and then the other is an equality.
Indeed, if they are both proper, then, by (3), α ∩ β ∈ α and α ∩ β ∈ β;
hence α ∩ β ∈ α ∩ β, contradicting (1). Say, α ∩ β = α, i.e., α ⊆ β. Since
we have assumed α ≠ β, (3) yields α ∈ β.
(5) α = {β : β ∈ α}. By (2).
(6) On is well-ordered by ∈. We need only establish ∈-MC on On. So let ∅ ≠
A ⊆ On, a subclass of On. Let α ∈ A. If α ∩ A = ∅, then no member of A belongs to α,
that is, α is ∈-minimal in A. If now α ∩ A ≠ ∅, then let β be ∈-minimal in
α ∩ A (α ∩ A is a subset of the WO set (α, ∈)). We argue that β is ∈-minimal
in A. If not, let γ ∈ A be such that γ ∈ β. Then γ ∈ α by transitivity of α;
hence γ ∈ α ∩ A. This contradicts the choice of β.
This approach (sans foundation) may be considered attractive in that it re-
lies on fewer axioms than that of VI.4.16. We are nevertheless committed to
having foundation (which has already provided us with some interesting re-
sults in VI.2.38) and therefore will continue our development based on Defini-
tion VI.4.16.
VI.4.26 Definition. As is normal practice, we will often utilize the symbol “<”
for the well-ordering ∈ | On. Thus α < β means exactly α ∈ β.
VI.4.27 Lemma. The reflexive closure, rOn (<) =≤, of < coincides with ⊆ on
On, i.e., ≤ = ⊆ on On.
VI.4.30 Theorem. Let (A, <1 ) be a WO set. Then there is a unique ordinal α
and a unique isomorphism φA for which
φA : (A, <1) ≅ (α, <)
where < is the standard order ∈ on On.
Proof. By VI.3.20, and for the WO classes (A, <1 ) and (On, <),† we have these
three alternatives:
(1) (On, <) is isomorphic to a segment of (A, <1 ). This is impossible, because
collection would force On to be a set.
(2) (On, <) ≅ (A, <1). Untenable, as in (1).
(3) So it must be that (A, <1) ≅ (<α, <) for some α. By VI.4.20 (cf. VI.4.29),
this says
φA : (A, <1) ≅ (α, <)   (i)
† Note that both <1 and < are left-narrow, the former because A is a set, the latter by VI.4.20.
Proof. The if part is trivial. The only-if part was proved in the course of the
above proof (uniqueness of α).
† We assume without loss of generality that <1 has A as its field; therefore we do not need to add
“∧ b ∈ A” to the condition b <1 a.
VI.4.33 Definition (Order Types of WO Sets). For any WO set (A, <1 ), the
symbol &(A, <1 )& stands for the unique α which by VI.4.30 satisfies
(A, <1) ≅ (α, ∈)
α is called the order type of (A, <1 ). If A is a set of ordinals and <1 = ∈, then
we use the simpler notation &A& = α (rather than &(A, <1 )& = α).
VI.4.36 Example. Let ∅ ≠ A be any class of ordinals (not just a set). Then ⋂A
is a set. We will argue that it is an ordinal, indeed the smallest (<-minimum)
ordinal in A.
Let α ∈ A. Since ⋂A ⊆ α, ⋂A is well-ordered by < (i.e., ∈), and since
α contains no atoms (by VI.4.16), neither does ⋂A. Using VI.4.24, we need
only show that ⋂A is transitive in order to conclude that it is an ordinal.
So let β ∈ α ∈ ⋂A. Thus, γ ∈ A → β ∈ α ∈ γ. By transitivity of γ,
γ ∈ A → β ∈ γ; hence β ∈ ⋂A.
Next,
α ∈ A → ⋂A ⊆ α   (1)
translates to (VI.4.27) α ∈ A → ⋂A ≤ α. Thus, it only remains to prove that
⋂A ∈ A.
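Example VI.4.36 says that the intersection of a nonempty collection of ordinals is its smallest member. The following finite sketch illustrates this. It uses frozensets as sets, which is our own assumption, and a hypothetical helper `ordinal(n)` that builds the von Neumann ordinal n. For a nonempty finite set of such ordinals, it checks that ⋂A ⊆ α for every α ∈ A and that ⋂A is itself in A.

```python
from functools import reduce

def ordinal(n: int) -> frozenset:
    """Hypothetical helper: the von Neumann ordinal n, built by n applications of a ↦ a ∪ {a}."""
    a = frozenset()
    for _ in range(n):
        a = a | frozenset([a])
    return a

def big_intersection(family) -> frozenset:
    """⋂A for a nonempty finite family A of frozensets."""
    members = list(family)
    if not members:
        raise ValueError("A must be nonempty")
    return reduce(lambda s, t: s & t, members)

A = {ordinal(7), ordinal(3), ordinal(5)}
m = big_intersection(A)
assert m == ordinal(3) and m in A        # ⋂A is a member of A ...
assert all(m <= alpha for alpha in A)    # ... and ⋂A ⊆ α, i.e. ⋂A ≤ α, for every α ∈ A
```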
The notation α + 1 for ordinals is consistent with that for the natural numbers
(V.1.19). However, unlike the special case of natural numbers (which are “finite
ordinals”), where n + 1 = 1 + n is provable for the free variable n over ω
(cf. V.1.24), it is not the case that α + 1 = 1 + α is provable in general. For
example, we will soon see that ⊢ZFC 1 + ω ≠ ω + 1. Of course, we have not
yet said what “1 + α” ought to mean in general, but that will be done soon.
We also recall here the result of Lemma V.1.9 and the fact (see remark prior
to the proof) that
⊢ZFC x ∪ {x} = y ∪ {y} → x = y
for free x and y, not just for variables restricted over ω. In particular,
⊢ZFC α + 1 = β + 1 → α = β   (1)
VI.5. The Transfinite Sequence of Ordinals 341
VI.5.2 Example (What If?). Let us prove (1) above again, this time without
using foundation (which was used in V.1.9), but instead taking as independently
given the fact that <, that is ∈, on On is a total order (for example, this would
have been the avenue taken if we were to omit the axiom of foundation and
define ordinals as in the alternate definition VI.4.25).
Under such restrictions we still have trichotomy (see p. 336, item (4)), i.e.,
we have one of α = β, α ∈ β, β ∈ α. So assume α + 1 = β + 1, and let
α∈β (2)
Now, the hypothesis α ∪ {α} = β ∪ {β} implies (via ⊇) that β ∈ α or β = α,
either of which in turn implies, along with (2), that β ∈ β, contradicting the
irreflexivity of the order ∈ on On. Similarly, β ∈ α is untenable. This leaves
α = β.
VI.5.6 Lemma. α + 1 ≠ ∅.
Proof. α ∈ α + 1.
VI.5.7 Remark. Lemma VI.5.6 generalizes the case of natural numbers (V.1.8)
to all ordinals.
From now on we will freely use the symbol 0 for ∅ in all contexts where
the latter is thought of as an ordinal rather than just the empty set, since 0 is
the symbol we have assigned to the smallest ordinal – the natural number 0
(V.1.19).
Proof. By the axiom of infinity (V.1.3), it has already been established that ω
is a set that is a limit ordinal.
VI.5.11 Theorem. Every α falls under exactly one of the following cases:
(1) α = 0,
(2) Lim(α),
(3) α is a successor.
Proof. First, the cases are mutually exclusive. Indeed, (1) excludes (2) by def-
inition, while it excludes (3) by VI.5.6. We verify that (2) excludes (3). Say
Lim(α), yet α = β + 1 for some β. Then β < α and, by (2), β + 1 < α – i.e.,
α < α – a contradiction (see also V.1.14).
Next, let α ≠ 0 and also ¬Lim(α). Therefore, by trichotomy, for some
β < α we have α ≤ β + 1. By VI.5.4 and ≤ = ⊆ on On, this yields α = β + 1,
thus α is a successor.
Since On and any α are well-ordered by <, the results on induction and in-
ductive definitions presented in Section VI.2 carry over with minor translations:
Of course, “(∀α)” means “(∀α ∈ On)” (VI.4.16), while “for arbitrary α” means
that α is a free ordinal variable.
VI.5.13 Theorem (Induction over δ on Variable α). To prove (∀α < δ)F (α)
it suffices to prove, for arbitrary α < δ, that F (α) follows from the induction
hypothesis (∀β < α)F (β).
Proof. Recall that <α = α. In particular, this makes < over On left-narrow.
VI.5.17 Corollary (Pure Recursion over δ). Let G be a (not necessarily total)
function G : U M → X, for some class X. Then there exists a unique function
F : δ → X satisfying
(1) 0 ∈ S,
(2) (∀α)(α ∈ S → α + 1 ∈ S),
(3) whenever Lim(α), the hypothesis (∀β < α)β ∈ S implies α ∈ S.
Then S = On.
Of course, the above can be rephrased in terms of a formula S (α). The reader
will easily carry out this translation by letting S = {α : S (α)}.
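Proof. Define G̃ : On × U M → X by

$$
\tilde{G}(\alpha, f) =
\begin{cases}
x & \text{if } \alpha = 0\\
G(\alpha, f) & \text{if } \mathrm{Lim}(\alpha)\\
H(\alpha - 1, f(\alpha - 1)) & \text{if } \mathrm{dom}(f) \supseteq \alpha \wedge \alpha \text{ is a successor}\\
0 & \text{otherwise}
\end{cases}
$$

Thus, by VI.5.11, (1)–(3) translate to (∀α)F(α) ≃ G̃(α, F↾α).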
Note that by III.11.4 it is not necessary to add in the third case above “∧ f
is a function”, for dom( f ) makes sense regardless.
By induction over On, assume for all β < α that sp(β) = 0. Thus, by (1),
sp(α) = ⋃_{β<α} sp(β) = 0. In sum
⊢ZFC (∀α)sp(α) = 0
i.e., all ordinals are pure sets.
In view of the fact that the successor operation, +1, is inadequate to “reach”
limit ordinals, we search for more powerful operations.
VI.5.24 Remark.
(1) Suprema are unique, for if c and c′ are suprema of B, then c ≤ c′ and c′ ≤ c,
and hence c = c′ by antisymmetry. We write c = sup(B) or c = lub(B).
Upper bounds with respect to the inverse order > are called lower bounds
with respect to <. Correspondingly, least upper bounds with respect to >
are called greatest lower bounds or infima (singular infimum) with respect
to <. The latter are also unique, and we write d = inf(B) or d = glb(B) to
indicate that d is the infimum of B (in (X, <)).
(2) If B = ∅, then any a ∈ X is a lower bound, strict lower bound, upper
bound, and strict upper bound of B. Thus, the empty set has a supremum in
X iff X has a <-minimum element (which, of course, is unique if it exists).
Similarly, the statement “∅ has a glb” is equivalent to “X has a <-maximum
element”.
Note that above, consistently with Remark VI.5.24, A = ∅ implies that its sup
in On is 0. In other words, sup(A) = ⋃A is valid for A = ∅.
(1) The smallest ordinal strictly greater than all ordinals in A is sup{α +1 : α ∈
A}, denoted by sup+ (A).
(2) If A has a maximum element γ , then sup(A) = γ and sup+ (A) = γ + 1.
(3) If A does not have a maximum, then sup(A) = sup+ (A).
for all α ∈ A, the “≤” becoming “<” due to the absence of a maximum element.
But sup+(A) is also the smallest ordinal that satisfies (ii), by (1) (regardless of the issue of
maximum). Hence sup(A) = sup+ (A).
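For finite sets of finite ordinals, sup(A) = ⋃A and sup+(A) = sup{α + 1 : α ∈ A} are easy to compute. The sketch below does this under our own assumption that ordinals are modelled as frozensets. It uses the hypothetical helpers `ordinal` and `successor`. It checks item (2) and the case A = ∅. Item (3) cannot be illustrated this way, because every nonempty finite set of ordinals has a maximum.

```python
def successor(a: frozenset) -> frozenset:
    """a + 1 = a ∪ {a}."""
    return a | frozenset([a])

def ordinal(n: int) -> frozenset:
    """Hypothetical helper: the von Neumann ordinal n."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a

def sup(A) -> frozenset:
    """sup(A) = ⋃A for a set A of ordinals; ⋃∅ = ∅ = 0."""
    return frozenset().union(*A)

def sup_plus(A) -> frozenset:
    """sup+(A) = sup{α + 1 : α ∈ A}, the least ordinal strictly above every member of A."""
    return sup(successor(a) for a in A)

A = {ordinal(2), ordinal(5)}
assert sup(A) == ordinal(5)          # the maximum γ = 5
assert sup_plus(A) == ordinal(6)     # γ + 1, as in item (2)
assert sup(set()) == sup_plus(set()) == ordinal(0)
```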
(1) weakly continuous iff, for each α ∈ dom( f ), if Lim(α) then f (α) =
sup{ f (β) : β < α},
(2) normal iff it is increasing and weakly continuous,
(3) weakly normal iff it is non-decreasing and weakly continuous.
Proof. Let x <1 y in A. Now, sup{x, y} = y, hence sup{F(x), F(y)} exists and
Proof. Let Lim(α) and α ∈ dom( f ). Now, α = sup{β : β < α}; hence, by con-
tinuity, f (α) = sup{ f (β) : β < α}.
Hence
and the ⊆ in (1) is promoted to equality. But the right hand side of ⊆ is f (α)
by weak continuity.
function. It turns out that with a bit of a boost, weak continuity can imply a
strong monotonicity property for the function, and hence continuity by VI.5.35.
Proof. We need to show f (α) < f (β) for all α < β. We do induction on β.
Basis. β = 0. The contention is vacuously satisfied.
The successor case. Say β = γ + 1, and let α < β. Thus, α = γ or α < γ .
In the former case, f (α) < f (β) by the assumption; in the latter case, by I.H.,
the assumption, and transitivity of <.
The limit ordinal case. Say Lim(β). By weak continuity,
The following easy result will be useful in Section VI.10. It says that a
normal On-sequence of ordinals maps limit ordinals to limit ordinals.
Proof. Suppose that Lim(α). Then f(α) = sup{f(γ) : γ < α}. But f(α) ∉
{f(γ) : γ < α} because f is increasing. Thus Lim(f(α)) by Exercise VI.14.
VI.5.41 Definition (Fixed Points). A fixed point (also called fixpoint some-
times) of a function F : A → A is a u ∈ A such that F(u) = u.
s0 = t
sn+1 = F(sn )
VI.5.43 Corollary. The fixed point u of the previous theorem is ≤-least. That
is, any c such that F(c) ≤ c satisfies u ≤ c.
Proof. Let F(c) ≤ c. Now, by induction on n we see that sn ≤ c for all n < ω,
because s0 = t ≤ c, and if (I.H.) sn ≤ c, then sn+1 = F(sn ) ≤ F(c) ≤ c.
Thus u = sup{sn : n < ω} ≤ c.
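In a finite PO set, every ascending sequence is eventually constant, so the iteration s0 = t, sn+1 = F(sn) stops changing after finitely many steps. Its supremum is then simply its last term. The sketch below is illustrative only. It takes a made-up monotone map F on the subsets of {0, ..., 4}, ordered by ⊆, with sup given by union and minimum t = ∅. It computes u in this way and then checks Corollary VI.5.43 by brute force over all c with F(c) ⊆ c. The graph data and the names `EDGES`, `START` and `least_fixed_point` are hypothetical.

```python
from itertools import chain, combinations

# Made-up example data: a small directed graph on the vertices 0..4.
EDGES = {(0, 1), (1, 2), (2, 0), (3, 4)}
START = frozenset({0})

def F(S: frozenset) -> frozenset:
    """A monotone map on (P({0, ..., 4}), ⊆): START together with the
    one-step successors of the vertices in S."""
    return START | frozenset(b for (a, b) in EDGES if a in S)

def least_fixed_point(F, t=frozenset()):
    """Iterate s_0 = t, s_{n+1} = F(s_n). With t the minimum and F monotone the
    sequence ascends; in a finite PO set it is eventually constant, and that
    constant value is sup{s_n : n < ω}."""
    s = t
    while True:
        nxt = F(s)
        if nxt == s:
            return s
        s = nxt

u = least_fixed_point(F)
subsets = [frozenset(c) for c in chain.from_iterable(combinations(range(5), r) for r in range(6))]
assert F(u) == u                                      # u is a fixed point
assert all(u <= c for c in subsets if F(c) <= c)      # and F(c) ⊆ c implies u ⊆ c
```

In this finite setting, monotonicity alone makes the iteration work. The continuity hypothesis matters only for genuinely infinite ascending sequences.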
Proof. The proof follows that of Theorem VI.5.42. We must just ensure that
the key assumptions hold. Well, On has a minimum element, and if (sn )n<ω
is ascending, then sup{sn : n < ω} exists by VI.5.22. The rest is taken care of
by VI.5.35.
Proof. All else is as above, but now define s0 = γ + 1. You will need to argue
that (sn )n<ω is non-decreasing via a different route than before (Exercise VI.18).
The above says that normal On-sequences of ordinals have arbitrarily large
fixed points.
It is easy to see that the proof (imitating that of the theorem) yields the least
fixed point greater than γ (Exercise VI.18).
Theorem VI.5.42 can be sharpened in one direction, namely by dropping the
requirement of (countable) continuity. A small trade-off towards achieving this
is to restrict attention to PO sets (A, <) where every subset of A – not just
ascending sequences – has a supremum in A.
The terminology depends on which side of ≤ one is looking at. The input is
“expanded” by, or “included”† in, the output.
VI.5.47 Theorem. Let (A, <) be a PO set such that every S ⊆ A has a least
upper bound in A. If f : A → A is either inclusive or monotone, then it has a
fixpoint c ∈ A, that is, f (c) = c.
sα = f (s<α ) (3)
VI.5.48 Remark. (1) The reader can easily verify that we can weaken some-
what the assumption on (A, <) and still prove our theorem. It suffices to pos-
tulate that the PO set has a supremum for every chain.† Thus, (1) would be
undefined unless {sβ : β < α} is a chain. Well, one has to prove that it will be a
chain (under the changed assumptions for (A, <)) anyway (Exercise VI.19).
(2) It turns out that with the β and γ as fixed in the proof above,
γ ≤ α → sγ = sα (5)
One can prove (5) by induction, since the class of ordinals above γ is
well-ordered. Thus assume the claim for all α such that γ ≤ α < δ. Now,
monotonicity of λα.sα and the I.H. entail
s<δ = sup{sθ : θ < δ} = sγ (6)
Applying f to the two extreme sides of (6) and remembering that sγ is a fixpoint
of f , we obtain sδ = sγ . In particular, for the γ that we fixed in the proof, we
have shown that sγ = s<γ .
Pause. Must it also be the case, for the β of the proof, that s<β = sβ ?
Proof. The c of the proof of VI.5.47 works. It suffices to prove that sα ≤ d for
all α.
For α = 0, s0 = f (sup ∅). Now, sup ∅ ≤ d; hence s0 = f (sup ∅) ≤ f (d) ≤ d,
using monotonicity of f . In general, s<α = sup{sδ : δ < α} ≤ d by I.H. By
monotonicity, sα = f (s<α ) ≤ f (d) ≤ d.
Proof. Let x be any set. If x = ∅, then the result is trivial. Assume then that
x ≠ ∅ and let f be a choice function (AC) on P(x) − {∅}, i.e.,
(∀y)(∅ ≠ y ⊆ x → f(y) ∈ y)   (1)
By recursion over On (with α as the recursion variable) define
(∀α)h(α) ≃ f(x − ran(h↾α))   (2)
It follows by VI.5.16 ((2) is a pure recursion) that dom(h) = On, or dom(h) = γ
for some γ . Now, h is 1-1, for if α < β and β ∈ dom(h) – so that α ∈ dom(h)
as well – then, by (1) and (2), h(β) ∈ x − ran(h↾β), and hence h(β) ≠ h(α).
Thus, by collection – since ran(h) ⊆ x – we have dom(h) = γ .
h is onto x, for dom(h) = γ entails that
γ = min{δ : δ ∉ dom(h)}
  = min{δ : f(x − ran(h↾δ))↑}
  = min{δ : x − ran(h↾δ) = ∅}
That is, x = ran(h↾γ) = ran(h).
By VI.3.12, h induces a well-ordering <1 on x such that &(x, <1 )& = γ .
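For a finite x, the recursion (2) can be run directly, which shows how a choice function alone produces the enumeration h and the induced well-ordering. The sketch below is only an illustration. The names `enumerate_by_choice` and `induced_order` are hypothetical. Using Python's `max` as the choice function is an arbitrary assumption, since any f with f(y) ∈ y would do. The infinite case, which is the real content of the theorem, cannot be simulated this way.

```python
from __future__ import annotations
from typing import Callable, Hashable

def enumerate_by_choice(x: frozenset, f: Callable[[frozenset], Hashable]) -> list:
    """Finite sketch of the recursion h(α) ≃ f(x − ran(h↾α)).

    `f` is an assumed choice function: for every nonempty y ⊆ x, f(y) ∈ y.
    The list h represents the function α ↦ h[α], with dom(h) = len(h)."""
    h: list = []
    while True:
        remaining = x - frozenset(h)     # x − ran(h↾α) with α = len(h)
        if not remaining:                # f(∅)↑: the recursion stops, dom(h) = γ
            return h
        choice = f(remaining)
        assert choice in remaining, "f must be a choice function"
        h.append(choice)                 # h(α) = f(x − ran(h↾α)); new, so h stays 1-1

def induced_order(h: list):
    """The well-ordering <1 induced on ran(h): a <1 b iff h⁻¹(a) < h⁻¹(b)."""
    position = {a: i for i, a in enumerate(h)}
    return lambda a, b: position[a] < position[b]

x = frozenset({"a", "b", "c"})
h = enumerate_by_choice(x, max)   # max used as a (hypothetical) choice function
less = induced_order(h)           # h = ["c", "b", "a"], so "c" <1 "b" <1 "a"
```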
VI.5.51 Remark. The above theorem is due to Zermelo (1904, 1908). The proof
in his 1904 paper is reproduced in Kamke (1950) (see also Exercise IV.3). It
is noteworthy that while AC was, of course, employed in an essential way in
the original proof, it was taken there to be a “fundamental truth” of set theory†
rather than an additional assumption (axiom).
† See Kamke (1950, p. 112), especially the concluding remarks prior to the statement of the well-
ordering theorem.
Cantor had conjectured (but not proved) a special case of the well-ordering
theorem in 1883, where x is the set of reals, R.
Proof. If F (a set) is a family of nonempty sets, then let <1 well-order ⋃F. To
each x ∈ F associate its <1-minimum element xmin. The function x ↦ xmin is
a choice function on F.
The careful reader will observe that “remaining axioms” need not include
foundation or power set, as the ordinals can be developed without these
axioms.
Proof. Assume AC. Let x ≠ ∅. Referring to the proof of VI.5.50, we can take
g = h and α = γ. If x = ∅, then α = 1 and g = {⟨0, 0⟩} work.
Conversely, let F be a nonempty family of nonempty sets. We take x = ⋃F
and let α and g be as described in the corollary.
A choice function for F is λy.g(min(g−1[y])) : F → x.
VI.5.54 Corollary. For every set x there is an ordinal α and a 1-1 correspon-
dence between x and α.
† Note that (1) may yield undefined right hand sides in the second or third case of the definition,
because the set we are using as the argument of f is actually not in the domain of f (because it
is ∅). For example, this may happen in the third case if {tβ : β < α} is not a chain.
‡ Since dom(t) is On or an ordinal, it is transitive. Therefore, α ∈ β ∈ dom(t) implies α ∈ dom(t).
Proof. In view of the proof of VI.5.56, we need only prove that the latter
implies Zorn’s lemma. Let then (A, <) be a PO set where every chain has an
upper bound. Let a ∈ A. Now, {a} is a chain; thus there is a maximal chain
C ⊆ A such that a ∈ C. Let c be an upper bound of C. Then a ≤ c trivially.
Moreover, c is maximal, for if not, there is a b > c. But then C ∪ {b} is a chain
that properly extends C.
† But from which vague description we have firmly justified the selection of axioms of ZFC!
VI.6. The von Neumann Universe 359
The reader is cautioned that the formal construction within ZFC does not
provide a formal proof of consistency of the axioms. What it does is build a for-
mal interpretation of L Set and ZFC over the language L Set and ZFC. Thus,
formally, it only proves that (cf. I.7.10) if ZFC is consistent, then ZFC is
consistent.† Hardly newsworthy. Nevertheless, as we have noticed above,
Platonistically we get much more out of this construction.
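$$
\begin{aligned}
V_N(0) &= N\\
V_N(\alpha + 1) &= \mathcal{P}\bigl(N \cup V_N(\alpha)\bigr)\\
V_N(\alpha) &= \bigcup_{\beta<\alpha} V_N(\beta) \quad \text{if } \mathrm{Lim}(\alpha)
\end{aligned}
$$

$$
\begin{aligned}
R_N(0) &= \emptyset\\
R_N(\alpha + 1) &= \mathcal{P}\bigl(N \cup R_N(\alpha)\bigr)\\
R_N(\alpha) &= \bigcup_{\beta<\alpha} R_N(\beta) \quad \text{if } \mathrm{Lim}(\alpha)
\end{aligned}
$$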
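The finite stages of both hierarchies can be computed explicitly. The limit clauses are beyond the reach of such a sketch. We model sets as frozensets and urelements as Python strings. Both are our own assumptions, and the functions `V` and `R` are hypothetical names for VN and RN restricted to natural-number stages. The sketch checks the sizes of the pure stages (N = ∅). It also checks, for N = {p}, the relationship between VN and RN stated in Lemma VI.6.3.

```python
from itertools import combinations

def powerset(s: frozenset) -> frozenset:
    """P(s) for a finite set s."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def V(N: frozenset, n: int) -> frozenset:
    """V_N(n) for finite n: V_N(0) = N and V_N(k + 1) = P(N ∪ V_N(k))."""
    stage = N
    for _ in range(n):
        stage = powerset(N | stage)
    return stage

def R(N: frozenset, n: int) -> frozenset:
    """R_N(n) for finite n: R_N(0) = ∅ and R_N(k + 1) = P(N ∪ R_N(k))."""
    stage = frozenset()
    for _ in range(n):
        stage = powerset(N | stage)
    return stage

# Pure sets (N = ∅): |R(0)|, ..., |R(4)| are 0, 1, 2, 4, 16.
assert [len(R(frozenset(), n)) for n in range(5)] == [0, 1, 2, 4, 16]
N = frozenset({"p"})                         # one urelement, modelled as a string
assert V(N, 0) == N | R(N, 0)                # stage 0: V_N(0) = N ∪ R_N(0)
assert all(V(N, n) == R(N, n) for n in range(1, 4))   # successor stages agree
```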
VI.6.3 Lemma (VN vs. R N ). For successor ordinals α, VN (α) = R N (α). For all
other ordinals, VN (α) = N ∪ R N (α).
For α = β + 1,
VN (β + 1) = P(N ∪ VN (β))
= P(N ∪ R N (β)) by I.H.
= R N (β + 1)
Thus, N ∪ WF N = ⋃_{α∈On} VN(α). In WF N we collect only the sets built from
N , and leave “loose” urelements out of the collection. As in III.4.20, V N will
denote the class of all sets built from N . The question whether WF N = V N is
a subsidiary of the second claim made in VI.6.2, and will be settled shortly.
It follows that ⋃_{α∈On} VN(α) = N ∪ WF N is transitive as well. Note however
that WF N is not transitive (unless N = ∅), since for any urelement p, p ∈ N ∈
R N(1) ⊆ WF N, yet p ∉ WF N.
VN (α) ⊆ N ∪ VN (α + 1) (1)
Let next β < α. The I.H. yields VN (β) ⊆ N ∪ VN (α); hence VN (β) ⊆ N ∪
VN (α + 1) by (1).
The case Lim(α): Here already VN (β) ⊆ VN (α), even without the help
of I.H.
VN (α) ∈ VN (α + 1) (1)
β ⊆ VN (α) (1)
since ordinals are pure sets. Why should β < α + 1, that is, β ≤ α, or β ⊆ α?
Well, let γ ∈ β. Thus γ ∈ VN (α) by (1). By I.H., γ ∈ α.
The case Lim(α): Let β ∈ VN(α) = ⋃{VN(γ) : γ < α}. So β ∈ VN(γ) for
some γ < α; hence, by I.H., β < γ . Thus, β < α.
VI.6.10 Corollary. On ⊆ ⋃_{α∈On} VN(α).
At the end of all this, have we got enough sets to “do set theory”? In other
words, are all the axioms of set theory true in the “real”
⋃_{α∈On} VN(α)   (A)
or, formally, are the axioms provable when relativized to ⋃_{α∈On} VN(α)
(cf. Section I.7)? And are these all the sets we can get if we start with a set N
of atoms? That is, is it the case that
U N = ⋃_{α∈On} VN(α)   (B)
J = (L Set , ZFC, N ∪ WF N )
we will write “P N ∪WF N ” instead. Two more simplifications in our notation are:
(1) We wrote L Set , but we mean here the basic language augmented by the
various defined symbols we have introduced to this point.
(2) We wrote N ∪ WF N rather than “M(x)”, where the latter is the defining
formula of the class term.
Proof. The reader may wish to review the concepts in Section I.7.
Now, the requirement that
⊢ZFC (∃x)(x ∈ N ∪ WF N)
⊢ZFC x ∈ N ∪ WF N → U(x)
(∀x)(x ∈ A → x ∈ B) (1)
be? It will be
Trivially, (1) implies (2). Interestingly, (2) implies (1): Indeed, to prove (1)
(from (2)), let x ∈ A. Since also A ∈ N ∪ WF N , we get x ∈ N ∪ WF N by
transitivity of N ∪ WF N . Then x ∈ B by (2).
† It is easier expositionally to refer to N ∪ WF N , meaning really J. The jargon “true in” was
introduced (with apologies) on p. 80.
So assume (3). This we view as (∀x ∈ A)(∃y)G [x, y], where G [x, y] is
“y ∈ N ∪ WF N ∧ F [x, y]”. We can now apply collection (in ZFC) to
obtain a set B (in ZFC) such that
(∀x ∈ A)(∃y ∈ B)G [x, y]
or
† Cf. III.8.4.
Part (xi) in the proof above was carried out without foundation. Indeed, the
whole theorem can be proved without foundation, in “ZFC − f” – where “f”
stands for foundation. This is due to the feasibility of basing everything in
the proof on the Kuratowski pairing, ⟨x, y⟩ = {{x}, {x, y}}, while defining
ordinals as in VI.4.25. Indeed, everything we have said up until now (except for
the examples regarding the properties of the collapsing function) can be said
without the benefit of foundation.
Thus, the whole construction has built more than we were willing to admit
initially. We have built a formal model of ZFC in J = (L Set , ZFC − f, N ∪WF N )
rather than in “just” J = (L Set , ZFC, N ∪ WF N ). I.7.10 yields at once:
But we do have foundation – that is, its suspension was only temporary, to
obtain VI.6.14.
VI.6.15 Theorem. U N = ⋃_{α∈On} VN(α).
VI.6.16 Corollary. WF N = V N .
† Once again we point out that there is no circularity in this assertion, for ordinals can be defined
without the presence of foundation (see the discussion following VI.4.25). Another revision that
one needs to make in this (temporary!) rewriting of our development is the definition of ⟨x, y⟩.
To avoid foundation one defines ⟨x, y⟩ = {{x}, {x, y}}.
Another way to put all this is that if we drop the foundation axiom, then
WF N ⊂ V N
The sets in V N − WF N are the hypersets (see Barwise and Moss (1991)).
Principle 0 of Chapter II says that “an arbitrary class is a set formed (from N )
at some stage if all its members are formed (set-members) or given (urelement-
members) at some earlier stage”, and Principle 1 says that “every set is con-
structed at some stage”. All this has now become formally true (with our final
interpretation of what “stage” means).
Indeed, if the set x is in VN (α) and α is smallest (“earliest”), then α = β + 1
(why?); hence x ⊆ N ∪ VN (β) = N ∪ R N (β). That is, all the elements of x are
formed (∈ R N (β)) or given (∈ N ) at some earlier stage.
Conversely, if it is known for a class A that all y ∈ A satisfy y ∈ VN(α_y),
and if we are told that there is a stage after all the α_y (that is, sup{α_y : y ∈ A}
exists and equals, say, β), then A ⊆ ⋃_{y∈A} VN(α_y) ⊆ N ∪ VN(β) (the last ⊆
by VI.6.6). Hence A ∈ VN(β + 1), i.e., A is constructed at stage β + 1 from N,
as a set. This formalizes Principle 2.
Principle 1 is formalized as VI.6.16.
† This is meant in the non-strict sense. α need not be the earliest stage (i.e., smallest ordinal) at
which x is formed.
Proof. N = VN(0) settles the first equation. Let now x be a set and α =
⋃_{y∈x} ρN(y), while ρN(x) = β + 1. Since ρN(y) ≤ α for all y ∈ x, we get
(∀y ∈ x)y ∈ N ∪VN (α) (by VI.6.6); hence x ⊆ N ∪VN (α); thus x ∈ VN (α+1).
This yields
β +1≤α+1 (1)
By VI.6.23, (∀y ∈ x)ρ N (y) < β + 1; hence (∀y ∈ x)ρ N (y) ≤ β. Thus α ≤ β;
hence α + 1 ≤ β + 1. Using (1), we get the second equation.
VI.6.26 Example. Let us rediscover the identity $\rho(\alpha) = \alpha + 1$, using induction over On in connection with VI.6.24. Use $\rho(\beta) = \beta + 1$ for $\beta < \alpha$ as I.H. Then
$$\rho(\alpha) = \Big(\bigcup_{\beta<\alpha} (\beta + 1)\Big) + 1 = \sup{}^{+}(\alpha) + 1 = \alpha + 1$$
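VI.6.27 Example. Suppose that $f$ is a function with $\mathrm{dom}(f) \subseteq \omega + 1$, such that $f(\omega)\!\downarrow$. Let us estimate $f$'s rank. We know that $\langle\omega, x\rangle \in f$ for some $x$. Now,
$$\begin{aligned}
\rho(\langle\omega, x\rangle) &= \rho(\{\omega, \{\omega, x\}\})\\
&= \max\big(\omega + 1, \rho(\{\omega, x\})\big) + 1\\
&= \max\big(\omega + 1, \max(\omega + 1, \rho(x)) + 1\big) + 1\\
&\ge \omega + 3
\end{aligned}$$
By VI.6.23, $\rho(\langle\omega, x\rangle) < \rho(f)$; thus, $f \notin V_N(\omega + 3)$.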
This entails, in particular, that an inhabitant of VN (ω +3) would be oblivious
to the fact that there is a 1-1 correspondence ω ∼ ω + 1, since even though ω
and ω + 1 are “visible” in VN (ω + 3),† the 1-1 correspondence is not.
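The rank computations above can be mirrored on hereditarily finite pure sets. The following is only an illustrative sketch under assumptions of our own: pure sets (no urelements) are modelled as nested Python frozensets, the rank convention is the one of VI.6.26, $\rho(x) = \big(\bigcup_{y\in x}\rho(y)\big) + 1$ (so $\rho(n) = n + 1$), and $\langle a, b\rangle$ is implemented as $\{a, \{a, b\}\}$. The helper names `von_neumann`, `pair`, and `rank` are hypothetical. For finite $n, m$ the sketch gives $\rho(\langle n, m\rangle) = \max(n, m) + 3$, the finite analogue of the bound $\omega + 3$ in VI.6.27.

```python
from functools import lru_cache


def von_neumann(n: int) -> frozenset:
    """Finite von Neumann ordinal n = {0, 1, ..., n - 1} as a nested frozenset."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}  # successor step: s + 1 = s ∪ {s}
    return s


def pair(a: frozenset, b: frozenset) -> frozenset:
    """Ordered pair implemented as {a, {a, b}}."""
    return frozenset({a, frozenset({a, b})})


@lru_cache(maxsize=None)
def rank(x: frozenset) -> int:
    """rho(x) = (union of rho(y) for y in x) + 1; on finite sets the union is a max."""
    return max((rank(y) for y in x), default=0) + 1


if __name__ == "__main__":
    zero, one, two = von_neumann(0), von_neumann(1), von_neumann(2)
    print([rank(von_neumann(n)) for n in range(5)])     # [1, 2, 3, 4, 5]
    print(rank(pair(von_neumann(3), von_neumann(5))))   # 8, i.e. max(3, 5) + 3
    swap = frozenset({pair(zero, one), pair(one, zero)})
    print(rank(two), rank(swap))                        # 3 5
```

In the same spirit as VI.6.27, the swap permutation of $2 = \{0, 1\}$ has rank 5 although 2 itself has rank 3, so the function enters the hierarchy strictly later than its domain and range.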
† In anticipation of ordinal arithmetic in Section VI.10, we are taking here the notational liberty
to write things such as “α + 3” for ((α + 1) + 1) + 1.
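VI.6.28 Example. Next, assume $\mathrm{Lim}(\alpha)$, and let $\beta < \gamma$, both in $V_N(\alpha)$, and $f : \gamma \to \beta$ be a 1-1 correspondence. Is $f \in V_N(\alpha)$?
$f \subseteq \gamma \times \beta$. Now,
$$\begin{aligned}
\rho(\gamma \times \beta) &= \bigcup_{\langle\delta,\eta\rangle\in\gamma\times\beta} \rho(\langle\delta, \eta\rangle) + 1\\
&= \bigcup_{\langle\delta,\eta\rangle\in\gamma\times\beta} \max(\delta + 3, \eta + 3) + 1\\
&\le \gamma + 3, \quad\text{since } \max(\delta + 3, \eta + 3) \le \gamma + 2
\end{aligned}$$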
To avoid repetitiousness in the arguments that follow, let us show once and for all that
(1) $B \neq \emptyset \to b \neq \emptyset$,
(2) $b$ can be given as a set term in terms of $B$.
We now turn to show that “replacement” ((1) above) implies collection (there is no circularity in this, for anywhere that we have used collection, the restricted form (1) sufficed).
Assume then
$$(\forall x \in A)(\exists y)\,\mathcal{S}(x, y) \qquad (2)$$
Thus, there is a nonempty class $B_x = \{y : \mathcal{S}(x, y)\}$ for each $x \in A$.
Let, for each $x \in A$, $b_x \neq \emptyset$ be the set “computed” by VI.6.29(i). Using class notation for readability, (2) translates into
$$(\forall x \in A)(\exists y)\, y \in B_x \qquad (2')$$
We have by VI.6.29
$$(\forall x \in A)(\exists! y)\, y = b_x \qquad (3)$$
Why “∃!”? Because, referring to the proof of VI.6.29, there is only one minimum $\alpha$ such that $B_x \cap V_N(\alpha) \neq \emptyset$, and hence only one $b_x$. On the other hand, $b_x = y \to b_x = z \to y = z$. By schema (1), (3) yields
$$(\exists z)(\forall x \in A)(\exists y \in z)\, y = b_x$$
Let then $C$ be a new constant, and add
$$(\forall x \in A)(\exists y)(y \in C \wedge y = b_x)$$
By the one point rule (I.6.2), the above implies $(\forall x \in A)\, b_x \in C$; thus $\{b_x : x \in A\}$ is a set by separation; hence so is $\bigcup\{b_x : x \in A\}$. Call this union $D$ (new constant).
We are almost done. Let $x \in A$. Then $B_x \neq \emptyset$; hence, by VI.6.29, $b_x \neq \emptyset$. This allows us to add a new constant $e$ and also add $e \in b_x$. By $b_x \subseteq D$ we have $e \in D$. By $b_x \subseteq B_x$ we have $e \in B_x$. Thus, $e \in D \wedge e \in B_x$; hence
(∃y ∈ D)y ∈ Bx
By the deduction theorem x ∈ A → (∃y ∈ D)y ∈ Bx ; hence (generalizing)
(∀x ∈ A)(∃y ∈ D)y ∈ Bx
from which, eliminating class notation,
(∃z)(∀x ∈ A)(∃y ∈ z)S (x, y)
This, along with (2), proves collection (III.8.2).
VI.6.30 Example. Let the relation P satisfy a “weak” MC, namely, for every
nonempty set x, there is a P-minimal element y ∈ x, i.e., ¬(∃z ∈ x)z P y, or
Py ∩ x = ∅
It will follow that P has “ordinary” (strong) MC, as defined in VI.1.22.
VI.7. A Pairing Function on the Ordinals 373
Sa ∩ A = ∅
or
ba ∩ A = ∅
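We start by noting that, since for any two ordinals $\alpha, \beta$ one has either $\alpha \le \beta$ or $\beta < \alpha$, it makes sense to define
$$\max(\alpha, \beta) = \begin{cases} \alpha & \text{if } \beta \le \alpha\\ \beta & \text{otherwise}\end{cases}$$
and
$$\min(\alpha, \beta) = \begin{cases} \alpha & \text{if } \alpha \le \beta\\ \beta & \text{otherwise}\end{cases}$$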
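Proof. We delegate the details, for example that $\prec$ is a linear order, to the reader (Exercise VI.26).
Let us argue that it has MC. To this end, let $\emptyset \neq A \subseteq \mathrm{On} \times \mathrm{On}$. The class $\{\alpha \cup \beta : \langle\alpha, \beta\rangle \in A\}$ has a smallest member $\gamma$. This is realized as $\gamma = \alpha \cup \beta$ for some (perhaps several) $\langle\alpha, \beta\rangle \in A$. Among those, pick all with the smallest $\alpha$ (first component), i.e., setting
$$\mathcal{F}(\sigma, \tau) \overset{\mathrm{def}}{\equiv} \gamma = \sigma \cup \tau \wedge \langle\sigma, \tau\rangle \in A$$
form the class
$$\{\langle\alpha, \beta\rangle : \mathcal{F}(\alpha, \beta) \wedge (\forall\sigma)(\forall\tau)(\mathcal{F}(\sigma, \tau) \to \alpha \le \sigma)\} \qquad (1)$$
and finally pick in (1) that $\langle\alpha, \beta\rangle$ with the smallest $\beta$.
Let us verify that $\langle\alpha, \beta\rangle$ is $\prec$-minimal in $A$: If $\langle\sigma, \tau\rangle \prec \langle\alpha, \beta\rangle$ because $\sigma \cup \tau < \alpha \cup \beta = \gamma$, then $\langle\sigma, \tau\rangle \notin A$ by the choice of $\gamma$. Let it then be so because $\sigma \cup \tau = \alpha \cup \beta = \gamma$, but $\sigma < \alpha$. Then $\langle\sigma, \tau\rangle \notin A$ by the choice of $\alpha$. The last case to consider ($\sigma \cup \tau = \gamma$, $\sigma = \alpha$, $\tau < \beta$) also yields $\langle\sigma, \tau\rangle \notin A$, by the choice of $\beta$.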
VI.7.4 Remark. The unique function $J : \mathrm{On}^2 \to \mathrm{On}$ that effects the isomorphism of Theorem VI.7.3 is an instance of a pairing function on the ordinals, that is, a 1-1, total function $\mathrm{On}^2 \to \mathrm{On}$. This particular one is also onto; thus there is an inverse $J^{-1} : \mathrm{On} \to \mathrm{On}^2$. We reserve the letters $K, L$ to write $J^{-1} = \langle K, L\rangle$. The $K, L$ are the first and second projections of $J$ and satisfy $\langle K, L\rangle \circ J = 1_{\mathrm{On}^2}$ and $J \circ \langle K, L\rangle = 1_{\mathrm{On}}$ (the latter only because $J$ is onto). Thus,
$$K(J(\alpha, \beta)) = \alpha$$
and
$$L(J(\alpha, \beta)) = \beta$$
for all $\alpha, \beta$.
With the aid of $K, L$ we can enumerate all the pairs in $\mathrm{On}^2$ by the (total) function $\alpha \mapsto \langle K(\alpha), L(\alpha)\rangle$ on On. Note how each of $K$ and $L$ enumerates each ordinal $\sigma$ infinitely often (why?).
Pairing functions play an important role in recursion theory (from where
the notation is borrowed here). For a detailed account of computable pairing
functions on N see Tourlakis (1984).
What are pairing functions good for? Section VI.9 will exhibit a substantial
application. The next one will be given in Chapter VII. For now let us extend
the coding of pairs (of ordinals) that J effects into a coding of “vectors” of
ordinals (compare with III.10.4).
Thus, for each $\langle n, m\rangle \in \omega^2$, $J(n, m)$ is a successor (or $0 = J(0, 0)$); therefore $J[\omega^2] \subseteq \omega$. Now $J[\omega^2] = J(0, \omega)$ (Exercises VI.28 and VI.29); hence $J[\omega^2] \supseteq \omega$, by VI.7.6. Thus $\omega$ is a fixed point of $\lambda\alpha.J[\alpha^2]$.
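On $\omega \times \omega$ the order used in the proof of Theorem VI.7.3 (compare $\alpha \cup \beta = \max(\alpha, \beta)$ first, then the first component, then the second) can be enumerated explicitly. The Python sketch below is our own illustration, not part of the text: the closed form for `J` is obtained by counting, using the fact that exactly $g^2$ pairs of naturals have maximum below $g$, and `KL` is the inverse $\langle K, L\rangle$. Both names are hypothetical, and the sketch is claimed only for natural numbers, not for infinite ordinals.

```python
import math


def J(a: int, b: int) -> int:
    """Position of (a, b) when pairs of naturals are ordered by max, then first, then second component."""
    g = max(a, b)
    # The g*g pairs with maximum < g come first.
    if a < g:                   # then b == g: pairs (0, g), ..., (g - 1, g) come next
        return g * g + a
    return g * g + g + b        # a == g: pairs (g, 0), ..., (g, g) close the block


def KL(c: int) -> tuple:
    """Inverse of J: returns (K(c), L(c))."""
    g = math.isqrt(c)           # the block of pairs with maximum g occupies [g*g, (g + 1)**2)
    r = c - g * g
    return (r, g) if r < g else (g, r - g)


if __name__ == "__main__":
    print([KL(c) for c in range(9)])
    # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
    print([J(0, n) for n in range(5)])  # [0, 1, 4, 9, 16]
```

Since $J(0, n) = n^2$ is exactly the number of pairs with maximum below $n$, the finite picture agrees with $J[\omega^2] = J(0, \omega) = \omega$; and $K(J(a, b)) = a$, $L(J(a, b)) = b$ hold by construction.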
VI.8. Absoluteness
We expand here on the notions introduced in Section I.7. We will be interested
in exploring the phenomenon where inhabitants of universes M, possibly much
smaller than U M , can correctly tell that a sentence A is true in U M , even though
their knowledge goes no further than what is going on in their “world” M.
We start by repeating the definition of relativization of formulas, this time
in the specific context of L Set . Since we here use L Set as our interpretation
language, we go one step further and interpret ∈ as ∈ and U as U . Thus, we
restate below Definition I.7.3 under these assumptions.
The reader must have noticed that we now use $\mathcal{F}^M$ rather than $\mathcal{F}^{M(x)}$ (cf. I.7.3), as that is the normal practice in the context of set theory.
Recall that the primary logical connectives are $\neg$, $\vee$, and $\exists$. In contrast, $\forall, \wedge, \to, \leftrightarrow$ are defined symbols, which is why the above definition does not refer to them.
Clearly, if $\mathcal{F}$ is quantifier-free, then $\mathcal{F}^M$ is $\mathcal{F}$.
VI.8.3 Remark (“Truth” in $M$). We use the short argot “$\mathcal{F}(x_1, \ldots, x_n)$ is true in $M$” for the longer argot “$\mathcal{F}(x_1, \ldots, x_n)$ is true in $\mathcal{J} = (L_{\mathrm{Set}}, \mathrm{ZFC}', M)$”. We will often write this assertion as
$$\models_M \mathcal{F}(x_1, \ldots, x_n) \qquad (1)$$
We recall from I.7.4 the translation of the above argot, (1), where we use here “ZFC′” for some unspecified fragment of ZFC:
$$\vdash_{\mathrm{ZFC}'} x_1 \in M \wedge x_2 \in M \wedge \cdots \wedge x_n \in M \to \mathcal{F}^M(x_1, \ldots, x_n) \qquad (2)$$
The part “$x_1 \in M \wedge x_2 \in M \wedge \cdots \wedge x_n \in M \to$” in (2) is empty if $\mathcal{F}$ is a sentence.
Platonistically (semantically), “truth in $M$” is just that, in the sense of I.5. Indeed, the notation (1) states such truth from the semantic viewpoint as well. In this and the next section, however, our use of (1) is in the syntactic sense (2).
$$\models_M U(x)[\![a]\!] \text{ iff } \models_{U_N} U(x)[\![a]\!] \text{ iff (cf. VI.8.1) } \models_{U_N} U^M(x)[\![a]\!]$$
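VI.8.5 Example (Informal). Let $M = \{a, \{a\}, \{a, b\}\}$, where $a \neq b$ are urelements. Set $A = \{a\}$ and $B = \{a, b\}$. Clearly, $A \neq B$; hence also $(A \neq B)^M$. Yet,
$$\big((\forall x)(x \in A \leftrightarrow x \in B)\big)^M$$
that is,
$$(\forall x \in M)(x \in A \leftrightarrow x \in B)$$
holds, because the only object that distinguishes $A$ from $B$ is $b$, and $b \notin M$. Thus
$$\models_M (\forall x)(x \in A \leftrightarrow x \in B)$$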
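The failure of extensionality in the class $M$ of VI.8.5 can be checked mechanically. In this sketch (an illustrative encoding of our own) urelements are Python strings, sets are frozensets, and the hypothetical helper `holds_in` evaluates the relativized formula $(\forall x \in M)(x \in A \leftrightarrow x \in B)$ by letting $x$ range over $M$ only.

```python
a, b = "a", "b"                 # two distinct urelements
A = frozenset({a})
B = frozenset({a, b})
M = frozenset({a, A, B})        # M = {a, {a}, {a, b}}


def holds_in(universe, X, Y) -> bool:
    """Evaluate (forall x)(x in X <-> x in Y) with x restricted to `universe`."""
    return all((x in X) == (x in Y) for x in universe)


if __name__ == "__main__":
    print(A == B)             # False: A and B differ in the real universe
    print(holds_in(M, A, B))  # True: the only witness of the difference, b, is not in M
    print(b in M)             # False: M is not transitive (B is in M, b is not)
```

Letting `universe` contain `b` restores the distinction, which is the informal content of the example: a non-transitive $M$ can miss the very elements that tell two of its members apart.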
VI.8.6 Definition ($\Delta_0$-formulas). The set of the $\Delta_0$-formulas is the smallest subset of all formulas of $L_{\mathrm{Set}}$ that
(1) includes all the atomic formulas (of the types $x_i = x_j$, $U(x_i)$, $x_i \in x_j$), and
(2) is such that whenever the formulas $\mathcal{A}, \mathcal{B}$ are included, so are $(\neg\mathcal{A})$, $(\mathcal{A} \vee \mathcal{B})$, and $((\exists x_i \in x_j)\mathcal{A})$ for any variables $x_i, x_j$ ($x_i \not\equiv x_j$).
Y ∈ b ∧ A (Y, an )
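Thus, using the above terminology, VI.8.7 says that $\Delta_0$-formulas are absolute for transitive classes. Example VI.8.2 shows that (the term) $\{a, b\}$ is absolute for any class $M$.
If $\mathbf{T}(\vec u) = \{x : \mathcal{T}(x, \vec u)\}$ and $\mathcal{T}$ is absolute for $M$, then for $u_i \in M$,
$$\begin{aligned}
\mathbf{T}^M(\vec u) &= \{x \in M : \mathcal{T}^M(x, \vec u)\}\\
&= \{x \in M : \mathcal{T}(x, \vec u)\} &&\text{by absoluteness of } \mathcal{T}\\
&= \mathbf{T}(\vec u) \cap M
\end{aligned}$$
If moreover we know that $\mathbf{T}(\vec u) \subseteq M$ for all $u_i \in M$, then $\mathbf{T}(\vec u)$ is absolute for $M$, as happened in the special case $\mathbf{T}(x, y) = \{x, y\}$.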
In view of Definition VI.8.8, and inspecting the proof of VI.8.7, we can state
at once:
VI.8.9 Corollary. The set of formulas that are absolute for some class M is
closed under the Boolean connectives and the bounded quantifiers (∃x ∈ y)
and (∀x ∈ y).
Its relativization to $M$ is
$$(\exists x \in M)\mathcal{A}^M[x] \to (\exists x \in M)\big(\mathcal{A}^M[x] \wedge \neg(\exists y \in M)(y \in x \wedge \mathcal{A}^M[y])\big) \qquad (2)$$
using VI.8.1. Letting now $a_1 \in M, \ldots, a_n \in M$, where $\vec a_n$ are the free variables in (2), we can have a proof of (2) in ZF, for it is an instance of the schema “$\in$ (the relation) has MC over $M$”, a provable schema by the foundation axiom (cf. VI.1.25).
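The lemma can be strengthened by effecting the interpretation in “ZF − f”, that is, dropping foundation. We have to take a few precautions:
VI.8.12 Lemma. For any transitive class $M$ and class term $\mathbf{T}(\vec u)$,
$$y = \mathbf{T}^M(\vec u) \text{ iff } \big(y = \mathbf{T}(\vec u)\big)^M, \text{ for all } y, u_i \in M.$$
The proof can be carried out in ZF − f. The statement is in the customary argot, but it states an implication, from premises $y \in M$, $u_i \in M$, to the conclusion
$$y = \mathbf{T}^M(\vec u) \leftrightarrow \big(y = \mathbf{T}(\vec u)\big)^M$$
The last equivalence above uses the Leibniz rule, the tautology
$$\big(\mathcal{A} \to (\mathcal{B} \leftrightarrow \mathcal{C})\big) \leftrightarrow (\mathcal{A} \wedge \mathcal{B} \to \mathcal{C}) \wedge (\mathcal{A} \wedge \mathcal{C} \to \mathcal{B})$$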
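VI.8.13 Remark. The above result is useful: We often need to show that the $M$-relativization of the formula “$\{x : \mathcal{A}(x, \vec u)\}$ is a set” is provable. That is, we need to show, for $u_i \in M$ (free variables), the derivability of
$$\big((\exists y)\, y = \mathbf{A}(\vec u)\big)^M \qquad (1)$$
$$(\exists y \in M)\, y = \mathbf{A}^M(\vec u)$$
(i) $\mathcal{F}[\mathbf{T}(\vec u)]$ is absolute for $M$.
(ii) $\mathbf{S}[\mathbf{T}(\vec u)]$ is absolute for $M$.
and hence to
$$(\exists y \in M)\big(\mathcal{F}[y] \wedge y = \mathbf{T}(\vec u)\big) \qquad (3)$$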
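Thus,
$$\begin{aligned}
\mathbf{S}[\mathbf{T}(\vec u)]^M &= \{x \in M : \mathcal{S}^M[x, \mathbf{T}^M(\vec u)]\}\\
&= \{x \in M : \mathcal{S}^M[x, \mathbf{T}(\vec u)]\}, &&\text{by part (i)}^\dagger\\
&= \mathbf{S}^M[\mathbf{T}(\vec u)]\\
&= \mathbf{S}[\mathbf{T}(\vec u)], &&\text{by absoluteness of } \mathbf{S}
\end{aligned}$$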
VI.8.15 Example. In particular, for any transitive class $M$ that is $\{a, b\}$-closed, $\{a, \{a, b\}\}^M = \{a, \{a, b\}\}$ and $\{\{a\}, \{a, b\}\}^M = \{\{a\}, \{a, b\}\}$ are provable in ZF − f‡ for $a \in M$ and $b \in M$. In other words, for such a class either implementation of the ordered pair $\langle a, b\rangle$ (among the two that we have mentioned) is an absolute term.
VI.8.16 Lemma. The following are absolute for any transitive class $M$:
(i) $A \subseteq B$.
(ii) $A = \emptyset$.
(iii) $A$ is a pair (also, $A = \{x, y\}$).
(iv) $A$ is an ordered pair (also, $A = \langle x, y\rangle$).
Proof. Most of these will be left to the reader (Exercise VI.61). Let us sample a few:
(iii): $A$ is a pair: $(\exists x \in A)(\exists y \in A)(\forall z \in A)(z = x \vee z = y)$. This is a $\Delta_0$-formula, and hence absolute for any transitive class.
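(x): $A$ is a transitive set: $\neg U(A) \wedge (\forall y \in A)(\forall x \in y)\, x \in A$.
(xiv): $A \in \omega$: “$A$ is an ordinal $\wedge$ $A$ is a successor or $0$ $\wedge$ $(\forall x \in A)\, x$ is a successor or $0$” is a $\Delta_0$-formula.
(xxi): Hint. $\langle x, y\rangle = \{x, \{x, y\}\}$. Thus, if $\langle x, y\rangle \in A$, then $y \in \bigcup\bigcup A$.
(1): $A \times B = \{z : (\exists x \in A)(\exists y \in B)\, z = \langle x, y\rangle\}$. Since the defining formula is $\Delta_0$, $(A \times B)^M = (A \times B) \cap M = A \times B$, since $M$ is $\{x, y\}$-closed (and hence $\langle x, y\rangle$-closed).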
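VI.8.17 Example. Let $M$ be a transitive class that is closed under pairs (hence also under ordered pairs), and $\mathbf{R} = \{\langle x, y\rangle : \mathcal{R}(x, y)\}$ a relation, where $\mathcal{R}$ is absolute for $M$. We calculate $\mathbf{R}^M$, noting that (cf. III.8.7)
$$\mathbf{R} = \{z : (\exists x)(\exists y)(\langle x, y\rangle = z \wedge \mathcal{R}(x, y))\}$$
We have
$$\begin{aligned}
\mathbf{R}^M &= \{z \in M : (\exists x \in M)(\exists y \in M)(\langle x, y\rangle = z \wedge \mathcal{R}(x, y))\}\\
&= \{z : (\exists x)(\exists y)(\langle x, y\rangle = z \wedge z \in M \wedge x \in M \wedge y \in M \wedge \mathcal{R}(x, y))\}\\
&= \{z : (\exists x)(\exists y)(\langle x, y\rangle = z \wedge x \in M \wedge y \in M \wedge \mathcal{R}(x, y))\}\\
&= \{\langle x, y\rangle : x \in M \wedge y \in M \wedge \mathcal{R}(x, y)\}\\
&= \{\langle x, y\rangle \in M \times M : \mathcal{R}(x, y)\}\\
&= \mathbf{R} \cap (M \times M)\\
&= \mathbf{R}|M
\end{aligned}$$
The third “=” stems from the assumption that $M$ is closed under pairs, which leads to the equivalence
$$\langle x, y\rangle = z \wedge z \in M \wedge x \in M \wedge y \in M \leftrightarrow \langle x, y\rangle = z \wedge x \in M \wedge y \in M$$
With some practice one tends to shrug off calculations such as the above and write $\mathbf{R}^M = \{\langle x, y\rangle \in M \times M : \mathcal{R}(x, y)\}$ directly.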
$$x \in M \to y \in M \to z \in M \to \mathcal{R}(x, y) \wedge \mathcal{R}(x, z) \to y = z$$
or (by VI.8.1)
$$x \in M \to y \in M \to z \in M \to \big(\mathcal{R}(x, y) \wedge \mathcal{R}(x, z) \to y = z\big)^M \qquad (2')$$
It follows from (3) that to obtain $\mathbf{R}(a) \simeq \mathbf{R}^M(a)$, for all $a \in M$ (the absoluteness condition), we equivalently need the provability of
$$a \in M \to \mathbf{R}(a)\!\downarrow\ \to \mathbf{R}(a) \in M$$
that is, that $M$ is $\mathbf{R}$-closed in a sense weaker than that in VI.8.1: $\mathbf{R}[M] \subseteq M$.
In terms of $\mathcal{R}$ we need the provability of
$$x \in M \to \mathcal{R}(x, y) \to y \in M \qquad (4)$$
(i) $\mathcal{R}$ is absolute.
(ii) (1) (hence, trivially by (i), (2)) is provable.
(iii) (4) is provable.
$$(\forall x)(\exists! y)\,\mathcal{R}(x, y) \qquad (6)$$
Note that (6) combines (1) with $(\forall x)(\exists y)\,\mathcal{R}(x, y)$ (no “!”). Thus, our conditions for absoluteness of $\mathbf{R}$, that is, (i)–(iii), simplify to just (i) along with requiring the provability of†
$$\big((\forall x)(\exists! y)\,\mathcal{R}(x, y)\big)^M \qquad (7)$$
Of course, one can go back and forth between a total informal $\mathbf{R}$ and a formal $R$ (cf. III.11.20); hence the two conditions (i) and (7) constitute all we need for absoluteness of a total function $R$.
Finally, an interesting subcase of the formal $R$ is that of a constant $c$ (a “0-ary function symbol”) defined by an absolute for $M$ formula, $\mathcal{C}(y)$, as
$$c = y \leftrightarrow \mathcal{C}(y) \qquad (8)$$
In this case the conditions of absoluteness for $c$ are those of $\mathcal{C}$, and the requirement that the $M$-relativization of (9) be provable. For example, if $\omega \in M$, then $\omega^M = \omega$ by VI.8.16(xv).
Not all absoluteness results follow from ascertaining that our formulas are
$\Delta_0$. The following is an important example that does not so follow, which uses
some terminology (finiteness) from the sequel, whence the .
is absolute for any transitive class $M$ that satisfies a bit of ZFC. Namely, we want $M$ to be closed under pairs and to contain $\omega$ ($\omega \in M$). We start by observing that the formula to the right of $(\exists f)$ is $\Delta_0$. Indeed, in view of VI.8.16 one need only verify that
† Thus, one may introduce the function symbol $R^M$ so that $x \in M \to (y = R^M(x) \leftrightarrow \mathcal{R}^M(x, y))$ and $x \notin M \to R^M(x) = \emptyset$ are provable.
Pause. How much ZFC did we employ in the above proof? Was the assump-
tion ω ∈ M an overkill? If so, what would be a weaker assumption that still
works?
† Therefore, J = (L Set , ZF, M) is the model, but it is a common abuse of terminology to say that
M is.
Consulting the list VI.8.16 and also invoking VI.8.14, we observe that, for $\alpha$ and $y$ in $M$, the relativization of (5), within provable equivalence, introduces only one annoying part, namely, $(\exists! f \in M)$.‡
Let then $\alpha \in M$ and $y \in M$. The relativization of (5) is (provably equivalent to)
$$(\exists! f \in M)\big(f \text{ is a function} \wedge \mathrm{dom}(f) = \alpha \cup \{\alpha\} \wedge (\forall\beta \in \alpha \cup \{\alpha\})\,\mathcal{G}(f \restriction \beta, f(\beta)) \wedge \langle\alpha, y\rangle \in f\big) \qquad (6)$$
† Actually, no detailed proof was given for this particular statement. The detailed proof that applies
here as well, with only notational modifications, was given in VI.2.25 for a more general case of
recursion, not just for recursion over On.
‡ Note that the term $f \restriction x$ is short for $\{z : \pi(z) \in x \wedge z \in f\}$, that is, $\{z : ((\exists y \in x)\, y = \pi(z)) \wedge z \in f\}$, and is thus absolute, and defined in $M$ on the assumptions $x \in M$ and $f \in M$. Thus VI.8.14 applies.
which we abbreviate as
$$(\exists! f \in M)\,\mathcal{C}(f, \alpha, y) \qquad (6')$$
in what follows. We will have shown (3) if we prove, under our underlined assumptions, that (5) → (6′), the other direction being trivial.
Add then (5) as an assumption, as well as a new constant $g$ and the assumption
$$\mathcal{C}(g, \alpha, y) \qquad (5')$$
In VI.5.16 we gave a proof in ZF that
$$(\forall\alpha)(\exists! y)(\exists! f)\,\mathcal{C}(f, \alpha, y) \qquad (7)$$
Since $M$ is a formal model of ZF, and being mindful of the relativization claims we made a few steps back, we have (by I.7.9) a ZF-proof of
$$(\forall\alpha \in M)(\exists! y \in M)(\exists! f \in M)\,\mathcal{C}(f, \alpha, y) \qquad (8)$$
Specializing (8), we derive $(\exists y \in M)(\exists f \in M)\,\mathcal{C}(f, \alpha, y)$.
Thus, adding two new constants $c$ and $h$, we may add also
$$\mathcal{C}(h, \alpha, c) \qquad (9)$$
and
$$h \in M \qquad (10)$$
Now, the “!”-notation in (7) is short for $(\forall\alpha)(\exists y)(\exists f)\,\mathcal{C}(f, \alpha, y)$ and
$$\mathcal{C}(f, \alpha, y) \to \mathcal{C}(f', \alpha, y') \to f = f' \wedge y = y' \qquad (11)$$
Thus, since we have (7), the above and (5′) yield $y = c$ and $h = g$; hence, from (10), $g \in M$. We have derived (cf. (5′))
$$g \in M \wedge \mathcal{C}(g, \alpha, y)$$
and hence (6′) by the substitution axiom (the “!” is inserted by (11)).
So $y = F(\alpha)$ is absolute. We need to establish (4) to show that $F$ is. In fact, (4) is a direct result of (8), which also yields $\mathrm{dom}(F^M) = M \cap \mathrm{On} = \mathrm{On}^M$.
VI.8.23 Exercise (Induction on $TC(x)$). Prove that for any formula $\mathcal{F}(x)$,†
$$\vdash_{\mathrm{ZF}} (\forall x)\big((\forall y \in TC(x))\mathcal{F}(y) \to \mathcal{F}(x)\big) \to (\forall x)\mathcal{F}(x)$$
That is, to prove $\mathcal{F}(x)$ we are helped by the assumption $(\forall y \in TC(x))\mathcal{F}(y)$ that we can add for free.
Hint. Start by assuming the hypothesis and proving using foundation (equivalently, $\in$-induction) that $\vdash_{\mathrm{ZF}} (\forall x)(\forall y \in TC(x))\mathcal{F}(y)$.
where we have written G for the function given by y = G(x) ↔ G (x, y) (we
could also have introduced a formal G).
Hint. Imitate the proof of VI.2.25. Start by showing that y = F(x) (or
y = F(x)) must stand for
(∃! f )C ( f, x, y)
where C ( f, x, y) abbreviates
† This works in weaker set theories. For example, neither infinity nor power set axioms are required.
$$\vdash_{\mathrm{ZF}} (\forall x)(\exists! y)(\exists! f)\,\mathcal{C}(f, x, y)$$
Once the fact that F (or F) can be introduced has been established, show that F
is absolute for any transitive formal model of ZF for which G is absolute. This
will imitate the work in VI.8.20. We know that T C(x) is absolute by VI.8.21. We
need to worry about things such as (∃y ∈ T C(x)), y ∈ T C(x), and dom( f ) =
T C(x).
VI.8.26 Exercise. Prove in ZF that the function $J$ as well as the various $\pi^n_i$ of Section VI.7 are absolute for transitive formal models of ZF.
Hint. J satisfies the recurrence
or
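$$(\forall x \in \mathrm{On})(\forall y \in \mathrm{On})\, J(x, y) = \mathrm{ran}\big(J \restriction \{p : p \prec \langle x, y\rangle\}\big)$$
where $\prec$ denotes the order on $\mathrm{On}^2$ of Theorem VI.7.3.
Absoluteness follows (after some work) from our standard technique that proves the existence of recursively defined functions: Start by proving in ZF that $J(\alpha, \beta) = \gamma$ must be given by
$$(\exists! f)\,\mathcal{C}(f, \alpha, \beta, \gamma)$$
where
$$\begin{aligned}
\mathcal{C}(f, \alpha, \beta, \gamma) \leftrightarrow{}& f \text{ is a function} \wedge \mathrm{dom}(f) = \{p : p \prec \langle\alpha, \beta\rangle\} \cup \{\langle\alpha, \beta\rangle\} \wedge{}\\
&(\forall w \in \mathrm{dom}(f))\, f(w) = \mathrm{ran}\big(f \restriction \{p : p \prec w\}\big) \wedge{}\\
&f(\langle\alpha, \beta\rangle) = \gamma
\end{aligned}$$
As in all previous cases where this technique was employed, $f$ simply “codes” the computation that verifies $J(\alpha, \beta) = \gamma$. Now you will need a few absoluteness lemmata to conclude your case. For example, you will need the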
VI.9. The Constructible Universe 395
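absoluteness of $x \in \{p : p \prec \langle\alpha, \beta\rangle\}$. This is equivalent to
$$\begin{aligned}
&OP(x) \wedge \pi(x) \in \mathrm{On} \wedge \delta(x) \in \mathrm{On} \wedge \Big(\big((\pi(x) \in \alpha \vee \pi(x) \in \beta) \wedge (\delta(x) \in \alpha \vee \delta(x) \in \beta)\big) \vee{}\\
&\quad \big(\pi(x) \cup \delta(x) = \alpha \cup \beta \wedge (\pi(x) < \alpha \vee (\pi(x) = \alpha \wedge \delta(x) < \beta))\big)\Big)
\end{aligned}$$
etc.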
† These are in VN (α + 1) = P(N ∪ VN (α)) when we start with the set of urelements N .
‡ Thus, J will be a so-called ∈-model, or more accurately, (U, ∈)-model.
in Shoenfield (1967), but with some departures for convenience and user-
friendliness).
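VI.9.1 Definition (The Gödel Operations). We call the terms $F_i$ below the Gödel operations:
$$\begin{aligned}
F_0(x, y) &= x - y\\
F_1(x, y) &= x \cap \mathrm{dom}(y)\\
F_2(x, y) &= \{z \in x : U(z)\}\\
F_3(x, y) &= \{\langle u, v\rangle \in x : u = v\}\\
F_4(x, y) &= \{\langle u, v\rangle \in x : u \in v\}\\
F_5(x, y) &= \{\langle u, v\rangle \in x : \langle v, u\rangle \in y\}\\
F_6(x, y) &= \{\langle u, v, w\rangle \in x : \langle v, w, u\rangle \in y\}\\
F_7(x, y) &= \{\langle u, v, w\rangle \in x : \langle u, w, v\rangle \in y\}\\
F_8(x, y, z) &= x \cap (y \times z)\\
F_9(x, y) &= \{x, y\}
\end{aligned}$$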
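To see the ten operations act, here is a small Python sketch on finite objects. It rests on assumptions of our own: urelements are strings, sets are frozensets, ordered pairs are modelled by Python 2-tuples rather than by sets, and, following the left-to-right association used in this section, a triple $\langle u, v, w\rangle$ is taken to be $\langle\langle u, v\rangle, w\rangle$. The function names merely mirror $F_0, \ldots, F_9$; the helpers `is_pair`, `is_triple`, and `dom` are hypothetical, and all set arguments are assumed to be frozensets.

```python
def is_pair(z) -> bool:
    return isinstance(z, tuple) and len(z) == 2


def is_triple(z) -> bool:
    # <u, v, w> is modelled as ((u, v), w)
    return is_pair(z) and is_pair(z[0])


def dom(y: frozenset) -> frozenset:
    return frozenset(z[0] for z in y if is_pair(z))


def F0(x, y):
    return x - y


def F1(x, y):
    return x & dom(y)


def F2(x, y):
    # U(z): z is an urelement, i.e. a string in this encoding
    return frozenset(z for z in x if isinstance(z, str))


def F3(x, y):
    return frozenset(z for z in x if is_pair(z) and z[0] == z[1])


def F4(x, y):
    # <u, v> with u in v; v must be a set (frozenset) for membership to make sense
    return frozenset(z for z in x if is_pair(z) and isinstance(z[1], frozenset) and z[0] in z[1])


def F5(x, y):
    return frozenset(z for z in x if is_pair(z) and (z[1], z[0]) in y)


def F6(x, y):
    # keep <u, v, w> in x whose rotation <v, w, u> lies in y
    return frozenset(z for z in x if is_triple(z) and ((z[0][1], z[1]), z[0][0]) in y)


def F7(x, y):
    # keep <u, v, w> in x such that <u, w, v> lies in y
    return frozenset(z for z in x if is_triple(z) and ((z[0][0], z[1]), z[0][1]) in y)


def F8(x, y, z):
    # x ∩ (y × z)
    return frozenset(p for p in x if is_pair(p) and p[0] in y and p[1] in z)


def F9(x, y):
    return frozenset({x, y})


if __name__ == "__main__":
    x = frozenset({("a", "a"), ("a", "b")})
    print(F3(x, frozenset()))               # frozenset({('a', 'a')})
    print(F5(x, frozenset({("b", "a")})))   # frozenset({('a', 'b')})
```

The tuple encoding is only a convenience for experimentation; it is not the set-theoretic implementation of $\langle u, v\rangle$ used in the text.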
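† This means the following: On one hand, trivially,
$$\vdash_{\mathrm{ZF}} (\exists x)(\exists y)\big(\neg U(x) \wedge (\forall z \in x)U(z) \wedge y \text{ is a 1-1 function} \wedge \mathrm{dom}(y) \in \mathrm{On} \wedge \mathrm{ran}(y) = x\big)$$
For example, $x = y = \emptyset$ work, and then we can invoke the substitution axiom. Now one can introduce new constants $N$ and $f$ along with the assumption (cf. p. 75),
$$\neg U(N) \wedge (\forall z \in N)U(z) \wedge f \text{ is a 1-1 function} \wedge \mathrm{dom}(f) \in \mathrm{On} \wedge \mathrm{ran}(f) = N$$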
The previous recursive definition is appropriate, since $\pi^4_i(\alpha) \le \alpha < \alpha + 1$ for $i = 1, 2, 3, 4$ (see VI.7.7). It uses ordinals to systematically iterate the Gödel operations “as long as possible”. In this section we work in ZF; thus we had to ask that $N$ be well-ordered (in ZFC, of course, every set is well-orderable by Zermelo’s theorem). The subcase “$\vee\ \alpha = 0$” (part of the case “for $\alpha \ge \|N\|$”) takes care of the situation where $N = \emptyset$. Then $F_0 = \emptyset$.
The reader will want to compare the above definition with Definition VI.6.1.
L N parallels U N (rather than V N ). The major differences between the two
definitions are:
(1) Instead of using the power set operation at successor ordinal stages, we
are using explicit (Gödel) operations that are much “weaker” than forming
power sets, to construct one member of the hierarchy at a time.
(2) The urelements are not given at once (stage 0), but are “built” one at a time; it takes $\|N\|$ steps to have them all.
Between successive limit ordinals we ensure that each case (among the
ten Gödel operations) gets “equal opportunity” to apply (at successor ordinal
stages), by using a technique recursion theorists call “dovetailing”.†
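† For each case, according as $\pi^4_4(\alpha) = 0, 1, \ldots, 8$, or $> 8$, all pairs $\langle\pi^4_1(\alpha), \pi^4_2(\alpha)\rangle$ and all triples $\langle\pi^4_1(\alpha), \pi^4_2(\alpha), \pi^4_3(\alpha)\rangle$ will be considered, since $\alpha \mapsto \langle\pi^4_1(\alpha), \pi^4_2(\alpha), \pi^4_3(\alpha), \pi^4_4(\alpha)\rangle$ is onto, as follows from VI.7.5 by the observation that $\langle K, L\rangle$ is onto.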
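Then
$$y \in x \subseteq F_{\pi^4_1(\alpha)}$$
by VI.9.1. By the obvious I.H. (note that $\mathrm{ord}(F_{\pi^4_1(\alpha)}) \le \pi^4_1(\alpha) \le \alpha < \alpha + 1$), we have $y \in L_N$ and $\mathrm{ord}(y) < \mathrm{ord}(F_{\pi^4_1(\alpha)}) < \alpha + 1 = \mathrm{ord}(x)$.
Finally, let $y \in x = F_{\alpha+1} = F_9(F_{\pi^4_1(\alpha)}, F_{\pi^4_2(\alpha)})$. Then $y = F_{\pi^4_j(\alpha)}$ for $j = 1$ or $j = 2$; hence $\mathrm{ord}(y) = \mathrm{ord}(F_{\pi^4_j(\alpha)}) \le \pi^4_j(\alpha) \le \alpha < \alpha + 1$. Moreover, $y \in L_N$, since it is an “$F_\beta$”.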
Proof. Straightforward index computations, and VI.9.2, yield this claim. For example, say that $x, y$ (sets or atoms) are in $L_N$, so that $x = F_\alpha$ and $y = F_\beta$. Then $J_4(\alpha, \beta, \|N\|, 9) \ge \|N\|$ (by VI.7.7) and
Similarly, if $x, y$ are sets with orders as above, $x \cap \mathrm{dom}(y) = F_{J_4(\alpha, \beta, 1, 1) + 1}$ (why is $J_4(\alpha, \beta, 1, 1) + 1 \ge \|N\|$?). The remaining cases are left to the reader.
This is a theorem schema: one theorem for each n ∈ N. The proof is by informal
induction on n.
(∀z ∈ x)(∃α){z} = Fα
will do.
Pause. But what about x ∩ y when, say, U (x)? We have said in Chapter III that
the formal operations (∩, ∪, −) are total, so they make sense on atoms (in N )
too.
$$F_{10}(x) \subseteq y_1$$
$$F_{11}(x) \subseteq y_2$$
$$F_{12}(x) \subseteq y_3$$
Thus, $F_{10}(x) = F_5(y_1, x)$, $F_{11}(x) = F_6(y_2, x)$, and $F_{12}(x) = F_7(y_3, x)$. We are done by VI.9.5.
The following lemma shows that introducing dummy variables does not take
us out of L N . It forms the fundamental step of the main result of this section, that
L N is a model of ZFC, in that it helps to show that L N satisfies the separation
axiom.
is in L N .
if i < j, and
otherwise.
For the induction step we consider cases.
Case $n \notin \{i, j\}$. By I.H.,
is in L N . But then so is
by VI.9.8.
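Case $n \in \{i, j\}$ and $i, j$ are consecutive integers. If $n = 2$, then we are back to the basis step, so assume $n > 2$. Now, observe that $\langle u_1, \ldots, u_{n-1}, u_n\rangle = \langle\langle u_1, \ldots, u_{n-2}\rangle, u_{n-1}, u_n\rangle$, and set
$$\begin{aligned}
F^{(3)}\big(a_{n-1}, a_n, (a_1 \times \cdots \times a_{n-2})\big) = \{&\langle u_{n-1}, u_n, \langle u_1, \ldots, u_{n-2}\rangle\rangle \in (a_{n-1} \times a_n)\\
&\times (a_1 \times \cdots \times a_{n-2}) : \langle u_i, u_j\rangle \in b\}
\end{aligned}$$
Clearly, $F^{(3)}\big(a_{n-1}, a_n, (a_1 \times \cdots \times a_{n-2})\big) \in L_N$ by the previous case (and VI.9.8),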
and
-
F (n) (a , . . . , a ) = {u , . . . , u , u
1 n 1 n n−1 ∈
a1 × · · · × an × an−1 : u i , u j ∈ b}
is in L N .
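Case $i = j$, finally. By VI.9.5 and VI.9.8,
$$F_{\mathcal{A}}(a_1, \ldots, a_n) = \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : \mathcal{A}^{L_N}(\vec u_n)\}$$
is in $L_N$.
$$\begin{aligned}
F_{\mathcal{A}}(a_1, \ldots, a_n) &= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : U(u_i)\}\\
&= a_1 \times \cdots \times a_{i-1} \times F_2(a_i, a_i) \times a_{i+1} \times \cdots \times a_n
\end{aligned}$$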
† Of course, “×” associates left to right, and we omitted brackets to avoid cluttering the notation.
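$$\begin{aligned}
F_{\mathcal{A}}(a_1, \ldots, a_n) &= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : u_i = u_j\}\\
&= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : \langle u_i, u_j\rangle \in F_3(a_i \times a_j, a_i)\}
\end{aligned}$$
$$\begin{aligned}
F_{\mathcal{A}}(a_1, \ldots, a_n) &= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : u_i \in u_j\}\\
&= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : \langle u_i, u_j\rangle \in F_4(a_i \times a_j, a_i)\}
\end{aligned}$$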
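$$F_{\mathcal{B}}(a_1, \ldots, a_n) = \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : \mathcal{B}^{L_N}(\vec u_n)\}$$
$$F_{\mathcal{A}}(a_1, \ldots, a_n) = a_1 \times \cdots \times a_n - F_{\mathcal{B}}(a_1, \ldots, a_n)$$
$$F_{\mathcal{B}}(a_1, \ldots, a_n) = \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : \mathcal{B}^{L_N}(\vec u_n)\}$$
and
$$F_{\mathcal{C}}(a_1, \ldots, a_m) = \{\langle u_1, \ldots, u_m\rangle \in a_1 \times \cdots \times a_m : \mathcal{C}^{L_N}(\vec u_m)\}$$
are in $L_N$. Since $(\mathcal{B} \vee \mathcal{C})^{L_N}$ is $\mathcal{B}^{L_N} \vee \mathcal{C}^{L_N}$,
and the result follows from VI.9.5 and VI.9.8 (“$\times\, a_{m+1} \times \cdots \times a_n$” above is absent if $n = m$).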
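Finally, Case $\mathcal{A}(\vec u_n)$ is $(\exists y)\mathcal{B}(\vec u_n, y)$. By I.H.,
$$F_{\mathcal{B}}(a_1, \ldots, a_n, b) = \{\langle u_1, \ldots, u_n, y\rangle \in a_1 \times \cdots \times a_n \times b : \mathcal{B}^{L_N}(\vec u_n, y)\} \qquad (1)$$
$$\begin{aligned}
F_{\mathcal{A}}(a_1, \ldots, a_n) &= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : ((\exists y)\mathcal{B}(\vec u_n, y))^{L_N}\}\\
&= \{\langle u_1, \ldots, u_n\rangle \in a_1 \times \cdots \times a_n : (\exists y \in L_N)\mathcal{B}^{L_N}(\vec u_n, y)\}
\end{aligned}$$
$\vec u_n \in a_1 \times \cdots \times a_n$.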
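Let us prove (4). Let
Case 1. $(\exists y)\big(y \in L_N \wedge \mathcal{B}^{L_N}(\vec u_n, y)\big)$. Then (4) follows by tautological implication and $\exists$-monotonicity.
Case 2. $\neg(\exists z)\big(z \in L_N \wedge \mathcal{B}^{L_N}(\vec u_n, z)\big)$. Note that $\emptyset$ is constructible: If $N \neq \emptyset$, then $\emptyset = p - p$,† where $p \in N$ ($p$ is, of course, in $L_N$). If $N = \emptyset$, then $\emptyset = F_0$. Thus $\emptyset \in L_N \wedge \emptyset = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec u_n, z)$ is provable; hence so is $(\exists y)\big(y \in L_N \wedge y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec u_n, z)\big)$ by the substitution axiom.
Now (4) follows once more by tautological implication and $\exists$-monotonicity.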
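By collection,‡ there is a set $A$ (new constant) such that
$$(\forall \vec u_n \in a_1 \times \cdots \times a_n)(\exists y \in A)\Big(\big(y \in L_N \wedge \mathcal{B}^{L_N}(\vec u_n, y)\big) \vee \big(y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec u_n, z)\big)\Big)$$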
† Recall that the formal difference makes sense on atoms. Cf. III.4.16.
‡ “y ∈ L N ” is, of course, the set theory formula “(∃α)y = Fα ” – see Definition VI.9.2.
N.B. Actually, L Set and ZF contain N and f and their axiom (VI.9.2).
Proof. The proof is in ZF. First off, $L_N \neq \emptyset$. Indeed, we have shown in the course of the previous proof that $\emptyset \in L_N$. We verify now the ZFC axioms.
(1) Extensionality holds in L N by VI.9.4 and VI.8.10.
(2) For the axiom
U (b) → ¬(∃x)x ∈ b
we want
b ∈ L N → U (b) → ¬(∃x ∈ L N )x ∈ b
which follows from the ZF version preceding it.
(3) The axiom of separation says that
$$\mathbf{A}(\vec u_{i-1}, a, u_{i+1}, \ldots, u_n) = \{u_i : u_i \in a \wedge \mathcal{P}(\vec u_n)\}$$
is a set (parametrized by the free variables $a$ and $u_j$, $1 \le j < i \vee i < j \le n$) for any formula $\mathcal{P}$. The relativized version asserts that
$$\mathbf{A}^{L_N}(\vec u_{i-1}, a, u_{i+1}, \ldots, u_n) = \{u_i \in L_N : u_i \in a \wedge \mathcal{P}^{L_N}(\vec u_n)\} \qquad (i)$$
is constructible from $N$ whenever $a$ and $u_j$, $1 \le j < i \vee i < j \le n$, are (cf. VI.8.13).
So, we let $a$ and $u_j$, $1 \le j < i \vee i < j \le n$, be in $L_N$ and prove that $\mathbf{A}^{L_N}(\vec u_{i-1}, a, u_{i+1}, \ldots, u_n) \in L_N$. By transitivity of $L_N$, (i) simplifies to
$$\mathbf{A}^{L_N}(\vec u_{i-1}, a, u_{i+1}, \ldots, u_n) = \{u_i : u_i \in a \wedge \mathcal{P}^{L_N}(\vec u_n)\} \qquad (i')$$
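is constructible. Hence
$$\mathbf{A}^{L_N}(\vec u_{i-1}, a, u_{i+1}, \ldots, u_n) = \begin{cases} \mathrm{ran}\,\underbrace{\mathrm{dom} \cdots \mathrm{dom}}_{n-i \text{ terms}}(F_{\mathcal{P}}) & \text{if } i > 1\\ \underbrace{\mathrm{dom} \cdots \mathrm{dom}}_{n-1 \text{ terms}}(F_{\mathcal{P}}) & \text{otherwise}\end{cases}$$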
$$\{x \in L_N : U(x)\} \in L_N$$
In view of the already noted $sp(F_\alpha) \subseteq N$ and $N \subseteq L_N$ ($N = \{F_\alpha : \alpha < \|N\|\}$), the above translates to $N \in L_N$. Now, since $N \subseteq L_N$ (and $N$ is a set), there is, by VI.9.7, an $N$-constructible set $A$ such that $N \subseteq A$. But then $N = \{x \in A : U(x)\}$; hence it is constructible by $L_N$-separation ((3) above).
(5) The pairing axiom: For any atoms or sets $a$ and $b$, there is a set $c$ such that $a \in c$ and $b \in c$. Thus, we want (by VI.8.13) to show that $\{a, b\}$ is defined in $L_N$. Since $\{a, b\}^{L_N} = \{a, b\}$, this follows from VI.9.5.
(6) The union axiom states, essentially, that if $A$ is a set, then $\bigcup A$ is a set. For the $L_N$ version we need $\bigcup$ to be defined in $L_N$ (again by VI.8.13). Since $\bigcup$ (by VI.8.16 and VI.9.4) is absolute for $L_N$, we need to show that $L_N$ is $\bigcup$-closed. For $A \in L_N$, $\bigcup A = \{x : (\exists y \in A)\, x \in y\} \subseteq L_N$ by VI.9.3. Hence $\bigcup A \subseteq b \in L_N$ for some $b$ (by VI.9.7), and $\bigcup A \in L_N$ by $L_N$-separation.
(7) Foundation holds in L N by VI.8.11.
(8) Collection says that for any set A and formula P [x, y],
Proof. By our previous construction and the results of Section I.7. Note that the
auxiliary constant metatheorem is being invoked, since neither the hypothesis
nor the conclusion refers to the new constants $N$ and $f$ introduced as per the footnote on p. 396 (cf. p. 75).
VI.9.15 Exercise. Prove that all Gödel operations, including the three derived
ones, are absolute for transitive models of ZF. Are they absolute for any other
classes?
VI.9.16 Exercise. Prove that the function λα.Fα is absolute for L N when the
constants N , f are interpreted as themselves.
Hint. This follows from VI.9.15 and techniques in Section VI.8 (cf. in par-
ticular VI.8.20 and VI.8.26). Note that f : dom( f ) → N is in L N . This is so
by dom( f ) ∈ On ⊆ L N , f ⊆ dom( f ) × N , and closure of L N under × (now
apply L N -separation).
The axiom of constructibility says that all objects are in L N , or all objects
are constructible.‡ Formally then it is
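$$(\forall x)(\exists\alpha)\, x = F_\alpha \qquad (V = L)$$
$$(\forall x \in L_N)(\exists\alpha)\, x = F_\alpha$$
$$\vdash_{\mathrm{ZF}} \mathcal{A}^{L_N} \qquad (1)$$
$$\vdash_{\mathrm{ZF}+(V=L)} \mathcal{A} \qquad (2)$$
$$\vdash_{\mathrm{ZF}} V = L \to \mathcal{A} \qquad (3)$$
$$\models_{\mathcal{J}} (V = L) \to \mathcal{A}$$
But $\models_{\mathcal{J}} (V = L)$ by VI.9.17; thus $\models_{\mathcal{J}} \mathcal{A}$; that is, we have derived (1).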
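Conversely, suppose we have proved (1). We show that (2) follows. To this end we show by induction on formulas that
$$\vdash_{V=L} \mathcal{A} \leftrightarrow \mathcal{A}^{L_N} \qquad (4)$$
We just check the interesting case where $\mathcal{A} \equiv (\exists x)\mathcal{B}$. Now $\mathcal{A}^{L_N} \equiv (\exists x)\big((\exists\alpha)\, x = F_\alpha \wedge \mathcal{B}^{L_N}\big)$, while
$$\vdash_{V=L} (\exists x)\big((\exists\alpha)\, x = F_\alpha \wedge \mathcal{B}^{L_N}\big) \leftrightarrow (\exists x)\big((\exists\alpha)\, x = F_\alpha \wedge \mathcal{B}\big) \qquad (5)$$
by the I.H. But $\vdash_{V=L} \big((\exists\alpha)\, x = F_\alpha \wedge \mathcal{B}\big) \leftrightarrow \mathcal{B}$, since the hypothesis $(\forall x)(\exists\alpha)\, x = F_\alpha$ implies $(\exists\alpha)\, x = F_\alpha$. This and (5) yield (4) via the Leibniz rule.
Thus, if we have (1), then we also have $\vdash_{\mathrm{ZF}+(V=L)} \mathcal{A}^{L_N}$, and hence (2), by (4).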
The moral, in plain English, is:
VI.9.19 Exercise. Prove that L N is the ⊆-smallest proper class (formal) model
of ZF among those that contain N . In particular, L (cf. VI.9.2) is the ⊆-smallest
proper class (formal) model of ZF.
Hint. Indeed, let M be a proper class model of ZF where N ∈ M. First show
that $\mathrm{On} \subseteq M$: Let $\alpha \in \mathrm{On}$. Look for a $\beta \in M$ such that $\alpha \le \beta$. For example, as $M$ is not a set, pick an $x \in M - (N \cup V_N(\alpha))$. By VI.8.25, $\rho(x) \in M$. This is a good enough $\beta$. Conclude by computing $L_N^M$. Towards this you will need
that λα.Fα is absolute for M, in particular, that f of VI.9.2 is in M. The latter
is argued as in VI.9.16, using the absoluteness of × (cf. VI.8.16).
VI.9.20 Remark. Our discussion has focused so far on “large” models M, i.e.,
proper class models. These have the advantage of containing all the ordinals.
One is also interested in “small” transitive (U, ∈)-models of ZF, M, where M is a
set. We can easily adjust the constant N of p. 396 so that the related assumption is
of length, intuitively, α · β.
Let us first make addition of ordinals precise by extending the recursive
definition of addition over ω (V.1.22).†
In the definition we use “+” with potentially two different meanings: The new
meaning is given in the definition. The old meaning is in the use of “+1” to
mean ordinal successor. It will turn out that the two meanings are consistent.
Proof. By VI.5.38 and the weak continuity of λβ.α + β, we only need to show
that α + β < α + (β + 1). But, by VI.10.1, this translates to
α + β < (α + β) + 1
= (α + β) ∪ {α + β}
Proof. By VI.5.40.
We next prove the analogue of V.1.25, which shows that Definition VI.10.1
indeed captures the intuitive meaning of (1) in the preamble to this section.
Proof. We do induction on β. The basis and the case β + 1 are handled exactly
as in V.1.25.
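So, let $\mathrm{Lim}(\beta)$, and assume (I.H.) that whenever $\gamma < \beta$,
$$\alpha + \gamma = \alpha \cup \{\alpha + \lambda : \lambda < \gamma\} \qquad (1)$$
Now $\beta > 0$; hence $\alpha < \alpha + \beta$, using VI.10.2 (since $\alpha + 0 = \alpha$ by VI.10.1); thus $\alpha \subset \alpha + \beta$. Moreover, by VI.10.2 again, $\alpha + \gamma < \alpha + \beta$ for every $\gamma < \beta$, that is, $\alpha + \gamma \in \alpha + \beta$. Thus
$$\alpha + \beta \supseteq \alpha \cup \{\alpha + \rho : \rho < \beta\} \qquad (2)$$
Let now $\delta \in \alpha + \beta = \bigcup\{\alpha + \tau : \tau < \beta\}$. Thus, $\delta \in \alpha + \tau$ for some $\tau < \beta$, and, by (1), $\delta \in \alpha$ or $\delta = \alpha + \lambda$ for some $\lambda < \tau$. Thus $\delta$ is in the right-hand side of (2), and we get the converse inclusion of (2).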
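VI.10.6 Definition.
(i) The disjoint union of sets $X$ and $Y$ is the set $\{0\} \times X \cup \{1\} \times Y$, denoted $X \sqcup Y$.
(ii) If $(X_\beta)_{\beta\in\alpha}$ is an $\alpha$-sequence of sets, then their ordered disjoint sum is $\bigcup_{\beta\in\alpha}(\{\beta\} \times X_\beta)$ and is denoted by $\bigsqcup_{\beta\in\alpha} X_\beta$.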
VI.10. Arithmetic on the Ordinals 413
Clearly, $X_0 \sqcup X_1 = \bigsqcup_{\beta\in 2} X_\beta$. Note that $X \sqcup Y \neq Y \sqcup X$ in general (e.g., take $X = \{0\}$ and $Y = \{1\}$).
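The non-commutativity just noted, and the ordering $<^*$ used for Theorem VI.10.10, can be seen on finite examples. In this sketch (an encoding of our own) elements of $\{i\} \times X$ are Python tuples `(i, x)`, and a finite well-ordered set is given as a list in increasing order. The hypothetical helper `ordered_sum` lists all tagged elements of $X$ before those of $Y$, and `H` is a finite analogue of the map $H = \tilde f \cup \tilde g$ from the proof of VI.10.10, sending position $i$ to its element.

```python
def disjoint_union(X, Y) -> set:
    """X ⊔ Y = {0} × X ∪ {1} × Y, with ordered pairs modelled as tuples."""
    return {(0, x) for x in X} | {(1, y) for y in Y}


def ordered_sum(xs: list, ys: list) -> list:
    """List X ⊔ Y in the lexicographic order <*: the tag first, then the order inside X or Y.

    xs and ys are finite well-ordered sets, each given as a list in increasing order.
    """
    return [(0, x) for x in xs] + [(1, y) for y in ys]


def H(i: int, xs: list, ys: list) -> tuple:
    """Finite analogue of H: positions below len(xs) go to {0} × X, the rest to {1} × Y."""
    if i < len(xs):
        return (0, xs[i])
    return (1, ys[i - len(xs)])


if __name__ == "__main__":
    print(disjoint_union({0}, {1}) == disjoint_union({1}, {0}))  # False
    xs, ys = ["p", "q"], ["r"]
    print([H(i, xs, ys) for i in range(3)] == ordered_sum(xs, ys))  # True
```

For finite summands the ordered sum has $|X| + |Y|$ elements, matching $\alpha + \beta$ in VI.10.10; the infinite case is where the order of the summands starts to matter for the order type.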
VI.10.7 Definition.
(i) Let (X, <_1) and (Y, <_2) be two WO sets. Then <* on X ⊎ Y is defined
lexicographically, i.e.,
Proof. That they are linear orders is straightforward. Moreover, in either case, any nonempty set of pairs ⟨β, x⟩ has a minimum: among such pairs, locate those whose first component β is <-minimum; among these, locate the pair whose second component x is <_β-minimum (in X_β).
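The lexicographic order and the two-step search for a minimum can be sketched for the finite case only. This is an illustration under stated assumptions, not the book's construction: the helper names are hypothetical, and each X_β is assumed to be given as a Python list already arranged in its <_β order, so that list position stands in for <_β.

```python
from typing import Any, Iterable, List, Tuple

Pair = Tuple[int, Any]

def ordered_disjoint_sum(family: List[List[Any]]) -> List[Pair]:
    """List the pairs (beta, x), x in X_beta, in the lexicographic order <*.

    family[beta] lists X_beta in increasing <_beta order, so the position of x
    in family[beta] plays the role of <_beta (an assumption of this sketch).
    """
    pairs = [(beta, x) for beta, xs in enumerate(family) for x in xs]
    # Compare by the tag beta first, then by the position of x inside X_beta.
    return sorted(pairs, key=lambda p: (p[0], family[p[0]].index(p[1])))

def lex_minimum(subset: Iterable[Pair], family: List[List[Any]]) -> Pair:
    """Minimum of a nonempty set of pairs, located as in the proof."""
    subset = list(subset)
    beta0 = min(beta for beta, _ in subset)                 # <-minimum tag
    candidates = [x for beta, x in subset if beta == beta0]
    return beta0, min(candidates, key=family[beta0].index)  # <_beta0-minimum x
```

Because the tags keep the copies apart, the family [[0], [0]] still yields the two distinct pairs (0, 0) and (1, 0), in line with the remark that ⊎ is not a plain union.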
VI.10.10 Theorem. If ‖(X_0, <_0)‖ = α and ‖(X_1, <_1)‖ = β, then ‖(X_0 ⊎ X_1, <*)‖ = α + β.
Proof. Let
f : α ≅ X_0   (1)
414 VI. Order
and
g : β ≅ X_1   (2)
where we have omitted the relevant orders to the left (∈) and right (<_i, i = 0, 1) of ≅ for simplicity of notation.
Let f̃ = {⟨γ, ⟨0, f(γ)⟩⟩ : γ ∈ α} and g̃ = {⟨α + γ, ⟨1, g(γ)⟩⟩ : γ ∈ β}.
Since α ≤ α + γ (with "=" when γ = 0), we have dom(f̃) ∩ dom(g̃) = ∅; hence H = f̃ ∪ g̃ is a total function on dom(f̃) ∪ dom(g̃), that is, on α + β, by VI.10.4. H is onto X_0 ⊎ X_1, since f̃ and g̃ are onto {0} × X_0 and {1} × X_1, respectively, by (1) and (2).
By (1) and (2), and Definition VI.10.7, H is order-preserving and hence an isomorphism between α + β and (X_0 ⊎ X_1, <*).
(i) α + (β + γ) = (α + β) + γ
(ii) α < β → α + γ ≤ β + γ
(iii) α < β ↔ γ + α < γ + β
(iv) 0 + α = α
(v) α < β ↔ (∃γ > 0) α + γ = β.
Proof. (i): We do induction on γ . Assume then the claim for all λ < γ .
By VI.10.4,
(α + β) + γ = (α + β) ∪ {(α + β) + λ : λ < γ }
= (α + β) ∪ {α + (β + λ) : λ < γ } by I.H.
= α ∪ {α + δ : δ < β} ∪ {α + (β + λ) : λ < γ}   by VI.10.4   (1)
Next,
α + (β + γ ) = α ∪ {α + δ : δ < β + γ } (2)
(iv): By induction on α, using the I.H. 0 + β = β for β < α:
0 + α = 0 ∪ {0 + β : β < α} = {β : β < α} = α
(v): The ← follows from (iii) and VI.10.1. For →, let α < β, hence α ⊂ β.
Thus the set difference X = β − α is nonempty and well-ordered by <. Let
is trivially an isomorphism
f : β ≅ {0} × α ∪ {1} × X
Hence ‖(α ⊎ X, <*)‖ = β. But also ‖(α ⊎ X, <*)‖ = α + γ by (3), whence the claim.
1 + ω = {0} ∪ {1 + n : n ∈ ω}
      = {0} ∪ {n + 1 : n ∈ ω}   by V.1.24
      = {n : n ∈ ω}
      = ω
Yet, ω < ω + 1.
Here is an alternative argument: ω + 1 is a successor, while Lim(1 + ω)
by VI.5.40, so they cannot be equal.
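The non-commutativity just observed can be checked mechanically below ω^ω. The sketch assumes a fact not proved in this section: every ordinal below ω^ω has a Cantor normal form ω^{e_1}·c_1 + ... + ω^{e_k}·c_k with natural exponents e_1 > ... > e_k and positive natural coefficients. The coding and the names `nat`, `OMEGA` and `add` are hypothetical illustrations.

```python
from typing import List, Tuple

# Assumed coding, valid only below omega^omega: a list of (exponent, coefficient)
# pairs with strictly decreasing natural exponents and positive coefficients,
# read as omega^e1 * c1 + omega^e2 * c2 + ... ; the empty list codes 0.
Ordinal = List[Tuple[int, int]]

OMEGA: Ordinal = [(1, 1)]

def nat(n: int) -> Ordinal:
    """Code of the natural number n."""
    return [(0, n)] if n > 0 else []

def add(a: Ordinal, b: Ordinal) -> Ordinal:
    """Ordinal sum a + b."""
    if not b:
        return list(a)
    lead_exp, lead_coef = b[0]
    # Terms of a below the leading term of b are absorbed by it.
    kept = [(e, c) for (e, c) in a if e > lead_exp]
    same = [c for (e, c) in a if e == lead_exp]
    if same:
        return kept + [(lead_exp, same[0] + lead_coef)] + b[1:]
    return kept + b

print(add(nat(1), OMEGA) == OMEGA)  # True:  1 + omega = omega
print(add(OMEGA, nat(1)) == OMEGA)  # False: omega + 1 is a successor
```

The absorption step is exactly what makes 1 + ω collapse to ω, while on the right ω + 1 keeps its finite tail.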
M(α, 0) = 0
M(α, β + 1) = M(α, β) + α
for Lim(β), M(α, β) = sup{M(α, γ ) : γ < β}
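Multiplication can be sketched with the same assumed Cantor-normal-form coding below ω^ω (hypothetical names again). The sketch relies on left distributivity, proved below as VI.10.19(vi), together with the standard facts α · ω^f = ω^{e_1+f} for f ≥ 1 and α · d = ω^{e_1}·(c_1·d) + (rest of α) for a positive natural d, where ω^{e_1}·c_1 is the leading term of α > 0. It is an illustration, not the recursion M itself.

```python
from typing import List, Tuple

# Same assumed coding: [(e1, c1), (e2, c2), ...] means
# omega^e1 * c1 + omega^e2 * c2 + ... with e1 > e2 > ... (below omega^omega).
Ordinal = List[Tuple[int, int]]
OMEGA: Ordinal = [(1, 1)]

def nat(n: int) -> Ordinal:
    return [(0, n)] if n > 0 else []

def add(a: Ordinal, b: Ordinal) -> Ordinal:
    """Ordinal sum: terms of a below the leading term of b are absorbed."""
    if not b:
        return list(a)
    e, c = b[0]
    kept = [t for t in a if t[0] > e]
    same = [ca for ea, ca in a if ea == e]
    return kept + ([(e, same[0] + c)] + b[1:] if same else b)

def mul(a: Ordinal, b: Ordinal) -> Ordinal:
    """Ordinal product a . b, by left distributivity over the terms of b."""
    if not a or not b:
        return []
    e1, c1 = a[0]
    result: Ordinal = []
    for f, d in b:
        if f > 0:
            piece = [(e1 + f, d)]           # a . omega^f . d = omega^(e1+f) . d
        else:
            piece = [(e1, c1 * d)] + a[1:]  # a . d for a positive natural d
        result = add(result, piece)
    return result
```

For instance, mul(nat(2), OMEGA) returns OMEGA (2 · ω = ω), whereas mul(OMEGA, nat(2)) returns [(1, 2)], the code of ω + ω.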
Proof. By VI.5.38 and the weak continuity of λβ.α · β, we only need to show
that α · β < α · (β + 1) if α > 0. But, by VI.10.13, this translates to
α · β < (α · β) + α
Proof. By VI.5.40.
α · β = sup{α · γ + α : γ < β}
      = ⋃_{γ∈β} (α · γ + α)   (1)
α · (β + 1) = α · β + α
            = (α · β + α) ∪ (α · β)
            = (α · β + α) ∪ ⋃_{γ∈β} (α · γ + α)   by (1)
            = ⋃_{γ∈β+1} (α · γ + α)
VI.10.17 Theorem. Let (X_γ, <_γ) be a WO set with order type α, for each γ ∈ β. Then ‖cat_{γ∈β}(X_γ, <_γ)‖ = α · β.
φ_{A_β}[A_β] ⊆ α · β   (4)
Towards promoting (4) to equality, let η ∈ α · β = ⋃{α · γ + α : γ < β} by VI.10.16. Thus, η ∈ α · γ + α for some γ < β; therefore η ∈ α · γ or η = α · γ + δ for some δ < α, by VI.10.4. In the latter case, immediately η ∈ φ_{A_β}[A_β] by (3). In the former case, α · γ = α · γ + 0 ∈ φ_{A_β}[A_β] and the transitivity of φ_{A_β}[A_β] again yield η ∈ φ_{A_β}[A_β]. Thus (4) is an equality.
VI.10.18 Remark. It is worth noting that A_β = ⋃_{γ<β}({γ} × α) = β × α; thus VI.10.17 shows that
(β × α, <*) ≅ (α · β, <)
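For natural numbers α and β the isomorphism of the remark can be made explicit. Listing β × α lexicographically (β copies of α, one after the other) and sending ⟨γ, δ⟩ to α · γ + δ should produce exactly 0, 1, ..., α·β − 1 in increasing order. The sketch below (hypothetical helper name) covers only the finite case.

```python
from itertools import product

def lex_positions(alpha: int, beta: int) -> list:
    """For naturals alpha, beta: list beta x alpha in lexicographic order
    (itertools.product yields exactly this order) and send each pair
    (gamma, delta) to alpha * gamma + delta."""
    return [alpha * gamma + delta
            for gamma, delta in product(range(beta), range(alpha))]

# The images are 0, 1, ..., alpha*beta - 1 in increasing order, so the map is
# an order isomorphism of (beta x alpha, lex) onto alpha * beta.
print(lex_positions(3, 2))  # [0, 1, 2, 3, 4, 5]
```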
Proof. (i): 0 · α = ⋃_{β<α}(0 · β + 0) = 0, using the I.H. that 0 · β = 0 for β < α.
α · 0 = 0 by VI.10.13.
(ii):
1 · α = ⋃_{β<α}(1 · β + 1)
      = ⋃_{β<α}(β + 1)   under the obvious I.H.
      = sup⁺{β : β < α}
      = α
(iii):
α · 2 = ⋃_{β<2}(α · β + α)
      = (α · 0 + α) ∪ (α · 1 + α)
      = α ∪ (α + α)
      = α + α, since α ⊆ α + α by VI.10.2.
α · (λ + 1) = α · λ + α
            ≤ β · λ + α   by I.H. and VI.10.11(ii)
            < β · λ + β   by VI.10.2
            = β · (λ + 1)
Finally, let Lim(γ), and assume (I.H.) that α · λ ≤ β · λ for all λ < γ. Thus,
α · γ = ⋃_{λ<γ} α · λ ⊆ ⋃_{λ<γ} β · λ = β · γ.
(v): The ← is by VI.10.14. The → is by VI.10.14 and trichotomy.
(vi): The result holds for α = 0 (by (i)), so let α > 0 and do induction on
γ . The case γ = 0 is α · (β + 0) = α · β = α · β + 0 = α · β + α · 0, using (i)
for the last “=”. Let γ = λ + 1, and assume the obvious I.H. Then
α · (β + (λ + 1)) = α · ((β + λ) + 1)
                  = α · (β + λ) + α
                  = (α · β + α · λ) + α   by I.H.
                  = α · β + (α · λ + α)
                  = α · β + α · (λ + 1)
Finally, let Lim(γ ), and assume the claim (I.H.) for all λ < γ . Now,
α · (β + γ ) = α · (β + sup{λ : λ < γ })
= α · (sup{β + λ : λ < γ }) by continuity of + (VI.5.36, VI.10.2)
= sup{α · (β + λ) : λ < γ } by continuity of · (VI.5.36, VI.10.14)
π = sup X   (1)
β · π = β · sup X
      = sup{β · θ : θ ∈ X}   (2)
      ≤ α
α = β · π + υ   (3)
with
α = β · π + (β + γ )
= β · (π + 1) + γ
α = β · π + υ = β · π′ + υ′   (5)
α = β · (π + γ) + υ′ = β · π + (β · γ + υ′)
Now
υ < β
  = β · 1
  ≤ β · γ
  ≤ β · γ + υ′
α = ω · π + υ
α = 2 · (β + 1)
  = 2 · β + 2
  ≥ 1 · β + 2   by VI.10.19
  = β + 2
  > β + 1 = α
a contradiction. So Lim(α).
P(α, 0) = 1
P(α, β + 1) = P(α, β) · α
for Lim(β), P(α, β) = sup{P(α, γ ) : γ < β}
α^{·(β+1)} = α^{·β} · α > 0
by VI.10.19(ix). Finally, if Lim(β) and α^{·γ} > 0 for all γ < β (I.H.), then
Finally, let Lim(γ), and assume the claim for all δ < γ. By VI.5.35 and VI.10.25, λβ.α^{·β} is continuous. Therefore,
i.e.,
α ·β < α · β (1)
α ·β < α · δ (2)
By (2),
δ>1
since otherwise α ·β < α = α ·1 , contradicting (iii) (recall, α > 1, β > 1). Now,
by (iii),
α ·δ < α ·β
< α · δ, by (2)
α · α = α ·2 < α · 2 = α + α
α · γ ≤ α ·γ
Hence
(α · γ ) · α ≤ α ·γ · α = α ·β (3)
α · β < α · (γ · α) = (α · γ ) · α ≤ α ·β
contradicting (1).
while
We see that the right hand sides of (1) and (2) are different by VI.10.27(iii);
therefore so are the left hand sides.
VI.10.29 Remark. Since λα.ω^{·α} is normal, it has arbitrarily large fixed points. Such fixed points were called “ε-numbers” by Cantor.
In particular, the reader can readily verify that sup{ω, ω^{·ω}, ω^{·ω^{·ω}}, . . .} is an ε-number, the smallest one above ω.
VI.11. Exercises
VI.1. If P has MC and Q ⊆ P, then Q has MC.
VI.2. If P is left-narrow, then P n a is a set for all n > 0 in ω and all a.
VI.3. Prove that if an order < is left-narrow, then it has MC iff every nonempty
set has a <-minimal element.
(Hint. Only the if part is non-trivial. Start with a ∈ A. If a is minimal,
fine. Else show that any minimal element in the set < a∩A is minimal
in A.)
VI.4. Prove that if a relation P is left-narrow, then it has MC iff every nonempty
set has a P-minimal element.
(Hint. Work with P+ .)
VI.5. Prove the claims in the – passage in Remark VI.2.26.
VI.6. Prove that if A ⊆ B and A is a transitive set, then C(B, x) = x for all
x ∈ A, where C is Mostowski’s collapsing function (VI.2.38).
(Hint. Use ∈-induction.)
VI.7. With reference to VI.4.3, prove that [({1}, <)]_≅ is a proper class, where “<” is the standard order on ω.
VI.8. Prove that the relation < defined on On in VI.4.9 is transitive.
VI.9. Prove Theorem VI.4.30 directly by explicitly using the recursion (1)
of VI.4.32.
VI.10. Prove VI.3.23 for WO sets by using the comparability of ordinals and
Theorem VI.4.30 (proved via VI.4.32).
VI.11. For α > 0 prove that Lim(α) iff for all β < α there is a γ such that
β < γ < α.
VI.12. For each α ≠ 0 show that sup⁺ α = α.
VI.13. Prove that Lim(α) iff α ≠ 0 and ⋃α = α.
VI.14. Prove that if a set ∅ ≠ A ⊆ On does not have a maximum, then sup(A) is a limit ordinal.
VI.15. Prove that there are arbitrarily large limit ordinals, that is, for each α
there is a β > α such that Lim(β). This problem addresses questions
raised (and answers promised) following V.1.2.
(Hint. By induction over ω define the sequence f (0) = α and f (n+1) =
f (n) + 1. Argue that (1): if β = sup ran( f ), then Lim(β); and (2):
α < β.)
VI.16. Prove that if f is a weakly continuous On-sequence of ordinals that moreover satisfies (∀α) f(α) ≤ f(α + 1), then f is non-decreasing and hence is weakly normal.
VI.17. Prove that the composition of normal functions is normal.
VI.18. Prove that if f is a normal transfinite On-sequence, then, for any γ , it
has a fixed point β such that γ < β. Check that your proof (along the
lines of that for the Knaster-Tarski theorem) furnishes the smallest fixed
point greater than γ .
VI.19. Prove the Knaster-Tarski fixpoint theorem (VI.5.47) under the weakened
assumption that in the PO set (A, <) every chain has a least upper bound.
VI.20. Refer to the proof of VI.5.47. For the γ chosen, prove that s<γ = sγ .
VI.21. Prove that for all α, sp(V_N(α)) ⊆ N.
VI.22. Show that, for all α, β, V_N(α) ∈ V_N(β) implies α < β.
VI.23. Show that ρ(V_N(α)) = α + 1.
VI.24. Define “standard rank” by rk_N(x) = min{α : x ⊆ N ∪ V_N(α)}. Show that rk_N(α) = α.
VI.25. Relate ρ_N and rk_N. Show that for all sets x, ρ_N(x) = rk_N(x) + 1.
VI.26. Complete the proof of VI.7.2.
VI.27. Substantiate the comment made following VI.7.5.
VI.28. Show for the J of Section VI.7 that J (α, β) = {J (σ, τ ) : σ, τ α, β}.
VI.29. Show for the of Section VI.7 that α 2 = (0, α) for all α.
VI.30. Prove that the function λα.J [α × α] is increasing (order-preserving).
VI.31. Prove that the function λα.J [α × α] has arbitrarily large fixed points.
(Hint. Prove that it is normal.)
VI.32. Prove that ordinal addition is absolute for transitive models of ZF.
VI.33. Prove that if β > 0, then α + β = sup+ {α + γ : γ < β}.
VI.34. Prove that 1 + α = α iff ω ≤ α.
cat_{β<α}(X_β, <_α) ≅ cat_{β<α}(Y_β, <_α)
VI.36. Let (X, <1 ) and (Y, <2 ) be disjoint WO sets. Define < on X ∪ Y by
In the following few exercises models of fragments of ZFC are being sought.
We mean (U, ∈)-models.
VI.54. Show that V(ω) is a model for ZFC−infinity (i.e., satisfies all the ZFC axioms except that for infinity). Indeed, infinity fails here. By the way, this shows that infinity is not implied by the remaining axioms.
VI.55. Show that V(α) is a model for ZFC−collection, for any limit ordinal α > ω.
VI.56. Find a limit ordinal α > ω such that the collection axiom is false in V(α).
By the way, this shows that collection is not implied by the remaining
axioms.
(Hint. Experiment with α = ω + ω.)
VI.57. Prove that the structure (On, U, ∈) is not a model of ZFC.
VI.58. Let N be a set of urelements such that f : N → n is a 1-1 correspondence for some n ∈ ω. Prove that V_N(ω) is a model of ZFC−infinity. (It fails infinity.)
VI.59. Let N be a set of urelements such that f : N → ω is a 1-1 correspondence. Prove that V_N(ω) is a model of ZFC−{infinity, collection}. (It fails both infinity and collection.)
VI.60. Show that for any transitive class M, P^M(A) = M ∩ P(A).
VI.61. Complete the proof of Lemma VI.8.16.
VI.62. Prove that L_N satisfies global choice. That is, show that there is a function F on L_N such that for any set ∅ ≠ x ∈ L_N, F(x) ∈ x.
VII
Cardinality
In Chapter VI, among other things, we studied the WO sets and learnt how to
measure their length with the help of ordinal numbers. A consequence of the
axiom of choice was (Theorem VI.5.50) that every set can be well-ordered and
therefore every set can be assigned a length.†
In the present chapter we turn to another aspect of set size, namely its
number of elements, or cardinality. It will turn out that for finite sets length
and cardinality are measured by the same (finite) ordinal; thus, in particu-
lar, finite sets have a unique length. As was already remarked, the situation
with infinite sets is much less clean intuitively, and several WO sets of differ-
ing lengths can have the same number of elements (e.g., ω, ω + 1, ω + 2,
etc).
The following section will formalize the notions of “finite” and “infinite”
sets. Intuitively, a set is finite if the process of removing its elements, one at a
time, will terminate; it is infinite otherwise.‡
Thus for finite sets the process implicitly assigns the numbers 1, 2, 3, . . . to
the first, second, third, . . . removed items. Since the process terminates, there
will be a natural number assigned to the last removed item. Evidently this
number equals the cardinality, or number of elements, of the set.
† This length is not unique in general. For example, the set ω ∪ {ω} can be assigned both the lengths
ω + 1 and ω.
‡ This intuitive idea is for motivation only, and it will not be used anywhere except in the informal
discussion here. One can easily get into trouble if “time” is taken too literally. For example, let
us deplete ω in finite time! Start by removing 0. Exactly 1 hour later remove 1; exactly 1/2 hour
later remove 2; . . . ; exactly 1/2^n hour later remove n + 1; and so on, in the obvious pattern. It
takes just 1 + 1/2 + 1/2^2 + · · · + 1/2^n + · · · = 2 hours to complete the task. Yet, ω is intuitively
infinite. Of course, we would not have had this informal “paradox” if we were careful to say
explicitly that we spend exactly the same amount of time between any two consecutive removals
of elements.
VII.1. Finite vs. Infinite 431
In the infinite case it is not clear a priori how to assign a “number” that
denotes the cardinality of the set. Thus the issue is temporarily postponed, and
one first worries about whether or not two infinite sets have the same number of
elements. This is an easier problem to address, and it can be addressed before
we settle the question of what “number of elements” means. Indeed, any two
sets (infinite or not) clearly have the same number of elements if we can match
each element of one with a unique element of the other in such a way that
no unmatched elements are left on either side. Technically, two sets have the
same cardinality iff there is a 1-1 correspondence between them. Let us now
formalize this discussion and see where it leads.
VII.1.4 Remark. According to the above definition and the hint in Exer-
cise VII.1.2, ∅ is finite. Furthermore, each n ∈ ω is finite, and |n| = n.
Corollary VII.1.8 below shows that the cardinality of a finite set is indeed
unique, for it is impossible to have n ∼ A ∼ m and n ≠ m with m and n in ω.
x is not necessarily a natural number, so that “⊂” here is more general than “<”.
One refers to Corollary VII.1.8 as the pigeon-hole principle, in that if you have
n pigeons and m holes (or vice versa) then there is no way to put exactly one
pigeon in each hole so that no pigeon is left out (and no hole is empty).
(x, ∈) ≅ (α, ∈)
In particular, α ∼ x, say, via the 1-1 correspondence f : α → x.
By VII.1.7, α ≠ n. Suppose that n < α. Let y = ran(f ↾ n). Then n ∼ y via f ↾ n, and y ⊆ x ⊂ n, contradicting VII.1.7. Thus, α < n. That is, α is a natural number m, x ∼ m, and m < n.
† The case x = ∅ cannot lead to onto functions, as seen in the basis step; therefore it is not
considered here.
It is clear from the proof of the if part that f need not be total.
Proof. Induction on n.
Basis: n = 0: The result is immediate.
I.H.: Assume the assertion for n ≤ k. Proceeding by contradiction, assume that f : k + 1 → ω is onto and let H = f^{−1}[{0}]. Hence k + 1 ⊇ H ≠ ∅ by ontoness. Thus, k + 1 − H ∼ m < k + 1 for some m, by VII.1.9. Let g : m → k + 1 − H be a 1-1 correspondence. The diagram below shows that h ∘ g : m → ω is onto, contradicting the I.H., since m ≤ k:
m --g--> k + 1 − H --h--> ω,   where h = λx.f ↾ (k + 1 − H)(x) − 1
Before we turn our attention to infinite sets, we will look into finite sets more
carefully, at the same time establishing a few facts about inductively defined sets
and a technique of proving properties of finite sets by some sort of induction.
The reader is familiar with operations on sets. For example, + is a total 2-ary
operation on the real numbers, R. We prefer to call 2-ary operations binary.
Also, λx.1/x is a nontotal 1-ary operation on R. We call 1-ary operations unary.
We also note that ∅ is closed under any f and that if for a choice of an n-ary f and class S it is the case that ran(f ↾ S^n) = ∅, then S is f-closed.
The requirement that “operations” (or “rules”) be sets does not limit the
range of applicability of the concept, while it simplifies the technicalities. For
example, it is meaningful to have a set of rules, since, so restricted, they are
objects which can be collected into a class or set. Further justification for this
restriction is embedded in the proof of Proposition VII.1.19 below. The material
below formalizes work presented informally in Section I.2 to bootstrap our
theory.
Indeed, first, ω satisfies (a) and (b) of Definition VII.1.16. Secondly, let a set
T also satisfy (a) and (b) with respect to the given I and F .
That is, 0 ∈ T and (∀x)(x ∈ T → x ∪ {x} ∈ T). By induction over ω, ω ⊆ T; thus ω is ⊆-smallest.
Proof. First, let X = I ∪ ⋃_{f∈F} ran(f). By Exercise VII.3, X is a set that satisfies (a) and (b). Thus X ∈ J, and hence S = ⋂_{x∈J} x is a set, being a subclass of X.
Next, it is easy to verify that S satisfies (a) and (b). (See Exercise VII.3.)
Finally, S is ⊆-smallest, for if a set Q satisfies (a) and (b), then Q ∈ J.
The above establishes existence. For uniqueness use the ⊆-smallest property.
(See Exercise VII.3.)
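For finite data both descriptions of Cl(I, F) can be computed and compared: "from below", by repeatedly applying the operations, and "from above", as in the proof, by intersecting all F-closed supersets of I inside a universe X. The sketch is illustrative only. The encoding of an operation as a pair (arity, function) returning None where undefined is a hypothetical choice, and `closure_by_intersection` assumes that the universe it is given is itself F-closed and contains I.

```python
from itertools import combinations, product
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical encoding: an operation is (arity, function); the function
# returns None where it is undefined, mimicking a nontotal operation.
Op = Tuple[int, Callable[..., object]]

def closure(initial: Iterable, ops: List[Op]) -> Set:
    """Cl(I, F) 'from below': apply every operation to everything built so
    far until nothing new appears (terminates only if the result is finite)."""
    built = set(initial)
    while True:
        new = set()
        for arity, f in ops:
            for args in product(built, repeat=arity):
                y = f(*args)
                if y is not None and y not in built:
                    new.add(y)
        if not new:
            return built
        built |= new

def is_closed(z: Set, ops: List[Op]) -> bool:
    return all(f(*args) is None or f(*args) in z
               for arity, f in ops for args in product(z, repeat=arity))

def closure_by_intersection(universe: Iterable, initial: Iterable,
                            ops: List[Op]) -> Set:
    """Cl(I, F) 'from above': intersect all F-closed subsets of the universe
    that contain I (the universe is assumed F-closed and to contain I)."""
    elems = list(universe)
    result = set(elems)
    for r in range(len(elems) + 1):
        for z in combinations(elems, r):
            zs = set(z)
            if set(initial) <= zs and is_closed(zs, ops):
                result &= zs
    return result
```

With the unary operation x ↦ 2x mod 6 and I = {1}, both functions return {1, 2, 4}.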
We next note that there are two reasons justifying the term “inductively
defined”, or “recursively defined”, for sets such as S = Cl(I , F ).
First, the set S is defined in terms of (“smaller”, or “earlier”, instances of)
itself (starting with I ). For, (b) of Definition VII.1.16 says that if we know S
up to a certain “extent”, or “stage”, then we can enlarge S by applying to its
current version the operations in F .
Second, the definition allows us to prove properties of all elements of S by
induction with respect to the formation, or definition, of S. We also say, by
induction over S.
Such inductive definitions appear frequently in logic and mathematics, as we have already witnessed; this is what compelled us to present an informal version of these results early on.
Condition (ii) in the theorem is also pronounced “P (x) propagates with each
operation in F ”. The part “P (a1 ) ∧ · · · ∧ P (an )” is the I.H. for f .
† We also say that xi is obtained from k previous objects in the derivation by the application of a
k-ary operation from F .
In particular, D is a set.
VII.1.27 Theorem. If Γ is a monotone operator over the set X, then the set
S = ⋂{Z : Z ⊆ X ∧ Γ(Z) ⊆ Z}
satisfies Γ(S) = S.
We call S a fixed point or fixpoint of Γ. It turns out that S is the ⊆-smallest fixed point of Γ.
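For a finite X the set S of VII.1.27 can be computed by brute force, intersecting every Z ⊆ X with Γ(Z) ⊆ Z, and compared with a standard alternative that the text does not use here: iterating Γ from ∅, which for a monotone Γ on a finite X climbs an increasing chain until it stabilises. The operator `gamma` below is a hypothetical example on X = {0, ..., 6}.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable

Operator = Callable[[FrozenSet], FrozenSet]

def least_fixpoint_by_intersection(universe: Iterable, gamma: Operator) -> FrozenSet:
    """S = intersection of all Z ⊆ X with gamma(Z) ⊆ Z (Theorem VII.1.27)."""
    elems = list(universe)
    s = frozenset(elems)            # gamma(X) ⊆ X, so X itself qualifies
    for r in range(len(elems) + 1):
        for z in map(frozenset, combinations(elems, r)):
            if gamma(z) <= z:
                s &= z
    return s

def least_fixpoint_by_iteration(gamma: Operator) -> FrozenSet:
    """Iterate gamma from the empty set; for a monotone gamma on a finite X
    the chain ∅ ⊆ gamma(∅) ⊆ gamma(gamma(∅)) ⊆ ... stabilises."""
    z: FrozenSet = frozenset()
    while (nxt := gamma(z)) != z:
        z = nxt
    return z

# Hypothetical monotone operator on X = {0, ..., 6}: always add 0,
# and add z + 2 for every z in Z (as long as z + 2 stays in X).
def gamma(z: FrozenSet) -> FrozenSet:
    return frozenset({0}) | frozenset(x + 2 for x in z if x + 2 < 7)
```

Both functions yield {0, 2, 4, 6} for this `gamma`, and Γ maps that set to itself.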
VII.1.30 Remark. For an abstraction of what we are doing here see VI.5.47 and VI.5.49. Here the PO set (A, <) is (P(X), ⊂), and f is Γ.
Monotone operators are also called inductive. VII.1.29 provides a justification for the name “inductive”, for it says that every element of S has the “property” T(x) if it happens that the property “propagates with” Γ: If Z is the set of all x ∈ X which satisfy T(x), then all the elements of Γ(Z) also satisfy the property.
† This generalization is proper, i.e., there are fixed points which cannot be inductively defined
as in Definition VII.1.16. These require infinitary operations, i.e., operations with infinitely
many arguments.
By Theorem VII.1.27, the fixed point of Γ_{I,F} is
⋂{Z : Z ⊆ X ∧ Γ_{I,F}(Z) ⊆ Z} = ⋂{Z : I ⊆ Z ⊆ X ∧ Z is F-closed}   by (1)
                              = S
† Whitehead-Russell-finite.
VII.1.32 Definition. A set A is WR-finite iff A ∈ Cl(I, F), where I = {∅} and F = {f_y : dom(f_y) = P(A) ∧ y ∈ A ∧ f_y = λx.x ∪ {y}}.
If A is not WR-finite, then it is WR-infinite.
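For a finite A the closure in the definition can be generated directly: start from ∅ and keep applying the operations f_y = λx.x ∪ {y}. Since every subset of a finite A is reached by adding its elements one at a time, the closure is all of P(A), and in particular contains A. The sketch (hypothetical names) can only test finite inputs, so it illustrates the definition rather than decides WR-finiteness in general.

```python
from typing import FrozenSet, Hashable, Iterable, Set

def wr_closure(a: Iterable[Hashable]) -> Set[FrozenSet]:
    """Cl({∅}, {f_y : y ∈ A}) with f_y = λx. x ∪ {y}, for a finite A."""
    a = frozenset(a)
    built = {frozenset()}
    frontier = [frozenset()]
    while frontier:
        x = frontier.pop()
        for y in a:
            z = x | {y}              # f_y(x) = x ∪ {y}
            if z not in built:
                built.add(z)
                frontier.append(z)
    return built

def is_wr_finite(a: Iterable[Hashable]) -> bool:
    """A is WR-finite iff A ∈ Cl({∅}, F); only finite inputs can be run here."""
    a = frozenset(a)
    return a in wr_closure(a)
```

For example, wr_closure({1, 2, 3}) has the 8 subsets of {1, 2, 3} as its elements.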
† At any step of the derivation we may place ∅; such a step is redundant in that it does not help to
progress with the formation of A.
From now on we drop the qualification “WR-” from “finite”. What we are
left with from all this, besides a better understanding of finite sets, is the useful
proof technique of induction on finite sets.
† By “≅ n” we mean, of course, “≅ (n, ∈)”. We are following the convention of Section VI.4 in writing “≅ α” as a short form of “≅ (α, ∈)” for ordinals.
Proof. By induction on n.
Basis: n = 0. The result is trivial.
I.H.: Assume the claim for n = k. Let n = k + 1. By Exercise VII.10, A
has a <-maximal element, say a. Now, a is also <-maximum, since < is a total
order. That is, x < a for all x ∈ A − {a}.
Now, |A − {a}| = k (see Exercise VII.12) and the I.H. yield (A − {a}, <) ≅ k.
By pairing a with k, we extend the previous ≅ to (A, <) ≅ k + 1.
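The induction in this proof is directly executable for a concrete finite set: remove the <-maximum, obtain the isomorphism for the rest recursively, and pair the maximum with k. In the sketch below `iso_to_n` is a hypothetical name, and `less` is assumed to be a strict total order on the given set.

```python
from typing import Callable, Dict, Hashable, Iterable

def iso_to_n(a: Iterable[Hashable],
             less: Callable[[Hashable, Hashable], bool]) -> Dict[Hashable, int]:
    """Follow the proof: remove the <-maximum `top`, get (A - {top}, <) ≅ k
    recursively, then pair `top` with k. `less` must be a strict total order."""
    a = set(a)
    if not a:
        return {}
    top = next(x for x in a if not any(less(x, y) for y in a))  # <-maximum
    iso = iso_to_n(a - {top}, less)
    iso[top] = len(a) - 1            # k = |A - {top}|
    return iso
```

With the reversed order on {1, 2, 3} (less = λx y. x > y), the result is {3: 0, 2: 1, 1: 2}.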
The above result establishes the claim made in the preamble to this chapter,
namely, that there is a unique “length” for each finite set and that this length
coincides with the set’s cardinality. We add that, of course, every finite set is
well-orderable, a well-ordering being induced by A ∼ |A| (see VI.3.12).
Some authors use the term denumerable for enumerable. Also, the term at most
enumerable is sometimes used for countable.
VII.2.2 Example. According to Definition VII.2.1 each finite set is also count-
able. We also observe that ω is enumerable (since ω ∼ ω), so enumerable sets
exist. Do uncountable sets exist? In other words, are there infinite sets which
are not enumerable? Cantor, as we will see in the next section, answered this
affirmatively.
VII.2.3 Example. The set of the even natural numbers, E, is enumerable, since
λx.2x : ω → E is a 1-1 correspondence. A similar comment is true for the set
of the odd natural numbers.
f(0) = min A
f(n) ≃ min(A − ran(f ↾ n)),   if n > 0        (i)
Observe that
(1) Since the recursion (i) is pure, dom( f ) ≤ ω. Say, dom( f ) = n ∈ ω (for
some n). Thus, A = f [n].† By Exercise VII.18 (or VII.7), A is finite, a
contradiction. Thus f is total on ω.
(2) f is 1-1. Indeed, let n ≠ m, where we assume, without loss of generality, that m < n. Hence f(m) ∈ ran(f ↾ n); therefore f(n) ≠ f(m) by the second part of (i).‡
(3) A = ran( f ). Let us assume instead that, for some m, m ∈ A − ran( f ). Since
ω ∼ ran( f ) (why?), ran( f ) is infinite; therefore, for some n,
m ≤ f (n) (ii)
for, otherwise, (∀n ∈ ω) f (n) < m, i.e., ran( f ) ⊆ m, making ran( f ) finite.
Indeed, (ii) graduates to m < f(n), the strict inequality being justified by m ∉ ran(f). This last observation is inconsistent with the definition of f(n) (second equation of (i)), since both m and f(n) are in A − ran(f ↾ n).
Items (1) through (3) establish that ω ∼ A.
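The recursion (i) translates into a generator when A ⊆ ω is presented by a membership test. The sketch (hypothetical names) assumes A is infinite; for a finite A the search for the next minimum never ends, which mirrors the fact that f would then not be total.

```python
from itertools import islice
from math import isqrt
from typing import Callable, Iterator

def enumerate_subset(in_a: Callable[[int], bool]) -> Iterator[int]:
    """Yield f(0), f(1), ..., where f(n) = min(A - ran(f|n)) as in (i).
    A ⊆ ω is given by a membership test and is assumed infinite."""
    produced = set()          # ran(f|n)
    candidate = 0
    while True:
        # Every member of A below `candidate` is already in ran(f|n),
        # so the search for the minimum can resume where it stopped.
        while not in_a(candidate) or candidate in produced:
            candidate += 1
        produced.add(candidate)
        yield candidate

def is_square(n: int) -> bool:
    return isqrt(n) ** 2 == n

print(list(islice(enumerate_subset(is_square), 5)))  # [0, 1, 4, 9, 16]
```

The output increases strictly, which is the content of item (2), and every member of A eventually appears, which is item (3).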
It is clear that, if we want, we can state the corollary so that f is total. Indeed,
if f : A → B is nontotal but onto, then we can always extend it to a total and
onto function h by taking h = f ∪ {⟨x, b⟩ : x ∈ A − dom(f)}, where b is any
fixed element of B. Thus, whereas the original definition (Definition VII.2.1)
of A being enumerable requires that an enumeration without repetitions
exists (this is the 1-1 correspondence f : ω → A), we now have relaxed this
by saying, via Corollary VII.2.6, that a nonempty set A is countable iff an
enumeration exists (possibly with repetitions – the 1-1-ness requirement being
dropped).
VII.2.9 Proposition. ω ∼ ω × ω.
Proof. By VI.7.8.
There is a more “elementary” proof that avoids the J of VI.7.4 and uses just
multiplication and addition on ω.
We start with the total function f = λmn.(m + n) · (m + n) + m on ω2 .
(By the way, ω2 is in the Cartesian product sense throughout. Ordinal exponen-
tiation would have used an exponent “·2” instead, and cardinal exponentiation
we have not introduced yet.)
One can easily derive that f is 1-1, relying on what we know about ordinal
addition and multiplication (cf. VI.10.11 and VI.10.19). Indeed, assume that
VII.2. Enumerable Sets 445
(m + n) · (m + n) + m = (m′ + n′) · (m′ + n′) + m′   (1)
and prove
m = m′ ∧ n = n′   (2)
(m + n) · (m + n) + m + m + n + n + 1 ≤ (m′ + n′) · (m′ + n′)
Hence
(m + n) · (m + n) + m + m + n + n + 1 ≤ (m′ + n′) · (m′ + n′) + m′
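The injectivity of f = λmn.(m + n)·(m + n) + m can also be seen through an explicit inverse on its range: if v = s·s + m with s = m + n and 0 ≤ m ≤ s, then s·s ≤ v < (s + 1)·(s + 1), so s is the integer square root of v. The sketch below (hypothetical names `pair_code`, `unpair`) checks this on a finite grid; it is a numerical illustration, not a substitute for the proof.

```python
from math import isqrt

def pair_code(m: int, n: int) -> int:
    """The function λmn.(m + n)·(m + n) + m of the text."""
    s = m + n
    return s * s + m

def unpair(v: int):
    """Invert pair_code on its range: if v = s*s + m with 0 <= m <= s,
    then s*s <= v < (s + 1)**2, so s = isqrt(v)."""
    s = isqrt(v)
    m = v - s * s
    return m, s - m
```

Note that f is 1-1 but not onto: for instance, 3 is never a value, since s = 1 gives only 1 and 2, and s = 2 starts at 4.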
We can also view the above as a theorem schema (one theorem for each n ∈ N,
rather than the single theorem (∀n ∈ ω)(n ≥ 2 → ω ∼ ω^n)) and prove it by
informal induction on N − {0, 1}.
Theorem schema.
† By “squaring”. We are using the distributive law and commutativity of + and · on ω freely –
cf. VI.10.19(iv).
‡ ω^2 ⊇ {0} × ω ∼ ω, the ∼ obtained via ⟨0, n⟩ ↦ n (cf. Exercise VII.2).
Informally, using N for ω, one can conclude the above argument in this alter-
native manner (without invoking VII.2.10): Let
The above is stated as a schema, one theorem for each n ∈ N. A formal version
uses a function f with dom( f ) = ω. It takes the form
(∀n ∈ ω) (∀i ∈ n) f (i) ∼ ω → { f (i) : i ∈ n} ∼ ω
VII.2.14 Remark. (1) The nickname of the theorem has the obvious justification, as the family (A_i)_{i∈ω} is countable, indeed enumerable.
(2) The proof of Theorem VII.2.13 involved (tacitly) the axiom of choice.
This happened during the definition of g, where one out of, possibly, several f i
was (tacitly) chosen for each i ∈ ω. The omitted details are as follows:
Since {f : f : ω → A_i is onto} ≠ ∅ for i ∈ ω, there is an h in ∏_{i∈ω} {f : f : ω → A_i is onto}. For each i ∈ ω, h(i) is the f_i used in the proof.
Was this a peculiarity of this particular proof? No: as a result of Feferman and Levy (1963) shows, without the axiom of choice we may have a countable union of countable sets that turns out to be uncountable.
(3) The axiom of choice is provable for finite sets of sets, as we already
know. Thus to construct g in the proof of Proposition VII.2.12 we did not need
AC to select one (out of the possibly many) f i for each i = 1, . . . , n.
(4) In each of the results VII.2.11, VII.2.12, and VII.2.13, if any of the A_i is enumerable, then so are
×_{i=1}^{n} A_i,   ⋃_{i=1}^{n} A_i,   and   ⋃_{i=0}^{∞} A_i
For the case of × this follows from VII.1.10; for the other cases see Exercise VII.15.
VII.2.15 Example. ⋃_{n=1}^{∞} ω^n is enumerable by Theorem VII.2.13, Corollary
a_0, a_1, a_2, . . .
and B is enumerated as
b_0, b_1, b_2, . . .
VII.2.19 Example (Informal). We next see that Q, the set of rational numbers,
is enumerable.
This may appear, at first sight, surprising, because of the density of rational
numbers: Between any two rational numbers r and s there is another rational
number, for example, (r + s)/2. Thus, intuitively, there seem to be “more”
numbers in Q than in N (which does not enjoy a density property). Well, intuition
can be wrong in connection with the cardinality of infinite sets. (We will see
another counterintuitive result in Section VII.4, namely, that there are as many
reals in the unit square, [0, 1]2 , as there are in the unit segment, [0, 1]. See also
Exercise VII.35.)
The justification of the claim is straightforward:
Q = {m/n : m ∈ Z ∧ n ∈ N − {0}}   (1)
Since N ∼ N − {0} via λx.x + 1,
Z × (N − {0}) ∼ N (2)
by Proposition VII.2.11 and Remark VII.2.14(4).
Since the function Z × (N − {0}) → Q given by ⟨m, n⟩ ↦ m/n is onto, (2) and (1) yield an onto function N → Q.
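The argument can be turned into an explicit enumeration. The sketch below is one possible choice, not the text's: it enumerates Z as 0, 1, −1, 2, −2, ..., walks the pairs of N × N by diagonals, sends ⟨i, j⟩ to ⟨z_i, j + 1⟩ ∈ Z × (N − {0}), applies the onto map ⟨m, n⟩ ↦ m/n, and, as a design choice, drops repeated values so that the result is an enumeration without repetitions. It uses Python's standard `fractions.Fraction`, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Iterator

def z_at(i: int) -> int:
    """The i-th term of the enumeration 0, 1, -1, 2, -2, ... of Z."""
    return (i + 1) // 2 if i % 2 else -(i // 2)

def rationals() -> Iterator[Fraction]:
    """Walk N x N by diagonals, send (i, j) to <z_at(i), j + 1>, apply the
    onto map <m, n> |-> m/n, and skip values already produced."""
    seen = set()
    for s in count(0):
        for i in range(s + 1):
            q = Fraction(z_at(i), s - i + 1)
            if q not in seen:
                seen.add(q)
                yield q
```

The first four values produced are 0, 1, 1/2 and −1; every rational m/n eventually appears, on the diagonal that contains the pair for ⟨m, n⟩.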
The a_i are the coefficients, and, in this example, they are in Z. Whenever a_n ≠ 0, we say that the degree of the polynomial is n. We identify each nth-degree polynomial, Σ_{i=0}^{n} a_i x^i, with the (n + 1)-tuple ⟨a_0, . . . , a_n⟩.
It follows that the set of nth-degree polynomials is a subset of Z^{n+1} and therefore the set of all polynomials is a subset of
⋃_{n=1}^{∞} Z^n
† This characterizing property of infinite sets was also observed by Cantor and Bolzano. See also
Wilder (1963, p. 65).
For the balance of this discussion, “infinite” and “finite”, without the qual-
ification “Dedekind”, refer to the ordinary notions, as per Definition VII.1.3.
Proof. That Dedekind infinite implies infinite is the content of Remark VII.2.22.
So let next A be infinite.
In the next section, among other things, we see that case 2 above is not
vacuous.
Clearly this works, i.e., for each a ∈ A, D ≠ λy.F(a, y); for (1) yields D(a) ≠ F(a, a).
We will illustrate the above general description of diagonalization in the
following examples.
† Intuitively, the main diagonal was “rotated” counterclockwise, by 45 degrees, around the pivot
entry F(h, h).
E = {x ∈ J : x ∉ Px}   (1)
or
E = {x ∈ J : ¬ x P x}   (1′)
† Clearly, this argument breaks down if the family (f_n)_{n∈ω} contains nontotal functions, in which case we employ Kleene’s weak equality ≃. Then it is possible to have f_i(i)↑, in which case f_i(i) ≃ f_i(i) + 1.
VII.3. Diagonalization; Uncountable Sets 453
then
a ∈ Pa ↔ a ∈ E ↔ a ∉ Pa   (the first ↔ by (2), the second by (1))
– a contradiction.
This is not a new flavour of diagonalization, but fits under the general dis-
cussion on p. 451 above. Indeed, think of the “table” F : J × J → 2 defined
by†
F(x, y) = 0 if x P y, and F(x, y) = 1 otherwise.
The general discussion would lead to the “diagonal object”
D = λx.1 − F(x, x)
which is a J-long 0-1-valued array that cannot be a row of the “table”. “D” is just another way of saying “E”, for the former is the characteristic function of the latter, as the following equivalences show (for x ∈ J):
D(x) = 0 ↔ F(x, x) = 1 ↔ ¬ x P x ↔ x ∈ E
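For a small finite J the diagonal argument can be verified exhaustively. The sketch below (hypothetical names) reads Pa as {x ∈ J : x P a}, builds E = {x ∈ J : ¬ x P x}, and checks, over every relation P on {0, ..., n − 1}, that E is none of the sets Pa. It also builds D(x) = 1 − F(x, x), which is what the equivalences D(x) = 0 ↔ F(x, x) = 1 describe, and checks that D(x) = 0 exactly when x ∈ E.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Set

Rel = Callable[[Hashable, Hashable], bool]

def section(j: Iterable, p: Rel, a) -> Set:
    """P a = {x ∈ J : x P a}."""
    return {x for x in j if p(x, a)}

def diagonal_set(j: Iterable, p: Rel) -> Set:
    """E = {x ∈ J : not x P x}."""
    return {x for x in j if not p(x, x)}

def diagonal_char(j: Iterable, p: Rel) -> Dict:
    """D(x) = 1 - F(x, x), where F(x, y) = 0 if x P y and 1 otherwise."""
    return {x: 1 - (0 if p(x, x) else 1) for x in j}

def diagonal_escapes_all_relations(n: int) -> bool:
    """Check, for every relation P on J = {0, ..., n-1}, that E differs from
    every P a and that D is the (0 = member) characteristic function of E."""
    j = range(n)
    for bits in product([False, True], repeat=n * n):
        p = lambda x, y, bits=bits: bits[n * x + y]
        e = diagonal_set(j, p)
        if any(section(j, p, a) == e for a in j):
            return False
        d = diagonal_char(j, p)
        if any((d[x] == 0) != (x in e) for x in j):
            return False
    return True
```

The check succeeds for every relation because a ∈ Pa holds exactly when a P a, that is, exactly when a ∉ E.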
As an application, let us look at the family (x)_{x∈S}, where S is any class of sets. Here J = S and P is the identity. Let D = {x ∈ S : x ∉ x}, i.e., x ∈ D iff x ∈ S ∧ x ∉ x. Thus D behaves at x differently than x at x, and therefore it is not one of the x’s; in other words, D ∉ S (this is so because this diagonalization tells us that D is not in the family (x)_{x∈S}, but this family equals S).
So, by diagonalization, we have obtained an object not in S. If we now let S be V_M, the class of all sets, then D is the Russell class, and the above argument establishes (once again) that D ∉ V_M, i.e., that D is not a set. Therefore,
Russell’s proof was a diagonalization over all sets to obtain an object which is
not a set, and while ingenious and elegantly simple, the technique was borrowed
from Cantor’s work.
(otherwise, d = f (a) for some a – hence d(a) = f (a)(a) – but also d(a) =
f (a)(a) + 1).†
VII.3.5 Example (Informal). Throughout, [0, 1] will denote the real closed interval, {x ∈ R : 0 ≤ x ≤ 1}. Since Q, the set of rational numbers, is enumerable, so is [0, 1] ∩ Q = {x ∈ Q : 0 ≤ x ≤ 1} (see Example VII.2.7). Now each rational in [0, 1] has a decimal expansion 0.a_0 a_1 . . . a_i . . . . For example,
1 = 0.999 . . . (all 9’s),   0 = 0.000 . . . (all 0’s),   1/3 = 0.333 . . . (all 3’s)
and
1/2 = 0.5000 . . . (all 0’s after the 5), but also 1/2 = 0.4999 . . . (all 9’s after the 4)
We next claim that the set of all decimal expansions of rationals in [0, 1]
is enumerable. Indeed, this set equals Q_inf ∪ Q_fin, where Q_inf is the set of all infinite representations such as 0.33 . . . , 0.99 . . . , 0.499 . . . , whereas Q_fin is the set of all finite representations, i.e., those that terminate with an infinite sequence of 0’s, such as 0.000 . . . (all 0’s) and 0.5000 . . . (all 0’s after the 5).
Note that some rationals have both infinite and finite representations.
By Exercise VII.31, Q_inf is equinumerous to an infinite subset of Q, and also Q_fin is equinumerous to an infinite subset of Q; hence Q_inf ∪ Q_fin ∼ ω, as required. So we have an enumeration (0.a_0^n a_1^n . . . a_i^n . . .)_{n∈ω} of all decimal expansions of the rational numbers in [0, 1].
Consider the decimal expansion d = 0.d_0 d_1 . . . d_i . . . , where for all i ∈ ω, d_i ≠ a_i^i. For example, a well-defined way to achieve this is to set
d_i = 2 if a_i^i = 1, and d_i = 1 otherwise
By diagonalization, d does not belong to the family (0.a_0^n a_1^n . . . a_i^n . . .)_{n∈ω}.
Since the latter represents all the rationals in [0, 1], and since d represents a
real in [0, 1] it follows that d is an irrational number in [0, 1].
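The digit rule for d can be run on any finite table of expansions. In the sketch below (hypothetical names), rows[n][i] stands for a_i^n, and only the first len(rows) digits of each expansion are needed. Since d uses only the digits 1 and 2, it ends neither in all 0’s nor in all 9’s, so it is the unique decimal expansion of its value; differing from each row in a digit therefore means differing from it as a real number.

```python
from typing import List, Sequence

def diagonal_digits(rows: Sequence[Sequence[int]]) -> List[int]:
    """rows[n][i] is the digit a_i^n of the n-th expansion. Returns the first
    len(rows) digits of d, where d_i = 2 if a_i^i = 1 and d_i = 1 otherwise."""
    return [2 if rows[i][i] == 1 else 1 for i in range(len(rows))]

rows = [
    [3, 3, 3, 3],   # 1/3 = 0.333...
    [5, 0, 0, 0],   # 1/2 = 0.5000...
    [4, 9, 9, 9],   # 1/2 = 0.4999...
    [1, 1, 1, 1],   # 1/9 = 0.111...
]
d = diagonal_digits(rows)
print(d)  # [1, 1, 1, 2]
```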
One can now continue to discover more irrationals in the interval by adding
d at the beginning of the enumeration and then diagonalizing again to obtain
† This type of argument shows, in recursion theory, that the set of all total computable functions
cannot be “effectively” enumerated.
d′ = 0.d′_0 d′_1 . . . (a string of 1’s and 2’s)
Proof. Let us assume the contrary, i.e., that there is a 1-1 correspondence
f : ω → P(ω). Construct the diagonal set D = {x ∈ ω : x ∉ f(x)}.† Thus on one hand D is not in the range of f; on the other hand it must be, since D ⊆ ω.
This contradiction establishes the claim.
VII.3.8 Example (Informal). For the purpose of this example we will state
without proof a few facts. To begin with, each real number r in [0, 1] has a binary
expansion, or can be represented in binary notation, as r = 0.b_0 b_1 . . . b_i . . . . This notation, or expansion, means (quite analogously with the familiar decimal case) that r = Σ_{i=0}^{∞} b_i/2^{i+1}, where each b_i is 0 or 1, and is called the ith binary digit or bit. An expansion 0.b_0 b_1 . . . b_i . . . is finite if for some n, b_i = 0 for all i ≥ n; otherwise it is infinite. Infinite expansions are unique.
† To connect with the discussion on p. 451, here P on ω is given by n P m iff n ∈ f(m); thus
$P_m = f(m)$.
456 VII. Cardinality
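$$0 = 0.\underbrace{00\ldots}_{\text{all 0's}}, \qquad 1 = 0.\underbrace{11\ldots}_{\text{all 1's}},$$
$$1/2 = 0.1\underbrace{00\ldots}_{\text{all 0's}} \quad\text{but also}\quad 1/2 = 0.0\underbrace{11\ldots}_{\text{all 1's}}$$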
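The two expansions of 1/2 can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; `value` is a hypothetical helper that evaluates a finite binary expansion via $r = \sum_i b_i/2^{i+1}$. It shows that the partial sums of $0.0111\ldots$ approach 1/2, falling short by exactly $1/2^{n+1}$ after n ones.

```python
from fractions import Fraction

def value(bits):
    """Exact value of the finite binary expansion 0.b_0 b_1 ... b_{k-1},
    namely the sum of b_i / 2**(i + 1)."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

print(value([1] + [0] * 10))            # 1/2, from 0.1000...

for n in (1, 5, 20):
    partial = value([0] + [1] * n)      # 0.0 followed by n ones
    print(n, Fraction(1, 2) - partial)  # the gap is 1/2**(n + 1)
```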
Since the expansions in $[0,1]_{\mathrm{fin}}$ represent rationals (see Exercise VII.32),
we have $[0,1]_{\mathrm{fin}} \sim \omega$ and therefore
by Lemma VII.2.24. Since every non-zero real has a unique infinite binary
expansion (see also Exercise VII.33), (1) yields $(0,1] \sim [0,1]_{\mathrm{inf}} \cup [0,1]_{\mathrm{fin}}$, and
one more application of Lemma VII.2.24 ($[0,1] \sim (0,1]$) yields
VII.4. Cardinals
In this section, following von Neumann, we assign a measure of cardinality to
each set, its cardinal number.
At the very least, cardinal numbers must be ∼-invariants (i.e., equinumerous
sets must measure identically). It is also desirable that this measure be consistent
with the measures we have already accepted for the cardinality of finite sets,
since the latter perfectly fit with our intuition.
The requirement that cardinal numbers be ∼-invariants means that for any
set A, its cardinal number depends on the class of all sets equinumerous to
VII.4.1 Definition. For any set x, its cardinal number, or cardinality, is defined
to be min{α : α ∼ x} and is denoted by Card(x). Cardinal numbers are also
simply referred to as cardinals. Thus a cardinal is just the cardinal number of
some set.
We shall use (in argot) lowercase fraktur letters to denote arbitrary cardinals,
i.e., cardinal-typed variables (e.g., a, b, m), but also lowercase Greek letters
around the middle of the alphabet, typically, κ and λ. The class of all cardinals
will be denoted by Cn.
By definition, Cn ⊆ On.
(a) x ∼ Card(x).
(b) x ∼ y iff Card(x) = Card(y).
VII.4.3 Proposition. For any ordinal α, α ∈ Cn iff there is no β < α such that
β ∼ α.
Proof. Let α ∈ Cn be due to α = Card(x) for some set x. Consider the set (why
set?) S = {γ : γ ∼ x}. We have
α = min S (1)
If some β < α satisfies β ∼ α, then this contradicts (1), since β ∈ S. This argu-
ment establishes the only-if part.
For the if part, let there be no β < α such that β ∼ α. Then α is smallest in
{γ : γ ∼ α}, i.e., α = Card(α) and therefore α ∈ Cn.
VII.4.5 Example. We now establish that ω ∈ Cn. Indeed, just invoke Proposition VII.1.13 to see that α ∈ ω implies α ≁ ω; hence ω ∈ Cn by Proposition VII.4.3.
We have just witnessed that ω is the smallest infinite cardinal (i.e., smal-
lest infinite ordinal that is a cardinal). Definitions VII.1.3 and VII.2.1 are
also worth restating in the present context: x is finite iff Card(x) < ω; it is
(i) Card(α) ≤ α,
(ii) Card(α) = α iff α ∈ Cn.
VII.4.7 Example. For any cardinal m, Card(m) = m. Indeed, this just rephrases
Proposition VII.4.6(ii).
This observation is often usefully applied as Card(Card(x)) = Card(x), where
x is any set.
Proof. The claim is known to be true for ω. So let ω < a, and assume instead
that a = β + 1 for some β. Now, β is infinite; otherwise a = β + 1 < ω.
By Lemma VII.2.24, β ∪ {β} ∼ β; therefore a ∼ β, which along with β < a
contradicts that a is a cardinal.
The above result shows that there are many “more” ordinals than cardinals.
For example, ω + 1, ω + 2, and ω + i for any i ∈ ω are not cardinals.
It also suggests the question of whether indeed there are any cardinals above
ω. This question will be eventually answered affirmatively. As a matter of fact,
there are so many cardinals that Cn is a proper class.
The following result is very important for the further development of the
theory of cardinal numbers.
We know (VI.4.32) that ran(φ) ∈ On (which justifies calling it “α”) and that
φ(x) ∈ On for all x ∈ A.
We next show that
The above theorem will provide the basic tools to compare cardinalities of
sets. To this end we introduce a definition.
VII.4.10 Definition. For two sets A and B, A ≼ B means that there is a total
and 1-1 f : A → B.
VII.4.11 Proposition.
VII.4.14 Proposition. For any sets A ≠ ∅ and B, the following are equivalent:
(i) A ≼ B.
(ii) There is an onto function f : B → A.
(iii) Card(A) ≤ Card(B).
Proof. The equivalence of (i) and (ii) follows directly from V.3.9. Next, let
us assume (i) and prove (iii). By Proposition VII.4.11(iii), A ∼ C ⊆ B for
some C. Hence, using Propositions VII.4.2 and VII.4.9, Card(A) = Card(C) ≤
Card(B). Conversely, assume now (iii), i.e., Card(A) ⊆ Card(B). The diagram
below shows that g ◦ i ◦ f : A → B is total and 1-1, thus establishing (i), where
i : Card(A) → Card(B) is the inclusion map x ↦ x and f : A → Card(A)
and g : Card(B) → B are 1-1 correspondences:
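$$\begin{array}{ccc}
A & \xrightarrow{\;g\circ i\circ f\;} & B \\
{\scriptstyle f}\big\downarrow & & \big\uparrow{\scriptstyle g} \\
\mathrm{Card}(A) & \xrightarrow{\;\;i\;\;} & \mathrm{Card}(B)
\end{array}$$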
VII.4.15 Corollary. For any sets A and B, A ≺ B iff Card(A) < Card(B).
By Corollary VII.4.16, there are infinitely many cardinals. Indeed, for any a,
Card(P(a)) is a bigger cardinal. The preceding proposition relates comparisons
of sets (as to size) with comparisons of their cardinal numbers and leads to
the following important result, which has several names attached to it: Cantor,
Dedekind, Schröder, and Bernstein.
Proof. The “conversely” part directly follows from Proposition VII.4.11. For
the rest, observe that A ≼ B and B ≼ A yield Card(A) ≤ Card(B) and Card(B) ≤
Card(A) respectively, by Proposition VII.4.14; thus Card(A) = Card(B), and hence A ∼ B by Proposition VII.4.2(b).
Our approach to cardinals relies on AC. Some authors define cardinal numbers in
a way independent of AC (see, for example, Levy (1979)). In such an approach,
there is a more obscure – but AC-free – proof† of the Cantor-Bernstein theorem,
which we include here.
X′ = f[X] (2)
Y = A − X (3)
Y′ = B − X′ (4)
so that
A = X ∪ Y and X ∩ Y = ∅ (5)
B = X′ ∪ Y′ and X′ ∩ Y′ = ∅ (6)
† Of course, AC enters via the Zermelo theorem in Definition VII.4.1 and in the proof of Theo-
rem VII.4.9, on which the above-given proof of the Cantor-Bernstein theorem is based.
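For readers who want to see a Cantor-Bernstein bijection concretely, here is a sketch of the standard back-tracking construction, for the hypothetical injections f(n) = 2n and g(n) = 2n + 1 on the natural numbers (so A = B = ℕ in this illustration only). A point a is put in X when tracing preimages backwards, alternately under g and f, ends at a point of A − g[B]; the resulting h agrees with f on X and with g⁻¹ on Y = A − X, so that g maps Y′ = B − f[X] onto Y. This is the usual textbook construction in outline and may differ in detail from the proof given here; with general injections a backward chain can be infinite (such points also go into X), whereas for these particular f and g every chain is finite.

```python
# Hypothetical injections on A = B = the natural numbers (not from the text).
def f(n):
    return 2 * n          # f : A -> B, total and 1-1

def g(n):
    return 2 * n + 1      # g : B -> A, total and 1-1

def f_inv(b):
    """Preimage of b under f, or None if b is not in ran(f)."""
    return b // 2 if b % 2 == 0 else None

def g_inv(a):
    """Preimage of a under g, or None if a is not in ran(g)."""
    return (a - 1) // 2 if a % 2 == 1 else None

def in_X(a):
    """Trace a backwards through g and f alternately: True if the chain stops
    at a point of A - g[B], False if it stops at a point of B - f[A]."""
    while True:
        b = g_inv(a)
        if b is None:
            return True
        prev = f_inv(b)
        if prev is None:
            return False
        a = prev

def h(a):
    """The Cantor-Bernstein bijection A -> B: f on X, g^{-1} on Y = A - X."""
    return f(a) if in_X(a) else g_inv(a)

print([h(a) for a in range(8)])   # [0, 2, 4, 1, 8, 10, 12, 3]
```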
VII.4.19 Example (Informal). Let us see that (0, 1] × (0, 1] ∼ (0, 1].
Indeed, $(0,1]^2 \preccurlyeq (0,1]$ via the function $\langle 0.a_0 a_1 \ldots a_i \ldots,\ 0.b_0 b_1 \ldots b_i \ldots\rangle \mapsto 0.a_0 b_0 a_1 b_1 \ldots a_i b_i \ldots$, which is clearly total and 1-1 on the understanding
that we only utilize infinite expansions. On the other hand, $(0,1] \preccurlyeq (0,1]^2$ via
$x \mapsto \langle x, 1\rangle$. The result follows from the Cantor-Bernstein theorem.
Compare the proof just given with the one you gave in Exercise VII.35.
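The digit-interleaving map can be sketched on finite prefixes of decimal digits. Working with prefixes is an assumption made so that the code terminates; the map in the text acts on full infinite expansions. `interleave` and `deinterleave` are hypothetical names. The existence of a left inverse is exactly what makes the map 1-1.

```python
def interleave(a, b):
    """<0.a_0 a_1 ..., 0.b_0 b_1 ...>  |->  0.a_0 b_0 a_1 b_1 ...,
    acting on digit prefixes of equal length."""
    if len(a) != len(b):
        raise ValueError("prefixes must have equal length")
    return "".join(x + y for x, y in zip(a, b))

def deinterleave(c):
    """Left inverse of interleave: even positions give a, odd positions give b."""
    return c[0::2], c[1::2]

c = interleave("333", "142")
print(c)                   # 313432
print(deinterleave(c))     # ('333', '142')
```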
and hence
Card S <a⊆ S for some a (2)
VII.4.22 Definition. For any cardinal a, its cardinal successor, a+ , is the smal-
lest cardinal > a.
The above definition makes sense by Corollary VII.4.16 and the remark
following it, since Cn is well-ordered by < (i.e., ∈). We can now define the
alephs:
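VII.4.23 Definition. The aleph transfinite sequence is given by the total function α ↦ ℵ_α on On defined inductively as follows:
$$\begin{aligned}
\aleph_0 &= \omega \\
\aleph_{\alpha+1} &= (\aleph_\alpha)^+ \\
\aleph_\alpha &= \bigcup\{\aleph_\beta : \beta < \alpha\} \quad \text{if } \mathrm{Lim}(\alpha)
\end{aligned}$$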
Each ℵα is an aleph.
A cardinal ℵα with Lim(α) is called a limit cardinal, while one such as ℵα+1
is called a successor cardinal.
VII.4.24 Remark. (1) The reader will note that the term “limit cardinal” applies
to the index of an infinite cardinal in the aleph sequence. It does not refer to the
cardinal itself (all infinite cardinals are limit ordinals by VII.4.8).
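(2) If Lim(α), then $\aleph_\alpha = \bigcup\{\aleph_{\beta+1} : \beta < \alpha\}$. Indeed, by VII.4.20, $\bigcup\{\aleph_{\beta+1} : \beta < \alpha\}$ is a cardinal, say
$$\mathfrak a = \bigcup\{\aleph_{\beta+1} : \beta < \alpha\}$$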
By the previous theorem, ℵ1 = {α : Card(α) ≤ ω}. That is, ℵ1 is the set of all
countable ordinals.
It is also noted that for each α, (ℵα )+ ≤ Card(P(ℵα )), since ℵα < Card(P(ℵα ))
by Cantor’s theorem, while (ℵα )+ is the smallest cardinal above ℵα .
The conjecture (Hausdorff)
ℵα+1 = Card(P(ℵα ))
ℵ1 = Card(P(ℵ0 )) (1)
Gödel (1938, 1939, 1940) showed, using L, that GCH is consistent with
the Zermelo-Fraenkel (+AC) axioms of set theory, and Cohen (1963) showed
that ¬GCH is also consistent with ZFC. Thus GCH is independent of the ZFC
axioms; these axioms can neither prove it nor disprove it. So, as with AC,
one can adopt either GCH or ¬GCH, as an axiom. This is not generally done,
however. The other axioms of ZFC (including AC) are widely accepted as
“really true”, being counterparts of reasonable principles (e.g., substitution,
foundation), whereas our intuition does not help us at all to choose between
GCH and ¬GCH. Our principles (or axioms) are not adequate to settle this
question, and one hopes that additional intuitively “true” axioms will eventually
be discovered and added which will settle GCH. It is noted that if one adopts
GCH for the sake of experimentation, then several things become simpler in
set theory (e.g., cardinal arithmetic – see Section VII.6), and even the axiom
of choice becomes a theorem† (the interested reader is referred to Levy (1979,
p. 190)).
In the “real realm”, because P(ω) ∼ R (by VII.3.8, VII.3.11, and Exercise VII.34), CH can be rephrased to read “there is no cardinal strictly between ω
and Card(R)”, or also “every subset of R either has the cardinality of R or is
countable”.
VII.4.28 Remark. (1) The above definition is essentially the original due to
Frege and Russell suggested in the preamble to this section, where the “size” of
cardinals has been drastically reduced down to set size (see VI.6.29). We state
this as Proposition VII.4.29 below.
(2) The cardinal number of a set A does not necessarily contain A (i.e., card-
inals of Definition VII.4.27 are not equivalence classes). To see this, look, for
† For this to be non-circular, cardinals must be introduced in an AC-free manner. See the following
Digression.
The above is the counterpart of Proposition VII.4.2(b), this time under Def-
inition VII.4.27. It has been shown by Pincus (1974) that one cannot define
“Card()” in ZF so that it satisfies x ∼ Card(x) as well.
One now proceeds by adopting Definitions VII.4.10 and VII.4.13 for ≼
and ≺. In particular, Proposition VII.4.11 is derivable. Next, ≤ on cardinals
is defined through ≼ as in VII.4.31 below. We also observe that if a ∼ a′ and
b ∼ b′, then a ≼ b yields a′ ≼ b′ and a ≺ b yields a′ ≺ b′ (Exercise VII.39).
The above definition embodies the equivalence of (i) and (iii) of Proposi-
tion VII.4.14. Here Theorem VII.4.9 trivially holds via VII.4.11(ii). “Cantor’s
theorem” (VII.4.16) also holds. The Cantor-Bernstein theorem is proved by the
AC-free proof that follows VII.4.17 (p. 463). This yields that < is a partial order
on Cn. Indeed, irreflexivity is immediate (Card(a) < Card(a) requires a ≺ a, hence a ≁ a, which is absurd).
Transitivity is obtained as follows:
Let 𝔞 < 𝔟 < 𝔠, and therefore Card(a) < Card(b) < Card(c) for appropriate sets
a, b and c. By VII.4.31, a ≺ b ≺ c; hence a ≼ c by Proposition VII.4.11(i). If
a ∼ c, then (Exercise VII.39) b ≺ a, and hence a ∼ b by the Cantor-Bernstein
theorem, contradicting a ≺ b. Thus a ≺ c, i.e., 𝔞 < 𝔠.
Proposition VII.4.20 has the following counterpart:
x ∈ VN (α) (1)
By (1), a ⊆ VN (α) for any a ∈ S. Thus b = Card(VN (α)) will do, by VII.4.11(ii).
From the above, one obtains, once again, Corollary VII.4.21: If Cn is a set,
then let b satisfy a ≤ b for all a ∈ Cn. Then since Card(P(b)) ∈ Cn, we get
Card(P(b)) ≤ b, contradicting Cantor’s theorem.
VII.4.34 Theorem. (Hartogs (1915)). For any set x there is an ordinal α such
that α ⋠ x, i.e., there is no total 1-1 function α → x.
total order. Of course, in the presence of AC one would rather define cardinals,
as we do, by Definition VII.4.1.
Proof. Let a = Card(A) and b = Card(B). Since a ∼ {0} × a via x ↦ ⟨0, x⟩ and
b ∼ {1} × b via x ↦ ⟨1, x⟩, there are 1-1 correspondences f : A → {0} × a and
g : B → {1} × b. Since A ∩ B = ∅ = ({0} × a) ∩ ({1} × b), it follows that f ∪ g is a
1-1 correspondence and A ∪ B ∼ ({0} × a) ∪ ({1} × b); thus a +c b = Card(A ∪ B).
VII.5. Arithmetic on Cardinals 471
The basic properties of addition, worked out by Cantor, are captured by the
following theorem.
† The cardinality of the set of real numbers is often denoted by c in the literature (“c” stands for
“continuum”).
For (v), let a ≤ b and A, B be as above; hence (VII.4.14) there is a total, 1-1
f : A → B. Let C = B −ran( f ) (this might be empty). Set c = Card(C). Now,
C ∩ ran( f ) = ∅, and Card(ran( f )) = a. Thus, a +c c = Card(ran( f ) ∪ C) = b.
This settles the only-if part. For the if part start with A ∩ C = ∅ such that
a = Card(A), c = Card(C), and set B = A ∪ C, b = Card(B) = a +c c. Since
i : A → B (the inclusion map, given by i(x) = x for all x ∈ A) is total and 1-1,
we get a ≤ b by VII.4.14.
VII.5.8 Proposition. $+_c \restriction \omega^2 = + \restriction \omega^2$, i.e., cardinal and ordinal addition agree on ω × ω.
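$$\begin{aligned}
m + n &= \mathrm{Card}(m + n) && \text{since } m + n \in \omega \\
&= \mathrm{Card}(m) +_c \mathrm{Card}(\{m + k : k \in n\}) && \text{since } m + n = m \cup \{m + k : k \in n\}\text{, a disjoint union} \\
&= m +_c n
\end{aligned}$$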
VII.5.9 Definition. For any cardinals a and b, a ·c b stands for Card(a × b),
their product.
Note that a ·c b ≠ a · b, in general (e.g., ω ·c 2 = Card(ω × 2) = ω, while the ordinal product ω · 2 is ω + ω).
VII.5.12 Proposition. $\cdot_c \restriction \omega^2 = \cdot \restriction \omega^2$.
Proof. We have
m · n = Card(m · n) since m · n ∈ ω
= Card(n × m) by VI.10.18
= Card(m × n) via ⟨k, l⟩ ↦ ⟨l, k⟩
= m ·c n by VII.5.9
(i) a ·c 0 = 0
(ii) a ·c 1 = a
(iii) a ·c b = b ·c a
(iv) (a ·c b) ·c c = a ·c (b ·c c)
(v) If a ≤ b, then a ·c c ≤ b ·c c
(vi) a ·c (b +c c) = a ·c b +c a ·c c.
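Proof. Let a ≥ ω be the smallest cardinal for which the claim fails. Then a > ω
by VII.5.11. Let a × a ≅ β via the J of Section VI.7, i.e., J[a × a] = β. We
know that J[a × a] ≥ a by Exercise VI.30, and β ≠ a, since otherwise a × a ∼ a and the claim would hold for a; so a < β. Therefore a ∈ β = J[a × a], and
a is order-isomorphic to the initial segment of a × a below ⟨γ, δ⟩, for some ⟨γ, δ⟩ ∈ a × a (1)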
Since Lim(a) (by VII.4.8), take a λ < a to satisfy also max(γ, δ) < λ. Thus,
the isomorphism in (1) establishes a ≼ λ × λ (the pairs below ⟨γ, δ⟩ have both components below λ). Therefore
a = Card(a) ≤ Card(λ × λ) = Card(λ) ·c Card(λ) = Card(λ) (2)
the last “=” by minimality of a, and Card(λ) ≤ λ < a (using VII.4.6). We now
have a contradiction a ≤ Card(λ) < a.
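The step "max(γ, δ) < λ gives a ≼ λ × λ" rests on how the pairs are ordered. Section VI.7's J is not reproduced here, so the sketch below assumes the standard canonical (Gödel) ordering, which compares the maximum first and then breaks ties lexicographically; the book's tie-breaking may differ. On finite pairs it checks that n × n occupies exactly the first n² positions. `canonical_key` and `J_table` are hypothetical names.

```python
def canonical_key(pair):
    """Canonical (Goedel) ordering of pairs: compare max first,
    then the first coordinate, then the second."""
    x, y = pair
    return (max(x, y), x, y)

def J_table(bound):
    """Position of every pair in bound x bound under the canonical ordering,
    i.e., a finite stand-in for the order-isomorphism J."""
    pairs = sorted(((x, y) for x in range(bound) for y in range(bound)),
                   key=canonical_key)
    return {p: i for i, p in enumerate(pairs)}

print(J_table(2))   # {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}

# For every n <= bound, J maps n x n onto the initial segment {0, ..., n*n - 1}.
J = J_table(10)
for n in range(11):
    assert {J[(x, y)] for x in range(n) for y in range(n)} == set(range(n * n))
```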
That is (3).
Apart from our use of AC in the definition of cardinals (through the well-
ordering theorem), the above result also invoked AC (twice) additionally (why
twice?). AC can be avoided if we are content to prove instead: “Assume that
cardinals were defined without AC, say as in VII.4.27. Now, assume (1) and (2)
above, and moreover let I and $\bigcup_{i\in I} A_i$ be well-orderable. Then (3) follows
within ZF.”
Indeed, let α be the order type of $(\bigcup_{i\in I} A_i, <_1)$, with respect to some arbitrarily chosen well-ordering $<_1$ of this set. Let us also pick a well-ordering $<_2$ of I. Define for
each $x \in \bigcup_{i\in I} A_i$
VII.5.17 Example. Here is a situation where we may want to use the technique
of the previous example: We have a first order language of logic, where the set of
nonlogical symbols has cardinality k. How “many” formulas can this language
have? Well, no more than strings over the alphabet of the language. Now the
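cardinality of the alphabet, L, is
$$\mathrm{Card}(L) = \omega +_c \mathfrak k = \begin{cases} \omega & \text{if } \mathfrak k \le \omega \\ \mathfrak k & \text{otherwise} \end{cases}$$
where ω is the cardinality of the set of logical symbols (assuming the object
variables are $v_0, v_1, \ldots$).
A “string” of length n < ω is, of course, a member of $L^n$. An easy induction
on n ≥ 1, via VII.5.14, shows that
$$\mathrm{Card}(L^n) = \begin{cases} \omega & \text{if } \mathfrak k \le \omega \\ \mathfrak k & \text{otherwise} \end{cases}$$
(for n = 0, $L^0$ contains only the empty string). Thus, the set of all strings over L, $\bigcup_{n\in\omega} L^n$, has cardinality
$$\mathrm{Card}\Big(\bigcup_{n\in\omega} L^n\Big) \le \begin{cases} \omega \cdot_c \omega = \omega & \text{if } \mathfrak k \le \omega \\ \mathfrak k \cdot_c \omega = \mathfrak k & \text{otherwise} \end{cases}$$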
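For a countable alphabet the bound ω ·c ω = ω can be made concrete. Identify the symbols of L with natural numbers by some fixed enumeration (an assumption of this sketch); then strings over L are finite sequences of naturals, and the following standard coding, which is not taken from the text, is a bijection between such sequences and ω. It sends a sequence to a finite set of strictly increasing partial sums and then reads that set as a binary numeral. The names `encode` and `decode` are hypothetical.

```python
def encode(seq):
    """Bijection from finite sequences of naturals to naturals:
    (x0, ..., x_{k-1}) -> {x0, x0+x1+1, x0+x1+x2+2, ...} -> sum of 2**e over that set."""
    code, position = 0, -1
    for x in seq:
        position += x + 1          # exponents are strictly increasing
        code += 2 ** position
    return code

def decode(code):
    """Inverse of encode: the gaps between consecutive set bits give the entries."""
    seq, last, position = [], -1, 0
    while code:
        if code & 1:
            seq.append(position - last - 1)
            last = position
        code >>= 1
        position += 1
    return tuple(seq)

print(encode((1, 0)), decode(6))   # 6 (1, 0)
```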
We define, finally, cardinal exponentiation. This again turns out to be far too
“easy” by comparison with ordinal exponentiation.
$$F(h) = f \circ h \circ g^{-1}$$
$$2^{\aleph_0} = \mathrm{Card}(\mathcal P(\omega)) = \mathfrak c, \quad \text{by VII.3.8}$$
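(i) $\mathfrak a^0 = 1$
(ii) $\mathfrak a^1 = \mathfrak a$
(iii) $\mathfrak a^{\mathfrak k +_c \mathfrak l} = \mathfrak a^{\mathfrak k} \cdot_c \mathfrak a^{\mathfrak l}$
(iv) $(\mathfrak a^{\mathfrak k})^{\mathfrak l} = \mathfrak a^{\mathfrak k \cdot_c \mathfrak l}$
(v) $\mathfrak a^{\mathfrak k} \le \mathfrak b^{\mathfrak k}$ whenever $\mathfrak a \le \mathfrak b$.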
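Proof. (i): The empty function 0 is the only member of ${}^{0}\mathfrak a$; hence $\mathrm{Card}({}^{0}\mathfrak a) = 1$.
For (ii), the set of total functions $f : 1 \to \mathfrak a$ is ${}^{1}\mathfrak a = \{\{\langle 0, \gamma\rangle\} : \gamma < \mathfrak a\}$. Thus
$\mathrm{Card}({}^{1}\mathfrak a) = \mathfrak a$, via the 1-1 correspondence $\{\langle 0, \gamma\rangle\} \mapsto \gamma$.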
The next result shows that cardinal exponentiation coincides with ordinal
exponentiation over ω, just like the addition and multiplication.
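$$\begin{aligned}
m^{\cdot(n+1)} &= m^{\cdot n} \cdot m && \text{by VI.10.23} \\
&= m^{n} \cdot_c m && \text{by I.H. and VII.5.12} \\
&= m^{n} \cdot_c m^{1} && \text{by VII.5.21} \\
&= m^{n +_c 1} && \text{by VII.5.21} \\
&= m^{n+1} && \text{by VII.5.8}
\end{aligned}$$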
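VII.5.23 Example. How big is $\aleph_0^{\aleph_0}$? Well, this is $\mathrm{Card}({}^{\omega}\omega)$; therefore
$$\begin{aligned}
\mathfrak c &= \mathrm{Card}({}^{\omega}2) \\
&\le \mathrm{Card}({}^{\omega}\omega) && \text{since } {}^{\omega}2 \subseteq {}^{\omega}\omega \\
&\le \mathrm{Card}(\mathcal P(\omega \times \omega)) && \text{since } {}^{\omega}\omega \subseteq \mathcal P(\omega \times \omega) \\
&= \mathrm{Card}({}^{\omega\times\omega}2) && \text{by VII.3.9} \\
&= \mathfrak c && \text{by VII.5.20 and VII.2.9}
\end{aligned}$$
Thus, $\aleph_0^{\aleph_0} = \mathfrak c$.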
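VII.5.24 Example. For any $n \in \omega - \{0\}$ and set A, we have $A^n \sim {}^{n}A$ via the 1-1
correspondence $\langle x_0, \ldots, x_{n-1}\rangle \mapsto \{\langle i, x_i\rangle : i \in n\}$. Thus $\mathrm{Card}({}^{n}A) = \mathrm{Card}(A^n)$.
In particular, $\mathfrak a^2 = \mathrm{Card}({}^{2}\mathfrak a) = \mathrm{Card}(\mathfrak a \times \mathfrak a) = \mathfrak a \cdot_c \mathfrak a$ for any $\mathfrak a$.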
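The correspondence between n-tuples and functions on n can be checked on a small finite set. In this sketch a function n → A is represented as a Python dict (an assumption), the 3-element set A is hypothetical, and `tuple_to_function` is a hypothetical name; every function 2 → A arises from exactly one pair, so the count 9 = 3 ·c 3 illustrates $\mathfrak a^2 = \mathfrak a \cdot_c \mathfrak a$.

```python
from itertools import product

def tuple_to_function(t):
    """<x_0, ..., x_{n-1}>  |->  {<i, x_i> : i in n}, with the function stored as a dict."""
    return {i: x for i, x in enumerate(t)}

A = ["p", "q", "r"]   # a hypothetical 3-element set
n = 2
tuples = list(product(A, repeat=n))                  # the members of A^n
functions = [tuple_to_function(t) for t in tuples]   # the members of ^n A
# distinct tuples give distinct functions, so Card(^n A) = Card(A^n) = 3 * 3
print(len(tuples), len({tuple(sorted(fn.items())) for fn in functions}))  # 9 9
```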
VII.5.25 Remark. We saw in the discussion following VII.4.26 (p. 466) that
(ℵα )+ ≤ Card(P(ℵα ))
or
Thus,
$$\aleph_{\alpha+1} = 2^{\aleph_\alpha}$$
and
(2) we have just estimated $\mathfrak a^+$ in general:
$$\mathfrak a^+ \le 2^{\mathfrak a}$$
Thus, cf(α) is the smallest ordinal into which we can “shrink” α via a (total)
function. If Lim(α), and f : β → α is cofinal, then ran(f) is unbounded in α
(hence sup ran(f) = ⋃ran(f) = α), since γ < α implies γ + 1 < α, and hence
γ < γ + 1 ≤ f (σ ) for some σ ∈ β.†
From the preamble to the section it follows that 1 = cf(α + 1) for any α.
Also, ω = cf(ω), since for each n ∈ ω and f : n → ω, ran(f) is finite (Exercise VII.18); hence ⋃ran(f) ∈ ω, so f is not cofinal, and thus, for all n ∈ ω, cf(ω) ≠ n. Also,
by VII.4.23, cf(ℵ_ω) = ω (the map n ↦ ℵ_n is cofinal in ℵ_ω); therefore some “huge” ordinals (in this case a cardinal) can shrink quite a bit.
Finally, it is clear that cf(α) ≤ α, since, whenever Lim(α), the identity
function maps α cofinally into α, while when α = β + 1, cf(α) = 1 ≤ α.
Proof. If α is a successor then the result is trivial. Otherwise, let 1 ≤ β < cf(α)
and g : β → cf(α) be a 1-1 correspondence. Suppose that f maps cf(α) to α
cofinally. Thus, ⋃ran(f) = α. Clearly ⋃ran(f ∘ g) = α as well (g is onto, so ran(f ∘ g) = ran(f)); thus f ∘ g
maps β cofinally into α, contradicting the minimality of cf(α).
Thus, all regular ordinals are cardinals. In particular, all, except 1, are limit
ordinals.
Proof. If α is regular, then the result is trivial. Otherwise, use VII.6.5 and
VII.6.6.
VII.6.8 Corollary. If F is normal, then cf(F(α)) = cf(α) for all limit ordi-
nals α.
VII.6.11 Proposition. The infinite cardinal a is singular iff, for some β < a
and a family of sets $(A_\alpha)_{\alpha<\beta}$ with $\mathrm{Card}(A_\alpha) < \mathfrak a$ for all sets in the family, one
has $\mathfrak a = \mathrm{Card}\big(\bigcup_{\alpha<\beta} A_\alpha\big)$.
(1) The above proposition can be rephrased to read exactly as above, but with
β replaced by a cardinal b < a. In the only-if part this is so because cf(a)
is a cardinal. In the if part it is so because the smallest ordinal β that makes
the proof work is cf(a). But this is a cardinal.
(2) As the notions “singular” and “regular” pertain to ordinals, the remark
following VII.5.16 applies here, so that VII.6.11 is provable within ZF on
the assumption that a is well-orderable. Remarks such as that and the present
one are only of value when one wants to gauge with accuracy which results
follow, or do not follow, from what axioms.
It is known that without AC we cannot prove (in ZF) that ℵ1 is regular (Feferman
and Levy (1963)). One cannot even prove (in ZF) that there are any infinite
regular cardinals at all beyond ω (Gitik (1980)).
Let us next turn our attention to regular limit cardinals beyond ω. These have
a special name.
On the other hand, cf(α) = ω for this α, since n ↦ s_n is cofinal. Thus cf(ℵα) =
cf(α) = ω < ℵα . We have just established that the first fixed point of ℵ, a huge
limit cardinal, is singular. As this was only the first candidate for a weakly
inaccessible cardinal, the first actual such cardinal will be even bigger, as it
must occur later in the aleph sequence.
It turns out that within ZFC one cannot prove that weak inaccessibles exist.
We will prove this relatively easy metamathematical fact below, but first we will
need a notion of strongly inaccessible cardinals and some additional cardinal
arithmetic tools.
(1) Any cardinal $\mathfrak a$ that satisfies $\mathfrak b < \mathfrak a \to 2^{\mathfrak b} < \mathfrak a$ is called a strong limit.
(2) The above definition could also be phrased: “A cardinal $\mathfrak a > \aleph_0$ is strongly
inaccessible, or just inaccessible, iff it is regular and, moreover, for every infinite
cardinal $\mathfrak b < \mathfrak a$, $2^{\mathfrak b} < \mathfrak a$.” This is because for any $\mathfrak b$, $\mathfrak b^+ \le 2^{\mathfrak b}$; thus the “moreover”
part yields the implication $\mathfrak b < \mathfrak a \to \mathfrak b^+ < \mathfrak a$; hence $\mathfrak a$ is a limit cardinal and
therefore, in particular, weakly inaccessible.
In the presence of the generalized continuum hypothesis (GCH) that $2^{\mathfrak b} = \mathfrak b^+$
for all infinite cardinals, the requirement in VII.6.14 that $\mathfrak b < \mathfrak a$ implies $2^{\mathfrak b} < \mathfrak a$
is automatically satisfied, since $\mathfrak a$ is a limit cardinal. Thus, under GCH, weak
and strong inaccessibles coincide.
A strongly inaccessible in comparison with other (smaller) infinite cardinals
is like ω in comparison with smaller cardinals (natural numbers), since n ∈ ω
implies $2^n \in \omega$.†
Intuitively, in the sum we “count” all the elements in all the $\mathfrak k_i$ and allow for
multiplicity of occurrence as well, since if $\mathfrak k_i = \mathfrak k_j$ with i ≠ j, still $(\{i\} \times \mathfrak k_i) \cap (\{j\} \times \mathfrak k_j) = \emptyset$.
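Proof.
$$\sum_{\alpha\in\mathfrak a} \mathfrak b = \mathrm{Card}\Big(\bigcup_{\alpha\in\mathfrak a} \{\alpha\} \times \mathfrak b\Big) = \mathrm{Card}(\mathfrak a \times \mathfrak b) = \mathfrak a \cdot_c \mathfrak b$$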
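The role of the tags {i} in a cardinal sum can be seen on small finite sets standing in for the cardinals $\mathfrak k_i$ (an assumption of this sketch; `tagged_union` and the sample family are hypothetical). The plain union forgets repeated summands, while the union of the tagged copies keeps them; summing the same $\mathfrak b$ over $\mathfrak a$ indices yields $\mathfrak a \cdot_c \mathfrak b$, as in the computation just given.

```python
def tagged_union(family):
    """The set underlying the sum of (k_i)_{i in I}: the union of the copies {i} x k_i."""
    return {(i, x) for i, k in family.items() for x in k}

# Indices 0 and 1 carry the same set; the plain union forgets the repetition,
# the tagged union keeps it.
family = {0: {"p", "q"}, 1: {"p", "q"}, 2: {"r"}}
print(len(set().union(*family.values())), len(tagged_union(family)))   # 3 5

# Summing the same b = 4 over a = 3 indices gives a ·c b = 12.
print(len(tagged_union({alpha: set(range(4)) for alpha in range(3)})))  # 12
```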
VII.6.19 Remark. If $A_i \sim B_i$ for $i \in I$, then $\prod_{i\in I} A_i \sim \prod_{i\in I} B_i$ (Exercise VII.64).
Card(Ai ) = ai for i ∈ I
Thus,
"
bi = Bi
i∈I i∈I
Pi = { p(i) : p ∈ Bi } (2)
Card(Pi ) ≤ Card(Bi )
≤ Card(Ai )
< bi
= acf(a)
In the absence of CH, ZFC cannot pinpoint the cardinal $2^{\aleph_0}$ in the aleph
sequence with any certainty. If $2^{\aleph_0} = \aleph_1$, then fine. But if not, then it can be
(i.e., it is consistent with ZFC), as Cohen forcing has shown, that $2^{\aleph_0} = \aleph_2$ or
$2^{\aleph_0} = \aleph_3$, or, indeed, that $2^{\aleph_0}$ is weakly inaccessible, provided that the existence of such
inaccessibles is consistent with ZFC.
However, we know that $2^{\aleph_0} \neq \aleph_\omega$ by VII.6.22, since $\mathrm{cf}(\aleph_\omega) = \aleph_0$.
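Proof. Trivially,
$${}^{\mathfrak m}\mathfrak a \supseteq \bigcup_{\alpha<\mathfrak a} {}^{\mathfrak m}\alpha \qquad (1)$$
Let next $f \in {}^{\mathfrak m}\mathfrak a$. By the assumption, $\sup \mathrm{ran}(f) < \mathfrak a$; hence $f \in \bigcup_{\alpha<\mathfrak a} {}^{\mathfrak m}\alpha$. Thus
(1) is promoted to equality.
Next,
$$\begin{aligned}
\mathfrak a^{\mathfrak m} &= \mathrm{Card}\Big(\bigcup_{\alpha<\mathfrak a} {}^{\mathfrak m}\alpha\Big) \\
&\le \sum_{\alpha<\mathfrak a} \mathrm{Card}(\alpha)^{\mathfrak m} && \text{by Exercise VII.60} \\
&= \mathfrak a \cdot_c \sup_{\alpha<\mathfrak a} \mathrm{Card}(\alpha)^{\mathfrak m} && \text{by Exercise VII.62}
\end{aligned}$$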
$$\aleph_{\alpha+1}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot_c \aleph_{\alpha+1}$$
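Proof. For β ≤ α we apply VII.6.24 (see also VII.6.12 and VII.5.21) to obtain
$$\begin{aligned}
\aleph_{\alpha+1}^{\aleph_\beta} &= \sum_{\gamma<\aleph_{\alpha+1}} \mathrm{Card}(\gamma)^{\aleph_\beta} \\
&\le \sum_{\gamma<\aleph_{\alpha+1}} \aleph_\alpha^{\aleph_\beta} && \text{(note the “}\le\text{”: } \mathrm{Card}(\gamma) \le \aleph_\alpha \text{ for each such } \gamma\text{)} \\
&= \aleph_\alpha^{\aleph_\beta} \cdot_c \aleph_{\alpha+1} \\
&\le \aleph_{\alpha+1}^{\aleph_\beta} \cdot_c \aleph_{\alpha+1} \\
&= \aleph_{\alpha+1}^{\aleph_\beta}
\end{aligned}$$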
Hence, from $\aleph_{\alpha+1} \le \aleph_\beta < 2^{\aleph_\beta}$ the contention becomes the following provable
statement:
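Proof. (i):
$$\begin{aligned}
\aleph_\alpha^{\aleph_\beta} &= 2^{\aleph_\beta} && \text{by Exercise VII.56} \\
&= \aleph_{\beta+1} && \text{by GCH}
\end{aligned}$$
(ii):
(iii):
$$\begin{aligned}
\aleph_\alpha^{\aleph_\beta} &= \sum_{\gamma<\aleph_\alpha} \mathrm{Card}(\gamma)^{\aleph_\beta} && \text{by VII.6.23} \\
&= \aleph_\alpha \cdot_c \sup_{\gamma<\aleph_\alpha} \mathrm{Card}(\gamma)^{\aleph_\beta} && \text{by Exercise VII.62}
\end{aligned}$$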
To conclude, by induction on β, we show that Card(VN (β)) < α for all β < α.
If β = 0, then Card(VN (β)) < α from the choice of N .
If β = γ + 1, then
Card(VN (γ + 1)) = Card(P(N ∪ VN (γ )))
≤ $2^{\mathrm{Card}(N) \,+_c\, \mathrm{Card}(V_N(\gamma))}$
< α, by the I.H. and α’s “strong limit” property.
If Lim(β), then $V_N(\beta) = \bigcup_{\gamma\in\beta} V_N(\gamma)$. By the I.H., Card(VN(γ)) < α for
γ < β; thus
Card(VN (β)) < α, by the I.H. and VII.6.11, since β < α.
Thus, (2) yields Card(VN (α)) ≤ α, and the result follows from (1).
or that (VI.8.13)
$$\mathcal P^{V_N(\alpha)}(A) \in V_N(\alpha)$$
By absoluteness of ⊆, $\mathcal P^{V_N(\alpha)}(A) = \mathcal P(A) \cap V_N(\alpha) = \mathcal P(A)$, the last equality because x ⊆ A ∈ V_N(β) (β < α) implies x ⊆ A ⊆ N ∪ V_N(β),
and hence x ∈ VN (β + 1). Thus, also, P(A) ⊆ P(N ∪ VN (β)); therefore
P(A) ∈ VN (β + 2).
(ix) Collection. For convenience, we approach collection via its equivalent
form, replacement, that is, “For any set A and any function f , f [A] is
a set.” We need, therefore, to show that for any set A ∈ VN (α) and any
function f ∈ VN (α)
† One can also do this with a sledgehammer: {a, b} ⊆ VN (α) and Card({a, b}) ≤ 2 < α. Hence
{a, b} ∈ VN (α), by VI.6.28.
(xi) AC. Let S be a set of nonempty sets in V_N(α). We need a choice function in V_N(α). By AC, there is a choice function, f : S → ⋃S, in ZFC,
such that f(x) ∈ x for all x ∈ S. Now by (viii), ⋃S ∈ V_N(α); hence
S × ⋃S ∈ V_N(α) (why?). Thus, f ⊆ S × ⋃S implies f ∈ V_N(α).
Then we have a proof in ZFC that the smallest inaccessible, β, exists. Now
introduce new constants β for that inaccessible, and N for a set of urelements
such that Card(N ) < β.† Thus, VN (β) is a (formal) model of ZFC. By (1), I.7.9,
and VII.6.30,
or
which contradicts the choice of β (see VI.6.9). Since ZFC is consistent, this
contradiction establishes the original claim.
VII.6.32 Remark. (1) The above can be transformed to a ZFC proof, via a
formal model, that if ZFC is consistent, then so is ZFC + ¬(∃α)I (α) (where we
use “I (α)” here as an abbreviation of “α is strongly inaccessible”). Once we fix
N with, say, Card(N ) ≤ ω, the model is M = {x : (∀α)(I (α) → x ∈ VN (α))}
† The reader has had enough practice by now to see that augmenting ZFC thus – including the
relevant axioms, e.g., “N is a set of atoms”, “Card(N ) < β”, etc. – results in a conservative
extension.
‡ Intuitively, if α ∈ $V_N(\beta)$, then an inhabitant of $V_N(\beta)$ will perceive it as strongly inaccessible
iff an inhabitant of $U_N$ does.
VII.6. Cofinality; More Cardinal Arithmetic 493
with interpretation of ∈, U as themselves. This is clearly so, for there are two
cases: If there are no inaccessibles (i.e., ¬(∃α)I(α)), then M = $U_N$ is a model of
ZFC + ¬(∃α)I(α); else M = $V_N(\beta)$, where β is the smallest inaccessible, and
hence again (by VII.6.31) M is a model of ZFC + ¬(∃α)I(α) since (∃α)I(α)
is false in M (I.7.4).
(2) Can we, again in ZFC, prove consistency of ZFC + (∃α)I (α) (assuming
consistency of ZFC)? No, because this would clash with Gödel’s second incompleteness theorem, which says “In any extension S of ZFC, if S is recognizable
and consistent, then S ⊬ CONS(S)”, where CONS(S) is a formula that says “S
is consistent”. In outline, this goes like this. Assume that we have a proof
By VII.6.30,
since, if β is any inaccessible and Card(N ) < β, then for the universal closure
F of every axiom of ZFC we have
(3) How about weakly inaccessibles? Can we prove in ZFC (if this is consis-
tent† ) that weakly inaccessibles exist? Suppose we could. Then we could also
prove this in the extension theory ZFC + GCH (which is also consistent – as
Gödel has shown using his L). If β is the smallest weakly inaccessible as far as
ZFC knows, then it is also the smallest strongly inaccessible in ZFC + GCH.
But then VN (β), constructed in ZFC + GCH with a well-chosen N , is a model
of ZFC. We have, as in VII.6.31,
† By now this hedging must have become annoying. However, recall that if ZFC is inconsistent,
then for any formula F whatsoever, ZFC ⊢ F.
$$\{a_1, \ldots, a_n\} \mapsto f(a_{j_1}, \ldots, a_{j_n})$$
$$R[\mathcal P(X)] = \{a : (\exists A)(A \mathrel{R} a \wedge A \subseteq X)\}$$
VII.7.3 Example. Every nonzero ordinal is closed under the solitary rule ∅ ↦ 0.
Every limit ordinal is closed under the rule set
∅ ↦ 0
{α} ↦ α + 1
It is clear that the rule sets are not single-valued relations in general. We will
often omit mention of the set S such that R ⊆ P(S) × S.
As in VII.1.19, we have
Proof. Uniqueness. Say that S, T are both candidates for Cl(R). Then S ⊆ T
and T ⊆ S by VII.7.4; hence S = T .
Existence. To see that
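$$S = \bigcap\{X : X \text{ is } R\text{-closed}\} \qquad (1)$$
$$(\forall f \in F)\big(\{a_1, \ldots, a_n\} \overset{R}{\mapsto} f(a_{j_1}, \ldots, a_{j_n}) \text{ for all permutations } a_i \mapsto a_{j_i}\big) \qquad (2)$$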
By (2), the only way that (3) is possible is that $a = f(a_{j_1}, \ldots, a_{j_n})$ for some
permutation of $a_1, \ldots, a_n$; hence a ∈ Cl(I, F). We also need to settle the case
$$\mathrm{Cl}(I, F) \supseteq \emptyset \overset{R}{\mapsto} a \qquad (4)$$
Thus,
Two remarks:
(i) By Q-induction (in the sense of VI.2.1), the right hand side of “iff” above
says that A ⊆ X , and hence A = X .
(ii) Thus, by (1), since Cl(R) ⊆ A (why is Cl(R) ⊆ A?) and Cl(R) is R-closed,
we get that A = Cl(R).
(1) We cannot bring in induction over the entire field of Q through the back
door (via R of VII.7.8), if Q does not have IC on B. Our ability to do
induction in these cases is restricted to some (often – as above – trivial)
subset, Cl(R), of the field of Q (see Exercise VII.73 for what we can say
in the general case).
(2) To define “useful” closures, the rule set must have rules with empty premises
(∅ ↦ a).
∅ ↦ 1
∅ ↦ 2
∅ ↦ 3
{x, y} ↦ x + y
{x, y} ↦ y + x
{x, y} ↦ x × y
{x, y} ↦ y × x
val(1) = 1
val(2) = 2
val(3) = 3
val(x + y) = val(x) + val(y) if x, y are the i.p.’s of x + y
val(x × y) = val(x) × val(y) if x, y are the i.p.’s of x × y
It should be clear that the above definition of val is, intuitively, “ambiguous” or
“ill-defined” (terminology that was formally adopted in VII.7.10). For example,
VII.7. Inductively Defined Sets 499
there are two choices of i.p. sets for 1 + 2 × 3. One choice is x = 1 + 2 and
y = 3 (under ×), so that
We get different results! Even 1 + 2 + 3 has two possible sets of i.p.’s, although
this does not create a problem for val, since + is commutative.
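The ambiguity can be made explicit by computing every value that val could return on a bracket-free expression, one for each way of choosing the immediate predecessors (i.p.'s) of the whole expression and then, recursively, of its parts. The sketch below assumes expressions are given as lists of tokens over the constants 1, 2, 3 and the operators + and ×; `values` is a hypothetical name, and any operator token other than "+" is treated as ×.

```python
def values(tokens):
    """All values val can take on a bracket-free token list, one for each
    choice of i.p.'s x op y, applied recursively to x and y."""
    if len(tokens) == 1:
        return {int(tokens[0])}
    results = set()
    for k in range(1, len(tokens), 2):          # positions of the operators
        op, left, right = tokens[k], tokens[:k], tokens[k + 1:]
        for u in values(left):
            for v in values(right):
                results.add(u + v if op == "+" else u * v)
    return results

print(sorted(values("1 + 2 × 3".split())))   # [7, 9]
print(sorted(values("1 + 2 + 3".split())))   # [6]
```

The two values 7 and 9 correspond to the two i.p. choices for 1 + 2 × 3 discussed above; for 1 + 2 + 3 both choices agree.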
It is not always easy to prove that a rule set is unambiguous (it is much easier,
in general, to spot an ambiguity). The reader will be asked in the Exercises sec-
tion to check that a few familiar rule sets are unambiguous (Exercises VII.74
to VII.77). Freedom from ambiguity is important in an inductive definition
effected by a rule set R, for we can then “well-define”, recursively, functions
by induction over Cl(R). Examples of such functions are the val function over
arithmetic expressions (assuming that arithmetic expressions are defined more
carefully than in VII.7.11: brackets would have helped – see Exercise VII.75),
the truth-value function on formulas (propositional calculus), assigning “mean-
ing” (i.e., “interpretation” over some structure) to terms and formulas of a first
order language, numerous definitions on “trees”, and more. The following result
allows such recursive definitions.
Thus,
Does Q have IC? Well, suppose that S is a set for which we know that
Qx ⊆ S → x ∈ S (2)
Can we conclude that A ⊆ S? Indeed we can, as follows: Let Y ⊆ S and $Y \overset{R}{\mapsto} y$.
By (1), Qy = Y ⊆ S. By (2), y ∈ S. Thus, S is R-closed; hence A ⊆ S.
It follows that, for unambiguous R, R-induction can be replaced by i.p.
induction (see however VII.7.9).
Since Q has IC on Cl(R) (by Example VII.7.12) we are done, via VI.2.28.
Several variations are possible for VII.7.13 (see Section VI.2), but we will
not pursue them here.
We return to operators : P(X ) → P(X ) (Definition VII.1.25). It is now the
case (compare with VII.1.31, footnote) that every monotone operator gives rise
to an equivalent rule set.
As in VII.1.27,
= Z
Z ⊆X
(Z )⊆Z
= Z = Cl(R)
Z ⊆X
Z is R -closed
– that is, at each stage, we add to the S that we have so far all the new points
we constructed in (S) (cf. the “abstraction” of this in VI.5.47).
for all α.
We call the set α the αth stage (often, by abuse of terminology, we refer
to the ordinal α itself as the αth stage). An element s ∈ α has level or stage
≤ α. It has level α if moreover s ∈ / β for all β < α. We write = α α
or ∞ = α α . We call ( ∞ ) the class inductively defined by the opera-
tor .
The notation α might be confusing at first sight. This is the α-th set constructed;
it is not an operator. By the way, an easy induction on α shows that α is indeed
a set (Exercise VII.78).
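The stage-by-stage construction described above (at each stage, add to what we have so far all the new points it produces) can be sketched for a finite rule set R, where the stages necessarily stabilize after finitely many steps. In this sketch a rule A ↦ a is represented as a pair (frozenset(A), a); `gamma`, `stages` and the sample rule set are hypothetical illustrations, not the book's notation. For a finite R, the stable set reached is Cl(R), the least R-closed set.

```python
def gamma(rules, S):
    """Operator associated with a rule set: all conclusions a of rules
    A |-> a whose premise set A is included in S."""
    return {a for (A, a) in rules if A <= S}

def stages(rules):
    """Build the stages: at each step add to S everything produced from S.
    Stops at the first stage that adds no new points (a fixed point)."""
    S, history = set(), []
    while True:
        new = S | gamma(rules, S)
        history.append(frozenset(new))
        if new == S:
            return S, history
        S = new

# Hypothetical finite rule set: 0 from no premises, n + 1 from {n} while n < 5.
R = [(frozenset(), 0)] + [(frozenset({n}), n + 1) for n in range(5)]
closure, history = stages(R)
print(sorted(closure))   # [0, 1, 2, 3, 4, 5]
```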
The notation <α for the union of all the stages before α is due to Moschovakis
(it corresponds to the set S we used in the pseudo-program above). Note that
<0 = ∅ and hence 0 = (∅).
VII.7.16 Lemma. Let be any monotone operator (not necessarily a set). If,
for some α, <α = α , then:
(1) = α = β for β ≥ α.
(2) is a fixed point of , that is, () = .
= <α ∪ α by I.H.
<α
= by the choice of α
Hence,
β = <β ∪ ( <β )
= <α ∪ ( <α )
= α
In particular, = β β = <α = α , for the above α.
As a by-product, is a set.
(2): Since α = <α ∪ ( <α ), it follows that ( <α ) ⊆ α ; hence,
() ⊆ (i)
by = α = <α , with α as above. Next, as an I.H., assume that
<β ⊆ ()
Now,
() = γ
γ
⊇ γ by monotonicity of . (ii)
γ <β
Therefore β ⊆ () for all β; hence ⊆ (). This settles the issue,
by (i).
(1) is a set.
(2) There is an α such that <α = α . Moreover, = α = β for β ≥ α.
(3) The α of (2) satisfies Card(α) ≤ Card(X ).
(4) is a fixed point of , that is, () = .
Proof. (1): This is trivial, since α ⊆ X for all α, and hence = α α ⊆ X .
(2): By the proof of VI.5.47 (see also the remark following that proof).
Alternatively, the function f = λs. min{α : s ∈ α }, that is, the one that maps
each s ∈ to its level, is a set by (1). Let α = sup+ ran( f ). Then
α = β ∪ β
β<α β<α
= β since every s ∈ ( <α ) is in some β , β < α
β<α
<α
=
A, then
(1) || ≤ m,
(2) Cl(R) = .
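Proof. (1): Since $\Gamma^m \supseteq \Gamma^{<m}$, it suffices to show that $\Gamma^m \subseteq \Gamma^{<m}$. Let then $x \in \Gamma^m$. Thus, either $x \in \Gamma^{<m}$, in which case there is nothing to prove, or $x \in \Gamma(\Gamma^{<m})$. Thus, for some $A \subseteq \Gamma^{<m}$, we have $\mathrm{Card}(A) < m$ and $A \mapsto x$ is an R-rule. By VII.6.10, $A \subseteq \Gamma^{<\alpha}$ for some $\alpha < m$; thus $x \in \Gamma(\Gamma^{<\alpha}) \subseteq \Gamma^\alpha \subseteq \Gamma^{<m}$.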
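(2): By VII.7.16, $\Gamma(\Gamma^\infty) = \Gamma^\infty = \Gamma^m$. Thus, $\Gamma^\infty$ is an R-closed set; hence Cl(R) exists (i.e., is a set). Indeed,
$$\mathrm{Cl}(R) \subseteq \Gamma^\infty \qquad (i)$$
On the other hand, assume $\Gamma^{<\alpha} \subseteq \mathrm{Cl}(R)$. It follows (Cl(R) is R-closed) that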
VII.7.21 Lemma. For any formula $F(y, \vec{x}_n)$ and set N, there is a set $M \supseteq N$
such that:
and
(2) M can be chosen to satisfy Card(M) ≤ max(Card(N ), ℵ0 ).
Thus we may add $F(a, \vec{u})$, where a is a new constant and ρ(a) is minimum. Since M is R-closed, a ∈ M; thus $(\exists y \in M)F(y, \vec{u})$ by the substitution axiom.
(2): Using AC, cut R down to T such that for each of the n! permutations $\vec{w}$ of $u_1, \ldots, u_n$, where $u_i \in M$ (the M above), we keep a unique rule $\{u_1, \ldots, u_n\} \mapsto y$ whenever $F(y, \vec{w})$ (this T is, of course, a set). Set $M' = \mathrm{Cl}(T)$.
First of all, for all $u_i \in M'$, $(\exists y)F(y, \vec{u}) \leftrightarrow (\exists y \in M')F(y, \vec{u})$, exactly as in (1). Next, by VII.7.20,
$$M' = \bigcup_{p\in\omega} \Gamma^p \qquad (ii)$$
where $\Gamma$ is the monotone operator associated with the rule set T (recall, T is ω-based, whence the choice of upper bound of p in (ii)).
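By induction on p we argue that $\mathrm{Card}(\Gamma^p) \le \max(\mathrm{Card}(N), \aleph_0)$. Indeed, this is true for p = 0, as $\Gamma^0 = \Gamma(\Gamma^{<0}) = \Gamma(\emptyset) = N$. Now,
$$\mathrm{Card}(\Gamma^{p+1}) = \mathrm{Card}\Big(\bigcup_{i\le p}\Gamma^i \cup \Gamma\Big(\bigcup_{i\le p}\Gamma^i\Big)\Big) \le \mathrm{Card}\Big(\bigcup_{i\le p}\Gamma^i\Big) +_c \mathrm{Card}\Big(\Gamma\Big(\bigcup_{i\le p}\Gamma^i\Big)\Big) \qquad (iii)$$
By the I.H.,
$$\mathrm{Card}\Big(\bigcup_{i\le p}\Gamma^i\Big) \le (p+1)\cdot_c \max(\mathrm{Card}(N), \aleph_0) = \max(\mathrm{Card}(N), \aleph_0) \qquad (iv)$$
Also, setting $S = \bigcup_{i\le p}\Gamma^i$,
$$\begin{aligned} \mathrm{Card}(\Gamma(S)) &= \mathrm{Card}\{y : (\exists \vec{u} \in S)(\{u_1,\ldots,u_n\} \mapsto y \text{ is in } T)\} \\ &\le \mathrm{Card}(S^n) && \text{since } T \text{ is single-valued in } y \\ &\le \max(\mathrm{Card}(N), \aleph_0) && \text{by (iv)} \qquad (v) \end{aligned}$$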
Lemma VII.7.21 also holds for any set of formulas that can be indexed within the theory. However, for an arbitrary (infinite) set of formulas (not indexed within the theory) the lemma breaks down. It is still true metamathematically, though, since, arguing in the metatheory, we can index this (enumerable) set of formulas using ℕ as index set. Of course, the R so obtained (by “put $\{u_1, \ldots, u_n\} \mapsto y$ in R as long as $G_i(y, \vec{u})$ for some $i \in \mathbb{N}$, and y has least rank”) is still ω-based.
The proof technique in VII.7.21 (and the flavour of the result in VII.7.23) is
analogous to that employed towards the downward Löwenheim-Skolem theorem
of model theory (proved in volume 1 of these lectures).
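The closure of an ω-based rule set as the union of its finite stages (VII.7.20), as used in the proof of VII.7.21(2), can be sketched in Python for a finite rule set. In this sketch a rule A ↦ y is the pair (A, y) with A a finite set, the associated operator is Γ(S) = {y : A ↦ y is a rule with A ⊆ S}, and rules with empty premise set supply the starting elements. The particular rules (∅ ↦ 2, ∅ ↦ 3, and {u, v} ↦ u·v whenever u·v ≤ 30) are a hypothetical illustration only.

```python
from itertools import combinations_with_replacement
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[int], int]  # (A, y) stands for the rule A ↦ y


def gamma(rules: List[Rule], s: FrozenSet[int]) -> FrozenSet[int]:
    """Γ(S): all conclusions of rules whose premises lie inside S."""
    return frozenset(y for (a, y) in rules if a <= s)


def closure(rules: List[Rule]) -> Tuple[FrozenSet[int], int]:
    """Return (union of the stages, number of stages computed), with Γ^p = Γ^{<p} ∪ Γ(Γ^{<p})."""
    below: FrozenSet[int] = frozenset()
    p = 0
    while True:
        current = below | gamma(rules, below)
        p += 1
        if current == below:  # Γ^p = Γ^{<p}: the closure has been reached
            return current, p
        below = current


def is_closed(rules: List[Rule], c: FrozenSet[int]) -> bool:
    """C is R-closed: whenever A ⊆ C and A ↦ y is a rule, y ∈ C."""
    return all(y in c for (a, y) in rules if a <= c)


# Hypothetical rule set: base rules ∅ ↦ 2 and ∅ ↦ 3, and {u, v} ↦ u·v whenever u·v ≤ 30.
RULES: List[Rule] = [(frozenset(), 2), (frozenset(), 3)] + [
    (frozenset({u, v}), u * v)
    for u, v in combinations_with_replacement(range(1, 31), 2)
    if u * v <= 30
]
```

Here the closure consists of the numbers up to 30 of the form 2^a·3^b other than 1, reached after four stage computations; with infinitely many finitary rules one would in general need all stages p ∈ ω.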
We next apply VII.7.22 to show that for any finite set of formulas, there is a
set M such that each of these formulas is absolute for M. We say that M reflects
these formulas.
VII.7.23 Theorem (Reflection Principle). For any set N and any finite set of formulas $F_i$, $i = 1, \ldots, m$, there is a set $M \supseteq N$ such that
$$\vdash_{ZFC} F_i \leftrightarrow F_i^M \quad \text{for } i = 1, \ldots, m$$
Proof. Let $G_j$, $j = 1, \ldots, r$, be the list of all formulas that consists of the list $F_i$, $i = 1, \ldots, m$, augmented by all subformulas of the $F_i$. If none of the $G_j$ is of the form $(\exists y)Q$, then take M = N. Otherwise, take $M \supseteq N$, using VII.7.22 on all formulas of the form $(\exists y)Q$ in the $G_j$-list.
By induction on formulas we show next that
$$\vdash_{ZFC} G_j \leftrightarrow G_j^M \quad \text{for } j = 1, \ldots, r \qquad (1)$$
$$\vdash_{ZFC} A \leftrightarrow A^M$$
and
$$\vdash_{ZFC} B \leftrightarrow B^M$$
$$\vdash_{ZFC} \neg A \leftrightarrow (\neg A)^M$$
and
$$\vdash_{ZFC} A \vee B \leftrightarrow (A \vee B)^M$$
This is so because a consistent ZFC using (1) can prove the existence of a (set) model for itself (i.e., one for $\{F_1, \ldots, F_n\}$). Thus, ZFC can prove its
consistency (cf. I.7.8):
by VII.7.23 and tautological implication‡ with help from the Leibniz rule. This
is contrary to Gödel’s second incompleteness theorem.
As a by-product of this observation, we also conclude that an extension
of VII.7.21 to an arbitrary set of formulas not only does not follow (in ZFC)
from our (Löwenheim-Skolem) proof technique, but is downright impossible.
Now, working in the metatheory, we can mimic the construction that builds the model $U_A$ for some set of urelements A. Continuing in the metatheory, we can apply reflection (to the enumerable – in the metatheory – set of axioms) and “cut $U_A$ down” to an enumerable (U, ∈)-model (M, U, ∈).§ We can next apply
Mostowski collapsing (λx.C(M, x) : M → C(M, M); see VI.2.38, p. 312) to
get an ∈-isomorphic transitive set structure (C(M, M), U, ∈) which is also a
model, since λx.C(M, x) preserves atoms and the ∈-relation.¶
† ZFC, as given, indeed has infinitely many axioms. For example, collection provides one axiom for
each formula F . On the other hand, if ZFC is inconsistent, then certainly all its theorems (which
happen to be all formulas under these circumstances) follow from the single axiom (∀x)x = x.
‡ If $\vdash_{ZFC} A \leftrightarrow A^M$ and $\vdash_{ZFC} A$, then $\vdash_{ZFC} A^M$.
§ Take N in the proof of VII.7.23 to be enumerable.
¶ See VI.2.36–VI.2.39. Of course, M is extensional, being a ZFC (U, ∈)-model.
$\omega \not\sim \mathcal{P}(\omega)$ (1)
$\models_{C(M,M)} \omega \not\sim \mathcal{P}(\omega)$
and this is as it should be by (1). This person cannot see the 1-1 correspondence f that effects (2), for f is not in C(M, M). Note that the expression immediately above says the same thing as (cf. VI.8.4)
$\models_{U_N} (\omega \not\sim \mathcal{P}(\omega))^{C(M,M)}$
Consistency of GCH with ZF. We conclude this chapter with a proof that $L_N$ is a model of GCH. Which $L_N$? We add to ZF the new constants N, f and the axiom
Therefore
Hence
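$$\mathrm{Card}(\mathcal{P}(\aleph_\alpha)) \le \mathrm{Card}(\{F_\beta : \beta < \aleph_{\alpha+1}\}) \le \aleph_{\alpha+1}$$
– the last ≤ due to the onto map $\aleph_{\alpha+1} \ni \beta \mapsto F_\beta$. Since $\aleph_{\alpha+1} \le \mathrm{Card}(\mathcal{P}(\aleph_\alpha))$, we are done.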
$\{f, N\} \cup N \cup TC(f) \cup A \subseteq B \wedge \neg U(B)$ (2)
Card(B) ≤ m (3)
† Recall that “freezing”, jargon we have applied constantly towards invoking the deduction theorem,
formally means to add new set constants, A and m, and the assumptions (1).
$\vdash_{T'} \varphi$ is 1-1 (9)
† φ(x) = C(B, x) in the notation of VI.2.36. N and f are fixed points of φ. (Why?)
and
$\vdash_{T'} \operatorname{ran}(\varphi)$ is transitive† (10)
By the first case of (7)
$\vdash_{T'} x \in B \to U(x) \leftrightarrow U(\varphi(x))$ (11)
where T′ is the conservative extension of T obtained by adding the introducing axiom for φ.‡ Its language is L′, that is, L with φ added.
Now, $I = (L', T', B)$ is also a model of ZF + (V = L), and (8)–(11) yield a formal isomorphism (cf. p. 84) between I and the transitive interpretation of L, $J = (L', T', \operatorname{ran}(\varphi))$.§ In fact J is a formal (U, ∈)-model of ZF + (V = L). To see this we employ I.7.12 to obtain
$\vdash_{T'} A^B \leftrightarrow A^{\operatorname{ran}(\varphi)}$ (12)
for all sentences over L. This and (4) entail
$\vdash_{T'} A \leftrightarrow A^{\operatorname{ran}(\varphi)}$ (13)
for all L-sentences. If now F is the universal closure of a ZF + (V = L) axiom, then $\vdash_{T'} F$; hence, by (13), we also have $\vdash_{T'} F^{\operatorname{ran}(\varphi)}$.
We note two more facts:
$\vdash_{T'} \mathrm{Card}(\operatorname{ran}(\varphi)) = \mathrm{Card}(B) \le m$ (14), by (9)
$\vdash_{T'} \{N\} \cup N \cup A \subseteq \operatorname{ran}(\varphi)$ (15)
the latter by Exercise VI.6, since {N} ∪ N ∪ A is transitive. By (15) and results in Sections VI.8 and VI.9, $\alpha \mapsto F_\alpha$ is absolute for ran(φ) (i.e., for J); hence so is ord, since it is introduced by the explicit definition¶
$\operatorname{ord}(x) = \min\{\alpha : x = F_\alpha\}$
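Then
$\vdash_{T'} x \in A \to \operatorname{ord}(x) \in \operatorname{ran}(\varphi)$ (using (15))
Hence, by transitivity of ran(φ),
$\vdash_{T'} x \in A \to \operatorname{ord}(x) \subseteq \operatorname{ran}(\varphi)$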
† This does not need the extensional nature of B. By the way, $\vdash_{T'} \operatorname{ran}(\varphi) = C(B, B)$ in the notation of VI.2.38.
‡ Extensions by definitions are conservative. Cf. I.6.
§ Where U, ∈, N and f are interpreted as themselves, just as in I. Cf. footnote to the definition of φ.
¶ $\operatorname{ord}(x) = \alpha \leftrightarrow (x = F_\alpha \wedge (\forall \beta \in \alpha)\, x \neq F_\beta)$, etc.
VII.8. Exercises
VII.1. Show that ω ∼ ω + 2.
VII.2. Show that if A ∼ B, then A is finite iff B is finite.
VII.3. Fill in the missing details in the proof of Proposition VII.1.19.
VII.4. Show that the concatenation of any finite number of (I , F )-derivations
is a (I , F )-derivation.
VII.5. Prove, using first Definition VII.1.32 and then Definition VII.1.3, that for any x and y such that $x \neq y$, both {x} and {x, y} are finite. Use the second method to also compute their cardinality.
VII.6. Fill in any missing details in the proof of Proposition VII.1.35.
VII.7. Show, using induction on WR-finite sets, that if A is WR-finite and f
is a function, then f [A] is WR-finite.
VII.8. Show that every natural number is WR-finite.
(Hint. Induction on WR-finite sets.)
VII.9. Prove that if A is WR-finite and B ⊆ A, then B is WR-finite. (Do not
use the equivalence of finite with WR-finite.)
VII.10. Prove, by induction on finite sets, that if A is finite and < is a partial
order on A, then A has both a <-minimal and a <-maximal element.
VII.11. Using the previous problem, give an alternative proof that ω is infinite.
(Hint. Consider the partial order ∈ on ω.)
VII.12. If |A| = n + 1 and a ∈ A, then |A − {a}|= n.
† If A ⊆ X and the set X is fixed throughout the discussion, then the characteristic function of A
(with respect to X being understood) is the function χ A = λx.if x ∈ A then 0 else 1.
This last example shows that our tentative plan needs an amendment,
for the second component of the pair is not a “real” as we understand
them in this problem. Make an appropriate amendment to obtain a
1-1 correspondence.
(Hint. Revise the way you split .a0 a1 a2 . . . ai . . . into the blocks that
you use to alternately build the two components of the pair, so that
each block contains a non-zero digit.)
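The amended plan in the hint can be sketched in Python on finite digit strings. We assume (our reading of the exercise, whose full statement is not visible here) that each real is represented by its non-terminating expansion .a0a1a2..., so every expansion has infinitely many non-zero digits; the sketch works with finite prefixes that end in a non-zero digit. A block is a run of zeros closed by a non-zero digit, and the pair (x, y) is sent to the string x1 y1 x2 y2 ... of alternate blocks. The function names are ours.

```python
import re
from typing import List, Tuple

# A block is a (possibly empty) run of zeros closed by a non-zero digit.
BLOCK = re.compile(r"0*[1-9]")


def blocks(digits: str) -> List[str]:
    """Split a finite digit string, ending in a non-zero digit, into blocks."""
    parts = BLOCK.findall(digits)
    if "".join(parts) != digits:
        raise ValueError("expected digits 0-9 ending in a non-zero digit")
    return parts


def pair_to_real(x: str, y: str) -> str:
    """Send (.x, .y) to .x1 y1 x2 y2 ..., alternating blocks (surplus blocks are dropped)."""
    bx, by = blocks(x), blocks(y)
    return "".join(a + b for a, b in zip(bx, by))


def real_to_pair(z: str) -> Tuple[str, str]:
    """Inverse direction: the 1st, 3rd, ... blocks build x; the 2nd, 4th, ... build y."""
    bz = blocks(z)
    return "".join(bz[0::2]), "".join(bz[1::2])
```

A digit-by-digit split would instead send, for instance, .1010101... to a pair whose second component is .000..., which is why each block must contain a non-zero digit.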
VII.36. Show, using the previous problem, that $R^2 \sim R$.
VII.37. Prove Proposition VII.4.11.
VII.38. Show that $\alpha \mapsto \aleph_\alpha$ is onto Cn − ω.
VII.39. Show that if $a \sim a'$ and $b \sim b'$, then $a \preceq b$ yields $a' \preceq b'$, and $a \prec b$ yields $a' \prec b'$.
VII.40. (Tarski.) Show without the help of AC that a set x is infinite iff P(P(x))
contains an enumerable subset.
(Hint. Consider the function f : ω → P(P(x)) given by f (n) =
{a ∈ P(x) : a ∼ n}.)
VII.41. For any α, β, show that Card(α + β) = Card(α) +c Card(β).
VII.42. Show that a < b does not, in general, imply a +c c < b +c c.
VII.43. Without using VII.5.14, prove that a +c c = c for all a ≤ c, where
c = Card(R).
VII.44. Prove VII.5.12 in a different way: Use induction on n.
VII.45. For any α, β, show that Card(α · β) = Card(α) ·c Card(β).
VII.46. Show that for all α, β, $\aleph_\alpha +_c \aleph_\beta = \aleph_\alpha \cdot_c \aleph_\beta = \aleph_{\alpha\cup\beta}$.
VII.47. Show that for all a > 0, $0^a = 0$.
VII.48. Show that $(a \cdot_c b)^c = a^c \cdot_c b^c$ for all a, b, c.
VII.49. Fill in all the missing details in the proof of VII.5.21.
VII.50. Compute $n^{\aleph_0}$ for all n ∈ ω.
VII.51. Compute $c^{\aleph_0}$.
VII.52. Show that $c < c^c$.
VII.53. Compute $(c^c)^c$ in terms of $f = \mathrm{Card}({}^{R}R)$.
VII.54. Compute the cardinality of the set of all continuous real-valued func-
tions on R.
(Hint. A continuous function is uniquely determined by its restriction
on Q, the set of rational numbers.)
VII.55. Compute the cardinality of the set of all differentiable real-valued func-
tions on R.
VII.56. Show that $\aleph_\alpha^{\aleph_\beta} = 2^{\aleph_\beta}$ on the assumption that α ≤ β.
(Hint. Use VII.5.21 and VII.4.25.)
VII.57. Show that if k ≤ l, then $a^k \le a^l$.
VII.58. Prove that for any ordinal α, cf(ℵα+ω ) = ω.
VII.59. Prove that if $A_i \sim B_i$ for all i ∈ I, and if $A_i \cap A_j = \emptyset = B_i \cap B_j$ whenever $i \neq j$, then $\bigcup_{i\in I} A_i \sim \bigcup_{i\in I} B_i$.
VII.60. Prove that $\mathrm{Card}(\bigcup_{i\in I} A_i) \le \sum_{i\in I} \mathrm{Card}(A_i)$.
VII.61. Prove that $a_i \le b_i$ for all i ∈ I implies $\sum_{i\in I} a_i \le \sum_{i\in I} b_i$.
VII.62. Prove that $\sum_{i\in m} k_i = m \cdot_c \sup_{i\in m} k_i$, if $k_i > 0$ for all i, and at least one cardinal among m and $k_i$ is infinite.
VII.63. Prove that an infinite cardinal a is singular iff there are cardinals b < a and $m_\lambda < a$, for all λ < b, such that $a = \sum_{\lambda\in b} m_\lambda$.
VII.64. If $A_i \sim B_i$ for i ∈ I, then $\prod_{i\in I} A_i \sim \prod_{i\in I} B_i$.
VII.65. Show that cf(2ℵα ) > ℵα for any α.
VII.66. (Bernstein.) Prove that $\aleph_n^{\aleph_\alpha} = 2^{\aleph_\alpha} \cdot_c \aleph_n$ for all n ∈ ω and all α.
VII.67. Define the beth function, ℶ, by the induction $\beth_0 = \aleph_0$, $\beth_{\alpha+1} = 2^{\beth_\alpha}$ and, if Lim(α), $\beth_\alpha = \bigcup_{\beta<\alpha} \beth_\beta$. Show that ℶ has fixpoints.
VII.68. Show that if GCH holds, then $\beth_\alpha = \aleph_\alpha$ for all α.
VII.69. Let Card(N) < ω. Show that $\mathrm{Card}(V_N(\omega + \alpha)) = \beth_\alpha$ for all α.
VII.70. Show that if Lim(α), then $V_N(\alpha)$ is a model of ZFC less collection.
VII.71. Let Card(N) < α, where α is strongly inaccessible. Then the following are absolute for $V_N(\alpha)$:
(1) β is a cardinal,
(2) f : β → γ is cofinal,
(3) cf(β),
(4) β is strongly inaccessible.
VII.72. Prove that “β is a cardinal” is not absolute for $V_N(\omega + 2)$.
VII.73. Let R be a relation on a set A. Define Wf(R), the well-founded part of R, as the set {a ∈ A : there is no infinite chain . . . R a₂ R a₁ R a} (where, of course, all the aᵢ are in A, since R ⊆ A × A). Prove that $\mathrm{Cl}(\hat{R}) = \mathrm{Wf}(R)$, where the rule set $\hat{R}$ on P(A) × A is given by
$$X \overset{\hat{R}}{\mapsto} x \iff X = R_x$$
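The exercise can be checked on small finite examples with the following Python sketch. We assume that $R_x$ denotes {y ∈ A : y R x}, the set of R-predecessors of x, so that the rule $R_x \mapsto x$ admits x as soon as all of its predecessors are in. For a finite A, an infinite chain . . . R a₂ R a₁ R a exists exactly when an R-cycle can be reached from a by going backwards along R, which gives an independent computation of Wf(R). The relation used in the test is hypothetical; agreement on an example is of course not a proof.

```python
from typing import FrozenSet, Set, Tuple

Pair = Tuple[str, str]  # (y, x) in R means y R x


def preds(rel: FrozenSet[Pair], x: str) -> FrozenSet[str]:
    """R_x = {y : y R x} (our assumed reading of R_x)."""
    return frozenset(y for (y, z) in rel if z == x)


def closure_hat(a: FrozenSet[str], rel: FrozenSet[Pair]) -> FrozenSet[str]:
    """Cl(R̂) by stages: x enters once all of R_x is already in."""
    c: Set[str] = set()
    while True:
        new = {x for x in a if preds(rel, x) <= c}
        if new <= c:
            return frozenset(c)
        c |= new


def reach(rel: FrozenSet[Pair], start: str) -> Set[str]:
    """All y with y R ... R start, in one or more steps."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        x = stack.pop()
        for y in preds(rel, x):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def wf_by_chains(a: FrozenSet[str], rel: FrozenSet[Pair]) -> FrozenSet[str]:
    """Finite case of Wf(R): no R-cycle is reachable backwards from the element."""
    on_cycle = {b for b in a if b in reach(rel, b)}
    return frozenset(x for x in a if not (({x} | reach(rel, x)) & on_cycle))
```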
Forcing
The method of forcing was invented by Cohen (1963) towards the construction
of non-standard models of ZFC, so that “new axioms” could be proved consis-
tent with the standard ones. Our retelling of the basics of forcing found in this
chapter is indebted primarily to the user-friendly account found in Shoenfield
(1971). The influence of the expositions in Burgess (1978), Jech (1978b), and
Kunen (1980) should also be evident.
In outline, the method goes like this: Suppose we want to show that ZFC
(sometimes ZF or an even weaker subtheory) is consistent with some weird
new axiom, “NA”. Working in the metatheory, one starts with a CTM, M,†
for ZFC. This is the ground model. One then judiciously chooses a PO set,‡ ⟨P, <, 1⟩, in M – where we find it convenient to restrict attention to PO sets that have a maximum element (let us call the latter “1”) – and, using the PO set, one constructs a so-called generic set G. Circumstances normally have G obey G ∉ M. The “judicious” aspect of the choice of the PO set will entail that the generic extension, M[G], of the CTM M not only contains G as an element but is a CTM itself that satisfies NA as well (i.e., $\models_{M[G]} \mathrm{ZFC} + \mathrm{NA}$). Thus, one has a proof in the metatheory that if ZFC is consistent (i.e., if a CTM for ZFC exists), then so is ZFC + NA.
We have said above that “⟨P, <, 1⟩ ∈ M”. By absoluteness of pair (see Section VI.8), the quoted statement is equivalent to “P ∈ M and < ∈ M and 1 ∈ M”.
† We know that we have used the symbol M for {x : U (x)}. However, it is normal practice for
people to also denote by M an arbitrary CTM of ZFC. We are rapidly running out of symbols;
therefore we ask the reader to allow us this overloading of the letter M with more meanings than
one. As always, we will invoke context in our defense.
‡ This is the hard part of the method.
The above argument cannot be formalized in ZFC “as is” to provide a finitary
proof of relative consistency, namely, along the lines “if ZFC is consistent, here
is how we can construct a set model for ZFC + NA in ZFC ”. Unfortunately
such a construction would formally prove as a corollary (in ZFC) that ZFC is
itself consistent, clashing with Gödel’s second incompleteness theorem.
We can still circumvent this difficulty and provide finitary proofs of relative
consistency results of the type “if ZFC is consistent, then ZFC + NA is too”,
using forcing. One attempts the contrapositive instead:
But the if part means that for some finite set Γ of axioms of ZFC,
$\Gamma \cup \{\mathrm{NA}\} \vdash 0 \neq 0$ (2)
Now, we can construct a CTM, M, just for Γ, inside ZFC using reflection (cf. VII.7.23 and the proof of VII.7.24) followed by Mostowski collapsing. Using forcing, and continuing to work formally inside ZFC, we get a generic extension of M, M[G], that is a formal model of Γ ∪ {NA}. Now this is fine by Gödel’s second theorem, for Γ is not the entire ZFC axiom set. By (2), we have shown $\vdash_{ZFC} (0 \neq 0)^{M[G]}$, and hence $\vdash_{ZFC} 0 \neq 0$, since 0 is absolute for transitive classes. This concludes the forcing proof of (1) in a finitary manner.
VIII.1.2 Remark. In structure parlance (cf. Section I.5) a PO set ⟨P, <, 1⟩ is a structure, with underlying set (or domain) P and with < and 1 as specified relation and function (a 0-ary function or constant) respectively. Thus, if need arises, we will use fraktur type (and the same letter as the domain) to name the structure; in the present case, 𝔓 = ⟨P, <, 1⟩.
The terms “open” and “dense” are not accidental. There is a strong relation
between the homonymous topological concepts and forcing, but this connection
with topology will not be pursued here. By the way, the fully qualified terms
are P-open and P-dense respectively, but usually the qualification is omitted
and P is understood from the context.
† Yes, the extension is the “smaller” of the two. This terminology is due to Cohen (1963).
‡ Recall that when the order < is understood, we say “PO set P” instead of “PO set ⟨P, <, 1⟩”.
VIII.1.3 Example. The set of finite functions from ω to {0, 1}, i.e.,
(1) 1 ∈ F.
(2) For any two members p and q of F there is an r in F such that r ≤ p and
r ≤ q.
(3) If p ∈ F and p ≤ q, then q ∈ F (or, p ∈ F → ≥ p ⊆ F).
In algebra people define their filters in a stronger manner. First off, one requires
that the PO set P be a lattice, that is, for any two of its members p and q, both
sup{ p, q} and inf{ p, q} exist.† One then calls an F ⊆ P a filter if it satisfies
Note that if F is a filter in the sense (i)–(ii) over the lattice 𝔓 = ⟨P, <, 1⟩, then it is as well in the sense (1)–(3) of VIII.1.4 over the PO set 𝔓. Indeed,
by (ii), if p and q are in F, then so is inf{ p, q}, providing the “witness” that (2)
requires. Also, if p ∈ F and p ≤ q (q ∈ P), then p = inf{ p, q}; hence (by (ii))
q ∈ F.
Trivially (by x ≤ x), S ⊆ F. We next verify that F is a filter. For property (1) (of VIII.1.4), pick any q ∈ S (S is not empty). But then q ≤ 1; hence 1 ∈ F by (∗). Property (3) is also trivially verified. As for (2), let p and p′ be in F. Then q ≤ p and q′ ≤ p′ for some q and q′ in S. For the sake of concreteness, say q ≤ q′ (by comparability of S-elements). Then q is an appropriate witness for the compatibility of p and p′, since q ≤ p and q ≤ q′ ≤ p′.
Let F′ be any filter such that
S ⊆ F′ (∗∗)
Let p ∈ F, and let q ∈ S be such that q ≤ p (by (∗)). Since q ∈ F′ by (∗∗) and F′ is a filter, it follows that p ∈ F′. Thus, F ⊆ F′.
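The construction of the filter generated by a chain can be tried out in the PO set of finite partial functions from ω to {0, 1}, ordered by p ≤ q iff p ⊇ q, with the empty function as 1 (the reverse-inclusion order of VIII.1.9). In this minimal Python sketch a condition is a frozenset of (n, bit) pairs, `generated_filter` computes F = {p : q ≤ p for some q ∈ S} as in (∗), and `is_filter` checks (1)–(3) of VIII.1.4 for a finite family. The encoding and the sample chain are our own assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

# A condition is a finite partial function ω → {0, 1}, stored as a set of (n, bit) pairs.
Cond = FrozenSet[Tuple[int, int]]
ONE: Cond = frozenset()  # the empty function is the maximum element 1


def leq(p: Cond, q: Cond) -> bool:
    """p ≤ q iff p extends q (p carries at least the information of q)."""
    return p >= q


def above(p: Cond) -> Set[Cond]:
    """All q with p ≤ q, i.e. all restrictions of p to subsets of its domain."""
    items = sorted(p)
    return {frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)}


def generated_filter(chain: Iterable[Cond]) -> Set[Cond]:
    """F = {p : q ≤ p for some q in the chain S}."""
    f: Set[Cond] = set()
    for q in chain:
        f |= above(q)
    return f


def is_filter(f: Set[Cond]) -> bool:
    """Check (1)-(3) of VIII.1.4 for a finite family of conditions."""
    if ONE not in f:  # (1)
        return False
    for p in f:  # (2): any two members have a common extension inside F
        for q in f:
            if not any(leq(r, p) and leq(r, q) for r in f):
                return False
    return all(above(p) <= f for p in f)  # (3): F is closed upwards
```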
VIII.1.7 Definition (Generic Sets). Given a PO set 𝔓 = ⟨P, <, 1⟩ and a set M. A subset G ⊆ P is called M-generic iff
The reader is reminded that the phrase “D is P-dense” subsumes the sub-
phrase “D ⊆ P”; see VIII.1.1 and VIII.1.2.
$f(0) = p$
and
$$f(n+1) = \begin{cases} m_k, \text{ for the smallest } k \text{ such that } m_k \in {\le}f(n) \cap m_n & \text{if } {\le}f(n) \cap m_n \neq \emptyset \\ f(n) & \text{if } {\le}f(n) \cap m_n = \emptyset \end{cases} \qquad (\ast)$$
Note that the explicit definition of the subscript “k” above avoids an infinite
set of “unspecified choices” (AC). Also note that the last case above always
obtains if $m_n$ is an urelement.
It is easy to see that ran( f ) is a nonempty ( p ∈ ran( f )) chain: First off, that
ran( f ) ⊆ P is trivial. Next, the reader can verify that (∀n ∈ ω) f (n + 1) ≤ f (n)
holds (induction on n; the last case in the definition of f guarantees that f is
total on ω).
Taking for G the filter generated by the chain ran(f) (see VIII.1.5) will do. Indeed, that p ∈ G is trivial. Let then D ∈ M be dense. Now $D = m_n$ for some n. Since D is dense, ${\le}f(n) \cap D \neq \emptyset$, so the first case in (∗) gives us $f(n+1) = m_k$, where $m_k \in {\le}f(n) \cap D$. Since $m_k \in G$, we have $G \cap D \neq \emptyset$.
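The recursion (∗) can be imitated in Python for the PO set of finite binary sequences ordered by extension (the P of VIII.1.9, where dom(p) ∈ ω). In this sketch “the first condition in a fixed enumeration” plays the role of “the smallest k”, so no choices are made. Instead of enumerating all of M, we assume a given list of dense sets, represented by hypothetical membership tests; since every set on the list is dense, the first case of (∗) always applies, and the case ${\le}f(n) \cap m_n = \emptyset$ is not modelled. The filter generated by the resulting chain is the set of initial segments of its members.

```python
from itertools import count, product
from typing import Callable, Iterator, List, Tuple

Cond = Tuple[int, ...]  # a finite binary sequence p, with dom(p) = len(p) ∈ ω
DenseSet = Callable[[Cond], bool]  # membership test for a dense set


def conditions() -> Iterator[Cond]:
    """A fixed enumeration of all conditions: by length, then lexicographically."""
    for n in count():
        yield from product((0, 1), repeat=n)


def leq(p: Cond, q: Cond) -> bool:
    """p ≤ q iff p extends q."""
    return len(p) >= len(q) and p[: len(q)] == q


def generic_sequence(p: Cond, dense: List[DenseSet]) -> List[Cond]:
    """f(0) = p; f(n+1) = first enumerated condition in ≤f(n) ∩ D_n (it exists, D_n being dense)."""
    f = [p]
    for d in dense:
        f.append(next(c for c in conditions() if leq(c, f[-1]) and d(c)))
    return f


def longer_than(k: int) -> DenseSet:
    """Hypothetical dense set {c : dom(c) > k}."""
    return lambda c: len(c) > k


def avoids(g: Callable[[int], int]) -> DenseSet:
    """Hypothetical dense set {c : c(i) ≠ g(i) for some i ∈ dom(c)}."""
    return lambda c: any(c[i] != g(i) for i in range(len(c)))
```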
VIII.1.9 Example. Let M be a CTM for, say, ZF. We will consider the PO
set of VIII.1.3 relativized in M. By absoluteness of pairing and finiteness (see
Section VI.8), {a, b}, ordered pairs, and finite functions are (M-) absolute, and
so is Pω (A) defined as {x : x ⊆ A ∧ x is finite} (see Exercise VIII.3). We also
recall that finite ordinals (and ω), dom, and ran are absolute for M.
Thus one may redo Example VIII.1.3, this time arguing from within M, as an inhabitant† of M would do, to obtain in M the PO set 𝔓 = ⟨P, ⊃, ∅⟩, where
$P = \{p : p \text{ is a function} \wedge \operatorname{dom}(p) \in \omega \wedge \operatorname{ran}(p) \subseteq 2\}$ (1)
He will conduct his argument by noting that ω and 2 are in M, and therefore so is ω × 2; thus P ∈ M, by separation, since‡ $P = \{p \in \mathcal{P}_\omega(\omega \times 2) : p \text{ is a function} \wedge \operatorname{dom}(p) \in \omega\}$ (he knows that M is a ZF model, so he can do all that). It then follows that 𝔓 is in M as well, by the fact that M is closed under pairing.
Equivalently, a being of $U_A$ argues the same thing by making the case that $P^M$ given in
$P^M = \{p \in \mathcal{P}_\omega(\omega \times 2) : p \text{ is a function} \wedge \operatorname{dom}(p) \in \omega\}$ (1′)
† This person uses just “{ p : etc.}” rather than “{ p ∈ M : etc.}”, since the ∈ M part is implicit;
there are no universes beyond M for him.
‡ For him Pω (A) consists precisely of these finite subsets of A that are also members of M – which
are all the finite subsets of A, absolutely speaking.
To keep our sanity, we will usually employ the language and methods of ZF
(or ZFC, or even of a fragment of ZF) “in the abstract” (i.e., formally) to effect
our various constructions.‡ We can afterwards relativize what we have done
to some CTM M, using results from Sections VI.8 and, sometimes, VI.9. On
occasion, it might be just as easy to work as an inhabitant of M would and,
using the methods of ZFC (or ZF, or . . . ), argue in effect from within M, as we
have done in the initial part of VIII.1.9.
Our discussion will be, in general, dependent upon a PO set “variable”, which
we will invariably call by the nondescript name 𝔓 = ⟨P, <, 1⟩. For convenience
we will use the (fairly standard) notation
More completely, we should add to the definition (1) above, second case, the conjunct “∧ G ⊆ P ∧ ⟨P, <, 1⟩ ∈ M is a PO set”.§ One then adds a third, “otherwise” case where, say, $x^G = \emptyset$. This “completion” spoils the clean form of (1) and adds or subtracts nothing to or from the expected properties of $x^G$ used in the sequel. Thus we have stated the missing conditions loosely in the “assumptions” instead.
Note that if we fix G, then the above defines $\lambda x.x^G$ by $\in_G$-recursion using G as a “parameter”. But whence “interpretation”? This terminology will make sense in the next section.
Finally:
$M[G] = \{x^G : x \in M\}$
We next build tools to show that for any CTM M and M-generic G, we have
M ⊆ M[G] and G ∈ M[G].
VIII.2.8 Lemma. Let M be a CTM of ZF, and G an M-generic set with respect
to P ∈ M. Then M ⊆ M[G] and G ∈ M[G].
= {z : z ∈ x} by I.H.
=x
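$\Gamma = \{\langle \hat{p}, p\rangle : p \in P\}$ (1)
will do fine. By closure of M under pairs and by the fact that collection is true in M, $\Gamma \in M$. We next calculate
$$\begin{aligned} \Gamma^G &= \{y^G : (\exists p \in G)\langle y, p\rangle \in \Gamma\} \\ &= \{(\hat{p})^G : p \in G\} \\ &= \{p : p \in G\} && \text{by what we have proved above} \\ &= G \end{aligned}$$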
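A small Python model of names and their interpretation may help in following VIII.2.8. Under our own (hypothetical) encoding, a name for a set is a frozenset of pairs (y, p), and anything that is not a frozenset is treated as an urelement that names itself, so $\hat{p} = p$ for conditions represented as strings. Here `val` computes $x^G$ by recursion on names, `check` computes $\hat{x} = \{\langle \hat{y}, 1\rangle : y \in x\}$, and the set $\{\langle \hat{p}, p\rangle : p \in P\}$ from the proof is built directly. The `rank` function reads pairs as Kuratowski pairs {{y}, {y, p}}, so the same sketch lets one test $\rho(x^G) \le \rho(x)$ on examples.

```python
from typing import Hashable, Set

# Hypothetical encoding: a name for a set is a frozenset of (name, condition) pairs;
# any non-frozenset value plays the role of an urelement and names itself.
ONE = "1"  # the maximum element 1 of the PO set


def val(x: Hashable, g: Set[Hashable]) -> Hashable:
    """x^G = {y^G : (∃p ∈ G) <y, p> ∈ x}; urelements are their own values."""
    if not isinstance(x, frozenset):
        return x
    return frozenset(val(y, g) for (y, p) in x if p in g)


def check(x: Hashable) -> Hashable:
    """The canonical name {<ŷ, 1> : y ∈ x}, with â = a for urelements a."""
    if not isinstance(x, frozenset):
        return x
    return frozenset((check(y), ONE) for y in x)


def rank(x: Hashable) -> int:
    """ρ(x) = sup{ρ(y) + 1 : y ∈ x}, with urelements of rank 0."""
    if isinstance(x, tuple):  # <y, p> = {{y}, {y, p}}
        return max(rank(x[0]), rank(x[1])) + 2
    if isinstance(x, frozenset):
        return max((rank(e) + 1 for e in x), default=0)
    return 0
```

Note that `val(check(x), G)` returns x whenever "1" is in G, which is the computation behind M ⊆ M[G].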
VIII.2.9 Remark. M and M[G] have the same urelements. Indeed, M ⊆ M[G] yields that all atoms of M are included in M[G]. Conversely, suppose that $U(a^G)$ is true in M[G], and hence in $U_A$ (recall from Section VI.8 that we set $U^M = U$ in general in (U, ∈)-interpretations). From VIII.2.3, $a^G = a$ (otherwise $a^G$ is a set). Thus, $a^G \in M$ (since a ∈ M).
VIII.2.11 Remark. So far we readily have the “C” and “T” of the expected CTM attributes of M[G]. Indeed, by VIII.2.4, the function $\lambda x.x^G : M \to M[G]$ is onto. Since the left field is countable, this settles the “C”. As for transitivity, let $x \in y^G \in M[G]$. Then $y^G$ is a set; thus (VIII.2.3) $x = z^G$ for some $z \in_G y$. Therefore, for some $p \in |\mathfrak{P}|$, $z \in \{z\} \in \langle z, p\rangle \in y \in M$; hence $z \in M$ by transitivity of M. Finally, $x \in M[G]$ by VIII.2.4.
For the “M”, that M[G] is a model of, say, ZFC if M is, we need more work.
† As we did with L Set , we are free afterwards to extend the augmented language by definitions.
We moreover let
Caution. It is important to note that even though the names a were “built” by importing into $L_{Set}$ names of objects of M, they are primarily used to name – via the interpretation (1) – objects of M[G], not objects of M.
Whenever we interpret a formula in the structure M = (M, U, ∈), the names a, . . . are interpreted as $(a)^M = a$, . . . .‡
G |= A (a1 , a2 , . . . , an ) (1)
or
G |= A[[ a1 G , a2 G , . . . , an G ]]
as a shorthand for
or
† This procedure justifies the name “interpretation” for the function $x \mapsto x^G$.
‡ E.g., in VIII.4.10.
§ Under the term “L Set formula” we include formulas over L Set that may contain defined symbols.
However these formulas must have no M-symbols a, etc.
To a person living in $U_A$, (2), and hence also (1), above mean the same thing as (cf. VI.8.4)
$\models_{U_A} A^{M[G]}[[a_1^G, a_2^G, \ldots, a_n^G]]$
We will usually write the above using round brackets (argot). Similarly, one abuses notation slightly and writes
$\models_{M_G} A(a_1^G, a_2^G, \ldots, a_n^G)$ (3)
instead of (2). However, to the right of ⊨ one normally expects to see a well-formed formula over our language, here $L_{Set,M}$. Thus, a “real” object (from the structure $M_G$) can appear in a formula only by its formal name, a, rather than by an informal name such as $a^G$, unless one writes in mixed mode using [[ . . . ]] brackets.†
Ideally one ought to use a subscript “(M, 𝔓)” to the symbol “$\Vdash^w$”, but such pedantry is almost never practised or needed.
† This same apparent hairsplitting is what made us import constants in I.5.4 towards defining the
Tarski semantics for first order languages.
‡ In the jargon of VIII.3.1.
§ Looking back to VIII.1.9, this distinction between finite and infinite “amounts of information”
is aptly motivated.
sets except by (formal) name – all he knows on faith is that generic sets are
objects found beyond the universe he lives in. Yet we will see in the next section
that this apparent dependence of forcing on G-sets and knowledge of “things
outside M” can be circumvented.
We state here a few basic properties of $\Vdash^w$, all due to Cohen. Monotonicity (2) in VIII.3.3 below, and the definability and truth lemmata below, will be the ticket for the mathematician in M to do forcing within his universe.
Granting the lemma, a being in M can verify the right hand side of the above
equivalence (hence also the left) working in his world M with the unrelativized
B (cf. VI.8.4).
The following lemma says that truth in MG can be certified by working with
a finite approximation of G.
The last two lemmata are proved in Section VIII.5 with the help of the
“original” concept of Cohen’s (strong) forcing.
The form of (1) immediately suggests that ⊩, unlike $\Vdash^w$ (which is helped here by its semantical definition), does not subscribe to the proposition that an even number of ¬ symbols at the front of a formula can be dropped.
We will attempt to motivate the definition of the version of ⊩ we use here. This is the one in Shoenfield (1971) and is probably the user-friendliest in the literature (compare with the versions in Cohen (1963) and Kunen (1980)†). The reader is cautioned not to expect our motivational overture to unambiguously lead to a unique choice of definition of ⊩. Even the auxiliary relation $x \in_p y$ introduced below can be defined in different ways (see, e.g., Shoenfield (1967 vs. 1971)).
The crucial concept in motivating the definition of ⊩ is that by using it, in M, we can effect a syntactic approximation to truth in M[G], i.e., an approximation to what
G |= A(a1 , . . . , an )
G |= A(x1 , . . . , xn )
means.
We begin by introducing a finite version of x ∈G y:
† Not only are the various versions of strong forcing not created equal in terms of definitional
complexity, but this is also true in terms of their behaviour. For example, in the version in Kunen
w M
(1980), rather than (1) above one proves p A iff p A .
Just like ⊢ and ⊨, ⊩ and $\Vdash^w$ apply to everything to their right (they have lowest priority, hence maximum scope). This explains the brackets around $p \Vdash A(x_1, \ldots, x_n)$ above.
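VIII.4.6 Remark. In other words, for any fixed $\vec{x}_n$, the set
$\{p \in |\mathfrak{P}| : p \Vdash A(\vec{x}_n) \vee p \Vdash \neg A(\vec{x}_n)\}$
Hence, since being dense is absolute for such a CTM (Exercise VIII.3),
$\{p \in |\mathfrak{P}| : (p \Vdash A(\vec{x}_n))^M \vee (p \Vdash \neg A(\vec{x}_n))^M\}$ is a dense set in M.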
Next we search for a good definition for $p \Vdash a \in b$ and $p \Vdash a \neq b$,† or, more generally (with free variables), of $p \Vdash x \in y$ and $p \Vdash x \neq y$.
So, what does $\models_{M_G} a \in b$ mean? It means that $a^G \in b^G$ is true, that is, $a \in_G b$ is true. This is, trivially, equivalent to (cf. I.6.2)
$(\exists z)(z = a \wedge z \in_G b)$ (1)
$p \Vdash \neg A$ iff $\neg\, p \Vdash A$
for it is conceivable that p does not force the truth of A simply because it does not contain enough information to do so. Thus we need to know that no amount of additional finite information (any q such that q ≤ p) will help to force A. Then we can proclaim that p forces ¬A. Thus, we will adopt
† Actually, it turns out to be technically somewhat more convenient to look for a definition of $p \Vdash a \neq b$, viewing ≠ as the primary (in)equality predicate and = as a derived one.
‡ The first ¬ is formal, part of the formula ¬A. The second is metamathematical. Some writers use $p \nVdash A$ instead.
Using (6) and (7), we can thus rewrite clauses (b) and (c) of the definition to
read
and
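$p \Vdash x \neq y$ abbreviates
$$(U(x) \wedge U(y) \wedge x \neq y) \vee (U(x) \wedge \neg U(y)) \vee (\neg U(x) \wedge U(y)) \vee (\exists z)(\exists q \ge p)\big((\langle z, q\rangle \in x \wedge (\forall r \le p)\neg\, r \Vdash z \in y) \vee (\langle z, q\rangle \in y \wedge (\forall r \le p)\neg\, r \Vdash z \in x)\big) \qquad (9)$$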
respectively. This now makes sense out of Definition VIII.4.8(b)–(c), since (8)
and (9) constitute a simultaneous recursion in the sense of VI.2.40.
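More rigorously, then, what Definition VIII.4.8(b) and (c) really mean is that we employ the abbreviations
and
where the functions $\lambda pxy.\mathrm{In}(p, x, y)$ and $\lambda pxy.\mathrm{Ne}(p, x, y)$ (with right field {0, 1}) are defined in $U_A$ by the simultaneous recursion (8′) + (9′) below, mimicking (8) and (9):
$$\mathrm{In}(p, x, y) = \begin{cases} 0 & \text{if } p \in |\mathfrak{P}| \wedge (\exists z)(\exists q \ge p)(\langle z, q\rangle \in y \wedge (\forall r \le p)\,\mathrm{Ne}(r, z, x) = 1) \\ 1 & \text{otherwise (this includes the case } p \notin |\mathfrak{P}|) \end{cases} \qquad (8')$$
and
$$\mathrm{Ne}(p, x, y) = \begin{cases} 0 & \text{if } p \in |\mathfrak{P}| \wedge \big[(U(x) \wedge U(y) \wedge x \neq y) \vee (U(x) \wedge \neg U(y)) \vee (\neg U(x) \wedge U(y)) \\ & \quad \vee (\exists z)(\exists q \ge p)\big((\langle z, q\rangle \in x \wedge (\forall r \le p)\,\mathrm{In}(r, z, y) = 1) \vee (\langle z, q\rangle \in y \wedge (\forall r \le p)\,\mathrm{In}(r, z, x) = 1)\big)\big] \\ 1 & \text{otherwise} \end{cases} \qquad (9')$$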
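When the PO set is finite and the names are hereditarily finite, the simultaneous recursion (8′)–(9′) can simply be run. The Python sketch below does this for a hypothetical two-element PO set {1, p} with p < 1, encoding names as frozensets of (name, condition) pairs and treating any non-frozenset as an urelement; it returns booleans, with True playing the role of the book’s value 0 (“true”). Each recursive call replaces one of the two names by an element of the domain of a name, so the recursion terminates on such finite names; `functools.lru_cache` merely avoids recomputation.

```python
from functools import lru_cache
from typing import Hashable

# Toy PO set (hypothetical): P = {"1", "p"} with p < 1; (r, s) in LEQ means r ≤ s.
P = frozenset({"1", "p"})
LEQ = frozenset({("1", "1"), ("p", "p"), ("p", "1")})


def leq(r: str, s: str) -> bool:
    return (r, s) in LEQ


def is_urelement(x: Hashable) -> bool:
    # Names of sets are frozensets of (name, condition) pairs; anything else is an atom.
    return not isinstance(x, frozenset)


@lru_cache(maxsize=None)
def forces_in(p: str, x: Hashable, y: Hashable) -> bool:
    """p ⊩ x ∈ y as in (8'): True corresponds to In(p, x, y) = 0."""
    if p not in P or is_urelement(y):
        return False  # no pair <z, q> belongs to an urelement
    return any(
        leq(p, q) and all(not forces_ne(r, z, x) for r in P if leq(r, p))
        for (z, q) in y
    )


@lru_cache(maxsize=None)
def forces_ne(p: str, x: Hashable, y: Hashable) -> bool:
    """p ⊩ x ≠ y as in (9'): True corresponds to Ne(p, x, y) = 0."""
    if p not in P:
        return False
    ux, uy = is_urelement(x), is_urelement(y)
    if ux and uy:
        return x != y
    if ux != uy:
        return True  # one atom and one set
    left = any(
        leq(p, q) and all(not forces_in(r, z, y) for r in P if leq(r, p))
        for (z, q) in x
    )
    right = any(
        leq(p, q) and all(not forces_in(r, z, x) for r in P if leq(r, p))
        for (z, q) in y
    )
    return left or right
```

For instance, with Y = {⟨∅, p⟩}, the condition 1 does not force ∅ ∈ Y while p does, matching the intuition that p carries the information needed to put ∅ into $Y^G$.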
That the recursion (8′)–(9′) is legitimate follows from the following considerations: We note that (8′) implies $z \in \operatorname{dom}(y)$ (by $\langle z, q\rangle \in y$) and hence $\max(\rho(z), \rho(x)) < \max(\rho(x), \rho(y))$. Similarly, (9′) implies $z \in \operatorname{dom}(x)$ (by $\langle z, q\rangle \in x$) or $z \in \operatorname{dom}(y)$ (by $\langle z, q\rangle \in y$); thus $\max(\rho(z), \rho(y)) < \max(\rho(x), \rho(y))$ and $\max(\rho(z), \rho(x)) < \max(\rho(x), \rho(y))$ respectively. It follows that the
or
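(I) ¬B: Let q ≤ p and $(\forall r \le p)\neg\, r \Vdash B$. Then $(\forall r \le q)\neg\, r \Vdash B$, that is, $q \Vdash \neg B$. (The I.H. was not used.)
(II) B ∨ C: Exercise.
(III) $(\exists y)B(y, \vec{x})$: Let q ≤ p and $p \Vdash A$. That is, $(\exists y)\, p \Vdash B(y, \vec{x})$. By the I.H., $(\exists y)\, q \Vdash B(y, \vec{x})$.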
† We have opted here for the notation “A (. . .) M ” rather than the awkward “(A ) M (. . .)”.
pc = x (a)
cG ∈ y G (d)
† The I.H. applies to atomic formulas, here c = x. However, it is all right to apply it to negated
atomic formulas, here c = x, by the “¬” case below.
‡ By the remark following the proof of VIII.4.10, it is unnecessary to write $(r \Vdash x \in y)^M$.
(d) $\neg U(x^G) \wedge \neg U(y^G)$. Thus,
$$(\exists z \in \operatorname{dom}(x) \cup \operatorname{dom}(y))\big((z^G \in x^G \wedge z^G \notin y^G) \vee (z^G \in y^G \wedge z^G \notin x^G)\big) \qquad (1)$$
For the sake of argument, say it is the first of the two (∨-)cases of (1) above that holds for some z. Then $z \in_G x$, i.e., for some $p \in G$,
$$\langle z, p\rangle \in x \qquad (2)$$
Moreover, since $\rho(z) < \rho(x)$ by (2), $\max(\rho(z), \rho(y)) < \max(\rho(x), \rho(y))$; thus the I.H. implies, for some $q \in G$,
$$q \Vdash z \notin y \qquad (3)$$
Let $r \in G$ satisfy $r \le q$ and $r \le p$. Then (2) yields $z \in_r x$, and (3) yields $r \Vdash z \notin y$ by VIII.4.3. Thus $r \Vdash x \neq y$. The other case in (1) is entirely analogous.
←: We fix $p \in G$ and x and y in M such that $p \Vdash x \neq y$. We have cases.
(a′) $U(x) \wedge U(y)$. By VIII.4.8, $x \neq y$ holds. Since $x = x^G$ and $y = y^G$ (VIII.2.9), $x^G \neq y^G$ holds.
(b′) $U(x) \wedge \neg U(y)$. Then (VIII.2.9) $U(x^G) \wedge \neg U(y^G)$. Thus $x^G \neq y^G$ holds.
(c′) $\neg U(x) \wedge U(y)$. As above.
(d′) $\neg U(x) \wedge \neg U(y)$. By VIII.4.8,
$$(\exists z \in \operatorname{dom}(x) \cup \operatorname{dom}(y))\big((\exists q \ge p)(\langle z, q\rangle \in x \wedge p \Vdash z \notin y) \vee (\exists q \ge p)(\langle z, q\rangle \in y \wedge p \Vdash z \notin x)\big) \qquad (4)$$
For the sake of argument, say it is the first of the two (∨-)cases of (4) above that holds for some z. Then $z \in_G x$, since $q \in G$ by filter properties; hence
$$z^G \in x^G \qquad (5)$$
Since $z \in \operatorname{dom}(x)$ gives $\rho(z) < \rho(x)$, we get $\max(\rho(z), \rho(y)) < \max(\rho(x), \rho(y))$; hence, from $p \Vdash z \notin y$ and the I.H.,
$$z^G \notin y^G \qquad (6)$$
(5) and (6) now yield $x^G \neq y^G$. The other case in (4) is entirely analogous.
→: Assume $\neg B^{M[G]}(x_1^G, \ldots, x_n^G)$. Let $q \in G$ make (7) true. Then $(q \Vdash \neg B(\vec{x}_n))^M$, since the alternative and the I.H. would yield $B^{M[G]}(x_1^G, \ldots, x_n^G)$, contradicting the assumption.
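←: Let $p \in G$ be such that $(p \Vdash \neg B(\vec{x}_n))^M$. Then (VIII.4.4) $(p \nVdash B(\vec{x}_n))^M$.
Is it possible that, for some $r \in G$, $(r \Vdash B(\vec{x}_n))^M$? Well, if so, let $s \in G$ witness the compatibility of p and r. Then (VIII.4.3)
$$(s \Vdash B(\vec{x}_n))^M \wedge (s \Vdash \neg B(\vec{x}_n))^M$$
or
$$\neg(\exists p \in G)(p \Vdash B(\vec{x}_n))^M$$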
is true; hence
M
(∃ p ∈ G) p (∃y)B (y, x1 , . . . , xn ) (11)
→: Let (11) hold for the given $\vec{x}_n$. Then (10) holds by VIII.4.8 (∃-case)
for some b ∈ M. Therefore, by the I.H., (9) holds. Thus we have (8).
VIII.5.1 Theorem. Fix a CTM M and a PO set P ∈ M. Then for any formula $A(\vec{x}_n)$ over $L_{\mathrm{Set},M}$, and all $\vec{x}_n$ in M, (1) on p. 532 holds in $U_A$.
←: Assume the right hand side of ↔, and let G ⊆ |P| be M-generic and p ∈ G (cf. VIII.1.8). By VIII.4.7, $(\neg\neg A)^{M[G]}(x_1^G, \ldots, x_n^G)$ is true; hence so is $A^{M[G]}(x_1^G, \ldots, x_n^G)$. By Definition VIII.3.2, $p \Vdash_w A(\vec{x}_n)$, since G (with p ∈ G) was an arbitrary generic set.
→: We prove the contrapositive, so let
$$\neg\big(p \Vdash \neg\neg A(\vec{x}_n)\big)^M \qquad (3)$$
$$(\exists p \in G)\big(p \Vdash \neg\neg A(x_1, \ldots, x_n)\big)^M$$
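$$\begin{aligned}
\rho(x^G) &= \bigcup_{y^G \in x^G} \big(\rho(y^G) + 1\big)\\
&\leq \bigcup_{(\exists p \in G)\langle y, p\rangle \in x} \big(\rho(y) + 1\big) && \text{by I.H.}\\
&\leq \bigcup_{z \in x} \big(\rho(z) + 1\big) && \text{since } \rho(y) < \rho(\langle y, p\rangle)\\
&= \rho(x)
\end{aligned}$$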
Proof.
(i) Urelement axioms:
(a) Urelements are atomic: Let y ∈ M[G]. We want the truth of $\big(U(y) \to \neg(\exists x)\,x \in y\big)^{M[G]}$. That is, of U(y) → ¬(∃x ∈ M[G]) x ∈ y, which is true even without the qualification (∃x)(x ∈ M[G] ∧ · · · ).
(b) Set of all atoms:
We want $\{x : U(x)\}^{M[G]} \in M[G]$. That is, M[G] ∩ {x : U(x)} ∈ M[G]. This is so by VIII.2.9 – whence M[G] ∩ {x : U(x)} = M ∩ {x : U(x)} – along with M ⊆ M[G] and the fact that M is a CTM, so that $\{x : U(x)\}^M \in M$.
(ii) Extensionality: It holds because M[G] is transitive (VI.8.10).
(iii) Separation: Let a ∈ M and $b = \{x \in a^G : A^{M[G]}(x)\}$, where A(x) is over $L_{\mathrm{Set},M}$. We let
$$c = \{\langle x, p\rangle \in \mathrm{dom}(a) \times |P| : p \Vdash_w x \in a \wedge A(x)\} \qquad (1)$$
† Not relativized.
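“$p \Vdash_w A(z, u)$”):
$$(\forall \langle z, p\rangle \in \mathrm{dom}(a) \times |P|)(\exists u \in M)\, p \Vdash_w A(z, u) \to (\exists W \in M)(\forall \langle z, p\rangle \in \mathrm{dom}(a) \times |P|)(\exists u \in W)\, p \Vdash_w A(z, u)$$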
Or, moving (∃W ) to the front and asserting it to be a set (permissible
by III.8.4),
$$b^G = \{y^G : y \in W\} \qquad (6)$$
$$v \in \mathrm{dom}(a) \qquad (7)$$
$$p \Vdash_w A(v, t) \qquad (8)$$
$A^{M[G]}(v^G, c^G)$ is true
$$\{x \in M[G] : x \subseteq a^G\} \subseteq b^G \qquad (9)$$
It will turn out that $b = \mathcal{P}(\mathrm{dom}(a) \times |P|)^M \times \{1\}$ works. First off, since
M satisfies ZFC, b ∈ M. The same type of calculation we have done in
the collection case yields
VIII.7. Applications
We conclude our lectures by presenting in this section elementary applications
of forcing. They all are based on PO sets of finite functions. We recall from
Section VI.8 that finiteness is absolute for CTMs of ZFC and therefore an
inhabitant of such a CTM proclaims a set “finite” exactly when such a set is
“really” finite. We will benefit from widening the scope of Examples VIII.1.3,
VIII.1.6, and VIII.1.9.
VIII.7.1 Example. Let† $F(a, b) = \langle F(a, b), <, 1\rangle$ be a PO set defined as follows in terms of sets a and b where a is infinite and b ≠ ∅:‡
$$R_i \ni q \cup \{\langle j, i\rangle\} < q$$
Absoluteness results of Sections VI.8 and VI.9 (in particular VI.9.20) show that the function α ↦ $F_\alpha$ of VI.9.2 is absolute for any CTM of ZF as long as we take N of $L_N$ in M. Thus, if M is such a CTM, then
$$L_N^{M[G]} = \{F_\alpha : \alpha \in \mathrm{On}\}^{M[G]} = \{F_\alpha : \alpha \in \mathrm{On}^{M[G]}\} \qquad (2)$$
Of course, $\aleph_1^M$ is the smallest ordinal α > ω in M for which there is no 1-1 correspondence f : ω → α in M.
Let G be M-generic with respect to the PO set $F(\omega, \aleph_1^M)$, and consider $g = \bigcup G$. We know that g ∈ M[G]. We also know that g : ω → α is total and onto in M[G] (recall that ordinals are absolute, and M and M[G] have the same ordinals). Thus, α is not an uncountable cardinal in M[G]; it is just an at most countable ordinal (in view of g). We say that the cardinal $\aleph_1^M$ was collapsed as we passed from M to its generic extension M[G]. Therefore $\aleph_1^M$ is not an uncountable cardinal in M[G], that is, $\aleph_1^M < \aleph_1^{M[G]}$ in M[G]. Of course, all along, $\aleph_1^M$ is just that α above.
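The "We know" above rests on genericity: G meets every dense set that lies in M, in particular $D_n = \{p : n \in \mathrm{dom}(p)\}$ for n ∈ ω and $E_\xi = \{p : \xi \in \mathrm{ran}(p)\}$ for ξ < α, and this is what makes g total and onto. As a minimal sketch of why these sets are dense, the Python below models a condition of $F(\omega, \alpha)$ as a finite dict from natural numbers to integers, the integers standing in for ordinals below α (an illustrative assumption; the function names are hypothetical), and computes an extension q ≤ p lying in the required set.

```python
def extends(q, p):
    """q <= p in F(a, b): q agrees with p on dom(p), i.e., q contains p."""
    return all(k in q and q[k] == v for k, v in p.items())


def into_domain(p, n):
    """Return some q <= p with n in dom(q), i.e., a member of D_n."""
    q = dict(p)
    if n not in q:
        q[n] = 0  # any value below alpha works; 0 is always below alpha
    return q


def into_range(p, xi):
    """Return some q <= p with xi in ran(q), i.e., a member of E_xi."""
    q = dict(p)
    if xi not in q.values():
        n = 0
        while n in q:  # dom(p) is finite, so a free argument exists
            n += 1
        q[n] = xi
    return q


p = {0: 5, 2: 7}
q = into_range(into_domain(p, 3), 11)
print(q)  # {0: 5, 2: 7, 3: 0, 1: 11}
```

Both $D_n$ and $E_\xi$ are defined from parameters in M, so they belong to M, which is exactly what lets M-genericity of G apply to them.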
This phenomenon of cardinals collapsing – a witness to the fact that being a
cardinal is not absolute for CTMs – is annoying because it causes more work
towards proving the relative consistency of ¬CH.
in M, or
$$(\kappa \text{ is an uncountable cardinal})^M \qquad (1)$$
Let G be M-generic and form M[G]. We know that if we let $f = \bigcup G$, then
$$f : \omega \times \kappa \to 2 \text{ is total and onto} \qquad (2)$$
and
$$f \notin M \quad \text{by Exercise VIII.2} \qquad (3)$$
As a matter of fact, α ↦ λn.f(n, α)† is 1-1 on κ: Indeed, for any α ≠ β in κ consider the set in M,
$$D = \{p \in F(\omega \times \kappa, 2) : (\exists n \in \omega)(p(n, \alpha)\!\downarrow \wedge\; p(n, \beta)\!\downarrow \wedge\; p(n, \alpha) \neq p(n, \beta))\}$$
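The point of D is that it belongs to M and is dense in F(ω × κ, 2), so G meets it, and the n supplied by a member of G ∩ D witnesses f(n, α) ≠ f(n, β). A minimal sketch of the density step, assuming conditions are Python dicts keyed by pairs (n, ξ) with integers standing in for the ordinals ξ < κ (the names `in_D` and `extend_into_D` are hypothetical): given any finite p, pick a row n that p never mentions and set the two entries of that row to different bits.

```python
def in_D(p, alpha, beta):
    """Is there an n with p(n, alpha) and p(n, beta) both defined and different?"""
    return any(
        (n, alpha) in p and (n, beta) in p and p[(n, alpha)] != p[(n, beta)]
        for (n, _) in p
    )


def extend_into_D(p, alpha, beta):
    """Return some q <= p (q contains p) with q in D."""
    assert alpha != beta
    if in_D(p, alpha, beta):
        return dict(p)
    # p is finite, so rows beyond the largest one it mentions are untouched.
    n = 1 + max((k for (k, _) in p), default=-1)
    q = dict(p)
    q[(n, alpha)] = 0
    q[(n, beta)] = 1
    return q


p = {(0, 3): 1, (2, 5): 0}
q = extend_into_D(p, 3, 5)
print(q)  # {(0, 3): 1, (2, 5): 0, (3, 3): 0, (3, 5): 1}
```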
respect to the present F(a, b)) M[G], that is, we would like to show $\aleph_1^M = \aleph_1^{M[G]}$ and $\aleph_2^M = \aleph_2^{M[G]}$.
We will do that through a sequence of lemmata.
Pause. If $\aleph_2^M$ collapses, then we are in trouble: (5) does nothing for us, because the ordinal $\aleph_2^M$ then has cardinality at most $\aleph_1$ in M[G]. If $\aleph_1^M$ collapses, even if $\aleph_2^M$ might not, we are still in trouble, for $\aleph_2^M$ is not the second uncountable cardinal in M[G] (that is, $\aleph_2^{M[G]}$) now that $\aleph_1^M$ has collapsed.
VIII.7.6 Remark. Since for each α < κ, λn.f(n, α) codes a real number in [0, 1] (in binary notation), we say that λn.f(n, α) is a Cohen generic real. Thus we have added κ generic reals to the ground model M (these objects are new, by Exercise VIII.2). Intuitively, this set of reals turns out to be so large that, in M[G], there are cardinals strictly between ω and its cardinality.
In much of the literature the κ-antichain condition is called the κ-chain condition
or κ-c.c. In particular, when κ = ℵ1 one then speaks of the countable chain
condition or c.c.c.
Proof. Suppose instead that the hypotheses hold, yet there is an M-generic G and an onto function in M[G], f : α → κ, where α < κ in M[G]. That is, the ordinal κ is not a cardinal in M[G]. By VIII.6.2, α is an ordinal in M as well.
There is a formula A(x, y, z) of $L_{\mathrm{Set}}$ that says “x : y → z is an onto function”. Thus, if t ∈ M is such that $f = t^G$, then, for some p ∈ G (we fix one such p), VIII.3.5 implies
† The “in M” cannot be emphasized enough. Since M is countable, a resident of $U_A$ trivially sees that every antichain in M is countable.
$$p \Vdash_w t : \hat{\alpha} \to \hat{\kappa} \text{ is onto} \qquad (1)$$
$B_\beta \in M$ for all β < α, since, by the definability lemma, the expression to the right of “:” is a formula relativized to M. We next majorize the cardinality in M of the $B_\beta$, i.e., estimate (“from above”) $\mathrm{Card}^M(B_\beta)$. To this end, let us pick for each $\gamma \in B_\beta$ one q that works in (2) above. We denote this q by $q_\gamma$.
Assume now that γ ≠ δ are both in $B_\beta$. We will argue that $q_\gamma \perp q_\delta$: If not, let $r \leq q_\gamma$ and $r \leq q_\delta$. Now, by definition of the symbol “$q_\gamma$”, $q_\gamma \Vdash_w t(\hat\beta) = \hat\gamma$ and $q_\delta \Vdash_w t(\hat\beta) = \hat\delta$; hence, by monotonicity (VIII.3.3(2)), $r \Vdash_w t(\hat\beta) = \hat\gamma$ and $r \Vdash_w t(\hat\beta) = \hat\delta$. Let $G' \ni r$ be some M-generic set (cf. VIII.1.8). The truth lemma yields $\gamma = t^{G'}(\beta) = \delta$ in M[G′] (recall that ordinals are preserved, and both γ and δ are in M, for κ is). This is a contradiction.
Pause. While G′ ≠ G in $U_A$ in general, and the same is true of $t^{G'}$ versus $t^G = f$, these objects – G′ and $t^{G'}$ – were just intermediate agents towards deriving the contradiction γ = δ.
Thus, in M, $B_\beta$ maps 1-1 into some antichain C that contains the $q_\gamma$ objects for the various $\gamma \in B_\beta$. Therefore, $\mathrm{Card}(B_\beta) \leq \mathrm{Card}(C) < \kappa$ is true in M,† the “<” contributed by the κ-a.c. of P in M. Since κ is regular in M, the following is true in M by VII.6.11:
$$\mathrm{Card}\Big(\bigcup_{\beta<\alpha} B_\beta\Big) < \kappa \qquad (3)$$
This shows that the assumption that we have an α and f in M[G] with the
stated properties is untenable, thus proving the lemma.
Towards (4), let us argue in M, and let γ < κ (i.e., γ ∈ κ). Since f is onto κ (from α – this happens in M[G]), let β < α be such that f(β) = γ. Thus, by
† Written without the M-superscript, since we have said “is true in M” (cf. VI.8.4).
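Proof. We only worry about what happens beyond ω. By the remarks above, if $\kappa = \aleph_{\alpha+1}^M$, a successor cardinal, then κ is preserved, since it is regular in M (see VII.6.12). Suppose now that $\kappa = \aleph_\alpha^M$ and Lim(α), that is, κ is a limit cardinal.† Thus $\kappa = \bigcup_{\beta<\alpha} \aleph_\beta^M$. By Remark VII.4.24(2), $\kappa = \bigcup_{\beta<\alpha} \aleph_{\beta+1}^M$, and all $\aleph_{\beta+1}^M$ are preserved; hence κ, a supremum of cardinals of M[G], is a cardinal in M[G] as well.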
Our next task is to show that the particular PO set of Example VIII.7.5 has
the ℵ1 -a.c. (or c.c.c. in the alternative terminology). We will need a definition
and two more lemmata.
† That is, we work with an uncountable Cn , call it “A”, and discard the original A.
But why is the second step in the recursion possible for any $\alpha < \aleph_1$? The set $\mathcal{Y} = \{Y \in A : Y \notin \{X_\beta : \beta < \alpha\}\}$ is uncountable; otherwise $A \subseteq \mathcal{Y} \cup \{X_\beta : \beta < \alpha\}$ would be countable.
We will be done if we can argue that at least one $Y \in \mathcal{Y}$ is disjoint from all $X_\beta$, β < α. Any such Y can then be chosen to be $X_\alpha$.
Suppose instead that every $Y \in \mathcal{Y}$ intersects $\bigcup_{\beta<\alpha} X_\beta$. Then there is a $\beta_0 < \alpha$ such that $X_{\beta_0} \cap Y \neq \emptyset$ for uncountably many among the $Y \in \mathcal{Y}$ – otherwise, $\mathcal{Y}$ is a countable union of countable sets $Z_\beta = \{Y \in \mathcal{Y} : X_\beta \cap Y \neq \emptyset\}$, for β < α. Fixing attention on that $\beta_0$, we prove that some $a \in X_{\beta_0}$ is in uncountably many Y, contradicting the case we are arguing under. Well, if not, for each $a \in X_{\beta_0}$ let
$$W_a = \{Y \in \mathcal{Y} : a \in Y\}$$
Each $W_a$ is countable; hence ($X_{\beta_0}$ being finite) so is $\bigcup_{a \in X_{\beta_0}} W_a$. But this union is the set of Y-sets that $X_{\beta_0}$ intersects, and that is uncountable.
VIII.7.13 Lemma. Let M be a CTM of ZFC, and $F(a, b) = \langle F(a, b), <, 1\rangle$ a PO set in M, where $a = \omega \times \aleph_2^M$ and b = 2. Then F(a, b) has the $\aleph_1$-a.c. (or c.c.c.) in M.
$$B = \{\mathrm{dom}(p) : p \in A\} \qquad (1)$$
is also uncountable. If not, $A \subseteq \bigcup_{s \in B}\{p \in F(a, b) : \mathrm{dom}(p) = s\}$, a countable set, since for each finite $s \subseteq \omega \times \aleph_2^M$ the cardinality of ${}^s2$ is finite ($= 2^{\mathrm{Card}(s)}$), and thus $\{p \in F(a, b) : \mathrm{dom}(p) = s\}$ is finite. Let D ⊆ B be an uncountable Δ-system of root r, and set $A_D = \{p \in A : \mathrm{dom}(p) \in D\}$. This is uncountable due to the onto map p ↦ dom(p).
Now, $\{p \restriction r : p \in A_D\}$ is finite, hence there are plenty of p and q in $A_D$, indeed uncountably many, with dom(p) ≠ dom(q) and $p \restriction r = q \restriction r$. But then p and q are compatible, since dom(p) and dom(q) are distinct members of D, and therefore dom(p) ∩ dom(q) = r. But also p ⊥ q, since both are in A.
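Two finite facts carry the proof above: there are exactly $2^{\mathrm{Card}(s)}$ functions from a finite s into 2, and two conditions whose domains meet exactly in the root r, and which agree on r, have a common extension, namely their union. A small Python check of both, under the assumption that conditions are dicts keyed by pairs (n, ξ) with integer stand-ins for ordinals below $\aleph_2^M$ (helper names are hypothetical):

```python
from itertools import product


def compatible(p, q):
    """p and q have a common extension iff p ∪ q is a function,
    i.e., they agree wherever both are defined."""
    return all(p[k] == q[k] for k in p.keys() & q.keys())


def conditions_with_domain(s):
    """All p in F(a, 2) with dom(p) = s, for finite s; there are 2**len(s)."""
    keys = sorted(s)
    return [dict(zip(keys, bits)) for bits in product((0, 1), repeat=len(keys))]


root = {(0, 0)}              # the root r of the Δ-system
p = {(0, 0): 1, (1, 5): 0}
q = {(0, 0): 1, (2, 7): 1}
assert p.keys() & q.keys() == root   # domains meet exactly in the root
print(compatible(p, q))                                        # True
print(len(conditions_with_domain({(0, 0), (1, 5), (2, 7)})))   # 8
```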
VIII.8. Exercises
VIII.1. In the definition of generic sets (VIII.1.7) we have required G to
be a filter definitionally. Prove that in the presence of the density
requirement we get that G is a filter for free, relaxing requirement (2)
in the definition of a filter (VIII.1.4) as follows: We only ask that any
two p and q in G be compatible (without asking for a witness in G).
(Hint. Fix p and q in G. It helps to prove that the following set is
dense: {r ∈ |P| : r ⊥ p ∨ r ⊥q ∨ (r ≤ p ∧ r ≤ q)}.)
VIII.2. Refer to Example VIII.7.1, and take a = ω and b = 2. We have seen that if M is a CTM (of, say, ZF) with F(ω, 2) ∈ M and G is any M-generic set, then G ∉ M. We also know that $f = \bigcup G$ is a function and f ∈ M[G]. Prove that f ∉ M.
(Hint. Let g : ω → 2 be in M. With the help of the set {p ∈ F(ω, 2) : (∃n ∈ ω)(p(n)↓ ∧ p(n) ≠ g(n))}, prove that f ≠ g.)
VIII.3. If M is a transitive model of ZF − P, then the following are absolute for
M, where we write πi , i = 1, 2, 3, for the ith projection of x, y, z:
(a) Pω (A), where Pω (A) = {x : x ⊆ A ∧ x is finite}.
(b) x is a PO set.
(c) x is a PO set ∧ y ∈ π1 (x) ∧ z ∈ π1 (x) ∧ y⊥z.
(d) x is a PO set ∧ y ⊆ π1 (x) ∧ ¬U (y) ∧ y is open.
(e) x is a PO set ∧ y ⊆ π1 (x) ∧ ¬U (y) ∧ y is dense.
(f) x is a PO set ∧ y ⊆ π1 (x) ∧ ¬U (y) ∧ ¬U (z) ∧ y is z-generic.
VIII.4. Prove that $\lambda x G.\, x^G$ is absolute for transitive models of ZF − P.
VIII.5. Prove that λxP.x̂ of VIII.2.6 is absolute for transitive models of
ZF − P.
VIII.6. Provide all the necessary details that show In and Ne are absolute for
any transitive model of ZF − P.
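For Exercise VIII.1, the density of the hinted set can be seen concretely in the special case F(ω, 2), where compatibility means agreement on common arguments and the union of two compatible conditions is a common extension. The sketch below (hypothetical names; conditions as Python dicts) follows the natural case split: starting from any s, either s is already incompatible with p, or we extend it below p, and then either the result is incompatible with q or we extend once more below q. It only illustrates this particular PO set; the exercise asks for the argument in an arbitrary PO set.

```python
def compatible(p, q):
    """In F(a, b): p and q are compatible iff they agree on dom(p) ∩ dom(q)."""
    return all(p[k] == q[k] for k in p.keys() & q.keys())


def leq(r, p):
    """r <= p in F(a, b): r extends p."""
    return all(k in r and r[k] == v for k, v in p.items())


def in_hint_set(r, p, q):
    """Membership in {r : r ⊥ p ∨ r ⊥ q ∨ (r ≤ p ∧ r ≤ q)}."""
    return not compatible(r, p) or not compatible(r, q) or (leq(r, p) and leq(r, q))


def extend_into_hint_set(s, p, q):
    """Return some r <= s in the hint set (this is what density requires)."""
    if not compatible(s, p):
        return dict(s)
    s1 = {**s, **p}  # a common extension of s and p
    if not compatible(s1, q):
        return s1
    return {**s1, **q}  # extends s1 and q, hence also s and p


cases = [({0: 1}, {1: 0}, {2: 1}), ({0: 1}, {0: 0}, {2: 1}), ({0: 1}, {1: 0}, {1: 1})]
for s, p, q in cases:
    r = extend_into_hint_set(s, p, q)
    print(r, leq(r, s), in_hint_set(r, p, q))
```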
Bibliography 561
——— (1940). The Consistency of the Axiom of Choice and of the Generalized
Continuum-Hypothesis with the Axioms of Set Theory. Annals of Math. Stud. 3.
Princeton University Press, Princeton.
Gries, David, and Fred B. Schneider (1994). A Logical Approach to Discrete Math.
Springer-Verlag, New York.
——— and ——— (1995). Equational propositional logic. Information Processing
Lett., 53:145–152.
Hartogs, F. (1915). Über das Problem der Wohlordnung. Math. Ann., 76:438–443.
Hermes, H. (1973). Introduction to Mathematical Logic. Springer-Verlag, New York.
Hilbert, D., and P. Bernays (1968). Grundlagen der Mathematik I, II. Springer-Verlag,
New York.
Hinman, P. G. (1978). Recursion-Theoretic Hierarchies. Springer-Verlag, New York.
Jech, T. J. (1978a). About the axiom of choice. In Barwise (1978), Chapter B.2,
pages 345–370.
——— (1978b). Set Theory. Academic Press, New York.
Kamke, E. (1950). Theory of Sets. Translated from the 2nd German edition by
F. Bagemihl. Dover Publications, New York.
Kunen, Kenneth (1980). Set Theory: An Introduction to Independence Proofs. North-
Holland, Amsterdam.
Levy, A. (1979). Basic Set Theory. Springer-Verlag, New York.
Manin, Yu. I. (1977). A Course in Mathematical Logic. Springer-Verlag, New York.
Mendelson, Elliott (1987). Introduction to Mathematical Logic, 3rd edition. Wadsworth
& Brooks, Monterey, California.
Monk, J. D. (1969). Introduction to Set Theory. McGraw-Hill, New York.
Montague, R. (1955). Well-founded relations; generalizations of principles of induction
and recursion (abstract). Bull. Amer. Math. Soc., 61:442.
Pincus, D. (1974). Cardinal representatives. Israel J. Math., 18:321–344.
Rasiowa, H., and R. Sikorski (1963). The Mathematics of Metamathematics. Państwowe
Wydawnictwo Naukowe, Warszawa.
Schütte, K. (1977). Proof Theory. Springer-Verlag, New York.
Shoenfield, Joseph R. (1967). Mathematical Logic. Addison-Wesley, Reading,
Massachusetts.
——— (1971). Unramified forcing. In Dana S. Scott, editor, Axiomatic Set Theory, Proc.
Symp. Pure Math., pages 357–381.
——— (1978). Axioms of Set Theory. In Barwise (1978), Chapter B.1, pages 321–344.
Sierpiński, W. (1965). Cardinal and Ordinal Numbers. Warsaw.
Skolem, T. (1923). Einige Bemerkungen zur axiomatischen Begründung der Mengen-
lehre. In Wiss. Vorträge gehalten auf dem 5. Kongress der skandinav. Mathematiker
in Helsingförs, 1922, pages 217–232.
Smullyan, Raymond M. (1992). Gödel’s Incompleteness Theorems. Oxford University Press, Oxford.
Tarski, A. L. (1955). General principles of induction and recursion (abstract); The notion
of rank in axiomatic set theory and some of its applications (abstract). Bull. Amer.
Math. Soc., 61:442–443.
——— (1956). Ordinal Algebras. North-Holland, Amsterdam.
Tourlakis, G. (1984). Computability. Reston Publishing Company, Reston, Virginia.
——— (2000a). A basic formal equational predicate logic – part I. BSL, 29(1–2):43–56.
——— (2000b). A basic formal equational predicate logic – part II. BSL, 29(3):75–88.
——— (2001). On the soundness and completeness of equational predicate logics.
J. Computation and Logic, 11(4):623–653.
Veblen, Oswald, and John Wesley Young (1916). Projective Geometry, volume I. Ginn
and Company, Boston.
Whitehead, A. N., and B. Russell (1912). Principia Mathematica, volume 2. Cambridge
Univ. Press, Cambridge.
Wilder, R. L. (1963). Introduction to the Foundations of Mathematics. Wiley, New York.
Zermelo, E. (1904). Beweis daß jede Menge wohlgeordnet werden kann. Math. Ann.,
59:514–516.
——— (1908). Untersuchungen über die Grundlagen der Mengenlehre I. Math. Ann.,
65:261–281.
——— (1909). Sur les ensembles finis et le principe de l’induction complète. Acta
Math., 32:185–193.
List of Symbols
f : A → B, 26 N, 12
f I , 54 xn , 185
glb(B), 347 x1 , . . . , xn (n-vector or n-tuple),
1A : A → A (identity relation on A), 185
257 x , 185
S[X] (image of X under S), 196 N, 15
S−1 [X] (inverse image of X under S), N, 112, 233
196 ω, 112, 234
Sc, 196 n, m, l, i, j, k (natural number
i, 55 variables), 235
p⊥q, 520 |=, 61
inf(B), 67, 347 , 61
b T a (T a relation; it means ∈,
/ 116
a, b ∈ T), 194 n , 88
Z, 113 On, 332
n
A , 142 OP, 188
i=1 i
A , 142 < (abstract symbol for order), 284
1≤i≤n i
a∈I A a , 193 Ord, 331
∩, 141 ord(x), 397
A (intersection of a class A), , 374
153 ∼
=, 317
Z, 15 ≤ (abstract symbol for reflexive
(ιz)Q , 74 order), 285
κ-a.c., 553 β∈α ({β} × X β ), 135, 412
·β
κ-c.c., 553 α , 423
Lim (α), 342 α + 1, 340
L, 397 a, b (ordered pair), 186
L(M), 55 (A, <), 286
L N , 397 ≺, 462
λ (used in λ-notation), 202 ≺ (a fixed order of logical and
"→ (alternative to λ-notation), 203 nonlogical symbols), 222, 223
, 438 , 461
∞ , 438 pr (the predecessor function on ω),
<, 336 239
L M (the constructible universe), 284 P I , 54
, 35 An , 192
, 39 A1 × · · · × An , 192
lub(B), 346 P0 (0th power of a relation), 257
α − 1, 342 Pn ( positive power of a relation), 256
#
Fa , 207 sup(B), 346
×
a∈I
n
A , 192 sup+ (A), 347
i=1 i
×
I
1≤i≤n
Ai , 192 t M , 79
Term(M), 55
A, 208
S + (the set of all nonempty strings
∃!, 67
over the set S), 224
A/P, 277
T C(A) (transitive closure of the
R N (α), 359
class A), 265
ran (range of a relation), 195
(order of “definable sets”), 219,
ρ, 369
223, 225
r k, 369
. . ., 20
Q, 113
Q, 15 U M , 144
R, 15 U N (the class of all sets and atoms),
R, 113 144
S−1 (inverse of S), 196 U, 144
, 55 Sa ↑ (S is undefined at a),
T A (restrict inputs of T to be 198
n
in A), 198 A , 142
i=1 i
T A (alternate notation for “”), A , 142
1≤i≤n i
199 A
a∈I a , 193
T | A (that is, T ∩ A2 ), 198 ∪, 141
S(x), 340 A (union of a class A), 150
string A (union of a set A), 152
λ, 13 U (x) (“x is an urelement”), 114
≡, 13 VN (α), 359
⊆, 117, 139 V = L, 408
⊆, 117 V = L, 408
⊇, 117, 139 an , 21
⊇, 117 a , 21
⊂, 119, 139 xn , 34
, 119 V M , 144
, 119 V N (the class of all sets), 145
⊃, 139 V, 145
n + 1 (the successor of n ∈ ω, i.e., WF N , 359
n ∪ {n}), 242 Wff(M), 55
a +c b, 470 ZF, 108, 228
sp, 309 ZFC, 2, 108, 229
Index
domain, 54
double induction, 248, 300
dummy renaming, 46
∃-introduction, 37
elimination of defined symbols, 69, 71
embedding, 318
empty set, 100, 111, 131
Entscheidungsproblem, 41
enumerable, 442
ε-numbers, 426
ε-term, 76
equinumerous, 431
equipotent, 431
equivalence class, 276, 277
equivalence relation, 276
equivalence theorem, 52
existential formula, 67
explicit listing, 147
expression, 2, 8, 13
extension, 7, 194
  Cohen, 524
  generic, 524
  of a condition, 520
extensional, 31, 312
extensionality, 67
family
  of sets, 150
  indexed, 206
  intersection of, 153
  union of, 150
  quasi-disjoint, 555
Feferman, S., 228, 447, 530
filter, 521
  generated by a chain, 522
finitary, 4
finitary rule, 503
finite, 389
finite sequence, 251
  length of, 251
finitely satisfiable, 98
first incompleteness theorem, 87
first order definable, 111
fixed point, 351, 438
fixpoint, 351, 438
forcing, 315
forcing conditions, 520
forcing language, 528
formal interpretation, 78
  of a theory, 83
formal isomorphism, 84
formal language, 111, 115
  first order, 7
formal model
  of a theory, 83
formal natural numbers, see set
formalize, 6
formalized, 3
formula
  absolute, 382
  mixed-mode, 62
  prime, 29
  propositional, 29
  satisfiable, 31
  tautology, 31
  unsatisfiable, 31
formula form, 35
formula schema, 35
foundation axiom, 102
Fraenkel, A. A., 105
free for, 33
free variable, 19
Frege, G., 121
function, 199
  bijective, 204
  collapsing, 310
  continuous, 348
  countably continuous, 348
  diagram, 275
    commutative, 275
  expansive, 353
  inclusive, 353
  increasing, 348
  injective, 203
  inverse of, 273
  left inverse of, 272
  left-invertible, 274
  monotone, 348
  non-decreasing, 348
  one-to-one, 203
  order-preserving, 317
  partial, 199
  right inverse of, 272
  right-invertible, 274
  surjective, 204
function diagram, 275
function substitution, 201
G-interpretation, 525
-closed, 437
-induction, 438
GCH, 466, 484
generalization, 45, 120
generalized continuum hypothesis, 408, 466, 484
generic, 518
generic extension, 524, 526
global choice, see AC
Gödel, K., 3, 6, 229, 395, 467
Gödel operations, 396
Gödel-Rosser incompleteness theorem, 92
Gödel’s second incompleteness theorem, 93
well-ordering, 219, 223, 284, 289
Whitehead, A. N., 74, 439
Wilder, R. L., 449, 455, 458
WO class, 289
WR-finite, 440
WR-infinite, 440
Young, J. W., 53
Zermelo, E., 105, 230, 355
Zermelo’s well-ordering principle, 355
Zermelo’s well-ordering theorem, 230
Zermelo-Fraenkel axioms, 108
Zermelo-Fraenkel set theory, 228
Ziffern, 42
Zorn’s lemma, see Kuratowski-Zorn theorem