
CAMBRIDGE STUDIES IN
ADVANCED MATHEMATICS
EDITORIAL BOARD
B. BOLLOBAS, W. FULTON, A. KATOK, F. KIRWAN,
P. SARNAK

Lectures in Logic and Set Theory Volume 2

This two-volume work bridges the gap between introductory expositions of
logic or set theory on one hand, and the research literature on the other. It can
be used as a text in an advanced undergraduate or beginning graduate course
in mathematics, computer science, or philosophy. The volumes are written in
a user-friendly conversational lecture style that makes them equally effective
for self-study or class use.
Volume 2, on formal (ZFC) set theory, incorporates a self-contained
“Chapter 0” on proof techniques so that it is based on formal logic, in the style
of Bourbaki. The emphasis on basic techniques will provide the reader with
a solid foundation in set theory and sets a context for the presentation of ad-
vanced topics such as absoluteness, relative consistency results, two exposi-
tions of Gödel’s constructible universe, numerous ways of viewing recursion,
and a chapter on Cohen forcing.

George Tourlakis is Professor of Computer Science at York University of
Ontario.
Already published
2 K. Petersen Ergodic theory
3 P.T. Johnstone Stone spaces
5 J.-P. Kahane Some random series of functions, 2nd edition
7 J. Lambek & P.J. Scott Introduction to higher-order categorical logic
8 H. Matsumura Commutative ring theory
10 M. Aschbacher Finite group theory, 2nd edition
11 J.L. Alperin Local representation theory
12 P. Koosis The logarithmic integral I
14 S.J. Patterson An introduction to the theory of the Riemann zeta-function
15 H.J. Baues Algebraic homotopy
16 V.S. Varadarajan Introduction to harmonic analysis on semisimple Lie groups
17 W. Dicks & M. Dunwoody Groups acting on graphs
19 R. Fritsch & R. Piccinini Cellular structures in topology
20 H. Klingen Introductory lectures on Siegel modular forms
21 P. Koosis The logarithmic integral II
22 M.J. Collins Representations and characters of finite groups
24 H. Kunita Stochastic flows and stochastic differential equations
25 P. Wojtaszczyk Banach spaces for analysts
26 J.E. Gilbert & M.A.M. Murray Clifford algebras and Dirac operators in harmonic analysis
27 A. Frohlich & M.J. Taylor Algebraic number theory
28 K. Goebel & W.A. Kirk Topics in metric fixed point theory
29 J.F. Humphreys Reflection groups and Coxeter groups
30 D.J. Benson Representations and cohomology I
31 D.J. Benson Representations and cohomology II
32 C. Allday & V. Puppe Cohomological methods in transformation groups
33 C. Soule et al. Lectures on Arakelov geometry
34 A. Ambrosetti & G. Prodi A primer of nonlinear analysis
35 J. Palis & F. Takens Hyperbolicity, stability and chaos at homoclinic bifurcations
37 Y. Meyer Wavelets and operators 1
38 C. Weibel, An introduction to homological algebra
39 W. Bruns & J. Herzog Cohen-Macaulay rings
40 V. Snaith Explicit Brauer induction
41 G. Laumon Cohomology of Drinfeld modular varieties I
42 E.B. Davies Spectral theory and differential operators
43 J. Diestel, H. Jarchow, & A. Tonge Absolutely summing operators
44 P. Mattila Geometry of sets and measures in Euclidean spaces
45 R. Pinsky Positive harmonic functions and diffusion
46 G. Tenenbaum Introduction to analytic and probabilistic number theory
47 C. Peskine An algebraic introduction to complex projective geometry
48 Y. Meyer & R. Coifman Wavelets
49 R. Stanley Enumerative combinatorics I
50 I. Porteous Clifford algebras and the classical groups
51 M. Audin Spinning tops
52 V. Jurdjevic Geometric control theory
53 H. Volklein Groups as Galois groups
54 J. Le Potier Lectures on vector bundles
55 D. Bump Automorphic forms and representations
56 G. Laumon Cohomology of Drinfeld modular varieties II
57 D.M. Clark & B.A. Davey Natural dualities for the working algebraist
58 J. McCleary A user’s guide to spectral sequences II
59 P. Taylor Practical foundations of mathematics
60 M.P. Brodmann & R.Y. Sharp Local cohomology
61 J.D. Dixon et al. Analytic pro-P groups
62 R. Stanley Enumerative combinatorics II
63 R.M. Dudley Uniform central limit theorems
64 J. Jost & X. Li-Jost Calculus of variations
65 A.J. Berrick & M.E. Keating An introduction to rings and modules
66 S. Morosawa Holomorphic dynamics
67 A.J. Berrick & M.E. Keating Categories and modules with K-theory in view
68 K. Sato Levy processes and infinitely divisible distributions
69 H. Hida Modular forms and Galois cohomology
70 R. Iorio & V. Iorio Fourier analysis and partial differential equations
71 R. Blei Analysis in integer and fractional dimensions
72 F. Borceaux & G. Janelidze Galois theories
73 B. Bollobas Random graphs
LECTURES IN LOGIC
AND SET THEORY

Volume 2: Set Theory

GEORGE TOURLAKIS
York University

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press


The Edinburgh Building, Cambridge, United Kingdom
Published in the United States by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521753746

© George Tourlakis 2003

This book is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2003

ISBN-13 978-0-511-06872-0 eBook (EBL)
ISBN-10 0-511-06872-7 eBook (EBL)
ISBN-13 978-0-521-75374-6 hardback
ISBN-10 0-521-75374-0 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this book, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
To the memory of my parents
Contents

Preface
I      A Bit of Logic: A User's Toolbox
       I.1   First Order Languages
       I.2   A Digression into the Metatheory: Informal Induction and Recursion
       I.3   Axioms and Rules of Inference
       I.4   Basic Metatheorems
       I.5   Semantics
       I.6   Defined Symbols
       I.7   Formalizing Interpretations
       I.8   The Incompleteness Theorems
       I.9   Exercises
II     The Set-Theoretic Universe, Naïvely
       II.1  The "Real Sets"
       II.2  A Naïve Look at Russell's Paradox
       II.3  The Language of Axiomatic Set Theory
       II.4  On Names
III    The Axioms of Set Theory
       III.1  Extensionality
       III.2  Set Terms; Comprehension; Separation
       III.3  The Set of All Urelements; the Empty Set
       III.4  Class Terms and Classes
       III.5  Axiom of Pairing
       III.6  Axiom of Union
       III.7  Axiom of Foundation
       III.8  Axiom of Collection
       III.9  Axiom of Power Set
       III.10 Pairing Functions and Products
       III.11 Relations and Functions
       III.12 Exercises
IV     The Axiom of Choice
       IV.1  Introduction
       IV.2  More Justification for AC; the "Constructible" Universe Viewpoint
       IV.3  Exercises
V      The Natural Numbers; Transitive Closure
       V.1   The Natural Numbers
       V.2   Algebra of Relations; Transitive Closure
       V.3   Algebra of Functions
       V.4   Equivalence Relations
       V.5   Exercises
VI     Order
       VI.1  PO Classes, LO Classes, and WO Classes
       VI.2  Induction and Inductive Definitions
       VI.3  Comparing Orders
       VI.4  Ordinals
       VI.5  The Transfinite Sequence of Ordinals
       VI.6  The von Neumann Universe
       VI.7  A Pairing Function on the Ordinals
       VI.8  Absoluteness
       VI.9  The Constructible Universe
       VI.10 Arithmetic on the Ordinals
       VI.11 Exercises
VII    Cardinality
       VII.1 Finite vs. Infinite
       VII.2 Enumerable Sets
       VII.3 Diagonalization; Uncountable Sets
       VII.4 Cardinals
       VII.5 Arithmetic on Cardinals
       VII.6 Cofinality; More Cardinal Arithmetic; Inaccessible Cardinals
       VII.7 Inductively Defined Sets Revisited; Relative Consistency of GCH
       VII.8 Exercises
VIII   Forcing
       VIII.1 PO Sets, Filters, and Generic Sets
       VIII.2 Constructing Generic Extensions
       VIII.3 Weak Forcing
       VIII.4 Strong Forcing
       VIII.5 Strong vs. Weak Forcing
       VIII.6 M[G] Is a CTM of ZFC If M Is
       VIII.7 Applications
       VIII.8 Exercises
Bibliography
List of Symbols
Index
Preface

This volume contains the basics of Zermelo-Fraenkel axiomatic set theory. It is
situated between two opposite poles: On one hand there are elementary texts that
familiarize the reader with the vocabulary of set theory and build set-theoretic
tools for use in courses in analysis, topology, or algebra – but do not get into
metamathematical issues. On the other hand are those texts that explore issues
of current research interest, developing and applying tools (constructibility,
absoluteness, forcing, etc.) that are aimed to analyze the inability of the axioms
to settle certain set-theoretic questions.
Much of this volume just “does set theory”, thoroughly developing the theory
of ordinals and cardinals along with their arithmetic, incorporating a careful dis-
cussion of diagonalization and a thorough exposition of induction and inductive
(recursive) definitions. Thus it serves well those who simply want tools to ap-
ply to other branches of mathematics or mathematical sciences in general (e.g.,
theoretical computer science), but also want to find out about some of the subtler
results of modern set theory.
Moreover, a fair amount is included towards preparing the advanced reader
to read the research literature. For example, we pay two visits to Gödel’s con-
structible universe, the second of which concludes with a proof of the relative
consistency of the axiom of choice and of the generalized continuum hypothesis
with ZF. As such a program requires, I also include a thorough discussion of
formal interpretations and absoluteness. The lectures conclude with a short but
detailed study of Cohen forcing and a proof of the non-provability in ZF of the
continuum hypothesis.
The level of exposition is designed to fit a spectrum of mathematical sophis-
tication, from third-year undergraduate to junior graduate level (each group will
find here its favourite chapters or sections that serve its interests and level of
preparation).

The volume is self-contained. Whatever tools one needs from mathematical
logic have been included in Chapter I. Thus, a reader equipped with a com-
bination of sufficient mathematical maturity and patience should be able to
read it and understand it. There is a trade-off: the less the maturity at hand, the
more the supply of patience must be. To pinpoint this “maturity”: At least two
courses from among calculus, linear algebra, and discrete mathematics at the
junior level should have exposed the reader to sufficient diversity of mathemat-
ical issues and proof culture to enable him or her to proceed with reasonable
ease.

A word on approach. I use the Zermelo-Fraenkel axiom system with the axiom
of choice (AC). This is the system known as ZFC. As many other authors do, I
simplify nomenclature by allowing “proper classes” in our discussions as part
of our metalanguage, but not in the formal language.
I said earlier that this volume contains the “basics”. I mean this charac-
terisation in two ways: One, that all the fundamental tools of set theory as needed
elsewhere in the mathematical sciences are included in detailed exposition. Two,
that I do not present any applications of set theory to other parts of mathematics,
because space considerations, along with a decision to include certain advanced
relative consistency results, have prohibited this.
“Basics” also entails that I do not attempt to bring the reader up to speed
with respect to current research issues. However, a reader who has mastered
the advanced metamathematical tools contained here will be able to read the
literature on such issues.
The title of the book reflects two things: One, that all good short titles are
taken. Two, more importantly, it advertises my conscious effort to present the
material in a conversational, user-friendly lecture style. I deliberately employ
classroom mannerisms (such as “pauses” and parenthetical “why”s, “what if”s,
and attention-grabbing devices for passages that I feel are important). This
aims at creating a friendly atmosphere for the reader, especially one who has
decided to study the topic without the guidance of an instructor. Friendliness
also means steering clear of the terse axiom-definition-theorem recipe, and
explaining how some concepts were arrived at in their present form. In other
words, what makes things tick. Thus, I approach the development of the key
concepts of ordinals and cardinals, initially and tentatively, in the manner they
were originally introduced by Georg Cantor (paradox-laden and all). Not only
does this afford the reader an understanding of why the modern (von Neumann)
approach is superior (and contradiction-free), but it also shows what it tries to
accomplish. In the same vein, Russell’s paradox is visited no less than three
times, leaving us in the end with a firm understanding that it has nothing to do
with the “truth” or otherwise of the much-maligned statement “x ∈ x” but it is
just the result of a diagonalization of the type Cantor originally taught us.
A word on coverage. Chapter I is our “Chapter 0”. It contains the tools needed
to enable us do our job properly – a bit of mathematical logic, certainly no more
than necessary. Chapter II informally outlines what we are about to describe
axiomatically: the universe of all the “real” sets and other “objects” of our
intuition, a caricature of the von Neumann “universe”. It is explained that the
whole fuss about axiomatic set theory† is to have a formal theory derive true
statements about the von Neumann sets, thus enabling us to get to know the
nature and structure of this universe. If this is to succeed, the chosen axioms
must be seen to be “true” in the universe we are describing.
To this end I ensure via informal discussions that every axiom that is intro-
duced is seen to “follow” from the principle of the formation of sets by stages, or
from some similarly plausible principle devised to keep paradoxes away. In this
manner the reader is constantly made aware that we are building a meaningful
set theory that has relevance to mathematical intuition and expectations (the
“real” mathematics), and is not just an artificial choice of a contradiction-free
set of axioms followed by the mechanical derivation of a few theorems.
With this in mind, I even make a case for the plausibility of the axiom of
choice, based on a popularization of Gödel’s constructible universe argument.
This occurs in Chapter IV and is informal.
The set theory we do allows atoms (or Urelemente),‡ just like Zermelo’s.
The re-emergence of atoms has been defended aptly by Jon Barwise (1975) and
others on technical merit, especially when one does “restricted set theories”
(e.g., theory of admissible sets).
Our own motivation is not technical; rather it is philosophical and ped-
agogical. We find it extremely counterintuitive, especially when addressing
undergraduate audiences, to tell them that all their familiar mathematical
objects – the “stuff of mathematics” in Barwise’s words – are just perverse
“box-in-a-box-in-a-box . . . ” formations built from an infinite supply of empty
boxes. For example, should I be telling my undergraduate students that their
familiar number “2” really is just a short name for something like a picture of
nested empty boxes? And what will I tell them about “√2”?

† O.K., maybe not the whole fuss. Axiomatics also allow us to meaningfully ask, and attempt to
answer, metamathematical questions of derivability, consistency, relative consistency, indepen-
dence. But in this volume much of the fuss is indeed about learning set theory.
‡ Allows, but does not insist that there are any.

Some mathematicians have said that set theory (without atoms) speaks only
of sets and it chooses not to speak about objects such as cows or fish (colourful
terms for urelements). Well, it does too! Such (“atomless”) set theory is known
to be perfectly capable of constructing “artificial ” cows and fish, and can then
proceed to talk about such animals as much as it pleases.
While atomless ZFC has the ability to construct or codify all the familiar
mathematical objects in it, it does this so well that it betrays the prime directive
of the axiomatic method, which is to have a theory that applies to diverse
concrete (meta – i.e., outside the theory and in the realm of “everyday math”)
mathematical systems. Group theory and projective geometry, for example,
fulfill the directive.
In atomless ZFC the opposite appears to be happening: One is asked to
embed the known mathematics into the formal system.
We prefer a set theory that allows both artificial and real cows and fish, so that
when we want to illustrate a point in an example utilizing, say, the everyday set
of integers, Z, we can say things like “let the atoms (be interpreted to) include
the members of Z . . . ”.
But how about technical convenience? Is it not hard to include atoms in a
formal set theory? In fact, not at all!

A word on exposition devices. I freely use a pedagogical feature that, I believe,
originated in Bourbaki’s books – that is, marking an important or difficult topic
by placing a “winding road” sign in the margin next to it. I am using here the
same symbol that Knuth employed in his TeXbook, marking with
it the beginning and end of an important passage.
Topics that are advanced, or of the “read at your own risk” type, can be
omitted without loss of continuity. They are delimited by a double sign.
Most chapters end with several exercises. I have stopped making attempts to
sort exercises between “hard” and “just about right”, as such classifications are
rather subjective. In the end, I’ll pass on to you the advice one of my professors
at the University of Toronto used to offer: “Attempt all the problems. Those you
can do, don’t do. Do the ones you cannot”.
What to read. Just as in the advice above, I suggest that you read everything
that you do not already know if time is no object. In a class environment the
coverage will depend on class length and level, and I defer to the preferences of
the instructor. I suppose that a fourth-year undergraduate audience ought to see
the informal construction of the constructible universe in Chapter IV, whereas
a graduate audience would rather want to see the formal version in Chapter VI.
The latter group will probably also want to be exposed to Cohen forcing.

Acknowledgments. I wish to thank all those who taught me, a group that is too
large to enumerate, in which I must acknowledge the presence and influence
of my parents, my students, and the writings of Shoenfield (in particular, 1967,
1978, 1971).
The staff at Cambridge University Press provided friendly and expert sup-
port, and I thank them. I am particularly grateful for the encouragement received
from Lauren Cowles and Caitlin Doggart at the initial (submission and referee-
ing) and advanced stages (production) of the publication cycle respectively.
I also wish to record my appreciation to Zach Dorsey of TechBooks and
his team. In both volumes they tamed my English and LaTeX, fitting them to
Cambridge specifications, and doing so with professionalism and flexibility.
This has been a long project that would have not been successful without
the support and understanding – for my long leaves of absence in front of a
computer screen – that only one’s family knows how to provide.
I finally wish to thank Donald Knuth and Leslie Lamport for their typesetting
systems TeX and LaTeX that make technical writing fun (and also empower
authors to load the pages with these and other signs).

George Tourlakis
Toronto, March 2002
I

A Bit of Logic: A User’s Toolbox

This prerequisite chapter – what some authors call a “Chapter 0” – is an abridged
version of Chapter I of volume 1 of my Lectures in Logic and Set Theory. It is of-
fered here just in case that volume Mathematical Logic is not readily accessible.
Simply put, logic† is about proofs or deductions. From the point of view of
the user of the subject – whose best interests we attempt to serve in this chapter –
logic ought to be just a toolbox which one can employ to prove theorems, for
example, in set theory, algebra, topology, theoretical computer science, etc.
The volume at hand is about an important specimen of a mathematical theory,
or logical theory, namely, axiomatic set theory. Another significant example,
which we do not study here, is arithmetic. Roughly speaking, a mathematical
theory consists on one hand of assumptions that are specific to the subject
matter – the so-called axioms – and on the other hand a toolbox of logical rules.
One usually performs either of the following two activities with a mathematical
theory: One may choose to work within the theory, that is, employ the tools and
the axioms for the sole purpose of proving theorems. Or one can take the entire
theory as an object of study and study it “from the outside” as it were, in order to
pose and attempt to answer questions about the power of the theory (e.g., “does
the theory have as theorems all the ‘true’ statements about the subject matter?”),
its reliability (meaning whether it is free from contradictions or not), how its
reliability is affected if you add new assumptions (axioms), etc.
Our development of set theory will involve both types of investigations indi-
cated above:
(1) Primarily, we will act as users of logic in order to deduce “true” state-
ments about sets (i.e., theorems of set theory) as consequences of certain

† We drop the qualifier “mathematical” from now on, as this is the only type of logic we are about.


“obviously true”† statements that we accept up front without proof, namely,
the ZFC axioms.‡ This is pretty much analogous to the behaviour of a
geometer whose job is to prove theorems of, say, Euclidean geometry.
(2) We will also look at ZFC from the outside and address some issues of the
type “is such and such a sentence (of set theory) provable from the axioms
of ZFC and the rules of logic alone?”

It is evident that we need a precise formulation of set theory, that is, we must
turn it into a mathematical object in order to make task (2), above, a meaningful
mathematical activity.§ This dictates that we develop logic itself formally, and
subsequently set theory as a formal theory.
Formalism,¶ roughly speaking, is the abstraction of the reasoning processes
(proofs) achieved by deleting any references to the “truth content” of the com-
ponent mathematical statements (formulas). What is important in formalist
reasoning is solely the syntactic form of (mathematical) statements as well as
that of the proofs (or deductions) within which these statements appear.
A formalist builds an artificial language, that is, an infinite – but finitely
specifiable# – collection of “words” (meaning symbol sequences, also called
expressions). He|| then uses this language in order to build deductions – that
is, finite sequences of words – in such a manner that, at each step, he writes
down a word if and only if it is “certified” to be syntactically correct to do so.
“Certification” is granted by a toolbox consisting of the very same rules of logic
that we will present in this chapter.
The formalist may pretend, if he so chooses, that the words that appear in a
proof are meaningless sequences of meaningless symbols. Nevertheless, such
posturing cannot hide the fact that (in any purposefully designed theory) these

† We often quote a word or cluster of related words as a warning that the crude English meaning
is not necessarily the intended meaning, or it may be ambiguous. For example, the first “true”
in the sentence where this footnote originates is technical, but in a first approximation may be
taken to mean what “true” means in English. “Obviously true” is an ambiguous term. Obvious to
whom? However, the point is – to introduce another ambiguity – that “reasonable people” will
accept the truth of the (ZFC) axioms.
‡ This is an acronym reflecting the names of Zermelo and Fraenkel – the founders of this particular
axiomatization – and the fact that the so-called axiom of choice is included.
§ Here is an analogy: It is the precision of the rules for the game of chess that makes the notion of
analyzing a chessboard configuration meaningful.
¶ The person who practises formalism is a formalist.
# The finite specification is achieved by a finite collection of “rules”, repeated applications of which
build the words.
|| By definition, “he”, “his”, “him” – and their derivatives – are gender-neutral in this volume.

words codify “true” (intuitively speaking) statements. Put bluntly, we must have
something meaningful to talk about before we bother to codify it.
Therefore, a formal theory is a laboratory version (artificial replica or sim-
ulation, if you will) of a “real” mathematical theory of the type encountered
in mathematics,† and formal proofs do unravel (codified versions of) “truths”
beyond those embodied in the adopted axioms.

It will be reassuring for the uninitiated that it is a fact of logic that the to-
tality of the “universally true” statements – that is, those that hold in all of
mathematics and not only in specific theories – coincides with the totality
of statements that we can deduce purely formally from some simple univer-
sally true assumptions such as x = x, without any reference to meaning or
“truth” (Gödel’s completeness theorem for first order logic). In short, in this
case formal deducibility is as powerful as “truth”. The flip side is that formal
deducibility cannot be as powerful as “truth” when it is applied to specific
mathematical theories such as set theory or arithmetic (Gödel’s incompleteness
theorem).

Formalization allows us to understand the deeper reasons that have pre-
vented set theorists from settling important questions such as the continuum
hypothesis – that is, the statement that there are no cardinalities between that of
the set of natural numbers and that of the set of the reals. This understanding is
gathered by “running diagnostics” on our laboratory replica of set theory. That
is, just as an engineer evaluates a new airplane design by building and testing
a model of the real thing, we can find out, with some startling successes, what
are the limitations of our theory, that is, what our assumptions are incapable of
logically implying.‡ If the replica is well built,§ we can then learn something
about the behaviour of the real thing.
In the case of formal set theory and, for example, the question of our failure
to resolve the continuum hypothesis, such diagnostics (the methods of Gödel
and Cohen – see Chapters VI and VIII) return a simple answer: We have not
included enough assumptions in (whether “real” or “formal”) set theory to settle
this question one way or another.

† Examples of “real” (non-formalized) theories are Euclid’s geometry, topology, the theory of
groups, and, of course, Cantor’s “naı̈ve” or “informal” set theory.
‡ In model theory “model” means exactly the opposite of what it means here. A model airplane
abstracts the real thing. A model of a formal (i.e., abstract) theory is a “concrete” or “real” version
of the abstract theory.
§ This is where it pays to choose reasonable assumptions, assumptions that are “obviously true”.

But what about the interests of the reader who only wants to practise set
theory, and who therefore may choose to skip the parts of this volume that just
talk about set theory? Does, perchance, formalism put him into an unnecessary
straitjacket?
We think not. Actually it is easier, and safer, to reason formally than to do so
informally. The latter mode often mixes syntax and semantics (meaning), and
there is always the danger that the “user” may assign incorrect (i.e., convenient,
but not general) meanings to the symbols that he manipulates, a phenomenon
that anyone who is teaching mathematics must have observed several times
with some distress.
Another uncertainty one may encounter in an informal approach is this:
“What can we allow to be a ‘property’ in mathematics?” This is an important
question, for we often want to collect objects that share a common property,
or we want to prove some property of the natural numbers by induction or by
the least principle. But what is a property? Is colour a property? How about
mood? It is not enough to say, “no, these are not properties”, for these are
just two frivolous examples. The question is how to describe accurately and
unambiguously the infinite variety of properties that are allowed. Formalism
can do just that.†
“Formalism for the user” is not a revolutionary slogan. It was advocated
by Hilbert, the founder of formalism, partly as a means of – as he believed‡ –
formulating mathematical theories in a manner that allows one to check them
(i.e., run diagnostic tests on them) for freedom from contradiction,§ but also as
the right way to do mathematics. By this proposal he hoped to salvage mathe-
matics itself – which, Hilbert felt, was about to be destroyed by the Brouwer
school of intuitionist thought. In a way, his program could bridge the gap
between the classical and the intuitionist camps, and there is some evidence
that Heyting (an influential intuitionist and contemporary of Hilbert) thought
that such a rapprochement was possible. After all, since meaning is irrelevant
to a formalist, all that he is doing (in a proof ) is shuffling finite sequences of

† Well, almost. So-called cardinality considerations make it impossible to describe all “good”
properties formally. But, practically and empirically speaking, we can define all that matter for
“doing mathematics”.
‡ This belief was unfounded, as Gödel’s incompleteness theorems showed.
§ Hilbert’s metatheory – that is, the “world” or “lab” outside the theory, where the replica is
actually manufactured – was finitary. Thus – Hilbert believed – all this theory building and
theory checking ought to be effected by finitary means. This was another ingredient that was
consistent with peaceful coexistence with the intuitionists. And, alas, this ingredient was the one
that – as some writers put it – destroyed Hilbert’s program to found mathematics on his version
of formalism. Gödel’s incompleteness theorems showed that a finitary metatheory is not up to
the task.

symbols, never having to handle or argue about infinite objects – a good thing,
as far as an intuitionist is concerned.†
In support of the “formalism for the user” position we must not fail to
mention Bourbaki’s (1966a) monumental work, which is a formalization of a
huge chunk of mathematics, including set theory, algebra, topology, and theory
of integration. This work is strictly for the user of mathematics, not for the
metamathematician who studies formal theories. Yet, it is fully formalized,
true to the spirit of Hilbert, and it comes in a self-contained package, including
a “Chapter 0” on formal logic.
More recently, the proposition of employing formal reasoning as a tool has
been gaining support in a number of computer science undergraduate curricula,
where logic and discrete mathematics are taught in a formalized setting, starting
with a rigorous course in the two logical calculi (propositional and predicate),
emphasizing the point of view of the user of logic (and mathematics) – hence
with an attendant emphasis on calculating (i.e., writing and annotating formal)
proofs. Pioneering works in this domain are the undergraduate text (1994) and
the paper (1995) of Gries and Schneider.
You are urged to master the technique of writing formal proofs by studying
how we go about it throughout this volume, especially in Chapter III.‡ You will
find that writing and annotating formal proofs is a discipline very much like
computer programming, so it cannot be that hard. Computer programming is
taught in the first year, isn’t it?§

† True, a formalist applies classical logic, while an intuitionist applies a different logic where, for
example, double negation is not removable. Yet, unlike a Platonist, a formalist does not believe –
or he does not have to disclose to his intuitionist friends that he might do – that infinite sets exist
in the metatheory, as his tools are just finite symbol sequences. To appreciate the tension here,
consider this anecdote: It is said that when Kronecker – the father of intuitionism – was informed
of Lindemann’s proof (1882) that π is transcendental, while he granted that this was an interesting
result, he also dismissed it, suggesting that π – whose decimal expansion is, of course, infinite
but not periodic – “does not exist” (see Wilder (1963, p. 193)). We do not propound the tenets of
intuitionism here, but it is fair to state that infinite sets are possible in intuitionistic mathematics
as this has later evolved in the hands of Brouwer and his Amsterdam school. However, such
sets must be (like all sets of intuitionistic mathematics) finitely generated – just like our formal
languages and the set of theorems (the latter provided that our axioms are too) – in a sense that
may be familiar to some readers who have had a course in automata and language theory. See
Wilder (1963, p. 234).
‡ Many additional paradigms of formal proofs, in the context of arithmetic, are found in Chapter II
of volume 1 of these Lectures.
§ One must not gather the impression that formal proofs are just obscure sequences of symbol
sequences akin to Morse code. Just as one does in computer programming, one also uses comments
in formal proofs – that is, annotations (in English, Greek, or your favourite natural language)
that aim to explain or justify for the benefit of the reader the various proof steps. At some point,
when familiarity allows and the length of (formal) proofs becomes prohibitive, we agree to relax
the proof style. Read on!

It is also fair to admit, in defense of “semantic reasoning”, that meaning is
an important tool for formulating conjectures, for analyzing a given proof in
order to figure out what makes it tick, or indeed for discovering the proof, in
rough outline, in the first place. For these very reasons we supplement many of
our formal arguments in this volume with discussions that are based on intuitive
semantics, and with several examples taken from informal mathematics.
We forewarn the reader of the inevitability with which the informal language
of sets already intrudes in this chapter (as it indeed does in all mathematics).
More importantly, some of the elementary results of Cantorian naı̈ve set theory
are needed here. Conversely, formal set theory needs the tools and some of the
results developed here. This apparent “chicken or egg” phenomenon is often
called “bootstrapping”,† not to be confused with “circularity” – which it is not:
Only informal set theory notation and results are needed here in order to found
formal set theory.

This is a good place to summarize our grand plan:


First (in this chapter), we will formalize the rules of reasoning in general – as
these apply to all mathematics – and develop their properties. We will skip the
detailed study of the interaction between formalized rules and their intended
meaning (semantics), as well as the study of the limitations of these formalized
rules. Nevertheless, we will state without proof the relevant important results
that come into play here, the completeness and incompleteness theorems (both
due to Kurt Gödel).
Secondly (starting with the next chapter), once we have learnt about these
tools of formalized reasoning – what they are and how to use them – we will
next become users of formal logic so that we can discover important theorems
of (or, as we say, develop) set theory. Of course, we will not forget to run a few
diagnostics. For example, Chapter VIII is entirely on metamathematical issues.
Formal theories, and their artificial languages, are defined (built) and “tested”
within informal mathematics (the latter also called “real” mathematics by
Platonists). The first theory that we build here is general-purpose, or “pure”,
formal logic. We can then build mathematical formal theories (e.g., set theory)
by just adding “impurities”, namely, the appropriate special symbols and ap-
propriate special assumptions (written in the artificial formal language).
We describe precisely how we construct these languages and theories using
the usual abundance of mathematical notation, notions, and techniques available

† The term “bootstrapping” is suggestive of a person pulling himself up by his bootstraps. Reput-
edly, this technique, which is pervasive, among others, in the computer programming field – as
alluded to in the term “booting” – was invented by Baron Münchhausen.

to us, augmented by the descriptive power of natural language (e.g., English,
or Greek, or French, or German, or Russian), as particular circumstances or
geography might dictate. This milieu within which we build, pursue, and study
our theories – besides “real mathematics” – is also often called the metatheory,
or more generally, metamathematics. The language we speak while at it, this
mélange of mathematics and natural language, is the metalanguage.

I.1. First Order Languages


In the most abstract and thus simplest manner of describing it, a formalized
mathematical theory (also, formalized logical theory) consists of the following
sets of things: a set of basic or primitive symbols, V , used to build symbol
sequences (also called strings, or expressions, or words, over V ); a set of
strings, Wff, over V , called the formulas of the theory; and finally, a subset of
Wff, Thm, the set of theorems of the theory.†
Well, this is the extension of a theory, that is, the explicit set of objects in it.
How is a theory given?
In most cases of interest to the mathematician it is given by specifying V and
two sets of simple rules, namely, formula-building rules and theorem-building
rules. Rules from the first set allow us to build, or generate, Wff from V .
The rules of the second set generate Thm from Wff. In short (e.g., Bourbaki
(1966b)), a theory consists of an alphabet of primitive symbols and rules used
to generate the “language of the theory” (meaning, essentially, Wff) from these
symbols, and some additional rules used to generate the theorems. We expand
on this below.

I.1.1 Remark. What is a rule? We run the danger of becoming circular or too
pedantic if we overdefine this notion. Intuitively, the rules we have in mind
are string manipulation rules – that is, “black boxes” (or functions) that re-
ceive string inputs and respond with string outputs. For example, a well-known
theorem-building rule receives as input a formula and a variable, and it returns
(essentially) the string composed of the symbol ∀, immediately followed by the
variable and, in turn, immediately followed by the formula.‡ 
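To make the “black box” picture concrete, here is a minimal sketch in Python (my own illustration, not part of the book’s formal machinery): if we naively represent formulas and variables as character strings, the rule just described becomes an ordinary function from strings to strings.

    def generalization(formula: str, variable: str) -> str:
        """Toy string-manipulation rule: output the symbol '∀',
        immediately followed by the variable, immediately followed
        by the formula -- with no regard for what the strings mean."""
        return "∀" + variable + formula

    # e.g. generalization("x = x", "x") returns the string "∀xx = x"

The point of the sketch is only that the rule consumes and produces strings; whether those strings “mean” anything is, for the formalist, a separate question.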

(1) First off, the ( first order) formal language, L, where the theory is “spoken”§
is a triple (V , Term, Wff), that is, it has three important components, each
of them a set. V is the alphabet (or vocabulary) of the language. It is the

† For a less abstract, but more detailed view of theories see p. 39.
‡ This rule is usually called “generalization”.
§ We will soon say what makes a language “first order”.

collection of the basic syntactic “bricks” (symbols) that we use to form
symbol sequences (or expressions) that are terms (members of Term) or
formulas (members of Wff). We will ensure that the processes that build
terms or formulas, using the basic building blocks in V , are (intuitively)
algorithmic (“mechanical”). Terms will formally codify objects, while for-
mulas will formally codify statements about objects.
(2) Reasoning in the theory will be the process of discovering “true statements”
about objects – that is, theorems. This discovery journey begins with cer-
tain formulas which codify statements that we take for granted (i.e., accept
without proof as “basic truths”). Such formulas are the axioms. There are
two types of axioms. Special, or nonlogical, axioms are to describe specific
aspects of any theory that we might be building; they are “basic truths”
in a restricted context. For example, “x + 1 ≠ 0” is a special axiom that
contributes towards the characterization of number theory over N. This is a
“basic truth” in the context of N but is certainly not true of the integers or the
rationals – which is good, because we do not want to confuse N with the in-
tegers or the rationals. The other kind of axiom will be found in all theories.
It is the kind that is “universally valid”, that is, not a theory-specific truth
but one that holds in all branches of mathematics (for example, “x = x” is
such a universal truth). This is why this type of axiom will be called logical.
(3) Finally, we will need rules for reasoning, actually called rules of inference.
These are rules that allow us to deduce, or derive, a true statement from other
statements that we have already established as being true.† These rules will
be chosen to be oblivious to meaning, being only conscious of form. They
will apply to statement configurations of certain recognizable forms and
will produce (derive) new statements of some corresponding recognizable
forms (see Remark I.1.1).

I.1.2 Remark. We may think of axioms (of either logical or nonlogical type) as
being special cases of rules, that is, rules that receive no input in order to produce
an output. In this manner item (2) above is subsumed by item (3), thus we are
faithful to our abstract definition of theory (where axioms were not mentioned).
An example, outside mathematics, of an inputless rule is the rule invoked
when you type date on your computer keyboard. This rule receives no input,
and outputs the current date on your screen. 
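Continuing the same toy Python setting (again only my own illustration of the remark, not the book’s apparatus), an axiom is then simply a rule that takes no input at all and returns a fixed string:

    def axiom_identity() -> str:
        """An inputless rule: no premises, one fixed output string."""
        return "x = x"

    print(axiom_identity())   # prints: x = x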

We next look carefully into (first order) formal languages.

† The generous use of the term “true” here is only meant to motivate. “Provable” or “deducible”
formula, or “theorem”, will be the technically precise terminology that we will soon define to
replace the term “true statement”.

There are two parts in each first order alphabet. The first, the collection of
the logical symbols, is common to all first order languages (regardless of which
theory is spoken in them). We describe this part immediately below.

Logical Symbols.
LS.1. Object or individual variables. An object variable is any one symbol out
of the unending sequence v0 , v1 , v2 , . . . . In practice – whether we are
using logic as a tool or as an object of study – we agree to be sloppy with
notation and use, generically, x, y, z, u, v, w with or without subscripts
or primes as names of object variables.† This is just a matter of nota-
tional convenience. We allow ourselves to write, say, z instead of, say,
v120000000000560000009 . Object variables (intuitively) “vary over” (i.e., are
allowed to take values that are) the objects that the theory studies (e.g.,
numbers, sets, atoms, lines, points, etc., as the case may be).
LS.2. The Boolean or propositional connectives. These are the symbols “¬”
and “∨”.‡ These are pronounced not and or respectively.
LS.3. The existential quantifier, that is, the symbol “∃”, pronounced exists or
for some.
LS.4. Brackets, that is, “(” and “)”.
LS.5. The equality predicate. This is the symbol “=”, which we use to indicate
that objects are “equal”. It is pronounced equals.

The logical symbols will have a fixed interpretation. In particular, “=” will
always be expected to mean equals.

The theory-specific part of the alphabet is not fixed, but varies from theory
to theory. For example, in set theory we just add the nonlogical (or special)
symbols, ∈ and U . The first is a special predicate symbol (or just predicate) of
arity 2; the second is a predicate symbol of arity 1.§
In number theory we adopt instead the special symbols S (intended meaning:
successor, or “ + 1”, function), +, ×, 0, <, and (sometimes) a symbol for the

† Conventions such as this one are essentially agreements – effected in the metatheory – on how to
be sloppy and get away with it. They are offered in the interest of user-friendliness and readability.
There are also theory-specific conventions, which may allow additional names in our informal
(metamathematical) notation. Such examples, in set theory, occur in the following chapters.
‡ The quotes are not part of the symbol. They serve to indicate clearly, e.g., in the case of “∨” here,
what is part of the symbol and what is not (the following period is not).
§ “arity” is derived from “ary” of “unary”, “binary”, etc. It denotes the number of arguments
needed by a symbol according to the dictates of correct syntax. Function and predicate symbols
need arguments.

exponentiation operation (function) a^b. The first three are function symbols of
arities 1, 2, and 2 respectively. 0 is a constant symbol, < is a predicate of arity 2,
and whatever symbol we might introduce to denote a^b would have arity 2.
The following list gives the general picture.

Nonlogical Symbols.
NLS.1. A (possibly empty) set of symbols for constants. We normally use
the metasymbols† a, b, c, d, e, with or without primes or subscripts, to
stand for constants unless we have in mind some alternative “standard”
formal notation in specific theories (e.g., ∅, 0, ω).
NLS.2. A (possibly empty) set of symbols for predicate symbols or relation
symbols for each possible arity n > 0. We normally use P, Q, R,
generically, with or without primes or subscripts, to stand for predicate
symbols. Note that = is in the logical camp. Also note that theory-
specific formal symbols are possible for predicates, e.g., <, ∈, U .
NLS.3. Finally, a (possibly empty) set of symbols for functions for each possi-
ble arity n > 0. We normally use f, g, h, generically, with or without
primes or subscripts, to stand for function symbols. Note that theory-
specific formal symbols are possible for functions, e.g., +, ×.
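For concreteness only, one could record such a first order vocabulary as plain data. The Python dictionary layout below is my own bookkeeping choice, not anything prescribed by the text, but it captures exactly the information in NLS.1–NLS.3: which symbols are constants, which are functions or predicates, and with what arities.

    # The (primitive) nonlogical vocabulary of number theory, as sketched above.
    number_theory = {
        "constants":  {"0"},
        "functions":  {"S": 1, "+": 2, "×": 2},   # arities; exponentiation, if added, would have arity 2
        "predicates": {"<": 2},
    }

    # The (primitive) nonlogical vocabulary of set theory.
    set_theory = {
        "constants":  set(),
        "functions":  {},
        "predicates": {"∈": 2, "U": 1},
    }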

I.1.3 Remark. (1) We have the option of assuming that each of the logical
symbols that we named in LS.1–LS.5 have no further structure and that the
symbols are, ontologically, identical to their names, that is, they are just these
exact signs drawn on paper (or on any equivalent display medium).
In this case, changing the symbols, say, ¬ and ∃ to ∼ and E respectively
results in a “different” logic, but one that is, trivially, isomorphic to the one we
are describing: Anything that we may do in, or say about, one logic trivially
translates to an equivalent activity in, or utterance about, the other as long as
we systematically carry out the translations of all occurrences of ¬ and ∃ to ∼
and E respectively (or vice versa).
An alternative point of view is that the symbol names are not the same as
(identical with) the symbols they are naming. Thus, for example, “¬” names
the connective we pronounce not, but we do not know (or care) exactly what
the nature of this connective is (we only care about how it behaves). Thus, the
name “¬” becomes just a typographical expedient and may be replaced by other
names that name the same object, not.
This point of view gives one flexibility in, for example, deciding how the
variable symbols are “implemented”. It often is convenient to suppose that the
entire sequence of variable symbols was built from just two symbols, say, “v”
and “|”.† One way to do this is by saying that vi is a name for the symbol
sequence

    v | . . . |        (with i occurrences of “|”).

Or, preferably – see (2) below – vi might be a name for the symbol sequence

    v | . . . | v      (with i occurrences of “|”).

Regardless of option, vi and vj will name distinct objects if i ≠ j.

† Metasymbols are informal (i.e., outside the formal language) symbols that we use within “real”
mathematics – the metatheory – in order to describe, as we are doing here, the formal language.


This is not the case for the metavariables (abbreviated informal names)
x, y, z, u, v, w. Unless we say explicitly otherwise, x and y may name the
same formal variable, say, v131 .
We will mostly abuse language and deliberately confuse names with the
symbols they name. For example, we will say “let v1007 be an object variable . . . ”
rather than “let v1007 name an object variable . . . ”, thus appearing to favour
option one.
(2) Any two symbols included in the alphabet are distinct. Moreover, if any
of them are built from simpler sub-symbols – e.g., v0 , v1 , v2 , . . . might really
name the strings vv, v|v, v||v, . . . – then none of them is a substring (or subex-
pression) of any other.‡
(3) A formal language, just like a natural language (such as English or Greek),
is alive and evolving. The particular type of evolution we have in mind is the
one effected by formal definitions. Such definitions continually add nonlogical
symbols to the language.§
Thus, when we say that, e.g., “∈ and U are the only nonlogical symbols of
set theory”, we are telling a small white lie. More accurately, we ought to have
said that “∈ and U are the only ‘primitive’ (or primeval) nonlogical symbols of
set theory”, for we will add loads of other symbols such as ∪, ω, ∅, ⊂, and ⊆.
This evolution affects the (formal) language of any theory, not just that of
set theory. 

† We intend these two symbols to be identical to their names. No philosophical or other purpose
will be served by allowing more indirection here (such as “v names u, which actually names w,
which actually is . . . ”).
‡ What we have stated under (2) are requirements, not metatheorems. That is, they are nothing of
the sort that we can prove about our formal language within everyday mathematics.
§ This phenomenon will be visited upon in some detail in what follows. By the way, any additions
are made to the nonlogical side of the alphabet, since all the logical symbols have been given,
once and for all.
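As a small aside on points (1) and (2) of Remark I.1.3 – the following sketch is mine, not the book’s – one can mimic the second naming scheme for variables in Python and check mechanically that distinct indices yield distinct strings, none of which occurs as a substring of another:

    def variable(i: int) -> str:
        """Implement v_i as the string 'v', then i strokes '|', then 'v'."""
        return "v" + "|" * i + "v"

    names = [variable(i) for i in range(5)]        # ['vv', 'v|v', 'v||v', ...]
    assert len(set(names)) == len(names)           # distinct indices give distinct strings
    # no name is a substring of a different name
    assert all(a not in b for a in names for b in names if a != b)

With the first scheme (no trailing “v”) the substring requirement would fail – for instance “v|” is a prefix of “v||” – which is presumably why the second form was called preferable.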

Wait a minute! If formal set theory is to serve as the foundation of all mathe-
matics, and if the present chapter is to assist towards that purpose, then how is
it that we are already employing natural numbers like 12000000560000009 as
subscripts in the names of object variables? How is it permissible to already talk
about “sets of symbols” when we are about to found a theory of sets formally?
Surely we do not have† any of these items yet, do we?
This protestation is offered partly in jest. We have already said that we work
within real mathematics as we build the “replicas” or “simulators” of logic
and set theory. Say we are Platonists. Then the entire body of mathematics –
including infinite sets, in particular the set of natural numbers N – is available
to us as we are building whatever we are building.
We can thus describe how we assemble the simulator and its various parts
using our knowledge of real mathematics, the language of real mathematics,
and all “building blocks” available to us, including sets, infinite or otherwise,
and natural numbers. This mathematics “exists” whether or not anyone ever
builds a formal simulator for naı̈ve set theory, or logic for that matter. Thus any
apparent circularity disappears.
Now if we are not Platonists, then our mathematical “reality” is more re-
stricted, but, nevertheless, building a simulator or not in this reality does not
affect the existence of the reality. We will, however, this time, revise our tools.
For example, if we prefer to think that individual natural numbers exist (up
to any size), but not so their collection N, then it is still possible to build our
formal languages (in particular, as many object variables as we want) – pretty
much as already described – in this restricted metatheory. We may have to
be careful not to say that we have an unending sequence of such variables, as
this would presume the existence of infinite sets in the metatheory.‡ We can
say instead that a variable is any object of the form vi where i is a (meaning-
less) word of (meaningless) symbols, the latter chosen out of the set or list
“0, 1, 2, 3, 4, 5, 6, 7, 8, 9”.
Clearly the above approach works even within a metatheory that has failed
to acknowledge the existence of any natural numbers.§

In this volume we will take the normal user-friendly position that is habi-
tual nowadays, namely, that our metatheory is the Platonist’s (infinitary)
mathematics.

† “Do not have” in the sense of having not formally defined – or proved to exist – or both.
‡ A finitist would have none of it, although a post-Brouwer intuitionist would be content that such
a sequence is finitely describable.
§ Hilbert, in his finitistic metatheory, built whatever natural numbers he needed by repeating the
stroke symbol “|”.

I.1.4 Definition (Terminology about Strings). A symbol sequence, or expres-
sion (or string), that is formed by using symbols exclusively out of a given set†
M is called a string over the set, or alphabet, M.
If A and B denote strings (say, over M), then the symbol A ∗ B, or more
simply AB, denotes the symbol sequence obtained by listing first the symbols
of A in the given left to right sequence, immediately followed by the symbols of
B in the given left to right sequence. We say that AB is (more properly, denotes
or names) the concatenation of the strings A and B in that order.
We denote the fact that the strings (named) C and D are identical sequences
(but we just say that they are equal) by writing C ≡ D. The symbol ≢ denotes
the negation of the string equality symbol ≡. Thus, if # and ? are (we do mean
“are”) symbols from an alphabet, then #?? ≡ #?? but #? ≢ #??. We can also
employ ≡ in contexts such as “let A ≡ ##?”, where we give the name A to the
string ##?.‡

In this book the symbol ≡ will be used exclusively in the metatheory as equality
of strings over some set M.

The symbol λ normally denotes the empty string, and we postulate for it the
following behaviour:

A ≡ Aλ ≡ λA for all strings A.

We say that A occurs in B, or is a substring of B, iff§ there are strings C and
D such that B ≡ C AD. For example, “(” occurs four times in the (explicit)
string “¬(()∨)((”, at positions 2, 3, 7, 8. Each time this happens we have an
occurrence of “(” in “¬(()∨)((”.
If C ≡ λ, we say that A is a prefix of B. If moreover D ≢ λ, then we say
that A is a proper prefix of B. 
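For readers who like to see these conventions operationally, here is a rough Python rendering of the notions just defined – concatenation, the empty string λ, occurrence (substring), prefix, and proper prefix. It is only an informal aid of mine; the definition above remains the official one.

    LAMBDA = ""                          # the empty string λ

    def concat(a: str, b: str) -> str:   # A * B, usually written AB
        return a + b

    def occurs_in(a: str, b: str) -> bool:
        # A occurs in B iff B ≡ CAD for some strings C and D
        return a in b

    def is_prefix(a: str, b: str) -> bool:
        return b.startswith(a)           # the case C ≡ λ

    def is_proper_prefix(a: str, b: str) -> bool:
        return b.startswith(a) and a != b    # moreover D ≢ λ

    assert concat("#?", "?") == "#??"
    assert occurs_in("(", "¬(()∨)((")    # four occurrences, at positions 2, 3, 7, 8
    assert concat(LAMBDA, "##?") == "##?" == concat("##?", LAMBDA)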

I.1.5 Definition (Terms). The set of terms, Term, is the smallest set of strings
over the alphabet V with the following two properties:
(1) Any of the items in LS.1 or NLS.1 (x, y, z, a, b, c, etc.) are included.

† A set that supplies symbols to be used in building strings is not special. It is just a set. However,
it often has a special name: “alphabet”.
‡ Punctuation, such as “.”, is not part of the string. One often avoids such footnotes by quoting
strings that are explicitly written as symbol sequences. For example, if A stands for the string
#, one writes A ≡ “#”. Note that we must not write “A”, unless we mean a string whose only
symbol is A.
§ If and only if.

(2) If f is a function† of arity n and t1, t2, . . . , tn are included, then so is the
string “ f t1 t2 . . . tn ”.
The symbols t, s, and u, with or without subscripts or primes, will denote
arbitrary terms. As they are used to describe the syntax of terms, we often call
such symbols syntactic variables – which is synonymous with metavariables.


I.1.6 Remark. (1) We often abuse notation and write f (t1 , . . . , tn ) instead of
f t1 . . . tn .
(2) Definition I.1.5 is an inductive definition.‡ It defines a more or less com-
plicated term by assuming that we already know what simpler terms look like.
This is a standard technique employed in real mathematics (within which we are
defining the formal language). We will have the opportunity to say more about
such inductive definitions – and their appropriateness – in a comment
later on.

(3) We relate this particular manner of defining terms to our working def-
inition of a theory (given on p. 7 immediately before Remark I.1.1 in terms
of “rules” of formation). Item (2) in I.1.5 essentially says that we build new
terms (from old ones) by applying the following general rule: Pick an arbitrary
function symbol, say f . This has a specific formation rule associated with it.
Namely, “for the appropriate number, n, of an already existing ordered list of
terms, t1 , . . . , tn , build the new term consisting of f , immediately followed by
the ordered list of the given terms”.
For example, suppose we are working in the language of number theory.
There is a function symbol + available there. The rule associated with + builds
the new term +ts for any prior obtained terms t and s. Thus, +v1 v13 and
+v121 + v1 v13 are well-formed terms. We normally write terms of number the-
ory in infix notation,§ i.e., t +s, v1 +v13 and v121 +(v1 +v13 ) (note the intrusion of
brackets, to indicate sequencing in the application of +).
A by-product of what we have just described is that the arity of a function
symbol f is whatever number of terms the associated rule will require as
input.

† We will omit from now on the qualification “symbol” from terminology such as “function sym-
bol”, “constant symbol”, “predicate symbol”.
‡ Some mathematicians are adamant that we call this a recursive definition and reserve the term
“induction” for “induction proofs”. This is seen to be unwarranted hairsplitting if we consider
that Bourbaki (1966b) calls induction proofs “démonstrations par récurrence”. We will be less
dogmatic: Either name is all right.
§ Function symbol placed between the arguments.
(4) A crucial word used in I.1.5 (which recurs in all inductive definitions) is
“smallest”. It means “least inclusive” (set). For example, we may easily think of
a set of strings that satisfies both conditions of the above definition, but which
is not “smallest” by virtue of having additional elements, such as the string
“¬¬(”.

Pause. Why is “¬¬(” not in the smallest set as defined above, and therefore
not a term?

The reader may wish to ponder further on the import of the qualification "smallest" by considering the familiar (similar) example of N. The principle of induction in N ensures that this set is the smallest with the properties
(i) 0 is included, and
(ii) if n is included, then so is n + 1.
By contrast, all of Z (set of integers), Q (set of rational numbers), and R (set of real numbers) satisfy (i) and (ii), but they are clearly not the "smallest" such.


I.1.7 Definition (Atomic Formulas). The set of atomic formulas, Af, contains
precisely:
(1) The strings t = s for every possible choice of terms t, s.
(2) The strings Pt1 t2 . . . tn for every possible choice of n-ary predicate P (for all choices of n > 0) and all possible choices of terms t1, t2, . . . , tn. □

We often abuse notation and write P(t1 , . . . , tn ) instead of Pt1 . . . tn .

I.1.8 Definition (Well-Formed Formulas). The set of well-formed formulas, Wff, is the smallest set of strings or expressions over the alphabet V with the following properties:
(a) All the members of Af are included.
(b) If A and B denote strings (over V) that are included, then (A ∨ B) and (¬A) are also included.
(c) If A is† a string that is included and x is any object variable (which may or may not occur (as a substring) in the string A), then the string ((∃x)A) is also included. We say that A is the scope of (∃x). □

† Denotes.
I.1.9 Remark.
(1) The above is yet another inductive definition. Its statement (in the met-
alanguage) is facilitated by the use of syntactic, or meta-, variables – A and
B – used as names for arbitrary (indeterminate) formulas. We first encountered
the use of syntactic variables in Definition I.1.5.
In general, we will let calligraphic capital letters A, B , C , D, E, F , G
(with or without primes or subscripts) be syntactic variables (i.e., metalinguistic
names) denoting well-formed formulas, or just formulas, as we often say. The
definition of Wff given above is standard. In particular, it permits well-formed
formulas such as ((∃x)((∃x)x = 0)) in the interest of making the formation
rules context-free.†
(2) The rules of syntax just given do not allow us to write things such as ∃ f
or ∃P where f and P are function and predicate symbols respectively. That
quantification is deliberately restricted to act solely on object variables makes
the language first order.
(3) We have already indicated in Remark I.1.6 where the arities of function
and predicate symbols come from (Definitions I.1.5 and I.1.7 referred to them).
These are numbers that are implicit (“hardwired”) within the formation rules
for terms and atomic formulas. Each function, and each predicate symbol – e.g.,
+, ×, ∈, < – has its own unique associated formation rule. This rule “knows”
how many terms are needed (on the “input side”) in order to form a term or
atomic formula.
There is an alternative way of making arities of symbols known (in the
metatheory): Rather than embedding arities in the formation rules, we can hide
them inside the ontology of the symbols, not making them explicit in the name.
For example, a new symbol, say ∗, can be used to record arity. That is, we can
think of a predicate (or function) symbol as consisting of two parts: an arity
part and an “all the rest” part, the latter needed to render the symbol unique.
For example, ∈ may be actually the short name for the symbol “∈∗∗ ”, where
this latter name is identical to the symbol it denotes, or “what you see is what
you get” – see Remark I.1.3(1) and (2), p. 10. The presence of the two asterisks
declares the arity. Some people say this differently: They make available to the
metatheory a “function”, ar , from “the set of all predicate and function symbols”
(of a given language) to the natural numbers, so that for any function symbol f
or predicate symbol P, ar ( f ) and ar (P) yield the arity of f or P respectively.‡

† In some presentations, formation rule I.1.8(c) is context-sensitive: It requires that x be not already
quantified in A.
‡ In mathematics we understand a function as a set of input-output pairs. One can “glue” the two
parts of such pairs together, as in “∈∗∗ ” – where “∈” is the input part and “∗∗” is the output part,
the latter denoting “2” – etc. Thus, the two approaches are equivalent.
(4) As a consequence of the remarks in (3) the theory can go about its job of
generating, say, terms using the formation rules, at the same time being unable
to see or discuss these arities, since these are hidden inside the rules (or inside
the function or predicate names in the alternative approach). So it is not in the
theory’s “competence” to say, e.g., “hmm, this function has arity 10011132”.
Indeed, a theory cannot even say “hmm, so this is a function (or a term, or a
wff )”. A theory just generates strings. It does not test them for membership in
syntactic categories, such as variable, function, term, or wff. A human user of
the theory, on the other hand, can, of course, make such observations. Indeed,
in theories such as set theory and arithmetic, the human user can even write a
computer program that correctly makes such observations. But both of these
agents, human and computer, act in the metatheory.
(5) Abbreviations:
Abr1. The string ((∀x)A) abbreviates the string (¬((∃x)(¬A))). Thus, for
any explicitly written formula A, the former notation is informal (meta-
mathematical), while the latter is formal (within the formal language). In
particular, ∀ is a metalinguistic symbol. “∀x” is the universal quantifier.
A is its scope. The symbol ∀ is pronounced for all.

We also introduce – in the metalanguage – a number of additional Boolean connectives in order to abbreviate certain strings:
Abr2. Conjunction, ∧. (A ∧ B) stands for (¬((¬A) ∨ (¬B))). The symbol ∧ is pronounced and.
Abr3. Classical or material implication, →. (A → B ) stands for ((¬A) ∨
B ). (A → B ) is pronounced if A, then B .
Abr4. Equivalence, ↔. (A ↔ B ) stands for ((A → B ) ∧ (B → A)).
Abr5. To minimize the use of brackets in the metanotation, we adopt standard
priorities of connectives, that is, ∀, ∃, and ¬ have the highest; then
we have (in decreasing order of priority) ∧, ∨, →, ↔; and we agree
not to use outermost brackets. All associativities are right – that is,
if we write A → B → C , then this is a (sloppy) counterpart for
(A → (B → C )).
(6) The language just defined, L, is one-sorted, that is, it has a single sort or
type of object variable. Is this not inconvenient? After all, our set theory will
have both atoms and sets. In other theories, e.g., geometry, one has points, lines,
and planes. One would have hoped to have different types of variables, one for
each.
Actually, to do this would amount to a totally unnecessary complication of
syntax. We can (and will) get away with just one sort of object variable. For
example, in set theory we will also introduce a 1-ary† predicate, U , whose job is
to test an object for “sethood”‡ (vs. atom status). Similar remedies are available
to other theories. For example, geometry will manage with one sort of variable,
and unary predicates “Point”, “Line”, and “Plane”.
Apropos language, some authors emphasize the importance of the nonlogical
symbols, taking at the same time the formation rules for granted; thus they say
that we have a language, say, “L = {∈, U }” rather than “L = (V , Term, Wff )
where V has ∈ and U as its only nonlogical symbols”. That is, they use
“language” for the nonlogical part of the alphabet. 

This comment requires some familiarity with elementary concepts – such as BNF notation for grammar specification – encountered in a course on formal languages and automata, or, alternatively, in language manuals for Algol-like programming languages (such as Algol itself, Pascal, etc.); hence the warning sign.

We have said above “This rule ‘knows’ how many terms are needed (on
the ‘input side’) in order to form a term or atomic formula.” We often like to
personify rules, theories, and the like, to make the exposition more relaxed.
This runs the danger of being misunderstood on occasion. Here is how a rule
“knows”.
Syntactic definitions in the part of theoretical computer science known as
formal language theory are given by a neat notation called BNF:§ To fix ideas,
let us say that we are describing the terms of a specific first order language that
contains just one constant symbol, “c”, and just two function symbols, “ f ” and
“g”, where we intend the former to be ternary (arity 3) and the latter of arity 5.
Moreover, assume that the variables v0 , v1 , . . . are short names for vv, v|v, . . .
respectively.
Then, using the syntactic names ⟨term⟩, ⟨var⟩, ⟨strokes⟩ to stand for any term, any variable, any string of strokes, we can recursively define these syntactic categories as follows, where we read "→" as "is defined as" (the right hand side), and the big stroke, "|" – pronounced "or" – gives alternatives in the definition (of the left hand side):

    (1) ⟨strokes⟩ → λ | ⟨strokes⟩|
    (2) ⟨var⟩ → v⟨strokes⟩v
    (3) ⟨term⟩ → c | ⟨var⟩ | f⟨term⟩⟨term⟩⟨term⟩ | g⟨term⟩⟨term⟩⟨term⟩⟨term⟩⟨term⟩

For example, rule (1) says that a string of strokes is (defined as) either the empty string λ, or a string of strokes followed by a single stroke.
Rule (3) shows clearly how the "knowledge" of the arities of f and g is "hardwired" within the rule. For example, the third alternative of that rule says that a term is a string composed of the symbol "f" followed immediately by three strings, each of which is a term.

† More usually called unary.
‡ People writing about, or teaching, set theory have made this word up. Of course, one means by it the property of being a set.
§ Backus-Naur form. Rules (1)–(3) are in BNF. In particular the alternative symbol "|" is part of BNF notation, and so is the ⟨. . .⟩ notation for the names of syntactic categories. The "→" has many typographical variants, including "::=".
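By way of illustration, a minimal recognizer for this toy grammar can be sketched in Python (the function names and the demonstration string are only for illustration; the grammar itself is rules (1)–(3) above):

    def parse_term(s, i=0):
        """Return the index just past one <term> starting at position i, or raise ValueError."""
        if i >= len(s):
            raise ValueError("unexpected end of string")
        ch = s[i]
        if ch == 'c':                      # <term> -> c
            return i + 1
        if ch == 'v':                      # <term> -> <var>, i.e. v<strokes>v
            j = i + 1
            while j < len(s) and s[j] == '|':
                j += 1                     # <strokes> -> lambda, or <strokes> followed by a stroke
            if j < len(s) and s[j] == 'v':
                return j + 1
            raise ValueError("ill-formed variable at %d" % i)
        if ch in ('f', 'g'):               # the arity is hardwired, just as in rule (3)
            arity = 3 if ch == 'f' else 5
            j = i + 1
            for _ in range(arity):
                j = parse_term(s, j)
            return j
        raise ValueError("unexpected symbol %r at %d" % (ch, i))

    def is_term(s):
        try:
            return parse_term(s) == len(s)
        except ValueError:
            return False

    # e.g. is_term("fcvvv|v") is True: f applied to c, v0 (= vv) and v1 (= v|v).

Note how the arities of f and g live inside the code exactly as they live inside rule (3): nowhere is an arity written down as data that the grammar itself could "see".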

A variable that is quantified is bound in the scope of the quantifier. Non-quantified variables are free. We also give below, by induction on formulas, precise (metamathematical) definitions of "free" and "bound".

I.1.10 Definition (Free and Bound Variables). An object variable x occurs free in a term t or atomic formula A iff it occurs in t or A as a substring (see I.1.4).
x occurs free in (¬A) iff it occurs free in A.
x occurs free in (A ∨ B) iff it occurs free in at least one of A and B.
x occurs free in ((∃y)A) iff x occurs free in A and y is not the same variable as x.†
The y in ((∃y)A) is, of course, not free – even if it might be so in A – as we have just concluded in this inductive definition. We say that it is bound in ((∃y)A). Trivially, terms and atomic formulas have no bound variables. □
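For readers who find an algorithmic restatement helpful, the clauses of I.1.10 translate directly into a short Python sketch; here formulas are represented as nested tuples (an abstract syntax chosen only for this illustration) rather than as the strings of the formal definition:

    def free_vars(phi):
        tag = phi[0]
        if tag == 'atomic':
            return set(phi[1])                      # every variable occurrence in an atomic formula is free
        if tag == 'not':
            return free_vars(phi[1])
        if tag == 'or':
            return free_vars(phi[1]) | free_vars(phi[2])
        if tag == 'exists':                         # quantification binds the displayed variable
            return free_vars(phi[2]) - {phi[1]}
        raise ValueError("unknown constructor %r" % tag)

    # ((exists x)((exists x) x = 0)) v x = y has x free only through the right disjunct:
    A = ('or', ('exists', 'x', ('exists', 'x', ('atomic', ['x']))),
               ('atomic', ['x', 'y']))
    assert free_vars(A) == {'x', 'y'}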

I.1.11 Remark. (1) Of course, Definition I.1.10 takes care of the defined con-
nectives as well, via the obvious translation procedure.
(2) Notation. If A is a formula, then we often write A [y1 , . . . , yk ] to
indicate interest in the variables y1 , . . . , yk , which may or may not be free in

† Recall that x and y are abbreviations of names such as v1200098 and v11009 (which name distinct
variables). However, it could be that both x and y name v101 . Therefore it is not redundant to say
“and y is not the same variable as x”. By the way, x ≡ y says the same thing, by I.1.4.
A. There may be other free variables in A that we may have chosen not to in-
clude in the list. On the other hand, if we use round brackets, as in A(y1 , . . . , yk ),
then we are implicitly asserting that y1 , . . . , yk is the complete list of free
variables that occur in A. 

I.1.12 Definition. A term or formula is closed iff no free variables occur in it.
A closed formula is called a sentence.
A formula is open iff it contains no quantifiers (thus, an open formula may
also be closed). 

I.2. A Digression into the Metatheory: Informal Induction and Recursion
We have already seen a number of inductive or recursive definitions in Sec-
tion I.1. The reader, most probably, has already seen or used such definitions
elsewhere.
We will organize the common important features of inductive definitions in
this section for easy reference. We will revisit these issues, within the framework
of formal set theory, in due course, but right now we need to ensure that our grasp
of these notions and techniques, at the metamathematical level, is sufficient for
our needs.
One builds a set S by recursion, or inductively (or by induction), out of two
ingredients: a set of initial objects, I , and a set of rules or operations, R. A
member of R – a rule – is a (possibly infinite) table, or relation, like

    y1    . . .    yn       z
    a1    . . .    an       an+1
    b1    . . .    bn       bn+1
    ⋮              ⋮        ⋮

If the above rule (table) is called Q, then we use the notations Q(a1, . . . , an, an+1) and† ⟨a1, . . . , an, an+1⟩ ∈ Q interchangeably to indicate that the ordered sequence or "row" a1, . . . , an, an+1 is present in the table. We say "Q(a1, . . . , an, an+1) holds" or "Q(a1, . . . , an, an+1) is true", but we often also say that "Q applied to a1, . . . , an yields an+1", or that "an+1 is a result or output of Q, when the latter receives input a1, . . . , an". We often abbreviate such inputs
† “x ∈ A” means that “x is a member of – or is in – A” in the informal set-theoretic sense.


using vector notation, namely, a⃗n (or just a⃗, if n is understood). Thus, we often write Q(a⃗n, an+1) for Q(a1, . . . , an, an+1).
A rule Q that has n + 1 columns is called (n + 1)-ary.

I.2.1 Definition. We say "a set T is closed under an (n + 1)-ary rule Q" to mean that whenever c1, . . . , cn are all in T, then d ∈ T for all d satisfying Q(c1, . . . , cn, d). □

With these preliminary understandings out of the way, we now state

I.2.2 Definition. S is defined by recursion, or by induction, from initial objects I and set of rules R, provided it is the smallest (least inclusive) set with the properties
(1) I ⊆ S,†
(2) S is closed under every Q in R. In this case we say that S is R-closed.
We write S = Cl(I, R), and say that "S is the closure of I under R". □

We have at once:

I.2.3 Metatheorem (Induction on S). If S = Cl(I, R) and if some set T satisfies
(1) I ⊆ T, and
(2) T is closed under every Q in R,
then S ⊆ T.

Pause. Why is the above a metatheorem?

The above principle of induction on S is often rephrased as follows: To prove that a "property" P(x) holds for all members of Cl(I, R), just prove that
(a) Every member of I has the property, and
(b) The property propagates with every rule in R, i.e., if P(ci) holds (is true) for i = 1, . . . , n, and if Q(c1, . . . , cn, d) holds, then d too has property P(x) – that is, P(d) holds.

† From our knowledge of elementary informal set theory, we recall that A ⊆ B means that every
member of A is also a member of B.
Of course, this rephrased principle is valid, for if we let T be the set of all
objects that have property P(x) – for which set one employs the well-established
symbol {x : P(x)} – then this T satisfies (1) and (2) of the metatheorem.†

I.2.4 Definition (Derivations and Parses). A (I, R)-derivation, or simply derivation – if I and R are understood – is a finite sequence of objects d1, . . . , dn (n ≥ 1) such that each di is
(1) A member of I, or‡
(2) For some (r + 1)-ary Q ∈ R, Q(dj1, . . . , djr, di) holds, and jl < i for l = 1, . . . , r.
We say that di is derivable within i steps.
A derivation of an object a is also called a parse of a. □

Trivially, if d1, . . . , dn is a derivation, then so is d1, . . . , dm for any 1 ≤ m < n. If d is derivable within n steps, it is also derivable in k steps or less for all k > n, since we can lengthen a derivation arbitrarily by adding I-elements to it.
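Definition I.2.4 is itself quite algorithmic: whether a given finite sequence is a derivation can be checked mechanically. The following Python sketch (with rules modelled, purely for illustration, as Boolean-valued functions) makes this explicit:

    from itertools import product

    def is_derivation(seq, initial, rules):
        """seq: the list d1..dn; initial: the set I; rules: list of (arity r, predicate Q) pairs."""
        for i, d in enumerate(seq):
            if d in initial:
                continue                                    # clause (1): a member of I
            earlier = seq[:i]                               # clause (2): inputs must occur to the left
            if not any(Q(*inputs, d)
                       for (r, Q) in rules
                       for inputs in product(earlier, repeat=r)):
                return False
        return True

    # With initial object 0 and the rules y = x + 1, y = x - 1 (cf. I.2.7 below),
    # the sequence 0, -1, -2 is a derivation of -2, while 0, 5 is not a derivation:
    rules = [(1, lambda x, y: y == x + 1), (1, lambda x, y: y == x - 1)]
    assert is_derivation([0, -1, -2], {0}, rules)
    assert not is_derivation([0, 5], {0}, rules)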

I.2.5 Remark. The following metatheorem shows that there is a way to “con-
struct” Cl(I , R) iteratively, i.e., one element at a time by repeated application
of the rules.
This result shows definitively that our inductive definitions of terms (I.1.5)
and well-formed formulas (I.1.8) fully conform with our working definition of
theory, as an alphabet and a set of rules that are used to build formulas and
theorems (p. 7). 

I.2.6 Metatheorem.
Cl(I , R) = {x : x is (I , R)-derivable within some number of steps, n}

Proof. For notational convenience let us write

    T = {x : x is (I, R)-derivable within some number of steps, n}.

As we know from elementary naïve set theory, we need to show here both Cl(I, R) ⊆ T and Cl(I, R) ⊇ T to settle the claim.
⊆: We do induction on Cl(I, R) (using I.2.3). Now I ⊆ T, since every member of I is derivable in n = 1 step (why?).

† We are sailing too close to the wind here. It turns out that not all properties P(x) lead to sets
{x : P(x)}. Our explanation was naı̈ve. However, formal set theory, which is meant to save us
from our naı̈veté, upholds the principle (a)–(b) using just a slightly more complicated explanation.
The reader can see this explanation in Chapter VII.
‡ This “or” is inclusive: (1), or (2), or both.
Also, T is closed under every Q in R. Indeed, let such an (r + 1)-ary Q be chosen. Let

    Q(a1, . . . , ar, b)        (i)

and {a1, . . . , ar} ⊆ T. Thus, each ai has a (I, R)-derivation. Concatenate all these derivations:

    . . . , a1, . . . , a2, . . . , . . . , ar

The above is a derivation (why?). But then, so is

    . . . , a1, . . . , a2, . . . , . . . , ar, b

by (i). Thus, b ∈ T.
⊇: We argue this – that is, if d ∈ T , then d ∈ Cl(I , R) – by induction on
the number of steps, n, in which d is derivable.
For n = 1 we have d ∈ I and we are done, since I ⊆ Cl(I , R).
Let us make the induction hypothesis (I.H.) that for derivations of ≤ n steps
the claim is true.
Let then d be derivable within n + 1 steps. Thus, there is a derivation
a1 , . . . , an , d.
Now, if d ∈ I , we are done as above (is this a “real case”?).
If on the other hand Q(a j1 , . . . , a jr , d), then, for i = 1, . . . , r , we have
a ji ∈ Cl(I , R) by the I.H.; hence d ∈ Cl(I , R), since the closure is closed
under all Q ∈ R. 

I.2.7 Example. One can see now that N = Cl(I , R), where I = {0} and R
contains just the relation y = x + 1 (input x, output y). Similarly, Z, the set
of all integers, is Cl(I , R), where I = {0} and R contains just the relations
y = x + 1 and y = x − 1 (input x, output y).
For the latter, the inclusion Cl(I , R) ⊆ Z is trivial (by I.2.3). For ⊇ we eas-
ily see that any n ∈ Z has a (I , R)-derivation (and then we are done by I.2.6).
For example, if n > 0, then 0, 1, 2, . . . , n is a derivation, while if n < 0, then
0, −1, −2, . . . , n is one. If n = 0, then the one-term sequence 0 is a derivation.
Another interesting closure is obtained by I = {3} and the two relations
z = x + y and z = x − y. This is the set {3k : k ∈ Z} (see Exercise I.1). 

Pause. So, taking the first sentence of I.2.7 one step further, we note that we
have just proved the induction principle for N, for that is exactly what the
“equation” N = Cl(I , R) says (by I.2.3). Do you agree?

There is another way to view the iterative construction of Cl(I , R): The
set is constructed in stages. Below we are using some more notation borrowed
from informal set theory. For any sets A and B we write A ∪ B to indicate the
set union which consists of all the members found in A or B or in both. More
generally, if we have a lot of sets, X 0 , X 1 , X 2 , . . . , that is, one X i for every
integer i ≥ 0 – which we denote by the compact notation (X i )i≥0 – then we
may wish to form a set that includes all the objects found as members all over
the X i , that is (using inclusive, or “logical”, “or”s below), form

{x : x ∈ X 0 or x ∈ X 1 or . . .}

or, more elegantly and precisely,

{x : for some i ≥ 0, x ∈ X i }

The latter is called the union of the sequence (Xi)i≥0 and is often denoted by

    ⋃_{i≥0} X_i

Correspondingly, we write

    ⋃_{i≤n} X_i

if we only want to take a finite union, also indicated clumsily as X_0 ∪ · · · ∪ X_n.

I.2.8 Definition (Stages). In connection with Cl(I, R) we define the sequence of sets (Xi)i≥0 by induction on n, as follows:

    X_0 = I
    X_{n+1} = (⋃_{i≤n} X_i) ∪ {b : for some Q ∈ R and some a⃗n in ⋃_{i≤n} X_i, Q(a⃗n, b)}

That is, to form X_{n+1} we append to ⋃_{i≤n} X_i all the outputs of all the relations in R acting on all possible inputs, the latter taken from ⋃_{i≤n} X_i.
We say that Xi is built at stage i, from initial objects I and rule set R. □

In words, at stage 0 we are given the initial objects (X 0 = I ). At stage 1 we


apply all possible relations to all possible objects that we have so far – they
form the set X 0 – and build the first stage set, X 1 , by appending the outputs to
what we have so far. At stage 2 we apply all possible relations to all possible
objects that we have so far – they form the set X 0 ∪ X 1 – and build the second
stage set, X 2 , by appending the outputs to what we have so far. And so on.
When we work in the metatheory, we take for granted that we can have
simple inductive definitions on natural numbers. The reader is familiar with
several such definitions, e.g.,

    a^0 = 1            (for a ≠ 0 throughout)
    a^{n+1} = a · a^n

We will (meta)prove a general theorem on the feasibility of recursive definitions later on (I.2.13).
The following theorem connects stages and closures.

I.2.9 Metatheorem. With the Xi as in I.2.8,

    Cl(I, R) = ⋃_{i≥0} X_i


Proof. ⊆: We do induction on Cl(I, R). For the basis, I = X_0 ⊆ ⋃_{i≥0} X_i.
We show that ⋃_{i≥0} X_i is R-closed. Let Q ∈ R and Q(a⃗n, b) hold, for some a⃗n in ⋃_{i≥0} X_i. Thus, by definition of union, there are integers j1, j2, . . . , jn such that ai ∈ X_{ji}, i = 1, . . . , n. If k = max{j1, . . . , jn}, then a⃗n is in ⋃_{i≤k} X_i; hence b ∈ X_{k+1} ⊆ ⋃_{i≥0} X_i.
⊇: It suffices to prove that X_n ⊆ Cl(I, R), a fact we can prove by induction on n. For n = 0 it holds by I.2.2. As an I.H. we assume the claim for all n ≤ k.
The case for k + 1: X_{k+1} is the union of two sets. One is ⋃_{i≤k} X_i. This is a subset of Cl(I, R) by the I.H. The other is

    {b : for some Q ∈ R and some a⃗ in ⋃_{i≤k} X_i, Q(a⃗, b)}

This too is a subset of Cl(I, R), by the preceding observation and the fact that Cl(I, R) is R-closed. □

N.B. An inductively defined set can be built by stages.
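The construction by stages is also easy to mechanize. The sketch below (in Python, with rules modelled as output-producing functions, a representation chosen only for this illustration) computes the first few stages X_0, X_1, . . . of a closure:

    from itertools import product

    def stages(initial, rules, n_stages):
        """Yield X_0, X_1, ..., X_{n_stages}.  rules: list of (arity, function) pairs,
        each function returning the set of outputs for a given input tuple."""
        so_far = set(initial)                          # X_0 = I
        yield set(so_far)
        for _ in range(n_stages):
            new = set(so_far)                          # start from the union of the earlier stages
            for arity, outputs_of in rules:
                for inputs in product(so_far, repeat=arity):
                    new |= set(outputs_of(*inputs))    # append all outputs of all the rules
            yield set(new)
            so_far = new

    # The closure of {3} under z = x + y and z = x - y (cf. I.2.12 below) accumulates
    # the multiples of 3, one stage at a time:
    rules = [(2, lambda x, y: {x + y}), (2, lambda x, y: {x - y})]
    for i, X in enumerate(stages({3}, rules, 3)):
        print("stage", i, ":", sorted(X))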

I.2.10 Definition (Immediate Predecessors; Ambiguity). If d ∈ Cl(I, R) and for some Q and a1, . . . , ar it is the case that Q(a1, . . . , ar, d), then the a1, . . . , ar are immediate Q-predecessors of d, or just immediate predecessors if Q is understood; for short, i.p.
A pair (I, R) is called ambiguous if some d ∈ Cl(I, R) satisfies any (or all) of the following conditions:
(i) It has two (or more) distinct sets of immediate P-predecessors for some rule P.
(ii) It has both immediate P-predecessors and immediate Q-predecessors, for P ≠ Q.
(iii) It is a member of I, yet it has immediate predecessors.
If (I, R) is not ambiguous, then it is unambiguous. □

I.2.11 Example. The pair ({00, 0}, {Q}), where Q(x, y, z) holds iff z = x y
(where “x y” denotes the concatenation of the strings x and y, in that order), is
ambiguous. For example, 0000 has the two immediate predecessor sets {00, 00}
and {0, 000}. Moreover, while 00 is an initial object, it does have immediate
predecessors – namely, the set {0, 0} (or, what amounts to the same thing, {0}).


I.2.12 Example. The pair (I, R) where I = {3} and R consists of z = x + y and z = x − y is ambiguous. Even 3 has (infinitely many) distinct sets of i.p. (e.g., any {a, b} such that a + b = 3, or a − b = 3).
The pairs that effect the definition of Term (I.1.5) and Wff (I.1.8) are unambiguous (see Exercises I.2 and I.3). □

I.2.13 Metatheorem (Definition by Recursion). Let (I, R) be unambiguous, and Cl(I, R) ⊆ A, where A is some set. Let also Y be a set, and† h : I → Y and g_Q, for each Q ∈ R, be given functions. For any (r + 1)-ary Q, an input for the function g_Q is a sequence a, b1, . . . , br, where a is in A and the b1, . . . , br are all in Y. All the g_Q yield outputs in Y.
Under these assumptions, there is a unique function f : Cl(I, R) → Y such that

    y = f(x)   iff   either y = h(x) and x ∈ I,
                     or, for some Q ∈ R,
                        y = g_Q(x, o1, . . . , or) and Q(a1, . . . , ar, x) holds,
                        where oi = f(ai) for i = 1, . . . , r                        (1)

The reader may wish to skip the proof on first reading.

Proof. Existence part. For each (r + 1)-ary Q ∈ R, define Q̃ by‡

    Q̃(⟨a1, o1⟩, . . . , ⟨ar, or⟩, ⟨b, g_Q(b, o1, . . . , or)⟩)   iff   Q(a1, . . . , ar, b)        (2)

† The notation f : A → B is common in informal (and formal) mathematics. It denotes a function


f that receives “inputs” from the set A and yields “outputs” in the set B.
‡ For a relation Q, writing just “Q(a1 , . . . , ar , b)” is equivalent to writing “Q(a1 , . . . , ar , b) holds”.
For any a1, . . . , ar, b, the above definition of Q̃ is effected for all possible choices of o1, . . . , or such that g_Q(b, o1, . . . , or) is defined.
Collect now all the Q̃ to form a set of rules R̃. Let also Ĩ = {⟨x, h(x)⟩ : x ∈ I}.
We will verify that the set F = Cl(Ĩ, R̃) is a 2-ary relation that for every input yields at most one output, and therefore is a function. For such a relation it is customary to write, letting the context fend off the obvious ambiguity in the use of the letter F,

y = F(x) iff F(x, y) (∗)

We will further verify that replacing f in (1) above by F results in a valid equivalence (the "iff" holds). That is, F satisfies (1).
(a) We establish that F is a relation composed of pairs ⟨x, y⟩ (x is input, y is output) where x ∈ Cl(I, R) and y ∈ Y. This follows easily by induction on F (I.2.3), since Ĩ ⊆ F, and the property (of containing such pairs) propagates with each Q̃ (recall that the g_Q yield outputs in Y).
(b) We next show "if ⟨x, y⟩ ∈ F and ⟨x, z⟩ ∈ F, then y = z" – that is, F is single-valued, or well defined, in short, it is a function. We again employ induction on F, thinking of the quoted statement as a "property" of the pair ⟨x, y⟩: Suppose that ⟨x, y⟩ ∈ Ĩ and let also ⟨x, z⟩ ∈ F. By I.2.6, ⟨x, z⟩ ∈ Ĩ, or Q̃(⟨a1, o1⟩, . . . , ⟨ar, or⟩, ⟨x, z⟩), where Q(a1, . . . , ar, x) and z = g_Q(x, o1, . . . , or), for some (r + 1)-ary Q̃ and ⟨a1, o1⟩, . . . , ⟨ar, or⟩ in F. The right hand side of the italicized "or" cannot hold for an unambiguous (I, R), since x cannot have i.p. Thus ⟨x, z⟩ ∈ Ĩ, hence y = h(x) = z. To prove that the property propagates with each Q̃, let

    Q̃(⟨a1, o1⟩, . . . , ⟨ar, or⟩, ⟨x, y⟩)

but also

    P̃(⟨b1, o′1⟩, . . . , ⟨bl, o′l⟩, ⟨x, z⟩)

where Q(a1, . . . , ar, x), P(b1, . . . , bl, x), and

    y = g_Q(x, o1, . . . , or)   and   z = g_P(x, o′1, . . . , o′l)        (3)

Since (I, R) is unambiguous, Q = P (hence also Q̃ = P̃), r = l, and ai = bi, for i = 1, . . . , r. By the I.H., oi = o′i, for i = 1, . . . , r; hence y = z by (3).
(c) Finally, we show that F satisfies (1). We do induction on Cl(Ĩ, R̃), to prove ←: If x ∈ I and y = h(x), then F(x, y) (i.e., y = F(x) in the alternative notation (∗)), since Ĩ ⊆ F. Let next y = g_Q(x, o1, . . . , or) and Q(a1, . . . , ar, x), where also F(ai, oi), for i = 1, . . . , r. By (2), Q̃(⟨a1, o1⟩, . . . , ⟨ar, or⟩, ⟨x, g_Q(x, o1, . . . , or)⟩); thus – F being closed under all the rules in R̃ – F(x, g_Q(x, o1, . . . , or)) holds, in short, F(x, y) or y = F(x). For →, we now assume that F(x, y) holds and we want to infer the right hand side (of iff) in (1). We employ Metatheorem I.2.6.
Case 1. Let ⟨x, y⟩ be F-derivable† in n = 1 step. Then ⟨x, y⟩ ∈ Ĩ. Thus y = h(x).
Case 2. Suppose next that ⟨x, y⟩ is F-derivable within n + 1 steps, namely, we have a derivation

    ⟨x1, y1⟩, ⟨x2, y2⟩, . . . , ⟨xn, yn⟩, ⟨x, y⟩        (4)

where Q̃(⟨a1, o1⟩, . . . , ⟨ar, or⟩, ⟨x, y⟩) and Q(a1, . . . , ar, x) (see (2)), and each of ⟨a1, o1⟩, . . . , ⟨ar, or⟩ appears in the above derivation, to the left of ⟨x, y⟩. This entails (by (2)) that y = g_Q(x, o1, . . . , or). Since the ⟨ai, oi⟩ appear in (4), F(ai, oi) holds for i = 1, . . . , r. Thus, ⟨x, y⟩ satisfies the right hand side of iff in (1), once more.

Uniqueness part. Let the function K also satisfy (1). We show, by induction
on Cl(I , R), that

for all x ∈ Cl(I , R) and all y ∈ Y, y = F(x) iff y = K (x) (5)


→: Let x ∈ I , and y = F(x). By lack of ambiguity, the case conditions
of (1) are mutually exclusive. Thus, it must be that y = h(x). But then, y = K (x)
as well, since K satisfies (1) too.
Let now Q(a1 , . . . , ar , x) and y = F(x). By (1), there are (unique, as
we now know) o1 , . . . , or such that oi = F(ai ) for i = 1, . . . , r , and y =
g Q (x, o1 , . . . , or ). By the I.H., oi = K (ai ). But then (1) yields y = K (x) as
well (since K satisfies (1)).
←: Just interchange the letters F and K in the above argument. 

The above clearly is valid for functions h and g Q that may fail to be defined
everywhere in their “natural” input sets. To be able to have this degree of
generality without having to state additional definitions (such as those of left
fields, right fields, partial functions, total functions, nontotal functions, and
Kleene weak equality) we have stated the recurrence (1) the way we did (to

† Cl(Ĩ, R̃)-derivable.
keep an eye on both the input and output side of things) rather than the usual

    f(x) = { h(x)                                   if x ∈ I
           { g_Q(x, f(a1), . . . , f(ar))           if Q(a1, . . . , ar, x) holds,
Of course, if all the g Q and h are defined everywhere on their input sets (i.e.,
they are “total”), then f is defined everywhere on Cl(I , R) (see Exercise I.4).

I.3. Axioms and Rules of Inference


Now that we have our language L, we will embark on using it to formally
effect deductions. These deductions start at the axioms. Deductions employ
“acceptable”, purely syntactic – i.e., based on form, not on substance – rules
that allow us to write a formula down (to deduce it) solely because certain other
formulas that are syntactically related to it were already deduced (i.e., already
written down). These string-manipulation rules are called rules of inference.
We describe in this section the axioms and the rules of inference that we will
accept into our logical calculus and that are common to all theories.
We start with a precise definition of tautologies in our first order language L.

I.3.1 Definition (Prime Formulas in Wff. Propositional Variables). A formula A ∈ Wff is a prime formula or a propositional variable iff it is either
Pri1. atomic or
Pri2. a formula of the form ((∃x)A).
We use the lowercase letters p, q, r (with or without subscripts or primes) to denote arbitrary prime formulas (propositional variables) of our language. □

That is, a prime formula either has no propositional connectives, or if it does, it hides them inside the scope of (∃x).
We may think of a propositional variable as a "blob of ink" that is all that a myopic being makes out of a formula described in I.3.1. The same being will see an arbitrary well-formed formula as a bunch of blobs, brackets and Boolean connectives (¬, ∨), "correctly connected" as stipulated below.†

I.3.2 Definition (Propositional Formulas). The set of propositional formulas over V, denoted here by Prop, is the smallest set such that:
(1) Every propositional variable (over V) is in Prop.
(2) If A and B are in Prop, then so are (¬A) and (A ∨ B). □

† Interestingly, our myope can see the brackets and the Boolean connectives.
I.3.3 Metatheorem. Prop = Wff.

Proof. ⊆: We do induction on Prop. Every item in I.3.2(1) is in Wff. Wff satisfies I.3.2(2) (see I.1.8(b)). Done.
⊇: We do induction on Wff. Every item in I.1.8(a) is a propositional variable
(over V ), and hence is in Prop.
Prop trivially satisfies I.1.8(b). It also satisfies I.1.8(c), for if A is in Prop,
then it is in Wff by the ⊆-direction above. Then, by I.3.1, ((∃x)A) is a propo-
sitional variable, and hence in Prop.
We are done once more. 

I.3.4 Definition (Propositional Valuations). We can arbitrarily assign a value of 0 or 1 to every A in Wff (or Prop) as follows:
(1) We fix an assignment of 0 or 1 to every prime formula. We can think of this
as an arbitrary but fixed function v : {all prime formulas over L} → {0, 1}
in the metatheory.
(2) We define by recursion an extension of v, denoted by v̄:
v̄((¬A)) = 1 − v̄(A)
v̄((A ∨ B )) = v̄(A ) · v̄(B )
where “·” above denotes number multiplication.
We call, traditionally, the values 0 and 1 by the names “true” and “false”
respectively, and write t and f respectively.
We also call a valuation v a truth (value) assignment.
We use the jargon “A takes the truth value t (respectively, f) under a valu-
ation v” to mean “v̄(A ) = 0 (respectively, v̄(A) = 1)”. 

The above inductive definition of v̄ relies on the fact that Definition I.3.2 of
Prop is unambiguous (I.2.10, p. 25), so that a propositional formula is uniquely
readable (or parsable) (see Exercises I.5 and I.6). It employs the metatheorem
on recursive definitions (I.2.13).
The reader may think that all this about unique readability is just an annoying
quibble. Actually it can be a matter of life or death. The ancient Oracle of
Delphi had the nasty habit of issuing ambiguous – not uniquely readable, that
is – pronouncements. One famous such pronouncement, rendered in English,
went like this: “You will go you will return not dying in the war”.† Given that

† The original was "Ήξεις αφίξεις, ου θνήξεις εν πολέμω".

ancient Greeks did not use punctuation, the above has two diametrically opposite
meanings depending on whether you put a comma before or after “not”.
The situation with formulas in Prop would have been as disastrous in the
absence of brackets – which serve as punctuation – because unique readability
would then not be guaranteed: For example, for three distinct prime formulas
p, q, r we could find a v such that v( p → q → r ) depended on whether we meant
to insert brackets around “ p → q” or around “q → r ” (can you find such a v?).

I.3.5 Remark (Truth Tables). Definition I.3.4 is often given in terms of truth
functions. For example, we could have defined (in the metatheory, of course)
the function F¬ : {t, f} → {t, f} by

    F¬(x) = { t   if x = f
            { f   if x = t

We could then say that v̄((¬A)) = F¬ (v̄(A)). One can similarly take care of
all the connectives (∨ and all the abbreviations) with the help of truth functions
F∨ , F∧ , F→ , F↔ . These functions are conveniently given via so-called truth
tables as indicated below:

    x   y  |  F¬(x)   F∨(x, y)   F∧(x, y)   F→(x, y)   F↔(x, y)
    --------------------------------------------------------------
    f   f  |    t         f          f          t          t
    f   t  |    t         t          f          t          f
    t   f  |    f         t          f          f          f
    t   t  |    f         t          t          t          t


I.3.6 Definition (Tautologies, Satisfiable Formulas, Unsatisfiable Formulas in Wff). A formula A ∈ Wff (equivalently, in Prop) is a tautology iff for all valuations v, v̄(A) = t.
We call the set of all tautologies, as defined here, Taut. The symbol |=Taut A says "A is in Taut".
A formula A ∈ Wff (equivalently, in Prop) is satisfiable iff for some valuation v, v̄(A) = t. We say that v satisfies A.
A set of formulas Γ is satisfiable iff for some valuation v, v̄(A) = t for every A in Γ. We say that v satisfies Γ.
A formula A ∈ Wff (equivalently, in Prop) is unsatisfiable iff for all valuations v, v̄(A) = f. A set of formulas Γ is unsatisfiable iff for all valuations v, v̄(A) = f for some A in Γ. □

“Satisfiable” and “unsatisfiable” are terms introduced here in the propositional


or Boolean sense. These terms have a more complicated meaning when we
decide to “see” the object variables and quantifiers that occur in formulas.

I.3.7 Definition (Tautologically Implies, for Formulas in Wff). Let A and Γ be respectively any formula and any set of formulas (over L). The symbol Γ |=Taut A, pronounced "Γ tautologically implies A", means that every truth assignment v that satisfies Γ also satisfies A. □

We have at once

I.3.8 Lemma.† Γ |=Taut A iff Γ ∪ {¬A} is unsatisfiable (in the propositional sense).

If Γ = ∅, then Γ |=Taut A says just |=Taut A, since the hypothesis "every truth assignment v that satisfies Γ", in the definition above, is vacuously satisfied. For that reason we almost never write ∅ |=Taut A and write instead |=Taut A.

I.3.9 Exercise. For any formula A and any two valuations v and v′, v̄(A) = v̄′(A) if v and v′ agree on all the propositional variables that occur in A.
In the same manner, Γ |=Taut A is oblivious to v-variations that do not affect the variables that occur in Γ and A (see Exercise I.7). □
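Exercise I.3.9 is what makes tautology checking a finite (hence algorithmic) matter: only the prime formulas actually occurring in A need to be assigned values. A small Python sketch follows, using a tuple representation of propositional formulas chosen only for this illustration, and the 0/1 coding of t/f from I.3.4:

    from itertools import product

    def value(A, v):                      # the extension v-bar of a valuation v
        if isinstance(A, str):            # a bare string names a prime formula
            return v[A]
        if A[0] == 'not':
            return 1 - value(A[1], v)
        if A[0] == 'or':
            return value(A[1], v) * value(A[2], v)
        raise ValueError(A)

    def primes(A):
        return {A} if isinstance(A, str) else set().union(*(primes(B) for B in A[1:]))

    def is_tautology(A):
        ps = sorted(primes(A))
        return all(value(A, dict(zip(ps, bits))) == 0        # 0 codes "t"
                   for bits in product((0, 1), repeat=len(ps)))

    # (p v (not p)) is a tautology; p alone is merely satisfiable.
    assert is_tautology(('or', 'p', ('not', 'p')))
    assert not is_tautology('p')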

Before presenting the axioms, we need to introduce the concept of substitution.

I.3.10 Tentative Definition (Substitution of a Term for a Variable). Let A be a formula, x an (object) variable, and t a term.
A[x ← t] denotes the result of "replacing" all free occurrences of x in A by the term t, provided no variable of t was "captured" (by a quantifier) during

† The word “lemma” has Greek origin, “λήµµα”, plural “lemmata” – many people say “lemmas” –
from “ήµµατ α”. It derives from the verb “λαµβ άνω” (to take) and thus means “taken thing”.
In mathematical reasoning a lemma is a provable auxiliary statement that is taken and used as
a stepping stone in lengthy mathematical arguments – invoked therein by name, as in “. . . by
Lemma such and such . . . ” – much as subroutines (or procedures) are taken and used as auxiliary
stepping stones to elucidate lengthy computer programs. Thus our purpose in having lemmata is
to shorten proofs by breaking them up into modules.
substitution. If the proviso is valid, then we say that "t is substitutable for x (in A)", or that "t is free for x (in A)".
If the proviso is not valid, then the substitution is undefined. □

I.3.11 Remark. There are a number of issues about Definition I.3.10 that need
discussion or clarification.
Any reasonable person will be satisfied with the above definition “as is”.
However, there are some obscure points (deliberately quoted, above).

(1) What is this about “capture”? Well, suppose that A ≡ (∃x)¬x = y. Let
t ≡ x.† Then, if we ignore the proviso in I.3.10, A[y ← t] ≡ (∃x)¬x =
x, which says something altogether different than the original. Intuitively,
this is unexpected (and undesirable): A codes a statement about the free
variable y, i.e., a statement about all objects which could be values (or
meanings) of y. One would have expected that, in particular, A [y ← x] –
if the substitution were allowed – would make this very same statement
about the values of x. It does not.‡ What happened is that x was captured
by the quantifier upon substitution, thus distorting A’s original meaning.
(2) Are we sure that the term “replace” is mathematically precise?
(3) Is A [x ← t] always a formula, if A is?

A revisitation of I.3.10 via an inductive definition (by induction on terms


and formulas) settles (1)–(3) at once (in particular, the intuitive terms “replace”
and “capture” do not appear in the inductive definition). Here it goes:
First off, let us define s[x ← t], where s is also a term, by cases:

    s[x ← t] ≡ { t                                            if s ≡ x
               { a                                            if s ≡ a, a constant (symbol)
               { y                                            if s ≡ y, a variable ≢ x
               { f r1[x ← t] r2[x ← t] . . . rn[x ← t]        if s ≡ f r1 . . . rn

Pause. Is s[x ← t] always a term? That this is so follows directly by induction on terms, using the definition by cases above and the I.H. that each of ri[x ← t], i = 1, . . . , n, is a term.

† Recall that in I.1.4 (p. 13) we defined the symbol “≡” to be equality on strings. No further
reminders will be issued.
‡ And that is why the substitution is not allowed. The original formula says that for any object y
there is an object that is different from it. On the other hand, A [y ← x] says that there is an
object that is different from itself.
We turn now to formulas. The symbols P, r, s (with or without subscripts) below denote a predicate of arity n, a term and a term (respectively).

    A[x ← t] ≡ { s[x ← t] = r[x ← t]                          if A ≡ s = r
               { P r1[x ← t] r2[x ← t] . . . rn[x ← t]        if A ≡ P r1 . . . rn
               { (B[x ← t] ∨ C[x ← t])                        if A ≡ (B ∨ C)
               { (¬(B[x ← t]))                                if A ≡ (¬B)
               { A                                            if A ≡ ((∃y)B) and y ≡ x
               { ((∃y)(B[x ← t]))                             if A ≡ ((∃y)B) and y ≢ x and
                                                                 y does not occur in t

In all cases above, the left hand side is defined iff the right hand side is.

Pause. We have eliminated "replaces" and "captured". Is though A[x ← t] a formula (whenever it is defined)? (See Exercise I.8.) □
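The inductive clauses above are, again, directly programmable. The following Python sketch mirrors them on an abstract syntax (a representation chosen only for this illustration), returning None exactly when the substitution is undefined because a variable of t would be captured:

    def sub_term(s, x, t):
        if isinstance(s, str):                    # a variable
            return t if s == x else s
        if s[0] == 'const':
            return s
        return ('app', s[1], [sub_term(r, x, t) for r in s[2]])

    def term_vars(t):
        if isinstance(t, str):
            return {t}
        if t[0] == 'const':
            return set()
        return set().union(set(), *(term_vars(r) for r in t[2]))

    def sub(A, x, t):
        tag = A[0]
        if tag == 'eq':                           # atomic: always defined
            return ('eq', sub_term(A[1], x, t), sub_term(A[2], x, t))
        if tag == 'pred':
            return ('pred', A[1], [sub_term(r, x, t) for r in A[2]])
        if tag == 'not':
            B = sub(A[1], x, t)
            return None if B is None else ('not', B)
        if tag == 'or':
            B, C = sub(A[1], x, t), sub(A[2], x, t)
            return None if B is None or C is None else ('or', B, C)
        if tag == 'exists':
            y, B = A[1], A[2]
            if y == x:
                return A                          # x is not free in A: nothing to replace
            if y in term_vars(t):
                return None                       # y would capture a variable of t
            C = sub(B, x, t)
            return None if C is None else ('exists', y, C)
        raise ValueError(A)

    # ((exists x) not x = y)[y <- x] is undefined (capture), as in I.3.11(1):
    assert sub(('exists', 'x', ('not', ('eq', 'x', 'y'))), 'y', 'x') is None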

I.3.12 Definition (Simultaneous Substitution). The symbols A[y1, . . . , yr ← t1, . . . , tr] or, equivalently, A[y⃗r ← t⃗r] – where y⃗r is an abbreviation of y1, . . . , yr – denote simultaneous substitution of the terms t1, . . . , tr into the variables y1, . . . , yr in the following sense: Let z⃗r be variables that do not occur at all (either as free or bound) in either A or t⃗r. Then A[y⃗r ← t⃗r] is short for

    A[y1 ← z1] . . . [yr ← zr][z1 ← t1] . . . [zr ← tr]        (1)

Exercise I.9 shows that we obtain the same string in (1) above, regardless of our choice of new variables z⃗r.

More conventions: The symbol [x ← t] lies in the metalanguage. This metasymbol has the highest priority, so that, e.g., A ∨ B[x ← t] means A ∨ (B[x ← t]), (∃x)B[x ← t] means (∃x)(B[x ← t]), etc.
We often write A[y1, . . . , yr], rather than the terse A, in order to convey our interest in the free variables y1, . . . , yr that may or may not actually appear free in A. Other variables, not mentioned in the notation, may also be free in A (see also I.1.11).
In this context, if t1, . . . , tr are terms, the symbol A[t1, . . . , tr] abbreviates A[y⃗r ← t⃗r].

We are ready to introduce the (logical) axioms and rules of inference.


Schemata.† Some of the axioms below will actually be schemata. A formula schema, or formula form, is a string G of the metalanguage that contains syntactic variables (or metavariables), such as A, P, f, a, t, x.
Whenever we replace all these syntactic variables that occur in G by specific formulas, predicates, functions, constants, terms, or variables respectively, we obtain a specific well-formed formula, a so-called instance of the schema. For example, an instance of (∃x)x = a is (∃v12)v12 = 0 (in the language of Peano arithmetic). An instance of A → A is v101 = v114 → v101 = v114.

I.3.13 Definition (Axioms and Axiom Schemata). The logical axioms are all
the formulas in the group Ax1 and all the possible instances of the schemata in
the remaining groups:

Ax1. All formulas in Taut.
Ax2. (Substitution axiom. Schema)
    A[x ← t] → (∃x)A   for any term t.

By I.3.10–I.3.11, the notation already imposes a condition on t, that it is substitutable for x.

N.B. We often see the above written as

A [t] → (∃x) A [x]

or even

A [t] → (∃x)A

Ax3. (Schema) For each object variable x, the formula x = x.
Ax4. (Leibniz's characterization of equality – first order version. Schema) For any formula A, object variable x, and any terms t and s, the formula
    t = s → (A[x ← t] ↔ A[x ← s])
N.B. The above is written usually as
t = s → (A [t] ↔ A [s])
as long as we remember that the notation already requires that t and s be free
for x. We will denote the above set of logical axioms Λ. □

† Plural of schema. This is of Greek origin, σ χ ήµα, meaning – e.g., in geometry – figure or
configuration or even formation.
The logical axioms for equality are not the strongest possible, but they are
adequate for the job. What Leibniz really proposed was the schema t = s ↔
(∀P)(P[t] ↔ P[s]), which says, intuitively, that “two objects t and s are equal
iff, for every property P, both have P or neither has P”.
Unfortunately, our system of notation (first-order language) does not allow
quantification over predicate symbols (which can have as “values” arbitrary
“properties”). But is not Ax4 read “for all formulas A” anyway? Yes, but with
one qualification: “For all formulas A that we can write down in our system of
notation”, and, alas, we cannot write all possible formulas of real mathematics
down, because they are too many.†
While the symbol “=” is suggestive of equality, it is not its shape that qual-
ifies it. It is the two axioms, Ax3 and Ax4, that make the symbol behave as we
expect equality to behave, and any other symbol of any other shape (e.g.,
Enderton (1972) uses “≈”) satisfying these two axioms qualifies as formal
equality that is intended to codify the metamathematical standard “=”.

I.3.14 Remark. In Ax2 and Ax4 we imposed the condition that t (and s) must
be substitutable in x. Here is why:
Take A to stand for (∀y)x = y and B to stand for (∃y)¬x = y. Then, temporarily suspending the restriction on substitutability, A[x ← y] → (∃x)A is

    (∀y)y = y → (∃x)(∀y)x = y

and x = y → (B ↔ B[x ← y]) is

    x = y → ((∃y)¬x = y ↔ (∃y)¬y = y)

neither of which, obviously, is "valid".‡
There is a remedy in the metamathematics: That is, move the quantified
variable(s) out of harm’s way, by renaming them so that no quantified variable
in A has the same name as any (free, of course) variable in t (or s).
This renaming is formally correct (i.e., it does not change the meaning of
the formula) as we will see in the variant (meta)theorem I.4.13. Of course,
it is always possible to effect this renaming, since we have countably many
variables, and only finitely many appear free in t (and s) and A.

† Uncountably many, in a precise technical sense that we will introduce in Chapter VII. This is
due to Cantor’s theorem, which implies that there are uncountably many subsets of N. Each such
subset A, gives rise to the formula, x ∈ A, in the metalanguage.
On the other hand, our formal system of notation, using just ∈ and U as start-up (nonlogical)
symbols, is not rich enough to write down but a countably infinite set of formulas (at some point
later, Example VII.5.17, this will be clear). Thus, our notation will fail to denote uncountably
many “real formulas” x ∈ A.
‡ Speaking intuitively is enough for now. Validity will be defined carefully pretty soon.
This trivial remedy allows us to render the conditions in Ax2 and Ax4
harmless. Essentially, a t (or s) is always substitutable after renaming. 

I.3.15 Definition (Rules of Inference). The following two are the only primi-
tive† rules of inference. These rules are relations with inputs from the set Wff
and outputs also in Wff. They are written down, traditionally, as “fractions”
through the employment of syntactic (or meta-) variables. We call the “numer-
ator” the premise(s) and the “denominator” the conclusion.
We say that a rule of inference is applied to any instance of the formula
schema(ta) in the numerator, and that it yields (or results in) the corresponding
instance‡ of the formula schema in the denominator.
Inf1. Modus ponens, or MP, is the rule

        A,   A → B
        -----------
             B

Inf2. ∃-introduction – pronounced E-introduction – is the rule

        A → B
        --------------
        (∃x)A → B

that is applicable if a side condition is met: That x is not free in B.
N.B. Recall the conventions on eliminating brackets. □

It is immediately clear that the definition above meets our requirement that the
rules of inference be “algorithmic”, in the sense that whether they are applicable
can be decided and their application can be carried out in a finite number of
steps by just looking at the form of (potential input) formulas (not at their
meaning).
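To illustrate the point, here is a Python sketch (on an abstract tuple syntax chosen only for this illustration, with ('implies', A, B) standing for the abbreviation A → B) of how an agent in the metatheory can decide, by form alone, whether Inf1 or Inf2 applies and what it yields:

    def free_vars(A):
        if A[0] == 'atomic':
            return set(A[1])
        if A[0] == 'exists':
            return free_vars(A[2]) - {A[1]}
        return set().union(*(free_vars(B) for B in A[1:] if isinstance(B, tuple)))

    def modus_ponens(premise1, premise2):
        """From A and ('implies', A, B) conclude B; None if Inf1 does not apply."""
        if premise2[0] == 'implies' and premise2[1] == premise1:
            return premise2[2]
        return None

    def e_introduction(premise, x):
        """From ('implies', A, B) conclude ('implies', ('exists', x, A), B),
        provided the side condition holds: x is not free in B."""
        if premise[0] == 'implies' and x not in free_vars(premise[2]):
            return ('implies', ('exists', x, premise[1]), premise[2])
        return None

    # From A(x) -> B(y), where x is not free in the right-hand side, Inf2 yields (exists x)A(x) -> B(y):
    premise = ('implies', ('atomic', ['x']), ('atomic', ['y']))
    assert e_introduction(premise, 'x') == ('implies', ('exists', 'x', ('atomic', ['x'])), ('atomic', ['y']))
    assert e_introduction(('implies', ('atomic', ['x']), ('atomic', ['x'])), 'x') is None   # side condition fails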

We next define Γ-theorems, that is, formulas we can prove from the set of formulas Γ (this Γ may be empty).

I.3.16 Definition (Γ-Theorems). The set of Γ-theorems, Thm_Γ, is the least inclusive subset of Wff that satisfies
Th1. Λ ⊆ Thm_Γ (see I.3.13).
Th2. Γ ⊆ Thm_Γ. We call every member of Γ a nonlogical axiom.
Th3. Thm_Γ is closed under each rule Inf1–Inf2.

† That is, given initially. Other rules can be proved to hold, and we call them derived rules.
‡ The corresponding instance is the one obtained from the schema in the denominator by replac-
ing each of its metavariables by the same specific formula, or term, used to instantiate all the
occurrences of the same metavariable in the numerator.
The metalinguistic statement A ∈ Thm_Γ is traditionally written as Γ ⊢ A, and we say that A is proved from Γ or that it is a Γ-theorem.
We also say that A is deduced from Γ, or that Γ deduces A.
If Γ = ∅, then rather than ∅ ⊢ A we write ⊢ A. We often say in this case that A is absolutely provable (or provable with no nonlogical axioms).
We often write A, B, . . . , D ⊢ E for {A, B, . . . , D} ⊢ E. □

I.3.17 Definition (Γ-Proofs). We just saw that Thm_Γ = Cl(I, R), where I = Λ ∪ Γ and R contains just the two rules of inference. A (I, R)-derivation is also called a Γ-proof (or just proof, if Γ is understood). □

I.3.18 Remark. (1) It is clear that if each of A1, . . . , An has a Γ-proof and B has an {A1, . . . , An}-proof, then B has a Γ-proof. Indeed, simply concatenate each of the given Γ-proofs (in any sequence). Append to the right of that sequence the given {A1, . . . , An}-proof (that ends with B). Then the entire sequence is a Γ-proof, and ends with B.
We refer to this phenomenon as the transitivity of ⊢.
Very important. Transitivity of ⊢ allows one to invoke previously proved (by oneself or others) theorems in the course of a proof. Thus, practically, a Γ-proof is a sequence of formulas in which each formula is an axiom, is a known Γ-theorem, or is obtained by applying a rule of inference to previous formulas of the sequence.
(2) If Γ ⊆ Δ and Γ ⊢ A, then also Δ ⊢ A, as follows from I.3.16 or I.3.17. In particular, ⊢ A implies Γ ⊢ A for any Γ.
(3) It is immediate from the definitions that for any formulas A and B,

    A, A → B ⊢ B        (i)

and, if, moreover, x is not free in B,

    A → B ⊢ (∃x)A → B        (ii)

Some texts (e.g., Schütte (1977)) give the rules in the format of (i)–(ii) above.


The axioms and rules provide us with a calculus, that is, a means to “calcu-
late” (used synonymously with construct) proofs and theorems. In the interest
of making the calculus more user-friendly – and thus more easily applicable to
mathematical theories of interest, such as set theory – we are going to develop in
the next section a number of “derived principles”. These principles are largely
of the form A1, . . . , An ⊢ B. We call such a (provable in the metatheory) principle a derived rule of inference, since, by transitivity of ⊢, it can be used as a proof step in a Γ-proof. By contrast, the rules Inf1–Inf2 are "primitive"
(or “basic” or “primary”); they are given outright.

We can now fix our understanding of the concept of a formal or mathematical theory.
A (first order) formal (mathematical) theory, or just theory over a language L, or just theory, is a tuple (of "ingredients") T = (L, Λ, I, T), where L is a first order language, Λ is a set of logical axioms, I is a set of rules of inference, and T a non-empty subset of Wff that is required to contain Λ (i.e., Λ ⊆ T) and be closed under the rules I.
Equivalently, one may simply require that T be closed under ⊢, that is,

    for any Γ ⊆ T and any formula A, if Γ ⊢ A, then A ∈ T.


This is, furthermore, equivalent to requiring that

    A ∈ T   iff   T ⊢ A        (1)

Indeed, the if direction follows from closure under ⊢, while the only-if direction is a consequence of Definition I.3.16.
T is the set of the formulas of the theory,† and we often say "a theory T", taking everything else for granted.
If T = Wff, then the theory T is called inconsistent or contradictory. Otherwise it is called consistent.
Throughout our exposition we fix Λ and I as in Definitions I.3.13 and I.3.15. By (1), T = Thm_T. This observation suggests that we call theories – such as the ones we have just defined – axiomatic theories, in that a set Γ always exists such that T = Thm_Γ (if at a loss, we can just take Γ = T).
We are mostly interested in theories T for which there is a "small" set Γ ("small" by comparison with T) such that T = Thm_Γ. We say that T is axiomatized by Γ. Naturally, we call T the set of theorems, and Γ the set of nonlogical axioms, of T.
If, moreover, Γ is "recognizable" (i.e., we can tell "algorithmically" whether or not a formula A is in Γ), then we say that T is recursively axiomatized.
Examples of recursively axiomatized theories are ZFC set theory and Peano
arithmetic. On the other hand, if we take T to be all the sentences of arithmetic

† As opposed to “of the language”, which is all of Wff.


that are true when interpreted “in the standard way”† over N – the so-called
complete arithmetic – then there is no recognizable Γ such that T = Thm_Γ.
We say that complete arithmetic is not recursively axiomatizable.‡

Pause. Why does complete arithmetic form a theory? Because work of the next
section – in particular, the soundness theorem – entails that it is closed under ⊢.

We tend to further abuse language and call axiomatic theories by the name of their (set of) nonlogical axioms Γ. Thus if T = (L, Λ, I, T) is a first order theory and T = Thm_Γ, then we may say interchangeably "theory T", "theory T", or "theory Γ".
If Γ = ∅, then we have a pure or absolute theory (i.e., we are "just doing logic, not math"). If Γ ≠ ∅ then we have an applied theory.

Argot. A final note on language versus metalanguage, and theory versus meta-
theory. When are we speaking the metalanguage, and when are we speaking
the formal language?
The answers are, respectively, “almost always” and “almost never”. As has
been remarked before, in principle, we are speaking the formal language exactly
when we are pronouncing or writing down a string from Term or Wff. Otherwise
we are (speaking or writing) in the metalanguage. It appears that we (and
everybody else who has written a book in logic or set theory) are speaking and
writing within the metalanguage with a frequency approaching 100%.
The formalist is clever enough to simplify notation at all times. We will
seldom be caught writing down a member of Wff in this book, and, on the rare
occasions we may do so, it will only be to serve as an illustration of why one
should avoid writing down such formulas: because they are too long and hard
to read and understand.
We will be speaking the formal language with a heavy “accent” and using
many idioms borrowed from “real” (meta-) mathematics, and English. We will
call our dialect argot, following Manin (1977).
A related, and practically more important,§ question is “When are we arguing
in the theory, and when are we arguing in the metatheory?”. That is, the question
is not about how we speak, but about what we are saying when we speak.

† That is, the symbol “0” of the language is interpreted as 0 ∈ N, “Sx” as x + 1, “(∃x)” as “there
is an x ∈ N”, etc.
‡ The trivial “solution”, that is, taking  = T , will not do, for T is not recognizable.
§ Important, because arguing in the theory restricts us to use only its axioms (and earlier proved
theorems; cf. I.3.18) and its rules of inference – nothing extraneous to these syntactic tools is
allowed.
I.3. Axioms and Rules of Inference 41

The answer to this is also easy: Once we have fixed a theory T and the
nonlogical axioms , we are working in the theory iff we are writing down a
(-) proof of some specific formula A. It does not matter if A (and much of
the what we write down during the proof) is in argot.
Two examples:

(1) One is working in formal number theory (or formal arithmetic) if one states
and proves (say, from the Peano axioms) that “every natural number n > 1
has a prime factor”. Note how this theorem is stated in argot. Below we
give its translation into the formal language of arithmetic:†
 
(∀n) S0 < n → (∃x)(∃y) n = x × y ∧
 (1)
S0 < x ∧ (∀m)(∀r )(x = m × r → m = S0 ∨ m = x)

(2) One is working in formal logic if one is writing a proof of (∃v13 )v13 = v13 .

Suppose though that our activity consists of effecting definitions, introduc-


ing axioms, or analyzing the behaviour or capability of T, e.g., proving some
derived rule A 1 , . . . , A n  B – that is, a theorem schema – or investigating
consistency,‡ or relative consistency.§ Then we are operating in the metatheory,
that is, in “real” mathematics.

One of the most important problems posed in the metatheory is

“Given a theory T and a formula A. Is A a theorem of T?”

This is Hilbert’s Entscheidungsproblem, or decision problem. Hilbert believed


that every recursively axiomatized theory ought to admit a “general” solution,
by more or less “mechanical means”, to its decision problem. The techniques
of Gödel and the insight of Church showed that this problem is, in general,
algorithmically unsolvable.
As we have already stated, metamathematics exists outside and indepen-
dently of our effort to build this or that formal system. All its methods are – in
principle – available to us for use in the analysis of the behaviour of a formal
system.

† Well, almost. In the interest of brevity, all the variable names used in the displayed formula (1)
are metasymbols.
‡ That is, whether or not T = Wff.
§ That is, “if  is consistent,” – where we are naming the theory by its nonlogical axioms – “does
it stay so after we have added some formula A as a nonlogical axiom?”.
42 I. A Bit of Logic: A User’s Toolbox

Pause. But how much of real mathematics are we allowed to use, reliably, to
study or speak about the “simulator” that the formal system is?†

For example, have we not overstepped our license by using induction (and,
implicitly, the entire infinite set N), specifically the recursive definitions of
terms, well-formed formulas, theorems, etc.?
The quibble here is largely “political”. Some people argue (a major proponent
of this was Hilbert) as follows: Formal mathematics was meant to crank out
“true” statements of mathematics, but no “false” ones, and this freedom of
contradiction ought to be verifiable.
Now, as we are verifying so in the metatheory (i.e., outside the formal sys-
tem), shouldn’t the metatheory itself be above suspicion (of contradiction, that
is)? Naturally.
Hilbert’s suggestion towards achieving this “above suspicion” status was,
essentially, to utilize in the metatheory only a small fragment of “reality” that
is so simple and close to intuition that it does not need itself any “certificate”
(via formalization) for its freedom from contradiction.
In other words, restrict the metamathematics!‡
Such a fragment of the metatheory, he said, should have nothing to do with
the “infinite”, in particular with the entire set N and all that it entails (e.g.,
inductive definitions and proofs).§
If it were not for Gödel’s incompleteness results, this position – that meta-
mathematical techniques must be finitary – might have prevailed. However,
Gödel proved it to be futile, and most mathematicians have learnt to feel com-
fortable with infinitary metamathematical techniques, or at least with N and
induction.¶ Of course, it would be imprudent to use as metamathematical tools
mathematics of suspect consistency (e.g., the full naı̈ve theory of sets).

† The methods or scope of the metamathematics that a logician uses – in the investigation of some
formal system – are often restricted for technical or philosophical reasons.
‡ Otherwise we would need to formalize the metamathematics – in order to “certify” it – and
next the metametamathematics, and so on. For if “metaM” is to authoritatively check “M” for
consistency, then it too must be consistent; so let us formalize “metaM” and let “metametaM”
check it; . . . – a never ending story.
§ See Hilbert and Bernays (1968, pp. 21–29) for an elaborate scheme that constructs “concrete
number objects” – Ziffern or “numerals” – “|”,“||”,“|||”, etc., that stand for “1”,“2”,“3”, etc.,
complete with a “concrete mathematical induction” proof technique on these objects, and even
the beginnings of their “recursion theory”. Of course, at any point, only finite sets of such objects
were considered.
¶ Some proponents of infinitary techniques in metamathematics have used very strong words in
describing the failure of Hilbert’s program. Rasiowa and Sikorski (1963) write in their intro-
duction: “However Gödel’s results exposed the fiasco of Hilbert’s finitistic methods as far as
consistency is concerned.”
I.4. Basic Metatheorems 43

It is worth pointing out that one could fit (with some effort) our inductive
definitions within Hilbert’s style. But we will not do so.
First, one would have to abandon the elegant (and now widely used) approach
with closures, and use instead the concept of derivations of Section I.2.
Then one would somehow have to effect and study derivations without the
benefit of the entire set N. Bourbaki (1966b, p. 15) does so with his construc-
tions formatives. Hermes (1973) is another author who does so, with his “term-”
and “formula-calculi” (such calculi being, essentially, finite descriptions of
derivations).
Bourbaki (but not Hermes) avoids induction over all of N. In his metamath-
ematical discussions of terms and formulas† that are derived by a derivation
d1 , . . . , dn , he restricts his induction arguments to the segment {0, 1, . . . , n},
that is, he takes an I.H. on k < n and proceeds to k + 1.

I.4. Basic Metatheorems


We are dealing with an arbitrary theory T = (L , , I, T ), such that  is the
set of logical axioms I.3.13 and I are the inference rules I.3.15. We also let 
be an appropriate set of nonlogical axioms, i.e., T = Thm .

I.4.1 Metatheorem (Post’s “Extended” Tautology Theorem). If A 1 , . . . ,


A n |=Taut B then A 1 , . . . , A n  B .

Proof. The assumption yields that


|=Taut A 1 → · · · → A n → B (1)
Thus – since the formula in (1) is in , and using Definition I.3.16,
A 1, . . . , A n  A 1 → · · · → A n → B (2)
Applying modus ponens to (2), n times, we deduce B . 

I.4.1 is an omnipresent derived rule.

I.4.2 Definition. A and B are provably equivalent in T means that  


A ↔ B. 

I.4.3 Metatheorem. Any two theorems A and B of T are provably equiva-


lent in T.

† For example, in loc. cit., p. 18, where he proves that, in our notation, A [x ← y] and t[x ← y]
are a formula and term respectively.
44 I. A Bit of Logic: A User’s Toolbox

Proof. By I.4.1,   A yields   B → A. Similarly,   A → B


follows from   B . One more application of I.4.1 yields   A ↔ B . 

 ¬x = x ↔ ¬y = y (why?), but neither ¬x = x nor ¬y = y is a ∅-theorem.

I.4.4 Remark (Hilbert Style Proofs). In practice we write proofs “vertically”,


that is, as numbered vertical sequences (or lists) of formulas. The numbering
helps the annotational comments that we insert to the right of each formula that
we list, as the following proof demonstrates.
A metatheorem admits a metaproof, strictly speaking. The following is a
derived rule (or theorem schema) and thus belongs to the metatheory (and so
does its proof ).
Another point of view is possible, however: The syntactic symbols x, A,
and B below stand for a specific variable and specific formulas that we just
forgot to write down explicitly. Then one can think of the proof as a (formal)
Hilbert style proof. 

I.4.5 Metatheorem (∀-Introduction – Pronounced A-Introduction). If x


does not occur free in A , then A → B  A → (∀x)B .

Proof.
(1) A →B given
(2) ¬B → ¬A (1) and I.4.1
(3) (∃x)¬B → ¬A (2) and ∃-introduction
(4) A → ¬(∃x)¬B (3) and I.4.1
(5) A → (∀x)B (4), introducing the ∀-abbreviation


I.4.6 Metatheorem (Specialization). For any formula A and term t,


 (∀x)A → A[t].

At this point, the reader may want to review our abbreviation conventions; in
particular, see Ax2 (I.3.13).

Proof.
(1) ¬A [t] → (∃x)¬A in 
(2) ¬(∃x)¬A → A [t] (1) and I.4.1
(3) (∀x)A → A[t] (2), introducing the ∀-abbreviation

I.4. Basic Metatheorems 45

I.4.7 Corollary. For any formula A,  (∀x)A → A.

Proof. A [x ← x] ≡ A. 

Pause. Why is A [x ← x] the same string as A ?

I.4.8 Metatheorem (Generalization). For any  and any A, if   A, then


  (∀x)A.

Proof. Choose y ≡ x. Then we continue any given proof of A (from ) as


follows:
(1) A proved from 
(2) y = y → A (1) and I.4.1
(3) y = y → (∀x)A (2) and ∀-introduction
(4) y = y in 
(5) (∀x)A (3), (4) and MP


I.4.9 Corollary. For any  and any A,   A iff   (∀x)A.

Proof. By I.4.7, I.4.8, and modus ponens. 

I.4.10 Corollary. For any A, A  (∀x)A and (∀x)A  A.

The above corollary motivates the following definition. It also justifies the
common mathematical practice of the “implied universal quantifier”. That is,
we often just state “. . . x . . . ” when we mean “(∀x) . . . x . . . ”.

I.4.11 Definition (Universal Closure). Let y1 , . . . , yn be the list of all free vari-
ables of A. The universal closure of A is the formula (∀y1 )(∀y2 ) · · · (∀yn )A –
often written more simply as (∀y1 y2 . . . yn )A or even (∀yn )A. 

By I.4.10, a formula deduces and is deduced by its universal closure.

Pause. We said the universal closure. Hopefully, the remark immediately above
is robust to permutation of (∀y1 )(∀y2 ) · · · (∀yn ). Is it? (Exercise 1.10.)

I.4.12 Corollary (Substitution of Terms). A [x1 , . . . , xn ]  A [t1 , . . . , tn ]


for any terms t1 , . . . , tn .

The reader may wish to review I.3.12 and the remark following it.
46 I. A Bit of Logic: A User’s Toolbox

Proof. We illustrate the proof for n = 2. What makes it interesting is the re-
quirement to have “simultaneous substitution”. To that end we first substitute
into x1 and x2 new variables z, w – i.e., not occurring in either A or in the ti .
The proof is the following sequence. Comments justify, in each case, the pres-
ence of the formula immediately to the left by virtue of the presence of the
immediately preceding formula:

A [x1 , x2 ] starting point


(∀x1 ) A [x1 , x2 ] generalization
A [z, x2 ] specialization; x1 ← z
(∀x2 ) A [z, x2 ] generalization
A [z, w] specialization; x2 ← w
Now z ← t1 , w ← t2 , in any order, is the same as simultaneous substitu-
tion I.3.12:
(∀z) A [z, w] generalization
A [t1 , w] specialization; z ← t1
(∀w)A [t1 , w] generalization
A [t1 , t2 ] specialization; w ← t2


I.4.13 Metatheorem (Variant, or Dummy, Renaming). For any formula


(∃x)A, if z does not occur in it (i.e., is neither free nor bound), then  (∃x)A ↔
(∃z)A [x← z].

We often write this (under the stated conditions) as  (∃x) A [x] ↔ (∃z) A [z].
By the way, another way to state the conditions is “if z does not occur in A
(i.e., is neither free nor bound in A ), and is different from x”. Of course, if
z ≡ x, then there is nothing to prove.

Proof. Since z is substitutable in x under the stated conditions, A [x← z] is


defined. Thus, by Ax2,
 A [x← z] → (∃x)A
By ∃-introduction – since z is not free in (∃x)A – we also have
 (∃z) A [x← z] → (∃x)A (1)
We note that x is not free in (∃z)A [x← z] and is free for z in A [x← z].
Indeed, A [x← z][z← x] ≡ A. Thus, by Ax2,

 A → (∃z) A [x← z]
I.4. Basic Metatheorems 47

Hence, by ∃-introduction
 (∃x)A → (∃z) A [x← z] (2)
Tautological implication from (1) and (2) concludes the argument. 

Why is A [x← z][z← x] ≡ A? We can see this by induction on A (recall


that z occurs as neither free nor bound in A).
If A is atomic, then the claim is trivial. The claim also clearly “propagates”
with the propositional formation rules, that is, I.1.8(b).
Consider then the case that A ≡ (∃w)B . Note that w ≡ x is possible
under our assumptions, but w ≡ z is not. If w ≡ x, then A [x← z] ≡ A; in
particular, z is not free in A; hence A [x← z][z← x] ≡ A as well.
So let us work with w ≡ x. By the I.H., B [x← z][z← x] ≡ B . Now
A [x← z][z ← x] ≡ ((∃w)B )[x← z][z← x]
≡ ((∃w)B [x← z])[z← x] see I.3.11; w ≡ z
≡ ((∃w)B [x← z][z← x]) see I.3.11; w ≡ x
≡ ((∃w)B ) I.H.
≡A

By I.4.13, the issue of substitutability becomes moot. Since we have an infinite


supply of variables (to use, for example, as bound variables), we can always
change the names of all the bound variables in A so that the new names are
different from all the free variables in A or t. In doing so we obtain a formula
B that is (absolutely) provably equivalent to the original.
Then B [x ← t] will be defined (t will be substitutable in x). Thus, the moral
is “any term t is free for x in A after an appropriate ‘dummy’ renaming”.

I.4.14 Definition. In the sequel we will often discuss two (or more) theories at
once. Let T = (L , , I, T ) and T = (L  , , I, T  ) be two theories such that
V ⊆ V  . This enables T to be “aware” of all the formulas of T (but not
vice versa, since L  may contain additional nonlogical symbols – case where
V = V  ).
We say that T is an extension of T, in symbols T ≤ T , iff T ⊆ T  .
Let A be a formula over L (so that both theories are aware of it). The
symbols T A and T  A are synonymous with A ∈ T and A ∈ T 
respectively.
Note that we did not explicitly mention the nonlogical axioms  or   to the
left of , since the subscript of  takes care of that information.
We say that the extension is conservative iff for any A over L, whenever
T  A it is also the case that T A. That is, when it comes to formulas over
48 I. A Bit of Logic: A User’s Toolbox

the language (L) that both theories understand, then the new theory does not
do any better than the old in producing theorems. 

I.4.15 Metatheorem (Metatheorem on Constants). Let us extend a language


L of a theory T by adding new constant symbols e1 , . . . , en to the alphabet V ,
resulting in the alphabet V  , language L  , and theory T . Furthermore, assume
that   = , that is, we did not add any new nonlogical axioms.
Then T  A [e1 , . . . , en ] implies T A [x1 , . . . , xn ], for any variables
x1 , . . . , xn that occur nowhere in A [e1 , . . . , en ], as either free or bound
variables.

Proof. Fix a set of variables x1 , . . . , xn as described above. We do induction on


T -theorems.
Basis. A [e1 , . . . , en ] is a logical axiom (over L  ); hence so is A [x1 , . . . , xn ],
over L – because of the restriction on the xi . Thus T A [x1 , . . . , xn ]. Note
that if A [e1 , . . . , en ] is nonlogical, then so is A [x1 , . . . , xn ] under our
assumptions.

Pause. What does the restriction on the xi have to do with the claim above?
Modus ponens. Here T  B [e1 , . . . , en ] → A [e1 , . . . , en ] and T 
B [e1 , . . . , en ]. By I.H., T B [y1 , . . . , yn ] → A [y1 , . . . , yn ] and T
B [y1 , . . . , yn ], where y1 , . . . , yn occur nowhere in B [e1 , . . . , en ] →
A [e1 , . . . , en ] as either free or bound variables. By modus ponens, T
A [y1 , . . . , yn ]; hence T A [x1 , . . . , xn ] by I.4.12 (and I.4.13).
∃-introduction. We have T  B [e1 , . . . , en ] → C [e1 , . . . , en ], z is not free in
C [e1 , . . . , en ], and A [e1 , . . . , en ] ≡ (∃z)B [e1 , . . . , en ] → C [e1 , . . . , en ]. By
the I.H., if w1 , . . . , wn – distinct from z – occur nowhere in B [e1 , . . . , en ] →
C [e1 , . . . , en ] as either free or bound, then we get T B [w1 , . . . , wn ] →
C [w1 , . . . , wn ]. By ∃-introduction we get T (∃z)B [w1 , . . . , wn ] →
C [w1 , . . . , wn ]. By I.4.12 and I.4.13 we get T (∃z)B [x1 , . . . , xn ] →
C [x1 , . . . , xn ], i.e., T A [x1 , . . . , xn ]. 

I.4.16 Corollary. Let us extend a language L of a theory T by adding new


constant symbols e1 , . . . , en to the alphabet V , resulting in the alphabet V  ,
language L  , and theory T . Furthermore, assume that   = , that is, we did
not add any new nonlogical axioms.
Then T  A [e1 , . . . , en ] iff T A [x1 , . . . , xn ] for any choice of variables
x1 , . . . , xn .
I.4. Basic Metatheorems 49

Proof. If part. Trivially, T A [x1 , . . . , xn ] implies T  A [x1 , . . . , xn ]; hence


T  A [e1 , . . . , en ] by I.4.12.
Only-if part. Choose variables y1 , . . . , yn that occur nowhere in A [e1 , . . . ,
en ] as either free or bound. By I.4.15, T A [y1 , . . . , yn ]; hence, by I.4.12 and
I.4.13, T A [x1 , . . . , xn ]. 

I.4.17 Remark. Thus, the extension T of T is conservative; for, if A is over


L, then A [e1 , . . . , en ] ≡ A. Therefore, if T  A, then T  A [e1 , . . . , en ];
hence T A [x1 , . . . , xn ], that is, T A.
A more emphatic way to put the above is this: T is not aware of any new
nonlogical facts that T did not already know, albeit by a different name. If
T can prove A [e1 , . . . , en ], then T can prove the same statement, using any
names (other than the ei ) that are meaningful in its own language; namely, it
can prove A [x1 , . . . , xn ]. 

The following corollary stems from the proof (rather than the statement)
of I.4.15 and I.4.16, and is important.

I.4.18 Corollary. Let e1 , . . . , en be constants that do not appear in the nonlog-


ical axioms . Then, if x1 , . . . , xn are any variables, and if   A [e1 , . . . , en ],
it is also the case that   A [x1 , . . . , xn ].

I.4.19 Metatheorem (The Deduction Theorem). For any closed formula A ,


arbitrary formula B , and set of formulas , if  + A  B , then  
A →B.

N.B.  + A denotes the augmentation of  by adding the formula A.


In the present metatheorem A is a single (but unspecified) formula. However,
the notation extends to the case where A is a schema, in which case it means
the augmentation of  by adding all the instances of the schema.
A converse of the metatheorem is also true trivially: That is,   A → B
implies  + A  B . This direction immediately follows by modus ponens
and does not require the restriction on A.

Proof. The proof is by induction on  + A-theorems.


Basis. Let B be logical or nonlogical (but, in the latter case, assume
B ≡ A). Then   B . Since B |=Taut A → B , it follows by I.4.1 that
  A → B.
Now, if B ≡ A, then A → B is a logical axiom (group Ax1); hence
  A → B once more.
50 I. A Bit of Logic: A User’s Toolbox

Modus ponens. Let  + A  C and  + A  C → B . By I.H.,  


A → C and   A → C → B . Since A → C , A → C → B |=Taut
A → B , we have   A → B .
∃-introduction. Let  + A  C → D and B ≡ (∃x)C → D , where x is
not free in D . By the I.H.,   A → C → D . By I.4.1,   C → A → D ;
hence   (∃x)C → A → D by ∃-introduction (A is closed). One more
application of I.4.1 yields   A → (∃x)C → D . 

I.4.20 Remark. (1) Is the restriction that A must be closed important? Yes.
Let A ≡ x = a, where “a” is some constant. Then, even though A  (∀x)A
by generalization, it is not always true† that  A → (∀x)A. This follows from
soundness considerations (next section). Intuitively, assuming that our logic
“doesn’t lie” (that is, it proves no “invalid” formulas), we immediately infer
that x = a → (∀x)x = a cannot be absolutely provable, for it is a “lie”. It fails
at least over N, if a is interpreted to be “0”.
(2) I.4.16 adds flexibility to applications of the deduction theorem:
T (A → B )[x1 , . . . , xn ] (∗)
where [x1 , . . . , xn ] is the list of all free variables just in A, is equivalent
(by I.4.16) to
T  (A → B )[e1 , . . . , en ] (∗∗)
where e1 , . . . , en are new constants added to V (with no effect on nonlogical
axioms:  =   ). Now, since A [e1 , . . . , en ] is closed, proving
  + A [e1 , . . . , en ]  B [e1 , . . . , en ]
establishes (∗∗), hence also (∗).
In practice, one does not perform this step explicitly, but ensures that,
throughout the  + A-proof, whatever free variables were present in A
“behaved like constants”, or, as we also say, were “frozen”.
(3) In some expositions the deduction theorem is not constrained by requiring
that A be closed (e.g., Bourbaki (1966b), and more recently Enderton (1972)).

Which version is right? Both are, in their respective contexts. If all the
primary rules of inference are “propositional” (e.g., as in Bourbaki (1966b) and
Enderton (1972), who only employ modus ponens) – that is, these rules do not
meddle with quantifiers – then the deduction theorem is unconstrained. If, on
the other hand, full generalization, namely, A  (∀x)A, is a permissible rule
(primary or derived), then one cannot avoid constraining the application of the

† That is, it is not true in the metatheory that we can prove A → (∀x)A without nonlogical
axioms (absolutely).
I.4. Basic Metatheorems 51

deduction theorem, lest one want to derive (the invalid)  A → (∀x)A from
the valid A  (∀x)A.
This also entails that approaches such as in Bourbaki (1966b) and Enderton
(1972) do not derive full generalization. They only allow a weaker rule, “if
 A, then  (∀x)A”.†
(4) This divergence of approach in choosing rules of inference has some
additional repercussions. One has to be careful in defining the semantic counter-
part of , namely, |= (see next section). One wants the two symbols to track
each other faithfully (Gödel’s completeness theorem).‡ 

I.4.21 Corollary (Proof by Contradiction). Let A be closed. Then   A


iff  + ¬A is inconsistent.

Proof. If part. Given that Thm+¬A = Wff. In particular,  + ¬A  A. By


the deduction theorem,   ¬A → A. But ¬A → A |=Taut A.
Only-if part. Given that   A. Hence  + ¬A  A as well (recall
I.3.18(2)). Of course,  + ¬A  ¬A too. Since A, ¬A |=Taut B for an
arbitrary B , we are done. 

Pause. Is it necessary to assume that A is closed in I.4.21? Why?

The following is important enough to merit stating. It follows from the type
of argument we employed in the only-if part above.

I.4.22 Metatheorem. T is inconsistent iff for some A, both T A and


T ¬A hold.

We also list below a number of quotable proof techniques. These techniques


are routinely used by mathematicians, and will be routinely used by us in what
follows. The proofs of all the following metatheorems are delegated to the
reader.

I.4.23 Metatheorem (Distributivity or Monotonicity of ∃). For any x, A, B ,

A → B  (∃x)A → (∃x)B

Proof. See Exercise I.11. 

† Indeed, they allow a bit more generality, namely, the rule “if   A with a side condition, then
  (∀x)A. The side condition is that the formulas of  do not have free occurrences of x”. Of
course,  can always be taken to be finite (why?), so that this condition is not unrealistic.
‡ In Mendelson (1987) |= is defined inconsistently with .
52 I. A Bit of Logic: A User’s Toolbox

I.4.24 Metatheorem (Distributivity or Monotonicity of ∀). For any x, A, B ,

A → B  (∀x)A → (∀x)B

Proof. See Exercise I.12. 

The term “monotonicity” is inspired by thinking of “→” as “≤”. How? Well,


we have the tautology

(A → B ) ↔ (A ∨ B ↔ B) (i)

If we think of “A ∨ B ” as “max(A, B )”, then the right hand side in (i) above
says that B is the maximum of A and B . Or that A is “less than or equal to”
B . The above metatheorems say that both ∃ and ∀ preserve this “inequality”.

I.4.25 Metatheorem (The Equivalence Theorem, or Leibniz Rule). Let  


A ↔ B , and let C  be obtained from C by replacing some – possibly,
but not necessarily, all – occurrences of a subformula A of C by B . Then
  C ↔ C  , i.e.,
A ↔B
C ↔C
is a derived rule.

Proof. The proof is by induction on formulas C . See Exercise I.14. 

Equational or calculational predicate logic is a particular foundation of first


order logic that uses the above Leibniz rule as the primary rule of inference.
In applying such logic one prefers to write proofs as chains of equivalences.
Most equivalences in such a chain stem from an application of the rule. See
Dijkstra and Scholten (1990), Gries and Schneider (1994), Tourlakis (2000a,
2000b, 2001).

I.4.26 Metatheorem (Proof by Cases). Suppose that   A 1 ∨ · · · ∨ A n ,


and   A i → B for i = 1, . . . , n. Then   B .

Proof. Immediate, by I.4.1. 

Proof by cases usually benefits from the application of the deduction theorem.
That is, having established   A 1 ∨ · · · ∨ A n , one then proceeds to adopt,
in turn, each A i (i = 1, . . . , n) as a new nonlogical axiom (with its variables
I.5. Semantics 53

“frozen”). In each case (A i ) one proceeds to prove B . At the end of all this
one has established   B .
In practice we normally use the following argot:
“We will consider cases A i , for i = 1, . . . , n.
Case A 1 . . . . therefore, B .†
...
Case A n . . . . therefore, B .”

I.4.27 Metatheorem (Proof by Auxiliary Constant). Suppose that for formu-


las A and B over the language L we know
(1)   (∃x)A [x],
(2)  + A [a]  B , where a is a new constant not in the language L of .
Furthermore assume that in the proof of B all the free variables of A [a]
were frozen.
Then   B .

Proof. Exercise I.18. 

The technique that flows from this metatheorem is used often in practice. For
example, in projective geometry axiomatized as in Veblen and Young (1916), in
order to prove Desargues’s theorem on perspective triangles on the plane we use
some arbitrary point (this is the auxiliary constant) off the plane, having verified
that the axioms guarantee that such a point exists. It is important to note that De-
sargues’s theorem does not refer to this point at all – hence the term “auxiliary”.

Note. In this example, from projective geometry, “B ” is Desargues’s theorem,


“(∃x)A [x]” asserts that there are points outside the plane, a is an arbitrary such
point, and the proof (2) starts with words like “Let a be a point off the plane” –
which is argot for “add the axiom A [a]”.

I.5. Semantics
So what do all these symbols mean? We show in this section how to decode
the formal statements (formulas) into informal statements of real mathematics.
Conversely, this will entail an understanding of how to code statements of real
mathematics in our formal language.

† That is, we add the axiom A 1 to , freezing its variables, and we then prove B .
54 I. A Bit of Logic: A User’s Toolbox

The rigorous† definition of semantics for first order languages is due to


Tarski and is often referred to as “Tarski semantics”. The flavour of the particular
definition given below is that of Shoenfield (1967), and it accurately reflects our
syntactic choices – most importantly, the choice to permit “full” generalization
A  (∀x)A. In particular, we will define the semantic counterpart of , name-
ly, |=, pronounced “logically implies”, to ensure that   A iff  |= A. This
is the content of Gödel’s completeness theorem, which we state without proof
in this section (for a proof see, e.g., our volume 1, Mathematical Logic).
This section will assume some knowledge of notation and elementary facts
from Cantorian (naı̈ve) set theory. We will, among other things, make use of
notation such as

An (or A × ·
· · × A)
n times

for the set of ordered n-tuples of members of A. We will also use the symbols

⊆, ∪, a∈I .‡

I.5.1 Definition. Given a language L = (V , Term, Wff ), a structure M =


(M, I ) appropriate for L is such that M = ∅ is a set (the domain or underlying
set or universe§ ) and I (“I ” for interpretation) is a mapping that assigns

(1) to each constant a of V a unique member a I ∈ M,


(2) to each function f of V – of arity n – a unique (total)¶ function f I :
M n → M,
(3) to each predicate P of V – of arity n – a unique set P I ⊆ M n .# 

I.5.2 Remark. The structure M is often written more verbosely, in conformity


with practice in algebra. Namely, one unpacks the I into a list a I , bI , . . . ; f I ,
g I , . . . ; P I , Q I , . . . and writes instead M = (M; a I , bI , . . . ; f I , g I , . . . ;

† One often says “The formal definition of semantics . . .”, but the word “formal” is misleading
here, for we are actually defining semantics in the metatheory (in “real” mathematics), not in
some formal theory.
‡ If we have a set of sets {Sa , Sb , Sc , . . . }, where the indices a, b, c, . . . all come out of an index
set I , then the symbol i∈I Si stands for the collection of all those objects x that are found in at

least one of the sets Si . It is a common habit to write i=0 Si instead of i∈N Si . A ∪ B is the

same as i∈{1,2} Si , where we have let S1 = A and S2 = B.
§ Often the qualification “of discourse” is added to the terms “domain” and “universe”.
¶ Requiring f I to be total is a traditional convention. By the way, total means that f I is defined
everywhere on M n .
# Thus P I is an n-ary relation with inputs and outputs in M.
I.5. Semantics 55

P I , Q I , . . . ). Under this understanding, a structure is an underlying set (uni-


verse), M, along with a list of “concrete” constants, functions, and relations
that “interpret” corresponding “abstract” items of the language.
Under the latter notational circumstances we often use the symbols a M, f M,
M
P (rather than a I , etc.) to indicate the interpretations in M of the constant a,
function f , and predicate P respectively.
We have said above “structure appropriate for L”, thus emphasizing the
generality of the language and therefore our ability to interpret what we say in
it in many different ways.
Often though (e.g., as in formal arithmetic and set theory), we have a structure
in mind to begin with, and then build a formal language to formally codify
statements about the objects in the structure. Under these circumstances, in
effect, we define a language appropriate for the structure. We use the symbol
L M to indicate that the language was built to fit the structure M. 

I.5.3 Definition. We routinely add symbols to a language L (by adding new


nonlogical symbols) to obtain a language L  . We say that L  is an extension of
L and that L is a restriction of L  . Suppose that M = (M, I ) is a structure
for L, and let M = (M, I  ) be a structure with the same underlying set M,
but with I extended to I  so that the latter gives meaning to all new symbols
while it gives the same meaning as I does to the symbols of L.
We call M an expansion (rather than extension) of M, and M a reduct
(rather than restriction) of M . We often write I = I   L to indicate that the
mapping I  – restricted to L (symbol “ ”) – equals I . 

I.5.4 Definition. Given L and a structure M = (M, I ) appropriate for L.


L(M) denotes the language obtained from L by adding to V a unique new
name i for each object i ∈ M.
This amends the sets Term, Wff into Term(M), Wff(M). Members of the
latter sets are called M-terms and M-formulas respectively.
I
We extend the mapping I to the new constants by: i = i for all i ∈ M
(where the “=” here is metamathematical: equality on M). 

All that we have done here is to allow ourselves to do substitutions like [x ← i]


formally. We do instead [x ← i]. One next gives “meaning” to all closed terms
in L(M). The following uses definition by recursion (I.2.13) and relies on the
fact that the rules that define terms are unambiguous.
56 I. A Bit of Logic: A User’s Toolbox

I.5.5 Definition. For closed terms t in Term(M) we define the symbol t I ∈ M


inductively:
(1) If t is either a (original constant) or i (imported constant), then t I has
already been defined.
(2) If t is the string f t1 . . . tn , where f is n-ary and t1 , . . . , tn are closed
I I
M-terms, we define t I to be the object (of M) f I (t1 , . . . , tn ). 

Finally, we give meaning to all closed M-formulas, again by recursion (over


Wff).

I.5.6 Definition. For any closed formula A in Wff(M) we define the symbol
A I inductively. In all cases, A I ∈ {t, f}:
(1) If A ≡ t = s, where t and s are closed M-terms, then A I = t iff
t I = s I . (The last two occurrences of “=” are metamathematical.)
(2) If A ≡ Pt1 . . . tn , where P is an n-ary predicate and the ti are closed
I I I I
M-terms, then A I = t iff t1 , . . . , tn  ∈ P I or P I (t1 , . . . , tn ) holds.
(Or “is true”; see p. 20. Of course, the last occurrence of “=” is meta-
mathematical.)
(3) If A is any of the sentences ¬B , B ∨ C , then A I is determined by
the usual truth tables (see p. 31) using the values B I and C I . That is,
(¬B )I = F¬ (B I ) and (B ∨ C )I = F∨ (B I , C I ). (The last two occur-
rences of “=” are metamathematical.)
(4) If A ≡ (∃x)B , then A I = t iff (B [x ← i])I = t for some i ∈ M.
(The last two occurrences of “=” are metamathematical.) 

We have “imported” constants from M into L in order to be able to state


the semantics of (∃x)B above in the simple manner we just did (following
Shoenfield (1967)).
We often state the semantics of (∃x)B by writing
 I
(∃x)B [x] is true iff (∃i ∈ M)(B [i])I is true

I.5.7 Definition. Let A ∈ Wff, and M be a structure as above.


An M-instance of A is an M-sentence A(i 1 , . . . , i k ) (that is, all the free
variables of A have been replaced by imported constants).
We say that A is valid in M, or that M is a model of A, iff for all M-
instances A  of A it is the case that A  I = t.† Under these circumstances we
write |=M A.
† We henceforth discontinue our pedantic “(The last occurrence of “=” is metamathematical.)”.
I.5. Semantics 57

For any set of formulas  from Wff, |=M , pronounced “M is a model of


”, means that |=M A for all A ∈ .
A formula A is universally valid or logically valid (we often say just valid)
iff every structure appropriate for the language is a model of A.
Under these circumstances we simply write |= A.
If  is a set of formulas, then we say it is satisfiable iff it has a model. It is
finitely satisfiable iff every finite subset of  has a model.† 

Contrast the concept of satisfiability here with that of propositional satisfiabil-


ity (I.3.6). The definition of validity of A in a structure M corresponds with the
normal mathematical practice. It says that a formula is true (in a given “context”
M) just in case it is so for all possible values of the free variables.

I.5.8 Definition. We say that  logically implies A, in symbols  |= A, to


mean that every model of  is also a model of A. 

I.5.9 Definition (Soundness). A theory (identified by its nonlogical axioms)


 is sound iff, for all A ∈ Wff,   A implies  |= A, that is, iff all the
theorems of the theory are logically implied by the nonlogical axioms. 

Clearly then, a pure theory T is sound iff T A implies |= A for all A ∈


Wff. That is, all its theorems are universally valid.

Towards the soundness result‡ below we look at two tedious (but easy)
lemmata.

I.5.10 Lemma. Given a term t, variables x ≡ y, where y does not occur in t,


and a constant a. Then, for any term s and formula A , s[x ← t][y ← a] ≡
s[y ← a][x ← t] and A [x ← t][y ← a] ≡ A [y ← a][x ← t].

Proof. Induction on s: Basis:




 if s ≡ x then t

if s ≡ y then a
s[x ← t][y ← a] ≡

 if s ≡ z, where x ≡
 z ≡ y, then z

if s ≡ b then b
≡ s[y ← a][x ← t]

† These two concepts are often defined just for sentences.


‡ Also nicknamed “the easy half of Gödel’s completeness theorem”.
58 I. A Bit of Logic: A User’s Toolbox

For the induction step let s ≡ f r1 . . . rn , where f has arity n. Then


s[x ← t][y ← a] ≡ f r1 [x ← t][y ← a] . . . rn [x ← t][y ← a]
≡ f r1 [y ← a][x ← t] . . . rn [y ← a][x ← t] by I.H.
≡ s[y ← a][x ← t]
Induction on A: Basis:


 if A ≡ Pr1 . . . rn then



 Pr1 [x ← t][y ← a] . . . rn [x ← t][y ← a] ≡


Pr1 [y ← a][x ← t] . . . rn [y ← a][x ← t]
A [x ← t][y ← a] ≡

 if A ≡ r = s then



 r [x ← t][y ← a] = s[x ← t][y ← a] ≡


r [y ← a][x ← t] = s[y ← a][x ← t]
≡ A [y ← a][x ← t]

The property we are proving, trivially, propagates with Boolean connectives.


Let us do the induction step just in the case where A ≡ (∃w)B . If w ≡ x or
w ≡ y, then the result is trivial. Otherwise,

A [x ← t][y ← a] ≡ ((∃w)B )[x ← t][y ← a]


≡ ((∃w)B [x ← t][y ← a])
≡ ((∃w)B [y ← a][x ← t]) by I.H.
≡ ((∃w)B )[y ← a][x ← t]
≡ A [y ← a][x ← t]


I.5.11 Lemma. Given a structure M = (M, I ), a term s and a formula A,


both over L(M). Furthermore, each of s and A have at most one free variable,
namely, x.
Let t be a closed term over L(M) such that t I = i ∈ M. Then (s[x ← t])I =
(s[x ← i])I and (A [x ← t])I = (A [x ← i])I . Of course, since t is closed,
A [x ← t] is defined.

Proof. Induction on s: Basis: s[x ← t] ≡ s if s ∈ {y, a, j} (y ≡ x). Hence


(s[x ← t])I = s I = (s[x ← i])I in this case. If s ≡ x, then s[x ← t] ≡ t and
s[x ← i] ≡ i, and the claim follows once more.
For the induction step let s ≡ f r1 . . . rn , where f has arity n. Then
 
(s[x ← t])I = f I (r1 [x ← t])I , . . . , (rn [x ← t])I
 
= f I (r1 [x ← i])I , . . . , (rn [x ← i])I by I.H.
= (s[x ← i])I
I.5. Semantics 59

Induction on A: Basis: If A ≡ Pr1 . . . rn , then†


 
(A [x ← t])I = P I (r1 [x ← t])I , . . . , (rn [x ← t])I
 
= P I (r1 [x ← i])I , . . . , (rn [x ← i])I
= (A [x ← i])I
Similarly if A ≡ r = s.
The property we are proving, clearly, propagates with Boolean connectives.
Let us do the induction step just in the case where A = (∃w)B . If w ≡ x the
result is trivial. Otherwise, we note that – since t is closed – w does not occur
in t, and proceed as follows:
 I
(A [x ← t])I = t iff ((∃w)B )[x ← t] = t
 I
iff ((∃w)B [x ← t]) = t
iff (B [x ← t][w ← j])I = t for some j ∈ M, by I.5.6(4)
iff (B [w ← j][x ← t])I = t for some j ∈ M, by I.5.10
 I
iff (B [w ← j])[x ← t] = t for some j ∈ M
 I
iff (B [w ← j])[x ← i] = t for some j ∈ M, by I.H.
iff (B [w ← j][x ← i])I = t for some j ∈ M
iff (B [x ← i][w ← j])I = t for some j ∈ M, by I.5.10
 I
iff ((∃w)B [x ← i]) = t by I.5.6(4)
 I
iff ((∃w)B )[x ← i] = t
iff (A [x ← i])I = t


I.5.12 Metatheorem (Soundness). Any first order theory (identified by its non-
logical axioms) , over some language L, is sound.

Proof. By induction on -theorems A, we prove that  |= A. That is, we fix


a structure for L, say M, and assume that |=M . We then proceed to show that
|=M A.
Basis: If A is a nonlogical axiom, then our conclusion is part of the as-
sumption, by I.5.7.
If A is a logical axiom, there are a number of cases:

Case 1. |=Taut A. We fix an M-instance of A, say A  , and show that


A  I = t. Let p1 , . . . , pn be all the propositional variables (alias,
prime formulas) occurring in A  . Define a valuation v by setting

† For a metamathematical relation Q, as usual (p. 20), Q(a, b, . . . ) = t, or just Q(a, b, . . . ), stands
for a, b, . . .  ∈ Q.
60 I. A Bit of Logic: A User’s Toolbox

v( pi ) = pi for i = 1, . . . , n. Clearly, t = v̄(A  ) = A  I (the first “=”


I

because |=Taut A  , the second because after prime formulas have been
taken care of, all that remains to be done for the evaluation of A  I is
to apply Boolean connectives – see I.5.6(3)).

Pause. Why is |=Taut A  ?

Case 2. A ≡ B [t] → (∃x)B . Again, we look at an M-instance B  [t  ] →


(∃x)B  . We want (B  [t  ] → (∃x)B  )I = t, but suppose instead that

(B  [t  ])I = t (1)

and
 I
(∃x)B  = f (2)
I 
Let t = I i (i ∈ M). By I.5.11 and (1), (B [i]) = t. By I.5.6(4),
I

(∃x)B = t, contradicting (2).


Case 3. A ≡ x = x. Then an arbitrary M-instance is i = i for some i ∈ M.
By I.5.6(1), (i = i)I = t.
Case 4. A ≡ t = s → (B [t] ↔ B [s]). Once more, we take an arbitrary M-
instance, t  = s  → (B  [t  ] ↔ B  [s  ]). Suppose that (t  = s  )I = t.
That is, t  I = s  I = (let us say) i (in M). But then

(B  [t  ])I = (B  [i])I , by I.5.11


= (B  [s  ])I , by I.5.11

Hence (B [t] ↔ B [s])I = t.

For the induction step we have two cases:


Modus ponens. Let B and B → A be -theorems. Fix an M-instance
B → A  . Since B  , B  → A  |=Taut A  , the argument here is entirely analo-


gous to the case A ∈  (hence we omit it).


∃-introduction. Let A ≡ (∃x)B → C and   B → C , where x is not
free in C . By the I.H.

|=M B → C (3)
 
Let
 (∃x)B  → C be an M-instance such that (despite expectations)
 I
(∃x)B = t but
I
C =f (4)
I.5. Semantics 61

Thus

B  [i] = t
I
(5)

for some i ∈ M. Since x is not free in C , B  [i] → C  is a false (by (4) and
(5)) M-instance of B → C , contradicting (3). 

We have used the condition of ∃-introduction above, by saying “Since x is not


free in C , B  [i] → C  is a[n] . . . M-instance of B → C ”.
So the condition was useful. But is it essential? Yes, since, for example, if
x ≡ y, then x = y → x = y |= (∃x)x = y → x = y.

As a corollary of soundness we have the consistency of pure theories:

I.5.13 Corollary. Any first order pure theory is consistent.

Proof. Let T be a pure theory over some language L. Since |= ¬x = x, it


follows that T ¬x = x, thus T = Wff. 

By  A and |= A we mean the metatheoretical statements “ ‘ A’ is false”


and “ ‘|= A’ is false” respectively.

I.5.14 Corollary. Any first order theory that has a model is consistent.

Proof. Let T be a first theory over some language L, and M a model of T.


Since |=M ¬x = x, it follows that T ¬x = x, thus T = Wff. 

First order definability in a structure. We are now in a position to make the


process of “translation” to and from informal mathematics rigorous.

I.5.15 Definition. Let L be a first order language, and M a structure for L. A


set (synonymously, relation) S ⊆ M n is (first order) definable in M over L
iff for some formula S (y1 , . . . , yn ) (see p. 19 for a reminder on round-bracket
notation) and for all i j , j = 1, . . . , n, in M,

i 1 , . . . , i n  ∈ S iff |=M S (i 1 , . . . , i n )

We often just say “definable in M”.


A function f : M n → M is definable in M over L iff the relation y =
f (x1 , . . . , xn ) is so definable. 
62 I. A Bit of Logic: A User’s Toolbox

N.B. Some authors say “(first order) expressible” (Smullyan (1992)) rather
than “(first order) definable” in a structure.
In the context of M, the above definition gives precision to statements such
as “we code (or translate) an informal statement into the formal language” or
“the (formal language) formula A informally ‘says’ . . . ”, since any (informal)
“statement” (or relation) that depends on the informal variables x1 , . . . , xn has
the form “x1 , . . . , xn  ∈ S” for some (informal) set S. It also captures the
essence of the statement “The (informal) statement x1 , . . . , xn  ∈ S can be
written (or can be made) in the formal language.”

What “makes” the statement, in the formal language, is the formula S .

I.5.16 Example. The informal statement “z is a prime” has a formal translation


 
S0 < z ∧ (∀x)(∀y) z = x × y → x = z ∨ x = S0
over the language of elementary number theory, where nonlogical symbols
are 0, S, +, ×, < and the definition (translation) is effected in the standard
structure N = (N; 0; S, +, ×; <), where “S” satisfies, for all n ∈ N, S(n) =
n + 1 and interprets “S” (see I.5.2, p. 54, for the “unpacked” notation we have
just used to denote the structure N). We have used the variable name “z” both
formally and informally, but we have used a typographical trick: The formal
variable was in boldface type while the informal one was in lightface. 

It must be said that translation is not just an art or skill. There are theoretical
limitations to translation. The trivial limitation is that if M is an infinite set and,
say, L has a finite set of nonlogical symbols (as is the case in arithmetic and
set theory), then we cannot define all S ⊆ M, simply because we do not have
enough first order formulas to do so.
There are non-trivial limitations too. Some sets are not first order definable
because their definitions are “far too complex” (the reader who wants more
on this comment may wish to look up the section on definability and incom-
pletableness in volume 1 of these lectures (Mathematical Logic)).

This is a good place to introduce a common notational argot that allows us to


write mixed-mode formulas that have a formal part (over some language L)
but may contain informal constants (names, to be sure, but names that have not
formally been imported into L) from some structure M appropriate for L.

I.5.17 Informal Definition. Let L be a first order language, and M = (M, I )


a structure for L. Let A be a formula with at most x1 , . . . , xn free, and i 1 , . . . , i n
I.5. Semantics 63

be
 members of I M. The notation A [[ i 1 , . . . , i n ]] is an abbreviation of
A [i 1 , . . . , i n ] . 

This argot allows one to substitute informal objects into variables outright,
by-passing the procedure of importing formal names for such objects into the
language. It is noteworthy that mixed mode formulas can be defined directly by
induction on formulas – that is, without forming L(M) first – as follows:
Let L and M be as above. Let x1 , . . . , xn contain all the free variables that
appear in a term t or formula A over L (not over L(M)). Let i 1 , . . . , i n be
arbitrary in M.
For terms we define

t [[ i 1 , . . . i n ]]

i j if t ≡ x j (1 ≤ j ≤ n)
= aI if t ≡ a
 I 
f t1 [[ i 1 , . . . , i n ]] , . . . , tr [[ i 1 , . . . , i n ]] if t ≡ f t1 . . . tr

For formulas we let

A [[ i 1 , . . . i n ]]


 t [[i 1 , . . . i n ]] = s [[ i 1 , . . . i n ]] if A ≡t =s

 

 P t1 [[ i 1 , . . . , i n ]] , . . . , tr [[ i 1 , . . . , i n ]] if A ≡ Pt1 . . . tr
= ¬ B [[ i 1 , . . . i n ]] if A ≡ ¬B

  

 B [[ i 1 , . . . i n ]] ∨ C [[ i 1 , . . . , i n ]] if A ≡B ∨C

(∃a ∈ M)B [[ a, i , . . . , i ]]
1 n if A ≡ (∃z)B [z, xn ]

where “(∃a ∈ M) . . . ” is short for “(∃a)(a ∈ M ∧ . . . )”. The right hand side
of = has no free (informal) variables, thus it evaluates to t or f.

We now turn to the “hard half” of Gödel’s completeness theorem, which


states that our syntactic proof apparatus can faithfully mimic proofs by logical
implication. That is, the syntactic apparatus is “complete”.

I.5.18 Definition. A theory over L (designated by its nonlogical axioms)  is


semantically complete iff  |= A implies   A for any formula A. 

The term “semantically complete” is not used much. There is a competing


syntactic notion of completeness, that of simple completeness, also called just
completeness. The latter is the notion one has normally in mind when saying
“a complete theory”, or, in the opposite case, incomplete.
64 I. A Bit of Logic: A User’s Toolbox

The proof of the semantic completeness of every first order theory hinges on
the consistency theorem, which we state without proof below.† The complete-
ness theorem will then be derived as a corollary.

I.5.19 Metatheorem (Consistency Theorem). If a (first order) theory T is


consistent, then it has a model.

Metamathematically speaking, a set S is countable if it is finite or it can be put


in 1-1 correspondence with N. The latter means that there is a total function
f : N → S that is onto – that is, (∀x ∈ S)(∃n ∈ N) f (n) = x is true – and 1-1.
“1-1” means that (∀n ∈ N)(∀m ∈ N)( f (n) = f (m) → n = m) is true.
A set that is not countable is uncountable. Cantor has proved that the set of
reals, R, is uncountable.
By definition, a language L is countable or uncountable iff the set of its
nonlogical symbols is.
By definition, a model is countable or uncountable iff its domain is.

The technique of proof of I.5.19 yields the following important corollaries.

I.5.20 Corollary. A consistent theory over a countable language has a count-


able model.

I.5.21 Corollary (Löwenheim-Skolem Theorem). If a set of formulas  over


a countable language has a model, then it has a countable model.

I.5.22 Corollary (Gödel’s Completeness Theorem – Hard Half). In any


countable first order language L,  |= A implies   A.

Proof. Let B denote the universal closure of A. By Exercise I.21,  |= B .


Thus,  + ¬B has no models (why?). Therefore it is inconsistent. Thus,   B
(by I.4.21), and hence (specialization),   A. 

A way to rephrase completeness is that if  |= A, then also  |= A, where


 ⊆  is finite. This follows by soundness, since  |= A entails   A and
hence   A, where  consists of just those formulas of  used in the proof
of A.

† For a proof see volume 1 of these lectures.


I.5. Semantics 65

I.5.23 Corollary (Compactness Theorem). In any countable first order lan-


guage L, a set of formulas  is satisfiable iff it is finitely satisfiable.

Proof. Only-if part. This is trivial, for a model of  is a model of any finite
subset.
If part. Suppose that  is unsatisfiable (it has no models). Then it is in-
consistent by the consistency theorem. In particular,   ¬x = x. Since the
pure theory over L is consistent, a -proof of ¬x = x involves a nonempty
finite sequence of nonlogical axioms (formulas of ), A 1 , . . . , A n . That is,
A 1 , . . . , A n  ¬x = x, hence {A 1 , . . . , A n } has no model (by soundness).
This contradicts the hypothesis. 

Now, if the language L is uncountable, we say that it has cardinality k if V


(or equivalently, the set of nonlogical symbols) does. Cardinality is studied
within ZFC in Chapter VII. However, to extend the consistency theorem and
its corollaries to uncountable L one only needs to have an understanding of
the informal Cantorian concept and of its basic properties (e.g., the “real”
counterpart of VII.5.17) along with a basic (informal) understanding of ordinals.
The following is true (for a proof outline see volume 1 of these lectures).

I.5.24 Metatheorem (Consistency Theorem). If a (first order) theory T over a


language L of cardinality k is consistent, then it has a model of cardinality ≤ k.

I.5.25 Corollary (Completeness Theorem). In any first order language L,


 |= A implies   A.

I.5.26 Corollary (Gödel-Mal cev Compactness Theorem). In any first order


language L, a set of formulas  is satisfiable iff it is finitely satisfiable.

The Löwenheim-Skolem theorem takes the following form:

I.5.27 Corollary (Upward Löwenheim-Skolem Theorem). If a set of formu-


las  over a language L of cardinality k has an infinite model, then it has a
model of any cardinality n such that k ≤ n.

At one extreme, ZFC set theory’s intended model is so huge that it is not
even a set (its domain, that is, is not). At the other extreme, set theory has only
two primary nonlogical symbols; hence, if we believe that it is consistent,† it has

† We will have an opportunity to explain this hedging later on.


66 I. A Bit of Logic: A User’s Toolbox

a countable model. Countable models play an important role in the metatheory


of ZFC (as we see, e.g., in Chapter VIII).
The (very condensed) material in this passage is not used anywhere in
this volume.

I.6. Defined Symbols


We have already mentioned that the language lives, and it is being constantly
enriched by new nonlogical symbols through definitions. The reason we do this
is to abbreviate undecipherably long formal texts, thus making them humanly
understandable.
There are three possible kinds of formal abbreviations, namely, abbreviations
of formulas, abbreviations of variable terms (i.e., objects that depend on free
variables), and abbreviations of constant terms (i.e., objects that do not depend
on free variables). Correspondingly, we introduce a new nonlogical symbol for
a predicate, a function, or a constant in order to accomplish such abbreviations.
Here are three simple examples, representative of each case.
We introduce a new predicate (symbol), “⊆”, in set theory by a definition†

A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B)

An introduction of a function symbol by definition is familiar from elemen-


tary mathematics. There is a theorem that says

“for every non-negative real number x there is a unique


(1)
non-negative real number y such that x = y · y”
This justifies the introduction of a 1-ary function symbol f that, for each such x,
produces the corresponding y. Instead of using the generic “ f (x)”, we normally

adopt one of the notations “ x” or “x 1/2 ”. Thus, we enrich the language (of,

say, algebra) by the function symbol and add as an axiom the definition of
its behaviour. This would be
√ √
x= x x

or

y= x ↔x =y·y

where the restriction x ≥ 0 is implied by the context.

† In practice we state the above definition in argot, probably as “A ⊆ B means that, for all x, we
have x ∈ A → x ∈ B”.
I.6. Defined Symbols 67

The “enabling formula” (1) – stated in argot above – is crucial in order



that we be allowed to introduce and its defining axiom. That is, before we
introduce an abbreviation of a (variable or constant) term – i.e., an object – we
must have a proof in our theory of an existential formula, i.e., one of the type
(∃!y)A, that asserts that (if applicable, for each “value” of the free variables)
a unique such object exists.

The symbol “(∃!y)” is read “there is a unique y”. It is a logical abbreviation


(defined logical symbol, just like ∀) given (in least parenthesized form) by
 
(∃x) A ∧ ¬(∃z)(A ∧ ¬x = z)

Finally, an example of introducing a new constant symbol, from set theory,


is the introduction of the symbol ∅ into the language, as the name of the unique
object† y that satisfies ¬U (y) ∧ (∀x)x ∈ / y, read “y is a set‡ and it has no
members”. Thus, ∅ is defined by
¬U (∅) ∧ (∀x)x ∈
/∅
or, equivalently, by
y = ∅ ↔ ¬U (y) ∧ (∀x)x ∈
/y
The general situation is this: We start with a theory , spoken in some
basic§ formal language L. As the development of  proceeds, gradually and
continuously we extend L into languages L n , for n ≥ 0 (we have set L 0 = L).
Thus the symbol L n+1 stands for some arbitrary extension of L n effected at stage
n + 1. The theory itself is being extended by stages, as a sequence n , n ≥ 0.
A stage is marked by the event of introducing a single new symbol into the
language via a definition of a new predicate, function, or constant symbol. At
that same stage we also add to n the defining nonlogical axiom of the new
symbol in question, thus extending the theory n into n+1 . We set 0 = .
Specifically, if ¶ Q (xn ) is some formula we then can introduce a new predi-
cate symbol “P”# that stands for Q .

† Uniqueness follows from extensionality, while existence follows from separation. These facts –
and the italicized terminology – are found in Chapter III.
‡ U is 1-ary (unary) predicate. It is one of the two primitive nonlogical symbols of formal set
theory. With the help of this predicate we can test an object for set or atom status. “ U (y)” asserts
that y is an atom; thus “¬U (y)” asserts that y is a set – since we accept that sets or atoms are the
only types of objects that the formal system axiomatically characterizes.
§ “Basic” means here the language given originally, before any new symbols were added.
¶ Recall that (see Remark I.1.11, p. 19) the notation Q (xn ) asserts that xn , i.e., x1 , . . . , xn is the
complete list of the free variables of Q .
# Recall that predicate letters are denoted by non-calligraphic capital letters P, Q, R with or without
subscripts or primes.
68 I. A Bit of Logic: A User’s Toolbox

In the present description, Q is a syntactic (meta-)variable, while P is a new


formal predicate symbol.

This entails adding P to L k (i.e., to its alphabet V k ) as a new n-ary predicate


symbol, and adding

P xn ↔ Q (xn ) (i)

to k as the defining axiom for P. “⊆” is such a defined (2-ary) predicate in set
theory.
Similarly, a new n-ary function symbol f is added into L k (to form L k+1 ) by
a definition of its behaviour. That is, we add f to L k and also add the following
formula (ii) to k as a new nonlogical axiom

y = f y1 . . . yn ↔ Q (y, y1 , . . . , yn ) (ii)

provided we have a proof in k of the formula

(∃!y)Q (y, y1 , . . . , yn ). (iii)

Depending on the theory and on the number of free variables (n ≥ 0), “ f ” may

take theory-specific names such as ∅, ω, , etc. (in this illustration, for the
sake of economy of effort, we have thought of defined constants, e.g., ∅ and ω,
as 0-ary functions).
In effecting these definitions, we want to be assured of two things:
(1) Whatever we can say in the richer language L k (for any k > 0) we can also
state in the original (basic) language L = L 0 (although awkwardly, which
justifies our doing all this). “Can be stated” means that we can translate any
formula F over L k (hopefully in a “natural” way) into a formula F ∗ over
L so that the extended theory k can prove that F and F ∗ are equivalent.†
(2) We also want to be assured that the new symbols offer no more than conve-
nience, in the sense that any formula F over the basic language L deducible
from k (k > 0), one way or another (perhaps with the help of defined sym-
bols) is also deducible from .‡
These assurances will become available shortly, as Metatheorems I.6.1 and I.6.3.
Here are the “natural” translation rules that take us from a language stage L k+1

† , spoken over L, can have no opinion, of course, since it cannot see the new symbols, nor does
it have their definitions among its “knowledge”.
‡ Trivially, any F over L that  can prove, any k (k > 0) can prove as well, since the latter
understands the language (L) and contains all the axioms of . Thus k extends the theory .
That it cannot have more theorems over L than  makes this extension conservative.

back to the previous, L k (so that, iterating the process, we get back to L):
Rule (1). Suppose that F is a formula over L k+1 , and that the predicate
P (whose definition took us from L k to L k+1 , and hence is a symbol of
L k+1 but not of L k ) occurs in F zero or more times. Assume that P has
been defined by the axiom (i) above (included in k+1 ), where Q is a
formula over L k . We eliminate P from F by replacing all its occurrences
by Q . That is, whenever P tn is a subformula of F , all its occurrences are
replaced by Q (tn ). We can always arrange by I.4.13 that the simultaneous
substitution Q [xn ← tn ] is defined. This results in a formula F ∗ over L k .
Rule (2). If f is a defined n-ary function symbol as in (ii) above, introduced
into L k+1 , and if it occurs in F as F [ f t1 . . . tn ],† then this formula is
logically equivalent to‡
(∃y)(y = f t1 . . . tn ∧ F [y]) (iv)
provided that y is not free in F [ f t1 . . . tn ]. Using the definition of f
given by (ii), and I.4.13 to ensure that Q (y, tn ) is defined, we eliminate
this occurrence of f , writing (iv) as
(∃y)(Q (y, t1 , . . . , tn ) ∧ F [y]) (v)
which says the same thing as (iv) in any theory that thinks that (ii) is
true (this observation is made precise in the proof of Metatheorem I.6.1).
Of course, f may occur many times in F , even “within itself”, as in
f f z 1 . . . z n y2 . . . yn ,§ or even in more complicated configurations. Indeed,
it may occur within the scope of a quantifier. So the rule becomes: Apply the
transformation taking every atomic subformula A [ f t1 . . . tn ] of F into
the form (v) by stages, eliminating at each stage the leftmost-innermost¶
occurrence of f (in the atomic formula we are transforming at this stage),
until all occurrences of f are eliminated. We now have a formula F ∗ over
Lk.
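Since the two rules are purely mechanical, it may help to see them spelled out as a small program. The sketch below is ours, not the text's: it codes terms and formulas as nested tuples, ignores the capture-avoidance bookkeeping that I.4.13 takes care of, and every name in it (subst_t, elim_pred, elim_fn_atomic, fresh, and so on) is hypothetical.

    # Terms:    ('var', x) | ('fn', f, [t1, ..., tn])      (constants: 0-ary 'fn')
    # Formulas: ('pred', P, [t1, ..., tn]) | ('eq', t, s) | ('not', A) | ('or', A, B)
    #           | ('and', A, B) | ('exists', x, A)

    def subst_t(t, env):                       # term with variables replaced via env
        if t[0] == 'var':
            return env.get(t[1], t)
        return ('fn', t[1], [subst_t(u, env) for u in t[2]])

    def subst_f(A, env):                       # formula[x <- env[x]]  (capture ignored)
        tag = A[0]
        if tag == 'pred':
            return ('pred', A[1], [subst_t(u, env) for u in A[2]])
        if tag == 'eq':
            return ('eq', subst_t(A[1], env), subst_t(A[2], env))
        if tag == 'not':
            return ('not', subst_f(A[1], env))
        if tag in ('or', 'and'):
            return (tag, subst_f(A[1], env), subst_f(A[2], env))
        x, B = A[1], A[2]                      # 'exists'
        return ('exists', x, subst_f(B, {v: u for v, u in env.items() if v != x}))

    def elim_pred(A, P, xs, Q):
        """Rule (1): replace every atomic  P t1...tn  by  Q[xs <- ts]."""
        tag = A[0]
        if tag == 'pred' and A[1] == P:
            return subst_f(Q, dict(zip(xs, A[2])))
        if tag in ('pred', 'eq'):
            return A
        if tag == 'not':
            return ('not', elim_pred(A[1], P, xs, Q))
        if tag in ('or', 'and'):
            return (tag, elim_pred(A[1], P, xs, Q), elim_pred(A[2], P, xs, Q))
        return ('exists', A[1], elim_pred(A[2], P, xs, Q))

    def innermost(t, f):
        """Leftmost innermost subterm  f t1...tn  of t (no ti contains f), or None."""
        if t[0] == 'fn':
            for u in t[2]:
                hit = innermost(u, f)
                if hit is not None:
                    return hit
            if t[1] == f:
                return t
        return None

    def replace_term(t, old, new):
        if t == old:
            return new
        if t[0] == 'fn':
            return ('fn', t[1], [replace_term(u, old, new) for u in t[2]])
        return t

    def elim_fn_atomic(A, f, y, xs, Q, fresh):
        """Rule (2) on an atomic A: pull f-terms out one stage at a time, using the
        defining axiom  y = f xs <-> Q(y, xs); fresh() supplies brand new variables."""
        args = list(A[2]) if A[0] == 'pred' else [A[1], A[2]]
        hit = next((h for h in (innermost(u, f) for u in args) if h is not None), None)
        if hit is None:
            return A                           # no occurrence of f left: done
        z = fresh()
        new_args = [replace_term(u, hit, ('var', z)) for u in args]
        A1 = ('pred', A[1], new_args) if A[0] == 'pred' else ('eq', new_args[0], new_args[1])
        Q_inst = subst_f(Q, dict({y: ('var', z)}, **dict(zip(xs, hit[2]))))
        return ('exists', z, ('and', Q_inst, elim_fn_atomic(A1, f, y, xs, Q, fresh)))

For Rule (2), fresh can be any supplier of brand new variables, e.g. fresh = (lambda c=iter(range(10**6)): 'w%d' % next(c)); each stage pulls out one innermost f-term (identical copies of it are pulled out together, which is harmless) and wraps the result in (∃z)(Q (z, t1 , . . . , tn ) ∧ · · · ), exactly as in (v).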

I.6.1 Metatheorem (Elimination of Defined Symbols I). Let Γ be any theory
over some formal language L.
(a) Let the formula Q be over L, and P be a new predicate symbol that extends
L into L′ and Γ into Γ′ via the axiom P xn ↔ Q (xn ). Then, for any formula

† This notation allows for the possibility that f t1 . . . tn does not occur at all in F (see the convention
on brackets, p. 19).
‡ See (C) in the proof of Metatheorem I.6.1 below.
§ Or f ( f (z 1 , . . . , z n ), y2 , . . . , yn )), using brackets and commas to facilitate reading.
¶ A term f t1 . . . tn is innermost iff none of the ti contains “ f ”.

F over L′ , the P-elimination as in Rule (1) above yields an F ∗ over L such
that

Γ′ ⊢ F ↔ F ∗

(b) Let F [x] be over L, and let t stand for f t1 . . . tn , where f is introduced
by (ii) above as an axiom that extends Γ into Γ′ . Assume that no ti contains
the letter f and that y is not free in F [t]. Then†

Γ′ ⊢ F [t] ↔ (∃y)(Q (y, tn ) ∧ F [y])

Here “L′ ” is “L k+1 ” (for some k) and “L” is “L k ”.

Proof. First observe that this metatheorem indeed gives the assurance that, after
applying the transformations (1) and (2) to obtain F ∗ from F ,   thinks that
the two are equivalent.
(a): This follows immediately from the Leibniz rule (I.4.25).
(b): Start with

⊢ F [t] → t = t ∧ F [t]    (by ⊢ t = t and |=Taut -implication)     (A)

Now, by Ax2, substitutability, and non-freedom of y in F [t],

⊢ t = t ∧ F [t] → (∃y)(y = t ∧ F [y])

Hence

⊢ F [t] → (∃y)(y = t ∧ F [y])     (B)

by (A) and |=Taut -implication.‡
Conversely,

⊢ y = t → (F [y] ↔ F [t])    (Ax4; substitutability was used here)

Hence (by |=Taut )

⊢ y = t ∧ F [y] → F [t]

Therefore, by ∃-introduction (allowed, by our assumption on y),

⊢ (∃y)(y = t ∧ F [y]) → F [t]

† As we already have remarked, in view of I.4.13, it is unnecessary pedantry to make assumptions


on substitutability explicit.
‡ We will often write just “by |=Taut ” meaning to say “by |=Taut -implication”.

which, along with (B), establishes

⊢ F [t] ↔ (∃y)(y = t ∧ F [y])     (C)

Finally, by (ii) (which introduces Γ′ to the left of ⊢), (C), and the Leibniz rule,

Γ′ ⊢ F [t] ↔ (∃y)(Q (y, tn ) ∧ F [y])     (D)




The import of Metatheorem I.6.1 is that if we transform a formula F – written


over some arbitrary extension by definitions, L k+1 , of the basic language L –
into a formula F ∗ over L, then Γk+1 (the theory over L k+1 that has the benefit
of all the added axioms) thinks that F ↔ F ∗ . The reason for this is that we
can imagine that we eliminate one new symbol at a time, repeatedly applying
the metatheorem above – part (b) to atomic subformulas – forming a sequence
of increasingly more basic formulas F k+1 , F k , F k−1 , . . . , F 0 , where F 0 is
the same string as F ∗ and F k+1 is the same string as F .
Now, Γi+1 ⊢ F i+1 ↔ F i for i = k, . . . , 0, where, if a defined function
letter was eliminated at step i + 1 → i, we invoke (D) above and the Leibniz
rule. Hence, since Γ0 ⊆ Γ1 ⊆ · · · ⊆ Γk+1 , we have Γk+1 ⊢ F i+1 ↔ F i for
i = k, . . . , 0, and therefore Γk+1 ⊢ F k+1 ↔ F 0 .

I.6.2 Remark (One Point Rule). The absolutely provable formula in (C) above
is sometimes called the one point rule (Gries and Schneider (1994), Tourlakis
(2000a, 2000b, 2001)). Its “dual”

F [t] ↔ (∀y)(y = t → F [y])

is also given the same nickname and is easily (absolutely) provable using (C)
by eliminating ∃. 

I.6.3 Metatheorem (Elimination of Defined Symbols II). Let Γ be a theory
over a language L.
(a) If L′ denotes the extension of L by the new predicate symbol P, and Γ′
denotes the extension of Γ by the addition of the axiom P xn ↔ Q (xn ),
where Q is a formula over L, then Γ ⊢ F for any formula F over L such
that Γ′ ⊢ F .
(b) Assume that

Γ ⊢ (∃!y)R(y, x1 , . . . , xn )     (∗)

pursuant to which we defined the new function symbol f by the axiom

y = f x1 . . . xn ↔ R(y, x1 , . . . , xn ) (∗∗)

and thus extended L to L′ and Γ to Γ′ . Then Γ ⊢ F for any formula F
over L such that Γ′ ⊢ F .

Proof. This metatheorem assures that extensions of theories by definitions are


conservative in that they produce convenience but no additional power (the
same old theorems over the original language are the only ones provable).
(a): By the completeness theorem, we show instead that

Γ |= F     (1)

So let M = (M, I ) be an arbitrary model of Γ, i.e., let

|=M Γ     (2)

We now expand the structure M into M′ = (M, I ′ ) – without adding any new
individuals to its domain M – by adding an interpretation, P I′ , for the new
symbol P. We define for every a1 , . . . , an in M

P I′ (a1 , . . . , an ) = t iff |=M′ Q (a 1 , . . . , a n ) [i.e., iff |=M Q (a 1 , . . . , a n )]

Clearly then, M′ is a model of the new axiom, since, for all M-instances of the
axiom – such as P(a 1 , . . . , a n ) ↔ Q (a 1 , . . . , a n ) – we have

(P(a 1 , . . . , a n ) ↔ Q (a 1 , . . . , a n ))I′ = t

It follows that |=M′ Γ′ , since we have |=M′ Γ, the latter by (2), due to having
made no changes to M that affect the symbols of L. Thus, Γ′ ⊢ F yields
|=M′ F ; hence, since F is over L, |=M F . Along with (2), this proves (1).

(b): As in (a), assume (2) in an attempt to prove (1). By (∗)

|=M (∃!y)R(y, x1 , . . . , xn )

Thus, there is a concrete (i.e., in the metatheory) function f̃ of n arguments that
takes its inputs from M and gives its outputs to M, the input-output relation
being given by (3) below (bn in, a out). To be specific, the semantics of “∃!”
implies that for all b1 , . . . , bn in M there is a unique a ∈ M such that

(R(a, b1 , . . . , bn ))I = t     (3)

We now expand the structure M into M′ = (M, I ′ ),† so that all we add to it
is an interpretation for the new function symbol f . We let f I′ = f̃ . From (2)
it follows that

|=M′ Γ     (2′ )

since we made no changes to M other than adding an interpretation of f , and
since no formula in Γ contains f . By (3), if a, b1 , . . . , bn are any members of
M, then we have

|=M′ a = f b1 . . . bn   iff a = f̃ (b1 , . . . , bn )
                         iff |=M R(a, b1 , . . . , bn ), by the definition of f̃
                         iff |=M′ R(a, b1 , . . . , bn )

– the last “iff” because R (over L) means the same thing in M and M′ .
Thus,

|=M′ y = f x1 . . . xn ↔ R(y, x1 , . . . , xn )     (4)

Now (∗∗), (2′ ), and (4) yield |=M′ Γ′ , which implies |=M′ F (from Γ′ ⊢ F ).
Finally, since F contains no f , |=M F . This last result and (2) give (1). 

I.6.4 Remark.
(a) We note that translation rule (1) and (2) – the latter applied to atomic sub-
formulas – preserve the syntactic structure of quantifier prefixes. For example,
suppose that we have introduced f in set theory by

y = f x1 . . . xn ↔ Q (y, x1 . . . , xn ) (5)

Now, an application of the collection axiom of set theory has a hypothesis of


the form

“(∀x ∈ Z )(∃w)(. . . A [ f t1 . . . tn ] . . . )” (6)

where, say, A is atomic and the displayed f is innermost. Eliminating this f


we have the translation
 
“(∀x ∈ Z )(∃w) . . . (∃y)(A [y] ∧ Q (y, t1 , . . . , tn )) . . . ” (7)

which still has the ∀∃-prefix and still looks exactly like a collection axiom
hypothesis.
(b) Rather than worrying about the ontology of the function symbol formally
introduced by (5) above – i.e., the question of the exact nature of the symbol

† This part is independent of part (a); hence this is a different I  in general.



that we named “ f ” – in practice we shrug this off and resort to metalinguistic


devices to name the function symbol, or the term that naturally arises from
it. For example, one can use the notation “ f Q ” for the function – where the
subscript “Q ” is the exact string over the language that “Q ” denotes – or, for
the corresponding term, the notation of Whitehead and Russell (1912),

(ιz)Q (z, x1 , . . . , xn ) (8)

The “z” in (8) above is a bound variable.† This new type of term is read “the
unique z such that . . . ”.
This “ι” is not one of our primitive symbols.‡ It is just meant to lead to the
friendly shorthand (8) above that avoids the ontology issue.
Thus, once one proves

(∃!z)Q (z, x1 , . . . , xn ) (9)

one can then introduce (8) by the axiom

y = (ιz)Q (z, x1 , . . . , xn ) ↔ Q (y, x1 , . . . , xn ) (5 )

which, of course, is an alias for axiom (5), using more suggestive notation for
the term f x1 , . . . , xn .
By (9), axioms (5) or (5 ) can be replaced by

Q ( f x1 , . . . , xn , x1 , . . . , xn )

and

Q ((ιz)Q (z, x1 , . . . , xn ), x1 , . . . , xn ) (10)

respectively. For example, from (5 ) we get (10) by substitution. Now, Ax4
(with some help from |=Taut ) yields
 
Q (ιz)Q (z, x1 , . . . , xn ), x1 , . . . , xn →
y = (ιz)Q (z, x1 , . . . , xn ) → Q (y, x1 , . . . , xn )
Hence, assuming (10),

y = (ιz)Q (z, x1 , . . . , xn ) → Q (y, x1 , . . . , xn ) (11)

† That it must be distinct from the xi is obvious.


‡ It is however possible to enlarge our alphabet to include “ι”, and then add definitions of the
syntax of “ι-terms” and axioms for the behaviour of “ι-terms”. At the end of all this one gets a
conservative extension of the original theory, i.e., any ι-free formula provable in the new theory
can be also proved in the old (Hilbert and Bernays (1968)).

Finally, deploying (9), we get


 
Q ((ιz)Q (z, x1 , . . . , xn ), x1 , . . . , xn ) →
    (Q (y, x1 , . . . , xn ) → y = (ιz)Q (z, x1 , . . . , xn ))

Hence

Q (y, x1 , . . . , xn ) → y = (ιz)Q (z, x1 , . . . , xn )

by (10). This, along with (11), yields (5′ ).

The indefinite article. We often have the following situation: We have proved a
statement like

(∃x) A [x] (1)

and we want next to derive a statement B .


To this end, we start by picking a symbol c not in B and say “let c be such that
A [c] is true”.† That is, we add A [c] as a nonlogical axiom, treating c as a new
constant. From all these assumptions we then manage to prove B , hopefully
treating all the free variables of A [c] as constants during the argument. We then
conclude that B has been derived without the help of A [c] or c (see I.4.27).
Two things are noteworthy in this technique: One, c does not occur in the
conclusion, and, two, c is not uniquely determined by (1). So we have a c,
rather than the c, that makes A [c] true.
Now the suggestion that the free variables of the latter be frozen during the
derivation of B is unnecessarily restrictive, and we have a more general result:
Suppose that

Γ ⊢ (∃x)A (x, y1 , . . . , yn )     (2)

Add a new function symbol f to the language L of Γ (thus obtaining L′ ) via


the axiom

A( f y1 . . . yn , y1 , . . . , yn ) (3)

This says, intuitively, “for any y1 , . . . , yn , let x = f y1 . . . yn make A(x,


y1 , . . . , yn ) true”. Again, this x is not uniquely determined by (2).
Finally, suppose that we have a proof

Γ + A( f y1 . . . yn , y1 , . . . , yn ) ⊢ B     (4)

† Cf. II.4.1.

such that f , the new function symbol, occurs nowhere in B , i.e., the latter
formula is over L. We can conclude then that

Γ ⊢ B     (5)

that is, the extension Γ + A( f y1 . . . yn , y1 , . . . , yn ) of Γ is conservative.
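For instance (a made-up illustration of the technique, not an axiom of any theory treated in this book): if Γ ⊢ (∃x) y < x, we may add a new unary function symbol f together with the axiom y < f y; the result just stated then guarantees that every f -free theorem of Γ + {y < f y} was already a theorem of Γ.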


A proof of the legitimacy of this technique, based on the completeness
theorem, is easy. Let

|=M Γ     (6)

and show

|=M B (7)

Expand the model M = (M, I ) to M′ = (M, I ′ ) so that I ′ interprets the new


symbol f . The interpretation is chosen as follows:
(2) guarantees that, for all choices of i 1 , . . . , i n in M, the set S(i 1 , . . . , i n ) =
{a ∈ M : |=M A(a, i 1 , . . . , i n )} is not empty. By the axiom of choice (of in-
formal set theory), we can pick† an a(i 1 , . . . , i n ) in each S(i 1 , . . . , i n ). Thus,
we define a function f̃ : M n → M by letting, for each i 1 , . . . , i n in M,
f̃ (i 1 , . . . , i n ) = a(i 1 , . . . , i n ).
The next step is to set

f I′ = f̃

Therefore, for all i 1 , . . . , i n in M,

( f i 1 . . . i n )I′ = f̃ (i 1 , . . . , i n ) = a(i 1 , . . . , i n )

It is now clear that |=M′ A( f y1 . . . yn , y1 , . . . , yn ), for, by I.5.11,

(A( f i 1 . . . i n , i 1 , . . . , i n ))I′ = t ↔ (A(a(i 1 , . . . , i n ), i 1 , . . . , i n ))I′ = t

and the right hand side of the above is true by the choice of a(i 1 , . . . , i n ).
Thus, |=M′ Γ + A( f y1 . . . yn , y1 , . . . , yn ); hence |=M′ B , by (4).
Since B contains no f , we also have |=M B ; thus we have established (7)
from (6). We now have (5).
One can give a number of names to a function like f : A Skolem function,
an ε-term (Hilbert and Bernays (1968)), or a τ -term (Bourbaki (1966b)). In
the first case one may ornament the symbol f , e.g., f ∃A , to show where it is
coming from, although such mnemonic naming is not, of course, mandatory.

† The“(i 1 , . . . , i n )” part indicates that “a” depends on i 1 , . . . , i n .



The last two terminologies actually apply to the term f y1 . . . yn , rather than to
the function symbol f .
Hilbert would have written

(εx) A(x, y1 . . . , yn ) (8)

and Bourbaki

(τ x)A(x, y1 . . . , yn ) (9)

each denoting f y1 . . . yn . The “x” in each of (8) and (9) is a bound variable
(different from each yi ).

I.7. Formalizing Interpretations


In Section I.5 we discussed Tarski semantics. As we pointed out there (footnote,
p. 54), this semantics, while rigorous, is not formal. It is easy to formalize Tarski
semantics, and we do so in this section not out of a compulsion to formalize,
but because formal interpretations are at the heart of many relative consistency
results, some of which we want to discuss in this volume.
As always, we start with a formal language, L. We want to interpret its
terms and formulas inside some appropriate structure M = (M, I ). This time,
instead of relying on the metatheory to provide us with a universe of discourse,
M, we will have another formal language† L i and a theory Ti over L i to supply
the structure.
Now, such a universe is, intuitively, a collection of individuals. Any formula
M(x) over L i can formally denote a collection of objects. For example, we
may think of M(x) as defining “the collection of all x such that M(x) holds”
(whatever we may intuitively understand by “holds”).
We have carefully avoided saying “set of all x such that M(x) holds”,
since, if (for example) L i is an extension of the language of set theory, then
“the collection of all x such that x ∉ x holds” is not a set.‡
collections are of “enormous size” (this being the reason – again, intuitively –
that prevents them from being sets).

The fact that a formula M(x) might formally denote a collection that is not a set
is perfectly consistent with our purposes. After all, the intended interpretation
of set theory has such a non-set collection as its universe.

† The subscript “i” is a weak attempt on my part to keep reminding us throughout this section that
L i and Ti are to implement an interpretation of L.
‡ See II.2.1.

The requirement that a universe be nonempty – or that it be true in the


metatheory that M ≠ ∅ – translates to the formal requirement that Ti can
syntactically† certify the nonemptiness:
Ti ⊢ (∃x) M (x)     (1)
The primary interpretation mapping, I , is similar to the one defined in I.5.1.
We summarize what we have agreed to do so far, “translating” Definition I.5.1
to the one below.

I.7.1 Definition. Given a language L = (V , Term, Wff).


A formal interpretation of L is a 4-tuple I = (L i , Ti , M (x), I ), where
L i = (V i , Termi , Wffi ) is a first order language (possibly, the same as L), Ti
is a theory over L i , M(x) is a formula over L i , and I is a total mapping from
the set of nonlogical symbols of L into the set of nonlogical symbols of L i .
Moreover, it is required that the following hold:
(i) (1) above holds.
(ii) For each constant a of V , a I is a constant of V i such that Ti ⊢ M(a I ).
(iii) For each function f of V , of arity n, f I is a function of V i , of arity n,
such that
Ti ⊢ M(x1 ) ∧ M(x2 ) ∧ · · · ∧ M(xn ) → M( f I x1 x2 . . . xn )
(iv) For each predicate P of V , P I is a predicate of V i , of arity n. 

The conditions in I.7.1(ii) and I.7.1(iii) simply say that the universe {x : M } is
closed under constants (i.e., contains the interpreting constants, a I ) and under
the interpreting functions, f I .
Some authors will not assume that L i already has enough nonlogical symbols
to effect the mapping I as plainly as in the definition above. They will instead
say that, for example, to any n-ary f of L, I will assign a formula A(y, xn )
of L i such that
 
Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → (∃!y)(M(y) ∧ A(y, xn ))
In view of our work in the previous section, this would be an unreasonably
roundabout way for us to tell the story.
Similarly, the results of Section I.6 allow us, without loss of generality, to
always assume that the formula M in an interpretation I = (. . . , M, . . . ) is
atomic, P x, where P is some unary predicate.
† We thus substitute the syntactic, or formal, requirement of provability for the semantic, or infor-
mal, concept of truth.

We next formalize the extension of I to all terms and formulas (cf. I.5.5
and I.5.6).

I.7.2 Definition. For every term t over L, we define its relativization to M, in


symbols t M , by induction on t:

t M ≡ a I                      if t ≡ a
t M ≡ z                        if t ≡ z (a variable)
t M ≡ f I t1 M . . . tn M      if t ≡ f t1 . . . tn

where t1 , . . . , tn are terms over L, and f is an n-ary function of L. 

A trivial induction (on terms t over L) proves that t M is a term over L i .

I.7.3 Definition. For every A over L, we define its relativization to M, in


symbols A M , by induction on A :

A M ≡ t M = s M                        if A ≡ t = s
A M ≡ P I t1 M . . . tn M              if A ≡ P t1 . . . tn
A M ≡ ¬(B M )                          if A ≡ ¬B
A M ≡ (B M ) ∨ (C M )                  if A ≡ B ∨ C
A M ≡ (∃z)(M(z) ∧ B M )                if A ≡ (∃z)B

where s, t, t1 , . . . , tn are terms over L, and P is an n-ary predicate of L. 

The two definitions I.7.2 and I.7.3 are entirely analogous with the definition
of mixed mode formulas (I.5.17). The analogy stands out if we imagine that
“A M ” is some kind of novel notation for “A [[ . . . ]]”. Particularly telling is
the last case (pretend that we have let M = {x : M(x)}, where M may or may
not be a set).
We have restricted the definition to the primary logical symbols. Thus, e.g.,
just as (∀x)A abbreviates ¬(∃x)¬A , we have that ((∀x)A )M abbreviates
¬((∃x)¬A )M , i.e., ¬(∃x)(M(x) ∧ ¬A M ), or, in terms of “∀”, (∀x)(M(x) →
A M ).
A trivial induction (on formulas A over L) proves that A M is a formula
over L i .
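The relativization maps of I.7.2 and I.7.3 are again purely syntactic transformations, so a small program can compute them. The sketch below is ours (the tuple encoding and every name in it are hypothetical; equality is treated as a logical symbol and so is not renamed):

    # Terms:    ('const', a) | ('var', z) | ('fn', f, [t1, ..., tn])
    # Formulas: ('eq', t, s) | ('pred', P, [t1, ..., tn]) | ('not', A) | ('or', A, B)
    #           | ('exists', z, A);  the output also uses ('and', A, B) as the usual abbreviation.
    # `interp` is the map I on nonlogical symbols; `M` names the universe predicate of L_i.

    def rel_term(t, interp):
        """t  |->  t^M : rename nonlogical symbols via I, leave variables alone."""
        kind = t[0]
        if kind == 'const':
            return ('const', interp[t[1]])         # a  |->  a^I
        if kind == 'var':
            return t                               # z  |->  z
        return ('fn', interp[t[1]], [rel_term(u, interp) for u in t[2]])

    def rel_formula(A, interp, M):
        """A  |->  A^M as in I.7.3."""
        tag = A[0]
        if tag == 'eq':
            return ('eq', rel_term(A[1], interp), rel_term(A[2], interp))
        if tag == 'pred':
            return ('pred', interp[A[1]], [rel_term(u, interp) for u in A[2]])
        if tag == 'not':
            return ('not', rel_formula(A[1], interp, M))
        if tag == 'or':
            return ('or', rel_formula(A[1], interp, M), rel_formula(A[2], interp, M))
        z, B = A[1], A[2]                          # 'exists':  (∃z)B  |->  (∃z)(M(z) ∧ B^M)
        return ('exists', z, ('and', ('pred', M, [('var', z)]),
                              rel_formula(B, interp, M)))

For example, rel_formula(('exists', 'x', ('pred', '∈', [('var', 'x'), ('var', 'y')])), {'∈': '∈'}, 'M') returns the tuple coding (∃x)(M(x) ∧ x ∈ y), in agreement with the last clause of I.7.3.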

We have defined in Section I.5 the symbol |=M A(x1 , . . . , xn ) to mean

For all a1 , . . . , an in M, A[[a1 , . . . , an ]] is true (1)



Correspondingly, we define the formalization of (1) in I. Unfortunately, we will


use the same symbol as above, |=. However, the context will reveal whether
it is the semantic or syntactic (formal) version that we are talking about. In
the latter case we have a subscript, |=I, that is a formal interpretation (not a
metamathematical structure) name.

I.7.4 Definition. Let I = (L i , Ti , M(x), I ) be a formal interpretation for a


language L. For any formula A (x1 , . . . , xn ) over L, the symbol

|=I A(x1 , . . . , xn ) (2)

is short for

Ti ⊢ M(x1 ) ∧ M(x2 ) ∧ · · · ∧ M(xn ) → A M (x1 , . . . , xn )     (3)

The part “M(x1 ) ∧ M(x2 ) ∧ · · · ∧ M(xn ) →” in (3) is empty if A is a


sentence. 

We will (very reluctantly) pronounce (2) above “A(x1 , . . . , xn ) is true in the


interpretation I”. Even though we have said “true”, the context will alert us to
the argot use of the term, and that we really are talking about provability – (3) –
here.
The following lemma is the counterpart of Lemma I.5.11.

I.7.5 Lemma. Given terms s and t and a formula A, all over L. Then
(s[x ← t])M ≡ s M [x ← t M ] and (A [x ← t])M ≡ A M [x ← t M ].

We assume that the operation [x ← t] is possible, without loss of generality.

Proof. The details of the two inductions, on terms s and formulas A, are left
to the reader (see the proof of I.5.11).
We only look at one “hard case” in each induction:
Induction on terms s. Let s ≡ f t1 t2 . . . tn . Then

(s[x ← t])M ≡ ( f t1 [x ← t] . . . tn [x ← t])M
             ≡ f I (t1 [x ← t])M . . . (tn [x ← t])M          by I.7.2
             ≡ f I t1 M [x ← t M ] . . . tn M [x ← t M ]      by I.H.
             ≡ ( f I t1 M . . . tn M )[x ← t M ]
             ≡ s M [x ← t M ]                                 by I.7.2

Induction on formulas A . Let A ≡ (∃w)B and w ≢ x. Then

(A [x ← t])M ≡ (((∃w)B )[x ← t])M
             ≡ ((∃w)(B [x ← t]))M                    (recall the priority of [. . . ])
             ≡ (∃w)(M(w) ∧ (B [x ← t])M )            by I.7.3
             ≡ (∃w)(M(w) ∧ B M [x ← t M ])           by I.H.
             ≡ ((∃w)(M(w) ∧ B M ))[x ← t M ]         by w ≢ x
             ≡ ((∃w)B )M [x ← t M ],                 by I.7.3

We will also need the following lemma. It says that all “interpreting objects”
are in {x : M}.

I.7.6 Lemma. For any term t over L,


 
Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → M(t M [xn ])     (4)

where all the free variables of t are among the xn .

Proof. We have three cases.

(a) t ≡ a, a constant. Then the prefix “M(x1 ) ∧ · · · ∧ M(xn ) →” is empty


in (4), and the result follows from I.7.1(ii).
(b) t ≡ z, a variable. Then (4) becomes Ti ⊢ M(z) → M(z).
(c) t ≡ f t1 . . . tn . Now (4) is

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xr ) → M( f I t1 M [xr ] . . . tn M [xr ])     (5)

To see why (5) holds, freeze the xr and add the axiom B ≡ M(x1 ) ∧ · · · ∧
M(xr ) to Ti . By the I.H.,

Ti + B ⊢ M(ti M [xr ])    for i = 1, . . . , n

By tautological implication, substitution (I.4.12), and I.7.1(iii), the above
yields

Ti + B ⊢ M( f I t1 M [xr ] . . . tn M [xr ])

The deduction theorem does the rest. 

We are ready to prove our key result in this connection, namely soundness.

I.7.7 Theorem. Let I = (L i , Ti , M, I ) be a formal interpretation of a lan-


guage L. Then for any A ∈ Λ over L (cf. I.3.13),
|=I A(x1 , . . . , xn )

Proof. We want

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → A M (x1 , . . . , xn )     (6)

for all A ∈ Λ. We have several cases.
Ax1. Let A(xn ) be a tautology. As the operation . . .M does not change the
Boolean connectivity of a formula, A M (xn ) is a tautology as well. Thus, (6)
follows by tautological implication.
Ax2. Let A (x , y , z ) ≡ B (x , t(x , y ), z ) → (∃w)B (x , w, z ). By I.7.5,

A M (x , y , z ) ≡ B M (x , t M (x , y ), z ) → (∃w)(M(w) ∧ B M (x , w, z ))

By I.7.6,

Ti ⊢ M(x1 ) ∧ · · · ∧ M(y1 ) ∧ · · · → M(t M (x , y ))     (7)

Since

M(t M (x , y )) ∧ B M (x , t M (x , y ), z ) → (∃w)(M(w) ∧ B M (x , w, z ))

is in Λ over L i , (7) and tautological implication yield

Ti ⊢ M(x1 ) ∧ · · · ∧ M(y1 ) ∧ · · · →
        (B M (x , t M (x , y ), z ) → (∃w)(M(w) ∧ B M (x , w, z )))

One more tautological implication gives what we want:

Ti ⊢ M(x1 ) ∧ · · · ∧ M(y1 ) ∧ · · · ∧ M(z 1 ) ∧ · · · → A M (x , y , z )

Ax3. Let A(x) ≡ x = x. We want Ti ⊢ M(x) → x = x, which holds by
tautological implication and the fact that x = x is logical over L i .
Ax4. Here A [xn ] ≡ t = s → (B [x ← t] ↔ B [x ← s]), where xn includes
all the participating free variables. Thus, using I.7.5, (6) translates into

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → t M = s M
        → (B M [x ← t M ] ↔ B M [x ← s M ])
which holds by tautological implication from the instance of the Leibniz
axiom over L i , t M = s M → (B M [x ← t M ] ↔ B M [x ← s M ]). 

I have used above abbreviations such as “B M → A M ” for the abbreviation


“(B → A)M ”, etc.

We next direct our attention to some theory T over L.



I.7.8 Definition. Let T be a theory over L and I = (L i , Ti , M, I ) be a formal


interpretation of L over the language L i .
We say that I is a formal interpretation of the theory (or a formal model
of the theory) T just in case, for every nonlogical axiom A of T, it is |=I A
(cf. I.7.4). 

I.7.9 Theorem (Formal Soundness). If I = (L i , Ti , M, I ) is a formal in-


terpretation of the theory T over L, then, for any formula A over L, T ⊢ A
implies |=I A. 

Proof. We do induction on T-theorems. For the basis, if A is logical, then we


are done by I.7.7. If it is nonlogical, then we are done by definition (I.7.8).
Assume then that T ⊢ B → A and T ⊢ B , and let xn include all the free
variables of these two formulas. By the I.H.,

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → B M → A M

and

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → B M

The above two and tautological implication yield

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → A M

Finally, let it be the case that A ≡ (∃z)B → C , where z is not free in C ,
and moreover T ⊢ B → C . Let z, xn – distinct variables – include all the free
variables of B → C .
By the I.H.,

Ti ⊢ M(z) ∧ M(x1 ) ∧ · · · ∧ M(xn ) → B M → C M

Hence (by tautological implication)

Ti ⊢ M(z) ∧ B M → M(x1 ) → · · · → M(xn ) → C M

By ∃-introduction,

Ti ⊢ (∃z)(M(z) ∧ B M ) → M(x1 ) → · · · → M(xn ) → C M

Utilizing tautological implication again, and Definition I.7.3, we are done:

Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → ((∃z)B )M → C M

It is a shame to call the next result just a “corollary”, for it is the result on
which we will base the various relative consistency results in this volume (with

the sole exception of those in Chapter VIII, where we work in the metatheory,
mostly).
The corollary simply says that if the theory, Ti , in which we interpret T is
not “broken”, then T is consistent. This is the formal counterpart of the easy
half of Gödel’s completeness theorem: If a theory T has a (metamathematical)
model,† then it is consistent.

I.7.10 Corollary. Let I = (L i , Ti , M, I ) be a formal model of the theory T


over L. If Ti is consistent, then so is T.

Proof. We prove the contrapositive. Let T be inconsistent; thus

T ⊢ ¬x = x

By I.7.9, Ti ⊢ M(x) → ¬x = x; thus, by I.4.23,

Ti ⊢ (∃x)M(x) → (∃x)¬x = x

Since Ti ⊢ (∃x)M(x) by I.7.1, modus ponens yields Ti ⊢ (∃x)¬x = x, which
along with Ti ⊢ (∀x)x = x shows that Ti is inconsistent.

We conclude the section with a brief discussion of a formal version of


structure isomorphisms. In the case of “real structures” M = (M, . . . ) and
N = (N , . . . ), we have shown in volume 1 that if φ : M → N is a 1-1 cor-
respondence that preserves the meaning of all basic symbols, then it preserves
the meaning of everything, that is, if A is a formula and a, b, . . . are in M,
then |=M A [[ a, b, . . . ]] iff |=N A [[ φ(a), φ(b), . . . ]].
We will use the formal version only once in this volume; thus we feel free
to restrict it to our purposes. To begin with, we assume a language L whose
only nonlogical symbols are a unary and a binary predicate, which we will
denote by U and ∈ respectively (we have set theory in mind, of course). The
interpretations of L whose isomorphisms we want to define and discuss are
I = (L i , Ti , M, I ) and J = (L i , Ti , N , J ). Note that Ti and L i are the
same in both interpretations.
Let now φ be a unary function symbol in L i . It is a formal isomorphism of
the two interpretations iff the following hold:
(1) Ti ⊢ N (x) → (∃y)(M(y) ∧ x = φ(y))    (“ontoness”)
(2) Ti ⊢ M(x) ∧ M(y) → (x = y ↔ φ(x) = φ(y))    (“1-1ness”‡ )

† It is a well-established habit not to doubt the metatheory’s reliability, a habit that has had its
critics, including Hilbert, whose metatheory sought “reliability” in simplicity. But we are not
getting into that discussion again.
‡ The → half of ↔ we get for free by an application of the Leibniz axiom.
 
(3) Ti ⊢ M(x) → (U M (x) ↔ U N (φ(x)))    (“preservation of U ”† )
(4) Ti ⊢ M(x) ∧ M(y) → (x ∈M y ↔ φ(x) ∈N φ(y))    (“preservation of ∈”)
If L contains a constant c, then we must also have φ(c M ) = c N . This is
met in our only application later on by having c M = c = c N and φ(c) = c.

I.7.11 Remark. In what follows we will present quite a number of formal


proofs. It is advisable then at this point to offer a proof-writing tool that will,
hopefully, shorten many of these proofs.
Whenever the mathematician is aware (of proofs) of a chain of equivalences
such as
A 1 ↔ A2 , A 2 ↔ A3 , A 3 ↔ A4 , . . . , A n−1 ↔ An
he often writes instead
A 1 ↔ A2 ↔ A3 ↔ A4 ↔ · · · ↔ A n−1 ↔ An
i.e., abusing notation and treating “↔” conjunctionally rather than (the correct)
associatively. This parallels the (ab)uses
a<b<c for a < b and b < c
and
a=b=c for a = b and b = c
Of course, such a chain also proves A 1 ↔ An by tautological equivalence.
Moreover, A 1 is provable iff A n is (by tautological implication).
More generally, the chain may involve a mix of “↔” and “→”. Again tau-
tological equivalence yields a proof of A 1 → A n this time.
Dijkstra and Scholten (1990), Gries and Schneider (1994), and Tourlakis
(2000a, 2000b, 2001) suggest a vertical layout of such chains and say that such
a chain constitutes a calculational proof:
A1
   (↔ or →)    annotation/reason
A2
   (↔ or →)    annotation/reason
   .
   .
   .
   (↔ or →)    annotation/reason
An

† We write “U M ” rather than “U I ”, as this will be the habitual notation in the context of set
theory.

from which

T ⊢ A 1 → A n

follows, where T is the theory within which we reasoned above.


Moreover, if T ⊢ A 1 , then also T ⊢ A n by modus ponens.

We can now prove:

I.7.12 Lemma. Let L be a language with just U and ∈, above, as its nonlogical
symbols, and let φ be a formal isomorphism of its two interpretations I =
(L i , Ti , M, I ) and J = (L i , Ti , N , J ) in the sense of (1)–(4). Then, for
every formula A(xn ) over L,
 
Ti ⊢ M(x1 ) ∧ · · · ∧ M(xn ) → (A M (xn ) ↔ A N (φ(x1 ), . . . , φ(xn )))

Proof. Induction on formulas. For the atomic ones the statement is just (2)–(4)
above. We skip the trivial ∨ and ¬ cases and look into A(xn ) ≡ (∃y)B (y, xn ).
First,
 
A M (xn ) ≡ (∃y)(M(y) ∧ B M (y, xn ))     (5)

and
 
A N (φ(x1 ), . . . , φ(xn )) ≡ (∃y)(N (y) ∧ B N (y, φ(x1 ), . . . , φ(xn )))     (6)

We now freeze the xn and work in Ti + M(x1 ) ∧ · · · ∧ M(xn ). We calculate


as follows:
 
(∃y)(N (y) ∧ B N (y, φ(x1 ), . . . , φ(xn )))
   ↔   (by (1) and Leibniz rule; z a new variable)
(∃y)((∃z)(M(z) ∧ y = φ(z)) ∧ B N (y, φ(x1 ), . . . , φ(xn )))
   ↔   (newness of z)
(∃z)(∃y)(M(z) ∧ y = φ(z) ∧ B N (y, φ(x1 ), . . . , φ(xn )))
   ↔   (one point rule (I.6.2) and Leibniz rule)
(∃z)(M(z) ∧ B N (φ(z), φ(x1 ), . . . , φ(xn )))
   ↔   (I.H. and Leibniz rule)
(∃z)(M(z) ∧ B M (z, xn ))

The top line of our calculation is (6), while the bottom is (5) (within bound
variable renaming); thus we are done by the deduction theorem. 

I.8. The Incompleteness Theorems


This brief section is only meant to acquaint the reader with what Gödel’s in-
completeness theorems are about. The second theorem in particular is one that
we will invoke a number of times in this volume; therefore it is desirable to
present here the statements of these two theorems and outline, at the intuitive
level, what makes them tick. A full exposition and complete proofs for both
theorems can be found in our companion volume Mathematical Logic.
Now, Gödel’s completeness theorem asserts the adequacy of the syntactic
proof apparatus for characterization of “truth”. On the other hand, his incom-
pleteness theorems assert the inadequacy of this syntactic apparatus for captur-
ing “truth”. The contradiction is only apparent. Completeness says that truth
of a formula in all concrete worlds (all models) of a first order theory can be
adequately captured – the formula is provable. Incompleteness addresses truth
in one world. Often such a world is the one that matters: The intended or natural
model of a theory that we want to study axiomatically. A formula of the theory
that is true (in the Tarski semantics sense) in the intended model is, naturally,
called really true. An example of such a special world is our familiar structure,
N = (N; S, +, ×; <; 0). Peano arithmetic is the associated formal theory that
attempts to characterize this structure.
The first incompleteness theorem in its semantic version says that Peano
arithmetic, or indeed any reasonably well-constructed extension, cannot do a
very good job of proving all the formulas that are really true (in N). It misses
infinitely many. Hence the term “incompleteness”, or, more emphatically, “in-
completableness”, the latter because we cannot make incompleteness go away
by throwing axioms at it.
Let us dispense with some terminology before we can actually state and
discuss the theorems.

I.8.1 Definition. The language for Peano arithmetic we denote by L N. It has


the nonlogical symbols listed below along with their intended interpretations,
where boldface denotes the formal symbol while lightface denotes the “real”
(metamathematical) symbol:

(1) S (successor): SN = S, where S(x) = x + 1 for all x ∈ N


(2) + (addition): +N = +
(3) × (multiplication): ×N = ×

(4) < (less than): <N =<


(5) 0 (zero): 0N = 0

The abbreviation ñ is pronounced the numeral n, and it stands for

S . . . S 0
(n of them)

As they are metamathematical abbreviations, we are not using boldface type


for numerals. 

I.8.2 Definition. A theory Γ (this names the set of nonlogical axioms) over L N
is correct over N just in case A ∈ Γ implies |=N A.
That is, all its nonlogical axioms are true in N (or really true, if N happens
to be the intended model). 

The term correct is used by Smullyan (1992). Some authors say “sound”, but
this is not as apt a terminology, for sound means something else: All first order
theories are sound, but some theories over L N – although sound – may fail to
be correct.

I.8.3 Definition. A theory T over some language L is simply complete, or just


complete, iff, for all sentences A over L, we have at least one of T ⊢ A and
T ⊢ ¬A.
It is simply incomplete, or just incomplete, otherwise. An incomplete theory
thus fails to decide at least one sentence A over L, that is, neither T ⊢ A nor
T ⊢ ¬A holds.
Such an A is called an undecidable sentence. 

Pause. Why “sentence”? Why not define the above concepts (complete, etc.)
in terms of arbitrary formulas over L?

Thus, in the case of an incomplete theory and for any particular one of its
models – including the intended one – there is at least one sentence of the
language which is (Tarski-)true in said model, but is not provable. Such is any
undecidable sentence A, for it or ¬A must be true in any given model.
An inconsistent theory is complete, of course.

I.8.4 Definition. A theory T in the language of Peano arithmetic, L N, is


ω-consistent just in case there is no formula A(x) over L N such that all of

the following hold:

T ⊢ ¬A(ñ)    for all n ∈ N

and T ⊢ (∃x)A(x). Otherwise it is ω-inconsistent.

An ω-consistent theory fails to prove something over its language; thus it is con-
sistent. The converse is not true, a fact first observed by Tarski. This observation
is a corollary of the techniques applied to prove Gödel’s (first) incompleteness
theorem (see our companion volume for the full story).

We can now state:

I.8.5 Theorem (First Incompleteness Theorem, Semantic Version). Any


correct extension of formal Peano arithmetic, effected in such a manner that
the new set of axioms remains recognizable, will fail to prove at least one really
true sentence.
It follows that any such extension is a simply incomplete theory.

By “a set A is recognizable” we mean that we can solve the membership prob-


lem, “x ∈ A?”, by algorithmic, or mechanical, means. That is, in our case here,
we can test any formula and find out, in a finite number of steps, whether it is
an axiom or not. The technical term is recursive, but we do not intend to get
into that here.†
The first word in the theorem is very important: “any”. It shows that the
theory (Peano arithmetic) is not just incomplete (take the trivial extension that
adds nothing) but, indeed, incompletable: For, add to Peano arithmetic one
really true sentence that it fails to prove. This effects an extension that is correct
(why?) and constitutes a recognizable set of axioms. Repeat now, adding a
really true sentence that this theory cannot prove. And so on.
In particular, this says that each of these extensions misses not one but in-
finitely many really true sentences (after all, we are effecting an infinite sequence
of extensions; after each extension there are infinitely many more to go).

Why is Gödel’s theorem true? The idea (in Gödel’s original proof) is very
old, based on games ancient Greek philosophers liked to play: The so-called

† A fair amount of recursion theory is covered in volume 1, Mathematical Logic, where, in partic-
ular, recursive sets are defined and studied.

“liar’s paradox”.† Through an ingenious arithmetization of the language Gödel


managed to construct a sentence G whose natural interpretation said “I am not
a theorem”.
Let us see then if Peano arithmetic (or a correct and recognizable extension‡ )
can prove G . Well, if it can, then – by correctness and soundness§ – G is really
true, i.e., it is not a theorem. This contradicts what we have just assumed.
So it must be that G is not a theorem.
But then, G is really true, for it says just that. We found a true sentence, G ,
that is not provable.
We have more. Since the theory is correct (and sound), and ¬G is really
false, this latter sentence is not provable. Thus the theory (as extended) is simply
incomplete; G is undecidable.

Where have we used, in the above argument, the part of the assumptions that
requires the set of nonlogical axioms to be recognizable? We actually did not
use it explicitly, since our argument was too far removed from the level of detail
that would exhibit such dependences on assumptions.
Suffice it to say that, among other things, the assumption on recognizability
prevents us from cheating – thus invalidating Gödel’s theorem: Why don’t we
just add all the really true sentences to the set of axioms and form a complete
extension of Peano arithmetic? Because the recognizability assumption does
not allow this. Such an extension results in a non-recognizable set of axioms
(cf. volume 1).
There is another way to look at the intuitive reason behind the incompletable-
ness phenomenon. This relies on results of recursion theory. Imagine beings
who live in a world where set theorists call a set countable just in case a
mechanical procedure, or algorithm, exists to enumerate all the set’s members,
possibly with repetitions. Such beings call any set that fails to be enumerable
in this manner uncountable. Intuitively, in the eyes of the inhabitants of this
world, this latter type of set has far too many objects.
In this world the set of theorems of any extension of Peano arithmetic, by
an arbitrary recognizable set of new axioms, is countable. The reason can be
seen intuitively as a consequence of the recognizability of the set of nonlogical

† Attributed to Epimenides. He, a Cretan, said: “All Cretans are liars”. So, was his statement true?
Gödel’s proof is based on a variation of this. A person says: “I am lying.” Well, is he, or is he
not?
‡ The exact form of G depends on the extension at hand.
§ Soundness we have for free. Correctness guarantees the real truth of the nonlogical axioms.
Soundness extends this to all theorems.

axioms. This property allows us to build systematically (algorithmically) an


infinite list† of all theorems.

Digression. Here is how. To simplify matters assume that the alphabet of the
language L N is finite (for example, variables are really the strings

v|...|v
  
n+1

denoting what we may call vn , for n ≥ 0, built from just two symbols, “v” and
“|”).
We convert every proof into a single string by adding a new symbol to our
alphabet, say #, which is used as a separator and “glue” – between formulas –
as we concatenate all the formulas of a proof into a single string, from left to
right. We will still call the result of this concatenation a “proof”.
We now form two separate infinite lists, algorithmically. The first is the list of
all strings over the alphabet of L N, as the latter was augmented by the addition
of #. This listing can be effected by enumerating by string length, and then,
within each length group, lexicographically (alphabetically).‡
The second list is built as follows. Every time a string A is put in the first
list, we test algorithmically whether or not A is a proof. We can do this, for,
firstly, we can recognize if it is of the right form, that is,

A1 #A2 # . . . #An

where each Ai is a nonempty string over L N.


Secondly, if it is of the right form, we can then check whether indeed A is a
proof: Whether or not A j is the result of a primary rule of inference applied to
Ai (and possibly to Ak ) for some i < j (and k < j) can be determined from the
form of the strings A j , Ai , and Ak . The same is true of whether A j ∈  or not.
Finally the recognizability assumption means that we can also check whether
or not A j is nonlogical.
If (and only if ) A passes the above test, i.e., it is a proof, then we add its last
formula (the one to the right of the rightmost #) to the second list.
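A compact way to picture the two lists is the following sketch (ours): the alphabet shown is only a stand-in, and is_proof stands for the mechanical test just described; we merely assume it is handed to us, since its existence is exactly what the recognizability of the axioms buys.

    from itertools import count, product

    ALPHABET = "v|()=¬∨∃S+x<0#"     # a hypothetical finite alphabet, '#' being the glue

    def all_strings():
        """First list: every nonempty string over ALPHABET, by length, then alphabetically."""
        for n in count(1):
            for letters in product(ALPHABET, repeat=n):
                yield "".join(letters)

    def all_theorems(is_proof):
        """Second list: the last formula (the text after the rightmost '#') of every
        string that the proof checker accepts."""
        for s in all_strings():
            if is_proof(s):
                yield s.rsplit("#", 1)[-1]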
Now, it turns out that in such a world the set of all really true sentences of
arithmetic is uncountable (this is proved in volume 1). Thus, there are infinitely
many really true sentences that are not provable, no matter which theory (that

† One can “build an infinite list algorithmically” is jargon that means the following: One has an
algorithm which, for any n ∈ N, will generate the nth element of the list in a finite number of
steps.
‡ We assume that we have fixed an alphabetical order of the finitely many symbols of our alphabet.

produces a countable set of theorems) we have constructed on top of Peano


arithmetic.†

While Gödel worked with the G that says “I am not a theorem”, his result
was purely syntactic. We state it without proof below.

I.8.6 Theorem (First Incompleteness Theorem, Syntactic Version). Any


ω-consistent extension of formal Peano arithmetic, effected in such a manner
that the new set of axioms remains recognizable, has undecidable sentences, and
thus is a simply incomplete theory. In particular, one can construct a formula
G which says “I am not a theorem of this theory”. This formula is undecidable.

I.8.7 Remark. In Gödel’s proof simple (ordinary) consistency suffices to prove


the unprovability of G . ω-consistency is called upon to prove that ¬G is not a
theorem either. 

With a different “G ” (let us call it G  ), Rosser extended I.8.6 to the following


result:

I.8.8 Theorem (Gödel-Rosser Incompleteness Theorem). Any (simply) con-


sistent extension of formal Peano arithmetic, effected in such a manner that the
new set of axioms remains recognizable, has undecidable sentences and thus is
a simply incomplete theory.

We already mentioned that ω-consistency is strictly stronger than consistency.


Similarly, it can be seen, once the details of the Gödel argument are laid out,
that correctness is strictly stronger than ω-consistency (cf. volume 1).

The second incompleteness theorem of Gödel is, more or less, a formalization


of the first. In plain English, it says that one of the really true sentences that Peano
arithmetic – or, for that matter, any consistent and recognizable extension –
cannot prove is its own consistency.

† It is straightforward to see that if there were only finitely many really true sentences that the formal
system missed, these could be put into a finite table T , which we can check for membership,
trivially. But then, we have an algorithm that can check a formula for membership in the set union
between the theory’s axioms and T (just search the table; if not found there, then search the set
of nonlogical axioms). Thus, adding the formulas of T to the theory, we have an extension with
a recognizable set of axioms. This new theory trivially has all the formulas in T as theorems.
Hence it has all the really true formulas as theorems (T is all that the original theory missed),
contradicting the fact that this set is uncountable, while the set of theorems is still countable.

This fact showed that Hilbert’s finitary techniques, in the metatheory, were inad-
equate for his purposes: Intuitively, finitary techniques are codable by integers
and therefore can be expressible and usable in formal Peano arithmetic.
Now we have two conflicting situations: Hilbert’s belief that finitary tech-
niques can settle the consistency (or otherwise) of formal theories has had as
a corollary the expectation that Peano arithmetic could settle (prove) its own
consistency (via the formalized finitary tools used within the theory). On the
other hand, Gödel’s second incompleteness theorem proved that this cannot be
done.

I.8.9 Theorem (Gödel’s Second Incompleteness Theorem). Any (simply)


consistent extension of formal Peano arithmetic, effected in such a manner
that the new set of axioms remains recognizable, is unable to prove its own
consistency.

The detailed proof takes several tens of pages to be fully spelled out (cf. vol-
ume 1). However, the proof idea is very simple: Let us fix attention on an
extension T as above, and let “Con” be a sentence whose natural interpreta-
tion (over N) says that T is consistent. Let also G be the sentence that says “I
am not a theorem of T ”.
Now, Gödel’s first theorem (partly) asserts the truth (over N) of

Con → G (1)

i.e., “if T is consistent, then G is true – hence, is not provable, for it says just
that”.

The quoted sentence above is correct, for ω-consistency came into play only
to show that Gödel’s G was not refutable. This part of the first theorem is not
needed towards the proof of the second incompleteness theorem.

Imagine now that we have managed to formalize the argument leading to (1)
so that instead of truth in N we can speak of provability in T :†

T ⊢ Con → G

It follows that if T ⊢ Con, then T ⊢ G by modus ponens, contradicting the first


incompleteness theorem.

† While this is in principle possible – to formalize the argument that leads to the truth of (1) –
this is not exactly how one proves the deducibility of (1), and hence the second incompleteness
theorem, in practice.

I.8.10 Remark. The contribution of Peano arithmetic is that it allows one


to carry out Gödel’s arithmetization formally, and to speak about provability,
within the formal theory. In particular, it allows self-reference.†
Clearly, this machinery exists in all consistent (and recognizable‡ ) extensions
of Peano arithmetic. It also exists in formal theories that may not be, exactly,
extensions but are powerful enough to “contain”, or, more accurately, simulate
Peano arithmetic. Such a theory is ZFC set theory. Clearly it is not an extension,
for the languages do not even match. However we can see that since ZFC is the
foundation of all mathematics, in particular one must be able to do arithmetic
within ZFC.§
Thus the incompletableness phenomenon manifests itself in ZFC as well. In
particular, ZFC has undecidable sentences (first incompleteness theorem), and
it cannot prove its own consistency (second incompleteness theorem).¶ 

I.9. Exercises
I.1. Prove that the closure of I = {3} under the two relations z = x + y and
z = x − y is the set {3k : k ∈ Z}.
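(Not a proof, just a quick empirical sanity check of ours for I.1: iterate the closure of {3} under the two relations for a few rounds and observe that only multiples of 3 ever appear.)

    def closure_rounds(start=(3,), rounds=5):
        s = set(start)
        for _ in range(rounds):
            s |= {x + y for x in s for y in s} | {x - y for x in s for y in s}
        return s

    print(all(n % 3 == 0 for n in closure_rounds()))   # True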
I.2. The pair that effects the definition of Term (I.1.5, p. 13) is unambiguous.
I.3. The pair that effects the definition of Wff (I.1.8, p. 15) is unambiguous.
I.4. With reference to I.2.13 (p. 26), prove that if all the g Q and h are defined
everywhere on their input sets (i.e., they are “total”), that is, I for h
and A × Y r for g Q and (r + 1)-ary Q, then f is defined everywhere on
Cl(I , R).
I.5. Prove that for every formula A in Prop (I.3.2, p. 29) the following is
true: Every nonempty proper prefix (I.1.4, p. 13) of the string A has an
excess of left brackets.

† Briefly, imagine that through arithmetization we have managed to represent every formula, and
every sequence of formulas, of L N by a numeral. Gödel defined a formula P (x, y) which “says”
that the formula coded x is provable by a proof coded y. Self-reference allows one to find a
natural number n such that the numeral  n codes the formula ¬(∃y) P ( n , y). Clearly, this last
formula says that “the formula coded by  n is not a theorem”. But it is talking about itself, for n
is its own code. In short, G ≡ ¬(∃y) P ( n , y).
‡ Recognizability is at the heart of being able to “talk about” provability within the formal
theory.
§ More concretely, and without invoking faith, one can easily show that there is an interpretation,
in the sense of Section I.7, of Peano arithmetic within ZFC. This becomes clear in Chapter V,
where the set of formal natural numbers, ω, is defined.
¶ The formal statement of the incompleteness theorems starts with the hypothesis “If ZFC is
consistent”.

I.6. Prove that any non-prime A in Prop has uniquely determined immediate
predecessors.
I.7. For any formula A and any two valuations v and v  , v̄(A) = v̄  (A) if v
and v  agree on all the propositional variables that occur in A.
I.8. Prove that A [x ← t] is a formula (whenever it is defined) if t is a term.
I.9. Prove that Definition I.3.12 does not depend on our choice of new vari-
ables zr .
I.10. Prove that ⊢ (∀x)(∀y)A ↔ (∀y)(∀x)A .
I.11. Prove I.4.23.
I.12. Prove I.4.24.
I.13. (1) Show that x < y ⊢ y < x (< is some binary predicate symbol; the
choice of symbol here is meant to provoke).
(2) Show informally that ⊬ x < y → y < x
(Hint. Use the soundness theorem.)
(3) Does this invalidate the deduction theorem? Explain.
I.14. Prove I.4.25.
I.15. Suppose that Γ ⊢ ti = si for i = 1, . . . , m, where the ti , si are arbitrary
terms. Let F be a formula, and F ′ be obtained from it by replacing any
number of occurrences of ti in F (not necessarily all) by si . Prove that
Γ ⊢ F ↔ F ′.
I.16. Suppose that Γ ⊢ ti = si for i = 1, . . . , m, where the ti , si are arbitrary
terms. Let r be a term, and r ′ be obtained from it by replacing any number
of occurrences of ti in r (not necessarily all) by si . Prove that Γ ⊢ r = r ′ .
I.17. Settle the “Pause” following I.4.21.
I.18. Prove I.4.27.
I.19. Prove that ⊢ x = y → y = x.
I.20. Prove that ⊢ x = y ∧ y = z → x = z.
I.21. Prove (semantically, without using soundness) that A |= (∀x)A .
I.22. Suppose that x is not free in A . Prove that ⊢ A → (∀x)A and
⊢ (∃x)A → A .
I.23. Prove the distributive laws:
⊢ (∀x)(A ∧ B ) ↔ (∀x)A ∧ (∀x)B and
⊢ (∃x)(A ∨ B ) ↔ (∃x)A ∨ (∃x)B .

I.24. Prove ⊢ (∃x)(∀y)A → (∀y)(∃x)A with two methods: first using the
auxiliary constant method, next exploiting monotonicity.

I.25. Prove ⊢ (∃x)(A → (∀x)A ).


In what follows let us denote by 1 the pure logic of Section I.3 (I.3.13
and I.3.15). Let us now introduce a new pure logic, which we will call 2 .
This is exactly the same as 1 , except that we have a different axiom group
Ax1. Instead of adopting all tautologies, we only adopt the following four
logical axiom schemata of group Ax1:†
(1) A ∨A →A
(2) A →A ∨B
(3) A ∨B →B ∨A
(4) (A → B ) → (C ∨ A → C ∨ B )

2 is due to Hilbert (actually, he also included associativity in the axioms,


but, as Gentzen has proved, this was deducible from the system as here given;
therefore, it was not an independent axiom – see Exercise I.35). In the exercises
below we write i for i , i = 1, 2.
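As a quick sanity check (ours, and certainly not a substitute for the exercises below), one can verify by brute force that the four schemata (1)–(4) are tautologies:

    from itertools import product

    implies = lambda p, q: (not p) or q
    schemata = [
        lambda A, B, C: implies(A or A, A),                               # (1)
        lambda A, B, C: implies(A, A or B),                               # (2)
        lambda A, B, C: implies(A or B, B or A),                          # (3)
        lambda A, B, C: implies(implies(A, B), implies(C or A, C or B)),  # (4)
    ]
    print(all(f(A, B, C)
              for f in schemata
              for A, B, C in product([True, False], repeat=3)))          # True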

I.26. Show that for all F and set of formulas , if  2 F holds then so does
 1 F .

Our aim is to see that the logics 1 and 2 are equivalent, i.e., have exactly the
same theorems. In view of the trivial Exercise I.26 above, what remains to be
shown is that every tautology is a theorem of 2 . One particular way to prove
this is through the following sequence of 2 -facts.

I.27. Show the transitivity of → in 2 :


A → B , B → C 2 A → C for all A, B , and C .
I.28. Show that 2 A → A (i.e., 2 ¬A ∨ A) for any A.
I.29. For all A, B show that 2 A → B ∨ A.
I.30. Show that for all A and B , A 2 B → A.
I.31. Show that for all A, 2 ¬¬A → A and 2 A → ¬¬A.
I.32. For all A and B , show that 2 (A → B ) → (¬B → ¬A). Conclude
that A → B 2 ¬B → ¬A.
(Hint. 2 A → ¬¬A.)
I.33. Show that A → B 2 (B → C ) → (A → C ) for all A, B , C .

† ¬ and ∨ are the primary symbols; →, ∧, ↔ are defined in the usual manner.

I.34. (Proof by cases in 2 .) Show for all A, B , C , D ,


A → B , C → D 2 A ∨ C → B ∨ D
I.35. Show for all A, B , C that
(1) 2 A ∨ (B ∨ C ) → (A ∨ B ) ∨ C and
(2) 2 (A ∨ B ) ∨ C → A ∨ (B ∨ C ).
I.36. Deduction theorem in “propositional” 2 . Prove that if , A 2 B
using only modus ponens, then also  2 A → B using only modus
ponens, for any formulas A, B and set of formulas .
(Hint. Induction on the length of proof of B from  ∪ {A}, using the
results above.)
I.37. Proof by contradiction in “propositional” 2 . Prove that if , ¬A
derives a contradiction in 2 using only modus ponens,† then  2 A
using only modus ponens, for any formulas A and set of formulas .
Also prove the converse.

We can now prove the completeness theorem (Post’s theorem) for the “propo-
sitional segment” of 2 , that is, the logic, 3 – so-called propositional logic
(or propositional calculus) – obtained from 2 by keeping only the “proposi-
tional axioms” (1)–(4) and modus ponens, dropping the remaining axioms and
the ∃-introduction rule.

Note. It is trivial that if  3 A, then  2 A.

Namely, we will prove that, for any A and , if  |=Taut A, then  3 A.


First, a definition:

I.9.1 Definition (Complete Sets of Formulas). A set  is complete iff for


every A, at least one of A or ¬A is a member of . 

I.38. Let Γ ⊬3 A . Prove that there is a complete Δ ⊇ Γ such that also Δ ⊬3 A .
This is a completion of Γ.
(Hint. Let F 0 , F 1 , F 2 , . . . be an enumeration of all formulas. There is
such an enumeration, right?
Define Δn by induction on n:
Δ0 = Γ
Δn+1 = Δn ∪ {F n }     if Δn ∪ {F n } ⊬3 A
       Δn ∪ {¬F n }    otherwise

† That is, it proves some B but also proves ¬B .



To make sense of the above definition, show the impossibility of having


both n ∪ {F n } 3 A and n ∪ {¬F n } 3 A. Then show that  =

n≥0 n is as needed.)
I.39. (Post.) If  |= A, then  3 A.
(Hint. Prove the contrapositive. If  3 A, let  be a completion
(Exercise I.38) of  such that  3 A. Now, for every prime formula
(cf. I.3.1, p. 29) P , exactly one of P or ¬ P (why exactly one?) is in .
Define a valuation (cf. I.3.4, p. 30) v on all prime formulas by

0 if P ∈ 
v(P ) =
1 otherwise
Of course, “0” codes, intuitively, “true”, while “1” codes “false”.
To conclude, prove by induction on the formulas of Prop (cf. I.3.2, p. 29)
that the extension of v, v, satisfies, for all formulas B , v(B ) = 0 iff
B ∈ . Argue that A ∈ / .)
I.40. If  |=Taut A, then  2 A.
I.41. For any formula F and set of formulas ,  1 F iff  2 F .
I.42. Compactness of propositional logic. We say that Γ is finitely satisfiable (in
the propositional sense) iff every finite subset of Γ is satisfiable (cf. I.3.6,
p. 31). Prove that Γ is satisfiable iff it is finitely satisfiable.
(Hint. Only the if part is non-trivial. It uses Exercise I.39. Further hint: If
Γ is unsatisfiable, then Γ |=Taut A ∧ ¬A for some formula A.)
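(One way to organize the contrapositive of the “if” part – a sketch, assuming the easy soundness fact that ⊢3-consequences of a set are also tautological consequences of it.)

% Sketch; "soundness" below names that assumed easy direction.
\begin{align*}
\Gamma \text{ unsatisfiable}
  &\;\Longrightarrow\; \Gamma \models_{\mathrm{Taut}} A \land \lnot A && \text{(vacuously, for any } A\text{)}\\
  &\;\Longrightarrow\; \Gamma \vdash_3 A \land \lnot A && \text{(Exercise I.39)}\\
  &\;\Longrightarrow\; \Gamma_0 \vdash_3 A \land \lnot A \text{ for some finite } \Gamma_0 \subseteq \Gamma && \text{(a proof uses finitely many hypotheses)}\\
  &\;\Longrightarrow\; \Gamma_0 \text{ unsatisfiable} && \text{(soundness)}
\end{align*}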
II

The Set-Theoretic Universe, Naı̈vely

This volume is an introduction to formal (axiomatic) set theory. Putting first


things first, we are attempting in this chapter to gain an intuitive understanding
of the “real” universe of sets and the process of set creation (that is, what we
think is going on in the metatheory). After all, we must have some idea of what
it is that we are called upon to codify and formally describe before we embark
upon doing it.
Set theory, using as primitives the notions of set (as a synonym for “collec-
tion”), atom (i.e., an object that is not subdivisible, not a collection), and the
relation belongs to (∈), has sufficient expressive power to serve as the foun-
dation of all mathematics. Mathematicians use notation and results from set
theory in their everyday practice. We call the sets that mathematicians use the
“real sets” of our mathematical intuition.
The exposition style in this chapter, true to the attribute “naı̈ve”, will be rather
leisurely to the extent that we will forget, on occasion, that our “Chapter 0”
(Chapter I) is present.†

II.1. The “Real Sets”


Naı̈vely, or informally, set theory is the study of collections of “mathematical
objects”.

II.1.1 Informal Description (Mathematical Objects). Set theory is only in-


terested in mathematical objects. As far as set theory is concerned, such objects

† It is our experience that readers of books like this one often choose to ignore “Chapter 0” initially.
Invariably they are compelled to acknowledge its existence sooner or later in the course of the
exposition. This will probably happen as early as Chapter III in our case.


are either
(1) atomic – let us understand by this term an object that is not a collection of
other objects – such as a number or a point on a Euclidean line, or
(2) collections of mathematical objects. 

The foregoing description of “mathematical object” is inductive – describing


the notion in terms of itself† – and, as all inductive descriptions do, it implies
a formation of such objects, from the bottom up, by stages (cf. I.2.9). That
is, we start with atoms.‡ We may then collect atoms to form all sorts of first
level collections, or sets as we will normally say. We may proceed to collect
any mix of atoms and first-level sets to build new collections – that is, second
level sets – and so on. Much of what set theory does is to attempt to remove the
fuzziness from the foregoing description, and it does so by logically developing
the properties of these sets.

II.1.2 Example. Thus, at the beginning we have all the level-0, or type-0,
objects available to us. For example, atoms such as 1, 2, 13, √2 are available.
At the next level we can include any number of such atoms (from none at all
in one extreme, to all available atoms in the other extreme) to build a set, that
is, a new mathematical object. Allowing the usual notation, i.e., listing within
braces what we intend to include, we may cite a few examples of level-1 sets:
L1-1. { }. Nothing listed. This set has the standard notation ∅, and is known as
the “empty set”.
L1-2. {1}.
L1-3. {1, 1}.

L1-4. {1, √2}.
L1-5. {√2, 1}.

Pause. Are the sets that we have displayed under L1-2 and L1-3 the same?
(I mean, equal?) Same question for the sets under L1-4 and L1-5. Our “un-
derstanding” is – gentle way of saying “we postulate” – that set equality is

† Taking for granted an understanding of the terms “atom” and “collection” as intuitively self-
explanatory, we use them to describe the objects that set theory studies. We are purposely leaving
out a description of what “mathematical” is supposed to mean. Suffice it to say that experience
provides numerous examples of mathematical objects, such as numbers of all sorts, points, lines,
vectors, matrices, groups, etc. Of course, one needs an experiential understanding of atomic
mathematical objects only, since all the others are built from those as described in II.1.1.
‡ Atoms are very often called “urelements”, pronounced “ūr-élements” – an anglicized form of the
German word Urelemente – “primeval elements”.

“forgetful of structure” such as repetition or permutation of elements. This un-


derstanding will soon be formally codified by choosing an appropriate axiom
for set equality.
We already can identify a few level-2 objects, using what (we already know)
is available.

Note how the level of nesting of { }-brackets matches the level of the objects.

L2-1. {∅}.
L2-2. {1, {1}}.
L2-3. {{√2, 1}}.

II.1.3 Informal Definition. A set is a non-atomic mathematical object, as the


latter is described in II.1.1 (p. 99). 

The above is not a mathematical definition, because it is not precise. It is only


an understanding on which we will subsequently base our choice of axioms.
We do not need to attempt to search for the “real, definitive ontology” of sets
(whatever that may mean) in order to do set theory, any more than we bother to
search for the real ontology of “number” or “point” before we allow ourselves
to do number theory or geometry, respectively.
From the mathematical point of view we are content to have tools (axioms
and rules of logic) that tell us how sets behave rather than what sets are –
entirely analogously with our attitude towards points and lines when we do
axiomatic geometry, or towards numbers when we do axiomatic arithmetic
(see, for example, our development of Peano arithmetic in volume 1 of these
lectures).

Nevertheless, we will accept throughout this volume the previous (inductive)


intuitive description of sets (II.1.3), doing so not because of some deep philo-
sophical conviction, but in the sense that we will let this accepted† ontology
guide us to choose reasonable axioms.

† It cannot be emphasized strongly enough that “accepted” is a very important verb here. Different
descriptions/ontologies of sets may be possible – for example, one that denies Principle 1 below.
Compare with the similar situation in geometry. It is possible to imagine different types of
geometry – Euclidean on one hand, and various non-Euclidean ones on the other – but one is free
to say “I will accept Euclidean geometry as the ‘true’ depiction of the universe and then proceed
to learn its theorems”. All that the latter acceptance means is a decision to study a particular type
of geometry.

For this process to be effective we have to understand some of the fine points
of II.1.3. Thus we begin by “unwinding” the induction into an iteration. We
obtain the following two principles of set formation that are taken as “obvious
truths”:†

Principle 0. We can form, or build, sets by stages as follows: At stage 0 we


acknowledge the presence of atoms. At each subsequent stage we may form
a mathematical object – a set – by collecting together (mathematical) objects
provided these are available to us from previous stages.
Principle 0 is worded so that it leaves open the possibility that there are some
sets that are obtained outside this formation process. However, our accepted
inductive definition of sets (II.1.3) requires the following as well:

Principle 1. Every set is built at some stage.

II.1.4 Remark. Principle 1 is too strong. Omitting it does not affect the ap-
plicability of set theory to mathematics, i.e., the status of the former as the
“foundation” of the latter. Of course, we cannot omit this principle unless we
modify the descriptions II.1.1 and II.1.3 (for reasons analogous to the pheno-
menon described in I.2.9).
Now, if Principle 1 holds, as it does under our assumptions, then it leads
to the foundation axiom. This comment will make much more sense later. For
now, if you have just read it you have done so at your own risk. 

The following subsidiary (and delightfully vague) principle is important


enough to be listed:

(Subsidiary) Principle 2. If our intuition will accept the existence of a stage


(let us call it Σ) that follows all the (earliest) stages of construction (as a set)
of each non-atomic member of some collection A, then A is a mathematical
object, and hence is a set (A is not atomic, being a collection). The reason: By
invoking Principle 0 we can build A at stage Σ.

† Not less “obvious” than II.1.3, from which they follow directly. The reader may peek once more
into I.2.9 for motivation, forewarned though that the stages of set formation are “far too many”
to be numbered solely by natural numbers.
By the way, we do not normally speak of formation of atoms. Atoms are given outright. It is
sets that we build.

II.1.5 Remark. (1) We are not saying above that stage Σ is the “earliest” stage
at which A can be built, since we have said “follows” rather than “immediately
follows”.
(2) If some set is definable (“buildable”) at some stage Σ, then we find
it both convenient and intuitively acceptable to agree that it is also definable
at any later stage as well. This corresponds to the common experience that a
theorem has proofs of various lengths; once a “short” proof has been given,
then – for example by adding redundant axioms in this proof – we can lengthen
it arbitrarily and yet still have it yield the same theorem.
(3) “If our intuition will accept . . . ”. This condition in Principle 2 creates
some difficulty. Whose intuition? What is acceptable to some might not be to
others.
This is a problem that arises when one does one’s mathematics like a
Platonist. A Platonist accepts some “obvious truths” about mathematical ob-
jects, and then proceeds to discover some more truths by employing (infor-
mal) logical deductions. Most practising mathematicians practise their craft like
Platonists (whether they are card-carrying Platonists or not).
The catch with this approach, especially when applied to something “big” –
by this I mean “foundational” – like set theory, is that one cannot always syn-
chronize the understandings of all Platonists as to what are the “obvious truths”
(about sets) – from where all reasoning begins to flow. There was a time not
too long ago, for example, that mathematicians, otherwise comfortable with
infinite sets, were not unanimous on whether the set-theoretic principle known
as the axiom of choice was valid or not.
In the end, we avoid this difficulty by adopting the axiomatic approach to
set theory. The Platonist within each of us may continue thinking of the sets
that were imperfectly described in II.1.3 as the “real sets” – the ones that,
Platonistically speaking, “exist”. However, we plan to learn about sets by argu-
ing like formalists. That is, we will translate a few obvious and important truths
about real sets into a formal language (these translations will lead to our axiom
schemata) and then employ first order logic as our reasoning tool to learn about
real sets, indirectly, by proving theorems in our formal language.†
Thus, once the imprecise set-formation-by-stages thesis has motivated the
selection of the above-mentioned “few obvious and important truths”, it will

† The indirection occurs because in this language we will use terms to represent or codify real sets,
and formulas to represent or codify properties of real sets. The reader who has read volume 1 is
by now familiar with this approach, which we applied there in Chapter II to the study of (Peano)
arithmetic. For terminology – such as “formal language”, “term”, “formula”, “metatheory” – and
tools from logic, the reader is referred to Chapter I of the present volume.

never be invoked again. Indeed, the opposite will happen. Our axioms will be
strong enough to precisely define (eventually) what stages are and what happens
at each stage, something that we are totally powerless to do now. 

Another criticism of the Platonist’s approach to set theory is that it may entail
contradictions (often called antinomies or paradoxes) which are hard to work
around. Such paradoxes come about in the Platonist approach because it is not
always clear what is a safe “truth” that we can adopt as a starting point of our
reasoning. For example, is the following a “safe truth”? “For any property ‘A’
we can build a set of all the objects x that satisfy A.” We look into this question
in the next section. We also ponder briefly, through an example immediately
below, the nature of set-building, or set-defining, “properties”.
A bit on terminology here: Some people call the contradictions of naı̈ve set
theory “antinomies” (e.g., the Russell antinomy), and the harmless pleasantries
of the Berry type “paradoxes”. Others, like ourselves, use just one term, para-
doxes. The reader may wish to decide for himself on the choice of terminology
here, given that both words are rooted in Greek and a paradox is something
that is “against one’s belief” or even “against one’s knowledge” (δοκῶ = “I
believe”, or, “I know”) while antinomy means being “against the – here, logical
or mathematical – law” (νόμος = “(the) law”).
By the way, Berry’s paradox is this: Define n by “n is a positive integer
definable using fewer than 1000 non-blank symbols of print”.† Examples of
possible values of n: “5”, “10”, “10 raised to the power 350000”, “the smallest
prime number that has at least 10 raised to the power 350000 digits”.
Now, the set of such numbers is finite, since there are finitely many ways to
write a definition employing fewer than 1000 non-blank symbols. Thus, there
are plenty of positive integers that are not so definable. Let m denote the smallest
such.
Then “m is the smallest positive integer not definable using fewer than 1000
non-blank symbols of print”.
Hey, we have just defined m in less than 1000 non-blank symbols of print.
A contradiction!‡

II.1.6 Remark. It should be pointed out that our Platonist’s view of “real sets”
is informed by the work of Russell (and the later work of von Neumann),
† There is an implicit understanding that the set of all available symbols of print is finite: e.g.,
nowadays we could take as such the set of symbols on a standard English computer keyboard.
‡ Well, not really. Neither of the statements “n is a positive integer definable using fewer than
1000 non-blank symbols of print” or “m is the smallest positive integer not definable using fewer
than 1000 non-blank symbols of print” is a definition. What does “definable” mean?

namely, by his suggested “fix” for the paradox that he discovered – see next
section. Georg Cantor, the founder of set theory, did not require any particular
manner, or order, in which sets are formed. The axioms of the ZFC set theory
of Zermelo and Fraenkel describe the von Neumann universe, which is built
by stages, rather than the Cantorian universe. In the latter, as many sets can be
present at once as our thought or perception will allow.† 

II.2. A Naı̈ve Look at Russell’s Paradox


Let us ponder an elementary but fundamental example of what sort of contra-
dictions might occur in the informal approach.

II.2.1 Example. ‡ Let us recall (from Chapter I or from our previous mathe-
matics courses) that the notation
S = {x : A [x]} (1)
denotes (naı̈vely) the set S of all objects x that satisfy the formula A[x].§ This
means that entrance into S is determined by
x∈S iff A[x] (2)
where, of course, by “x ∈ S” we mean “x is a member of S”.
Let us see why the “Russell set”
R = {x : x ∉ x}   (3)
is bad news for the informal approach: By (2), (3) yields
x ∈ R iff x ∉ x   (4)
Now, since the variable x can receive as value any object of the theory, in
particular it can receive the “set” R. Thus, (4) yields the contradiction
R ∈ R iff R ∉ R   (5)
Our only way out of the contradiction (5) is to say that

R is not a set.¶

† In Cantor’s own description, a set is any collection into “a whole” of objects of our “perception
or of our thought”.
‡ Reminder: This is at the informal level.
§ The square and round bracket notation is introduced in I.1.11.
¶ This saves the theory, for now, since then it is “illegal” to plug R into the set/atom variable x;
hence (5) will not be derived from (4).

Here is what happened: We have obtained outrageously many x’s, each


satisfying x ∉ x. We then decided to collect them all, and build a set R. Our
blunder was that we did not verify that Principle 2 (p. 102) applied to R.
No checking, no right to claim sethood for R!
Thus, the fact that R is not a set is neither a surprise nor paradoxical. Ap-
parently we have run out of stages. By the time all the x’s were built, there was
no next stage left at which we could collect them all into a set R.
You are shaking your head. But consider this: x ∈ x has to be false for any
object x. Indeed, it is trivially false for atomic x. For non-atomic x, in order to
build the copy to the right of “∈” I must first have (at an earlier stage† ) the x to
the left of “∈” (since it is a member of the collection x).
Thus x ∉ x is true for all objects x. But then R contains everything, for the
entrance condition in (3) is always true. No wonder there were no stages left to
build R. We have used them all up building the x’s! 

Now that we realize that some collections such as S in (1) above are sets, and
some are not, how can we tell which is which? The axiomatic approach resolves
such issues in an elegant way.

II.3. The Language of Axiomatic Set Theory


Having taken our foregoing terse description of how sets are built – by stages – as
our (Platonist) view of what sets really are, we now want to avoid embarrassing
paradoxes and to turn the theory into a consistent deductive science. The obvious
approach is to translate or codify naı̈ve set theory into a formal first order theory,
in the sense of Chapter I. We begin by choosing a formal first order language,
L Set .
L Set has the standard logical symbols, namely,

∃, ¬, ∨, =, (, )

† “Hmm”, the alert reader will say. “You are using Principle 1 here. You are saying that if x is a
non-atomic mathematical object, then it must be built at some stage.” Indeed! However, even if we
were to totally abandon Principle 1 and revise our naı̈ve picture of the universe of “mathematical
objects” to allow x ∈ x to be true (depending on the “value” of x), we could still avoid the
Russell paradox argument in exactly the same way we avoid it in the presence of Principle 1:
Namely, by restricting the circumstances where the “operation” {x : A[x]} is allowed to build
a set. In short, it is not the choice of an answer to the question “x ∈ x” that creates the Russell
paradox, rather it is a comprehension principle, {x : A[x]}, that is far too powerful for its own
good.

and object variables, that is, variables that when interpreted are interpreted to
take as values (real) sets or atoms,†

v0 , v1 , . . . , vi , . . .

Additionally, L Set has the two primitive nonlogical symbols “∈”‡ and “U ”.§
The former is a binary predicate that is intended to mean (when interpreted) “is
a member of”. The latter is a unary predicate meant to say (of its argument) “is
an atom”. All the remaining familiar symbols of set theory (e.g., ∩, ∪, ⊆, ×)
are introduced as defined nonlogical symbols as the theory progresses.

Of course, exactly as in Chapter I, one introduces, in the interest of convenience,


defined logical symbols, namely, ∀, ∧, →, ↔.

The logical axioms and rules of first order logic will be those that we have
introduced in Chapter I.
Our intended “standard model” – i.e., what we are describing by our formal
system – is the already (imperfectly) described “universe” of all sets and atoms.¶
Having here a standard model in mind, which the axiomatic theory attempts
to describe correctly and completely,# is entirely analogous to what we did in
volume 1.
There we had the standard model of arithmetic, N = (N, S, +, ×, 0, <),
in mind, and each of the axiomatizations introduced, ROB and PA, were suc-
cessive attempts at formally deducing all the true formulas of N from a few
axioms.||

† This is the implementation of our intentions regarding the nature of “mathematical objects”
(II.1.1).
‡ “∈” is a stylized form of ε (épsilon), the first letter of the ancient Greek word “εστί” (pronounced
estı́ – with a short “ i ” – and meaning “is”). Thus, if y is the set of all even integers, x ∈ y says
that “x is an even integer”. Some authors still use x ε y instead of x ∈ y, but we prefer not to do
so, as “ε” is overused (e.g., empty string, epsilon number, Hilbert selector, a major contributor
to the dreaded “ε-δ” proofs of calculus, etc.).
§ “Primitive” means “primeval” or “given at the very beginning”.
¶ Known as the von Neumann universe. That this universe is not a set – it is equal to the Russell
collection R granting Principle 1, is it not? – is an issue we should not worry about, as long as
we accept that its members are all the sets and atoms.
# The terms correctness and (syntactic or simple) completeness of a theory are defined in I.8.2
and I.8.3. The former means that every theorem is true when interpreted in the standard model.
The latter means that all formulas that are true in the standard model are theorems. We have
no difficulty with the former requirement. The latter is impossible by Gödel’s incompleteness
theorems (I.8.5).
|| Again, we could not produce all such formulas, because of Gödel’s incompleteness theorems.

The choice of the intended model influences the choice of (nonlogical)


axioms. We will adopt in this book the Zermelo-Fraenkel axioms (ZF) with
the axiom of choice as an additional axiom. This system is known as ZFC.

II.3.1 Remark. To an observer we will appear to behave like formalists,


manipulating sets and their properties in a finitistic manner, writing proofs
within a first order theory.
Sets will be† just terms of our language, thus finite symbol sequences! Prop-
erties of sets will also be finite objects, the formulas of the language. Finally,
proofs themselves are finite objects, being finite sequences of formulas.
We do not have to take sides or disclose where our loyalties lie – Platonist
vs. formalist camp – as such disclosure is functionally irrelevant. What really
matters is how we act when we form deductions.‡ 

The definitions of terms and formulas for L Set are those given in Chapter I
(I.1.5 and I.1.8) subject to the restriction that the only primitive nonlogical
symbols are the two predicates ∈ and U .

II.3.2 Remark (Basic Language). Thus, the terms of L Set are just the variables,
v0 , v1 , v2 , . . . .
Formulas are built from the atomic ones, that is, U vi , vi = v j , and vi ∈ v j
for all choices of i, j in N, by application of the connectives ¬, ∨, and ∃ (I.1.8).
We call L Set the basic or primitive language of set theory. The qualifiers
“basic” and “primitive” reflect the fact that the only nonlogical symbols are the
primeval ∈ and U . As the theory is being developed, we will frequently introduce
new defined symbols, thus extending L Set (cf. Section I.6). This process also
enlarges the variety of terms (adding terms such as ∅, {x : ¬x = x}, ω, etc.)
and formulas.
We note that the definitions of terms and formulas of L Set are strictly about
syntax – i.e., correct form. Thus they do not concern themselves with semantic
issues or provability issues. In particular, it is good form to write “v2 ∈ v2 ”,
even if one of our axioms will entail that “v2 ∈ v2 ” is a false statement.§ 

† “Be” is used here in formalist jargon. The Platonist terminology is “be denoted by”.
‡ A true formalist would probably declare that the sets of our intuition do not really “exist” –
mathematically speaking – and sets just are the terms of our formal language. See Bourbaki
(1966b, p. 62), where it is stated, in translation, that “[ . . . ] the word ‘set’ will be strictly considered
to be a synonym for ‘term’; in particular, phrases such as ‘let x be a set’ are, in principle, totally
superfluous, since every variable is a term; such phrases are simply introduced to help the intuitive
interpretation of [formal] texts”.
§ We have already remarked in II.2.1 that x ∈ x is false in our intended universe.

II.3.3 Remark (Notational Liberties). In practice we use abbreviations –


in the metalanguage – in order to enhance readability. The reader may wish
to review the metalinguistic argot introduced in Chapter I, in particular the
agreement that calligraphic upper case (Latin) letters stand for formulas (see
Remark I.1.9 for more on this) while t, s, r typically are metasymbols for
arbitrary terms.
We also take liberties with the correct syntax of formulas and terms, writing
them down in abbreviated more readable form.† One type of abbreviation has to
do with reducing the number of brackets that we use when we write formulas.
This has been discussed in Chapter I.
We also have metalinguistic abbreviations for the variables we use. Instead
of the cumbersome v1234777 , v90 , etc., we adopt the convention that any lower or
upper case single Latin letter, with or without subscripts or primes, will denote
an object variable.
We will prefer to name variables using letters near the end of the alphabet.
Nevertheless, we will often introduce variables such as A, b, c, or even go to
Greek and German (Fraktur) alphabets to obtain names for variables, such as
α, β and m, k.
We will almost never write down a well-formed formula of set theory (except
for the purpose of mocking its unfriendliness and awkwardness). We will prefer
“translations” of the formula in our argot, where abbreviations of various sorts,
and natural language, are allowed. This renders the formula easier to read and
comprehend. 

II.3.4 Example. Picking up the last comment above, we show here two exam-
ples of what the judicious use of English saves us from:
(a) We would sooner say “n is a natural number” than write the set theory
formula
(∀x)(∀y)(x ∈ y ∈ n → x ∈ n) ∧
[n = ∅ ∨ (∃x)(¬U (x) ∧ n = x ∪ {x})] ∧
(∀m)(m ∈ n → (∀x)(∀y)(x ∈ y ∈ m → x ∈ m) ∧
   [m = ∅ ∨ (∃x)(¬U (x) ∧ m = x ∪ {x})])

It should be noted that the above is already abbreviated. It contains the


defined symbols ∅, ∪ and {x}, not to mention that the variables used were

† “Abbreviated” is not always shorter. +x × yz is shorter than x +(y ×z), and f t1 t2 t3 is shorter than
f (t1 , t2 , t3 ). Yet the longer forms are easier to understand. An abbreviation here is an alternative,
easier to understand form.

written in argot and that we employed logical abbreviations such as →, ∀,


etc., and brackets of various shapes.
(b) If we are in number theory (arithmetic) we would sooner state “n is a prime”,
than

n > S0 ∧ (∀x)(∀y)(n = x × y → x = S0 ∨ x = n) 

II.4. On Names
The reader is referred to Section I.1 (in particular see the discussion starting
with Remark I.1.3 on p. 10) so that we will not be unduly repetitive in the present
section.

II.4.1 Remark (The Last Word on “Truth”). The completeness theorem


shows that the syntactic apparatus of a first order (formal) logic totally captures
the semantic notion of truth, “modulo” the acceptance as true of any given assu-
mptions, . This justifies the habit of the mathematician (even of the formalist –
see Bourbaki (1966b, p. 21)) of saying – in the context of any given theory  –
“it is true”, meaning “it is a -theorem”, or “it is -proved”; “it is false”, mean-
ing “the negation is a -theorem”; “assume that A is true”, meaning “add the
formula A – to  – as a nonlogical axiom”; and “assume that A is false”,
meaning “add to  the formula ¬A, as a nonlogical axiom”.
There is another meaning (and use) of “true” which is not equivalent to
deducibility. This is what we have called the “really true”, meaning what is true
in the intended, or standard, model.
The Gödel incompletableness phenomenon tells us that strong theories like
set theory or arithmetic will never† have deducibility coincide with “real truth”.
This is because there will always be sentences A that are neither provable nor
refutable – but one of them surely is “really true”!
We plan to abandon the qualifier “real” (as we promised in an earlier footnote)
and the quotes around true (in the standard model). To avoid confusion with
the “other” true ( = deducible) we will do the following:
Whenever we mean “is proved” or “is provable”, we just say so. We will not
say “is true” instead. 

It will be convenient (and it is standard practice) to use the symbol sequences


that are terms of the formal theory as names for their counterparts, real sets of the

† “Never” as long as all consistent augmentations of the set of axioms preserve the set’s
recursiveness – or “recognizability”.

metatheory. For example, in the metatheory we may say “the set {x : ¬x = x}”,
thus using the symbol sequence “{x : ¬x = x}” to name some appropriate real
set, the so-called empty set.†

This correspondence between certain terms‡ of the type “{x : A [x]}” and real
sets is nothing else than an application of first order definability (cf. I.5.15).
That is, if some real set A is first order definable in the standard model by a
formula A, we have

x∈A iff A(x) is true in the standard model

Thus the symbol sequence A , or more suggestively the symbol sequence


{x : A(x)}, can name the set A. As we know, the latter sequence is pronounced
“the set of all x such that A(x) is true”.

Reciprocally, in our argot, we nickname formal terms and their formal ab-
breviations by the metamathematical (often English) names of the sets that
they name. Thus, e.g., we say that {x : ¬x = x}, or ∅, is “the empty set of the
(formal) theory”.
We note two limitations of this naming apparatus below.

II.4.2 Remark (Limitations of our Naming Mechanism).


(a) Inconvenience. This stems from the fact that formal terms, even for very
simple sets, can be horrendously long, and thus can be quite incomprehen-
sible. For this reason we almost always introduce, via formal definitions,
short names for such terms (formal abbreviations) that we just make up –
that is, we name the formal names by more convenient (shorter) names
that we invent. These shorter defined names become part of the formal lan-
guage of the formal theory. For example, the term {x : ¬x = x} is formally
abbreviated by a new (defined) constant symbol, ∅.
(b) Formal limitations. First, we cannot name – that is, first order define –
all the real sets by terms, because there are far more sets than terms that
we can supply via the formal language. We cannot even so define all the
subsets of the set of natural numbers. As a consequence, we cannot codify
all “properties” of sets in our language as formulas either, because there are
far too many properties but too few formulas.§ Second, as if the short supply

† I am guilty here of borrowing from the sequel. “{x : ¬x = x}” is not a term of the basic
language II.3.2; instead it is a defined term, about which we will talk soon.
‡ Russell’s paradox is fresh in our memory; thus, “certain” is an apt qualifier.
§ We cannot gloss over this shortage of names by extending L Set by the addition of a name for
each real set. That would make our language impractical, as it would make it uncountable, and
therefore impossible to generate finitely. In an uncountable language we will not be able to write,
or even check, proofs anymore, as we will have trouble telling what symbols belong to L Set and
which do not. As a result, we will be unable to know whether an arbitrary string of symbols is a
formula, an axiom, or just rubbish.

of formulas were not limiting enough, Gödel’s first incompleteness theorem


yields another insurmountable limitation. It tells us that in any consistent
axiomatization of set theory through a recognizable set of axioms there will
be infinitely many “true properties” of sets for each of which we do have a
formal name, but nevertheless none of these formal names (formulas of L Set )
are deducible in the formal theory. Thus the theory can only incompletely
capture “real truth”.

Unlike the limitation of convenience (a), which we have easily circumvented


above, there is no solution for (b).
But then, why bother with a formal theory at all? I can state two reasons.
The first is the precision that such a theory gives to the concept of deduction,
turning it into a mathematical (finite) object. Thus, questions such as “is our
set theory free from contradiction?”† or “what is, and what is not, deducible
from what axioms?” become meaningful and can be handled mathematically,
in principle.
The above are metatheoretical concerns. The second reason has to do with
everyday mathematical practice. We benefit from the precision that a formal
theory gives to the praxis of deduction, guarding us against embarrassing para-
doxes that loose arguments or loose assumptions may lead to. 

II.4.3 Example. This, like most examples, is in the “real (informal) realm”.
The natural numbers 0, 1, 2, . . . when collected together form a set, normally
denoted by N.
We often capture the above sentence informally by writing N = {0, 1, 2, . . . }.


II.4.4 Remark. N is a remarkable example of a (real) set, in that we have no


easy way to give a term name for it in set theory. The next best thing to do is,
instead, to find another real set, ω, “isomorphic to N ”,‡ that can easily be seen
to have a term counterpart in the formal theory.
Needless to say, as follows from our previous discussion, both the real ω and
the corresponding formal term are denoted by the same symbol, ω.

† This is the original reason that prompted the development of axiomatic theories.
‡ The reader should not worry about the meaning of “isomorphic”. We will come back to this very
issue in Chapter V.

Notwithstanding the comment regarding N, we will continue employing it,


as well as other familiar sets from the metatheory (such as Z (the integers), Q
(the rationals), and R (the reals)) in our informal discussions, i.e., in examples,
remarks, “naı̈ve” exercises, etc. 

II.4.5 Remark. At the outset of Section II.3 we promised “to turn the theory
into a consistent deductive science”.
It may come as a shock to the reader that we have no (generally acceptable)
proof of consistency of ZFC. We Platonistically got around the consistency
question of either ROB or Peano arithmetic by saying “sure they are consistent;
N is a model of either”, since few reasonable people will feel uncomfortable
about N or its fitness to certify consistency (serving as a model). Notwith-
standing this, proof theorists have found alternative constructive proofs of the
consistency of Peano arithmetic and hence of ROB (such proofs can be found
in Schütte (1977) and Shoenfield (1967)). These proofs necessarily use tools
that are beyond those included in Peano arithmetic (because of Gödel’s second
incompleteness theorem).
We have no such constructive proof of the consistency of ZFC. This, of
course, is not surprising. Since ZFC satisfies Gödel’s second incompleteness
theorem, a proof of its consistency cannot be formalized within ZFC. Here then
is the difficulty: What will any such consistency proof “outside” or “beyond”
ZFC look like, considering that
(a) it cannot be expressed in ZFC, and yet
(b) ZFC, being the “foundation of all mathematics” (or such that “mathematics
can be embedded” in it), ought to be able to include (formalizations of ) all
mathematical tools and mathematical reasoning – including a formalization
of its consistency proof that was given “outside” ZFC.
However, most set theorists are willing to accept the consistency of ZFC.
“Evidence” (but not a proof ) of this consistency is, of course, the presence of
the standard model. 
III

The Axioms of Set Theory

III.1. Extensionality
Under what conditions are two sets equal?

First of all, if a and b stand for urelements, then a = b just obeys the logical
axioms of equality (Definition I.3.13, p. 35) and we have nothing to add about
their behaviour concerning equality.

For sets, however, we require that they be equal whenever they contain
exactly the same elements, regardless of whatever “structural connections”
these elements may have. In order to state this axiom formally we use the
primitive predicate of set theory, U . Thus U x is intended to mean “x is an
urelement” (therefore ¬U x will mean “x is a set”).
We use the “abbreviation”† “U (x)” for “U x”, since it is arguable that, in
general, “P(t1 , . . . , tn )” is more user-friendly than “Pt1 . . . tn ”.

III.1.1 Axiom (Extensionality).


 
¬U (A) ∧ ¬U (B) → ((∀x)(x ∈ A ↔ x ∈ B) → A = B)   (E)
In words, for any sets A and B, if they have the same elements, then they are
equal.

III.1.2 Remark. We noted that the above axiom, (E), indicates that we want
two sets to be equal as long as they have the same elements, regardless of
the existence of inner structure in the sets (such as one dimensional or higher

† We have already remarked in a footnote on p. 109 that an “abbreviation” is meant to create


easier-to-read text, not necessarily shorter text.


dimensional order) and regardless of “intention”, that is, how the set originally
came about. For example, the set that contains the integers 2 and 3 is expected
to be the same as (equal to) the set of all roots of x 2 − 5x + 6 = 0, despite the
difference in the two descriptions. That is, we have postulated that set equality
is “forgetful of structure”.
It is the extension of a set (i.e., its actual contents) that decides equality,
hence the name of Axiom III.1.1.
But is this axiom “true”?† Is this the condition that governs equality of “real”
sets? Well, formal or axiomatic mathematics aims at representing reality within
an artificial but formal and precise language. In this “representation” there is
always something lost, partly due to limitations of the formal language and
partly due to decisions that we make – regarding the choice of our assumptions,
or axioms – about what features of “reality” are essential (of which we create
counterparts in the formal language) and which are not.
For example, a “real” line has width no matter how you construct it, but
geometers have decided that width is irrelevant, so they invent their lines so
as to have no width. In our case, we are saying that to decide set equality we
forget all attributes of sets other than what elements they contain. This is what
we deem to be important.
Now that we have defended our choice of (E), another question arises: Is
(E) a definition? Much of the elementary literature on the theory of sets takes
the point of view that it is (see Wilder (1963, p. 58), for example), although
often somewhat casually.
A formal definition would introduce the symbol “=” by (E), if the symbol
were not part of our “logical list” of symbols. Since we already have “=” and
its basic axioms, (E), for us, is an axiom.‡ 

Note that in the extensionality axiom we state no more than what we need –
following the mathematician’s known propensity to assume less in order to have
the pleasure of proving more. This accounts for using “· · · → A = B” rather
than “· · · ↔ A = B”. In fact, we have

⊢ ¬U (A) ∧ ¬U (B) → (A = B → (∀x)(x ∈ A ↔ x ∈ B))   (1)

where “⊢” indicates provability without using any nonlogical axioms.

† Remember that while we cannot give a proof of consistency of ZFC, we can at least check that its
axioms are “really true”, i.e., true in the standard model. This checking is done on the informal
level, of course.
‡ In fact, a formal definition is still an axiom, via which a new formal symbol is introduced – as
we saw in Section I.6. But this is not the case with (E).

To see this, note that
⊢ A = B → (x ∈ A ↔ x ∈ B)
by equality axioms. Then, since A = B has no free x,† ∀-introduction (cf. I.4.5,
p. 44) yields
⊢ A = B → (∀x)(x ∈ A ↔ x ∈ B)
(1) now follows by tautological implication (cf. I.4.1, p. 43).

We have said that urelements have no set-theoretic structure, that is, if b is


an urelement, then the claim a ∈ b is false for all possible meanings of a. This
is formalized below.

III.1.3 Axiom (Urelements are “Atomic”).


U (y) → ¬(∃x)x ∈ y
The above says that urelements do not have any elements; however, that does
not make them empty sets, for urelements are not sets. The content of III.1.3
can also be written as
U (y) → (∀x)¬x ∈ y
or even
U (y) → (∀x) x ∉ y
where “x ∉ y” is an informal (metamathematical) abbreviation of “¬x ∈ y”.

III.1.4 Remark. The contrapositive of III.1.3 is


(∃x)x ∈ y → ¬U (y)
that is, intuitively, “if y has any elements, then it is a set”.
It is also useful to note the consequence
x ∈ y → ¬U (y)
of the above (substitution axiom x ∈ y → (∃x)x ∈ y and tautological impli-
cation). 
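Spelled out, nothing is used beyond the two facts just cited:

% The three lines below only restate the parenthetical justification above.
\begin{align*}
&x \in y \to (\exists x)\,x \in y && \text{(substitution axiom)}\\
&U(y) \to \lnot(\exists x)\,x \in y && \text{(Axiom III.1.3)}\\
&x \in y \to \lnot U(y) && \text{(tautological implication from the two lines above)}
\end{align*}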

III.1.5 Definition (Subsets, Supersets). We introduce to the formal language


a new predicate of arity 2, denoted by “⊆”, by the defining axiom
A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B) (1)

† A and B are free variables distinct from x.



In English, in the case where A and B are sets, this says – that is, the semantics
in the metatheory is – that A ⊆ B is a short name for the statement “every
member of A is also a member of B”.
We read “A ⊆ B” as “A is a subset of B”, or “B is a superset of A”. Instead
of A ⊆ B we sometimes write B ⊇ A. As usual, we negate ⊆ and ⊇ by writing
⊈ and ⊉ respectively.

III.1.6 Remark. In III.1.5 we chose to allow the symbol ⊆ to act on any objects
of set theory – sets or atoms. An alternative approach that is often adopted in the
literature on naı̈ve set theory is to make A ⊆ B undefined (or meaningless) if
either A or B is an atom (this would be analogous to the situation in Euclidean
geometry, where, for example, parallelism is undefined on, say, triangles or
circles). Our choice in III.1.5, that is, (1), is technically more convenient, since
it does not require us to know the exact nature of A or B before we can use the
(formal) abbreviation A ⊆ B.
We note that, according to III.1.5, x ∈ A → x ∈ B is provable if A is an
urelement (by III.1.3), that is, A ⊆ B is provable. Indeed,

U (A) ⊢ZFC (∀x)¬x ∈ A

by Axiom III.1.3 and modus ponens. Thus,

U (A) ⊢ZFC ¬x ∈ A   (2)

by specialization (cf. I.4.6, p. 44). By tautological implication followed by


generalization (I.4.8) we get what we want from (2):

U (A) ⊢ZFC (∀x)(x ∈ A → x ∈ B)

or, applying the deduction theorem (I.4.19),

⊢ZFC U (A) → (∀x)(x ∈ A → x ∈ B)

We use the provability symbol with a subscript, e.g., ⊢T, to indicate in
which theory T (i.e., with what nonlogical axioms) we carry out the proof. In
the simple proofs above we have used the subscript ZFC, but we only employed
Axiom III.1.3. We will seldom indicate what subset of ZFC axioms we are using
at any given moment, and whenever we do, we will normally do so in words
rather than using some ⊢-subscript different from ZFC.
The reader will also note that “A ⊢T . . .” is the same as “T + A ⊢ . . .”
or “T ∪ {A} ⊢ . . .”.

III.1.7 Example. Since x ∈ a → x ∈ a is a tautology (note the absence of sub-
script on the ⊢ that follows), we have ⊢ x ∈ a → x ∈ a and hence
⊢ (∀x)(x ∈ a → x ∈ a)   (∗)
by generalization. Thus, by III.1.5 and tautological implication,
⊢ a ⊆ a   (∗∗)
or any object is a subset of itself.
We did not use a subscript (e.g., ZFC) on ⊢ immediately above because no
ZFC axioms were used.

We immediately infer from III.1.1 and III.1.5 the following Proposition


III.1.8.

III.1.8 Proposition. For any two sets A and B,


A = B ↔ A ⊆ B ∧ B ⊆ A
holds, or, formally,
⊢ZFC ¬U (A) ∧ ¬U (B) → (A = B ↔ A ⊆ B ∧ B ⊆ A)

An observation is in order in connection with the above: Logical connectives


have lower priority than any other connectives, so that “A ⊆ B ∧ B ⊆ A”
means “(A ⊆ B) ∧ (B ⊆ A)”.

Proof. Invoking the deduction theorem (I.4.19), we prove instead


¬U (A), ¬U (B) ⊢ZFC A = B ↔ A ⊆ B ∧ B ⊆ A   (1)
We offer a calculational proof:

A ⊆ B ∧ B ⊆ A
↔ 〈III.1.5 and the equivalence theorem (I.4.25, p. 52)〉
(∀x)(x ∈ A → x ∈ B) ∧ (∀x)(x ∈ B → x ∈ A)
↔ 〈∀ over ∧ distributivity (Exercise I.23, p. 95)〉
(∀x)((x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A))
↔ 〈tautological equivalence and equivalence theorem〉
(∀x)(x ∈ A ↔ x ∈ B)
↔ 〈extensionality, plus assumptions for “→”; Leibniz axiom for “←”〉
A = B
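(As an aside, the same calculational format delivers the transitivity of ⊆; no ZFC axiom is needed, so the turnstile would carry no subscript.)

% A sketch in the style of the proof above; only III.1.5 and Chapter I facts are used.
\begin{align*}
A \subseteq B \land B \subseteq C
  &\leftrightarrow (\forall x)(x \in A \to x \in B) \land (\forall x)(x \in B \to x \in C) && \text{(III.1.5)}\\
  &\leftrightarrow (\forall x)\bigl((x \in A \to x \in B) \land (x \in B \to x \in C)\bigr) && \text{($\forall$ over $\land$ distributivity)}\\
  &\to (\forall x)(x \in A \to x \in C) && \text{($\forall$-monotonicity and taut.\ implication)}\\
  &\leftrightarrow A \subseteq C && \text{(III.1.5)}
\end{align*}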

It often happens that A ⊆ B, yet A ≠ B, where “A ≠ B” is (informal) short
for “¬A = B”.

III.1.9 Definition (Proper Subsets). We introduce a new predicate symbol of


arity 2, denoted by “⊂”, by the defining axiom
A ⊂ B ↔ A ⊆ B ∧ ¬A = B
We read “A ⊂ B” as “A is a proper subset of B”. 

The reader will note that, stylistically, ⊆ and ⊂ parallel the symbols ≤ and
< (compare how a < b, for numbers, means a ≤ b and a ≠ b). It should be
mentioned however that it is not uncommon in the literature (e.g., in Bourbaki
(1966b), Shoenfield (1967)) to use ⊂ where we use ⊆, and then to need to use
⊊ or even ⫋ to denote proper subset.

III.2. Set Terms; Comprehension; Separation


We now want to imitate the informal act of collecting into a set all objects x that
satisfy (i.e., make true – in the standard model) a formula A [x]. We already
saw that a careless approach here entails dangers (Section II.2). We revisit this
issue again here, and then provide a formal fix.
It is clear that the reasonable thing to do within the formal theory is to restrict
attention to formulas A for which we can prove the existence of a set, say a,
such that
x ∈ a ↔ A [x]
is provable, thus replacing truth (cf. I.5.15 and also p. 111) by provability. We
achieve this if we can prove
 
⊢ZFC (∃y)(¬U (y) ∧ (∀x)(x ∈ y ↔ A [x]))   (1)

where we have taken the precaution that y is not free in A.† Bourbaki (1966b)
calls formulas such as A “collecting”.

As in loc. cit., we use the symbol

Coll x A (2)

† Otherwise we would be attempting to “solve” for y in something like “x ∈ y ↔ A (x, y)”, which
is not the same as collecting in a container called y all those “values” of x that make A (x, z)
“true” for an arbitrarily chosen value of the “parameter” z. Such rather obvious remarks will
become sparser as we go along.

as an abbreviation of
 
(∃y)(¬U (y) ∧ (∀x)(x ∈ y ↔ A [x]))

Note that x is not a free variable in (1) (or in (2)), but it is, nevertheless, the
free variable of interest in A, the variable whose “values” (that “satisfy” A)
we are determined to collect. (2) says that indeed we can collect these “values”
into a set.

III.2.1 Example. We verify that if y is a set, then Coll x x ∈ y. It will be best to


give a terse annotated proof of the formal translation of the italicized statement,
that is,

⊢ZFC ¬U (y) → Coll x x ∈ y   (A)

It is easier to tackle instead


 
¬U (y) ⊢ZFC (∃z)(¬U (z) ∧ (∀x)(x ∈ z ↔ x ∈ y))   (B)

where z is distinct from y and x:


 
(1) ¬U (y)      given
(2) x ∈ y ↔ x ∈ y      tautology, hence logical axiom
(3) (∀x)(x ∈ y ↔ x ∈ y)      (2) plus generalization (I.4.8, p. 45)
(4) ¬U (y) ∧ (∀x)(x ∈ y ↔ x ∈ y)      (1), (3) plus taut. implication
(5) ¬U (y) ∧ (∀x)(x ∈ y ↔ x ∈ y) → (∃z)(¬U (z) ∧ (∀x)(x ∈ z ↔ x ∈ y))      logical axiom
(6) (∃z)(¬U (z) ∧ (∀x)(x ∈ z ↔ x ∈ y))      (4), (5), and modus ponens

We also note that, by the deduction theorem, (B) yields


 
⊢ZFC ¬U (y) → (∃z)(¬U (z) ∧ (∀x)(x ∈ z ↔ x ∈ y))   (C)

which is an expanded notation for (A). Intuitively, all of (A), (B), and (C) say
that if y is a set, then ZFC allows us to collect all the x for which x ∈ y is true

into a set. This is hardly surprising at the intuitive level, since this collection
that we form is just y – and we already know it is a set. Nevertheless, it is
reassuring that we have had no bad surprises here. 

III.2.2 Example (Russell’s Paradox, Second Visit). In this example we go


out of our way to show that Russell’s paradox can be argued within pure logic –
in II.2.1 we appeared to be arguing within (informal) set theory – and that it
has nothing to do with one’s position on the set-theoretic question “x ∈ x?”
We prove that if A(x, y) and B (y) are any formulas,† then
⊢ ¬(∃y)(B (y) ∧ (∀x)(A (x, y) ↔ ¬A(x, x)))   (i)

We prove (i) by contradiction (I.4.21, p. 51) combined with proof by auxiliary
constant (I.4.27, p. 53):
(1) (∃y)(B (y) ∧ (∀x)(A(x, y) ↔ ¬A(x, x)))      added or “given”
(2) B (c) ∧ (∀x)(A(x, c) ↔ ¬A(x, x))      added; c is a new constant
(3) (∀x)(A(x, c) ↔ ¬A(x, x))      (2) plus tautological implication
(4) A(c, c) ↔ ¬A(c, c)      (3) plus specialization

The formula in line (4) is, or creates, a contradiction, since also ⊢ A(c, c) ↔
A(c, c).
Taking B (y) to be the special case of ¬U (y), and A(x, y) to be x ∈ y,
we have refuted Coll x x ∉ x, that is, we have established that (note absence
of subscript on ⊢) ⊢ ¬Coll x x ∉ x without ever using any nonlogical axioms
or being aware of the question x ∈ x. Our third (and last) visit to Russell’s
paradox will show that what is at work here is a Cantor diagonalization.
Thus, Frege’s (1893) axiom of comprehension that
for all formulas A , one has Coll x A
is refutable within pure logic by providing the counterexample A ≡ x ∉ x.‡


† You may want to look at I.1.11, p. 19.


‡ The reader may recall that we have reserved “≡” for string equality, not formula equivalence
(see p. 13).

Russell’s paradox was not the first paradox discovered in naı̈ve set theory
as originally developed by Georg Cantor. The Burali-Forti antinomy had been
suggested earlier, and we will get to it at the proper place in our development.
Russell’s paradox is less technical on the one hand, and is immediately rele-
vant to our present discussion on the other hand; thus we opted for its early
presentation.
Obviously, comprehension, as stated by Frege, was too strong and allowed
some “super-collections” to be built (like R) which are not sets.
It should be noted here that in the work of Cantor the comprehension schema
was used carefully so as not to construct “too large” or “too complicated” sets,
as compared with the “ingredient” sets that entered into such constructions. For
this reason, his work did not explicitly lead to Russell’s objection, the latter
being aimed at Frege.
We still want to be able to collect into a set all “x-values” that satisfy “rea-
sonable” formulas A – leaving the unreasonable ones out. Let us work towards
identifying such reasonable formulas. But first a lemma:

III.2.3 Lemma.

⊢ZFC ¬U (y) ∧ (∀x)(x ∈ y ↔ A) ∧ ¬U (z) ∧ (∀x)(x ∈ z ↔ A) → y = z

Proof.

¬U (y) ∧ (∀x)(x ∈ y ↔ A) ∧ ¬U (z) ∧ (∀x)(x ∈ z ↔ A)
↔ 〈∀ over ∧ distributivity, taut. equivalence and I.4.25, p. 52〉
¬U (y) ∧ ¬U (z) ∧ (∀x)((x ∈ y ↔ A) ∧ (x ∈ z ↔ A))
→ 〈∀-monotonicity (I.4.24, p. 52) and taut. implication〉
¬U (y) ∧ ¬U (z) ∧ (∀x)(x ∈ y ↔ x ∈ z)
→ 〈extensionality – only this step used a ZFC axiom〉
y = z

(Recall the discussion in I.7.11) 

III.2.4 Remark. Let us recall the basics of introducing new function symbols
(Section I.6). Suppose that we have the following:

⊢T (∃y)A(y, xn )   (1)



and

⊢T A (y, xn ) ∧ A(z, xn ) → y = z   (2)

Then we may introduce a new function symbol, say f A , into the language of
T by the axiom

f A (xn ) = y ↔ A(y, xn ) (3)

We also know that (3) is provably equivalent to (4) below (see p. 73), so that (4)
could serve as well as the introducing axiom:

A( f A (xn ), xn ) (4)

We finally recall (p. 73) the notation (Whitehead and Russell (1912)) for the
term f A (xn ):

(ιy)A(y, xn ) (5)

A special case of the above is important. Suppose that t(xn ) is a term, where we
have written “(xn )” to indicate the totality of free variables in t. Then substitution
in the logical axiom x = x yields t = t; thus the substitution axiom and modus
ponens yield

⊢ (∃y)y = t   (6)

Note the absence of subscript from ⊢ above. Since equality is transitive, we


also have

⊢ y = t ∧ z = t → y = z   (7)

We may thus introduce a new function symbol f t of arity m ≥ n by the axiom


(form (3) above)

f t (ym ) = y ↔ t = y (8)

or equivalently (form (4) above)

f t (ym ) = t (9)

where the list ym contains all the variables xn of t.


An important, more general case of (1)–(2) often occurs in practice. We may
have a proof of (1) for some, but not all, xn :

⊢T D (xn ) → (∃y)A(y, xn )   (10)



We assume that we still have (2) in the restricted form

⊢T D (xn ) → (A(y, xn ) ∧ A (z, xn ) → y = z)   (11)

We would like now to introduce a function f A that satisfies (3) (or (4)) for
precisely those xn that satisfy D . We could define f arbitrarily for those xn for
which D fails.
Let then a be some constant in the language of T . Let

B (y, xn ) ≡ D (xn ) ∧ A(y, xn ) ∨ ¬D (xn ) ∧ y = a (12)

We show that

⊢T B (y, xn ) ∧ B (z, xn ) → y = z   (13)

We will employ the deduction theorem:
(i) D (xn ) ∧ A(y, xn ) ∨ ¬D (xn ) ∧ y = a      assume
(ii) D (xn ) ∧ A(z, xn ) ∨ ¬D (xn ) ∧ z = a      assume
(iii) D (xn ) ∧ A(y, xn ) ∧ A(z, xn ) ∨ ¬D (xn ) ∧ y = a ∧ z = a      (i), (ii) plus |=Taut
(iv) y = z      proof by cases: 1st disjunct of (iii) plus (11); 2nd disjunct plus trans. of “=”

We next note that

⊢T (∃y)B (y, xn )   (14)

Indeed,
(i) D (xn )      assume
(ii) (∃y)A(y, xn )      (i), (10) and MP
(iii) A(c, xn )      assume; c is a new constant
(iv) D (xn ) ∧ A(c, xn ) ∨ ¬D (xn ) ∧ c = a      (i), (iii) plus |=Taut
(v) (∃y)(D (xn ) ∧ A(y, xn ) ∨ ¬D (xn ) ∧ y = a)      (iv), subst. axiom plus MP

By the deduction theorem

⊢T D (xn ) → (∃y)B (y, xn )   (15)



Next consider
(i) ¬D (xn )      assume
(ii) ¬D (xn ) ∧ a = a      (i), ⊢ a = a and |=Taut
(iii) D (xn ) ∧ A(a, xn ) ∨ ¬D (xn ) ∧ a = a      (ii) plus |=Taut
(iv) (∃y)(D (xn ) ∧ A(y, xn ) ∨ ¬D (xn ) ∧ y = a)      (iii), subst. axiom plus MP
Thus, by the deduction theorem,

⊢T ¬D (xn ) → (∃y)B (y, xn )   (16)

(15) and (16) yield (14) via proof by cases. (13) and (14) allow the introduction
of f A by the axiom

B ( f A (xn ), xn ) (17)

That is,

D (xn ) ∧ A( f A (xn ), xn ) ∨ ¬D (xn ) ∧ f A (xn ) = a (18)

Since

D (xn ), (18) |=Taut A( f A (xn ), xn )

and

¬D (xn ), (18) |=Taut f A (xn ) = a

we get

(18) ⊢T D (xn ) → A ( f A (xn ), xn )   (19)

and

(18) ⊢T ¬D (xn ) → f A (xn ) = a   (20)

In other words, (10) and (11) allow us to introduce a new function symbol f A
that satisfies (19) and (20). (20) defines f A “arbitrarily” for those xn where D
fails.
It is easy to check, just as we did on p. 73, that (19) is provably equivalent to
 
(18) ⊢T D (xn ) → (A(y, xn ) ↔ y = f A (xn ))   (19′)
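(To make the guarded mechanism (10)–(20) concrete, here is a sketch of an instance from Peano arithmetic; the particular D , A, and a are chosen only for illustration: D (x) ≡ ¬x = 0, A(y, x) ≡ x = Sy, and a the constant 0.)

% Illustration only; the choices of D, A, a are ours, not part of the surrounding development.
\begin{align*}
&\vdash_{\mathrm{PA}} \lnot\,x = 0 \to (\exists y)\,x = Sy && \text{(an instance of form (10))}\\
&\vdash_{\mathrm{PA}} \lnot\,x = 0 \to \bigl(x = Sy \land x = Sz \to y = z\bigr) && \text{(form (11); } S \text{ is injective)}\\
&\vdash_{\mathrm{PA}} \lnot\,x = 0 \to x = S f_{\mathcal A}(x), \qquad \vdash_{\mathrm{PA}} x = 0 \to f_{\mathcal A}(x) = 0 && \text{(forms (19) and (20))}
\end{align*}

The introduced $f_{\mathcal A}$ behaves as a predecessor function, set (arbitrarily) to 0 at the one argument where the guard D fails.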


III.2.5 Definition (Set Terms). If ⊢ZFC Coll x F (x, z n ), then Lemma III.2.3
allows us to introduce the term
(ιy)(¬U (y) ∧ (∀x)(x ∈ y ↔ F (x, z n )))   (st)

We call the above a set term, defined by the formula F and the objects
z1, . . . , zn .
We (almost always) use the shorter, and standard, metamathematical abbre-
viation

{x : F (x, z n )} (sst)

instead of the notation (st). 

The reader will recall from Section I.6 that, in actual fact, a formal defini-
tion introduces a function symbol, not a term. However, we agree to leave the
“ontology” of that function symbol, say, “ f F ”, unspecified, and we agree to
use the argot (st) or (sst) above to informally denote the term, f F (z n ), that
corresponds to f F .
Nevertheless, whenever ⊢ZFC Coll x F , either of the notations (st) or (sst)
stands for (i.e., names) a formal term of the theory.
It is important to note that set terms give rise to more complicated terms than
just variables. The latter are the only terms of the basic language L Set (see II.3.2),
while as we enrich the language by the (formal) addition of new function sym-
bols f A , f B , etc., and constants ∅, ω, etc., we can build complicated terms
such as f A (. . . , f B (. . . , ω, . . .), ∅, . . .) (see I.1.5). Such terms we will call just
terms (or “formal terms”, to occasionally emphasize their formal status).

We immediately have

III.2.6 Proposition (Set Term Facts). If ⊢ZFC Coll x F , then:

(i) ⊢ZFC y = {x : F } ↔ ¬U (y) ∧ (∀x)(x ∈ y ↔ F ).
(ii) ⊢ZFC ¬U ({x : F }).
(iii) ⊢ZFC x ∈ {x : F } ↔ F .
(iv) If also ⊢ZFC Coll x G , then ⊢ZFC (∀x)(F → G ) ↔ {x : F } ⊆ {x : G }.
(v) If also ⊢ZFC Coll x G , then ⊢ZFC (∀x)(F ↔ G ) ↔ {x : F } = {x : G }.

Proof. (i): This is (3) in III.2.4 above, that is, the introductory axiom for “ f A ”,
where A is “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”.
(ii): By (4) in III.2.4 and tautological implication.

(iii): By (4) in III.2.4 and tautological implication followed by specializa-


tion.
(iv): By (iii) and the equivalence theorem,

⊢ZFC (∀x)(F → G ) ↔ (∀x)(x ∈ {x : F } → x ∈ {x : G })

Note that the assumption ⊢ZFC Coll x G allows us to introduce {x : G } form-


ally and have (iii) (with F replaced by G ).
(v): Similar to (iv). 

In Section III.4 we will introduce informal notation that allows us to write (i)–(v) above in the metatheory without requiring prior proofs of either Coll x F or Coll x G.

III.2.7 Remark. Note that x is a bound variable in (st) of Definition III.2.5, and hence also in (sst). Thus, if the conditions for the variant theorem are fulfilled (I.4.13, p. 46) – that w occurs neither free nor bound in F – then we can also write the set term as {w : F(w, zn)}. That is,

⊢_ZFC {x : F(x, zn)} = {w : F(w, zn)}     (1)

The above is different from (v) of III.2.6. It can be proved as follows:

y = {x : F(x, zn)}
    ↔     ⟨(i) of III.2.6⟩
¬U(y) ∧ (∀x)(x ∈ y ↔ F(x, zn))
    ↔     ⟨variant theorem (I.4.13, p. 46) and equivalence theorem⟩
¬U(y) ∧ (∀w)(w ∈ y ↔ F(w, zn))
    ↔     ⟨(i) of III.2.6⟩
y = {w : F(w, zn)}

Thus,

⊢_ZFC y = {x : F(x, zn)} ↔ y = {w : F(w, zn)}

from which, substitution and the logical fact ⊢ t = t for any term t yield (1).
As a corollary we have – via the equivalence theorem and (1) – the well-known and obvious (under the usual non-occurrence restrictions on w)

⊢_ZFC w ∈ {x : F[x]} ↔ F[w]     (2)




Formally introduced set terms play a dual role. On one hand, formally, they are
just meaningless symbol sequences of which we have proved (or a proof exists,
in any case) that they are sets. For that reason, we often just say “. . . the set
{x : F } . . . ”.
On the other hand, the formula part of a set term first order defines (in the
standard structure) some real set; hence the term itself represents or names that
set.
The very format of the chosen symbol for set terms,

{x : F }

is suggestive of its semantics in the standard model: “the collection of all the x
that make F true”. As a matter of fact, this is more than notational suggestive-
ness: Soundness of all first order theories – and anticipating that our axioms will
be true in the standard model – implies that all ZFC theorems will be “really
true”. In particular, the formula in (iii) of III.2.6 is “true” and says that “x is in
{x : F [x]} iff F [x] is ‘true’ for this x”.

III.2.8 Example. We continue here what we have started in Example III.2.1.
Since

⊢_ZFC ¬U(y) → Coll x (x ∈ y)

III.2.6(iii) gives

¬U(y) ⊢_ZFC x ∈ {x : x ∈ y} ↔ x ∈ y

By III.2.6(ii),

¬U(y) ⊢_ZFC ¬U({x : x ∈ y})

as well; hence

¬U(y) ⊢_ZFC y = {x : x ∈ y}     (1)

by extensionality via substitution.

In words, every set is equal to a set term.  □

We now introduce a weak form of Frege comprehension, so that we can have


a sufficient condition for Coll x A to hold.

III.2.9 Axiom (Schema: Separation or “Subsets” Axioms). For every formula P[x] which does not have any free occurrences of y, the following is an axiom:

¬U(A) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ A ∧ P[x]))

For short,

¬U(A) → Coll x (x ∈ A ∧ P[x])

The above is a schema due to the presence of the arbitrary formula P . Every
specific choice of P leads to an axiom: An instance of the axiom schema. The
name “separation axiom” is apt since the axiom allows us to separate members
from non-members (of a set).

Why is the schema III.2.9 true (in the standard model)? Well, it says that if A
is a set, then – no matter what formula P we choose – we can also collect
all those x that make x ∈ A ∧ P [x] true (1)
into a set.
Now, all those x in (1) are in A, and we know that we have formed A at some stage† (it is a set!) that comes after all the stages at which all the various x in A were formed (or “given”, if atomic).
Thus, at this very same stage we can collect into a set just those x in A that are moreover restricted to satisfy P[x].
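To make the schema concrete, here is one instance written out (a sketch in LaTeX notation; the choice of P[x] as x ∉ x is ours, purely for illustration):

\[
\neg U(A) \;\rightarrow\; (\exists y)\bigl(\neg U(y) \wedge (\forall x)(x \in y \leftrightarrow x \in A \wedge x \notin x)\bigr)
\]

Unlike unrestricted (Frege) comprehension, this instance causes no Russell-type trouble: the y it provides collects only those members of A that are not members of themselves, and instantiating x as y in the defining equivalence shows that this y cannot itself be a member of A.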

III.2.10 Definition. Whenever the set term


{x : x ∈ A ∧ P }
can be introduced by III.2.9, it is often written more simply as
{x ∈ A : P } 

III.2.11 Proposition.

¬U(a), P → x ∈ a ⊢_ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))

In words, if a is a set and we know that P → x ∈ a, then

Coll x P

so we can introduce the set (term) {x : P}.

† Principle 1, p. 102, is at work here. Recall that we can take or leave this principle. However,
we have decided to take it (and hence adopt foundation, later on). It is worth stating that in the
absence of Principle 1, a “doctrine on limitation of size” would still effectively argue the “truth”
of separation.

Proof.

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ a ∧ P))
    ↔     ⟨equivalence theorem and A → B |=Taut (A ↔ A ∧ B)⟩
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))

Done, since the top line is provable in the presence of the assumption ¬U(a) (separation).  □

III.2.12 Corollary.

⊢_ZFC ¬U(a) → (∀x)(P → x ∈ a) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ P))

Proof. By the deduction theorem and III.2.11. 

So we can build sets by separation by restricting membership to existing


sets. Unsatisfactory as this may be – since separation only enables us to build
“smaller” sets (meaning here subsets) than the ones we have – it gets worse:
We have no proof that any set exists yet. We fix this in the next and following
sections. One should note that
(∃x)x = x
is a theorem of pure logic (axiom x = x followed by the substitution axiom
and modus ponens). This says, as far as ZFC is concerned,
An object exists!† (∗)
But what type of object? This may well be an atom, so we still have no proof
that any set exists.

III.3. The Set of All Urelements; the Empty Set


III.3.1 Axiom. The set of all urelements exists:
Coll x U (x)

III.3.2 Definition. We introduce a new constant, M, into our formal language


by the axiom
y = M ↔ ¬U (y) ∧ (∀x)(x ∈ y ↔ U (x)) (1)

† Upon reflection, there is nothing unsettling about pure logic proving that “objects” exist in set
theory. This is simply a consequence of our decision – in logic – not to allow empty structures.
This decision was also hardwired in the syntactic apparatus of logic.

since
 
⊢_ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ U(x)))

by III.3.1. 

That is, we have just introduced the (rather unimaginative) short name M for
the set term

{x : U (x)}

since (1) above yields

⊢_ZFC y = M ↔ y = {x : U(x)}

and hence, by substitution,

⊢_ZFC M = {x : U(x)}

III.3.3 Lemma (Existence of the Empty Set). ⊢_ZFC Coll x ¬x = x.

Proof. By ⊢ ¬x = x → x ∈ M and ⊢_ZFC ¬U(M) (the latter by III.3.2) plus III.2.12.  □
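For the first of the two cited facts, the justification is worth writing out once (a sketch, using only the logical axiom x = x and tautological implication):

\[
\vdash x = x, \qquad x = x \models_{\mathrm{Taut}} (\neg\, x = x \rightarrow x \in M), \qquad \text{hence} \qquad \vdash \neg\, x = x \rightarrow x \in M
\]

That is, the antecedent ¬x = x is outright refutable, so it tautologically implies anything – in particular x ∈ M, which is all that III.2.12 needs here.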

III.3.4 Definition (Empty Set). By III.3.3 we may introduce the set term

{x : ¬x = x}

for the empty set. We can then follow this up by the axiom (definition)

∅ = {x : ¬x = x}

to introduce (using (9) in III.2.4) the new constant symbol ∅ for the empty set.


III.3.5 Remark. Referring to 2.6 (ii) and (iii), we see that the intuitive
meaning – or “standard semantics” – of ∅ is the “set with no elements”, since
it is a set, but, moreover, x ∈ ∅ is “false” (equivalent to ¬x = x) for all x. And
this is as we hoped it would be (refer also to the discussion in Section II.4).
Syntactically we get the same understanding as follows: By III.2.6 and III.3.4,

⊢_ZFC x ∈ ∅ ↔ ¬x = x

Hence, by tautological implication,

⊢_ZFC ¬x ∈ ∅ ↔ x = x

Therefore, by the equality axiom x = x and tautological implication,

⊢_ZFC ¬x ∈ ∅

or

⊢_ZFC x ∉ ∅
A by-product of the existence of the empty set is a relaxing of the conditions
in III.2.11 and III.2.12. We may drop the assumption ¬U (a). Assume P →
x ∈ a. Now there are two cases (proof by cases). If ¬U (a), then we let III.2.11
or III.2.12 do their thing. If U (a) is the case, then we infer ¬x ∈ a (III.1.3);
hence x ∈ a → x ∈ ∅ by tautological implication. Another tautological
implication gives P → x ∈ ∅. Since ZFC ¬U (∅), we can now invoke III.2.11
to infer Coll x P . 

The concluding remark above is worth immortalizing:

III.3.6 Proposition. ⊢_ZFC (∀x)(P → x ∈ a) → Coll x P.

Correspondingly, P → x ∈ a ⊢_ZFC Coll x P.

III.3.7 Example. We saw how to justify the existence (as a formal mathematical
object – a set) of a “part of” M in the simple, but very important, case of ∅.
In general, III.2.12 allows us to prove Coll x A for any A for which we know
that A → x ∈ M (either as an assumption, or as a provable ZFC fact).
For example, we can show that for any a and b in M we can collect these two
elements into a set of “two” elements, intuitively denoted by “{a, b}”. Indeed,
a ∈ M ∧ b ∈ M → x = a ∨ x = b → x ∈ M     (1)
In fact,
a ∈ M → x = a → x ∈ M     (2)
and
b ∈ M → x = b → x ∈ M     (3)
by the Leibniz axiom. Thus, proof by cases (I.4.26, p. 52) helped by |=Taut gives
⊢ x = a ∨ x = b → (a ∈ M → x ∈ M) ∨ (b ∈ M → x ∈ M)
of which (1) is a tautological consequence.
Thus,
if a ∈ M and b ∈ M, then Coll x (x = a ∨ x = b)

or, in the formal language (with the “Coll” abbreviation),


⊢_ZFC a ∈ M ∧ b ∈ M → Coll x (x = a ∨ x = b)     (4)
One introduces as usual the set term
{x : x = a ∨ x = b}
as a follow-up to (4), and also the new symbol (“set term by listing”)
{a, b}
i.e., one defines
{a, b} = {x : x = a ∨ x = b} 

Can we repeat the above for any sets a and b? That is, is it true that ⊢_ZFC Coll x (x = a ∨ x = b) for any objects a and b? In particular, can we say that we
can form the (real) sets {{a}} or {{a}, {b}} in the metatheory? Well, we should
hope so, since – intuitively – there is a stage after the stages when {a} and {b}
were built.
However we need a new axiom to formally guarantee this, because all our
present axioms are true in the structure with underlying set {∅, 1, {1}} (M = {1}
here), but so is
 
(∀y)(¬U(y) → (∀x)(x ∈ y → U(x)))

since the members of every set in this structure are atoms. Thus,

present set of axioms + “no set has set elements”     (1)

is consistent (cf. I.5.14). Hence

present set of axioms ⊬ “some set has set elements”     (2)
Thus, by (2), we cannot prove yet that, in particular, a set {{a}} exists.

One last comment before we leave this section: We choose not to postulate
existence of individual urelements, so it may be the case that M = ∅. This
leaves our options open in the sense that we can have the “usual” ZFC (with
no urelements) as a special case of our ZFC. We note in this connection that if,
instead of having a predicate (U ) to separate sets from atoms, we adopted a two-
sorted language with two “types” of object variables, one, say a, b, c, a′, a″, . . . , for sets and one, say, p, q, p′, q′, . . . , for atoms, then

⊢ (∃p)(p = p)
would guarantee the existence of atoms, spoiling our present flexibility.

III.4. Class Terms and Classes


Before moving on towards developing tools for building more complicated
sets, we pause to expand our argot notation in the interest of achieving more
flexibility.
ZFC is about sets and atoms. It does not deal with “higher order” objects such
as the Russell collection (which, we have seen, is not a set), and, moreover, its
(formal) language has no means of notation for such higher order collections.
Nevertheless, much is to be gained in notational uniformity, and hence also in
user-friendliness, by allowing in the metalanguage the use of symbol sequences
of the form {x : A} – called “class terms” – even if we have no knowledge of
ZFC Collx A
Indeed, we want to be able to use in formal (syntactic) contexts the “term”
{x : A}, even if the above may actually fail. Correspondingly, in semantic
contexts, the symbol sequence {x : A} serves as a name for a real collection A –
that is probably too big to be a set – which A first order defines in the usual
sense:†
x ∈A iff A [x] is true in the standard model of ZFC

The collection that A names is technically called a class (cf. III.4.3). We, of
course, simply say “A is a class”.
To protect the innocent I state outright that there is no philosophical signi-
ficance in restricting attention to first order definable classes. It is not due to a
lack of belief in the existence of non-definable classes; rather it is due to a lack
of interest in them.

While the intended semantics above is meant to motivate the consideration


of (possibly non-set) classes, “real classes” do not intrude into our (argot)
usage of class terms. The latter are employed entirely syntactically. Their use
is governed by a “calculus of translations” through which we may introduce or
remove class term abbreviations:

III.4.1 Informal Definition (Class Terms). For any formula A of ZFC, the
symbol sequence
{x : A} (1)
is called a class term.

† In I.5.15 we saw what it means to first order define a set in a structure. The notion naturally
extends to first order definability of any collection.

We hereby expand the role of this symbol, employing it in the metalanguage


for two purposes:
(a) If we can show

ZFC Coll x A

then we use (1) to name a (formal) set term as per III.2.5 – thus, every set term
is also a class term.
(b) If not, then (1) can still be employed as an abbreviation of certain formal
texts described below (compare with III.2.6):
(i) y = {x : F } and {x : F } = y each stand for the formal text

¬U (y) ∧ (∀x)(x ∈ y ↔ F )

In particular, this reflects the position that a (formal) variable, like y,


stands for an atom or set (here, a set).

“=” in y = {x : F } is not the formal “=”. We are not to parse the informal text
“y = {x : F }”, decomposing it into its ingredients. We take it in its entirety as
an alias for the formal text “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”. A similar comment
applies to informal uses of “=”, “∈”, and “U ” below.

(ii) {x : F } = {x : G } stands for the formula (∀x)(F ↔ G ).


(iii) x ∈ {x : F [x]} stands for the formula F [x], and (see III.2.7) x ∈
{w : F [w]} stands for the formula F [x] (where w is neither free nor
bound in F [x]).
(iv) {x : F } ∈ {x : G } stands for
 
(∃y) y = {x : F [x]} ∧ y ∈ {x : G [x]}

which (with the help of (i) and (ii)) becomes


 
(∃y) ¬U (y) ∧ (∀x)(x ∈ y ↔ F [x]) ∧ G [y]

(v) {x : F } ∈ z stands for


 
(∃y) y = {x : F [x]} ∧ y ∈ z

which (with the help of (i)) becomes


 
(∃y) ¬U (y) ∧ (∀x)(x ∈ y ↔ F [x]) ∧ y ∈ z
 
(vi) U({x : F}) stands for (∀x)¬x = x.  □
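As a quick illustration of how these translation rules are applied (the particular formulas are chosen by us just for the example), rules (i) and (iii) give, in LaTeX notation,

\[
y = \{x : U(x)\} \quad\text{stands for}\quad \neg U(y) \wedge (\forall x)(x \in y \leftrightarrow U(x)),
\qquad
x \in \{w : \neg U(w)\} \quad\text{stands for}\quad \neg U(x).
\]

The first is exactly the defining axiom (1) of III.3.2 for the constant M; the second is rule (iii) with F[w] taken to be ¬U(w).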
 
Pause. So U({x : F}) is refutable. Does this prove that {x : F} is a set?

III.4.2 Remark. (1) Ideally, we should have different notations for the symbol
{x : F } according to its status as a name for a set term or not – say, boldface type
in the former case, and lightface in the latter. However, it is typographically more
expedient to use no such notational distinctions but allow instead the context
(established by English text surrounding the use of such terms) to fend off
ambiguities.
(2) We already know that for some formulas A, ZFC ¬Coll x A. Seman-
tically, for such a formula A, the collection in the metatheory named by the
symbol
{x : A [x]} (∗)
is not a set.
Indeed, using III.4.1(i) above, we translate the formal “ZFC ¬Coll x A”
into the theorem, written in English,
“There is no set y such that y = {x : A}” (∗∗)
Then, Platonistically, for such a formula A we know that the collection (∗) is
not a set in the metatheory, since the theorems – such as (∗∗) above – of the
formal theory are true in the standard model.
For example, we can state that “{x : x ∉ x} is not a set in the metatheory”. The quoted fact is the translation of our formal knowledge that “There is no set y such that y = {x : x ∉ x}”, or in full formal armor†

⊢ ¬(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∉ x))

For the semantic and informal side of things and for future reference we
state:

III.4.3 Informal Definition (Real Classes). A (real) class is a collection that is


first order definable in the standard structure (in the language of ZFC). Specif-
ically, the class term
{x : A(x, z 1 , . . . , z n )} (1)
names a real collection, also denoted by A(z 1 , . . . , z n ), that is first order defined
by the formula A(x, z 1 , . . . , z n ). That is, for any choice of values for the

† The reader will recall that this is a fact of logic, whence just “”.

parameters z 1 , . . . , z n ,

x ∈ A(z 1 , . . . , z n ) iff A(x, z 1 , . . . , z n ) is true

If, for some choice of closed terms tn , ZFC Coll x A(x, t1 , . . . , tn ), then A(tn )
denotes a real set; otherwise it denotes a non-set class, called a proper class.
For the sake of convenience we will use “blackboard bold” capital letters as
short names of classes; e.g., A abbreviates the class term {x : A} and we may
write, using the metalinguistic “=”,

A = {x : A }

These names are metavariables.† We will normally adopt the general convention
of naming a class term by the blackboard bold version of the same letter that
denotes the defining formula.
For example, A = B is short for {x : A} = {x : B }, A ∈ B is short for
{x : A} ∈ {x : B }, etc. – expressions which can be translated into the formal
language using III.4.1. 

III.4.4 Remark. (1) Worth repeating: Class terms are just symbols that name
certain entities of our intuition, namely, classes. We will often abuse terminology
and say “let {x : A } be a class” rather than “let {x : A } name a class”, just
as one may say (under the assurance of ZFC Coll x A) “let {x : A} be a set”.
Properly speaking, a class term is an syntactic object, while a class is a “real”
object.
(2) What class terms and classes do for us is analogous to what number
theory argot does for us in Peano arithmetic (PA). Such argot allows us to
write, e.g., the easily understandable informal text

⊢_PA every n > 1 has a prime divisor

instead of

⊢_PA (∀n)(n > 1 → (∃x)(∃y)(n = x × y ∧ x > 1 ∧ (∀m)(∀r)(x = m × r → m = 1 ∨ m = x)))

† We note that N, Z, Q, R, C are already reserved for the natural numbers, integers, rational num-
bers, reals, and complex numbers. These are metalinguistic constants. Besides, we have al-
ready called the Russell proper class “R”, and later we will use “On” and “Cn” for certain
proper classes. This does not conflict with the blackboard bold notation for indeterminate class
names.

In particular, in the context of class terms one can readily replace “stands for” by “↔” to write – for example – something like (cf. III.4.1(i))

⊢ y = {x : F} ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ F)     (∗)

We can obtain (an absolute) proof of (∗) by starting with the tautology

¬U (y) ∧ (∀x)(x ∈ y ↔ F ) ↔ ¬U (y) ∧ (∀x)(x ∈ y ↔ F )

and then abbreviating the left hand side, “¬U (y) ∧ (∀x)(x ∈ y ↔ F )”, by
“y = {x : F }” using the translation rule III.4.1(i).
(3) Every “real” set named by some formal term t is a class, since (by III.2.8)
y = {x : x ∈ y}
and hence
t = {x : x ∈ t}
by substitution.† 

III.4.5 Example.

(a)
y=A (1)
(or A = y) is very short text for y = {x : A}, which in turn is short for
(III.4.1(i))

¬U (y) ∧ (∀x)(x ∈ y ↔ A) (2)

Thus, whenever we claim that we can prove (1), we really mean that we
can prove (2). In particular, such a proof yields also a proof that
(i) Coll x A (by substitution axiom and modus ponens); hence {x : A} is
(i.e., can be introduced as) a set term; thus, A is (denotes) a set.

Pause. What is all this roundabout argument for? Why don’t we just
say, “A, a class, equals a set y.‡ Therefore, it is itself a set”?

(ii) x ∈ y ↔ A.
(b)

A∈B (3)

† Without loss of generality, x is not free in t.


‡ Recall the convention on variable names. y names a set or atom, but it is not an atom here.

is very short for


{x : A} ∈ {x : B [x]} (4)
which is short for (III.4.1(iv))
 
(∃y) ¬U (y) ∧ (∀x)(x ∈ y ↔ A) ∧ B [y] (5)

Now, to say that we have a proof of (3) (or that we assume (3)) is to say that
we have a proof of (5) (or that we assume (5)). From the latter, tautological
implication along with ∃-monotonicity (I.4.23) yields
 
(∃y) ¬U (y) ∧ (∀x)(x ∈ y ↔ A)

that is, Coll x A. In words, “(3) implies that A is a set”. This corresponds
well with our intention that class members are sets or atoms. Here A, being
a collection, is not an atom. 

III.4.6 Example. By III.4.1(i, ii), if y = A then A = y, and if A = B then


B = A.† Moreover, A = A is a theorem of logic, since  A ↔ A.
Transitivity of this (informal) class equality is also guaranteed, from
A ↔ B , B ↔ C |=Taut A ↔ C and Definition III.4.1(ii). 

III.4.7 Informal Definition (Subclass and Superclass of a Class). The nota-


tion A ⊆ B stands for
(∀x)(x ∈ A → x ∈ B)
and is pronounced “A is a subclass of B”, or “B is a superclass of A”. We can
also write B ⊇ A.
A ⊂ B (also B ⊃ A) stands for A ⊆ B ∧ ¬A = B and is read “A is a proper‡
subclass of B” (also “B is a proper superclass of A”).
If A ⊆ B and A is a set we say, as before, that A is a subset of B. 

We have at once:

III.4.8 Proposition.

(i) ⊢ A ⊆ B ↔ (∀x)(A → B)
(ii) ⊢ A ⊆ B ∧ B ⊆ A ↔ A = B

† Indeed,  A = B ↔ B = A translates to the formal  (A ↔ B ) ↔ (B ↔ A ).


‡ This “proper” qualifies “subclass”, not “class”. Thus a proper subclass could still be a set.

Proof. (i): A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B) is a tautology “G ↔ G ” by III.4.7.


We use III.4.1 to eliminate “x ∈ · · · ”; thus  A ⊆ B ↔ (∀x)(A → B ).
(ii): We translate (ii) using (i) that we have just verified, and III.4.1:

(∀x)(A → B ) ∧ (∀x)(B → A) ↔ (∀x)(A ↔ B )

Distribution of ∀ over ∧ to the left of the first ↔, along with the tautology
theorem and equivalence theorem, shows that the above is a theorem of pure
logic. 

Pause. So, does (ii) above prove extensionality?

III.4.9 Example. What can we learn from ⊢_ZFC ¬U(y) ∧ A ⊆ y? Well, III.2.8, III.4.7 and III.4.8 allow us to translate the above into

⊢_ZFC ¬U(y) ∧ (∀x)(A → x ∈ y)

By III.2.12 and modus ponens we get

⊢_ZFC Coll x A

That is, A is a set. Another way to say all this is

⊢_ZFC ¬U(y) ∧ A ⊆ y → Coll x A  □

The above is worth immortalizing, in English:

III.4.10 Proposition (Class Form of Separation). Any subclass of a set is a


set.

III.4.11 Example. We translate the very common informal text “A ≠ ∅”: first, into ¬({x : A} = ∅), and next, taking ⊢_ZFC ∅ = {x : ¬x = x} into account,
¬(∀x)(A ↔ ¬x = x)
that is,
(∃x)¬(A ↔ ¬x = x) (1)
But
|=Taut ¬(Q ↔ P ) ↔ (Q ↔ ¬P )
Thus (1) is provably (within pure logic) equivalent to
(∃x)(A ↔ x = x) (2)
by the equivalence theorem.

Since x = x is a logical axiom, (2) is provably (within pure logic) equiv-


alent to

(∃x)A

This is the translation of A ≠ ∅. Correspondingly, A = ∅ translates to ¬(∃x)A.

A class that equals ∅ is called, of course, empty. However, many authors also use the term void, or null, class if A = ∅, and correspondingly, non-void, or non-null, if A ≠ ∅.  □
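For a concrete instance of this translation (our choice of A, just to illustrate): take A to be U(x), so that A is {x : U(x)} – the class that the constant M of III.3.2 names. Then the informal text “A ≠ ∅” unwinds to

\[
(\exists x)\, U(x)
\]

which matches the intuition that M is nonempty exactly when at least one urelement exists – precisely the statement we declined to adopt as an axiom at the end of Section III.3.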

III.4.12 Informal Definition (Class Union, Intersection, and Difference).


We introduce the following metalinguistic abbreviations:

(a) A ∪ B, pronounced the union of A and B, abbreviates {x : x ∈ A ∨ x ∈ B}.


(b) A ∩ B, pronounced the intersection of A and B, abbreviates {x : x ∈ A ∧
x ∈ B}. If A ∩ B = ∅, then we say that A and B are disjoint.
(c) A − B, pronounced the difference of A and B in that order, abbreviates {x : x ∈ A ∧ x ∉ B}.  □

Some authors use “∼” or even “\” for difference instead of “−”.

III.4.13 Example. ⊬ A − B = B − A. Indeed, if a ≠ b,

⊢ {x : x = a} − {x : x = a ∨ x = b} = ∅

while

⊢ {x : x = a ∨ x = b} − {x : x = a} = {x : x = b}  □

It is immediate that

III.4.14 Proposition. If A or B is a set, then so is A ∩ B. If A is a set, then so


is A − B.

Proof. By III.4.10 and (1)–(3) below:

(1) ⊢ A ∩ B ⊆ A
(2) ⊢ A ∩ B ⊆ B
(3) ⊢ A − B ⊆ A

To see why (1) holds, we eliminate class terms:

(∀x)(A ∧ B → A)

The above is provable in pure logic. Similarly for (2). Also, (3) translates to

⊢ (∀x)(A ∧ ¬B → A)  □

Associativity of each of ∪ and ∩ (Exercises III.19 and III.20) allows one to omit brackets and write “A ∩ B ∩ C” or, by recursion on n ∈ N,†

III.4.15 Informal Definition.

⋃_{i=1}^{1} A_i stands for A_1
⋃_{i=1}^{n+1} A_i stands for (⋃_{i=1}^{n} A_i) ∪ A_{n+1}

and

⋂_{i=1}^{1} A_i stands for A_1
⋂_{i=1}^{n+1} A_i stands for (⋂_{i=1}^{n} A_i) ∩ A_{n+1}

The symbols “⋃_{i=1}^{n}” and “⋂_{i=1}^{n}” are also written as “⋃_{1≤i≤n}” and “⋂_{1≤i≤n}” respectively.
In a moment of weakness one may find oneself writing “A_1 ∪ · · · ∪ A_n” and “A_1 ∩ · · · ∩ A_n” respectively.  □
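Since this recursion takes place in the metatheory (over N), it can be mirrored directly by an ordinary program. The following is a small illustrative sketch in Python – not part of the formal development – that unfolds the recursion above for finite families of finite sets:

    def big_union(A):
        # A is a nonempty list [A_1, ..., A_n] of Python sets.
        # Base case: the union over a one-term family is A_1;
        # step: (A_1 ∪ ... ∪ A_k) ∪ A_{k+1}.
        result = A[0]
        for A_next in A[1:]:
            result = result | A_next
        return result

    def big_intersection(A):
        # The same recursion with ∩ in place of ∪.
        result = A[0]
        for A_next in A[1:]:
            result = result & A_next
        return result

For example, big_intersection([{1, 2}, {1, 3}]) returns {1}, in agreement with Example III.6.13 below.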

III.4.16 Remark (Formal ∩ and Difference). We cannot prove similar results


(to those contained in III.4.14) for union yet. We will have to wait for the axiom
of union.
Note that in a classless approach one could carry out Definition III.4.12 and
Proposition III.4.14 as follows:
For the definition, for example of “∩”, one would introduce a new 2-ary
(binary) function symbol, “∩”, formally by the defining axiom
x ∩ y = z ↔ ¬U (z) ∧ (∀w)(w ∈ z ↔ w ∈ x ∧ w ∈ y) (i)
(i) is legitimate because ZFC Collw (w ∈ x ∧ w ∈ y). Indeed,
|=Taut w ∈ x ∧ w ∈ y → w ∈ x (ii)

† Recall that definitions, both formal and informal ones, are effected in the metatheory, where we
have tools such as natural numbers, induction, and recursion over N.

Thus (by III.3.6)

ZFC Collw (w ∈ x ∧ w ∈ y) (iii)

We note that x ∩ y makes sense formally even when one or both of x and y
are atoms, whereas the informal ∩ was defined only for classes.† The defining
axiom (i) and III.1.3 prove

ZFC U (x) → x ∩ y = ∅

and

ZFC U (y) → x ∩ y = ∅

Similarly, difference would be introduced formally by

x − y = z ↔ ¬U (z) ∧ (∀w)(w ∈ z ↔ w ∈ x ∧ ¬w ∈ y) (v)

The proof of the legitimacy of (v) is left to the reader. Note that here too
x − y, formally, makes sense for atom “arguments”. Moreover, we have the
“pathological” special cases

ZFC U (x) → x − y = ∅

and

ZFC ¬U (x) → U (y) → x − y = x

So much for the formal “∩” and “−”. Of course, the defining axiom for the
formal “∪” will still have to wait for the union axiom.
Since we would have sooner or later to extend the formal “∩”, “ ∪ ”, “−”
(recorded below) to the class versions to use in our argot, we decided to in-
troduce these symbols as (informal) abbreviations to begin with (as is done in,
e.g., Levy (1979)). 
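To indicate how one of the facts recorded above, ⊢_ZFC U(x) → x ∩ y = ∅, follows from the defining axiom (i) together with III.1.3, here is a sketch of the argument (the layout and step labels are ours):

\begin{align*}
&\vdash_{\mathrm{ZFC}} \neg U(x \cap y) \wedge (\forall w)\bigl(w \in x \cap y \leftrightarrow w \in x \wedge w \in y\bigr)
&& \text{(i) with } z \text{ as } x \cap y,\ \vdash x \cap y = x \cap y,\ \mathrm{MP}\\
&U(x) \vdash_{\mathrm{ZFC}} \neg\, w \in x
&& \text{III.1.3}\\
&U(x) \vdash_{\mathrm{ZFC}} (\forall w)\,\neg\, w \in x \cap y
&& \text{the two lines above, } \models_{\mathrm{Taut}}, \text{ generalization}\\
&U(x) \vdash_{\mathrm{ZFC}} x \cap y = \emptyset
&& \text{extensionality, since } \neg U(x \cap y) \text{ and } \vdash_{\mathrm{ZFC}} \neg\, w \in \emptyset
\end{align*}

The deduction theorem then yields the stated implication; the twin fact for U(y) is obtained symmetrically.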

III.4.17 Definition. For the record, we introduce the 2-ary function symbols
“∩” and “−” formally to our language, and add the defining axioms (i) and (v)
of III.4.16 to our theory.
The context will easily tell in each use whether we are employing these
symbols formally, or as abbreviations as per III.4.12. 

† It is normal practice in a first order language to insist that function symbols stand for totally
defined or total functions, upon interpretation. Thus it is appropriate that ∩ and − are defined on
all objects.

III.4.18 Informal Definition (the Universe; the Universe of All Sets). We


introduce the following abbreviations for the class terms {x : x = x} and
{x : ¬U (x)}:
U M = {x : x = x}
and
V M = {x : ¬U (x)}
In the “real” sense, i.e., semantically, U M is the universe of all objects set theory
is about, while V M is the universe of all sets, i.e., the atoms are not included in
V M (they are used however to build sets). 

The following is immediate.

III.4.19 Proposition.
(1) U M is a proper class.
(2) ZFC U M = V M ∪ M.

Proof. (1): Indeed, R, the Russell class, satisfies ⊢ R ⊆ U M , since ⊢ x ∉ x → x = x. If U M were a set, then so would R be by III.4.10.
(2): U M = V M ∪ M translates to (by III.3.2, III.4.1(ii) and III.4.12)
x = x ↔ ¬U (x) ∨ U (x) 

Once we have the union axiom (which says that the union of two sets is a set),
we will obtain that V M is a proper class too (by (2) above and III.3.1).†

III.4.20 Remark (Alternate Universes). (1) The symbol U N will, in general,


denote the class of all sets and atoms built from the arbitrarily chosen initial set
of atoms N ⊆ M. If N = ∅, then we simply write U rather than U∅ . The reader
will note that while U M (where M is the set of all urelements) is trivially given
by the class term {x : x = x}, it takes a glimpse forward (to Chapter VI) to
show that U N can also be defined by a class term – as we require for all classes –
for any N ⊆ M. One way to do this is using the support function, “sp”
(see VI.2.34), namely U N = {x : sp(x) ⊆ N }. The latter says, “an object is
in U N iff when we disassemble it all the way down to its constituent urele-
ments, all these urelements are in N ”.

† Note that if M ≠ ∅ then M ⊆ R; thus R ⊄ V M .



Similarly, we can define the class of all sets whose construction is based on
the urelements in N ⊆ M, V N . We write just V for V∅ . We note that V N is
given by the class term {x : ¬U (x) ∧ sp(x) ⊆ N }.
(2) In many elementary developments of the subject one often works within
a “reference set”, or “relative universe”, X (a set), and the sets of interest are
subsets or members of X . With this understanding, one would write “−A” or
“A” for X − A (where A ⊆ X , and therefore A is a set) and call “−A” the
complement of A (with respect to X ). Note that for any set A ∈ U N , U N − A
(for any N ⊆ M) is a proper class (Exercise III.21); thus we will have little
use for complements. It is the difference (most of the time of sets, rather than
classes) that we will have use for. 

We note our intention to use informal class notation extensively. Therefore,


it is important to remember at all times that ZFC set theory does not admit
(proper) classes as formal objects of study.†
Invariably, there is nothing that we can say with the help of a class term
{x : A(x)} that cannot also be said – with a lot more effort and a lot less in-
tuitive transparency – by just using the formula A(x) instead (e.g., Bourbaki
(1996b) and Shoenfield (1967) do not employ classes at all). Definition III.4.1
will be used as a tool to eliminate class terms in order to go back to the formal
language of set theory – whenever such caution is necessary (notably in the
introduction of axioms).

III.5. Axiom of Pairing


Consider now any two sets A and B. Say the first was built at some stage and the second at some (possibly different) stage. We have no difficulty imagining that a stage exists following both these stages. By Principle 0 (p. 102), at that stage we can build any set whose elements are available. In particular A and B are available; thus a set
that contains exactly A and B exists. However, the axiom that flows from this
discussion – the axiom for pairing – will have a simpler form if we allow the
possibility of additional members, beyond A and B, in the set asserted to exist.

III.5.1 Axiom (Unordered Pair). For any atoms or sets a and b there is a set
c such that a ∈ c and b ∈ c. Or, stated in the formal language,

(∃z)(a ∈ z ∧ b ∈ z)

† Other axiomatizations of set theory, originating with Gödel and Bernays, admit (proper) classes
as formal objects of study. See for example Monk (1969).

or still (universal closure of the above)

(∀x)(∀y)(∃z)(x ∈ z ∧ y ∈ z)

III.5.2 Remark. By III.1.3, ⊢_ZFC a ∈ z → ¬U(z). Thus, by tautological implication ⊢_ZFC a ∈ z ∧ b ∈ z ↔ ¬U(z) ∧ a ∈ z ∧ b ∈ z. The equivalence theorem then gives

⊢_ZFC (∃z)(¬U(z) ∧ a ∈ z ∧ b ∈ z)

Thus the object z guaranteed to exist in III.5.1 is a set, as expected.  □

III.5.3 Proposition. ⊢_ZFC (∀a)(∀b)Coll x (x = a ∨ x = b).

Proof. It suffices to prove

⊢_ZFC Coll x (x = a ∨ x = b)

We have

(1)  (∃z)(a ∈ z ∧ b ∈ z)                     ⟨III.5.1⟩
(2)  a ∈ A ∧ b ∈ A                           ⟨added; A is a new constant⟩
(3)  a ∈ A                                   ⟨(2) plus taut. implic.⟩
(4)  b ∈ A                                   ⟨(2) plus taut. implic.⟩
(5)  ¬U(A)                                   ⟨(3) plus III.1.3 plus MP⟩
(6)  x = a → (x ∈ A ↔ a ∈ A)                 ⟨Leibniz axiom⟩
(7)  x = b → (x ∈ A ↔ b ∈ A)                 ⟨Leibniz axiom⟩
(8)  x = a → x ∈ A                           ⟨(6) and (3) and taut. impl.⟩
(9)  x = b → x ∈ A                           ⟨(7) and (4) and taut. impl.⟩
(10) x = a ∨ x = b → x ∈ A                   ⟨(8) and (9) and taut. impl.⟩
(11) Coll x (x = a ∨ x = b)                  ⟨(10) and (5) and III.2.11⟩  □

III.5.4 Corollary. ZFC Coll x (x = a).

Proof. See the proof above, and use (5) plus (8) and III.2.11. Alternatively
(without referring to the proof), by |=Taut x = a ↔ x = a ∨ x = a. 

III.5.5 Definition (Pairs and Singletons). The above proposition and its corol-
lary allow the formal introduction of the set terms

{x : x = a ∨ x = b} (1)

and

{x : x = a} (2)

We also introduce the terms {a, b} (unordered pair) and {a} (singleton) by
the formal definitions (cf. III.2.4)

{a, b} = {x : x = a ∨ x = b}

and

{a} = {x : x = a} 

III.5.6 Remark (Denoting Sets by Listing). We say that {a, b} and {a} denote
sets by explicit listing of their members. We note that the informal notation
N = {0, 1, 2, . . .} does not denote a set by explicit listing (in the metatheory).
Such notation is only possible for what we intuitively understand as “finite” sets.
The “. . . ” indicates our inability to conclude the listing and hints at a “rule”,
or understanding, of how to obtain more elements. Such understanding depends
on the context (in the case of N, just add 1 to get the next member). 

III.5.7 Proposition. ZFC {a, b} = {b, a} and ZFC {a} = {a, a}.

Proof. By III.2.6, commutativity of ∨, and idempotency of ∨ (i.e., |=Taut A ↔


A ∨ A). 

III.5.8 Remark. (i) Why “⊢_ZFC” rather than just “⊢” above? That is because
{a, b} and {a} were formally introduced as terms (sets) in III.5.5. Their intro-
duction necessitated the prior proof in (our present fragment of) ZFC of the
formulas Coll x (x = a ∨ x = b) and Coll x (x = a). As far as the class terms
{x : x = a ∨ x = b} and {x : x = a} are concerned, we have†

⊢ {x : x = a ∨ x = b} = {x : x = b ∨ x = a}     (1)

and

⊢ {x : x = a} = {x : x = a ∨ x = a}     (2)

† Cf. III.4.1 regarding the use of the unbracketed “=” in (1) and (2).

by III.4.1, since the above simply abbreviate

x =a∨x =b↔x =b∨x =a (3)

and

x =a ↔x =a∨x =a (4)

Thus, (1) and (2) are just stating the tautologies in (3) and (4), while Proposi-
tion III.5.7 states much more in between the lines, in particular that {a, b} and
{a} are sets.
If we had introduced the pair and singleton as abbreviations of the respective
class terms instead,† then the above proposition would be provable in pure
logic – for it would be just stating (3) and (4) – and whether or not the terms
referenced are sets would be a separate issue.
This remark was necessitated by our decision not to differentiate the notations
for set terms and class terms.
(ii) Proposition III.5.7 is popularized in naı̈ve set theory by saying “when
we list the elements of a set explicitly, multiplicity or order of elements does
not matter”. 

III.5.9 Remark (Relaxing the Proof Style). It would be counterproductive to


introduce a rich argot towards the simplification of formal texts on one hand,
while on the other hand we continue to offer extremely detailed formal proofs
such as the one for III.5.3. Well, we do not have to be that formal always,
nor can we afford to be so when our arguments get more involved. We will
frequently relax the proof style to shorten proofs. This relaxing will invariably
use shorthand tools such as English text, class terms, and a judicious omission
of (proof) details.
For example, a relaxed version of the proof of III.5.3 would read like this:
Let a and b be any objects (i.e., sets or urelements). Let us denote by c any
set (asserted to exist in III.5.1) such that a ∈ c and b ∈ c [this combines
steps (1)–(5) of the formal proof]. Thus {x : x = a ∨ x = b} ⊆ c; hence
{x : x = a ∨ x = b} (denotes) is a set by separation (III.4.10) [the obvious
steps (6)–(10) were just compacted to (10)]. 

While the axiom of pairing is not provable from the axioms that we had at our
disposal prior to its introduction (see p. 133), it becomes provable once (an

† This is how it is often done; e.g., Levy (1979).



appropriate version of) collection and power set axioms are introduced (see
Exercises III.15 and III.16).

III.6. Axiom of Union


How about the classes {x : x = a ∨ x = b ∨ x = c} and {x : x = a ∨ x = b ∨ x =
c ∨ x = d} – for short, {a, b, c} and {a, b, c, d} – where a, b, c, d are arbitrary
objects? Are these sets?
Of course, we could invoke Principle 0 (p. 102) again, and show that these
classes are sets indeed. However, it is not fitting an axiomatic approach to go
back to this metamathematical principle all the time. It is more elegant – and
safer – to have just one axiom that will imply that all such objects are sets.
What we have in mind is something more powerful than an endless sequence
of axioms for (unordered) triple, quadruple, etc.
We already know that {a, b}, {c, d}, and {c} are sets. Then, applying pairing
again, the following are sets too:

{{a, b}, {c}}     (1)

and

{{a, b}, {c, d}}     (2)
What we need is the ability to remove the level of braces just below the outer-
most, to obtain the (unordered) triple (from (1)) and quadruple (from (2)). In
essence, we want to know that, in particular, {a, b} ∪ {c} and {a, b} ∪ {c, d} are
sets.
We will address this immediately, but in somewhat more general setting.
First, we will define the operation that removes the “top level” of braces of all
non-atomic members of a class. To this end, and in all that follows in this volume,
we will benefit from a notational device. We often use bounded quantification in
set theory, i.e., “there is an x in A such that . . . ” and “for all x in A it follows
that . . . ”.

III.6.1 Informal Definition (Bounded Quantification). The notations (∃x ∈


A)F and (∀x ∈ A)F are short forms of (∃x)(x ∈ A ∧ F ) and (∀x)(x ∈ A →
F ) respectively. 

III.6.2 Example. We can easily verify that De Morgan’s laws hold for bounded
quantification, i.e.,
 (∃x ∈ A)F ↔ ¬(∀x ∈ A)¬F

Indeed,

¬(∀x ∈ A)¬F
    ↔     ⟨by III.6.1⟩
¬(∀x)(x ∈ A → ¬F)
    ↔     ⟨equiv. theorem⟩
¬(∀x)(¬x ∈ A ∨ ¬F)
    ↔     ⟨“∀-De Morgan”⟩
(∃x)¬(¬x ∈ A ∨ ¬F)
    ↔     ⟨“∨-De Morgan” and equiv. theorem⟩
(∃x)(x ∈ A ∧ F)
    ↔     ⟨by III.6.1⟩
(∃x ∈ A)F  □
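The dual law, ⊢ (∀x ∈ A)F ↔ ¬(∃x ∈ A)¬F, is verified by an entirely similar chain; as a sketch, in LaTeX notation:

\begin{align*}
\neg(\exists x \in A)\neg F
&\leftrightarrow \neg(\exists x)(x \in A \wedge \neg F) && \text{by III.6.1}\\
&\leftrightarrow (\forall x)\neg(x \in A \wedge \neg F) && \text{$\exists$-De Morgan}\\
&\leftrightarrow (\forall x)(x \in A \rightarrow F) && \models_{\mathrm{Taut}} \text{ and equiv. theorem}\\
&\leftrightarrow (\forall x \in A)F && \text{by III.6.1}
\end{align*}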

III.6.3 Informal Definition (Union of a Class; Union of a Family of Sets). Let A be a class. The symbol ⋃A is an abbreviation of the class term {x : (∃y ∈ A)x ∈ y}. We read ⋃A as the union of all the sets in A.
If A contains no atoms, then it is called a family of sets, and ⋃A is the union of the family A.  □

III.6.4 Remark. Let A = {x : A[x]}. We have a number of variations in the notation for ⋃A, namely, ⋃_{x∈A} x or ⋃{x : A[x]} or ⋃_{A[x]} x. In any case, after we eliminate class notation, all these notations stand for the class term {x : (∃y)(A[y] ∧ x ∈ y)}.  □

   
III.6.5 Example. ⋃{#, {|}, {1, {2}}} = {|, 1, {2}}, where “#”, “|”, “1”, “2” are names for atoms. So, in the result of the union, “loose atoms” are lost.  □


Let now A be a set, and consider ⋃A, that is, {x : (∃y ∈ A)x ∈ y}. Let A be formed at some stage. Then each y ∈ A must be available before that stage, and since x ∈ y for each x that we collect in ⋃A, a fortiori, x is available before it. It follows that ⋃A itself can be built at that same stage as a set, so it is a set. As in the case of pairing, we state the following axiom of union in a “weak” form. It asserts the existence of a set that contains the union as a subclass. This, by III.4.10, makes the union a set.

III.6.6 Axiom (Union).

(∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (1)

III.6.7 Remark. Formula (1) has A as its only free variable. Of course, it is
equivalent to a version that is prefixed with “(∀A)”. Now, this axiom is stated
a bit too tersely (especially for our flavour of ZFC, which allows atoms) and
needs some “parsing”.
(a) (1) is provably equivalent to

¬U (A) → (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (2)

Indeed, (2) is a tautological consequence of (1). Conversely, (1) follows from (2)
and proof by cases, because we can also prove

U (A) → (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (3)

Let us do this. We have

U(A) ⊢_ZFC ¬y ∈ A

Thus, by tautological implication,

U(A) ⊢_ZFC x ∈ y ∧ y ∈ A → x ∈ z

Now, generalization followed by an invocation of the substitution axiom gives

U(A) ⊢_ZFC (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z)

from which the deduction theorem yields (3).


(b) (1) does not ask that z, whose existence it postulates, be a set (it could
be an atom). However, we can show using (1) that
 
⊢_ZFC (∃z)(¬U(z) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z))     (4)

To see this, let B be a z that works (we are arguing by auxiliary constant) in (1).
Thus we add (to ZFC) the assumption

(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B) (5)

Hence

x ∈y∧y∈ A→x ∈B (6)



We have two cases (proof by cases):


r Let (i.e., add) ¬U (B). By (5) and tautological implication, we get

¬U (B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)
and then the substitution axiom yields (4).
r Let (i.e., add) U (B). By III.1.3 and the assumption we obtain ¬x ∈ B; hence

x ∈B→x ∈∅
by tautological implication. (6) and tautological implication (followed by
generalization) yield
(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)
from which
¬U (∅) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)
Once again, the substitution axiom yields (4).  □

III.6.8 Proposition. ⊢_ZFC Coll x ((∃y ∈ A)x ∈ y), where A is a free variable.

Proof. We use (4) of III.6.7(b). Add then a new constant B and the assumption

¬U (B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)

Thus

¬U (B) (i)

and

x ∈y∧y∈ A→x ∈B (ii)

We can now show that

(∃y)(y ∈ A ∧ x ∈ y) → x ∈ B (iii)

which will rest the case by III.2.11 and (i). Well, (iii) follows from (ii) by
∃-introduction. 


III.6.9 Definition (The Formal Big and Little ∪). For the record, we introduce into our theory a unary (1-ary) function symbol, “⋃”, formally, by the defining axiom

⋃A = z ↔ ¬U(z) ∧ (∀x)(x ∈ z ↔ (∃y ∈ A)x ∈ y)     (1)

We also introduce a new binary (or 2-ary) function symbol, “∪”, by the defining axiom

x ∪ y = ⋃{x, y}     (2)


Worth repeating: If A is a set, then so is ⋃A. Indeed, the assumption translates to Coll x A; hence the class term A – that is, {x : A} – is (really, names) a formal term “t” of set theory. So is ⋃t by definition of terms, and III.6.9. But is it an atom? Since ⊢_ZFC ¬U(⋃x) by the preceding definition, where x is a free variable, ⊢_ZFC ¬U(⋃t) by substitution.


III.6.10 Remark. By III.6.8 the ⋃ function “makes sense” for both set and atom variables. It is trivial to see from (1) above that

⊢_ZFC U(A) → ⋃A = ∅

It follows that the binary formal ∪ also makes sense for any arguments and that ⊢_ZFC U(A) ∧ U(B) → A ∪ B = ∅.  □


III.6.11 Example. What is ⋃{a, b}? How does it relate to the informal definition (III.4.12, p. 141)? Let us calculate using III.6.3:

{x : (∃y)(y ∈ {a, b} ∧ x ∈ y)} = {x : (∃y)((y = a ∨ y = b) ∧ x ∈ y)}
                               = {x : (∃y)(y = a ∧ x ∈ y ∨ y = b ∧ x ∈ y)}
                               = {x : (∃y)(y = a ∧ x ∈ y) ∨ (∃y)(y = b ∧ x ∈ y)}
                               = {x : x ∈ a ∨ x ∈ b}
                               = a ∪ b

The second “=” from the bottom was by application of the “one-point rule” (I.6.2). Note that in “a ∪ b” we are using the formal “∪” to allow this term to be meaningful for both sets and atoms a, b.†  □
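The same calculation for a singleton (a small variation of our own, under the proviso that a is a set) gives

\begin{align*}
\textstyle\bigcup\{a\} &= \{x : (\exists y)(y \in \{a\} \wedge x \in y)\}\\
&= \{x : (\exists y)(y = a \wedge x \in y)\}\\
&= \{x : x \in a\}\\
&= a
\end{align*}

again by the one-point rule and, in the last step, Example III.2.8 (every set equals the set term {x : x ∈ a}). If a is an atom instead, the last step is unavailable and the calculation yields ∅, since by III.1.3 atoms have no members.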

III.6.12 Informal Definition (Intersection of a Family). The intersection of a family F, in symbols ⋂F, stands for {x : (∀y ∈ F)x ∈ y}.
If for every two A and B in a family F it is the case that A ≠ B → A ∩ B = ∅, then we say that F consists of pairwise disjoint sets or is a pairwise disjoint family.  □



† “⋃{a, b}” of III.6.3 is meaningful for both sets and atoms a, b. So is the formal “∪” of III.6.9, unlike “A ∪ B” of III.4.12, which is defined only for class arguments.

(1) Operationally, we certify things such as “F is a pairwise disjoint family” by proving in ZFC the defining property “A ≠ B → A ∩ B = ∅ for all sets A and B in F”. Correspondingly, a statement such as “Let F be a pairwise disjoint family” is another way of saying “assume that A ≠ B → A ∩ B = ∅ for all sets A and B in F”.
(2) We are not interested in the intersection of arbitrary classes (that may contain atoms, and hence not be families) in introducing the big-⋂ abbreviation. We will also make an exception to what we have practiced so far, and we will not introduce a formal counterpart for ⋂.† It is sufficient that we have a formal little ∩.
Let A = {x : A[x]}. We have a number of variations in the notation for ⋂A: ⋂_{x∈A} x or ⋂{x : A[x]} or ⋂_{A[x]} x.

III.6.13 Example. Let F = {{1, 2}, {1, 3}} (this family is a set; working in the metatheory, apply pairing three times). Then ⋂F = {1}.
Let now G be any family, and a ∈ G. Then ⋂G ⊆ a. Indeed, the translation of the claim (by III.6.12 and III.4.7) is

a ∈ G → (∀y)(y ∈ G → x ∈ y) → x ∈ a     (1)

We can prove (1) within pure logic: Assume a ∈ G and (∀y)(y ∈ G → x ∈ y). By specialization, a ∈ G → x ∈ a; hence (MP) x ∈ a. By the deduction theorem, (1) is now settled. What happens if G = ∅? (See Exercise III.18.)  □

III.6.14 Proposition (Existence of Intersections). If the family F is nonempty, then ⋂F is a set.

Proof. By Example III.6.13 and separation (III.4.10).  □

Priorities of set operations. “−” (that is, difference – as we will not use comple-
ments) and “∪” have the same priority and associate right to left. “∩” is stronger
(associativity is irrelevant by Exercises III.19 and III.20). Thus A − B ∪ C =
A − (B ∪ C), while A ∩ B − C = (A ∩ B) − C, A ∩ B ∪ C = (A ∩ B) ∪ C.
When in doubt, use brackets!
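To see the convention at work on a toy instance (the sets below are ours, chosen only so that the two readings differ), let a, b, c be distinct objects and take A = {a, b, c}, B = {b}, C = {c}. Then

\[
A - B \cup C \;=\; A - (B \cup C) \;=\; \{a\},
\qquad\text{whereas}\qquad
(A - B) \cup C \;=\; \{a, c\},
\]

so the placement of brackets genuinely matters.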

III.6.15 Proposition (De Morgan’s Laws for Classes). Let A, B, C be arbi-


trary classes. Then,
(1)  C − (A ∪ B) = (C − A) ∩ (C − B) and
(2)  C − (A ∩ B) = (C − A) ∪ (C − B).



† We do not feel inclined to perform acrobatics just to get around the fact that ⋂∅ cannot be a formal term: it is not a set (see Example III.6.13 below).

Proof. We do (1) imitating the way people normally argue this type of thing, “at the element level”. The proof uses pure logic and Definitions III.4.1, III.4.3, III.4.8, and III.4.12.
⊆: Let x ∈ C − (A ∪ B). Then

x ∈ C     (i)

and

x ∉ (A ∪ B)     (ii)

By definition of ∪, (ii) yields

x ∉ A ∧ x ∉ B     (iii)

Combine (i) and (iii) to get (by definition of difference)

x ∈ C − A ∧ x ∈ C − B

or (by definition of ∩)

x ∈ (C − A) ∩ (C − B)

Done, by the deduction theorem.
⊇: Let x ∈ (C − A) ∩ (C − B). Then

x ∈ C − A ∧ x ∈ C − B

Hence

x ∈ C     (iv)

and

x ∉ A ∧ x ∉ B

This last one says (by definition of ∪)

x ∉ (A ∪ B)

which along with (iv) gives

x ∈ C − (A ∪ B)

Case (2) is left as Exercise III.26.  □

III.6.16 Example. A better way, perhaps, is to use translations and reduce the
issue to a tautology: (1) above translates to (III.4.1, III.4.3 and III.4.12)
C ∧ ¬(A ∨ B ) ↔ (C ∧ ¬A) ∧ (C ∧ ¬B )
Noting that (by propositional De Morgan’s laws)
|=Taut C ∧ ¬(A ∨ B ) ↔ C ∧ (¬A ∧ ¬B )
we are done. 

III.7. Axiom of Foundation


III.7.1 Example. We have seen that the “absolute universe”, U M = {x : x = x},
is a proper class.
The Russell paradox argument does not depend on what exactly M is; therefore an alternate Russell class, {x ∈ U N : x ∉ x}, exists in all alternate universes
U N (where ∅ ⊆ N ⊆ M – see III.4.20). Thus all universes U N are also proper
classes. 

Informal Discussion (towards Foundation). In preparation for the axiom of


foundation, we next reexamine the “magic” of the statements x ∉ x and x ∈ x.
Some people react to Russell’s paradox by blaming it on an expectation that
x ∈ x might be true for some x. This is not the right attitude, regardless of what
we think the answer to the question x ∈ x is. After all, there is an alternative
“theory of sets” where x ∈ x is possible, and this theory is consistent if ZFC
is – so, in particular, it does not suffer from Russell’s paradox.†
What really is taking place in the Russell argument is a diagonalization –
a technique introduced by Cantor to show that there are “more” real numbers
than natural numbers – and this has nothing to do with whether x ∈ x is, or is
not, “really true”.
We can visualize this diagonalization as follows. Arrange all atoms and sets
into a matrix as in the figure below:

a b c D B A X ...
a i i i i i i i ...
b i i i i i i i ...
c i i i i i i i ...
D i i i i i i i ...
B i i i i i i i ...
A i i i i i i i ...
X i i i i i i i ...
.. .. .. .. .. .. .. .. ..
. . . . . . . . .

The a, b, c, . . . that label the columns and rows are all the sets and atoms
arranged in some fashion. We may call these labels the heads of the respective
rows or columns that they label.
Each entry, i, can have the value 0 or 1 or the name of an atom (without
loss of generality, we assume that no atom has as name 0 or 1). This value is

† See Barwise and Moss (1991) for an introduction to hypersets.



determined as follows:

Entry at row z and column w =   0   if z ∈ w
                                1   if z ∉ w ∧ ¬U(w)
                                w   if U(w)   [N.B. This entails z ∉ w]

Here are a few examples: Say a is an atom. Then all the i’s in column a have
value a. Let b = {b, D}.† Then column b has 1 everywhere except on rows b
and D, where it has 0. Conversely, the head of a column sequence of 0, 1, and
atom values is determined by the sequence.‡ For example, the sequence of 1
everywhere determines ∅; the sequence

1  1  0  1  1  1  . . .

i.e., the one that has 1 everywhere except at (row) position c, where it has a 0, determines the set {c}.
Let us now define informally a sequence, and therefore a class that we will call R̃, by going along the main diagonal (that is, along the matrix entries (a, a), (b, b), (c, c), . . . ) “reversing” all the i-values (specifically, nonzero to 0 and 0 to 1). That is,

the sequence for R̃ has a  0 at position x if entry(x, x) is not a 0
                           1 at position x if entry(x, x) is a 0          (1)

It follows that the sequence for R̃ differs from the sequence for any x at position x.
Thus, R̃ cannot occur anywhere in the matrix (as a column) – for if it occurred as column x, it then would be “schizophrenic” at matrix entry (x, x) – so it is not a set (recall, the matrix represents all sets and atoms as columns).
What is the connection with Russell’s paradox? Well, in (1) we are saying that x ∈ R̃ iff x ∉ x; hence R̃ = R, the Russell class! The above diagonalization can be readily adapted to “construct” a set that is not in a given set b. All we have to do is to think of the matrix as “listing”, or representing, just all the atoms and sets in b rather than in U M (see Exercise III.6).
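For readers who like to see the diagonal trick in executable miniature, here is an illustrative sketch in Python (the finite “matrix” is only a stand-in for the informal picture above and is not part of the formal theory):

    def diagonal_column(universe, members_of):
        # 'universe' lists the heads of the rows/columns; members_of[w] is the set
        # of heads z with z in w, i.e., the 0-entries of column w in the matrix.
        # Reversing the diagonal yields a column that differs from column w at row w:
        # it contains w exactly when column w does not.
        return {x for x in universe if x not in members_of[x]}

Whatever finite universe and membership table are fed in, the returned collection differs from members_of[w] at w for every head w, so it equals no listed column – the finite shadow of the argument that R̃ is not a set.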

† In view of what we said in the preceding footnote regarding hypersets, we allow just for the sake
of this discussion the generality where b ∈ b is possible.
‡ Do not expect all sequences to appear as columns. For example the sequence whose members
are all 0 denotes U M , but our matrix heads are only sets or atoms.

We have chosen to describe (by ZFC) a standard model where, among its
properties, we have that x ∈ x is always false, by Principle 1.† Similarly,
a ∈ b ∈ a and, more generally, a ∈ b ∈ · · · ∈ a are absurd in our model, for
the leftmost a should be available before the rightmost a for such a chain of
memberships to be valid (Principle 1).
Note that if, say, a ∈ b ∈ a were possible, then we would get the “infinite”
chain
···a ∈ b ∈ a ∈ b ∈ a ∈ b ∈ a (1)
i.e., a would be “bottomless”, like an infinite regression of a “box in a box in a
box in a box . . . ”. Sets that are not bottomless are called well-founded.
A bit more can be said of the standard model. It is not only “repeating”
chains such as (1) that are not possible, but likewise non-repeating infinite
“descending” chains such as
· · · an ∈ an−1 ∈ · · · ∈ a2 ∈ a1 ∈ a0 (2)
There are no bottomless sets, period.
Towards formulating an appropriate axiom of ZFC that says “bottomless sets
do not exist”, let A be any nonempty class. Assume that it contains no atoms.
Now, there must be a set (maybe more than one) in A that was constructed no
later than any other set in A (for example, if # and | name atoms, and if {#} ∈ A
and {|} ∈ A, then {#} and {|} are two among those sets in A that are constructed
the earliest possible).
Let now, in general, y be such an earliest-constructed set in A, and let x ∈ y.
It follows that x ∈/ A (for x is an atom – hence not in A – or is a set built
before y). The existence of sets like y in A captures foundation. Thus, taking
A to contain precisely the members of (2), we see that (2) is absurd.

III.7.2 Axiom (Foundation Schema). Class form:


A ≠ ∅ → (∃y ∈ A)(¬(∃x ∈ y)x ∈ A)

or, applying De Morgan’s laws,

A ≠ ∅ → (∃y ∈ A)(∀x ∈ y)x ∉ A

The axiom expressed in the formal language is the schema

(∃y)A[y] → (∃y)(A[y] ∧ ¬(∃x ∈ y)A[x])

† Note that in such a state of affairs all entries (x, x) are nonzero; thus R̃ is the sequence composed entirely of 0, representing U M . This is as it is expected, since now R = U M .

III.7.3 Remark.

(1) The foundation axiom (schema) is also called the regularity axiom.
(2) The schema version of foundation is due to Skolem (1923). It readily implies – using A ≡ y ∈ A – and is implied† by the single-axiom (non-schema) set version where A is a free variable other than y:

(∃y)y ∈ A → (∃y)(y ∈ A ∧ ¬(∃x ∈ y)x ∈ A)

or

¬U(A) → A ≠ ∅ → (∃y ∈ A)(¬(∃x ∈ y)x ∈ A)

(3) The discussion that motivated III.7.2 was in terms of a class A that contained
no atoms. No such restriction is stated in III.7.2, for trivially, if A does
contain atoms, any such atom will do for y. If it is known that A is a family
of sets (i.e., that it contains no atoms), then foundation simplifies to
A ≠ ∅ → (∃y ∈ A)y ∩ A = ∅

(4) If for a minute we write < for ∈, then III.7.2 (formal language version)
reads exactly as the least number principle on N. Of course, ∈ is not an
order on all sets; however, if its scope is restricted on appropriate sets, then
it becomes an order, and III.7.2 makes it a well-ordering. More on this in
Chapter VI. 

III.7.4 Example. Let us derive once again the falsehood of a ∈ a and a ∈ b ∈ a,


this time formally, using the axiom (schema) of foundation.
Given a and b (sets or atoms), the sets S = {a} and T = {a, b} exist,‡ as we
saw earlier. Since S ≠ ∅, there is a y ∈ S such that x ∈ y is false for all x ∈ S
(III.7.2). The only candidate for either y or x is a. Thus, a ∈ a is false.
O.K., let us repeat the above in a (formal) manner so that we will not be
accused of arguing semantically (saying things like “false” – colloquial for
“refutable” – and the like):
⊢_ZFC ¬U({a}) → {a} ≠ ∅ → (∃y ∈ {a})(¬(∃x ∈ y)x ∈ {a})

by III.7.3(2) and III.5.5. Since ⊢_ZFC ¬U({a}) and ⊢_ZFC {a} ≠ ∅, modus ponens yields

⊢_ZFC (∃y ∈ {a})(¬(∃x ∈ y)x ∈ {a})     (1)

† Not so readily. We will get to this later.


‡ “The set {a} exists” is another way of saying that “{a} is a set” or that “the term {a} can be
formally introduced”.

Let B be a new constant, and add the assumption

B ∈ {a} ∧ ¬(∃x)(x ∈ B ∧ x ∈ {a}) (2)

or, as we say when we act like Platonists, “let B be an object such as (1) tells
us exists”. From (2) we derive B ∈ {a} hence, by III.5.5,

B=a (3)

and

¬(∃x)(x ∈ B ∧ x ∈ {a})

which in view of (3) and III.5.5 yields

¬(∃x)(x ∈ a ∧ x = a)

i.e., (“one point rule”, I.6.2, p. 71)

¬a ∈ a

We have just refuted a ∈ a (a a free variable).


For T we only offer the informal (Platonist’s) argument: There is a y ∈ T
such that x ∈ y is false for all x ∈ T .
Case where y = a: Then we cannot have b ∈ a.
Case where y = b: Then we cannot have a ∈ b.
So we cannot have both a ∈ b and b ∈ a (i.e., a ∈ b ∈ a). 

III.8. Axiom of Collection


In older approaches to set theory, when the formation-by-stages doctrine was not
available, how did mathematicians recover from paradoxes? “Sets”† like R and
U M were known to be “paradoxical”, and this was attributed to their enormous
size. In turn, this uncontrollable size resulted into some of these “sets” becoming
members of themselves, a situation that was (incorrectly) considered in itself
as paradoxical and a source of serious logical ills – such was the impact of Rus-
sell’s paradox and the central presence of the “self-referential statement”‡ x ∈ x
in its derivation. For example, the “self-contradictory” (as they called it) “set of
all sets”, V M , certainly satisfied, according to the analysis at that time, V M ∈ V M .

† We use the term “set” in quotes because at the time in the development of set theory that this
commentary refers there was no technical distinction between sets and proper classes. Rather,
there was a distinction between “sets” and “self-contradictory” sets; they were all sets, but some
were troublemakers and were avoided.
‡ If x could talk, it would say “I am a member of myself”.

That this was considered to be a “problem” can be seen, for example, in


Kamke (1950, p. 136) where he states that all “sets”, such as (Russell’s) R and
V M , “that contain themselves as elements are ‘self-contradictory’ concepts as a
matter of course, and are therefore inadmissible”. He adds that no sets that con-
tain themselves are known that reasonable people would “regard as meaningful
sets”.†
So the “set of all sets” was to be avoided at all costs.‡ But how do you define
“large”? In the absence of an exact definition, at the one extreme you may be
out on a witch hunt, and at the other extreme you may be the victim of error
(see III.8.1). Are all large “sets” members of themselves? (Again, see III.8.1.)
Of course, all these worries were for the mathematician who worked on the
foundations of mathematics. The analyst, the number theorist, and the topologist
were not worried by such issues, for they worked in “small universes”, or
“reference sets”. That is, R (reals) or C (complex numbers) or Z would be the
reference sets of the analyst and number theorist: all the atoms they needed
were members of these reference sets, and any sets they needed were subsets
of the reference sets. The topologist too would be satisfied to start with some
“small” space (set), X , his reference, and then study subsets of X , looking for
“open” sets, “closed” sets, “connected” sets, etc.
In elementary expositions of set theory, even contemporary ones, the refer-
ence set approach is sometimes misrepresented as a logical necessity for the
avoidance of paradoxes.
Let us conclude this discussion by proposing a new informal (metamathe-
matical) principle, which invokes “largeness” in a relative sense (Cantor’s work
implicitly used this principle, which was first articulated by Russell). This prin-
ciple, on one hand, yields – by a different route – the axiom of separation; on
the other hand it yields the important axiom of replacement.

Principle 3 (The Size Limitation Doctrine). A class is a set if it is not “larger”


than some known set. Correspondingly, it is not a set if it is as large as a proper
class, for, otherwise this proper class would also be a set.
“Largeness” we will leave undefined, but this drawback is not serious, for we
will apply the principle (carefully, and only twice) just to “derive” two axioms.

† The reader is reminded of the nowadays acknowledged existence of such (hyper)sets (Barwise
and Moss (1991)) – not, however in ZFC.
‡ Indeed, mathematicians were suspicious of even the phrase “all sets” in something as innocent
as “. . . let us divide all possible sets into [equivalence] classes. . . ” (N.B. Just let us divide,
not attempt to collect into a “set”.) See Wilder (1963, p. 100) for further discussion, where he
speculates whether the “concept” of “all sets” might be as “self-contradictory” as the concept of
“the set of all sets”.

After this has been accomplished, we will forget Principles 0–3 and always
defer to the axioms.

III.8.1 Example. Let B be a set and let A ⊆ B. Then certainly A is not larger
than B, so A is a set by Principle 3. Thus separation (see the class form of
the schema, III.4.10) follows from the doctrine of size limitation as much as it
follows from that of set formation by stages.
Next, let U be the class of all singletons. Is this class “large” (hence proper),
or is it “small” (hence a set)? This example appears in Wilder (1963, p. 100)
(see in particular the closing remarks prior to his 4.1.2), where the argument
implies that this “set” is not “self-contradictory” (what we now call a “proper
class”), for, after all, it is far from containing “all sets”. In fact, in 4.1.2 (loc.
cit.) the “cardinal number 1” is identified with the “set” of all singletons (U )
without any adverse comment.
Well, it turns out that U is a proper class, for it has the same size as U_M, as we
can readily see from the fact that each x ∈ U_M corresponds to a unique {x} ∈ U
and vice versa. Thus, as a "set", U would be every bit as "self-contradictory"
as U_M. Incidentally, we must wonder to what extent the fact that, as a "set", U
clearly satisfied U ∉ U made it more acceptable than U_M back then.

Now consider a set A. Let us next "replace" every element x ∈ A by some other
object x′ (set or urelement).†
Evidently, the resulting class (let us call it A′) is not larger than the original
(and could very well be smaller, for we might have replaced several x ∈ A by
the same object); hence, by Principle 3, A′ is a set. This is the principle of (it
goes under several names) replacement or substitution or collection, and it is
very important in ZFC.
We prefer not to use the name “substitution” for this nonlogical axiom, for
that would clash with our use of the name for the logical axiom

A [x ← t] → (∃x)A

We will adhere to the name “collection”.


Below we state it as an axiom in the formal language. In the next section,
once the notions of relation and function have been formalized, we will give a
very simple version of the axiom.

† We use “replace” in the weak sense, where it is possible that for one or more x ∈ A the replacing
object is the same as the replaced object.

III.8.2 Axiom (Schema of Collection or Replacement). For any formula


P [x, y],
(∀x ∈ A)(∃y) P [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y] (1)
where A is a free variable.
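For instance (an illustration; the particular choice of P is ours), take P [x, y] to be the formula y = {x} – singletons exist by pairing. The corresponding instance of the schema reads

(∀x ∈ A)(∃y)y = {x} → (∃z)(∀x ∈ A)(∃y ∈ z)y = {x}

The hypothesis is provable, so this instance yields a set z that contains, for each x ∈ A, the singleton {x} among its members – possibly along with other members.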

III.8.3 Remark. (I) In any specific instance of the axiom (schema) of collection
the formula P [x, y] is the “agent” that effects the replacements: The hypothesis
ensures that for each x ∈ A, P suggests a y (maybe it has more than one
suggestion) – depending on x – as a possible replacement.
The conclusion says that there is a set,† z, which contains, instead of each x
that was originally in A, one (or more) replacement(s), y, among the possibly
many that were suggested by P . (All the suggestions were made to the left of
“→”.)
There is a small difficulty here: In the formal statement adopted in III.8.2 –
where we have allowed more than one possible candidate y to replace each x –
we run at once into a size and a “definability” problem: Obviously, if we are
going to argue that the size of z is small (and hence z is a set) we have to be
able to

(a) either choose a unique replacement y for each x ∈ A (and P [x, y] cannot
help us here; we have to do the choosing), or
(b) choose a “very small number” of replacements y for each x ∈ A – i.e., cut
down the size of the class of replacement values for each x – so that the
size of z is not substantially different from that of A.‡

If we were to take approach (a), then we would need a mechanism to effect


infinitely many choices, one out of each class Ax = {y : P [x, y]}, thus in effect
turning the hypothesis into (∀x ∈ A)(∃!y)Q [x, y] (where Q [x, y] → P [x, y],
for all x ∈ A) so that we could benefit from the size argument preceding III.8.2.
However, this would require (a strong form of) the axiom of choice, the axiom
that says, in effect, “don’t worry if you cannot come up with a well-defined
method to form a set consisting of one element out of each set in a (set) family
of sets; such a set exists anyhow”.

† Well, not exactly. It says that a formal object exists, but this object could well be an atom. Since
we will prove (III.8.4) an equivalent statement to (1), which explicitly asks that z be a set, we
can pretend in this discussion that (1) already asks that z be a set, although it does so between
the lines.
‡ Clearly, it is not “safe” to collect into z all possible y that P [x, y] yields for each x. For example,
if P [x, y] ≡ x ⊆ y and A = {∅}, then allowing all the y that P yields for x ∈ A we would end
up with z = U M , not a set.

We can do better than that (avoiding the axiom of choice, which we have
not formally introduced yet) if we allow ourselves to put in z possibly more
than one y that satisfy P [x, y] for a given x ∈ A, that is, approach (b). We
do this as follows: To show (informally) that a set z as claimed by the formal
axiom exists, and that therefore the axiom is “really true”, let us consider, for
each x ∈ A, all the y such that P [x, y] is true which are built at the earliest
possible stage. There is just one such earliest stage for each x. Now the class
of all such y, call it Y_x, is a set, for all its elements are available at that stage,
and there certainly is a later stage (at such a stage, Y_x is formed as a set).
Thus, for each x ∈ A we ended up with a unique set Y_x. Using the informal
analysis prior to the axiom, there is a set B that contains exactly all the Y_x. It
is clear now that we can "well-define" z: z = ⋃B will do, and is a set by the
union axiom.
(II) The hypothesis part of the axiom is usually stated in stronger terms,
viz., (∀x ∈ A)(∃!y) P [x, y],† and in that format it usually goes under the name
replacement axiom. The present form (mostly known as the collection axiom,
e.g., Barwise (1975)) is clearly preferable, for to apply it we have to work
less hard to recognize that the hypothesis holds. All the various formulations
of collection/replacement are equivalent in ZF (even without the “C”). Some
other forms besides the ones stated so far are the following, where we are using
set term notation in the interest of readability:
(1) Bourbaki (1966b):
(∀x)(∃z)(∀y)(P [x, y] → y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y]
or in more suggestive notation
(∀x)(∃z){y : P [x, y]} ⊆ z → (∀A)Coll y (∃x ∈ A) P [x, y]
(2) Shoenfield (1967):
(∀x)(∃z)(∀y)(P [x, y] ↔ y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y]
or in more suggestive notation‡
(∀x)(∃z){y : P [x, y]} = z → (∀A)Coll y (∃x ∈ A) P [x, y]

† Recall that (∃!x)R says that there is a unique x satisfying R. That is, (∃x)(R[x] ∧ (∀y)(y ≠
x → ¬R[y])).
‡ The “suggestive” notation in (1) above is 100% faithful to the formal version, since, by III.3.6,
(∀y)(P [x, y] → y ∈ z) is provably equivalent to {y : P [x, y]} ⊆ z (see also III.1.5, p. 116). Not
so for the suggestive rendering of (2) in the presence of atoms. For example, on the assumption
U (z), (∀x)(¬x = x ↔ x ∈ z) is not equivalent to {x : ¬x = x} = z. Well, on one hand
Shoenfield (1967) does not employ atoms, so that our rendering of (2) captures exactly what this
form of collection "says" in loc. cit. On the other hand, this is a moot point, for we prove (III.8.12)
that the formal version (2) – even in the presence of atoms – implies version (3) without adding
the qualifier "¬U (z) ∧ " before "(∀y)". In turn, we find out later that form (3) implies collection.
In short, versions (1) and (2), exactly as stated, are equivalent to our collection.

(3) Levy (1979):


(∀x)(∀y)(∀y′)(P [x, y] ∧ P [x, y′] → y = y′)
→ (∀A)Coll y (∃x ∈ A) P [x, y]
We can readily prove (III.8.12 below) that collection implies all these alternative
forms. While the converse is true, it will have to wait until we can formalize
“stages” and thus formalize the argument we have used in (I) above to show
that collection is “really true”.
(III) We have restricted the way in which sets become available, namely,
requiring that they be built in stages, or that they be not much larger than their
“parents” (i.e., the sets that we have used to build them). In the process, we
developed (most of, but not all yet) the ZFC axioms, as they flow from these
doctrines, with the apparent result of managing to escape from the paradoxes
and antinomies of the past.
Thus, despite the lack of a (meta)proof of the consistency for ZFC, we are
doing well so far. But is this apparent success at no cost? Have we got “enough
sets” in this restricted axiomatic set theory to mirror what we normally do
in everyday mathematics? Put another way, do we have enough stages of set
construction in order to build sets that are as complicated as the various branches
of mathematics require them to be?
Of course, this is not a quantitatively precise question, and it will not get
a quantitatively precise answer. However, the answer will hopefully satisfy us
that we are doing well on this count too.
Imagine two mathematicians who are playing the following game: They
have a large and complicated set, A, to start with. They take turns, each taking
an x ∈ A, “making his move”, and then discarding x. A “move” consists of
proposing the wildest, most complicated set of one’s experience that one can
think of on the spur of the moment: Sx . Of course, at each move each player is
doing his best to utterly demolish the morale of his opponent and also to better
his own effort at his previous move.
At the end of the game, we have a class of all the Sx , which is a set by
collection. Now, the stage at which this class was built as a set is beyond the
wildest imagination of our two friends – otherwise one of them would have
proposed some set built at that stage during the game.


Put another way, we cannot “stretch” an “infinite” set A into a proper class
by the device of replacing each of A’s elements with horrendously complicated
sets – i.e., sets that are built extremely late in the stage hierarchy – in an effort
to run out of stages. Starting with A ∈ U M , no matter how far we stretch it, we
still end up inside U M . Therefore, we do have a lot of stages. Equivalently, our
“universe” is “very large”.
There is an important observation to be made here: The reason that we have
used the size doctrine to justify collection/replacement is, intuitively, precisely
the result of the game above. We felt that we could not apply Principle 2
(p. 102) reliably, or convincingly, towards arguing that “we could imagine” that
a stage existed after all the stages for the construction of all the sets Sx (our two
colleagues could not imagine either).
The reader is referred to Manin (1977, p. 46), where he states that – in
the context of the doctrine of set formation by stages – the justification of the
collection axiom goes beyond the “usual intuitively obvious”. 

III.8.4 Remark (a More Verbose Collection). We “parse” here (just as we


did for the axiom of union in III.6.7) the collection statement, extracting in the
process more information from the axiom than it seems to be stating.
(I) First off, we never said that A has to be a set. Indeed, III.8.2 is equivalent
to
¬U (A) → (∀x ∈ A)(∃y) P [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y] (2)
This is because (2) is a tautological consequence of (1) in III.8.2 on the
one hand. On the other hand, proof by cases with the help of
⊢_ZFC U (A) → (∀x ∈ A)(∃y) P [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]
(3)
combines with (2) to derive collection as originally stated. Why is (3)
valid? We can prove the simpler
⊢_ZFC U (A) → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]   (4)
from which (3) follows tautologically: Well, assume U (A). Then ¬x ∈ A
by III.1.3, from which
x ∈ A → (∃y ∈ z) P [x, y]
by tautological implication. Generalization followed by an invocation of
the substitution axiom (and modus ponens) finally yields
(∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]

Thus, we do not need to worry whether or not the variable A appearing in


the collection axiom is a set.
(II) Next we prove that
 
(∀x ∈ A)(∃y) P [x, y] → (∃z) ¬U (z) ∧ (∀x ∈ A)(∃y ∈ z) P [x, y]
(5)
is equivalent to collection. It is trivial that (5) implies collection, so we
concentrate in the direction where collection implies (5). We use the de-
duction theorem, assuming

(∀x ∈ A)(∃y) P [x, y] (6)

under three cases: U (A); ¬U (A) and A = ∅; A ≠ ∅.

Of course, ⊢ U (A) ∨ ¬U (A) ∧ (A = ∅ ∨ A ≠ ∅).

• A = ∅ or U (A). This yields ¬x ∈ A (x free – see III.3.5 in case


A = ∅); thus

x ∈ A → (∃y ∈ ∅) P [x, y]

by tautological implication. Another tautological implication and


⊢_ZFC ¬U (∅) yield

¬U (∅) ∧ (x ∈ A → (∃y ∈ ∅) P [x, y])

Following this up with generalization (and distribution of ∀ over ∧,


noting the absence of free x in ¬U (∅)), we have

¬U (∅) ∧ (∀x ∈ A)(∃y ∈ ∅) P [x, y]

Thus, by the substitution axiom


 
(∃z) ¬U (z) ∧ (∀x ∈ A)(∃y ∈ z) P [x, y]

Note. Neither collection nor (6) was needed in this case.


• ¬U (A) and A ≠ ∅. The assumption amounts to (∃y)y ∈ A. We argue
by auxiliary constant. Let B be a new constant, and assume

B∈A (7)

By (6) collection yields

(∃z)(∀x ∈ A)(∃y ∈ z) P [x, y] (8)



Add yet another new constant, C, and assume


(∀x ∈ A)(∃y ∈ C) P [x, y] (9)
Specialization of (9) using (7) and modus ponens yields
(∃y ∈ C) P [B, y]
Hence
(∃y)y ∈ C
by ∃-monotonicity (I.4.23). Thus (by III.1.3)

¬U (C) (10)

(9) and (10) tautologically imply ¬U (C)∧(∀x ∈ A)(∃y ∈ C) P [x, y],


which by substitution axiom gives
 
(∃z) ¬U (z) ∧ (∀x ∈ A)(∃y ∈ z) P [x, y]

(III) Finally, collection is equivalent to


¬U (A) → (∀x ∈ A)(∃y) P [x, y]
  (11)
→ (∃z) ¬U (z) ∧ (∀x ∈ A)(∃y ∈ z) P [x, y]
for (11) trivially implies (2), while (2) implies (11) using the two cases
A = ∅ and A = ∅ exactly as we did above. 

III.8.5 Remark (A Note on Nonlogical Schemata and Defined Symbols).


By I.6.1 and I.6.3, the addition of defined predicate, function, and constant
symbols to any language/theory results in a conservative extension of the theory,
that is, any theorem of the new theory over the original language is also provable
in the original theory. Moreover, any formula A of the extended language can
be naturally transformed back into a formula A ∗ of the old language (by elim-
inating all the defined symbols), so that

A ↔ A∗ (1)

is provable in the extended theory.


There is one potential worry about the presence of nonlogical schemata –
such as the separation, foundation, and collection axiom schemata – that we
need to address: Nonlogical axioms and schemata are specific to a theory and
its basic language, i.e., the language prior to any extensions by definitions. For
example, the collection schema III.8.2 (p. 163) is a “generator” that yields a
specific nonlogical axiom (an instance of the schema) for each specific formula,

over the basic language L Set , that we care to substitute into the metavariable
P . There is no a priori promise that the schema “works” whenever we replace
the syntactic variable P [x, y] by a specific formula, say “B ”, over a language
that is an extension of L Set by definitions.
For example,† do we have the right to expect the provability of
(∀x ∈ A)(∃y)y = t[x] → (∃z)(∀x ∈ A)(∃y ∈ z)y = t[x]
in the extended theory, if the term t contains defined function or constant
symbols?
Indeed we do, for let us look, in general, at an instance of collection obtained
in the extended language by substituting the specific formula B – that may
contain defined symbols – into the syntactic variable P :
(∀x ∈ A)(∃y)B [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B [x, y] (2)
We argue that (2) is provable in the extended theory; thus the axiom schema is
legitimately usable in any extension by definitions of set theory over L Set .
Following the technique of symbol elimination given in Section I.6 (cf. I.6.4,
p. 73) – eliminating symbols at the atomic formula level – we obtain the fol-
lowing version of (2), in the basic language L Set . This translated version has
exactly the same form as (2) (i.e., of collection), namely
(∀x ∈ A)(∃y)B ∗ [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B ∗ [x, y]
Thus – being a collection schema instance over the basic language – it is an
axiom of set theory, and hence also of its extension (by definitions).
Now, by (1), the equivalence theorem yields the following theorem of the
extended theory:

((∀x ∈ A)(∃y)B [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B [x, y])
↔
((∀x ∈ A)(∃y)B ∗ [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)B ∗ [x, y])
Hence (2) is a theorem of the extended theory as claimed.
The exact same can be said of the other two schemata (foundation,
separation).‡ 

† This scenario materializes below, in III.8.9.


‡ One can rethink the axioms, for example adopting Bourbaki’s collection instead of our III.8.2,
so that the separation schema becomes redundant. We have already promised to prove in due
course that foundation need not be a schema. However, it turns out that we cannot eliminate all
schemata. It is impossible to have a finite set of axioms equivalent to the ZFC axioms. We prove
this result in Chapter VII.

III.8.6 Example (Informal). We often want to collect into a set objects that
are more complicated than [“values of”] variables, subject to a condition being
true. For example, we often write things such as
(1) {n² : n ∈ N},
(2) {x + y : x ∈ R ∧ y ∈ R ∧ x² + y² = 1},
(3) {(x, y) : x ∈ R ∧ y = 2}, where “(x, y)” is the “ordered pair” (more on
which shortly) of the Cartesian coordinates of a point on the plane,
(4) {(x, y) : x ∈ R}.
We are clear on what we mean by these shorthand notations. First off, for
example, notation (1) cannot be possibly obtained in any manner by substitution
from something like {x : . . . x . . . }, since the “x” in a set term {x : A} is bound.
What we do mean is that we want to collect all objects that have the form "n²"
for some n in N. That is, notation (1) is shorthand (abbreviation or argot) for

{x : (∃n)(x = n² ∧ n ∈ N)}

Similarly with (2)–(4). (4) is interesting in that y is a free variable, or a para-


meter as we often say. We get different sets for different “values” of y. The
shorthand (4) stands for the term {z : (∃x)(z = (x, y) ∧ x ∈ R)}.
The notation reviewed here is sufficiently important to motivate the definition
below. 

III.8.7 Informal Definition (Collecting Formal Terms). The symbol


{t[w⃗_m] : A [x⃗_n]}

where t[w⃗_m] is a formal term (cf. discussion following III.2.5), is an abbrevia-
tion of the class term

{y : (∃x_1)(∃x_2) · · · (∃x_n)(y = t[w⃗_m] ∧ A [x⃗_n])}   (1)

The variables x⃗_n explicitly quantified in (1) above are precisely the ones we
list in "[x⃗_n]" of A. We may call them "linking" variables (linking the term t
with the “condition” A) or “active” variables (Levy (1979)). All the remaining
variables other than y are free (parameters).
The notation does not always unambiguously indicate the active variables. In
such cases the context, including surrounding text, will remove any ambiguity.


III.8.8 Example. What does



⋃{t[x] : A [x]}

abbreviate? In the first instance, it abbreviates the expression



⋃{y : (∃x)(y = t[x] ∧ A [x])}

The latter abbreviates (cf. III.6.3)


  
{z : (∃y)((∃x)(y = t[x] ∧ A [x]) ∧ z ∈ y)}   (2)

Let us simplify (2):


 
(∃y) (∃x)(y = t[x] ∧ A [x]) ∧ z ∈ y
 
↔ z ∈ y has no free x
(∃y)(∃x)(y = t[x] ∧ A [x] ∧ z ∈ y)
 
↔ commuting the two ∃
(∃x)(∃y)(y = t[x] ∧ A [x] ∧ z ∈ y)
 
↔ one point rule (I.6.2, p. 71)
(∃x)(A [x] ∧ z ∈ t[x])

Thus,†

⊢ ⋃{t[x] : A [x]} = {z : (∃x)(A [x] ∧ z ∈ t[x])}   (3)

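For instance, taking t[x] to be {x} and A [x] to be x ∈ A, (3) specializes to

⋃{{x} : x ∈ A} = {z : (∃x)(x ∈ A ∧ z ∈ {x})} = {z : (∃x)(x ∈ A ∧ z = x)} = {z : z ∈ A}

using z ∈ {x} ↔ z = x and the one point rule: the union of the singletons of the members of A is just the class of the members of A.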

III.8.9 Proposition. The class {t[x] : x ∈ A} (A a free variable) is a set, that is


(using III.8.7),

⊢_ZFC Coll y ((∃x ∈ A)y = t[x])   (4)

Only x, not A, is the linking variable. We could have written {t[x] : (x ∈ A)[x]}
to indicate this.

Proof. By collection,

⊢_ZFC (∃z)(∀x ∈ A)(∃y ∈ z)y = t[x]   (5)

since

(∀x ∈ A)(∃y)y = t[x]

† Note that since we are dealing with abbreviations, this is a theorem of pure logic.

is a theorem of pure logic deduced from t = t (using substitution, tautological


implication, and generalization, in that order). Arguing by auxiliary constant,
we add a new constant B and the assumption
 
(∀x) x ∈ A → (∃y)(y ∈ B ∧ y = t[x])

The one point rule and specialization yield

x ∈ A → t[x] ∈ B (6)

We can now prove that


 
(∃x ∈ A)y = t[x] → y ∈ B (7)

which will settle (4) by III.3.6.


We assume the hypothesis in (7); indeed, we go a step further: We add a new
constant C and the assumption

C ∈ A ∧ y = t[C]

Thus

C∈A (8)

and

y = t[C] (9)

(6) and (8) yield t[C] ∈ B. From (9), y ∈ B. 


 
III.8.10 Corollary. ⊢_ZFC Coll z ((∃x ∈ A)z ∈ t[x]).

Proof. Apply III.6.8 to {t[x] : x ∈ A} (a set, by III.8.9), and use (3) above
(in III.8.8). 

III.8.11 Corollary. {t[x, y] : x ∈ A ∧ y ∈ B} is a set, where A and B are free


variables (and x and y are active).
Formally, ⊢_ZFC Coll z ((∃x ∈ A)(∃y ∈ B)z = t[x, y]).

Proof. We will establish

⊢ {t[x, y] : x ∈ A ∧ y ∈ B} = ⋃{{t[x, y] : y ∈ B} : x ∈ A}   (1)

from which the corollary follows by two applications of III.8.9 followed by an


application of union. As for (1), we transform the right hand side to the left hand

side by eliminating abbreviations. The “=” instances below use III.4.1(ii):



⋃{{t[x, y] : y ∈ B} : x ∈ A}
=  by III.8.8(3)
{z : (∃x ∈ A)z ∈ {t[x, y] : y ∈ B}}
=  by III.8.7
{z : (∃x ∈ A)(∃y ∈ B)z = t[x, y]}
=  by III.8.7 again
{t[x, y] : x ∈ A ∧ y ∈ B}


By commutativity of ∧, (1) yields

⊢ {t[x, y] : x ∈ A ∧ y ∈ B} = ⋃{{t[x, y] : x ∈ A} : y ∈ B}

III.8.12 Proposition. In the presence of the ZFC axioms that we have intro-
duced so far – less collection, except when explicitly assumed as (1) below – we
have the following chain of implications (stated conjunctively): (1) → (2) →
(3) → (4) → (5), where the statements are

(1) collection—version III.8.2,


(2) (∀x)(∃z)(∀y)(P [x, y] → y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y],
(3) (∀x)(∃z)(∀y)(P [x, y] ↔ y ∈ z) → (∀A)Coll y (∃x ∈ A) P [x, y],
(4) (∀x)(∀y)(∀y′)(P [x, y] ∧ P [x, y′] → y = y′) → (∀A)Coll y (∃x ∈ A) P [x, y],
(5) (∀x ∈ A)(∃!y) P [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y].

Proof. (1) → (2): We assume (1) (collection version III.8.2). To prove (2)
assume the hypothesis. Hence (specialization)

(∃z)(∀y)(P [x, y] → y ∈ z)

We add a new constant B and let

(∀y)(P [x, y] → y ∈ B) (i)



By III.3.6,†

Coll y P [x, y]

It follows that {y : P [x, y]} can be introduced formally as a set term. Hence,
by III.8.10,
 
Coll y (∃x ∈ A)y ∈ {y : P [x, y]}

In short, Coll y (∃x ∈ A) P [x, y]; thus

(∀A)Coll y (∃x ∈ A) P [x, y]

(2) → (3): We assume (2). To prove (3), assume the hypothesis. Hence
(specialization)

(∃z)(∀y)(P [x, y] ↔ y ∈ z)

We add a new constant B and let

(∀y)(P [x, y] ↔ y ∈ B)

Tautological implication followed by an invocation of ∀-monotonicity (I.4.24,


p. 52) yields (i) above.
(3) → (4): We assume (3). To prove (4), assume the hypothesis. Hence

P [x, y] ∧ P [x, y′] → y = y′   (ii)

We also record the tautology

(∃y) P [x, y] → (∃y) P [x, y] (iii)

By III.2.4 (p. 122: (10), (11), (18)), (ii) and (iii) allow us to introduce a new
function symbol, f P , into the language, and the defining axiom

(∃y) P [x, y] ∧ P [x, f P [x]] ∨ ¬(∃y) P [x, y] ∧ f P [x] = ∅ (iv)

into the theory. From (iv) one deduces (corresponding to (19 ) and (20) of III.2.4)

(∃y) P [x, y] → (P [x, y] ↔ y = f P [x]) (v)

and

¬(∃y) P [x, y] → f P [x] = ∅ (vi)

† Observe how we did not need to insist that B is a set. This issue was the subject of a footnote on
p. 164.

We can now prove

(∀x)(∃z)(P [x, y] ↔ y ∈ z) (vii)

We have two cases:


Case of (∃y) P [x, y]. By (v),

P [x, y] ↔ y ∈ { f P [x]}

By the substitution axiom,

(∃z)(P [x, y] ↔ y ∈ z)   (vii′)

Case of ¬(∃y) P [x, y]. Thus, (∀y)¬P [x, y]; hence ¬P [x, y] is deriv-
able. By tautological implication, P [x, y] → y ∈ ∅.
Conversely, y ∈ ∅ → P [x, y] is a tautological consequence of the theorem
¬y ∈ ∅. Thus,

P [x, y] ↔ y ∈ ∅

and we derive (vii′) once more, by the substitution axiom. Proof by cases now
yields (vii′) solely on the hypothesis (∀x)(∀y)(∀y′)(P [x, y] ∧ P [x, y′] →
y = y′); hence we have (vii) by generalization.
Having settled (vii), we next obtain

(∀A)Coll y (∃x ∈ A) P [x, y]

by our hypothesis (3).


(4) → (5): We assume (4). To prove (5), assume the hypothesis, that is,

(∀x ∈ A)(∃!y) P [x, y]

which entails

(∀x)(x ∈ A → (∃y) P [x, y]) (viii)

and

x ∈ A → P [x, y] ∧ P [x, y′] → y = y′   (ix)

For convenience we let

Q [x, y] ≡ x ∈ A ∧ P [x, y] ∨ x ∉ A ∧ y = ∅

Work already done in III.2.4 yields, because of (ix),

Q [x, y] ∧ Q [x, y′] → y = y′

Thus, by hypothesis (4), we have derived

(∀z)Coll y (∃x ∈ z)Q [x, y]

Hence

Coll y (∃x ∈ A)Q [x, y] (x)

by specialization. Expanding Q and using the tautology


 
x ∈ A ∧ (x ∈ A ∧ P [x, y] ∨ x ∉ A ∧ y = ∅) ↔ x ∈ A ∧ P [x, y]

the equivalence theorem yields


 
(∃x)(x ∈ A ∧ (x ∈ A ∧ P [x, y] ∨ x ∉ A ∧ y = ∅)) ↔ (∃x)(x ∈ A ∧ P [x, y])

Hence, from (x),

Coll y (∃x ∈ A) P [x, y]

Let then (a formal definition of “B”† introduced just for notational convenience)

B = {y : (∃x)(x ∈ A ∧ P [x, y])} (xi)

Let also w ∈ A.
By (viii) we get (∃y) P [w, y], which allows us to add a new constant C and
the assumption

P [w, C] (xii)

from which w ∈ A ∧ P [w, C] by tautological implication. Therefore

(∃x)(x ∈ A ∧ P [x, C])

and, by (xi),

C∈B (xiii)

Now (xii) and (xiii) yield C ∈ B ∧ P [w, C]; thus

(∃y)(y ∈ B ∧ P [w, y])

The deduction theorem and hypothesis w ∈ A yield

w ∈ A → (∃y ∈ B) P [w, y]

† Of course, “B” is a name for a term, and what one really defines formally here is a function f ,
by f (A, . . . ) = {y : (∃x)(x ∈ A ∧ P [x, y])}, where “. . . ” are all those free variables present
that we do not care to mention. In practice, “let B be defined as . . . ” is all one really cares to say.

Hence

(∀x ∈ A)(∃y ∈ B) P [x, y]

Thus (substitution axiom)

(∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]

which is the conclusion part of (5). 

III.8.13 Corollary. Each of the versions (2)–(5) of collection above is a theo-


rem schema – hence for each specific P a theorem – of ZFC. 

III.8.14 Remark. (I) Thus, in the sequel we may use any version of collection,
as convenience dictates.
(II) Intuitively, collection versions (2)–(5) have a hypothesis that guarantees
that for each “value” of x the corresponding number of values of y that satisfy
P [x, y] is sufficiently “small” to fit into a set. In fact, in case (4) at most one
y-value is possible for each x-value, while in case (5) exactly one is possible –
albeit on the restriction that x is varying over a set A. Thus, collecting all the
values y, for all x in a set A, yields a set under all cases (2)–(5). This set is what
we call in elementary algebra or discrete mathematics courses the image of A
under the black box P [x, y]. This black box is an agent that for each “input”
value x yields zero or more (but not too many) “output” values y.
We note that collection in its version III.8.2 does not have the “not too many”
restriction on the number of outputs y for each x, and that is why one is selective
when collecting such outputs into a set. The conclusion of Axiom III.8.2,

(∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]

allows the possibility that many outputs y need not be included in z; it says
only that some outputs are included.
(III) Collection versions (2)–(4) in III.8.12 are quite strong, even in the
absence of some of the other axioms. For example, Bourbaki (1966b) adopts the
axiom of pairing, but adopts collection version (2), and proves both separation
and union (Exercise III.14).
Shoenfield (1967) adopts separation and proves pairing and union from col-
lection version (3) (Exercise III.15). Finally, Levy (1979) adopts union, and
proves separation and pairing from collection version (4) (Exercise III.16). 
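The "image of A under the black box P [x, y]" in (II) is easy to visualize with finite data. Here is a minimal Python sketch (the function name is ours, purely illustrative): each input x contributes its finitely many suggested outputs, and the collected outputs again form a set.

```python
def image_under(A, outputs):
    """Collect every output suggested for some x in A.
    outputs(x) plays the role of the class {y : P[x, y]}."""
    return {y for x in A for y in outputs(x)}

# each x is "replaced" by two suggestions: its square and its negation
assert image_under({1, 2, 3}, lambda x: {x * x, -x}) == {1, 4, 9, -1, -2, -3}
```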

III.9. Axiom of Power Set


There is another operation on sets, which, intuitively, increases the size of a set
“exponentially”, from which fact it derives its name (see Exercise III.35 in this
connection).

III.9.1 Informal Definition (Power Class, Power Set). For any class A, P(A)
stands for {x : ¬U (x) ∧ x ⊆ A}. We read P(A) as the power class of A.
If A is a set, then P(A) is also pronounced the power set of A. 

Note that what we collect in P(A) are sets.

III.9.2 Example (Informal). We compute some power classes:

P(∅) = {∅}
P({∅}) = {∅, {∅}}
P({0, 1}) = {∅, {0}, {1}, {0, 1}}
Also note that
(i) Since for every class A we have ∅ ⊆ A, it follows that ∅ ∈ P(A).
(ii) If a is a set, then a ⊆ a as well; hence we have a ∈ P(a).
(iii) For a set x, x ∈ P(a) iff x ⊆ a.
(iv) Even though U (x) → x ⊆ a (is provable), still U (x) → x ∉ P(a) (is prov-
able), since x must satisfy ¬U (x) for inclusion. Power classes contain no
atoms.
   
Pause. Now P({∅}) ⊇ {∅, {∅}} by (i) and (ii) above. But have we not forgotten
to include any other subsets of {∅}? Is it really “=” (rather than “⊃”) as we have
claimed above? The definitive answer that one is tempted to give is “Obviously,
the above is ‘=’ as stated”.

Well, let us prove the obvious, just to be sure:†


We will prove the formula ¬U (x) ∧ x ⊆ {∅} → x = ∅ ∨ x = {∅}, that is,
the tautologically equivalent
¬U (x) → x ⊆ {∅} → ¬x = ∅ → x = {∅} (1)

† My geometry teacher in high school used to say: “. . . [T]here are many proof methods: e.g., by
contradiction, by induction, etc. Among all those proof methods the most powerful is proof by
intimidation. It starts with the word ‘obviously’. Few have the courage to challenge such a proof
or to demand details . . . ”.

Arguing by the deduction theorem, we assume the hypothesis, namely,

¬U (x) (2)
(∀y)(y ∈ x → y = ∅) (3)

and

¬x = ∅

By (2) and the above (see III.4.11)

(∃y)y ∈ x (4)

Let (arguing by auxiliary constant)

A∈x (5)

Hypothesis (3) yields A ∈ x → A = ∅. (5) now yields A = ∅; hence

∅∈x (6)

by the Leibniz axiom and (5). The one point rule (III.6.2, p. 149) gives

⊢ ∅ ∈ x ↔ (∀y)(y = ∅ → y ∈ x)

Thus, by (6),

(∀y)(y = ∅ → y ∈ x)

This along with (3) yields

(∀y)(y ∈ x ↔ y = ∅)

Thus x = {∅} (III.5.5). 

III.9.3 Exercise. Repeat the above argument without relying on the axiom of
pairing. Thus, prove that
  
x ∈ P(P(∅)) ↔ x = ∅ ∨ x = P(∅)
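For finite sets the computations of III.9.2 can be checked mechanically. A minimal Python sketch (frozensets standing in for hereditarily finite sets; the helper name is ours):

```python
from itertools import combinations

def power(A):
    """P(A) for a finite set A, returned as a set of frozensets."""
    A = list(A)
    return {frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)}

empty = frozenset()
assert power(empty) == {empty}                          # P(emptyset) = {emptyset}
assert power({empty}) == {empty, frozenset({empty})}    # P({emptyset}) has two members
assert len(power({0, 1})) == 4                          # the four subsets of {0, 1}
```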

The following is “really true”:


If A is a set, then so is P(A)
Indeed, consider a stage at which A is formed as a set. Let x ∈ P(A), i.e.,
x ⊆ A. Every member of x is available before x, and hence before A, therefore
before that stage. Thus,

x ∈ P(A) implies that x is formed as a set at or before the stage at which A is formed   (1)

There is certainly a stage after that one (we have no problem accepting that a stage exists
after a given stage). Then, by (1), P(A) is formed as a set at that later stage.
We could not reliably argue the above using the size limitation doctrine
(Principle 3, p. 161), for it is not intuitively clear whether an “exponential
growth” in set size is harmless. In fact, Principle 3 was introduced solely to
justify the replacement axiom, and, in Chapter V, the axiom of infinity.

We capture the above informal argument by the following power set axiom:

III.9.4 Axiom (Axiom of Power Set).

(∃y)(∀x)(x ⊆ A → x ∈ y) (1)

where A is a free variable.

Actually, the axiom as stated above is not exactly what we have established in
the informal argument that preceded it. The axiom says a bit more than “the
power class of a set is a set”.
It is really true as stated, nevertheless. First off, the y of (1) above is neces-
sarily a set by x ⊆ A → x ∈ y, since (∃x)x ⊆ A is provable (A ⊆ A is a theorem
of pure logic – see III.1.5, p. 116) and hence, so is (∃x)x ∈ y by ∃-monotonicity.
Thus ¬U (y) by Axiom III.1.3. We have omitted the usual qualification “¬U (y)”
in the statement of the axiom, since it is a conclusion that the axiom forces any-
way. So the axiom says that
“There is a set y which contains as elements all x such that x ⊆ A,
without restricting x to be a set ”.
Now we see why it is really true. If A is a set, then lifting the restriction from
x adds to P(A) all the urelements (see III.1.5, p. 116). Well, there is a set y as
described above, for example, M ∪ P(A).†
If on the other hand A is an atom, then a choice for y that works is M ∪ {∅}.
As we have done on previous occasions, here too we prefer not to assert
explicitly that the objects which our axioms claim to exist are sets (nor do we
want to unnecessarily restrict our variables to be sets). We prefer to prove this

† We are using here the name M – which was earlier introduced formally to denote the set of all
atoms – to also name the set of all real atoms in the metatheory.

as a consequence of the axioms. On one hand this approach is mathematically


elegant; on the other hand – more importantly – it allows us to state our axioms
(e.g., power set above, as well as pairing, union, and collection) in a manner
that does not betray that we allow atoms; this gives us flexibility. Thus we have
chosen the statement of Axiom III.9.4 to be morphologically identical to the
statement one would make in the absence of atoms (all variables then “vary
over” sets).

III.9.5 Proposition.

⊢_ZFC Coll x (¬U (x) ∧ x ⊆ A)

where A is a free variable.

Proof. By (1) of III.9.4 we may assume (B a new constant)

(∀x)(x ⊆ A → x ∈ B)

Hence

(∀x)(¬U (x) ∧ x ⊆ A → x ∈ B) (2)

by ∀-monotonicity, since

|=Taut (x ⊆ A → x ∈ B) → (¬U (x) ∧ x ⊆ A → x ∈ B)

We are done by III.3.6. 

We would like now to introduce the symbol “P” formally, in the interest of

convenience, along with its informal use. As in the cases of ∪, ∩, and ⋃, we
will take no notational measures to distinguish between the formal and informal
occurrences of the symbol; we will rely instead on the context.

III.9.6 Definition (Formal P). We introduce a function symbol, P, of arity 1,


by the defining axiom

P(A) = y ↔ (∀x)(¬U (x) ∧ x ⊆ A ↔ x ∈ y) (1)

or, equivalently

(∀x)(¬U (x) ∧ x ⊆ A ↔ x ∈ P(A)) (2)




III.9.7 Remark. (I) Once again we note that it is redundant to add “¬U (y) ∧ ”
in (1) (III.9.6) or “¬U (P(A)) ∧ ” in (2). Indeed,

⊢_ZFC (∀x)(¬U (x) ∧ x ⊆ A ↔ x ∈ y) ↔ ¬U (y) ∧
(∀x)(¬U (x) ∧ x ⊆ A ↔ x ∈ y)

To see this, note that the ← direction is a tautological implication. For the →
direction we have

¬U (x) ∧ x ⊆ A ↔ x ∈ y

Thus, since ⊢_ZFC ¬U (∅) ∧ ∅ ⊆ A, we obtain

(∃x)(¬U (x) ∧ x ⊆ A)

from which (∃x)x ∈ y by the equivalence theorem. That is (III.1.3),

¬U (y)

Similarly, (2) of III.9.6 proves

¬U (P(A)) (3)

(II) A is an arbitrary variable; thus P makes sense on atoms. Indeed,

⊢_ZFC U (A) → P(A) = {∅}

(see Exercise III.17). 

III.10. Pairing Functions and Products


We now turn to the ordered pair concept, which will lead to the formalization
(within axiomatic set theory) of the intuitive concepts of relation and function
in the next section. We want to invent objects “(a, b)” which are meaningful
for all sets and atoms a and b and which are mindful of order in that

(a, b) = (a′, b′) → a = a′ ∧ b = b′   (1)

In particular, (a, a) is supposed to have two objects in it, a first a and a second
a, so it is not to be confused with {a, a} = {a}.
Some naı̈ve approaches to set theory take (a, b) to be a new kind of object
whose behaviour, i.e., (1), is “axiomatically accepted” (admittedly this is a
patently odd thing to do in a non-axiomatic approach). To proceed formally
within a framework that accepts sets and urelements as the only formal objects,
we must implement our new object as a set.

There are several implementations, the simplest one (due to Kuratowski)
being {{a}, {a, b}}, or the related {a, {a, b}}.
   
III.10.1 Proposition. If {a, {a, b}} = {a′, {a′, b′}}, then a = a′ and b = b′.

Proof. (Presented in a "relaxed" manner, that is, in argot. See also III.5.9,
p. 148.) By foundation, a ≠ {a, b} (otherwise a ∈ a). Thus, taking the ⊆-half of
the hypothesis,
a = a′ and {a, b} = {a′, b′}   (1)
or
a = {a′, b′} and {a, b} = a′   (2)
From (2) we get a′ ∈ a ∈ a′, contradicting foundation. Therefore, case (2) is
untenable. Let us further analyze case (1), which already gives us half of what
we want, namely, a = a′.
Thus,
{a, b} = {a, b′}   (3)
If a = b, then the ⊇-part of (3) gives b = b′, and we are done. Otherwise, the
⊆-part of (3) gives b = b′, and we are done again.

The pedantic way to derive a = a′ goes like this: We want

⊢_ZFC {a, {a, b}} = {a′, {a′, b′}} → a = a′

Assume the hypothesis
(∀z)(z = a ∨ z = {a, b} ↔ z = a′ ∨ z = {a′, b′})
Thus
a′ = a ∨ a′ = {a, b} ↔ a′ = a′ ∨ a′ = {a′, b′}
and
a = a ∨ a = {a, b} ↔ a = a′ ∨ a = {a′, b′}
Hence (by tautological implication and the axiom x = x)
a′ = a ∨ a′ = {a, b}   (1)
and
a = a′ ∨ a = {a′, b′}   (2)

which (jointly) tautologically imply

(a′ = a ∧ a = a′) ∨ (a′ = a ∧ a = {a′, b′})
∨ (a′ = {a, b} ∧ a = a′) ∨ (a′ = {a, b} ∧ a = {a′, b′})
(3)
By foundation,

¬(a′ = a ∧ a = {a′, b′})
¬(a′ = {a, b} ∧ a = a′)

and

¬(a′ = {a, b} ∧ a = {a′, b′})

which along with (3) tautologically imply

a = a′

The rest of the above proof of III.10.1 has a straightforward formalization as a


proof by cases.

III.10.2 Definition (Pairing Function and Ordered Pair). We introduce a


new function symbol of arity 2, J , by

J (x, y) = {x, {x, y}}

It is customary to denote the term J (x, y) by (x, y).


We call J (x, y) or (x, y) the ordered pair. We call J the pairing function.
“The” is dictated by our determination to have just one implementation of
(ordered) pair, as that is sufficient for the theory.† Indeed, one seldom needs to
remember how (x, y) is implemented, as the property expressed in III.10.1 is
all we normally need and use. 

Many of the sequel’s proofs are in the “relaxed” style. We get to be formal
whenever there is danger of missing fine points in this argot.

III.10.3 Proposition. For any a, b, a′, b′, (a, b) = (a′, b′) iff a = a′ and b = b′.

Proof. The only-if part is Proposition III.10.1. The if part follows from the
Leibniz axiom. 

† An exception occurs in Chapter VII in our study of cardinality, where yet another pairing is
considered.

Some will say that using the above definition for "pair" is overkill, since founda-
tion was needed to establish its key property in III.10.1. By contrast, a definition
of ordered pair via {{a}, {a, b}} does not require this axiom (see Exercise III.36).
This is a valid criticism for a development of set theory that is constantly pre-
occupied with the question of what theorem needs what axioms. In the context
of our plan, it is a minor quibble, since we will seldom ask such questions, and
we do have foundation anyway.

We often find it convenient to extend the notion of an ordered pair to that of


an (ordered) n-tuple in general (for n ≥ 1). To this end,

III.10.4 Definition (The Ordered n-Tuple). We define by induction (recur-


sion) on n ≥ 1 a function, J^(n), of arity n:

J^(1)(x) =_def x   (Basis)
J^(n+1)(x_1, . . . , x_{n+1}) =_def J(J^(n)(x_1, . . . , x_n), x_{n+1})   for n ≥ 0

where x, x_1, . . . , x_{n+1} are variables, and J is the pairing function of Defini-
tion III.10.2.

tion III.10.2.
It is normal practice to denote the term J^(n)(x_1, . . . , x_n) – somewhat ambigu-
ously, since the same symbol is good for any arity† – by the symbol

⟨x_1, . . . , x_n⟩

We adopt this practice henceforth and call ⟨x_1, . . . , x_n⟩ an n-tuple, or n-vector,
or just vector if n is understood or unimportant. We often use the shorthand
notation ⟨x⃗_n⟩ (or ⟨x⃗⟩ if n is not important) for the n-tuple.

III.10.5 Remark. (1) Some authors will not define n-tuples for arbitrary n –
once again avoiding the set N and inductive definitions – instead, they will
“unwind” the recursion and give a definition that goes, say, up to a 5-tuple, e.g.,

J^(1)(x) = x
J^(2)(x, y) = J(J^(1)(x), y)
J^(3)(x, y, z) = J(J^(2)(x, y), z)
...

and leave the rest up to the imagination, invoking “. . . ”. The reader should not
forget that we are using n in the metalanguage. As far as the formal system is

† A “real-life” function of non-fixed arity is the print function of computer programming.



concerned, "n" of ⟨x⃗_n⟩ is hidden in the name – it is not a variable accessible to


the formal system.
(2) Following the definition, let us compute ⟨a, b⟩ using the shorthand (a, b)
for J (a, b) below:

⟨a, b⟩ = (⟨a⟩, b)   by the induction step
       = (a, b)     by the basis step

Thus, from now on we denote the ordered pair by the symbol "⟨a, b⟩", rather
than "(a, b)", in the interest of notational uniformity.
(3) The ordered pair provides, intuitively, a pairing function – which moti-
vates the name we gave J – that, for any two objects a and b, given in that order,
“codes” them into a unique object c ( = a, b) in such a way that if, conversely,
we know that a given c is a “code”, then we can uniquely (by III.10.3) “decode”
it into a and b. That is, if c is an ordered pair, then (∃!x)(∃!y)x, y = c holds.
The unique x is called the first projection and the unique y is called the second
projection of c – in symbols (we are using notation due to Moschovakis), π (c)
and δ(c) respectively.† More accurately, since not all sets (and no urelements)
are valid “codes” (that is pairs; e.g., {0} is not), we must let π and δ “return”
some standard “output” when the input c is “bad” (not a pair).‡ 
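Before the formal treatment of π and δ below, the coding/decoding idea can be made concrete with a small Python sketch (frozensets standing in for finite sets; the function names are ours). The decoding step leans on foundation exactly as in III.10.1: in a pair {a, {a, b}}, exactly one of the two members is a set containing the other.

```python
def J(a, b):
    """The pair of III.10.2: J(a, b) = {a, {a, b}}."""
    return frozenset({a, frozenset({a, b})})

def projections(c):
    """Recover (pi(c), delta(c)) from a pair c = {a, {a, b}}."""
    u, v = list(c)                       # such a pair always has two members here
    if isinstance(v, frozenset) and u in v:
        a, inner = u, v                  # u plays the role of a, v of {a, b}
    elif isinstance(u, frozenset) and v in u:
        a, inner = v, u
    else:
        raise ValueError("not an ordered pair")
    rest = inner - {a}
    b = next(iter(rest)) if rest else a  # the case a = b gives inner = {a}
    return a, b

assert projections(J(1, 2)) == (1, 2)
assert projections(J(3, 3)) == (3, 3)
assert J(1, 2) != J(2, 1)                # order matters, unlike {1, 2} = {2, 1}
```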

Let us do this formally: We record the tautology

(∃x)(∃y)⟨x, y⟩ = z → (∃x)(∃y)⟨x, y⟩ = z   (1)

We next prove

(∃y)⟨x, y⟩ = z ∧ (∃y)⟨v, y⟩ = z → x = v   (2)

Assume the hypothesis, and add the assumptions (by auxiliary constant)

⟨x, A⟩ = z

and

⟨v, B⟩ = z

Hence

⟨x, A⟩ = ⟨v, B⟩

† Presumably, π for “πρ ώτ η” (= first) and δ for “δε ύτ ερη” (= second).
‡ Once again we will ensure that the defined function symbols, π and δ, have total interpretations.
This determination was also at play when we defined power set and, earlier on, the formal “∩”,
“∪”, and difference. We defined all these functions to act on all sets or atoms.

Therefore

x =v

by III.10.3.

III.10.6 Definition (The First Projection π). By the techniques found in


III.2.4 (p. 122: (10), (11), (18)) we may now introduce a new function symbol
π of arity 1 by the axiom (we insert redundant brackets to avoid any misunder-
standing)
  
(∃x)(∃y)x,y = z ∧ (∃y)π(z),y = z ∨
  
¬(∃x)(∃y)x,y = z ∧ π(z) = ∅ 

By III.2.4 (19), (20), and (19 ) one directly obtains in ZFC (using III.10.6):

III.10.7 Proposition.

(∃x)(∃y)⟨x, y⟩ = z → (∃y)⟨π(z), y⟩ = z   (π-19)

¬(∃x)(∃y)⟨x, y⟩ = z → π(z) = ∅   (π-20)

and

(∃x)(∃y)⟨x, y⟩ = z → ((∃y)⟨x, y⟩ = z ↔ x = π(z))   (π-19′)

A similar analysis, which we do not repeat, yields for δ:

III.10.8 Definition (The Second Projection δ). By the techniques in III.2.4


(p. 122: (10), (11), (18)) we may now introduce a new function symbol δ of
arity 1 by the axiom
  
((∃x)(∃y)⟨x, y⟩ = z ∧ (∃x)⟨x, δ(z)⟩ = z) ∨
(¬(∃x)(∃y)⟨x, y⟩ = z ∧ δ(z) = ∅)

III.10.9 Proposition. The following are theorems in the presence of the defining
axiom III.10.8:

(∃x)(∃y)⟨x, y⟩ = z → (∃x)⟨x, δ(z)⟩ = z   (δ-19)

¬(∃x)(∃y)⟨x, y⟩ = z → δ(z) = ∅   (δ-20)

and

(∃x)(∃y)⟨x, y⟩ = z → ((∃x)⟨x, y⟩ = z ↔ y = δ(z))   (δ-19′)

It is also notationally convenient to introduce the predicate “is an ordered


pair” by:

III.10.10 Definition. We introduce “OP”, a predicate of arity 1, by


OP(z) ↔ (∃x)(∃y)⟨x, y⟩ = z
We pronounce OP(z) as “z is an ordered pair”. 

We have at once:

III.10.11 Proposition. The following are ZFC theorems (in the presence of the
appropriate defining axioms):
OP(z) → ⟨π(z), δ(z)⟩ = z   (1)
and
¬OP(z) → π(z) = ∅ ∧ δ(z) = ∅   (2)

Proof. (2) is a direct consequence of III.10.7 and III.10.9 ((π-20) and (δ-20)).
As for (1), assume the hypothesis, OP(z). By III.10.7 (π-19),
(∃y)⟨π(z), y⟩ = z
while by III.10.9 (δ-19),
(∃x)⟨x, δ(z)⟩ = z
The above two allow us to assume (where A and B are new constants)
⟨π(z), A⟩ = z
and
⟨B, δ(z)⟩ = z   (3)
Hence

⟨π(z), A⟩ = ⟨B, δ(z)⟩

from which A = δ(z) and B = π(z) by III.10.3. Thus, ⟨π(z), δ(z)⟩ = z.

It is also worth recording that



III.10.12 Corollary. The following are ZFC theorems (in the presence of the
appropriate defining axioms):

π(⟨x, y⟩) = x   (3)

and

δ(⟨x, y⟩) = y   (4)

Proof. We note the logical theorem ⟨x, y⟩ = ⟨x, y⟩, from which the substitution
axiom yields (∃u)(∃v)⟨u, v⟩ = ⟨x, y⟩, that is,

OP(⟨x, y⟩)   (5)

For (3), III.10.7 (π-19′) yields (note dummy renaming)

OP(⟨x, y⟩) → ((∃u)⟨x, u⟩ = ⟨x, y⟩ ↔ x = π(⟨x, y⟩))   (6)

By (5) and the logical theorem (∃u)⟨x, u⟩ = ⟨x, y⟩, (6) yields

x = π(⟨x, y⟩)

The case for δ is similar. 

In recursion theory (or computability, studied in volume 1), pairing functions
on the natural numbers play an important role. There are several so-called
primitive recursive pairing functions, e.g., 2^x 3^y, 2^x(2y + 1), 2^{x+y+2} + 2^{y+1},
(x + y)² + x, (x + y)(x + y + 1)/2 + x. Of these, only the last one ensures that
every n ∈ N is a "pair", while the second one misses only the number 0.
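The last of these, often called the Cantor pairing function, is simple enough to decode explicitly. A small Python sketch (function names are ours) of the coding and its inverse:

```python
from math import isqrt

def cantor_pair(x, y):
    # (x + y)(x + y + 1)/2 + x : enumerates N x N diagonal by diagonal
    return (x + y) * (x + y + 1) // 2 + x

def cantor_unpair(n):
    # w = the diagonal containing n, i.e. the largest w with w(w + 1)/2 <= n
    w = (isqrt(8 * n + 1) - 1) // 2
    x = n - w * (w + 1) // 2
    return x, w - x

# decoding inverts coding, and every natural number is a "pair"
assert all(cantor_unpair(cantor_pair(x, y)) == (x, y)
           for x in range(60) for y in range(60))
assert sorted(cantor_pair(x, y) for x in range(10) for y in range(10)
              if x + y < 10) == list(range(55))
```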

III.10.13 Proposition. For n ≥ 2 and any objects a_1, . . . , a_n, ⟨a_1, . . . , a_n⟩ is a set.

Proof. Exercise III.38. 

III.10.14 Proposition. For n ≥ 1 and any objects a_1, . . . , a_n, b_1, . . . , b_n,

⟨a_1, . . . , a_n⟩ = ⟨b_1, . . . , b_n⟩ iff a_i = b_i for i = 1, . . . , n

Proof. Exercise III.39. 

III.10.15 Informal Definition (The Cartesian Product of Two Classes). For


any classes A and B, the symbol A × B, read the Cartesian product of A and
B (in that order), is an abbreviation for the class term {⟨a, b⟩ : a ∈ A ∧ b ∈ B}
(see also III.8.7). 

III.10.16 Lemma. {⟨x, y⟩ : T [x, y]} = {z : OP(z) ∧ T [π(z), δ(z)]} is a theo-


rem schema.

Proof. By III.8.7 the left hand side abbreviates the class term

{z : (∃x)(∃y)(x, y = z ∧ T [x, y])}

Thus we need to prove

(∃x)(∃y)(x, y = z ∧ T [x, y]) ↔ OP(z) ∧ T [π (z), δ(z)] (1)

We note the theorem

x, y = z ↔ OP(z) ∧ π(z) = x ∧ δ(z) = y (2)

Indeed, the → direction is by the Leibniz axiom and

OP(x, y) ∧ π (x, y) = x ∧ δ(x, y) = y

from the definition of OP and III.10.12. The ← direction is by III.10.11.


Now (2) yields, via the equivalence theorem, the first equivalence of the
following “calculation”:

(∃x)(∃y)(x, y = z ∧ T [x, y])


 
↔ see above
(∃x)(∃y)(OP(z) ∧ π(z) = x ∧ δ(z) = y ∧ T [x, y])
 
↔ no free x, y in OP(z)
OP(z) ∧ (∃x)(∃y)(π(z) = x ∧ δ(z) = y ∧ T [x, y])
 
↔ one point rule
OP(z) ∧ (∃x)(π(z) = x ∧ T [x, δ(z)])
 
↔ one point rule
OP(z) ∧ T [π(z), δ(z)])

We have proved (1). 

III.10.17 Theorem. For any variables A and B, A × B is a set.


 
Proof. By III.8.11 (p. 172), ⊢_ZFC Coll z ((∃x)(∃y)(z = ⟨x, y⟩ ∧ x ∈ A ∧ y ∈ B))
for any free variables A and B.

Using the recently introduced symbols and III.10.16,


 
⊢_ZFC Coll z (OP(z) ∧ π(z) ∈ A ∧ δ(z) ∈ B)

Thus, we might as well introduce the formal “×”:

III.10.18 Definition (Formal ×). In view of the above observation, we intro-


duce a new function symbol, ×, of arity 2 by the defining axiom

A × B = y ↔ ¬U (y) ∧ (∀z)(z ∈ y ↔ OP(z) ∧ π (z) ∈ A ∧ δ(z) ∈ B)

or

A × B = {z : OP(z) ∧ π(z) ∈ A ∧ δ(z) ∈ B}

This A × B makes sense for all variables A and B. In particular, one can
prove

U (A) → A × B = ∅

as well as

∅× B =∅

III.10.19 Remark. The following proof of III.10.17 for any sets A and B is
often criticized as overkill (e.g., Barwise (1975)), while Bourbaki (1966b), Levy
(1979), and Shoenfield (1967) – who use the collection-based proof above –
but not Jech (1978b), just stay away from it without comment:
Let ⟨a, b⟩ ∈ A × B, i.e., a ∈ A and b ∈ B. Thus, {a, b} ⊆ A ∪ B, and
therefore {a, b} ∈ P(A ∪ B). It follows that {a, {a, b}} ⊆ A ∪ P(A ∪ B) and
hence

{a, {a, b}} ∈ P(A ∪ P(A ∪ B))

Thus ⟨a, b⟩ ∈ P(A ∪ P(A ∪ B)), establishing A × B ⊆ P(A ∪ P(A ∪ B)). By
separation, A × B is a set.
An additional criticism here may be that this proof needed to know the
implementation of “a, b”, while the one, based on collection, does not need
this information.
The objection that the proof is overkill, on the other hand, is context-
dependent. If the foundation of set theory is going to exclude the power set
axiom (one important set theory so restricted is that of Kripke and Platek with

or without urelements – “KP” or “KPU”; e.g., Barwise (1975)), then the objec-
tion is justified. If on the other hand we do have the power set axiom, then we
certainly are going to take advantage of it, and we reserve the right to use any
axiom we please in our proofs. 

III.10.20 Example (Informal). {0} × {1} = {⟨0, 1⟩} and {1} × {0} = {⟨1, 0⟩}.
Since ⟨0, 1⟩ ≠ ⟨1, 0⟩, these two products are different; hence A × B ≠ B × A
in general.

We conclude this section by extending (more argot) × to any finite number


of class “operands”, just as we did for ∪ and ∩ in III.4.15 (p. 142).

III.10.21 Informal Definition. Given classes A_i for i = 1, . . . , n,

×_{i=1}^{n} A_i   and   ×_{1≤i≤n} A_i

are alternative abbreviations for

{⟨x⃗_n⟩ : x_1 ∈ A_1 ∧ · · · ∧ x_n ∈ A_n}   (∗)

We avoid "· · · " by the inductive definition

×_{i=1}^{1} A_i   stands for   A_1
×_{i=1}^{n+1} A_i   stands for   (×_{i=1}^{n} A_i) × A_{n+1}

One often writes A_1 × · · · × A_n rather than ×_{i=1}^{n} A_i.

If all the A_i are equal, say to A, then we will usually write A^n. We let A^1
mean A.

That the “. . . ” notation, in (∗) of III.10.21 above, and the inductive definition
coincide can be verified using III.10.4.
We can, of course, use the metalogical “=” in lieu of “stands for”. For exam-
ple, the logical theorem A_1 = A_1 leads to the logical theorem

×_{i=1}^{1} A_i = A_1

on replacing the left A_1 by the "abbreviation" ×_{i=1}^{1} A_i.
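A small Python sketch of the inductive clause of III.10.21, with Python tuples standing in for the coded pairs ⟨a, b⟩ (the function names are ours):

```python
from functools import reduce

def product2(A, B):
    """A x B as a set of pairs."""
    return {(a, b) for a in A for b in B}

def product_n(*classes):
    """Left-nested n-fold product: X1 x ... x X(n+1) = (X1 x ... x Xn) x X(n+1)."""
    return reduce(product2, classes)

assert product_n({0, 1}) == {0, 1}            # the 1-fold product is A1 itself
assert product_n({0}, {1}) == {(0, 1)}
assert product_n({0, 1}, {'a'}, {2}) == {((0, 'a'), 2), ((1, 'a'), 2)}
```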

III.10.22 Informal Definition. We often have a “rule” which for each a ∈ I


“gives” us a class Aa . This simply means that for some formula A(x, y) (the

“rule”) we consider† the classes {x : A(a, x)} for each a ∈ I. I is the index
class.
We cannot in general collect all the Aa into a class, yet we can (informally)
“define” their union and intersection:

⋃_{a∈I} A_a stands for {x : (∃a ∈ I)x ∈ A_a}, that is, {x : (∃a ∈ I)A(a, x)}

and

⋂_{a∈I} A_a stands for {x : (∀a ∈ I)x ∈ A_a}, that is, {x : (∀a ∈ I)A(a, x)}

The case I = N occurs frequently in informal discussions. 
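For a finite index set the two displays above are directly computable; a minimal Python sketch (names ours, with the "rule" passed as a function a ↦ A(a)):

```python
def union_over(I, A):
    """Union over a in I of A(a) = {x : (exists a in I) x in A(a)}."""
    return {x for a in I for x in A(a)}

def intersection_over(I, A):
    """Intersection over a in I of A(a); I is assumed nonempty here."""
    it = iter(I)
    result = set(A(next(it)))
    for a in it:
        result &= A(a)
    return result

assert union_over(range(3), lambda a: {a, a + 1}) == {0, 1, 2, 3}
assert intersection_over(range(3), lambda a: {a, 42}) == {42}
```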

III.10.23 Proposition. If for n ≥ 2 all of A1 , . . . , An are sets, then so is


A1 × · · · × An .

Proof. By III.10.21 and induction on n.‡ 

III.10.24 Corollary. For any set A, An is a set for n ≥ 1.

III.11. Relations and Functions


We intuitively picture a binary relation as a table of rows,§ each row containing
two objects, a first object (occupying the first column) and a second object
(occupying the second column) – see also Section I.2, p. 20.
A table naturally leads to a (usually) one-to-many “rule” that to each object
from some class associates one or more¶ elements from another (possibly the
same) class: Simply associate to the the first object on each row (the input
object) the second object of the row (the output object).
Conversely any one-to-many “rule”, regardless of how it is expressed (re-
gardless of intention) can be represented as a table by forming a class of rows
where the second object in each row is associated to the first, according to the
given rule.

† “Consider”, not “collect”, since some Aa may be proper classes and we are unwilling to collect
other than sets or atoms into a class.
‡ This is a “theorem schema”. For each value of the informal object n we have a (different) theorem:
“A × B is a set”; “A × B × C is a set”; etc. We have suggested above a (meta)proof of all these
theorems at once by induction in the metatheory.
§ Such a table may have, intuitively, infinite length.
¶ Hence the term “one-to-many”.

Our rigorous counterpart of a rule or table is then a class of ordered pairs.†


Such a class is what we call a binary relation, or just relation.

III.11.1 Informal Definition (Binary Relations). A binary relation, or just


relation, is a class T whose members are (exclusively) ordered pairs.
Within ZFC, that “T is a relation” is argot for “z ∈ T → OP(z) is a theorem”.
Similarly, “let T be a relation” means “add the assumption z ∈ T → OP(z)”.
If T is a relation, then the notations ⟨a, b⟩ ∈ T and b T a (note the order
reversal in the notation) mean exactly the same thing.
A relation T often is introduced as a class term {⟨x, y⟩ : T [x, y]} or, equiv-
alently (III.10.16), {z : OP(z) ∧ T [π(z), δ(z)]}.
We call T the defining formula of the relation T. In most practical cases T
has no parameters, that is, x and y are its only free variables. In such cases we
say that T is the relational implementation of T (x, y) or that it is the relational
extension of T (x, y). 

III.11.2 Remark. (1) The term “binary”, understood if omitted, refers to the
fact that we have a class of (ordered) pairs in mind.
Some mathematicians – especially in the context of a discrete mathematics
course – want to have n-ary relations, for any n ≥ 1, that is, classes whose
members are n-tuples (III.10.4, p. 185). We will not spend any nontrivial amount
of time on those, since for any n ≥ 2, ⟨x⃗_n⟩ = ⟨⟨x⃗_{n−1}⟩, x_n⟩ (by III.10.4), and
therefore any n-ary relation, for n ≥ 2, is a binary relation. For n = 1 we
have the unary relations, that is, classes of elements that are 1-tuples, ⟨x⟩.
Since ⟨x⟩ = x (cf. III.10.4), unary relations are just classes with no additional
requirements imposed on their elements. We will not use the terminology "unary
relations"; rather we will just call them classes (or sets, as the case may be). For
the record, when one uses n-ary relations, one usually abbreviates "⟨x⃗_n⟩ ∈ T"
by "T(x⃗_n)". In particular, when n = 2, the texts "⟨x, y⟩ ∈ T", "T(x, y)" and
"y T x" state the same thing.
An n-ary relation T may naturally arise as the extension or implementation
of a formula of n free variables, that is, as T = {⟨x⃗_n⟩ : T (x⃗_n)} (see III.8.7). In
this case, and in view of the above comment, the texts "T(x⃗_n)" and "T (x⃗_n)"
are interchangeable in the argot of n-ary relations.
(2) The empty set is obviously a relation.
(3) Note the reversal of order in "⟨a, b⟩ ∈ T iff b T a" in III.11.1. This is
one of a variety of tricks employed in the literature in order to make notation
one of a variety of tricks employed in the literature in order to make notation

† Much is to be gained in notational convenience if we do not restrict relations to be sets.



regarding composition consistent between relations and functions (we will re-
turn to clarify this point when composition is introduced). The trick employed
here is as in Shoenfield (1978); for a different one see Levy (1979).
(4) In the same spirit, whenever the defining formula F of a relation F is
by convention written in so-called infix notation (i.e., x F y) rather than prefix
(i.e., F (x, y)) – e.g., one writes x < y rather than <(x, y) – then we observe
this reversal by writing F = {y, x : x F y} or F = {z : OP(z) ∧ δ(z)F π(z)}.
This notation has the nice side effect that a F b ↔ a F b is provable. For exam-
ple, {y, x : x < y)} is the relational implementation of the formula x < y.

Note. We will continue writing F = {x, y : F (x, y)}, that is, there is no
reversal of variables if the defining formula F is written in the usual prefix
notation.
Whenever b T a holds, we say that T, when presented with input a, “re-
sponds” with b among its (possibly many different) outputs.
(5) A relation often inherits the name of the defining formula. Thus the
relation {y, x : x ∈ y} is also denoted by “∈”, and y, x ∈ ∈ means x ∈ y.
The left “∈” in “y, x ∈ ∈” is the nonlogical symbol, while the right “∈” is
the informal name of the relation that extends the formula x ∈ y. Similarly, <
is used as both the name of {y, x : x < y} and that of the defining formula;
thus y, x ∈ < means x < y.
With some practice, all this will be less confusing than at first sight. 

III.11.3 Example (Informal). Here are some relations: ∅, {⟨0, 1⟩}, R² (where
R is the set of all reals), {⟨0, 1, 2⟩}. According to the above remark, the last
example is both 3-ary (ternary) and binary, since ⟨0, 1, 2⟩ = ⟨⟨0, 1⟩, 2⟩.

III.11.4 Informal Definition. Let S be any class (binary relation or not).


dom(S), its domain, is an abbreviation for the class {x : (∃y)x, y ∈ S}
or {π (z) : OP(z) ∧ z ∈ S}, i.e., the class of all “useful” inputs – those
which do “cause” some output in S. The range of S, ran(S), on the other hand
stands for the class of all the outputs “caused” by all inputs in S. In symbols,
{y : (∃x)x, y ∈ S} or, equivalently, {δ(z) : OP(z) ∧ z ∈ S}.
The argot concepts “dom” and “ran” apply, in particular, to relations S.
The class that contains all the useful inputs and all the outputs is the field of
the class S, that is, dom(S) ∪ ran(S).
If S ⊆ A × B for some A and B, then we say that “S is a relation on A × B”
or that “S is a relation from A to B” or that “S is a relation that maps A into

B”. The symbol† S : A → B is read exactly like any one of the three previous
italicized sentences in quotes.
A is called the left field and B is called the right field of S.
If S ⊆ A × A, then we say that S is a relation on A rather than on A × A.
Given S : A → B and in the context of these fields, S is total iff dom(S) = A;
otherwise it is nontotal.
It is onto iff ran(S) = B. In this case we often say that S maps A onto B, or
just that S is onto B.
The converse or inverse of any class S, in symbols S−1 , is the class {x, y :
y, x ∈ S}.
The concept of inverse (converse) applies, in particular, to relations S.
Let S : A → B, X ⊆ A, and Y ⊆ B. The image of X under S, in symbols
S[X], is the class of all the outputs that are caused by inputs in X, i.e., {y :
(∃x ∈ X)y S x}.
We have the non-standard‡ shorthand Sc for S[{c}].
The inverse image of Y under S is just S−1 [Y].
We have the non-standard shorthand S−1 c for S−1 [{c}]. 

III.11.5 Remark. (1) The notions of left field and right field are not absolute;
they depend on the context. It is clear that once left and right fields are chosen,
then any super-class of the left (respectively, the right) field is also a left (re-
spectively, right) field. Conversely, one can always narrow the left field until it
equals dom(S), thus rendering S total. A similar comment holds for the concept
of onto. That may create the impression that the notions “total”, “nontotal”, and
“onto” are really useless.
This is not so, for in many branches of mathematics the studied relations and
functions have “natural” associated classes (usually sets) from which inputs are
taken and into which outputs are placed. For example, in (ordinary) recursion
theory functions take inputs from N and produce outputs in N. It is a (provably)
unsolvable problem of that theory to determine for any given such function, in
general, whether it is total or onto.§ Therefore it is out of the question to make¶
left and right fields “small enough” to render the arbitrary such function total
and onto.

† The context will not allow confusion between the logical → and the one employed, as is the case
here, to mean “to”.
‡ A reason for its being non-standard becomes obvious as soon as we consider III.11.14.
§ By Rice’s theorem, proved in volume 1.
¶ “Make” with the tools of recursion theory, that is. Such tools are formalized algorithms.

(2) Note that for any a and b, b ∈ Sa iff b ∈ S[{a}] iff (∃x ∈ {a})b S x
iff (∃x)(x = a ∧ b S x) iff b S a. This pedantic (conjunctional) iff chain proves
(within pure logic, where we wrote the argot “iff” for “↔”) the obvious

b ∈ Sa ↔ b S a (i)

Similarly,

b ∈ S−1 a ↔ b S−1 a (ii)

since S−1 is a relation.


(3) The definition (III.11.4) of inverse relation is equivalent to

bSa iff a S−1 b

Using (i) and (ii), we obtain at once

 b ∈ Sa ↔ a ∈ S−1 b

(4)

⊢ S[X] = ⋃{Sx : x ∈ X}

as the following calculation shows.

⋃{Sx : x ∈ X} = {z : (∃x)(x ∈ X ∧ z ∈ Sx)}     (by III.8.8)
              = {z : (∃x)(x ∈ X ∧ z S x)}       (by (i) above)
              = S[X]                             (Definition III.11.4)
(5)

⊢ S−1 [Y] = {x : Y ∩ Sx ≠ ∅}

as the following calculation shows:

S−1 [Y] = {x : (∃y ∈ Y)x S−1 y}
        = {x : (∃y)(y ∈ Y ∧ y S x)}
        = {x : (∃y)(y ∈ Y ∧ y ∈ Sx)}
        = {x : (∃y)(y ∈ Y ∩ Sx)}
        = {x : Y ∩ Sx ≠ ∅}
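
Informally, all of the above can be checked mechanically when S is a finite set of pairs. The following sketch (in Python, an aside of ours and not part of the formal development; every function name in it is our own) verifies the identities of parts (4) and (5) on a concrete S, X, and Y.

# A finite relation as a set of <input, output> pairs; with the order reversal
# of III.11.1, "b S a" means that the pair <a, b> is a member of S.
S = {(0, 'a'), (0, 'b'), (1, 'c'), (2, 'a')}

def dom(T):                    # the class of "useful" inputs
    return {a for (a, b) in T}

def image(T, X):               # T[X]: all outputs caused by inputs in X
    return {b for (a, b) in T if a in X}

def inv(T):                    # the converse relation
    return {(b, a) for (a, b) in T}

def angle(T, a):               # the shorthand T<a> = T[{a}]
    return image(T, {a})

X, Y = {0, 2}, {'a', 'c'}
# S[X] is the union of the S<x> for x in X, as in (4) ...
assert image(S, X) == set().union(*(angle(S, x) for x in X))
# ... and S^-1[Y] = {x : Y ∩ S<x> ≠ ∅}, as in (5).
assert image(inv(S), Y) == {x for x in dom(S) if Y & angle(S, x)}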

III.11.6 Example (Informal). Let S = {⟨0, a⟩, ⟨0, b⟩, ⟨1, c⟩, ⟨{0, 1}, a⟩}.
Then S⟨0⟩ = S[{0}] = {a, b}, S[{0, 1}] = {a, b, c}. On the other hand,
S⟨{0, 1}⟩ = S[{{0, 1}}] = {a}. Thus, S[{0, 1}] ≠ S⟨{0, 1}⟩.

This phenomenon occurs because dom(S) has a member, namely {0, 1},
which is also a subset of dom(S). One encounters a lot of sets like this in set
theory, so the common notation “S(X)”, which is used in naı̈ve approaches both
for the image (when X is viewed as a collection of points) and for the output(s)
when X is a single input (X now being viewed as a point), would have been
ambiguous in our setting.
What does S⟨a⟩ = ∅ mean? By III.4.11 it translates into ¬(∃x)x ∈ S⟨a⟩.
This is (logically) equivalent to a ∉ {z : (∃x)x S z}, that is, a ∉ dom(S), or S⟨a⟩
is undefined.

III.11.7 Example (Informal). (1) Let < and > be the usual predicates on N,
and let us use the same symbols for the relational extensions of the atomic
formulas x < y and x > y. Then, <3 = {x : x < 3} = {0, 1, 2}. Similarly,
>3 = {4, 5, 6, . . . }.
(2) Let M = {⟨0, x⟩ : x = x}. Then dom(M) = {0} and ran(M) = U M .
Thus a relation that is a proper class can have a domain which is a set. Similar
comment for the range (think of M−1 ). 

III.11.8 Proposition. If the relation S is a set, then so are dom(S) and ran(S).

Proof. Assume the hypothesis. By

z ∈ S → OP(z) |=Taut z ∈ S ↔ z ∈ S ∧ OP(z)

and III.11.4 we get  dom(S) = {π(z) : z ∈ S}. The claim follows now from
III.8.9.
The argument for ran(S) just uses δ(z) instead. 

III.11.9 Informal Definition. For any relation S, “a ∈ dom(S)” is pronounced


“Sa is defined”. We use the symbol “Sa ↓” to indicate this. Correspondingly,
“a ∈ / dom(S)” is pronounced “Sa is undefined”. We use the symbol “Sa ↑”
to indicate this.
If T is a relation and S ⊆ T, then T is an extension of S, and S is a restriction
of T.
If T is a relation and A is some class, then a restriction of T on A is usually
obtained in one of two ways:

(1) Restrict both inputs and outputs to be in A, to obtain T ∩ A2 . The symbol


T | A is used as a shorthand for this restriction.
(2) Restrict only the inputs to be in A, to obtain {x ∈ T : π (x) ∈ A}. This restric-
tion is denoted by T  A. 

III.11.10 Example (Informal). Suppose that we are working in N. Thus


{⟨2, 1⟩, ⟨2, 0⟩, ⟨1, 0⟩} = < ∩ {0, 1, 2}². It is also the case that {⟨2, 1⟩, ⟨2, 0⟩,
⟨1, 0⟩} = <  {0, 1, 2}, so that both versions of restricting the relation < on the
set {0, 1, 2} give the same result.
Now, {⟨1, 2⟩, ⟨0, 2⟩, ⟨0, 1⟩} = > ∩ {0, 1, 2}². However,

>  {0, 1, 2} = {⟨0, 1⟩, ⟨0, 2⟩, . . . , ⟨1, 2⟩, ⟨1, 3⟩, . . . , ⟨2, 3⟩, ⟨2, 4⟩, . . .}.

Here the two versions of restriction are not the same. 
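
For finite relations the two restrictions of III.11.9 can be compared mechanically. The following Python aside (ours; the names restrict_both and restrict_inputs are not the book's notation) recomputes the example just given.

# The relations < and > on {0,...,5}, written with the order reversal of
# III.11.2(3): the stored pair is <y, x> for "x < y" (respectively "x > y").
N6 = range(6)
less    = {(y, x) for x in N6 for y in N6 if x < y}
greater = {(y, x) for x in N6 for y in N6 if x > y}
A = {0, 1, 2}

def restrict_both(T, A):       # first kind: T ∩ A×A
    return {(a, b) for (a, b) in T if a in A and b in A}

def restrict_inputs(T, A):     # second kind: keep z ∈ T with π(z) ∈ A
    return {(a, b) for (a, b) in T if a in A}

# For < the two restrictions coincide ...
assert restrict_both(less, A) == restrict_inputs(less, A) == {(2, 1), (2, 0), (1, 0)}
# ... but for > they do not:
assert restrict_both(greater, A) == {(1, 2), (0, 2), (0, 1)}
assert restrict_inputs(greater, A) == {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
                                       (1, 2), (1, 3), (1, 4), (1, 5),
                                       (2, 3), (2, 4), (2, 5)}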

III.11.11 Remark. (1) By the concluding remarks in III.11.6, Sa ↓ iff


Sa ≠ ∅, while Sa ↑ iff Sa = ∅.
(2) For relations T the most commonly used version of restriction is T | A.
Occasionally one sees T  A for T  A (Levy (1979)).
(3) A relation S : A → B is sometimes called a partial multiple-valued
function from A to B. “Partial” refers to the possibility of being nontotal, while
“multiple-valued” refers to the fact that, in general, S can give several outputs
for a given input.
Remove this possibility, and you get a (partial) function. 

III.11.12 Informal Definition ((Informal) Functions). A function F is a


single-valued relation, more precisely, single-valued in the second projection.
If we are working in ZFC, then “F is single-valued in the second projection”
is argot for

“x ∈ F ∧ y ∈ F ∧ π(x) = π (y) → δ(x) = δ(y)” (1)

Thus, “if F is a function, . . .” adds (1) to the axioms (of ZFC), while “. . . F is
a function . . .” claims that (1) is a theorem.
If F : A → B, then F is a partial function from A to B. The qualification
“partial” will always be understood (see above remark), and therefore will not
be mentioned again. 

III.11.13 Remark. (1) The definition of single-valuedness can also be stated


as b F a ∧ c F a → b = c (where a, b, c are free), or even a, b ∈ F ∧ a, c ∈
F → b = c.
(2) Clearly, if F is a function,† then we can prove z ∈ F → Fπ (z) = {δ(z)}
(we have ⊇ by definition of Fx and ⊆ by single-valuedness).
(3) ∅ is a function.

† We are not going to continue reminding the reader that this is argot. See III.11.12.

(4) Since a relation is the “implementation” of a formula as a class, so is a


function. But if the relation F, defined from the formula F (x, y), is a function –
that is, we have the abbreviation

F = {x, y : F (x, y)} (i)

and also a proof of

x, y ∈ F ∧ x, z ∈ F → y = z (ii)

– then we must also be able to prove

F (x, y) ∧ F (x, z) → y = z (iii)

Indeed, we are (and a bit more).


First off, we see at once – by (slightly ab)using III.4.1(iii)† (p. 134) – that
“x, y ∈ F” abbreviates F (x, y).
A more serious (i.e., complete) reason is this: “x, y ∈ F” is logically
equivalent to
 
(∃u)(∃w) u, v = x, y ∧ F (u, w)

by III.8.7. By III.10.3 and the equivalence theorem, the above translates (is
provably equivalent) to
 
(∃u)(∃w) u = x ∧ v = y ∧ F (u, w)

Two applications of the one point rule yield the logically equivalent formula
F (x, y).
With this settled, we see that (ii) and (iii) are indeed provably equivalent if
F is given by (i).
(5) Since a function is a relation, all the notions and notation defined previ-
ously for relations apply to functions as well. We have some additional notation
and concepts peculiar to functions: 

III.11.14 Informal Definition. If F is a function and b ∈ Fa, then (by unique-


ness of output) {b} = Fa. For functions only we employ the abbreviation
b = F(a). Note the round brackets.
If a = xn  we agree to write F(xn ) rather than F(xn ).
Functions that are sets will be generically denoted – unless they have specific
names – by the letters f, g, h. 

† This informal definition gives the meaning of z ∈ {x : A [x]}, not that of z ∈ {t[x] : A [x]}.

We now see why we have two different notations for functions and relations
when it comes to the image of an input. F(a) is the output itself, while Fa is
the singleton {F(a)}.†
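
For a finite relation both single-valuedness and the difference between the two notations can be made tangible. The sketch below is an informal Python aside of ours (is_function, angle, and value are our own names, not notation of the text).

def is_function(T):            # single-valued in the second projection
    return all(b == d for (a, b) in T for (c, d) in T if a == c)

F = {(0, 'a'), (1, 'b'), (2, 'a')}
S = {(0, 'a'), (0, 'b')}
assert is_function(F) and not is_function(S)

def angle(T, a):               # T<a> = T[{a}]: always meaningful, possibly empty
    return {b for (c, b) in T if c == a}

def value(F, a):               # F(a): the unique output; fails if F<a> is not a singleton
    (b,) = angle(F, a)
    return b

assert angle(F, 0) == {'a'}    # the singleton {F(0)}
assert value(F, 0) == 'a'      # the "raw" output itself
assert angle(F, 7) == set()    # F<7> = ∅, i.e., F(7) is undefined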

III.11.15 Example (Informal). Here are two examples of relations from “real”
mathematics: C = {⟨x, y⟩ ∈ R² : x² + y² = 1} and H = {⟨x, y⟩ ∈ R² : x² + y² =
1 ∧ x ≥ 0 ∧ y ≥ 0}.
H , but not C, is a function. Clearly H is the restriction of C on the non-
negative reals, R≥0 , in the sense H = C ∩ R≥0².

III.11.16 Informal Definition (Function Substitution). If S is a relation,


F (x, y1 , . . . , yn ) is a formula, and G is a function, then

F (G(x), y1 , . . . , yn ) abbreviates (∃z)(z = G(x) ∧ F (z, y1 , . . . , yn ))

In particular, G(x) S a stands for (∃z)(z = G(x) ∧ z S a), and a S G(x) for
(∃z)(z = G(x) ∧ a S z).
In short, we have introduced abbreviations that extend the one point rule in
the informal domain, with informal terms such as G(x) that are not necessarily
admissible in the formal theory. 

III.11.17 Remark (Informal). (1) Take the relations “=” (def= {⟨x, y⟩ : x = y})
and “≠” (def= {⟨x, y⟩ : ¬x = y}), both on N, and a function f : N → N. Then

y ≠ f (x) iff (∃z)(z ≠ y ∧ z = f (x))     (i)

by III.11.16. Call this relation F. Also, let T def= {⟨x, y⟩ : y = f (x)}.
Now N² − T = {⟨x, y⟩ : f (x) ↑ ∨ (∃z)(z ≠ y ∧ z = f (x))}, for there are
two ways to make y = f (x) fail:
(a) f (x) ↑, since y = f (x) implies f (x) ↓, or
(b) (∃z)(z ≠ y ∧ z = f (x)).
Thus, unless f is total (in which case f (x) ↑ is false for all x), N² − T ≠ F.
This observation is very important if one works with nontotal functions a lot
(e.g., in recursion theory).

† Sometimes one chooses to abuse notation and use “F(a)” for both the singleton (thinking of F
as a relation) and the “raw output” (thinking of F as a function). Of course the two uses of the
notation are inconsistent, especially in the presence of foundation, and the context is being asked
to do an unreasonable amount of fending against this. The Fx notation that we chose restores
tranquillity.

(2) According to Definition III.11.16, for any functions F and G,


F(a) = G(b) means (∃x)(∃y)(x = F(a) ∧ y = G(b) ∧ x = y), or more simply,
(∃x)(x = F(a) ∧ x = G(b)). This is satisfactory for most purposes, but note that
“=” between partial functions is not reflexive! Indeed, if both F(a) and G(b)
are undefined, then they are not equal, although you would prefer them to be.
Kleene fixed this for the purposes of recursion theory with his weak equality,
“≃”, defined (informally) by

F(a) ≃ G(b) abbreviates F(a) ↑ ∧ G(b) ↑ ∨ (∃x)(x = F(a) ∧ x = G(b))

Clearly, ⊢ F(a) ↑ ∧ G(b) ↑ → F(a) ≃ G(b).†
Whenever we use “=” we mean ordinary equality (where, in particular,
F(a) = G(b) entails F(a) ↓ and G(b) ↓). On those occasions where weak equal-
ity is employed, we will use the symbol “≃”.
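
The contrast between ordinary and weak equality is easy to see on finite nontotal functions. The Python sketch below is an informal aside of ours (the dictionaries f and g and the function names are our own).

# Nontotal functions on (a fragment of) N, coded as Python dicts.
f = {0: 1, 1: 2}                 # f(2), f(3), ... are undefined
g = {0: 1, 2: 5}

def strong_eq(F, a, G, b):       # "=": both sides defined and equal
    return a in F and b in G and F[a] == G[b]

def weak_eq(F, a, G, b):         # "≃": both undefined, or both defined and equal
    if a not in F and b not in G:
        return True
    return strong_eq(F, a, G, b)

assert strong_eq(f, 0, g, 0)         # f(0) = g(0)
assert not strong_eq(f, 3, f, 3)     # "=" fails when both sides are undefined ...
assert weak_eq(f, 3, f, 3)           # ... while "≃" holds there
assert not weak_eq(f, 1, g, 2)       # defined but different: 2 vs. 5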

III.11.18 Exercise. For any relations S and T prove


(1) S ⊆ T ↔ (∀x)(Sx ⊆ Tx),
(2) S = T ↔ (∀x)(Sx = Tx).
Also prove that for any two functions F and G,
(3) F = G ↔ (∀x)(F(x) ≃ G(x))

while
(4) F = G ↔ (∀x)(F(x) = G(x)) fails.

III.11.19 Exercise. For any (formal) term t(xn ) and class term A, the class term

F = {⟨⟨xn ⟩, t(xn )⟩ : ⟨xn ⟩ ∈ A}

is the binary relation (see III.8.7)

F = {z : OP(z) ∧ (∃x1 ) . . . (∃xn )(π(z) = ⟨xn ⟩ ∧ δ(z) = t(xn ) ∧ π(z) ∈ A)}

Prove that F is single-valued in the second projection (δ(z)), and hence is a
function.

III.11.20 Informal Definition (λ-Notation). We use a variety of notations to


indicate the dependence of the function F of the preceding exercise on the
“defining term” t, usually letting A be understood from the context. We may

† Indeed not just “”, but “|=Taut ”. On the right hand side of “→” we expand the abbreviation into
F(a) ↑ ∧ G (b) ↑ ∨(∃x)(x = F(a) ∧ x = G(b)).

write any of the following:

(1) F(xn ) = t(x1 , . . . , xn ) for all xn (recall that we write F(xn ) rather than F(⟨xn ⟩)
and that ⟨⟨xn ⟩, y⟩ = ⟨xn , y⟩).
(2) F = (xn ↦ t(x1 , . . . , xn )).
(3) F = λxn .t(x1 , . . . , xn ) (λ-notation).

III.11.21 Example (Informal). If we work in N informally, we can define –


from the term y 2 – a function

f = {⟨⟨x, y⟩, y²⟩ : ⟨x, y⟩ ∈ N²}     (1)

We can then write

f = λxy.y²     (2)

This function has two inputs. One, x, is ignored when the output is “computed”.
Such variables (inputs) are sometimes called “dummy variables”.
λ-notation gives us the list of variables (between λ and “.”) and the “rule”
for finding the output (after the “.”). The left and right fields (here N2 and N
respectively) must by understood from the context.
In practice one omits the largely ceremonial part of introducing (1) and
writes (2) at once. 
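
The same example can be rendered computationally on a finite stand-in for N² (an informal Python aside of ours; f and f_rule are our own names).

# λxy.y², read off the finite left field {0,...,3}².
f = {((x, y), y ** 2) for x in range(4) for y in range(4)}   # the set of pairs, as in (1)
f_rule = lambda x, y: y ** 2                                  # the rule itself, as in (2)

# x is a dummy input: the output ignores it.
assert all(out == f_rule(x, y) for ((x, y), out) in f)
assert f_rule(0, 3) == f_rule(2, 3) == 9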

Some restricted types of functions are important.

III.11.22 Informal Definition. A function F is one-to-one, or simply 1-1, iff


it is single-valued in the first projection, that is,

z ∈ F ∧ w ∈ F ∧ δ(z) = δ(w) → π(z) = π (w) (1)

Alternatively, we may write

F(x) = F(y) → x = y (2)

A 1-1 function is also called injective or an injection. 

As we feel obliged after the introduction of new argot to issue the usual clar-
ifications, we state: When used in ZFC, “F is 1-1” is just short for (1) or (2)
above. Thus, to assume that F is 1-1 is tantamount to adding (1) (equivalently,
(2)) to the axioms, while to claim that F is 1-1 is the same as asserting its (ZFC)
provability.

Note that F(x) = F(y) implies that both sides of “=” are defined
(cf. III.11.16). In the opposite situation F(x) = F(y) is refutable; hence (2)
still holds.
The above definition can also be stated as u F x ∧ u F y → x = y. We can say
that a 1-1 function “distinguishes inputs”, in that distinct inputs of its domain
are mapped into distinct outputs.
Note that f = {⟨0, 1⟩, ⟨1, 2⟩} is 1-1 by III.11.22, but while f (2) ≃ f (3)
(III.11.17), it is the case that 2 ≠ 3. Nevertheless, f (2) = f (3) → 2 = 3, since
f (2) = f (3) is refutable.
1-1-ness is a notion that is independent of left or right fields (unlike the
notions total, nontotal, onto).

III.11.23 Example. A function F is 1-1 iff F−1 is a function. Indeed, F is 1-1


iff it is single-valued in the first projection, iff F−1 is single-valued in the
second projection. 
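
The following informal Python aside (our own helper names) checks this equivalence on two small examples.

def is_function(T):            # single-valued in the second projection
    return all(b == d for (a, b) in T for (c, d) in T if a == c)

def is_one_to_one(T):          # single-valued in the first projection
    return all(a == c for (a, b) in T for (c, d) in T if b == d)

def inv(T):
    return {(b, a) for (a, b) in T}

f = {(0, 1), (1, 2)}           # 1-1
h = {(0, 1), (2, 1)}           # not 1-1: inputs 0 and 2 share the output 1

assert is_one_to_one(f) == is_function(inv(f)) == True
assert is_one_to_one(h) == is_function(inv(h)) == False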

III.11.24 Informal Definition (1-1 correspondences). A function F : A → B


is a 1-1 correspondence iff it is 1-1, total, and onto. We say that A and B are in
1-1 correspondence and write
A ∼F B     or     A ∼ B


A 1-1 correspondence is also called a bijection, or a bijective function. An onto


function is also called a surjection, or surjective.

III.11.25 Example (Informal). The notion of 1-1 correspondence is very im-


portant. If two sets are in 1-1 correspondence, then, intuitively, they have “the
same number of elements”. On this observation rests the theory of cardinality
and cardinal numbers (Chapter VII).
For example, λn.2n : N → {2n : n ∈ N} is a 1-1 correspondence between
all natural numbers and all even (natural) numbers. 
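
On a finite truncation of N the three properties (total, onto, 1-1) can be verified outright; the Python aside below (ours) does so for λn.2n.

A = set(range(10))                     # a finite stand-in for N
B = {2 * n for n in A}                 # the corresponding even numbers
F = {(n, 2 * n) for n in A}            # λn.2n restricted to A

def dom(T): return {a for (a, b) in T}
def ran(T): return {b for (a, b) in T}
def is_one_to_one(T):
    return all(a == c for (a, b) in T for (c, d) in T if b == d)

assert dom(F) == A and ran(F) == B and is_one_to_one(F)   # a bijection A ~ B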

Let us now re-formulate the axiom of collection with the benefit of relational
and functional notation.

III.11.26 Theorem (Collection in Argot).


(1) For any relation S such that dom(S) is a set, there is a set B such that
S−1 [B] = dom(S).
(2) For any relation S such that ran(S) is a set, there is a set A such that
S[A] = ran(S).

Proof. (1): Let S = {x, y : S (x, y)} for some formula S of the formal lan-
guage. Let Z = dom(S). An instance of “verbose” collection (cf. III.8.4) is
 
(∀x ∈ Z )(∃y)S (x, y) → (∃W )(¬U (W ) ∧ (∀x ∈ Z )(∃y ∈ W ) S (x, y))     (i)

Now, we are told that Coll x (∃y) S (x, y) (cf. III.11.4); thus the assumption
Z = dom(S) translates into

⊢ZFC (∀x)(x ∈ Z ↔ (∃y) S (x, y))

and therefore the left hand side of (i) is provable, by tautological implication
and ∀-monotonicity (I.4.24). Thus the following is also a theorem:

(∃W )(¬U (W ) ∧ (∀x ∈ Z )(∃y ∈ W ) S (x, y))     (ii)

Let us translate (ii) into argot: We are told that a set W exists† such that

(∀x ∈ Z )(∃y)(y ∈ W ∧ y ∈ Sx)

Hence

(∀x ∈ Z ) W ∩ Sx ≠ ∅

and finally (see Remark III.11.5(5))

Z ⊆ S−1 [W ]

Since, trivially, Z ⊇ S−1 [W ], we see that W will do for the sought B.
(2) follows from (1) using S−1 instead of S.

III.11.27 Remark. Statement (1) in the theorem, and therefore (2), are “equiv-
alent” to collection. That is, if we have (1), then we also have (i) above. To see
this, let S (x, y) of the formal language satisfy the hypothesis of collection, (i):
(∀x ∈ Z )(∃y)S (x, y) (a)
for some set Z . Let us define S def= {⟨x, y⟩ : S (x, y) ∧ x ∈ Z }. Then (a) yields
Z = dom(S). (1) now implies that for some set B, Z ⊆ S−1 [B], from which
the reader will have no trouble deducing

(∀x ∈ Z )(∃y ∈ B) S (x, y)     (b)

(b) proves (∃W )(¬U (W ) ∧ (∀x ∈ Z )(∃y ∈ W ) S (x, y)).

† We are using the auxiliary constant W , in other words.



III.11.28 Proposition.
(1) If S is a function and A is a set, then so is S[A].
(2) If S is a function and dom(S) is a set, then so is ran(S).

Proof. (2) follows from (1), since ran(S) = S[dom(S)].


As for (1), it is argot for collection version III.8.12(4). Indeed, letting
S = {x, y : S (x, y)}
the assumption that S is a function yields (cf. III.11.13(4)) the theorem
S (x, y) ∧ S (x, z) → y = z
The aforementioned version of collection then yields
Coll y (∃x ∈ A)S (x, y)
i.e., that the class term
{y : (∃x ∈ A)S (x, y)}
can be formally introduced (“is a set”). This is exactly what we want. 

We have already noted in III.8.3(II), p. 164, that the proposition – being an


argot rendering of III.8.12(4) – is equivalent to collection III.8.2.† A proof
of this equivalence will be given later once rank (of set) and stage (of set
construction) have been defined rigorously. In the meanwhile, in practice, the
proposition (i.e., collection version III.8.12(4)) will often be used in lieu of
collection (being an implication of the latter, this is legitimate).

III.11.29 Corollary. If F is a function and dom(F) is a set, then F is a set.

Proof. By III.11.28, ran(F) is a set. But F ⊆ dom(F) × ran(F).


Alternatively, let G def= {⟨x, ⟨x, F(x)⟩⟩ : x ∈ dom(F)}. Clearly, G is a func-
tion and dom(G) = dom(F), while ran(G) = F.

The notion of function allows us to see families of sets from a slightly


different notational viewpoint. More importantly, it allows us to extend the
notion of Cartesian product. First of all,

III.11.30 Informal Definition (Indexed Families of Sets). A function F such


that ran(F) contains no urelements is an indexed family of sets. dom(F) is the
index class. If dom(F) = ∅, then we have an empty indexed family.

† Or, as we simply say, collection.



If we let I be a name for dom(F), then we often write (Fa )a∈I to denote the
indexed family, rather than just F or λa.F(a).
In the notation “Fa ” it is not implied that F(a) might be a proper class (it
cannot be); rather we imply that the function F might be a proper class. 

Note that we called F, rather than ran(F), the indexed family (of course, ran(F)
is a family of sets in the sense of III.6.3, p. 150). What is new here is the intention
to allow “multiple copies” of a set in a family with the help of F. An indexed
family allows us to be able to talk about, say, S = {a, b, a, a, c, d} without being
obliged to collapse the multiple a-elements into one (extensionality would
dictate this if we had just a set or class {a, b, a, a, c, d}). This freedom is
achieved by thinking of the first a as, say, f (0), the second as f (2), and the
third as f (3), where
f = {0, a, 1, b, 2, a, 3, a, 4, c, 5, d}
is an indexed family with index set dom( f ) = {0, 1, 2, 3, 4, 5}, and ran( f ) = S.
Why is this useful?
For example, if a, b, c, d, . . . are cardinals (Chapter VII), we may want to
study sums of these where multiple summands may be equal to each other. We
!
can achieve this with a concept/notation like i∈dom( f ) f (i).
This situation is entirely analogous to one that occurs in the study of series
in real analysis, where repeated terms are also allowed.

III.11.31 Example. Every family of sets A in the sense of III.6.3 leads to an


indexed family of sets λx.x that has A as domain or index class.
Here is a family of sets in informal mathematics: {(0, 1/n) : n ∈ N − {0}},
where “(a, b)” here stands for open interval of real numbers. This can be viewed
as an indexed family fitting the general scheme – λx.x – above.
A more natural  view it as the indexed family λn.(0, 1/n) of domain
 way is to
N − {0}, that is, (0, 1/n) n∈N−{0} . 

III.11.32 Informal Definition. Let (Fa )a∈I be an indexed family of sets. Then

⋃a∈I Fa  def=  ⋃ ran(F)

⋂a∈I Fa  def=  ⋂ ran(F)

If I is a set I , then

∏a∈I Fa  def=  { f : f is a function ∧ dom( f ) = I ∧ (∀a ∈ I ) f (a) ∈ Fa }

If, for each a ∈ I , Fa = A, the same set, then ∏a∈I Fa is denoted by ^I A. That
is, ^I A is the class of all total functions from I to A:

{ f : f is a function ∧ dom( f ) = I ∧ (∀a ∈ I ) f (a) ∈ A}
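
When the index set is finite, all three constructs can be computed outright. The following Python sketch is an informal aside of ours (the dictionary representation of the family and the variable names are our own).

from itertools import product

# An indexed family (F_a) for a in I = {1, 2, 3}, as a Python dict.
F = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'b'}}
I = sorted(F)

big_union        = set().union(*F.values())           # the union of the family
big_intersection = set.intersection(*F.values())      # the intersection (I ≠ ∅ here)

# The product: all total functions f with dom(f) = I and f(a) ∈ F_a.
prod = [dict(zip(I, choice)) for choice in product(*(F[a] for a in I))]

assert big_union == {'a', 'b', 'c'} and big_intersection == {'b'}
assert {1: 'a', 2: 'c', 3: 'b'} in prod and len(prod) == 2 * 2 * 1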


III.11.33 Remark. (1) ⋃a∈I Fa and ⋂a∈I Fa just introduce new notation (see
also the related III.10.22, p. 192). However, ∏a∈I Fa is a new concept that is
related but is not identical to the “finite” Cartesian product. For example, if
A1 = {1} and A2 = {2}, then

×1≤i≤2 Ai = A1 × A2 = {⟨1, 2⟩}     (i)

while if we consider the indexed family (Ai )i∈{1,2} , then

∏i∈{1,2} Ai = { {⟨1, 1⟩, ⟨2, 2⟩} }     (ii)

In general, an element of A1 × · · · × An is an n-vector ⟨x1 , . . . , xn ⟩, while an
element of ∏i∈{1,...,n} Ai is a sequence f = {⟨1, x1 ⟩, . . . , ⟨n, xn ⟩}. Sequences
are not tuples, but they do give, in a different way, positional information, just
as tuples do. Sequences have the additional flexibility of being able to have
“infinite” length (intuitively). Thus, while a1 , a2 , . . . , where “. . . ” stands for
ai for all i ∈ N, is meaningless as a tuple, it can be captured as a sequence
f , where dom( f ) = N and f (i) = ai for all i ∈ N. This makes sequences
preferable to vectors in practice.

In this discussion we are conversing within informal (meta)mathematics. We


must not forget that in the formal theory we mirror only those objects that we
have proved to exist so far – i.e., to have counterparts, or representations, within
the theory; the arbitrary n, and N, are not among them.
(2) Requiring I to be a set in the definition of ∏ ensures that the functions
f that we collect into ∏ are sets (by III.11.29). This is necessary, for classes
can only have elements that are sets or atoms.
(3) The sub-formula “ f is a function” is argot for

(∀z ∈ f )(OP(z) ∧ (∀w ∈ f )(π(z) = π(w) → δ(z) = δ(w)))

III.11.34 Proposition. If I is a set, then ∏a∈I Fa and ⋃a∈I Fa are sets. If more-
over I ≠ ∅, then ⋂a∈I Fa is a set as well.

Proof. Since I is a set, so is ran(F) by III.11.28. Now the cases for ⋃a∈I Fa
and ⋂a∈I Fa follow from III.11.32 and the axiom of union and from III.6.14,
respectively.
On the other hand,

∏a∈I Fa ⊆ P(I × ⋃a∈I Fa )

Hence ∏a∈I Fa is a set.

III.11.35 Corollary. For any sets A and B, ^A B is a set.
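
For finite A and B the class of all total functions from A to B can be listed exhaustively; the short Python aside below (ours) does so, each function being represented as a set of pairs.

from itertools import product

A, B = {0, 1, 2}, {'x', 'y'}
keys = sorted(A)
A_to_B = [set(zip(keys, values)) for values in product(B, repeat=len(A))]

assert len(A_to_B) == len(B) ** len(A)            # here 2**3 = 8 functions
assert {(0, 'x'), (1, 'y'), (2, 'x')} in A_to_B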

III.11.36 Example. Let A, B, C be nonempty sets, that is, suppose we have


proved

(∃x)x ∈ A
(∃x)x ∈ B

and

(∃x)x ∈ C

Let (by auxiliary constant)

a∈A
b∈B
c∈C

Then a, b, c ∈ A × B × C; hence we have proved in ZFC

(∃x)x ∈ A ∧ (∃x)x ∈ B ∧ (∃x)x ∈ C → (∃x)x ∈ A × B × C

We can do the same with four sets, or five sets, or eleven sets, etc., or prove by
(informal) induction on n, in the metatheory, the theorem schema (see III.10.21)

if Ai ≠ ∅ for i = 1, . . . , n, then ×1≤i≤n Ai ≠ ∅.     (1)

Metatheoretically speaking, we can “implement” any n-tuple ⟨an ⟩ ∈
×1≤i≤n Ai as a sequence f = {⟨i, ai ⟩ : 1 ≤ i ≤ n}. That is, the function f is
a set, by III.11.29 applied to the (informal) set I = {1, 2, 3, . . . , n}.† Thus,
f ∈ ∏i∈I Ai .

† Thinking of the informal natural numbers as urelements, I is a set by separation, since the “real”
M is a set.

It follows that
"
if Ai ≠ ∅ for i = 1, . . . , n, then ∏i∈I Ai ≠ ∅.     (2)

(2) can be obtained within the theory, once n and N are formalized. In the
meanwhile, we can view it, formally, not as one statement, but as a compact
way of representing infinitely many statements (a theorem schema): One with
one set A (called A1 in (2)), one with two sets A, B (A1 , A2 ), one with three
sets A, B, C (A1 , A2 , A3 ), etc.
As indices we can use, for example, {∅}, {{∅}}, {{{∅}}}, {{{{∅}}}}, etc., which
are all distinct (why?).†
Does (2) extend to the case of arbitrary (and therefore possibly infinite) I ?
We consider this question in the next chapter. 

We have seen the majority of nonlogical axioms for ZFC in this chapter.
The “C” of ZFC is considered in the next chapter. In Chapter V we will in-
troduce the last axiom, the axiom of infinity, which implies that infinite sets
exist.‡

III.12. Exercises
III.1. Prove ⊢ZFC ¬U (A) ∧ ¬U (B) → (A ⊂ B → B ≠ ∅).
III.2. Prove ⊢ZFC Coll x B → (∀x)(A → B ) → Coll x A.
III.3. Prove ⊢ZFC U (x) → x ∩ y = ∅.
III.4. Prove ⊢ZFC U (x) → x − y = ∅.
III.5. Prove ⊢ZFC ¬U (x) → U (y) → x − y = x.
III.6. Let a be a set, and consider the class b = {x ∈ a : x ∈ / x}. Show that,
despite similarities with the Russell class R, b is a set. Moreover, show
b∈ / a. Do not use foundation.
III.7. Show ⊢ R (the Russell class) = U M .
III.8. Show that ⊢ZFC ¬U (x) → ∅ ⊆ x.
III.9. Show that if a class A satisfies A ⊆ x for all sets x, then A = ∅.
III.10. Without using foundation, show that ∅ = {∅}.

† We can still write these indices as “1”, “2”, “3”, “4”, etc. (essentially counting the nesting of
{}-brackets), as this is more pleasing visually.
‡ The infinity axiom does not just say that infinite sets exist. It says, essentially, that limit ordinals
exist, which is a stronger assertion.

III.11. Interpret the extensionality axiom over N so that the variables vary over
integers, not sets, and ∈ is interpreted as “less than”, <. Show that under
this interpretation the axiom is true.
III.12. Show that if we have no urelements, and if our axioms are just ex-
tensionality, separation, union, foundation, and collection, then this set
theory
(1) can prove that a set exists, but
(2) cannot prove that a nonempty set exists.
(Hint: Find a model of all the above axioms augmented by the formula
(∀y)(¬U (y) → (∀x)x ∉ y).)
III.13. Suppose we have all the axioms except the one for pairing and the one
that asserts the existence of a set of urelements (III.3.1). Show that these
axioms cannot prove that a set exists.
(Hint: Find a model of all the above axioms augmented by the formula
(∀x)U (x).)
III.14. (Bourbaki (1966b)) Drop collection version III.8.2, separation, and
union. Add Bourbaki’s axiom of “selection and union”, that is, collec-
tion version (2) of III.8.12, p. 173. Prove that separation and union are
now theorems.
III.15. (Shoenfield (1967)) Drop collection version III.8.2, pairing, and union.
Add collection version (3) of III.8.12, p. 173. Prove that pairing and
union are now theorems.
III.16. (Levy (1979)) Drop collection version III.8.2, pairing, and separation.
Add collection version (4) of III.8.12, p. 173. Prove that pairing and
separation are now theorems.
III.17. Prove
U (A) → P(A) = {∅}
in ZFC.

III.18. What is ∅ (and why)?
III.19. Show that
(1)  A ∪ B = B ∪ A and
(2)  A ∪ (B ∪ C) = (A ∪ B) ∪ C.
III.20. Show that
(1)  A ∩ B = B ∩ A and
(2)  A ∩ (B ∩ C) = (A ∩ B) ∩ C.
III.21. For any set A in the “restricted” universe U N (N ⊆ M), show that
U N − A is a proper class.

III.22. Show for any classes A, B that A − B = A − A ∩ B.


III.23. For any classes A, B show that A ∪ B = A iff B ⊆ A.
III.24. For any classes A, B show that A ∩ B = A iff A ⊆ B.
III.25. For any classes A, B show that A − (A − B) = B iff B ⊆ A.
III.26. Prove III.6.15(2).
III.27. (1) Express A ∩ B using class difference as the only operation.
(2) Express A ∪ B using class difference and complement as the only
operations.
III.28. Generalized De Morgan’s laws. Prove for any class A and indexed
family (Bi )i∈F that
 
(1) A − ⋃i∈F Bi = ⋂i∈F (A − Bi )

(2) A − ⋂i∈F Bi = ⋃i∈F (A − Bi )

III.29. Distributive laws for ∪, ∩. For any classes A, B, D show


(1) A ∩ (B ∪ D) = (A ∩ B) ∪ (A ∩ D)

(2) A ∪ (B ∩ D) = (A ∪ B) ∩ (A ∪ D)
III.30. Generalized distributive laws for ∪, ∩. Prove for any class A and in-
dexed family (Bi )i∈F that
 
(1) A ∩ ⋃i∈F Bi = ⋃i∈F (A ∩ Bi )

(2) A ∪ ⋂i∈F Bi = ⋂i∈F (A ∪ Bi )

III.31. Show that we cannot have a ∈ b ∈ c ∈ · · · ∈ a.


III.32. Show that V N is a proper class for any set N of urelements (including
the case N = ∅).
III.33. Show that for any class (not just set) A, A ∈ A is refutable.
III.34. (1) Show that A =“the class of all sets that contain at least one element”
can be defined by a class term.
(2) Show that A is a proper class.
III.35. Attach the intuitive meaning to the statement that the set A has n distinct
elements. Show then, by informal induction on n ∈ N, that for n ≥ 0,
if A has n elements, then P(A) has 2n elements.
   
III.36. Show (without the use of foundation) that {{a}, {a, b}} = {{a′ }, {a′ , b′ }}
implies a = a′ and b = b′ .
III.37. For any sets x, y show that x ∪ {x} = y ∪ {y} → x = y.
(Hint: Use foundation.)
III.38. Prove Proposition III.10.13.
III.39. Prove Proposition III.10.14.
III.40. For any A, B show that ∅ = A × B iff A = ∅ or B = ∅.
III.41. For any set of urelements N , show that U3N ⊆ U2N .
III.42. Distributive law for ×. Show for any A, B and D that D × (A ∪ B) =
(D × A) ∪ (D × B).
III.43. Let F : X → Y be a function, and A ⊆ Y, B ⊆ Y. Prove
(a) F−1 [A ∪ B] = F−1 [A] ∪ F−1 [B]
(b) F−1 [A ∩ B] = F−1 [A] ∩ F−1 [B]
(c) if A ⊆ B, then F−1 [B − A] = F−1 [B] − F−1 [A].
Is this last equality true if A ⊈ B? Why?
III.44. Let F : X → Y be a function, and A ⊆ X, B ⊆ X. Prove
(a) F[A ∪ B] = F[A] ∪ F[B]
(b) F[A ∩ B] ⊆ F[A] ∩ F[B]
(c) if A ⊆ B, then F[B − A] ⊇ F[B] − F[A].
Can the above inclusions be sharpened to equalities? Why?
III.45. Which parts, if any, of the above two problems generalize to the case
that F is just a relation?
III.46. Let G be a function and F a family of sets. Prove
(a) G−1 [⋃F] = ⋃{G−1 [A] : A ∈ F}
(b) G−1 [⋂F] = ⋂{G−1 [A] : A ∈ F}
(c) G[⋃F] = ⋃{G[A] : A ∈ F}
(d) G[⋂F] ⊆ ⋂{G[A] : A ∈ F}. (Can ⊆ be replaced by =? Why?)
III.47. Let F be a function, and A a class. Prove
(a) F[F−1 [A]] ⊆ A
(b) F−1 [F[A]] ⊇ A, provided that A ⊆ dom(F).
Show by appropriate concrete examples that the above inclusions cannot
be sharpened, in general, to equalities.
III.48. Let the function F be 1-1, while A ⊆ dom(F) is an arbitrary class. Show
that F−1 [F[A]] = A. State and prove an appropriate converse.
III.49. Let B ⊆ ran(G). Prove G[G−1 [B]] = B. State and prove an appropriate
converse.

III.50. Let F be a 1-1 function and A ⊆ B ⊆ dom(F).


(a) Prove F[B − A] = F[B] − F[A].
(b) Prove a suitable converse.
Is the restriction A ⊆ B ⊆ dom(F) necessary? Why?
III.51. For any relations S, T prove
(1) (S−1 )−1 = S
(2) dom(S) = ran(S−1 )
(3) ran(S) = dom(S−1 )
(4) (S ∪ T)−1 = S−1 ∪ T−1 .
III.52. Prove that if (S−1 )−1 = S, then S is a relation. Give an example where
the equality fails. Which ones among (2)–(4) in the previous exercise
hold for arbitrary classes?
III.53. Using only the axioms of union, pairing, and separation, show that if a
function F is a set, then so are both dom(F) and ran(F).
III.54. Show for a relation S that if both the range and the domain are sets, then
S is a set.
III.55. Show that if a relation S is a set, then so is S −1 .
III.56. If F : A → B is a 1-1 correspondence, show that so is F−1 : B → A.
IV

The Axiom of Choice

From this chapter and onwards the reader will witness more and more the
“relaxed proof style” (cf. III.5.9).

IV.1. Introduction
The previous chapter concluded with the question, can
"
if Ai ≠ ∅ for all i ∈ I, then ∏i∈I Ai ≠ ∅     (1)

where I = {1, 2, . . . , n}, be extended to the case of arbitrary (and therefore


possibly infinite† ) I ?
The axiom of choice, AC, says yes.

IV.1.1 Axiom (Axiom of Choice, or AC). If I and Aa , for all a ∈ I , are non-
empty sets, then ∏a∈I Aa ≠ ∅.

But why “axiom”? After all, the case for finite I is provable as a theorem,
that is, (1) above. Before we address this question, let us first consider some
more down-to-earth equivalent forms of AC.

IV.1.2 Theorem. The following statements (1), (2), (3), and (4) are provably
equivalent.
(1) AC.
(2) If the set F is a nonempty family of nonempty sets, then there is a function
g such that dom(g) = F and g(x) ∈ x for all x ∈ F.

† The terms “infinite” and “finite” throughout this discussion have their intuitive metamathematical
meaning.


(3) If the set S is a relation, then there is a function f such that dom( f ) =
dom(S) and f ⊆ S.
(4) If the set F is a nonempty family of pairwise disjoint nonempty sets, then
there is a set C that consists of exactly one element out of each set of F
(i.e., for each x ∈ F, C ∩ x is a singleton).

Note. The function g in (2) above is called a choice function for F.

Proof. (1) → (2): Given a set F as in (2). Define i def= λx.x with dom(i) = F.
Then F can be viewed as the indexed family (i(x))x∈F , or (x)x∈F . By (1) there
is a g ∈ ∏x∈F x. Thus, dom(g) = F and g(x) ∈ x for all x ∈ F.
(2) → (3): Given a relation S (set). Let F def= {S⟨a⟩ : a ∈ dom(S)}.† F is a
set by III.8.9, since dom(S) is a set (III.11.8). If F = ∅, then S = ∅ and f = ∅
will do. So let F ≠ ∅. By (2), there is a choice function g, i.e., dom(g) = F
and g(x) ∈ x for all x ∈ F. In terms of S, the last result reads g(S⟨a⟩) ∈ S⟨a⟩
for each a ∈ dom(S). Clearly, f def= {⟨a, g(S⟨a⟩)⟩ : a ∈ dom(S)} will do.
(3) → (4): Let F be as in (4). Define S def= {⟨x, y⟩ : y ∈ x ∈ F}.‡ S is a set,
since S ⊆ F × ⋃F. Now apply (3) to obtain f ⊆ S with dom( f ) = dom(S) =
F. Take C = ran( f ), a set by III.11.28.
To verify, let x ∈ F = dom( f ). Then ⟨x, f (x)⟩ ∈ S; therefore f (x) ∈ x.
This along with f (x) ∈ C yields f (x) ∈ C ∩ x. Let also

y ∈ C ∩ x     (i)

Hence y ∈ C in particular. Then, for some z, f (z) = y; therefore ⟨z, y⟩ ∈ S
and thus y ∈ z. By (i), y ∈ x ∩ z, so that x = z (by the assumption on F). This
yields y = f (z) = f (x). Thus, C ∩ x = { f (x)}.
(4) → (1): Let (Aa )a∈I be an indexed family of nonempty sets (I ≠ ∅ as
well). Let F def= ran(λa.({Aa } × Aa )) for a ∈ I . F is a set by III.11.28, and its
members are pairwise disjoint sets; for if Aa ≠ Ab , then ⟨Aa , x⟩ ≠ ⟨Ab , y⟩ for
all x, y, and thus ({Aa } × Aa ) ∩ ({Ab } × Ab ) = ∅. By (4) there is a set C such
that C ∩ ({Aa } × Aa ) is a singleton for all a ∈ I .
Define f def= {⟨a, δ(y)⟩ : ⟨a, y⟩ ∈ I × (C ∩ ({Aa } × Aa ))}, which is a set
by III.8.9, and obviously a function by the previous remark. Now, dom( f ) = I

† x ∈ F ↔ (∃a ∈ dom(S)) x = S⟨a⟩ (cf. III.8.7).
‡ By “y ∈ x ∈ F” I mean “y ∈ x ∧ x ∈ F”, i.e., using ∈ conjunctionally.

and f (a) = δ(y) ∈ Aa for all a ∈ I ; hence f ∈ ∏a∈I Aa , that is, ∏a∈I Aa ≠ ∅.
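
In the finite case no axiom is needed: the choices can be made one by one, or by a rule when one is available. The Python aside below (our own, informal) builds a choice function g and a choice set C in the spirit of versions (2) and (4), for a three-member family of disjoint nonempty sets of natural numbers, using “least element” as the rule.

family = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]

g = {x: min(x) for x in family}            # a choice function: g(x) ∈ x ...
assert all(g[x] in x for x in family)

C = {g[x] for x in family}                 # ... and a choice set: one element per member
assert all(len(C & x) == 1 for x in family)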

We will concentrate on the equivalence of AC with (4), due to the latter’s


intuitive appeal. Is now (4) “really true”? Is it possible to choose one element
out of each set x of a possibly infinite† set F of nonempty pairwise disjoint
sets, in order to form a set C? In the finite case we can literally “choose” each
representative from each x ∈ F, for if there are n such choices, we can fit them in
a proof that is about n lines long (see the proof of (1) at the closing of Chapter III).
We cannot do the same in the infinite case, for proofs must have finite length.
Of course, it often happens that we can describe an infinite process – such as
an infinite sequence of choices – in a finite manner. For example, if F consists
exclusively of nonempty subsets of N, then we can define C compactly without
having to list our infinite set of choices: C def= {y : y is smallest in x ∧ x ∈ F}.
One point of view maintains that to accept the existence of a set like C we
must be able to give a “rule” or “unambiguous definition” P [y] – just as in
the example above. Holders of this viewpoint do not accept AC as a legitimate
axiom. They argue that in the absence of “structure” in the set members x of
F, all the elements of such x “look alike”, and therefore the infinite process of
“choosing” cannot be compacted into a finite well-defined description. This is
true even for very small sets x (it is the size of F, not that of x ∈ F, that creates
the problem).‡
A well-known example due to Russell contrasts an infinite set of pairs of
shoes with an infinite set of pairs of socks. In the former case the set C can
be defined compactly to consist of, say, the left shoe out of each pair. In the
case of socks this “rule” does not define well which sock to pick, because, even
though they are distinct objects, the two socks in a pair cannot be distinguished
by “left” vs. “right”.
The other philosophical point of view accepts that sets exist outside our-
selves despite our frequent inability to define them “well”, or to describe them.
Thus, the choice set C is some arbitrary “partition”§ of the objects of “real”
mathematics into members (of C) and non-members. As such, it exists whether
we can define it well or not.¶

† Intuitively speaking, for now.


‡ If each x ∈ F is a singleton, then, of course, C can be well defined.
§ We do not attach any technical significance to the term “partition” here.
¶ It is conceivable that someone some day may come up with a way to describe how to choose one
sock from each one of an infinite collection of sock pairs. It would be therefore unwise to say “it
cannot be done” simply because you or I cannot do it today.

Under this Platonist interpretation of “existence”, a set of representative


socks, one out of each pair in an infinite set of pairs, certainly exists. More
generally, (4) and, equivalently, AC are “true”.†

IV.2. More Justification for AC;


the “Constructible” Universe Viewpoint

Throughout this section we reason Platonistically in the domain of informal (or


“real”) set theory. Within the set-formation-by-stages doctrine we can argue
the reasonableness of AC, paraphrasing Gödel’s proof of the consistency of AC
with the remaining axioms of ZF (Gödel (1939, 1940)).
Let us take seriously the challenge that AC does not provide us with a well-
defined “rule” to effect the potentially infinitely many choices. Thus, for the
balance of this section we will respond by demanding that all sets – not just
those that AC asserts to exist – be “given” by a well-defined rule. This just
levels the playing field.
In particular, when we apply the power set operation, P(A), to an infinite set
A, we will accept that only those subsets of A that are “well-definable” exist
(i.e., as sets; we will soon make this requirement precise).‡

This attitude is similar to the one that separates collections into sets and proper
classes: That some collections are not sets is a situation we are by now com-
fortable with. In this section we further narrow down what collections we will
accept as sets in defense of AC.
This is a local restriction, however, valid only in this section. In the remainder
of the volume we revert to our understanding of “real” sets as this was explained
in Chapter II (II.1.3).

† The reader will observe that all we are doing here is arguing that a proposed new axiom is
reasonable. This is a process we have been through for all the previous axioms, and it does
not constitute a proof of the axioms in the metatheory. The notion “reasonable” is not tempo-
rally stable. When Cantor introduced set theory, the entire theory was “unreasonable” to many
mathematicians of the day – including influential ones like Poincaré, who suggested that most of
Cantor’s set theory ought to be discarded. When Russell proposed to found mathematics on logic,
this too was considered as an “unreasonable” point of view. For example, Poincaré protested that
this was tantamount to suggesting that the whole body of mathematics was just a devious way
to say A ↔ A . As mathematics progresses, mathematicians become more ready to accept the
reasonableness of formerly “unreasonable” concepts or statements.
‡ “Well-definable” is just an emphatic way of saying “(first order) definable”, in the sense that
we can write these sets down as class terms. We have already remarked (cf. II.4.2(b)) that
we cannot expect all subsets of N to be first order definable, for they are far too many of
them.

The definition of the “well-definable” sets will be given by formulas of set


theory. We hope to be able to “sort” all possible such formulas in “ascending
order”, indicating such an order by the symbol “≺”. Next, we will see that ≺
on formulas naturally induces an order on the defined sets. To avoid notational
confusion, we will use the symbol “” for this induced order on sets. With some
luck,  will arguably be a well-ordering, i.e., each nonempty class will have
a minimum element (with respect to ). If all this succeeds, we will have two
things:
(1) A restricted universe of sets, L, where all sets are definable (all the other
sets are ignored – banned from “sethood”, that is).
(2) There will be a well-ordering, , on this restricted universe.
Thus, if A is any set of nonempty sets, a choice function for A can be
(well) defined by setting f (x) equal to the minimum a ∈ x with respect to the
ordering .
To begin with, we will need a judicious reinterpretation of what is going on
at each stage of set construction.

A stage of set construction is one of two possible types: a collecting type, or a


powering type.
At a collecting stage one collects into a set all the objects that are available
so far. In particular, since the urelements are given outright, the 0th stage is a
collecting stage, at which the set of all urelements, call it M, is formed. At any
subsequent collecting stage, we form the union of all the sets that were formed
at all previous stages.
At a powering stage we form the set of all well-definable subsets of the set
formed at the immediately previous stage – a sort of truncated power set.
The stages occur in the following order, defined inductively:†
(i) The 0th stage is a collecting stage at which the set of all urelements, M, is
formed.
(ii) If at the arbitrary collecting stage the set X has been formed, then this
stage is followed immediately by infinitely many powering stages, to form
the sets X 1 , X 2 , . . . , X n , . . . , where
X 1 = D(M ∪ X ) and, for n ≥ 1 X n+1 = D(M ∪ X n ) (1)

† The following informal definition is adequate for our informal discussion. A precise version
will be given with the help of ordinals – formal counterparts of “stages” – when we revisit the
constructible universe in Chapter VI.

In (1), D(A) denotes the set of all definable subsets of A. We will soon
make the meaning of the term D(A) precise, but for the time being let us
imagine that it is a pared-down version of P(A), that is, D(A) ⊂ P(A) for
infinite sets A.†
(iii) Immediately after each such infinite sequence of powering stages, a col-
lecting stage occurs to form the union of all the sets formed at all the
previous stages.
This process, alternating between (ii) and (iii), continues ad infinitum and con-
structs all sets (all definable sets really, but you will recall that in this section
we pretend that these are the only legitimate sets anyway).

IV.2.1 Remark. The need for collection, after each sequence of powering,
should be clear. For example, if we stop the process after the first sequence
of powering, then, even though we have constructed sets with arbitrary integer
depth of nesting of {}-brackets, we have not constructed a single set that contains
as elements sets with all possible depths of nesting of {}-brackets.
Specifically, if X 1 , X 2 , . . . , X n , . . . constitutes the first sequence of power-
ing, then

∅ ∈ X1

and, for n ≥ 1,

{. . . {∅} . . . } ∈ Xn+1     (n pairs of {}-brackets around ∅)

But none of the Xi contains all of the

{. . . {∅} . . . }     (n pairs of {}-brackets)

for all n.

It is useful to observe the following important property of each set Y con-


structed at some stage: If x ∈ y ∈ M ∪ Y , then x ∈ M ∪ Y .‡
The claim is trivially true if y is an atom (see the previous footnote).

† Of course, in principle, we can list explicitly all subsets of any finite set, so that for finite sets A
we intuitively accept that all their subsets are definable, i.e., D(A) = P(A) in this case.
‡ A set S that satisfies a ∈ b ∈ S → a ∈ S – that is, a ∈ b ∧ b ∈ S → a ∈ S – is called transitive.
Such sets play a major role in set theory – all ordinals are transitive, to make the point. Here are
two simple examples:
(a) {#, ?}, where # and ? are urelements (a ∈ b ∈ S → a ∈ S is true for b an urelement, since
then a ∈ b ∈ S is false)
(b) {∅, {∅}}.

We say “true” and “false” freely in this section, since we are working, like
Platonists, in the metatheory.

If y is a set, then we rephrase what we want to prove as follows:

y ∈ M ∪Y → y ⊆ M ∪Y (2)

We prove (2) by induction on stages (that we went through towards building


Y ). To this end, we need to verify the following:

(i) The set Y constructed at the 0th stage has the property.
(ii) The property propagates with collecting.
(iii) The property propagates with powering.

As for (i), this is true because the Y formed at stage 0 is M, and M is transitive
(see the footnote to the claim) – or, another way of saying this, y ∈ M ∪ Y is
false (M ∪ Y = M contains only atoms, while y is a set).

As for (ii), let Y = ⋃{Z , W , . . .} be formed at a collecting stage, where
Z , W , . . . are all the sets formed at all the previous stages. Let y ∈ M ∪ Y .
Thus, y ∈ Y (for M contains only atoms); hence

y ∈ (say) W ⊆ M ∪ W

By the induction hypothesis (I.H.), y ⊆ M ∪ W , and, since W ⊆ Y , it follows


that y ⊆ M ∪ Y .
As for (iii), let Y = D(M ∪ X ), where X is the set we have built at the
immediately previous stage. Let y ∈ M ∪ Y be true and take any x ∈ y. Again,
y ∈ Y , thus y ⊆ M ∪ X , hence x ∈ M ∪ X .
Now, if x ∈ M, then x ∈ M ∪ Y . If not, then x is a set, and I.H. yields
x ⊆ M ∪ X , from which follows x ∈ Y . Thus, in either case, x ∈ M ∪ Y , and
(x being arbitrary) y ⊆ M ∪ Y .

IV.2.2 Remark. We state an important by-product of the transitivity of the sets


M ∪ X , where X has been obtained in our construction. If Y = D(M ∪ X ) for
some such X , then we have that

X ⊆ M ∪Y

Indeed, let x ∈ X . If x ∈ M, then we are done. Else, x is a set and x ∈ M ∪ X ;


therefore x ⊆ M ∪ X . It follows that x ∈ Y . 

We now turn to how the sets obtained at powering stages are actually “de-
fined”. Let us “sort” in an arbitrary fixed way the alphabet of the first-order
language of logic that we have been using all along (we will use the symbol ≺

to indicate the assumed order on the alphabet, A, of logical and nonlogical


symbols). Now,

A = {∀, ∃, ¬, ∧, ∨, →, ↔, =, (, ), v, |, ∈, U }

where the symbols v and | are used to build the object variables v0 , v1 , v2 , . . .
as v|v, v||v, v|||v, . . . (as usual, we can use abbreviations such as x, y, z – with
or without primes or subscripts – for variables). Let us fix the order

∀ ≺ ∃ ≺ ¬ ≺ ∧ ≺ ∨ ≺→≺↔ ≺=≺(≺) ≺ v ≺ | ≺∈≺ U (3)

for A.

We next augment our alphabet to include the names  0,  1,  2, . . . ,  n , . . . of all
urelements,† and exactly one name for each definable set.‡

As a matter of notation, if c is (I mean, informally names) a definable set,


then  c will denote the unique name for c that we import into our alphabet A.
In particular, the horrible notation  cn will stand for the sequence of names

c1 , . . . ,
cn .

† It might be thought – with some justification – that we are cheating somewhat here by taking M,
the set of all urelements, to be N. Recall however that all that we are after is to
(1) give a philosophically plausible description of what sets are, and
(2) within this description argue that AC holds.
In other words, following Gödel, we are proposing an informal and plausible universe of “real”
sets.
We have chosen N as the set of urelements because AC holds on it by the least integer principle.
How well does this choice hold philosophically, i.e., how well are we serving requirement (1)
above? Well, it should not be too difficult to accept the view that the primeval “real stuff” of
mathematics – the atomic objects – is the natural numbers, and that all else in mathematics
we build starting from these numbers. After all, one of the most careful among the fathers of
foundations, Kronecker, had no trouble with this position at all. He is said to have held that “God
created the integers; all else is the work of man”. Mind you, Kronecker, the mentoring father of
intuitionism and a confirmed finitist, did not allow for the entire set of natural numbers, but only
granted you the right of having as many numbers as you wanted by simply adding one to the last
one you have had.
Even technically, one can argue that the choice of such a small set of urelements does not
restrict our ability to use sets to do mathematics, for it turns out that even a smaller set works
(i.e., leads to a set theory that is sufficiently rich for the purposes of doing mathematics). Namely,
as we shall see in Chapter V, von Neumann has shown how to build the natural numbers and,
therefore, also Kronecker’s “all else”, starting from ∅.
‡ Definable in the process that we are describing. By the way, introducing a unique name for each
“real object” of a collection is a trick that we have already used in describing the semantics of
first order languages in I.5.4.

Our goal is to extend the order ≺ from A to all names, and then to induce
it on the named objects, that is, all objects of our definable universe.
Thus, what we have set out to do is to achieve

a  b   iff    a ≺  b     (4)

We do this in stages, starting by extending (3) into

∀ ≺ ∃ ≺ ¬ ≺ ∧ ≺ ∨ ≺ → ≺ ↔ ≺ = ≺ ( ≺ ) ≺ v ≺ | ≺ ∈ ≺ U ≺  0 ≺  1 ≺ · · ·     (5)

It is trivial that ≺ in its present stage of definition given by (5) is a well-ordering,


that is, every nonempty set of symbols in A ∪ { n : n ∈ N} has a ≺-smallest
element.†

IV.2.3 Definition. A set a is definable from X , a formula P (v0 , v1 , . . . , vn )
over the initial alphabet (3), and the parameters  b n iff

a = {x ∈ X : P ( x,  b n ) is true in X }

where the (constant) objects bi , named by  bi , are all in X .
“Is true in X ” means that the truth value is “computed” by restricting
all bound variables of the sentence P ( x,  b n ) to vary over X .‡ That is, an
occurrence of (∃y) or (∀y) in the formula means (∃y ∈ X ) or (∀y ∈ X )
respectively.
D(X ) denotes all sets a definable from X and parameters in X (for all P
over the initial alphabet (3)).
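
The flavour of “definable from X” can be conveyed on a finite X by relativizing every quantifier to X. The toy Python sketch below is our own rendering (it uses Python predicates in place of first order formulas) and is meant only as an informal illustration of the definition.

X = {0, 1, 2, 3, 4}

def P(x, X, b):                            # "(∃y ∈ X)(y = x ∧ y ≤ b)"
    return any(y == x and y <= b for y in X)

def define(X, P, *params):                 # {x ∈ X : P(x, params) is true in X}
    return {x for x in X if P(x, X, *params)}

assert define(X, P, 2) == {0, 1, 2}        # a set definable from X with parameter 2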

We will well-order the class of all definable sets by well-ordering their


definitions.

We do not append all the names  c to A at once. We have only appended the
names of the atoms so far to form the alphabet (5). There are two good reasons
for this: One, we will augment our formal symbol set by stages, so that as it
grows it stays (provably) well-ordered. Two, we will add a name only after its
corresponding set has been seen to be definable; for, conceivably, not all sets
are definable.

† Well-orderings will be studied in detail in Chapter VI.


‡ Why  x in P ( x,  b n )? This is because each object x ∈ X that we check for membership in a enters
the defining formula P via its name.

In order to keep the notation simple as we append more symbols, we will


continue naming the so augmented alphabet “A”. Thus, (5) depicts the version
of A that we have immediately after appending the names of all the atoms to
the initial alphabet (3).
As usual, S + denotes the set of all nonempty strings of symbols from S, while
S i , for 0 < i ∈ N, denotes the set of strings of length i.† Thus, S + = ⋃i≥1 S i
(cf. I.1.4).

IV.2.4 Lemma. Let ≺ be a well-ordering on the set S. We extend ≺ to S + by


the following rules, but still call it ≺.
• We order by increasing string length, i.e., all the elements of S i precede
those of S i+1 .
• In each equal-length group (i.e., each S i ) we order the strings lexicograph-
ically (as in dictionaries), that is, of two unequal strings, the smaller is the
one that in the leftmost position of disagreement contains the smaller of the
two disagreeing symbols of S.

Then ≺ on S + is a well-ordering.

Proof. Let ∅ ≠ C ⊆ S + . Pick any string a1 a2 . . . an of shortest length n. We
define a sequence of transformations:

Transform a1 a2 . . . an to  a1 a2 . . . an

where  a1 is the ≺-smallest in S (S is well-ordered!) such that  a1 a2 . . . an ∈ C.‡
In general, assuming that  a1  a2 . . .  ai ai+1 . . . an ∈ C has been defined,

Transform  a1  a2 . . .  ai ai+1 . . . an to  a1  a2 . . .  ai  ai+1 ai+2 . . . an

where  ai+1 is the ≺-smallest in S such that  a1  a2 . . .  ai  ai+1 ai+2 . . . an ∈ C.
Thus, we have defined, by induction on i (≤ n), a ≺-smallest element
 a1  a2 . . .  an of C.
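
The ordering of the lemma (length first, then lexicographic) is exactly what the key function below produces, and its minimum over a nonempty set of strings is the element the proof constructs symbol by symbol. The Python aside is ours and uses a small finite alphabet in place of A.

alphabet = ['∀', '∃', '¬', '∧', '=', '(', ')', 'v', '|', '∈', 'U']
rank = {s: i for i, s in enumerate(alphabet)}      # the given well-ordering ≺ on S

def key(string):                                    # a string = a tuple of symbols
    return (len(string), tuple(rank[s] for s in string))

C = {('∈', 'v'), ('v', '=', 'v'), ('¬', '∀')}
assert min(C, key=key) == ('¬', '∀')                # shortest length, then ≺-least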

IV.2.5 Corollary. For the version of A given by (5), A + is well-ordered by ≺.

† Of course, a string over S of length i is just a member of the i-copy Cartesian product Sⁱ:
⟨x₁, . . . , xᵢ⟩. One usually writes strings without the angular brackets, and without the comma
separators, like this: x₁x₂ . . . xᵢ. Naturally, if the latter notation becomes ambiguous – e.g., if
S = {0, 00} then 00 might be either ⟨00⟩ or ⟨0, 0⟩ in vector notation – then we revert to the vector
notation.
‡ ã₁ may be the same as a₁.

So far, we have at stage 0:

• a well-ordering of M, ◁, defined to be equal to the “standard <” on the set
  of atoms (since M = N), and
• a relation ≺ satisfying (4) on M because of (5).

This observation validates the basis of the induction on stages that we now
embark upon.
Assume then that ◁ has been extended to a well-ordering on the set of all
objects M ∪ X defined so far, in such a way that (4) holds, where ≺ is a well-
ordering on the set of all symbols and names that we have up to now – this
augmented set is still called A – and moreover assume that the present stage
to be “executed” is a powering stage that will yield Y = D(M ∪ X).
We now extend ◁ to Y by cases:
Let {a, b} ⊆ Y .

Case 1. {a, b} ⊆ X. (This is a legitimate case by Remark IV.2.2.) Then, a ◁ b
is already defined, and we do not alter it. By I.H., (4) holds.
Case 2. a ∈ X and b ∉ X. Then define a ◁ b and â ≺ b̂. Thus (4) still holds.
Case 3. a ≠ b, where a ∉ X and b ∉ X (i.e., both are “new” objects – hence
sets). Let

    a = {x ∈ M ∪ X : P(x̂, â₁, . . . , âₙ) holds in M ∪ X}     (6)

and

    b = {x ∈ M ∪ X : Q(x̂, b̂₁, . . . , b̂ₘ) holds in M ∪ X}     (7)

where the parameters aᵢ and bⱼ are in M ∪ X.

By Corollary IV.2.5, ≺ extends from A to A⁺ as a well-ordering.


Now, every definable set will be defined infinitely many times in this process.†
Thus, to extend ≺ and ◁ by adding to them the pairs ⟨â, b̂⟩ and ⟨a, b⟩ – or
⟨b̂, â⟩ and ⟨b, a⟩, as the case may be – respectively, we look, in a sense, for
the “earliest construction times” for a and b, or, more conveniently, for the
≺-smallest definition.
(i) Of all the possible formulas P(v₀, . . . , vₙ) – over the initial alphabet (3), with
free variables v₀, . . . , vₙ – that can define the set a at this stage via appropri-
ate parameters, denoting members of M ∪ X, substituted into the variables,

† For example, a defining formula P defines the same set as P ∨ P , or ¬¬P , etc., or any other
formula Q for which the equivalence P ↔ Q holds, trivially or not.

we choose the ≺-smallest in (6). Similarly, of the possible Q(v₀, . . . , vₘ) that
can define the set b at this stage, we choose the ≺-smallest in (7). This
invokes IV.2.5.
(ii) Of the possible parameter strings â₁â₂ . . . âₙ that work in conjunction with
the formula P(v₀, . . . , vₙ) chosen in (i) above to define a as in (6), we choose
the ≺-smallest. Similarly for b. This also invokes IV.2.5.

After having exercised all this caution, and having chosen P, Q, â₁, . . . , âₙ,
and b̂₁, . . . , b̂ₘ as directed in (i) and (ii) above, we extend ◁ and ≺ by defining

    a ◁ b and â ≺ b̂   iff   either P(v₀, . . . , vₙ) ≺ Q(v₀, . . . , vₘ),
                            or P(v₀, . . . , vₙ) ≡ Q(v₀, . . . , vₘ) (equal as strings)
                               and â₁â₂ . . . âₙ ≺ b̂₁b̂₂ . . . b̂ₘ

where the ≺ to the right of “iff” is meaningful, since the involved strings are
in A⁺. The “normalization” of P and Q used in (6) and (7) ensures that the
extension of ◁ (and hence of ≺) is well defined, and is still a well-ordering,
since the ≺ to the right of “iff” is.
This settles the induction step with respect to powering stages – having
extended ◁ to Y so that (4) holds.
Suppose finally that the stage we are about to execute is a collecting stage
that builds Y as ⋃{X, W, Z, . . . }. By I.H., ◁ is a well-ordering on each of
X, W, Z, . . . .
Let a, b be in Y. Then a, b are in X, say. Then a ◁ b is already defined and
satisfies (4); thus we need do nothing further.†

This concludes the definition of ◁ as a well-ordering of the entire class of
definable sets and atoms, L.

We have obtained more than what we set out to achieve:

IV.2.6 Metatheorem (Strong, or Global, Choice). If F is any class‡ of mu-


tually disjoint nonempty sets R, S, . . . , then there is a class T that consists of
exactly one element from each of R, S, . . . .

† Since ◁ is updated only at powering stages as new sets get constructed, it is never redefined
during the normalization (i) and (ii) above. Thus it cannot be that a ◁ b in X while ¬ a ◁ b in,
say, W above. The reader will also observe that IV.2.2 validates our contention that a and b are
both in some earlier “X”.
‡ Not just set.

Proof. Put in T the ◁-smallest element out of each x ∈ F.  □

IV.2.7 Corollary. AC holds in L.

Proof. If F is a set, then so is T by collection (e.g., III.11.28, p. 206).  □

IV.2.8 Remark. (1) Our notational apparatus does not allow higher order ob-
jects that are collections of (possibly) proper classes. If for a minute such objects
were allowed, and if 𝔉 is one of them and it happens to contain mutually dis-
joint nonempty classes (not just sets), then a higher order collection T exists
which contains exactly one element out of each x ∈ 𝔉. Just use the ◁-smallest
out of each x.
(2) Informally, we have established the acceptability (= informal “truth”,
modulo some appropriate understandings of what the real sets really are and
how they come about) of a strong choice principle, and hence of AC. We did
this under two opposing “philosophies” regarding set existence, a Platonist’s
approach (p. 217) and, subsequently, a definability or constructibility approach.
It must be conceded that under the philosophy “existence = definability”,
even though the argument itself that ◁ exists and is a well-ordering of the universe
is sound and can be promoted into a rigorous proof within formal set theory
once we learn about ordinals, the background hypotheses could be attacked on
the grounds that “real” sets might not be constructed in the manner we have
assumed. The whole argument was a “what if”.† In particular, there might be
dissent on the choice of urelements, on what is going on at stages, on the use
of exclusively first order formulas in defining sets, etc.
Let us be content with the fact that at least the plausibility if not as much as
“proof” of a strong AC has been established under this philosophy, because the
picture suggested of what sets are is intuitively pleasing and natural.
(3) In an axiomatic approach to set theory one adopts certain basic axioms
which are plausible (or, more boldly, “true”) and adequately describe our a
priori perception of the nature of sets. The latter means that the axioms must
also be sufficiently strong to imply as many “true” statements about sets as
possible.
There are two difficulties regarding these requirements. The first is a technical
difficulty, pointed out by Gödel (incompleteness theorem), namely, that there

† That is, a construction of a model for ZFC. The reader will note that in this “model” we only
verified AC. Of course, one must verify all the nonlogical axioms in order to claim that a structure
is a model. However, since we will revisit the constructible universe formally we chose here to
only deal with our immediate worry: the “truth” of our newest axiom, AC.

are axiomatic theories (set theory – unfortunately – being one such) which are
incompletable, i.e., as long as they are consistent, they can never capture all the
true sentences that they are intended to capture (as theorems), no matter how
many axiom schemata we add (even an infinite number, as long as the formulas
that are axioms are recognizable as such).
The other difficulty has to do with limitations of our intuition (of course,
intuition advances and becomes more permissive as mathematical culture de-
velops). We do not know a priori what statements are supposed to be true (a
good thing this: otherwise mathematicians would be out of business), and coun-
terintuitive consequences of otherwise perfectly plausible axioms (shall I say
“true”?) may unfairly reflect badly on the axioms themselves, in any mathe-
matical culture that is not of sufficiently high order for mathematicians to know
better.
That “perfectly acceptable” axioms can lead to theorems that will seriously
challenge one’s intuition cannot be better illustrated than by Blum’s speed-up
theorem† in computational complexity theory. This theorem follows from the
only two axioms of the theory, both of which are outright “true”.
The theorem says that there is a computable function f on N with values in
{0, 1} which is so difficult to compute that for any program that computes f
there is another program that computes it significantly faster for all but finitely
many inputs – in other words, there is no “best” program for f . Now, this result
is certainly in conflict with intuition, but acceptable it must be, for the axioms
in this case are unassailable.
Acceptability of AC was initially hampered by a similar phenomenon: It
implied results that were unexpected and hard to swallow. The most notable such
result was Zermelo’s theorem that every set can be well-ordered, in particular
that the set of reals can. See also the discussion in Wilder (1963, pp. 73–74); in
particular note the concluding paragraph on p. 74.
To AC’s defense, we observe that mathematics is not entirely innocent of
counterintuitive constructions or theorems even in AC’s absence. We have al-
ready noted Blum’s theorem. Other examples are Weierstrass’s construction of
a continuous nowhere differentiable function, and Peano’s space-filling curve
(see Apostol (1957, p. 224)). Besides, we need AC because of vested interest.
Without it, much of mathematics is lost. For example, the standard fact that a
countable union of countable sets is countable crumbles if we disown AC (and
this may come as a surprise to many readers).‡

† See Blum (1967), or Tourlakis (1984), where this theorem is rehearsed in detail.
‡ Feferman and Levy (1963) have constructed a model of Zermelo-Fraenkel set theory without
AC, where the reals R, provably uncountable in ZF, are a countable union of countable sets.

(4) Two formal (i.e., syntactical) questions about AC must be settled right
away:

(a) If ZF is consistent and we add AC to its axioms, is the new theory, ZFC,
still consistent?
(b) Is AC provable in (i.e., a theorem of) ZF, assuming that ZF is consistent?
(Of course, an inconsistent ZF would prove every formula, including the
one that states AC (I.4.21).)

Gödel has answered (a) positively (1939, 1940; see also Devlin (1978)) by
two different methods, constructing in ZF the constructible universe of sets (it
is his first construction that we “popularized” within naïve set theory to define ◁
in this section). On the other hand, Fraenkel and Mostowski (see Jech (1978a))
and Cohen (1963) answered question (b) negatively.
Thus, both AC and its negation are consistent with ZF, and one can take
or leave AC without logical penalty either way. In this sense, AC has in the
context of ZF the same status that Euclid’s axiom on parallels has in the context
of axiomatic geometry. Adopting or rejecting Euclid’s axiom is just a reflection
of what kind of geometry one wants to do. Similarly, adopting AC or not reflects
the sort of set theory, and ultimately mathematics, one wants to do.
As we have indicated earlier, it makes sense to take a more direct approach to
our choice of axioms (rather than the indirect, or “results-driven”, approach), for
it is easy to be misled by strange but correct results. If at all possible, we should
adopt axioms by judging them on their own plausibility rather than on that of
their consequences. On that count, AC is nowadays generally accepted without
apology, since it is not any less plausible than, say, the axiom of replacement. It is
noteworthy that the first order logic which Bourbaki uses as the foundation of his
multi-volume work Éléments de Mathématique contains a powerful “selection”
axiom – using the τ-operator (cf. Section I.6) – that directly turns the axiom of
choice of set theory into a theorem.  □

IV.3. Exercises
IV.1. Show by an example that the assumption of pairwise disjointness is
essential in the proof (3) → (4) of IV.1.2.
IV.2. Show that if for two objects A and B in L the formula A ∈ B is true,
then A ◁ B is also true.
The following exercises are best approached after the reader has mastered
the concepts of order and inductive definitions on ordered sets (Chapter VI).
They are presented here because of their thematic unity with the concepts of

the present chapter. The Kuratowski-Zorn version of AC is particularly useful


in many branches of mathematics.
The reader is encouraged to try them out informally.
IV.3. AC implies that (Zermelo’s well-ordering theorem) every nonempty set
A can be well-ordered, i.e., an order < can be defined on A so that every
nonempty B ⊆ A has a <-minimal element b (that is, ¬(∃x ∈ B)x < b).
(Hint. Follow Zermelo’s (1904) proof. Do not use the overkill of ordinals
(VI.5.50). Instead, let f be a choice function on F = P(A) − {∅}. Let
B ∈ F be called distinguished if it can be well-ordered by some order
<_B so that, for every b ∈ B, b = f(A − {x ∈ B : x <_B b}) (we call
{x ∈ B : x <_B b} a segment (in B), the one that is determined by b).
For example, {f(A)} and {f(A), f(A − {f(A)})} are distinguished (in
the latter, of course, we set f(A) < f(A − {f(A)})). Show that for any
two distinguished sets B and C, one is a segment of the other, if they are
not identical (think of a maximal common segment; { f (A)} is certainly
a common segment). Take then the union of all distinguished sets, and
compare with A.)
IV.4. A linearly or totally ordered set A is one equipped with an order < such
that for any a, b in A it is true that a = b ∨ a < b ∨ b < a.

Formally, we proclaim that < : A → A is a linear order if we have a proof


of (or have assumed) (∀a)(∀b)(a = b ∨ a < b ∨ b < a).

Show that if every set can be well-ordered, then (Hausdorff ) in every set A
ordered by, say, <, every totally ordered subset B is included (⊆) in a maximal
totally ordered subset M of A.
Note. Maximality means that if a ∈ A − M, then for some m ∈ M neither
a < m nor m < a.
(Hint. If B = A, there is nothing to prove. Else, let <_W be a well-ordering
of A − B (which in general has no relationship to < that is already given on
A). By induction on <_W, partition A − B into a good and a bad set: Put the
<_W-minimum element of A − B in the good one if it is <-comparable with all
x ∈ B; else put it in the bad one. If all the elements of {x ∈ A − B : x <_W a}
have been so placed, then place a in the good set if it is <-comparable with all
the elements in B and good; else put it in the bad one.)
IV.5. Show that the italicized statement that follows (due to Kuratowski and
Zorn; also known as “Zorn’s lemma”) is a consequence of Hausdorff’s
theorem in Exercise IV.4 above. If every totally ordered subset B of an
ordered (by <) set A has an upper bound (that is, an element b ∈ A such

that x ∈ B implies x = b or x < b; in short, x ≤ b), then for every
element x ∈ A there is a <-maximal element a of A such that x ≤ a.
Note. a above is <-maximal in A in the sense that ¬(∃x ∈ A)a < x.
IV.6. Show that the Kuratowski-Zorn theorem in Exercise IV.5 above im-
plies AC. This shows the equivalence of all four: AC, the Zermelo well-
ordering theorem, Hausdorff’s theorem, and Zorn’s lemma.
(Hint. Let A ≠ ∅ be given. We want a choice function on F = P(A) − {∅}.
There certainly are choice functions on some subsets of F – on any finite
subsets, as a matter of fact. Let ℱ be the set (why set?) of all choice
functions on subsets of F. For f, g in ℱ define the order f < g to mean
f ⊂ g. Next, argue that any totally ordered subset of ℱ has an upper
bound, and thus apply Zorn’s lemma to get a <-maximal member ψ of
ℱ. Argue that ψ is a choice function on F.)
V

The Natural Numbers; Transitive Closure

V.1. The Natural Numbers


We are now at a point in our development where much would be gained in ex-
positional smoothness were we to have a formal counterpart of N in set theory.
For example, the main result of the next section is that of the existence of
the transitive closure, P⁺, of an arbitrary relation P (an important result that we
will need in Chapter VI). We will prove that P⁺ = ⋃_{i=1}^∞ Pⁱ.
This requires that we settle the questions

(1) what is Pⁱ, and
(2) what is ⋃_{i=1}^∞ Pⁱ?

Now this issue is much more complex than dealing with one (or, in any
case, “finitely” many) Pⁱ at a time – like P², P¹⁰¹, P¹²³⁰⁰⁵ – which we can
define, and use, formally without the need for a formal copy of N. The trick of
absorbing the informal number i inside the name so that it is invisible in the
theory was done before (and discussed, for example, on p. 12). For example,
P² = {⟨x, y⟩ : (∃z)(y P z ∧ z P x)}.
Here we need to collect all the infinitely many Pⁱ into a class, and to allow
the formal system to “see” the variable i, in order to speak of “⋃_{i=1} . . . ”, a
short form of “{z : (∃i in an appropriate ZFC set of i’s) z ∈ . . . }”.
Clearly, this is true even if P is a set (a restriction we want to avoid); therefore
we need to formalize the presence of the “natural number” i.†
A similar situation arises in computer programming: We can use “infor-
mal subscripts” 1, 2, 3, . . . to denote several unrelated variables as X 1, X 2,

† When P is a set, things are a bit easier. We can then prove existence of P⁺ not by confronting
⋃_{i=1} Pⁱ but by avoiding it. See Exercise V.16.


X 3, . . . but we cannot refer to these informal subscripts within the program-


ming formalism to access, say, the ith variable Xi, since the programming
language does not see the i inside the name. For example,
for i = 1 to n do
Xi ← i
end do
just refers to one variable – named Xi – at all times and it successively changes
its value from 1 through n. It does not refer to variables (named) X 1, X 2, . . . ,
X n.
Now, if we use i as a formal subscript as in a subscripted variable (or “array”)
X [i] then the following refers to n different variables, X [1] through X [n]:
for i = 1 to n do
X [i] ← i
end do
We proceed now to introduce a formal counterpart of N, denoted by the standard
symbol ω.

V.1.1 Definition. A set A is inductive iff

    ∅ ∈ A ∧ (∀x ∈ A) x ∪ {x} ∈ A

For any set x, x ∪ {x} is called the successor of x. A set y is a successor iff
y = x ∪ {x} for some x.  □

Thus “A is inductive” is (represented by) a formula of set theory.

V.1.2 Example. An inductive set A contains ∅, {∅}, {∅, {∅}}, and as we go


on, applying the successor operation again and again, we increase the depth
of nesting of {}-brackets by 1 at every step. So these depths are successively
0, 1, 2, . . . .
We can identify these depths of nesting with the natural numbers of our
intuition. Better still, we can identify ∅, {∅}, {∅, {∅}}, etc., with the natural
numbers (this is “better” because, unlike the nebulous “depth of nesting”, these
sets are objects of set theory).  □
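The identification suggested in this example is easy to visualize by brute force. The following sketch (illustration only; the frozenset coding is merely a convenient stand-in for “real” sets) iterates the successor operation x ↦ x ∪ {x} starting from ∅ and prints the first few results together with their number of elements.

    # Illustration only: the first few sets ∅, {∅}, {∅, {∅}}, ... obtained by
    # repeatedly applying the successor operation x ∪ {x}.
    def succ(x):
        return frozenset(x | {x})        # x ∪ {x}

    n = frozenset()                      # ∅
    for _ in range(4):
        print(len(n), n)                 # the element count grows 0, 1, 2, 3, ...
        n = succ(n)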

Of course we have to settle a few things. Does any inductive set really exist?
Is it not possible that an inductive set might contain much more than what we
would care to identify with natural numbers? In a way, the answers are “yes”
and “yes” – the first by the “axiom of infinity”, the second by the fact that

we have “limit ordinals” larger than ω. These are inductive sets that contain
much more than just copies of the intuitive natural numbers (this will be better
understood in Chapter VI).

V.1.3 Axiom (Axiom of Infinity). There is an inductive set. 

By the remark following V.1.1, the axiom is a formula of set theory. Because
of Example V.1.2, an inductive set – if such a set “truly” exists – has as a subset
a set of “aliases” of all the members of N, so it is intuitively infinite; hence the
axiom name is appropriate. Now why is the axiom “really true”? Because we can
certainly construct each (real) set in the infinite sequence ∅, {∅}, {∅, {∅}}, . . . ,
for each integer depth of nesting n ∈ N,† and put them all in a class. Now this
class has the same size as the (real) set N (why?); hence it must be a set by
the “size limitation doctrine” of Chapter III. Alternatively, we can say that since
collection is “true” and N is a set, then ran( f ) is also a set (III.11.28), where
f is the function with domain N that for each n ∈ N “outputs” the set in the
sequence ∅, {∅}, . . . that has depth of nesting of braces equal to n.
Furthermore, by construction, ran( f ) is inductive.

It should be noted that the negation of the axiom of infinity is “no inductive sets
exist”, not “infinite sets do not exist” (see Exercise VI.54).

Finally, we should mention that it is known that Axiom V.1.3 is not provable
by the axioms we have so far (again, see Exercise VI.54); therefore it is a
welcome addition, being intuitively readily acceptable (and necessary).

V.1.4 Lemma. If F is a nonempty family of inductive sets, then ⋂F is an
inductive set.

Proof. Easy exercise. 

V.1.5 Definition (The Formal Natural Numbers). We introduce a new con-
stant, ω, in the language of set theory by the explicit definition

    ω = ⋂{x : x is an inductive set}

† To reach any set in the sequence that involves depth of nesting of { }-brackets equal to n, all we
have to do is to write down a proof, of length n + 1, that starts with the statement “∅ is a set” and
repeatedly uses the lemma “since x is a set, then so is x ∪ {x}” (by union and pairing) as x runs
through ∅, {∅}, . . . .

We will call ω the set of formal natural numbers (we will drop the qualification
“formal” whenever there is no danger of confusing N and ω).
Members of ω are called (formal) natural numbers. In our metanotation
n, m, l, i, j, k – with or without primes or subscripts – default to (formal) natural
number variables unless the context dictates otherwise.
That is, we are introducing natural number typed variables in our
argot. Thus, “(∀m) P [m]” is short for “(∀x ∈ ω) P [x]”. “(∃m) P [m]” means
“(∃x ∈ ω) P [x]”. 

By the axiom of infinity, ω is indeed a set (since {x : x is an inductive set} is


nonempty). By Lemma V.1.4 it is itself inductive. Clearly, ω is the ⊆-smallest
inductive set, for if A is inductive, then A ∈ {x : x is an inductive set} and hence
ω ⊆ A. This simple observation leads to

V.1.6 Theorem (Induction over ω). Let P(x) be a formula. Then

    P(∅), (∀x)(P(x) → P(x ∪ {x})) ⊢ (∀x ∈ ω) P(x)

Proof. Assume the hypothesis. Let B = {x ∈ ω : P(x)}. By separation, B
is a set. The hypothesis (and the fact that ω is inductive) implies that B is
inductive. Hence (ω is smallest inductive set), ω ⊆ B. That is, x ∈ ω →
x ∈ ω ∧ P(x); hence (∀x ∈ ω) P(x) by tautological implication followed by
generalization.  □

V.1.7 Remark. The induction over ω is stated in a more user-friendly way as


“To prove (∀x ∈ ω) P (x) one proves
(1) P (∅) (this is the basis), and
(2) freezing x, the hypothesis (induction hypothesis, or I.H.) P (x) implies
P (x ∪ {x}).”
Note that (2) above establishes (∀x)(P (x) → P (x ∪ {x})) by the deduction
theorem (which uses the freezing assumption) followed by generalization.
The process in quotes proves P (x) by induction on x (over ω). x is called
the induction variable.
Applying the deduction theorem to V.1.6, one derives the ZFC theorem,

    P(∅) → ((∀x)(P(x) → P(x ∪ {x})) → (∀x ∈ ω) P(x))  □

We develop a few properties of the formal natural numbers that we will need
on one hand for our theoretical development, and on the other hand in order to
make the claim that ω is a formal counterpart of N more acceptable.

V.1.8 Lemma. n ∪ {n} ≠ ∅ for all n ∈ ω.

This corresponds to “n + 1 ≠ 0” on N, or ROB† axiom S1.

Proof. n ∈ n ∪ {n}.  □

V.1.9 Lemma. n ∪ {n} = m ∪ {m} implies m = n for all n, m in ω.

This corresponds to “n + 1 = m + 1 implies n = m” on N, or ROB axiom S2.


The reader will note from the proof below that this result is valid for all sets
n, m not just those in ω.

Proof. Let instead ¬m = n (proof by contradiction, frozen m and n). As n is


on the left hand side of
n ∪ {n} = m ∪ {m}
it must be on the right hand side too; hence, n ∈ m. Similarly, m ∈ n, and this
contradicts the axiom of foundation (applied to {m, n}). 

V.1.10 Lemma. If n ∈ ω, then either n = ∅ or n = m ∪ {m} for some m ∈ ω.

Proof. Let‡ P(n) ≡ n = ∅ ∨ (∃m ∈ ω)(n = m ∪ {m}). Clearly, ⊢ P(∅), in
pure logic, by axiom x = x and substitution. Assume next P(x) for frozen x
(I.H.), and prove P(x ∪ {x}):
Now P(x) entails two cases, x = ∅ and ¬ x = ∅. The first yields that x ∈ ω.
The second allows the introduction of the assumption

    m ∈ ω ∧ x = m ∪ {m}

where m is a new constant. Since ω is inductive, x ∈ ω. Thus, the logical fact

    ⊢ x ∪ {x} = x ∪ {x}

and the substitution axiom Ax2 yield

    ⊢ (∃m)(m ∈ ω ∧ x ∪ {x} = m ∪ {m})

and therefore ⊢ P(x ∪ {x}) by tautological implication.  □

V.1.11 Definition (Transitive Classes). A class A is transitive iff x ∈ y ∈ A


implies x ∈ A for all x, y. 

† ROB stands for Robinson’s axiomatic arithmetic, studied in volume 1, Chapter I.


‡ “≡” means string equality (cf. I.1.4, p. 13).

This concept was introduced, in passing, in a footnote of Chapter IV. It is of the


utmost importance: not only are the ordinals (in particular the formal natural
numbers) transitive sets – while the class of all ordinals is a transitive class –
but also such sets play a major role in the model theory of set theory.
Note that a class A is transitive amounts to ran(∈ ↾ A) ⊆ A, where, of
course, we employ the symbol “∈” to denote the relation – {⟨y, x⟩ : x ∈ y} –
defined by the nonlogical symbol (predicate) also denoted “∈”. In words: For
all inputs x ∈ A, the relation ∈ has all its outputs in A as well. Or, as we say,
A is ∈-closed.†

V.1.12 Lemma. Every natural number is a transitive set.

Proof. We prove‡
(∀z)(∀x, y)(x ∈ y ∈ z → x ∈ z) (1)
by induction on z.
Basis. x ∈ y ∈ ∅ → x ∈ ∅ is provable, since x ∈ y ∈ ∅ is refutable.
I.H. For a frozen z assume
(∀x, y)(x ∈ y ∈ z → x ∈ z) (2)
Let now x, y be frozen variables,§ and add the assumption x ∈ y ∈ z ∪ {z}.
Case y ∈ z. Then (I.H. and specialization) x ∈ z ⊆ z ∪ {z}.
Case y = z. Then x ∈ z ⊆ z ∪ {z}. By the deduction theorem,

x ∈ y ∈ z ∪ {z} → x ∈ z ∪ {z}
Hence

(∀x, y)(x ∈ y ∈ z ∪ {z} → x ∈ z ∪ {z}) 

V.1.13 Lemma. ω is a transitive set.

Proof. We prove
(∀y)(∀x)(x ∈ y ∈ ω → x ∈ ω) (1)
by induction on y.

† The reader will recall that in “y ∈ x”, “x” is the input, for according to our conventions ⟨x, y⟩ is
a pair in ∈. A sizable part of the literature has “y ∈ x” to mean ⟨y, x⟩ is in ∈, i.e., it has y as the
input. Naturally, for them, a transitive class is not ∈-closed; instead it is ∈⁻¹-closed.
‡ We use the shorthand “(∀x, y)” for “(∀x)(∀y)”
§ That is, we must remember not to universally quantify them or substitute into them prior to our
intended application of the deduction theorem.

Basis. x ∈ ∅ ∈ ω → x ∈ ω is provable, since x ∈ ∅ ∈ ω is refutable.


I.H. Freeze y ∈ ω, and assume

(∀x)(x ∈ y ∈ ω → x ∈ ω) (2)

To argue the case for y ∪ {y}, let now x be frozen† and add the assumption
x ∈ y ∪ {y}.
Case x ∈ y. Then (I.H. and specialization) x ∈ ω.
Case x = y. Then x ∈ ω.
Thus, we have proved (deduction theorem followed by generalization)

(∀x)(x ∈ y ∪ {y} ∈ ω → x ∈ ω)  □
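Both lemmata can be checked mechanically on small instances. The sketch below (illustrative coding only, continuing the frozenset convention of the earlier sketch) tests the defining property of transitivity – every member of a member is a member – on a few coded natural numbers and on a set that is not transitive.

    # Illustration only: transitivity check for hereditarily finite sets.
    def is_transitive(s):
        return all(x in s for y in s for x in y)

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})
    three = frozenset({zero, one, two})
    print(all(is_transitive(n) for n in (zero, one, two, three)))   # True, as in V.1.12
    print(is_transitive(frozenset({two})))                          # False: {2} is not transitive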

The above two lemmata say quite a bit about the structure of natural numbers:

(1) Every natural number is a transitive set.


(2) Every member of a natural number is a natural number (V.1.13).

Add to this that

(3) A natural number is a successor, or equal to ∅ (V.1.10),

and we have a complete characterization of natural numbers that does not need
the axiom of infinity anymore. (See Exercise V.5.) Well, we will need infinity
sooner or later, and we will need induction and inductive definitions over ω
sooner rather than later, so it was not a bad idea to introduce the “whole” ω
now.

V.1.14 Example. Is ω a successor? No, for if ω = x ∪ {x} for some x, then


x ∈ ω. Since ω is inductive, x ∪{x} ∈ ω as well, i.e., ω ∈ ω, which is impossible
by foundation. 

V.1.15 Example (Predecessors). By Lemmata V.1.8, V.1.9, and V.1.10 the


formula n ∈ ω → (∃!m ∈ ω)(n = ∅ ∧ m = ∅ ∨ n ≠ ∅ ∧ n = m ∪ {m}) is
provable in set theory.
Thus we can introduce a function symbol of arity 1, pr , the predeces-
sor function symbol, by the axiom (see III.2.4, p. 122, in particular, (10), (11),

† In argot one often says “let x be arbitrary but fixed”, referring to the “value” of x.

(19), (20))

    pr(x) = y ↔ x ∈ ω ∧ (x = ∅ ∧ y = ∅ ∨ x ≠ ∅ ∧ x = y ∪ {y})
                ∨ x ∉ ω ∧ y = ∅                                         (1)

Thus (aforementioned (19) and (20) respectively)

    ⊢ x ∈ ω → x = ∅ ∧ pr(x) = ∅ ∨ x ≠ ∅ ∧ x = pr(x) ∪ {pr(x)}

and

    ⊢ x ∉ ω → pr(x) = ∅  □

V.1.16 Remark. By the above, whenever n = ∅ is a natural number, then its


predecessor pr (n) is also a natural number such that pr (n) ∈ n. By the transi-
tivity of each natural number n, the predecessor of the predecessor of n (if the
latter is not ∅) is also a member of n, and so on; thus each natural number n is
the set of all natural numbers that “precede it” in the sense that “m precedes n
iff m ∈ n”.
This remark is important, yet trivial. Another way to see it is to note that
n = {x : x ∈ n} is provable for any set n. Now if n ∈ ω, then so are all the
x ∈ n, so with the notational convention of Definition V.1.5 one can write
n = {m : m ∈ n}. 
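In the illustrative frozenset coding used earlier, the predecessor of a nonzero natural number n can be located as the unique m ∈ n with m ∪ {m} = n. The following sketch is only a toy rendition of this observation, not the formal function symbol pr of V.1.15.

    # Illustration only: pr on frozenset-coded naturals.
    def succ(x):
        return frozenset(x | {x})

    def pr(n):
        if not n:                         # pr(∅) = ∅, matching clause (1) of V.1.15
            return frozenset()
        return next(m for m in n if succ(m) == n)

    two = succ(succ(frozenset()))
    print(pr(two) == succ(frozenset()))   # True: pr(2) = 1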

V.1.17 Theorem (The Minimality Principle for ω, and for Any n ∈ ω).

(1) For ω: If A is a nonempty subset of ω, then there is an m ∈ A such that


¬(∃n ∈ A)n ∈ m.
(2) For any natural number k: If A is a nonempty subset of k, then there is
an m ∈ A such that ¬(∃n ∈ A)n ∈ m.

An element such as m is called an ∈-minimal element of A.

Proof. (1): Formally, we want to prove the theorem schema

    (∃m) P[m] → (∃m)(P[m] ∧ ¬(∃n)(P[n] ∧ n ∈ m))

By the argot conventions of V.1.5, the above is short for

    (∃x)(x ∈ ω ∧ P[x]) → (∃x)(x ∈ ω ∧ P[x] ∧ ¬(∃y)(y ∈ ω ∧ P[y] ∧ y ∈ x))

which is provable (for any P) by the axiom of foundation.



(2): Formally, since x ∈ k ↔ x ∈ ω ∧ x ∈ k is provable by V.1.13, we just
want to prove the theorem schema

    (∃m ∈ k) P[m] → (∃m ∈ k)(P[m] ∧ ¬(∃n ∈ k)(P[n] ∧ n ∈ m))

This translates to

    (∃m)(m ∈ k ∧ P[m]) → (∃m)(m ∈ k ∧ P[m] ∧ ¬(∃n)(n ∈ k ∧ P[n] ∧ n ∈ m))

and is provable by part (1).  □
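For finite subsets of ω the minimality principle can be observed directly: in the illustrative frozenset coding, an ∈-minimal element of a nonempty A is one that contains no other element of A. The sketch below is only a toy search, not the formal argument from foundation.

    # Illustration only: an ∈-minimal element of a nonempty finite A ⊆ ω.
    def eps_minimal(A):
        return next(m for m in A if not any(n in m for n in A))

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})
    print(eps_minimal({two, one}) == one)   # True: 1 is ∈-minimal in {1, 2}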

V.1.18 Metatheorem (Relating ω and N). There is a 1-1 correspondence


I : N → ω – where “ω” here denotes the “real” smallest inductive set – that
“translates the successor on N to the successor on ω”, namely,
I (0) = ∅
and, for n ≥ 0, n ∈ N,
I (n + 1) = I (n) ∪ {I (n)}

Proof. (In the metatheory.) Taking recursive (inductive) definitions over N for
granted,† a unique and total I , as defined by recursion in the statement of the
metatheorem, exists. Let us prove its other stated properties.
1-1-ness: By (metatheoretical) induction on n − m ≥ 1 (n, m in N) over N
we will prove that m < n → I(m) ∈ I(n), hence m ≠ n → I(n) ≠ I(m).
Basis. If n − m = 1, then I(n) = I(m) ∪ {I(m)}.
I.H. Assume the claim for n − m = k. Case n − m = k + 1: Now I(n) =
I(m + k + 1), so that I(m + k) ∈ I(n). By I.H., I(m) ∈ I(m + k) so that
I(m) ∈ I(n), since the sets I(i) are transitive.
Ontoness: By contradiction, let n ∈ ω be ∈-minimal such that n ∉ ran(I).‡
Now, n ≠ ∅, for ∅ ∈ ran(I). Thus (V.1.15), n = pr(n) ∪ {pr(n)}. Since pr(n) ∈ n
and n is minimal with the above property, pr(n) fails the property, that is,
pr(n) = I(m) for some m ∈ N. But then I(m + 1) = n, hence n ∈ ran(I); a
contradiction.  □

V.1.19 Remark. In Metatheorem V.1.18 we have established that the “real”


structures (N; 0, λx.x + 1; <) and (ω; ∅, λx.x ∪ {x}; ∈) are isomorphic in a

† See I.2.13, p. 26, for justification in a general setting. We will consider their formal counterparts
over ω shortly.
‡ By correctness and soundness of ZFC, the real ω satisfies the minimality principle, i.e., Theo-
rem V.1.17 is really true.

unique (uniqueness of I ) and “natural” way. This is the well-known result of


naı̈ve set theory that the order type of N is ω.

(1) The isomorphism I preserves the initial element (I (0) = ∅),


(2) it preserves the operation of successor (I (n + 1) = I (n) ∪ {I (n)}), and
(3) it preserves order (we proved above that m < n → I (m) ∈ I (n)).
(4) I uniquely names the formal natural numbers ∅, {∅}, . . . as 0, 1, . . . ;
moreover,
(5) I , informally, assigns to each formal natural number its number of elements:
It assigns 0 to ∅, which is correct, and if we assume that n ∈ N correctly
measures the number of elements in I (n) (induction over N), then n + 1 is
the number of elements of I(n) ∪ {I(n)}, for from I(n) ∉ I(n) it follows
that I (n) is a net new element added in passing from I (n) to I (n) ∪ {I (n)}.
(6) Further observing the “real” ω we extract one more piece of informa-
tion: We know that < on N satisfies trichotomy, i.e., for any n, m in N,
m < n ∨ m = n ∨ n < m is true. We know that ∈ does not satisfy trichotomy
on U M (think of an example); however, in view of the isomorphism I , we
expect that the transform of < (that is, ∈ restricted to the real ω) does satisfy
trichotomy. Indeed, continuing to argue in the metatheory, let m, n be in ω
such that m = n and let n  , m  in N be such that I (n  ) = n, I (m  ) = m. By
single-valuedness of I , n  = m  . Then n  < m  or m  < n  ; hence n ∈ m or
m ∈ n respectively. So (∀m, n)(m ∈ n ∨ m = n ∨ n ∈ m) is true in (the
real) ω.

By Gödel’s incompleteness theorem, there are really true sentences of the


language of set theory that are not provable in ZFC. However, trichotomy

(∀m, n)(m ∈ n ∨ m = n ∨ n ∈ m)

is not one of those. That this “truth” is formally provable within set theory we
will see shortly. First, let us summarize our position vs. ω:

Hold on a minute! How do we know that trichotomy is true in N? It positively is


the wrong reason to advance that this might be because N, the standard model
of PA, of course satisfies the PA axioms. That puts the cart well in advance
of the horse, since the formal PA is an afterthought, a symbol-shuffling game
we play within real mathematics. That is, we may rightly be accused here of
circular thinking: PA proves trichotomy just because we thought trichotomy
was really true, and so we “fixed” the PA axioms to derive it as a theorem!
Perhaps a more satisfying argument is that a real natural number indexes
(counts) the members of a string of strokes, “|”. Thus, if we have two natural

numbers m and n, we can associate with them two strings, | . . . | and | . . . |,


of the requisite numbers of strokes. We can think of such strings as the unary
notations for those numbers.
Now, “obviously”, if we superimpose the two sequences of strokes so that
the two leftmost strokes in each coincide with each other, then either the two
sequences totally coincide (case m = n), or the first sequence ends before the
second (is a “prefix” of the second; case m < n), or the “otherwise” obtains
(m > n).

Metamathematically the two structures

(N; 0, λx. x + 1; <) and (ω; ∅, λx. x ∪ {x}; ∈)

are indistinguishable (which is pretty much what “isomorphic” means). This


motivates the following behaviour on our part henceforth: From now on we will
enrich the argot we speak when we do (formal) set theory to include:

“set of natural numbers” to mean ω;


“natural number n” to mean n ∈ ω

(that is, we drop the qualification “formal”).


We will utilize the naming induced by the “external” (metamathematical)
1-1 correspondence I without any further special notice, reserving the right to
fall back to rigid notation (∅,{∅}, etc.) whenever we feel that the exposition
will benefit from us doing so. Specifically, n + 1 stands for the successor
n ∪ {n} in the context of natural numbers – in particular, we can write n + 1 =
n ∪ {n}.
n − 1 is another name for pr(n) whenever n ≠ ∅; we write 0 for ∅, 1 for {∅}
and, in general, n for {0, 1, . . . , n − 1} if n = 0.
We write < for ∈ whenever we feel that intuition will benefit from this
notation. Then, m ≤ n, i.e., m < n ∨ m = n is m ∈ n ∨ m = n; in short, m ⊆ n
due to the transitivity of n.
In examples, and metamathematical discussions in general, we will continue
to draw from our wider mathematical experience, and real sets such as N and
R will continue being used and talked about. All we have done here is to find
an isomorphic image of N in the real realm that is easily seen to have a formal
counterpart within the formal theory. We are not saying that this (real ω) is
really the set of natural numbers rather than that (N), as such a statement would
be meaningless (and pointless) mathematically. It is the job of the philosopher
to figure out exactly what are the natural numbers. The set theorist is content

with using sets whose elements behave like natural numbers for purposes that
include the ones articulated at the outset in this chapter. 

V.1.20 Theorem (Trichotomy on ω).

    ⊢_ZFC (∀n)(∀m)(m ∈ n ∨ m = n ∨ n ∈ m)

Proof. Recall V.1.5. We proceed by contradiction, so let us assume (or “add”)


instead

(∃n)(∃m)(¬m ∈ n ∧ ¬m = n ∧ ¬n ∈ m) (1)

By the minimality principle on ω (V.1.17), let n 0 be ∈-minimal† in ω such that‡

(∃m)(¬m ∈ n 0 ∧ ¬m = n 0 ∧ ¬n 0 ∈ m) (2)

Again, let m 0 be ∈-minimal such that

¬m 0 ∈ n 0 ∧ ¬m 0 = n 0 ∧ ¬n 0 ∈ m 0 (3)

We proceed to prove n 0 = m 0 , thus obtaining a contradiction.


Let x ∈ n 0 (which can also be written as “let i ∈ n 0 ” in view of V.1.13). By
minimality of n 0 , (2) yields (∀m)(m ∈ x ∨ m = x ∨ x ∈ m), and by specializa-
tion,

m0 ∈ x ∨ m0 = x ∨ x ∈ m0 (4)

Which ∨-clause is provable in (4)? Well, each of m 0 ∈ x and m 0 = x yields


m 0 ∈ n 0 (using transitivity of n 0 , V.1.12), which contradicts (3). It is then the
case that x ∈ m 0 , which proves n 0 ⊆ m 0 .
The symmetry of (3) suggests (check it!) that an entirely analogous argument
yields x ∈ m 0 → x ∈ n 0 and hence m 0 ⊆ n 0 . All in all, we have both (3) and
n 0 = m 0 , a contradiction. 

Because of trichotomy, ∈-minimal elements in ∅ ≠ A ⊆ ω are unique, for if
m ≠ n in A are both ∈-minimal, then we get the contradiction (to V.1.20)

¬m ∈ n ∧ ¬n ∈ m ∧ ¬m = n

† The qualification “in ω” can be omitted (indeed, it will be in the remainder of the proof) in view
of the naming convention of V.1.5.
‡ This uses proof by auxiliary constant, n 0 , between the lines. The reader was forewarned at
the beginning of Chapter IV that we will be increasingly using the “relaxed” proof style (see
also III.5.9, p. 148).

An ∈-minimal element m of A thus is also ∈-minimum (∈-least, ∈-smallest) or


just minimum (least, smallest) in the sense that x ∈ A → m ∈ x ∨ m = x, or –
to use our new argot – x ∈ A → m ≤ x. This is so by trichotomy, since x ∈ m
fails.

V.1.21 Theorem (Recursive Definitions over ω). Given a set A and a total
function g : ω × A → A in the sense of III.11.12. There exists a unique total
function f : ω → A that satisfies the following recursive definition:

    f(0) = a, where a ∈ A
    for n ≥ 0, f(n + 1) = g(n, f(n))                                    (R)

Proof. It is convenient to prove uniqueness first. Arguing by contradiction, let
f′ satisfy the identical recursive definition as f yet be different from f. Let m
be the least such that f(m) ≠ f′(m). By the basis of the definition (R), m ≠ 0,
and hence m − 1 ∈ ω and f(m − 1) = f′(m − 1). But then,

    f(m) = g(m − 1, f(m − 1))
         = g(m − 1, f′(m − 1))
         = f′(m)
contradicting the hypothesis. Uniqueness is settled. Indeed, the argument ap-
plies unchanged for recursive definitions with a natural number n as domain
(i.e., total f : n → A), since the minimality principle holds on n as well (V.1.17).
We address existence next.

Preamble. For the existence part one is tempted to argue as follows:

Argument. (R) gives the value of f at 0, namely a. So, if we take the I.H. that
f (n) is defined, then (R) (second equation) shows that f (n + 1) is also defined
(since g is total). By induction over ω, f is defined for all n ∈ ω, hence it exists.
The above argument is drastically off the mark. All we really have argued
about is that any f that happens to satisfy (R) also satisfies dom( f ) = ω, or,
“if an f satisfying (R) exists, then dom( f ) = ω”. Thus we have not proved
existence at all. After all, a function f does not need to be total in order to
“exist”.
The correct way to go about proving existence is to build “successive approx-
imations” of f by “finite” functions that satisfy (R) on their domain. Each of
these finite functions will have as domain some natural number n ∈ ω −{0}. To-
wards this purpose we relax the “for n ≥ 0” requirement in (R). It turns out that
these finite functions (if they exist) are pairwise consistent in that, for any two

of them, h and p, one has either h ⊆ p or p ⊆ h. Thus the union of all of them
is a function f . With a bit of extra work we show that f is total and satisfies (R).

Let

    F = { f : f is a function ∧ dom(f) ∈ ω − {0} ∧ f(0) = a ∧
              (∀k ∈ dom(f))(k ≠ 0 → f(k) = g(k − 1, f(k − 1))) }        (1)

A fact used twice below is that F ≠ ∅. For example, {⟨0, a⟩} ∈ F; also,
{⟨0, a⟩, ⟨1, g(0, a)⟩} ∈ F. The first of these two functions has domain equal
to 1, the second has domain equal to 2.

By the uniqueness part (applied to a function satisfying (R) on domain n),
for each n ∈ ω − {0} there is at most one f ∈ F with dom(f) = n, hence, by
collection (III.11.28), F is a set. So is then

    f̃ =def ⋃F                                                          (2)

Observe first that f̃ is a function: Let ⟨a, b⟩ ∈ f̃ and also ⟨a, c⟩ ∈ f̃. Then, by (2),
f(a) = b and f′(a) = c for some f and f′ in F. Without loss of generality,
applying trichotomy, dom(f) ∈ dom(f′).† By uniqueness, f = f′ ↾ dom(f),
since both sides of = satisfy the same recurrence on dom(f).
Thus, c = f′(a) = (f′ ↾ dom(f))(a) = f(a) = b, and therefore f̃ is single-
valued.
We next argue that f̃ satisfies the recurrence: Trivially, f̃(0) = a by {⟨0, a⟩} ∈
F. Now

    f̃(n + 1) = f(n + 1)       for some f ∈ F
             = g(n, f(n))
             = g(n, f̃(n))     by f ⊆ f̃

We finally show that f̃ is total on ω: Let instead m be least such that f̃(m) ↑.
Hence, m ≠ 0 (since f̃(0) ↓, due to {⟨0, a⟩} ∈ F) and f̃(m − 1) ↓. Thus,
f̃(m − 1) = f(m − 1) for some f ∈ F with dom(f) = m. Define h : m + 1 → A
by

    h(x) = f(x)                   if x ∈ m
    h(x) = g(m − 1, f(m − 1))     if x = m

Clearly h ∈ F (dom(h) = m + 1); thus f̃(m) = h(m), a contradiction.  □

† If dom(f) = dom(f′) then f = f′ by uniqueness, and hence b = c.



Although we have used trichotomy in the existence part, this can be avoided.
See Exercise V.4.
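The existence argument can also be mimicked in miniature: build the “finite approximations” of f (here as Python dictionaries with domain {0, . . . , n}) and take their union. The function g, the starting value, and the cut-off used below are arbitrary choices made only for this sketch; they are not from the text.

    # Illustration only: finite approximations to the f of (R), and their union.
    def approximations(a, g, cutoff):
        f = {0: a}                        # the member of F with domain 1
        yield dict(f)
        for n in range(cutoff):
            f[n + 1] = g(n, f[n])         # one more step of (R)
            yield dict(f)

    g = lambda n, x: x + n + 1            # so f(n) = 0 + 1 + ... + n
    union = {}
    for piece in approximations(0, g, 5):
        union.update(piece)               # the pieces are pairwise consistent
    print(union)                          # {0: 0, 1: 1, 2: 3, 3: 6, 4: 10, 5: 15}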

We apply the theorem on recursive definitions to define addition on ω.

V.1.22 Definition (Addition of Natural Numbers). Define f_m : ω → ω for
each m ∈ ω by

    f_m(0) = m
    f_m(n + 1) = f_m(n) + 1  □

V.1.23 Remark. Of course, for each m, f_m is a set by separation (f_m ⊆ ω × ω).
The class

    {⟨m, n, f_m(n)⟩ : ⟨m, n⟩ ∈ ω²}                                      (1)

is a set by collection (III.8.9). It is also single-valued in the second projection,
for ⟨m, n⟩ = ⟨m′, n′⟩ implies m = m′, and hence (uniqueness – V.1.21) also
f_m = f_m′. Finally, this implies f_m(n) = f_m′(n′), for our assumption yields
n = n′, and the f_m are functions.
We will call the set (1) “+”, as tradition has it. That is, + : ω × ω → ω
satisfies (i.e., we can prove in ZFC) the following:

    m + 0 = m
    m + (n + 1) = (m + n) + 1
In form, as it was dictated by intuition, the recursive definition of addition on
ω is identical to the recursive definition of addition on N (only the interpretation
of the nonlogical symbols changes). Naturally we expect addition on ω to enjoy
the same properties as that over N.
We note that the “+” just introduced is an informal abbreviation for a subset
of ω³ (given by the set term (1) above) that also happens to be an important
function. If we wish (we do not, however), we can also introduce a new formal
function symbol, say, “f₊” by the explicit definition

    f₊(x, y) = x + y   if x ∈ ω ∧ y ∈ ω
    f₊(x, y) = ∅       otherwise

The reader will recall that we bow to the requirement that formal function
symbols be total functions upon interpretation (e.g., over the standard model,
which here has as underlying “set” the proper class U_M). This is the reason for
the “otherwise” part. Trivially,

    ⊢_ZFC x ∈ ω → f₊(x, 0) = x

and

    ⊢_ZFC x ∈ ω → (y ∈ ω → f₊(x, y ∪ {y}) = f₊(x, y) ∪ {f₊(x, y)})  □
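The recursion of V.1.23 can be run directly on the frozenset-coded naturals used in the earlier sketches; the helper num below plays the role of the correspondence I of V.1.18. Again this is only an illustration of the definition, not the formal object “+”.

    # Illustration only: m + 0 = m, m + (n + 1) = (m + n) + 1 on coded naturals.
    def succ(x):
        return frozenset(x | {x})

    def num(k):                           # the coded natural k
        n = frozenset()
        for _ in range(k):
            n = succ(n)
        return n

    def add(m, n):
        if not n:                         # n = 0
            return m
        p = next(q for q in n if succ(q) == n)   # p = n - 1
        return succ(add(m, p))

    print(add(num(2), num(3)) == num(5))               # True
    print(add(num(2), num(3)) == add(num(3), num(2)))  # True, as V.1.24 below predicts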

V.1.24 Proposition (Commutativity of Addition).

    ⊢_ZFC (∀n)(∀m)(m + n = n + m)

Proof. We do induction on n.
Basis. n = 0. We want to prove

    (∀m)(m + 0 = 0 + m)                                                 (2)

Anticipating success, this will also entail (by commutativity of equality and the
Leibniz rule)

    (∀m)(0 + m = m + 0)                                                 (2′)

Now ⊢_ZFC (∀m)(m + 0 = m) by V.1.23. It suffices to prove

    (∀m)(0 + m = m)

Regrettably, this requires an induction on m:
Basis. m = 0. That ⊢_ZFC 0 + 0 = 0 follows from V.1.23, which settles the
basis (of the m-induction).
Let (I.H. on m) 0 + m = m for frozen m. Then (using “=” conjunctionally
throughout this proof)
0 + (m + 1) = (0 + m) + 1 by V.1.23
= m+1 by I.H. on m
This finally settles the basis (for n), namely, (2).
Assume now (I.H. on n) that
(∀m)(m + n = n + m) (3)
with frozen n. We embark on proving
(∀m)(m + (n + 1) = (n + 1) + m) (4)
We prove (4) by induction on m.
Basis. For m = 0 we want

0 + (n + 1) = (n + 1) + 0

which is provable by (2′) above via specialization.



We take now an I.H. on m, that is, we add the assumption

m + (n + 1) = (n + 1) + m (5)

with frozen m (n was already frozen when we took assumption (3))


We embark on the final trip, i.e., to prove

(m + 1) + (n + 1) = (n + 1) + (m + 1)

Well,

(n + 1) + (m + 1) = ((n + 1) + m) + 1 by V.1.23
= (m + (n + 1)) + 1 by I.H. on m (5)
= ((m + n) + 1) + 1 by V.1.23
= ((n + m) + 1) + 1 by I.H. on n: (3) and specialization
= (n + (m + 1)) + 1 by V.1.23
= ((m + 1) + n) + 1 by I.H. on n: (3) and specialization
= (m + 1) + (n + 1) by V.1.23

(4) is now settled. 

The reader has just witnessed an application of the dreaded double induction.
That is, to prove

(∀n)(∀m) P (m, n) (6)

one starts, in good faith, an induction on n. En route it turns out that in order to
get unstuck one has to do an induction on m as well, towards proving the basis
(∀m) P (m, 0) and the induction step (∀m) P (m, n) → (∀m) P (m, n + 1).
The good news is that it is not always necessary to do a double induction in
order to prove something like (6). See for example the proof of the next result.

The reader can prove the associativity of + (Exercise V.6), which we take
for a fact from now on. Since, intuitively, n = {0, 1, . . . , n −1}, then, intuitively,
n + m = {0, 1, . . . , n − 1, n + 0, n + 1, . . . , n + (m − 1)} . That is, to obtain
n + m we “concatenate” to the right of n the elements of m “shifted” by n.
Formally this is true:

V.1.25 Theorem.

    ⊢_ZFC (∀n)(∀m)(n + m = n ∪ {n + i : i ∈ m})

Proof. By generalization, it suffices to prove

(∀m)(n + m = n ∪ {n + i : i ∈ m})

by induction on m.
Basis. For m = 0 (i.e., m = ∅) the claim amounts to n + 0 = n ∪ ∅, while,
trivially, ⊢_ZFC n ∪ ∅ = n. Done by V.1.23.
I.H. Assume

n + m = n ∪ {n + i : i ∈ m}

for frozen m and n. We look at the case m + 1: The left hand side is

    n + (m + 1) = (n + m) + 1              by V.1.23
                = (n + m) ∪ {n + m}        (expanding “+1”)             (1)

The right-hand side is

n ∪ {n + i : i ∈ m ∪ {m}} = n ∪ {n + i : i ∈ m} ∪ {n + m}
= (n + m) ∪ {n + m} by I.H.

By (1), we are done. 
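For instance, with n = 2 and m = 3 the theorem gives 2 + 3 = {0, 1} ∪ {2 + 0, 2 + 1, 2 + 2} = {0, 1, 2, 3, 4} = 5: the elements of 3, shifted by 2, are concatenated to the right of the elements of 2.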

V.1.26 Theorem.

    ⊢_ZFC (∀n)(∀m)(n ≤ m → (∃!i) n + i = m)

Proof. Existence. We argue by contradiction, so we add the assumption

    (∃n)(∃m)(n ≤ m ∧ ¬(∃i) n + i = m)

Let then – invoking proof by auxiliary constant between the lines, as in the
proof of V.1.20 – n₀ be smallest such that†

    (∃m)(n₀ ≤ m ∧ ¬(∃i) n₀ + i = m)                                     (1)

and next let m₀ be smallest such that

    n₀ ≤ m₀ ∧ ¬(∃i) n₀ + i = m₀                                         (2)



† That is, “add the new constant n₀ and the assumption (1) along with k < n₀ → ¬(∃m)(k ≤ m ∧
¬(∃i) k + i = m)”.

Because of ⊢_ZFC n₀ + 0 = n₀, (2) yields ¬ m₀ = n₀: otherwise (i.e., adding the
assumption m₀ = n₀) we get ⊢_ZFC n₀ + 0 = m₀ and hence ⊢_ZFC (∃i) n₀ + i = m₀
by the substitution axiom (Ax2), contradicting (2).
Thus, from (2) (first conjunct, via V.1.19, p. 242), ⊢_ZFC n₀ ∈ m₀; hence
⊢_ZFC m₀ ≠ 0. It follows that m₀ = pr(m₀) ∪ {pr(m₀)} (V.1.15); therefore
⊢_ZFC n₀ ≤ m₀ − 1 (V.1.19, p. 242). By minimality of m₀, there is an (auxiliary
constant) i such that n₀ + i = m₀ − 1; hence we have a ZFC proof of m₀ =
(n₀ + i) + 1 = n₀ + (i + 1) (employing “=” conjunctionally), contradicting
the property (2) of m₀.
Uniqueness. Let n₀ be smallest such that

    n₀ + i = n₀ + j                                                     (3)

for some i ≠ j. Now n₀ ≠ 0, for ⊢_ZFC i = 0 + i and ⊢_ZFC 0 + j = j (V.1.24).
Thus we can write (3) (using commutativity) as (i.e., we can prove)

    (n₀ − 1) + (i + 1) = (n₀ − 1) + (j + 1)

Since i + 1 ≠ j + 1 (Lemma V.1.9), we have just contradicted the minimality
of n₀.  □

V.1.27 Definition (Difference of Natural Numbers). By V.1.26, the relation
over ω below – which we denote by the informal symbol “−”,

    {⟨m, n, i⟩ : m = n + i}

is single-valued in the second projection, that is, a function

    − : ω² → ω

in the sense of III.11.12, called difference. By V.1.26,

    ⊢_ZFC n ≤ m → m − n ↓

and

    ⊢_ZFC n ≤ m → m = n + (m − n)

while

    ⊢_ZFC m < n → m − n ↑  □

(1) We are painfully aware of the multiple meanings of the symbol “−” in
set theory as set difference and, now, natural number difference, but such
“overloading” of symbol meaning is common in mathematics.

(2) The difference between natural numbers does not coincide with the set
difference of the two numbers. For example, in the former sense, 2 − 1 =
1 = {0}, while in the latter sense 2 − 1 = {0, 1} − {0} = {1}. The context
will alert the reader if we (ever) perform m − n in the “set sense” rather
than the (normally) “natural number sense”.
(3) Number difference is consistent with the earlier introduction of “−” in the
context of predecessor. In the former sense, if n ≥ 1, then n −1 is the unique
number m such that m ∪ {m} = n, i.e., m + 1 = n; that m is precisely the
predecessor of n.

V.1.28 Definition (Finite Sequences). A finite sequence is a function f such


that dom( f ) ∈ ω.
If dom( f ) = 0, then f is the empty sequence. The length of the sequence f
is dom( f ). If i ∈ dom( f ), then f (i) is the ith element of the sequence. 

Intuitively, a sequence f is [f(0), . . . , f(n − 1)] where n = dom(f) ≠ 0.


If dom( f ) = 0, then we have the empty sequence [ ].

We wish to distinguish between vectors and finite sequences although in a sense


the two work the same way. They both give “order information” for a (finite)
set. The technical differences are as follows:

(1) The vector ⟨a, b⟩ is the set {a, {a, b}}, while the sequence [a, b] is the
set {⟨0, a⟩, ⟨1, b⟩} = {{0, {0, a}}, {1, {1, b}}}; thus they are different as
sets.
(2) The vector ⟨x₁, . . . , xₙ⟩ has the informal n in its name, so that n cannot
be manipulated by the formalism.† Thus ⟨x₁, . . . , xₙ⟩ is much like a set
of unrelated variables X 1, . . . , X n in a programming language, while a
sequence f = {⟨0, x₁⟩, . . . , ⟨n − 1, xₙ⟩} not only gives the same positional
information, but also behaves like an array f(i) for i = 0, . . . , n − 1 in a
programming language; for the i in f(i) has formal status.

We often need to juxtapose or concatenate two sequences f and g to obtain


[ f (0), . . . , f (n − 1), g(0), . . . , g(m − 1)] where dom( f ) = n, dom(g) = m.
Intuitively, the concatenation of f and g, in that order, is a sequence, for we

† One can of course revisit the definition of ⟨. . .⟩ and redefine it in terms of the formal numbers n.
Such rewriting of history will be unwise in view of the commotion it will create. As it stands we
are doing fine: The original definition allowed the theory to bootstrap itself up to a point where
the present more general and more flexible definition of sequence was given.

can write it as [f(0), . . . , f(n − 1), g̃(n + 0), . . . , g̃(n + (m − 1))], where
g̃(n + i) = g(i) for all i ∈ dom(g) and undefined otherwise. That is, the
concatenation is the function f ∪ g̃.

V.1.29 Definition (Concatenation of Finite Sequences). If f and g are finite
sequences, then the relation over ω given by

    f ∪ {⟨dom(f) + i, g(i)⟩ : i ∈ dom(g)}

– and denoted by f ∗ g – is called their concatenation, in the order f followed
by g.  □
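Coding finite sequences as dictionaries with domain {0, . . . , n − 1}, Definition V.1.29 becomes a two-line routine. The sketch below is illustrative only; the example sequences are invented.

    # Illustration only: the concatenation f ∗ g of V.1.29 on dictionary-coded sequences.
    def concat(f, g):
        h = dict(f)
        for i, v in g.items():
            h[len(f) + i] = v             # the pair <dom(f) + i, g(i)>
        return h

    f = {0: 'a', 1: 'b'}                  # the sequence [a, b]
    g = {0: 'c'}                          # the sequence [c]
    print(concat(f, g))                   # {0: 'a', 1: 'b', 2: 'c'}, i.e., [a, b, c]

In particular the result has domain of size len(f) + len(g), as Proposition V.1.30 below asserts.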

V.1.30 Proposition. For any two finite sequences f and g, f ∗ g is a finite


sequence of length dom( f ) + dom(g).

Proof. Observe that dom(f) ⊆ dom(f) + i (by V.1.25); hence dom(f) + i ∉
dom(f). In other words the domains of f and {⟨dom(f) + i, g(i)⟩ : i ∈ dom(g)}
are disjoint.
Thus, f ∗ g is a function. Its domain, in view of the previous comment, is
dom(f) ∪ {dom(f) + i : i ∈ dom(g)}, which is dom(f) + dom(g) by V.1.25.  □


V.1.31 Corollary. If f is the empty sequence and g is some sequence, then


f ∗ g = g ∗ f = g.

f ∗ g ≠ g ∗ f in general: Exercise V.10.

V.1.32 Proposition. ⊢_ZFC ¬ n < 0.

Proof. That is, ⊢_ZFC ¬ n ∈ ∅.  □

V.1.33 Proposition. ⊢_ZFC n < m + 1 ↔ n < m ∨ n = m.

Proof. That is, ⊢_ZFC n ∈ m ∪ {m} ↔ n ∈ m ∨ n = m.  □

Some more arithmetic over ω is delegated to the Exercises section.

The reader who has read volume 1, Chapter II, now armed with V.1.8, V.1.9,
V.1.23, Exercise V.11 (which introduces multiplication over ω), V.1.20, V.1.32,

and V.1.33 along with the induction principle over ω, will see with a minimum
of effort or imagination that the Gödel incompleteness theorems hold for ZFC –
a fact that we took as a given in many earlier discussions.
Indeed, one need only carry out the formal Gödel numbering of volume 1,
Chapter II, within ZFC (rather than within PA) using terms t of ZFC that
(provably) satisfy t ∈ ω as Gödel numbers of formulas, terms, and proofs in
(any extension of ) L Set . In this endeavour the proved results (for ω) that we
enumerated above – suggesting them as appropriate ammunition – play the role
of ROB and induction, which – in arithmetic – were assumed axiomatically in
volume 1.
Moreover, if Γ denotes the set of individual ZFC axioms,† then it is easy
to prove that the corresponding formula Γ(x) is recursive. Everything else has
already been done in the aforementioned chapter.

V.2. Algebra of Relations; Transitive Closure


V.2.1 Informal Definition. Let P and S be two relations. Then P ◦ S, their
composition in that order, is defined by

    y P ◦ S x   stands for   (∃z)(y P z ∧ z S x)

or, equivalently,

    y ∈ (P ◦ S)x   abbreviates   (∃z ∈ Sx) y ∈ Pz

We are adopting the notational convention that “P ◦ Sx” means “(P ◦ S)x”,
that is, we render the use of brackets redundant.  □

V.2.2 Remark. Intuitively, y ∈ P ◦ Sx iff there is a “stepping stone”, z, such


that S sends x to z and P sends the latter to y.
If S is a function, then y P S(x) (that is, ⟨S(x), y⟩ ∈ P – the reader may
wish to review notational issues; see III.11.4 and III.11.14) means, by Defini-
tion III.11.16, (∃z)(z = S(x) ∧ y P z). This says the same thing as y P ◦ S x by
Definition V.2.1 (remembering that z = S(x) iff z S x for functions – III.11.14).


V.2.3 Lemma. For any relations P and S and all x, P ◦ Sx = P[Sx].

† Recall that separation and collection denote infinitely many axioms, and so does foundation in
the form we have adopted, although the latter can be replaced by a single axiom.

Proof. ⊆: Let y ∈ P ◦ Sx. Then, for some z, y ∈ Pz ∧ z ∈ Sx.† That is,
y ∈ P[Sx] (Definition III.11.4).
⊇: Let y ∈ P[Sx]. Then, for some z ∈ Sx, one has y ∈ Pz; hence
y ∈ P ◦ Sx by V.2.1. 

V.2.4 Corollary. For any relations P and S and all X, P ◦ S[X] = P[S[X]].

Lemma V.2.3 justifies – at long last – the convention of writing “y P x” for


“⟨x, y⟩ ∈ P”.
Of course, the “standard” convention is to write instead x P y, but it has a
serious drawback: For functions f, g viewed as special cases of relations – thus
using notation acceptable for relations – x f ◦ g y would mean that x is input
for f whose output is input to g; the latter yields y. In short, y = g( f (x)).
“Standard” notation goes a step further to write y = g ◦ f (x), thus introducing
the well-known (from elementary discrete mathematics texts) “reversal” from
f ◦ g (when we are composing f and g “viewed as relations”) to g ◦ f (when
we are composing f and g “viewed as functions”).
On the other hand, at the cost of being a bit unconventional at the outset,
when y P x was defined, we proposed a convention that holds uniformly for
relations and functions when it comes to composition. This (by Lemma V.2.3)
says “in y P ◦ S x, S acts first, on input x; one of its outputs, if it is inputed to
P, will cause output (possibly among other things) y”.

V.2.5 Example. Let R = {⟨1, 2⟩, ⟨1, 3⟩} and S = {⟨1, 1⟩, ⟨2, 1⟩}. Then

x S ◦ R y iff (∃z)(x S z ∧ z R y)
          iff (∃z)(⟨y, z⟩ ∈ R ∧ ⟨z, x⟩ ∈ S)

Thus S ◦ R = {⟨1, 1⟩}. On the other hand, one similarly calculates that R ◦ S =
{⟨1, 2⟩, ⟨1, 3⟩, ⟨2, 2⟩, ⟨2, 3⟩}.
Therefore, in general, ZFC ⊬ S ◦ R = R ◦ S.
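Such computations with finite relations are easy to mechanize. The following Python sketch is informal and entirely outside the formal theory (the function name compose is ours); it follows the convention of V.2.1 that in P ◦ S the relation S acts first:

    # Finite relations as Python sets of ordered pairs (x, y); per our convention,
    # (x, y) in P is written "y P x", so in P o S the relation S acts first.
    def compose(P, S):
        # P o S = { (x, y) : for some z, (x, z) in S and (z, y) in P }
        return {(x, y) for (x, z) in S for (w, y) in P if w == z}

    R = {(1, 2), (1, 3)}
    S = {(1, 1), (2, 1)}
    print(compose(S, R))    # {(1, 1)}                          = S o R
    print(compose(R, S))    # {(1, 2), (1, 3), (2, 2), (2, 3)}  = R o S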

V.2.6 Lemma. For any relations P, S, T

P ◦ (S ◦ T) = (P ◦ S) ◦ T

that is, composition is associative.

† The reader who may long for the earlier tediously formal proof style will note that “z” can be
thought of as the name of an auxiliary constant here.

Digression. By the (informal) definition of class equality (cf. III.4.7 and III.4.8)
and the deduction theorem, a formula A = B is proved by “letting x ∈ A”
(frozen x) and then proving x ∈ B to settle “⊆”, subsequently repeating these
steps with the roles of A and B reversed. We have already employed this tech-
nique in the proof of V.2.3.
When we deal with relations P and S and we want to prove P = S, the above
technique translates to “letting” x P y in the “⊆-direction”. The reason is that
one really “lets”
z∈P (1)
Then (cf. III.11.1), OP(z) follows, that is,
(∃u)(∃v)⟨u, v⟩ = z     (2)
Letting now x and y be auxiliary (new) constants, we can add the assumption
⟨y, x⟩ = z so that (1) becomes
xPy (3)
With some work, one then proves x S y, that is, z ∈ S. This settles P ⊆ S.
Thus, in practice, one is indeed justified in suppressing the steps (1)–(2) and
start by “letting” (3).

Proof. ⊆: Let xP ◦ (S ◦ T)y. Then (by V.2.1)


(∃z)(x P z ∧ z(S ◦ T)y)
Hence (by V.2.1)
 
(∃z) x P z ∧ (∃w)(z S w ∧ w T y)
Hence (w is not free in x P z)
(∃w)(∃z)(x P z ∧ z S w ∧ w T y)
Hence (z is not free in w T y)
 
(∃w) (∃z)(x P z ∧ z S w) ∧ w T y
Hence (by V.2.1)
 
(∃w) (xP ◦ Sw) ∧ w T y
Hence (by V.2.1)
x(P ◦ S) ◦ Ty
The case for ⊇ is entirely analogous. 

In view of the associativity of ◦, brackets are redundant in a chain of composi-


tions involving the same relation P. In particular,

P ◦ · · · ◦ P     (n copies)

depends only on P and the number of copies, n. It should be naturally denoted


by Pn . Here we have been thinking in informal (metamathematical) terms, i.e.,
n ∈ N.
We can say the same thing by an informal inductive definition (over N):

V.2.7 Tentative Definition (“Positive” Powers of a Relation). Let P be any


relation. Then
P^1 =def P
P^{n+1} =def P^n ◦ P     for any n ∈ N such that n > 0

This tentative definition is acceptable, but it has the drawback that it hides
n in the name, as we have already discussed in the preamble of Section V.1.
We can fix this easily, if P is a set, by making V.2.7 into a formal recursive
definition of a function n ↦ P^n on ω,† replacing “n ∈ N” by “n ∈ ω”.‡
However we want to afford our exposition the generality that P may be a
proper class. Intuitively, x P^n y (n ∈ ω and n ≠ 0) should mean that for some
sequence [ f 0 , f 1 , . . . , f n ],

f 0 P f 1 P · · · P f n−1 P f n

where x = f 0 and y = f n .

V.2.8 Definition (“Positive” Powers of a Relation). For any relation P (pos-


sibly a proper class), and any n ∈ ω − {0}, the relation Pn is defined by

x P^n y abbreviates n ∈ ω − {0} ∧ (∃f)(f is a function ∧ dom(f) = n + 1 ∧
f(0) = x ∧ f(n) = y ∧ (∀j)(j < n → f(j) P f(j + 1)))


The reader already knows how to express “ f is a function” within set theory.

† Technically, we then also need a meaning for P^0, i.e., a value of the defined function at 0. As
such we can take, for example, {⟨x, x⟩ : x ∈ field(P)}.
‡ The requirement that P be a set makes the pairs ⟨n, P^n⟩ of the recursively defined function
meaningful, for the two components of a pair must be sets or atoms.

V.2.9 Definition (Relational Identity). The identity or diagonal relation on


the class A is 1A : A → A (also denoted ∆A : A → A) given by 1A =
{⟨x, x⟩ : x ∈ A}. Thus, 1A is a total function A → A.
If A is understood, then we simply write 1 or ∆.†
For any P : A → A we let P0 abbreviate 1A . 

Some people define P0 as {⟨x, x⟩ : x = x}, but we prefer to make P0 context-


sensitive. We note that Pn for n > 0 does not depend on the context A in which
P was given (as P : A → A).
Thus, if we have a relation R on a set A, then R 0 is the set {⟨x, x⟩ : x ∈ A}
rather than the proper class {⟨x, x⟩ : x = x}.

Pause. Why is {⟨x, x⟩ : x = x} a proper class?

V.2.10 Example. Let A = {1, 2, 3}. Then 1A = ∆A = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}.

V.2.11 Lemma. For each P : A → A, one has P ◦ ∆ = ∆ ◦ P = P.

Proof. We have

y ∈ P ◦ ∆x

iff (Lemma V.2.3)

y ∈ P[∆x]

iff (Definition V.2.9)

y ∈ P[{x}]

iff

y ∈ Px

Thus P ◦ ∆ = P. Similarly, ∆ ◦ P = P. 

V.2.12 Proposition. For any relation P,

ZFC ⊢ P^1 = P
ZFC ⊢ P^{n+1} = P^n ◦ P     for any n ∈ ω − {0}

† The context will guard against confusing this ∆ with that of volume 1, Chapter II.

Proof. By V.2.8, x P1 y abbreviates



(∃f)(f is a function ∧ dom(f) = 2 ∧
f(0) = x ∧ f(1) = y ∧ (∀j)(j < 1 → f(j) P f(j + 1)))     (1)

Since ZFC ⊢ j < 1 ↔ j = 0 (recall that “j < 1” means “j ∈ {∅}”), the one
point rule (I.6.2) yields at once that (1) is provably equivalent to the following:

(∃f)(f is a function ∧ dom(f) = 2 ∧
f(0) = x ∧ f(1) = y ∧ f(0) P f(1))     (2)

(2), in turn, is provably equivalent to x P y.


To see the truth of the last claim, in view of (2) we introduce a new constant
f 0 and the assumption

f 0 is a function ∧ dom( f 0 ) = 2 ∧
(3)
f 0 (0) = x ∧ f 0 (1) = y ∧ f 0 (0) P f 0 (1)

the last three conjuncts of which yield x P y.


Conversely, assuming x P y, we get trivially

{⟨0, x⟩, ⟨1, y⟩} is a function ∧ dom({⟨0, x⟩, ⟨1, y⟩}) = 2 ∧
x = x ∧ y = y ∧ x P y     (4)

which yields (2) by the substitution axiom Ax2. This settles P1 = P.

Next, assume

n>0 (5)

and

x Pn+1 y (6)

These imply (by V.2.8)



n > 0 ∧ (∃f)(f is a function ∧ dom(f) = n + 2 ∧
f(0) = x ∧ f(n + 1) = y ∧ (∀j)(j < n + 1 → f(j) P f(j + 1)))     (7)

Note that x Pn+1 y (cf. V.2.8) contributes the redundant (by V.1.8; hence not
included in (7)) conjunct n + 1 > 0. By V.1.33 and employing tautological
equivalences, distributivity of ∀ over ∧, and the one point rule, (7) is provably

equivalent to

n > 0 ∧ (∃f)(f is a function ∧ dom(f) = n + 2 ∧ f(0) = x ∧ f(n + 1) = y ∧
(∀j)(j < n → f(j) P f(j + 1)) ∧ f(n) P f(n + 1))     (8)

(8) allows the introduction of a new constant h and of the accompanying
assumption

h is a function ∧ dom(h) = n + 2 ∧ h(0) = x ∧
(∀j)(j < n → h(j) P h(j + 1)) ∧ h(n) P y     (9)

or, setting g = h ↾ (n + 1) – which implies g(n) = h(n) in particular –

g is a function ∧ dom(g) = n + 1 ∧ g(0) = x ∧ g(n) = h(n) ∧
(∀j)(j < n → g(j) P g(j + 1)) ∧ h(n) P y
which, by (5) and the substitution axiom, yields x Pn h(n) P y. Hence x Pn ◦P y.
The reader will have no trouble establishing the converse. 

Proposition V.2.12 captures formally the essence of Tentative Definition V.2.7


for any P (even a proper class), avoiding the technical difficulty (indeed, im-
possibility) of defining a proper-class-valued function.
We observe that for a relation P on A, Definition V.2.9 is in harmony
with V.2.12 in the sense that

P0+1 = P1
=P by V.2.12

and

P0 ◦ P = ∆ ◦ P
=P by V.2.11

so that in this case ZFC ⊢ P^{n+1} = P^n ◦ P for n ≥ 0.
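Informally, and only as an illustration (the function names are ours), the powers of a finite relation can be computed by literally iterating composition, exactly as V.2.12 prescribes:

    # Informal sketch: P^n for a finite relation, by V.2.12 (names ours).
    def compose(P, S):
        return {(x, y) for (x, z) in S for (w, y) in P if w == z}

    def power(P, n):
        # P^1 = P,  P^(n+1) = P^n o P
        result = P
        for _ in range(n - 1):
            result = compose(result, P)
        return result

    P = {(1, 2), (2, 3), (3, 1)}
    print(power(P, 3))    # {(1, 1), (2, 2), (3, 3)}: three P-steps return to the start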

V.2.13 Lemma (The Laws of Exponents). For any P : A → A and j, m in


ω,
(a) ZFC ⊢ P^j ◦ P^m = P^{j+m}
(b) ZFC ⊢ (P^m)^j = P^{mj}.†

† For multiplication of natural numbers see Exercise V.11.



In the statement of the lemma, as is the normal practice, we use “implied multi-
plication”, i.e., “m j” means m · j. We will also follow the standard convention
that “·” has smaller scope (or higher “priority”) than “+”, so that m + n j means
m + (n j).

Proof. (a): Induction on m. The case m = 0 requires the provability of P j ◦ 1 =


P j , which is indeed the case by Lemma V.2.11. Now for the case m + 1 via the
m-case using “=” conjunctionally:

P j ◦ Pm+1 = P j ◦ (Pm ◦ P) (by V.2.12)


= (P j ◦ Pm ) ◦ P (by associativity)
= P j+m ◦ P (by I.H.)
= P j+m+1 (by V.2.12)

(b): Induction on j. The case j = 0 requires that (Pm )0 = P0 (i.e., 1 = 1).


For the case j + 1, via the j-case,

(Pm ) j+1 = (Pm ) j ◦ Pm (by Proposition V.2.12)


= Pm j ◦ Pm (by I.H.)
= Pm j+m (by case (a))
= Pm( j+1) (by Exercise V.11) 

By V.2.13(a), ZFC ⊢ P^j ◦ P^m = P^m ◦ P^j for any P : A → A and j, m in ω. That


is, powers of a relation commute with respect to composition.

V.2.14 Definition. A relation P is


(a) symmetric iff for all x, y, x P y implies y P x.
(b) antisymmetric iff for all x, y, x P y ∧ y P x implies x = y.
(c) transitive iff for all x, y, z, x P y ∧ y P z implies x P z.
(d) irreflexive iff for all x, y, x P y implies x ≠ y.
(e) reflexive on A iff (∀x ∈ A)x P x. 

(1) It is clear that (a) above says more, namely “P is symmetric iff x P y ↔
y P x”, for the names x, y can be interchanged in the definition.
(2) All concepts except reflexivity depend only on the relation P, while reflex-
ivity is relative to a class A. If P : A → A and if it is reflexive on A, we
usually say just “reflexive”.
Reflexivity on A clearly is tantamount to ∆A ⊆ P.

V.2.15 Example. ∅ is all of (a)–(d). It satisfies (e) only on ∅. 



V.2.16 Example. ∆A : A → A is all of (a)–(c). It fails (d). It satisfies (e) on A.


Two obvious examples of irreflexive relations are ⊂ on classes and < on ω,
or, in the real realm, N.

V.2.17 Example (Informal). The congruence modulo m on Z is defined for


m > 1 (m ∈ Z) by

a ≡m b iff m |a − b

where “x | y” means that (∃z)(x · z = y).


In number theory, rather than “. . . ≡m . . . ” one uses the “dismembered”
symbol “. . . ≡ . . . (mod m)”.
We verify that ≡m : Z → Z is reflexive, symmetric, and transitive but not
antisymmetric.†
Indeed, m | a−a for all a ∈ Z, which settles reflexivity. m | a−b → m | b−a
for all a, b settles symmetry. For transitivity we start with m | a −b and m | b−c,
that is, a − b = km and b − c = r m for k and r in Z. Thus, a − c = (k + r )m;
therefore a ≡m c.
To see that antisymmetry fails, just consider the fact that while 0 ≡m m and
m ≡m 0, still 0 ≠ m.
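As an informal aside (all names here are ours), these properties are easy to spot-check mechanically for a fixed modulus:

    # Informal spot check of Example V.2.17 (names ours).
    def cong(m):
        # a and b are related iff m divides a - b
        return lambda a, b: (a - b) % m == 0

    eq5 = cong(5)
    print(eq5(12, 2), eq5(12, 3))                # True False
    # antisymmetry fails: 0 and 5 are related both ways, yet 0 != 5
    print(eq5(0, 5) and eq5(5, 0) and 0 != 5)    # True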

V.2.18 Example (Informal). ≤: N → N (“less than or equal” relation) is re-


flexive, antisymmetric, and transitive, but not symmetric. For example, 3 ≤ 4
but 4 ≰ 3.‡

V.2.19 Example. Let R = {⟨1, 2⟩, ⟨2, 1⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}
on the set {1, 2, 3}. R is reflexive and symmetric, but is not antisymmetric or
transitive.

V.2.20 Proposition. P is symmetric iff P = P−1 (cf. III.11.4). P is transitive


iff P2 ⊆ P.

Proof. For symmetry: If part. Let P = P−1 and x P y. Then x P−1 y as well;
therefore y P x, so that P is symmetric.
Only-if part. ⊆: Let P be symmetric and x P y. It follows that y P x; therefore
x P−1 y (by the definition of P−1 ).

† It goes almost without saying that no relation can be irreflexive and reflexive on a nonempty
class.
‡ It is usual, whenever it is typographically elegant, to denote the negation of . . . P . . . , i.e.,
¬ . . . P . . . , by . . .  P . . . , for any relation P.

This settles ⊆. The case for ⊇ is entirely analogous.


For transitivity: If part. Let P2 ⊆ P and x P y P z.† Thus (definition of “◦”)
x P2 z; hence x P z, so that P is transitive.
Only-if part. Assume transitivity and let x P2 y. Thus (definition of “◦”)
x P z P y for some z (auxiliary constant). By transitivity, x P y; hence P2 ⊆ P.


V.2.21 Example. For any relations P and S, ZFC ⊢ (P ∪ S)−1 = P−1 ∪ S−1 .
Indeed, let ⟨x, y⟩ ∈ (P ∪ S)−1 . Then

⟨y, x⟩ ∈ P ∪ S

Hence

⟨y, x⟩ ∈ P ∨ ⟨y, x⟩ ∈ S

Hence

⟨x, y⟩ ∈ P−1 ∨ ⟨x, y⟩ ∈ S−1

Hence

⟨x, y⟩ ∈ P−1 ∪ S−1

This settles ⊆. The argument can clearly be reversed to establish ⊇. 

V.2.22 Definition (Closures). Given P.

(1) The reflexive closure of P with respect to A, rA (P), is the ⊆-smallest relation
S that is reflexive on A such that P ⊆ S.
(2) The symmetric closure of P, s(P), is the ⊆-smallest symmetric relation S
such that P ⊆ S.
(3) The transitive closure of P, t(P), is the ⊆-smallest transitive relation S such
that P ⊆ S. The alternative notation P+ is often used to denote t(P). 

“S is ⊆-smallest such that F holds” means that if F holds also for T, then
S ⊆ T.

† In analogy with the conjunctional use of “<” in x < y < z, one often uses an arbitrary relation
P conjunctionally, so that x P y P z stands for x P y ∧ y P z.

In anticipation of the following lemma, we used “the” as opposed to “a” in


Definition V.2.22.

V.2.23 Lemma (Uniqueness of Closures). Let S, T be two q-closures of P of


the same type (q ∈ {rA , s, t}). Then S = T.

Proof. Let S pose as a q-closure of P. Then S ⊆ T.


Now let T pose as a q-closure of P. Then T ⊆ S. Hence S = T. 

V.2.24 Lemma (Existence of Closures). Given a class A and a relation P.


Then

(a) rA (P) = P ∪ ∆A ,
(b) s(P) = P ∪ P−1 ,
(c) P+ = ⋃_{i=1}^∞ P^i.

P+ = ⋃_{i=1}^∞ P^i is, of course, to be understood as an abbreviation of x P+ y ↔
(∃i ∈ ω)(i > 0 ∧ x P^i y).

Proof. (a): P ⊆ P ∪ ∆A , and P ∪ ∆A is reflexive on A. Let also T be reflexive


on A, and P ⊆ T. Reflexivity of T contributes ∆A ⊆ T. Thus, P ∪ ∆A ⊆ T.
So P ∪ ∆A is ⊆-smallest.
(b): Trivially, P ⊆ P ∪ P−1 . By V.2.20 and V.2.21, P ∪ P−1 is symmetric
and hence a candidate for s-closure. Let now P ⊆ T where T is symmetric and
hence T = T−1 . Thus P−1 ⊆ T−1 = T, from which P ∪ P−1 ⊆ T. Done.
(c): Now P ⊆ ⋃_{i=1}^∞ P^i by V.2.12.
Next, we argue that ⋃_{i=1}^∞ P^i is transitive.
Let x(⋃_{i=1}^∞ P^i)y and y(⋃_{i=1}^∞ P^i)z. That is, x P^j y and y P^m z for some (auxi-
liary constants, cf. remark prior to this proof) j, m (≥ 1), from which follows
(by V.2.13) x P^{j+m} z; hence (j + m ≥ 1 is provable, clearly) x(⋃_{i=1}^∞ P^i)z, which
settles transitivity.
To establish ⋃_{i=1}^∞ P^i as ⊆-smallest, let T be transitive and P ⊆ T. We claim
that j > 0 → P^j ⊆ T. We do (formal, of course) induction on j:
Basis. For j = 0 the claim is vacuously satisfied, since then 0 < j is
refutable.

We assume the claim for frozen j and proceed to prove†

P j+1 ⊆ T (1)

There are two cases:


Case j + 1 = 1. Then we are done by ⊢ P^1 = P (V.2.12) and the assump-
tion P ⊆ T.
Case j + 1 > 1. Thus ⊢ j > 0. Therefore we have a proof

x P^{j+1} y
(∃z)(x P^j z ∧ z P y)     ⟨V.2.12⟩
(∃z)(x T z ∧ z T y)     ⟨I.H. and assumption on T⟩
x T y     ⟨T is transitive⟩

which proves the induction step. ⋃_{i=1}^∞ P^i ⊆ T follows at once.
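Informally, for a finite relation (a set of pairs) the three closures of this lemma can be computed directly; the following Python sketch is an illustration only, and all names in it are ours:

    # Informal computation of the closures of V.2.24 for a finite relation
    # given as a set of pairs; all names here are ours.
    def compose(P, S):
        return {(x, y) for (x, z) in S for (w, y) in P if w == z}

    def r(P, A):
        # reflexive closure with respect to A:  P u Delta_A
        return set(P) | {(x, x) for x in A}

    def s(P):
        # symmetric closure:  P u P^(-1)
        return set(P) | {(y, x) for (x, y) in P}

    def t(P):
        # transitive closure:  the union of the powers P^1, P^2, ...
        closure, new = set(P), set(P)
        while new:
            new = compose(new, P) - closure
            closure |= new
        return closure

    R = {(1, 2), (2, 3)}
    print(t(R))             # {(1, 2), (2, 3), (1, 3)}
    print(s(R))             # {(1, 2), (2, 1), (2, 3), (3, 2)}
    print(r(R, {1, 2, 3}))  # R together with (1, 1), (2, 2), (3, 3)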

V.2.25 Example. Why does a class A fail to be transitive? Because some set
x ∈ A has members that are not in A. If we fix this deficiency – by adding to
A the missing members – we will turn A into a transitive class. All we have to
do is to iterate the following process, until no new elements can be added:

Add to the current iterate of A – call this Ai – all those elements y, not
already included, such that y ∈ x ∈ Ai for all choices of x.

So, if we add a y, then we must add also all the z ∈ y that were not already
included, and all the w ∈ z ∈ y that were not already included, . . . . In short, we
add an element w just in case w ∈ z ∈ y ∈ · · · ∈ x ∈ A for some z, y, . . . , x.
With the help of the transitive closure – and switching notation from “∈ the
nonlogical symbol” to “∈, the relation {⟨x, y⟩ : y ∈ x}”‡ – this is simply put as:

“Add a w just in case w ∈+ x ∧ x ∈ A for some x”, or


“Add to the original A the class ∈+ [A] – i.e., form A∪ ∈+ [A].”

It turns out that A∪ ∈+ [A] is the ⊆-smallest transitive class that has A as a
subclass – that is, it extends A to a transitive class in the most economical way
(see below). 

† ZFC ⊢ j + 1 > 0 anyway.


‡ The rightmost “∈” is the nonlogical symbol.

V.2.26 Informal Definition (Transitive Closure of a Class). For any class A,


the informal symbol T C(A), pronounced the transitive closure of the class A,
stands for (abbreviates) A∪ ∈+ [A]. 

V.2.27 Proposition.

(1) T C(A) is the ⊆-smallest transitive class that has A as a subclass.


(2) If A ⊆ T C(B), then T C(A) ⊆ T C(B).
(3) If A is a set, then so is T C(A).

Proof. (1): Trivially, A ⊆ T C(A). Next, let x ∈ y ∈ T C(A).


Case 1. y ∈ A. Then x ∈ (∈ [A]) ⊆ (∈+ [A]),† since‡ ∈ is a subclass of
∈+ . Hence x ∈ T C(A).
Case 2. y ∈ ∈+ [A]. Say y ∈ ∈i [A] for some i ∈ ω − {0}. Then x ∈
∈ [∈i [A]]. Moreover, we have the following simple calculation, where we
have used the leftmost predicates in each line conjunctionally.

x ∈ ∈ [∈i [A]]
= ∈ ◦ ∈i [A], by V.2.4
= ∈i+1 [A], by V.2.12
⊆ ∈+ [A]
⊆ T C(A)

Thus T C(A) is transitive.


Finally, let A ⊆ B and B be transitive. Let x ∈ ∈+ [A]. As above, x ∈
∈i [A] for some i ∈ ω − {0}. We want to conclude that x ∈ B. For variety’s
sake we argue by contradiction, so let i 0 > 0 (auxiliary constant) be smallest
such that for some x0 (auxiliary constant) the contention fails, that is, add

x0 ∈ ∈i0 [A] ∧ ¬x0 ∈ B

Now i 0 ≠ 1 is provable, for, if we add i 0 = 1, then x0 ∈ ∈i0 [A] means that


(∃y)(x0 ∈ y ∈ A); hence (∃y)(x0 ∈ y ∈ B) by hypothesis. Thus, x0 ∈ B, since
B is transitive, and we have just contradicted what we have assumed about
membership of x0 in B.
Thus, i = i 0 − 1 > 0, and, by minimality of i 0 ,

(∀y)(y ∈ ∈i [A] → y ∈ B) (∗)

† Brackets are inserted this once for the sake of clarity. They are omitted in the rest of the proof.
‡ It is rather easy to see, from the context, when “∈” stands for the relation and when for the
predicate.

Now

x0 ∈ ∈i0 [A] → x0 ∈ ∈ [∈i0 −1 [A]], by V.2.4

Hence

x0 ∈ y ∈ ∈i [A] for some y

Therefore

x0 ∈ y ∈ B, by (∗)

and

x0 ∈ B, since B is transitive.

We have contradicted the choice of i 0 and x0 , and this settles (1).


(2) follows trivially from (1).
(3): For any class T,

∈ [T] = {x : (∃y ∈ T)x ∈ y} = ⋃T

Thus, by induction on i, one can easily prove that ∈i [A] is a set (having defined
∈0 [A] to mean A for convenience), since

∈i+1 [A] = ∈ [∈i [A]] = ⋃(∈i [A])

Then, by collection (e.g., III.11.28),

S = {A, ∈ [A], ∈2 [A], . . . , ∈i [A], . . .}

is a set, and (by union) so is T C(A) = ⋃S.

From the above we infer that another way to “construct” T C(A) is to throw in
all the elements of A, then all the elements of all the elements of A, then all
the elements of all the elements of all the elements of A, and so on. That is,

T C(A) = A ∪ ⋃A ∪ ⋃⋃A ∪ ⋃⋃⋃A ∪ · · ·
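Informally, for hereditarily finite pure sets – modelled, say, as nested frozensets – this “throwing in” process terminates and is easy to program. A sketch (names ours):

    # Informal illustration of T C(A) for hereditarily finite pure sets,
    # modelled as nested frozensets (names ours).
    def tc(A):
        # TC(A) = A u UA u UUA u ... : keep adding members of members
        result = set(A)
        stack = list(A)
        while stack:
            x = stack.pop()
            for y in x:                      # y in x, x already collected
                if y not in result:
                    result.add(y)
                    stack.append(y)
        return result

    zero = frozenset()                        # 0 = {}
    one = frozenset({zero})                   # 1 = {0}
    two = frozenset({zero, one})              # 2 = {0, 1}
    print(tc({two}) == {two, one, zero})      # True: TC({2}) = {2, 1, 0}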

V.2.28 Remark. (1) It follows from Lemma V.2.24 (if that were not already
clear from the definition) that the s- and t-closures are only dependent on the
relation we are closing, and not on any other context. On the contrary, the
reflexive closure depends on a context A.
(2) We also note that closing a relation P amounts, intuitively, to adding
pairs x, y to P until the first time it acquires the desired property (reflexivity
on some A, or symmetry or transitivity). Correspondingly, P is reflexive on A,

or symmetric or transitive, iff it equals its corresponding closure. This readily


follows from the ⊆-minimality of closures. 

V.2.29 Example (Informal). If A = {1, 2, . . . , n}, n ≥ 1, then A × A has 2^{n²}
subsets, that is, there are 2^{n²} relations P : A → A. Fix attention on one such
relation, say, R.
Clearly then, the sequence R, R^2, R^3, . . . , R^i, . . . has at most 2^{n²} distinct
terms, thus

R+ = ⋃_{i=1}^{2^{n²}} R^i

With some extra work one can show that

R+ = ⋃_{i=1}^n R^i

in this case. Moreover, if R is reflexive (on A, that is), then

R+ = R^{n−1}

V.2.30 Example. Let the “higher order collection” of relations (Ta )a∈I be given
by a formula of set theory, T (a, x, y), in the sense that

x Ta y abbreviates T (a, x, y)

so that ⋃_{a∈I} Ta stands for {⟨x, y⟩ : (∃a ∈ I)T (a, x, y)}. Let S be another
relation.
Then the following two (abbreviations of) formulas are provable in ZFC:

S ◦ ⋃_{a∈I} Ta = ⋃_{a∈I} (S ◦ Ta)     (1)

(⋃_{a∈I} Ta) ◦ S = ⋃_{a∈I} (Ta ◦ S)     (2)

We prove (1), leaving (2) as an exercise. Let

x (S ◦ ⋃_{a∈I} Ta) y

Then

(∃z)(x S z ∧ z (⋃_{a∈I} Ta) y)

Hence (via some trivial logical manipulations)

(∃a ∈ I)(∃z)(x S z ∧ z Ta y)

which yields

(∃a ∈ I)(x S ◦ Ta y)

and finally

x (⋃_{a∈I} (S ◦ Ta)) y

This settles ⊆; the ⊇-part is similar. 

V.2.31 Example. Consider now P : A → A. We will write ∆ for ∆A . We will


show that
 
ZFC ⊢ (∀m)((∆ ∪ P)^m = ⋃_{i=0}^m P^i)     (3)

We do induction on m. For m = 0, (3) requires ZFC ⊢ ∆ = ∆, a logical fact.
I.H.: Assume (3) for some fixed m ≥ 0.
Case m + 1 (employing “=” conjunctionally):

(∆ ∪ P)^{m+1} = (⋃_{i=0}^m P^i) ◦ (∆ ∪ P)     (by I.H.)
             = ⋃_{i=0}^m (P^i ◦ (∆ ∪ P))     (by V.2.30, case (2))
             = ⋃_{i=0}^m (P^i ∪ P^{i+1})     (by V.2.30, case (1), and V.2.11)
             = ⋃_{i=0}^{m+1} P^i

As an application of this result, we look into tr(P), or more correctly, t(r(P)):
the transitive closure of the reflexive closure of P. We “calculate” as follows:

tr(P) = t(∆ ∪ P) = ⋃_{i=1}^∞ (∆ ∪ P)^i = ⋃_{i=1}^∞ ⋃_{k=0}^i P^k = ⋃_{i=0}^∞ P^i

Thus

tr(P) = ⋃_{i=0}^∞ P^i     (4)

Next, look into rt(P) (really r(t(P)): the reflexive closure of the transitive
closure of P). Clearly,

rt(P) = ∆ ∪ t(P) = ⋃_{i=0}^∞ P^i     (5)

By (4) and (5), ZFC ⊢ rt(P) = tr(P). We call this relation (rt(P) or tr(P))
the reflexive-transitive closure of P. It usually goes under the symbol P∗ .
Thus, intuitively, x P∗ y iff either x = y, or for some z1 , . . . , zk−1 for k ≥ 1,
x P z1 P z2 . . . zk−1 P y.†
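Informally, P∗ over a finite carrier A is just ∆A together with t(P); a short Python sketch (names ours):

    # Informal sketch (names ours): P* over a finite carrier A is Delta_A
    # together with the transitive closure of P.
    def compose(P, S):
        return {(x, y) for (x, z) in S for (w, y) in P if w == z}

    def star(P, A):
        closure = {(x, x) for x in A} | set(P)      # Delta_A u P
        while True:
            new = compose(closure, closure) - closure
            if not new:
                return closure
            closure |= new

    print(star({(1, 2), (2, 3)}, {1, 2, 3}))
    # {(1,1), (2,2), (3,3), (1,2), (2,3), (1,3)}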

V.2.32 Informal Definition (Adjacency Map). Given a relation P : A → B,


its adjacency map MP is
 
{⟨x, y, i⟩ : x ∈ A ∧ y ∈ B ∧ i ∈ {0, 1} ∧ (i = 1 ↔ ⟨x, y⟩ ∈ P)}

In applications the most interesting case of adjacency maps occurs when


A = B = {a0 , . . . , an−1 }, a finite set of n elements (we have, intuitively, n
elements iff i ≠ j → ai ≠ a j ).‡ In this case we have relations P on A, a set;
therefore any such P is a set too. The adjacency map M P can be represented, or
“stored” (in a computer, for example), as a table, A P , known as an adjacency
matrix, via the definition
A P (i, j) =def M P (ai , a j )

that is, A P is n × n and

A P (i, j) =def 1 if ⟨ai , a j⟩ ∈ P, and 0 otherwise

We understand i as the row index and j as the column index.

† For k = 1 the sequence z 1 , . . . , z k−1 is empty by convention; thus we just have x P y in this
case.
‡ The ai can be thought of as values of a function f with domain n, that is ai = f (i).

V.2.33 Example (Informal). Consider R = {⟨a, b⟩, ⟨b, c⟩} on A = {a, b, c},
where a ≠ b ≠ c ≠ a. Let us arbitrarily rename a, b, c as a1 , a2 , a3 respectively –
or, equivalently, 1, 2, 3, since the a in ai is clearly cosmetic. Then,
 
A R =
  0 1 0
  0 0 1
  0 0 0

Ar (R) =
  1 1 0
  0 1 1
  0 0 1

A R+ =
  0 1 1
  0 0 1
  0 0 0

Ast(R) =
  0 1 1
  1 0 1
  1 1 0

Ats(R) =
  1 1 1
  1 1 1
  1 1 1


V.2.34 Example (Informal). Consider ∆ : S → S, where S = {1, 2, . . . , n}.


Then

A∆ (i, j) = 1 if i = j, and 0 otherwise
That is, A∆ has 1’s only on the main diagonal. This observation partly justifies
the term “diagonal relation”. 

V.2.35 Example (Informal). Given R : A → A and S : A → A, where A =
{a1 , a2 , . . . , an } and ai ≠ a j if i ≠ j. We can form R −1 , r (R), s(R), R + , R ∗ ,
R ∪ S, R ◦ S, S ◦ R, etc. What are their adjacency matrices?
First, let us agree that in the context of adjacency matrices the operation “+”
on {0, 1} is given by†

x + y = 1 if x + y ≥ 1, and 0 otherwise

† This “+” is often called Boolean addition, for if 0, 1 are thought of as the values false and true
respectively, the operation amounts to “∨”.

Now, going from easy to hard:


(1) A R −1 satisfies A R −1 (i, j) = A R ( j, i) for all i, j, i.e., in matrix jargon, A R −1
is the transpose of A R .
(2) A R∪S = A R + A S , that is, A R∪S (i, j) = A R (i, j) + A S (i, j) for all i, j. We
say that A R∪S is the Boolean sum of A R and A S .
(3) In particular, Ar (R) = A∆∪R = A∆ + A R ; thus we pass from A R to Ar (R)
by making all the diagonal entries equal to 1.
(4) As(R) = A R∪R −1 = A R + A R −1 , so that As(R) (i, j) = A R (i, j) + A R ( j, i)
for all i, j.
(5) What is A R◦S in terms of A R , A S ? We have
A R◦S (i, j) = 1 iff ⟨ai , a j⟩ ∈ R ◦ S
              iff a j R ◦ S ai
              iff (∃m)(a j R am ∧ am S ai )
              iff (∃m)(⟨am , a j⟩ ∈ R ∧ ⟨ai , am⟩ ∈ S)
              iff (∃m)(A S (i, m) = 1 ∧ A R (m, j) = 1)
              iff ∑_{m=1}^n A S (i, m) · A R (m, j) = 1
              iff (A S · A R )(i, j) = 1
Thus, A R◦S = A S · A R (note the order reversal!).
(6) In particular, A R m+1 = A R m ◦R = A R · A R m . If now we assume inductively
that A R m = (A R )m (which is clearly true† for m = 0), then A R m+1 = (A R )m+1 .
Thus A R m = (A R )m is true for all m ≥ 0.
(7) It follows from (2), (6), and Exercise V.23 that

A R+ = ∑_{i=1}^n (A R )^i

whereas

A R∗ = ∑_{i=0}^n (A R )^i

From the observation that R∗ = tr(R) and from Exercise V.24 one gets

A R∗ = (A∆ + A R )^{n−1}

a simpler formula.
The reader will be asked to pursue this a bit more in the Exercises section, where
“good” algorithms for the computation of A R + and A R ∗ will be sought. 

† We did say that this example is in the informal domain.
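Informally, the matrix manipulations of V.2.35 are easily programmed. The sketch below (all names ours) builds A R for a relation on {1, . . . , n} and computes A R+ as the Boolean sum of the powers (A R )^1 , . . . , (A R )^n , as in (7):

    # Informal sketch (names ours): adjacency matrices over A = {1, ..., n},
    # Boolean "+" and the usual matrix product, as in Example V.2.35.
    def adjacency(R, n):
        return [[1 if (i, j) in R else 0 for j in range(1, n + 1)]
                for i in range(1, n + 1)]

    def bool_mult(A, B):
        n = len(A)
        return [[1 if any(A[i][m] and B[m][j] for m in range(n)) else 0
                 for j in range(n)] for i in range(n)]

    def bool_add(A, B):
        return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def closure_matrix(A):
        # A_{R+} as the Boolean sum (A_R)^1 + (A_R)^2 + ... + (A_R)^n
        result, power = A, A
        for _ in range(len(A) - 1):
            power = bool_mult(power, A)
            result = bool_add(result, power)
        return result

    A_R = adjacency({(1, 2), (2, 3)}, 3)      # the R of Example V.2.33
    print(closure_matrix(A_R))                # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]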



V.3. Algebra of Functions


V.3.1 Definition. Given functions f : A → B and g : B → A such that g ◦ f =
∆A . We say that g is a left inverse of f , and f is a right inverse of g. 

V.3.2 Remark. We will follow in this section the convention of using f, g, h


and possibly other lowercase letters, with or without subscripts or primes, as
function names even in those cases that the functions might be proper classes.
We will continue utilizing uppercase letters for “general” relations. Any startling
deviations from this notational rule will be noted. 

V.3.3 Example (Informal). Let A = {a, b}, where a ≠ b, and B = {1, 2, 3, 4}.
Consider the following functions:
f1 = {⟨a, 1⟩, ⟨b, 3⟩}
f2 = {⟨a, 1⟩, ⟨b, 4⟩}
g1 = {⟨1, a⟩, ⟨3, b⟩, ⟨4, b⟩}
g2 = {⟨1, a⟩, ⟨2, b⟩, ⟨3, b⟩}
g3 = {⟨1, a⟩, ⟨2, b⟩, ⟨3, b⟩, ⟨4, b⟩}
g4 = {⟨1, a⟩, ⟨2, a⟩, ⟨3, b⟩, ⟨4, b⟩}
g5 = {⟨1, a⟩, ⟨3, b⟩}
We observe that
g1 ◦ f 1 = g2 ◦ f 1 = g3 ◦ f 1 = g4 ◦ f 1 = g5 ◦ f 1 = g1 ◦ f 2 = g3 ◦ f 2
= g4 ◦ f 2 = ∆ A
What emerges is:
(1) The equation x ◦ f = ∆ A does not necessarily have unique x-solutions,
not even when only total solutions are sought.
(2) The equation x ◦ f = ∆ A can have nontotal x-solutions. Neither a total
nor a nontotal solution is necessarily 1-1.
(3) An x-solution to x ◦ f = ∆ A can be 1-1 without being total.
(4) The equation g ◦ x = ∆ A does not necessarily have unique x-solutions.
Solutions do not have to be onto. 

In the previous example we saw what we cannot infer about f and g from
g ◦ f = ∆ A . Let us next see what we can infer.

V.3.4 Proposition. Given f : A → B and g : B → A such that g ◦ f = ∆A .


Then
(1) f is total and 1-1.
(2) g is onto.

Proof. (1): Since g ◦ f is total, it follows that f is too (for f (a) ↑ implies
g( f (a)) ↑). Next, let f (a) = f (b). Then g( f (a)) = g( f (b)) by Leibniz axiom;
hence g ◦ f (a) = g ◦ f (b), that is, ∆A (a) = ∆A (b).
Hence a = b.
(2): For ontoness of g we argue that there exists an x-solution of the equation
g(x) = a for any a ∈ A. Indeed, x = f (a) is a solution. 

V.3.5 Corollary. Not all functions f : A → B have left (or right) inverses.

Proof. Not all functions f : A → B are 1-1 (respectively, onto). 

V.3.6 Corollary. Functions with neither left nor right inverses exist.

Proof. Any f : A → B which is neither 1-1 nor onto fills the bill. For example,
take f = {⟨1, 2⟩, ⟨2, 2⟩} from {1, 2} to {1, 2}.

The above proofs can be thought of as argot versions of formal proofs, since 1
and 2 can be thought of as members of (the formal) ω.

V.3.7 Proposition. If f : A → B is a 1-1 correspondence (cf. III.11.24), then


x ◦ f = ∆A and f ◦ x = ∆B have the unique common solution f −1 .

N.B. This unique common solution, f −1 , is called the inverse of f .

Proof. First off, it is trivial that f −1 is single-valued and hence a function.


Verify next that it is a common solution:

a f ◦ f −1 b iff (∃c)(a f c ∧ c f −1 b)
iff (∃c)(a f c ∧ b f c)
iff a = b ( f is single-valued)

where the if part of the last iff is due to ontoness of f , while the only-if
part employs proof by auxiliary constant (let c work, i.e., a f c ∧ b f c . . . ). Thus
x = f −1 solves f ◦ x = ∆B . Similarly, one can show that it solves x ◦ f = ∆A
too.
Uniqueness of solution: Let x ◦ f = ∆A . Then (x ◦ f ) ◦ f −1 = ∆A ◦ f −1 =
f −1 . By associativity of ◦, this says x ◦ ( f ◦ f −1 ) = f −1 , i.e., x = x ◦ ∆B =
f −1 . Therefore a left inverse has to be f −1 . The same can be similarly shown


for the right inverse. 

V.3.8 Corollary. If f : A → B has both left and right inverses, then it is a 1-1
correspondence, and hence the two inverses equal f −1 .

Proof. From h ◦ f = ∆A (h is some left inverse) it follows that f is 1-1 and


total. From f ◦ g = ∆B (g is some right inverse) it follows that f is onto. 

V.3.9 Theorem (Algebraic Characterization of 1-1ness and Ontoness).


(1) f : A → B is total and 1-1 iff it is left-invertible.†
(2) g : B → A is onto iff it is right-invertible.‡

Proof. (1): The if part is Proposition V.3.4(1). As for the only-if part, note that
f −1 : B → A is single-valued ( f is 1-1) and verify that f −1 ◦ f = ∆ A .
(2): The if part is Proposition V.3.4(2).
Only-if part: By ontoness of g, all the sets in the family (g −1 x)x∈A are
nonempty. By AC, let h ∈ ∏x∈A g −1 x.
Thus, h : A → B is total, and h(x) ∈ g −1 x for all x ∈ A. Now, for all x,
h(x) ∈ g −1 x iff h(x)g −1 x
iff xg h(x)
iff x = g ◦ h(x)
That is, g ◦ h = ∆ A . 

V.3.10 Remark. If B ⊆ N, then AC is unnecessary in the proof of Theo-


rem V.3.9 (case (2), only-if part).
Note that Theorem V.3.9 provides an “equational” or “algebraic” definition
of ontoness and 1-1-ness (the latter for total functions). 
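Informally, for finite functions such inverses can be exhibited directly, with no appeal to AC; a Python sketch (names ours), functions being modelled as dictionaries:

    # Informal sketch (names ours): functions as Python dictionaries.
    def left_inverse(f):
        # meaningful when f is 1-1; then g(f(x)) = x on dom(f)
        return {y: x for x, y in f.items()}

    def right_inverse(g, A):
        # meaningful when g is onto A; picks some h(a) with g(h(a)) = a
        return {a: next(x for x in g if g[x] == a) for a in A}

    f = {'a': 1, 'b': 3}                          # 1-1, not onto {1, 2, 3, 4}
    g = {1: 'a', 2: 'b', 3: 'b', 4: 'b'}          # onto {'a', 'b'}, not 1-1
    print(left_inverse(f))                        # {1: 'a', 3: 'b'}
    h = right_inverse(g, {'a', 'b'})
    print(all(g[h[a]] == a for a in 'ab'))        # True: g o h = Delta_A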

The reader has probably observed that, given f : A → B and f −1 : B → A,


an easy way to figure out the correct subscript of ∆ in f ◦ f −1 = ∆ and
f −1 ◦ f = ∆ is to draw a diagram such as

† That is, it has a left inverse.


‡ That is, it has a right inverse.

This is a trivial example of the usefulness of function diagrams. In some


branches of mathematics, such as category theory and algebraic topology,
it is an entire science to know how to manipulate complex function diagrams.

V.3.11 Informal Definition (Function Diagrams). A function diagram con-


sists of a finite set of points, each labeled by some class (repetition of labels
is allowed), and a finite set of arrows between points, each labeled by some
function (no repetitions allowed).
If an arrow labeled f starts (i.e., has its tail) at the (point labeled by the)
class X and ends (i.e., has its head) at the class Y, then the interpretation is that
we have a function f : X → Y.
A chain from point (labeled) A to point B in a diagram is a sequence of
arrows in the diagram such that the first starts at A, the last ends at B, and the
(i + 1)st starts where the ith ends, for all relevant i-values.
If f 1 , f 2 , . . . , f n are the labels of a chain in that order from beginning to end,
then we say that the chain has length n and result f n ◦ f n−1 ◦ · · · ◦ f 2 ◦ f 1 .
A diagram is called commutative iff any two chains with common start and
common end, of which at least one has length ≥ 2, have the same result. 

V.3.12 Example (Informal: Examples of Function Diagrams).

(1) The following is commutative iff g ◦ f = h ◦ f . Note that commutativity


does not require g = h, since both chains from B to C are of length one and
thus the commutativity concept does not apply:
A —f→ B ⇉ C     (the two parallel arrows from B to C labeled g and h)

(2) The following is commutative:

(3) Recall that π, δ denote the first and second projections π (x, y) = x and
δ(x, y) = y for all x, y. Let f : C → A and g : C → B be two total
functions. Then there is a unique total function h which can label the dotted

arrow below and make the diagram commutative:

This h is, of course, λx.⟨ f (x), g(x)⟩ : C → A × B.


Note that in drawing diagrams we do not draw the points; we only draw their
labels. 

V.4. Equivalence Relations


V.4.1 Informal Definition. A relation P : A → A is an equivalence relation
on A iff it is reflexive, symmetric, and transitive. 

Thus, in the metatheory “P : A → A is an equivalence relation” means that


“P ⊆ A × A and P is reflexive, symmetric, and transitive” is true, whereas in
ZFC it means that the quoted (quasi) translation† is provable (or has been taken
as an assumption).

V.4.2 Example. As examples of equivalence relations we mention “=”, i.e.,


∆ on any class, and ≡ (mod m) on Z (see Example V.2.17). 

An equivalence relation on A has the effect, intuitively, of grouping equiva-


lent (i.e., related) elements into equivalence classes.

† The reflexive, symmetric, and transitive properties have trivial translations in the formal language.

Why is this intuition not valid for arbitrary relations? Well, for one thing, not
all relations are symmetric, so if element a of A started up a club of “pals”
with respect to a (non-symmetric) relation P, then a would welcome b into the
club as soon as a P b holds. Now since, conceivably, b P a is false, b would not
welcome a in his club. The two clubs would be different. Now that is contrary
to the intuitive meaning of “equivalence” according to which we would like a
and b to be in the same club.
O.K., so let us throw in symmetry. Do symmetric relations group related
elements in a way we could intuitively call “equivalence”? Take the symmetric
relation =. If it behaved like equivalence, then a = b and b = c would require
all three a, b, c to belong to the same “pals’ club”, for a and b are in the same
club, and b and c are in the same club. Alas, it is conceivable that a = b = c,
yet a = c, so that a and c would not be in the same club. The problem is that
= is not transitive.
What do we need reflexivity for? Well, without it we would have “stray”
elements (of A) which belong to no clubs at all, and this is undesirable intuitively.
For example, R = {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 1⟩, ⟨2, 2⟩} is symmetric and transitive on
A = {1, 2, 3}. We have exactly one club, {1, 2}, and 3 belongs to no club. We
fix this by adding ⟨3, 3⟩ to R, so that 3 belongs to the club {3}.
As we already said, intuitively we view related elements of an equivalence
relation as indistinguishable. We collect them in so-called equivalence classes
(the “clubs”) which are therefore viewed intuitively as a kind of “fat urelements”
(their individual members lose their “individuality”).

Here are the technicalities.

V.4.3 Informal Definition. Given an equivalence relation P on A. The equiv-


alence class of an element a ∈ A is {x ∈ A : x P a}. We use the symbol [a]P ,
or just [a] if P is understood, for the equivalence class.
If P, A are sets P, A, then A/P, the quotient set of A with respect to P, is
the set of all equivalence classes [a] P . 

(1) Restricting the definition of A/P to sets P, A ensures that [x] P are sets
(why?) so that A/P makes sense as a class. Indeed, it is a set, by collection.
(2) Of course, [a]P = P[{a}] = Pa.

V.4.4 Lemma. Let P be an equivalence relation on A. Then [x] = [y] iff x P y.

Proof. If part. Let z ∈ [x]. Then z P x. Hence z P y by assumption and transi-


tivity. That is, z ∈ [y], from which [x] ⊆ [y].

By swapping letters we have y P x implies [y] ⊆ [x]; hence (by symmetry


of P) our original assumption, namely x P y, implies [y] ⊆ [x]. All in all,
[x] = [y].
Only-if part. By reflexivity, x ∈ [x]. The assumption then yields x ∈ [y],
i.e., x P y. 

V.4.5 Lemma. Let P be an equivalence relation on A. Then

(i) [x] ≠ ∅ for all x ∈ A.
(ii) [x] ∩ [y] ≠ ∅ implies [x] = [y] for all x, y in A.
(iii) ⋃x∈A [x] = A.

Proof. (i): From x P x for all x ∈ A we get x ∈ [x].


(ii): Let z ∈ [x] ∩ [y]. Then z P x and z P y; therefore x P z and z P y (by
symmetry); hence x P y (by transitivity). Thus, [x] = [y] by Lemma V.4.4.
(iii): The ⊆-part is obvious from [x] ⊆ A. The ⊇-part follows from
⋃x∈A {x} = A and {x} ⊆ [x].
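Informally, equivalence classes and quotient sets of finite sets are straightforward to compute; a Python sketch (names ours), using ≡ (mod 3) on a finite piece of Z:

    # Informal sketch (names ours): equivalence classes and the quotient set
    # of a finite set A under an equivalence relation given as a predicate.
    def eq_class(a, A, rel):
        return frozenset(x for x in A if rel(x, a))

    def quotient(A, rel):
        return {eq_class(a, A, rel) for a in A}

    A = set(range(9))
    mod3 = lambda x, y: (x - y) % 3 == 0
    print(sorted(sorted(c) for c in quotient(A, mod3)))
    # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]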

The properties (i)–(iii) are characteristic of the notion of a partition of a set.

V.4.6 Definition (Partitions). Let (Fa )a∈I be a family of subsets of A. It is a


partition of A iff all of the following hold:

(i) Fa ≠ ∅ for all a ∈ I .
(ii) Fa ∩ Fb ≠ ∅ implies Fa = Fb for all a, b in I .
(iii) ⋃a∈I Fa = A.

There is a natural affinity between equivalence relations and partitions on a


set A.

V.4.7 Theorem. The relation C = {⟨R, A/R⟩ : R is an equivalence relation
on A} is a 1-1 correspondence between the set E of all equivalence relations

on A} is a 1-1 correspondence between the set E of all equivalence relations
on A and the set P of all partitions on A.

Pause. Are all the “sets” mentioned in the theorem indeed sets?

Proof. By definition, C is single-valued (on the δ-coordinate) on E and total,


since whenever R occurs in ⟨R, x⟩, x is always the same, namely, A/R. More-
over, for each R ∈ E, A/R exists. By Lemma V.4.5 ran(C) ⊆ P , so that we
have a total function C : E → P so far.

We next check ontoness:
Let Π = (Fa )a∈I ∈ P . Define a relation ∼ on A as follows:

x ∼ y iff (∃a ∈ I ){x, y} ⊆ Fa

Observe that:
(i) ∼ is reflexive: Take any x ∈ A. By V.4.6(iii), there is an a ∈ I such that
x ∈ Fa , and hence {x, x} ⊆ Fa . Thus x ∼ x.
(ii) ∼ is, trivially, symmetric.
(iii) ∼ is transitive: Indeed, let x ∼ y ∼ z. Then {x, y} ⊆ Fa and {y, z} ⊆ Fb
for some a, b in I . Thus, y ∈ Fa ∩ Fb ; hence Fa = Fb by V.4.6(ii). Hence
{x, z} ⊆ Fa ; therefore x ∼ z.
So ∼ is an equivalence relation. Once we show that C(∼) = Π, i.e., A/∼ = Π,
we will have settled ontoness.
⊆: Let [x] be arbitrary in A/∼ (we use [x] for [x]∼ ). Take Fa such that
x ∈ Fa (it exists by V.4.6(iii)). Now let z ∈ [x]. Then z ∼ x; hence z, x are in
the same Fb , which is Fa by V.4.6(ii). Hence z ∈ Fa ; therefore [x] ⊆ Fa .
Conversely, if z ∈ Fa , since also x ∈ Fa , then z ∼ x; thus z ∈ [x]. All in all,
[x] = Fa , where Fa is the unique Fa containing x. Thus [x] ∈ Π.
⊇: Let Fa be arbitrary in Π. By V.4.6(i), there is some x ∈ Fa . By the same
argument as in the ⊆-part, [x] = Fa ; thus Fa ∈ A/∼.
For 1-1-ness, let Π and ∼ be as before, and let also R ∈ E such that A/R = Π.
If x R y, then [x] R = [y] R . Let [x] R = Fa for some a ∈ I ; thus x and y are in
Fa , that is, x ∼ y. The argument is clearly reversible, so R = ∼.
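Informally, both directions of this correspondence are easy to program for a finite A; a sketch (names ours):

    # Informal sketch (names ours) of the correspondence of V.4.7 on a finite set.
    def partition_to_relation(blocks):
        # x ~ y iff {x, y} is included in some block
        return {(x, y) for F in blocks for x in F for y in F}

    def relation_to_partition(R, A):
        # the quotient A/R: the set of equivalence classes
        return {frozenset(y for y in A if (x, y) in R) for x in A}

    blocks = {frozenset({1, 2}), frozenset({3})}
    R = partition_to_relation(blocks)
    print(R == {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)})       # True
    print(relation_to_partition(R, {1, 2, 3}) == blocks)       # True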

V.4.8 Example (Informal). The equivalence relation ≡m on Z determines the
quotient set Z/≡m = {{i + k · m : k ∈ Z} : i ∈ Z ∧ 0 ≤ i < m}. We usually
denote Z/≡m by Zm .

V.4.9 Example. Given f : A → B. Define R f by

x R f y iff f (x) ≃ f (y)     (1)

where ≃ is the weak equality of Kleene (see III.11.17).


R f is an equivalence relation:
(i) For all x ∈ A we have f (x) ≃ f (x); hence x R f x (this would have failed
whenever f (x) ↑ if we had used = rather than ≃ in (1)).
(ii) R f is trivially symmetric.

(iii) R f is transitive: x R f y R f z means f (x) ≃ f (y) ≃ f (z), and hence f (x) ≃
f (z), and hence x R f z.
We can, intuitively, “lump” or “identify” all “points” in A that map into the
same element of B, thus, in essence, turning f into a 1-1 function. We also
lump together all points of A for which f is undefined. All this is captured by
the following commutative diagram:

We observe:
(a) λx.[x] is total and onto. It is called the natural projection of A onto A/R f .†
(b) [x] ↦ f (x) is single-valued,‡ for if [x] = [y], then x R f y and thus
( f (x) ↑ ∧ f (y) ↑) ∨ (∃z)(z = f (x) ∧ z = f (y)).
(c) The function [x] ↦ f (x) is defined iff f (x) ↓ (trivial).
(d) [x] ↦ f (x) is 1-1. For, let ⟨[x], a⟩ and ⟨[y], a⟩ be pairs of this func-
tion. The first pair implies f (x) = a, and the second implies f (y) = a, thus
f (x) = f (y), and hence f (x) ≃ f (y). It follows that x R f y, and hence
[x] = [y].
(e) Let h = λx.[x] and g = λ[x]. f (x). Then g ◦ h(x) = g(h(x)) = g([x]) =
f (x).
This verifies the earlier claim that the above diagram is commutative.

† The term applies to the general case λx.[x] : A → A/R, not just for the special R = R f above.
‡ In this context one often says “well-defined”, i.e., the image f (x) is independent of the repre-
sentative x which denotes (defines) the equivalence class [x].

(f) The moral, in words, is: “Every function f : A → B can be decomposed


into a 1-1 and an onto total function, in that left-to-right order. Moreover,
the 1-1 component is total iff f is.” 
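Informally, this decomposition is easy to exhibit for finite (possibly nontotal) functions; a Python sketch (names ours), with classes [x] modelled as frozensets and “undefined” modelled as absence from the dictionary:

    # Informal sketch (names ours): decompose f as g o h, where h = x |-> [x]
    # is total and onto, and g = [x] |-> f(x) is 1-1.
    # (We assume None is not itself a value of f.)
    def decompose(f, A):
        h = {x: frozenset(y for y in A if f.get(y) == f.get(x)) for x in A}
        g = {h[x]: f[x] for x in A if x in f}
        return h, g

    A = {1, 2, 3, 4}
    f = {1: 'a', 2: 'a', 3: 'b'}                 # f(4) undefined
    h, g = decompose(f, A)
    print(h[1] == h[2], h[4])                    # True frozenset({4})
    print(all(g[h[x]] == f[x] for x in f))       # True: g(h(x)) = f(x) where defined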

V.5. Exercises
V.1. Course-of-values induction. Prove that for any formula F (x),
 
ZFC ⊢ (∀n ∈ ω)((∀m < n ∈ ω)F (m) → F (n)) → (∀n ∈ ω)F (n)

or, in words, if for the arbitrary n ∈ ω we can prove F (n) on the induc-
tion hypothesis that F (m) holds for all m < n, then this is as good as
having proved (∀n ∈ ω)F (n).
 
(Hint. Assume (∀n ∈ ω)((∀m < n ∈ ω)F (m) → F (n)) to prove (∀n ∈
ω)F (n). Consider the formula G (n) defined as (∀m < n ∈ ω) F (m),
and apply (ordinary) induction on n to prove that (∀n ∈ ω)G (n). Take
it from there.)
V.2. The “least” number principle over ω. Prove that every ∅ ≠ A ⊆ ω has
a minimal element, i.e., an n ∈ A such that for no m ∈ A is it possible to
have m < n. Do so without foundation, using instead course-of-values
induction.
V.3. Prove that the principle of induction over ω and the least number principle
are equivalent, i.e., one implies the other. Again, do so without using
foundation.
V.4. Redo the proof of Theorem V.1.21 (existence part) so that it would go
through even if trichotomy of ∈ over ω did not hold.
V.5. Prove that a set x is a natural number iff it satisfies (1) and (2) below:
(1) It and all its members are transitive.
(2) It and all its members are successors or ∅.
V.6. Prove that for all m, n, i in ω, m + (n + i) = (m + n) + i.
V.7. Redo the proof of V.1.24 (commutativity of natural number addition) by
a single induction, relying on the associativity of addition.
V.8. Prove that for all m, n in ω, m < n implies m + 1 ≤ n (recall that ≤ on ω is
the same as ⊆).
V.9. Prove that for all m, n in ω, m < n implies m + 1 < n + 1.
V.10. Show by an appropriate example that if f, g are finite sequences, then
f ∗ g ≠ g ∗ f in general.

V.11. Define multiplication, “·”, on ω by

m·0 = 0
m · (n + 1) = m · n + m

Prove:
(1) · is associative.
(2) · is commutative.
(3) · distributes over +, i.e., for all m, n, k, (m + n) · k = (m · k) + (n · k).
V.12. Prove that m + n < (m + 1) · (n + 1) for all m, n in ω.
V.13. Let P be both symmetric and antisymmetric. Show that P ⊆ ∆A , where
A is the field of P. Conclude that P is transitive.
V.14. In view of the previous problem, explore the patterns of independence be-
tween reflexivity, symmetry, antisymmetry, transitivity, and irreflexivity.
V.15. For any relation P, (P−1 )−1 = P.
V.16. Let R : A → A be a relation (set). Define

P = {S ⊆ A × A : R ⊆ S ∧ S is reflexive}
Q = {S ⊆ A × A : R ⊆ S ∧ S is symmetric}
T = {S ⊆ A × A : R ⊆ S ∧ S is transitive}

Show that

r (R) = P

s(R) = Q

t(R) = T

V.17. Fill in the missing details of Example V.2.29.


V.18. If R is on A = {1, . . . , n}, then show that

R+ = ⋃_{i=1}^n R^i

V.19. If R is reflexive on A = {1, . . . , n}, then

R + = R n−1

V.20. Show that for any P : A → A, s(r (P)) = r (s(P)).


V.21. Show by an appropriate example that, in general, s(t(P)) ≠ t(s(P)).
V.22. Given R on A = {1, . . . , n}.

(a) Prove by an appropriate induction that the following algorithm ter-


minates with the value A R + in the matrix variable M:
M ← AR
for i = 1 to n − 1 do
M ← (A∆ + M) · A R
(b) Show that the above algorithm performs O(n 4 ) operations of the
type “+” and “·”.
f (n) = O(g(n)) means | f (n)| ≤ C · |g(n)| for all n ≥ n 0 (for some con-
stants C, n 0 independent of n), or, equivalently, | f (n)| ≤ C · |g(n)| + D
for all n ≥ 0 (for some constants C, D independent of n).
V.23. (a) Prove that if R is on A = {1, . . . , n}, then R+ = ⋃_{i=1}^m R^i for all
m ≥ n.
(b) Based on the above observation, on Example V.2.35, and on the fact
(with proof, of course) that for any matrix M we can find M^{2^k} in k
matrix multiplications, find an algorithm that provably computes R+
in O(n^3 log n) operations of the type “+” and “·”.
V.24. Prove by appropriate inductions that the following algorithm due to
Warshall terminates with the value A R + in the matrix variable M, and
that all this is done in O(n 3 ) “+”-operations (there are no “·”-operations
in this algorithm):
M ← AR
for j = 1 to n do
for i = 1 to n do
if M(i, j) = 1 then
for k = 1 to n do
M(i, k) ← M(i, k) + M( j, k)

Is the algorithm still correct if the first two loops are interchanged (i.e.,
if i is controlled by the outermost loop, rather than j)?
(Hint. When M(i, j) = 1, M(i, k) ← M(i, k) + M( j, k) says the same
thing as M(i, k) ← M(i, k) + M(i, j) · M( j, k).)
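For concreteness, here is the same algorithm transcribed into Python (an informal sketch; the 0-based indexing and the function name are ours):

    # Warshall's algorithm transcribed informally (0-based indices; names ours).
    def warshall(A):
        n = len(A)
        M = [row[:] for row in A]                 # M <- A_R
        for j in range(n):
            for i in range(n):
                if M[i][j] == 1:
                    for k in range(n):
                        M[i][k] = M[i][k] | M[j][k]    # Boolean "+"
        return M

    A_R = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]                             # the R of Example V.2.33
    print(warshall(A_R))                          # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]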
V.25. Prove that for sets P, A, where P is an equivalence relation on A, A/P
is a set, as Definition V.4.3 wants us to believe.
VI

Order

This chapter contains concepts that are fundamental for the further development
of set theory, such as well-orderings and ordinals. The latter constitute the
skeleton of set theory, as they formalize the intuitive concept of “stages” and,
among other things, enable us to make transfinite constructions formally (such
as the construction of the universe of sets and atoms, U M , and the constructible
universe, L M ).

VI.1. PO Classes, LO Classes, and WO Classes


We start with the introduction of the most important type of binary relation,
that of partial order.

VI.1.1 Definition. A relation P is a partial order, or just order, iff it is


(1) irreflexive (i.e., x P y → ¬x = y) and
(2) transitive.
It is emphasized that P need not be a set. 

VI.1.2 Remark.
(1) The symbol < will be used to denote any unspecified order P, and it will be
pronounced “less than”. It is hoped that the context will not allow confusion
with the concrete < on numbers (say, on the reals).
(2) If the field of the order < is a subclass of A, then we say that < is an order
on A.
(3) Clearly, for any order < and any class B, < ∩ (B × B) – or < | B – is an
order on B. 


VI.1.3 Example (Informal). The concrete “less than”, <, on N is an order, but
≤ is not (it is not irreflexive). The “greater than” relation, >, on N is also an
order, but ≥ is not.
In general, it is trivial to verify that P is an order iff P−1 is an order. 

VI.1.4 Example. ∅ is an order. Since for any A we have ∅ ⊆ A × A, ∅ is an


order on A for the arbitrary A. 

VI.1.5 Example. The relation ∈ (strictly speaking, the relation defined by the
formula x ∈ y – see III.11.2) is irreflexive by the foundation axiom. It is not
transitive, though. For example, if a is a set (or atom), then a ∈ {a} ∈ {{a}} but
a ∉ {{a}}.
Let A = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}. The relation ε = ∈ ∩ (A × A)
is transitive and irreflexive; hence it is an order (on A). 

VI.1.6 Example. ⊂ is an order; ⊆ – failing irreflexivity – is not. 

VI.1.7 Definition. Let < be a partial order on A. We use the abbreviation ≤ for
rA (<) = ∆A ∪ <. We pronounce ≤ “less than or equal”. rA (>), i.e., rA (<−1 )
is denoted by ≥ and is pronounced “greater than or equal”.

(1) In plain English, given < on A, we define x ≤ y to mean x < y ∨ x = y


for all x, y in A.
(2) The definition of ≤ depends on A, due to the presence of ∆A . There is no
such dependence on a “reference” or “carrier” class in the case of <.

VI.1.8 Lemma. For any <: A → A, the associated ≤ on A is reflexive,


antisymmetric, and transitive.

Proof. (1) Reflexivity is trivial.


(2) For antisymmetry, let x ≤ y and y ≤ x. If x = y then we are done, so
assume the remaining case x ≠ y (i.e., ⟨x, y⟩ ∉ ∆A ). Then the hypothesis
becomes x < y and y < x; therefore x < x by transitivity, contradicting the
irreflexivity of <.
(3) As for transitivity, let x ≤ y and y ≤ z. If x = z, then x ≤ z (see the
remark following VI.1.7) and we are done. The remaining case is x ≠ z. Now,
if x = y or y = z, then we are done again; so it remains to consider the case
x < y and y < z. By transitivity of < we get x < z, and hence x ≤ z, since
< ⊆ ≤. 

VI.1.9 Lemma. Let P on A be reflexive, antisymmetric, and transitive. Then


P − ∆A is an order on A.

Proof. Since

P − ∆A ⊆ P     (1)

it is clear that P − ∆A is on A. It is also clear that it is irreflexive. We only need
verify that it is transitive.
So let

⟨x, y⟩ and ⟨y, z⟩ be in P − ∆A     (2)

By (1)

⟨x, y⟩ and ⟨y, z⟩ are in P     (3)

Hence

⟨x, z⟩ ∈ P

by transitivity of P.
Can ⟨x, z⟩ ∈ ∆A , i.e., can x = z? No, for antisymmetry of P and (3) would
imply x = y, i.e., ⟨x, y⟩ ∈ ∆A , contrary to (2).
So ⟨x, z⟩ ∈ P − ∆A .

VI.1.10 Remark. Often in the literature ≤: A → A is defined as a partial order


by the requirements that it be reflexive, antisymmetric, and transitive. Then < is
obtained as in Lemma VI.1.9, namely, as ≤ − ∆A . Lemmata VI.1.8 and VI.1.9
show that the two approaches are interchangeable, but the modern approach
of Definition VI.1.1 avoids the nuisance of tying the notion of order to some
particular carrier class A. For us “≤” is the derived notion from VI.1.7. 

VI.1.11 Informal Definition. If < is an order on a class A, we call the pair†


(A, <) a partially ordered class, or PO class. If < is an order on a set A, then
we call the pair (A, <) a partially ordered set or PO set. Often, if the order <
is understood as being on A or A, one says that “A is a PO class” or “A is a PO
set” respectively. 

VI.1.12 Example (Informal). Consider the order ⊂ once more. In this case
we have none of {∅} ⊂ {{∅}}, {{∅}} ⊂ {∅} or {{∅}} = {∅}. That is, {∅} and {{∅}}

† Formally, (A, <) is not an ordered pair . . ., for A may be a proper class. We may think then
of “(A, <)” as informal notation that simply “ties” A and < together. If we were absolutely
determined to, then we could introduce pairing with proper classes as components, for example
as (A, B) = (A × {0}) ∪ (B × {1}). For our part we will have no use for such pair types and will
consider (A, <) in the informal sense.

are non-comparable items. This justifies the qualification partial for orders in
general (Definition VI.1.1).
On the other hand, the “natural” < on N is such that one of x = y, x < y,
y < x always holds for any x, y (trichotomy). That is, all (unordered) pairs
x, y of N are comparable under <. This is a concrete example of a total order.
Another example is ∈ on ω (V.1.20).
While all orders are partial orders, some are total (< above) and others are
nontotal (⊂ above). 

VI.1.13 Definition. A relation < on A is a total or linear order on A iff


(1) it is an order, and
(2) for any x, y in A one of x = y, x < y, y < x holds (trichotomy).
If A is a class, then the pair (A, <) is a linearly ordered class, or LO class.
If A is a set, then the pair (A, <) is a linearly ordered set, or LO set. One often
calls just A a LO class or LO set (as the case warrants) when < is understood
from the context. 

VI.1.14 Example (Informal). The standard <: N → N is a total order; hence


(N, <) is a LO set. 

VI.1.15 Definition. Let < be an order and A some class. An element a ∈ A is


a <-minimal element in A, or a <-minimal element of A, iff ¬(∃x ∈ A)x < a.
m ∈ A is a <-minimum element in A iff (∀x ∈ A)m ≤ x.
We also use the terminology minimal or minimum with respect to <, instead
of <-minimal or <-minimum.
If a ∈ A is >-minimal in A, that is, ¬(∃x ∈ A)x > a, we call a a <-maximal
element in A. Similarly, a >-minimum element is called a <-maximum element.
If the order < is understood, then the qualification “<-” is omitted. 

In particular, if a ∈ A is not in the field of <, then a is both <-minimal and


<-maximal in A.
Note that minimality with respect to < in A has the interesting formulation
<a ∩ A = ∅, which, if < is on A, simplifies further to <a ↑. In this
light, the “general case” also reads < | Aa ↑ (see III.11.9), i.e., a ∈ A is
<-minimal iff the (relational) restriction of < on A is undefined at a.
Because of the duality between the notions of minimal/maximal and
minimum/maximum, we will mostly deal with the <-notions, whose results
can be trivially translated for the >-notions.

VI.1.16 Example (Informal). 0 is minimal, and also minimum, in N with res-


pect to the natural ordering.
In P(N), ∅ is both ⊂-minimal and ⊂-minimum. On the other hand, all of
{0}, {1}, {2} are ⊂-minimal in P(N) − {∅} but none are ⊂-minimum in that set.
Observe from this last example that minimal elements in a class are not in
general unique. 

VI.1.17 Lemma. Given an order < and a class A.

(1) If m is a minimum in A, then it is also minimal.


(2) If m is a minimum in A, then it is unique.

Proof. (1): Assume


(∀x ∈ A)(m = x ∨ m < x) (i)

and prove ¬(∃x ∈ A)x < m.


Well, assume (∃x ∈ A)x < m instead, and introduce a new constant a with
the assumption a < m ∧ a ∈ A. By (i), m = a ∨ m < a. Now, by irreflexivity,
case m = a is ruled out. But then case m < a and transitivity yield a < a,
which contradicts irreflexivity.
(2): Let m and n be minima† in A. Then m ≤ n (with m posing as mini-
mum) and n ≤ m (now n is so posing); hence m = n by antisymmetry
(Lemma VI.1.8). 

VI.1.18 Example. Let m be <-minimal in A. Let us attempt to show that


it is also <-minimum (this is, of course, doomed to fail due to VI.1.16 and
VI.1.17(2) – but the false proof below is interesting).
By VI.1.15 we have ¬(∃x ∈ A)x < m. That is, (∀x ∈ A)¬x < m, i.e.,
(∀x ∈ A)m ≤ x, which says that m is <-minimum in A.
The error is in the last step, where ¬x < m and m = x ∨ m < x were
taken to be equivalent – i.e., we unjustifiably assumed trichotomy or totalness
of the order “<”. As we have seen (VI.1.16), it is possible to prove all three of
¬x < m, ¬x = m, ¬m < x for some orders and appropriate x and m. 

VI.1.19 Lemma. If < is a linear order on A, then every minimal element is


also minimum.

† Plural of minimum.

Proof. The false proof of the previous example is valid under the present cir-
cumstances. 

Much is to be gained, especially for work we will be doing in the next


section, if we generalize the notion of “minimal” – and, dually, “maximal” –
(Definition VI.1.15) to make it relevant to any relation P, even one that is not
necessarily an order.

VI.1.20 Definition. Let P be some relation and A some class.


We say that a ∈ A is P-minimal in A – or a P-minimal element of A – iff
P | Aa ↑.†
If a is P−1 -minimal in A, then we call it P-maximal in A. 

VI.1.21 Remark. Clearly, Definition VI.1.15 is a special case of VI.1.20 when


P is an order. For a ∈ A the condition P | Aa ↑ is equivalent to A ∩ Pa = ∅,
since

P | Aa = ∅ if a ∉ A, and A ∩ Pa otherwise


The following type of relation has crucial importance for set theory, and
mathematics in general.

VI.1.22 Informal Definition. A relation P (not necessarily an order) satisfies


the minimal condition (briefly, it has MC) iff every nonempty A has P-minimal
elements.
If a total order <: A → A has MC, then it is a well-ordering on (or of ) the
class A.
If (A, <) is a LO class (or set) with MC, then it is a well-ordered, or WO,
class (or set). 

VI.1.23 Remark (The Formalities – Exegesis). The above informal definition


is worded semantically (most informal definitions are), saying what an inhab-
itant of U M ought to look for to recognize that a relation P has MC. Since we
insist on certifying truths via proofs in ZFC (notwithstanding our knowledge
that not all truths are so certifiable), operationally, we have so certified some

† As in the case where we wrote “<” for “P” (VI.1.15), the symbol “|” is taken to have higher

priority than “a” and “↑”; thus “P | Aa ↑” means “(P | A)a ↑”.

relation P – and proclaimed that “P has MC” – just in case we have proved the
schema†

∅ ≠ A → (∃x ∈ A)A ∩ Px = ∅     (1)

Correspondingly, the phrase “let P have MC” is argot for the phrase “add the
axiom schema (1)”.
In the present connection, if we set A = {x : A[x]}, schema (1) translates
into
 
(∃x)A[x] → (∃x)(A[x] ∧ ¬(∃y)(y P x ∧ A[y]))     (2)

and each specific formula A provides an instance (see also VI.1.25 below).
The reader will immediately note that (2) generalizes the foundation schema:
Foundation is just the formal translation of the phrase “the (relation) ∈ – i.e.,
{x, y : y ∈ x} where the “∈” in “{. . .}” is the nonlogical predicate of L Set –
has MC”.
This discussion is also meant to caution that the casualness of Definition
VI.1.22 does not hide between the lines quantification over a class term (A) –
a thing we are not allowed to do.
Clearly, ∅ has MC. So, every relation can be “cut down” to a point that it
has MC (if necessary, cut it down all the way to ∅). One interesting way to cut
down a relation is by the process of restriction.
We say that P has MC over A just in case P | A has MC.
The term “PO set” (also “poset”) is standard. “LO set” is not much in cir-
culation, but “WO set” has occurred elsewhere (Jech (1978b)). By analogy we
have introduced the (non-standard) nomenclature PO class, LO class, and WO
class. 

VI.1.24 Proposition. The condition “P has MC over A” is provably equiva-


lent to‡

∅ ≠ B ⊆ A → (∃x ∈ B)B ∩ Px = ∅ (1)

Proof. Using the deduction theorem, we have two directions to prove:


→: Let us first assume that P has MC over A. This means that P | A has
MC (by definition, VI.1.23), and therefore our assumption amounts to adding

† Here x ∈ A; thus P | Ax = A ∩ Px – see VI.1.21. A provable schema is, of course, one such
that all its instances are (here, ZFC) theorems.
‡ Where “∅ ≠ B ⊆ A” is short for “∅ ≠ B ∧ B ⊆ A”, i.e., utilizing the connectives “≠” and “⊆”
conjunctionally.

the schema below (cf. VI.1.23):


∅ ≠ B → (∃x ∈ B)B ∩ (P | Ax) = ∅ (1a)
Next, we focus on one unspecified B (so-called “arbitrary”). Now, after adding
B ≠ ∅ and B ⊆ A, (1a) yields
(∃x ∈ B)B ∩ Px = ∅
using Remark VI.1.21. This proves (1).
←: Conversely, assume (add) (1), fix B, and let ∅ ≠ B (it is not assumed
that B ⊆ A).
We want to prove
(∃x ∈ B)B ∩ (P | Ax) = ∅ (2)

Case B ∩ A = ∅. Then
 
x ∈ B → B ∩ A ∩ Px = ∅
Therefore
x ∈ B → B ∩ (P | Ax) = ∅
by VI.1.21, which yields (2) by ∃-monotonicity (I.4.23) and modus
ponens.
Case B ∩ A ≠ ∅. By (1),
(∃x ∈ B ∩ A)(B ∩ A) ∩ Px = ∅ (3)
since B ∩ A ⊆ A. Therefore (2) is deduced again, since (B ∩ A) ∩ Px =
B ∩ (P | Ax) and the quantification in (3) can be changed to (∃x ∈ B).

Thus under both cases, the assumption ∅ ≠ B yields (2), i.e., P has MC
over A. 

VI.1.25 Corollary. That P has MC over A is provably equivalent to the


schema (4) below:

(∃x ∈ A)F [x] → (∃x ∈ A)(F [x] ∧ ¬(∃y ∈ A)(y P x ∧ F [y]))      (4)

Proof. (a) Add schema (1) above, and prove schema (4): Fix F , and let B =
A ∩ {x : F [x]}. We add
(∃x ∈ A)F [x] (hypothesis of (4)) (5)

(5) yields ∅ ≠ B ⊆ A; hence, by (1),


(∃x ∈ B)B ∩ Px = ∅ (6)

(6) yields
(∃x ∈ B)¬(∃y)(y P x ∧ y ∈ B)
which in turn yields

(∃x ∈ A)(F [x] ∧ ¬(∃y ∈ A)(y P x ∧ F [y]))
This concludes the proof of (4).
(b) Conversely, assume (4) and prove (1): So let ∅ ≠ B ⊆ A for fixed B. The
class B is “given” by a class term {x : B [x]}, so that the assumption yields
(∃x ∈ A)B [x]
By (4) we get

(∃x ∈ A)(B [x] ∧ ¬(∃y ∈ A)(y P x ∧ B [y]))
which, in terms of B, reads
(∃x ∈ A ∩ B)A ∩ Px ∩ B = ∅
which, in view of B ⊆ A, yields exactly what we want:
(∃x ∈ B)B ∩ Px = ∅ 

VI.1.26 Corollary. If P has MC over A and B ⊆ A, then P has MC over B.

Proof. By VI.1.24 we add the schema

∅ ≠ C ⊆ A → (∃x ∈ C)C ∩ Px = ∅ (7)

and fix A and ∅ ≠ D ⊆ B ⊆ A. We want

(∃x ∈ D)D ∩ Px = ∅

which we have by (7), since the hypothesis implies ∅ ≠ D ⊆ A.

VI.1.27 Example (Informal). (N, <), where < is the natural order, is a WO
set. (Z, <) is not. Define next ≺ on Nn+1 by

⟨x1 , . . . , xn+1 ⟩ ≺ ⟨y1 , . . . , yn+1 ⟩ iff x1 < y1 ∧ xi = yi for i = 2, . . . , n + 1

where “<” denotes the natural order on N. Then (Nn+1 , ≺) is a PO set (but not
a LO set) with MC.
Indeed, ≺ is irreflexive and transitive (⟨x1 , . . . , xn+1 ⟩ ≺ ⟨y1 , . . . , yn+1 ⟩ ≺ ⟨z 1 , . . . , z n+1 ⟩ means
x1 < y1 < z 1 and xi = yi = z i for i = 2, . . . , n + 1; hence ⟨x1 , . . . , xn+1 ⟩ ≺ ⟨z 1 , . . . , z n+1 ⟩);
therefore it is an order. Note that ⟨x1 , . . . , xn+1 ⟩ and ⟨y1 , . . . , yn+1 ⟩ are non-comparable if x2 ≠ y2 .
For any ∅ ≠ B ⊆ Nn+1 the minimal elements are (n + 1)-tuples with minimum
first component.
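For finite sets the notion just used can be checked mechanically: by VI.1.21, a ∈ A is P-minimal in A exactly when no x ∈ A satisfies x P a. The following Python sketch is an informal illustration only (the names minimal_elements, B, prec are ours, not part of the text); it computes the ≺-minimal elements of a small B ⊆ N² as in the example above.

    # Informal sketch: P-minimal elements of a finite set A, where the relation P
    # is given as a finite set of pairs (x, y) meaning "x P y".
    def minimal_elements(A, P):
        # a is P-minimal in A iff no x in A has (x, a) in P, i.e., A ∩ Pa = ∅
        return {a for a in A if not any((x, a) in P for x in A)}

    # Example VI.1.27 with n + 1 = 2: (x1, x2) ≺ (y1, y2) iff x1 < y1 and x2 = y2
    B = {(0, 5), (1, 5), (2, 7), (4, 7)}
    prec = {(p, q) for p in B for q in B if p[0] < q[0] and p[1] == q[1]}
    print(minimal_elements(B, prec))   # {(0, 5), (2, 7)}: minimum first component in each "column"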

VI.2. Induction and Inductive Definitions


We have already seen the application of induction, both informally (over N) and
formally (over ω), as well as inductive definitions both over N (in definitions –
cf. Section I.2 for the justification of this metatheoretical tool) and over ω. The
purpose of this section is to study the “induction phenomenon” further, since
these techniques are commonplace in set theory (and logic in general). We will
see that N and ω do not hold a monopoly on inductive techniques and that
we can do induction and inductive (or recursive) definitions over much more
general, indeed “longer”, sets than the natural numbers.

VI.2.1 Informal Definition. A relation P (not necessarily an order) satisfies


the inductiveness condition, or has IC, iff for every class A

(∀x)(Px ⊆ A → x ∈ A) → (∀x)x ∈ A (1)

holds. Formula schema (1) is called the P-induction schema. 

VI.2.2 Remark. As in the case of MC (cf. VI.1.23), operationally, the phrases


“P has IC” and “let P have IC” are argot, respectively, for “(schema) (1) is
provable” and “add schema (1) (as an axiom schema)”.
Once again, we remind the reader that we are not quantifying over A in VI.2.1
any more than we are quantifying over a formula F in the statement of the
collection axiom.
Practically speaking, what the induction schema (1) enables is as follows: If
we want to prove x ∈ A for the free variable x, and if we know of some relation
P that has IC,† then our task can be helped by the additional hypothesis – known
as the induction hypothesis, (I.H. for short) – Px ⊆ A.
This technique proves (∀x)x ∈ A by P-induction on (the variable) x.
Of course, what we have outlined above in English is how to prove
Px ⊆ A → x ∈ A
via the deduction theorem; the usual restrictions on the free variables of Px ⊆
A apply.
Now, when employed in work within ZFC, A is just an argot name for the
class term {x : A[x]}; thus the P-induction schema (or principle), for any P
that has IC, can be restated without class names as

(∀x)((∀y)(y P x → A[y]) → A[x]) → (∀x)A[x]      (1a)

† We “know” because we either proved or assumed schema (1).



or, in English (again invoking the deduction theorem with the usual restrictions):
If P has IC, then to prove (∀x)A[x] it suffices to prove A[x] with the help
of an additional “axiom” (induction hypothesis): that (∀y)(y P x → A[y]) –
with all the free variables of this “axiom” frozen.
An elegant way to say the same thing is that “the property A propagates
with P” in the sense that if all the (“values” of ) y that are “predecessors” of x –
i.e., y P x – have the property, then so does x.†
If P is an order, then (1) will be immediately recognized in form. It gener-
alizes the well-known principle of course-of-values induction‡ over N. 

One can easily verify that ∅ has IC.§ As in the case of the MC property, a relation
can be “cut down” until it acquires IC. In particular, this may come about by
the process of restriction.

VI.2.3 Definition. We say that P has IC over A just in case P | A has IC. 

VI.2.4 Proposition. That P has IC over A is provably equivalent to the follow-


ing schema:
(∀x ∈ A)(A ∩ Px ⊆ B → x ∈ B) → (∀x ∈ A)x ∈ B (2)

Proof. Only-if part. Assume that P has IC over A, i.e., add the following schema:
(∀x)(P | Ax ⊆ D → x ∈ D) → (∀x)x ∈ D (3)
To prove (2), fix B and add
(∀x ∈ A)(A ∩ Px ⊆ B → x ∈ B)
that is,

(∀x)(x ∉ A ∨ (A ∩ Px ⊆ B → x ∈ B))

or, provably equivalently,¶

(∀x)(P | Ax ⊆ B → x ∈ B ∪ Ā)      (4)

† Here we are just trying to employ some visually suggestive nomenclature, thus we are forgetting
the “reality” that y is an “output” of P on “input” x and thus, in the intuition of cause and effect,
it comes after x. We are simply concentrating on the visual effect: y appears to the left of x in
the expression y P x.
‡ This is the name of the induction over N that takes the I.H. on 0, . . . , n − 1 – rather than just on
n − 1 – in order to help the case for n. We have encountered this in a formal setting in Peano
arithmetic in volume 1, Chapter II. See also Exercise V.1.
§ This is hardly surprising, in view of VI.1.23 and VI.2.11 below.
¶ Ā = U M − A.

by VI.1.21. Since P | Ax ⊆ B ∪ Ā implies P | Ax ⊆ B (by VI.1.21), (4) –
via tautological implication followed by ∀-monotonicity (I.4.24) – finally
yields

(∀x)(P | Ax ⊆ B ∪ Ā → x ∈ B ∪ Ā)

which, by (3), proves (∀x)x ∈ B ∪ Ā, that is, (∀x ∈ A)x ∈ B. This estab-
lishes (2).
If part. Assume (2), fix B, and calculate:

(∀x)(P | Ax ⊆ B → x ∈ B)
↔ ⟨tautological equivalence and Leibniz rule⟩
(∀x)((x ∈ A → P | Ax ⊆ B → x ∈ B) ∧ (x ∉ A → P | Ax ⊆ B → x ∈ B))
↔ ⟨distributing ∀ over ∧, VI.1.21, and simplifying using Leibniz rule⟩
(∀x ∈ A)(A ∩ Px ⊆ B → x ∈ B) ∧ (∀x)(x ∉ A → x ∈ B)
→ ⟨using (2)⟩
(∀x)(x ∈ A → x ∈ B) ∧ (∀x)(x ∉ A → x ∈ B)
↔ ⟨distributing ∀ over ∧⟩
(∀x)((x ∈ A → x ∈ B) ∧ (x ∉ A → x ∈ B))
↔ ⟨tautological equivalence and Leibniz rule⟩
(∀x)x ∈ B

Thus, the top line implies the bottom line, as we need. 

The practical outcome is this: To prove x ∈ A → x ∈ B – i.e., to prove that


A ⊆ B – one normally applies the deduction theorem, freezing the free variables
and assuming x ∈ A. The aim is then to prove x ∈ B instead. Now, VI.2.4
shows that if we know of some relation P that has IC over A, then we can use
an additional hypothesis (I.H.), namely,

A ∩ Px ⊆ B

Of course, an additional hypothesis usually helps.



VI.2.5 Corollary. That P has IC over A is provably equivalent to the following


schema:

(∀x ∈ A)((∀y ∈ A)(y P x → F [y]) → F [x]) → (∀x ∈ A)F [x]

VI.2.6 Remark. In the above corollary (∀y ∈ A)(y P x → F [y]) is, of course,
the I.H. The following formula is the induction step:
(∀y ∈ A)(y P x → F [y]) → F [x] (a)
What happened to our familiar (from “ordinary” induction over N, or ω) basis
step? The answer is that to prove the induction step (a) with x free entails that the
proof must be valid, in particular, for all the P-minimal elements of A, if any.†
Now, when considering the case where x is P-minimal in A, (a) is prov-
ably equivalent to F [x] – which is a‡ basis case for x: Instead of proving (a),
prove F [x].
Indeed, that F [x] implies (a) is trivial. Conversely, since y ∈ A ∧ y P x is
refutable for an x that we have assumed to be P-minimal, (∀y)(y ∈ A ∧ y P x →
F [y]) is provable, so that, if (a) is, so is F [x] by modus ponens.

VI.2.7 Example (Informal). The “course-of-values <-induction” over N, as


it is outlined in the elementary literature (e.g., discrete mathematics texts), says
that to prove (∀n ∈ N)P (n) one only need do (1) and (2) below:
prove P (0) (1)
and
prove for every n ∈ N − {0} that (∀m < n) P (m) implies P (n) (2)
Stating (1) explicitly is standard folklore, but, as we have already remarked
in VI.2.6 above, we can actually merge (1) and (2) into

(∀n ∈ N)((∀m < n) P (m) → P (n))
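A programming aside (ours, not part of the text): course-of-values induction is exactly what justifies recursive procedures whose recursive call may be on any smaller argument, not just on n − 1. A minimal Python illustration:

    # The recursive call is on n // 2, some m < n; termination (and correctness)
    # is argued by course-of-values induction on n.
    def binary_digits(n):
        if n < 2:                 # basis: n = 0 or n = 1
            return [n]
        return [n % 2] + binary_digits(n // 2)

    print(binary_digits(13))      # [1, 0, 1, 1]: 13 = 1101 in binary, least significant digit first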

VI.2.8 Remark. In practice, Corollary VI.2.5 is often applied in such a way


that the “verification” on the P-minimal elements of A is stated and performed
explicitly:

Basis cases: One proves F [x] on the assumption that x ∈ A is P-minimal.


Induction step: One proves F [x] on the assumption that x ∈ A is not
minimal, using as I.H. that (∀y ∈ A)(y P x → F [y]).

† It turns out that P has MC over A, so that A does have minimal elements – Theorem VI.2.11
below.
‡ “A” rather than “the”, since there may be many minimal elements.

Of course, the usual precautions that one takes when applying the deduction
theorem are taken. 

VI.2.9 Definition. For any relation P, an infinite descending P-chain is a func-


tion f with the properties
(1) dom( f ) = ω, and
(2) (∀n ∈ ω) f (n + 1) P f (n). 

Intuitively, an infinite descending P-chain is a sequence a0 , a1 , . . . such


that . . . a3 P a2 P a1 P a0 .

VI.2.10 Informal Definition. A relation P is well-founded iff it has no infinite


descending chains.
P is well-founded over A iff P | A is well-founded. 

Intuitively, P is well-founded if the universe U M cannot contain an infinite


descending chain, while it is well-founded over A if A cannot contain an infinite
descending chain. Clearly, no infinite descending P-chain can start anywhere
outside dom(P) in any case.
There is some disagreement on the term “well-founded”. In some of the
literature it applies definitionally to what we have called relations “with MC”.
However, in the presence of AC well-founded relations are precisely those that
have MC, so the slight confusion – if any – is harmless.

VI.2.11 Theorem. For any relation P the following are provable:


(1) P has MC over a class A iff it has IC over A.
(2) If P has MC over A, then P is well-founded over A.

Proof. (1): Consider the schema in VI.1.25 and the schema in VI.2.5. The
former schema is that of MC over A, while the latter is that of IC over A. It is
trivial to verify that an instance of any one of the two schemata realized with
a formula F is provably equivalent to the contrapositive of the instance of the
other realized with the formula ¬ F .
(2): Let instead f be an infinite descending P | A-chain. Then ∅ ≠ ran( f ) ⊆
A, and hence there is an a ∈ ran( f ) which is P | A-minimal. Now, a = f (n)
for some n ∈ ω, but f (n + 1)(P | A) f (n), contradicting the P | A-minimality
of a. 

VI.2.12 Corollary. If P has IC over A and B ⊆ A, then P has IC over B.

Proof. By VI.1.26. 

VI.2.13 Corollary. Let A be a set. Then the following are provably equivalent:
(1) P has MC over A.
(2) P has IC over A.
(3) P is well-founded over A.

Proof. We only need to prove that (3) implies (1). So assume (3), and let (1)
fail. Let ∅ ≠ B ⊆ A such that B has no P-minimal elements. Pick an a ∈ B.
Since it cannot be P-minimal, pick an a1 ∈ B such that a1 P a. Since a1 cannot
be P-minimal, pick an a2 ∈ B such that a2 P a1 .
This process can continue ad infinitum to yield an infinite descending
chain . . . a3 P a2 P a1 P a in A, contradicting (3).
This argument used AC, and more formally it goes like this: Let g be a choice
function for P(B) − {∅}.† Define f on ω by recursion as

f (n) = g(B)                    if n = 0
        g(B ∩ P f (n − 1))      if n > 0

f is total on ω, for B ∩ P f (n − 1) ≠ ∅ for all n > 0, by assumption (cf.
VI.1.24). By g(x) ∈ x for all x ∈ P(B) − {∅}, we have f (n) ∈ P f (n − 1),
i.e., f (n)P f (n − 1) for all n > 0; thus f is an infinite descending chain. 

VI.2.14 Remark. The corollary goes through for any class A, not just a set A,
as we will establish later.
It is also noted that a weaker version of AC was used in the proof, the so-
called axiom of dependent choices, namely that “if P is a relation and B ≠ ∅ a
set such that (∀x ∈ B)(∃y ∈ B)y P x, then there is a total function f : ω → B
such that (∀n ∈ ω) f (n + 1) P f (n).” 

VI.2.15 Example. If P is well-founded, then it is irreflexive. Indeed, if a P a


for some a, then λn.a on ω is an infinite descending chain (. . . a P a P a P a).
By Theorem VI.2.11, if P has IC (equivalently MC), then it is irreflexive.
If P is irreflexive but not well-founded, is then P+ a partial order? (A legiti-
mate question, since P+ is transitive.) Well, no, for consider R = {⟨1, 2⟩, ⟨2, 3⟩,
⟨3, 1⟩}, which is irreflexive. Now R + = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨1, 2⟩, ⟨2, 3⟩,
⟨3, 1⟩, ⟨1, 3⟩, ⟨2, 1⟩, ⟨3, 2⟩}, which is not a partial order (it is reflexive), nor
the reflexive closure of one, since it is not antisymmetric (e.g., 1 R + 3 ∧ 3 R + 1
would require 1 = 3).
It turns out that if P has MC, then so does P+ , and hence, in particular, it is
a partial order, being irreflexive. 
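The computation of R + above is easily mechanized for finite relations; the following Python sketch (ours, informal) is one way:

    # Transitive closure of a finite relation given as a set of pairs (x, y), "x R y".
    def transitive_closure(R):
        closure = set(R)
        while True:
            new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
            if new_pairs <= closure:
                return closure
            closure |= new_pairs

    R = {(1, 2), (2, 3), (3, 1)}
    print(sorted(transitive_closure(R)))
    # all nine pairs on {1, 2, 3}; in particular (1, 1) is present, so R+ is not irreflexive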

† Proof by auxiliary constant, “g”.



VI.2.16 Theorem. If P has MC (IC), then so does P+ .

Proof. Let ∅ ≠ A and a ∈ A be P-minimal, i.e.,


Pa ↑ (1)
+
Suppose now that b P a for some b. Then, for some f with dom( f ) = n ∈ ω
and n > 2 (why n > 2?), we have f (0) = a, f (n − 1) = b, and f (i) P f (i − 1)
for i = 1, . . . , n − 1 (V.2.8 and V.2.24). In particular, f (1) P f (0), which
contradicts (1). Therefore a is also P+ -minimal. 

VI.2.17 Corollary. If P has MC (IC) over A, then (P | A)+ has MC (IC).

Proof. It is given that P | A has MC (IC). By VI.2.16 (P | A)+ has MC (IC).




We cannot sharpen the above to “P+ has MC (IC) over A”, for that means that
P+ | A has MC. The latter is not true, though: Let O be the odd natural numbers,
and R be defined on N by x R y iff x = y + 1; thus R + = >.
Now, R has MC over O (for R | O = ∅), yet R + does not, for R + | O has an
infinite descending chain in O:
··· > 7 > 5 > 3 > 1
In particular, we note from this example that (P | A)+ ≠ P+ | A in general.

VI.2.18 Example. Let ≺ on ω be defined by n ≺ m iff m = n + 1. It is obvious


that ≺ is well-founded; hence it has MC and IC by VI.2.13.
What is ≺-induction? For notational convenience let “(∀x)” stand for
“(∀x ∈ ω)”. Thus, for any formula F (x),

(∀n)((∀x ≺ n)F (x) → F (n)) → (∀n)F (n)
holds. In other words, if F (0) is proved [this is provably equivalent to
(∀x ≺ 0)F (x) → F (0) – see VI.2.6] and if also F (n − 1) → F (n) is proved
under the assumption n > 0, then (∀n)F (n) is proved.
This is just our familiar “simple” (as opposed to “course-of-values”) induc-
tion over ω, stemming from the fact that ω is the smallest inductive set (see
V.1.5 and V.1.6).
The “natural” < on ω (i.e., ∈) is ≺+ . <-induction over ω coincides with the
course-of-values induction over ω. 

VI.2.19 Example. We already know that the axiom of foundation yields that
∈ has MC. Therefore properties of sets can be proved by ∈-induction over U M .


VI.2.20 Example (Double Induction over ω). (See also Chapter V, p. 248.)
We often want to prove
(∀m)(∀n)F (m, n) (1)
for some formula F and m, n ranging over ω. The obvious approach, which
often works, is to do induction on, say, m only, treating n as a “parameter”. That
is (assuming the problem can be handled by “simple” induction):
(i) Prove (∀n)F (0, n).
(ii) For m ≥ 0 prove (∀n)F (m + 1, n) from the I.H. (∀n)F (m, n).
Sometimes steps (i) and/or (ii) are not easy, and can be helped by induction on
n, that is:
(iii) Prove F (0, 0).
(iv) For n ≥ 0 prove F (0, n + 1) from the I.H. (on n) F (0, n),
which settles (i) by induction on n, and then
(v) For m ≥ 0 prove F (m + 1, 0), from the I.H. of (ii) above.
(vi) For m ≥ 0, n ≥ 0 prove F (m + 1, n + 1) from the assumptions
(a) I.H. on n, namely, F (m + 1, n), and
(b) I.H. on m ((ii) above).
Let us revisit the above “cascaded” induction from a different point of view.
Define ≺ on ω × ω by
⟨a, b⟩ ≺ ⟨c, d⟩ iff c = a + 1 ∨ (a = c ∧ d = b + 1)

It is clear that ≺ is well-founded; hence it has IC over ω2 .


What is the proof of (1) by ≺-induction?
(vii) Prove F (0, 0) (⟨0, 0⟩ is the unique ≺-minimal element in ω2 ) – this is
step (iii).
(viii) For non-minimal ⟨m, n⟩ prove F (m, n) from the I.H. ⟨r, s⟩ ≺ ⟨m, n⟩ →
F (r, s).
Item (viii) splits into the following cases:
• m = 0. Then prove F (0, n) from F (0, n − 1) (why is n > 0?) – this is
step (iv).
• n = 0. Then prove F (m, 0) from (∀n)F (m − 1, n) (why is m > 0?) – this is
step (v).
• m > 0, n > 0. Finally prove F (m, n) from F (m, n−1) and (∀n)F (m−1, n) – this is step (vi).
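The transitive closure of the ≺ just defined is the lexicographic order on ω × ω, and ≺ has MC, hence (VI.2.16) so does its transitive closure; recursions whose calls descend in that order therefore terminate. A standard illustration (ours, not part of the text) is the Ackermann function, sketched here in Python:

    # Each recursive call is on a pair lexicographically below (m, n), so the
    # recursion is justified by the well-foundedness (MC) of the order above.
    def ack(m, n):
        if m == 0:
            return n + 1
        if n == 0:
            return ack(m - 1, 1)              # (m - 1, 1) is below (m, 0)
        return ack(m - 1, ack(m, n - 1))      # (m, n - 1) and (m - 1, ...) are below (m, n)

    print(ack(2, 3))    # 9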


VI.2.21 Example. It is clear now that since sets such as ω − {0}, N ∪ {−3, −2,
−1} are well-ordered (by <), we can carry induction proofs over them. In the
former case the “basis” case is at 1; in the latter case it is at −3. 

We next turn to recursive (or inductive) definitions with respect to an arbitrary


P that has IC: first one that is also an order, and then one that is not necessarily
an order.

VI.2.22 Definition (Left-Narrow Relations). (Levy (1979).) A relation P is


left-narrow iff Px is a set for all x. It is left-narrow over A iff P | A is left-
narrow.
Left-narrow relations are also called set-like (Kunen (1980)). 

VI.2.23 Example. ∈ is left-narrow, while ∋ is not.

VI.2.24 Definition (Segments). If < is an order on A and a ∈ A, then the class


< a is called the (initial) segment defined by a, while the class ≤ a is called
the closed segment defined by a. 

≤ is rA (<), of course, so that ≤ a =< a ∪ {a}. Segments of left-narrow


relations are sets.

VI.2.25 Theorem (Recursive or Inductive Definitions). Let <: A → A be a


left-narrow order with IC, and G a (not necessarily total) function G : A ×
U M → X for some class X. Then there exists a unique function F : A → X
satisfying:

(∀a ∈ A)F(a) ! G(a, F |` < a) (1)

The requirement of left-narrowness guarantees (with the help of collection;


cf. III.11.28) that the second argument of G in (1) is a set. This restriction does
not adversely affect applicability of the theorem, as the reader will be able to
observe in the sequel.
Also recall that “!” is Kleene’s weak equality, so that in the recurrence (1)
above we have either both sides defined and equal (as sets or atoms), or both
undefined (see III.11.17).

Proof. The proof is essentially the same as that for recursive definitions over
an inductively defined set that we carried out in the metatheory in I.2.13. See
also the proof of V.1.21.

We prove uniqueness first, so let H : A → X also satisfy (1). Let a ∈ A, and


adopt the I.H. that
(∀b < a)F(b) ! H(b)
that is, b < a → (∀y)(⟨b, y⟩ ∈ F ↔ ⟨b, y⟩ ∈ H), and therefore
F |`< a = H |`< a
It follows that (writing “!” conjunctionally)
F(a) ! G(a, F |`< a)
! G(a, H |`< a)
! H(a)
This settles the claim of uniqueness: (∀a ∈ A)F(a) ! H(a), that is, F = H.
Define now

K = { f : (∃a ∈ A)( f : ≤ a → X ∧ (∀x ∈ ≤ a) f (x) ! G(x, f |`< x))}      (2)

“ f :≤ a → X” stands for “ f is a function ≤ a → X”. Thus, of course,


in particular, f is a class (not an atom). By left narrowness and III.11.28, any
such f is actually a set, so we can “define” the class term K.
A classless way of stating (2) is to let A = {x : A(x)}, G = {⟨x, y, z⟩ :
G (x, y, z)}, and X = {x : X (x)}, adding the assumptions
G (x, y, z) → A(x) ∧ X (z)
and
G (x, y, z) ∧ G (x, y, z′) → z = z′

We then simply name the formula below “M( f, a)”, and let K = { f :
(∃a) M ( f, a)}.

¬U ( f ) ∧ A(a) ∧ (∀z)(z ∈ f → O P(z) ∧ π(z) ≤ a ∧ X (δ(z)))
∧ (∀x)(∀y)(∀z)(z f x ∧ y f x → y = z)      (2′)
∧ (∀x ∈ ≤ a)(∀y)(y f x ↔ G (x, {⟨u, v⟩ : v f u ∧ u < x}, y))

In the formal description of K (i.e., (2′)) we have added the conjunct ¬U ( f )


to exclude atoms. Informally we do not have to do this, since a function f is a
class (our only concern being whether it is proper or not).
We also note that K ≠ ∅. For example, if a ∈ A is <-minimal, then ≤ a =
{a}, < a = ∅, and hence f |`< a = ∅ for any f ; thus K contains {⟨a, G(a, ∅)⟩}
if G(a, ∅) ↓, else it contains the empty function ∅ :≤ a → X. Indeed, for the
latter we have ∅(a) ! G(a, ∅ |`< a), since both sides are undefined.

Since the uniqueness argument above does not depend on the particular left
field A, but only on the fact that < has IC over A, the same proof of uniqueness
applies to the case that the left field is ≤ a (a subset of A),† showing that

⊢ZFC M( f, a) ∧ M(g, a) → f = g      (3)

We have at once‡

⊢ZFC M( f, a) → M(g, b) → f (x) ↓ → g(x) ↓ → f (x) = g(x)      (3′)

because ≤ x ⊆ ≤ a ∩ ≤ b by transitivity;§ hence f |`≤ x = g |`≤ x by (3).
Now

F = ⋃K is a function F : A → X      (4)

for

 F(x) = y ↔ (∃ f )(∃a)(M( f, a) ∧ f (x) = y)

Thus, if F(x) = y and F(x) = z, then (using auxiliary constants f, g, a, b) we


add

f (x) = y ∧ M( f, a) (5)

and

g(x) = z ∧ M (g, b) (6)

from which (and (3′)) we derive y = z.

In preparation for our final step we note that

⊢ZFC M( f, a) → f = F |`≤ a      (7)

Assume the hypothesis, and let x ≤ a. Then, f (x) = y implies F(x) = y by
(4). Conversely, under the same hypotheses – M( f, a) and x ≤ a – as-
sume also F(x) = z. This leads (say, via new constants g and b) to (6). Since
x ∈ ≤ a ∩ ≤ b, we get f (x) = g(x) = F(x) (cf. footnote related to (3′)).
We finally verify that F satisfies the recurrence (1) of the theorem. Indeed, let
F(x) = y, which, using auxiliary constants f and a, leads to the assumption (5)

† Of course, < has IC over ≤ a for any a ∈ A (cf. VI.2.12).


‡ We do mean “=” rather than the wishy-washy “!” here, since x ∈ dom( f ) ∩ dom(g). Note that
if we had known that x ∈≤ a∩ ≤ b, then only one of the hypotheses f (x) ↓ or g(x) ↓ would
have sufficed.
§ We have just used the assumption that < is an order.

above. Then, by M( f, a) – specifically, specializing the last conjunct of (2′) –
we get

G (x, {⟨u, v⟩ : v f u ∧ u < x}, y)

Hence,

G (x, {⟨u, v⟩ : v F u ∧ u < x}, y)

by (7). That is,

⊢ZFC F(x) = y → G(x, F |`< x) = y

We now want the converse,

⊢ZFC G(x, F |`< x) = y → F(x) = y      (8)
Let a ∈ A be <-minimal for which (8) fails. This failure means that we have
G(a, F |`< a) = b for some appropriate b ∈ X (9)
yet †

F(a) ↑ (10)
We define h = F |`< a (a renaming of convenience), which is a function.
By minimality of a, the function F, and hence h, satisfy the recurrence (1) on
< a, that is
(∀x ∈< a)h(x) ! G(x, h |`< x) (11)
The function f = h ∪ {⟨a, b⟩} satisfies (∀x ∈ ≤ a) f (x) ! G(x, f |`< x),
because of (9). Hence f ⊆ F by (4). Now, f (a) = b contradicts (10). 

VI.2.26 Remark. (1) Pretending that the above proof took place in the metathe-
ory, one can view it as “constructively” demonstrating the “existence” of a class
F with the stated properties. Formally, we cannot quantify over classes. Thus,
to prove “(∀A) . . . A . . . ” one proves the schema “. . . A . . .” for the arbitrary
A (that “defines” A = {x : A }). To prove “(∃A). . . A . . .” one must exhibit a
specific formula A (that gives rise to A as above) for which we can prove (the
formal translation of) “. . . A . . .”.
In particular, what we really did in the above proof were two things:
(a) We stated a formula F (x, y), displayed below, that was built from given
formulas:
(∃ f )(∃a)(M( f, a) ∧ ⟨x, y⟩ ∈ f )
where M is given by (2′). We then proved the theorems
F (x, y) ∧ F (x, y′) → y = y′      (∗)

† Clearly, by the direction already proved, F(a) ↓ is incompatible with the failure of (8).

and (1) of the theorem, using in the latter case the abbreviation F(x) = y for
F (x, y). This was our “existence” proof.

The reader will have absolutely no trouble verifying that if instead of G we


have a function symbol, G, of arity 2, and a relation < with IC (dropping A and
X), then we can introduce a function symbol of arity 1, F, so that the following
holds:

⊢ZFC (∀a)F(a) = G(a, F |`< a)

Indeed, F can be introduced by

F(x) = y ↔ (∃ f )(∃a)(M( f, a) ∧ f (x) = y) (∗∗)

since we can prove, under these assumptions,† that

⊢ZFC (∀x)(∃!y)(∃ f )(∃a)(M( f, a) ∧ f (x) = y)

Even in the presence of an A we can always introduce F (using the techniques


in III.2.4) by saying “under the assumption x ∈ A, (∗∗) is derivable, while
under the assumption x ∉ A, F(x) = ∅ is derivable” (definition by cases). (See
Exercise VI.5.)

(b) The uniqueness part showed that our “solution” F is unique within equi-
valence: Any other formula H that is functional (i.e., satisfies (∗) above with
F replaced by H ) and “solves” (1)) is provably equivalent to F :
 
A(x) → F (x, y) ↔ H (x, y)

The above discussion makes it clear that using class terminology and notation
was a good idea.
(2) The recursion on the natural numbers (V.1.21) is a special case of VI.2.25:
Indeed,

f (0) = a
for n ≥ 0, f (n + 1) = g(n, f (n))

can be rewritten as

(∀n ∈ ω) f (n) ! G(n, f |`< n)


! G(n, f |` n)

† G is obtained from G as in III.11.20. Conversely, starting with G = {⟨x, y, z⟩ : G (x, y, z)}, a
function, we have G (x, y, z) → G (x, y, z′) → z = z′. We can then introduce G by G(x, y) =
z ↔ G (x, y, z).

where

G(n, h) = a                        if n = 0
          g(n − 1, h(n − 1))       if h is a function ∧ dom(h) = n > 0
          ↑                        otherwise
Note that G on ω × U M is nontotal. In particular, if the second argument is not
of the correct type (middle case above), G will be undefined. We can still prove
that f (n) ↓ for all n ∈ ω, without using V.1.21.
Assume the claim for m < n (I.H.). For n = 0, we have f (0) ! G(0, ∅) = a,
defined. Let next n > 0. Now f (n) ! G(n, f |` n) and dom( f |` n) = n by I.H.;
hence f (n) = g(n − 1, ( f |` n)(n − 1)) = g(n − 1, f (n − 1)), defined, since g
is total.
(3) In view of the above, it is worth noting that a recursive definition à
la VI.2.25 can still define a total function, even if G is nontotal. 

VI.2.27 Corollary (Definition by Recursion with Respect to an Arbitrary


Relation with IC† ). Let P : A → A be a left-narrow relation – not necessarily
an order – with IC, and G a (not necessarily total) function G : A × U M → X
for some class X. Then there exists a unique function F : A → X satisfying
(∀a ∈ A)F(a) ! G(a, F |` Pa)

Proof. Define G̃ : A × U M → X by

G̃(a, f ) = ∅                 if f is not a function
           G(a, f |` Pa)      otherwise
Let < stand for P+ . Now < is an order on A with IC by VI.2.16. Moreover it
is left-narrow by the axiom of union, since (V.2.24)

P+ a = ⋃{ P n a : n ∈ ω − {0} }
and an easy induction on n shows that each P n a is a set (Exercise VI.2). Thus,
by VI.2.25, there is a unique F : A → X such that

(∀a ∈ A)F(a) ! G̃(a, F |`< a)
             ! G(a, (F |`< a) |` Pa)      (1)

Now, Pa ⊆< a yields (F |`< a) |` Pa = F |` Pa; hence (1) becomes

(∀a ∈ A)F(a) ! G(a, F |` Pa) 

† This idea is due to Montague (1955) and Tarski (1955).



VI.2.28 Corollary (Recursion with a Total G). Let P : A → A be a left-


narrow relation – not necessarily an order – with IC, and G a total function
G : A × U M → X, for some class X. Then there exists a unique total function
F : A → X satisfying

(∀a ∈ A)F(a) = G(a, F |` Pa)

Proof. We only need to show that dom(F) = A. By VI.2.27, there is a unique


F satisfying

(∀a ∈ A)F(a) ! G(a, F |` Pa)

But the right hand side of ! is defined for all a ∈ A; thus we can use “=”
instead of “!” in the statement of VI.2.27. 
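For a finite, left-narrow P with MC the recursion of VI.2.27/VI.2.28 can be carried out literally by a program. The Python sketch below is an informal illustration only (the names make_F and rank are ours); it builds F from G by memoizing the values already computed – the programmed counterpart of forming the union of the approximations f in K.

    # F(a) = G(a, F restricted to Pa), with P a finite, acyclic set of pairs (x, y), "x P y".
    def make_F(G, P):
        cache = {}
        def F(a):
            if a not in cache:
                history = {x: F(x) for (x, y) in P if y == a}   # F restricted to Pa
                cache[a] = G(a, history)
            return cache[a]
        return F

    # Example: "rank" in a finite dag: 0 on P-minimal elements, else 1 + max rank of a predecessor.
    P = {(0, 1), (1, 2), (0, 2), (2, 3)}
    rank = make_F(lambda a, h: (1 + max(h.values())) if h else 0, P)
    print([rank(a) for a in range(4)])    # [0, 1, 2, 3]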

VI.2.29 Corollary (Recursive Definition with Parameters I). Let P : A → A


be a left-narrow relation – not necessarily an order – with IC, and G a (not
necessarily total) function G : S × A × U M → X, for some classes S and X.
Then there exists a unique function F : S × A → X satisfying†

(∀⟨s, a⟩ ∈ S × A)F(s, a) ! G(s, a, {⟨s, x, F(s, x)⟩ : x P a})      (1)

In equation (1) s persists throughout (unchanged); hence it is called a parameter.

Proof. Define the relation P̃ on S × A by

⟨u, a⟩ P̃ ⟨v, b⟩ iff u = v ∧ a P b

It is clear that P̃ has MC. Now, (1) can be rewritten as

(∀⟨s, a⟩ ∈ S × A)F(s, a) ! G(s, a, {⟨s, x, F(s, x)⟩ : ⟨s, x⟩ P̃ ⟨s, a⟩})
                         ! G(s, a, F |` P̃⟨s, a⟩)

The result follows from VI.2.27 by using J given below as “G-function”:



J(g, f ) = ↑                       if g ∉ S × A
           G(π(g), δ(g), f )       otherwise


† “(∀s, a ∈ S × A)” is argot for “(∀z)(O P(z) ∧ π (z) ∈ S ∧ δ(z) ∈ A → . . .”, or, simply,
“(∀s ∈ S)(∀x ∈ A)”.

VI.2.30 Corollary (Recursive Definition with Parameters II). Let all as-
sumptions be as in Corollary VI.2.29, except that the recurrence now reads
(∀⟨s, a⟩ ∈ S × A)F(s, a) ! G(s, a, {⟨x, F(s, x)⟩ : x P a})      (1)
Then there exists a unique function F : S × A → X satisfying (1).

Proof. Apply Corollary VI.2.29 with a “G-function”, J, given by


J(s, a, f ) = G(s, a, p23 ( f ))
where p23 : U M → U M is

p23 ( f ) = ↑                                 if f is not a class of 3-tuples
            {⟨δ(π(z)), δ(z)⟩ : z ∈ f }         otherwise


VI.2.31 Corollary (Pure Recursion with Respect to a Well-Ordering and


with a Partial G). Let <: A → A be a left-narrow well-ordering, and G a (not
necessarily total) function G : U M → X for some class X. Then there exists a
unique function F : A → X satisfying (1)–(2) below:

(1) (∀a ∈ A)F(a) ! G(F |`< a),


(2) dom(F) is either A, or < a for some a ∈ A.

“Pure recursion” refers to the fact that G has only one argument, the “history”
of F on the segment < a.

Proof. In view of Theorem VI.2.25, we need only prove (2). So let dom(F) ≠ A.
Let a in A be <-minimal (also minimum here, since < is total) such that†
F(a) ↑ i.e., G(F |`< a) ↑ (3)
Thus < a ⊆ dom(F). We will prove that dom(F) =< a. Well, let instead
b ∈ dom(F) − < a be minimal.‡
By (3) and totalness of <, we have a < b. By choice of b,
(∀x)(a ≤ x ∧ x < b → F(x) ↑)
Thus,
F |`< b = F |`< a (4)

† Proof by auxiliary constant hiding between the lines.


‡ Another hidden proof by auxiliary constant.

Therefore

F(b) ! G(F |`< b)


! G(F |`< a) (by (4))

contradicting (3), since F(b) ↓. 

VI.2.32 Example. Let G : 2 × U → 2 (recall that 2 = {0, 1}) be

G(x, f ) = 1      if x = 1 ∧ f = ∅
           ↑      otherwise
and 2 be equipped with the “standard order” < (i.e., ∈) on ω. Then the recursive
definition

(∀a ∈ 2)F(a) ! G(a, F |`< a)

yields the function F = {⟨1, 1⟩}, whose domain is neither 2 nor a segment of
2. Thus the requirement of pure recursion in VI.2.31 is essential.† 

VI.2.33 Remark. In practice, recursive definitions with respect to a P that has


MC (IC) often have the form

F(s, x) ! H(s)                                  if x is P-minimal
          G(s, x, {⟨s, y, F(s, y)⟩ : y P x})    otherwise

This reduces to the case considered in VI.2.29 with a “G-function” G̃ given by

G̃(s, x, f ) = H(s)            if x is P-minimal
              G(s, x, f )      otherwise
A similar remark holds – regarding making the basis of the recursion explicit –
for all the forms of recursion that we have considered. 

VI.2.34 Example (The Support Function). The support function sp : U M →


U M gives the set of all urelements, sp(x), that took part in the formation of
some set x. For example,
sp(∅) = ∅
sp(n) = ∅ for every n ∈ ω (induction on n)
sp(ω) = ∅
sp({2, ?, {#, !, ω}}) = {?, #, !} for urelements ?, #, !

† Purity of recursion we tacitly took advantage of in the last step of the proof of VI.2.31. Imagine
what would happen if F’s argument were explicitly present in G: We would get G(b, F |`< b) !
G(b, F |`< a), a dead end, since what we have is G(a, F |`< a) ↑, not G(b, F |`< a) ↑.

The existence and uniqueness of sp is established by the following recursive


definition:

sp(x) = {x}                     if x is an urelement
        ⋃{sp(y) : y ∈ x}        otherwise      (1)
That (1) is an appropriate recursion can be seen as follows: First, ∈ (the relation,
not the predicate) is left-narrow and has MC. Next, (1) can be put in standard
form (Corollary VI.2.28 in this case)

(∀x ∈ dom(sp))sp(x) = G(x, sp |`∈ x)

(of course, ∈ x = x), where the total G : U M × U M → U M is given by

G(x, f ) = {x}             if x is an urelement
           ∅               otherwise, if f is not a relation
           ⋃ ran( f )      in all other cases
Note that in view of the discussion in Remark VI.2.26, we may introduce “sp”
as a new formal function symbol. 
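On hereditarily finite sets the recursion (1) can be executed directly. The following Python sketch is an informal illustration only (sets are modelled by frozensets, urelements by strings; the name sp is reused merely for convenience):

    def sp(x):
        if isinstance(x, str):                            # urelement: sp(x) = {x}
            return frozenset([x])
        return frozenset().union(*(sp(y) for y in x))     # set: union of sp(y) for y in x

    pure = frozenset([frozenset()])                       # {∅}, a pure set
    s = frozenset([frozenset(), frozenset(['#', '!'])])   # {∅, {#, !}}
    print(sp(pure))        # frozenset(): empty support
    print(sorted(sp(s)))   # ['!', '#']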

VI.2.35 Definition (Pure Sets). A set with empty support is called a pure
set. 

VI.2.36 Example (Mostowski’s Collapsing Function). Here is another func-


tion on sets that is an important tool in the model theory of set theory. It is a
function C : U M × U M → U M defined by

C( p, x) = x                                  if x is an urelement
           {C( p, y) : y ∈ p ∧ y ∈ x}          otherwise      (1)
This too can be introduced formally if desired (cf. VI.2.26). Note that p is a
parameter.
What does C do – i.e., what is C( p, x) – to a set or urelement x in the context
of the “reference” set or urelement p?
Well, if x is an urelement, then C does not change it. In the contrary case, if
p is an urelement, then y ∈ p is refutable and thus C( p, x) = ∅.
The interesting subcase is when p is a set. Suppose that x ∩ p = x′ ∩ p
despite (possibly) x ≠ x′. We get from (1)

C( p, x) = {C( p, y) : y ∈ p ∩ x} = {C( p, y) : y ∈ p ∩ x′} = C( p, x′)

In other words, C “collapses” any two sets x and x′ if their (possible) differ-
ences cannot be witnessed inside p. That is, an inhabitant of p, aware only of
members of p but of nothing outside p, cannot tell x and x′ apart on the basis

of extensionality (trying to find something in p that is in one of x or x′ but not


in the other).
Here is a concrete example: Let p = {#, !, ?, {#, @, ?}, {#, ?}}, where #, !, ?,
@ are urelements, and let x = {#, @, ?} and x′ = {#, ?}. Now,

C( p, #) = #
C( p, !) = !
C( p, ?) = ?
C( p, @) = @
C( p, x) = {C( p, y) : y ∈ p ∩ x}
         = {C( p, y) : y = # ∨ y = ?} = {#, ?}
         ≠ x

while

C( p, x′) = {C( p, y) : y ∈ p ∩ x′}
          = {C( p, y) : y = # ∨ y = ?} = {#, ?}
          = x′

and

C( p, p) = {C( p, y) : y ∈ p}
         = {C( p, y) : y = # ∨ y = ! ∨ y = ? ∨ y = x ∨ y = x′}
         = {#, !, ?, {#, ?}}

Note that – in the place of the two original x and x′ of p – C( p, p) (the “collapsed
p”) only contains the common collapsed element, the set C( p, x) (= C( p, x′)).
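The computation just carried out can be replayed mechanically on hereditarily finite sets; the following Python sketch (informal, same modelling as in the sketch for sp: frozensets for sets, strings for urelements) is offered only as an aid to experimentation.

    def C(p, x):
        if isinstance(x, str):                      # x an urelement: unchanged
            return x
        if isinstance(p, str):                      # p an urelement: y ∈ p is refutable
            return frozenset()
        return frozenset(C(p, y) for y in p if y in x)

    x1 = frozenset(['#', '@', '?'])
    x2 = frozenset(['#', '?'])
    p = frozenset(['#', '!', '?', x1, x2])
    print(C(p, x1) == C(p, x2))    # True: both collapse to {#, ?}
    print(C(p, p) == frozenset(['#', '!', '?', frozenset(['#', '?'])]))   # True: the collapsed p above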
Moreover, we note that the C( p, p) that we have just computed is transitive.
This is not a coincidence with the present p, but holds for all p:
Indeed, if C( p, p) is not an atom (case where p is an urelement), then it is a
transitive set (why set?). To verify, let next p be a set and (using conjunctional
notation)

a ∈ b ∈ C( p, p) = {C( p, x) : x ∈ p} (2)

Then b = C( p, x) for some x ∈ p (III.8.7). Now, one case is where x is an


urelement, hence b = x (by (1)). Since a ∈ b is now refutable, a ∈ b ∈
C( p, p) → a ∈ C( p, p) follows.
The other case leads to

b = {C( p, y) : y ∈ p ∩ x}

By the assumption a ∈ b, a = C( p, y) for some y ∈ p ∩ x ⊆ p; hence (by (1))


a ∈ C( p, p).

We conclude by putting the recursion (1) in standard form (that of Corollary


VI.2.29 – hence C is total on U2M , since G̃ below is). Just take as “G-function”
G̃ : U3M → U M , given by

G̃( p, x, f ) = x                                    if x is an urelement
               ∅                                     else, if p is an atom
               ∅                                     else, if f is not a relation
               ran( f |` ({ p} × ( p ∩ x)))           in all other cases
The reader will have no trouble putting future recursions in “standard” form,
and we delegate to him all future instances of such an exercise. 

VI.2.37 Remark. What lies behind the fact that C( p, p) is transitive, intuitively
speaking? Well, by “squeezing out” those elements of x (in p – such as @ above)
which do not help to establish the “identity” of x in p, we have left in x, in
essence, only those objects which (in squeezed form, of course) p “knows
about” (i.e., are its elements). The collapsed p (i.e., C( p, p)) has the hereditary
property: If x (set) is in it, then so are the members of x, and – repeating this
observation – so are the members of the members of x, and so on. 

VI.2.38 Example (Continuation of Example VI.2.36). We now examine


whether the concrete p of the previous example is a possible “universe” of
sets and urelements, where we are content to live (mathematically speaking)
and “do set theory” (i.e., it is the underlying set of some model of ZFC).† We
discover that this potential “universe” has a disturbing property: Even though –
as inhabitants of p – we “know” about certain two of its members, x and x′,
that‡ (∀z)(z ∈ x ↔ z ∈ x′), yet it happens that “really” x ≠ x′. That is,
extensionality fails in this universe.
Let us call a set p extensional iff it satisfies (3) below (otherwise it is called
nonextensional):

(∀u ∈ p)(∀v ∈ p)(¬U (u) → ¬U (v) →
((∀z ∈ p)(z ∈ u ↔ z ∈ v) → u = v))      (3)

It turns out that if p is extensional to begin with, then by collapsing it, not only
do we turn it into a transitive set, but also, the new set C( p, p) is essentially the
same as p; its elements are obtained by a judicious renaming of the elements
of p, otherwise leaving the {}-structure of p intact.
† By our belief that ZFC is consistent – cf. II.4.5 – set universes exist by the completeness theorem
of Chapter I. However, this p cannot be one of them, for extensionality fails in it.
‡ Caution: Since p is (supposed to be) the “universe”, “(∀z)” here is short for “(∀z ∈ p)”.

More precisely, there is a 1-1 correspondence between p and C( p, p) (x ↦
C( p, x) does the “renaming”) which preserves membership relationships (we
get, technically, an isomorphism with respect to ∈).
Let us prove this. If p is extensional, then
λx.C( p, x) : p → C( p, p)
is a 1-1 correspondence such that, for all x, y in p
x∈y iff C( p, x) ∈ C( p, y) (4)
To this end, observe that λx.C( p, x) |` p is, trivially, total and that
ran(λx.C( p, x) |` p) = {C( p, y) : y ∈ p}
= C( p, p)
so that λx.C( p, x) |` p is onto as well. Also observe that since
C( p, y) = {C( p, u) : u ∈ y ∩ p}
we have
x ∈ p ∧ x ∈ y → C( p, x) ∈ C( p, y)
which is half of (4). To conclude we need to show the 1-1-ness of λx.C( p, x) |` p
as well as the if part of (4) above.
We show that
(∀x)(∀y)Q (x, y) (5)
where Q (x, y) stands for
[C( p, x) = C( p, y) → x = y] ∧
[C( p, x) ∈ C( p, y) → x ∈ y] ∧
[C( p, y) ∈ C( p, x) → y ∈ x]
and quantification is over p.
We argue by contradiction, assuming instead the negation of (5):
(∃x)(∃y)¬Q (x, y) (5 )
The argument is extremely close to that of the proof of trichotomy (V.1.20).
So, let x0 be ∈-minimal such that

(∃y)¬Q (x0 , y) (6)


and, similarly, y0 be ∈-minimal such that
¬Q (x0 , y0 ) (7)

We will contradict (7), which says that


(C( p, x0 ) = C( p, y0 ) ∧ x0 ≠ y0 )
∨ (C( p, x0 ) ∈ C( p, y0 ) ∧ x0 ∉ y0 )
∨ (C( p, y0 ) ∈ C( p, x0 ) ∧ y0 ∉ x0 )
Case 1. C( p, x0 ) = C( p, y0 ) ∧ x0 ≠ y0 (refutation of 1-1-ness). If either x0
or y0 is an urelement, then x0 = y0 , a contradiction. Indeed, say x0 is an atom.
Then x0 = C( p, x0 ) = C( p, y0 ), which forces C( p, y0 ) to be an urelement,
inevitably y0 (why?). So let both x0 and y0 be sets.
We will prove that x0 = y0 to obtain a contradiction. Since p is extensional,
this amounts to proving z ∈ x0 → z ∈ y0 and z ∈ y0 → z ∈ x0 for the arbitrary
z ∈ p.
To this end, let z ∈ x0 , so (by (6)), (∀y)Q (z, y) holds, in particular
Q (z, y0 ) (8)
By the only-if part of (4), already proved, z ∈ x0 yields C( p, z) ∈ C( p, x0 ) =
C( p, y0 ). Thus, by (8), z ∈ y0 .
One similarly proves z ∈ y0 → z ∈ x0 . However, it is instructive to include
the full proof here, so that we can make a comment.
Let z ∈ y0 . By (7),
Q (x0 , z) (9)
By half of (4), C( p, z) ∈ C( p, y0 ) = C( p, x0 ). By (9) z ∈ x0 .
Note that the inclusion of the seemingly redundant “∧[C( p, y) ∈ C( p, x) →
y ∈ x]” in the definition of Q (x, y) ensures the symmetry in the roles of x, y.
In the absence of such symmetry, (9) would not help here.
Case 2. C( p, x0 ) ∈ C( p, y0 ) – hence y0 is not an urelement – yet x0 ∉ y0 .
Thus
C( p, x0 ) = C( p, z) (10)
for some z ∈ p ∩ y0 ; hence (by (7)) Q (x0 , z); and (by (10)) x0 = z, thus x0 ∈ y0 ,
contradicting the assumption.
Case 3. C( p, y0 ) ∈ C( p, x0 ), yet y0 ∉ x0 . This case leads to a contradiction,
exactly like the previous one, establishing (5).
Thus, if A = (A, U, ∈) is a countable model of ZFC,† then (C(A, A), U, ∈)
is an isomorphic (=, U and ∈ are “preserved”) transitive model. A so-called
CTM (countable transitive model). Cf. I.7.12. 

† Such models exist by the Löwenheim-Skolem theorem of Chapter I, since the language of ZFC
is countable (granting that ZFC is consistent).

VI.2.39 Example (Example VI.2.36 Concluded). Let p = {#, !, ?, {#, @, ?},


{#, !}}. Set x = {#, @, ?} and x′ = {#, !}. This p is extensional, for p ∩ x ≠
p ∩ x′. One easily computes

C( p, x) = {#, ?}

and

C( p, x′) = {#, !}

Thus the collapse of p, C( p, p), is {#, !, ?, {#, ?}, {#, !}}. 

We conclude this section with an extension of the previous recursive defini-


tion schemata – which define one function – to the case where many functions
are defined at once by simultaneous recursion. This tool – familiar to the worker
in computability, where it goes back (at least) to Hilbert and Bernays (1968) –
will be handy in our last chapter, on forcing. There are many variations that are
left to the reader’s imagination. We just give two of the many possible schemata
here and also restrict the number of functions that are simultaneously defined
to just two (without loss of generality, as the reader will readily attest).

VI.2.40 Corollary (Simultaneous Recursion with Respect to an Arbitrary


Relation with IC). Let P : A → A be a left-narrow relation – not necessarily
an order – with IC, and G1 and G2 (not necessarily total) functions A × U2M →
X, for some class X. Then there exist unique functions F1 and F2 , from A to X,
satisfying

(∀a ∈ A)F1 (a) ! G1 (a, {⟨x, F1 (x)⟩ : x P a}, {⟨x, F2 (x)⟩ : x P a})      (1)

and

(∀a ∈ A)F2 (a) ! G2 (a, {⟨x, F1 (x)⟩ : x P a}, {⟨x, F2 (x)⟩ : x P a})      (2)

Proof. We define functions p1 and p2 by



p1 ( f ) = ↑                                   if f is not a class of ⟨x, y, z⟩-type entries
           {⟨π(z), π(δ(z))⟩ : z ∈ f }           otherwise

and

p2 ( f ) = ↑                                   if f is not a class of ⟨x, y, z⟩-type entries
           {⟨π(z), δ(δ(z))⟩ : z ∈ f }           otherwise

and set

G = λx f.⟨G1 (x, p1 ( f ), p2 ( f )), G2 (x, p1 ( f ), p2 ( f ))⟩

By VI.2.27 there is a unique F : A → X such that

(∀a ∈ A)F(a) ! G(a, {⟨x, F(x)⟩ : x P a})

It is trivial to check (induction) that F is a class of x, y, z-type entries


(equivalently, F(a) ↓ implies that F(a) is a pair). Taking F1 = π ◦ F and F2 =
δ ◦ F, we have satisfied (1) and (2) respectively. 

VI.2.41 Corollary (Simultaneous Recursion with a Total G). Let P : A → A


be a left-narrow relation – not necessarily an order – with IC, and G1 and G2
total functions A × U2M → X for some class X. Then there exist unique total
functions F1 and F2 from A to X satisfying

(∀a ∈ A)F1 (a) = G1 (a, F1 |` Pa, F2 |` Pa)

and

(∀a ∈ A)F2 (a) = G2 (a, F1 |` Pa, F2 |` Pa)
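In the finite case simultaneous recursion is simply mutual recursion; a tiny Python illustration (ours, not part of the text), with P the immediate-predecessor relation on ω and the familiar even/odd pair:

    # is_even and is_odd are defined simultaneously from each other's values on Pn = {n - 1}.
    def is_even(n):
        return True if n == 0 else is_odd(n - 1)

    def is_odd(n):
        return False if n == 0 else is_even(n - 1)

    print([is_even(n) for n in range(5)])    # [True, False, True, False, True]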

VI.3. Comparing Orders

VI.3.1 Example (Informal). Consider R ⊆ A × A, where A = {1, 2, 3} and


the relation R = {⟨1, 2⟩, ⟨2, 3⟩, ⟨1, 3⟩}. Also consider S ⊆ B × B, where B =
{a, b, c} and S = {⟨a, b⟩, ⟨b, c⟩, ⟨a, c⟩}.
(A, R) and (B, S) are PO sets (indeed, WO sets). What is interesting here is
that once we are given the first PO set, the second one does not offer any new
information as far as partial order or, indeed, well-ordering is concerned. This
observation holds true if “first” and “second” are interchanged.
This is because (B, S) is obtained from (A, R) by a systematic renaming
of objects (1 ↦ a, 2 ↦ b, 3 ↦ c) which preserves order. That is, f = {⟨1, a⟩,
⟨2, b⟩, ⟨3, c⟩} is a 1-1 correspondence A → B such that x R y iff f (x) S f (y).
Since such a correspondence exhibits the fact that (A, R) and (B, S)
have the same “shape”, or “form” (loosely translated into Greek, the same
“µoρφ ή”, or “morphē” in transliteration), it has been given the standard name
isomorphism.†

† Strictly speaking, order isomorphism in this case, since the concept of isomorphism extends to
other mathematical structures as well. The prefix “iso” in the term comes from the Greek word
ίσ o, which means “equal” or “identical”.

If we have complete knowledge of (A, R) (respectively (B, S)), it is as good


as having complete knowledge of (B, S) (respectively (A, R)). It suffices to
study any convenient one out of many mutually order-isomorphic PO sets (or
LO sets or WO sets). 
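Checking that a given renaming is an isomorphism is a finite matter when A and B are finite; an informal Python sketch (the name is_isomorphism is ours) for the example above:

    # f is assumed to be a bijection (a dict); we check x R y iff f(x) S f(y) for all x, y.
    def is_isomorphism(f, R, S):
        return all(((x, y) in R) == ((f[x], f[y]) in S) for x in f for y in f)

    R = {(1, 2), (2, 3), (1, 3)}
    S = {('a', 'b'), ('b', 'c'), ('a', 'c')}
    f = {1: 'a', 2: 'b', 3: 'c'}
    print(is_isomorphism(f, R, S))    # True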

VI.3.2 Example (Informal). (N, <) and ({−2, −1} ∪ N, <) are order-
isomorphic PO sets, where the “<” is the standard order (they are also order-
isomorphic WO sets).
Indeed, if we let f : N → {−2, −1} ∪ N be λx.x − 2, then clearly f is a
1-1 correspondence and i < j iff f (i) < f ( j) for all i, j in N. 

Now some definitions and some useful results.

VI.3.3 Informal Definition. Let (A, S) and (B, T) be two PO classes. A 1-1
correspondence f : A → B is an order-isomorphism just in case

x S y iff f (x) T f (y) for all {x, y} ⊆ A.

(A, S) and (B, T) are called order-isomorphic. We write (A, S) ≅ (B, T).

We will drop the qualification “order-” from “order-isomorphic” as long as the


context ascertains that there is no other type of isomorphism in consideration.
We often abuse language (and notation) in those cases where the orders S (on
A) and T (on B) are clearly understood from the context. We then say simply
that A and B are isomorphic and write A ≅ B. As usual, the negation of ≅ is
written ≇.

A notion related to isomorphism is that of an order-preserving function.

VI.3.4 Informal Definition. Let (A, S) and (B, T) be two PO classes, and
f : A → B be total. If, on the assumption that x ∈ A ∧ y ∈ A, the implication
x S y → f (x) T f (y), holds, then f is called order-preserving. 

VI.3.5 Remark. Operationally, “holds” above can only be certified by a ZFC


proof (when such proof is possible). Correspondingly, the act of assuming, in
the course of a proof, that a certain total f : A → B is order-preserving between
the PO classes (A, S) and (B, T) is tantamount to adding the axiom

x ∈ A ∧ y ∈ A → x S y → f (x) T f (y)

If we use the more natural notation <1 and <2 for S and T respectively, then
the above definition says that x <1 y → f (x) <2 f (y) is the condition for a
total f to be order-preserving. 

VI.3.6 Example (Informal). Let A = {a, b, c, d} be equipped with the order


<1 so that (just) a <1 b. Let B = {1, 2} be equipped with the order <2 so that
(just) 1 <2 2.
Define f = {⟨a, 1⟩, ⟨b, 2⟩, ⟨c, 1⟩, ⟨d, 1⟩}. Clearly, f : A → B is order-
preserving, since the statement

(∀x ∈ A)(∀y ∈ A)(x <1 y → f (x) <2 f (y))

is true. However, note that f is not 1-1; hence it is not an isomorphism. 

VI.3.7 Example (Informal). Let A = {a, b, c, d} be equipped with the order


<1 so that (just) a <1 b. Let B = {1, 2, 3, 4} be equipped with the order
<2 so that 1 <2 2 <2 3 <2 4 (employing “<2 ” conjunctionally). Define g as
{⟨a, 1⟩, ⟨b, 2⟩, ⟨c, 3⟩, ⟨d, 4⟩}. Then g is order-preserving. It is also a 1-1 corre-
spondence, but not an isomorphism, since g(c) <2 g(d) but c ≮1 d. In fact, c
and d are non-comparable under <1 . 

VI.3.8 Proposition. Let (A, <1 ) be a LO class, (B, <2 ) be a PO class, and
f : A → B be order-preserving (see VI.3.5 for the interpretation of these
assumptions). Then

(a) f is 1–1, and


(b) f is an isomorphism between (A, <1 ) and (ran( f ), <2 ).

Proof. (a): Let x ≠ y in A. As <1 is a linear order, we have x <1 y or y <1 x;
let us examine only the latter case. Then f (y) <2 f (x); hence f (x) ≠ f (y),
since the orders are irreflexive.
(b): Let f (x) <2 f (y) in ran( f ). Since this implies f (x) ≠ f (y), we must
have x ≠ y by single-valuedness of f . Thus we will have x <1 y or y <1 x.
Let the latter be the case. Then also f (y) <2 f (x); hence by the assumption
and transitivity of <2 , f (x) <2 f (x) – a contradiction.
We conclude that x <1 y is the only possible case. Therefore we have
established that f (x) <2 f (y) → x <1 y, which, along with f being order-
preserving, establishes f as an isomorphism of LO classes. 

VI.3.9 Remark. A function such as f is called an embedding. It embeds


(A, <1 ) into (B, <2 ) in the sense that it shows the former to be an

isomorphic copy of a subclass of B (here ran( f )), where, of course, this subclass
is equipped with the same order as B, namely <2 .
If ran( f ) = B, then the embedding is an isomorphism. 

VI.3.10 Corollary. Let (A, <1 ) be a LO class, B a class, and f : A → B a


1-1 correspondence. Define x <2 y on B by f −1 (x) <1 f −1 (y).
Then (B, <2 ) is a LO class that is isomorphic to (A, <1 ).

VI.3.11 Remark. We say that the order <2 on B is induced by f (and <1 ). 

VI.3.12 Corollary. If <1 in VI.3.10 is a well-ordering, then (B, <2 ) is a WO


class isomorphic to (A, <1 ).

Proof. Let ∅ ≠ X ⊆ B, and let a = min( f −1 [X]). Now, x ∈ X implies f −1 (x) ∈


f −1 [X]; hence a ≤1 f −1 (x); thus f (a) ≤2 x. That is, f (a) = min(X). 

VI.3.13 Proposition. Let (A, <) be a PO class with MC, and f : A → A be


order-preserving. Then there is no x ∈ A such that f (x) < x.

Proof. Assume the contrary, and let m be minimal in B = {x ∈ A : f (x) < x}.
Thus,

f (m) < m (1)

Since f is order-preserving, (1) yields

f ( f (m)) < f (m) (2)

By (2), f (m) ∈ B, which by (1) contradicts the minimality of m. 

VI.3.14 Remark. Another way to see the reason for the above is to observe
that if for any a ∈ A

f (a) < a

holds, then

· · · < f ( f ( f (a))) < f ( f (a)) < f (a) < a

is an infinite descending chain, contradicting VI.2.11. 

VI.3.15 Corollary. If (A, <) is a WO class and f : A → A is order-preserving,


then (∀x ∈ A)x ≤ f (x).

The following two corollaries use the notion of segment in their formulation
(see VI.2.24).

VI.3.16 Corollary. There is no isomorphism between a WO class and one of


its segments.

Proof. Say (A, <) is a WO class and f : A →< a is an isomorphism, where
a ∈ A. Then f (a) ∈< a, that is, f (a) < a, a contradiction. 

VI.3.17 Corollary. Given a WO class (A, <). If a ∈ A and ≤ a ⊂ A, then


there is no isomorphism f : A → ≤ a.

Proof. Let b ∈ A− ≤ a. Thus a < b; hence (conjunctionally) f (b) ≤ a < b,


contradicting VI.3.13. 

VI.3.18 Corollary. If (A, <) is a WO class, and f : A → A is an isomorphism,


then f = 1A , the identity function on A.

Proof. Let instead f (a) ≠ a for some a ∈ A. If a < f (a), then applying the
order-preserving f −1 to both sides, we get f −1 (a) < a, contradicting VI.3.13.
For the same reason, the hypothesis f (a) < a is rejected outright. 

VI.3.19 Corollary. If (A, <1 ) and (B, <2 ) are isomorphic WO classes, then
there is exactly one isomorphism f : A → B.

Proof. Let f : A → B and g : A → B be isomorphisms. It is trivially verifiable


that g −1 ◦ f : A → A is an isomorphism.
By VI.3.18, g −1 ◦ f = 1A ; hence f = g, since both functions are 1-1 corre-
spondences. 

The next result shows, on one hand, that if two WO classes are not isomor-
phic, then one properly contains (an isomorphic copy of) the other, i.e., the
“smaller” of the two is embeddable in the “larger”. On the other hand, it shows
that every WO class has the structure of (i.e., is isomorphic to) a segment.

VI.3.20 Theorem. Let (A, <1 ) and (B, <2 ) be any WO classes and <1 be
left-narrow. Then exactly one of the following cases obtains:
(a) The two WO classes are isomorphic,
(b) (A, <1 ) is isomorphic to a segment of (B, <2 ),
(c) (B, <2 ) is isomorphic to a segment of (A, <1 ).

Proof. By VI.3.16 no two of the above three cases are possible at once. It
remains to prove the disjunction of (a)–(c). Intuitively, we start off by pairing
min(A) with min(B). Then we pair the “next larger” element of A with that of
B. We continue in this way until either we run out of elements from A and B
simultaneously, or deplete A first, or deplete B first (these cases correspond to
the ones enumerated (a)–(c) in the theorem).
Formally now, if any of A or B is ∅, then the result is trivial. So let A ≠
∅ ≠ B, and apply the pure recursion (VI.2.31) to define the function F : A → B
by

(∀x ∈ A)F(x) ! min{ y : y ∈ B − ran(F |`<1 x) }      (1)

Next, we establish that F : dom(F) → ran(F) is an isomorphism. By VI.3.8, it


suffices to show that it is order-preserving on its domain. To see this we show
that

(∀y)(y ∈ ran(F |`<1 x) → y <2 F(x)) (2)

Assume instead (recall that <2 is total)

(∃y)(y ∈ ran(F |`<1 x) ∧ F(x) ≤2 y)      (2′)

Because of (2′) we may add the assumption

c ∈ ran(F |`<1 x) ∧ F(x) ≤2 c (3)

where c is a new constant. Let b ∈<1 x (another auxiliary constant) such that
F(b) = c. By (1),

z ∈ B − ran(F |`<1 b) → F(b) ≤2 z (4)

By (1) again, F(x) ∉ ran(F |`<1 x) (in particular, y <1 x → F(y) ≠ F(x),
i.e., F is 1-1); hence F(x) ∉ ran(F |`<1 b) by <1 b ⊂ <1 x.
Thus

F(b) <2 F(x)

by (1), (4) and 1-1-ness of F (the last property sharpens “≤2 ” to “<2 ”). This
contradicts (3) since c = F(b). We have established (2).
By VI.2.31 we have one of

dom(F) = A (5)

or

dom(F) =<1 a for some a ∈ A (6)



Before proceeding we show that

x ∈ ran(F) →<2 x ⊆ ran(F) (7)

This is trivial if ran(F) = B. Let then x ∈ ran(F), and take

c ∈ B − ran(F) (8)

(a new auxiliary constant) such that

c <2 x (9)

Let x = F(y). By (1) and (8)

F(y) ≤2 c

contradicting (9). This settles (7). We immediately conclude that

If ran(F) ≠ B, then ran(F) = <2 b, where b = min{y : y ∈ B − ran(F)}.

Suppose now that (5) is the case. If also ran(F) = B, then we are done in this
case. If on the other hand ran(F) ≠ B, then ran(F) is a segment by the above,
so we are done in this case.
Suppose finally that (6) is the case. Thus ran(F) is either all of B or a segment,
<2 b.
We will retire the proof if we show this latter subcase to be untenable: Indeed,
the function F ∪ {a, b} properly extends F, still satisfying (1) –

Pause. Do you believe this?

contradicting uniqueness of F (VI.2.31). 
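For finite WO sets the map F of (1) is easy to compute: one simply pairs the elements of A, taken in increasing order, with the least elements of B not yet used. A Python sketch (informal, ours); which of (a)–(c) obtains is read off from which list is exhausted first.

    # A_sorted and B_sorted list the elements of the two WO sets in increasing order.
    def compare(A_sorted, B_sorted):
        F = dict(zip(A_sorted, B_sorted))       # F(x) = least unused element of B
        if len(A_sorted) == len(B_sorted):
            return 'isomorphic', F
        if len(A_sorted) < len(B_sorted):
            return 'A isomorphic to a segment of B', F
        return 'B isomorphic to a segment of A (via the inverse of F)', F

    print(compare([0, 1, 2], ['a', 'b', 'c', 'd']))
    # ('A isomorphic to a segment of B', {0: 'a', 1: 'b', 2: 'c'})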

VI.3.21 Remark. The above theorem can form the basis for the comparability
of ordinals of the next section. Alternatively, one can prove the comparability
of ordinals directly and derive VI.3.20 (for WO sets) as a corollary (VI.3.23
below). We will return to this remark in the next section. 

VI.3.22 Exercise.

(i) If <2 is known not to be left-narrow (the statement of the theorem allows
either possibility), then how are cases (a)–(c) affected?
(ii) Suppose that <2 is left-narrow as well, and A and B are proper classes.
What now? 

VI.3.23 Corollary. Let (A, <1 ) and (B, <2 ) be any WO sets. Then exactly one
of the following cases obtains:

(a) The two WO sets are isomorphic,


(b) (A, <1 ) is isomorphic to a segment of (B, <2 ),
(c) (B, <2 ) is isomorphic to a segment of (A, <1 ).

VI.4. Ordinals
Let (A, <) be a WO set, where A ≠ ∅. Let a0 = min(A). If A − {a0 } ≠ ∅, then
let a1 = min(A − {a0 }). In general, if A − {a0 , a1 , . . . , an } ≠ ∅, then define
an+1 to be min(A − {a0 , a1 , . . . , an }).
Possibly, for some smallest n ∈ N, A − {a0 , a1 , . . . , an } = ∅, and thus A =
{a0 , a1 , . . . , an }, so that a0 < a1 < · · · < an .
Another possibility, when A is (intuitively† ) infinite is, that we will exactly
need all the natural numbers in N in order to name the positions of the elements
of A in their (ascending) <-order; that is, A = {a0 , a1 , . . .} and a0 < a1 < · · · .
Is it possible that a WO set is so “long” that we will run out of position
names (from N) before we run out of positions in A? The answer (affirmative)
is straightforward:

VI.4.1 Example (Informal). Adjoin to N – equipped with the “natural order”
< = {⟨i + 1, i⟩ : i ∈ N} – a new object. For example, adjoin the new object N
(“new” in the sense that N ∉ N) to form A = N ∪ {N}.
Next, extend < to < A on A by < A = < ∪ {⟨N, i⟩ : i ∈ N}. That is, i < A N for
all i ∈ N.
The requirement that the object N have a position immediately after all the
i’s in N forces us to run out of position names (supplied from N) when we are
naming the positions of the elements of the WO set A. Object N is the first in
A that has no position name, if the name supply is just N.
Mathematicians use the name ω (the same one used for the set of formal
natural numbers) to name the position of N in A (that is, the first position
after positions 0, 1, 2, . . .). Thus A is the “ordered sequence” a0 < A a1 < A
a2 < A · · · < A aω .
We can carry this further. We can imagine a WO set (B, < B ) that is so long
that it requires yet another position name after ω. We call this new position ω + 1,
so that B is the ordered sequence b0 < B b1 < B b2 < B · · · < B bω < B bω+1 .

† We are going to formalize the notions “finite” and “infinite” in Chapter VII.

Similarly, for still longer WO sets one invents position names ω + 2, ω + 3,


etc.
What would be the name for the position immediately after all the ones
named ω + i (i ∈ N)? Mathematicians have invented the name “ω · 2”. 

These position names of WO set elements are the so-called ordinals (also
called ordinal numbers). They provide (among other things) an extension of
the position-naming apparatus that N is.
In order to eventually come up with a well-motivated formal definition
of ordinals, let us speculate a bit further on their nature. Extrapolating from
the discussion of Example VI.4.1, let us imagine a sequence of position
names 0, 1, . . . , ω, ω + 1, . . . , ω · 2, ω · 2 + 1, . . . , ω · 3, ω · 3 + 1, . . . of suffi-
cient length so that the elements of any WO set (A, <) can fit, in ascending
order (with respect to the WO set’s own “<”) contiguously from left to right in
named position slots (starting with the 0th position slot).
Once we have so fitted (A, <), let the ordinal α be the first unused position
name. This α characterizes the “form” or “type” of the WO set (A, <), in the
sense that if (B, <1 ) is another WO set such that (A, <) ∼ = (B, <1 ), then the
elements of B, in view of
A : a0 < a1 < · · · < aγ . . .
B : b0 < b1 < · · · < bγ . . .
will occupy exactly the same positions as the A-elements, and thus, once again,
α will be the first unused position name.

Hence, a formal definition of ordinals must ensure that they are objects of set
theory associated with WO sets in such a way that the same ordinal corresponds
to each WO set in a class of pairwise isomorphic WO sets. That is, one looks
for a function & . . . & , defined on all WO sets, such that
&(A, <1 )& = &(B, <2 )& iff (as WO sets) (A, <1 ) ∼
= (B, <2 )
The range of & . . . & will be the class of all ordinals – which turns out to be a
proper class.

The above observations led to Cantor’s original definition:

VI.4.2 Tentative Definition. (See Wilder (1963, p. 111).) The ordinal or or-
dinal number of a WO set (A, <) is the class of all WO sets (B, <1 ) such that
(A, <) ∼
= (B, <1 ). 

The “permanent” definition will be given in VI.4.16.



VI.4.3 Remark. The reader can readily verify that ∼= is an equivalence relation
on the class of all WO sets. Thus, the above definition adopts (A, <) ↦ [(A, <)]∼=
(recall the notation introduced in V.4.3) as the function & . . . &.
It turns out that the equivalence classes [. . .]∼= are too big to be sets
(Exercise VI.7), so that they are inappropriate as formal objects of the
theory. 

Therefore we try next

VI.4.4 Tentative Definition. (See Kamke (1950, p. 57).) The ordinal or ordinal
number of a WO set (A, <) is an arbitrary representative out of [(A, <)]∼= . 

The new definition gets around the difficulty mentioned in VI.4.3. However,
it creates a great sense of uncertainty with the indefinite (“an arbitrary represen-
tative”) manner in which an ordinal is “defined”. To conclude this discussion
that peeks into the history of the development of ordinals (mostly by Cantor),
let us try and fix the latest tentative definition (VI.4.4) so that we can appreciate
that the old-fashioned way of introducing ordinals could be made to work. We
will fix the definition and follow up some of its early consequences. Once this
is done, we will have on hand enough motivational ideas to start from scratch
with von Neumann’s modern definition. The reader will benefit from knowing
both points of view.

Warning. All these tentative definitions are informal and deal with metamath-
ematical concepts.

VI.4.5 Tentative Definition. The ordinal or ordinal number of a WO set
(A, <), in symbols &(A, <)&, is that element of [(A, <)]∼= picked up by the
principle† of global (strong) choice. “On” denotes the class of all ordinals. 

We are not committing ourselves above to an assumption that we have strong


or global choice, an assumption that would entail (indeed, would be equivalent
to – see the informal discussion in Section IV.2) the well-orderability of U M .
The reader is strongly reminded that, until the definitive definition of ordinals
in VI.4.16 below, all that these tentative attempts towards a definition do is to
outline briefly the history that led to the definitive definition. For this reason,
any auxiliary assumptions introduced to make these tentative definitions tenable
will be discarded as soon as we reach VI.4.16.

† We avoid the term “axiom”. The reason is explained in the commentary following the definition.

We have the following trivial consequence of this definition:

VI.4.6 Proposition (Informal). &(A,<1 )& = &(B,<2 )& iff (A,<1 ) ∼= (B,<2 )
for any two WO sets.

VI.4.7 Remark. All along, when we wrote (A, R) for a set A equipped with a
relation R ⊆ A × A, the symbol (. . . , . . .) was used informally, simply to remind
us of the two ingredients of the situation, namely A and R (see also the footnote
to VI.1.11).
In instances such as Tentative Definitions VI.4.2–VI.4.5, for example in uses
such as &(A, R)&, one would expect to use the formal ⟨A, R⟩ instead, so that the
“pair” of A and R is an object of the theory (a set). However, we will continue
using round brackets to denote PO sets, as we have previously agreed to do.
Ordinals will be denoted by lowercase Greek letters, in general. Notation
for specific ordinals may differ (see the following example). 

VI.4.8 Example (Informal). What is &({0, 1}, <)&, where < is the standard
order (∈) on ω? According to VI.4.5, it is whichever WO set of exactly two
elements (say, ({a, b}, {⟨b, a⟩}) for some a ≠ b) strong AC will pick out of
the class [({0, 1}, <)]∼= . We naturally use a standard name, the symbol “2”, to
denote the ordinal of a WO set of two elements. This is summed up as

    &({a, b}, {⟨b, a⟩})& = &(2, <)& = 2

since 2 = {0, 1}.


Similarly, the symbol “n” (in ω) will denote the ordinal &(n, <)&, where
again, < = ∈. Finally, we have already remarked that ω will be the short name
for the ordinal &(ω, ∈)&. 

Next, we consider the ordering of ordinals.

VI.4.9 Tentative Definition. An order, <, is defined on On as follows: Let α


and β be two ordinals. Then α < β iff α is isomorphic to a segment of β. 

VI.4.10 Remark (Informal). Recall that α = (A, <1 ) and β = (B, <2 ) for
some appropriate A, B, <1 , <2 . Now, intuitively, (A, <1 ) can be embedded
into (B, <2 ) as a segment iff the sequence

B : b 0 <2 b 1 <2 · · ·

is longer than the sequence


A : a0 <1 a1 <1 · · ·
and hence, iff the position immediately to the right of the A-sequence is to the
left of the position immediately to the right of the B-sequence. The italicized
text says, intuitively, that α < β; therefore the above definition is consistent
with our view that the ordinal of a WO set is the position name of the first
position to the right of the set. 

VI.4.11 Proposition (Informal). If α and β are ordinals, then exactly one of


α < β, α = β, β < α holds.

Proof. Let α = (A, <1 ) and β = (B, <2 ). By VI.3.23, exactly one of the follow-
ing holds:
(a) (A, <1 ) ∼
= (B, <2 ),
(b) (A, <1 ) is isomorphic to a segment of (B, <2 ),
(c) (B, <2 ) is isomorphic to a segment of (A, <1 ).
(b) and (c) say α < β and β < α, respectively, by VI.4.9.
By (a), both (A, <1 ) and (B, <2 ) are in the same equivalence class. Since
strong choice picks “deterministically” a unique representative from each equiv-
alence class, and each of (A, <1 ) and (B, <2 ) is a representative, it follows that
(A, <1 ) = (B, <2 ), i.e., α = β. 

VI.4.12 Proposition (Informal). On is well-ordered by “<” of VI.4.9.

Proof. By VI.4.11, < is total. By VI.3.16, < is irreflexive. The reader can verify
that it is also transitive (Exercise VI.8). Therefore, < is a linear order.
Let next ∅ ≠ A ⊆ On (A need not be a set). Let α ∈ A. If α = min(A),† then
we are done; otherwise X = {β ∈ A : β < α} is nonempty. Let α = (Y, <1 ).
Next, if β = (Z , <2 ) ∈ X , then (by VI.4.9) there is a unique

Pause. Why “unique”?


yβ ∈ Y such that β ∼
= (<1 yβ , <1 ). By collection, X is a set.

We show that there is a minimum β in X . If not, then (VI.2.13) there is an


infinite descending <-chain in X :
· · · < β3 < β2 < β1 (1)

† The term “minimum” and “minimal” are interchangeable, since < is total (VI.1.19).

This induces the infinite descending <1 -chain in Y :

· · · <1 yβ3 <1 yβ2 <1 yβ1 (2)

where yβi is chosen as above to satisfy

βi ∼
= (<1 yβi , <1 ) (3)

(2) contradicts the fact that (Y, <1 ) is a WO set (VI.2.13), and we have shown
that X has a minimal element, as long as we manage to convince that the inequal-
ities in (2) indeed hold.
To this end, let β < γ in X , where β = (Z , <2 ), γ = (W, <3 ). We have

γ ∼
= (<1 yγ , <1 ) (4)

and

β∼
= (<1 yβ , <1 ) (5)

Also, by β < γ ,

β∼
= (<3 u, <3 ), where u ∈ W (6)

Observe that yβ ≠ yγ (otherwise, β ∼= γ from (4) and (5), whence β = γ
by VI.4.6, contradicting β < γ (irreflexivity)).
Assume now that yγ <1 yβ . Then γ , that is (W, <3 ), is isomorphic to a seg-
ment of (<1 yβ , <1 ) by (4), and therefore to a segment of (<3 u, <3 ) by (5)
and (6). That is, (W, <3 ) ∼
= (<3 v, <3 ), where v <3 u, contradicting VI.3.16.
Thus, yβ <1 yγ .
Let then β be the <-minimal in X . We claim that β is <-minimal (also
<-minimum) in A, which will rest the case. Indeed, if not, then for some γ in
A, γ < β. Then γ ∈ X by transitivity of <, contradicting the minimality of β
in X . 

VI.4.13 Proposition (Informal Normal Form Theorem). For each α ∈ On,


{β : β < α} ∼
= α.

Since we have adopted the convention that lowercase Greek letters stand for
ordinals, we will use the shorthand “{β : . . .}” for “{β ∈ On : . . .}”. Also, recall
that – for now – α = (A, <1 ) for some A (set), so that the <1 -ingredient is incor-
porated in the notation “· · · ∼
= α”. In writing “{β : . . .} ∼
= · · ·”, however, we are

slightly abusing notation, since we ought to have written “({β : . . .}, <) ∼
= · · ·”
instead, where < is the order on On defined in VI.4.9. This type of notational
abuse is common when the order is clearly understood (this echoes the remark
following VI.3.3).

Proof. Let α = (Y, <1 ) and X = {β : β < α}.† As in the proof of VI.4.12, for
each β ∈ X we pick a yβ ∈ Y such that

β∼
= (<1 yβ , <1 ) (1)

Consider the relation F = {⟨β, yβ⟩ : β ∈ X }. It is single-valued in yβ , for if also
β ∼= (<1 y′β , <1 ) and (without loss of generality) y′β <1 yβ , then

    (<1 yβ , <1 ) ∼= (<1 y′β , <1 )

contrary to VI.3.16.
We saw in the proof of VI.4.12 that F is order-preserving (γ < β → yγ <1
yβ ); hence (by VI.3.8)
F
(X, <) ∼
= (ran(F), <1 ) (2)

where ran(F) ⊆ Y . Now, if y ∈ Y and β = &(<1 y, <1 )&, then β < α by VI.4.9;
hence β ∈ X and F(β) = y. This shows that F is onto Y . 

VI.4.14 Proposition (The (Informal) Burali-Forti Antinomy). On is not a


set.

Proof. In the contrary case, (On, <) is a WO set, by VI.4.12. So let

α = &(On, <)&

Thus,

(On, <) ∼
=α (1)

By VI.4.13,

(< α, <) ∼


=α (2)

(1) and (2) yield (On, <) ∼


= (< α, <), contradicting VI.3.16. 

† This is a different X from the one employed in the proof of VI.4.12.



VI.4.15 Remark. The Burali-Forti antinomy is the first contradiction of naı̈ve


set theory, discovered by Burali-Forti (and Cantor himself). It is a “paradox”
or “antinomy” in that it contradicts the thesis (Frege’s) that for any formula
of set theory, F (x), the class {x : F (x)} is a set. The F (x) in question here is
“x is an ordinal”.
Observe, on the other hand, that by VI.4.13, by the fact that any α is
some (A, <1 ) for some set A, and by collection, we have that {β : β < α} = <α
is a set for any α. In particular, this says that our tentative < on On is left-
narrow. 

By the normal form theorem (VI.4.13), ({α : α < &(A, <1 )&}, <), where “<”
is that of VI.4.9, is a member of [(A, <1 )]∼
= for all (A, <1 ).

Let us ponder then what would be the consequences if the principle of global
(strong) choice (invoked in VI.4.5) were to be so smart as to always pick

({α : α < &(A, <1 )&}, <) (i)

for “&(A, <1 )&”, for all WO sets (A, <1 ). Since the order in all instances of (i)
is the same (that is, < of VI.4.9), we could go one step further and just use the
set {α : α < &(A, <1 )&} as the ordinal for the WO set (A, <1 ), implying, rather
than including explicitly, the order <. Of course, the sets in (i) are ∼
=-invariants
just as they are when thought of as WO sets under <. Under the pondered
circumstances we end up with a recurrence
    &(A, <1 )& =def {α : α < &(A, <1 )&}
where the α’s are the ordinals (according to our present speculative analysis)
assigned to the segments of (A, <1 ) by VI.4.9.
The self-referential definition above can also be written more simply as
< α = α (ii)
Let us “compute” (i.e., find which sets are) the first few ordinals. For example,†
(∅, <1 ) has no segments; therefore the set {α : α < &(∅, <1 )&} is empty, i.e.,

&(∅, <1 )& = ∅, the smallest ordinal

Now, ({∅}, <1 ) (where <1 is empty as well) has one segment only, (∅, <1 ).
Hence, exactly one ordinal, ∅, is smaller than &({∅}, <1 )&. Thus
&({∅}, <1 )& = {∅} (iii)

† <1 is the empty order here.



Next, let us compute &({a, b}, <2 )&, where a ≠ b and a <2 b. The only segments
are (∅, <2 ) and ({a}, <2 ), which have ordinals ∅ and {∅} respectively.
Of course, ({a}, <2 ) ∼= ({∅}, <1 ); hence &({a}, <2 )& = &({∅}, <1 )& = {∅}
by (iii). Thus, †

&({a, b}, <2 )& = &({∅, {∅}}, <)& = {∅, {∅}}

Note that for the first three ordinals, at least, their order < coincides with ∈,
since

∅ ∈ {∅} ∈ {∅, {∅}}

This is true of all ordinals, for‡ β ∈ α iff (by (ii)) β ∈ <α iff β < α.
Continuing our pondering on what if global choice were smart, we observe
that each ordinal is a transitive set (Definition V.1.11). Indeed, let α ∈ β ∈ γ .
By the previous remark this is equivalent to α < β < γ ; hence, by transitivity
of <, α < γ ; therefore α ∈ γ . Of course, an ordinal, being the set of all the
smaller ordinals, will have as members only transitive sets.
Von Neumann showed that, surprisingly, these transitivity properties fully
characterize the “appropriate concept” of an ordinal as a “special set”, without
any recourse to any form of AC – from which we disengage in the following
“permanent” definition – and without any a priori reliance on the concept of
well-ordering either.
The reader is now asked to consider all the preceding attempts to get ordinals
off the ground as “motivational discussion” with a historical flavour. Therefore
the definitions and consequences VI.4.2–VI.4.14 are to be discarded. Our formal
study of ordinals starts with VI.4.16 below. In particular we will show that
ordinals (as defined below) are ∼=-invariants.

VI.4.16 Definition. (von Neumann.) An ordinal, or ordinal number, is a tran-


sitive set all of whose members are also transitive sets.
For the record, we may introduce a unary predicate “Ord” – which says of
its argument that it is an ordinal – by the definition (1) below, where T itself is
the unary predicate introduced by T (x) ↔ ¬U (x) ∧ (∀y ∈ x)(∀z ∈ y)z ∈ x.
T (x) says that x is a transitive set:

Ord(x) ↔ T (x) ∧ (∀y ∈ x)T (y) (1)

† < here is that of VI.4.9.


‡ By (ii), the only members of ordinals – in the current stage of the tentative definition – are
themselves ordinals. Thus “x ∈ α ∧ x ∈ On” – that is, “β ∈ α” – is equivalent to “x ∈ α”.

On will be the class of all ordinals; that is, On abbreviates the class term
{x : Ord(x)}. Lowercase Greek letters will denote arbitrary ordinals – that is,
we employ, in our argot, “ordinal-typed” variables α, β, γ , . . . , with or without
subscripts or primes, a notation that we extend to unspecified ordinal constants.
Thus in instances such as “. . . α . . . ” we will understand more: “. . . α ∧ α ∈
On . . . ”. Of course, specific ordinals (i.e., specific ordinal constants) may have
names deviating from this rule (e.g., ∅ in the lemma below). 

We now embark on developing the properties of ordinals.

VI.4.17 Lemma. On ≠ ∅.

Proof. ∅ satisfies Definition VI.4.16; therefore, ∅ ∈ On. 

VI.4.18 Example. Here are some more members of On, as the reader can
readily verify using VI.4.16: {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}.
Indeed, every natural number n, and the set of natural numbers ω, are ordinals
by V.1.12–V.1.13. 
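
As an informal aside (not part of the formal development), Definition VI.4.16 is easy to test mechanically on small pure sets. In the Python sketch below pure sets are modelled as frozensets; the helper names are ad hoc.

    # Checking "transitive set all of whose members are transitive sets" on finite pure sets.
    # Assumes the inputs are pure sets, i.e., every member is itself a frozenset.
    def is_transitive(x):
        return all(y <= x for y in x)       # every member is also a subset of x

    def is_ordinal(x):
        return is_transitive(x) and all(is_transitive(y) for y in x)

    zero = frozenset()
    one = frozenset({zero})
    two = frozenset({zero, one})
    print(is_ordinal(zero), is_ordinal(one), is_ordinal(two))   # True True True
    print(is_ordinal(frozenset({one})))   # False: {{∅}} is not transitive
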

The definition of ordinals does not explicitly state that the members of an
ordinal are themselves ordinals. The following lemma says that.

VI.4.19 Lemma. If x ∈ α, then x ∈ On.

The above statement can be rephrased in a number of ways: “On is a transitive


class”, or “α ⊆ On for all α”, or “every member of an ordinal is an ordinal”. This
last formulation coincides with the results of our earlier informal discussion.

Proof. Let

y∈x ∈α (1)

By VI.4.16, y ∈ α. Therefore (VI.4.16), y is a transitive set. By VI.4.16 and (1),


x is transitive. Since so is its arbitrary member y, x is an ordinal. 

VI.4.20 Corollary (Normal Form Theorem). {β : β ∈ α} = α.

Proof. For any set y, y = {x : x ∈ y}. In particular, α = {x : x ∈ α}. By VI.4.19,

⊢ZFC x ∈ α ↔ x ∈ α ∧ x ∈ On

Hence, α = {x : x ∈ α ∧ x ∈ On}. Thus, using the notational convention of


VI.4.16, we may write α = {β : β ∈ α}. 

Another way to say the above is “∈α = α”.

VI.4.21 Theorem (Burali-Forti Antinomy). On is not a set.

Proof. Suppose On is a set. By VI.4.19 it is an ordinal; hence On ∈ On, which


is impossible by foundation. 

VI.4.22 Lemma. ∈ restricted on On is a partial order with MC.

“∈”, as the context hopefully makes clear, is here the relation defined by the
predicate “∈”. We should not need to issue such warnings in the future.

Proof. That ∈ has MC on On is trivial, since it does so on U M (foundation).


Next, x ∉ x for all sets, so that ∈ is irreflexive. Finally, α ∈ β ∈ γ implies
α ∈ γ by VI.4.16; hence ∈ is transitive on On. 

VI.4.23 Theorem. ∈ well-orders On.

Proof. By VI.4.22, it suffices to show that ∈ is total on On. To this end, let

P (α, β) stand for α ∈ β ∨ α = β ∨ β ∈ α (0)

We will show that

(∀α)(∀β)P (α, β) (1)

where, of course, quantification is over On, as the “typed” variables α and β


make clear (cf. VI.4.16). We can prove (1) by ∈-induction or, equivalently,
by ∈-MC. We do the latter. So let the negation of (1) hold (i.e., we argue by
contradiction), that is, we add the assumption

(∃α)(∃β)¬ P (α, β) (2)

By ∈-MC on On, let α0 be ∈-minimal such that

(∃β)¬ P (α0 , β) (3)

Next, let β0 be ∈-minimal such that

¬ P (α0 , β0 ) (4)

Let now
γ ∈ β0 (5)
Then P (α0 , γ ) by (4) and ∈-minimality of β0 . P (α0 , γ ) (by (0)) yields one of:
Case 1. α0 = γ . That is, (by (5)) α0 ∈ β0 , contradicting (4).
Case 2. α0 ∈ γ . By (5) and transitivity of β0 , again α0 ∈ β0 ; unacceptable. We
must therefore have
Case 3. γ ∈ α0 .
Thus, by (5),
β0 ⊆ α0 (6)
Next, let
δ ∈ α0 (7)
Then ∈-minimality of α0 , and (3),† yield P (δ, β0 ). The latter yields in turn
(by (0))
Case 1. δ = β0 . That is, (by (7)) β0 ∈ α0 , contradicting (4).
Case 2. β0 ∈ δ. By (7) and transitivity of α0 , again β0 ∈ α0 ; unacceptable. This
leaves
Case 3. δ ∈ β0 .
Hence α0 ⊆ β0 , which along with (6) yields α0 = β0 . This contradicts (4).


The above proof is essentially a duplicate of that for the trichotomy of ∈


over the natural numbers (V.1.20), although here it covers a wider context. The
reader may also wish to compare the above proof with that in Example VI.2.38.

VI.4.24 Corollary. x ∈ On iff x is a transitive set that contains no atoms as


members, and ∈ well-orders x.

Proof. Only-if part. By VI.4.19, an ordinal x satisfies x ⊆ On. Thus, the restric-
tion of the well-ordering ∈ of On on x well-orders the latter. Moreover, by
VI.4.16 no atom is an ordinal; thus y ∈ x → ¬U (y).
If part. Let x be a transitive set that contains no atoms, and let the restriction
of ∈ on x be a well-ordering. Let y ∈ x. First off, ¬U (y).
We need only show that y is transitive. To this end let, in conjunctional
notation,
u ∈ v ∈ y (∈ x) (1)

† ¬(∃β)¬ P (δ, β) follows; hence (∀β) P (δ, β); thus P (δ, β0 ) by specialization.

Applying transitivity of x twice, we get in turn v ∈ x and u ∈ x. Thus {u, v, y} ⊆


x. Since ∈ is a well-ordering on x, it is also transitive on x. Hence, (1) yields
that u ∈ y. But then y is a transitive set. 

It is a trivial observation that the corollary above goes through even if well-
order(ing) were relaxed to total order(ing), as the reader can readily check.
However the above “redundant” formulation is necessary if one desires to found
the notion of ordinal in the absence of the foundation axiom (that axiom was
used in VI.4.23 in an essential way). In that case one takes the statement of
Corollary VI.4.24 as the definition of ordinals.
In this discussion, enclosed between double “dangerous turn” road signs, we
digress to peek into this possible avenue of founding ordinals. This discussion
is only of use in the proof of the consistency of foundation with the remaining
axioms of ZFC, and can otherwise be omitted with no loss of continuity. So
we temporarily suspend here (i.e., until the end of this “doubly dangerous”
material) the axiom of foundation, and define:

VI.4.25 Alternative Definition (Ordinals in ZF − Foundation). x is an or-


dinal iff it is a transitive atom-free set that is well-ordered by ∈. The notational
conventions of VI.4.16 will apply; in particular, we continue using the symbol
“On” for the class of all ordinals. 

We have the following consequences, in point form:

(1) α ∉ α for all ordinals. (Careful here! We cannot rely on foundation to say
that ∈ is irreflexive.) Indeed, let

α∈α (i)

Since (the right hand) α is (well-)ordered by ∈,

∈ | α is irreflexive. (ii)

By (i), the left α is a member of the right α, so that (ii) yields α ∉ α (these
are two copies of the left α). We have just contradicted (i).
(2) x ∈ α ∈ On → x ∈ On. Assume x ∈ α ∈ On. As in the proof of the if
part of VI.4.24, x is a transitive set. By transitivity of α, x ⊆ α. At once we
obtain that x is atom-free, since α is. Moreover, x is well-ordered by ∈, an
order inherited from α. Thus, the alternate Definition VI.4.25 too implies
that On is transitive, that is, ordinals only contain ordinals as members.
(3) α ⊂ β → α ∈ β. Assume α ⊂ β, and let γ ∈ β − α (set difference) be
∈-minimum (we say minimum rather than minimal because ∈ is total on β).

Therefore, if δ ∈ γ , then δ ∉ β − α. On the other hand, δ ∈ β by transitivity
of β. Thus δ ∈ α; hence

γ ⊆α (iii)

Next, let

δ∈α (iv)

Since α ⊂ β, we have δ ∈ β. Moreover, γ ∈ β as well; hence (since ∈ is


total on β) we have three cases:
Case 1. γ = δ. This is untenable due to γ ∉ α and (iv).
Case 2. γ ∈ δ. This is impossible as well, as it yields γ ∈ α by transitivity
of α and (iv). This leaves
Case 3. δ ∈ γ . Along with (iv), this yields α ⊆ γ ; hence α = γ by (iii).
We conclude (using =, ∈, ⊆ conjunctionally) that

α = γ ∈ (β − α) ⊆ β

In short, α ∈ β.
(4) α = β ∨ α ∈ β ∨ β ∈ α. Let α ≠ β. Observe that α ∩ β ⊆ α and α ∩ β ⊆ β.
Also, α ∩ β is transitive (verify) and well-ordered by ∈ (as a subset of α)
as well as atom-free (hence an ordinal). By hypothesis, one of the two
inclusions (⊆) must be proper (⊂), in which case the other is equality.
Indeed, if they are both proper, then, by (3), α ∩ β ∈ α and α ∩ β ∈ β;
hence α ∩ β ∈ α ∩ β, contradicting (1). Say, α ∩ β = α, i.e., α ⊆ β. Since
we have assumed α ≠ β, (3) yields α ∈ β.
(5) α = {β : β ∈ α}. By (2).
(6) On is well-ordered by ∈. We need only establish ∈-MC on On. So let ∅ ≠
A ⊆ On, a subclass of On. Let α ∈ A. If α ∩ A = ∅, then ∈ α ∩ A = ∅,
that is, α is ∈-minimal in A. If now α ∩ A ≠ ∅, then let β be ∈-minimal in
α ∩ A (α ∩ A is a subset of the WO set (α, ∈)). We argue that β is ∈-minimal
in A. If not, let γ ∈ A be such that γ ∈ β. Then γ ∈ α by transitivity of α;
hence γ ∈ α ∩ A. This contradicts the choice of β.
This approach (sans foundation) may be considered attractive in that it re-
lies on fewer axioms than that of VI.4.16. We are nevertheless committed to
having foundation (which has already provided us with some interesting re-
sults in VI.2.38) and therefore will continue our development based on Defini-
tion VI.4.16.

VI.4.26 Definition. As is normal practice, we will often utilize the symbol “<”
for the well-ordering ∈ | On. Thus α < β means exactly α ∈ β. 

VI.4.27 Lemma. The reflexive closure, rOn (<) =≤, of < coincides with ⊆ on
On, i.e., ≤ = ⊆ on On.

Proof. Let α ≤ β. This means that α = β or α ∈ β. In the former case,


α ⊆ β is immediate. In the latter case it follows from the transitivity of β
(γ ∈ α → γ ∈ β).
Conversely, let α ⊆ β. Since we want α = β ∨ α ∈ β, by VI.4.23 we only
need to discredit the hypothesis β ∈ α. Well, that hypothesis, along with the
original hypothesis, leads to β ∈ β. 

VI.4.28 Example. We already know that ∅ ∈ On. ∅ is the <-minimum element


of On, since for any α, ∅ ⊆ α translates to ∅ ≤ α. 

VI.4.29 Remark. By VI.4.20, α = {β : β < α}, or α = <α, i.e., an ordinal


is the set of all smaller ordinals. Thus, Definition VI.4.16, offered without the
assistance of either AC or the concept of well-ordering, yields formally the
property (ii) of ordinals that we had arrived at in our wishful motivational
discussion prior to VI.4.16. The next result will establish that ordinals are ∼
=-
invariants of WO sets. Thus we have come now full circle, and all the pieces of
the puzzle fit. 

VI.4.30 Theorem. Let (A, <1 ) be a WO set. Then there is a unique ordinal α
and a unique isomorphism φ A for which
φA
(A, <1 ) ∼
= (α, <)
where < is the standard order ∈ on On.

Proof. By VI.3.20, and for the WO classes (A, <1 ) and (On, <),† we have these
three alternatives:

(1) (On, <) is isomorphic to a segment of (A, <1 ). This is impossible, because
collection would force On to be a set.
(2) (On, <) ∼ = (A, <1 ). Untenable, as in (1).
(3) So it must be that (A,<1 ) ∼
= (< α,<) for some α. By VI.4.20 (cf. VI.4.29),
this says
φA
(A, <1 ) ∼
= (α, <) (i)

† Note that both <1 and < are left-narrow, the former because A is a set, the latter by VI.4.20.

for some φ A . Suppose that (A, <1 ) ∼


= (β, <) as well, where (without loss
of generality) β < α, i.e., β ∈ α. By (i), (α, <) ∼
= (β, <); hence (α, <) ∼
=
(< β, <) contradicting VI.3.16. This settles uniqueness of α.
By VI.3.19, φ A is unique. 

VI.4.31 Corollary. (α, <) ∼


= (β, <) iff α = β.

Proof. The if part is trivial. The only-if part was proved in the course of the
above proof (uniqueness of α). 

VI.4.32 Remark. (1) We can prove VI.4.30 without recourse to VI.3.20 by


defining φ A on A by <1 -recursion as follows:†
φ A (a) = {φ A (b) : b <1 a} for all a ∈ A
or
φ A (a) = φ A [<1 a] (1)
The reader is asked to pursue this. Once successful, one can turn to prove VI.3.23
(for WO sets) using the theorem on the comparability of ordinals.
It is important to note that (1) is the only possible definition for φ A , for it is
tantamount to the requirement that φ A be an isomorphism, i.e., that
φ A (b) ∈ φ A (a) iff b <1 a
(2) Whenever (A, <1 ) ∼ = (α, <), we can intuitively think that the members
of α serve as “indices” (or position names) for the ordering of the elements of
A in ascending order. α is the first index not needed in this “enumeration” of
A. Compare this remark with the discussion that launched this section.
(3) With the normally accepted abuse of notation, we often write α ∼ = β to
mean (α, <) ∼= (β, <); thus Corollary VI.4.31 can be also stated as “α ∼= β iff
α = β ”.
(4) Rephrasing VI.4.30, we can state that every [(A, <1 )]∼
= contains a unique
ordinal WO set, (α, <). We can then re-introduce the symbol &(A, <1 )&, for-
mally this time, without any reliance on any form of AC, to mean (α, <), where
(A, <1 ) ∼
= (α, <). Actually, the following (final) definition picks just the ordinal
α rather than the WO set (α, <). 

† We assume without loss of generality that <1 has A as its field; therefore we do not need to add
“∧ b ∈ A” to the condition b <1 a.

VI.4.33 Definition (Order Types of WO Sets). For any WO set (A, <1 ), the
symbol &(A, <1 )& stands for the unique α which by VI.4.30 satisfies

(A, <1 ) ∼
= (α, ∈)

α is called the order type of (A, <1 ). If A is a set of ordinals and <1 = ∈, then
we use the simpler notation &A& = α (rather than &(A, <1 )& = α). 

VI.4.34 Corollary. &α& = α.

Proof. (α, <) ∼


= (α, <) and VI.4.33. 

VI.4.35 Corollary. (A, <1 ) ∼


= (B, <2 ) iff &(A, <1 )& = &(B, <2 )&.

Proof. If part. (A, <1 ) ∼= &(A, <1 )& = &(B, <2 )& ∼= (B, <2 ).
Only-if part. &(A, <1 )& ∼= (A, <1 ) ∼= (B, <2 ) ∼= &(B, <2 )& and VI.4.31. 
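
As an informal illustration (ad hoc names; finite case only), the recursion φA(a) = φA[<1 a] of VI.4.32(1) can be run directly on a finite WO set given in ascending order; the range of φA is then the von Neumann ordinal that VI.4.33 calls the order type.

    # Finite simulation of φA(a) = {φA(b) : b <1 a}, with pure sets as frozensets.
    def order_type(A):
        """A: a finite WO set listed in ascending <1-order."""
        phi = {}
        for a in A:
            phi[a] = frozenset(phi.values())   # images of the <1-predecessors of a
        return frozenset(phi.values())         # ran(φA) is the order type

    zero, one = frozenset(), frozenset({frozenset()})
    print(order_type(["x", "y"]) == frozenset({zero, one}))   # True: a two-element WO set has order type 2
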


VI.4.36 Example. Let ∅ ≠ A be any class of ordinals (not just a set). Then ⋂A
is a set. We will argue that it is an ordinal, indeed the smallest (<-minimum)
ordinal in A.
Let α ∈ A. Since ⋂A ⊆ α, ⋂A is well-ordered by < (i.e., ∈), and since
none of the α in A contains atoms (by VI.4.16), nor does ⋂A. Using VI.4.24, we need
only show that ⋂A is transitive in order to conclude that it is an ordinal.
So let β ∈ α ∈ ⋂A. Thus, γ ∈ A → β ∈ α ∈ γ . By transitivity of γ ,
γ ∈ A → β ∈ γ ; hence β ∈ ⋂A.
Next,

    α ∈ A → ⋂A ⊆ α     (1)

translates to (VI.4.27) α ∈ A → ⋂A ≤ α. Thus, it only remains to prove that

    ⋂A ∈ A

or that some among the inclusions (1) is an equality. If not, by VI.4.27

    α ∈ A → ⋂A ∈ α

Hence ⋂A ∈ ⋂A, a contradiction. 
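
Again informally (ad hoc names; finite von Neumann ordinals only), the claim of VI.4.36 can be checked mechanically: the intersection of a nonempty set of ordinals is its least member.

    # ⋂A = min(A) for a finite set A of finite von Neumann ordinals (as frozensets).
    from functools import reduce

    def von_neumann(n):
        a = frozenset()
        for _ in range(n):
            a = a | frozenset({a})          # successor: a ∪ {a}
        return a

    A = {von_neumann(5), von_neumann(2), von_neumann(7)}
    print(reduce(lambda x, y: x & y, A) == von_neumann(2))   # True
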

VI.5. The Transfinite Sequence of Ordinals


Ordinals have essentially been introduced in order to extend the natural number
sequence, so that one can index, or number, elements of arbitrary WO sets.
Also, in much the same way that the natural numbers are employed to label
steps, or stages, in a mathematical construction (by a recursive definition over
ω), ordinals can be used for the same purpose whenever the natural number
sequence is not “long enough” for the labeling, or, in the case of a mathemat-
ical construction, whenever the stages of the construction are too many to be
labelled by natural numbers. In this section we learn how to count (or how to
sequence) with the help of ordinals, and find that ordinals are naturally split
into three mutually exclusive types, namely, 0, limit ordinals, and successor or-
dinals. In the light of this classification we revisit, or rephrase, the principles of
induction and recursive (inductive) definitions, already studied in Section VI.2
for arbitrary WO classes, as these principles apply over On or over an arbitrary
α. We will apply both induction and inductive definitions over On, under their
new guise, in the next section to formally construct “all sets” starting from the
urelements, and to study their properties. In particular, we will find there that
the vague principle of “set formation by stages” – on which we have based
the discovery, and “truth”, of all the ZFC axioms except that of collection and
infinity – can be made precise with the help of ordinals.
We recall here Definition V.1.1 of the set successor operation x ∪ {x} on any
set x, and the notational convention established in V.1.19.

VI.5.1 Definition (Successor on On). The successor operation on sets x is


defined by x ∪ {x} and is generally denoted by S(x).
If x ∈ On, then x + 1 is a preferred synonym for S(x). 
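
Informally (ad hoc names; not part of the formal development), the successor operation is easy to visualize on finite pure sets modelled as frozensets:

    def S(x):                       # S(x) = x ∪ {x}
        return x | frozenset({x})

    zero = frozenset()
    one, two, three = S(zero), S(S(zero)), S(S(S(zero)))
    print(one == frozenset({zero}))                  # 1 = {0}
    print(two == frozenset({zero, one}))             # 2 = {0, 1}
    print(three == frozenset({zero, one, two}))      # 3 = {0, 1, 2}
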

The notation α + 1 for ordinals is consistent with that for the natural numbers
(V.1.19). However, unlike the special case of natural numbers (which are “finite
ordinals”), where n + 1 = 1 + n is provable for the free variable n over ω
(cf. V.1.24), it is not the case that α + 1 = 1 + α is provable in general. For
example, we will soon see that ZFC 1 + ω = ω + 1. Of course, we have not
yet said what “1 + α” ought to mean in general, but that will be done soon.
We also recall here the result of Lemma V.1.9 and the fact (see remark prior
to the proof) that

⊢ZFC S(x) = S(y) → x = y

for free x and y, not just for variables restricted over ω. In particular,

⊢ZFC α + 1 = β + 1 → α = β     (1)

VI.5.2 Example (What If?). Let us prove (1) above again, this time without
using foundation (which was used in V.1.9), but instead taking as independently
given the fact that <, that is ∈, on On is a total order (for example, this would
have been the avenue taken if we were to omit the axiom of foundation and
define ordinals as in the alternate definition VI.4.25).
Under such restrictions we still have trichotomy (see p. 336, item (4)), i.e.,
we have one of α = β, α ∈ β, β ∈ α. So assume α + 1 = β + 1, and let
α∈β (2)
Now, the hypothesis α ∪ {α} = β ∪ {β} implies (via ⊇) that β ∈ α or β = α,
either of which in turn implies, along with (2), that β ∈ β, contradicting the
irreflexivity of the order ∈ on On. Similarly, β ∈ α is untenable. This leaves
α = β. 

VI.5.3 Lemma. α + 1 is an ordinal.

Proof. Let y ∈ α + 1 = α ∪ {α}. Then y ∈ α or y = α; hence y ∈ On in either


case.
So every member of α + 1 is transitive. Next, x ∈ y ∈ α + 1 implies (as
above) the cases x ∈ y ∈ α and x ∈ y = α.
In either case x ∈ α; hence x ∈ α + 1. Thus α + 1 is transitive too. 

VI.5.4 Lemma. ⊢ZFC α < β ↔ α + 1 ≤ β.

Proof. ←: Let α ∪ {α} ⊆ β (translating “≤” to “⊆” via VI.4.27). Then α ∈ β.


→: Let α ∈ β. By transitivity of β, α ⊆ β, which along with the hypothesis
gives α ∪ {α} ⊆ β. 

As a special case, we have ⊢ZFC n < m ↔ n + 1 ≤ m, where n, m are natural


number variables.

VI.5.5 Lemma. ⊢ZFC α < β ↔ α + 1 < β + 1.

Proof. →: Let α < β. By VI.5.4, α + 1 ≤ β. But β < β + 1 (i.e., β ∈ β ∪ {β});


hence α + 1 < β + 1 (this formula simply says “β < β + 1” if α + 1 = β;
otherwise it follows from the transitivity of < on On).
←: Let α + 1 < β + 1. Possible conclusions are β < α, β = α, or
α < β. The first option gives β + 1 < α + 1 (by the →-direction) and hence
α + 1 < α + 1 by the hypothesis and transitivity, contradicting irreflexivity
of <. The second option gives (Leibniz) β + 1 = α + 1, again contradicting
irreflexivity along with the assumption. It remains to take α < β. 

As a special case, we have ⊢ZFC n < m ↔ n + 1 < m + 1, where n, m are


natural number variables.

VI.5.6 Lemma. α + 1 ≠ ∅.

Proof. α ∈ α + 1. 

VI.5.7 Remark. Lemma VI.5.6 generalizes the case of natural numbers (V.1.8)
to all ordinals.
From now on we will freely use the symbol 0 for ∅ in all contexts where
the latter is thought of as an ordinal rather than just the empty set, since 0 is
the symbol we have assigned to the smallest ordinal – the natural number 0
(V.1.19). 

As ω is an inductive set, indeed the ⊆-smallest inductive set (V.1.5), it follows


that whenever n ∈ ω then also n + 1 ∈ ω; thus we cannot “reach” ω if we start
at 0 and keep applying the successor operation. This makes ω a limit ordinal.

VI.5.8 Definition. A limit ordinal is an α such that


(1) α ≠ 0, and
(2) whenever β ∈ α, then also β + 1 ∈ α.
The notation “Lim(α)” says “α is a limit ordinal”.
An ordinal α is a successor ordinal, or simply successor, just in case α =
β + 1 for some β. 

(1) Some authors allow 0 to be a limit ordinal (e.g., Jech (1978b)).


(2) Rephrasing VI.5.8, we can say (by V.1.1) that Lim(α) iff α is an inductive
set.
(3) Every n ∈ ω such that n ≠ 0 is a successor ordinal (V.1.10). So successor
ordinals exist ({∅} and {∅, {∅}}, or 1 and 2, are two specific examples).
(4) If α = β + 1, then β is uniquely determined by α (by (1) on p. 340). We
use the notation β = α − 1 or β = pr (α) and call β the predecessor of α
(cf. V.1.15), but will not bother to introduce pr on On formally.

VI.5.9 Proposition. An ordinal is a successor ordinal iff it is a successor set.

Proof. The only-if part is trivial.


If part. Say x ∪ {x} = α. Then x ∈ α; hence x = β for some β. 

VI.5.10 Proposition. Limit ordinals exist.

Translation: There is a set that is a limit ordinal.

Proof. By the axiom of infinity (V.1.3), it has already been established that ω
is a set that is a limit ordinal. 

VI.5.11 Theorem. Every α falls under exactly one of the following cases:

(1) α = 0,
(2) Lim(α),
(3) α is a successor.

Proof. First, the cases are mutually exclusive. Indeed, (1) excludes (2) by def-
inition, while it excludes (3) by VI.5.6. We verify that (2) excludes (3). Say
Lim(α), yet α = β + 1 for some β. Then β < α and, by (2), β + 1 < α – i.e.,
α < α – a contradiction (see also V.1.14).
Next, let α ≠ 0 and also ¬ Lim(α). Therefore, by trichotomy, for some
β < α we have α ≤ β + 1. By VI.5.4 and ≤ = ⊆ on On, this yields α = β + 1,
thus α is a successor. 

Since On and any α are well-ordered by <, the results on induction and in-
ductive definitions presented in Section VI.2 carry over with minor translations:

VI.5.12 Theorem (Induction over On on Variable α). To prove (∀α)F (α) it


suffices to prove, for arbitrary α, that F (α) follows from the induction hypo-
thesis (∀β < α)F (β).

Of course, “(∀α)” means “(∀α ∈ On)” (VI.4.16), while “for arbitrary α” means
that α is a free ordinal variable.

VI.5.13 Theorem (Induction over δ on Variable α). To prove (∀α < δ)F (α)
it suffices to prove, for arbitrary α < δ, that F (α) follows from the induction
hypothesis (∀β < α)F (β).

The general case of inductive definitions (VI.2.25) becomes:

VI.5.14 Theorem (Recursive or Inductive Definitions over On). Let G be a


(not necessarily total) function G : On × U M → X, for some class X. Then

there exists a unique function F : On → X satisfying


(∀α)F(α) ≃ G(α, F |` α)

Proof. Recall that <α = α. In particular, this makes < over On left-narrow.


VI.5.15 Theorem (Recursive or Inductive Definitions over δ). Let G be a


(not necessarily total) function G : δ × U M → X, for some class X. Then there
exists a unique function F : δ → X satisfying

(∀α ∈ δ)F(α) ≃ G(α, F |` α)

Particularly useful is the translation of Corollary VI.2.31 in the context of


On. We obtain two corollaries:

VI.5.16 Corollary (Pure Recursion over On). Let G be a (not necessarily


total) function G : U M → X, for some class X. Then there exists a unique
function F : On → X satisfying

(1) (∀α)F(α) ≃ G(F |` α),


(2) dom(F) is either On, or some α.

VI.5.17 Corollary (Pure Recursion over δ). Let G be a (not necessarily total)
function G : U M → X, for some class X. Then there exists a unique function
F : δ → X satisfying

(1) (∀α ∈ δ)F(α) ≃ G(F |` α),


(2) dom(F) is either δ, or some α < δ.

Theorem VI.5.11 leads to some additional interesting formulations of induc-


tion and inductive definitions over On (or over some δ).

VI.5.18 Theorem (Induction over On Rephrased). Let S ⊆ On satisfy

(1) 0 ∈ S,
(2) (∀α)(α ∈ S → α + 1 ∈ S),
(3) whenever Lim(α), the hypothesis (∀β < α)β ∈ S implies α ∈ S.

Then S = On.

Proof. Let instead S ≠ On, and let α be the minimum element in On − S.


By VI.5.11, α must be one of 0, a successor, or a limit ordinal.

Well, α ≠ 0 by (1). Next, α is not a successor either, for otherwise α − 1 ∈ S,


hence α ∈ S by (2).
Thus, perhaps Lim(α). If so, by minimality of α, (∀β < α)β ∈ S. By (3)
this entails α ∈ S, once more contradicting the choice of α. So the hypothesis
S ≠ On is untenable. 

Of course, the above can be rephrased in terms of a formula S (α). The reader
will easily carry out this translation by letting S = {α : S (α)}.

We offer two reformulations of inductive definitions over On, leaving the


untold variations to the reader’s imagination.

VI.5.19 Theorem (Recursive or Inductive Definitions over On Rephrased).


Let G and H be (not necessarily total) functions On × U M → X and On ×
X → X respectively, for some class X. Then there exists a unique function
F : On → X satisfying:

(1) F(0) = x (for some x ∈ X),
(2) (∀α)F(α + 1) ≃ H(α, F(α)),
(3) (∀α)(Lim(α) → F(α) ≃ G(α, F |` α)).

Proof. Define G̃ : On × U M → X by

    G̃(α, f ) =  x                        if α = 0
                G(α, f )                  if Lim(α)
                H(α − 1, f (α − 1))       if dom( f ) ⊇ α ∧ α is a successor
                0                         otherwise

Thus, by VI.5.11, (1)–(3) translate to (∀α)F(α) ≃ G̃(α, F |` α).
Note that by III.11.4 it is not necessary to add in the third case above “∧ f
is a function”, for dom( f ) makes sense regardless. 
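
Restricted to ω, where no limit ordinals occur above 0, clauses (1)–(2) above are just ordinary recursion over the natural numbers. A minimal informal Python sketch (ad hoc names; here x = 1 and H(α, v) = (α + 1) · v, so that F(n) = n!):

    def F(n, x=1, H=lambda a, v: (a + 1) * v):
        v = x                       # F(0) = x
        for a in range(n):
            v = H(a, v)             # F(a + 1) = H(a, F(a))
        return v

    print([F(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120]
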

VI.5.20 Theorem (Pure Recursive Definitions over On Rephrased). Let G


and H be (not necessarily total) functions U M → X and X → X respectively,
for some class X. Then there exists a unique function F : On → X satisfying:

(1) F(0) = x (for some x ∈ X),
(2) (∀α)F(α + 1) ≃ H(F(α)),
(3) (∀α)(Lim(α) → F(α) ≃ G(F |` α)),
(4) dom(F) is either On, or some α.

Let us now probe further into the ordinals.



VI.5.21 Example. (Refer to Example VI.2.34.) The support function x ↦
sp(x) is defined on all sets by the recursive definition†

    sp(x) = ⋃{sp(y) : y ∈ x}     (1)

By induction over On, assume for all β < α that sp(β) = 0. Thus, by (1),
sp(α) = ⋃{sp(β) : β < α} = 0. In sum

    ⊢ZFC (∀α)sp(α) = 0

i.e., all ordinals are pure sets. 

In view of the fact that the successor operation, +1, is inadequate to “reach”
limit ordinals, we search for more powerful operations.

VI.5.22 Theorem. Let A ⊆ On be a nonempty set. Then

(1) ⋃A is an ordinal,
(2) α ≤ ⋃A for all α ∈ A,
(3) ⋃A is the least ordinal with property (2).

Proof. (1): Let x ∈ y ∈ ⋃A. Thus x ∈ y ∈ α, for some α ∈ A. Therefore
x ∈ α, and hence x ∈ ⋃A, proving that ⋃A is transitive. On the other hand,
from the above, since y was arbitrary, we have that every element of ⋃A is an
ordinal. This settles (1).
(2): This translates to α ⊆ ⋃A, which is trivial.
(3): Finally, let (∀α ∈ A)α ≤ δ for some δ. That is, (∀α ∈ A)α ⊆ δ; hence
⋃A ⊆ δ. 
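
For a finite set of finite von Neumann ordinals the content of VI.5.22 can be checked directly with an informal Python sketch (ad hoc names; pure sets as frozensets): the union of the set is again an ordinal, namely its largest member, which is its supremum.

    from functools import reduce

    def von_neumann(n):
        a = frozenset()
        for _ in range(n):
            a = a | frozenset({a})       # successor
        return a

    A = {von_neumann(1), von_neumann(4), von_neumann(6)}
    union = reduce(lambda x, y: x | y, A)
    print(union == von_neumann(6))       # True: ⋃A = sup(A) = max(A) here
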

VI.5.23 Definition. Let < be a partial order on X, and B ⊆ X.


(1) a ∈ X is an upper bound of B iff (∀b ∈ B)b ≤ a. It is a strict upper bound
iff (∀b ∈ B)b < a.
(2) c ∈ X is a least upper bound, or supremum, of B iff it is an upper bound of
B and for any upper bound a of B we have c ≤ a. 

VI.5.24 Remark.
(1) Suprema are unique, for if c and c′ are suprema of B, then c ≤ c′ and c′ ≤ c,
and hence c = c′ by antisymmetry. We write c = sup(B) or c = lub(B).
Upper bounds with respect to the inverse order > are called lower bounds

† sp(x) = {x} on the assumption U (x).



with respect to <. Correspondingly, least upper bounds with respect to >
are called greatest lower bounds or infima (singular infimum) with respect
to <. The latter are also unique, and we write d = inf(B) or d = glb(B) to
indicate that d is the infimum of B (in (X, <)).
(2) If B = ∅, then any a ∈ X is a lower bound, strict lower bound, upper
bound, and strict upper bound of B. Thus, the empty set has a supremum in
X iff X has a <-minimum element (which, of course, is unique if it exists).
Similarly, the statement “∅ has a glb” is equivalent to “X has a <-maximum
element”. 

With the notation just introduced, Theorem VI.5.22 yields



VI.5.25 Corollary. Let A ⊆ On, A a set. Then On ∋ sup(A) = ⋃A.

Note that above, consistently with Remark VI.5.24, A = ∅ implies that its sup
in On is 0. In other words, sup(A) = ⋃A is valid for A = ∅.

VI.5.26 Corollary. Let ∅ ≠ A ⊆ On, A a set. Then

(1) The smallest ordinal strictly greater than all ordinals in A is sup{α +1 : α ∈
A}, denoted by sup+ (A).
(2) If A has a maximum element γ , then sup(A) = γ and sup+ (A) = γ + 1.
(3) If A does not have a maximum, then sup(A) = sup+ (A).

Proof. (1): By collection (cf. III.8.9), B = {α + 1 : α ∈ A} is a set. By VI.5.22,



sup+ (A) = sup(B) = ⋃B is the smallest ordinal such that

α + 1 ≤ sup+ (A) (i)

for all α ∈ A. But (i) is equivalent to α < sup+ (A).


(2): Since α ≤ γ for all α ∈ A, γ is an upper bound of A; hence sup(A) ≤ γ .
But γ ∈ A as well; hence γ ≤ sup(A). So sup(A) = γ . For the rest, γ + 1
is trivially a strict upper bound of A. Let also δ be a strict upper bound. In
particular, γ < δ (since γ ∈ A); hence γ + 1 ≤ δ by VI.5.4, establishing
γ + 1 = sup+ (A).
(3): sup(A) is smallest satisfying

α < sup(A) (ii)

for all α ∈ A, the “≤” becoming “<” due to the absence of a maximum element.
But sup+ (A) also is smallest that satisfies (ii), by (1) (regardless of the issue of
maximum). Hence sup(A) = sup+ (A). 

If ∅ ≠ A ⊆ On does not have a maximum, then sup(A) is a limit ordinal (see


Exercise VI.14).

VI.5.27 Definition (Transfinite Sequences). A transfinite sequence is a func-


tion f such that either dom( f ) = On or dom( f ) = α and ω < α. In the former
case it is also termed an On-sequence, in the latter case an α-sequence. 

As is usual, we think of f as the “sequence” ( f (δ))δ<α – or ( f (δ))δ∈On


if appropriate – and even write ( f δ )δ<α or ( f δ )δ∈On . In the latter notation the
argument δ becomes an index (or subscript), and f δ is the term of the sequence
at location (position) δ. The concept of transfinite sequence derives from that
of the familiar finite sequences (case where α < ω) and infinite sequences
(case where α = ω). Intuitively, a transfinite sequence is just too “long” to
be enumerated by natural numbers, and therefore it requires ordinals beyond
natural numbers as indices for its terms.

VI.5.28 Informal Definition. A total F : A → B, where each of A, B is equip-


ped with a partial order, <1 , <2 respectively, is:

(1) Non-decreasing or monotone just in case F(x) ≤2 F(y) whenever x <1 y in


A. If <1 is ∈ and A = ω, then F is an ascending sequence. The terminology
derives from F(0) ≤2 F(1) ≤2 F(2) ≤2 · · · .
(2) Increasing just in case it is order-preserving in the sense of VI.3.4, that is,
F(x) <2 F(y) whenever x <1 y in A.
(3) Continuous just in case, for each nonempty set S ⊆ A, if t = sup(S), then
sup(F[S]) exists and F(t) = sup(F[S]).
(4) Countably continuous just in case for each ascending sequence s : ω → A, if
t = sup(ran(s)), then sup(F[ran(s)]) exists and F(t) = sup(F[ran(s)]). More
suggestively, we write this as F(limn xn ) = limn F(xn ), where xn = s(n)
for n ∈ ω. 

See also the discussion in VI.3.5.


“Continuity” here is a concept akin to continuity of functions of a real vari-
able, if we interpret sup as a limit in the sense of calculus: Here sup commutes
with the function letter, F(sup(S)) = sup(F[S]), exactly as limx→a does in the
case of limits in calculus.
Note that continuity implies quite a bit about the right field of F – In particular,
the existence of suprema under certain conditions (that the sets considered are
images under F of sets in A that themselves have suprema). On the other hand, if

we independently know that, say, every nonempty subset of B has a supremum


(for example, a complete lattice does), then all that continuity of F adds is that
the objects F(t) and sup(F[S]) are equal.

We define three additional concepts when F is a transfinite sequence.

VI.5.29 Definition. A transfinite sequence f with ran( f ) ⊆ On is

(1) weakly continuous iff, for each α ∈ dom( f ), if Lim(α) then f (α) =
sup{ f (β) : β < α},
(2) normal iff it is increasing and weakly continuous,
(3) weakly normal iff it is non-decreasing and weakly continuous. 

VI.5.30 Proposition. A continuous function F, as in VI.5.28, is non-


decreasing.

Proof. Let x <1 y in A. Now, sup{x, y} = y, hence sup{F(x), F(y)} exists and

F(y) = sup{F(x), F(y)}

(recall, F is total), i.e., F(x) ≤2 F(y). 

VI.5.31 Corollary. A countably continuous function F, as in VI.5.28, is non-


decreasing.

Proof. Let x <1 y in A. Then



    s = λn.  x     if n = 0
             y     if n > 0

is an ascending sequence and ran(s) = {x, y}. 

VI.5.32 Proposition. A continuous transfinite sequence f with ran( f ) ⊆ On is


weakly continuous.

Proof. Let Lim(α) and α ∈ dom( f ). Now, α = sup{β : β < α}; hence, by con-
tinuity, f (α) = sup{ f (β) : β < α}. 

VI.5.33 Corollary. A continuous transfinite sequence f with ran( f ) ⊆ On is


weakly normal.

VI.5.34 Corollary. An increasing continuous transfinite sequence f with val-


ues in On is normal.

The following establishes the converse of VI.5.33. It will prove useful in


Section VI.10.

VI.5.35 Proposition. If a transfinite sequence f is weakly normal, then it is


continuous.

Proof. Let ∅ ≠ S ⊆ On be a set. Let α = sup S (it exists by VI.5.22).


If α ∈ S, then f (α) ∈ f [S] is maximum, since f is non-decreasing.
Let α ∉ S. Then Lim(α) by Exercise VI.14. Now

    sup f [S] = ⋃{ f (γ ) : γ ∈ S} ⊆ ⋃{ f (γ ) : γ ∈ α}     (1)

since γ ∈ S → γ ∈ α. On the other hand, the choice of α yields

(∀γ ∈ α)(∃δ ∈ S)γ < δ

Hence

(∀γ ∈ α)(∃δ ∈ S) f (γ ) ≤ f (δ)

and the ⊆ in (1) is promoted to equality. But the right hand side of ⊆ is f (α)
by weak continuity. 

VI.5.36 Corollary. If f is normal, then it is continuous.

VI.5.37 Example. A transfinite sequence f can be weakly continuous without


being continuous. This is because weak continuity can be satisfied by a function
which is not non-decreasing. For example, let f : ω + 1 → ω + 1 be given by

 2k if x = 2k + 1 ∧ k ∈ ω
f (x) = 2k + 1 if x = 2k ∧ k ∈ ω

ω if x = ω
Thus, f (2k) = 2k + 1 and f (2k + 1) = 2k for all k ∈ ω, so f is not non-
decreasing. On the other hand, f (ω) = ω = sup{n : n < ω} = sup{ f (n) : n <
ω}, and ω is the only limit ordinal in dom( f ). 

So continuity of an f : On → On is equivalent to a (weak) monotonicity


property, together with weak continuity. On the other hand, by the above exam-
ple, weak continuity alone does not imply any monotonicity property for the

function. It turns out that with a bit of a boost, weak continuity can imply a
strong monotonicity property for the function, and hence continuity by VI.5.35.

VI.5.38 Proposition. If f is a weakly continuous On-sequence of ordinals


that moreover satisfies (∀α) f (α) < f (α + 1), then f is increasing, and hence
normal.

Proof. We need to show f (α) < f (β) for all α < β. We do induction on β.
Basis. β = 0. The contention is vacuously satisfied.
The successor case. Say β = γ + 1, and let α < β. Thus, α = γ or α < γ .
In the former case, f (α) < f (β) by the assumption; in the latter case, by I.H.,
the assumption, and transitivity of <.
The limit ordinal case. Say Lim(β). By weak continuity,

f (β) = sup{ f (γ ) : γ < β} (1)

Let now α < β, hence α + 1 < β, so that

f (α) < f (α + 1) by assumption


≤ f (β) by (1)
Note that the I.H. was not needed in this case. 

VI.5.39 Proposition. If f is a weakly continuous On-sequence of ordinals that


moreover satisfies (∀α) f (α) ≤ f (α + 1), then f is non-decreasing, and hence
weakly normal.

Proof. Exercise VI.16. 

The following easy result will be useful in Section VI.10. It says that a
normal On-sequence of ordinals maps limit ordinals to limit ordinals.

VI.5.40 Proposition. If f is a normal On-sequence of ordinals and Lim(α),


then also Lim( f (α)).

Proof. Suppose that Lim(α). Then f (α) = sup{ f (γ ) : γ < α}. But f (α) ∉
{ f (γ ) : γ < α} because f is increasing. Thus Lim( f (α)) by Exercise VI.14. 

VI.5.41 Definition (Fixed Points). A fixed point (also called fixpoint some-
times) of a function F : A → A is a u ∈ A such that F(u) = u. 

VI.5.42 Theorem (Knaster-Tarski). Let (A, <) be a PO class with a minimum


element t, and where every ascending sequence has a supremum (meaning that
its range does). If F : A → A is countably continuous, then F has a fixed point.

Proof. Define by recursion over ω the sequence (sn )n<ω by

s0 = t
sn+1 = F(sn )

By induction on n one sees that the sequence is ascending. Indeed, t = s0 ≤ s1


(t is minimum), and if (I.H.) sn ≤ sn+1 , then, by VI.5.31, sn+1 = F(sn ) ≤
F(sn+1 ) = sn+2 .
Let u = sup{sn : n < ω}. By countable continuity,

F(u) = sup{F(sn ) : n < ω}


= sup{sn+1 : n < ω} (1)
= sup{sn : n < ω} (2)
=u

where the passage from (1) to (2) is justified by s0 ≤ sn for all n ∈ ω. 
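
Here is an informal finite illustration of the iteration s0 = t, sn+1 = F(sn) used in the proof (a Python sketch with ad hoc names, not part of the formal development). Take A to be the powerset of {1, 2, 3} ordered by ⊆, with minimum element ∅; on a finite PO set every ascending sequence stabilizes, so a monotone map is automatically countably continuous and the theorem applies.

    def F(X):                        # a monotone map on subsets of {1, 2, 3}
        return X | {1} | ({2} if 1 in X else set())

    s = frozenset()                  # s0 = the minimum element t = ∅
    while F(s) != s:                 # the chain s0 ≤ s1 ≤ ... stabilizes at a fixed point
        s = frozenset(F(s))
    print(s)                         # frozenset({1, 2}); by VI.5.43 it is the least fixed point
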

VI.5.43 Corollary. The fixed point u of the previous theorem is ≤-least. That
is, any c such that F(c) ≤ c satisfies u ≤ c.

In particular, any c such that F(c) = c satisfies u ≤ c.

Proof. Let F(c) ≤ c. Now, by induction on n we see that sn ≤ c for all n < ω,
because s0 = t ≤ c, and if (I.H.) sn ≤ c, then sn+1 = F(sn ) ≤ F(c) ≤ c.
Thus u = sup{sn : n < ω} ≤ c. 

VI.5.44 Corollary. If f is a weakly normal transfinite On-sequence, then it


has a fixed point β.

Proof. The proof follows that of Theorem VI.5.42. We must just ensure that
the key assumptions hold. Well, On has a minimum element, and if (sn )n<ω
is ascending, then sup{sn : n < ω} exists by VI.5.22. The rest is taken care of
by VI.5.35. 

VI.5.45 Corollary. If f is a normal transfinite On-sequence, then, for every


γ , it has a fixed point β such that γ < β.

Proof. All else is as above, but now define s0 = γ + 1. You will need to argue
that (sn )n<ω is non-decreasing via a different route than before (Exercise VI.18).


The above says that normal On-sequences of ordinals have arbitrarily large
fixed points.
It is easy to see that the proof (imitating that of the theorem) yields the least
fixed point greater than γ (Exercise VI.18).

Theorem VI.5.42 can be sharpened in one direction, that is, dropping the
requirement of (countable) continuity. A small trade-off towards achieving this
is to restrict attention to PO sets (A, <) where every subset of A – not just
ascending sequences – has a supremum in A.

VI.5.46 Definition. Let (A, <) be a PO set. A total function f : A → A is


inclusive or expansive iff, for all x ∈ A, x ≤ f (x). 

The terminology depends on which side of ≤ one is looking at. The input is
“expanded” by, or “included”† in, the output.

VI.5.47 Theorem. Let (A, <) be a PO set such that every S ⊆ A has a least
upper bound in A. If f : A → A is either inclusive or monotone, then it has a
fixpoint c ∈ A, that is, f (c) = c.

Proof. Let t = sup ∅, the minimum element of A. We define, by recursion over


On, the transfinite sequence s : On → A:
 
    sα = f (sup{sβ : β < α})     (1)

α "→ sα is total on On. For convenience we set

s<α = sup{sβ : β < α} (2)

This simplifies (1):

sα = f (s<α ) (3)

We now claim that α ↦ sα is monotone.

† It is no more weird to pronounce a ≤ b – where ≤ is an arbitrary order – “a is included in b” than


to pronounce it “a is less than or equal to b”. Each version has an obvious “concrete” motivation.

Case 1. f is inclusive. Then s<α ≤ sα by (3); hence sβ ≤ sα whenever


β < α (by (2)).
Case 2. f is monotone. By (2), β < α implies s<β ≤ s<α . Hence, by
monotonicity, f (s<β ) ≤ f (s<α ). That is, sβ ≤ sα .
By collection, λα.sα cannot be 1-1 (A is a set). Let us fix attention on some
β and γ such that β < γ and sβ = sγ . By monotonicity of λα.sα , if β < α < γ
then sβ = sα ; thus
s<γ = sup{sα : α < γ } = sβ (4)
We can now calculate as follows:
sβ = sγ
= f (s<γ )
= f (sβ ), by (4)
Thus, c = sβ works. 

VI.5.48 Remark. (1) The reader can easily verify that we can weaken some-
what the assumption on (A, <) and still prove our theorem. It suffices to pos-
tulate that the PO set has a supremum for every chain.† Thus, (1) would be
undefined unless {sβ : β < α} is a chain. Well, one has to prove that it will be a
chain (under the changed assumptions for (A, <)) anyway (Exercise VI.19).
(2) It turns out that with the β and γ as fixed in the proof above,
γ ≤ α → sγ = sα (5)
One can prove (5) by induction, since the class of ordinals above γ is well-
ordered. Thus assume the claim for all α such that γ ≤ α < δ. Now, mono-
tonicity of λα.sα and the I.H. entail
s<δ = sup{sθ : θ < δ} = sγ (6)
Applying f to the two extreme sides of (6) and remembering that sγ is a fixpoint
of f , we obtain sδ = sγ . In particular, for the γ that we fixed in the proof, we
have shown that sγ = s<γ .

Pause. Must it also be the case, for the β of the proof, that s<β = sβ ? 

VI.5.49 Corollary. Restricting the assumptions of VI.5.47 to a monotone


f : A → A, there is a least fixpoint c for f . That is, f (c) = c, and if f (d) ≤ d,
then c ≤ d.
† That is, every set S ⊆ A that is totally ordered by <.

Proof. The c of the proof of VI.5.47 works. It suffices to prove that sα ≤ d for
all α.
For α = 0, s0 = f (sup ∅). Now, sup ∅ ≤ d; hence s0 = f (sup ∅) ≤ f (d) ≤ d,
using monotonicity of f . In general, s<α = sup{sδ : δ < α} ≤ d by I.H. By
monotonicity, sα = f (s<α ) ≤ f (d) ≤ d. 
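The transfinite iteration in the two proofs above collapses, for a finite PO set, to ordinary iteration from the bottom element, which may help in visualizing VI.5.47 and VI.5.49. The following is only an illustrative sketch (not part of the formal development): the PO set is P({0, 1, 2}) under ⊆, where every subset has a supremum (its union), and the particular monotone f is an arbitrary example.

```python
# A finite instance of VI.5.47/VI.5.49: the PO set is P({0,1,2}) under inclusion,
# every subset has a supremum (its union), and f is monotone.  Iterating f from
# the minimum element sup(emptyset) = emptyset reaches the least fixpoint.
def f(s):
    # an example of a monotone map: f(s) turns out to be s union {0, 1}
    out = set(s) | {0}
    if 0 in out:
        out.add(1)
    return frozenset(out)

s = frozenset()            # s_0 = sup(emptyset), the minimum of the PO set
while f(s) != s:           # s_{alpha+1} = f(s_alpha)
    s = f(s)
print(s)                   # frozenset({0, 1}) -- the least fixpoint c

# least-ness (Corollary VI.5.49): any d with f(d) <= d includes c
for m in range(8):
    d = frozenset(x for x in (0, 1, 2) if m >> x & 1)
    if f(d) <= d:
        assert s <= d
```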

We conclude this section with the important Zermelo well-ordering principle


and another theorem that is equivalent to AC. The connection here is that the
proofs involve recursively defined transfinite sequences.

VI.5.50 Theorem (Zermelo’s Well-Ordering Principle). Every set can be


well-ordered.

Proof. Let x be any set. If x = ∅, then the result is trivial. Assume then that
x ≠ ∅ and let f be a choice function (AC) on P(x) − {∅}, i.e.,

(∀y)(∅ ≠ y ⊆ x → f (y) ∈ y)     (1)

By recursion over On (with α as the recursion variable) define

(∀α) h(α) ≃ f (x − ran(h ↾ α))     (2)

It follows by VI.5.16 ((2) is a pure recursion) that dom(h) = On, or dom(h) = γ
for some γ . Now, h is 1-1, for if α < β and β ∈ dom(h) – so that α ∈ dom(h)
as well – then, by (1) and (2), h(β) ∈ x − ran(h ↾ β), and hence h(β) ≠ h(α).
Thus, by collection – since ran(h) ⊆ x – we have dom(h) = γ .
h is onto x, for dom(h) = γ entails that

γ = min{δ : δ ∉ dom(h)}
  = min{δ : f (x − ran(h ↾ δ)) ↑}
  = min{δ : x − ran(h ↾ δ) = ∅}

That is, x = ran(h ↾ γ ) = ran(h).
By VI.3.12, h induces a well-ordering <1 on x such that ‖(x, <1 )‖ = γ .  □
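The recursion (2) is easy to visualize on a finite set, where a choice function can simply be exhibited. The sketch below is only an informal illustration (the set x and the function `choice` are arbitrary stand-ins, not anything fixed by the text): h enumerates x without repetition, and the order in which elements are picked is the induced well-ordering <1.

```python
# Illustration of the recursion h(alpha) = f(x - ran(h restricted to alpha))
# on a finite set x, with a concrete (hypothetical) choice function.
x = {"c", "a", "b", "d"}

def choice(nonempty_subset):
    # a choice function on P(x) - {emptyset}; any rule that picks a member will do
    return min(nonempty_subset)

h = []                              # h restricted to alpha, as the list h[0], ..., h[alpha-1]
while set(x) - set(h):              # stop at the least gamma with x - ran(h|gamma) empty
    h.append(choice(set(x) - set(h)))

print(h)                            # ['a', 'b', 'c', 'd']: h is 1-1 and onto x, so
                                    # h(0) <1 h(1) <1 ... well-orders x, of order type len(x)
```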

VI.5.51 Remark. The above theorem is due to Zermelo (1904, 1908). The proof
in his 1904 paper is reproduced in Kamke (1950) (see also Exercise IV.3). It
is noteworthy that while AC was, of course, employed in an essential way in
the original proof, it was taken there to be a “fundamental truth” of set theory†
rather than an additional assumption (axiom).

† See Kamke (1950, p. 112), especially the concluding remarks prior to the statement of the well-
ordering theorem.

Cantor had conjectured (but not proved) a special case of the well-ordering
theorem in 1883, where x is the set of reals, R. 

We have remarked several times that AC holds on any WO set. Thus,

VI.5.52 Corollary. AC is equivalent to the well-ordering principle (in the pre-


sence of the remaining axioms).


Proof. If F (a set) is a family of nonempty sets, then let <1 well-order ⋃F. To
each x ∈ F associate its <1 -minimum element xmin . The function x ↦ xmin is
a choice function on F.  □

The careful reader will observe that “remaining axioms” need not include
foundation or power set, as the ordinals can be developed without these
axioms.

VI.5.53 Corollary. AC is equivalent to the following: For every set x there is


an ordinal α and a function g with dom(g) = α and x ⊆ ran(g).

Proof. Assume AC. Let x ≠ ∅. Referring to the proof of VI.5.50, we can take
g = h and α = γ . If x = ∅, then α = 1 and g = {⟨0, 0⟩} work.
Conversely, let F be a nonempty family of nonempty sets. We take x = ⋃F
and let α and g be as described in the corollary.
A choice function for F is λy.g(min(g −1 [y])) : F → x.  □

The following is important for the next chapter.

VI.5.54 Corollary. For every set x there is an ordinal α and a 1-1 correspon-
dence between x and α.

VI.5.55 Theorem (Kuratowski-Zorn Theorem). If (A, <) is a PO set where


every chain has an upper bound, then for every a ∈ A there is a <-maximal
element c ∈ A such that a ≤ c.

The above is usually referred to as “Zorn’s lemma ”.



Proof. Let f be a choice function on P(A) − {∅}. We define by recursion a


transfinite sequence λα.tα :
t0 = a
tα+1 ≃ f ({x : tα < x})                                            (1)
tα ≃ f ({x : x is an upper bound of {tβ : β < α}})   if Lim(α)
(1) is a pure recursion; thus, dom(λα.tα ) = On or dom(λα.tα ) = θ for some
θ ∈ On. We will determine which one is the case.† Let us simply write “t” for
the function λα.tα . We prove that t is increasing on dom(t), that is,‡
α < β ∈ dom(t) → tα < tβ (2)
We do induction on β (the proof is entirely analogous to that of VI.5.38).
For the basis, the implication (2) is trivially provable if β = 0. Suppose now
that β = γ + 1. There are two cases:
One is where α = γ , and we are done by the second case in (1).
The other is where α < γ . The I.H. yields tα < tγ . But we also have tγ < tγ +1
by (1). We are done by transitivity of <.
Finally, let Lim(β). As remarked in the preceding footnote, α < β implies
α ∈ dom(t). Since tβ ↓, the third case in (1) yields that tα ≤ tβ for any α < β.
For such an α, α + 1 < β as well; thus α + 1 ∈ dom(t) and tα+1 ≤ tβ . But
tα < tα+1 by the second case in (1).
We must thus choose the case dom(t) = θ by collection. We claim that θ is
a successor (it is not 0, since t0 = a). If not, Lim(θ). But then, {tα : α < θ } is a
chain because t is increasing, thus t is defined at θ using the third case of (1),
contradicting the fact that dom(t) = θ .
Let then θ = η + 1. Then tη ↓ and a ≤ tη (why “≤” and not “<”?). More-
over, tη is <-maximal; otherwise the second case in (1) would define tη+1 . 
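On a finite PO set the transfinite sequence (1) never reaches a limit stage: starting at t0 = a one keeps choosing a strictly larger element until none exists, and the last element reached is a maximal element above a. A minimal sketch, with a hypothetical sample order `less` and with the choice function simply picking the first available larger element:

```python
# Climb from a to a <-maximal element, as in the proof of VI.5.55 (finite case:
# only the successor clause t_{alpha+1} = f({x : t_alpha < x}) is ever used).
elements = ["a", "b1", "b2", "c"]
less = {("a", "b1"), ("a", "b2"), ("b1", "c"), ("a", "c")}   # a sample strict partial order

def strictly_above(t):
    return [x for x in elements if (t, x) in less]

t = "a"                          # t_0 = a
while strictly_above(t):         # t_{eta+1} is definable iff t_eta is not maximal
    t = strictly_above(t)[0]     # the choice function: pick any strictly larger element
print(t)                         # 'c' here: a <-maximal element with a <= t
```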

VI.5.56 Corollary (Hausdorff’s Theorem). In a PO set (A, <) every chain


B ⊆ A is included in some maximal chain C. That is, C is a chain, B ⊆ C,
and there is no chain D such that C ⊂ D.

Proof. Let us order the set of <-chains of A by inclusion. So we have <-chains


and ⊆-chains.

† Note that (1) may yield undefined right hand sides in the second or third case of the definition,
because the set we are using as the argument of f is actually not in the domain of f (because it
is ∅). For example, this may happen in the third case if {tβ : β < α} is not a chain.
‡ Since dom(t) is On or an ordinal, it is transitive. Therefore, α ∈ β ∈ dom(t) implies α ∈ dom(t).

Now, the union of the members of a ⊆-chain – these are <-chains – is a
<-chain. Indeed, let S be a ⊆-chain, and let x ∈ ⋃S and y ∈ ⋃S. Then
x ∈ B ∈ S and y ∈ B′ ∈ S. Without loss of generality let B ⊆ B′. Then x and
y are in B′ and thus are <-comparable. In short, ⋃S is a <-chain. It is trivially
a ⊆-upper bound of the members of S (<-chains).
It follows by VI.5.55 that for any <-chain B there is a ⊆-maximal <-chain
C such that B ⊆ C. 
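Read through VI.5.56, the same greedy idea extends a given chain to a maximal one in the finite case: keep adjoining any element comparable with everything chosen so far. A sketch over the same hypothetical sample order as above:

```python
# Extend a chain B to a maximal chain C in a finite PO set (VI.5.56, finite case).
elements = ["a", "b1", "b2", "c"]
less = {("a", "b1"), ("a", "b2"), ("b1", "c"), ("a", "c")}

def comparable(x, y):
    return x == y or (x, y) in less or (y, x) in less

C = {"b1"}                                   # the given chain B
while True:
    extra = [x for x in elements
             if x not in C and all(comparable(x, y) for y in C)]
    if not extra:                            # no proper chain extension exists: C is maximal
        break
    C.add(extra[0])
print(sorted(C))                             # ['a', 'b1', 'c'] -- a maximal chain containing B
```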

VI.5.57 Corollary. Hausdorff’s theorem is equivalent to Zorn’s lemma, and


thus to AC.

Proof. In view of the proof of VI.5.56, we need only prove that the latter
implies Zorn’s lemma. Let then (A, <) be a PO set where every chain has an
upper bound. Let a ∈ A. Now, {a} is a chain; thus there is a maximal chain
C ⊆ A such that a ∈ C. Let c be an upper bound of C. Then a ≤ c trivially.
Moreover, c is maximal, for if not, there is a b > c. But then C ∪ {b} is a chain
that properly extends C. 

We have learnt a few of the properties of the transfinite sequence of ordinals


in this section, some of which will be conveniently used in the sequel, especially
in Section VI.10. There is a bit more that we will have to say in the latter section.
For the whole story, the advanced and curious reader should consult Levy (1979)
and the older references Bachmann (1955), Sierpiński (1965).

VI.6. The von Neumann Universe


We now turn to formalizing “stages” of set construction within set theory, and to
studying what such formalization entails. We will define a transfinite sequence
of sets – built from an arbitrarily chosen set of urelements N – which when taken
together constitute a “universe” of sets and atoms of set theory built from N , in
the sense that all axioms of set theory hold in this universe. This construction is
effected within ZFC by recursion on ordinals, and for every α formally yields a
set, VN (α), that consists of all sets (and atoms) that we can define at “stage” α.

The (proper) class U N = ⋃α∈On VN (α) is all we can construct from the atoms
N , using the axioms.
If we mimic the formal construction outside ZFC, using “real” mathematical
objects for atoms and working within “real” mathematics, we may then Platon-
istically proclaim that we have, at long last, constructed the “natural” model of
set theory, a model that we have only vaguely described in II.1.3.†

† But from which vague description we have firmly justified the selection of axioms of ZFC!

The reader is cautioned that the formal construction within ZFC does not
provide a formal proof of consistency of the axioms. What it does is build a for-
mal interpretation of L Set and ZFC over the language L Set and ZFC. Thus,
formally, it only proves that (cf. I.7.10) if ZFC is consistent, then ZFC is
consistent.† Hardly newsworthy. Nevertheless, as we have noticed above,
Platonistically we get much more out of this construction.

VI.6.1 Definition (The Cumulative Hierarchy or von Neumann Universe).


Recall that we have been using the (rather unimaginative) name M formally,
having introduced it by the formal definition
 
M = y ↔ ¬U (y) ∧ (∀x)(x ∈ y ↔ U (x))

or simply put, M = {x : U (x)}. For any N ⊆ M we define VN (α) by induction


over On by

VN (0) = N
VN (α + 1) = P(N ∪ VN (α))
VN (α) = ⋃β<α VN (β)   if Lim(α)

If N = ∅, then we omit the subscript “N ” in VN (α) and just write V (α).


We also define the sequence R N (α) by

R N (0) = ∅
R N (α + 1) = P(N ∪ R N (α))
R N (α) = ⋃β<α R N (β)   if Lim(α)

If N = ∅, then we omit the subscript “N ” in R N (α) and just write R(α).



We denote ⋃α∈On R N (α), the class of well-founded sets built from N , by WF N .  □
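For N = ∅ the first few stages are finite and can be generated outright; the sketch below (hereditarily finite sets modeled as Python frozensets) is only meant to make the "powering" stages R(n + 1) = P(R(n)) concrete, and of course says nothing about the transfinite stages.

```python
from itertools import chain, combinations

def powerset(s):
    # all subsets of the finite set s, as frozensets
    return {frozenset(c) for c in
            chain.from_iterable(combinations(list(s), r) for r in range(len(s) + 1))}

# R(0) = emptyset, R(n+1) = P(R(n))  (the case N = emptyset, so V(n) = R(n))
R = [set()]
for n in range(4):
    R.append(powerset(R[n]))

print([len(R[n]) for n in range(5)])   # [0, 1, 2, 4, 16]: |R(n+1)| = 2**|R(n)|
```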

VI.6.2 Remark. We view the above as a recursive definition of the functions


λα.VN (α) and λα.R N (α), effected for any set N of urelements. We can also
view it as defining λα N .VN (α) and λα N .R N (α) respectively, i.e., where N is
a parameter. We then explicitly arrange that the right hand side in each case is
equal to a “don’t care value” if N ⊆ M fails. We can use ∅ for this value.
Alternatively, we may define λα A.V A (α) for any set A (parameter) but mod-
ify the definition setting V A (0) = T C(A) and V A (α + 1) = P(T C(A) ∪ V A (α)).
One can easily prove by induction on α that sp(VN (α)) ⊆ N .

† We actually end up deriving a somewhat less trivial result.



In what follows we want to (formally) settle two claims:



One, that ⋃α∈On VN (α), for any initial set of urelements N , satisfies all
axioms of set theory. Or, Platonistically, that this union is a “universe” of sets
and atoms (built from N ).

Two, that ⋃α∈On VN (α) contains all the objects (i.e., sets and urelements)
of set theory, if these objects are built from the initial set of atoms N , i.e.,

that ⋃α∈On VN (α) = {x : sp(x) ⊆ N }. Even though this will be done formally,
we want to point out its Platonist interpretation: If our recursive definition is
carried out in the realm of real mathematics, and if N happens to contain all the

atoms, then ⋃α∈On VN (α) contains everything: it is the class of all mathematical
objects, U N .
The above inductive definition reflects the intuitive idea of the formation
of sets by stages. At stage 0 we collect a collection of atoms, which are given
outright, into a set via the equation VN (0) = N . Subsequently, urelements are
used along with sets already built, in the second equation above, to build new
sets at a “powering stage” α + 1 (identified by a successor ordinal).
We also note that at a “collecting stage” α, identified by a limit ordinal,
we collect together all objects previously constructed or “donated” that are
scattered around in the various VN (β) for β < α. Thus, ordinals serve as
(formal) “stages” of set construction.
Comparing this formal definition with that in Section IV.2, we note that we
deviate from the latter in that we include all subsets of N ∪ VN (α) at a powering
stage, not only the “definable” (or “constructible”) ones. 

VI.6.3 Lemma (VN vs. R N ). For successor ordinals α, VN (α) = R N (α). For all
other ordinals, VN (α) = N ∪ R N (α).

Proof. A trivial induction: For α = 0, VN (0) = N = N ∪ R N (0). For Lim(α),



VN (α) = ⋃β<α VN (β)
       = N ∪ ⋃β<α R N (β)    since VN (β) ∈ {R N (β), N ∪ R N (β)} by I.H.
       = N ∪ R N (α)

For α = β + 1,

VN (β + 1) = P(N ∪ VN (β))
= P(N ∪ R N (β)) by I.H.
= R N (β + 1) 
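With a (finite) nonempty stock of urelements the lemma can even be checked mechanically for the first stages. The sketch below models urelements as strings, an assumption of the illustration only; it reuses the kind of powerset helper sketched above.

```python
from itertools import chain, combinations

def powerset(s):
    return {frozenset(c) for c in
            chain.from_iterable(combinations(list(s), r) for r in range(len(s) + 1))}

N = {"p", "q"}          # two urelements, modeled as atoms (strings) with no members
V = [set(N)]            # V_N(0) = N
R = [set()]             # R_N(0) = emptyset
for n in range(2):
    V.append(powerset(N | V[n]))
    R.append(powerset(N | R[n]))

# VI.6.3 at the stages computed here: V_N(n+1) = R_N(n+1), while V_N(0) = N = N union R_N(0)
print(all(V[n + 1] == R[n + 1] for n in range(2)), V[0] == N | R[0])   # True True
```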

Thus, N ∪ WF N = ⋃α∈On VN (α). In WF N we collect only the sets built from
N , and leave “loose” urelements out of the collection. As in III.4.20, V N will
denote the class of all sets built from N . The question whether WF N = V N is
a subsidiary of the second claim made in VI.6.2, and will be settled shortly.

VI.6.4 Proposition. N ∪ VN (α) is transitive for all α.

Proof. By induction on α: N ∪ VN (0) = N is transitive, for x ∈ y ∈ N is


refutable.
Consider N ∪ VN (α + 1), on the I.H. that N ∪ VN (α) is transitive. Let then
x ∈ y ∈ N ∪ VN (α + 1); hence (why?) x ∈ y ∈ VN (α + 1); therefore x ∈ y ⊆
N ∪ VN (α), so that x ∈ N ∪ VN (α). If x ∈ N , then x ∈ N ∪ VN (α + 1) and we
are done, otherwise, x ⊆ N ∪ VN (α) by I.H. and hence x ∈ VN (α + 1).
Let finally Lim(α), and (I.H.) assume that all N ∪ VN (β) are transitive for

β < α. Suppose that x ∈ y ∈ N ∪ VN (α) = N ∪ ⋃β<α VN (β). Thus,

x ∈ y ∈ ⋃β<α VN (β)

so that x ∈ y ∈ VN (β) for some β < α. By I.H., x ∈ N ∪ VN (β) ⊆ N ∪ VN (α).





It follows that ⋃α∈On VN (α) = N ∪ WF N is transitive as well. Note however
that WF N is not transitive (unless N = ∅), since for any urelement p, p ∈ N ∈
R N (1) ⊆ WF N , yet p ∉ WF N .

VI.6.5 Corollary. If there are no urelements (i.e., N = ∅), then R N (α) =


VN (α) is transitive for all α.

VI.6.6 Corollary. (∀α)(∀β < α)VN (β) ⊆ N ∪ VN (α).

Proof. We do induction on α. For α = 0 the statement is vacuously satisfied.


The case for α + 1, on the I.H. that the claim holds for α: Let us first
consider β = α < α + 1. Take x ∈ VN (α). If x ∈ N , then we are done; else
x ⊆ N ∪ VN (α) by VI.6.4, whence x ∈ VN (α + 1). Thus

VN (α) ⊆ N ∪ VN (α + 1) (1)

Let next β < α. The I.H. yields VN (β) ⊆ N ∪ VN (α); hence VN (β) ⊆ N ∪
VN (α + 1) by (1).

The case Lim(α): Here already VN (β) ⊆ VN (α), even without the help
of I.H. 

VI.6.7 Corollary. (∀α)(∀β < α)VN (β) ∈ VN (α).

Proof. Induction on α. For α = 0 the claim is vacuously satisfied.


For α + 1, VN (α + 1) = P(N ∪ VN (α)); hence

VN (α) ∈ VN (α + 1) (1)

If, in general, β < α + 1, then it remains to consider β < α. By I.H., VN (β) ∈


VN (α). By (1) and VI.6.4, VN (β) ∈ VN (α + 1).
The case Lim(α): Let β < α, hence β + 1 < α. Now VN (β + 1) ⊆
⋃γ<α VN (γ ) = VN (α); thus, by (1), VN (β) ∈ VN (α) (the I.H. was not needed
here). 

VI.6.8 Corollary. (∀α)α ⊆ VN (α).

Proof. Induction on α. For α = 0 the claim is trivial.


For α + 1, VN (α + 1) = P(N ∪ VN (α)) ∋ α, by I.H. By VI.6.4, α ⊆
N ∪ VN (α + 1); hence α ⊆ VN (α + 1), since ordinals are pure sets (VI.5.21).
All in all, α + 1 = α ∪ {α} ⊆ VN (α + 1).
The case Lim(α): β ⊆ VN (β) ⊆ VN (α) for all β < α, by I.H. Thus, α =
⋃α = ⋃{β : β < α} ⊆ VN (α), where the first = is justified in Exercise VI.13.


VI.6.9 Corollary. β ∈ VN (α) iff β < α.

Proof. If part. β ∈ α ⊆ VN (α) (by VI.6.8) implies β ∈ VN (α).


Only-if part. Induction on α. For α = 0 the claim is vacuously satisfied,
because β ∈ N is refutable.
For α + 1: Let β ∈ VN (α + 1). Thus

β ⊆ VN (α) (1)

since ordinals are pure sets. Why should β < α + 1, that is, β ≤ α, or β ⊆ α?
Well, let γ ∈ β. Thus γ ∈ VN (α) by (1). By I.H., γ ∈ α.

The case Lim(α): Let β ∈ VN (α) = ⋃{VN (γ ) : γ < α}. So β ∈ VN (γ ) for
some γ < α; hence, by I.H., β < γ . Thus, β < α. 

VI.6.10 Corollary. On ⊆ ⋃α∈On VN (α).

VI.6.11 Remark. By VI.6.6 we have a hierarchy of sets VN (α), in the sense


that as the stages, α, of the construction progress, we obtain more and more
inclusive sets.
The hierarchy is proper, that is, VN (β) ⊂ N ∪ VN (α) if β < α, i.e., the
construction keeps adding new stuff. This is because of β ∈ VN (α) − VN (β)
(by VI.6.9). Alternatively, if VN (β) = N ∪ VN (α) for some β < α, then,
by VI.6.7, VN (β) ∈ VN (β). 

At the end of all this, have we got enough sets to “do set theory”? In other
words, are all the axioms of set theory true in the “real”

⋃α∈On VN (α)     (A)

or, formally, are the axioms provable when relativized to ⋃α∈On VN (α)
(cf. Section I.7)? And are these all the sets we can get if we start with a set N
of atoms? That is, is it the case that

U N = ⋃α∈On VN (α)     (B)

Let us address these two questions in order. First a lemma.

VI.6.12 Lemma. For any set x, x ⊆ N ∪ WF N implies x ∈ N ∪ WF N .

Proof. If x = ∅, then x ∈ VN (1). So assume that x ≠ ∅, and let α y denote, for


each y ∈ x, the smallest α such that y ∈ VN (α). By collection, S = {α y : y ∈ x}
is a set; hence, sup S exists (by VI.5.22). Say it is β. But then (by VI.6.6),
x ⊆ N ∪ VN (β); hence x ∈ VN (β + 1). 

VI.6.13 Theorem. For any initial set of urelements N , the class N ∪ WF N =
⋃α∈On VN (α) satisfies all the axioms of set theory, that is,

J = (L Set , ZFC, N ∪ WF N )

is a formal model of ZFC.

We are somewhat simplifying the notation in our applications of the mate-


rial of Section I.7. Thus, we have omitted the last component, “I ”, in “J =
(L Set , ZFC, N ∪ WF N )”. If we ever need to write “P I ” (for some predicate P),

we will write “P N ∪WF N ” instead. Two more simplifications in our notation are:

(1) We wrote L Set , but we mean here the basic language augmented by the
various defined symbols we have introduced to this point.
(2) We wrote N ∪ WF N rather than “M(x)”, where the latter is the defining
formula of the class term.

Proof. The reader may wish to review the concepts in Section I.7.
Now, the requirement that

⊢ZFC (∃x)(x ∈ N ∪ WF N )

is trivially met by ⊢ZFC 0 ∈ N ∪ WF N (cf. VI.6.10) and the substitution axiom.


We interpret now our two nonlogical symbols, U and ∈. We interpret both
as “themselves”. That is, ∈ N ∪WF N ≡ ∈ and U N ∪WF N ≡ U .
A moment’s reflection (cf. I.7.4) shows that U (x) is “true in N ∪ WF N ”† iff
x ∈ N . Indeed, |=J U (x) is short for

⊢ZFC x ∈ N ∪ WF N → U (x)

Thus x ∈ WF N is untenable. Therefore x ∈ N . The other direction is trivial.


Correspondingly, “A is a set” translates to “A is a set and A ∈ N ∪ WF N .”
Next, to facilitate the argument that follows, we look at the (defined) symbol
“⊆”: Suppose that A, B are in N ∪WF N . What will the interpretation of A ⊆ B,
that is, of

(∀x)(x ∈ A → x ∈ B) (1)

be? It will be

(∀x ∈ N ∪ WF N )(x ∈ A → x ∈ B) (2)

Trivially, (1) implies (2). Interestingly, (2) implies (1): Indeed, to prove (1)
(from (2)), let x ∈ A. Since also A ∈ N ∪ WF N , we get x ∈ N ∪ WF N by
transitivity of N ∪ WF N . Then x ∈ B by (2).

Thus, Platonistically, ⊆ has the same meaning in N ∪ WF N as in the whole


universe U N .

† It is easier expositionally to refer to N ∪ WF N , meaning really J. The jargon “true in” was
introduced (with apologies) on p. 80.

We now turn to the verification of all the axioms in N ∪ WF N .


(i) The axiom (∃x)(∀y)(U (y) ↔ y ∈ x) translates to
(∃x ∈ N ∪ WF N )(∀y ∈ N ∪ WF N )(U (y) ↔ y ∈ x)
Since N ∈ N ∪ WF N is provable (e.g., I.6.7), to prove the above it suffices
to argue the case for
(∀y ∈ N ∪ WF N )(U (y) ↔ y ∈ N )
But this we have already done.
(ii) That the axiom U (x) → (∀y)y ∉ x is “true in N ∪ WF N ” means
⊢ZFC x ∈ N ∪ WF N → U (x) → (∀y ∈ N ∪ WF N )y ∉ x
which is a tautological consequence of U (x) → (∀y ∈ N ∪ WF N )y ∉ x,
and the latter trivially follows in ZFC from U (x) → y ∈ N ∪ WF N →
y ∉ x, itself a tautological consequence of U (x) → y ∉ x.†
(iii) Axiom of extensionality. It says that for any sets A and B,
A⊆B∧B⊆ A→ A=B
Now, for any sets A, B in N ∪ WF N , the relativization of this is provable
in ZFC, since “=” is logical, and we saw above (the equivalence of
(1) and (2)) that “⊆” relativizes over N ∪ WF N as itself.
(iv) Axiom of separation. It says that for any set B and class A, A ⊆ B implies
that A is a set. To see why this is true in N ∪WF N ,‡ let B ∈ N ∪WF N and
A ⊆ B. Thus, in ZFC, A is a set (recall the invariance of meaning of “⊆”).
To prove that (A is a set) N ∪WF N , equipped with our preliminary remarks
at the onset of the proof, we only need to show that A ∈ N ∪ WF N . Now,
B ∈ VN (α) for some α; hence (VI.6.4), A ⊆ B ⊆ N ∪ VN (α). Therefore,
A ∈ VN (α + 1); hence A ∈ N ∪ WF N .
(v) Axiom of pairing. For any a, b in N ∪ WF N one must find a set C ∈
N ∪ WF N such that a ∈ C and b ∈ C. We can take C = {a, b},
since a ∈ VN (α) and b ∈ VN (β) implies (using VI.6.6) that {a, b} ∈
VN (max(α, β) + 1).
(vi) Axiom of union. For any set A ∈ N ∪ WF N we need to show that there
is a set B ∈ N ∪ WF N such that, for all sets x in N ∪ WF N ,§
x∈A→x⊆B

† ⊢ (∀y)(A → B ) ↔ (A → (∀y)B ), provided y is not free in A.


‡ Armed with I.7.4, the reader will not be confused by the frequent occurrence of the argot “is true
in N ∪ WF N ” in this proof.
§ Note that we have translated ⊆ by itself, due to (1) and (2).

We can take B = N ∪ VN (α), where A ∈ VN (α). Indeed, x ∈ A implies


x ∈ N ∪ VN (α) by VI.6.4; hence, x being a set, x ⊆ N ∪ VN (α) by VI.6.4
again. Of course, B ∈ N ∪ WF N .
(vii) Power set axiom. We need to show that for any set A ∈ N ∪ WF N there
is a set B ∈ N ∪ WF N such that, for all x ∈ N ∪ WF N ,
x ⊆ A → x ∈ B.
We can take B = VN (α + 1), where A ∈ VN (α), since A ⊆ N ∪ VN (α)
by VI.6.4.
(viii) Collection. We need to show the truth of
(∀x ∈ A)(∃y)F [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)F [x, y]
in N ∪ WF N , that is, for any A and formula F ,
A ∈ N ∪ WF N and (∀x ∈ A)(∃y ∈ N ∪ WF N )F [x, y] (3)

imply within ZFC that there is a set† z in N ∪ WF N such that


(∀x ∈ A)(∃y ∈ z)F [x, y] (4)

So assume (3). This we view as (∀x ∈ A)(∃y)G [x, y], where G [x, y] is
“y ∈ N ∪ WF N ∧ F [x, y]”. We can now apply collection (in ZFC) to
obtain a set B (in ZFC) such that
(∀x ∈ A)(∃y ∈ B)G [x, y]
or

(∀x ∈ A)(∃y ∈ B)(y ∈ N ∪ WF N ∧ F [x, y]) (5)


By (5) and Lemma VI.6.12, z = B ∩ (N ∪ WF N ) is what we want
for (4).
(ix) Axiom of infinity. We need an inductive set in N ∪ WF N . Since ω ∈
N ∪ WF N by VI.6.10, we are done.
(x) AC. Let S be a set of nonempty sets in N ∪ WF N . By AC (in ZFC),
there is a choice function, f : S → ⋃S, such that f (x) ∈ x for all
x ∈ S. We want to show that f is in N ∪ WF N . Now by (iv) and (vi),
⋃S ∈ N ∪ WF N ; hence S × ⋃S ∈ N ∪ WF N using (iv), (vi), and (vii),
since S × ⋃S ⊆ P(S ∪ P(S ∪ ⋃S)). Thus, f ⊆ S × ⋃S implies
f ∈ N ∪ WF N .
(xi) Axiom of foundation. We left this till the very end for a special reason.
We will show that N ∪ WF N would satisfy foundation even if we did

† Cf. III.8.4.

not include foundation in ZFC.† Let then ∅ ≠ A ⊆ N ∪ WF N . Take
α = min{β : A ∩ VN (β) ≠ ∅}, and pick an A ∈ A ∩ VN (α) (auxiliary
constant). If A is an urelement, then
(∃x ∈ A)(∀y ∈ A)y ∉ x
is provable, since
(∀y ∈ A)y ∉ A     (6)
is. If A is a set, then (6) is still provable (in ZFC without foundation):
First, A ∉ VN (γ ) if γ < α. Thus, α = δ + 1 for some δ (why?);
hence A ⊆ N ∪ VN (δ). Then, y ∈ A implies y ∈ N ∪ VN (δ), so that
y ∈ VN (max(0, δ)); hence y ∉ A (else α ≤ max(0, δ)).  □

Part (xi) in the proof above was carried out without foundation. Indeed, the
whole theorem can be proved without foundation, in “ZFC − f” – where “f”
stands for foundation. This is due to the feasibility of basing everything in
the proof on the Kuratowski pairing, x, y = {{x}, {x, y}}, while defining
ordinals as in VI.4.25. Indeed, everything we have said up until now (except for
the examples regarding the properties of the collapsing function) can be said
without the benefit of foundation.
Thus, the whole construction has built more than we were willing to admit
initially. We have built a formal model of ZFC in J = (L Set , ZFC − f, N ∪WF N )
rather than in “just” J = (L Set , ZFC, N ∪ WF N ). I.7.10 yields at once:

VI.6.14 Corollary. If ZFC without foundation is consistent, then so is ZFC.

But we do have foundation – that is, its suspension was only temporary, to
obtain VI.6.14.

VI.6.15 Theorem. U N = ⋃α∈On VN (α).

Proof. By U N we understand, of course, {x : sp(x) ⊆ N } (cf. VI.6.2). The


⊇-part is trivial, so we address the ⊆-part. Let us use ∈-induction over U N ,
so let x ∈ U N , and assume (I.H.) that for all y ∈ x, y ∈ N ∪ WF N . Thus,
x ⊆ N ∪ WF N . By VI.6.12, x ∈ N ∪ WF N . 

VI.6.16 Corollary. WF N = V N .

† Once again we point out that there is no circularity in this assertion, for ordinals can be defined
without the presence of foundation (see the discussion following VI.4.25). Another revision that
one needs to make in this (temporary!) rewriting of our development is the definition of x, y.
To avoid foundation one defines x, y = {{x}, {x, y}}.

VI.6.17 Corollary. Foundation is equivalent to the statement V N = WF N .

See also the discussion in II.1.4.

Proof. That foundation implies V N = WF N was the content of the proof of


VI.6.15. Conversely, if V N = WF N holds, then foundation holds, since it is a
theorem of ZFC − f in WF N . 

Another way to put all this is that if we drop the foundation axiom, then
WF N ⊂ V N
The sets in V N − WF N are the hypersets (see Barwise and Moss (1991)).

We started off in Chapter II by proposing – after Russell – that sets be formed


in stages. The concept of “stage” was necessarily vague there, yet it assisted
us to choose a small group of “reasonable axioms” on which we based all our
deductions hence. We have now come to a point that “stages” can be formalized
within the theory!

VI.6.18 Definition. We say that a set x is formed from N at stage† α iff x ∈


R N (α) (iff x ∈ VN (α), since x is a set).
Thus R N (α) is the set of all sets formed at stage α. 

Principle 0 of Chapter II says that “an arbitrary class is a set formed (from N )
at some stage if all its members are formed (set-members) or given (urelement-
members) at some earlier stage”, and Principle 1 says that “every set is con-
structed at some stage”. All this has now become formally true (with our final
interpretation of what “stage” means).
Indeed, if the set x is in VN (α) and α is smallest (“earliest”), then α = β + 1
(why?); hence x ⊆ N ∪ VN (β) = N ∪ R N (β). That is, all the elements of x are
formed (∈ R N (β)) or given (∈ N ) at some earlier stage.
Conversely, if it is known for a class A that all y ∈ A satisfy y ∈ VN (α y ),
and if we are told that there is a stage after all the α y (that is, sup{α y : y ∈ A}

exists and equals, say, β), then A ⊆ ⋃y∈A VN (α y ) ⊆ N ∪ VN (β) (the last ⊆
by VI.6.6). Hence A ∈ VN (β + 1), i.e., A is constructed at stage β + 1 from N ,
as a set. This formalizes Principle 2.
Principle 1 is formalized as VI.6.16.

† This is meant in the non-strict sense. α need not be the earliest stage (i.e., smallest ordinal) at
which x is formed.

VI.6.19 Definition (Rank of an Object in U N ). The rank of an object x in


U N , ρ N (x), is the earliest stage α at which x is formed from N , in the sense
ρ N (x) = min{α : x ∈ VN (α)}. 

VI.6.20 Remark. (1) We will normally suppress the subscript N on ρ. See


also VI.6.24 below.
(2) Definition VI.6.19 wants ρ(x) to be the earliest stage α at which we can
place the set or urelement x as a member of VN (α). This deviates from the
standard definition given in most of the literature, where the rank is defined as
the smallest α such that x ⊆ VN (α).
We prefer VI.6.19 (the adoption of which affects computation of ranks, but
no other theoretical results) because we find it aesthetically more pleasing not
to have both sets and urelements at stage 0. The alternate definition gives rank
0 to ∅, as it does to any atom – see for example Barwise (1975). With respect to
the literature that admits no urelements in the theory, our objection disappears.
(3) The adoption of VI.6.19 makes ρ(x) a successor for all sets x. In-
deed, if Lim(ρ(x)) (why can it not be that ρ(x) = 0?), then, as VN (ρ(x)) =
⋃{VN (α) : α < ρ(x)}, we would get x ∈ VN (α) for some α < ρ(x).
(4) VI.6.19 yields ρ(α) = α + 1 (see below), while the standard rank, “r k”,
would have r k(α) = α (Exercise VI.24). 

VI.6.21 Proposition. (∀α)ρ(α) = α + 1.

Proof. By VI.6.9, α ∈ VN (α + 1) − VN (β), where β ≤ α. 

VI.6.22 Example. If x ⊆ N , where N is a set of urelements, then ρ N (x) = 1.


Well, first, x ∉ N = VN (0), and second, x ∈ VN (1) = P(N ∪ VN (0)).
In particular, ρ N (∅) = 1.  □

VI.6.23 Proposition. For any sets x, y,

(i) x ∈ y implies ρ(x) < ρ(y),


(ii) x ⊆ y implies ρ(x) ≤ ρ(y).

Proof. (i): Let x ∈ y and ρ(y) = α + 1. Then x ∈ y ∈ P(N ∪ VN (α));


hence x ∈ N ∪ VN (α), from which ρ(x) ≤ max(0, α).
(ii): Let x ⊆ y and ρ(y) = α + 1. Here, x ⊆ N ∪VN (α); hence x ∈ VN (α + 1);
thus ρ(x) ≤ α + 1. 

VI.6.24 Proposition. ρ N satisfies the recurrence equations

ρ N (x) = 0                        if x ∈ N
ρ N (x) = (⋃y∈x ρ N (y)) + 1       otherwise

Proof. N = VN (0) settles the first equation. Let now x be a set and α =
⋃y∈x ρ N (y), while ρ N (x) = β + 1. Since ρ N (y) ≤ α for all y ∈ x, we get
(∀y ∈ x)y ∈ N ∪VN (α) (by VI.6.6); hence x ⊆ N ∪VN (α); thus x ∈ VN (α+1).
This yields
β +1≤α+1 (1)
By VI.6.23, (∀y ∈ x)ρ N (y) < β + 1; hence (∀y ∈ x)ρ N (y) ≤ β. Thus α ≤ β;
hence α + 1 ≤ β + 1. Using (1), we get the second equation. 
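For pure hereditarily finite sets (N = ∅) the recurrence becomes an ordinary recursion, with ordinals appearing as natural numbers and the supremum ⋃ as max. A minimal sketch:

```python
def rank(x):
    # rho(x) = (sup of rho(y) for y in x) + 1 for a pure, hereditarily finite set x;
    # the empty supremum is 0, so rho(emptyset) = 1 as in VI.6.22/VI.6.25
    return max((rank(y) for y in x), default=0) + 1

empty = frozenset()
one   = frozenset({empty})           # the ordinal 1 = {0}
two   = frozenset({empty, one})      # the ordinal 2 = {0, 1}

print(rank(empty), rank(one), rank(two))   # 1 2 3, i.e. rho(alpha) = alpha + 1 (VI.6.21)
print(rank(frozenset({two})))              # 4: x in y implies rho(x) < rho(y) (VI.6.23)
```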

The above recurrence shows that the dependence of ρ N on N is straight-


forward (initialization), which justifies our lack of caution in suppressing the
subscript N .
 
VI.6.25 Example. Let us re-compute ρ(∅): ρ(∅) = (⋃y∈∅ ρ(y)) + 1 = 0 + 1 = 1.
Next, let us compute again ρ(x) for ∅ ≠ x ⊆ N (where N is our initial atom
set): ρ(x) = (⋃y∈x ρ(y)) + 1 = (⋃{0}) + 1 = 0 + 1 = 1.  □

VI.6.26 Example. Let us rediscover the identity ρ(α) = α +1, using induction
over On in connection with VI.6.24. Use ρ(β) = β + 1 for β < α as I.H.
 
Then ρ(α) = (⋃β<α (β + 1)) + 1 = sup+ (α) + 1 = α + 1.  □

VI.6.27 Example. Suppose that f is a function with dom( f ) ⊆ ω+1, such that
f (ω) ↓. Let us estimate f ’s rank. We know that ⟨ω, x⟩ ∈ f for some x. Now,

ρ(⟨ω, x⟩) = ρ({ω, {ω, x}})
          = max(ω + 1, ρ({ω, x})) + 1
          = max(ω + 1, max(ω + 1, ρ(x)) + 1) + 1
          ≥ ω + 3

By VI.6.23, ρ(⟨ω, x⟩) < ρ( f ); thus, f ∉ VN (ω + 3).
This entails, in particular, that an inhabitant of VN (ω +3) would be oblivious
to the fact that there is a 1-1 correspondence ω ∼ ω + 1, since even though ω
and ω + 1 are “visible” in VN (ω + 3),† the 1-1 correspondence is not. 

† In anticipation of ordinal arithmetic in Section VI.10, we are taking here the notational liberty
to write things such as “α + 3” for ((α + 1) + 1) + 1.

VI.6.28 Example. Next, assume Lim(α), and let β < γ , both in VN (α), and
f : γ → β be a 1-1 correspondence. Is f ∈ VN (α)?
f ⊆ γ × β. Now,

ρ(γ × β) = (⋃⟨δ,η⟩∈γ ×β ρ(⟨δ, η⟩)) + 1
         = (⋃⟨δ,η⟩∈γ ×β max(δ + 3, η + 3)) + 1
         ≤ γ + 3, since max(δ + 3, η + 3) ≤ γ + 2

Thus, ρ( f ) ≤ γ + 3 by VI.6.23, so that f ∈ VN (γ + 3) and hence f ∈ VN (α),


since γ + 3 < α. An inhabitant of VN (α) will witness the fact that γ ∼ β. 

What else is the rank function good for?


It allows us to formalize the argument in III.8.3(I) to prove that collection
follows from the following restricted axiom (replacement axiom), and therefore
it is equivalent to it. This result strengthens III.8.12 by proving the the last
version ((1) below) implies the first (III.8.2) (see p. 173).

(∀x ∈ A)(∃!y)S (x, y) → (∃z)(∀x ∈ A)(∃y ∈ z)S (x, y) (1)

To avoid repetitiousness in the arguments that follow, let us show once and
for all that

VI.6.29 Proposition. (Tarski (1955).) For every class B there is a set b ⊆ B


such that

(1) B = ∅ → b = ∅,
(2) b can be given as a set term in terms of B.

Proof. Let us define b = ∅ if B = ∅. Otherwise, let us collect in b all those


members of B of least rank (compare with the idea developed in III.8.3(I)).
That is,

b = {x ∈ B : (∀y ∈ B)ρ(x) ≤ ρ(y)} (i)

By (i), b ⊆ B. By assumption on B, there is a minimum α such that ∅ ≠
B ∩ VN (α). Thus, b ≠ ∅ and (∀x ∈ b)ρ(x) = α (if some x, y in b have
ρ(x) < ρ(y), then, as x, y are in B as well, we must also have ρ(y) ≤ ρ(x),
untenable). Thus, b ⊆ VN (α); hence it is a set.
(i) is the “computation” of b as a set term in terms of B. 
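On hereditarily finite sets the construction in the proof can be carried out verbatim: given a (here finite) nonempty collection B, keep exactly the members of least rank. A sketch reusing a rank function like the one above; in the proposition B may of course be a proper class, and finiteness is only an artifact of the illustration.

```python
def rank(x):
    return max((rank(y) for y in x), default=0) + 1

# b = {x in B : rank(x) is minimal}, as in (i) of VI.6.29
B = [frozenset(), frozenset({frozenset()}), frozenset({frozenset({frozenset()})})]
least = min(rank(x) for x in B)
b = [x for x in B if rank(x) == least]
print(b)        # [frozenset()] : the nonempty set of minimal-rank members of B
```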

We now turn to show that “replacement” ((1) above) implies collection (there
is no circularity in this, for anywhere that we have used collection, the restricted
form (1) sufficed).
Assume then
(∀x ∈ A)(∃y)S (x, y) (2)
Thus, there is a nonempty class Bx = {y : S (x, y)} for each x ∈ A.
Let, for each x ∈ A, bx ≠ ∅ be the set “computed” by VI.6.29(i). Using
class notation for readability, (2) translates into
(∀x ∈ A)(∃y)y ∈ Bx      (2′)
We have by VI.6.29
(∀x ∈ A)(∃!y)y = bx (3)
Why “∃!”? Because, referring to the proof of VI.6.29, there is only one mini-
mum α such that Bx ∩ VN (α) ≠ ∅, and hence only one bx . On the other hand,
⊢ bx = y → bx = z → y = z. By schema (1), (3) yields
(∃z)(∀x ∈ A)(∃y ∈ z)y = bx
Let then C be a new constant, and add
(∀x ∈ A)(∃y)(y ∈ C ∧ y = bx )
By the one point rule (I.6.2), the above implies (∀x ∈ A)bx ∈ C; thus {bx : x ∈ A}
is a set by separation; hence so is ⋃{bx : x ∈ A}. Call this union D (new
constant).
We are almost done. Let x ∈ A. Then Bx ≠ ∅, hence by VI.6.29, bx ≠ ∅.
This allows us to add a new constant e and also add e ∈ bx . By bx ⊆ D we
have e ∈ D. By bx ⊆ Bx we have e ∈ Bx . Thus, e ∈ D ∧ e ∈ Bx ; hence
(∃y ∈ D)y ∈ Bx
By the deduction theorem x ∈ A → (∃y ∈ D)y ∈ Bx ; hence (generalizing)
(∀x ∈ A)(∃y ∈ D)y ∈ Bx
from which, eliminating class notation,
(∃z)(∀x ∈ A)(∃y ∈ z)S (x, y)
This, along with (2), proves collection (III.8.2).

VI.6.30 Example. Let the relation P satisfy a “weak” MC, namely, for every
nonempty set x, there is a P-minimal element y ∈ x, i.e., ¬(∃z ∈ x)z P y, or
Py ∩ x = ∅
It will follow that P has “ordinary” (strong) MC, as defined in VI.1.22.

Indeed, let ∅ ≠ A, and let A have no P-minimal elements, that is,

for all a ∈ A,   Ba = Pa ∩ A ≠ ∅     (1)
Now, we have no reason to assume that the Ba ’s are sets (e.g., P might fail to
be left-narrow). However, by VI.6.29, we get for each a ∈ A a nonempty set
ba ⊆ Ba . Let
 
S = ⋃a∈A ({a} × ba )

Since S ⊆ P, S has “weak” MC as well (compare with Exercise VI.1), and it


is left-narrow, since for all a ∈ dom(S) we have Sa = ba . By Exercise VI.4,
there is an a ∈ A such that

Sa ∩ A = ∅

or

ba ∩ A = ∅

which contradicts (1).


In particular, taking P ≡ ∈ (the relation “∈”), this shows that the “set version”
(single axiom) of foundation,

(∃x)x ∈ y → (∃x ∈ y)¬(∃z ∈ y)z ∈ x

is equivalent to the “class version” that we gave as a schema (III.7.2).


As promised in Remark VI.2.14, Corollary VI.2.13 can now be strengthened
to allow A to be any class, possibly proper. The restriction to a set A in the
proof of VI.2.13 was meant to allow the use of AC, proving there that well-
foundedness implies MC. Now let P be well-founded over a class A. The proof
of VI.2.13, unchanged, now starting with “. . . let ∅ ≠ B ⊆ A, where B is a
set,† . . .”, shows that P has weak MC over A. In view of the equivalence of the
strong and weak versions of MC, P has (strong) MC over A. 

VI.7. A Pairing Function on the Ordinals


In this section we establish a useful technical result that we will employ in
Section VI.9 and in the next chapter. We show that On × On can be well-
ordered, and when this is done,
On × On ≅ On

† The italicised hypothesis is explicitly added since A may be proper class.



We start by noting that, since for any two ordinals α, β one has either α ≤ β or
β < α, it makes sense to define

max(α, β) = α if β ≤ α, and max(α, β) = β otherwise,

and

min(α, β) = α if α ≤ β, and min(α, β) = β otherwise.

Since ≤ is ⊆, we have max(α, β) = α ∪ β and min(α, β) = α ∩ β.

VI.7.1 Definition. We define a relation ≺ on On × On by

⟨σ, τ⟩ ≺ ⟨α, β⟩   iff   max(σ, τ ) < max(α, β) ∨
                        (max(σ, τ ) = max(α, β) ∧ (σ < α ∨ (σ = α ∧ τ < β)))   □

VI.7.2 Proposition. ≺ is a well-ordering on On × On.

Proof. We delegate the details, for example that ≺ is a linear order, to the reader
(Exercise VI.26).
Let us argue that it has MC. To this end, let ∅ ≠ A ⊆ On × On. The class
{α ∪ β : ⟨α, β⟩ ∈ A} has a smallest member γ . This is realized as γ = α ∪ β
for some (perhaps several) ⟨α, β⟩ ∈ A. Among those, pick all with the smallest
α (first component), i.e., setting

F (σ, τ ) ≡def γ = σ ∪ τ ∧ ⟨σ, τ⟩ ∈ A

form the class

{⟨α, β⟩ : F (α, β) ∧ (∀σ )(∀τ )(F (σ, τ ) → α ≤ σ )}     (1)

and finally pick in (1) that ⟨α, β⟩ with the smallest β.
Let us verify that ⟨α, β⟩ is ≺-minimal in A: If ⟨σ, τ⟩ ≺ ⟨α, β⟩ because σ ∪ τ <
α ∪ β = γ , then ⟨σ, τ⟩ ∉ A by the choice of γ . Let it then be so because σ ∪ τ =
α ∪ β = γ , but σ < α. Then ⟨σ, τ⟩ ∉ A by the choice of α. The last case to
consider also yields ⟨σ, τ⟩ ∉ A, by the choice of β.  □

VI.7.3 Theorem. (On², ≺) ≅ (On, <).

Proof. By VI.7.2 and VI.3.20 we have one of

(1) (On², ≺) ≅ (α, <) for some α (an initial <-segment of On),
(2) (the ≺-segment of On² determined by some ⟨α, β⟩, ≺) ≅ (On, <),
(3) (On², ≺) ≅ (On, <).

To the left of ≅ in (1) we have a proper class (e.g., dom(On²) = On). Thus,
this case is untenable. Case (2) is impossible as well, for the left hand side of
≅ is a subclass of γ × γ , where γ = (α ∪ β) + 1, and hence a set.
This leaves (3) as the only possibility.  □

As a result, ≺ is left-narrow on On².

VI.7.4 Remark. The unique function J : On2 → On that effects the isomor-
phism of Theorem VI.7.3 is an instance of a pairing function on the ordinals –
that is, a 1-1, total function On2 → On. This particular one is also onto;
thus there is an inverse J −1 : On → On2 . We reserve the letters K , L to write
J −1 = ⟨K , L⟩. The K , L are the first and second projections of J and satisfy
⟨K , L⟩ ◦ J = 1On2 and J ◦ ⟨K , L⟩ = 1On (the latter only because J is onto).
Thus,

K (J (α, β)) = α

and

L(J (α, β)) = β

for all α, β.
With the aid of K , L we can enumerate all the pairs in On2 by the (total)
function α ↦ ⟨K (α), L(α)⟩ on On. Note how each of K and L enumerates
each ordinal σ infinitely often (why?).
Pairing functions play an important role in recursion theory (from where
the notation is borrowed here). For a detailed account of computable pairing
functions on N see Tourlakis (1984). 
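Restricted to natural numbers, the ordering ≺ can be realized by sorting pairs on the key (max, first component, second component); the position of a pair in this enumeration is then J on ω × ω (compare VI.7.8 below), and reading the enumeration backwards gives K and L. A minimal sketch over a finite bound B; the bound and the dictionaries are artifacts of the illustration.

```python
B = 6                                                      # enumerate the pairs in B x B
pairs = sorted(((i, j) for i in range(B) for j in range(B)),
               key=lambda p: (max(p), p[0], p[1]))         # the well-ordering of VI.7.1 on omega^2
J = {p: n for n, p in enumerate(pairs)}                    # position in the enumeration
K = {n: p[0] for n, p in enumerate(pairs)}                 # first projection
L = {n: p[1] for n, p in enumerate(pairs)}                 # second projection

assert all(K[J[(a, b)]] == a and L[J[(a, b)]] == b         # <K, L> o J = identity
           for a in range(B) for b in range(B))
print([pairs[n] for n in range(9)])
# [(0,0), (0,1), (1,0), (1,1), (0,2), (1,2), (2,0), (2,1), (2,2)]
```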

What are pairing functions good for? Section VI.9 will exhibit a substantial
application. The next one will be given in Chapter VII. For now let us extend
the coding of pairs (of ordinals) that J effects into a coding of “vectors” of
ordinals (compare with III.10.4).

VI.7.5 Definition. We define by induction on n ∈ ω the functions Jn :


J1 (α) = α   for all α
Jn+1 (ᾱn+1 ) = J (Jn (ᾱn ), αn+1 )   for all ᾱn+1
where ᾱn stands for the sequence α1 , . . . , αn .

We also define for each n ∈ ω the functions π^n_i for† i = 1, . . . , n by

π^1_1 (α) = α   for all α
π^{n+1}_{n+1} (α) = L(α)   for all α
π^{n+1}_i (α) = π^n_i (K (α))   for all α and i = 1, . . . , n

It is trivial to verify that for each n ∈ ω, Jn : On^n → On is a 1-1 correspondence
of which α ↦ ⟨π^n_1 (α), . . . , π^n_n (α)⟩ is the inverse.

VI.7.6 Proposition. For each α, β, J (α, β) ≥ α and J (α, β) ≥ β.

Proof. Fix a β, and let σ < τ .


Case 1. If τ ≤ β, then ⟨σ, β⟩ ≺ ⟨τ, β⟩; hence J (σ, β) < J (τ, β).
Case 2. If τ > β, then σ ∪ β < τ ∪ β, so that again ⟨σ, β⟩ ≺ ⟨τ, β⟩; thus
also J (σ, β) < J (τ, β).
This amounts to λα.J (α, β) being order-preserving on On; hence the first
required inequality follows from VI.3.15. The second inequality is proved
similarly. 

VI.7.7 Corollary. For each n, α, and i ∈ n + 1 − {0}, π^n_i (α) ≤ α.

J and ≺ have a number of additional interesting properties. We discuss one


more here and delegate the others to the exercises section.

VI.7.8 Example. We show here that J [ω2 ] = ω.


Throughout this example, (· · ·)² stands for (· · ·) × (· · ·), not for ordinal multipli-
cation or exponentiation (which have not been introduced yet anyway).
Let ⟨n, m⟩ ∈ ω², where at least one of n and m is nonzero.
Case 1. n = 0. Then the immediate predecessor is ⟨m − 1, m − 1⟩; hence
J (n, m) = J (m − 1, m − 1) + 1.
Case 2. 0 < n < m. Then the next pair down is ⟨n − 1, m⟩; hence J (n, m) =
J (n − 1, m) + 1.
Case 3. n > m = 0. Then the next pair down is ⟨n − 1, n⟩; hence
J (n, m) = J (n − 1, n) + 1.
Case 4. n > m > 0. Then the next pair down is ⟨n, m − 1⟩; hence J (n, m) =
J (n, m − 1) + 1.

† i ∈ n + 1 − {0}, if you want to avoid “. . . ”.



Thus, for each ⟨n, m⟩ ∈ ω², J (n, m) is a successor (or 0 = J (0, 0)); therefore
J [ω²] ⊆ ω. Now J [ω²] = J (0, ω) (Exercises VI.28 and VI.29); hence J [ω²] ⊇
ω, by VI.7.6. Thus ω is a fixed point of λα.J [α²].  □
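The case analysis above is itself a recursion that computes J on ω²; the sketch below runs it (with Case 3 descending to ⟨n − 1, n⟩) and cross-checks it against a brute-force enumeration of pairs in ≺-order, confirming on an initial segment that J maps pairs of naturals onto an initial segment of ω. The bound B is an artifact of the illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def J(n, m):
    # the recurrence read off the four cases of Example VI.7.8
    if n == 0 and m == 0:
        return 0
    if n == 0:
        return J(m - 1, m - 1) + 1      # Case 1
    if n < m:
        return J(n - 1, m) + 1          # Case 2
    if m == 0:
        return J(n - 1, n) + 1          # Case 3
    return J(n, m - 1) + 1              # Case 4

B = 8
brute = {p: k for k, p in enumerate(sorted(((i, j) for i in range(B) for j in range(B)),
                                           key=lambda p: (max(p), p[0], p[1])))}
assert all(J(i, j) == brute[(i, j)] for i in range(B) for j in range(B))
print(sorted(J(i, j) for i in range(B) for j in range(B)) == list(range(B * B)))   # True
```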

VI.8. Absoluteness
We expand here on the notions introduced in Section I.7. We will be interested
in exploring the phenomenon where inhabitants of universes M, possibly much
smaller than U M , can correctly tell that a sentence A is true in U M , even though
their knowledge goes no further than what is going on in their “world” M.
We start by repeating the definition of relativization of formulas, this time
in the specific context of L Set . Since we here use L Set as our interpretation
language, we go one step further and interpret ∈ as ∈ and U as U . Thus, we
restate below Definition I.7.3 under these assumptions.

VI.8.1 Definition (Relativization of Formulas and Terms). Given a class M


for which M ≠ ∅ is a theorem† and a formula F . We denote by F M the for-
mula obtained from F by replacing each occurrence of (∃x) in it by (∃x ∈ M).
More precisely, by induction on formulas we define
(U (x))M ≡ U (x)
(x ∈ y)M ≡ x ∈ y
(x = y)M ≡ x = y
(¬A)M ≡ (¬(A M ))
(A ∨ B )M ≡ (A M ∨ B M )
((∃x)A)M ≡ (∃x ∈ M)A M
Now let T( u ) = {x : T (x, u )} be a class term depending on the (free) variables
u . Its relativization to a class M is defined as TM ( u ) = {x ∈ M : T M (x, u )}.
The terminology “T( u ) is defined in M” is argot for the assertion “TM ( u) ∈
M is provable on the assumptions u i ∈ M (for all i)”. M is T-closed iff T( u) ∈ M
for all u i ∈ M. 

The reader must have noticed that we now use F M rather than F M(x)
(cf. I.7.3), as that is the normal practice in the context of set theory.
Recall that the primary logical connectives are ¬, ∨, and ∃. In contrast,
∀, ∧, →, ↔ are defined symbols, which is why the above definition does not
refer to them.
M
Clearly, if F is quantifier-free, then F is F .

† Of ZF or of ZFC or of whatever fragment “ZFC ” of ZFC we want to use in a formal interpreta-


tion J = (L Set , ZFC , M).

u ) is defined in M is to say that for all u i ∈ M, TM (


To say that T( u ) is a
set, and a member of M. Clearly, TU N ( u ) = T( u ) is defined in U N ”
u ), and “T(
simply means that for all u i , T(
u ) is a set.

VI.8.2 Example. What is {a, b}M ? It is


{x ∈ M : (x = a ∨ x = b)M } = {x ∈ M : x = a ∨ x = b}
= {a, b} ∩ M
This proves (in ZF − f, for example) that
x ∈ M → y ∈ M → {x, y}M = {x, y}
That is, for a and b chosen in M, “{a, b}” has the same meaning in M as it has
in U N .
If M is {a, b}-closed, then {a, b} is defined in M. 

VI.8.3 Remark (“Truth” in M). We use the short argot “F (x1 , . . . , xn ) is true
in M” for the longer argot “F (x1 , . . . , xn ) is true in J = (L Set , ZFC , M)”. We
will often write this assertion as
|=M F (x1 , . . . , xn ) (1)
We will recall from I.7.4 the translation of the above argot, (1), where we use
here “ZFC′ ” for some unspecified fragment of ZFC:

⊢ZFC′ x1 ∈ M ∧ x2 ∈ M ∧ · · · ∧ xn ∈ M → F M (x1 , . . . , xn )     (2)
The part “x1 ∈ M ∧ x2 ∈ M ∧ · · · ∧ xn ∈ M →” in (2) is empty if F is a
sentence.
Platonistically (semantically), “truth in M” is just that; in the sense of
I.5. Indeed, the notation (1) states such truth from the semantic viewpoint
as well. In this and the next section however, our use of (1) is in the syntactic
sense (2). 

VI.8.4 Remark. Thinking once again Platonistically (semantically), let us ver-


ify that for any formula F and all a, b, . . . in M,
M
|=M F [[ a, b, . . . ]] iff |=U N F [[ a, b, . . . ]] (1)
assuming that N is the supply of atoms we have used to build the von Neumann
universe. This is an easy induction on formulas, and the details are left to the
reader. Here are some cases: For F ≡ U (x) we have, for a ∈ M,

|=M U (x)[[ a ]] iff |=U N U (x)[[ a ]] iff (cf. VI.8.1) |=U N U M (x)[[ a ]]

Say F ≡ ¬A. Then, for a, . . . in M,


|=M (¬A) [[ a, . . . ]]
iff
⊭M A [[ a, . . . ]]
iff (by I.H.)
⊭U N A M [[ a, . . . ]]
iff
|=U N (¬A)M [[ a, . . . ]]
Say F ≡ (∃x)A. Then, for a, . . . in M,
 
|=M (∃x)A [[ a, . . . ]]
iff
(∃i ∈ M) |=M A [[ i, a, . . . ]]
iff (by I.H.)
(∃i ∈ M) |=U N A M [[ i, a, . . . ]]
iff
(∃i) |=U N (i ∈ M ∧ A M ) [[ i, a, . . . ]]
iff
|=U N (∃x ∈ M)A M [[ a, . . . ]]
iff  M
|=U N (∃x)A [[ a, . . . ]]
Thus, there are two ways to semantically evaluate a formula F in M. One is to
act as an inhabitant of M. You then evaluate F in the standard way indicated
in I.5. The other way is to act as an inhabitant of U N . Before you evaluate F in
M, however, you ensure that it is relativized into F M and that the values you
plug into free variables are from M. Then, both methods yield the same result.
Note, nevertheless, that an inhabitant of M may think (because he evaluated
so, and he knows of no worlds beyond his to know better) that a sentence F is
true, when in reality, i.e., absolutely speaking, something somewhat different
is true: F M .
Sometimes (for some F and some M) the reality in M and the absolute
reality coincide, and this is wonderful, for, in that case, what an inhabitant
of M considers to be true is really true. We will explore this phenomenon
shortly. 

VI.8.5 Example (Informal). Let M = {a, {a}, {a, b}}, where a ≠ b are ure-
lements. Set A = {a} and B = {a, b}. Clearly, A ≠ B; hence also (A ≠ B) M .
Yet,
 M
(∀x)(x ∈ A ↔ x ∈ B)

that is,

(∀x ∈ M)(x ∈ A ↔ x ∈ B)

is (really) true. In short (see previous remark)

|= M (∀x)(x ∈ A ↔ x ∈ B)

Thus, M does not satisfy extensionality.


It turns out that transitive classes do not have this flaw (M, of course, is not
transitive). 

VI.8.6 Definition (Δ0 -formulas). The set of the Δ0 -formulas is the smallest
subset of all formulas of L Set that

(1) includes all the atomic formulas (of the types xi = x j , U (xi ), xi ∈ x j ), and
(2) is such that whenever the formulas A, B are included, so are (¬A),
(A ∨ B ), and ((∃xi ∈ x j )A) for any variables xi , x j (xi ≢ x j ).  □

The Δ0 -formulas are also called restricted formulas, as quantification is always


bounded by asking that the quantified variable belong to some set x j .
Once more, we refer here only to the connectives ¬, ∨, ∃, since the others
(∀, ∧, etc.) are expressible in terms of them. As always, when writing down
formulas, whether these are restricted or not, we will only use just enough
brackets to avoid ambiguities.

VI.8.7 Lemma. If M is a transitive class and A(x1 , . . . , xn ) is a Δ0 -formula,


then for all xi ∈ M,

A(xn ) ↔ A M (xn ) (1)

The above claim (1) is short for


 
⊢ZF−f x1 ∈ M ∧ · · · ∧ xn ∈ M → (A(xn ) ↔ A M (xn ))

Proof. Induction on Δ0 -formulas:


Basis. The contention follows from VI.8.1 and VI.8.6.
Take as I.H. that (1) holds for A and B . It is trivial that it holds for ¬A
and A ∨ B as well. Let us then concentrate on (∃y ∈ z)A(y, xn ).
→: Assume now b ∈ M, a1 ∈ M, . . . , an ∈ M (all these letters are
free variables), and also add the assumption (∃y ∈ b)A(y, an ), that is,
 
(∃y) y ∈ b ∧ A(y, an ) . This allows us to introduce the assumption

Y ∈ b ∧ A (Y, an )

where Y is a new constant. Since b ∈ M, it follows that Y ∈ M by transitivity.


Thus, Y ∈ M ∧ Y ∈ b ∧ A(Y, an ), from which the basis case and the I.H.
yield (via the Leibniz rule)

Y ∈ M ∧ (Y ∈ b)M ∧ A M (Y, an )

The substitution axiom yields


 
(∃y ∈ M) (y ∈ b)M ∧ A M (y, an ) (2)

Thus, using VI.8.1,


 M
(∃y ∈ b)A(y, an ) (3)

By the deduction theorem (omitting the ⊢-subscript),

⊢ b ∈ M → a1 ∈ M → · · · → ((∃y ∈ b)A(y, an ) → ((∃y ∈ b)A(y, an ))M )
(4)
←: Conversely, let b ∈ M, a1 ∈ M, . . . , an ∈ M (all these letters are free
variables), and also add the assumption (3). By VI.8.1 this  yields (2). Hence
(via the Leibniz rule, the basis part, and the I.H.), (∃y ∈ M) y ∈ b ∧A (y, an ) ,
i.e.,
 
(∃y) y ∈ M ∧ y ∈ b ∧ A(y, an )

from which tautological implication along with ∃-monotonicity yields


 
(∃y) y ∈ b ∧ A (y, an )

The deduction theorem does the rest. 
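The content of the lemma can be seen concretely on hereditarily finite sets: a Δ0-formula is evaluated by bounded searches through the members (of members, . . . ) of the sets named by its free variables, so restricting those searches to a transitive M containing the parameters changes nothing. The sketch below uses the sample Δ0-formula (∃y ∈ x)(∀z ∈ y)z ∈ x ("some member of x is a subset of x"); it is an informal miniature of the lemma, not its proof.

```python
# phi(x) == (exists y in x)(for all z in y) z in x, a Delta_0 formula: bounded quantifiers only.
def phi(x):
    return any(all(z in x for z in y) for y in x)

def phi_relativized(M, x):
    # the relativization phi^M: every quantifier additionally restricted to M
    return any(all(z in x for z in y if z in M) for y in x if y in M)

e = frozenset()
one = frozenset({e})
two = frozenset({e, one})
M = {e, one, two}                       # a transitive (finite) "class"

assert all(phi(x) == phi_relativized(M, x) for x in M)   # Lemma VI.8.7 in miniature
print(phi(two), phi(frozenset({one})))                   # True False
```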

Thus, Platonistically, a Δ0 -formula does not think that it is someone else if


you give it more or less “interpretive freedom” (from a transitive M to U N and
back). Its “meaning” is, somehow, “absolute”.
Less well-endowed formulas could suffer a change in meaning as we go
beyond a transitive class M. For example, (∀x)B (x) can be true if the search
(∀x) is restricted to M, but might fail to be so if the search is extended beyond
M. Similarly, a formula (∃x)B (x) might be true in U N , for there we can find
an x that works (a “witness”), whereas in a “smaller” class M such an x might
fail to exist.

In view of Remark VI.8.4, we can read VI.8.7 also this way: If F is a


Δ0 -formula and M is transitive, then (Platonistically) for all a, . . . in M,
|=M F [[ a, . . . ]] iff |=U N F [[ a, . . . ]] (5)
by (1) in VI.8.4. Thus, an inhabitant of M will think that a Δ0 -sentence is true
iff it is really true.

VI.8.8 Definition (Absolute Formulas). For formula A(xn ) to be absolute


for a class M (that is not necessarily transitive) means that
 
⊢ZFC x1 ∈ M → · · · → xn ∈ M → (A(xn ) ↔ A M (xn ))
A class term T(xn ) is absolute for M iff
⊢ZFC x1 ∈ M → · · · → xn ∈ M → T(xn ) = TM (xn )   □

Thus, using the above terminology, VI.8.7 says that Δ0 -formulas are absolute
for transitive classes. Example VI.8.2 shows that (the term) {a, b} is absolute
for any class M.
If T(
u ) = {x : T (x, u )} and T is absolute for M, then for u i ∈ M,
TM (
u ) = {x ∈ M : T M (x, u )}
= {x ∈ M : T (x, u )} by absoluteness of T
= T(
u) ∩ M
If moreover we know that T(u ) ⊆ M for all u i ∈ M, then T(
u ) is absolute for
M, as happened in the special case T(x, y) = {x, y}.

In view of Definition VI.8.8, and inspecting the proof of VI.8.7, we can state
at once:

VI.8.9 Corollary. The set of formulas that are absolute for some class M is
closed under the Boolean connectives and the bounded quantifiers (∃x ∈ y)
and (∀x ∈ y).

VI.8.10 Corollary. Extensionality holds in any transitive class M.

Proof. Extensionality in M states that


A ∈ M→ B ∈ M →
(¬U (A) ∧ ¬U (B) ∧ (∀x ∈ A)x ∈ B ∧ (∀x ∈ B)x ∈ A → A = B)M
Since inside (. . .)M we have a Δ0 -formula, the above is a tautological conse-
quence of the extensionality axiom and VI.8.7. 

VI.8.11 Lemma. Foundation holds in any class M.

Proof. Let M = {x : M(x)}. Foundation says that

(∃x)A[x] → (∃x)(A[x] ∧ ¬(∃y ∈ x)A[y]) (1)

Its relativization to M is
 
(∃x ∈ M)A M [x] → (∃x ∈ M) A M [x] ∧ ¬(∃y ∈ M)(y ∈ x ∧ A M [y]) (2)

using VI.8.1. Letting now a1 ∈ M, . . . , an ∈ M, where an are the free variables
in (2), we can have a proof of (2) in ZF, for it is an instance of the schema “∈
(the relation) has MC over M”, a provable schema by the foundation axiom
(cf. VI.1.25). 

The lemma can be strengthened by effecting the interpretation in “ZF − f”, that
is, dropping foundation. We have to take a few precautions:

(1) M is now not arbitrary, but is taken to be any subclass of N ∪ WF N .


(2) x, y is defined by {{x}, {x, y}} to avoid foundation.
(3) Ordinals are defined by VI.4.25.

We know then that we do have foundation in N ∪ WF N (provably) and can


conclude the proof above in the same way, the only difference being the reason
we have foundation (a theorem rather than an axiom).

u ), y = TM (
VI.8.12 Lemma. For any transitive class M and class term T( u)
M
u )) , for all y, u i ∈ M.
iff (y = T(

The proof can be carried out in ZF − f. The statement is in the customary argot,
but it states an implication, from premises y ∈ M, u i ∈ M, to the conclusion
y = TM ( u ) ↔ (y = T( u ))M

Proof. We write throughout T(


u ) = {x : T (x, u )}.
We calculate as follows:

(¬U (y) ∧ (∀z)(T (z, ū) ↔ z ∈ y))M
↔
¬U (y) ∧ (∀z ∈ M)(T M (z, ū) ↔ z ∈ y)
↔
¬U (y) ∧ (∀z)(z ∈ M → (T M (z, ū) ↔ z ∈ y))
↔
¬U (y) ∧ (∀z)((z ∈ M ∧ T M (z, ū)) ↔ z ∈ y)

The last equivalence above uses the Leibniz rule, the tautology

(A → (B ↔ C )) ↔ ((A ∧ B → C ) ∧ (A ∧ C → B ))

and transitivity of M – which implies z ∈ y ↔ z ∈ M ∧ z ∈ y. Noting


(III.4.1) that the first line of our calculation says (y = T(ū))M , while the last
says y = TM (ū); we are done.  □

VI.8.13 Remark. The above result is useful: We often need to show that the
M-relativization of the formula “{x : A(x, u )} is a set” is provable. That is, we
need to show, for u i ∈ M (free variables), the derivability of

((∃y)y = A(ū))M     (1)

where we have set A(


u ) = {x : A(x, u )}.
(1) stands for

(∃y ∈ M)(y = A(ū))M

which under the assumptions of VI.8.12 is provably equivalent to

(∃y ∈ M)y = AM (
u)

Thus, to prove (1), it is necessary and sufficient to prove that A is defined in M.


One would apply this remark to proving that axioms such as those for sepa-
ration, pairing, union, and power set hold in a transitive class. 

VI.8.14 Corollary. Let M be transitive, S be absolute for M, and T be absolute


for and defined in M. Assume also that the formula F is absolute for M. Then:

(i) F [T(ū)] is absolute for M.
(ii) S[T(ū)] is absolute for M.

Proof. Let a1 ∈ M, . . . , u 1 ∈ M, . . . , where a1 , . . . , u 1 , . . . are all the free


variables occurring in the formulas above.
(i): We first relativize F [T(
u )]. This formula means (see III.11.16)
 
(∃y) F [y] ∧ y = T( u) (1)
 M
Thus, F [T( u )] is provably equivalent to (using the absoluteness assump-
tions and Leibniz rule)
 
(∃y ∈ M) F [y] ∧ (y = T(u ))M (2)

By VI.8.12, (2) is provably equivalent to


 
(∃y ∈ M) F [y] ∧ y = TM (
u)

and hence to
 
(∃y ∈ M) F [y] ∧ y = T(
u) (3)

since T is absolute for M.


Now we want to argue that (3) is provably equivalent to (1): Indeed, (3)
implies (1) trivially; conversely, (3) follows from (1), for if y (auxiliary constant)
works in the latter, then it satisfies y = T[
u ]; hence y ∈ M under the assumptions
on T and u .
(ii): Next, start by observing that S , where S(z ) = {x : S (x, z )}, is absolute
for M, since for x, z i ∈ M (free variables), using “↔” conjunctionally, we have
S M (x, z̄) ↔ x ∈ SM (z̄)
          ↔ x ∈ S(z̄)    by absoluteness of S
          ↔ S (x, z̄)

Thus,

(S[T(ū)])M = {x ∈ M : S M [x, TM (ū)]}
           = {x ∈ M : S [x, T(ū)]}, by part (i)†
           = SM [T(ū)]
           = S[T(ū)], by absoluteness of S   □

VI.8.15 Example. In particular, for any transitive class M that is {a, b}-closed,
{a, {a, b}}M = {a, {a, b}} and {{a}, {a, b}}M = {{a}, {a, b}} are provable in
ZF − f ‡ for a ∈ M and b ∈ M. In other words, for such a class either imple-
mentation of the ordered pair a, b (among the two that we have mentioned)
is an absolute term. 

VI.8.16 Lemma. The following are absolute for any transitive class M:

(i) A ⊆ B.
(ii) A = ∅.
(iii) A is a pair (also, A = {x, y}).
(iv) A is an ordered pair (also, A = x, y).

† Note that T(u ) ∈ M.


‡ “f” enters in proving {a, {a, b}} = {a  , {a  , b }} → a = a  ∧ b = b and is not needed here.

(v) x = π (A) (π(A) is the first projection if A is an ordered pair; ∅


otherwise).
(vi) x = δ(A) (δ(A) is the second projection if A is an ordered pair; ∅
otherwise).
(vii) A is a relation.
(viii) A is an order.
(ix) A is a function (also, A is a 1-1 function).
(x) A is a transitive set.
(xi) A is an ordinal.
(xii) A is a limit ordinal.
(xiii) A is a successor.
(xiv) A ∈ ω (or, A is a natural number).
(xv) A = ω.
(xvi) {a, b}.
(xvii) ∅.
(xviii) A − B.

(xi x) A.
 
(x x) A (to make this term total we re-define ∅ as ∅).
(x xi) x ∈ dom(A).
(x xii) x ∈ ran(A).
(x xiii) (∗x ∈ dom(A))F , if F is absolute for M, where “∗” is ∃ or ∀.
(x xiv) (∗x ∈ ran(A))F , if F is absolute for M, where “∗” is ∃ or ∀.
Add the assumption that M is {a, b}-closed. Then the following terms are ab-
solute for M:
(1) A × B.
(2) dom(A).
(3) ran(A).

Proof. Most of these will be left to the reader (Exercise VI.61). Let us sample
a few:
(iii): A is a pair: (∃x ∈ A)(∃y ∈ A)(∀z ∈ A)(z = x ∨ z = y). This is a Δ0 -
formula, and hence absolute for any transitive class.
(x): A is a transitive set: ¬U (A) ∧ (∀y ∈ A)(∀x ∈ y)x ∈ A.
(xiv): A ∈ ω: “A is an ordinal ∧ A is a successor or 0 ∧ (∀x ∈ A)x is a
successor or 0” is a Δ0 -formula.

(xxi): Hint. ⟨x, y⟩ = {x, {x, y}}. Thus, if ⟨x, y⟩ ∈ A, then y ∈ ⋃⋃A.
(1): A × B = {z : (∃x ∈ A)(∃y ∈ B)z = ⟨x, y⟩}. Since the defining formula
is Δ0 , (A × B)M = (A × B) ∩ M = A × B, since M is {x, y}-closed (and hence
⟨x, y⟩-closed).  □

VI.8.17 Example. Let M be a transitive class that is closed under pairs (hence
also under ordered pairs), and R = {⟨x, y⟩ : R(x, y)} a relation, where R is
absolute for M. We calculate RM , noting that (cf. III.8.7)

R = {z : (∃x)(∃y)(⟨x, y⟩ = z ∧ R(x, y))}

We have

RM = {z ∈ M : (∃x ∈ M)(∃y ∈ M)(⟨x, y⟩ = z ∧ R(x, y))}
   = {z : (∃x)(∃y)(⟨x, y⟩ = z ∧ z ∈ M ∧ x ∈ M ∧ y ∈ M ∧ R(x, y))}
   = {z : (∃x)(∃y)(⟨x, y⟩ = z ∧ x ∈ M ∧ y ∈ M ∧ R(x, y))}
   = {⟨x, y⟩ : x ∈ M ∧ y ∈ M ∧ R(x, y)}
   = {⟨x, y⟩ ∈ M × M : R(x, y)}
   = R ∩ (M × M)
   = R|M

The third “=” stems from the assumption that M is closed under pairs, which
leads to the equivalence

x, y = z ∧ z ∈ M ∧ x ∈ M ∧ y ∈ M ↔ x, y = z ∧ x ∈ M ∧ y ∈ M

With some practice one tends to shrug off calculations such as the above and
write RM = {x, y ∈ M × M : R(x, y)} directly. 

VI.8.18 Example. Continuing under the same assumptions as in VI.8.17, let


moreover R be a function, that is, we assume (alternatively, have a proof) that

R(x, y) ∧ R(x, z) → y = z (1)

Tautological implication yields

x ∈ M → y ∈ M → z ∈ M → R(x, y) ∧ R(x, z) → y = z

Hence, using assumption of absoluteness and the Leibniz rule,

x ∈ M → y ∈ M → z ∈ M → RM (x, y) ∧ RM (x, z) → y = z (2)

or (by VI.8.1)
 M
x ∈ M → y ∈ M → z ∈ M → R(x, y) ∧ R(x, z) → y = z (2 )

That is, in our jargon, “(1) is true in M”.



Let us calculate RM a for a ∈ M:

RM a = {x ∈ M : RM (a, x)}


= {x ∈ M : R(a, x)}
= M ∩ {x : R(a, x)}
= M ∩ Ra

Thus, using the standard function notation "R(a)" ({R(a)} = Ra),

    R^M(a) ≃ { ↑        if R(a) ∉ M
             { R(a)     otherwise                                       (3)

It follows from (3) that to obtain R(a) ≃ R^M(a), for all a ∈ M – the absoluteness
condition – we equivalently need the provability of

    a ∈ M → R(a)↓ → R(a) ∈ M

that is, that M is R-closed in a sense weaker than that in VI.8.1: R[M] ⊆ M.
In terms of R we need the provability of

    x ∈ M → R(x, y) → y ∈ M                                             (4)

An alternative notation for (4) is obtained if we set D = dom(R):

    (∀x ∈ M ∩ D)R(x) ∈ M                                                (4′)

In summary, we define that a function R = {x, y : R(x, y)} is absolute for M


(a transitive class that is closed under pairs) iff the following three conditions
hold:

(i) R is absolute.
(ii) (1) (hence, trivially by (i), (2)) is provable.
(iii) (4) is provable.

A special case occurs if instead of the informal R we have introduced a


formal function symbol, R, by

y = R(x) ↔ R(x, y) (5)

because we have a proof of

(∀x)(∃!y)R(x, y) (6)

Note that (6) combines (1) with (∀x)(∃y)R(x, y) (no “!”). Thus, our conditions
for absoluteness of R – that is, (i)–(iii) – simplify to just (i) along with requiring

the provability of†

    ((∀x)(∃!y)R(x, y))^M                                                (7)

Of course, one can go back and forth between a total informal R and a formal
R (cf. III.11.20); hence the two conditions (i) and (7) constitute all we need for
absoluteness of a total function R.
Finally, an interesting subcase of the formal R is that of a constant c – “0-ary
function symbol” – defined by an absolute for M formula, C (y), as

c = y ↔ C (y) (8)

after securing a proof of

(∃!y)C (y) (9)

In this case the conditions of absoluteness for c are that of C , and the require-
ment that the M-relativization of (9) be provable. For example, if ω ∈ M, then
ωM = ω by VI.8.16(xv). 

Not all absoluteness results follow from ascertaining that our formulas are
0 . The following is an important example that does not so follow, which uses
some terminology (finiteness) from the sequel, whence the .

VI.8.19 Example. A set A is finite, formally, iff there is an onto f : n → A for


some n ∈ ω. That is,
    ¬U(A) ∧ (∃f)(f is a function ∧ dom(f) is a natural number
        ∧ (∀y ∈ A)(∃x ∈ dom(f))⟨x, y⟩ ∈ f)                              (1)

Now, (1) is not Δ₀ (and it is known that it is not equivalent to a Δ₀-formula).
Nevertheless, we can show that

“A is a finite set” (2)

is absolute for any transitive class M that satisfies a bit of ZFC. Namely, we want
M to be closed under pairs and to contain ω (ω ∈ M). We start by observing
that the formula to the right of (∃f) is Δ₀. Indeed, in view of VI.8.16 one need
only verify that

    "dom(f) is a natural number" is Δ₀                                  (3)

† Thus, one may introduce the function symbol R^M so that x ∈ M → (y = R^M(x) ↔ R^M(x, y))
and x ∉ M → R^M(x) = ∅ are provable.

The quoted statement in (3) is argot for

    (dom(f) = ∅ ∨ (∃x ∈ dom(f))dom(f) = x ∪ {x})
        ∧ (∀y ∈ dom(f))(y is a natural number)

In view of the existential quantifier preceding it, dom(f) = x ∪ {x} translates to

    (∀y ∈ x)y ∈ dom(f) ∧ (∀y ∈ dom(f))(y = x ∨ y ∈ x)
and we are done with claim (3). Now, by the absoluteness of the component
of (1) to the right of (∃ f ), the relativization of the entire formula is provably
equivalent to
¬U (A) ∧ (∃ f ∈ M)( f is a function ∧ dom( f ) is a natural number
(4)
∧ (∀y ∈ A)(∃x ∈ dom( f ))x, y ∈ f )
To prove the absoluteness of (2), we let A ∈ M and prove the equivalence of (1)
and (4). As (4) → (1) is trivial, we need worry only about (1) → (4). The
whole story is to show that a "witness" f for (1) (auxiliary constant) will work
for (4). For the latter we only need prove f ∈ M. So let f be a new constant,
and add the assumption
¬U (A) ∧ f is a function ∧ dom( f ) is a natural number
(5)
∧ (∀y ∈ A)(∃x ∈ dom( f ))x, y ∈ f
We set n = dom( f ) for convenience. Now transitivity of M and ω ∈ M imply
n ∈ M (and also n ⊆ M). By induction on m ≤ n we now prove f↾m ∈ M.
For m = 0 we are done by 0 ∈ ω ∈ M. Taking an I.H. for m < n, consider
f↾(m ∪ {m}). Now,

    f↾(m ∪ {m}) = (f↾m) ∪ {⟨m, f(m)⟩}

and thus it is in M by closure under pair and absoluteness of union (VI.8.16).
Thus, f = (f↾n) ∈ M.

Pause. How much ZFC did we employ in the above proof? Was the assump-
tion ω ∈ M an overkill? If so, what would be a weaker assumption that still
works? 

VI.8.20 Example. In our last example we consider a transitive class M that is


a formal model of ZF, that is, we can prove, say, in ZF,
    a1 ∈ M → · · · → an ∈ M → A^M

for every ZF axiom A of free variables a⃗n.†

† Therefore, J = (L Set , ZF, M) is the model, but it is a common abuse of terminology to say that
M is.

If it helps the intuition, Platonistically, we may think of M as a (real, i.e.,


semantic) model of ZF, i.e., a set (or proper class) where we have interpreted ∈
as ∈ and U as U , and all the ZF axioms turned out to be true.
Consider the recursive definition

    (∀α)F(α) = G(F↾α)                                                   (1)
where G is total on U N , and moreover is absolute for M. We will show that F
is also absolute for M and that dom(FM ) = OnM .
By the way,
OnM = {x ∈ M : x is an ordinal} = M ∩ On, by VI.8.16. (2)
By VI.8.18 we need to prove (in ZF), on the assumptions x ∈ M, x is an ordinal,
and y ∈ M, that
(y = F(x))M ↔ y = F(x) (3)
and also (cf. (4) in VI.8.18)
x ∈ M ∧ (x is an ordinal) → F(x) ∈ M (4)
Note that the first two assumptions, in view of (2), are jointly equivalent to
x ∈ OnM . We can then use, as usual, α to mean “x ∈ On and x = α”.
By the proof of VI.5.16,† y = F(α) stands for

    (∃!f)(f is a function ∧ dom(f) = α ∪ {α}
        ∧ (∀β ∈ α ∪ {α})G(f↾β, f(β)) ∧ ⟨α, y⟩ ∈ f)                      (5)

Consulting the list VI.8.16 and also invoking VI.8.14, we observe that, for α
and y in M, the relativization of (5) – within provable equivalence – introduces
only one annoying part, namely, (∃! f ∈ M).‡
Let then α ∈ M and y ∈ M. The relativization of (5) is (provably equiva-
lent to)

    (∃!f ∈ M)(f is a function ∧ dom(f) = α ∪ {α}
        ∧ (∀β ∈ α ∪ {α})G(f↾β, f(β)) ∧ ⟨α, y⟩ ∈ f)                      (6)

† Actually, no detailed proof was given for this particular statement. The detailed proof that applies
here as well, with only notational modifications, was given in VI.2.25 for a more general case of
recursion, not just for recursion over On.
‡ Note that the term f |` x is short for {z : π (z) ∈ x ∧z ∈ f }, that is, {z : ((∃y ∈ x)y = π (z))∧z ∈ f },
and is thus absolute, and defined in M on the assumptions x ∈ M and f ∈ M. Thus VI.8.14
applies.

which we abbreviate as

    (∃!f ∈ M)C(f, α, y)                                                 (6′)

in what follows. We will have shown (3) if we prove, under our underlined
assumptions, that (5) → (6′), the other direction being trivial.
Add then (5) as an assumption, as well as a new constant g and the as-
sumption

    C(g, α, y)                                                          (5′)
In VI.5.16 we gave a proof in ZF

Pause. In ZF? Is this true?

that
(∀α)(∃!y)(∃! f )C ( f, α, y) (7)
Since M is a formal model of ZF, and being mindful of the relativization claims
we made a few steps back, we have (by I.7.9) a ZF-proof of
(∀α ∈ M)(∃!y ∈ M)(∃! f ∈ M)C ( f, α, y) (8)
Specializing (8), we derive (∃y ∈ M)(∃ f ∈ M)C ( f, α, y).
Thus, adding two new constants c and h, we may add also
C (h, α, c) (9)
and
h∈M (10)
Now, the "!"-notation in (7) is short for (∀α)(∃y)(∃f)C(f, α, y) and

    C(f, α, y) → C(f′, α, y′) → f = f′ ∧ y = y′                         (11)

Thus, since we have (7), the above and (5′) yield y = c and h = g; hence,
from (10), g ∈ M. We have derived (cf. (5′))

    g ∈ M ∧ C(g, α, y)

and hence (6′) by the substitution axiom (the "!" is inserted by (11)).
So y = F(α) is absolute. We need to establish (4) to show that F is. In fact,
(4) is a direct result of (8), which also yields dom(FM ) = M ∩ On = OnM . 

VI.8.21 Exercise. As an important application of the above, prove that T C(x)


is absolute for any transitive formal model of ZF, M.

Hint. Recall that

    TC(x) = x ∪ ⋃x ∪ ⋃⋃x ∪ ⋃⋃⋃x ∪ · · ·
Express the above as a recursive definition (you only need recursion in ω, but
to fit the result in the style of the previous example, define your function in a
trivial way for arguments ≥ ω). 

VI.8.22 Exercise. Prove (in ZF) that T C satisfies


U (x) → T C(x) = ∅
and

    ¬U(x) → TC(x) = x ∪ ⋃{TC(y) : y ∈ x}    □

VI.8.23 Exercise (Induction on TC(x)). Prove that for any formula F(x),†

    ⊢_ZF (∀x)((∀y ∈ TC(x))F(y) → F(x)) → (∀x)F(x)

That is, to prove F(x) we are helped by the assumption (∀y ∈ TC(x))F(y)
that we can add for free.
Hint. Start by assuming the hypothesis and proving, using foundation (equiv-
alently, ∈-induction), that ⊢_ZF (∀x)(∀y ∈ TC(x))F(y).    □

VI.8.24 Exercise. Another important application of the technique in VI.8.20


is the following: First, working in ZF, assume that (∀x)(∃!y)G (x, y), and prove
that a unique informal total function F exists (equivalently, you may introduce
a (formal) unary function symbol, F, for F) such that

    (∀x)F(x) = G(F↾TC(x))                                               (1)

where we have written G for the function given by y = G(x) ↔ G (x, y) (we
could also have introduced a formal G).
Hint. Imitate the proof of VI.2.25. Start by showing that y = F(x) (or
y = F(x)) must stand for

(∃! f )C ( f, x, y)

where C ( f, x, y) abbreviates

    f is a function ∧ dom(f) = TC(x) ∧
    (∀z ∈ TC(x))G(f↾TC(z), f(z)) ∧
    f(x) = y

† This works in weaker set theories. For example, neither infinity nor power set axioms are required.

You will need to show that

    ⊢_ZF (∀x)(∃!y)(∃!f)C(f, x, y)

Once the fact that F (or F) can be introduced has been established, show that F
is absolute for any transitive formal model of ZF for which G is absolute. This
will imitate the work in VI.8.20. We know that T C(x) is absolute by VI.8.21. We
need to worry about things such as (∃y ∈ T C(x)), y ∈ T C(x), and dom( f ) =
T C(x). 

VI.8.25 Exercise (Absoluteness of Rank). Prove in ZF that the rank ρ is


absolute for transitive formal models of ZF. Do so by bringing the recursive
definition of rank (cf. VI.6.24) into the form (1) in VI.8.24. Treat N as a
parameter. 

VI.8.26 Exercise. Prove in ZF that the function J as well as the various πin of
Section VI.7 are absolute for transitive formal models of ZF.
Hint. J satisfies the recurrence

    (∀x ∈ On)(∀y ∈ On)J(x, y) = {J(x′, y′) : ⟨x′, y′⟩ ∈ On² ∧ ⟨x′, y′⟩ ≺ ⟨x, y⟩}

or

    (∀x ∈ On)(∀y ∈ On)J(x, y) = ran(J↾{w : w ≺ ⟨x, y⟩})

Absoluteness follows (after some work) from our standard technique that proves
the existence of recursively defined functions: Start by proving in ZF that
J (α, β) = γ must be given by

(∃! f )C ( f, α, β, γ )

where
& '
C ( f, α, β, γ ) ↔ f is a function ∧ dom( f ) = α, β ∪ {α, β} ∧
& '
(∀w ∈ dom( f )) f (w) = ran f |`  w ∧
f (α, β) = γ

As in all previous cases where this technique was employed, f simply “codes”
the computation that verifies J (α, β) = γ . Now you will need a few ab-
soluteness lemmata to conclude your case. For example, you will need the
absoluteness of x ∈ {w : w ≺ ⟨α, β⟩}. This is equivalent to

    OP(x) ∧ π(x) ∈ On ∧ δ(x) ∈ On ∧
        [((π(x) ∈ α ∨ π(x) ∈ β) ∧ (δ(x) ∈ α ∨ δ(x) ∈ β))
         ∨ (π(x) ∪ δ(x) = α ∪ β ∧ (π(x) < α ∨ (π(x) = α ∧ δ(x) < β)))]

etc. 

VI.9. The Constructible Universe


We now revisit Section IV.2 from a formal point of view. The quest there was
simply to show that AC is plausible, but here we will do more.
We will define a cumulative hierarchy of sets, similar to the von Neumann
hierarchy, but as in Section IV.2 – and unlike what we did in Section VI.6 –
we will be careful not to admit all the sets that a “powering stage” yields.† We
will accept instead only those sets that can be defined by explicit and “simple”
operations based on what is available “so far”. This is one of the differences.
The other essential difference is that no two sets are constructed at the same
stage (even the urelements will be given at distinct stages).
The construction will build a formal model, J = (L Set , ZF, L N ) of ZFC,
where, as in Section VI.6, ∈ will be interpreted as ∈ and U as U .‡ The proof
of AC in the model will be, unlike the informal argument of Section IV.2, a
consequence of the fact that a construction stage produces a unique constructible
object. In particular, all this work will establish that if ZF is consistent, then so
is ZFC (cf. Section I.7), i.e., adding AC doesn’t hurt a theory that is not already
broken.
The whole idea of the construction of J is due to Gödel (1938, 1939, 1940),
who gave two different constructions, one outlined in Section IV.2, the other to
be followed here. The story has been retold by many, in many slightly different
ways. Our version is influenced by the accounts given by Shoenfield (1967),
Barwise (1975), and Jech (1978b).
After these preliminaries, we embark now upon the construction of Gödel’s
constructible universe L N over any appropriately chosen set of urelements
N . Sets will be built by iterating some simple “explicit” operations (Gödel
operations). There have been many variations in the choice of these opera-
tions. The following definition is one of these variations (close to the version

† These are in VN (α + 1) = P(N ∪ VN (α)) when we start with the set of urelements N .
‡ Thus, J will be a so-called ∈-model, or more accurately, (U, ∈)-model.

in Shoenfield (1967), but with some departures for convenience and user-
friendliness).

VI.9.1 Definition (The Gödel Operations). We call the terms Fi below the
Gödel operations:

F0(x, y) = x − y
F1(x, y) = x ∩ dom(y)
F2(x, y) = {z ∈ x : U(z)}
F3(x, y) = {⟨u, v⟩ ∈ x : u = v}
F4(x, y) = {⟨u, v⟩ ∈ x : u ∈ v}
F5(x, y) = {⟨u, v⟩ ∈ x : ⟨v, u⟩ ∈ y}
F6(x, y) = {⟨u, v, w⟩ ∈ x : ⟨v, w, u⟩ ∈ y}
F7(x, y) = {⟨u, v, w⟩ ∈ x : ⟨u, w, v⟩ ∈ y}
F8(x, y, z) = x ∩ (y × z)
F9(x, y) = {x, y}

The purpose of the Gödel operations is to provide “normalized” terms which


by repeated composition (substitution of one into the other) will form all the
“constructible” sets.
Indices 2–4 take care of the atomic formulas of set theory. Indices 5–7 ensure
that we can manipulate “vectors”, and, in particular, provide tools to address
the fact that ⟨u, v, w⟩ = ⟨⟨u, v⟩, w⟩. Note the absence of power set operations
(we want to provide subsets in a “controlled” manner). Note also that for each
i = 0, . . . , 7, Fi (x, y) ⊆ x, and F8 (x, y, z) ⊆ x, a “technical” fact from which
we will benefit (cf. VI.9.3 below). This technicality compelled the choice of the
rather awkward x ∩ dom(y) and x ∩ (y × z) instead of just dom(y) and y × z
at indices 1 and 8.
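To get a concrete feel for how little the Gödel operations do, here is a small Python sketch (ours, not part of the text) of some of them, restricted to hereditarily finite pure sets coded as frozensets, with ⟨u, v⟩ = {{u}, {u, v}}; the helper names (opair, tc, and the brute-force searches) are our own choices, urelements are ignored (so F2 is skipped), and F6, F7 would be handled just like F5 after decoding ⟨u, v, w⟩ as ⟨⟨u, v⟩, w⟩.

```python
def opair(u, v):
    # Kuratowski pair <u, v> = {{u}, {u, v}}
    return frozenset({frozenset({u}), frozenset({u, v})})

def tc(x):
    # transitive closure: elements, elements of elements, ...
    out, stack = set(), [x]
    while stack:
        for e in stack.pop():
            if e not in out:
                out.add(e)
                if isinstance(e, frozenset):
                    stack.append(e)
    return out

def F0(x, y):            # difference
    return x - y

def F1(x, y):            # x ∩ dom(y)
    cand = tc(y)
    return frozenset(u for u in x if any(opair(u, v) in y for v in cand))

def F3(x, y):            # {<u, v> in x : u = v}
    return frozenset(z for z in x if any(z == opair(u, u) for u in tc(x)))

def F4(x, y):            # {<u, v> in x : u in v}
    cand = tc(x)
    return frozenset(z for z in x for u in cand for v in cand
                     if z == opair(u, v) and isinstance(v, frozenset) and u in v)

def F5(x, y):            # {<u, v> in x : <v, u> in y}
    cand = tc(x)
    return frozenset(z for z in x for u in cand for v in cand
                     if z == opair(u, v) and opair(v, u) in y)

def F8(x, y, z):         # x ∩ (y × z)
    return frozenset(w for w in x if any(w == opair(u, v) for u in y for v in z))

def F9(x, y):            # pairing
    return frozenset({x, y})

# e.g. with e = frozenset(): F9(e, e) = {∅}, and F0(F9(e, e), F9(e, e)) = ∅
```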

VI.9.2 Definition (The Sets Constructible from N ). Fix a set N of urelements


and a function f such that N ≈ ‖N‖ via f, where we have written ‖N‖ for dom(f),
an ordinal.†

† This means the following: On one hand, trivially, ZF (∃x)(∃y)(¬U (x) ∧ (∀z ∈ x)U (z) ∧ y
is a 1-1 function ∧ dom(y) ∈ On ∧ ran(y) = x). For example, x = y = ∅ work, and then we
can invoke the substitution axiom. Now one can introduce new constants N and f along with
assumption (cf. p. 75),
¬U (N ) ∧ (∀z ∈ N )U (z) ∧ f is a 1-1 function ∧ dom( f ) ∈ On ∧ ran( f ) = N

Define by recursion over On the function α ↦ Fα:

    for α < ‖N‖:        Fα = f(α)
    for α ≥ ‖N‖:        Fα = ⋃_{‖N‖≤β<α} Fβ                             if Lim(α) ∨ α = 0
    for α + 1 ≥ ‖N‖:    Fα+1 = Fi(F_{π₁⁴(α)}, F_{π₂⁴(α)})               if π₄⁴(α) = i < 8
                        Fα+1 = F8(F_{π₁⁴(α)}, F_{π₂⁴(α)}, F_{π₃⁴(α)})   if π₄⁴(α) = 8
                        Fα+1 = F9(F_{π₁⁴(α)}, F_{π₂⁴(α)})               if π₄⁴(α) ≥ 9

where the πᵢⁿ are those of VI.7.5.


An object x (i.e., set or urelement) is constructible ( from N – a qualification
that is omitted if it is clear from the context) just in case x = Fα for some α.
We will also say that x is N -constructible.
L N is the class of all objects constructible from N (if N = ∅, then we write
L rather than L∅ ).
We will use the notation ord(x) to indicate min{α : Fα = x}. We will pro-
nounce ord(x) “order of x”. 

The previous recursive definition is appropriate, since πᵢ⁴(α) ≤ α < α + 1 for
i = 1, 2, 3, 4 (see VI.7.7). It uses ordinals to systematically iterate the Gödel
operations "as long as possible". In this section we work in ZF; thus we had to
ask that N be well-ordered (in ZFC, of course, every set is well-orderable by
Zermelo's theorem). The subcase "∨ α = 0" (part of the case "for α ≥ ‖N‖")
takes care of the situation where N = ∅. Then F0 = ∅.
The reader will want to compare the above definition with Definition VI.6.1.
L N parallels U N (rather than V N ). The major differences between the two
definitions are:
(1) Instead of using the power set operation at successor ordinal stages, we
are using explicit (Gödel) operations that are much “weaker” than forming
power sets, to construct one member of the hierarchy at a time.
(2) The urelements are not given at once (stage 0), but are "built" one at a time;
it takes ‖N‖ steps to have them all.
Between successive limit ordinals we ensure that each case (among the
ten Gödel operations) gets “equal opportunity” to apply (at successor ordinal
stages), by using a technique recursion theorists call “dovetailing”.†

† For each case, according as π₄⁴(α) = 0, 1, . . . , 8, or > 8, all pairs ⟨π₁⁴(α), π₂⁴(α)⟩ and all triples
⟨π₁⁴(α), π₂⁴(α), π₃⁴(α)⟩ will be considered, since α ↦ ⟨π₁⁴(α), π₂⁴(α), π₃⁴(α), π₄⁴(α)⟩ is onto – as
follows from VI.7.5 by the observation that ⟨K, L⟩ is onto.
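For readers who want to see the dovetailing in a completely concrete (and purely finite) form, here is a toy Python sketch of ours: it uses the Cantor pairing function on the natural numbers, not the book's ordinal pairing J of Section VI.7, so it only illustrates the bookkeeping idea that a single index α sweeps through all quadruples ⟨π₁⁴(α), π₂⁴(α), π₃⁴(α), π₄⁴(α)⟩.

```python
def cantor_pair(x, y):
    # the usual pairing bijection N x N -> N
    return (x + y) * (x + y + 1) // 2 + y

def cantor_unpair(z):
    # invert cantor_pair by locating the diagonal w = x + y containing z
    w = 0
    while (w + 1) * (w + 2) // 2 <= z:
        w += 1
    y = z - w * (w + 1) // 2
    return w - y, y

def unpair4(alpha):
    # a finite stand-in for alpha |-> (pi_1^4, pi_2^4, pi_3^4, pi_4^4)(alpha)
    a, b = cantor_unpair(alpha)
    p1, p2 = cantor_unpair(a)
    p3, p4 = cantor_unpair(b)
    return p1, p2, p3, p4

# every quadruple is reached: e.g. (0, 1, 2, 3) shows up among the first few
# thousand indices, so each Goedel operation gets its "equal opportunity"
assert (0, 1, 2, 3) in {unpair4(a) for a in range(5000)}
```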

Note that ord(x) is similar to ρ(x) of VI.6.19, but it is a different function


here, whence the different symbol.
We easily see by induction on α that ⊢_ZF sp(Fα) ⊆ N.

The following few lemmata are central:

VI.9.3 Lemma. Let x ∈ L N . Then y ∈ x implies y ∈ L N and ord(y) < ord(x).

Proof. We do induction on ord(x), setting γ = ord(x) for convenience.


If γ < ‖N‖ or ‖N‖ ≤ γ = 0, then the claim is vacuously satisfied.
Let then γ ≥ ‖N‖ and Lim(γ). By VI.9.2, x = ⋃_{‖N‖≤β<γ} Fβ. Then y ∈ x
implies y ∈ Fβ for some ‖N‖ ≤ β < γ. By the obvious I.H. (since ord(Fβ) ≤
β < γ), we have y ∈ L_N and ord(y) < ord(Fβ) < γ = ord(x).
Let γ = α + 1 ≥ ‖N‖. We have cases according to i = π₄⁴(α):

    y ∈ x = Fα+1 = { Fi(F_{π₁⁴(α)}, F_{π₂⁴(α)})               if i = 0, . . . , 7
                   { Fi(F_{π₁⁴(α)}, F_{π₂⁴(α)}, F_{π₃⁴(α)})   if i = 8

Then

    y ∈ x ⊆ F_{π₁⁴(α)}

by VI.9.1. By the obvious I.H. (note that ord(F_{π₁⁴(α)}) ≤ π₁⁴(α) ≤ α < α + 1),
we have y ∈ L_N and ord(y) < ord(F_{π₁⁴(α)}) < α + 1 = ord(x).
Finally, let y ∈ x = Fα+1 = F9(F_{π₁⁴(α)}, F_{π₂⁴(α)}). Then y = F_{πⱼ⁴(α)} for j = 1
or j = 2; hence ord(y) = ord(F_{πⱼ⁴(α)}) ≤ πⱼ⁴(α) ≤ α < α + 1. Moreover, y ∈ L_N,
since it is an "Fβ".    □

VI.9.4 Corollary. L N is a transitive class.

VI.9.5 Lemma. L N is closed under Fi for i = 0, . . . , 9.

Proof. Straightforward index computations, and VI.9.2, yield this claim. For
example, say that x, y (sets or atoms) are in L_N, so that x = Fα and y = Fβ.
Then J₄(α, β, ‖N‖, 9) ≥ ‖N‖ (by VI.7.7) and

    L_N ∋ F_{J₄(α,β,‖N‖,9)+1} = F9(x, y) = {x, y}

Note, in particular, that

    L_N ∋ F_{J₄(α,α,‖N‖,9)+1} = F9(x, x) = {x}



Similarly, if x, y are sets with orders as above, x ∩ dom(y) = F_{J₄(α,β,1,1)+1} (why
is J₄(α, β, 1, 1) + 1 ≥ ‖N‖?). The remaining cases are left to the reader.    □

VI.9.6 Corollary. If each xᵢ (i = 1, . . . , n) is in L_N, so is ⟨x₁, x₂, . . . , xₙ⟩.

This is a theorem schema: one theorem for each n ∈ N. The proof is by informal
induction on n.

Proof. Induction on n. For n = 1, ⟨x₁⟩ = x₁ and there is nothing to prove.
We proceed to n + 1 via (I.H.) n: Now ⟨u, v⟩ = {u, {u, v}}; hence it is in L_N
whenever u, v are. Thus, ⟨x₁, . . . , xₙ₊₁⟩ = ⟨⟨x₁, . . . , xₙ⟩, xₙ₊₁⟩ ∈ L_N by I.H.    □


VI.9.7 Lemma. If x is a set and x ⊆ L N , then there is a set y ∈ L N such that


x ⊆ y.

Proof. The hypothesis yields, via Lemma VI.9.5,

(∀z ∈ x)(∃α){z} = Fα

By collection, there is a set A such that

(∀z ∈ x)(∃α ∈ A){z} = Fα



Let γ = ⋃_{α∈A} α (VI.5.22). Borrowing two results from the next section (VI.10.3
and VI.10.11), we note that Lim(γ + ω) and γ < γ + ω. Thus

    y = F_{γ+ω} = ⋃_{‖N‖≤α<γ+ω} Fα

will do.    □

VI.9.8 Lemma. L N is closed under ∩, ∪, and ×.

Proof. Let the sets x, y be in L N . Then x ∩ y = x − (x − y) = F0 (x, F0 (x, y));


thus it is in L N by VI.9.5.
Closure under ∪ follows by this argument: x ∪y ⊆ L N (by transitivity of L N );
hence (VI.9.7), for some z ∈ L N , x ∪ y ⊆ z. But then, x ∪ y = (z − x) ∩ (z − y).
Finally, x × y ⊆ L N , by VI.9.3 and VI.9.6. Thus, for some z ∈ L N ,
x × y ⊆ z. Hence x × y = (x × y) ∩ z = F8 (z, x, y). 

Pause. But what about x ∩ y when, say, U (x)? We have said in Chapter III that
the formal operations (∩, ∪, −) are total, so they make sense on atoms (in N )
too.

VI.9.9 Lemma. L N is closed under dom.

Proof. Let x ∈ L N and z ∈ x. By two applications of VI.9.3, π (z), i.e., the y


which for some w satisfies z = {y, {y, w}} ∈ x, is in L N . Thus, dom(x) ⊆ L N
and, of course, dom(x) is a set by collection. Thus (VI.9.7), for some u ∈ L N ,
dom(x) ⊆ u, and hence we are done by (VI.9.5), since dom(x) = dom(x)∩u =
F1 (u, x). 

VI.9.10 Lemma (Three “Derived” Gödel Operations). L N is closed under

F10(x) = {⟨u, v⟩ : ⟨v, u⟩ ∈ x}
F11(x) = {⟨u, v, w⟩ : ⟨v, w, u⟩ ∈ x}
F12(x) = {⟨u, v, w⟩ : ⟨u, w, v⟩ ∈ x}

Proof. Let x be in L N . Using VI.9.3, collection, and VI.9.7, we have sets


y1 , y2 , y3 in L N such that

F10 (x) ⊆ y1
F11 (x) ⊆ y2
F12 (x) ⊆ y3

Thus, F10 (x) = F5 (y1 , x), F11 (x) = F6 (y2 , x), and F12 (x) = F7 (y3 , x). We are
done by VI.9.5. 

The following lemma shows that introducing dummy variables does not take
us out of L N . It forms the fundamental step of the main result of this section, that
L N is a model of ZFC, in that it helps to show that L N satisfies the separation
axiom.

VI.9.11 Lemma. For all n ≥ 1, all i, j among 1, . . . , n, and all N -


constructible sets a1 , . . . , an , b, the set

    F⁽ⁿ⁾(a₁, . . . , aₙ) = {⟨u₁, . . . , uₙ⟩ ∈ a₁ × · · · × aₙ : ⟨uᵢ, uⱼ⟩ ∈ b}

is in L N .

This too is a theorem schema.



Proof. Let first i = j. We do (informal) induction on the length of u 1 , . . . , u n .


For the basis, we have n = 2, and the result follows from VI.9.8 and VI.9.5
on observing that

F(2) (a1 , a2 ) = {u 1 , u 2  ∈ a1 × a2 : u 1 , u 2  ∈ b} = (a1 × a2 ) ∩ b

if i < j, and

F(2) (a1 , a2 ) = {u 1 , u 2  ∈ a1 × a2 : u 2 , u 1  ∈ b}


= F5 (a1 × a2 , b)

otherwise.
For the induction step we consider cases.
Case n ∈
/ {i, j}. By I.H.,

F(n−1) (a1 , . . . , an−1 ) = {u 1 , . . . , u n−1  ∈ a1 × · · · × an−1 : u i , u j  ∈ b}

is in L N . But then so is

F(n) (a1 , . . . , an ) = F(n−1) (a1 , . . . , an−1 ) × an

by VI.9.8.
Case n ∈ {i, j} and i, j are consecutive integers. If n = 2, then we are back
to the basis step, so assume n > 2. Now, observe that u 1 , . . . , u n−1 , u n  =
u 1 , . . . , u n−2 , u n−1 , u n , and set
 
F(3) an−1 , an , (a1 × · · · × an−2 )
= {u n−1 , u n , u 1 , . . . , u n−2  ∈ (an−1 × an )
× (a1 × · · · × an−2 ) : u i , u j  ∈ b}
 
Clearly, F(3) an−1 , an , (a1 ×· · ·×an−2 ) ∈ L N by the previous case (and VI.9.8),
and

F(n) (a1 , . . . , an ) = {u 1, . . ., u n  ∈ a1 × · · · × an : u i , uj  ∈ b}


= F11 F(3) an−1 , an , (a1 × · · · × an−2 )

Therefore, the latter is in L N by VI.9.10.


Case n ∈ {i, j} and i, j are not consecutive integers. By the first case,

    F̄⁽ⁿ⁾(a₁, . . . , aₙ) = {⟨u₁, . . . , uₙ, uₙ₋₁⟩ ∈ a₁ × · · · × aₙ × aₙ₋₁ : ⟨uᵢ, uⱼ⟩ ∈ b}

is in L_N. Since ⟨u₁, . . . , uₙ₋₁, uₙ⟩ = ⟨⟨u₁, . . . , uₙ₋₂⟩, uₙ₋₁, uₙ⟩, we conclude
that

    F⁽ⁿ⁾(a₁, . . . , aₙ) = {⟨u₁, . . . , uₙ₋₁, uₙ⟩ ∈ a₁ × · · · × aₙ : ⟨uᵢ, uⱼ⟩ ∈ b}
                        = F12(F̄⁽ⁿ⁾(a₁, . . . , aₙ))

is in L_N.
Case i = j, finally. By VI.9.5 and VI.9.8,

F(2) (ai , ai ) = {u i , u i  ∈ ai × ai : u i , u i  ∈ b}


= F3 (c, c)

where c = ai ∩ b, is in L N . Clearly, u i , u i  ∈ F3 (c, c) iff u i ∈ dom(F3 (c, c)).


Thus,

F(n) (a1 , . . . , an ) = a1 × · · · × ai−1 × dom(F3 (c, c)) × ai+1 × · · · × an

where, without loss of generality, we have assumed that i is “ in general position”


(1 < i < n).† The result follows from VI.9.9. 

We need one more lemma before we can successfully tackle separation in L N .

VI.9.12 Lemma. For each n ≥ 1 and N-constructible sets a1 , . . . , an and for
each formula A(u⃗ₙ) of set theory, the set

    F_A(a₁, . . . , aₙ) = {⟨u₁, . . . , uₙ⟩ ∈ a₁ × · · · × aₙ : A^{L_N}(u⃗ₙ)}

is in L N .

Some of the arguments in A(u⃗ₙ) might be dummy ones, as in λu₁u₂u₃.u₃ ∈ u₁.
The round brackets, according to our earlier conventions, mean that – dummy
or not – "u⃗ₙ" is the entire list of variables relevant to A.

Proof. We do induction on formulas A. The reader will want to keep in mind


Definition VI.8.1.
Case A(u⃗ₙ) is λu⃗ₙ.U(uᵢ). Then, since U^{L_N}(x) is U(x), we obtain

FA (a1 , . . . , an ) = {u 1 , . . . , u n  ∈ a1 × · · · × an : U (u i )}
= a1 × · · · × ai−1 × F2 (ai , ai ) × ai+1 × · · · × an

in L N , by VI.9.5 and VI.9.8.

† Of course, “×” associates left to right, and we omitted brackets to avoid cluttering the notation.

Case A(u⃗ₙ) is λu⃗ₙ.uᵢ = uⱼ (possibly i = j). Then, since (x = y)^{L_N} is x = y,
we obtain

FA (a1 , . . . , an )
= {u 1 , . . . , u n  ∈ a1 × · · · × an : u i = u j }
= {u 1 , . . . , u n  ∈ a1 × · · · × an : u i , u j  ∈ F3 (ai × a j , ai )}

in L N , by VI.9.5, VI.9.8, and VI.9.11.


Case A(u⃗ₙ) is λu⃗ₙ.uᵢ ∈ uⱼ (possibly i = j). Then, since (x ∈ y)^{L_N} is
x ∈ y, we obtain

FA (a1 , . . . , an )
= {u 1 , . . . , u n  ∈ a1 × · · · × an : u i ∈ u j }
= {u 1 , . . . , u n  ∈ a1 × · · · × an : u i , u j  ∈ F4 (ai × a j , ai )}

in L N , by VI.9.5, VI.9.8, and VI.9.11.


Case A(u⃗ₙ) is ¬B(u⃗ₙ). By I.H.,

FB (a1 , . . . , an ) = {u 1 , . . . , u n  ∈ a1 × · · · × an : B L N (
u n )}

is in L N . Since (¬B )L N is ¬(B L N ),

FA (a1 , . . . , an ) = a1 × · · · × an − FB (a1 , . . . , an )

and the result follows from VI.9.5 and VI.9.8.


Case A(u⃗ₙ) is B(u⃗ₙ) ∨ C(u⃗ₘ), say, with m ≤ n. By I.H.,

FB (a1 , . . . , an ) = {u 1 , . . . , u n  ∈ a1 × · · · × an : B L N (
u n )}

and
    F_C(a₁, . . . , aₘ) = {⟨u₁, . . . , uₘ⟩ ∈ a₁ × · · · × aₘ : C^{L_N}(u⃗ₘ)}

are in L_N. Since (B ∨ C)^{L_N} is (B^{L_N}) ∨ (C^{L_N}),

FA (a1 , . . . , an ) = FB (a1 , . . . , an ) ∪ FC (a1 , . . . , am ) × am+1 × · · · × an

and the result follows from VI.9.5 and VI.9.8 (“× am+1 × · · · × an ” above is
absent if n = m).
Finally,
Case A(u⃗ₙ) is (∃y)B(u⃗ₙ, y). By I.H.,

    F_B(a₁, . . . , aₙ, b)
        = {⟨u₁, . . . , uₙ, y⟩ ∈ a₁ × · · · × aₙ × b : B^{L_N}(u⃗ₙ, y)}   (1)

is in L N for any N -constructible a1 , . . . , an , b. As we want to show that

    F_A(a₁, . . . , aₙ)
        = {⟨u₁, . . . , uₙ⟩ ∈ a₁ × · · · × aₙ : ((∃y)B(u⃗ₙ, y))^{L_N}}
        = {⟨u₁, . . . , uₙ⟩ ∈ a₁ × · · · × aₙ : (∃y ∈ L_N)B^{L_N}(u⃗ₙ, y)}

is in L N , it suffices to prove there is an N -constructible set b for which (3)


below holds. Then

FA (a1 , . . . , an ) = dom(FB (a1 , . . . , an , b)) (2)

and we are done by VI.9.9.


Now consider

    (∀⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ)(∃y ∈ b)[(y ∈ L_N ∧ B^{L_N}(u⃗ₙ, y))
        ∨ (y = ∅ ∧ ¬(∃z ∈ L_N)B^{L_N}(u⃗ₙ, z))]                          (3)

We prove (3) as a consequence of

    (∀⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ)(∃y)[(y ∈ L_N ∧ B^{L_N}(u⃗ₙ, y))
        ∨ (y = ∅ ∧ ¬(∃z ∈ L_N)B^{L_N}(u⃗ₙ, z))]                          (4)

Let us prove (4). Let ⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ.
Case 1. (∃y)(y ∈ L_N ∧ B^{L_N}(u⃗ₙ, y)). Then (4) follows by tautological
implication and ∃-monotonicity.
 
Case 2. ¬(∃z)(z ∈ L_N ∧ B^{L_N}(u⃗ₙ, z)). Note that ∅ is constructible: If N ≠
∅, then ∅ = p − p,† where p ∈ N (p is, of course, in L_N). If N = ∅, then
∅ = F0. Thus ∅ ∈ L_N ∧ ∅ = ∅ ∧ ¬(∃z ∈ L_N)B^{L_N}(u⃗ₙ, z) is provable; hence so
is (∃y)(y ∈ L_N ∧ y = ∅ ∧ ¬(∃z ∈ L_N)B^{L_N}(u⃗ₙ, z)) by the substitution axiom.
Now (4) follows once more by tautological implication and ∃-monotonicity.
By collection,‡ there is a set A (new constant) such that
   $
u n  ∈ a1 × · · · × an (∃y ∈ A) y ∈ L N ∧ B L N (
∀ u n , y)
%
∨ y = ∅ ∧ ¬(∃z ∈ L N )B L N (
u n , z)

† Recall that the formal difference makes sense on atoms. Cf. III.4.16.
‡ “y ∈ L N ” is, of course, the set theory formula “(∃α)y = Fα ” – see Definition VI.9.2.

Take then for b (new constant) any N -constructible set satisfying A ∩ L N ⊆ b


(by VI.9.7). We can now verify (2):

    ⟨u⃗ₙ⟩ ∈ F_A(a₁, . . . , aₙ)
    ↔ ⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ ∧ (∃y ∈ L_N)B^{L_N}(u⃗ₙ, y)
    ↔ ⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ ∧ (∃y ∈ b)B^{L_N}(u⃗ₙ, y)
          (→: by (4)→(3); ←: by (3) and VI.9.3, since b ∈ L_N)
    ↔ (∃y)(y ∈ b ∧ ⟨u⃗ₙ⟩ ∈ a₁ × · · · × aₙ ∧ B^{L_N}(u⃗ₙ, y))
    ↔ ⟨u⃗ₙ⟩ ∈ dom(F_B(a₁, . . . , aₙ, b))                                 □
 

VI.9.13 Theorem. J = (L Set , ZF, L N ) is a formal model of ZFC.

N.B. Actually, L Set and ZF contain N and f and their axiom (VI.9.2).

Proof. The proof is in ZF. First off, L N = ∅. Indeed, we have shown in the
course of the previous proof that ∅ ∈ L N . We verify now the ZFC axioms.
(1) Extensionality holds in L N by VI.9.4 and VI.8.10.
(2) For the axiom
U (b) → ¬(∃x)x ∈ b
we want
b ∈ L N → U (b) → ¬(∃x ∈ L N )x ∈ b
which follows from the ZF version preceding it.
(3) The axiom of separation says that

    A(u⃗ᵢ₋₁, a, uᵢ₊₁, . . . , uₙ) = {uᵢ : uᵢ ∈ a ∧ P(u⃗ₙ)}

is a set (parametrized by the free variables a and uⱼ, 1 ≤ j < i ∨ i < j ≤ n)
for any formula P. The relativized version asserts that

    A^{L_N}(u⃗ᵢ₋₁, a, uᵢ₊₁, . . . , uₙ) = {uᵢ ∈ L_N : uᵢ ∈ a ∧ P^{L_N}(u⃗ₙ)}      (i)

is constructible from N whenever a and uⱼ, 1 ≤ j < i ∨ i < j ≤ n, are
(cf. VI.8.13).
So, we let a and uⱼ, 1 ≤ j < i ∨ i < j ≤ n, be in L_N and prove that
A^{L_N}(u⃗ᵢ₋₁, a, uᵢ₊₁, . . . , uₙ) ∈ L_N. By transitivity of L_N, (i) simplifies to

    A^{L_N}(u⃗ᵢ₋₁, a, uᵢ₊₁, . . . , uₙ) = {uᵢ : uᵢ ∈ a ∧ P^{L_N}(u⃗ₙ)}           (i′)

Now, by VI.9.5, VI.9.12, and VI.9.8,

    F_P({u₁}, . . . , {uᵢ₋₁}, a, {uᵢ₊₁}, . . . , {uₙ})
        = {⟨x⃗ₙ⟩ ∈ {u₁} × · · · × {uᵢ₋₁} × a × {uᵢ₊₁} × · · · × {uₙ} : P^{L_N}(x⃗ₙ)}

is constructible. Hence

    A^{L_N}(u⃗ᵢ₋₁, a, uᵢ₊₁, . . . , uₙ) =
        { ran(dom · · · dom(F_P))     (n − i dom's)    if i > 1
        { dom · · · dom(F_P)          (n − 1 dom's)    otherwise

which is in L_N, considering that ran(x) = dom(F10(x)).


(4) Existence of the set of urelements: {x : U (x)} is a set. We want to prove
(cf. VI.8.13)

{x ∈ L N : U (x)} ∈ L N

In view of the already noted sp(Fα) ⊆ N and N ⊆ L_N (N = {Fα : α < ‖N‖}),
the above translates to N ∈ L N . Now, since N ⊆ L N (and N is a set), there
is, by VI.9.7, an N -constructible set A such that N ⊆ A. But then N =
{x ∈ A : U (x)}; hence it is constructible by L N -separation ((3) above).
(5) The pairing axiom: For any atoms or sets a and b, there is a set c such
that a ∈ c and b ∈ c. Thus, we want (by VI.8.13) to show that {a, b} is defined
in L N . Since {a, b}L N = {a, b}, this follows from VI.9.5.

(6) The union axiom states, essentially, that if A is a set, then ⋃A is a set.
For the L_N version we need ⋃ to be defined in L_N (again by VI.8.13). Since
⋃ (by VI.8.16 and VI.9.4) is absolute for L_N, we need to show that L_N is
⋃-closed. For A ∈ L_N, ⋃A = {x : (∃y ∈ A)x ∈ y} ⊆ L_N by VI.9.3. Hence
⋃A ⊆ b ∈ L_N for some b (by VI.9.7), and ⋃A ∈ L_N by L_N-separation.
(7) Foundation holds in L N by VI.8.11.
(8) Collection says that for any set A and formula P [x, y],

(∀x ∈ A)(∃y)P [x, y] → (∃z)(∀x ∈ A)(∃y ∈ z) P [x, y]

Letting A ∈ L N , we want to prove the relativized version

(∀x ∈ A)(∃y ∈ L N )P L N [x, y]


LN (iii)
→ (∃z ∈ L N )(∀x ∈ A)(∃y ∈ z) P [x, y]

So assume the hypothesis of (iii), i.e.,

    (∀x ∈ A)(∃y)[(∃α)y = Fα ∧ P^{L_N}[x, y]]

By collection (in ZF), there is a set w (new constant) such that

    (∀x ∈ A)(∃y ∈ w)[(∃α)y = Fα ∧ P^{L_N}[x, y]]                        (iv)

By VI.9.7 there is a set s ∈ L_N (new constant) such that w ∩ L_N ⊆ s. Let

    z = {y ∈ s : (∃x ∈ A)P^{L_N}[x, y]}                                 (v)

By L N -separation, z ∈ L N . Moreover, this z works to establish the conclusion


of (iii). Indeed, let x ∈ A. By (iv) we derive

    (∃y)(∃α)[y ∈ w ∧ y = Fα ∧ P^{L_N}[x, y]]
since without loss of generality α is not free in P . Introduce now new constants
Y and β and the assumption
LN
Y ∈ w ∧ Y = Fβ ∧ P [x, Y ]
The first two conjuncts imply that Y ∈ s hence, by (v), Y ∈ z. Thus,
Y ∈ z ∧ P L N [x, Y ]
 
hence (∃y) y ∈ z ∧ P L N [x, y] from which generalization and the underlined
assumption (deduction theorem used) yield
LN
(∀x ∈ A)(∃y ∈ z)P [x, y]
The right hand side of (iii) is now obtained by the substitution axiom and modus
ponens.
(9) The power set axiom says that for any set A, {x : x ⊆ A} is a set too. For
the L N version we want P(A) to be defined in L N . Now P L N (A) = P(A) ∩ L N
(Exercise VI.60), a set by the power set axiom and separation in ZF. Since
P(A) ∩ L N ⊆ L N , P L N (A) is constructible by VI.9.7 and L N -separation.
(10) The relativization of the axiom of infinity, essentially, says ωL N ∈ L N .
Thus, if we can prove that ω ∈ L N , then we will be done, since ωL N = ω will
follow (by remarks in VI.8.18). We will prove a bit more for convenience,
namely,
On ⊆ L N
So take as induction hypothesis that
(∀β < α)β ∈ L N
Proceeding now exactly as in the proof of VI.9.7, there is a limit ordinal γ such
that α ⊆ Fγ . Let A = {σ : σ ∈ Fγ } and τ ∈ σ ∈ A. First, σ = Fρ for ρ < γ ,
by VI.9.3. Hence, τ ∈ σ ⊆ Fγ , since Lim(γ ) (see VI.9.2). Thus, A is transitive;

hence A ∈ On. Furthermore, A ∈ L N by L N -separation.† Clearly, α ⊆ A. If


α = A, then α ∈ L N . If α ∈ A, then again α ∈ L N by transitivity of L N .
(11) The axiom of choice, relativized in L N , says that if A is a (constructible)
set of nonempty constructible sets, then there is a choice function c with
dom(c) = A such that (∀x ∈ A)c(x) ∈ x. To prove this just take

    c(x) = F_{min{α : Fα ∈ x}}                                           □

VI.9.14 Corollary. If ZF is consistent, then so is ZFC.

Proof. By our previous construction and the results of Section I.7. Note that the
auxiliary constant metatheorem is being invoked, since neither the hypothesis
nor the conclusion refers to the new constants N and f introduced as per the
footnote on p. 396. c.f. p. 75. 

Gödel also showed that the generalized continuum hypothesis is true in L N ,


and hence consistent with ZF and AC (see VII.7.25).
We conclude the section by briefly exploring some easy consequences of
absoluteness considerations, mostly stated as exercises.

VI.9.15 Exercise. Prove that all Gödel operations, including the three derived
ones, are absolute for transitive models of ZF. Are they absolute for any other
classes? 

VI.9.16 Exercise. Prove that the function λα.Fα is absolute for L N when the
constants N , f are interpreted as themselves.
Hint. This follows from VI.9.15 and techniques in Section VI.8 (cf. in par-
ticular VI.8.20 and VI.8.26). Note that f : dom( f ) → N is in L N . This is so
by dom( f ) ∈ On ⊆ L N , f ⊆ dom( f ) × N , and closure of L N under × (now
apply L N -separation). 

The axiom of constructibility says that all objects are in L N , or all objects
are constructible.‡ Formally then it is

(∀x)(∃α)x = Fα (V = L)

It is denoted by "V = L" or "V = L" depending on typeface preference.

† A = {σ : σ ∈ Fγ } = {x ∈ Fγ : x is an ordinal}. See VI.8.16.


‡ In atomless approaches to ZF, an object has to be a set.

It must be noted however that the formula displayed generically as (V = L)


depends on the choice of N used in the construction of the function α "→ Fα ,
and hence of L N . What N we have in mind in a particular argument should
be clear from the context, if the particular choice affects results claimed. Set
theorists do not believe that the axiom of constructibility is (really) true, but
they find it interesting as a “temporary” axiom. For one thing, it is harmless:

VI.9.17 Theorem. If ZF is consistent, then so is ZF + (V = L). Indeed, V = L


is true in J = (L Set , ZF, L N ), where ZF is extended as in VI.9.2.

Proof. (V = L)L N is (∀x ∈ L N )(∃α ∈ L N )(x = Fα )L N ; hence, by VI.9.16 and


On ⊆ L N , one needs to prove

(∀x ∈ L N )(∃α)x = Fα

This is precisely VI.9.2. 

For another thing, it simplifies the relativization of arguments to L N : Suppose


that we want to prove (having set N^{L_N} = N and f^{L_N} = f)

    ⊢_ZF A^{L_N}                                                         (1)

for some sentence A of the extended L_Set. Suppose that we prove

    ⊢_{ZF+(V=L)} A                                                       (2)

instead, which results in a proof of (by the deduction theorem)

    ⊢_ZF V = L → A                                                       (3)

Since J is a formal model of ZF (the N , f -axiom is true in J), (3) implies


(cf. I.7.9)

|=J (V = L) → A

But |=J (V = L) by VI.9.17; thus |=J A; that is, we have derived (1).
Conversely, suppose we have proved (1). We show that (2) follows. To this
end we show by induction on formulas that

    V = L ⊢ A ↔ A^{L_N}                                                  (4)

We just check the interesting case where A ≡ (∃x)B. Now A^{L_N} ≡
(∃x)((∃α)x = Fα ∧ B^{L_N}), while

    V = L ⊢ (∃x)((∃α)x = Fα ∧ B^{L_N}) ↔ (∃x)((∃α)x = Fα ∧ B)            (5)

by the I.H. But V = L ⊢ ((∃α)x = Fα ∧ B) ↔ B, since the hypothesis –
(∀x)(∃α)x = Fα – implies (∃α)x = Fα. This and (5) yield (4) via the Leibniz
rule.
Thus, if we have (1), then we also have ⊢_{ZF+(V=L)} A^{L_N}, and hence (2),
by (4).
The moral, in plain English, is:

VI.9.18 Remark. To prove in ZF a sentence A relativized to L N , add the


axiom of constructibility and prove instead the unrelativized sentence A. 

VI.9.19 Exercise. Prove that L N is the ⊆-smallest proper class (formal) model
of ZF among those that contain N . In particular, L (cf. VI.9.2) is the ⊆-smallest
proper class (formal) model of ZF.
Hint. Indeed, let M be a proper class model of ZF where N ∈ M. First show
that On ⊆ M: Let α ∈ On. Look for a β ∈ M such that α ≤ β. For example,
as M is not a set, pick an x ∈ M − N ∪ VN (α). By VI.8.25, ρ(x) ∈ M. This
is a good enough β. Conclude by computing LM N . Towards this you will need
that λα.Fα is absolute for M, in particular, that f of VI.9.2 is in M. The latter
is argued as in VI.9.16, using the absoluteness of × (cf. VI.8.16). 

VI.9.20 Remark. Our discussion has focused so far on “large” models M, i.e.,
proper class models. These have the advantage of containing all the ordinals.
One is also interested in “small” transitive (U, ∈)-models of ZF, M, where M is a
set. We can easily adjust the constant N of p. 396 so that the related assumption is

¬U (N ) ∧ (∀z ∈ N )U (z) ∧ f is a 1-1 function ∧ dom( f ) ⊆ ω ∧ ran( f ) = N

Since M satisfies infinity, the constant ω is absolute for M. The additional


assumption that N ∈ M will then yield that λα.Fα is absolute for M
(cf. VI.8.19). 

VI.10. Arithmetic on the Ordinals


Infinite sequences extend the notion of an n-tuple, while transfinite sequences
extend the notion of an infinite sequence. Just as in the case of finite sequences
(V.1.28–V.1.29), where we can juxtapose or concatenate them to obtain a se-
quence whose length is the sum of the lengths of the originals (V.1.30), we are
able to do that with arbitrarily “long” WO sets, in particular with canonical WO
sets, i.e., ordinals.

For example, on concatenating α and β in that order (α ∗ β) we expect,


informally speaking, to end up with the sequence
{0, 1, 2, . . . , α, α + 1, . . .} (1)
of length, intuitively, α + β , as is the case when α, β are natural numbers. We
would also like to be able to iterate concatenation, for example concatenating
α with itself "β times", to obtain

    α ∗ α ∗ · · · ∗ α    (a sequence of β copies of α)                   (2)

of length, intuitively, α · β.
Let us first make addition of ordinals precise by extending the recursive
definition of addition over ω (V.1.22).†

VI.10.1 Definition (Addition of Ordinals). The unique function A(α, β) given


by the recursion below will be denoted by “α + β ”:
A(α, 0) = α
A(α, β + 1) = A(α, β) + 1
for Lim(β), A(α, β) = sup{A(α, γ ) : γ < β} 

In the definition we use “+” with potentially two different meanings: The new
meaning is given in the definition. The old meaning is in the use of “+1” to
mean ordinal successor. It will turn out that the two meanings are consistent.
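As a quick check of the definition (our own illustration): A(ω, 1) = A(ω, 0) + 1 = ω ∪ {ω} = ω + 1, and A(ω, 2) = (ω + 1) + 1 = {0, 1, 2, . . . , ω, ω + 1}, which is exactly the picture (1) of the preamble for α = ω, β = 2; in the other order, A(2, ω) = sup{2 + n : n < ω} = ω, a first hint that ordinal addition is not commutative (cf. VI.10.12).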

VI.10.2 Proposition. λβ.α + β is normal for any α.

Proof. By VI.5.38 and the weak continuity of λβ.α + β, we only need to show
that α + β < α + (β + 1). But, by VI.10.1, this translates to

α + β < (α + β) + 1
= (α + β) ∪ {α + β} 

VI.10.3 Corollary. If Lim(β), then Lim(α + β).

Proof. By VI.5.40. 

We next prove the analogue of V.1.25, which shows that Definition VI.10.1
indeed captures the intuitive meaning of (1) in the preamble to this section.

† All the results in this section are due to Cantor.



VI.10.4 Theorem. α + β = α ∪ {α + γ : γ < β}.

Proof. We do induction on β. The basis and the case β + 1 are handled exactly
as in V.1.25.
So, let Lim(β), and assume (I.H.) that whenever γ < β,
α + γ = α ∪ {α + λ : λ < γ } (1)
Now β > 0; hence α < α + β, using VI.10.2 (since α + 0 = α by VI.10.1); thus
α ⊂ α + β. Moreover, by VI.10.2 again, α + γ < α + β, that is, α + γ ∈ α + β.
Thus
α + β ⊇ α ∪ {α + ρ : ρ < β} (2)

Let now δ ∈ α + β = ⋃{α + τ : τ < β}. Thus, δ ∈ α + τ for some τ < β,
and, by (1), δ ∈ α or δ = α + λ for some λ < τ. Thus δ is in the right hand
side of (2), and we get the converse inclusion of (2).    □

VI.10.5 Remark (On Notation). (1) We re-examine the notation α + 1, which


we have adopted to mean α ∪ {α}. Using the theorem above and thinking of
“+” here as addition rather than successor, we get
α + 1 = α ∪ {α + γ : γ < 1}
= α ∪ {α + 0}
= α ∪ {α} by VI.10.1
Thus the new notation α + β and the old α + 1 mean the same thing when
β = 1.
(2) Often, the notation  is used instead of + for ordinal addition, + being
reserved for cardinal addition. For the addition of cardinals we will use +c . We
use  for something else (see below). 

We next relate the ordinal of a WO set obtained by concatenation of two


WO sets to those of the originals. As intuition dictates, it will be the sum of the
two original ordinals. Thus the length of the concatenation of a WO set is the
sum of the lengths of its two components, as it should be.

VI.10.6 Definition.
(i) The disjoint union of sets X and Y is the set ({0} × X) ∪ ({1} × Y), denoted
X ⊎ Y.
(ii) If (Xβ)β∈α is an α-sequence of sets, then their ordered disjoint sum is
⋃_{β∈α}({β} × Xβ) and is denoted by ⨄_{β∈α} Xβ.    □
Clearly, X₀ ⊎ X₁ = ⨄_{β∈2} Xβ. Note that X ⊎ Y ≠ Y ⊎ X in general (e.g., take
X = {0} and Y = {1}).

VI.10.7 Definition.

(i) Let (X, <₁) and (Y, <₂) be two WO sets. Then <∗ on X ⊎ Y is defined
lexicographically, i.e.,

    ⟨i, x⟩ <∗ ⟨j, y⟩  iff  i < j ∨
                           (i = j = 0 ∧ x <₁ y) ∨
                           (i = j = 1 ∧ x <₂ y)

(ii) Let (Xβ, <β) be a WO set for all β ∈ α. Then <′ on ⨄_{β∈α} Xβ is defined
lexicographically by

    ⟨β, x⟩ <′ ⟨γ, y⟩  iff  β < γ ∨
                           (β = γ ∧ x <β y)    □

VI.10.8 Proposition. <∗ and < of VI.10.7 are indeed well-orderings.

Proof. That they are linear orders is straightforward. Moreover, in either case,
any nonempty set of pairs β, x has a minimum: Locate the ones with
<-minimum β among such pairs; in this set locate the pair with <β -minimum
x (in X β ). 
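The two definitions are easy to "run" on finite examples; the following Python sketch (ours, with made-up helper names) builds the ordered disjoint sum of VI.10.6(ii) and sorts it by the lexicographic order of VI.10.7(ii), so the existence of least elements can be checked directly in the finite case.

```python
def ordered_disjoint_sum(families):
    # families: list of (X_beta, key_beta); the index beta is the list position
    return [(beta, x) for beta, (X, _) in enumerate(families) for x in X]

def lex_key(families):
    # the order <' of VI.10.7(ii): compare beta first, then compare inside X_beta
    def key(pair):
        beta, x = pair
        return (beta, families[beta][1](x))
    return key

# X_0 = {"a", "b"} ordered alphabetically, X_1 = {10, 3} ordered by size
fams = [({"a", "b"}, lambda s: s), ({10, 3}, lambda n: n)]
print(sorted(ordered_disjoint_sum(fams), key=lex_key(fams)))
# [(0, 'a'), (0, 'b'), (1, 3), (1, 10)]
```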

VI.10.9 Definition (Concatenation of WO Sets). Given WO sets (Xβ, <β) for
all β ∈ α, their concatenation cat_{β∈α}(Xβ, <β) is the WO set (⨄_{β∈α} Xβ, <′).
In particular, when α = 2 we write also (X₀, <₀) ∗ (X₁, <₁) for the concate-
nation, and we denote <′ by <∗ in this case.    □

The following theorem shows that the definition of addition of ordinals is


appropriate.

VI.10.10 Theorem. If ‖(X₀, <₀)‖ = α and ‖(X₁, <₁)‖ = β, then ‖(X₀, <₀) ∗
(X₁, <₁)‖ = α + β.

Proof. Let

    α ≅_f X₀                                                             (1)

and

    β ≅_g X₁                                                             (2)

where we have omitted the relevant orders to the left (∈) and right (<ᵢ, i = 0, 1)
of ≅, for simplicity of notation.
Let f̃ = {⟨γ, ⟨0, f(γ)⟩⟩ : γ ∈ α} and g̃ = {⟨α + γ, ⟨1, g(γ)⟩⟩ : γ ∈ β}.
Since α ≤ α + γ (with the "=" when γ = 0), we have dom(f̃) ∩ dom(g̃) = ∅;
hence H = f̃ ∪ g̃ is a total function on dom(f̃) ∪ dom(g̃), that is, on α + β,
by VI.10.4. H is onto X₀ ⊎ X₁ since f̃ and g̃ are onto {0} × X₀ and {1} × X₁
respectively, by (1) and (2).
By (1) and (2), and Definition VI.10.7, H is order-preserving and hence an
isomorphism between α + β and (X₀ ⊎ X₁, <∗).    □

VI.10.11 Proposition (Some Properties of Ordinal Addition).

(i) α + (β + γ ) = (α + β) + γ
(ii) α <β →α+γ ≤β +γ
(iii) α <β ↔γ +α <γ +β
(iv) 0+α =α
(v) α < β ↔ (∃γ > 0)α + γ = β.

Proof. (i): We do induction on γ . Assume then the claim for all λ < γ .
By VI.10.4,

(α + β) + γ = (α + β) ∪ {(α + β) + λ : λ < γ }
= (α + β) ∪ {α + (β + λ) : λ < γ } by I.H.
= α ∪ {α +δ : δ < β} ∪ {α + (β + λ) : λ < γ } by VI.10.4 (1)

Next,

α + (β + γ ) = α ∪ {α + δ : δ < β + γ } (2)

Since β + γ = β ∪ {β + λ : λ < γ }, δ < β + γ is equivalent to (recall that <


is ∈)

δ < β ∨ (∃λ < γ )δ = β + λ

Thus, (1) and (2) yield the claim.


(ii): We do induction on γ . The basis follows from the assumption and
δ + 0 = δ for all δ.

The case α + (γ + 1) vs. β + (γ + 1) follows from α + γ ≤ β + γ (I.H.)


and VI.5.5.
Let next Lim(γ ), and assume that α + λ ≤ β + λ for all λ < γ (I.H.).
Then

α + γ = sup{α + λ : λ < γ } ≤ sup{β + λ : λ < γ } = β + γ

(iii): From VI.10.2 and trichotomy.


(iv): Assume the claim for all β < α (I.H.). By VI.10.4,

0 + α = 0 ∪ {0 + β : β < α} = {β : β < α} = α

(v): The ← follows from (iii) and VI.10.1. For →, let α < β, hence α ⊂ β.
Thus the set difference X = β − α is nonempty and well-ordered by <. Let

    ‖(X, <)‖ = γ                                                         (3)

Hence γ > 0 (X ≠ ∅). The function f given by

    f(δ) = { ⟨0, δ⟩   if δ ∈ α
           { ⟨1, δ⟩   if δ ∈ X

is trivially an isomorphism

    β ≅_f ({0} × α) ∪ ({1} × X)

Hence ‖(α ⊎ X, <∗)‖ = β. But also ‖(α ⊎ X, <∗)‖ = α + γ by (3), whence
the claim.    □

VI.10.12 Example. Note that + is not commutative on On. For example,

1 + ω = {0} ∪ {1 + n : n ∈ ω}
= {0} ∪ {n + 1 : n ∈ ω} by V.1.24
= {n : n ∈ ω}

Yet, ω < ω + 1.
Here is an alternative argument: ω + 1 is a successor, while Lim(1 + ω)
by VI.5.40, so they cannot be equal. 

We next define multiplication of ordinals by iterating addition, as it is done


over ω.

VI.10.13 Definition (Multiplication of Ordinals). The unique function


M(α, β) defined by the recursion below will be denoted by “α · β ”:

M(α, 0) = 0
M(α, β + 1) = M(α, β) + α
for Lim(β), M(α, β) = sup{M(α, γ ) : γ < β} 

VI.10.14 Proposition. λβ.α · β is normal for any α > 0.

Proof. By VI.5.38 and the weak continuity of λβ.α · β, we only need to show
that α · β < α · (β + 1) if α > 0. But, by VI.10.13, this translates to

α · β < (α · β) + α

which is derivable by VI.10.2. 

VI.10.15 Corollary. If α > 0 and Lim(β), then Lim(α · β).

Proof. By VI.5.40. 

VI.10.16 Theorem. α · β = sup{α · γ + α : γ < β}.

Proof. We do induction on β. Let first β = 0. Then α · 0 = 0 by VI.10.13. Also


sup{α · γ + α : γ ∈ β} = sup ∅ = ∅. Done with the basis.
Assume next as I.H.

    α · β = sup{α · γ + α : γ < β}
          = ⋃_{γ∈β}(α · γ + α)                                           (1)

Then, since α · β ⊆ α · β + α by VI.10.2 ("=" corresponds to α = 0),

    α · (β + 1) = α · β + α
                = (α · β + α) ∪ (α · β)
                = (α · β + α) ∪ ⋃_{γ∈β}(α · γ + α)      by (1)
                = ⋃_{γ∈β+1}(α · γ + α)

This settles the successor case.



Finally, let Lim(β). Now, by VI.10.13,

    α · β = ⋃_{γ∈β} α · γ
          ⊆ ⋃_{γ∈β} (α · γ + α)       since α · γ ⊆ α · γ + α
          = ⋃_{γ∈β} α · (γ + 1)       by VI.10.13
          ⊆ ⋃_{γ∈β} α · γ             since γ ∈ β → γ + 1 ∈ β
The I.H. was not needed in this case. 

We next interpret multiplication of ordinals in terms of concatenations of


WO sets.

VI.10.17 Theorem. Let (X γ , <γ ) be a WO set with order type α, for each
γ ∈ β. Then &catγ ∈ β (X γ , <γ )& = α · β.

Proof. The case α = 0 being trivial (each X γ = ∅), we assume α > 0. By


Exercise VI.35, it suffices to assume that X γ = α for all γ ∈ β.

For ease of notation, set Aβ = ⋃_{γ<β}({γ} × α). Take the induction hypothesis
(on β) that

    (∀γ < β)‖(Aγ, <′)‖ = α · γ                                           (1)

Now, since (Aβ, <′) is a WO set, there is a unique isomorphism φ_{Aβ} that maps
this WO set onto an ordinal. This φ is given by (see VI.4.32)

    φ_{Aβ}(⟨γ, δ⟩) = {φ_{Aβ}(⟨σ, τ⟩) : ⟨σ, τ⟩ <′ ⟨γ, δ⟩}
                   = φ_{Aβ}[{w : w <′ ⟨γ, δ⟩}]                           (2)

for any ⟨γ, δ⟩ ∈ Aβ. Using (1), we now compute the ordinal in (2).
For γ < β we have

    {w : w <′ ⟨γ, δ⟩} = ⋃_{η<γ}({η} × α) ∪ ({γ} × δ)
                      ≅ ({0} × ⋃_{η<γ}({η} × α)) ∪ ({1} × δ)    by Exercise VI.36
                      = ({0} × Aγ) ∪ ({1} × δ)

where we have omitted the orders on both sides of the ≅. Thus, by the I.H. (1),

    φ_{Aβ}(⟨γ, δ⟩) = φ_{Aβ}[{w : w <′ ⟨γ, δ⟩}]
                   = α · γ + δ                                           (3)
We are ready to compute &(Aβ , < )&.

For γ < β and δ < α, we have α · γ + δ < α · γ + α = α · (γ + 1) by VI.10.2


and VI.10.13. Thus, α · γ + δ ≤ α · β by VI.10.14. We conclude that

φ Aβ [Aβ ] ⊆ α · β (4)

Towards promoting (4) to equality, let η ∈ α · β = ⋃{α · γ + α : γ < β}
by VI.10.16. Thus, η ∈ α · γ + α for some γ < β; therefore η ∈ α · γ
or η = α · γ + δ for some δ < α, by VI.10.4. In the latter case, immediately
η ∈ φ Aβ [Aβ ] by (3). In the former case, α · γ = α · γ + 0 ∈ φ Aβ [Aβ ] and
transitivity of φ Aβ [Aβ ] again yield η ∈ φ Aβ [Aβ ]. Thus (4) is an equality. 


VI.10.18 Remark. It is worth noting that Aβ = ⋃_{γ<β}({γ} × α) = β × α; thus
VI.10.17 shows that

    (β × α, <′) ≅ (α · β, <)

Note the order reversal of the “terms” α and β. 

VI.10.19 Proposition (Some Properties of Ordinal Multiplication).

(i) 0·α =α·0=0


(ii) 1·α =α·1=α
(iii) α·2=α+α
(iv) α <β →α·γ ≤β ·γ
(v) α > 0 → (α · β < α · γ ↔ β < γ )
(vi) α · (β + γ ) = α · β + α · γ
(vii) α · (β · γ ) = (α · β) · γ
(viii) α >1∧β >1→α+β ≤α·β
(i x) α = 0 = β → α · β = 0.


Proof. (i): 0 · α = ⋃_{β<α}(0 · β + 0) = 0, using the I.H. that 0 · β = 0 for β < α.
α · 0 = 0 by VI.10.13.
(ii):

    1 · α = ⋃_{β<α}(1 · β + 1)
          = ⋃_{β<α}(β + 1)          under the obvious I.H.
          = sup⁺{β : β < α}

On the other hand, α · 1 = α · 0 + α = α (VI.10.13 and VI.10.11(iv)).



(iii):

    α · 2 = ⋃_{β<2}(α · β + α)
          = (α · 0 + α) ∪ (α · 1 + α)
          = α ∪ (α + α)
          = α + α,    since α ⊆ α + α by VI.10.2.

(iv): Fix α < β. We do induction on γ : Let γ = 0. Then α · 0 = β · 0. Done.


Next, consider γ = λ + 1. Then

    α · (λ + 1) = α · λ + α
                ≤ β · λ + α      by I.H. and VI.10.11(ii)
                < β · λ + β      by VI.10.2
                = β · (λ + 1)

Finally, let Lim(γ ), and assume (I.H.) that α · λ ≤ β · λ for all λ < γ . Thus,

α · γ = λ<γ α · λ ⊆ λ<γ β · λ = β · γ .
(v): The ← is by VI.10.14. The → is by VI.10.14 and trichotomy.
(vi): The result holds for α = 0 (by (i)), so let α > 0 and do induction on
γ . The case γ = 0 is α · (β + 0) = α · β = α · β + 0 = α · β + α · 0, using (i)
for the last "=". Let γ = λ + 1, and assume the obvious I.H. Then

    α · (β + (λ + 1)) = α · ((β + λ) + 1)
                      = α · (β + λ) + α
                      = (α · β + α · λ) + α      by I.H.
                      = α · β + (α · λ + α)
                      = α · β + α · (λ + 1)

Finally, let Lim(γ ), and assume the claim (I.H.) for all λ < γ . Now,

α · (β + γ ) = α · (β + sup{λ : λ < γ })
= α · (sup{β + λ : λ < γ }) by continuity of + (VI.5.36, VI.10.2)
= sup{α · (β + λ) : λ < γ } by continuity of · (VI.5.36, VI.10.14)

(recall that we are in the case α > 0)

= sup{α · β + α · λ : λ < γ } by I.H.


= α · β + sup{α · λ : λ < γ } by continuity of +
= α · β + α · γ by VI.10.13

(vii): We do induction on γ (assume throughout that α = 0 = β, since


otherwise the result reads 0 = 0). When γ = 0, we get α · (β · 0) = α · 0 = 0.
Also, (α · β) · 0 = 0. Done.

Let γ = λ + 1. Then α · (β · (λ + 1)) = α · (β · λ + β) = α · (β · λ) + α · β =


(α · β) · λ + α · β = (α · β) · (λ + 1), where the second “=” uses distributivity,
and the third one the obvious I.H.
Let now Lim(γ ). Then, arguing as in (vi), using continuity of · ,
α · (β · γ ) = α · (β · sup{λ : λ < γ })
= α · sup{β · λ : λ < γ }
= sup{α · (β · λ) : λ < γ }
= sup{(α · β) · λ : λ < γ } by I.H.
= (α · β) · γ by VI.10.13.
(viii): By Exercise VI.33,

    α + β = ⋃{α + (γ + 1) : γ < β}                                       (4)

under the assumptions. On the other hand,

    α · β = ⋃{α · (γ + 1) : γ < β}                                       (5)

We thus take as I.H. that 1 < α and 1 < γ yield α + γ ≤ α · γ. Therefore, (4)
and (5) yield the result for β > 1, by the I.H. and the strict monotonicity of
λγ.α + (γ + 1) and λγ.α · (γ + 1).
(ix): α · β = ⋃_{γ∈β}(α · γ + α) ⊇ α by VI.10.11(ii) and the fact that the union
is nonempty. We have proved more under the assumptions: 0 < α ≤ α · β. 
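A quick numerical sanity check of (viii), ours: with α = β = ω, indeed α + β = ω · 2 ≤ ω · ω = α · β; and the hypotheses cannot be dropped, since for β = 1 we get 2 + 1 = 3 > 2 = 2 · 1.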

VI.10.20 Remark. Multiplication of ordinals is not commutative. For example,



ω · 2 = ω + ω > ω. On the other hand, 2 · ω = ⋃{2 · n : n ∈ ω}, since Lim(ω).
Now {2 · n : n ∈ ω} ⊆ ω; hence

    ⋃{2 · n : n ∈ ω} ⊆ ω

The ⊆ above graduates to =, since the fact that Lim(2 · ω) precludes ⊂. Thus,
ω · 2 > 2 · ω.
There are two more conclusions to be drawn from this example:
(1) The “right” distributive law does not hold. For example, (1 + 1)·ω = 2·ω =
ω < ω + ω = ω · 2.
(2) VI.10.19(iv) cannot be sharpened. Indeed, 1 < 2, yet 1 · ω = 2 · ω. 

VI.10.21 Proposition (Division with Remainder). Given α and β > 0, there


are unique ordinals π and υ such that
(1) α = β · π + υ,
(2) π ≤ α and υ < β.
π is the quotient and υ is the remainder of the division of α by β.

This extends the well-known theorem of Euclid from ω to all of On.

Proof. Existence. If α = 0, then take π = υ = 0. So let α > 0. Since β · 0 =


0 < α, the set X = {θ : β · θ ≤ α} is not empty (why set?). If θ > α, then

    β · θ ≥ 1 · θ     by VI.10.19
          = θ         by VI.10.19

Hence θ ∉ X. Contrapositively, θ ∈ X → θ ≤ α. Thus sup X ≤ α. We claim
that

π = sup X (1)

works. So far (π ≤ α), so good. We want to propose a partnering υ.


Now, using VI.5.36 and VI.10.19,

β · π = β · sup X
= sup{β · θ : θ ∈ X } (2)
≤α

By VI.10.11(v) and (2), there is an υ such that

α =β ·π +υ (3)

with

υ=0 iff (2) is equality (4)

We next show that υ < β. If not,


Case β = υ. Then (3) yields (by VI.10.13) that α = β · (π + 1); hence
π + 1 ∈ X , contradicting (1).
Case β < υ. Then (VI.10.11(v)) there is a γ > 0 such that υ = β + γ , and
(3) yields

α = β · π + (β + γ )
= β · (π + 1) + γ

again yielding π + 1 ∈ X by VI.10.11(v), thus contradicting (1). We have


settled existence.
Uniqueness. Let

    α = β · π + υ = β · π′ + υ′                                          (5)

where υ < β and υ′ < β.



Let π < π′. Then π′ = π + γ for some γ > 0; hence

    α = β · (π + γ) + υ′ = β · π + (β · γ + υ′)

Now

    υ < β
      = β · 1
      ≤ β · γ
      ≤ β · γ + υ′

Hence β · π + υ < β · π + (β · γ + υ′) = β · π′ + υ′, contradicting (5).
Similarly, π > π′ is untenable; hence π = π′. Thus υ = υ′ by VI.10.11(iii).    □
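For a concrete instance (ours, not in the text): dividing α = ω · 2 + 3 by β = ω gives quotient π = 2 and remainder υ = 3, since ω · 2 + 3 = ω · π + υ with υ = 3 < ω; dividing α = ω by β = ω · 2 gives π = 0 and υ = ω, since ω = (ω · 2) · 0 + ω and ω < ω · 2. By the uniqueness just proved, these are the only such decompositions.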


VI.10.22 Corollary. For α > 0, Lim(α) iff (∃β)ω · β = α iff 2 · α = α.

Proof. Let Lim(α), and divide α by ω to get (by VI.10.21)

α =ω·π +υ

Now, υ < ω; hence it is a natural number. If υ > 0, then υ = n + 1 (n ∈ ω);


hence α = ω · π + (n + 1) = (ω · π + n) + 1, a successor. This contradicts the
assumption, so υ = 0.
Next, say 0 < α = ω · β. Then 2 · α = 2 · (ω · β) = (2 · ω) · β = ω · β = α,
where we have used VI.10.20 in the penultimate “=”.
Finally, let 0 < α = 2 · α, and suppose that α = β + 1 for some β. Then

α = 2 · (β + 1)
= 2·β +2
≥ 1 · β + 2 by VI.10.19
= β +2
> β +1=α

a contradiction. So Lim(α). 
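For example (ours): α = ω · 6 is a limit ordinal, it is of the form ω · β with β = 6, and indeed 2 · (ω · 6) = (2 · ω) · 6 = ω · 6, in agreement with all three equivalent conditions.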

We conclude this section with the operation of exponentiation on ordinals,


once again using the corresponding operation on the natural numbers for mo-
tivation. Therefore we will iterate multiplication.

VI.10.23 Definition (Exponentiation of Ordinals). The unique function


P(α, β) defined by the recursion below will be denoted by α ·β :

P(α, 0) = 1
P(α, β + 1) = P(α, β) · α
for Lim(β), P(α, β) = sup{P(α, γ ) : γ < β} 

The “·” in α ·β is an annoying convention used to distinguish ordinal exponen-


tiation from cardinal exponentiation, the latter being (usually) privileged not to
need the “·”.
So far we have kept the notation for ordinal operations “natural”, by using
the same symbolism as that used on the natural numbers. As justification we
invoke priority: In our development we first developed the natural numbers, and
we denoted n ∪ {n} by n + 1 (as most people justifiably do), which induced (or
forced) our subsequent notation, + and ·, first on the natural numbers and then
on ordinals. We decided that we did not want to ask the reader to write m  n
from some point onwards, when he means to add m and n. Note, however, that
some authors use  and • respectively, although they may start off with the
notation n + 1 for n ∪ {n}.
It is clear that the above recursion also defines exponentiation over ω, pro-
vided the case Lim(α) is dropped, as it is irrelevant over ω.

VI.10.24 Remark. If α > 0, then α ·β > 0. Indeed, α ·0 = 1 > 0. Furthermore,


assuming that α ·β > 0 (I.H.), we get that

α ·β+1 = α ·β · α > 0

by VI.10.19(i x). Finally, if Lim(β) and α ·γ > 0 for all γ < β (I.H.), then

α ·β = sup{α ·γ : γ < β} > 0 

VI.10.25 Proposition. If α > 0, then λβ.α ·β is weakly normal. If α > 1, then


it is normal.

Proof. Since α ·β > 0 by VI.10.24, we have α ·β+1 = α ·β · α ≥ α ·β by VI.10.19,


with “>” if α > 1. The result follows from VI.5.38 and VI.5.39. 

VI.10.26 Corollary. If α > 1 and Lim(β), then Lim(α ·β ).


424 VI. Order

VI.10.27 Proposition (Some Properties of Ordinal Exponentiation).

(i) 0·α = 0 iff α is a successor; otherwise 0·α = 1


(ii) 1·α = 1
(iii) α > 1 → (α ·β < α ·γ ↔ β < γ )
(iv) α > 0 → α ·β+γ = α ·β · α ·γ
(v) α > 0 → (α ·β )·γ = α ·β·γ
(vi) α > 1 ∧ β > 0 → 1 < α ·β
(vii) α > 1 ∧ β > 1 → α · β ≤ α ·β .

Proof. (i): 0·β + 1 = 0·β · 0 = 0 by VI.10.19. By VI.10.23, 0·0 = 1. Now if


Lim(β), then 0·β = sup{0·γ : γ < β} ≥ 1, since 0 < β. Assume next (I.H.)
that whenever γ < β and Lim(γ ), then 0·γ = 1. It follows that 0·β = sup{0·γ :
γ < β} ≤ 1.
For (ii), 1·0 = 1, and 1·α + 1 = 1·α · 1 = 1 · 1 = 1 using the obvious I.H.
Finally, let Lim(α) and 1·β = 1 for all β < α (I.H.). Then 1·α = sup{1·β :
β < α} = sup{1} = 1.
(iii): By VI.10.25 and trichotomy.
(iv): We do induction on γ . First, α ·β+0 = α ·β = α ·β · 1 = α ·β · α ·0 . This
settles the basis. Next,

α ·β+(γ +1) = α ·(β+γ )+1


= α ·β+γ · α by VI.10.23
= (α ·β · α ·γ ) · α by the obvious I.H.
= α ·β · (α ·γ · α)
= α ·β · α ·γ +1 by VI.10.23

Finally, let Lim(γ ), and assume the claim for all δ < γ . By VI.5.35 and
VI.10.25, λβ.α ·β is continuous. Therefore,

α ·β+γ = α ·β+sup{δ : δ<γ }


= α · sup{β+δ : δ<γ }
= sup{α ·β+δ : δ < γ }
= sup{α ·β · α ·δ : δ < γ } by the I.H.
= α ·β · sup{α ·δ : δ < γ } by continuity of · and by α ·β > 0
= α ·β · α ·γ

(v): A similar routine calculation proves this case.


(vi): 1 = α ·0 . Now use (iii).
(vii): Going for a contradiction, let β > 1 be smallest for which (vii) fails,
VI.10. Arithmetic on the Ordinals 425

i.e.,

α ·β < α · β (1)

Is it the case that Lim(β)? If so, α · β = sup{α · δ : δ < β} and Lim(α · β)


(cf. VI.10.15); thus, by (1), there is a δ < β such that

α ·β < α · δ (2)

By (2),

δ>1

since otherwise α ·β < α = α ·1 , contradicting (iii) (recall, α > 1, β > 1). Now,
by (iii),

α ·δ < α ·β
< α · δ, by (2)

This contradicts the minimality of β, so we must have β = γ + 1 for some γ .


Clearly, γ > 1; otherwise γ = 1 (why?) and (1) says that

α · α = α ·2 < α · 2 = α + α

which cannot be, by VI.10.19(viii). By minimality of β,

α · γ ≤ α ·γ

Hence

(α · γ ) · α ≤ α ·γ · α = α ·β (3)

Since β = γ + 1 < γ + α ≤ γ · α by VI.10.19(viii), (3) yields

α · β < α · (γ · α) = (α · γ ) · α ≤ α ·β

contradicting (1). 

VI.10.28 Example. (α · β)·γ = α ·γ · β ·γ in general. For example,

(2 · 2)·ω = (2·2 )·ω = 2·2·ω = 2·ω (1)

while

2·ω · 2·ω = 2·ω+ω (2)

We see that the right hand sides of (1) and (2) are different by VI.10.27(iii);
therefore so are the left hand sides. 
426 VI. Order

VI.10.29 Remark. Since λα.ω·α is normal, it has arbitrarily large fixed points.
Such fixed points were called “ε-numbers” by Cantor.
·ω
In particular, the reader can readily verify that sup{ω, ω·ω , ω·ω , . . .} is an
ε-number, the smallest one above ω. 

Additional material on the arithmetic of ordinals can be found in the Ex-


ercises section and in the references Bachmann (1955), Kamke (1950), Levy
(1979), Monk (1969), Sierpiński (1965), Tarski (1956).

VI.11. Exercises
VI.1. If P has MC and Q ⊆ P, then Q has MC.
VI.2. If P is left-narrow, then P n a is a set for all n > 0 in ω and all a.
VI.3. Prove that if an order < is left-narrow, then it has MC iff every nonempty
set has a <-minimal element.
(Hint. Only the if part is non-trivial. Start with a ∈ A. If a is minimal,
fine. Else show that any minimal element in the set < a∩A is minimal
in A.)
VI.4. Prove that if a relation P is left-narrow, then it has MC iff every nonempty
set has a P-minimal element.
(Hint. Work with P+ .)
VI.5. Prove the claims in the – passage in Remark VI.2.26.
VI.6. Prove that if A ⊆ B and A is a transitive set, then C(B, x) = x for all
x ∈ A, where C is Mostowski’s collapsing function (VI.2.38).
(Hint. Use ∈-induction.)
$ %
VI.7. With reference to VI.4.3, prove that ({1}, <) ∼=
is a proper class, where
“<” is the standard order on ω.
VI.8. Prove that the relation < defined on On in VI.4.9 is transitive.
VI.9. Prove Theorem VI.4.30 directly by explicitly using the recursion (1)
of VI.4.32.
VI.10. Prove VI.3.23 for WO sets by using the comparability of ordinals and
Theorem VI.4.30 (proved via VI.4.32).
VI.11. For α > 0 prove that Lim(α) iff for all β < α there is a γ such that
β < γ < α.
VI.12. For each α = 0 show that sup+ α = α.

VI.13. Prove that Lim(α) iff α = 0 and α = α.
VI.11. Exercises 427

VI.14. Prove that if a set ∅ = A ⊆ On does not have a maximum, then sup(A)
is a limit ordinal.
VI.15. Prove that there are arbitrarily large limit ordinals, that is, for each α
there is a β > α such that Lim(β). This problem addresses questions
raised (and answers promised) following V.1.2.
(Hint. By induction over ω define the sequence f (0) = α and f (n+1) =
f (n) + 1. Argue that (1): if β = sup ran( f ), then Lim(β); and (2):
α < β.)
VI.16. Prove that if f is a weakly continuous On-sequence, of ordinals that
moreover satisfies (∀α) f (α) ≤ f (α + 1), then f is non-decreasing and
hence is weakly normal.
VI.17. Prove that the composition of normal functions is normal.
VI.18. Prove that if f is a normal transfinite On-sequence, then, for any γ , it
has a fixed point β such that γ < β. Check that your proof (along the
lines of that for the Knaster-Tarski theorem) furnishes the smallest fixed
point greater than γ .
VI.19. Prove the Knaster-Tarski fixpoint theorem (VI.5.47) under the weakened
assumption that in the PO set (A, <) every chain has a least upper bound.
VI.20. Refer to the proof of VI.5.47. For the γ chosen, prove that s<γ = sγ .
VI.21. Prove that for all α, sp(VN (α)) ⊆ N .
VI.22. Show that, for all α, β, VN (α) ∈ VN (β) implies α < β.
VI.23. Show that ρ(VN (α)) = α + 1.
VI.24. Define “standard rank” by r k N (x) = min{α : x ⊆ N ∪ VN (α)}. Show
that r k N (α) = α.
VI.25. Relate ρ N and r k N . Show that for all sets x, ρ N (x) = r k N (x) + 1.
VI.26. Complete the proof of VI.7.2.
VI.27. Substantiate the comment made following VI.7.5.
VI.28. Show for the J of Section VI.7 that J (α, β) = {J (σ, τ ) : σ, τ   α, β}.
VI.29. Show for the  of Section VI.7 that α 2 = (0, α) for all α.
VI.30. Prove that the function λα.J [α × α] is increasing (order-preserving).
VI.31. Prove that the function λα.J [α × α] has arbitrarily large fixed points.
(Hint. Prove that it is normal.)
VI.32. Prove that ordinal addition is absolute for transitive models of ZF.
VI.33. Prove that if β > 0, then α + β = sup+ {α + γ : γ < β}.
VI.34. Prove that 1 + α = α iff ω ≤ α.
428 VI. Order

VI.35. Let (X β , <β ) ∼


= (Yβ , <β ) for all β < α. Show that

cat (X β , <α ) ∼
= cat (Yβ , <α )
β<α β<α

VI.36. Let (X, <1 ) and (Y, <2 ) be disjoint WO sets. Define < on X ∪ Y by

a<b iff {a, b} ⊆ X ∧ a <1 b ∨


{a, b} ⊆ Y ∧ a <2 b ∨
a ∈ X ∧b ∈Y

Show that (X  Y, <∗ ) ∼


= (X ∪ Y, <).
VI.37. (Left cancellation in ordinal addition.) Show that if α + β = α + γ ,
then β = γ .
VI.38. (Right cancellation in ordinal addition does not hold.) Show by an
example that β + α = γ + α → β = γ .
VI.39. Prove that ordinal multiplication is absolute for transitive models of ZF.
VI.40. Show that α > 0 implies β ≤ α · β.
VI.41. Show that if α > 0 and β > 1, then α < α · β.
VI.42. (Left cancellation in ordinal multiplication.) Show that if α > 0 and
α · β = α · γ , then β = γ .
VI.43. (Right cancellation in ordinal multiplication does not hold.) Show by
an example that α > 0 ∧ β · α = γ · α → β = γ .
VI.44. For any 0 < n ∈ ω, show that n · ω = ω.
VI.45. Show that α > 0 is a limit iff for every 0 < n < ω, α = n · α.
VI.46. Prove that ordinal exponentiation is absolute for transitive models of
ZF.
VI.47. Show that α ·1 = α and α ·2 = α · α.
VI.48. Prove VI.10.27(v).
VI.49. Prove that for all m, n in ω, m ·n ∈ ω.
VI.50. Prove that for all m, n, k in ω, (m · n)·k = m ·k · n ·k .
VI.51. Show for all α > 1 that α ·β ≥ β. Can ≥ be sharpened to >? Why?
VI.52. Prove that Lim(α) implies that (α + n)·ω = α ·ω for all n ∈ ω.
VI.53. Prove that if the formula A has no quantifiers, then it is absolute for
any class M.

In the following few exercises models of fragments of ZFC are being sought.
We mean (U, ∈)-models.
VI.11. Exercises 429

VI.54. Show that V (ω) is a model for ZFC−infinity (i.e., satisfies all the ZFC
axioms except that for infinity). Indeed, infinity fails here. By the way,
this shows that infinity is not implied by the remaining axioms.
VI.55. Show that V (α) is a model for ZFC−collection, for any limit ordinal
α > ω.
VI.56. Find a limit ordinal α > ω such that the collection axiom is false in V (α).
By the way, this shows that collection is not implied by the remaining
axioms.
(Hint. Experiment with α = ω + ω.)
VI.57. Prove that the structure (On, U, ∈) is not a model of ZFC.
VI.58. Let N be a set of urelements such that f : N → n is a 1-1 correspondence
for some n ∈ ω. Prove that VN (ω) is a model of ZFC−infinity. (It fails
infinity).
VI.59. Let N be a set of urelements such that f : N → ω is a 1-1 correspon-
dence. Prove that VN (ω) is a model of ZFC−{infinity, collection}. (It
fails both infinity and collection.)
VI.60. Show that for any transitive class M, PM (A) = M ∩ P(A).
VI.61. Complete the proof of Lemma VI.8.16.
VI.62. Prove that L N satisfies global choice. That is, show that there is a func-
tion F on L N such that for any set ∅ = x ∈ L N , F(x) ∈ x.
VII

Cardinality

In Chapter VI, among other things, we studied the WO sets and learnt how to
measure their length with the help of ordinal numbers. A consequence of the
axiom of choice was (Theorem VI.5.50) that every set can be well-ordered and
therefore every set can be assigned a length.†
In the present chapter we turn to another aspect of set size, namely its
number of elements, or cardinality. It will turn out that for finite sets length
and cardinality are measured by the same (finite) ordinal; thus, in particu-
lar, finite sets have a unique length. As was already remarked, the situation
with infinite sets is much less clean intuitively, and several WO sets of differ-
ing lengths can have the same number of elements (e.g., ω, ω + 1, ω + 2,
etc).
The following section will formalize the notions of “finite” and “infinite”
sets. Intuitively, a set is finite if the process of removing its elements, one at a
time, will terminate; it is infinite otherwise.‡
Thus for finite sets the process implicitly assigns the numbers 1, 2, 3, . . . to
the first, second, third, . . . removed items. Since the process terminates, there
will be a natural number assigned to the last removed item. Evidently this
number equals the cardinality, or number of elements, of the set.

† This length is not unique in general. For example, the set ω ∪ {ω} can be assigned both the lengths
ω + 1 and ω.
‡ This intuitive idea is for motivation only, and it will not be used anywhere except in the informal
discussion here. One can easily get into trouble if “time” is taken too literally. For example, let
us deplete ω in finite time! Start by removing 0. Exactly 1 hour later remove 1; exactly 1/2 hour
later remove 2; . . . ; exactly 1/2n hour later remove n + 1; and so on, in the obvious pattern. It
takes just 1 + 1/2 + 1/22 + · · · + 1/2n + · · · = 2 hours to complete the task. Yet, ω is intuitively
infinite. Of course, we would not have had this informal “paradox” if we were careful to say
explicitly that we spend exactly the same amount of time between any two consecutive removals
of elements.

430
VII.1. Finite vs. Infinite 431

In the infinite case it is not clear a priori how to assign a “number” that
denotes the cardinality of the set. Thus the issue is temporarily postponed, and
one first worries about whether or not two infinite sets have the same number of
elements. This is an easier problem to address, and it can be addressed before
we settle the question of what “number of elements” means. Indeed, any two
sets (infinite or not) clearly have the same number of elements if we can match
each element of one with a unique element of the other in such a way that
no unmatched elements are left on either side. Technically, two sets have the
same cardinality iff there is a 1-1 correspondence between them. Let us now
formalize this discussion and see where it leads.

VII.1. Finite vs. Infinite


VII.1.1 Definition. Two sets A and B are said to be equinumerous or equipo-
tent, in symbols A ∼ B, iff there is a 1-1 correspondence f : A → B.
The negation of ∼ is denoted by ∼. 

VII.1.2 Exercise (Informal). Show that ∼ is an equivalence relation on the


universe (of sets) U M .
(Hint. For reflexivity note that the function ∅ : ∅ → ∅ proves the special case
∅ ∼ ∅.) 

VII.1.3 Definition. A set A is finite iff A ∼ n, where n ∈ ω. We call n the


cardinality or cardinal number of A, and write |A| = n in this case.
A set which is not finite is infinite. 

VII.1.4 Remark. According to the above definition and the hint in Exer-
cise VII.1.2, ∅ is finite. Furthermore, each n ∈ ω is finite, and |n| = n.
Corollary VII.1.8 below shows that the cardinality of a finite set is indeed
unique, for it is impossible to have n ∼ A ∼ m and n = m with m and n in ω.


VII.1.5 Example. Let E denote the set of even numbers. Then E ∼ ω.


Indeed, the function f : ω → E given by f = λn.2n is a 1-1 correspondence.


We now embark on showing that the definition of finite set is “reasonable”,


that is, it leads to the (intuitively) expected properties of finite sets.

VII.1.6 Proposition. If x ⊂ n ∈ ω, then there is no onto function f : x → n.


432 VII. Cardinality

x is not necessarily a natural number, so that “⊂” here is more general than “<”.

Proof. Induction on n in ω − {0}.


Basis: n = 1: Then the only function f : x → n is ∅ : ∅ → 1, which is clearly
not onto.
I.H.: Assume that if n = k, then there is no onto function f : x → n for any
x ⊂ n.
We show, by contradiction, that the situation is unchanged when n = k + 1.
Let instead f : x → k + 1 be onto, where x = ∅.† So let H = f −1 k. By onto-
ness, ∅ = H , and, of course, H ⊆ x.
Case 1. k ∈/ x. Then g = f  (x − H ) is onto k, and x − H ⊂ k, contradicting
the I.H.
Case 2. k ∈ x. If f (k) ↑, then usex − {k} and go back to Case
 1. Otherwise,
let k = m = f (k) and set g = f − {k, m} ∪ (H × {k}) ∪ (H × {m}).
Then g : x −{k} → k is onto, which contradicts the I.H. Finally, if k = f (k),
then f − (H × {k}) : x − H → k is onto, and we have contradicted the
I.H. once more. 

VII.1.7 Corollary. If x ⊂ n ∈ ω, then x ∼ n.

VII.1.8 Corollary. If m < n ∈ ω, then m ∼ n.

One refers to Corollary VII.1.8 as the pigeon-hole principle, in that if you have
n pigeons and m holes (or vice versa) then there is no way to put exactly one
pigeon in each hole so that no pigeon is left out (and no hole is empty).

VII.1.9 Proposition. If x ⊂ n ∈ ω, then x is finite and |x| < n.

x is not necessarily a natural number, so that “⊂” here is more general than “<”.

Proof. x is well-ordered by ∈ (or <) as a subset of n. Thus, for some α,

(x, ∈) ∼
= (α, ∈)
In particular, α ∼ x, say, via the 1-1 correspondence f : α → x.
By VII.1.7, α = n. Suppose that n < α. Let y = ran( f  n). Then n ∼ y
via f  n, and y ⊆ x ⊂ n, contradicting VII.1.7. Thus, α < n. That is, α is a
natural number m, x ∼ m, and m < n. 

† The case x = ∅ cannot lead to onto functions, as seen in the basis step; therefore it is not
considered here.
VII.1. Finite vs. Infinite 433

VII.1.10 Corollary. If A is finite and B ⊆ A, then B is finite and |B| ≤ |A|.

Proof. Let f : A → n be a 1-1 correspondence for some n ∈ ω. Then


B ∼ ran( f  B) ⊆ n.
By Proposition VII.1.9, ran( f  B) ∼ m for some m ≤ n; thus B ∼ m (by
transitivity of ∼). 

VII.1.11 Corollary. A is finite iff there is an onto f : n → A for some n ∈ ω.

Proof. If part. Let f : n → A be onto. Define g on A by g(x) = min f −1 x.


Then g : A → ran(g) ⊆ n is a 1-1 correspondence; hence |A| = |ran(g)| ≤ n.
Only-if part. Let f : n → A be a 1-1 correspondence. Then f is onto. 

It is clear from the proof of the if part that f need not be total.

VII.1.12 Corollary. If A and B are finite, then A ∼ B iff |A| = |B|.

Proof. If part. Let |A| = |B| = n ∈ ω. Let h : n → A and g : n → B be 1-1 cor-


respondences. Then g ◦ h −1 : A → B is a 1-1 correspondence.
Only-if part. Let f : A → B, h : A → n and g : B → m be 1-1 correspon-
dences, where m < n. The diagram below establishes m ∼ n, a contradiction:
f
A −−−−→ B
 
 g
h/ /
n −−−−→ m
g◦ f ◦h −1

VII.1.13 Proposition. For all n ∈ ω there is no f such that f : n → ω is onto.

Proof. Induction on n.
Basis: n = 0: The result is immediate.
I.H.: Assume the assertion for n ≤ k. Proceeding by contradiction, as-
sume that f : k + 1 → ω is onto and let H = f −1 0. Hence k + 1 ⊇ H = ∅
by ontoness. Thus, k + 1 − H ∼ m < k + 1 for some m, by VII.1.9. Let
g : m → k + 1 − H be a 1-1 correspondence. The diagram below shows that
h ◦ g : m → ω is onto, contradicting I.H., since m ≤ k:
 
g h=λx. f (k+1−H ) (x)−1
m −→ K + 1 − H −→ ω 
434 VII. Cardinality

VII.1.14 Corollary. ω is infinite.

Before we turn our attention to infinite sets, we will look into finite sets more
carefully, at the same time establishing a few facts about inductively defined sets
and a technique of proving properties of finite sets by some sort of induction.

VII.1.15 Definition. Given a class S and an n-ary function f for some n ∈ ω,


we say that S is closed under f , or is f -closed, iff ran( f  Sn ) ⊆ S.
If f is a set, then it is called an n-ary operation, or rule, on S. If F is a set
of rules and S is a class, then “S is F -closed” means that S is closed under all
f∈F . 

The reader is familiar with operations on sets. For example, + is a total 2-ary
operation on the real numbers, R. We prefer to call 2-ary operations binary.
Also, λx.1/x is a nontotal 1-ary operation on R. We call 1-ary operations unary.
We also note that ∅ is closed under any f and that if for a choice of an n-ary
f and class S it is the case that ran( f  Sn ) = ∅, then S is f -closed.
The requirement that “operations” (or “rules”) be sets does not limit the
range of applicability of the concept, while it simplifies the technicalities. For
example, it is meaningful to have a set of rules, since, so restricted, they are
objects which can be collected into a class or set. Further justification for this
restriction is embedded in the proof of Proposition VII.1.19 below. The material
below formalizes work presented informally in Section I.2 to bootstrap our
theory.

VII.1.16 Definition. Given a set I and a set of operations on F . We say that


a set S is inductively, or recursively, defined by I (the initial objects) and F
(the set of operations or rules) iff S is the ⊆-smallest set that satisfies both of
the following conditions:
(a) I ⊆ S.
(b) S is F -closed.
Under these conditions, we also say that S is the closure of I under F , in
symbols S = Cl(I , F ). 

VII.1.17 Remark. We clarify “⊆-smallest”: If after replacing S by the set T


in (a) and (b) we find that T satisfies (a) and (b), then S ⊆ T . 

VII.1.18
 Example. ω is inductively defined by I = {0} and F = λx.x ∪
{x} , where we may take as dom(λx.x ∪ {x}) ω itself, or any ordinal α such
that ω < α.
VII.1. Finite vs. Infinite 435

Indeed, first, ω satisfies (a) and (b) of Definition VII.1.16. Secondly, let a set
T also satisfy (a) and (b) with respect to the given I and F .
That is, 0 ∈ T and (∀x)(x ∈ T → x ∪ {x} ∈ T ). By induction over ω, ω ⊆ T ;
ω is ⊆-smallest. 

VII.1.19 Proposition. Given the sets I and F of Definition VII.1.16, a unique



set Cl(I , F ) exists and is equal to x∈J x, where J is the class of all sets x
satisfying (a) and (b).


Proof. First, let X = I ∪ f ∈ F ran( f ). By Exercise VII.3, X is a set that

satisfies (a) and (b). Thus X ∈ J, and hence S = x∈J x is a set, being a subclass
of X .
Next, it is easy to verify that S satisfies (a) and (b). (See Exercise VII.3.)
Finally, S is ⊆-smallest, for if a set Q satisfies (a) and (b), then Q ∈ J.
The above establishes existence. For uniqueness use the ⊆-smallest property.
(See Exercise VII.3.) 

We next note that there are two reasons justifying the term “inductively
defined”, or “recursively defined”, for sets such as S = Cl(I , F ).
First, the set S is defined in terms of (“smaller”, or “earlier”, instances of)
itself (starting with I ). For, (b) of Definition VII.1.16 says that if we know S
up to a certain “extent”, or “stage”, then we can enlarge S by applying to its
current version the operations in F .
Second, the definition allows us to prove properties of all elements of S by
induction with respect to the formation, or definition, of S. We also say, by
induction over S.
Such inductive definitions appear frequently in logic and mathematics, as
we have already witnessed, which was the reason that compelled us to present
an informal version of these results early on.

VII.1.20 Theorem(Induction over an Inductively Defined Set). Let P (x)


be a formula. Then ∀x ∈ Cl(I , F ) P (x) is derivable from the assumptions

(i) (∀x ∈ I ) P (x) (basis), and


(ii) for each n-ary f ∈ F ,
 
an ) f (
(∀ an ) ↓→ P (a1 ) ∧ · · · ∧ P (an ) → P ( f (
an )

Condition (ii) in the theorem is also pronounced “P (x) propagates with each
operation in F ”. The part “P (a1 ) ∧ · · · ∧ P (an )” is the I.H. for f .
436 VII. Cardinality

Proof. Let P be the class {x : P (x)}. By (i) I ⊆ P, and by (ii) P is F -closed.



The set X = I ∪ f ∈ F ran( f ) is also F -closed, and I ⊆ X .
Thus Z = X ∩ P is a set which satisfies (a) and (b) of Definition VII.1.16.
Hence Cl(I , F ) ⊆ Z by Proposition VII.1.19, from which follows Cl(I , F )
⊆ P. 

As the above proof suggests, proving by induction


 with respect to Cl(I , F )
– or (I , F )-induction – that ∀x ∈ Cl(I , F ) P amounts to proving that the
class P = {x : P (x)} is F -closed and that, moreover, I ⊆ P.

VII.1.21 Definition. Given I and F . An n-tuple x1 , . . . , xn  is a (I , F )-


derivation, or simply derivation if I and F are understood, iff for each
i = 1, . . . , n, at least one of the following holds:
(a) xi ∈ I , or
(b) xi = f (x j1 , . . . , x jk ), where jm < i for m = 1, . . . , k and f ∈ F is a k-ary
operation.†
We say that xn is (I , F )-derived, or just derived if I and F are under-
stood, by the derivation x1 , . . . , xn . 

VII.1.22 Remark. It is clear that if x1 , . . . , xn  is a derivation, then so is each


x1 , . . . , xk  for 0 < k < n. 
 
VII.1.23 Example. Let I = {0}, F = λx.x ∪ {x} . Then 0, 1, 2 and 0, 1, 0,
0, 1, 1, 0, 2 are (I , F )-derivations. They both derive 2. 

VII.1.24 Theorem. For any I and F , {x : x is (I , F )-derived} = Cl (I , F ).

Proof. Let us denote by D the class {x : x is (I , F )-derived}. First, we do


(I , F )-induction to show Cl (I , F ) ⊆ D.
Basis: I ⊆ D, since for each a ∈ I , a is a derivation of a.
We next show that D is F -closed. So let f ∈ F be n-ary and let f ( an ) ↓,
where ai ∈ D for i = 1, . . . , n. By definition of D, there are derivations
 . . . , a1 , . . . ,  . . . , an .
Then  . . . , a1 , . . . , . . . , an  is a derivation (see Exercise VII.4), and therefore
so is  . . . , a1 , . . . , . . . , an , f ( an ) ∈ D. Thus D is
an ) (why?). It follows that f (
f -closed, and hence F -closed, since f ∈ F was arbitrary.
By induction over Cl (I , F ), we have obtained Cl (I , F ) ⊆ D.

† We also say that xi is obtained from k previous objects in the derivation by the application of a
k-ary operation from F .
VII.1. Finite vs. Infinite 437

Next, to show the opposite inclusion, we do induction in ω −{0} with respect


to the length, n, of (I , F )-derivations.
Basis: n = 1: Let a ∈ D, where a is a derivation. Then a ∈ I ⊆ Cl(I , F ).
I.H.: Assume the claim for n ≤ k. Let n = k + 1, and a1 , . . . , ak , a be a
derivation of a ∈ D. If a ∈ I , then a ∈ Cl(I , F ) as in the basis step. Let then
a = f (a j1 , . . . , a jr ). By the I.H., {a j1 , . . . , a jr } ⊆ Cl(I , F ), since each of
 . . ., a j1 , . . . ,  . . . , a jr  is a derivation of length ≤ k. But Cl(I , F ) is closed
under f . 

In particular, D is a set.

The above theorem provides an alternative characterization of the set


Cl(I , F ), which is more convenient when we want to prove that such and
such an x is in the set. On the other hand, the original definition (VII.1.16)
is more flexible to use when we try to prove properties of all the elements of
Cl(I , F ), in which case we use (I , F )-induction. In such inductive proofs
we do not need to refer to the natural numbers, not even implicitly, since we do
not employ derivations.
Before we proceed with an alternative definition of finite sets, due to
Whitehead and Russell, we present one more result on inductive definitions,
which properly belongs here and will also be used later (for example, in the
proof of the Cantor-Bernstein theorem).

VII.1.25 Definition. Let X be a set. An operator over X is a function


 : P(X ) → P(X ).  is monotone iff for every S ⊆ T ⊆ X , (S) ⊆ (T ).
Thus, a monotone operator is total.
A set S ⊆ X is -closed iff (S) ⊆ S. 

VII.1.26 Example. Let I be a set, and F a set of operations. For each f ∈ F



we denote its arity by n( f ). X will denote I ∪ f ∈ F ran( f ).
Define I ,F , for simplicity referred to as just  in the balance of the exam-

ple, by (Z ) = I ∪ f ∈ F ran( f  Z n( f ) ), for all Z ⊆ X .
Thus a set of initial objects, I , and a set of operations, F , give rise to an
operator over X . It is clear that  is monotone, since S ⊆ T ⊆ X implies
ran( f  S n( f ) ) ⊆ ran( f  T n( f ) ). 

VII.1.27 Theorem. If  is a monotone operator over the set X , then the set

S= Z
Z ⊆X
(Z )⊆Z

satisfies (S) = S.
438 VII. Cardinality

We call S a fixed point or fixpoint of . It turns out that S is the ⊆-smallest fixed
point of .

Proof. Let J = {Z : Z ⊆ X ∧ (Z ) ⊆ Z }. Now J is a nonempty set, since


X ∈ J ⊆ P(X ). Hence, S is a set.

Next, we establish that (S) ⊆ S, i.e., that ( Z ∈J Z ) ⊆ S. Indeed,
  
( Z) ⊆ (Z ) [by monotonicity ( Z ) ⊆ (Z )]
Z ∈J Z ∈J Z ∈J

⊆ Z=S
Z ∈J

To conclude, we need to show that S ⊆ (S). We proceed by contradiction:


Let, instead, x ∈ S −(S). By monotonicity of , (S −{x}) ⊆ (S) ⊆ S, and,
since x ∈/ (S), we have x ∈/ (S − {x}) as well. Thus (S − {x}) ⊆ S − {x};

hence S − {x} ∈ J , and therefore S = Z ∈J Z ⊆ S − {x}, a contradiction. 

VII.1.28 Remark. (S) ⊆ S implies that S ∈ J , and therefore S is the


⊆-smallest set in J . If now (T ) = T , then also (T ) ⊆ T ; hence T ∈ J and
consequently S ⊆ T . Thus the claim of the preceding note, that S is the
⊆-smallest fixed point of , is correct.
One usually denotes this ⊆-smallest fixed point of  by  or  ∞ . 

VII.1.29 Corollary (Γ-Induction, or Induction over Γ). (∀x ∈ ) T (x),


where  is an operator over a set X , is derivable from a proof that
{x ∈ X : T (x)} is -closed.

Proof. Let Z = {x ∈ X : T (x)}. By assumption, (Z ) ⊆ Z . Thus Z ∈ J (where


J is as in VII.1.27); hence  ⊆ Z and therefore (∀x ∈ ) T (x). 

VII.1.30 Remark. For an abstraction of what we are doing here see VI.5.47
and VI.5.49. Here the PO set (A, <) is (P(X ), ⊂), and f is .
Monotone operators are also called inductive. VII.1.29 provides a justifica-
tion for the name “inductive”, for it says that  has the “property” T (x) if it
happens that the property “propagates with” : If Z is the set of all x ∈ X which
satisfy T (x), then all the elements of (Z ) also satisfy the property. 

VII.1.31 Example. We conclude Example VII.1.26 by showing that the


⊆-smallest fixed points of inductive operators generalize the notion of induc-
tively defined sets.†

† This generalization is proper, i.e., there are fixed points  which cannot be inductively defined
as in Definition VII.1.16. These  require infinitary operations, i.e., operations with infinitely
many arguments.
VII.1. Finite vs. Infinite 439

Let S = Cl(I , F ) and X be as in Example VII.1.26. We will show that


S =  I ,F . First,

I ,F (Z ) ⊆ Z ↔ I ∪ ran( f  Z n( f ) ) ⊆ Z
f∈F

by definition of I ,F (Example VII.1.26). In words,

I ,F (Z ) ⊆ Z iff I ⊆ Z and Z is F -closed. (1)

By Theorem VII.1.27,

 I ,F = Z
Z ⊆X
I ,F (Z )⊆Z

= Z by (1)
I ⊆Z ⊆X
Z is F -closed.
=S

Finally, let us recognize (I , F )-induction as I ,F -induction.


To prove (∀x ∈ S)P (x) by (I , F )-induction, we let P be the class
{x : P (x)} and prove that I ⊆ P and P is F -closed. If this plan succeeds, then
we have actually proved (see the proof of Theorem VII.1.20) that the set Z =
X ∩ P = {x ∈ X : P (x)} satisfies I ⊆ Z and is F -closed. By (1), this is tan-
tamount to proving that Z is I ,F -closed. By Corollary VII.1.29, this proves
(∀x ∈  I ,F )P (x), i.e., (∀x ∈ S) P (x), by I ,F -induction. 

We will return to inductive operators in Section VII.7. To conclude the


present section, we resume the study of finite sets.
Definition VII.1.3 was based on the intuitive notion of depleting a finite
set in “finite time” by successively removing its elements, one at a time. The
following alternative definition due to Whitehead and Russell (1912) (see also
Levy (1979)) builds, rather than depletes, a finite set from “scratch” (i.e., from ∅)
in “finite time”, by successively adding elements to it.
The definition given below is a variant of the original one, chosen in the
context of the preceding groundwork on inductive definitions – and especially
derivations – so as to make it clear that we are on the right track towards
characterizing “finite” (see the following remark).
We will use the term WR-finite† until we can show that the notions of “finite”
and “WR-finite” are equivalent.

† Whitehead-Russell-finite.
440 VII. Cardinality

VII.1.32 Definition.
 A set A is WR-finite iff A ∈ Cl(I , F ),where I = {∅}
and F = f y : dom( f y ) = P(A) ∧ y ∈ A ∧ f y = λx.x ∪ {y} .
If A is not WR-finite, then it is WR-infinite. 

VII.1.33 Remark. Since A is a set, so is F . By Theorem VII.1.24, A is


WR-finite iff there is some derivation ∅, . . . , A of A such that at each non-
redundant step† a set x ∪ {y} occurs, where x is available at an earlier step of
the derivation and y ∈ A.
Thus in a “finite number of steps” we obtain A by collecting its elements
together, one at a time, starting from ∅. So the definition is reasonable. Note
that the notion of natural number was only used implicitly (via the derivation
concept) in this remark; it does not occur in the Definition VII.1.32, not even
implicitly. 

VII.1.34 Proposition. ∅ is WR-finite.

Proof. Let I = {∅} and F = ∅. Since I ⊆ Cl(I , F ), it follows that ∅ ∈


Cl(I , F ). 

VII.1.35 Proposition. If A is WR-finite, then so is A ∪ {y} for any y.

Proof. We look at the interesting case where y ∈ / A. By assumption, A ∈


Cl(I , F ), where I = {∅}, and F is the set of all the total functions λx.x ∪ {z}
on P(A), for all z ∈ A. 
Let G = F  ∪ λx.x ∪ {y} , where F  contains exactly the F -functions ex-
tended to P(A ∪ {y}), so that for all z ∈ A ∪ {y}, dom(λx.x ∪ {z}) = P(A ∪ {y}).
A trivial (I , F )-induction shows that Cl(I , F ) ⊆ Cl(I , G ) (see Exer-
cise VII.6). Hence, A ∈ Cl(I , G ) and therefore A ∪ {y} ∈ Cl(I , G ), since
Cl(I , G ) is closed under λx.x ∪ {y}. 

VII.1.36 Remark. Here is an easier alternative proof: Let ∅, . . . , A be a


(I , F )-derivation. Then ∅, . . . , A, A ∪ {y} is a (I , G )-derivation; hence
A ∪ {y} ∈ Cl(I , G ).
We prefer the original proof, because it avoids any reliance on the natural
numbers. 

† At any step of the derivation we may place ∅; such a step is redundant in that it does not help to
progress with the formation of A.
VII.1. Finite vs. Infinite 441

VII.1.37 Theorem (Induction on WR-Finite Sets) (Zermelo (1909)). Let


P (x) be a formula. To prove
 
(∀x) x is WR-finite → P (x)

it suffices to prove the following two things:

(a) P (∅), and


(b)
 
(∀x)(∀y) x is WR-finite → P (x) → P (x ∪ {y})

Proof. Let P (x) be as above, and A be WR-finite. Then A ∈ Cl(I , F ), where


I = {∅} and F is as in Definition VII.1.32. Let S = {x : x is WR-finite ∧
x ⊆ A ∧ P (x)}.
By (a) and VII.1.34, I ⊆ S.
By (b) and Proposition VII.1.35, S is F -closed. Therefore, by (I , F )-
induction, Cl(I , F ) ⊆ S; hence A ∈ S. In particular, P (A). 

VII.1.38 Theorem. A is finite iff it is WR-finite.

Proof. If part. We show that for some n ∈ ω, A ∼ n. The proof is by induction


on WR-finite sets.
Basis: ∅ ∼ 0.
I.H.: Let x be WR-finite, and for some n ∈ ω, f : x → n be a 1-1 corres-
pondence. Consider x ∪ {y}, and show this set to be equinumerous with some
natural number. If y ∈ x, then the result is the I.H. itself.
So let y ∈/ x. Then f ∪ {y, n} provides a 1-1 correspondence x ∪ {y} →
n ∪ {n} = n + 1.
Only-if part. Let f : n → A be a 1-1 correspondence. Then f [n] = A. By
Exercises VII.7 and VII.8, A is WR-finite. 

From now on we drop the qualification “WR-” from “finite”. What we are
left with from all this, besides a better understanding of finite sets, is the useful
proof technique of induction on finite sets.

VII.1.39 Theorem. Let |A| = n, and < be a well-ordering on A. Then


(A, <) ∼
= n.†

† By “∼= n” we mean, of course, “∼= (n, ∈)”. We are following the convention of Section VI.4 in
writing “∼
= α” as a short form of “∼
= (α, ∈)” for ordinals.
442 VII. Cardinality

Proof. By induction on n.
Basis: n = 0. The result is trivial.
I.H.: Assume the claim for n = k. Let n = k + 1. By Exercise VII.10, A
has a <-maximal element, say a. Now, a is also <-maximum, since < is a total
order. That is, x < a for all x ∈ A − {a}.
Now, |A−{a}| = k (see Exercise VII.12) and the I.H. yield (A−{a}, <) ∼
= k.
By pairing a with k, we extend the previous ∼
= to (A, <) ∼= k + 1. 

The above result establishes the claim made in the preamble to this chapter,
namely, that there is a unique “length” for each finite set and that this length
coincides with the set’s cardinality. We add that, of course, every finite set is
well-orderable, a well-ordering being induced by A ∼ |A| (see VI.3.12).

VII.2. Enumerable Sets

VII.2.1 Definition. A set S is enumerable iff S ∼ ω. If S is either finite or


enumerable, then it is called countable.
If a set is not countable, then it is called uncountable. 

Some authors use the term denumerable for enumerable. Also, the term at most
enumerable is sometimes used for countable.

VII.2.2 Example. According to Definition VII.2.1 each finite set is also count-
able. We also observe that ω is enumerable (since ω ∼ ω), so enumerable sets
exist. Do uncountable sets exist? In other words, are there infinite sets which
are not enumerable? Cantor, as we will see in the next section, answered this
affirmatively. 

VII.2.3 Example. The set of the even natural numbers, E, is enumerable, since
λx.2x : ω → E is a 1-1 correspondence. A similar comment is true for the set
of the odd natural numbers. 

VII.2.4 Example. Every enumerable set is infinite. Indeed, if A ∼ ω, then (see


Exercise VII.2) A is finite iff ω is. But ω is not finite.
If now A ⊆ B and A is enumerable, then B is infinite. This is so because of
Corollary VII.1.10. 
VII.2. Enumerable Sets 443

VII.2.5 Theorem. If A ⊆ ω, then A is countable.

Proof. If A is finite, then we are done. So let A be infinite.


Define f by induction on n as follows:

f (0) = min A
and if n > 0,   (i)
f (n) ! min A − ran( f  n)

Observe that
(1) Since the recursion (i) is pure, dom( f ) ≤ ω. Say, dom( f ) = n ∈ ω (for
some n). Thus, A = f [n].† By Exercise VII.18 (or VII.7), A is finite, a
contradiction. Thus f is total on ω.
(2) f is 1-1. Indeed, let n = m, where we assume, without loss of generality,
that m < n. Hence f (m) ∈ ran( f  n); therefore f (n) = f (m) by the second
part of (i).‡
(3) A = ran( f ). Let us assume instead that, for some m, m ∈ A − ran( f ). Since
ω ∼ ran( f ) (why?), ran( f ) is infinite; therefore, for some n,

m ≤ f (n) (ii)

for, otherwise, (∀n ∈ ω) f (n) < m, i.e., ran( f ) ⊆ m, making ran( f ) finite.
Indeed, (ii) graduates to m < f (n), the strict inequality being justified from
m∈ / ran( f ). This last observation is inconsistent with the definition of f (n)
(second equation of (i)) since both m and f (n) are in A − ran( f  n).
Items (1) through (3) establish that ω ∼ A. 

VII.2.6 Corollary. A nonempty set A is countable iff there is an onto function


f : ω → A.§

Proof. Only-if part. Let A be finite and f : n → A be a 1-1 correspondence


for some n ∈ ω. Trivially, f is a (nontotal) function on ω, and is onto A.
If, on the other hand, A is infinite, then there is a 1-1 correspondence
f : ω → A for some f . This f is onto.
If part. If A is finite, we are done. So let A be infinite, and let f : ω → A
be onto.

† f (n) ↑ entails A ⊆ ran( f  n).


‡ See also Exercise VII.20.
§ We emphasize that f need not be total.
444 VII. Cardinality

By V.3.9, there is a right inverse, g : A → ω, of f , in the sense that f ◦g = 1,


where 1 is the identity on A.
By V.3.4, g is total and 1-1; thus A ∼ ran(g). By Exercise VII.2, ran(g) is infi-
nite; by Theorem VII.2.5, ω ∼ ran(g); hence ω ∼ A. (See also Exercise VII.21.)


It is clear that, if we want, we can state the corollary so that f is total. Indeed,
if f : A → B is nontotal but onto, then we can always extend it to a total and
onto function h by taking h = f ∪ {x, b : x ∈ A − dom( f )}, where b is any
fixed element of B. Thus, whereas the original definition (Definition VII.2.1)
of A being enumerable requires that an enumeration without repetitions
exists (this is the 1-1 correspondence f : ω → A), we now have relaxed this
by saying, via Corollary VII.2.6, that a nonempty set A is countable iff an
enumeration exists (possibly with repetitions – the 1-1-ness requirement being
dropped).

VII.2.7 Example. If A is countable and B ⊆ A, then B is countable. Indeed,


if B = ∅, then the result is immediate.
So, let B = ∅, and let (by Corollary VII.2.6) f : ω → A be onto. The total
function i = λx.x on B is 1-1. Thus the inverse relation, i −1 , is an onto (but
nontotal, unless A = B) function i −1 : A → B. Clearly i −1 ◦ f : ω → B is
onto. 

VII.2.8 Example. If A and B are countable, then so are A ∩ B and A − B.


This follows from A ∩ B ⊆ A and A − B ⊆ A.
Note that the hypothesis could be weakened to simply require that just A be
countable. 

VII.2.9 Proposition. ω ∼ ω × ω.

Proof. By VI.7.8. 

There is a more “elementary” proof that avoids the J of VI.7.4 and uses just
multiplication and addition on ω.
We start with the total function f = λmn.(m + n) · (m + n) + m on ω2 .
(By the way, ω2 is in the Cartesian product sense throughout. Ordinal exponen-
tiation would have used an exponent “·2” instead, and cardinal exponentiation
we have not introduced yet.)
One can easily derive that f is 1-1, relying on what we know about ordinal
addition and multiplication (cf. VI.10.11 and VI.10.19). Indeed, assume that
VII.2. Enumerable Sets 445

f (m, n) = f (m  , n  ) (m, n, m  , n  in ω), that is,

(m + n) · (m + n) + m = (m  + n  ) · (m  + n  ) + m  (1)

and prove

m = m  ∧ n = n (2)

Well, if m + n = m  + n  , then m = m  from (1) by VI.10.11, and then n = n  .


We show that m + n = m  + n  cannot apply; then we are done. So let instead
m + n < m  + n  . Thus m + n + 1 ≤ m  + n  . It follows† that

(m + n) · (m + n) + m + m + n + n + 1 ≤ (m  + n  ) · (m  + n  )

Hence

(m + n) · (m + n) + m + m + n + n + 1 ≤ (m  + n  ) · (m  + n  ) + m 

By (1) and VI.10.11 again, m + n + n + 1 ≤ 0, which is absurd.


Thus the inverse relation f −1 : ω → ω2 is a (nontotal) onto function. By
VII.2.6, ω2 is countable. Since it is infinite,‡ it is enumerable.

VII.2.10 Corollary. ω ∼ ωn for n ≥ 2.

Proof. ωn+1 = ωn × ω. Now use induction on n in ω − {0, 1}, and VII.2.9. 

We can also view the above as a theorem schema (one theorem for each n ∈ N,
rather than the single theorem (∀n ∈ ω)(n ≥ 2 → ω ∼ ωn )) and prove it by
informal induction on N − {0, 1}.

VII.2.11 Proposition. If Ai is countable for i = 1, . . . , n, then so is × n


i=1
Ai .

Theorem schema.

Proof. Let f i : ω → Ai be onto for i = 1, . . . , n. Then  f 1 , . . . , f n  : ωn →


× n
i=1 i
A is onto, where  f 1 , . . . , f n  denotes the function λxn . f 1 (x1 ), . . . ,
f n (xn ).

† By “squaring”. We are using the distributive law and commutativity of + and · on ω freely –
cf. VI.10.19(iv).
‡ ω2 ⊇ {0} × ω ∼ ω, the ∼ obtained via 0, n → n (cf. Exercise VII.2).
446 VII. Cardinality

To conclude, we need an onto function g : ω → ωn , since  f 1 , . . . , f n  ◦ g :


ω→ ×
n
A will then be onto. This we have by VII.2.10.
i=1 i


Informally, using N for ω, one can conclude the above argument in this alter-
native manner (without invoking VII.2.10): Let

h = λxn . p1x1 +1 p2x2 +1 · · · pnxn +1

where pi is the ith prime ( p0 = 2, p1 = 3, p2 = 5, . . .). By the (informal)


prime factorization theorem, h : Nn → N is 1-1. Trivially, it is also total. Thus
if g = h −1 , the inverse relation, g : N → Nn is an onto function.
Of course, armed with sufficient strength of will (and time and space), one
can develop the properties of the formal natural numbers to the point that one
proves the prime factorization theorem within ZFC (ZF suffices, actually). Then
one can turn the above informal reasoning fragment to a formal one.
n
VII.2.12 Proposition. If, for i = 1, . . . , n, Ai is countable, then so is i=1 Ai .

The above is stated as a schema, one theorem for each n ∈ N. A formal version
uses a function f with dom( f ) = ω. It takes the form
  
(∀n ∈ ω) (∀i ∈ n) f (i) ∼ ω → { f (i) : i ∈ n} ∼ ω

and is proved pretty much like the schema version.


n
Proof. Let f i : ω → Ai be onto for i = 1, . . . , n. Define g : ω × ω → i=1 Ai
by g(i, x) = f i (x) for all x ∈ ω and i = 1, . . . , n. Clearly, g is onto, for if
n
a ∈ i=1 Ai , then, say, a ∈ Am for some m. By ontoness of f m , there is an
x ∈ ω such that f m (x) = a, that is, g(m, x) = a. Now invoke VII.2.6 and VII.2.9.


We observe that the g of the previous proof is not total.

VII.2.13 Theorem (A Countable Union of Countable Sets Is Countable).



If, for all i ∈ ω, Ai is countable, then so is i=0 Ai .

Proof. Let f i : ω → Ai be onto for all i ∈ ω. Define g on ω × ω as in the


previous proof, and proceed in an identical fashion. 

VII.2.14 Remark. (1) The nickname of the theorem has the obvious justifica-
tion as the family (Ai )i∈ω is countable, indeed enumerable.
VII.2. Enumerable Sets 447

(2) The proof of Theorem VII.2.13 involved (tacitly) the axiom of choice.
This happened during the definition of g, where one out of, possibly, several f i
was (tacitly) chosen for each i ∈ ω. The omitted details are as follows:
#
Since { f : f : ω → Ai is onto} = ∅ for i ∈ ω, there is an h in i∈ω { f : f :
ω → Ai is onto}. For each i ∈ ω, h(i) is the f i used in the proof.
Was this a peculiarity of this particular proof? No, as a result of Feferman
and Levy (1963) shows: without the axiom of choice we may have a countable
union of countable sets that turns out to be uncountable.
(3) The axiom of choice is provable for finite sets of sets, as we already
know. Thus to construct g in the proof of Proposition VII.2.12 we did not need
AC to select one (out of the possibly many) f i for each i = 1, . . . , n.
(4) In each of the results VII.2.11, VII.2.12, and VII.2.13, if any of the Ai is
enumerable, then so are

 

×
n n
Ai , Ai , and Ai
i=1 i=1 i=0

For the case of this follows from VII.1.10; for the other case see Exer-
cise VII.15.


VII.2.15 Example. ∞ n=1 ω is enumerable by Theorem VII.2.13, Corollary
n

VII.2.10, and Remark VII.2.14(4). A direct proof – assuming an undertaking to


develop enough arithmetic in ZF – bypasses the axiom of choice in this special
case.
Let f (x) = x0 , . . . , xm  whenever x = p0x0 +1 p1x1 +1 · · · pmxm +1 . By the prime
factorization theorem, this leads to an onto (but nontotal) function f : ω →
∞ n
n=1 ω . 

VII.2.16 Example (Informal). Let us see that A ∪ B is countable whenever


A and B are, this time using an elementary (informal) technique traceable
back to Cantor, rather than observing that this statement is a special case of
Proposition VII.2.12.
According to hypothesis (see the discussion following Corollary VII.2.6), A
is enumerated (possibly with repetitions) as

a0 , a1 , a2 , . . .

and B is enumerated as

b0 , b1 , b2 , . . .
448 VII. Cardinality

Following the arrows in the diagram below, we trace an enumeration of A ∪ B:


a0 a1 a2 a3 ...
↓ ' ↓ ' ↓ ' ↓ ...
b0 b1 b2 b3 ...


VII.2.17 Example (Informal). We present an informal proof that ω2 is enu-


merable by providing an enumeration diagrammatically, as in the previous
example:
0, 0 0, 1 0, 2 0, 3 ···
' ' '
1, 0 1, 1 1, 2
' '
2, 0 2, 1
'
3, 0
..
.


VII.2.18 Example (Informal). It is easy to see that the set of integers, Z =


{. . . , −1, 0, 1, . . . }, is enumerable. One way to do this is to observe first that
λx. − x is a 1-1 correspondence between (the real) ω (if you prefer, you may
use the real alternative, N) and {. . . , −1, 0}, the non-positive numbers NP.
Then observe that Z = NP ∪ ω, and invoke either Proposition VII.2.12 or
Example VII.2.16.
Another way to do the same is to view Z as {0, 1} × N (or {0, 1} × ω, using
the real ω), where 0, n stands for n ∈ N, whereas 1, n stands for −n for
n ∈ N. By Proposition VII.2.11 (see also Remark VII.2.14(4)), once again, Z
is enumerable. 

VII.2.19 Example (Informal). We next see that Q, the set of rational numbers,
is enumerable.
This may appear, at first sight, surprising, because of the density of rational
numbers: Between any two rational numbers r and s there is another rational
number, for example, (r + s)/2. Thus, intuitively, there seem to be “more”
numbers in Q than in N (which does not enjoy a density property). Well, intuition
can be wrong in connection with the cardinality of infinite sets. (We will see
VII.2. Enumerable Sets 449

another counterintuitive result in Section VII.4, namely, that there are as many
reals in the unit square, [0, 1]2 , as there are in the unit segment, [0, 1]. See also
Exercise VII.35.)
The justification of the claim is straightforward:
 
Q = m/n : m ∈ Z ∧ n ∈ N − {0} (1)
Since N ∼ N − {0} via λx.x + 1,
Z × (N − {0}) ∼ N (2)
by Proposition VII.2.11 and Remark VII.2.14(4).
Since the function Z × (N − {0}) → Q given by m, n "→ m/n is onto, (2)
and (1) yield an onto function N → Q. 

VII.2.20 Example (Informal). Let us “count” the polynomials of one variable


(say, x), with integer coefficients.
Such a polynomial is a function of x, whose value for each x is given by the
expression
,
n
a0 + a1 x + a2 x 2 + · · · + an x n , for short, ai x i .
i=0

The ai are the coefficients, and, in this example, they are in Z. Whenever an = 0,
we say that the degree of the polynomial is n. We identify each nth-degree
!n
polynomial, i=0 ai x i , with the (n + 1)-tuple a0 , . . . , an .
It follows that the set of nth-degree polynomials is Zn+1 and therefore the
set of all polynomials is


Zn
n=1

Since we already know that ω ∼ Z (Example VII.2.18), the set of polynomials


is enumerable, by Example VII.2.15. (See also Exercises VII.26 and VII.27.)


We now turn to a characterization of infinite sets due to Dedekind (1888).† This


characterization is contained in the following definition.

VII.2.21 Definition. A set A is Dedekind-infinite iff it is equinumerous with


some of its proper subsets. Otherwise, it is Dedekind-finite. 

† This characterizing property of infinite sets was also observed by Cantor and Bolzano. See also
Wilder (1963, p. 65).
450 VII. Cardinality

VII.2.22 Remark. A set which is Dedekind-infinite is also infinite.


We prove the equivalent contrapositive statement, that if a set A is finite in
the “ordinary” sense (Definition VII.1.3), then it is also Dedekind-finite (i.e.,
not Dedekind infinite). Indeed, if f : A → n ∈ ω is a 1-1 correspondence and
if B is any proper subset of A, then
(1) B ∼ f [B], and
(2) f [B] ⊂ n.
By Corollary VII.1.7, f [B] ∼ n; hence f [B] ∼ A. By (1), B ∼ A. Since B
was an arbitrary proper subset of A, our conclusion is that A is not Dedekind-
infinite.
In what follows we will show the equivalence of the two definitions of finite
(or infinite). 

For the balance of this discussion, “infinite” and “finite”, without the qual-
ification “Dedekind”, refer to the ordinary notions, as per Definition VII.1.3.

VII.2.23 Lemma. Every infinite set has an enumerable subset.

Proof. Let A be infinite. By VI.5.54, there is an α and a 1-1 correspondence


f : α → A. By Exercise VII.2, α is infinite; hence ω ≤ α, i.e., ω ⊆ α (otherwise,
α < ω, whence α is a natural number and therefore finite).
The set f [ω], being an enumerable subset of A, settles the issue. 

VII.2.24 Lemma. If A is infinite and B is countable, then A ∪ B ∼ A.

Proof. Let C = B − A, and D be an enumerable subset of A.


By Example VII.2.8, C is countable. Thus D ∪ C ∼ D (see Re-
mark VII.2.14(4)). Extend the above 1-1 correspondence to one between A ∪ B
and A, as follows: Let each x ∈ A − D correspond to itself, and observe that
A ∩ C = ∅, A ∪ B = A ∪ C = (A − D) ∪ (D ∪ C), and A = (A − D) ∪ D. 

VII.2.25 Theorem. The notions “infinite” and “Dedekind infinite” are


equivalent.

Proof. That Dedekind infinite implies infinite is the content of Remark VII.2.22.
So let next A be infinite.

Case 1. There is a 1-1 correspondence f : ω → A, i.e., A is enumerable.


From ω ∼ ω − {0} (via x "→ x + 1) it follows that f [ω − {0}] ∼ A; more-
over, f [ω − {0}] ⊂ A.
VII.3. Diagonalization; Uncountable Sets 451

Case 2. A is not enumerable. By Lemma VII.2.23, A has an enumerable


subset B. By Exercise VII.16, A− B is infinite (otherwise A = (A− B) ∪ B
is enumerable). By Lemma VII.2.24, A ∼ A − B. 

In the next section, among other things, we see that case 2 above is not
vacuous.

VII.3. Diagonalization; Uncountable Sets


In this section we study the important technique of diagonalization through
several examples. It was devised by Cantor in order to prove that the set of
real numbers is uncountable, and it has since found many applications in logic,
such as in the proof of Gödel’s incompleteness theorems and later in recursion
theory (and its offspring, computational complexity). To a large extent, recur-
sion theory and complexity theory are the art and science of diagonalization.
Described generally, given a “square table” (that is, a total function F : A ×
A → X), this method defines an A-long array of elements of X (that is, a
function D : A → X) that is different from all the “rows” of the table (where
the hth “row” of F is the function H = λx.F(h, x)). The idea, due to Cantor,
can be described like this:
Start with the function λx.F(x, x) (the “main diagonal” of the table). If this
were to be the same as the hth row, then, in particular, H(h) = F(h, h).†
Suppose now that we want to build an A-long array that cannot equal the
hth row. It suffices to take a modified diagonal: Just change the entry F(h, h).
It is clear now what needs to be done to get an A-long array D that fits
nowhere as a row. Take, again, the diagonal, but change every one of its entries.
That is,

D(x) = some element in X − {F(x, x)} (1)

Clearly this works, i.e., for each a ∈ A, D = λy.F(a, y); for (1) yields D(a) =
F(a, a).
We will illustrate the above general description of diagonalization in the
following examples.

VII.3.1 Example. Let ( f n )n∈ω be a countable family of total functions


f n : ω → ω.

† Intuitively, the main diagonal was “rotated” counterclockwise, by 45 degrees, around the pivot
entry F(h, h).
452 VII. Cardinality

The function d = λx. f x (x) + 1 is different from each f x at input x; thus


d does not belong to the family. Let us verify: If d = f i for some i ∈ ω, then
d(i) = f i (i). On the other hand, by the definition of d, d(i) = f i (i) + 1.
We conclude that f i (i) = f i (i) + 1, a contradiction, since both sides of “=”
are defined.†
Finally, we note that the argument just presented indeed fits the general
description of diagonalization. Here the “table” is F = λmn. f m (n), and one
particular way to build D of (1) (here called “d”) is to make d(x) different from
f (x, x) by adding 1 to the latter.
d(x) = f x (x) + x + 1 would have worked too. 

VII.3.2 Proposition. The set A = ω ω is uncountable.

Recall that X Y is the set of all total functions X → Y .

Proof. If A were countable, then there would be an enumeration ( f n )n∈ω of all


its members. Example VII.3.1 shows that this is impossible, as any such (at-
tempted) enumeration will omit at least one member of A, for example d. 

VII.3.3 Corollary. The set ω 2 is uncountable.

Proof. Exercise VII.30. 

VII.3.4 Example (Informal). Diagonalization is often applied to define a class


that does not belong to an indexed family of classes (Px)x∈J , where J is a
class and P a relation on J (we are speaking informally, ignoring issues of
left-narrowness). This is done by defining the “diagonal class” E, setting

E = {x ∈ J : x ∈
/ Px} (1)

or

E = {x ∈ J : x  P x} (1 )

Clearly this works, i.e., E = Px for all x ∈ J, for if

E = Pa for some a ∈ J (2)

† Clearly, this argument breaks down if the family ( f n )n∈ω contains nontotal functions, in which
case we employ Kleene’s weak equality !. Then it is possible to have f i (i) ↑, in which case
f i (i) ! f i (i) + 1.
VII.3. Diagonalization; Uncountable Sets 453

then
by (2) by (1)
a ∈ Pa ←→ a ∈ E ←→ a ∈
/ Pa

– a contradiction.
This is not a new flavour of diagonalization, but fits under the general dis-
cussion on p. 451 above. Indeed, think of the “table” F : J × J → 2 defined
by†

0 if x P y
F(x, y) =
1 otherwise
The general discussion would lead to the “diagonal object”

D = λx.1 − F(x, x) (3)

which is a J-long 0-1-valued array that cannot be a row of the “table”. “D” is
just another way of saying “E”, for the former is the characteristic function of
the latter, as the following equivalences show (for x ∈ J):

D(x) = 0 ↔ F(x, x) = 1 ↔ x  P x ↔ x ∈ E

As an application, let us look at the family (x)x∈S , where S is any class of sets.
Here J = S and P is the identity. Let D = {x ∈ S : x ∈ / x}, i.e., x ∈ D iff
x ∈S∧x ∈ / x. Thus D behaves at x differently than x at x, and therefore it is
not one of the x’s; in other words, D ∈ / S (this is so because this diagonalization
tells us that D is not in the family (x)x∈S , but this family equals S).
So, by diagonalization, we have obtained an object not in S. If we now let
S be V M , the class of all sets, then D is the Russell class, and the above argu-
ment establishes (once again), that D ∈ / V M , i.e., that D is not a set. Therefore,
Russell’s proof was a diagonalization over all sets to obtain an object which is
not a set, and while ingenious and elegantly simple, the technique was borrowed
from Cantor’s work. 

With practice, one can expand the applicability of diagonalization to more


general situations – for example, cases where we apply some transformation
(function) to one of the table “coordinates”. Say, given P and J as in VII.3.4,
if G : J → J is a function, then the class E = {G(x) ∈ J : G(x) ∈ / Px} is
different from all Px (x ∈ J). Similarly, if f : ω → ω ω is an enumeration
of total functions ω → ω, then d = λx. f (x)(x) + 1 is not in the range of f

† F is the characteristic function of P.


454 VII. Cardinality

(otherwise, d = f (a) for some a – hence d(a) = f (a)(a) – but also d(a) =
f (a)(a) + 1).†

VII.3.5 Example (Informal). Throughout, [0, 1] will denote the real closed
interval, {x ∈ R : 0 ≤ x ≤ 1}. Since Q, the set of rational numbers, is enumer-
able, then so is [0, 1] ∩ Q = {x ∈ Q : 0 ≤ x ≤ 1} (see Example VII.2.7). Now
each rational in [0, 1] has a decimal expansion 0.a0 a1 . . . ai . . . . For example,

1 = 0. 99
 . .. , 0 = 0. 00
 . .. , 1/3 = 0. 33
 . ..
all 9’s all 0’s all 3’s

and

1/2 = 0.5 00
 . .. but also 1/2 = 0.4 99
 . ..
all 0’s all 9’s

We next claim that the set of all decimal expansions of rationals in [0, 1]
is enumerable. Indeed, this set equals Qinf ∪ Qfin , where Qinf is the set of all
infinite representations such as 0.33 . . . , 0.99 . . . , 0.499 . . . , whereas Qfin is
the set of all finite representations, i.e., those that terminate with an infinite
sequence of 0’s, such as

 . ..
0. 00  . ..
and 0.5 00
all 0’s all 0’s

Note that some rationals have both infinite and finite representations.
By Exercise VII.31, Qinf is equinumerous to an infinite subset of Q, and
also Qfin is equinumerous to an infinite subset of Q; hence Qinf ∪ Qfin ∼ ω,
as required. So we have an enumeration (0.a_0^n a_1^n … a_i^n …)_{n∈ω} of all decimal
expansions of the rational numbers in [0, 1].
Consider the decimal expansion d = 0.d_0 d_1 … d_i …, where for all i ∈ ω,
d_i ≠ a_i^i. For example, a well-defined way to achieve this is to set

d_i = 2  if a_i^i = 1;   d_i = 1  otherwise
By diagonalization, d does not belong to the family (0.a_0^n a_1^n … a_i^n …)_{n∈ω}.
Since the latter represents all the rationals in [0, 1], and since d represents a
real in [0, 1] it follows that d is an irrational number in [0, 1].
One can now continue to discover more irrationals in the interval by adding
d at the beginning of the enumeration and then diagonalizing again to obtain

† This type of argument shows, in recursion theory, that the set of all total computable functions
cannot be “effectively” enumerated.

a new irrational d′ (each irrational has a unique expansion – infinite only).


The reader may wish to refer to Wilder (1963, p. 89) to see a very interesting
extension of this type of discussion. In particular, Wilder applies this type of
diagonal technique to the set of algebraic numbers in [0, 1] to “construct” trans-
cendental, i.e., non-algebraic, numbers (refer also to Exercise VII.27). 
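A hedged sketch of the digit rule just used: given (initial pieces of) an enumeration of decimal expansions, the rule dᵢ = 2 if aᵢⁱ = 1, else dᵢ = 1, produces a string of digits differing from the ith expansion in the ith place. The sample rows below are made up for illustration only.

```python
# Each row lists the first few decimal digits of some expansion 0.a0 a1 a2 ...
# (made-up data; in the text the rows enumerate all expansions of rationals in [0,1]).
rows = [
    [9, 9, 9, 9, 9],   # 1   = 0.999...
    [0, 0, 0, 0, 0],   # 0   = 0.000...
    [3, 3, 3, 3, 3],   # 1/3 = 0.333...
    [4, 9, 9, 9, 9],   # 1/2 = 0.4999...
    [1, 1, 1, 1, 1],   # 1/9 = 0.111...
]

d = [2 if rows[i][i] == 1 else 1 for i in range(len(rows))]
print("0." + "".join(map(str, d)))          # 0.11112
for i in range(len(rows)):
    assert d[i] != rows[i][i]               # differs from row i in place i
```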

VII.3.6 Example (Informal: Cantor). The set of real numbers in [0, 1] is


uncountable.
Suppose instead that [0, 1] ∼ ω. Then, entirely analogously with Ex-
ample VII.3.5, [0, 1]fin ∪ [0, 1]inf ∼ ω, where [0, 1]fin is the set of all finite, and
[0, 1]inf the set of all infinite, decimal expansions of reals in [0, 1], and therefore
there is an enumeration (0.a0n a1n . . . )n∈ω of the members of [0, 1]fin ∪ [0, 1]inf .
However, the existence of the diagonal number d, defined as in Ex-
ample VII.3.5, leads to a contradiction: On one hand d cannot be in the enu-
meration. On the other hand,

d = 0.d_0 d_1 …   (all its digits are 1's and 2's)

Hence it is in the enumeration. This contradiction establishes the claim. 

VII.3.7 Proposition (Cantor). The set P(ω) is uncountable.

Contrast with Exercise VII.19.

Proof. Let us assume the contrary, i.e., that there is a 1-1 correspondence
f : ω → P(ω). Construct the diagonal set D = {x ∈ ω : x ∉ f(x)}.† Thus on
one hand D is not in the range of f ; on the other hand it must be, since D ⊆ ω.
This contradiction establishes the claim. 

VII.3.8 Example (Informal). For the purpose of this example we will state
without proof a few facts. To begin with, each real number r in [0, 1] has a binary
expansion, or can be represented in binary notation, as r = 0.b0 b1 . . . bi . . . .
This notation, or expansion, means (quite analogously with the familiar decimal
case) that r = Σ_{i=0}^{∞} b_i/2^{i+1}, where each b_i is 0 or 1 and is called the ith binary
digit or bit. An expansion 0.b0 b1 . . . bi . . . is finite if for some n, bi = 0 for all
i ≥ n; otherwise it is infinite. Infinite expansions are unique.

† To connect with the discussion on p. 451, here P on ω is given by n P m iff n ∈ f (m); thus
Pm = f (m).

Our purpose is to show that [0, 1] ∼ ^ω2 ∼ P(ω). Indeed, to each A ∈ P(ω)


we associate its characteristic function on ω, χ_A, defined by

χ_A(n) = 0  if n ∈ A;   χ_A(n) = 1  otherwise

It is clear that A ↦ χ_A is a 1-1 correspondence from P(ω) to ^ω2, which proves


that the rightmost ∼ holds.
We next consider [0, 1]inf and [0, 1]fin , the sets of all infinite and finite binary
expansions, respectively, of all the reals in [0, 1]. For example,

0 = 0.00… (all 0's),    1 = 0.11… (all 1's),

1/2 = 0.100… (all 0's)    but also    1/2 = 0.011… (all 1's)

Since the expansions in [0, 1]fin represent rationals (see Exercise VII.32),
we have [0, 1]fin ∼ ω and therefore

[0, 1]inf ∼ [0, 1]inf ∪ [0, 1]fin (1)

by Lemma VII.2.24. Since every non-zero real has a unique infinite binary
expansion (see also Exercise VII.33), (1) yields (0, 1] ∼ [0, 1]inf ∪ [0, 1]fin , and
one more application of Lemma VII.2.24 ([0, 1] ∼ (0, 1]) yields

[0, 1] ∼ [0, 1]inf ∪ [0, 1]fin

To conclude, observe that f ↦ 0.f(0)f(1)…f(i)… is a 1-1 correspondence
^ω2 → [0, 1]inf ∪ [0, 1]fin.  □

VII.3.9 Remark. The technique of Example VII.3.8 showed that ^x2 ∼ P(x)
for any set x.  □

VII.3.10 Theorem (Cantor's Theorem). For any set x, x ≁ P(x).

Proof. By contradiction, let there be a 1-1 correspondence f : x → P(x). This


leads to the family of sets ( f (a))a∈x , to which we can readily apply the technique
of Proposition VII.3.7:
Let D = {a ∈ x : a ∉ f(a)}. Thus D ≠ f(a) for all a ∈ x, yet D ⊆ x; thus
it must be an f (a) after all. This contradiction establishes the claim. 

We note that, relying on Example VII.3.8, we could have proved Cantor’s


theorem as follows: Let instead x ∼ P(x). Then also x ∼ ^x2. So let a ↦ g_a
be a 1-1 correspondence x → ^x2. The diagonal function d = λa.1 − g_a(a) is
different from each g_a (at a), yet it is a total 0-1-valued function on x, so it is a
g_a. Contradiction.
The reader will recognize that the two arguments are essentially identical.
Indeed, d = χ D .
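The following finite sketch (illustrative only; the small set x is an arbitrary choice) exhaustively checks Cantor's diagonal argument: for every function f : x → P(x), the set D = {a ∈ x : a ∉ f(a)} is never in the range of f.

```python
from itertools import product

# Exhaustive check of Cantor's diagonal argument on a small set x.
x = {0, 1, 2}
subsets = [frozenset(s) for s in
           [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]

# Every function f : x -> P(x) corresponds to a choice of one subset per element.
for choice in product(subsets, repeat=len(x)):
    f = dict(zip(sorted(x), choice))
    D = frozenset(a for a in x if a not in f[a])   # the diagonal set
    assert D not in f.values()                     # D is never a value of f
print("no f : x -> P(x) is onto; P(x) has", len(subsets),
      "elements, x has", len(x))
```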

VII.3.11 Example (Informal). We show here that (−1, 1) ∼ R, where


(−1, 1) = {x ∈ R : −1 < x < 1}. Indeed, let f on R be defined by

f(x) = x/(1 + |x|)

Trivially, f is total. Next, we see that it is 1-1. Indeed, let

a/(1 + |a|) = b/(1 + |b|)     (1)
where a and b are in R. This leads to

a − b = b|a| − a|b| (2)

By (1), ab ≥ 0, so we analyze (2) under just two cases.

Case 1: a ≥ 0 and b ≥ 0. Then a − b = ba − ab = 0.


Case 2: a ≤ 0 and b ≤ 0. Then a − b = −ba − (−ab) = 0.

Both cases lead to a = b, so f is 1-1.


Finally, f is onto (−1, 1). Indeed, let c ∈ (−1, 1). The reader can easily verify
that if c = 0, then f (0) = c; if − 1 < c < 0, then f (c/(1 + c)) = c; if 0 < c < 1,
then f (c/(1 − c)) = c. 
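A quick numerical sanity check (illustrative only) of the map f(x) = x/(1 + |x|) and of the inverse formulas read off from the onto argument above.

```python
# f maps R one-to-one onto (-1, 1); g is the inverse suggested by the proof.
def f(x):
    return x / (1 + abs(x))

def g(c):                       # c in (-1, 1)
    return c / (1 + c) if c < 0 else (0 if c == 0 else c / (1 - c))

for c in [-0.99, -0.5, 0.0, 0.25, 0.9]:
    assert abs(f(g(c)) - c) < 1e-12           # f(g(c)) == c
for x in [-1000.0, -2.5, 0.0, 3.0, 1e6]:
    assert -1 < f(x) < 1                      # range lies inside (-1, 1)
    assert abs(g(f(x)) - x) < 1e-6 * max(1.0, abs(x))
print("f is a bijection R -> (-1, 1) on the sampled points")
```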

VII.4. Cardinals
In this section, following von Neumann, we assign a measure of cardinality to
each set, its cardinal number.
At the very least, cardinal numbers must be ∼-invariants (i.e., equinumerous
sets must measure identically). It is also desirable that this measure be consistent
with the measures we have already accepted for the cardinality of finite sets,
since the latter perfectly fit with our intuition.
The requirement that cardinal numbers be ∼-invariants means that for any
set A, its cardinal number depends on the class of all sets equinumerous to

A rather than just on A. It was therefore natural that, at first, mathematicians


defined (Frege-Russell definition) the cardinal number of a set A to be “the ‘set’
of all sets equinumerous to A”. This, of course, eventually led to trouble because
these cardinal numbers were “too big” to be sets. For example, the cardinal
number of {∅} would be, according to this definition, the “set” of all singletons
(one-element sets), but this “set” is in 1-1 correspondence with the class of all
sets and urelements (via x ↦ {x}) and therefore is not a set, by the collection
axiom.
Thus cardinal numbers, as “defined” above, cannot be objects of study in our
theory. However, in the old way of doing set theory (where any collection, in
principle, was a set and therefore was entitled to be studied in the theory) there
were still problems, as even cardinal numbers of singletons would be closely
associated with the “self-contradictory notion” of the “set of all sets” (the reader,
once again, is referred to the discussion in Wilder (1963, pp. 98–100)).
The way out of this difficulty (von Neumann) is simple. Rather than take for
the cardinal number of A the class of all sets equinumerous to A, just take a
“canonical” or “normalized” representative from this class (the terms in quotes
mean that the representative ought not to depend too strongly on A itself,
so that ∼-invariance can be assured). By Zermelo’s theorem (VI.5.54), each
such class contains ordinals; so take the least such ordinal to measure the
cardinality.

VII.4.1 Definition. For any set x, its cardinal number, or cardinality, is defined
to be min{α : α ∼ x} and is denoted by Card(x). Cardinal numbers are also
simply referred to as cardinals. Thus a cardinal is just the cardinal number of
some set.
We shall use (in argot) lowercase fraktur letters to denote arbitrary cardinals,
i.e., cardinal-typed variables (e.g., a, b, m), but also lowercase Greek letters
around the middle of the alphabet, typically, κ and λ. The class of all cardinals
will be denoted by Cn. 

By definition, Cn ⊆ On.

Here are some useful and immediate consequences.

VII.4.2 Proposition. For any sets x and y the following hold:

(a) x ∼ Card(x).
(b) x ∼ y iff Card(x) = Card(y).

Proof. Part (a) follows immediately from the definition.


For (b), assume x ∼ y. Then Card(x) = min{α : α ∼ x} = min{α : α ∼ y} =
Card(y). This settles the only-if part. For the if part, use (a) to write x ∼ Card(x) =
Card(y) ∼ y. 

Part (b) shows that Card() is a ∼-invariant.

VII.4.3 Proposition. For any ordinal α, α ∈ Cn iff there is no β < α such that
β ∼ α.

Proof. Let α ∈ Cn be due to α = Card(x) for some set x. Consider the set (why
set?) S = {γ : γ ∼ x}. We have

α = min S (1)

If some β < α satisfies β ∼ α, then this contradicts (1), since β ∈ S. This argu-
ment establishes the only-if part.
For the if part, let there be no β < α such that β ∼ α. Then α is smallest in
{γ : γ ∼ α}, i.e., α = Card(α) and therefore α ∈ Cn. 

Proposition VII.4.3 gives a characterization of cardinals which is independent


of the sets whose cardinality these cardinals measure. It is helpful in showing
that specific ordinals are cardinals. Because of the proposition, cardinals are
also called initial ordinals.

VII.4.4 Example. Every natural number, i.e., finite ordinal, is a cardinal.


Indeed, by Corollary VII.1.8 we obtain that α ∼ β whenever α ∈ β ∈ ω.
Therefore, fixing attention on β, it is a cardinal by Proposition VII.4.3.
It follows that β = Card(β), but then (Definition VII.1.3) Card(β) = |β|,
so that the definition of cardinals for finite sets is indeed a special case of
Definition VII.4.1, as we hoped it would be. 

So far we have obtained that Cn ≠ ∅; in particular, ω ⊆ Cn.

VII.4.5 Example. We now establish that ω ∈ Cn. Indeed, just invoke Proposi-
tion VII.1.13 to see that α ∈ ω implies α ≁ ω.  □

We have just witnessed that ω is the smallest infinite cardinal (i.e., smal-
lest infinite ordinal that is a cardinal). Definitions VII.1.3 and VII.2.1 are
also worth restating in the present context: x is finite iff Card(x) < ω; it is

enumerable iff Card(x) = ω; it is countable iff Card(x) ≤ ω; it is uncountable


iff Card(x) > ω.

VII.4.6 Proposition. For any ordinal α,

(i) Card(α) ≤ α,
(ii) Card(α) = α iff α ∈ Cn.

Proof. (i): Card(α) = min S, where S = {γ : γ ∼ α}. But α ∈ S.


(ii): If part. Let α ∈ Cn. If Card(α) < α, then Card(α) ∼ α contradicts Propo-
sition VII.4.3.
Only-if part. Trivially, the hypothesis says “α is a cardinal”. 

VII.4.7 Example. For any cardinal m, Card(m) = m. Indeed, this just rephrases
Proposition VII.4.6(ii).
This observation is often usefully applied as Card(Card(x)) = Card(x), where
x is any set. 

VII.4.8 Proposition. Every infinite cardinal is a limit ordinal.

Proof. The claim is known to be true for ω. So let ω < a, and assume instead
that a = β + 1 for some β. Now, β is infinite; otherwise a = β + 1 < ω.
By Lemma VII.2.24, β ∪ {β} ∼ β; therefore a ∼ β, which along with β < a
contradicts that a is a cardinal. 

The above result shows that there are many “more” ordinals than cardinals.
For example, ω + 1, ω + 2, and ω + i for any i ∈ ω are not cardinals.
It also suggests the question of whether indeed there are any cardinals above
ω. This question will be eventually answered affirmatively. As a matter of fact,
there are so many cardinals that Cn is a proper class.
The following result is very important for the further development of the
theory of cardinal numbers.

VII.4.9 Theorem. For any sets A and B, A ⊆ B implies that Card(A) ≤


Card(B).

Proof. Let b = Card(B) and f : B → b be a 1-1 correspondence. Define < on


B by

x<y iff f (x) ∈ f (y) (1)



By VI.3.12, < well-orders B and hence A. Let α = ran(φ), where φ on A is


given by the inductive definition

φ(y) = {φ(x) : x < y ∧ x ∈ A} (2)

We know (VI.4.32) that ran(φ) ∈ On (which justifies calling it “α”) and that
φ(x) ∈ On for all x ∈ A.
We next show that

for all x ∈ A, φ(x) ⊆ f (x) (3)

We do so by induction over A with respect to <. So assume (3) for all x in A


such that x < y ∈ A (this is the I.H.), and prove it for y. Now if y is minimum
in A (basis), then φ(y) = ∅, from which the claim follows in this case. Let then
y be non-minimum and z ∈ φ(y). It follows from (2) that z = φ(x) for some
x ∈ A where x < y. By the I.H. z ⊆ f (x) ∈ f (y) ((1) contributes “∈”), and
since f (x) and f (y) are ordinals (being members of b), we obtain z ∈ f (y),
which concludes the inductive proof of (3).
We next observe that α ≤ b, i.e., α ⊆ b. Indeed, let γ ∈ α. Then γ = φ(x)
for some x ∈ A, so that γ ≤ f (x) by (3). Since f (x) ∈ b, i.e., f (x) < b, we
get γ < (i.e., ∈)b.
This last result, along with Propositions VII.4.2 and VII.4.6, yields
Card(A) = Card(α) ≤ α ≤ b = Card(B). 

The above theorem will provide the basic tools to compare cardinalities of
sets. To this end we introduce a definition.

VII.4.10 Definition. For two sets A and B, A ≼ B means that there is a total
and 1-1 f : A → B.  □

Intuitively, whenever A ≼ B, B has at least as many elements as A. We will


indeed see in VII.4.14 that Card(A) ≤ Card(B) is derivable under the circum-
stances. Let us, however, first state some trivial but useful observations.

VII.4.11 Proposition.

(i) ≼ is reflexive and transitive.
(ii) If A ⊆ B, then A ≼ B.
(iii) A ≼ B iff A ∼ C for some C ⊆ B.
(iv) If A ∼ B, then A ≼ B.

Proof. Exercise VII.37. 

VII.4.12 Example. Case (iv) in Proposition VII.4.11 cannot be improved to
"iff". For example, x ↦ {x} establishes a ≼ P(a) for any set a. We know how-
ever that a ≁ P(a) by Cantor's theorem (VII.3.10). This motivates the following
definition.  □

VII.4.13 Definition. If A ≼ B but A ≁ B, then we write A ≺ B.  □

VII.4.14 Proposition. For any sets A ≠ ∅ and B, the following are equivalent:
(i) A ≼ B.
(ii) There is an onto function f : B → A.
(iii) Card(A) ≤ Card(B).

Proof. The equivalence of (i) and (ii) follows directly from V.3.9. Next, let
us assume (i) and prove (iii). By Proposition VII.4.11(iii), A ∼ C ⊆ B for
some C. Hence, using Propositions VII.4.2 and VII.4.9, Card(A) = Card(C) ≤
Card(B). Conversely, assume now (iii), i.e., Card(A) ⊆ Card(B). The diagram
below shows that g ◦ i ◦ f : A → B is total and 1-1, thus establishing (i), where
i : Card(A) → Card(B) is the inclusion map x ↦ x and f : A → Card(A)
and g : Card(B) → B are 1-1 correspondences:

                 g ◦ i ◦ f
          A --------------→ B
          |                 ↑
        f |                 | g
          ↓                 |
       Card(A) ----------→ Card(B)
                   i

VII.4.15 Corollary. For any sets A and B, A ≺ B iff Card(A) < Card(B).

Proof. If part. The hypothesis yields A ≼ B and A ≁ B (otherwise the cardinals
of the two sets would be equal). Hence A ≺ B.
Only-if part. The hypothesis yields Card(A) ≤ Card(B) and Card(A) ≠
Card(B).  □

VII.4.16 Corollary (Cantor). For any set a, Card(a) < Card(P(a)).

Proof. Indeed, a ≺ P(a) by Example VII.4.12. 



By Corollary VII.4.16, there are infinitely many cardinals. Indeed, for any a,
Card(P(a)) is a bigger cardinal. The preceding proposition relates comparisons
of sets (as to size) with comparisons of their cardinal numbers and leads to
the following important result, which has several names attached to it: Cantor,
Dedekind, Schröder, and Bernstein.

VII.4.17 Theorem (Cantor-Bernstein). For any sets A and B, if A ≼ B and
B ≼ A then A ∼ B, and conversely.

Proof. The "conversely" part directly follows from Proposition VII.4.11. For
the rest, observe that A ≼ B and B ≼ A yield Card(A) ≤ Card(B) and Card(B) ≤
Card(A) respectively, by Proposition VII.4.14; thus Card(A) = Card(B).  □

Our approach to cardinals relies on AC. Some authors define cardinal numbers in
a way independent of AC (see, for example, Levy (1979)). In such an approach,
there is a more obscure – but AC-free – proof† of the Cantor-Bernstein theorem,
which we include here.

Let f : A → B and g : B → A be total and 1-1. We want to conclude that


A ∼ B.
Consider the operator Γ over A given by

Γ(S) = (A − g[B]) ∪ g ◦ f [S]     (1)

for all S ⊆ A. Clearly Γ is monotone, so for some X ⊆ A we have Γ(X) = X.
(For example, the least fixed point of Γ will do for X. You may want to review Theorem VII.1.27.) Set

X′ = f [X]     (2)
Y = A − X     (3)
Y′ = B − X′     (4)

so that

A = X ∪ Y  and  X ∩ Y = ∅     (5)
B = X′ ∪ Y′  and  X′ ∩ Y′ = ∅     (6)

† Of course, AC enters via the Zermelo theorem in Definition VII.4.1 and in the proof of Theo-
rem VII.4.9, on which the above-given proof of the Cantor-Bernstein theorem is based.

We will show that Y = g[Y′]. Indeed,

g[Y′] = g[B − X′]                   by (4)
      = g[B] − g[X′]                since g is 1-1 and total
      = g[B] − g ◦ f [X]            by (2)
      = g[B] ∩ (A − g ◦ f [X])       (7)

By (7) and De Morgan's law, A − g[Y′] = (A − g[B]) ∪ g ◦ f [X] = Γ(X) = X.
By (5), this is what we want.
To conclude, define h : A → B by

h(x) = f(x)  if x ∈ X;   h(x) = g⁻¹(x)  if x ∈ Y

where g −1 : ran(g) → B is, of course, a 1-1 correspondence. Clearly, A ∼ B via


h. This concludes the AC-free proof.
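The AC-free argument is effective enough to run on concrete data. The sketch below (the sets Z and N and the particular injections f and g are arbitrary illustrative choices, not taken from the text) decides membership in X = ⋃ₖ (g∘f)ᵏ[A − g[B]] by chasing the unique (g∘f)-preimages backwards, and assembles the bijection h exactly as displayed above.

```python
# Cantor-Bernstein, run on A = Z (the integers) and B = N (the naturals).
# f : A -> B and g : B -> A are total and 1-1 but not onto; illustrative choices.
def f(z):                       # Z -> N, 1-1: multiples of 4 and numbers = 3 mod 4
    return 4 * z if z >= 0 else -4 * z - 1

def g(n):                       # N -> Z, 1-1 (the inclusion map)
    return n

def in_X(a):
    """a is in X = union over k of (g o f)^k [A - g[B]] iff the backward
    (g o f)-chain from a reaches A - g[B]; a cycle or a dead end means 'no'."""
    seen = set()
    while a not in seen:
        seen.add(a)
        if a < 0:               # A - g[B] = the negative integers
            return True
        if a % 4 == 0:          # unique (g o f)-preimage, when it exists
            a = a // 4
        elif a % 4 == 3:
            a = -(a + 1) // 4
        else:
            return False        # no preimage and a is not in A - g[B]
    return False                # the backward chain cycles

def h(a):                       # the bijection Z -> N built in the proof
    return f(a) if in_X(a) else a   # g^{-1} is the identity here, and Y lies in N

sample = list(range(-6, 7))
values = [h(a) for a in sample]
assert len(set(values)) == len(values)      # h is 1-1 on the sample
print(dict(zip(sample, values)))
```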

VII.4.18 Example (Informal). The self-contradictory notion of the “set of all


sets”. Let us travel back to the point in time prior to the introduction of the
axiomatic foundation of set theory. At that point sets and classes meant the
same thing. The statement x ∈ x was not necessarily false for all sets x; thus
the notion of the set of all sets would not be disallowed via this route. Instead,
a cardinality argument was then applied to show that the set of all sets could
not possibly exist: Indeed, if V is the set of all sets, then P(V) ⊆ V, therefore
P(V) ≼ V. Since also (VII.4.12) V ≼ P(V), we get V ∼ P(V), contradicting
Cantor’s theorem. 

VII.4.19 Example (Informal). Let us see that (0, 1] × (0, 1] ∼ (0, 1].
Indeed, (0, 1]² ≼ (0, 1] via the function ⟨0.a0a1 … ai …, 0.b0b1 … bi …⟩ ↦
0.a0b0a1b1 … aibi …, which is clearly total and 1-1 on the understanding
that we only utilize infinite expansions. On the other hand, (0, 1] ≼ (0, 1]² via
x ↦ ⟨x, 1⟩. The result follows from the Cantor-Bernstein theorem.
Compare the proof just given with the one you gave in Exercise VII.35. 
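A sketch of the interleaving map on finitely many digits (finite strings stand in for the infinite expansions of the text; purely illustrative).

```python
# Interleave the digit sequences of two expansions 0.a0a1... and 0.b0b1...
def interleave(a_digits, b_digits):
    return "".join(x + y for x, y in zip(a_digits, b_digits))

def split(c_digits):                 # the left inverse: un-interleave
    return c_digits[0::2], c_digits[1::2]

a, b = "142857", "333333"            # 1/7 and 1/3, truncated
c = interleave(a, b)
print(c)                             # 134323835373
assert split(c) == (a, b)            # the pair <a, b> is recoverable, so the map is 1-1
```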

We next turn our attention to the transfinite sequence of cardinals.



VII.4.20 Proposition. If S is a set of cardinals, then ∪S is a cardinal. More-
over, a ≤ ∪S for all a ∈ S.

Proof. By VI.5.22, ∪S is an ordinal. We show (see Proposition VII.4.6) that
Card(∪S) = ∪S by contradiction.

So let Card(∪S) < ∪S, i.e., Card(∪S) ∈ ∪S. By the definition of ∪,

Card(∪S) ∈ a ∈ S  for some a     (1)

and hence

Card(∪S) < a ⊆ ∪S  for some a     (2)

By Theorem VII.4.9, (2) yields a contradiction:

Card(∪S) < a ≤ Card(∪S)

Finally, that a ≤ ∪S for all a ∈ S follows from Theorem VI.5.22.  □

VII.4.21 Corollary. Cn is not a set.



Proof. If it were a set, then a = ∪Cn for some a ∈ Cn. But then a <
Card(P(a)) ∈ Cn contradicts the previous proposition (which yields
Card(P(a)) ≤ ∪Cn = a).  □

VII.4.22 Definition. For any cardinal a, its cardinal successor, a+ , is the smal-
lest cardinal > a. 

The above definition makes sense by Corollary VII.4.16 and the remark
following it, since Cn is well-ordered by < (i.e., ∈). We can now define the
alephs:

VII.4.23 Definition. The aleph transfinite sequence is given by the total func-
tion α ↦ ℵα on On defined inductively as follows:

ℵ0 = ω
ℵα+1 = (ℵα)⁺
ℵα = ∪{ℵβ : β < α}   if Lim(α)

Each ℵα is an aleph.
A cardinal ℵα with Lim(α) is called a limit cardinal, while one such as ℵα+1
is called a successor cardinal. 

VII.4.24 Remark. (1) The reader will note that the term “limit cardinal” applies
to the index of an infinite cardinal in the aleph sequence. It does not refer to the
cardinal itself (all infinite cardinals are limit ordinals by VII.4.8).

(2) If Lim(α), then ℵα = ∪{ℵβ+1 : β < α}. Indeed, by VII.4.20, ∪{ℵβ+1 :
β < α} is a cardinal, say

a = ∪{ℵβ+1 : β < α}

Since β < α implies β + 1 < α, a ≤ ℵα . On the other hand, γ ∈ ℵα implies that,


for some β < α, γ ∈ ℵβ . But ℵβ < ℵβ+1 by definition, and ℵβ+1 is transitive.
Thus, γ ∈ ℵβ+1 ⊆ a. 

VII.4.25 Proposition. The function α ↦ ℵα is strictly increasing.

Proof. By definition, ℵα < ℵα+1. The result follows, by VI.5.38.  □

In particular, α ↦ ℵα is normal.

The next theorem shows how to “compute” a+ .

VII.4.26 Theorem. For all a, a+ = {α : Card(α) ≤ a}.

Proof. Let us set S = {α : Card(α) ≤ a}.


First, S is transitive: Indeed, let α ∈ β ∈ S. This yields α ⊆ β; hence,
Card(α) ≤ Card(β) ≤ a, the last ≤ by definition of S. Thus α ∈ S.
Second, S is a set: Indeed, if α ∈ S, then α < a+ , for otherwise a+ ≤
Card(α) ≤ a. Therefore, S ⊆ a+ , and hence S is a set. It follows that S is
an ordinal.
Let next, Card(S) ∈ S. Then Card(Card(S)) ≤ a and therefore Card(S) ≤ a
(see VII.4.7) which yields S ∈ S, a contradiction.
Therefore S is a cardinal. Clearly a < S, as the previous paragraph shows.
By VII.4.22, a+ ≤ S. 

By the previous theorem, ℵ1 = {α : Card(α) ≤ ω}. That is, ℵ1 is the set of all
countable ordinals.
It is also noted that for each α, (ℵα )+ ≤ Card(P(ℵα )), since ℵα < Card(P(ℵα ))
by Cantor’s theorem, while (ℵα )+ is the smallest cardinal above ℵα .
The conjecture (Hausdorff)

ℵα+1 = Card(P(ℵα ))

is the generalized continuum hypothesis, or GCH, whereas the special case


conjectured by Cantor,

ℵ1 = Card(P(ℵ0 )) (1)

is the continuum hypothesis, or CH.



Gödel (1938, 1939, 1940) showed, using L, that GCH is consistent with
the Zermelo-Fraenkel (+AC) axioms of set theory, and Cohen (1963) showed
that ¬GCH is also consistent with ZFC. Thus GCH is independent of the ZFC
axioms; these axioms can neither prove it nor disprove it. So, as with AC,
one can adopt either GCH or ¬GCH, as an axiom. This is not generally done,
however. The other axioms of ZFC (including AC) are widely accepted as
“really true”, being counterparts of reasonable principles (e.g., substitution,
foundation), whereas our intuition does not help us at all to choose between
GCH or ¬GCH. Our principles (or axioms) are not adequate to settle this
question, and one hopes that additional intuitively “true” axioms will eventually
be discovered and added which will settle GCH. It is noted that if one adopts
GCH for the sake of experimentation, then several things become simpler in
set theory (e.g., cardinal arithmetic – see Section VII.6), and even the axiom
of choice becomes a theorem† (the interested reader is referred to Levy (1979,
p. 190)).
In the “real realm”, because P(ω) ∼ R (by VII.3.8, VII.3.11, and Exer-
cise VII.34), CH can be rephrased to read there is no cardinal between ω
and Card(R), or also every subset of R either has the cardinality of R or is
countable.

Digression. We briefly look into an alternative definition of cardinals, which


is not based on the axiom of choice. This digression can be skipped without
harm, as it is not needed for the rest of our development of set theory. In-
deed, it is incompatible with Definition VII.4.1, which we are following (see
Remark VII.4.28(3) below). Yet, the reader who is interested in foundational
questions will find the material here illuminating.

VII.4.27 Definition (Frege-Russell-Scott). For any set A, its cardinal number


or cardinality, in symbols Card(A), is the class of all sets of least rank (ρ)
equinumerous to A.
A cardinal is a class which is the cardinal number of some set. 

VII.4.28 Remark. (1) The above definition is essentially the original due to
Frege and Russell suggested in the preamble to this section, where the “size” of
cardinals has been drastically reduced down to set size (see VI.6.29). We state
this as Proposition VII.4.29 below.
(2) The cardinal number of a set A does not necessarily contain A (i.e., card-
inals of Definition VII.4.27 are not equivalence classes). To see this, look, for

† For this to be non circular cardinals must be introduced in an AC-free manner. See the following
Digression.

example, at Card({ω}). By Proposition VI.6.21, ρ(ω) = ω + 1; thus ρ({ω}) =


ω + 2 (VI.6.24).
Now let a be any urelement. Then ρ({a}) = 1 and {ω} ∼ {a}. Thus {a} ∈
Card({ω}), but {ω} ∉ Card({ω}).
(3) Card(∅) = {∅}, an ordinal. However, if x ≠ ∅ then ∅ ∉ Card(x) ≠ ∅,
since x ≁ ∅; i.e., cardinals of nonempty sets are not ordinals.  □

VII.4.29 Proposition. For any set x, Card(x) is a set.

Proof. See VI.6.29. 

As before, Cn denotes the class of all cardinals.

VII.4.30 Proposition. For any sets x and y, x ∼ y iff Card(x) = Card(y).

Proof. If part. Let a ∈ Card(x) = Card(y). Then x ∼ a ∼ y.


Only-if part. Directly from Definition VII.4.27. 

The above is the counterpart of Proposition VII.4.2(b), this time under Def-
inition VII.4.27. It has been shown by Pincus (1974) that one cannot define
“Card()” in ZF so that it satisfies x ∼ Card(x) as well.
One now proceeds by adopting Definitions VII.4.10 and VII.4.13 for 
and ≺. In particular, Proposition VII.4.11 is derivable. Next, ≤ on cardinals
is defined through ≼ as in VII.4.31 below. We also observe that if a ∼ a′ and
b ∼ b′, then a ≼ b yields a′ ≼ b′ and a ≺ b yields a′ ≺ b′ (Exercise VII.39).

VII.4.31 Alternative Definition (Cantor). Card(a) < Card(b) means a ≺ b.


Card(a) ≤ Card(b) means a ≼ b.  □

The above definition embodies the equivalence of (i) and (iii) of Proposi-
tion VII.4.14. Here Theorem VII.4.9 trivially holds via VII.4.11(ii). “Cantor’s
theorem” (VII.4.16) also holds. The Cantor-Bernstein theorem is proved by the
AC-free proof that follows VII.4.17 (p. 463). This yields that < is a partial order
on Cn. Indeed, irreflexivity is immediate (Card(a) < Card(a) requires a ≁ a).
Transitivity is obtained as follows:
Let a < b < c and therefore Card(a) < Card(b) < Card(c) for appropriate
a, b, and c. By VII.4.31, a ≺ b ≺ c; hence a ≼ c by Proposition VII.4.11(i). If
a ∼ c, then (Exercise VII.39) b ≺ a, and hence a ∼ b by the Cantor-Bernstein
theorem, contradicting a ≺ b. Thus a ≺ c, i.e., a < c.
Proposition VII.4.20 has the following counterpart:

VII.4.32 Proposition. If S is a nonempty set of cardinals, then there is a cardi-


nal b such that a ≤ b for all a ∈ S.

Proof. Let T = {ρ(a) : a ∈ S}. T is a set (of ordinals) by collection; hence ∪T
is an ordinal α such that β ≤ α for all β ∈ T (Theorem VI.5.22).
Thus x ∈ a ∈ S implies ρ(x) < ρ(a) ≤ α, so that

x ∈ VN (α) (1)

By (1), a ⊆ VN (α) for any a ∈ S. Thus b = Card(VN (α)) will do, by VII.4.11(ii).


From the above, one obtains, once again, Corollary VII.4.21: If Cn is a set,
then let b satisfy a ≤ b for all a ∈ Cn. Then since Card(P(b)) ∈ Cn, we get
Card(P(b)) ≤ b, contradicting Cantor’s theorem.

VII.4.33 Exercise. Comment on the alternative proof of Proposition VII.4.32
that proceeds as follows: Represent each a ∈ S as Card(A) for an appropriate
set A. Let T be the union of all these A's. T is a set, and A ⊆ T for each A. Thus
a = Card(A) ≤ Card(T) by VII.4.11(ii). Therefore, b = Card(T) will do.  □

The following is the counterpart of Theorem VII.4.26.

VII.4.34 Theorem (Hartogs (1915)). For any set x there is an ordinal α such
that α ⋠ x.

Proof. Let S = {β : β ≼ x}. First, by VII.4.11(i), S is transitive.
We next verify that S is a set and therefore an ordinal, say S = α. To see this,
consider the class W = {⟨A, R⟩ : A ⊆ x ∧ R ⊆ A × A is a well-ordering of A}.
Since W ⊆ P(x) × P(x × x), W is a set.
If ⟨A, R⟩ ∈ W, then ⟨A, R⟩ ≅ β for a unique β via a unique order isomor-
phism φ_{A,R} : A → β. Clearly β ∈ S by VII.4.11(iii), and conversely each β ∈ S
and any particular total 1-1 function f : β → x induces a well-ordering R on
A = ran(f) ⊆ x, so that β = ‖⟨A, R⟩‖ (VI.3.12 and VI.4.33).
We conclude that the function ⟨A, R⟩ ↦ ‖⟨A, R⟩‖ : W → S is onto, and
thus S is a set by collection. If now α ≼ x, then α ∈ α. We must conclude that
α ⋠ x.  □

We conclude this digression by observing that < on cardinals, as these were


defined in VII.4.27, is a partial order (see the discussion following Defini-
tion VII.4.31) but that without the axiom of choice it cannot be shown to be a

total order. Of course, in the presence of AC one would rather define cardinals,
as we do, by Definition VII.4.1.

VII.4.35 Theorem. (Hartogs (1915)). AC is equivalent to the statement “<


of Definition VII.4.31 is a total order between cardinals, as these were defined
in VII.4.27”.

Proof. First assume the statement in quotes and prove AC.


Let x be any nonempty set and α be such that α ⋠ x by Theorem VII.4.34.
Thus neither Card(α) = Card(x) nor Card(α) < Card(x). By assumption
Card(x) < Card(α), i.e., x ≺ α, say, via the total 1-1 function f : x → α.
f −1 : ran( f ) → x well-orders x by VI.3.12. This proves that every non-
empty set can be well-ordered, and hence proves AC.
Assume AC now. By Zermelo’s theorem, if x and y are sets, then x ∼ α and
y ∼ β for some α and β. Without loss of generality, say α ≤ β. Then α ≼ β.
Now invoke Exercise VII.39. 

VII.5. Arithmetic on Cardinals


In set theory and other branches of mathematics one often wants to compute
the cardinality of a set a which is formed in a particular way from given sets
whose cardinalities we already know. Failing this, one is often content to at
least compute an approximation of the cardinality of a, preferably erring on the
high side. This section develops some tools to carry out such computations.

VII.5.1 Definition. For any cardinals a and b, a +c b stands for Card({0} ×


a ∪ {1} × b), their sum. 

The sum operation is denoted by the cumbersome +c to avoid confusion


with ordinal addition.

VII.5.2 Proposition. If A and B are disjoint sets, then Card(A) +c Card(B) =


Card(A ∪ B).

Proof. Let a = Card(A) and b = Card(B). Since a ∼ {0} × a via x ↦ ⟨0, x⟩ and
b ∼ {1} × b via x ↦ ⟨1, x⟩, there are 1-1 correspondences f : A → {0} × a and
g : B → {1} × b. Since A ∩ B = {0} × a ∩ {1} × b = ∅, it follows that f ∪ g is a
1-1 correspondence and A ∪ B ∼ {0} × a ∪ {1} × b; thus a +c b = Card(A ∪ B).


VII.5.3 Corollary. For any sets A and B, Card(A∪ B) ≤ Card(A) +c Card(B).

Proof. The function f : {0} × A ∪ {1} × B → A ∪ B given by ⟨i, x⟩ ↦ x is


onto. The claim now follows from VII.5.2 and VII.4.14(ii). 

VII.5.4 Remark. By VII.5.2, to compute a +c b it suffices to compute


Card(A∪ B) for any disjoint A and B that have cardinalities a and b respectively.
This observation proves to be very convenient in practice. 

VII.5.5 Example. We verify that ω +c ω = ω. Indeed, observe that ω ∼ E and


ω ∼ O, where E and O are the even and odd natural numbers respectively, and
apply VII.5.2.
It is important to observe that +c (cardinal addition) is different from + on
ordinals (ω ≠ ω + ω).  □
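As a quick sanity check of ω +c ω = ω, here is the standard pairing of two disjoint tagged copies of ω with the evens and odds (an illustrative sketch only).

```python
# {0} x omega goes to the evens, {1} x omega to the odds: a 1-1 correspondence
# from the disjoint union of two copies of omega onto omega.
def code(tag, n):
    return 2 * n + tag          # tag is 0 or 1

def decode(m):
    return (m % 2, m // 2)

sample = [(t, n) for t in (0, 1) for n in range(10)]
assert all(decode(code(t, n)) == (t, n) for t, n in sample)        # 1-1
assert sorted(code(t, n) for t, n in sample) == list(range(20))    # onto an initial segment
print("omega +_c omega = omega, witnessed by (tag, n) -> 2n + tag")
```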

VII.5.6 Example (Informal). We verify that ω +c Card(R) = Card(R).† In-


deed, (0, 1) ∩ ω = ∅ and (0, 1) ∼ R (Exercise VII.34). Thus
ω +c Card(R) = Card(ω ∪ (0, 1)) ≤ Card(R) (1)
where the = follows from VII.5.2 and the ≤ from ω ∪ (0, 1) ⊆ R. Similarly,
Card(R) ≤ Card(ω ∪ (0, 1)) by Exercise VII.34 and (0, 1) ⊆ ω ∪ (0, 1). This
and (1) establish the claim. Alternatively, ω ∪ (0, 1) ∼ (0, 1) by VII.2.24. 

The basic properties of addition, worked out by Cantor, are captured by the
following theorem.

VII.5.7 Proposition (Cantor). For any cardinals the following hold:


(i) a +c 0 = a
(ii) a +c b = b +c a
(iii) (a +c b) +c c = a +c (b +c c)
(iv) If a ≤ b, then a +c c ≤ b +c c
(v) a ≤ b iff for some c, b = a +c c.

Proof. (i)–(iii): by VII.5.4.


For (iv), let A, B, C be mutually disjoint sets such that a = Card(A),
b = Card(B), c = Card(C). By VII.4.14, there is an onto f : B → A. Then
f ∪ i : B ∪ C → A ∪ C is an onto function, where i : C → C is the identity.

† The cardinality of the set of real numbers is often denoted by c in the literature (“c” stands for
“continuum”).

For (v), let a ≤ b and A, B be as above; hence (VII.4.14) there is a total, 1-1
f : A → B. Let C = B −ran( f ) (this might be empty). Set c = Card(C). Now,
C ∩ ran( f ) = ∅, and Card(ran( f )) = a. Thus, a +c c = Card(ran( f ) ∪ C) = b.
This settles the only-if part. For the if part start with A ∩ C = ∅ such that
a = Card(A), c = Card(C), and set B = A ∪ C, b = Card(B) = a +c c. Since
i : A ⊆ B (the inclusion map, given by i(x) = x for all x ∈ A) is total and 1-1,
we get a ≤ b by VII.4.14. 

VII.5.8 Proposition. +c ↾ (ω × ω) = + ↾ (ω × ω).

Proof. m + n = m ∪ {m + k : k ∈ n} by V.1.25 (or VI.10.4). The function


f : n → {m + k : k ∈ n}, given by f (k) = m + k, is a 1-1 correspondence
(1-1-ness by VI.10.2). Thus,

m + n = Card(m + n) since m + n ∈ ω
= Card(m) +c Card({m + k : k ∈ n})
= m +c n 

We next turn to the multiplication of cardinals. The definition is motivated


from the intuitive observation that if |A| = n and |B| = m, then |A× B| = m · n.
The validity of this observation will be formally verified below.

VII.5.9 Definition. For any cardinals a and b, a ·c b stands for Card(a × b),
their product. 

The cumbersome “·c ” for cardinal multiplication is used to distinguish this


operation from “·”, ordinal multiplication. We prove the analogue of VII.5.2
first:

VII.5.10 Proposition. If a = Card(A) and b = Card(B), then a ·c b = Card(A × B).

Proof. Let f : a → A and g : b → B be 1-1 correspondences. It follows that


λγδ.⟨f(γ), g(δ)⟩ is a 1-1 correspondence a × b → A × B.  □

VII.5.11 Example. In view of VII.2.9, ω ·c ω = ω < ω · ω; thus

a ·c b ≠ a · b, in general.

See, however, the case of finite cardinals below.  □
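For ω ·c ω = ω one can exhibit an explicit pairing; the Cantor pairing function below is one standard choice (the text's J of Section VI.7 may differ in its details, so this is only an illustrative stand-in).

```python
# Cantor's pairing: a 1-1 correspondence omega x omega -> omega.
def pair(m, n):
    return (m + n) * (m + n + 1) // 2 + n

N = 30
codes = {pair(m, n) for m in range(N) for n in range(N)}
assert len(codes) == N * N                      # 1-1 on the sample
assert set(range(N * (N + 1) // 2)) <= codes    # every small number is hit
print("omega x omega ~ omega, e.g. pair(3, 5) =", pair(3, 5))
```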

VII.5.12 Proposition. ·c ↾ (ω × ω) = · ↾ (ω × ω).

Proof. We have

m · n = Card(m · n) since m · n ∈ ω
= Card(n × m) by VI.10.18
= Card(m × n)   via ⟨k, l⟩ ↦ ⟨l, k⟩
= m ·c n by VII.5.9


VII.5.13 Proposition (Cantor). For any cardinals, the following hold:

(i) a ·c 0 = 0
(ii) a ·c 1 = a
(iii) a ·c b = b ·c a
(iv) (a ·c b) ·c c = a ·c (b ·c c)
(v) If a ≤ b, then a ·c c ≤ b ·c c
(vi) a ·c (b +c c) = a ·c b +c a ·c c.

Proof. We apply VII.5.10 throughout. Thus, (i) follows from A × ∅ = ∅.


(ii) follows from the fact that x ↦ ⟨x, 0⟩ is a 1-1 correspondence A → A × 1.
For (iii) note that A × B ∼ B × A via ⟨x, y⟩ ↦ ⟨y, x⟩. (iv) is a consequence of
(A × B) × C ∼ A × (B × C) via ⟨⟨x, y⟩, z⟩ ↦ ⟨x, ⟨y, z⟩⟩ (recall how the triple
⟨x, y, z⟩ was defined).
For (v), let f : A → B be 1-1 and total. Then ⟨x, y⟩ ↦ ⟨f(x), y⟩ is 1-1 and
total A × C → B × C.
Finally, for (vi), take b = Card(B) and c = Card(C) with B ∩ C = ∅, and
note that A × (B ∪ C) = (A × B) ∪ (A × C) and that (A × B) ∩ (A × C) = ∅.


The following result, along with the Cantor-Bernstein theorem, assists in


computations where +c and ·c are involved. It shows that cardinal addition and
multiplication are nowhere near as rich as their ordinal counterparts.

VII.5.14 Theorem. For any a ≥ ω, a ·c a = a.

Proof. Let a ≥ ω be the smallest cardinal for which the claim fails. Then a > ω
by VII.5.11. Let a × a ≅ β via the J of Section VI.7, i.e., J[a × a] = β. We
know that J[a × a] ≥ a by Exercise VI.30, so a < β; therefore

a ≅ the segment of a × a determined by ⟨γ, δ⟩, for some ⟨γ, δ⟩ ∈ a × a     (1)

Since Lim(a) (by VII.4.8), take a λ < a to satisfy also max(γ , δ) < λ. Thus,
the isomorphism in (1) establishes a  λ × λ. Therefore
a = Card(a) ≤ Card(λ × λ) = Card(λ) ·c Card(λ) = Card(λ) (2)
the last “=” by minimality of a, and Card(λ) ≤ λ < a (using VII.4.6). We now
have a contradiction a ≤ Card(λ) < a. 

A side effect of the proof is that J[ℵα × ℵα] = ℵα for all α.

VII.5.15 Corollary. For any a ≥ ω, a +c a = a.

Proof. Using the arithmetic of VII.5.13,


a +c a = a ·c 1 +c a ·c 1
= a ·c (1 +c 1)
= a ·c (1 + 1)
= a ·c 2
≤ a ·c a by VII.5.13(iii) and (v)
=a
But we also have a ≤ a +c a. 

VII.5.16 Example. Constructions in mathematics often result in a family of


sets (Ai )i∈I where we have the “estimates” of cardinalities
Card(Ai ) ≤ a for all i ∈ I (1)
and
Card(I ) ≤ b (2)
We can then estimate that

Card(⋃_{i∈I} Ai) ≤ a ·c b     (3)

Indeed, using AC, pick for each i ∈ I an onto fi : a → Ai, which is legitimate
by (1). Define g : a × I → ⋃_{i∈I} Ai by

g(γ, i) = fi(γ)   for all γ ∈ a, i ∈ I

Now g is onto; hence

Card(⋃_{i∈I} Ai) ≤ Card(a × I) = a ·c Card(I) ≤ a ·c b

That is (3).  □

Apart from our use of AC in the definition of cardinals (through the well-
ordering theorem), the above result also invoked AC (twice) additionally (why
twice?). AC can be avoided if we are content to prove instead: “Assume that
cardinals were defined without AC, say as in VII.4.27. Now, assume (1) and (2)

above, and moreover let I and i∈I Ai be well-orderable. Then (3) follows
within ZF.”

Indeed, let α = ‖⟨⋃_{i∈I} Ai, <₁⟩‖ with respect to some arbitrarily chosen well-
ordering <₁ of this set. Let us also pick a well-ordering <₂ of I. Define, for
each x ∈ ⋃_{i∈I} Ai,

f(x) = ⟨i, γ⟩,  where i = (<₂-min){j ∈ I : x ∈ Aj}  and  γ = ‖⟨<₁x, <₁⟩‖

Clearly, f : ⋃_{i∈I} Ai → I × α is total and 1-1. Hence,

Card(⋃_{i∈I} Ai) ≤ Card(I × α) = Card(I) ·c Card(α) ≤ b ·c a = a ·c b

VII.5.17 Example. Here is a situation where we may want to use the technique
of the previous example: We have a first order language of logic, where the set of
nonlogical symbols has cardinality k. How “many” formulas can this language
have? Well, no more than strings over the alphabet of the language. Now the
cardinality of the alphabet, L, is

Card(L) = ω +c k = ω  if k ≤ ω;   = k  otherwise
where ω is the cardinality of the set of logical symbols (assuming the object
variables are v0 , v1 , . . .).
A “string” of length n < ω is, of course, a member of L n . An easy induction
on n, via VII.5.14, shows that

Card(Lⁿ) = ω  if k ≤ ω;   = k  otherwise

Thus, the set of all strings over L, ⋃_{n∈ω} Lⁿ, has cardinality

Card(⋃_{n∈ω} Lⁿ) ≤ ω ·c ω = ω  if k ≤ ω;   ≤ k ·c ω = k  otherwise

Can you sharpen the ≤ into =? 
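To make the countable case plausible, here is a dovetailing enumeration of all finite strings over a countably infinite alphabet (an illustrative sketch; the symbols v0, v1, … are stand-ins for the alphabet of an actual language).

```python
from itertools import count, islice, product

# Enumerate all finite strings over the countable alphabet {v0, v1, v2, ...}
# by dovetailing: stage s lists the strings of length <= s over the first s symbols.
def all_strings():
    emitted = set()
    for s in count(1):
        alphabet = [f"v{i}" for i in range(s)]
        for length in range(s + 1):
            for tup in product(alphabet, repeat=length):
                if tup not in emitted:
                    emitted.add(tup)
                    yield "".join(tup) if tup else "<empty>"

print(list(islice(all_strings(), 12)))   # every finite string eventually appears
```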

We define, finally, cardinal exponentiation. This again turns out to be far too
“easy” by comparison with ordinal exponentiation.

VII.5.18 Definition. For any a, b, a^b denotes Card(^b a).  □

First, let us give the analogues of VII.5.2 and VII.5.10.

VII.5.19 Proposition. If a = Card(A) and b = Card(B), then Card(^b a) =
Card(^B A).

Proof. Let f : a → A and g : b → B be 1-1 correspondences. As the following
commutative diagram (cf. V.3.11) shows, F : ^b a → ^B A given by

F(h) = f ◦ h ◦ g⁻¹

is a 1-1 correspondence, with inverse λk. f⁻¹ ◦ k ◦ g : ^B A → ^b a:

                 h
          b ----------→ a
          ↑             |
      g⁻¹ |             | f
          |             ↓
          B ----------→ A
                 k

(here k = F(h), equivalently h = F⁻¹(k))  □

VII.5.20 Remark. By VII.3.9, 2^Card(A) = Card(^A 2) = Card(P(A)) for all
sets A.
In particular,

2^ℵ0 = Card(P(ω)) = c,   by VII.3.8  □

VII.5.21 Proposition (Cantor). Cardinal exponentiation obeys the following:

(i) a^0 = 1
(ii) a^1 = a
(iii) a^(k +c l) = a^k ·c a^l
(iv) (a^k)^l = a^(k ·c l)
(v) a^k ≤ b^k whenever a ≤ b.

Proof. (i): The empty function ∅ is the only member of ^0 a; hence Card(^0 a) = 1.
For (ii), the set of total functions f : 1 → a is ^1 a = {{⟨0, γ⟩} : γ < a}. Thus
Card(^1 a) = a, via the 1-1 correspondence {⟨0, γ⟩} ↦ γ.

(iii): Let k = Card(K), l = Card(L), a = Card(A), where K ∩ L = ∅.
The reader can readily verify that f ↦ ⟨f ↾ K, f ↾ L⟩ is a 1-1 correspondence
^(K∪L) A ∼ ^K A × ^L A.
(iv): Let K, L, A be as in (iii) (although K ∩ L = ∅ is not required here).
We need a 1-1 correspondence ^(K×L) A ∼ ^L(^K A). The function that maps
λxy. f(x, y) ∈ ^(K×L) A to λy.(λx. f(x, y)) ∈ ^L(^K A) fills the bill.
(v): Since a ⊆ b, F : ^k a → ^k b given by F(g) = g is total and 1-1.  □
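Clause (iv) is just "currying". The finite sketch below (the small sets K, L, A are arbitrary choices) checks that f ↦ λy.(λx.f(x, y)) is a 1-1 correspondence from ^(K×L)A onto ^L(^K A).

```python
from itertools import product

K, L, A = (0, 1), ('p', 'q', 'r'), ('x', 'y')

# All total functions K x L -> A, each coded as a dict on the pairs.
domain = list(product(K, L))
funcs = [dict(zip(domain, vals)) for vals in product(A, repeat=len(domain))]

def curry(f):
    # the function lambda y . (lambda x . f(x, y)), coded as nested dicts
    return {y: {x: f[(x, y)] for x in K} for y in L}

curried = [curry(f) for f in funcs]
assert len(funcs) == len(A) ** (len(K) * len(L))          # |A|^(|K| |L|) functions
assert all(c1 != c2 for i, c1 in enumerate(curried)       # curry is 1-1
           for c2 in curried[i + 1:])
assert len(curried) == (len(A) ** len(K)) ** len(L)       # hence onto ^L(^K A)
print(len(funcs), "functions on K x L; curried forms are pairwise distinct")
```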

The next result shows that cardinal exponentiation coincides with ordinal
exponentiation over ω, just like the addition and multiplication.

VII.5.22 Proposition. For all m, n in ω, m^(·n) = m^n.

Proof. Induction on n. For n = 0 we have m^(·0) = 1 = m^0, the last equality
by VII.5.21. Assume the claim for some frozen n, and proceed to n + 1.

m^(·(n+1)) = m^(·n) · m      by VI.10.23
           = m^n ·c m        by I.H. and VII.5.12
           = m^n ·c m^1      by VII.5.21
           = m^(n +c 1)      by VII.5.21
           = m^(n+1),        by VII.5.8  □

VII.5.23 Example. How big is ℵ0^ℵ0? Well, this is Card(^ω ω); therefore

c = Card(^ω 2)
  ≤ Card(^ω ω)           since ^ω2 ⊆ ^ωω
  ≤ Card(P(ω × ω))       since ^ωω ⊆ P(ω × ω)
  = Card(^(ω×ω) 2)       by VII.3.9
  = c                    by VII.5.20 and VII.2.9

Thus, ℵ0^ℵ0 = c.  □

VII.5.24 Example. For any n ∈ ω − {0} and set A, we have Aⁿ ∼ ^n A via the 1-1
correspondence ⟨x0, …, xn−1⟩ ↦ {⟨i, xi⟩ : i ∈ n}. Thus Card(^n A) = Card(Aⁿ).
In particular, a^2 = Card(^2 a) = Card(a × a) = a ·c a for any a.  □

VII.5.25 Remark. We saw in the discussion following VII.4.26 (p. 466) that

(ℵα )+ ≤ Card(P(ℵα ))

or

(ℵα)⁺ ≤ Card(^ℵα 2) = 2^ℵα

Thus,

(1) an alternative formulation of GCH is

ℵα+1 = 2^ℵα

and
(2) we have just estimated a⁺ in general:

a⁺ ≤ 2^a  □

VII.6. Cofinality; More Cardinal Arithmetic; Inaccessible Cardinals


How far can we “stretch” an ordinal α by applying to it a function f ? That is,
given α and a total function f : α → On, how “big” can the elements of ran( f )
be? Well, let β be arbitrary. Define f (0) = β. Thus 1 = {0} is stretched, by f ,
to the arbitrarily large value β. Clearly an uninspiring answer.
A much better question to ask, which leads to fruitful answers, is: how far
can we shrink a given ordinal α by some total function, f ? That is, what is the
smallest β such that f : β → α and ran( f ) “spreads as far to the right” in α as
possible?
More precisely, let us define

VII.6.1 Definition (Cofinal Subsets). Let ∅ ≠ A ⊆ B, and < be an order on B


(hence also on A). We say that A is cofinal in B just in case for every b ∈ B
there is an a ∈ A such that b ≤ a, where, of course, x ≤ y means x < y ∨ x = y.


The set A above “spreads as far to the right in B as possible”. If β ⊆ α, then


β cannot be cofinal in α in the above sense, unless β = α. For this reason we
have a somewhat different notion of “cofinal in” for ordinals.

VII.6.2 Definition. β is cofinal in α ≠ 0 just in case there is a total function


f : β → α such that ran( f ) is cofinal in α in the sense of VII.6.1. We say that
f maps β cofinally into α, and that f : β → α is a cofinal map (function).
The cofinality of an ordinal α, cf(α), is the smallest ordinal that is cofinal
in α. 

Thus, cf(α) is the smallest ordinal into which we can “shrink” α via a (total)
function. If Lim(α), and f : β → α is cofinal, then ran(f) is unbounded in α
(hence sup ran(f) = ⋃ran(f) = α), since γ < α implies γ + 1 < α, and hence
γ < γ + 1 ≤ f(σ) for some σ ∈ β.†
From the preamble to the section it follows that 1 = cf(α + 1) for any α.
Also, ω = cf(ω), since for each n ∈ ω and f : n → ω, ran(f) is finite (Exer-
cise VII.18); hence ⋃ran(f) ∈ ω, and thus, for all n ∈ ω, cf(ω) ≠ n. Also,
by VII.4.23, cf(ℵω) = ω; therefore some "huge" ordinals (in this case a car-
dinal) can shrink quite a bit.
Finally, it is clear that cf(α) ≤ α, since, whenever Lim(α), the identity
function maps α cofinally into α, while when α = β + 1, cf(α) = 1 ≤ α.

VII.6.3 Definition (Hausdorff). An ordinal α is regular provided cf(α) = α.


Otherwise, it is singular. 

Thus, 1 and ω are regular, all n > 1 in ω are singular, and so is ℵω .‡

VII.6.4 Proposition. For any α, cf(α) is a cardinal.

Proof. If α is a successor then the result is trivial. Otherwise, let 1 ≤ β < cf(α)
and g : β → cf(α) be a 1-1 correspondence. Suppose that f maps cf(α) to α
cofinally. Thus, ⋃ran(f) = α. Clearly ⋃ran(f ◦ g) = α as well; thus f ◦ g
maps β cofinally into α, contradicting the minimality of cf(α).  □

Thus, all regular ordinals are cardinals. In particular, all, except 1, are limit
ordinals.

VII.6.5 Proposition. There is a total order-preserving cofinal map


f : cf(α) → α.

Proof. The result is trivial if α is a successor.



Let then Lim(α), and g : cf(α) → α be such that ⋃ran(g) = α. Define f
by recursion for all β ∈ cf(α)

f(β) ≃ g(min{σ ∈ cf(α) : (∀γ < β)g(σ) > f(γ) ∪ g(β)})     (1)

† Some authors offer the definition only for Lim(α).


‡ If we allowed nontotal cofinal maps, then the modified definition would make 0 regular as well
(Hausdorff). There is no universal agreement in the literature (see also previous footnote) on this
point.

To see that f is total on cf(α), we argue that, for β ∈ cf(α),

{σ ∈ cf(α) : (∀γ < β)g(σ) > f(γ) ∪ g(β)} ≠ ∅     (2)

By (1), ran(f) ⊆ α. Thus, (2) follows, since
(i) ran(g) is unbounded in α,
(ii) sup ran(f ↾ β) < α, since β < cf(α).
Again by (1), f(γ) < f(β) if γ < β, so that f is order-preserving.
Finally, τ ∈ α implies τ < g(σ) for some σ ∈ cf(α) by (i). By (1), g(σ) <
f(σ); hence τ ∈ ⋃ran(f); therefore α ⊆ ⋃ran(f); thus α = ⋃ran(f).  □

VII.6.6 Proposition. Let the order-preserving function f map α cofinally into


β > α. Then cf(α) = cf(β).

Proof. Let g : cf(α) → α be a cofinal map. Then, so is f ◦ g : cf(α) → β.


Indeed, let γ ∈ β. By cofinality of f , f (σ ) ≥ γ for some σ ∈ α. Moreover
(cofinality of g) g(ξ ) ≥ σ (for some ξ ∈ cf(α)). Since f is order-preserving,
f (g(ξ )) ≥ f (σ ) ≥ γ . We conclude that
cf(β) ≤ cf(α) (1)
If α is a successor, then cf(α) = 1 and the result follows from (1).
So let Lim(α), and let h : cf(β) → β be cofinal. Define a function F : β → α
by

F(γ) = f⁻¹(γ)  if γ ∈ ran(f);   F(γ) = min{δ ∈ α : f(δ) > γ}  if γ ∉ ran(f)

for all γ ∈ β.
F is total, for the min in the bottom case always exists. Indeed, given γ , there
is a σ ∈ α such that γ ≤ f (σ ) < f (σ + 1) [σ + 1 ∈ α by Lim(α)]. Hence
(∃δ ∈ α) f (δ) > γ . Moreover,
F(γ) < F(δ)  whenever  ran(f) ∋ γ < δ ∈ β     (2)

Indeed, (2) is immediate if δ ∈ ran(f) as well, since f is order-preserving. If,
on the other hand, δ ∉ ran(f), then let σ ∈ α be minimum such that f(σ) > δ.
Thus, f(σ) > δ > γ = f(η) for some η ∈ α, and F(δ) = σ > η = F(γ).
Finally, F ◦ h : cf(β) → α is cofinal. Indeed, let γ ∈ α and α ∋ δ > γ
(by Lim(α)). Therefore f (δ) > f (γ ). By cofinality of h, take h(σ ) > f (δ), for
some σ ∈ cf(β) (strict inequality by Lim(β) – why Lim(β)?). It follows, us-
ing (2), that F(h(σ )) > F( f (δ)) = δ > γ .
Thus cf(α) ≤ cf(β), and we are done by (1). 

VII.6.7 Corollary. For any α, cf(cf(α)) = cf(α).

Proof. If α is regular, then the result is trivial. Otherwise, use VII.6.5 and
VII.6.6. 

Thus, for all α, cf(α) is a regular cardinal.

VII.6.8 Corollary. If F is normal, then cf(F(α)) = cf(α) for all limit ordi-
nals α.

Proof. Since F is order-preserving, F(α) ≥ α. If we have equality, the result


is trivial. So let F(α) > α. The map

α ∋ β ↦ F(β) ∈ F(α)

is cofinal by normality. The result then follows from VII.6.6. 

VII.6.9 Corollary. For all Lim(α), cf(ℵα ) = cf(α).

Proof. α ↦ ℵα is normal.  □

One often encounters, and accepts as common sense, the following



statement: "Let X ⊆ ⋃_{n∈ω} An and X be finite. Then, for some m ∈ ω,
X ⊆ ⋃_{n<m} An." This is a special case of the following.

VII.6.10 Proposition. If m ≥ ω is regular, Card(X) < m, and X ⊆ ⋃_{λ<m} Aλ,
then X ⊆ ⋃_{λ<κ} Aλ for some κ < m.

Proof. Let Card(X) = n < m, and g : n → X be a 1-1 correspondence. Define
f : n → m by

f(σ) = min{τ ∈ m : g(σ) ∈ ⋃_{λ<τ} Aλ}

If the conclusion is false, then f maps n cofinally into m, contradicting the
regularity of the latter.  □

VII.6.11 Proposition. The infinite cardinal a is singular iff, for some β < a
and a family of sets (Aα )α<β with Card(Aα ) < a for all sets in the family, one

has a = Card(⋃_{α<β} Aα).

Proof. Only-if part. Let β = cf(a) < a, and f : β → a be cofinal. Thus a =
⋃_{α<β} f(α). The result follows using Aα = f(α). Of course, f(α) < a; hence
Card(f(α)) ≤ f(α) < a.
If part. Suppose that a = Card(⋃_{α<β} Aα), where β is the smallest ordinal that
satisfies the hypotheses above. For each α ∈ β set f(α) = Card(⋃_{γ<α} Aγ). As
⋃_{γ<α} Aγ ⊆ ⋃_{α<β} Aα, it follows that Card(⋃_{γ<α} Aγ) ≤ Card(⋃_{α<β} Aα) = a.
The ≤ graduates to < by minimality of β; hence f(α) ∈ a (∈ is, of course, <).
Thus we have a function f : β → a.

By VII.4.20, c = ⋃_{α<β} f(α) = sup_{α<β} f(α) is a cardinal. Clearly, c ≤ a.
Can the inequality be strict? Well, if it can, then (using VII.5.16)

a = Card(⋃_{α<β} Aα)
  = Card(⋃_{α<β} ⋃_{γ<α} Aγ)
  ≤ c ·c Card(β) = max{c, Card(β)} < a

– a contradiction. Thus, c = a and f is cofinal. Since β < a, the result follows.


(1) The above proposition can be rephrased to read exactly as above, but with
β replaced by a cardinal b < a. In the only-if part this is so because cf(a)
is a cardinal. In the if part it is so because the smallest ordinal β that makes
the proof work is cf(a). But this is a cardinal.
(2) As the notions “singular” and “regular” pertain to ordinals, the remark
following VII.5.16 applies here, so that VII.6.11 is provable within ZF on
the assumption a is well-orderable. Remarks such as that and the present
one are only of value when one wants to gauge with accuracy which results
follow, or do not follow, from what axioms.

VII.6.12 Proposition. Every infinite successor cardinal is regular.

Proof. Let ω ≤ a, and assume instead that κ = cf(a+ ) < a+ (κ is a cardinal


by VII.6.4). Let f : κ → a+ be cofinal. We observe that f (β) < a+ , hence
Card( f (β)) ≤ a, for all β ∈ κ. Thus, using VII.5.16,

a⁺ = ⋃_{β<κ} f(β)
   ≤ κ ·c a
   ≤ a ·c a = a

which is a contradiction.  □

It is known that without AC we cannot prove (in ZF) that ℵ1 is regular (Feferman
and Levy (1963)). One cannot even prove (in ZF) that there are any infinite
regular cardinals at all beyond ω (Gitik (1980)).

Let us next turn our attention to regular limit cardinals beyond ω. These have
a special name.

VII.6.13 Definition. A cardinal a > ℵ0 is weakly inaccessible iff it is regular


and a limit (i.e., for some α with Lim(α), a = ℵα ). 

By VII.6.9, if Lim(α) and cf(ℵα ) = ℵα , then cf(α) = ℵα ≥ α (the inequality


holds by normality of ℵ). Hence
ℵα = α (1)
since cf(α) ≤ α. So, what is the first fixed point α of ℵ? By VI.5.42–VI.5.43,
that will be α = sup{sn : n < ω}, where

s0 = 0
sn+1 = ℵ_{sn}   for all n < ω

This α is quite huge, namely, the "tower"

ℵ_{ℵ_{ℵ_{..._{ℵ0}}}}   (an ω-tower of subscripts ending in ℵ0)

On the other hand, cf(α) = ω for this α, since n ↦ sn is cofinal. Thus cf(ℵα) =
cf(α) = ω < ℵα . We have just established that the first fixed point of ℵ, a huge
limit cardinal, is singular. As this was only the first candidate for a weakly
inaccessible cardinal, the first actual such cardinal will be even bigger, as it
must occur later in the aleph sequence.
It turns out that within ZFC one cannot prove that weak inaccessibles exist.
We will prove this relatively easy metamathematical fact below, but first we will
need a notion of strongly inaccessible cardinals and some additional cardinal
arithmetic tools.

VII.6.14 Definition. A cardinal a is strongly inaccessible, or just inaccessible,


iff it is weakly inaccessible and, moreover, for every infinite cardinal b < a,
2^b < a.  □

(1) Any cardinal a that satisfies b < a → 2^b < a is called a strong limit.
(2) The above definition could also be phrased: "A cardinal a > ℵ0 is strongly
inaccessible, or just inaccessible, iff it is regular and, moreover, for every infinite
cardinal b < a, 2^b < a." This is because for any b, b⁺ ≤ 2^b; thus the "moreover"

part yields the implication b < a → b+ < a; hence a is a limit cardinal and
therefore, in particular, weakly inaccessible.
In the presence of the generalized continuum hypothesis (GCH) that 2^b = b⁺
for all infinite cardinals, the requirement in VII.6.14 that b < a implies 2^b < a
is automatically satisfied, since a is a limit cardinal. Thus, under GCH, weak
and strong inaccessibles coincide.
A strongly inaccessible in comparison with other (smaller) infinite cardinals
is like ω in comparison with smaller cardinals (natural numbers), since n ∈ ω
implies 2^n ∈ ω.†

VII.6.15 Definition (Generalized Cardinal Addition). Let (ki)_{i∈I} be a family
of cardinals. Their sum, Σ_{i∈I} ki, is defined to be Card(⋃_{i∈I} {i} × ki).  □

Intuitively, in the sum we “count” all the elements in all the ki and allow for
multiplicity of occurrence as well, since if ki = k j , still {i} × ki ∩ { j} × k j = ∅.

VII.6.16 Remark. The above definition, a straightforward generalization of


Definition VII.5.1, does not need AC, so it can be effected within ZF (with an
appropriate definition of cardinals that also avoids AC). In ZFC it is equivalent to
the commonly given statement (definitionally): "Σ_{i∈I} ki equals Card(⋃_{i∈I} Ki),
where Ki ∩ Kj = ∅ whenever i ≠ j, and ki = Card(Ki) for all i ∈ I."
Part of this is due to the obvious ki = Card({i} × ki ). The rest (including the
immunity of the statement to the choice of K i ) follows from AC (the details
will be left to the reader: Exercise VII.59). 

An interesting phenomenon occurs in connection with the above remark:


Σ_{i∈ℵ0} ℵ0 (that is, Σ_{i∈ℵ0} ki where every ki is ℵ0) equals Card(⋃_{i∈ℵ0} {i} × ℵ0) =
Card(ℵ0²) = ℵ0.
On the other hand, as we have noted a number of times before, Feferman
and Levy (1963) have shown that, in the absence of AC, it is possible to
have a countable family of mutually disjoint countable sets (K i )i∈I such that

Card(⋃_{i∈I} Ki) = 2^ℵ0 > ℵ0. In lay terms, the "sum of the parts" can be signifi-
cantly less than the “whole”, without AC.

VII.6.17 Proposition (Multiplication as Repeated Addition). For any card-


inals a and b, a ·c b = Σ_{α∈a} b.

† “2n ” in the sense of Chapter V. This is the same as 2·n .



Proof.

Σ_{α∈a} b = Card(⋃_{α∈a} {α} × b)
          = Card(a × b) = a ·c b  □

VII.6.18 Definition (Generalized Cardinal Multiplication). Let (ki )i∈I be a


family of cardinals. Their product is defined to be Card(∏_{i∈I} ki).  □

We prefer not to propose a symbol for the product of a family of cardinal
numbers, as there is no universal agreement. Of course, ∏_{i∈I} ki would be in-
appropriate, as it already indicates something else: the Cartesian product of
the ki.
If all the ki are equal to m, and Card(I) = a, then Card(∏_{i∈I} ki) = Card(^I m) =
m^a.

VII.6.19 Remark. If Ai ∼ Bi for i ∈ I, then ∏_{i∈I} Ai ∼ ∏_{i∈I} Bi (Exer-
cise VII.64).  □

VII.6.20 Lemma (König). If ai < bi for all i ∈ I, then

Σ_{i∈I} ai < Card(∏_{i∈I} bi)

Proof. Set Ai = {i} × ai; thus

Σ_{i∈I} ai = Card(⋃_{i∈I} Ai),
Card(Ai) = ai   for i ∈ I,

and the Ai are pairwise disjoint.
We need to show that there is no onto function

g : ⋃_{i∈I} Ai → ∏_{i∈I} bi

Suppose otherwise. Set, for each i ∈ I,

Bi = g[Ai], the image of Ai under g     (1)



Thus,

∏_{i∈I} bi = ⋃_{i∈I} Bi

We next project Bi along the ith coordinate to get

Pi = {p(i) : p ∈ Bi}     (2)

By (1), and the onto map Bi ∋ p ↦ p(i) ∈ Pi,

Card(Pi) ≤ Card(Bi) ≤ Card(Ai) < bi

Thus, Pi ⊂ bi (Pi ⊆ bi, by (2)). Now, using AC, define a total p on I with
p(i) ∈ bi − Pi for all i ∈ I. Clearly, p ∈ ∏_{i∈I} bi, yet p cannot be in ran(g), for, if
so, p ∈ Bi for some i ∈ I , and hence p(i) ∈ Pi ; a contradiction. Thus, g cannot
be onto; a contradiction. 

König’s lemma extends Cantor’s diagonalization that lies behind Cantor’s


theorem. Indeed, a = Σ_{α∈a} 1 < Card(∏_{α∈a} 2) = 2^a.

VII.6.21 Corollary. For all infinite cardinals a, a < a^cf(a).

Proof. Let f : cf(a) → a be cofinal. Then

a = ⋃_{β<cf(a)} f(β)
  ≤ Σ_{β<cf(a)} Card(f(β))      by Exercise VII.60
  < Card(∏_{β<cf(a)} a)         since Card(f(β)) ≤ f(β) < a
  = a^cf(a)  □

VII.6.22 Corollary. cf(2^ℵ0) > ℵ0.

Indeed, cf(2^ℵα) > ℵα for any α (see Exercise VII.65).

Proof. If we have equality, then (by VII.6.21) we get

2^ℵ0 < (2^ℵ0)^ℵ0 = 2^(ℵ0 ·c ℵ0) = 2^ℵ0

In the absence of the CH, ZFC cannot pinpoint the cardinal 2^ℵ0 in the aleph
sequence with any certainty. If 2^ℵ0 = ℵ1, then fine. But if not, then it can be
(i.e., it is consistent with ZFC), as Cohen forcing has shown, that 2^ℵ0 = ℵ2 or
2^ℵ0 = ℵ3, or, indeed, that 2^ℵ0 is weakly inaccessible provided existence of such
inaccessibles is consistent with ZFC.
However, we know that 2^ℵ0 ≠ ℵω by VII.6.22, since cf(ℵω) = ℵ0.

VII.6.23 Lemma. If m < cf(a), then

a^m = Σ_{α<a} Card(α)^m

Proof. Trivially,

^m a ⊇ ⋃_{α<a} ^m α     (1)

Let next f ∈ ^m a. By the assumption, sup ran(f) < a; hence f ∈ ⋃_{α<a} ^m α. Thus
(1) is promoted to equality.
Next,

a^m = Card(⋃_{α<a} ^m α)
    ≤ Σ_{α<a} Card(α)^m               by Exercise VII.60
    = a ·c sup_{α<a} Card(α)^m         by Exercise VII.62
    = a^m                             since Card(α)^m ≤ a^m for all α < a.  □

VII.6.24 Corollary. If a is regular and m < a, then

a^m = Σ_{α<a} Card(α)^m

VII.6.25 Corollary (Hausdorff). For all α, β,

ℵ_{α+1}^{ℵ_β} = ℵ_α^{ℵ_β} ·c ℵ_{α+1}

Proof. For β ≤ α we apply VII.6.24 (see also VII.6.12 and VII.5.21) to obtain

ℵ_{α+1}^{ℵ_β} = Σ_{Card(γ)≤ℵ_α} Card(γ)^{ℵ_β}     (note the "≤")
             ≤ Σ_{Card(γ)≤ℵ_α} ℵ_α^{ℵ_β}
             = ℵ_α^{ℵ_β} ·c ℵ_{α+1}
             ≤ ℵ_{α+1}^{ℵ_β} ·c ℵ_{α+1}
             = ℵ_{α+1}^{ℵ_β}

For α < β (hence also α + 1 ≤ β) use Exercise VII.56 to obtain

ℵ_{α+1}^{ℵ_β} = ℵ_α^{ℵ_β} = 2^{ℵ_β}

Hence, from ℵ_{α+1} ≤ ℵ_β < 2^{ℵ_β} the contention becomes the following provable
statement:

2^{ℵ_β} = 2^{ℵ_β} ·c ℵ_{α+1}  □

We conclude this excursion into “higher arithmetic” by noting how the


adoption of GCH helps to further simplify cardinal arithmetic (in particular,
exponentiation).

VII.6.26 Proposition. If we adopt GCH, then



(i) ℵα β = ℵβ+1 if α ≤ β,

(ii) ℵα β = ℵα+1 if ℵα > ℵβ ≥ cf(ℵα ),

(iii) ℵα β = ℵα if cf(ℵα ) > ℵβ .

Proof. (i):

ℵα β = 2ℵβ by Exercise VII.56
= ℵβ+1 by GCH

(ii):

ℵα+1 = ℵℵα α by (i)



≥ ℵα β by Exercise VII.57
≥ ℵαcf(ℵα ) by Exercise VII.57
> ℵα by VII.6.21

Thus, ℵα β = ℵα+1 .
VII.6. Cofinality; More Cardinal Arithmetic 489

(iii):

,
ℵα β = Card(γ )ℵβ by VII.6.23
γ <ℵα
= ℵα ·c sup Card(γ )ℵβ by Exercise VII.62
γ <ℵα

Now, for any γ < ℵα ,


Card(γ )ℵβ = Card(ℵβ γ )
≤ Card(P(ℵβ × γ )) since ℵβ γ ⊆ P(ℵβ × γ )
= 2Card(ℵβ ×γ )
= Card(ℵβ × γ )+ by GCH
= (ℵβ ·c Card(γ ))+
 +
= max(ℵβ , Card(γ ))
≤ ℵα

Thus, ℵα β = ℵα in this case. 

We conclude this section by pondering the existence of inaccessibles.

VII.6.27 Lemma. If α is strongly inaccessible and Card(N ) < α, where N is


some arbitrarily chosen set of urelements, then Card(VN (α)) = α.

Proof. Since α ⊆ VN (α) by VI.6.8,


α = Card(α) ≤ Card(VN (α)) (1)

Since Lim(α), VN (α) = β<α VN (β). Thus,
,
Card(VN (α)) ≤ Card(VN (β)) ≤ α ·c sup(Card(VN (β))) (2)
β<α

To conclude, by induction on β, we show that Card(VN (β)) < α for all β < α.
If β = 0, then Card(VN (β)) < α from the choice of N .
If β = γ + 1, then
Card(VN (γ + 1)) = Card(P(N ∪ VN (γ )))
≤ 2max(Card(VN (γ )), Card(N ))
< α, by the I.H. and α’s “strong limit” property.

If Lim(β), then VN (β) = γ ∈β VN (γ ). By the I.H., Card(VN (γ )) < α for
γ < β; thus
Card(VN (β)) < α, by the I.H. and VII.6.11, since β < α.
Thus, (2) yields Card(VN (α)) ≤ α, and the result follows from (1). 
490 VII. Cardinality

VII.6.28 Lemma. If α is strongly inaccessible and Card(N ) < α, where N


is some arbitrarily chosen set of urelements, then A ⊆ VN (α) and Card(A) <
Card(VN (α)) imply A ∈ VN (α).

Proof. Now, VN (α) = β<α VN (β), Card(VN (α)) = α, and α is regular.

By VII.6.10, A ⊆ β<m VN (β), where m < α. Thus, A ∈ VN (m + 1) (“ + 1”
in the ordinal sense); hence A ∈ VN (α). 

VII.6.29 Lemma. If α is strongly inaccessible and Card(N ) < α, where N is


some arbitrarily chosen set of urelements, then for a set A, A ∈ VN (α) implies
Card(A) < α.

Proof. A ∈ VN (α) implies A ∈ VN (β) for β < α, and hence A ⊆ N ∪ VN (β).


Thus, Card(A) ≤ Card(N ) +c Card(VN (β)) < α, by the assumption (on N ) and
by the proof of VII.6.27. 

VII.6.30 Theorem. If α is strongly inaccessible and Card(N ) < α, where N


is some arbitrarily chosen set of urelements, then VN (α) is a formal model of
ZFC.

Proof. The proof is very similar to the proof that N ∪ WF N is a model of


ZFC (VI.6.13). Cf. also Sections VI.8 and VI.9. We prove that I = (L Set , ZFC,
VN (α)) is a formal model of ZFC. The verification then entails establishing
ZFC A VN (α) for the universal closure of each ZFC axiom.
Observe at the outset that VN (α) = N ∪ VN (α); hence it is transitive. It is
also nonempty (why?)

(i) The axiom “(∃x)(∀y)(U (y) ↔ y ∈ x)” relativizes to “(∃x ∈ VN (α))(∀y ∈


VN (α))(U (y) ↔ y ∈ x)”. This is derivable (take x = N and apply
substitution axiom).
(ii) The axiom “(∀x)(U (x) → (∀y)y ∈ / x)” relativizes to “(∀x ∈ VN (α))
(U (x) → (∀y ∈ VN (α))y ∈ / x)” and is a trivial consequence of the unrela-
tivized version.
(iii) Axiom of extensionality. Derivable by VI.8.10.
(iv) Axiom of separation. It says that for any set B and class A, A ⊆ B im-
plies that A is a set. To see why the relativization of this is derivable, let
B ∈ VN (α) and A ⊆ B. By ZFC separation, A is a set. We need to prove
that (A is a set)VN (α) , that is, A ∈ VN (α). Well, we have A ⊆ VN (α)
by A ⊆ B and transitivity. Since (VII.6.29) Card(B) < α, we have
Card(A) < α and are done by VII.6.28.
VII.6. Cofinality; More Cardinal Arithmetic 491

(v) Axiom of foundation. Holds by VI.8.11.


(vi) Axiom of pairing. For any a, b in VN (α) we must show the derivability of
((∃y)y = {a, b})VN (α) . By VI.8.13 we need only show that {a, b}VN (α) ∈
VN (α), or that {a, b} ∈ VN (α), by VI.8.2. Well,

ρ({a, b}) = max(ρ(a), ρ(b)) + 1 < α

and this settles it.†


(vii) Axiom of union. For any set of sets A ∈ VN (α) we need to show that
  VN (α)
(∃y)y = A

By VI.8.13, we need only show (using VI.8.16) that A ∈ VN (α). Well,

let A ∈ VN (β) with β < α. Then A ⊆ N ∪ VN (β), since x ∈ A implies

x ∈ N ∪VN (β) and thus x ⊆ N ∪VN (β). So ρ( A) ≤ max(0, β)+1 < α.
(viii) Power set axiom. We need to show that for any set A ∈ VN (α),

((∃y)y = P(A))VN (α)

or that (VI.8.13)

P
VN (α)
(A) ∈ VN (α)

By absoluteness of ⊆, PVN (α) (A) = P(A) ∩ VN (α) = P(A), the last equal-
ity because x ⊆ A ∈ VN (β) (β < α) implies x ⊆ A ⊆ N ∪ VN (β),
and hence x ∈ VN (β + 1). Thus, also, P(A) ⊆ P(N ∪ VN (β)); therefore
P(A) ∈ VN (β + 2).
(ix) Collection. For convenience, we approach collection via its equivalent
form, replacement, that is, “For any set A and any function f , f [A] is
a set.” We need, therefore, to show that for any set A ∈ VN (α) and any
function f ∈ VN (α)

((∃y)y = f [A])VN (α)

or that f [A] ∈ VN (α). Now, this is derivable by ZFC collection and


VII.6.28, for f ∈ VN (α) implies f [A] ⊆ VN (α), and
by VII.6.29
Card( f [A]) ≤ Card(A) < α

(x) Axiom of infinity. We need an inductive set in VN (α). Since ω ∈ VN (α),


we are done.

† One can also do this with a sledgehammer: {a, b} ⊆ VN (α) and Card({a, b}) ≤ 2 < α. Hence
{a, b} ∈ VN (α), by VI.6.28.
492 VII. Cardinality

(xi) AC. Let S be a set of nonempty sets in VN (α). We need a choice fun-

ction in VN (α). By AC, there is a choice function, f : S → S, in ZFC,

such that f (x) ∈ x for all x ∈ S. Now by (viii), S ∈ VN (α); hence

S × S ∈ VN (α) (why?). Thus, f ⊆ S × S implies f ∈ VN (α). 

VII.6.31 Theorem. It is consistent with ZFC that inaccessibles do not exist;


that is, if ZFC is consistent, then so is ZFC + ¬(∃α)(α is (strongly) inaccessible).

Proof. (Metamathematical) It suffices to show that if ZFC is consistent, then

ZFC  (∃α)(α is inaccessible)

Suppose instead that

ZFC  (∃α)(α is inaccessible) (1)

Then we have a proof in ZFC that the smallest inaccessible, β, exists. Now
introduce new constants β for that inaccessible, and N for a set of urelements
such that Card(N ) < β.† Thus, VN (β) is a (formal) model of ZFC. By (1), I.7.9,
and VII.6.30,

ZFC  ((∃α)(α is inaccessible))VN (β)

or

ZFC  (∃α ∈ VN (β))(α is inaccessible)VN (β) (2)

Since “α is inaccessible” is absolute for VN (β) (see Exercise VII.71),‡ it follows


from (2) that there is a real inaccessible in VN (β), that is, (2) becomes

ZFC  (∃α ∈ VN (β))(α is inaccessible)

which contradicts the choice of β (see VI.6.9). Since ZFC is consistent, this
contradiction establishes the original claim. 

VII.6.32 Remark. (1) The above can be transformed to a ZFC proof, via a
formal model, that if ZFC is consistent, then so is ZFC + ¬(∃α)I (α) (where we
use “I (α)” here as an abbreviation of “α is strongly inaccessible”). Once we fix
N with, say, Card(N ) ≤ ω, the model is M = {x : (∀α)(I (α) → x ∈ VN (α))}

† The reader has had enough practice by now to see that augmenting ZFC thus – including the
relevant axioms, e.g., “N is a set of atoms”, “Card(N ) < β”, etc. – results in a conservative
extension.
‡ Intuitively, if α ∈ VN (β), then an inhabitant of VN (β) will perceive it as a strongly inaccessible
iff an inhabitant of U N does.
VII.6. Cofinality; More Cardinal Arithmetic 493

with interpretation of ∈, U as themselves. This is clearly so, for there are two
cases: If there are no inaccessibles (i.e., ¬(∃α)I (α)), then M = U N is a model of
ZFC + ¬(∃α)I (α); else M = VN (β), where β is the smallest inaccessible, and
hence again (by VII.6.31) M is a model of ZFC + ¬(∃α)I (α) since (∃α)I (α)
is false in M(I.7.4).
(2) Can we, again in ZFC, prove consistency of ZFC + (∃α)I (α) (assuming
consistency of ZFC)? No, because this would clash with Gödel’s second incom-
pleteness theorem, which says “In any extension S of ZFC, if S is recognizable
and consistent, then S  CONS(S)”, where CONS(S) is a formula that says “S
is consistent”. In outline, this goes like this. Assume that we have a proof

ZFC  CONS(ZFC) → CONS(ZFC + (∃α)I (α)) (i)

Hence we also have a proof in the extension

ZFC + (∃α)I (α)  CONS(ZFC) → CONS(ZFC + (∃α)I (α)) (ii)

By VII.6.30,

ZFC + (∃α)I (α)  CONS(ZFC) (iii)

since, if β is any inaccessible and Card(N ) < β, then for the universal closure
F of every axiom of ZFC we have

ZFC + (∃α)I (α)  F VN (β)

By (ii), (iii), and modus ponens we derive a contradiction to Gödel’s incom-


pleteness theorem:

ZFC + (∃α)I (α)  CONS(ZFC + (∃α)I (α))

(3) How about weakly inaccessibles? Can we prove in ZFC (if this is consis-
tent† ) that weakly inaccessibles exist? Suppose we could. Then we could also
prove this in the extension theory ZFC + GCH (which is also consistent – as
Gödel has shown using his L). If β is the smallest weakly inaccessible as far as
ZFC knows, then it is also the smallest strongly inaccessible in ZFC + GCH.
But then VN (β), constructed in ZFC + GCH with a well-chosen N , is a model
of ZFC. We have, as in VII.6.31,

ZFC + GCH  (∃α ∈ VN (β))I (α)

a contradiction to the choice of β. 

† By now this hedging must have become annoying. However, recall that if ZFC is inconsistent,
then for any formula F whatsoever, ZFC  F .
494 VII. Cardinality

VII.7. Inductively Defined Sets Revisited; Relative Consistency of GCH


We have had a first acquaintance, in VII.1.16, with

(1) sets defined inductively as closures under certain operations,


(2) induction on inductively defined sets, and
(3) the relation of this concept to that of set operators.

We now inform this discussion a bit further by our understanding of cardinals.


First let us expand our understanding of operation, so that we can now allow
infinitary operations. This will have as a result, apart from the wider applicability
of the concept (for example, the inductive definition of “computations in higher-
type objects” involves infinitary operations) that the “operation” and “operator”
approaches become equivalent (see the footnote to VII.1.31).
To this end, we will generalize operations, f , on a set S so that they have
as argument list any set of members from S, rather than a finite sequence of
such objects. Before we proceed to formalize, let us make sure that this makes
intuitive sense, that indeed the new way of looking at rules subsumes VII.1.15
and VII.1.16 as special cases.
How do we indicate order of arguments if the arguments are just lumped
into a finite set? Well, an easy way to do this is to have many “rules” X "→ x
for any given input X , so that we incorporate all desirable outputs x for all
the relevant permutations of the set X (a permutation of X is, of course, a
1-1 correspondence from X to X ). For example, a rule λx y.x − y (for which
order of arguments matters) would give rise to “rules” {x, y} "→ x − y and
{x, y} "→ y − x for all x, y.
an . f (
In general, an “old rule” (VII.1.15) on a set S, λ an ), will give rise in
the present section to new rules

{a1 , . . . , an } "→ f (a j1 , . . . , a jn )

for all permutations ai "→ a ji of {a1 , . . . , an }. In addition, we will allow rules


X "→ x with X possibly infinite, thus going beyond simply translating VII.1.15
into new notation.
While we are at it, we find it elegant to allow rules ∅ "→ x. The right hand
sides of such rules (x) will play the role of the initial objects of VII.1.15. This
will unify the discussion, avoiding the annoying asymmetry between rules and
initial objects.

VII.7.1 Definition. A rule set R on a given set S is a relation R ⊆ P(S) × S.


Instead of writing X, x ∈ R or x R X , we will prefer the notation X "→ x or
VII.7. Inductively Defined Sets 495
R
X "→ x if R must be emphasized. A pair X, x or, in the preferred notation,
X "→ x will be called a rule.
A class X is R-closed iff whenever A ⊆ X then RA ⊆ X.
A rule set R is finitary iff for all rules X "→ x in R, X is finite. Otherwise it
is infinitary. 

VII.7.2 Remark. A set X is R-closed iff R[P(X )] ⊆ X , since

R[P(X )] = {a : (∃A)(a R A ∧ A ⊆ X )} 

VII.7.3 Example. Every ordinal is closed under the solitary rule ∅ "→ 0.
Every limit ordinal is closed under the rule set

∅→" 0
{α} →" α+1 

It is clear that the rule sets are not single-valued relations in general. We will
often omit mention of the set S such that R ⊆ P(S) × S.

VII.7.4 Definition. Given a rule set R. We say that a set X is inductively, or


recursively, defined by R iff X is the ⊆-smallest set that is R-closed.
Under these conditions, we also say that X is the closure of R, in symbols
X = Cl(R). 

As in VII.1.19, we have

VII.7.5 Proposition. For any rule set R, Cl(R) is uniquely defined by



X
R[P(X )]⊆X

Proof. Uniqueness. Say that S, T are both candidates for Cl(R). Then S ⊆ T
and T ⊆ S by VII.7.4; hence S = T .
Existence. To see that

S= {X : X is R-closed} (1)

satisfies VII.7.4, observe that if R is a rule set, then ran(R) is an R-closed


set, and hence {X : X is R-closed} = ∅, so that S is a set. Trivially, the inter-
section of any set of R-closed sets is R-closed, so that S is. Now, if T ∈ {X : X is
R-closed}, then S ⊆ T . 
496 VII. Cardinality

VII.7.6 Corollary (Induction on the Structure of Cl(R), or R-Induction).


Let T (x) be a formula, and R a rule set. To prove that (∀x ∈ Cl(R)) T (x), it
suffices to prove that {x : T (x)} is R-closed.

Proof. Suppose that {x : T (x)} is R-closed. Then so is C = {x : T (x)} ∩ ran(R).


But C is a set, hence Cl(R) ⊆ C ⊆ {x : T (x)}. 

VII.7.7 Example. Let I be a set of initial objects, and F a set of (function)


operations on a set S, as in VII.1.15. Form a rule set R as follows:
R
∅ "→ a if a ∈ I (1)

R
(∀ f ∈ F )({a1 , . . . , an } "→ f (a j1 , . . . , a jn )
for all permutations ai "→ a ji ) (2)

where, in (2), f (a j1 , . . . , a jn ) ↓ is understood.


We now see that Cl(I , F ) = Cl(R).
⊆: By VII.1.20, we need to show that Cl(R) is F -closed and that
I ⊆ Cl(R). Now, since Cl(R) is R-closed, these contentions follow from (2)
and (1) respectively. [For example, if ai ∈ Cl(R) for i = 1, . . . , n, and if f (
an ) ↓
for f ∈ F , then f (an ) ∈ Cl(R) by (2).]
⊇: By VII.7.6, we need to show that Cl(I , F ) is R-closed. So let
R
Cl(I , F ) ⊇ {a1 , . . . , an } "→ a (3)

By (2), the only way that (3) is possible is that a = f (a j1 , . . . , a jn ) for some
permutation of a1 , . . . , an ; hence a ∈ Cl(I , F ). We also need to settle the case
R
Cl(I , F ) ⊇ ∅ "→ a (4)

By (1) above, such a are precisely those in I ; hence, again, a ∈ Cl(I , F ). 

We next relate R-induction to induction with respect to a well-founded re-


lation Q : A → A (A a set).

VII.7.8 Example. Let Q : A → A (A a set) have IC, and define R by


R
Qx "→ x for all x ∈ A

Thus,

X ⊆ A is R-closed iff Qx ⊆ X implies x ∈ X (1)


VII.7. Inductively Defined Sets 497

Two remarks:

(i) By Q-induction (in the sense of VI.2.1), the right hand side of “iff” above
says that A ⊆ X , and hence A = X .
(ii) Thus, by (1), since Cl(R) ⊆ A (why is Cl(R) ⊆ A?) and Cl(R) is R-closed,
we get that A = Cl(R).

A side effect is that instead of Q-induction, to prove properties of A one


can do R-induction. Indeed, the I.H. with respect to one relation is identical to
that with respect to the other: For Q-induction we assume Qx ⊆ X (towards
proving x ∈ X ). For R-induction we want to show that X is R-closed, but that,
by (1), amounts again to assuming Qx ⊆ X (towards proving x ∈ X ). 

VII.7.9 Example (Some Pathologies). Let R be a rule set on a set A given by


{a} "→ a. Then every set X is R-closed, so that ∅ = Cl(R).
As another pathological case, let the rule set R on set B be such that whenever
R
X "→ x, X = ∅. Now, Cl(R) exists anyhow, by VII.7.5. By VII.7.6 we can prove
properties of Cl(R) by R-induction. This looks strange in view of VII.7.8, for
one can start with any Q : B → B, then define R exactly as in VII.7.8, and lo
and behold have an “induction tool” over B. Is the assumption that Q has MC
(or IC) on B really important?
Yes. If not, one could end up with an R as here described (due to the ab-
sence of Q-minimal elements). Under the circumstances, ∅ is R-closed; hence
Cl(R) = ∅, and we can do R-induction over ∅, not over B. Hardly an exciting
prospect.
The moral is that:

(1) We cannot bring in induction over the entire field of Q through the back
door (via R of VII.7.8), if Q does not have IC on B. Our ability to do
induction in these cases is restricted to some (often – as above – trivial)
subset, Cl(R), of the field of Q (see Exercise VII.73 for what we can say
in the general case).
(2) To define “useful” closures, the rule set must have rules with empty premises
(∅ "→ a). 

VII.7.10 Definition (Immediate Predecessors, Ambiguity). Let R be a rule


set. For each a ∈ ran(R) we define its immediate predecessors as follows:
If A, a ∈ R, then A is an immediate predecessor set (i.p.s.), while each
member of A is an immediate predecessor (i.p.), of a. If ∅ "→ a is the only rule
involving a, then a has no immediate predecessors.
498 VII. Cardinality

The transitive closure of the i.p. relation is the predecessor relation.


If for some a ∈ ran(R) there are A = B such that both A, a ∈ R and
B, a ∈ R, then R is an ambiguous rule set; otherwise it is unambiguous.
Thus, R is unambiguous iff, for all a ∈ ran(R), R −1 a is a singleton (its sole
member is the unique i.p.s. of a), in short, R −1 is a function. 

Aczel (1978) calls what we termed ambiguous rule sets “nondeterministic”


(Hinman (1978) calls them “non-monomorphic”). We prefer the above termino-
logy, as it is consistent with its usage towards characterizing a related pheno-
menon in formal language theory. Similarly, the term “nondeterministic” is
reserved in automata and language theory for rule sets that are not single-
valued (as opposed to their inverses not being single-valued – which is what
concerns us here).

VII.7.11 Example. Ambiguity makes it hard (often impossible) to define func-


tions on Cl(R) the natural way, i.e., by induction on the formation of Cl(R).
Here is an example.
Let us define symbol sequences using the symbols 1, 2, 3, +, × by the rule
set R given as follows:

∅ "→ 1
∅ "→ 2
∅ "→ 3
{x, y} "→ x + y
{x, y} "→ y + x
{x, y} "→ x × y
{x, y} "→ y × x

Cl(R) is the set of (strings denoting) non-parenthesized arithmetic expressions


that utilize addition, multiplication, and the “constants” 1, 2, 3.
Suppose we want to define the value, val(E) of any such expression, E. The
natural way to do so is

val(1) = 1
val(2) = 2
val(3) = 3
val(x + y) = val(x) + val(y) if x, y are the i.p.’s of x + y
val(x × y) = val(x) × val(y) if x, y are the i.p.’s of x × y

It should be clear that the above definition of val is, intuitively, “ambiguous” or
“ill-defined” (terminology that was formally adopted in VII.7.10). For example,
VII.7. Inductively Defined Sets 499

there are two choices of i.p. sets for 1 + 2 × 3. One choice is x = 1 + 2 and
y = 3 (under ×), so that

val(1 + 2 × 3) = val(1 + 2) × val(3)


= (val(1) + val(2)) × val(3)
= (1 + 2) × 3 = 9

while the other choice is x = 1 and y = 2 × 3 (under +), so that

val(1 + 2 × 3) = val(1) + val(2 × 3)


= val(1) + (val(2) × val(3))
= 1 + (2 × 3) = 7

We get different results! Even 1 + 2 + 3 has two possible sets of i.p.’s, although
this does not create a problem for val, since + is commutative. 

It is not always easy to prove that a rule set is unambiguous (it is much easier,
in general, to spot an ambiguity). The reader will be asked in the Exercises sec-
tion to check that a few familiar rule sets are unambiguous (Exercises VII.74
to VII.77). Freedom from ambiguity is important in an inductive definition
effected by a rule set R, for we can then “well-define”, recursively, functions
by induction over Cl(R). Examples of such functions are the val function over
arithmetic expressions (assuming that arithmetic expressions are defined more
carefully than in VII.7.11: brackets would have helped – see Exercise VII.75),
the truth-value function on formulas (propositional calculus), assigning “mean-
ing” (i.e., “interpretation” over some structure) to terms and formulas of a first
order language, numerous definitions on “trees”, and more. The following result
allows such recursive definitions.

VII.7.12 Example. We continue on the theme of Example VII.7.8 by looking at


the converse situation. Now we are given A = Cl(R), where R is unambiguous.
We define Q : A → A by

yQx iff y is an i.p. of x with respect to R

Thus,

if X is the (unique) i.p.s. of x, then X = Qx (1)

Does Q have IC? Well, suppose that S is a set for which we know that

Qx ⊆ S → x ∈ S (2)
500 VII. Cardinality

R
Can we conclude that A ⊆ S? Indeed we can, as follows: Let Y ⊆ S and Y "→ y.
By (1), Qy = Y ⊆ S. By (2), y ∈ S. Thus, S is R-closed, hence A ⊆ S.
It follows that, for unambiguous R, R-induction can be replaced by i.p.
induction (see however VII.7.9). 

VII.7.13 Theorem (Recursion over a Closure). Let R be an unambiguous


rule set on a set A, and g a total function on A × P(A × ran(g)). Then there is
a unique total function f on Cl(R) satisfying, for all a ∈ Cl(R),
R
f (a) = g(a, f  X a ) where X a is the unique set such that X a "→ a

Proof. Define Q on Cl(R) by


R
x Qy iff x ∈ Y "→ y

Thus, for each y ∈ Cl(R), Qy is the unique i.p.s. of y, or


R
Y = Qy iff Y "→ y

and the recurrence in the statement of the theorem becomes

f (a) = g(a, f  Qa)

Since Q has IC on Cl(R) (by Example VII.7.12) we are done, via VI.2.28.


Several variations are possible for VII.7.13 (see Section VI.2), but we will
not pursue them here.
We return to operators  : P(X ) → P(X ) (Definition VII.1.25). It is now the
case (compare with VII.1.31, footnote) that every monotone operator gives rise
to an equivalent rule set.

VII.7.14 Proposition. For every monotone operator  : P(X ) → P(X ) there is


a rule set on X such that  = Cl(R).

Proof. Define R for each A ⊆ X, a ∈ X :


R
A "→ a iff a ∈ (A) (1)

Now, a set Z ⊆ X is R-closed just in case Z ⊇ A "→ a implies a ∈ Z . This, in


view of (1) and monotonicity, says that

Z ⊆ X is R-closed iff (Z ) ⊆ Z


VII.7. Inductively Defined Sets 501

As in VII.1.27,

= Z
Z ⊆X
(Z )⊆Z

= Z = Cl(R)
Z ⊆X

Z is R -closed

That, conversely, a rule set R leads to a monotone operator  such that


Cl(R) =  is proved as in VII.1.31, and we will not revisit it here. We conclude
the section with the introduction of the stages of construction of , since,
intuitively, what is happening is the iteration of a “construction”
S← ∅
repeat until S converges to 
S← S ∪ (S)

– that is, at each stage, we add to the S that we have so far all the new points
we constructed in (S) (cf. the “abstraction” of this in VI.5.47).

In the interest of greater flexibility in applying the operator concept, we relax


the requirement that an operator  be necessarily a set (viz., its left field and
right field may be a proper class X whose members are sets).

VII.7.15 Definition (Stages). Let  be a monotone operator, that is, possibly, a


proper class total function that carries sets to sets. We define by recursion over
On:
 α =  <α ∪ ( <α )
where

 <α = β
β<α

for all α.
We call the set  α the αth stage (often, by abuse of terminology, we refer
to the ordinal α itself as the αth stage). An element s ∈  α has level or stage

≤ α. It has level α if moreover s ∈ /  β for all β < α. We write  = α  α

or  ∞ = α  α . We call  ( ∞ ) the class inductively defined by the opera-
tor . 

The notation  α might be confusing at first sight. This is the α-th set constructed;
it is not an operator. By the way, an easy induction on α shows that  α is indeed
a set (Exercise VII.78).
502 VII. Cardinality

The notation  <α for the union of all the stages before α is due to Moschovakis
(it corresponds to the set S we used in the pseudo-program above). Note that
 <0 = ∅ and hence  0 = (∅).

VII.7.16 Lemma. Let  be any monotone operator (not necessarily a set). If,
for some α,  <α =  α , then:
(1)  =  α =  β for β ≥ α.
(2)  is a fixed point of , that is, () = .

Proof. (1): Assume (I.H.) that α ≤ γ < β implies  α =  γ . Thus,



 <β = γ
γ <β

=  <α ∪ γ
α≤γ <β

=  <α ∪  α by I.H.

= by the choice of α
Hence,
 β =  <β ∪ ( <β )
=  <α ∪ ( <α )
= α

In particular,  = β  β =  <α =  α , for the above α.

As a by-product,  is a set.

(2): Since  α =  <α ∪ ( <α ), it follows that ( <α ) ⊆  α ; hence,
() ⊆  (i)
by  =  α =  <α , with α as above. Next, as an I.H., assume that
 <β ⊆ ()
Now,
 

() =  γ
γ
 

⊇ γ by monotonicity of . (ii)
γ <β

Thus, by I.H. and (ii),  β =  <β ∪ ( <β ) ⊆ ().


VII.7. Inductively Defined Sets 503

Therefore  β ⊆ () for all β; hence  ⊆ (). This settles the issue,
by (i). 

VII.7.17 Corollary. Let X be a set, and  : P(X ) → P(X ) be a monotone


operator. Then the following are provable (cf. VI.5.47):

(1)  is a set.
(2) There is an α such that  <α =  α . Moreover,  =  α =  β for β ≥ α.
(3) The α of (2) satisfies Card(α) ≤ Card(X ).
(4)  is a fixed point of , that is, () = .


Proof. (1): This is trivial, since  α ⊆ X for all α, and hence  = α  α ⊆ X .
(2): By the proof of VI.5.47 (see also the remark following that proof).
Alternatively, the function f = λs. min{α : s ∈  α }, that is, the one that maps
each s ∈  to its level, is a set by (1). Let α = sup+ ran( f ). Then
 
 
α = β ∪  β
β<α β<α

= β since every s ∈ ( <α ) is in some  β , β < α
β<α

=

The rest follows from VII.7.16.


(3): Since the function f :  → α of (2) is onto, and  ⊆ X , it follows that
Card(α) ≤ Card() ≤ Card(X ).
(4): From VII.7.16. 

VII.7.18 Remark. (1) For an arbitrary monotone operator , the smallest α


such that  <α =  α = , if it exists, is called the ordinal of  and is often de-
noted by ||.
(2) We next relax the concept of rule set to allow also (proper) rule classes.
However, we require some restriction on the size of the left hand sides of
rules. 

VII.7.19 Definition. An m-based rule class (possibly proper) R, where m is


regular, is a left narrow class of rules such that for every rule A "→ a of R, we
have Card(A) < m. An ω-based rule is called finitary. 

VII.7.20 Proposition. If  is a monotone operator (possibly a proper class)


R
defined from an m-based rule class R by (A) = {x : (∃Y ⊆ A)Y "→ x} for all
504 VII. Cardinality

A, then

(1) || ≤ m,
(2) Cl(R) = .

(A) is a set by left narrowness.

Proof. (1): Since  m ⊇  <m, it suffices to show that  m ⊆  <m. Let then
x ∈  m. Thus, either x ∈  <m, in which case there is nothing to prove, or
x ∈ ( <m). Thus, for some A ⊆  <m, we have Card(A) < m and A "→ x is
an R-rule. By VII.6.10, A ⊆  <α for some α < m; thus x ∈ ( <α ) ⊆  <m.
(2): By VII.7.16, () =  =  m. Thus,  is an R-closed set; hence Cl(R)
exists (i.e., is a set). Indeed,

Cl(R) ⊆  (i)

On the other hand, assume  <α ⊆ Cl(R). It follows (Cl(R) is R-closed) that

( <α ) ⊆ Cl(R)

and hence (by induction)

 α ⊆ Cl(R) for all α

which promotes (i) to equality. 

We apply these ideas to prove the important reflection principle.

VII.7.21 Lemma. For any formula F (y, xn ) and set N , there is a set M ⊇ N
such that:

(1) The following is provable:


 
u 1 ∈ M ∧ · · · ∧ u n ∈ M → (∃y)F (y, u n ) ↔ (∃y ∈ M)F (y, u n )

and
(2) M can be chosen to satisfy Card(M) ≤ max(Card(N ), ℵ0 ).

Proof. (1): Define the ω-based rule class


 
R = {u 1 , . . . , u n } "→ y : F (y, u n ) ∧ y has least rank ∪ {∅ "→ x : x ∈ N }

M = Cl(R) is a set (by VII.7.20), satisfying N ⊆ M.


Let us take u i , i = 1, . . . , n, in M. The ← -direction of (1) is trivial. Let then

(∃y)F (y, u ) (i)


VII.7. Inductively Defined Sets 505

Thus we may add F (a, u ), where a is a new constant and ρ(a) is minimum.
Since M is R-closed, a ∈ M; thus (∃y ∈ M)F (y, u ) by substitution axiom.
(2): Using AC, cut R down to T such that for each of the n! permutations w 
of u 1 , . . . , u n , where u i ∈ M (the M above) we keep a unique {u 1 , . . . , u n } "→ y
whenever F (y, w)  (this T is, of course, a set). Set M  = Cl(T ).
First of all, for all u i ∈ M  , (∃y)F (y, u ) ↔ (∃y ∈ M  )F (y, u ), exactly as
in (1). Next, by VII.7.20,

M = p (ii)
p∈ω

where  is the monotone operator associated with the rule set T (recall, T is

ω-based, whence the choice of upper bound of in (ii)).
By induction on p we argue that Card( p ) ≤ max(Card(N ), ℵ0 ). Indeed,
this is true for p = 0, as  0 = ( <0 ) = (∅) = N . Now,
  
 
Card( ) = Card
p+1
 ∪
i
 i

i≤ p i≤ p
    
 
≤ Card i +c Card  i (iii)
i≤ p i≤ p

By the I.H.,
 

Card  ≤ ( p + 1) ·c max(Card(N ), ℵ0 ) = max(Card(N ), ℵ0 )
i
(iv)
i≤ p

Also, setting S = i≤ p  i ,
   
Card (S) = Card {y : (∃
u ∈ S)({u 1 , . . . , u n } "→ y is in T )}
n
≤ Card(S) since T is single-valued in y
≤ max(Card(N ), ℵ0 ) by (iv) (v)

By (iii)–(v) the induction is complete. Thus, by (ii),

Card(M  ) ≤ ℵ0 ·c max(Card(N ), ℵ0 ) = max(Card(N ), ℵ0 ) 

VII.7.22 Corollary. Lemma VII.7.21 holds if we have a finite number of for-


mulas F i , i = 1, . . . , m. That is, for any set N , there is a set M ⊇ N such that:
(1) The following is provable for each i = 1, . . . , m:
 
u 1 ∈ M ∧ · · · ∧ u ik ∈ M → (∃y)F i (y, u ik ) ↔ (∃y ∈ M)F i (y, u ik )

and
(2) M can be chosen to satisfy Card(M) ≤ max(Card(N ), ℵ0 ).
506 VII. Cardinality

Lemma VII.7.21 also holds for any set of formulas that can be indexed within the
theory. However, for an arbitrary (infinite) set of formulas (not indexed within
the theory) the lemma breaks down. It is still true metamathematically, though,
since, arguing in the metatheory, we can index this (enumerable) set of formulas
using N as index set. Of course, the R so obtained (by “put {u 1 , . . . , u n } "→ y
in R as long as G i (y, u ) for some i ∈ N, and y has least rank”) is still ω-based.
The proof technique in VII.7.21 (and the flavour of the result in VII.7.23) is
analogous to that employed towards the downward Löwenheim-Skolem theorem
of model theory (proved in volume 1 of these lectures).

We next apply VII.7.22 to show that for any finite set of formulas, there is a
set M such that each of these formulas is absolute for M. We say that M reflects
these formulas.

VII.7.23 Theorem (Reflection Principle). For any set N and any finite set of
formulas F i , i = 1, . . . , m, there is a set M ⊇ N such that

ZFC  F i ↔ F M
i for i = 1, . . . , m

assuming u k ∈ M for all the free variables u k .


Moreover, an M with cardinality at most max(Card(N ), ℵ0 ) exists.

Proof. Let G j , j = 1, . . . , r , be the list of all formulas that consists of the list
F i , i = 1, . . . , m, augmented by all subformulas of the F i . If none of the G j is
of the form (∃y)Q , then take M = N . Otherwise, take M ⊇ N , using VII.7.22
on all formulas of the form (∃y)Q in the G j -list.
By induction on formulas we show next that

ZFC  G j ↔ G M
j for j = 1, . . . , r (1)

from which the theorem follows.


If G j is atomic, then (1) follows by VI.8.1 if any a ∈ N , added as a constant
to L Set , relativizes as a. If G j is ¬A or A ∨ B , then the I.H. guarantees that

ZFC  A ↔ A M

and

ZFC  B ↔ B M

from which, using VI.8.1 and the Leibniz rule, we get

ZFC  ¬A ↔ (¬A ) M
VII.7. Inductively Defined Sets 507

and

ZFC  A ∨ B ↔ (A ∨ B ) M

Finally, let G j be (∃y)Q . We get

ZFC  (∃y)Q ↔ (∃y ∈ M)Q by VII.7.22


↔ (∃y ∈ M)Q M by I.H. and Leibniz rule
 M
↔ (∃y)Q by VI.8.1

(1) is proved, and we are done. 

A consequence of the reflection principle is that ZFC cannot be finitely ax-


iomatized if it is consistent.† That is, there is no finite set of sentences F i ,
i = 1, . . . , n, such that for every formula Q ,

ZFC  Q iff F 1, . . . F n  Q. (1)

This is so because a consistent ZFC using (1) can prove the existence of a
(set) model for itself (i.e., one for {F 1 , . . . , F n }). Thus, ZFC can prove its
consistency (cf. I.7.8):

ZFC  (∃M)(¬U (M) ∧ F M


1 ∧···∧ F M
n )

by VII.7.23 and tautological implication‡ with help from the Leibniz rule. This
is contrary to Gödel’s second incompleteness theorem.
As a by-product of this observation, we also conclude that an extension
of VII.7.21 to an arbitrary set of formulas not only does not follow (in ZFC)
from our (Löwenheim-Skolem) proof technique, but is downright impossible.
Now, working in the metatheory, we can mimic the construction that builds
the model U A for some set of urelements A. Continuing in the metatheory, we
can apply reflection (to the enumerable – in the metatheory – set of axioms) and
“cut U A down” to an enumerable (U, ∈)-model (M, U, ∈).§ We can next apply
Mostowski collapsing (λx.C(M, x) : M → C(M, M); see VI.2.38, p. 312) to
get an ∈-isomorphic transitive set structure (C(M, M), U, ∈) which is also a
model, since λx.C(M, x) preserves atoms and the ∈-relation.¶

† ZFC, as given, indeed has infinitely many axioms. For example, collection provides one axiom for
each formula F . On the other hand, if ZFC is inconsistent, then certainly all its theorems (which
happen to be all formulas under these circumstances) follow from the single axiom (∀x)x = x.
‡ If ZFC  A ↔ A M and ZFC  A, then ZFC  A M .
§ Take N in the proof of VII.7.23 to be enumerable.
¶ See VI.2.36–VI.2.39. Of course, M is extensional, being a ZFC (U, ∈)-model.
508 VII. Cardinality

Thus, Platonistically, we have shown that a so-called countable transitive


model (CTM) – (C(M, M), U, ∈) – for ZFC exists.
This provides the source for the so-called “Skolem paradox” (not a real
paradox): In ZFC we can prove the existence of sets of enormous cardinality.
In particular, we can prove (Cantor’s theorem) that

ω ∼ P(ω) (1)

However, it is “really true” that

ωC(M,M) ∼ P(ω)C(M,M) (2)

since both sets are enumerable. Isn’t this a contradiction?


We know better by now. (2) is irrelevant, for it is not equivalent to (ω ∼
C(M,M)
P (ω)) , due to the presence of an unbounded existential quantifier. As far
as an inhabitant of C(M, M) is concerned, he sees that

|=C(M,M) ω ∼ P(ω)

and this is as it should be by (1). This person cannot see the 1-1 correspondence
f that effects (2), for f is not in C(M, M). Note that the expression immediately
above says the same thing as (cf. VI.8.4)
 C(M,M)
|=U N ω ∼ P(ω)

Consistency of GCH with ZF. We conclude this chapter with a proof that L N
is a model of GCH. Which L N ? We add to ZF the new constants N , f and the
axiom

¬U (N ) ∧ (∀z ∈ N )U (z) ∧ f is a 1-1 function ∧ dom( f ) ⊆ ω ∧ ran( f ) = N

To avoid unnecessary linguistic (and notational) acrobatics we call this conser-


vative extension of ZF just ZF. Then we build L N in ZF as before.
We also bypass tedious relativizations to L N by working in ZF + (V = L)
or a conservative extension thereof throughout (cf. VI.9.18). Thus, rather than
proving ZF GCHL N , we prove GCH in ZF + (V = L) instead. The key lemma
is the following:

VII.7.24 Lemma. In ZF + (V = L) we can prove that if A is a transitive set


and Card(A) ≤ ℵα , then A ⊆ {Fβ : β < ℵα+1 }.

Once the above is settled, one can easily prove:


VII.7. Inductively Defined Sets 509

VII.7.25 Theorem. GCH is provable in ZF + (V = L).

Proof. Let S ⊆ ℵα . Thus, A = ℵα ∪ {S} is transitive, for x ∈ A leads to two


cases: x = S (and we are done by S ⊆ ℵα ), or x ∈ ℵα (but ℵα is transitive). More-
over, Card(A) ≤ ℵα +c 1 = ℵα . Thus, A ⊆ {Fβ : β < ℵα+1 } by VII.7.24, from
which

S ∈ {Fβ : β < ℵα+1 }

Therefore

P(ℵα ) ⊆ {Fβ : β < ℵα+1 }

Hence
   
Card P(ℵα ) ≤ Card {Fβ : β < ℵα+1 } ≤ ℵα+1
 
– the last ≤ due to the onto map ℵα+1 % β "→ Fβ . Since ℵα+1 ≤ Card P(ℵα ) ,
we are done. 

VII.7.26 Corollary. If ZF is consistent, then so are ZF + GCH and


ZFC + GCH.

Proof of Lemma VII.7.24. Recall that, working in ZF + (V = L), we get AC


for free; therefore all our work on cardinals is available to us. The proof is an
application of reflection followed by Mostowski collapsing. Freeze then sets A
and m along with the assumptions

A is transitive and Card(A) ≤ m, m being an infinite cardinal† (1)

It is convenient to work in a conservative extension T of ZF + (V = L) for a


while.
Let us denote by L the language of ZF + (V = L) (this includes the constants
N and f ). T is obtained from ZF + (V = L) by adding to L a new constant B
and the axioms

{ f, N } ∪ N ∪ T C( f ) ∪ A ⊆ B ∧ ¬U (B) (2)

Card(B) ≤ m (3)

† Recall that “freezing”, jargon we have applied constantly towards invoking the deduction theorem,
formally means to add new set constants, A and m, and the assumptions (1).
510 VII. Cardinality

and the schema


A ↔ AB for every sentence A of L (4)
where N = N and f = f . Note that (2) and (3) are relatively consistent by
B B

(1) and the choice of N .


To see that T is as claimed, let T A, where A is over L. Fix attention
to one such proof, and let F 1 , . . . , F n be the universal closures of all the
axioms among (2)–(4) appearing in it. Thus, ZF + (V = L) – although, over
an extended language L  that includes the constant B – proves
F 1 ∧ ··· ∧ F n →A (5)
by the deduction theorem. Therefore (cf. I.4.16), we have a proof in ZF +
(V = L) of F 1 ∧ · · · ∧ F n → A – this formula being over L – where F i
is obtained from F i by replacing all occurrences of B in it by a new variable
z (the same z is used in all the F i ) that does not occur as either free or bound
in (5). By ∃-introduction,

ZF + (V=L) (∃z)(F 1 ∧ · · · ∧ F n ) → A (6)
However,

ZF + (V=L) (∃z)(F 1 ∧ · · · ∧ F n )
by VII.7.23 and Card({ f, N } ∪ N ∪ T C( f ) ∪ A) ≤ m by N being count-
able. VII.7.23 is applicable because only finitely many sentences (from schema
(4)) are involved in F 1 ∧ · · · ∧ F n . By (6) we now have a proof of A in
ZF + (V = L) which establishes the conservative nature of the extension T.
I = (L  , T, B) is a formal (U, ∈)-model of ZF + (V = L), where L  is L
with the addition of B. Indeed, if F is the universal closure of a ZF + (V = L)
axiom, then T F , since T is an extension of ZF + (V = L). By (4), T F B .
We prefer to have a transitive (set) model, so we invoke Mostowski collapsing
(cf. VI.2.38). Since B (argot for I) is a model of ZF + (V = L), it satisfies, in
particular, extensionality. That is, it is extensional in the sense of VI.2.38. We
have shown there that the unary function φ introduced in the theory T over L 
by the recursive definition†

x if U (x)
φ(x) = (7)
{φ(y) : y ∈ B ∧ y ∈ x} otherwise
satisfies, for extensional B,
 
T  x ∈ B ∧ y ∈ B → x ∈ y ↔ φ(x) ∈ φ(y) (8)

T  φ is 1-1 (9)

† φ(x) = C(B, x) in the notation of VI.2.36. N and f are fixed points of φ. (Why?)
VII.7. Inductively Defined Sets 511

and
T  ran(φ) is transitive† (10)
By the first case of (7)
 
T  x ∈ B → U (x) ↔ U (φ(x)) (11)
where T is the conservative extension of T obtained by adding the introducing
axiom for φ.‡ Its language is L  , that is, L  with φ added.
Now, I = (L  , T , B) is also a model of ZF + (V = L), and (8)–(11) yield
a formal isomorphism (cf. p. 84) between I and the transitive interpretation of
L, J = (L  , T , ran(φ)).§ In fact J is a formal (U, ∈)-model of ZF + (V = L).
To see this we employ I.7.12 to obtain
T  A B ↔ A ran(φ) (12)
for all sentences over L. This and (4) entail
T  A ↔ A ran(φ) (13)
for all L-sentences. If now F is the universal closure of a ZF + (V = L) axiom,
then T  F ; hence, by (13), we also have T  F ran(φ) .
We note two more facts:
by (9)
T  Card(ran(φ)) = Card(B) ≤ m (14)

T  {N } ∪ N ∪ A ⊆ ran(φ) (15)
the latter by Exercise VI.6, since {N } ∪ N ∪ A is transitive. By (15) and results
in Sections VI.8 and VI.9, α "→ Fα is absolute for ran(φ) (i.e., for J); hence so
is ord, since it is introduced by the explicit definition¶
ord(x) = min{α : x = Fα }
Then
T  x ∈ A → ord(x) ∈ ran(φ) (using (15))
Hence, by transitivity of ran(φ),
T  x ∈ A → ord(x) ⊆ ran(φ)

† This does not need the extensional nature of B. By the way, T  ran(φ) = C(B, B) in the
notation of VI.2.38.
‡ Extensions by definitions are conservative. Cf. I.6
§ Where U , ∈, N and f are interpreted as themselves, just as in I . cf. footnote to the definition
of φ.
¶ ord(x) = α ↔ (x = Fα ∧ (∀β ∈ α)x = Fβ ), etc.
512 VII. Cardinality

Further, bringing (14) in,


T  x ∈ A → Card(ord(x)) ≤ m (16)
Let now x ∈ A. Suppose also m+ ≤ ord(x). Thus (≤ is ⊆), m+ ≤ Card(ord(x)),
contradicting (16). Hence ord(x) < m+ , and therefore x ∈ {Fβ : β < m+ }. We
have shown
T  A ⊆ {Fβ : β < m+ }
Invoking the deduction theorem and remembering the assumptions (1) (with
frozen variables) that we made at the beginning of the proof, we have a proof in
T of “if A is a transitive set, m an infinite cardinal, and Card(A) ≤ m, then A ⊆
{Fβ : β < m+ }”. This is precisely the statement of the lemma, and we are done,
for this argot can be trivially translated into a formula over L. The conservatism
of T means that we have proved the quoted statement in ZF + (V = L). 

VII.8. Exercises
VII.1. Show that ω ∼ ω + 2.
VII.2. Show that if A ∼ B, then A is finite iff B is finite.
VII.3. Fill in the missing details in the proof of Proposition VII.1.19.
VII.4. Show that the concatenation of any finite number of (I , F )-derivations
is a (I , F )-derivation.
VII.5. Prove, using first Definition VII.1.32 and then Definition VII.1.3, that
for any x and y such that x = y, both {x} and {x, y} are finite. Use the
second method to also compute their cardinality.
VII.6. Fill in any missing details in the proof of Proposition VII.1.35.
VII.7. Show, using induction on WR-finite sets, that if A is WR-finite and f
is a function, then f [A] is WR-finite.
VII.8. Show that every natural number is WR-finite.
(Hint. Induction on WR-finite sets.)
VII.9. Prove that if A is WR-finite and B ⊆ A, then B is WR-finite. (Do not
use the equivalence of finite with WR-finite.)
VII.10. Prove, by induction on finite sets, that if A is finite and < is a partial
order on A, then A has both a <-minimal and a <-maximal element.
VII.11. Using the previous problem, give an alternative proof that ω is infinite.
(Hint. Consider the partial order ∈ on ω.)
VII.12. If |A| = n + 1 and a ∈ A, then |A − {a}|= n.
VII.8. Exercises 513

VII.13. If A and B are finite, then so are A ∩ B, A ∪ B, A − B, and A × B.


(Hint. It is convenient to use induction on finite sets for the cases “∪”
and “×”.)
VII.14. If A is finite, then P(A) is finite. In fact, show that if |A| = n, then
| P(A)| = 2n .
(Hint. Use induction on n. Alternatively, count the subsets of A by
counting their characteristic functions.† )
VII.15. Show that if A is an infinite set and B is finite, then A ∪ B, A × B, and
A − B are also infinite, whereas A ∩ B is finite.
VII.16. Show that if A is enumerable and B is finite, then A ∪ B and A − B
are also enumerable.
VII.17. Show that a set is finite iff all its proper subsets are finite.
VII.18. Without using the notion of WR-finite, show that for any finite set A
and any function f , f [A] is finite.
VII.19. Show that the set of finite subsets of ω is enumerable.
(Hint. Identify these sets with their characteristic functions.)
VII.20. Show that the function f , defined in the proof of Theorem VII.2.5, is
strictly increasing.
VII.21. Give a proof of Corollary VII.2.6 without the help of V.3.9 – in parti-
cular, without the help of the axiom of choice.
VII.22. Show that if A is countable and f : A → B is onto, then B is countable.
VII.23. Show that the enumeration pictured in Example VII.2.17 (p. 448) is
given by the function f −1 , where
(x + y)(x + y + 1)
f = λx y. +y
2
Also show that f is a 1-1 correspondence ω2 ∼ ω.
VII.24. Show that λx y.2x (2y + 1) − 1 provides a 1-1 correspondence ω2 ∼ ω.
(Hint. Relate this to the prime factorization theorem, and observe that
all primes except 2 are odd.)
VII.25. Show that the function f : ω2 → ω given by f (x, y) = (x + y)2 + y
is 1-1 but not onto.
(Hint. For “not onto” find an example. For 1-1, find a g such that
g ◦ f = 1 (see Proposition V.3.4). Finding g : ω → ω2 amounts to

† If A ⊆ X and the set X is fixed throughout the discussion, then the characteristic function of A
(with respect to X being understood) is the function χ A = λx.if x ∈ A then 0 else 1.
514 VII. Cardinality

solving the equation z = (x + y)2 + y for (unique, of course) x and y


in ω. Why?)
VII.26. Fill in the missing details in the argument of Example VII.2.20.
VII.27. An algebraic number is √ a real root of a polynomial with integer co-
efficients. For example, 2 is algebraic (root of x 2 − 2 = 0). Each
n ∈ Z is algebraic. It is known that the number π is not algebraic.
(Non-algebraic numbers are called transcendental.) Show that the set
of all algebraic numbers is enumerable.
(Hint. Use the fact that a polynomial of degree n can have at most n
real roots.)
VII.28. Prove Theorem VII.2.25 only with the aid of Lemma VII.2.23.
(Hint. If A is infinite, then it has an enumerable subset B, by Lemma
VII.2.23. Let b ∈ B. Then B − {b} is a proper enumerable subset of
B, by Exercise VII.16.)
VII.29. Without the help of the axiom of choice, show that the set of irrational
numbers, R − Q, is equipotent with R.
(Hint. Find, in R − Q, a set of irrational numbers equinumerous
with Q.)
VII.30. Prove Corollary VII.3.3.
(Hint. This follows from the technique of Example VII.3.1. Define d
so that it is a member of ω 2.)
VII.31. Fill in the missing details in Example VII.3.5.
VII.32. Show that every number in [0, 1] with a finite binary expansion is
rational.
VII.33. Show that every non-zero number in [0, 1] has an infinite binary
expansion.
(Hint. If it has a finite expansion, show that it also has an infinite
expansion.)
VII.34. Show that for any reals a < b, (a, b) ∼ (a, b] ∼ [a, b) ∼ [a, b] ∼ R.
VII.35. Show that (0, 1]×(0, 1] ∼ (0, 1], without the help of Cantor-Bernstein
theorem, as follows: Identify each real in (0, 1] with its unique infinite
decimal expansion. At first, experiment as follows: Given .a0 a1 a2 . . .
ai . . . ∈ (0, 1], form the pair .a0 a2 . . . a2i . . . , .a1 a3 . . . a2i+1 . . . . For
example, .140567 . . . yields .106 . . . , .457 . . ., and

.5510 10. .. yields .51 1 . . ., .50 0


 . . ..
all 10’s all 1’s all 0’s
VII.8. Exercises 515

This last example shows that our tentative plan needs an amendment,
for the second component of the pair is not a “real” as we understand
them in this problem. Make an appropriate amendment to obtain a
1-1 correspondence.
(Hint. Revise the way you split .a0 a1 a2 . . . ai . . . into the blocks that
you use to alternately build the two components of the pair, so that
each block contains a non-zero digit.)
VII.36. Show, using the previous problem, that R2 ∼ R.
VII.37. Prove Proposition VII.4.11.
VII.38. Show that α "→ ℵα is onto Cn − ω.
VII.39. Show that if a ∼ a  and b ∼ b , then a  b yields a   b , and a ≺ b
yields a  ≺ b .
VII.40. (Tarski.) Show without the help of AC that a set x is infinite iff P(P(x))
contains an enumerable subset.
(Hint. Consider the function f : ω → P(P(x)) given by f (n) =
{a ∈ P(x) : a ∼ n}.)
VII.41. For any α, β, show that Card(α + β) = Card(α) +c Card(β).
VII.42. Show that a < b does not, in general, imply a +c c < b +c c.
VII.43. Without using VII.5.14, prove that a +c c = c for all a ≤ c, where
c = Card(R).
VII.44. Prove VII.5.12 in a different way: Use induction on n.
VII.45. For any α, β, show that Card(α · β) = Card(α) ·c Card(β).
VII.46. Show that for all α, β, ℵα +c ℵβ = ℵα ·c ℵβ = ℵα∪β .
VII.47. Show that for all a > 0, 0a = 0.
VII.48. Show that (a ·c b)c = ac ·c bc for all a, b, c.
VII.49. Fill in all the missing details in the proof of VII.5.21.
VII.50. Compute n ℵ0 for all n ∈ ω.
VII.51. Compute cℵ0 .
VII.52. Show that c < cc .
VII.53. Compute (cc )c in terms of f = Card(R R).
VII.54. Compute the cardinality of the set of all continuous real-valued func-
tions on R.
(Hint. A continuous function is uniquely determined by its restriction
on Q, the set of rational numbers.)
516 VII. Cardinality

VII.55. Compute the cardinality of the set of all differentiable real-valued func-
tions on R.

VII.56. Show that ℵα β = 2ℵβ on the assumption that α ≤ β.
(Hint. Use VII.5.21 and VII.4.25.)
VII.57. Show that if k ≤ l, then ak ≤ al .
VII.58. Prove that for any ordinal α, cf(ℵα+ω ) = ω.
VII.59. Prove that if Ai ∼ Bi for all i ∈ I , and if Ai ∩ A j = ∅ = Bi ∩ B j

whenever i = j, then i∈I Ai ∼ i∈I Bi .
!
VII.60. Prove that Card( i∈I Ai ) ≤ i∈I Card(Ai ).
! !
VII.61. Prove that ai ≤ bi for all i ∈ I implies i∈I ai ≤ i∈I bi .
!
VII.62. Prove that i∈m ki = m ·c supi∈m ki , if ki > 0 for all i, and at least one
cardinal among m and ki is infinite.
VII.63. Prove that an infinite cardinal a is singular iff there are cardinals b < a
!
and mλ < a, for all λ < b, such that a = λ∈b mλ .
# #
VII.64. If Ai ∼ Bi for i ∈ I , then i∈I Ai ∼ i∈I Bi .
VII.65. Show that cf(2ℵα ) > ℵα for any α.
VII.66. (Bernstein.) Prove that ℵℵn α = 2ℵα ·c ℵn for all n ∈ ω and all α.
VII.67. Define the beth function, , by the induction 0 = ℵ0 , α+1 = 2α

and, if Lim(α), α = β<α β . Show that  has fixpoints.
VII.68. Show that if GCH holds, then α = ℵα for all α.
VII.69. Let Card(N ) < ω. Show that Card(VN (ω + α)) = α for all α.
VII.70. Show that if Lim(α), then VN (α) is a model of ZFC less collection.
VII.71. Let Card(N ) < α, where α is strongly inaccessible. Then the following
are absolute for VN (α):
(1) β is a cardinal,
(2) f : β → γ is cofinal,
(3) cf(β),
(4) β is strongly inaccessible.
VII.72. Prove that “β is a cardinal” is not absolute for VN (ω + 2).
VII.73. Let R be a relation on a set A. Define W f (R), the well-founded part
of R, as the set {a ∈ A : there is no infinite chain . . . R a2 R a1 R a}
(where, of course, all the ai are in A, since R ⊆ A × A). Prove that
 = W f (R), where R
Cl( R)  on P(A) × A is given by

R
X "→ x iff X = Rx
VII.8. Exercises 517

(Hint. For ⊆ use induction on the structure of Cl(


R). For ⊇ recall
VI.2.13 and use MC over W f (R).)
VII.74. Prove that the rule set that defines the syntax of the formulas of propo-
sitional calculus is unambiguous. This rule set, P, is
∅ "→ p for each variable p
{x, y} "→ (x ∨ y)
{x, y} "→ (y ∨ x)
{x} "→ (¬x)
(Hint. Show by induction over Cl(P) that (1) every member of Cl(P)
has as many “(”-symbols as “)”-symbols, (2) every nonempty proper
(string) prefix of a member of Cl(P) has strictly more “(”-symbols
than “)”-symbols. The rest should be easy.)
VII.75. Prove that the rule set that defines the syntax of the fully parenthe-
sized arithmetic expressions (on the symbol set {1, 2, 3, ×, +, (, )}) is
unambiguous. This rule set, P, is
∅ "→ p for p = 1, 2, 3
{x, y} "→ (x + y)
{x, y} "→ (y + x)
{x, y} "→ (x × y)
{x, y} "→ (y × x)
(Hint. As in the preceding exercise.)
VII.76. Imitate Exercise VII.75 to define a rule set that defines all the terms of
a first order language, and prove that your rule set is unambiguous.
VII.77. Imitate Exercise VII.75 to define a rule set that defines all the formulas
of set theory, and prove that this rule set is unambiguous
(Hint. Brackets, once again, are important.)
VII.78. Prove that for any total operator  (even one that is a proper class),  α
is a set for all α.
VII.79. Prove VII.6.31 differently: Invoke Gödel’s second incompleteness
theorem.
VIII

Forcing

The method of forcing was invented by Cohen (1963) towards the construction
of non-standard models of ZFC, so that “new axioms” could be proved consis-
tent with the standard ones. Our retelling of the basics of forcing found in this
chapter is indebted primarily to the user-friendly account found in Shoenfield
(1971). The influence of the expositions in Burgess (1978), Jech (1978b), and
Kunen (1980) should also be evident.
In outline, the method goes like this: Suppose we want to show that ZFC
(sometimes ZF or an even weaker subtheory) is consistent with some weird
new axiom, “NA”. Working in the metatheory, one starts with a CTM, M,†
for ZFC. This is the ground model. One then judiciously chooses a PO set,‡
P, <, 1, in M – where we find it convenient to restrict attention to PO sets
that have a maximum element (let us call the latter “1”) – and, using the PO
set, one constructs a so-called generic set G. Circumstances normally have G
obey G ∈ / M. The “judicious” aspect of the choice of the PO set will entail that
the generic extension, M[G], of the CTM M not only contains G as an element
but is a CTM itself that satisfies NA as well (i.e., |= M[G] ZFC + NA). Thus, one
has a proof in the metatheory that if ZFC is consistent (i.e., if a CTM for ZFC
exists), then so is ZFC + NA.
We have said above that “P, <, 1 ∈ M”. By absoluteness of pair (see
Section VI.8), the quoted statement is equivalent to “P ∈ M and <∈ M and
1 ∈ M”.

† We know that we have used the symbol M for {x : U (x)}. However, it is normal practice for
people to also denote by M an arbitrary CTM of ZFC. We are rapidly running out of symbols;
therefore we ask the reader to allow us this overloading of the letter M with more meanings than
one. As always, we will invoke context in our defense.
‡ This is the hard part of the method.

518
VIII. Forcing 519

The above argument cannot be formalized in ZFC “as is” to provide a finitary
proof of relative consistency, namely, along the lines “if ZFC is consistent, here
is how we can construct a set model for ZFC + NA in ZFC ”. Unfortunately
such a construction would formally prove as a corollary (in ZFC) that ZFC is
itself consistent, clashing with Gödel’s second incompleteness theorem.

Pause. In ZF we have shown how to construct a model, L, of ZFC + GCH.


Why does this not contradict Gödel’s second incompleteness theorem?

We can still circumvent this difficulty and provide finitary proofs of relative
consistency results of the type “if ZFC is consistent, then ZFC + NA is too”,
using forcing. One attempts the contrapositive instead:

If ZFC+NA 0 = 0, then ZFC 0 = 0 (1)

But the if part means that for some finite set of axioms of ZFC, ,

 ∪ {NA}  0 = 0 (2)

Now, we can construct a CTM, M, just for , inside ZFC using reflection
(cf. VII.7.23 and the proof of VII.7.24) followed by Mostowski collapsing.
Using forcing, and continuing to work formally inside ZFC, we get a generic
extension of M, M[G], that is a formal model of  ∪ {NA}. Now this is fine
by Gödel’s second theorem, for  is not the entire ZFC axiom set. By (2), we
have shown ZFC (0 = 0) M[G] , and hence ZFC 0 = 0, since 0 is absolute for
transitive classes. This concludes the forcing proof of (1) in a finitary manner.

We will do our forcing arguments in the metatheory.

A Note on Proofs. We will be working in the metatheory throughout most of


this chapter, using the ZF axioms as our hypotheses (sometimes adding AC).
We will do so usually from within U A (for a usually unspecified set of start-up
atoms, A), but often from within some CTM M, that is, relativizing formulas
and arguments to M.
Our proof terminology will be similar to that of the “practising mathemati-
cian”. We will say “true” or “false” for sentences (whereas before we have
always said “provable” or “refutable”) and, moreover, we will usually refrain
from reminding the reader that this or that principle of logic (e.g., proof by
auxiliary constant, deduction theorem, proof by cases, etc.) is at work.
520 VIII. Forcing

VIII.1. PO Sets, Filters, and Generic Sets


In this chapter we will fix attention on special types of PO sets that have a
maximum element (which is, of course, unique), invariably denoted by “1”.

VIII.1.1 Definition. Let P, <, 1 be a PO set with a maximum element –


which we will denote by “1”. Throughout this chapter we will call the members
of P forcing conditions or just conditions, and such PO sets notions of forcing.
We will use the letters p, q, r, s, with or without primes or subscripts, for
conditions.
If p = q ∨ p < q, we write p ≤ q and say that p extends, or is an extension
of, q.† When p < q, then p is a proper extension of q. Two conditions p and
q are compatible iff there is an r ∈ P such that r ≤ p and r ≤ q. If two con-
ditions p and q are not compatible, then they are incompatible and we write
p⊥q.
The abbreviation “ p and q are comparable” stands for “ p = q ∨ p <
q ∨ q < p”.
A set O ⊆ P is open iff, for any p ∈ O, ≤  p ⊆ O. In particular, every
segment ≤  p is open.
A set D ⊆ P is dense iff it meets every open set. In other words, for every
p ∈ P there is a q ∈ D such that q ≤ p (i.e., D ∩ ≤  p = ∅).
A chain over the PO set‡ P is a set C ⊆ P such that any two elements of C
are comparable.
An antichain over the PO set P is a set A ⊆ P such that every two of its
members are incompatible. 

VIII.1.2 Remark. In structure parlance (cf. Section I.5) a PO set P, <, 1 is
a structure, with underlying set (or domain) P and with < and 1 as specified
relation and function (a 0-ary function or constant) respectively. Thus, if need
arises, we will use fraktur type (and the same letter as the domain) to name the
structure; in the present case, P = P, <, 1.
The terms “open” and “dense” are not accidental. There is a strong relation
between the homonymous topological concepts and forcing, but this connection
with topology will not be pursued here. By the way, the fully qualified terms
are P-open and P-dense respectively, but usually the qualification is omitted
and P is understood from the context. 

† Yes, the extension is the “smaller” of the two. This terminology is due to Cohen (1963).
‡ Recall that when the order < is understood, we say “PO set P” instead of “PO set P, <, 1”.
VIII.1. PO Sets, Filters, and Generic Sets 521

VIII.1.3 Example. The set of finite functions from ω to {0, 1}, i.e.,

{ f : f is a function and dom( f ) ∈ ω ∧ ran( f ) ⊆ 2} (1)

can be given a PO set structure as follows: Define f < g to mean g ⊂ f (we


order by “reverse inclusion”). Here ∅ : ω → 2 is 1, the maximum element. For
the sake of future reference, let us give this PO set the name O = O, <, 1
(or O = O, ⊃, ∅), where we have used the name “O” for the set displayed
in (1). We also note that O has no minimal elements. Such an element would
be a finite function that had no finite proper extension. 

VIII.1.4 Definition. Let P, <, 1 be a PO set as in VIII.1.1.


A set F ⊆ P is a filter over P, <, 1 – also called a P, <, 1-filter, or a
P-filter if < is understood, or just a filter if P is also understood – provided the
following conditions are fulfilled:

(1) 1 ∈ F.
(2) For any two members p and q of F there is an r in F such that r ≤ p and
r ≤ q.
(3) If p ∈ F and p ≤ q, then q ∈ F (or, p ∈ F → ≥ p ⊆ F). 

In view of (3) in VIII.1.4, (1) is equivalent to the requirement “F = ∅”.


Note that in (2) we have asked more than compatibility of any two members
of F: We want this compatibility to be “witnessed” inside F.

In algebra people define their filters in a stronger manner. First off, one requires
that the PO set P be a lattice, that is, for any two of its members p and q, both
sup{ p, q} and inf{ p, q} exist.† One then calls an F ⊆ P a filter if it satisfies

(i) F = ∅ (same as (1) in VIII.1.4 if the lattice has a maximum element),


(ii) for any two members p and q of P, inf{ p, q} ∈ F iff { p, q} ⊆ F.

Note that if F is a filter in the sense (i)–(ii) over the lattice P = P, <, 1,
then it is as well in the sense (1)–(3) of VIII.1.4 over the PO set P. Indeed,
by (ii), if p and q are in F, then so is inf{ p, q}, providing the “witness” that (2)
requires. Also, if p ∈ F and p ≤ q (q ∈ P), then p = inf{ p, q}; hence (by (ii))
q ∈ F.

† sup and inf were defined in VI.5.23.


522 VIII. Forcing

VIII.1.5 Example. Fix a PO set P. The set {1} is a filter.


If ∅ = S ⊆ P is a chain, then we can build the ⊆-smallest filter that contains
S. We call it the filter generated by S.
To see that this exists, define
def
F = { p ∈ P : (∃q ∈ S)q ≤ p} (∗)

Trivially (by x ≤ x), S ⊆ F. We next verify that F is a filter. For property (1)
(of VIII.1.4), pick any q ∈ S (S is not empty). But then q ≤ 1; hence 1 ∈ F
by (∗). Property (3) is also trivially verified. As for (2), let p and p  be in F.
Then q ≤ p and q  ≤ p  for some q and q  in S. For the sake of concreteness,
say q ≤ q  (by comparability of S-elements). Then q is an appropriate witness
for the compatibility of p and p  .
Let F  be any filter such that

S ⊆ F (∗∗)

Let p ∈ F and also q (in S, by (∗)) such that q ≤ p. Since q ∈ F  by (∗∗) and
F  is a filter, it follows that p ∈ F  . Thus, F ⊆ F  . 

VIII.1.6 Example. Refer to the PO set O of VIII.1.3. Suppose that F is a filter



over O. Then F is single-valued (by VIII.1.4(2)), i.e., a function ω → 2.


VIII.1.7 Definition (Generic Sets). Given a PO set P = P, <, 1 and a set
M. A subset G ⊆ P is called M-generic iff

(1) G is a filter over P,


(2) G meets every P-dense set D that is a member of M (that is G ∩ D = ∅).


The reader is reminded that the phrase “D is P-dense” subsumes the sub-
phrase “D ⊆ P”; see VIII.1.1 and VIII.1.2.

VIII.1.8 Theorem (Generic Existence Theorem). Let P = P, <, 1 be a


notion of forcing, and M be countable. Fix any p ∈ P. Then there is an M-
generic set G ⊆ P such that p ∈ G.

Proof. Let m 0 , m 1 , m 3 , . . . be a fixed enumeration of M. We define by recursion


a function f on ω by

f (0) = p
VIII.1. PO Sets, Filters, and Generic Sets 523

and

m smallest k such that m k ∈ ≤  f (n) ∩ m n if ≤  f (n) ∩ m n =
 ∅
f (n + 1) = (∗)
f (n) if ≤  f (n) ∩ m n = ∅
Note that the explicit definition of the subscript “k” above avoids an infinite
set of “unspecified choices” (AC). Also note that the last case above always
obtains if m n is an urelement.
It is easy to see that ran( f ) is a nonempty ( p ∈ ran( f )) chain: First off, that
ran( f ) ⊆ P is trivial. Next, the reader can verify that (∀n ∈ ω) f (n + 1) ≤ f (n)
holds (induction on n; the last case in the definition of f guarantees that f is
total on ω).
Taking for G the filter generated by the chain ran( f ) (see VIII.1.5) will
do. Indeed, that p ∈ G is trivial. Let then D ∈ M be dense. Now D = m n
for some n. Then the first condition in (∗) gives us f (n + 1) = m k , where
m k ∈≤  f (n) ∩ D. Since m k ∈ G, we have G ∩ D = ∅. 

VIII.1.9 Example. Let M be a CTM for, say, ZF. We will consider the PO
set of VIII.1.3 relativized in M. By absoluteness of pairing and finiteness (see
Section VI.8), {a, b}, ordered pairs, and finite functions are (M-) absolute, and
so is Pω (A) defined as {x : x ⊆ A ∧ x is finite} (see Exercise VIII.3). We also
recall that finite ordinals (and ω), dom, and ran are absolute for M.
Thus one may redo Example VIII.1.3, this time arguing from within M, as
an inhabitant† of M would do, to obtain in M the PO set P = P, ⊃, ∅, where
P = { p : p is a function ∧ dom( p) ∈ ω ∧ ran( p) ⊆ 2} (1)
He will conduct his argument by noting that ω and 2 are in M, and therefore so
is ω × 2; thus P ∈ M, by separation, since‡ P = { p ∈ Pω (ω × 2) : p is a func-
tion ∧ dom( p) ∈ ω} (he knows that M is a ZF model, so he can do all that). It
then follows that P is in M as well, by the fact that M is closed under pairing.

Pause. Why doesn’t he just say that P is a set by separation, since P ⊆ M?

Equivalently, a being of U A argues the same thing by making the case that
P M given in
P M = { p ∈ Pω (ω × 2) : p is a function ∧ dom( p) ∈ ω} (1 )

† This person uses just “{ p : etc.}” rather than “{ p ∈ M : etc.}”, since the ∈ M part is implicit;
there are no universes beyond M for him.
‡ For him Pω (A) consists precisely of these finite subsets of A that are also members of M – which
are all the finite subsets of A, absolutely speaking.
524 VIII. Forcing

is in M, since separation holds in M and Pω (ω × 2) ∈ M (why?). Here we are


invoking VI.8.4 and VI.8.13, noting that, for p ∈ M, absoluteness makes the
presence of a superscript “ M ” redundant inside the braces in (1 ) above.
In particular, all this shows that P (hence also P) is absolute for M.
Let G be M-generic. Such a G can be constructed by VIII.1.8, since M is
countable. Let us continue working in U A to construct G.

Since G is a filter, G is a function (VIII.1.6). Note that, for any n ∈ ω,
the set Dn = { p ∈ P : n ∈ dom( p)} is P-dense in M; for if q ∈ P, then either
q(n) ↓ (in which case q ∈ Dn – indeed, any extension p ≤ q is in Dn )† or
q(n) ↑. Well, then, define p = q ∪ {n, 0} (this is in M, by absoluteness of ∪
and pairing). We have p ⊃ q and p ∈ Dn . Thus G meets all the Dn , in other

words, n is in the domain of G for all n: dom F = ω.
Is G ∈ M? Suppose yes, and consider the set (in M by absoluteness of
difference) P − G. This is P-dense in M: Let p ∈ P. Let q and r be two
incompatible extensions of p in M. For example, say n is smallest such that
p(n) ↑. Set then q = p ∪ {n, 0} and r = p ∪ {n, 1}. Now q and r cannot
both be in G for q⊥r . Say q ∈
/ G. But then q ∈ P − G. Having established
the density of P − G (in M), genericity would now imply (P − G) ∩ G = ∅,
a contradiction.
We have said above “Let us continue working in U A to construct G”. We
see that such caution was justified, for an inhabitant of M cannot construct
this G. 

VIII.2. Constructing Generic Extensions


Our purpose is to define a procedure which for any CTM M and M-generic set
G builds a CTM, M[G], that is the ⊆-smallest extension of M containing G
as a member. Such an extension is called a generic or Cohen extension of the
ground model M.
Moreover, we want to be able to empower inhabitants of M to discuss aspects
of M[G] notwithstanding the fact that a lot of objects in M[G] are not in M –
for example, G under “practical circumstances” is not; see VIII.1.9.

To keep our sanity, we will usually employ the language and methods of ZF
(or ZFC, or even of a fragment of ZF) “in the abstract” (i.e., formally) to effect
our various constructions.‡ We can afterwards relativize what we have done

† Recall that this P has no minimal elements.


‡ One can view this approach alternatively: We are really working, metamathematically, within the
“real” universe, U A , however restricting our methodology to only employ ZF axioms and rules
of logic. Of course, any confirmed Platonist who has followed us this far will say: “That’s an odd
comment; wasn’t it exactly this approach that we took all along?”
VIII.2. Constructing Generic Extensions 525

to some CTM M, using results from Sections VI.8 and, sometimes, VI.9. On
occasion, it might be just as easy to work as an inhabitant of M would and,
using the methods of ZFC (or ZF, or . . . ), argue in effect from within M, as we
have done in the initial part of VIII.1.9.

Our discussion will be, in general, dependent upon a PO set “variable”, which
we will invariably call by the nondescript name P = P, <, 1. For convenience
we will use the (fairly standard) notation

|P| stands for P (1)

VIII.2.1 Definition. Let M be a set and P ∈ M. For any G ⊆ |P| we introduce


the following abbreviation:

a ∈G b stands for (∃ p ∈ G)a, p ∈ b (1)




VIII.2.2 Remark. We have fixed an M and P ∈ M. The relation x P y defined


by (∃G ⊆ |P|)x ∈G y has MC, as this follows from x P y → ρ(x) < ρ(y), this
latter since x ∈ x, p ∈ y for some p ∈ G, for some G ⊆ |P|† (cf. VI.6.24). It is
also left-narrow: x P y → x ∈ T C(y). Thus we can effect recursive definitions
with respect to P, and in particular with respect to ∈G (fixed G), as well as do
P-induction and ∈G -induction (fixed G). Cf. VI.8.23 and VI.8.24. 

VIII.2.3 Definition. Let M be a set, P ∈ M, and G ⊆ |P|. Working in U A ,‡


we define the interpretation function, λx G.x G , by P-recursion:

x if U (x)
xG = (1)
{y G : y ∈G x} if ¬U (x)

We will call a G the G-interpretation of a. 

More completely, we should add to the definition (1) above, second case, the
conjunct “∧ G ⊆ P ∧ P, <, 1 ∈ M is a PO set”.§ One then adds a third,
“otherwise” case where, say, x G = ∅. This “completion” spoils the clean form
of (1) and adds or subtracts nothing to or from the expected properties of x G

† Or x ∈ {x} ∈ x, p ∈ y if one uses the Kuratowski “. . . , . . . ”.


‡ In ZF in fact, “abstractly” or formally, as we do not use an assumption that M is a CTM.
§ We mean that P, <, 1 is “really” a PO set, as we carry the definition out in U A . Anyhow, if M
is a CTM, absoluteness of being a PO set would make P, <, 1 a PO set in the eyes of people
living in M as well.
526 VIII. Forcing

used in the sequel. Thus we have stated the missing conditions loosely in the
“assumptions” instead.
Note that if we fix G, then the above defines λx.x G by ∈G -recursion using
G as a “parameter”. But whence “interpretation”? This terminology will make
sense in the next section.

Finally:

VIII.2.4 Definition. Let M be a CTM of ZF, P ∈ M, and G an M-generic set.


We define the set M[G] by

M[G] = {x G : x ∈ M}

We call M[G] a generic extension or Cohen extension of the ground


model M. 

VIII.2.5 Remark. By results and techniques of Section VI.8, λx G.x G is ab-


solute for transitive models of ZF (see Exercise VIII.4). Thus if M ⊆ N and N
is a CTM (of ZF) that satisfies G ∈ N , then
 N
N % aG = aG

for any a ∈ N , in particular for any a ∈ M. Thus, M[G] = {x G : x ∈ M} ⊆ N .




We next build tools to show that for any CTM M and M-generic G, we have
M ⊆ M[G] and G ∈ M[G].

VIII.2.6 Definition (The Caret). We define (in ZF formally or in U A meta-


mathematically) by ∈-recursion a function λxP.x̂:

x if U (x)
x̂ =
{ ŷ, 1 : y ∈ x} otherwise

where P has 1 as the maximum element. 

VIII.2.7 Remark. Again, there ought to be a third, “otherwise” case above,


yielding, say, x̂ = ∅, whenever the “input” P is not as it should be. The present
“otherwise” would then become the case “¬U (x) ∧ (P is as it should be)”.
Work in Section VI.8 easily yields that the function λxP.x̂ is absolute
(cf. Exercise VIII.5). Thus if M is a CTM of ZF and P ∈ M, then x ∈ M im-
plies that x̂ = (x̂) M ∈ M.
VIII.2. Constructing Generic Extensions 527
 
In particular, (λxP.x̂) |` M ∈ M. This latter fact can also be seen as follows:
A mathematician who lives in M can effect the recursive definition VIII.2.6
in M. 

VIII.2.8 Lemma. Let M be a CTM of ZF, and G an M-generic set with respect
to P ∈ M. Then M ⊆ M[G] and G ∈ M[G].

Proof. Let x ∈ M. We do ∈-induction to prove (x̂)G = x; thus M ⊆ M[G],


since x̂ ∈ M by VIII.2.7 and therefore (x̂)G ∈ M[G] (cf. VIII.2.4).
If U (x), then (x̂)G = x G = x.
Let now ¬U (x). Then

(x̂)G = {y G : y ∈G x̂} by VIII.2.3


={y G : (∃ p ∈ G)y, p ∈ x̂}
= y G : y, 1 ∈ {ẑ, 1 : z ∈ x} since ran(x̂) = {1}
= {(ẑ) : z ∈ x}
G

= {z : z ∈ x} by I.H.
=x

To prove G ∈ M[G], we look for an element  ∈ M such that  G = G. We


will calculate that

 = { p̂, p : p ∈ P} (1)

will do fine. By closure of M under pairs and by the fact that collection is true
in M,  ∈ M. We next calculate

 G = {y G : (∃ p ∈ G)y, p ∈ }
= {( p̂)G : p ∈ G}
= { p : p ∈ G} by what we have proved above
=G 

VIII.2.9 Remark. M and M[G] have the same urelements. Indeed, M ⊆ M[G]
yields that all atoms of M are included in M[G]. Conversely, suppose that U (a G )
is true in M[G], and hence in U A (recall from Section VI.8 that we set U M = U
in general in (U, ∈)-interpretations). From VIII.2.3, a G = a (otherwise a G is a
set). Thus, a G ∈ M (since a ∈ M). 

VIII.2.10 Example. M[G] is closed under pairs, that is z ∈ M[G] ∧


w ∈ M[G] → {z, w} ∈ M[G].
528 VIII. Forcing

It suffices to find a u ∈ M such that u G = {z, w}. We start by letting a G = z


and b G = w, with a and b in M (VIII.2.4). Since M is a CTM of, say, ZF, its clo-
sure under pairing yields {a, 1, b, 1} ∈ M. Now u = {a, 1, b, 1} is what
we want, for, by VIII.2.3,
u G = {a G , b G }
since a ∈G u and b ∈G u. 

VIII.2.11 Remark. So far we readily have the “C” and “T” of the expected
CTM attributes of M[G]. Indeed, by VIII.2.4, the function λx.x G : M → M[G]
is onto. Since the left field is countable, this settles the “C”. As for transitivity,
let x ∈ y G ∈ M[G]. Then y G is a set, thus (VIII.2.3) x = z G for some z ∈G y.
Therefore, for some p ∈ |P|, z ∈ z, p ∈ y ∈ M; hence z ∈ M by transitivity
of M. Finally, x ∈ M[G] by VIII.2.4. 

For the “M”, that M[G] is a model of, say, ZFC if M is, we need more work.

VIII.3. Weak Forcing


Let us fix a CTM M (of ZF, or ZFC, or of some extension, or of a suffi-
ciently strong fragment such as ZF without the power set axiom – the so-called
“ZF− P”), as well as a PO set P ∈ M. We will be working in the metatheory,
within U A , using those axioms of ZFC for which M is a model. Suppose that
we have built M[G] for some M-generic G ⊆ |P|, as in VIII.2.4.
We want to allow inhabitants of M to reason about this M[G]. For this
purpose they need names for the “real objects” of M[G] so that they can write
down formulas that can refer to specific objects of M[G], such as G.
Now, by VIII.2.4, any object a ∈ M[G] has the form b G for some b ∈ M.
This b, or, formally, a name for b, will name a.
Thus, we import into the basic language of ZFC, L Set , a new constant symbol
name for each member of M.† This is a process familiar to us from I.5.4.
However, rather than only using the (argot) names i, j, k for members of M –
and i, j, k for their formal counterparts in L Set,M – we will continue using any
(argot) name we please for members of M (such as a, b, q, p, i, . . . ) and, as
in I.5.4 reserve the argot names a, b, p, i, . . . to stand for names of imported
constants. Thus, a names a, etc. L Set,M is called the forcing language.
We will now view M[G] as the domain of the structure MG = (M[G], U, ∈),
and we interpret the language L Set,M in MG in the standard manner (Section I.5).

† As we did with L Set , we are free afterwards to extend the augmented language by definitions.
VIII.3. Weak Forcing 529

We moreover let

(a)MG = a G for each a ∈ M (1)

Thus every formula over L Set,M says something about M[G].†

Caution. It is important to note that even though the names a were “built” by
importing into L Set names of objects of M, they are primarily used to name –
via the interpretation (1) – objects of M[G], not objects of M.
Whenever we interpret a formula in the structure M = (M, U, ∈), the names
a, . . . are interpreted as (a)M = a, . . . .‡

Of course, in a roundabout way, using the “caret” (cf. VIII.2.6), certain


formal constants will be interpreted, via (1), into members of M:

(â)MG = (â)G = a (cf. VIII.2.8)

After all, M ⊆ M[G].


Under the interpretation of L Set,M in MG above, some names are interpreted
as objects that are not in M. For example, we have seen circumstances under
which an M-generic set G is not a member of M (VIII.1.9), yet there is a name
for G in L Set,M , namely,  (VIII.2.8).
We will find the following notation convenient:

VIII.3.1 Definition. For any M-generic G, L Set formula A(x1 , x2 , . . . , xn ),§


and a1 , . . . , an in M we write

G |= A (a1 , a2 , . . . , an ) (1)

or

G |= A[[ a1 G , a2 G , . . . , an G ]]

as a shorthand for

|=MG A(a1 , a2 , . . . , an ) (2)

or

|=MG A[[ a1 G , a2 G , . . . , an G ]] (cf. I.5.17) 

† This procedure justifies the name “interpretation” for the function x "→ x G .
‡ E.g., in VIII.4.10.
§ Under the term “L Set formula” we include formulas over L Set that may contain defined symbols.
However these formulas must have no M-symbols a, etc.
530 VIII. Forcing

To a person living in U A , (2), and hence also (1), above mean the same thing
as (cf. VI.8.4)
|=U A A M[G] [[ a1G , a2G , . . . , anG ]]
We will usually write the above using round brackets (argot). Similarly, one
abuses notation slightly and writes
|=MG A(a1G , a2G , . . . , anG ) (3)
instead of (2). However, to the right of |= one normally expects to see a well-
formed formula over our language, here L Set,M . Thus, a “real” object (from the
structure MG ) can appear in a formula only by its formal name, a, rather than
by an informal name such as a G , unless one writes in mixed mode using [[ . . . ]]
brackets.†

VIII.3.2 Definition (Weak Forcing). Let M be a CTM and P ∈ M, while


p ∈ |P|. For any L Set formula A (x1 , x2 , . . . , xn ) and ai ∈ M we write
p w A(a1 , a2 , . . . , an ) (1)
pronounced p weakly forces the sentence A(a1 , a2 , . . . , an ), to mean
 
(∀G ⊆ |P|) G is M-generic ∧ p ∈ G implies G |= A(a1 , a2 , . . . , an )

Ideally one ought to use a subscript “(M, P)” to the symbol “w ”, but such
pedantry is almost never practised or needed. 

In conformity with the mixed-mode notation of VIII.3.1, we may also write


instead of (1)
p w A[[ a1 , a2 , . . . , an ]] (2)
Note that since (1) is to be investigated within M, each ai is interpreted as
ai in M, which leads to (2). One then abuses notation and uses round brackets
in (2) (more on this below).
Weak forcing is due to Feferman (1965). Intuitively, one can think of p
as “a finite amount of information” that is sufficient to make good the claim
“A(a1 , a2 , . . . , an ) is true in G”‡ in all the infinite (generic) extensions of p.§ In
effect, p forces the truth of A(a1 , a2 , . . . , an ) in M[G]. Note that an inhabitant
of M can write down (1), but it is unclear a priori just how he might verify
it or refute it, for he has no knowledge, in general (cf. VIII.1.9), of generic

† This same apparent hairsplitting is what made us import constants in I.5.4 towards defining the
Tarski semantics for first order languages.
‡ In the jargon of VIII.3.1.
§ Looking back to VIII.1.9, this distinction between finite and infinite “amounts of information”
is aptly motivated.
VIII.3. Weak Forcing 531

sets except by (formal) name – all he knows on faith is that generic sets are
objects found beyond the universe he lives in. Yet we will see in the next section
that this apparent dependence of forcing on G-sets and knowledge of “things
outside M” can be circumvented.
We state here a few basic properties of w , all due to Cohen. Monotonicity
(2) in VIII.3.3 below, and the definability and truth lemmata below, will be the
ticket for the mathematician in M to do forcing within his universe.

In most cases below we write c or c G (depending on target structure) instead


of c in formulas. This notational simplification (argot) is achieved by using
mixed-mode notation, but employing round brackets nevertheless.

VIII.3.3 Lemma (Cohen). Let M be a CTM and P ∈ M, while p ∈ |P| and


q ∈ |P|. For any sentences A, B over L Set,M the following hold:
(1) Consistency: We cannot have both p w A and p w ¬A .
(2) Monotonicity: p w A and q ≤ p imply q w A .
(3) p w A ∧ B iff p w A and p w B .

Proof. (1): We cannot have both |=MG A and |=MG ¬A.


(2): Any M-generic G is a filter, and hence q ∈ G implies p ∈ G.
(3): |=MG A ∧ B iff |=MG A and |=MG B . 

VIII.3.4 Lemma (Definability Lemma). We fix a CTM M and a notion of


forcing P ∈ M. For any A(x1 , . . . , xn ) over L Set,M there is a B (y, x1 , . . . , xn )
over L Set,M such that for all p ∈ |P| and x1 , . . . , xn in M,
 
p w A(x1 , . . . , xn ) ↔ B M ( p, x1 , . . . , xn )
holds in U A .

Granting the lemma, a being in M can verify the right hand side of the above
equivalence (hence also the left) working in his world M with the unrelativized
B (cf. VI.8.4).

The following lemma says that truth in MG can be certified by working with
a finite approximation of G.

VIII.3.5 Lemma (Truth Lemma). Let M be a CTM, and P ∈ M be a notion


of forcing. For any M-generic G ⊆ |P| and any formula A(x ) over L Set,M ,
for all xi ∈ M, A M[G] (x1G , . . . , xnG ) ↔ (∃ p ∈ G) p w A(xn )
holds in U A .
532 VIII. Forcing

The last two lemmata are proved in Section VIII.5 with the help of the
“original” concept of Cohen’s (strong) forcing.

VIII.4. Strong Forcing


We define here Cohen’s strong forcing relation, a syntactic concept that does
not refer to generic sets and therefore can be defined within U A or, indeed,
within M.
We will use the symbol “” for strong forcing, without qualifying super-
scripts or subscripts as to its “strength”. In fact it will be shown that for any
sentence over L Set,M and p ∈ |P|,
 M
p w A iff p  ¬¬A (1)

The form of (1) immediately suggests that unlike w – this being helped by
its semantical definition –  does not subscribe to the proposition that an even
number of ¬ symbols at the front of a formula can be dropped.
We will attempt to motivate the definition of the version of  we use here.
This is the one in Shoenfield (1971) and is probably the user-friendliest in the
literature (compare with the versions in Cohen (1963) and Kunen (1980)† ). The
reader is cautioned not to expect our motivational overture to unambiguously
lead to a unique choice of definition of . Even the auxiliary relation x ∈ p y
introduced below can be defined in different ways (see, e.g., Shoenfield (1967
vs. 1971)).
The crucial concept in motivating the definition of  is that by using it, in M,
we can effect a syntactic approximation to truth in M[G], i.e., an approximation
to what

G |= A(a1 , . . . , an )

or, more generally (A is over L Set ),

G |= A(x1 , . . . , xn )

means.
We begin by introducing a finite version of x ∈G y:

† Not only are the various versions of strong forcing not created equal in terms of definitional
complexity, but this is also true in terms of their behaviour. For example, in the version in Kunen
w  M
(1980), rather than (1) above one proves p  A iff p  A .
VIII.4. Strong Forcing 533

VIII.4.1 Definition. Let P be a PO set. With p ∈ |P| and quantification over


|P| we introduce the abbreviation

x ∈p y stands for (∃q ≥ p)x, q ∈ y 

Suppose that P ∈ M, where M is a CTM of ZF. Thus, x ∈ p y is similar to


x ∈G y. In each case, the membership in y (via ∈G or ∈ p ) is settled by looking
at one appropriate finite piece of information that is part of G or p (recall that
when q ≥ p, q contains less information than p – cf. VIII.1.3). In the case
of ∈ p , however, we even start with a finite amount of information, p, rather
than G.
This relation is not explicitly defined in Shoenfield (1971), but is used there
in a crucial way to define p  x ∈ y (see below). By contrast, in Shoenfield
(1967) x ∈ p y is explicitly introduced, but there it abbreviates something else,
namely, (∀q ≤ p)x, q ∈ y; i.e., one looks at all finite extensions of p in order
to settle whether x ∈ p y.
The above considerations complement our earlier comment that this moti-
vational discussion will not deliver the definition of  uniquely. In the end, the
definition of  is meant to make the following three lemmata and (1) above
true. Any definition that works will do.
In the four lemmata immediately below, M is an arbitrary set, possibly empty,
whose sole purpose is to enrich L Set with a number of additional constants. It
bears no relation to P in general.

VIII.4.2 Lemma (Definability Lemma – Strong Forcing Version). Let M be


a set, and P a PO set.
For any A(x1 , . . . , xn ) over L Set,M there is an A  (y, x1 , . . . , xn ) over L Set,M
such that for all p ∈ |P| the following holds in U A :
 
p  A(x1 , . . . , xn ) ↔ A  ( p, x1 , . . . , xn )

Just like  and |=,  and w apply to everything to their right (they have
lowest priority, hence maximum scope). This explains the brackets around
p  A(x1 , . . . , xn ) above.

VIII.4.3 Lemma (Monotonicity Lemma – Strong Forcing Version). Let M


be a set, and P a PO set. Assume that p ∈ |P| and q ∈ |P| with q ≤ p. Then
for any formula A(xn ) over L Set,M , the following holds in U A :
   
p  A(xn ) → q  A(xn )
534 VIII. Forcing

VIII.4.4 Lemma (Consistency Lemma). Let M be a set, P a PO set,


and p ∈ |P|. Then for any formula A(x1 , . . . , xn ) over L Set,M and any
x1 , . . . , xn it is true in U A that we cannot have both p  A(x1 , . . . , xn ) and
p  ¬A(x1 , . . . , xn ).

VIII.4.5 Lemma (Quasi-completeness Lemma). Let M be a set, and P a PO


set. For any p ∈ |P| and formula A(xn ) over L Set,M , the following holds in
U A : There is a q ≤ p, depending on xn , such that
   
q  A(xn ) ∨ q  ¬A(xn )

VIII.4.6 Remark. In other words, for any fixed xn , the set
    
p ∈ |P| : p  A(xn ) ∨ p  ¬A(xn )

is dense. If M is a CTM of ZF − P, xn is in M, and P ∈ M, then relativizing to


M yields
     M
p ∈ |P| : p  A (xn ) ∨ p  ¬A(xn ) is dense

Hence, since being dense is absolute for such a CTM (Exercise VIII.3),
  M  M
p ∈ |P| : p  A(xn ) ∨ p  ¬A(xn ) is a dense set in M. 

VIII.4.7 Lemma (Truth Lemma – Strong Forcing Version). Let M be a


CTM, and P ∈ M be a notion of forcing. For any M-generic G ⊆ |P| and any
formula A(xn ) over L Set,M , it is true in U A (provable with no more than the
ZF axioms) that
  M 
(∀xn ∈ M n ) A M[G] (x1G , . . . , xnG ) ↔ (∃ p ∈ G) p  A(xn )

The definition of  will be given syntactically in the metatheory –


specifically, within U A (using no more than the ZF axioms) – in VIII.4.8 below.
We continue our work, pretending that we live in M, towards motivating the
final definition.
Defining p  U (a) is easy: we will let p  U (a) be true iff U (a) is true in
M (which is so, by VIII.2.9, iff U (a) is true in M[G]).
Syntactically, we will let the expression p  U (x) – where x is a variable –
just stand for U (x).
VIII.4. Strong Forcing 535

Next we search for a good definition for p  a ∈ b and p  a = b,† or, more
generally (with free variables), of p  x ∈ y and p  x = y.
So, what does |=MG a ∈ b mean? It means that a G ∈ b G is true, that is, a ∈G b
is true. This is, trivially, equivalent to (cf. I.6.2)

(∃z)(z = a ∧ z ∈G b) (1)

To obtain a finite approximation of (1) we replace z = a and z ∈G b by finite


approximations (we use strong forcing to approximate z = a). This leads to
(∃z)(z ∈ p b ∧ p  z = a), or, in the free variables version,

px ∈ y means (∃z)(z ∈ p y ∧ p  z = x) (2)

More explicitly (by VIII.4.1),

px ∈ y means (∃z)(∃q ≥ p)(z, q ∈ y ∧ p  z = x) (3)

We have already indicated that we will define what p  x = y means and


then obtain the meaning for p  x = y indirectly. Thus, before we focus on =,
let us reflect on how to get p  ¬A from p  A in general. Unlike the case
for |=, we must not set‡

p  ¬A iff ¬ p  A

for it is conceivable that the p does not force the truth of A, simply because it
does not contain enough information to do so. Thus we need to know that no
amount of additional finite information (any q such that q ≤ p) will help to
force A. Then we can proclaim that p forces ¬A. Thus, we will adopt

p  ¬A iff (∀q ≤ p)¬q  A (4)

With this settled, what does |= M[G] a = b mean? It means that a G = b G is


false, and therefore one of the following cases obtains:

(i) U (a) ∧ U (b) ∧ a = b (recall that U (a) iff U (a G ))


(ii) U (a) ∧ ¬U (b)
(iii) ¬U (a)
 ∧ U (b) 
(iv) (∃z) (z ∈ a G ∧ z ∈/ b G ) ∨ (z ∈ b G ∧ z ∈
/ aG )

† Actually, it turns out to be technically somewhat more convenient to look for a definition of
p  a = b, viewing = as the primary (in)equality predicate and = as a derived one.
‡ The first ¬ is formal, part of the formula ¬A . The second is metamathematical. Some writers
use p 
 A instead.
536 VIII. Forcing

Condition (iv) is equivalent to


 
(∃z) (z ∈G a ∧ z ∈
/ G b) ∨ (z ∈G b ∧ z ∈
/ G a) (5)

where we have written x ∈ / G y for ¬x ∈G y. A finite approximation of (5) that


(as we will see) works is obtained by using ∈ p for the positive cases and p 
for the negative cases. We are ready to summarize in a definition.

VIII.4.8 Definition. (Shoenfield (1971).) We fix a PO set P = P, <, 1 and


a set M – possibly empty, a provider of constants – and define within U A (using
no more than the ZF − P axioms) the symbol p  A, for any formula of L Set,M .
We do so by induction on formulas:

(a) p  U (x) stands for U (x).


(b) p  x ∈ y is given by (2) (p. 535) (equivalently, (3), p. 535).
(c) p  x = y stands for

(U (x) ∧ U (y) ∧ x = y) ∨ (U (x) ∧ ¬U (y)) ∨ (¬U (x) ∧ U (y))



∨ (∃z) (z ∈ p x ∧ p  z ∈ / y) ∨ (z ∈ p y ∧ p  z ∈
/ x)

(d) p  ¬A iff (∀q ≤ p)¬q  A.


(e) p  A ∨ B iff p  A or p  B .
(f) p  (∃x)A[x] iff (∃x) p  A [x] . 

VIII.4.9 Remark. In (b) and (c) above an occurrence of q  u ∈


/ w is (in view
of (d)) an abbreviation for

(∀r ≤ q)¬r  u ∈ w (6)

while (since u = w is an abbreviation of ¬u = w) q  u = w is an abbreviation


for

(∀r ≤ q)¬r  u = w (7)

Using (6) and (7), we can thus rewrite clauses (b) and (c) of the definition to
read

px ∈ y abbreviates (∃z)(∃q ≥ p)(z, q ∈ y ∧ (∀r ≤ p)¬r  z = x)


(8)
VIII.4. Strong Forcing 537

and

p  x = y abbreviates
(U (x) ∧ U (y) ∧x = y) ∨ (U (x) ∧ ¬U (y)) ∨ (¬U (x) ∧ U (y)) ∨
(∃z)(∃q ≥ p) (z, q ∈ x ∧ (∀r ≤ p)¬r  z ∈ y) ∨ (9)

(z, q ∈ y ∧ (∀r ≤ p)¬r  z ∈ x)

respectively. This now makes sense out of Definition VIII.4.8(b)–(c), since (8)
and (9) constitute a simultaneous recursion in the sense of VI.2.40.
More rigorously, then, what Definition VIII.4.8(b) and (c) really mean is that
we employ the abbreviations

for any p ∈ |P|, p  x ∈ y stands for In( p, x, y) = 0 (10)

and

for any p ∈ |P|, p  x = y stands for Ne( p, x, y) = 0 (11)

where the functions λpx y.In( p, x, y) and λpx y.Ne( p, x, y) (with right field
{0, 1}) are defined in U A by the simultaneous recursion (8 ) + (9 ) below,
mimicking (8) and (9):

0 if p ∈ |P| ∧
In( p, x, y) = (∃z)(∃q ≥ p)(z, q ∈ y ∧ (∀r ≤ p)Ne(r, z, x) = 1)

1 otherwise (this includes the case p ∈
/ |P|)
(8 )

and
 

 0 if p ∈ |P| ∧ (U (x) ∧ U (y) ∧ x = y) ∨ (U (x) ∧ ¬U (y))



 ∨ (¬U (x) ∧ U (y))$
 
Ne( p, x, y) = ∨ (∃z)(∃q ≥ p) z, q ∈ x ∧ (∀r ≤ p)In(r,
 z, y) = 1

  %

 ∨ z, q ∈ y ∧ (∀r ≤ p)In(r, z, x) = 1



1 otherwise
(9 )

That the recursion (8 )–(9 ) is legitimate follows from the following consid-
erations: We note that (8 ) implies z ∈ dom(y) (by z, q ∈ y) and hence
max(ρ(z), ρ(x)) < max(ρ(x), ρ(y)). Similarly, (9 ) implies z ∈ dom(x) (by
z, q ∈ x) or z ∈ dom(y) (by z, q ∈ y); thus max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)) and max(ρ(z), ρ(x)) < max(ρ(x), ρ(y)) respectively. It follows that the
538 VIII. Forcing

recursion is with respect to the relation

 p, u, vPq, x, y iff { p, q} ⊆ |P| ∧ max(ρ(u), ρ(v)) < max(ρ(x), ρ(y))

which has MC and is left-narrow (verify!) as required.


We now recast the entire Definition VIII.4.8 in a rigorous light: (a) and (d)–(f)
remain the same, giving meaning to the left hand side in terms of the right hand
side. (b) and (c) are now (10) and (11) above.
By the absoluteness results of Section VI.8 (in particular, absoluteness of
ranks; cf. also VI.8.23 and VI.8.24), both In and Ne are absolute for any CTM
M of ZF − P. Let then M be such a CTM with P ∈ M. Relativizing to M, we
have, for all a and b in M and p ∈ |P|,

In M ( p, a, b) = In( p, a, b) and Ne M ( p, a, b) = Ne( p, a, b) (12)

or

In M = In |`M and Ne M = Ne |`M (13)




Proof of Lemma VIII.4.2. We do induction on formulas. A trivial preliminary


remark is that “stands for” can be replaced by “↔”. For example, (10) can be
rewritten as
 
For any p ∈ |P|, p  x ∈ y ↔ In( p, x, y) = 0

for if we replace the abbreviation p  x ∈ y by what it abbreviates, we get a


tautology.
For the basis, A may be

(i) U (x). We then take A  ( p, x) ≡ U (x).


(ii) x ∈ y. Then use A  ( p, x, y) ≡ In( p, x, y) = 0.
(iii) x = y. Then use A  ( p, x, y) ≡ Ne( p, x, y) = 0.

For the induction steps, A may be

(I) ¬B (x ). Then we can let A  ( p, x ) ≡ (∀q ≤ p)B  (q, x ).


(II) B ∨ C . Then we let A  ≡ B  ∨ C  .
(III) (∃y)B (y, x ). Then we set A  ( p, x ) ≡ (∃y)B  ( p, y, x ). 

VIII.4.10 Corollary. Let M be a CTM of ZF − P, and P ∈ M a notion of


forcing. For any formula A (xn ) over L Set there is a formula B (y, xn ) over
VIII.4. Strong Forcing 539

L Set such that for all a1 , . . . , an in M and all p ∈ |P|,


 M
|=U A p  A(a1 , . . . , an ) iff |=M B [[ p, an ]]

where M = (M, U, ∈).

Proof. Relativizing VIII.4.2 to M, we get


 M 
(∀ p ∈ |P|)(∀xn ∈ M) pA(x1 , . . . , xn ) ↔ A  ( p, x1 , . . . , xn ) M

in particular, for any p ∈ |P| and constants a1 , . . . , an in M,†


 M
p  A(a1 , . . . , an ) ↔ A  ( p, a1 , . . . , an ) M

Setting B ≡ A  and rewriting the right hand side of the above in [[ . . . ]]


notation, we are done. 

Caution. p  U (x), p  x ∈ y, and p  x = y are absolute for CTMs, as noted


in VIII.4.9. Thus, e.g., ( p  x ∈ y) M is equivalent to p  x ∈ y for all p, x, y
in M. However, p  . . . , in general, is not absolute.

Proof of Lemma VIII.4.3. We do induction on formulas, following Defini-


tion VIII.4.8. Now, A may be one of

(i) U (x): Then the result is immediate.


(ii) x ∈ y: Let p  x ∈ y and s ≤ p. By assumption we have (8) of Re-
mark VIII.4.9. This remains valid if we replace the letter p by s throughout
(transitivity of ≤).
(iii) x = y: Exactly as in the previous case (using (9) of VIII.4.9)

For the induction steps, A may be

(I) ¬B : Let q ≤ p and (∀r ≤ p)¬r  B . Then (∀r ≤ q)¬r  B , that is,
q  ¬B . (The I.H. was not used)
(II) B ∨ C : Exercise.
(III) (∃y)B (y, x ): Let q ≤ p and p  A . That is, (∃y) p  B (y, x ). By the
I.H., (∃y)q  B (y, x ). 

Proof of Lemma VIII.4.4. Fix x . Then p  ¬A(x ) means (∀q ≤ p)q 


 A (x ).
In particular, p 
 A(x ). 

† We have opted here for the notation “A  (. . .) M ” rather than the awkward “(A  ) M (. . .)”.
540 VIII. Forcing

Proof of Lemma VIII.4.5. Fix x and p ∈ |P|. If p  ¬A(x ), then we are


done with q = p. Else, there is a q ≤ p such that q  A(x ) by VIII.4.8 – the
¬ -case. 

Proof of Lemma VIII.4.7. Fix an M-generic set G. We do induction on formulas,


following Definition VIII.4.8. For the basis we do induction on max(ρ(x), ρ(y)).
Now, A may be one of

(i) U (x): Trivial. Forcing = truth in this case.


(ii) x ∈ y:
→: We fix x and y in M such that x G ∈ y G is true. Thus, (∃z)(z =
x ∧ z ∈G y) (cf. (1) of p. 535). Let c (auxiliary constant) be a z that
works. By the I.H. (on max(ρ(x), ρ(y))), let p ∈ G be such that†

pc = x (a)

Let also q ∈ G such that c, q ∈ y. There is an r ∈ G that witnesses


compatibility of p and q. Then r  c = x by VIII.4.3 and (a) above.
Thus

(∃z)(∃q ≥ r )(z, q ∈ y ∧ r  z = x) (b)

– in short, (∃r ∈ G)r  x ∈ y.‡


←: Let x and y be in M and assume (b), with r ∈ G. Let c work for z.
Then q ∈ G by filter properties; thus c ∈G y; hence

cG ∈ y G (d)

Moreover, the I.H. (on max(ρ(x), ρ(y))) implies that c G = x G . Com-


bining with (d) (via the Leibniz axiom), we get x G ∈ y G .
(iii) x = y:
→: We fix x and y in M such that x G = y G is true. We have cases:
(a) U (x G ) ∧ U (y G ). Then x = x G and y = y G (VIII.2.9), and thus
x = y. By VIII.4.8, any p ∈ |P| satisfies p  x = y. Taking, for
example, p = 1, we have such a p in G.
(b) U (x G ) ∧ ¬U (y G ). Then (VIII.2.9) U (x) ∧ ¬U (y), and any
p ∈ |P| satisfies p  x = y (VIII.4.8). We conclude as in the
previous case.
(c) ¬U (x G ) ∧ U (y G ). As above.

† The I.H. applies to atomic formulas, here c = x. However, it is all right to apply it to negated
atomic formulas, here c = x, by the “¬” case below.
‡ By the remark following the proof of VIII.4.10, it is unnecessary to write (r  x ∈ y) M .
VIII.4. Strong Forcing 541

(d) ¬U (x G ) ∧ ¬U (y G ). Thus,
 
∃z ∈ dom(x) ∪ dom(y)
(z G ∈ x G ∧ z G ∈
/ yG ) (1)

∨ (z ∈ y ∧ z ∈
G
/x )
G G G

For the sake of argument, say it is the first of the two (∨-)cases
of (1) above that holds for some z. Then z ∈G x, i.e., for some
p ∈ G,
z, p ∈ x (2)
Moreover, since ρ(z) < ρ(x) by (2), max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)); thus the I.H. implies, for some q ∈ G,
q z ∈
/y (3)
Let r ∈ G satisfy r ≤ q and r ≤ p. Then (2) yields z ∈r x, and
(3) yields r  z ∈ / y by VIII.4.3. Thus r  x = y. The other case
in (1) is entirely analogous.
←: We fix p ∈ G and x and y in M such that p  x = y. We have
cases.
(a ) U (x)∧U (y). By VIII.4.8, x = y holds. Since x = x G and y = y G
(VIII.2.9), x G = y G holds.

(b ) U (x) ∧ ¬U (y). Then (VIII.2.9) U (x G ) ∧ ¬U (y G ). Thus x G = y G
holds.

(c ) ¬U (x) ∧ U (y). As above.
(d ) ¬U (x) ∧ ¬U (y). By VIII.4.8,
 
∃z ∈ dom(x) ∪ dom(y)
(∃q ≥ p)(z, q ∈ x ∧ p  z ∈ / y) (4)

∨ (∃q ≥ p)(z, q ∈ y ∧ p  z ∈
/ x)

For the sake of argument, say it is the first of the two (∨-)cases
of (4) above that holds for some z. Then z ∈G x, since q ∈ G by
filter properties; hence
zG ∈ x G (5)
ρ(z) < ρ(x) by z ∈ dom(x) implies max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)); hence, from p  z ∈
/ y and the I.H.,
zG ∈
/ yG (6)
(5) and (6) now yield x = y . The other case in (4) is entirely
G G

analogous.
542 VIII. Forcing

For the induction steps, with respect to formulas, A may be


(I) ¬B (xn ): Fix xi , i = 1, . . . , n in M. By Remark VIII.4.6 and M-genericity
of G,
 M  M 
(∃ p ∈ G) p  B (xn ) ∨ p  ¬B (xn ) (7)

→: Assume ¬B M
M[G] G
(x1 , . . . , xnG ). Let q ∈ G make (7) true. Then
q  ¬B (xn ) , since the alternative and the I.H. would yield
B M[G] (x1G , . . . , xnG ), contradicting the assumption.
 M 
←: Let p ∈ G be such that p  ¬B (xn ) . Then (VIII.4.4) p  
M
B (xn ) .
 M
Is it possible that, for some r ∈ G, r  B (xn ) ? Well, if so, let
s ∈ G witness the compatibility of p and r . Then (VIII.4.3)
 M  M
s  B (xn ) ∧ s  ¬B (xn )

contradicting VIII.4.4. Thus,


 M
(∀ p ∈ G) p 
 B (xn )

or
 M
¬(∃ p ∈ G) p  B (xn )

By the I.H., this translates to “B M[G] (x1G , . . . , xnG ) is false in U A ”.


Hence ¬B (x1G , . . . , xnG ) M[G] is true.
(II) B ∨ C : Exercise.
(III) (∃y)B (y, x ): Fix x in M.
→: Let

(∃y ∈ M[G])B M[G] (y, x1G , . . . , xnG ) (8)

be true. Let y = a work above. Then for some b in M, a = b G and

B M[G] (b G , x1G , . . . , xnG ) (9)

is true. By the I.H.,


 M
(∃ p ∈ G) p  B (b, x1 , . . . , xn ) (10)

is true; hence
 M
(∃ p ∈ G) p  (∃y)B (y, x1 , . . . , xn ) (11)

is true, by VIII.4.8 (∃-case).


VIII.5. Strong vs. Weak Forcing 543

→: Let (11) hold for the given x . Then (10) holds by VIII.4.8 (∃-case)
for some b ∈ M. Therefore, by the I.H., (9) holds. Thus we have (8).


VIII.5. Strong vs. Weak Forcing


We can now connect  and w .

VIII.5.1 Theorem. Fix a CTM M and a PO set P ∈ M. Then for any formula
A(xn ) over L Set,M , and all xn in M, (1) on p. 532 holds in U A .

Proof. Fix x1 , . . . , xn in M. We prove that


   M
p w A(xn ) ↔ p  ¬¬A(xn ) (2)

holds in U A .
←: Assume the right hand side of ↔, and let G ⊆ |P| be M-generic
and p ∈ G (cf. VIII.1.8). By VIII.4.7, ¬¬A M[G] (x1G , . . . , xnG ) is true; hence
so is A M[G] (x1G , . . . , xnG ). By Definition VIII.3.2, p w A(xn ), since G (with
p ∈ G) was an arbitrary generic set.
→: We prove the contrapositive, so let
 M
¬ p  ¬¬A(xn ) (3)

Thus, for some q ≤ p,


 M
q  ¬A(xn ) (4)

By VIII.1.8, let G be M-generic and q ∈ G. By VIII.4.7, (4) yields the truth of


¬A M[G] (x1G , . . . , xnG )
 
By filter properties, p ∈ G. Definition VIII.3.2 then yields ¬ p w A (xn ) . 

One now obtains the Lemmata VIII.3.4 and VIII.3.5 at once:

Proof of Lemma VIII.3.4. By VIII.5.1 and VIII.4.2 we can take


 
B ( p, x ) ≡ p  ¬¬A(x ) 

Proof of Lemma VIII.3.5. ←: Pick any M-generic G. Let p ∈ G, xi (i =


1, . . . , n) be in M, and p w A(xn ). By Definition VIII.3.2, A M[G] (x1G , . . . , xnG )
holds.
544 VIII. Forcing

→: Pick any M-generic G and xi (i = 1, . . . , n) in M. Assume  M[G] now that


A M[G] (x1G , . . . , xnG ) holds. Then also ¬¬A(x1G , . . . , xnG ) holds. By
VIII.4.7,

 M
(∃ p ∈ G) p  ¬¬A(x1 , . . . , xn )

We are done by VIII.5.1. 

VIII.6. M[G] Is a CTM of ZFC If M Is


Let M be a CTM for ZFC, P ∈ M be a notion of forcing in M, and G ⊆ |P|
be M-generic. We will show that M[G] is also a CTM of ZFC, indeed the
⊆-smallest.
So far we know that M[G] is countable and transitive (VIII.2.11) and is a
subset of any CTM N such that M ⊆ N and G ∈ N (VIII.2.5). It remains to
see that indeed it is a model of ZFC.

VIII.6.1 Lemma. For any x ∈ M, ρ(x G ) ≤ ρ(x).

Proof. We do ∈-induction. If U (x G ), then also U (x); thus ρ(x G ) = ρ(x) = 0


VI.6.24). Next, assume that ¬U (x G ). Then

 

ρ(x ) = 
G
ρ(y ) + 1
G

y G ∈x G
 

≤ ρ(y) + 1 by I.H.
(∃ p∈G)y, p∈x
 

≤ ρ(z) + 1 since ρ(y) < ρ(y, p)
z∈x

= ρ(x) 

VIII.6.2 Lemma. On M = On M[G] .

Proof. On M = {x ∈ M : (x ∈ On) M } = {x ∈ M : x ∈ On} = M ∩ On, the


second “=” by absoluteness of (x ∈ On) for transitive classes. That is, On M is
VIII.6. M[G] Is a CTM of ZFC If M Is 545

the set of all real† ordinals found inside M. To an inhabitant of M, On M is a


proper class, that is,
On M ∈
/M (1)
Indeed, On M is a (real) ordinal, for it contains just ordinals and is a transitive set
(intersection of two transitive classes). If On M ∈ M, then On M ∈ On ∩ M =
On M . It also happens to be the smallest (real) ordinal not in M: If α < On M ,
then α ∈ On M = M ∩ On.
Similarly, transitivity of M[G] yields that On M[G] = M[G] ∩ On and that
On M[G] is the smallest ordinal not in M[G]. Since, trivially, On M ⊆ On M[G] ,
that is, On M ≤ On M[G] , we will be done if we can show that
On M ∈
/ M[G] (2)
Well, if (2) is false, then On M = c G for some c ∈ M, and hence (VIII.6.1)
ρ(On M ) ≤ ρ(c). Since (ρ(c) ∈ On) M is true (why?), we have ρ(c) M ∈ On M ,
that is, ρ(c) < On M (absoluteness of ρ), a contradiction (from ρ(On M ) =
On M + 1 – cf. VI.6.21). 

VIII.6.3 Remark. By absoluteness of ω and of natural numbers, n ∈ On and


ω ∈ On relativize (for any transitive class M) to n ∈ OnM and ω ∈ OnM
respectively. In particular, 0, 1, 2, . . . , ω are in both M and M[G]. 

VIII.6.4 Theorem. The M[G]-relativizations of the ZFC axioms are true.

Proof.
(i) Urelement axioms:
(a) Urelements are atomic: M[G]Let y ∈ M[G]. We want the truth of
U (y) → ¬(∃x)x ∈ y . That is, of U (y) → ¬(∃x ∈ M[G])x ∈ y,
which is true even without the qualification (∃x)(x ∈ M[G] ∧ · · · ).
(b) Set of all atoms:
 We want {x : U (x)} M[G] ∈ M[G]. That is, M[G] ∩
{x : U (x)} ∈ M[G]. This is so by VIII.2.9 – whence M[G] ∩
{x : U (x)} = M ∩ {x : U (x)} – along with M ⊆ M[G] and the fact
that M is a CTM, so that {x : U (x)} M ∈ M.
(ii) Extensionality: It holds because M[G] is transitive (VI.8.10).
(iii) Separation: Let a ∈ M and b = {x ∈ a G : A M[G] (x)}, where A(x) is
over L Set,M . We let
c = {x, p ∈ dom(a) × |P| : p w x ∈ a ∧ A(x)} (1)

† Not relativized.
546 VIII. Forcing

By VIII.3.4, the condition to the right of “:” is equivalent to a formula


relativized to M. Moreover, dom(a) × |P| ∈ M, since M is a CTM of
ZFC. Thus c ∈ M. We now verify that b = c G , and we are done. Indeed:
⊇: Let z ∈ c G . Then z = x G for some x satisfying x, p ∈ c,
for some p ∈ G. Then (1) yields that x ∈ dom(a) and that
x G ∈ a G ∧ A M[G] (x G ) is true by VIII.3.5. This says x G ∈ b.
⊆: Let z ∈ b. Then z ∈ a G ; therefore z = x G for some x ∈ dom(a).
The second part of the entrance requirement in b yields the truth of
A M[G] (x G ). Thus, x ∈ dom(a) ∧ x G ∈ a G ∧ A M[G] (x G ) is true.
By VIII.3.5, x ∈ dom(a) ∧ p w x ∈ a ∧ A (x) is true for some
p ∈ G ⊆ |P|. Thus, x, p ∈ c by (1). But then x ∈G c; hence
x G ∈ cG .
The case with parameters, A(x, y ), presents no additional difficulties.
(iv) Pairing: By VIII.2.10.
(v) Union: We want to prove that for any a G ∈ M[G] a set b G ∈ M[G]

exists such that ( is absolute for transitive classes – cf. Section VI.8)

a G ⊆ bG

We concentrate on the case where ¬U (a G ), for otherwise the result is


trivial by absoluteness of ∅ for transitive classes (take b G = ∅).
Now, analyzing the issue inside U A , we find that

a G = {x G : (∃y ∈ dom(a))x G ∈ y G } (2)

But then, taking b = dom(a) is what we want: Indeed, first, as M is

a CTM for ZFC, and and dom are absolute, we have b ∈ M; hence

b G ∈ M[G]. Moreover, by (2), if z ∈ a G , then z has the form x G where,
for some y ∈ dom(a), x ∈G y. Say x, p ∈ y for some p ∈ |P|. Thus
x, p ∈ b; hence x G ∈ b G .
(vi) Foundation: Holds in any class (VI.8.11).
(vii) Collection: Again we look into the parameterless situation. Suppose
that we know that

(∀x ∈ a G )(∃y ∈ M[G])A M[G] (x, y) is true (3)

where A is over L Set,M and a ∈ M. We want to show that a set b ∈ M


exists such that

(∀x ∈ a G )(∃y ∈ b G )A M[G] (x, y) is true (4)

Since collection is true in M, the following holds in M (we have implicitly


invoked VIII.3.4 to obtain a formula, relativized to M, equivalent to
VIII.6. M[G] Is a CTM of ZFC If M Is 547

“ p w A(z, u)”):
 
(∀z, p ∈ dom(a) × |P|)(∃u ∈ M) p w A(z, u)
→ (∃W ∈ M)(∀z, p ∈ dom(a) × |P|)(∃u ∈ W ) p w A (z, u)
Or, moving (∃W ) to the front and asserting it to be a set (permissible
by III.8.4),

there is a set W in M such that  


(∀z, p ∈ dom(a) × |P|)(∃u ∈ M) p w A(z, u) (5)
→ (∀z, p ∈ dom(a) × |P|)(∃u ∈ W ) p w A(z, u)
Now fix a W ∈ M that verifies (5), and let b = W × {1}. Clearly, by clos-
ure of M under × and pairs, b ∈ M. We will show that b G works for (4).
First off, an easy calculation (similar to that in the proof of VIII.2.8) gives

b G = {y G : y ∈ W } (6)

Towards (4), let x ∈ a G . Then x = v G for some v ∈ M and, moreover,


v ∈G a. Therefore

v ∈ dom(a) (7)

By (3) fix a y ∈ M[G] that makes A M[G] (v G , y) true. Since y = t G for


some t ∈ M, we have the truth of A M[G] (v G , t G ); hence, for said v and
t, and some p ∈ G (by VIII.3.5),

p w A(v, t) (8)

By (7) and (8), using z = v and u = t in (5), we have satisfied an instance of


the hypothesis of (5). Thus, there is some c ∈ W such that p w A (v, c).
By the truth lemma (recall that the p we are talking about is in G),

A M[G] (v G , c G ) is true

Moreover, c G ∈ b G by (6). Thus c G will do as an instance of y in (4).


(viii) Power set: Let a ∈ M. We want to show that for some b ∈ M,

{x ∈ M[G] : x ⊆ a G } ⊆ b G (9)

It will turn out that b = P(dom(a) × |P|) M × {1} works. First off, since
M satisfies ZFC, b ∈ M. The same type of calculation we have done in
the collection case yields

b G = {z G : z ∈ P(dom(a) × |P|) M } (10)


548 VIII. Forcing

Let now x ∈ M[G] ∧ x ⊆ a G . We show that x ∈ b G . By hypothesis,


x = c G for some c ∈ M. We form

d = {z, p ∈ dom(a) × |P| : p w z ∈ c} (11)

By VIII.3.4, d ∈ M, since separation is true in M. We now see that d is


a “better” name for x above – i.e., for c G – than c is, because, using the
name d, it is easy to show that d G ∈ b G . We have two claims here: First,
cG = d G .
⊆: Let y ∈ c G . Then y = z G for some z ∈ dom(a) (recall that c G ⊆
a G ). By the truth lemma, there is a p ∈ G such that p w z ∈ c;
hence, by (11), z, p ∈ d. Hence, z G ∈ d G (for z ∈G d).
⊇: Let y ∈ d G . Then y = z G for some z ∈ M, and there is a p ∈ G
such that z, p ∈ d. By (11), z ∈ dom(a) and p w z ∈ c. Remem-
bering that p ∈ G, we get z G ∈ c G by VIII.3.5.
Second, we show that d G ∈ b G , which concludes our task. But this is so
by (10), since d ∈ P(dom(a) × |P|) M .
(ix) Infinity: ω ∈ M ⊆ M[G].
(x) AC: It is convenient to use the version of AC given by Corollary VI.5.53.
Fix then a set x ∈ M[G]. For some set a ∈ M, x = a G . By the corollary,
let f = {β, f (β) : β ∈ α} in M such that dom(a) ⊆ ran( f ) – possible
by AC M . Recalling VIII.2.10 and VIII.2.6, we let
 
F = β̂, {β̂, f (β)} × {1} × {1} : β ∈ α × {1} (12)

Now F ∈ M. What is F G ? Using VIII.2.10 and VIII.2.6, we calculate


(as we did to obtain (6) above)
  G
FG = β̂, {β̂, f (β)} × {1} × {1} : β ∈ α
 
= β, {β, f (β)G } : β ∈ α
& '
= β, f (β)G : β ∈ α

Thus F G is a function in M[G] with domain α (VIII.6.2). If y G ∈ a G ,


then y ∈ dom(a); hence y = f (β) for some β ∈ α by choice of f , and
therefore y G = f (β)G . Thus a G ⊆ ran(F G ). 

We have been repetitiously insisting throughout this chapter that a being in M


be empowered to “force things to happen” in M[G] for any M-generic G. The
proof of the above theorem clarifies the meaning of that intention and shows
that it is wise and feasible: A being in M knows that ZFC is true in M. We
have ensured that he can use this knowledge and the p w . . . construct, which
VIII.7. Applications 549

is expressible in M, to force the truth of the ZFC axioms in a world (M[G])


that, for him, is “imaginary” or unreachable.
Actually he can do more, and this is the subject of the next section: By
choosing the notion of forcing P in M appropriately, he can force all sorts of
weird things to happen in the imaginary world M[G], such as 2ℵ0 = ℵℵ1 . And,
of course, he can force the truth of ¬CH.
Finally, we note that the name “generic” for the sets G (for any fixed P)
is apt. They all “code the same information”, i.e., it does not matter which
particular M-generic G ⊆ |P| we choose. If M is a CTM for ZFC, then M[G]
is a CTM for ZFC. Moreover, if a P forces some additional properties (such as
¬CH) to be true in M[G], this is so for any M-generic G ⊆ |P|.

VIII.7. Applications
We conclude our lectures by presenting in this section elementary applications
of forcing. They all are based on PO sets of finite functions. We recall from
Section VI.8 that finiteness is absolute for CTMs of ZFC and therefore an
inhabitant of such a CTM proclaims a set “finite” exactly when such a set is
“really” finite. We will benefit from widening the scope of Examples VIII.1.3,
VIII.1.6, and VIII.1.9.

VIII.7.1 Example. Let† F(a, b) = F(a, b), <, 1 be a PO set defined as fol-
lows in terms of sets a and b where a is infinite and b = ∅:‡

F(a, b) = { p | p : a → b is a finite function}

1 is the function ∅ : a → b with empty domain. p < q will mean p ⊃ q


(reverse inclusion). Let M be a CTM and F ∈ M. Then absoluteness of pair-
ing and finiteness allows the preceding definition to take place inside M with
the same result as if it were effected in U A . Let G ⊆ F be M-generic. Then

f = G is a function a → b that is total and onto. That it is a function
follows from the compatibility of p and q in G as in VIII.1.9. For totalness
one observes (as in VIII.1.9) that the set Di = { p ∈ F : p(i) ↓} is dense in
M for all i ∈ a. For ontoness we employ the dense sets Ri , for i ∈ b, where
Ri = { p ∈ F : i ∈ ran( p)}. Indeed, let q ∈ F. If q(i) ↓, then q ∈ Di . Otherwise,

Di % q ∪ {i, j} < q

† No connection between this “F” and Gödel operations of Section VI.9.


‡ We have used “{ p | . . . }” rather than our usual “{ p : . . . }” because a “ p : a . . . ” follows a few
symbols away.
550 VIII. Forcing

where we have picked j ∈ b arbitrarily (b = ∅). Similarly, if i ∈ ran(q), then


q ∈ Ri , else

Ri % q ∪ { j, i} < q

where we have picked j ∈ a − dom(q) (a is infinite in M).


Thus, G ∩ Di = ∅ = G ∩ R j for all i ∈ a and all j ∈ b. That is, for all these
i, j there are finite subfunctions of f – p and q – with p(i) ↓ and j ∈ ran(q).
That is, we have verified totalness and ontoness.
As in VIII.1.9, G ∈ / M if b has two or more elements. Indeed, under the
present circumstances, if G ∈ M, then so is F − G, and we get a contradiction
as follows: First, F − G is dense. Indeed, let q ∈ F. Pick i ∈ a − dom(q) and
j, m distinct members of b. Then

q ∪ {i, j}⊥q ∪ {i, m}

Thus one of these extensions of q must be in F − G. Having verified the density


of this latter set, we now have (F − G) ∩ G = ∅.
It is useful to rephrase the fact that G ∈
/ M: M ⊂ M[G]. 

Absoluteness results of Sections VI.8 and VI.9 (in particular VI.9.20) show that
the function α "→ Fα of VI.9.2 is absolute for any CTM of ZF as long as we
take N of L N in M. Thus, if M is such a CTM, then

N = {Fα : α ∈ On} = {Fα : α ∈ On }


LM M M
(1)

If now M is a CTM of ZFC and we take a = ω and b = 2 in Example VIII.7.1,


we have, for any M-generic G, that M[G] is a CTM of ZFC (Theorem VIII.6.4)
and, moreover,

L M[G]
N = {Fα : α ∈ On} M[G] = {Fα : α ∈ On M[G] } (2)

Since On M = On M[G] (by VIII.6.2), (1) and (2) yield


M[G]
N = LN
LM

or, in words, M and M[G] have the same constructible sets.


Now, by the conclusion of the preceding example and in view of what we have
just remarked, any x ∈ M[G] − M is a set (M and M[G] have the same atoms;
cf. VIII.2.9) that is not constructible in M[G]. Thus, on the assumption that M
is a model of ZFC, we now have a model of ZFC + (V = L), namely M[G].

More specifically, it turns out that not only G ∈ / M, but also G ∈ / M (see
Exercise VIII.2). This function – viewed as a characteristic function – defines
VIII.7. Applications 551

a subset of ω. This subset is not in M, and hence is not constructible in M[G].


For the record we state all this as follows:

VIII.7.2 Proposition (Cohen). It is consistent with ZFC that non-constructible


sets exist. In fact, it is consistent with ZFC that a non-constructible subset of ω
exists.

VIII.7.3 Example (Collapsing Cardinals). Once more we look at Exam-


ple VIII.7.1. This time we fix a CTM of ZFC, M, and take a = ω and b = ℵ1M .

Of course, ℵ1M is the smallest ordinal α > ω in M for which there is no 1-1
correspondence f : ω → α in M.

Let G be M-generic with respect to the PO set F(ω, ℵ1M ), and consider

g = G. We know that g ∈ M[G]. We also know that g : ω → α is total
and onto in M[G] (recall that ordinals are absolute, and M and M[G] have
the same ordinals). Thus, α is not an uncountable cardinal in M[G]; it is just
an at most countable ordinal (in view of g). We say that the cardinal ℵ1M was
collapsed as we passed from M to its generic extension M[G]. Therefore ℵ1M is
not an uncountable cardinal in M[G], that is, ℵ1M < ℵ1M[G] in M[G]. Of course,
all along, ℵ1M is just that α above.
This phenomenon of cardinals collapsing – a witness to the fact that being a
cardinal is not absolute for CTMs – is annoying because it causes more work
towards proving the relative consistency of ¬CH. 

Of course, at most countable cardinals do not collapse, by absoluteness of ω


and of finite ordinals. Moreover, going backwards, all cardinals are preserved.

VIII.7.4 Proposition. Let α be an ordinal in M[G] such that (α is a


cardinal) M[G] . Then (α is a cardinal) M .

Proof. α is an ordinal in M by VIII.6.2. Suppose that the conclusion is false, and


let β < α and f : β → α be a 1-1 correspondence in M. Now “ f is a 1-1 corres-
pondence” is absolute for transitive models of ZF, and β and f are in M[G].
This contradicts that α is a cardinal in M[G]. 

VIII.7.5 Example (Towards the Relative Consistency of ¬CH). We turn


once again to Example VIII.7.1. This time we fix a CTM of ZFC, M, and take
an F(a, b) ∈ M with a = ω × κ and b = 2, where κ is an uncountable cardinal
552 VIII. Forcing

in M, or
(κ is an uncountable cardinal) M (1)

Let G be M-generic and form M[G]. We know that if we let f = G, then
f : ω × κ → 2 is total and onto (2)
and
f ∈
/M by Exercise VIII.2 (3)
As a matter of fact, α ↦ λn.f(n, α)† is 1-1 on κ: Indeed, for any α ≠ β in κ
consider the set in M,

    D = {p ∈ F(ω × κ, 2) : (∃n ∈ ω)(p(n, α)↓ ∧ p(n, β)↓ ∧ p(n, α) ≠ p(n, β))}

D is dense, as we can extend any q ∈ F(ω × κ, 2) by adding the triples
⟨n, α, 0⟩ and ⟨n, β, 1⟩ to q, for some n such that q(n, α)↑ and q(n, β)↑
(there are plenty of such n, by finiteness of q). Now G ∩ D ≠ ∅ translates to
f(n, α) ≠ f(n, β) for some n ∈ ω; hence λn.f(n, α) ≠ λn.f(n, β).

† Or λα.(λn.f(n, α)).
Since the function f is in M[G], so is α ↦ λn.f(n, α). The latter being 1-1
and total (on κ), it establishes

    (Card(^ω2) ≥ Card(κ))^{M[G]}                                         (4)

or

    (2^{ℵ_0})^{M[G]} ≥ Card^{M[G]}(κ)                                    (5)

Of course, the ordinal κ is an ordinal to inhabitants of both worlds, M and
M[G], but it would be reckless to assume, a priori, that κ is a cardinal in
M[G] just by virtue of being so in M – after all, we saw that cardinals may
collapse. Hence our conservatism in using "Card^{M[G]}(κ)" rather than just "κ"
in (4) and (5) above.
Now, outside the context of a CTM, to deny CH (that says 2^{ℵ_0} = ℵ_1) is to
manage to get 2^{ℵ_0} > ℵ_1, in other words

    2^{ℵ_0} ≥ ℵ_2

In view of (5) and the above (relativized to M[G]) it would then suffice to
take κ = ℵ_2^M and prove that uncountable cardinals, at least the two particular
ones ℵ_1^M and ℵ_2^M, are preserved as we pass from M to a generic extension (with
respect to the present F(a, b)) M[G], that is, we would like to show ℵ_1^M = ℵ_1^{M[G]}
and ℵ_2^M = ℵ_2^{M[G]}.
We will do that through a sequence of lemmata.

Pause. If ℵ_2^M collapses, then we are in trouble: (5) does nothing for us, because
the ordinal ℵ_2^M then has cardinality at most ℵ_1 in M[G]. If ℵ_1^M collapses, even
if ℵ_2^M might not, we are still in trouble, for ℵ_2^M is not the second uncountable
cardinal in M[G] (that is, ℵ_2^{M[G]}) now that ℵ_1^M has collapsed. □

VIII.7.6 Remark. Since for each α < κ, λn.f(n, α) codes a real number in
[0, 1] (in binary notation), we say that λn.f(n, α) is a Cohen generic real. Thus
we have added (these objects are new, by Exercise VIII.2) κ generic reals to
the ground model M. Intuitively, this set of reals turns out to be so huge that,
in M[G], its cardinality is large enough to leave room for cardinals strictly
between ω and it. □

VIII.7.7 Definition (The κ-Antichain Condition). Let κ be an infinite cardinal.
A PO set P = ⟨P, <, 1⟩ has the κ-antichain condition, for short κ-a.c., if
every antichain A ⊆ P has cardinality < κ. □

In much of the literature the κ-antichain condition is called the κ-chain condition
or κ-c.c. In particular, when κ = ℵ1 one then speaks of the countable chain
condition or c.c.c.
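
Since the conditions in the PO sets of this chapter are finite partial functions,
compatibility and antichains can be made quite concrete on finite data. The following
sketch (Python; dictionaries stand in for finite partial functions, and the helper names
compatible and is_antichain are ours, not the text's) merely illustrates Definition
VIII.7.7; the antichains that matter in the lemmata below are, of course, uncountable.

    # Conditions of F(a, 2) modelled as dicts mapping finitely many elements of a to {0, 1}.
    def compatible(p, q):
        # p and q are compatible iff they agree wherever their domains overlap,
        # i.e., iff their union (as a set of pairs) is still a function.
        return all(q[x] == v for x, v in p.items() if x in q)

    def is_antichain(conditions):
        # An antichain: pairwise incompatible conditions.
        return all(not compatible(p, q)
                   for i, p in enumerate(conditions)
                   for q in conditions[i + 1:])

    p = {0: 1, 3: 0}                 # a condition in F(omega, 2)
    q = {3: 1, 5: 1}                 # disagrees with p at 3, so p and q are incompatible
    r = {0: 1, 5: 1}                 # agrees with p on the one common input, 0
    print(compatible(p, q))          # False
    print(compatible(p, r))          # True: p and r have a common extension, their union
    print(is_antichain([p, q]))      # True
    print(is_antichain([p, q, r]))   # False, because p and r are compatible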

VIII.7.8 Lemma. Let M be a CTM of ZFC, and P = ⟨P, <, 1⟩ a PO set in M
that has the κ-a.c. in M.† Assume that κ is a regular uncountable cardinal in
M – that is, (ω < κ ∧ κ is regular)^M holds. Then κ is also a cardinal in M[G]
for any M-generic G ⊆ P.

† The "in M" cannot be emphasized enough. Since M is countable, a resident of U_A trivially sees
that every antichain in M is countable.

Proof. Suppose instead that the hypotheses hold, yet there is an M-generic G and an
onto function in M[G], f : α → κ, where α < κ in M[G]. That is, the ordinal κ
is not a cardinal in M[G]. By VIII.6.2, α is an ordinal in M as well.
There is a formula A(x, y, z) of L_Set that says "x : y → z is an onto function".
Thus, if t ∈ M is such that f = t^G, then, for some p ∈ G (we fix one
such p), VIII.3.5 implies

    p ⊩_w A(t, α̂, κ̂)

In words, we have the following sentence true in M:

    p ⊩_w "t : α̂ → κ̂ is onto"                                            (1)

where the ˆ-function is that of VIII.2.6.
For every β < α we let

    B_β = {γ < κ : (∃q ≤ p) q ⊩_w t(β̂) = γ̂}                              (2)

B_β ∈ M for all β < α, since, by the definability lemma, the expression to the
right of ":" is a formula relativized to M. We next majorize the cardinality in
M of the B_β, i.e., estimate ("from above") Card^M(B_β). To this end, let us pick
for each γ ∈ B_β one q that works in (2) above. We denote this q by q_γ.
Assume now that γ ≠ δ are both in B_β. We will argue that q_γ ⊥ q_δ: If not,
let r ≤ q_γ and r ≤ q_δ. Now, by definition of the symbol "q_γ", q_γ ⊩_w t(β̂) = γ̂
and q_δ ⊩_w t(β̂) = δ̂; hence, by monotonicity (VIII.3.3(2)), r ⊩_w t(β̂) = γ̂ and
r ⊩_w t(β̂) = δ̂. Let G′ ∋ r be some M-generic set (cf. VIII.1.8). The truth
lemma yields γ = t^{G′}(β) = δ in M[G′] (recall that ordinals are preserved, and
both γ and δ are in M, for κ is). This is a contradiction.


Pause. While G′ ≠ G in U_A in general, and the same is true of t^{G′} versus
t^G = f, these objects – G′ and t^{G′} – were just intermediate agents towards
deriving the contradiction γ = δ.

Thus, in M, B_β maps 1-1 into some antichain C that contains the q_γ objects
for the various γ ∈ B_β. Therefore, Card(B_β) ≤ Card(C) < κ is true in M,† the
"<" contributed by the κ-a.c. of P in M. Since κ is regular in M, the following
is true in M by VII.6.11:

    Card(⋃_{β<α} B_β) < κ                                                 (3)

We will next contradict (3) by proving, in M,

    κ ⊆ ⋃_{β<α} B_β                                                        (4)

† Written without the M-superscript, since we have said "is true in M" (cf. VI.8.4).

This shows that the assumption that we have an α and f in M[G] with the
stated properties is untenable, thus proving the lemma.
Towards (4), let us argue in M, and let γ < κ (i.e., γ ∈ κ). Since f is onto
κ (from α – this happens in M[G]), let β < α such that f(β) = γ. Thus, by
the truth lemma, some q ∈ G satisfies

    q ⊩_w t(β̂) = γ̂                                                        (5)
We will be done if we can say "without loss of generality, q ≤ p" for the p we
have fixed at the outset of our proof (cf. (1)), for then γ ∈ B_β by (2). To settle
the phrase in quotes, let r ∈ G be such that r ≤ q and r ≤ p – it exists because
both q and p are in G. Then r ⊩_w t(β̂) = γ̂ by monotonicity (and the fact that
q works). □
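
In case the appeal to VII.6.11 is to be unpacked (we take that result to be the usual
one: a union of fewer than κ sets, each of cardinality less than κ, has cardinality less
than κ when κ is regular): in M we have Card(α) ≤ α < κ and Card(B_β) < κ for each
β < α; by regularity of κ, λ = sup_{β<α} Card(B_β) < κ, and then

    Card(⋃_{β<α} B_β) ≤ Card(α) ·_c λ ≤ max(Card(α), λ, ℵ_0) < κ

which is (3).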

VIII.7.9 Definition. Let M be a CTM of ZFC, and P = ⟨P, <, 1⟩ a PO set in
M. We say that P preserves cardinals just in case for every M-generic G ⊆ P
and every ordinal α of M (i.e., α ∈ On^M), if α is a cardinal in M, then it is also
a cardinal in M[G]. □

By absoluteness of ω and below, finite or countable cardinals are always


preserved (forward, from M to M[G]). By VIII.7.4 cardinals are also preserved
backwards.
Now, Lemma VIII.7.8 gives a sufficient condition for the preservation of all
regular cardinals (in M) at or above κ (clearly, if P has the κ-a.c. in M, it also has
the λ-a.c. in M for all cardinals λ > κ). The following strengthens all this a bit,
by dropping the qualification regular.

VIII.7.10 Corollary. Let M be a CTM of ZFC, and P = ⟨P, <, 1⟩ a PO set
in M that has the ℵ_1-a.c. in M. Then P preserves cardinals.

Proof. We only worry about what happens beyond ω. By the remarks above,
if κ = ℵ_{α+1}^M, a successor cardinal, then κ is preserved, since it is regular in M
(see VII.6.12). Suppose now that κ = ℵ_α^M and Lim(α), that is, κ is a limit cardinal.†
Thus κ = ⋃_{β<α} ℵ_β^M. By Remark VII.4.24(2), κ = ⋃_{β<α} ℵ_{β+1}^M, and all
the ℵ_{β+1}^M are preserved; being a supremum of cardinals of M[G], κ is then a
cardinal of M[G] as well. □

Our next task is to show that the particular PO set of Example VIII.7.5 has
the ℵ1 -a.c. (or c.c.c. in the alternative terminology). We will need a definition
and two more lemmata.

VIII.7.11 Definition (Δ-Systems). A family of sets A is called a Δ-system, or
a quasi-disjoint family, provided that there is a set r, the root of A, such that
for any two a ≠ b in A, a ∩ b = r. □

† Lim(α) is absolute for CTMs.



The name "Δ-system" is suggested by the shape of a quasi-disjoint family
as sketched below (the members of the family depicted below are r ∪ a, r ∪ b,
r ∪ c, etc.):

[Figure: the root r at the apex, with the pairwise disjoint parts a, b, c, … fanning out
beneath it, suggesting the letter Δ.]

VIII.7.12 Lemma (Δ-System Lemma). If the set A is an uncountable family
of finite sets, then there is an uncountable B ⊆ A that is a Δ-system.

Proof. This is argued in ZFC (or in U_A).
First off, for each n ∈ ω, let C_n = {X ∈ A : Card(X) = n}. There must be an
n ∈ ω such that C_n is uncountable; otherwise Card(A) ≤ ℵ_0, since A = ⋃_{n∈ω} C_n
(this uses AC; see VII.2.13). Thus, without loss of generality, we assume that
there is some fixed n ∈ ω such that, for all X ∈ A, Card(X) = n.† Let us then
prove the lemma by induction on n.
For the basis, n = 1, it suffices to take B = A and r = ∅.
We proceed to the n + 1 case, based on the obvious I.H.
Case 1. There is an a such that S = {X ∈ A : a ∈ X} is uncountable. By the
I.H., let D be an uncountable quasi-disjoint subfamily of {X − {a} : X ∈ S}
with root r. Then B = {X ∪ {a} : X ∈ D} with root r ∪ {a} is what we want.
Case 2. There is no such a as above. We then define by recursion (on ordinals
< ℵ_1) a transfinite sequence in A:

    X_0 = some arbitrary set in A
    X_α = some arbitrary set in A that is disjoint from ⋃_{β<α} X_β

Assuming that the recursion above is legitimate, then {X_α : α < ℵ_1} is
uncountable and quasi-disjoint. Indeed, the X_α are pairwise disjoint, so that r = ∅
works.

† That is, we work with an uncountable C_n, call it "A", and discard the original A.

But why is the second step in the recursion possible for any α < ℵ_1? The
set 𝒴 = {Y ∈ A : Y ∉ {X_β : β < α}} is uncountable; otherwise

    Card(A) ≤ ℵ_0 (that is, Card(𝒴)) +_c Card({X_β : β < α})
            = ℵ_0 +_c Card(α)      since β ↦ X_β is 1-1 and total on α
            = ℵ_0 +_c ℵ_0          by Card(α) ≤ α < ℵ_1
            = ℵ_0

We will be done if we can argue that at least one Y ∈ 𝒴 is disjoint from all
X_β, β < α. Any such Y can then be chosen to be X_α.
Suppose instead that every Y ∈ 𝒴 intersects ⋃_{β<α} X_β. Then there is a
β_0 < α such that X_{β_0} ∩ Y ≠ ∅ for uncountably many among the Y ∈ 𝒴 –
otherwise, 𝒴 is a countable union of countable sets Z_β = {Y ∈ 𝒴 : X_β ∩ Y ≠
∅}, for β < α. Fixing attention on that β_0, we prove that some a ∈ X_{β_0} is in
uncountably many Y, contradicting the case we are arguing under. Well, if not,
let for each a ∈ X_{β_0}

    W_a = {Y ∈ 𝒴 : a ∈ Y}

Each W_a is countable; hence (X_{β_0} being finite) so is ⋃_{a∈X_{β_0}} W_a. But this union
is the set of 𝒴-sets that X_{β_0} intersects, and that is uncountable. □
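
Although the lemma concerns uncountable families, Definition VIII.7.11 itself is easy
to test on finite data: a family with at least two members is quasi-disjoint exactly when
all pairwise intersections of distinct members coincide, and that common intersection is
the root. A small Python sketch (the function name delta_root is ours, purely for
illustration):

    from itertools import combinations

    def delta_root(family):
        # family: a list of distinct finite sets.  Returns the root if the family
        # is a Delta-system, and None otherwise.
        if len(family) < 2:
            return set()          # vacuous case: any r works; return the empty set
        roots = {frozenset(a & b) for a, b in combinations(family, 2)}
        return set(next(iter(roots))) if len(roots) == 1 else None

    # Members r ∪ a, r ∪ b, r ∪ c with root r = {0, 1}, as in the sketch after VIII.7.11.
    print(delta_root([{0, 1, 2}, {0, 1, 3}, {0, 1, 4, 5}]))    # {0, 1}
    print(delta_root([{0, 1}, {1, 2}, {2, 0}]))                # None: the intersections differ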

By the concluding remarks of Example VIII.7.5, all that remains to be done


is the following lemma:

VIII.7.13 Lemma. Let M be a CTM of ZFC, and F(a, b) = ⟨F(a, b), <, 1⟩
a PO set in M, where a = ω × ℵ_2^M and b = 2. Then F(a, b) has the ℵ_1-a.c.
(or c.c.c.) in M.

Proof. The argument is carried out inside M.
F(a, b) is uncountable. Arguing by contradiction, let A ⊆ F(a, b) be an
uncountable antichain. The set

    B = {dom(p) : p ∈ A}                                                  (1)

is also uncountable. If not, A ⊆ ⋃_{s∈B} {p ∈ F(a, b) : dom(p) = s}, a countable
set, since for each finite s ⊆ ω × ℵ_2^M the cardinality of ^s2 is finite (= 2^{Card(s)}),
and thus {p ∈ F(a, b) : dom(p) = s} is finite. Let D ⊆ B be an uncountable
Δ-system of root r (Lemma VIII.7.12), and set A_D = {p ∈ A : dom(p) ∈ D}. This
is uncountable due to the onto map p ↦ dom(p).
Now, {p↾r : p ∈ A_D} is finite; hence there are plenty of p and q in A_D, indeed
uncountably many, with p ≠ q and p↾r = q↾r. But then p and q are compatible:
since dom(p) and dom(q) are in D, we have dom(p) ∩ dom(q) = r, and as
p↾r = q↾r, the union p ∪ q is a member of F(a, b) extending both. But
also p ⊥ q, since both are in A. □

For the record, we now have

VIII.7.14 Corollary (Cohen). With M and F(a, b) as above, if G is any
M-generic set, then (¬CH)^{M[G]} is true.

Thus, a model of ZFC leads to a model of ZFC + ¬CH.
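
To recap how the pieces fit together (nothing new, just VIII.7.5, VIII.7.10, and VIII.7.13
assembled): F(ω × ℵ_2^M, 2) has the ℵ_1-a.c. in M by VIII.7.13, hence preserves cardinals
by VIII.7.10. Since every ordinal below ℵ_1^M is countable already in M (hence in M[G]),
and every ordinal below ℵ_2^M has cardinality at most ℵ_1^M already in M, it follows that
ℵ_1^{M[G]} = ℵ_1^M and ℵ_2^{M[G]} = ℵ_2^M. Then (5) of VIII.7.5, taken with κ = ℵ_2^M,
reads in M[G]

    2^{ℵ_0} ≥ Card(ℵ_2^M) = ℵ_2 > ℵ_1

which is exactly ¬CH relativized to M[G].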

VIII.8. Exercises
VIII.1. In the definition of generic sets (VIII.1.7) we have required G to
be a filter definitionally. Prove that in the presence of the density
requirement we get that G is a filter for free, relaxing requirement (2)
in the definition of a filter (VIII.1.4) as follows: We only ask that any
two p and q in G be compatible (without asking for a witness in G).
(Hint. Fix p and q in G. It helps to prove that the following set is
dense: {r ∈ |P| : r ⊥ p ∨ r ⊥q ∨ (r ≤ p ∧ r ≤ q)}.)
VIII.2. Refer to Example VIII.7.1, and take a = ω and b = 2. We have seen
that if M is a CTM (of, say, ZF) with F(ω, 2) ∈ M and G is any
M-generic set, then G ∉ M. We also know that f = ⋃G is a function
and f ∈ M[G]. Prove that f ∉ M.
(Hint. Let g : ω → 2 be in M. With the help of the set {p ∈ F(ω, 2) :
(∃n ∈ ω)(p(n)↓ ∧ p(n) ≠ g(n))}, prove that f ≠ g.)
VIII.3. If M is a transitive model of ZF − P, then the following are absolute for
M, where we write π_i, i = 1, 2, 3, for the ith projection of ⟨x, y, z⟩:
(a) P_ω(A), where P_ω(A) = {x : x ⊆ A ∧ x is finite}.
(b) x is a PO set.
(c) x is a PO set ∧ y ∈ π_1(x) ∧ z ∈ π_1(x) ∧ y ⊥ z.
(d) x is a PO set ∧ y ⊆ π_1(x) ∧ ¬U(y) ∧ y is open.
(e) x is a PO set ∧ y ⊆ π_1(x) ∧ ¬U(y) ∧ y is dense.
(f) x is a PO set ∧ y ⊆ π_1(x) ∧ ¬U(y) ∧ ¬U(z) ∧ y is z-generic.
VIII.4. Prove that λxG.x^G is absolute for transitive models of ZF − P.
VIII.5. Prove that λxP.x̂ of VIII.2.6 is absolute for transitive models of
ZF − P.
VIII.6. Provide all the necessary details that show In and Ne are absolute for
any transitive model of ZF − P.

VIII.7. Choose an M, a PO set P in M, and G ⊆ M (not necessarily generic).
Show that (x ∪ y)^G = x^G ∪ y^G for all x and y in M.
VIII.8. Simplify the expressions, giving your answer in terms of ⊩ or ⊩_w
(that is, |= should not figure in the final answer).
(a) p ⊩_w ¬A
(b) p ⊩_w A ∨ B
(c) p ⊩_w (∀x)A(x, y⃗)
(d) p ⊩_w (∃x)A(x, y⃗)
(e) p ⊩ A ∧ B
(f) p ⊩ (∀x)A(x, y⃗)
VIII.9. Let M be a CTM of ZFC, P a PO set in M, p ∈ |P|, and A a sentence
over L_{Set,M}. Assume that D = {q ∈ |P| : q ⊩_w A} is dense below
p, which means that for every r ≤ p, (≤ r) ∩ D ≠ ∅. Prove that
p ⊩_w A.
VIII.10. Given sentences A_1, …, A_n, B over L_{Set,M}, for some CTM of
ZFC, M, and a PO set P in M. Prove that if we have A_1, …, A_n ⊢_ZFC B
and p ⊩_w A_i (i = 1, …, n, p ∈ |P|), then we also have p ⊩_w B.
Bibliography

Aczel, P. (1978). An introduction to inductive definitions. In Barwise (1978), Chapter C.7,
pages 739–782.
Apostol, T. (1957). Mathematical Analysis. Addison-Wesley, Reading, Massachusetts.
Bachmann, H. (1955). Transfinite Zahlen. Berlin.
Barwise, Jon (1975). Admissible Sets and Structures. Springer-Verlag, New York.
——— (1978), editor. Handbook of Mathematical Logic. North-Holland, Amsterdam.
——— and L. Moss (1991). Hypersets. The Mathematical Intelligencer, 13(4):31–41.
Blum, E. (1967). A machine-independent theory of the complexity of recursive functions.
J. ACM, 14:322–336.
Bourbaki, N. (1966a). Éléments de Mathématique. Hermann, Paris.
——— (1966b). Éléments de Mathématique; Théorie des Ensembles. Hermann, Paris.
Burgess, John P. (1978). Forcing. In Barwise (1978), Chapter B.4, pages 403–452.
Cohen, P. J. (1963). The independence of the continuum hypothesis, part I. Proc. Nat.
Acad. Sci. U.S.A., 50:1143–1148. Part II, 51:105–110 (1964).
Davis, M. (1965). The Undecidable. Raven Press, Hewlett, N.Y.
Dedekind, R. (1888). Was Sind und Was Sollen die Zahlen? Vieweg, Braunschweig.
Devlin, K. J. (1978). Constructibility. In Barwise (1978), Chapter B.5, pages 453–489.
Dijkstra, Edsger W., and Carel S. Scholten (1990). Predicate Calculus and Program
Semantics. Springer-Verlag, New York.
Enderton, Herbert B. (1972). A Mathematical Introduction to Logic. Academic Press,
New York.
Feferman, S. (1965). Some applications of the notion of forcing and generic sets.
Fundamenta Mathematicae, 56:325–345.
———, and A. Levy (1963). Independence results in set theory by Cohen’s method II
(abstract). Notices Amer. Math. Soc., 10:592.
Frege, G. (1893). Grundgesetze der Arithmetik, Begriffsschriftlich Abgeleitet, volume 1.
Jena.
Gitik, M. (1980). All uncountable cardinals can be singular. Israel J. Math., 35(1–2):61–
88.
Gödel, K. (1938). The consistency of the axiom of choice and of the generalized continuum
hypothesis. Proc. Nat. Acad. Sci. U.S.A., 24:556–557.
——— (1939). The consistency of the axiom of choice and of the generalized continuum
hypothesis. Proc. Nat. Acad. Sci. U.S.A., 25:220–224.


——— (1940). The Consistency of the Axiom of Choice and of the Generalized
Continuum-Hypothesis with the Axioms of Set Theory. Annals of Math. Stud. 3.
Princeton University Press, Princeton.
Gries, David, and Fred B. Schneider (1994). A Logical Approach to Discrete Math.
Springer-Verlag, New York.
——— and ——— (1995). Equational propositional logic. Information Processing
Lett., 53:145–152.
Hartogs, F. (1915). Über das Problem der Wohlordnung. Math. Ann., 76:438–443.
Hermes, H. (1973). Introduction to Mathematical Logic. Springer-Verlag, New York.
Hilbert, D., and P. Bernays (1968). Grundlagen der Mathematik I, II. Springer-Verlag,
New York.
Hinman, P. G. (1978). Recursion-Theoretic Hierarchies. Springer-Verlag, New York.
Jech, T. J. (1978a). About the axiom of choice. In Barwise (1978), Chapter B.2,
pages 345–370.
——— (1978b). Set Theory. Academic Press, New York.
Kamke, E. (1950). Theory of Sets. Translated from the 2nd German edition by
F. Bagemihl. Dover Publications, New York.
Kunen, Kenneth (1980). Set Theory: An Introduction to Independence Proofs. North-
Holland, Amsterdam.
Levy, A. (1979). Basic Set Theory. Springer-Verlag, New York.
Manin, Yu. I. (1977). A Course in Mathematical Logic. Springer-Verlag, New York.
Mendelson, Elliott (1987). Introduction to Mathematical Logic, 3rd edition. Wadsworth
& Brooks, Monterey, California.
Monk, J. D. (1969). Introduction to Set Theory. McGraw-Hill, New York.
Montague, R. (1955). Well-founded relations; generalizations of principles of induction
and recursion (abstract). Bull. Amer. Math. Soc., 61:442.
Pincus, D. (1974). Cardinal representatives. Israel J. Math., 18:321–344.
Rasiowa, H., and R. Sikorski (1963). The Mathematics of Metamathematics. Państwowe
Wydawnictwo Naukowe, Warszawa.
Schütte, K. (1977). Proof Theory. Springer-Verlag, New York.
Shoenfield, Joseph R. (1967). Mathematical Logic. Addison-Wesley, Reading,
Massachusetts.
——— (1971). Unramified forcing. In Dana S. Scott, editor, Axiomatic Set Theory, Proc.
Symp. Pure Math., pages 357–381.
——— (1978). Axioms of Set Theory. In Barwise (1978), Chapter B.1, pages 321–344.
Sierpiński, W. (1965). Cardinal and Ordinal Numbers. Warsaw.
Skolem, T. (1923). Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre.
In Wiss. Vorträge gehalten auf dem 5. Kongress der skandinav. Mathematiker
in Helsingförs, 1922, pages 217–232.
Smullyan, Raymond M. (1992). Gödel's Incompleteness Theorems. Oxford University
Press, Oxford.
Tarski, A. L. (1955). General principles of induction and recursion (abstract); The notion
of rank in axiomatic set theory and some of its applications (abstract). Bull. Amer.
Math. Soc., 61:442–443.
–—— (1956). Ordinal Algebras. North-Holland, Amsterdam.
Tourlakis, G. (1984). Computability. Reston Publishing Company, Reston, Virginia.
——— (2000a). A basic formal equational predicate logic – part I. BSL, 29(1–2):43–56.
——— (2000b). A basic formal equational predicate logic – part II. BSL, 29(3):75–88.
——— (2001). On the soundness and completeness of equational predicate logics.
J. Computation and Logic, 11(4):623–653.

Veblen, Oswald, and John Wesley Young (1916). Projective Geometry, volume I. Ginn
and Company, Boston.
Whitehead, A. N., and B. Russell (1912). Principia Mathematica, volume 2. Cambridge
Univ. Press, Cambridge.
Wilder, R. L. (1963). Introduction to the Foundations of Mathematics. Wiley, New York.
Zermelo, E. (1904). Beweis daß jede Menge wohlgeordnet werden kann. Math. Ann.,
59:514–516.
——— (1908). Untersuchungen über die Grundlagen der Mengenlehre I. Math. Ann.,
65:261–281.
——— (1909). Sur les ensembles finis et le principe de l’induction complète. Acta
Math., 32:185–193.
List of Symbols

[ f (0),. . . , f (n − 1)], 251 t(P) (transitive closure of P), 262


A × B, 189 P+ (transitive closure of P (alternate
π (first projection of pairing notation)), 232, 262

function), 186 P (reflexive-transitive closure of P),
δ (second projection of pairing 269
function), 186 Cn, 458
A M , 79 cf(α), 478
 + A, 49 P ◦ S (relational composition), 253
MP (adjacency map of P), 269 f ∗ g, 252
ℵα , 465 a I , 54
, 516 D(X ) (set of sets definable from X ),
f (n) = O(g(n)) (“big-O” notation), 223
283 Sa ↓ (S is defined at a), 198
(∃a ∈ M) . . . , 63 0 , 101, 380
A(y1 , . . . , yk ]), 20 ∆A : A → A (diagonal relation on
A[y1 , . . . , yk ], 20 A), 257
m, 458 −, 141
a ·c b, 472 < , 413
#
Card( i∈I ki ), 485 X  Y , 412
a+ , 465 <∗ , 413
!
i∈I ki , 484 dom (domain of a relation), 195
κ, 458 ∅, 111, 131
Card(x), 458 A ∼ B, 431
Cl(R), 495 A ∼ B, 431
rA (P) (reflexive closure of w , 530
P : A → A), 262 , 536
s(P) (symmetric closure of P), 262 |=I, 80

f : A → B, 26 N, 12
f I , 54 xn , 185
glb(B), 347 x1 , . . . , xn  (n-vector or n-tuple),
1A : A → A (identity relation on A), 185
257 x , 185
S[X] (image of X under S), 196 N, 15
S−1 [X] (inverse image of X under S), N, 112, 233
196 ω, 112, 234
Sc, 196 n, m, l, i, j, k (natural number
i, 55 variables), 235
p⊥q, 520 |=, 61
inf(B), 67, 347 , 61
b T a (T a relation; it means ∈,
/ 116
a, b ∈ T), 194  n , 88
Z, 113 On, 332
n
A , 142 OP, 188
i=1 i
A , 142 < (abstract symbol for order), 284
1≤i≤n i
a∈I A a , 193 Ord, 331
∩, 141 ord(x), 397

A (intersection of a class A), , 374
153 ∼
=, 317
Z, 15 ≤ (abstract symbol for reflexive
(ιz)Q , 74 order), 285

κ-a.c., 553 β∈α ({β} × X β ), 135, 412
·β
κ-c.c., 553 α , 423
Lim (α), 342 α + 1, 340
L, 397 a, b (ordered pair), 186
L(M), 55 (A, <), 286
L N , 397 ≺, 462
λ (used in λ-notation), 202 ≺ (a fixed order of logical and
"→ (alternative to λ-notation), 203 nonlogical symbols), 222, 223
, 438 , 461
 ∞ , 438 pr (the predecessor function on ω),
<, 336 239
L M (the constructible universe), 284 P I , 54
, 35 An , 192
, 39 A1 × · · · × An , 192
lub(B), 346 P0 (0th power of a relation), 257
α − 1, 342 Pn ( positive power of a relation), 256
#
Fa , 207 sup(B), 346
×
a∈I
n
A , 192 sup+ (A), 347
i=1 i
×
I
1≤i≤n
Ai , 192 t M , 79
Term(M), 55
A, 208
S + (the set of all nonempty strings
∃!, 67
over the set S), 224
A/P, 277
T C(A) (transitive closure of the
R N (α), 359
class A), 265
ran (range of a relation), 195
 (order of “definable sets”), 219,
ρ, 369
223, 225
r k, 369
. . ., 20
Q, 113
Q, 15 U M , 144
R, 15 U N (the class of all sets and atoms),
R, 113 144
S−1 (inverse of S), 196 U, 144
, 55 Sa ↑ (S is undefined at a),
T  A (restrict inputs of T to be 198
n
in A), 198 A , 142
i=1 i
T  A (alternate notation for “”), A , 142
1≤i≤n i
199 A
a∈I a , 193
T | A (that is, T ∩ A2 ), 198 ∪, 141

S(x), 340 A (union of a class A), 150

string A (union of a set A), 152
λ, 13 U (x) (“x is an urelement”), 114
≡, 13 VN (α), 359
⊆, 117, 139 V = L, 408
⊆, 117 V = L, 408
⊇, 117, 139 an , 21
⊇, 117 a , 21
⊂, 119, 139 xn , 34
, 119 V M , 144
, 119 V N (the class of all sets), 145
⊃, 139 V, 145
n + 1 (the successor of n ∈ ω, i.e., WF N , 359
n ∪ {n}), 242 Wff(M), 55
a +c b, 470 ZF, 108, 228
sp, 309 ZFC, 2, 108, 229
Index

absolute formula, 382 of (unordered) pair, 145, 211


absolute term, 382 of choice, see AC
absolutely provable, 38 of collection, 163, 204
absoluteness of rank, 394 of constructibility, 408
AC, 215, 274 of dependent choices, 298
global, 226 of foundation, 158
strong, 226 of infinity, 234
Aczel, P., 498 of power set, 180
addition, 246 of replacement, 161, 163, 371
over ω, 246 of union, 151
is associative, 248 special, 8
is commutative, 247 axiomatic theories, 39
adjacency map, 269 axiomatized, 39
adjacency matrix, 269 axioms, logical, 35
∀-introduction, 44
alephs, 465 Barwise, J., 192, 368, 395
algebraic topology, 275 basic language, 108
algorithm, 283 Bernays, P., 34, 42, 315
alphabet, 7, 13 Berry’s paradox, 104
ambiguity, 25 beth, 516
ambiguous, 25, 498 bijection, see 1-1 correspondence
antichain, 520 binary digit, 455
antinomies, 104 binary notation, 455
Apostol, T., 228 binary relation, see relation
argot, 40 bit, 455
arithmetization, 90 Blum, E., 228
arity, 9 Blum’s speedup theorem, 228
array, 233 BNF, 18
ascending sequence, 348 Boolean, 9
associativity, 17, 256 Boolean addition, 270
at most enumerable, 442 Boolean sum, 271
atom, see urelement bound variable, 19
axiom, 8 Bourbaki, N., 5, 43, 229
comprehension, 121 Brouwer, L. E. J., 4, 12
logical, 8 Burali-Forti, C., 122
nonlogical, 8, 37
of regularity, 159 calculational proof, 85
urelements are atomic, 116 Cantor, G., 105, 122


cardinal, 207, 458, 467 concatenation, 13, 252


inaccessible, 483 conditions, 520
strongly inaccessible, 483 congruence, 261
weakly inaccessible, 483 conjunction, 17
cardinal multiplication, 472 conjunctionally, 85
cardinal number, 431, 458, 467 connectives, 9
cardinal product, 472 conservative, 68, 74
cardinal successor, 465 conservative extension, 72, 168
cardinality, 431, 458, 467 consistency, 61, 531
cardinals consistency theorem, 64, 65
product of family, 485 constant, 10
sum of, 470 imported, 55
sum of family, 484 constructible object, 397
Cartesian product, 189 constructible sets, 396
of many classes, 192 constructible universe, 219, 229, 284, 395
category theory, 275 construction formative, 43
c.c.c., 553 continuum hypothesis, 3, 466
CH, 466 correct, 88
chain, 354, 520 correct theory, 88
choice function, 216 countable, 64, 90, 314, 442
class, 136 countable chain condition, 553
∈-closed, 237 course-of-values induction, 281, 294
domain of, 195 CTM, 314
f -closed, 434 cumulative hierarchy, 359
proper, 137
R-closed, 495 De Morgan’s laws
range of, 195 for classes, 154
transitive, 236 generalized, 212
class term, 134 decision problem, 41
closed formula, 20 Dedekind, R., 449
closed term, 20 Dedekind finite, 449
closed under, 21, 434 Dedekind infinite, 449
closure, 21 deduce, 38
see also relation deduction theorem, 49
cofinal subsets, 478 deductions, 1
cofinality, 478 definability, 61, 62
Cohen, P. J., 229, 467, 487, 526 in a structure, 61
Cohen extension, 524, 526 definition by recursion, 26
Cohen generic real, 553 Δ0-formulas, 380
collapsing cardinals, 551 Δ-system, 555
collapsing function, 310 root of, 555
collection, 149 dense, 520
collection axiom, see axiom dense below, 559
commutative diagram, 275 denumerable, 442
compactness theorem, 65 derivation, 22
comparable, 520 derived rule, 37, 39, 41
compatible, 520 Devlin, K. J., 229
complement, 145 diagonalization, 121, 156
complete, 97 difference, 250
complete arithmetic, 40 of two classes, 141
complete lattice, 349 over ω, 250
completeness, 6 disjoint classes, 141
completeness theorem, 54, 64, 65, 97 disjoint union, 412
completion, 97 distributive laws, 95, 212
composition, 195 for ω, 282
see also relation for ×, 213
computer programming, 232 generalized, 212

domain, 54 formalize, 6
double induction, 248, 300 formalized, 3
dummy renaming, 46 formula
absolute, 382
∃-introduction, 37 mixed-mode, 62
elimination of defined symbols, 69, 71 prime, 29
embedding, 318 propositional, 29
empty set, 100, 111, 131 satisfiable, 31
Entscheidungsproblem, 41 tautology, 31
enumerable, 442 unsatisfiable, 31
ε-numbers, 426 formula form, 35
ε-term, 76 formula schema, 35
equinumerous, 431 foundation axiom, 102
equipotent, 431 Fraenkel, A. A., 105
equivalence class, 276, 277 free for, 33
equivalence relation, 276 free variable, 19
equivalence theorem, 52 Frege, G., 121
existential formula, 67 function, 199
explicit listing, 147 bijective, 204
expression, 2, 8, 13 collapsing, 310
extension, 7, 194 continuous, 348
Cohen, 524 countably continuous, 348
generic, 524 diagram, 275
of a condition, 520 commutative, 275
extensional, 31, 312 expansive, 353
extensionality, 67 inclusive, 353
increasing, 348
family injective, 203
of sets, 150 inverse of, 273
indexed, 206 left inverse of, 272
intersection of, 153 left-invertible, 274
union of, 150 monotone, 348
quasi-disjoint, 555 non-decreasing, 348
Feferman, S., 228, 447, 530 one-to-one, 203
filter, 521 order-preserving, 317
generated by a chain, 522 partial, 199
finitary, 4 right inverse of, 272
finitary rule, 503 right-invertible, 274
finite, 389 surjective, 204
finite sequence, 251 function diagram, 275
length of, 251 function substitution, 201
finitely satisfiable, 98
first incompleteness theorem, 87 G-interpretation, 525
first order definable, 111 Γ-closed, 437
fixed point, 351, 438 Γ-induction, 438
fixpoint, 351, 438 GCH, 466, 484
forcing, 315 generalization, 45, 120
forcing conditions, 520 generalized continuum hypothesis, 408,
forcing language, 528 466, 484
formal interpretation, 78 generic, 518
of a theory, 83 generic extension, 524, 526
formal isomorphism, 84 global choice, see AC
formal language, 111, 115 Gödel, K., 3, 6, 229, 395, 467
first order, 7 Gödel operations, 396
formal model Gödel-Rosser incompleteness theorem, 92
of a theory, 83 Gödel’s second incompleteness theorem,
formal natural numbers, see set 93

Gries, D., 5 interpretation function, 525


ground model, 518, 526 intersection, 153
of a class, 153
Hausdorff, F., 230 of a family of sets, 153
Hausdorff’s theorem, 230, 357 of a higher order collection, 193
Hermes, H., 43 of many classes, 142
Heyting, A., 4 of two classes, 141
Hilbert, D., 4, 12, 42, 93, 315 inverse image of a class under a relation, 196
Hinman, P. G., 498 iota notation, 73
hypersets, 156, 368 i.p., 25, 497
i.p.s., 497
IC, 293 irreflexive, 284
iff, 13 isomorphic, 240
I.H., 23 isomorphic image, 242
image, 177 isomorphism, 241, 313, 316
image of a class under a relation, 196 iteratively, 22
immediate predecessor set, 497
immediate predecessors, 25, 497 Jech, T. J., 229, 395
implied multiplication, 260
imported constant, 55 Kamke, E., 355
inaccessible, 483 κ-antichain condition, 553
incompatible, 520 κ-chain condition, 553
incompletable, 89 KP, 192
incompletableness, 62, 87 KPU, 192
incompleteness, 6 Kronecker, L., 5
incompleteness theorem (first, of Gödel), 112 Kuratowski-Zorn theorem, 230, 356
indefinite article, 75
index class, 193 λ-notation, 202
indexed family of sets, see family of sets language
individual variables, 9 countable, 64
induced order, 319 extension, 55
induction, 235 first order, 16
on finite sets, 441 restriction, 55
on TC, 393 uncountable, 64
on WR-finite sets, 441 lattice, 521
over Γ, 438 uncountable, 64
over δ, 343 power of
over On, 343, 344 ∈-least, 244
over ω, 235 least number principle, 281
course-of-values, 281 left inverse, see function
induction hypothesis, 23, 235 Leibniz, G. W., 35
inductive definitions, 19, 21, see also recursive Leibniz rule, 52
definitions Levy, A., 199, 228, 439, 447, 467
inductive set, see set lexicographically, 91, 413
inductiveness condition, 293 limit cardinal, 465, 555
infima, 347 limit ordinal, 342
infimum, 347 Lindemann, F., 5
infinite descending chain, 297 linear order, see order
infix, 14, 195 linearly ordered class, 287
informal, 10 linearly ordered set, 230, 287
initial ordinals, 459 LO class, see linearly ordered class
injection, see function, injective LO set, see linearly ordered set
injective function, see function logical symbols, 9
input object, 193 logically implies, 54
instance, 37, 129 Löwenheim, L., 64
interpretation, 54 lower bound, greatest, 347

m-based rule class, 503 occurs in, 13


M-generic sets, 522 1-1 correspondence, 64, 204, 273
M-instance, 56 1-1 function, see function
M-term, 55 one point rule, 71, 201
M-formula, 55 one-to-one function, see function
Mal'cev, A. I., 65 onto, 64
Manin, Yu. I., 40 open, 520
material implication, 17 operator, 437
mathematical theory, see theory inductive, 438
matrix multiplication, 283 monotone, 437
maximal element, 287 order, 284
maximum element, 287 lexicographic, 224
MC, 289 linear, 287
weak, 372 partial, 284
metalanguage, 7 total, 287
metamathematics, 7 order of x, 397
metatheorems, 11 order-preserving, 348
metatheory, 4, 7 order type, 241, 339
metavariable, 169 order-isomorphic, 317
minimal condition, 289 order isomorphism, 317
minimal element, 287 ordered disjoint sum, 412
∈-minimal element, 239 ordered n-tuple, 185
minimality principle, 239 ordered pair, 184, 188
∈-minimum, 244 ordered sequence, 20
minimum element, 287 ordinal, 324, 331
model, 57 cofinal in, 478
countable, 64 limit, 342
uncountable, 64 of a monotone operator, 503
modus ponens, 37, 120 regular, 479
monotonicity, 51, 531 singular, 479
Montague, R., 306 successor, 342
Moschovakis, Y. N., 186, 502 ordinal exponentiation, 423
Moss, L., 368 ordinal number, 43, 324, 331
Mostowski, A., 310, 426 ordinals, 65, 219, 284, 324
MP, 37 ordinary recursion theory, 196
multiplication of natural numbers, 259, output object, 193
282
is associative, 282 pairing function, 184, 186
is commutative, 282 first projection of, 186
second projection of, 186
natural numbers, 12, 235 pairwise disjoint, 216
distributive law, 282 pairwise disjoint sets or classes, 153
natural projection, 280 paradoxes, 104
non-deterministic, 498 parameter, 170, 137, 194
non-extensional, 312 parsable, 30
non-null, 141 parse, 22
non-void, 141 partial function, see function
non-monomorphic, 498 partial multiple-valued function, 199
normal, 349 partial order, see order
notion of forcing, 520 partially ordered class, 286
null, 141 partially ordered set, 286
numeral, 42, 88 partition, 278
n-vector, 185 Peano arithmetic, 87, 101
pigeon-hole principle, 432
object variables, 9 Pincus, D., 468
occurrence, 13 Platonist, 103

PO class, see partially ordered class recursive, 89


PO set, see partially ordered set recursive definitions, 21, 244, 301
Post, E., 97 over ω, 244
Post’s theorem, 97 recursively axiomatizable, 40
power class, 178 recursively axiomatized, 39
power set, 149, 178 reference set, 145
predecessor, 238, 498 reflection principle, 504
predicate, 10 reflexive closure, 262
defined reflexive-transitive closure, 362
subset, see subset relation regularity axiom, see axiom, nonlogical
prefix, 13, 195 relation, see predicate, 20, 194
preservation of cardinals, 555 antisymmetric, 260
prime factorization theorem, 446 binary, 194
primitive recursive pairing function, closure, 262, 495
189 reflexive, 262
principle 3 of set construction, 161 symmetric, 262
priority, 17, 533 transitive, 262
of set operations, 154 composition, 253
product, 485 converse of, 196
programming formalism, 233 defining formula of, 194
programming language, 233 diagonal, 257
proof, 1, 38 domain of, 195
-, 38 equivalence, 276
by auxiliary constant, 53 extension of, 198
by cases, 52 field of, 195
by contradiction, 51 identity, 257
proper class, see class input of, 195
proper extension, 520 inverse of, 196
proper prefix, 13 irreflexive, 260
proper subset, see subset relation left field of, 196
propositional axioms, 97 left-narrow, 301
propositional calculus, 97 n-ary, 194
propositional connectives, 9 nontotal, 196
propositional logic, 97 onto, 196
propositional segment, 97 output of, 195
propositional valuations, 30 positive power of, 256
propositional variable, 29 laws of exponents, 259
provably equivalent, 43 range of, 195
proved from, 38 reflexive, 260
pure recursion, 308 restriction of, 198
over δ, 344 right field of, 196
over On, 344 setlike, 301
pure sets, 310, 346 single-valued, 199
pure theory, 57 symmetric, 260
total, 196
quantifier transitive, 260
bounded, 149 unary, 194
quotient set, 277 well-founded, 297
relational composition, see relation,
rank, 369 composition
Rasiowa, H., 42 relational extension of a formula, 194
R-closed, 21 relational implementation, 194
“real sets”, 110 relative consistency of ZFC−f, 367
recursion, 20 relative universe, 145
over δ, 344 relativization, 79
over On, 343 replacement axiom, see axiom

reverse inclusion, 521, 549 simple completeness, 63


right inverse, see function simply complete, 88
R-induction, 496 simultaneous substitution, 34
Rosser, J. B., 92 singleton, 147, 162
rule, 495 size limitation doctrine, 161
finitary, 495 Skolem, T., 64
infinitary, 495 Skolem function, 76
rule set, 494 Skolem paradox, 508
rules of inference, 8, 29 ∈-smallest, 244
Russell, B., 74, 104, 122, 439 Smullyan, R., 88
sort, 17
satisfiable, 57 soundness, 59, 81
finitely, 57 specialization, 44, 117
schema, 35 stage, 24
schema instance, 35 of collecting type, 219
Schneider, F. B., 5 of powering type, 219
scope, 15, 17, 533 of set formation, formal definition,
segment, 230, 301 368
closed, 301 principle 0 of set construction, 102
initial, 301 principle 1 of set construction, 102
selection axiom, 229 principle 2 of set construction, 102
self-reference, 94 string, 13
semantics, 6 empty, 13
sentence, 20 λ, 13
separation, 67 equality, 13
sequence, 208 ≡, 13
set, 99 prefix, 13
∈-closed, 237 proper, 13
countable, 442 strong choice, see AC
Dedekind finite, 449 strong limit, 483
Dedekind infinite, 449 strongly inaccessible, 483
definable, 219, 220, 223 structure, 54
empty, 111 domain of, 54
enumerable, 442 expansion of, 55, 72
finite, 431 reduct of, 55
inductive, 233 underlying set of, 54
inductively defined, 495 universe of, 54
infinite, 431 subclass, 139
name, 110 proper, 139
of (formal) natural numbers, 234 subscripted variable, 233
of natural numbers, 112 ⊆, 21
pure, 310 subset relation, 117, 139
rank of, 206 proper, 119
recursively defined, 495 substitutable for, 33
successor, 233 substitution, 32
transitive, 220, 236, 331 simultaneous, 34
uncountable, 442 substitution axiom, 35
well-founded, 158, 359 substring, 11, 13
WR-finite, 440 subtraction, see difference
WR-infinite, 440 successor cardinal, 465, 555
sethood, 18, 106 successor ordinal, 342
set term, 126 superclass, 139
by listing, 133 proper, 139
set universe, 144 superset relation, 117
Shoenfield, J. R., 395 support, 309
Sikorski, R., 42 support function, 346

supremum, 346 trichotomy, 241, 243, 245, 281,


surjection, see function, surjective 287
surjective function, see function true, 110
symbol sequence, 2 truth (value) assignment, 30
symmetric closure, 262 truth functions, 31
syntactic variable, 169 type, 17

table, 20 unambiguous, 26, 55, 498


Tarski, A. L., 306, 530 unary, 18
Tarski semantics, 87 unbounded, 479
τ -term, 76 uncountable, 64, 90, 442
tautologies, 29 undecidable sentence, 88
tautology theorem, 97 underlying set, 54
term, 126 union, 24, 150, 152
absolute, 382 of a class, 150
theorem, 37 of a family of sets, 150
-, 37 of a higher order collection, 193
theorem schema, 41 of many classes, 142
theory, 39 of two classes, 141
absolute, 40 uniquely readable, 30
applied, 40 universal closure, 45
complete, 88 universal quantifier, 17
conservative extension of, universe, 144
47 unordered pair, 147
consistent, 39, 229 upper bound, 346
contradictory, 39 least, 346
correct, 88 urelement, 99, 114
extension of, 47 Urelemente, 100
first order, 39
formal interpretation of, 83 vacuously satisfied, 32
formal model of, 83 valid, 57
incompletable, 228 logically, 57
incomplete, 88 universally, 57
inconsistent, 39 valuation, 30, 98
ω-consistent, 88 variable
pure, 40 bound, 19
semantically complete, 63 free, 19
simply complete, 88 frozen, 50
simply incomplete, 88 variant, 36, 46
sound, 57 Veblen, O., 53
total, 186 vector, 185
total function, 143 vector notation, 21
total order, see order void, 141
totally defined function, 143 von Neumann, J., 104
totally ordered set, 230 von Neumann universe, 107
Tourlakis, G., 228
transfinite, 284 Warshall, S., 283
construction, 284 weak equality, 202, 279
sequence, 348 weak forcing, 530
transitive, 284 weak MC, 372
transitive class, see class weakly continuous, 349
transitive closure, 232, 262 weakly inaccessible, 483
of a class, 265 weakly normal, 349
transitive set, see set well-defined, 280
transitivity of , 38 well-founded relations, 297
transpose, 271 well-founded sets, 158, 359

well-ordering, 219, 223, 284, 289 Zermelo, E., 105, 230, 355
Whitehead, A. N., 74, 439 Zermelo’s well-ordering principle, 355
Wilder, R. L., 449, 455, 458 Zermelo’s well-ordering theorem, 230
WO class, 289 Zermelo-Fraenkel axioms, 108
WR-finite, 440 Zermelo-Fraenkel set theory, 228
WR-infinite, 440 Ziffern, 42
Zorn’s lemma, see Kuratowski-Zorn
Young, J. W., 53 theorem
