katzm_reimann2018_An Introduction to Ramsey Theory - Fast Functions, Infinity and Metamathematics
Volume 87
An Introduction
to Ramsey Theory
Fast Functions, Infinity,
and Metamathematics
Matthew Katz
Jan Reimann
Mathematics
Advanced
Study Semesters
STUDENT MATHEMATICAL LIBRARY
Editorial Board
Satyan L. Devadoss John Stillwell (Chair)
Rosa Orellana Serge Tabachnikov
Copying and reprinting. Individual readers of this publication, and nonprofit
libraries acting for them, are permitted to make fair use of the material, such as to
copy select pages for use in teaching or research. Permission is granted to quote brief
passages from this publication in reviews, provided the customary acknowledgment of
the source is given.
Republication, systematic copying, or multiple reproduction of any material in this
publication is permitted only under license from the American Mathematical Society.
Requests for permission to reuse portions of AMS publication content are handled
by the Copyright Clearance Center. For more information, please visit
www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to
reprint-permission@ams.org.
© 2018 by the authors. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
Contents
Preface
Bibliography
Notation
Index
Foreword: MASS at
Penn State University
If we split a set into two parts, will at least one of the parts behave like
the whole? Certainly not in every aspect. But if we are interested only
in the persistence of certain small regular substructures, the answer
turns out to be “yes”.
A famous example is the persistence of arithmetic progressions.
The numbers 1, 2, . . . , N form the most simple arithmetic progres-
sion imaginable: The next number differs from the previous one by
exactly 1. But the numbers 4, 7, 10, 13, . . . also form an arithmetic
progression, where each number differs from its predecessor by 3.
So, if we split the set {1, . . . , N } into two parts, will one of them
contain an arithmetic progression, say of length 7? Van der Waerden’s
theorem, one of the central results of Ramsey theory, tells us precisely
that: For every k there exists a number N such that if we split the
set {1, . . . , N } into two parts, one of the parts contains an arithmetic
progression of length k.
Van der Waerden’s theorem exhibits the two phenomena, the
interplay of which is at the heart of Ramsey theory:
x Preface
If we take into account, on the other hand, that there are differ-
ent sizes of infinity, as reflected by Cantor’s theory of ordinals and
cardinals, Principle 2 reappears in a very interesting way. Moreover,
as with the Paris-Harrington theorem, it leads to metamathematical
issues, this time in set theory.
c ∶ S → [r].
2 1. Graph Ramsey theory
In this case, we are able to get an exact cut-off of how large the
set needs to be; however, we will see that getting exact answers to
Ramsey-type questions will not always be easy, or even possible.
While the pigeonhole principle is a rather obvious statement in
the finite realm, its infinite versions are not trivial and require the
development of a theory of infinite sizes (cardinalities). We will do
this in Chapter 2.
Exercise 1.4. Prove that any subset of size n + 1 from [2n] must
contain two elements whose sum is 2n + 1.
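Before attempting a proof, the claim of Exercise 1.4 can be checked by brute force for small n. The following is only a sanity check under that assumption, not a proof; the function name is our own.

```python
from itertools import combinations

def every_large_subset_has_pair(n):
    """Check that each (n+1)-subset of {1,...,2n} contains two elements summing to 2n+1."""
    target = 2 * n + 1
    for subset in combinations(range(1, 2 * n + 1), n + 1):
        chosen = set(subset)
        # target is odd, so target - x is never equal to x itself
        if not any(target - x in chosen for x in chosen):
            return False
    return True

# Exhaustive check for the first few values of n
results = {n: every_large_subset_has_pair(n) for n in range(1, 8)}
print(results)
```

Note that the subset {1, . . . , n} of size n contains no such pair, so the size n + 1 cannot be lowered.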
The size of the vertex set is called the order of the graph G and
is denoted by ∣G∣. A graph may be called finite or infinite depending
on the size of its vertex set. In this chapter we will deal exclusively
with finite graphs, those with finite vertex sets. In the next chapter,
we will encounter infinite graphs. Figure 1.1 shows an example of a
finite graph with V = {1, 2, 3, 4, 5}.
The actual elements of the vertex set are often less important than
its cardinality. Whether the vertex set is {1, 2, 3, 4, 5} or {a, b, c, d, e}
carries no importance for us, as long as the corresponding graph is
essentially the same. Mathematically, “essentially the same” means
that the two objects are isomorphic. Two graphs G = (V, E) and
G′ = (V ′ , E ′ ) are isomorphic, written G ≅ G′ , if there is a bijection
2 The adjective combinatorial is used to distinguish this type of graph from the
graph of a function. It is usually clear from the context which type of graph is meant,
and so we will just speak of “graphs”.
3 Note that the definition of the edge set is not standard across all texts. Other
authors may call the graphs we use simple graphs to emphasize that our edge set does
not allow multiple edges between vertices or edges which begin and end at the same
vertex, while theirs do.
We say that two graphs with the same vertex set, G1 = (V, E1 )
and G2 = (V, E2 ), are complements if the edge sets E1 and E2 are
complements (as sets) in [V ]2 . This means that if G1 and G2 are
4 Another possible definition would be that the edge set is a set of ordered pairs.
The result would be a graph where each edge has a “direction” associated with it, like
a one-way street. Graphs whose edges are ordered pairs are called directed graphs.
1.2. The basics of graph theory 7
Definition 1.6.
(i) Given two graphs G = (V, E) and G′ = (V ′ , E ′ ), if V ′ ⊆ V
and E ′ ⊆ E∣V ′ , then G′ is a subgraph of G.
(ii) Given a graph G = (V, E), if V ′ ⊆ V , then G∣V ′ ∶= (V ′ , E∣V ′ )
is the subgraph induced by V ′ .
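Definition 1.6 can be made concrete with a small sketch. We assume, for illustration only, that edges are stored as two-element frozensets; the function names and the example graph are hypothetical.

```python
def induced_edges(edges, subset):
    """Return E|V': the edges of E whose endpoints both lie in the vertex subset V'."""
    subset = set(subset)
    return {e for e in edges if e <= subset}

def is_subgraph(sub_vertices, sub_edges, vertices, edges):
    """Definition 1.6(i): V' is a subset of V and E' is a subset of E|V'."""
    sub_vertices = set(sub_vertices)
    return (sub_vertices <= set(vertices)
            and set(sub_edges) <= induced_edges(edges, sub_vertices))

def is_induced_subgraph(sub_vertices, sub_edges, vertices, edges):
    """Definition 1.6(ii): V' is a subset of V and E' equals E|V'."""
    sub_vertices = set(sub_vertices)
    return (sub_vertices <= set(vertices)
            and set(sub_edges) == induced_edges(edges, sub_vertices))

# Hypothetical example: a triangle 1-2-3 with a pendant vertex 4
V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (2, 3), (3, 1), (3, 4)]}
partial = {frozenset((1, 2))}  # a subgraph on {1, 2, 3} that is not induced
print(is_subgraph({1, 2, 3}, partial, V, E))
print(is_induced_subgraph({1, 2, 3}, partial, V, E))
```

Every induced subgraph is a subgraph, but a subgraph may omit edges of G between its vertices.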
Figure 1.4. A graph G = (V, E) is shown on the left. The
middle graph is a subgraph of G, but not an induced subgraph,
while the graph on the right is an induced subgraph of G.
Note that the definition of path requires all vertices xi along the
path to be distinct—along a path, we can visit each vertex only once.
The size of the edge set in a path is called the length of the path. We
allow paths of length 0, which are just single vertices. Rather than as
a graph, we can also think of a path as a finite sequence of vertices
which begin at x0 and end at xn . If n ≥ 2 and x0 and xn are adjacent,
we can extend the path to a cycle or closed path, beginning and
ending at x0 .
If there exists a path that begins at vertex u and ends at vertex
v, then we say u and v are connected. Connectedness is a good
example of an equivalence relation:
● it is reflexive—every vertex u is connected to itself (by a
path of length 0);
● it is symmetric—if u is connected to v, then v is connected
to u (we just reverse the path);
● it is transitive—if u is connected to v and v is connected to
w, then u is connected to w (intuitively by concatenating
the two paths, but a formal proof would have to be more
careful, since the paths could share edges and vertices, so
we would not be able to concatenate them directly).
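Since connectedness is an equivalence relation, it partitions the vertex set into equivalence classes, the connected components. A minimal sketch that computes them by breadth-first search, assuming edges are given as pairs of vertices (the function name and the example are our own):

```python
from collections import deque

def connected_components(vertices, edges):
    """Return the equivalence classes of the relation 'u and v are connected'."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Collect every vertex reachable from start by some path
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u] - component:
                component.add(w)
                queue.append(w)
        seen |= component
        components.append(component)
    return components

print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

The search never revisits a vertex, which sidesteps the issue mentioned under transitivity: we never need to concatenate two paths that might overlap.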
Recall the general definition of an equivalence relation. A binary
relation R on a set X is an equivalence relation if for all x, y, z ∈ X,
(E1) x R x,
Exercise 1.8. Prove that a graph of order n which has more than
$\frac{(n-1)(n-2)}{2}$ edges must be connected.
Figure 1.7. Two bipartite graphs
Exercise 1.10. Prove that the total number of edges in $K_{n_1,\dots,n_k}$ is
\[ \sum_{1 \le i < j \le k} n_i n_j. \]
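The formula of Exercise 1.10 can be compared against a direct count. The sketch below assumes the usual meaning of a complete multipartite graph (two vertices are adjacent exactly when they lie in different parts); the function names are hypothetical, and the check is no substitute for a proof.

```python
from itertools import combinations

def multipartite_edge_count(sizes):
    """Count edges of the complete multipartite graph by listing all vertex pairs."""
    # Label each vertex by (index of its part, position inside the part)
    vertices = [(i, a) for i, n in enumerate(sizes) for a in range(n)]
    return sum(1 for u, v in combinations(vertices, 2) if u[0] != v[0])

def multipartite_edge_formula(sizes):
    """Evaluate the sum of n_i * n_j over all pairs i < j."""
    return sum(sizes[i] * sizes[j] for i, j in combinations(range(len(sizes)), 2))

print(multipartite_edge_count([2, 3, 4]), multipartite_edge_formula([2, 3, 4]))
```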
Proof. Assume there are two different paths that connect vertices u and v. We
may also assume that the two paths do not share any vertices except
for u and v, since otherwise we could replace u or v by the first vertex
that the two paths share (and obtain two shorter paths to which we
could apply the argument). We can create a cycle by concatenating
the paths in the following way: If the first path goes through the
vertices u, x1 , . . . , xn , v and the second path goes through the vertices
u, y1 , . . . , ym , v, take u, x1 , . . . , xn , v, ym , . . . , y1 , u. A graph with two
different paths between some pair of vertices therefore contains a cycle
and cannot be a tree. Hence a tree will always have a unique path
between any two vertices.
On the other hand, assume that we have a graph G which is not
a tree. This means that either G is disconnected or there is a cycle in
G. If G is disconnected, then there are two vertices u and v that are
in different connected components and so do not have a path between
them. If G has a cycle, the cycle contains at least two vertices, u′ and
v′. This cycle can then be decomposed into two paths, one from u′
to v′ and one from v′ to u′, which means that there is more than one
path between the two vertices. Therefore, any graph in which there is
a unique path between any two vertices would be a connected graph
with no cycles, a tree.
The fact that two vertices are connected by a unique path lets us
organize a tree in a hierarchical manner. We designate one vertex in
a tree to be the root of the tree. Then, any other vertex in the tree
with degree 1 is called a leaf. Once a root has been chosen, we can
reorient any tree with the root at the bottom and all the leaves at
the top, like a real tree.
After choosing a root vertex, we can partially order the vertices
of a tree, based on their distance from the root. Given a vertex v,
consider the unique path from the root r to v. If this path goes
through a vertex u, then we say that v is a successor of u, or that u
is a predecessor of v, and we write u < v.
This order is partial in the sense that if u ≠ v are not on the same
path from the root, they are not comparable, that is, neither u < v
nor v < u.
The root is the unique vertex which is a predecessor of all other
vertices. A path from the root r to v can be represented as the
sequence of vertices
[Figure: an arbitrary graph of order 6 (left) and the induced coloring of K6 (right).]
So let us assume there are at least six people attending and con-
sider any 2-coloring of K6 . Let us call the vertices v1 , v2 , . . . , v6 and
consider, without loss of generality, the first vertex v1 . Vertex v1 is
connected to five other vertices. If we let R be the set of vertices con-
nected to v1 by a red edge and let B be the set of vertices connected
to v1 by a blue edge, then by the pigeonhole principle, either ∣R∣ ≥ 3
or ∣B∣ ≥ 3; we will assume ∣R∣ ≥ 3. If any two elements in R, say v2 and
v3 , are connected by a red edge, then v1 , v2 , and v3 are the vertices
of a red triangle. On the other hand, if all the elements in R are
connected by blue edges, then we have a blue triangle since there are
at least three vertices in R. In either case, we have a monochromatic
triangle (Figure 1.11).
In arrow notation, we just showed that $6 \to (3)^2_2$. (Keep in
mind that the subscript is denoting that we are using 2 colors, and
the superscript is denoting that we are coloring 2-element subsets.)
Surely this result will hold if our original complete graph was on
more than six vertices; simply pick six of the vertices and consider
the induced subgraph on those vertices, which is necessarily K6 , and
then use the result. Since we previously showed that five vertices is
not enough, we have proven the following.
Proposition 1.15. $N \to (3)^2_2$ if and only if N ≥ 6.
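For graphs this small, Proposition 1.15 can also be confirmed by exhaustive search: K6 has only 2^15 = 32768 edge colorings. The following sketch encodes the two colors as 0 and 1; the function names are our own.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (a, b) with a < b to a color; look for a one-colored triangle."""
    return any(coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_mono_triangle(n):
    """Try all 2-colorings of the edges of K_n."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False  # found a coloring without a monochromatic triangle
    return True

print(every_coloring_has_mono_triangle(5), every_coloring_has_mono_triangle(6))
```

The same approach becomes hopeless very quickly as the number of vertices grows, since the number of colorings doubles with every additional edge.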
If the party has more guests, can we find even larger cliques (of
mutual friends or mutual strangers)? This is the subject of Ramsey’s
theorem for graphs.
Corollary 1.17 (Ramsey’s theorem for graphs, dual form). For any
k ≥ 2, there exists some integer N such that any graph of at least N
vertices contains a complete subgraph on k vertices or an independent
subgraph on k vertices.
v1 , v2 , v3 , . . .
v2
the pairs m, n in a matrix and then use the truth of the statement for
values in the ith diagonal of the matrix to prove the statement in the
(i + 1)st diagonal. In each diagonal, the sum m + n is constant, and
so we can view the simultaneous induction on m and n as a standard
induction on the value m + n.
For the base case of the induction (m = n = 2), (1.1) follows easily
from the fact that R(2, n) = n and R(m, 2) = m. For the inductive
step, let N = R(m − 1, n) + R(m, n − 1). Consider a 2-colored KN and
let v be an arbitrary vertex. Define Vred and Vblue to be the vertices
connected to v via a red edge or a blue edge, respectively. Then
Proof. We have
\[ R(m, 2) = m = \binom{m}{m-1} = \binom{m+2-2}{m-1}, \]
\[ R(2, n) = n = \binom{n}{1} = \binom{2+n-2}{2-1}, \]
and then, again by simultaneous induction,
\begin{align*}
R(m, n) &\le R(m-1, n) + R(m, n-1) \\
        &\le \binom{m+n-3}{m-2} + \binom{m+n-3}{m-1} = \binom{m+n-2}{m-1}.
\end{align*}
For m = n = k, this theorem yields the upper bound
\[ R(k) \le \binom{2k-2}{k-1}. \tag{1.3} \]
Using Stirling’s approximation formula for n!, one can show that when
k is sufficiently large, $\binom{2k-2}{k-1}$ is approximately
$2^{2(k-1)}/\sqrt{\pi(k-1)}$, so
the bound in (1.3) is a little better than our original bound $R(k) \le 2^{2k}$
from the proof of Theorem 1.16.
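The inductive bound and its closed form can be compared numerically. The sketch below iterates the recursion R(m, n) ≤ R(m − 1, n) + R(m, n − 1) with the base cases R(2, n) = n and R(m, 2) = m, and compares the result with the binomial coefficient and with the Stirling estimate; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb, pi, sqrt

@lru_cache(maxsize=None)
def recursive_bound(m, n):
    """Upper bound on R(m, n) obtained by unrolling the recursion down to the base cases."""
    if m == 2:
        return n
    if n == 2:
        return m
    return recursive_bound(m - 1, n) + recursive_bound(m, n - 1)

def stirling_estimate(k):
    """Approximation 2^(2(k-1)) / sqrt(pi (k-1)) of the binomial bound in (1.3)."""
    return 4.0 ** (k - 1) / sqrt(pi * (k - 1))

for k in (3, 5, 10, 50):
    exact = comb(2 * k - 2, k - 1)
    print(k, recursive_bound(k, k), exact, exact / stirling_estimate(k))
```

By Pascal's identity the recursion reproduces the binomial coefficient exactly, and the last column shows the ratio to the Stirling estimate approaching 1.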
The current (as of 2018) best known general upper bound for
R(k) was proved by Conlon [10] in 2009: There exists a constant C
such that
\[ R(k) \le k^{-C \frac{\log(k-1)}{\log\log(k-1)}} \binom{2k-2}{k-1}. \]
Despite their rather regular appearance, the Paley graphs are im-
portant objects in the study of random graphs, as they share many
statistical properties with graphs for which the edge relation is deter-
mined by a random coin toss.
At this point, one may ask: There are only finitely many 2-
colorings of a complete graph of any finite order. Could we not cycle
through all colorings of K48 one by one (preferably on a fast com-
puter) and check whether each has either a red or a blue 5-clique? If
not, then R(5) = 49. If yes, then we could test all 2-colorings of a
K47 , and so on. Eventually we will have determined the fifth Ramsey
number.
The problem with this strategy is that there are simply too many
graphs to check! How many colorings are there? A $K_{48}$ has
$\binom{48}{2} = 1128$ edges. Each edge can be colored in two ways, giving us
\[ 2^{1128} \approx 3.6 \times 10^{339} \]
colorings to check. At the time this book was written, the world’s
fastest supercomputer, the Cray Titan, could perform about $20 \times 10^{15}$
floating-point operations per second (FLOPS). Under the unrealistic
        n=3     n=4     n=5      n=6       n=7
m=3      6       9      14       18        23
m=4             18      25     36–41     49–68
m=5                   43–48    58–87    80–143
m=6                           102–165  115–298
m=7                                    205–540
Table 1. Exact Ramsey numbers R(m, n) and best known
bounds for m, n ≤ 6, from [52]
Theorem 1.29 (Lower bound for R(k)). For k ≥ 3, $R(k) > 2^{k/2}$.
and color each of the $\binom{N}{2}$ edges based on a fair coin flip, say red for
heads and blue for tails. Since the coin flips are independent, there
will be a total of $2^{\binom{N}{2}}$ different 2-colorings of $K_N$, each occurring with
equal probability.
Pick k vertices in your graph. There are $2^{\binom{k}{2}}$ possible 2-colorings
of this subgraph and exactly 2 of them are monochromatic. Therefore,
the probability of randomly getting a monochromatic subgraph on
these k vertices is $2^{1-\binom{k}{2}}$. There are $\binom{N}{k}$ different k-cliques in $K_N$,
so the probability of getting a monochromatic subgraph on any k
vertices is at most $\binom{N}{k} 2^{1-\binom{k}{2}}$.
Now suppose $N = 2^{k/2}$. We want to show that there is a positive
probability that a random coloring of $K_N$ will have no monochromatic
k-clique (and hence deduce that $R(k) > 2^{k/2}$, proving the theorem).
We bound the probability that a random coloring will give us a
monochromatic k-clique from above:
\begin{align*}
\binom{N}{k} 2^{1-\binom{k}{2}} &= \frac{N!}{k!\,(N-k)!}\, 2^{1-\binom{k}{2}} \\
  &\le \frac{N^k}{k!}\, 2^{1-\binom{k}{2}} \qquad \Bigl(\text{since } \frac{N!}{(N-k)!} \le N^k\Bigr) \\
  &= \frac{2^{k^2/2}}{k!}\, 2^{1-(k^2-k)/2} \\
  &= \frac{2^{1+k/2}}{k!}.
\end{align*}
If k ≥ 3, then $2^{1+k/2}/k! < 1$. (This is easily verified by induction.) Therefore,
the probability of obtaining a monochromatic clique of size k is strictly
less than 1, which in turn means that there is a positive probability that a
random coloring of $K_N$ will have no monochromatic k-clique.
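The estimate in this proof can be evaluated numerically. The sketch below computes the bound $\binom{N}{k} 2^{1-\binom{k}{2}}$ exactly with rational arithmetic and compares it with the closing expression $2^{1+k/2}/k!$. Rounding N down to an integer is our own assumption, made because $2^{k/2}$ is not an integer for odd k; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def union_bound(k, N):
    """Exact value of C(N, k) * 2^(1 - C(k, 2))."""
    return Fraction(2 * comb(N, k), 2 ** comb(k, 2))

def closing_bound(k):
    """The final expression 2^(1 + k/2) / k! from the chain of inequalities."""
    return 2 ** (1 + k / 2) / factorial(k)

for k in range(3, 11):
    N = int(2 ** (k / 2))  # assumption: round 2^(k/2) down to an integer
    print(k, N, float(union_bound(k, N)), closing_bound(k))
```

Both columns stay below 1 and shrink rapidly, which is why the argument leaves so much room: the true Ramsey numbers are expected to be far larger than $2^{k/2}$.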
While Erdős was not the first to use this kind of argument, he
certainly popularized it and, through his results, helped it become an
important tool not only in graph theory and combinatorics but also
in many other areas of mathematics (see for example [2]).
1.5. Turán’s theorem 31
to try to use induction on p. With this in mind, let’s see how we can
prove the p = 3 case from what we already know, and then show how
to proceed with the induction. We will also focus on the case of two
colors (r = 2) for now and then discuss what needs to be changed to
adapt our argument for more than two colors.
$c : [N]^3 \to \{\text{red}, \text{blue}\}$
are red, the x such that both triples are blue, the x where the first
triple is red and the second is blue, and the x where the first triple
is blue and the second is red. Each set represents how x is colored
with respect to the numbers already selected. Think of these sets as
“coloring configurations” (see right diagram of Figure 1.15).
One of these four sets must be the largest (or, at least, not smaller
than any of the other three); call this largest set S4 . Note that if i
and j are distinct elements from {1, 2, 3}, then c(ai , aj , x) is constant
for all x ∈ S4 and therefore depends not on our choice of x but only
on ai and aj .
We can then continue inductively to find a sequence of elements
$a_1, a_2, \dots, a_t$: At the beginning of stage t, we have already defined
a set $S_{t-1}$ and elements $a_1, a_2, \dots, a_{t-1}$. We first pick an arbitrary element $a_t$
in $S_{t-1}$ and put $S'_{t-1} = S_{t-1} \setminus \{a_t\}$. Next, we determine the color
configurations of all $x \in S'_{t-1}$. This involves checking the color of all
triples $\{a_i, a_t, x\}$ where i < t. Each color configuration can be thought
of as a sequence of length t−1 recording the colors of the $\{a_i, a_t, x\}$, for
1.6. The finite Ramsey theorem 37
example (red, blue, red, red, . . . ). We see that there are $2^{t-1}$ possible
color configurations for x. Now partition $S'_{t-1}$ into $2^{t-1}$ sets, where x
and y are in the same set if they have the same color configuration.
From these $2^{t-1}$ sets, pick one of maximal size and make this set $S_t$.
As before, we have that if 1 ≤ i < j ≤ t, then $c(a_i, a_j, x)$ is constant
for all $x \in S_t$. The color depends only on our choice of i and j.
We can carry out this construction till we run out of numbers,
that is, till $S_t$ consists of one number only; call this number $a_{t+1}$.
We have constructed a sequence $a_1, a_2, \dots, a_{t+1}$. This sequence
does not yet define a monochromatic subset, but by the way we
constructed it we can trim it down to one. The crucial property this
sequence inherits from our construction is the following:
(1.4) Whenever we pick 1 ≤ i < j < s ≤ t + 1,
$c(a_i, a_j, a_s)$ depends only on i and j, not on s.
same beginning, then they have the same color. (We will encounter
colorings of this type again in Section 4.6.)
Claim I: If $c : [r(k-1)+1]^p \to \{0, \dots, r-1\}$ is good, then there
exists H ⊆ [r(k − 1) + 1] of size k such that c is monochromatic on
$[H]^p$.
To see this, define a coloring $c^* : [r(k-1)+1] \to \{0, \dots, r-1\}$ by
letting
42 2. Infinite Ramsey theory
$h_1 = z_{i_1}, \dots, h_p = z_{i_p}$.
{1, 2, 3}
Trees from partial orders. Let (T, <) be a partially ordered set.
(T, <) is called a tree (as a partial order) if
(T1) there exists an r ∈ T such that for all x ∈ T , r ≤ x
(r is the root of the tree);
(T2) for any x ∈ T , the set of predecessors of x, {y ∈ T ∶ y < x}, is
finite and linearly ordered by <.
Note that not every poset is a tree. For example, in the set of
all subsets of {1, 2, 3}, the predecessors of {1, 2, 3} are not linearly
ordered. Often a poset also lacks a root element. For example, the
usual ordering of the integers Z, ⋅ ⋅ ⋅ < −2 < −1 < 0 < 1 < 2 < ⋯, satisfies
neither (T1) nor (T2).
Trees arising from partial orders can be interpreted as graph-
theoretic trees, as introduced in Section 1.2. In fact, the elements
of T are called nodes, and sets of the form {y ∈ T ∶ y ≤ x} are called
branches.
Exercise 2.3. Let (T, <) be a tree (partial order). Define a graph by
letting the node set be T and connect two nodes if one is an immediate
predecessor of the other. (Node s is an immediate predecessor of t if
s < t and if for all u ∈ T , u < t implies u ≤ s.) Show that the resulting
graph is a tree in the graph-theoretic sense.
σ = 01100010101.
Therefore, ({0, 1}∗ , <) is a tree, the full binary tree (Figure 2.2).
2.2. König’s lemma and compactness 47
Our example of the full binary tree {0, 1}∗ is finitely branching.
\[ \lim_{i \to \infty} d(x_i, x) = 0. \]
Exercise 2.10. Use the infinite Ramsey theorem to prove that every
sequence in R has a monotone subsequence. As any bounded, mono-
tone sequence converges in R, this implies the Bolzano-Weierstrass
theorem.
2.3. Some topology 53
\[ d(\vec{s}_{n_i}, \vec{t}) \to 0 \quad \text{as } i \to \infty. \]
\[ x = \sum_i s_i^{(x)} 2^{-i}. \]
for all α ∈ A,
α ≤ β and β is the least number with this property.
If we combine (O1) and (O4), there must exist a least ordinal that
is greater than every natural number. This number is called ω. (O3)
tells us that ω has a successor, ω + 1, which in turn has a successor
itself, (ω +1)+1, which we write as ω +2. We can continue this process
and obtain
ω, ω + 1, ω + 2, ω + 3, . . . , ω + n, . . . .
But the ordinals do not stop here. Applying (O4) to the set {ω +n∶ n ∈
N}, we obtain a number that is greater than any of these, denoted by
ω+ω. Here is a graphical representation of these first infinite ordinals:
ω ○ ○ ○ ⋯
ω+1 ○ ○ ○ ⋯ ●
ω+2 ○ ○ ○ ⋯ ● ○
ω+ω ○ ○ ○ ⋯ ● ○ ○ ⋯
There are simply too many to form a set. The ordinals form what is
technically referred to as a proper class, which we will denote by Ord.
Other examples of proper classes are the class of all sets and the class
of all sets that do not contain themselves (this is Russell’s paradox ).
The assumption that either of these is a set leads to a contradiction
similar to assuming that a set of all ordinals exists. Classes behave
in many ways like sets—for example, we can talk about the elements
of a class. But these elements cannot be other classes; classes are too
large to be an element of something.
● m + (n + 1) = (m + n) + 1,
● m ⋅ (n + 1) = m ⋅ n + m.
α+0 = α,
α + (β + 1) = (α + β) + 1,
α+λ = sup{α + γ∶ γ < λ} if λ is a limit ordinal.
1 + ω = sup{1 + n∶ n ∈ N} = sup{m∶ m ∈ N} = ω ≠ ω + 1.
ω ⋅ 2 = ω(1 + 1) = ω + ω,
2 ⋅ ω = sup{2n∶ n ∈ N} = sup{m∶ m ∈ N} = ω.
ω ⋅ ω = sup{ω ⋅ n∶ n ∈ N}.
ω, ω + ω, ω + ω + ω, ω + ω + ω + ω, . . . .
ω, ω ⋅ ω, ω ⋅ ω ⋅ ω, . . . .
Exponentiation of ordinals.
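\begin{align*}
\alpha^{0} &= 1, \\
\alpha^{\beta+1} &= \alpha^{\beta} \cdot \alpha, \\
\alpha^{\lambda} &= \sup\{\alpha^{\gamma} : \gamma < \lambda\} \quad \text{if } \lambda \text{ is a limit ordinal.}
\end{align*}
By this definition, $\omega^{\omega}$ is indeed the limit of $\omega, \omega^2, \omega^3, \dots$.
Using exponentiation, we can form the sequence
\[ \omega,\ \omega^{\omega},\ \omega^{\omega^{\omega}},\ \dots \]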
f (S ′ ) ∈ S ′ .
{sξ ∶ ξ < α}
Now iterate. This procedure has to stop at some ordinal, i.e. there
exists an ordinal β such that
If not, that is, if the procedure traversed all ordinals, we would have
constructed an injection F ∶ Ord → S. Using some standard axioms
about sets, this would imply that, since Ord is not a set, S cannot be
a set (it would be as large as the ordinals, which form a proper class),
which is a contradiction.
For now, let us just put on record that the use of the axiom of
choice may present some foundational issues. By using a choice func-
tion without specifying further the specific objects which are chosen,
the axiom introduces a non-constructive aspect into proofs. For this
reason, one often tries to clarify whether the axiom of choice is needed
in its full strength, whether it can be replaced by weaker (and founda-
tionally less critical) principles such as the axiom of countable choice
(ACω ) or the axiom of dependent choice (DC), or whether it can be
avoided altogether (for example by giving an explicit, constructive
proof).
The book by Jech [36] is an excellent source on many questions
surrounding the axiom of choice.
For example, the sets {1, 2, 3, 4, 5} and {6, 7, 8, 9, 10} have the
same cardinality. In the finite realm, it is impossible for a set to be a
proper subset of another set yet have the same cardinality as the other
set. This is no longer the case for infinite sets. The set of integers
has the same cardinality as the set of even integers, as witnessed by
the bijection z ↦ 2z.
A very interesting case is N versus N × N. While N is not a subset
of N × N, we can embed it via the mapping n ↦ (n, 0) as a proper
subset of N × N. But there is actually a bijection between the two
sets, the Cantor pairing function
\[ (x, y) \mapsto \langle x, y \rangle = \frac{(x+y)^2 + 3x + y}{2}. \]
Exercise 2.19. (a) Draw the points of N × N in a two-dimensional
grid. Start at (0, 0), which maps to 0, and find the point which maps
to 1. Connect the two with an arrow. Next find the point that maps
to 2, and connect it by an arrow to the point that maps to 1. Continue
in this way. What pattern emerges?
(b) We can rewrite the pairing function as
\[ \frac{(x+y)^2 + 3x + y}{2} = x + \frac{(x+y+1)(x+y)}{2}. \]
Recall that the sum of all numbers from 1 to n is given by
\[ \frac{(n+1)n}{2}. \]
How does this help to explain the pattern in part (a)?
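For experimenting with Exercise 2.19, here is a minimal sketch of the pairing function together with an inverse. The inverse is our own addition; it uses the rewriting from part (b), where d = x + y indexes the diagonal on which the point (x, y) lies.

```python
from math import isqrt

def cantor_pair(x, y):
    """The Cantor pairing function <x, y> = ((x + y)^2 + 3x + y) / 2."""
    return ((x + y) ** 2 + 3 * x + y) // 2

def cantor_unpair(z):
    """Invert the pairing: find the diagonal d = x + y, then the position x on it."""
    d = (isqrt(8 * z + 1) - 1) // 2   # largest d with d(d+1)/2 <= z
    x = z - d * (d + 1) // 2
    return x, d - x

# The first few points of N x N in the order induced by the pairing
print([cantor_unpair(z) for z in range(10)])
```

Listing the points in this order traces out the pattern asked for in part (a).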
You can find a proof in [35]. You can of course try proving it
yourself, too.
Exercise 2.21. Use the Cantor-Schröder-Bernstein theorem to show
that R and [0, 1] have the same cardinality.
that is, we put 0 on top of all other numbers. This gives a well-
ordering of order type ω +1. Or we could put all the negative numbers
on top of the positive integers:
0 < 1 < 2 < 3 < ⋅ ⋅ ⋅ < −1 < −2 < −3 < ⋯,
which gives a well-ordering of type ω + ω.
This implies, in particular, that ω, ω + 1, and ω + ω all have
the same cardinality. Recall that we identify ordinals with their
initial segment, i.e. we put β = {α ∈ Ord∶ α < β}. Hence ω + 1 =
{0, 1, 2, . . . , ω}, and we can map ω + 1 bijectively to ω as follows
ω ↦ 0, 0 ↦ 1, 1 ↦ 2, ...
Note that this definition uses the axiom of choice, since we have
to ensure that each set has at least one well-ordering.
Exercise 2.26. Show that ∣A∣ ≤ ∣B∣ if and only if there exists a one-
to-one mapping A → B.
Theorem 2.27. For every set S, there exists a set of strictly larger
cardinality.
Proof. Consider P(S) = {X∶ X ⊆ S}, the power set of S. The map-
ping S → P(S) given by s ↦ {s} is clearly injective, so ∣S∣ ≤ ∣P(S)∣.
We claim that there is no bijection f ∶ S → P(S). Suppose there
were such a bijection f , that is, in particular,
P(S) = {f (x)∶ x ∈ S}.
Every subset of S is the image of an element of S under f . To get
a contradiction, we exhibit a set X ⊆ S for which this is impossible,
namely by letting
x ∈ X ∶⇔ x ∉ f (x).
Now, if there were x0 ∈ S such that f (x0 ) = X, then, by the definition
of X,
x0 ∈ X ⇔ x0 ∉ f (x0 ) ⇔ x0 ∉ X,
a contradiction. This is a set-theoretic version of Cantor’s diagonal
argument.
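For a small finite set S, the diagonal construction can be tested against every possible map f from S to P(S). The sketch below is only an illustration of the proof idea on finite sets, with hypothetical function names; it confirms that the diagonal set X is never among the values of f.

```python
from itertools import product

def diagonal_set(S, f):
    """The set X of all x in S with x not in f(x)."""
    return frozenset(x for x in S if x not in f[x])

def no_map_is_onto(S):
    """Check for every f: S -> P(S) that the diagonal set is missed by f."""
    S = list(S)
    subsets = [frozenset(x for x, keep in zip(S, mask) if keep)
               for mask in product((False, True), repeat=len(S))]
    for images in product(subsets, repeat=len(S)):
        f = dict(zip(S, images))
        if diagonal_set(S, f) in images:
            return False
    return True

print(no_map_is_onto(range(3)))  # examines all 8^3 = 512 maps
```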
means:
If ∣X∣ ≥ κ and $c : [X]^{\mu} \to \lambda$, then there exists H ⊆ X with
∣H∣ ≥ η such that $c|_{[H]^{\mu}}$ is constant.
Here $[X]^{\mu}$ is the set of all subsets of X of cardinality μ:
has cardinality $\kappa^{\mu}$.
N = Z0 ⊇ Z1 ⊇ Z2 ⊇ ⋯
such that
● zi ∈ Zi ;
● for each i, the color of {zi , zj } is the same for all j > i; and
● each Zi is infinite.
It was possible to find these sequences because of the simple in-
finite pigeonhole principle: If we partition an infinite set into finitely
many parts, one of them must be infinite.
This principle still holds for uncountable sets: Any finite partition
of an uncountable set must have an uncountable part. In fact, we have
something stronger:
In any partition of an uncountable set into countably many
parts, one of the parts must be uncountable.
Using the language of cardinals, we can state and prove a formal
version of this principle.
Proposition 2.35. If κ is an uncountable cardinal and f ∶ κ → ω,
then there exists an α < ω such that $|f^{-1}(\{\alpha\})| \ge \aleph_1$.
Proposition 2.36. Ramsey’s theorem does not hold for the real numbers:
\[ |\mathbb{R}| = 2^{\aleph_0} \not\to (2^{\aleph_0})^2_2. \]
The essence of the proof lies in the fact that a homogeneous set
would “line-up” the well-ordering ≺ with the standard ordering <R of
R. If this line-up is too long (uncountable), we get a contradiction
due to the fact that R contains Q as a dense “backbone” (under <R ).
could continue our construction beyond stage ω and into the transfi-
nite. In fact, by choosing the xα carefully enough, we can ensure that
the construction goes on for β-many stages for any fixed countable
ordinal β. But the cardinality argument of the proof above tells us it
is impossible to do this for ℵ1 -many stages.
R0 ⊆ R1 ⊆ ⋅ ⋅ ⋅ ⊆ Rβ ⊆ ⋯
Note that the proof becomes quite elegant once we have proved
the lemma. After we ensured the existence of the set R, we can work
in some sense “backwards”: We choose a single “anchor point” x∗
from the reserved set $(2^{\aleph_0})^+ \setminus R$. You can think of x∗ as always being
the next point chosen in the sense of the standard construction, only
then to be replaced by an element from R which behaves exactly like
it in terms of color pairings with the already constructed xβ .
Note also that we do not need to construct a sequence of shrinking
sets Zα anymore. In the standard construction, the Zα represent the
reservoir from which the next potential elements of the homogeneous
set are chosen. They are no longer needed since x∗ is always available
(as explained above).
We also do not need to keep track of the color choices we made
along the way, as x∗ does this job for us, too. For example, if
c(x∗ , x0 ) = 0, it follows by construction that c(xβ , x0 ) = 0 for all
β > 0, which in the previous constructions means that we restrict the
Z to all elements which color 0 with x0 .
\[ \aleph_0 \not\to (2)^1_{\aleph_0}. \]
But even if we make the colored set larger than the number of colors,
this does not mean we can find even a finite homogeneous set.
Proposition 2.41. For any infinite cardinal κ,
\[ 2^{\kappa} \not\to (3)^2_{\kappa}. \]
The proof works in general for any cardinal κ that we can reach
in fewer than κ steps. This brings us to the notion of cofinality.
Growth of Ramsey
functions
[Diagram: a 2-coloring of 1, 2, . . . , 8; the top line marks the blue numbers and the bottom line the red ones.]
But then he saw that no matter how you color {1, . . . , 9} ($2^9 = 512$
possible colorings), there will always be a monochromatic 3-AP.
In other words, W (3, 2) = 9. Keep in mind, though, that van der
Waerden was originally trying to prove Theorem 3.1, not the finite
version, Theorem 3.3. As we outlined above, the mere existence of
W (3, 2) is not clear at all from the statement of Theorem 3.1. But
the three mathematicians were able to deduce Theorem 3.3 given
that Theorem 3.1 holds by an argument similar to the compactness
argument we have used. (They did not call it compactness though.)
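The value W(3, 2) = 9 is small enough to confirm by exhaustive search. In the sketch below a coloring of {1, . . . , n} is a sequence whose entry i gives the color of i + 1; the function names and the sample coloring are our own.

```python
from itertools import product

def has_mono_ap(coloring, k):
    """Does the coloring of 1..n contain a monochromatic k-term arithmetic progression?"""
    n = len(coloring)
    for start in range(n):
        # The last term start + (k-1)*step must stay inside the range
        for step in range(1, (n - 1 - start) // (k - 1) + 1):
            if len({coloring[start + j * step] for j in range(k)}) == 1:
                return True
    return False

def every_coloring_has_mono_ap(n, k, colors=2):
    """Try all colorings of {1, ..., n} with the given number of colors."""
    return all(has_mono_ap(c, k) for c in product(range(colors), repeat=n))

print(has_mono_ap("RRBBRRBB", 3))        # a coloring of {1,...,8} to inspect
print(every_coloring_has_mono_ap(8, 3))  # some coloring of {1,...,8} avoids 3-APs
print(every_coloring_has_mono_ap(9, 3))  # all 512 colorings of {1,...,9}
```

Such a search says nothing about why W(3, 2) exists, which is the point of the deduction that follows in the text.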
1 Our diagrams follow the design of van der Waerden [69]. In the case of two
colors, the top line will always represent the color blue and the bottom one always red.
Block progressions.
Being able to use more than two colors turned out to play a crucial
role in establishing the existence of W (k, 2) by induction. The colors
occur as “block colors”, in the following sense. Suppose we have a 2-
coloring of N. For some fixed m ≥ 1, consider now consecutive blocks
of m numbers, i.e.
1, 2, . . . , m,   m+1, m+2, . . . , 2m,   2m+1, 2m+2, . . . , 3m,   . . . .
Being able to use more than two colors to establish the existence
of W (k, 2) proved crucial for the proof, together with the following
principle, which we applied above:
(3.3) Any arithmetic progression “inside” an arithmetic progres-
sion is again an arithmetic progression.
Let us try to deduce that W (3, 2) exists, by a formal deduction
instead of a brute force argument where we simply check all the cases.
It is helpful to imagine an evil adversary who counters every guess
we make by choosing a coloring that makes it as hard as possible for
us to find monochromatic arithmetic progressions. We have to corner
him with our logic so that eventually he has no choice but to reveal
a monochromatic 3-AP.
Inspecting the color of the numbers 1, 2, 3, our opponent has to re-
veal a 2-AP, by the pigeonhole principle (recall that any two numbers
with the same color form a monochromatic 2-AP). Without loss of
generality, let us assume we are presented with the following coloring:
[Diagram: the numbers 1, 2, 3, with 1 and 3 colored blue.]
In other words, we have a blue 2-AP {1, 3}. The easiest way to a
monochromatic 3-AP from there would be a blue 5:
[Diagram: the numbers 1, . . . , 5, with 1, 3, and 5 colored blue.]
But of course our opponent will not give us this easy victory and
thus color 5 red:
[Diagram: the numbers 1, . . . , 5, with 1 and 3 colored blue and 5 colored red.]
In fact, as you can easily check, there are plenty of ways to 2-color
{1, . . . , 5} without having a monochromatic 3-AP. So let us assume
that no 5-block of the form {5m + 1, 5m + 2, . . . , 5(m + 1)} contains
a monochromatic 3-AP. But, as we will now see, we can turn our
opponent’s strategy—denying us the easy victory inside a 5-block—
against him using a nice trick.
This is where the block-coloring idea comes in: A 2-colored 5-block
represents one of $2^5 = 32$ possible patterns. Think of each
pattern as a “color” that is a special blend of red and blue, determined
by how often each color occurs and where it occurs within the five
positions. The pigeonhole principle now guarantees that
3.1. Van der Waerden’s theorem 91
[Figure 3.2. The 5-blocks $B_8$ = {36, . . . , 40} and $B_{21}$ = {101, . . . , 105} have the same coloring patterns.]
The two blocks together still do not give us a 3-AP. But the
picture changes when we consider the 3-AP of 5-blocks generated by
$B_8$, $B_{21}$, and $B_{34}$. In other words, we consider a 3-AP of blocks
corresponding to their indices.
Of course, the coloring pattern of $B_{34}$ might be completely different
from that of $B_8$ and $B_{21}$. But the crucial fact, as simple as it
may sound, is that the last element of $B_{34}$, 170, is assigned a color.
How would this help us? Let’s look at the three blocks together:
[Two diagrams of the 5-blocks $B_8$, $B_{21}$, and $B_{34}$, whose last elements are 40, 105, and 170.]
While this seems a rather large bound considering one can estab-
lish W (3, 2) = 9 by checking all cases, the way we arrived at it can be
generalized to higher orders.
Let us try to capture the essential steps in this argument.
1. Choose a block size. Our block size was 5 since it guaranteed
the existence of a monochromatic 2-AP that can be extended
to a 3-AP (not necessarily monochromatic) within the block.
2. Find an arithmetic progression of block-coloring patterns.
There are $2^5 = 32$ possible 2-colorings of 5-blocks. Hence
among 33 5-blocks, two must have the same pattern. These
blocks form a 2-AP of blocks.
3. Extend the arithmetic progression of block patterns by one
block. The additional block may not have the same coloring
pattern, but we will use it to “force” a monochromatic AP
in the next step.
4. Consider arithmetic progressions of numbers inside the block
progression. One element of the additional block will be the
“focal point” of a monochromatic AP, either by collecting
the elements in each block at a constant position, or by ex-
tending a “diagonal” AP, such as, in our example, elements
1, 3, and 5 from blocks $B_8$, $B_{21}$, and $B_{34}$, respectively.
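The four steps above can be turned into a short procedure that extracts a monochromatic 3-AP from any 2-coloring of {1, . . . , 325}. This is only a sketch under stated assumptions: blocks are taken to be $B_m$ = {5(m − 1) + 1, . . . , 5m}, colors are encoded as 0 and 1, and the names `mono_3ap_by_blocks`, `pattern`, and `number` are hypothetical.

```python
import random


def mono_3ap_by_blocks(color):
    """color maps each of 1..325 to 0 or 1; returns a monochromatic 3-AP."""

    def pattern(m):
        # Coloring pattern of the 5-block B_m = {5(m-1)+1, ..., 5m}.
        return tuple(color[5 * (m - 1) + p] for p in range(1, 6))

    def number(m, pos):
        # The number at 0-based position pos inside block B_m.
        return 5 * (m - 1) + pos + 1

    # Step 2: among the first 33 blocks, two share a pattern (pigeonhole).
    seen = {}
    for j in range(1, 34):
        pat = pattern(j)
        if pat in seen:
            i = seen[pat]
            break
        seen[pat] = j
    # Step 3: extend B_i, B_j to a 3-AP of blocks; k is at most 65.
    k = 2 * j - i

    # Step 1: two of the first three positions of the pattern agree.
    a, b = next((a, b) for a in range(3) for b in range(a + 1, 3)
                if pat[a] == pat[b])
    e = 2 * b - a  # position extending a, b to a 3-AP inside the block

    # Step 4: the number at position e of B_k is the focal point.
    if pat[e] == pat[a]:
        return number(i, a), number(i, b), number(i, e)  # easy victory
    if color[number(k, e)] == pat[a]:
        return number(i, a), number(j, b), number(k, e)  # diagonal AP
    return number(i, e), number(j, e), number(k, e)      # constant position


random.seed(0)
coloring = {n: random.randint(0, 1) for n in range(1, 326)}
print(mono_3ap_by_blocks(coloring))
```

The procedure never inspects more than the first 65 blocks, which is where the bound 65 ⋅ 5 = 325 comes from.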
Exercise 3.5. Follow the template above to argue that W (4, 2) ex-
ists. You may assume that all numbers of the form W (3, r) exist. It
helps to make a drawing like we did above.
1. We need a block size that guarantees the existence of a
monochromatic 3-AP that can be extended to a 4-AP (not
necessarily monochromatic). Express this in terms of W (3, 2).
2. Let M be the block size. There are $2^M$ ways to color such a
block. But now we need to find not only two but three blocks
anymore (or at least not right away), since the “focal number” could
now be colored with the third color, say green.
But we can apply the block argument again, this time to the big block of
size $M = 7 \cdot (2 \cdot 3^7 + 1)$. There are
$3^M = 3^{7 \cdot (2 \cdot 3^7 + 1)}$ possible 3-colorings of such a block.
Hence, among $3^M + 1$ such blocks, we must see two with the same
coloring pattern.
[Diagram: two M-blocks with the same coloring pattern.]
We do not know what the coloring of the third big block is, but
arguing as before, one element in the third (inner) 7-block of the
third (outer) M -block becomes the focal point of three arithmetic
progressions, as indicated in the following picture:
[Diagram: the focal point in the third M-block, together with the three arithmetic progressions that end in it.]
Note that our argument uses only the pigeonhole principle (i.e.
the existence of W (2, r) for $r = 3^M$). The existence of W (k, r) for
k > 2 and r ≥ 2 is not needed.
Exercise 3.7. Generalize the previous argument to show that W (3, r)
exists for any r ≥ 2. Can you derive a bound for W (3, r)?
Exercise 3.8. Sketch a proof that W (4, 3) exists by combining the
arguments for W (3, 3) and W (4, 2). A careful drawing can represent
most of the argument. What should the initial block size be in this
case? The existence of which numbers W (k, r) do we have to assume?
If you tried Exercise 3.6, you have probably seen that our upper
bound $U(3, 3) = (2 \cdot 3^{7 \cdot (2 \cdot 3^7 + 1)} + 1) \cdot (2 \cdot 3^7 + 1) \cdot 7$ is huge compared
to U (3, 2) = 325. You can probably imagine what will happen for
U (3, 4).
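To get a feeling for the jump from U(3, 2) to U(3, 3), one can evaluate both numbers exactly. The sketch below assumes that the nested block sizes follow the pattern $b_1 = 2r + 1$, $b_{i+1} = (2 r^{b_i} + 1) \cdot b_i$ with $U(3, r) = b_r$; this pattern is only extrapolated from the two values displayed above, and the function name `block_sizes_k3` is hypothetical.

```python
def block_sizes_k3(r):
    """Nested block sizes b_1, ..., b_r for 3-APs and r colors (assumed pattern)."""
    sizes = [2 * r + 1]                   # innermost block: 5 for r = 2, 7 for r = 3
    for _ in range(r - 1):
        b = sizes[-1]
        sizes.append((2 * r**b + 1) * b)  # 2 * r^b + 1 blocks of size b
    return sizes


u32 = block_sizes_k3(2)[-1]
u33 = block_sizes_k3(3)[-1]
print(u32)                # the bound U(3, 2)
print(u33.bit_length())   # size of U(3, 3) in binary digits
```

The bit length is printed instead of the number itself because U(3, 3) already has tens of thousands of binary digits; under the same assumed pattern, U(3, 4) is far beyond what can be written out.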
Let us try to compute U (4, 2), using the definitions given by
formulas (3.4), (3.5), and (3.6).
The innermost block size is given by
$$b_1 = U(3, 2) + \left\lfloor \frac{U(3, 2) - 1}{2} \right\rfloor = 325 + \frac{324}{2} = 487.$$
Then
$$b_2 = \left( U(3, 2^{487}) + \left\lfloor \frac{U(3, 2^{487}) - 1}{2} \right\rfloor \right) \cdot 487.$$
example). For the van der Waerden bounds, however, tetration is just
the beginning.
The important fact for us is that tetration can be defined recur-
sively from exponentiation. What we mean by this is that we can set
a ground value and then inductively define higher values by iterating
exponentiation.
For example, multiplication is defined by iterating addition:
x ⋅ 0 = 0,
x ⋅ (y + 1) = x + x ⋅ y.
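$$x^0 = 1, \qquad x^{y+1} = x \cdot (x^y).$$
$$5^3 = 5 \cdot 5^2 = 5 \cdot (5 \cdot 5^1) = 5 \cdot (5 \cdot (5 \cdot 5^0)) = 5 \cdot (5 \cdot (5 \cdot 1)) = 125.$$
$$x \uparrow\uparrow 0 := 1, \qquad x \uparrow\uparrow (y + 1) := x^{(x \uparrow\uparrow y)}.$$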
If you unravel the definition, you will see that tetration indeed
results in a tower of exponentiation
$$x \uparrow\uparrow y = \underbrace{x^{x^{\cdot^{\cdot^{x}}}}}_{y \text{ times}}.$$
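The three recursive definitions translate almost verbatim into code. In this minimal sketch the function names are hypothetical; `tetration` uses Python's built-in `**` for the exponentiation step, because iterating the recursive `power` and `mult` that far would exceed the interpreter's recursion limit.

```python
def mult(x, y):
    """x * 0 = 0,  x * (y + 1) = x + x * y."""
    return 0 if y == 0 else x + mult(x, y - 1)


def power(x, y):
    """x^0 = 1,  x^(y + 1) = x * (x^y)."""
    return 1 if y == 0 else mult(x, power(x, y - 1))


def tetration(x, y):
    """x ↑↑ 0 = 1,  x ↑↑ (y + 1) = x^(x ↑↑ y)."""
    return 1 if y == 0 else x ** tetration(x, y - 1)


print(power(5, 3))       # 125, as in the worked example
print(tetration(2, 4))   # 65536, a tower of four 2s
```

Already `tetration(2, 5)` is a number with 65536 binary digits plus one, and `tetration(2, 6)` cannot be stored in any physical memory.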
$$\varphi_1(x, y) = x \cdot y, \qquad \varphi_2(x, y) = x^y, \qquad \varphi_3(x, y) = x \uparrow\uparrow (y + 1).$$
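Example 3.13.
$$\begin{aligned}
\varphi_4(2, 3) &= \varphi_3(2, \varphi_4(2, 2)) \\
&= \varphi_3(2, \varphi_3(2, \varphi_4(2, 1))) \\
&= \varphi_3(2, \varphi_3(2, \varphi_3(2, \varphi_4(2, 0)))) \\
&= \varphi_3(2, \varphi_3(2, \varphi_3(2, 2))) \\
&= \varphi_3(2, \varphi_3(2, 2 \uparrow\uparrow 2)) \\
&= \varphi_3(2, \varphi_3(2, 2^2)) \\
&= \varphi_3(2, 2 \uparrow\uparrow 4) \\
&= \varphi_3(2, 2^{2^{2^2}}) \\
&= 2 \uparrow\uparrow 65536 \\
&= \underbrace{2^{2^{\cdot^{\cdot^{2}}}}}_{65536 \text{ times}}
\end{aligned}$$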
The van der Waerden bound U (k, r) in turn, for fixed k and as
a function of r, dominates the kth level of the Ackermann function.
We have already seen this for k = 3. So let us assume we have shown
that for k ≥ 3 and r ≥ 2, $U_k(r) \ge \varphi_k(r, r - 1)$. Now
$$U(k + 1, r) \ge b_1^* \cdots b_r^* = U(k, r)\, U(k, r^{b_1^*}) \cdots U(k, r^{b_{r-1}^*}),$$
Each function $\varphi_k$, as well as each kth level of the van der Waerden
bound $U_k$, can be obtained by a finite number of applications of these
two operations to a set of basic functions: x + y in the case of ϕ, r + 1
(the value of W (2, r)), multiplication, exponentiation, and integer
division for U .
Functions that can be obtained this way belong to an important
family of functions—the primitive recursive functions. To define them
for all finite arities, we introduce the following basic functions.
The zero function: Zero(x) = 0.
The successor function: S(x) = x + 1.
The projection functions: $P_i^n(x_1, \ldots, x_n) = x_i$.
Definition 3.16. The family of primitive recursive functions is
the smallest family of functions $f \colon \mathbb{N}^n \to \mathbb{N}$ (where n ∈ N can be
arbitrary) that contains the three basic functions and is closed under
composition and recursion.
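Definition 3.16 can be mirrored by two higher-order functions that build new functions from old ones. The sketch below assumes the usual recursion scheme h(x⃗, 0) = f(x⃗), h(x⃗, y + 1) = g(x⃗, y, h(x⃗, y)); the names `compose` and `prim_rec` are hypothetical, and `zero` is allowed to ignore any number of arguments for convenience.

```python
def zero(*args):
    """Zero function (accepts any arity for convenience)."""
    return 0


def succ(x):
    """Successor function S(x) = x + 1."""
    return x + 1


def proj(n, i):
    """Projection P_i^n(x_1, ..., x_n) = x_i (i is 1-based)."""
    return lambda *xs: xs[i - 1]


def compose(f, *gs):
    """Composition: x -> f(g_1(x), ..., g_m(x))."""
    return lambda *xs: f(*(g(*xs) for g in gs))


def prim_rec(base, step):
    """Recursion: h(xs, 0) = base(xs), h(xs, y + 1) = step(xs, y, h(xs, y))."""
    def h(*args):
        *xs, y = args
        acc = base(*xs)
        for t in range(y):  # unfold the recursion bottom-up
            acc = step(*xs, t, acc)
        return acc
    return h


# x + 0 = x,  x + (y + 1) = S(x + y)
add = prim_rec(proj(1, 1), compose(succ, proj(3, 3)))
# x * 0 = 0,  x * (y + 1) = x + x * y
mult = prim_rec(zero, compose(add, proj(3, 1), proj(3, 3)))
# x^0 = 1,  x^(y + 1) = x * x^y
power = prim_rec(compose(succ, zero), compose(mult, proj(3, 1), proj(3, 3)))

print(add(3, 4), mult(3, 4), power(2, 5))
```

Every function built this way is defined using only the basic functions and the two closure operations, so it is primitive recursive by construction.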
Exercise 3.19. Show that the following functions are primitive recursive:
$x \mathbin{\dot{-}} y$, max(x, y), min(x, y), ∣x − y∣.
In the late 19th century, work by Dedekind [11] and Peano [48,
49] demonstrated that definitions by induction play an important role
in the development of number theory from first principles (axioms),
and that any axiomatic framework for number theory must include a
principle ensuring that functions described by induction (recursion)
are indeed well-defined. This led to the introduction of the family
Hence we have
$$\Phi_0(x) = x + 2, \qquad \Phi_1(x) = 2x, \qquad \Phi_2(x) = 2^x, \qquad \Phi_3(x) = 2 \uparrow\uparrow (x + 1),$$
$$\Phi_{n+1}(0) = 2, \qquad \Phi_{n+1}(x + 1) = \Phi_n(\Phi_{n+1}(x)).$$
Theorem 3.24.
$$\Phi_\omega(x) \le \Phi_k(x).$$
which is a contradiction.
Corollary 3.26. The Ackermann function ϕ(x, y, n) and the van der
Waerden upper bound U (k, r) are not primitive recursive.
$$\Phi_{\omega+1}(0) = 2, \qquad \Phi_{\omega+1}(x + 1) = \Phi_\omega(\Phi_{\omega+1}(x)).$$
As you can probably tell, we have chosen the index ω with the idea
in mind that the ordinals will help us index our extended hierarchy
beyond the finite levels. So, in general, assume that we have defined
the function $\Phi_\alpha(x)$ for some ordinal α, and let
$$\Phi_{\alpha+1}(0) = 2, \qquad \Phi_{\alpha+1}(x + 1) = \Phi_\alpha(\Phi_{\alpha+1}(x)).$$
Note that even though the functions are indexed by ordinals, they are
still functions from N to N.
What this definition does not tell us is how to define $\Phi_\alpha$ in the
case where α is a limit ordinal. For α = ω, we took the diagonal of
the functions “leading up to it”. We can do something similar for an
arbitrary limit ordinal α by putting
$$\Phi_\alpha(x) = \Phi_{\alpha_x}(x),$$
where $\alpha_x$ is a sequence of ordinals with limit α, i.e. $\sup\{\alpha_x : x \in \mathbb{N}\} = \alpha$.
Any such sequence is called a fundamental sequence for α.
There are two issues: First, this only works for limit ordinals with
cofinality ω (recall the definition of cofinality, Definition 2.43). As we
are only interested in Φα for countable α, this does not really present
a problem. Second, there are multiple ways to choose a fundamental
sequence. For example, if α = ω + ω, both
ω + 1, ω + 3, ω + 5, . . . and ω + 2, ω + 4, ω + 6, . . .
converge to α. Is there a canonical way of selecting a fundamental
sequence? If we consider only $\alpha < \varepsilon_0$ (recall that $\varepsilon_0$ was the least
ordinal ε for which $\omega^\varepsilon = \varepsilon$), there is indeed such a way, given through
the Cantor normal form.
Theorem 3.27 (Cantor normal form). Every ordinal $0 < \alpha < \varepsilon_0$ can
be represented uniquely in the form
$$\alpha = \omega^{\beta_1} + \cdots + \omega^{\beta_n},$$
where n ≥ 1 and $\alpha > \beta_1 \ge \cdots \ge \beta_n$.
Exercise 3.31. Determine all combinatorial lines for $C_4^2$. How many
are there?
3.4. The Hales-Jewett theorem
Theorem 3.33 (Hales and Jewett). For any integers t, r > 1, there
exists an integer HJ (t, r) such that for any n ≥ HJ (t, r), every r-
coloring of $C_t^n$ has a monochromatic combinatorial line.
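For very small parameters the theorem can be checked exhaustively. The sketch below assumes the standard description of a combinatorial line in $C_t^n$ over the alphabet {1, . . . , t}: a word of length n over the alphabet together with the symbol ∗, containing at least one ∗, where the points of the line arise by replacing every ∗ with the same letter. The function names are hypothetical.

```python
from itertools import product


def combinatorial_lines(t, n):
    """All combinatorial lines in C_t^n, each as a list of t points."""
    alphabet = list(range(1, t + 1))
    lines = []
    for template in product(alphabet + ["*"], repeat=n):
        if "*" not in template:
            continue  # a line needs at least one moving coordinate
        lines.append([tuple(v if s == "*" else s for s in template)
                      for v in alphabet])
    return lines


def every_coloring_has_mono_line(t, r, n):
    """Check all r-colorings of C_t^n for a monochromatic combinatorial line."""
    points = list(product(range(1, t + 1), repeat=n))
    lines = combinatorial_lines(t, n)
    for colors in product(range(r), repeat=len(points)):
        c = dict(zip(points, colors))
        if not any(len({c[p] for p in line}) == 1 for line in lines):
            return False  # found a coloring without a monochromatic line
    return True


print(every_coloring_has_mono_line(2, 2, 1))  # False
print(every_coloring_has_mono_line(2, 2, 2))  # True
```

The number of colorings is $r^{t^n}$, so this exhaustive check is only feasible for the tiniest cubes.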
2
The naming of these combinatorial structures after Shelah is due to Graham,
Rothschild, and Spencer [24], and we follow them here.
A single Shelah line will not be enough for our purposes; we need
the simultaneous existence of multiple Shelah lines.
A combinatorial s-space, $\Sigma_s$, is the concatenation of s combinatorial
lines of variable dimensions over the same alphabet. We can
represent the space as the set of s-tuples whose entries are points on
combinatorial lines,
$$(L_1(\tau_1), L_2(\tau_2), \ldots, L_s(\tau_s)).$$
Each $L_i$ is a combinatorial line in $C_t^{n_i}$ for some $n_i$, and each $\tau_i$ can
vary independently. The combinatorial s-space $\Sigma_s$ can be realized as
a subset of $C_t^n$ where $n = \sum n_i$.
3
This bound is clearly not optimal, but it facilitates counting later while having
little impact on the overall growth estimate.
(2 2 ∗ 3) × (2 ∗) × (2 ∗ ∗ ∗ 3 3).
While points in this space live in $C_3^{12}$, any such point is completely
determined by the respective values of ∗ in the three Shelah lines.
Therefore, there is a canonical bijection between the space and $C_3^3$.
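The canonical bijection is easy to make explicit for the example space (2 2 ∗ 3) × (2 ∗) × (2 ∗ ∗ ∗ 3 3). In this minimal sketch, the name `embed` and the encoding of ∗ as the string `"*"` are assumptions made for illustration.

```python
from itertools import product

# The three lines of the example space, with "*" marking moving coordinates.
TEMPLATES = [(2, 2, "*", 3), (2, "*"), (2, "*", "*", "*", 3, 3)]


def embed(values, templates=TEMPLATES):
    """Map (tau_1, ..., tau_s) to the corresponding point of the s-space."""
    word = []
    for value, template in zip(values, templates):
        word.extend(value if s == "*" else s for s in template)
    return tuple(word)


print(embed((1, 2, 3)))
# (2, 2, 1, 3, 2, 2, 2, 3, 3, 3, 3, 3)

# Distinct triples give distinct points of length 12, so the map is a bijection
# between C_3^3 and the space.
images = {embed(v) for v in product((1, 2, 3), repeat=3)}
print(len(images))  # 27
```

A coloring c of the space therefore induces a coloring of $C_3^3$ by composing c with `embed`.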
Given a coloring c of a Shelah s-space, we call c t-blind if the
induced coloring c∗ = c ○ π is t-blind.
Two points are hence equivalent if they have identical coloring be-
havior with respect to all (s − 1)-combinations of Shelah point prede-
cessors.
Since each $C_t^{N_i}$ has no more than $t N_i^2$ Shelah points, each point
in $C_t^{N_s}$ has at most
$$t^{s-1} \prod_{j=1}^{s-1} N_j^2 =: M_s$$
4
This is similar to the block-coloring patterns in the proof of van der Waerden’s
theorem.
possible choices for the $x_j$ and $z_k$. As before, this means that there
are $r^{M_i}$ equivalence classes. We require $N_i \ge r^{M_i}$, and as before the
Shelah line pigeonhole principle implies that $C_t^{N_i}$ contains a t-blind
Shelah line, $L_i$, with respect to this equivalence relation.
Continuing in this way, we obtain Shelah lines $L_1, \ldots, L_s$ and
claim that the original coloring c is t-blind for the corresponding Shelah
s-space S.
It suffices to verify the following: If $(z_1, \ldots, z_{i-1}, y, z_{i+1}, \ldots, z_s)$
and $(z_1, \ldots, z_{i-1}, y^*, z_{i+1}, \ldots, z_s)$ are two points in S with y = t − 1
and $y^* = t$, then the points have the same color.
By construction,
$(x_1, \ldots, x_{i-1}, y, z_{i+1}, \ldots, z_s)$ and $(x_1, \ldots, x_{i-1}, y^*, z_{i+1}, \ldots, z_s)$
have the same color for any Shelah points $x_1 \in C_t^{N_1}, \ldots, x_{i-1} \in C_t^{N_{i-1}}$.
But since each $z_j$, j < i, is on the Shelah line $L_j$, the $z_j$ are clearly
Shelah points, from which the claim follows immediately.
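Therefore, if we set the $N_i$ as required in the construction,
$$\begin{aligned}
N_1 &:= r^{t^{s-1}}, \\
M_i &:= t^{s-1} \prod_{j=1}^{i-1} N_j^2 \quad\text{and}\quad N_i := r^{M_i} \quad (1 < i \le s), \\
N &:= \sum_{i=1}^{s} N_i,
\end{aligned} \tag{3.7}$$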
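To see how quickly the quantities in (3.7) grow, one can evaluate them for the smallest parameters. The following is a sketch only: the function name `shelah_dimensions` is hypothetical, and anything beyond s = 2 with r = t = 2 already produces exponents too large to evaluate.

```python
from math import prod


def shelah_dimensions(r, t, s):
    """Evaluate N_1, ..., N_s, M_2, ..., M_s, and N from (3.7)."""
    N = [r ** (t ** (s - 1))]            # N_1 = r^(t^(s-1))
    M = []
    for _ in range(2, s + 1):
        # M_i = t^(s-1) * product of N_j^2 over the N_j computed so far (j < i)
        m_i = t ** (s - 1) * prod(n * n for n in N)
        M.append(m_i)
        N.append(r ** m_i)               # N_i = r^(M_i)
    return N, M, sum(N)


N_list, M_list, total = shelah_dimensions(r=2, t=2, s=2)
print(N_list)   # [4, 4294967296]
print(M_list)   # [32]
print(total)    # 4294967300
```

For s = 3 with the same r and t, the value $M_3$ already exceeds $2^{2048}$, so $N_3 = 2^{M_3}$ cannot be written down.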
Proof. Let S(t) be the function defined in the proof of Lemma 3.38,
that is, let S(1) = 1. For s = S(t − 1), let S(t) = N , where N is as
in the last line of (3.7). Inspecting the definition of S, we see that
from some point on, S(t) is significantly larger than both r and t. In
particular, with s = S(t − 1) we have
$$N_1 = r^{t^{s-1}} \le s^{s^s} \le \Phi_3(s)$$
and
$$M_i = t^{s-1} \prod_{j=1}^{i-1} N_j^2 \le (N_{i-1})^{3s},$$
and thus, for sufficiently large t (and hence sufficiently large s),
$$N_i \le r^{(N_{i-1})^{3s}} \le 2^{2^{2^{N_{i-1}}}}.$$
Every iteration of Ni adds three more exponents to an exponential
tower (the ↑↑ function), and hence adds 3 to an argument of Φ3 .
Therefore,
$$N_s \le \Phi_3(s + 3(s - 1)) \le \Phi_3(4s),$$
and
$$S(t) = N = N_1 + \cdots + N_s \le 2 N_s \le \Phi_3(4s + 1).$$
A direct calculation shows that $S(3) \le \Phi_4(8)$. So if we assume, inductively,
that $S(t - 1) \le \Phi_4(2t)$, then
$$\begin{aligned}
S(t) &\le \Phi_3(4s + 1) = \Phi_3(4 S(t - 1) + 1) \\
&\le \Phi_3(4 \Phi_4(2t) + 1) \le \Phi_3(\Phi_4(2t + 1)) = \Phi_4(2(t + 1)).
\end{aligned}$$
3.5. A really fast-growing Ramsey function
The only difference from the usual finite Ramsey theorem (The-
orem 1.31) is that the homogeneous set Y is required to be relatively
large. Given m, p, and r, we denote by PH (m, p, r) the least N with
the property asserted in the statement of the theorem.
Call these intervals the type 1 blocks. If i and j lie in the same type 1
block, put $c_1(i, j) = 1$. Otherwise, let $c_1(i, j) = 0$.
If A is a homogeneous subset of color 1, then A ⊂ [x, 2x) for
some x, which means that ∣A∣ ≤ x and hence A is not relatively large.
Therefore, any relatively large, homogeneous subset A must have color
0, and if ∣A∣ ≥ 3, then A has to contain at most one element each from
[2, 4) and [4, 8). Therefore, $PH(3, 2, 2) \ge 4 = \Psi_1(2)$.
Next, we define a type 2 block structure in a similar way. Split
[2, ∞) into sets $[x, x \cdot 2^x)$ (note that $\Psi_2(x) = x \cdot 2^x$):
$$[2, 8) \cup [8, 8 \cdot 2^8) \cup \cdots.$$
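The two block structures differ only in the function that sends the left endpoint of a block to the left endpoint of the next one. The sketch below assumes $\Psi_1(x) = 2x$, read off from the type 1 intervals [x, 2x), and uses the hypothetical names `block_starts` and `block_coloring`; the coloring assigns 1 to a pair inside one block and 0 otherwise.

```python
def psi1(x):
    return 2 * x          # type 1 blocks [x, 2x)


def psi2(x):
    return x * 2**x       # type 2 blocks [x, x * 2^x)


def block_starts(psi, count, start=2):
    """Left endpoints of the first `count` blocks of [2, oo)."""
    starts = [start]
    while len(starts) < count:
        starts.append(psi(starts[-1]))
    return starts


def block_coloring(i, j, psi, start=2):
    """1 if i and j (both >= start) lie in the same block, else 0."""
    lo = start
    while psi(lo) <= min(i, j):   # advance to the block containing min(i, j)
        lo = psi(lo)
    return 1 if max(i, j) < psi(lo) else 0


print(block_starts(psi1, 4))   # [2, 4, 8, 16]
print(block_starts(psi2, 3))   # [2, 8, 2048]
print(block_coloring(4, 7, psi1), block_coloring(3, 4, psi1))
```

The third type 2 block already ends at $2048 \cdot 2^{2048}$, which is why only the first few endpoints are listed.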
and therefore
$$PH(4, 2, 3) \ge \Psi_2^{(2)}(2) \ge \Psi_2(3).$$
$$PH(n + 3, 2, n + 2) \ge \Psi_{n+1}(n + 2),$$
One can continue along these lines for higher values of p (that is,
look at p-sets instead of just pairs). The analysis becomes much more
involved. In a landmark paper, Ketonen and Solovay [40] were able
to show that
Metamathematics
or its negation
We will make all this more precise below; here we just want to
indicate that once a formal definition of proof is given, the connection
between proof and truth needs to be rigorously (re-)established.
That all this is possible is one of the great triumphs of mathe-
matical logic in the 20th century.
● addition (x, y) ↦ x + y,
● multiplication (x, y) ↦ x ⋅ y.
Furthermore, the number 0 (which we count among the natural num-
bers) has a special status, because, for example:
● it is the smallest natural number;
● it is the only natural number that is not the successor of
another natural number;
● adding it to another number does not change that number.
Therefore, the language of arithmetic LA has the four symbols
S, +, ⋅, 0.
Formulas are built from symbols via rules. What these rules are
specifically, and how they are used to form formulas, is not important
at this point (although we have to consider it in more detail later on).
All that matters for now is that the rules enable us to distinguish
between valid formulas, such as
$\forall x_0\, (x_0 + 3 = 4)$
0+0=0
We can simply store all three axioms in the memory of the machine
and compare any given formula successively with each of them. How-
ever, this procedure is impossible if the axiom system is infinite. In
this case the sentences of the system must be describable in a sys-
tematic (algorithmic) way.
The axiom system we will be particularly interested in is Peano
arithmetic (PA), formalized in the language of arithmetic LA and
described by the following axioms:
(PA1) ∀x S(x) ≠ 0
(PA2) ∀x ∀y (S(x) = S(y) ⇒ x = y)
(PA3) ∀x (x + 0 = x)
(PA4) ∀x ∀y (x + S(y) = S(x + y))
(PA5) ∀x (x ⋅ 0 = 0)
(PA6) ∀x ∀y (x ⋅ S(y) = x ⋅ y + x)
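Axioms (PA3)–(PA6) can be read as recursive definitions of addition and multiplication in terms of the successor function. The sketch below spot-checks all six axioms in the standard model on an initial segment; it is a sanity check rather than a proof, and the function names are hypothetical.

```python
def S(x):
    """Successor."""
    return x + 1


def add(x, y):
    """(PA3) x + 0 = x;  (PA4) x + S(y) = S(x + y)."""
    return x if y == 0 else S(add(x, y - 1))


def mul(x, y):
    """(PA5) x * 0 = 0;  (PA6) x * S(y) = x * y + x."""
    return 0 if y == 0 else add(mul(x, y - 1), x)


def check_axioms(bound):
    """Check (PA1)-(PA6) for all x, y below bound."""
    rng = range(bound)
    return (all(S(x) != 0 for x in rng)                                  # PA1
            and all(x == y for x in rng for y in rng if S(x) == S(y))    # PA2
            and all(add(x, 0) == x for x in rng)                         # PA3
            and all(add(x, S(y)) == S(add(x, y)) for x in rng for y in rng)  # PA4
            and all(mul(x, 0) == 0 for x in rng)                         # PA5
            and all(mul(x, S(y)) == add(mul(x, y), x)
                    for x in rng for y in rng))                          # PA6


print(check_axioms(12))   # True
```

The recursive `add` and `mul` agree with the built-in operations on the tested range, which reflects that these four axioms pin down + and ⋅ on the standard numbers.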
(LNP$_\varphi$) $\forall \vec{w}\, [\exists v\, \varphi(v, \vec{w}) \to \exists z\, (\varphi(z, \vec{w}) \wedge \forall y < z\, \neg\varphi(y, \vec{w}))]$
Exercise 4.1. Argue (informally) that induction and the least num-
ber principle are equivalent.
Remark: The careful reader may already have noticed that we have
become a little sloppy concerning formal notation—we left out a
parenthesis here and there, and we used x, y to denote variables instead
of $x_0$ and $x_1$. All this is done to improve readability.
(1) is an axiom ψ ∈ A, or
(2) is a logical axiom, i.e. a formula from a fixed set of universally
valid formulas such as x = x or ψ ∨ ¬ψ, or
(3) has been obtained from formulas $\varphi_1, \ldots, \varphi_{i-1}$ by application
of a deduction rule. For example, if $\varphi_2$ is $\varphi_1 \Rightarrow \psi$, then
we are allowed to deduce ψ. In other words, if we have
previously established $\varphi_1$ and $\varphi_1 \Rightarrow \psi$, we may deduce ψ by
applying the logical rule called modus ponens.
The choice of the logical axioms (2) and the choice of deduc-
tion rules (3) define a proof system. The precise nature of such a
proof system is of no importance here. What matters for us is that
the system is sound and complete (properties we will discuss below).
Readers can find various such systems in the books by Shoenfield [60]
and Rautenberg [54].
In mathematical practice, a proof is usually not given in this form.
It would be very hard to digest for a human reader who wants to follow
an argument. Proofs in mathematical papers and books (such as this
one) are usually given in a hybrid form: formal computations paired
with a deduction given in English or another language. But the idea
is that every proof can be brought into the form of Definition 4.2.
And once this is done, it should not be very hard (though it would be
tedious) to go through the proof literally line by line to check whether
every step is valid. In fact, we could leave this task to a computer.
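A line-by-line check of this kind can be sketched in a few lines. The toy checker below is an illustration under simplifying assumptions, not one of the proof systems referred to above: formulas are plain strings or nested tuples, logical axioms are simply included in the given axiom set, modus ponens is the only deduction rule, and the names `implies` and `check_proof` are hypothetical.

```python
def implies(a, b):
    """Represent the formula a => b as a tuple."""
    return ("=>", a, b)


def check_proof(lines, axioms):
    """Return True if every line is an axiom or follows by modus ponens."""
    for idx, phi in enumerate(lines):
        earlier = lines[:idx]
        by_axiom = phi in axioms
        # Modus ponens: some earlier p together with an earlier (p => phi).
        by_mp = any(implies(p, phi) in earlier for p in earlier)
        if not (by_axiom or by_mp):
            return False
    return True


axioms = {"p", implies("p", "q"), implies("q", "r")}
proof = ["p", implies("p", "q"), "q", implies("q", "r"), "r"]
print(check_proof(proof, axioms))    # True
print(check_proof(["q"], axioms))    # False: q is neither an axiom nor derived
```

The checker never needs to understand what the formulas mean; it only compares symbols, which is exactly why the task can be left to a machine.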
In recent years enormous progress has been made in developing
proof assistants 2 , which help humans to turn their “human” proofs
into fully formal arguments the correctness of which can then be
checked by computers. Several important theorems of mathemat-
ics have successfully been verified this way, for example proofs of the
Kepler conjecture in geometry [30] and the Feit-Thompson theorem
in group theory [21].
2
Coq, HOL, Isabelle, Lean, to name just a few.
A ⊧ ϕ[a1 , . . . , an ]
for all σ ∈ S, A ⊧ σ.
3
You may have noticed the use of ≡ here to denote equality between formulas.
This is to distinguish it from the logical symbol =, which is used inside formulas.
Exercise 4.5. Find other models, different from the standard model,
which satisfy as many axioms from (PA1)–(PA6) as possible. Try
adding additional elements to the standard model “at the end” and
extend the operations S, +, ⋅ accordingly.
Does the axiom system PA have any models other than the stan-
dard model N?
The answer is, maybe a bit surprisingly, “yes”. To see why, we
need to return to our brief introduction to mathematical logic and
talk about the relation between proof and truth.
consequence, this would mean checking for every group whether the
statement holds in that particular group. But there are way too many
groups. In fact, the family of all mathematical groups is not even a
set, but a proper class, just like the class of all ordinals.
Of course, nobody proves theorems about groups this way. We
deduce them from the group axioms. What the completeness theorem
tells us is that
every statement true in all groups has a proof from the group
axioms, and this proof can be completely formalized.
The completeness theorem also has some important consequences
at the other extreme: inconsistent theories.
A theory T is inconsistent if for some sentence σ,
T ⊢ σ and T ⊢ ¬σ.
(Note that, by the completeness theorem, we could use ⊧ instead of
⊢.) In any fixed mathematical structure M, we have either M ⊧ σ
or M ⊧ ¬σ, but never both (since a sentence is either true or not,
in which case its negation is true). Therefore, if T is inconsistent,
it cannot have any models. The other direction of the completeness
theorem tells us in turn that if a theory does not have any models,
then it must be inconsistent.
Corollary 4.8. A theory T is consistent if and only if T has a model.
Exercise 4.9. Show that if T is inconsistent, T ⊢ σ for every sentence
σ. In other words, an inconsistent theory proves everything.
(Hint: If T ⊭ τ for some τ , then there has to be some structure wit-
nessing this.)
$$S^c = \{\varphi_n^c : n \in \mathbb{N}\}.$$
But for these finitely many formulas, we can easily give a model: Take
N (which satisfies PA and, in particular, every finite subset of PA) and
interpret c as 19191920, that is, one larger than the largest number
occurring in any of the formulas $\varphi_n^c$ of T . This gives us a model of T .
By compactness, PA + $S^c$ has a model, say N . N satisfies every
axiom of PA, and it must also interpret the constant c so that every
statement
c > n (n ∈ N)
. . . a − 3, a − 2, a − 1, a, a + 1, a + 2, a + 3 . . . (n ∈ N).
it follows that a + n < 2a for all n. One can even show that 2a −
m > a + n for any m, n ∈ N. This means that the non-standard part
x ∈ I and y < x ⇒ y ∈ I.
N is not only an initial segment but also closed under the successor
function S. Non-empty initial segments with this additional property
are called cuts.
is a cut in N .
(a1 , . . . , an ) ∈ A ⇔ M ⊧ ϕ[a1 , . . . , an ].
Example 4.17.
(1) In any group G, the centralizer can be defined via
ϕ(x) ≡ ∀y (y ○ x = x ○ y).
(2) In any LA -structure, the set of even elements is defined by
ϕ(x) ≡ ∃y (x = y + y).
This coding function works very well, but it has one big caveat:
In the language of arithmetic LA , we do not have a symbol for expo-
nentiation, so we cannot directly express the coding function above
in LA .
Gödel found a coding scheme that works in formal arithmetic (in
particular, it works in PA).
For this, we need the remainder function:
Proof. The main point is to choose a large enough l and then apply
Lemma 4.22. Let $a = \max\{a_0, \ldots, a_{n-1}\}$ and put $l = a \cdot n$. By
Lemma 4.22, the numbers
$$1 + l!, \ldots, 1 + l \cdot l!$$
are pairwise relatively prime. By the Chinese remainder theorem,
there exists c such that for i = 0, . . . , n − 1,
$$c \equiv a_i \pmod{1 + (i + 1)\, l!}.$$
Since
$$a_i \le a \le a \cdot n = l \le l! < 1 + (i + 1)\, l!,$$
we have that $\mathrm{rem}(c, 1 + (i + 1)\, l!) = a_i$, and thus, if we let d = l!,
$$\beta(c, d, i) = a_i.$$
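The proof is constructive, so the code numbers c and d can actually be computed for a given sequence. The sketch below assumes β(c, d, i) = rem(c, 1 + (i + 1)d), as used in the last step of the proof; the name `encode` is hypothetical, and the extra guard `max(..., n, 1)` on l is an addition for sequences whose maximum is 0.

```python
from math import factorial


def beta(c, d, i):
    """Goedel's beta function: remainder of c modulo 1 + (i + 1) * d."""
    return c % (1 + (i + 1) * d)


def encode(seq):
    """Find (c, d) with beta(c, d, i) == seq[i] for a non-empty sequence."""
    n = len(seq)
    l = max(max(seq) * n, n, 1)   # l = a * n as in the proof, plus a guard
    d = factorial(l)
    c, m = 0, 1                   # invariant: c solves all congruences so far, modulo m
    for i, a in enumerate(seq):
        mod = 1 + (i + 1) * d
        # Chinese remainder step: adjust c by a multiple of m to hit a modulo mod.
        k = ((a - c) * pow(m, -1, mod)) % mod
        c += m * k
        m *= mod
    return c, d


c, d = encode([2, 0, 3])
print([beta(c, d, i) for i in range(3)])   # [2, 0, 3]
```

The modular inverse `pow(m, -1, mod)` exists because the moduli are pairwise relatively prime, which is exactly the content of Lemma 4.22.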
For a full proof, see for example [13] or [54]. The proposition
gives us a first glimpse of a powerful connection between logic and
computation, a connection that will feature prominently in the next
section.
Exercise 4.28. Show that both decode and length are definable in
N.
∀k, p, r∃N ∀f [
∃l (func(f, l) ∧
∀i ≤ l (set(arg(f, i), p) ∧
∀j ≤ p(decode(arg(f, i), j) ≤ N )) ∧
∀y ((set(y, p) ∧ ∀j ≤ p(decode(y, j) ≤ N ))
⇒ ∃j(arg(f, j) = y))
∀j ≤ l (val(f, j) ≤ r))
⇒
∃z, j (j ≤ r ∧ set(z, k) ∧
∀i ≤ k (decode(z, i) ≤ N ) ∧
would be to go over the essential steps in the proof and check that they
can be established simply by using the basic properties of addition
and multiplication via induction. (If you are willing to study a little
more mathematical logic, you will acquire some nice tools such as
upward absoluteness that make this a lot easier.)
We can also point to the rich efforts of a community of math-
ematicians aimed at formalizing proofs of major results so that the
correctness of the proofs can subsequently be checked by a computer.
This has been done multiple times for Ramsey’s theorem; see for ex-
ample [55].
Which other number-theoretic theorems can be proven in Peano
arithmetic? It turns out that in the vast majority of cases, if you can
formalize the statement in arithmetic, then it is provable in PA. Eu-
clid’s theorem on the infinitude of primes, van der Waerden’s theorem
(Theorem 3.3), the law of quadratic reciprocity—all are provable in
PA.4 This is good evidence that PA is quite strong as a formal system
and captures most of elementary number theory.
On the other hand, one might ask:
Are there true statements about the natural numbers that can-
not be proved in PA?
This question spurred some of the greatest achievements in math-
ematical logic in the 20th century. Along the way, the optimism that
mathematics could be put on a completely solid foundation was shat-
tered. But it also produced some beautiful mathematics, in which
Ramsey theory played no small part.
4.4. Incompleteness
The natural numbers N (just like any other LA -structure) have the
property that any LA -sentence is either true or false in N, and in the
latter case this means that the negation of the sentence must be true.
For any sentence σ, either N ⊧ σ or N ⊧ ¬σ.
If a theory has this property, it is called complete.
4
A notable uncertainty at the time this book was written concerns Wiles’s proof of
Fermat’s last theorem, but there is optimism among experts that it can be formalized
in PA as well.
either T ⊢ σ or T ⊢ ¬σ.
∀x, y (x ○ y = y ○ x).
The statement says that a group is Abelian, i.e. the group operation
commutes. G has no opinion about this statement, since it neither
proves the statement nor disproves it (i.e. proves its negation). Some
groups are Abelian, others not. Hence the theory of groups is incom-
plete.
5
Some authors define completeness using a non-exclusive or. Completeness in the
sense of Definition 4.29 would then be equivalent to being complete and consistent.
6
An example of such a formula would be ∀x (x = x)
which assigns a Gödel number to all variables and the constant sym-
bol 0. Next, we assign Gödel numbers to all terms, i.e. expressions
that can be obtained by applying the operations S, +, ⋅ to variables,
constants, and other terms. Suppose s and t have already been as-
signed Gödel numbers. Then we put
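$$\begin{aligned}
\ulcorner S(t) \urcorner &= 3 \cdot 7^{\ulcorner t \urcorner}, \\
\ulcorner s + t \urcorner &= 3^2 \cdot 7^{\ulcorner s \urcorner} \cdot 11^{\ulcorner t \urcorner}, \\
\ulcorner s \cdot t \urcorner &= 3^3 \cdot 7^{\ulcorner s \urcorner} \cdot 11^{\ulcorner t \urcorner}.
\end{aligned}$$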
Finally, we assign Gödel numbers to formulas:
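$$\begin{aligned}
\ulcorner s = t \urcorner &= 5 \cdot 7^{\ulcorner s \urcorner} \cdot 11^{\ulcorner t \urcorner}, \\
\ulcorner \neg\varphi \urcorner &= 5^2 \cdot 7^{\ulcorner \varphi \urcorner}, \\
\ulcorner \varphi \wedge \psi \urcorner &= 5^3 \cdot 7^{\ulcorner \varphi \urcorner} \cdot 11^{\ulcorner \psi \urcorner}, \\
\ulcorner \exists x_i\, (\varphi) \urcorner &= 5^4 \cdot 7^{i+1} \cdot 11^{\ulcorner \varphi \urcorner}.
\end{aligned}$$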
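For example, the formula $\exists x_0\, (x_0 + x_1 = x_0 \cdot x_2)$ has the Gödel number
$$\begin{aligned}
5^4 \cdot 7^1 \cdot 11^{\ulcorner x_0 + x_1 = x_0 \cdot x_2 \urcorner}
&= 5^4 \cdot 7 \cdot 11^{5 \cdot 7^{\ulcorner x_0 + x_1 \urcorner} \cdot 11^{\ulcorner x_0 \cdot x_2 \urcorner}} \\
&= 5^4 \cdot 7 \cdot 11^{5 \cdot 7^{3^2 \cdot 7^2 \cdot 11^{2^2}} \cdot 11^{3^3 \cdot 7^2 \cdot 11^{2^3}}}.
\end{aligned}$$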
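The numbering of terms can be computed directly, as long as the terms stay small. The sketch below encodes terms as nested tuples (a hypothetical representation) and assumes that the variable $x_i$ has Gödel number $2^{i+1}$, which is what the exponents in the example above suggest; the value 1 chosen for the constant 0 is purely an assumption for illustration. Formulas are left out because their Gödel numbers are far too large to evaluate.

```python
def godel_term(term):
    """Goedel number of a term given as a nested tuple.

    ("var", i) stands for x_i, ("zero",) for the constant 0,
    ("S", t), ("+", s, t), ("*", s, t) for compound terms.
    """
    kind = term[0]
    if kind == "var":
        return 2 ** (term[1] + 1)   # assumed: x_i gets the number 2^(i+1)
    if kind == "zero":
        return 1                    # assumed value for the constant 0
    if kind == "S":
        return 3 * 7 ** godel_term(term[1])
    if kind == "+":
        return 3**2 * 7 ** godel_term(term[1]) * 11 ** godel_term(term[2])
    if kind == "*":
        return 3**3 * 7 ** godel_term(term[1]) * 11 ** godel_term(term[2])
    raise ValueError(f"unknown term constructor: {kind!r}")


x0, x1, x2 = ("var", 0), ("var", 1), ("var", 2)
print(godel_term(("+", x0, x1)))   # 3^2 * 7^2 * 11^(2^2) = 6456681
print(godel_term(("*", x0, x2)))   # 3^3 * 7^2 * 11^(2^3)
```

The Gödel number of the atomic formula $x_0 + x_1 = x_0 \cdot x_2$ would have the first of these numbers as an exponent of 7, which already puts it out of reach of explicit computation.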
on certain inputs, while for other inputs they finish and output a
result after finitely many steps.
Using the Gödel numbering of Turing machines, we can define
the following set of natural numbers:
and
$$\widetilde{M}(e)\uparrow \;\Rightarrow\; M_e(e)\uparrow \;\Rightarrow\; M_d(e) = 0 \;\Rightarrow\; \widetilde{M}(e)\downarrow,$$
The reason for this is that we can express facts about Turing
machine computations as formulas of first-order arithmetic. In the
previous section, we stated Gödel’s result that every primitive recur-
sive function is definable over N (Proposition 4.27). Kleene extended
this to all computable functions and relations.
N ⊧ Ψ[e, a, b] ⇔
the eth Turing machine halts on input a and outputs b.
Note that all objects in the statement are of a finite nature, and
it is possible to formalize this statement in LA via a process like the
one we outlined for the finite Ramsey theorem in Section 4.3.
If you go back to Section 3.5 and revisit the proof of the fast
Ramsey theorem (Theorem 3.40), you will see that we used compact-
ness to infer it from the infinite Ramsey theorem. The problem with
formalizing this proof in PA is that the coding methods in Section 4.3
apply only to finite sequences of numbers, not infinite sets. Cardinal-
ity considerations aside (there are uncountably many subsets of N),
one can devise other effective coding methods for a certain subfam-
ily of subsets of N (for example, consider Gödel numbers of Turing
machines computing the set). But it is possible to show that the in-
finite Ramsey theorem is not formalizable in PA for such an effective
coding. This was proved by Jockusch [37].
The question is whether there might be an alternative proof of the
fast Ramsey theorem that utilizes only finite objects, as is possible
for the finite Ramsey theorem. In 1977, Paris and Harrington [47]
showed that this is impossible, and this impossibility result has be-
come known as the Paris-Harrington theorem.
4.5. Indiscernibles
We want to show that a certain statement of first-order arithmetic is
not provable in PA. How do we accomplish this? By Gödel’s com-
pleteness theorem, we can show that
PA ⊬ σ
by constructing a model M of PA such that M ⊭ σ. The compactness
theorem in turn gave us a tool to construct models for PA other than
N, non-standard models. So far, however, we have had little control
over the nature of a non-standard model, other than that it has non-
standard elements.
In this section, we will describe a technique that allows us to
construct non-standard models with additional properties. Perhaps
somewhat surprisingly, Ramsey theory will play a key role.
When two objects are identical, they will have exactly the same
properties. But what about the converse: If two objects have the
exact same properties, are they identical?
While this is ultimately a philosophical question, there is a way
to frame it mathematically. We could, for instance, say that two
elements a and b of an L-structure M are indiscernible if we cannot
tell them apart by any L-formula ϕ(x, y). That is, for any such
formula,
M ⊧ ϕ[a, b] ⇔ M ⊧ ϕ[b, a].
For example, we can consider a graph G = (V, E) as a structure over
the language L = {E} with just one binary relation symbol (the edge
relation). In a complete graph Kn , any two vertices would be indis-
cernible in this sense.
In models of PA, however, we have a general obstruction to indiscern-
ibility—the models are linearly ordered. For any two elements a ≠ b,
either a < b or b < a. Therefore, the formula ϕ(x, y) ≡ x < y will
discern a from b.
3+5=8 or 1 + 1 ≠ 334 ∨ 2 = 2.
The left formula does not contain any logical symbols other than
‘=’. Such LA -formulas are called atomic—they cannot be broken
up further into simpler subformulas. The formula on the right is
a Boolean combination of atomic formulas. The atomic parts are
ϕ ≡ 1 + 1 = 334 and ψ ≡ 2 = 2, and the formula is given as ¬ ϕ ∨ ψ.
The important point about quantifier-free LA -formulas is that,
for the standard model N, we can check whether these statements are
true by means of a computer program. For statements with quanti-
fiers, such as the Goldbach conjecture (a ∀-statement), this may no
longer be possible. In a “brute force” attempt, a computer would have
to check infinitely many instances (every even integer) and hence, if
the conjecture is true, run forever.
Similarly, an ∃-statement can be interpreted as an unbounded
search, since we are looking for a witness to the existential statement
over the whole structure. If no such witness exists, our search will go
on forever, and how and when would we decide whether that’s the
case? This is essentially the same question as the halting problem,
which we have seen to be undecidable (Theorem 4.35).
However, if we bound our search in advance, say by looking only
at numbers less than 10^6, we know that eventually our search will end,
either because it has found a witness, or because there is none among
the numbers up to 10^6. It might take a long time, but it will end.
Syntactically, a bounded search corresponds to a bounded quantifier.
For example,
∀x < t or ∃x < t
respectively.
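A bounded search of this kind can be sketched as follows. The helper names `bounded_exists`, `bounded_forall` and `goldbach_instance` are hypothetical, and the bound 1000 is chosen only to keep the run short. The point is that every loop has a limit fixed in advance, so the program always halts.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    k = 2
    while k * k <= n:
        if n % k == 0:
            return False
        k += 1
    return True

def bounded_exists(bound, predicate):
    """Search x < bound for a witness; return it, or None if there is none."""
    for x in range(bound):
        if predicate(x):
            return x
    return None

def bounded_forall(bound, predicate):
    """Check the predicate for every x < bound."""
    return all(predicate(x) for x in range(bound))

def goldbach_instance(n):
    """For even n > 2: is n a sum of two primes? The witness search is bounded by n."""
    if n <= 2 or n % 2 == 1:
        return True
    return bounded_exists(n, lambda p: is_prime(p) and is_prime(n - p)) is not None

# A bounded version of the Goldbach statement: all even numbers below 1000.
goldbach_up_to_1000 = bounded_forall(1000, goldbach_instance)
```

Replacing 1000 by a larger bound only lengthens the run; it never turns the check into an unbounded search.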
R(a_1, …, a_n) :⇔ N ⊧ ϕ[a_1, …, a_n],
N ⊧ ϕ[a_1, …, a_l]}.
Proof.
(i) Closure under basic arithmetic operations: Closure under S is
easy, since in any model of PA it holds that if a < b and b < c, then
S(a) < c (this can be proved via induction). And so, as we have an
infinite strictly increasing sequence b_1 < b_2 < ⋯, N is closed under S.
7 Sat stands for the satisfaction relation (⊧).
4.5. Indiscernibles 177
N ⊧ ϕ[a_1, …, a_n] ⇔ M ⊧ ϕ[a_1, …, a_n].
M ⊧ ϕ implies N ⊧ ϕ.
While the b_i are, strictly speaking, not part of the language, for the
sake of readability, in what follows we write the latter expression as
N ⊧ ∃x < b_i θ(x).
Similarly, we have
is equivalent to
Note that the “meta”-quantifiers, ∃i_1 > i_0 ∀i_2 > i_1 … ∃i_n > i_{n−1}, are
just an abbreviation of the long statement “there exists i_1 > i_0 such
that for all . . . ”. The formula on the right-hand side is Δ_0, since
all quantified variables are bounded, and hence by Lemma 4.47 the
previous statement is equivalent to
8 The notation in the succeeding formula is again a little sloppy. As a and c⃗ are
not variables but elements of the structure over which we interpret, we should write
ψ(x⃗)[a, c⃗], but that makes it rather hard to read.
M ⊧ ∃x_1 < b_{i_0+1} ∀x_2 < b_{i_0+2} … ∃x_n < b_{i_0+n} ψ(a, c⃗, x⃗).
N ⊧ ϕ[a, c⃗] iff M ⊧ ∃x_1 < b_{i_0+1} ∀x_2 < b_{i_0+2} … ∃x_n < b_{i_0+n} ψ(a, c⃗, x⃗).
As before, we choose i_0 such that a, c⃗ < b_{i_0} and obtain the equivalence
N ⊧ ϕ[a, c⃗] iff M ⊧ ∃x_1 < b_{i_0+1} ∀x_2 < b_{i_0+2} … ∃x_n < b_{i_0+n} ψ(a, c⃗, x⃗).
Since induction (and hence the LNP) holds in M, there exists a least
â < b_{i_0} such that
M ⊧ ∃x_1 < b_{i_0+1} ∀x_2 < b_{i_0+2} … ∃x_n < b_{i_0+n} ψ(â, c⃗, x⃗).
f_i({y_0} ∪ Y_j) = f_i({y_0} ∪ Y_l)
Theorem 4.55 (Paris and Harrington [47]). The fast Ramsey theo-
rem is not provable in PA.
(3) Since the model M satisfies (∗), we can use it to find diagonal
indiscernibles.
This is the metamathematically most subtle step. The finite Ramsey
theorem is provable in PA; there exists a least w ∈ M such that
M ⊧ w → (3c + 1)^{2c+1}_c.
theorem after all, and as we pointed out above, to M, all its elements
“look” finite. But what does it mean “from the outside” that, for
example, ∣H∣ ≥ w if w is non-standard?
Going back to Section 4.3, the formalized version of “∣H∣ ≥ w”
states that a code a (for H) exists such that a codes a set and the
length of a is at least w. We defined the length of a code simply
as the 0-entry in its decoding sequence. But can such an entry be
non-standard? In other words, can the result of
rem(c, 1 + (1 + i)d)
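For readers who want to experiment with the remainder expression above in the standard model, the following sketch implements it as a decoding function and builds codes via the Chinese remainder theorem. Several things here are assumptions made for illustration: the names `beta`, `encode` and `encode_with_length`, the choice of d as a factorial, and the decision to keep c and d as a separate pair rather than packing them into a single code a. Only the convention that entry 0 holds the length is taken from the text.

```python
from math import factorial

def beta(c, d, i):
    """Entry i of the sequence coded by (c, d): rem(c, 1 + (1 + i)d)."""
    return c % (1 + (1 + i) * d)

def encode(seq):
    """Find (c, d) with beta(c, d, i) == seq[i] for all i < len(seq)."""
    n = len(seq)
    # With s >= n and s >= every entry, d = s! makes the moduli
    # 1 + (1 + i)d pairwise coprime and larger than each entry.
    s = max([n] + list(seq))
    d = factorial(s)
    c, modulus = 0, 1
    for i, a in enumerate(seq):
        m = 1 + (1 + i) * d
        # Chinese remainder step: keep c mod modulus, force c = a mod m.
        t = ((a - c) * pow(modulus, -1, m)) % m
        c += modulus * t
        modulus *= m
    return c, d

def encode_with_length(items):
    """Code a finite sequence with its length stored as entry 0."""
    return encode([len(items)] + list(items))

c, d = encode_with_length([5, 7, 11])
length = beta(c, d, 0)
entries = [beta(c, d, i) for i in range(1, length + 1)]
```

In this standard setting `length` is an ordinary natural number. The question raised in the text is what the same arithmetic yields when c and d are elements of a non-standard model.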
Theorem 4.59 (Paris and Harrington [47]). The fast Ramsey theo-
rem implies ConPA .
second incompleteness theorem holds for these systems, too. One im-
portant example is Zermelo-Fraenkel set theory with the axiom
of choice, ZFC.
Just as PA collects basic statements about natural numbers, ZFC
consists of various statements about sets. For example, one axiom
asserts that if X and Y are sets, so is {X, Y }. Another axiom asserts
the existence of the power set P(X) for any set X. You can find
the complete list in any book on set theory (such as [35]). ZFC is
a powerful axiom system. Most mathematical objects and theories
(from analysis to group theory to algebraic topology) can be formal-
ized in it. ZFC interprets PA and one can also formalize the proof
that N is a model of PA in ZFC, which means that ZFC proves the
consistency of PA. However, it also means that ZFC cannot prove its
own consistency, in the sense formulated above. One would have to
resort to an even stronger axiom system to prove the consistency of
ZFC. The stronger system, if consistent, in turn cannot prove its own
consistency, and so on.
There is something similar to a standard model of PA in ZFC:
the von Neumann universe V . It is a cumulative hierarchy of sets,
built from the empty set by iterating the power set operation and
taking unions: For ordinals α and λ, we define
V_0 = ∅,
V_{α+1} = P(V_α), and
V_λ = ⋃_{β<λ} V_β if λ is a limit ordinal.
V = ⋃_{α∈Ord} V_α
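The first few finite stages of this hierarchy are small enough to compute directly. The sketch below represents hereditarily finite sets as nested `frozenset` values; the function names `power_set` and `von_neumann_stage` are hypothetical, and limit stages are not modeled since they require infinite unions.

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, each as a frozenset."""
    elems = list(s)
    return frozenset(
        frozenset(subset)
        for r in range(len(elems) + 1)
        for subset in combinations(elems, r)
    )

def von_neumann_stage(n):
    """V_n for a natural number n: apply the power set operation n times to the empty set."""
    stage = frozenset()
    for _ in range(n):
        stage = power_set(stage)
    return stage

# Cardinalities of V_0, ..., V_4.
sizes = [len(von_neumann_stage(n)) for n in range(5)]

# The hierarchy is cumulative: each stage is a subset of the next.
cumulative = all(
    von_neumann_stage(n) <= von_neumann_stage(n + 1) for n in range(4)
)
```

Each stage has 2 to the power of the previous stage's size as its cardinality, so the sizes run 0, 1, 2, 4, 16, and the next stage already has 65536 elements.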
[1] W. Ackermann, Zum Hilbertschen Aufbau der reellen Zahlen (German), Math.
Ann. 99 (1928), no. 1, 118–133, DOI 10.1007/BF01459088. MR1512441
[2] N. Alon and J. H. Spencer, The probabilistic method, 4th ed., Wiley Series in
Discrete Mathematics and Optimization, John Wiley & Sons, Inc., Hoboken, NJ,
2016. MR3524748
[3] V. Angeltveit and B. D. McKay, R(5, 5) ≤ 48, arXiv:1703.08768, 2017.
[4] R. A. Brualdi, Introductory combinatorics, 5th ed., Pearson Prentice Hall, Upper
Saddle River, NJ, 2010. MR2655770
[5] P. L. Butzer, M. Jansen, and H. Zilles, Johann Peter Gustav Lejeune Dirichlet
(1805–1859): Genealogie und Werdegang (German), Dürener Geschichtsblätter
71 (1982), 31–56. MR690659
[6] A. Church, A note on the Entscheidungsproblem, Journal of Symbolic Logic 1
(1936), 40–41.
[7] A. Church, An unsolvable problem of elementary number theory, Amer. J. Math.
58 (1936), no. 2, 345–363, DOI 10.2307/2371045. MR1507159
[8] P. J. Cohen, The independence of the continuum hypothesis, Proc. Nat. Acad.
Sci. U.S.A. 50 (1963), 1143–1148. MR0157890
[9] P. J. Cohen, The independence of the continuum hypothesis. II, Proc. Nat. Acad.
Sci. U.S.A. 51 (1964), 105–110. MR0159745
[10] D. Conlon, A new upper bound for diagonal Ramsey numbers, Ann. of Math. (2)
170 (2009), no. 2, 941–960, DOI 10.4007/annals.2009.170.941. MR2552114
[11] R. Dedekind, Was sind und was sollen die Zahlen? (German), 8te unveränderte
Aufl, Friedr. Vieweg & Sohn, Braunschweig, 1960. MR0106846
[12] R. Diestel, Graph theory, 5th ed., Graduate Texts in Mathematics, vol. 173,
Springer, Berlin, 2017. MR3644391
[13] H. B. Enderton, A mathematical introduction to logic, 2nd ed., Harcourt/Aca-
demic Press, Burlington, MA, 2001. MR1801397
[14] P. Erdős, Some remarks on the theory of graphs, Bull. Amer. Math. Soc. 53
(1947), 292–294, DOI 10.1090/S0002-9904-1947-08785-1. MR0019911
[15] P. Erdős and R. Rado, A problem on ordered sets, J. London Math. Soc. 28
(1953), 426–438, DOI 10.1112/jlms/s1-28.4.426. MR0058687
200 Bibliography
[19] K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und
verwandter Systeme I (German), Monatsh. Math. Phys. 38 (1931), no. 1, 173–
198, DOI 10.1007/BF01700692. MR1549910
[20] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics
Studies, no. 3, Princeton University Press, Princeton, NJ, 1940. MR0002514
[21] G. Gonthier, A. Asperti, J. Avigad, et al., A machine-checked proof of the
odd order theorem, Interactive theorem proving, Lecture Notes in Comput. Sci.,
vol. 7998, Springer, Heidelberg, 2013, pp. 163–179,
DOI 10.1007/978-3-642-39634-2_14. MR3111271
[22] W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11
(2001), no. 3, 465–588, DOI 10.1007/s00039-001-0332-9. MR1844079
[23] R. L. Graham and B. L. Rothschild, Ramsey theory (1980), ix+174. Wiley-
Interscience Series in Discrete Mathematics; A Wiley-Interscience Publication.
MR591457
[24] R. L. Graham, B. L. Rothschild, and J. H. Spencer, Ramsey theory, 2nd ed.,
Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley
& Sons, Inc., New York, 1990. A Wiley-Interscience Publication. MR1044995
[25] R. L. Graham and J. H. Spencer, Ramsey theory, Scientific American 263 (1990),
no. 1, 112–117.
[26] B. Green and T. Tao, The primes contain arbitrarily long arithmetic pro-
gressions, Ann. of Math. (2) 167 (2008), no. 2, 481–547,
DOI 10.4007/annals.2008.167.481. MR2415379
[27] R. E. Greenwood and A. M. Gleason, Combinatorial relations and chro-
matic graphs, Canad. J. Math. 7 (1955), 1–7, DOI 10.4153/CJM-1955-001-4.
MR0067467
[28] A. Grzegorczyk, Some classes of recursive functions, Rozprawy Mat. 4 (1953),
46. MR0060426
[29] A. W. Hales and R. I. Jewett, Regularity and positional games, Trans. Amer.
Math. Soc. 106 (1963), 222–229, DOI 10.2307/1993764. MR0143712
[30] T. Hales, M. Adams, G. Bauer, T. D. Dang, J. Harrison, Le Truong Hoang, C.
Kaliszyk, V. Magron, S. McLaughlin, T. T. Nguyen, Q. T. Nguyen, T. Nipkow,
S. Obua, J. Pleso, J. Rute, A. Solovyev, T. H. A. Ta, N. T. Tran, T. D. Trieu, J.
Urban, K. Vu, and R. Zumkeller, A formal proof of the Kepler conjecture, Forum
Math. Pi 5 (2017), e2, 29, DOI 10.1017/fmp.2017.1. MR3659768
[31] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, Oxford,
at the Clarendon Press, 1954. 3rd ed. MR0067125
[32] D. Hilbert and W. Ackermann, Grundzüge der theoretischen Logik, Springer-
Verlag, Berlin, 1928.
[33] N. Hindman and E. Tressler, The first nontrivial Hales-Jewett number is four,
Ars Combin. 113 (2014), 385–390. MR3186481
[34] D. R. Hirschfeldt, Slicing the truth. On the computable and reverse mathematics
of combinatorial principles, Lecture Notes Series. Institute for Mathematical Sci-
ences. National University of Singapore, vol. 28, World Scientific Publishing Co.
Pte. Ltd., Hackensack, NJ, 2015. Edited and with a foreword by Chitat Chong,
Qi Feng, Theodore A. Slaman, W. Hugh Woodin and Yue Yang. MR3244278
[35] T. J. Jech, Set theory, Springer Monographs in Mathematics, Springer-Verlag,
Berlin, 2003. The third millennium edition, revised and expanded. MR1940513
[36] T. J. Jech, The axiom of choice, North-Holland Publishing Co., Amsterdam-
London; American Elsevier Publishing Co., Inc., New York, 1973. Studies in Logic
and the Foundations of Mathematics, Vol. 75. MR0396271
[37] C. G. Jockusch Jr., Ramsey’s theorem and recursion theory, J. Symbolic Logic
37 (1972), 268–280, DOI 10.2307/2272972. MR0376319
[38] A. Kanamori and K. McAloon, On Gödel incompleteness and finite combi-
natorics, Ann. Pure Appl. Logic 33 (1987), no. 1, 23–41, DOI 10.1016/0168-
0072(87)90074-1. MR870685
[39] R. Kaye, Models of Peano arithmetic, Oxford Logic Guides, vol. 15, The Claren-
don Press, Oxford University Press, New York, 1991. Oxford Science Publications.
MR1098499
[40] J. Ketonen and R. Solovay, Rapidly growing Ramsey functions, Ann. of Math.
(2) 113 (1981), no. 2, 267–314, DOI 10.2307/2006985. MR607894
[41] A. Y. Khinchin, Three pearls of number theory, Graylock Press, Rochester, NY,
1952. MR0046372
[42] D. E. Knuth, Mathematics and computer science: coping with finiteness, Science
194 (1976), no. 4271, 1235–1242, DOI 10.1126/science.194.4271.1235. MR534161
[43] B. M. Landman and A. Robertson, Ramsey theory on the integers, 2nd ed., Stu-
dent Mathematical Library, vol. 73, American Mathematical Society, Providence,
RI, 2014. MR3243507
[44] D. Marker, Model theory: an introduction, Graduate Texts in Mathematics,
vol. 217, Springer-Verlag, New York, 2002. MR1924282
[45] B. D. McKay and S. P. Radziszowski, R(4, 5) = 25, J. Graph Theory 19 (1995),
no. 3, 309–322, DOI 10.1002/jgt.3190190304. MR1324481
[46] J. Nešetřil, Ramsey theory, Handbook of combinatorics, Vol. 1, 2, Elsevier Sci. B.
V., Amsterdam, 1995, pp. 1331–1403. MR1373681
[47] J. Paris and L. Harrington, A mathematical incompleteness in Peano arith-
metic, Handbook of mathematical logic, Stud. Logic Found. Math., vol. 90, North-
Holland, Amsterdam, 1977, pp. 1133–1142. MR3727432
[48] G. Peano, Arithmetices principia: nova methodo, Fratres Bocca, 1889.
[49] G. Peano, Sul concetto di numero, Rivista di Matematica 1 (1891), 256–267.
[50] R. Péter, Über die mehrfache Rekursion (German), Math. Ann. 113 (1937), no. 1,
489–527, DOI 10.1007/BF01571648. MR1513105
[51] C. C. Pugh, Real mathematical analysis, 2nd ed., Undergraduate Texts in Math-
ematics, Springer, Cham, 2015. MR3380933
[52] S. Radziszowski, Small Ramsey numbers, Electron. J. Comb.,
http://www.combinatorics.org/files/Surveys/ds1/ds1v15-2017.pdf, 2017.
[53] F. P. Ramsey, On a problem of formal logic, Proc. London Math. Soc. (2) 30
(1929), no. 4, 264–286, DOI 10.1112/plms/s2-30.1.264. MR1576401
[54] W. Rautenberg, A concise introduction to mathematical logic, Based on the sec-
ond (2002) German edition, Universitext, Springer, New York, 2006. With a fore-
word by Lev Beklemishev. MR2218537
[55] T. Ridge, Hol/library/ramsey.thy.
[56] H. E. Rose, Subrecursion: functions and hierarchies, Oxford Logic Guides, vol. 9,
The Clarendon Press, Oxford University Press, New York, 1984. MR752696
[57] P. Rothmaler, Introduction to model theory, Algebra, Logic and Applications,
vol. 15, Gordon and Breach Science Publishers, Amsterdam, 2000. Prepared by
Frank Reitmaier; Translated and revised from the 1995 German original by the
author. MR1800596
[58] S. Shelah, Primitive recursive bounds for van der Waerden numbers, J. Amer.
Math. Soc. 1 (1988), no. 3, 683–697, DOI 10.2307/1990952. MR929498
[59] L. Shi, Upper bounds for Ramsey numbers, Discrete Math. 270 (2003), no. 1-3,
251–265, DOI 10.1016/S0012-365X(02)00837-3. MR1997902
[60] J. R. Shoenfield, Mathematical logic, Association for Symbolic Logic, Urbana,
IL; A K Peters, Ltd., Natick, MA, 2001. Reprint of the 1973 second printing.
MR1809685
[61] S. G. Simpson, Subsystems of second order arithmetic, 2nd ed., Perspectives in
Logic, Cambridge University Press, Cambridge; Association for Symbolic Logic,
Poughkeepsie, NY, 2009. MR2517689
[62] C. Smoryński, Logical number theory. I. An introduction, Universitext, Springer-
Verlag, Berlin, 1991. MR1106853
[63] R. I. Soare, Turing computability: theory and applications, Theory and Applica-
tions of Computability, Springer-Verlag, Berlin, 2016. MR3496974
[64] E. Szemerédi, On sets of integers containing no k elements in arithmetic pro-
gression, Acta Arith. 27 (1975), 199–245, DOI 10.4064/aa-27-1-199-245. Collec-
tion of articles in memory of Juriı̆ Vladimirovič Linnik. MR0369312
[65] P. Turán, Eine Extremalaufgabe aus der Graphentheorie (Hungarian, with Ger-
man summary), Mat. Fiz. Lapok 48 (1941), 436–452. MR0018405
[66] A. M. Turing, On computable numbers, with an application to the Entschei-
dungsproblem, Proc. London Math. Soc. (2) 42 (1936), no. 3, 230–265, DOI
10.1112/plms/s2-42.1.230. MR1577030
[67] S. M. Ulam, Adventures of a mathematician, University of California Press, 1991.
[68] B. L. van der Waerden, Beweis einer Baudetschen Vermutung, (German), Nieuw
Arch. Wiskd., II. Ser. 15 (1927), 212–216.
[69] B. L. van der Waerden, Wie der Beweis der Vermutung von Baudet gefun-
den wurde (German), Abh. Math. Sem. Univ. Hamburg 28 (1965), 6–15, DOI
10.1007/BF02993133. MR0175875
[70] B. L. van der Waerden, How the proof of Baudet’s conjecture was found, Studies
in Pure Mathematics (Presented to Richard Rado), Academic Press, London, 1971,
pp. 251–260. MR0270881
[71] S. S. Wainer, A classification of the ordinal recursive functions, Arch.
Math. Logik Grundlagenforsch. 13 (1970), 136–153, DOI 10.1007/BF01973619.
MR0294134
[72] S. Weinberger, Computers, rigidity, and moduli. The large-scale fractal geometry
of Riemannian moduli space, M. B. Porter Lectures, Princeton University Press,
Princeton, NJ, 2005. MR2109177