ALGORITHMIC
INFORMATION
THEORY
Third Printing
G J Chaitin
IBM, P O Box 218
Yorktown Heights, NY 10598
[email protected]
April 2, 2003
This book was published in 1987 by Cambridge Uni-
versity Press as the first volume in the series Cam-
bridge Tracts in Theoretical Computer Science. In
1988 and 1990 it was reprinted with revisions. This
is the text of the third printing. However, the APL
character set is no longer used, since it is not generally
available.
Acknowledgments
Foreword
Turing’s deep 1937 paper made it clear that Gödel’s astonishing earlier
results on arithmetic undecidability related in a very natural way to a
class of computing automata, nonexistent at the time of Turing’s paper,
but destined to appear only a few years later, subsequently to proliferate
as the ubiquitous stored-program computer of today. The appearance
of computers, and the involvement of a large scientific community in
elucidation of their properties and limitations, greatly enriched the line
of thought opened by Turing. Turing’s distinction between computa-
tional problems was rawly binary: some were solvable by algorithms,
others not. Later work, of which an attractive part is elegantly devel-
oped in the present volume, refined this into a multiplicity of scales
of computational difficulty, which is still developing as a fundamental
theory of information and computation that plays much the same role
in computer science that classical thermodynamics plays in physics:
by defining the outer limits of the possible, it prevents designers of
algorithms from trying to create computational structures which prov-
ably do not exist. It is not surprising that such a thermodynamics of
information should be as rich in philosophical consequence as thermo-
dynamics itself.
This quantitative theory of description and computation, or Com-
putational Complexity Theory as it has come to be known, studies the
various kinds of resources required to describe and execute a computa-
tional process. Its most striking conclusion is that there exist computa-
tions and classes of computations having innocent-seeming definitions
but nevertheless requiring inordinate quantities of some computational
resource. Resources for which results of this kind have been established
include:
(c) The time for which such a process will need to execute, either
on a standard “serial” computer or on computational structures
unrestricted in the degree of parallelism which they can employ.
Of these three resource classes, the first is relatively static, and per-
tains to the fundamental question of object describability; the others
are dynamic since they relate to the resources required for a computa-
tion to execute. It is with the first kind of resource that this book is
concerned. The crucial fact here is that there exist symbolic objects
(i.e., texts) which are “algorithmically inexplicable,” i.e., cannot be
specified by any text shorter than themselves. Since texts of this sort
have the properties associated with the random sequences of classical
probability theory, the theory of describability developed in Part II of
the present work yields a very interesting new view of the notion of
randomness.
The first part of the book prepares in a most elegant, even playful,
style for what follows; and the text as a whole reflects its author’s won-
derful enthusiasm for profundity and simplicity of thought in subject
areas ranging over philosophy, computer technology, and mathematics.
J. T. Schwartz
Courant Institute
February, 1987
Preface
Contents

1 Introduction 13
7 Randomness 179
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.2 Random Reals . . . . . . . . . . . . . . . . . . . . . . . . 184
8 Incompleteness 197
8.1 Lower Bounds on Information Content . . . . . . . . . . 197
8.2 Random Reals: First Approach . . . . . . . . . . . . . . 200
8.3 Random Reals: |Axioms| . . . . . . . . . . . . . . . . . . 202
8.4 Random Reals: H(Axioms) . . . . . . . . . . . . . . . . . 209
9 Conclusion 213
10 Bibliography 215
Chapter 1
Introduction
More than half a century has passed since the famous papers Gödel
(1931) and Turing (1937) that shed so much light on the foundations
of mathematics, and that simultaneously promulgated mathematical
formalisms for specifying algorithms, in one case via primitive recursive
function definitions, and in the other case via Turing machines. The
development of computer hardware and software technology during this
period has been phenomenal, and as a result we now know much better
how to do the high-level functional programming of Gödel, and how
to do the low-level machine language programming found in Turing’s
paper. And we can actually run our programs on machines and debug
them, which Gödel and Turing could not do.
I believe that the best way to actually program a universal Turing
machine is John McCarthy’s universal function EVAL. In 1960 Mc-
Carthy proposed LISP as a new mathematical foundation for the the-
ory of computation [McCarthy (1960)]. But by a quirk of fate LISP
has largely been ignored by theoreticians and has instead become the
standard programming language for work on artificial intelligence. I
believe that pure LISP plays precisely the same role in computational
mathematics that set theory plays in theoretical mathematics, in that it
provides a beautifully elegant and extremely powerful formalism which
enables concepts such as that of numbers and functions to be defined
from a handful of more primitive notions.
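To make this concrete, here is a toy sketch (in Python, not the book's LISP) of a metacircular-style EVAL for a tiny pure-LISP fragment. The chosen primitives, the tuple representation of S-expressions, and all function names are this illustration's own assumptions, intended only to show how much can be built from a handful of primitive notions:

```python
def eval_sexp(e, env):
    """Evaluate a tiny pure-LISP expression given as nested tuples."""
    if isinstance(e, str):              # a variable: look it up
        return env[e]
    op = e[0]
    if op == "quote":                   # ('quote', x) -> x unevaluated
        return e[1]
    if op == "if":                      # ('if', test, then, else)
        return eval_sexp(e[2] if eval_sexp(e[1], env) else e[3], env)
    if op == "lambda":                  # ('lambda', (params...), body)
        params, body = e[1], e[2]
        return lambda *args: eval_sexp(body, {**env, **dict(zip(params, args))})
    # otherwise: application -- evaluate operator and operands
    f = eval_sexp(op, env)
    return f(*(eval_sexp(a, env) for a in e[1:]))

# primitives supplied from outside, as in pure LISP
env = {"car": lambda x: x[0],
       "cdr": lambda x: x[1:],
       "cons": lambda h, t: (h,) + t,
       "atom": lambda x: not isinstance(x, tuple)}

first = eval_sexp(("car", ("quote", ("a", "b", "c"))), env)
```

This is of course only a sketch; the book's EVAL, written for a register machine, must in addition be "safe" in the sense discussed below.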
Simultaneously there have been profound theoretical advances.
Gödel and Turing’s fundamental undecidable proposition, the question
(3) the idea of using register machines rather than Turing machines,
and of encoding computational histories via variables which are
vectors giving the contents of a register as a function of time.
Their work gives a simple straightforward proof, using almost no num-
ber theory, that there is an exponential diophantine equation with one
parameter p which has a solution if and only if the pth computer pro-
gram (i.e., the program with Gödel number p) ever halts.
Similarly, one can use their method to arithmetize my undecidable
proposition. The result is an exponential diophantine equation with
the parameter n and the property that it has infinitely many solutions
if and only if the nth bit of Ω is a 1. Here Ω is the halting probability
of a universal Turing machine if an n-bit program has measure 2^−n
[Chaitin (1975b,1982b)]. Ω is an algorithmically random real number
in the sense that the first N bits of the base-two expansion of Ω cannot
be compressed into a program shorter than N bits, from which it follows
that the successive bits of Ω cannot be distinguished from the result of
independent tosses of a fair coin. We will also show in this monograph
1 These results are drawn from Chaitin (1986,1987b).
(2) Secondly, EVAL must not lose control by going into an infinite
loop. In other words, we need a safe EVAL that can execute
2 This theorem was originally established in Chaitin (1987b).
Our version of pure LISP also has the property that in it we can
write a short program to calculate Ω in the limit from below. The
program for calculating Ω is only a few pages long, and by running it (on
the 370 directly, not on the register machine!), we have obtained a lower
bound of 127/128ths for the particular definition of Ω we have chosen,
which depends on our choice of a self-delimiting universal computer.
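The process of computing Ω in the limit from below can be illustrated with a deliberately trivial stand-in machine. Everything in this sketch — the toy machine, its decidable halting rule, and the value 2/3 its Ω converges to — is an assumption of the illustration, standing in for the genuinely uncomputable case:

```python
from fractions import Fraction

def toy_halts(program: str, budget: int):
    """Hypothetical self-delimiting toy machine: a valid program is
    1^k 0, and we pretend it halts (within k steps) iff k is even.
    This stands in for running real programs with a step budget."""
    if not program.endswith("0") or "0" in program[:-1]:
        return False                    # not a valid (self-delimiting) program
    k = len(program) - 1
    return k % 2 == 0 and k <= budget

def omega_lower_bound(max_len: int, budget: int) -> Fraction:
    """Sum 2^-|p| over programs seen to halt so far: a lower bound on Ω."""
    total = Fraction(0)
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            p = format(i, "0%db" % n)
            if toy_halts(p, budget):
                total += Fraction(1, 2 ** n)
    return total

bounds = [omega_lower_bound(n, n) for n in range(1, 12)]
# the bounds increase monotonically toward Ω = 2/3 for this toy machine
```

The real computation is the same in spirit: dovetail over all programs with ever larger step budgets, and credit 2^−|p| each time a program is seen to halt.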
The final step was to write a compiler that compiles a register ma-
chine program into an exponential diophantine equation. This compiler
consists of about 700 lines of code in a very nice and easy to use pro-
gramming language invented by Mike Cowlishaw called REXX. REXX
is a pattern-matching string processing language which is implemented
by means of a very efficient interpreter.3 It takes the compiler only a
few minutes to convert the 300-line LISP interpreter into a 900,000-
character 17,000-variable universal exponential diophantine equation.
The resulting equation is a little large, but the ideas used to produce it
are simple and few, and the equation results from the straightforward
application of these ideas.
Here we shall present the details of this adventure, but not the full
equation.4 My hope is that this monograph will convince mathemati-
cians that randomness and unpredictability not only occur in nonlin-
ear dynamics and quantum mechanics, but even in rather elementary
branches of number theory.
In summary, the aim of this book is to construct a single equa-
tion involving only addition, multiplication, and exponentiation of non-
negative integer constants and variables with the following remarkable
property. One of the variables is considered to be a parameter. Take
the parameter to be 0,1,2,. . . obtaining an infinite series of equations
from the original one. Consider the question of whether each of the
derived equations has finitely or infinitely many non-negative integer
solutions. The original equation is constructed in such a manner that
the answers to these questions about the derived equations mimic coin
tosses and are an infinite series of independent mathematical facts, i.e.,
irreducible mathematical information that cannot be compressed into
3 See Cowlishaw (1985) and O’Hara and Gomberg (1985).
4 The full equation is available from the author: “The Complete Arithmetization
of EVAL,” November 19th, 1987, 294 pp.
any finite set of axioms. In other words, it is essentially the case that
the only way to prove such assertions is by assuming them as axioms.
To produce this equation, we start with a universal Turing machine
in the form of the LISP universal function EVAL written as a register
machine program about 300 lines long. Then we “compile” this register
machine program into a universal exponential diophantine equation.
The resulting equation is about 900,000 characters long and has about
17,000 variables. Finally, we substitute for the program variable in
the universal diophantine equation the binary representation of a LISP
program for Ω, the halting probability of a universal Turing machine if
n-bit programs have measure 2^−n.
Part I
Chapter 2

The Arithmetization of
Register Machines
2.1 Introduction
In this chapter we present the beautiful work of Jones and Matija-
sevič (1984), which is the culmination of a half century of development
starting with Gödel (1931), and in which the paper of Davis, Put-
nam, and Robinson (1961) on Hilbert’s tenth problem was such a
notable milestone. The aim of this work is to encode computations
arithmetically. As Gödel showed with his technique of Gödel num-
bering and primitive recursive functions, the metamathematical asser-
tion that a particular proposition follows by certain rules of inference
from a particular set of axioms, can be encoded as an arithmetical or
number theoretic proposition. This shows that number theory well de-
serves its reputation as one of the hardest branches of mathematics, for
any formalized mathematical assertion can be encoded as a statement
about positive integers. And the work of Davis, Putnam, Robinson,
and Matijasevič has shown that any computation can be encoded as
a polynomial. The proof of this assertion, which shows that Hilbert’s
tenth problem is unsolvable, has been simplified over the years, but it
is still fairly intricate and involves a certain amount of number theory;
for a review see Davis, Matijasevič, and Robinson (1976).
L(a1 , . . . , an , x1 , . . . , xm ) = R(a1 , . . . , an , x1 , . . . , xm ).
L(a1 , . . . , an , x1 , . . . , xm ) = R(a1 , . . . , an , x1 , . . . , xm )
0: 1
1: 1 1
2: 1 2 1
3: 1 3 3 1
4: 1 4 6 4 1
5: 1 5 10 10 5 1
6: 1 6 15 20 15 6 1
7: 1 7 21 35 35 21 7 1
8: 1 8 28 56 70 56 28 8 1
9: 1 9 36 84 126 126 84 36 9 1
10: 1 10 45 120 210 252 210 120 45 10 1
11: 1 11 55 165 330 462 462 330 165 55 11 1
12: 1 12 66 220 495 792 924 792 495 220 66 12 1
13: 1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1
14: 1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1
15: 1 15 105 455 1365 3003 5005 6435 6435 5005 3003 1365 455 105 15 1
16: 1 16 120 560 1820 4368 8008 11440 12870 11440 8008 4368 1820 560 120 16 1
(This rule assumes that entries which are not explicitly shown in this
table are all zero.)
Now let’s replace each entry by a 0 if it is even, and let’s replace
it by a 1 if it is odd. That is to say, we retain only the rightmost bit
in the base-two representation of each entry in the table in Figure 2.1.
This gives us the table in Figure 2.2.
Figure 2.2 shows Pascal’s triangle mod 2 up to (x + y)^64. This table
was calculated by using the formula
(n+1 choose k+1) ≡ (n choose k+1) + (n choose k)   (mod 2).
That is to say, each entry is the base-two sum without carry (the “EX-
CLUSIVE OR”) of two entries in the row above it: the entry in the
same column, and the one in the column just to the left.
Erasing 0’s makes it easier for one to appreciate the remarkable
pattern in Figure 2.2. This gives us the table in Figure 2.3.
Note that moving one row down the table in Figure 2.3 corresponds
to taking the EXCLUSIVE OR of the original row with a copy of it
that has been shifted right one place. More generally, moving down
the table 2n rows corresponds to taking the EXCLUSIVE OR of the
original row with a copy of it that has been shifted right 2n places. This
is easily proved by induction on n.
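This shift rule is easy to check mechanically. In the sketch below, row n of Pascal's triangle mod 2 is packed into the bits of an integer, with bit k standing for column k — a representation chosen for this illustration, under which "shifting the row" becomes an integer shift:

```python
def pascal_mod2_row(n: int) -> int:
    """Row n of Pascal's triangle mod 2, packed into an int's bits
    (bit k = parity of C(n, k))."""
    row = 1
    for _ in range(n):
        row ^= row << 1        # next row = row XOR (row shifted one column)
    return row

# Moving down 2**j rows XORs the row with a copy shifted 2**j columns:
for n in range(32):
    for j in range(4):
        r = pascal_mod2_row(n)
        assert pascal_mod2_row(n + 2 ** j) == r ^ (r << 2 ** j)
```

The identity holds because, mod 2, (1 + x)^(2^j) = 1 + x^(2^j), so multiplying a row polynomial by it adds a shifted copy of the row to itself.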
Consider the coefficients of x^k in the expansion of (1 + x)^42. Some
are even and some are odd. There are eight odd coefficients: since 42 =
32 + 8 + 2, the coefficients are odd for k = (0 or 32) + (0 or 8) + (0 or
2). (See the rows marked with an ∗ in Figure 2.3.) Thus the coefficient
of x^k in (1 + x)^42 is odd if and only if each bit in the base-two numeral
for k “implies” (i.e., is less than or equal to) the corresponding bit in
the base-two numeral for 42. More generally, the coefficient of x^k in
(1 + x)^n is odd if and only if each bit in the base-two numeral for k
implies the corresponding bit in the base-two numeral for n.
Let us write r ⇒ s if each bit in the base-two numeral for the non-
negative integer r implies the corresponding bit in the base-two numeral
for the non-negative integer s. We have seen that r ⇒ s if and only if
the binomial coefficient (s choose r) of x^r in (1 + x)^s is odd. Let us
express this as an exponential diophantine predicate.
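Before arithmetizing it, the claimed equivalence can be spot-checked directly. This sketch uses Python's math.comb; the function name implies_bits is the illustration's own:

```python
from math import comb

def implies_bits(r: int, s: int) -> bool:
    """r => s : every base-two digit of r is <= the corresponding digit of s."""
    return r & s == r

# the property from the text: r => s  iff  C(s, r) is odd
for s in range(64):
    for r in range(s + 1):
        assert implies_bits(r, s) == (comb(s, r) % 2 == 1)

# the (1 + x)^42 example: exactly eight odd coefficients
odd_ks = [k for k in range(43) if comb(42, k) % 2 == 1]
```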
0: 1
1: 11
2: 101
3: 1111
4: 10001
5: 110011
6: 1010101
7: 11111111
8: 100000001
9: 1100000011
10: 10100000101
11: 111100001111
12: 1000100010001
13: 11001100110011
14: 101010101010101
15: 1111111111111111
16: 10000000000000001
17: 110000000000000011
18: 1010000000000000101
19: 11110000000000001111
20: 100010000000000010001
21: 1100110000000000110011
22: 10101010000000001010101
23: 111111110000000011111111
24: 1000000010000000100000001
25: 11000000110000001100000011
26: 101000001010000010100000101
27: 1111000011110000111100001111
28: 10001000100010001000100010001
29: 110011001100110011001100110011
30: 1010101010101010101010101010101
31: 11111111111111111111111111111111
32: 100000000000000000000000000000001
33: 1100000000000000000000000000000011
34: 10100000000000000000000000000000101
35: 111100000000000000000000000000001111
36: 1000100000000000000000000000000010001
37: 11001100000000000000000000000000110011
38: 101010100000000000000000000000001010101
39: 1111111100000000000000000000000011111111
40: 10000000100000000000000000000000100000001
41: 110000001100000000000000000000001100000011
42: 1010000010100000000000000000000010100000101
43: 11110000111100000000000000000000111100001111
44: 100010001000100000000000000000001000100010001
45: 1100110011001100000000000000000011001100110011
46: 10101010101010100000000000000000101010101010101
47: 111111111111111100000000000000001111111111111111
48: 1000000000000000100000000000000010000000000000001
49: 11000000000000001100000000000000110000000000000011
50: 101000000000000010100000000000001010000000000000101
51: 1111000000000000111100000000000011110000000000001111
52: 10001000000000001000100000000000100010000000000010001
53: 110011000000000011001100000000001100110000000000110011
54: 1010101000000000101010100000000010101010000000001010101
55: 11111111000000001111111100000000111111110000000011111111
56: 100000001000000010000000100000001000000010000000100000001
57: 1100000011000000110000001100000011000000110000001100000011
58: 10100000101000001010000010100000101000001010000010100000101
59: 111100001111000011110000111100001111000011110000111100001111
60: 1000100010001000100010001000100010001000100010001000100010001
61: 11001100110011001100110011001100110011001100110011001100110011
62: 101010101010101010101010101010101010101010101010101010101010101
63: 1111111111111111111111111111111111111111111111111111111111111111
64: 10000000000000000000000000000000000000000000000000000000000000001
0: 1
1: 11
2: 1 1
3: 1111
4: 1 1
5: 11 11
6: 1 1 1 1
7: 11111111
8: 1 1
9: 11 11
10: 1 1 1 1
11: 1111 1111
12: 1 1 1 1
13: 11 11 11 11
14: 1 1 1 1 1 1 1 1
15: 1111111111111111
16: 1 1
17: 11 11
18: 1 1 1 1
19: 1111 1111
20: 1 1 1 1
21: 11 11 11 11
22: 1 1 1 1 1 1 1 1
23: 11111111 11111111
24: 1 1 1 1
25: 11 11 11 11
26: 1 1 1 1 1 1 1 1
27: 1111 1111 1111 1111
28: 1 1 1 1 1 1 1 1
29: 11 11 11 11 11 11 11 11
30: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
31: 11111111111111111111111111111111
*32: 1 1
33: 11 11
34: 1 1 1 1
35: 1111 1111
36: 1 1 1 1
37: 11 11 11 11
38: 1 1 1 1 1 1 1 1
39: 11111111 11111111
*40: 1 1 1 1
41: 11 11 11 11
*42: 1 1 1 1 1 1 1 1
43: 1111 1111 1111 1111
44: 1 1 1 1 1 1 1 1
45: 11 11 11 11 11 11 11 11
46: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
47: 1111111111111111 1111111111111111
48: 1 1 1 1
49: 11 11 11 11
50: 1 1 1 1 1 1 1 1
51: 1111 1111 1111 1111
52: 1 1 1 1 1 1 1 1
53: 11 11 11 11 11 11 11 11
54: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
55: 11111111 11111111 11111111 11111111
56: 1 1 1 1 1 1 1 1
57: 11 11 11 11 11 11 11 11
58: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
59: 1111 1111 1111 1111 1111 1111 1111 1111
60: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
61: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
62: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
63: 1111111111111111111111111111111111111111111111111111111111111111
64: 1 1
We use the fact that the binomial coefficients are the digits of the
number (1 + t)^s written in base-t notation, if t is sufficiently large. For
example, in base-ten we have
11^0 = 1
11^1 = 11
11^2 = 121
11^3 = 1331
11^4 = 14641
but for 11^5 a carry occurs when adding 6 and 4 and things break down.
In fact, since the binomial coefficients of order s add up to 2^s, it is
sufficient to take t = 2^s. Hence r ⇒ s if and only if
t = 2^s
(1 + t)^s = v t^(r+1) + u t^r + w
w < t^r
u < t
u is odd.
Replacing the inequalities and the oddness condition by equations in
new non-negative variables x, y, z, this becomes:
t = 2^s
(1 + t)^s = v t^(r+1) + u t^r + w
w + x + 1 = t^r
u + y + 1 = t
u = 2z + 1.
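One can check mechanically that the digits u, v, w extracted from (1 + t)^s in base t = 2^s behave as claimed. The following sketch (the function name is assumed for illustration) recovers the witnesses and confirms that the side conditions come for free:

```python
from math import comb

def implication_witness(r: int, s: int):
    """Extract the base-t digits of (1 + t)**s around position r,
    where t = 2**s, following the construction in the text."""
    t = 2 ** s
    N = (1 + t) ** s
    w = N % t ** r                      # digits below position r
    u = (N // t ** r) % t               # the digit at position r: C(s, r)
    v = N // t ** (r + 1)               # digits above position r
    assert N == v * t ** (r + 1) + u * t ** r + w
    return t, u, v, w

# r => s holds iff u = C(s, r) is odd; w < t**r and u < t are automatic:
for s in range(1, 10):
    for r in range(s + 1):
        t, u, v, w = implication_witness(r, s)
        assert u == comb(s, r)
        assert w < t ** r and u < t
        assert (u % 2 == 1) == (r & s == r)
```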
LABEL: HALT
Halt execution.
L: HALT Halt.
L: GOTO L2 Unconditional branch to L2.
L: JUMP R L2 (label) of next instruction into R &
goto L2.
L: GOBACK R Goto (label) which is in R.
L: EQ R 0/255 L2 Compare the rightmost 8 bits of R
L: NEQ R R2 L2 with an 8-bit constant
or with the rightmost 8 bits of R2
& branch to L2 for equal/not equal.
L: RIGHT R Shift R right 8 bits.
L: LEFT R 0/255 Shift R left 8 bits & insert an 8-bit
R R2 constant or insert the rightmost
8 bits of R2. In the latter case,
then shift R2 right 8 bits.
L: SET R 0/255 Set the entire contents of R to be
R R2 equal to that of R2 or to an 8-bit
constant (extended to the left with
infinitely many 0’s).
L: OUT R Write string in R.
L: DUMP Dump all registers.
LABEL: DUMP
Each register’s name and the character string that it contains are
written out (with the characters in the correct, not the reversed,
order!). This instruction is not really necessary; it is used for
debugging.
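To get a feel for these semantics, here is a toy simulator for a fragment of this instruction set, together with a hypothetical 4-instruction string-reversal program in the spirit of Figure 2.5. The actual program of Figure 2.5 is not shown in this excerpt, so the program below, and the simulator's conventions, are this sketch's own assumptions:

```python
def run(program, registers, max_steps=10_000):
    """Toy simulator for a fragment of the register machine above.
    Registers hold character strings; the string's first character plays
    the role of the rightmost 8 bits, and an empty register reads as
    character code 0 (it is 'extended with infinitely many 0's')."""
    labels = {ins[0]: i for i, ins in enumerate(program)}
    pc = 0
    for _ in range(max_steps):
        label, op, *args = program[pc]
        pc += 1
        if op == "HALT":
            return registers
        elif op == "GOTO":
            pc = labels[args[0]]
        elif op in ("EQ", "NEQ"):       # compare first character with a constant
            r, const, target = args
            first = registers[r][:1] or "\x00"
            if (first == const) == (op == "EQ"):
                pc = labels[target]
        elif op == "RIGHT":             # shift right 8 bits: drop first char
            registers[args[0]] = registers[args[0]][1:]
        elif op == "LEFT":              # shift left 8 bits, insert first char
            dst, src = args             # of src, then shift src right
            registers[dst] = (registers[src][:1] or "\x00") + registers[dst]
            registers[src] = registers[src][1:]
    raise RuntimeError("step budget exceeded")

# A hypothetical 4-instruction reversal program (not the book's Figure 2.5):
reverse = [("L1", "EQ", "A", "\x00", "L4"),
           ("L2", "LEFT", "B", "A"),
           ("L3", "GOTO", "L1"),
           ("L4", "HALT")]
out = run(reverse, {"A": "abc", "B": ""})
```

Repeatedly executing LEFT B A moves the first character of A to the front of B, which is exactly a reversal.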
input.A = “abc”
time = 8
executed must be HALT, which has no effect on the contents.) I.e., the
digit corresponding to q^t (0 ≤ t < time) gives the contents of a register
just before the (t + 1)-th instruction is executed. q must be chosen
large enough for everything to fit.
The base-q numbers L1, L2, L3, L4 encode the instruction being ex-
ecuted as a function of time; the digit corresponding to q^t in LABEL
is a 1 if LABEL is executed at time t, and it is a 0 if LABEL is not
executed at time t.
L1 = 00000001q
L2 = 00101010q
L3 = 01010100q
L4 = 10000000q .
i is a base-q number consisting of time 1’s:
i = 11111111q .
1 + (q − 1)i = q^time.
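The repunit identity above is just the geometric series in disguise, and is easy to confirm:

```python
def repunit(q: int, time: int) -> int:
    """The base-q number consisting of `time` 1's: 1 + q + ... + q**(time-1)."""
    return sum(q ** t for t in range(time))

# 1 + (q - 1) * i = q**time, for any base q and any length:
for q in (2, 10, 256):
    for time in range(1, 8):
        assert 1 + (q - 1) * repunit(q, time) == q ** time
```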
1 ⇒ L1.
dont.set.A ⇒ A
dont.set.A ⇒ (q − 1)(i − L2)
A ⇒ dont.set.A + (q − 1)L2
dont.set.B ⇒ B
dont.set.B ⇒ (q − 1)(i − L1 − L2)
B ⇒ dont.set.B + (q − 1)(L1 + L2)
qL1 ⇒ L2
qL3 ⇒ L4 + L2
r=L
s=R
t = 2^s
(1 + t)^s = v t^(r+1) + u t^r + w
w + x + 1 = t^r
u + y + 1 = t
u = 2z + 1.
Here again, negative terms must be transposed to the other side of the
composite equation. E.g., five equations can be combined into a single
equation by using the fact that if a, b, c, d, e, f, g, h, i, j are non-negative
integers, then
a = b, c = d, e = f, g = h, i = j
if and only if
a^2 + b^2 + c^2 + d^2 + e^2 + f^2 + g^2 + h^2 + i^2 + j^2
= 2ab + 2cd + 2ef + 2gh + 2ij.
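The sum-of-squares folding can be checked numerically: the left-hand side exceeds the right-hand side by the sum of the squared differences, so the folded equation holds exactly when every original equation does. A minimal sketch:

```python
from itertools import product

def fold(pairs):
    """Fold simultaneous equations a=b, c=d, ... into one equation:
    sum of squares on the left, twice the products on the right."""
    lhs = sum(a * a + b * b for a, b in pairs)
    rhs = sum(2 * a * b for a, b in pairs)
    return lhs, rhs

# lhs == rhs exactly when every pair is equal, since lhs - rhs = sum (a-b)**2:
for a, b, c, d in product(range(3), repeat=4):
    lhs, rhs = fold([(a, b), (c, d)])
    assert (lhs == rhs) == (a == b and c == d)
```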
APPEND two lists consisting of two atoms each, takes the LISP inter-
preter 238890 machine cycles, and to APPEND two lists consisting of
six atoms each, takes 1518834 machine cycles! This shows very clearly
that these equations are only of theoretical interest, and certainly not
a practical way of actually doing computations.
The register machine simulator that counted the number of ma-
chine cycles is written in 370 machine language. On the large 370
mainframe that I use, the elapsed time per million simulated register
machine cycles is usually from 1 to 5 seconds, depending on the load
on the machine. Fortunately, this same LISP can be directly imple-
mented in 370 machine language using standard LISP implementation
techniques. Then it runs extremely fast, typically one, two, or three
orders of magnitude faster than on the register machine simulator. How
much faster depends on the size of the character strings that the regis-
ter machine LISP interpreter is constantly sweeping through counting
parentheses in order to break lists into their component elements. Real
LISP implementations avoid this by representing LISP S-expressions
as binary trees of pointers instead of character strings, so that the de-
composition of a list into its parts is immediate. They also replace the
time-consuming search of the association list for variable bindings, by a
direct table look-up. And they keep the interpreter stack in contiguous
storage rather then representing it as a LISP S-expression.
We have written in REXX a “compiler” that automatically converts
register machine programs into exponential diophantine equations in
the manner described above. Solutions of the equation produced by this
REXX compiler correspond to successful computational histories, and
there are variables in the equation for the initial and final contents of
each machine register. The equation compiled from a register machine
program has no solution if the program never halts on given input, and
it has exactly one solution if the program halts for that input.
Let’s look at two simple examples to get a more concrete feeling for
how the compiler works. But first we give in Section 2.4 a complete cast
of characters, a dictionary of the different kinds of variables that appear
in the compiled equations. Next we give the compiler a 16-instruction
register machine program with every possible register machine instruc-
tion; this exercises all the capabilities of the compiler. Section 2.5 is the
compiler’s log explaining how it transformed the 16 register machine
instructions into 17 equations and 111 ⇒’s. Note that the compiler
uses a FORTRAN-like notation for equations in which multiplication
is ∗ and exponentiation is ∗∗.
We don’t show the rest, but this is what the compiler does. First it
expands the ⇒’s and obtains a total of 17 + 7 × 111 = 794 equations,
and then it folds them together into a single equation. This equation is
unfortunately too big to include here; as the summary information at
the end of the compiler’s log indicates, the left-hand side and right-hand
side are each more than 20,000 characters long.
Next we take an even smaller register machine program, and this
time we run it through the compiler and show all the steps up to the
final equation. This example really works; it is the 4-instruction pro-
gram for reversing a character string that we discussed above (Figure
2.5). Section 2.6 is the compiler’s log explaining how it expands the
4-instruction program into 13 equations and 38 ⇒’s. This is slightly
larger than the number of equations and ⇒’s that we obtained when we
worked through this example by hand; the reason is that the compiler
uses a more systematic approach.
In Section 2.7 the compiler shows how it eliminates all ⇒’s by ex-
panding them into equations, seven for each ⇒. The original 13 equa-
tions and 38 ⇒’s produced from the program are flush at the left mar-
gin. The 13 + 7 × 38 = 279 equations that are generated from them
are indented 6 spaces. When the compiler directly produces an equa-
tion, it appears twice, once flush left and then immediately afterwards
indented 6 spaces. When the compiler produces a ⇒, it appears flush
left, followed immediately by the seven equations that are generated
from it, each indented six spaces. Note that the auxiliary variables
generated to expand the nth ⇒ all end with the number n. By looking
at the names of these variables one can determine the ⇒ in Section 2.6
that they came from, which will be numbered (imp.n), and see why the
compiler generated them.
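The expansion step described here can be mimicked by a short generator — a sketch (the function name is this illustration's own) that emits, for the nth ⇒, the seven equations with auxiliary variables numbered n, in the compiler's FORTRAN-like notation:

```python
def expand_implication(n: int, r_expr: str, s_expr: str):
    """Emit the seven equations that replace the implication
    r_expr => s_expr, numbering the auxiliary variables by n
    as the compiler does."""
    r, s, t, u, v, w, x, y, z = (name + str(n) for name in "rstuvwxyz")
    return [f"{r} = {r_expr}",
            f"{s} = {s_expr}",
            f"{t} = 2**{s}",
            f"(1+{t})**{s} = {v}*{t}**({r}+1) + {u}*{t}**{r} + {w}",
            f"{w}+{x}+1 = {t}**{r}",
            f"{u}+{y}+1 = {t}",
            f"{u} = 2*{z}+1"]

# e.g. the implication (imp.8) of Section 2.6, q * L3 => L4 + L2:
eqs = expand_implication(8, "q*L3", "L4+L2")
```

The real compiler also transposes negative terms to the other side when an expression like q.minus.1 * i − q.minus.1 * L1 appears, as can be seen in Section 2.7; this sketch handles only the non-negative case.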
The last thing that the compiler does is to take each of the 279
equations that appear indented in Section 2.7 and fold it into the left-
hand side and right-hand side of the final equation. This is done using
the “sum of squares” technique: x = y adds x^2 + y^2 to the left-hand
side and 2xy to the right-hand side. Section 2.8 is the resulting left-
hand side, and Section 2.9 is the right-hand side; the final equation is
2.4 Variables Used in Arithmetization
ic (vector)
This is a vector giving the label of the instruction being executed
at any given time. I.e., if at time t the instruction LABEL is
executed, then the base-q digit of ic corresponding to q^t is the
binary representation of the S-expression (LABEL).
next.ic (vector)
This is a vector giving the label of the next instruction to be ex-
ecuted. I.e., if at time t + 1 the instruction LABEL is executed,
then the base-q digit of next.ic corresponding to q^t is the binary
representation of the S-expression (LABEL).
longest.label (scalar)
This is the number of characters in the longest label of any in-
struction in the program.
number.of.instructions (scalar)
This is the total number of instructions in the program.
REGISTER (vector)
This is a vector giving the contents of REGISTER as a function
of time. I.e., the base-q digit corresponding to q^t is the contents
of REGISTER at time t.
char.REGISTER (vector)
This is a vector giving the first character (i.e., the rightmost 8
bits) in each register as a function of time. I.e., the base-q digit
corresponding to q^t is the number between 0 and 255 that repre-
sents the first character in REGISTER at time t.
shift.REGISTER (vector)
This is a vector giving the 8-bit right shift of each register as a
function of time. I.e., the base-q digit corresponding to q^t is the
input.REGISTER (scalar)
This is the initial contents of REGISTER.
output.REGISTER (scalar)
This is the final contents of REGISTER.
uNUMBER
The binomial coefficient used in expanding the NUMBERth im-
plication.
vNUMBER
A junk variable used in expanding the NUMBERth implication.
wNUMBER
A junk variable used in expanding the NUMBERth implication.
xNUMBER
A junk variable used in expanding the NUMBERth implication.
yNUMBER
A junk variable used in expanding the NUMBERth implication.
zNUMBER
A junk variable used in expanding the NUMBERth implication.
L1: GOTO L1
L2: JUMP C L1
L3: GOBACK C
L4: NEQ A C’a’ L1
L5: NEQ A B L1
L6: EQ A C’b’ L1
L7: EQ A B L1
L8: OUT C
L9: DUMP
L10: HALT
L11: SET A C’a’
L12: SET A B
L13: RIGHT C
L14: LEFT A C’b’
L15: LEFT A B
L16: HALT
(imp.1) L1 => i
(imp.2) L2 => i
(imp.3) L3 => i
(imp.4) L4 => i
(imp.5) L5 => i
(imp.6) L6 => i
(imp.7) L7 => i
(imp.8) L8 => i
(imp.9) L9 => i
(imp.10) L10 => i
(imp.11) L11 => i
(imp.12) L12 => i
(imp.13) L13 => i
(imp.14) L14 => i
(imp.15) L15 => i
(imp.16) L16 => i
(eq.7) i = L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9 +
L10 + L11 + L12 + L13 + L14 + L15 + L16
L1: GOTO L1
(imp.18) q * L1 => L1
L2: JUMP C L1
(imp.19) q * L2 => L1
L3: GOBACK C
(imp.26) q * L4 => L5 + L1
(imp.27) q * L4 => L1 + q * eq.A.C’a’
L5: NEQ A B L1
(imp.28) q * L5 => L6 + L1
(imp.29) q * L5 => L1 + q * eq.A.B
L6: EQ A C’b’ L1
(imp.30) q * L6 => L7 + L1
(imp.31) q * L6 => L7 + q * eq.A.C’b’
L7: EQ A B L1
(imp.32) q * L7 => L8 + L1
(imp.33) q * L7 => L8 + q * eq.A.B
L8: OUT C
(imp.34) q * L8 => L9
L9: DUMP
L10: HALT
L12: SET A B
L13: RIGHT C
L15: LEFT A B
L16: HALT
In other words,
(eq.9) ic = 1075605632 * L1 + 1083994240 * L2 +
1079799936 * L3 + 1088188544 * L4 + 1077702784 *
L5 + 1086091392 * L6 + 1081897088 * L7 +
1090285696 * L8 + 1073901696 * L9 + 278839193728 *
L10 + 275349532800 * L11 + 277497016448 * L12 +
276423274624 * L13 + 278570758272 * L14 +
275886403712 * L15 + 278033887360 * L16
L1: GOTO L1
L2: JUMP C L1
L3: GOBACK C
L5: NEQ A B L1
L6: EQ A C’b’ L1
L7: EQ A B L1
L8: OUT C
L9: DUMP
L10: HALT
L12: SET A B
L13: RIGHT C
L15: LEFT A B
L16: HALT
Register A ...................................................
Register B ...................................................
Register C ...................................................
Compare A B ..................................................
Register variables:
A B C
Label variables:
L1 L10 L11 L12 L13 L14 L15 L16 L2 L3 L4 L5 L6 L7
L8 L9
Auxiliary variables:
char.A char.B dont.set.A dont.set.B dont.set.C
eq.A.B eq.A.C’a’ eq.A.C’b’ ge.A.B ge.A.C’a’
ge.A.C’b’ ge.B.A ge.C’a’.A ge.C’b’.A goback.L3 i
ic input.A input.B input.C longest.label next.ic
number.of.instructions output.A output.B output.C
q q.minus.1 set.A set.A.L11 set.A.L12 set.A.L14
set.A.L15 set.B set.B.L15 set.C set.C.L13 set.C.L2
shift.A shift.B shift.C time total.input
(imp.1) L1 => i
(imp.2) L2 => i
(imp.3) L3 => i
(imp.4) L4 => i
(eq.7) i = L1 + L2 + L3 + L4
(imp.6) q * L1 => L2
L2: LEFT B A
(imp.7) q * L2 => L3
(imp.8) q * L3 => L4 + L2
(imp.9) q * L3 => L2 + q * eq.A.X’00’
L4: HALT
L2: LEFT B A
L4: HALT
Register A ...................................................
(eq.10) set.A = L2
(imp.20) dont.set.A => A
(imp.21) dont.set.A => q.minus.1 * i - q.minus.1 * set.A
(imp.22) A => dont.set.A + q.minus.1 * set.A
Register B ...................................................
(eq.13) set.B = L1 + L2
(imp.27) dont.set.B => B
(imp.28) dont.set.B => q.minus.1 * i - q.minus.1 * set.B
(imp.29) B => dont.set.B + q.minus.1 * set.B
Note: X’00’ is 0
Register variables:
A B
Label variables:
L1 L2 L3 L4
Auxiliary variables:
char.A dont.set.A dont.set.B eq.A.X’00’ ge.A.X’00’
ge.X’00’.A i input.A input.B longest.label
number.of.instructions output.A output.B q
q.minus.1 set.A set.A.L2 set.B set.B.L1 set.B.L2
shift.A time total.input
2.7 Expansion of ⇒’s
L3 => i
r3 = L3
s3 = i
t3 = 2**s3
(1+t3)**s3 = v3*t3**(r3+1) + u3*t3**r3 + w3
w3+x3+1 = t3**r3
u3+y3+1 = t3
u3 = 2*z3+ 1
L4 => i
r4 = L4
s4 = i
t4 = 2**s4
(1+t4)**s4 = v4*t4**(r4+1) + u4*t4**r4 + w4
w4+x4+1 = t4**r4
u4+y4+1 = t4
u4 = 2*z4+ 1
i = L1 + L2 + L3 + L4
i = L1+L2+L3+L4
1 => L1
r5 = 1
s5 = L1
t5 = 2**s5
(1+t5)**s5 = v5*t5**(r5+1) + u5*t5**r5 + w5
w5+x5+1 = t5**r5
u5+y5+1 = t5
u5 = 2*z5+ 1
q ** time = q * L4
q**time = q*L4
q * L1 => L2
r6 = q*L1
s6 = L2
t6 = 2**s6
(1+t6)**s6 = v6*t6**(r6+1) + u6*t6**r6 + w6
w6+x6+1 = t6**r6
u6+y6+1 = t6
u6 = 2*z6+1
q * L2 => L3
r7 = q*L2
s7 = L3
t7 = 2**s7
(1+t7)**s7 = v7*t7**(r7+1) + u7*t7**r7 + w7
w7+x7+1 = t7**r7
u7+y7+1 = t7
u7 = 2*z7+1
q * L3 => L4 + L2
r8 = q*L3
s8 = L4+L2
t8 = 2**s8
(1+t8)**s8 = v8*t8**(r8+1) + u8*t8**r8 + w8
w8+x8+1 = t8**r8
u8+y8+1 = t8
u8 = 2*z8+1
q * L3 => L2 + q * eq.A.X’00’
r9 = q*L3
s9 = L2+q*eq.A.X’00’
t9 = 2**s9
(1+t9)**s9 = v9*t9**(r9+1) + u9*t9**r9 + w9
w9+x9+1 = t9**r9
u9+y9+1 = t9
u9 = 2*z9+1
set.B.L1 => 0 * i
r10 = set.B.L1
s10 = 0*i
t10 = 2**s10
(1+t10)**s10 = v10*t10**(r10+1) + u10*t10**r10 + w10
w10+x10+1 = t10**r10
u10+y10+1 = t10
u10 = 2*z10+1
set.B.L1 => q.minus.1 * L1
r11 = set.B.L1
s11 = q.minus.1*L1
t11 = 2**s11
(1+t11)**s11 = v11*t11**(r11+1) + u11*t11**r11 + w11
w11+x11+1 = t11**r11
u11+y11+1 = t11
u11 = 2*z11+1
0 * i => set.B.L1 + q.minus.1 * i - q.minus.1 * L1
r12 = 0*i
CHAPTER 2. REGISTER MACHINES
s12+q.minus.1*L1 = set.B.L1+q.minus.1*i
t12 = 2**s12
(1+t12)**s12 = v12*t12**(r12+1) + u12*t12**r12 + w12
w12+x12+1 = t12**r12
u12+y12+1 = t12
u12 = 2*z12+1
set.B.L2 => 256 * B + char.A
r13 = set.B.L2
s13 = 256*B+char.A
t13 = 2**s13
(1+t13)**s13 = v13*t13**(r13+1) + u13*t13**r13 + w13
w13+x13+1 = t13**r13
u13+y13+1 = t13
u13 = 2*z13+1
set.B.L2 => q.minus.1 * L2
r14 = set.B.L2
s14 = q.minus.1*L2
t14 = 2**s14
(1+t14)**s14 = v14*t14**(r14+1) + u14*t14**r14 + w14
w14+x14+1 = t14**r14
u14+y14+1 = t14
u14 = 2*z14+1
256 * B + char.A => set.B.L2 + q.minus.1 * i - q.minus.1 * L2
r15 = 256*B+char.A
s15+q.minus.1*L2 = set.B.L2+q.minus.1*i
t15 = 2**s15
(1+t15)**s15 = v15*t15**(r15+1) + u15*t15**r15 + w15
w15+x15+1 = t15**r15
u15+y15+1 = t15
u15 = 2*z15+1
set.A.L2 => shift.A
r16 = set.A.L2
s16 = shift.A
t16 = 2**s16
(1+t16)**s16 = v16*t16**(r16+1) + u16*t16**r16 + w16
w16+x16+1 = t16**r16
u16+y16+1 = t16
u16 = 2*z16+1
set.A.L2 => q.minus.1 * L2
r17 = set.A.L2
s17 = q.minus.1*L2
t17 = 2**s17
(1+t17)**s17 = v17*t17**(r17+1) + u17*t17**r17 + w17
w17+x17+1 = t17**r17
u17+y17+1 = t17
u17 = 2*z17+1
shift.A => set.A.L2 + q.minus.1 * i - q.minus.1 * L2
r18 = shift.A
s18+q.minus.1*L2 = set.A.L2+q.minus.1*i
t18 = 2**s18
(1+t18)**s18 = v18*t18**(r18+1) + u18*t18**r18 + w18
w18+x18+1 = t18**r18
u18+y18+1 = t18
u18 = 2*z18+1
A => q.minus.1 * i
r19 = A
s19 = q.minus.1*i
t19 = 2**s19
(1+t19)**s19 = v19*t19**(r19+1) + u19*t19**r19 + w19
w19+x19+1 = t19**r19
u19+y19+1 = t19
u19 = 2*z19+1
A + output.A * q ** time = input.A + q * set.A.L2 + q * dont.set.A
A+output.A*q**time = input.A+q*set.A.L2+q*dont.set.A
set.A = L2
set.A = L2
dont.set.A => A
r20 = dont.set.A
s20 = A
t20 = 2**s20
(1+t20)**s20 = v20*t20**(r20+1) + u20*t20**r20 + w20
w20+x20+1 = t20**r20
u20+y20+1 = t20
u20 = 2*z20+1
dont.set.A => q.minus.1 * i - q.minus.1 * set.A
r21 = dont.set.A
s21+q.minus.1*set.A = q.minus.1*i
t21 = 2**s21
(1+t21)**s21 = v21*t21**(r21+1) + u21*t21**r21 + w21
w21+x21+1 = t21**r21
u21+y21+1 = t21
u21 = 2*z21+1
A => dont.set.A + q.minus.1 * set.A
r22 = A
s22 = dont.set.A+q.minus.1*set.A
t22 = 2**s22
(1+t22)**s22 = v22*t22**(r22+1) + u22*t22**r22 + w22
w22+x22+1 = t22**r22
u22+y22+1 = t22
u22 = 2*z22+1
256 * shift.A => A
r23 = 256*shift.A
s23 = A
t23 = 2**s23
(1+t23)**s23 = v23*t23**(r23+1) + u23*t23**r23 + w23
w23+x23+1 = t23**r23
u23+y23+1 = t23
u23 = 2*z23+1
256 * shift.A => q.minus.1 * i - 255 * i
r24 = 256*shift.A
s24+255*i = q.minus.1*i
t24 = 2**s24
(1+t24)**s24 = v24*t24**(r24+1) + u24*t24**r24 + w24
w24+x24+1 = t24**r24
u24+y24+1 = t24
u24 = 2*z24+1
A => 256 * shift.A + 255 * i
r25 = A
s25 = 256*shift.A+255*i
t25 = 2**s25
(1+t25)**s25 = v25*t25**(r25+1) + u25*t25**r25 + w25
w25+x25+1 = t25**r25
u25+y25+1 = t25
u25 = 2*z25+1
A = 256 * shift.A + char.A
A = 256*shift.A+char.A
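The equation A = 256 * shift.A + char.A treats the register as a character string stored in base 256: char.A is the 8-bit character at the low-order end of the register, and shift.A is what remains after that character is shifted off. In Python terms (a sketch):

```python
def split_register(A):
    # A = 256 * shift.A + char.A : peel off the low-order
    # 8-bit character, keeping the rest of the string
    shift_A, char_A = divmod(A, 256)
    return shift_A, char_A

# a two-character register 0x6162: the low-order byte 0x62
# is the character shifted off, 0x61 is what remains
assert split_register(0x6162) == (0x61, 0x62)
```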
B => q.minus.1 * i
r26 = B
s26 = q.minus.1*i
t26 = 2**s26
(1+t26)**s26 = v26*t26**(r26+1) + u26*t26**r26 + w26
w26+x26+1 = t26**r26
u26+y26+1 = t26
u26 = 2*z26+1
B + output.B * q ** time = input.B + q * set.B.L1 + q * set.B.L2 + q * dont.set.B
B+output.B*q**time = input.B+q*set.B.L1+q*set.B.L2+q*dont.set.B
set.B = L1 + L2
set.B = L1+L2
dont.set.B => B
r27 = dont.set.B
s27 = B
t27 = 2**s27
(1+t27)**s27 = v27*t27**(r27+1) + u27*t27**r27 + w27
w27+x27+1 = t27**r27
u27+y27+1 = t27
u27 = 2*z27+1
dont.set.B => q.minus.1 * i - q.minus.1 * set.B
r28 = dont.set.B
s28+q.minus.1*set.B = q.minus.1*i
t28 = 2**s28
(1+t28)**s28 = v28*t28**(r28+1) + u28*t28**r28 + w28
w28+x28+1 = t28**r28
u28+y28+1 = t28
u28 = 2*z28+1
B => dont.set.B + q.minus.1 * set.B
r29 = B
s29 = dont.set.B+q.minus.1*set.B
t29 = 2**s29
(1+t29)**s29 = v29*t29**(r29+1) + u29*t29**r29 + w29
w29+x29+1 = t29**r29
u29+y29+1 = t29
u29 = 2*z29+1
ge.A.X’00’ => i
r30 = ge.A.X’00’
s30 = i
t30 = 2**s30
(1+t30)**s30 = v30*t30**(r30+1) + u30*t30**r30 + w30
w30+x30+1 = t30**r30
u30+y30+1 = t30
u30 = 2*z30+1
256 * ge.A.X’00’ => 256 * i + char.A - 0 * i
r31 = 256*ge.A.X’00’
s31+0*i = 256*i+char.A
t31 = 2**s31
(1+t31)**s31 = v31*t31**(r31+1) + u31*t31**r31 + w31
w31+x31+1 = t31**r31
u31+y31+1 = t31
u31 = 2*z31+1
256 * i + char.A - 0 * i => 256 * ge.A.X’00’ + 255 * i
r32+0*i = 256*i+char.A
s32 = 256*ge.A.X’00’+255*i
t32 = 2**s32
(1+t32)**s32 = v32*t32**(r32+1) + u32*t32**r32 + w32
w32+x32+1 = t32**r32
u32+y32+1 = t32
u32 = 2*z32+1
ge.X’00’.A => i
r33 = ge.X’00’.A
s33 = i
t33 = 2**s33
(1+t33)**s33 = v33*t33**(r33+1) + u33*t33**r33 + w33
w33+x33+1 = t33**r33
u33+y33+1 = t33
u33 = 2*z33+1
256 * ge.X’00’.A => 256 * i + 0 * i - char.A
r34 = 256*ge.X’00’.A
s34+char.A = 256*i+0*i
t34 = 2**s34
(1+t34)**s34 = v34*t34**(r34+1) + u34*t34**r34 + w34
w34+x34+1 = t34**r34
u34+y34+1 = t34
u34 = 2*z34+1
2.8. LEFT-HAND SIDE
.input+time+number.of.instructions+longest.label+3))**2 + (q.m
inus.1+1)**2+(q)**2 + (1+q*i)**2+(i+q**time)**2 + (r1)**2+(L1)
**2 + (s1)**2+(i)**2 + (t1)**2+(2**s1)**2 + ((1+t1)**s1)**2+(v
1*t1**(r1+1)+u1*t1**r1+w1)**2 + (w1+x1+1)**2+(t1**r1)**2 + (u1
+y1+1)**2+(t1)**2 + (u1)**2+(2*z1+1)**2 + (r2)**2+(L2)**2 + (s
2)**2+(i)**2 + (t2)**2+(2**s2)**2 + ((1+t2)**s2)**2+(v2*t2**(r
2+1)+u2*t2**r2+w2)**2 + (w2+x2+1)**2+(t2**r2)**2 + (u2+y2+1)**
2+(t2)**2 + (u2)**2+(2*z2+1)**2 + (r3)**2+(L3)**2 + (s3)**2+(i
)**2 + (t3)**2+(2**s3)**2 + ((1+t3)**s3)**2+(v3*t3**(r3+1)+u3*
t3**r3+w3)**2 + (w3+x3+1)**2+(t3**r3)**2 + (u3+y3+1)**2+(t3)**
2 + (u3)**2+(2*z3+1)**2 + (r4)**2+(L4)**2 + (s4)**2+(i)**2 + (
t4)**2+(2**s4)**2 + ((1+t4)**s4)**2+(v4*t4**(r4+1)+u4*t4**r4+w
4)**2 + (w4+x4+1)**2+(t4**r4)**2 + (u4+y4+1)**2+(t4)**2 + (u4)
**2+(2*z4+1)**2 + (i)**2+(L1+L2+L3+L4)**2 + (r5)**2+(1)**2 + (
s5)**2+(L1)**2 + (t5)**2+(2**s5)**2 + ((1+t5)**s5)**2+(v5*t5**
(r5+1)+u5*t5**r5+w5)**2 + (w5+x5+1)**2+(t5**r5)**2 + (u5+y5+1)
**2+(t5)**2 + (u5)**2+(2*z5+1)**2 + (q**time)**2+(q*L4)**2 + (
r6)**2+(q*L1)**2 + (s6)**2+(L2)**2 + (t6)**2+(2**s6)**2 + ((1+
t6)**s6)**2+(v6*t6**(r6+1)+u6*t6**r6+w6)**2 + (w6+x6+1)**2+(t6
**r6)**2 + (u6+y6+1)**2+(t6)**2 + (u6)**2+(2*z6+1)**2 + (r7)**
2+(q*L2)**2 + (s7)**2+(L3)**2 + (t7)**2+(2**s7)**2 + ((1+t7)**
s7)**2+(v7*t7**(r7+1)+u7*t7**r7+w7)**2 + (w7+x7+1)**2+(t7**r7)
**2 + (u7+y7+1)**2+(t7)**2 + (u7)**2+(2*z7+1)**2 + (r8)**2+(q*
L3)**2 + (s8)**2+(L4+L2)**2 + (t8)**2+(2**s8)**2 + ((1+t8)**s8
)**2+(v8*t8**(r8+1)+u8*t8**r8+w8)**2 + (w8+x8+1)**2+(t8**r8)**
2 + (u8+y8+1)**2+(t8)**2 + (u8)**2+(2*z8+1)**2 + (r9)**2+(q*L3
)**2 + (s9)**2+(L2+q*eq.A.X’00’)**2 + (t9)**2+(2**s9)**2 + ((1
+t9)**s9)**2+(v9*t9**(r9+1)+u9*t9**r9+w9)**2 + (w9+x9+1)**2+(t
9**r9)**2 + (u9+y9+1)**2+(t9)**2 + (u9)**2+(2*z9+1)**2 + (r10)
**2+(set.B.L1)**2 + (s10)**2+(0*i)**2 + (t10)**2+(2**s10)**2 +
((1+t10)**s10)**2+(v10*t10**(r10+1)+u10*t10**r10+w10)**2 + (w
10+x10+1)**2+(t10**r10)**2 + (u10+y10+1)**2+(t10)**2 + (u10)**
2+(2*z10+1)**2 + (r11)**2+(set.B.L1)**2 + (s11)**2+(q.minus.1*
L1)**2 + (t11)**2+(2**s11)**2 + ((1+t11)**s11)**2+(v11*t11**(r
11+1)+u11*t11**r11+w11)**2 + (w11+x11+1)**2+(t11**r11)**2 + (u
11+y11+1)**2+(t11)**2 + (u11)**2+(2*z11+1)**2 + (r12)**2+(0*i)
**2 + (s12+q.minus.1*L1)**2+(set.B.L1+q.minus.1*i)**2 + (t12)*
*2+(2**s12)**2 + ((1+t12)**s12)**2+(v12*t12**(r12+1)+u12*t12**
r12+w12)**2 + (w12+x12+1)**2+(t12**r12)**2 + (u12+y12+1)**2+(t
*2+(v23*t23**(r23+1)+u23*t23**r23+w23)**2 + (w23+x23+1)**2+(t2
3**r23)**2 + (u23+y23+1)**2+(t23)**2 + (u23)**2+(2*z23+1)**2 +
(r24)**2+(256*shift.A)**2 + (s24+255*i)**2+(q.minus.1*i)**2 +
(t24)**2+(2**s24)**2 + ((1+t24)**s24)**2+(v24*t24**(r24+1)+u2
4*t24**r24+w24)**2 + (w24+x24+1)**2+(t24**r24)**2 + (u24+y24+1
)**2+(t24)**2 + (u24)**2+(2*z24+1)**2 + (r25)**2+(A)**2 + (s25
)**2+(256*shift.A+255*i)**2 + (t25)**2+(2**s25)**2 + ((1+t25)*
*s25)**2+(v25*t25**(r25+1)+u25*t25**r25+w25)**2 + (w25+x25+1)*
*2+(t25**r25)**2 + (u25+y25+1)**2+(t25)**2 + (u25)**2+(2*z25+1
)**2 + (A)**2+(256*shift.A+char.A)**2 + (r26)**2+(B)**2 + (s26
)**2+(q.minus.1*i)**2 + (t26)**2+(2**s26)**2 + ((1+t26)**s26)*
*2+(v26*t26**(r26+1)+u26*t26**r26+w26)**2 + (w26+x26+1)**2+(t2
6**r26)**2 + (u26+y26+1)**2+(t26)**2 + (u26)**2+(2*z26+1)**2 +
(B+output.B*q**time)**2+(input.B+q*set.B.L1+q*set.B.L2+q*dont
.set.B)**2 + (set.B)**2+(L1+L2)**2 + (r27)**2+(dont.set.B)**2
+ (s27)**2+(B)**2 + (t27)**2+(2**s27)**2 + ((1+t27)**s27)**2+(
v27*t27**(r27+1)+u27*t27**r27+w27)**2 + (w27+x27+1)**2+(t27**r
27)**2 + (u27+y27+1)**2+(t27)**2 + (u27)**2+(2*z27+1)**2 + (r2
8)**2+(dont.set.B)**2 + (s28+q.minus.1*set.B)**2+(q.minus.1*i)
**2 + (t28)**2+(2**s28)**2 + ((1+t28)**s28)**2+(v28*t28**(r28+
1)+u28*t28**r28+w28)**2 + (w28+x28+1)**2+(t28**r28)**2 + (u28+
y28+1)**2+(t28)**2 + (u28)**2+(2*z28+1)**2 + (r29)**2+(B)**2 +
(s29)**2+(dont.set.B+q.minus.1*set.B)**2 + (t29)**2+(2**s29)*
*2 + ((1+t29)**s29)**2+(v29*t29**(r29+1)+u29*t29**r29+w29)**2
+ (w29+x29+1)**2+(t29**r29)**2 + (u29+y29+1)**2+(t29)**2 + (u2
9)**2+(2*z29+1)**2 + (r30)**2+(ge.A.X’00’)**2 + (s30)**2+(i)**
2 + (t30)**2+(2**s30)**2 + ((1+t30)**s30)**2+(v30*t30**(r30+1)
+u30*t30**r30+w30)**2 + (w30+x30+1)**2+(t30**r30)**2 + (u30+y3
0+1)**2+(t30)**2 + (u30)**2+(2*z30+1)**2 + (r31)**2+(256*ge.A.
X’00’)**2 + (s31+0*i)**2+(256*i+char.A)**2 + (t31)**2+(2**s31)
**2 + ((1+t31)**s31)**2+(v31*t31**(r31+1)+u31*t31**r31+w31)**2
+ (w31+x31+1)**2+(t31**r31)**2 + (u31+y31+1)**2+(t31)**2 + (u
31)**2+(2*z31+1)**2 + (r32+0*i)**2+(256*i+char.A)**2 + (s32)**
2+(256*ge.A.X’00’+255*i)**2 + (t32)**2+(2**s32)**2 + ((1+t32)*
*s32)**2+(v32*t32**(r32+1)+u32*t32**r32+w32)**2 + (w32+x32+1)*
*2+(t32**r32)**2 + (u32+y32+1)**2+(t32)**2 + (u32)**2+(2*z32+1
)**2 + (r33)**2+(ge.X’00’.A)**2 + (s33)**2+(i)**2 + (t33)**2+(
2**s33)**2 + ((1+t33)**s33)**2+(v33*t33**(r33+1)+u33*t33**r33+
w33)**2 + (w33+x33+1)**2+(t33**r33)**2 + (u33+y33+1)**2+(t33)*
*2 + (u33)**2+(2*z33+1)**2 + (r34)**2+(256*ge.X’00’.A)**2 + (s
34+char.A)**2+(256*i+0*i)**2 + (t34)**2+(2**s34)**2 + ((1+t34)
**s34)**2+(v34*t34**(r34+1)+u34*t34**r34+w34)**2 + (w34+x34+1)
**2+(t34**r34)**2 + (u34+y34+1)**2+(t34)**2 + (u34)**2+(2*z34+
1)**2 + (r35+char.A)**2+(256*i+0*i)**2 + (s35)**2+(256*ge.X’00
’.A+255*i)**2 + (t35)**2+(2**s35)**2 + ((1+t35)**s35)**2+(v35*
t35**(r35+1)+u35*t35**r35+w35)**2 + (w35+x35+1)**2+(t35**r35)*
*2 + (u35+y35+1)**2+(t35)**2 + (u35)**2+(2*z35+1)**2 + (r36)**
2+(eq.A.X’00’)**2 + (s36)**2+(i)**2 + (t36)**2+(2**s36)**2 + (
(1+t36)**s36)**2+(v36*t36**(r36+1)+u36*t36**r36+w36)**2 + (w36
+x36+1)**2+(t36**r36)**2 + (u36+y36+1)**2+(t36)**2 + (u36)**2+
(2*z36+1)**2 + (r37)**2+(2*eq.A.X’00’)**2 + (s37)**2+(ge.A.X’0
0’+ge.X’00’.A)**2 + (t37)**2+(2**s37)**2 + ((1+t37)**s37)**2+(
v37*t37**(r37+1)+u37*t37**r37+w37)**2 + (w37+x37+1)**2+(t37**r
37)**2 + (u37+y37+1)**2+(t37)**2 + (u37)**2+(2*z37+1)**2 + (r3
8)**2+(ge.A.X’00’+ge.X’00’.A)**2 + (s38)**2+(2*eq.A.X’00’+i)**
2 + (t38)**2+(2**s38)**2 + ((1+t38)**s38)**2+(v38*t38**(r38+1)
+u38*t38**r38+w38)**2 + (w38+x38+1)**2+(t38**r38)**2 + (u38+y3
8+1)**2+(t38)**2 + (u38)**2+(2*z38+1)**2
3.1 Introduction
In this chapter we present a “permissive” simplified version of pure
LISP designed especially for metamathematical applications. Aside
from the rule that an S-expression must have balanced ()’s, the only
way that an expression can fail to have a value is by looping forever.
This is important because algorithms that simulate other algorithms
chosen at random must be able to run garbage safely.
This version of LISP developed from one originally designed for
teaching [Chaitin (1976a)]. The language was reduced to its essence
and made as easy to learn as possible, and was actually used in several
university courses. Like APL, this version of LISP is so concise that
one can write it as fast as one thinks. This LISP is so simple that
an interpreter for it can be coded in three hundred and fifty lines of
REXX.
How to read this chapter: This chapter can be quite difficult to
understand, especially if one has never programmed in LISP before.
The correct approach is to read it several times, and to try to work
through all the examples in detail. Initially the material will seem
completely incomprehensible, but all of a sudden the pieces will snap
together into a coherent whole. Alternatively, one can skim Chapters 3,
4, and 5, which depend heavily on the details of this LISP, and proceed
directly to the more theoretical material in Chapter 6, which could be
CHAPTER 3. A VERSION OF PURE LISP
()
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz0123456789
_+-.’,!=*&?/:"$ [the remaining special characters, originally
drawn from the APL character set, are not generally available
and appear in this printing as % signs]
Figure 3.1: The LISP Character Set. These are the 128 characters
that are used in LISP S-expressions: the left and right parentheses and
the 126 one-character atoms. The place that a character appears in this
list of all 128 of them is important; it defines the binary representation
for that character. In this monograph we use two different represen-
tations: (1) The first binary representation uses 8 bits per character,
with the characters in reverse order. The 8-bit string corresponding
to a character is obtained by taking the 1-origin ordinal number of its
position in the list, which ranges from 1 to 128, writing this number as
an 8-bit string in base-two, and then reversing this 8-bit string. This is
the representation used in the exponential diophantine version of the
LISP interpreter in Part I. (2) The second binary representation uses 7
bits per character, with the characters in the normal order. The 7-bit
string corresponding to a character is obtained by taking the 0-origin
ordinal number of its position in the list, which ranges from 0 to 127,
writing this number as a 7-bit string in base-two, and then reversing
this 7-bit string. This is the representation that is used to define a
program-size complexity measure in Part II.
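The two character-to-bit-string mappings described in the caption can be sketched as follows; position stands for the ordinal of the character in the list of Figure 3.1 (1-origin for the first representation, 0-origin for the second), not its ASCII code:

```python
def rep8(position):
    # first representation (Part I): the 1-origin position
    # (1..128) written as an 8-bit base-two numeral, then
    # the 8-bit string is reversed
    assert 1 <= position <= 128
    return format(position, '08b')[::-1]

def rep7(position):
    # second representation (Part II): the 0-origin position
    # (0..127) written as a 7-bit numeral, then reversed
    assert 0 <= position <= 127
    return format(position, '07b')[::-1]

assert rep8(1) == '10000000'    # first character in the list
assert rep8(128) == '00000001'  # last character in the list
assert rep7(0) == '0000000'
assert rep7(127) == '1111111'
```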
3.2. DEFINITION OF LISP
It is important to note that we do not identify 0 and (). It is usual
in LISP to identify falsehood and the empty list; both are usually called
NIL. This would complicate our LISP and make it harder to write the
LISP interpreter that we give in Chapter 4, because it would be harder
to determine if two S-expressions are equal. This would also be a serious
mistake from an information-theoretic point of view, because it would
make large numbers of S-expressions into synonyms. And wasting the
expressive power of S-expressions in this manner would invalidate large
portions of Chapter 5 and Appendix B. Thus there is no single-character
synonym in our LISP for the empty list (); 2 characters are required.
The fundamental semantical concept in LISP is that of the value
of an S-expression in a given environment. An environment consists
of a so-called “association list” in which variables (atoms) and their
values (S-expressions) alternate. If a variable appears several times,
only its first value is significant. If a variable does not appear in the
environment, then it itself is its value, so that it is in effect a literal
constant. (xa x(a) x((a)) F(&(x)(/(.x)x(F(+x))))) is a typical
environment. In this environment the value of x is a, the value of F
is (&(x)(/(.x)x(F(+x)))), and any other atom, for example Q, has
itself as value.
Thus the value of an atomic S-expression is obtained by searching
odd elements of the environment for that atom. What is the value of a
non-atomic S-expression, that is, of a non-empty list? In this case the
value is defined recursively, in terms of the values of the elements of the
S-expression in the same environment. The value of the first element
of the S-expression is the function, and the function’s arguments are
the values of the remaining elements of the expression. Thus in LISP
the notation (fxyz) is used for what in FORTRAN would be written
f(x,y,z). Both denote the function f applied to the arguments x, y, z.
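The rule for looking up the value of an atom in an environment can be sketched in a few lines (representing S-expressions as nested Python lists of one-character strings, which is an assumption of these sketches, not the book's notation):

```python
def lookup(atom, env):
    # An environment is an association list in which variables
    # and their values alternate; if a variable appears several
    # times, only its first (leftmost) value is significant.
    for i in range(0, len(env) - 1, 2):
        if env[i] == atom:
            return env[i + 1]
    return atom  # an unbound atom is in effect a literal constant

# the typical environment from the text: (xa x(a) x((a)) F(...))
env = ['x', 'a', 'x', ['a'], 'x', [['a']], 'F', 'F-definition']
assert lookup('x', env) == 'a'   # the first binding wins
assert lookup('Q', env) == 'Q'   # any other atom is its own value
```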
There are two kinds of functions: primitive functions and defined
• Name Atom
Symbol .
Arguments 1
Explanation The result of applying this function to an argu-
ment is true or false depending on whether or not the argu-
ment is an atom.
Examples (.x) has value 1
(.y) has value 0
• Name Equal
Symbol =
Arguments 2
Explanation The result of applying this function to two argu-
ments is true or false depending on whether or not they are
the same S-expression.
Examples (=wx) has value 1
(=yz) has value 0
• Name Join/CONS
Symbol *
Arguments 2
Explanation If the second argument is not a list, then the result
of applying this function is the first argument. If the second
argument is an n-element list, then the result of applying
this function is the (n + 1)-element list whose head is the
first argument and whose tail is the second argument.
Examples (*xx) has value a
(*x()) has value (a)
(*xy) has value (abcd)
(*xz) has value (a(ef))
(*yz) has value ((bcd)(ef))
• Name Output
Symbol ,
Arguments 1
Explanation The result of applying this function is its argu-
ment, in other words, this is an identity function. The side-
effect is to display the argument. This function is used to
display intermediate results. It is the only primitive function
that has a side-effect.
Examples Evaluation of (-(,(-(,(-y))))) displays (cd) and
(d) and yields value ()
• Name Quote
Symbol ’
Arguments 1
Explanation The result of applying this function is the uneval-
uated argument expression.
Examples (’x) has value x
(’(*xy)) has value (*xy)
• Name If-then-else
Symbol /
Arguments 3
Explanation If the first argument is not false, then the result
is the second argument. If the first argument is false, then
the result is the third argument. The argument that is not
selected is not evaluated.
Examples (/zxy) has value a
(/txy) has value a
(/fxy) has value (bcd)
Evaluation of (/tx(,y)) does not have the side-effect of
displaying (bcd)
• Name Eval
Symbol !
Arguments 1
Explanation The expression that is the value of the argument is
evaluated in an empty environment. This is the only primi-
tive function that is a partial rather than a total function.
Examples (!(’x)) has value x instead of a, because x is evalu-
ated in an empty environment.
(!(’(.x))) has value 1
(!(’((’(&(f)(f)))(’(&()(f)))))) has no value.
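A minimal Python sketch of three of the primitives above, with S-expressions as nested lists of one-character strings (a hypothetical representation; note that the empty list () counts as an atom here, which is what lets the interpreters of Sections 3.4 through 3.6 use the . test to end their recursions):

```python
def is_atom(x):
    # . : true unless the argument is a non-empty list;
    #     in this LISP () also counts as an atom
    return not (isinstance(x, list) and x)

def equal(x, y):
    # = : true iff the arguments are the same S-expression
    return x == y

def join(x, y):
    # * : if the second argument is not a list, the result is
    #     the first argument; otherwise prepend the first
    #     argument to the second
    return [x] + y if isinstance(y, list) else x

# the examples from the text, with x = a, y = (bcd), z = ((ef)):
x, y, z = 'a', ['b', 'c', 'd'], [['e', 'f']]
assert join(x, x) == 'a'                            # (*xx) is a
assert join(x, []) == ['a']                         # (*x()) is (a)
assert join(x, y) == ['a', 'b', 'c', 'd']           # (*xy) is (abcd)
assert join(y, z) == [['b', 'c', 'd'], ['e', 'f']]  # (*yz) is ((bcd)(ef))
```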
Atom . = + - * , ’ / ! ? & :
Arguments 1 2 1 1 2 1 1 3 1 2 2 3
list requires the previous evaluation of all its elements. When evaluation
of the elements of a list is required, this is always done one element at
a time, from left to right.
M-expressions (M stands for “meta”) are S-expressions in which
the parentheses grouping together primitive functions and their argu-
ments are omitted as a convenience for the LISP programmer. See
Figure 3.3. For these purposes, & (“function/del/LAMBDA/define”)
is treated as if it were a primitive function with two arguments, and
: (“LET/is”) is treated as if it were a primitive function with three
arguments. : is another meta-notational abbreviation, but may be
thought of as an additional primitive function. :vde denotes the value
of e in an environment in which v evaluates to the current value of d,
and :(fxyz)de denotes the value of e in an environment in which f
evaluates to (&(xyz)d). More precisely, the M-expression :vde denotes
the S-expression ((’(&(v)e))d), and the M-expression :(fxyz)de de-
notes the S-expression ((’(&(f)e))(’(&(xyz)d))), and similarly for
functions with a different number of arguments.
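The two meta-notational expansions of : can be performed mechanically. A sketch (again with S-expressions as nested Python lists, ’ and & as the atoms quote and lambda; whether v is a list distinguishes :(fxyz)de from :vde):

```python
QUOTE, LAMBDA = "'", "&"

def expand_let(v, d, e):
    # :vde       denotes  (('(&(v)e))d)
    # :(fxyz)de  denotes  (('(&(f)e))('(&(xyz)d)))
    if isinstance(v, list):                  # function definition
        f, params = v[0], v[1:]
        return [[QUOTE, [LAMBDA, [f], e]],
                [QUOTE, [LAMBDA, params, d]]]
    return [[QUOTE, [LAMBDA, [v], e]], d]

assert expand_let('v', 'd', 'e') == [["'", ["&", ['v'], 'e']], 'd']
assert expand_let(['f', 'x', 'y', 'z'], 'd', 'e') == \
    [["'", ["&", ['f'], 'e']], ["'", ["&", ['x', 'y', 'z'], 'd']]]
```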
A " is written before a self-contained portion of an M-expression
to indicate that the convention regarding invisible parentheses and the
meaning of : does not apply within it, i.e., that there follows an S-
expression “as is”.
Input to the LISP interpreter consists of a list of M-expressions.
All blanks are ignored, and comments may be inserted anywhere by
placing them between balanced [’s and ]’s, so that comments may
include other comments. Two kinds of M-expressions are read by the
interpreter: expressions to be evaluated, and others that indicate the
environment to be used for these evaluations. The initial environment
is the empty list ().
Each M-expression is transformed into the corresponding S-expres-
sion and displayed:
3.3 Examples
Here are five elementary examples of expressions and their values.
(*a(*b(*c())))
(+(-(-(-(’(abcde))))))
(*+(*=(*-())))
• The M-expression
(’&(xyz)*z*y*x()abc)
((’(&(xyz)(*z(*y(*x())))))abc)
• The M-expression
:(Cxy)/.xy*+x(C-xy)(C’(abcdef)’(ghijkl))
((’(&(C)(C(’(abcdef))(’(ghijkl)))))
(’(&(xy)(/(.x)y(*(+x)(C(-x)y))))))
and one of the evaluations took 720 million simulated register machine
cycles!¹
The program in Section 3.5 defines a conventional LISP with atoms
that may be any number of characters long. This example makes an
important point, which is that if our LISP with one-character atoms can
simulate a normal LISP with multi-character atoms, then the restriction
on the size of names is not of theoretical importance: any function that
can be defined using long names can also be defined using our one-
character names. In other words, Section 3.5 proves that our LISP is
computationally universal, and can define any computable function. In
practice the one-character restriction is not too serious, because one
style of using names is to give them only local significance, and then
names can be reused within a large function definition.²
The third and final example of LISP in LISP in this chapter, Section
3.6, is the most serious one of all. It is essentially a complete definition
of the semantics of our version of pure LISP, including ! and ?. Almost,
but not quite. We cheat in two ways:
(1) First of all, the top level of our LISP does not run under a time
limit, and the definition of LISP in LISP in Section 3.6 omits
this, and always imposes time limits on evaluations. We ought to
reserve a special internal time limit value to mean no limit; the
LISP interpreter given in Chapter 4 uses the underscore sign for
this purpose.
(2) Secondly, Section 3.6 reserves a special value, the dollar sign, as an
error value. This is of course cheating; we ought to return an atom
if there is an error, and the good value wrapped in parentheses if
there is no error, but this would complicate the definition of LISP
in LISP given in Section 3.6. The LISP interpreter in Chapter 4
uses an illegal S-expression consisting of a single right parenthesis
as the internal error value; no valid S-expression can begin with
a right parenthesis.
¹ All the other LISP interpreter runs shown in this book were run directly on a large mainframe computer, not on a simulated register machine; see Appendix A for details.
² Allowing long names would make it harder to program the LISP interpreter on a register machine, which we do in Chapter 4.
But except for these two “cheats,” we take Section 3.6 to be our
official definition of the semantics of our LISP. One can immediately
deduce from the definition given in Section 3.6 a number of important
details about the way our LISP achieves its “permissiveness.” Most
important, extra arguments to functions are ignored, and empty lists
are supplied for missing arguments. E.g., parameters in a function def-
inition which are not supplied with an argument expression when the
function is applied will be bound to the empty list (). This works
because when EVAL runs off the end of a list of arguments, it
is reduced to the empty argument list, and head and tail applied to
this empty list will continue to give the empty list. Also if an atom
is repeated in the parameter list of a function definition, the binding
corresponding to the first occurrence will shadow the later occurrences
of the same variable. Section 3.6 is a complete definition of LISP se-
mantics in the sense that there are no hidden error messages and error
checks in it: it performs exactly as written on what would normally
be considered “erroneous” expressions. Of course, in our LISP there
are no erroneous expressions, only expressions that fail to have a value
because the interpreter never finishes evaluating them: it goes into an
infinite loop and never returns a value.
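The permissive behavior just described can be summarized in a few lines. A sketch: that head and tail also leave atoms unchanged is an assumption carried over from the rest of the chapter; the text above only asserts it for the empty list:

```python
def head(x):
    # permissive CAR: () (and, we assume, atoms) come back
    # unchanged, so running off the end of a list keeps
    # yielding the empty list
    return x[0] if isinstance(x, list) and x else x

def tail(x):
    # permissive CDR, same convention
    return x[1:] if isinstance(x, list) and x else x

assert head([]) == [] and tail([]) == []
assert head(['a', 'b']) == 'a' and tail(['a', 'b']) == ['b']
# a parameter whose argument is missing: head of an exhausted
# argument list is ()
assert head(tail([])) == []
```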
That concludes Chapter 3. What lies ahead in Chapter 4? In the
next chapter we re-write the LISP program of Section 3.6 as a register
machine program, and then compile it into an exponential diophantine
equation. The one-page LISP function definition in Section 3.6 becomes
a 308-instruction register machine LISP interpreter, and then a 308 +
19 + 448 + 16281 = 17056-variable equation with a left-hand side and a
right-hand side each about half a million characters long. This equation
is a LISP interpreter, and in theory it can be used to get the values of S-
expressions. In Part II the crucial property of this equation is that it has
a variable input.EXPRESSION, it has exactly one solution if the LISP
S-expression with binary representation³ input.EXPRESSION has a
value, and it has no solution if input.EXPRESSION does not have a
value. We don’t care what output.VALUE is; we just want to know if
the evaluation eventually terminates.
³ Recall that the binary representation of an S-expression has 8 bits per character with the characters in reverse order (see Figures 2.4 and 3.1).
3.4. LISP IN LISP I
V: (&(se)(/(.s)(/(.e)s(/(=s(+e))(+(-e))(Vs(-(-e)))))(
(’(&(f)(/(=f’)(+(-s))(/(=f.)(.(V(+(-s))e))(/(=f+)(
+(V(+(-s))e))(/(=f-)(-(V(+(-s))e))(/(=f,)(,(V(+(-s
))e))(/(=f=)(=(V(+(-s))e)(V(+(-(-s)))e))(/(=f*)(*(
V(+(-s))e)(V(+(-(-s)))e))(/(=f/)(/(V(+(-s))e)(V(+(
-(-s)))e)(V(+(-(-(-s))))e))(V(+(-(-f)))(,(N(+(-f))
(-s)e)))))))))))))(V(+s)e))))
N: (&(xae)(/(.x)e(*(+x)(*(V(+a)e)(N(-x)(-a)e)))))
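The helper N above builds the new environment: it pairs each parameter with the value, in the old environment, of the corresponding argument expression, and stops as soon as the parameter list is exhausted. A Python sketch, with evaluate standing in for V (a hypothetical stand-in, not the book's code):

```python
def bind(params, args, env, evaluate):
    # N: (&(xae)(/(.x)x... -- pair parameters with argument
    # values, in front of the old environment
    if not params:                   # parameter list exhausted:
        return env                   # extra arguments are ignored
    arg = args[0] if args else []    # missing arguments give ()
    rest = args[1:] if args else []
    return [params[0], evaluate(arg, env)] + \
        bind(params[1:], rest, env, evaluate)

literal = lambda a, e: a             # trivial evaluator for the demo
assert bind(['x', 'y'], ['a', 'b'], ['z', 'c'], literal) == \
    ['x', 'a', 'y', 'b', 'z', 'c']
# a missing argument is bound to the empty list ()
assert bind(['x', 'y'], ['a'], [], literal) == ['x', 'a', 'y', []]
```

Because the pairs are prepended, a repeated parameter's first occurrence comes first in the association list and therefore shadows the later ones, as noted in the previous section.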
F: (&(x)(/(.x)x(F(+x))))
expression (F(’(((ab)c)d)))
value a
cycles 1435274
expression (V(’(F(’(((ab)c)d))))(*(’F)(*F())))
display (x(((ab)c)d)F(&(x)(/(.x)x(F(+x)))))
display (x((ab)c)x(((ab)c)d)F(&(x)(/(.x)x(F(+x)))))
display (x(ab)x((ab)c)x(((ab)c)d)F(&(x)(/(.x)x(F(+x)))))
display (xax(ab)x((ab)c)x(((ab)c)d)F(&(x)(/(.x)x(F(+x)))))
value a
cycles 719668657
/=+s’(ATOM) /.+(V+-se)’(T)’(NIL)
/=+s’(CAR) +(V+-se)
/=+s’(CDR) : x -(V+-se) /.x’(NIL)x
/=+s’(OUT) ,(V+-se)
/=+s’(EQ) /=(V+-se)(V+--se)’(T)’(NIL)
/=+s’(CONS) : x (V+-se) : y (V+--se) /=y’(NIL) *x() *xy
/=+s’(COND) /=’(NIL)(V++-se) (V*+s--se) (V+-+-se)
: f /.++s(V+se)+s [ f is ((LAMBDA)((X)(Y))(BODY)) ]
(V+--f,(N+-f-se)) [ display new environment ]
V: (&(se)(/(.(+s))(/(=s(+e))(+(-e))(Vs(-(-e))))(/(=(+
s)(’(QUOTE)))(+(-s))(/(=(+s)(’(ATOM)))(/(.(+(V(+(-
s))e)))(’(T))(’(NIL)))(/(=(+s)(’(CAR)))(+(V(+(-s))
e))(/(=(+s)(’(CDR)))((’(&(x)(/(.x)(’(NIL))x)))(-(V
(+(-s))e)))(/(=(+s)(’(OUT)))(,(V(+(-s))e))(/(=(+s)
(’(EQ)))(/(=(V(+(-s))e)(V(+(-(-s)))e))(’(T))(’(NIL
)))(/(=(+s)(’(CONS)))((’(&(x)((’(&(y)(/(=y(’(NIL))
)(*x())(*xy))))(V(+(-(-s)))e))))(V(+(-s))e))(/(=(+
s)(’(COND)))(/(=(’(NIL))(V(+(+(-s)))e))(V(*(+s)(-(
-s)))e)(V(+(-(+(-s))))e))((’(&(f)(V(+(-(-f)))(,(N(
+(-f))(-s)e)))))(/(.(+(+s)))(V(+s)e)(+s)))))))))))
))
N: (&(xae)(/(.x)e(*(+x)(*(V(+a)e)(N(-x)(-a)e)))))
[ FIRSTATOM
( LAMBDA ( X )
( COND (( ATOM X ) X )
(( QUOTE T ) ( FIRSTATOM ( CAR X )))))
]
& F ’
((FIRSTATOM)
((LAMBDA) ((X))
((COND) (((ATOM) (X)) (X))
(((QUOTE) (T)) ((FIRSTATOM) ((CAR) (X))))))
)
expression (’((FIRSTATOM)((LAMBDA)((X))((COND)(((ATOM)(X))(X)
)(((QUOTE)(T))((FIRSTATOM)((CAR)(X))))))))
F: ((FIRSTATOM)((LAMBDA)((X))((COND)(((ATOM)(X))(X))(
((QUOTE)(T))((FIRSTATOM)((CAR)(X)))))))
[ APPEND
( LAMBDA ( X Y ) ( COND (( ATOM X ) Y )
(( QUOTE T ) ( CONS ( CAR X )
( APPEND ( CDR X ) Y )))))
]
& C ’
((APPEND)
((LAMBDA) ((X)(Y)) ((COND) (((ATOM) (X)) (Y))
(((QUOTE) (T)) ((CONS) ((CAR) (X))
((APPEND) ((CDR) (X)) (Y))))))
)
expression (’((APPEND)((LAMBDA)((X)(Y))((COND)(((ATOM)(X))(Y)
)(((QUOTE)(T))((CONS)((CAR)(X))((APPEND)((CDR)(X))
(Y))))))))
C: ((APPEND)((LAMBDA)((X)(Y))((COND)(((ATOM)(X))(Y))(
((QUOTE)(T))((CONS)((CAR)(X))((APPEND)((CDR)(X))(Y
)))))))
(V’
((FIRSTATOM) ((QUOTE) ((((A)(B))(C))(D))))
F)
expression (V(’((FIRSTATOM)((QUOTE)((((A)(B))(C))(D)))))F)
display ((X)((((A)(B))(C))(D))(FIRSTATOM)((LAMBDA)((X))((C
OND)(((ATOM)(X))(X))(((QUOTE)(T))((FIRSTATOM)((CAR
)(X)))))))
3.5. LISP IN LISP II
display ((X)(((A)(B))(C))(X)((((A)(B))(C))(D))(FIRSTATOM)(
(LAMBDA)((X))((COND)(((ATOM)(X))(X))(((QUOTE)(T))(
(FIRSTATOM)((CAR)(X)))))))
display ((X)((A)(B))(X)(((A)(B))(C))(X)((((A)(B))(C))(D))(
FIRSTATOM)((LAMBDA)((X))((COND)(((ATOM)(X))(X))(((
QUOTE)(T))((FIRSTATOM)((CAR)(X)))))))
display ((X)(A)(X)((A)(B))(X)(((A)(B))(C))(X)((((A)(B))(C)
)(D))(FIRSTATOM)((LAMBDA)((X))((COND)(((ATOM)(X))(
X))(((QUOTE)(T))((FIRSTATOM)((CAR)(X)))))))
value (A)
(V’
((APPEND) ((QUOTE)((A)(B)(C))) ((QUOTE)((D)(E)(F))))
C)
expression (V(’((APPEND)((QUOTE)((A)(B)(C)))((QUOTE)((D)(E)(F
)))))C)
display ((X)((A)(B)(C))(Y)((D)(E)(F))(APPEND)((LAMBDA)((X)
(Y))((COND)(((ATOM)(X))(Y))(((QUOTE)(T))((CONS)((C
AR)(X))((APPEND)((CDR)(X))(Y)))))))
display ((X)((B)(C))(Y)((D)(E)(F))(X)((A)(B)(C))(Y)((D)(E)
(F))(APPEND)((LAMBDA)((X)(Y))((COND)(((ATOM)(X))(Y
))(((QUOTE)(T))((CONS)((CAR)(X))((APPEND)((CDR)(X)
)(Y)))))))
display ((X)((C))(Y)((D)(E)(F))(X)((B)(C))(Y)((D)(E)(F))(X
)((A)(B)(C))(Y)((D)(E)(F))(APPEND)((LAMBDA)((X)(Y)
)((COND)(((ATOM)(X))(Y))(((QUOTE)(T))((CONS)((CAR)
(X))((APPEND)((CDR)(X))(Y)))))))
display ((X)(NIL)(Y)((D)(E)(F))(X)((C))(Y)((D)(E)(F))(X)((
B)(C))(Y)((D)(E)(F))(X)((A)(B)(C))(Y)((D)(E)(F))(A
PPEND)((LAMBDA)((X)(Y))((COND)(((ATOM)(X))(Y))(((Q
UOTE)(T))((CONS)((CAR)(X))((APPEND)((CDR)(X))(Y)))
))))
value ((A)(B)(C)(D)(E)(F))
(Vsed) =
value of S-expression s in environment e within depth d.
If a new environment is created it is displayed.
V: (&(sed)(/(.s)((’(&(A)(Ae)))(’(&(e)(/(.e)s(/(=s(+e)
)(+(-e))(A(-(-e))))))))((’(&(f)(/(=$f)f(/(=f’)(+(-
s))(/(=f/)((’(&(p)(/(=$p)p(/(=0p)(V(+(-(-(-s))))ed
)(V(+(-(-s)))ed)))))(V(+(-s))ed))((’(&(W)((’(&(a)(
/(=$a)a((’(&(x)((’(&(y)(/(=f.)(.x)(/(=f+)(+x)(/(=f
-)(-x)(/(=f,)(,x)(/(=f=)(=xy)(/(=f*)(*xy)(/(.d)$((
’(&(d)(/(=f!)(Vx()d)(/(=f?)((’(&(L)(/(Ldx)((’(&(v)
(/(=$v)v(*v()))))(Vy()d))((’(&(v)(/(=$v)?(*v()))))
(Vy()x)))))(’(&(ij)(/(.i)1(/(.j)0(L(-i)(-j)))))))(
(’(&(B)(V(+(-(-f)))(,(B(+(-f))a))d)))(’(&(xa)(/(.x
)e(*(+x)(*(+a)(B(-x)(-a))))))))))))(-d)))))))))))(
+(-a)))))(+a)))))(W(-s)))))(’(&(l)(/(.l)l((’(&(x)(
/(=$x)x((’(&(y)(/(=$y)y(*xy))))(W(-l))))))(V(+l)ed
)))))))))))(V(+s)ed))))
expression (’(C(&(xy)(/(.x)y(*(+x)(C(-x)y))))))
E: (C(&(xy)(/(.x)y(*(+x)(C(-x)y)))))
(V ’(C’(ab)’(cd)) E ’())
expression (V(’(C(’(ab))(’(cd))))E(’()))
value $
(V ’(C’(ab)’(cd)) E ’(1))
expression (V(’(C(’(ab))(’(cd))))E(’(1)))
display (x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C(-x)y)))))
value $
(V ’(C’(ab)’(cd)) E ’(11))
expression (V(’(C(’(ab))(’(cd))))E(’(11)))
display (x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C(-x)y)))))
display (x(b)y(cd)x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C(-x)y)))
))
value $
(V ’(C’(ab)’(cd)) E ’(111))
expression (V(’(C(’(ab))(’(cd))))E(’(111)))
display (x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C(-x)y)))))
display (x(b)y(cd)x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C(-x)y)))
))
display (x()y(cd)x(b)y(cd)x(ab)y(cd)C(&(xy)(/(.x)y(*(+x)(C
(-x)y)))))
3.6. LISP IN LISP III 101
value (abcd)
104 CHAPTER 4. THE LISP INTERPRETER EVAL
(1) the association list ALIST which contains all variable bindings,
(2) the interpreter STACK used for saving and restoring information
when the interpreter calls itself, and
All other registers are either temporary scratch registers used by the
interpreter (FUNCTION, ARGUMENTS, VARIABLES, X, and Y),
or hidden registers used by the microcode rather than directly by the
interpreter. These hidden registers include:
(1) the two in-boxes and two out-boxes for micro-routines: SOURCE,
SOURCE2, TARGET, and TARGET2,
(3) the three registers for return addresses from subroutine calls:
LINKREG, LINKREG2, and LINKREG3
forced to give only these excerpts; the full compiler log and equation
are available from the author.
GOTO UNWIND
L53: GOTO UNWIND
*
NOT_QUOTE LABEL
*
* If .........................................................
*
NEQ FUNCTION,C’/’,NOT_IF_THEN_ELSE
NOT_QUOTE: NEQ FUNCTION C’/’ NOT_IF_THEN_ELSE
* / If
POPL EXPRESSION,ARGUMENTS pick up "if" clause
L55: SET SOURCE ARGUMENTS
L56: JUMP LINKREG3 SPLIT_ROUTINE
L57: SET EXPRESSION TARGET
L58: SET ARGUMENTS TARGET2
PUSH ARGUMENTS remember "then" & "else" clauses
L59: SET SOURCE ARGUMENTS
L60: JUMP LINKREG2 PUSH_ROUTINE
JUMP LINKREG,EVAL evaluate predicate
L61: JUMP LINKREG EVAL
POP ARGUMENTS pick up "then" & "else" clauses
L62: JUMP LINKREG2 POP_ROUTINE
L63: SET ARGUMENTS TARGET
EQ VALUE,C’)’,UNWIND abort ?
L64: EQ VALUE C’)’ UNWIND
NEQ VALUE,C’0’,THEN_CLAUSE predicate considered true
* if not 0
L65: NEQ VALUE C’0’ THEN_CLAUSE
TL ARGUMENTS,ARGUMENTS if false, skip "then" clause
L66: SET SOURCE ARGUMENTS
L67: JUMP LINKREG3 SPLIT_ROUTINE
L68: SET ARGUMENTS TARGET2
THEN_CLAUSE LABEL
HD EXPRESSION,ARGUMENTS pick up "then" or "else" clause
THEN_CLAUSE: SET SOURCE ARGUMENTS
L70: JUMP LINKREG3 SPLIT_ROUTINE
L71: SET EXPRESSION TARGET
JUMP LINKREG,EVAL evaluate it
L72: JUMP LINKREG EVAL
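The instruction sequence above implements LISP's three-argument if. As a sketch of the same logic in ordinary Python (not the register machine; the function and argument names are ours):

```python
# Sketch of the register-machine steps above for (/ predicate then else):
# split off the "if" clause, save the "then" & "else" clauses, evaluate
# the predicate, and take the "then" branch exactly when the value is
# not 0.  (The UNWIND abort on a ')' value is omitted here.)
def eval_if(args, evaluate):
    predicate, rest = args[0], args[1:]      # POPL EXPRESSION,ARGUMENTS
    value = evaluate(predicate)              # JUMP LINKREG,EVAL
    then_clause, else_clause = rest[0], rest[1]
    if value != "0":                         # NEQ VALUE,C'0',THEN_CLAUSE
        return evaluate(then_clause)
    return evaluate(else_clause)             # TL skips the "then" clause

print(eval_if(["0", "a", "b"], lambda e: e))   # b
print(eval_if(["1", "a", "b"], lambda e: e))   # a
```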
*
* Eval .......................................................
*
NEQ FUNCTION,C’!’,NOT_EVAL
L126: NEQ FUNCTION C’!’ NOT_EVAL
* ! Eval
SET EXPRESSION,X pick up argument
L127: SET EXPRESSION X
PUSH ALIST push alist
L128: SET SOURCE ALIST
L129: JUMP LINKREG2 PUSH_ROUTINE
EMPTY ALIST fresh environment
L130: SET ALIST C’)’
L131: LEFT ALIST C’(’
JUMP LINKREG,EVAL evaluate argument again
L132: JUMP LINKREG EVAL
POP ALIST restore old environment
L133: JUMP LINKREG2 POP_ROUTINE
L134: SET ALIST TARGET
POP DEPTH restore old depth limit
L135: JUMP LINKREG2 POP_ROUTINE
L136: SET DEPTH TARGET
GOTO UNWIND
L137: GOTO UNWIND
*
NOT_EVAL LABEL
*
* Evald ......................................................
*
NEQ FUNCTION,C’?’,NOT_EVALD
NOT_EVAL: NEQ FUNCTION C’?’ NOT_EVALD
* ? Eval depth limited
SET VALUE,X pick up first argument
L139: SET VALUE X
SET EXPRESSION,Y pick up second argument
L140: SET EXPRESSION Y
* First argument of ? is in VALUE and
* second argument of ? is in EXPRESSION.
* First argument is new depth limit and
VARIABLES REGISTER
X REGISTER
Y REGISTER
SOURCE REGISTER
SOURCE2 REGISTER
TARGET REGISTER
TARGET2 REGISTER
WORK REGISTER
PARENS REGISTER
LINKREG REGISTER
LINKREG2 REGISTER
LINKREG3 REGISTER
*
Register variables:
ALIST ARGUMENTS DEPTH EXPRESSION FUNCTION LINKREG
LINKREG2 LINKREG3 PARENS SOURCE SOURCE2 STACK
TARGET TARGET2 VALUE VARIABLES WORK X Y
Label variables:
ALIST_SEARCH BIND CHOOSE COMPARE COPY_HD COPY_TL
COPY1 COPY2 DEPTH_OKAY EVAL EVALST
Auxiliary variables:
char.ARGUMENTS char.DEPTH char.EXPRESSION
char.FUNCTION char.PARENS char.SOURCE char.SOURCE2
char.VALUE char.VARIABLES char.WORK char.X char.Y
dont.set.ALIST dont.set.ARGUMENTS dont.set.DEPTH
4.3. THE ARITHMETIZATION OF EVAL 125
dont.set.EXPRESSION dont.set.FUNCTION
dont.set.LINKREG dont.set.LINKREG2
dont.set.LINKREG3 dont.set.PARENS dont.set.SOURCE
dont.set.SOURCE2 dont.set.STACK dont.set.TARGET
dont.set.TARGET2 dont.set.VALUE dont.set.VARIABLES
dont.set.WORK dont.set.X dont.set.Y
eq.ARGUMENTS.C’(’ eq.DEPTH.C’(’ eq.DEPTH.C’_’
eq.EXPRESSION.C’(’ eq.EXPRESSION.Y
eq.FUNCTION.C’.’ eq.FUNCTION.C’+’ eq.FUNCTION.C’!’
eq.FUNCTION.C’*’ eq.FUNCTION.C’-’ eq.FUNCTION.C’/’
eq.FUNCTION.C’,’ eq.FUNCTION.C’?’
eq.FUNCTION.C’’’’ eq.FUNCTION.C’=’ eq.PARENS.C’1’
eq.SOURCE.C’(’ eq.SOURCE.C’)’ eq.SOURCE.X’00’
eq.SOURCE2.C’(’ eq.SOURCE2.X’00’ eq.VALUE.C’)’
eq.VALUE.C’0’ eq.VARIABLES.C’(’ eq.WORK.C’)’
eq.WORK.X’00’ eq.X.C’(’ eq.X.C’_’ eq.X.X’00’
eq.X.Y eq.Y.C’(’ ge.ARGUMENTS.C’(’
ge.C’.’.FUNCTION ge.C’(’.ARGUMENTS ge.C’(’.DEPTH
ge.C’(’.EXPRESSION ge.C’(’.SOURCE ge.C’(’.SOURCE2
ge.C’(’.VARIABLES ge.C’(’.X ge.C’(’.Y
ge.C’+’.FUNCTION ge.C’!’.FUNCTION ge.C’*’.FUNCTION
ge.C’)’.SOURCE ge.C’)’.VALUE ge.C’)’.WORK
ge.C’-’.FUNCTION ge.C’/’.FUNCTION ge.C’,’.FUNCTION
ge.C’_’.DEPTH ge.C’_’.X ge.C’?’.FUNCTION
ge.C’’’’.FUNCTION ge.C’=’.FUNCTION ge.C’0’.VALUE
ge.C’1’.PARENS ge.DEPTH.C’(’ ge.DEPTH.C’_’
ge.EXPRESSION.C’(’ ge.EXPRESSION.Y
ge.FUNCTION.C’.’ ge.FUNCTION.C’+’ ge.FUNCTION.C’!’
ge.FUNCTION.C’*’ ge.FUNCTION.C’-’ ge.FUNCTION.C’/’
ge.FUNCTION.C’,’ ge.FUNCTION.C’?’
ge.FUNCTION.C’’’’ ge.FUNCTION.C’=’ ge.PARENS.C’1’
ge.SOURCE.C’(’ ge.SOURCE.C’)’ ge.SOURCE.X’00’
ge.SOURCE2.C’(’ ge.SOURCE2.X’00’ ge.VALUE.C’)’
ge.VALUE.C’0’ ge.VARIABLES.C’(’ ge.WORK.C’)’
ge.WORK.X’00’ ge.X.C’(’ ge.X.C’_’ ge.X.X’00’
ge.X.Y ge.X’00’.SOURCE ge.X’00’.SOURCE2
ge.X’00’.WORK ge.X’00’.X ge.Y.C’(’ ge.Y.EXPRESSION
ge.Y.X goback.JN_EXIT goback.L14 goback.L267
goback.L271 goback.SPLIT_EXIT i ic input.ALIST
set.LINKREG2.L142 set.LINKREG2.L16
set.LINKREG2.L167 set.LINKREG2.L169
set.LINKREG2.L179 set.LINKREG2.L181
set.LINKREG2.L193 set.LINKREG2.L199
set.LINKREG2.L201 set.LINKREG2.L205
set.LINKREG2.L216 set.LINKREG2.L218
set.LINKREG2.L222 set.LINKREG2.L224
set.LINKREG2.L233 set.LINKREG2.L243
set.LINKREG2.L249 set.LINKREG2.L251
set.LINKREG2.L257 set.LINKREG2.L43
set.LINKREG2.L45 set.LINKREG2.L60 set.LINKREG2.L62
set.LINKREG2.L75 set.LINKREG2.L77
set.LINKREG2.UNWIND set.LINKREG3 set.LINKREG3.L101
set.LINKREG3.L108 set.LINKREG3.L124
set.LINKREG3.L157 set.LINKREG3.L160
set.LINKREG3.L175 set.LINKREG3.L186
set.LINKREG3.L189 set.LINKREG3.L196
set.LINKREG3.L212 set.LINKREG3.L229
set.LINKREG3.L239 set.LINKREG3.L245
set.LINKREG3.L255 set.LINKREG3.L261
set.LINKREG3.L265 set.LINKREG3.L269
set.LINKREG3.L29 set.LINKREG3.L33 set.LINKREG3.L39
set.LINKREG3.L51 set.LINKREG3.L56 set.LINKREG3.L67
set.LINKREG3.L70 set.LINKREG3.L82 set.LINKREG3.L86
set.PARENS set.PARENS.L282 set.PARENS.L284
set.PARENS.L286 set.SOURCE set.SOURCE.BIND
set.SOURCE.COPY_TL set.SOURCE.COPY1
set.SOURCE.EVAL set.SOURCE.EVALST
set.SOURCE.EXPRESSION_ISNT_ATOM set.SOURCE.L106
set.SOURCE.L123 set.SOURCE.L128 set.SOURCE.L141
set.SOURCE.L156 set.SOURCE.L159 set.SOURCE.L174
set.SOURCE.L188 set.SOURCE.L192 set.SOURCE.L195
set.SOURCE.L211 set.SOURCE.L215 set.SOURCE.L221
set.SOURCE.L227 set.SOURCE.L238 set.SOURCE.L242
set.SOURCE.L244 set.SOURCE.L248 set.SOURCE.L253
set.SOURCE.L259 set.SOURCE.L28 set.SOURCE.L280
set.SOURCE.L32 set.SOURCE.L42 set.SOURCE.L50
set.SOURCE.L55 set.SOURCE.L59 set.SOURCE.L66
set.SOURCE.L81 set.SOURCE.L85 set.SOURCE.NO_LIMIT
set.SOURCE.NOT_EQUAL set.SOURCE.NOT_EVALD
set.SOURCE.NOT_IF_THEN_ELSE set.SOURCE.NOT_RPAR
set.SOURCE.POP_ROUTINE set.SOURCE.THEN_CLAUSE
set.SOURCE2 set.SOURCE2.COPY2 set.SOURCE2.L107
set.SOURCE2.L173 set.SOURCE2.L228 set.SOURCE2.L254
set.SOURCE2.L260 set.SOURCE2.L301
set.SOURCE2.PUSH_ROUTINE set.SOURCE2.WRAP
set.STACK set.STACK.L266 set.STACK.L270
set.STACK.L3 set.TARGET set.TARGET.JN_ROUTINE
set.TARGET.L278 set.TARGET.L299 set.TARGET.REVERSE
set.TARGET.REVERSE_HD set.TARGET.SPLIT_ROUTINE
set.TARGET2 set.TARGET2.L273 set.TARGET2.L279
set.TARGET2.REVERSE_TL set.VALUE
set.VALUE.ALIST_SEARCH set.VALUE.L102
set.VALUE.L104 set.VALUE.L109 set.VALUE.L113
set.VALUE.L116 set.VALUE.L139 set.VALUE.L176
set.VALUE.L206 set.VALUE.L230 set.VALUE.L34
set.VALUE.L52 set.VALUE.RETURNQ set.VALUE.RETURN0
set.VALUE.RETURN1 set.VARIABLES set.VARIABLES.L190
set.VARIABLES.L241 set.WORK set.WORK.COPY_TL
set.WORK.COPY1 set.WORK.COPY2 set.WORK.L118
set.WORK.L119 set.WORK.L149 set.WORK.L150
set.WORK.L153 set.WORK.L154 set.WORK.L18
set.WORK.L19 set.WORK.L208 set.WORK.L209
set.WORK.L235 set.WORK.L236 set.WORK.L25
set.WORK.L26 set.WORK.L275 set.WORK.L276
set.WORK.L281 set.WORK.L291 set.WORK.L300
set.WORK.L301 set.WORK.L90 set.WORK.L91
set.WORK.NOT_RPAR set.WORK.REVERSE
set.WORK.REVERSE_HD set.WORK.REVERSE_TL set.X
set.X.EXPRESSION_IS_ATOM set.X.L145 set.X.L158
set.X.L225 set.X.L240 set.X.L246 set.X.L252
set.X.L258 set.X.L31 set.X.L35 set.X.L83 set.X.L96
set.Y set.Y.L146 set.Y.L161 set.Y.L30 set.Y.L84
set.Y.L87 set.Y.L97 shift.ARGUMENTS shift.DEPTH
shift.EXPRESSION shift.FUNCTION shift.PARENS
shift.SOURCE shift.SOURCE2 shift.VALUE
shift.VARIABLES shift.WORK shift.X shift.Y time
total.input
4.4. START OF LEFT-HAND SIDE 129
95)*(2**s1795) + 2*((1+t1795)**s1795)*(v1795*t1795**(r1795+1)+
u1795*t1795**r1795+w1795) + 2*(w1795+x1795+1)*(t1795**r1795) +
2*(u1795+y1795+1)*(t1795) + 2*(u1795)*(2*z1795+1) + 2*(r1796)
*(256*ge.C’(’.SOURCE2) + 2*(s1796+char.SOURCE2)*(256*i+128*i)
+ 2*(t1796)*(2**s1796) + 2*((1+t1796)**s1796)*(v1796*t1796**(r
1796+1)+u1796*t1796**r1796+w1796) + 2*(w1796+x1796+1)*(t1796**
r1796) + 2*(u1796+y1796+1)*(t1796) + 2*(u1796)*(2*z1796+1) + 2
*(r1797+char.SOURCE2)*(256*i+128*i) + 2*(s1797)*(256*ge.C’(’.S
OURCE2+255*i) + 2*(t1797)*(2**s1797) + 2*((1+t1797)**s1797)*(v
1797*t1797**(r1797+1)+u1797*t1797**r1797+w1797) + 2*(w1797+x17
97+1)*(t1797**r1797) + 2*(u1797+y1797+1)*(t1797) + 2*(u1797)*(
2*z1797+1) + 2*(r1798)*(eq.SOURCE2.C’(’) + 2*(s1798)*(i) + 2*(
t1798)*(2**s1798) + 2*((1+t1798)**s1798)*(v1798*t1798**(r1798+
1)+u1798*t1798**r1798+w1798) + 2*(w1798+x1798+1)*(t1798**r1798
) + 2*(u1798+y1798+1)*(t1798) + 2*(u1798)*(2*z1798+1) + 2*(r17
99)*(2*eq.SOURCE2.C’(’) + 2*(s1799)*(ge.SOURCE2.C’(’+ge.C’(’.S
OURCE2) + 2*(t1799)*(2**s1799) + 2*((1+t1799)**s1799)*(v1799*t
1799**(r1799+1)+u1799*t1799**r1799+w1799) + 2*(w1799+x1799+1)*
(t1799**r1799) + 2*(u1799+y1799+1)*(t1799) + 2*(u1799)*(2*z179
9+1) + 2*(r1800)*(ge.SOURCE2.C’(’+ge.C’(’.SOURCE2) + 2*(s1800)
*(2*eq.SOURCE2.C’(’+i) + 2*(t1800)*(2**s1800) + 2*((1+t1800)**
s1800)*(v1800*t1800**(r1800+1)+u1800*t1800**r1800+w1800) + 2*(
w1800+x1800+1)*(t1800**r1800) + 2*(u1800+y1800+1)*(t1800) + 2*
(u1800)*(2*z1800+1) + 2*(r1801)*(ge.SOURCE2.X’00’) + 2*(s1801)
*(i) + 2*(t1801)*(2**s1801) + 2*((1+t1801)**s1801)*(v1801*t180
1**(r1801+1)+u1801*t1801**r1801+w1801) + 2*(w1801+x1801+1)*(t1
801**r1801) + 2*(u1801+y1801+1)*(t1801) + 2*(u1801)*(2*z1801+1
) + 2*(r1802)*(256*ge.SOURCE2.X’00’) + 2*(s1802+0*i)*(256*i+ch
ar.SOURCE2) + 2*(t1802)*(2**s1802) + 2*((1+t1802)**s1802)*(v18
02*t1802**(r1802+1)+u1802*t1802**r1802+w1802) + 2*(w1802+x1802
+1)*(t1802**r1802) + 2*(u1802+y1802+1)*(t1802) + 2*(u1802)*(2*
z1802+1) + 2*(r1803+0*i)*(256*i+char.SOURCE2) + 2*(s1803)*(256
*ge.SOURCE2.X’00’+255*i) + 2*(t1803)*(2**s1803) + 2*((1+t1803)
**s1803)*(v1803*t1803**(r1803+1)+u1803*t1803**r1803+w1803) + 2
*(w1803+x1803+1)*(t1803**r1803) + 2*(u1803+y1803+1)*(t1803) +
2*(u1803)*(2*z1803+1) + 2*(r1804)*(ge.X’00’.SOURCE2) + 2*(s180
4)*(i) + 2*(t1804)*(2**s1804) + 2*((1+t1804)**s1804)*(v1804*t1
804**(r1804+1)+u1804*t1804**r1804+w1804) + 2*(w1804+x1804+1)*(
t1804**r1804) + 2*(u1804+y1804+1)*(t1804) + 2*(u1804)*(2*z1804
4.5. END OF RIGHT-HAND SIDE 133
Having done the bulk of the work necessary to encode the halting
probability Ω as an exponential diophantine equation, we now turn to
theory. In Chapter 5 we trace the evolution of the concepts of program-
size complexity. In Chapter 6 we define these concepts formally and
develop their basic properties. In Chapter 7 we study the notion of a
random real and show that Ω is a random real. And in Chapter 8 we
develop incompleteness theorems for random reals.
Chapter 5
Conceptual Development
if one flips 7 coins for each character, eventually the number of right
parentheses will overtake the number of left parentheses, with probabil-
ity one. This is similar to the fact that heads versus tails will cross the
origin infinitely often, with probability one. I.e., a symmetrical random
walk on a line will return to the origin with probability one. For a more
detailed explanation, see Appendix B.
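The almost-sure return of the symmetric random walk can also be seen in a small simulation (ours, added for illustration; the trial and step counts are arbitrary, and each trial is truncated, so the observed fraction is only a lower bound on the return probability):

```python
import random

# Monte Carlo illustration (not part of the book's argument): a symmetric
# random walk on the integers, started at 0, returns to 0 with
# probability one.  Each trial is truncated at max_steps, so the
# fraction below underestimates the true (unit) return probability.
def returns_to_origin(max_steps, rng):
    position = 0
    for _ in range(max_steps):
        position += rng.choice((-1, 1))
        if position == 0:
            return True
    return False

rng = random.Random(0)                      # fixed seed, reproducible run
trials = 1000
returned = sum(returns_to_origin(10_000, rng) for _ in range(trials))
print(returned / trials)                    # close to, but below, 1
```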
Now let’s select from the set of all syntactically correct programs,
which has measure 1, those that give a particular result. I.e., let’s
consider P_LISP(x), defined to be the probability that an S-expression
chosen at random evaluates to x. In other words, if one tosses 7 coins
per character, what is the chance that the LISP S-expression that one
gets evaluates to x?
Finally, we define Ω_LISP to be the probability that an S-expression
“halts”, i.e., the probability that it has a value. If one tosses 7 coins
per character, what is the chance that the LISP S-expression that one
gets halts? That is the value of Ω_LISP.
Now for an upper bound on LISP complexity. Consider the S-
expression (’x) which evaluates to x. This shows that
H(x) ≤ |x| + 3.
The complexity of an S-expression is bounded from above by its size +
3.
Now we introduce the important notion of a minimal program. A
minimal program is a LISP S-expression having the property that no
smaller S-expression has the same value. It is obvious that there is
at least one minimal program for any given LISP S-expression, i.e., at
least one p with |p| = H_LISP(x) which evaluates to x. Consider the S-
expression (!q) where q is a minimal program for p, and p is a minimal
program for x. This expression evaluates to x, and thus
|p| = H(x) ≤ 3 + |q| = 3 + H(p),
which shows that if p is a minimal program, then
H(p) ≥ |p| − 3.
It follows that all minimal programs p, and there are infinitely many
of them, have the property that
|H(p) − |p|| ≤ 3.
(1) Large minimal programs are “normal”, that is to say, each of the
128 characters in the LISP character set appears in them with a rel-
ative frequency close to 1/128. The longer the minimal program
is, the closer the relative frequencies are to 1/128.
(2) There are few minimal programs for a given object; minimal pro-
grams are essentially unique.
where
k ≡ n/128.
5.1. COMPLEXITY VIA LISP EXPRESSIONS 143
for all n, where Ω_7n denotes the first 7n bits of the base-two numeral
for Ω, which implies the assertion of the theorem. The 2 is the number
of parentheses in (Pq), where q is a minimal program for Ω_7n. Hence,
B(n) = n + O(1),
a step already taken at the end of Chaitin (1969a). This is easily done,
by deciding that programs will be bit strings, and by interpreting the
start of the bit string as a LISP S-expression defining a function, which
is evaluated and then applied to the rest of the bit string as data to give
the result of the program. The binary representation of S-expressions
that we have in mind uses 7 bits per character and is described in Figure
3.1. So now the complexity of an S-expression will be measured by the
size in bits of the shortest program of this kind that calculates it. I.e.,
we use a universal computer U that produces LISP S-expressions as
output when it is given as input programs which are bit strings of the
following form: program_U = (self-delimiting LISP program for function
definition f) binary data d. Since there is one 7-bit byte for each LISP
character, we see that
(1) There are at most 2^n bit strings of complexity n, and less than
2^n strings of complexity less than n.
(2) There is a constant c such that all bit strings of length n have
complexity less than n + c. In fact, c = 7 will do, because the
LISP function ’ (QUOTE) is one 7-bit character long.
(3) Less than 2^−k of the bit strings of length n have H < n − k. And
more than 1 − 2^−k of the bit strings of length n have n − k ≤ H <
n + c. This follows immediately from (1) and (2) above.
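The counting behind (1) and (3) can be checked mechanically. The sketch below (Python, not the book's LISP; the program-to-output table is a hypothetical stand-in for the universal computer) computes complexities from an arbitrary table and verifies that fewer than 2^n strings receive complexity below n, simply because each such string needs its own program of length below n:

```python
from itertools import product

# Toy check of the counting argument: any partial map from programs
# (bit strings) to outputs gives at most 2**n - 1 outputs of
# complexity < n, because programs of length < n number 2**n - 1.
def all_strings(max_len):
    return ("".join(b) for n in range(max_len + 1)
            for b in product("01", repeat=n))

computer = {p: p[::-1] for p in all_strings(6)}   # hypothetical program -> output

H = {}                       # output -> length of its shortest program
for p, s in computer.items():
    H[s] = min(H.get(s, len(p)), len(p))

counts = [sum(1 for h in H.values() if h < n) for n in range(8)]
print(counts)                # → [0, 1, 3, 7, 15, 31, 63, 127], each below 2**n
```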
no longer holds.
ω_k ≤ ω_{k+1} → Ω_V
[
Make a list of strings into a prefix-free set
by removing duplicates. Last occurrence is kept.
]
& (Rx)
[ P-equiv: are two bit strings prefixes of each other ? ]
: (Pxy) /.x1 /.y1 /=+x+y (P-x-y) 0
[ is x P-equivalent to a member of l ? ]
: (Mxl) /.l0 /(Px+l) 1 (Mx-l)
[ body of R follows: ]
/.xx : r (R-x) /(M+xr) r *+xr
R: (&(x)((’(&(P)((’(&(M)(/(.x)x((’(&(r)(/(M(+x)r)r(*(
+x)r))))(R(-x))))))(’(&(xl)(/(.l)0(/(Px(+l))1(Mx(-
l)))))))))(’(&(xy)(/(.x)1(/(.y)1(/(=(+x)(+y))(P(-x
)(-y))0)))))))
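In ordinary Python (a sketch; the names are ours, not the book's), R's behavior is: keep an element exactly when it is not prefix-comparable to anything later in the list, so the last occurrence in each prefix-comparable group survives.

```python
# Sketch of the routine R above: filter a list of bit strings down to a
# prefix-free set, keeping the LAST occurrence in each group of mutually
# prefix-comparable strings.
def p_equiv(x, y):
    """P: true iff one of the bit strings x, y is a prefix of the other."""
    return x.startswith(y) or y.startswith(x)

def prefix_free(strings):
    kept = []
    for s in reversed(strings):          # later occurrences win, as in R
        if not any(p_equiv(s, t) for t in kept):
            kept.append(s)
    kept.reverse()                       # restore original relative order
    return kept

print(prefix_free(["0", "01", "10", "0"]))   # ['10', '0']
```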
[
K th approximation to Omega for given U.
]
& (WK)
: (Cxy) /.xy *+x(C-xy) [ concatenation (set union) ]
: (B)
: k ,(*"&*()*,’k()) [ write k & its value ]
: s (R(C(Hk)s)) [ add to s programs not P-equiv which halt ]
: s ,(*"&*()*,’s()) [ write s & its value ]
/=kK (Ms) [ if k = K, return measure of set s ]
: k *1k [ add 1 to k ]
(B)
: k () [ initialize k to zero ]
: s () [ initialize s to empty set of programs ]
(B)
W: (&(K)((’(&(C)((’(&(B)((’(&(k)((’(&(s)(B)))())))())
))(’(&()((’(&(k)((’(&(s)((’(&(s)(/(=kK)(Ms)((’(&(k
)(B)))(*1k)))))(,((*&(*()(*(,(’s))()))))))))(R(C(H
k)s)))))(,((*&(*()(*(,(’k))())))))))))))(’(&(xy)(/
(.x)y(*(+x)(C(-x)y)))))))
[
Subset of computer programs of size up to k
which halt within time k when run on U.
]
& (Hk)
[ quote all elements of list ]
: (Qx) /.xx **"’*+x()(Q-x)
[ select elements of x which have property P ]
: (Sx) /.xx /(P+x) *+x(S-x) (S-x)
[ property P
is that program halts within time k when run on U ]
: (Px) =0.?k(Q*U*x())
[ body of H follows:
select subset of programs of length up to k ]
(S(Xk))
H: (&(k)((’(&(Q)((’(&(S)((’(&(P)(S(Xk))))(’(&(x)(=0(.
(?k(Q(*U(*x())))))))))))(’(&(x)(/(.x)x(/(P(+x))(*(
+x)(S(-x)))(S(-x)))))))))(’(&(x)(/(.x)x(*(*’(*(+x)
()))(Q(-x))))))))
[
Produce all bit strings of length less than or equal to k.
Bigger strings come first.
]
& (Xk)
/.k ’(())
: (Zy) /.y ’(()) **0+y **1+y (Z-y)
(Z(X-k))
X: (&(k)(/(.k)(’(()))((’(&(Z)(Z(X(-k)))))(’(&(y)(/(.y
)(’(()))(*(*0(+y))(*(*1(+y))(Z(-y))))))))))
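A Python rendering of X (a sketch with our names): it produces all bit strings of length at most k, longer strings first, in the same order as the LISP routine.

```python
# Sketch of X above: X(k) lists the 2**(k+1) - 1 bit strings of length
# <= k.  Z doubles each string of X(k-1) by prepending 0 then 1, and
# appends the empty string at the end, so longer strings come first.
def X(k):
    if k == 0:
        return [""]
    out = []
    for s in X(k - 1):
        out.append("0" + s)
        out.append("1" + s)
    return out + [""]

print(X(3)[:8], len(X(3)))
# → ['000', '100', '010', '110', '001', '101', '011', '111'] 15
```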
5.4. OMEGA IN LISP 151
M: (&(x)((’(&(S)((’(&(C)((’(&(A)((’(&(M)((’(&(P)((’(&
(s)(*(+s)(*.(-s)))))(Px))))(’(&(x)(/(.x)(’(0))((’(
&(y)((’(&(z)(-y)))(/(+y)(,(’(overflow)))0))))(A(M(
+x))(P(-x))))))))))(’(&(x)(/(.x)(’(1))(*0(M(-x))))
)))))(’(&(xy)(/(.x)(*0y)(/(.y)(*0x)((’(&(z)(*(C(+x
)(+y)(+z))(*(S(+x)(+y)(+z))(-z)))))(A(-x)(-y))))))
))))(’(&(xyz)(/x(/y1z)(/yz0)))))))(’(&(xyz)(=x(=yz
))))))
[
If k th bit of string x is 1 then halt, else loop forever.
Value, if has one, is always 0.
]
& (Oxk) /=0.,k (O-x-k) [ else ]
O: (&(xk)(/(=0(.(,k)))(O(-x)(-k))(/(.x)(Oxk)(/(+x)0(O
xk)))))
& (Us)
[
Alphabet:
]
: A ’"
((((((((leftparen)(rightparen))(AB))((CD)(EF)))(((GH)(IJ))((KL
)(MN))))((((OP)(QR))((ST)(UV)))(((WX)(YZ))((ab)(cd)))))(((((ef
)(gh))((ij)(kl)))(((mn)(op))((qr)(st))))((((uv)(wx))((yz)(01))
)(((23)(45))((67)(89))))))((((((_+)(-.))((’,)(!=)))(((*&)(?/))
((:")($%))))((((%%)(%%))((%%)(%%)))(((%%)(%%))((%%)(%%)))))(((
((%%)(%%))((%%)(%%)))(((%%)(%%))((%%)(%%))))((((%%)(%%))((%%)(
%%)))(((%%)(%%))((%%)(%%)))))))
[
Read 7-bit character from bit string.
Returns character followed by rest of string.
Typical result is (A 1111 000).
]
: (Cs)
/.--- ---s (Cs) [ undefined if less than 7 bits left ]
: (Rx) +-x [ 1 bit: take right half ]
: (Lx) +x [ 0 bit: take left half ]
*
(/+s R L
(/+-s R L
(/+--s R L
(/+---s R L
(/+----s R L
(/+-----s R L
(/+------s R L
A)))) )))
---- ---s
[
Read zero or more s-exp’s until get to a right parenthesis.
Returns list of s-exp’s followed by rest of string.
Typical result is ((AB) 1111 000).
]
: (Ls)
: c (Cs) [ c = read char from input s ]
/=+c’(right paren) *()-c [ end of list ]
: d (Es) [ d = read s-exp from input s ]
: e (L-d) [ e = read list from rest of input ]
**+d+e-e [ add s-exp to list ]
[
Read single s-exp.
Returns s-exp followed by rest of string.
Typical result is ((AB) 1111 000).
]
: (Es)
: c (Cs) [ c = read char from input s ]
/=+c’(right paren) *()-c [ invalid right paren becomes () ]
/=+c’(left paren) (L-c) [ read list from rest of input ]
c [ otherwise atom followed by rest of input ]
U: (&(s)((’(&(A)((’(&(C)((’(&(L)((’(&(E)((’(&(x)(!(*(
+x)(*(*’(*(-x)()))())))))(Es))))(’(&(s)((’(&(c)(/(
=(+c)(’(rightparen)))(*()(-c))(/(=(+c)(’(leftparen
)))(L(-c))c))))(Cs)))))))(’(&(s)((’(&(c)(/(=(+c)(’
(rightparen)))(*()(-c))((’(&(d)((’(&(e)(*(*(+d)(+e
))(-e))))(L(-d)))))(Es)))))(Cs)))))))(’(&(s)(/(.(-
(-(-(-(-(-s)))))))(Cs)((’(&(R)((’(&(L)(*((/(+s)RL)
((/(+(-s))RL)((/(+(-(-s)))RL)((/(+(-(-(-s))))RL)((
/(+(-(-(-(-s)))))RL)((/(+(-(-(-(-(-s))))))RL)((/(+
(-(-(-(-(-(-s)))))))RL)A)))))))(-(-(-(-(-(-(-s))))
))))))(’(&(x)(+x))))))(’(&(x)(+(-x)))))))))))(’(((
(((((leftparen)(rightparen))(AB))((CD)(EF)))(((GH)
(IJ))((KL)(MN))))((((OP)(QR))((ST)(UV)))(((WX)(YZ)
)((ab)(cd)))))(((((ef)(gh))((ij)(kl)))(((mn)(op))(
(qr)(st))))((((uv)(wx))((yz)(01)))(((23)(45))((67)
(89))))))((((((_+)(-.))((’,)(!=)))(((*&)(?/))((:")
($%))))((((%%)(%%))((%%)(%%)))(((%%)(%%))((%%)(%%)
))))(((((%%)(%%))((%%)(%%)))(((%%)(%%))((%%)(%%)))
)((((%%)(%%))((%%)(%%)))(((%%)(%%))((%%)(%%)))))))
)))
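The reader inside U (the routines C, L, and E above) can be sketched in ordinary Python. The 7-bit code table below is a hypothetical miniature standing in for the book's 128-leaf alphabet tree, and the function names are ours:

```python
# Sketch of the reader: read_char is C, read_expr is E, read_list is L.
CODES = {"0000000": "(", "0000001": ")",          # hypothetical codes
         "0000010": "A", "0000011": "B"}

def read_char(bits):
    """C: one character plus the rest of the bit string.
    (Like C, undefined if fewer than 7 bits remain.)"""
    return CODES[bits[:7]], bits[7:]

def read_expr(bits):
    """E: one S-expression plus the rest of the bit string."""
    ch, rest = read_char(bits)
    if ch == ")":                 # invalid right paren becomes ()
        return [], rest
    if ch == "(":                 # read a list up to the matching )
        return read_list(rest)
    return ch, rest               # otherwise an atom

def read_list(bits):
    """L: s-expressions up to a right parenthesis, plus the rest."""
    ch, rest = read_char(bits)
    if ch == ")":                 # end of list
        return [], rest
    head, rest = read_expr(bits)  # as in L, the char is re-read by E
    tail, rest = read_list(rest)
    return [head] + tail, rest

# (AB) followed by two leftover bits:
expr, rest = read_expr("0000000" + "0000010" + "0000011" + "0000001" + "10")
print(expr, rest)                 # → ['A', 'B'] 10
```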
[ Omega ! ]
(W’(1111 111 111))
expression (W(’(1111111111)))
display k
display ()
display s
display ()
display k
display (1)
display s
display ()
display k
display (11)
display s
display ()
display k
display (111)
display s
display ()
display k
display (1111)
display s
display ()
display k
display (11111)
display s
display ()
display k
display (111111)
display s
display ()
display k
display (1111111)
display s
display ()
display k
display (11111111)
display s
display ()
display k
display (111111111)
display s
display ()
display k
display (1111111111)
display (000)
display (100)
display (010)
display (110)
display (001)
display (101)
display (011)
display (111)
display (00)
display (10)
display (01)
display (11)
display (0)
display (1)
display ()
display s
display ((1000000)(0100000)(1100000)(0010000)(1010000)(011
0000)(1110000)(0001000)(1001000)(0101000)(1101000)
(0011000)(1011000)(0111000)(1111000)(0000100)(1000
100)(0100100)(1100100)(0010100)(1010100)(0110100)(
1110100)(0001100)(1001100)(0101100)(1101100)(00111
00)(1011100)(0111100)(1111100)(0000010)(1000010)(0
100010)(1100010)(0010010)(1010010)(0110010)(111001
0)(0001010)(1001010)(0101010)(1101010)(0011010)(10
11010)(0111010)(1111010)(0000110)(1000110)(0100110
)(1100110)(0010110)(1010110)(0110110)(1110110)(000
1110)(1001110)(0101110)(1101110)(0011110)(1011110)
(0111110)(1111110)(0000001)(1000001)(0100001)(1100
001)(0010001)(1010001)(0110001)(1110001)(0001001)(
1001001)(0101001)(1101001)(0011001)(1011001)(01110
01)(1111001)(0000101)(1000101)(0100101)(1100101)(0
010101)(1010101)(0110101)(1110101)(0001101)(100110
1)(0101101)(1101101)(0011101)(1011101)(0111101)(11
11101)(0000011)(1000011)(0100011)(1100011)(0010011
)(1010011)(0110011)(1110011)(0001011)(1001011)(010
1011)(1101011)(0011011)(1011011)(0111011)(1111011)
(0000111)(1000111)(0100111)(1100111)(0010111)(1010
111)(0110111)(1110111)(0001111)(1001111)(0101111)(
1101111)(0011111)(1011111)(0111111)(1111111))
value (0.1111111)
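As a check on this final value (a recomputation of ours, not part of the book's run): the last set s displayed contains 127 strings, each 7 bits long, so its measure is 127 · 2^−7 = 127/128, whose binary expansion is exactly the value printed.

```python
from fractions import Fraction

# The final prefix-free set s above has 127 members, all of length 7,
# so (M s) = 127 * 2**-7 = 127/128, i.e. 0.1111111 in binary.
measure = 127 * Fraction(1, 2 ** 7)

q, bits = measure, ""
for _ in range(7):                 # first 7 binary digits of the measure
    digit, q = divmod(2 * q, 1)
    bits += str(digit)
print("0." + bits)                 # → 0.1111111
```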
Program Size
6.1 Introduction
In this chapter we present a new definition of program-size complex-
ity. H(A, B/C, D) is defined to be the size in bits of the shortest
self-delimiting program for calculating strings A and B if one is given
a minimal-size self-delimiting program for calculating strings C and D.
As is the case in LISP, programs are required to be self-delimiting, but
instead of achieving this with balanced parentheses, we merely stipulate
that no meaningful program be a prefix of another. Moreover, instead
of being given C and D directly, one is given a program for calculating
them that is minimal in size. Unlike previous definitions, this one has
precisely the formal properties of the entropy concept of information
theory.
What train of thought led us to this definition? Following [Chaitin
(1970a)], think of a computer as decoding equipment at the receiving
end of a noiseless binary communications channel. Think of its pro-
grams as code words, and of the result of the computation as the de-
coded message. Then it is natural to require that the programs/code
words form what is called a “prefix-free set,” so that successive messages
sent across the channel (e.g. subroutines) can be separated. Prefix-free
sets are well understood; they are governed by the Kraft inequality,
which therefore plays an important role in this chapter.
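The inequality itself is easy to check numerically: a prefix-free set of bit strings S always satisfies Σ_{p∈S} 2^−|p| ≤ 1, with equality for a "complete" code. A Python sketch (the example sets are ours):

```python
from fractions import Fraction

# Kraft's inequality, checked exactly with rationals: for a prefix-free
# set S of bit strings, the sum of 2**-len(p) over p in S is at most 1.
def kraft_sum(S):
    return sum(Fraction(1, 2 ** len(p)) for p in S)

print(kraft_sum(["0", "10", "110", "111"]))   # 1 (a complete code)
print(kraft_sum(["0", "10"]))                 # 3/4
```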
One is thus led to define the relative complexity H(A, B/C, D) of
6.2 Definitions
In this chapter, Λ (the LISP empty list ()) is the empty string. {Λ, 0, 1, 00, 01,
10, 11, 000, . . .} is the set of finite binary strings, ordered as indicated.
Henceforth we say “string” instead of “binary string;” a string is un-
derstood to be finite unless the contrary is explicitly stated. As before,
|s| is the length of the string s. The variables p, q, s, and t denote
strings. The variables c, i, k, m, and n denote non-negative integers.
#(S) is the cardinality of the set S.
Definition of a Prefix-Free Set
A prefix-free set is a set of strings S with the property that no string
in S is a prefix of another.
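The definition is directly checkable for finite sets (a Python sketch; the names are ours):

```python
from itertools import combinations

# A set of strings is prefix-free iff no member is a prefix of another
# (distinct members only; a set contains no duplicates).
def is_prefix_free(S):
    return not any(p.startswith(q) or q.startswith(p)
                   for p, q in combinations(S, 2))

print(is_prefix_free({"0", "10", "110"}))   # True
print(is_prefix_free({"0", "01"}))          # False: "0" is a prefix of "01"
```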
Definition of a Computer
A computer C is a computable partial function that carries a pro-
gram string p and a free data string q into an output string C(p, q) with
the property that for each q the domain of C(., q) is a prefix-free set;
i.e., if C(p, q) is defined and p is a proper prefix of p′, then C(p′, q) is
not defined. In other words, programs must be self-delimiting.
Definition of a Universal Computer
U is a universal computer iff for each computer C there is a constant
sim(C) with the following property: if C(p, q) is defined, then there is
Q.E.D.
We pick this particular universal computer U as the stan-
dard one we shall use for measuring program-size complexities
throughout the rest of this book.
Definition of Canonical Programs, Complexities, and Prob-
abilities
s∗ ≡ min {p : U(p, Λ) = s}.
I.e., s∗ is the shortest string that is a program for U to
calculate s, and if several strings of the same size have this
property, we pick the one that comes first when all strings
of that size are ordered from all 0’s to all 1’s in the usual
lexicographic order.
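The selection of s∗ can be illustrated with a toy table in place of the universal computer (a Python sketch; the table below is hypothetical): enumerate programs by length, and within each length lexicographically, and take the first one that computes s.

```python
from itertools import product

# s* for a toy program -> output table: the shortest program computing
# s, ties broken by lexicographic order from all 0's to all 1's.
def strings_in_canonical_order(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

U_TABLE = {"11": "s", "010": "s", "001": "s", "0110": "t"}

def canonical_program(s, max_len=10):
    for p in strings_in_canonical_order(max_len):
        if U_TABLE.get(p) == s:
            return p
    return None

print(canonical_program("s"))   # 11: length 2 beats the length-3 programs
print(canonical_program("t"))   # 0110
```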
(b) Complexities.
(c) Probabilities.
P_C(s) ≡ Σ_{C(p,Λ)=s} 2^−|p|,
P(s) ≡ P_U(s),
P_C(s/t) ≡ Σ_{C(p,t∗)=s} 2^−|p|,
P(s/t) ≡ P_U(s/t),
Ω ≡ Σ_{U(p,Λ) is defined} 2^−|p|.
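These sums are easy to evaluate for a toy computer (a Python sketch; the prefix-free table D below is hypothetical, standing in for the domain of a computer):

```python
from fractions import Fraction

# Toy algorithmic probabilities: P(s) sums 2**-len(p) over programs p
# with D(p) = s; omega sums over the whole (prefix-free) domain of D.
D = {"0": "x", "10": "x", "110": "y"}      # hypothetical prefix-free domain

def P(s):
    return sum(Fraction(1, 2 ** len(p)) for p, v in D.items() if v == s)

omega = sum(Fraction(1, 2 ** len(p)) for p in D)
print(P("x"), omega)    # → 3/4 7/8
```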
Remark on Omega
Note that the LISP program for calculating Ω in the limit from
below that we gave in Section 5.4 is still valid, even though the notion
of “free data” did not appear in Chapter 5. Section 5.4 still works,
because giving a LISP function only one argument is equivalent to
giving it that argument and the empty list Λ as a second argument.
Remark on Nomenclature
The names of these concepts mix terminology from information the-
ory, from probability theory, and from the field of computational com-
plexity. H(s) may be referred to as the algorithmic information content
of s or the program-size complexity of s, and H(s/t) may be referred to
as the algorithmic information content of s relative to t or the program-
size complexity of s given t. Or H(s) and H(s/t) may be termed the
algorithmic entropy and the conditional algorithmic entropy, respec-
tively. H(s : t) is called the mutual algorithmic information of s and t;
it measures the degree of interdependence of s and t. More precisely,
H(s : t) is the extent to which knowing s helps one to calculate t,
which, as we shall see in Theorem I9, also turns out to be the extent to
(g) H(s/t) ≠ ∞,
(h) 0 ≤ P_C(s) ≤ 1,
(i) 0 ≤ P_C(s/t) ≤ 1,
(j) 1 ≥ Σ_s P_C(s),
(k) 1 ≥ Σ_s P_C(s/t),
Proof
These are immediate consequences of the definitions. Q.E.D.
Extensions of the Previous Concepts to Tuples of Strings
We have defined the program-size complexity and the algorithmic
probability of individual strings, the relative complexity of one string
given another, and the algorithmic probability of one string given an-
other. Let’s extend this from individual strings to tuples of strings:
this is easy to do because we have used LISP to construct our universal
computer U, and the ordered list (s1 s2 . . . sn ) is a basic LISP notion.
Here each sk is a string, which is defined in LISP as a list of 0’s and 1’s.
Thus, for example, we can define the relative complexity of computing
a triple of strings given another triple of strings:
and
H(s, t) ≤ HC (s, t) + sim(C) ≤ H(s) + H(t/s) + O(1).
It remains to verify the claim that there is such a computer. C
does the following when it is given the program s∗ p and the free data
Λ. First C pretends to be U. More precisely, C generates the r.e. set
V = {v : U(v, Λ) is defined}. As it generates V , C continually checks
whether or not that part r of its program that it has already read is
a prefix of some known element v of V . Note that initially r = Λ.
Whenever C finds that r is a prefix of a v ∈ V , it does the following.
If r is a proper prefix of v, C reads another bit of its program. And if
r = v, C calculates U(r, Λ), and C’s simulation of U is finished. In this
manner C reads the initial portion s∗ of its program and calculates s.
Then C simulates the computation that U performs when given the
free data s∗ and the remaining portion of C’s program. More precisely,
C generates the r.e. set W = {w : U(w, s∗ ) is defined}. As it generates
W , C continually checks whether or not that part r of its program that
it has already read is a prefix of some known element w of W . Note
that initially r = Λ. Whenever C finds that r is a prefix of a w ∈ W ,
it does the following. If r is a proper prefix of w, C reads another bit
of its program. And if r = w, C calculates U(r, s∗ ), and C’s second
simulation of U is finished. In this manner C reads the final portion p
of its program and calculates t from s∗ . The entire program has now
been read, and both s and t have been calculated. C finally forms the
pair (s, t) and halts, indicating this to be the result of the computation.
Q.E.D.
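The bit-by-bit reading loop that C performs twice in this proof can be sketched in Python (ours; we hand the routine the domain as a finished set, which elides the r.e. enumeration that C interleaves with its reading):

```python
# Consume just enough of the program to match an element of the
# (prefix-free) domain; return the matched element and the unread rest.
def read_self_delimiting(program_bits, domain):
    r = ""
    bits = iter(program_bits)
    while r not in domain:
        # r is at most a proper prefix of some element: read one more bit.
        r += next(bits)
    return r, "".join(bits)

V = {"0", "10", "110"}      # stand-in for {v : U(v, Lambda) is defined}
print(read_self_delimiting("1101", V))   # → ('110', '1')
```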
Remark
The rest of this section is devoted to showing that the “≤” in The-
orem I1(f) and I1(i) can be replaced by “=.” The arguments used have
a strong probabilistic as well as an information-theoretic flavor.
Theorem I2
(Extended Kraft inequality condition for the existence of a prefix-
free set).
6.3. BASIC IDENTITIES 165
{(s_k, n_k) : k = 0, 1, 2, . . .}
and
H_C(s) = min_{s_k = s} n_k.
(b) Now to justify the claim. We must show that the above rule for
making assignments never fails, i.e., we must show that it is never
the case that all programs of the requested length are unavailable.
A geometrical interpretation is necessary. Consider the unit inter-
val [0, 1) ≡ {real x : 0 ≤ x < 1}. The kth program (0 ≤ k < 2^n)
of length n corresponds to the interval
[k 2^−n, (k + 1) 2^−n).
of the requested length 2^−n that is available that has the smallest
possible k. Using this rule for making assignments gives rise to
the following fact.
Fact. The set of those points in [0, 1) that are unassigned can
always be expressed as the union of a finite number of intervals
[k_i 2^−n_i, (k_i + 1) 2^−n_i)
I.e., these intervals are disjoint, their lengths are distinct powers
of 2, and they appear in [0, 1) in order of increasing length.
We leave to the reader the verification that this fact is always
the case and that it implies that an assignment is impossible
Note
The preceding proof may be considered to involve a computer mem-
ory “storage allocation” problem. We have one unit of storage, and all
requests for storage request a power of two of storage, i.e., one-half
unit, one-quarter unit, etc. Storage is never freed. The algorithm given
above will be able to service a series of storage allocation requests as
long as the total storage requested is not greater than one unit. If the
total amount of storage remaining at any point in time is expressed as
a real number in binary, then the crucial property of the above storage
allocation technique can be stated as follows: at any given moment
there will be a block of size 2−k of free storage if and only if the binary
digit corresponding to 2−k in the base-two expansion for the amount of
storage remaining at that point is a 1 bit.
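This storage-allocation discipline is easy to simulate; the following modern-Python sketch (the function name and data representation are illustrative, not from the book) services each request from the smallest adequate free block, splitting it down to size:

```python
from fractions import Fraction

def allocate(requests):
    """Service storage requests, each a power-of-two fraction of one unit.

    Free storage is a list of blocks; a request of size 2**-n takes the
    smallest free block that fits and splits it down to size, keeping
    the unused halves free.  Storage is never returned.  Returns True
    iff every request could be served.
    """
    free = [Fraction(1)]              # one unit of storage, one block
    for size in requests:
        candidates = [b for b in free if b >= size]
        if not candidates:
            return False              # no adequate free block remains
        block = min(candidates)
        free.remove(block)
        while block > size:           # split, keeping the spare halves
            block /= 2
            free.append(block)
    return True

# Requests totalling at most one unit are always served:
assert allocate([Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)])
# A request that would push the total over one unit fails:
assert not allocate([Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)])
```

After every step the free blocks have distinct power-of-two sizes: they are exactly the 1 bits in the binary expansion of the storage remaining, as described above.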
Theorem I3
(Computing HC and PC “in the limit”).
Consider a computer C.
(a) The set of all true propositions of the form
“HC(s) ≤ n”
is recursively enumerable.
Proof
This is an easy consequence of the fact that the domain of C is an
r.e. set. Q.E.D.
Remark
The set of all true propositions of the form
“H(s/t) ≤ n”
is not r.e.; for if it were r.e., it would easily follow from Theorems I1(c)
and I0(q) that Theorem 5.1(f) of Chaitin (1975b) is false.
Theorem I4
For each computer C there is a constant c such that
Proof
First a piece of notation. By lg x we mean the greatest integer less than the base-two logarithm of the real number x. I.e., if 2^n < x ≤ 2^(n+1), then lg x = n. Thus 2^(lg x) < x as long as x is positive. E.g., lg 2^(−3.5) = lg 2^(−3) = −4 and lg 2^(3.5) = lg 2^4 = 3.
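A direct transcription into Python (a sketch; the correction loops merely guard against floating-point rounding near exact powers of two):

```python
import math

def lg(x):
    """Greatest integer strictly below log2(x): if 2**n < x <= 2**(n+1), lg(x) = n."""
    n = math.ceil(math.log2(x)) - 1
    while 2.0 ** (n + 1) < x:   # guard against log2 rounding down
        n += 1
    while 2.0 ** n >= x:        # guard against log2 rounding up
        n -= 1
    return n

assert lg(2 ** -3.5) == lg(2 ** -3) == -4
assert lg(2 ** 3.5) == lg(2 ** 4) == 3
```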
It follows from Theorem I3(b) that one can eventually discover every
lower bound on PC (s) that is a power of two. In other words, the set
of all true propositions
T ≡ {“PC(s) > 2^−n” : PC(s) > 2^−n}
Hence
∑_{D(p,Λ) is defined} 2^−|p| < ∑_s PC(s) ≤ 1
Hence
∑_{D(p,t∗) is defined} 2^−|p| < ∑_s PC(s/t) ≤ 1
Proof
This follows immediately from Theorem I5(b). Q.E.D.
Theorem I7
P(s) ≃ ∑_t P(s, t).
Proof
On the one hand, there is a computer C such that
Thus
PC(s) ≥ ∑_t P(s, t).
Thus
∑_t PC(s, t) ≥ PC(s, s) ≥ P(s).
Q.E.D.
Theorem I8
There is a computer C and a constant c such that
HC (t/s) = H(s, t) − H(s) + c.
Proof
By Theorems I7 and I5(b) there is a c independent of s such that
2^(H(s)−c) ∑_t P(s, t) ≤ 1.
Given the free data s∗ , C computes s = U(s∗ , Λ) and H(s) = |s∗ |, and
then simulates the computer determined by the requirements
{(t, |p| − H(s) + c) : U(p, Λ) = (s, t)}
Thus for each p such that
U(p, Λ) = (s, t)
there is a corresponding p0 such that
C(p0 , s∗ ) = t
and
|p0 | = |p| − H(s) + c.
Hence
HC (t/s) = H(s, t) − H(s) + c.
However, we must check that these requirements satisfy the Kraft in-
equality and are consistent:
∑_{C(p,s∗) is defined} 2^−|p| = ∑_{U(p,Λ)=(s,t)} 2^(−|p|+H(s)−c)
= 2^(H(s)−c) ∑_t P(s, t) ≤ 1
because of the way c was chosen. Thus the hypothesis of Theorem I2 is
satisfied, and these requirements indeed determine a computer. Q.E.D.
Theorem I9
Proof
Theorem I9(a) follows immediately from Theorems I8, I0(b), and
I1(f). Theorem I9(b) follows immediately from Theorem I9(a) and
the definition of H(s : t). Theorem I9(c) follows immediately from
Theorems I9(b) and I1(a). Thus the mutual information H(s : t) is the
extent to which it is easier to compute s and t together than to compute
them separately, as well as the extent to which knowing s makes t easier
to compute. Theorem I9(d,e) follow immediately from Theorems I9(a)
and I5(b). Theorem I9(f) follows immediately from Theorems I9(b)
and I5(b). Q.E.D.
Remark
We thus have at our disposal essentially the entire formalism of
information theory. Results such as these can now be obtained effort-
lessly:
H(s1, s2, s3, s4) = H(s1/s2, s3, s4) + H(s2/s3, s4) + H(s3/s4) + H(s4) + O(1).
However, there is an interesting class of identities satisfied by our H
function that has no parallel in classical information theory. The sim-
plest of these is
H(H(s)/s) = O(1)
(Theorem I1(c)), which with Theorem I9(a) immediately yields
H(H(s), H(t/s)/s, t) = O(1).
In fact, it is easy to see that this implies
H(H(s : t)/s, t) = O(1).
And of course these identities generalize to tuples of three or more
strings.
Proof
(a) By Theorem I0(l,j),
∑_n 2^−H(n) ≤ ∑_n P(n) ≤ 1.
(b) If
∑_n 2^−f(n)
diverges, and
H(n) ≤ f(n)
held for all but finitely many values of n, then
∑_n 2^−H(n)
would also diverge, contradicting (a).
(c) If
∑_n 2^−f(n)
converges, one considers the requirements
{(n, f(n)) : n ≥ n0}
Remark
H(n) can in fact be characterized as a minimal function computable
in the limit from above that lies just on the borderline between the
convergence and the divergence of
∑_n 2^−H(n).
Theorem I11
(Maximal complexity of finite bit strings).
(a) max_{|s|=n} H(s) = n + H(n) + O(1).
We now obtain Theorem I11(a,b) from this estimate for H(s). There
is a computer C such that
C(p, |p|∗ ) = p
H(s/n) < n − k.
Chapter 7
Randomness
7.1 Introduction
Following Turing (1937), consider an enumeration r1 , r2 , r3 , . . . of all
computable real numbers between zero and one. We may suppose that
rk is the real number, if any, computed by the kth computer program.
Let .dk1dk2 dk3 . . . be the successive digits in the decimal expansion of
rk . Following Cantor, consider the diagonal of the array of rk :
This gives us a new real number with decimal expansion .d11 d22 d33 . . ..
Now change each of these digits, avoiding the digits zero and nine.
The result is an uncomputable real number, because its first digit is
different from the first digit of the first computable real, its second
digit is different from the second digit of the second computable real,
etc. It is necessary to avoid zero and nine, because real numbers with
different digit sequences can be equal to each other if one of them ends
with an infinite sequence of zeros and the other ends with an infinite
sequence of nines, for example, .3999999. . . = .4000000. . . .
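The diagonal construction can be illustrated concretely (a toy sketch: the finite list stands in for the enumeration of computable reals, and the digit-change rule, which maps every digit into {1, 2}, is one of many rules that avoid zero and nine):

```python
def diagonal(reals):
    """Return a digit string differing from the k-th expansion in its
    k-th digit, using only the digits 1 and 2 so that the result can
    never be equal, as a real number, to any listed expansion."""
    return ''.join('1' if r[k] != '1' else '2'
                   for k, r in enumerate(reals))

# A toy array standing in for the list of computable reals:
rs = ['3141592', '2718281', '1414213', '1732050', '5772156', '6931471', '1618033']
d = diagonal(rs)
assert all(d[k] != rs[k][k] for k in range(len(rs)))   # differs on the diagonal
assert all(c not in '09' for c in d)                   # never uses 0 or 9
```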
Having constructed an uncomputable real number by diagonalizing
over the computable reals, Turing points out that it follows that the
halting problem is unsolvable. In particular, there can be no way of
deciding if the kth computer program ever outputs a kth digit. Be-
cause if there were, one could actually calculate the successive digits
of the uncomputable real number defined above, which is impossible.
Turing also notes that a version of Gödel’s incompleteness theorem is
an immediate corollary, because if there cannot be an algorithm for
deciding if the kth computer program ever outputs a kth digit, there
also cannot be a formal axiomatic system which would always enable
one to prove which of these possibilities is the case, for in principle one
could run through all possible proofs to decide. As we saw in Chapter
2, using the powerful techniques which were developed in order to solve
Hilbert’s tenth problem,1 it is possible to encode the unsolvability of
the halting problem as a statement about an exponential diophantine
equation. An exponential diophantine equation is one of the form
P(x1, . . . , xm) = P′(x1, . . . , xm),
where the variables x1, . . . , xm range over non-negative integers and P and P′ are functions built up from these variables and non-negative integer constants by the operations of addition A + B, multiplication A × B, and exponentiation A^B. The result of this encoding is an exponential diophantine equation P = P′ in m + 1 variables n, x1, . . . , xm with the property that
P(n, x1, . . . , xm) = P′(n, x1, . . . , xm)
has a solution if and only if the nth computer program ever outputs an nth digit.
1. See Davis, Putnam and Robinson (1961), Davis, Matijasevič and Robinson (1976), and Jones and Matijasevič (1984).
(here |s| denotes the length of the string s), and each real in the covered
set X has a member of C as the initial part of its base-two expansion.
In other words, we consider sets of real numbers with the property that
there is an algorithm A for producing arbitrarily small coverings of the
set. Such sets of reals are constructively of measure zero. Since there are
only countably many algorithms A for constructively covering measure
zero sets, it follows that almost all real numbers are not contained in
any set of constructive measure zero. Such reals are called (Martin-Löf)
random reals. In fact, if the successive bits of a real number are chosen
by coin flipping, with probability one it will not be contained in any set
of constructive measure zero, and hence will be a random real number.
Note that no computable real number r is random. Here is how we
get a constructive covering of arbitrarily small measure. The covering
algorithm, given n, yields the n-bit initial sequence of the binary digits
of r. This covers r and has total length or measure equal to 2−n . Thus
there is an algorithm for obtaining arbitrarily small coverings of the set
consisting of the computable real r, and r is not a random real number.
We leave to the reader the adaptation of the argument in Feller (1970) proving the strong law of large numbers to show that the reals whose digits do not all have equal limiting frequency form a set of constructive measure zero. It follows that random reals are normal in Borel’s sense, that is, in any base all digits have equal limiting frequency.
Let us consider the real number p whose nth bit in base-two notation is a zero or a one depending on whether or not the exponential diophantine equation
P(n, x1, . . . , xm) = P′(n, x1, . . . , xm)
has a solution. Consider the first N values of the parameter n. If one knows for how many of these values of n, P = P′ has a solution, then one can find for which
values of n < N there are solutions. This is because the set of solutions of P = P′ is recursively enumerable, that is, one can try more and
more solutions and eventually find each value of the parameter n for
which there is a solution. The only problem is to decide when to give
up further searches because all values of n < N for which there are
solutions have been found. But if one is told how many such n there
are, then one knows when to stop searching for solutions. So one can
assume each of the N + 1 possibilities, ranging from “p has all of its initial N bits off” to “p has all of them on,” and each one of these assumptions determines the actual values of the first N bits of p. Thus we have determined N + 1 different possibilities for the first N bits of p; that is, the real number p is covered by a set of intervals of total length (N + 1)2^−N, and hence lies in a set of constructive measure zero, and p cannot be a random real number.
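The step “if one is told how many such n there are, then one knows when to stop searching” can be sketched as follows (a toy illustration: the finite list stands in for an r.e. enumeration of the solvable parameter values, and the function name is hypothetical):

```python
def solvable_below(enumeration, N, count):
    """Given any enumeration of the solvable parameter values (in any
    order, repeats allowed) and the number `count` of solvable n < N,
    recover exactly which n < N are solvable: enumerate until `count`
    distinct values below N have appeared, then stop searching."""
    found = set()
    for n in enumeration:
        if n < N:
            found.add(n)
        if len(found) == count:
            break
    return found

# A finite list stands in for the r.e. enumeration of solutions:
stream = iter([3, 0, 7, 5, 12, 3])
assert solvable_below(stream, 8, 4) == {0, 3, 5, 7}
```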
Thus asking whether an exponential diophantine equation has a
solution as a function of a parameter cannot give us a random real
number. However asking whether or not the number of solutions is
infinite can give us a random real. In particular, there is an exponential diophantine equation Q = Q′ for which the following real number q is random: the nth bit of q is a zero or a one depending on whether or not there are infinitely many different m-tuples of non-negative integers x1, . . . , xm such that
Q(n, x1, . . . , xm) = Q′(n, x1, . . . , xm).
The equation P = P′ that we considered before encoded the halting problem, that is, the nth bit of the real number p was zero or one depending on whether the nth computer program ever outputs an nth digit. To construct an equation Q = Q′ such that q is random, we use instead the halting probability Ω of a universal Turing machine; Q = Q′ has finitely or infinitely many solutions depending on whether the nth bit of the base-two expansion of the halting probability Ω is a zero or a one.
Q = Q′ is quite a remarkable equation, as it shows that there is a kind of uncertainty principle even in pure mathematics, in fact, even in the theory of whole numbers. Whether or not Q = Q′ has infinitely
of an infinite random string from the preceding ones, must fail about
half the time. Previously we could only prove this to be the case if
(the number of bits predicted among the first n) / log n → ∞; now
this works as long as infinitely many predictions are made. So by going
from considering the size of LISP expressions to considering the size
of self-delimiting programs in a rather abstract programming language,
we lose the concreteness of the familiar, but we gain extremely sharp
theorems.
Definition [Martin-Löf (1966)]
Speaking geometrically, a real r is Martin-Löf random if it is never
the case that it is contained in each set of an r.e. infinite sequence Ai
of sets of intervals with the property that the measure3 of the ith set
is always less than or equal to 2−i :
µ(Ai) ≤ 2^−i. (7.1)
Here is the definition of a Martin-Löf random real r in a more compact notation:
∀i [µ(Ai) ≤ 2^−i] ⇒ ¬∀i [r ∈ Ai].
An equivalent definition, if we restrict ourselves to reals in the unit
interval 0 ≤ r ≤ 1, may be formulated in terms of bit strings rather
than geometrical notions, as follows. Define a covering to be an r.e. set
of ordered pairs consisting of a positive integer i and a bit string s,
Covering = {(i, s)},
with the property that if (i, s) ∈ Covering and (i, s0 ) ∈ Covering, then
it is not the case that s is an extension of s0 or that s0 is an extension
of s.4 We simultaneously consider Ai to be a set of (finite) bit strings
{s : (i, s) ∈ Covering}
3. I.e., the sum of the lengths of the intervals, being careful to avoid counting overlapping intervals twice.
4. This is to avoid overlapping intervals and enable us to use the formula (7.2). It
is easy to convert a covering which does not have this property into one that covers
exactly the same set and does have this property. How this is done depends on the
order in which overlaps are discovered: intervals which are subsets of ones which
have already been included in the enumeration of Ai are eliminated, and intervals
which are supersets of ones which have already been included in the enumeration
must be split into disjoint subintervals, and the common portion must be thrown
away.
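The conversion described in footnote 4 might be sketched as follows in modern Python (illustrative names; a bit string s denotes the interval [0.s, 0.s + 2^−|s|)): strings already covered are discarded, and strings whose interval strictly contains covered pieces are split into disjoint subintervals, throwing the common portions away.

```python
def add_to_cover(cover, s):
    """Add the interval of bit string s to a prefix-free set of bit
    strings, preserving the union of the covered intervals.

    A string already covered is discarded; a string whose interval
    strictly contains covered pieces is split in half, and the halves
    are handled recursively."""
    if any(s.startswith(t) for t in cover):
        return                          # subset of an enumerated interval
    if any(t.startswith(s) for t in cover):
        add_to_cover(cover, s + '0')    # superset: split into subintervals
        add_to_cover(cover, s + '1')
    else:
        cover.add(s)

cover = set()
add_to_cover(cover, '01')
add_to_cover(cover, '0')                # '0' strictly contains '01'
assert cover == {'01', '00'}            # same union [0, 1/2), now prefix-free
```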
∃c∀n [H(rn ) ≥ n − c]
The series
∑ 2^n/2^(n^2) = ∑ 2^(−n^2+n) = 2^−0 + 2^−0 + 2^−2 + 2^−6 + 2^−12 + 2^−20 + · · ·
(In fact, we can take N = 2.) Let the variable s range over bit strings,
and consider the following inequality:
∑_{n≥N} ∑_{s∈A_{n^2}} 2^−[|s|−n] = ∑_{n≥N} 2^n µ(A_{n^2}) ≤ ∑_{n≥N} 2^(−n^2+n) ≤ 1.
satisfy the Kraft inequality and are consistent (Theorem I2). It follows
that
s ∈ A_{n^2} & n ≥ N ⇒ H(s) ≤ |s| − n + sim(C).
Thus, since r ∈ A_{n^2} for all n ≥ N, there will be infinitely many initial segments rk of length k of the base-two expansion of r with the property that rk ∈ A_{n^2} and n ≥ N, and for each of these rk we have
since Ω ≤ 1. Thus if a real r has the property that H(rn ) dips below
n − k − c for even one value of n, then r is covered by an r.e. set Ak of
intervals with µ(Ak ) ≤ 2−k . Thus if H(rn )−n goes arbitrarily negative,
for each k we can compute an Ak with µ(Ak ) ≤ 2−k and r ∈ Ak , and r
is not Martin-Löf random. Q.E.D.
Theorem R2 [Solovay (1975)]
Martin-Löf random ⇔ Solovay random.
Proof ¬Martin-Löf ⇒ ¬Solovay
We are given that ∀i [r ∈ Ai ] and ∀i [µ(Ai ) ≤ 2−i ]. Thus
∑ µ(Ai) ≤ ∑ 2^−i < ∞.
Hence ∑ µ(Ai) converges, and r is in infinitely many of the Ai and cannot be Solovay random.
Proof ¬Solovay ⇒ ¬Martin-Löf
Suppose
∑ µ(Ai) ≤ 2^c
and the real number r is in infinitely many of the Ai. Let
Bn = {x : x is in at least 2^(n+c) of the Ai}.
Hence
∑_{i≥N} ∑_{s∈Ai} 2^−|s| = ∑_{i≥N} µ(Ai) ≤ 1.
Let An be the r.e. set of all n-bit strings s such that H(s) < n + k.
∑_n µ(An) ≤ ∑_n 2^(−H(n)+k+c) = 2^(k+c) ∑_n 2^−H(n) ≤ 2^(k+c) Ω ≤ 2^(k+c),
since Ω ≤ 1. Hence ∑ µ(An) < ∞ and r is in infinitely many of the An, and thus r is not Solovay random. Q.E.D.
Theorem R4
A real number is Martin-Löf random ⇔ it is Solovay random ⇔ it
is Chaitin random ⇔ it is weakly Chaitin random.
Proof
The equivalence of all four definitions of a random real number
follows immediately from Theorems R1, R2, and R3. Q.E.D.
Note
That weak Chaitin randomness is coextensive with Chaitin randomness reveals a complexity gap. I.e., we have shown that if H(rn) > n − c for all n, then necessarily H(rn) − n → ∞.
Theorem R5
With probability one, a real number r is Martin-Löf/Solovay/
Chaitin random.
Proof 1
Since Solovay randomness ⇒ Martin-Löf and Chaitin randomness,
it is sufficient to show that r is Solovay random with probability one.
Suppose
∑ µ(Ai) < ∞,
where the Ai are an r.e. infinite sequence of sets of intervals. Then (this is the Borel–Cantelli lemma [Feller (1970)])
lim_{N→∞} µ(∪_{i≥N} Ai) ≤ lim_{N→∞} ∑_{i≥N} µ(Ai) = 0.
We use the Borel–Cantelli lemma again. This time we show that the
Chaitin criterion for randomness, which is equivalent to the Martin-Löf
and Solovay criteria, is true with probability one. Since for each k,
∑_n µ({r : H(rn) < n + k}) ≤ 2^(k+c)
and thus converges, it follows that for each k with probability one H(rn) < n + k only finitely often. Thus, with probability one,
lim_{n→∞} [H(rn) − n] = ∞.
Q.E.D.
Theorem R6
Ω is a Martin-Löf/Solovay/Chaitin random real number.
Proof
It is easy to see that Ω can be computed as a limit from below. We
gave a LISP program for doing this at the end of Chapter 5. Indeed,
{p : U(p, Λ) is defined} ≡ {p1 , p2 , p3 , . . .}
is a recursively enumerable set. Let
ωn ≡ ∑_{k≤n} 2^−|pk|.
so that all objects with complexity H less than or equal to n are in the
set
{U(pi , Λ) : i ≤ k} ,
and one can calculate this set and then pick an arbitrary object that
isn’t in it.
Thus there is a computable partial function ψ such that ψ(Ωn) is an object whose complexity is greater than n. But
H(ψ(Ωn )) ≤ H(Ωn ) + cψ .
Hence
n < H(ψ(Ωn )) ≤ H(Ωn ) + cψ ,
and
H(Ωn ) > n − cψ .
Thus Ω is weakly Chaitin random, and by Theorem R4 it is Martin-
Löf/Solovay/Chaitin random. Q.E.D.
Note
More generally, if X is an infinite r.e. set of S-expressions, then
∑_{x∈X} 2^−H(x)
and
∑_{x∈X} P(x)
Consider the set An of all infinite bit strings for which F makes at
least n predictions and the number of correct predictions k among the
first n made satisfies
|1/2 − k/n| > ε.
We shall show that
essentially converges like a geometric series with ratio less than one.
Since Ω satisfies the Solovay randomness criterion, it follows that Ω is
in at most finitely many of the An . I.e., if F predicts infinitely many
bits of Ω, then, for any ε > 0, from some point on the number of correct
predictions k among the first n made satisfies
|1/2 − k/n| ≤ ε,
if
|1/2 − k/n| > ε.
To prove this, note that the binomial coefficients “n choose k” sum to
2n , and that the coefficients start small, grow until the middle, and
then decrease as k increases beyond n/2. Thus the coefficients that
we are interested in are obtained by taking the large middle binomial coefficient, which is less than 2^n, and multiplying it by at least εn fractions, each of which is less than unity. In fact, at least εn/2 of the
L(n, x1 , . . . , xm ) = R(n, x1 , . . . , xm )
where there are n 1’s in the first list of 1’s and k 1’s in the second list of
1’s. The resulting equation will have a solution in non-negative integers
if and only if ϕ(n, k) is defined, and for given n and k it can have at
most one solution.
We are almost at our goal; we need only point out that the binary
representation of the S-expression (7.3) can be written in closed form
as an algebraic function of n and k that only uses +, ×, −, and expo-
nentiation. This is easy to see; the essential step is that the binary
representation of a character string consisting only of 1’s is just the
sum of a geometric series with multiplier 256. Then, proceeding as in
Chapter 2, we eliminate the minus signs and express the fact that s is
the binary representation of the S-expression (7.3) with given n and k
by means of a few exponential diophantine equations. Finally we fold
this handful of equations into the left-hand side and the right-hand side
of our LISP interpreter equation, using the same “sum of squares” trick
that we did in Chapter 2.
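The essential step can be checked directly: reading a string of k identical characters as a base-256 numeral gives the geometric sum code · (1 + 256 + · · · + 256^(k−1)) = code · (256^k − 1)/255. Here is a sketch using the ASCII code 49 for the character ‘1’ (the book’s own character code may differ, but the closed form is the same):

```python
def ones_value(k, code=49):
    """Closed form for a string of k identical characters read as a
    base-256 numeral: code * (1 + 256 + ... + 256**(k-1))."""
    return code * (256 ** k - 1) // 255

# Agrees with reading the raw bytes directly (ASCII '1' is 49):
for k in range(1, 8):
    assert ones_value(k) == int.from_bytes(b'1' * k, 'big')
```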
The result is that our equation has gotten a little bigger, and that
the variable input.EXPRESSION has been replaced by three new vari-
ables s, n and k and a few new auxiliary variables. This new monster
equation has a solution if and only if ϕ(n, k) is defined, and for given n
and k it can have at most one solution. Recall that ϕ(n, k) is defined
for all sufficiently large values of k if and only if the nth bit of the base-
two representation of Ω is a 1. Thus our new equation has infinitely
many solutions for a given value of n if the nth bit of Ω is a 1, and it
has finitely many solutions for a given value of n if the nth bit of Ω is
a 0. Q.E.D.
Chapter 8
Incompleteness
one can never prove that a specific string has this property.
As we saw when we studied randomness, if one produces a bit string
s by tossing a coin n times, 99.9% of the time it will be the case that
H(s) ≈ n + H(n). In fact, if one lets n go to infinity, with probability
one H(s) > n for all but finitely many n (Theorem R5). However,
Theorem LB [Chaitin (1974a,1974b,1975a,1982b)]
Consider a formal theory all of whose theorems are assumed to be
true. Within such a formal theory a specific string cannot be proven to
have information content more than O(1) greater than the information
content of the axioms of the theory. I.e., if “H(s) ≥ n” is a theorem
only if it is true, then it is a theorem only if n ≤ H(axioms) + O(1).
Conversely, there are formal theories whose axioms have information
content n+O(1) in which it is possible to establish all true propositions
of the form “H(s) ≥ n” and of the form “H(s) = k” with k < n.
Proof
The idea is that if one could prove that a string has no distinguish-
ing feature, then that itself would be a distinguishing property. This
paradox can be restated as follows: There are no uninteresting numbers
(positive integers), because if there were, the first uninteresting number
would ipso facto be interesting! Alternatively, consider “the smallest
positive integer that cannot be specified in less than a thousand words.”
We have just specified it using only fourteen words.
Consider the enumeration of the theorems of the formal axiomatic
theory in order of the size of their proofs. For each positive integer
k, let s∗ be the string in the theorem of the form “H(s) ≥ n” with
n > H(axioms) + k which appears first in the enumeration. On the one
hand, if all theorems are true, then
H(s∗) > H(axioms) + k.
On the other hand, the above prescription for calculating s∗ shows that s∗ can be computed from the axioms, H(axioms), and k, and thus
H(s∗) ≤ H(axioms, H(axioms), k) + cψ ≤ H(axioms) + H(k) + O(1).
and thus
k < H(k) + O(1).
However, this inequality is false for all k ≥ k0 , where k0 depends only
on the rules of inference. A contradiction is avoided only if s∗ does not
exist for k = k0 , i.e., it is impossible to prove in the formal theory that
a specific string has H greater than H(axioms) + k0 .
Proof of Converse
The set T of all true propositions of the form “H(s) ≤ k” is re-
cursively enumerable. Choose a fixed enumeration of T without repe-
titions, and for each positive integer n, let s∗ be the string in the last
proposition of the form “H(s) ≤ k” with k < n in the enumeration.
Let
∆ = n − H(s∗ ) > 0.
Then from s∗ , H(s∗), and ∆ we can calculate n = H(s∗ ) + ∆, then all
strings s with H(s) < n, and then a string sn with H(sn) ≥ n. Thus sn can be computed from the triple (s∗, H(s∗), ∆), and so
n ≤ H(s∗, H(s∗), ∆) + cψ ≤ H(s∗) + H(∆) + O(1)
≤ n + H(∆) + O(1) (8.1)
using the subadditivity of joint information and the fact that a program
tells us its size as well as its output. The first line of (8.1) implies that
∆ ≤ H(∆) + O(1),
which implies that ∆ and H(∆) are both bounded. Then the second line of (8.1) implies that
H(s∗, H(s∗), ∆) = n + O(1).
The triple (s∗, H(s∗), ∆) is the desired axiom: it has information content n + O(1), and by enumerating T until all true propositions of the
form “H(s) ≤ k” with k < n have been discovered, one can immedi-
ately deduce all true propositions of the form “H(s) ≥ n” and of the
form “H(s) = k” with k < n. Q.E.D.
Note
Here are two other ways to establish the converse, two axioms that
solve the halting problem for all programs of size ≤ n:
for now (Sections 8.2 and 8.3) we only make use of the fact that Ω is
Martin-Löf random.
If one tries to guess the bits of a random sequence, the average
number of correct guesses before failing is exactly 1 guess! Reason: if
we use the fact that the expected value of a sum is equal to the sum
of the expected values, the answer is the sum of the chance of getting
the first guess right, plus the chance of getting the first and the second
guesses right, plus the chance of getting the first, second and third
guesses right, et cetera:
1/2 + 1/4 + 1/8 + 1/16 + · · · = 1.
Or if we directly calculate the expected value as the sum of (the number
right till first failure) × (the probability):
0 × 1/2 + 1 × 1/4 + 2 × 1/8 + 3 × 1/16 + 4 × 1/32 + · · ·
= 1 × ∑_{k>1} 2^−k + 1 × ∑_{k>2} 2^−k + 1 × ∑_{k>3} 2^−k + · · ·
= 1/2 + 1/4 + 1/8 + · · · = 1.
On the other hand (see the next section), if we are allowed to try a series of n guesses 2^n times, one of them will always get it right, namely if we try all 2^n different possible series of n guesses.
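Both computations of the expected value can be checked numerically, truncating the series far beyond double precision:

```python
# Expected number of correct guesses before the first failure,
# computed two ways as in the text, truncated at 200 terms.
by_tails = sum(2.0 ** -k for k in range(1, 200))              # 1/2 + 1/4 + 1/8 + ...
by_definition = sum(j * 2.0 ** -(j + 1) for j in range(200))  # sum of j * Pr(exactly j right)
assert abs(by_tails - 1.0) < 1e-12
assert abs(by_definition - 1.0) < 1e-12
```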
Theorem X
Any given formal theory T can yield only finitely many (scattered)
bits of (the base-two expansion of) Ω. When we say that a theory yields
a bit of Ω, we mean that it enables us to determine its position and its
0/1 value.
Proof
Consider a theory T , an r.e. set of true assertions of the form
since
∑ 2^−f(n) ≤ 1.
Hence µ(Ak) ≤ 2^−k, and ∑ µ(Ak) also converges. Thus only finitely many of the Ak occur (Borel–Cantelli lemma [Feller (1970)]). I.e.,
lim_{N→∞} µ(∪_{k>N} Ak) ≤ ∑_{k>N} µ(Ak) ≤ 2^−N → 0.
The measure µ(Ak) of the union of the set of possibilities for Ω covered by n-bit theories with any n is thus
≤ ∑_n 2^(−f(n)−k) = 2^−k ∑_n 2^−f(n) ≤ 2^−k (since ∑ 2^−f(n) ≤ 1).
Proof
Choose c so that
∑ 2^−f(n) ≤ 2^c.
Then
∑ 2^−[f(n)+c] ≤ 1,
and we can apply Theorem A to f′(n) = f(n) + c. Q.E.D.
Corollary A2
Let
∑ 2^−f(n)
converge and f be computable as before. If g(n) is computable, then there is a constant cf,g with the property that no g(n)-bit theory ever yields more than g(n) + f(n) + cf,g bits of Ω. E.g., consider N of the form
2^(2^n).
For such N, no N-bit theory ever yields more than N + f(log log N) + cf,g bits of Ω.
Note
Thus for n of special form, i.e., which have concise descriptions, we
get better upper bounds on the number of bits of Ω which are yielded
by n-bit theories. This is a foretaste of the way algorithmic information
theory will be used in Theorem C and Corollary C2 (Section 8.4).
Lemma for Second Borel–Cantelli Lemma!
For any finite set {xk } of non-negative real numbers,
∏ (1 − xk) ≤ 1/∑ xk.
Proof
If x is a non-negative real number, then
1 − x ≤ 1/(1 + x).
Thus
∏ (1 − xk) ≤ 1/∏ (1 + xk) ≤ 1/∑ xk,
since ∏ (1 + xk) ≥ 1 + ∑ xk ≥ ∑ xk.
Q.E.D.
Second Borel–Cantelli Lemma [Feller (1970)]
Suppose that the events An have the property that it is possible to
determine whether or not the event An occurs by examining the first
f (n) bits of Ω, where f is a computable function. If the events An are
P
mutually independent and µ(An ) diverges, then Ω has the property
that infinitely many of the An must occur.
Proof
Suppose on the contrary that Ω has the property that only finitely
many of the events An occur. Then there is an N such that the event
An does not occur if n ≥ N. The probability that none of the events
AN , AN +1 , . . . , AN +k occur is, since the An are mutually independent,
precisely
∏_{i=0}^{k} (1 − µ(A_{N+i})) ≤ 1/[∑_{i=0}^{k} µ(A_{N+i})],
(2) Then there must be a 1 at each end of the run of 0’s, but the
remaining 2k − k − 2 = k − 2 bits can be anything.
(4) There is no room for another 10^k1 to fit in the block of 2k bits, so
we are not overestimating the probability by counting anything
twice.
Invoking the second Borel–Cantelli lemma (if the events Ai are indepen-
P
dent and µ(Ai ) diverges, then infinitely many of the Ai must occur),
we are finished. Q.E.D.
Corollary B
If
∑ 2^−f(n)
diverges and f is computable and nondecreasing, then infinitely often there is a run of f(2^(n+1)) zeros between bits 2^n and 2^(n+1) of Ω (2^n ≤ bit < 2^(n+1)). Hence there are infinitely many N-bit theories that yield (the first) N + f(N) bits of Ω.
Proof
= (1/2) ∑ 2^(n+1) φ(2^(n+1)).
On the other hand,
∑ φ(k) ≤ ∑ [φ(2^n) + · · · + φ(2^(n+1) − 1)] ≤ ∑ 2^n φ(2^n).
If
∑ 2^−f(n)
diverges and f is computable and nondecreasing, then by the Cauchy condensation test
∑ 2^n 2^−f(2^n)
also diverges, and therefore so does
∑ 2^n 2^−f(2^(n+1)).
(a) There is a c with the property that no n-bit theory ever yields
more than
(scattered) bits of Ω.
(b) There are infinitely many n-bit theories that yield (the first)
bits of Ω.
Proof
Using the Cauchy condensation test, we shall show below that
(a) ∑ 1/(n log n (log log n)^2) < ∞,
(b) ∑ 1/(n log n log log n) = ∞.
which converges.
On the other hand,
∑ 1/n
behaves the same as
∑ 2^n (1/2^n) = ∑ 1,
which diverges.
∑ 1/(n log n)
behaves the same as
∑ 2^n · 1/(2^n n) = ∑ 1/n,
which diverges. And
∑ 1/(n log n log log n)
behaves the same as
∑ 2^n · 1/(2^n n log n) = ∑ 1/(n log n),
which diverges. Q.E.D.
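These claims can be probed concretely. The first check below verifies the condensation step exactly as used in the proof; the tail comparison is only a numerical heuristic (not a proof): between N = 10^5 and N = 10^6 the divergent series (b) still gains noticeably more mass than the convergent series (a).

```python
import math

# The condensation step: for phi(k) = 1/(k log k log log k), logs base 2,
# the n-th condensed term is 2**n * phi(2**n) = 1/(n log n).
log2 = lambda x: math.log(x, 2)
phi = lambda k: 1.0 / (k * log2(k) * log2(log2(k)))
for n in range(3, 40):
    assert math.isclose(2.0 ** n * phi(2.0 ** n), 1.0 / (n * log2(n)))

# Heuristic tail comparison between 10**5 and 10**6:
a_gain = sum(1.0 / (k * math.log(k) * math.log(math.log(k)) ** 2)
             for k in range(10 ** 5, 10 ** 6))
b_gain = sum(1.0 / (k * math.log(k) * math.log(math.log(k)))
             for k in range(10 ** 5, 10 ** 6))
assert b_gain > 2 * a_gain
```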
(1) We have seen that the information content of knowing the first n
bits of Ω is ≥ n − c.
(2) Now we show that the information content of knowing any n bits
of Ω (their positions and 0/1 values) is ≥ n − c.
Lemma C
∑_n #{s : H(s) < n} 2^−n ≤ 1.
Proof
1 ≥ Ω ≥ ∑_s 2^−H(s)
= ∑_n #{s : H(s) = n} 2^−n = ∑_n #{s : H(s) = n} 2^−n ∑_{k≥1} 2^−k
= ∑_n ∑_{k≥1} #{s : H(s) = n} 2^(−n−k) = ∑_n #{s : H(s) < n} 2^−n.
Q.E.D.
Theorem C
If a theory has H(axiom) < n, then it can yield at most n + c
(scattered) bits of Ω.
Proof
Consider a particular k and n. If there is an axiom with H(axiom) <
n which yields n + k scattered bits of Ω, then even without knowing
which axiom it is, we can cover Ω with an r.e. set of intervals of measure
#{s : H(s) < n} × 2^(−n−k)
(the number of axioms with H < n, times the measure of the set of possibilities for Ω that each axiom determines).
Thus if even one theory with H < n yields n+ k bits of Ω, for any n, we
get a cover for Ω of measure ≤ 2−k . This can only be true for finitely
many values of k, or Ω would not be Martin-Löf random. Q.E.D.
Corollary C
No n-bit theory ever yields more than n + H(n) + c bits of Ω.
Proof
Proof
The theorem combines Theorem R8, Corollaries A and B, Theorem
C, and Corollaries C and C2. Q.E.D.
Chapter 9
Conclusion
1. Compare my previous thoughts on theoretical biology, Chaitin (1970b) and Chaitin (1979). There I suggest that mutual information H(s : t) can be used to pick out the highly correlated regions of space that contain organisms. This view is static; here we are concerned with the dynamics of the situation. Incidentally, it is also possible to regard these papers as an extremely abstract discussion of musical structure and metrics between compositional styles.
2. In Chaitin (1985) I examine the complexity of physical laws by actually programming them, and the programs turn out to be amazingly small. I use APL instead of LISP.
3. See Chaitin (1977a, 1976c).
4. See the discussion of matching pennies in Chaitin (1969a).
Chapter 10
Bibliography
Implementation Notes
The programs in this book were run under the VM/CMS time-sharing
system on a large IBM 370 mainframe, a 3090 processor. A virtual
machine with 4 megabytes of storage was used.
The compiler for converting register machine programs into exponential diophantine equations is a 700-line REXX program. REXX is a very nice and easy-to-use pattern-matching string-processing language implemented by means of a very efficient interpreter.
There are three implementations of our version of pure LISP:
(1) The first is in REXX, and is 350 lines of code. This is the sim-
plest implementation of the LISP interpreter, and it serves as an
“executable design document.”
(3) The third LISP implementation, like the previous one, has a 250-line REXX driver; the real work is done by a 700-line 370 Assembler H expression evaluator. This is the high-performance evaluator, and it is amazingly small: less than 8K bytes of 370 machine language code, tables, and buffers, plus a megabyte of storage for the stack, and two megabytes for the heap, so that there is another megabyte left over for the REXX driver. It gets by without a garbage collector: since all information that must be preserved from one evaluation to another (mostly function definitions) is in the form of REXX character strings, the expression evaluator can be reinitialized after each evaluation. Another reason for the simplicity and speed of this interpreter is that our version of pure LISP is “permissive”; error checking and the production of diagnostic messages are usually a substantial portion of an interpreter.
All the REXX programs referred to above need to know the set of
valid LISP characters, and this information is parameterized as a small
128-character file.
An extensive suite of tests has been run through all three LISP
implementations, to ensure that the three interpreters produce identical
results.
This software is available from the author on request.
Appendix B
S-Expressions of Size n
that left and right parentheses must balance for the first time precisely
at the end of the expression. Our task is easier than in normal LISP
because we ignore blanks and all atoms are exactly one character long,
and also because NIL and () are not synonyms.
Here are some examples. S_0 = 0, since there are no zero-character S-expressions. S_1 = α, since each atom by itself is an S-expression. S_2 = 1, because the empty list () is two characters. S_3 = α again:
(a)
S_4 = α^2 + 1:
(aa)
(())
S_5 = α^3 + 3α:
(aaa)
(a())
(()a)
((a))
S_6 = α^4 + 6α^2 + 2:
(aaaa)
(aa())
(a()a)
(a(a))
(()aa)
(()())
((a)a)
((aa))
((()))
S_7 = α^5 + 10α^3 + 10α:
(aaaaa)
(aaa())
(aa()a)
(aa(a))
225
(a()aa)
(a()())
(a(a)a)
(a(aa))
(a(()))
(()aaa)
(()a())
(()()a)
(()(a))
((a)aa)
((a)())
((aa)a)
((())a)
((aaa))
((a()))
((()a))
(((a)))
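As an illustrative aside (not from the original text, and not in one of the book's implementation languages), the counts listed above can be verified by brute force. This Python sketch follows the conventions stated in the text: atoms are exactly one character, blanks are ignored, and () is not a synonym for NIL; the function names are invented for illustration.

```python
from itertools import product

def is_sexp(s):
    """True if s is a valid S-expression: a single atom, or a
    parenthesized string whose parens first balance at the very end."""
    if len(s) == 1:
        return s not in "()"          # a one-character atom
    if not (s.startswith("(") and s.endswith(")")):
        return False
    depth = 0
    for i, c in enumerate(s):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        # the balance must stay positive until the final character
        if depth == 0 and i < len(s) - 1:
            return False
    return depth == 0

def count_sexps(n, atoms="ab"):
    """Brute-force S_n: count the valid n-character S-expressions."""
    alphabet = "()" + atoms
    return sum(is_sexp("".join(t)) for t in product(alphabet, repeat=n))

# With alpha = 2 atoms the polynomials above give
#   S_5 = a^3 + 3a = 14, S_6 = a^4 + 6a^2 + 2 = 42, S_7 = a^5 + 10a^3 + 10a = 132
print([count_sexps(n) for n in range(1, 8)])  # → [2, 1, 2, 5, 14, 42, 132]
```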
Moreover, it follows from the asymptotic estimate for S_n that this infinite series converges as Σ n^{−1.5} does.
In fact, the asymptotic estimate for S_n stated above is derived by using the well-known fact that the probability that the first return to the origin in a symmetrical random walk in one dimension occurs at epoch 2n is precisely
$$
\frac{1}{2n-1}\binom{2n}{n}2^{-2n} \sim \frac{1}{2n\sqrt{\pi n}}.
$$
For we are choosing exactly half of the random walks, i.e., those that start with a left parenthesis, not a right parenthesis.
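As an illustrative aside (not part of the original text), the estimate can be observed numerically by comparing the exact first-return probability with its asymptotic form; the function name here is invented for illustration.

```python
from math import comb, sqrt, pi

def first_return(n):
    """Exact probability that a symmetric random walk first returns
    to the origin at epoch 2n: (1/(2n-1)) C(2n,n) 2^(-2n)."""
    return comb(2 * n, n) / ((2 * n - 1) * 4 ** n)

# the ratio exact/asymptotic tends to 1 as n grows
for n in (10, 100, 1000):
    print(n, first_return(n) / (1 / (2 * n * sqrt(pi * n))))
```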
Accepting this estimate for the moment (we shall give a proof later) [or see Feller (1970)], we now derive the asymptotic estimate for S_n for unrestricted α. To obtain an arbitrary n-character S-expression, first decide the number 2k (0 ≤ 2k ≤ n) of parentheses that it contains. Then choose which of the n characters will be parentheses and which will be one of the α atoms. There are n − 2 choose 2k − 2 ways of doing this, because the first and the last characters must always be a left and a right parenthesis, respectively. There remain α^{n−2k} choices for the characters that are not parentheses, and (one-half the number of ways a random walk can return to the origin for the first time at epoch 2k) ways to choose the parentheses. The total number of n-character S-expressions is therefore
$$
S_n = \sum_{0 \le 2k \le n} \binom{n-2}{2k-2}\,\alpha^{n-2k}\left[\frac{1}{2}\,\frac{1}{2k-1}\binom{2k}{k}\right].
$$
This is approximately equal to
$$
\sum_{0 \le 2k \le n} \binom{n}{2k}\,\alpha^{n-2k}\left[\frac{2k}{n}\right]^2\left[2^{2k}\,\frac{1}{4\sqrt{\pi}\,k^{1.5}}\right].
$$
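As an illustrative check (not from the original text), the exact sum for S_n above can be evaluated with exact rational arithmetic and compared with the polynomial values listed earlier; note that the k = 0 term vanishes, and the sum reproduces S_n for n ≥ 2. The function name is invented for illustration.

```python
from math import comb
from fractions import Fraction

def S_exact(n, alpha):
    """S_n = sum over 0 <= 2k <= n of
    C(n-2, 2k-2) * alpha^(n-2k) * (1/2) * (1/(2k-1)) * C(2k, k)."""
    total = Fraction(0)
    for k in range(1, n // 2 + 1):   # the k = 0 term is zero
        total += (comb(n - 2, 2 * k - 2) * alpha ** (n - 2 * k)
                  * Fraction(1, 2) * Fraction(1, 2 * k - 1) * comb(2 * k, k))
    return int(total)

# alpha = 2: S_5 = a^3+3a = 14, S_6 = a^4+6a^2+2 = 42, S_7 = a^5+10a^3+10a = 132
print([S_exact(n, 2) for n in range(2, 8)])  # → [1, 2, 5, 14, 42, 132]
```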
$$
S_0 = 0, \quad S_1 = \alpha, \quad S_2 = 1, \quad
S_n = \sum_{k=2}^{n-1} S_{n-k} S_k \quad (n \ge 3). \tag{B.1}
$$
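The recurrence (B.1) is straightforward to run. The following Python sketch is purely illustrative (the function name is invented); with α = 2 it reproduces the polynomial values listed earlier.

```python
def S_list(n_max, alpha):
    """S_0 .. S_n_max by recurrence (B.1): S_0 = 0, S_1 = alpha, S_2 = 1,
    and S_n = sum_{k=2}^{n-1} S_{n-k} S_k for n >= 3."""
    s = [0, alpha, 1] + [0] * (n_max - 2)
    for n in range(3, n_max + 1):
        s[n] = sum(s[n - k] * s[k] for k in range(2, n))
    return s

print(S_list(7, 2))  # → [0, 2, 1, 2, 5, 14, 42, 132]
```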
The recurrence (B.1) for S_n and its boundary conditions can then be reformulated in terms of the generating function F(x) = Σ_n S_n x^n as follows:
$$
F(x)^2 + [-\alpha x - 1]\,F(x) + \alpha x + x^2 = 0.
$$
2
For some of the history of this method, and its use on a related problem, see
“A combinatorial problem in plane geometry,” Exercises 7–9, Chapter VI, p. 102,
Pólya (1954).
Differentiating G(x)^2 = P(x) gives 2G(x)G'(x) = P'(x), and multiplying through by G(x) yields
$$
2P(x)G'(x) = P'(x)G(x),
$$
from which we now derive a recurrence for calculating S_n from S_{n-1} and S_{n-2}, instead of needing all previous values.
We have
$$
G(x)^2 = P(x),
$$
that is,
$$
G(x)^2 = -\alpha x - x^2 + \frac{1}{4}[-\alpha x - 1]^2.
$$
3
I am grateful to my colleague Victor Miller for suggesting the method we use
to do this.
where it is understood that the low order terms of the sums have been “modified.” Substituting in P(x) and P'(x), and multiplying through by 2, we obtain
$$
\left[(\alpha^2-4)x^2 - 2\alpha x + 1\right] \sum (n+1)S_{n+1}x^n
= \left[(\alpha^2-4)x - \alpha\right] \sum S_n x^n,
$$
i.e.,
$$
\sum \left[(\alpha^2-4)(n-1)S_{n-1} - 2\alpha n S_n + (n+1)S_{n+1}\right] x^n
= \sum \left[(\alpha^2-4)S_{n-1} - \alpha S_n\right] x^n.
$$
We have thus obtained the following remarkable recurrence for n ≥ 3:
$$
nS_n = -\left[(\alpha^2-4)(n-3)\right] S_{n-2} + \left[2\alpha(n-1) - \alpha\right] S_{n-1}. \tag{B.2}
$$
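As an illustrative aside (not from the original text), recurrence (B.2) can be checked against (B.1) numerically; in the sketch below the division by n is exact, since the recurrence holds over the integers for integer α, and the function names are invented.

```python
def S_slow(n_max, alpha):
    # S_n by the convolution recurrence (B.1)
    s = [0, alpha, 1] + [0] * (n_max - 2)
    for n in range(3, n_max + 1):
        s[n] = sum(s[n - k] * s[k] for k in range(2, n))
    return s

def S_fast(n_max, alpha):
    # S_n by the two-term recurrence (B.2):
    # n*S_n = -[(a^2-4)(n-3)]*S_{n-2} + [2a(n-1) - a]*S_{n-1}
    s = [0, alpha, 1] + [0] * (n_max - 2)
    for n in range(3, n_max + 1):
        s[n] = (-(alpha ** 2 - 4) * (n - 3) * s[n - 2]
                + (2 * alpha * (n - 1) - alpha) * s[n - 1]) // n
    return s

print(S_slow(20, 3) == S_fast(20, 3))  # → True
```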
Recurrences such as (B.2) are well known. See, for example, the discussion of “Recurring series,” and “Solution of difference equations,” Exercises 15–16, Chapter VIII, pp. 392–393, Hardy (1952). The limiting ratio S_n/S_{n-1} → ρ must satisfy the following equation:
$$
(\alpha^2 - 4) - 2\alpha x + x^2 = 0,
$$
i.e.,
$$
\left(x - (\alpha + 2)\right)\left(x - (\alpha - 2)\right) = 0,
$$
whose roots are
$$
\rho_1 = \alpha - 2, \qquad \rho_2 = \alpha + 2.
$$
The larger root ρ_2 agrees with our previous asymptotic estimate for S_n/S_{n-1}.
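The convergence of S_n/S_{n-1} to α + 2 is easy to observe numerically; this illustrative Python sketch (the function name is invented) uses recurrence (B.1). The approach is slow, roughly like 1 − 1.5/n, reflecting the n^{−1.5} factor in the asymptotic estimate for S_n.

```python
def S_list(n_max, alpha):
    # S_n by recurrence (B.1): S_0 = 0, S_1 = alpha, S_2 = 1,
    # S_n = sum_{k=2}^{n-1} S_{n-k} S_k
    s = [0, alpha, 1] + [0] * (n_max - 2)
    for n in range(3, n_max + 1):
        s[n] = sum(s[n - k] * s[k] for k in range(2, n))
    return s

s = S_list(200, 3)
print(s[200] / s[199])  # close to alpha + 2 = 5
```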
Appendix C
Back Cover