Turing AI
AI Communications
ISSN 0921-7126, IOS Press. All rights reserved
2 S. Muggleton / Turing and the development of
AI
sign and test a machine which would emulate human reasoning. Acting as
the UK’s main wartime decryption centre, Bletchley Park had recruited
many of the UK’s best mathematicians in an attempt to decode German
military messages. By 1940 the Bombe machine, designed by Turing and
Welchman [7], had gone into operation and was efficiently decrypting
messages using methods previously employed manually by human decoders.
In keeping with Turing’s background in Mathematical Logic, the Bombe
design worked according to a reductio ad absurdum principle which
simplified the hypothesis space of 26^3 possible settings for the
Enigma machine to a small number of possibilities based on a given set
of message transcriptions.

The hypothesis elimination principle of the Bombe was later refined in
the design of the Colossus I and II machines. The Tunny report [15]
(declassified by the UK government in 2000) shows that one of the key
technical refinements of Colossus was the use of Bayesian reasoning to
order the search through the space of hypothetical settings for the
Lorenz encryption machine. This combination of logical hypothesis
generation tied with Bayesian evaluation was later to become central to
approaches used within Machine Learning (see Section 5). Indeed strong
parallels exist between decryption tasks on the one hand, which involve
hypothesising machine settings from a set of message transcriptions,
and modern Machine Learning tasks on the other hand, which involve
hypothesising a model from a set of observations. Given their grounding
in the Bletchley Park decryption work it is hardly surprising that two
of the authors of the Tunny report, Donald Michie (1923-2007) and Jack
Good (1916-2009), went on to play founding roles in the post-war
development of Machine Intelligence and Subjective Probabilistic
reasoning respectively. In numerous out-of-hours meetings at Bletchley
Park, Turing discussed the problem of machine intelligence with both
Michie and Good. According to Andrew Hodges [16], Turing’s biographer

  These meetings were an opportunity for Alan to develop the ideas for
  chess-playing machines that had begun in his 1941 discussions with
  Jack Good. They often talked about mechanisation of thought
  processes, bringing in the theory of probability and weight of
  evidence, with which Donald Michie was by now familiar. . . . He
  (Turing) was not so much concerned with the building of machines
  designed to carry out this or that complicated task. He was now
  fascinated with the idea of a machine that could learn.

2. Turing’s 1950 paper in Mind

2.1. Structure of the paper

The opening sentence of Turing’s 1950 paper [42] declares

  I propose to consider the question, “Can machines think?”

The first six sections of the paper provide a philosophical framework
for answering this question. These sections are briefly summarised
below.
1. The Imitation Game. Often referred to as the “Turing test”, this is
   a form of parlour game involving a human interrogator who
   alternately questions a hidden computer and a hidden person in an
   attempt to distinguish the identity of the respondents. The
   Imitation Game is aimed at providing an objective test for deciding
   whether machines can think.
2. Critique of the New Problem. Turing discusses the advantages of the
   game for the purposes of deciding whether machines and humans could
   be attributed with thinking on an equal basis using objective human
   judgement.
3. The Machines Concerned in the Game. Turing indicates that he intends
   digital computers to be the only kind of machine permitted to take
   part in the game.
4. Digital Computers. The nature of the new digital computers, such as
   the Manchester machine, is explained and compared to Charles
   Babbage’s proposals for an Analytical Engine.
5. Universality of Digital Computers. Turing explains how digital
   computers can emulate any discrete-state machine.
6. Contrary Views on the Main Question. Nine traditional philosophical
   objections to the proposition that machines can think are introduced
   and summarily dismissed by Turing.
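
As an illustration of item 5, the following minimal Python sketch shows
a digital computer emulating a small discrete-state machine by table
lookup. The three states, the two input symbols and the lamp-style
output are assumptions chosen for illustration, as are the names
TRANSITIONS, OUTPUTS and run.

```python
from typing import Dict, List, Tuple

# Hypothetical machine: input "i0" advances the state cyclically,
# input "i1" leaves the state unchanged.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("q1", "i0"): "q2", ("q2", "i0"): "q3", ("q3", "i0"): "q1",
    ("q1", "i1"): "q1", ("q2", "i1"): "q2", ("q3", "i1"): "q3",
}

# Hypothetical output: a lamp that is on ("o1") only in state q3.
OUTPUTS: Dict[str, str] = {"q1": "o0", "q2": "o0", "q3": "o1"}


def run(start: str, inputs: List[str]) -> List[Tuple[str, str]]:
    """Emulate the machine; return the (state, output) reached after each input."""
    state = start
    trace = []
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]  # one table lookup per step
        trace.append((state, OUTPUTS[state]))
    return trace


trace = run("q1", ["i0", "i0", "i1", "i0"])
print(trace)
```

Because the machine is described entirely by a finite table, replacing
the table changes the machine being emulated without changing the
interpreter, which is the sense of universality summarised in item 5.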

2.2. Learning machines - Section 7 of Turing’s paper

The task of engineering software which addresses the central question
of Turing’s paper has dominated Artificial Intelligence research over
the last sixty years. In the final section of the 1950 paper Turing
addresses the motivation and possible approaches for such endeavours.
His transition from the purely philosophical nature of the first six
sections of the paper is marked as follows.

  The only really satisfactory support that can be given for the view
  expressed at the beginning of section 6, will be that provided by
  waiting for the end of the century and then doing the experiment
  described. But what can we say in the meantime?

Turing goes on to discuss three distinct strategies which might be
considered capable of achieving a thinking machine. These can be
characterised as follows: 1) AI by programming, 2) AI by ab initio
machine learning and 3) AI using logic, probabilities, learning and
background knowledge. In the next three sections we discuss these
strategies of Turing in relation to various phases of AI research.

3. Version 1: AI by programming [1960s-1980s]

3.1. Storage capacity argument

Turing considers an argument concerning the memory requirements for
programming a digital computer with similar capacity to a human being.

  As I have explained, the problem is mainly one of programming.
  Advances in engineering will have to be made too, but it seems
  unlikely that these will not be adequate for the requirements.
  Estimates of the storage capacity of the brain vary from 10^10 to
  10^15 binary digits. I incline to the lower values and believe that
  only a very small fraction is used for the higher types of thinking.
  Most of it is probably used for the retention of visual impressions,
  I should be surprised if more than 10^9 was required for satisfactory
  playing of the imitation game, at any rate against a blind man.
  (Note: The capacity of the Encyclopaedia Britannica, 11th edition, is
  2 × 10^9). A storage capacity of 10^7, would be a very practicable
  possibility even by present techniques. It is probably not necessary
  to increase the speed of operations of the machines at all. Parts of
  modern machines which can be regarded as analogs of nerve cells work
  about a thousand times faster than the latter. This should provide a
  “margin of safety” which could cover losses of speed arising in many
  ways. Our problem then is to find out how to programme these machines
  to play the game. At my present rate of working I produce about a
  thousand digits of programme a day, so that about sixty workers,
  working steadily through the fifty years might accomplish the job, if
  nothing went into the wastepaper basket. Some more expeditious method
  seems desirable.

In retrospect it is amazing that Turing managed to foresee that
“Advances in engineering” would lead to computers with a Gigabyte of
storage by the end of the twentieth century. It is also noteworthy that
Turing suggests that in terms of hardware, it is memory capacity rather
than processing speed which will be critical.

However, the final sentence of the quote above indicates that Turing
could already foresee that manual composition of a program which could
pass the Turing test was not the most “expeditious” method, despite the
fact that a dedicated group of around “sixty” programmers might
complete the task within “fifty years” if “nothing went into the
wastepaper basket”. Turing must already have been acutely aware, from
his work with the early pilot ACE computer, that plenty goes in the
waste basket in the process of debugging computer programs.

3.2. Programming approach to AI and the Machine Intelligence series

Turing’s influence on the development of AI from the 1960s to the 1980s
is particularly evident in the Machine Intelligence book series, which
acted as a vanguard of cutting edge AI research during this period. The
series’ Executive Editor, Donald Michie, has already been mentioned as
one of Turing’s Bletchley colleagues. Michie was also the founder of
Europe’s first Department of Artificial Intelligence in the 1960s in
Edinburgh, and later also founded the Turing Institute (an
AI research institute) in the 1980s in Glasgow. Michie specifically
chose topics for the Machine Intelligence workshops which were closely
related to those which he and Jack Good had discussed with Turing
during the war. Indeed Jack Good was a frequent contributor to the
series on Turing-inspired topics such as Computer Chess [14]. To open
the Machine Intelligence 5 volume Michie selected “Intelligent
machinery” [44], a previously unpublished article, in which Turing
discussed the idea of designing intelligent robots which could “roam
the countryside” and learn from their experience.

Turing’s Version 1 Programming approach to Artificial Intelligence was
the dominating paradigm for Artificial Intelligence research up until
the mid-1980s. Research during this period can largely be divided into
broad areas associated with 1) Reasoning, 2) Physical perception and
3) Physical action.

Reasoning. Simon and Newell’s General Problem Solver (GPS) [30] was an
early and influential attempt to program a universal problem solver
which could be applied to a variety of formal symbolic reasoning
problems such as theorem proving, geometry and chess playing. It was
clear that although GPS could solve simple problems, with more complex
tasks, its reasoning was rapidly swamped by the combinatorics of the
search. Throughout the 1960s-1980s a variety of other more specific
approaches were taken to the problems of improving the efficiency of
search (eg [24,9]) and planning (eg [8,11,17]). Additionally a variety
of more special purpose techniques were developed for both theorem
proving (eg [34,20]) and chess playing (eg [38,14]).

During the same period, attempts to address the difficulties, foreseen
by Turing, of writing effective and efficient AI programs led to the
rise of a number of high-level languages. The methodologies on which
these were based varied from the use of λ-calculus (eg LISP) [21] to
the development of stack-based languages (eg POP1) [6] as well as
languages based on first-order predicate calculus (eg Prolog) [46]. The
approach of heuristic programming, developed in systems such as Dendral
[4] and MYCIN [41], used constraints in the form of rules to produce
systems which could reason at the level of human experts. These expert
systems became a key demonstrator for the achievements of Artificial
Intelligence in the early 1980s.

Physical perception. The 1960s-1980s witnessed a number of early and
bold attempts to write programs which could recognise three-dimensional
objects within a digital image (eg [19,3]). However, these were
generally limited to analysis of simple polygons and it was unclear how
they could be extended to recognise real-world objects such as trees,
cars or people.

In the same period considerable advances were made in natural language
generation and understanding (eg [35,36,37]). Early systems directly
addressed one of the key assumptions of Turing’s imitation game, by
supporting answering of questions posed in natural language. However,
just as with the initial attempts at computer vision, these natural
language systems were limited by the complexity of grammars provided by
their programmers.

Physical action. As mentioned previously Turing [44] had discussed the
idea of intelligent machines which could roam the countryside, learning
for themselves. Probably the best known mobile robotics project from
the early years was Stanford’s Shakey project (1966-1972) [31]. By
contrast, in the Edinburgh Freddy assembly robot [2,1] the robot arm
and associated digital camera remained in a fixed position while a
platform containing sequentially assembled parts was directed to move
past it by the computer.

4. Version 2: AI by ab initio machine learning

In his 1950 paper Turing had already anticipated the difficulties of
developing AI by manually programming a digital computer. His suggested
remedy was that machines must learn in the same way as a human child.

  Instead of trying to produce a programme to simulate the adult mind,
  why not rather try to produce one which simulates the child’s? If
  this were then subjected to an appropriate course of education one
  would obtain the adult brain. Presumably the child brain is something
  like a notebook as one buys it from the stationers. Rather little
  mechanism, and lots of blank sheets. (Mechanism and writing are from
  our point of view almost synonymous.) Our hope is that there is so
  little mechanism in the child brain that something like it can be
  easily programmed. The amount of work in the education we can assume,
  as a first approximation, to be much the same as for the human child.

4.1. The ab initio Machine Learning movement [1980s-1990s]

During the 1970s the success of the expert systems movement (see
Section 3.2) became increasingly stifled by the cost of involving
experts in the development and maintenance of large rule-based systems.
This problem became known as “Feigenbaum’s bottleneck” [10]. However,
early experiments with Meta-Dendral [4], and later Michalski’s Soy Bean
expert system [23], showed that rules could be automatically learned by
machines from observations. Moreover, Michalski demonstrated that not
only was this a more efficient method of building and maintaining
expert systems, but it could also result in rules which were more
accurate than those of existing human experts. This resulted in the
start of a new series of workshops called Machine Learning [22] led by
Ryszard Michalski, Jaime Carbonell and Tom Mitchell. The workshops,
which later developed into the International Conference on Machine
Learning, were originally based on the format of Donald Michie’s
Machine Intelligence workshops.

4.2. The limits of positive and negative examples

A common feature of systems developed within the standard Machine
Learning framework is that, in Turing’s words, learning is conducted ab
initio (Turing’s phrase is from “blank sheets”) using a set of vectors
associated with positive and negative classifications. Turing provides
a mathematically-inspired warning about such an approach.

  The use of punishments and rewards can at best be a part of the
  teaching process. Roughly speaking, if the teacher has no other means
  of communicating to the pupil, the amount of information which can
  reach him does not exceed the total number of rewards and punishments
  applied. By the time a child has learnt to repeat “Casabianca” he
  would probably feel very sore indeed, if the text could only be
  discovered by a “Twenty Questions” technique, every “NO” taking the
  form of a blow.

Turing’s knowledge of information theory [39] had led him to anticipate
some of the limitations later uncovered in the 1980s by Valiant’s
theory of the learnable [45]. That is, effective ab initio machine
learning is necessarily confined to the construction of relatively
small chunks of knowledge. However, Valiant also demonstrated that the
expected accuracy of the learned knowledge can be arbitrarily high
given sufficient examples. So, unfortunately we have to return to
Turing’s original question of how to programme the 10^12 bits of memory
required to achieve human-level intelligence.

5. Version 3: AI using logic, probabilities, learning and background
knowledge

Turing’s answer to the problems which beset ab initio machine learning
follows immediately on from the quote given in the previous Section.

  It is necessary therefore to have some other “unemotional” channels
  of communication. If these are available it is possible to teach a
  machine by punishments and rewards to obey orders given in some
  language, e.g., a symbolic language. These orders are to be
  transmitted through the “unemotional” channels. The use of this
  language will diminish greatly the number of punishments and rewards
  required.

Turing’s claim is that by employing an “unemotional” symbolic language
it should be possible to reduce the number of examples required for
learning.

5.1. Logic-based learning with background knowledge

The obvious question is the appropriate form and function of the
symbolic language to be employed. Again Turing’s suggestions follow
immediately on from the last quote.

  Opinions may vary as to the complexity which is suitable in the child
  machine. One might try to make it as simple as possible consistent
  with the general principles. Alternatively one might have a complete
  system of logical inference “built in”. In the latter case the store
  would be largely occupied with definitions and propositions.
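
One way to picture a child machine whose store is “largely occupied
with definitions and propositions” is a small forward-chaining
interpreter over propositional rules. The Python sketch below is only
an illustration under that assumption: the rule format, the function
name forward_chain and the example atoms are hypothetical and are not
taken from Turing’s paper or from any system cited here.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (body, head): once every atom in the body is known,
# the head may be added to the store.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Return all atoms derivable from the facts using the rules."""
    known = set(facts)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


# Hypothetical background knowledge (the "definitions").
rules: List[Rule] = [
    (frozenset({"bird", "healthy"}), "can_fly"),
    (frozenset({"can_fly"}), "can_travel"),
]

# Hypothetical observations (the "propositions").
derived = forward_chain({"bird", "healthy"}, rules)
print(sorted(derived))
```

In this picture the rules are ordinary data in the store, so a learning
procedure could add or revise them without altering the inference
procedure itself.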

Alan Robinson’s introduction [34] of resolution-based automatic theorem
proving in 1965 led to an explosion of interest in the use of
first-order predicate calculus as a representation for reasoning within
AI systems. In line with Turing’s idea of using “built-in” logical
definitions, Gordon Plotkin’s thesis [32] used resolution theorem
proving as the context for investigating a form of machine learning
which involves hypothesising logical axioms from observations and
background knowledge. Within the era of Logic Programming [18] in the
1980s, these early investigations by Plotkin were taken up again by
Shapiro [40] in the context of using inductive inference for
automatically revising Prolog programs. However, it was not until the
1990s that the school of Inductive Logic Programming [25,26,28] started
to investigate this approach in depth as a highly expressive Machine
Learning paradigm. A recent survey of the field [29] points to the
maturity of theory, implementation and applications in this area.

5.2. Uncertainty and probabilistic learning

Turing makes some interesting observations concerning the uncertainty
of learned rules.

  Processes that are learnt do not produce a hundred per cent certainty
  of result; if they did they could not be unlearnt.

Over the last decade there has been increasing interest in including
probabilities into Inductive Logic Programming [33,12]. These
probability values are used to give an indication of the uncertainty of
learned rules. Turing also makes the following point concerning the
ephemeral nature of learning.

  The idea of a learning machine may appear paradoxical to some
  readers. How can the rules of operation of the machine change? They
  should describe completely how the machine will react whatever its
  history might be, whatever changes it might undergo. The rules are
  thus quite time-invariant. This is quite true. The explanation of the
  paradox is that the rules which get changed in the learning process
  are of a rather less pretentious kind, claiming only an ephemeral
  validity.

It is in the nature of a Universal Turing machine that it acts as a
meta-logical interpreter. It is this property which allows rules to be
treated as data, allowing them to be altered and updated. A recent
paper [27] by the author demonstrates that the meta-interpretive nature
of the Prolog Logic Programming language can be used to efficiently
support the introduction of auxiliary ‘invented’ predicates and
recursion within the context of learning complex grammars.

6. The challenge of “super-criticality”

The previous sections indicate that many of the issues which Turing
discusses in the last section of the paper have since been explored in
the AI literature. However, one of the Machine Learning challenges
which Turing mentions is still entirely open.

  Another simile would be an atomic pile of less than critical size: an
  injected idea is to correspond to a neutron entering the pile from
  without. Each such neutron will cause a certain disturbance which
  eventually dies away. If, however, the size of the pile is
  sufficiently increased, the disturbance caused by such an incoming
  neutron will very likely go on and on increasing until the whole pile
  is destroyed. Is there a corresponding phenomenon for minds, and is
  there one for machines? There does seem to be one for the human mind.
  The majority of them seem to be “subcritical,” i.e., to correspond in
  this analogy to piles of subcritical size. An idea presented to such
  a mind will on average give rise to less than one idea in reply. A
  smallish proportion are supercritical. An idea presented to such a
  mind may give rise to a whole “theory” consisting of secondary,
  tertiary and more remote ideas. Animals’ minds seem to be very
  definitely subcritical. Adhering to this analogy we ask, “Can a
  machine be made to be supercritical?”

Turing’s challenge to make a machine which is “super-critical” seems
only to make sense in the context of an extreme setting of the Version
3 approach (see Section 5) to Artificial Intelligence. The situation in
which a new observation “leads to a theory consisting of secondary,
tertiary and more remote ideas” requires not only an alert mind, but
also one which is abundantly stocked with relevant background
knowledge. Providing such abundant background knowledge to a machine is
challenging, though the advent of the World-Wide-Web offers an obvious
source, as long as the available information can be accessed for
purposes of inductive reasoning.

7. Conclusion

Acknowledgements

The author would like to thank Donald Michie and other colleagues for
their inspiring discussions on Alan Turing’s views on Machine
Intelligence and Machine Learning. The author would also like to thank
the Royal Academy of Engineering for funding his present 5 year
Research Chair.

References