history of AI 1
ARTHUR L. SAMUEL
IBM Research Laboratories
Yorktown Heights, New York
1. Introduction
2. Historical Background
3. Chess-Playing Machines
3.1 The Minimaxing Procedure
3.2 Dictionary Approach
3.3 Shannon’s Detailed Analysis
3.4 Turing’s Contribution
3.5 The Los Alamos Program
3.6 Bernstein’s Program
3.7 Newell, Shaw, and Simon Program
4. Checkers (or Draughts)
4.1 The Samuel Checker Program
5. Other Games
6. A Look Into the Future
References
1. Introduction
One cannot begin a discussion of game-playing machines without a
passing reference to that famous hoax known as the Maelzel Automaton,
constructed by Baron Kempelen in 1769, which played chess through the
efforts of a dwarf hidden inside. What we are going to discuss are not
devices of this sort, in fact, quite the opposite; we will be considering
attempts at making machines which behave as if there were men inside
when there are none. Modern work along these lines does not often lead
to the construction of actual machines.* Instead, a program is usually
written for an existing digital computer. For all practical purposes the
*An exception to this statement can be made today (1960) in terms of the many
game-playing machines which high school students have recently been constructing
as entries in science fairs. Some of these are quite ambitious. A checker-playing com-
puter built by David Ecklein of Cedar Falls, Iowa, employs 3200 vacuum tubes.
general-purpose computer with a special game-playing program does
become a different device and can be called a game-playing machine.
Games provide a happy vehicle for studying methods of simulating
certain aspects of intellectual behavior; happy because they are fun,
and happy because they reduce the problem to one of manageable pro-
portions. Games, particularly those with a long history, retain many of
the essential characteristics of real-life problems and eliminate many of
the worrisome complications. There are, thus, both a trivial and a serious
purpose to the study of game-playing machines.
2. Historical Background
The history of game-playing machines is replete with attempts to
devise algorithms which can be used to guarantee a win, and with the
development of schemes to play specific games for which such algorithms
exist. As an example, the game of Nim falls into the category of games
for which an algorithm exists. However interesting some of the many
machine realizations have been to the nonmathematical public, this game
remains essentially trivial to anyone knowing the binary number system.
We will not concern ourselves with such games, at least not when played
in this manner. What we will consider are games falling into three cate-
gories: (1) those games for which there are no known algorithms, other
than the simple process of exhaustion, (2) those games which can be
played in a nontrivial manner, without recourse to the known defining
algorithm, and finally (3) games for which there are known optimum
strategies which must be used against a knowledgeable opponent, but in
which heuristic procedures may be used with advantage against an
opponent who wittingly or unwittingly fails to employ the optimum
strategy. Checkers and chess fall into the first category. Davies [1] has
programmed “tic tac toe” or “noughts and crosses” on the British DEUCE
computer as a game of the second category. The penny-matching machine
of Hagelbarger [2] is a typical example of the third.
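To illustrate why Nim is regarded as trivial once the binary number system is understood: under the normal rule (the player who takes the last object wins), the player to move is lost exactly when the bitwise exclusive-or of the pile sizes is zero. The following Python sketch shows this well-known rule; the function names are illustrative and are not taken from any of the machines mentioned.

```python
from functools import reduce
from operator import xor


def nim_sum(piles):
    """Bitwise exclusive-or of all pile sizes (the 'Nim-sum')."""
    return reduce(xor, piles, 0)


def winning_move(piles):
    """Return (pile_index, new_size) for a winning move, or None.

    Assumes the normal rule: whoever takes the last object wins.
    """
    total = nim_sum(piles)
    if total == 0:
        return None  # every move leaves the opponent a winning position
    for i, size in enumerate(piles):
        target = size ^ total
        if target < size:  # legal: objects may only be removed
            return i, target
    return None  # not reached when total != 0
```

For piles of 3, 4 and 5 the Nim-sum is 2, so reducing the first pile from 3 to 1 leaves a Nim-sum of zero for the opponent.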
3. Chess-Playing Machines
Out of deference to the pioneering work of Shannon [3] we will begin
our discussion with the game of chess. Correctly or incorrectly, this game
has the reputation of being the intellectual game par excellence.² In any
event, it has defied all efforts at complete solution and it is still being
played today in essentially the same form as it was during the Renaissance.
²The Japanese game of “Go” is perhaps the most serious contender. Checkers, in
spite of its plebeian associations, is thought by some to offer substantially the same
intellectual challenge.
PROGRAMMING COMPUTERS TO PLAY GAMES 167
FIG. 1. Simplified diagram showing how the evaluations are backed up through
the “tree” of possible moves to arrive at the best next move. The evaluation process
starts at 0.
(Reproduced from reference [10].)
KEY. 0, machine chooses branch with largest score.
0, opponent expected to choose branch with smallest score.
0, machine chooses branch with most positive score.
³Trivial or not, it was some eight years after Shannon’s paper was written that this
was done in its entirety.
is the most advantageous for the side making the move until one arrives
at the best initial move. With the evaluation function always expressed in
terms of the same side (e.g., for the computer) this becomes an alterna-
tion of minimizing and maximizing actions, an inferential procedure which
has been dubbed “minimaxing.” This minimaxing procedure has been
basic to all serious attempts at programming chess or checkers.
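The backing-up procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming that the tree is given explicitly as nested lists whose leaves are scores expressed from the machine's point of view; the representation and the function names are assumptions made for the example and are not taken from any of the programs discussed.

```python
def backed_up_score(node, machine_to_move=True):
    """Back up leaf evaluations through a tree of moves.

    A leaf is a number: the evaluation of that position for the machine.
    An interior node is a list of the positions reachable in one move.
    """
    if isinstance(node, (int, float)):
        return node
    scores = [backed_up_score(child, not machine_to_move) for child in node]
    # The machine picks the largest score; the opponent is expected
    # to pick the smallest, since all scores are from the machine's side.
    return max(scores) if machine_to_move else min(scores)


def best_first_move(tree):
    """Index of the machine's move with the highest backed-up score."""
    scores = [backed_up_score(child, machine_to_move=False) for child in tree]
    return scores.index(max(scores))
```

With the tree [[3, 5], [2, 9], [4, 6]] the opponent's replies back up the scores 3, 2 and 4, so the machine selects the third move.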
3.2 Dictionary Approach
We must not exclude an alternate, albeit impractical method men-
tioned by Shannon, that of storing a dictionary of all possible positions
of the chess pieces together with the correct move (either calculated or
supplied by a chess master). While there are many fewer of these positions
than there are of variations in paths by which they may be reached, the
number is still quite formidable, perhaps of the order of [...]. A very
much abridged and carefully edited version of such a dictionary is,
however, not entirely useless even for chess, and we will later see how such
a dictionary has been used successfully for the game of checkers.
3.3 Shannon’s Detailed Analysis
Returning to the minimax procedure,⁴ Shannon considered two basic
types of strategies. For one, which he called Type A, the look-ahead
procedure is always to be carried a fixed number of moves ahead along
all possible paths, as shown in Fig. 1. In a second and more sophisticated
procedure, called Type B, certain forceful variations are to be carried out
as far as possible and are evaluated only at reasonable positions where
some quasi-stability has been established, while other pointless variations
are not to be explored at all. This procedure was justified by likening it
to the tactics employed by chess masters.
Shannon anticipated that such a program would play a fairly strong
game at speeds comparable to human speeds. He listed as machine
advantages the following: (1) high-speed operation in individual calculations,
(2) freedom from errors, (3) freedom from laziness, and (4) freedom
from nerves. He balanced these against the flexibility, imagination and
inductive and learning capabilities of the human mind.
Shannon defines a chess position in the following terms:
(a) A statement of the positions of all pieces on the board.
(b) A statement of which side, White or Black, has the move.
(c) A statement as to whether the kings and rooks have moved. This
⁴The use of this term is perhaps unfortunate, since it implies a closer connection
than actually exists between this game-playing procedure and the minimax theorem
of game theory. See J. von Neumann and O. Morgenstern, The Theory of Games
and Economic Behavior. Princeton Univ. Press, Princeton, New Jersey, 1955.
language. Their choice for this language is within the framework of in-
terpretive programming, that is, a programming scheme in which pro-
grams are written in a language believed to be more task-oriented and in
which a so-called “interpretative program” translates these instructions
into machine code, one at a time, as needed during the program execution.
This procedure, as is well known, results in a substantial sacrifice in speed
as compared with assembly techniques, but it does greatly simplify the
task of making continuous changes. As a matter of fact, the Newell chess
program involves no less than four language levels: (1) the machine code
itself, (2) a general information processing language (their IPL), which
is primarily designed for handling lists, (3) a basic chess vocabulary, and
finally (4) the chess program itself.
3.7.1 THE NEWELL, SHAW, AND SIMON
INFORMATION PROCESSING LANGUAGE
moving, and the dominant criterion is the number of pieces of each color
on the board.
(c) The rules of the activity must be definite and they should be known.
Games satisfy this requirement. Unfortunately, many problems of eco-
nomic importance do not. While, in principle, the determination of the
rules can be a part of the learning process, this is a complication which
might well be left until later.
(d) There should be a background of knowledge concerning the activity
against which the learning progress can be tested.
(e) The activity should be one that is familiar to a substantial body
of people so that the behavior of the program can be made understandable
to them. More people play or think that they can play checkers than is
the case for chess. In both cases the ability to have the program play
against human opponents (or antagonists) adds spice to the study and,
incidentally, provides a convincing demonstration for those who do not
believe that machines can compete with people in activities which are
usually assumed to involve thinking.
4.1.2 GENERAL METHOD
Not only was everything possible done to increase operating speed, but
a great deal of attention was given to the operating procedures. The pro-
gram is arranged so that it can be operated in a variety of different ways.
For example, it is possible to cause the program to play itself, that is, to
play both sides of the game. This mode of play was found to be especially
useful during the early stages of learning.
It is also possible to have the program follow book games. When
operating in this mode, the program decides at each point in the game on its
next move in the usual way and reports this proposed move. Instead of
actually making this move, the program refers to the stored record of a
book game and makes the book move. The program records its evaluation
of the two moves and it also counts and reports the number of possible
moves which it rates as being better than the book move and the number
it rates as being poorer. The sides are reversed and the process is repeated.
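The bookkeeping in the book-game mode can be sketched as follows. The sketch assumes that the program's ratings of the legal moves are already available as a mapping from move to score; the names, the move notation and the data layout are hypothetical.

```python
def compare_with_book(move_scores, book_move):
    """Compare the program's ratings with the move played in a book game.

    move_scores: dict mapping each legal move to the program's rating.
    book_move:   the move recorded in the book game (must be a key).
    Returns the program's own choice plus the counts of moves it rates
    better and poorer than the book move.
    """
    book_score = move_scores[book_move]
    proposed = max(move_scores, key=move_scores.get)
    better = sum(1 for s in move_scores.values() if s > book_score)
    poorer = sum(1 for s in move_scores.values() if s < book_score)
    return {
        "proposed": proposed,
        "proposed_score": move_scores[proposed],
        "book_score": book_score,
        "better": better,
        "poorer": poorer,
    }
```

The book move itself is the one actually played; the returned record only documents how the program's judgment differed from the book.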
For demonstration purposes, and also as a means of avoiding lost ma-
chine time while an opponent is thinking, it is possible to play several
simultaneous games against different opponents. Eight have been played
on a number of occasions. When playing in this fashion the different
moves are reported separately on punch cards (as well as being listed on
the printer). These cards are used as input, together with a card contain-
ing the player’s move, as a means of identifying the game to which the
specific move applies.
Games may be started with any initial configuration of the board posi-
tion so that the program may be tested on end games, checker puzzles,
etc. For nonstandard starting conditions the program lists the initial
piece-arrangement. From time to time, and at the end of each game, the
program also tabulates various bits of statistical information which as-
sists in the evaluation of playing performance.
During normal play against a single opponent the program accepts
moves entered on the input keys and displays its move on the console
lights. A complete record is made concurrently on the printer. This record
lists the moves made by both sides and the program’s evaluation of these
moves. Should an opponent attempt to make an illegal move the program
stops, with a full display of all the lights on the console, and waits for a
legal move to be entered. A listing of the existing board positions can be
requested at any time should errors be made in executing moves on the
opponent’s board.
The program concedes when faced with a sure defeat, and when it can
predict a win it reports this fact on the printer. A win in not more than
15 moves (half moves, as some people count) is its record prediction to
date.
4.1.5 PLY LIMITATIONS
As briefly mentioned in Section 4.1.2, great care was exercised in termi-
nating the look-ahead procedure. For convenience in discussing this prob-
lem, Samuel defines the look-ahead distance as the ply (a ply of 2 consist-
ing of one proposed move by the machine and the anticipated reply by
the opponent). The ply is not fixed, but depends upon the dynamics of
the situation, and it is allowed to vary from move to move and from
branch to branch during the analysis. Several criteria are used for termi-
nating the look-ahead relating to the existence of jump moves and to
the possibilities of offering exchanges. Sequences are terminated at varying
plies, as shown in Fig. 2, usually from a minimum of 3 to a maximum of
20, this upper figure being set by the space reserved for storing the con-
tinuations. For a while a record was kept of the maximum ply encountered
during each move analysis. It was soon found that this limiting ply of 20
was reached at least once during almost every move analysis. This was
usually found to be the result of an unlikely situation in which one side
or the other always made the worst possible move. When a provision was
introduced to terminate obvious winning or losing sequences (in which one
side gained the equivalent of a 2-king advantage) at a reasonable ply
figure (usually set at 11), a substantial reduction in playing time resulted.
When the look-ahead procedure is terminated the resulting board posi-
tion is then evaluated in terms of a linear polynomial.
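The termination rules described above might be sketched as a single predicate. The limits 3, 11 and 20 are the figures quoted in the text; the boolean inputs (whether a jump is pending, whether an exchange is offered, whether one side is the equivalent of two kings ahead) and the exact way the rules are combined are assumptions made for illustration, since the text does not spell them out.

```python
MIN_PLY = 3       # usual minimum look-ahead quoted in the text
DECIDED_PLY = 11  # cut-off for obviously won or lost sequences
MAX_PLY = 20      # set by the space reserved for continuations


def stop_look_ahead(ply, jump_pending, exchange_offered, two_king_lead):
    """Decide whether to stop extending one branch of the move tree."""
    if ply >= MAX_PLY:
        return True   # no room to store a longer continuation
    if ply < MIN_PLY:
        return False  # always look at least this far
    if two_king_lead and ply >= DECIDED_PLY:
        return True   # outcome is obvious; further search wastes time
    # Otherwise continue only while the position is still unsettled.
    return not (jump_pending or exchange_offered)
```

Because the predicate is evaluated separately on every branch, the effective ply varies from branch to branch, as the text describes.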
4.1.6 THE SCORING POLYNOMIAL
FIG. 2. A “tree” of moves which might be investigated during the look-ahead. The
actual branchings are much more numerous than those shown, and the “tree” is
apt to extend to as many as 20 levels. (Reproduced from reference [10].)
the ranges in numerical values of the various terms overlap. Both schemes
have been tried at various times.
Being more specific, the dominant scoring parameter, as defined by
the rules for checkers, is the inability for one side or the other to move.
Since this can occur but once in any game it is tested for separately and
is not included in the scoring polynomial as tabulated by the computer
during play. The next parameter is considered to be the relative
piece-advantage. It is always assumed that it is to the machine’s advantage
to reduce the number of the opponent’s pieces as compared to its own.
A reversal of the sign of this term will, in fact, cause the program to play
“give-away” checkers, and with learning it can only learn to play a better
and better give-away game. Were the sign of this term not known, it
could, of course, be determined by tests, but it must be fixed by the
experimenter and, in effect, it is one of the instructions to the machine
defining its task. The numerical computation of the piece-advantage has
been arranged in such a way as to account for the well-known property
that it is usually to one’s advantage to trade pieces when one is ahead and
to avoid trades when one is behind. Furthermore, it is assumed that kings
are more valuable than pieces, the relative weights assigned to them being
3 to 2. This ratio means that the program will trade 3 men for 2 kings, or
2 kings for 3 men, if by so doing it can obtain some positional advantage.
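The 3-to-2 weighting can be checked with a little arithmetic: three men count 3 × 2 = 6 and two kings count 2 × 3 = 6, so the trade is neutral in material. The Python sketch below shows only this weighting; it does not attempt the adjustment that favors trading when ahead, whose exact form is not given in the text, and the function names are illustrative.

```python
MAN_WEIGHT = 2   # relative weights quoted in the text: kings to men, 3 to 2
KING_WEIGHT = 3


def material(men, kings):
    """Weighted piece count for one side."""
    return MAN_WEIGHT * men + KING_WEIGHT * kings


def piece_advantage(own_men, own_kings, opp_men, opp_kings):
    """Machine's weighted material minus the opponent's."""
    return material(own_men, own_kings) - material(opp_men, opp_kings)
```

Reversing the sign of the value returned by `piece_advantage` corresponds to the "give-away" variant mentioned above.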
The choice of the parameters to follow this first term and, indeed, the
method of making this choice has been the subject of much study. Two
procedures have been investigated; the one in which the experimenter
himself selected the terms and in which the learning feature resides else-
where, and the second in which the program made its own selection as
a part of the learning process.
'This playing-time requirement, while large in terms of cost, is less than the time
which the checker master probably spends to acquire his proficiency.
presents itself, since the only measuring parameter available is this same
scoring polynomial that the process is designed to improve. Recourse was
had to the peculiar property of the look-ahead procedure which makes
it less important for the scoring polynomial to be particularly good the
further ahead the process is continued. This means that one can evaluate
the relative change in the positions of two players, when this evaluation
is made over a fairly large number of moves, by using a scoring system
which is much too gross to be significant on a move-by-move basis.
In order to obtain a large enough span to make use of this character-
istic, it was arranged for Alpha to keep a record of the apparent good-
ness of its board positions as the game progressed. This was done by
computing the scoring polynomial for each board position encountered
in actual play and by saving this polynomial in its entirety. At the same
time, Alpha also computed the backed-up score for all board positions,
using the look-ahead procedure described earlier. At each play by Alpha
the initial board score, as saved from the previous Alpha move, was
compared with the backed-up score for the current position. The differ-
ence between these scores, defined as delta, was used to measure Alpha’s
progress. If delta was positive, then, presumably, Alpha’s position was
improving with time and, consequently, it was reasonable to assume
that the initial board evaluation was in error and terms which con-
tributed positively should have been given more weight. A converse
statement could be made for the case where delta was negative. Pre-
sumably, in this case, a wrong choice of moves was made and greater
weight should have been given to terms making negative contributions.
A record was kept of the correlation existing between the signs of the
individual term contributions in the initial scoring polynomial and the
sign of delta. After each play in which a substantial delta value resulted,
an adjustment was made in the values of the correlation coefficients, with
due account being taken of the number of times that each particular term
had been used and had had a nonzero value. The coefficient for the poly-
nomial term (other than the piece-advantage term) with the then largest
correlation coefficient was set at a prescribed maximum value with
proportionate values determined for all of the remaining coefficients.
Whenever a correlation-coefficient calculation led to a negative sign a
corresponding reversal was, of course, made in the sign associated with the
term itself.
When the value of delta fell below an arbitrary limit, and so cast
doubt on the validity of the proposed correction, the initial board score
for this move was retained until after another pair of moves, and delta
was then computed over this longer span. This process was repeated
as many times as necessary until a significantly large delta value re-
sulted.
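The adjustment procedure described above can be sketched in Python. This is an illustration under stated assumptions rather than a reconstruction of the original program: the correlation is taken here as a simple sign-agreement ratio, "largest" is read as largest in magnitude, and the threshold `min_delta` and the ceiling `max_coefficient` are arbitrary values chosen for the example. All names are hypothetical.

```python
class TermRecord:
    """Running sign-agreement statistics for one polynomial term."""

    def __init__(self):
        self.agree = 0   # times the term's sign matched the sign of delta
        self.used = 0    # times the term had a nonzero value

    def correlation(self):
        if self.used == 0:
            return 0.0
        return (2 * self.agree - self.used) / self.used  # lies in [-1, 1]


def record_play(records, contributions, initial_score, backed_up_score,
                min_delta=1.0):
    """Update the statistics after one play; return False if delta is too small."""
    delta = backed_up_score - initial_score
    if abs(delta) < min_delta:
        return False  # keep the old initial score and try a longer span
    for name, value in contributions.items():
        if value == 0:
            continue  # only nonzero uses of a term are counted
        rec = records[name]
        rec.used += 1
        if (value > 0) == (delta > 0):
            rec.agree += 1
    return True


def rescale(records, max_coefficient=16.0):
    """Give the best-correlated term the maximum weight, the rest in proportion."""
    corr = {name: rec.correlation() for name, rec in records.items()}
    largest = max((abs(c) for c in corr.values()), default=0.0)
    if largest == 0:
        return {name: 0.0 for name in corr}
    # A negative correlation yields a negative coefficient, which is
    # equivalent to reversing the sign associated with the term.
    return {name: max_coefficient * c / largest for name, c in corr.items()}
```

When `record_play` returns False, the caller would retain the saved initial score and compute delta again after a further pair of moves, as described in the text.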
After each play for which an adjustment was made, a low-term tally
was recorded against the term which had the lowest correlation coefficient
and, at the same time, a test was made to see if this brought its tally
count up to some arbitrary limit (originally set at 8, and later set at 32).
When this limit was reached for any specific term, this term was trans-
ferred to the bottom of the reserve list and it was replaced by a term
from the head of the reserve list.
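The term-replacement rule lends itself to a short sketch. The limit of 8 is the original figure quoted in the text; reading "lowest" as lowest in magnitude, resetting the tally of a retired term, and the names and data layout are all assumptions made for the example.

```python
from collections import deque


def tally_and_replace(active, reserve, tallies, correlations, limit=8):
    """Charge the weakest active term and swap it out at the tally limit.

    active:       list of term names currently in the polynomial (mutated).
    reserve:      deque of term names held in reserve (mutated).
    tallies:      dict of low-term tally counts per term (mutated).
    correlations: dict of current correlation coefficients per active term.
    Returns the name of the term that was retired, or None.
    """
    weakest = min(active, key=lambda name: abs(correlations[name]))
    tallies[weakest] = tallies.get(weakest, 0) + 1
    if tallies[weakest] < limit or not reserve:
        return None
    replacement = reserve.popleft()   # term from the head of the reserve list
    reserve.append(weakest)           # retired term goes to the bottom
    active[active.index(weakest)] = replacement
    tallies[weakest] = 0              # assumption: the tally starts afresh
    return weakest
```

Because retired terms re-enter at the bottom of the reserve list, every term on the list is eventually given another trial.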
Samuel comments on the fact that the procedure of having the pro-
gram select terms for the evaluation polynomial from a supplied list
might be thought to be much too simple, and that the program should
be made to generate the terms for itself. Unfortunately, he was not able
to devise a satisfactory scheme for doing this. With a man-generated list
one might at least ask that the terms be members of an orthogonal set,
assuming that this has some meaning as applied to the evaluation of a
checker position. But, even this seemed impossible. The only practical
solution seemed to be that of including a relatively large number of pos-
sible terms in the hope that all of the contributing parameters got cov-
ered somehow, even though in an involved and redundant way. This is
not an entirely undesirable state of affairs, however, since it simulates
the situation which is apt to exist when an attempt is made to apply
similar learning techniques to real-life situations.
Many of the terms in the existing list were related in some vague way
to the parameters used by checker experts. Some of the concepts which
checker experts appear to use eluded attempts at definition in terms of
machine programming. Some of the terms were quite unrelated to the
usual checker lore and were discovered more or less by accident. The
second moment about the diagonal axis through the double corners is
an example. Twenty-seven different simple terms were used and a num-
ber of combinational terms were introduced which had some of the
characteristics of binary connectives.
4.1.8.6 Learning-by-Generalization Tests
Samuel reported on two fairly extensive series of tests of this learning
procedure; the first series covering 28 games. This series revealed certain
weaknesses of the program which were corrected. The results of 42
additional games played after these changes were also reported. In general,
the program was able to learn to play a rather good game.
The tentative conclusions which were drawn from these tests were:
(a) A simple generalization scheme of the type here used can be an
effective learning device for problems amenable to tree-searching pro-
cedures.
(b) The memory requirements of such schemes are quite modest and
remain fixed with time.
(c) The operating times are also reasonable and remain fixed, inde-
pendent of the amount of accumulated learning.
(d) Incipient forms of instability in the solution can be expected but,
at least for the checker program, these can be dealt with by quite straight-
forward procedures.
(e) Even with the incomplete and redundant set of parameters which
have been used to date, it is possible for the computer to learn to play a
better-than-average game of checkers in a relatively short period of time.
4.1.9 COMPARISONS BETWEEN THE TWO METHODS
5. Other Games
While the programming of computers to play bridge hands has not
attracted much attention, several people have programmed bridge bid-
ding, using one of the conventional bidding systems. The program
written for the IBM 650 by David Lefkovitz at the Moore School of