
Programming Computers to Play Games

ARTHUR L. SAMUEL
IBM Research Laboratories
Yorktown Heights, New York

1. Introduction
2. Historical Background
3. Chess-Playing Machines
3.1 The Minimaxing Procedure
3.2 Dictionary Approach
3.3 Shannon’s Detailed Analysis
3.4 Turing’s Contribution
3.5 The Los Alamos Program
3.6 Bernstein’s Program
3.7 Newell, Shaw, and Simon Program
4. Checkers (or Draughts)
4.1 The Samuel Checker Program
5. Other Games
6. A Look Into the Future
References

1. Introduction
One cannot begin a discussion of game-playing machines without a
passing reference to that famous hoax known as the Maelzel Automaton,
constructed by Baron Kempelen in 1769, which played chess through the
efforts of a dwarf hidden inside. What we are going to discuss are not
devices of this sort, in fact, quite the opposite; we will be considering
attempts at making machines which behave as if there were men inside
when there are none. Modern work along these lines does not often lead
to the construction of actual machines.* Instead, a program is usually
written for an existing digital computer. For all practical purposes the
general-purpose computer with a special game-playing program does
become a different device and can be called a game-playing machine.

*An exception to this statement can be made today (1960) in terms of the many
game-playing machines which high school students have recently been constructing
as entries in science fairs. Some of these are quite ambitious. A checker-playing
computer built by David Ecklein of Cedar Falls, Iowa, employs 3200 vacuum tubes.

Games provide a happy vehicle for studying methods of simulating
certain aspects of intellectual behavior; happy because they are fun,
and happy because they reduce the problem to one of manageable pro-
portions. Games, particularly those with a long history, retain many of
the essential characteristics of real-life problems and eliminate many of
the worrisome complications. There are, thus, both a trivial and a serious
purpose to the study of game-playing machines.

2. Historical Background
The history of game-playing machines is replete with attempts to
devise algorithms which can be used to guarantee a win, and with the
development of schemes to play specific games for which such algorithms
exist. As an example, the game of Nim falls into the category of games
for which an algorithm exists. However interesting some of the many
machine realizations have been to the nonmathematical public, this game
remains essentially trivial to anyone knowing the binary number system.
We will not concern ourselves with such games, at least not when played
in this manner. What we will consider are games falling into three cate-
gories: (1) those games for which there are no known algorithms, other
than the simple process of exhaustion, (2) those games which can be
played in a nontrivial manner, without recourse to the known defining
algorithm, and finally (3) games for which there are known optimum
strategies which must be used against a knowledgeable opponent, but in
which heuristic procedures may be used with advantage against an opponent
who wittingly or unwittingly fails to employ the optimum strategy.
Checkers and chess fall into the first category. Davies [1] has
programmed “tic tac toe” or “noughts and crosses” on the British DEUCE
computer as a game of the second category. The penny-matching machine
of Hagelbarger [2] is a typical example of the third.

3. Chess-Playing Machines
Out of deference to the pioneering work of Shannon [3] we will begin
our discussion with the game of chess. Correctly or incorrectly, this game
has the reputation of being the intellectual game par excellence.² In any
event, it has defied all efforts at complete solution and it is still being
played today in essentially the same form as it was during the Renaissance.

²The Japanese game of “Go” is perhaps the most serious contender. Checkers, in
spite of its plebeian associations, is thought by some to offer substantially the same
intellectual challenge.

As Shannon pointed out, “The problem is not that of designing a machine
to play perfect chess (which is quite impractical) nor one which merely
plays legal chess (which is trivial³). We would like it to play a skillful
game, perhaps comparable to that of a good human player.” Shannon’s
paper did not present an actual chess-playing program, but it did propose
a systematic procedure which could be programmed and it discussed
various ways in which the simplest procedure could be improved.
Chess is, of course, a finite game. In principle, one need only consider
all possible moves in a given position, then all moves for the opponent,
etc., and so look ahead to the end of the game. This turns out to be
completely impossible since, in general, there are some 30 legal moves for
each position and assuming, say, 40 moves for a typical game it would
still take over 10^90 years to compute the first move were a machine
available which could consider each variation in one micromicrosecond.
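
The scale of the exhaustive search can be checked with a rough calculation. This is only a sketch under stated assumptions: 30 legal moves per position, "40 moves" read as 40 moves by each side (80 half-moves, which is an assumption about the intended meaning), and one variation examined every micromicrosecond (10^-12 second).

```latex
\[
30^{80} = 10^{\,80\log_{10}30} \approx 10^{118}\ \text{variations},
\qquad
10^{118}\times 10^{-12}\,\mathrm{s} = 10^{106}\,\mathrm{s}
\approx \frac{10^{106}}{3.2\times 10^{7}}\ \text{years}
\approx 3\times 10^{98}\ \text{years}.
\]
```

Under these assumptions the time required exceeds any practical horizon by an enormous margin, which is why the look-ahead must be truncated.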
3.1 The Minimaxing Procedure
The only practical procedure is to carry the look-ahead procedure forward
only a very short distance, then to evaluate the resulting board
positions using the best evaluation function one is able to devise. It is
not satisfactory to select the initial move which leads to the board position
with the highest score, since to reach this position would require the
cooperation of the opponent. Instead, one must work backward through the
“tree” of moves (see Fig. 1), determining at each branch point which move
is the most advantageous for the side making the move until one arrives
at the best initial move. With the evaluation function always expressed in
terms of the same side (e.g., for the computer) this becomes an alternation
of minimizing and maximizing actions, an inferential procedure which
has been dubbed “minimaxing.” This minimaxing procedure has been
basic to all serious attempts at programming chess or checkers.

FIG. 1. Simplified diagram showing how the evaluations are backed up through
the “tree” of possible moves to arrive at the best next move.
(Reproduced from reference [10].)
KEY. At its own branch points the machine chooses the branch with the largest
(most positive) score; the opponent is expected to choose the branch with the
smallest score.

³Trivial or not, it was some eight years after Shannon’s paper was written that this
was done in its entirety.
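
The backing-up procedure described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not any of the historical programs: the tree is given directly as nested lists whose leaves are scores already expressed from the machine's side, and the names back_up, best_initial_move, and EXAMPLE are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

# A node is either a leaf score (evaluated from the machine's side)
# or a sequence of child nodes, one per legal move at that position.
Tree = Union[int, float, Sequence["Tree"]]


def back_up(node: Tree, machine_to_move: bool) -> float:
    """Return the backed-up score of a node by alternate max and min."""
    if isinstance(node, (int, float)):
        return node  # terminal position of the look-ahead: use its evaluation
    values = [back_up(child, not machine_to_move) for child in node]
    # The machine picks the largest score; the opponent is expected
    # to pick the smallest.
    return max(values) if machine_to_move else min(values)


def best_initial_move(tree: Sequence[Tree]) -> Tuple[int, List[float]]:
    """Return the index of the best first move and all backed-up scores."""
    # After the machine's first move it is the opponent's turn.
    backed_up = [back_up(child, machine_to_move=False) for child in tree]
    best = max(range(len(backed_up)), key=backed_up.__getitem__)
    return best, backed_up


# Three first moves, each answered by two opponent replies (made-up scores).
EXAMPLE = [[3, 5], [-2, 9], [0, 1]]
```

In EXAMPLE the highest leaf score (9) lies under the second move, but reaching it would require the opponent's cooperation; the backed-up scores favor the first move instead.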
3.2 Dictionary Approach
We must not exclude an alternate, albeit impractical method men-
tioned by Shannon, that of storing a dictionary of all possible positions
of the chess pieces together with the correct move (either calculated or
supplied by a chess master). While there are many fewer of these positions
than there are of variations in paths by which they may be reached, the
number is still quite formidable, perhaps of the order of A very
much abridged and carefully edited version of such a dictionary is, however,
not entirely useless even for chess, and we will later see how such
a dictionary has been used successfully for the game of checkers.
3.3 Shannon’s Detailed Analysis
Returning to the minimax procedure,⁴ Shannon considered two basic
types of strategies. For one, which he called Type A, the look-ahead
procedure is always to be carried a fixed number of moves ahead along
all possible paths, as shown in Fig. 1. In a second and more sophisticated
procedure, called Type B, certain forceful variations are to be carried out
as far as possible and are evaluated only at reasonable positions where
some quasi-stability has been established, while other pointless variations
are not to be explored at all. This procedure was justified by likening it
to the tactics employed by chess masters.
Shannon anticipated that such a program would play a fairly strong
game at speeds comparable to human speeds. He listed as machine
advantages the following: (1) high-speed operation in individual calculations,
(2) freedom from errors, (3) freedom from laziness, and (4) freedom
from nerves. He balanced these against the flexibility, imagination and
inductive and learning capabilities of the human mind.
Shannon defines a chess position in the following terms:
(a) A statement of the positions of all pieces on the board.
(b) A statement of which side, White or Black, has the move.
(c) A statement as to whether the kings and rooks have moved. This
is important, since by moving a rook, for example, the right to castle on
that side is forfeited.
(d) A statement of, say, the last move. This will determine whether a
possible en passant capture is legal, since this privilege is forfeited after
one move.
(e) A statement of the number of moves made since the last pawn
move or capture. This is important because of the 50-move draw rule.

⁴The use of this term is perhaps unfortunate, since it implies a closer connection
than actually exists between this game-playing procedure and the minimax theorem
of game theory. See J. von Neumann and O. Morgenstern, The Theory of Games
and Economic Behavior. Princeton Univ. Press, Princeton, New Jersey, 1955.

Because there is no element of chance in chess (other than the original
choice of sides) and because each side has “perfect information” at each
move as to all previous moves, one can conclude that any given position of
chess must be either:
(a) A won position for White. That is, White can force a win however
Black defends.
(b) A draw position. White can force at least a draw however Black
plays, and likewise Black can force at least a draw however White plays.
If both sides play correctly the game will end in a draw.
(c) A won position for Black. Black can force a win however White
plays.
Since there exists no practical method for determining to which of
the three categories any given position belongs, one must devise an
approximate evaluation function f(P) in quite an analogous way to that
in which a good chess player performs a position evaluation. Whereas
an exact evaluation would have but three values, an approximate evalua-
tion function will have a more or less continuous range of possible values.
Shannon lists the following factors which might be included as applied to
the midgame:
(a) Material advantage (difference in total material).
(b) Pawn formation, backward, isolated pawns, etc.
(c) Positions of pieces, advanced knight, doubled rooks, etc.
(d) Commitments, attacks and options, pieces required for guard functions,
exchange options, pins, etc.
(e) Mobility.
The evaluation function would be formed by assigning numerical values
and weights to these various factors and adding them together.
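
A weighted sum of this kind might be sketched as follows. The five factor names follow the list above, but the weights and the way each factor is reduced to a single number are purely hypothetical; the text does not give Shannon's values.

```python
from typing import Dict

# Hypothetical weights, chosen only for illustration. Each factor is assumed
# to be supplied as (value for the machine) minus (value for the opponent).
WEIGHTS: Dict[str, int] = {
    "material": 100,
    "pawn_formation": 20,
    "piece_position": 15,
    "commitments": 10,
    "mobility": 5,
}


def evaluate(factors: Dict[str, int], weights: Dict[str, int] = WEIGHTS) -> int:
    """Approximate f(P): multiply each factor by its weight and add."""
    return sum(weights[name] * factors.get(name, 0) for name in weights)


# A made-up position: two units of material ahead, slightly worse pawns,
# one well-placed piece, and three extra available moves.
sample = {"material": 2, "pawn_formation": -1, "piece_position": 1,
          "commitments": 0, "mobility": 3}
score = evaluate(sample)
```

Unlike the exact three-valued classification (won, drawn, lost), such a function takes a more or less continuous range of values, which is what the minimaxing procedure needs at the limit of the look-ahead.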
Shannon gave some attention to the mechanics of causing a computer
to play legal chess. In this connection, he envisioned a method of representing
a complete board position by a sequence of 64 digits corresponding
to the 64 squares on the board in which each digit can have a value lying
between -6 and +6 to represent the piece which occupies the square in
question. White piece types would be assigned positive numbers (P=1,
N=2, B=3, R=4, Q=5, K=6), and Black pieces the corresponding negative
numbers. A few additional numbers would be required to specify the
board status with respect to castling, en passant captures, draw sequences,
etc. Moves were to be represented by (a,b,c) where a and b are squares
and c specifies a piece in case of promotion. The complete program was
to consist of a set of subroutines which make moves, list all possible moves
for each type of piece, calculate evaluation functions, perform minimaxing,
and determine the optimum move. The exact details of the procedures
suggested by Shannon have been superseded by newer techniques
made possible by modern computers.
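
The 64-digit board encoding and the (a,b,c) move notation can be illustrated with a short Python sketch. The piece codes are those given in the text; the square numbering (0 to 63, row by row starting from Black's back rank), the text diagram format, and the names encode_board, Move, and apply_move are assumptions made for this example. Castling, en passant, and legality checks are deliberately left out.

```python
from typing import List, NamedTuple

# Piece codes from the text: positive for White, negative for Black, 0 empty.
PIECE_CODES = {"P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6}


class Move(NamedTuple):
    origin: int          # "a": square moved from (0..63, assumed numbering)
    destination: int     # "b": square moved to
    promotion: int = 0   # "c": piece code on promotion, 0 otherwise


def encode_board(rows: List[str]) -> List[int]:
    """Turn 8 strings of 8 characters into 64 digits in -6..+6.

    Uppercase letters are White pieces, lowercase are Black, '.' is empty.
    """
    digits = []
    for row in rows:
        for ch in row:
            if ch == ".":
                digits.append(0)
            elif ch.isupper():
                digits.append(PIECE_CODES[ch])
            else:
                digits.append(-PIECE_CODES[ch.upper()])
    return digits


def apply_move(board: List[int], move: Move) -> List[int]:
    """Return a new board with the move made (no legality checking)."""
    new_board = list(board)
    piece = new_board[move.origin]
    if move.promotion:
        # Keep the color (sign) of the promoting pawn.
        piece = move.promotion if piece > 0 else -move.promotion
    new_board[move.destination] = piece
    new_board[move.origin] = 0
    return new_board


START_ROWS = ["rnbqkbnr", "pppppppp", "........", "........",
              "........", "........", "PPPPPPPP", "RNBQKBNR"]
```

For example, `apply_move(encode_board(START_ROWS), Move(52, 36))` advances the White king's pawn two squares under the assumed numbering.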
Other aspects of the problem treated by Shannon had to do with the
use of a statistical element to avoid repetitive games, the possibility of
storing standard openings, simple methods of altering style of play, the
possibilities of introducing learning, and finally, the possibility of
programs based on elaborate analyses of “type positions” which would
enable the program to recognize similarities with familiar situations and
so to obtain suggestions of plausible moves to investigate.
Shannon has recently reconsidered the problem of programming chess
with particular attention to routines aimed at generating forceful moves.
He and John McCarthy have also written several versions of a program
for finding two-move and three-move mates. The interested reader would
do well to be on the lookout for a report of this work.
3.4 Turing’s Contribution
Following Shannon’s lead, Turing [4] also addressed himself to the
problem of programming a computer to play chess. His analysis was based
on the premise that “If one can explain quite unambiguously in English,
with the aid of mathematical symbols if required, how a calculation is
to be done, then it is always possible to programme a digital computer
to do that calculation, provided the storage capacity is adequate.” He
contented himself, therefore, in describing such a program and in simulat-
ing it by hand calculation. Turing recognized the need for carrying the
analysis varying distances and for making evaluations at what he called
dead positions, that is, at positions where no capture, or recapture, or
mate can be made in the next move. His scheme of position-play evalua-
tion was quite simple. The only published game revealed it as a very weak
player.
3.5 The Los Alamos Program
The next major contribution to appear in the literature⁵ was the work
of a group at Los Alamos [5], in which major concessions were made in
the interests of speed by using a 6 x 6 board, omitting the bishops, using
6 pawns to a side, and by eliminating castling and two-square pawn
moves. In spite of this violence to chess tradition the game was said to
retain much of the flavor of real chess. The great reduction in complexity
allowed the program to be condensed to 600 machine words on a MANIAC I
computer. It was possible to investigate all sequences to a depth of 4,
that is, to two possible machine moves and two replies, within an average
playing time of 12 minutes per move. Subject to these limitations, this
program played well enough to beat a beginner, but it was no match for the
experienced chess player. These studies demonstrated for the first time
that chess was at last within the reach of the existing digital computers
(although the authors were quite pessimistic on this score) and substituted
concrete experimental data for the previous speculations as to magnitude
of the task of making computers compete with the human brain in
performing tasks which require the gathering of a great amount of
information which is then evaluated in accordance with extremely complex
criteria.

⁵Pravda has reported that chess has been programmed for the Russian BESM
computer, but no published details are available.

These workers (particularly M. B. Wells) have continued their interest
and have recently programmed the complete (8 x 8) game of chess for
the MANIAC II. This new code follows variations until there have been
two consecutive noncapture moves with a minimum of three half-moves
and a maximum of five. An additional time-saving rule cuts off variations
involving loss of material if a player cannot regain the lost material in
one move. This had the unfortunate effect of eliminating sacrifice moves
from consideration. Positions are evaluated in terms of material advantage
and mobility, with a pawn being worth 10 freedoms. The average playing
time is 18 min per move. Since many improvements are in process, includ-
ing a major increase in playing speed, and a revision in the way the pro-
gram applies the mobility principle, a significant improvement in playing
ability is to be expected soon.
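
The stopping rule and the scoring just described might be sketched as below. This is one possible reading of the prose, not the MANIAC II code: a variation is represented simply as a list of flags telling whether each half-move was a capture, material is assumed to be supplied already converted to pawn units, and the names should_stop and score are hypothetical. The additional cutoff on unrecovered material losses is not modeled.

```python
from typing import Sequence

MIN_HALF_MOVES = 3   # never stop before three half-moves
MAX_HALF_MOVES = 5   # always stop at five half-moves
PAWN_IN_FREEDOMS = 10  # a pawn is worth 10 "freedoms" (available moves)


def should_stop(was_capture: Sequence[bool]) -> bool:
    """Decide whether a variation has been followed far enough.

    was_capture[i] is True if the i-th half-move of the variation captured.
    """
    depth = len(was_capture)
    if depth >= MAX_HALF_MOVES:
        return True
    if depth < MIN_HALF_MOVES:
        return False
    # Stop once the last two half-moves were both noncaptures.
    return not was_capture[-1] and not was_capture[-2]


def score(material_in_pawns: int, mobility_difference: int) -> int:
    """Combine material and mobility on a single scale of freedoms."""
    return PAWN_IN_FREEDOMS * material_in_pawns + mobility_difference
```

On this reading, a pawn's advantage outweighs a mobility deficit of up to nine moves, so `score(1, -4)` is still positive.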
3.6 Bernstein's Program
The first complete chess program to be described is that written for
the IBM 704 computer by Bernstein and his collaborators [6]. The program
requires about 8 min to make a move, it looks four half-moves in
advance, and it plays a respectable and not-too-obvious game of chess.
In order to play legal chess without any simplifying modifications,
Bernstein and his associates addressed themselves directly to the problem
of limiting the number of moves to be considered at each stage in the
analysis. Their program selects the 7 most plausible moves for detailed
analysis (out of the 30 or so available) and so limits the number of final
positions to be evaluated at a depth of 4 to only 2401 (compared to a
possible 810,000 positions). This selection is made on the basis of 8
questions which the program explores in order. Quoting Bernstein, these
questions are:
“1. Am I in check and, if so, can I capture the checking piece, interpose
a piece, or move away?
2. Are any exchanges possible and, if so, can I gain material advantage
by entering upon the exchange, or should I move my man away?
3. If I have not castled, can I do so now?
4. Can I develop a minor piece?
5. Can I occupy an open file?
6. Do I have any men that I can put on the critical squares created
by pawn chains?
7. Can I make a pawn move?
8. Can I make a piece move?”
It will be observed that the choice of questions and their order follows
the common lore of the chess world. The success of the selection technique
is critically dependent upon the validity of this lore. An almost fatal
defect is the certainty of overlooking simple moves which may have
very important consequences. Somehow, good chess players exercise various
checks and balances which prevent them from committing such obvious
blunders and, presumably, similar facilities could be added to the
program, although how to do this is not at all obvious. The real lesson to
be learned from the Bernstein experiments is the importance of selection
in the reduction of the work load.
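
The effect of selection on the work load is simple arithmetic: seven moves at each of four levels give 7^4 = 2401 final positions, against 30^4 = 810,000 for thirty moves at each level. The sketch below shows this count together with one way an ordered series of questions could feed a fixed-size list of plausible moves. The generator functions stand in for Bernstein's eight questions and are hypothetical; the actual program's selection logic is not described here in enough detail to reproduce.

```python
from typing import Callable, Iterable, List


def final_positions(width: int, depth: int) -> int:
    """Number of final positions when `width` moves are kept at each level."""
    return width ** depth


def select_plausible(generators: List[Callable[[], Iterable[str]]],
                     limit: int = 7) -> List[str]:
    """Collect moves from generators in priority order until `limit` is hit.

    Each generator plays the role of one question ("Am I in check?",
    "Can I castle?", ...) and yields the moves that answer it.
    """
    chosen: List[str] = []
    for generate in generators:
        for move in generate():
            if move not in chosen:
                chosen.append(move)
                if len(chosen) == limit:
                    return chosen
    return chosen


# Toy stand-ins for three questions, using made-up move names.
toy_questions = [lambda: ["a", "b"], lambda: ["b", "c"], lambda: ["d"]]
```

Because earlier questions fill the list first, a move that only a later question would propose can be crowded out, which is one way to picture the risk of overlooking simple but important moves.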
3.7 Newell, Shaw, and Simon Program
To date, perhaps the most ambitious attack on machine chess is that
by Newell et al. [7]. The intent of their study is to use chess as a vehicle
for describing and understanding human thinking and decision processes.
As Newell has expressed the matter, “We wish not only to have the program
make good moves, but to do so for the right reasons.”⁶ This work
was started in 1955 and is not yet complete. It was interrupted by work
directed toward developing programs that discover proofs for theorems in
symbolic logic, an undertaking which Newell asserts to involve the same
fundamental problem: “. . . reasoning with heuristics that select fruitful
paths of exploration in a space of possibilities that grows exponentially
where the dilemmas of speed versus selectivity, and uniformity versus
sophistication exist.”

⁶There is an implied assumption here that people know and act on the right
reasons, an assumption that many workers reject.

These workers began by assuming that the task had grown beyond
the reach of direct machine-coding and that it required a more powerful
language. Their choice for this language is within the framework of
interpretive programming, that is, a programming scheme in which programs
are written in a language believed to be more task-oriented and in
which a so-called “interpretative program” translates these instructions
into machine code, one at a time, as needed during the program execution.
This procedure, as is well known, results in a substantial sacrifice in speed
as compared with assembly techniques, but it does greatly simplify the
task of making continuous changes. As a matter of fact, the Newell chess
program involves no less than four language levels: (1) the machine code
itself, (2) a general information processing language (their IPL), which
is primarily designed for handling lists, (3) a basic chess vocabulary, and
finally (4) the chess program itself.
3.7.1 THE NEWELL, SHAW, AND SIMON
INFORMATION PROCESSING LANGUAGE

A digression may be desirable to consider the basic characteristics of
the IPL (Information Processing Language, as defined by Newell and
Shaw) [8]. These languages (of which there are now 6) are designed
primarily for handling lists which are of unknown length and whose terms
may themselves be the names of other lists, etc. These languages allow two
kinds of expressions: data list structures which contain the information
to be processed, and routines which define the processes themselves. All
of the computer memory, which is not otherwise assigned, is contained
in an available space list. When memory cells are needed they are removed
from this list and when the need for them no longer exists they are
returned. There is, thus, no separate limit to the size of any specific list
other than the over-all one set by the total available space. Each word
in a list (other than the termination word which contains zero) must
contain, in addition to the desired data, (1) a designation prefix and a
data type prefix which specify the type of information stored, whether
data, or instruction, etc., and the format of the data, whether fixed point,
floating point, etc., and (2) a link portion which specifies the location in
memory of the next word of the list. To insert a word in an existing list
following any specific word it is only necessary to move the link address
from this previous word to the link portion of the word to be introduced,
and put the address of the new word into the link field of the previous
word. Deletions are accomplished by reversing this process, taking care
however to attach the deleted cell to the available space list. This process
is seen to be well suited for handling data which can proliferate or
contract at various stages in the analysis in a bounded but otherwise
unspecified fashion. The virtue of having on demand essentially all of
memory available for a specific purpose is, however, obtained at a cost, since
a substantial portion of the total space is used up in “prefixes” and in
“links.” The IPL languages, thus, transfer some of the programming
complexities from the programmer to the interpretative routine and pay for
this by a sacrifice in speed and memory space.
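
The list mechanics described above (an available space list, insertion by relinking, and return of deleted cells) can be imitated in Python with two parallel arrays standing in for memory. This is a sketch of the idea only, not IPL: the prefixes are omitted, address 0 is used as the terminating link, and the class and method names are hypothetical.

```python
from typing import Any, List


class ListMemory:
    """A toy memory of cells, each holding a datum and a link address."""

    def __init__(self, size: int) -> None:
        self.data: List[Any] = [None] * (size + 1)   # cell 0 is never used
        self.link: List[int] = [0] * (size + 1)      # 0 marks end of a list
        for cell in range(1, size):
            self.link[cell] = cell + 1               # chain all cells together
        self.free = 1 if size > 0 else 0             # head of available space

    def _take(self) -> int:
        """Remove one cell from the available space list."""
        if self.free == 0:
            raise MemoryError("available space list exhausted")
        cell = self.free
        self.free = self.link[cell]
        return cell

    def new_list(self, value: Any) -> int:
        cell = self._take()
        self.data[cell], self.link[cell] = value, 0
        return cell

    def insert_after(self, prev: int, value: Any) -> int:
        cell = self._take()
        self.data[cell] = value
        self.link[cell] = self.link[prev]   # new word takes over the old link
        self.link[prev] = cell              # previous word now points to it
        return cell

    def delete_after(self, prev: int) -> None:
        cell = self.link[prev]
        if cell == 0:
            return
        self.link[prev] = self.link[cell]   # unlink the cell
        self.data[cell] = None
        self.link[cell] = self.free         # return it to available space
        self.free = cell

    def to_list(self, head: int) -> List[Any]:
        out = []
        while head:
            out.append(self.data[head])
            head = self.link[head]
        return out

    def free_count(self) -> int:
        count, cell = 0, self.free
        while cell:
            count, cell = count + 1, self.link[cell]
        return count
```

No list has a size limit of its own; the only limit is the total number of cells, and every link field is space that carries no data, which is the cost noted in the text.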
Starting with this very flexible language, Newell and his associates have
programmed approximately 100 terms in a basic chess vocabulary. Each
term defines a process which can be used to answer the kinds of questions
one asks about a chess position, test to see if one man bears on another,
or to see if two men are on the same diagonal, etc. The final program is
written in these chess terms.
3.7.2 THE NEWELL, SHAW, AND SIMON ANALYSIS

Newell and his collaborators describe their program in terms of goals,
move generations, evaluation analysis, and final choice. Rather than to
use a numerically additive evaluation function to compare alternatives,
the various goals are listed in order of importance; a leading goal takes
precedence over all those that follow it in this list. They feel that this is
a real distinction. As a matter of fact, all of the usual evaluation schemes
become equivalent to this if widely differing values are assigned to the
different parameters of the evaluation polynomial so that the largest value
attained by any given term can never equal the smallest, nonzero, value
of the term which precedes it.
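
The equivalence claimed here can be checked on a small case. In the sketch below each goal yields a score in a fixed range 0 to BASE-1 (an assumption made for illustration), the ordered-goal comparison is modeled by comparing tuples, and the polynomial uses weights that are successive powers of BASE so that no later term can outweigh a single unit of an earlier one. All names are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple

BASE = 3  # each goal's score is assumed to lie in 0 .. BASE-1


def ordered_goals_key(scores: Sequence[int]) -> Tuple[int, ...]:
    """Compare goal by goal, most important first (tuple ordering)."""
    return tuple(scores)


def polynomial_score(scores: Sequence[int], base: int = BASE) -> int:
    """Additive evaluation with widely separated weights base**k."""
    total = 0
    for s in scores:
        total = total * base + s   # earlier goals get the larger weights
    return total


# Every combination of scores for three goals.
positions = list(product(range(BASE), repeat=3))
by_goals = sorted(positions, key=ordered_goals_key)
by_polynomial = sorted(positions, key=polynomial_score)
```

The two sort orders coincide for every combination, which illustrates why a sufficiently spread-out additive polynomial behaves like a strict priority list of goals.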
The goals form a set of modules out of which the program is constructed.
They are independent and can be added or removed, as desired, without
disturbing other goals. The more important of these goals are king safety,
material balance, center control, development, king-side attack, and pro-
motion. At the beginning of each move a preliminary analysis is made
and a set of goals are chosen which are thought to be appropriate to the
chess situation that exists.
Each goal has associated with it a move generator which proposes
alternate moves aimed at achieving the specific goal. These move generators
are used exclusively to generate alternate moves and are not employed
to generate the continuations that are explored in the analysis of the
moves. For each move proposed by the move generator an analysis
generator takes over and explores each continuation until it terminates by
arriving at a generalized “dead” position. Adapting Turing’s idea in this
regard, they require that the dead position concept be applied to each
feature of the board. Newell, himself, has raised some question regarding
the lack of a built-in rule that will cause the search to terminate under
these conditions, and it may well be that this is a serious defect of their
proposed procedure. However, whenever a branch does terminate, the
resulting board position is evaluated and the score is backed up by the
conventional minimaxing procedure and, finally, a choice of moves is
made.
Newell proposes to explore various techniques for making this final
choice, in addition to the simple procedure of taking the one yielding the
highest score. One such technique is to set an acceptance level and to take
the first acceptable move. Of course, if none come up to this level the best
move found will be used. Another procedure is to search for a move that
has a double function, that is, a move proposed by more than one move
generator. Still a third variation would be to divide the goals into two
sets; the first to contain all the features that normally should be at-
tended to, and the second to contain features that are rare in occurrence,
but either very good or very bad if they do occur. The executive routine
would then first find an acceptable move on the basis of the first list and
then spend all of the remaining time looking for various rare conse-
quences derived from the second list. All of these schemes are seen to
simulate the techniques that human players adopt, and thus to follow the
design philosophy espoused by Newell and his collaborators.
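
The first of these final-choice techniques, taking the first move that reaches an acceptance level and otherwise falling back on the best move found, might be sketched as follows. The function name, the (move, score) pair format, and the sample values are hypothetical and serve only to illustrate the rule as stated in the text.

```python
from typing import Iterable, Optional, Tuple


def choose_move(candidates: Iterable[Tuple[str, int]],
                acceptance_level: int) -> Optional[str]:
    """Return the first move scoring at least the acceptance level.

    If no move is acceptable, return the best one examined; if there are
    no candidates at all, return None.
    """
    best: Optional[Tuple[str, int]] = None
    for move, score in candidates:
        if score >= acceptance_level:
            return move                      # good enough: stop searching
        if best is None or score > best[1]:
            best = (move, score)             # remember the best so far
    return best[0] if best is not None else None
```

With candidates scored 1, 5, and 9 and an acceptance level of 4, the second move is taken and the third is never examined, which trades some quality for a saving in analysis time.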
It would be tempting to go into greater detail were it not for the fact
that great strides are still being made in this program and many of the
precise details of the existing program will soon be obsolete. Incidentally,
Newell now estimates that their program will take about 10 hr per move.
We can summarize both the description of the Newell, Shaw, and
Simon work and, in fact, all of the published work on chess, by quoting
these last authors, “There is clearly evident, in this succession of efforts
[that is, one can see] a steady development toward the use of more and
more complex programs and more and more selective heuristics, and
toward the use of principles of play similar to those used by human
players. Partly, this trend represents, at least in our case, a deliberate
attempt to simulate human thought processes. In even larger part, however,
it reflects the constraints that the task itself imposes upon any
information-processing system that undertakes to perform it.”
As an addendum, it should be remarked that there are several other
groups working in this field who for one reason or another have not seen
fit to publish their work. It will be most interesting to see if they concur
with the Newell, Shaw, and Simon assessment of the situation.

4. Checkers (or Draughts)


Possibly because checkers is generally assumed to be less “U” than
chess, only a limited number of workers have dealt with the problem of
programming this game for a digital computer. The first published work
was that of Strachey [9] in Great Britain. Strachey’s program followed
the general methods outlined by Shannon. He recognized the desirability
of continuing the look-ahead procedure until a dead position is reached,
defined in this case by a sequence of two nonjump moves. His program
could make a move in one or two minutes and, although the play was
not brilliant, it played, in his words, “quite a tolerable game until the
end game.”
While it is grossly unfair to dismiss Strachey’s work in a single para-
graph and to discuss the present writer’s own efforts in some detail, in
the interests of conciseness this will have to be done. Perhaps such high-
handed behavior can be excused if the writer publicly apologizes for his
action, as he now does, and acknowledges the credit which is Mr. Strach-
ey’s due for being there first and for the many clever programming tech-
niques which either the writer has borrowed from Strachey’s work or
which have been suggested to him by Strachey in private discussion.

4.1 The Samuel Checker Program


Samuel [10] (having apologized, we revert to formalism) justifies his
long and continued interest in programming checkers on the basis of the
use of a game as a vehicle for studying machine-learning techniques.
Suggestions that he incorporate standard openings or other forms of
man-devised checker lore have been consistently rejected. In contrast with the
Newell, Shaw, and Simon philosophy, he refuses to pass judgment on
whether the program makes good moves for the right reasons, demanding
instead, that it develop its own reasons. Checkers was chosen rather than
chess primarily because the simplicity of its rules permitted greater em-
phasis being placed on learning techniques. The game was thought to
contain all of the basic characteristics of an intellectual activity in which
heuristic procedures and learning processes can play a major role and
in which these processes can be evaluated.

4.1.1 CHARACTERISTICS NEEDED FOR LEARNING STUDIES


These characteristics are enumerated as:
(a) The activity must not be deterministic in the practical sense.
There exists no known algorithm which will guarantee a win or a draw
in checkers. The complete explorations of every possible path through a
checker game would involve perhaps 10^40 choices of moves, less than the
corresponding figure for chess, but still a very large number.
(b) A definite goal must exist (the winning of the game) and at least
one criterion or intermediate goal must exist which has a bearing on the
achievement of the final goal and for which the sign should be known.
In checkers, the goal is to deprive the opponent of the possibility of
moving, and the dominant criterion is the number of pieces of each color
on the board.
(c) The rules of the activity must be definite and they should be known.
Games satisfy this requirement. Unfortunately, many problems of eco-
nomic importance do not. While, in principle, the determination of the
rules can be a part of the learning process, this is a complication which
might well be left until later.
(d) There should be a background of knowledge concerning the activity
against which the learning progress can be tested.
(e) The activity should be one that is familiar to a substantial body
of people so that the behavior of the program can be made understandable
to them. More people play or think that they can play checkers than is
the case for chess. In both cases the ability to have the program play
against human opponents (or antagonists) adds spice to the study and,
incidentally, provides a convincing demonstration for those who do not
believe that machines can compete with people in activities which are
usually assumed to involve thinking.
4.1.2 GENERAL METHOD

Following Shannon’s lead, the Samuel program plays by looking ahead
a few moves and by evaluating the resulting board positions with scores
then being backed up by the customary minimax procedure. In contrast
with the 30 figure assumed for chess, there are, on the average, only about
7 moves available at any one time in checkers so that it is possible to
explore all available initial moves. As a matter of fact, the Bernstein
selection procedure is not at all safe as applied to checkers, since some of the
more brilliant move sequences may start with moves which superficially
do not appear to be logical. On the other hand, even with but 7 choices
at each branch point, it is not possible to explore all sequences to more
than a very few moves, and great care must be taken to terminate unin-
teresting sequences and to obtain the maximum amount of information
from those branches which are explored in depth.
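
The look-ahead and minimax backing-up described above can be sketched in a few lines. This is only an illustration, not Samuel's IBM 704 code: the nested-list tree and the names `minimax` and `best_move` are assumptions made for the example, and the leaves hold static scores from the machine's point of view.

```python
def minimax(node, maximizing=True):
    """Back up leaf scores through a game tree given as nested lists.

    A leaf is a number (static score, machine's point of view); an inner
    node is a list of child nodes. This layout is an assumption of the sketch.
    """
    if not isinstance(node, list):
        return node
    child_scores = [minimax(child, not maximizing) for child in node]
    return max(child_scores) if maximizing else min(child_scores)


def best_move(tree):
    """Score every initial move (all are explored) and pick the best one."""
    scores = [minimax(child, maximizing=False) for child in tree]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Three initial moves, each answered by two possible replies.
tree = [[3, 5], [2, 9], [4, 6]]
print(best_move(tree))  # (2, [3, 2, 4])
```

The opponent is assumed to choose the reply that is worst for the machine, so the third initial move, whose worst outcome is 4, is selected.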
4.1.3 TECHNIQUES USED TO ACHIEVE SPEED
With the primary attention being on learning, it was also felt that
the time per move should be kept to an absolute minimum so that many
games could be played. In most of the work this playing time was held to
30 sec, by making full use of all of the resources of the available
computer (IBM 704), by employing elaborate table-lookup procedures, by
designing special fast sorting and searching procedures, and by using a
variety of special programming tricks. A slight digression, to consider
some of these techniques, might be in order at this time and may help
the reader to visualize how games are actually played on a computer.
Readers interested only in the underlying philosophy may skip the rest
of this section.
During the look-ahead procedure, which takes much of the operating
time, the program must progress from a statement of a board position, to
a statement of a possible move, to a statement of the resulting board
position, etc. Board positions are stored in the machine as a set of four
machine words. Each word in the IBM 704 is 36 bits in length and 32 of
these bits, by convention, are assigned to corresponding squares on the
checkerboard (only alternate squares are used in checkers). Pieces are
represented by 1’s appearing in the assigned bit positions of the corre-
sponding word. Two words are reserved for pieces belonging to the side
whose turn it is to play, called the Active side, and two words are as-
signed to the Passive side. By convention, one direction on the board is
called Forward (this being the direction in which Black men can move),
and the Active side’s pieces which can move in this direction (being both
men and kings if Black is Active, and kings only if White is Active) are
stored in the word reserved for Forward-Active pieces. The second word
contains Backward-Active piece designations (these being kings only if
Black is Active, and both men and kings if White is Active). Similar con-
ventions hold for the two words reserved for Passive pieces. Lists of the
available moves corresponding to any specified board position are com-
puted from these four words by a process which Samuel describes in detail,
and are themselves stored in five words, one word being used solely to
indicate whether the move is a jump or not (since jumps are compulsory,
no normal moves need be listed if a single jump is possible), and the
remaining four to contain 1’s in bit positions corresponding to the squares
from which pieces can move in the four possible diagonal directions:
Right-Forward, Left-Forward, Left-Backward, and Right-Backward.
With all possible moves so recorded it is relatively easy to consider one
move only, deleting the corresponding bit in the move record so that this
particular move will never be reconsidered when one again returns to
this same level of look-ahead, and to compute a new set of four words,
specifying the resulting board position after the move is executed. The
reversal in sides is automatically taken care of by storing the former
Active pieces in the region reserved for Passive pieces and vice versa.
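
The four-word representation and the exchange of sides can be sketched as follows. This is an illustration under stated assumptions, not the original machine code: the names `Position`, `make_position`, and `swap_sides` are hypothetical, Python integers stand in for 36-bit machine words, and the Passive words are assumed to follow the same Forward/Backward convention as the Active ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Four bit-words; a 1 bit marks a piece on the corresponding square."""
    forward_active: int    # Active pieces able to move Forward
    backward_active: int   # Active pieces able to move Backward
    forward_passive: int   # Passive pieces able to move Forward
    backward_passive: int  # Passive pieces able to move Backward

    def swap_sides(self):
        """After a move, the former Active side becomes the Passive side."""
        return Position(self.forward_passive, self.backward_passive,
                        self.forward_active, self.backward_active)


def make_position(black_men, black_kings, white_men, white_kings, black_to_move):
    """Build the four words; Forward is the direction in which Black men move."""
    black_forward, black_backward = black_men | black_kings, black_kings
    white_forward, white_backward = white_kings, white_men | white_kings
    if black_to_move:
        return Position(black_forward, black_backward, white_forward, white_backward)
    return Position(white_forward, white_backward, black_forward, black_backward)


# Toy bit patterns (not a legal board): two Black men, one king per side, one White man.
p = make_position(0b0011, 0b0100, 0b010000, 0b100000, black_to_move=True)
print(bin(p.forward_active), bin(p.backward_active))  # 0b111 0b100
```

Swapping the two pairs of words is all that is needed to view the same board from the other side's turn.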
The procedure for calculating normal moves from board positions is
illustrative of some of the techniques used to hold the computing time
to an absolute minimum. The straightforward way to do this would be to
consider the squares on the board one at a time; first testing to see if they
were occupied, and then testing to see if moves were available. This process
would, obviously, have to be repeated 32 times, once for each square.
Actually, the entire computation for all squares was done in one fell
swoop. The interested reader is referred to [10] for details of the method.
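
The text does not give the method, so the following is only one plausible way to compute normal moves for all squares at once with word-wide logical operations. It assumes a padded square numbering (bits 0 to 34 with bits 8, 17, and 26 left unused) chosen so that every forward diagonal step is a shift of 4 or 5 bit positions; the numbering, the constant names, and `forward_movers` are assumptions of this sketch.

```python
# Assumed layout: rows of four squares, numbered upward from Black's side,
# with one unused "gap" bit after every second row. With this numbering a
# Left-Forward step is +4 and a Right-Forward step is +5 from any square,
# and steps off the edge of the board land on a gap bit or past bit 34.
GAP_BITS = (8, 17, 26)
VALID = sum(1 << i for i in range(35) if i not in GAP_BITS)


def bits(*indices):
    """Build a word with the given bit positions set."""
    return sum(1 << i for i in indices)


def forward_movers(forward_pieces, occupied):
    """Return (left, right): words marking the squares FROM which a piece can
    make a normal (non-jump) Left-Forward or Right-Forward move."""
    empty = VALID & ~occupied
    left_forward = forward_pieces & (empty >> 4)
    right_forward = forward_pieces & (empty >> 5)
    return left_forward, right_forward


# Standard opening position: 12 men per side on the first and last three rows.
black = bits(*range(0, 8), *range(9, 13))
white = bits(*range(22, 26), *range(27, 35))
left, right = forward_movers(black, black | white)
print(bin(left).count("1") + bin(right).count("1"))  # 7 opening moves
```

Two shifts and two logical ANDs replace 32 per-square tests; the result matches the list-of-origins format described above, one word per direction.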
4.1.4 OPERATIONAL PROCEDURES

Not only was everything possible done to increase operating speed, but
a great deal of attention was given to the operating procedures. The pro-
gram is arranged so that it can be operated in a variety of different ways.
For example, it is possible to cause the program to play itself, that is, to
play both sides of the game. This mode of play was found to be especially
useful during the early stages of learning.
It is also possible to have the program follow book games. When operating
in this mode, the program decides at each point in the game on its
next move in the usual way and reports this proposed move. Instead of
actually making this move, the program refers to the stored record of a
book game and makes the book move. The program records its evaluation
of the two moves and it also counts and reports the number of possible
moves which it rates as being better than the book move and the number
it rates as being poorer. The sides are reversed and the process is repeated.
For demonstration purposes, and also as a means of avoiding lost ma-
chine time while an opponent is thinking, it is possible to play several
simultaneous games against different opponents. Eight have been played
on a number of occasions. When playing in this fashion the different
moves are reported separately on punch cards (as well as being listed on
the printer). These cards are used as input, together with a card contain-
ing the player’s move, as a means of identifying the game to which the
specific move applies.
Games may be started with any initial configuration of the board posi-
tion so that the program may be tested on end games, checker puzzles,
etc. For nonstandard starting conditions the program lists the initial
piece-arrangement. From time to time, and at the end of each game, the
program also tabulates various bits of statistical information which as-
sists in the evaluation of playing performance.
During normal play against a single opponent the program accepts
moves entered on the input keys and displays its move on the console
lights. A complete record is made concurrently on the printer. This record
lists the moves made by both sides and the program’s evaluation of these
moves. Should an opponent attempt to make an illegal move the program
stops, with a full display of all the lights on the console, and waits for a
legal move to be entered. A listing of the existing board positions can be
requested at any time should errors be made in executing moves on the
opponent’s board.
The program concedes when faced with a sure defeat, and when it can
predict a win it reports this fact on the printer. A win in not more than
15 moves (half moves, as some people count) is its record prediction to
date.
4.1.5 PLY LIMITATIONS
As briefly mentioned in Section 4.1.2, great care was exercised in termi-
nating the look-ahead procedure. For convenience in discussing this prob-
lem, Samuel defines the look-ahead distance as the ply (a ply of 2 consist-
ing of one proposed move by the machine and the anticipated reply by
the opponent). The ply is not fixed, but depends upon the dynamics of
the situation, and it is allowed to vary from move to move and from
branch to branch during the analysis. Several criteria are used for termi-
nating the look-ahead relating to the existence of jump moves and to
the possibilities of offering exchanges. Sequences are terminated at varying
plies, as shown in Fig. 2, usually from a minimum of 3 to a maximum of
20, this upper figure being set by the space reserved for storing the con-
tinuations. For a while a record was kept of the maximum ply encountered
during each move analysis. It was soon found that this limiting ply of 20
was reached at least once during almost every move analysis. This was
usually found to be the result of an unlikely situation in which one side
or the other always made the worst possible move. When a provision was
introduced to terminate obvious winning or losing sequences (in which one
side gained the equivalent of a 2-king advantage) at a reasonable ply
figure (usually set at 11), a substantial reduction in playing time resulted.
When the look-ahead procedure is terminated the resulting board posi-
tion is then evaluated in terms of a linear polynomial.
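
A variable-ply stopping rule of the kind described can be sketched as below. The limits 3, 20, and 11 come from the text; everything else is an assumption made for illustration: the function name, the way jump and exchange possibilities are passed in as flags, and the figure of 6 points for a 2-king advantage (two kings at an assumed weight of 3 each). The actual criteria are more detailed than this.

```python
MIN_PLY = 3       # sequences usually run at least this far
MAX_PLY = 20      # set by the space reserved for storing continuations
DECIDED_PLY = 11  # cut-off for obviously won or lost sequences
TWO_KINGS = 6     # assumed score equivalent of a 2-king advantage


def stop_look_ahead(ply, jump_available, exchange_possible, piece_advantage):
    """Decide whether to stop extending a branch and evaluate the board."""
    if ply >= MAX_PLY:
        return True                 # out of storage for continuations
    if ply < MIN_PLY:
        return False                # always look at least a few moves ahead
    if ply >= DECIDED_PLY and abs(piece_advantage) >= TWO_KINGS:
        return True                 # obvious winning or losing sequence
    # Otherwise continue only while the position is still "live".
    return not (jump_available or exchange_possible)


print(stop_look_ahead(3, False, False, 0))  # True: quiet position at minimum ply
print(stop_look_ahead(5, True, False, 0))   # False: a jump is pending
```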
4.1.6 THE SCORING POLYNOMIAL

The successive terms of the scoring polynomial are related, or so it is
hoped, such that each stands in relation to the term with the next larger
coefficient as an intermediate goal whose achievement indicates that the
machine is going in the right direction to achieve the more dominant
goal. If the machine could look far enough ahead one need only ask, “Is
the machine still in the game?” Since it cannot look this far ahead in the
usual situation something else must be substituted, say the piece ratio,
and the machine then continues the look-ahead until one side has a ma-
terial advantage. But, even this is not always possible, so the program tests
for a position advantage, etc. Numerical measures of these various
properties of the board position are added together, each with appropriate
coefficients, to form the evaluation polynomial. Setting these various
coefficients at widely differing values produces the situation favored by Newell,
Shaw, and Simon. The coefficients can be set, with equal ease, so that
the ranges in numerical values of the various terms overlap. Both schemes
have been tried at various times.

FIG. 2. A “tree” of moves which might be investigated during the look-ahead. The
actual branchings are much more numerous than those shown, and the “tree” is
apt to extend to as many as 20 levels. (Reproduced from reference [10].)

Being more specific, the dominant scoring parameter, as defined by
the rules for checkers, is the inability for one side or the other to move.
Since this can occur but once in any game it is tested for separately and
is not included in the scoring polynomial as tabulated by the computer
during play. The next parameter is considered to be the relative
piece-advantage. It is always assumed that it is to the machine’s advantage
to reduce the number of the opponent’s pieces as compared to its own.
A reversal of the sign of this term will, in fact, cause the program to play
“give-away” checkers, and with learning it can only learn to play a better
and better give-away game. Were the sign of this term not known, it
could, of course, be determined by tests, but it must be fixed by the
experimenter and, in effect, it is one of the instructions to the machine
defining its task. The numerical computation of the piece-advantage has
been arranged in such a way as to account for the well-known property
that it is usually to one’s advantage to trade pieces when one is ahead and
to avoid trades when one is behind. Furthermore, it is assumed that kings
are more valuable than pieces, the relative weights assigned to them being
3 to 2. This ratio means that the program will trade 3 men for 2 kings, or
2 kings for 3 men, if by so doing it can obtain some positional advantage.
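
The 3-to-2 weighting and the linear form of the polynomial can be illustrated with a short sketch. The function names and the example terms and coefficients are hypothetical, and the piece-advantage here is a plain weighted difference; the adjustment that favors trades when ahead is not modeled.

```python
KING_WEIGHT = 3
MAN_WEIGHT = 2


def material(men, kings):
    """Weighted piece count for one side."""
    return MAN_WEIGHT * men + KING_WEIGHT * kings


def piece_advantage(own_men, own_kings, opp_men, opp_kings):
    """Simplified piece-advantage term: own material minus the opponent's."""
    return material(own_men, own_kings) - material(opp_men, opp_kings)


def evaluate(terms, coefficients):
    """Linear scoring polynomial: sum of coefficient times term value."""
    return sum(coefficients[name] * value for name, value in terms.items())


# Three men and two kings carry the same weight, so trading one for the
# other leaves this term unchanged and positional terms decide the matter.
print(material(men=3, kings=0), material(men=0, kings=2))  # 6 6
print(evaluate({"mobility": 2, "center": -1}, {"mobility": 4, "center": 8}))  # 0
```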
The choice of the parameters to follow this first term and, indeed, the
method of making this choice has been the subject of much study. Two
procedures have been investigated; the one in which the experimenter
himself selected the terms and in which the learning feature resides else-
where, and the second in which the program made its own selection as
a part of the learning process.

4.1.7 VARIOUS FORMS OF ROTE LEARNING


The most elementary type of learning discussed by Samuel was that
in which the program simply saves all of the board positions encountered
during play, together with their computed scores. Reference can then be
made to this memory record and a certain amount of computing time can
be saved. This can hardly be called a very advanced form of learning;
nevertheless, if the program then utilizes the saved time to compute
further in depth it would appear to improve with time.
Fortunately, the ability to store board information at a ply of zero,
and to look up boards at a larger ply, provides the possibility of looking
much farther in advance than might otherwise be possible. Samuel ex-
plained this by considering a very simple case where the look-ahead is
always terminated at a fixed ply, say 3, and in which the program saves
only the board positions encountered during the actual play with their
associated backed-up scores. Now it is this list of previous board positions
that is used to look up board positions while at a ply level of 3 in the
subsequent games. If a board position is found, its score has, in effect,
already been backed up by 3 levels, and if it becomes effective in
determining the move to be made, it is a 6-ply score rather than a simple 3-ply
score. This new initial board position with its 6-ply score is, in turn,
saved and it may be encountered in a future game and the score backed
up by an additional set of 3 levels, etc. This procedure is illustrated in
Fig. 3. The incorporation of this variation, together with the simpler
rote-learning feature, resulted in a fairly powerful learning technique which
was studied in some detail.

FIG. 3. Simplified representation of the rote-learning process, in which information
saved from a previous game is used to increase the effective ply of the backed-up
score. (Reproduced from reference [10].)
KEY. @, typical board position found in memory with score from previous
look-ahead search.

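
The way a stored score deepens the effective look-ahead can be shown with a toy sketch. Nothing here is the original program: positions are arbitrary hashable values, `children` and `static_score` are caller-supplied stand-ins for move generation and the scoring polynomial, and the question of whose turn it is in a stored position is ignored for simplicity.

```python
def search(position, depth, memory, children, static_score, maximizing=True):
    """Return (backed_up_score, effective_ply) for `position`."""
    successors = children(position)
    if depth == 0 or not successors:
        # At the look-ahead horizon, prefer a remembered backed-up score,
        # which already carries the ply of the search that produced it.
        return memory.get(position, (static_score(position), 0))
    results = [search(child, depth - 1, memory, children, static_score,
                      not maximizing) for child in successors]
    pick = max if maximizing else min
    score, ply = pick(results, key=lambda result: result[0])
    return score, ply + 1


def play_and_remember(position, depth, memory, children, static_score):
    """Search from a position met in actual play and save its backed-up score."""
    memory[position] = search(position, depth, memory, children, static_score)
    return memory[position]


# Toy "game": positions are integers and each has a single successor.
children = lambda n: [n + 1] if n < 10 else []
static_score = lambda n: n
memory = {}
print(play_and_remember(3, 3, memory, children, static_score))  # (6, 3)
print(play_and_remember(0, 3, memory, children, static_score))  # (6, 6)
```

The second search is still only 3 plies deep, but because its horizon position was found in memory the resulting score reflects 6 plies of look-ahead.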
One important additional feature had to be added to the program
before it would play satisfactorily. It was found necessary to impart a
sense of direction to the program in order to force it to press on toward
a win. To illustrate this, consider the situation of 2 kings against 1 king,
which is a winning combination for practically all variations in board
positions. With time, the program can be assumed to have stored all of
these variations, each associated with a winning score. Now, if such a
situation is encountered, the program will look ahead along all possible
paths and each path will lead to a winning combination, in spite of the
fact that only one of the possible initial moves may be along the direct
path toward the win while all of the rest may be wasting time.
This difficulty was overcome by keeping a record of the ply value of
the different board positions at all times and by making a further choice
between board positions on this basis. If ahead, the program would push
directly toward the win while, if behind, it would adopt delaying tactics.
This was done automatically by simply decreasing the magnitude of the
score a small amount each time it was backed up a ply level during the
analyses. When the program was then faced with a choice of board
positions whose scores differed only by the ply number it would always make
the most advantageous choice.
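
A small sketch shows why shrinking the score slightly at each ply gives the desired sense of direction. The decrement of 1 and the score of 100 for a won position are arbitrary values chosen for the example, and the function names are hypothetical.

```python
DECAY = 1  # assumed small decrement applied per ply of backing-up


def back_up_one_ply(score):
    """Reduce the magnitude of a score slightly, keeping its sign."""
    if score > 0:
        return max(score - DECAY, 0)
    if score < 0:
        return min(score + DECAY, 0)
    return 0


def backed_up_score(leaf_score, plies):
    """Score seen at the root after backing a leaf score up `plies` levels."""
    for _ in range(plies):
        leaf_score = back_up_one_ply(leaf_score)
    return leaf_score


# Ahead: the same win reached in 2 plies outscores the win reached in 6.
print(backed_up_score(100, 2), backed_up_score(100, 6))    # 98 94
# Behind: the loss that is 6 plies away is the less negative choice.
print(backed_up_score(-100, 2), backed_up_score(-100, 6))  # -98 -94
```

Since the program always picks the highest score, it presses toward the nearest win when ahead and postpones defeat when behind, with no extra logic.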
4.1.7.1 Cataloging and Culling Stored Information
Since practical considerations limited the number of board positions
which could be saved, and since the time to search through those that are
saved could have easily become unduly long, Samuel devised elaborate
systems, (1) to catalog the boards that were saved, (2) to delete redundan-
cies, and (3) to discard board positions which were not believed to be of
much value. Board positions were first standardized by reversing the
pieces and piece positions if it was a board position in which White was
to move, so that all boards were reported as if it were Black’s turn to
move, reducing by nearly a factor of 2 the number of boards which had
to be saved. Board positions in which all of the pieces were kings were
reflected about the diagonals with a further fourfold reduction in the num-
ber which had to be saved. Standardized board positions were grouped
into records on the basis of (1) the number of pieces on the board, (2)
the presence or absence of a piece advantage, (3) the side possessing this
advantage, (4) the presence or absence of kings on the board, (5) the
side having the so-called “move,” or opposition advantage, and finally
(6) the first moments of the pieces about normal and diagonal axes
through the board.
During play, newly acquired board positions were saved in the high-
speed memory until a reasonable number had been accumulated and they
were then merged with those on the “memory tape” and a new memory
tape was produced. Board positions within a record were listed in a serial
fashion, being sorted with respect to the words which defined them. The
records were arranged on the tape in the order in which they were most
likely to be needed during the course of a game; board positions with 12
pieces to a side coming first, etc. This method of cataloging was found
to cut tape-searching time to a minimum.
Reference was made to the board positions already saved by reading
the correct record into the memory and searching through it by a
dichotomous search procedure. Usually five or more records were held in memory
at one time, the exact number at any time depending upon the lengths
of the particular records in question. Normally, the program called three
or four new records into memory during each new move, making room for
them as needed, by discarding the records which had been held the longest.
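
A dichotomous (binary) search through one record can be sketched with Python's standard `bisect` module. The record layout is an assumption of the example: a list of `(board_words, score)` pairs kept sorted by the four words that define each board.

```python
from bisect import bisect_left


def find_board(record, board_words):
    """Return the stored score for `board_words`, or None if it is absent.

    `record` must be sorted; each entry is (board_words, score), where
    board_words is a tuple of the four words defining the position.
    """
    # The one-element probe (board_words,) sorts just before any entry
    # whose first element equals board_words, so scores are never compared.
    index = bisect_left(record, (board_words,))
    if index < len(record) and record[index][0] == board_words:
        return record[index][1]
    return None


record = sorted([((3, 1, 8, 8), 7), ((1, 0, 4, 0), 12), ((2, 0, 8, 0), -3)])
print(find_board(record, (2, 0, 8, 0)))  # -3
print(find_board(record, (9, 9, 9, 9)))  # None
```

Each lookup halves the remaining range, so a record of several thousand boards needs only a dozen or so comparisons.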
Two different procedures were found to be of value in limiting the num-
ber of board positions that were saved; one based on the frequency of use
and, the second, on the ply. To keep track of the frequency of use, an
age term was added to the score. Each new board position to be saved
was arbitrarily assigned an age. When reference was made to a stored
board position, either to update its score or to utilize it in the look-ahead
procedure, the age recorded for this board position was divided by 2. This
is called refreshing. Offsetting this, each board position was automatically
aged by one unit at memory merge times (normally occurring about once
every twenty moves). When the age of any one board position reached
an arbitrary maximum value this board position was expunged from the
record. This is a form of forgetting. New board positions which remained
unused were soon forgotten, while board positions which were used several
times in succession were refreshed to such an extent that they were re-
membered even if not used thereafter for a fairly long period of time. This
form of refreshing and forgetting was adopted on the basis of reflections
as to the frailty of human memories.
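
The refreshing and forgetting scheme can be sketched as follows. The initial age of 4, the maximum age of 8, the use of integer halving, and the class and method names are all assumptions made for the illustration; the text says only that these values were arbitrary.

```python
INITIAL_AGE = 4  # assumed age given to a newly saved board
MAX_AGE = 8      # assumed age at which a board is forgotten


class BoardMemory:
    def __init__(self):
        self.entries = {}  # board -> [score, age]

    def save(self, board, score):
        self.entries[board] = [score, INITIAL_AGE]

    def lookup(self, board):
        """Use a stored board; each reference halves its age (refreshing)."""
        entry = self.entries.get(board)
        if entry is None:
            return None
        entry[1] //= 2
        return entry[0]

    def merge(self):
        """At merge time every board ages one unit; old boards are expunged."""
        for board in list(self.entries):
            self.entries[board][1] += 1
            if self.entries[board][1] >= MAX_AGE:
                del self.entries[board]


memory = BoardMemory()
memory.save("used", 10)
memory.save("unused", 20)
for _ in range(3):
    memory.merge()          # both boards age from 4 to 7
memory.lookup("used")       # refreshed: age 7 becomes 3
memory.merge()              # "used" ages to 4, "unused" reaches 8
print(sorted(memory.entries))  # ['used']
```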
In addition to the limitations imposed by forgetting, a restriction was
placed on the maximum size of any one record. Whenever an arbitrary
limit was reached, enough of the lowest-ply board positions were auto-
matically culled from the record to bring the size well below the maximum.

4.1.7.2 Selection of Terms for Rote-Learning Studies


Before embarking on a study of the learning capabilities of this sys-
tem, it was, of course, first necessary to fix the terms and coefficients in
the evaluation polynomial. To do this, a number of different sets of values
were tested by playing through a series of book games and computing the
move correlation coefficients. These values varied from 0.2 for the poorest
polynomial tested, to approximately 0.6 for the one finally adopted. The
selected polynomial contained four terms (as contrasted with the use
of 16 terms in later experiments). In decreasing order of importance,
these were: (1) piece advantage, (2) denial of occupancy, (3) mobility,
and (4) a hybrid term which combined control of the center and piece
advancement.

4.1.7.3 Rote-Learning Tests


Having arbitrarily picked a scoring polynomial, a series of games were
played, both self-play and play against many different individuals (sev-
eral of these being checker masters). Many book games were also followed,
some of these being end games. The program learned to play a very good
opening game and to recognize most winning and losing end positions
many moves in advance, although its midgame play was not greatly im-
proved. This program was able to qualify as a rather better-than-average
novice, but definitely not as an expert. These studies were continued until
the memory tape contained something over 53,000 board positions
(averaging 3.8 words each). While this was still far from the number which
would tax the listing and searching procedures used in the program, rough
estimates, based on the frequency with which the saved boards were
utilized during normal play (these figures being tabulated automatically),
indicated that a library tape containing at least twenty times this number
of board positions would be needed to improve the midgame play. At the
existing rate of acquisition of new positions this would require an
inordinate amount of play and, consequently, of machine time.¹

4.1.7.4 Rote-Learning Conclusions


The general conclusions drawn from these tests were that:
(a) An effective rote-learning technique must include a procedure to
minimize the length of the path to the desired result, and it must contain
a refined system for cataloging and storing information.
(b) Rote-learning procedures can be used effectively on machines with
the data-handling capacity of the IBM 704 if the information which must
be saved and searched does not occupy more than, roughly, one million
words, and if not more than one hundred or so references need to be made
to this information per minute. These figures are, of course, highly de-
pendent upon the exact efficiency of cataloging which can be achieved.
(c) The game of checkers, when played with a simple scoring scheme
and with rote learning only, requires more than this number of words for
master caliber of play and, as a consequence, is not completely amenable
to this treatment on the IBM 704.
(d) A game, such as checkers, is a suitable vehicle for use during the
development of learning techniques and it is a very satisfactory device
for demonstrating machine-learning procedures to the unbelieving.

¹This playing-time requirement, while large in terms of cost, is less than the time
which the checker master probably spends to acquire his proficiency.

4.1.8 A LEARNING PROCEDURE INVOLVING GENERALIZATIONS


An obvious way to decrease the amount of storage needed to utilize
past experience is to generalize on the basis of experience and to save only
the generalizations. This should, of course, be a continuous process if it
is to be truly effective, and it should involve several levels of abstraction.
Samuel has made a start in this direction by having the program select,
from a list of 38 parameters, a subset of 16 terms for use in the evaluation
polynomial, and by having the program determine the sign and magnitude
of the coefficients which multiply these selected parameters. The piece-
advantage term needed to define the task was computed separately and,
of course, was not altered by the program.
The program was arranged to act as two different players, called Alpha
and Beta. Alpha generalized on its experience, after each move, by ad-
justing the coefficients in its evaluation polynomial and by replacing terms
which appeared to be unimportant by new parameters drawn from a re-
serve list. Beta, on the contrary, used the same evaluation polynomial
for the duration of any one game. Program Alpha was used to play
against human opponents, and during self-play Alpha and Beta played
each other.
At the end of each self-play game a determination was made of the
relative playing ability of Alpha, as compared with Beta, by a neutral
portion of the program. If Alpha won (or was adjudged to be ahead when
a game was otherwise terminated) the then current scoring system used
by Alpha was given to Beta. If, on the other hand, Beta won or was
ahead, this fact was recorded as a black mark for Alpha. Whenever Alpha
received an arbitrary number of black marks (usually set at 3) it was
assumed to be on the wrong track and a fairly drastic and arbitrary
change was made in its scoring polynomial (by reducing the coefficient of
the leading term to zero). This action was necessary on occasion, since the
entire learning process was an attempt to find the highest point in multi-
dimensional scoring space in the presence of many secondary maxima on
which the program could become trapped.
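
The bookkeeping at the end of each self-play game can be sketched as below. Several details are assumptions made for the example and are not stated in the text: coefficients are held in plain lists, the "leading term" is taken to be the one with the largest magnitude, and the black-mark count is reset after a shake-up and left unchanged when Alpha wins.

```python
BLACK_MARK_LIMIT = 3  # "usually set at 3"


def after_self_play_game(alpha, beta, alpha_ahead, black_marks):
    """Update Alpha's and Beta's coefficient lists in place after one game
    and return the new black-mark count for Alpha."""
    if alpha_ahead:
        beta[:] = alpha              # Beta adopts Alpha's current scoring system
        return black_marks
    black_marks += 1
    if black_marks >= BLACK_MARK_LIMIT:
        # Drastic change to escape a secondary maximum: zero the leading term.
        leading = max(range(len(alpha)), key=lambda i: abs(alpha[i]))
        alpha[leading] = 0
        black_marks = 0
    return black_marks


alpha, beta = [8, 4, 2], [1, 1, 1]
marks = after_self_play_game(alpha, beta, alpha_ahead=True, black_marks=0)
for _ in range(3):
    marks = after_self_play_game(alpha, beta, alpha_ahead=False, black_marks=marks)
print(alpha, beta, marks)  # [0, 4, 2] [8, 4, 2] 0
```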
The capability of the program could be tested at any time by having
Alpha play one or more book games (with the learning procedure tem-
porarily immobilized) and by correlating its play with the recommenda-
tions of the masters or, more interestingly, by pitting it against a human
player.

4.1.8.1 Polynomial Modification Procedure


For Alpha to make changes in its scoring polynomial it must have
some trustworthy criteria for measuring performance. A logical difficulty
presents itself, since the only measuring parameter available is this same
scoring polynomial that the process is designed to improve. Recourse was
had to the peculiar property of the look-ahead procedure which makes
it less important for the scoring polynomial to be particularly good the
further ahead the process is continued. This means that one can evaluate
the relative change in the positions of two players, when this evaluation
is made over a fairly large number of moves, by using a scoring system
which is much too gross to be significant on a move-by-move basis.
In order to obtain a large enough span to make use of this character-
istic, it was arranged for Alpha to keep a record of the apparent good-
ness of its board positions as the game progressed. This was done by
computing the scoring polynomial for each board position encountered
in actual play and by saving this polynomial in its entirety. At the same
time, Alpha also computed the backed-up score for all board positions,
using the look-ahead procedure described earlier. At each play by Alpha
the initial board score, as saved from the previous Alpha move, was
compared with the backed-up score for the current position. The differ-
ence between these scores, defined as delta, was used to measure Alpha’s
progress. If delta was positive, then, presumably, Alpha’s position was
improving with time and, consequently, it was reasonable to assume
that the initial board evaluation was in error and terms which con-
tributed positively should have been given more weight. A converse
statement could be made for the case where delta was negative. Pre-
sumably, in this case, a wrong choice of moves was made and greater
weight should have been given to terms making negative contributions.
A record was kept of the correlation existing between the signs of the
individual term contributions in the initial scoring polynomial and the
sign of delta. After each play in which a substantial delta value resulted,
an adjustment was made in the values of the correlation coefficients, with
due account being taken of the number of times that each particular term
had been used and had had a nonzero value. The coefficient for the poly-
nomial term (other than the piece-advantage term) with the then largest
correlation coefficient was set at a prescribed maximum value with
proportionate values determined for all of the remaining coefficients.
Whenever a correlation-coefficient calculation led to a negative sign a
corresponding reversal was, of course, made in the sign associated with the
term itself.
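
The sign-correlation bookkeeping can be sketched in simplified form. This is an illustration under assumptions, not the procedure as programmed: the class name, the maximum coefficient of 16, and the exact correlation formula (net sign agreements divided by the number of nonzero uses) are hypothetical, and raw term values are correlated with delta so that the sign of the correlation directly becomes the sign of the coefficient.

```python
MAX_COEFFICIENT = 16  # assumed "prescribed maximum value"


def sign(x):
    return (x > 0) - (x < 0)


class TermStatistics:
    def __init__(self, names):
        self.net_agreement = {name: 0 for name in names}
        self.times_used = {name: 0 for name in names}

    def update(self, term_values, delta):
        """Record, for each nonzero term, whether its sign agreed with delta."""
        for name, value in term_values.items():
            if value != 0 and delta != 0:
                self.times_used[name] += 1
                self.net_agreement[name] += sign(value) * sign(delta)

    def correlation(self, name):
        used = self.times_used[name]
        return self.net_agreement[name] / used if used else 0.0

    def coefficients(self):
        """Largest correlation gets MAX_COEFFICIENT; the rest are proportionate.
        A negative correlation yields a negative coefficient."""
        corr = {name: self.correlation(name) for name in self.net_agreement}
        largest = max(abs(c) for c in corr.values())
        if largest == 0:
            return {name: 0.0 for name in corr}
        return {name: MAX_COEFFICIENT * c / largest for name, c in corr.items()}


stats = TermStatistics(["mobility", "center"])
stats.update({"mobility": 2, "center": -1}, delta=5)
stats.update({"mobility": 1, "center": 3}, delta=4)
stats.update({"mobility": -3, "center": 2}, delta=-2)
stats.update({"mobility": -1, "center": 1}, delta=-6)
print(stats.coefficients())  # {'mobility': 16.0, 'center': -8.0}
```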
When the value of delta fell below an arbitrary limit, and so cast
doubt on the validity of the proposed correction, the initial board score
for this move was retained until after another pair of moves, and delta
was then computed over this longer span. This process was repeated
as many times as necessary until a significantly large delta value re-
sulted.
After each play for which an adjustment was made, a low-term tally
was recorded against the term which had the lowest correlation coefficient
and, at the same time, a test was made to see if this brought its tally
count up to some arbitrary limit (originally set at 8, and later set at 32).
When this limit was reached for any specific term, this term was trans-
ferred to the bottom of the reserve list and it was replaced by a term
from the head of the reserve list.
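
The low-term tally and the rotation of terms through the reserve list can be sketched with a double-ended queue. The names are hypothetical, the limit of 8 is the original setting mentioned above, and "lowest correlation coefficient" is read here as lowest in magnitude, which is an assumption.

```python
from collections import deque

TALLY_LIMIT = 8  # originally 8, later 32


def tally_and_replace(active, reserve, correlations, tallies):
    """Charge a low-term tally to the weakest active term and, once it
    reaches the limit, swap it with the term at the head of the reserve list."""
    lowest = min(active, key=lambda name: abs(correlations[name]))
    tallies[lowest] = tallies.get(lowest, 0) + 1
    if tallies[lowest] >= TALLY_LIMIT and reserve:
        active[active.index(lowest)] = reserve.popleft()  # head of reserve list
        reserve.append(lowest)                            # bottom of reserve list
        tallies[lowest] = 0
    return lowest


active = ["a", "b"]
reserve = deque(["c", "d"])
correlations = {"a": 0.9, "b": 0.1, "c": 0.0, "d": 0.0}
tallies = {}
for _ in range(TALLY_LIMIT):
    tally_and_replace(active, reserve, correlations, tallies)
print(active, list(reserve))  # ['a', 'c'] ['d', 'b']
```

A discarded term is not lost; it waits at the bottom of the reserve list and returns to the polynomial once the terms ahead of it have been tried.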
Samuel comments on the fact that the procedure of having the pro-
gram select terms for the evaluation polynomial from a supplied list
might be thought to be much too simple, and that the program should
be made to generate the terms for itself. Unfortunately, he was not able
to devise a satisfactory scheme for doing this. With a man-generated list
one might at least ask that the terms be members of an orthogonal set,
assuming that this has some meaning as applied to the evaluation of a
checker position. But, even this seemed impossible. The only practical
solution seemed to be that of including a relatively large number of pos-
sible terms in the hope that all of the contributing parameters got cov-
ered somehow, even though in an involved and redundant way. This is
not an entirely undesirable state of affairs, however, since it simulates
the situation which is apt to exist when an attempt is made to apply
similar learning techniques to real-life situations.
Many of the terms in the existing list were related in some vague way
to the parameters used by checker experts. Some of the concepts which
checker experts appear to use eluded attempts at definition in terms of
machine programming. Some of the terms were quite unrelated to the
usual checker lore and were discovered more or less by accident. The
second moment about the diagonal axis through the double corners is
an example. Twenty-seven different simple terms were used and a num-
ber of combinational terms were introduced which had some of the
characteristics of binary connectives.
4.1.8.6 Learning-by-Generalization Tests
Samuel reported on two fairly extensive series of tests of this learning
procedure; the first series covering 28 games. This series revealed certain
weaknesses of the program which were corrected. The results of 42
additional games played after these changes were also reported. In general,
the program was able to learn to play a rather good game.
The tentative conclusions which were drawn from these tests were:
(a) A simple generalization scheme of the type here used can be an
effective learning device for problems amenable to tree-searching pro-
cedures.
(b) The memory requirements of such schemes are quite modest and
remain fixed with time.
(c) The operating times are also reasonable and remain fixed, inde-
pendent of the amount of accumulated learning.
(d) Incipient forms of instability in the solution can be expected but,
at least for the checker program, these can be dealt with by quite straight-
forward procedures.
(e) Even with the incomplete and redundant set of parameters which
have been used to date, it is possible for the computer to learn to play a
better-than-average game of checkers in a relatively short period of time.
4.1.9 COMPARISONS BETWEEN THE TWO METHODS
Samuel draws some interesting comparisons between the playing style
developed by the generalization program and that developed by the
earlier rote-learning procedure. The program with rote learning soon
learned to imitate master play during the opening moves. It was always
quite poor during the middle game, but it easily learned how to avoid
most of the obvious traps during end game play and could usually drive
on toward a win when left with a piece advantage. The program with the
generalization procedure never learned to play in a conventional manner
and its openings were apt to be weak. On the other hand, it soon learned
to play a good middle game, and with a piece advantage it usually
polished off its opponent in short order. Apparently, this form of rote
learning is of the greatest help either under conditions when the results
of any specific action are long delayed, or in those situations where
highly specialized techniques are required. Contrasting with this, the
generalization procedure is most helpful in situations in which the avail-
able permutations of conditions are large in number and when the con-
sequences of any specific action are not long delayed.
Samuel concluded that his experiments had demonstrated with some
certainty that it is possible to devise learning schemes which will out-
perform an average person and that such learning schemes may someday
be economically feasible as applied to real-life situations.

5. Other Games
While the programming of computers to play bridge hands has not
attracted much attention, several people have programmed bridge bid-
ding, using one of the conventional bidding systems. The program
written for the IBM 650 by David Lefkovitz at the Moore School of
Electrical Engineering of the University of Pennsylvania is perhaps
typical. This program, called LEX by its originator, assumes the role of
one of the players at the table, with the other three players being either
human or, if other machines are available, additional LEX programs.
As Lefkovitz points out “. . . two LEX programs as partners would be
quite compatible since the criterion of a good partnership in bridge is a
common understanding of the system and conventions employed.” LEX
operates algorithmically in the Goren system and will respond to the
following conventions: One club convention, Short club open, Stayman,
and Blackwood. Lefkovitz recognized that bridge bidding was not a good
subject for a heuristic approach and treated it as a large table-lookup
procedure. His program will make a good opening bid in from 9 to 15 sec
and a reasonably competent response bid in 10 sec or less.
Mention has already been made of Davies’ work on tic tac toe [1].
More recently, the three-dimensional four-in-a-row variation of this game
has been programmed for the IBM 650 by Mr. H. I?. Smith, Jr. This pro-
gram looks ahead a number of moves and evaluates the resulting posi-
tions in terms of the relative control of the various lines through the
array. Since no complete algorithm is used, it is possible to beat the
program, although this is seldom done.
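To make the idea of scoring a position by "relative control of the various lines" concrete, the following sketch enumerates the 76 winning lines of a 4 x 4 x 4 array and scores a position from them. The scoring rule (squared piece counts on lines not yet blocked by the opponent) and all names are assumptions chosen for illustration; the text does not say how the IBM 650 program weighted its lines.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]
N = 4  # cells per edge of the cubic array


def all_lines() -> List[List[Cell]]:
    """Enumerate every straight line of N cells through the N x N x N array."""
    # Keep one direction from each opposite pair so a line is listed once.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(N), repeat=3):
        for d in directions:
            cells = [tuple(s + k * step for s, step in zip(start, d))
                     for k in range(N)]
            if all(0 <= c < N for cell in cells for c in cell):
                lines.append(cells)
    return lines


LINES = all_lines()


def line_control(board: Dict[Cell, str], player: str, opponent: str) -> int:
    """Score a position: lines held only by player count for, opponent's against."""
    score = 0
    for line in LINES:
        marks = [board.get(cell) for cell in line]
        mine = marks.count(player)
        theirs = marks.count(opponent)
        if mine and not theirs:
            score += mine * mine      # hypothetical weighting
        elif theirs and not mine:
            score -= theirs * theirs
    return score
```

A look-ahead program of the kind described would apply such a function to the positions reached at the end of its search; a single piece in a corner, for example, lies on seven lines.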
A number of independent groups have simulated Hagelbarger’s SEER
[2] on a digital computer. The program written by A. L. Samuel, and
later elaborated by W. Burgin, A. L. Samuel and Hao Wang, is perhaps
typical. This program will detect unintended nonrandom sequences in
the human calling of heads-or-tails and so outwit a human opponent who
does not resort to a random process to determine his choices. The pro-
gram will also learn most repeating sequences of binary choices and can
be taught to reproduce a great many different sequences, such, for
example, as π and e, to several decimal places, when given the first two or three
digits. An interesting variation of this technique, which these workers
have considered, has to do with teaching a computer to supply missing
letters and to correct the spelling of Basic English texts.
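The way such a program can exploit nonrandom calling may be illustrated with a deliberately simplified stand-in. The sketch below is not Hagelbarger's SEER mechanism nor the program described above; it merely counts which call has followed each recent context and guesses the more frequent one. The class name, the `order` parameter, and the tie-breaking rule are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class SequencePredictor:
    """Guess the next heads-or-tails call from what followed the last few calls."""

    def __init__(self, order: int = 2) -> None:
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.history: List[str] = []
        # For each context (the last `order` calls), count what came next.
        self.counts: DefaultDict[Tuple[str, ...], Dict[str, int]] = defaultdict(
            lambda: {"H": 0, "T": 0}
        )

    def _context(self) -> Tuple[str, ...]:
        return tuple(self.history[-self.order:])

    def predict(self) -> str:
        seen = self.counts[self._context()]
        # Ties (including unseen contexts) are resolved in favor of heads.
        return "H" if seen["H"] >= seen["T"] else "T"

    def observe(self, call: str) -> None:
        self.counts[self._context()][call] += 1
        self.history.append(call)
```

Against an opponent who repeats the pattern H, H, T, this predictor makes one wrong guess while the pattern is being learned and is correct thereafter; against a truly random caller it can do no better than chance.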
There remains a rather large class of games which employ digital
computers to simulate the environment in which a game may be played.
Most business games and so-called war games fall into this category.
For these games, the computer is acting as a gaming device. However
involved the programming may be, and some of these games are quite
sophisticated, the computer is not acting as a player; instead, it is
providing the conditions and interpreting the rules so that the human
players can play each other or, in special cases, so that one player can play
a game of solitaire. These otherwise interesting and extremely useful
games do not come within the scope of the present discussion since the
computer is not playing the game and its action does not appear to be
motivated by human intelligence.

6. A Look Into the Future
Just as it was impossible to begin the discussion of game-playing ma-
chines without referring to the hoaxes of the past, it is equally unthink-
able to close the discussion without a prognosis. Programming computers
to play games is but one stage in the development of an understanding
of the methods which must be employed for the machine simulation of
intellectual behavior. As we progress in this understanding it seems rea-
sonable to assume that these newer techniques will be applied to real-life
situations with increasing frequency, and that the effort devoted to games
or other toy problems will decrease. Perhaps we have not yet reached this
turning point and we may still have much to learn from the study of
games. Nevertheless, it seems reasonably certain that the time is not far
distant when we will be able to exploit the decision-making capabilities
of the digital computer to a much greater extent than is now done, and
when many of the more humdrum mental tasks which now take so much
human time will be done by machine.
References
1. Referenced, but not described in detail, in the discussion on the computer in
a non-arithmetic role. Davies, D. W., Proc. Inst. Elec. Engrs. (London) Pt. B,
suppl. 103, 473 (1946).
2. Hagelbarger, D. W., SEER, a sequence extrapolating robot, IRE Trans. on
Electronic Computers EC-5, 1 (1956).
3. Shannon, C. E., Programming a computer for playing chess, Phil. Mag. 41, 256
(1950).
4. Published as a portion of Chapter 25 in Faster Than Thought (B. V. Bowden,
ed.), Pitman, London, 1953.
5. Kister, J., Stein, P., Ulam, S., Walden, W., and Wells, M., Experiments in chess,
J. Assoc. Comp. Mach. 4, 174 (1957).
6. Bernstein, A., and Roberts, M. deV., Computer v. chess-player, Sci. American
198, 96 (1958); Bernstein, A., Roberts, M. deV., Arbuckle, T., and Belsky, M. A.,
A chess-playing program for the IBM 704, Proc. Western Joint Computer Conf.
(1958).
7. Newell, A., The chess machine, Proc. Western Joint Computer Conf. (1955);
Newell, A., Shaw, J. C., and Simon, H. A., Chess-playing programs and the
problem of complexity, IBM J. Research Develop. 2, 320 (1958).
8. Newell, A., and Shaw, J. C., Programming the logic theory machine, Proc.
Western Joint Computer Conf. (1957).
9. Strachey, C. S., Logical or non-mathematical programmes, Proc. Assoc. for
Computing Machinery Meeting, Toronto, pp. 46-49 (1952).
10. Samuel, A. L., Some studies in machine learning using the game of checkers,
IBM J. Research Develop. 3, 210-229 (1959).
