Module-1 Chapter 5: Game Playing
Index
• Games
• OPTIMAL DECISIONS IN GAMES
• The minimax algorithm
• Optimal decisions in multiplayer games
• ALPHA–BETA PRUNING
• IMPERFECT REAL-TIME DECISIONS
• STOCHASTIC GAMES
• PARTIALLY OBSERVABLE GAMES
• Kriegspiel: Partially observable chess
• Card games
Games
• In this chapter we cover competitive environments, in which the agents’
goals are in conflict, giving rise to adversarial search problems often
known as games.
• Mathematical game theory, a branch of economics, views any multiagent
environment as a game, provided that the impact of each agent on the others
is “significant,” regardless of whether the agents are cooperative or
competitive.
• In AI, the most common games are of a rather specialized kind: what game
theorists call deterministic, turn-taking, two-player, zero-sum games of
perfect information (such as chess).
• In our terminology, this means deterministic, fully observable environments
in which two agents act alternately and in which the utility values at the end
of the game are always equal and opposite.
Cont…
• For example, if one player wins a game of chess, the other player necessarily
loses. It is this opposition between the agents’ utility functions that makes the
situation adversarial.
• Games have engaged the intellectual faculties of humans, sometimes to an
alarming degree, for as long as civilization has existed.
• For AI researchers, the abstract nature of games makes them an appealing
subject for study. The state of a game is easy to represent, and agents are
usually restricted to a small number of actions whose outcomes are defined
by precise rules.
• Physical games, such as croquet and ice hockey, have much more
complicated descriptions, a much larger range of possible actions, and rather
imprecise rules defining the legality of actions.
• With the exception of robot soccer, these physical games have not attracted
much interest in the AI community.
Cont…
• We begin with a definition of the optimal move and an algorithm for finding
it. We then look at techniques for choosing a good move when time is
limited.
• Pruning allows us to ignore portions of the search tree that make no
difference to the final choice, and heuristic evaluation functions allow us to
approximate the true utility of a state without doing a complete search.
• We then consider games such as backgammon that include an element of chance;
we also discuss bridge, which includes elements of imperfect information
because not all cards are visible to each player.
• Finally, we look at how state-of-the-art game-playing programs fare against
human opposition and at directions for future developments.
• We first consider games with two players, whom we call MAX and MIN for
reasons that will soon become obvious. MAX moves first, and then they take
turns moving until the game is over. At the end of the game, points are
awarded to the winning player and penalties are given to the loser.
Cont…
• A game can be formally defined as a kind of search problem
with the following elements:
• S0: the initial state, which specifies how the game is set up at the start.
• PLAYER(s): defines which player has the move in a state.
• ACTIONS(s): returns the set of legal moves in a state.
• RESULT(s, a): the transition model, which defines the result of a move.
• TERMINAL-TEST(s): a terminal test, which is true when the game is over.
• UTILITY(s, p): a utility function, which defines the final numeric value for
a game that ends in terminal state s for player p.
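• As a rough sketch (not from the slides; all names are illustrative), these elements map naturally onto an abstract Python interface:

class Game:
    """Abstract interface for the formal game definition above.
    (Illustrative sketch, not a standard library API.)"""

    def initial_state(self):
        """S0: how the game is set up at the start."""
        raise NotImplementedError

    def player(self, state):
        """Which player has the move in this state."""
        raise NotImplementedError

    def actions(self, state):
        """The set of legal moves in this state."""
        raise NotImplementedError

    def result(self, state, action):
        """Transition model: the state resulting from a move."""
        raise NotImplementedError

    def terminal_test(self, state):
        """True when the game is over."""
        raise NotImplementedError

    def utility(self, state, player):
        """Final numeric value of a terminal state for the given player."""
        raise NotImplementedError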
Cont…
• The initial state, ACTIONS function, and RESULT function define the game
tree for the game—a tree where the nodes are game states and the edges are
moves. Figure 5.1 shows part of the game tree for tic-tac-toe (noughts and
crosses).
OPTIMAL DECISIONS IN GAMES
• In a normal search problem, the optimal solution would be a
sequence of actions leading to a goal state, that is, a terminal
state that is a win.
• In adversarial search, MIN has something to say about it. MAX
therefore must find a contingent strategy, which specifies MAX’s
move in the initial state, then MAX’s moves in the states resulting
from every possible response by MIN, then MAX’s moves in the
states resulting from every possible response by MIN to those moves,
and so on.
• This is exactly analogous to the AND–OR search algorithm with
MAX playing the role of OR and MIN equivalent to AND. Roughly
speaking, an optimal strategy leads to outcomes at least as good as
any other strategy when one is playing an infallible opponent. We
begin by showing how to find this optimal strategy.
Cont…
• Even a simple game like tic-tac-toe is too complex for us to draw the entire
game tree on one page, so we will switch to the trivial game in Figure 5.2.
• The possible moves for MAX at the root node are labeled a1, a2, and a3. The
possible replies to a1 for MIN are b1, b2, b3, and so on. This particular game
ends after one move each by MAX and MIN.
• (In game parlance, we say that this tree is one move deep, consisting of two
half-moves, each of which is called a ply.) The utilities of the terminal
states in this game range from 2 to 14.
Cont…
• The minimax value of a state is defined recursively: MINIMAX(s) = UTILITY(s)
if TERMINAL-TEST(s) is true; otherwise it is the maximum over a in ACTIONS(s)
of MINIMAX(RESULT(s, a)) when PLAYER(s) = MAX, and the corresponding minimum
when PLAYER(s) = MIN.
Cont…
• Let us apply these definitions to the game tree in Figure 5.2. The terminal
nodes on the bottom level get their utility values from the game’s UTILITY
function.
• The first MIN node, labeled B, has three successor states with values 3, 12,
and 8, so its minimax value is 3. Similarly, the other two MIN nodes have
minimax value 2.
• The root node is a MAX node; its successor states have minimax values 3, 2,
and 2; so it has a minimax value of 3. We can also identify the minimax
decision at the root: action a1 is the optimal choice for MAX because it leads
to the state with the highest minimax value.
• This definition of optimal play for MAX assumes that MIN also plays
optimally; it maximizes the worst-case outcome for MAX. What if MIN does
not play optimally? Then it is easy to show (Exercise 5.7) that MAX will do
even better. Other strategies against suboptimal opponents may do better
than the minimax strategy, but these strategies necessarily do worse against
optimal opponents.
The minimax algorithm
• The minimax algorithm (Figure 5.3) computes the minimax decision from
the current state.
• It uses a simple recursive computation of the minimax values of each
successor state, directly implementing the defining equations.
• The recursion proceeds all the way down to the leaves of the tree, and then
the minimax values are backed up through the tree as the recursion unwinds.
• For example, in Figure 5.2, the algorithm first recurses down to the three
bottom-left nodes and uses the UTILITY function on them to discover that
their values are 3, 12, and 8, respectively. Then it takes the minimum of these
values, 3, and returns it as the backed-up value of node B.
• A similar process gives the backed-up values of 2 for C and 2 for D.
• Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3
for the root node.
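• A minimal Python sketch of this recursion, assuming the illustrative Game interface introduced earlier:

def minimax_decision(game, state):
    """Choose the action for MAX with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a)))

def max_value(game, state):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    return max(min_value(game, game.result(state, a))
               for a in game.actions(state))

def min_value(game, state):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    return min(max_value(game, game.result(state, a))
               for a in game.actions(state))

• Minimax performs a complete depth-first exploration of the game tree, so for a tree of maximum depth m with b legal moves at each point its time complexity is O(b^m).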
Cont…
Optimal decisions in multiplayer games
• Many popular games allow more than two players. Let us examine
how to extend the minimax idea to multiplayer games. This is
straightforward from the technical viewpoint, but raises some
interesting new conceptual issues.
• First, we need to replace the single value for each node with a vector
of values. For example, in a three-player game with players A, B, and
C, a vector ⟨vA, vB, vC⟩ is associated with each node.
• For terminal states, this vector gives the utility of the state from each
player’s viewpoint. (In two-player, zero-sum games, the two-element
vector can be reduced to a single value because the values are always
opposite.)
• The simplest way to implement this is to have the UTILITY function
return a vector of utilities.
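• A minimal sketch of the vector-valued backup, assuming a hypothetical utility_vector method that returns one value per player:

def multiplayer_value(game, state):
    """Back up utility vectors: whichever player has the move picks
    the successor whose vector is best for that player."""
    if game.terminal_test(state):
        return game.utility_vector(state)   # e.g. {'A': 1, 'B': 2, 'C': 6}
    mover = game.player(state)
    return max((multiplayer_value(game, game.result(state, a))
                for a in game.actions(state)),
               key=lambda v: v[mover])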
Cont…
• Multiplayer games usually involve alliances, whether formal or informal,
among the players. Alliances are made and broken as the game proceeds.
• How are we to understand such behavior? Are alliances a natural
consequence of optimal strategies for each player in a multiplayer game? It
turns out that they can be.
• For example, suppose A and B are in weak positions and C is in a stronger
position. Then it is often optimal for both A and B to attack C rather than
each other, lest C destroy each of them individually. In this way,
collaboration emerges from purely selfish behavior.
• Of course, as soon as C weakens under the joint onslaught, the alliance loses
its value, and either A or B could violate the agreement. In some cases,
explicit alliances merely make concrete what would have happened anyway.
• In other cases, a social stigma attaches to breaking an alliance, so players
must balance the immediate advantage of breaking an alliance against the
long-term disadvantage of being perceived as untrustworthy.
ALPHA–BETA PRUNING
• Consider again the two-ply game tree from Figure 5.2. Let’s go through the
calculation of the optimal decision once more, this time paying careful
attention to what we know at each point in the process. The steps are
explained in Figure 5.5.
• The outcome is that we can identify the minimax decision without ever
evaluating two of the leaf nodes.
• Another way to look at this is as a simplification of the formula for
MINIMAX. Let the two unevaluated successors of node C in Figure 5.5 have
values x and y. Then the value of the root node is given by
Cont…
• MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
  = max(3, min(2, x, y), 2)
  = max(3, z, 2)   where z = min(2, x, y) ≤ 2
  = 3.
• In other words, the value of the root, and hence the minimax decision, are
independent of the values of the pruned leaves x and y.
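• The following Python sketch adds the alpha and beta bounds to the minimax functions above (again assuming the illustrative Game interface):

def alpha_beta_search(game, state):
    """Return the minimax decision while pruning branches that
    cannot affect it."""
    best_action = None
    alpha, beta = float('-inf'), float('inf')
    for a in game.actions(state):
        v = ab_min_value(game, game.result(state, a), alpha, beta)
        if v > alpha:                 # best move for MAX so far
            alpha, best_action = v, a
    return best_action

def ab_max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, ab_min_value(game, game.result(state, a), alpha, beta))
        if v >= beta:                 # MIN above would never allow this
            return v                  # prune the remaining successors
        alpha = max(alpha, v)
    return v

def ab_min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, 'MAX')
    v = float('inf')
    for a in game.actions(state):
        v = min(v, ab_max_value(game, game.result(state, a), alpha, beta))
        if v <= alpha:                # MAX above already has a better option
            return v                  # prune the remaining successors
        beta = min(beta, v)
    return v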
Move ordering
• The effectiveness of alpha–beta pruning is highly dependent on the order in
which the states are examined.
• For example, in Figure 5.5(e) and (f), we could not prune any successors of
D at all because the worst successors (from the point of view of MIN) were
generated first.
• If the third successor of D had been generated first, we would have been able
to prune the other two.
• This suggests that it might be worthwhile to try to examine first the
successors that are likely to be best.
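• One simple way to approximate this, sketched below under the assumption of a cheap heuristic eval_fn, is to sort successors by a shallow evaluation before recursing:

def ordered_actions(game, state, eval_fn, maximizing):
    """Generate the most promising moves first, so alpha-beta can
    establish tight bounds early and prune the remainder."""
    return sorted(game.actions(state),
                  key=lambda a: eval_fn(game.result(state, a)),
                  reverse=maximizing)    # best-for-the-mover first

• With perfect move ordering, alpha–beta needs to examine only O(b^(m/2)) nodes instead of O(b^m), effectively doubling the depth that can be searched in the same time.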
IMPERFECT REAL-TIME DECISIONS
• The minimax algorithm generates the entire game search space, whereas the
alpha–beta algorithm allows us to prune large parts of it. However,
alpha–beta still has to search all the way to terminal states for at least a
portion of the search space.
• Searching to that depth is usually not practical, because moves must be made in a
reasonable amount of time—typically a few minutes at most. Claude
Shannon’s paper Programming a Computer for Playing Chess (1950)
proposed instead that programs should cut off the search earlier and apply a
heuristic evaluation function to states in the search, effectively turning
nonterminal nodes into terminal leaves.
• In other words, the suggestion is to alter minimax or alpha–beta in two ways:
replace the utility function by a heuristic evaluation function EVAL, which
estimates the position’s utility, and replace the terminal test by a cutoff test
that decides when to apply EVAL. That gives us the following for heuristic
minimax for state s and maximum depth d:
Cont…
• H-MINIMAX(s, d) = EVAL(s) if CUTOFF-TEST(s, d); otherwise it is the maximum
over a in ACTIONS(s) of H-MINIMAX(RESULT(s, a), d + 1) when PLAYER(s) = MAX,
and the corresponding minimum when PLAYER(s) = MIN.
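• A direct Python transcription of this recursion (eval_fn and cutoff_test are supplied by the caller; names are illustrative):

def h_minimax(game, state, depth, eval_fn, cutoff_test):
    """Depth-limited minimax: back up EVAL estimates from the cutoff
    frontier instead of true utilities from terminal states."""
    if cutoff_test(state, depth):
        return eval_fn(state)
    values = [h_minimax(game, game.result(state, a), depth + 1,
                        eval_fn, cutoff_test)
              for a in game.actions(state)]
    return max(values) if game.player(state) == 'MAX' else min(values)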
Evaluation functions
• An evaluation function returns an estimate of the expected utility of the game
from a given position. The idea of an estimator was not new when Shannon
proposed it.
• For centuries, chess players (and aficionados of other games) have developed
ways of judging the value of a position because humans are even more
limited in the amount of search they can do than are computer programs.
• It should be clear that the performance of a game-playing program depends
strongly on the quality of its evaluation function.
• An inaccurate evaluation function will guide an agent toward positions that
turn out to be lost.
• How exactly do we design good evaluation functions?
Cont…
• First, the evaluation function should order the terminal states in the same
way as the true utility function: states that are wins must evaluate better than
draws, which in turn must be better than losses.
• Otherwise, an agent using the evaluation function might err even if it can see
ahead all the way to the end of the game.
• Second, the computation must not take too long! (The whole point is to
search faster.) Third, for nonterminal states, the evaluation function should
be strongly correlated with the actual chances of winning.
Cont…
• One might well wonder about the phrase “chances of winning.” After all,
chess is not a game of chance: we know the current state with certainty, and
no dice are involved.
• But if the search must be cut off at nonterminal states, then the algorithm will
necessarily be uncertain about the final outcomes of those states.
• This type of uncertainty is induced by computational, rather than
informational, limitations.
• Given the limited amount of computation that the evaluation function is
allowed to do for a given state, the best it can do is make a guess about the
final outcome.
Cont…
• Let us make this idea more concrete. Most evaluation functions work by
calculating various features of the state; for example, in chess, we would
have features for the number of white pawns, black pawns, white queens,
black queens, and so on.
• The features, taken together, define various categories or equivalence classes
of states: the states in each category have the same values for all the features.
• For example, one category contains all two-pawn vs. one-pawn endgames.
• Any given category, generally speaking, will contain some states that lead to
wins, some that lead to draws, and some that lead to losses.
• The evaluation function cannot know which states are which, but it can
return a single value that reflects the proportion of states with each outcome.
• For example, suppose our experience suggests that 72% of the states
encountered in the two-pawns vs. one-pawn category lead to a win (utility
+1); 20% to a loss (0), and 8% to a draw (1/2).
Cont…
• Then a reasonable evaluation for states in the category is the expected
value: (0.72 × +1) + (0.20 × 0) + (0.08 × 1/2) = 0.76.
• In principle, the expected value can be determined for each category,
resulting in an evaluation function that works for any state. As with terminal
states, the evaluation function need not return actual expected values as long
as the ordering of the states is the same.
• In practice, this kind of analysis requires too many categories and hence too
much experience to estimate all the probabilities of winning. Instead, most
evaluation functions compute separate numerical contributions from each
feature and then combine them to find the total value.
• For example, introductory chess books give an approximate material value
for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5,
and the queen 9.
• Other features such as “good pawn structure” and “king safety” might be
worth half a pawn, say. These feature values are then simply added up to
obtain the evaluation of the position.
Cont…
• A secure advantage equivalent to a pawn gives a substantial likelihood of
winning, and a secure advantage equivalent to three pawns should give
almost certain victory, as illustrated in Figure 5.8(a). Mathematically, this
kind of evaluation function is called a weighted linear function because it
can be expressed as
Cont…
• EVAL(s) = w1 f1(s) + w2 f2(s) + · · · + wn fn(s), where each wi is a weight
and each fi is a feature of the position, such as the number of each kind of
piece on the board.
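• A sketch of such a material-counting weighted linear evaluation (piece letters and weights follow the values quoted above):

# Illustrative material values: pawn 1, knight/bishop 3, rook 5, queen 9.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_eval(white_counts, black_counts):
    """EVAL(s) = sum of w_i * f_i(s), where each feature f_i is the
    difference in the number of pieces of one kind."""
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in PIECE_VALUES.items())

# An extra knight and pawn for White evaluates to +4:
# material_eval({'P': 8, 'N': 2, 'B': 2, 'R': 2, 'Q': 1},
#               {'P': 7, 'N': 1, 'B': 2, 'R': 2, 'Q': 1})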
Cutting off search
• The next step is to modify ALPHA-BETA-SEARCH so that it will call the
heuristic EVAL function when it is appropriate to cut off the search. We
replace the two lines in Figure 5.7 that mention TERMINAL-TEST with the
following line:
• if CUTOFF-TEST(state, depth) then return EVAL(state)
• We also must arrange for some bookkeeping so that the current depth is
incremented on each recursive call.
• The most straightforward approach to controlling the amount of search is to
set a fixed depth limit so that CUTOFF-TEST(state, depth) returns true for
all depth greater than some fixed depth d.
• It must also return true for all terminal states, just as TERMINAL-TEST did.
• The depth d is chosen so that a move is selected within the allocated time. A
more robust approach is to apply iterative deepening. (See Chapter 3.)
Cont…
• When time runs out, the program returns the move selected by the deepest
completed search. As a bonus, iterative deepening also helps with move
ordering.
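• A hedged sketch of that time-controlled loop; depth_limited_search is an assumed helper, and a real program would also check the clock inside the search:

import time

def iterative_deepening(game, state, depth_limited_search, time_budget):
    """Search depth 1, 2, 3, ... and, when time runs out, return the
    move chosen by the deepest fully completed search."""
    deadline = time.monotonic() + time_budget
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        # depth_limited_search is an assumed helper returning the best
        # move found by a search cut off at the given depth.
        best_move = depth_limited_search(game, state, depth)
        depth += 1
    return best_move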
• These simple approaches can lead to errors due to the approximate nature of
the evaluation function. Consider again the simple evaluation function for
chess based on material advantage.
• Suppose the program searches to the depth limit, reaching the position in
Figure 5.8(b), where Black is ahead by a knight and two pawns.
• It would report this as the heuristic value of the state, thereby declaring that
the state is a probable win by Black.
• But White’s next move captures Black’s queen with no compensation.
Hence, the position is really won for White, but this can be seen only by
looking ahead one more ply.
Cont…
• Obviously, a more sophisticated cutoff test is needed. The evaluation
function should be applied only to positions that are quiescent, that is,
unlikely to exhibit wild swings in value in the near future. In chess, for
example, positions in which favorable captures can be made are not
quiescent for an evaluation function that just counts material.
• Nonquiescent positions can be expanded further until quiescent positions are
reached. This extra search is called a quiescence search; sometimes it is
restricted to consider only certain types of moves, such as capture moves,
that will quickly resolve the uncertainties in the position.
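• A compact sketch of quiescence search, written in negamax style (so eval_fn is assumed to score from the side to move, and capture_moves is a hypothetical helper returning capture moves only):

def quiescence(game, state, alpha, beta, eval_fn):
    """Keep searching 'noisy' moves (here, captures only) until the
    position is quiet, then trust the static evaluation."""
    stand_pat = eval_fn(state)        # assumed: score from side to move
    if stand_pat >= beta:
        return stand_pat              # position already too good: cut off
    alpha = max(alpha, stand_pat)
    for a in game.capture_moves(state):   # hypothetical helper
        v = -quiescence(game, game.result(state, a), -beta, -alpha, eval_fn)
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return alpha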
Cont…
• The horizon effect is more difficult to eliminate. It arises when the program
is facing an opponent’s move that causes serious damage and is ultimately
unavoidable, but can be temporarily avoided by delaying tactics. Consider
the chess game in Figure 5.9. It is clear that there is no way for the black
bishop to escape.
Cont…
Forward pruning