Chapter 4
Game Search:
Games are a form of multi-agent environment.
What do other agents do, and how do they affect our success?
Multi-agent environments may be cooperative or competitive.
Competitive multi-agent environments give rise to adversarial search problems, often
known as games.
Games – adversary:
The solution is a strategy (a strategy specifies a move for every possible opponent reply).
Time limits force an approximate solution.
Evaluation function: evaluates the "goodness" of a game position.
Example: chess.
Difference between the search space of a game and the search space of a problem:
In the former case it represents the moves of two (or more) players, whereas in the
latter case it represents the "moves" of a single problem-solving agent.
An exemplary game: Tic-tac-toe
There are two players, denoted by X and O.
They alternately write their letter in one of the 9 cells of a 3-by-3 board.
The winner is the one who succeeds in writing three of their letters in a line.
The game ends in a win for one player and a loss for the other, or possibly in a draw.
The root node is the initial state, in which it is the first player's turn to move (the player
X).
The successors of the initial state are the states the player can reach in one move, their
successors are the states resulting from the other player's possible replies, and so on.
Terminal states are those representing a win for X, loss for X, or a
draw.
Each path from the root node to a terminal node gives a different
complete play of the game.
Initial state: It includes the board position and identifies the players to move.
Successor function: It gives a list of (move, state) pairs each indicating a legal move
and resulting state.
Terminal test: This determines when the game is over. States where the game is
ended are called terminal states.
Utility function: It gives a numerical value to terminal states, e.g. win (+1), lose (-1)
and draw (0). Some games have a wider range of possible outcomes, e.g. ranging
from +192 to -192.
The Minimax Algorithm:
Let us assign the following values for the game: 1 for a win by X, 0 for a draw, and -1
for a loss by X.
Given the values of the terminal nodes (win for X (1), loss for X (-1), or draw (0)), the
values of the non-terminal nodes are computed as follows:
•the value of a node where it is the turn of player X to move is the maximum of
the values of its successors (because X tries to maximize its outcome).
•the value of a node where it is the turn of player O to move is the minimum of
the values of its successors (because O tries to minimize the outcome of X).
Figure below shows how the values of the nodes of the search tree are computed from
the values of the leaves of the tree.
The values of the leaves of the tree are given by the rules of the game:
•1 if there are three X in a row, column or diagonal;
•-1 if there are three O in a row, column or diagonal;
•0 otherwise
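These rules can be sketched in code. The following is a minimal illustration (not from the text), representing a game tree as nested lists whose numbers are the terminal values for X:

```python
def minimax(node, x_to_move):
    """Minimax value of a game-tree node for player X.

    A node is either a number (the value of a terminal state:
    1 for a win by X, -1 for a loss by X, 0 for a draw)
    or a list of child nodes.
    """
    if isinstance(node, (int, float)):   # terminal state
        return node
    values = [minimax(child, not x_to_move) for child in node]
    # At X's turn take the maximum of the successors' values,
    # at O's turn the minimum.
    return max(values) if x_to_move else min(values)

# A tiny illustrative tree: X to move at the root, O at the next level.
tree = [[1, 0], [-1, 0, 1], [0, -1]]
print(minimax(tree, True))   # → 0
```

Here each Min node's value is the minimum of its leaves (0, -1 and -1), and the root, where X moves, takes their maximum, 0.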
[Figure: part of the tic-tac-toe game tree. At levels where O is to move (Min), a node's
value is the minimum of its successors' values; at levels where X is to move (Max), it is
the maximum. One terminal leaf shown has the value 1 (a win for X).]
Exercise: Consider the game tree below. Max moves at the root a; b and c are Min
nodes; d, e, f and g are Max nodes; the leaves h through r carry the values shown
(d has leaves h, i; e has j, k, l; f has m, n; g has o, p, r).

Max                 a
Min           b           c
Max        d     e     f     g
Leaves    h i  j k l  m n  o p r
          5 3  3 1 4  7 5  9 2 7

Show what moves should be chosen by the two players, assuming that both
are using the minimax procedure.
Solution:

Max                 a = 7
Min           b = 4       c = 7
Max        d = 5  e = 4  f = 7  g = 9
Leaves     h i    j k l  m n    o p r
           5 3    3 1 4  7 5    9 2 7

Each Max node takes the maximum of its children's values (e.g. d = max(5, 3) = 5),
each Min node the minimum (b = min(5, 4) = 4, c = min(7, 9) = 7), and the root
a = max(4, 7) = 7. So Max should move to c, and Min should then reply by moving to f.
Alpha-Beta Pruning:
The problem with minimax search is that the number of game states it has to examine
is exponential in the number of moves.
Unfortunately, we cannot eliminate the exponent, but we can effectively cut it in half.
The idea is to compute the correct minimax decision without looking at every node in
the game tree, which is the concept behind pruning.
The particular technique for pruning that we will discuss here is “Alpha-Beta
Pruning”.
When this approach is applied to a standard minimax tree, it returns the same move
as minimax would, but prunes away branches that cannot possibly influence the final
decision.
Alpha-beta pruning can be applied to trees of any depth, and it is often possible to
prune entire sub-trees rather than just leaves.
Alpha-beta pruning is a technique for evaluating nodes of a game tree that eliminates
unnecessary evaluations.
Alpha: is the value of the best (i.e. highest value) choice we have found so far at any
choice point along the path for MAX.
Beta: is the value of the best (i.e. lowest-value) choice we have found so far at any
choice point along the path for MIN.
Alpha-beta search updates the values of alpha and beta as it goes along, and prunes
the remaining branches at a node as soon as the value of the current node is known to
be worse than the current alpha (for MAX) or beta (for MIN).
An alpha cutoff:
To apply this technique, one uses a parameter called alpha that represents a lower
bound for the achievement of the Max player at a given node.
Let us consider that the current board situation corresponds to the node A in the
following figure.
[Figure: Max node A with successors B and C; C is a Min node whose Max
successors are D and E, with f(D) = 10. Once B has been evaluated, alpha at A is 15.]
The Max player evaluates its descendants in a depth-first order. It will therefore
estimate first the value of the node B. Let us suppose that this value has been
evaluated to 15, either by using a static evaluation function, or by backing up values
from descendants omitted in the figure.
Therefore 15 is a lower bound for the achievement of the Max player (it may still
be possible to achieve more, depending on the values of the other descendants of
A).
This value is transmitted upward to the node A and will be used for evaluating the
other possible moves from A.
Let us assume that the value of D is 10 (this value has been obtained either by
applying a static evaluation function directly to D, or by backing up values from
descendants omitted in the figure).
Because this value is less than the value of alpha, the best move for Max is to
node B, independent of the value of node E that need not be evaluated.
Indeed, if the value of E is greater than 10, Min will move to D which has the
value 10 for Max.
Otherwise, if the value of E is less than 10, Min will move to E which has a value
less than 10.
So, if Max moves to C, the best it can get is 10, which is less than the value alpha = 15
that it would get by moving to B.
A beta cutoff:
To apply this technique, one uses a parameter called beta that represents an upper
bound for the achievement of the Max player at a given node.
The Min player also evaluates its descendants in a depth-first order. Suppose its first
descendant, the node F, has been evaluated to 15.
From the point of view of Min, this is an upper bound for the achievement of Min
(it may still be possible to make Min achieve less, depending on the values of the
other descendants of B).
Therefore, after evaluating the node F, the value of beta is 15.
This value is transmitted upward to the node B and will be used for evaluating the other
possible moves from B.
Let us assume that the value of H is 25 (this value has been obtained either by applying
a static evaluation function directly to H, or by backing up values from descendants
omitted in the figure).
Because this value is greater than the value of beta, the best move for Min is to node F,
independent of the value of node I, which need not be evaluated.
So, whether the value of I is greater or less than 25, the value obtained by Max at the
node containing H and I is at least 25, which is greater than beta (15, the best value
Max would obtain if Min moves to F).
Therefore, the best move for Min is at F, independent of the value of I.
One should notice that by applying alpha and beta cutoffs, one obtains the same
results as in the case of minimax, but (in general) with less effort.
This means that, in a given amount of time, one could search deeper in the game tree
than in the case of minimax.
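As a rough sketch of the procedure (an illustration, not the text's own code), minimax with alpha-beta cutoffs over a nested-list game tree can be written as follows; a counter of leaf evaluations shows the saving on the ten-leaf exercise tree from the minimax section (leaf grouping as implied by its solution):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
              counter=None):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (terminal value for Max) or a list of
    child nodes; `counter`, if given, counts leaf evaluations.
    """
    if isinstance(node, (int, float)):       # terminal leaf
        if counter is not None:
            counter[0] += 1
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, value)
            if alpha >= beta:                # beta cutoff
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta, counter))
            beta = min(beta, value)
            if beta <= alpha:                # alpha cutoff
                break
        return value

# The exercise tree: Max at the root a, Min nodes b and c,
# Max nodes d..g, ten leaves.
tree = [[[5, 3], [3, 1, 4]], [[7, 5], [9, 2, 7]]]
count = [0]
print(alphabeta(tree, True, counter=count))   # → 7, same as plain minimax
print(count[0], "of 10 leaves evaluated")     # 8 of 10: two leaves are pruned
```

The pruned leaves sit under g: once its first leaf (9) exceeds beta = 7, Min at c will never move to g, so the remaining leaves of g are skipped.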
Game Theory
• Game theory is the study of how to mathematically determine the best strategy for given
conditions in order to optimize the outcome.
• Game theory assumes that all human interactions can be understood and navigated by
such strategic presumptions.
Importance of game theory
Games of Chance
• In games with uncertainty, players include a random element (rolling dice, flipping a
coin, etc.) to determine what moves to make,
i.e. dice are rolled at the beginning of a player's turn to determine the legal moves.
• Chance games are good for exploring decision making in adversarial problems involving
both skill and luck.
Constraint Satisfaction Problems
A constraint satisfaction problem consists of three components, X, D and C:
X is a set of variables, {X1, …, Xn}.
D is a set of domains, {D1, …, Dn}, one for each variable.
C is a set of constraints that specify allowable combinations of values.
Example problem: Map coloring
We are looking at a map of Australia showing its states and territories.
We are given the task of coloring each region either red, green or blue in such a way
that no neighboring regions have the same color.
Since there are nine places where regions border, there are nine constraints:
C = {SA ≠ WA, SA ≠ NT, SA ≠ Q, SA ≠ NSW, SA ≠ V, WA ≠ NT,
NT ≠ Q, Q ≠ NSW, NSW ≠ V}
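A minimal backtracking sketch for this CSP (illustrative code, not from the text; Tasmania, T, is included as a region with no neighbors):

```python
# Adjacency derived from the nine constraints above (T has no neighbors).
neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["SA", "Q", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = ["red", "green", "blue"]

def backtrack(assignment):
    """Assign a color to each region so that no two neighbors match."""
    unassigned = [v for v in neighbors if v not in assignment]
    if not unassigned:
        return assignment                      # every region is colored
    var = unassigned[0]
    for color in COLORS:
        # The constraint: var's color must differ from all its neighbors'.
        if all(assignment.get(n) != color for n in neighbors[var]):
            assignment[var] = color
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                # undo and try the next color
    return None                                # dead end: backtrack

print(backtrack({}))
```

Whenever a partial assignment violates a constraint, the search undoes the last choice and tries another color, which is exactly the backtracking view of constraint satisfaction.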
Crypt-arithmetic Problem
• Many problems in AI can be considered problems of constraint satisfaction, in
which the goal state satisfies a given set of constraints.
In a crypt-arithmetic puzzle, letters stand for digits:
If the same letter occurs more than once, it must be assigned the same digit
each time, and different letters must be assigned different digits.
The sum of the digits must be arithmetically correct, with the added restriction
that no leading zeroes are allowed.
Example 1
Solve the following crypt arithmetic problem
      T W O
    + T W O
    -------
    F O U R

Solution (T = 9, W = 2, O = 8):

      9 2 8
    + 9 2 8
    -------
    1 8 5 6

F = 1, O = 8, U = 5, R = 6, T = 9, W = 2
Example 2
      F O U R
    + F O U R
    ---------
    E I G H T

Solution:

      9 2 3 5
    + 9 2 3 5
    ---------
    1 8 4 7 0

E = 1, I = 8, G = 4, H = 7, T = 0, F = 9, O = 2, U = 3, R = 5
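Both examples can be checked by brute force. The sketch below (illustrative, not from the text) enumerates digit assignments for Example 1 with `itertools.permutations`; note that TWO + TWO = FOUR has more than one valid solution, of which 928 + 928 = 1856 above is one:

```python
from itertools import permutations

def solve_two_plus_two():
    """Return all (TWO, FOUR) pairs satisfying TWO + TWO = FOUR."""
    solutions = []
    letters = "TWOFUR"                    # six distinct letters
    # permutations guarantees different letters get different digits
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["T"] == 0 or a["F"] == 0:    # no leading zeroes
            continue
        two = 100 * a["T"] + 10 * a["W"] + a["O"]
        four = 1000 * a["F"] + 100 * a["O"] + 10 * a["U"] + a["R"]
        if two + two == four:
            solutions.append((two, four))
    return solutions

sols = solve_two_plus_two()
print((928, 1856) in sols)    # → True
print(len(sols), "solutions in total")
```

The same pattern (enumerate assignments, reject those that break a constraint) applies directly to Example 2 with the letters of FOUR + FOUR = EIGHT.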