Min Max
Przemysław Klęsk
[email protected]
Table of contents
1 Game theory
2 Game trees and searching
3 Games of perfect information — algorithms
4 Games of perfect information with random elements
5 Games of imperfect information
6 References
Game theory
A branch of mathematics dealing with situations of conflict (strategic situations), where a participant's result depends on the choices made by that participant and by the others. Sometimes also called the theory of rational behaviour.
Apart from computer science, it is applied in sociology, economics and the military (historically the earliest field of application).
Notions
Game
A situation of conflict, where:
at least two players can be indicated,
every player has a certain number of possible strategies to choose from (a strategy
precisely defines the way the game shall be played by the player),
the result of the game is a direct consequence of the combination of strategies chosen by the players.
Strategy
A complete set of decisions (choices of moves) that a player has to make for all possible states the game can reach.
For typical games, it is often impossible to write down (or memorize) a strategy because of its size.
Finite game
A game that is guaranteed to finish.
Zero-sum game
A game in which the payoffs of all players (determined by the result of the game) sum to zero.
For chess the convention is: 0 (loss), 1 (win), 1/2 (draw); a zero sum can be obtained by the linear transformation 2x − 1.
Minimax Theorem
For zero-sum games the minimax solution coincides with the Nash equilibrium (the latter being a broader notion).
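In its classical matrix form (von Neumann, 1928) the theorem states that, for the payoff matrix A of a two-player zero-sum game and mixed strategies x, y of the two players,
\[
\max_{x} \min_{y} \; x^{\top} A\, y \;=\; \min_{y} \max_{x} \; x^{\top} A\, y \;=\; v,
\]
where v is the value of the game.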
The minimax choice for A is a2, because the worst possible result for A is then −1.
The minimax choice for B is b2, because the worst possible result for B is then 0.
The solution (a2, b2) is not stable: if B believes that A chooses a2, then B shall choose b1 in order to obtain the payoff −1; then, if A believes that B chooses b1, then A shall choose a1 to obtain the payoff 3, and so on.
Dominated choices: a3 and b3 — regardless of the opponent's choice, the other choices are better (more precisely: not worse). Hence, the matrix of payoffs can be reduced by deleting the third row and the third column.
p = 1/6, q = 1/3. (4)
Game value: v = −1/3.
Formally, when P and Q represent mixed strategies (as vectors of probabilities), then:
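A standard way to write the resulting value (with A denoting the payoff matrix of the reduced game — notation assumed here) is
\[
v \;=\; \max_{P} \min_{Q} \; P^{\top} A\, Q \;=\; \min_{Q} \max_{P} \; P^{\top} A\, Q .
\]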
If more than one optimal mixed strategy exists, then infinitely many optimal mixed strategies exist.
John Nash, born in 1928, Nobel prize winner in economics (1994).
Informally
In a multi-player game, we say that a certain set of strategies, one per player, constitutes a Nash equilibrium if and only if each of those strategies is the best response to all the remaining ones, i.e. none of the players can gain by changing his own strategy while the other strategies are kept fixed.
Formally
In a game with n players, let Si denote the set of possible strategies of the i-th player.
Let S denote the space of all strategy profiles, i.e. the Cartesian product of the strategy sets of particular players:
S = S1 × S2 × · · · × Sn.
For any profile of strategies (s1, . . . , sn) chosen by particular players, let Wi(s1, . . . , sn) denote the payoff for the i-th player. Therefore, Wi is a function:
Wi : S → R.
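With this notation, the equilibrium condition can be written as follows: a profile (s∗1, . . . , s∗n) is a Nash equilibrium if for every player i and every strategy si ∈ Si
\[
W_i(s_1^*, \ldots, s_{i-1}^*, s_i^*, s_{i+1}^*, \ldots, s_n^*) \;\ge\; W_i(s_1^*, \ldots, s_{i-1}^*, s_i, s_{i+1}^*, \ldots, s_n^*).
\]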
Another way to define the NEQ is to say that each s∗i can be viewed as a solution of the following maximization problem, for all i:
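In the standard form this reads:
\[
s_i^* \;=\; \arg\max_{s_i \in S_i} \; W_i(s_1^*, \ldots, s_{i-1}^*, s_i, s_{i+1}^*, \ldots, s_n^*).
\]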
The idea of the NEQ can be applied to analyze or predict what happens when several players (parties, institutions) must make decisions simultaneously, and when the outcome depends on all those decisions. The outcome cannot be predicted by analyzing the decisions separately (in isolation).
The NEQ does not have to indicate the best result for the group (the best sum of results) and may seem irrational to an outside observer (e.g. the prisoner's dilemma, the Braess paradox).
In many cases, players could improve their group result if they agreed on strategies different from the NEQ (e.g. business cartels instead of free-market competition).
Braess Paradox
Road network (diagram): edge A→B with cost 1 + n/100, edge B→D with cost 2, edge A→C with cost 2, edge C→D with cost 1 + n/100, and an optional shortcut B→C with cost 0.25.
Problem
Assuming selfishness and rationality of the drivers, find the expected traffic flow (the NEQ) for 100 drivers travelling from A to D in two cases: (1) when the edge BC does not exist, (2) when the edge BC exists.
Treat the problem as a game where every player (driver) has 2 or 3 strategies, respectively: ABD, ACD and possibly ABCD. The payoff is the travel time of the selected road. For the AB and CD edges, the parameter n denotes the number of players who selected the given edge as a fragment of their road.
Case 1
Let p and q denote the numbers of drivers choosing ABD and ACD, respectively. At the equilibrium both routes take equally long:
1 + p/100 + 2 = 1 + q/100 + 2;    p + q = 100. (8)
Hence p = q = 50 and every driver travels for 3.5.
Case 2
Let p, q, r denote the numbers of drivers choosing ABD, ACD and ABCD, respectively. At the equilibrium all three routes take equally long:
1 + (p + r)/100 + 2 = 1 + (q + r)/100 + 2 = 1 + (p + r)/100 + 0.25 + 1 + (q + r)/100;    p + q + r = 100. (9)
Hence p = q = 25, r = 50 and every driver travels for 3.75 — more than the 3.5 of case 1, even though a road was added (the paradox).
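As a small illustration, the sketch below (hypothetical route names and helper functions, assuming the edge costs given above) verifies that the two profiles obtained from (8) and (9) are indeed equilibria:

from collections import Counter

def edge_loads(counts):
    """Numbers of drivers on the congestion-dependent edges AB and CD."""
    n_ab = counts["ABD"] + counts["ABCD"]
    n_cd = counts["ACD"] + counts["ABCD"]
    return n_ab, n_cd

def route_cost(route, counts):
    """Travel time of a single route, given the route choices of all drivers."""
    n_ab, n_cd = edge_loads(counts)
    if route == "ABD":
        return (1 + n_ab / 100) + 2
    if route == "ACD":
        return 2 + (1 + n_cd / 100)
    return (1 + n_ab / 100) + 0.25 + (1 + n_cd / 100)   # ABCD

def is_equilibrium(counts, routes):
    """No driver can strictly lower his own travel time by switching to another route."""
    for r in routes:
        if counts[r] == 0:
            continue
        current = route_cost(r, counts)
        for alt in routes:
            if alt == r:
                continue
            deviated = counts.copy()
            deviated[r] -= 1
            deviated[alt] += 1
            if route_cost(alt, deviated) < current:
                return False
    return True

case1 = Counter({"ABD": 50, "ACD": 50, "ABCD": 0})
case2 = Counter({"ABD": 25, "ACD": 25, "ABCD": 50})
print(is_equilibrium(case1, ["ABD", "ACD"]), route_cost("ABD", case1))           # True 3.5
print(is_equilibrium(case2, ["ABD", "ACD", "ABCD"]), route_cost("ABD", case2))   # True 3.75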
Game trees and searching
Game trees
In 1990 the Chinook program was given the right to participate in championships and to play against humans.
The program lost the championship in 1992, but won in 1994. In 1996 the possibility of Chinook's participation was withdrawn (the program was much stronger than any human player).
Search space of order 5 · 10^20. A database (library) with information about the best move (continuation) for many states.
29.04.2007 — the authors of the project announced English checkers a solved game! Black (who start the game) are guaranteed a draw with perfect play. White are also guaranteed a draw, regardless of the first move by Black.
Until today, it is the “largest” solved mind game.
Arthur Samuel wrote a checkers engine program in 1950 under a project sponsored by IBM.
In 1952 a genetic element of self-training was added — two instances of the program played against one another repeatedly. The program evolved this way was beating amateurs and intermediate players.
After a presentation of the program to IBM stakeholders in 1956, the IBM stock quotes rose by 15 points.
In 1962 the program played a public match against Robert Nealy (a blind checkers master), which the program won. The win was given much publicity. Nealy, however, was not a world-class master.
In effect, a false belief spread that English checkers were already a solved game at the time. Because of that, Bjørnsson had trouble obtaining his grant for Chinook research in the 80s.
A year later, Samuel's program lost a rematch: 1 loss, 5 draws. In 1966 the program lost 8 consecutive games against top-level players: Derek Oldbury and Walter Hellman.
Games of perfect information — algorithms
MIN-MAX algorithm
Procedure mmEvaluateMax(s, d, D)
1 If s is a terminal then return h(s) (position evaluation).
2 v := −∞.
3 For all states t being descendants of s:
  1 v := max{v, mmEvaluateMin(t, d + 1/2, D)}.
4 Return v.
Procedure mmEvaluateMin(s, d, D)
1 If s is a terminal then return h(s) (position evaluation).
2 v := ∞.
3 For all states t being descendants of s:
  1 v := min{v, mmEvaluateMax(t, d + 1/2, D)}.
4 Return v.
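A minimal Python sketch of the same pair of procedures. The depth limit D is treated here as a cut-off at which h is returned — an assumption, since the pseudocode above only tests for terminals; successors() and h() are hypothetical helpers.

import math

def mm_evaluate_max(s, d, D, successors, h):
    """MIN-MAX value of state s from the point of view of the maximizing player."""
    children = successors(s)
    if not children or d >= D:      # terminal position, or the assumed depth limit reached
        return h(s)
    v = -math.inf
    for t in children:
        v = max(v, mm_evaluate_min(t, d + 0.5, D, successors, h))
    return v

def mm_evaluate_min(s, d, D, successors, h):
    """MIN-MAX value of state s from the point of view of the minimizing player."""
    children = successors(s)
    if not children or d >= D:
        return h(s)
    v = math.inf
    for t in children:
        v = min(v, mm_evaluate_max(t, d + 0.5, D, successors, h))
    return v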
Many independent discoverers: Samuel (1952), McCarthy (1956), Newell and Simon (1958).
During the analysis two values are propagated down and up the tree:
α — the payoff guaranteed (so far) for the maximizing player,
β — the payoff guaranteed (so far) for the minimizing player.
The outermost call, for the root, imposes α = −∞, β = ∞.
Child nodes (and their subtrees) are analyzed as long as α < β.
Whenever α ≥ β, one should stop considering the successive children (and their subtrees) — they cannot affect the outcome for the whole tree; they would correspond to a non-optimal play by one of the players.
In the optimistic case, the gain in complexity with respect to MIN-MAX is from O(b^D) down to O(b^(D/2)) = O(√b^D), where b is the branching factor (constant or average); e.g. for chess b ≈ 40.
Owing to this gain one may search deeper.
Procedure alphaBetaEvaluateMin(s, d, D, α, β)
1 If s is a terminal then return h(s) (position evaluation).
2 For all states t being descendants of s:
  1 v := alphaBetaEvaluateMax(t, d + 1/2, D, α, β).
  2 If v ≤ α then return α. (cut-off)
  3 β := min{β, v}.
3 Return β.
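A Python sketch of both procedures (the MAX variant, symmetric to the one above, is added here by analogy; the depth limit and the helpers successors(), h() are assumptions as before):

def alphabeta_max(s, d, D, alpha, beta, successors, h):
    """Fail-hard alpha-beta value of a MAX node."""
    children = successors(s)
    if not children or d >= D:
        return h(s)
    for t in children:
        v = alphabeta_min(t, d + 0.5, D, alpha, beta, successors, h)
        if v >= beta:               # cut-off: the minimizer would never allow this line
            return beta
        alpha = max(alpha, v)
    return alpha

def alphabeta_min(s, d, D, alpha, beta, successors, h):
    """Fail-hard alpha-beta value of a MIN node."""
    children = successors(s)
    if not children or d >= D:
        return h(s)
    for t in children:
        v = alphabeta_max(t, d + 0.5, D, alpha, beta, successors, h)
        if v <= alpha:              # cut-off: the maximizer would never allow this line
            return alpha
        beta = min(beta, v)
    return beta

# Outermost call for the root (a MAX node):
# alphabeta_max(root, 0, D, float('-inf'), float('inf'), successors, h)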
Example (diagram): α-β pruning on a small tree with leaf values 5, 4, 6, ∗, 7, 10, 4, 4; the starred leaf is cut off and the root MAX node obtains the value 5.
Explanation: we need to build all possible children of the first player, but if the moves are optimally ordered then, in each child, already the first move of the second player causes a cut-off (α ≥ β) and the further moves are discarded as non-optimal ones. And so forth, recursively.
There exist estimates for the average case (random order of children), yielding O(b^(3D/4)).
In chess, for b = 40 and D = 12 (12 half-moves), the ratio of the number of states visited under pessimistic ordering to the number visited under optimistic ordering is 40^6, i.e. of order 10^9.
Procedure fsAlphaBetaEvaluateMin(s, d, D, α, β)
1 If s is a terminal then return h(s) (position evaluation).
2 For all states t being descendants of s:
  1 v := fsAlphaBetaEvaluateMax(t, d + 1/2, D, α, β).
  2 β := min{β, v}.
  3 If α ≥ β then return β. (cut-off)
3 Return β.
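A Python sketch mirroring this fail-soft variant, with the MAX procedure added by analogy (same assumptions as in the earlier sketches):

def fs_alphabeta_max(s, d, D, alpha, beta, successors, h):
    """Fail-soft alpha-beta for a MAX node: the returned value may fall outside (alpha, beta)."""
    children = successors(s)
    if not children or d >= D:
        return h(s)
    for t in children:
        v = fs_alphabeta_min(t, d + 0.5, D, alpha, beta, successors, h)
        alpha = max(alpha, v)
        if alpha >= beta:           # cut-off; the tightened bound alpha is returned
            return alpha
    return alpha

def fs_alphabeta_min(s, d, D, alpha, beta, successors, h):
    """Fail-soft alpha-beta for a MIN node."""
    children = successors(s)
    if not children or d >= D:
        return h(s)
    for t in children:
        v = fs_alphabeta_max(t, d + 0.5, D, alpha, beta, successors, h)
        beta = min(beta, v)
        if alpha >= beta:           # cut-off; the tightened bound beta is returned
            return beta
    return beta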
Article: Knuth D.E., Moore R.W., „An Analysis of Alpha-Beta Pruning”, Artificial Intelligence,
1975.
The theorem is useful for building more advanced search algorithms, such as Negascout and MTD(f), based on so-called zero (null) search windows.
Quiescence algorithm
Tries to mimic the intuition of human players by expanding “loud” nodes/leaves and not expanding “quiet” nodes/leaves (for which the position evaluation is returned immediately).
Partially solves the horizon effect problem.
We call a position quiet if no sudden changes of the position evaluation occur between the given state and its descendants (e.g. takes/captures).
Assessing whether a given state is quiet or not may not be easy; it may require a heuristic in itself. Importantly, such an assessment must be faster than expanding a new tree level.
Quiescence does not have to be applied only at the leaf level, but can start earlier. The current depth can be used as an element of the quietness assessment, i.e. the deeper we are, the greater the tendency to leave quiet states unexpanded.
An exact description can be found e.g. in: D. Laramée, Chess Programming Part V: Advanced Search, 2000.
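A common way to realize this idea at the depth limit is sketched below; is_quiet(), loud_successors() (e.g. captures only) and h() are hypothetical helpers, and this is just one possible variant:

def quiescence_max(s, alpha, beta, is_quiet, loud_successors, h):
    """Expand 'loud' positions beyond the nominal depth limit until quiet ones are reached (MAX side)."""
    stand_pat = h(s)                     # evaluation if we stop expanding here
    if is_quiet(s) or stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for t in loud_successors(s):
        v = quiescence_min(t, alpha, beta, is_quiet, loud_successors, h)
        if v >= beta:
            return beta
        alpha = max(alpha, v)
    return alpha

def quiescence_min(s, alpha, beta, is_quiet, loud_successors, h):
    """The symmetric procedure for the MIN side."""
    stand_pat = h(s)
    if is_quiet(s) or stand_pat <= alpha:
        return stand_pat
    beta = min(beta, stand_pat)
    for t in loud_successors(s):
        v = quiescence_max(t, alpha, beta, is_quiet, loud_successors, h)
        if v <= alpha:
            return alpha
        beta = min(beta, v)
    return beta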
Sorting heuristics
in chess: “captures first”,
in many card games, e.g. bridge: “extreme cards first, middle ones later”; e.g. a hand A, Q, 8, 6, 5, 2 can be sorted to: A, 2, Q, 5, 8, 6; the order can also be arranged according to the position of the player within a trick (e.g. the last player typically plays a high card first, the second player typically plays a low card first, etc.),
sorting according to position evaluation — evaluate and sort the children immediately, based on their position evaluations, before running the recurrence downwards.
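The last heuristic amounts to a simple ordering step performed before the recursive descent; a sketch (h() again a hypothetical static evaluation):

def ordered_children_for_max(children, h):
    """For a MAX node: most promising moves first, i.e. highest static evaluation first."""
    return sorted(children, key=h, reverse=True)

def ordered_children_for_min(children, h):
    """For a MIN node the opposite order is preferred: lowest static evaluation first."""
    return sorted(children, key=h)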
Transposition table
The name comes from chess and refers to the possibility of reaching the same position (state) by different sequences of moves.
If the downward recurrence for such a state has already been calculated, then one can save time by reusing the ready-made result.
Often implemented as a hash map (time efficiency analogous to the Closed set in A∗, BFS, etc.). The keys in the hash map are the states themselves or their abbreviations — hash codes (e.g. in chess, the positions of at most 32 pieces are required, plus information about castling and en passant capture possibilities).
Conditions for reusing a state from the transposition table:
the depth of the state in the transposition table is not deeper than that of the tested state (so that the ready score comes from an analysis of a tree of equal or greater depth),
the α-β window for the state in the transposition table must not be narrower than the current one (so that the ready score was not affected by more cut-offs).
Sometimes applied as the book of openings or endgames (chess, checkers).
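A minimal sketch of such a table (the entry layout and the reuse test follow the two conditions above; the key is assumed to be a hash of the position, and all names are chosen only for this example):

# Entries keyed by a hash of the position; each entry stores the value together
# with the remaining search depth and the window it was obtained with.
transposition_table = {}

def tt_lookup(key, remaining_depth, alpha, beta):
    """Return a stored value only if it was computed for at least the required depth
    and with a window at least as wide as the current one."""
    entry = transposition_table.get(key)
    if entry is None:
        return None
    value, entry_depth, entry_alpha, entry_beta = entry
    if entry_depth >= remaining_depth and entry_alpha <= alpha and beta <= entry_beta:
        return value
    return None

def tt_store(key, value, remaining_depth, alpha, beta):
    transposition_table[key] = (value, remaining_depth, alpha, beta)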
Scout algorithm
A zero (null) search window is a window of minimal width, i.e. one satisfying
α + 1 = β. (12)
Definition
We say that a given α-β window succeeded if the value v returned by the fsAlphaBeta procedure (fail-soft) is such that α < v < β. It implies (Knuth-Moore) that the true game value v∗ equals v.
Definition
We say that a given α-β window failed low if the value v returned by the fsAlphaBeta procedure (fail-soft) is such that v ≤ α. It implies (Knuth-Moore) that v is an upper bound on the true game value: v∗ ≤ v.
Definition
We say that a given α-β window failed high if the value v returned by the fsAlphaBeta procedure (fail-soft) is such that β ≤ v. It implies (Knuth-Moore) that v is a lower bound on the true game value: v ≤ v∗.
The narrower the imposed window, the greater the chance of generating cut-offs.
Only the first child of each state is analyzed with the full α-β window; the second and successive children are analyzed with a zero window, i.e. α-(α + 1) or (β − 1)-β, respectively for MAX and MIN states.
A zero window must fail one way or the other.
If a zero window imposed on a child of a MAX state failed low, then we do not have to care — the payoff of the maximizing player could not be improved within this child (a computational gain: a greater number of cut-offs should appear within the subtree of that child).
If a zero window imposed on a child of a MAX state failed high, then we have to repeat the search for that child (a computational loss) with a wider window v-β in order to obtain the exact result for the given subtree. Remark: still, the window for the repeated calculation is narrower than the original α-β window.
The last two remarks apply, with the roles reversed, to MIN states.
Procedure scoutMax(s, d, D, α, β)
1 If s is a terminal then return h(s) (position evaluation).
2 b := β.
3 For all states t being descendants of s:
  1 v := scoutMin(t, d + 1/2, D, α, b).
  2 If t is not the first child and D − d ≥ 2 · 1/2 and b ≤ v (failing high) then:
    1 v := scoutMin(t, d + 1/2, D, v, β). (repeat the search with a wider window)
  3 α := max{α, v}.
  4 If α ≥ β then return α. (cut-off)
  5 b := α + 1.
4 Return α.
Procedure scoutMin(s, d, D, α, β)
1 If s is a terminal then return h(s) (position evaluation).
2 a := α.
3 For all states t being descendants of s:
  1 v := scoutMax(t, d + 1/2, D, a, β).
  2 If t is not the first child and D − d ≥ 2 · 1/2 and v ≤ a (failing low) then:
    1 v := scoutMax(t, d + 1/2, D, α, v). (repeat the search with a wider window)
  3 β := min{β, v}.
  4 If α ≥ β then return β. (cut-off)
  5 a := β − 1.
4 Return β.
Example (diagram): the Scout algorithm on the tree from the earlier α-β example, with leaf values 5, 4, 6, ∗, 7, ∗, 4, 4; the zero windows cause two leaves (∗) to be cut off and the root MAX node again obtains the value 5.
Example (diagram): a similar tree in which the last leaf equals 8; the zero window at the node marked “?” fails, and the search for that subtree must be repeated with a wider window.
Negamax algorithm
Fact:
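The fact in question is the standard identity (stated here in the notation of the earlier pseudocode):
\[
\min_{t} h(t) \;=\; -\max_{t} \bigl(-h(t)\bigr),
\]
so a single procedure suffices if the sign of the evaluation is flipped (and the α-β window mirrored) at every level.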
The outermost call for the root is: negaMax(root, 0, D, −∞, ∞, 1).
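A Python sketch consistent with that call; the last argument is the colour (+1 for the maximizing side, −1 for the minimizing side) and it multiplies the evaluation; the helpers are hypothetical as before:

def nega_max(s, d, D, alpha, beta, color, successors, h):
    """Negamax with alpha-beta pruning: one procedure for both players,
    with the sign of the evaluation flipped at every level."""
    children = successors(s)
    if not children or d >= D:
        return color * h(s)
    for t in children:
        # The child's value from the opponent's point of view, negated, with the window mirrored.
        v = -nega_max(t, d + 0.5, D, -beta, -alpha, -color, successors, h)
        alpha = max(alpha, v)
        if alpha >= beta:           # cut-off
            return alpha
    return alpha

# Outermost call for the root:
# nega_max(root, 0, D, float('-inf'), float('inf'), 1, successors, h)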
Example (diagram): the same tree traversed by negamax; at each level the values returned from below are negated (−h(s) at the leaves) and the root obtains the value 5.
Negascout algorithm
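Negascout combines the negamax formulation with Scout's zero windows; a sketch of that combination (assumptions as in the earlier sketches — one common formulation, not necessarily the exact variant meant here):

def nega_scout(s, d, D, alpha, beta, color, successors, h):
    """Negascout: negamax in which every non-first child is first tested with a zero window."""
    children = successors(s)
    if not children or d >= D:
        return color * h(s)
    b = beta                        # full window only for the first child
    for i, t in enumerate(children):
        v = -nega_scout(t, d + 0.5, D, -b, -alpha, -color, successors, h)
        if i > 0 and alpha < v < beta and D - d >= 1:
            # The zero window failed high: repeat the search with the wider window (v, beta).
            v = -nega_scout(t, d + 0.5, D, -beta, -v, -color, successors, h)
        alpha = max(alpha, v)
        if alpha >= beta:           # cut-off
            return alpha
        b = alpha + 1               # zero window for the next child
    return alpha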
Checkers variants: international (a.k.a. Polish) — 100 squares, 20 pawns per player; Brazilian — 64 squares, 12 pawns per player; English — 64 squares, 12 pawns, kings moving by 1 square.
Implementation of α-β pruning including a transposition table and Quiescence.
Multiple programs play checkers according to different heuristics (position evaluation functions) and compete within a genetic evolution.
Individuals can be identified with heuristic position evaluation functions (various AIs). The simplest heuristic is materialistic and symmetrical:
h = w1 Pp + w2 Kp − w1 Po − w2 Ko, (13)
where P, K denote the numbers of pawns and kings, respectively, whereas the indexes p, o stand for the player and the opponent, respectively. The parameters under genetic optimization are w1, w2.
Parameters are coded as integers, initially picked at random from {−100, . . . , 100}. During the evolution, the parameters did not go outside the initial range.
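A sketch of such an evaluation function; the board object with piece lists is assumed here purely for illustration:

def materialistic_h(board, w1, w2):
    """Symmetrical materialistic evaluation, as in (13): weighted pawn and king
    counts of the player minus those of the opponent."""
    P_p = len(board.player_pawns)
    K_p = len(board.player_kings)
    P_o = len(board.opponent_pawns)
    K_o = len(board.opponent_kings)
    return w1 * P_p + w2 * K_p - w1 * P_o - w2 * K_o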
It is difficult to fairly assign a numeric fitness or rank to such individuals due to the possibility of three-way ties: A wins against B, B wins against C, C wins against A.
Tournament selection comes to mind. Difficulties: (1) frequent draws (who should then be selected to the next population?), (2) the possibility of losing the best individual, (3) pointing out the best individual in the final population.
Final approach: tournaments for population sizes being powers of 2, n = 2^m. Individuals are paired randomly into matches, n/2 of the population is filled up with the winners (in case of a draw, one child of the crossed parents is added), and the rest of the population is filled up iteratively with winners of matches between the winners added before.
The winner of the very last match (the winner among winners) is considered the best individual in the final population.
Linear cross-over — draw a random number α ∈ (0, 1); a child C of parents A and B is obtained as:
wi(C) = αwi(A) + (1 − α)wi(B). (14)
Uniform cross-over — for each wi we randomly decide from which parent it comes (and copy it unchanged). Additionally, mutation is suggested.
Mutation of constant radius — to each gene (weight) a random value from {−20, . . . , 20} is added. The probability of mutation is linearly decreased from 0.9 in the first iteration to 0.3 in the last (both operators are sketched below).
Different depths of the tree analysis were set up depending on the GA iteration (the later the iteration, the more accurate the analysis should be).
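A sketch of the two operators, with weight vectors kept as plain lists; reading the mutation probability as a per-gene probability is an assumption:

import random

def linear_crossover(parent_a, parent_b):
    """Linear cross-over (14): the child's weights are a random convex combination of the parents'."""
    alpha = random.random()
    return [alpha * wa + (1 - alpha) * wb for wa, wb in zip(parent_a, parent_b)]

def mutate(weights, probability, radius=20):
    """Constant-radius mutation: with the given probability, add to each gene (weight)
    a random integer from {-radius, ..., radius}."""
    return [w + random.randint(-radius, radius) if random.random() < probability else w
            for w in weights]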
h = w1 Pp + w2 Kp + w3 Po + w4 Ko, (15)
hs = w1 Ps + · · · + w7 KDs + w8 Ms + w9 · Ds/(Ps + Ks) + w10 KD2s, (17)
where: M ∈ {0, 1} — extra reward for the turn to move, D — number of doubled pieces (touching by corners), KD2 — number of kings on the opposite double diagonal.
Studied heuristics:
materialistic-row-wise — pawns have different values according to the rows they occupy:
hs = Σ_{i=1}^{N−1} wi Pi + wN K, (18)
where Pi denotes the number of pawns in the i-th row.
Comment: aggressive play with pawns — capturing an opponent's pawn increases the evaluation by 2. Careful play with kings in the endgame, because own kings are worth more than the opponent's.
materialistic-positional, symmetric:
Comment: surprisingly, pawns defending the promotion line are evaluated negatively, and kings on the main diagonal as immaterial.
Comment: surprisingly, most of the parameters were zeroed; of relevance seem to be: pawns just before promotion and the penalty for doubled pieces.
extended materialistic-row-wise, symmetrical:
P1 = 2, P2 = 1, P3 = 2, P4 = 2, P5 = 2, P6 = 2, P7 = 1, P8 = 3, P9 = 6, K = 12.
Comment: interestingly, the values in rows from 3 to 6 are equal; a pawn in the 8th row already starts to be worth more.
extended materialistic-structural, symmetrical:
Comment: one can note that immortal pieces have a positive impact and frozen ones a negative impact (previously both those elements were “hidden” in the doubling parameter D).
Bachelor thesis: Katarzyna Kubasik, Application of game tree searching algorithms for finding minimax points in “double dummy” bridge, WI, 2011.
Implementation of α-β pruning with the use of a transposition table.
Despite there being 4 players (N, E, S, W), the alternating players (N, S) and (E, W) constitute pairs which can be identified with two players: maximizing and minimizing.
A play by each player constitutes a new level in the search tree. The full tree has 4 · 13 = 52 levels.
When searching, MAX and MIN do not have to alternate — one has to check which side took the last trick (that side is on lead for the next trick).
Improving elements: checking current sequences (e.g. a configuration 6, 4, 2 becomes a sequence once the other players have used the cards 5 and 3); sorting moves according to the heuristic: “extreme cards first, middle ones later”.
Example deal:
N: ♠ A10x, ♥ A10x, ♦ xxxx, ♣ xxx
W: ♠ KQJ, ♥ KQJ, ♦ xxx, ♣ xxxx
E: ♠ xxxxx, ♥ xxxxx, ♦ xx, ♣ x
S: ♠ xx, ♥ xx, ♦ AKQJ, ♣ AKQJ10
Endgame position (E immaterial):
N: ♠ 10, ♥ A10
W: ♠ J, ♥ KQ
S: ♥ xx, ♣ 10
S now plays the 10♣ and W is squeezed. Without the initial duck at trick one the squeeze would not take place — the first play by N has a consequence 40 levels deeper in the tree!
Endgame position (E immaterial):
N: ♠ A10, ♥ 10
W: ♠ QJ, ♥ J
S: ♠ xx, ♣ 10
Games of perfect information with random elements
Expectiminimax algorithm
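A minimal sketch of the recursion this algorithm performs: MAX and MIN nodes as before, while at CHANCE nodes the expected value of the children's values is taken (node_type(), successors(), probability() and h() are hypothetical helpers; the half-move depth step at chance nodes is a simplification):

def expectiminimax(s, d, D, node_type, successors, probability, h):
    """MIN-MAX extended with CHANCE nodes, whose value is the expectation
    of the children's values over the random outcomes."""
    kind = node_type(s)                 # "max", "min" or "chance"
    children = successors(s)
    if not children or d >= D:
        return h(s)
    values = [expectiminimax(t, d + 0.5, D, node_type, successors, probability, h)
              for t in children]
    if kind == "max":
        return max(values)
    if kind == "min":
        return min(values)
    # chance node: weight each child's value by the probability of the corresponding outcome
    return sum(probability(s, t) * v for t, v in zip(children, values))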
Example: backgammon
Two dice are rolled — the number of distinct outcomes: (2+6−1 choose 2) = 21. Branching at the CHANCE level: b = 21.
For n = 15 pawns, in effect of a “typical” roll (a non-“double”) a player can either select one pawn to move (by the sum of the outcomes), or select two pawns (by the individual outcomes). Number of possibilities: n(n − 1) = 210.
In the case of “doubles” (the same outcome on both dice) a player has 4 single moves (of the same value) at his disposal. Number of possibilities: (4+n−1 choose 4) = 3060. Doubles occur with probability 1/6.
Field blockages significantly reduce the number of possibilities. The average branching is estimated at approx. 400.
As the depth increases, the probability of reaching a particular state decreases exponentially fast — therefore, long-term forecasts are of little value.
The TD-Gammon program searches only 4 half-moves deep but has a very complex position evaluation heuristic (sufficient for play at master level).
Games of imperfect information
Imperfect information
Possible hand 1: (player A) K♠, Q♠, A♦, A♣ : A♠, J♠, A♥, K♦ (player B).
Possible hand 2: (A) K♠, Q♠, A♦, A♣ : A♠, J♠, A♥, K♣ (B).
In fact, we have: (player A) K♠, Q♠, A♦, A♣ : A♠, J♠, A♥, K∗ (player B).
Score: 1/2 · (−1) + 1/2 · 0 = −1/2.
Deal (N and S hands unknown):
W: ♠ KQJxx, ♥ AKQ, ♦ AJ10, ♣ AQ
E: ♠ Axxxx, ♥ xxx, ♦ Kxx, ♣ xx
Contract: 6♠ played by W–E. The first lead from N: x♠. Key missing cards: Q♦, K♣. Does there exist a play guaranteeing 12 tricks?
Endgame position (N and S hands unknown):
W: ♠ xx, ♦ AJ10, ♣ AQ
E: ♠ xx, ♦ Kxx, ♣ xx
Optimal play: we draw trumps — spades (playing three rounds of spades if need be), then we play three rounds of hearts. An endgame as above shall take place. Now we play the A♣ and then the Q♣, giving up the Q♣ voluntarily! Regardless of the continuation by N or S, 12 tricks are guaranteed.
Another play, based on attempts to catch the K♣ at S (odds ≈ 50%) and the Q♦ at S or N (odds ≈ 50%), leads to the expected number of tricks ≈ 1/4 · 11 + 2/4 · 12 + 1/4 · 13 = 12, but with variance ≈ 1/4 · (12 − 11)² + 2/4 · (12 − 12)² + 1/4 · (12 − 13)² = 1/2.
References
Some references
1 J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, 1944 (see: http://press.princeton.edu/titles/7802.html).
2 Chinook project website: http://webdocs.cs.ualberta.ca/~chinook.
3 D.E. Knuth, R.W. Moore, An Analysis of Alpha-Beta Pruning, Artificial Intelligence, 1975 (see: http://www.eecis.udel.edu/~ypeng/articles/An Analysis of Alpha-Beta Pruning.pdf).
4 A. Reinefeld, An Improvement to the Scout Tree Search Algorithm, ICCA Journal, 1983 (see: http://www.top-5000.nl/ps/An improvement to the scout tree search algorithm.pdf).
5 D. Laramée, Chess Programming I–V, 2000 (see: http://www.gamedev.net/page/resources/_/reference/programming/artificial-intelligence/gaming/chess-programming-part-i-getting-started-r1014).
6 M. Bożykowski, Implementation of a self-teaching program for checkers, master thesis, WI ZUT, 2009.
7 K. Kubasik, Application of game tree searching algorithms for finding minimax points in “double dummy” bridge, bachelor thesis, WI ZUT, 2011.
8 P. Beling, Practical aspects of logical games programming, master thesis, Łódź University of Technology, 2006.
9 Expectiminimax tree, Wikipedia (see: http://en.wikipedia.org/wiki/Expectiminimax_tree).
10 Materials on GIB — a bridge playing program (see: http://www.greatbridgelinks.com/gblSOFT/Reviews/SoftwareReview090301.html and http://www.gibware.com/).