Chap04 GamePlaying Complete
Game Playing
[Figure: game-playing loop. Test "Game Over?"; if yes, the game ends; if no, Generate Successors, then Evaluate Successors, then test "Game Over?" again.]
Games as Adversarial Search
• States:
– board configurations
• Initial state:
– the board position and which player will move
• Successor function:
– returns list of (move, state) pairs, each indicating a legal
move and the resulting state
• Terminal test:
– determines when the game is over
• Utility function:
– gives a numeric value in terminal states (e.g., -1, 0, +1 for loss, tie, win)
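To make the five components concrete, here is a minimal Python sketch for a hypothetical toy game that is not in the slides (a Nim variant: players alternately take 1 to 3 sticks, and whoever takes the last stick wins). The names initial_state, successors, terminal and utility simply mirror the bullets above; they are not from any library.

```python
from typing import List, Tuple

# Hypothetical toy game (not from the slides): players alternately take 1-3 sticks;
# whoever takes the last stick wins. MAX is the computer, MIN the opponent.
State = Tuple[int, str]  # (sticks left, player to move)


def initial_state(sticks: int = 5) -> State:
    """Board position plus which player will move."""
    return (sticks, "MAX")


def successors(state: State) -> List[Tuple[int, State]]:
    """List of (move, state) pairs: each legal move and the state it leads to."""
    sticks, player = state
    other = "MIN" if player == "MAX" else "MAX"
    return [(take, (sticks - take, other)) for take in (1, 2, 3) if take <= sticks]


def terminal(state: State) -> bool:
    """The game is over when no sticks are left."""
    return state[0] == 0


def utility(state: State) -> int:
    """+1 if MAX (the computer) won, -1 if it lost; this game has no ties."""
    _, player = state
    # The player to move in a terminal state did not take the last stick, so it lost.
    return -1 if player == "MAX" else 1


print(successors(initial_state()))  # [(1, (4, 'MIN')), (2, (3, 'MIN')), (3, (2, 'MIN'))]
```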
Game Tree (2-player, Deterministic Turns)
[Figure: example game tree with 16 leaves. Levels alternate between the computer's turn (MAX) and the opponent's turn (MIN); minimax backs values up from the leaves one level at a time, giving the values below.]
  MAX (computer's turn):  20
  MIN (opponent's turn):  20 15
  MAX (computer's turn):  30 20 15 60
  MIN (opponent's turn):  30 25 20 05 10 15 45 60
  Leaves:                 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75
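The backed-up values above can be reproduced with a short recursive sketch in Python. It assumes the tree is encoded as nested lists (leaf 05 is written 5 because Python rejects leading zeros) and records each level's values from left to right.

```python
from typing import Dict, List, Optional, Union

Tree = Union[int, List["Tree"]]

# The example tree as nested lists (hypothetical encoding; leaf 05 is written 5).
GAME_TREE: Tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
                   [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]


def minimax(node: Tree, maximizing: bool, depth: int = 0,
            levels: Optional[Dict[int, List[int]]] = None) -> int:
    """Back up leaf values: MAX (computer's turn) takes the max, MIN the min."""
    if isinstance(node, int):  # terminal state: its utility is the leaf value
        return node
    values = [minimax(child, not maximizing, depth + 1, levels) for child in node]
    value = max(values) if maximizing else min(values)
    if levels is not None:  # depth-first order keeps each level left to right
        levels.setdefault(depth, []).append(value)
    return value


levels: Dict[int, List[int]] = {}
print(minimax(GAME_TREE, maximizing=True, levels=levels))  # 20
for depth in sorted(levels):
    print(depth, levels[depth])
# 0 [20]
# 1 [20, 15]
# 2 [30, 20, 15, 60]
# 3 [30, 25, 20, 5, 10, 15, 45, 60]
```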
Minimax Strategy
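• Why do we take the min value every other level of the tree?
– On those levels it is the opponent's turn, and minimax assumes the opponent picks the move that is worst for the computer.
• The Universe:
– number of atoms ≈ $10^{78}$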
[Figure: the example game tree searched left to right, asking at each step whether a node needs to be checked at all.]
• The first MAX node at depth 2 already has 30 from its first MIN child, and its second MIN child has seen leaf 25. Do we need to check this node (leaf 35)? No: that MIN node is worth at most 25, so this branch is guaranteed to be worse than what max already has.
• The next MAX node already has 20 from its first MIN child, and its second MIN child has seen leaf 05. Do we need to check this node (leaf 65)? No: that MIN node is worth at most 05, again worse than what max already has.
Alpha-Beta
• The alpha-beta procedure can speed up a depth-first minimax search.
• Alpha: a lower bound on the value that a max node may ultimately be assigned (v ≥ α)
• Beta: an upper bound on the value that a minimizing node may ultimately be assigned (v ≤ β)
Alpha-Beta
MinVal(state, alpha, beta){
   if (terminal(state))
      return utility(state);
   best = +infinity;
   for (s in children(state)){
      child = MaxVal(s, alpha, beta);
      best = min(best, child);
      beta = min(beta, child);
      if (alpha >= beta)
         return child;   // prune: the remaining children cannot matter
   }
   return best;          // best child for MIN (the minimum)
}
alpha = the highest value for MAX along the path
beta = the lowest value for MIN along the path
Alpha-Beta
MaxVal(state, alpha, beta){
   if (terminal(state))
      return utility(state);
   best = -infinity;
   for (s in children(state)){
      child = MinVal(s, alpha, beta);
      best = max(best, child);
      alpha = max(alpha, child);
      if (alpha >= beta)
         return child;   // prune: the remaining children cannot matter
   }
   return best;          // best child for MAX (the maximum)
}
alpha = the highest value for MAX along the path
beta = the lowest value for MIN along the path
Alpha-Beta Example
• α: the best value for max along the path; β: the best value for min along the path. The search starts at the root with α = -∞, β = ∞ and runs depth-first, left to right over the leaves 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75.
• Leaves 80, 30: the first MIN node sets β = 80, then β = 30, and returns 30; its MAX parent sets α = 30.
• The second MIN node under that MAX node is searched with α = 30, β = ∞. Leaf 25 gives β = 25, so β ≤ α: prune! Leaf 35 is never examined, and the MAX node's value is 30.
• The MIN node above sets β = 30 and searches its second MAX child with α = -∞, β = 30. Leaves 55, 20 give β = 20, so that MAX child sets α = 20.
• Its second MIN child is searched with α = 20, β = 30. Leaf 05 gives β = 05, so β ≤ α: prune! Leaf 65 is never examined. The MAX child returns 20, and the MIN node above sets β = 20 and returns 20.
• The root sets α = 20 and searches its second MIN child with α = 20, β = ∞. Below it, leaves 40, 10 back up 10 and leaves 70, 15 back up 15, so its first MAX child returns 15.
• That MIN node sets β = 15, so β ≤ α: prune! Its second MAX child (leaves 50, 45, 60, 75) is never examined.
• The root's value is 20, the same as full minimax, but only 10 of the 16 leaves were evaluated (pruned: 35, 65, 50, 45, 60, 75).
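The trace above can be checked with a Python translation of the MaxVal/MinVal pseudocode. This is a sketch under assumptions: the tree is encoded as nested lists, leaves are the terminal states, and the helpers `leaves` and `record_prune` are hypothetical additions that log which leaves are evaluated and which are skipped. On a cutoff it returns `best`, which here equals the slides' `return child`, because the child that triggers the cutoff is always the best one so far.

```python
import math
from typing import Dict, List, Union

Tree = Union[int, List["Tree"]]

# The example tree (hypothetical nested-list encoding; leaf 05 is written 5).
GAME_TREE: Tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
                   [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]


def leaves(node: Tree) -> List[int]:
    """All leaf values below a node, left to right."""
    return [node] if isinstance(node, int) else [x for c in node for x in leaves(c)]


def record_prune(node: List[Tree], i: int, log: Dict[str, list]) -> None:
    """Log the leaves under the children after child i (they are never searched)."""
    skipped = [x for c in node[i + 1:] for x in leaves(c)]
    if skipped:
        log["pruned"].append(skipped)


def max_val(node: Tree, alpha: float, beta: float, log: Dict[str, list]) -> float:
    if isinstance(node, int):  # terminal(state): return utility(state)
        log["visited"].append(node)
        return node
    best = -math.inf
    for i, child in enumerate(node):
        v = min_val(child, alpha, beta, log)
        best = max(best, v)
        alpha = max(alpha, v)
        if alpha >= beta:  # beta <= alpha: prune the remaining children
            record_prune(node, i, log)
            break
    return best


def min_val(node: Tree, alpha: float, beta: float, log: Dict[str, list]) -> float:
    if isinstance(node, int):
        log["visited"].append(node)
        return node
    best = math.inf
    for i, child in enumerate(node):
        v = max_val(child, alpha, beta, log)
        best = min(best, v)
        beta = min(beta, v)
        if alpha >= beta:
            record_prune(node, i, log)
            break
    return best


log: Dict[str, list] = {"visited": [], "pruned": []}
print(max_val(GAME_TREE, -math.inf, math.inf, log))  # 20
print(log["visited"])  # [80, 30, 25, 55, 20, 5, 40, 10, 70, 15]
print(log["pruned"])   # [[35], [65], [50, 45, 60, 75]]
```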
Bad and Good Cases for Alpha-Beta Pruning
• Bad: Worst moves encountered first
  [Figure: MAX / MIN / MAX game tree with root value 4, searched with the worst moves first.]
• Good: Good moves ordered first
  [Figure: MAX / MIN / MAX game tree with root value 4, searched with the best moves first; pruned branches are marked x.]
• If we can order moves, we can get more benefit from alpha-beta pruning
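To see how much ordering matters, the sketch below (an illustration, not from the slides) runs alpha-beta on the 16-leaf example tree three times: as given, with every node's children sorted worst-first, and sorted best-first. The sorting uses each child's true minimax value, a hypothetical oracle that a real program would replace with a cheap heuristic guess; `reorder` and the leaf counter are likewise assumptions of this sketch.

```python
import math
from typing import Dict, List, Tuple, Union

Tree = Union[int, List["Tree"]]

GAME_TREE: Tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
                   [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]


def minimax(node: Tree, maximizing: bool) -> int:
    """Plain minimax value, used only to decide the move ordering."""
    if isinstance(node, int):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool,
              counter: List[int]) -> float:
    """Alpha-beta search; counter[0] counts the leaves evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        v = alphabeta(child, alpha, beta, not maximizing, counter)
        if maximizing:
            best, alpha = max(best, v), max(alpha, v)
        else:
            best, beta = min(best, v), min(beta, v)
        if alpha >= beta:  # prune the remaining children
            break
    return best


def reorder(node: Tree, maximizing: bool, best_first: bool) -> Tree:
    """Hypothetical oracle ordering: sort children by their true minimax value."""
    if isinstance(node, int):
        return node
    kids = [reorder(c, not maximizing, best_first) for c in node]
    # MAX prefers high values and MIN low ones, so "best first" is descending at MAX.
    descending = maximizing == best_first
    return sorted(kids, key=lambda c: minimax(c, not maximizing), reverse=descending)


results: Dict[str, Tuple[float, int]] = {}
for label, tree in [("worst first", reorder(GAME_TREE, True, False)),
                    ("as given", GAME_TREE),
                    ("best first", reorder(GAME_TREE, True, True))]:
    counter = [0]
    results[label] = (alphabeta(tree, -math.inf, math.inf, True, counter), counter[0])
    print(label, results[label])
# worst first (20, 16)
# as given (20, 10)
# best first (20, 7)
```

The root value is 20 in every case, but the number of leaves evaluated drops from all 16 with the worst ordering to 7 with the best.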
Properties of α-β
• Pruning does not affect the final result: alpha-beta returns exactly the same value as full minimax.
Normally (full minimax, branching factor b, depth m):
  $T(m) = b \cdot T(m-1) + c$, so $T(m) = O(b^m)$
4-ply lookahead is a hopeless chess player!
  – 4-ply ≈ human novice
  – 8-ply ≈ typical PC, human master
  – 12-ply ≈ Deep Blue
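Unrolling the recurrence shows where $O(b^m)$ comes from, assuming $T(0)$ and $c$ are constants and $b \ge 2$. The last line adds, for comparison, the standard best-case leaf count for alpha-beta on a uniform tree with perfect move ordering; it is quoted here rather than derived.

```latex
T(m) = b\,T(m-1) + c
     = b^2\,T(m-2) + bc + c
     = \cdots
     = b^m\,T(0) + c\,(b^{m-1} + \cdots + b + 1)
     = b^m\,T(0) + c\,\frac{b^m - 1}{b - 1} = O(b^m)

\text{alpha-beta, perfect ordering: } b^{\lceil m/2 \rceil} + b^{\lfloor m/2 \rfloor} - 1 = O(b^{m/2}) \text{ leaves}
```

For b = 2 and m = 4 this is 4 + 4 - 1 = 7 leaves instead of 2^4 = 16.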
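Cutoff
[Figure: the example game tree with the search stopped at a cutoff depth two levels below the root; the leaves 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75 are not reached, and the nodes at and above the cutoff are shown with value 0.]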
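A depth-limited variant of minimax stops at the cutoff and asks an evaluation function for an estimate. In this sketch the evaluation function `average_of_leaves` is purely hypothetical (it peeks at the leaves, which a real program cannot do); it only serves to show the mechanics on the example tree.

```python
from typing import Callable, List, Union

Tree = Union[int, List["Tree"]]

GAME_TREE: Tree = [[[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
                   [[[40, 10], [70, 15]], [[50, 45], [60, 75]]]]


def leaves(node: Tree) -> List[int]:
    return [node] if isinstance(node, int) else [x for c in node for x in leaves(c)]


def average_of_leaves(node: Tree) -> float:
    """Hypothetical evaluation function, for illustration only: a real program
    cannot look at the leaves and would score features of the position instead."""
    values = leaves(node)
    return sum(values) / len(values)


def depth_limited_minimax(node: Tree, depth: int, maximizing: bool,
                          evaluate: Callable[[Tree], float]) -> float:
    if isinstance(node, int):  # a real terminal state: use its utility
        return node
    if depth == 0:             # cutoff reached: estimate instead of searching further
        return evaluate(node)
    values = [depth_limited_minimax(c, depth - 1, not maximizing, evaluate) for c in node]
    return max(values) if maximizing else min(values)


print(depth_limited_minimax(GAME_TREE, 2, True, average_of_leaves))  # 36.25
print(depth_limited_minimax(GAME_TREE, 4, True, average_of_leaves))  # 20
```

With the cutoff at depth 2 the root's estimate is 36.25 rather than the true minimax value 20, although both searches still prefer the root's left move.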
Evaluation Functions
Tic Tac Toe
• Let p be a position in the game
• Define the utility function f(p) by
– f(p) =
• largest positive number if p is a win for computer
• smallest negative number if p is a win for opponent
• RCDC – RCDO otherwise
– where RCDC is number of rows, columns and diagonals in
which computer could still win
– and RCDO is number of rows, columns and diagonals in
which opponent could still win.
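The definition of f(p) translates almost line for line into Python. This sketch assumes a 9-character board string read row by row and uses `math.inf` / `-math.inf` as stand-ins for the largest positive and smallest negative numbers; `evaluate_position` is a hypothetical name.

```python
import math

# The 8 rows, columns and diagonals of a 3x3 board indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def evaluate_position(board: str) -> float:
    """f(p) for a 9-character board: 'X' = computer, 'O' = opponent, '.' = empty."""
    for line in LINES:
        marks = "".join(board[i] for i in line)
        if marks == "XXX":
            return math.inf   # stand-in for "largest positive number"
        if marks == "OOO":
            return -math.inf  # stand-in for "smallest negative number"
    rcdc = sum(all(board[i] != "O" for i in line) for line in LINES)  # still open to X
    rcdo = sum(all(board[i] != "X" for i in line) for line in LINES)  # still open to O
    return rcdc - rcdo


print(evaluate_position("....X...."))  # 4
print(evaluate_position("O...X...."))  # 1
```

For example, X alone in the center scores 8 - 4 = 4: all eight lines are still open to X, but only the four lines avoiding the center are open to O.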
Sample Evaluations
• X = Computer; O = Opponent
[Figure: two sample tic-tac-toe positions, each scored by counting the rows, columns and diagonals still open to each player.]
Evaluation functions
• For chess/checkers, typically a linear weighted sum of features
  $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_m f_m(s)$
  e.g., $w_1 = 9$ with $f_1(s)$ = (number of white queens) – (number of black queens), etc.
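A material-only version of this weighted sum can be sketched in Python. Only the queen weight 9 comes from the slide; the other weights (rook 5, bishop 3, knight 3, pawn 1) are the conventional textbook values and are assumed here. The input is the piece-placement field of a FEN string (uppercase letters are white pieces), and `material_eval` is a hypothetical name.

```python
# Conventional piece values; only the queen weight (w1 = 9) comes from the slide.
WEIGHTS = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}


def material_eval(placement: str) -> int:
    """Eval(s) = sum of w_i * f_i(s), where
    f_i(s) = (# white pieces of type i) - (# black pieces of type i).
    `placement` is the piece-placement field of a FEN string: uppercase = white."""
    return sum(w * (placement.count(piece.upper()) - placement.count(piece))
               for piece, w in WEIGHTS.items())


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_eval(START))                    # 0: material is level
print(material_eval(START.replace("q", "1")))  # 9: black has lost its queen
```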
Example: Samuel's Checker-Playing Program
• It uses a linear evaluation function
  $f(n) = w_1 f_1(n) + w_2 f_2(n) + \dots + w_m f_m(n)$
For example: $f = 6K + 4M + U$
  – K = King Advantage
  – M = Man Advantage
  – U = Undenied Mobility Advantage (number of moves that Max can make where Min has no jump moves)
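A worked instance of Samuel's function, with hypothetical feature values chosen only for illustration (Max is one king up, two men down, and has an undenied mobility advantage of 3):

```latex
f = 6K + 4M + U = 6(1) + 4(-2) + 3 = 6 - 8 + 3 = 1
```

Under these weights, being two men down almost cancels out being one king up.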
Samuel’s Checker Player
• In learning mode
Circuitry (1987)
Chess game tree
Image from Kasparov versus Deep Blue: Computer Chess Comes of Age By Monty Newborn
Problem with fixed depth Searches
If we only search n moves ahead, it may be possible that the catastrophe can be delayed by a sequence of moves that do not make any progress. This also works in the other direction (good moves may not be found).
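The sketch below shows the basic problem behind this on a hypothetical two-move tree (not from the slides): a catastrophe that lies just past the search depth is invisible, so a fixed-depth search can walk into it. It illustrates only the "catastrophe beyond the horizon" part; it does not model delaying moves. The node encoding and static evaluations are assumptions.

```python
from typing import List, Tuple

# Hypothetical tree: each node is (static evaluation, children); leaves have no children.
Node = Tuple[int, list]

TRAP: Node = (5, [(-100, []), (-90, [])])  # looks good now, but every reply is a disaster
SAFE: Node = (0, [(0, []), (1, [])])       # quiet move with no hidden danger
ROOT: Node = (0, [TRAP, SAFE])             # MAX to move


def search(node: Node, depth: int, maximizing: bool) -> int:
    value, children = node
    if depth == 0 or not children:  # horizon reached: trust the static evaluation
        return value
    scores = [search(c, depth - 1, not maximizing) for c in children]
    return max(scores) if maximizing else min(scores)


def best_move(root: Node, depth: int) -> Tuple[int, List[int]]:
    """Index of MAX's chosen move and the score of each move, searching `depth` plies."""
    scores = [search(c, depth - 1, False) for c in root[1]]
    return scores.index(max(scores)), scores


print(best_move(ROOT, 1))  # (0, [5, 0]): the 1-ply search walks into the trap
print(best_move(ROOT, 2))  # (1, [-100, 0]): one more ply reveals the catastrophe
```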
Problems with a fixed ply: The Horizon Effect
Perfect-information games:
– deterministic: chess, checkers, go, othello
– chance: backgammon, monopoly
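[Figure: game tree with chance nodes. A chance node c has possible outcomes d1 … di … dk, each leading to a successor S(c,di); an example tree mixes max, min and chance levels (outcome probabilities .4 and .6) above the leaves 3 5 1 4 1 2 4 5.]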
Complexity
• Instead of $O(b^m)$, it is $O(b^m n^m)$, where n is the number of chance outcomes.
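With chance nodes, minimax is commonly extended to expectiminimax: a chance node's value is the probability-weighted average of its successors, $\sum_i P(d_i)\,\mathrm{value}(S(c,d_i))$. The sketch below assumes a tuple encoding of nodes and arranges the figure's leaf values 3 5 1 4 1 2 4 5 in an assumed MAX / chance / MIN layout with probabilities .4 and .6; it is not a transcription of the slide's figure. Each chance node multiplies the work by its n outcomes, which is where the extra $n^m$ factor comes from.

```python
from typing import Union

# A node is a number (leaf), ("max", children), ("min", children),
# or ("chance", [(probability, child), ...]).
Node = Union[int, float, tuple]


def expectiminimax(node: Node) -> float:
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":  # expected value: sum of P(d_i) * value(S(c, d_i))
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical arrangement of the leaves 3 5 1 4 1 2 4 5.
TREE = ("max", [
    ("chance", [(0.4, ("min", [3, 5])), (0.6, ("min", [1, 4]))]),
    ("chance", [(0.4, ("min", [1, 2])), (0.6, ("min", [4, 5]))]),
])
print(round(expectiminimax(TREE), 2))  # 2.8
```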