05-games
What Kinds of Games?
Mainly games of strategy with the following
characteristics:
Two-Player Game
[Flowchart: Opponent’s Move → Generate New Position → Game Over? (yes: stop; no: continue) → Generate Successors → Evaluate Successors → Move to Highest-Valued Successor → Game Over? (yes: stop; no: back to Opponent’s Move)]
Games as Adversarial Search
• States:
– board configurations
• Initial state:
– the board position and which player will move
• Successor function:
– returns list of (move, state) pairs, each indicating a legal
move and the resulting state
• Terminal test:
– determines when the game is over
• Utility function:
– gives a numeric value in terminal states
(e.g., -1, 0, +1 for loss, tie, win)
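The five components above map onto a small programming interface. The sketch below uses a hypothetical toy game that is not from these slides (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins); the names `initial_state`, `successors`, `is_terminal` and `utility` are illustrative choices, not a standard API.

```python
from typing import List, Tuple

# Hypothetical toy game, chosen only to illustrate the formulation:
# a state is (stones left, player to move), with players "MAX" and "MIN".
State = Tuple[int, str]

def initial_state(stones: int = 4) -> State:
    """Initial state: the board position and which player will move."""
    return (stones, "MAX")

def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: list of (move, resulting state) pairs."""
    stones, player = state
    other = "MIN" if player == "MAX" else "MAX"
    return [(take, (stones - take, other)) for take in (1, 2) if take <= stones]

def is_terminal(state: State) -> bool:
    """Terminal test: the game is over when no stones remain."""
    return state[0] == 0

def utility(state: State) -> int:
    """Utility from MAX's point of view in a terminal state (-1 loss, +1 win).

    The player to move at an empty pile did not take the last stone, so lost.
    """
    stones, player = state
    if stones != 0:
        raise ValueError("utility is only defined for terminal states")
    return -1 if player == "MAX" else +1

print(successors(initial_state()))
```

This toy game has no ties, so the utility only takes the values -1 and +1.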
Game Tree (2-player, Deterministic,
Turns)
[Figure: game tree with levels alternating between the computer’s turn (max) and the opponent’s turn (min). Leaf values, left to right: 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75. Successive slides back the values up one level at a time: min over each pair of leaves gives 30 25 20 05 10 15 45 60; max gives 30 20 15 60; min gives 20 15; the root (max) value is 20.]
Minimax Strategy
• Why do we take the min value every other
level of the tree?
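One way to see the answer: every other level belongs to the opponent, who is assumed to pick the move that is worst for us. A minimal sketch of that alternation follows, assuming the example tree is encoded as nested lists with the leaves in left-to-right order (the encoding and the name `TREE` are illustrative assumptions).

```python
def minimax(node, maximizing=True):
    """Backed-up value of a node; leaves are ints, inner nodes are lists."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing) for child in node]
    # Our own levels take the max; the opponent's levels take the min.
    return max(values) if maximizing else min(values)

# Example tree from the slides: root is a max node, then min, max, min, leaves.
TREE = [
    [[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
    [[[40, 10], [70, 15]], [[50, 45], [60, 75]]],
]

print(minimax(TREE))
```

On this encoding the root value comes out as 20, matching the backed-up values on the slides.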
Properties of Minimax
• Complete?
– Yes (if tree is finite)
• Optimal?
– Yes (against an optimal opponent)
– No (does not exploit opponent weakness against suboptimal opponent)
• Time complexity?
– O(b^m)
• Space complexity?
– O(b·m) (depth-first exploration)
Good Enough?
• Chess:
– branching factor b≈35
• The Universe:
– number of atoms ≈ 10^78
[Figure: the example game tree partially evaluated, asking of an unexplored leaf "Do we need to check this node?" Answer: No - this branch is guaranteed to be worse than what max already has. The same question is then asked of later unexplored nodes (marked ??).]
Alpha-Beta
• The alpha-beta procedure can speed up a
depth-first minimax search.
• Alpha: a lower bound on the value that a max
node may ultimately be assigned
[Figure: step-by-step alpha-beta walkthrough on the example game tree.
α - the best value for max along the path
β - the best value for min along the path
The root starts with α=-∞, β=∞; the bounds are passed down to children and tightened as values are backed up. Whenever β≤α at a node, its remaining children are pruned. In the walkthrough the leaves 35 and 65 and the rightmost subtree (leaves 50 45 60 75) are pruned (marked X), and the root value is 20.]
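The walkthrough can be replayed in code. This is a minimal sketch, assuming the example tree is encoded as nested lists with leaves in left-to-right order; the `visited` list is an illustrative addition that records which leaves are actually evaluated.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Depth-first minimax with alpha-beta pruning; leaves are ints."""
    if visited is None:
        visited = []
    if isinstance(node, int):
        visited.append(node)          # record every leaf that gets evaluated
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)   # best value for max along the path
        else:
            value = min(value, child_value)
            beta = min(beta, value)     # best value for min along the path
        if beta <= alpha:
            break                       # prune the remaining children
    return value

TREE = [
    [[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
    [[[40, 10], [70, 15]], [[50, 45], [60, 75]]],
]

seen = []
print(alphabeta(TREE, visited=seen), seen)
```

With this encoding the search returns 20 and evaluates 10 of the 16 leaves; 35, 65, 50, 45, 60 and 75 are never visited.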
Bad and Good Cases for Alpha-Beta Pruning
• Bad: Worst moves encountered first
[Figure: game tree with root MAX value 4 in which the worst moves are searched first.]
• Good: Good moves ordered first
[Figure: game tree with root MAX value 4 in which the best moves are searched first; subtrees marked x are pruned.]
• If we can order moves, we can get more benefit from alpha-beta pruning
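The effect of ordering can be measured by counting evaluated leaves. The sketch below reorders the example tree using each child's true minimax value, which is an oracle used only for demonstration (a real program would order moves with a heuristic); `reorder` and `count_leaves` are hypothetical helper names.

```python
import math

def minimax(node, maximizing):
    if isinstance(node, int):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

def reorder(node, maximizing, best_first):
    """Sort children by true minimax value (demonstration-only oracle)."""
    if isinstance(node, int):
        return node
    children = [reorder(c, not maximizing, best_first) for c in node]
    # Best-first means high values first at max nodes, low values first at min nodes.
    descending = (maximizing == best_first)
    return sorted(children, key=lambda c: minimax(c, not maximizing), reverse=descending)

def alphabeta(node, alpha, beta, maximizing, counter):
    if isinstance(node, int):
        counter[0] += 1
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        v = alphabeta(child, alpha, beta, not maximizing, counter)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if beta <= alpha:
            break
    return value

def count_leaves(tree):
    """Return (root value, number of leaves evaluated by alpha-beta)."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

TREE = [
    [[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
    [[[40, 10], [70, 15]], [[50, 45], [60, 75]]],
]

print("best first: ", count_leaves(reorder(TREE, True, best_first=True)))
print("worst first:", count_leaves(reorder(TREE, True, best_first=False)))
```

On this 16-leaf tree, best-first ordering evaluates 7 leaves and worst-first ordering evaluates all 16, while the root value stays 20 in both cases.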
Properties of α-β
• Pruning does not affect final result. This means that it gets the
exact same result as does full minimax.
Why O(b^(m/2))?
Let T(m) be time complexity of search for depth m
Normally:
T(m) = b·T(m-1) + c ⇒ T(m) = O(b^m)
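Unrolling the recurrence makes the O(b^m) bound explicit. This is a worked sketch assuming b > 1 and a constant base cost T(0):

```latex
\begin{aligned}
T(m) &= b\,T(m-1) + c \\
     &= b^{2}\,T(m-2) + c\,(1 + b) \\
     &\;\;\vdots \\
     &= b^{m}\,T(0) + c\,\frac{b^{m}-1}{b-1} \;=\; O(b^{m})
\end{aligned}
```

For comparison, the standard best-case result for alpha-beta with perfect move ordering (due to Knuth and Moore, not derived on this slide) is that only b^⌈m/2⌉ + b^⌊m/2⌋ − 1 leaves are examined, which is where O(b^(m/2)) comes from.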
Node Ordering
Iterative deepening search
Good Enough?
• Chess:
– branching factor b≈35
– game length m≈100
– search space b^(m/2) ≈ 35^50 ≈ 10^77
• The Universe:
– number of atoms ≈ 10^78
– age ≈ 10^18 seconds
– 10^8 moves/sec x 10^78 x 10^18 = 10^104
• The universe can play chess - can we?
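The orders of magnitude here are easy to check with logarithms. A small sketch, using b = 35 and m = 100 as given, with 10^8 moves/sec, 10^78 atoms and 10^18 seconds as the budget (variable names are illustrative):

```python
import math

b, m = 35, 100                              # branching factor and game length
search_exponent = (m / 2) * math.log10(b)   # log10 of b^(m/2) = 35^50
budget_exponent = 8 + 78 + 18               # log10 of 10^8 x 10^78 x 10^18

print(f"35^50 is about 10^{search_exponent:.1f}")
print(f"universe-scale budget is 10^{budget_exponent} positions")
```

The first exponent lands between 77 and 78, consistent with the 10^77 estimate, and the budget exponent is 104.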
Cutting off Search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
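The two substitutions can be sketched on the example tree. Here the cutoff test is a simple depth limit, and the evaluation of a cut-off node is the mean of the leaves beneath it; that heuristic is a made-up stand-in for a real Eval, and the nested-list encoding is an assumption.

```python
def leaves(node):
    """All leaf values beneath a node (leaves are ints)."""
    if isinstance(node, int):
        return [node]
    return [v for child in node for v in leaves(child)]

def evaluate(node):
    """Hypothetical Eval: mean of the leaves below the node."""
    vals = leaves(node)
    return sum(vals) / len(vals)

def minimax_cutoff(node, depth, limit, maximizing=True):
    # Cutoff? replaces Terminal?: stop at a leaf or at the depth limit.
    if isinstance(node, int) or depth >= limit:
        return evaluate(node)           # Eval replaces Utility
    values = [minimax_cutoff(c, depth + 1, limit, not maximizing) for c in node]
    return max(values) if maximizing else min(values)

TREE = [
    [[[80, 30], [25, 35]], [[55, 20], [5, 65]]],
    [[[40, 10], [70, 15]], [[50, 45], [60, 75]]],
]

print(minimax_cutoff(TREE, 0, limit=4))   # full depth
print(minimax_cutoff(TREE, 0, limit=2))   # cut off two plies early
```

At full depth this reproduces the exact minimax value of 20; with a 2-ply limit the estimate is 36.25, which shows how a shallow search with a crude Eval can misjudge a position.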
4 ply lookahead is a
hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue
[Figure: the example game tree (leaf values 80 30 25 35 55 20 05 65 40 10 70 15 50 45 60 75) with a cutoff depth marked.]
Evaluation Functions
Tic Tac Toe
• Let p be a position in the game
• Define the utility function f(p) by
– f(p) =
• largest positive number if p is a win for computer
• smallest negative number if p is a win for opponent
• RCDC – RCDO otherwise
– where RCDC is number of rows, columns and diagonals in
which computer could still win
– and RCDO is number of rows, columns and diagonals in
which opponent could still win.
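The definition of f(p) can be sketched directly. The board encoding (three strings of "X", "O" and spaces) and the use of `math.inf` for "largest positive number" are assumptions made for this example.

```python
import math

# The 8 winning lines: 3 rows, 3 columns, 2 diagonals, as (row, col) cells.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)

def f(board, computer="X", opponent="O"):
    """Evaluate a position from the computer's point of view."""
    marks = [[board[r][c] for r, c in line] for line in LINES]
    if any(all(m == computer for m in line) for line in marks):
        return math.inf       # win for computer
    if any(all(m == opponent for m in line) for line in marks):
        return -math.inf      # win for opponent
    rcdc = sum(1 for line in marks if opponent not in line)  # lines computer could still win
    rcdo = sum(1 for line in marks if computer not in line)  # lines opponent could still win
    return rcdc - rcdo

print(f(["   ", " X ", "   "]))   # X alone in the center
print(f(["O  ", " X ", "   "]))   # X in the center, O in a corner
```

With X alone in the center, all 8 lines are open to X and 4 to O, so f = 4; after O takes a corner, 5 lines remain open to X and 4 to O, so f = 1.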
Sample Evaluations
• X = Computer; O = Opponent
[Figure: two sample tic-tac-toe positions, each scored by counting the rows, cols and diags still open to each player.]
Evaluation functions
• For chess/checkers, typically linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wm fm(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black
queens), etc.
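A linear weighted sum is a one-line computation once the features are defined. In this sketch the queen weight of 9 comes from the example above; the rook feature, its weight of 5, and the dictionary encoding of a position are illustrative assumptions.

```python
def linear_eval(state, weights, features):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wm*fm(s)."""
    return sum(w * feature(state) for w, feature in zip(weights, features))

def queen_difference(state):
    # f1(s) = (number of white queens) - (number of black queens)
    return state["white"]["Q"] - state["black"]["Q"]

def rook_difference(state):
    # Hypothetical second feature, built the same way for rooks.
    return state["white"]["R"] - state["black"]["R"]

WEIGHTS = [9, 5]                       # 9 is from the slide; 5 is an assumption
FEATURES = [queen_difference, rook_difference]

position = {"white": {"Q": 1, "R": 1}, "black": {"Q": 0, "R": 2}}
print(linear_eval(position, WEIGHTS, FEATURES))
```

For this position the score is 9·1 + 5·(−1) = 4, so White is judged slightly ahead.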
Example: Samuel’s Checker-Playing
Program
• It uses a linear evaluation function
f(n) = w1f1(n) + w2f2(n) + ... + wmfm(n)
For example: f = 6K + 4M + U
– K = King Advantage
– M = Man Advantage
– U = Undenied Mobility Advantage (number of
moves that Max has where Min has no jump moves)
Samuel’s Checker Player
• In learning mode
Samuel’s Checker Player
• How does A change its function?
Coefficient replacement
Δ(node) = backed-up value(node) – initial value(node)
if Δ > 0 then terms that
contributed positively are given more weight and
terms that contributed negatively get less weight
if Δ < 0 then terms that
contributed negatively are given more weight and
terms that contributed positively get less weight
Chess: Rich history of cumulative ideas
Minimax search, evaluation function learning
(1950).
(1975).
Problem with fixed depth Searches
if we only search n moves ahead, it may be possible that the
catastrophe can be delayed by a sequence of moves that do not
make any progress
also works in other direction (good moves may not be found)
Problems with a fixed ply: The Horizon Effect
Additional Refinements
End-Game Databases
The MONSTER
Other Games
                       deterministic                  chance
perfect information    chess, checkers, go, othello   backgammon, monopoly
imperfect information  stratego                       bridge, poker, scrabble
Games of Chance
• What about games that involve chance, such
as
– rolling dice
– picking a card
• Use three kinds of nodes:
– max nodes
– min nodes
– chance nodes
[Figure: example tree with min, chance and max levels.]
Games of Chance
Expectiminimax
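[Figure: a chance node c with max children; the chance outcomes d1 … di … dk lead to successor states S(c,di).]
[Figure: example tree with max, chance, min and leaf levels, with chance branches labelled with probabilities .4 and .6.]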
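Expectiminimax extends minimax with a third node type whose value is the probability-weighted average of its children. The following is a minimal sketch; the tuple encoding of nodes and the small example tree (including its probabilities and leaf values) are made up for illustration and are not a transcription of the figure.

```python
def expectiminimax(node):
    """Value of a node in a tree with max, min and chance nodes.

    A leaf is a number. An inner node is (kind, children), where children of
    a "chance" node are (probability, child) pairs.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical example: max chooses between two chance nodes.
EXAMPLE = ("max", [
    ("chance", [(0.4, ("min", [3, 5])), (0.6, ("min", [1, 4]))]),
    ("chance", [(0.4, ("min", [1, 2])), (0.6, ("min", [4, 5]))]),
])

print(expectiminimax(EXAMPLE))
```

Here the first chance node is worth 0.4·3 + 0.6·1 = 1.8 and the second 0.4·1 + 0.6·4 = 2.8, so max prefers the second (up to floating-point rounding).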
Complexity
• Instead of O(b^m), it is O(b^m n^m) where n is the
number of chance outcomes.
Imperfect Information
• E.g. card games, where opponents’ initial cards are unknown
• Idea: For all deals consistent with what you can see
– compute the minimax value of available actions for each of the possible deals
– compute the expected value over all deals
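The idea above amounts to averaging per-deal minimax values. The sketch below assumes every consistent deal is equally likely and uses a small made-up payoff table in place of a real minimax search; `expected_values`, `PAYOFF` and the action and deal names are all hypothetical.

```python
def expected_values(actions, deals, minimax_value):
    """Average each action's minimax value over all deals (assumed equally likely)."""
    return {
        action: sum(minimax_value(action, deal) for deal in deals) / len(deals)
        for action in actions
    }

# Made-up stand-in for "minimax value of this action under this deal".
PAYOFF = {
    ("lead_ace", "deal_1"): 1, ("lead_ace", "deal_2"): -1,
    ("lead_low", "deal_1"): 0, ("lead_low", "deal_2"): 1,
}

def lookup(action, deal):
    return PAYOFF[(action, deal)]

averages = expected_values(["lead_ace", "lead_low"], ["deal_1", "deal_2"], lookup)
best_action = max(averages, key=averages.get)
print(averages, best_action)
```

In this toy table `lead_ace` averages 0.0 and `lead_low` averages 0.5, so the second action is chosen even though it is not the best play in every single deal.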
Status of AI Game Players
• Tic Tac Toe
– Tied for best player in world
• Othello
– Computer better than any human
– Human champions now refuse to play computer
• Scrabble
– Maven beat world champions Joel Sherman and Matt Graham
• Backgammon
– 1992, Tesauro combines 3-ply search & neural networks (with 160 hidden units) yielding top-3 player
• Bridge
– Gib ranked among top players in the world
• Poker
– 2015, Heads-up limit hold'em poker is solved
• Checkers
– 1994, Chinook ended 40-year reign of human champion Marion Tinsley
• Chess
– 1997, Deep Blue beat human champion Garry Kasparov in six-game match
– Deep Blue searches 200M positions/second, up to 40 ply
– Now looking at other applications (molecular dynamics, drug synthesis)
• Go
– 2016, DeepMind’s AlphaGo defeated Lee Sedol & 2017 defeated Ke Jie
Summary
• Games are fun to work on!