Lec3-Adversarial Search
1
Outline
• Multi-Agent Systems
• A special case: Two-Player Sequential Zero-Sum Complete-Information Games
– Minimax Algorithm
– Alpha-Beta Pruning
• Overview of Deep Blue and AlphaGo (and
AlphaZero)
2
Multi-Agent Systems
RoboCup
3
A Special Case
• Two-Player Sequential Zero-Sum Complete-Information
Games
• Two-Player
– Two players involved
• Sequential
– Players take turns to move (take actions)
– Act alternately
• Zero-sum
– Utility values of the two players at the end of the game have equal
absolute value and opposite sign
• Complete (Perfect) Information
– Each player can fully observe what actions the other player has taken
4
Two-Agent Games
• Two-agent, perfect-information, zero-sum games
• Two agents move in turn until
either one of them wins or the
result is a draw.
• Each player has a complete model
of the environment and of its own
and the other’s possible actions and
their effects.
5
Games vs. search problems
• "Unpredictable" opponent → specifying a move
for every possible opponent reply
6
Tic-Tac-Toe
7
Chess
https://en.wikipedia.org/wiki/Chess
8
Chess
Garry Kasparov vs Deep Blue (1996)
Result (from Deep Blue's perspective):
win, loss, draw, draw, loss, loss
(Kasparov won the match 4–2; Deep Blue
played White in the odd-numbered games)
9
Go
• DeepMind's promotional video before the match
with Lee Sedol
https://www.youtube.com/watch?v=SUbqykXVx0A
10
Go
https://deepmind.com/blog/alphago-zero-learning-scratch/
AlphaGo: https://www.nature.com/articles/nature16961.pdf
AlphaGo Zero: www.nature.com/articles/nature24270.pdf
11
Solution Concept
• What strategies are appropriate to use in these two-player
sequential zero-sum complete-information games?
What action should O player take? What action should X player take?
12
Solution Concept
• Recursive definition: each player should choose
the best action, the one leading to the highest utility for
itself, assuming both players will choose the best
actions afterwards
• What if someone accidentally chooses a sub-optimal
action?
– In the next step, the player who needs to move
should again choose the best action leading to the highest
utility for itself, assuming both players will choose the
best actions afterwards
13
Game as Search Problem
14
Game as Search Problem
• In search, we find a path from the initial state to a
terminal state
• In a game, the opponent's moves are not under our control,
so a solution is a strategy (a contingent plan) rather than
a single path
15
Minimax Procedure (1)
• Two players: MAX and MIN
• Task: find a "best" move for MAX
• Assume that MAX moves first, and that the two players
move alternately.
• MAX node
– nodes at even-numbered depths correspond to positions in
which it is MAX’s move next
• MIN node
– nodes at odd-numbered depths correspond to positions in
which it is MIN’s move next
16
Minimax Procedure (2)
• Estimate the best first move
– Apply a static evaluation function to the leaf nodes
– It measures the "worth/value" of the leaf nodes
– When analyzing game trees, we adopt the convention:
• Game positions favorable to MAX cause the evaluation
function to have a positive value
• Positions favorable to MIN cause the evaluation function
to have negative value
• Values near zero correspond to game positions not
particularly favorable to either MAX or MIN.
17
Game tree
(2-player, deterministic, take turns)
tic-tac-toe
18
Minimax Algorithm
[Figure, slides 19-24: step-by-step minimax backup on a game
tree with leaf values 5 3 4 6 5 3 6 4 7 5 2 4 5 3 8 2; each MAX
node takes the maximum of its children's values and each MIN
node the minimum.]
24
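To make the backup rule concrete, here is a minimal minimax sketch in Python (not the slides' own pseudocode). The game object, with is_terminal, utility, actions, and result, is a hypothetical interface; utilities are taken from MAX's point of view.

# Minimal minimax sketch; `game` is a hypothetical interface with
# is_terminal(s), utility(s), actions(s), and result(s, a).
def minimax_value(game, state, maximizing):
    if game.is_terminal(state):
        return game.utility(state)          # utility from MAX's point of view
    values = [minimax_value(game, game.result(state, a), not maximizing)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)

def minimax_decision(game, state):
    # MAX to move: pick the action whose successor has the highest backed-up value.
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), False))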
Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m) (b: legal moves per position; m: maximum tree depth)
• Space complexity? O(bm) (depth-first exploration)
Intractable for large games
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
For many problems, you do not need a full contingent plan at the very
beginning of the game
• Can solve the problem in a more "online" fashion, just like how human players
play Chess/Go: my opponent takes this action, so what should I do now?
25
Minimax Algorithm
• If we only care about the game value, or the optimal
action at a specific game state, can we do better?
– Trick 1: Depth-limited search (limit the depth)
– Depth-limited minimax (Shannon, 1950):
• Set 𝑑 (an integer) as an upper bound on the search depth
• 𝑒(𝑠), the evaluation function, returns an estimate of the minimax
value of state 𝑠
• Whenever we reach a nonterminal node at depth 𝑑, return 𝑒(𝑠)
• If 𝑑 = ∞ (the default), then 𝑒 will never be called: the
algorithm visits all the nodes in the tree and provides the
optimal action for all states
26
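A minimal sketch of this depth-limited variant, reusing the hypothetical game interface from the earlier minimax sketch; e and d are the evaluation function and depth bound defined on this slide.

import math

def minimax_cutoff(game, state, depth, maximizing, e, d=math.inf):
    if game.is_terminal(state):
        return game.utility(state)
    if depth >= d:                          # depth bound reached: estimate
        return e(state)
    values = [minimax_cutoff(game, game.result(state, a),
                             depth + 1, not maximizing, e, d)
              for a in game.actions(state)]
    return max(values) if maximizing else min(values)
# With d = math.inf (the default), the cutoff test never fires and e is
# never called, exactly as stated above.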
Minimax Algorithm
• Static evaluation function
– Often a heuristic function that is easily computable given the
representation of a state, e.g., a weighted sum of features
– For chess: Count the pieces for each side, giving each a weight
(queen=9, rook=5, knight/bishop=3, pawn=1)
– What properties do we care about in the evaluation function?
For choosing the optimal action, only the ordering of values matters
– In chess, one can only search the
full-width tree to about 4 levels
27
Evaluation functions
• For chess, typically a linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• e.g., w1 = 9 with
f1(s) = (number of white queens) – (number of black queens)
• First, the evaluation function should order the terminal states in the
same way as the true utility function;
• Second, the computation must not take too long!
• Third, for nonterminal states, the evaluation function should be
strongly correlated with the actual chances of winning.
28
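As an illustration, here is a sketch of such a material-count evaluation with the weights from the previous slide; count(state, piece, color) is a hypothetical board accessor, not part of the slides.

# Weights from the slide: queen=9, rook=5, knight/bishop=3, pawn=1.
WEIGHTS = {"queen": 9, "rook": 5, "knight": 3, "bishop": 3, "pawn": 1}

def eval_material(state, count):
    # Each feature f_i(s) = (# white pieces of type i) - (# black pieces of type i);
    # Eval(s) is their weighted sum, positive when white (MAX) is ahead.
    return sum(w * (count(state, p, "white") - count(state, p, "black"))
               for p, w in WEIGHTS.items())

Per the first property above, only the relative ordering of these values matters for picking the optimal action.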
Cutting off search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
29
Example : Tic-Tac-Toe (1)
• MAX marks crosses; MIN marks circles
• It is MAX's turn to play first
– With a depth bound of 2
– Evaluation function e(p) of a position p
• If p is not a win for either player,
e(p) = (no. of complete rows, columns, or diagonals that
are still open for MAX) - (no. of complete rows, columns,
or diagonals that are still open for MIN)
• If p is a win for MAX, e(p) = ∞
• If p is a win for MIN, e(p) = -∞
30
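A sketch of this evaluation in Python, assuming (as a representation choice not from the slides) a 9-character board string with 'X' for MAX, 'O' for MIN, and ' ' for empty; a line is "open" for a player if the opponent has no mark on it.

# All 8 complete lines (rows, columns, diagonals) by board index.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def open_lines(board, player):
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))

def e(board):
    # Wins should be checked separately and scored +/- infinity.
    return open_lines(board, 'X') - open_lines(board, 'O')

For example, with X in the center and O in a corner, 5 lines remain open for MAX and 4 for MIN, so e(p) = 5 - 4 = 1, matching the worked value on the next slide.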
Example : Tic-Tac-Toe (2)
– Evaluation function e(p) of a position p
• If p is not a win for either player,
e(p) = (no. of complete rows, columns, or diagonals that
are still open for MAX) - (no. of complete rows, columns,
or diagonals that are still open for MIN)
[Figure: two example positions, with e(p) = 5 - 4 = 1 and e(p) = 6 - 4 = 2]
31
Example : Tic-Tac-Toe (3)
– Evaluation function e(p) of a position p
• If p is not a win for either player,
e(p) = (no. of complete rows, columns, or diagonals that
are still open for MAX) - (no. of complete rows, columns,
or diagonals that are still open for MIN)
[Figure: two example positions, with e(p) values left to compute]
32
Example : Tic-Tac-Toe (4)
• First move
[Figure, slides 33-36: the depth-2 search tree for MAX's first
move, with backed-up evaluation values]
36
Example : Tic-Tac-Toe (5)
37
Example : Tic-Tac-Toe (6)
38
Question?
• How can we improve search efficiency?
• Is it possible to cut off some unnecessary subtrees?
39
• If we only care about the game value, or the
optimal action at a specific game state, can we
do better?
– Trick 2: Pruning subtrees (limit the width)
– Intuition: we can compute the correct decision without
looking at every node, by maintaining bounds on the
minimax value
40
[α, β]
41
Why is it called α-β?
α is the value of the best (i.e., highest-value) choice
found so far at any choice point along the path for
MAX
At a MIN node: if v becomes worse than α, MAX will avoid
this node → prune that branch
β is defined similarly for MIN (the best, i.e., lowest-value,
choice found so far along the path for MIN)
43
α-β Pruning Example
[Figure, slides 44-48: a step-by-step α-β pruning example on a
small game tree]
44
α-β pruning
α: a lower bound on the minimax value (best value found so far for MAX)
β: an upper bound on the minimax value (best value found so far for MIN)
[Figure: the game tree from before, with leaf values
5 3 4 6 5 3 6 4 7 5 2 4 5 3 8 2; at each node we decide whether
to prune or not; MAX nodes update α and MIN nodes update β]
49
α-β pruning
[Figure, slides 50-61: a step-by-step trace of α-β pruning on the
tree above; α and β are passed down to children, backed-up values
V are passed up, and a subtree is pruned as soon as α ≥ β.]
α-β pruning
Key points:
– The MAX player only updates the value of α
– The MIN player only updates the value of β
– When backing up the tree, a node passes its value V to its
parent, not its α and β values
– α and β values are only passed down to the child nodes
Properties:
– Pruning does not affect the final result
– Effectiveness depends on the ordering of successors
– With "perfect ordering," time complexity = O(b^(m/2))
→ doubles the solvable depth of search
63
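A minimal α-β sketch consistent with these key points, reusing the same hypothetical game interface as before: MAX only raises α, MIN only lowers β, the backed-up value V is returned upward, and α, β are passed down.

import math

def alphabeta(game, state, alpha=-math.inf, beta=math.inf, maximizing=True):
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        v = -math.inf
        for a in game.actions(state):
            v = max(v, alphabeta(game, game.result(state, a), alpha, beta, False))
            alpha = max(alpha, v)       # MAX only updates α
            if alpha >= beta:           # the MIN ancestor will avoid this node
                break                   # prune the remaining children
        return v
    else:
        v = math.inf
        for a in game.actions(state):
            v = min(v, alphabeta(game, game.result(state, a), alpha, beta, True))
            beta = min(beta, v)         # MIN only updates β
            if alpha >= beta:
                break                   # prune the remaining children
        return v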
Iterative Deepening Search
64
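The figure for this slide is not recoverable here, but in game-playing search iterative deepening typically means re-running depth-limited search with increasing depth bounds until a time budget expires. A minimal sketch, assuming the minimax_cutoff and game interface from above plus a hypothetical game.eval evaluation function:

import time

def iterative_deepening_decision(game, state, budget_seconds=3.0):
    deadline = time.monotonic() + budget_seconds
    best, d = None, 1
    while time.monotonic() < deadline:
        # Re-search from scratch with a deeper bound each iteration; the
        # shallow iterations are cheap relative to the deepest one.
        # (A real engine would also abort mid-iteration on timeout.)
        best = max(game.actions(state),
                   key=lambda a: minimax_cutoff(game, game.result(state, a),
                                                1, False, game.eval, d))
        d += 1
    return best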
Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion
Tinsley in 1994. It used a precomputed endgame database defining perfect play for
all positions involving 8 or fewer pieces on the board, a total of 444 billion
positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game
match in 1997. Deep Blue searched 200 million positions per second, used a very
sophisticated evaluation function, and used undisclosed methods for extending some
lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: computer Go programs have been much stronger than human players since 2016.
65
Before Deep Blue
Name of AI program / contributors | Key technique | Year
Claude Shannon, Alan Turing | Minimax search with a scoring function (static evaluation function) | 1950
 | Opening books | 1950s
KOTOK/McCarthy program & ITEP program | Alpha-beta search, brute-force search | 1966
MAC HACK | Transposition tables | 1967
CHESS 3.0 - CHESS 4.9 | Iteratively deepening depth-first search | 1975
 | Endgame databases via dynamic programming | 1977
BELLE | Special-purpose circuitry | 1978
CRAY BLITZ | Parallel search | 1983
HITECH | Parallel evaluation | 1985
DEEP BLUE | Parallel search + special-purpose circuitry | 1987
66
Before Deep Blue
• Claude Shannon,
Alan Turing:
Minimax search with
scoring function
(1950)
67
How Deep Blue Works
• ~200 million moves / second ≈ 3.6 × 10^10
moves in 3 minutes
• 3 min corresponds to
– ~7 plies of uniform-depth minimax search
– 10-14 plies of uniform-depth alpha-beta search
• 1 sec corresponds to 380 years of human
thinking time
• Specialized hardware searches last 5 ply
68
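A quick sanity check of these numbers, assuming b ≈ 35 from the minimax-properties slide earlier:

import math

moves = 200_000_000 * 180        # positions searched in 3 minutes
print(f"{moves:.1e}")            # -> 3.6e+10
# Uniform minimax depth d such that 35**d ≈ 3.6e10:
print(math.log(moves, 35))       # -> ~6.8, i.e. about 7 plies
# Alpha-beta with good ordering roughly doubles the reachable depth,
# consistent with the 10-14 plies quoted above.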
How Deep Blue Works
• Hardware requirement
– 32-node RS6000 SP multicomputer
– Each node had
• 1 IBM Power2 Super Chip (P2SC)
– 16 chess chips
• Move generation (often takes 40-50% of time)
• Evaluation
• Some endgame heuristics & small endgame databases
– 32GB opening & endgame database
69
Connections with RL
70
Case Study
AlphaGo
72
How AlphaGo Works
• Supervised learning + Reinforcement learning
• Monte-Carlo Tree Search
73
How AlphaZero Works
• Reinforcement learning
• Monte-Carlo Tree Search
75
Summary
• Games are fun to work on!
• They illustrate several important points about AI
• Perfection is unattainable → must approximate
77