Lec3-Adversarial Search

The document discusses adversarial search in multi-agent systems, focusing on two-player sequential zero-sum complete-information games. It covers key algorithms such as the Minimax Algorithm and Alpha-Beta Pruning, as well as notable examples like Deep Blue and AlphaGo. The content emphasizes the strategies for optimal decision-making in competitive environments and the computational challenges associated with these games.


Adversarial Search

1
Outline
• Multi-Agent Systems
• A special case: Two-Player Sequential Zero-Sum Complete-Information Games
  – Minimax Algorithm
  – Alpha-Beta Pruning
• Overview of Deep Blue and AlphaGo (and AlphaZero)

2
Multi-Agent Systems
RoboCup

Texas Hold'em    三国杀 (Sanguosha)

3
A Special Case
• Two-Player Sequential Zero-Sum Complete-Information Games

• Two-Player
  – Exactly two players are involved
• Sequential
  – Players take turns to move (take actions); they act alternately
• Zero-sum
  – The utility values of the two players at the end of the game have equal absolute value and opposite sign (e.g., +1 for the winner and −1 for the loser, or 0 and 0 for a draw)
• Complete (perfect) information
  – Each player can fully observe what actions the other agent has taken

4
Two-Agent Games
• Two-agent, perfect-information, zero-sum games
• The two agents move in turn until either one of them wins or the result is a draw.
• Each player has a complete model of the environment, of its own and the other's possible actions, and of their effects.

5
Games vs. search problems
• "Unpredictable" opponent → the solution must specify a move for every possible opponent reply

• For chess, the average branching factor is about 35, and games run to about 50 moves by each player
  – Under time limits, with roughly 35^100 nodes in the game tree, it is unlikely we can find the goal exactly; we must approximate

6
Tic-Tac-Toe
• Two players, X and O, take turns marking the spaces in a 3×3 grid

• The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game

https://en.wikipedia.org/wiki/Tic-tac-toe

7
Chess

https://en.wikipedia.org/wiki/Chess

8
Chess
Garry Kasparov vs Deep Blue (1996)

Result (from Deep Blue's perspective): win, loss, draw, draw, loss, loss; Kasparov won the match 4–2
(Deep Blue played White in the odd-numbered games)

9
Go
• DeepMind promotional video released before the match with Lee Sedol

https://www.youtube.com/watch?v=SUbqykXVx0A

10
Go

AlphaGo vs Lee Sedol (3/2016)
Result: win-win-win-loss-win (AlphaGo won 4–1)

AlphaGo Zero vs AlphaGo (2017)
Result: 100–0
https://deepmind.com/blog/alphago-zero-learning-scratch/

AlphaGo: https://www.nature.com/articles/nature16961.pdf
AlphaGo Zero: www.nature.com/articles/nature24270.pdf

11
Solution Concept
• What strategies are appropriate to use in these two-player sequential zero-sum complete-information games?

What action should the O player take?        What action should the X player take?

12
Solution Concept
• Iterative definition: each player should choose the best action, the one leading to the highest utility for itself, assuming both players will choose the best actions afterwards
• What if someone accidentally chooses a sub-optimal action?
  – At the next step, the player to move should again choose the best action leading to the highest utility for itself, assuming both players will choose the best actions afterwards

13
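One compact way to state this definition (not spelled out on the slide, but equivalent to it) is as the minimax value of a state s:

  minimax(s) = Utility(s)                          if s is a terminal state
  minimax(s) = max_a minimax(Result(s, a))         if it is MAX's turn to move at s
  minimax(s) = min_a minimax(Result(s, a))         if it is MIN's turn to move at s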
Game as Search Problem

14
Game as Search Problem
• In search, we find a path from the initial state to a terminal state

• In two-player sequential zero-sum complete-information games, we are instead looking for a "strategy profile" or "contingent plan": an action for each state

15
Minimax Procedure (1)
• Two players: MAX and MIN
• Task: find a "best" move for MAX
• Assume that MAX moves first, and that the two players move alternately.
• MAX node
  – a node at an even-numbered depth, corresponding to a position in which it is MAX's turn to move next
• MIN node
  – a node at an odd-numbered depth, corresponding to a position in which it is MIN's turn to move next

16
Minimax Procedure (2)
• Estimate of the best first move
  – Apply a static evaluation function to the leaf nodes
  – The function measures the "worth/value" of a leaf node
  – When analyzing game trees, we adopt the convention that
    • game positions favorable to MAX cause the evaluation function to have a positive value
    • positions favorable to MIN cause the evaluation function to have a negative value
    • values near zero correspond to game positions not particularly favorable to either MAX or MIN.

17
Game tree
(2-player, deterministic, take turns)
tic-tac-toe

18
Minimax Algorithm

19
Minimax Algorithm

20
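The pseudocode shown in the figures on slides 19–20 did not survive extraction. As a rough sketch of the recursion they describe, a plain Python version might look like this; `actions`, `result`, `is_terminal`, `utility`, and `to_move` are assumed game-specific helpers, not part of the original slides:

```python
def minimax_value(state, game):
    """Return the minimax value of `state` (a sketch; `game` supplies the helpers)."""
    if game.is_terminal(state):
        return game.utility(state)          # utility measured from MAX's point of view
    values = [minimax_value(game.result(state, a), game)
              for a in game.actions(state)]
    # MAX backs up the largest child value, MIN the smallest
    return max(values) if game.to_move(state) == "MAX" else min(values)

def minimax_decision(state, game):
    """Return the best action for MAX at `state` (assumes it is MAX's turn)."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game.result(state, a), game))
```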
Minimax Algorithm

The minimax value of the root node

Best play by both players leads to a draw

22
Minimax Algorithm

[Figure: the example game tree, with MAX at the root, MIN nodes below it, MAX nodes above the leaves, and leaf values 5 3 4 6 5 3 6 4 7 5 2 4 5 3 8 2. Backing the values up gives the MIN nodes the values 5, 6, and 5, so the minimax value of the MAX root is max(5, 6, 5) = 6.]

24
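To make the backed-up values on slide 24 concrete, here is a small Python check. The grouping of the leaves into subtrees is an assumption (the tree's exact shape is not recoverable from the slide text); only the leaf values and the backed-up values 5, 6, 5 and 6 come from the slide:

```python
# Assumed reconstruction of the slide-24 tree: MAX root, MIN nodes, MAX nodes, leaves.
# The grouping of the 16 leaf values into subtrees is a guess; only the leaf values and
# the backed-up values (5, 6, 5 at the MIN level; 6 at the root) come from the slide.
tree = [
    [[5, 3, 4], [6, 5, 3]],   # first MIN subtree:  min(max(5,3,4), max(6,5,3)) = min(5, 6) = 5
    [[6, 4],    [7, 5, 2]],   # second MIN subtree: min(6, 7) = 6
    [[4, 5, 3], [8, 2]],      # third MIN subtree:  min(5, 8) = 5
]

def value(node, maximizing):
    if isinstance(node, int):                 # leaf: its static value
        return node
    child_values = [value(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

print(value(tree, True))   # 6, the minimax value of the MAX root
```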
Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m) (b = number of legal moves; m = maximum tree depth)
• Space complexity? O(bm) (depth-first exploration)
 Intractable for large games
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games → an exact solution is completely infeasible
 For many problems, you do not need a full contingent plan at the very beginning of the game
• You can solve the problem in a more "online" fashion, just like how human players play Chess/Go: my opponent took this action, so what should I do now?

25
Minimax Algorithm
• If we only care about the game value, or the optimal action at a specific game state, can we do better?
  – Trick 1: Depth-limited search (limit the depth)
  – Minimax algorithm (Shannon, 1950):
    • Set 𝑑 (an integer) as an upper bound on the search depth
    • 𝑒(𝑠), the evaluation function, returns an estimate of the minimax value of state 𝑠
    • Whenever we reach a nonterminal node at depth 𝑑, return 𝑒(𝑠)
    • If 𝑑 = ∞ (the default), then 𝑒 will never be called, and the algorithm visits all the nodes in the tree and provides the optimal action for every state

26
Minimax Algorithm
• Static evaluation function
  – Often a heuristic function that is easily computable from the representation of a state, e.g., a weighted sum of features
  – For chess: count the pieces for each side, giving each a weight (queen = 9, rook = 5, knight/bishop = 3, pawn = 1)
  – What properties do we care about in the evaluation function? For choosing the optimal action, only the ordering of the values matters
  – In chess, we can only search the full-width tree to about 4 levels

27
Evaluation functions
• For chess, typically a linear weighted sum of features
  Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• e.g., w1 = 9 with f1(s) = (number of white queens) – (number of black queens)

• First, the evaluation function should order the terminal states in the same way as the true utility function;
• Second, the computation must not take too long;
• Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.

28
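A minimal sketch of such a linear evaluation function, using the material weights from the previous slide; the board representation (a dict mapping squares to piece strings such as "wQ" or "bP") and the feature helper are illustrative assumptions, not from the slides:

```python
# Hypothetical material-count features for chess, with the slide's weights
# (queen = 9, rook = 5, knight/bishop = 3, pawn = 1).
PIECE_WEIGHTS = {"Q": 9, "R": 5, "N": 3, "B": 3, "P": 1}

def material_feature(board, piece_letter):
    """f_i(s): (number of white pieces of this kind) - (number of black pieces)."""
    white = sum(1 for p in board.values() if p == "w" + piece_letter)
    black = sum(1 for p in board.values() if p == "b" + piece_letter)
    return white - black

def evaluate(board):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s); positive values favor White (MAX)."""
    return sum(w * material_feature(board, piece)
               for piece, w in PIECE_WEIGHTS.items())
```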
Cutting off search
MinimaxCutoff is identical to MinimaxValue except that
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval

That is, the TERMINAL-TEST step becomes: if CUTOFF-TEST(state, depth) then return EVAL(state)

Does it work in practice?
With b^m ≈ 10^6 and b = 35, we get m ≈ 4:
4-ply lookahead is a hopeless chess player!
  – 4-ply ≈ human novice
  – 8-ply ≈ typical PC, human master
  – 12-ply ≈ Deep Blue, Kasparov

29
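In the same sketch style as before (and under the same assumptions about the game helpers), the depth-limited variant replaces the terminal test with a cutoff test and the utility with the evaluation function; `eval_fn` and `d` play the roles of 𝑒 and 𝑑 from slide 26:

```python
def minimax_cutoff(state, game, depth, d, eval_fn):
    """Depth-limited minimax: at depth d, return eval_fn(state) instead of searching on."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth >= d:                                   # CUTOFF-TEST(state, depth)
        return eval_fn(state)                        # return EVAL(state)
    values = [minimax_cutoff(game.result(state, a), game, depth + 1, d, eval_fn)
              for a in game.actions(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```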
Example: Tic-Tac-Toe (1)
• MAX marks crosses; MIN marks circles
• It is MAX's turn to play first.
  – Use a depth bound of 2
  – Evaluation function e(p) of a position p:
    • If p is not a win for either player,
      e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)
    • If p is a win for MAX, e(p) = ∞
    • If p is a win for MIN, e(p) = −∞

30
Example: Tic-Tac-Toe (2)
  – Evaluation function e(p) of a position p:
    • If p is not a win for either player,
      e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)

e(p) = 5 − 4 = 1        e(p) = 6 − 4 = 2

31
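A small Python sketch of this evaluation function; the board encoding (a 3×3 list of lists with 'X' for MAX, 'O' for MIN, and None for empty) is an assumption made for illustration, and the function assumes p is not already a win for either player:

```python
def lines(board):
    """Yield all 8 complete lines (3 rows, 3 columns, 2 diagonals) of a 3x3 board."""
    for i in range(3):
        yield [board[i][j] for j in range(3)]        # row i
        yield [board[j][i] for j in range(3)]        # column i
    yield [board[i][i] for i in range(3)]            # main diagonal
    yield [board[i][2 - i] for i in range(3)]        # anti-diagonal

def e(board):
    """e(p) = (lines still open for MAX) - (lines still open for MIN), for non-winning p."""
    open_for_max = sum(1 for line in lines(board) if "O" not in line)
    open_for_min = sum(1 for line in lines(board) if "X" not in line)
    return open_for_max - open_for_min

# Example: X alone in the centre gives e(p) = 8 - 4 = 4.
board = [[None, None, None], [None, "X", None], [None, None, None]]
print(e(board))   # 4
```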
Example: Tic-Tac-Toe (3)
  – Evaluation function e(p) of a position p:
    • If p is not a win for either player,
      e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN)

e(p) =        e(p) =

32
Example: Tic-Tac-Toe (4)

[Figures: slides 33–36 show the depth-2 search tree used to select MAX's first move.]

• First move

36
Example: Tic-Tac-Toe (5)

37
Example: Tic-Tac-Toe (6)

38
Question?

• How can we improve search efficiency?
• Is it possible to cut off some unnecessary subtrees?

39
• If we only care about the game value, or the optimal action at a specific game state, can we do better?
  – Trick 2: Pruning subtrees (limit the width)
  – Intuition: we can compute the correct decision without looking at every node (consider the bounds on the minimax value)

40
[α, β]

41
Why is it called α-β?
 α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
 At a MIN node: if v is worse than α, MAX will avoid this node → prune that branch
 β is defined similarly for MIN (the best, i.e., lowest-value, choice found so far along the path for MIN)

43
α-β Pruning Example

[Figures: slides 44–48 step through an α-β pruning example on a small game tree.]
48
α-β pruning
α: a lower bound on the minimax value
β: an upper bound on the minimax value

[Figure: the example tree from slide 24, annotated at each node with its value V and the bounds α and β; the labels "Update α", "Update β", and "Whether to prune or not" mark where the MAX player updates α, where the MIN player updates β, and where pruning decisions are made.]
49
α-β pruning

[Figures: slides 50–61 trace the algorithm step by step on the same example tree (leaf values 5 3 4 6 5 3 6 4 7 5 2 4 5 3 8 2), showing at each step the current node value V, the α and β bounds passed down, and where subtrees can be pruned. After the leftmost MIN subtree is evaluated, the MAX root has V = 5 and α = 5, and the remaining subtrees are explored with these bounds.]
61
α-β pruning
 Key points:
  – The MAX player only updates the value of α.
  – The MIN player only updates the value of β.
  – When backtracking up the tree, node values are passed to the parent nodes, not the values of α and β.
  – Only the α and β values are passed down to the child nodes.
 Properties:
  – Pruning does not affect the final result
  – Effectiveness depends on the ordering of successors
  – With "perfect ordering," time complexity = O(b^(m/2))
    → doubles the depth of search that can be completed

63
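A compact Python sketch of α-β pruning consistent with these key points; as before, the game helpers (`is_terminal`, `utility`, `actions`, `result`, `to_move`) are assumed rather than taken from the slides:

```python
import math

def alphabeta(state, game, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `state`, pruning branches that cannot affect it."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        v = -math.inf
        for a in game.actions(state):
            v = max(v, alphabeta(game.result(state, a), game, alpha, beta))
            alpha = max(alpha, v)          # only MAX updates alpha
            if v >= beta:                  # the MIN node above would never allow this
                break                      # beta cutoff: prune the remaining actions
        return v
    else:
        v = math.inf
        for a in game.actions(state):
            v = min(v, alphabeta(game.result(state, a), game, alpha, beta))
            beta = min(beta, v)            # only MIN updates beta
            if v <= alpha:                 # the MAX node above already has something better
                break                      # alpha cutoff: prune the remaining actions
        return v
```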
Iterative Deepening Search

64
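The figure on slide 64 did not survive extraction. The idea, sketched under the same assumptions and reusing the `minimax_cutoff` function from the cutting-off-search sketch, is to run the depth-limited search with increasing depth bounds until a time budget is used up, keeping the move found by the deepest completed iteration:

```python
import time

def iterative_deepening_move(state, game, eval_fn, time_budget_seconds):
    """Repeat depth-limited search with d = 1, 2, 3, ... until time runs out (a sketch)."""
    deadline = time.monotonic() + time_budget_seconds
    best_move = None                        # stays None if even d = 1 does not finish in time
    d = 1
    while time.monotonic() < deadline:
        best_move = max(game.actions(state),
                        key=lambda a: minimax_cutoff(game.result(state, a),
                                                     game, 1, d, eval_fn))
        d += 1                              # the next iteration searches one ply deeper
    return best_move
```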
Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.

• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used a very sophisticated evaluation function, and used undisclosed methods for extending some lines of search up to 40 ply.

• Othello: human champions refuse to compete against computers, which are too good.

• Go: computer Go has been much stronger than humans since 2016.

65
Before Deep Blue
Name of AI program / contributors      | Key Technique                                                        | Year
Claude Shannon, Alan Turing            | Minimax search with a scoring function (static evaluation function)  | 1950
                                       | Opening books                                                        | 1950s
KOTOK/McCarthy program & ITEP program  | Alpha-beta search, brute-force search                                | 1966
MAC HACK                               | Transposition tables                                                 | 1967
CHESS 3.0 - CHESS 4.9                  | Iteratively deepening depth-first search                             | 1975
                                       | Endgame databases via dynamic programming                            | 1977
BELLE                                  | Special-purpose circuitry                                            | 1978
CRAY BLITZ                             | Parallel search                                                      | 1983
HITECH                                 | Parallel evaluation                                                  | 1985
DEEP BLUE                              | Parallel search + special-purpose circuitry                          | 1987

66
Before Deep Blue
• Claude Shannon, Alan Turing: minimax search with a scoring function (1950)

• Only a few branches are shown here

67
How Deep Blue Works
• ~200 million positions / second ≈ 3.6 × 10^10 positions in 3 minutes
• 3 minutes correspond to
  – ~7 plies of uniform-depth minimax search
  – 10–14 plies of uniform-depth alpha-beta search
• 1 second corresponds to 380 years of human thinking time
• Specialized hardware searches the last 5 ply

68
How Deep Blue Works
• Hardware requirements
  – A 32-node RS/6000 SP multicomputer
  – Each node had
    • 1 IBM Power2 Super Chip (P2SC)
    • 16 chess chips, which handle
      – Move generation (often takes 40–50% of the time)
      – Evaluation
      – Some endgame heuristics & small endgame databases
  – A 32 GB opening & endgame database

69
Connections with RL

70
Connections with RL

71
Case Study
AlphaGo

72
How AlphaGo Works
• Supervised learning + Reinforcement learning
• Monte-Carlo Tree Search

73
How AlphaGo Works
• Supervised learning + Reinforcement learning
• Monte-Carlo Tree Search

74
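The slides name Monte-Carlo Tree Search but do not spell it out. For reference only, here is the generic UCT child-selection score used in many MCTS implementations; AlphaGo's actual rule is a PUCT variant that also mixes in a policy network's prior, so treat this purely as background, not as AlphaGo's method:

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=1.4):
    """Generic UCT score: average value plus an exploration bonus (not AlphaGo's exact rule)."""
    if child_visits == 0:
        return math.inf                    # unvisited children are tried first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```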
How AlphaZero Works
• Reinforcement learning
• Monte-Carlo Tree Search

75
How AlphaZero Works
• Reinforcement learning
• Monte-Carlo Tree Search

76
Summary
• Games are fun to work on!
• They illustrate several important points about AI
• Perfection is unattainable → must approximate

77
