L07 Adversarial Search
Semester I, 2022-23
Adversarial Search
Rohan Paul
Outline
• Last Class
• Constraint Satisfaction
• This Class
• Adversarial Search
• Reference Material
• AIMA Ch. 5 (Sec: 5.1-5.5)
Acknowledgement
These slides are intended for teaching purposes only. Some material
has been used/adapted from web sources and from slides by Doina
Precup, Dorsa Sadigh, Percy Liang, Mausam, Dan Klein, Anca
Dragan, Nicholas Roy and others.
Game Playing and AI
• Games: challenging decision-making
problems
• Incorporate the state of the other agent in
your decision-making. Leads to a vast
number of possibilities.
• Long duration of play: the outcome (win or loss) is known only at the end.
• Time limits: there is not enough time to compute optimal solutions.
Games: Characteristics
• Axes: • Zero-Sum Games
• Players: one, two or more.
• Adversarial: agents have opposite
• Actions (moves): deterministic or
stochastic utilities (values on outcomes)
• States: fully known or not.
Slide adapted from Dan Klein and from Mausam
Single-Agent Trees
[Figure: single-agent search tree with terminal utilities 2, 0, …, 2, 6, …, 4, 6]
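Computing “utility” of states to decide actions
• Value of a state: the best achievable outcome (utility) from that state.
• Non-terminal states: V(s) = max over successors s' of V(s')
• Terminal states: V(s) is known (given by the game)
[Figure: single-agent search tree with terminal utilities 2, 0, …, 2, 6, …, 4, 6]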
Game Trees: Presence of an Adversary
The adversary’s actions are not in our control. Plan as a contingency considering all possible actions taken by the adversary.
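Minimax Values
• States under agent’s control: V(s) = max over successors s' of V(s')
• States under opponent’s control: V(s) = min over successors s' of V(s')
• Terminal states: V(s) is known
[Figure: game tree with terminal values -8, -5, -10, +8]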
Adversarial Search (Minimax)
• Consider a deterministic, zero-sum game
• Tic-tac-toe, chess etc.
• One player maximizes result and the other minimizes result.
• Minimax Search
• Search the game tree for best moves.
• Select optimal actions that move to a position with the highest minimax
value.
• What is the minimax value?
• It is the best achievable utility against the optimal (rational) adversary.
• Best achievable payoff against the best play by the adversary.
Minimax Algorithm
• Ply and Move
  • Move: when an action has been taken by both players.
  • Ply: a half move (an action by one player).
• Backed-up value
  • of a MAX-position: the value of its largest successor
  • of a MIN-position: the value of its smallest successor
• Minimax algorithm
  • Search down the tree to the terminal nodes.
  • At the bottom level apply the utility function.
  • Back up the values to the root along the search path (compute as per min and max nodes).
  • The root node selects the action.
[Figure: two-ply tree. Terminal values (part of the game): 8, 2, 5, 6. Minimax values (computed recursively): 2 and 5 at the min level, 5 at the max root.]
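The backing-up procedure above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a hypothetical tree encoding in which a leaf is a number (its terminal utility) and an internal node is a list of children, and the names `minimax` and `best_action` are chosen here for illustration.

```python
from typing import List, Union

# Hypothetical encoding: a leaf is a number (terminal utility),
# an internal node is a list of child subtrees.
Tree = Union[int, float, List["Tree"]]


def minimax(node: Tree, is_max: bool = True) -> float:
    """Back up terminal utilities, alternating MAX and MIN plies."""
    if not isinstance(node, list):  # terminal node: apply the utility
        return node
    child_values = [minimax(child, not is_max) for child in node]
    return max(child_values) if is_max else min(child_values)


def best_action(root: List[Tree]) -> int:
    """Index of the root child with the highest backed-up value (root is MAX)."""
    values = [minimax(child, is_max=False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


small_tree = [[8, 2], [5, 6]]   # terminal values 8, 2, 5, 6
print(minimax(small_tree))      # 5
print(best_action(small_tree))  # 1 (the second move)
```

The two MIN nodes back up 2 and 5, and the MAX root picks the larger of the two, so the second move is selected.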
Minimax Example
[Figure: MAX root over three MIN nodes. Leaves: 3, 12, 8 | 2, 4, 6 | 14, 5, 2. MIN values: 3, 2, 2.]
Minimax Implementation
• Complexity (b: branching factor, m: maximum depth of the tree)
  • Time: O(b^m)
  • Space: O(bm)
• Optimal
  • If the adversary is playing optimally (i.e., giving us the min value): Yes.
  • If the adversary is not playing optimally (i.e., not giving us the min value): No. Why? Minimax does not exploit the weaknesses of a suboptimal opponent.
[Figure: MAX root over two MIN nodes with leaf values 10, 10, 9, 100]
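The point about suboptimal opponents can be checked with a little arithmetic. The sketch below assumes that the figure's leaf values are grouped as (10, 10) under the first MIN node and (9, 100) under the second, and it introduces a hypothetical opponent that chooses uniformly at random; neither assumption is stated in the slides.

```python
from statistics import mean

# Assumed grouping of the leaf utilities under the two MIN nodes.
left, right = [10, 10], [9, 100]

# Against an optimal (minimizing) opponent, each branch is worth its minimum.
minimax_values = [min(left), min(right)]    # [10, 9]

# Against a hypothetical opponent that moves uniformly at random,
# each branch is worth the average of its leaves.
random_values = [mean(left), mean(right)]   # [10, 54.5]

minimax_choice = minimax_values.index(max(minimax_values))
random_choice = random_values.index(max(random_values))
print(minimax_choice, random_choice)        # 0 1
```

Minimax commits to the left branch for a guaranteed 10, while the right branch would average 54.5 against the random opponent. Minimax is safe but does not exploit the opponent's mistakes.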
[Figure: the minimax example tree with pruning. MIN values: 3, <=2, 2. Leaves examined: 3, 12, 8, 2, 14, 5, 2.]
Alpha-Beta Pruning: General Idea
• General Configuration (MIN version)
  • Consider computing the MIN-VALUE at some node n, examining n’s children.
  • n’s estimate of the children’s min is decreasing.
  • Who can use n’s value to make a choice? MAX
  • Let a be the best value that MAX can get at any choice point along the current path from the root.
  • If the value at n becomes worse than a, MAX will not pick this option, so we can stop considering n’s other children (any further exploration of children will only reduce the value further).
[Figure: path from the root through alternating MAX and MIN levels; a MAX node on the path has an alternative worth a, and the MIN node n lies deeper on the same path]
Alpha-Beta Pruning: General Idea
• General Configuration (MAX version)
  • Consider computing the MAX-VALUE at some node n, examining n’s children.
  • n’s estimate of the children’s max is increasing.
  • Who can use n’s value to make a choice? MIN
  • Let b be the lowest (best) value that MIN can get at any choice point along the current path from the root.
  • If the value at n becomes higher than b, MIN will not pick this option, so we can stop considering n’s other children (any further exploration of children will only increase the value further).
[Figure: path from the root through alternating MIN and MAX levels; a MIN node on the path has an alternative worth b, and the MAX node n lies deeper on the same path]
Pruning: Example
[Figure: pruning example, node annotations 8 and <=4]
[Figure: pruning example, node annotations 10, <=2, >=100, 2, 10]
Alpha-Beta Implementation
α: MAX’s best option on path to root
β: MIN’s best option on path to root
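With α and β defined this way, alpha-beta search can be sketched as follows. This is an illustrative version under an assumed encoding (a leaf is a number, an internal node is a list of children); the function name `alphabeta` and the use of `>=` and `<=` in the cutoff tests are choices made here, not taken from the slides.

```python
import math


def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping children that cannot matter.

    alpha: MAX's best option found so far on the path to the root.
    beta:  MIN's best option found so far on the path to the root.
    """
    if not isinstance(node, list):  # terminal node
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            if value >= beta:       # MIN above will never allow this node
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        if value <= alpha:          # MAX above already has something better
            return value
        beta = min(beta, value)
    return value


print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # 3
```

In the example call, the second MIN node is abandoned as soon as its first leaf (2) is seen, because MAX already has an option worth 3.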
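Alpha-Beta Pruning – Order of nodes matters
[Figure: third MIN node's leaves in the order 14, 5, 2. MIN values: 3, <=2, 2. Leaves examined: 3, 12, 8, 2, 14, 5, 2.]
[Figure: third MIN node's leaves in the order 2, 5, 14. MIN values: 3, <=2, <=2.]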
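The effect of ordering can be made concrete by recording which leaves alpha-beta actually examines. The sketch below is illustrative only: the nested-list tree encoding and the helper names are assumptions, and the leaves 4 and 6 under the second MIN node are taken from the deck's minimax example.

```python
import math


def alphabeta_count(node, is_max, alpha, beta, visited):
    """Alpha-beta value of `node`; appends each examined leaf to `visited`."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    value = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta_count(child, not is_max, alpha, beta, visited)
        if is_max:
            value = max(value, child_value)
            if value >= beta:
                break               # prune the remaining children
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            if value <= alpha:
                break               # prune the remaining children
            beta = min(beta, value)
    return value


def leaves_examined(tree):
    """Return (root value, list of leaves examined in order)."""
    visited = []
    value = alphabeta_count(tree, True, -math.inf, math.inf, visited)
    return value, visited


original = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
reordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]
print(leaves_examined(original))   # (3, [3, 12, 8, 2, 14, 5, 2])
print(leaves_examined(reordered))  # (3, [3, 12, 8, 2, 2])
```

Both orderings return the same root value, but placing the best reply for MIN (the leaf 2) first lets the third node be cut off after a single leaf.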
Alpha-Beta Pruning - Properties
1. Pruning has no effect on the minimax value at the root.
• Pruning does not affect the final action selected at the root.
2. A form of meta-reasoning (computing what to compute)
• Eliminates nodes that are irrelevant for the final decision.
3. The alpha-beta search cuts the largest amount off the tree when we
examine the best move first
• However, best moves are typically not known. Need to make estimates.
Alpha-Beta Pruning – Order of nodes matters
If the nodes were indeed encountered as “worst
moves first” – then no pruning is possible
[Figure panels: Minimax for Chess; Alpha-Beta for Chess]
• Solution:
• Depth-limited Search (H-Minimax)
• Search only to a limited depth (cutoff) in the tree
• Replace the terminal utilities with an evaluation function
for non-terminal positions.
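A depth-limited search along these lines might look like the sketch below. Everything game-specific is passed in as a function, and the demo uses a hypothetical evaluation function (the average of the leaves below a position) on a nested-list tree purely for illustration; real evaluation functions are game-specific and are not derived from the leaves.

```python
def h_minimax(state, depth, is_max, successors, is_terminal, utility, evaluate):
    """Minimax cut off at `depth` plies; `evaluate` scores non-terminal cutoffs."""
    if is_terminal(state):
        return utility(state)
    if depth == 0:                  # cutoff reached: estimate instead of searching
        return evaluate(state)
    values = [
        h_minimax(s, depth - 1, not is_max,
                  successors, is_terminal, utility, evaluate)
        for s in successors(state)
    ]
    return max(values) if is_max else min(values)


def leaves(tree):
    """All leaves below a nested-list tree (used only by the toy heuristic)."""
    if not isinstance(tree, list):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def average_leaf(tree):
    """Hypothetical evaluation function: mean of the leaves below `tree`."""
    vals = leaves(tree)
    return sum(vals) / len(vals)


def search(tree, depth):
    """Run h_minimax on a nested-list tree with MAX at the root."""
    return h_minimax(
        tree, depth, True,
        successors=lambda s: s,
        is_terminal=lambda s: not isinstance(s, list),
        utility=lambda s: s,
        evaluate=average_leaf,
    )


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(search(tree, 2))  # 3: deep enough to reach every terminal node
print(search(tree, 1))  # about 7.67: cut off early, so only an estimate
```

With a cutoff of one ply the root value is an estimate and can differ from the true minimax value, which is why the quality of the evaluation function matters.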
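[Figure: depth-limited tree; positions at the cutoff have unknown values (? ? ? ?), with the terminal nodes lying deeper in the tree]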
Evaluation Functions
• Evaluation functions score non-terminals in depth-limited search.
• Estimate the chances of winning.
• The value at a MIN node only keeps going down. Once the value of a MIN node is lower than a better option for MAX along the path to the root, we can prune.
[Figure: tree with leaf values 10, 10, 9, 100]
• Expectimax search:
• At chance nodes the outcome is uncertain
• Calculate the expected utilities: weighted average
(expectation) of children
Expectimax Search
def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)
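The dispatch above can be turned into a small runnable sketch. The node encoding is an assumption made for illustration: a leaf is a number, a MAX node is `("max", [children])`, and a chance node is `("exp", [(probability, child), ...])`. The example tree reuses the leaf values 10, 10, 9, 100 with uniform probabilities, which is also an assumption.

```python
def value(node):
    """Expectimax value: dispatch on the kind of node."""
    if isinstance(node, (int, float)):   # terminal state: its utility
        return node
    kind, children = node
    if kind == "max":
        return max_value(children)
    if kind == "exp":
        return exp_value(children)
    raise ValueError(f"unknown node kind: {kind!r}")


def max_value(children):
    """MAX node: the best child value."""
    return max(value(child) for child in children)


def exp_value(weighted_children):
    """Chance node: probability-weighted average of the child values."""
    return sum(p * value(child) for p, child in weighted_children)


tree = ("max", [
    ("exp", [(0.5, 10), (0.5, 10)]),
    ("exp", [(0.5, 9), (0.5, 100)]),
])
print(value(tree))  # 54.5
```

Unlike minimax, which would value the second branch at 9, expectimax values it at 54.5 and prefers it.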
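[Figure: expectimax example trees with leaf values 3, 12, 9, 2, 4, 6, 15, 6, 0 and 3, 12, 9, 2]
[Figure: depth-limited expectimax; node values 400, 300, … and 492, 362, …, labelled "Estimate of true expectimax value"]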
Multiple players and other games
• Other games: non zero-sum, or multiple players
• Generalization of minimax:
• Terminals have utility tuples
• Node values are also utility tuples
• Each player maximizes its own component
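This generalization can be sketched as follows. The encoding (a terminal is a tuple of utilities, an internal node is a list of children, players move in round-robin order) and the three-player example values are hypothetical and chosen only for illustration.

```python
def maxn(node, player=0, num_players=3):
    """Back up utility tuples; the player to move maximizes its own component."""
    if isinstance(node, tuple):          # terminal: one utility per player
        return node
    next_player = (player + 1) % num_players
    child_values = [maxn(child, next_player, num_players) for child in node]
    # Ties are broken in favor of the first child with the maximal component.
    return max(child_values, key=lambda utilities: utilities[player])


# Hypothetical three-player game: player 0 moves at the root,
# player 1 one level down, player 2 just above the terminals.
tree = [
    [[(1, 6, 6), (7, 1, 2)], [(6, 1, 2), (7, 2, 1)]],
    [[(5, 1, 7), (1, 5, 2)], [(7, 7, 1), (5, 2, 5)]],
]
print(maxn(tree))  # (5, 2, 5)
```

Because each player looks only at its own component, the backed-up tuple is not necessarily the one with the largest total, and behavior that looks cooperative or competitive can emerge from the utilities alone.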
Probabilities (Recap)
• A random variable represents an event whose outcome is unknown
• A probability distribution is an assignment of weights to outcomes
• Example: Traffic on freeway
• Random variable: T = whether there’s traffic
• Outcomes: T in {none, light, heavy}
• Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
• Some laws of probability:
• Probabilities are always non-negative
• Probabilities over all possible outcomes sum to one
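As a quick check of these laws on the traffic example, and of the weighted average used at chance nodes in expectimax, consider the sketch below. The travel times attached to each outcome are hypothetical values introduced here for illustration; they do not come from the slides.

```python
import math

# Distribution from the traffic example.
traffic = {"none": 0.25, "light": 0.50, "heavy": 0.25}


def is_valid_distribution(dist):
    """Check both laws: non-negative weights that sum to one."""
    return (all(p >= 0 for p in dist.values())
            and math.isclose(sum(dist.values()), 1.0))


def expectation(dist, payoff):
    """Probability-weighted average of `payoff` over the outcomes."""
    return sum(p * payoff[outcome] for outcome, p in dist.items())


# Hypothetical travel times in minutes for each outcome (an assumption).
travel_time = {"none": 20, "light": 30, "heavy": 60}

print(is_valid_distribution(traffic))    # True
print(expectation(traffic, travel_time)) # 35.0
```

The expected travel time is 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 minutes under these assumed times.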
• Providing utilities
  • In a game, may be simple (+1/-1)
  • Utilities summarize the agent’s goals
[Figure: “Getting ice cream” example with the options Get Single and Get Double]