
COL333/671: Introduction to AI

Semester I, 2022-23

Adversarial Search

Rohan Paul

Outline
• Last Class
  • Constraint Satisfaction
• This Class
  • Adversarial Search
• Reference Material
  • AIMA Ch. 5 (Sec. 5.1-5.5)

Acknowledgement
These slides are intended for teaching purposes only. Some material
has been used/adapted from web sources and from slides by Doina
Precup, Dorsa Sadigh, Percy Liang, Mausam, Dan Klein, Anca
Dragan, Nicholas Roy and others.

Game Playing and AI
• Games: challenging decision-making problems.
• Incorporate the state of the other agent in your decision-making. This leads to a vast number of possibilities.
• Long duration of play; the win comes only at the end.
• Time limits: we do not have time to compute optimal solutions.
Games: Characteristics
• Axes:
  • Players: one, two or more.
  • Actions (moves): deterministic or stochastic.
  • States: fully known or not.
• Zero-Sum Games
  • Adversarial: agents have opposite utilities (values on outcomes).
• Core: a contingency problem
  • The opponent’s move is not known ahead of time. A player must respond with a move for every possible opponent reply.
• Output
  • Calculate a strategy (policy) which recommends a move from each state.
Playing Tic-Tac-Toe: Essentially a search problem!

At terminal nodes we get -1, 0 or +1 for a loss, tie or win. Think of this value as a “utility” of the state.
Slide adapted from Dan Klein and from Mausam
Single-Agent Trees

(Figure: a single-agent search tree with terminal utilities 2, 0, …, 2, 6, …, 4, 6.)
Computing “utility” of states to decide actions
• Value of a state: the best achievable outcome (utility) from that state.
• Non-terminal states: value computed from the values of the successor states.
• Terminal states: value given directly by the game.
(Figure: the same tree, with terminal utilities 2, 0, …, 2, 6, …, 4, 6.)
Game Trees: Presence of an Adversary

(Figure: a game tree with an adversary; terminal utilities -20, -8, …, -18, -5, …, -10, +4, -20, +8.)

The adversary’s actions are not in our control. Plan as a contingency considering all possible actions taken by the adversary.
Minimax Values
(Figure: MAX nodes are states under the agent’s control; MIN nodes are states under the opponent’s control; the values -8, -5, -10, +8 are backed up from the terminal states.)
Adversarial Search (Minimax)
• Consider a deterministic, zero-sum game
• Tic-tac-toe, chess etc.
• One player maximizes result and the other minimizes result.

• Minimax Search
• Search the game tree for best moves.
• Select optimal actions that move to a position with the highest minimax
value.
• What is the minimax value?
• It is the best achievable utility against the optimal (rational) adversary.
• Best achievable payoff against the best play by the adversary.
Minimax Algorithm
• Ply and Move
  • Move: when an action has been taken by both players.
  • Ply: a half move.
• Backed-up value (minimax values are computed recursively)
  • Of a MAX-position: the value of its largest successor.
  • Of a MIN-position: the value of its smallest successor.
• Minimax algorithm
  • Search down the tree to the terminal nodes.
  • At the bottom level, apply the utility function (the terminal values are part of the game).
  • Back the values up to the root along the search path (computing them according to the min and max nodes).
  • The root node selects the action.
(Figure: terminal values 8, 2, 5, 6; the MIN nodes back up 2 and 5; the MAX root backs up 5.)
Minimax Example

(Figure: a MAX root over three MIN nodes; leaves 3, 12, 8; 2, 4, 6; 14, 5, 2. The MIN nodes back up 3, 2, 2 and the root’s minimax value is 3.)
Minimax Implementation

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
Minimax Implementation
def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

This dispatch form is useful when there are multiple adversaries (e.g., several MIN agents moving in turn).
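Below is a minimal runnable sketch of this recursion in Python. The tree representation (a leaf is a utility value, an internal node is a list of successors, players strictly alternate) is an assumption for illustration, not part of the slides.

def minimax_value(node, is_max):
    # Terminal state: return the state's utility.
    if isinstance(node, (int, float)):
        return node
    # Otherwise back up the values of the successors.
    children = [minimax_value(child, not is_max) for child in node]
    return max(children) if is_max else min(children)

# The tree from the minimax example slide: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert minimax_value(tree, is_max=True) == 3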


Minimax Properties
• Completeness
  • Yes.
• Complexity
  • Time: O(b^m)
  • Space: O(bm)
• Requires growing the tree all the way to the terminal nodes, which is not feasible in practice for a game like chess (roughly b ≈ 35 and m ≈ 100).
Minimax Properties
• Optimal?
  • If the adversary is playing optimally (i.e., giving us the min value): yes.
  • If the adversary is not playing optimally (i.e., not giving us the min value): no. Why? Minimax does not exploit the opponent’s weaknesses against a suboptimal opponent.
(Figure: you play circle, the opponent plays cross; a MAX node over two MIN nodes with terminal values 10, 10 and 9, 100. What if MIN returns 9? Or 100?)


Is it necessary to examine all values in the tree?
(Figure: the earlier example tree with backed-up values 3, ≤2, 2. Once the leaf 2 is seen, the middle MIN node is known to be ≤ 2, already worse for MAX than the first branch’s 3, so its remaining leaves need not be examined. Leaves shown: 3, 12, 8; 2; 14, 5, 2.)
Alpha-Beta Pruning: General Idea
• General configuration (MIN version)
  • Consider computing the MIN-VALUE at some node n, examining n’s children.
  • n’s estimate of the children’s min is decreasing as children are examined.
  • Who can use n’s value to make a choice? MAX.
  • Let a be the best value that MAX can get at any choice point along the current path from the root.
  • If the value at n becomes worse than a, MAX will not pick this option, so we can stop considering n’s other children (any further exploration would only reduce n’s value further).
(Figure: a root-to-n path alternating MAX and MIN nodes, with a the best MAX option above n.)
Alpha-Beta Pruning: General Idea
• General configuration (MAX version)
  • Consider computing the MAX-VALUE at some node n, examining n’s children.
  • n’s estimate of the children’s max is increasing as children are examined.
  • Who can use n’s value to make a choice? MIN.
  • Let b be the lowest (best) value that MIN can get at any choice point along the current path from the root.
  • If the value at n becomes higher than b, MIN will not pick this option, so we can stop considering n’s other children (any further exploration would only increase n’s value further).
(Figure: a root-to-n path alternating MIN and MAX nodes, with b the best MIN option above n.)
Pruning: Example
(Figures: two worked examples. In the first, one MIN child of the root evaluates to 8, and a sibling MIN node is bounded by ≤ 4 after its first leaf, so its remaining children are pruned. In the second, the root value is 10; one MIN branch is bounded by ≤ 2 and a nested MAX node by ≥ 100, and both are cut off early.)
Alpha-Beta Implementation
α: MAX’s best option on path to root
β: MIN’s best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v
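A runnable Python sketch of the same pseudocode, on the hypothetical tree representation used earlier (leaves are utilities, internal nodes are lists of successors):

import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):
        return node                      # terminal: return its utility
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:                # MIN above will never allow this node
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            if v <= alpha:               # MAX above will never allow this node
                return v
            beta = min(beta, v)
    return v

# On the example tree, the leaves 4 and 6 are never examined.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree, is_max=True) == 3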
Alpha-Beta Pruning - Properties
1. Pruning has no effect on the minimax value at the root.
• Pruning does not affect the final action selected at the root.
2. A form of meta-reasoning (computing what to compute)
• Eliminates nodes that are irrelevant for the final decision.

Alpha-Beta Pruning – Order of nodes matters

(Figure: the example tree explored left to right; leaves 3, 12, 8; 2; 14, 5, 2; backed-up values 3, ≤2, 2. One cutoff occurs at the middle MIN node.)
Alpha-Beta Pruning – Order of nodes matters

(Figure: the same tree with the leaves reordered as 3, 12, 8; 2; 2, 5, 14; backed-up values 3, ≤2, ≤2. Both later MIN nodes are cut off after their first leaf.)
Alpha-Beta Pruning - Properties
1. Pruning has no effect on the minimax value at the root.
• Pruning does not affect the final action selected at the root.
2. A form of meta-reasoning (computing what to compute)
• Eliminates nodes that are irrelevant for the final decision.
3. Alpha-beta search cuts the largest amount off the tree when we examine the best move first.
  • However, the best moves are typically not known in advance; we need to make estimates.
Alpha-Beta Pruning – Order of nodes matters
If the nodes were encountered as “worst moves first”, then no pruning is possible.

If the nodes were encountered as “best moves first”, then pruning is possible.

Note: in reality, we don’t know the ordering in advance.
Slide adapted from Prof. Mausam
Alpha-Beta Pruning - Properties
1. Pruning has no effect on the minimax value at the root.
• Pruning does not affect the final action selected at the root.
2. A form of meta-reasoning (computing what to compute)
• Eliminates nodes that are irrelevant for the final decision.
3. Alpha-beta search cuts the largest amount off the tree when we examine the best move first.
  • Problem: the best moves are typically not known in advance.
  • Solution: perform iterative deepening search and evaluate the states.
4. Time complexity
  • Best ordering: O(b^(m/2)). This can double the search depth for the same resources.
  • On average: O(b^(3m/4)), if we expect to find the min or max after b/2 expansions.
Minimax for Chess. Alpha-Beta for Chess.
(Figure slides comparing the two for chess; adapted from Prof. Mausam.)
Cutting-off Search
• Problem (resource constraint):
  • Minimax search builds the full tree down to the terminal nodes.
  • Alpha-beta prunes the tree but still searches to the terminal nodes.
  • We cannot afford to search all the way to the terminal nodes.
• Solution:
  • Depth-limited search (H-Minimax).
  • Search only to a limited depth (the cutoff) in the tree.
  • Replace the terminal utilities with an evaluation function for non-terminal positions.
(Figure: a tree cut off above its terminal nodes; evaluations -1, -2, 4, 9 at the cutoff back up through the MIN nodes to -2 and 4, giving the MAX root 4, while the true terminal nodes below remain unexamined.)
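A short Python sketch of depth-limited minimax (H-Minimax). The game interface (is_terminal, utility, successors) and the evaluation function eval_fn are hypothetical placeholders for whatever game representation is used:

def h_minimax(state, depth, is_max, game, eval_fn):
    if game.is_terminal(state):
        return game.utility(state)       # true terminal value
    if depth == 0:
        return eval_fn(state)            # cutoff: estimate instead of searching on
    values = [h_minimax(s, depth - 1, not is_max, game, eval_fn)
              for s in game.successors(state)]
    return max(values) if is_max else min(values)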
Evaluation Functions
• Evaluation functions score non-terminals in depth-limited search.
  • They estimate the chances of winning.
• Ideal function: returns the actual minimax value of the position.
• In practice: typically a weighted linear sum of features:
  Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
  • e.g. fi(s) = (number of pieces of type i), with a weight wi for each feature. An example follows below.
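As an illustration, a weighted linear evaluation based on material difference. The piece weights are the conventional chess material values, used here only as an example:

PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(own_counts, opp_counts):
    # Eval(s) = sum_i w_i * f_i(s), with f_i the material difference
    # for piece type i.
    return sum(w * (own_counts[p] - opp_counts[p])
               for p, w in PIECE_WEIGHTS.items())

# Up a rook, down a pawn: 5 - 1 = 4.
print(evaluate({"pawn": 7, "knight": 2, "bishop": 2, "rook": 2, "queen": 1},
               {"pawn": 8, "knight": 2, "bishop": 2, "rook": 1, "queen": 1}))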


Evaluation Functions and Alpha-Beta
• Evaluation functions are always imperfect.
• The value at a MIN node only keeps going down. Once the value of a MIN node drops below a better option for MAX along the path to the root, we can prune.
• The evaluation function can guide pruning:
  • If the evaluation function provides an upper bound on the value at a MIN node, and that upper bound is already lower than a better option for MAX along the path to the root, then we can prune.
Determining “good” node orderings
• The ordering of nodes determines how much alpha-beta can prune.
  • Worst ordering: O(b^m). Best ordering: O(b^(m/2)).
• How to find good orderings?
  • Problem: we only know them once we evaluate the nodes.
• One approach: iterative deepening to obtain evaluations for nodes (see the sketch below).
  • Do iterative deepening to a certain depth, apply the evaluation function at that depth, and compute the values for the nodes in the generated tree.
  • On the next, deeper search, use the evaluations from the previous search to order the nodes, and use them for pruning.
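A sketch of the ordering step, assuming a hypothetical cached_values dictionary that stores node evaluations from the previous, shallower iterative-deepening pass:

import math

def ordered_successors(successors, cached_values, is_max):
    # Most promising moves first: high values first at MAX nodes,
    # low values first at MIN nodes; unseen states go last.
    default = -math.inf if is_max else math.inf
    return sorted(successors,
                  key=lambda s: cached_values.get(s, default),
                  reverse=is_max)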
Incorporating Chance: Expectimax Search
• When the result of an action is not known, incorporate a notion of chance.
• Include chance nodes:
  • Unpredictable opponents: the ghosts move randomly in Pacman.
  • Explicit randomness: a player rolling dice in a game.
• Expectimax search:
  • At chance nodes the outcome is uncertain.
  • Calculate expected utilities: the weighted average (expectation) of the children.
(Figure: a MAX root over chance nodes, with leaf values 10, 10 and 9, 100.)
Expectimax Search
def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
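A runnable Python sketch of expectimax, again on a hypothetical tree representation: a leaf is a utility, a MAX node is a list of children, and a chance node is a list of (probability, child) pairs:

def expectimax(node, is_max):
    if isinstance(node, (int, float)):
        return node                          # terminal utility
    if is_max:
        return max(expectimax(child, False) for child in node)
    # Chance node: probability-weighted average of the children.
    return sum(p * expectimax(child, True) for p, child in node)

# The worked example below: (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10.
chance = [(1/2, 8), (1/3, 24), (1/6, -12)]
print(expectimax([chance], is_max=True))     # 10.0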
Expectimax Search
(Figure: a chance node with successor probabilities 1/2, 1/3, 1/6 and successor values 8, 24, -12.)
Applying exp-value from above:
v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
Expectimax Search
(Figure: expectimax trees with leaf values 3, 12, 9, 2, 4, 6, 15, 6, 0 and 3, 12, 9, 2.)

Can we perform pruning? In general, no: the expected value at a chance node depends on all of its children, so none can be skipped unless we have bounds on the possible leaf values.
Depth-Limited Expectimax
• A depth limit can also be applied in expectimax search.
• Use heuristics to estimate the values at the depth limit.
(Figure: estimates such as 400 and 300 at the depth limit stand in for the true expectimax values 492 and 362.)
Multiple players and other games
• Other games: non-zero-sum, or with multiple players.
• Generalization of minimax (a sketch follows below):
  • Terminals have utility tuples.
  • Node values are also utility tuples.
  • Each player maximizes its own component.
(Figure: a three-player tree with terminal utility tuples 1,6,6; 7,1,2; 6,1,2; 7,2,1; 5,1,7; 1,5,2; 7,7,1; 5,2,5.)
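A Python sketch of this generalization: values are utility tuples and the player to move picks the successor that maximizes its own component. The tree encoding (tuples at leaves, lists at internal nodes) is assumed for illustration:

def multiplayer_value(node, player, num_players):
    if isinstance(node, tuple):
        return node                          # terminal: a utility tuple
    nxt = (player + 1) % num_players
    children = [multiplayer_value(c, nxt, num_players) for c in node]
    # The player to move maximizes its own component of the tuple.
    return max(children, key=lambda u: u[player])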


“Games are to AI as grand prix is to automobile design”
Games have long been viewed as an indicator of intelligence.

Probabilities (Recap)
• A random variable represents an event whose outcome is unknown.
• A probability distribution is an assignment of weights to outcomes.
• Example: traffic on the freeway.
  • Random variable: T = whether there’s traffic.
  • Outcomes: T in {none, light, heavy}.
  • Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25.
• Some laws of probability:
  • Probabilities are always non-negative.
  • Probabilities over all possible outcomes sum to one.
• As we get more evidence, probabilities may change:
  • P(T=heavy) = 0.25, but P(T=heavy | Hour=8am) = 0.60.
  • Methods for reasoning about and updating probabilities come later.
Expectations (Recap)
• The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes.
• Example: how long to get to the airport?
  • Time: 20 min with probability 0.25, 30 min with probability 0.50, 60 min with probability 0.25.
  • E[Time] = 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
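The same expectation computed directly in Python:

# Each outcome is a (probability, travel time in minutes) pair.
outcomes = [(0.25, 20), (0.50, 30), (0.25, 60)]
print(sum(p * t for p, t in outcomes))   # 35.0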
Probabilities for Expectimax
• In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state.
  • The model could be a simple uniform distribution (rolling a die).
  • The model could be sophisticated and require a great deal of computation; it might say that adversarial actions are likely.
• For now, assume each chance node magically comes with probabilities that specify the distribution over its outcomes (formal ways to obtain them come later).
Utilities and Decision-making
• Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences.
• Providing utilities:
  • In a game, they may be simple (+1/-1).
  • Utilities summarize the agent’s goals.
• We specify the utilities for a task and let the behaviour emerge from the actions.
(Figure: getting ice cream; a choice between “Get Single” and “Get Double”, with outcomes “Oops” and “Whew!”.)
Maximum Expected Utility
• Maximum expected utility (MEU) principle: choose the action that maximizes expected utility.
  • The agent can end up in several states, each with some probability; utilities map states to values. Compute the expectation.
  • We try to build models that maximize the expected utility (see the sketch below).
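A minimal sketch of the MEU rule in Python, assuming a hypothetical outcome model (mapping an action to a list of (probability, state) pairs) and a utility function over states:

def best_action(actions, outcomes, utility):
    def expected_utility(a):
        return sum(p * utility(s) for p, s in outcomes(a))
    # Choose the action with the highest expected utility.
    return max(actions, key=expected_utility)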
