ai lecture-4
Game Playing State-of-the-Art
Checkers:
1950: First computer player
1959: Samuel’s self-taught program
1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame database.
2007: Checkers solved!
Chess:
1945-1960: Zuse, Wiener, Shannon, Turing,
Newell&Simon, McCarthy
1960-1996: gradual improvements
1997: Deep Blue defeats human champion Garry Kasparov in a six-game match
2024: Stockfish rating 3631 (vs 2847 for Magnus Carlsen)
Go:
1968: Zobrist’s program plays legal Go, barely (b>300!)
1968-2005: various ad hoc approaches tried, novice level
2005-2014: Monte Carlo tree search -> strong amateur
2016-2017: AlphaGo defeats human world champions
2022: human exploits NN weakness to defeat top Go programs
Pacman:
Types of Games
Axes:
Zero sum?
Deterministic or stochastic?
One, two, or more players?
Perfect information (can you see the state)?
Value of a State
Value of a state: the best achievable outcome (utility) from that state
[Figure: game tree; non-terminal states above, terminal states below with utilities 2, 0, …, 2, 6, …, 4, 6]
Adversarial Game Trees
[Figure: adversarial game tree with terminal state values -8, -5, -10, +8]
Tic-Tac-Toe Game Tree
Adversarial Search (Minimax)
Deterministic, zero-sum games:
Tic-tac-toe, chess, checkers
One player maximizes result
The other minimizes result
Minimax search:
A state-space search tree
Players alternate turns
Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
Minimax values: computed recursively
[Figure: max node with value 5 over min nodes with values 2 and 5; terminal values 8, 2, 5, 6 are part of the game]
Minimax Implementation
[Figure: minimax tree with terminal values 3, 12, 8, 2, 4, 6, 14, 5, 2]
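The recursion can be sketched in a few lines of Python; the `Node` class below is a hypothetical stand-in for a real game state (its interface is an assumption, not from the slides):

```python
# Minimax sketch. The Node class and its fields are illustrative, not the
# lecture's actual code: a real game state would generate successors lazily.

class Node:
    """Minimal game-tree node standing in for a real game state."""
    def __init__(self, to_move, children=(), value=None):
        self.player = to_move          # "MAX" or "MIN"
        self.children = list(children)
        self.value = value             # utility, set only at leaves

def minimax_value(state):
    """Best achievable utility against a rational (optimal) adversary."""
    if not state.children:             # terminal state: utility is given
        return state.value
    values = [minimax_value(c) for c in state.children]
    return max(values) if state.player == "MAX" else min(values)

# The tree from the slide: terminal values 3 12 8 | 2 4 6 | 14 5 2
leaf = lambda v: Node("MAX", value=v)
root = Node("MAX", [Node("MIN", [leaf(v) for v in vs])
                    for vs in ([3, 12, 8], [2, 4, 6], [14, 5, 2])])
print(minimax_value(root))  # -> 3 (branch minima 3, 2, 2; max of those is 3)
```

Note the depth-first structure: each node's value is fully determined by its children's values, which is what makes the alpha-beta pruning below possible.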
Minimax Properties
[Figure: two minimax trees with max and min layers; terminal values 10, 10, 9, 100 and 3, 12, 8, 2, 4, 6, 14, 5, 2]
Minimax Pruning
[Figure: minimax tree after pruning; remaining terminal values 3, 12, 8, 2, 14, 5, 2]
Alpha-Beta Pruning
General configuration (MIN version)
We're computing the MIN-VALUE at some node n
We're looping over n's children
n's estimate of the children's min is dropping
Who cares about n's value? MAX
Let a be the best value that MAX can get at any choice point along the current path from the root
If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
[Figure: alternating MAX/MIN path from the root down to MIN node n, with values 10, 10 and bounds <=2, >=100, 2]
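The rule above (and its symmetric MAX version) can be sketched as follows; as before, the `Node` class is an illustrative stand-in for a real game state:

```python
# Alpha-beta sketch. alpha = best value MAX can force on the path so far,
# beta = best value MIN can force on the path so far. Node is illustrative.
import math

class Node:
    def __init__(self, to_move, children=(), value=None):
        self.player = to_move          # "MAX" or "MIN"
        self.children = list(children)
        self.value = value             # leaf utility

def alphabeta(state, alpha=-math.inf, beta=math.inf):
    """Minimax value of state, skipping children that cannot matter."""
    if not state.children:
        return state.value
    if state.player == "MAX":
        v = -math.inf
        for c in state.children:
            v = max(v, alphabeta(c, alpha, beta))
            if v >= beta:          # MIN above will never let play reach here
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for c in state.children:
            v = min(v, alphabeta(c, alpha, beta))
            if v <= alpha:         # MAX above will avoid this node
                return v
            beta = min(beta, v)
    return v

leaf = lambda v: Node("MAX", value=v)
root = Node("MAX", [Node("MIN", [leaf(v) for v in vs])
                    for vs in ([3, 12, 8], [2, 4, 6], [14, 5, 2])])
print(alphabeta(root))  # -> 3, same answer as plain minimax
```

On this tree the leaves 4 and 6 in the second branch are never visited: once the second MIN node's estimate drops to 2, it is already worse than the 3 MAX can get from the first branch.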
Resource Limits
Problem: In realistic games, cannot search to leaves!
Example:
Suppose we have 100 seconds, can explore 10K nodes/sec
So can check 1M nodes per move
Reaches about depth 8: decent chess program
Guarantee of optimal play is gone
More plies makes a BIG difference
Use iterative deepening for an anytime algorithm
[Figure: depth-limited tree with max node value 4; unexplored subtrees marked "?"]
Evaluation Functions
Evaluation functions score non-terminals in depth-limited search
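A common concrete form is a weighted linear sum of features, Eval(s) = w1*f1(s) + w2*f2(s) + …; the features, weights, and dict-based `state` below are illustrative assumptions, not from the lecture:

```python
# Hypothetical evaluation function: a weighted linear sum of features.
# The feature set, weights, and dict-based state are illustrative only.

def evaluate(state, weights, features):
    """Score a non-terminal state as sum_i w_i * f_i(state)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Example: chess material balance from White's point of view.
features = [
    lambda s: s["wQ"] - s["bQ"],   # queen difference
    lambda s: s["wR"] - s["bR"],   # rook difference
    lambda s: s["wP"] - s["bP"],   # pawn difference
]
weights = [9.0, 5.0, 1.0]          # classical material values

state = {"wQ": 1, "bQ": 1, "wR": 2, "bR": 1, "wP": 6, "bP": 7}
print(evaluate(state, weights, features))  # -> 4.0 (up a rook, down a pawn)
```

The evaluation function replaces true terminal utilities at the depth limit, so its quality directly bounds the quality of depth-limited play.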
Iterative Deepening
Iterative deepening using Minimax (or AlphaBeta) as subroutine: until we run out of time:
1. Do a Minimax up to depth 1, using evaluation function at depth 1
2. Do a Minimax up to depth 2, using evaluation function at depth 2
3. Do a Minimax up to depth 3, using evaluation function at depth 3
4. Do a Minimax up to depth 4, using evaluation function at depth 4
…
Result: even though the 2 ghosts independently run their own MiniMiniMax search, they will naturally coordinate because:
They optimize the same objective
They know they optimize the same objective (i.e. they know the other ghost is also a minimizer)
[Figure: search tree alternating max and min layers, branching factor b]
Summary
Games are decision problems with 2 or more agents
Huge variety of issues and phenomena depending on details of interactions and payoffs
For zero-sum games, optimal decisions defined by minimax
Implementable as a depth-first traversal of the game tree
Time complexity O(b^m), space complexity O(bm)
Alpha-beta pruning
Preserves optimal choice at the root
alpha/beta values keep track of best obtainable values from any max/min nodes on path from root to current node
Time complexity drops to O(b^(m/2)) with ideal node ordering
Exact solution is impossible even for “small” games like chess
Evaluation function
Iterative deepening (i.e. go as deep as time allows)
Emergence of coordination:
For 3 or more agents (all MIN or MAX agents), coordination will naturally emerge from each agent independently optimizing its actions through search, as long as each agent knows, for every other agent, whether that agent is MIN or MAX
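The coordination claim can be seen in a tiny sketch: one MAX agent followed by two independent MIN agents ("ghosts") who share the objective. Each MIN layer just minimizes the backed-up value, so the minimizers end up coordinating without communicating. The tree, tuple encoding, and player order below are illustrative:

```python
# Multi-agent minimax sketch: players take turns in a fixed cycle, and each
# layer maximizes or minimizes the same backed-up value. A state is encoded
# as (children, utility) with utility set only at leaves (illustrative).

def multi_agent_value(state, players, turn=0):
    """players, e.g. ["MAX", "MIN", "MIN"]; turns cycle through the list."""
    children, utility = state
    if not children:
        return utility
    values = [multi_agent_value(c, players, (turn + 1) % len(players))
              for c in children]
    return max(values) if players[turn] == "MAX" else min(values)

leaf = lambda u: ([], u)
node = lambda *kids: (list(kids), None)

# MAX moves, then ghost 1 (MIN), then ghost 2 (MIN) at the leaves.
tree = node(node(node(leaf(4), leaf(8)), node(leaf(6), leaf(2))),
            node(node(leaf(5), leaf(9)), node(leaf(7), leaf(3))))
print(multi_agent_value(tree, ["MAX", "MIN", "MIN"]))  # -> 3
```

Neither ghost consults the other, yet the backed-up value (3) is exactly what two communicating minimizers would achieve, because both minimize the same quantity and each assumes the other does too.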