
Game Theory Lecture #14

Outline:

• Multiagent learning
• Regret matching
• Fictitious play
Single Agent Learning

• Setup:
– Two players: Player 1 vs. Nature
– Action sets: A1 and AN
– Payoffs: U: A1 × AN → R

                     Nature
                 Rain   No Rain
P1  Umbrella       1       0
    No umbrella    0       1

            Player 1’s Payoff

• Player repeatedly interacts with nature


– Player’s action on day t: a1(t)
– Nature’s action on day t: aN(t)
– Payoff on day t: U(a1(t), aN(t))
• Goal: Implement strategy that provides desirable guarantees with regard to average
performance
• Case 1: Stationary environment

– Nature’s choice according to a non-adaptive (fixed) prob distribution pN ∈ ∆(AN)


– Theory available to optimize average performance, e.g., reinforcement learning

• Case 2: Non-stationary environment

– Nature’s choice according to adaptive prob distribution, i.e., pN(t) ≠ pN(t − 1)


– In general, pN (t) = f (a1 (0), ..., a1 (t − 1), aN (0), ..., aN (t − 1))
– One choice: aN (t) = βN (a1 (t − 1)) (assume zero sum game)

• Question: Is a player’s environment stationary or non-stationary in a game?

Single agent learning (cont)

• Challenge: Hard to predict what nature is going to do


• Previous direction: Optimize worst-case payoffs (e.g., security strategies)
• Problem: Derived strategies might be highly inefficient given behavior of nature
• Example:

                        Nature
                 Rain   No Rain   Thunder
    Umbrella       1       0         0
P1  No umbrella    0       1         0
    Jacket        0.1     0.1       0.1

              Player 1’s Payoff
– What is security strategy?
– What is security level?
– How would answers change if there was never any thunder?

• Fact: Security strategies and values can be highly influenced by “rare” actions
• Are there “online” policies that can provide potentially better performance guarantees?
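
• As a check on the questions above, here is a minimal sketch (plain Python) that brute-forces a security strategy for the table just given by searching a grid over the player's mixed strategies. The action ordering and the grid step are choices made for this sketch, not part of the notes.

```python
import itertools

# Player 1's payoffs from the table above:
# rows = (Umbrella, No umbrella, Jacket), columns = (Rain, No Rain, Thunder).
U = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.1, 0.1, 0.1]]

def worst_case(p, payoff):
    """Expected payoff of mixed strategy p against nature's worst choice."""
    return min(sum(p[i] * payoff[i][j] for i in range(len(p)))
               for j in range(len(payoff[0])))

def security_by_grid(payoff, step=0.01):
    """Grid search over the simplex for an (approximate) max-min strategy."""
    n = int(round(1 / step))
    best_p, best_v = None, float("-inf")
    for i, j in itertools.product(range(n + 1), repeat=2):
        if i + j > n:
            continue
        p = (i * step, j * step, 1.0 - (i + j) * step)
        v = worst_case(p, payoff)
        if v > best_v:
            best_p, best_v = p, v
    return best_p, best_v

print(security_by_grid(U))                       # with the Thunder column
print(security_by_grid([row[:2] for row in U]))  # if there were never any thunder
```

Running it once with the Thunder column and once without shows how much the security strategy and security level can shift when a "rare" action is removed.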

What about regret?

• New direction: Can a player optimize “what if” scenarios?


• Definition: Player’s average payoff at day t

  Ū(t) = (1/t) ∑_{τ=1}^{t} U(a1(τ), aN(τ))

• Definition: Player’s perceived average payoff at day t if committed to fixed action and
nature was unchanged

  v̄^{a1}(t) = (1/t) ∑_{τ=1}^{t} U(a1, aN(τ))

• Definition: Player’s regret at day t for not having used action a1

  R̄^{a1}(t) = v̄^{a1}(t) − Ū(t)

• Example:

Day 1 2 3 4 5 6 ...
Player’s Decision NU U NU U NU NU ...
Nature’s Decision R NR R R NR R ...
Payoff 0 0 0 1 1 0 ...

– Ū(6)?
– v̄^U(6)?
– v̄^{NU}(6)?
– R̄^U(6)?
– R̄^{NU}(6)?
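
• A minimal sketch (plain Python) that evaluates these definitions on the six-day history above; the payoff table is the Rain/No Rain umbrella example from the start of the lecture.

```python
# Payoff table U[(player action, nature action)] and the six-day history above.
U = {("U", "R"): 1, ("U", "NR"): 0, ("NU", "R"): 0, ("NU", "NR"): 1}
player = ["NU", "U", "NU", "U", "NU", "NU"]
nature = ["R", "NR", "R", "R", "NR", "R"]

def avg_payoff(t):
    """Ū(t): average realized payoff over the first t days."""
    return sum(U[(player[k], nature[k])] for k in range(t)) / t

def perceived(a, t):
    """v̄^a(t): average payoff had the player always chosen action a."""
    return sum(U[(a, nature[k])] for k in range(t)) / t

def regret(a, t):
    """R̄^a(t) = v̄^a(t) − Ū(t)."""
    return perceived(a, t) - avg_payoff(t)

t = 6
print(avg_payoff(t), perceived("U", t), perceived("NU", t))
print(regret("U", t), regret("NU", t))
```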

Regret Matching

• Positive regret = Player could have done something better in hindsight


• Q: Is it possible to make positive regret vanish asymptotically “irrespective” of nature?
• Consider the strategy Regret Matching: At day t play strategy p(t) ∈ ∆(A1)

  p^U(t+1) = [R̄^U(t)]+ / ([R̄^U(t)]+ + [R̄^{NU}(t)]+)

  p^{NU}(t+1) = [R̄^{NU}(t)]+ / ([R̄^U(t)]+ + [R̄^{NU}(t)]+)

• Notation: [·]+ is projection to positive orthant, i.e., [x]+ = max{x, 0}


• Strategy generalizes to more than two actions
• Fact: Positive regret asymptotically vanishes irrespective of nature

  [R̄^U(t)]+ → 0 and [R̄^{NU}(t)]+ → 0

• Example revisited:

Day 1 2 3 4 5 6 ...
Player’s Decision NU U NU U NU NU ...
Nature’s Decision R NR R R NR R ...
Payoff 0 0 0 1 1 0 ...
– Regret matching strategy day 2?
– Regret matching strategy day 3?
– Regret matching strategy day 4?
– Regret matching strategy day 5?
– Regret matching strategy day 6?
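
• A minimal sketch of the two-action regret matching update, replayed on the example history above (plain Python). The uniform fallback when neither regret is positive is a convention assumed in this sketch; the notes do not specify the strategy in that case.

```python
# Same payoff table and six-day history as in the previous sketch.
U = {("U", "R"): 1, ("U", "NR"): 0, ("NU", "R"): 0, ("NU", "NR"): 1}
player = ["NU", "U", "NU", "U", "NU", "NU"]
nature = ["R", "NR", "R", "R", "NR", "R"]

def regrets(t):
    """Average regrets {a: R̄^a(t)} after the first t days."""
    avg = sum(U[(player[k], nature[k])] for k in range(t)) / t
    return {a: sum(U[(a, nature[k])] for k in range(t)) / t - avg
            for a in ("U", "NU")}

def regret_matching(t):
    """Strategy p(t+1): normalize the positive parts of the day-t regrets."""
    pos = {a: max(r, 0.0) for a, r in regrets(t).items()}
    total = sum(pos.values())
    if total == 0.0:           # no positive regret: fall back to uniform (a convention)
        return {a: 0.5 for a in pos}
    return {a: r / total for a, r in pos.items()}

for t in range(1, 6):
    print(f"day {t + 1}:", regret_matching(t))
```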

Learning in games

• Consider the following one-shot game


– Players N
– Actions Ai
– Utility functions Ui : A → R
• Consider a repeated version of the above one-shot game where at each time t ∈ {1, 2, ...},
each player i ∈ N simultaneously
– Selects a strategy pi (t) ∈ ∆(Ai )
– Selects an action ai (t) randomly according to strategy pi (t)
– Receives utility Ui (ai (t), a−i (t))
– Each player updates strategy using available information

pi(t+1) = f(a(1), a(2), ..., a(t); Ui)

• The strategy update function f (·) is referred to as the learning rule

– Ex: Cournot adjustment process

• Concern: How much information do players have access to?

– Structural form of utility function, i.e., Ui (·)?


– Action of other players, i.e., a−i (t)?
– Perceived reward for alternative actions, i.e., Ui (ai , a−i (t)) for any ai
– Utility received, Ui (a(t))

• Informational restrictions limit the class of admissible learning rules


• Goal: Provide asymptotic guarantees if all players follow a specific f (·)
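
• The repeated-game protocol above can be written as a short simulation skeleton (plain Python). This is only a sketch: the interface f(history, Ui) → pi(t+1) and the dictionary representation of mixed strategies are assumptions made here to make the structure concrete.

```python
import random

def play_repeated_game(action_sets, utilities, learning_rules, T, seed=0):
    """Repeated play: sample actions from current strategies, collect utilities,
    and let each player's learning rule f map the history (and Ui) to pi(t+1)."""
    rng = random.Random(seed)
    # Start every player at the uniform strategy pi(1).
    strategies = [{a: 1 / len(A) for a in A} for A in action_sets]
    history = []                                   # joint actions a(1), ..., a(t)
    for t in range(1, T + 1):
        joint = tuple(rng.choices(list(p), weights=list(p.values()))[0]
                      for p in strategies)
        history.append(joint)
        rewards = [u(joint) for u in utilities]    # Ui(a(t)); not every rule uses it
        # pi(t+1) = f(a(1), ..., a(t); Ui)
        strategies = [f(history, u) for f, u in zip(learning_rules, utilities)]
    return history, strategies
```

Cournot adjustment, regret matching, and fictitious play are all particular choices of the learning rule f in this loop.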

Regret matching

• Consider the learning rule f(·) where

  pi^{ai}(t+1) = [R̄i^{ai}(t)]+ / ∑_{ãi ∈ Ai} [R̄i^{ãi}(t)]+

– pi^{ai}(t+1) = Probability player i plays action ai at time t + 1
– R̄i^{ai}(t) = Regret of player i for action ai at time t
• Fact: Max regret of all players goes to 0 (think of other players as “nature”)

  [R̄i^{ai}(t)]+ → 0 for every player i and action ai
• Result restated: The behavior converges to a “no-regret” point
• Question: Where are we? Is this a NE?
• Rewrite regret in terms of the empirical frequency z(t) ∈ ∆(A):

  Ūi(t) = (1/t) ∑_{τ=1}^{t} Ui(a(τ)) = Ui(z(t))

  v̄i^{ai}(t) = (1/t) ∑_{τ=1}^{t} Ui(ai, a−i(τ)) = Ui(ai, z−i(t))

  R̄i^{ai}(t) = v̄i^{ai}(t) − Ūi(t) = Ui(ai, z−i(t)) − Ui(z(t))
• Characteristic of no-regret point
  R̄i^{ai}(t) ≤ 0 ⇔ Ui(ai, z−i(t)) ≤ Ui(z(t))
• No-regret point restated: For any player i and action ai
  Ui(ai, z−i(t)) ≤ Ui(z(t))
• No-regret point = Coarse correlated equilibrium (slightly weaker notion than correlated
equilibrium)
• A slightly modified (and more complex) version of regret matching ensures convergence to
the set of correlated equilibria.
• Theorem: If all players follow the regret matching strategy, then the empirical frequency
converges to the set of coarse correlated equilibria.
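
• A sketch of the player-i update above for an arbitrary finite action set (plain Python). The average regrets R̄i^{ai}(t) are taken as inputs, and the uniform fallback when no regret is positive is again a convention assumed here, not something stated in the notes.

```python
def regret_matching_update(avg_regrets):
    """pi(t+1): normalize the positive parts of player i's average regrets.

    avg_regrets maps each action ai in Ai to R̄i^{ai}(t).
    """
    pos = {a: max(r, 0.0) for a, r in avg_regrets.items()}
    total = sum(pos.values())
    if total == 0.0:
        # No positive regret: the formula above is undefined (0/0);
        # playing uniformly is one common convention.
        return {a: 1 / len(pos) for a in pos}
    return {a: r / total for a, r in pos.items()}

# Hypothetical regrets for a three-action player: only the positive ones get weight.
print(regret_matching_update({"x": 0.2, "y": -0.1, "z": 0.6}))  # {'x': 0.25, 'y': 0.0, 'z': 0.75}
```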

Convergence to NE?

• Recap: If all players follow the regret matching strategy then the empirical frequency
converges to the set of coarse correlated equilibria.
• This result holds irrespective of the underlying game!
• Problems:
– Predictability: Behavior will not necessarily settle down, i.e., the result only guarantees
that the empirical frequency of play will be in the set of CCE
– Efficiency: The set of CCE is much larger than the set of NE. Are CCE worse than NE in
terms of efficiency?
• Revised goal: Are there learning rules that converge to NE (as opposed to CCE) for any
game?
• Answer: No
• Theorem: There are no “natural” dynamics that lead to NE in every game (Hart, 2009).
– Natural = adaptive, simple, efficient (e.g., regret matching, Cournot, ...)
– Not natural = exhaustive search, mediator, ...
• Question: Are there natural dynamics that converge to NE for special game structures?
(e.g., zero-sum games?)

Fictitious Play

• Recall: A learning rule is of the form

pi (t + 1) = f (a(1), a(2), ..., a(t); Ui )

• Fictitious play: A learning rule where the strategy pi(t+1) is a best response to the
scenario where all players j ≠ i are selecting their action independently according to the
empirical frequency of their past decisions.
• Define empirical frequencies qi(t) as follows:

  qi^{ai}(t) = (1/t) ∑_{τ=1}^{t} I{ai(τ) = ai}

• Fictitious play: Each player best responds to empirical frequencies

  pi(t+1) ∈ arg max_{pi ∈ ∆(Ai)} Ui(pi, q−i(t))

  where

  Ui(pi, q−i(t)) = ∑_{a∈A} Ui(a1, a2, ..., an) pi^{ai} ∏_{j≠i} qj^{aj}(t)

• FP facts: Beliefs (i.e., empirical frequencies) converge to NE for

– 2-player games with 2 moves per player
– Zero-sum games with arbitrary moves per player
– Other game structures as well (more to come on this)

Fictitious play example

• Consider the following two-player zero-sum game

       L    C    R
  T   −1    0    1
  M    1   −1    0
  B    0    1   −1

• Suppose a(1) = {T, L}

– What is qrow (1)?


– What is qcol (1)?

• What is a(2)?

– What is qrow (2)?


– What is qcol (2)?

• What is a(3)?
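
• A minimal sketch that runs fictitious play on this game and can be used to check the questions above (plain Python). Assumptions made for the sketch: the matrix entries are the row player's payoffs (the column player receives the negatives, since the game is zero-sum), play starts from a(1) = (T, L) as stated, and best-response ties are broken in favor of the first action in the listed order.

```python
ROWS, COLS = ["T", "M", "B"], ["L", "C", "R"]
# Row player's payoffs; the column player receives the negative (zero-sum).
A = {("T", "L"): -1, ("T", "C"): 0, ("T", "R"): 1,
     ("M", "L"): 1, ("M", "C"): -1, ("M", "R"): 0,
     ("B", "L"): 0, ("B", "C"): 1, ("B", "R"): -1}

def empirical(history):
    """Empirical frequency of each action in the history so far."""
    return {a: history.count(a) / len(history) for a in set(history)}

def best_response(expected_payoff, actions):
    """Action maximizing expected payoff (ties broken by listed order)."""
    return max(actions, key=lambda a: (expected_payoff(a), -actions.index(a)))

def fictitious_play(T):
    row_hist, col_hist = ["T"], ["L"]              # a(1) = (T, L) as in the notes
    for t in range(2, T + 1):
        q_row, q_col = empirical(row_hist), empirical(col_hist)
        r = best_response(lambda a: sum(q_col.get(c, 0) * A[(a, c)] for c in COLS), ROWS)
        c = best_response(lambda a: sum(q_row.get(b, 0) * -A[(b, a)] for b in ROWS), COLS)
        row_hist.append(r)
        col_hist.append(c)
    return row_hist, col_hist

rows, cols = fictitious_play(6)
print(list(zip(rows, cols)))                # a(1), a(2), a(3), ...
print(empirical(rows), empirical(cols))     # beliefs q_row(6), q_col(6)
```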
