Lect26 4up

This document discusses self-interested agents and competition between agents. It introduces a "grade game" example where two students must choose between options that determine their grade based on both choices. It then discusses how agents have preferences over outcomes and these preferences can be represented with utility functions. When multiple self-interested agents interact in an environment, their choices impact the outcome through a state transformer function. Rational agents will choose actions that maximize their utility based on this function and their preferences.


Robotics and Autonomous Systems
Lecture 26: Self-interested agents

Richard Williams
Department of Computer Science
University of Liverpool

Today

• In the last lecture we observed that teamwork requires some level of
cooperation.
• It is easiest to obtain this cooperation when agents are benevolent
towards each other in the sense of sharing the same interests/goals.
• But what happens if such alignment of interests is not present and
agents have their own, possibly incompatible goals?
• Self-interest.
• Competition between agents.

Competition between agents
The grade game

• Take a piece of paper.
• You are randomly paired with a partner (you do not know who!).
• You have to write X or Y on the piece of paper.
• You will get a grade based on the following rules:
  • If both you and your partner write X, then you both get a B.
  • If you write X and your partner writes Y, then you get a D and your
    partner gets an A.
  • If you write Y and your partner writes X, then you get an A and your
    partner gets a D.
  • If both you and your partner write Y, then you both get a C.

(The Grade Game, Ben Polak)

The grade game

• What would you do?
• What you get depends also on the choice of your partner.
• This is the blueprint of strategic interaction.

Choose the side

• Which side of the road to drive on?

Choose the side

• Which side of the road to drive on?
• Same side as everyone else. Any fule kno that.
• But how do you choose when you don't know what "everyone else" is
doing?

Utilities and Preferences

• Assume we have just two agents: Ag = {i, j}.
• Agents are assumed to be self-interested.
• They have preferences over states of the environment.

Preferences over outcomes

• Assume Ω = {ω1, ω2, . . .} is the set of "outcomes" that agents have
preferences over.
• Outcomes are states of the environment.
• Cake or Death?
Preferences over outcomes

• Broccoli or Brussels sprouts?
• Cake or Pie?

Utility functions

• We capture preferences by utility functions:

      ui : Ω → ℝ
      uj : Ω → ℝ

• Utility functions lead to preference orderings over outcomes:

      ω ⪰i ω′  means  ui(ω) ≥ ui(ω′)
      ω ≻i ω′  means  ui(ω) > ui(ω′)

• For example:

      uSimon(cake) > uSimon(death)
      uSimon(pie) > uSimon(cake)
      uSimon(broccoli) > uSimon(brussels sprouts)

• Meaning that:

      cake ≻Simon death
      pie ≻Simon cake
      broccoli ≻Simon brussels sprouts
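The step from a utility function to the preference ordering it induces can be sketched in a few lines of Python. The outcomes follow the Simon example; the numeric utility values are made up purely to respect the stated ordering:

```python
# Illustrative utility function for one agent: a dict from outcomes to
# real numbers.  The values are invented, chosen only so that
# pie > cake > broccoli > death, matching the slide's ordering.
u_simon = {"pie": 3, "cake": 2, "broccoli": 1, "death": 0}

def prefers(u, w1, w2):
    """Strict preference: w1 is preferred to w2 iff u(w1) > u(w2)."""
    return u[w1] > u[w2]

def weakly_prefers(u, w1, w2):
    """Weak preference: w1 is weakly preferred to w2 iff u(w1) >= u(w2)."""
    return u[w1] >= u[w2]

print(prefers(u_simon, "pie", "cake"))    # pie is preferred to cake
print(prefers(u_simon, "cake", "death"))  # cake is preferred to death
```

Any monotone rescaling of the numbers would induce the same ordering, which is the sense in which utility merely encodes preferences.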
Utility functions

• Note that:
      uSimon(pie) > uSimon(cake)
  really means
      uSimon(pie & X) > uSimon(cake & X)
  where X captures "everything else".
• Ceteris paribus: "all other things being equal".
• If things aren't the same, preferences may change.

Utility versus Money

• Typical relationship between utility and money:

  [Figure: a curve of utility (vertical axis) against money (horizontal
  axis).]

• Utility is not money. It is just a way to encode preferences.

Multiagent Encounters

• We need a model of the environment in which these agents will act. . .
• Agents simultaneously choose an action (a strategy), and as a result of
the actions they select, an outcome in Ω will result.
• The actual outcome depends on the combination of actions.
• Assume each agent has just two possible strategies, C and D.
• Environment behaviour is given by a state transformer function:

      τ : Ac × Ac → Ω

  where the first argument is agent i's action and the second is agent
  j's action.
Multiagent Encounters

• Here is a state transformer function:

      τ(D, D) = ω1   τ(D, C) = ω2   τ(C, D) = ω3   τ(C, C) = ω4

• This environment is sensitive to the actions/strategies of both agents.
• Here is another:

      τ(D, D) = ω1   τ(D, C) = ω1   τ(C, D) = ω1   τ(C, C) = ω1

• Neither agent has any influence in this environment.
• And here is another:

      τ(D, D) = ω1   τ(D, C) = ω2   τ(C, D) = ω1   τ(C, C) = ω2

• This environment is controlled by j.
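A state transformer like the ones above can be sketched as a plain dictionary keyed by the pair of chosen strategies (the outcome names are just strings standing in for ω1. . . ω4):

```python
# State transformer for the first example: the outcome depends on the
# actions of both agents.  Keys are (i's action, j's action).
tau = {
    ("D", "D"): "w1",
    ("D", "C"): "w2",
    ("C", "D"): "w3",
    ("C", "C"): "w4",
}

# The environment controlled by j: whatever i does, only j's action
# determines the outcome.
tau_j_controls = {
    ("D", "D"): "w1",
    ("D", "C"): "w2",
    ("C", "D"): "w1",
    ("C", "C"): "w2",
}

# i switching its action changes nothing in the second environment:
print(tau_j_controls[("D", "D")] == tau_j_controls[("C", "D")])  # True
```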

Rational Action

• Suppose we have the case where both agents can influence the
outcome, and they have utility functions as follows:

      ui(ω1) = 1   ui(ω2) = 1   ui(ω3) = 4   ui(ω4) = 4
      uj(ω1) = 1   uj(ω2) = 4   uj(ω3) = 1   uj(ω4) = 4

• With a bit of abuse of notation:

      ui(D, D) = 1   ui(D, C) = 1   ui(C, D) = 4   ui(C, C) = 4
      uj(D, D) = 1   uj(D, C) = 4   uj(C, D) = 1   uj(C, C) = 4

• Then agent i's preferences are:

      C, C ⪰i C, D ≻i D, C ⪰i D, D

• In this case, what should i do?
• i prefers all outcomes that arise through C over all outcomes that
arise through D.
• Thus C is the rational choice for i.
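i's reasoning can be checked mechanically: the worst outcome i can get by playing C is still better than the best it can get by playing D. A minimal sketch, using the utilities above:

```python
# i's utility for each (i's action, j's action) pair, from the slide.
u_i = {("D", "D"): 1, ("D", "C"): 1, ("C", "D"): 4, ("C", "C"): 4}

def strictly_better_choice(u, action, other, opponent_moves=("C", "D")):
    """True if `action` beats `other` against every opponent move."""
    return all(u[(action, m)] > u[(other, m)] for m in opponent_moves)

# Every outcome reached through C is preferred to every outcome
# reached through D:
worst_from_C = min(u_i[("C", m)] for m in ("C", "D"))
best_from_D = max(u_i[("D", m)] for m in ("C", "D"))
print(worst_from_C > best_from_D)                # True
print(strictly_better_choice(u_i, "C", "D"))     # True: play C
```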
Key question

• How do we establish the rational choice for an agent?
• What is the best strategy?

Choose the side

• Let's go back to this game.
• Which side to drive on?

Payoff Matrices

• We can characterise the "choose side" scenario in a payoff matrix:

                  i
             left    right
     left     1        0
   j          1        0
     right    0        1
              0        1

• Agent i is the column player and gets the upper reward in a cell.
• Agent j is the row player and gets the lower reward in a cell.
• Actually there are two matrices here, one (call it A) that specifies the
payoff to i and another, B, that specifies the payoff to j.
• Sometimes we'll write the game as (A, B) in recognition of this.
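The pair (A, B) can be sketched as two 2×2 arrays, indexed [row][column], i.e. [j's action][i's action], with left = 0 and right = 1:

```python
# Payoff matrices for the "choose side" game.  A[r][c] is the payoff to
# the column player i, B[r][c] the payoff to the row player j.
A = [[1, 0],
     [0, 1]]
B = [[1, 0],
     [0, 1]]

# Both agents choosing "left" (r=0, c=0) pays each of them 1:
print(A[0][0], B[0][0])  # 1 1
# Mis-coordinating (j right, i left) pays both 0:
print(A[1][0], B[1][0])  # 0 0
```

Here A and B happen to be identical, which is what makes this a pure coordination game.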
Payoff Matrices

• We can characterise the grade game scenario in a payoff matrix:

                  i
              Y        X
     Y        2        1
   j          2        4
     X        4        3
              1        3

• Payoffs are the US grade points that correspond to the problem
statement: an A is 4, a B is 3, and so on.

Solution Concepts

• How will a rational agent behave in any given scenario? Play. . .
  • a dominant strategy;
  • a Nash equilibrium strategy;
  • Pareto optimal strategies;
  • strategies that maximise social welfare.

Dominant Strategies

• Given any particular strategy s (either C or D) for agent i, there will
be a number of possible outcomes.
• We say s1 dominates s2 if every outcome possible by i playing s1 is
preferred over every outcome possible by i playing s2.
• A rational agent will never play a dominated strategy.
• So in deciding what to do, we can delete dominated strategies.
• Unfortunately, there isn't always a unique undominated strategy.
• Thus in this game:

                  i
              D        C
     D        1        4
   j          1        1
     C        1        4
              4        4

  C dominates D for both players.
Dominant Strategies

• A game with dominated strategies:

                  i
              L      C      R
     U        1      1      0
              3      0      0
   j M        1      1      0
              1      1      5
     L        1      1      0
              0      4      0

• We can eliminate the dominated strategies and simplify the game.
• Remove R (dominated by L):

                  i
              L      C
     U        1      1
              3      0
   j M        1      1
              1      1
     L        1      1
              0      4
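The elimination step can be mechanised: a column is removed when some other column beats it against every row. A sketch over i's payoff matrix from the 3×3 game above (rows U/M/L, columns L/C/R):

```python
# Column player i's payoffs in the 3x3 game: rows are j's actions
# (U, M, L), columns are i's actions (L, C, R).
A = [[1, 1, 0],
     [1, 1, 0],
     [1, 1, 0]]
cols = ["L", "C", "R"]

def dominated_columns(A, cols):
    """Columns strictly dominated by some other column, for player i."""
    out = []
    for c1 in range(len(cols)):
        for c2 in range(len(cols)):
            # c2 dominates c1 if it is strictly better in every row.
            if c2 != c1 and all(row[c2] > row[c1] for row in A):
                out.append(cols[c1])
                break
    return out

print(dominated_columns(A, cols))  # ['R'] -- R is dominated by L
```

Running the same check on the reduced game finds nothing more to delete, which is why elimination alone does not solve this game.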

Dominant Strategies

• If we are lucky, we can eliminate enough strategies so that the choice
of action is obvious.
• In general we aren't that lucky.
Nash Equilibrium

(John Forbes Nash.)

• In general, we will say that two strategies s1 and s2 are in Nash
equilibrium (NE) if:
  1. under the assumption that agent i plays s1, agent j can do no better
     than play s2; and
  2. under the assumption that agent j plays s2, agent i can do no better
     than play s1.
• Neither agent has any incentive to deviate from a NE.
• Eh?

Nash Equilibrium

• Let's consider the payoff matrix for the grade game:

                  i
              Y        X
     Y        2        1
   j          2        4
     X        4        3
              1        3

• Here the Nash equilibrium is (Y, Y).
• If i assumes that j is playing Y, then i's best response is to play Y.
• Similarly for j.
• If two strategies are best responses to each other, then they are in
Nash equilibrium.
Nash Equilibrium

• In a game like this you can find the NE by cycling through the
outcomes, asking if either agent can improve its payoff by switching
its strategy.
• Thus, for example, (X, Y) is not an NE because i can switch its payoff
from 1 to 2 by switching from X to Y.
• More formally, a strategy pair (i*, j*) is a Nash equilibrium solution
to the game (A, B) if:

      ∀i, a_{i*, j*} ≥ a_{i, j*}
      ∀j, b_{i*, j*} ≥ b_{i*, j}
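The "cycle through the outcomes" procedure is easy to mechanise. A sketch for the grade game, with A and B holding i's and j's payoffs, indexed [row][column] = [j's action][i's action] and Y = 0, X = 1:

```python
# Grade-game payoffs: A[r][c] is column player i's payoff, B[r][c] is
# row player j's payoff; index 0 = Y, 1 = X.
A = [[2, 1],
     [4, 3]]
B = [[2, 4],
     [1, 3]]

def pure_nash_equilibria(A, B):
    """All cells where neither player gains by unilaterally switching."""
    eqs = []
    for r in range(len(B)):
        for c in range(len(A[0])):
            # i compares payoffs along its row, j down its column.
            i_ok = all(A[r][c] >= A[r][c2] for c2 in range(len(A[0])))
            j_ok = all(B[r][c] >= B[r2][c] for r2 in range(len(B)))
            if i_ok and j_ok:
                eqs.append((r, c))
    return eqs

print(pure_nash_equilibria(A, B))  # [(0, 0)] -- i.e. (Y, Y)
```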

Nash Equilibrium

• Unfortunately:
  1. Not every interaction scenario has a pure strategy NE.
  2. Some interaction scenarios have more than one NE.
• This game has two pure strategy NEs, (C, C) and (D, D):

                  i
              D        C
     D        5        1
   j          3        2
     C        0        3
              2        3

• In both cases, a single agent can't unilaterally improve its payoff.
Nash Equilibrium

• This game has no pure strategy NE:

                  i
              D        C
     D        2        1
   j          1        2
     C        0        1
              2        1

• For every outcome, one of the agents will improve its utility by
switching its strategy.
• We can find a form of NE in such games, but we need to go beyond
pure strategies.

Mirowski on Nash Equilibrium

Pareto Optimality

• An outcome is said to be Pareto optimal (or Pareto efficient) if there is
no other outcome that makes one agent better off without making
another agent worse off.
• If an outcome is Pareto optimal, then at least one agent will be
reluctant to move away from it (because this agent will be worse off).
• If an outcome ω is not Pareto optimal, then there is another outcome
ω′ that makes everyone as happy, if not happier, than ω.
• We can argue as follows:
  • "Reasonable" agents would agree to move to ω′ from ω if ω is not
    Pareto optimal and ω′ is.
  • Even if a given agent doesn't directly benefit from ω′, others can
    benefit without it suffering.
Pareto Optimality

• This game has one Pareto efficient outcome, (D, D):

                  i
              D        C
     D        5        1
   j          3        2
     C        0        0
              2        1

• There is no solution in which either agent does better.
• This next game has two Pareto efficient outcomes, (C, D) and (D, C):

                  i
              D        C
     D        5        4
   j          3        0
     C        0        1
              4        1

• Note that Pareto efficiency doesn't necessarily mean fair.
• Just that you can't move away and make one agent better off without
making the other worse off.
• Both Pareto efficient solutions are the worst outcome for one of the
agents.
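Pareto efficiency of an outcome can be tested by scanning for another outcome that makes someone strictly better off and nobody worse off. A sketch over explicit payoff pairs, using the first game above:

```python
# Payoff pairs (u_i, u_j) for the first game, keyed by
# (i's action, j's action), read off the matrix above.
payoffs = {
    ("D", "D"): (5, 3),
    ("C", "D"): (1, 2),
    ("D", "C"): (0, 2),
    ("C", "C"): (0, 1),
}

def pareto_optimal(outcome, payoffs):
    """True if no other outcome Pareto-dominates `outcome`."""
    ui, uj = payoffs[outcome]
    for vi, vj in payoffs.values():
        # A dominating outcome is at least as good for both agents
        # and strictly better for at least one.
        if vi >= ui and vj >= uj and (vi > ui or vj > uj):
            return False
    return True

print([w for w in payoffs if pareto_optimal(w, payoffs)])  # [('D', 'D')]
```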

Pareto Optimality

• Pareto optimality is a rather weak concept.
• What is the Pareto optimal way to divide a pile of money between A
and B?

Social Welfare

• The social welfare of an outcome ω is the sum of the utilities that
each agent gets from ω:

      Σ_{i ∈ Ag} ui(ω)

• Think of it as the "total amount of money in the system".
• As a solution concept, it may be appropriate when the whole system
(all agents) has a single owner (then the overall benefit of the system
is important, not that of individuals).
Social Welfare

• As a solution concept it doesn't consider the benefits to individuals.
• A very skewed outcome can maximise social welfare.
• In both these games, (C, C) maximises social welfare:

                  i
              D        C
     D        2        1
   j          2        1
     C        3        4
              3        4

                  i
              D        C
     D        2        1
   j          2        1
     C        3        9
              3        0
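Maximising social welfare is just an argmax over the summed payoffs. A sketch using the second, skewed game above:

```python
# Payoff pairs (u_i, u_j) for the second game, keyed by
# (i's action, j's action), read off the matrix above.
payoffs = {
    ("D", "D"): (2, 2),
    ("C", "D"): (1, 1),
    ("D", "C"): (3, 3),
    ("C", "C"): (9, 0),
}

def social_welfare(outcome, payoffs):
    """Sum of all agents' utilities at `outcome`."""
    return sum(payoffs[outcome])

best = max(payoffs, key=lambda w: social_welfare(w, payoffs))
print(best, social_welfare(best, payoffs))  # ('C', 'C') 9
```

Note that the welfare-maximising outcome (C, C) pays j nothing at all, which is exactly the "very skewed outcome" the slide warns about.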

Summary

• This lecture has introduced some of the issues in handling
self-interested agents.
• Covered the basics of game theory and looked at some solution
concepts:
  • Dominant strategies
  • Nash equilibrium
  • Pareto optimality
  • Maximising social welfare
• Will look at more next time.
