
AI, Games, and Markets

Christian Kroer1
Department of Industrial Engineering and Operations Research
Columbia University

October 17, 2023

1 Email: [email protected].
Contents

1 Introduction and Examples
   1.1 Why Economics, AI and Optimization?
   1.2 Game Theory
   1.3 Market Design
   1.4 Acknowledgments

2 Nash Equilibrium Intro
   2.1 General-Sum Games
   2.2 Historical Notes

3 Auctions and Mechanism Design Intro
   3.1 Introduction
   3.2 Auctions
   3.3 Mechanism Design
   3.4 Historical Notes

4 Regret Minimization and Sion's Minimax Theorem
   4.1 Regret Minimization
   4.2 Online Convex Optimization
   4.3 Distance-Generating Functions
   4.4 Online Mirror Descent
   4.5 Minimax theorems via OCO
   4.6 Historical notes

5 Self-Play via Regret Minimization
   5.1 Recap
   5.2 From Regret to Nash Equilibrium
   5.3 Alternation
   5.4 Increasing Iterate Averaging
   5.5 Historical Notes and Further Reading

6 Extensive-Form Games
   6.1 Introduction
   6.2 Perfect-Information EFGs
   6.3 Imperfect-Information EFGs
   6.4 Sequence Form
   6.5 Dilated Distance-Generating Functions
   6.6 Counterfactual Regret Minimization
   6.7 Numerical Comparison of CFR methods and OMD-like methods
   6.8 Stochastic Gradient Estimates
   6.9 Historical Notes

7 Intro to Fair Division and Market Equilibrium
   7.1 Fair Division Intro
   7.2 Fisher Market
   7.3 More General Utilities
   7.4 Computing Market Equilibrium
   7.5 Historical Notes

8 Fair Allocation with Indivisible Goods
   8.1 Introduction
   8.2 Setup
   8.3 Fair Allocation
   8.4 Computing Discrete Max Nash Welfare
   8.5 Historical Notes

9 Internet Advertising Auctions: Position Auctions
   9.1 Introduction
   9.2 Position Auctions
   9.3 Historical Notes

10 Auctions with Budgets
   10.1 Introduction
   10.2 Auctions Markets
   10.3 Second-Price Auction Markets
   10.4 First-Price Auction Markets
   10.5 In what sense are we in equilibrium?
   10.6 Conclusion
   10.7 Historical Notes

11 Online Budget Management
   11.1 Introduction
   11.2 Dynamic Auctions Markets
   11.3 Adaptive Pacing Strategy
   11.4 Historical Notes

12 Demographic Fairness
   12.1 Introduction
   12.2 Disallowing Targeting
   12.3 Demographic Fairness Measures
   12.4 Implementing Fairness Measures on a Per-Ad Basis
   12.5 Fairness Constraints in FPPE via Taxes and Subsidies
   12.6 Historical Notes
Chapter 1

Introduction and Examples

1.1 Why Economics, AI and Optimization?


These lecture notes provide an introduction to the topics of game theory and market design, with a focus on
how AI and optimization methods can be used both to understand these problems and to enable them
in practical settings. We will go over several application areas for these ideas, each with real-life
applications that have been deployed. A common theme across all of these areas is that one or more of the
real applications are enabled by AI and optimization. In particular, we will repeatedly see that economic
solution concepts often have some underlying convex or mixed-integer formulation of the problem that allows
us to compute solutions. Furthermore, most applications will require
scaling at a level where standard optimization methods are not enough. In those settings, AI methods such
as abstraction or machine learning are often used. For example, we may have a game that is way too large to
even fit in memory. In that case, we can generate some coarse-grained representation of the problem using
abstraction or machine learning. This coarse-grained representation is then typically what we solve with
optimization methods.
The following subsections give examples of the types of ideas and applications the book will cover.

Target Audience
These notes are targeted at senior undergraduates, masters, and Ph.D. students in operations research and
computer science. The background requirements are as follows:

• Knowledge of linear algebra, probability, and calculus.

• A basic background in convex, linear, and integer optimization.

• I will sometimes refer to basic concepts from computational complexity theory, but a background in
this is not required.

The notes do not assume any background in game theory or mechanism design. For convex optimization,
it is possible to read up on the basics as you go along. For example, the first few chapters of Boyd and
Vandenberghe [16] are a good reference. For linear optimization, I recommend Bertsimas and Tsitsiklis [12].

1.2 Game Theory


The first pillar of the course will be game theory. In classical optimization, we have some form of objective
function that we try to minimize or maximize, say max_{x∈X} f(x), where X is a convex set of possible choices,
and f is some concave function. For example, perhaps we are thinking of X as a set of prices that a retailer
can set for a given item, and f (x) tells us the revenue that the retailer gets when setting the price x.
In game theory, on the other hand, we study settings where multiple individuals make choices, and the
outcome depends on the choices of all the individuals. Suppose that we have two retailers, each choosing

prices x1 and x2 respectively. Now, suppose that f1 is a function that tells us the revenue received by
retailer 1 in this setup. Since consumers will potentially compare the prices x1 and x2 , we should expect
f1 to depend on both x1 and x2 , so we let f1 (x1 , x2 ) be the revenue for retailer 1 generated under prices
x1 and x2 . Now we can again try to think of the optimization problem that retailer 1 wishes to solve; first
let us assume that x2 was already chosen and retailer 1 knows its value; in that case they want to solve
max_{x1∈X} f1(x1, x2). However, we could similarly argue that retailer 2 should choose their price x2 based on
the price x1 chosen by retailer 1. Now we have a problem, because we cannot talk about optimally choosing
either of the two prices in isolation, and instead we need a way to reason about how they might be chosen in
a way that depends on each other. Game theory provides a formal way to reason about this type of situation.
For example, the famous Nash equilibrium, which we will study below, specifies that we should find a pair
x1 , x2 such that they are mutually optimal with respect to each other. Another solution concept we will see
is the Stackelberg equilibrium, where one retailer is assumed to go first, while anticipating the optimization
problem being solved by the second retailer. From now on we will refer to each individual optimizer in a
problem either as a player or an agent.

1.2.1 Nash Equilibrium


One of the most important ideas in game theory is the famous Nash equilibrium. A Nash equilibrium is a
specification of an action for each player (or a probability distribution over actions) such that it is a steady
state of a game, in the sense that no player wishes to change their action given what everybody else is doing.
This is best illustrated with an example. Below are the payoffs of the game of rock-paper-scissors (RPS),
specified as a bimatrix of payoffs.

              Rock      Paper     Scissors
Rock          0, 0      -1, 1     1, -1
Paper         1, -1     0, 0      -1, 1
Scissors      -1, 1     1, -1     0, 0

In this representation, Player 1 chooses a row to play, and Player 2 chooses a column to play. Player 1 tries
to maximize the first value at the resulting entry in the bimatrix, while Player 2 tries to maximize the second
value.
Here is an example of something that is not a Nash equilibrium: Player 1 always plays rock, and Player
2 always plays scissors. In this case, Player 2 is not playing optimally given the strategy of Player 1, since
they could improve their payoff from −1 to 1 by switching to deterministically playing paper. In fact, this
argument works for any pair of deterministic strategies, and so we see that there is no Nash equilibrium
consisting of deterministic strategies.
Instead, RPS is an example of a game where we need randomization in order to arrive at a Nash equi-
librium. The idea is that each player gets to choose a probability distribution over their actions instead
(e.g. a distribution over rows for Player 1). Now, the value that a given player receives under a pair of
mixed strategies is their expected payoff given the randomized strategies. In RPS, it’s easy to see that the
unique Nash equilibrium is for each player to play each action with probability 1/3. Given this distribution,
there is no other action that either player can switch to and improve their utility. This is what we call a
(mixed-strategy) Nash equilibrium.
The famous result of John Nash from 1951 is that every game has a Nash equilibrium:

Theorem 1. Every bimatrix game has a Nash equilibrium.

In fact, Nash's result is broader: it covers a general class of n-player games, as we shall see in the
next lecture.
The attentive reader may have noticed that the RPS game has a further property: whenever one player
wins, the other loses. This means that each player can equivalently reason about minimizing the utility of
the other player, rather than maximizing their own utility. More generally, a bimatrix game is a zero-sum
game if it can be represented in the following form:

    min_{x∈∆n} max_{y∈∆m} x⊤Ay

where ∆n = {x ∈ R^n : Σ_{i=1}^n x_i = 1, x ≥ 0} is the probability simplex over n actions, ∆m the probability
simplex over m actions, and A contains the payoff entries to the y-player from the bimatrix representation.
Problems of this form are also known as bilinear saddle-point problems. The key here is that we can now
represent the outcome of the game as a single matrix, where the x-player wishes to minimize the bilinear
term x⊤ Ay and the y-player wishes to maximize it. Zero-sum matrix games are very special: they can be
solved in polynomial time with a linear program whose size is linear in the matrix size.
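
To make the LP concrete, here is a minimal sketch in Python (using scipy.optimize.linprog; the code and
names are our own illustration, not part of the original notes). The x-player's problem
min_{x∈∆n} max_j (A⊤x)_j becomes: minimize v subject to A⊤x ≤ v·1 and x in the simplex.

    import numpy as np
    from scipy.optimize import linprog

    # Payoff matrix to the y-player in rock-paper-scissors;
    # the x-player wants to minimize x^T A y.
    A = np.array([[ 0.,  1., -1.],
                  [-1.,  0.,  1.],
                  [ 1., -1.,  0.]])
    n, m = A.shape

    # Variables (x_1, ..., x_n, v): minimize v subject to
    # (A^T x)_j - v <= 0 for all j, sum(x) = 1, x >= 0, v free.
    c = np.concatenate([np.zeros(n), [1.0]])
    A_ub = np.hstack([A.T, -np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    print(res.x[:n], res.x[-1])  # approx. [1/3, 1/3, 1/3], game value 0

Running this on the RPS matrix recovers the uniform strategy and value zero.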
Rock-paper-scissors is of course a rather trivial example of a game. A more exciting application of
zero-sum games is to use them to compute an optimal strategy for two-player poker (AKA heads-up poker).
In fact, as we will discuss later, this was the foundation for many recent "superhuman AI for poker"
results [15, 69, 17, 19]. In order to model poker games we will need a more expressive game class called
extensive-form games (EFGs). These games are played on trees, where players may sometimes have groups
of nodes, called information sets, that they cannot distinguish among. An example is shown in Figure 1.1.

[Game-tree figure; see the caption of Figure 1.1 below.]
Figure 1.1: A poker game where P1 is dealt Ace or King. "r," "f," and "c" stand for raise, fold, and check
respectively. Leaf values denote P1 payoffs. The shaded area denotes an information set: P2 does not know
which of these nodes they are at, and must thus use the same strategy in both.

EFGs can also be represented as a bilinear saddle-point problem:


    min_{x∈X} max_{y∈Y} x⊤Ay,

where X, Y are no longer probability simplexes, but more general convex polytopes that encode the sequential
decision spaces of each player. This is called the sequence-form representation [89], and we will cover that
later. Like matrix games, zero-sum EFGs can be solved in polynomial time with linear programming, with
an LP whose size is linear in the game tree.
It turns out that in many practical scenarios, the LP for solving a zero-sum game ends up being far too
large to solve. This is especially true for EFGs, where the game tree can quickly become extremely large if the
game has almost any amount of depth. Instead, iterative methods are used in practice. What is meant by
iterative methods here is the class of algorithms that build a sequence of strategies x0, x1, . . . , xT,
y0, y1, . . . , yT using only some form of oracle access to Ay and A⊤x (this is different from writing down A
explicitly!). Typically in such iterative methods, the averages of the strategy sequences,
x̄T = (1/T) Σ_{t=1}^T x_t and ȳT = (1/T) Σ_{t=1}^T y_t,
converge to a Nash equilibrium. The reason these methods are preferred is two-fold. First, by never writing
down A explicitly we save a lot of memory (now we just need enough memory to store the much smaller x, y
strategy vectors). Secondly, they avoid the expensive matrix inversions involved in the simplex algorithm
and interior-point methods.
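
To illustrate what such oracle-based iterative methods look like, here is a minimal self-play sketch (our
own illustration, using multiplicative-weights updates with a fixed step size; the proper theory is developed
in the next chapters): both players update using only the products Ay and A⊤x, and the averaged iterates are
returned.

    import numpy as np

    def mw_selfplay(A, T=10000, eta=0.05):
        """Self-play using only the oracles y -> Ay and x -> A^T x."""
        n, m = A.shape
        x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        x_avg, y_avg = np.zeros(n), np.zeros(m)
        for _ in range(T):
            x_avg += x / T
            y_avg += y / T
            gx = A @ y         # loss vector for the minimizing x-player
            gy = -(A.T @ x)    # the maximizer treats the negated payoff as loss
            x *= np.exp(-eta * gx); x /= x.sum()
            y *= np.exp(-eta * gy); y /= y.sum()
        return x_avg, y_avg    # average strategies approach an equilibrium

Note that A appears only inside matrix-vector products, so the same loop runs with any implicit
representation of A.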
The algorithmic techniques we will learn for Nash equilibrium computation are largely centered around
iterative methods. First, we will do a quick introduction to online learning and online convex optimization.

We will learn about two classes of algorithms: ones that converge to an equilibrium at a rate O(1/√T).
These roughly correspond to saddle-point variants of gradient-descent-like methods. Then we will learn
about methods that converge to the solution at a rate of O(1/T ). These roughly correspond to saddle-point
variants of accelerated gradient methods. Then we will also look at the practical performance of these
algorithms. Here we will see that the following quote is very much true:

In theory, theory and practice are the same. In practice, they are not.
In particular, the preferred method in practice is the CFR+ algorithm [83] and later variations [18], all of
which have a theoretical convergence rate of O(1/√T). In contrast, there are methods that converge at a
rate of O(1/T) [57, 65, 63] in theory, but these methods are actually slower than CFR+ for most real games!
Being able to compute an approximate Nash equilibrium with iterative methods is only one part of how
superhuman AIs were created for poker. In addition, abstraction and deep learning methods were used to
create a small enough game that can be solved with iterative methods. We will also cover how these methods
are used.
Killer applications of zero-sum games include poker (as we saw), other recreational two-player games, and
generative-adversarial networks (GANs). Other applications that are, as of yet, less proven to be effective
in practice are robust sequential decision making (the adversary represents uncertainty), security scenarios
where we assume the world is adversarial, and defense applications.

1.2.2 Stackelberg Equilibrium


A second game-theoretic solution concept that has had extensive application in practice is what’s called a
Stackelberg equilibrium. We will primarily study Stackelberg equilibrium in the context of what is called
security games [82].
Imagine the following scenario: we are designing the patrol schedule for national park rangers who try
to prevent poaching of endangered wildlife in the park (such as rhinos, which are poached for their horns).
There are 20 different watering holes that the rhinos frequent. We have 5 teams of guards that can patrol
watering holes. How can we effectively combat poaching? If we come up with a fixed patrol schedule then
the poachers can observe us for a few days and learn our schedule exactly. Afterwards they can strike at a
waterhole that is guaranteed to be empty at some particular time. Thus we need to design a schedule that is
unpredictable, but which also accounts for the fact that some watering holes are more frequented by rhinos
(and are thus higher value), travel constraints, etc.
In the security games literature, the most popular solution concept for this kind of setting is the Stack-
elberg equilibrium. In a Stackelberg equilibrium, we assume that we, as the leader (e.g. the park rangers),
get to commit to our (possibly randomized) strategy first. Then, the follower observes our strategy and best
responds. This turns out to coincide with Nash equilibrium in zero-sum games, but in general-sum games it
leads to a different solution concept.
However, if we want to help the park rangers design their schedules then we will need to be able to
compute Stackelberg equilibria of the resulting game model. Again, we will see that optimization is one of
the fundamental pillars of the field of security games research. A unique feature of security games is that the
strategy space of the leader is typically some combinatorial polytope (e.g. a restriction on the transportation
polytope), and the problem of computing a Stackelberg equilibrium is intimately related to optimization over
the underlying polytope of the defender (see Xu [90] for some nice consequences of this observation). Because
of this combinatorial nature, security games often end up being much harder to solve than zero-sum Nash
equilibrium. Therefore, the focus of this section will be on combinatorial approaches to this problem, such
as mixed-integer programming and decomposition. Another crucial aspect of security games is having good
models of the attacker. Thus, if time permits, we will also spend some time learning how one can model
adversaries using machine learning.
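
To give a flavor of such computations in the simplest (non-combinatorial) case of a bimatrix game, here is a
sketch of the classic multiple-LPs approach (our own illustrative code; the security-games literature uses
more scalable combinatorial methods): for each follower action j, solve an LP for the leader commitment x
that maximizes the leader's utility subject to j being a follower best response.

    import numpy as np
    from scipy.optimize import linprog

    def stackelberg_commitment(U_lead, U_fol):
        """One LP per follower action j; keep the best feasible solution."""
        n, m = U_lead.shape
        best_val, best_x = -np.inf, None
        for j in range(m):
            # max x . U_lead[:, j] s.t. x . (U_fol[:, k] - U_fol[:, j]) <= 0
            # for all k, with x in the probability simplex.
            res = linprog(-U_lead[:, j],
                          A_ub=(U_fol - U_fol[:, [j]]).T, b_ub=np.zeros(m),
                          A_eq=np.ones((1, n)), b_eq=[1.0],
                          bounds=[(0, None)] * n)
            if res.success and -res.fun > best_val:
                best_val, best_x = -res.fun, res.x
        return best_x, best_val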

Killer applications of Stackelberg games are mainly in the realm of security. They have been applied in
infrastructure security (airports, coast guard, air marshals) [82], to protect wildlife [46], and to combat fare
evasion. A nascent literature is also emerging in cybersecurity. Outside of the world of security, Stackelberg
games are also used to model things like first-mover advantage in the business world.

1.3 Market Design


The second pillar of the book will be market design. In market design we are typically concerned with how
to design the rules of the game, and how to do that in order to achieve “good” outcomes. Thus, game theory
is a key tool in market design, since it will be our way of understanding what outcomes we may expect to
arise, given a set of market rules.
Market design is a huge area, and so it has many killer applications. The ones we will see in this course
include Internet ad auctions and how to fairly assign course seats to students. However there are many
others such as how to price and assign rideshares at Lyft/Uber, how to assign NYC kids to schools, how to
enable nationwide kidney exchanges, how to allocate spectrum, etc.
For example, imagine that we are designing a mechanism for managing course enrollment. How should
we decide which students get to take which courses? What do we do with the fact that our machine learning
course has 100 seats and 500 people that want to take it? Overall, we would like the system to somehow
be efficient, but what does that mean? We would also like the system to be fair, but it’s not entirely clear
what that means either.
At a loss for ideas, we come up with the following solution: we will just have a sign-up system where
students can sign up until a course fills up. After that we put other students on a waitlist that we clear on a
first-in first-out basis as seats become available. Is this a good system? Well, let’s look at a simple example:
we will have 2 students and 2 courses, each course having 1 seat. Students are allowed to take at most one
course. Let’s say that each student values the courses as follows:

             Course A    Course B
Student 1       5           5
Student 2       2           8

Student 1 arrives first and signs up for course B. Then Student 2 arrives and signs up for A. The total
welfare of this assignment is 5 + 2 = 7. This does not seem to be an efficient use of resources: we can
improve our solution by swapping the courses, since Student 1 gets the same utility as before, and Student
2 improves their utility. This is what’s called a Pareto-improving allocation because each student is at least
as well off as before, and at least one student is strictly better off. One desideratum for efficiency is that no
such improvement should be possible; an allocation with this property is called Pareto efficient.
Let’s look at another example. Now we have 2 students and 4 courses, where each student takes 2 courses.
Again courses have only 1 seat.
             Course A    Course B    Course C    Course D
Student 1      10          10           1           1
Student 2      10          10           1           1
Now say that Student 1 shows up first, and signs up for A and B. Then Student 2 shows up and signs up for
C and D. Call this assignment A1 . Here we get that A1 is Pareto efficient, but it does not seem fair. A fairer
solution seems to be that each student gets a course with value 10 and a course with value 1; let A2 be such
an assignment. One way to look at this improvement is through the notion of envy: each student should like
their own course schedule at least as well as that of any other student. Under A1 Student 2 envies Student
1 by a value of 18, whereas under A2 no student envies the other. An allocation where no student envies
another student is called envy-free. Fairness turns out to be a complicated idea, and we will see later that
there are several appealing notions that we may wish to strive for.
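
As a quick sanity check of the envy-freeness definition on the example above, here is a small sketch (our
own illustrative code):

    def is_envy_free(values, alloc):
        """values[i][j]: student i's value for course j; alloc[i]: i's set of courses."""
        bundle = lambda i, courses: sum(values[i][j] for j in courses)
        return all(bundle(i, alloc[i]) >= bundle(i, alloc[k])
                   for i in range(len(alloc)) for k in range(len(alloc)))

    values = [[10, 10, 1, 1],
              [10, 10, 1, 1]]
    print(is_envy_free(values, [{0, 1}, {2, 3}]))  # A1: False (Student 2 envies)
    print(is_envy_free(values, [{0, 2}, {1, 3}]))  # A2: True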
Instead of first-come-first-serve, we can use ideas from market design to get a better mechanism. The
solution that we will learn about is based on a fake-money market: we give every student some fixed budget
of fake currency (aka funny money). Then, we treat the assignment problem as a market problem under
the assigned budgets, and ask for what is called a market equilibrium. Briefly, a market equilibrium is a set
of prices, one for each item, and an allocation of items to buyers. The allocation must be such that every
item is fully allocated (or has a price of zero), and every buyer is getting an assignment that maximizes their
utility over all the possible assignments they could afford given the prices and their budget. Given such a
market equilibrium, we then take the allocation from the equilibrium, throw away the prices (the money was
fake anyway!), and use that to perform our course allocation. This turns out to have a number of attractive
fairness and efficiency properties. Course-selection mechanisms based on this idea are deployed at several
business schools such as Columbia Business School, the Wharton School at U. of Pennsylvania, University
of Toronto's Rotman School of Management, and Dartmouth's Tuck School of Business [21, 22].
Of course, if we want to implement this protocol we need to be able to compute a market equilibrium.
This turns out to be a rich research area: in the case of what is called a Fisher market, where each agent
i has a linear utility function vi ∈ R^m_+ over the m items in the market, there is a neat convex program that
results in a market equilibrium [44]:

    max_{x≥0}  Σ_i Bi log(vi · xi)
    s.t.       Σ_i xij ≤ 1,  ∀j

Here xij is how much buyer i is allocated of item j. Notice that we are simply maximizing the budget-
weighted logarithmic utilities, with no prices! It turns out that the prices are the dual variables on the
supply constraints. We will see some nice applications of convex duality and Fenchel conjugates in deriving
this relationship. We will also see that this class of markets have a relationship to the types of auction
systems that are used at Google and Facebook [36, 37].
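
As a sketch of how directly this program can be solved with off-the-shelf tools (assuming the cvxpy modeling
library; the code and names are our own illustration), note that the equilibrium prices come out as the dual
variables of the supply constraints:

    import cvxpy as cp
    import numpy as np

    def fisher_equilibrium(V, B):
        """V[i, j]: buyer i's value for item j; B[i]: buyer i's budget."""
        n, m = V.shape
        X = cp.Variable((n, m), nonneg=True)           # allocations x_ij
        utils = cp.sum(cp.multiply(V, X), axis=1)      # linear utilities v_i . x_i
        supply = cp.sum(X, axis=0) <= np.ones(m)
        cp.Problem(cp.Maximize(B @ cp.log(utils)), [supply]).solve()
        return X.value, supply.dual_value              # allocation and prices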
In the case of markets such as those for course seats, the problem is computationally harder and requires
combinatorial optimization. Current methods use a mixture of MIP and local search [22].

1.4 Acknowledgments
These lecture notes owe a large debt to several other professors who have taught courses on Economics
and Computation. In particular, Tim Roughgarden’s lecture notes [78] and video lectures, John Dickerson’s
course at UMD1 , and Ariel Procaccia’s course at CMU2 provided inspiration for course topics as well as
presentation ideas.
I would also like to thank the following people for extensive feedback on the lecture notes: Ryan D’Orazio
for both finding mistakes and providing helpful suggestions on presenting Blackwell approachability. Gabriele
Farina for discussions around several regret minimization issues, and for helping me develop much of my
thinking on regret minimization in general.
I also thank the following people who pointed out mistakes and typos in the notes: Mustafa Mert Çelikok,
Ajay Sakarwal, Eugene Vinitsky.

1 https://www.cs.umd.edu/class/spring2018/cmsc828m/
2 http://www.cs.cmu.edu/ arielpro/15896s16/index.html
Chapter 2

Nash Equilibrium Intro

In this lecture we begin our study of Nash equilibrium by giving the basic definitions, Nash’s existence result,
and briefly touch on computability issues. Then we will make a few observations specifically about zero-sum
games, which have much more structure to exploit.

2.1 General-Sum Games


A normal-form game consists of:
• A set of players N = {1, . . . , n}
• A set of strategies S = S1 × S2 × · · · × Sn
• A utility function ui : S → R
We will use the shorthand s−i to denote the subset of a strategy vector s that does not include player i’s
strategy.
As a first solution concept we will consider dominant-strategy equilibrium (DSE). In DSE, we seek a
strategy vector s ∈ S such that each si is a best response no matter what s−i is. A classic example is the
prisoner’s dilemma: two prisoners are on trial for a crime. If neither confesses (stay silent) to the crime then
they will each get 1 year in prison. If one person confesses and the other does not, then the confessor gets
no time, but their co-conspirator gets 9 years. If both confess then they both get 6 years.
            Silent     Confess
Silent      -1, -1     -9, 0
Confess     0, -9      -6, -6
In this game, both players confessing is a DSE: confessing yields greater utility than staying silent no matter what the other player
does. A DSE rarely exists in practice, but it can be useful in the context of mechanism design, where we get
to decide the rules of the game. It is the idea underlying e.g. the second-price auction which we will cover
later.
Consider some strategy vector s ∈ S. We say that s is a pure-strategy Nash equilibrium if for each player
i and each alternative strategy s′i ∈ Si :
ui (s) ≥ ui (s−i , s′i ),
where s−i denotes all the strategies in s except that of i. A DSE is always a pure-strategy Nash equilibrium,
but not vice versa. Consider the Professor’s dilemma,1 where the professor chooses a row strategy and the
students choose a column strategy:
                       Students
                   Listen         Sleep
Prof.  Prepare     10^6, 10^6     -10, 0
       Slack off   0, -10         0, 0
1 Example borrowed from Ariel Procaccia’s slides


In this game there is no DSE, but there’s clearly two pure-strategy Nash equilibria: the professor prepares
and students listen, or the professor slacks off and students sleep. But these have quite different properties.
Thus equilibrium selection is an issue for general-sum games. There are at least two reasons for this: first,
if we want to predict the behavior of players then how do we choose which equilibrium to predict? Second,
if we want to prescribe behavior for an individual player, then we cannot necessarily suggest that they play
some particular strategy from a Nash equilibrium, because if the other players do not play the same Nash
equilibrium then it may be a terrible suggestion (for example, suggesting that the professor plays “Prepare”
from the Prepare/Listen equilibrium, when the students are playing the Slack off/Sleep equilibrium would
be bad for the professor).
Moreover, pure-strategy equilibria are not even guaranteed to exist, as we saw in the previous lecture
with the rock-paper-scissors example.
To fix the existence issue we may consider allowing players to randomize over their choice of strategy
(as in rock-paper-scissors where players should randomize uniformly). Let σi ∈ ∆_{|Si|} denote player i's
probability distribution over their strategies; this is called a mixed strategy. Let a strategy profile be denoted
by σ = (σ1 , . . . , σn ). By a slight abuse of notation we may rewrite a player’s utility function as
    ui(σ) = Σ_{s∈S} ui(s) Π_i σi(si)

A (mixed-strategy) Nash equilibrium is a strategy profile σ such that for all pure strategies σi′ (σi′ is pure if
it puts probability 1 on a single strategy):

ui (σ) ≥ ui (σ−i , σi′ ).
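
Verifying this condition is straightforward, since it suffices to compare against pure-strategy deviations.
Here is a minimal sketch for bimatrix games (our own illustrative code; A1 and A2 are the two players' payoff
matrices):

    import numpy as np

    def is_nash(A1, A2, x, y, tol=1e-9):
        """Check that no pure deviation improves either player's expected payoff."""
        u1, u2 = x @ A1 @ y, x @ A2 @ y
        return (A1 @ y).max() <= u1 + tol and (x @ A2).max() <= u2 + tol

    # Rock-paper-scissors: the uniform profile is a Nash equilibrium.
    A1 = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
    x = y = np.ones(3) / 3
    print(is_nash(A1, -A1, x, y))  # True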

Now, Nash’s theorem says that

Theorem 2. Any game with a finite set of strategies and a finite set of players has a mixed-strategy Nash
equilibrium.

Now, since our goal is to prescribe or predict behavior, we would also like to be able to compute a Nash
equilibrium. Unfortunately this turns out to be computationally difficult:

Theorem 3. The problem of computing a Nash equilibrium in general-sum finite games is PPAD-complete.

We won’t go into detail on what the complexity class PPAD is for now, but suffice it to say that it is
weaker than the class of NP-complete problems (it is not hard to come up with a MIP for computing a Nash
equilibrium, for example), but still believed to take exponential time in the worst case.
As a sidenote, one may make the following observation about why Nash equilibrium does not “fit” in the
class of NP-complete problems: typically in NP-completeness we ask questions such as “does there exist a
satisfying assignment to this Boolean formula?” But given a particular game, we already know that a Nash
equilibrium exists. Thus we cannot ask about the type of existence questions typically used in NP-complete
problems, but rather it is only the task of finding one of the solutions that is difficult. This can be a useful
notion to keep in mind when encountering other problems that have guaranteed existence. That said, once
one asks for additional properties such as “does there exist a Nash equilibrium where the sum of utilities is
at least v?” one gets an NP-complete problem.
Given a strategy profile σ, we will often be interested in measuring how “happy” the players are with the
outcome of the game under σ. Most commonly, we are interested in the social welfare of a strategy profile
(and especially for equilibria). The social welfare is the expected value of the sum of the players' utilities:

    Σ_{i=1}^n ui(σ) = Σ_{i=1}^n Σ_{s∈S} ui(s) Π_{i′=1}^n σ_{i′}(s_{i′}).

We already saw in the Professor’s Dilemma that there can be multiple equilibria with wildly different social
welfare: when the professor slacks off and the students sleep, the social welfare is zero; when the professor
prepares and the students listen, the social welfare is 2 · 10^6.

2.1.1 Zero-Sum Games


In the special case of a two-player zero-sum game, we have u1(s) = −u2(s) for all s ∈ S. In that case, we can
represent our problem as the bilinear saddlepoint problem we saw in the last lecture:

    min_{x∈∆n} max_{y∈∆m} ⟨x, Ay⟩.

A first observation one may make is that the minimization problem faced by the x-player is a convex
optimization problem, since the max operation is convexity-preserving. This suggests that we should have
a lot of algorithmic options to use. This turns out to be true: unlike the general case, we can compute a
zero-sum equilibrium in polynomial time using linear programming (LP).
In fact, we have the following stronger statement, which is essentially equivalent to LP duality:

Theorem 4 (von Neumann's minimax theorem). Every two-player zero-sum game has a unique value v,
called the value of the game, such that

    min_{x∈∆n} max_{y∈∆m} ⟨x, Ay⟩ = max_{y∈∆m} min_{x∈∆n} ⟨x, Ay⟩ = v.

We will prove a more general version of this theorem when we discuss regret minimization.
Because zero-sum Nash equilibria are min-max solutions, they are the best that a player can do, given a
worst-case opponent. This guarantee is the rationale for saying that a given game has been solved if a Nash
equilibrium has been computed for the game. Some games are trivially solvable, e.g. in rock-paper-scissors
we know that the uniform distribution is the only equilibrium. However, this notion has also been applied
to quite large games. For example heads-up limit Texas hold’em, one of the smallest poker variants played
by humans (which is still a huge game). In 2015, that game was essentially solved. The idea of essentially
solving a game is as follows: we want to compute a strategy that is statistically indistinguishable from a Nash
equilibrium in a lifetime of human-speed play. The statistical notion was necessary because the solution was
computed using iterative methods that only converge to an equilibrium in the limit (but in practice get quite
close very rapidly). The same argument is also used in constructing AIs for even larger two-player zero-sum
poker games where we can only try to approximate the equilibrium.
Note that this min-max guarantee of Nash equilibria does not hold in general-sum games. In general-sum
games, we have no payoff guarantees if our opponent does not play their part of the same Nash equilibrium
that we play. Interestingly, the AI and optimization methods developed for two-player zero-sum poker turned
out to still outperform top-tier human players in 6-player no-limit Texas hold’em poker, in spite of these
equilibrium selection issues. An AI based on these methods ended up beating professional human players,
in spite of the methods having no guarantees on performance, nor even of converging to a general-sum Nash
equilibrium.
Here is another interesting property of zero-sum Nash equilibria: they are interchangeable, meaning that
if you take an equilibrium (x, y) and another equilibrium (x′, y′), then (x, y′) and (x′, y) are also equilibria.
This is easy to see from the minimax formulation: both x and x′ guarantee the x-player payoff at most the
game value v, and both y and y′ guarantee the y-player at least v, so any pairing attains exactly v and is
mutually optimal.

2.2 Historical Notes


Daskalakis et al. [38] were the first to show that solving general games is a PPAD-complete problem. Their
initial result was for four-player games. Chen et al. [28] showed that the result holds even for two-player
general-sum games. NP-completeness of finding Nash equilibria with various properties was shown by Gilboa
and Zemel [53] and Conitzer and Sandholm [35].
The result where Heads-up limit Texas hold’em was essentially solved was by Bowling et al. [15]. That
paper also introduced the notion of “essentially solved.” The strong performance against top-tier humans
in 6-player poker was shown by Brown and Sandholm [19].
Chapter 3

Auctions and Mechanism Design Intro

3.1 Introduction
In this lecture note we will study the problem of how to aggregate a set of agent preferences into an outcome,
ideally in a way that achieves some desirable objective. Desiderata we might care about include social welfare,
which is just the sum of the agents' utilities derived from the outcome, or revenue in the context of auctions.
Suppose that we have a car, and we wish to give it to one of n people, with the goal of giving it
to the person that would get the most utility out of the car. One thing we could do is ask each person to
tell us how much utility they would get out of receiving the car, expressed as some positive number. This,
obviously, leads to the “who can name the highest number?” game, since no person will want to tell us how
much value they actually place on the car, but will instead try to name as large of a number as possible.
The above, rather silly, example shows that in general we need to be careful about how we design the
rules that map the preferences stated by the agents into an outcome. The general field concerned with
designing such rules, that is, mechanisms that ask agents about their preferences and use the reports to
choose an outcome, is called mechanism design.

3.2 Auctions
We will mostly focus on the most classical mechanism-design setting: auctions. We will start by considering
single-item auctions: there is a single good for sale, and there is a set of n buyers, with each buyer having
some value vi for the good. The goal will be to sell the item via a sealed-bid auction, which works as follows:

1. Each bidder i submits a bid bi ≥ 0, without seeing the bids of anyone else.

2. The seller decides who gets the good based on the submitted bids.

3. Each buyer i is charged a price pi which is a function of the bid vector b.

A few things in our setup may seem strange. First, most people would not think of sealed bids when
envisioning an auction. Instead, they typically envision what’s called the English auction. In the English
auction, bidders repeatedly call out increasing bids, until the bidding stops, at which point the highest bidder
wins and pays their last bid. This auction can be conceptualized as having a price that starts at zero, and
then rises continuously, with bidders dropping out as they become priced out. Once only one bidder is left,
the increasing price stops and the item is sold to the last bidder at that price. This auction format turns out
to be equivalent to the second-price sealed-bid auction which we will cover below. Another auction format
is the Dutch auction, which is less prevalent in practice. It starts the price very high such that nobody is
interested, and then continuously drops the price until some bidder says they are interested, at which point
they win the item at that price. The Dutch auction is likewise equivalent to the first-price sealed-bid auction,
which we cover below.
Secondly, it would seem natural to always give the item to the highest bid in step 2, but this is not
always done (though we will focus on that rule). Thirdly, the pricing step allows us to potentially charge
more bidders than only the winner. This is again done in some reasonable auction designs, though we will
mostly focus on auction formats where pi = 0 if i does not win.
When thinking about how buyers are going to behave in an auction, we need to first clarify what each
buyer knows about the other bidders. Perhaps the most standard setting is one where each buyer i has some
distribution Fi from which their value is drawn, independently from the distribution of every other
buyer. This is known as the independent private values (IPV) model. In this model, every buyer knows the
distribution of every other buyer, but they only get to observe their own value vi ∼ Fi before choosing their
bid bi . For this model, we need a new game-theoretic equilibrium notion called a Bayes Nash equilibrium
(BNE). A BNE is a set of mappings {σi}_{i=1}^n, where σi(vi) specifies the bid that buyer i submits when they
have value vi, such that for all values vi and alternative bids bi, σi(vi) achieves at least as much utility as bi
in a Bayesian sense:

Ev−i ∼F−i [ui (σi (vi ), σ−i (v−i ))|vi ] ≥ Ev−i ∼F−i [ui (bi , σ−i (v−i ))|vi ].

In the auction context, ui(bi, σ−i(v−i)) is the utility that buyer i derives given the allocation and payment rule.
The idea of a BNE works more generally for a game setup where ui is some arbitrary utility function.
We will now introduce some useful mechanism-design terminology. We will introduce it in this single-item
auction context, but it applies more broadly.
Efficiency. An outcome of a single-item auction is efficient if the item ends up allocated to the buyer
that values it the most. In general mechanism design problems, an efficient outcome is typically taken to
be one that maximizes the sum of the agent utilities, which is also known as maximizing the social welfare.
Alternatively, efficiency is sometimes taken to mean that we get a Pareto-optimal outcome, which is a weaker
notion of efficiency than social welfare maximization (convince yourself of this with a small example).
Revenue. The revenue of a single-item auction is simply the sum of payments made by the bidders.

Truthfulness, strategyproofness, and incentive compatibility. Informally, we say that an auction
is truthful or incentive compatible (IC) if buyers maximize their utility by bidding their true value (i.e.
bi = vi ). More formally, an auction is dominant strategy incentive compatible (DSIC) if a buyer maximizes
their utility by bidding their value, no matter what everyone else does. Saying that an auction (or more
generally a mechanism) is “truthful” or “strategyproof” typically means that it is DSIC. DSIC auctions are
very attractive because it means that buyers do not need to worry about what the other buyers will do: no
matter what happens, they should just bid their value. This also means that, as auction designers, we can
reasonably expect that buyers will bid their true value (or at least try to, if they are not perfectly capable of
estimating it themselves). This makes it much easier to reason about aspects such as efficiency or revenue.
A slightly weaker degree of truthfulness is that of Bayes-Nash incentive compatibility: an auction is
Bayes-Nash IC if there exists a BNE where every buyer bids their value. It is clear why this notion is less
appealing: Now buyers need to worry about whether other buyers are going to bid truthfully. If they think
that they will, then we might expect them to bid their value as well. However, if the system starts out in
some other state, we might worry that buyers will adapt their bidding over time in a way that pushes them
into some other non-truthful equilibrium.

3.2.1 First-price auctions


First-price auctions are perhaps what most people imagine when we say that we are selling our good via a
sealed-bid auction. In first-price auctions, each buyer submits some bid bi ≥ 0, and then we allocate the
item to the buyer i∗ with the highest bid and charge that buyer bi∗ . This is also sometimes referred to as
pay-your-bid.
Let’s briefly try to reason about what might happen in a first-price auction. Clearly, no buyer should bid
their true value for the good under this mechanism; in that case they receive no utility even when they win.
Instead, buyers should shade their bids, so that they sometimes win while also receiving strictly positive
utility. The problem is that buyers must strategize about how other buyers will bid, in order to figure out
how much to shade by.
This issue of shading and guessing what other buyers will bid happened in early Internet ad auctions,
where first-price auctions were initially adopted. Overture was an early pioneer in selling Internet sponsored
search ads via auction. They initially ran first-price auctions, and provided services to MSN and Yahoo
(which were popular search engines at the time). Bidding and pricing turned out to be very inefficient,
because buyers were constantly changing their bids in order to best respond to each other. Plots of the price
history show a clear “sawtooth pattern,” where a pair of bidders will take turns increasing their bid by 1
cent each, in order to beat the other bidder. Finally, one of the bidders reaches their valuation, at which
point they drop their bid much lower in order to win something else instead. Then, the winner realizes that
they should bid much lower, in order to decrease the price they pay. At that point, the bidder that dropped
out starts bidding 1 cent more again, and the pattern repeats. This leads to huge price fluctuations, and
inefficient allocations, since about half the time the item goes to the bidder with the lower valuation.
All that said, it turns out that there does exist at least one interesting characterization of how bid-
ding should work in a single-item first-price auction (the Overture example technically consists of many
“independent” first-price auctions; though that independence does not truly hold as we shall see later).
For this characterization, we assume the following symmetric model: we have n buyers as before, and
buyer i assigns value vi ∈ [0, ω] for the good. Each vi is sampled IID from an increasing distribution function
F . F is assumed to have a continuous density f and full support. Each bidder knows their own value vi ,
but only knows that the value of each other buyer is sampled according to F .
Given a bid bi , buyer i earns utility vi − bi if they win, and utility 0 otherwise. If there are multiple bids
tied for highest then we assume that a winner is picked uniformly at random among the winning bids, and
only the winning bidder pays.
It turns out that there exists a symmetric equilibrium in this setting, where each bidder bids according
to the function
β(vi ) = E[Y1 |Y1 < vi ],
where Y1 is the random variable denoting the maximum over n − 1 independently-drawn values from F .

Theorem 5. If every bidder in a first-price auction bids according to β then the resulting strategy profile is
a Bayes-Nash equilibrium.

Proof. Let G(y) = F(y)^{n−1} denote the distribution function for Y1.
Suppose all bidders except i bid according to β. The function β is continuous and monotonically
increasing: a higher value for vi simply adds additional values to the highest end of the distribution. As a
consequence, the highest bid other than that of bidder i is β(Y1 ). It follows that bidder i should never bid
more than β(ω), since that is the highest possible other bid. Now consider bidding bi ≤ β(ω). Letting z be
such that β(z) = bi , the expected value that bidder i obtains from bidding bi is:

    Π(bi, vi) = G(z)[vi − β(z)]
              = G(z)vi − G(z)E[Y1 | Y1 < z]        (by definition of β(z))
              = G(z)vi − ∫_0^z y g(y) dy           (by definition of expectation)
              = G(z)vi − G(z)z + ∫_0^z G(y) dy     (integration by parts)
              = G(z)(vi − z) + ∫_0^z G(y) dy

Now we can compare the values from bidding β(vi ) and bi :


    Π(β(vi), vi) − Π(bi, vi) = G(vi)(vi − vi) + ∫_0^{vi} G(y) dy − G(z)(vi − z) − ∫_0^z G(y) dy
                             = G(z)(z − vi) − ∫_{vi}^z G(y) dy

If z ≥ vi then this is clearly nonnegative since G(z) ≥ G(y) for all y ∈ [vi, z]. If z ≤ vi then G(z) ≤ G(y)
for y ∈ [z, vi], so we have a negative first term but subtract an even more negative integral. In either case
Π(β(vi), vi) ≥ Π(bi, vi), so bidding β(vi) is optimal.

A nice property that follows from the monotonicity of β is that the item is always allocated to the bidder
with the highest valuation, and thus the symmetric equilibrium is efficient.
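
As a concrete example (a standard calculation, not from the original notes): if values are uniform on [0, 1]
then F(v) = v and G(y) = y^{n−1}, so the density of Y1 conditioned on Y1 < v is (n−1)y^{n−2}/v^{n−1}, and

    β(v) = E[Y1 | Y1 < v] = (1/v^{n−1}) ∫_0^v y (n−1) y^{n−2} dy = ((n−1)/n) v.

That is, each bidder shades their value by the factor (n−1)/n, and the shading vanishes as the number of
competitors grows.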

3.2.2 Second-price auctions


Now we look at another pricing rule: the second-price auction. The allocation is the same as before, but
the winning bidder i∗ is charged the second-highest bid rather than their own. Under this rule, buyers can
simply submit their true value as their bid: it is easy to see that a bidder should bid their valuation in this
auction format. There are four cases to consider for a non-truthful bid bi ̸= vi, where b2 denotes the highest
bid among the other buyers:
1. bi > vi ≥ b2. In that case buyer i wins and pays b2, and would have gotten the same utility from
bidding vi.
2. bi > b2 > vi. In that case buyer i wins, but gets utility vi − b2 < 0; they would have been better off
bidding their valuation.
3. bi < b2 < vi. In that case buyer i does not win, but they could have won and gotten strictly positive
utility vi − b2 if they had bid their valuation.
4. b2 < bi < vi. In that case buyer i wins, but they would have won, and paid the same, if they had bid
their true value.
It follows that the second-price auction is DSIC, because an agent should report their true valuation no
matter what everybody else does. The second-price auction is also efficient, in the sense that it maximizes
social welfare (since the item goes to the buyer with the highest value). Finally, it is computable, in the sense
that it is easy to find the allocation and payments.
Note that the first-price auction is also computable, and under the symmetric equilibrium given in
Theorem 5 it is also efficient. But it is not truthful, and it is not hard to come up with a simple discrete
setting where there is not even an equilibrium.
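
To summarize the two sealed-bid formats side by side, here is a tiny sketch (our own illustrative code):
both allocate to the highest bid, and they differ only in the price charged to the winner.

    def run_sealed_bid(bids, rule="second"):
        """First-price: winner pays own bid. Second-price: winner pays highest other bid."""
        winner = max(range(len(bids)), key=lambda i: bids[i])
        highest_other = max(b for i, b in enumerate(bids) if i != winner)
        return winner, (bids[winner] if rule == "first" else highest_other)

    print(run_sealed_bid([5, 2, 8], rule="first"))   # (2, 8)
    print(run_sealed_bid([5, 2, 8], rule="second"))  # (2, 5)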

3.3 Mechanism Design


More generally, in mechanism design:
• There’s a set of outcomes O, and the job of the mechanism is to choose some outcome o ∈ O
• Each agent i has a private type θi ∈ Θi , that they draw from some publicly-known distribution Fi
• Each agent i has some publicly-known valuation function vi (o|θi ) that specifies a utility for each
outcome, given their type
• The goal of the center is to design a mechanism that maximizes some objective, e.g. the social welfare
Σ_i ui(o|θi)

A mechanism takes as input a vector of reported types θ from the players, and outputs an outcome,
formally it is a function f : ×i Θi → O that specifies the outcome that results from every possible set of
reported types. In mechanism design with money, we also have a payment function g : ×i Θi → Rn that
specifies how much each agent pays under the outcome. In less formal terms, a mechanism merely specifies
what happens, given the reported types from the agents. In first and second-price auctions the outcome
function was the same (allocate to the highest bidder), but the payment function was different. We could
potentially also allow randomized mechanisms f : ×i Θi → ∆(O) that map to a probability distribution over
the outcome space.
How do we analyze what happens in a given mechanism? The ideal answer is that every agent is best
off reporting their true type, no matter what everybody else does, i.e. the mechanism should be DSIC.
Formally, that would mean that for every agent i, type θi ∈ Θi , any type vector θ−i of the remaining agents,
and misreported type θi′ ∈ Θi :
E [vi (f (θi , θ−i ))] ≥ E [vi (f (θi′ , θ−i ))] ,
where the expectation is over the potential randomness of the mechanism. If there is also a payment
function g and agents have quasi-linear utilities then the inequality is

E [vi(f(θi, θ−i)) − gi(θi, θ−i)] ≥ E [vi(f(θi′, θ−i)) − gi(θi′, θ−i)].

A less satisfying answer is that there exists a Bayes-Nash equilibrium of the game induced by the mech-
anism, in which every agent reports their true type. Formally, that would mean that for every agent i, type
θi ∈ Θi , and misreported type θi′ ∈ Θi :

E [vi (f (θi , θ−i ))] ≥ E [vi (f (θi′ , θ−i ))] ,

where the expectation is over the types θ−i of the other agents, and the potential randomness of the mecha-
nism. This constraint just says that reporting their true type should maximize their expected utility, given
that everybody else is truthfully reporting. This can likewise be generalized for a payment function g.
In the setting where we can charge money, the Vickrey-Clarke-Groves (VCG) mechanism is DSIC and
maximizes social welfare. In VCG, after receiving the type vector θ, we pick the outcome o that maximizes
the reported welfare. The key to then making this choice incentive compatible is that we charge each agent
their externality:

    max_{o′∈O} Σ_{i′≠i} vi′(o′|θi′) − Σ_{i′≠i} vi′(o|θi′).

The externality measures how much better off all the other agents would have been if i were not there. When
we add together the value received by player i minus their payment, we get that their utility function is:
    Σ_{i′} vi′(o|θi′) − max_{o′∈O} Σ_{i′≠i} vi′(o′|θi′)

Intuitively, we see that i cannot affect the negative term here, and the positive term is exactly the social
welfare. Thus we get that each agent i is incentivized to maximize social welfare, which is achieved by
reporting their true type θi .
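To make the payment rule concrete, here is a minimal Python sketch of VCG over an explicitly enumerated outcome space (the function name and data layout are our own illustration, not notation from the text):

```python
def vcg(outcomes, valuations):
    """outcomes: a list of possible outcomes o; valuations: one reported
    valuation function v_i(o) per agent. Returns (o_star, payments)."""
    def welfare(o, exclude=None):
        return sum(v(o) for i, v in enumerate(valuations) if i != exclude)

    # Pick the outcome maximizing reported social welfare
    o_star = max(outcomes, key=welfare)
    payments = []
    for i in range(len(valuations)):
        # i's externality: the others' best achievable welfare without i,
        # minus their welfare under the chosen outcome
        best_without_i = max(welfare(o, exclude=i) for o in outcomes)
        payments.append(best_without_i - welfare(o_star, exclude=i))
    return o_star, payments
```

For a single-item auction, where outcome k means "allocate the item to bidder k," this recovers the second-price auction: the winner pays the second-highest bid and all other payments are zero.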

3.4 Historical Notes


The issues with first-price auctions in the context of Overture's sponsored search auctions are described in Edelman and Ostrovsky [41], which also shows plots from real data exhibiting the sawtooth pattern. The derivation of the symmetric equilibrium of the first-price auction follows the proof from Krishna [61]. Interestingly, first-price auctions have experienced a resurgence in the context of display advertising, where many independent ad exchanges moved to first price in 2018, and Google followed suit and moved their Ad Manager to first price in 2019.¹
The second-price auction is sometimes referred to as the Vickrey auction after its inventor [86]. The
generalized second-price auction was described by Edelman et al. [42], though it had been in use in the
Internet ad industry for a while at that point. The VCG mechanism was described in a series of papers by
Vickrey [86], Clarke [32], and Groves [56]. A full description of a slightly more general VCG mechanism,
and proof of correctness, can be found in Nisan et al. [74, Chapter 9].

¹ See https://www.blog.google/products/admanager/update-first-price-auctions-google-ad-manager/
Chapter 4

Regret Minimization and Sion's Minimax Theorem

So far we have mostly discussed the existence of game-theoretic equilibria such as Nash equilibrium. Now we
will get started on how to compute Nash equilibria, specifically in two-player zero-sum games. The fastest
methods for computing large-scale zero-sum Nash equilibrium are based on what’s called regret minimization.
Regret minimization is a form of single-agent decision making, where the decision maker repeatedly chooses
a decision from a set of possible choices, and each time they make a decision, they are then given some loss
vector specifying how much loss they incurred through their decision. It may seem counterintuitive that we
move on to a single-agent problem after discussing game-theoretic problems with two or more players, but
we shall see that regret minimization can be used to learn how to play a game. We will also use it to prove
a fairly general version of von Neumann’s minimax theorem, a variant that is known as Sion’s minimax
theorem.

4.1 Regret Minimization


In the simplest regret-minimization setting we imagine that we are faced with the task of repeatedly choosing
among a finite set of n actions. At each time step, we choose an action, and then a loss $g_{t,i} \in [0, 1]$ is revealed
for each action i. The loss is how unhappy we are with having chosen action i, and the goal is to minimize
losses over time. This scenario is then repeated iteratively. The key is that the losses may be adversarially
chosen after we make our choice, and we would like to come up with a decision-making procedure that does
at least as well as the single best action in hindsight. We will be allowed to choose a distribution over actions,
rather than a single action, at each decision point. Classical example applications would be picking stocks,
picking which route to take to work in a routing problem, or weather forecasting. To be concrete, imagine
that we have n weather-forecasting models that we will use to forecast the weather each day. We would like
to decide which model is best to use, but we’re not sure how to pick the best one. In that case, we may
run a regret-minimization algorithm, where our “action” is to pick a model, or a probability distribution
over models, to forecast the weather with. If we spend enough days forecasting, then we will show that it is
possible for our average prediction to be as good as the best single model in hindsight. As can be seen from
the above examples, regret minimization methods are widely applicable beyond equilibrium computation
and a useful tool to know about.

4.1.1 Setting
Formally, we are faced with the following problem: at each time step t = 1, . . . , T :

1. We must choose a decision xt ∈ ∆n

2. Afterwards, a loss vector gt ∈ [0, 1]n is revealed to us, and we pay the loss ⟨gt , xt ⟩


Our goal is to develop an algorithm that recommends good decisions. A natural goal would be to do as well
as the best sequence of actions in hindsight. But this turns out to be too ambitious, as the following example
shows

Example 1. We have 2 actions a1, a2. At timestep t, if our algorithm puts probability greater than 1/2 on action a1, then we set the loss to (1, 0), and vice versa we set it to (0, 1) if we put less than 1/2 on a1. Now we face a loss of at least T/2, whereas the best sequence in hindsight has a loss of 0.

Instead, our goal will be to minimize regret. The regret at time t is how much worse our sequence of
actions is, compared to the best single action in hindsight:

$$R_t = \sum_{\tau=1}^{t} \langle g_\tau, x_\tau \rangle - \min_{x \in \Delta^n} \sum_{\tau=1}^{t} \langle g_\tau, x \rangle$$

We say that an algorithm is a no-regret algorithm if for every ϵ > 0, there exists a sufficiently-large time
horizon T such that $R_T / T \le \epsilon$.
Let’s see an example showing that randomization is necessary. Consider the following natural algorithm:
at time t, choose the action that minimizes the loss seen so far, where ei is the vector of all zeroes except
index i is 1:
$$x_{t+1} = \operatorname*{argmin}_{x \in \{e_1, \dots, e_n\}} \sum_{\tau=1}^{t} \langle g_\tau, x \rangle. \tag{FTL}$$

This algorithm is called follow the leader (FTL). Note that it always chooses a deterministic action. The
following example shows that FTL, as well as any other deterministic algorithm, cannot be a no-regret
algorithm.

Example 2. At time t, say that we recommend action i. Since the adversary gets to choose the loss vector after our recommendation, let them choose the loss vector such that gi = 1, gj = 0 ∀j ≠ i. Then our deterministic algorithm has loss T at time T, whereas the cost of the best action in hindsight is at most T/n.

It is also possible to derive a lower bound showing that any algorithm must have regret at least $\Omega(\sqrt{T})$ in the worst case, see e.g. [78] Example 17.5.

4.1.2 The Hedge Algorithm


We now show that, while it is not possible to achieve no-regret with deterministic algorithms, it is possible
with randomized ones. We will consider the Hedge algorithm. It works as follows:

• At t = 1, initialize a weight vector w1 with $w_i^1 = 1$ for all actions i

• At time t, choose actions according to the probability distribution $p_i^t = \frac{w_i^t}{\sum_j w_j^t}$

• After observing gt, set $w_i^{t+1} = w_i^t \cdot e^{-\eta g_{t,i}}$, where η is a stepsize parameter

The stepsize η controls how aggressively we respond to new information. If gt,i is large then we decrease
the weight wi more aggressively.
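As a concrete reference, here is a minimal NumPy sketch of Hedge (the function name and array layout are ours, not from the text):

```python
import numpy as np

def hedge(losses, eta):
    """Run Hedge on a (T, n) array of per-action losses in [0, 1].
    Returns the (T, n) array of probability distributions played."""
    T, n = losses.shape
    w = np.ones(n)                        # initialize w_i^1 = 1 for every action
    plays = np.zeros((T, n))
    for t in range(T):
        plays[t] = w / w.sum()            # play proportionally to current weights
        w = w * np.exp(-eta * losses[t])  # multiplicative downweighting
    return plays

# Empirical sanity check of the regret bound on random losses:
T, n = 1000, 10
losses = np.random.default_rng(0).uniform(size=(T, n))
plays = hedge(losses, eta=1.0 / np.sqrt(T))
regret = float((plays * losses).sum() - losses.sum(axis=0).min())
```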

Theorem 6. Consider running Hedge for T timesteps. Hedge satisfies
$$R_T \le \frac{\eta T}{2} + \frac{\log n}{\eta}$$

Proof. Let $g_t^2$ denote the vector of squared losses. Let $Z_t = \sum_j w_j^t$ be the sum of weights at time t. We have
$$Z_{t+1} = \sum_{i=1}^{n} w_i^t e^{-\eta g_{t,i}} = Z_t \sum_{i=1}^{n} x_{t,i} e^{-\eta g_{t,i}} \le Z_t \sum_{i=1}^{n} x_{t,i} \left(1 - \eta g_{t,i} + \frac{\eta^2}{2} g_{t,i}^2\right) = Z_t \left(1 - \eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle\right) \le Z_t e^{-\eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle}$$
where the first inequality uses the second-order Taylor expansion $e^{-x} \le 1 - x + \frac{x^2}{2}$ and the second inequality uses $1 + x \le e^x$.
Telescoping and using $Z_1 = n$, we get
$$Z_{T+1} \le n \prod_{t=1}^{T} e^{-\eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle} = n e^{-\eta \sum_{t=1}^{T} \langle x_t, g_t \rangle + \frac{\eta^2}{2} \sum_{t=1}^{T} \langle x_t, g_t^2 \rangle}$$
Now consider the best action in hindsight $i^*$. We have
$$e^{-\eta \sum_{t=1}^{T} g_{t,i^*}} = w_{i^*}^{T+1} \le Z_{T+1} \le n e^{-\eta \sum_{t=1}^{T} \langle x_t, g_t \rangle + \frac{\eta^2}{2} \sum_{t=1}^{T} \langle x_t, g_t^2 \rangle}$$
Taking logs gives
$$-\eta \sum_{t=1}^{T} g_{t,i^*} \le \log n - \eta \sum_{t=1}^{T} \langle x_t, g_t \rangle + \frac{\eta^2}{2} \sum_{t=1}^{T} \langle x_t, g_t^2 \rangle.$$
Now we rearrange to get
$$R_T \le \frac{\log n}{\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \langle x_t, g_t^2 \rangle \le \frac{\log n}{\eta} + \frac{\eta T}{2},$$
where the last inequality follows from $x_t \in \Delta^n$ and $g_t \in [0, 1]^n$.


If we know T in advance we can now set $\eta = \frac{1}{\sqrt{T}}$ to get that Hedge is a no-regret algorithm.

4.2 Online Convex Optimization


In OCO, we are faced with a similar, but more general, setting than in the regret-minimization setup from the previous section. In the OCO setting, we are making decisions from some compact convex set $X \subseteq \mathbb{R}^n$ (analogous to
the fact that we were previously choosing probability distributions from ∆n ). After choosing a decision xt ,
we suffer a convex loss ft (xt ). We will assume that ft is differentiable for convenience, but this assumption
is not necessary.
As before, we would like to minimize the regret:
$$R_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x)$$

We saw in the last lecture that the follow-the-leader (FTL) algorithm, which always picks the action
that minimizes the sum of losses seen so far, does not work. That same argument carries over to the OCO
setting. The basic problem with FTL is that it is too unstable: If we consider a setting with X = [−1, 1] and
$f_1(x) = \frac{1}{2}x$ and $f_t$ alternates between $-x$ and $x$, then we get that FTL flip-flops between $-1$ and $1$, since these become alternately optimal, and always end up being the wrong choice for the next loss.
This motivates the need for a more stable algorithm. What we will do is to smooth out the decision
made at each point in time. In order to describe how this smoothing out works we need to take a detour
into distance-generating functions.

4.3 Distance-Generating Functions


A distance-generating function (DGF) is a function d : X → R which is continuously differentiable on the
interior of X, and strongly convex with modulus 1 with respect to a given norm ∥ · ∥, meaning
$$d(x) + \langle \nabla d(x), x' - x \rangle + \frac{1}{2}\|x' - x\|^2 \le d(x') \quad \forall x, x' \in X$$
If d is twice differentiable on int X then the following definition is equivalent:
$$\langle h, \nabla^2 d(x) h \rangle \ge \|h\|^2, \quad \forall x \in X, h \in \mathbb{R}^n$$

Intuitively, strong convexity says that the gap between d and its first-order approximation should grow
at a rate of at least ∥x − x′ ∥2 . Graphically, we can visualize the 1-dimensional version of this as follows:

Figure 4.1: Strong convexity illustrated. The gap between the distance function and its first-order approxi-
mation should grow at least as ∥x − x′ ∥2 .

We will use this gap to construct a distance function. In particular, we say that the Bregman divergence
associated with a DGF d is the function:

D(x′ ∥x) = d(x′ ) − d(x) − ⟨∇d(x), x′ − x⟩.

Intuitively, we are measuring the distance going from x to x′. Note that this is not symmetric: the distance from x′ to x may be different, and so it is not a true distance metric.
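A small sketch may make the definition concrete; the code below (our illustration, not from the text) evaluates the Bregman divergence for an arbitrary differentiable DGF, and checks that the squared Euclidean norm recovers half the squared Euclidean distance:

```python
import numpy as np

def bregman(d, grad_d, x_new, x_ref):
    """D(x_new || x_ref) = d(x_new) - d(x_ref) - <grad d(x_ref), x_new - x_ref>."""
    return d(x_new) - d(x_ref) - grad_d(x_ref) @ (x_new - x_ref)

# With d(x) = 0.5 * ||x||_2^2, the divergence is 0.5 * ||x_new - x_ref||_2^2
d = lambda x: 0.5 * x @ x
grad_d = lambda x: x
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.isclose(bregman(d, grad_d, x, y), 0.5 * np.sum((x - y) ** 2))
```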
Given d and our choice of norm ∥ · ∥, the performance of our algorithms will depend on the set width of
X with respect to d:
$$\Omega_d = \max_{x, x' \in X} d(x) - d(x'),$$

and the dual norm of $\|\cdot\|$:
$$\|g\|_* = \max_{\|x\| \le 1} \langle g, x \rangle.$$

In particular, we will care about the largest possible loss vector g that we will see, as measured by the
dual norm ∥g∥∗ .

Norms and their dual norms satisfy a useful inequality that is often called the generalized Cauchy-Schwarz inequality:
$$\langle g, x \rangle = \|x\| \left\langle g, \frac{x}{\|x\|} \right\rangle \le \|x\| \max_{\|x'\| \le 1} \langle g, x' \rangle \le \|x\| \|g\|_*$$

What’s the point of these DGFs, norms, and dual norms? The point is that we get to choose all of
these in a way that fits the “geometry” of our set X. This will become important later when we will derive
convergence rates that depend on Ω and L, where L is an upper bound on the dual norm ∥g∥X,∗ of all loss
vectors.
Consider the following two DGFs for the probability simplex ∆n = {x : i xi = 1, x ≥ 0}:
P

X 1X 2
d1 (x) = xi log(xi ), d2 (x) = x .
i
2 i i

The first is the entropy DGF, the second is the Euclidean DGF. First let us check that they are both strongly
convex on ∆n . The Euclidean DGF is clearly strongly convex wrt. the ℓ2 norm. It turns out that the entropy
DGF is strongly-convex wrt. the ℓ1 norm. Using the second-order definition of strong convexity and any
$h \in \mathbb{R}^n$:
$$\|h\|_1^2 = \left(\sum_i |h_i|\right)^2 = \left(\sum_i \sqrt{x_i}\,\frac{|h_i|}{\sqrt{x_i}}\right)^2 \le \left(\sum_i x_i\right)\left(\sum_i \frac{|h_i|^2}{x_i}\right) = \sum_i \frac{|h_i|^2}{x_i} = \langle h, \nabla^2 d_1(x) h \rangle,$$
where the inequality is by the Cauchy-Schwarz inequality and the following equality is because $x \in \Delta^n$.

But now imagine that our losses are in $[0,1]^n$. The maximum dual norm for the Euclidean DGF is then
$$\max_{\|x\|_2 \le 1} \langle \vec{1}, x \rangle = \left\langle \vec{1}, \frac{\vec{1}}{\sqrt{n}} \right\rangle = \sqrt{n},$$
while $\Omega_{d_2} = 1$.
In contrast, the maximum dual norm for the $\ell_1$ norm is
$$\max_{\|x\|_1 \le 1} \langle \vec{1}, x \rangle = \|\vec{1}\|_\infty = 1,$$

and the set width of the entropy DGF is $\Omega_{d_1} = \log n$.
Thus if our convergence rate is of the form $O\left(\frac{\Omega L}{\sqrt{T}}\right)$, then the entropy DGF gives us a $\log n$ dependence on the dimension n of the simplex, whereas the Euclidean DGF leads to a $\sqrt{n}$ dependence. This shows the well-known fact that the entropy DGF is the "right" DGF for the simplex (from a theoretical standpoint; things turn out to be quite different in numerical performance, as we shall see later in the course).
We will need the following inequality on a given norm and its dual norm:
$$\langle g, x \rangle \le \frac{1}{2}\|g\|_*^2 + \frac{1}{2}\|x\|^2, \tag{4.1}$$
which follows from
$$\langle g, x \rangle - \frac{1}{2}\|x\|^2 \le \|g\|_* \|x\| - \frac{1}{2}\|x\|^2 \le \frac{1}{2}\|g\|_*^2$$

where the first step is by the generalized Cauchy-Schwarz inequality and the second step is by maximizing
over x.
We will also need the following result concerning Bregman divergences. Unfortunately it’s not clear what
intuition one can give about this, except to say that the left-hand side is analogous to a triangle inequality.
Lemma 1 (Three-point lemma). For any three points x, u, z, we have
$$D(u\|x) - D(u\|z) - D(z\|x) = \langle \nabla d(z) - \nabla d(x), u - z \rangle$$
The proof is direct from expanding definitions and canceling terms.

4.4 Online Mirror Descent


We now cover one of the canonical OCO algorithms: Online Mirror Descent (OMD). In this algorithm, we
smooth out the choice of xt+1 in FTL by penalizing our choice by the Bregman divergence D(x∥xt ) from xt .
This has the effect of stabilizing the algorithm, where the stability is essentially due to the strong convexity
of d. We pick our iterates as follows:
$$x_{t+1} = \operatorname*{argmin}_{x \in X} \langle \eta \nabla f_t(x_t), x \rangle + D(x\|x_t),$$
where η > 0 is the stepsize.


For this algorithm to be well-defined we also need one of the following assumptions:
$$\lim_{x \to \partial X} \|\nabla d(x)\| = +\infty, \tag{4.2}$$
or d should be continuously differentiable on all of X.
To ease notation a bit, we let gt = ∇ft (xt ) throughout the section.
We first prove what is sometimes called a descent lemma or fundamental inequality for OMD.¹
Theorem 7. For all x ∈ X, we have
$$\eta(f_t(x_t) - f_t(x)) \le \eta \langle g_t, x_t - x \rangle \le D(x\|x_t) - D(x\|x_{t+1}) + \frac{\eta^2}{2}\|g_t\|_*^2$$

Proof. The first inequality in the theorem is direct from convexity of ft . Thus we only need to prove the
second inequality.
By first-order optimality of xt+1 we have
⟨ηgt + ∇d(xt+1 ) − ∇d(xt ), x − xt+1 ⟩ ≥ 0, ∀x ∈ X (4.4)
Now pick some arbitrary x ∈ X. By rearranging terms and adding and subtracting $\langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle$ we have
$$\begin{aligned}
\langle \eta g_t, x_t - x \rangle &= \langle \nabla d(x_t) - \nabla d(x_{t+1}) - \eta g_t, x - x_{t+1} \rangle + \langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle + \langle \eta g_t, x_t - x_{t+1} \rangle \\
&\le \langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle + \langle \eta g_t, x_t - x_{t+1} \rangle &&\text{by (4.4)} \\
&= D(x\|x_t) - D(x\|x_{t+1}) - D(x_{t+1}\|x_t) + \langle \eta g_t, x_t - x_{t+1} \rangle &&\text{by the three-points lemma} \\
&\le D(x\|x_t) - D(x\|x_{t+1}) - D(x_{t+1}\|x_t) + \frac{\eta^2}{2}\|g_t\|_*^2 + \frac{1}{2}\|x_t - x_{t+1}\|^2 &&\text{by (4.1)} \\
&\le D(x\|x_t) - D(x\|x_{t+1}) + \frac{\eta^2}{2}\|g_t\|_*^2 &&\text{by strong convexity of } d,
\end{aligned}$$
which proves the theorem.
¹ Our proof follows the one from the excellent lecture notes of Orabona [75]. See also Beck [8] for a proof of the offline variant of mirror descent.

The descent lemma gives us a one-step upper bound on how much better x is than xt. Based on the descent lemma, a bound on the regret of OMD can be derived. The idea is to apply the descent lemma at each time step, and then show that when we sum across the resulting inequalities, a sequence of useful cancellations occurs.

Theorem 8. The OMD algorithm with DGF d achieves the following bound on regret:
$$R_T \le \frac{D(x\|x_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|g_t\|_*^2$$

Proof. Consider any x ∈ X. Now we apply the inequality from Theorem 7 separately to each time step t = 1, ..., T, divide through by η, and then summing from t = 1, ..., T we get
$$\begin{aligned}
\sum_{t=1}^{T} \langle g_t, x_t - x \rangle &\le \sum_{t=1}^{T} \left[ \frac{1}{\eta}\left(D(x\|x_t) - D(x\|x_{t+1})\right) + \frac{\eta}{2}\|g_t\|_*^2 \right] \\
&\le \frac{D(x\|x_1) - D(x\|x_{T+1})}{\eta} + \sum_{t=1}^{T} \frac{\eta}{2}\|g_t\|_*^2 \\
&\le \frac{D(x\|x_1)}{\eta} + \sum_{t=1}^{T} \frac{\eta}{2}\|g_t\|_*^2
\end{aligned}$$
where the second inequality is by noting that the term $D(x\|x_t)$ appears with a positive sign in the t'th part of the sum, and a negative sign in the (t−1)'th part of the sum, so the sum telescopes.

Suppose that each $f_t$ is Lipschitz in the sense that $\|g_t\|_* \le L$. Using our bound Ω on DGF differences, and supposing we initialize $x_1$ at the minimizer of d, we can then set $\eta = \frac{1}{L}\sqrt{\frac{2\Omega}{T}}$ to get
$$R_T \le \frac{\Omega}{\eta} + \frac{\eta T L^2}{2} \le \sqrt{2 \Omega T}\, L$$

A related algorithm is the follow-the-regularized-leader (FTRL) algorithm. It works as follows:
$$x_{t+1} = \operatorname*{argmin}_{x \in X} \eta \left\langle \sum_{\tau=1}^{t} g_\tau, x \right\rangle + d(x).$$

Note that it is more directly related to FTL: it uses the FTL update, but with a single smoothing term
d(x), whereas OMD re-centers a Bregman divergence at D(·∥xt ) at every iteration. FTRL can be analyzed
similarly to OMD. It gives the same theoretical properties for our purposes, but we’ll see some experimental
performance from both algorithms later. For a convergence proof see Orabona [75].
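As a concrete sketch of both updates with the Euclidean DGF over the simplex, where the prox mapping reduces to a Euclidean projection (the sort-based projection helper and all names below are our own illustration):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (O(n log n))."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def omd_step(x_t, g_t, eta):
    # OMD: take a gradient step from the current iterate, then project
    return project_simplex(x_t - eta * g_t)

def ftrl_step(cum_g, eta):
    # FTRL: argmin_x eta*<sum of gradients, x> + 0.5*||x||^2 over the simplex,
    # which is the projection of -eta * (cumulative gradient)
    return project_simplex(-eta * cum_g)
```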

4.5 Minimax theorems via OCO


In the first and second chapters we saw von Neumann’s minimax theorem, which was:

Theorem 9 (von Neumann’s minimax theorem). Every two-player zero-sum game has a unique value v,
called the value of the game, such that

min max ⟨x, Ay⟩ = maxm minn ⟨x, Ay⟩ = v.


x∈∆n y∈∆m y∈∆ x∈∆

We will now prove a generalization of this theorem.



Theorem 10 (Generalized minimax theorem). Let $X \subseteq \mathbb{R}^n$, $Y \subseteq \mathbb{R}^m$ be compact convex sets. Let f(x, y) be continuous, convex in x for a fixed y, and concave in y for a fixed x, with some upper bound L on the partial subgradients with respect to x and y. Then there exists a value v such that
$$\min_{x \in X} \max_{y \in Y} f(x, y) = \max_{y \in Y} \min_{x \in X} f(x, y) = v.$$

Proof. We will view this as a game between a player choosing the minimizer and a player choosing the maximizer. Let $y^*$ be the y chosen when y is chosen first. When y is chosen second, the maximizer over y can, in the worst case, pick at least $y^*$ every time. Thus we get
$$\max_{y \in Y} \min_{x \in X} f(x, y) \le \min_{x \in X} \max_{y \in Y} f(x, y)$$

For the other direction we will use our OCO results. We run a repeated game where the players choose a strategy $x_t, y_t$ at each iteration t. The x player chooses $x_t$ according to a no-regret algorithm (say OMD), while $y_t$ is always chosen as $\operatorname{argmax}_{y \in Y} f(x_t, y)$. Let the average strategies be
$$\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t.$$

Using OMD with the Euclidean DGF (since X is compact this is well-defined), we get the following bound:
$$R_T = \sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \le O\left(\sqrt{\Omega T}\, L\right) \tag{4.5}$$

Now we bound the value of the min-max problem as
$$\min_{x \in X} \max_{y \in Y} f(x, y) \le \max_{y \in Y} f(\bar{x}, y) \le \max_{y \in Y} \frac{1}{T}\sum_{t=1}^{T} f(x_t, y) \le \frac{1}{T}\sum_{t=1}^{T} f(x_t, y_t),$$

where the first inequality follows because x̄ is a valid choice in the minimization over X, the second inequality
follows by convexity, and the third inequality follows because yt is chosen to maximize f (xt , yt ). Now we
can use the regret bound (4.5) for OMD to get
$$\begin{aligned}
\min_{x \in X} \max_{y \in Y} f(x, y) &\le \min_{x \in X} \frac{1}{T}\sum_{t=1}^{T} f(x, y_t) + O\left(\frac{\sqrt{\Omega}\, L}{\sqrt{T}}\right) \\
&\le \min_{x \in X} f(x, \bar{y}) + O\left(\frac{\sqrt{\Omega}\, L}{\sqrt{T}}\right) \\
&\le \max_{y \in Y} \min_{x \in X} f(x, y) + O\left(\frac{\sqrt{\Omega}\, L}{\sqrt{T}}\right)
\end{aligned}$$

Now taking the limit T → ∞ we get
$$\min_{x \in X} \max_{y \in Y} f(x, y) \le \max_{y \in Y} \min_{x \in X} f(x, y)$$

which concludes the proof.

For simplicity we assumed continuity of f . The argument did not really need continuity, though. The
same proof works for f which is lower/upper semicontinuous in x and y respectively.

4.6 Historical notes


When applied to the offline setting where ft = f ∀t, OMD is equivalent to the mirror descent algorithm
which was introduced by Nemirovsky and Yudin [71], with the more modern variant introduced by Beck
and Teboulle [9]. There’s a functional-analytic interpretation of OMD and mirror descent where one views
d as a mirror map that allows us to think of f and x in terms of the dual space of linear forms. This was
the original motivation for mirror descent, and allows one to apply the algorithm in broader settings, e.g.
Banach spaces. This is described in several textbooks and lecture notes e.g. Orabona [75] or Bubeck et al.
[20]. The FTRL algorithm run on an offline setting with ft = f becomes equivalent to Nesterov’s dual
averaging algorithm [72].
The minimax theorems in Theorem 9 and Theorem 10 were developed by John von Neumann in [87].
The term “von Neumann’s minimax theorem” is often used to refer to the specific version in Theorem 9.
In his original 1928 paper, von Neumann actually proved a more general result for continuous quasi-convex-
quasi-concave functions f , which captures the form given in Theorem 10. See Kjeldsen [59] for a discussion
of the history of von Neumann’s development and conceptualization of the minimax theorem, including a
discussion of the quasi-convex-quasi-concave generalization. The more general Theorem 10, as well as even
more general versions that allow quasi-concavity and quasi-convexity and abstract topological decision spaces,
are often referred to as Sion’s minimax theorem 2 , sometimes even in cases that fall under von Neumann’s
generalization beyond the bilinear case. For example, in his 1958 paper [80], Sion claims that von Neumann’s
theorem is only concerned with bilinear functions, whereas it is actually substantially more general. This
misconception that von Neumann only dealt with the bilinear case may have arisen because that is by far
the most important case from a game-theoretic perspective (since it enables solutions to two-player zero-sum
games). Moreover, von Neumann’s original 1928 paper was written in German, and an English translation
did not appear until 1958 [88].

² A quite general version of what's usually referred to as Sion's minimax theorem can be found on Wikipedia at https://en.wikipedia.org/wiki/Sion%27s_minimax_theorem.
Chapter 5

Self-Play via Regret Minimization

5.1 Recap
We have covered a slew of no-regret algorithms: hedge, online mirror descent (OMD), regret matching (RM),
and RM+ . All of these algorithms can be used for the case of solving two-player zero-sum matrix games of
the form
$$\min_{x \in \Delta^n} \max_{y \in \Delta^m} \langle x, Ay \rangle.$$

Matrix games are a special case of the more general saddle-point problem

$$\min_{x \in X} \max_{y \in Y} f(x, y)$$

where f is convex-concave, meaning that f (·, y) is convex for all fixed y, and f (x, ·) is concave for all fixed
x, and lower/upper semicontinuous. In this chapter we will cover how to solve this more general class
of saddle-point problems by using regret minimization for each “player” and having the regret minimizers
perform what is usually called self play. The name self play comes from the fact that we usually use the same
regret-minimization algorithm for each player, and so in a sense this approach towards computing equilibria
lets the chosen regret-minimization algorithm play against itself. After covering the self play setup, we will
look at some experiments on practical performance for the matrix-game case. We will also compare to an
algorithm that has stronger theoretical guarantees.

5.2 From Regret to Nash Equilibrium


In order to use regret-minimization algorithms for computing Nash equilibrium, we will run a repeated game
between the x and y players. We will assume that each player has access to some regret-minimizing algorithm
Rx and Ry (we will be a bit loose with notation here and implicitly assume that Rx and Ry keep a state
that may depend on the sequence of losses and decisions). The game is as follows:

• Initialize x1 ∈ X, y1 ∈ Y to be some pair of strategies in the relative interior (in matrix games we
usually start with the uniform strategy)

• At time t, let xt be the recommendation from Rx and yt be the recommendation from Ry

• Let Rx and Ry observe losses gt = f (·, yt ), ℓt = f (xt , ·) respectively

For a strategy pair $\bar{x}, \bar{y}$, we will measure proximity to Nash equilibrium via the saddle-point residual (SPR):
$$\xi(\bar{x}, \bar{y}) := \left[\max_{y \in Y} f(\bar{x}, y) - f(\bar{x}, \bar{y})\right] + \left[f(\bar{x}, \bar{y}) - \min_{x \in X} f(x, \bar{y})\right] = \max_{y \in Y} f(\bar{x}, y) - \min_{x \in X} f(x, \bar{y}).$$


[Figure: the repeated interaction between the two regret minimizers, with strategies $x_t, y_t$ and losses $g_t, \ell_t$ flowing between X and Y at each iteration.]
Figure 5.1: The flow of strategies and losses in regret minimization for games.

Each bracketed term represents how much each player can improve by deviating from ȳ or x̄ respectively,
given the strategy profile (x̄, ȳ). In game-theoretic terms the brackets are how much each player improves
by best responding.
Now, suppose that the regret-minimizing algorithms guarantee regret bounds of the form
$$\max_{y \in Y} \sum_{t=1}^{T} f(x_t, y) - \sum_{t=1}^{T} f(x_t, y_t) \le \epsilon_y, \qquad \sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \le \epsilon_x, \tag{5.1}$$
then the following folk theorem holds.


Theorem 11. Suppose (5.1) holds. Then for the average strategies $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t$, $\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$ the SPR is bounded by
$$\xi(\bar{x}, \bar{y}) \le \frac{\epsilon_x + \epsilon_y}{T}.$$

Proof. Summing the two inequalities in (5.1) we get
$$\begin{aligned}
\epsilon_x + \epsilon_y &\ge \max_{y \in Y} \sum_{t=1}^{T} f(x_t, y) - \sum_{t=1}^{T} f(x_t, y_t) + \sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \\
&= \max_{y \in Y} \sum_{t=1}^{T} f(x_t, y) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \\
&\ge T \left( \max_{y \in Y} f(\bar{x}, y) - \min_{x \in X} f(x, \bar{y}) \right),
\end{aligned}$$
where the inequality is by f being convex-concave.

So now we know how to compute a Nash equilibrium: simply run the above repeated game with each
player using a regret-minimizing algorithm, and the uniform average of the strategies will converge to a Nash
equilibrium.
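The following NumPy sketch (our own illustration, with Hedge as each player's regret minimizer) instantiates this recipe for a matrix game and reports the saddle-point residual of the averages:

```python
import numpy as np

def self_play(A, T, eta):
    """Approximate Nash equilibrium of min_x max_y <x, A y> via Hedge self play."""
    n, m = A.shape
    wx, wy = np.ones(n), np.ones(m)
    xbar, ybar = np.zeros(n), np.zeros(m)
    for t in range(T):
        x, y = wx / wx.sum(), wy / wy.sum()
        xbar += x / T; ybar += y / T          # uniform averaging of iterates
        wx *= np.exp(-eta * (A @ y))          # x player's loss vector is A y
        wy *= np.exp(-eta * (-A.T @ x))       # y player (maximizer) has loss -A^T x
    # Saddle-point residual: best responses to the averages are pure strategies
    spr = np.max(xbar @ A) - np.min(A @ ybar)
    return xbar, ybar, spr
```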
Figure 5.2 shows the performance of the regret-minimization algorithms taught so far in the course, when
used to compute a Nash equilibrium of a zero-sum matrix game via Theorem 11. Performance is shown on
3 randomized matrix game classes where entries in A are sampled according to: 100-by-100 uniform [0, 1],
500-by-100 standard Gaussian, and 100-by-100 standard Gaussian. All plots are averaged across 50 game
samples per setup. We show one additional algorithm for reference: the mirror prox algorithm, which is an offline optimization algorithm that converges to a Nash equilibrium at a rate of $O\left(\frac{1}{T}\right)$. It's an accelerated variant of mirror descent, and it similarly relies on a distance-generating function d. The plot shows mirror prox with the Euclidean distance.
As we see in Figure 5.2, mirror prox indeed performs better than all the $O\left(\frac{1}{\sqrt{T}}\right)$ regret minimizers using the setup for Theorem 11. On the other hand, the entropy-based variant of OMD, which has a $\log n$ dependence on the dimension n, performs much worse than the algorithms with $\sqrt{n}$ dependence.

[Figure: three log-log panels (Normal_100_100, Normal_500_100, Uniform_100_100) plotting saddle-point residual against iterations for MP l2 uniform, OMD entropy uniform, OMD l2 uniform, RM, and RM+.]
Figure 5.2: Plots showing the performance of four different regret-minimization algorithms for computing Nash equilibrium, all using Theorem 11. Mirror prox with uniform averaging is also shown as a reference point.

5.3 Alternation
Let’s try making a small tweak now; the idea of alternation. In alternation, the players are no longer
symmetric: one player sees the loss based on the previous strategy of the other player as before, but the
second player sees the loss associated to the current strategy.

• Initialize x1 , y1 to be uniform distributions over actions

• At time t, let xt be the recommendation from Rx

• The y player observes loss f (xt , ·)

• yt is the recommendation from Ry after observing f (xt , ·)

• The x player observes loss f (·, yt )

Suppose that the regret-minimizing algorithms guarantee regret bounds of the form

$$\max_{y \in Y} \sum_{t=1}^{T} f(x_{t+1}, y) - \sum_{t=1}^{T} f(x_{t+1}, y_t) \le \epsilon_y, \qquad \sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \le \epsilon_x. \tag{5.2}$$

Theorem 12. Suppose we run two regret minimizers with alternation and they give the guarantees in (5.2). Then the average strategies $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_{t+1}$, $\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$ satisfy
$$\xi(\bar{x}, \bar{y}) \le \frac{\epsilon_x + \epsilon_y + \sum_{t=1}^{T} \left(f(x_{t+1}, y_t) - f(x_t, y_t)\right)}{T}$$

Proof. As before we sum the regret bounds to get
$$\begin{aligned}
\epsilon_x + \epsilon_y &\ge \max_{y \in Y} \sum_{t=1}^{T} f(x_{t+1}, y) - \sum_{t=1}^{T} f(x_{t+1}, y_t) + \sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) \\
&= \max_{y \in Y} \sum_{t=1}^{T} f(x_{t+1}, y) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t) - \sum_{t=1}^{T} \left[f(x_{t+1}, y_t) - f(x_t, y_t)\right] \\
&\ge T \left( \max_{y \in Y} f(\bar{x}, y) - \min_{x \in X} f(x, \bar{y}) \right) - \sum_{t=1}^{T} \left[f(x_{t+1}, y_t) - f(x_t, y_t)\right]
\end{aligned}$$

Theorem 12 shows that if f (xt+1 , yt ) − f (xt , yt ) ≤ 0 for all t, then the bound for alternation is weakly
better than the bound in Theorem 11. But what does this condition mean? If we examine it from the
regret minimization perspective, it is saying that xt+1 does better than xt against yt . Intuitively, we would
expect this to hold: xt is chosen right before observing f (·, yt ), whereas xt+1 is chosen immediately after
observing f (·, yt ), and generally we would expect that any time we make a new observation, we should move
somewhat in the direction of improvement against that observation. Indeed, it turns out to be relatively
straightforward to show that this holds for all the regret minimizers we saw so far (as an exercise, show that this holds for a few regret minimizers; it is easiest for OMD).
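In code, alternation is a small reordering of the self-play loop sketched earlier (again an illustration; note that, per Theorem 12, the x player's average is taken over the iterates $x_{t+1}$):

```python
for t in range(T):
    x = wx / wx.sum()                 # x_t is recommended first
    wy *= np.exp(-eta * (-A.T @ x))   # the y player observes f(x_t, .) ...
    y = wy / wy.sum()                 # ... before recommending y_t
    wx *= np.exp(-eta * (A @ y))      # the x player then observes f(., y_t)
    xbar += x / T; ybar += y / T
```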
Figure 5.3 shows the performance of the same set of regret-minimization algorithms but now using the
setup from Theorem 12. Mirror prox is shown exactly as before.

[Figure: three log-log panels (Normal_100_100, Normal_500_100, Uniform_100_100) plotting saddle-point residual against iterations for MP l2 uniform, OMD entropy uniform alt, OMD l2 uniform alt, RM alt, and RM+ alt.]
Figure 5.3: Plots showing the performance of four different regret-minimization algorithms for computing Nash equilibrium, all using Theorem 12. Mirror prox with uniform averaging is also shown as a reference point.

Amazingly, Figure 5.3 shows that with alternation, OMD with the Euclidean DGF, regret matching, and
RM+ all perform about on par with mirror prox.

5.4 Increasing Iterate Averaging


Now we will look at one final trick. In Theorems 11 and 12 we generated a solution by uniformly averaging iterates. We will now consider polynomial averaging schemes of the form
$$\bar{x} = \frac{1}{\sum_{t=1}^{T} t^q} \sum_{t=1}^{T} t^q x_t, \qquad \bar{y} = \frac{1}{\sum_{t=1}^{T} t^q} \sum_{t=1}^{T} t^q y_t.$$
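These averages can be maintained incrementally without storing all iterates; a small sketch (our own), where q = 1 gives linear averaging:

```python
def update_average(xbar, x_t, t, q=1):
    """Return the t^q-weighted running average after iterate x_t (t is 1-indexed)."""
    total = sum(s ** q for s in range(1, t + 1))   # sum of weights so far
    return xbar + (t ** q / total) * (x_t - xbar)
```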

Figure 5.4 shows the performance of the same set of regret-minimization algorithms, but now using the setup from Theorem 12 and linear averaging in all algorithms, including mirror prox. The fastest algorithm with uniform averaging, RM+ with alternation, is shown for reference.
[Figure: three log-log panels (Normal_100_100, Normal_500_100, Uniform_100_100) plotting saddle-point residual against iterations for MP l2 linear, OMD l2 linear alt, RM linear alt, RM+ alt, and RM+ linear alt.]
Figure 5.4: Plots showing the performance of four different regret-minimization algorithms for computing Nash equilibrium, all using Theorem 12. All algorithms use linear averaging. RM+ with uniform averaging is shown as a reference point.

OMD with the Euclidean DGF and RM+ with alternation both gain another order of magnitude in performance by introducing linear averaging.
It can be shown that RM+ , online mirror descent, and mirror prox, all work with polynomial averaging
schemes.

5.5 Historical Notes and Further Reading


The derivation of a folk theorem for alternation in matrix games was by Burch et al. [23], after Farina
et al. [47] pointed out that the original folk theorem does not apply when using alternation. The general
convex-concave case is new, although easily derived from the existing results.
The fact that instantiating OMD with the Euclidean distance seems to perform better than entropy when solving matrix games in practice has been observed in a few different settings, both for first-order methods [26, 51] and regret-minimization algorithms [48]. The fact that OMD with the Euclidean distance performs much better after adding alternation has not been observed before.
Results for polynomial averaging schemes were shown by Tammelin et al. [83] and Brown and Sandholm
[18] for RM+ , in Nemirovski’s lecture notes1 for mirror descent and mirror prox, and for several other
primal-dual first-order methods by Gao et al. [51].

¹ https://www2.isye.gatech.edu/~nemirovs/LMCO_LN2019NoSolutions.pdf
Chapter 6

Extensive-Form Games

6.1 Introduction
In this lecture we will cover extensive-form games (EFGs). Extensive-form games are a richer game description that explicitly models sequential interaction. EFGs are played on a game tree. Each node in the game tree belongs to some player, who gets to choose the branch to traverse.

6.2 Perfect-Information EFGs


We start by considering perfect-information EFGs. The term perfect information refers to the fact that in these games, every player always knows the exact state of the game. A perfect-information EFG is a game played on a tree, where each internal node belongs to some player. The actions for the player at a given node are the set of branches, and by selecting a branch the game proceeds to the following node. An example is shown in Figure 6.1 on the left. That game has four nodes where players take actions, two belonging to player 1 (labelled P1) and two belonging to player 2 (labelled P2). Additionally, the game tree has 6 leaf nodes. At each leaf node, each player receives some payoff. This particular game is a zero-sum game, and the value at a leaf denotes the value that player 1 receives.

[Figure: three game trees with P1 and P2 decision nodes and numeric leaf payoffs, showing successive stages of backward induction.]
Figure 6.1: A simple perfect-information EFG. Three versions of the game are shown, where each stage corresponds to removing one layer of the game via backward induction.

Perfect-information EFGs are trivially solvable (at least if we are able to traverse the whole game tree at least once). The way to solve them is via backward induction. Backward induction works by starting at some bottom decision node of the game tree, which only leads to leaf nodes after each action is taken (such a node always exists). Then, the optimal action for the player at the node is selected, and the node is replaced with the corresponding leaf node. Now we get a new perfect-information EFG with one less internal node. Backward induction then repeats this process until there are no internal nodes left, at which point we have computed a Nash equilibrium. Thus perfect-information EFGs always have pure-strategy Nash equilibria.
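A recursive sketch of backward induction for zero-sum perfect-information trees (the tuple-based encoding and the example game are hypothetical, not taken from Figure 6.1):

```python
def backward_induction(node):
    """Leaves are numbers (P1's payoff); internal nodes are
    ('P1' | 'P2', [children]). P1 maximizes, P2 minimizes."""
    if not isinstance(node, tuple):
        return node                               # leaf: return its value
    player, children = node
    values = [backward_induction(c) for c in children]
    return max(values) if player == 'P1' else min(values)

game = ('P1', [('P2', [-2, ('P1', [-3, 2])]), ('P2', [-1, 1])])
print(backward_induction(game))                   # value under optimal play: -1
```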


While backward induction yields a linear-time algorithm for solving perfect-information games, many games of interest are nonetheless far too large to solve with it in practice. For example, chess and go both have enormous game trees, with estimates of $\sim 10^{45}$ and $\sim 10^{172}$ nodes respectively.
Next let us see how converting to normal form works: for each player, we create an action corresponding to every possible way of assigning an action at every decision point. So, if a player has d decision points with A actions each, then $A^d$ actions will be created in the normal-form representation of the EFG. This reduction to normal form works for both perfect and imperfect-information games.
Let’s consider an instructive example. Here we will model the Cuban Missile Crisis. The USSR has
moved a bunch of nuclear weapons to Cuba, and the US has to decide how to respond. If they do nothing,
then the USSR wins a political victory, and gets to keep nuclear missiles within firing distance of major US
cities. If the US responds, then it could results in a series of escalations that would eventually lead to nuclear
war, or the USSR will eventually compromise and remove the missiles.

[Game tree: the USA chooses between "respond" and "do nothing"; "do nothing" yields payoffs (0, 2); after "respond", the USSR chooses between "nuclear war" with payoffs (−1000, −1000) and "compromise" with payoffs (2, 1).]
Figure 6.2: A perfect-information EFG modeling the Cuban missile crisis.

If we convert this game to normal form, we get the following game:

                          USSR
                 Nuclear war     Compromise
USA  Respond     −1000, −1000    2, 1
     Do Nothing  0, 2            0, 2

It is straightforward to see from this representation that the Cuban Missile Crisis game has two PNE: (do nothing, nuclear war) and (respond, compromise). However, the first PNE is in a sense not compelling: what if the USA just responded? The USSR probably would not be willing to follow through on taking the action "nuclear war" since it has such low utility for them as well. This leads to the notion of subgame-perfect equilibria, which are equilibria that remain equilibria if we take any subgame consisting of picking some node in the tree and starting the game there.

6.3 Imperfect-Information EFGs


Next we study imperfect-information EFGs. As the name implies, these are games where players may not
have perfect knowledge about the state of the game. From a game-theoretic perspective, this class of games
is richer, and will rely more directly on equilibrium concepts for talking about solutions (in contrast to

perfect-information EFGs, where solutions are straightforwardly obtained from backward induction). An
example is shown in Figure 6.3.

[Figure: the game tree for the poker game described in the caption; tree structure omitted in this text version.]
Figure 6.3: A (rather weird) poker game where P1 is dealt Ace or King with equal probability. "r," "f," and "c" stand for raise, fold, and check respectively. Leaf values denote P1 payoffs. The shaded area denotes an information set: P2 does not know which of these nodes they are at, and must thus use the same strategy in both. Note that in the case where they are dealt an ace, P1 does not observe the action taken by P2.

An EFG has the following:

• Information sets: for each player, the nodes belonging to that player are partitioned into information sets $I \in \mathcal{I}_i$. Information sets represent imperfect information: a player does not know which node in an information set they are at, and thus they must use the same strategy at each node in that information set. In Figure 6.3 P2 has only 1 information set, which contains both their nodes, whereas P1 has four information sets, each one a singleton node. For player i we will also let $\mathcal{J}_i$ be an index set of information sets with generic element j.
• Each information set I with index j has a set of actions that the corresponding player may take, which is denoted by $A_j$.
• Leaf nodes: the set of terminal states, denoted Z. Player i gains utility $u_i(z)$ if leaf node $z \in Z$ is reached.
• Chance nodes where Chance or Nature moves with a fixed probability distribution. In Figure 6.3 chance deals A or K with equal probability.

We will assume throughout that the game has perfect recall, which means that no player ever forgets something they knew in the past. More formally, it means that for every information set $I \in \mathcal{I}_i$, there is a single last information-set-action pair $I', a'$ belonging to i that was the last information set and action taken by that player for every node in I.
The last action taken by player i before reaching an information set with index j is denoted $p_j$. This is well-defined due to perfect recall.
We spent a lot of time learning how one may compute a Nash equilibrium in a two-player zero-sum game by finding a saddle point of a min-max problem over convex compact polytopes. This model looked as follows (we also learned how to handle convex-concave objectives; here we restrict our attention to bilinear saddle-point problems):
$$\min_{x \in X} \max_{y \in Y} \langle x, Ay \rangle. \tag{6.1}$$

Now we would like to find a way to represent EFG zero-sum Nash equilibrium this way. This turns out to
be possible, and the key is to find the right way to represent strategies such that we get a bilinear objective.
The next section will describe this representation.
First, let us see why the most natural formulation of the strategy spaces won’t work. The natural formu-
lation would be to have a player specify a probability distribution over actions at each of their information
sets. Let σ be a strategy profile, where σa is the probability of taking action a (from now on we assume that
every action is distinct so that for any a there is only one corresponding I where the action can be played).
The expected value over leaf nodes is
$$\sum_{z \in Z} u_2(z)\, \mathbb{P}(z|\sigma).$$

The problem with this formulation is that if a player has more than one action on the path to any leaf, then
the probability P(z|σ) of reaching z is non-convex in that player’s own strategy, since we have to multiply
each of the probabilities belonging to that player on the path to z. Thus we cannot get the bilinear form in
(6.1).

6.4 Sequence Form


In this section we will describe how we can derive a bilinear representation X of the strategy space for player
1. Everything is analogous for Y .
In order to get a bilinear formulation of the expected value we do not write our strategy in terms of
the probability σa of playing an action a. Instead, we associate to each information-set-action pair I, a a
variable xa denoting the probability of playing the sequence of actions belonging to player 1 on the path to I,
including the probability of a at I. For example, in the poker game in Figure 6.3, there would be a variable
$x_{\hat{c}}$ denoting the product of probabilities player 1 puts on playing actions r and then ĉ. To be concrete, say that we have a behavioral strategy $\sigma^1$ for player 1; then the corresponding sequence-form probability on the action ĉ would be $x_{\hat{c}} = \sigma^1_r \cdot \sigma^1_{\hat{c}}$. Similarly there would be a variable $x_{\hat{f}} = \sigma^1_r \cdot \sigma^1_{\hat{f}}$ denoting the product of probabilities on r and $\hat{f}$. Clearly, for this to define a valid strategy we must have $x_{\hat{c}} + x_{\hat{f}} = x_r$.
More generally, X is defined as the set of all $x \in \mathbb{R}^n$, $x \ge 0$ such that
$$x_{p_j} = \sum_{a \in A_j} x_a, \quad \forall j \in \mathcal{J}_1, \tag{6.2}$$
where $n = \sum_{I \in \mathcal{I}_i} |A_I|$, and $p(I)$ is the parent sequence leading to I.
[Figure: a tree of simplexes; each simplex ∆j is scaled by its parent sequence q (e.g. q1 · ∆3), and branching across what the other player or nature does is indicated by ×.]
Figure 6.4: An example treeplex constructed from 9 simplices.

One way to visually think of the set of sequence-form strategies is given in Figure 6.4. This representation

is called a treeplex. Each information set is represented as a simplex, which is scaled by the parent sequence
leading to that information set (by perfect recall there is a unique parent sequence). After taking a particular

action it is possible that a player may arrive at several next possible simplexes depending on what the other
player or nature does. This is represented by the × symbol.
It’s important to understand that the sequence form specifies probabilities on sequences of actions for a
single player. Thus they are not the same as paths in the game tree; indeed, the sequence r∗ for player 2
appears in two separate paths of the game tree, as player 2 has two nodes in the corresponding information
set.
Say we have a set of probability distributions over actions at each information set, with σa denoting the
probability of playing action a. We may construct a corresponding sequence-form strategy by applying the
following equation in top-down fashion (so that xpj is always assigned before xa ):

$$x_a = x_{p_j} \sigma_a, \quad \forall j \in \mathcal{J}, a \in A_j. \tag{6.3}$$
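A small sketch of this top-down conversion (the data layout, a list of information sets with parent sequences and action ids, is our own illustration):

```python
def behavioral_to_sequence(infosets, sigma):
    """infosets: list of (parent_sequence, actions) in top-down order, where
    parent_sequence is None for infosets at the root and each action id is
    also the id of the sequence ending in that action.
    sigma: behavioral probabilities, sigma[a] for each action id a.
    Returns the sequence-form strategy x, following equation (6.3)."""
    x = {None: 1.0}                       # dummy empty sequence with x = 1
    for parent, actions in infosets:
        for a in actions:
            x[a] = x[parent] * sigma[a]   # x_a = x_{p_j} * sigma_a
    return x
```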

The payoff matrix A associated with the sequence-form setup is a sparse matrix, with each row corre-
sponding to a sequence of the x player and each column corresponding to a sequence of the y player. Each
leaf has a cell in A at the pair of sequences that are last visited by each player before reaching that leaf,
and the value in the cell is the payoff to the maximizing player in the bilinear min-max formulation. Cells
corresponding to pairs of sequences that are never the last pair of sequences visited before a leaf have a zero.
With this setup we now have an algorithm for computing a Nash equilibrium in a zero-sum EFG: run online mirror descent (OMD) for each player, using either of our folk-theorem setups from the previous chapter. However, this has one issue; recall the update for OMD (also known as a prox mapping):
$$x_{t+1} = \operatorname*{argmin}_{x \in X} \langle \gamma g_t, x \rangle + D(x\|x_t),$$

where D(x∥xt ) = d(x) − d(xt ) − ⟨∇d(xt ), x − xt ⟩ is the Bregman divergence from xt to x. In order to run
OMD, we need to be able to compute this prox mapping. The question of whether the prox mapping is easy
to compute is easily answered when X is a simplex, where updates for the entropy DGF are closed-form,
and updates for the Euclidean DGF can be computed in n log n time, where n is the number of actions. For
treeplexes this question becomes more complicated.
In principle we could use the standard Euclidean distance for d. In that case the update can be rewritten
as
$$x_{t+1} = \operatorname*{argmin}_{x \in X} \|x - (x_t - \gamma g_t)\|_2^2,$$

which means that the update requires us to project onto a treeplex. This can be done in n · d · log n time,
where n is the number of sequences and d is the depth of the decision space of the player. While this is
acceptable, it turns out there are smarter ways to compute these updates which take linear time in n.

6.5 Dilated Distance-Generating Functions


We will see two ways to construct regret minimizers for treeplexes. The first is based on choosing an appro-
priate distance-generating function (DGF) for the treeplex, such that prox mappings are easy to compute.
To that end, we now introduce what are called dilated DGFs. In dilated DGFs we assume that we have a
DGF dj for each information set j ∈ J . For the polytope X we construct the DGF

$$d(x) = \sum_{j \in \mathcal{J}_1} \beta_j x_{p_j} d_j\!\left(\frac{x^j}{x_{p_j}}\right),$$

where βj > 0 is the weight on information set j.


Dilated DGFs have the nice property that the proximal update can be computed recursively as long as
we know how to compute the simplex update for each j. Let xj , gtj etc denote the slice of a given vector

corresponding to sequences belonging to information set j. The update is
$$\begin{aligned}
&\operatorname*{argmin}_{x \in X} \langle g_t, x \rangle + D(x\|x_t) \\
&= \operatorname*{argmin}_{x \in X} \langle g_t, x \rangle + d(x) - d(x_t) - \langle \nabla d(x_t), x - x_t \rangle \\
&= \operatorname*{argmin}_{x \in X} \langle g_t - \nabla d(x_t), x \rangle + d(x) \\
&= \operatorname*{argmin}_{x \in X} \sum_{j \in \mathcal{J}} \left( \langle g_t^j - \nabla d(x_t)^j, x^j \rangle + \beta_j x_{p_j} d_j(x^j / x_{p_j}) \right) \\
&= \operatorname*{argmin}_{x \in X} \sum_{j \in \mathcal{J}} x_{p_j} \left( \langle g_t^j - \nabla d(x_t)^j, x^j / x_{p_j} \rangle + \beta_j d_j(x^j / x_{p_j}) \right)
\end{aligned}$$

Now we may consider some information set j with no descendant information sets. Since xpj is on the
outside of the parentheses, we can compute the update at j as if it were a simplex update, and the value at
the information set can be added to the coefficient on xpj . That logic can then be applied recursively. Thus
we can traverse the treeplex in bottom-up order, and at each information set we can compute the value for
$x_{t+1}^j$ in however long it takes to compute an update for a simplex with DGF $d_j$.
If we use the entropy DGF for each $j \in \mathcal{J}$ and set the weight $\beta_j = 2 + \max_{a \in A_j} \sum_{j' \in C_{j,a}} 2\beta_{j'}$, then we get a DGF for X that is strongly convex with modulus $\frac{1}{M}$, where $M = \max_{x \in X} \|x\|_1$. If we scale this DGF by M we get that it is strongly convex with modulus 1. If we instantiate the mirror prox algorithm with this DGF for X and Y we get an algorithm that converges at a rate of
$$O\left( \frac{\max_{ij} A_{ij}\, \max_{I \in \mathcal{I}} \log(|A_I|)\, \sqrt{M_x^2\, 2^{d} + M_y^2\, 2^{d}}}{T} \right),$$

where Mx , My are the maximum ℓ1 norms on X and Y , and d is an upper bound on the depth of both
treeplexes. This gives the fastest theoretical rate of convergence among gradient-based methods. However,
this only works for OMD. All our other algorithms (RM, RM+ ) were for simplex domains exclusively. Next
we derive a way to use these locally at each information set. It turns out that faster practical performance
can be obtained this way.

6.6 Counterfactual Regret Minimization


The framework we will cover is the counterfactual regret minimization (CFR) framework for constructing
regret minimizers for EFGs.
CFR is based on deriving an upper bound on regret, which allows decomposition into local regret mini-
mization at each information set.
We are interested in minimizing the standard regret notion over the sequence form:
$$R_T = \sum_{t=1}^{T} \langle g_t, x_t \rangle - \min_{x \in X} \sum_{t=1}^{T} \langle g_t, x \rangle.$$

To get the decomposition, we will define a local notion of regret which is defined with respect to behavioral
strategies σ ∈ ×j ∆j =: Σ (here we just derive the decomposition for a single player, say player 1. Everything
is analogous for player 2).
We saw in the previous lecture note that it is always possible to go from behavioral form to sequence
form using the following recurrence, where assignment is performed in top-down order.

xa = xpj σa , ∀j ∈ J , a ∈ Aj . (6.4)

It is also possible to go the other direction (though this direction is not a unique mapping, as one has a
choice of how to assign behavioral probabilities at information sets j such that xpj = 0). These procedures
produce payoff-equivalent strategies for EFGs.

For a behavioral strategy vector σ (or loss vector gt ) we say that σ j is the slice of σ corresponding to
information set j. σ j↓ is the slice corresponding to j, and every information set below j. Similarly, Σj↓ is
the set of all behavioral strategy assignments for the subset of simplexes that are in the tree of simplexes
rooted at j.
We let Cj,a be the set of next information sets belonging to player 1 that can be reached from j when
taking action a. In other words, the set of information sets whose parent sequence is a.
Now, let the value function at time t for an information set j belonging to player 1 be defined as
$$V_t^j(\sigma) = \langle g_t^j, \sigma^j \rangle + \sum_{a \in A_j} \sum_{j' \in C_{j,a}} \sigma_a V_t^{j'}(\sigma^{j'\downarrow}),$$
where $\sigma \in \Sigma^{j\downarrow}$. Intuitively, this value function represents the value that player 1 derives from information set j, assuming that player 1 played to reach it, i.e. if we counterfactually set $x_{p_j} = 1$.
The subtree regret at a given information set j is
$$R_T^{j\downarrow} = \sum_{t=1}^{T} V_t^j(\sigma_t^{j\downarrow}) - \min_{\sigma \in \Sigma^{j\downarrow}} \sum_{t=1}^{T} V_t^j(\sigma),$$

Note that this regret is with respect to the behavioral form.


The local loss that we will eventually minimize is defined as
$$\hat{g}_{t,a}^j = g_{t,a} + \sum_{j' \in C_{j,a}} V_t^{j'}(\sigma_t^{j'\downarrow}).$$

Note that for each j, the loss depends linearly on σ j ; σ j does not affect information sets below j, since we
use σt in the value function for child information sets j ′ .
Now we show that the subtree regret decomposes in terms of local losses and subtree regrets.
Theorem 13. For any $j \in \mathcal{J}$, the subtree regret at time T satisfies
$$R_T^{j\downarrow} = \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma_t^j \rangle - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle - \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow} \right).$$

Proof. Using the definition of subtree regret we get
$$\begin{aligned}
R_T^{j\downarrow} &= \sum_{t=1}^{T} V_t^j(\sigma_t^{j\downarrow}) - \min_{\sigma \in \Sigma^{j\downarrow}} \sum_{t=1}^{T} \left( \langle g_t^j, \sigma^j \rangle + \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a V_t^{j'}(\sigma^{j'\downarrow}) \right) &&\text{by expanding } V_t^j(\sigma^{j\downarrow}) \\
&= \sum_{t=1}^{T} V_t^j(\sigma_t^{j\downarrow}) - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^{T} \langle g_t^j, \sigma \rangle + \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a \min_{\hat{\sigma} \in \Sigma^{j'\downarrow}} \sum_{t=1}^{T} V_t^{j'}(\hat{\sigma}^{j'\downarrow}) \right) &&\text{by sequential min} \\
&= \sum_{t=1}^{T} V_t^j(\sigma_t^{j\downarrow}) - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle - \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow} \right) &&\text{by definition of } \hat{g}_t \text{ and } R_T^{j'\downarrow}.
\end{aligned}$$
The theorem follows, since $V_t^j(\sigma_t^{j\downarrow}) = \langle \hat{g}_t^j, \sigma_t^j \rangle$.


The local regret that we will be minimizing is the following:
$$\hat{R}_T^j := \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma_t^j \rangle - \min_{\sigma \in \Delta^j} \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle.$$

Note that this regret is in the behavioral form, and it corresponds exactly to the regret associated to locally
minimizing ĝtj at each simplex j.
The CFR framework is based on the following theorem, which says that the sequence-form regret can be
upper-bounded by the behavioral-form local regrets.

Theorem 14. The regret at time T satisfies
$$R_T = R_T^{\text{root}\downarrow} \le \max_{x \in X} \sum_{j \in \mathcal{J}} x_{p_j} \hat{R}_T^j,$$
where root is the root information set.

Proof. For the equality, consider the regret $R_T$ over the sequence-form polytope X. Since each sequence-form strategy has a payoff-equivalent behavioral strategy in Σ and vice versa, we get that the regret $R_T$ is equal to $R_T^{\text{root}\downarrow}$ for the root information set root (we may assume WLOG that there is a root information set, since if not then we can add a dummy root information set with a single action).
By Theorem 13 we have for any $j \in \mathcal{J}$
$$\begin{aligned}
R_T^{j\downarrow} &= \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma_t^j \rangle - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle - \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow} \right) \\
&\le \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma_t^j \rangle - \min_{\sigma \in \Delta^j} \sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle + \max_{\sigma \in \Delta^j} \sum_{a \in A_j, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow}, \qquad (6.5)
\end{aligned}$$
where the inequality is by the fact that independently minimizing the terms $\sum_{t=1}^{T} \langle \hat{g}_t^j, \sigma \rangle$ and $-\sum_{a \in A_j, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow}$ is smaller than jointly minimizing them.
Now we may apply (6.5) recursively in top-down fashion starting at root to get the theorem.

A direct corollary of Theorem 14 is that if the counterfactual regret at each information set grows sublin-
early then overall regret grows sublinearly. This is the foundation of the counterfactual regret minimization
(CFR) framework for minimizing regret over treeplexes. The CFR framework can succinctly be described as

1. Instantiate a local regret minimizer for each information set simplex ∆j .

2. At iteration t, for each j ∈ J , feed the local regret minimizer the counterfactual regret ĝtj .

3. Generate xt+1 as follows: ask for the next recommendation from each local regret minimizer. This
yields a set of simplex strategies, one for each information set. Construct xt+1 via (6.4).

Thus we get an algorithm for minimizing regret on treeplexes based on minimizing counterfactual regrets. In order to construct an algorithm for computing a Nash equilibrium based on a CFR setup, we may invoke the folk theorems from the previous lectures using the sequence-form strategies generated by CFR. Doing this yields an algorithm that converges to a Nash equilibrium of an EFG at a rate on the order of $O\left(\frac{1}{\sqrt{T}}\right)$.
While CFR is technically a framework for constructing local regret minimizers, the term “CFR” is often
overloaded to mean the algorithm that comes from using the folk theorem with uniform averages, and using
regret matching as the local regret minimizer at each information set. CFR+ is the algorithm resulting from
using the alternation setup, taking linear averages of strategies, and using RM+ as the local regret minimizer
at each information set.
We now show pseudocode for implementing the CFR algorithm with the RM+ regret minimizer. In
order to compute Nash equilibria with this method one would use CFR as the regret minimizer in one of the
folk-theorem setups from the previous lecture.

Algorithm 1: CFR(RM+)(J, X)
Data: J set of infosets; X sequence-form strategy space

function Setup()
    Q ← 0-initialized vector over sequences
    t ← 1

function NextStrategy()
    x ← 0 ∈ R^|X|; x_∅ ← 1
    for j ∈ J in top-down order do
        s ← Σ_{a∈A_j} Q_a
        if s = 0 then
            for a ∈ A_j do x_a ← x_{p_j} / |A_j|
        else
            for a ∈ A_j do x_a ← x_{p_j} · Q_a / s
    return x

function ObserveLoss(g_t ∈ R^|X|)
    for j ∈ J in bottom-up order do
        s ← Σ_{a∈A_j} Q_a
        v ← 0                                // the value of information set j
        if s = 0 then
            v ← Σ_{a∈A_j} g_{t,a} / |A_j|
        else
            v ← Σ_{a∈A_j} g_{t,a} · Q_a / s
        g_{t,p_j} ← g_{t,p_j} + v            // construct local loss ĝ_t
        for a ∈ A_j do
            Q_a ← [Q_a + (v − g_{t,a})]⁺     // g_{t,a} = ĝ_{t,a} since all j' ∈ C_{j,a} were already traversed
    t ← t + 1

NextStrategy simply implements the top-down recursion (6.4) while computing the update corre-
sponding to RM+ at each j. ObserveLoss uses bottom-up recursion to keep track of the regret-like
sequence Qa , which is based on ĝt,a in CFR.
A technical note here is that we assume that there is some dummy sequence ∅ at the root of the treeplex
with no corresponding j (this corresponds to a single-action dummy information set at the root, but leaving
out that dummy information set in the index set J ). This makes code much cleaner because there is no
need to worry about the special case where a given j has no parent sequence, at the low cost of increasing
the length of the sequence-form vectors by 1.

6.7 Numerical Comparison of CFR methods and OMD-like methods
Figure 6.5 shows the performance of three different variations of CFR, as well as the excessive gap technique
(EGT), a first-order method that converges at a rate of O(1/T ) using the dilated entropy DGF from last
lecture (EGT is equivalent to the mirror prox algorithm that was shown previously, in terms of theoretical
convergence rate). The plots show performance on four EFGs: Leduc poker, a simplified poker game that
is standard in EFG solving (three different deck sizes are shown), and search, a game played on a graph
where an attacker attempts to reach a particular set of nodes, and the defender tries to capture them (full
descriptions can be found in Kroer et al. [65]).
Figure 6.5: Solution accuracy as a function of the number of tree traversals in three different variants of Leduc hold'em and a pursuit-evasion game. Results are shown for CFR with regret matching, CFR with regret matching+, CFR+, and EGT. Both axes are shown on a log scale.

6.8 Stochastic Gradient Estimates


So far we have operated under the assumption that we can easily compute the matrix-vector product gt = Ayt ,
where A is the payoff matrix of the EFG that we are trying to solve. While gt can indeed be computed
in time linear in the size of the game tree, we may be in a case where the game tree is so large that even
one traversal is too much. In that case, we are interested in developing methods that can work with some
stochastic gradient estimator g̃t of the gradient. Typically, one would consider unbiased gradient estimators,
i.e. E[g̃t ] = gt .
Assuming that we have a gradient estimator g̃t for each time t, a natural approach for attempting to
compute a solution would be to apply our previous approach of running a regret minimizer for each player
and using the folk theorem, but now using g̃t at each iteration, rather than gt . If our unbiased gradient
estimator g̃t is reasonably accurate then we might expect that this approach should still yield an algorithm
for computing a Nash equilibrium. This turns out to be the case.

Theorem 15. Assume that each player uses a bounded unbiased gradient estimator for their loss at each
iteration. Then for all p ∈ (0, 1), with probability at least 1 − 2p

ξ(x̄, ȳ) ≤ (R̃_1^T + R̃_2^T)/T + (2∆ + M̃_1 + M̃_2) · √((2/T) log(1/p)),

where R̃_i^T is the regret incurred under the losses g̃_t^i for player i, ∆ = max_{z,z′∈Z} u_2(z) − u_2(z′) is the payoff range of the game, and M̃_1 ≥ max_{x,x′∈X} ⟨g̃_t, x − x′⟩ for all g̃_t is a bound on the "size" of the gradient estimates, with M̃_2 defined analogously.

We will not show the proof here, but it follows from introducing the discrete-time stochastic process

d_t := ⟨g_t, x_t − x⟩ − ⟨g̃_t, x_t − x⟩,



observing that it is a martingale difference sequence, and applying the Azuma-Hoeffding concentration inequality.

Figure 6.6: Performance of CFR, FTRL, and OMD when using the external sampling gradient estimator.
With Theorem 15 in hand, we just need a good way to construct gradient estimates g̃t ≈ Ayt . Generally,
one can construct a wide array of gradient estimators by using the fact that Ayt can be computed by
traversing the EFG game tree: at each leaf node z in the tree, we add −u1 (z)ya to gt,a′ , where a is the
last sequence taken by the y player, and a′ is the last sequence taken by the x player. To construct an
estimator, we may choose to sample actions at some subset of nodes in the game tree, and then only traverse
the sampled branches, while taking care to normalize the eventual payoff so that we maintain an unbiased
estimator. One of the most successful estimators constructed this way is the external sampling estimator. In
external sampling when computing the gradient Ayt , we sample a single action at every node belonging to
the y player or chance, while traversing all branches at nodes belonging to the x player.
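Below is a rough Python sketch of an external-sampling traversal. The game-tree interface (fields player, children, infoset, chance_probs, u1) and the function name are hypothetical, and tie-breaking and indexing details are omitted; the point is only to show where sampling happens. Note that sampling y and chance actions from their own distributions needs no reweighting for the estimate to remain unbiased.

import random

def external_sampling_estimate(root, y_strategy):
    # Sketch: estimate g_t = A y_t for the x player. At y/chance nodes a
    # single action is sampled; at x nodes all branches are traversed.
    grad = {}

    def walk(node, last_x_seq):
        if node.player == 'leaf':
            # negate u1 since g_t is treated as a loss for the x player
            grad[last_x_seq] = grad.get(last_x_seq, 0.0) - node.u1
        elif node.player == 'x':
            for a, child in node.children.items():
                walk(child, (node.infoset, a))   # recurse on every branch
        else:  # 'y' or 'chance': sample one action and follow it
            actions = list(node.children)
            probs = y_strategy(node) if node.player == 'y' else node.chance_probs
            a = random.choices(actions, weights=probs)[0]
            walk(node.children[a], last_x_seq)

    walk(root, 'root')
    return grad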
Figure 6.6 shows the performance when using external sampling in CFR (CFR with sampling is usually
called Monte-Carlo CFR or MCCFR), FTRL, and OMD. Performance is shown on Leduc with a 13-card
deck, Goofspiel (another card game), search, and battleship. In the deterministic case we saw that CFR+
was much faster than the theoretically-superior EGT algorithm (and OMD/FTRL would perform much
worse than EGT). Here we see that in the stochastic case, which algorithm performs best varies from game to game.

6.9 Historical Notes


The sequence form was discovered in the USSR in the 60s [77] and later rediscovered independently [89, 60].
Dilated DGFs for EFGs were introduced by Hoda et al. [57] where they proved that any such DGF constructed
from simplex DGFs which are strongly convex must also be strongly convex. Kroer et al. [65] showed the
strong convexity modulus of the dilated entropy DGF shown here. An explicit bound for the dilated Euclidean
DGF can be found in Farina et al. [48], which also explores regret minimization algorithms with dilated DGFs
in depth.
CFR-based algorithms were used as the algorithm for computing Nash equilibrium in all the recent
milestones where AIs beat human players at various poker games [15, 69, 17, 19].
CFR was introduced by Zinkevich et al. [91]. Many later variations have been developed, for example

the stochastic method MCCFR [67], and variations on which local regret minimizer to use in order to speed
up practical performance [83, 18]. The proof of CFR given here is a simplified version of the more general
theorem developed in [47]. The plots on CFR vs EGT are from Kroer et al. [65].
The bound on error from using a stochastic method in Theorem 15 is from Farina et al. [49], and the
plots on stochastic methods are from that same paper. External sampling and several other EFG gradient
estimators were introduced by Lanctot et al. [67].
Chapter 7

Intro to Fair Division and Market Equilibrium

7.1 Fair Division Intro


In this chapter we start the study of fair allocation of resources to a set of individuals. We start by focusing
on the fair division setting. In fair division, we have one or more items that we wish to allocate to a set of
agents, under the assumption that the items are infinitely-divisible, meaning that we can perform fractional
allocation. In the next chapter we will study the setting with discrete items. The goal will be to allocate
the items in a manner that is efficient, while attempting to satisfy various notions of fairness towards each
individual agent. Fair allocation has many applications such as assigning course seats to students, pilot-to-
plane assignment for airlines, dividing estates, chores, or rent, and fair recommender systems.
In this note we study fair division problems with the following setup: we have a set of m infinitely-divisible
items that we wish to divide among n agents. Without loss of generality we may assume that each item has
supply 1. We will denote the bundle of items given to agent i as xi , where xij is the amount of item j that
is allocated to agent i. Each agent has some utility function ui (xi ) ∈ R+ denoting how much they like the
bundle xi . We shall use x to denote an assignment of items to agents. We will later study the indivisible
setting.
Given all of the above, we would like to choose a “good” assignment x of items to agents. However,
“good” turns out to be very complicated in the setting of fair division, as there are many possible desiderata
we may wish to account for.
First, we would like the allocation to somehow be efficient, meaning that it should lead to high utilities for the agents. One option would be to try to maximize the social welfare Σ_i u_i(x_i), the sum of agent utilities.
However, this turns out to be incompatible with the fairness notions that we will introduce later. An easy
criticism of social welfare in the context of fair division is that it favors welfare monsters: agents with much
greater capacity for utility are given more items1 . Instead, we shall focus on the much less satisfying notion
of Pareto optimality: we wish to find an allocation x such that for every other allocation x′ , if some agent i′
is better off under x′ , then some other agent is strictly worse off. In other words, x should be such that no
other allocation weakly improves all agent’s utilities, unless all utilities stay the same.
We will consider the following measures of how fair an allocation x is:
• No envy: x has no envy if for every pair of agents i, i′, u_i(x_i) ≥ u_i(x_{i′}). In other words, every agent likes their own bundle at least as much as that of anyone else.

• Proportionality: x satisfies proportionality if u_i(x_i) ≥ u_i(1⃗ · (1/n)). That is, every agent likes their bundle x_i at least as well as the bundle where they receive 1/n of every item.
We begin our study of fair division mechanisms with a classic: competitive equilibrium from equal incomes
(CEEI). In CEEI, we construct a mechanism for fair division by giving each agent a unit budget of fake
1 See also https://existentialcomics.com/comic/8


currency (or funny money), computing what is called a competitive equilibrium (also known as Walrasian
equilibrium or market equilibrium; we will use the latter terminology) under this new market, and using the
corresponding allocation as our fair division. The fake currency is then thrown away, since it had no purpose
except to define a market.
To understand this mechanism, we first introduce market equilibrium. In a market equilibrium, we wish
to find a set of prices p ∈ R^m_+ for each of the m items, as well as an allocation x of items to agents such that everybody is assigned an optimal allocation given the prices and their budget. Formally, the demand set of an agent i with budget B_i is

D(p) = argmax_{x_i ≥ 0} u_i(x_i)  s.t.  ⟨p, x_i⟩ ≤ B_i.

A market equilibrium is an allocation-price pair (x, p) s.t. x_i ∈ D(p) for all agents i, and Σ_i x_ij = 1 for all items j.
CEEI satisfies all of the desiderata we asked for. It is Pareto optimal (every market equilibrium is Pareto optimal by the first welfare theorem). It has no envy: since each agent has the same budget B_i = 1 in CEEI and every agent is buying something in their demand set, no envy must be satisfied, since they can afford the bundle of any other agent. Finally, proportionality is satisfied, since each agent can afford the bundle where they get 1/n of each item (convince yourself why).
Market-equilibrium-based allocation for divisible items has applications in large-scale Internet markets.
First, it can be applied in fair recommender systems. As an example, consider a job recommendations site.
It’s a two-sided market. On one side are the users, who view job ads. On the other side are the companies
creating job ads. Naively, a system might try to simply maximize the number of job ads that users click on,
or apply to. This can lead to extremely imbalanced allocations, where a few job ads get a huge number of
views and applicants, which is bad both for users and the companies. Instead, the system may wish to fairly
distribute user views across the many different job ads. In that case, CEEI can be used. In this setting the
agents are the job ads, and the items are slots in the ranked list of job ads shown to the user. Secondly,
there are strong connections between market equilibrium and the allocation of ads in large-scale Internet ad
markets. This connection will be explored in detail in a later note.
Motivated by the application to fair division, we will now cover market equilibrium, both when they exist
and how to find one when they do.

7.2 Fisher Market


We first study market equilibrium in the Fisher market setting. We have a set of m infinitely-divisible items
that we wish to divide among n buyers. Without loss of generality we may assume that each item has supply
1. We will denote the bundle of items given to buyer i as xi , where xij is the amount of item j that is
allocated to buyer i. Each buyer has some utility function ui (xi ) ∈ R+ denoting how much they like the
bundle xi . We shall use x to denote an assignment of items to buyers. Each buyer is endowed with a budget
Bi of currency.

7.2.1 Linear Utilities


We start by studying the simplest setting, where the utility of each buyer is linear. This means that every
buyer i has some valuation vector vi ∈ Rm , and ui (xi ) = ⟨vi , xi ⟩.
Amazingly, there is a nice convex program for computing a market equilibrium. Before giving the convex
program, let’s consider some properties that we would like. First, if we are going to find a feasible allocation,
we would obviously like the supply constraints to be respected, i.e.
X
xij ≤ 1, ∀j.
i

Secondly, since a buyer’s demand does not change even if we rescale their valuation by a constant, we
would like the optimal solution to our convex program to also remain unchanged. Similarly, splitting the
budget of a buyer into two separate buyers with the same valuation function should leave the allocation

unchanged. These conditions are satisfied by the budget-weighted geometric mean of the utilities:
( ∏_i u_i(x_i)^{B_i} )^{1/Σ_i B_i}.

Since taking roots does not affect optimality, we may drop the outer root; then, taking the log of the whole expression, this is equivalent to optimizing

max_{x≥0} Σ_i B_i log⟨v_i, x_i⟩                                (EG)
s.t.  Σ_i x_ij ≤ 1, ∀j = 1, …, m.        [dual variable: p_j]

On the right are the dual variables associated to each constraint. It is easy to see that this is a convex
program. First, the feasible set is defined by linear inequalities. Second, we are taking a max of a sum of
concave functions composed with linear maps. Since taking a sum and composing with a linear map both
preserve concavity we get that the objective is concave.
The solution to the primal problem x along with the vector of dual variables p yields a market equilibrium.
Here we assume that for every item j there exists i such that vij > 0, and every buyer values at least one
item above 0.

Theorem 16. The pair (x, p) of primal allocation x and dual variables p from EG forms a market equilibrium.

Proof. To see this, we need to look at the KKT conditions of the primal and dual variables. Writing the
Lagrangian relaxation and applying Sion’s minimax theorem (the most general version of Sion’s minimax theorem only requires compactness in one of the variables, which we have here since x can be restricted to a compact set without loss of generality) gives

min_{p≥0} max_{x≥0} Σ_i B_i log⟨v_i, x_i⟩ + Σ_j p_j (1 − Σ_i x_ij)
= min_{p≥0} max_{x≥0} Σ_i [ B_i log⟨v_i, x_i⟩ − ⟨p, x_i⟩ ] + Σ_j p_j    (7.1)

Looking at optimality conditions we get:

1. For all items j: p_j > 0 ⇒ Σ_i x_ij = 1

2. For all buyers i and items j: B_i/⟨v_i, x_i⟩ ≤ p_j/v_ij

3. For all pairs i, j: x_ij > 0 ⇒ B_i/⟨v_i, x_i⟩ = p_j/v_ij

The first condition shows that every item is fully allocated, since for every j there is some buyer i with non-zero value, and by the second condition p_j ≥ v_ij B_i/⟨v_i, x_i⟩ > 0.
The second condition for market equilibrium is that every buyer is assigned a bundle from their demand set. We will use β_i = B_i/⟨v_i, x_i⟩ = B_i/u_i(x_i) to denote the utility price that buyer i pays. First off, by the second condition we have that the utility price that buyer i gets satisfies

β_i ≤ p_j/v_ij for all j.

By the third condition, we have that if xij > 0 then for all other items j ′ we have
p_j/v_ij = β_i ≤ p_j′/v_ij′.

Thus, any item j that buyer i is assigned has at least as low a utility price as any other item j′. In other words, they only buy items that have the best bang-per-buck among all the items. Having established that they only purchase optimal items, it remains to show that they spend their whole budget. Multiplying the third condition by x_ij and rearranging gives

x_ij v_ij B_i/⟨v_i, x_i⟩ = p_j x_ij,

for any j such that x_ij > 0. Summing across all such j yields

Σ_j p_j x_ij = Σ_j x_ij v_ij B_i/⟨v_i, x_i⟩ = ⟨v_i, x_i⟩ · B_i/⟨v_i, x_i⟩ = B_i.

EG gives us an immediate proof of existence for the linear Fisher market setting: the feasible set is clearly
non-empty, and the max is guaranteed to be achieved.
In a previous lecture note we referenced Pareto optimality as a property of market equilibrium. It is now
trivial to see that Pareto optimality holds in Fisher-market equilibrium: since the equilibrium is a solution to EG, it must be Pareto optimal; otherwise we could construct a feasible solution with a strictly better objective!
From the EG formulation we can also see that the equilibrium utilities and prices are in fact unique.
First note that any market equilibrium allocation would satisfy the optimality conditions of EG, and thus
be an optimal solution. But if there were more than one set of utility vectors that were equilibria, then by
the strict concavity of the log (applied to a convex combination of the two solutions) we would get that there is a strictly better solution, which is a contradiction.
That equilibrium prices are unique now follows from the third optimality condition, since all terms except
the utilities are constants.

7.3 More General Utilities


It turns out that EG can be applied to a broader class of utilities. This class is the set of utilities that are
concave, homogeneous, and continuous.
In that case we get an optimization problem of the form
max_{x≥0} Σ_i B_i log u_i(x_i)                                 (EG)
s.t.  Σ_i x_ij ≤ 1, ∀j = 1, …, m.        [dual variable: p_j]

This is still a convex optimization problem, since composing a concave and nondecreasing function (the
log) with a concave function (ui ) yields a concave function.
Beyond linear utilities, the most famous classes of utilities that fall under this category are:

1. Cobb-Douglas utilities: u_i(x_i) = ∏_j (x_ij)^{a_ij}, where Σ_j a_ij = 1

2. Leontief utilities: u_i(x_i) = min_j x_ij/a_ij

3. The family of constant elasticity of substitution (CES) utilities: u_i(x_i) = (Σ_j a_ij x_ij^ρ)^{1/ρ}, where the a_ij are the utility parameters of a buyer, and ρ parameterizes the family, with −∞ < ρ ≤ 1 and ρ ≠ 0

CES utilities turn out to generalize all the other utilities we have seen so far: Leontief utilities are
obtained as ρ approaches −∞, Cobb-Douglas utilities as ρ approaches 0, and linear utilities when ρ = 1.
More generally, ρ < 0 means that items are complements, whereas ρ > 0 means that items are substitutes.
If ui is continuously differentiable then the proof that EG computes a market equilibrium in this more
general setting essentially follows that of the linear case. The only non-trivial change is that when we derive
optimality conditions by taking the derivative of the Lagrangian with respect to xi we get
1. B_i/u_i(x_i) ≤ p_j/(∂u_i(x_i)/∂x_ij)

2. x_ij > 0 ⇒ B_i/u_i(x_i) = p_j/(∂u_i(x_i)/∂x_ij)

In order to prove that buyers spend their budget exactly in this setting we can apply Euler’s homogeneous function theorem, u_i(x_i) = Σ_j x_ij ∂u_i(x_i)/∂x_ij, to get

Σ_j x_ij p_j = Σ_j x_ij (∂u_i(x_i)/∂x_ij) · B_i/u_i(x_i) = B_i.

7.4 Computing Market Equilibrium


So now we know how to write a market equilibrium problem as a convex program. How should we solve
it? One option is to build the EG convex program explicitly using mathematical programming software.
However, most contemporary software is not very good at handling this kind of objective function (formally
this falls under exponential cone programming, which is still relatively new). As of 2019, my experience with
open-source solvers was that they fail at 150 items and 150 buyers, with randomly-generated valuations. The
Mosek solver [70] is currently the only industry-grade solver that supports exponential cone programming
(support for exponential cones was only just added in 2018). It fares much better, and scales to a few
thousand buyers and items. For problems of moderate-to-large size, this is the most effective approach.
However, for very large instances, the iterations of the interior-point solver used in Mosek become too slow.
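For moderate sizes, (EG) can be stated directly in a modeling language. Here is a minimal sketch using cvxpy on a hypothetical random instance (the instance and variable names are ours; cvxpy hands the exponential-cone program to whichever capable solver is installed, e.g. Mosek or ECOS):

import cvxpy as cp
import numpy as np

np.random.seed(0)
n, m = 3, 4                                 # hypothetical small instance
v = np.random.rand(n, m)                    # valuations v[i, j]
B = np.ones(n)                              # unit budgets, as in CEEI

x = cp.Variable((n, m), nonneg=True)
utilities = cp.sum(cp.multiply(v, x), axis=1)   # u_i = <v_i, x_i>
supply = cp.sum(x, axis=0) <= 1                 # one unit of each item
prob = cp.Problem(cp.Maximize(B @ cp.log(utilities)), [supply])
prob.solve()

allocation = x.value
prices = supply.dual_value    # equilibrium prices p_j are the constraint duals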
Instead, for extremely large problems we may invoke some of our earlier results on saddle-point problems.
In particular, the formulation (7.1) is amenable to online mirror descent and the folk-theorem based approach
for solving saddle-point problems. In that framework, we can interpret the repeated game as being played
between a pricer trying to minimize over prices p, and the set of buyers choosing allocations x.
As an exercise, convince yourself that the OMD/folk theorem approach works. Pay particular attention
to the assumptions needed for online mirror descent.
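As a hint for the exercise, here is a heuristic numpy sketch of such dynamics for linear utilities: the buyers best respond to the current prices (spending their whole budget on a best bang-per-buck item, which follows from the inner maximization in (7.1)), and the pricer takes a projected subgradient step. The function name, step size, iteration count, and price floor are arbitrary choices of ours, and no convergence claim is made here.

import numpy as np

def eg_saddle_dynamics(v, B, eta=0.05, iters=5000):
    n, m = v.shape
    p = np.ones(m)                           # initial prices
    for _ in range(iters):
        x = np.zeros((n, m))
        for i in range(n):
            j = int(np.argmax(v[i] / p))     # best bang-per-buck item
            x[i, j] = B[i] / p[j]            # best response spends all of B_i
        # subgradient of the Lagrangian w.r.t. p_j is 1 - sum_i x_ij
        p = np.maximum(p - eta * (1.0 - x.sum(axis=0)), 1e-6)
    return x, p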

7.5 Historical Notes


The original Eisenberg-Gale convex program was given for linear utilities in [44]. [43] later extended it to
utilities that are concave, continuous, and homogeneous.
Fairly assigning course seats to students via market equilibrium was studied by Budish [21]. Goldman
and Procaccia [55] introduce an online service spliddit.org which has a user-friendly interface for fairly
dividing many things such as estates, rent, fares, and others. The motivating example of fair recommender
systems, where we fairly divide impressions among content creators via CEEI was suggested in [64] and [62].
Similar models, but where money has real value, were considered for ad auctions with budget constraints by
several authors [14, 36, 37].
There is a rich literature on various iterative approaches to computing market equilibrium in Fisher mar-
kets. One can apply first-order methods or regret-minimization approaches to the saddle-point formulation
(7.1) directly, which was done in Kroer et al. [64] and Gao et al. [51]. There is a large literature on interpret-
ing first-order methods through the lens of dynamics between a pricer who increases and decreases prices
as items become oversubscribed and undersubscribed, and buyers report their preferred bundles, or make
gradient-steps in the direction of their preference [33, 13, 30]. There is also a literature deriving auction-
like algorithms, which can similarly sometimes be viewed as instantiations of gradient descent and related
algorithms [10, 73].
A fairly comprehensive recent overview of fair division can be found at https://users.cs.duke.edu/~rupert/fair-division-aaai20/Tutorial-Slides.pdf.
Chapter 8

Fair Allocation with Indivisible Goods

8.1 Introduction
In this lecture note we study the problem of performing fair allocation when the items are indivisible. This
setting presents a number of challenges that were not present in the divisible case.
It is obviously an important setting in practice. For example, the website http://www.spliddit.org/
allows users to fairly split estates, financial assets, toys, or other goods. Another important application is
that of fairly allocating course seats to students. This setting is even more intricate, because valuations in
that setting are combinatorial. In order to design suitable mechanisms for fairly dividing discrete goods, we
will need to reevaluate our fairness concepts.

8.2 Setup
We have a set of m indivisible goods that we wish to divide among n agents. We assume that each good has
supply 1. We will denote the bundle of goods given to agent i as x_i, where x_ij is the amount of good j that is allocated to agent i. The set of feasible allocations is then {x | Σ_i x_ij ≤ 1, x_ij ∈ {0, 1}}.
Unless otherwise specified, each agent is assumed to have a linear utility function ui (xi ) = ⟨vi , xi ⟩ denoting
how much they like the bundle xi .

8.3 Fair Allocation


In the case of indivisible items, several of our fairness properties become much harder to achieve. We will
assume that we are required to construct a Pareto-efficient allocation.
Proportional fairness doesn’t even make sense anymore: it rested on the idea of assigning each agent
their fractional share 1/n of each item. There is, however, a suitable generalization of proportionality that does
make sense for the indivisible case: the maximin share (MMS) guarantee: For agent i, their MMS is the
value they would get if they get to divide the items up into n bundles, and are required to take the worst
bundle. Formally:

max_x min_{k=1,…,n} u_i(x_k)
s.t. Σ_k x_kj ≤ 1, ∀j
x_kj ∈ {0, 1}, ∀k, j

where x_1, …, x_n denote the n bundles of the partition.

We say that an allocation x is an MMS allocation if every agent i receives utility ui (xi ) that is at least
as high as their MMS guarantee. In the case of 2 agents, an MMS allocation always exists. As an exercise,
you might try to come up with an algorithm for finding such an allocation.1

1 Solution: compute one of the solutions to agent 1’s MMS computation problem. Then let agent 2 choose their favorite bundle, and give the other bundle to agent 1. Agent 1 clearly receives their MMS guarantee, or better. Agent 2 also does: their MMS guarantee is at most (1/2)∥v_2∥_1, and here they receive utility of at least (1/2)∥v_2∥_1.

In the case of 3 or more agents, such a solution may not exist. The counterexample is very involved, so
we won’t cover it here.
Theorem 17. For n ≥ 3 agents, there exist additive valuations for which an MMS allocation does not exist.
However, an allocation such that each agent receives at least 3/4 of their MMS guarantee always exists.
The original spliddit algorithm for dividing goods worked as follows: first, compute the largest α ∈ [0, 1] such that every agent can be guaranteed an α fraction of their MMS guarantee (this always ends up being α = 1
in practice). Then, subject to the constraints ui (xi ) ≥ αMMSi , a social welfare-maximizing allocation was
computed. However, this can lead to some weird results.
Example 3. Three agents each have valuation 1 for 5 items. In that case, the MMS guarantee is 1 for each
agent. But now the social welfare-maximizing solution can allocate three items to agent 1, and 1 item each
to agents 2 and 3. Obviously a more fair solution would be to allocate 2 items to 2 agents, 1 item to the last
agent.
One observation we can make about the 3/1/1 solution versus the 2/2/1 solution is that envy is strictly higher in the 3/1/1 solution.
With the above motivation, let us consider envy in the discrete setting. It is easy to see that we generally
won’t be able to get envy-free solutions if we are required to assign all items. Consider 2 agents splitting an
inheritance: a house worth $500k, a car worth $10k, and a jewelry set worth $5k. Since we have to give the
house to a single agent, the other agent is guaranteed to have envy. Thus we will need a relaxed notion of
envy:
Definition 1. An allocation x is envy-free up to one good (EF1) if for every pair of agents i, k such that i envies k, there exists an item j such that x_kj = 1 and u_i(x_i) ≥ u_i(x_k − e_j), where e_j is the j’th basis vector.
Intuitively, this definition says that for any pair of agents i, k such that i envies k, that envy can be
removed by removing a single item from the bundle of k. Note that requiring EF1 would have forced us to
use the 2/2/1 allocation in Example 3.
For linear utilities, an EF1 allocation is easily found (if we disregard Pareto optimality). As an exercise, come up with an algorithm for computing an EF1 allocation for linear valuations.2 In fact, EF1 allocations can be computed in polynomial time for any monotone set of utility functions (meaning that if x_i ≥ x′_i then u_i(x_i) ≥ u_i(x′_i)).

2 This is achieved by the round-robin algorithm: simply have agents take turns picking their favorite item. It is easy to see that EF1 is an invariant of the partial allocations resulting from this process.
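For concreteness, here is a minimal Python sketch of the round-robin algorithm from the footnote above (the function name is ours; linear valuations are assumed):

import numpy as np

def round_robin(v):
    # v: n x m array; agents take turns picking their favorite remaining item
    v = np.asarray(v, dtype=float)
    n, m = v.shape
    remaining = list(range(m))
    bundles = [[] for _ in range(n)]
    for t in range(m):
        i = t % n                                     # whose turn it is
        j = max(remaining, key=lambda jj: v[i, jj])   # favorite remaining item
        bundles[i].append(j)
        remaining.remove(j)
    return bundles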
However, ideally we would like to come up with an algorithm that gives us EF1 as well as Pareto efficiency.
To achieve this, we will consider the product of utilities, which we saw previously in Eisenberg-Gale. This
product is also called the Nash welfare of an allocation:
NW(x) = ∏_i u_i(x_i)

The max Nash welfare (MNW) solution picks an allocation that maximizes N W (x):
max_x ∏_i u_i(x_i)
s.t. Σ_i x_ij ≤ 1, ∀j
x_ij ∈ {0, 1}, ∀i, j
Note that here we have to worry about the degenerate case where N W (x) = 0 for all x, meaning that
it is impossible to give strictly positive utility to all agents. We will assume that there exists x such that
N W (x) > 0. If this does not hold, typically one seeks a solution that maximizes the number of agents with
strictly positive utility, and then the largest MNW achievable among subsets of that size is chosen.
The MNW solution turns out to achieve both Pareto optimality (obviously, since otherwise it would not
solve the MNW optimization problem), and EF1:

Theorem 18. The MNW solution for linear utilities is Pareto optimal and EF1.

Proof. Let x be the MNW solution. Say for contradiction that agent i envies agent k by more than one good. Let j be the item allocated to agent k that minimizes the ratio v_kj/v_ij. Let x′ be the same allocation as x, except that x′_ij = 1, x′_kj = 0. The proof is concluded by showing that NW(x′) > NW(x), which contradicts optimality of x for the MNW problem.
Using the linearity of utilities we have u_i(x′_i) = u_i(x_i) + v_ij and u_k(x′_k) = u_k(x_k) − v_kj. Every other utility stays the same. Now we have

NW(x′)/NW(x) > 1
⇔ [u_i(x_i) + v_ij] · [u_k(x_k) − v_kj] / (u_i(x_i) · u_k(x_k)) > 1
⇔ (1 + v_ij/u_i(x_i)) · (1 − v_kj/u_k(x_k)) > 1
⇔ (v_kj/v_ij) · [u_i(x_i) + v_ij] < u_k(x_k)    (8.1)

By how we choose j we have

v_kj/v_ij ≤ (Σ_{j′∈x_k} v_kj′) / (Σ_{j′∈x_k} v_ij′) = u_k(x_k)/u_i(x_k),

and by the envy property (i envies k by more than one good, so in particular u_i(x_i) < u_i(x_k) − v_ij) we have

u_i(x_i) + v_ij < u_i(x_k).
Now we can multiply together the last two inequalities to get (8.1).

The MNW solution also turns out to give a guarantee on MMS, but not a very strong one: every agent is guaranteed to get 2/(1 + √(4n − 3)) of their MMS guarantee, and this bound is tight. Luckily, in practice the MNW solution seems to fare much better. The table below shows the MMS approximation ratios across 1281 “divide goods” instances submitted to the Spliddit website for fairly allocating goods:

MMS approximation ratio interval | [0.75, 0.8) | [0.8, 0.9) | [0.9, 1) | 1
% of instances in interval       | 0.16%       | 0.7%       | 3.51%    | 95.63%
Over 95% of the instances have every player receive their full MMS guarantee.

8.4 Computing Discrete Max Nash Welfare


8.4.1 Complexity
The problem of maximizing Nash welfare is generally not easy. In fact, the problem turns out to be not only
NP-hard, but NP-hard to approximate within a factor µ ≈ 1.00008 (the best currently-known approximation
factor is 1.45, so the gap between 1.00008 and 1.45 is open).
The reduction is based on the vertex-cover problem on 3-regular graphs, which is NP-hard to approximate
within factor ≈ 1.01. A 3-regular graph is a graph where each vertex has degree 3.
The proof is not particularly illuminating, so we will skip it here. However, let’s see a quick way to
prove a simpler statement: that the problem is NP-hard even for 2 players with identical linear valuations.
Consider the following

Definition 2. Partition problem: you are given a multiset of integers S = {s_1, …, s_m} (potentially with duplicates), and your task is to figure out if there is a way to partition S into two sets S_1, S_2 such that Σ_{i∈S_1} s_i = Σ_{i∈S_2} s_i.

We may now construct an MNW instance as follows: we create two agents and m items. Each agent has value s_j for item j. Now by the AM-GM inequality (two-variable case: xy ≤ ((x + y)/2)^2, with equality iff x = y) there exists a correct partitioning if and only if the MNW allocation has value ((1/2) Σ_j s_j)^2.
This result can be extended to show strong NP-hardness by considering the k-equal-sum-subset prob-
lem: given a multiset S of x1 , . . . , xn positive integers, are there k nonempty disjoint subsets S1 , . . . , Sk ⊂ S
such that sum(S1 ) = . . . = sum(Sk ). The exact same reduction as before works, but with k agents rather
than 2.

8.4.2 Algorithms
Given these computational complexity problems, how should we compute an MNW allocation in practice?
We present two approaches here. First, we can take the log of the objective, to get a concave function.
After taking logs, we get the following mixed-integer exponential-cone program:
max Σ_i log u_i
s.t. u_i ≤ ⟨v_i, x_i⟩, ∀i = 1, …, n
Σ_i x_ij ≤ 1, ∀j = 1, …, m        (8.2)
x_ij ∈ {0, 1}, ∀i, j
This is simply the discrete version of the Eisenberg-Gale convex program. One approach is to solve this
problem directly, e.g. using Mosek.
Alternatively, we can impose some additional structure on the valuation space: if we assume that all
valuations are integer-valued, then we know that ui (xi ) will take on some integer value in the range 0 to
∥v_i∥_1. In that case, we can add a variable w_i for each agent i, and use either (1) the linearization of the log at each integer value, or (2) the linear function through the points (k, log k), (k + 1, log(k + 1)), as upper bounds on w_i. This gives (1/2)∥v_i∥_1 constraints for each i using the line-segment approach (the linearization uses twice as many constraints), but ensures that w_i is equal to log⟨v_i, x_i⟩ for all integer-valued ⟨v_i, x_i⟩.
Using the line segment approach gives the following mixed-integer linear program (MILP):
max Σ_i w_i
s.t. w_i ≤ log k + [log(k + 1) − log k] · (⟨v_i, x_i⟩ − k), ∀i = 1, …, n, k = 1, 3, …, ∥v_i∥_1
Σ_j v_ij x_ij ≥ 1, ∀i
Σ_i x_ij ≤ 1, ∀j = 1, …, m        (8.3)
x_ij ∈ {0, 1}, ∀i, j
These two mixed-integer programs both have some drawbacks: For the first mixed-integer exponential-
cone program, we must resort to much less mature technology than for mixed-integer linear programs. On
the other hand, the discrete EG program is reasonably compact: the program is roughly the size of a solution.
For the MILP, the good news is that MILP technology is quite mature, and so we might expect this to solve
quickly. On the other hand, adding n × ∥vi ∥1 additional constraints can be quite a lot, and could lead to
slow LP solves as part of the branch-and-bound procedure.
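As an illustration, here is a minimal sketch of (8.3) in the PuLP modeling library (the function and variable names are ours; it assumes nonnegative integer valuations with ∥v_i∥_1 ≥ 2 so that each w_i is bounded by at least one secant constraint):

import math
import pulp

def mnw_milp_sketch(v):
    # v: n x m nested list of nonnegative integer valuations
    n, m = len(v), len(v[0])
    prob = pulp.LpProblem("mnw", pulp.LpMaximize)
    x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(m)]
         for i in range(n)]
    w = [pulp.LpVariable(f"w_{i}") for i in range(n)]
    prob += pulp.lpSum(w)                        # objective: sum of w_i
    for i in range(n):
        u_i = pulp.lpSum(v[i][j] * x[i][j] for j in range(m))
        prob += u_i >= 1                         # strictly positive utility
        for k in range(1, sum(v[i]), 2):         # secants at k = 1, 3, 5, ...
            slope = math.log(k + 1) - math.log(k)
            prob += w[i] <= math.log(k) + slope * (u_i - k)
    for j in range(m):
        prob += pulp.lpSum(x[i][j] for i in range(n)) <= 1
    prob.solve()
    return [[int(x[i][j].value()) for j in range(m)] for i in range(n)]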
Figure 8.1 shows the performance of the two approaches.

8.5 Historical Notes


The maximin share was introduced by Budish [21]. The results on nonexistence of MMS allocations, and an approximation guarantee of 2/3, were given by Kurokawa et al. [66]. The approximation guarantee was improved to 3/4 by Ghodsi et al. [52]. The application of MNW to fair division was proposed by Caragiannis
et al. [24].

Figure 8.1: Plot showing the runtime of discrete Eisenberg-Gale and the MILP approach.

A really nice overview talk targeted at a technical audience is given by Ariel Procaccia here: https://www.youtube.com/watch?v=7lUtS-l9ytI. Most of the material here is based on his excellent presentations
of these topics.
The 1.00008 inapproximability result was by Lee [68]. The 1.45-approximation algorithm was given
by Barman et al. [6]. Strong NP-hardness of k-equal-sum-subset is shown in Cieliebak et al. [31].
The MILP using approximation to the log at each integer point was introduced by Caragiannis et al. [25].
At the time, Mosek did not support exponential cones, and so they did not compare to the direct solving of
discrete Eisenberg-Gale. The results shown here are the first direct comparison of the two, as far as I know.
Chapter 9

Internet Advertising Auctions: Position Auctions

9.1 Introduction
In this chapter we begin our study of more advanced auction concepts beyond first and second-price auctions.
We will focus on a type of auction motivated by internet advertising auctions. Internet advertising auctions
provide the funding for almost every free internet service such as Google search, Facebook, Twitter, and so on.
At the heart of these monetization schemes is a market design based around independently running auctions
every time a user shows up. This happens many times per second, advertisers participate in thousands or
even millions of auctions, have budget constraints that span the auctions, and each user query generates
multiple potential slots for showing ads. For all these reasons, these markets turn out to require a lot of new
theory for understanding them. Similarly, the scale of the problem necessitates the design of algorithmic
agents for bidding on behalf of advertisers.
First we will introduce the position auction, which is a highly structured multi-item auction. There, we
will look at the two most practically-important auction formats: the generalized second-price auction (GSP),
and the Vickrey-Clarke-Groves (VCG) auction. Then in the following chapters, we will study auctions with
budgets and repeated auctions.

9.1.1 Considerations for internet advertising


Consider the following problem: a user shows up and searches for the word “mortgage” on Google; now,
you are Google, and you have thousands of ads that you could show to the user when returning their search
result. Typically, Google shows a few ads at the top of the results (say 2 ads, to be concrete); an example is
shown in Fig. 9.1. This setting is referred to as the “sponsored search setting.” How do you decide which
ads to show? And how do you decide how much to charge each advertiser for showing their ad? A natural
suggestion would be to try to use auctions. Based on earlier chapters of the book, one might think of running separate first- or second-price auctions, one for each ad slot. In that case, it is clear how to decide winners and how to set prices. Yet this immediately runs into a problem: the same ad may win multiple auctions, and thus be shown in several slots. This looks bad for the user, and the advertiser almost surely does not want to pay for multiple slots. Instead, we need to design a multi-item auction. But we cannot simply use the multi-item generalization of e.g. the second-price auction, where all items are identical. This is because
different slots are not identical: users are generally more likely to click on the first ad than the second ad,
and so on. This motivates the position auction, which we study in this chapter.
The position auction model can also be used to approximate other settings such as the insertion of ads in a news feed; a news feed is the familiar infinitely-scrolling list of e.g. Facebook posts, Reddit posts,
Instagram posts, or Twitter posts. For example, Reddit typically inserts 1 ad in the set of visible results
before scrolling (see Figure 9.1 on the right), with another ad appearing in the next 10-15 results (this was
tested on March 28th 2020). Similarly, Facebook and Twitter insert 1-2 sponsored posts near the top of the


Figure 9.1: Left: A Google query for “mortgage” shows 2 ads. Organic search results follow further down.
Right: The front page of Reddit. The second feed story is an ad.

feed. Truly capturing feed auctions does require some care, however. The assumption of there being a fixed
number of items is incorrect for that setting. Instead, the number of ads shown depends on how far the user
scrolls, the size of the ads, and what else is being shown in terms of organic content. We will focus on the
simpler setting with a fixed number of slots, but properly handling feed auctions is an interesting problem.
Beyond the multi-item and budget aspects, internet advertising has a few other interesting quirks. These are discussed briefly below, though we will mostly abstract away considerations around these issues.

Targeted advertising.

In a classical advertising setting such as TV or newspaper advertising, the same ad is shown to every viewer
of a given TV channel, or every reader of a newspaper. This means that it is largely not feasible for smaller,
and especially niche, retailers to advertise, since their return on investment is very low due to the small
fraction of viewers or readers that fit their niche. All this changed with the advent of internet advertising,
where niche retailers can perform much more fine-grained targeting of their ads. This has enabled many
niche retailers to scale up their audience reach significantly.
One way that targeting can occur is directly through association with the search term in sponsored
search. For example, by bidding on the search term “mortgage,” a lender is effectively performing a type
of targeting. However, a second type of targeting occurs by matching on query and user features (such
targeting is used across many types of internet advertising including search, feed ads, and others). For
example, a company selling surf boards might wish to target users at the intersection of the categories {age
16-30, lives in California}. Because each individual auction corresponds to a single user query, the idea of
targeted advertising can be captured in the valuations that we will use for the buyers in our auction setup:
each buyer corresponds to an advertiser, each auction corresponds to a query, and the buyer will have value
zero for all items in a given auction if the associated query features do not match their targeting criteria.
Targeted advertising has the potential for some adverse effects. Of particular note are demographic biases
in the types of ads being shown (a well-documented example is that in some settings, ads for new luxury
housing developments were disproportionately shown to white people). In a later lecture note we will study
such questions around demographic fairness. A second potential issue is that of user privacy. This is an
interesting topic that we will unfortunately not have too much to say on, as it is outside the scope of the
course.

Pay per click.

Another revolution compared to pre-internet advertising is the pay per click nature of most internet adver-
tising auctions. Many advertisers are not actually interested in the user simply viewing their ad. Instead,
their goal is to get the user to click on the ad, or even something downstream of clicking on the ad, such
as selling the advertised product via the linked website. Because the platform, such as Google, is in a much
better position to predict whether a given user will click on a given ad, these auctions operate on a cost
per click basis, rather than a cost per impression. What this means is that any given advertiser does not
actually pay just because they won the auction and got their ad shown, instead they pay only if the user
actually clicks on their ad.

From an auction perspective, this means that the valuations used in the auctions must take into account
the probability that the user will click on the ad. Valuations are typically constructed by breaking down
the value that a buyer i (in this case an advertiser) has for an item (which is a particular slot in the search
query or user feed) into several components. The value per click of advertiser i is the value vi > 0 they place
on any user within their targeting criteria clicking on their ad (modern platforms generalize this concept
to a value per conversion, where a conversion can be a click, an actual sale of a product, the user viewing
a video, etc.). The click-through-rate is the likelihood that the user behind query j will click on the ad
of advertiser i, independently of where on the page the ad is shown. We denote this by CTR_ij; we will assume that CTR_ij = 0 if query j does not fall under the targeting criteria of buyer i. Finally, the slot
qualities q1 , . . . , qS are scalar values denoting the quality of each slot that an ad could end up in. These
are monotonically decreasing values, indicating the fact that it’s generally preferable to be shown higher up
on the page. Now, finally, the value that buyer i has for being shown in slot s of query j is modeled as
v_ijs = v_i · CTR_ij · q_s.
For the rest of the lecture note, we will assume that vij is the value that buyer i has for auction j;
this value encodes the value per click, the CTR, and the targeting criteria (but can allow for more general
valuations that do not decompose). Note that this assumes correct CTR predictions, which is obviously
not true in practice. In practice the CTRs are estimated using machine learning, and it is of interest to
understand which discrepancies this introduces into the market. Secondly, we are assuming that buyers are
maximizing their expected utility, rather than observed utility. This is largely a non-problem, since they
will participate in thousands or even millions of auctions, and thus their realized value can reasonably be
expected to match the expectation (at least if the CTRs are correct). The slot quality qs will be handled
separately in the next section. Once we start discussing budgets, we will keep the presentation simple by
assuming a single item per auction, thus avoiding the need for slot qualities.

9.2 Position Auctions


In the position auction model, a set of S slots are for sale. The slots are shown in ranked order, and the
value that an advertiser derives from showing their ad in a particular slot s decomposes into two terms
vis = vi qs where vi is the value that the advertiser places on a user clicking on their ad, and qs is the
advertiser-independent click probability of slot s. Here we assume that vi already incorporates the click-
through rate (so in particular it could be that v_i = v_i′ · CTR_i where v_i′ is their actual value per click, and CTR_i is the click-through rate in the current auction). It is assumed that q_1 ≥ q_2 ≥ ⋯ ≥ q_S, i.e. the top slot
is better than the second slot, and so on. Going back to the original setting, a position auction corresponds
to the individual auction that is run when a particular user query shows up. Because we are analyzing this
individual auction in isolation, we can drop the j index and simply assume that vi gives the expected value
per click for buyer i in the current auction.
Now suppose that the n advertisers submit bids b ∈ Rn+ . Both auction formats we will use then proceed
to perform allocation via welfare maximization, assuming that the bids are truthful. We will also refer to
this as bid maximization. In particular, we sort b (suppose without loss of generality that the bids are
conveniently already ordered by buyer index: b1 ≥ b2 ≥ · · · ≥ bn ), and allocate the slots in order of bids (so
buyer 1 with bid b1 gets slot 1, buyer 2 gets slot 2, and so on up to bid bS getting slot S).

Example 4. Suppose we have two slots with quality scores q1 = 1, q2 = 0.5, and three buyers with values
v1 = 10, v2 = 8, v3 = 2, and suppose they all bid their values. Then buyer 1 is allocated slot 1, and they
generate a value of v1 · q1 = 10, buyer 2 is allocated slot 2 and they generate a value v2 · q2 = 4, and buyer 3
gets nothing.

9.2.1 Generalized Second-Price Auctions


The generalized second-price (GSP) auction sells the S slots as follows: First, we allocate via bid maximiza-
tion as described above. If the user clicks on ad i ≤ S, then advertiser i is charged the next-highest bid bi+1 .
GSP generalizes second-price auctions in the sense that if S = 1 then this auction format is equivalent to
the standard second-price auction (if we take expected values in lieu of the pay-per-click model). However,

this is a fairly superficial generalization, since GSP turns out to lose the core property of the second-price
auction: truthfulness!
In particular, consider Example 4 again. With GSP prices, buyer 1 gets utility q1 (v1 − v2 ) = 2 when
everyone bids truthfully. If buyer 1 instead bids some value between 2 and 8, then they get utility q2 (v1 −v3 ) =
4. Thus, buyer 1 is better off misreporting. More generally, it turns out that the GSP auction can have
several pure-Nash equilibria, and some of these lead to allocations that are not welfare-maximizing. Consider
the following bid vector for Example 4, b = (4, 8, 2). Buyer 1 gets utility 0.5(10 − 2) = 4 (whereas they’d get
utility 2 for bidding above 8). Buyer 2 gets utility 1(8 − 4) = 4 (whereas they’d get utility 0.5(8 − 2) = 3 for
bidding below 4). Buyer 3 is priced out.

9.2.2 VCG for Position Auctions


The second pricing rule we will consider is the VCG rule. Recall that VCG computes the welfare-maximizing
allocation (assuming truthful bids), and then charges buyer i their externality (i.e. how much the presence
of buyer i decreases the social welfare across the remaining agents).
Let W_{−i}^S be the social welfare achieved by buyers [n] \ i if we maximize welfare across only those buyers, and let W_{−i}^{S−i} be the social welfare of [n] \ i if we maximize welfare using all slots except slot i. Buyer i gets charged their externality, which is as follows (with the convention q_{S+1} = 0):

W_{−i}^S − W_{−i}^{S−i} = Σ_{k∈{i+1,…,S+1}} b_k · q_{k−1} − Σ_{k∈{i+1,…,S}} b_k · q_k    (9.1)
                        = Σ_{k∈{i+1,…,S+1}} b_k · (q_{k−1} − q_k)    (9.2)
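To make the two pricing rules concrete, here is a small Python sketch (our own illustration, with a hypothetical function name) computing the allocation plus the expected GSP and VCG payments for a single position auction:

def position_auction(bids, q):
    # bids: one bid per buyer; q: slot qualities q_1 >= ... >= q_S.
    # Returns buyers in slot order plus expected GSP and VCG payments per
    # slot (expected = per-click price times slot quality).
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    S = len(q)
    b = [bids[i] for i in order] + [0.0] * (S + 1)  # sorted bids, zero-padded
    qq = list(q) + [0.0]                            # convention q_{S+1} = 0
    gsp = [qq[s] * b[s + 1] for s in range(S)]      # next-highest bid per click
    vcg = [sum(b[k] * (qq[k - 1] - qq[k]) for k in range(s + 1, S + 1))
           for s in range(S)]                       # externality formula (9.2)
    return order[:S], gsp, vcg

# On Example 4 (q = (1, 0.5), truthful bids (10, 8, 2)) this gives expected
# GSP payments (8, 1) and expected VCG payments (5, 1).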

We already saw a sketch of the fact that VCG is truthful in ??, but here we show the result specifically
for the position auction setting, where the proof is nice and short.
Theorem 19. The VCG auction for position auctions is truthful.

Proof. Suppose again that buyer bids are sorted, with buyer i winning slot i when bidding truthfully. Now suppose buyer i misreports and gets slot k instead. We want to show that bidding truthfully maximizes utility, which means:

q_i · v_i − [W_{−i}^S − W_{−i}^{S−i}] ≥ q_k · v_i − [W_{−i}^S − W_{−i}^{S−k}].

Simplifying this expression gives

q_i · v_i + W_{−i}^{S−i} ≥ q_k · v_i + W_{−i}^{S−k}.

Now we see that both the right-hand and left-hand sides correspond to social welfare under two different allocations (where we treat bids from other buyers as their true values). The left-hand side is the social welfare when i bids truthfully, while the right-hand side is the social welfare when i misreports in a way that gives them slot k. Given that VCG picked the left-hand side, and VCG allocates via welfare maximization, the left-hand side must be larger.

9.3 Historical Notes


An early version of the GSP auction was introduced in the early internet search days at Overture, which was
an innovator in sponsored search advertising, and they were later acquired by Yahoo, which used this rule as
well. Google then started using the more modern version of GSP. From an academic perspective, the GSP
rule and position auctions in general started to be studied by Varian [84] and Edelman et al. [42], motivated
by its use in practice. An interesting historical perspective on why VCG was not chosen is discussed by
Varian and Harris [85] who worked at Google at the time. The primary reasons are essentially inertial: a
lot of engineering work was already going into GSP, and advertisers had gotten used to bidding in GSP. A
major concern would be that they would need to raise their bids in VCG due to its truthfulness, which might
be hard to explain to them given their existing experience with GSP. Facebook notably uses VCG rather
than GSP [85], unlike the prior internet companies.
Chapter 10

Auctions with Budgets

10.1 Introduction
The previous chapter introduced a new aspect to auctions associated with internet advertising auctions: the
multiple-slot issue. This chapter studies a second major practical aspect of internet advertising auctions:
budgets. In these auctions, a large fraction of advertisers specify a budget constraint that must hold in
aggregate across all of the payments made by the advertiser. Because these budget constraints are applied
across all of the auctions that an advertiser participates in, they couple the auctions together, and force
us to consider the aggregate incentives across auctions. This is in contrast to all of our previous auction
results, which studied a single auction in isolation. Notably, these budgets constraints break the incentive
compatibility of the second-price auction; for an advertiser with a budget constraint, it is not necessarily
optimal to bid their true value in each auction!

10.2 Auctions Markets


Throughout the rest of this chapter, we will consider settings where each individual auction is a single-item
auction, using either first or second-price rules. This is of course a simplification: in practice each individual
auction would be more complicated (e.g. a position auction), but even just for single-item individual auctions
it turns out that there are a lot of interesting problems.
In this setting we have n buyers and m goods. Buyer i has value vij for good j, and each buyer has some
budget Bi . Each good j will be sold via sealed-bid auction, using either first or second-price. We assume
that for all buyers i, there exists some item j such that vij > 0, and similarly for all j there exists i such
that vij > 0. Let x ∈ Rn×m be an allocation of items to buyers, with associated prices p ∈ Rm . The utility
that a buyer i derives from this allocation is
u_i(x_i, p) = ⟨v_i, x_i⟩ − ⟨p, x_i⟩ if ⟨p, x_i⟩ ≤ B_i, and u_i(x_i, p) = −∞ otherwise.

We call this setting an auction market. If second-price auctions are used then we call it a second-price
auction market, and conversely we call it a first-price auction market if first-price auctions are used.

10.3 Second-Price Auction Markets


In Chapter 3 we saw that the second-price auction is strategyproof. However, this relied on there being
a single auction, and no budgets. It’s easy to construct an example showing that this is no longer true
in second-price auction markets. Consider a market with two buyers and two items, with valuations v1 =
(100, 100), v2 = (1, 1) and budgets B1 = B2 = 1. If both buyers submit their true valuations then buyer 1
wins both items, pays 2, and gets −∞ utility.


Figure 10.1: Comparison of pacing methods. Left: no pacing, middle: probabilistic pacing, right: multiplicative pacing.

Instead, each buyer needs to somehow smooth out their spending across auctions. For large-scale Internet
auctions this is typically achieved via some sort of pacing rule. Here we will mention two that have been
used in practice:
1. Probabilistic pacing: each buyer i is given a parameter αi ∈ [0, 1] denoting the probability that they
should participate in each auction. For each auction j, an independent coin is flipped which comes up
heads with probability αi , and if it comes up heads then the buyer submits a bid bij = vij to that
auction.
2. Multiplicative pacing: each buyer i is given a parameter αi ∈ [0, 1], which acts as a scalar multiplier
on their truthful bids. In particular, for each auction j, buyer i submits a bid bij = αi vij .
Both methods have been applied in real-life large-scale Internet ad markets.
Figure 10.1 shows a comparison of pacing methods for a simplified setting where time is taken into
account. Here we assume that we are considering some buyer i whose value is the same for every item,
but other bidders are causing the items to have different prices. On the x-axis we plot time, and on the
y-axis we plot the price of each item. On the left is the outcome from naive bidding: the buyer spends their
budget much too fast, and ends up running out of budget when there are many high-value items left for
them to buy. In practice, many buyers also prefer to smoothly spend their budget throughout the day. In
the middle we show probabilistic pacing, where we do get smooth budget expenditure. However, the buyer
ends up buying some very expensive item, while missing out on much cheaper items that have the same
value to them. Finally, on the right is the result from multiplicative pacing, where the buyer picks an optimal threshold to buy at, and thus buys items optimally in order of bang-per-buck.
In this note we will focus on multiplicative pacing, but see the historical notes section for some references
to papers that also consider probabilistic pacing.
The intuition given in Figure 10.1 can be shown to hold more generally when items have different values
to the buyer. Generally, it turns out that given a set of bids by all the other bidders, a buyer can always
specify a best response by choosing an optimal pacing multiplier:
Proposition 1. Suppose we allow arbitrary bids in each auction. If we hold all bids for buyers k ̸= i fixed,
then buyer i has a best response that consists of multiplicatively-paced bids (assuming that if a buyer is tied
for winning an auction, they can specify the fraction that they win).
Proof. Since every other bid is held fixed, we can think of each item as having some price pj = max_{k≠i} bkj, which is what i would pay if they bid bij ≥ bkj. Now we may sort the items in decreasing order of bang-per-buck vij/pj. An optimal allocation for i clearly consists of buying items in this order, until they reach some index j such that if they buy every item with index l < j and some fraction xij of item j, they either spend their whole budget, or j is the first item with pj/vij ≥ 1 (if pj/vij > 1 then xij = 0). Now set αi = pj/vij. With this bid, i gets exactly this optimal allocation: for all items l ≤ j (which are the items in the optimal allocation), we have αi vil = (pj/vij) vil ≥ (pl/vil) vil = pl.
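To make the greedy structure of the proof concrete, the following is a minimal sketch of the resulting best-response computation; the function name and example data are ours, and we assume strictly positive values and prices:

```python
import numpy as np

def best_response_multiplier(values, prices, budget):
    """Greedy best response from Proposition 1: buy items in decreasing
    bang-per-buck order until the budget binds or bang-per-buck reaches 1,
    and return the pacing multiplier p_j / v_ij at the threshold item."""
    order = np.argsort(prices / values)    # increasing p_j/v_ij = decreasing bang-per-buck
    spend = 0.0
    for j in order:
        if prices[j] >= values[j]:         # bang-per-buck at most 1: stop (alpha capped at 1)
            return 1.0
        if spend + prices[j] > budget:     # budget binds partway through item j
            return prices[j] / values[j]
        spend += prices[j]
    return 1.0                             # all profitable items fit within the budget

# Example: the first two items are profitable, but the budget binds at the
# second one, giving alpha = p/v = 0.5 there.
alpha = best_response_multiplier(np.array([4.0, 2.0, 1.0]),
                                 np.array([1.0, 1.0, 2.0]), budget=1.5)
```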

The goal will be to find a pacing equilibrium:


Definition 3. A second-price pacing equilibrium (SPPE) is a vector of pacing multipliers α ∈ [0, 1]^n, a fractional allocation xij, and a price vector p such that:

• For all j, Σ_i xij = 1, and if xij > 0 then i is tied for the highest bid on item j.

• If xij > 0 then pj = max_{k≠i} αk vkj.

• For all i, Σ_j pj xij ≤ Bi. Additionally, if the inequality is strict then αi = 1.

The first and second conditions of pacing equilibrium simply enforce that each item always goes to the highest bids and is priced according to the second-price rule. The third condition ensures that a buyer is only paced if their budget constraint is binding. It follows (almost) immediately from Proposition 1 that every buyer is best responding in SPPE.
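For concreteness, here is a minimal sketch of a routine that checks the three conditions of Definition 3 for a candidate (α, x); all names are ours, and we assume at least two buyers so that second prices are well defined:

```python
import numpy as np

def is_sppe(alpha, x, v, B, tol=1e-9):
    """Check the SPPE conditions for a candidate (alpha, x), given
    valuations v (shape n x m, n >= 2) and budgets B."""
    b = alpha[:, None] * v                      # paced bids b_ij = alpha_i * v_ij
    top = b.max(axis=0)                         # highest bid on each item
    if not np.allclose(x.sum(axis=0), 1.0, atol=1e-6):
        return False                            # every item must be fully allocated
    spend = np.zeros(len(B))
    for j in range(v.shape[1]):
        for i in np.where(x[:, j] > tol)[0]:
            if b[i, j] < top[j] - tol:
                return False                    # winners must be tied for the top bid
            p_ij = np.delete(b[:, j], i).max()  # second price faced by buyer i
            spend[i] += p_ij * x[i, j]
    paced = alpha < 1.0 - tol
    # budgets respected, and paced buyers spend exactly their budget
    return bool(np.all(spend <= B + 1e-6) and np.all(spend[paced] >= B[paced] - 1e-6))
```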
A nice property of SPPE is that it is always guaranteed to exist (this is not immediate from the existence
of, say, a Nash equilibrium in a standard game, since an SPPE corresponds to a specific type of pure-strategy
Nash equilibrium):
Theorem 20. An SPPE of a pacing game is always guaranteed to exist.
We won’t cover the whole proof here, but we will state the main ingredients, which are useful to know
more generally.
• First, a smoothed pacing game is constructed. In the smoothed game, the allocation is smoothed
out among all bids that are within ϵ of the maximum bid, thus making the allocation a deterministic
function of the pacing multipliers α. Several other smooth approximations are also introduced to deal
with other discontinuities. In the end, a game is obtained, where each player simply has as their action
space the interval [0, 1] and utilities are nice continuous and quasi-concave functions.
• Secondly, the following fixed-point theorem is invoked to guarantee existence of a pure-strategy Nash
equilibrium in the smoothed game.
Theorem 21. Consider a game with n players, strategy spaces Ai, and utility functions ui(si, s−i). If the following conditions are satisfied:
– Ai is convex and compact for all i
– ui(si, ·) is continuous in s−i
– ui(·, s−i) is continuous and quasi-concave in si (quasi-concavity of a function f(x) means that for
all x, y and λ ∈ [0, 1] it holds that f(λx + (1 − λ)y) ≥ min(f(x), f(y)))
then a pure-strategy Nash equilibrium exists.

• Finally, the limit point of smoothed games as the smoothing factor ϵ tends to zero is shown to yield
an equilibrium in the original pacing problem.
Unfortunately, while SPPE is guaranteed to exist, it turns out that sometimes there are several SPPE,
and they can have large differences in revenue, social welfare, and so on. An example is shown in Figure 10.2.
In practice this means that we might need to worry about whether we are in a good and fair equilibrium.
Another positive property of SPPE is that every SPPE is also a market equilibrium, if we consider a
market equilibrium setting where each buyer has a quasi-linear demand function that respects the total
supply as follows:
Di (p) = argmax0≤xi ≤1 ⟨vi − p, xi ⟩ s.t. ⟨p, xi ⟩ ≤ Bi .
This follows immediately by simply using the allocation x and prices p from the SPPE as a market equilib-
rium. Proposition 1 tells us that xi ∈ Di (p), and the market clears by definition of SPPE. This means that
SPPE has a number of nice properties such as no envy and Pareto optimality (although Pareto optimality
requires considering the seller as an agent too).
Finally we turn to the question of computing an SPPE. Unfortunately the news there is bad. It was
shown recently that computing an SPPE is a PPAD-complete problem. This means that there exists a
polynomial-time reduction between the problem of computing a Nash equilibrium in a general-sum game
and that of computing an SPPE, and thus the two problems are equally hard, from the perspective of
computing a solution in polynomial time. Moreover, it was also shown that we cannot hope for iterative
methods to efficiently compute an approximate SPPE. Beyond merely computing any SPPE, we could also
try to find one that maximizes revenue or social welfare. This problem turns out to be NP-complete.
There is a mixed-integer program for computing SPPE, but unfortunately it is not very scalable.

Figure 10.2: Multiplicity of SPPE. On the left is shown a problem instance, and on the right are shown two
possible second-price pacing equilibria.

10.4 First-Price Auction Markets


Next we consider what happens if we instead sell each item by first-price auction as part of an auction
market.
First we start by defining what we call budget-feasible pacing multipliers. Intuitively, this is simply a set of pacing multipliers such that everything is allocated according to the first-price auction rule, and everybody is within budget.
Definition 4. A set of budget-feasible pacing multipliers (BFPM) is a vector of pacing multipliers α ∈ [0, 1]^n and a fractional allocation xij such that:

• Prices are defined to be pj = max_k αk vkj.

• For all j, Σ_i xij = 1, and if xij > 0 then i is tied for the highest bid on item j.

• For all i, Σ_j pj xij ≤ Bi.

Again, the goal will be to find a pacing equilibrium. This is simply a BFPM that satisfies the complementarity condition on the budget constraint and pacing multiplier.

Definition 5. A first-price pacing equilibrium (FPPE) is a BFPM (α, x) such that for every buyer i: if Σ_j pj xij < Bi then αi = 1.

Notably, the only difference to SPPE is the pricing condition, which now uses first price.
A very nice property of the first-price setting is that BFPMs satisfy a monotonicity condition: if (α′ , x′ )
and (α′′ , x′′ ) are both BFPM, then the pacing vector α = max(α′ , α′′ ) (where the max is taken component-
wise) is also a BFPM. The associated allocation is that for each item j, we first identify whether the highest
bid comes from α′ or α′′ , and use the corresponding allocation of j (breaking ties towards α′ ).
Intuitively, the reason that (α, x) is also BFPM is that for every buyer i, their bids are the same as in
one of the two previous BFPMs (say (α′ , x′ ) WLOG.), and so the prices they pay are the same as in (α′ , x′ ).
Furthermore, since every other buyer is bidding at least as much as in (α′ , x′ ), they win weakly less of each
item (using the tie-breaking scheme described above). Since (α′ , x′ ) satisfied budgets, (α, x) must also satisfy
budgets. The remaining conditions are easily checked.
In addition to componentwise maximality, there is also a maximal BFPM (α, x) (there could be multiple x compatible with α) such that α ≥ α′ for all α′ that are part of any BFPM. To see this, consider αi∗ = sup{αi | α is part of a BFPM}. For any ϵ > 0 and any i, there must exist a BFPM whose i-th multiplier is greater than αi∗ − ϵ. For a fixed ϵ, we can take the componentwise maximum of these n BFPMs to conclude that there exists a BFPM (αϵ, xϵ) with αϵ,i > αi∗ − ϵ for all i. This yields a sequence {(αϵ, xϵ)} as ϵ → 0. Since the space of both α and x is compact, the sequence has a limit point (α∗, x∗). By continuity (α∗, x∗) is a BFPM.
We can use this maximality to show existence and uniqueness (of multipliers) of FPPE:

Theorem 22. An FPPE always exists and the set of pacing multipliers {α} that are part of an FPPE is a
singleton.

Proof. Here we give a high-level proof, a more explicit proof can be found in the paper listed in the notes.
Consider the component-wise maximal α and an associated allocation x such that they form a BFPM.
Since α, x is a BFPM, we only need to check that it has no unnecessarily paced bidders. Suppose some
buyer i is spending strictly less than Bi and αi < 1. If i is not tied for any items, then we can increase
αi for some sufficiently small ϵ and retain budget feasibility, contradicting the maximality of α. If i is tied
for some item, consider the set N (i) of all bidders tied with i. Now take the transitive closure of this set
by repeatedly adding any bidder that is tied with any bidder in N (i). We can now redistribute all the tied
items among bidders in N (i) such that no bidder in N (i) is budget constrained (this can be done by slightly
increasing i’s share of every item they are tied on, then slightly increasing the share of every other buyer
in N (i) who is now below budget, and so on). But now there must exist some small enough δ > 0 such
that we can increase the pacing multiplier of every bidder in N (i) by δ while retaining budget feasibility and
creating no new ties. This contradicts α being maximal. We get that there can be no unnecessarily paced
bidders under α
Finally, to show uniqueness, consider any alternative BFPM α′ , x′ . Consider the set I of buyers such that
αi < α; Since α ≥ α′ and α ̸= α′ this set must have size at least one. Since all buyers in I were spending

less than their budget under α, and their collective spending strictly decreased, at least one buyer in I must
not be spending their whole budget. But αi′ < αi ≤ 1 for all i ∈ I, so that buyer must be unnecessarily
paced.

10.4.1 Sensitivity
FPPE enjoys several nice monotonicity and sensitivity properties that SPPE does not. Several of these
follow from the maximality property of FPPE: the unique FPPE multipliers α are such that α ≥ α′ for any
other BFPM (α′ , x′ ).
The following are all guaranteed to weakly increase revenue of the FPPE:

1. Adding a bidder i: the old FPPE (α, x) is still BFPM by setting αi = 0, xi = 0. By α monotonicity
prices increase weakly.

2. Adding an item: The new FPPE α′ satisfies α′ ≤ α (for contradiction, consider the set of bidders
whose multipliers increased, since they win weakly more and prices went up, somebody must break
their budget). Now consider the bidders such that αi′ < αi . Those bidders spend their whole budget
by the FPPE “no unnecessary pacing” condition. For bidders such that αi′ = αi , they pay the same as
before, and win weakly more.

3. Increasing a bidder i’s budget: the old FPPE (α, x) is still BFPM, so this follows by α maximality.

It is also possible to show that revenue enjoys a Lipschitz property: increasing a single buyer’s budget
by ∆ increases revenue by at most ∆. Similarly, social welfare can be bounded in terms of ∆, though
multiplicatively, and it does not satisfy monotonicity.

10.4.2 Convex Program


Next we consider how to compute an FPPE. This turns out to be easier than for SPPE. This is due to a
direct relationship between FPPE and market equilibrium: FPPE solutions are exactly the set of solutions
to the quasi-linear variant of the Eisenberg-Gale convex program for computing a market equilibrium:
The primal convex program is

    max_{x≥0, δ≥0, u}  Σ_i [ Bi log(ui) − δi ]
    s.t.  ui ≤ Σ_j xij vij + δi,  ∀i   (10.1)
          Σ_i xij ≤ 1,  ∀j   (10.2)

and the dual convex program is

    min_{p≥0, β≥0}  Σ_j pj − Σ_i Bi log(βi)   (10.3)
    s.t.  pj ≥ vij βi,  ∀i, j
          βi ≤ 1,  ∀i.

The variables xij denote the amount of item j that bidder i wins. The leftover budget is denoted by δi; it arises from the dual program: it is the primal variable for the dual constraint βi ≤ 1, which constrains bidder i to paying at most a price-per-utility rate of 1.
The dual variables βi, pj correspond to constraints (10.1) and (10.2), respectively. They can be interpreted as follows: βi = min_{j : xij > 0} pj/vij is the inverse bang-per-buck for buyer i, and pj is the price of good j.
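As an illustration, here is a minimal sketch of solving this program with cvxpy on a made-up instance; reading the prices and pacing multipliers off the dual variables anticipates Theorem 24 below:

```python
import cvxpy as cp
import numpy as np

# Made-up instance: 2 buyers, 2 items.
v = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # valuations v[i, j]
B = np.array([1.0, 1.0])     # budgets
n, m = v.shape

x = cp.Variable((n, m), nonneg=True)   # allocation x[i, j]
delta = cp.Variable(n, nonneg=True)    # leftover budget
u = cp.Variable(n)                     # utility levels

cons = [u <= cp.sum(cp.multiply(v, x), axis=1) + delta,  # constraint (10.1)
        cp.sum(x, axis=0) <= 1]                          # constraint (10.2)
obj = cp.Maximize(cp.sum(cp.multiply(B, cp.log(u)) - delta))
cp.Problem(obj, cons).solve()

beta = cons[0].dual_value   # duals of (10.1): inverse bang-per-buck
p = cons[1].dual_value      # duals of (10.2): item prices
alpha = beta                # FPPE pacing multipliers, per Theorem 24
```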
We may use the following basic fact from convex optimization to conclude that strong duality holds and
get optimality conditions:
Theorem 23. Consider a convex program

    min_x f(x)
    s.t.  gi(x) ≤ 0,  ∀i   (10.4)
          x ≥ 0

and its dual

    max_{λ≥0} q(λ),  where q(λ) := min_{x≥0} L(x, λ) and L(x, λ) := f(x) + Σ_i λi gi(x),   (10.5)
with Lagrange multipliers λi for each constraint i. Assume that the following Slater constraint qualification
is satisfied: there exists some x ≥ 0 such that gi (x) < 0 for all i. If (10.4) has a finite optimal value f ∗ then
(10.5) has a finite optimal value q ∗ and f ∗ = q ∗ . Furthermore a solution pair x∗ , λ∗ is optimal if and only
if the following Karush-Kuhn-Tucker (KKT) conditions hold:
• (primal feasibility) x∗ is a feasible solution of (10.4)
• (dual feasibility) λ∗ ≥ 0

• (complementary slackness) λ∗i gi (x∗ ) = 0 for all i


• (stationarity) x∗ ∈ argminx≥0 L(x, λ∗ )
We can use the strong duality theorem above, and in particular the KKT conditions, to show that FPPE
and EG are equivalent.
Informally, the correspondence between FPPE and solutions to the convex program follows because
βi specifies a single price-per-utility rate per bidder which exactly yields the pacing multiplier αi = βi .
Complementary slackness then guarantees that if pj > vij βi then xij = 0, so any item allocated to i
has exactly rate βi . Similarly, complementary slackness on βi ≤ 1 and the associated primal variable δi
guarantees that bidder i is only paced if they spend their whole budget.
Theorem 24. An optimal solution to the quasi-linear Eisenberg-Gale convex program corresponds to an
FPPE with pacing multiplier αi = βi and allocation xij , and vice versa.
Proof. Clearly the quasi-linear Eisenberg-Gale convex program satisfies the Slater constraint qualification: we may use the proportional allocation where every buyer gets 1/n of every item to see this. Thus the optimal solution must satisfy the following KKT conditions:
1. ui = Bi/βi ⇔ βi = Bi/ui
2. βi ≤ 1
3. βi ≤ pj/vij
4. xij, δi, βi, pj ≥ 0
5. pj > 0 ⇒ Σ_i xij = 1
6. δi > 0 ⇒ βi = 1
7. xij > 0 ⇒ βi = pj/vij

It is easy to see that xij is a valid allocation: the primal program has the exact packing constraints.
Budgets are also satisfied (here we may assume ui > 0, since otherwise the bidder wins no items and their budget is trivially satisfied): by KKT conditions 1 and 7 we have, for any item j that bidder i is allocated part of,

    Bi/ui = pj/vij  ⇒  (Bi vij xij)/ui = pj xij.

If δi = 0 then summing over all j gives

    Σ_j pj xij = Bi (Σ_j vij xij)/ui = Bi.

This part of the budget argument is exactly the same as for the standard Eisenberg-Gale proof [74]. Note that (10.1) always holds with equality, since the objective is strictly increasing in ui. Thus δi = 0 denotes full budget expenditure. If δi > 0 then (10.1) implies that ui > Σ_j vij xij, which gives

    Σ_j pj xij = Bi (Σ_j vij xij)/ui < Bi.

This shows that δi > 0 denotes some leftover budget.


If bidder i is winning some of item j (xij > 0) then KKT condition 7 implies that the price on item j is
αi vij , so bidder i is paying their bid as is necessary in a first-price auction. Bidder i is also guaranteed to
be among the highest bids for item j: KKT conditions 7 and 3 guarantee αi vij = pj ≥ αi′ vi′ j for all i′ .
Finally each bidder either spends their entire budget or is unpaced: KKT condition 6 says that if δi > 0
(that is, some budget is leftover) then βi = αi = 1, so the bidder is unpaced.
Now we show that any FPPE satisfies the KKT conditions for EG. We set βi = αi and use the allocation x from the FPPE. We set δi = 0 if αi < 1, and otherwise we set δi = Bi − Σ_j xij vij. We set ui equal to the utility of each bidder. KKT condition 1 is satisfied since each bidder either gets a utility rate of 1 if they are unpaced, and so ui = Bi, or their utility rate is αi, so they spend their entire budget for utility Bi/αi. KKT condition 2 is satisfied since αi ∈ [0, 1]. KKT condition 3 is satisfied since each item that bidder i wins has price-per-utility pj/vij = αi = βi, and every other item has a higher price-per-utility. KKT conditions 4 and 5 are trivially satisfied by the definition of FPPE. KKT condition 6 is satisfied by our solution construction. KKT condition 7 is satisfied because a bidder i being allocated any amount of item j means that they have a winning bid, and their bid is equal to vij αi.
It follows that an FPPE can be computed in polynomial time, and that we can apply various first-order
methods to compute large-scale FPPE.

10.5 In what sense are we in equilibrium?


We introduced pacing equilibria using terminology similar to how we previously discussed game-theoretic equilibria such as Nash equilibria. Yet, it is useful to take a moment to consider what equilibrium properties we actually get under pacing equilibria. When we defined pacing equilibria, we asked for a certain complementarity condition on the pacing multipliers, the “no unnecessary pacing” condition. This condition is not a game-theoretic equilibrium condition, but rather a condition on the budget-management algorithms that the buyers are using. In particular, it is a condition that an online learning algorithm on the Lagrange multiplier of the budget constraint would try to maintain. Now, assuming that you are in a static environment where at each time step the m items from the pacing model show up, a pacing equilibrium would be stable, in the sense that if everyone bids according to the computed multipliers, and tied goods are split according to the fractional amounts from the equilibrium, then “no unnecessary pacing” is satisfied, so the budget-management algorithms won't change their pacing multipliers. In this sense we are in equilibrium.
However, from the perspective of the buyers, they may or may not be best responding to each other. In the context of second-price pacing equilibrium, it is possible to show that a pacing equilibrium is a pure Nash equilibrium of a game where each buyer is choosing their pacing multiplier, and observing their quasi-linear utility (with −∞ utility for breaking the budget). Moreover, in the second-price setting, if we fix the bids of every other buyer, then a pacing multiplier αi that satisfies no unnecessary pacing is actually a best response over the set of all possible ways to bid in each individual auction. In the case of first-price pacing equilibrium, we do not have this property. In FPPE, a buyer might wish to shade their own bid. In that case, FPPE should be thought of only as a budget-management equilibrium among the algorithmic proxy bidders that control budget expenditure. Secondly, due to this shading, the values vij that we took as input to the FPPE problem should probably be thought of as the bids of the buyers, which would generally be lower than their true values.

10.6 Conclusion
There are interesting differences in the properties satisfied by SPPE and FPPE. We summarize them quickly
here (these are all covered in the literature noted in the Historical Notes):

• FPPE is unique (this can be shown from the convex program, or directly from the monotonicity
property of BFPM), SPPE is not

• FPPE can be computed in polynomial time, computing an SPPE is a PPAD-complete problem

• FPPE is less sensitive to perturbation (e.g. revenue increases smoothly as budgets are increased)

• SPPE corresponds to a pure-strategy Nash equilibrium, and thus buyers are best responding to each
other

• Both correspond to different market equilibria (but SPPE requires buyer demands to be “supply
aware”)

• Neither of them is strategyproof

• Due to the market equilibrium connection, both can be shown to be strategyproof in an appropriate “large
market” sense

FPPE and SPPE have also been studied experimentally, both via random instances, as well as instances
generated from real ad auction data. The most interesting takeaways from those experiments are:

• In practice SPPE multiplicity seems to be very rare

• Manipulation is hard in both SPPE and FPPE if you can only lie about your value-per-click

• FPPE dominates SPPE on revenue

• Social welfare can be higher in either FPPE or SPPE. Experimentally it seems to largely be a toss-up
on which solution concept has higher social welfare.

10.7 Historical Notes


The multiplicative pacing equilibrium results shown in this lecture note were developed by Conitzer et al.
[36] for SP auction markets, and Conitzer et al. [37] for FP auction markets. Another strand of literature
has studied models where items arrive stochastically and valuations are then drawn independently. Balseiro
et al. [3] show existence of pacing equilibrium for multiplicative pacing as well as several other pacing rules
for such a setting; they also give a very interesting comparison of revenue and social welfare properties of
the various pacing options in the unique symmetric equilibrium of their setting. Most notably, multiplicative pacing achieves strong social welfare properties, while probabilistic pacing achieves higher revenue. Balseiro, Besbes, and Weintraub [5] show that when bidders get to select their bids individually, multiplicative pacing equilibrium arises naturally via Lagrangian duality on the budget constraint, under a fluid-based mean-field market model. The PPAD-completeness of computing an SPPE was given by Chen, Kroer, and Kumar [29].
The quasi-linear variant of Eisenberg-Gale was given by Chen, Ye, and Zhang [27] and independently
by Cole et al. [34] (an unpublished note from one of the authors in Cole et al. [34] was in existence around
a decade before the publication of Cole et al. [34]). Theorem 23 is a specialization to the FPPE setting.
In reality much stronger statements can be made: For a more general statement of the strong duality
theorem and KKT conditions used here, see Bertsekas, Nedic, and Ozdaglar [11] Proposition 6.4.4. The
KKT conditions can be significantly generalized beyond convex programming.
The fixed-point theorem that is invoked to guarantee existence of a pure-strategy Nash equilibrium in
the smoothed game is by Debreu [39], Glicksberg [54], and Fan [45].
Chapter 11

Online Budget Management

11.1 Introduction
In the last lecture note we studied auctions with budgets and repeated auctions. However, we ignored one
important aspect: time. In this lecture note we consider an auction market setting where a buyer is trying
to adaptively pace their bids over time. The goal is to hit the “right” pacing multiplier as before, but each
bidder has to learn that multiplier as the market plays out. We’ll see how we can approach this problem
using ideas from regret minimization.

11.2 Dynamic Auctions Markets


In this setting we have n buyers who repeatedly participate in second-price auctions. At each time period
t = 1, . . . , T a single second-price auction is run. At time t, each bidder samples a valuation vit independently
from a cumulative distribution function Fi which is assumed to be absolutely continuous and with bounded
density fi whose support is [0, v̄i ]. As usual, we assume that each buyer has some budget Bi that they should
satisfy, and we denote by ρi = Bi /T the per-period target expenditure; we assume ρi ≤ v̄i . We may think
of each buyer as being characterized by a type θi = (Fi , ρi ).
At each time period t buyer i observes their valuation vit and then submits a bid bit . We will use
dit = max_{k≠i} bkt to denote the highest bid other than that of i. As before, the utility of a buyer is quasi-linear, and thus if they win auction t they get utility vit − dit. We may write the utility using an indicator variable as uit = 1{dit ≤ bit}(vit − dit), and the expenditure as zit = 1{dit ≤ bit}dit.
It is assumed that each buyer has no information on the valuation distributions, including their own.
Instead, they just know their own target expenditure rate ρi and the total number of time periods T . Buyers
also do not know how many other buyers are in the market.
At time t, buyer i knows the history (viτ, biτ, ziτ, uiτ)_{τ=1}^{t−1} of own values, bids, payments, and utilities. Furthermore, they know their current value vit. Based on this history, they choose a bid bit. We will say that a bidding strategy for buyer i is a sequence of mappings β = β1, . . . where βt maps the current history to a bid (potentially in randomized fashion). The strategy β is budget feasible if the bids bβit generated by β are such that

    Σ_{t=1}^T 1{dit ≤ bβit} dit ≤ Bi

under any vector of highest competitor bids di .


For a given realization of values vi = (vi1, . . . , viT) and highest competitor bids di we denote the expected value of a strategy β as

    πiβ(vi, di) = E[ Σ_{t=1}^T 1{dit ≤ bβit}(vit − dit) ],

where the expectation is taken with respect to randomness in β.


We would like to compare our outcome to the hindsight optimal strategy. We denote the expected value of that strategy as

    πiH(vi, di) := max_{xi ∈ {0,1}^T} Σ_{t=1}^T xit(vit − dit)
                   s.t. Σ_{t=1}^T xit dit ≤ Bi   (11.1)

The hindsight-optimal strategy has a simple structure: we simply choose the optimal subset of items to win
while satisfying our budget constraint. In the case where the budget constraint is binding, this is a knapsack
problem.
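As a quick illustration, here is a minimal sketch of computing the LP relaxation of (11.1) with scipy; the function name and the pre-filtering of unprofitable auctions are ours:

```python
import numpy as np
from scipy.optimize import linprog

def hindsight_value(v, d, B):
    """LP relaxation of the hindsight problem (11.1): a fractional knapsack
    over the profitable auctions."""
    profit = v - d
    keep = profit > 0                  # auctions with v_it <= d_it are never worth winning
    if not keep.any():
        return 0.0
    res = linprog(-profit[keep],       # linprog minimizes, so negate the objective
                  A_ub=d[keep][None, :], b_ub=[B],
                  bounds=(0, 1), method="highs")
    return -res.fun
```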
Ideally we would like to choose a strategy such that πiβ approaches πiH. However, this turns out not to be possible. We will use the idea of asymptotic γ-competitiveness to see this. Formally, β is asymptotically γ-competitive if

    limsup_{T→∞, Bi=ρiT}  sup_{vi ∈ [0,v̄i]^T, di ∈ R+^T}  (1/T) [ πiH(vi, di) − γ πiβ(vi, di) ] ≤ 0.

Intuitively, the condition says that asymptotically, β should achieve at least 1/γ of the hindsight-optimal
expected value.
For any γ < v̄i /ρi , asymptotic γ-competitiveness turns out to be impossible to achieve. Thus, if our target
expenditure ρi is much smaller than our maximum possible valuation, we cannot expect to do anywhere near
as well as the hindsight-optimal strategy.
The general proof is quite involved, but the high-level idea is not too complicated. Here we show the
construction for v̄i = 1, ρi = 1/2, and thus the claim is that γ < v̄i /ρi = 2 is unachievable. The impossibility
is via a worst-case instance. In this instance, the highest other bid comes from one of the two following
sequences:

d1 = (dhigh , . . . , dhigh , v̄i , . . . . . . , v̄i )


d2 = (dhigh , . . . , dhigh , dlow , . . . , dlow ) ,

for v̄i ≥ dhigh > dlow > 0. The general idea behind this construction is that in the sequence d1 , buyer i
must buy many of the expensive items in order to maximize their utility, since they receive zero utility for
winning items with price v̄i . However, in the sequence d2 , buyer i must save money so that they can buy
the cheaper items priced at dlow .
For the case we consider here, there are T /2 of each type of highest other bid (assume T is even for
convenience). Now, we may set dhigh = 2ρi − ϵ and dlow = 2ρi − kϵ, where ϵ and k are constants that can
be tuned. For sufficiently small ϵ, i can only afford to buy T /2 items total, no matter the combination of
items. Furthermore, buying an item at price dlow yields k times as much utility as buying an item at dhigh .
Now, in order to achieve at least half of the optimal utility under d1 , buyer i must purchase at least T /4
of the items priced at dhigh . Since they don’t know whether d1 or d2 occurred until after deciding whether
to buy at least T /4 of the dhigh items, this must also occur under d2 . But then buyer i can at most afford
to buy T /4 of the items priced at dlow when they find themselves in the d2 case. Now for any γ < 2, we can
pick k and ϵ such that achieving γπiH requires buying at least T /4 + 1 of the dlow items.
It follows that we cannot hope to design an online algorithm that competes with γπiH for γ < v̄i /ρi .
However, it turns out that a subgradient descent algorithm can achieve exactly γ = v̄i/ρi.

11.3 Adaptive Pacing Strategy


The idea is to construct a pacing multiplier αi = 1/(1 + µ) by running a subgradient descent scheme on the value of µ that allows i to smoothly spend their budget across the T time periods.
The algorithm takes as input a stepsize ϵi > 0 and some initial value µ1 ∈ [0, µ̄i] (where µ̄i is some upper bound on how large µ needs to be). We use P[0,µ̄i] to denote projection onto the interval [0, µ̄i]. The algorithm, which we call APS, proceeds as follows:

• Initialize the remaining budget at B̃i1 = Bi.

• For every time period t = 1, . . . , T :

  1. Observe vit and submit the paced bid bit = min(vit/(1 + µt), B̃it).

  2. Observe the expenditure zit and update the pacing multiplier: µt+1 = P[0,µ̄i](µt − ϵi(ρi − zit)).

  3. Update the remaining budget: B̃i,t+1 = B̃it − zit.
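The following is a minimal sketch of APS in this notation; the function name is ours, and the simulation assumes we can observe the highest other bid dit in every round (the multiplier update itself only needs the realized spend zit):

```python
import numpy as np

def aps(values, highest_other_bids, B, mu_bar, eps, mu0=0.0):
    """Run APS for a single buyer against a stream of second-price auctions.
    values and highest_other_bids play the roles of v_it and d_it."""
    T = len(values)
    rho = B / T                                  # per-period target expenditure
    mu, budget_left, utility = mu0, B, 0.0
    for v, d in zip(values, highest_other_bids):
        bid = min(v / (1.0 + mu), budget_left)   # paced bid, capped by remaining budget
        z = d if bid >= d else 0.0               # second-price payment if we win
        if bid >= d:
            utility += v - d
        budget_left -= z
        # projected subgradient step on the budget constraint's Lagrange multiplier
        mu = min(max(mu - eps * (rho - z), 0.0), mu_bar)
    return utility

rng = np.random.default_rng(0)
T = 10_000
total = aps(rng.uniform(0, 1, T), rng.uniform(0, 1, T),
            B=T / 4, mu_bar=10.0, eps=T ** -0.5)
```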


This algorithm is motivated by Lagrangian duality. Consider the following Lagrangian relaxation of the hindsight-optimal optimization problem (11.1):

    max_{x ∈ {0,1}^T} Σ_{t=1}^T [ xit(vit − (1 + µ)dit) + µρi ].

The optimal solution for the relaxed problem is easy to characterize: we set xit = 1 for all t such that vit ≥ (1 + µ)dit. Importantly, this is exactly what the bid bit = vit/(1 + µ) that we use in APS achieves, since bit ≥ dit if and only if vit ≥ (1 + µ)dit.
The Lagrangian dual is the minimization problem

    inf_{µ≥0} Σ_{t=1}^T [ (vit − (1 + µ)dit)^+ + µρi ],   (11.2)

where (·)^+ denotes thresholding at 0. This dual problem upper bounds πiH (but we do not necessarily have strong duality, since we did not even start out with a convex primal program). The minimizer of the dual problem yields the strongest possible upper bound on πiH; however, solving this requires us to know the entire sequences vi, di. APS approximates this optimal µ by taking a subgradient step on the t'th term of the dual:

    ∂µ [ (vit − (1 + µ)dit)^+ + µρi ] ∋ ρi − dit 1{bit ≥ dit} = ρi − zit.

Thus APS is taking subgradient steps based on the subdifferential of the t'th term of the Lagrangian dual of the hindsight-optimal optimization problem.
The APS algorithm achieves exactly the lower bound we derived earlier, and is thus asymptotically
optimal:
Theorem 25. APS with stepsize ϵi = O(T^{−1/2}) is v̄i/ρi-asymptotically competitive, and converges at a rate of O(T^{−1/2}).
This result holds under adversarial conditions: for example, the sequence of highest other bids may be chosen as d1 or d2 in the lower bound. However, in practice we do not necessarily expect the world to be quite this adversarial. In a large-scale ad market, we would typically expect the sequences vi, di to be more stochastic in nature. In a fully stochastic setting with independence, APS turns out to achieve πiH asymptotically:

Theorem 26. Suppose (vit, dit) are sampled independently from stationary, absolutely continuous CDFs with differentiable and bounded densities. Then the expected payoff from APS with stepsize ϵi = O(T^{−1/2}) approaches πiH asymptotically at a rate of T^{−1/2}.
Theorem 26 shows that if the environment is well-behaved then we can expect much better performance
from APS.

11.4 Historical Notes


The material presented here was developed by Balseiro and Gur [4]. Beyond auction markets, the idea of
using paced bids based on the Lagrange multiplier µ has been studied in the revenue management literature,
see e.g. Talluri and Van Ryzin [81], where it is shown that this scheme is asymptotically optimal as T tends
to infinity. There is also recent work on the adaptive bidding problem using multi-armed bandits [50].
Chapter 12

Demographic Fairness

12.1 Introduction
This chapter studies the issue of demographic fairness. This is a separate topic from the types of fairness
we have studied so far, which was largely focused on individual fairness notions such as envy-freeness and
proportionality. Moreover, in the context of ad auctions, those fairness guarantees are with respect to
advertisers, since they are the buyers/agents in the market equilibrium model of the ad auction markets.
Demographic fairness, on the other hand, is a fairness notion with respect to the users who are being shown
the ads. In the context of the Fisher market models we have studied so far, this means that demographic
fairness will be a property measured on the item side, since items correspond to ad slots for particular users.
Secondly, some demographic fairness notions will be with respect to groups of users, rather than individual
users. A serious concern with internet advertising auctions and recommender systems is that the increased
ability to target users based on features could lead to harmful effects on subsets of the population, such
as gender or race-based biases in the types of ads or content being shown. We will start by looking at a
few real-world examples where notions of demographic fairness were observed to be violated. We will then
describe some potential ideas for implementing fairness in the context of Fisher markets and first-price ad
auctions. It is important to emphasize, however, that this is an evolving area: it is not clear that there is a simple answer to the question of how to guarantee certain types of demographic fairness, and there are tradeoffs both between the various fairness notions and between fairness and other objectives such as revenue or welfare.

12.1.1 Age Discrimination in Job Ads


ProPublica reported in 2017 that many companies were using age as part of their targeting criteria for job
ads they were placing on Facebook [2]. This included Amazon, Verizon, UPS and Facebook itself. Quoting
from the article:

Verizon placed an ad on Facebook to recruit applicants for a unit focused on financial planning and
analysis. The ad showed a smiling, millennial-aged woman seated at a computer and promised
that new hires could look forward to a rewarding career in which they would be “more than just
a number.”
Some relevant numbers were not immediately evident. The promotion was set to run on the
Facebook feeds of users 25 to 36 years old who lived in the nation’s capital, or had recently
visited there, and had demonstrated an interest in finance.

Whether age-based targeting of job ads is illegal was not completely clear, as of 2017 when this article was
written. The federal Age Discrimination in Employment Act of 1967 prohibits bias against people aged 40
or older both in hiring and employment. Whether the company placing the ad, as well as Facebook, could
be held liable for age discrimination was not clear, since the law was written before the internet age, and it
was not clear whether the law applied to targeted ads.


12.1.2 Targeting Housing Ads along Racial Boundaries


ProPublica also reported in 2016 on the fact that advertisers had the ability to run ads that exclude certain
“ethnic affinities” such as “hispanic affinity” or “african-american affinity” on Facebook [1]. Since Facebook
does not ask users about race, these affinity categories are stand-in estimates based on user interests and
behavior. On the benign side, these features can be used to test for example how an ad in Spanish versus
English will perform in a hispanic population. More generally, it can be used as a tool for advertisers to
understand how their products are received by different groups.
However, ProPublica reported that they were able to create a (fake) ad for an event related to first-time
home buying, where they could use these categories to exclude various ethnic groups from seeing the ad.
When it comes to topics such as housing, the Fair Housing Act from 1968 made it illegal

”to make, print, or publish, or cause to be made, printed, or published any notice, statement,
or advertisement, with respect to the sale or rental of a dwelling that indicates any preference,
limitation, or discrimination based on race, color, religion, sex, handicap, familial status, or
national origin.”

In other contexts, such as traditional newspapers, advertisements are reviewed before being accepted to
be shown, in order to ensure that they do not violate these laws. However, in the context of online advertising,
the process is much more automated and algorithmic, and the targeting criteria are powerful enough that
one has to think carefully about what fairness means and how it can be implemented algorithmically.
For the remainder of the lecture, we will operate under the assumption that we wish to ensure various
demographic properties of how ads are shown, for ads that are viewed as “sensitive”. Beyond employment
and housing, another category of ads that are viewed as sensitive are credit opportunities. Again, existing
laws that were created prior to the internet disallow discrimination based on demographic properties in
lending.

12.2 Disallowing Targeting


If we wish to prohibit the potential discrimination described above, we could introduce a category of “sensitive
ads,” where we do not allow age, gender, or racial features to be used as a feature. One might naively think
that this would work, but unfortunately there are many ways to perform indirect targeting of these categories.
For example, zip code can often be a strong proxy for race, and thus care is needed in order to ensure that
we do not allow proxy-based targeting of these sensitive features.
Facebook took such an approach in 2019 [79], based on a settlement with various civil rights organizations.
In that approach, they disallow targeting on age, gender, zip code, and “cultural affinities” for what they
categorize as sensitive ads. That categorization includes housing, employment, and credit opportunities.
While this approach ensures that a certain type of discrimination cannot occur, it does not necessarily rule out other forms of biases in how ads are served.

12.3 Demographic Fairness Measures


We will next study explicit quantifiable measures of fairness. These can potentially be used to audit whether
a given ad or system contains biases, or as guiding measures for how to adaptively change the allocation
system in order to ensure unbiasedness.
To make things concrete, suppose we have m users, and a single sensitive ad i. We will assume that
each user j is associated with a non-sensitive feature vector wj , and each user also belongs to one of two
demographic groups, A or B, which is considered a sensitive attribute; let gj denote this group. We let GA
and GB be the set of all indices denoting users in group A or group B, respectively. As usual, we will use
xij ∈ [0, 1] to denote the probability that the ad i is shown to user j.

Statistical Parity This notion of demographic fairness asks that ad i is shown at an equal rate across the
two groups, in the following sense:

    (1/|GA|) Σ_{j∈GA} xij = (1/|GB|) Σ_{j∈GB} xij.

This guarantees that, in aggregate, the groups are being shown the ad at an equal rate.
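For auditing purposes, the parity gap of a given allocation row can be computed directly; a small sketch (names ours):

```python
import numpy as np

def statistical_parity_gap(x_i, group_a, group_b):
    """Average exposure rate of ad i in group A minus that in group B;
    statistical parity requires this gap to be (approximately) zero."""
    return np.mean(x_i[group_a]) - np.mean(x_i[group_b])

gap = statistical_parity_gap(np.array([0.9, 0.1, 0.5, 0.5]), [0, 1], [2, 3])
```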
Next, let’s see an example of how statistical parity could be broken even though targeting by demographic
features is disallowed. Suppose that a sensitive ad (say a job ad) wishes to target users in either demographic,
and has a value of $1 per click, with a click-through rate that depends only on wj and not gj . Secondly,
there’s another ad which is not sensitive, which has a value per click of $2, and click-through rates of 0.1
and 0.6 for groups A and B respectively. Now, the sensitive ad will never be able to win any slots for group
B since even with a CTR of 1, their bid will be lower than 0.6 · 2 = 1.2. As a result, the sensitive ad will
be shown only to group A. A concrete example of how this competition-driven form of bias might occur is
when the non-sensitive ad is some form of female-focused product such as clothing or make-up.
A potential criticism of this fairness measure is that it does not require the ad to be shown to equally
interested users in both groups. Thus, one could for example worry that the ad might end up buying highly
relevant slots among one group, and cheap irrelevant slots in the other group in order to satisfy the constraint.

Similar Treatment Similar treatment (ST) asks for an individual-level fairness guarantee: if two users j and k have the same non-sensitive feature vector wj = wk, then they should be treated similarly regardless of the value of gj and gk. A simple version of this principle for ad auctions could be that we require xij = xik
whenever wj = wk . However, if the feature space is large, some features are continuous, or we just want
this to hold even when users are similar in terms of wj and wk , then we need a slightly more complicated
constraint. Suppose we have a measure d(wj , wk ) that measures similarity between feature vectors. Then,
ST can be defined as
|xij − xik | ≤ d(wj , wk ).

With this definition, we are asking for more than just equality when wj = wk ; instead we also ask that the
difference between xij and xik should decrease smoothly as the non-sensitive feature vectors get closer to
each other, as measured by d.

12.4 Implementing Fairness Measures on a Per-Ad Basis


In this section we highlight some difficulties in applying these fairness notions straightforwardly in ad auction
markets. We will focus on statistical parity; similar treatment seems even more difficult to implement.
If we consider the hindsight optimization problem faced by an individual ad, we could add a constraint
that the ad’s allocation satisfies statistical parity.
    max_{xi ∈ [0,1]^m} Σ_{j=1}^m (vij − pij) xij   (12.1)
    s.t. Σ_{j=1}^m pij xij ≤ Bi   (12.2)
         (1/|GA|) Σ_{j∈GA} xij = (1/|GB|) Σ_{j∈GB} xij   (12.3)

However, this constraint is not easy to implement as part of an online allocation procedure, for two reasons. The first is that equality constraints such as this one are harder to handle as part of an online learning procedure than the simpler “packing constraint” needed for the budgets (a less-than-or-equal constraint with only positive coefficients). The second reason is that we do not know the normalizing factors 1/|GA| and 1/|GB| until the end.

12.5 Fairness Constraints in FPPE via Taxes and Subsidies


Now we study a potential way that we could implement demographic fairness in the context of Fisher markets
and first-price ad auctions. Specifically, we will see that the Eisenberg-Gale convex program lets us derive a
tax/subsidy scheme for demographic fairness. The high-level idea is that we can consider a more constrained
variant of EG for FPPE, where we insist that the computed allocation satisfies our fairness constraints,
and then we can use KKT conditions to derive appropriate taxes and subsidies from the resulting Lagrange
multipliers on the fairness constraints. To be concrete, suppose that for a group of buyers I ⊂ [n], perhaps
representing a particular group of sensitive ads such as job ads, we wish to enforce statistical parity across
this group in an FPPE setting. Then, we can consider the following constrained version of the EG program:
    max_{x≥0, δ≥0, u}  Σ_i [ Bi log(ui) − δi ]
    s.t.  ui ≤ Σ_j xij vij + δi,  ∀i   (12.4)
          Σ_i xij ≤ 1,  ∀j   (12.5)
          Σ_{i∈I} Σ_{j∈GA} xij = Σ_{i∈I} Σ_{j∈GB} xij   (12.6)

Now, our EG program maximizes the quasilinear EG objective, but over a smaller set of feasible allocations:
those that satisfy the statistical parity constraint across buyers in I.
The key to analyzing this new quasilinear EG variant is to use the Lagrange multipliers on Eq. (12.6). Let x be the optimal allocation, let p be the prices derived from the Lagrange multipliers on the supply constraints Eq. (12.5), and let λ be the Lagrange multiplier on Eq. (12.6). We will show that (x, p, λ) is a form of market equilibrium, where we charge each buyer i ∈ I a price of pj + λ for j ∈ GA and a price of pj − λ for j ∈ GB. Buyers i ∉ I are simply charged the price vector p. Clearly, this is not our usual notion of market equilibrium: we are charging two different sets of prices: prices for buyers in I and prices for buyers not in I.
First, consider some non-sensitive buyer i ∈ / I. For such a buyer, we can show that xi ∈ Di (p) using the
exact same argument as in the case of the standard quasilinear EG program in Theorem 24. Similarly, we
can show that each item is fully allocated if pj > 0 using the same arguments as before. It is also direct
from feasibility that the statistical parity constraint is satisfied.
Given the above, we only need to see what happens for buyers i ∈ I. Ignoring feasibility conditions which
are straightforward, the KKT conditions pertaining to buyer i are as follows:

1. ui = Bi/βi ⇔ βi = Bi/ui
2. βi ≤ 1
3. βi ≤ (pj ± λ)/vij
4. δi > 0 ⇒ βi = 1
5. xij > 0 ⇒ βi = (pj ± λ)/vij

Here, the ± should be interpreted as + for j ∈ GA and − for j ∈ GB. Now it is straightforward from KKT conditions 3 and 5 that buyer i buys only items with optimal price-per-utility under the prices pj ± λ. From here, the same argument as in Theorem 24 can be performed in order to show that buyer i spends their whole budget, which shows that they receive a bundle xi ∈ Di(p ± λ).
It follows from the above that (x, p, λ) is a market equilibrium (with different prices for I and [n] \ I), and thus we can use the Lagrange multiplier λ as a tax/subsidy scheme in order to enforce statistical parity.
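As a sketch of how one might compute this tax/subsidy level in practice, the following solves the constrained EG program with cvxpy on a made-up instance and reads λ off the dual of the parity constraint (sign conventions for equality duals are solver-dependent):

```python
import cvxpy as cp
import numpy as np

# Made-up instance: buyers 0 and 1 form the sensitive set I; users 0 and 1
# are in group A, users 2 and 3 in group B.
v = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 1.0, 2.0],
              [0.5, 0.5, 3.0, 3.0]])
B = np.array([1.0, 1.0, 1.0])
I, GA, GB = [0, 1], [0, 1], [2, 3]
n, m = v.shape

x = cp.Variable((n, m), nonneg=True)
delta = cp.Variable(n, nonneg=True)
u = cp.Variable(n)

parity = (sum(x[i, j] for i in I for j in GA)
          == sum(x[i, j] for i in I for j in GB))        # constraint (12.6)
cons = [u <= cp.sum(cp.multiply(v, x), axis=1) + delta,  # constraint (12.4)
        cp.sum(x, axis=0) <= 1,                          # constraint (12.5)
        parity]
cp.Problem(cp.Maximize(cp.sum(cp.multiply(B, cp.log(u)) - delta)), cons).solve()

p = cons[1].dual_value    # item prices
lam = parity.dual_value   # tax/subsidy level lambda
```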

12.6 Historical Notes


The field of “algorithmic fairness” pioneered a lot of the fairness considerations that we considered in this
note, in the context of machine learning. Dwork et al. [40] introduced similar treatment in the context of
machine learning classification, and the notion that we use here for ad auction allocation is an adaptation of
their definitions. They also study statistical parity in the classification context. A book-level treatment of
fairness in machine learning is given by Barocas et al. [7]. Many of these fairness notions were also previously
known in the education testing and psychometrics literature. See the biographical notes in Barocas et al. [7]
for an overview of these older works. The quasilinear Fisher market model with statistical parity constraints
via taxes and subsidies was studied in Peysakhovich et al. [76], which also studies several other fairness
questions in the context of Fisher markets. A related work is Jalota et al. [58]. This work does not study
fairness directly, but shows how per-buyer linear constraints can be implemented in a similar way to what
we describe in Section 12.5.
Bibliography

[1] Julia Angwin and Terry Parris Jr. Facebook lets advertisers exclude users by race. ProPublica, 2016. URL https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race.

[2] Julia Angwin, Noam Scheiber, and Ariana Tobin. Dozens of companies are using Facebook to exclude older workers from job ads. ProPublica, 2017. URL https://www.propublica.org/article/facebook-ads-age-discrimination-targeting.

[3] Santiago Balseiro, Anthony Kim, Mohammad Mahdian, and Vahab Mirrokni. Budget management
strategies in repeated auctions. In Proceedings of the 26th International Conference on World Wide
Web, pages 15–23, 2017.

[4] Santiago R Balseiro and Yonatan Gur. Learning in repeated auctions with budgets: Regret minimization
and equilibrium. Management Science, 65(9):3952–3968, 2019.

[5] Santiago R Balseiro, Omar Besbes, and Gabriel Y Weintraub. Repeated auctions with budgets in ad
exchanges: Approximations and design. Management Science, 61(4):864–884, 2015.

[6] Siddharth Barman, Sanath Kumar Krishnamurthy, and Rohit Vaish. Finding fair and efficient alloca-
tions. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 557–574,
2018.

[7] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org,
2019. URL http://www.fairmlbook.org.

[8] Amir Beck. First-order methods in optimization, volume 25. SIAM, 2017.

[9] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex
optimization. Operations Research Letters, 31(3):167–175, 2003.

[10] Xiaohui Bei, Jugal Garg, and Martin Hoefer. Ascending-price algorithms for unknown markets. ACM
Transactions on Algorithms (TALG), 15(3):1–33, 2019.

[11] Dimitri P Bertsekas, A Nedic, and A Ozdaglar. Convex analysis and optimization. 2003. Athena
Scientific, 2003.

[12] Dimitris Bertsimas and John N Tsitsiklis. Introduction to linear optimization, volume 6. Athena scientific
Belmont, MA, 1997.

[13] Benjamin Birnbaum, Nikhil R Devanur, and Lin Xiao. Distributed algorithms via gradient descent for
fisher markets. In Proceedings of the 12th ACM conference on Electronic commerce, pages 127–136.
ACM, 2011.

[14] Christian Borgs, Jennifer Chayes, Nicole Immorlica, Kamal Jain, Omid Etesami, and Mohammad Mah-
dian. Dynamics of bid optimization in online advertisement auctions. In Proceedings of the 16th inter-
national conference on World Wide Web, pages 531–540, 2007.


[15] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold’em poker
is solved. Science, 347(6218):145–149, 2015.

[16] Stephen P Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[17] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top
professionals. Science, 359(6374):418–424, 2018.

[18] Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret mini-
mization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1829–1836,
2019.

[19] Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–
890, 2019.

[20] Sébastien Bubeck et al. Convex optimization: Algorithms and complexity. Foundations and Trends®
in Machine Learning, 8(3-4):231–357, 2015.

[21] Eric Budish. The combinatorial assignment problem: Approximate competitive equilibrium from equal
incomes. Journal of Political Economy, 119(6):1061–1103, 2011.

[22] Eric Budish, Gérard P Cachon, Judd B Kessler, and Abraham Othman. Course match: A large-scale
implementation of approximate competitive equilibrium from equal incomes for combinatorial allocation.
Operations Research, 65(2):314–336, 2016.

[23] Neil Burch, Matej Moravcik, and Martin Schmid. Revisiting cfr+ and alternating updates. Journal of
Artificial Intelligence Research, 64:429–443, 2019.

[24] Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D Procaccia, Nisarg Shah, and Junxing
Wang. The unreasonable fairness of maximum Nash welfare. In Proceedings of the 2016 ACM Conference
on Economics and Computation, pages 305–322. ACM, 2016.

[25] Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D Procaccia, Nisarg Shah, and Junxing
Wang. The unreasonable fairness of maximum nash welfare. ACM Transactions on Economics and
Computation (TEAC), 7(3):1–32, 2019.

[26] Antonin Chambolle and Thomas Pock. On the ergodic convergence rates of a first-order primal–dual
algorithm. Mathematical Programming, 159(1-2):253–287, 2016.

[27] Lihua Chen, Yinyu Ye, and Jiawei Zhang. A note on equilibrium pricing as convex optimization. In
International Workshop on Web and Internet Economics, pages 7–16. Springer, 2007.

[28] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player nash
equilibria. Journal of the ACM (JACM), 56(3):1–57, 2009.

[29] Xi Chen, Christian Kroer, and Rachitesh Kumar. The complexity of pacing for second-price auctions.
In Proceedings of the 2021 ACM Conference on Economics and Computation, 2021.

[30] Yun Kuen Cheung, Richard Cole, and Nikhil R Devanur. Tatonnement beyond gross substitutes?
gradient descent to the rescue. Games and Economic Behavior, 2019.

[31] Mark Cieliebak, Stephan J Eidenbenz, Aris Pagourtzis, and Konrad Schlude. On the complexity of
variations of equal sum subsets. Nord. J. Comput., 14(3):151–172, 2008.

[32] Edward H Clarke. Multipart pricing of public goods. Public choice, pages 17–33, 1971.

[33] Richard Cole and Lisa Fleischer. Fast-converging tatonnement algorithms for one-time and ongoing
market problems. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages
315–324, 2008.

[34] Richard Cole, Nikhil R Devanur, Vasilis Gkatzelis, Kamal Jain, Tung Mai, Vijay V Vazirani, and Sadra
Yazdanbod. Convex program duality, fisher markets, and Nash social welfare. In 18th ACM Conference
on Economics and Computation, EC 2017. Association for Computing Machinery, Inc, 2017.

[35] Vincent Conitzer and Tuomas Sandholm. New complexity results about Nash equilibria. Games and
Economic Behavior, 63(2):621–641, 2008.

[36] Vincent Conitzer, Christian Kroer, Eric Sodomka, and Nicolás E Stier-Moses. Multiplicative pacing
equilibria in auction markets. In International Conference on Web and Internet Economics, 2018.

[37] Vincent Conitzer, Christian Kroer, Debmalya Panigrahi, Okke Schrijvers, Eric Sodomka, Nicolas E
Stier-Moses, and Chris Wilkens. Pacing equilibrium in first-price auction markets. In Proceedings of the
2019 ACM Conference on Economics and Computation. ACM, 2019.

[38] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of comput-
ing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.

[39] Gerard Debreu. A social equilibrium existence theorem. Proceedings of the National Academy of Sci-
ences, 38(10):886–893, 1952.

[40] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through
awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages
214–226. ACM, 2012.

[41] Benjamin Edelman and Michael Ostrovsky. Strategic bidder behavior in sponsored search auctions.
Decision support systems, 43(1):192–198, 2007.

[42] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized
second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):
242–259, 2007.

[43] Edmund Eisenberg. Aggregation of utility functions. Management Science, 7(4):337–350, 1961.

[44] Edmund Eisenberg and David Gale. Consensus of subjective probabilities: The pari-mutuel method.
The Annals of Mathematical Statistics, 30(1):165–168, 1959.

[45] Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of
the National Academy of Sciences of the United States of America, 38(2):121, 1952.

[46] Fei Fang, Thanh H Nguyen, Rob Pickles, Wai Y Lam, Gopalasamy R Clements, Bo An, Amandeep
Singh, Brian C Schwedock, Milin Tambe, and Andrew Lemieux. Paws—a deployed game-theoretic
application to combat poaching. AI Magazine, 38(1):23–36, 2017.

[47] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Online convex optimization for sequential
decision processes and extensive-form games. In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 33, pages 1917–1925, 2019.

[48] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Optimistic regret minimization for extensive-
form games via dilated distance-generating functions. In Advances in Neural Information Processing
Systems, pages 5222–5232, 2019.

[49] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Stochastic regret minimization in extensive-
form games. In International Conference on Machine Learning. PMLR, 2020.

[50] Arthur Flajolet and Patrick Jaillet. Real-time bidding with side information. In Proceedings of the
31st International Conference on Neural Information Processing Systems, pages 5168–5178. Curran
Associates Inc., 2017.

[51] Yuan Gao, Christian Kroer, and Donald Goldfarb. Increasing iterate averaging for solving saddle-point
problems. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.

[52] Mohammad Ghodsi, MohammadTaghi HajiAghayi, Masoud Seddighin, Saeed Seddighin, and Hadi
Yami. Fair allocation of indivisible goods: Improvements and generalizations. In Proceedings of the
2018 ACM Conference on Economics and Computation, pages 539–556, 2018.

[53] Itzhak Gilboa and Eitan Zemel. Nash and correlated equilibria: Some complexity considerations. Games
and Economic Behavior, 1(1):80–93, 1989.

[54] Irving L Glicksberg. A further generalization of the kakutani fixed point theorem, with application to
nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.

[55] Jonathan Goldman and Ariel D Procaccia. Spliddit: Unleashing fair division algorithms. ACM SIGecom
Exchanges, 13(2):41–46, 2015.

[56] Theodore Groves. Incentives in teams. Econometrica: Journal of the Econometric Society, pages 617–
631, 1973.

[57] Samid Hoda, Andrew Gilpin, Javier Pena, and Tuomas Sandholm. Smoothing techniques for computing
Nash equilibria of sequential games. Mathematics of Operations Research, 35(2):494–512, 2010.

[58] Devansh Jalota, Marco Pavone, Qi Qi, and Yinyu Ye. Fisher markets with linear constraints: Equi-
librium properties and efficient distributed algorithms. Games and Economic Behavior, 141:223–260,
2023.

[59] Tinne Hoff Kjeldsen. John von Neumann's conception of the minimax theorem: A journey through
different mathematical contexts. Archive for History of Exact Sciences, 56(1):39–68, 2001.

[60] Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Efficient computation of equilibria for
extensive two-person games. Games and Economic Behavior, 14(2):247–259, 1996.

[61] Vijay Krishna. Auction theory. Academic Press, 2009.

[62] Christian Kroer and Alexander Peysakhovich. Scalable fair division for 'at most one' preferences. arXiv
preprint arXiv:1909.10925, 2019.

[63] Christian Kroer, Gabriele Farina, and Tuomas Sandholm. Solving large sequential games with the
excessive gap technique. In Advances in Neural Information Processing Systems, pages 864–874, 2018.

[64] Christian Kroer, Alexander Peysakhovich, Eric Sodomka, and Nicolas E Stier-Moses. Computing large
market equilibria using abstractions. In Proceedings of the 2019 ACM Conference on Economics and
Computation, pages 745–746, 2019.

[65] Christian Kroer, Kevin Waugh, Fatma Kılınç-Karzan, and Tuomas Sandholm. Faster algorithms for
extensive-form game solving via improved smoothing functions. Mathematical Programming, pages 1–33,
2020.

[66] David Kurokawa, Ariel D Procaccia, and Junxing Wang. Fair enough: Guaranteeing approximate
maximin shares. Journal of the ACM (JACM), 65(2):1–27, 2018.

[67] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret
minimization in extensive games. In Advances in Neural Information Processing Systems, pages 1078–
1086, 2009.

[68] Euiwoong Lee. APX-hardness of maximizing Nash social welfare with indivisible items. Information
Processing Letters, 122:17–20, 2017.

[69] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis,
Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence
in heads-up no-limit poker. Science, 356(6337):508–513, 2017.

[70] MOSEK ApS. The MOSEK optimization software, 2010. URL http://www.mosek.com.

[71] Arkadi Nemirovsky and David Borisovich Yudin. Problem complexity and method efficiency in
optimization. Wiley, 1983.

[72] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming,
120(1):221–259, 2009.

[73] Yurii Nesterov and Vladimir Shikhman. Computation of Fisher–Gale equilibrium by auction. Journal of
the Operations Research Society of China, 6(3):349–389, 2018.

[74] Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V Vazirani. Algorithmic game theory. Cambridge
University Press, 2007.

[75] Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.

[76] Alexander Peysakhovich, Christian Kroer, and Nicolas Usunier. Implementing fairness constraints in
markets using taxes and subsidies. In Proceedings of the 2023 ACM Conference on Fairness, Account-
ability, and Transparency, pages 916–930, 2023.

[77] IV Romanovskii. Reduction of a game with full memory to a matrix game. Doklady Akademii Nauk
SSSR, 144(1):62–+, 1962.

[78] Tim Roughgarden. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016.

[79] Sheryl Sandberg. Doing more to protect against discrimination in housing, employment
and credit advertising. Facebook, 2019. URL https://about.fb.com/news/2019/03/
protecting-against-discrimination-in-ads/.

[80] Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.

[81] Kalyan Talluri and Garrett Van Ryzin. An analysis of bid-price controls for network revenue manage-
ment. Management Science, 44(11-part-1):1577–1593, 1998.

[82] Milind Tambe. Security and game theory: Algorithms, deployed systems, lessons learned. Cambridge
University Press, 2011.

[83] Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas
hold'em. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[84] Hal R Varian. Position auctions. International Journal of Industrial Organization, 25(6):1163–1178,
2007.

[85] Hal R Varian and Christopher Harris. The VCG auction in theory and practice. American Economic
Review, 104(5):442–446, 2014.

[86] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance,
16(1):8–37, 1961.

[87] John von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.

[88] John von Neumann. On the theory of games of strategy. Contributions to the Theory of Games, 4:
13–42, 1959.

[89] Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior,
14(2):220–246, 1996.

[90] Haifeng Xu. The mysteries of security games: Equilibrium computation becomes combinatorial al-
gorithm design. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages
497–514. ACM, 2016.

[91] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization
in games with incomplete information. In Advances in Neural Information Processing Systems, pages
1729–1736, 2007.
