AI, Games, and Markets
Christian Kroer1
Department of Industrial Engineering and Operations Research
Columbia University
1 Email: [email protected].
Contents

6 Extensive-Form Games
6.1 Introduction
6.2 Perfect-Information EFGs
6.3 Imperfect-Information EFGs
6.4 Sequence Form
6.5 Dilated Distance-Generating Functions
6.6 Counterfactual Regret Minimization
6.7 Numerical Comparison of CFR methods and OMD-like methods
6.8 Stochastic Gradient Estimates
6.9 Historical Notes

12 Demographic Fairness
12.1 Introduction
12.2 Disallowing Targeting
12.3 Demographic Fairness Measures
12.4 Implementing Fairness Measures on a Per-Ad Basis
12.5 Fairness Constraints in FPPE via Taxes and Subsidies
12.6 Historical Notes
Chapter 1
Introduction and Examples
1.1 Target Audience
These notes are targeted at senior undergraduates, masters, and Ph.D. students in operations research and
computer science. The background requirements are as follows:
• I will sometimes refer to basic concepts from computational complexity theory, but a background in
this is not required.
The notes do not assume any background in game theory or mechanism design. For convex optimization,
it is possible to read up on the basics as you go along. For example, the first few chapters of Boyd and
Vandenberghe [16] are a good reference. For linear optimization, I recommend Bertsimas and Tsitsiklis [12].
prices x1 and x2 respectively. Now, suppose that f1 is a function that tells us the revenue received by
retailer 1 in this setup. Since consumers will potentially compare the prices x1 and x2 , we should expect
f1 to depend on both x1 and x2 , so we let f1 (x1 , x2 ) be the revenue for retailer 1 generated under prices
x1 and x2. Now we can again try to think of the optimization problem that retailer 1 wishes to solve. First, let us assume that x2 was already chosen and retailer 1 knows its value; in that case they want to solve $\max_{x_1 \in X} f_1(x_1, x_2)$. However, we could similarly argue that retailer 2 should choose their price x2 based on
the price x1 chosen by retailer 1. Now we have a problem, because we cannot talk about optimally choosing
either of the two prices in isolation, and instead we need a way to reason about how they might be chosen in
a way that depends on each other. Game theory provides a formal way to reason about this type of situation.
For example, the famous Nash equilibrium, which we will study below, specifies that we should find a pair
x1 , x2 such that they are mutually optimal with respect to each other. Another solution concept we will see
is the Stackelberg equilibrium, where one retailer is assumed to go first, while anticipating the optimization
problem being solved by the second retailer. From now on we will refer to each individual optimizer in a
problem either as a player or an agent.
In this representation, Player 1 chooses a row to play, and Player 2 chooses a column to play. Player 1 tries
to maximize the first value at the resulting entry in the bimatrix, while Player 2 tries to maximize the second
value.
Here is an example of something that is not a Nash equilibrium: Player 1 always plays rock, and Player
2 always plays scissors. In this case, Player 2 is not playing optimally given the strategy of Player 1, since
they could improve their payoff from −1 to 1 by switching to deterministically playing paper. In fact, this
argument works for any pair of deterministic strategies, and so we see that there is no Nash equilibrium
consisting of deterministic strategies.
Instead, RPS is an example of a game where we need randomization in order to arrive at a Nash equi-
librium. The idea is that each player gets to choose a probability distribution over their actions instead
(e.g. a distribution over rows for Player 1). Now, the value that a given player receives under a pair of
mixed strategies is their expected payoff given the randomized strategies. In RPS, it’s easy to see that the
unique Nash equilibrium is for each player to play each action with probability 1/3. Given this distribution,
there is no other action that either player can switch to and improve their utility. This is what we call a
(mixed-strategy) Nash equilibrium.
The famous result of John Nash from 1951 is that every game has a Nash equilibrium. In fact, Nash's result covers a quite broad class of n-player games, as we shall see in the next lecture.
The attentive reader may have noticed that the RPS game has a further property: whenever one player
wins, the other loses. This means that each player can equivalently reason about minimizing the utility of
the other player, rather than maximizing their own utility. More generally, a bimatrix game is a zero-sum
game if it can be represented in the following form:
$$\min_{x \in \Delta^n} \max_{y \in \Delta^m} x^\top A y,$$
where $\Delta^n = \{x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1, x \geq 0\}$ is the probability simplex over $n$ actions, $\Delta^m$ is the probability simplex over $m$ actions, and $A$ contains the payoff entries to the $y$-player from the bimatrix representation. Problems of this form are also known as bilinear saddle-point problems. The key here is that we can now represent the outcome of the game as a single matrix, where the $x$-player wishes to minimize the bilinear term $x^\top A y$ and the $y$-player wishes to maximize it. Zero-sum matrix games are very special: they can be solved in polynomial time with a linear program whose size is linear in the matrix size.
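To make this concrete, here is a minimal sketch of that LP (assuming SciPy is available; any LP solver works): we minimize a scalar v that upper-bounds every column payoff achievable by the y-player.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Minimax strategy for the x-player in min_x max_y x^T A y.

    Variables are (v, x); we minimize v subject to (A^T x)_j <= v for
    every column j, sum(x) = 1, and x >= 0.
    """
    n, m = A.shape
    c = np.concatenate(([1.0], np.zeros(n)))                   # objective: minimize v
    A_ub = np.hstack([-np.ones((m, 1)), A.T])                  # (A^T x)_j - v <= 0
    b_ub = np.zeros(m)
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)  # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0, None)] * n                  # v free, x >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1:]                                 # game value, strategy

# Rock-paper-scissors: the unique equilibrium is uniform, with value 0.
A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
value, x = solve_zero_sum(A)
```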
Rock-paper-scissors is of course a rather trivial example of a game. A more exciting application of zero-sum games is computing an optimal strategy for two-player poker (AKA heads-up poker). In fact, as we will discuss later, this was the foundation for many recent "superhuman AI for poker" results [15, 69, 17, 19]. In order to model poker games we will need a more expressive game class called extensive-form games (EFGs). These games are played on trees, where players may sometimes have groups of nodes, called information sets, that they cannot distinguish among. An example is shown in Figure 1.1.
[Game tree figure; see the caption below.]
Figure 1.1: A poker game where P1 is dealt Ace or King. "r," "f," and "c" stand for raise, fold, and check respectively. Leaf values denote P1 payoffs. The shaded area denotes an information set: P2 does not know which of these nodes they are at, and must thus use the same strategy in both.
Zero-sum EFGs can also be written as bilinear saddle-point problems
$$\min_{x \in X} \max_{y \in Y} x^\top A y,$$
where X, Y are no longer probability simplexes, but more general convex polytopes that encode the sequential decision spaces of each player. This is called the sequence-form representation [89], and we will cover it later. Like matrix games, zero-sum EFGs can be solved in polynomial time with linear programming, with an LP whose size is linear in the game tree.
It turns out that in many practical scenarios, the LP for solving a zero-sum game ends up being far too
large to solve. This is especially true for EFGs, where the game tree can quickly become extremely large if the
game has almost any amount of depth. Instead, iterative methods are used in practice. What is meant by iterative methods here is the class of algorithms that build a sequence of strategies $x_0, x_1, \ldots, x_T, y_0, y_1, \ldots, y_T$ using only some form of oracle access to $Ay$ and $A^\top x$ (this is different from writing down $A$ explicitly!). Typically in such iterative methods, the averages of the sequences of strategies, $\bar{x}_T = \frac{1}{T} \sum_{t=1}^T x_t$ and $\bar{y}_T = \frac{1}{T} \sum_{t=1}^T y_t$, converge to a Nash equilibrium. The reason these methods are preferred is two-fold. First, by never writing down $A$ explicitly we save a lot of memory (now we just need enough memory to store the much smaller $x, y$ strategy vectors). Secondly, they avoid the expensive matrix inversions involved in the simplex algorithm and interior-point methods.
The algorithmic techniques we will learn for Nash equilibrium computation are largely centered around
iterative methods. First, we will do a quick introduction to online learning and online convex optimization.
We will learn about two classes of algorithms: ones that converge to an equilibrium at a rate of $O(1/\sqrt{T})$. These roughly correspond to saddle-point variants of gradient-descent-like methods. Then we will learn about methods that converge to the solution at a rate of $O(1/T)$. These roughly correspond to saddle-point variants of accelerated gradient methods. Then we will also look at the practical performance of these algorithms. Here we will see that the following quote is very much true:

In theory, theory and practice are the same. In practice, they are not.

In particular, the preferred method in practice is the CFR+ algorithm [83] and later variations [18], all of which have a theoretical convergence rate of $O(1/\sqrt{T})$. In contrast, there are methods that converge at a rate of $O(1/T)$ [57, 65, 63] in theory, but these methods are actually slower than CFR+ for most real games!
Being able to compute an approximate Nash equilibrium with iterative methods is only one part of how superhuman AIs were created for poker. In addition, abstraction and deep learning methods were used to create a game small enough to be solved with iterative methods. We will also cover how these methods are used.
Killer applications of zero-sum games include poker (as we saw), other recreational two-player games, and
generative-adversarial networks (GANs). Other applications that are, as of yet, less proven to be effective
in practice are robust sequential decision making (the adversary represents uncertainty), security scenarios
where we assume the world is adversarial, and defense applications.
Killer applications of Stackelberg games are mainly in the realm of security. They have been applied in
infrastructure security (airports, coast guard, air marshals) [82], to protect wildlife [46], and to combat fare
evasion. A nascent literature is also emerging in cybersecurity. Outside of the world of security, Stackelberg
games are also used to model things like first-mover advantage in the business world.
            Course A   Course B
Student 1      5          5
Student 2      2          8
Student 1 arrives first and signs up for course B. Then Student 2 arrives and signs up for A. The total
welfare of this assignment is 5 + 2 = 7. This does not seem to be an efficient use of resources: we can
improve our solution by swapping the courses, since Student 1 gets the same utility as before, and Student
2 improves their utility. This is what’s called a Pareto-improving allocation because each student is at least
as well off as before, and at least one student is strictly better off. One desideratum for efficiency is that no such improvement should be possible; an allocation with this property is called Pareto efficient.
Let’s look at another example. Now we have 2 students and 4 courses, where each student takes 2 courses.
Again courses have only 1 seat.
            Course A   Course B   Course C   Course D
Student 1      10         10          1          1
Student 2      10         10          1          1
Now say that Student 1 shows up first, and signs up for A and B. Then Student 2 shows up and signs up for
C and D. Call this assignment A1. Here we get that A1 is Pareto efficient, but it does not seem fair. A fairer solution seems to be that each student gets a course with value 10 and a course with value 1; let A2 be such
an assignment. One way to look at this improvement is through the notion of envy: each student should like
their own course schedule at least as well as that of any other student. Under A1 Student 2 envies Student
1 by a value of 18, whereas under A2 no student envies the other. An allocation where no student envies
another student is called envy-free. Fairness turns out to be a complicated idea, and we will see later that
there are several appealing notions that we may wish to strive for.
Instead of first-come-first-served, we can use ideas from market design to get a better mechanism. The
solution that we will learn about is based on a fake-money market: we give every student some fixed budget
of fake currency (aka funny money). Then, we treat the assignment problem as a market problem under
the assigned budgets, and ask for what is called a market equilibrium. Briefly, a market equilibrium is a set
of prices, one for each item, and an allocation of items to buyers. The allocation must be such that every
item is fully allocated (or has a price of zero), and every buyer is getting an assignment that maximizes their
utility over all the possible assignments they could afford given the prices and their budget. Given such a
market equilibrium, we then take the allocation from the equilibrium, throw away the prices (the money was
fake anyway!), and use that to perform our course allocation. This turns out to have a number of attractive
fairness and efficiency properties. Course-selection mechanisms based on this idea are deployed at several
business schools such as Columbia Business School, the Wharton School at U. of Pennsylvania, University of Toronto's Rotman School of Management, and Dartmouth's Tuck School of Business [21, 22].
Of course, if we want to implement this protocol we need to be able to compute a market equilibrium.
This turns out to be a rich research area: in the case of what is called a Fisher market, where each agent $i$ has a linear utility function $v_i \in \mathbb{R}^m_+$ over the $m$ items in the market, there is a neat convex program that results in a market equilibrium [44]:
$$\max_{x \geq 0} \sum_i B_i \log(v_i \cdot x_i) \quad \text{s.t.} \quad \sum_i x_{ij} \leq 1, \; \forall j.$$
Here xij is how much buyer i is allocated of item j. Notice that we are simply maximizing the budget-
weighted logarithmic utilities, with no prices! It turns out that the prices are the dual variables on the
supply constraints. We will see some nice applications of convex duality and Fenchel conjugates in deriving
this relationship. We will also see that this class of markets has a relationship to the types of auction systems that are used at Google and Facebook [36, 37].
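As an illustration, here is a minimal sketch of this convex program using cvxpy on a small hypothetical market; the equilibrium prices are read off as the dual variables on the supply constraints.

```python
import numpy as np
import cvxpy as cp

# A hypothetical Fisher market: 2 buyers, 3 items, linear valuations.
v = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0]])   # v[i, j] = buyer i's value for item j
B = np.array([1.0, 1.0])          # budgets of fake currency

x = cp.Variable(v.shape, nonneg=True)              # x[i, j] = allocation
supply = [cp.sum(x[:, j]) <= 1 for j in range(3)]  # each item has supply 1
utilities = cp.sum(cp.multiply(v, x), axis=1)      # u_i = v_i . x_i
problem = cp.Problem(cp.Maximize(B @ cp.log(utilities)), supply)
problem.solve()

prices = [c.dual_value for c in supply]            # equilibrium prices = duals
```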
In the case of markets such as those for course seats, the problem is computationally harder and requires
combinatorial optimization. Current methods use a mixture of MIP and local search [22].
1.4 Acknowledgments
These lecture notes owe a large debt to several other professors that have taught courses on Economics
and Computation. In particular, Tim Roughgarden’s lecture notes [78] and video lectures, John Dickerson’s
course at UMD1 , and Ariel Procaccia’s course at CMU2 provided inspiration for course topics as well as
presentation ideas.
I would also like to thank the following people for extensive feedback on the lecture notes: Ryan D'Orazio, for both finding mistakes and providing helpful suggestions on presenting Blackwell approachability; and Gabriele Farina, for discussions around several regret-minimization issues, and for helping me develop much of my thinking on regret minimization in general.
I also thank the following people who pointed out mistakes and typos in the notes: Mustafa Mert Çelikok,
Ajay Sakarwal, Eugene Vinitsky.
1 https://www.cs.umd.edu/class/spring2018/cmsc828m/
2 http://www.cs.cmu.edu/~arielpro/15896s16/index.html
Chapter 2
Nash Equilibrium Intro
In this lecture we begin our study of Nash equilibrium by giving the basic definitions, Nash’s existence result,
and briefly touch on computability issues. Then we will make a few observations specifically about zero-sum
games, which have much more structure to exploit.
In this game there is no DSE, but there are clearly two pure-strategy Nash equilibria: the professor prepares and students listen, or the professor slacks off and students sleep. But these have quite different properties.
Thus equilibrium selection is an issue for general-sum games. There are at least two reasons for this: first,
if we want to predict the behavior of players then how do we choose which equilibrium to predict? Second,
if we want to prescribe behavior for an individual player, then we cannot necessarily suggest that they play
some particular strategy from a Nash equilibrium, because if the other players do not play the same Nash
equilibrium then it may be a terrible suggestion (for example, suggesting that the professor plays “Prepare”
from the Prepare/Listen equilibrium, when the students are playing the Slack off/Sleep equilibrium would
be bad for the professor).
Moreover, pure-strategy equilibria are not even guaranteed to exist, as we saw in the previous lecture
with the rock-paper-scissors example.
To fix the existence issue we may consider allowing players to randomize over their choice of strategy
(as in rock-paper-scissors where players should randomize uniformly). Let $\sigma_i \in \Delta^{|S_i|}$ denote player $i$'s probability distribution over their strategies; this is called a mixed strategy. Let a strategy profile be denoted by $\sigma = (\sigma_1, \ldots, \sigma_n)$. By a slight abuse of notation we may rewrite a player's utility function as
$$u_i(\sigma) = \sum_{s \in S} u_i(s) \prod_{i'=1}^n \sigma_{i'}(s_{i'}).$$
A (mixed-strategy) Nash equilibrium is a strategy profile $\sigma$ such that for all players $i$ and all pure strategies $\sigma_i'$ ($\sigma_i'$ is pure if it puts probability 1 on a single strategy):
$$u_i(\sigma_i, \sigma_{-i}) \geq u_i(\sigma_i', \sigma_{-i}).$$
Theorem 2. Any game with a finite set of strategies and a finite set of players has a mixed-strategy Nash
equilibrium.
Now, since our goal is to prescribe or predict behavior, we would also like to be able to compute a Nash equilibrium. Unfortunately this turns out to be computationally difficult:
Theorem 3. The problem of computing a Nash equilibrium in general-sum finite games is PPAD-complete.
We won’t go into detail on what the complexity class PPAD is for now, but suffice it to say that it is
weaker than the class of NP-complete problems (it is not hard to come up with a MIP for computing a Nash
equilibrium, for example), but still believed to take exponential time in the worst case.
As a sidenote, one may make the following observation about why Nash equilibrium does not “fit” in the
class of NP-complete problems: typically in NP-completeness we ask questions such as “does there exist a
satisfying assignment to this Boolean formula?” But given a particular game, we already know that a Nash
equilibrium exists. Thus we cannot ask about the type of existence questions typically used in NP-complete
problems, but rather it is only the task of finding one of the solutions that is difficult. This can be a useful
notion to keep in mind when encountering other problems that have guaranteed existence. That said, once
one asks for additional properties such as “does there exist a Nash equilibrium where the sum of utilities is
at least v?” one gets an NP-complete problem.
Given a strategy profile σ, we will often be interested in measuring how “happy” the players are with the
outcome of the game under σ. Most commonly, we are interested in the social welfare of a strategy profile
(and especially for equilibria). The social welfare is the expected value of the sum of the players' utilities:
$$\sum_{i=1}^n u_i(\sigma) = \sum_{i=1}^n \sum_{s \in S} u_i(s) \prod_{i'=1}^n \sigma_{i'}(s_{i'}).$$
We already saw in the Professor’s Dilemma that there can be multiple equilibria with wildly different social
welfare: when the professor slacks off and the students sleep, the social welfare is zero; when the professor prepares and the students listen, the social welfare is $2 \cdot 10^6$.
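In a bimatrix game these sums collapse to matrix products, as in the small NumPy sketch below; the off-equilibrium payoff entries for the Professor's Dilemma here are hypothetical, chosen only to match the welfare values quoted above.

```python
import numpy as np

# A = professor's payoffs, B = students' payoffs (rows: Prepare/Slack off,
# columns: Listen/Sleep); entries other than the two equilibria are made up.
A = np.array([[1e6, -10.0], [0.0, 0.0]])
B = np.array([[1e6, 0.0], [-10.0, 0.0]])

sigma1 = np.array([1.0, 0.0])   # professor prepares
sigma2 = np.array([1.0, 0.0])   # students listen

u1 = sigma1 @ A @ sigma2        # expected utility of player 1
u2 = sigma1 @ B @ sigma2        # expected utility of player 2
welfare = u1 + u2               # social welfare: 2 * 10^6
```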
A first observation one may make is that the minimization problem faced by the x-player is a convex
optimization problem, since the max operation is convexity-preserving. This suggests that we should have
a lot of algorithmic options to use. This turns out to be true: unlike the general case, we can compute a
zero-sum equilibrium in polynomial time using linear programming (LP).
In fact, we have the following stronger statement, which is essentially equivalent to LP duality:
Theorem 4 (von Neumann's minimax theorem). Every two-player zero-sum game has a unique value $v$, called the value of the game, such that
$$\min_{x \in \Delta^n} \max_{y \in \Delta^m} x^\top A y = \max_{y \in \Delta^m} \min_{x \in \Delta^n} x^\top A y = v.$$
We will prove a more general version of this theorem when we discuss regret minimization.
Because zero-sum Nash equilibria are min-max solutions, they are the best that a player can do, given a
worst-case opponent. This guarantee is the rationale for saying that a given game has been solved if a Nash
equilibrium has been computed for the game. Some games are trivially solvable, e.g. in rock-paper-scissors
we know that the uniform distribution is the only equilibrium. However, this notion has also been applied
to quite large games. For example heads-up limit Texas hold’em, one of the smallest poker variants played
by humans (which is still a huge game). In 2015, that game was essentially solved. The idea of essentially
solving a game is as follows: we want to compute a strategy that is statistically indistinguishable from a Nash
equilibrium in a lifetime of human-speed play. The statistical notion was necessary because the solution was
computed using iterative methods that only converge to an equilibrium in the limit (but in practice get quite
close very rapidly). The same argument is also used in constructing AIs for even larger two-player zero-sum
poker games where we can only try to approximate the equilibrium.
Note that this min-max guarantee of Nash equilibria does not hold in general-sum games. In general-sum
games, we have no payoff guarantees if our opponent does not play their part of the same Nash equilibrium
that we play. Interestingly, the AI and optimization methods developed for two-player zero-sum poker turned
out to still outperform top-tier human players in 6-player no-limit Texas hold’em poker, in spite of these
equilibrium selection issues. An AI based on these methods ended up beating professional human players,
in spite of the methods having no guarantees on performance, nor even of converging to a general-sum Nash
equilibrium.
Here is another interesting property of zero-sum Nash equilibria: they are interchangeable, meaning that if you take an equilibrium (x, y) and another equilibrium (x′, y′), then (x, y′) and (x′, y) are also equilibria.
This is easy to see from the minimax formulation.
Chapter 3
Auctions and Mechanism Design Intro
3.1 Introduction
In this lecture note we will study the problem of how to aggregate a set of agent preferences into an outcome, ideally in a way that achieves some desirable properties. Desiderata we might care about include social welfare, which is just the sum of the agents' utilities derived from the outcome, or revenue in the context of auctions.
Suppose that we have a car, and we wish to give it to one of n people, with the goal of giving it
to the person that would get the most utility out of the car. One thing we could do is ask each person to
tell us how much utility they would get out of receiving the car, expressed as some positive number. This,
obviously, leads to the “who can name the highest number?” game, since no person will want to tell us how
much value they actually place on the car, but will instead try to name as large of a number as possible.
The above, rather silly, example shows that in general we need to be careful about how we design the rules that map the preferences stated by the agents into an outcome. The general field concerned with designing rules, or mechanisms, that ask agents about their preferences and use the reports to choose an outcome is called mechanism design.
3.2 Auctions
We will mostly focus on the most classical mechanism-design setting: auctions. We will start by considering
single-item auctions: there is a single good for sale, and there is a set of n buyers, with each buyer having
some value vi for the good. The goal will be to sell the item via a sealed-bid auction, which works as follows:
1. Each bidder i submits a bid bi ≥ 0, without seeing the bids of anyone else.
2. The seller decides who gets the good based on the submitted bids.
A few things in our setup may seem strange. First, most people would not think of sealed bids when
envisioning an auction. Instead, they typically envision what’s called the English auction. In the English
auction, bidders repeatedly call out increasing bids, until the bidding stops, at which point the highest bidder
wins and pays their last bid. This auction can be conceptualized as having a price that starts at zero, and
then rises continuously, with bidders dropping out as they become priced out. Once only one bidder is left,
the increasing price stops and the item is sold to the last bidder at that price. This auction format turns out
to be equivalent to the second-price sealed-bid auction which we will cover below. Another auction format
is the Dutch auction, which is less prevalent in practice. It starts the price very high such that nobody is
interested, and then continuously drops the price until some bidder says they are interested, at which point
they win the item at that price. The Dutch auction is likewise equivalent to the first-price sealed-bid auction,
which we cover below.
Secondly, it would seem natural to always give the item to the highest bidder in step 2, but this is not always done (though we will focus on that rule). Thirdly, the pricing step allows us to potentially charge bidders other than the winner. This is again done in some reasonable auction designs, though we will mostly focus on auction formats where pi = 0 if i does not win.
When thinking about how buyers are going to behave in an auction, we need to first clarify what each
buyer knows about the other bidders. Perhaps the most standard setting is one where each buyer i has some distribution Fi from which their value is drawn, independently from the distribution for every other buyer. This is known as the independent private values (IPV) model. In this model, every buyer knows the distribution of every other buyer, but they only get to observe their own value $v_i \sim F_i$ before choosing their bid $b_i$. For this model, we need a new game-theoretic equilibrium notion called a Bayes Nash equilibrium (BNE). A BNE is a set of mappings $\{\sigma_i\}_{i=1}^n$, where $\sigma_i(v_i)$ specifies the bid that buyer i submits when they have value $v_i$, such that for all values $v_i$ and alternative bids $b_i$, $\sigma_i(v_i)$ achieves at least as much utility as $b_i$ in a Bayesian sense:
$$\mathbb{E}_{v_{-i} \sim F_{-i}}[u_i(\sigma_i(v_i), \sigma_{-i}(v_{-i})) \mid v_i] \geq \mathbb{E}_{v_{-i} \sim F_{-i}}[u_i(b_i, \sigma_{-i}(v_{-i})) \mid v_i].$$
In the auction context, $u_i(b_i, \sigma_{-i}(v_{-i}))$ is the utility that buyer i derives given the allocation and payment rule.
The idea of a BNE works more generally for a game setup where ui is some arbitrary utility function.
We will now introduce some useful mechanism-design terminology. We will introduce it in this single-item
auction context, but it applies more broadly.
Efficiency. An outcome of a single-item auction is efficient if the item ends up allocated to the buyer
that values it the most. In general mechanism design problems, an efficient outcome is typically taken to
be one that maximizes the sum of the agent utilities, which is also known as maximizing the social welfare.
Alternatively, efficiency is sometimes taken to mean that we get a Pareto-optimal outcome, which is a weaker
notion of efficiency than social welfare maximization (convince yourself of this with a small example.)
Revenue. The revenue of a single-item auction is simply the sum of payments made by the bidders.
(which were popular search engines at the time). Bidding and pricing turned out to be very inefficient,
because buyers were constantly changing their bids in order to best respond to each other. Plots of the price
history show a clear “sawtooth pattern,” where a pair of bidders will take turns increasing their bid by 1
cent each, in order to beat the other bidder. Eventually, one of the bidders reaches their valuation, at which
point they drop their bid much lower in order to win something else instead. Then, the winner realizes that
they should bid much lower, in order to decrease the price they pay. At that point, the bidder that dropped
out starts bidding 1 cent more again, and the pattern repeats. This leads to huge price fluctuations, and
inefficient allocations, since about half the time the item goes to the bidder with the lower valuation.
All that said, it turns out that there does exist at least one interesting characterization of how bid-
ding should work in a single-item first-price auction (the Overture example technically consists of many
“independent” first-price auctions; though that independence does not truly hold as we shall see later).
For this characterization, we assume the following symmetric model: we have n buyers as before, and
buyer i assigns value vi ∈ [0, ω] for the good. Each vi is sampled IID from an increasing distribution function
F . F is assumed to have a continuous density f and full support. Each bidder knows their own value vi ,
but only knows that the value of each other buyer is sampled according to F .
Given a bid bi , buyer i earns utility vi − bi if they win, and utility 0 otherwise. If there are multiple bids
tied for highest then we assume that a winner is picked uniformly at random among the winning bids, and
only the winning bidder pays.
It turns out that there exists a symmetric equilibrium in this setting, where each bidder bids according to the function
$$\beta(v_i) = \mathbb{E}[Y_1 \mid Y_1 < v_i],$$
where $Y_1$ is the random variable denoting the maximum of $n-1$ independently-drawn values from F.
Theorem 5. If every bidder in a first-price auction bids according to β then the resulting strategy profile is
a Bayes-Nash equilibrium.
If z ≥ vi then this is clearly positive since G(z) ≥ G(y) for all y ∈ [vi, z]. If z ≤ vi, then G(z) ≤ G(y), and so we have a negative number minus a more negative number, which is again nonnegative.
A nice property that follows from the monotonicity of β is that the item is always allocated to the bidder
with the highest valuation, and thus the symmetric equilibrium is efficient.
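For example, when values are uniform on [0, 1], $Y_1$ is the maximum of $n-1$ uniforms and $\beta(v) = \frac{n-1}{n} v$, so every bidder shades their bid below their value. A quick Monte Carlo sketch checking this closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, v = 5, 0.8                      # n bidders; one bidder's value v

# Y1: the maximum of n-1 opposing values, drawn IID uniform on [0, 1].
Y1 = rng.uniform(size=(1_000_000, n - 1)).max(axis=1)
beta_mc = Y1[Y1 < v].mean()        # Monte Carlo estimate of E[Y1 | Y1 < v]
beta_closed = (n - 1) / n * v      # closed form for the uniform case: 0.64
```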
A mechanism takes as input a vector of reported types θ from the players, and outputs an outcome; formally, it is a function $f : \times_i \Theta_i \to O$ that specifies the outcome that results from every possible set of
reported types. In mechanism design with money, we also have a payment function g : ×i Θi → Rn that
specifies how much each agent pays under the outcome. In less formal terms, a mechanism merely specifies
what happens, given the reported types from the agents. In first and second-price auctions the outcome
function was the same (allocate to the highest bidder), but the payment function was different. We could
potentially also allow randomized mechanisms $f : \times_i \Theta_i \to \Delta(O)$ that map to a probability distribution over the outcome space.
How do we analyze what happens in a given mechanism? The ideal answer is that every agent is best
off reporting their true type, no matter what everybody else does, i.e. the mechanism should be DSIC.
Formally, that would mean that for every agent i, type θi ∈ Θi , any type vector θ−i of the remaining agents,
and misreported type θi′ ∈ Θi :
$$\mathbb{E}[v_i(f(\theta_i, \theta_{-i}))] \geq \mathbb{E}[v_i(f(\theta_i', \theta_{-i}))],$$
where the expectation is over the potential randomness of the mechanism. If there is also a payment
function g and agents have quasi-linear utilities then the inequality is
$$\mathbb{E}[v_i(f(\theta_i, \theta_{-i})) - g(\theta_i, \theta_{-i})] \geq \mathbb{E}[v_i(f(\theta_i', \theta_{-i})) - g(\theta_i', \theta_{-i})].$$
A less satisfying answer is that there exists a Bayes-Nash equilibrium of the game induced by the mechanism, in which every agent reports their true type. Formally, that would mean that for every agent i, type $\theta_i \in \Theta_i$, and misreported type $\theta_i' \in \Theta_i$:
$$\mathbb{E}_{\theta_{-i}}[v_i(f(\theta_i, \theta_{-i}))] \geq \mathbb{E}_{\theta_{-i}}[v_i(f(\theta_i', \theta_{-i}))],$$
where the expectation is over the types $\theta_{-i}$ of the other agents, and the potential randomness of the mechanism. This constraint just says that reporting their true type should maximize their expected utility, given that everybody else is truthfully reporting. This can likewise be generalized for a payment function g.
In the setting where we can charge money, the Vickrey-Clarke-Groves (VCG) mechanism is DSIC and
maximizes social welfare. In VCG, after receiving the type vector θ, we pick the outcome o that maximizes the reported welfare. The key to then making this choice incentive compatible is that we charge each agent their externality:
$$\max_{o' \in O} \sum_{i' \neq i} v_{i'}(o' \mid \theta_{i'}) - \sum_{i' \neq i} v_{i'}(o \mid \theta_{i'}).$$
The externality measures how much better off all the other agents would have been if i were not there. When
we add together the value received by player i minus their payment, we get that their utility function is:
$$\sum_{i'} v_{i'}(o \mid \theta_{i'}) - \max_{o' \in O} \sum_{i' \neq i} v_{i'}(o' \mid \theta_{i'}).$$
Intuitively, we see that i cannot affect the negative term here, and the positive term is exactly the social
welfare. Thus we get that each agent i is incentivized to maximize social welfare, which is achieved by
reporting their true type θi .
1 see https://www.blog.google/products/admanager/update-first-price-auctions-google-ad-manager/
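In the single-item auction the externality formula reduces to the second-price rule: without the winner, the others would obtain the highest competing value. A minimal sketch, assuming truthful reports:

```python
import numpy as np

def vcg_single_item(values):
    """VCG for one item: allocate to the highest value, charge the externality.

    Without bidder i the others would get the best remaining value; with i
    winning they get nothing from this item, so i pays the second-highest
    value -- exactly the second-price auction.
    """
    values = np.asarray(values, dtype=float)
    winner = int(values.argmax())
    others = np.delete(values, winner)
    payment = others.max() if others.size else 0.0
    return winner, payment

winner, payment = vcg_single_item([3.0, 7.0, 5.0])   # winner = 1, pays 5.0
```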
Chapter 4
Regret Minimization and Sion's Minimax Theorem
So far we have mostly discussed the existence of game-theoretic equilibria such as Nash equilibrium. Now we
will get started on how to compute Nash equilibria, specifically in two-player zero-sum games. The fastest
methods for computing large-scale zero-sum Nash equilibrium are based on what’s called regret minimization.
Regret minimization is a form of single-agent decision making, where the decision maker repeatedly chooses a decision from a set of possible choices, and each time they make a decision, they are then given some loss
vector specifying how much loss they incurred through their decision. It may seem counterintuitive that we
move on to a single-agent problem after discussing game-theoretic problems with two or more players, but
we shall see that regret minimization can be used to learn how to play a game. We will also use it to prove
a fairly general version of von Neumann’s minimax theorem, a variant that is known as Sion’s minimax
theorem.
4.1.1 Setting
Formally, we are faced with the following problem: at each time step t = 1, . . . , T :
1. We choose a probability distribution $x_t \in \Delta^n$ over our n possible actions.
2. Afterwards, a loss vector $g_t \in [0, 1]^n$ is revealed to us, and we pay the loss $\langle g_t, x_t \rangle$.
Our goal is to develop an algorithm that recommends good decisions. A natural goal would be to do as well
as the best sequence of actions in hindsight. But this turns out to be too ambitious, as the following example
shows:
Example 1. We have 2 actions a1, a2. At timestep t, if our algorithm puts probability greater than 1/2 on action a1, then we set the loss to (1, 0), and vice versa we set it to (0, 1) if we put less than 1/2 on a1. Now we face a loss of at least T/2, whereas the best sequence in hindsight has a loss of 0.
Instead, our goal will be to minimize regret. The regret at time t is how much worse our sequence of
actions is, compared to the best single action in hindsight:
$$R_t = \sum_{\tau=1}^t \langle g_\tau, x_\tau \rangle - \min_{x \in \Delta^n} \sum_{\tau=1}^t \langle g_\tau, x \rangle.$$
We say that an algorithm is a no-regret algorithm if for every ϵ > 0, there exists a sufficiently-large time horizon T such that $R_T / T \leq \epsilon$.
Let’s see an example showing that randomization is necessary. Consider the following natural algorithm:
at time t, choose the action that minimizes the loss seen so far, where ei is the vector of all zeroes except
index i is 1:
$$x_{t+1} = \operatorname*{argmin}_{x \in \{e_1, \ldots, e_n\}} \sum_{\tau=1}^t \langle g_\tau, x \rangle. \tag{FTL}$$
This algorithm is called follow the leader (FTL). Note that it always chooses a deterministic action. The
following example shows that FTL, as well as any other deterministic algorithm, cannot be a no-regret
algorithm:
Example 2. At time t, say that we recommend action i. Since the adversary gets to choose the loss vector after our recommendation, let them choose the loss vector such that $g_i = 1$ and $g_j = 0$ for all $j \neq i$. Then our deterministic algorithm has loss T at time T, whereas the cost of the best action in hindsight is at most T/n.
It is also possible to derive a lower bound showing that any algorithm must have regret at least $O(\sqrt{T})$ in the worst case; see e.g. Example 17.5 of [78].
• Initialize weights $w_i^1 = 1$ for each action i.
• At time t, choose actions according to the probability distribution $p_i = \frac{w_i^t}{\sum_j w_j^t}$.
• After observing $g_t$, update each weight multiplicatively: $w_i^{t+1} = w_i^t e^{-\eta g_{t,i}}$.
The stepsize η controls how aggressively we respond to new information. If $g_{t,i}$ is large then we decrease the weight $w_i$ more aggressively.
Hedge with stepsize η guarantees the regret bound
$$R_T \leq \frac{\eta T}{2} + \frac{\log n}{\eta}.$$
Proof. Let $g_t^2$ denote the vector of squared losses. Let $Z_t = \sum_j w_j^t$ be the sum of weights at time t. We have
$$\begin{aligned}
Z_{t+1} &= \sum_{i=1}^n w_i^t e^{-\eta g_{t,i}} \\
&= Z_t \sum_{i=1}^n x_{t,i} e^{-\eta g_{t,i}} \\
&\leq Z_t \sum_{i=1}^n x_{t,i} \left(1 - \eta g_{t,i} + \frac{\eta^2}{2} g_{t,i}^2\right) \\
&= Z_t \left(1 - \eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle\right) \\
&\leq Z_t e^{-\eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle},
\end{aligned}$$
where the first inequality uses the second-order Taylor expansion $e^{-x} \leq 1 - x + \frac{x^2}{2}$ and the second inequality uses $1 + x \leq e^x$.
Telescoping and using $Z_1 = n$, we get
$$Z_{T+1} \leq n \prod_{t=1}^T e^{-\eta \langle x_t, g_t \rangle + \frac{\eta^2}{2} \langle x_t, g_t^2 \rangle} = n e^{-\eta \sum_{t=1}^T \langle x_t, g_t \rangle + \frac{\eta^2}{2} \sum_{t=1}^T \langle x_t, g_t^2 \rangle}.$$
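What does Hedge look like in code? Here is a minimal NumPy sketch of the algorithm described above; the loss stream is a hypothetical random one, standing in for whatever environment supplies $g_t$.

```python
import numpy as np

def hedge(losses, eta):
    """Run Hedge on a (T, n) array of loss vectors; return the played distributions."""
    T, n = losses.shape
    w = np.ones(n)                        # initial weights, so Z_1 = n
    plays = np.empty((T, n))
    for t in range(T):
        plays[t] = w / w.sum()            # p_i = w_i^t / sum_j w_j^t
        w = w * np.exp(-eta * losses[t])  # multiplicative update on the observed loss
    return plays

rng = np.random.default_rng(0)
T, n = 1000, 10
g = rng.uniform(size=(T, n))                  # hypothetical losses in [0, 1]^n
x = hedge(g, eta=np.sqrt(2 * np.log(n) / T))  # stepsize balancing the two regret terms
```

Optimizing the bound above over η gives exactly this stepsize, and thus regret $O(\sqrt{T \log n})$.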
We saw in the last lecture that the follow-the-leader (FTL) algorithm, which always picks the action
that minimizes the sum of losses seen so far, does not work. That same argument carries over to the OCO
setting. The basic problem with FTL is that it is too unstable: if we consider a setting with $X = [-1, 1]$, $f_1(x) = \frac{1}{2}x$, and $f_t$ alternating between $-x$ and $x$, then we get that FTL flip-flops between −1 and 1, since the endpoints become alternately optimal, and always end up being the wrong choice for the next loss.
This motivates the need for a more stable algorithm. What we will do is to smooth out the decision
made at each point in time. In order to describe how this smoothing out works we need to take a detour
into distance-generating functions.
A DGF d is strongly convex (with modulus 1) with respect to a norm ∥ · ∥ if, in second-order terms,
$$\|h\|^2 \leq \langle h, \nabla^2 d(x) h \rangle, \quad \forall x \in X, h \in \mathbb{R}^n.$$
Intuitively, strong convexity says that the gap between d and its first-order approximation should grow at a rate of at least $\|x - x'\|^2$. Graphically, we can visualize the 1-dimensional version of this as follows:
Figure 4.1: Strong convexity illustrated. The gap between the distance function and its first-order approxi-
mation should grow at least as ∥x − x′ ∥2 .
We will use this gap to construct a distance function. In particular, we say that the Bregman divergence associated with a DGF d is the function
$$D(x' \| x) := d(x') - d(x) - \langle \nabla d(x), x' - x \rangle.$$
Intuitively, we are measuring the distance going from x to x′. Note that this is not symmetric, as the distance from x′ to x may be different, and so it is not a true distance metric.
Given d and our choice of norm ∥ · ∥, the performance of our algorithms will depend on the set width of
X with respect to d:
$$\Omega_d = \max_{x, x' \in X} d(x) - d(x').$$
In particular, we will care about the largest possible loss vector g that we will see, as measured by the
dual norm ∥g∥∗ .
Norms and their dual norms satisfy a useful inequality that is often called the generalized Cauchy-Schwarz inequality:
$$\langle g, x \rangle = \|x\| \left\langle g, \frac{x}{\|x\|} \right\rangle \leq \|x\| \max_{\|x'\| \leq 1} \langle g, x' \rangle \leq \|x\| \|g\|_*.$$
What’s the point of these DGFs, norms, and dual norms? The point is that we get to choose all of
these in a way that fits the "geometry" of our set X. This will become important later when we derive convergence rates that depend on Ω and L, where L is an upper bound on the dual norm $\|g\|_*$ of all loss vectors.
Consider the following two DGFs for the probability simplex $\Delta^n = \{x : \sum_i x_i = 1, x \geq 0\}$:
$$d_1(x) = \sum_i x_i \log(x_i), \qquad d_2(x) = \frac{1}{2} \sum_i x_i^2.$$
The first is the entropy DGF, the second is the Euclidean DGF. First let us check that they are both strongly
convex on ∆n . The Euclidean DGF is clearly strongly convex wrt. the ℓ2 norm. It turns out that the entropy
DGF is strongly-convex wrt. the ℓ1 norm. Using the second-order definition of strong convexity and any
$h \in \mathbb{R}^n$:
$$\begin{aligned}
\|h\|_1^2 &= \left( \sum_i |h_i| \right)^2 \\
&= \left( \sum_i \sqrt{x_i} \cdot \frac{|h_i|}{\sqrt{x_i}} \right)^2 \\
&\leq \left( \sum_i x_i \right) \left( \sum_i \frac{|h_i|^2}{x_i} \right) \qquad \text{by the Cauchy-Schwarz inequality} \\
&= \sum_i \frac{|h_i|^2}{x_i} \qquad \text{because } x \in \Delta^n \\
&= \langle h, \nabla^2 d_1(x) h \rangle.
\end{aligned}$$
But now imagine that our losses are in $[0, 1]^n$. The maximum dual norm for the Euclidean DGF is then
$$\max_{\|x\|_2 \leq 1} \langle \vec{1}, x \rangle = \left\langle \vec{1}, \frac{\vec{1}}{\sqrt{n}} \right\rangle = \sqrt{n},$$
while $\Omega_{d_2} = 1$.
In contrast, with the ℓ1 norm the losses satisfy
$$\langle g, x \rangle \leq \|x\|_1 \|g\|_\infty \leq 1,$$
where the first step is by the generalized Cauchy-Schwarz inequality and the second step is by maximizing over x, so the maximum dual norm is 1.
We will also need the following result concerning Bregman divergences. Unfortunately it’s not clear what
intuition one can give about this, except to say that the left-hand side is analogous to a triangle inequality.
Lemma 1 (Three-point lemma). For any three points x, u, z, we have
$$D(u \| x) - D(u \| z) - D(z \| x) = \langle \nabla d(z) - \nabla d(x), u - z \rangle.$$
The proof is direct from expanding definitions and canceling terms.
Proof. The first inequality in the theorem is direct from convexity of ft . Thus we only need to prove the
second inequality.
By first-order optimality of $x_{t+1}$ we have
$$\langle \eta g_t + \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle \geq 0, \quad \forall x \in X. \tag{4.4}$$
Now pick some arbitrary $x \in X$. By rearranging terms and adding and subtracting $\langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle$ we have
$$\begin{aligned}
\langle \eta g_t, x_t - x \rangle &= \langle \nabla d(x_t) - \nabla d(x_{t+1}) - \eta g_t, x - x_{t+1} \rangle + \langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle + \langle \eta g_t, x_t - x_{t+1} \rangle \\
&\leq \langle \nabla d(x_{t+1}) - \nabla d(x_t), x - x_{t+1} \rangle + \langle \eta g_t, x_t - x_{t+1} \rangle \qquad \text{by (4.4)} \\
&= D(x \| x_t) - D(x \| x_{t+1}) - D(x_{t+1} \| x_t) + \langle \eta g_t, x_t - x_{t+1} \rangle \qquad \text{by the three-point lemma} \\
&\leq D(x \| x_t) - D(x \| x_{t+1}) - D(x_{t+1} \| x_t) + \frac{\eta^2}{2} \|g_t\|_*^2 + \frac{1}{2} \|x_t - x_{t+1}\|^2 \qquad \text{by (4.1)} \\
&\leq D(x \| x_t) - D(x \| x_{t+1}) + \frac{\eta^2}{2} \|g_t\|_*^2 \qquad \text{by strong convexity of } d,
\end{aligned}$$
which proves the theorem.
1 Our proof follows the one from the excellent lecture notes of Orabona [75]. See also Beck [8] for a proof of the offline variant of mirror descent.
The descent lemma gives us a one-step upper bound on how much better x is than xt . Based on the
descent lemma, a bound on the regret of OMD can be derived. The idea is to apply the descent lemma at each time step, and then show that when we sum across the resulting inequalities, a sequence of useful cancellations occurs.
Theorem 8. The OMD algorithm with DGF d achieves the following bound on regret:
$$R_T \leq \frac{D(x \| x_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \|g_t\|_*^2.$$
Proof. Consider any $x \in X$. We apply the inequality from Theorem 7 separately to each time step $t = 1, \ldots, T$, divide through by η, and sum from $t = 1, \ldots, T$ to get
$$\begin{aligned}
\sum_{t=1}^T \langle g_t, x_t - x \rangle &\leq \sum_{t=1}^T \left( \frac{D(x \| x_t) - D(x \| x_{t+1})}{\eta} + \frac{\eta}{2} \|g_t\|_*^2 \right) \\
&= \frac{D(x \| x_1) - D(x \| x_{T+1})}{\eta} + \sum_{t=1}^T \frac{\eta}{2} \|g_t\|_*^2 \\
&\leq \frac{D(x \| x_1)}{\eta} + \sum_{t=1}^T \frac{\eta}{2} \|g_t\|_*^2,
\end{aligned}$$
where the equality is by noting that each term $D(x \| x_t)$ appears with a positive sign in the t'th part of the sum and a negative sign in the (t−1)'th part, so the divergences telescope, and the final inequality drops $-D(x \| x_{T+1}) \leq 0$.
Suppose that each $f_t$ is Lipschitz in the sense that $\|g_t\|_* \leq L$. Using our bound Ω on DGF differences, and supposing we initialize $x_1$ at the minimizer of d, we can set $\eta = \frac{\sqrt{2\Omega}}{L\sqrt{T}}$ to get
$$R_T \leq \frac{\Omega}{\eta} + \frac{\eta T L^2}{2} \leq \sqrt{2\Omega T}\, L.$$
Note that FTRL is more directly related to FTL: it uses the FTL update, but with a single smoothing term d(x), whereas OMD re-centers a Bregman divergence D(·∥xt) at every iteration. FTRL can be analyzed
similarly to OMD. It gives the same theoretical properties for our purposes, but we’ll see some experimental
performance from both algorithms later. For a convergence proof see Orabona [75].
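As a concrete instance, here is a sketch of one OMD step on the simplex with the Euclidean DGF $d_2$, where the update reduces to a projected gradient step; the projection is the standard sorting-based Euclidean projection onto the simplex.

```python
import numpy as np

def project_simplex(z):
    """Euclidean projection of z onto the probability simplex."""
    u = np.sort(z)[::-1]                 # sort in decreasing order
    css = np.cumsum(u) - 1.0
    j = np.arange(1, len(z) + 1)
    rho = np.nonzero(u - css / j > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

def omd_euclidean_step(x, g, eta):
    """One OMD step: with d(x) = ||x||_2^2 / 2, the update is projected gradient descent."""
    return project_simplex(x - eta * g)
```

With the entropy DGF $d_1$ instead, the same OMD step works out to the multiplicative Hedge update from earlier.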
Theorem 9 (von Neumann's minimax theorem). Every two-player zero-sum game has a unique value $v$, called the value of the game, such that
$$\min_{x \in \Delta^n} \max_{y \in \Delta^m} x^\top A y = \max_{y \in \Delta^m} \min_{x \in \Delta^n} x^\top A y = v.$$
Theorem 10 (Generalized minimax theorem). Let $X \subseteq \mathbb{R}^n, Y \subseteq \mathbb{R}^m$ be compact convex sets. Let f(x, y) be continuous, convex in x for a fixed y, and concave in y for a fixed x, with some upper bound L on the partial subgradients with respect to x and y. Then there exists a value v such that
$$\min_{x \in X} \max_{y \in Y} f(x, y) = \max_{y \in Y} \min_{x \in X} f(x, y) = v.$$
Proof. We will view this as a game between a player choosing the minimizer and a player choosing the maximizer. Let $y^*$ be the y chosen when y is chosen first. When y is chosen second, the maximizer over y can, in the worst case, pick $y^*$ every time. Thus we get
$$\min_{x \in X} \max_{y \in Y} f(x, y) \geq \max_{y \in Y} \min_{x \in X} f(x, y).$$
For the other direction we will use our OCO results. We run a repeated game where the players choose
a strategy xt , yt at each iteration t. The x player chooses xt according to a no-regret algorithm (say OMD),
while $y_t$ is always chosen as $\operatorname{argmax}_{y \in Y} f(x_t, y)$. Let the average strategies be
$$\bar{x} = \frac{1}{T} \sum_{t=1}^T x_t, \qquad \bar{y} = \frac{1}{T} \sum_{t=1}^T y_t.$$
Using OMD with the Euclidean DGF (since X is compact this is well-defined), we get the following
bound:
$$R_T = \sum_{t=1}^T f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^T f(x, y_t) \leq O\left(\sqrt{\Omega T}\, L\right). \tag{4.5}$$
First, note that
$$\min_{x \in X} \max_{y \in Y} f(x, y) \leq \max_{y \in Y} f(\bar{x}, y) \leq \max_{y \in Y} \frac{1}{T} \sum_{t=1}^T f(x_t, y) \leq \frac{1}{T} \sum_{t=1}^T f(x_t, y_t),$$
where the first inequality follows because $\bar{x}$ is a valid choice in the minimization over X, the second inequality follows by convexity, and the third inequality follows because $y_t$ is chosen to maximize $f(x_t, y)$. Now we can use the regret bound (4.5) for OMD to get
$$\begin{aligned}
\min_{x \in X} \max_{y \in Y} f(x, y) &\leq \frac{1}{T} \min_{x \in X} \sum_{t=1}^T f(x, y_t) + O\left(\frac{\sqrt{\Omega} L}{\sqrt{T}}\right) \\
&\leq \min_{x \in X} f(x, \bar{y}) + O\left(\frac{\sqrt{\Omega} L}{\sqrt{T}}\right) \\
&\leq \max_{y \in Y} \min_{x \in X} f(x, y) + O\left(\frac{\sqrt{\Omega} L}{\sqrt{T}}\right).
\end{aligned}$$
Taking T → ∞ gives the reverse inequality, which together with the first direction proves the theorem.
For simplicity we assumed continuity of f . The argument did not really need continuity, though. The
same proof works for f which is lower/upper semicontinuous in x and y respectively.
4.6. HISTORICAL NOTES 29
2 A quite general version of what's usually referred to as Sion's minimax theorem can be found on Wikipedia at https://en.wikipedia.org/wiki/Sion%27s_minimax_theorem.
Chapter 5
Self-Play via Regret Minimization
5.1 Recap
We have covered a slew of no-regret algorithms: hedge, online mirror descent (OMD), regret matching (RM),
and RM+ . All of these algorithms can be used for the case of solving two-player zero-sum matrix games of
the form
$$\min_{x \in \Delta^n} \max_{y \in \Delta^m} \langle x, Ay \rangle.$$
Matrix games are a special case of the more general saddle-point problem
$$\min_{x \in X} \max_{y \in Y} f(x, y),$$
where f is convex-concave, meaning that f (·, y) is convex for all fixed y, and f (x, ·) is concave for all fixed
x, and lower/upper semicontinuous. In this chapter we will cover how to solve this more general class
of saddle-point problems by using regret minimization for each “player” and having the regret minimizers
perform what is usually called self play. The name self play comes from the fact that we usually use the same
regret-minimization algorithm for each player, and so in a sense this approach towards computing equilibria
lets the chosen regret-minimization algorithm play against itself. After covering the self play setup, we will
look at some experiments on practical performance for the matrix-game case. We will also compare to an
algorithm that has stronger theoretical guarantees.
• Initialize x1 ∈ X, y1 ∈ Y to be some pair of strategies in the relative interior (in matrix games we
usually start with the uniform strategy)
For a strategy pair x̄, ȳ, we will measure proximity to Nash equilibrium via the saddle-point residual
(SPR):
$$\xi(\bar{x}, \bar{y}) := \left[ \max_{y \in Y} f(\bar{x}, y) - f(\bar{x}, \bar{y}) \right] + \left[ f(\bar{x}, \bar{y}) - \min_{x \in X} f(x, \bar{y}) \right] = \max_{y \in Y} f(\bar{x}, y) - \min_{x \in X} f(x, \bar{y}).$$
Figure 5.1: The flow of strategies and losses in regret minimization for games.
Each bracketed term represents how much each player can improve by deviating from ȳ or x̄ respectively,
given the strategy profile (x̄, ȳ). In game-theoretic terms the brackets are how much each player improves
by best responding.
Now, suppose that the regret-minimizing algorithms guarantee regret bounds of the form
$$\begin{aligned}
\max_{y \in Y} \sum_{t=1}^T f(x_t, y) - \sum_{t=1}^T f(x_t, y_t) &\leq \epsilon_y, \\
\sum_{t=1}^T f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^T f(x, y_t) &\leq \epsilon_x.
\end{aligned} \tag{5.1}$$
Theorem 11. Suppose we run two regret minimizers via self play and they give the guarantees in (5.1). Then the average strategies $\bar{x} = \frac{1}{T} \sum_{t=1}^T x_t$, $\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t$ satisfy
$$\xi(\bar{x}, \bar{y}) \leq \frac{\epsilon_x + \epsilon_y}{T}.$$
So now we know how to compute a Nash equilibrium: simply run the above repeated game with each player using a regret-minimizing algorithm, and the uniform average of the strategies will converge to a Nash equilibrium.
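Here is a minimal self-play sketch for matrix games, using Hedge for both players (any of the regret minimizers above would do) and uniform averaging; the saddle-point residual is computed directly from its definition.

```python
import numpy as np

def self_play(A, T, eta):
    """Hedge vs. Hedge self play on min_x max_y x^T A y; returns (xbar, ybar, SPR)."""
    n, m = A.shape
    wx, wy = np.ones(n), np.ones(m)
    xbar, ybar = np.zeros(n), np.zeros(m)
    for t in range(T):
        x, y = wx / wx.sum(), wy / wy.sum()
        xbar += x / T
        ybar += y / T
        gx = A @ y                     # loss to the x-player (minimizer)
        gy = -A.T @ x                  # loss to the y-player (maximizer)
        wx *= np.exp(-eta * gx)
        wy *= np.exp(-eta * gy)
    # Saddle-point residual: max_y f(xbar, y) - min_x f(x, ybar).
    spr = (xbar @ A).max() - (A @ ybar).min()
    return xbar, ybar, spr

A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])  # RPS
xbar, ybar, spr = self_play(A, T=10_000, eta=0.05)  # eta chosen ad hoc here
```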
Figure 5.2 shows the performance of the regret-minimization algorithms taught so far in the course, when
used to compute a Nash equilibrium of a zero-sum matrix game via Theorem 11. Performance is shown on
3 randomized matrix game classes where entries in A are sampled according to: 100-by-100 uniform [0, 1],
500-by-100 standard Gaussian, and 100-by-100 standard Gaussian. All plots are averaged across 50 game
samples per setup. We show one additional algorithm for reference: the mirror prox algorithm, which is an offline optimization algorithm that converges to a Nash equilibrium at a rate of $O(1/T)$. It's an accelerated variant of mirror descent, and it similarly relies on a distance-generating function d. The plot shows mirror prox with the Euclidean distance.
As we see in Figure 5.2, mirror prox indeed performs better than all the $O(1/\sqrt{T})$ regret minimizers using the setup for Theorem 11. On the other hand, the entropy-based variant of OMD, which has a $\log n$ dependence on the dimension n, performs much worse than the algorithms with $\sqrt{n}$ dependence.
Figure 5.2: Plots showing the performance of four different regret-minimization algorithms for computing
Nash equilibrium, all using Theorem 11. Mirror prox with uniform averaging is also shown as a reference
point.
5.3 Alternation
Let’s try making a small tweak now; the idea of alternation. In alternation, the players are no longer
symmetric: one player sees the loss based on the previous strategy of the other player as before, but the
second player sees the loss associated to the current strategy.
Suppose that the regret-minimizing algorithms guarantee regret bounds of the form
$$\begin{aligned}
\max_{y \in Y} \sum_{t=1}^T f(x_{t+1}, y) - \sum_{t=1}^T f(x_{t+1}, y_t) &\leq \epsilon_y, \\
\sum_{t=1}^T f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^T f(x, y_t) &\leq \epsilon_x.
\end{aligned} \tag{5.2}$$
Theorem 12. Suppose we run two regret minimizers with alternation and they give the guarantees in (5.2). Then the average strategies $\bar{x} = \frac{1}{T} \sum_{t=1}^T x_{t+1}$, $\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t$ satisfy
$$\xi(\bar{x}, \bar{y}) \leq \frac{\epsilon_x + \epsilon_y + \sum_{t=1}^T \left( f(x_{t+1}, y_t) - f(x_t, y_t) \right)}{T}.$$
Proof. Summing the two regret bounds in (5.2), we have
$$\begin{aligned}
\epsilon_x + \epsilon_y &\geq \max_{y \in Y} \sum_{t=1}^T f(x_{t+1}, y) - \sum_{t=1}^T f(x_{t+1}, y_t) + \sum_{t=1}^T f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^T f(x, y_t) \\
&= \max_{y \in Y} \sum_{t=1}^T f(x_{t+1}, y) - \min_{x \in X} \sum_{t=1}^T f(x, y_t) - \sum_{t=1}^T \left[ f(x_{t+1}, y_t) - f(x_t, y_t) \right] \\
&\geq T \left[ \max_{y \in Y} f(\bar{x}, y) - \min_{x \in X} f(x, \bar{y}) \right] - \sum_{t=1}^T \left[ f(x_{t+1}, y_t) - f(x_t, y_t) \right].
\end{aligned}$$
Theorem 12 shows that if f (xt+1 , yt ) − f (xt , yt ) ≤ 0 for all t, then the bound for alternation is weakly
better than the bound in Theorem 11. But what does this condition mean? If we examine it from the
regret minimization perspective, it is saying that xt+1 does better than xt against yt . Intuitively, we would
expect this to hold: xt is chosen right before observing f (·, yt ), whereas xt+1 is chosen immediately after
observing f (·, yt ), and generally we would expect that any time we make a new observation, we should move
somewhat in the direction of improvement against that observation. Indeed, it turns out to be relatively
straightforward to show that this holds for all the regret minimizers we saw so far (As an exercise, show that
this holds for a few regret minimizers; it is easiest for OMD).
Figure 5.3 shows the performance of the same set of regret-minimization algorithms but now using the
setup from Theorem 12. Mirror prox is shown exactly as before.
Figure 5.3: Plots showing the performance of four different regret-minimization algorithms for computing
Nash equilibrium, all using Theorem 12. Mirror prox with uniform averaging is also shown as a reference
point.
Amazingly, Figure 5.3 shows that with alternation, OMD with the Euclidean DGF, regret matching, and
RM+ all perform about on par with mirror prox.
Figure 5.4 shows the performance of the same set of regret-minimization algorithms, now using the setup from Theorem 12 and linear averaging in all algorithms, including mirror prox. The fastest algorithm with uniform averaging, RM+ with alternation, is shown for reference.
Figure 5.4: Plots showing the performance of four different regret-minimization algorithms for computing
Nash equilibrium, all using Theorem 12. All algorithms use linear averaging. RM+ with uniform averaging
is shown as a reference point.
OMD with the Euclidean DGF and RM+ with alternation both gain another order of magnitude in performance by introducing linear averaging.
It can be shown that RM+, online mirror descent, and mirror prox all work with polynomial averaging schemes.
1 https://www2.isye.gatech.edu/~nemirovs/LMCO_LN2019NoSolutions.pdf
Chapter 6
Extensive-Form Games
6.1 Introduction
In this lecture we will cover extensive-form games (EFGs). Extensive-form games are a richer game description that explicitly models sequential interaction. EFGs are played on a game tree. Each node in the game tree belongs to some player, who gets to choose the branch to traverse.
[Game tree figure; see the caption below.]
Figure 6.1: A simple perfect-information EFG. Three versions of the game are shown, where each stage
corresponds to removing one layer of the game via backward induction.
Perfect-information EFGs are trivially solvable (at least if we are able to traverse the whole game tree
at least once). The way to solve them is via backward induction. Backward induction works by starting at
some bottom decision node of the game tree, which only leads to leaf nodes after each action is taken (such a
node always exists). Then, the optimal action for the player at the node is selected, and the node is replaced
with the corresponding leaf node. Now we get a new perfect-information EFG with one less internal node. Backward induction then repeats this process until there are no internal nodes left, at which point we have computed a Nash equilibrium. Thus perfect-information EFGs always have pure-strategy Nash equilibria.
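Here is a sketch of backward induction on a zero-sum perfect-information tree, using a hypothetical nested-dict encoding in which leaves hold player 1's payoff.

```python
def backward_induction(node):
    """Return (value, strategy) for a perfect-information zero-sum game tree.

    A leaf is a number (player 1's payoff); an internal node is a dict
    {"player": 1 or 2, "children": {action: subtree}}. Player 1 maximizes
    the value, player 2 minimizes it.
    """
    if not isinstance(node, dict):          # leaf: just return the payoff
        return node, {}
    values, strategy = {}, {}
    for action, child in node["children"].items():
        values[action], sub = backward_induction(child)
        strategy.update(sub)
    pick = max if node["player"] == 1 else min
    best = pick(values, key=values.get)     # optimal action at this node
    strategy[id(node)] = best
    return values[best], strategy

tree = {"player": 1, "children": {
    "L": {"player": 2, "children": {"l": -2.0, "r": 3.0}},
    "R": 1.0,
}}
value, strategy = backward_induction(tree)  # P2 would pick l after L, so P1 picks R; value 1.0
```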
While backward induction yields a linear-time algorithm for solving perfect-information games, many games of interest are nonetheless far too large to solve with it in practice. For example, chess and go both have enormous game trees, with estimates of $\sim 10^{45}$ and $\sim 10^{172}$ nodes respectively.
Next let us see how converting to normal form works. For each player, we create an action corresponding to every possible way of assigning an action at every decision point. So, if a player has d decision points with A actions each, then $A^d$ actions will be created in the normal-form representation of the EFG. This reduction to normal form works for both perfect- and imperfect-information games.
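A short sketch of the blowup, enumerating normal-form pure strategies as assignments of one action to each decision point (the decision-point and action names are hypothetical):

```python
from itertools import product

# Hypothetical decision points, each with its own action set.
decision_points = {"I1": ["raise", "fold"], "I2": ["call", "fold"], "I3": ["check", "bet"]}

# One normal-form action = one choice of action at every decision point.
pure_strategies = [dict(zip(decision_points, choice))
                   for choice in product(*decision_points.values())]
assert len(pure_strategies) == 2 ** 3     # A^d = 8 strategies
```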
Let’s consider an instructive example. Here we will model the Cuban Missile Crisis. The USSR has
moved a bunch of nuclear weapons to Cuba, and the US has to decide how to respond. If they do nothing,
then the USSR wins a political victory, and gets to keep nuclear missiles within firing distance of major US
cities. If the US responds, then it could result in a series of escalations that would eventually lead to nuclear
war, or the USSR will eventually compromise and remove the missiles.
[Game tree figure: the USA first chooses between "respond" and "do nothing" (the latter ending the game with payoffs 0, 2); after "respond," the USSR chooses between "nuclear war" and "compromise."]
It is straightforward to see from this representation that the Cuban Missile Crisis game has two PNE: (do nothing, nuclear war) and (respond, compromise). However, the first PNE is in a sense not compelling: what if the USA just responded? The USSR probably would not be willing to follow through on taking the action "nuclear war," since it has such low utility for them as well. This leads to the notion of subgame-perfect equilibria, which are equilibria that remain equilibria in any subgame obtained by picking some node in the tree and starting the game there.
In imperfect-information EFGs, players do not always observe the past actions of the other players, and equilibria can no longer be computed by backward induction (unlike perfect-information EFGs, where solutions are straightforwardly obtained from backward induction). An example is shown in Figure 6.3.
Figure 6.3: A (rather weird) poker game where P1 is dealt Ace or King with equal probability. "r," "f," and "c" stand for raise, fold, and check respectively. Leaf values denote P1 payoffs. The shaded area denotes an information set: P2 does not know which of these nodes they are at, and must thus use the same strategy in both. Note that in the case where they are dealt an ace, P1 does not observe the action taken by P2.
Now we would like to find a way to represent EFG zero-sum Nash equilibrium this way. This turns out to
be possible, and the key is to find the right way to represent strategies such that we get a bilinear objective.
The next section will describe this representation.
First, let us see why the most natural formulation of the strategy spaces won’t work. The natural formu-
lation would be to have a player specify a probability distribution over actions at each of their information
sets. Let σ be a strategy profile, where σa is the probability of taking action a (from now on we assume that
every action is distinct so that for any a there is only one corresponding I where the action can be played).
The expected value over leaf nodes is
$$\sum_{z \in Z} u_2(z)\, \mathbb{P}(z \mid \sigma).$$
The problem with this formulation is that if a player has more than one action on the path to any leaf, then
the probability P(z|σ) of reaching z is non-convex in that player’s own strategy, since we have to multiply
each of the probabilities belonging to that player on the path to z. Thus we cannot get the bilinear form in
(6.1).
where $n = \sum_{I \in \mathcal{I}_i} |A_I|$, and $p(I)$ is the parent sequence leading to $I$.
One way to visually think of the set of sequence-form strategies is given in Figure 6.4. [Figure 6.4: A treeplex built from simplexes $\Delta^1, \dots, \Delta^8$ connected via sequences $q_1, \dots, q_{17}$; each simplex is scaled by the sequence leading into it, and the × symbol denotes branching on what the other player or nature does.] This representation is called a treeplex. Each information set is represented as a simplex, which is scaled by the parent sequence
leading to that information set (by perfect recall there is a unique parent sequence). After taking a particular
action it is possible that a player may arrive at several next possible simplexes depending on what the other
player or nature does. This is represented by the × symbol.
It’s important to understand that the sequence form specifies probabilities on sequences of actions for a
single player. Thus they are not the same as paths in the game tree; indeed, the sequence r∗ for player 2
appears in two separate paths of the game tree, as player 2 has two nodes in the corresponding information
set.
Say we have a set of probability distributions over actions at each information set, with $\sigma_a$ denoting the probability of playing action a. We may construct a corresponding sequence-form strategy by applying the following equation in top-down fashion (so that $x_{p_j}$ is always assigned before $x_a$):
$$x_a = x_{p_j} \sigma_a, \quad \forall j \in \mathcal{J},\ a \in A_j. \tag{6.3}$$
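A minimal Python sketch of the top-down recursion (6.3); the array-based treeplex representation (a list of `(parent_seq, action_seqs)` pairs with index 0 as the empty sequence) is an illustrative assumption:

```python
import numpy as np

def behavioral_to_sequence_form(infosets, sigma, n_sequences):
    """Top-down application of (6.3): x_a = x_{p_j} * sigma_a.

    infosets: list of (parent_seq, action_seqs) pairs in top-down order,
      where parent_seq indexes the sequence-form vector and index 0 is
      the empty (root) sequence.
    sigma: array of behavioral probabilities, indexed by sequence.
    """
    x = np.zeros(n_sequences)
    x[0] = 1.0  # empty sequence
    for parent_seq, action_seqs in infosets:
        for a in action_seqs:
            x[a] = x[parent_seq] * sigma[a]
    return x
```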
The payoff matrix A associated with the sequence-form setup is a sparse matrix, with each row corre-
sponding to a sequence of the x player and each column corresponding to a sequence of the y player. Each
leaf has a cell in A at the pair of sequences that are last visited by each player before reaching that leaf,
and the value in the cell is the payoff to the maximizing player in the bilinear min-max formulation. Cells
corresponding to pairs of sequences that are never the last pair of sequences visited before a leaf have a zero.
With this setup we now have an algorithm for computing a Nash equilibrium in a zero-sum EFG: Run
online mirror descent (OMD) for each player, using either of our folk-theorem setups from the previous
chapter. However, this has one issue; recall the update for OMD (also known as a prox mapping):
$$x_{t+1} = \operatorname*{argmin}_{x \in X}\ \langle \gamma g_t, x\rangle + D(x \| x_t),$$
where $D(x \| x_t) = d(x) - d(x_t) - \langle \nabla d(x_t), x - x_t\rangle$ is the Bregman divergence from $x_t$ to $x$. In order to run
OMD, we need to be able to compute this prox mapping. The question of whether the prox mapping is easy
to compute is easily answered when X is a simplex, where updates for the entropy DGF are closed-form,
and updates for the Euclidean DGF can be computed in n log n time, where n is the number of actions. For
treeplexes this question becomes more complicated.
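For concreteness, the closed-form update for the entropy DGF on a simplex is the multiplicative-weights step; a minimal sketch:

```python
import numpy as np

def entropy_prox(x_t, g_t, gamma):
    """argmin over the simplex of <gamma * g_t, x> + KL(x || x_t):
    the closed-form prox mapping for the entropy DGF."""
    z = x_t * np.exp(-gamma * g_t)
    return z / z.sum()
```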
In principle we could use the standard Euclidean distance for d. In that case the update can be rewritten as
$$x_{t+1} = \operatorname*{argmin}_{x \in X}\ \|x - (x_t - \gamma g_t)\|_2^2,$$
which means that the update requires us to project onto a treeplex. This can be done in n · d · log n time,
where n is the number of sequences and d is the depth of the decision space of the player. While this is
acceptable, it turns out there are smarter ways to compute these updates which take linear time in n.
The idea is to use a dilated DGF, which combines local simplex DGFs $d_j$ with weights $\beta_j > 0$:
$$d(x) = \sum_{j \in \mathcal{J}_1} \beta_j\, x_{p_j}\, d_j\!\left(\frac{x^j}{x_{p_j}}\right).$$
With a dilated DGF, the prox mapping decomposes into local prox mappings:
$$\begin{aligned}
&\operatorname*{argmin}_{x \in X}\ \langle g_t, x\rangle + D(x \| x_t)\\
&= \operatorname*{argmin}_{x \in X}\ \langle g_t, x\rangle + d(x) - d(x_t) - \langle \nabla d(x_t), x - x_t\rangle\\
&= \operatorname*{argmin}_{x \in X}\ \langle g_t - \nabla d(x_t), x\rangle + d(x)\\
&= \operatorname*{argmin}_{x \in X}\ \sum_{j \in \mathcal{J}} \langle g_t^j - \nabla d(x_t)^j, x^j\rangle + \beta_j x_{p_j} d_j(x^j / x_{p_j})\\
&= \operatorname*{argmin}_{x \in X}\ \sum_{j \in \mathcal{J}} x_{p_j} \left( \langle g_t^j - \nabla d(x_t)^j, x^j / x_{p_j}\rangle + \beta_j d_j(x^j / x_{p_j}) \right)
\end{aligned}$$
Now we may consider some information set j with no descendant information sets. Since xpj is on the
outside of the parentheses, we can compute the update at j as if it were a simplex update, and the value at
the information set can be added to the coefficient on xpj . That logic can then be applied recursively. Thus
we can traverse the treeplex in bottom-up order, and at each information set we can compute the value for
$x^j_{t+1}$ in however long it takes to compute an update for a simplex with DGF $d_j$.
If we use the entropy DGF for each $j \in \mathcal{J}$ and set the weights $\beta_j = 2 + \max_{a \in A_j} \sum_{j' \in C_{j,a}} 2\beta_{j'}$, then we get a DGF for $X$ that is strongly convex with modulus $1/M$, where $M = \max_{x \in X} \|x\|_1$. If we scale this DGF by $M$ we get that it is strongly convex with modulus 1. If we instantiate the mirror prox algorithm with this DGF for $X$ and $Y$ we get an algorithm that converges at a rate of
$$O\!\left( \frac{ \max_{ij} A_{ij}\, \sqrt{\max_{I \in \mathcal{I}} \log(|A_I|)}\, \left( M_x^2\, 2^{d} + M_y^2\, 2^{d} \right) }{T} \right),$$
where Mx , My are the maximum ℓ1 norms on X and Y , and d is an upper bound on the depth of both
treeplexes. This gives the fastest theoretical rate of convergence among gradient-based methods. However,
this only works for OMD. All our other algorithms (RM, RM+ ) were for simplex domains exclusively. Next
we derive a way to use these locally at each information set. It turns out that faster practical performance
can be obtained this way.
To get the decomposition, we will define a local notion of regret which is defined with respect to behavioral strategies $\sigma \in \times_j \Delta^j =: \Sigma$ (here we just derive the decomposition for a single player, say player 1; everything is analogous for player 2).
We saw in the previous lecture note that it is always possible to go from behavioral form to sequence form using the following recurrence, where assignment is performed in top-down order:
$$x_a = x_{p_j} \sigma_a, \quad \forall j \in \mathcal{J},\ a \in A_j. \tag{6.4}$$
It is also possible to go the other direction (though this direction is not a unique mapping, as one has a choice of how to assign behavioral probabilities at information sets $j$ such that $x_{p_j} = 0$). These procedures produce payoff-equivalent strategies for EFGs.
For a behavioral strategy vector σ (or loss vector $g_t$) we say that $\sigma^j$ is the slice of σ corresponding to information set j. $\sigma^{j\downarrow}$ is the slice corresponding to j and every information set below j. Similarly, $\Sigma^{j\downarrow}$ is the set of all behavioral strategy assignments for the subset of simplexes that are in the tree of simplexes rooted at j.
We let $C_{j,a}$ be the set of next information sets belonging to player 1 that can be reached from j when taking action a. In other words, it is the set of information sets whose parent sequence is a.
Now, let the value function at time t for an information set j belonging to player 1 be defined as
$$V_t^j(\sigma) = \langle g_t^j, \sigma^j\rangle + \sum_{a \in A_j} \sum_{j' \in C_{j,a}} \sigma_a V_t^{j'}(\sigma^{j'\downarrow}),$$
where $\sigma \in \Sigma^{j\downarrow}$. Intuitively, this value function represents the value that player 1 derives from information set j, assuming that they played to reach it, i.e., if we counterfactually set $x_{p_j} = 1$.
The subtree regret at a given information set j is
$$R_T^{j\downarrow} = \sum_{t=1}^T V_t^j(\sigma_t^{j\downarrow}) - \min_{\sigma \in \Sigma^{j\downarrow}} \sum_{t=1}^T V_t^j(\sigma).$$
Note that for each j, the loss depends linearly on $\sigma^j$; $\sigma^j$ does not affect information sets below j, since we use $\sigma_t$ in the value function for child information sets j′. Concretely, the counterfactual loss at j is $\hat g^j_{t,a} = g_{t,a} + \sum_{j' \in C_{j,a}} V_t^{j'}(\sigma_t^{j'\downarrow})$ for each $a \in A_j$, so that $V_t^j(\sigma_t^{j\downarrow}) = \langle \hat g_t^j, \sigma_t^j\rangle$.
Now we show that the subtree regret decomposes in terms of local losses and subtree regrets.
Theorem 13. For any $j \in \mathcal{J}$, the subtree regret at time T satisfies
$$R_T^{j\downarrow} = \sum_{t=1}^T \langle \hat g_t^j, \sigma_t^j\rangle - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^T \langle \hat g_t^j, \sigma\rangle - \sum_{a \in A_j,\, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow} \right).$$
Note that this regret is in the behavioral form, and it corresponds exactly to the regret associated to locally minimizing $\hat g_t^j$ at each simplex j.
The CFR framework is based on the following theorem, which says that the sequence-form regret can be upper-bounded by the behavioral-form local regrets.
Theorem 14. The sequence-form regret satisfies
$$R_T = R_T^{\mathrm{root}\downarrow} \le \max_{x \in X} \sum_{j \in \mathcal{J}} x_{p_j} \hat R_T^j,$$
where $\hat R_T^j$ denotes the local regret from minimizing the counterfactual losses $\hat g_t^j$ at information set $j$.
Proof. For the equality, consider the regret $R_T$ over the sequence-form polytope $X$. Since each sequence-form strategy has a payoff-equivalent behavioral strategy in Σ and vice versa, we get that the regret $R_T$ is equal to $R_T^{\mathrm{root}\downarrow}$ for the root information set root (we may assume WLOG that there is a root information set, since if not then we can add a dummy root information set with a single action).
By Theorem 13 we have, for any $j \in \mathcal{J}$,
$$\begin{aligned}
R_T^{j\downarrow} &= \sum_{t=1}^T \langle \hat g_t^j, \sigma_t^j\rangle - \min_{\sigma \in \Delta^j} \left( \sum_{t=1}^T \langle \hat g_t^j, \sigma\rangle - \sum_{a \in A_j,\, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow} \right)\\
&\le \sum_{t=1}^T \langle \hat g_t^j, \sigma_t^j\rangle - \min_{\sigma \in \Delta^j} \sum_{t=1}^T \langle \hat g_t^j, \sigma\rangle + \max_{\sigma \in \Delta^j} \sum_{a \in A_j,\, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow}, \qquad (6.5)
\end{aligned}$$
where the inequality is by the fact that independently minimizing the terms $\sum_{t=1}^T \langle \hat g_t^j, \sigma\rangle$ and $-\sum_{a \in A_j,\, j' \in C_{j,a}} \sigma_a R_T^{j'\downarrow}$ yields a value no larger than jointly minimizing them.
Now we may apply (6.5) recursively in top-down fashion starting at root to get the theorem.
A direct corollary of Theorem 14 is that if the counterfactual regret at each information set grows sublinearly then overall regret grows sublinearly. This is the foundation of the counterfactual regret minimization (CFR) framework for minimizing regret over treeplexes. The CFR framework can succinctly be described as:
1. Instantiate a local regret minimizer (e.g. RM or RM+) for each information set $j \in \mathcal{J}$.
2. At iteration t, for each $j \in \mathcal{J}$, feed the local regret minimizer the counterfactual loss $\hat g_t^j$.
3. Generate $x_{t+1}$ as follows: ask for the next recommendation from each local regret minimizer. This yields a set of simplex strategies, one for each information set. Construct $x_{t+1}$ via (6.4).
Thus we get an algorithm for minimizing regret on treeplexes based on minimizing counterfactual regrets. In order to construct an algorithm for computing a Nash equilibrium based on a CFR setup, we may invoke the folk theorems from the previous lectures using the sequence-form strategies generated by CFR. Doing this yields an algorithm that converges to a Nash equilibrium of an EFG at a rate on the order of $O(1/\sqrt{T})$.
While CFR is technically a framework for constructing local regret minimizers, the term “CFR” is often
overloaded to mean the algorithm that comes from using the folk theorem with uniform averages, and using
regret matching as the local regret minimizer at each information set. CFR+ is the algorithm resulting from
using the alternation setup, taking linear averages of strategies, and using RM+ as the local regret minimizer
at each information set.
We now show pseudocode for implementing the CFR algorithm with the RM+ regret minimizer. In
order to compute Nash equilibria with this method one would use CFR as the regret minimizer in one of the
folk-theorem setups from the previous lecture.
NextStrategy simply implements the top-down recursion (6.4) while computing the update corre-
sponding to RM+ at each j. ObserveLoss uses bottom-up recursion to keep track of the regret-like
sequence Qa , which is based on ĝt,a in CFR.
A technical note here is that we assume that there is some dummy sequence ∅ at the root of the treeplex
with no corresponding j (this corresponds to a single-action dummy information set at the root, but leaving
out that dummy information set in the index set J ). This makes code much cleaner because there is no
need to worry about the special case where a given j has no parent sequence, at the low cost of increasing
the length of the sequence-form vectors by 1.
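A minimal Python sketch of these two routines, under an assumed array-based treeplex representation (parent sequence indices, per-information-set sequence lists, and the dummy root sequence at index 0; this interface is illustrative, not the notes' exact pseudocode):

```python
import numpy as np

class CFRPlus:
    """CFR with RM+ local regret minimizers on a treeplex (a sketch).

    parent[j] -- index of the parent sequence of information set j,
    seqs[j]   -- indices of the sequences (one per action) at j,
    with information sets ordered top-down and index 0 being the
    dummy root sequence discussed above.
    """

    def __init__(self, parent, seqs, n_sequences):
        self.parent, self.seqs, self.n = parent, seqs, n_sequences
        self.Q = [np.zeros(len(s)) for s in seqs]  # RM+ regret-like vectors

    def local_strategy(self, j):
        q = np.maximum(self.Q[j], 0.0)
        return q / q.sum() if q.sum() > 0 else np.full(len(q), 1.0 / len(q))

    def next_strategy(self):
        # Top-down recursion (6.4): x_a = x_{p_j} * sigma_a.
        x = np.zeros(self.n)
        x[0] = 1.0  # dummy root sequence
        for j in range(len(self.seqs)):
            sigma = self.local_strategy(j)
            for s, a in enumerate(self.seqs[j]):
                x[a] = x[self.parent[j]] * sigma[s]
        return x

    def observe_loss(self, g):
        # Bottom-up recursion: form counterfactual losses ghat and apply
        # the RM+ update Q <- max(Q + regret, 0) at each information set.
        ghat = np.array(g, dtype=float)
        for j in reversed(range(len(self.seqs))):
            sigma = self.local_strategy(j)
            gj = ghat[self.seqs[j]]
            v_j = sigma @ gj                     # counterfactual value at j
            self.Q[j] = np.maximum(self.Q[j] + (v_j - gj), 0.0)
            ghat[self.parent[j]] += v_j          # pass value up to parent
```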
Figure 6.5: Solution accuracy as a function of the number of tree traversals in three different variants of Leduc hold'em and a pursuit-evasion game. Results are shown for CFR with regret matching, CFR with regret matching+, CFR+, and EGT. Both axes are shown on a log scale.
Theorem 15. Assume that each player uses a bounded unbiased gradient estimator for their loss at each iteration. Then for all $p \in (0, 1)$, with probability at least $1 - 2p$,
$$\xi(\bar x, \bar y) \le \frac{\tilde R_T^1 + \tilde R_T^2}{T} + \left( 2\Delta + \tilde M_1 + \tilde M_2 \right) \sqrt{\frac{2}{T} \log \frac{1}{p}},$$
where $\tilde R_T^i$ is the regret incurred under the losses $\tilde g_t^i$ for player i, $\Delta = \max_{z, z' \in Z} u_2(z) - u_2(z')$ is the payoff range of the game, and $\tilde M_1 \ge \max_{x, x' \in X} \langle \tilde g_t, x - x'\rangle, \forall \tilde g_t$ is a bound on the "size" of the gradient estimate, with $\tilde M_2$ defined analogously.
We will not show the proof here, but it follows from introducing a suitable discrete-time stochastic process, observing that it is a martingale difference sequence, and applying the Azuma-Hoeffding concentration inequality.
Figure 6.6: Performance of CFR (MCCFR), FTRL, and OMD when using the external sampling gradient estimator. The panels plot saddle-point gap against the number of nodes touched; they include Leduc with a 13-card deck and a search game with 4 turns, each with external sampling and 50 seeds.
With Theorem 15 in hand, we just need a good way to construct gradient estimates $\tilde g_t \approx A y_t$. Generally, one can construct a wide array of gradient estimators by using the fact that $A y_t$ can be computed by traversing the EFG game tree: at each leaf node z in the tree, we add $-u_1(z) y_a$ to $g_{t,a'}$, where a is the last sequence taken by the y player, and a′ is the last sequence taken by the x player. To construct an estimator, we may choose to sample actions at some subset of nodes in the game tree, and then only traverse the sampled branches, while taking care to normalize the eventual payoff so that we maintain an unbiased estimator. One of the most successful estimators constructed this way is the external sampling estimator. In external sampling, when computing the gradient $A y_t$, we sample a single action at every node belonging to the y player or chance, while traversing all branches at nodes belonging to the x player.
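A minimal recursive sketch of the external sampling estimator; the game-tree interface (`is_leaf`, `u1`, `player`, `actions`, `child`, `prob`, `last_x_seq`) is an illustrative assumption:

```python
import random

def external_sampling_estimate(node, g):
    """Accumulate an external-sampling estimate of the x player's loss
    gradient g ~ A y_t by recursive traversal. Since actions of y and
    chance are sampled from their actual probabilities, the sampling
    probabilities cancel in expectation and no reweighting is needed."""
    if node.is_leaf:
        g[node.last_x_seq] += -node.u1      # loss for the minimizing x player
    elif node.player == 'x':
        for a in node.actions:              # traverse all x branches
            external_sampling_estimate(node.child(a), g)
    else:                                   # y player or chance: sample once
        a = random.choices(node.actions,
                           weights=[node.prob(a) for a in node.actions])[0]
        external_sampling_estimate(node.child(a), g)
```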
Figure 6.6 shows the performance when using external sampling in CFR (CFR with sampling is usually
called Monte-Carlo CFR or MCCFR), FTRL, and OMD. Performance is shown on Leduc with a 13-card
deck, Goofspiel (another card game), search, and battleship. In the deterministic case we saw that CFR+
was much faster than the theoretically-superior EGT algorithm (and OMD/FTRL would perform much
worse than EGT). Here we see that in the stochastic case it varies which algorithm is better.
Notable extensions of CFR include the stochastic method MCCFR [67], and variations on which local regret minimizer to use in order to speed up practical performance [83, 18]. The proof of CFR given here is a simplified version of the more general theorem developed in [47]. The plots on CFR vs EGT are from Kroer et al. [65].
The bound on error from using a stochastic method in Theorem 15 is from Farina et al. [49], and the
plots on stochastic methods are from that same paper. External sampling and several other EFG gradient
estimators were introduced by Lanctot et al. [67].
Chapter 7
Intro to Fair Division and Market Equilibrium
The mechanism is based on giving each agent a budget of fake currency (or funny money), computing what is called a competitive equilibrium (also known as Walrasian equilibrium or market equilibrium; we will use the latter terminology) under this new market, and using the corresponding allocation as our fair division. The fake currency is then thrown away, since it had no purpose except to define a market.
To understand this mechanism, we first introduce market equilibrium. In a market equilibrium, we wish to find a set of prices $p \in \mathbb{R}^m_+$ for each of the m items, as well as an allocation x of items to agents such that everybody is assigned an optimal allocation given the prices and their budget. Formally, the demand set of an agent i with budget $B_i$ is
$$D_i(p) = \operatorname*{argmax}_{x_i \ge 0} \left\{ u_i(x_i) : \langle p, x_i\rangle \le B_i \right\}.$$
Secondly, since a buyer's demand does not change even if we rescale their valuation by a constant, we would like the optimal solution to our convex program to also remain unchanged. Similarly, splitting the budget of a buyer into two separate buyers with the same valuation function should leave the allocation unchanged. These conditions are satisfied by the budget-weighted geometric mean of the utilities:
$$\left( \prod_i u_i(x_i)^{B_i} \right)^{1 / \sum_i B_i}.$$
Since taking roots does not affect optimality, and taking the log of the whole expression, this is equivalent to optimizing
$$\begin{aligned}
\max_{x \ge 0} \quad & \sum_i B_i \log \langle v_i, x_i\rangle & & \text{(EG)}\\
\text{s.t.} \quad & \sum_i x_{ij} \le 1, \quad \forall j = 1, \dots, m & & \text{dual variable } p_j
\end{aligned}$$
On the right are the dual variables associated to each constraint. It is easy to see that this is a convex
program. First, the feasible set is defined by linear inequalities. Second, we are taking a max of a sum of
concave functions composed with linear maps. Since taking a sum and composing with a linear map both
preserve concavity we get that the objective is concave.
The solution to the primal problem x along with the vector of dual variables p yields a market equilibrium.
Here we assume that for every item j there exists i such that vij > 0, and every buyer values at least one
item above 0.
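As a concrete illustration, here is a minimal sketch of solving EG with cvxpy on a small random instance; the instance and library choice are assumptions for illustration, not part of the notes:

```python
import cvxpy as cp
import numpy as np

n, m = 3, 4                                  # buyers, items
rng = np.random.default_rng(0)
v = rng.uniform(0.1, 1.0, (n, m))            # valuations v_ij > 0
B = np.ones(n)                               # budgets

x = cp.Variable((n, m), nonneg=True)
u = cp.sum(cp.multiply(v, x), axis=1)        # utilities <v_i, x_i>
supply = cp.sum(x, axis=0) <= 1
prob = cp.Problem(cp.Maximize(B @ cp.log(u)), [supply])
prob.solve()

p = supply.dual_value                        # equilibrium prices p_j
```

The equilibrium prices come out of the dual values of the supply constraints, exactly as indicated by the annotation in (EG).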
Theorem 16. The pair of allocations x and dual variables p from EG forms a market equilibrium.
Proof. To see this, we need to look at the KKT conditions of the primal and dual variables. Writing the Lagrangian relaxation and applying Sion's minimax theorem (the most general version of Sion's minimax theorem only requires compactness in one of the variables, which we have due to the allocation space being compact) gives
$$\min_{p \ge 0} \max_{x \ge 0}\ \sum_i B_i \log \langle v_i, x_i\rangle + \sum_j p_j \left( 1 - \sum_i x_{ij} \right) = \min_{p \ge 0} \max_{x \ge 0}\ \sum_i \left[ B_i \log \langle v_i, x_i\rangle - \langle p, x_i\rangle \right] + \sum_j p_j \qquad (7.1)$$
Taking derivatives with respect to $x_{ij}$ and using complementarity, we obtain the following optimality conditions:
1. For all j: $p_j > 0 \Rightarrow \sum_i x_{ij} = 1$
2. For all pairs i, j: $\frac{B_i}{\langle v_i, x_i\rangle} \le \frac{p_j}{v_{ij}}$
3. For all pairs i, j: $x_{ij} > 0 \Rightarrow \frac{B_i}{\langle v_i, x_i\rangle} = \frac{p_j}{v_{ij}}$
The first condition shows that every item is fully allocated: for every j there is some buyer i with non-zero value, and by the second condition $p_j \ge \frac{v_{ij} B_i}{\langle v_i, x_i\rangle} > 0$.
The second condition for market equilibrium is that every buyer is assigned a bundle from their demand set. We will use $\beta_i = \frac{B_i}{\langle v_i, x_i\rangle} = \frac{B_i}{u_i(x_i)}$ to denote the utility price that buyer i pays. First off, by the second condition we have that the utility price that buyer i gets satisfies
$$\beta_i \le \frac{p_j}{v_{ij}}.$$
By the third condition, we have that if $x_{ij} > 0$ then for all other items j′ we have
$$\frac{p_j}{v_{ij}} = \beta_i \le \frac{p_{j'}}{v_{ij'}}.$$
Thus, any item j that buyer i is assigned has at least as low a utility price as any other item j′. In other words, they only buy items that have the best bang-per-buck among all the items. Thus we get that they only purchase optimal items; it remains to show that they spend their whole budget. Multiplying the third condition by $x_{ij}$ and rearranging gives
$$\frac{B_i}{\langle v_i, x_i\rangle} x_{ij} v_{ij} = p_j x_{ij},$$
for any j such that $x_{ij} > 0$. Summing across all such j yields
$$\sum_j p_j x_{ij} = \sum_j \frac{B_i}{\langle v_i, x_i\rangle} x_{ij} v_{ij} = \frac{B_i}{\langle v_i, x_i\rangle} \langle v_i, x_i\rangle = B_i.$$
EG gives us an immediate proof of existence for the linear Fisher market setting: the feasible set is clearly
non-empty, and the max is guaranteed to be achieved.
In a previous lecture note we referenced Pareto optimality as a property of market equilibrium. It is now trivial to see that Pareto optimality holds in Fisher-market equilibrium: the equilibrium is a solution to EG, so it must be Pareto optimal; otherwise we could construct a feasible solution with a strictly better objective!
From the EG formulation we can also see that the equilibrium utilities and prices are in fact unique. First note that any market equilibrium allocation satisfies the optimality conditions of EG, and is thus an optimal solution. But if there were more than one utility vector arising in equilibrium, then by the strict concavity of the log we would get that a strict convex combination is a strictly better solution, which is a contradiction. That equilibrium prices are unique now follows from the third optimality condition, since all terms except the utilities are constants.
This is still a convex optimization problem, since composing a concave and nondecreasing function (the
log) with a concave function (ui ) yields a concave function.
Beyond linear utilities, the most famous classes of utilities that fall under this category are:
1. Cobb-Douglas utilities: $u_i(x_i) = \prod_j x_{ij}^{a_{ij}}$
2. Leontief utilities: $u_i(x_i) = \min_j \frac{x_{ij}}{a_{ij}}$
3. The family of constant elasticity of substitution (CES) utilities: $u_i(x_i) = \left( \sum_j a_{ij} x_{ij}^\rho \right)^{1/\rho}$, where $a_{ij}$ are the utility parameters of a buyer, and ρ parameterizes the family, with $-\infty < \rho \le 1$ and $\rho \ne 0$
CES utilities turn out to generalize all the other utilities we have seen so far: Leontief utilities are
obtained as ρ approaches −∞, Cobb-Douglas utilities as ρ approaches 0, and linear utilities when ρ = 1.
More generally, ρ < 0 means that items are complements, whereas ρ > 0 means that items are substitutes.
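As a quick numerical illustration of these limits (a standalone sketch; the unit-sum weights and specific values are chosen only for convenience):

```python
import numpy as np

def ces(x, a, rho):
    return (a * x**rho).sum() ** (1.0 / rho)

x, a = np.array([0.5, 2.0]), np.array([0.5, 0.5])
print(ces(x, a, 1.0))     # rho = 1: linear utility <a, x> = 1.25
print(ces(x, a, 1e-4))    # rho -> 0: Cobb-Douglas prod_j x_j^{a_j} ~= 1.0
print(ces(x, a, -60.0))   # rho -> -inf: ~= min_j x_j = 0.5 (the weights
                          # a_j wash out here, since a_j^{1/rho} -> 1)
```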
If ui is continuously differentiable then the proof that EG computes a market equilibrium in this more
general setting essentially follows that of the linear case. The only non-trivial change is that when we derive
optimality conditions by taking the derivative of the Lagrangian with respect to xi we get
1. $\frac{B_i}{u_i(x_i)} \le \frac{p_j}{\partial u_i(x_i)/\partial x_{ij}}$
2. $x_{ij} > 0 \Rightarrow \frac{B_i}{u_i(x_i)} = \frac{p_j}{\partial u_i(x_i)/\partial x_{ij}}$
In order to prove that buyers spend their budget exactly in this setting we can apply Euler's homogeneous function theorem, $u_i(x_i) = \sum_j x_{ij} \frac{\partial u_i(x_i)}{\partial x_{ij}}$, to get
$$\sum_j x_{ij} p_j = \sum_j x_{ij} \frac{\partial u_i(x_i)}{\partial x_{ij}} \frac{B_i}{u_i(x_i)} = B_i.$$
Chapter 8
Fair Allocation with Indivisible Goods
8.1 Introduction
In this lecture note we study the problem of performing fair allocation when the items are indivisible. This
setting presents a number of challenges that were not present in the divisible case.
It is obviously an important setting in practice. For example, the website http://www.spliddit.org/
allows users to fairly split estates, financial assets, toys, or other goods. Another important application is
that of fairly allocating course seats to students. This setting is even more intricate, because valuations in
that setting are combinatorial. In order to design suitable mechanisms for fairly dividing discrete goods, we
will need to reevaluate our fairness concepts.
8.2 Setup
We have a set of m indivisible goods that we wish to divide among n agents. We assume that each good has supply 1. We will denote the bundle of goods given to agent i as $x_i$, where $x_{ij}$ is the amount of good j that is allocated to agent i. The set of feasible allocations is then $\{x \mid \sum_i x_{ij} \le 1,\ x_{ij} \in \{0, 1\}\}$.
Unless otherwise specified, each agent is assumed to have a linear utility function ui (xi ) = ⟨vi , xi ⟩ denoting
how much they like the bundle xi .
The maximin share (MMS) guarantee of agent i is the largest utility they can secure by partitioning the items into n bundles and receiving the worst of those bundles (according to their own valuation). We say that an allocation x is an MMS allocation if every agent i receives utility $u_i(x_i)$ that is at least as high as their MMS guarantee. In the case of 2 agents, an MMS allocation always exists. As an exercise, you might try to come up with an algorithm for finding such an allocation.¹
1 Solution: compute one of the solutions to agent 1's MMS computation problem. Then let agent 2 choose their favorite bundle, and give the other bundle to agent 1. Agent 1 clearly receives their MMS guarantee, or better. Agent 2 also does: their MMS guarantee is at most $\frac{1}{2}\|v_2\|_1$, and here they receive utility of at least $\frac{1}{2}\|v_2\|_1$.
In the case of 3 or more agents, such a solution may not exist. The counterexample is very involved, so
we won’t cover it here.
Theorem 17. For n ≥ 3 agents, there exist additive valuations for which an MMS allocation does not exist. However, an allocation such that each agent receives at least 3/4 of their MMS guarantee always exists.
The original spliddit algorithm for dividing goods worked as follows: first, compute the largest α ∈ [0, 1] such that every agent can be guaranteed an α fraction of their MMS guarantee (this always ends up being α = 1 in practice). Then, subject to the constraints $u_i(x_i) \ge \alpha \mathrm{MMS}_i$, a social welfare-maximizing allocation was computed. However, this can lead to some weird results.
Example 3. Three agents each have valuation 1 for 5 items. In that case, the MMS guarantee is 1 for each
agent. But now the social welfare-maximizing solution can allocate three items to agent 1, and 1 item each
to agents 2 and 3. Obviously a more fair solution would be to allocate 2 items to 2 agents, 1 item to the last
agent.
One observation we can make about the 3/1/1 solution versus the 2/2/1 solution is that envy is strictly higher in the 3/1/1 solution.
With the above motivation, let us consider envy in the discrete setting. It is easy to see that we generally
won’t be able to get envy-free solutions if we are required to assign all items. Consider 2 agents splitting an
inheritance: a house worth $500k, a car worth $10k, and a jewelry set worth $5k. Since we have to give the
house to a single agent, the other agent is guaranteed to have envy. Thus we will need a relaxed notion of
envy:
Definition 1. An allocation x is envy-free up to one good (EF1) if for every pair of agents i, k such that i envies k, there exists an item j such that $x_{kj} = 1$ and $u_i(x_i) \ge u_i(x_k - e_j)$, where $e_j$ is the j'th basis vector.
Intuitively, this definition says that for any pair of agents i, k such that i envies k, that envy can be
removed by removing a single item from the bundle of k. Note that requiring EF1 would have forced us to
use the 2/2/1 allocation in Example 3.
For linear utilities, an EF1 allocation is easily found (if we disregard Pareto optimality). As an exercise, come up with an algorithm for computing an EF1 allocation for linear valuations.² In fact, EF1 allocations can be computed in polynomial time for any monotone set of utility functions (meaning that if $x_i \ge x_i'$ then $u_i(x_i) \ge u_i(x_i')$); a sketch of the round-robin approach from the footnote is shown below.
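A minimal sketch of the round-robin algorithm for additive valuations; the value-matrix representation is an illustrative assumption:

```python
import numpy as np

def round_robin(v):
    """Agents take turns picking their favorite remaining item; the
    resulting (partial) allocations are EF1 for additive valuations.
    v: (n_agents, m_items) value matrix."""
    n, m = v.shape
    remaining = set(range(m))
    bundles = [[] for _ in range(n)]
    for t in range(m):
        i = t % n                                  # whose turn it is
        j = max(remaining, key=lambda j: v[i, j])  # favorite remaining item
        bundles[i].append(j)
        remaining.remove(j)
    return bundles
```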
However, ideally we would like to come up with an algorithm that gives us EF1 as well as Pareto efficiency.
To achieve this, we will consider the product of utilities, which we saw previously in Eisenberg-Gale. This
product is also called the Nash welfare of an allocation:
$$NW(x) = \prod_i u_i(x_i)$$
The max Nash welfare (MNW) solution picks an allocation that maximizes NW(x):
$$\begin{aligned}
\max_x \quad & \prod_i u_i(x_i)\\
\text{s.t.} \quad & \sum_i x_{ij} \le 1, \quad \forall j\\
& x_{ij} \in \{0, 1\}, \quad \forall i, j
\end{aligned}$$
Note that here we have to worry about the degenerate case where N W (x) = 0 for all x, meaning that
it is impossible to give strictly positive utility to all agents. We will assume that there exists x such that
N W (x) > 0. If this does not hold, typically one seeks a solution that maximizes the number of agents with
strictly positive utility, and then the largest MNW achievable among subsets of that size is chosen.
The MNW solution turns out to achieve both Pareto optimality (obviously, since otherwise it would not
solve the MNW optimization problem), and EF1:
2 This is achieved by the round-robin algorithm: simply have agents take turns picking their favorite item. It is easy to see that EF1 is an invariant of the partial allocations resulting from this process.
Theorem 18. The MNW solution for linear utilities is Pareto optimal and EF1.
Proof. Let x be the MNW solution. Say for contradiction that agent i envies agent k by more than one good. Let j be the item allocated to agent k that minimizes the ratio $\frac{v_{kj}}{v_{ij}}$. Let x′ be the same allocation as x, except that $x'_{ij} = 1, x'_{kj} = 0$. The proof is concluded by showing that $NW(x') > NW(x)$, which contradicts x being the MNW solution:
$$\begin{aligned}
& \frac{NW(x')}{NW(x)} > 1\\
\Leftrightarrow\ & [u_i(x_i) + v_{ij}] \cdot [u_k(x_k) - v_{kj}] > u_i(x_i)\, u_k(x_k)\\
\Leftrightarrow\ & \left( 1 + \frac{v_{ij}}{u_i(x_i)} \right) \cdot \left( 1 - \frac{v_{kj}}{u_k(x_k)} \right) > 1\\
\Leftrightarrow\ & \frac{v_{kj}}{v_{ij}} \left[ u_i(x_i) + v_{ij} \right] < u_k(x_k). \qquad (8.1)
\end{aligned}$$
The final inequality (8.1) holds: envy by more than one good gives $u_i(x_i) + v_{ij} < u_i(x_k)$, and by the choice of j we have $\frac{v_{kj}}{v_{ij}} u_i(x_k) \le u_k(x_k)$, so $\frac{v_{kj}}{v_{ij}} [u_i(x_i) + v_{ij}] < \frac{v_{kj}}{v_{ij}} u_i(x_k) \le u_k(x_k)$.
The MNW solution also turns out to give a guarantee relative to MMS, but not a very strong one: every agent is guaranteed to get $\frac{2}{1 + \sqrt{4n - 3}}$ of their MMS guarantee, and this bound is tight. Luckily, in practice the MNW solution seems to fare much better, as measured by the MMS approximation ratios achieved across 1281 "divide goods" instances submitted to the Spliddit website for fairly allocating goods.
8.4 Computing Discrete Max Nash Welfare
Unfortunately, computing an MNW allocation is NP-hard, which can be shown via a reduction from the partition problem.
Definition 2. Partition problem: you are given a multiset of integers $S = \{s_1, \dots, s_m\}$ (potentially with duplicates), and your task is to figure out if there is a way to partition S into two sets $S_1, S_2$ such that $\sum_{i \in S_1} s_i = \sum_{i \in S_2} s_i$.
We may now construct an MNW instance as follows: we create two agents and m items. Each agent has value $s_j$ for item j. Now by the AM-GM inequality (2d case: $\sqrt{xy} \le \frac{x + y}{2}$, with equality iff x = y) there exists a correct partitioning if and only if the MNW allocation has value $\left( \frac{1}{2} \sum_j s_j \right)^2$.
This result can be extended to show strong NP-hardness by considering the k-equal-sum-subset problem: given a multiset S of positive integers $x_1, \dots, x_n$, are there k nonempty disjoint subsets $S_1, \dots, S_k \subset S$ such that $\mathrm{sum}(S_1) = \dots = \mathrm{sum}(S_k)$? The exact same reduction as before works, but with k agents rather than 2.
8.4.2 Algorithms
Given these computational complexity problems, how should we compute an MNW allocation in practice?
We present two approaches here. First, we can take the log of the objective, to get a concave function.
After taking logs, we get the following mixed-integer exponential-cone program:
$$\begin{aligned}
\max \quad & \sum_i \log u_i\\
\text{s.t.} \quad & u_i \le \langle v_i, x_i\rangle, \quad \forall i = 1, \dots, n\\
& \sum_i x_{ij} \le 1, \quad \forall j = 1, \dots, m \qquad (8.2)\\
& x_{ij} \in \{0, 1\}, \quad \forall i, j
\end{aligned}$$
This is simply the discrete version of the Eisenberg-Gale convex program. One approach is to solve this
problem directly, e.g. using Mosek.
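A minimal sketch of solving (8.2) directly with cvxpy; the random instance, and the use of Mosek as the mixed-integer conic solver, are assumptions for illustration:

```python
import cvxpy as cp
import numpy as np

n, m = 3, 5
rng = np.random.default_rng(1)
v = rng.integers(1, 10, (n, m)).astype(float)  # integer valuations

x = cp.Variable((n, m), boolean=True)
u = cp.sum(cp.multiply(v, x), axis=1)
prob = cp.Problem(cp.Maximize(cp.sum(cp.log(u))),
                  [cp.sum(x, axis=0) <= 1])
prob.solve(solver=cp.MOSEK)  # needs a solver supporting mixed-integer
                             # exponential-cone programs
```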
Alternatively, we can impose some additional structure on the valuation space: if we assume that all valuations are integer-valued, then we know that $u_i(x_i)$ will take on some integer value in the range 0 to $\|v_i\|_1$. In that case, we can add a variable $w_i$ for each agent i, and use either (1) the linearization of the log at each integer value, or (2) the linear function through the points $(k, \log k)$ and $(k + 1, \log(k + 1))$, as upper bounds on $w_i$. This gives $\frac{1}{2}\|v_i\|_1$ constraints for each i using the line segment approach (the linearization uses twice as many constraints), but ensures that $w_i$ is equal to $\log\langle v_i, x_i\rangle$ for all integer-valued $\langle v_i, x_i\rangle$.
Using the line segment approach gives the following mixed-integer linear program (MILP):
$$\begin{aligned}
\max \quad & \sum_i w_i\\
\text{s.t.} \quad & w_i \le \log k + [\log(k + 1) - \log k] \cdot (\langle v_i, x_i\rangle - k), \quad \forall i = 1, \dots, n,\ k = 1, 3, \dots, \|v_i\|_1\\
& \sum_j v_{ij} x_{ij} \ge 1, \quad \forall i\\
& \sum_i x_{ij} \le 1, \quad \forall j = 1, \dots, m \qquad (8.3)\\
& x_{ij} \in \{0, 1\}, \quad \forall i, j
\end{aligned}$$
These two mixed-integer programs both have some drawbacks: For the first mixed-integer exponential-
cone program, we must resort to much less mature technology than for mixed-integer linear programs. On
the other hand, the discrete EG program is reasonably compact: the program is roughly the size of a solution.
For the MILP, the good news is that MILP technology is quite mature, and so we might expect this to solve
quickly. On the other hand, adding n × ∥vi ∥1 additional constraints can be quite a lot, and could lead to
slow LP solves as part of the branch-and-bound procedure.
Figure 8.1 shows the performance of the two approaches.
Figure 8.1: Plot showing the runtime of discrete Eisenberg-Gale and the MILP approach.
A really nice overview talk targeted at a technical audience is given by Ariel Procaccia here: https://
www.youtube.com/watch?v=7lUtS-l9ytI. Most of the material here is based on his excellent presentations
of these topics.
The 1.00008 inapproximability result was by Lee [68]. The 1.45-approximation algorithm was given by Barman et al. [6]. Strong NP-hardness of k-equal-sum-subset is shown in Cieliebak et al. [31].
The MILP using approximation to the log at each integer point was introduced by Caragiannis et al. [25].
At the time, Mosek did not support exponential cones, and so they did not compare to the direct solving of
discrete Eisenberg-Gale. The results shown here are the first direct comparison of the two, as far as I know.
Chapter 9
Internet Advertising Auctions: Position Auctions
9.1 Introduction
In this chapter we begin our study of more advanced auction concepts beyond first and second-price auctions.
We will focus on a type of auction motivated by internet advertising auctions. Internet advertising auctions
provide the funding for almost every free internet service such as google search, facebook, twitter, and so on.
At the heart of these monetization schemes is a market design based around independently running auctions
every time a user shows up. This happens many times per second; advertisers participate in thousands or even millions of auctions, have budget constraints that span those auctions, and each user query generates multiple potential slots for showing ads.
theory for understanding them. Similarly, the scale of the problem necessitates the design of algorithmic
agents for bidding on behalf of advertisers.
First we will introduce the position auction, which is a highly structured multi-item auction. There, we
will look at the two most practically-important auction formats: the generalized second-price auction (GSP),
and the Vickrey-Clarke-Groves (VCG) auction. Then in the following chapters, we will study auctions with
budgets and repeated auctions.
Figure 9.1: Left: A Google query for “mortgage” shows 2 ads. Organic search results follow further down.
Right: The front page of Reddit. The second feed story is an ad.
Position auctions also capture, to a first approximation, ads shown in a feed. Truly capturing feed auctions does require some care, however. The assumption of there being a fixed
number of items is incorrect for that setting. Instead, the number of ads shown depends on how far the user
scrolls, the size of the ads, and what else is being shown in terms of organic content. We will focus on the
simpler setting with a fixed number of slots, but properly handling feed auctions is an interesting problem.
Beyond the multi-item and budget aspects, internet advertising has a few other interesting quirks. These are discussed briefly below, though we will mostly abstract away considerations around these issues.
Targeted advertising.
In a classical advertising setting such as TV or newspaper advertising, the same ad is shown to every viewer
of a given TV channel, or every reader of a newspaper. This means that it is largely not feasible for smaller,
and especially niche, retailers to advertise, since their return on investment is very low due to the small
fraction of viewers or readers that fit their niche. All this changed with the advent of internet advertising,
where niche retailers can perform much more fine-grained targeting of their ads. This has enabled many
niche retailers to scale up their audience reach significantly.
One way that targeting can occur is directly through association with the search term in sponsored
search. For example, by bidding on the search term “mortgage,” a lender is effectively performing a type
of targeting. However, a second type of targeting occurs by matching on query and user features (such
targeting is used across many types of internet advertising including search, feed ads, and others). For
example, a company selling surf boards might wish to target users at the intersection of the categories {age
16-30, lives in California}. Because each individual auction corresponds to a single user query, the idea of
targeted advertising can be captured in the valuations that we will use for the buyers in our auction setup:
each buyer corresponds to an advertiser, each auction corresponds to a query, and the buyer will have value
zero for all items in a given auction if the associated query features do not match their targeting criteria.
Targeted advertising has the potential for some adverse effects. Of particular note are demographic biases
in the types of ads being shown (a well-documented example is that in some settings, ads for new luxury
housing developments were disproportionately shown to white people). In a later lecture note we will study
such questions around demographic fairness. A second potential issue is that of user privacy. This is an
interesting topic that we will unfortunately not have too much to say on, as it is outside the scope of the
course.
Another revolution compared to pre-internet advertising is the pay per click nature of most internet adver-
tising auctions. Many advertisers are not actually interested in the user simply viewing their ad. Instead,
their goal is to get the user to click on the ad, or even something downstream of clicking on the ad, such
as selling the advertised product via the linked website. Because the platform, such as google, is in a much
better position to predict whether a given user will click on a given ad, these auctions operate on a cost
per click basis, rather than a cost per impression. What this means is that any given advertiser does not
actually pay just because they won the auction and got their ad shown, instead they pay only if the user
actually clicks on their ad.
9.2. POSITION AUCTIONS 63
From an auction perspective, this means that the valuations used in the auctions must take into account
the probability that the user will click on the ad. Valuations are typically constructed by breaking down
the value that a buyer i (in this case an advertiser) has for an item (which is a particular slot in the search
query or user feed) into several components. The value per click of advertiser i is the value vi > 0 they place
on any user within their targeting criteria clicking on their ad (modern platforms generalize this concept
to a value per conversion, where a conversion can be a click, an actual sale of a product, the user viewing
a video, etc.). The click-through rate is the likelihood that the user behind query j will click on the ad of advertiser i, independently of where on the page the ad is shown. We denote this by $CTR_{ij}$; we will assume that $CTR_{ij} = 0$ if query j does not fall under the targeting criteria of buyer i. Finally, the slot qualities $q_1, \dots, q_S$ are scalar values denoting the quality of each slot that an ad could end up in. These are monotonically decreasing values, reflecting the fact that it's generally preferable to be shown higher up on the page. Now, finally, the value that buyer i has for being shown in slot s of query j is modeled as $v_{ijs} = v_i \cdot CTR_{ij} \cdot q_s$.
For the rest of the lecture note, we will assume that vij is the value that buyer i has for auction j;
this value encodes the value per click, the CTR, and the targeting criteria (but can allow for more general
valuations that do not decompose). Note that this assumes correct CTR predictions, which is obviously
not true in practice. In practice the CTRs are estimated using machine learning, and it is of interest to
understand which discrepancies this introduces into the market. Secondly, we are assuming that buyers are
maximizing their expected utility, rather than observed utility. This is largely a non-problem, since they
will participate in thousands or even millions of auctions, and thus their realized value can reasonably be
expected to match the expectation (at least if the CTRs are correct). The slot quality qs will be handled
separately in the next section. Once we start discussing budgets, we will keep the presentation simple by
assuming a single item per auction, thus avoiding the need for slot qualities.
Example 4. Suppose we have two slots with quality scores q1 = 1, q2 = 0.5, and three buyers with values
v1 = 10, v2 = 8, v3 = 2, and suppose they all bid their values. Then buyer 1 is allocated slot 1, and they
generate a value of v1 · q1 = 10, buyer 2 is allocated slot 2 and they generate a value v2 · q2 = 4, and buyer 3
gets nothing.
In the generalized second-price (GSP) auction, bidders are sorted in decreasing order of bids, the s-th highest bidder wins slot s, and pays the (s + 1)-th highest bid per click, so that their utility for slot s is $q_s(v_i - b_{s+1})$. While this may look like the natural generalization of the second-price auction, this is a fairly superficial generalization, since GSP turns out to lose the core property of the second-price auction: truthfulness!
In particular, consider Example 4 again. With GSP prices, buyer 1 gets utility $q_1(v_1 - v_2) = 2$ when everyone bids truthfully. If buyer 1 instead bids some value between 2 and 8, then they get utility $q_2(v_1 - v_3) = 4$. Thus, buyer 1 is better off misreporting. More generally, it turns out that the GSP auction can have several pure-Nash equilibria, and some of these lead to allocations that are not welfare-maximizing. Consider the following bid vector for Example 4: b = (4, 8, 2). Buyer 1 gets utility 0.5(10 − 2) = 4 (whereas they'd get utility 2 for bidding above 8). Buyer 2 gets utility 1(8 − 4) = 4 (whereas they'd get utility 0.5(8 − 2) = 3 for bidding below 4). Buyer 3 is priced out.
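One can verify these utility computations with a short script; the helper below is just for this illustration, using the utility rule $q_s(v_i - b_{s+1})$ from the example:

```python
q = [1.0, 0.5]                 # slot qualities from Example 4
v = [10.0, 8.0, 2.0]           # buyer values

def gsp_utilities(bids, values, q):
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    util = [0.0] * len(bids)
    for s, i in enumerate(order[:len(q)]):
        next_bid = bids[order[s + 1]] if s + 1 < len(order) else 0.0
        util[i] = q[s] * (values[i] - next_bid)   # pay next-highest bid
    return util

print(gsp_utilities([10, 8, 2], v, q))  # truthful: buyer 1 gets 2.0
print(gsp_utilities([5, 8, 2], v, q))   # buyer 1 shades down: gets 4.0
print(gsp_utilities([4, 8, 2], v, q))   # the equilibrium bid vector b
```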
We already saw a sketch of the fact that VCG is truthful earlier in the notes, but here we show the result specifically for the position auction setting, where the proof is nice and short.
Theorem 19. The VCG auction for position auctions is truthful.
Proof. Suppose again that buyer bids are sorted, with buyer i winning slot i when bidding truthfully. Now suppose buyer i misreports and gets slot k instead. We want to show that bidding truthfully maximizes utility, which means
$$s_i \cdot v_i - \left[ W_{-i}^{S} - W_{-i}^{S-i} \right] \ge s_k \cdot v_i - \left[ W_{-i}^{S} - W_{-i}^{S-k} \right],$$
where $s_l$ denotes the quality of slot l, $W_{-i}^{S}$ is the welfare of the other buyers when i is absent, and $W_{-i}^{S-l}$ is their welfare when i occupies slot l. Simplifying this expression gives
$$s_i \cdot v_i + W_{-i}^{S-i} \ge s_k \cdot v_i + W_{-i}^{S-k}.$$
Now we see that both the right-hand and left-hand sides correspond to social welfare under two different allocations (where we treat bids from other buyers as their true values). The left-hand side is social welfare when i bids truthfully, while the right-hand side is social welfare when i misreports in a way that gives them slot k. Given that VCG picked the left-hand side, and VCG allocates via welfare maximization, the left-hand side must be larger.
Chapter 10
Auctions with Budgets
10.1 Introduction
The previous chapter introduced a new aspect to auctions associated with internet advertising auctions: the
multiple-slot issue. This chapter studies a second major practical aspect of internet advertising auctions:
budgets. In these auctions, a large fraction of advertisers specify a budget constraint that must hold in
aggregate across all of the payments made by the advertiser. Because these budget constraints are applied
across all of the auctions that an advertiser participates in, they couple the auctions together, and force
us to consider the aggregate incentives across auctions. This is in contrast to all of our previous auction
results, which studied a single auction in isolation. Notably, these budget constraints break the incentive compatibility of the second-price auction; for an advertiser with a budget constraint, it is not necessarily optimal to bid their true value in each auction!
We call this setting an auction market. If second-price auctions are used then we call it a second-price
auction market, and conversely we call it a first-price auction market if first-price auctions are used.
Figure 10.1: Comparison of pacing methods. Left: no pacing, middle: probabilistic pacing, right: multi-
plicative pacing.
Instead, each buyer needs to somehow smooth out their spending across auctions. For large-scale Internet
auctions this is typically achieved via some sort of pacing rule. Here we will mention two that have been
used in practice:
1. Probabilistic pacing: each buyer i is given a parameter αi ∈ [0, 1] denoting the probability that they
should participate in each auction. For each auction j, an independent coin is flipped which comes up
heads with probability αi , and if it comes up heads then the buyer submits a bid bij = vij to that
auction.
2. Multiplicative pacing: each buyer i is given a parameter αi ∈ [0, 1], which acts as a scalar multiplier
on their truthful bids. In particular, for each auction j, buyer i submits a bid bij = αi vij .
Both methods have been applied in real-life large-scale Internet ad markets.
Figure 10.1 shows a comparison of pacing methods for a simplified setting where time is taken into account. Here we assume that we are considering some buyer i whose value is the same for every item, but other bidders are causing the items to have different prices. On the x-axis we plot time, and on the y-axis we plot the price of each item. On the left is the outcome from naive bidding: the buyer spends their budget much too fast, and ends up running out of budget when there are many high-value items left for them to buy. In practice, many buyers also prefer to smoothly spend their budget throughout the day. In the middle we show probabilistic pacing, where we do get smooth budget expenditure. However, the buyer ends up buying some very expensive items, while missing out on much cheaper items that have the same value to them. Finally, on the right is the result from multiplicative pacing, where the buyer picks an optimal threshold to buy at, and thus buys items optimally in order of bang-per-buck.
In this note we will focus on multiplicative pacing, but see the historical notes section for some references
to papers that also consider probabilistic pacing.
The intuition given in Figure 10.1 can be shown to hold more generally when items have different values
to the buyer. Generally, it turns out that given a set of bids by all the other bidders, a buyer can always
specify a best response by choosing an optimal pacing multiplier:
Proposition 1. Suppose we allow arbitrary bids in each auction. If we hold all bids for buyers k ≠ i fixed, then buyer i has a best response that consists of multiplicatively-paced bids (assuming that if a buyer is tied for winning an auction, they can specify the fraction that they win).
Proof. Since every other bid is held fixed, we can think of each item as having some price $p_j = \max_{k \ne i} b_{kj}$, which is what i would pay if they bid $b_{ij} \ge b_{kj}$. Now we may sort the items in decreasing order of bang-per-buck $\frac{v_{ij}}{p_j}$. An optimal allocation for i clearly consists of buying items in this order, until they reach some index j such that, if they buy every item with index l < j and some fraction $x_{ij}$ of item j, they either spend their whole budget, or j is the first item with $\frac{p_j}{v_{ij}} \ge 1$ (if $\frac{p_j}{v_{ij}} > 1$ then $x_{ij} = 0$). Now set $\alpha_i = \frac{p_j}{v_{ij}}$. With this bid, i gets exactly this optimal allocation: for all items l ≤ j (which are the items in the optimal allocation), we have $\alpha_i v_{il} = \frac{p_j}{v_{ij}} v_{il} \ge \frac{p_l}{v_{il}} v_{il} = p_l$.
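A minimal sketch of this best-response computation; it ignores ties and fractional purchases at the margin, which the proposition handles via fractional tie-breaking:

```python
def best_response_multiplier(v_i, p, B_i):
    """Buy items in decreasing bang-per-buck order v_ij / p_j, stopping at
    the first item that is unaffordable or has p_j / v_ij >= 1; bidding
    with the marginal item's ratio then ties on it and wins every
    item with better bang-per-buck."""
    order = sorted(range(len(p)), key=lambda j: -v_i[j] / p[j])
    spent = 0.0
    for j in order:
        if v_i[j] <= p[j] or spent + p[j] > B_i:
            return min(1.0, p[j] / v_i[j])  # alpha_i for the marginal item
        spent += p[j]
    return 1.0  # everything worth buying was affordable: no pacing needed
```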
A second-price pacing equilibrium (SPPE) consists of pacing multipliers α and a fractional allocation x such that:
• For all j, $\sum_i x_{ij} = 1$, and if $x_{ij} > 0$ then i is tied for the highest bid on item j.
• If $x_{ij} > 0$ then $p_j = \max_{k \ne i} \alpha_k v_{kj}$.
• For all i, $\sum_j p_j x_{ij} \le B_i$. Additionally, if the inequality is strict then $\alpha_i = 1$.
The first and second conditions of pacing equilibrium simply enforce that each item goes to the winning bids, with prices set by the second-price rule. The third condition ensures that a buyer is only paced if their budget constraint is binding. It follows (almost) immediately from Proposition 1 that every buyer is best responding in SPPE.
A nice property of SPPE is that it is always guaranteed to exist (this is not immediate from the existence
of, say, a Nash equilibrium in a standard game, since an SPPE corresponds to a specific type of pure-strategy
Nash equilibrium):
Theorem 20. An SPPE of a pacing game is always guaranteed to exist.
We won’t cover the whole proof here, but we will state the main ingredients, which are useful to know
more generally.
• First, a smoothed pacing game is constructed. In the smoothed game, the allocation is smoothed
out among all bids that are within ϵ of the maximum bid, thus making the allocation a deterministic
function of the pacing multipliers α. Several other smooth approximations are also introduced to deal
with other discontinuities. In the end, a game is obtained, where each player simply has as their action
space the interval [0, 1] and utilities are nice continuous and quasi-concave functions.
• Secondly, the following fixed-point theorem is invoked to guarantee existence of a pure-strategy Nash
equilibrium in the smoothed game.
Theorem 21. Consider a game with n players, strategy spaces $A_i$, and utility functions $u_i(a_i, a_{-i})$. If the following conditions are satisfied:
– $A_i$ is convex and compact for all i
– $u_i(a_i, \cdot)$ is continuous in $a_{-i}$
– $u_i(\cdot, a_{-i})$ is continuous and quasi-concave in $a_i$ (quasi-concavity of a function f(x) means that for all x, y and λ ∈ [0, 1] it holds that $f(\lambda x + (1 - \lambda) y) \ge \min(f(x), f(y))$)
then a pure-strategy Nash equilibrium exists.
• Finally, the limit point of smoothed games as the smoothing factor ϵ tends to zero is shown to yield
an equilibrium in the original pacing problem.
Unfortunately, while SPPE is guaranteed to exist, it turns out that sometimes there are several SPPE,
and they can have large differences in revenue, social welfare, and so on. An example is shown in Figure 10.2.
In practice this means that we might need to worry about whether we are in a good and fair equilibrium.
Another positive property of SPPE is that every SPPE is also a market equilibrium, if we consider a
market equilibrium setting where each buyer has a quasi-linear demand function that respects the total
supply as follows:
Di (p) = argmax0≤xi ≤1 ⟨vi − p, xi ⟩ s.t. ⟨p, xi ⟩ ≤ Bi .
This follows immediately by simply using the allocation x and prices p from the SPPE as a market equilib-
rium. Proposition 1 tells us that xi ∈ Di (p), and the market clears by definition of SPPE. This means that
SPPE has a number of nice properties such as no envy and Pareto optimality (although Pareto optimality
requires considering the seller as an agent too).
Finally we turn to the question of computing an SPPE. Unfortunately the news there is bad. It was
shown recently that computing an SPPE is a PPAD-complete problem. This means that there exists a
polynomial-time reduction between the problem of computing a Nash equilibrium in a general-sum game
and that of computing an SPPE, and thus the two problems are equally hard, from the perspective of
computing a solution in polynomial time. Moreover, it was also shown that we cannot hope for iterative
methods to efficiently compute an approximate SPPE. Beyond merely computing any SPPE, we could also
try to find one that maximizes revenue or social welfare. This problem turns out to be NP-complete.
There is a mixed-integer program for computing SPPE, but unfortunately it is not very scalable.
Figure 10.2: Multiplicity of SPPE. On the left is shown a problem instance, and on the right are shown two possible second-price pacing equilibria.
Again, the goal will be to find a pacing equilibrium. This is simply a BFPM (a set of budget-feasible pacing multipliers, together with an allocation) that satisfies the complementarity condition on the budget constraint and pacing multiplier.
Definition 5. A first-price pacing equilibrium (FPPE) is a BFPM (α, x) such that for every buyer i: if $\sum_j p_j x_{ij} < B_i$ then $\alpha_i = 1$.
Notably, the only difference to SPPE is the pricing condition, which now uses first price.
A very nice property of the first-price setting is that BFPMs satisfy a monotonicity condition: if (α′ , x′ )
and (α′′ , x′′ ) are both BFPM, then the pacing vector α = max(α′ , α′′ ) (where the max is taken component-
wise) is also a BFPM. The associated allocation is that for each item j, we first identify whether the highest
bid comes from α′ or α′′ , and use the corresponding allocation of j (breaking ties towards α′ ).
Intuitively, the reason that (α, x) is also a BFPM is that for every buyer i, their bids are the same as in one of the two previous BFPMs (say (α′, x′), WLOG), and so the prices they pay are the same as in (α′, x′). Furthermore, since every other buyer is bidding at least as much as in (α′, x′), they win weakly less of each item (using the tie-breaking scheme described above). Since (α′, x′) satisfied budgets, (α, x) must also satisfy budgets. The remaining conditions are easily checked.
In addition to componentwise maximality, there is also a maximal BFPM (α, x) (there could be multiple x compatible with α) such that α ≥ α′ for all α′ that are part of any BFPM. Consider $\alpha_i^* = \sup\{\alpha_i \mid \alpha \text{ is part of a BFPM}\}$. For any ϵ and i, we know that there must exist a BFPM such that $\alpha_i > \alpha_i^* - \epsilon$. For a fixed ϵ we can take componentwise maxima to conclude that there exists $(\alpha^\epsilon, x^\epsilon)$ that is a BFPM. This yields a sequence $\{(\alpha^\epsilon, x^\epsilon)\}$ as ϵ → 0. Since the space of both α and x is compact, the sequence has a limit point $(\alpha^*, x^*)$. By continuity $(\alpha^*, x^*)$ is a BFPM.
We can use this maximality to show existence and uniqueness (of multipliers) of FPPE:
Theorem 22. An FPPE always exists and the set of pacing multipliers {α} that are part of an FPPE is a
singleton.
Proof. Here we give a high-level proof, a more explicit proof can be found in the paper listed in the notes.
Consider the component-wise maximal α and an associated allocation x such that they form a BFPM.
Since α, x is a BFPM, we only need to check that it has no unnecessarily paced bidders. Suppose some
buyer i is spending strictly less than Bi and αi < 1. If i is not tied for any items, then we can increase
αi for some sufficiently small ϵ and retain budget feasibility, contradicting the maximality of α. If i is tied
for some item, consider the set N (i) of all bidders tied with i. Now take the transitive closure of this set
by repeatedly adding any bidder that is tied with any bidder in N (i). We can now redistribute all the tied
items among bidders in N (i) such that no bidder in N (i) is budget constrained (this can be done by slightly
increasing i’s share of every item they are tied on, then slightly increasing the share of every other buyer
in N (i) who is now below budget, and so on). But now there must exist some small enough δ > 0 such
that we can increase the pacing multiplier of every bidder in N (i) by δ while retaining budget feasibility and
creating no new ties. This contradicts α being maximal. We get that there can be no unnecessarily paced bidders under α.
Finally, to show uniqueness, consider any alternative FPPE (α′, x′). Consider the set I of buyers such that $\alpha'_i < \alpha_i$; since α ≥ α′ and α ≠ α′, this set must have size at least one. All buyers in I spend their whole budget under α, and their collective spending strictly decreases when moving to α′, so at least one buyer in I must not be spending their whole budget under α′. But $\alpha'_i < \alpha_i \le 1$ for all i ∈ I, so that buyer must be unnecessarily paced.
10.4.1 Sensitivity
FPPE enjoys several nice monotonicity and sensitivity properties that SPPE does not. Several of these
follow from the maximality property of FPPE: the unique FPPE multipliers α are such that α ≥ α′ for any
other BFPM (α′ , x′ ).
The following are all guaranteed to weakly increase revenue of the FPPE:
1. Adding a bidder i: the old FPPE (α, x) is still BFPM by setting αi = 0, xi = 0. By α monotonicity
prices increase weakly.
2. Adding an item: The new FPPE α′ satisfies α′ ≤ α (for contradiction, consider the set of bidders
whose multipliers increased, since they win weakly more and prices went up, somebody must break
their budget). Now consider the bidders such that αi′ < αi . Those bidders spend their whole budget
by the FPPE “no unnecessary pacing” condition. For bidders such that αi′ = αi , they pay the same as
before, and win weakly more.
3. Increasing a bidder i’s budget: the old FPPE (α, x) is still BFPM, so this follows by α maximality.
It is also possible to show that revenue enjoys a Lipschitz property: increasing a single buyer’s budget
by ∆ increases revenue by at most ∆. Similarly, social welfare can be bounded in terms of ∆, though
multiplicatively, and it does not satisfy monotonicity.
max_{x≥0, δ≥0, u}   Σ_i [ Bi log(ui) − δi ]

s.t.   ui ≤ Σ_j xij vij + δi,   ∀i        (10.1)

       Σ_i xij ≤ 1,   ∀j                  (10.2)

min_{p≥0, β≥0}   Σ_j pj − Σ_i Bi log(βi)          (10.3)

s.t.   pj ≥ vij βi,   ∀i, j

       βi ≤ 1,   ∀i

The first program is the primal convex program and the second is its dual.
The variables xij denote the amount of item j that bidder i wins. The leftover budget is denoted by δi; it arises from the dual program: it is the primal variable for the dual constraint βi ≤ 1, which constrains bidder i to paying a price-per-utility rate of at most 1.
The dual variables βi, pj correspond to constraints (10.1) and (10.2), respectively. They can be interpreted as follows: βi is the inverse bang-per-buck of buyer i, βi = min_{j : xij > 0} pj/vij, and pj is the price of good j.
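To make the programs concrete, here is a minimal sketch (in Python, a tooling choice of ours, not of the notes) of how the primal could be solved with the cvxpy modeling package, recovering the prices pj and the βi from the duals of constraints (10.2) and (10.1); the two-bidder, two-item instance is made up for illustration:

import cvxpy as cp
import numpy as np

v = np.array([[2.0, 1.0],   # v[i, j]: bidder i's value for item j (made-up instance)
              [1.0, 2.0]])
B = np.array([1.0, 1.0])    # budgets B_i
n, m = v.shape

x = cp.Variable((n, m), nonneg=True)  # allocation x_ij
delta = cp.Variable(n, nonneg=True)   # leftover budget delta_i
u = cp.Variable(n)                    # utilities u_i

utility_cons = u <= cp.sum(cp.multiply(v, x), axis=1) + delta  # (10.1)
supply_cons = cp.sum(x, axis=0) <= 1                           # (10.2)

prob = cp.Problem(cp.Maximize(B @ cp.log(u) - cp.sum(delta)),
                  [utility_cons, supply_cons])
prob.solve()

p = supply_cons.dual_value      # prices p_j
beta = utility_cons.dual_value  # inverse bang-per-buck beta_i
print("prices:", p, "beta:", beta)

In the FPPE interpretation, the recovered βi should play the role of buyer i’s pacing multiplier, with paced bids βi vij.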
We may use the following basic fact from convex optimization to conclude that strong duality holds and
get optimality conditions:
Theorem 23. Consider a convex program and its dual

min_x   f(x)
s.t.    gi(x) ≤ 0,  ∀i           (10.4)
        x ≥ 0

max_{λ≥0}  q(λ)                  (10.5)
where  q(λ) := min_{x≥0} L(x, λ)  and  L(x, λ) := f(x) + Σ_i λi gi(x),

with Lagrange multipliers λi for each constraint i. Assume that the following Slater constraint qualification is satisfied: there exists some x ≥ 0 such that gi(x) < 0 for all i. If (10.4) has a finite optimal value f∗ then (10.5) has a finite optimal value q∗ and f∗ = q∗. Furthermore, a solution pair x∗, λ∗ is optimal if and only if the following Karush-Kuhn-Tucker (KKT) conditions hold:
• (primal feasibility) x∗ is a feasible solution of (10.4)
• (dual feasibility) λ∗ ≥ 0
• (complementary slackness) λi∗ gi(x∗) = 0 for all i
• (Lagrangian optimality) x∗ ∈ arg min_{x≥0} L(x, λ∗)
It is easy to see that x is a valid allocation: the primal program has the exact packing constraints. Budgets are also satisfied (here we may assume ui > 0, since otherwise the bidder wins no items and trivially stays within budget): the KKT conditions give that for any item j that bidder i is allocated part of,

Bi / ui = pj / vij   ⇒   Bi vij xij / ui = pj xij.

If δi = 0 then summing over all j gives

Σ_j pj xij = Bi · (Σ_j vij xij) / ui = Bi.
This part of the budget argument is exactly the same as for the standard Eisenberg-Gale proof [74]. Note that (10.1) always holds with equality, since the objective is strictly increasing in ui. Thus δi = 0 denotes full budget expenditure. If δi > 0 then equality in (10.1) implies that ui > Σ_j vij xij, which gives

Σ_j pj xij = Bi · (Σ_j vij xij) / ui < Bi.
equilibrium of a game where each buyer is choosing their pacing multiplier, and observing their quasi-linear
utility (with −∞ utility for breaking the budget). Moreover, in the second-price setting, if we fix the bids
of every other buyer, then a pacing multiplier αi that satisfies no unnecessary pacing is actually a best
response over the set of all possible ways to bid in each individual auction. In the case of first-price pacing
equilibrium, we do not have this property: in FPPE, a buyer might wish to shade their own bid. In that case, FPPE should be thought of only as a budget-management equilibrium among the algorithmic proxy bidders that control budget expenditure. Secondly, due to this shading, the values vij that we took as input to the FPPE problem should probably be thought of as the bids of the buyers, which would generally be lower than their true values.
10.6 Conclusion
There are interesting differences in the properties satisfied by SPPE and FPPE. We summarize them quickly
here (these are all covered in the literature noted in the Historical Notes):
• FPPE is unique (this can be shown from the convex program, or directly from the monotonicity property of BFPMs), whereas SPPE is not
• FPPE is less sensitive to perturbation (e.g. revenue increases smoothly as budgets are increased)
• SPPE corresponds to a pure-strategy Nash equilibrium, and thus buyers are best responding to each
other
• Both correspond to different market equilibria (but SPPE requires buyer demands to be “supply
aware”)
• Due to the market equilibrium connection, both can be shown strategyproof in an appropriate “large
market” sense
FPPE and SPPE have also been studied experimentally, both via random instances, as well as instances
generated from real ad auction data. The most interesting takeaways from those experiments are:
• Manipulation is hard in both SPPE and FPPE if you can only lie about your value-per-click
• Social welfare can be higher in either FPPE or SPPE; experimentally, it appears to be largely a toss-up as to which solution concept yields higher social welfare.
mean-field market model. The PPAD-completeness of computing an SPPE was shown by Chen, Kroer, and Kumar [29].
The quasi-linear variant of Eisenberg-Gale was given by Chen, Ye, and Zhang [27] and independently by Cole et al. [34] (an unpublished note by one of the authors of Cole et al. [34] existed around a decade before its publication). Theorem 23 is a specialization to the FPPE setting. In fact, much stronger statements can be made: for a more general statement of the strong duality theorem and the KKT conditions used here, see Bertsekas, Nedic, and Ozdaglar [11], Proposition 6.4.4. The KKT conditions can be significantly generalized beyond convex programming.
The fixed-point theorem that is invoked to guarantee existence of a pure-strategy Nash equilibrium in
the smoothed game is by Debreu [39], Glicksberg [54], and Fan [45].
Chapter 11
Online Budget Management
11.1 Introduction
In the last lecture note we studied auctions with budgets and repeated auctions. However, we ignored one
important aspect: time. In this lecture note we consider an auction market setting where a buyer is trying
to adaptively pace their bids over time. The goal is to hit the “right” pacing multiplier as before, but each
bidder has to learn that multiplier as the market plays out. We’ll see how we can approach this problem
using ideas from regret minimization.
We would like to compare our outcome to the hindsight optimal strategy. We denote the expected value
of that strategy as
πiH(vi, di) :=  max_{xi ∈ {0,1}^T}   Σ_{t=1}^T xit (vit − dit)

                s.t.   Σ_{t=1}^T xit dit ≤ Bi                  (11.1)
The hindsight-optimal strategy has a simple structure: we simply choose the optimal subset of items to win
while satisfying our budget constraint. In the case where the budget constraint is binding, this is a knapsack
problem.
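As a small illustration of the structure of (11.1), the following sketch (our own, not from the source) solves the fractional relaxation of the hindsight problem greedily by utility per dollar spent; this yields an upper bound on πiH:

def hindsight_upper_bound(v, d, B):
    # items with positive utility v_t - d_t; take free items (d_t = 0) outright
    items = [(v[t] - d[t], d[t]) for t in range(len(v)) if v[t] > d[t]]
    total = sum(gain for gain, cost in items if cost == 0.0)
    paid = sorted((it for it in items if it[1] > 0.0),
                  key=lambda it: it[0] / it[1], reverse=True)
    budget = B
    for gain, cost in paid:
        take = min(1.0, budget / cost)  # fraction of the item to buy
        total += take * gain
        budget -= take * cost
        if budget <= 0.0:
            break
    return total

print(hindsight_upper_bound([1.0, 0.9, 0.8], [0.2, 0.5, 0.9], 0.6))  # made-up instance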
Ideally we would like to choose a strategy such that πiβ approaches πiH. However, this turns out not to be possible. We will use the idea of asymptotic γ-competitiveness to see this. Formally, β is asymptotically γ-competitive if

lim sup_{T→∞, Bi=ρiT}   sup_{vi ∈ [0,v̄i]^T, di ∈ R+^T}   (1/T) ( πiH(vi, di) − γ πiβ(vi, di) ) ≤ 0.

Intuitively, the condition says that asymptotically, β should achieve at least a 1/γ fraction of the hindsight-optimal expected value.
For any γ < v̄i /ρi , asymptotic γ-competitiveness turns out to be impossible to achieve. Thus, if our target
expenditure ρi is much smaller than our maximum possible valuation, we cannot expect to do anywhere near
as well as the hindsight-optimal strategy.
The general proof is quite involved, but the high-level idea is not too complicated. Here we show the
construction for v̄i = 1, ρi = 1/2, and thus the claim is that γ < v̄i /ρi = 2 is unachievable. The impossibility
is via a worst-case instance. In this instance, the highest other bid comes from one of the two following sequences, each starting with T/2 copies of dhigh:

d1 = (dhigh, . . . , dhigh, v̄i, . . . , v̄i),     d2 = (dhigh, . . . , dhigh, dlow, . . . , dlow),

for v̄i ≥ dhigh > dlow > 0. The general idea behind this construction is that in the sequence d1, buyer i
must buy many of the expensive items in order to maximize their utility, since they receive zero utility for
winning items with price v̄i . However, in the sequence d2 , buyer i must save money so that they can buy
the cheaper items priced at dlow .
For the case we consider here, there are T /2 of each type of highest other bid (assume T is even for
convenience). Now, we may set dhigh = 2ρi − ϵ and dlow = 2ρi − kϵ, where ϵ and k are constants that can
be tuned. For sufficiently small ϵ, i can only afford to buy T /2 items total, no matter the combination of
items. Furthermore, buying an item at price dlow yields k times as much utility as buying an item at dhigh .
Now, in order to achieve at least half of the optimal utility under d1 , buyer i must purchase at least T /4
of the items priced at dhigh . Since they don’t know whether d1 or d2 occurred until after deciding whether
to buy at least T /4 of the dhigh items, this must also occur under d2 . But then buyer i can at most afford
to buy T/4 of the items priced at dlow when they find themselves in the d2 case. Now, for any γ < 2, we can pick k and ϵ such that achieving a 1/γ fraction of πiH requires buying at least T/4 + 1 of the dlow items.
It follows that we cannot hope to design an online algorithm that achieves a 1/γ fraction of πiH for any γ < v̄i/ρi. However, it turns out that a subgradient-descent algorithm achieves exactly γ = v̄i/ρi.
The optimal solution for the relaxed problem is easy to characterize: we set xit = 1 for all t such that vit ≥ (1 + µ)dit. Importantly, this is achieved by the bid bit = vit/(1 + µ) that we use in APS.
The Lagrangian dual is the minimization problem

inf_{µ≥0}   Σ_{t=1}^T [ (vit − (1 + µ)dit)+ + µρi ],          (11.2)
where (·)+ denotes thresholding at 0. This dual problem upper bounds πiH (but we do not necessarily have strong duality, since we did not start out with a convex primal program). The minimizer of the dual problem yields the strongest possible upper bound on πiH; however, solving for it requires knowing the entire sequences vi, di. APS approximates this optimal µ by taking a subgradient step on the t’th term of the dual:

∂µ [ (vit − (1 + µ)dit)+ + µρi ]  ∋  ρi − dit 1{bit ≥ dit} = ρi − zit.

Thus APS takes subgradient steps based on the subdifferential of the t’th term of the Lagrangian dual of the hindsight-optimal optimization problem.
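The following is a minimal runnable sketch of APS under the conventions above: the bidder pays the highest other bid dt upon winning (matching the expenditure zit = dit 1{bit ≥ dit} in the subgradient), we stop buying once the budget is exhausted, and we project µ onto an interval [0, µmax]; the cap µmax and the random instance are assumptions of the sketch:

import numpy as np

def aps(v, d, rho, eps, mu_max=10.0):
    # Adaptive pacing: dual subgradient steps on the multiplier mu
    T = len(v)
    B = rho * T
    mu, spend, payoff = 0.0, 0.0, 0.0
    for t in range(T):
        bid = v[t] / (1.0 + mu)                    # paced bid b_it
        win = bid >= d[t] and spend + d[t] <= B    # respect the remaining budget
        z = d[t] if win else 0.0                   # expenditure z_it
        spend += z
        payoff += (v[t] - d[t]) if win else 0.0
        mu = min(max(mu - eps * (rho - z), 0.0), mu_max)  # projected subgradient step
    return payoff, spend

T = 1000
rng = np.random.default_rng(0)
v, d = rng.uniform(0, 1, T), rng.uniform(0, 1, T)
print(aps(v, d, rho=0.25, eps=1.0 / np.sqrt(T)))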
The APS algorithm achieves exactly the lower bound we derived earlier, and is thus asymptotically
optimal:
Theorem 25. APS with stepsize ϵi = O(T^{−1/2}) is asymptotically v̄i/ρi-competitive, and converges at a rate of O(T^{−1/2}).
This result holds under adversarial conditions: for example, the sequence of highest other bids may be
as d1 , d2 in the lower bound. However, in practice we do not necessarily expect the world to be quite this
adversarial. In a large-scale ad market, we would typically expect the sequences vi , di to be more stochastic
in nature. In a fully stochastic setting with independence, APS turns out to achieve πiH asymptotically:
Theorem 26. Suppose (vit, dit) are sampled independently from stationary, absolutely continuous CDFs with differentiable and bounded densities. Then the expected payoff of APS with stepsize ϵi = O(T^{−1/2}) approaches πiH asymptotically at a rate of O(T^{−1/2}).
Theorem 26 shows that if the environment is well-behaved then we can expect much better performance
from APS.
Chapter 12
Demographic Fairness
12.1 Introduction
This chapter studies the issue of demographic fairness. This is a separate topic from the types of fairness we have studied so far, which were largely focused on individual fairness notions such as envy-freeness and proportionality. Moreover, in the context of ad auctions, those fairness guarantees are with respect to
advertisers, since they are the buyers/agents in the market equilibrium model of the ad auction markets.
Demographic fairness, on the other hand, is a fairness notion with respect to the users who are being shown
the ads. In the context of the Fisher market models we have studied so far, this means that demographic
fairness will be a property measured on the item side, since items correspond to ad slots for particular users.
Secondly, some demographic fairness notions will be with respect to groups of users, rather than individual
users. A serious concern with internet advertising auctions and recommender systems is that the increased
ability to target users based on features could lead to harmful effects on subsets of the population, such
as gender or race-based biases in the types of ads or content being shown. We will start by looking at a
few real-world examples where notions of demographic fairness were observed to be violated. We will then describe some potential ideas for implementing fairness in the context of Fisher markets and first-price ad auctions. It is important to emphasize that this is an evolving area: it is not clear that there is a simple answer to the question of how to guarantee particular types of demographic fairness, and there are tradeoffs both among the various fairness notions and between fairness and other objectives such as revenue or welfare.
As a first example, consider the following excerpt from a ProPublica article on age-targeted job ads [2]:

Verizon placed an ad on Facebook to recruit applicants for a unit focused on financial planning and
analysis. The ad showed a smiling, millennial-aged woman seated at a computer and promised
that new hires could look forward to a rewarding career in which they would be “more than just
a number.”
Some relevant numbers were not immediately evident. The promotion was set to run on the
Facebook feeds of users 25 to 36 years old who lived in the nation’s capital, or had recently
visited there, and had demonstrated an interest in finance.
Whether age-based targeting of job ads is illegal was not completely clear as of 2017, when this article was written. The federal Age Discrimination in Employment Act of 1967 prohibits bias against people aged 40 or older in both hiring and employment. However, whether the company placing the ad, as well as Facebook, could be held liable for age discrimination remained unsettled: the law was written before the internet age, and it was unclear whether it applied to targeted ads.
Housing ads raise similar concerns: the federal Fair Housing Act makes it illegal

“to make, print, or publish, or cause to be made, printed, or published any notice, statement, or advertisement, with respect to the sale or rental of a dwelling that indicates any preference, limitation, or discrimination based on race, color, religion, sex, handicap, familial status, or national origin.”
In other contexts, such as traditional newspapers, advertisements are reviewed before being accepted, in order to ensure that they do not violate these laws. In the context of online advertising, however, the process is much more automated and algorithmic, and the targeting criteria are powerful enough that one has to think carefully about what fairness means and how it can be implemented algorithmically.
For the remainder of the lecture, we will operate under the assumption that we wish to ensure various demographic properties of how ads are shown, for ads that are viewed as “sensitive”. Beyond employment and housing, another category of ads viewed as sensitive is credit opportunities. Again, existing laws that were created prior to the internet disallow discrimination based on demographic properties in lending.
Statistical Parity This notion of demographic fairness asks that ad i is shown at an equal rate across the
two groups, in the following sense:
(1/|GA|) Σ_{j∈GA} xij  =  (1/|GB|) Σ_{j∈GB} xij.
This guarantees that, in aggregate, the groups are being shown the ad at an equal rate.
Next, let’s see an example of how statistical parity could be broken even though targeting by demographic features is disallowed. Suppose that a sensitive ad (say a job ad) wishes to target users in either demographic,
and has a value of $1 per click, with a click-through rate that depends only on wj and not gj . Secondly,
there’s another ad which is not sensitive, which has a value per click of $2, and click-through rates of 0.1
and 0.6 for groups A and B respectively. Now, the sensitive ad will never be able to win any slots for group
B since even with a CTR of 1, their bid will be lower than 0.6 · 2 = 1.2. As a result, the sensitive ad will
be shown only to group A. A concrete example of how this competition-driven form of bias might occur is
when the non-sensitive ad is some form of female-focused product such as clothing or make-up.
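The following sketch runs the numbers of this example under expected-value bidding (value per click times CTR); the CTR of 0.3 for the sensitive ad is an assumed number, and any CTR at most 1 gives the same outcome for group B:

# sensitive job ad: $1 per click, CTR 0.3 in both groups (assumed)
# non-sensitive ad: $2 per click, CTR 0.1 in group A and 0.6 in group B
for group, ctr_nonsensitive in [("A", 0.1), ("B", 0.6)]:
    bid_sensitive = 1.0 * 0.3
    bid_nonsensitive = 2.0 * ctr_nonsensitive
    winner = "sensitive" if bid_sensitive > bid_nonsensitive else "non-sensitive"
    print(f"group {group}: {bid_sensitive:.2f} vs {bid_nonsensitive:.2f} -> {winner}")
# group A goes to the sensitive ad (0.30 > 0.20), group B to the non-sensitive ad
# (0.30 < 1.20): the sensitive ad is shown only to group A, breaking statistical parity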
A potential criticism of this fairness measure is that it does not require the ad to be shown to equally
interested users in both groups. Thus, one could for example worry that the ad might end up buying highly
relevant slots among one group, and cheap irrelevant slots in the other group in order to satisfy the constraint.
Similar Treatment Similar treatment (ST) asks for an individual-level fairness guarantee: if two users j and k have the same non-sensitive feature vector, wj = wk, then they should be treated similarly regardless of the values of gj and gk. A simple version of this principle for ad auctions could be that we require xij = xik whenever wj = wk. However, if the feature space is large, some features are continuous, or we just want the guarantee to also hold when users are merely similar in terms of wj and wk, then we need a slightly more general constraint. Suppose we have a distance d(wj, wk) between feature vectors. Then
ST can be defined as
|xij − xik | ≤ d(wj , wk ).
With this definition, we are asking for more than just equality when wj = wk ; instead we also ask that the
difference between xij and xik should decrease smoothly as the non-sensitive feature vectors get closer to
each other, as measured by d.
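As a quick sketch (function and instance ours), the ST condition can be checked for a given allocation directly from this definition, for any distance d on the non-sensitive features:

import itertools

def satisfies_similar_treatment(x_i, w, dist):
    # check |x_ij - x_ik| <= d(w_j, w_k) for every pair of users (j, k)
    idx = range(len(x_i))
    return all(abs(x_i[j] - x_i[k]) <= dist(w[j], w[k])
               for j, k in itertools.combinations(idx, 2))

# one-dimensional non-sensitive feature with absolute-difference distance
print(satisfies_similar_treatment([0.5, 0.45, 0.1], [1.0, 0.9, 0.2],
                                  lambda a, b: abs(a - b)))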
However, this constraint is not easy to implement as part of an online allocation procedure, for two reasons. The first is that two-sided constraints such as this one are harder to handle as part of an online learning procedure than the simpler “packing constraints” needed for the budgets (less-than-or-equal constraints with only nonnegative coefficients). The second reason is that we do not know the normalizing factors until the end.
Now, our EG program maximizes the quasilinear EG objective, but over a smaller set of feasible allocations:
those that satisfy the statistical parity constraint across buyers in I.
The key to analyzing this new quasilinear EG variant is to use the Lagrange multipliers on Eq. (12.6). Let x be the optimal allocation, let p be the prices derived from the Lagrange multipliers on the supply constraints Eq. (12.5), and let λ be the Lagrange multiplier on Eq. (12.6). We will show that (x, p, λ) is a form of market equilibrium, where we charge each buyer i ∈ I a price of pj + λ for j ∈ A and a price of pj − λ for j ∈ B, while buyers i ∉ I are simply charged the price vector p. Clearly, this is not our usual notion of market equilibrium: we are charging two different sets of prices, one for buyers in I and one for buyers not in I.
First, consider some non-sensitive buyer i ∉ I. For such a buyer, we can show that xi ∈ Di(p) using the exact same argument as in the case of the standard quasilinear EG program in Theorem 24. Similarly, we can show that each item is fully allocated if pj > 0 using the same arguments as before. It is also immediate from feasibility that the statistical parity constraint is satisfied.
Given the above, we only need to see what happens for buyers i ∈ I. Ignoring the feasibility conditions, which are straightforward, the KKT conditions pertaining to buyer i are as follows:

1. Bi/ui = βi, i.e., ui = Bi/βi
2. βi ≤ 1
3. βi ≤ (pj ± λ)/vij
4. δi > 0 ⇒ βi = 1
5. xij > 0 ⇒ βi = (pj ± λ)/vij
Here, the ± should be interpreted as + for j ∈ A and − for j ∈ B. Now it is straightforward from KKT conditions 3 and 5 that buyer i buys only items with optimal price-per-utility under the prices pj ± λ. From here, the same argument as in Theorem 24 shows that buyer i spends their whole budget, which shows that they receive a bundle xi ∈ Di(p ± λ).
It follows from the above that (x, p, λ) is a market equilibrium (with different prices for I and [n] \ I),
and thus we can use the Lagrange multiplier λ as a tax/subsidy scheme in order to enforce statistical parity.
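As with the plain quasilinear EG program, this constrained variant can be prototyped with a convex solver and λ read off the dual of the parity constraint. The sketch below assumes an aggregate form of Eq. (12.6) in which the parity constraint is summed over the buyers in I (here both buyers), and the instance is made up:

import cvxpy as cp
import numpy as np

v = np.array([[1.0, 0.8, 0.2, 0.3],   # both buyers sensitive: I = {0, 1}
              [0.9, 1.0, 0.4, 0.2]])  # items 0-1 in group A, items 2-3 in group B
B = np.array([1.0, 1.0])
GA, GB = [0, 1], [2, 3]
n, m = v.shape

x = cp.Variable((n, m), nonneg=True)
delta = cp.Variable(n, nonneg=True)
u = cp.Variable(n)

utility_cons = u <= cp.sum(cp.multiply(v, x), axis=1) + delta
supply_cons = cp.sum(x, axis=0) <= 1                         # Eq. (12.5)
parity_cons = (cp.sum(x[:, GA]) / len(GA)
               == cp.sum(x[:, GB]) / len(GB))                # Eq. (12.6), aggregated

prob = cp.Problem(cp.Maximize(B @ cp.log(u) - cp.sum(delta)),
                  [utility_cons, supply_cons, parity_cons])
prob.solve()

p = supply_cons.dual_value
lam = parity_cons.dual_value  # tax on group-A prices, subsidy on group-B prices for I
print("prices:", p, "lambda:", lam)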
their definitions. They also study statistical parity in the classification context. A book-level treatment of
fairness in machine learning is given by Barocas et al. [7]. Many of these fairness notions were also previously
known in the education testing and psychometrics literature. See the biographical notes in Barocas et al. [7]
for an overview of these older works. The quasilinear Fisher market model with statistical parity constraints
via taxes and subsidies was studied in Peysakhovich et al. [76], which also studies several other fairness
questions in the context of Fisher markets. A related work is Jalota et al. [58]. This work does not study
fairness directly, but shows how per-buyer linear constraints can be implemented in a similar way to what
we describe in Section 12.5.
Bibliography
[1] Julia Angwin and Terry Parris Jr. Facebook lets advertisers exclude users by race. ProPublica, 2016. URL https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race.
[2] Julia Angwin, Noam Scheiber, and Ariana Tobin. Dozens of companies are using Facebook to exclude older workers from job ads. ProPublica, 2017. URL https://www.propublica.org/article/facebook-ads-age-discrimination-targeting.
[3] Santiago Balseiro, Anthony Kim, Mohammad Mahdian, and Vahab Mirrokni. Budget management
strategies in repeated auctions. In Proceedings of the 26th International Conference on World Wide
Web, pages 15–23, 2017.
[4] Santiago R Balseiro and Yonatan Gur. Learning in repeated auctions with budgets: Regret minimization
and equilibrium. Management Science, 65(9):3952–3968, 2019.
[5] Santiago R Balseiro, Omar Besbes, and Gabriel Y Weintraub. Repeated auctions with budgets in ad
exchanges: Approximations and design. Management Science, 61(4):864–884, 2015.
[6] Siddharth Barman, Sanath Kumar Krishnamurthy, and Rohit Vaish. Finding fair and efficient alloca-
tions. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 557–574,
2018.
[7] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org,
2019. http://www.fairmlbook.org.
[8] Amir Beck. First-order methods in optimization, volume 25. SIAM, 2017.
[9] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex
optimization. Operations Research Letters, 31(3):167–175, 2003.
[10] Xiaohui Bei, Jugal Garg, and Martin Hoefer. Ascending-price algorithms for unknown markets. ACM
Transactions on Algorithms (TALG), 15(3):1–33, 2019.
[11] Dimitri P Bertsekas, Angelia Nedić, and Asuman Ozdaglar. Convex analysis and optimization. Athena Scientific, 2003.
[12] Dimitris Bertsimas and John N Tsitsiklis. Introduction to linear optimization, volume 6. Athena Scientific, Belmont, MA, 1997.
[13] Benjamin Birnbaum, Nikhil R Devanur, and Lin Xiao. Distributed algorithms via gradient descent for Fisher markets. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 127–136. ACM, 2011.
[14] Christian Borgs, Jennifer Chayes, Nicole Immorlica, Kamal Jain, Omid Etesami, and Mohammad Mah-
dian. Dynamics of bid optimization in online advertisement auctions. In Proceedings of the 16th inter-
national conference on World Wide Web, pages 531–540, 2007.
[15] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold’em poker
is solved. Science, 347(6218):145–149, 2015.
[16] Stephen P Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
[17] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top
professionals. Science, 359(6374):418–424, 2018.
[18] Noam Brown and Tuomas Sandholm. Solving imperfect-information games via discounted regret mini-
mization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1829–1836,
2019.
[19] Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science, 365(6456):885–
890, 2019.
[20] Sébastien Bubeck et al. Convex optimization: Algorithms and complexity. Foundations and Trends®
in Machine Learning, 8(3-4):231–357, 2015.
[21] Eric Budish. The combinatorial assignment problem: Approximate competitive equilibrium from equal
incomes. Journal of Political Economy, 119(6):1061–1103, 2011.
[22] Eric Budish, Gérard P Cachon, Judd B Kessler, and Abraham Othman. Course match: A large-scale
implementation of approximate competitive equilibrium from equal incomes for combinatorial allocation.
Operations Research, 65(2):314–336, 2016.
[23] Neil Burch, Matej Moravčík, and Martin Schmid. Revisiting CFR+ and alternating updates. Journal of Artificial Intelligence Research, 64:429–443, 2019.
[24] Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D Procaccia, Nisarg Shah, and Junxing
Wang. The unreasonable fairness of maximum Nash welfare. In Proceedings of the 2016 ACM Conference
on Economics and Computation, pages 305–322. ACM, 2016.
[25] Ioannis Caragiannis, David Kurokawa, Hervé Moulin, Ariel D Procaccia, Nisarg Shah, and Junxing Wang. The unreasonable fairness of maximum Nash welfare. ACM Transactions on Economics and Computation (TEAC), 7(3):1–32, 2019.
[26] Antonin Chambolle and Thomas Pock. On the ergodic convergence rates of a first-order primal–dual
algorithm. Mathematical Programming, 159(1-2):253–287, 2016.
[27] Lihua Chen, Yinyu Ye, and Jiawei Zhang. A note on equilibrium pricing as convex optimization. In
International Workshop on Web and Internet Economics, pages 7–16. Springer, 2007.
[28] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):1–57, 2009.
[29] Xi Chen, Christian Kroer, and Rachitesh Kumar. The complexity of pacing for second-price auctions.
In Proceedings of the 2021 ACM Conference on Economics and Computation, 2021.
[30] Yun Kuen Cheung, Richard Cole, and Nikhil R Devanur. Tatonnement beyond gross substitutes? Gradient descent to the rescue. Games and Economic Behavior, 2019.
[31] Mark Cieliebak, Stephan J Eidenbenz, Aris Pagourtzis, and Konrad Schlude. On the complexity of
variations of equal sum subsets. Nord. J. Comput., 14(3):151–172, 2008.
[32] Edward H Clarke. Multipart pricing of public goods. Public choice, pages 17–33, 1971.
[33] Richard Cole and Lisa Fleischer. Fast-converging tatonnement algorithms for one-time and ongoing
market problems. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages
315–324, 2008.
[34] Richard Cole, Nikhil R Devanur, Vasilis Gkatzelis, Kamal Jain, Tung Mai, Vijay V Vazirani, and Sadra
Yazdanbod. Convex program duality, fisher markets, and Nash social welfare. In 18th ACM Conference
on Economics and Computation, EC 2017. Association for Computing Machinery, Inc, 2017.
[35] Vincent Conitzer and Tuomas Sandholm. New complexity results about Nash equilibria. Games and
Economic Behavior, 63(2):621–641, 2008.
[36] Vincent Conitzer, Christian Kroer, Eric Sodomka, and Nicolás E Stier-Moses. Multiplicative pacing
equilibria in auction markets. In International Conference on Web and Internet Economics, 2018.
[37] Vincent Conitzer, Christian Kroer, Debmalya Panigrahi, Okke Schrijvers, Eric Sodomka, Nicolas E
Stier-Moses, and Chris Wilkens. Pacing equilibrium in first-price auction markets. In Proceedings of the
2019 ACM Conference on Economics and Computation. ACM, 2019.
[38] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of comput-
ing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
[39] Gerard Debreu. A social equilibrium existence theorem. Proceedings of the National Academy of Sci-
ences, 38(10):886–893, 1952.
[40] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through
awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages
214–226. ACM, 2012.
[41] Benjamin Edelman and Michael Ostrovsky. Strategic bidder behavior in sponsored search auctions.
Decision support systems, 43(1):192–198, 2007.
[42] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized
second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):
242–259, 2007.
[43] Edmund Eisenberg. Aggregation of utility functions. Management Science, 7(4):337–350, 1961.
[44] Edmund Eisenberg and David Gale. Consensus of subjective probabilities: The pari-mutuel method.
The Annals of Mathematical Statistics, 30(1):165–168, 1959.
[45] Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of
the National Academy of Sciences of the United States of America, 38(2):121, 1952.
[46] Fei Fang, Thanh H Nguyen, Rob Pickles, Wai Y Lam, Gopalasamy R Clements, Bo An, Amandeep
Singh, Brian C Schwedock, Milin Tambe, and Andrew Lemieux. Paws—a deployed game-theoretic
application to combat poaching. AI Magazine, 38(1):23–36, 2017.
[47] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Online convex optimization for sequential
decision processes and extensive-form games. In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 33, pages 1917–1925, 2019.
[48] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Optimistic regret minimization for extensive-
form games via dilated distance-generating functions. In Advances in Neural Information Processing
Systems, pages 5222–5232, 2019.
[49] Gabriele Farina, Christian Kroer, and Tuomas Sandholm. Stochastic regret minimization in extensive-
form games. In International Conference on Machine Learning. PMLR, 2020.
[50] Arthur Flajolet and Patrick Jaillet. Real-time bidding with side information. In Proceedings of the
31st International Conference on Neural Information Processing Systems, pages 5168–5178. Curran
Associates Inc., 2017.
[51] Yuan Gao, Christian Kroer, and Donald Goldfarb. Increasing iterate averaging for solving saddle-point
problems. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
[52] Mohammad Ghodsi, MohammadTaghi HajiAghayi, Masoud Seddighin, Saeed Seddighin, and Hadi
Yami. Fair allocation of indivisible goods: Improvements and generalizations. In Proceedings of the
2018 ACM Conference on Economics and Computation, pages 539–556, 2018.
[53] Itzhak Gilboa and Eitan Zemel. Nash and correlated equilibria: Some complexity considerations. Games
and Economic Behavior, 1(1):80–93, 1989.
[54] Irving L Glicksberg. A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.
[55] Jonathan Goldman and Ariel D Procaccia. Spliddit: Unleashing fair division algorithms. ACM SIGecom
Exchanges, 13(2):41–46, 2015.
[56] Theodore Groves. Incentives in teams. Econometrica: Journal of the Econometric Society, pages 617–
631, 1973.
[57] Samid Hoda, Andrew Gilpin, Javier Pena, and Tuomas Sandholm. Smoothing techniques for computing
Nash equilibria of sequential games. Mathematics of Operations Research, 35(2):494–512, 2010.
[58] Devansh Jalota, Marco Pavone, Qi Qi, and Yinyu Ye. Fisher markets with linear constraints: Equi-
librium properties and efficient distributed algorithms. Games and Economic Behavior, 141:223–260,
2023.
[59] Tinne Hoff Kjeldsen. John von Neumann’s conception of the minimax theorem: a journey through different mathematical contexts. Archive for History of Exact Sciences, 56(1):39–68, 2001.
[60] Daphne Koller, Nimrod Megiddo, and Bernhard von Stengel. Efficient computation of equilibria for
extensive two-person games. Games and economic behavior, 14(2):247–259, 1996.
[62] Christian Kroer and Alexander Peysakhovich. Scalable fair division for ‘at most one’ preferences. arXiv preprint arXiv:1909.10925, 2019.
[63] Christian Kroer, Gabriele Farina, and Tuomas Sandholm. Solving large sequential games with the
excessive gap technique. In Advances in Neural Information Processing Systems, pages 864–874, 2018.
[64] Christian Kroer, Alexander Peysakhovich, Eric Sodomka, and Nicolas E Stier-Moses. Computing large
market equilibria using abstractions. In Proceedings of the 2019 ACM Conference on Economics and
Computation, pages 745–746, 2019.
[65] Christian Kroer, Kevin Waugh, Fatma Kılınç-Karzan, and Tuomas Sandholm. Faster algorithms for
extensive-form game solving via improved smoothing functions. Mathematical Programming, pages 1–33,
2020.
[66] David Kurokawa, Ariel D Procaccia, and Junxing Wang. Fair enough: Guaranteeing approximate
maximin shares. Journal of the ACM (JACM), 65(2):1–27, 2018.
[67] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret
minimization in extensive games. In Advances in neural information processing systems, pages 1078–
1086, 2009.
[68] Euiwoong Lee. APX-hardness of maximizing Nash social welfare with indivisible items. Information Processing Letters, 122:17–20, 2017.
[69] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.
[70] MOSEK ApS. The MOSEK optimization software, 2010. URL http://www.mosek.com.
[71] Arkadi Nemirovsky and David Borisovich Yudin. Problem complexity and method efficiency in opti-
mization. 1983.
[72] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical programming, 120
(1):221–259, 2009.
[73] Yurii Nesterov and Vladimir Shikhman. Computation of fisher–gale equilibrium by auction. Journal of
the Operations Research Society of China, 6(3):349–389, 2018.
[74] Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V Vazirani. Algorithmic game theory. Cambridge
University Press, 2007.
[75] Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
[76] Alexander Peysakhovich, Christian Kroer, and Nicolas Usunier. Implementing fairness constraints in
markets using taxes and subsidies. In Proceedings of the 2023 ACM Conference on Fairness, Account-
ability, and Transparency, pages 916–930, 2023.
[77] IV Romanovskii. Reduction of a game with full memory to a matrix game. Doklady Akademii Nauk SSSR, 144(1):62–64, 1962.
[78] Tim Roughgarden. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016.
[79] Sheryl Sandberg. Doing more to protect against discrimination in housing, employ-
ment and credit advertising. Facebook, 2019. URL https://about.fb.com/news/2019/03/
protecting-against-discrimination-in-ads/.
[80] Maurice Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
[81] Kalyan Talluri and Garrett Van Ryzin. An analysis of bid-price controls for network revenue management. Management Science, 44(11-part-1):1577–1593, 1998.
[82] Milind Tambe. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge
university press, 2011.
[83] Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas
hold’em. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
[84] Hal R Varian. Position auctions. International Journal of Industrial Organization, 25(6):1163–1178, 2007.
[85] Hal R Varian and Christopher Harris. The VCG auction in theory and practice. American Economic Review, 104(5):442–445, 2014.
[86] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37, 1961.
[87] John von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.
[88] John von Neumann. On the theory of games of strategy. Contributions to the Theory of Games, 4:13–42, 1959.
[89] Bernhard von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior,
14(2):220–246, 1996.
[90] Haifeng Xu. The mysteries of security games: Equilibrium computation becomes combinatorial al-
gorithm design. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages
497–514. ACM, 2016.
[91] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization
in games with incomplete information. In Advances in neural information processing systems, pages
1729–1736, 2007.