Differential Economics and Deep Learning
duetting, zhef, [email protected]
John A. Paulson School of Engineering and Applied Sciences, Harvard University
parkes, [email protected]
Abstract
Designing an incentive compatible auction that maximizes expected revenue is an intricate
task. The single-item case was resolved in a seminal piece of work by Myerson in 1981, but more
than 40 years later a full analytical understanding of the optimal design still remains elusive for
settings with two or more items. In this work, we initiate the exploration of the use of tools from
deep learning for the automated design of optimal auctions. We model an auction as a multi-layer
neural network, frame optimal auction design as a constrained learning problem, and show how it
can be solved using standard machine learning pipelines. In addition to providing generalization
bounds, we present extensive experimental results, recovering essentially all known solutions
that come from the theoretical analysis of optimal auction design problems and obtaining novel
mechanisms for settings in which the optimal mechanism is unknown.
1 Introduction
Optimal auction design is one of the cornerstones of economic theory. It is of great practical
importance as auctions are used across industries and in the public sector to organize the sale
of products and services. Concrete examples are the U.S. FCC Incentive Auction, the sponsored
∗ David Parkes is currently on sabbatical at DeepMind as a research scientist. This work is supported in part
through NSF award CCF-1841550, as well as a Google Fellowship for Zhe Feng. We thank Zihe Wang (Shanghai
University of Finance and Economics) for pointing out that the combinatorial feasible definition in the ICML’19
published version of the extended abstract of this paper need not imply an integer decomposition. We would like
to thank Dirk Bergemann, Yang Cai, Vincent Conitzer, Yannai Gonczarowski, Constantinos Daskalakis, Glenn Elli-
son, Sergiu Hart, Ron Lavi, Kevin Leyton-Brown, Shengwu Li, Noam Nisan, Parag Pathak, Alexander Rush, Karl
Schlag, Zihe Wang, Alex Wolitzky, participants in the Economics and Computation Reunion Workshop at the Si-
mons Institute, the NIPS’17 Workshop on Learning in the Presence of Strategic Behavior, a Dagstuhl Workshop
on Computational Learning Theory meets Game Theory, the EC’18 Workshop on Algorithmic Game Theory and
Data Science, the Annual Congress of the German Economic Association, participants in seminars at LSE, Technion,
Hebrew, Google, HBS, MIT, and the anonymous reviewers on earlier versions of this paper for their helpful feedback.
The first version of this paper, originally titled as “Optimal Auctions through Deep Learning”, was posted on arXiv
on June 12, 2017. An extended abstract appeared in ICML’19 (Dütting et al., 2019), along with a short Research
Highlight in the Comm. ACM (Dütting et al., 2021). The source code for all experiments is available from Github at
https://github.com/saisrivatsan/deep-opt-auctions.
search auctions conducted by search engines such as Google, and the auctions run on platforms
such as eBay. In the standard independent private valuations model, each bidder has a valuation
function over subsets of items, drawn independently from not necessarily identical distributions.
It is assumed that the auctioneer knows the value distributions and can use this information in
designing the auction. A challenge is that valuations are private, and bidders may not report their
valuations truthfully.
In a seminal piece of work, Myerson resolved the optimal auction design problem when there
is a single item for sale (Myerson, 1981). Today, after 40 years of intense research, there are
some elegant partial characterizations (Manelli and Vincent, 2006; Pavlov, 2011; Haghpanah and
Hartline, 2019; Giannakopoulos and Koutsoupias, 2018; Daskalakis et al., 2017; Yao, 2017), but the
analytical problem of optimal design is not completely resolved even for a setting with two bidders
and two items. At the same time, there have been impressive algorithmic advances (Cai et al.,
2012b,a, 2013; Hart and Nisan, 2017; Babaioff et al., 2014; Yao, 2015; Cai and Zhao, 2017; Chawla
et al., 2010), although most of them apply to the weaker notion of Bayesian incentive compatibility
(BIC). Our focus in this paper is on auctions that satisfy dominant-strategy incentive compatibility
(DSIC), which is a more robust and desirable notion of incentive compatibility.
A recent line of work has started to bring in tools from machine learning and computational
learning theory to design auctions from samples of bidder valuations. Much of the effort has fo-
cused on analyzing the sample complexity of designing revenue-maximizing auctions (Cole and
Roughgarden, 2014; Mohri and Medina, 2016; Huang et al., 2018; Morgenstern and Roughgar-
den, 2015; Gonczarowski and Nisan, 2017; Morgenstern and Roughgarden, 2016; Syrgkanis, 2017;
Gonczarowski and Weinberg, 2018; Balcan et al., 2016). A handful of works has leveraged machine
learning pipelines to optimize different aspects of mechanisms (Lahaie, 2011; Dütting et al., 2014;
Narasimhan et al., 2016), but none of these provides the generality and flexibility of our approach.
There have also been other computational approaches to auction design, under the research program
of automated mechanism design (Conitzer and Sandholm, 2002, 2004; Sandholm and Likhodedov,
2015) (to which the present paper contributes), but where scalable, they are limited to specialized
classes of auctions that are already known to be incentive compatible.
regret, which is equivalent to DSIC up to measure zero events. For this, we make use of augmented
Lagrangian optimization during training, which has the effect of introducing into the loss function
penalty terms that correspond to violations of incentive compatibility. In this way, we minimize
during training a combination of negated revenue and a penalty term for IC violations. We refer to
this neural network architecture as RegretNet. This approach is applicable to multi-bidder multi-
item settings for which we do not have tractable characterizations of IC mechanisms, but will
generally only find mechanisms that are approximately incentive compatible.
We show through extensive experiments that these two approaches are capable of recovering
the designs of essentially all auctions for which theoretical solutions have been developed over the
past 40 years, and in the case of RegretNet, we show that the degree of approximation to DSIC
is very good. We also demonstrate that this deep learning framework is a useful tool for refuting
hypotheses or generating supporting evidence in regard to the conjectured structure of optimal
auctions, and that in the case of RochetNet this framework can be used to discover designs that
can then be proved to be optimal. We also give generalization bounds that provide confidence
intervals on the expected revenue and expected ex post regret, in terms of the empirical revenue
and empirical regret achieved during training, the descriptive complexity of the neural network
used to encode the allocation and payment rules, and the number of samples used to train the
network.
1.2 Discussion
While the original work on automated mechanism design (AMD) framed the problem as a linear
program (LP) (Conitzer and Sandholm, 2002, 2004), this has severe scalability issues as the formulation scales exponentially in the number of agents and items (Guo and Conitzer, 2010). We provide a detailed comparison with an LP-based framework, and find that even for a small setting with two bidders and two items (and a discretization of bidder values into eleven bins per item), the corresponding LP takes 62 hours to complete since the LP needs to handle ≈ 9 × 10^5 decision variables and ≈ 3.6 × 10^6 constraints.
In comparison, differentiable economics leverages the expressive power of neural networks and
the ability to enforce complex constraints using a standard machine learning pipeline. This provides
for optimization over a broad class of mechanisms without needing to resort to a discretized function
representation, and is constrained only by the expressivity of the neural network architecture. For
the same setting, our approach finds an auction with low regret in just over 3.7 hours (see Table
13). Moreover, the LP based approach fails to scale much beyond this point while the neural
network-based approach continues to scale.
The optimization problems studied here are non-convex and gradient-based approaches may,
in general, get stuck in local optima. Empirically, however, this has not been an obstacle to the
successful application of deep learning in other problem domains, and there is theoretical support
for a “no local optima” phenomenon (see, e.g., Choromanska et al. (2015); Kawaguchi (2016); Du
et al. (2019); Allen-Zhu et al. (2019)). We make similar observations for our experiments: our
neural network architectures recover optimal solutions, wherever known, despite the formulation
being non-convex.
In the case of RegretNet, our framework only provides a guarantee of approximate DSIC. In
this regard, we work with expected ex post regret, which is a quantifiable relaxation of DSIC that
was first introduced in (Dütting et al., 2014). An essential aspect is that it quantifies the regret to
bidders for truthful bidding given knowledge of the bids of others (hence “ex post”), and thus is a
quantity that measures the degree of approximation to DSIC. Indeed, our experiments suggest that
this relaxation is a very effective tool for approximating optimal DSIC auctions, with RegretNet
attaining a very good fit to known theoretical results.
This work also shows that this neural-network based pipeline can be used to discover new
analytical results (see Section 5.5, where we use computational results to guess the analytical
structure of an optimal design and duality theory to verify its optimality).
1.4 Organization
Section 2 formulates the auction design problem as a learning problem, introduces the characterization-based and characterization-free approaches, and gives the main generalization bounds. Section 3
introduces the network architectures of RochetNet and RegretNet, and instantiates the specific gen-
eralization bound for these networks. Section 4 describes the training and optimization procedures,
and Section 5 presents extensive experimental results including experiments that provide support
for theoretical conjectures in regard to the design of optimal auctions along with the discovery of
new, provably-optimal auction designs. Section 6 concludes.
2 Auction Design as a Learning Problem
2.1 Preliminaries
We consider a setting with a set of n bidders N = {1, . . . , n} and m items M = {1, . . . , m}. Each bidder i has a valuation function v_i : 2^M → R_{≥0}, where v_i(S) denotes the bidder's value for the subset of items S ⊆ M.
In the simplest case, a bidder may have additive valuations, with a value v_i({j}) for each item j ∈ M, and a value for a subset of items S ⊆ M that is v_i(S) = Σ_{j∈S} v_i({j}). Alternatively, if a bidder's value for a subset of items S ⊆ M is v_i(S) = max_{j∈S} v_i({j}), the bidder has a unit-demand valuation. We also consider bidders with general combinatorial valuations, but defer the details to Appendices A.2 and B.3.
Bidder i's valuation function is drawn independently from a distribution F_i over possible valuation functions V_i. We write v = (v_1, . . . , v_n) for a profile of valuations, and denote V = Π_{i=1}^n V_i. The auctioneer knows the distributions F = (F_1, . . . , F_n), but does not know the bidders' realized valuation profile v. The bidders report their valuations (perhaps untruthfully), and an auction decides on
an allocation of items to the bidders and charges a payment to them.
We denote an auction (g, p) as a pair of allocation rules g_i : V → 2^M and payment rules p_i : V → R_{≥0} (these rules can be randomized). Given bids b = (b_1, . . . , b_n) ∈ V, the auction computes an allocation g(b) ∈ 2^M and payments p(b) ∈ R^n_{≥0}.
A bidder with valuation v_i receives utility u_i(v_i; b) = v_i(g_i(b)) − p_i(b) at bid profile b. Let v_{−i} denote the valuation profile v = (v_1, . . . , v_n) without element v_i, similarly for b_{−i}, and let V_{−i} = Π_{j≠i} V_j denote the possible valuation profiles of bidders other than bidder i. An auction is dominant strategy incentive compatible (DSIC) if each bidder's utility is maximized by reporting truthfully no matter what the other bidders report. That is, u_i(v_i; (v_i, b_{−i})) ≥ u_i(v_i; (b_i, b_{−i})) for every bidder i, every valuation v_i ∈ V_i, every bid b_i ∈ V_i, and all bids b_{−i} ∈ V_{−i} from others.
An auction is ex post individually rational (IR) if each bidder receives a non-negative utility when participating truthfully, i.e., u_i(v_i; (v_i, b_{−i})) ≥ 0 for all i ∈ N, v_i ∈ V_i, and b_{−i} ∈ V_{−i}.
In a DSIC auction, it is in the best interest of each bidder to report truthfully, and so the equilibrium revenue on valuation profile v is simply Σ_i p_i(v). Optimal auction design seeks to identify a DSIC auction that maximizes expected revenue.
There is also a weaker notion of incentive compatibility, Bayesian incentive compatibility (BIC). An auction is BIC if each bidder's utility is maximized by reporting truthfully when the other bidders also report truthfully, i.e., E_{v_{−i}}[u_i(v_i; (v_i, v_{−i}))] ≥ E_{v_{−i}}[u_i(v_i; (b_i, v_{−i}))] for every bidder i, every valuation v_i ∈ V_i, and every bid b_i ∈ V_i. In this work, we focus on DSIC auctions rather than BIC auctions, since DSIC auctions are preferable in practice: truthful bidding remains an equilibrium without common knowledge of the distributions on valuations or common knowledge of rationality.
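As a self-contained illustration of the DSIC condition just defined (a toy example of our own, not part of the framework developed below), the following sketch numerically spot-checks that truthful bidding is a dominant strategy in a single-item second-price auction, by comparing the truthful utility against a grid of possible misreports:

```python
import random

def second_price(bids):
    """Single-item second-price (Vickrey) auction: the highest bid wins and
    pays the second-highest bid; ties are broken by lowest index."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    second = max(b for i, b in enumerate(bids) if i != winner)
    alloc = [1.0 if i == winner else 0.0 for i in range(len(bids))]
    pay = [second if i == winner else 0.0 for i in range(len(bids))]
    return alloc, pay

def utility(i, v_i, bids):
    """u_i(v_i; b) = v_i(g_i(b)) - p_i(b) for this single-item auction."""
    alloc, pay = second_price(bids)
    return v_i * alloc[i] - pay[i]

random.seed(0)
grid = [k / 20 for k in range(21)]   # coarse grid of possible misreports
for _ in range(200):
    v = [random.random() for _ in range(3)]
    for i in range(3):
        truthful = utility(i, v[i], v)
        deviations = [utility(i, v[i], v[:i] + [b] + v[i + 1:]) for b in grid]
        assert max(deviations) <= truthful + 1e-12   # no profitable misreport
print("truthful bidding passed the dominant-strategy spot check")
```

The same harness, run on a non-DSIC rule (e.g., a first-price auction), would find profitable deviations; this "search over misreports" viewpoint is essentially what the regret-based training in later sections automates with gradients.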
−E[Σ_{i∈N} p_i^w(v)], among all auctions in M that satisfy DSIC (or just IC). For a single-bidder setting, there is no difference between DSIC and BIC.
We present two approaches for achieving IC. In the first, we leverage a characterization result
to constrain the search space so that all mechanisms within this class are IC. In the second, we
replace the IC constraints with a differentiable approximation, and move the constraints into the
objective via the augmented Lagrangian method. The first approach affords a smaller search space
and is exactly DSIC, but only applies to single-bidder multi-item settings. The second approach
applies to multi-bidder, multi-item settings, but entails search through a larger parametric space
and only achieves approximate IC.
In Appendix A.1, we also describe a construction based on Myerson (1981)’s characterization
result for multi-bidder single-item settings, which we refer to as MyersonNet.
where g_j(v) ∈ {0, 1} indicates whether or not the bidder is assigned item j.
We can consider a menu of J choices, for some J ≥ 1, where each choice consists of a possibly randomized allocation, together with a price. For choice j ∈ [J], let α_j ∈ [0, 1]^m specify the randomized allocation, and let parameter β_j ∈ R specify the negated price. By choosing the menu item that maximizes the bidder's utility, or the null (no allocation, no payment) outcome when this is better, a menu of size J induces the following utility function:

u(v) = max{ max_{j∈[J]} {α_j · v + β_j}, 0 }.   (2)
The well known taxation principle from mechanism design theory tells us that a mechanism
that selects the menu choice that maximizes an agent’s reported utility, based on its bid b ∈ Rm , is
DSIC (Hammond, 1979; Guesnerie, 1995). To see this, observe that the menu does not depend on
the reports, and that the agent will maximize its utility by reporting its true valuation function so
that the right choice is made on its behalf. Moreover, the taxation principle also tells us that the
use of a menu is without loss of generality for DSIC mechanisms.
Based on this, for a given J ≥ 1, we seek to learn a mechanism with parameters w = (α, β), where α ∈ [0, 1]^{mJ} and β ∈ R^J, to maximize the expected revenue E_{v∼F}[−β_{j*(v)}], where j*(v) ∈ argmax_{j∈[J]} {α_j · v + β_j} denotes the best choice for the bidder and choice 0 corresponds to the null outcome. For a unit-demand bidder, the utility can also be represented via (1), with the additional constraint that Σ_j g_j(v) ≤ 1, ∀v. We discuss this more in Section 3.1.
We also have the following characterization of DSIC mechanisms for the single bidder case.
Theorem 2.1 (Rochet (1987)). The utility function u : R^m_{≥0} → R that is induced by a DSIC mechanism for a single bidder is 1-Lipschitz w.r.t. the ℓ_1-norm, non-decreasing, and convex.
Figure 1: An induced utility function represented by RochetNet for the case of a single item (m = 1) and a menu with four choices (J = 4); the utility u(b) is the maximum of the four lines h_1, . . . , h_4 and the zero line u(b) = 0.
The convexity can be understood by recognizing that the induced utility function (2) is the
maximum over a set of hyperplanes, each corresponding to a choice in the menu set. Figure 1
illustrates Rochet’s theorem for a single item (m = 1) and a menu consisting of four choices
(J = 4). Here, the induced utility for choice j given bid b ∈ R is hj (b) = αj · b + βj .
Given this, to find the optimal single-bidder auction we can search over a suitably sized menu
set and pick the one that maximizes expected revenue. In Section 3.1 we explain how to achieve
this by modeling the utility function as a neural network, and formulating the above optimization
as a differentiable learning problem.
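Before turning to the neural network formulation, the menu representation itself can be sketched directly (a toy instantiation of our own, with arbitrary untrained parameters): the induced utility (2), the bidder's best menu choice, and the revenue collected from that choice.

```python
import numpy as np

rng = np.random.default_rng(0)
m, J = 2, 5                                  # items and menu size (toy values)
alpha = rng.uniform(0.0, 1.0, size=(J, m))   # randomized allocation per choice
beta = -rng.uniform(0.0, 1.0, size=J)        # negated prices (prices are -beta)

def induced_utility(v):
    """u(v) = max( max_j alpha_j . v + beta_j , 0 ), as in (2)."""
    return max(float(np.max(alpha @ v + beta)), 0.0)

def best_choice(v):
    """Utility-maximizing menu choice; -1 encodes the null outcome."""
    utils = alpha @ v + beta
    j = int(np.argmax(utils))
    return j if utils[j] >= 0.0 else -1

def revenue(v):
    j = best_choice(v)
    return -beta[j] if j >= 0 else 0.0

# Monte Carlo estimate of the expected revenue of this (random, untrained) menu
V = rng.uniform(0.0, 1.0, size=(10_000, m))
print("estimated expected revenue:", np.mean([revenue(v) for v in V]))
```

Because u is a maximum of affine functions, it is automatically non-negative, monotone, and convex, so by Theorem 2.1 any such menu is a DSIC mechanism; training only has to search over α and β.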
and seek to minimize the empirical loss (negated revenue) subject to the empirical regret being zero for all bidders, which gives the following formulation:

min_{w∈R^d}  −(1/L) Σ_{ℓ=1}^L Σ_{i=1}^n p_i^w(v^(ℓ))
s.t.  rgt̂_i(w) = 0, ∀i ∈ N.   (4)
We additionally require the auction to satisfy IR, which can be ensured by restricting the
search space to a class of parametrized auctions that charge no bidder more than her valuation for
an allocation.
In Section 3, we model the allocation and payment rules through a neural network, and incor-
porate the IR requirement within the architecture. In Section 4 we describe how the IC constraints
can be incorporated into the objective using Lagrange multipliers, so that the resulting neural net
can be trained with standard pipelines.
Definition 2.1 (Quantile-based ex post regret). For each bidder i, and q with 0 < q < 1, the q-quantile-based ex post regret, rgt^q_i(w), induced by the probability distribution F on valuation profiles, is defined as the smallest x such that

P( max_{v'_i∈V_i} u_i^w(v_i; (v'_i, v_{−i})) − u_i^w(v_i; (v_i, v_{−i})) ≥ x ) ≤ q.
We can bound the q-quantile-based regret rgt^q_i(w) by the expected ex post regret rgt_i(w), as in the following lemma. The proof appears in Appendix D.1.

Lemma 2.1. For any fixed q, 0 < q < 1, and bidder i, we can bound the q-quantile-based ex post regret by

rgt^q_i(w) ≤ rgt_i(w) / q.
Using Lemma 2.1, we can show, for example, that when the expected ex post regret is 0.001, the probability that the ex post regret exceeds 0.01 is at most 10%.
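Lemma 2.1 is a direct application of Markov's inequality to the (nonnegative) ex post regret; in the paper's notation, the step can be written out as:

```latex
% Let R_i denote bidder i's ex post regret at (v_i, v_{-i}), a nonnegative
% random variable:
%   R_i = \max_{v_i' \in V_i} u_i^w\big(v_i; (v_i', v_{-i})\big)
%         - u_i^w\big(v_i; (v_i, v_{-i})\big).
% Markov's inequality gives, for any x > 0,
\Pr[R_i \ge x] \;\le\; \frac{\mathbb{E}[R_i]}{x} \;=\; \frac{rgt_i(w)}{x}.
% Taking x = rgt_i(w)/q makes the right-hand side equal to q, so the smallest
% x with \Pr[R_i \ge x] \le q is at most rgt_i(w)/q, i.e.,
%   rgt_i^q(w) \le rgt_i(w)/q.
% Example: rgt_i(w) = 0.001 and x = 0.01 give \Pr[R_i \ge 0.01] \le 0.1.
```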
We measure the capacity of an auction class M using a definition of covering numbers from the ranking literature (Rudin and Schapire, 2009). For this, define the ℓ_{∞,1} distance between auctions (g, p), (g', p') ∈ M as

max_{v∈V} [ Σ_{i∈N, j∈M} |g_ij(v) − g'_ij(v)| + Σ_{i∈N} |p_i(v) − p'_i(v)| ].

For any ε > 0, let N_∞(M, ε) be the minimum number of balls of radius ε required to cover M under the ℓ_{∞,1} distance.
Theorem 2.2. For each bidder i, assume that the valuation function v_i satisfies v_i(S) ≤ 1, ∀S ⊆ M. Let M be a class of auctions that satisfy individual rationality. Fix δ ∈ (0, 1). With probability at least 1 − δ over the draw of a sample S of L profiles from F, for any (g^w, p^w) ∈ M,

E_{v∼F}[ Σ_{i∈N} p_i^w(v) ] ≥ (1/L) Σ_{ℓ=1}^L Σ_{i=1}^n p_i^w(v^(ℓ)) − 2n∆_L − Cn √(log(1/δ)/L),

and

(1/n) Σ_{i=1}^n rgt_i(w) ≤ (1/n) Σ_{i=1}^n rgt̂_i(w) + 2∆_L + C' √(log(1/δ)/L),

where ∆_L = inf_{ε>0} { ε/n + 2 √( 2 log(N_∞(M, ε/2n)) / L ) } and C, C' are distribution-independent constants.
See Appendix D.2 for the proof. If the term ∆L in the above bound goes to zero as the sample
size L increases then the above bounds go to zero as L → ∞. In Theorem 3.1 in Section 3, we
bound ∆L for the neural network architectures we present in this work.
Figure 2: RochetNet: a neural network representation of a non-negative, monotone, convex induced utility function. The bids b_1, . . . , b_m feed into J linear units h_j(b) = α_j · b + β_j, with α_j ∈ [0, 1]^m, and a max unit over these and the constant 0 outputs u(b).
By using a large number of hyperplanes, one can use this neural network architecture to search
over a sufficiently rich class of DSIC and IR auctions for the single-bidder, multi-item setting. Given
the RochetNet construction, we seek to minimize the negated, expected revenue, E_{v∼F}[β_{j*(v)}]. To ensure that the objective is a continuous function of the parameters α and β, we adopt during training a softmax operation in place of the argmax, and the following loss function:

L(α, β) = −E_{v∼F}[ Σ_{j∈[J]} β_j ∇̃_j(v) ],   (5)

where

∇̃_j(v) = softmax_j( κ · (α_1 · v + β_1), . . . , κ · (α_J · v + β_J) ),   (6)

and κ > 0 is a constant that controls the quality of the approximation. Here, the softmax function, softmax_j(κx_1, . . . , κx_J) = e^{κx_j} / Σ_{j'} e^{κx_{j'}}, takes as input J real numbers and returns a probability distribution of J probabilities, proportional to the exponentials of the inputs. We only apply this approximation during training, and always use the argmax during testing, to guarantee that the mechanism is DSIC.
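The effect of the softmax surrogate can be sketched as follows (toy menu values of our own, for a single item; the null choice is omitted for simplicity): with a large κ, the soft weights concentrate almost entirely on the argmax choice, while remaining differentiable in the parameters.

```python
import numpy as np

def soft_choice(utils, kappa):
    """softmax_j(kappa*u_1, ..., kappa*u_J): a differentiable surrogate
    for the argmax over menu choices."""
    z = kappa * (np.asarray(utils) - np.max(utils))  # shift for stability
    e = np.exp(z)
    return e / e.sum()

# toy single-item menu with J = 3 choices (arbitrary illustrative values)
alpha = np.array([[0.0], [0.5], [1.0]])
beta = np.array([0.0, -0.1, -0.4])
v = np.array([0.7])
utils = alpha @ v + beta                 # utility of each choice at valuation v

hard = int(np.argmax(utils))             # exact rule, used at test time (DSIC)
soft = soft_choice(utils, kappa=1000.0)  # smooth rule, used during training

assert soft[hard] > 0.99                 # large kappa concentrates on the argmax
```

Smaller values of κ spread the weights over near-optimal choices, which smooths the loss surface at the cost of a larger gap between the trained surrogate and the argmax mechanism deployed at test time.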
During training, we seek to optimize the parameters of the neural network, i.e., α ∈ [0, 1]mJ ,
and β ∈ RJ , to minimize loss (5). For this, given a sample S = {v (1) , . . . , v (L) } drawn from F , we
use stochastic gradient descent to optimize an empirical version of the loss.
This approach easily extends to a single bidder with a unit-demand valuation. In this case, the new requirement is that the sum of the allocation probabilities cannot exceed one. This can be enforced by restricting the coefficients for each hyperplane to sum to at most one, i.e., Σ_{k=1}^m α_jk ≤ 1, ∀j ∈ [J], and α_jk ≥ 0, ∀j ∈ [J], k ∈ [m]. To achieve this constraint, we can re-parameterize α_jk as softmax_k(γ_j1, · · · , γ_jm), where γ_jk ∈ R, ∀j ∈ [J], k ∈ [m]. With this restriction, the resulting mechanism is DSIC for unit-demand bidders since the selected menu choice corresponds to a distribution over single-item allocations.²
Figure 3: The allocation component and payment component of the RegretNet neural network for a setting with n additive bidders and m items. The inputs are bids b_ij from each bidder for each item; the allocation component outputs per-item allocation probabilities z_ij through softmax layers, and the payment component outputs fractions p̃_i ∈ [0, 1] through sigmoid units, with payments p_i = p̃_i Σ_{j=1}^m z_ij b_ij. The revenue rev and expected ex post regret rgt_i are defined as a function of the parameters of the allocation component and payment component w = (w_g, w_p).
Figure 4: The allocation component of the RegretNet neural network for settings with n unit-demand bidders and m items. Two sets of scores s and s' are computed from the bids and normalized by softmax functions along the two axes; the allocation probabilities are z_ij = min{s̄_ij, s̄'_ij}.
component such that the matrix of output probabilities [z_ij]_{i,j} is doubly stochastic.⁴ In particular, the allocation component computes two sets of scores s_ij and s'_ij. Let s, s' ∈ R^{nm} denote the corresponding matrices. The first set of scores is normalized along the rows and the second set of scores is normalized along the columns. Both normalizations can be performed by
passing these scores through softmax functions. The allocation for bidder i and item j is then
computed as the minimum of the corresponding normalized scores:
z_ij = φ^DS_ij(s, s') = min( e^{s_ij} / Σ_{k=1}^{n+1} e^{s_kj} ,  e^{s'_ij} / Σ_{k=1}^{m+1} e^{s'_ik} ),
where indices n + 1 and m + 1 denote dummy inputs that correspond to an item not being allocated
to any bidder and a bidder not being allocated any item, respectively.
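A minimal sketch of this construction (orientation conventions, shapes, and variable names here are our own) shows why the min of the two normalizations yields row and column sums at most one:

```python
import numpy as np

def softmax_over_bidders(s):
    """Normalize scores over the bidder axis, separately for each item."""
    e = np.exp(s - s.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def softmax_over_items(s):
    """Normalize scores over the item axis, separately for each bidder."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def phi_ds(s, s_prime, n, m):
    """z_ij = min of the two normalized scores; the extra (dummy) row/column
    represent 'item unallocated' / 'bidder gets nothing' and absorb slack."""
    per_item = softmax_over_bidders(s)[:n, :]        # s has shape (n+1, m)
    per_bidder = softmax_over_items(s_prime)[:, :m]  # s' has shape (n, m+1)
    return np.minimum(per_item, per_bidder)

rng = np.random.default_rng(0)
n, m = 3, 4
s = rng.normal(size=(n + 1, m))
s_prime = rng.normal(size=(n, m + 1))
z = phi_ds(s, s_prime, n, m)

assert np.all(z >= 0)
assert np.all(z.sum(axis=0) <= 1 + 1e-9)  # each item allocated at most once
assert np.all(z.sum(axis=1) <= 1 + 1e-9)  # each bidder gets at most one item
```

Each normalized score matrix already satisfies one of the two constraints exactly (each sums to one including the dummy), and taking the elementwise minimum can only decrease entries, so both constraints hold simultaneously.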
We first show that ϕDS (s, s0 ) as constructed is doubly stochastic, and that we do not lose in
generality by the constructive approach that we take. See Appendix D.3 for a proof.
Lemma 3.1. The matrix φ^DS(s, s') is doubly stochastic for all s, s' ∈ R^{nm}. For any doubly stochastic matrix z ∈ [0, 1]^{nm}, there exist s, s' ∈ R^{nm} for which z = φ^DS(s, s').
It remains to show that doubly-stochastic matrices correspond to lotteries over one-to-one as-
signments. This is an easy corollary of Birkhoff (1946) and also a special case of the bihierarchy
structure proposed in Budish et al. (2013) (Theorem 1), which we state in the following lemma for
completeness.
Lemma 3.2 (Birkhoff (1946)). Any doubly stochastic matrix A ∈ R^{n×m} can be represented as a convex combination of matrices B^1, . . . , B^k, where each B^ℓ ∈ {0, 1}^{n×m} and Σ_{j∈[m]} B^ℓ_ij ≤ 1, ∀i ∈ [n], and Σ_{i∈[n]} B^ℓ_ij ≤ 1, ∀j ∈ [m].
Budish et al. (2013) also propose a polynomial-time algorithm to decompose a doubly stochastic matrix. The payment component for unit-demand valuations is the same as for the case of additive valuations (see Figure 3).
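For intuition, a brute-force Birkhoff–von Neumann decomposition can be sketched for the square case (a naive O(n!) illustration of Lemma 3.2, not the polynomial-time algorithm of Budish et al. (2013)):

```python
import itertools
import numpy as np

def birkhoff(A, tol=1e-9):
    """Greedy Birkhoff-von Neumann decomposition of a square doubly
    stochastic matrix into a convex combination of permutation matrices."""
    A = A.astype(float).copy()
    n = A.shape[0]
    terms = []
    while A.max() > tol:
        # find a permutation supported on the positive entries
        # (one exists by Birkhoff's theorem)
        for perm in itertools.permutations(range(n)):
            if all(A[i, perm[i]] > tol for i in range(n)):
                break
        else:
            raise ValueError("input is not (numerically) doubly stochastic")
        weight = min(A[i, perm[i]] for i in range(n))
        P = np.zeros((n, n))
        for i in range(n):
            P[i, perm[i]] = 1.0
        terms.append((weight, P))
        A -= weight * P   # residual stays doubly stochastic up to scaling
    return terms

A = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.25, 0.50],
              [0.25, 0.25, 0.50]])
terms = birkhoff(A)
assert abs(sum(w for w, _ in terms) - 1.0) < 1e-6
assert np.allclose(sum(w * P for w, P in terms), A)
```

Sampling a permutation matrix B^ℓ with probability equal to its weight then implements the fractional allocation z as a lottery over one-to-one assignments, which is exactly how the network's doubly stochastic output is interpreted.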
⁴ This is a slightly more general definition of doubly stochastic than is typical. Doubly stochastic is more typically defined on a square matrix, with the sum of each row and the sum of each column equal to 1.
3.3 Covering Number Bounds
We conclude this section by instantiating our generalization bound from Section 2.4 to RegretNet, where we have both a regret and a revenue term. Analogous results can also be stated for RochetNet, where we only have a revenue term. Here, ‖·‖_1 is the induced matrix norm, i.e., ‖w‖_1 = max_j Σ_i |w_ij|.
Theorem 3.1. For RegretNet with R hidden layers, K nodes per hidden layer, d_g parameters in the allocation component, d_p parameters in the payment component, m items, n bidders, a sample size of L, and the vector of all model parameters w satisfying ‖w‖_1 ≤ W, the following are valid bounds for the ∆_L term defined in Theorem 2.2, for different bidder valuation types:

(a) additive valuations:
∆_L ≤ O( √( R(d_g + d_p) log(LW max{K, mn}) / L ) ),

(b) unit-demand valuations:
∆_L ≤ O( √( R(d_g + d_p) log(LW max{K, mn}) / L ) ).
The proof is given in Appendix D.5. As the sample size L → ∞, the term ∆L → 0. The
dependence of the above result on the number of layers, nodes, and parameters in the network is
similar to standard covering number bounds for neural networks (Anthony and Bartlett, 2009).
where λ ∈ R^n is a vector of Lagrange multipliers, and ρ > 0 is a fixed parameter that controls the weight on the quadratic penalty.
The solver is described in Algorithm 1 and alternates between the following updates on the model parameters and the Lagrange multipliers: (a) w^{new} ∈ argmin_w C_ρ(w; λ^{old}), and (b) λ_i^{new} = λ_i^{old} + ρ · rgt̂_i(w^{new}), ∀i ∈ N.
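The alternation in (a) and (b) can be illustrated on a generic toy equality-constrained problem (the objective and constraint below are simple stand-ins of our own, not the RegretNet loss or regret):

```python
# Augmented Lagrangian in the shape of updates (a) and (b):
# minimize f(w) subject to c(w) = 0, using
#   C_rho(w; lam) = f(w) + lam*c(w) + (rho/2)*c(w)**2.

def f(w):      # stand-in for the negated-revenue objective
    return (w - 2.0) ** 2

def c(w):      # stand-in for the empirical regret constraint
    return w - 1.0

rho, lam, w = 1.0, 0.0, 0.0
for _ in range(50):
    # (a) inner minimization of C_rho(., lam^old) by plain gradient descent
    for _ in range(200):
        grad = 2.0 * (w - 2.0) + lam + rho * c(w)   # d C_rho / d w
        w -= 0.05 * grad
    # (b) multiplier update: lam^new = lam^old + rho * c(w^new)
    lam += rho * c(w)

print(f"w = {w:.4f}, lambda = {lam:.4f}")  # w -> 1, so the constraint holds
```

The multiplier grows until the constraint violation is driven to zero, so the quadratic penalty weight ρ need not be sent to infinity; this is the same mechanism by which the IC penalty terms steer RegretNet toward (approximately) zero regret.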
We divide the training sample S into minibatches of size B, and perform several passes over
the training samples (with random shuffling of the data after each pass). We denote the minibatch
received at iteration t by St = {v (1) , . . . , v (B) }. The update (a) on model parameters involves an
unconstrained optimization of Cρ over w and is performed using a gradient-based optimizer.
Let rgt̃_i(w) denote the empirical regret in (3) computed on minibatch S_t. The gradient of C_ρ w.r.t. w for fixed λ^t is given by:

∇_w C_ρ(w; λ^t) = −(1/B) Σ_{ℓ=1}^B Σ_{i∈N} ∇_w p_i^w(v^(ℓ)) + Σ_{i∈N} Σ_{ℓ=1}^B λ_i^t g_{ℓ,i} + ρ Σ_{i∈N} Σ_{ℓ=1}^B rgt̃_i(w) g_{ℓ,i},   (8)
Algorithm 1 RegretNet Training
1: Input: Minibatches S_1, . . . , S_T of size B
2: Parameters: ∀t, ρ_t > 0, γ > 0, η > 0, Γ ∈ N, Q ∈ N
3: Initialize: w^0 ∈ R^d, λ^0 ∈ R^n
4: for t = 0 to T do
5:   Receive minibatch S_t = {v^(1), . . . , v^(B)}
6:   Initialize misreports v′_i^(ℓ) ∈ V_i, ∀ℓ ∈ [B], i ∈ N
7:   for r = 0 to Γ do
8:     ∀ℓ ∈ [B], i ∈ N:
9:       v′_i^(ℓ) ← v′_i^(ℓ) + γ ∇_{v′_i} u_i^w( v_i^(ℓ); (v′_i, v_{−i}^(ℓ)) ), evaluated at v′_i = v′_i^(ℓ)
10:  end for
11:  Compute regret gradient: ∀ℓ ∈ [B], i ∈ N:
12:    g_{ℓ,i} = ∇_w [ u_i^w( v_i^(ℓ); (v′_i^(ℓ), v_{−i}^(ℓ)) ) − u_i^w( v_i^(ℓ); v^(ℓ) ) ],
13:    evaluated at w = w^t
14:  Compute Lagrangian gradient using (8) and update w^t:
15:    w^{t+1} ← w^t − η ∇_w C_{ρ_t}(w, λ^t), evaluated at w = w^t
16:  Update Lagrange multipliers once every Q iterations:
17:  if t is a multiple of Q then
18:    λ_i^{t+1} ← λ_i^t + ρ_t rgt̃_i(w^{t+1}), ∀i ∈ N
19:  else
20:    λ^{t+1} ← λ^t
21: end for
where

g_{ℓ,i} = ∇_w [ max_{v′_i ∈ V_i} u_i^w( v_i^(ℓ); (v′_i, v_{−i}^(ℓ)) ) − u_i^w( v_i^(ℓ); v^(ℓ) ) ].
vi ∈Vi
The terms rgt̃_i and g_{ℓ,i} in turn involve a “max” over misreports for each bidder i and valuation profile ℓ. We solve this inner maximization over misreports using another gradient-based optimizer.
In particular, we maintain a misreport v′_i^(ℓ) for each bidder i and valuation profile ℓ. For each minibatch, we compute the optimal misreport, for each agent i and each valuation profile ℓ, by taking Γ gradient updates from a randomly initialized valuation, each update of the form

v′_i^(ℓ) ← v′_i^(ℓ) + γ ∇_{v′_i} u_i^w( v_i^(ℓ); (v′_i, v_{−i}^(ℓ)) ), evaluated at v′_i = v′_i^(ℓ),   (9)
for some γ > 0. This is in the spirit of adversarial machine learning, where these gradient steps on
the input are taken to try to find a misreport for the agent that “defeats” the incentive alignment
of the mechanism.
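A one-dimensional sketch of this inner maximization (on a hand-built, deliberately non-IC toy mechanism of our own; in RegretNet the gradient in (9) comes from automatic differentiation rather than a hand-derived formula):

```python
def u(v, b):
    """Utility of a single bidder with value v reporting b in a toy mechanism
    with allocation probability z(b) = b and payment b**2 / 4.
    (The IC payment for z(b) = b would be b**2 / 2, so this is not DSIC.)"""
    return v * b - b ** 2 / 4.0

v = 0.6            # true valuation
b = 0.05           # randomly initialized misreport
gamma, Gamma = 0.1, 200
for _ in range(Gamma):
    grad = v - b / 2.0        # d u(v, b) / d b, derived by hand for this toy
    b += gamma * grad         # gradient-ascent step on the *input*, as in (9)

regret = u(v, b) - u(v, v)
print(f"best misreport ~ {b:.3f} (= 2v), regret ~ {regret:.3f} > 0")
```

Gradient ascent drives the misreport to b* = 2v, exposing a strictly positive regret; during training this positive regret feeds the penalty terms in (8), pushing the mechanism's parameters toward a design where no such profitable deviation exists.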
Figure 5 gives a visualization of this search for defeating misreports when learning an optimal
auction for a problem with a single bidder with an additive valuation over two items, where the
bidder’s value for each item is an independent draw from U [0, 1] (see Section 5.3, Setting A). In
the visualization, the bidder has true valuation (v1 , v2 ) = (0.1, 0.8), with this input represented as
a green dot. The red crosses represent possible misreports. The heat map shows the utility gain,
u1 ((v1 , v2 ); (b1 , b2 ))−u1 ((v1 , v2 ); (v1 , v2 )), for this bidder when bidding some amount (b1 , b2 ) ∈ [0, 1]2
rather than truthfully. This mechanism is already approximately DSIC and the utility gain is negative everywhere (and truthful bidding has zero regret), with shades of yellow corresponding to a misreport that is almost as good as a true report, and shades of green towards blue corresponding to a harmful misreport. We illustrate the use of input gradients by initializing each of 10 possible misreports (we use 10 misreports for illustration; in our experiments we initialize only a single misreport), and performing Γ = 20 gradient-ascent steps (9) for each misreport. Figure 5 shows the initial misreports along with a new snapshot of the location of each misreport every four gradient-ascent steps.

Figure 5: The gradient-based approach to regret approximation, shown for a well-trained auction for Setting A. The top left plot shows the true valuation (green dot) and ten random initial misreports (red dots). The remaining plots give snapshots of the progress of gradient ascent on the input, showing this every four steps.
We use the Adam optimizer (Kingma and Ba, 2014) for updates on model parameters w and misreports v′_i^(ℓ).⁵ Since the optimization problem is non-convex, the solver is not guaranteed to
reach a globally optimal solution. However, this training algorithm proves very effective in our
experiments. The learned auctions incur very low regret and closely match the structure of optimal
auctions in settings where this structure is known from existing theory.
5 Experimental Results
In this section, we demonstrate that our approach can recover near-optimal auctions for essentially
all settings for which an analytical solution is known, that it is an effective tool for confirming or
refuting hypotheses about optimal designs, and that it can find new auctions for settings where
there is no known analytical solution. We present a representative subset of the results here, and
provide additional experimental results in Appendix B.
5 Adam is a variant of SGD that makes use of a momentum term to update the weights. Lines 9 and 15 in the
pseudo-code of Algorithm 1 are stated for standard SGD.
5.1 Setup
We implement our framework using the TensorFlow deep learning library. For RochetNet we
initialized parameters α and β in (5) using a random uniform initializer over the interval [0,1] and
a zero initializer, respectively. For RegretNet we used the tanh activation function at the hidden
nodes, and Glorot uniform initialization (Glorot and Bengio, 2010). We perform cross validation to
decide on the number of hidden layers and the number of nodes in each hidden layer. We report
representative numbers that illustrate the tradeoffs in Section 5.7.
We trained RochetNet on 2^15 valuation profiles sampled every iteration in an online manner. We
used the Adam optimizer with a learning rate of 0.1 for 20,000 iterations for making the updates.
The parameter κ in Equation (6) was set to 1,000. Unless specified otherwise we used a max
network over 1,000 linear functions to model the induced utility functions, and report our results
on a sample of 10,000 profiles.
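Under the interpretation above (menu entries as linear functions α_j · v − β_j, with the softmax temperature κ from (6)), the induced utility can be sketched in NumPy; the hard max is what the test-time mechanism uses, while the softmax-weighted average serves as the differentiable training surrogate (a sketch, not the paper's TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)
J, m = 1000, 2                                # max over J linear functions, m items
alpha = rng.uniform(0.0, 1.0, size=(J, m))    # allocations: random uniform init
beta = np.zeros(J)                            # payments: zero init

def induced_utility(v, kappa=1000.0):
    """RochetNet's induced utility at valuation v: a max over the J menu
    entries alpha_j . v - beta_j.  Training replaces the max with a
    softmax-weighted average at temperature kappa."""
    scores = alpha @ v - beta
    hard = scores.max()                           # exact max (test time)
    w = np.exp(kappa * (scores - scores.max()))   # numerically stable weights
    soft = (w / w.sum()) @ scores                 # smooth surrogate (training)
    return hard, soft

hard, soft = induced_utility(np.array([0.6, 0.3]))
```

For κ = 1,000 the surrogate sits just below the exact max (a softmax-weighted average can never exceed it), and here the two agree to well under 0.01.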
For RegretNet we used a sample of 640,000 valuation profiles for training and a sample of
10,000 profiles for testing. The augmented Lagrangian solver was run for a maximum of 80 epochs
(full passes over the training set) with a minibatch size of 128. The value of ρ in the augmented
Lagrangian was set to 1.0 and incremented every two epochs.
An update on wt was performed for every minibatch using the Adam optimizer with learning
rate 0.001. For each update on wt , we ran Γ = 25 misreport updates steps with learning rate 0.1.
At the end of 25 updates, the optimized misreports for the current minibatch were cached and
used to initialize the misreports for the same minibatch in the next epoch. An update on λt was
performed once every 100 minibatches (i.e., Q = 100).
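The schedule just described can be sketched on a toy constrained problem; the cadence (a gradient step per minibatch, 5,000 minibatches per epoch as 640,000/128, a λ update every Q = 100 minibatches, ρ incremented every two epochs) mirrors the text, while the scalar objective and constraint are stand-ins for the network's negated revenue and expected regret:

```python
def train_aug_lagrangian(epochs=80, minibatches=5000, Q=100, lr=1e-3, rho=1.0):
    """Augmented-Lagrangian schedule on a toy problem: maximize the
    "revenue" f(w) = w subject to zero "regret" g(w) = max(w - 1, 0),
    i.e. descend on  -f(w) + lam * g(w) + (rho / 2) * g(w)**2."""
    w, lam = 0.0, 0.0
    for epoch in range(epochs):
        for t in range(minibatches):
            g = max(w - 1.0, 0.0)
            dg = 1.0 if w > 1.0 else 0.0           # (sub)gradient of g
            grad_w = -1.0 + (lam + rho * g) * dg   # gradient of the Lagrangian
            w -= lr * grad_w
            if (t + 1) % Q == 0:                   # lambda update every Q minibatches
                lam += rho * max(w - 1.0, 0.0)
        if (epoch + 1) % 2 == 0:                   # rho incremented every two epochs
            rho += 1.0
    return w, lam, rho

w, lam, rho = train_aug_lagrangian()
```

On this toy problem the schedule drives w to the constrained optimum w = 1 while the multiplier settles near the shadow price of the constraint.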
We ran all experiments on a compute cluster with NVIDIA Graphics Processing Unit (GPU)
cores.
5.2 Evaluation
In addition to the revenue of the learned auction on a test set, we also evaluate the regret achieved by
RegretNet, averaged across all bidders and test valuation profiles, i.e., rgt = (1/n) Σ_{i=1}^{n} rgt_i(g^w, p^w).
Each rgt_i has an inner "max" of the utility function over bidder valuations v′_i ∈ V_i (see (3)). We
evaluate these terms by running gradient ascent on v′_i with a step size of 0.1 for 2,000 iterations
(we test 1,000 different random initial v′_i and report the one that achieves the largest regret).
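This restart-and-take-the-max evaluation can be illustrated in one dimension with a hypothetical two-peak utility (our own toy objective, not the paper's networks, with a step size chosen for this objective): a single gradient-ascent run can get stuck on the lower peak, while the best of many restarts recovers the global one.

```python
import numpy as np

rng = np.random.default_rng(0)

def u(b):
    """Hypothetical two-peak utility of a one-dimensional misreport b;
    the real objective is the regret computed from the network."""
    return 0.5 * np.exp(-50 * (b - 0.2) ** 2) + 0.6 * np.exp(-50 * (b - 0.8) ** 2)

def grad_u(b):
    return (-100 * 0.5 * (b - 0.2) * np.exp(-50 * (b - 0.2) ** 2)
            - 100 * 0.6 * (b - 0.8) * np.exp(-50 * (b - 0.8) ** 2))

def ascend(b, steps=2000, lr=0.01):
    for _ in range(steps):
        b = np.clip(b + lr * grad_u(b), 0.0, 1.0)
    return b

# 1,000 random restarts, run in parallel over a vector of start points;
# report the misreport achieving the largest utility.
starts = rng.uniform(0.0, 1.0, size=1000)
finals = ascend(starts)
best = finals[np.argmax(u(finals))]
```

Restarts that begin left of the valley converge to the lower peak near b = 0.2, but the best restart ends at the higher peak near b = 0.8.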
For some of the experiments we also report the total time required to train the network. This
time is incurred during offline training, while the allocation and payments can be computed in a
few milliseconds once the network is trained.
We first consider two single-bidder, two-item settings for which the optimal design is known:

A. Single bidder with additive valuations over two items, where the item values are independent
draws from U[0, 1].

B. Single bidder with unit-demand valuations over two items, where the item values are inde-
pendent draws from U[2, 3].
The optimal design for the first setting is given by Manelli and Vincent (2006), who show that
the optimal mechanism is deterministic and offers the bidder three options: receive both items and
Figure 6: Side-by-side comparison of allocation rules learned by RochetNet and RegretNet for single bidder,
two items settings. Panels (a) and (b) are for Setting A and Panels (c) and (d) are for Setting B. The panels
describe the probability that the bidder is allocated item 1 (left) and item 2 (right) for different valuation
inputs. The optimal auctions are described by the regions separated by the dashed black lines, with the
numbers in black the optimal probability of allocation in the region.
pay (4 − √2)/3, receive item 1 and pay 2/3, or receive item 2 and pay 2/3. For the second setting
Pavlov (2011) shows that it is optimal to offer a fair lottery (1/2, 1/2) over the items (at a discount),
or to purchase any item at a fixed price. For the parameters here the price for the lottery is
(1/6)(8 + √22) ≈ 2.115 and the price for an individual item is 1/6 + (1/6)(8 + √22) ≈ 2.282.
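As a sanity check on these closed forms, one can simulate a bidder who best-responds to the Setting A menu and confirm that the expected revenue lands near the reported optimum of about 0.55 (a Monte Carlo sketch under the menu interpretation above, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000
v = rng.uniform(0.0, 1.0, size=(n, 2))    # additive values for the two items

# The three deterministic menu options plus the outside option of buying
# nothing, each as a pair (allocation, price).
p_bundle = (4.0 - np.sqrt(2.0)) / 3.0
menu = [((0.0, 0.0), 0.0),
        ((1.0, 0.0), 2.0 / 3.0),
        ((0.0, 1.0), 2.0 / 3.0),
        ((1.0, 1.0), p_bundle)]

# Each sampled bidder picks her utility-maximizing option.
utils = np.stack([v @ np.array(a) - p for a, p in menu], axis=1)
choice = utils.argmax(axis=1)
prices = np.array([p for _, p in menu])
revenue = prices[choice].mean()
```

With 400,000 samples the estimate is within Monte Carlo noise of the optimal revenue of roughly 0.549–0.550 reported in Figures 7(a) and 8.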
We used two hidden layers with 100 hidden nodes in RegretNet for these settings. A visualization
of the optimal allocation rule and those learned by RochetNet and RegretNet is given in Figure 6.
Figure 7(a) gives the optimal revenue, the revenue and regret obtained by RegretNet, and the
revenue obtained by RochetNet. Figure 7(b) shows how these terms evolve over time during training
in RegretNet.
We find that both approaches essentially recover the optimal design, not only in terms of
revenue, but also in terms of the allocation rule and transfers. The auctions learned by RochetNet
are exactly DSIC and match the optimal revenue precisely, with sharp decision boundaries in
the allocation and payment rule. The decision boundaries for RegretNet are smoother, but still
remarkably accurate. The revenue achieved by RegretNet matches the optimal revenue up to a
< 1% error term and the regret it incurs is < 0.001. The plots of the test revenue and regret show
that the augmented Lagrangian method is effective in driving the test revenue and the test regret
towards optimal levels.
The additional domain knowledge incorporated into the RochetNet architecture leads to exactly
DSIC mechanisms that match the optimal design more accurately, and speeds up computation (the
training took about 10 minutes compared to 11 hours). On the other hand, we find it surprising
how well RegretNet performs given that it starts with no domain knowledge at all.
We present and discuss a host of additional experiments with single-bidder, two-item settings
in Appendix B.
                 Opt        RegretNet                                                    RochetNet
Distribution     rev        rev      rgt(mean)   rgt(90%)   rgt(95%)   rgt(99%)          rev
Setting A        0.550      0.554    < 0.001     < 0.001    0.001      0.002             0.550
Setting B        2.137      2.137    < 0.001     < 0.001    < 0.001    0.002             2.136

(a)

[plots of test revenue and test regret for Setting A as a function of training epochs]

(b)
Figure 7: (a): Test revenue and test regret for RegretNet and test revenue for RochetNet for Settings A
and B. (b): Plot of test revenue and test regret as a function of training epochs for Setting A with RegretNet.
Items SJA (rev ) RochetNet (rev )
2 0.549 0.549
3 0.875 0.875
4 1.219 1.219
5 1.576 1.576
6 1.943 1.943
7 2.318 2.318
8 2.699 2.699
9 3.086 3.086
10 3.478 3.478
Figure 8: Revenue of the Straight-Jacket Auction (SJA) computed via the recursive formula in (Gian-
nakopoulos and Koutsoupias, 2018), and the test revenue of the auction learned by RochetNet, for various
numbers of items m. The SJA is known to be optimal for up to six items and conjectured to be optimal for
any number of items.
A breakthrough came with Giannakopoulos and Koutsoupias (2018), who were able to find a
pattern in the results for two items and three items. The proposed mechanism—the Straight-Jacket
Auction (SJA)—offers bundles of items at fixed prices. The key to finding these prices is to view
the best-response regions as a subdivision of the m-dimensional cube, and observe that there is
an intrinsic relationship between the price of a bundle of items and the volume of the respective
best-response region.
Giannakopoulos and Koutsoupias (2018) give a recursive algorithm for finding the subdivision
and the prices, and used LP duality to prove that the SJA is optimal for m ≤ 6 items.6 They also
conjecture that the SJA remains optimal for general m, but were unable to prove it.
Figure 8 gives the revenue of the SJA, and that found by RochetNet, for m ≤ 10 items. We
used a test sample of 2^30 valuation profiles (instead of 10,000) to compute these numbers for
6
The duality argument developed by Giannakopoulos and Koutsoupias is similar but incomparable to the duality
approach of Daskalakis et al. (2013). We will return to the latter in Section 5.5.
higher precision. It shows that RochetNet finds the optimal revenue for m ≤ 6 items, and that
it finds DSIC auctions whose revenue matches that of the SJA for m = 7, 8, 9, and 10 items.
Closer inspection reveals that the allocation and payment rules learned by RochetNet essentially
match those predicted by Giannakopoulos and Koutsoupias for all m ≤ 10. We take this as strong
additional evidence that their conjecture is correct.
For these experiments, we used a max network over 10,000 linear functions (instead of 1,000) to
increase the representation and flexibility of the neural network. This overparameterization trick
is commonly used in deep learning and has proven to be very effective in practice (Krizhevsky
et al., 2012; Allen-Zhu et al., 2019). We illustrate this effect in Appendix B.4. We followed up on
the usual training phase with an additional 20 iterations of training using the Adam optimizer with
a learning rate of 0.001 and a minibatch size of 2^30.
We also found it useful to impose item-symmetry on the learned auction, especially for m =
9 and 10 items, as this helped with accuracy and reduced training time. Imposing symmetry
comes without loss of generality for auctions with an item-symmetric distribution (Daskalakis and
Weinberg, 2012). To impose item symmetry, we first permute the inputs to be in ascending order,
compute the allocation and payment on this permuted input, and then invert the permutation of
allocation to compute the mechanism for the original inputs. With these modifications it took
about 13 hours to train the networks.
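The symmetrization step can be sketched as a generic wrapper; the base mechanism below is a hypothetical stand-in, used only to show that the wrapper makes the output covariant with permutations of the items:

```python
import numpy as np

def symmetrize(mechanism):
    """Wrap a mechanism so that it is item-symmetric: sort the reported
    values ascending, evaluate the mechanism on the sorted input, then
    invert the permutation on the returned allocation (the payment,
    computed from the sorted input, is permutation-invariant)."""
    def wrapped(v):
        order = np.argsort(v)            # permutation putting v ascending
        alloc, pay = mechanism(v[order])
        inverse = np.argsort(order)      # inverse permutation
        return alloc[inverse], pay
    return wrapped

# Hypothetical base mechanism for illustration: allocate only the last
# (largest, once sorted) item and charge half its reported value.
def base(v):
    alloc = np.zeros_like(v)
    alloc[-1] = 1.0
    return alloc, 0.5 * v[-1]

mech = symmetrize(base)
a1, p1 = mech(np.array([0.9, 0.2, 0.5]))
a2, p2 = mech(np.array([0.2, 0.5, 0.9]))   # same values, permuted input
```

Applying the wrapped mechanism to a permuted input permutes the allocation in the same way and leaves the payment unchanged.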
C. One additive bidder and two items, where the bidder's valuation is drawn uniformly from the
triangle T = {(v1, v2) | v1/c + v2 ≤ 2, v1 ≥ 0, v2 ≥ 1}, where c > 0 is a free parameter.
There is no analytical result for the optimal auction design for this setting. We ran RochetNet
for different values of c to discover the optimal auction. The mechanisms learned by RochetNet for
c = 0.5, 1, 3, and 5 are shown in Figure 10.
Based on this, we conjectured that the optimal mechanism contains two menu items for c ≤
1, namely {(0, 0), 0} and {(1, 1), (2 + √(1 + 3c))/3}, and three menu items for c > 1, namely {(0, 0), 0},
{(1/c, 1), 4/3}, and {(1, 1), 1 + c/3}, giving the optimal allocation and payment in each region. In
particular, as c transitions from values less than or equal to 1 to values larger than 1, the optimal
mechanism transitions from being deterministic to being randomized. Figure 9 gives the revenue
achieved by RochetNet and the conjectured optimal format for a range of parameters c, computed
on 2^30 valuation profiles.
We can validate the optimality of this conjectured design through duality theory (Daskalakis
et al., 2013). The proof is given in Appendix D.6.
Theorem 5.1. For any c > 0, suppose the bidder's valuation is uniformly distributed over the set
T = {(v1, v2) | v1/c + v2 ≤ 2, v1 ≥ 0, v2 ≥ 1}. Then the optimal auction contains two menu items
{(0, 0), 0} and {(1, 1), (2 + √(1 + 3c))/3} when c ≤ 1, and three menu items {(0, 0), 0}, {(1/c, 1), 4/3}, and
{(1, 1), 1 + c/3} otherwise.
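As a numerical spot check of this design, for c = 1 the paid menu item has price (2 + √4)/3 = 4/3, and Monte Carlo integration over the triangle reproduces the revenue 1.185 reported for c = 1.000 in Figure 9 (a sketch, assuming the menu interpretation above):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 1.0
n = 400_000

# Rejection-sample uniformly from T = {(v1, v2): v1/c + v2 <= 2, v1 >= 0, v2 >= 1}.
v1 = rng.uniform(0.0, c, size=4 * n)
v2 = rng.uniform(1.0, 2.0, size=4 * n)
keep = v1 / c + v2 <= 2.0
v = np.stack([v1[keep], v2[keep]], axis=1)[:n]

# Menu for c <= 1: {(0,0), 0} and {(1,1), (2 + sqrt(1 + 3c))/3}.
price = (2.0 + np.sqrt(1.0 + 3.0 * c)) / 3.0
buys = v.sum(axis=1) >= price            # the bidder buys the bundle iff v1 + v2 >= price
revenue = price * buys.mean()
```

The closed-form value for c = 1 is (4/3) · (8/9) = 32/27 ≈ 1.185, and the sample estimate lands within Monte Carlo noise of it.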
In Appendix B.5, we also give the mechanisms learned by RochetNet for two additional settings.
Taken together, these results demonstrate that RochetNet is a powerful tool to help in the discovery
of new analytical results. In follow-up work, Shen et al. (2019) also use a neural network framework,
c Opt (rev ) RochetNet (rev )
0.125 1.029 1.029
0.200 1.046 1.046
0.250 1.056 1.056
0.500 1.104 1.104
1.000 1.185 1.185
3.000 1.481 1.481
5.000 1.778 1.778
8.000 2.222 2.222
10.000 2.518 2.518
20.000 4.000 4.000
Figure 9: Test revenue of the newly discovered optimal mechanism and that of RochetNet, for Setting C
with varying parameter c.
Figure 10: Allocation rules learned by RochetNet for Setting C. The panels describe the probability that
the bidder is allocated item 1 (left) and item 2 (right) for c = 0.5, 1, 3, and 5. The auctions proposed in
Theorem 5.1 are described by the regions separated by the dashed black lines, with the numbers in black
the optimal probability of allocation in the region.
closely related to RochetNet, to discover an optimal analytical result for a similar setting: a single
additive bidder and two items, where the bidder's valuation is drawn uniformly from the triangle
{(v1, v2) | v1/c + v2 ≤ 1, v1 ≥ 0, v2 ≥ 0}.
Number of initialized menu     RochetNet (rev)     Number of active menu
choices in RochetNet                               items in RochetNet
2                              0.3309              2
5                              0.3309              2
10                             0.3310              3
20                             0.3310              4
50                             0.3310              6
100                            0.3310              7
200                            0.3310              11
500                            0.3310              11
1,000                          0.3310              12
2,000                          0.3310              17
5,000                          0.3310              34
10,000                         0.3310              59
20,000                         0.3310              89
Figure 11: Test revenue of the auction learned by RochetNet for different menu sizes in Setting D. The
number of active menus increases as the number of initialized menu choices increases. The optimal mechanism
requires an infinitely-sized menu and achieves a revenue of 0.3311.
Net for different-sized menus. In Figure 11, we report the revenue, the number of menu choices
represented in RochetNet, and the number of menu choices that are active for one or more samples
in the test set. As we increase the number of initialized menu choices, the number of active menu
items increases as well. Comparing the optimal infinite-sized menu with the menu learned by
RochetNet, we find that the difference in revenue comes from a large number of menu items that
each contribute only marginally to the net revenue (< 10^-5). RochetNet fails to learn some of these
menu items due to the fixed size of the mini-batches and the numerical tolerance of the optimization
routine. Regardless, the overall gap in revenue is negligible. Already with two active menu items,
RochetNet achieves a revenue of ∼ 0.3309 (99.93% of optimal), while with three or more active
menu items the revenue is at least ∼ 0.3310 (99.96% of optimal).
5.7 Scaling Up
In this section, we consider settings with up to five bidders and up to ten items. This is several
orders of magnitude more complex than existing analytical or computational results. It is also a
natural playground for RegretNet, as no tractable characterizations of IC mechanisms are known
for these settings. We specifically consider the following two settings, that generalize the basic
setting considered in Manelli and Vincent (2006) and Giannakopoulos and Koutsoupias (2018) to
more than one bidder:
E. Three additive bidders and ten items, where bidders draw their value for each item indepen-
dently from U [0, 1].
F. Five additive bidders and ten items, where bidders draw their value for each item indepen-
dently from U [0, 1].
An analytical description of the optimal auction for these settings is not known. However,
running a separate Myerson auction for each item is optimal in the limit of the number of bidders
(Palfrey, 1983). For a regime with a small number of bidders, this provides a strong benchmark.
We also compare to selling the grand bundle via a Myerson auction.
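For intuition on the per-item benchmark: with item values drawn from U[0, 1], Myerson's auction for a single item is a second-price auction with reserve price 1/2 (the zero of the virtual value 2v − 1), so the benchmark revenue is the number of items times the single-item revenue. A Monte Carlo sketch for three bidders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bidders, n_samples = 3, 400_000
v = rng.uniform(0.0, 1.0, size=(n_samples, n_bidders))

# Myerson auction for a single U[0,1] item: second-price with reserve 1/2.
reserve = 0.5
v_sorted = np.sort(v, axis=1)
top, second = v_sorted[:, -1], v_sorted[:, -2]
sold = top >= reserve                        # item sold iff the highest value clears the reserve
payment = np.where(sold, np.maximum(second, reserve), 0.0)
rev_myerson = payment.mean()                 # expected revenue, ~17/32 ~ 0.531

# Plain second-price (no reserve) for comparison; expectation is 1/2 here.
rev_vickrey = second.mean()
```

For three U[0, 1] bidders the expected per-item revenue with the reserve works out to 17/32 ≈ 0.531, versus 1/2 without it; with ten independent items, the benchmark multiplies this by ten.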
Figure 12: (a) Revenue and regret of RegretNet on the validation set for auctions learned for Setting E
using different architectures, where (R, K) denotes R hidden layers and K nodes per layer. (b) Test revenue
and test regret for Settings E and F for the (5, 100) architecture.
Figure 13: Test revenue, test regret, test IR violation, and training or solve time for RegretNet and
an LP-based approach, for a two-bidder, two-item setting with additive uniform valuations.
For Setting E, we show in Figure 12(a) the revenue and regret of the learned auction on a
validation sample of 10,000 profiles, obtained with different architectures. Here (R, K) denotes an
architecture with R hidden layers and K nodes per layer. The (5, 100) architecture has the lowest
regret among all the 100-node networks for both Setting E and Setting F. Figure 12(b) shows that
the learned auctions yield higher revenue compared to the baselines, and do so with tiny regret.
Figure 14: Test revenue vs. training or solve time (in hours) in the two-bidder, two-item, additive U[0, 1]
value setting. We compare the LP-based approach (with nearest-point and down rounding, corresponding to
"LP" and "LP with IR", respectively) and RegretNet, varying the number of variables in the LP by
modifying the level of discretization, and the number of parameters in RegretNet by modifying the network
structure. The size of a marker corresponds to the sum of the regret and IR violations of each method. For
RegretNet, we only plot the results that lie on the efficient frontier.
−ui(v^(ℓ)), 0}. Due to the coarse discretization, the LP approach with nearest-point rounding suffers
substantial IR violations. As a result of this, as well as its relatively high regret compared to
RegretNet, the relatively high revenue achieved by the LP with nearest-point rounding is misleading.
For this reason, we also include the performance of the LP-based mechanism when the continuous
input valuation profiles are rounded down to their respective discrete profiles. There we see zero IR
violation but substantially lower revenue than RegretNet (and still with higher regret). We were
not able to run an LP for this setting with a finer discretization than 11 bins per item value in more
than nine days (216 hours) of compute time.8
In contrast, RegretNet yields very low regret along with zero IR violations (as the neural network
satisfies IR by design), and does so in around four hours. In fact, even for the larger Settings E–F,
the training time of RegretNet was less than 13 hours.
In Figure 14, we plot the test revenue, test regret and the run-time of the LP-based and
RegretNet methods, while varying the number of variables in the LP and number of parameters in
RegretNet. For the LP, this is done by varying the discretization and for RegretNet, this is done by
varying the network structure. In Appendix C, we include the complete set of results for varying
the discretization in the LP-based method and varying the number of hidden layers and hidden
units configurations in RegretNet. Introducing an increasingly fine discretization into the LP-based
method provides an initial increase in revenue in return for a modest increase in run time, but this
gives way to a huge increase in run time with no effect on revenue. For RegretNet, the training
time is relatively stable as the number of hidden layers and units per layer is varied, while larger
networks bring a substantive increase in revenue. We only plot the results for RegretNet that lie on
the efficient frontier, and refer to Figure 25 for the full details. Taken together these results show
that RegretNet’s performance substantially extends the revenue-time Pareto frontier available from
the LP method, obtaining higher revenue for a relatively modest training time.
8 We used an AWS EC2 instance with 48 cores and 96 GB of memory.
6 Conclusion
In this paper, we have introduced a new framework of differentiable economics for using neural
networks for economic discovery, specifically for the discovery of revenue-optimal, multi-bidder and
multi-item auctions. We have demonstrated that standard machine learning pipelines can be used
to essentially re-discover all known, optimal auction designs, and to discover the design of auctions
for settings out of reach of theory and settings that are orders of magnitude larger than those that
can be solved through other computational approaches. We also see promise for the framework in
advancing economic theory, for example in supporting or refuting conjectures and as an assistant
in guiding new economic discovery.
This framework has already inspired a great deal of follow-on work, in taking differentiable
economics to additional domains and in scaling-up the methods to support networks that simul-
taneously handle multiple sizes of markets (number of bidders and number of items). Looking
ahead, there remain a number of interesting challenges. Beyond expanding the domains that are
studied by differentiable economics, the methodological challenges include the interpretability of
learned mechanisms, integrating additional structural regularities from economic theory, scaling
up to larger economic systems, and providing robustness guarantees in the form of certificates for
economic properties. Combinatorial auctions (CAs) present an especially important domain, and
one whose study we have only initiated here (see Appendices A.2 and B.3 for theoretical and
experimental results for the case of CAs with two items). CAs are important in practice (Palacios-Huerta et al.,
2022), and yet concerns around low revenue and their vulnerability to collusion (Ausubel and Mil-
grom, 2006; Ausubel et al., 2006; Day and Milgrom, 2008; Levin and Skrzypacz, 2016; Goeree and
Lien, 2016) mean that we lack a complete understanding even of the design of efficient auctions,
never mind finding revenue-optimal designs.
References
Allen-Zhu, Z., Li, Y., and Song, Z. (2019). A convergence theory for deep learning via over-
parameterization. In Proceedings of the 36th International Conference on Machine Learning,
pages 242–252.
Anthony, M. and Bartlett, P. L. (2009). Neural Network Learning: Theoretical Foundations. Cam-
bridge University Press, 1st edition.
Areyan Viqueira, E., Cousins, C., Mohammad, Y., and Greenwald, A. (2019). Empirical mecha-
nism design: Designing mechanisms from data. In Proceedings of the Thirty-Fifth Conference
on Uncertainty in Artificial Intelligence, UAI, volume 115 of Proceedings of Machine Learning
Research, pages 1094–1104.
Ausubel, L., Cramton, P., and Milgrom, P. (2006). The clock-proxy auction: A practical combi-
natorial auction design. In Cramton, P., Shoham, Y., and Steinberg, R., editors, Combinatorial
Auctions, chapter 5, pages 115–138. MIT Press.
Ausubel, L. and Milgrom, P. (2006). The lovely but lonely Vickrey auction. In Cramton, P.,
Shoham, Y., and Steinberg, R., editors, Combinatorial Auctions, chapter 1, pages 17–40. MIT
Press.
Babaioff, M., Immorlica, N., Lucier, B., and Weinberg, S. M. (2014). A simple and approximately
optimal mechanism for an additive buyer. In Proceedings of the 55th IEEE Symposium on
Foundations of Computer Science, pages 21–30.
Balaguer, J., Koster, R., Summerfield, C., and Tacchetti, A. (2022). The good shepherd: An oracle
agent for mechanism design. CoRR, arXiv:2202.10135.
Balcan, M.-F., Sandholm, T., and Vitercik, E. (2016). Sample complexity of automated mechanism
design. In Proceedings of the 29th Conference on Neural Information Processing Systems, pages
2083–2091.
Bichler, M., Fichtl, M., Heidekrüger, S., Kohring, N., and Sutterer, P. (2021). Learning equilibria
in symmetric auction games using artificial neural networks. Nat. Mach. Intell., 3(8):687–695.
Birkhoff, G. (1946). Tres observaciones sobre el algebra lineal. Universidad Nacional de Tucumán,
Revista A, 5:147–151.
Boutilier, C. and Hoos, H. H. (2001). Bidding languages for combinatorial auctions. In Proceedings
of the 17th International Joint Conference on Artificial Intelligence, pages 1211–1217.
Brero, G., Chakrabarti, D., Eden, A., Gerstgrasser, M., Li, V., and Parkes, D. (2021a). Learning
Stackelberg equilibria in sequential price mechanisms. In ICML Workshop for Reinforcement
Learning Theory.
Brero, G., Eden, A., Gerstgrasser, M., Parkes, D. C., and Rheingans-Yoo, D. (2021b). Reinforce-
ment learning of sequential price mechanisms. In Thirty-Fifth AAAI Conference on Artificial
Intelligence, AAAI, pages 5219–5227.
Brero, G., Lepore, N., Mibuari, E., and Parkes, D. C. (2022). Learning to mitigate AI collusion on
economic platforms. CoRR, abs/2202.07106.
Brinkman, E. and Wellman, M. P. (2017). Empirical mechanism design for optimizing clearing
interval in frequent call markets. In Proceedings of the 2017 ACM Conference on Economics and
Computation, EC, pages 205–221.
Budish, E., Che, Y.-K., Kojima, F., and Milgrom, P. (2013). Designing random allocation mecha-
nisms: Theory and applications. American Economic Review, 103(2):585–623.
Cai, Y., Daskalakis, C., and Weinberg, M. S. (2012a). Optimal multi-dimensional mechanism
design: Reducing revenue to welfare maximization. In Proceedings of the 53rd IEEE Symposium
on Foundations of Computer Science, pages 130–139.
Cai, Y., Daskalakis, C., and Weinberg, S. M. (2012b). An algorithmic characterization of multi-
dimensional mechanisms. In Proceedings of the 44th ACM Symposium on Theory of Computing,
pages 459–478.
Cai, Y., Daskalakis, C., and Weinberg, S. M. (2013). Understanding incentives: Mechanism de-
sign becomes algorithm design. In Proceedings of the 54th IEEE Symposium on Foundations of
Computer Science, pages 618–627.
Cai, Y. and Zhao, M. (2017). Simple mechanisms for subadditive buyers via duality. In Proceedings
of the 49th ACM Symposium on Theory of Computing, pages 170–183.
Chawla, S., Hartline, J. D., Malec, D. L., and Sivan, B. (2010). Multi-parameter mechanism
design and sequential posted pricing. In Proceedings of the 42th ACM Symposium on Theory of
Computing, pages 311–320.
Choromanska, A., LeCun, Y., and Arous, G. B. (2015). The landscape of the loss surfaces of
multilayer networks. In Proceedings of The 28th Conference on Learning Theory, pages 1756–
1760.
Cole, R. and Roughgarden, T. (2014). The sample complexity of revenue maximization. In Pro-
ceedings of the 46th ACM Symposium on Theory of Computing, pages 243–252.
Conitzer, V. and Sandholm, T. (2002). Complexity of mechanism design. In Proceedings of the
18th Conference on Uncertainty in Artificial Intelligence, pages 103–110.
Conitzer, V. and Sandholm, T. (2004). Self-interested automated mechanism design and implica-
tions for optimal combinatorial auctions. In Proceedings of the 5th ACM Conference on Electronic
Commerce, pages 132–141.
Curry, M. J., Chiang, P., Goldstein, T., and Dickerson, J. (2020). Certifying strategyproof auction
networks. In Advances in Neural Information Processing Systems 33.
Curry, M. J., Lyi, U., Goldstein, T., and Dickerson, J. (2022). Learning revenue-maximizing
auctions with differentiable matching. In 25th International Conference on Artificial Intelligence
and Statistics (AISTATS).
Daskalakis, C., Deckelbaum, A., and Tzamos, C. (2013). Mechanism design via optimal transport.
In Proceedings of the 14th ACM Conference on Electronic Commerce, pages 269–286.
Daskalakis, C., Deckelbaum, A., and Tzamos, C. (2017). Strong duality for a multiple-good mo-
nopolist. Econometrica, 85:735–767.
Daskalakis, C. and Weinberg, S. M. (2012). Symmetries and optimal multi-dimensional mechanism
design. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 370–387.
Day, R. and Milgrom, P. (2008). Core-selecting package auctions. International Journal of Game
Theory, 36(3):393–407.
Du, S., Lee, J., Li, H., Wang, L., and Zhai, X. (2019). Gradient descent finds global minima of
deep neural networks. In Proceedings of the 36th International Conference on Machine Learning,
volume 97, pages 1675–1685.
Duan, Z., Tang, J., Yin, Y., Feng, Z., Yan, X., Zaheer, M., and Deng, X. (2022). A context-
integrated transformer-based neural network for auction design. In Proceedings of the 39th In-
ternational Conference on Machine Learning, volume 162, pages 5609–5626. PMLR.
Duan, Z., Zhang, D., Huang, W., Du, Y., Wang, J., Yang, Y., and Deng, X. (2021). Towards the
PAC learnability of Nash equilibrium. arXiv preprint arXiv:2108.07472.
Dütting, P., Feng, Z., Narasimhan, H., Parkes, D. C., and Ravindranath, S. S. (2019). Optimal
auctions through deep learning. In Proceedings of the 36th International Conference on Machine
Learning, ICML, volume 97 of Proceedings of Machine Learning Research, pages 1706–1715.
Dütting, P., Feng, Z., Narasimhan, H., Parkes, D. C., and Ravindranath, S. S. (2021). Optimal
auctions through deep learning. Commun. ACM, 64(8):109–116.
Dütting, P., Fischer, F., Jirapinyo, P., Lai, J., Lubin, B., and Parkes, D. C. (2014). Payment
rules through discriminant-based classifiers. ACM Transactions on Economics and Computation,
3(1):5.
Feng, Z., Narasimhan, H., and Parkes, D. C. (2018). Deep learning for revenue-optimal auctions
with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and
Multiagent Systems, pages 354–362.
Feng, Z., Parkes, D. C., and Ravindranath, S. S. (2022). Differentiable economics. In Echenique, F.,
Immorlica, N., and Vazirani, V., editors, Online matching theory and market design. Cambridge
University Press.
Fudenberg, D. and Liang, A. (2019). Predicting and understanding initial play. American Economic
Review, 109:4112–4141.
Giannakopoulos, Y. and Koutsoupias, E. (2018). Duality and optimality of auctions for uniform
distributions. SIAM Journal on Computing, 47:121–165.
Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural
networks. In Proceedings of the 13th International Conference on Artificial Intelligence and
Statistics.
Goeree, J. K. and Lien, Y. (2016). On the impossibility of core-selecting auctions. Theoretical
Economics, 11:41–52.
Golowich, N., Narasimhan, H., and Parkes, D. C. (2018). Deep learning for multi-facility loca-
tion mechanism design. In Proceedings of the 27th International Joint Conference on Artificial
Intelligence, pages 261–267.
Gonczarowski, Y. A. and Nisan, N. (2017). Efficient empirical revenue maximization in single-
parameter auction environments. In Proceedings of the 49th Annual ACM Symposium on Theory
of Computing, pages 856–868.
Gonczarowski, Y. A. and Weinberg, S. M. (2018). The sample complexity of up-to-ε multi-
dimensional revenue maximization. In 59th IEEE Annual Symposium on Foundations of Com-
puter Science, pages 416–426.
Guesnerie, R. (1995). A Contribution to the Pure Theory of Taxation. Cambridge University Press.
Guo, M. and Conitzer, V. (2010). Computationally feasible automated mechanism design: General
approach and case studies. In Proceedings of the 24th AAAI Conference on Artificial Intelligence.
Haghpanah, N. and Hartline, J. (2019). When is pure bundling optimal? Review of Economic
Studies. Revise and resubmit.
Hammond, P. (1979). Straightforward individual incentive compatibility in large economies. Review
of Economic Studies, 46:263–282.
Hart, S. and Nisan, N. (2017). Approximate revenue maximization with multiple items. Journal
of Economic Theory, 172:313–347.
Hartford, J. S., Lewis, G., Leyton-Brown, K., and Taddy, M. (2017). Deep IV: A flexible approach
for counterfactual prediction. In Proceedings of the 34th International Conference on Machine
Learning, pages 1414–1423.
Hartford, J. S., Wright, J. R., and Leyton-Brown, K. (2016). Deep learning for predicting human
strategic behavior. In Proceedings of the 29th Conference on Neural Information Processing
Systems, pages 2424–2432.
Huang, Z., Mansour, Y., and Roughgarden, T. (2018). Making the most of your samples. SIAM
Journal on Computing, 47(3):651–674.
Ivanov, D., Safiulin, I., Balabaeva, K., and Filippov, I. (2022). Optimal-er auctions through atten-
tion. arXiv preprint arXiv:2202.13110.
Jehiel, P., Meyer-ter-Vehn, M., and Moldovanu, B. (2007). Mixed bundling auctions. Journal of
Economic Theory, 134(1):494–512.
Jordan, P. R., Schvartzman, L. J., and Wellman, M. P. (2010). Strategy exploration in empir-
ical games. In 9th International Conference on Autonomous Agents and Multiagent Systems
(AAMAS), pages 1131–1138.
Kawaguchi, K. (2016). Deep learning without poor local minima. In Proceedings of the 30th
Conference on Neural Information Processing Systems, pages 586–594.
Kiekintveld, C. and Wellman, M. P. (2008). Selecting strategies using empirical game models: An
experimental analysis of meta-strategies. In 7th International Joint Conference on Autonomous
Agents and Multiagent Systems (AAMAS), pages 1095–1101.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. CoRR,
abs/1412.6980.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep con-
volutional neural networks. In Advances in Neural Information Processing Systems 25, pages
1097–1105.
Kuo, K., Ostuni, A., Horishny, E., Curry, M. J., Dooley, S., Chiang, P., Goldstein, T., and Dicker-
son, J. P. (2020). ProportionNet: Balancing fairness and revenue for auction design with deep
learning. CoRR, abs/2010.06398.
Lahaie, S. (2011). A kernel-based iterative combinatorial auction. In Proceedings of the 25th AAAI
Conference on Artificial Intelligence, pages 695–700.
Lanctot, M., Zambaldi, V. F., Gruslys, A., Lazaridou, A., Tuyls, K., Pérolat, J., Silver, D., and
Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In
Advances in Neural Information Processing Systems 30, pages 4190–4203.
Levin, J. and Skrzypacz, A. (2016). Properties of the combinatorial clock auction. American
Economic Review, 106:2528–2551.
Louizos, C., Shalit, U., Mooij, J. M., Sontag, D., Zemel, R. S., and Welling, M. (2017). Causal
effect inference with deep latent-variable models. In Proceedings of the 30th Conference on Neural
Information Processing Systems, pages 6449–6459.
Manelli, A. and Vincent, D. (2006). Bundling as an optimal selling mechanism for a multiple-good
monopolist. Journal of Economic Theory, 127(1):1–35.
Mohri, M. and Medina, A. M. (2016). Learning algorithms for second-price auctions with reserve.
Journal of Machine Learning Research, 17:74:1–74:25.
Morgenstern, J. and Roughgarden, T. (2016). Learning simple auctions. In Proceedings of the 29th
Conference on Learning Theory, pages 1298–1318.
Narasimhan, H., Agarwal, S., and Parkes, D. C. (2016). Automated mechanism design without
money via machine learning. In Proceedings of the 25th International Joint Conference on Arti-
ficial Intelligence, pages 433–439.
Palacios-Huerta, I., Parkes, D. C., and Steinberg, R. (2022). Combinatorial auctions in practice.
Journal of Economic Literature, forthcoming.
Pavlov, G. (2011). Optimal mechanism for selling two goods. B.E. Journal of Theoretical Eco-
nomics, 11:1–35.
Peri, N., Curry, M. J., Dooley, S., and Dickerson, J. P. (2021). PreferenceNet: Encoding human
preferences in auction design with deep learning. In Proceedings of the 35th Conference on Neural
Information Processing Systems (NeurIPS).
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., and Griffiths, T. L. (2021). Using
large-scale experiments and machine learning to discover theories of human decision-making.
Science, 372(6547):1209–1214.
Raghu, M., Irpan, A., Andreas, J., Kleinberg, R., Le, Q. V., and Kleinberg, J. M. (2018). Can
deep reinforcement learning solve Erdos-Selfridge-Spencer games? In Proceedings of the 35th
International Conference on Machine Learning, pages 4235–4243.
Rahme, J., Jelassi, S., Bruna, J., and Weinberg, S. M. (2021a). A permutation-equivariant neural
network architecture for auction design. In Thirty-Fifth AAAI Conference on Artificial Intelli-
gence, AAAI, pages 5664–5672.
Rahme, J., Jelassi, S., and Weinberg, S. M. (2021b). Auction learning as a two-player game. In
9th International Conference on Learning Representations, ICLR.
Ravindranath, S. S., Feng, Z., Li, S., Ma, J., Kominers, S. D., and Parkes, D. C. (2021). Deep
learning for two-sided matching. CoRR, abs/2107.03427.
Rochet, J.-C. (1987). A necessary and sufficient condition for rationalizability in a quasilinear
context. Journal of Mathematical Economics, 16:191–200.
Rudin, C. and Schapire, R. E. (2009). Margin-based ranking and an equivalence between AdaBoost
and RankBoost. Journal of Machine Learning Research, 10:2193–2232.
Shen, W., Peng, B., Liu, H., Zhang, M., Qian, R., Hong, Y., Guo, Z., Ding, Z., Lu, P., and Tang,
P. (2020). Reinforcement mechanism design: With applications to dynamic pricing in sponsored
search auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34,
pages 2236–2243.
Shen, W., Tang, P., and Zuo, S. (2019). Automated mechanism design via neural networks. In
Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems.
Sill, J. (1998). Monotonic networks. In Proceedings of the 12th Conference on Neural Information
Processing Systems, pages 661–667.
Syrgkanis, V. (2017). A sample complexity measure with applications to learning optimal auctions.
In Proceedings of the 20th Conference on Neural Information Processing Systems, pages 5358–
5365.
Tacchetti, A., Strouse, D., Garnelo, M., Graepel, T., and Bachrach, Y. (2019). A neural architecture
for designing truthful and efficient auctions. CoRR, abs/1907.05181.
Thompson, D., Newman, N., and Leyton-Brown, K. (2017). The Positronic Economist: A compu-
tational system for analyzing economic mechanisms. In Proceedings of the 31st AAAI Conference
on Artificial Intelligence, pages 720–727.
Vorobeychik, Y., Kiekintveld, C., and Wellman, M. P. (2006). Empirical mechanism design:
Methods, with application to a supply-chain scenario. In Proceedings of the 7th ACM Conference
on Electronic Commerce (EC-2006), pages 306–315.
Vorobeychik, Y., Reeves, D. M., and Wellman, M. P. (2012). Constrained automated mechanism
design for infinite games of incomplete information. Auton. Agents Multi Agent Syst., 25(2):313–
351.
Wang, K., Xu, L., Perrault, A., Reiter, M. K., and Tambe, M. (2022a). Coordinating followers to
reach better equilibria: End-to-end gradient descent for Stackelberg games. In AAAI Conference
on Artificial Intelligence.
Wang, X., Ma, G. Q., Eden, A., Li, C., Trott, A., Zheng, S., and Parkes, D. C. (2022b). Using rein-
forcement learning to study platform economies under market shocks. CoRR, arXiv:2203.13395.
Yao, A. C.-C. (2015). An n-to-1 bidder reduction for multi-item auctions and its applications. In
Proceedings of the 26th ACM-SIAM Symposium on Discrete Algorithms, pages 92–109.
Yao, A. C.-C. (2017). Dominant-strategy versus Bayesian multi-item auctions: Maximum revenue
determination and comparison. In Proceedings of the 18th ACM Conference on Economics and
Computation, pages 3–20.
Zheng, S., Trott, A., Srinivasa, S., Parkes, D. C., and Socher, R. (2022). The AI Economist:
Optimal economic policy design via two-level deep reinforcement learning. Science Advances, 8.
A Additional Architectures
In this appendix we present additional network architectures, for a multi-bidder single-item setting,
and for a general multi-bidder multi-item setting with combinatorial valuations.
The structure of the revenue-optimal auction is well understood for the multi-bidder single-item setting.
Theorem A.1 (Myerson (1981)). There exists a collection of monotonically non-decreasing functions φ̄_i : R_{≥0} → R, called the ironed virtual valuation functions, such that the optimal BIC auction for selling a single item is the DSIC auction that assigns the item to the buyer with the highest ironed virtual value φ̄_i(v_i), provided that this value is non-negative, with ties broken in an arbitrary value-independent manner, and charges the bidders according to p_i(v_i) = v_i g_i(v_i) − ∫_0^{v_i} g_i(t) dt.

For a distribution F_i with density f_i, the virtual valuation function is ψ_i(v_i) = v_i − (1 − F_i(v_i))/f_i(v_i). A distribution F_i with density f_i is regular if ψ_i is monotonically non-decreasing. For regular distributions F_1, …, F_n no ironing is required, and φ̄_i = ψ_i for all i.
If the virtual valuation functions ψ_1, …, ψ_n are furthermore monotonically increasing, and not only monotonically non-decreasing, the optimal auction can be viewed as applying the monotone transformations b̄_i = φ̄_i(b_i) to the input bids, feeding the computed virtual values to a second price auction (SPA) with zero reserve price, denoted (g^0, p^0), making an allocation according to g^0(b̄), and charging winning bidder i a payment φ̄_i^{−1}(p_i^0(b̄)). In fact, this auction is DSIC for any choice of strictly monotone transformations of the values:

Theorem A.2. For any set of strictly monotonically increasing functions φ̄_1, …, φ̄_n, an auction defined by outcome rule g_i = g_i^0 ∘ φ̄ and payment rule p_i = φ̄_i^{−1} ∘ p_i^0 ∘ φ̄ is DSIC and IR, where (g^0, p^0) is the allocation and payment rule of a second price auction with zero reserve.
For regular distributions with monotonically increasing virtual value functions, designing an optimal DSIC auction thus reduces to finding the right strictly monotone transformations and corresponding inverses, and modeling a second price auction with zero reserve.
We present a high-level overview of a neural network architecture that achieves this in Fig-
ure 15(a), and describe the components of this network in more detail in Section A.1.1 and Sec-
tion A.1.2 below.
The MyersonNet is tailored to monotonically increasing virtual value functions. For regular dis-
tributions with virtual value functions that are not strictly increasing and for irregular distributions
this approach only yields approximately optimal auctions.
Figure 15: (a) MyersonNet: The network applies monotone transformations φ̄_1, …, φ̄_n to the input bids, passes the virtual values to the SPA-0 network in Figure 16, and applies the inverse transformations φ̄_1^{−1}, …, φ̄_n^{−1} to the payment outputs. (b) Monotone virtual value function φ̄_i, where h_{kj}(b_i) = e^{α^i_{kj}} b_i + β^i_{kj}.
Each transform φ̄_i is modeled as a min over K groups of a max over J linear functions:

    φ̄_i(b_i) = min_{k∈[K]} max_{j∈[J]} ( w^i_{kj} b_i + β^i_{kj} ),

with strictly positive slopes w^i_{kj} > 0. Since each of these linear functions is strictly increasing, so is φ̄_i. In practice, we set each w^i_{kj} = e^{α^i_{kj}} for parameters α^i_{kj} ∈ [−B, B] in a bounded range. A graphical representation of the neural network used for this transform is shown in Figure 15(b). For sufficiently large K and J, this neural network can be used to approximate any continuous, bounded monotone function (that satisfies a mild regularity condition) to an arbitrary degree of accuracy (Sill, 1998). A particular advantage of this representation is that the inverse transform φ̄_i^{−1} can be directly obtained from the parameters of the forward transform:

    φ̄_i^{−1}(y) = max_{k∈[K]} min_{j∈[J]} e^{−α^i_{kj}} ( y − β^i_{kj} ).
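To make this concrete, here is a small NumPy sketch of the min-max monotone transform and its closed-form inverse; the function names and the random parameter values are illustrative, not part of the paper's implementation:

```python
import numpy as np

def monotone_transform(b, alpha, beta):
    """Forward transform: min over k of max over j of exp(alpha[k,j])*b + beta[k,j].

    Every linear piece has positive slope exp(alpha[k,j]), so the max over j
    and the min over k are both strictly increasing in b."""
    lines = np.exp(alpha) * b + beta          # shape (K, J)
    return lines.max(axis=1).min()

def monotone_inverse(y, alpha, beta):
    """Closed-form inverse: max over k of min over j of exp(-alpha[k,j])*(y - beta[k,j])."""
    lines = np.exp(-alpha) * (y - beta)
    return lines.min(axis=1).max()

rng = np.random.default_rng(0)
K, J = 5, 10
alpha = rng.uniform(-1.0, 1.0, size=(K, J))
beta = rng.uniform(-1.0, 1.0, size=(K, J))

# The inverse recovers the input bid up to numerical precision.
for b in [0.1, 0.7, 2.3]:
    y = monotone_transform(b, alpha, beta)
    assert abs(monotone_inverse(y, alpha, beta) - b) < 1e-9
```

The exact invertibility of the max-of-mins form is what makes the payment computation in Figure 15(a) cheap: no numerical root-finding is needed.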
The allocation rule is computed by a softmax over the scaled virtual values together with an additional dummy input of zero (corresponding to the item not being sold):

    g^0_i(b̄) = e^{κ b̄_i} / ( Σ_{j∈N} e^{κ b̄_j} + 1 ),  i ∈ N,

where κ > 0 is a constant fixed a priori, and determines the quality of the approximation. The higher the value of κ, the better the approximation, but the less smooth the resulting allocation function.

The SPA-0 payment to bidder i, conditioned on being allocated, is the maximum of the virtual values from the other bidders and zero:

    t^0_i(b̄) = max{ max_{j≠i} b̄_j, 0 },  i ∈ N.   (12)
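A minimal sketch of the SPA-0 computation, assuming the softmax-with-dummy-zero allocation described above; the function name and the choice of κ are illustrative:

```python
import numpy as np

def spa0(vbids, kappa=1e3):
    """Approximate second-price auction with zero reserve on virtual bids.

    Allocation: softmax over kappa-scaled virtual bids plus a dummy zero entry
    (the dummy absorbs the probability of not selling); a large kappa pushes
    the allocation toward the argmax. Conditional payment for bidder i, as in
    eq. (12): the larger of zero and the highest competing virtual bid."""
    n = len(vbids)
    scores = np.append(kappa * vbids, 0.0)    # dummy entry for "no sale"
    scores = scores - scores.max()            # numerically stable softmax
    probs = np.exp(scores) / np.exp(scores).sum()
    alloc = probs[:n]
    pay = np.array([np.max(np.delete(vbids, i), initial=0.0) for i in range(n)])
    return alloc, pay

alloc, pay = spa0(np.array([0.9, 0.4, -0.2]))
assert alloc.argmax() == 0 and alloc[0] > 0.99   # highest virtual bid wins
assert abs(pay[0] - 0.4) < 1e-12                 # pays max(second-highest, 0)
```

Note that, by construction, the allocation probabilities sum to less than one, with the residual mass assigned to the dummy "no sale" outcome.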
Figure 16: The SPA-0 network: a softmax over the virtual bids b̄_1, …, b̄_n and a dummy zero input computes the allocation probabilities z_1, …, z_n, and max units compute the conditional payments t^0_1, …, t^0_n.
Let g^{α,β} and t^{α,β} denote the allocation and conditional payment rules for the overall auction in Figure 15(a), where (α, β) are the parameters of the forward monotone transform. Given a sample of valuation profiles S = {v^{(1)}, …, v^{(L)}} drawn i.i.d. from F, we optimize the parameters using the negated revenue on S as the error function, where the revenue is approximated as:

    rev̂(g, t) = (1/L) Σ_{ℓ=1}^L Σ_{i=1}^n g_i^{α,β}(v^{(ℓ)}) t_i^{α,β}(v^{(ℓ)}).   (13)

We solve this training problem using a minibatch stochastic gradient descent solver.
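The shape of this training loop can be illustrated with a deliberately simplified stand-in: instead of the full MyersonNet, we fit a single reserve-price parameter for a two-bidder second price auction by minibatch descent on the negated empirical revenue, with finite-difference gradients in place of backpropagation. Everything here is a toy illustration, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def revenue(r, V):
    """Empirical revenue of a 2-bidder second-price auction with reserve r
    on a sample of valuation profiles V of shape (L, 2)."""
    hi, lo = V.max(axis=1), V.min(axis=1)
    sold = hi >= r
    return float(np.mean(np.where(sold, np.maximum(lo, r), 0.0)))

V_train = rng.uniform(0.0, 1.0, size=(20000, 2))
r, lr, eps = 0.1, 0.05, 1e-2
for step in range(500):
    batch = V_train[rng.integers(0, len(V_train), size=512)]
    # two-sided finite-difference gradient of the NEGATED revenue on the batch
    grad = -(revenue(r + eps, batch) - revenue(r - eps, batch)) / (2 * eps)
    r -= lr * grad                       # SGD step, i.e. revenue ascent

assert 0.3 < r < 0.7                     # optimal reserve for U[0,1] values is 0.5
assert revenue(r, V_train) > revenue(0.0, V_train)
```

Using the same minibatch for both finite-difference evaluations keeps the gradient noise small; the paper's actual training instead differentiates the smooth network outputs directly.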
Let s, s^{(1)}, …, s^{(m)} ∈ R^{n×2^m} denote these bidder scores and item scores. Each group of scores is normalized using a softmax function: s̄_{i,S} = exp(s_{i,S}) / Σ_{S'} exp(s_{i,S'}) and s̄^{(j)}_{i,S} = exp(s^{(j)}_{i,S}) / Σ_{i',S'} exp(s^{(j)}_{i',S'}). The allocation for bidder i and bundle S ⊆ M is defined as the minimum of the normalized bidder-wise score s̄_{i,S} and the normalized item-wise scores s̄^{(j)}_{i,S} for each j ∈ S:

    z_{i,S} = φ^{CF}_{i,S}(s, s^{(1)}, …, s^{(m)}) = min{ s̄_{i,S}, s̄^{(j)}_{i,S} : j ∈ S }.   (16)
Similar to the unit-demand setting, we first show that φ^{CF}(s, s^{(1)}, …, s^{(m)}) is combinatorial feasible and that our constructive approach is without loss of generality. See Appendix D.4 for a proof.

Lemma A.1. The matrix φ^{CF}(s, s^{(1)}, …, s^{(m)}) is combinatorial feasible for all s, s^{(1)}, …, s^{(m)} ∈ R^{n×2^m}. For any combinatorial feasible matrix z ∈ [0, 1]^{n×2^m}, there exist s, s^{(1)}, …, s^{(m)} ∈ R^{n×2^m} for which z = φ^{CF}(s, s^{(1)}, …, s^{(m)}).
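The allocation rule (16) and the feasibility property in Lemma A.1 are easy to check numerically. The following sketch indexes bundles by bitmask (bit j set iff item j is in the bundle, with index 0 the empty bundle) and uses random scores; the function name and array shapes are illustrative:

```python
import numpy as np

def combinatorial_allocation(s, s_items):
    """Allocation of eq. (16): z[i, S] is the minimum of a bidder-wise softmax
    score and the item-wise softmax scores for every item j in bundle S.

    s:       (n, 2**m) bidder scores.
    s_items: (m, n, 2**m) item scores."""
    n, num_bundles = s.shape
    m = s_items.shape[0]
    # bidder-wise softmax: normalize each bidder's scores over bundles
    sb = np.exp(s - s.max(axis=1, keepdims=True))
    sb /= sb.sum(axis=1, keepdims=True)
    # item-wise softmax: normalize each item's scores over all (i, S) pairs
    si = np.exp(s_items - s_items.max(axis=(1, 2), keepdims=True))
    si /= si.sum(axis=(1, 2), keepdims=True)
    z = sb.copy()
    for S in range(num_bundles):
        for j in range(m):
            if S & (1 << j):             # item j belongs to bundle S
                z[:, S] = np.minimum(z[:, S], si[j, :, S])
    return z

rng = np.random.default_rng(2)
n, m = 2, 2
z = combinatorial_allocation(rng.normal(size=(n, 4)), rng.normal(size=(m, n, 4)))
# Feasibility: each bidder's total bundle probability and each item's total
# allocation probability are at most one.
assert np.all(z.sum(axis=1) <= 1.0 + 1e-9)
for j in range(m):
    mass_j = sum(z[i, S] for i in range(n) for S in range(4) if S & (1 << j))
    assert mass_j <= 1.0 + 1e-9
```

Feasibility follows because z is dominated entrywise by each of the softmax-normalized score groups, whose relevant sums are one.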
Unfortunately, Example A.1 shows that a combinatorial feasible allocation may not have an
integer decomposition, even for the case of two bidders and two items.
Example A.1. Consider a setting with two bidders and two items, and the following fractional, combinatorial feasible allocation (columns correspond to bundles {1}, {2}, and {1, 2}; rows are separated by semicolons):

    z = [ z_{1,{1}}  z_{1,{2}}  z_{1,{1,2}} ; z_{2,{1}}  z_{2,{2}}  z_{2,{1,2}} ]
      = [ 3/8  3/8  1/4 ; 1/8  1/8  1/4 ]

Any integer decomposition of this allocation z would need to have the following structure:

    z = a·[0 0 1; 0 0 0] + b·[0 0 0; 0 0 1] + c·[1 0 0; 0 1 0] + d·[1 0 0; 0 0 0] + e·[0 0 0; 0 1 0]
      + f·[0 1 0; 1 0 0] + g·[0 1 0; 0 0 0] + h·[0 0 0; 1 0 0]

Matching the bundle column forces a = b = 1/4, and matching bidder 1's single-item entries forces c + d = 3/8 and f + g = 3/8. The weights would therefore have to sum to at least 1/4 + 1/4 + 3/8 + 3/8 = 5/4 > 1, so no such convex decomposition exists.
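The arithmetic behind the example can be verified with exact rational arithmetic, using the weight labels a through h from the decomposition above:

```python
from fractions import Fraction as F

# Entries of the fractional allocation z (columns {1}, {2}, {1,2}).
z1_1, z1_2, z1_12 = F(3, 8), F(3, 8), F(1, 4)
z2_1, z2_2, z2_12 = F(1, 8), F(1, 8), F(1, 4)

# Matching the bundle column forces a = z1_12 and b = z2_12; matching
# bidder 1's single-item entries forces c + d = z1_1 and f + g = z1_2.
# The remaining weights e, h are nonnegative, so the total weight is at least:
total_lower_bound = z1_12 + z2_12 + z1_1 + z1_2
assert total_lower_bound == F(5, 4)

# A convex combination needs total weight at most 1, so no integer
# decomposition of z exists.
assert total_lower_bound > 1
```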
Theorem A.3. For m = 2, any combinatorial feasible allocation z with additional constraints (17) can be represented as a convex combination of matrices B^1, …, B^k, where each B^ℓ is a combinatorial feasible, 0-1 allocation.
Proof. Firstly, we observe that in any deterministic allocation B^ℓ, if there exists an i s.t. B^ℓ_{i,{1,2}} = 1, then B^ℓ_{j,S} = 0 for all j ≠ i and all S. Therefore, we first decompose z into the following components,

    z = Σ_{i=1}^n z_{i,{1,2}} · B^i + C,

where

    B^i_{j,S} = 1 if j = i and S = {1, 2}, and 0 otherwise.

Then we want to argue that C can be represented as Σ_{ℓ=n+1}^k p_ℓ · B^ℓ, where Σ_{ℓ=n+1}^k p_ℓ ≤ 1 − Σ_{i=1}^n z_{i,{1,2}} and each B^ℓ is a feasible 0-1 allocation. Matrix C has all zeros in the last (bundle {1, 2}) column, Σ_i C_{i,{1}} ≤ 1 − Σ_{i=1}^n z_{i,{1,2}}, and Σ_i C_{i,{2}} ≤ 1 − Σ_{i=1}^n z_{i,{1,2}}.

In addition, based on constraint (17), for each bidder i,

    C_{i,{1}} + C_{i,{2}} = z_{i,{1}} + z_{i,{2}} ≤ 1 − Σ_{i'=1}^n z_{i',{1,2}}.

Thus C is a doubly stochastic matrix with scaling factor 1 − Σ_{i'=1}^n z_{i',{1,2}}. Therefore, we can always decompose C into a linear combination Σ_{ℓ=n+1}^k p_ℓ · B^ℓ, where Σ_{ℓ=n+1}^k p_ℓ ≤ 1 − Σ_{i'=1}^n z_{i',{1,2}}.
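The key step of the proof, namely that after peeling off the bundle allocations the remainder C is a scaled doubly (sub)stochastic matrix, can be checked on a concrete instance; the numbers below are hypothetical:

```python
import numpy as np

# A fractional allocation z for two bidders, two items (columns {1}, {2}, {1,2})
# that satisfies the per-bidder constraint (17):
#   z[i,{1}] + z[i,{2}] <= 1 - sum_i' z[i',{1,2}]
z = np.array([[0.2, 0.1, 0.3],
              [0.1, 0.2, 0.2]])
bundle_mass = z[:, 2].sum()                  # total probability of selling the bundle
assert np.all(z[:, 0] + z[:, 1] <= 1 - bundle_mass + 1e-12)

# Peel off the bundle allocations; the remainder C has a zero bundle column.
C = z.copy()
C[:, 2] = 0.0

# C, scaled by 1 / (1 - bundle_mass), has row and column sums at most one,
# which is what lets the decomposition into 0-1 allocations go through.
scale = 1 - bundle_mass
assert np.all(C[:, :2].sum(axis=1) <= scale + 1e-12)   # per-bidder mass
assert np.all(C[:, :2].sum(axis=0) <= scale + 1e-12)   # per-item mass
```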
We leave it to future work to characterize the additional constraints needed for the multi-item (m > 2) case.
To satisfy constraint (17) for each bidder i, we compute the normalized score s̄'_{i,S} for each i, S as

    s̄'_{i,S} = s̄'^{(i)}_{i,S} if S = {1} or S = {2}, and
    s̄'_{i,S} = min{ s̄'^{(k)}_{i,S} : k ∈ N } if S = {1, 2}.
The payment component of the network for combinatorial bidders has the same structure as the one in Figure 3, computing a fractional payment p̃_i ∈ [0, 1] for each bidder i using a sigmoidal unit, and outputting a payment p_i = p̃_i Σ_{S⊆M} z_{i,S} b_{i,S}.
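A sketch of this payment head, assuming a raw per-bidder network score that is squashed by a sigmoid; names and inputs are illustrative:

```python
import numpy as np

def payments(z, bids, p_tilde_logits):
    """Payment head sketch: a sigmoid maps a raw network score (hypothetical
    name: p_tilde_logits) to a fractional payment in [0, 1], which then
    scales bidder i's expected reported value sum_S z[i, S] * bids[i, S]."""
    p_tilde = 1.0 / (1.0 + np.exp(-p_tilde_logits))     # sigmoid -> [0, 1]
    return p_tilde * (z * bids).sum(axis=1)

z = np.array([[0.5, 0.25], [0.25, 0.5]])
bids = np.array([[1.0, 2.0], [2.0, 1.0]])
p = payments(z, bids, np.array([0.0, 10.0]))

# Payments never exceed the expected reported value, which guarantees IR
# for truthful bidders by construction.
assert np.all(p <= (z * bids).sum(axis=1) + 1e-12)
assert abs(p[0] - 0.5 * (0.5 * 1.0 + 0.25 * 2.0)) < 1e-12   # sigmoid(0) = 0.5
```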
Distribution   n   Opt rev   SPA rev   MyersonNet rev
Setting G      3   0.531     0.500     0.531
Setting H      5   2.314     2.025     2.305
Setting I      3   2.749     2.500     2.747
Setting J      3   2.368     2.210     2.355

Figure 17: The test revenue of the single-item auctions obtained with MyersonNet.
B Additional Experiments
We present a broad range of additional experiments for the two main architectures used in the body of the paper, and additional ones for the architectures presented in Appendix A.
J. Three bidders with independent irregular distributions Firregular , where each vi is drawn from
U [0, 3] with probability 3/4 and from U [3, 8] with probability 1/4.
We note that the optimal auctions for the first three distributions involve virtual value functions
φ̄i that are strictly monotone. For the fourth and final distribution the optimal auction uses ironed
virtual value functions that are not strictly monotone.
For the training set and test set we used 1,000 valuation profiles sampled i.i.d. from the respective valuation distribution. We modeled each transform φ̄_i in the MyersonNet architecture using 5 sets of 10 linear functions, and we used κ = 10^3.
The results are summarized in Figure 17. For comparison, we also report the revenue obtained
by the optimal Myerson auction and the second price auction (SPA) without reserve. The auctions
learned by the neural network yield revenue close to the optimal.
K. Single additive bidder with independent preferences over two non-identically distributed
items, where v1 ∼ U [4, 16] and v2 ∼ U [4, 7]. The optimal mechanism is given by Daskalakis
et al. (2017).
Figure 18: Side-by-side comparison of the allocation rules learned by RochetNet and RegretNet for single-bidder, two-item settings. Panels (a) and (b) are for Setting K, Panels (c) and (d) are for Setting L, and Panels (e) and (f) for Setting M. Panels describe the learned allocations for the two items (item 1 on the left, item 2 on the right). Optimal mechanisms are indicated via dashed lines and allocation probabilities in each region.
L. Single additive bidder with preferences over two items, where (v1 , v2 ) are drawn jointly and
uniformly from a unit triangle with vertices (0, 0), (0, 1) and (1, 0). The optimal mechanism
is due to Haghpanah and Hartline (2019).
M. Single unit-demand bidder with independent preferences over two items, where the item values
v1 , v2 ∼ U [0, 1]. See Pavlov (2011) for the optimal mechanism.
We used RegretNet architectures with two hidden layers with 100 nodes each. The optimal
allocation rules as well as a side-by-side comparison of those found by RochetNet and RegretNet
are given in Figure 18. Figure 19 gives the revenue and regret achieved by RegretNet and the
revenue achieved by RochetNet.
We find that in all three settings RochetNet recovers the optimal mechanism essentially exactly, while RegretNet finds an auction that matches the optimal design to a surprising degree of accuracy.
Distribution   Opt rev   RegretNet rev   rgt(mean)   rgt(90%)   rgt(95%)   rgt(99%)   RochetNet rev
Setting K      9.781     9.734           <0.001      <0.001     <0.001     0.001      9.779
Setting L      0.388     0.392           <0.001      <0.001     <0.001     0.001      0.388
Setting M      0.384     0.384           <0.001      <0.001     <0.001     0.001      0.384

Figure 19: Test revenue and test regret achieved by RegretNet and test revenue achieved by RochetNet for Settings K–M.
Figure 20: Test revenue and test regret achieved by RegretNet for Setting N as we increase the size of the test set.
O. Two bidders and two items, with item valuations v1,1 , v1,2 , v2,1 , v2,2 drawn independently from
U [1, 2] and set valuations v1,{1,2} = v1,1 + v1,2 + C1 and v2,{1,2} = v2,1 + v2,2 + C2 , where C1 , C2
are drawn independently from U [−1, 1].
P. Two bidders and two items, with item valuations v1,1 , v1,2 drawn independently from U [1, 2],
item valuations v2,1 , v2,2 drawn independently from U [1, 5], and set valuations v1,{1,2} = v1,1 +
v1,2 + C1 and v2,{1,2} = v2,1 + v2,2 + C2 , where C1 , C2 are drawn independently from U [−1, 1].
These settings correspond to Settings I.-III. described in Section 3.4 of Sandholm and Likhodedov (2015). These authors conducted extensive experiments with several different classes of incentive compatible mechanisms, and different heuristics for setting the parameters of these auctions. They observed the highest revenue for two classes of mechanisms that generalize mixed bundling auctions and λ-auctions (Jehiel et al., 2007).
These two classes of mechanisms are the Virtual Value Combinatorial Auctions (VVCA) and Affine Maximizer Auctions (AMA). They also considered a restriction of AMA to bidder-symmetric auctions (AMA_bsym). We use VVCA*, AMA*, and AMA*_bsym to denote the best mechanism in the respective class, as reported by Sandholm and Likhodedov and found using a heuristic grid-search technique.
For Settings N and O, Sandholm and Likhodedov observed the highest revenue for AMA*_bsym, and for Setting P the best-performing mechanism was VVCA*. Figure 21 compares the performance of RegretNet to that of these best-performing benchmark mechanisms. To compute the revenue of the benchmark mechanisms we used the parameters reported in Sandholm and Likhodedov (2015) (Table 2, p. 1011), and evaluated the respective mechanisms on the same test set used for RegretNet. Note that RegretNet is able to learn new auctions with improved revenue and negligible regret.
Distribution   RegretNet rev   rgt(mean)   rgt(90%)   rgt(95%)   rgt(99%)   VVCA* rev   AMA*_bsym rev
Setting N      0.878           <0.001      <0.001     <0.001     0.001      —           0.862
Setting O      2.860           <0.001      <0.001     <0.001     <0.001     —           2.765
Setting P      4.269           <0.001      <0.001     <0.001     <0.001     4.209       —

Figure 21: Test revenue and test regret for RegretNet for Settings N–P, and a comparison with the best performing VVCA and AMA_bsym auctions as reported by Sandholm and Likhodedov (2015).
In order to make sure we are using sufficient data to report our results, we re-ran our evaluation for Setting N on a bigger test set with up to 50,000 samples and computed the regret using 5,000 gradient ascent steps. The estimated revenue and regret remained approximately the same as those observed on our regular test set with 10,000 samples, with regret computed using 2,000 gradient ascent steps. Figure 20 shows how the revenue and regret vary as we increase the size of the test set.
Q. One additive bidder and two items, where the bidder's valuation is drawn uniformly from the triangle T = {(v₁, v₂) | v₁ + v₂(c − 1) ≤ 2c − 1, v₁ ≥ 1, v₂ ≥ 1}, where c ≥ 1 is a free parameter.
Figure 23: Allocation rules learned by RochetNet for Setting Q. The panels describe the probability that the bidder is allocated item 1 (left) and item 2 (right) for different values c = 2.0, 4.0, 6.0, 7.0, 8.0, 9.0, 10.0, and 12.0.
R. One additive bidder and two items, where the bidder’s valuation is drawn uniformly from the
triangle T = {(v1 , v2 )|v1 + v2 ≤ c + 1, v1 ≥ 1, v2 ≥ 1} where c ≥ 1 is a free parameter.
The mechanisms learned by RochetNet for Setting Q and Setting R for various values of c are shown in Figure 23 and Figure 24, respectively.
Figure 24: Allocation rules learned by RochetNet for Setting R. The panels describe the probability that the bidder is allocated item 1 (left) and item 2 (right) for different values c = 1.25, 1.5, 2.0, 3.0, 5.0, 7.0, 9.0, and 11.0.
the IR violation is zero but the revenue is much lower. Increasing the amount of discretization in the LP leads to more accurate results with lower regret (and lower IR violations with -nearest rounding), but the number of parameters and the run time also increase exponentially. For the setting with 12 bins per value, the LP did not terminate despite running for 9 days on an AWS EC2 instance with 48 cores and 96 GB of memory. In contrast, RegretNet learns a mechanism in this setting with negligible regret and zero IR violations in at most six hours for most configurations. In Figure 25, we report the test revenue and test regret achieved by RegretNet for different numbers of hidden layers (R) and hidden units (K).
D Omitted Proofs
We present formal proofs for all theorems and lemmas that are stated in the body of the paper or in other appendices. We first introduce some notation. We denote the inner product between vectors a, b ∈ R^d as ⟨a, b⟩ = Σ_{i=1}^d a_i b_i. We denote the ℓ₁ norm of a vector x by ‖x‖₁ and the induced ℓ₁ norm of a matrix A ∈ R^{k×t} by ‖A‖₁ = max_{1≤j≤t} Σ_{i=1}^k |A_{ij}|.
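For concreteness, the induced ℓ₁ matrix norm defined above is the maximum absolute column sum, which coincides with NumPy's `np.linalg.norm(A, 1)`:

```python
import numpy as np

def induced_l1_norm(A):
    """Induced l1 matrix norm: the maximum column sum of absolute values,
    max_j sum_i |A[i, j]|."""
    return np.abs(A).sum(axis=0).max()

A = np.array([[1.0, -2.0], [3.0, 0.5]])
assert induced_l1_norm(A) == 4.0                   # column 0: |1| + |3| = 4
assert induced_l1_norm(A) == np.linalg.norm(A, 1)  # matches NumPy's ord=1 norm
```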
bins/value   Parameters   rev     rgt(mean)   IR viol.   Run-time (in hours)
D = 2        96           1.495   0.187       0.374      8e-6
                          0       0           0
D = 3        486          0.994   0.046       0.090      2e-5
                          0       0.020       0
D = 4        1536         1.073   0.024       0.040      9e-5
                          0.762   0.005       0
D = 5        3750         0.978   0.013       0.022      3e-4
                          0.706   0.002       0
D = 6        7776         0.987   0.008       0.012      3e-4
                          0.799   0.002       0
D = 7        14406        0.967   0.006       0.009      0.003
                          0.816   0.002       0
D = 8        24576        0.953   0.004       0.006      0.007
                          0.825   0.002       0
D = 9        39366        0.941   0.003       0.005      0.016
                          0.827   0.001       0
D = 10       60000        0.936   0.003       0.004      0.45
                          0.839   0.001       0
D = 11       87846        0.927   0.002       0.003      61.9
                          0.837   0.001       0
D = 12       124416       —       —           —          > 216
(a)
R   K     Parameters   rev     rgt(mean)   IR viol.   Training time (in hours)
1   100   2111         0.843   <0.0005     0          3.4
1   50    1061         0.844   <0.0005     0          3.5
1   200   4211         0.841   <0.0005     0          3.5
2   100   22311        0.878   <0.0005     0          3.7
2   200   84611        0.874   <0.0005     0          3.7
1   500   10511        0.841   <0.0005     0          3.8
2   50    6161         0.874   <0.0005     0          3.8
2   500   511511       0.87    <0.0005     0          3.8
4   200   245411       0.88    <0.0005     0          4.2
4   50    16361        0.882   <0.0005     0          4.4
4   100   62711        0.884   <0.0005     0          4.4
4   500   1513511      0.887   <0.0005     0          5.1
8   200   567011       0.896   <0.0005     0          5.2
8   50    36761        0.885   <0.0005     0          5.6
8   100   143511       0.877   <0.001      0          5.6
(b)
Figure 25: Test revenue, test regret, test IR violation, and running time for the setting of Section 5.8, with two additive bidders and two items, with bidder item values sampled independently from U[0, 1]. (a) LP-based method, with varying levels of discretization (D), first row for -nearest and second for -down rounding; and (b) RegretNet, with varying numbers of hidden layers (R) and hidden units (K).
D.1 Proof of Lemma 2.1
Let f_i(v; w) := max_{v'_i ∈ V_i} u^w_i(v_i; (v'_i, v_{−i})) − u^w_i(v_i; (v_i, v_{−i})). Then we have rgt_i(w) = E_{v∼F}[f_i(v; w)]. Rewriting the expected value, we have

    rgt_i(w) = ∫_0^∞ P(f_i(v; w) ≥ x) dx ≥ ∫_0^{rgt^q_i(w)} P(f_i(v; w) ≥ x) dx ≥ q · rgt^q_i(w),

where the last inequality holds because for any 0 < x < rgt^q_i(w), P(f_i(v; w) ≥ x) ≥ P(f_i(v; w) ≥ rgt^q_i(w)) = q.
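The inequality is easy to sanity-check by Monte Carlo, with an exponential distribution standing in for the (unknown) distribution of f_i(v; w):

```python
import numpy as np

rng = np.random.default_rng(3)
# Nonnegative draws standing in for the regret f_i(v; w) across valuations.
f = rng.exponential(scale=0.2, size=100_000)

for q in (0.5, 0.9, 0.95):
    rgt_q = np.quantile(f, 1 - q)   # threshold with P(f >= rgt_q) approx q
    rgt = f.mean()                  # expected regret rgt_i(w)
    # Lemma 2.1: expected regret dominates q times the q-quantile regret.
    assert rgt >= q * rgt_q
```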
D.2.1 Definitions
Let U_i be the class of utility functions for bidder i defined on auctions in M, i.e.,

    U_i = { u_i : V_i × V → R | u_i(v_i, b) = v_i(g(b)) − p_i(b) for some (g, p) ∈ M },

and let U be the class of profiles of utility functions defined on M, i.e., the class of tuples (u_1, …, u_n) where each u_i : V_i × V → R and u_i(v_i, b) = v_i(g(b)) − p_i(b), for all i ∈ N, for some (g, p) ∈ M.

We will sometimes find it useful to represent the utility function as an inner product, i.e., treating v_i as a real-valued vector of length 2^m, we may write u_i(v_i, b) = ⟨v_i, g_i(b)⟩ − p_i(b).

Let rgt ∘ U_i be the class of all regret functions for bidder i defined on utility functions in U_i, i.e.,

    rgt ∘ U_i = { f_i : V → R | f_i(v) = max_{v'_i} u_i(v_i, (v'_i, v_{−i})) − u_i(v_i, v) for some u_i ∈ U_i },

and, as before, let rgt ∘ U be defined as the class of profiles of regret functions.

Define the ℓ_{∞,1} distance between two utility function profiles u and u' as

    max_{v, v'} Σ_i |u_i(v_i, (v'_i, v_{−i})) − u'_i(v_i, (v'_i, v_{−i}))|,

and let N_∞(U, ϵ) denote the minimum number of balls of radius ϵ needed to cover U under this distance. Similarly, define the distance between u_i and u'_i as max_{v, v'_i} |u_i(v_i, (v'_i, v_{−i})) − u'_i(v_i, (v'_i, v_{−i}))|, and let N_∞(U_i, ϵ) denote the minimum number of balls of radius ϵ needed to cover U_i under this distance. Similarly, we define covering numbers N_∞(rgt ∘ U_i, ϵ) and N_∞(rgt ∘ U, ϵ) for the function classes rgt ∘ U_i and rgt ∘ U, respectively.

Moreover, we denote the class of allocation functions as G and, for each bidder i, G_i = { g_i : V → [0, 1]^{2^m} | g ∈ G }. Similarly, we denote the class of payment functions by P and P_i = { p_i : V → R | p ∈ P }. We denote the covering number of P by N_∞(P, ϵ) under the ℓ_{∞,1} distance and the covering number of P_i by N_∞(P_i, ϵ) under the ℓ_{∞,1} distance.
D.2.2 Auxiliary Lemma
We will use a lemma from Shalev-Shwartz and Ben-David (2014). Let F denote a class of bounded functions f : Z → [−c, c] defined on an input space Z, for some c > 0. Let D be a distribution over Z and S = {z_1, …, z_L} a sample drawn i.i.d. from D. We are interested in the gap between the expected value of a function f and the average value of the function on the sample S, and would like to bound this gap uniformly for all functions in F. For this, we measure the capacity of the function class F using the empirical Rademacher complexity on the sample S, defined below:

    R̂_L(F) := E_σ[ sup_{f∈F} (1/L) Σ_{z_i ∈ S} σ_i f(z_i) ],

where σ ∈ {−1, 1}^L and each σ_i is drawn i.i.d. from a uniform distribution on {−1, 1}. We then have:

Lemma D.1 (Shalev-Shwartz and Ben-David (2014)). Let S = {z_1, …, z_L} be a sample drawn i.i.d. from some distribution D over Z. Then with probability at least 1 − δ over the draw of S from D, for all f ∈ F,

    E_{z∼D}[f(z)] ≤ (1/L) Σ_{ℓ=1}^L f(z_ℓ) + 2 R̂_L(F) + 4c √( 2 log(4/δ) / L ).
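For a finite function class, the empirical Rademacher complexity can be estimated directly by sampling σ and compared against Massart's finite-class bound, which is invoked later in the argument; the class below is random and purely illustrative:

```python
import numpy as np

def empirical_rademacher(F_values, num_draws=2000, seed=4):
    """Monte Carlo estimate of R_hat_L(F) = E_sigma[sup_f (1/L) sum_l sigma_l f(z_l)]
    for a finite class given as a matrix F_values of shape (|F|, L)."""
    rng = np.random.default_rng(seed)
    L = F_values.shape[1]
    sups = []
    for _ in range(num_draws):
        sigma = rng.choice([-1.0, 1.0], size=L)
        sups.append((F_values @ sigma).max() / L)   # sup over the class
    return float(np.mean(sups))

rng = np.random.default_rng(5)
L, size = 50, 8
F_values = rng.uniform(-1.0, 1.0, size=(size, L))   # a small finite class
r_hat = empirical_rademacher(F_values)

# Massart's finite-class lemma: R_hat <= max_f ||f||_2 * sqrt(2 log |F|) / L.
massart = np.linalg.norm(F_values, axis=1).max() * np.sqrt(2 * np.log(size)) / L
assert 0.0 <= r_hat <= massart + 1e-9
```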
Note that each function f in this class corresponds to a mechanism (g, p) in M, and the expected value E_{v∼D}[f(v)] gives the (negated) expected revenue from that mechanism. The proof then follows by an application of the uniform convergence bound in Lemma D.1 to the above function class, and by further bounding the Rademacher complexity term in this bound by the covering number of the auction class M.

Applying Lemma D.1 to the auxiliary function class rev ∘ M, we get that, with probability at least 1 − δ over the draw of L valuation profiles S from D, for any f ∈ rev ∘ M, there exists a distribution-independent constant C > 0 such that

    E_{v∼F}[ −Σ_{i∈N} p_i(v) ] ≤ −(1/L) Σ_{ℓ=1}^L Σ_{i=1}^n p_i(v^{(ℓ)}) + 2 R̂_L(rev ∘ M) + C n √( log(1/δ) / L ).   (18)

All that remains is to bound the above empirical Rademacher complexity R̂_L(rev ∘ M) in terms of the covering number of the payment class P, and in turn in terms of the covering number of the auction class M. Since we assume that the auctions in M satisfy individual rationality and v(S) ≤ 1 for all S ⊆ M, we have p_i(v) ≤ 1 for any v.
By the definition of the covering number for the payment class, there exists a cover P̂ for P of size |P̂| ≤ N_∞(P, ϵ) such that for any p ∈ P, there is a p̂ ∈ P̂ with max_v Σ_i |p_i(v) − p̂_i(v)| ≤ ϵ. We thus have:

    R̂_L(rev ∘ M) = E_σ[ sup_p (1/L) Σ_{ℓ=1}^L σ_ℓ · Σ_i p_i(v^{(ℓ)}) ]
    ≤ E_σ[ sup_p (1/L) Σ_{ℓ=1}^L σ_ℓ · Σ_i p̂_i(v^{(ℓ)}) ] + E_σ[ sup_p (1/L) Σ_{ℓ=1}^L σ_ℓ · Σ_i ( p_i(v^{(ℓ)}) − p̂_i(v^{(ℓ)}) ) ]
    ≤ E_σ[ sup_{p̂∈P̂} (1/L) Σ_{ℓ=1}^L σ_ℓ · Σ_i p̂_i(v^{(ℓ)}) ] + (ϵ/L) E_σ[ ‖σ‖₁ ]
    ≤ (1/L) √( Σ_ℓ ( Σ_i p̂_i(v^{(ℓ)}) )² ) · √( 2 log N_∞(P, ϵ) ) + ϵ
    ≤ 2n √( 2 log N_∞(P, ϵ) / L ) + ϵ,   (19)

where the second-to-last inequality follows from Massart's lemma, and the last inequality holds because

    √( Σ_ℓ ( Σ_i p̂_i(v^{(ℓ)}) )² ) ≤ √( Σ_ℓ ( Σ_i p_i(v^{(ℓ)}) + nϵ )² ) ≤ 2n√L.
We further observe that N_∞(P, ϵ) ≤ N_∞(M, ϵ). By the definition of the covering number for the auction class M, there exists a cover M̂ for M of size |M̂| ≤ N_∞(M, ϵ) such that for any (g, p) ∈ M, there is a (ĝ, p̂) ∈ M̂ such that for all v,

    Σ_{i,j} |g_{ij}(v) − ĝ_{ij}(v)| + Σ_i |p_i(v) − p̂_i(v)| ≤ ϵ.

This also implies that Σ_i |p_i(v) − p̂_i(v)| ≤ ϵ, and shows the existence of a cover for P of size at most N_∞(M, ϵ).

Substituting the bound (19) on the Rademacher complexity term into (18), and using the fact that N_∞(P, ϵ) ≤ N_∞(M, ϵ), we get:

    E_{v∼F}[ −Σ_{i∈N} p_i(v) ] ≤ −(1/L) Σ_{ℓ=1}^L Σ_{i=1}^n p_i(v^{(ℓ)}) + 2 · inf_{ϵ>0} { ϵ + 2n √( 2 log N_∞(M, ϵ) / L ) } + C n √( log(1/δ) / L ).
The proof of the generalization bound for regret proceeds in three steps:

(1) bounding the covering number for each regret class $rgt \circ \mathcal{U}_i$ in terms of the covering number for the individual utility classes $\mathcal{U}_i$;

(2) bounding the covering number for the combined utility class $\mathcal{U}$ in terms of the covering number for $\mathcal{M}$;

(3) bounding the covering number for the sum regret class $rgt \circ \mathcal{U}$ in terms of the covering number for the (combined) utility class $\mathcal{U}$.

An application of Lemma D.1 then completes the proof. We prove each of the above steps below.
Proof. By the definition of the covering number, there exists a cover $\hat{\mathcal{U}}_i$ with size at most $N_\infty(\mathcal{U}_i, \epsilon/2)$ such that for any $u_i \in \mathcal{U}_i$, there is a $\hat{u}_i \in \hat{\mathcal{U}}_i$ with
\[
\sup_{v,\,v_i'} |u_i(v_i, (v_i', v_{-i})) - \hat{u}_i(v_i, (v_i', v_{-i}))| \le \epsilon/2.
\]
For any $u_i \in \mathcal{U}_i$, taking $\hat{u}_i \in \hat{\mathcal{U}}_i$ satisfying the above condition, we have for any $v$:
\[
\begin{aligned}
&\Big|\max_{v_i'\in V}\big(u_i(v_i,(v_i',v_{-i})) - u_i(v_i,(v_i,v_{-i}))\big) - \max_{\bar v_i\in V}\big(\hat u_i(v_i,(\bar v_i,v_{-i})) - \hat u_i(v_i,(v_i,v_{-i}))\big)\Big|\\
&\le \Big|\max_{v_i'} u_i(v_i,(v_i',v_{-i})) - \max_{\bar v_i}\hat u_i(v_i,(\bar v_i,v_{-i}))\Big| + \big|\hat u_i(v_i,(v_i,v_{-i})) - u_i(v_i,(v_i,v_{-i}))\big|\\
&\le \Big|\max_{v_i'} u_i(v_i,(v_i',v_{-i})) - \max_{\bar v_i}\hat u_i(v_i,(\bar v_i,v_{-i}))\Big| + \epsilon/2.
\end{aligned}
\]
Let $v_i^* \in \arg\max_{v_i'} u_i(v_i,(v_i',v_{-i}))$ and $\hat v_i^* \in \arg\max_{\bar v_i}\hat u_i(v_i,(\bar v_i,v_{-i}))$; then
\[
\max_{v_i'} u_i(v_i,(v_i',v_{-i})) = u_i(v_i,(v_i^*,v_{-i})) \le \hat u_i(v_i,(v_i^*,v_{-i})) + \epsilon/2 \le \hat u_i(v_i,(\hat v_i^*,v_{-i})) + \epsilon/2 = \max_{\bar v_i}\hat u_i(v_i,(\bar v_i,v_{-i})) + \epsilon/2,
\]
\[
\max_{\bar v_i}\hat u_i(v_i,(\bar v_i,v_{-i})) = \hat u_i(v_i,(\hat v_i^*,v_{-i})) \le u_i(v_i,(\hat v_i^*,v_{-i})) + \epsilon/2 \le u_i(v_i,(v_i^*,v_{-i})) + \epsilon/2 = \max_{v_i'} u_i(v_i,(v_i',v_{-i})) + \epsilon/2.
\]
Thus, for all $u_i \in \mathcal{U}_i$, there exists $\hat u_i \in \hat{\mathcal{U}}_i$ such that for any valuation profile $v$,
\[
\Big|\max_{v_i'}\big(u_i(v_i,(v_i',v_{-i})) - u_i(v_i,(v_i,v_{-i}))\big) - \max_{\bar v_i}\big(\hat u_i(v_i,(\bar v_i,v_{-i})) - \hat u_i(v_i,(v_i,v_{-i}))\big)\Big| \le \epsilon,
\]
which shows $N_\infty(rgt \circ \mathcal{U}_i, \epsilon) \le N_\infty(\mathcal{U}_i, \epsilon/2)$.
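The key fact used here, namely that the max operator is non-expansive ($|\max_x f(x) - \max_x g(x)| \le \sup_x |f(x) - g(x)|$), can be checked numerically on a discretized misreport grid (a toy sketch; the grid size, the choice of index 0 as the truthful report, and $\epsilon = 0.01$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.01
for _ in range(100):
    # u: utilities of one bidder over a grid of possible misreports (index 0 = truthful)
    u = rng.uniform(0, 1, size=101)
    # u_hat: a cover element within eps/2 of u in the sup norm
    u_hat = u + rng.uniform(-eps / 2, eps / 2, size=101)
    rgt = u.max() - u[0]               # regret under u
    rgt_hat = u_hat.max() - u_hat[0]   # regret under the cover element
    assert abs(rgt - rgt_hat) <= eps + 1e-12
```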
Proof. Recall that the utility function of bidder $i$ is $u_i(v_i,(v_i',v_{-i})) = \langle v_i, g_i(v_i',v_{-i})\rangle - p_i(v_i',v_{-i})$. There exists a set $\hat{\mathcal{M}}$ with $|\hat{\mathcal{M}}| \le N_\infty(\mathcal{M},\epsilon/n)$ such that for any $(g,p)\in\mathcal{M}$ there exists $(\hat g,\hat p)\in\hat{\mathcal{M}}$ with
\[
\sup_{v\in V}\;\sum_{i,j}|g_{ij}(v) - \hat g_{ij}(v)| + \|p(v)-\hat p(v)\|_1 \le \epsilon/n.
\]
We denote $\hat u_i(v_i,(v_i',v_{-i})) = \langle v_i, \hat g_i(v_i',v_{-i})\rangle - \hat p_i(v_i',v_{-i})$, where, in the combinatorial case, we treat $v_i$ as a real-valued vector of length $2^m$. For all $v\in V$, $v_i'\in V_i$,
\[
|u_i(v_i,(v_i',v_{-i})) - \hat u_i(v_i,(v_i',v_{-i}))| \le \|v_i\|_\infty\,\|g_i(v_i',v_{-i}) - \hat g_i(v_i',v_{-i})\|_1 + |p_i(v_i',v_{-i}) - \hat p_i(v_i',v_{-i})| \le \epsilon/n,
\]
using $\|v_i\|_\infty \le 1$. Summing over the $n$ bidders gives $\sum_i |u_i - \hat u_i| \le \epsilon$, and hence $N_\infty(\mathcal{U},\epsilon) \le N_\infty(\mathcal{M},\epsilon/n)$.
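The Hölder-type inequality underlying this step, $|\langle v, g\rangle - \langle v, \hat g\rangle| \le \|v\|_\infty \|g - \hat g\|_1$, can be spot-checked numerically (a sketch on random vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(500):
    v = rng.uniform(0, 1, size=8)       # valuations, ||v||_inf <= 1
    g = rng.uniform(0, 1, size=8)       # an allocation row
    g_hat = rng.uniform(0, 1, size=8)   # a cover element
    lhs = abs(v @ g - v @ g_hat)
    rhs = np.max(np.abs(v)) * np.sum(np.abs(g - g_hat))
    assert lhs <= rhs + 1e-12           # Hölder: |<v, g - g_hat>| <= ||v||_inf ||g - g_hat||_1
```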
Proof. By the definition of $N_\infty(\mathcal{U},\epsilon)$, there exists $\hat{\mathcal{U}}$ with size at most $N_\infty(\mathcal{U},\epsilon)$ such that for any $u\in\mathcal{U}$, there exists $\hat u \in \hat{\mathcal{U}}$ such that for all $v, v'\in V$,
\[
\sum_i |u_i(v_i,(v_i',v_{-i})) - \hat u_i(v_i,(v_i',v_{-i}))| \le \epsilon.
\]
Therefore, for all $v\in V$, $|\sum_i u_i(v_i,(v_i',v_{-i})) - \sum_i \hat u_i(v_i,(v_i',v_{-i}))| \le \epsilon$. Following the argument in Step 1, it is then easy to show that $N_\infty(rgt\circ\mathcal{U},\epsilon) \le N_\infty(\mathcal{U},\epsilon/2)$. Together with Step 2, this completes the proof of Step 3.
Based on the same arguments as in Section D.2.3, we can thus bound the empirical Rademacher complexity as:
\[
\hat{\mathcal{R}}_L(rgt\circ\mathcal{U}) \le \inf_{\epsilon>0}\Big(\epsilon + 2n\sqrt{\frac{2\log N_\infty(rgt\circ\mathcal{U},\epsilon)}{L}}\Big)
\le \inf_{\epsilon>0}\Big(\epsilon + 2n\sqrt{\frac{2\log N_\infty(\mathcal{M},\frac{\epsilon}{2n})}{L}}\Big).
\]
Applying Lemma D.1 completes the proof of the generalization bound for regret.
D.4 Proof of Lemma A.1
As in Lemma 3.1, $\varphi^{CF}(s, s^{(1)}, \ldots, s^{(m)})$ trivially satisfies combinatorial feasibility (constraints (14)–(15)). For any allocation $z$ that satisfies combinatorial feasibility, $z$ is recovered as the output of $\varphi^{CF}$ under the scores
\[
s_{i,S} = s^{(j)}_{i,S} = \log(z_{i,S}) + c, \quad \forall j = 1,\ldots,m,
\]
for any constant $c$.
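A quick numeric check of why these scores recover $z$ (a sketch; it assumes the row $z_{i,\cdot}$ is normalized to sum to one, e.g. via a dummy bundle, so that the softmax normalization leaves it unchanged):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())     # shift-invariant, numerically stable
    return e / e.sum()

rng = np.random.default_rng(3)
z = rng.dirichlet(np.ones(6))   # a feasible allocation row over bundles, summing to 1
for c in (0.0, 1.0, -2.5):      # the additive constant c is arbitrary: it cancels
    out = softmax(np.log(z) + c)
    assert np.allclose(out, z, atol=1e-9)
```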
Theorem D.1. For RegretNet with $R$ hidden layers, $K$ nodes per hidden layer, $d_g$ parameters in the allocation component, $d_p$ parameters in the payment component, and the vector of all model parameters satisfying $\|w\|_1 \le W$, the following are the bounds on the term $\Delta_L$ for different bidder valuation types:

(a) additive valuations: $\Delta_L \le O\big(\sqrt{R(d_g+d_p)\log(LW\max\{K,mn\})/L}\big)$;

(b) unit-demand valuations: $\Delta_L \le O\big(\sqrt{R(d_g+d_p)\log(LW\max\{K,mn\})/L}\big)$;

(c) combinatorial valuations (with combinatorially feasible allocations): $\Delta_L \le O\big(\sqrt{R(d_g+d_p)\log(LW\max\{K,n2^m\})/L}\big)$.
We first bound the covering number for a general feed-forward neural network and then specialize it to the three architectures we present in Section 3 and Appendix A.2.

Lemma D.2. Let $\mathcal{F}_k$ be a class of feed-forward neural networks that map an input vector $x\in\mathbb{R}^{d_0}$ to an output vector $y\in\mathbb{R}^{d_k}$, with each layer $\ell$ containing $T_\ell$ nodes and computing $z \mapsto \phi_\ell(w^\ell z)$, where $w^\ell \in \mathbb{R}^{T_\ell\times T_{\ell-1}}$ and $\phi_\ell:\mathbb{R}^{T_\ell}\to[-B,+B]^{T_\ell}$. Further, for each network in $\mathcal{F}_k$, let the parameter matrices satisfy $\|w^\ell\|_1 \le W$ and let $\|\phi_\ell(s)-\phi_\ell(s')\|_1 \le \Phi\|s-s'\|_1$ for any $s,s'\in\mathbb{R}^{T_\ell}$. Then, with $d$ the total number of parameters,
\[
N_\infty(\mathcal{F}_k,\epsilon) \le \Big\lceil \frac{2Bd^2W(2\Phi W)^k}{\epsilon}\Big\rceil^{d}.
\]
Proof. We shall construct an $\ell_{1,\infty}$ cover for $\mathcal{F}_k$ by discretizing each of the $d$ parameters along $[-W,+W]$ at scale $\epsilon_0/d$, where we will choose $\epsilon_0 > 0$ at the end of the proof. We will use $\hat{\mathcal{F}}_k$ to denote the subset of neural networks in $\mathcal{F}_k$ whose parameters take values in $\{-(\lceil Wd/\epsilon_0\rceil - 1)\,\epsilon_0/d, \ldots, -\epsilon_0/d, 0, \epsilon_0/d, \ldots, \lceil Wd/\epsilon_0\rceil\,\epsilon_0/d\}$. The size of $\hat{\mathcal{F}}_k$ is at most $\lceil 2dW/\epsilon_0\rceil^d$. We shall now show that $\hat{\mathcal{F}}_k$ is an $\epsilon$-cover for $\mathcal{F}_k$.

We use mathematical induction on the number of layers $k$, showing that for any $f\in\mathcal{F}_k$ there exists an $\hat f\in\hat{\mathcal{F}}_k$ such that, for all $x$,
\[
\|f(x) - \hat f(x)\|_1 \le B\,d\,\epsilon_0\,(2\Phi W)^k.
\]
For k = 0, the statement holds trivially. Assume that the statement is true for Fk . We now show
that the statement holds for Fk+1 .
A function $f\in\mathcal{F}_{k+1}$ can be written as $f(z) = \phi_{k+1}(w^{k+1}H(z))$ for some $H\in\mathcal{F}_k$. Similarly, a function $\hat f\in\hat{\mathcal{F}}_{k+1}$ can be written as $\hat f(z) = \phi_{k+1}(\hat w^{k+1}\hat H(z))$ for some $\hat H\in\hat{\mathcal{F}}_k$, where $\hat w^{k+1}$ is a matrix with entries in the discretized set above. Also, for any parameter matrix $w^\ell\in\mathbb{R}^{T_\ell\times T_{\ell-1}}$, there is a matrix $\hat w^\ell$ with discretized entries such that
\[
\|w^\ell - \hat w^\ell\|_1 = \max_{1\le j\le T_{\ell-1}} \sum_{i=1}^{T_\ell} |w^\ell_{ij} - \hat w^\ell_{ij}| \le T_\ell\,\epsilon_0/d \le \epsilon_0. \tag{20}
\]
We then have:
\[
\begin{aligned}
\|f(x)-\hat f(x)\|_1
&= \|\phi_{k+1}(w^{k+1}H(x)) - \phi_{k+1}(\hat w^{k+1}\hat H(x))\|_1\\
&\le \Phi\|w^{k+1}H(x) - \hat w^{k+1}\hat H(x)\|_1\\
&\le \Phi\|w^{k+1}H(x) - w^{k+1}\hat H(x)\|_1 + \Phi\|w^{k+1}\hat H(x) - \hat w^{k+1}\hat H(x)\|_1\\
&\le \Phi\|w^{k+1}\|_1\,\|H(x)-\hat H(x)\|_1 + \Phi\|w^{k+1}-\hat w^{k+1}\|_1\,\|\hat H(x)\|_1\\
&\le \Phi W\|H(x)-\hat H(x)\|_1 + \Phi T_k B\|w^{k+1}-\hat w^{k+1}\|_1\\
&\le B\,d\,\epsilon_0\,\Phi W(2\Phi W)^k + \Phi B\,d\,\epsilon_0\\
&\le B\,d\,\epsilon_0\,(2\Phi W)^{k+1},
\end{aligned}
\]
where the second line follows from our assumption on $\phi_{k+1}$, and the sixth line follows from the inductive hypothesis and from (20) (using $T_k \le d$). By choosing $\epsilon_0 = \frac{\epsilon}{Bd(2\Phi W)^k}$, we complete the proof.
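The weight-discretization argument can be illustrated on a tiny tanh network: rounding each parameter to a grid of width $\epsilon_0/d$ perturbs the output by at most $Bd\epsilon_0(2\Phi W)^k$ in $\ell_1$ norm (a sketch with $\Phi = B = 1$ and toy layer sizes, not the architectures of the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 2 * 4 * 4      # total number of parameters (two 4x4 layers)
W = 1.0            # bound on ||w_l||_1 (max absolute column sum)
k = 2              # number of layers
eps0 = 0.01

w = [rng.uniform(-0.1, 0.1, size=(4, 4)) for _ in range(k)]
# round every parameter to the grid of width eps0/d
w_hat = [np.round(wl * d / eps0) * eps0 / d for wl in w]

def net(ws, x):
    for wl in ws:
        x = np.tanh(wl @ x)   # tanh: 1-Lipschitz, bounded in [-1, 1]
    return x

x = rng.uniform(-1, 1, size=4)
gap = np.sum(np.abs(net(w, x) - net(w_hat, x)))
# induction bound: B * d * eps0 * (2 * Phi * W)^k with B = Phi = 1
assert gap <= d * eps0 * (2 * W) ** k
```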
We next bound the covering number of the auction class in terms of the covering numbers for the class of allocation networks and the class of payment networks. Recall that the payment network computes fractions $\alpha:\mathbb{R}^{m(n+1)}\to[0,1]^n$ and a payment $p_i(b) = \alpha_i(b)\cdot\langle b_i, g_i(b)\rangle$ for each bidder $i$. Let $\mathcal{G}$ be the class of allocation networks and $\mathcal{A}$ the class of fractional payment functions used to construct auctions in $\mathcal{M}$, and let $N_\infty(\mathcal{G},\epsilon)$ and $N_\infty(\mathcal{A},\epsilon)$ be the corresponding covering numbers w.r.t. the $\ell_\infty$ norm. Then $N_\infty(\mathcal{M},\epsilon) \le N_\infty(\mathcal{G},\epsilon/3)\cdot N_\infty(\mathcal{A},\epsilon/3)$.
Proof. Let $\hat{\mathcal{G}}\subseteq\mathcal{G}$, $\hat{\mathcal{A}}\subseteq\mathcal{A}$ be $\ell_\infty$ covers for $\mathcal{G}$ and $\mathcal{A}$; i.e., for any $g\in\mathcal{G}$ and $\alpha\in\mathcal{A}$, there exist $\hat g\in\hat{\mathcal{G}}$ and $\hat\alpha\in\hat{\mathcal{A}}$ with
\[
\sup_b \sum_{i,j} |g_{ij}(b) - \hat g_{ij}(b)| \le \epsilon/3 \tag{21}
\]
\[
\sup_b \sum_i |\alpha_i(b) - \hat\alpha_i(b)| \le \epsilon/3. \tag{22}
\]
We now show that the class of mechanisms $\hat{\mathcal{M}} = \{(\hat g,\hat p) \mid \hat g\in\hat{\mathcal{G}},\ \hat\alpha\in\hat{\mathcal{A}},\ \hat p_i(b) = \hat\alpha_i(b)\cdot\langle b_i,\hat g_i(b)\rangle\}$ is an $\epsilon$-cover for $\mathcal{M}$ under the $\ell_{1,\infty}$ distance. For any mechanism $(g,p)\in\mathcal{M}$, let $(\hat g,\hat p)\in\hat{\mathcal{M}}$ be a mechanism satisfying (21) and (22). We have:
\[
\begin{aligned}
&\sum_{i,j}|g_{ij}(b)-\hat g_{ij}(b)| + \sum_i |p_i(b)-\hat p_i(b)|\\
&\le \epsilon/3 + \sum_i \big|\alpha_i(b)\langle b_i,g_i(b)\rangle - \hat\alpha_i(b)\langle b_i,\hat g_i(b)\rangle\big|\\
&\le \epsilon/3 + \sum_i \big|(\alpha_i(b)-\hat\alpha_i(b))\langle b_i,g_i(b)\rangle\big| + \sum_i\big|\hat\alpha_i(b)\big(\langle b_i,g_i(b)\rangle - \langle b_i,\hat g_i(b)\rangle\big)\big|\\
&\le \epsilon/3 + \sum_i |\alpha_i(b)-\hat\alpha_i(b)| + \sum_i \|b_i\|_\infty\cdot\|g_i(b)-\hat g_i(b)\|_1\\
&\le 2\epsilon/3 + \sum_{i,j}|g_{ij}(b)-\hat g_{ij}(b)| \le \epsilon,
\end{aligned}
\]
where the third inequality uses $\langle b_i, g_i(b)\rangle \le 1$ and $\hat\alpha_i(b)\le 1$, and the last uses $\|b_i\|_\infty\le 1$ and (21). The size of the cover $\hat{\mathcal{M}}$ is $|\hat{\mathcal{G}}||\hat{\mathcal{A}}|$, which completes the proof.
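The $\epsilon/3$-cover composition in this proof can be checked numerically: perturbing the allocation and the fractional payments by at most $\epsilon/3$ each (in the respective $\ell_1$ distances) moves the induced mechanism by at most $\epsilon$ (a toy sketch; bids are scaled so that $\langle b_i, g_i(b)\rangle \le 1$, as assumed in the proof):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, eps = 3, 4, 0.09
for _ in range(200):
    b = rng.uniform(0, 1.0 / m, size=(n, m))   # scaled so <b_i, g_i> <= 1
    g = rng.uniform(0, 1, size=(n, m))         # allocation probabilities
    alpha = rng.uniform(0, 1, size=n)          # payment fractions
    # perturbations within eps/6 (< eps/3) in the respective l1 distances
    dg = rng.uniform(-1, 1, size=(n, m))
    dg = dg * (eps / 6) / np.sum(np.abs(dg))
    da = rng.uniform(-1, 1, size=n)
    da = da * (eps / 6) / np.sum(np.abs(da))
    g_hat = np.clip(g + dg, 0, 1)
    a_hat = np.clip(alpha + da, 0, 1)
    # induced payments p_i(b) = alpha_i(b) * <b_i, g_i(b)>
    p = alpha * (g * b).sum(axis=1)
    p_hat = a_hat * (g_hat * b).sum(axis=1)
    total = np.sum(np.abs(g - g_hat)) + np.sum(np.abs(p - p_hat))
    assert total <= eps + 1e-12
```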
We are now ready to prove covering number bounds for the three architectures in Section 3 and
Appendix A.2.
Proof of Theorem D.1. All three architectures use the same feed-forward architecture for computing fractional payments, consisting of $R$ hidden layers with $K$ nodes each and tanh activation functions. We also have, by our assumption that the $\ell_1$ norm of the vector of all model parameters is at most $W$, that $\|w^\ell\|_1\le W$ for each $\ell = 1,\ldots,R+1$. Using the fact that the tanh activation function is 1-Lipschitz and bounded in $[-1,1]$, and that there are at most $\max\{K,n\}$ nodes in any layer of the payment network, we have by an application of Lemma D.2 the following bound on the covering number of the fractional payment networks $\mathcal{A}$ used in each case:
\[
N_\infty(\mathcal{A},\epsilon) \le \Big\lceil\frac{\max\{K,n\}^2(2W)^{R+1}}{\epsilon}\Big\rceil^{d_p},
\]
where $d_p$ is the number of parameters in the payment networks.
For the covering number of the allocation networks $\mathcal{G}$, we consider each architecture separately. In each case, we bound the Lipschitz constant of the activation functions used in the layers of the allocation network and then apply Lemma D.2. For ease of exposition, we omit the dummy scores used in the final layer of the neural network architectures.
Additive bidders. The output layer computes $n$ allocation probabilities for each item $j$ using a softmax function. The activation function $\phi_{R+1}:\mathbb{R}^{nm}\to\mathbb{R}^{nm}$ for the final layer, on input $s\in\mathbb{R}^{n\times m}$, can be described as $\phi_{R+1}(s) = [\mathrm{softmax}(s_{1,1},\ldots,s_{n,1}),\ldots,\mathrm{softmax}(s_{1,m},\ldots,s_{n,m})]$, where $\mathrm{softmax}:\mathbb{R}^n\to[0,1]^n$ is defined for any $u\in\mathbb{R}^n$ as $\mathrm{softmax}_i(u) = e^{u_i}/\sum_{k=1}^n e^{u_k}$. We then have, for any $s,s'\in\mathbb{R}^{n\times m}$,
\[
\|\phi_{R+1}(s) - \phi_{R+1}(s')\|_1 \le \|s - s'\|_1, \tag{23}
\]
which follows by bounding the Frobenius norm of the Jacobian of the softmax function.
The hidden layers ` = 1, . . . , R are standard feed-forward layers with tanh activations. Since the
tanh activation function is 1-Lipschitz, kφ` (s)−φ` (s0 )k1 ≤ ks−s0 k1 . We also have by our assumption
that the `1 norm of the vector of all model parameters is at most W , for each ` = 1, . . . , R + 1,
kw` k1 ≤ W . Moreover, the output of each hidden layer node is in [−1, 1], the output layer nodes
is in [0, 1], and the maximum number of nodes in any layer (including the output layer) is at most
max{K, mn}.
By an application of Lemma D.2 with Φ = 1, B = 1, and d = max{K, mn} we have
dg
max{K, mn}2 (2W )R+1
N∞ (G, ) ≤ ,
where dg is the number of parameters in allocation networks.
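The bound (23) relies on the softmax function being non-expansive from $\ell_1$ to $\ell_1$; this can be spot-checked on random score vectors (a sketch):

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())   # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(6)
for _ in range(500):
    s = rng.uniform(-5, 5, size=7)
    t = rng.uniform(-5, 5, size=7)
    lhs = np.sum(np.abs(softmax(s) - softmax(t)))
    # softmax is 1-Lipschitz in l1 (its Jacobian diag(p) - p p^T has l1 operator norm <= 1)
    assert lhs <= np.sum(np.abs(s - t)) + 1e-12
```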
Unit-demand bidders. The output layer computes $n$ allocation probabilities for each item $j$ as an element-wise minimum of two softmax functions. The activation function $\phi_{R+1}:\mathbb{R}^{2nm}\to\mathbb{R}^{nm}$ for the final layer, on two sets of scores $s,\bar s\in\mathbb{R}^{n\times m}$, can be described as
\[
\phi_{R+1,i,j}(s,\bar s) = \min\big\{\mathrm{softmax}_i(s_{1,j},\ldots,s_{n,j}),\ \mathrm{softmax}_j(\bar s_{i,1},\ldots,\bar s_{i,m})\big\},
\]
and $\|\phi_{R+1}(s,\bar s)-\phi_{R+1}(s',\bar s')\|_1 \le \|(s,\bar s)-(s',\bar s')\|_1$ can be derived in the same way as (23). As with additive bidders, using additionally that the hidden layers $\ell=1,\ldots,R$ are standard feed-forward layers with tanh activations, we have from Lemma D.2 with $\Phi = 1$, $B = 1$, and $d = \max\{K,mn\}$:
\[
N_\infty(\mathcal{G},\epsilon) \le \Big\lceil\frac{\max\{K,mn\}^2(2W)^{R+1}}{\epsilon}\Big\rceil^{d_g}.
\]
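A small sketch of why the min-of-two-softmaxes output is feasible for unit-demand bidders: the resulting matrix is doubly substochastic, so each item is allocated at most once and each bidder receives at most one item (toy scores; this sketch ignores the dummy scores used in the actual architecture):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 5
s = rng.normal(size=(n, m))
s_bar = rng.normal(size=(n, m))

col = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)          # softmax over bidders, per item
row = np.exp(s_bar) / np.exp(s_bar).sum(axis=1, keepdims=True)  # softmax over items, per bidder
g = np.minimum(col, row)                                        # element-wise minimum

assert np.all(g.sum(axis=0) <= 1 + 1e-9)   # each item allocated at most once
assert np.all(g.sum(axis=1) <= 1 + 1e-9)   # each bidder gets at most one item
```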
Combinatorial bidders. The output layer outputs an allocation probability for each bidder $i$ and bundle of items $S\subseteq M$. The activation function $\phi_{R+1}:\mathbb{R}^{(m+1)n2^m}\to\mathbb{R}^{n2^m}$ for this layer, on $m+1$ sets of scores $s, s^{(1)},\ldots,s^{(m)}\in\mathbb{R}^{n\times 2^m}$, is given by:
\[
\phi_{R+1,i,S}(s,s^{(1)},\ldots,s^{(m)}) = \min\big\{\mathrm{softmax}_S(s_{i,S'}: S'\subseteq M),\ \mathrm{softmax}_S(s^{(1)}_{i,S'}: S'\subseteq M),\ \ldots,\ \mathrm{softmax}_S(s^{(m)}_{i,S'}: S'\subseteq M)\big\},
\]
and the corresponding Lipschitz bound can be derived in the same way as (23). As with additive bidders, using additionally that the hidden layers $\ell=1,\ldots,R$ are standard feed-forward layers with tanh activations, we have from Lemma D.2 with $\Phi = 1$, $B = 1$, and $d = \max\{K,n\cdot 2^m\}$:
\[
N_\infty(\mathcal{G},\epsilon) \le \Big\lceil\frac{\max\{K,n\cdot 2^m\}^2(2W)^{R+1}}{\epsilon}\Big\rceil^{d_g},
\]
where $d_g$ is the number of parameters in the allocation networks.
We now bound $\Delta_L$ for the three architectures using the covering number bounds we derived above. In particular, we upper bound the `inf' over $\epsilon>0$ by substituting a specific value of $\epsilon$:

(a) For additive bidders, choosing $\epsilon = \frac{1}{\sqrt{L}}$, we get
\[
\Delta_L \le O\Big(\sqrt{\frac{R(d_p+d_g)\log(W\max\{K,mn\}L)}{L}}\Big).
\]
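The effect of the plug-in choice $\epsilon = 1/\sqrt{L}$ can be illustrated numerically by comparing it against a grid search over $\epsilon$ in a bound of the form $\epsilon + 2n\sqrt{2d\log(C/\epsilon)/L}$ (a sketch with arbitrary toy constants; $C$ stands in for the covering-number constant):

```python
import numpy as np

n, d, C, L = 3, 1000, 100.0, 10**5   # toy sizes

def bound(eps):
    # the two terms traded off by the 'inf' over eps
    return eps + 2 * n * np.sqrt(2 * d * np.log(C / eps) / L)

plug = bound(1 / np.sqrt(L))                           # the plug-in choice eps = 1/sqrt(L)
grid = min(bound(e) for e in np.logspace(-6, 0, 200))  # grid search over eps
assert grid <= plug <= 2 * grid   # plug-in is within a small constant factor (for these toy numbers)
```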
[Figure 26: The regions $R_1$, $R_2$, $R_3$ and the transport of measure, with labeled points $C=(0,2)$, $E=(0,\frac{4}{3})$, $A=(0,1)$, $D=(\frac{c}{3},1)$, $B=(c,1)$, and $F$, $G$, $H$, $I$, $J$, $K$.]
and both the supremum and infimum are achieved. Based on complementary slackness for linear programming, the optimal solution of Equation 25 needs to satisfy the following conditions.

Corollary D.1 (Daskalakis et al. (2017)). Let $u^*$ and $\gamma^*$ be feasible for their respective problems in Equation 25. Then $\int u^*\,d\mu = \int \|v-v'\|_1\,d\gamma^*$ if and only if the following two conditions hold:
\[
\int u^*\,d(\gamma_1^* - \gamma_2^*) = \int u^*\,d\mu,
\]
\[
u^*(v) - u^*(v') = \|v-v'\|_1,\quad \gamma^*\text{-almost surely.}
\]
We now prove that the utility function $u^*$ induced by the mechanism for Setting C is optimal. Here we focus only on Setting C with $c > 1$; for $c \le 1$ the proof is analogous and we omit it here.$^{11}$ The transformed measure $\mu$ of the valuation distribution is composed of:

$^{11}$ The proof is fairly similar to that for the setting $c > 1$. If $c \le 1$, there are only two regions to discuss, in which $R_1$ and $R_2$ are the regions corresponding to allocations $(0,0)$ and $(1,1)$, respectively. Then we show the optimal $\gamma^* = \bar\gamma^{R_1} + \bar\gamma^{R_2}$, where $\bar\gamma^{R_1} = 0$ for region $R_1$, and show that $\gamma^{R_2}$ only "transports" mass of the measure downwards and leftwards in region $R_2$, which is analogous to the analysis of $\gamma^{R_3}$ for the setting $c > 1$.
1. A point mass of +1 at (0, 1).
It is straightforward to verify that $\mu(R_1) = \mu(R_2) = \mu(R_3) = 0$. We will show that there exists an optimal measure $\gamma^*$ for the dual program of Theorem 2 (Equation 5) in Daskalakis et al. (2013). $\gamma^*$ can be decomposed as $\gamma^* = \gamma^{R_1} + \gamma^{R_2} + \gamma^{R_3}$ with $\gamma^{R_1}\in\Gamma_+(R_1\times R_1)$, $\gamma^{R_2}\in\Gamma_+(R_2\times R_2)$, $\gamma^{R_3}\in\Gamma_+(R_3\times R_3)$. We will show the feasibility of $\gamma^*$, i.e.,
\[
\gamma_1^{R_1} - \gamma_2^{R_1} \preceq \mu|_{R_1};\quad \gamma_1^{R_2} - \gamma_2^{R_2} \preceq \mu|_{R_2};\quad \gamma_1^{R_3} - \gamma_2^{R_3} \preceq \mu|_{R_3}, \tag{26}
\]
and that $u^*(v) - u^*(v') = \|v-v'\|_1$ holds $\gamma^A$-almost surely for each $A \in \{R_1, R_2, R_3\}$. We visualize the transport of measure $\gamma^*$ in Figure 26.

Construction of $\gamma^{R_1}$. $\mu_+|_{R_1}$ is concentrated on the single point $(0,1)$ and $\mu_-|_{R_1}$ is distributed throughout a region that is coordinate-wise greater than $(0,1)$, so it is straightforward to show $0 \preceq \mu|_{R_1}$. We set $\gamma^{R_1}$ to be the zero measure, and we get $\gamma_1^{R_1} - \gamma_2^{R_1} = 0 \preceq \mu|_{R_1}$. In addition, $u^*(v) = 0$ for all $v\in R_1$, so the conditions in Corollary 1 in Daskalakis et al. (2013) hold trivially.
Construction of $\gamma^{R_2}$. $\mu_+|_{R_2}$ is uniformly distributed on the upper edge $CF$ of the triangle and $\mu_-|_{R_2}$ is uniformly distributed in $R_2$. Since we have $\mu(R_2) = 0$, we construct $\gamma^{R_2}$ by "transporting" $\mu_+|_{R_2}$ into $\mu_-|_{R_2}$ downwards, that is, $\gamma_1^{R_2} = \mu_+|_{R_2}$, $\gamma_2^{R_2} = \mu_-|_{R_2}$. Therefore $\int u^*\,d(\gamma_1^{R_2} - \gamma_2^{R_2}) = \int u^*\,d\mu$ holds trivially. The measure $\gamma^{R_2}$ is concentrated only on pairs $(v,v')$ such that $v_1 = v_1'$ and $v_2 \ge v_2'$. Thus, for such pairs $(v,v')$, we have $u^*(v) - u^*(v') = (\frac{v_1}{c} + v_2 - \frac{4}{3}) - (\frac{v_1}{c} + v_2' - \frac{4}{3}) = v_2 - v_2' = \|v-v'\|_1$.
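The identity $u^*(v) - u^*(v') = \|v-v'\|_1$ on these vertical transport pairs can be spot-checked numerically (a sketch with the illustrative choice $c = 2$; the sampled coordinates are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
c = 2.0
u = lambda v: v[0] / c + v[1] - 4.0 / 3.0   # u* on R2, as in the text

for _ in range(200):
    v1 = rng.uniform(0, c)
    hi, lo = sorted(rng.uniform(1, 2, size=2))[::-1]   # hi >= lo
    v, vp = np.array([v1, hi]), np.array([v1, lo])     # v1 = v1', v2 >= v2'
    assert np.isclose(u(v) - u(vp), np.abs(v - vp).sum())
```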
\[
\mu(R_U^H) = \frac{4\ell(FH)}{\sqrt{c^2+1}} - \frac{6}{c}\cdot\frac{\ell^2(FH)\,c}{2(c^2+1)} = \frac{\ell(FH)}{\sqrt{c^2+1}}\cdot\Big(4 - \frac{3\ell(FH)}{\sqrt{c^2+1}}\Big) > 0
\]
\[
\mu(R_L^H) = \frac{4\ell(FH)}{\sqrt{c^2+1}} - \frac{2}{c}\cdot\frac{\ell(FH)\,c}{\sqrt{c^2+1}} - \frac{6}{c}\cdot\Big(\frac{2\ell(FH)\,c}{3\sqrt{c^2+1}} - \frac{\ell^2(FH)\,c}{2(c^2+1)}\Big) = \frac{\ell(FH)}{\sqrt{c^2+1}}\cdot\Big(\frac{3\ell(FH)}{\sqrt{c^2+1}} - 2\Big) < 0
\]
Thus, there exists a unique line $l_H$ with positive slope that passes through $H$ and separates $R_3$ into two parts, $R_U^H$ (above $l_H$) and $R_L^H$ (below $l_H$), such that $\mu_+(R_U^H) = \mu_-(R_U^H)$. We will then show that for any two points $H$ and $I$ on edge $BF$, the lines $l_H$ and $l_I$ do not intersect inside $R_3$. For contradiction (see Figure 26), assume that $l_H = HK$ and $l_I = IJ$ intersect inside $R_3$. Given the definition of $l_H$ and $l_I$, we have
Notice that $\mu_-$ is only distributed inside $R_3$ and on edge $DB$; thus $\mu_-(FIKD) > \mu_-(FIJD)$. Given the above discussion, we have
On the other hand, let $S(HIK)$ be the area of triangle $HIK$, let $DG$ be the altitude of triangle $DBF$ w.r.t. $BF$, and let $h$ be the altitude of triangle $HJK$ w.r.t. the base $HI$. Then
\[
\mu_-(HJK) = \frac{6}{c}\cdot S(HIK) = \frac{6}{c}\cdot\frac{1}{2}\ell(HI)\,h \le \frac{3}{c}\cdot\ell(HI)\cdot\ell(DG) = \frac{3}{c}\cdot\frac{2c}{3\sqrt{c^2+1}}\cdot\ell(HI) = \frac{2}{\sqrt{c^2+1}}\cdot\ell(HI) = \mu_+(HIK),
\]
which is a contradiction of Equation 27. Thus, $l_H$ and $l_I$ do not intersect inside $R_3$. Let $\gamma^{R_3}$ be the measure that transports mass from $\mu_+|_{R_3}$ to $\mu_-|_{R_3}$ along the lines $\{l_H \mid H\in BF\}$. Then we have $\gamma_1^{R_3} = \mu_+|_{R_3}$, $\gamma_2^{R_3} = \mu_-|_{R_3}$, which leads to $\int u^*\,d(\gamma_1^{R_3} - \gamma_2^{R_3}) = \int u^*\,d\mu$. The measure $\gamma^{R_3}$ is concentrated only on pairs $(v,v')$ s.t. $v_1 \ge v_1'$ and $v_2 \ge v_2'$. Therefore, for such pairs $(v,v')$, we have $u^*(v) - u^*(v') = (v_1 + v_2 - \frac{c}{3} - 1) - (v_1' + v_2' - \frac{c}{3} - 1) = (v_1 - v_1') + (v_2 - v_2') = \|v-v'\|_1$.
Finally, we conclude that there exists an optimal measure $\gamma^*$ for the dual program of Theorem 2 in Daskalakis et al. (2013).