
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 105

Editorial Board
Özgür Akan
Middle East Technical University, Ankara, Turkey
Paolo Bellavista
University of Bologna, Italy
Jiannong Cao
Hong Kong Polytechnic University, Hong Kong
Falko Dressler
University of Erlangen, Germany
Domenico Ferrari
Università Cattolica Piacenza, Italy
Mario Gerla
UCLA, USA
Hisashi Kobayashi
Princeton University, USA
Sergio Palazzo
University of Catania, Italy
Sartaj Sahni
University of Florida, USA
Xuemin (Sherman) Shen
University of Waterloo, Canada
Mircea Stan
University of Virginia, USA
Jia Xiaohua
City University of Hong Kong, Hong Kong
Albert Zomaya
University of Sydney, Australia
Geoffrey Coulson
Lancaster University, UK
Vikram Krishnamurthy, Qing Zhao, Minyi Huang, and Yonggang Wen (Eds.)

Game Theory
for Networks
Third International ICST Conference
GameNets 2012
Vancouver, BC, Canada, May 24-26, 2012
Revised Selected Papers

Volume Editors

Vikram Krishnamurthy
University of British Columbia
Vancouver, BC V6T 1Z4, Canada
E-mail: [email protected]

Qing Zhao
University of California
Electrical and Computer Engineering
Davis, CA 95616, USA
E-mail: [email protected]

Minyi Huang
Carleton University
Ottawa, ON K1S 5B6, Canada
E-mail: [email protected]

Yonggang Wen
Nanyang Technological University
Singapore 639798
E-mail: [email protected]

ISSN 1867-8211 e-ISSN 1867-822X


ISBN 978-3-642-35581-3 e-ISBN 978-3-642-35582-0
DOI 10.1007/978-3-642-35582-0
Springer Heidelberg Dordrecht London New York

Library of Congress Control Number: 2012953707

CR Subject Classification (1998): I.2.1, K.4.4, I.2.6, C.2.4, H.3.4, K.6.5, G.1.6

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2012

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface

The 3rd International Conference on Game Theory for Networks (GameNets) was held during May 24–26, 2012, in the Empire Landmark Hotel in spectacular Vancouver, Canada. Vancouver is widely recognized as one of the world's most "liveable cities". The mission of the conference is to share novel basic research ideas as well as experimental applications in the GameNets area, in addition to identifying new directions for future research and development.
GameNets 2012 had 20 peer-reviewed papers and a plenary talk on mean-field games by Prof. Peter Caines of McGill University.
We would like to thank the authors for providing the content of the program. We would also like to express our gratitude to the TPC and reviewers, who worked very hard in reviewing papers. This year, we received 24 paper submissions from authors all over the world. After a rigorous peer review by the Technical Program Committee (TPC), 13 papers were accepted. In addition, 7 invited papers were included in the technical program.
We would like to thank our financial sponsor, EAI (European Alliance for Innovation), for their support in making GameNets 2012 a successful event.

Vikram Krishnamurthy
Organization

Organizing Committee
Conference General Chair
Vikram Krishnamurthy University of British Columbia, Canada

Technical Program Committee (TPC) Co-chairs


Qing Zhao UC Davis, USA
Minyi Huang Carleton University, Canada
Yonggang Wen Nanyang Technological University, Singapore

Local Organizing Chair


Alireza Attar University of British Columbia, Canada

Workshops Co-chairs
Mihaela van der Schaar UCLA, USA
Hamidou Tembine Supelec, France
Srinivas Shakkottai Texas A&M, USA

Publications Chair
Alfredo Garcia University of Virginia, USA

Industry Chair
Shrutivandana Sharma Yahoo Labs, India

Publicity Co-chairs
Dusit Niyato Nanyang Technological University, Singapore
Walid Saad University of Miami, USA

Web Chair
Omid Namvar Gharehshiran University of British Columbia, Canada

Conference Organizer
Erica Polini EAI, contact: erica.polini[at]eai.eu

Steering Committee
Athanasios Vasilakos National Technical University of Athens,
Greece
Imrich Chlamtac Create-Net, Italy
Table of Contents

Achievability of Efficient Satisfaction Equilibria in Self-Configuring Networks ..... 1
François Mériaux, Samir Perlaza, Samson Lasaulce, Zhu Han, and Vincent Poor

A Competitive Rate Allocation Game ..... 16
Yanting Wu, George Rabanca, Bhaskar Krishnamachari, and Amotz Bar-Noy

Convergence Dynamics of Graphical Congestion Games ..... 31
Richard Southwell, Yanjiao Chen, Jianwei Huang, and Qian Zhang

Establishing Network Reputation via Mechanism Design ..... 47
Parinaz Naghizadeh Ardabili and Mingyan Liu

Efficiency Loss in a Cournot Oligopoly with Convex Market Demand ..... 63
John N. Tsitsiklis and Yunjian Xu

A Game Theoretic Optimization of the Multi-channel ALOHA Protocol ..... 77
Kobi Cohen, Amir Leshem, and Ephraim Zehavi

Game-theoretic Robustness of Many-to-one Networks ..... 88
Aron Laszka, Dávid Szeszlér, and Levente Buttyán

Hybrid Pursuit-Evasion Game between UAVs and RF Emitters with Controllable Observations: A Hawk-Dove Game ..... 99
Husheng Li, Vasu Chakravarthy, Sintayehu Dehnie, Deborah Walter, and Zhiqiang Wu

Learning Correlated Equilibria in Noncooperative Games with Cluster Structure ..... 115
Omid Namvar Gharehshiran and Vikram Krishnamurthy

Marketing Games in Social Commerce ..... 125
Dohoon Kim

Mean Field Stochastic Games with Discrete States and Mixed Players ..... 138
Minyi Huang

Network Formation Game for Interference Minimization Routing in Cognitive Radio Mesh Networks ..... 152
Zhou Yuan, Ju Bin Song, and Zhu Han

Noncooperative Games for Autonomous Consumer Load Balancing over Smart Grid ..... 163
Tarun Agarwal and Shuguang Cui

Optimal Contract Design for an Efficient Secondary Spectrum Market ..... 176
Shang-Pin Sheng and Mingyan Liu

Primary User Emulation Attack Game in Cognitive Radio Networks: Queuing Aware Dogfight in Spectrum ..... 192
Husheng Li, Vasu Chakravarthy, Sintayehu Dehnie, and Zhiqiang Wu

Revenue Maximization in Customer-to-Customer Markets ..... 209
Shaolei Ren and Mihaela van der Schaar

A Stackelberg Game to Optimize the Distribution of Controls in Transportation Networks ..... 224
Ralf Borndörfer, Bertrand Omont, Guillaume Sagnol, and Elmar Swarat

Stochastic Loss Aversion for Random Medium Access ..... 236
George Kesidis and Youngmi Jin

Token-Based Incentive Protocol Design for Online Exchange Systems ..... 248
Jie Xu, William Zame, and Mihaela van der Schaar

Towards a Metric for Communication Network Vulnerability to Attacks: A Game Theoretic Approach ..... 259
Assane Gueye, Vladimir Marbukh, and Jean C. Walrand

Author Index ..... 275


Achievability of Efficient Satisfaction Equilibria in Self-Configuring Networks

François Mériaux¹, Samir Perlaza², Samson Lasaulce¹, Zhu Han³, and Vincent Poor²

¹ Laboratoire des Signaux et Systèmes - LSS (CNRS-SUPELEC-Paris Sud), Gif-sur-Yvette, 91192 France
² Department of Electrical Engineering, Princeton University, Princeton, NJ 08542 USA
³ Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA

Abstract. In this paper, a behavioral rule that allows radio devices to achieve an efficient satisfaction equilibrium (ESE) in fully decentralized self-configuring networks (DSCNs) is presented. The relevance of ESE in
self-configuring networks (DSCNs) is presented. The relevance of ESE in
the context of DSCNs is that at such state, radio devices adopt a trans-
mission/receive configuration such that they are able to simultaneously
satisfy their individual quality-of-service (QoS) constraints. An ESE is
also an efficient network configuration, i.e., individual QoS satisfaction
is achieved by investing the lowest possible effort. Here, the notion of
effort refers to a preference each radio device independently establishes
among its own set of actions. In particular, the proposed behavioral rule
requires less information than existing rules, as in the case of the clas-
sical best response dynamics and its variants. Sufficient conditions for
convergence are presented in a general framework. Numerical results are
provided in the context of a particular uplink power control scenario, and
convergence from any initial action profile to an ESE is formally proved
in this scenario. This property ensures the proposed rule to be robust to
the dynamic arrival or departure of radio devices in the network.

Keywords: Supermodular games, Power control, Efficient Satisfaction Equilibrium, Games in Satisfaction Form.

1 Introduction
A decentralized self-configuring network (DSCN) is basically an infrastructure-
less communication system in which radio devices autonomously choose their
own transmit/receive configuration in order to guarantee reliable communica-
tion. In particular, a transmit/receive configuration can be described in terms of
power allocation polices, coding-modulation schemes, scheduling policies, decod-
ing order, etc. Typical examples of DSCNs are wireless sensor networks, short
range networks in the ISM bands (e.g., Wi-Fi, Bluetooth, ZigBee, etc.), femto-cell networks (e.g., femto cells in LTE-A) and ad hoc networks in general. The under-
lying feature of DSCNs is that transmitters directly communicate with their re-
spective receivers without the intervention of a central controller. Thus, the main

limitation of these networks in actually providing QoS is the mutual interference arising from the uncoordinated interaction of the radio devices. Within this context, the notion of QoS provisioning translates into
the need for designing behavioral rules such that radio devices autonomously
adapt their transmission configurations in order to meet the minimum require-
ments for their communications to take place satisfactorily. In particular, similar
reconfigurable capabilities have been already mentioned in [6] in the context of
cognitive radios.
In general, the decentralized nature of the QoS provisioning task in DSCNs has
fostered the use of tools from game theory (GT) [7,5], strategic learning theory
[9], distributed optimization and variational inequality theory [11] to the analysis
of QoS provisioning in this scenario. In this paper, we focus on a particular
formulation of the QoS provisioning problem, namely games in satisfaction form
[8]. More specifically, we provide behavioral rules that allow radio devices to
achieve an efficient satisfaction equilibrium (ESE) in DSCNs. The notion of ESE,
as introduced in [8], refers to a network state in which all the network devices are
able to satisfy their individual QoS constraints by investing the minimum effort.
Often, we associate the notion of high effort with transmit/receive configurations
that might represent an additional waste of energy to satisfy the individual QoS
constraints. In this context, one of the main contributions of this paper is the
introduction of a behavioral rule that allows the network to achieve an ESE
using only local information. Another important contribution is a set of sufficient
conditions to observe the convergence to an ESE of the proposed rule.
In order to show the potential of our contributions in the context of DSCNs,
we consider a particular scenario of power control in the uplink of a single-cell
system in which devices must guarantee a minimum signal to interference plus
noise ratio (SINR). Interestingly, we highlight that in this particular scenario, the
proposed behavioral rule converges to an ESE independently of the initial state
of the network. This result contrasts with the existing literature. For instance,
in [1], Altman et al. studied the problem in the general framework of compact
sublattices as action sets. Therein, under the assumption that a solution to the
problem exists, they established that a simple behavioral rule known in game
theory as the best response dynamics (BRD) [4] only converge to the solution
from particular starting points. When the transmit power sets are continuous,
Yates proved that the BRD converge from any initial point [14]. In the case of discrete action sets, an algorithm close to the BRD is proposed
in [12]. However, there are still conditions on the starting point to ensure the
convergence of the algorithm.
The remainder of this paper unfolds as follows. In Sec. 2, we revisit the notion
of satisfaction equilibrium (SE) and ESE and we formulate the QoS provisioning
problem in the most general terms. In Sec. 3, we describe our main contribu-
tion: a behavioral rule that allows DSCNs to converge to an ESE, when action
sets correspond to compact sublattices. In Sec. 4, we present numerical results
in a particular scenario as described above in order to verify our theoretical
contributions. Finally, we conclude our work in Sec. 5.
Achievability of Efficient Satisfaction Equilibria in Self-Configuring Networks 3

2 QoS Provisioning and Games in Satisfaction Form


2.1 QoS Problem Formulation
Consider a DSCN comprising a set K = {1, . . . , K} of K transmitter/receiver
pairs to which we refer as links. Each radio device autonomously chooses its opti-
mal transmit/receive configuration in order to satisfy its own QoS requirements.
Here, we denote by k ∈ K the k-th link, independently of whether it is the trans-
mitter or the receiver that is the device performing the self-adaptation of the
link configuration. We denote by ak the transmit/receive configuration adopted
by the link k, and we denote by Ak the set of all its possible choices. For all
k ∈ K, Ak is assumed to be a compact sublattice, as in [1,13]. A = A1 × . . .× AK
represents the set of all configuration profiles. This structure has the advantage
of comprising both compact continuous sets and discrete sets.¹ We denote by
a−k = (a1 , . . . , ak−1 , ak+1 , . . . , aK ) the vector obtained by dropping the k-th
component of the vector a. We denote the space in which the vector a−k exists
by A−k . With a slight abuse of notation, we write the vector a as (ak , a−k ), in
order to emphasize its k-th component. A transmit/receive configuration can be
described by parameters such as the power allocation policy, modulation scheme,
constellation size, decoding order, scheduling policy, etc. The instantaneous per-
formance of radio device k is determined by a set of $Q_k$ functions

$$u_k^{(1)} : \mathcal{A} \to \mathbb{R}, \quad \ldots, \quad u_k^{(Q_k)} : \mathcal{A} \to \mathbb{R}. \tag{1}$$
Typical performance metrics are transmission rate, transmission delay, bit error
rate, energy efficiency, or any combination of those. We denote the minimum and maximum acceptable values of the performance metric $u_k^{(q)}$ by $\Gamma_k^{(q,\min)}$ and $\Gamma_k^{(q,\max)}$, respectively. Thus, we say that the configuration profile $a \in \mathcal{A}$ satisfies the QoS constraints of the DSCN if, for every link k, the following set of inequalities is satisfied:

$$\Gamma_k^{(1,\min)} < u_k^{(1)}(a_k, a_{-k}) < \Gamma_k^{(1,\max)}, \quad \ldots, \quad \Gamma_k^{(Q_k,\min)} < u_k^{(Q_k)}(a_k, a_{-k}) < \Gamma_k^{(Q_k,\max)}. \tag{2}$$
Note that the performance metrics of link k depend not only on its own con-
figuration ak but also on the configurations a−k adopted by all the other links.
Thus, in order to ease our notation, we define the correspondence $f_k : \mathcal{A}_{-k} \to 2^{\mathcal{A}_k}$ that determines all the configurations of player k that satisfy its QoS constraints. That is, for all $a_k \in \mathcal{A}_k$,

$$a_k \in f_k(a_{-k}) \iff \forall q \in \{1, \ldots, Q_k\}, \ \Gamma_k^{(q,\min)} < u_k^{(q)}(a_k, a_{-k}) < \Gamma_k^{(q,\max)}. \tag{3}$$
¹ The results of Sec. 3.2 and Sec. 3.4 apply to the general framework of compact sublattices, whereas the results of Sec. 3.5 apply only to discrete configuration sets.

The problem of all the links wanting to satisfy their QoS constraints at the same
time can naturally be described as a game.

2.2 Game Formulation


As defined in [8], a game in satisfaction form is fully described by the following triplet

$$\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right). \tag{4}$$
In this triplet, K represents the set of players, Ak is the strategy set of player
k ∈ K, and the correspondence fk determines the set of actions of player k
that allows its satisfaction given the actions played by all the other players.
A strategy profile is denoted by vector a = (a1 , . . . , aK ) ∈ A. In general, an
important outcome of a game in satisfaction form is the one where all players
are satisfied, that is, an SE. The notion of SE was formulated as a fixed point
in [8] as follows:

Definition 1 (Satisfaction Equilibrium). An action profile $a^+$ is an equilibrium for the game $\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right)$ if

$$\forall k \in \mathcal{K}, \quad a_k^+ \in f_k\left(a_{-k}^+\right). \tag{5}$$

As we shall see, the SE is often not unique and thus, there might exist some SEs
that are of particular interest. In the following, we introduce the notion of an
efficient SE (ESE). For this intent, we consider a cost function for each player
of the game, in order to model the notion of effort or cost associated with a
given action choice. For all $k \in \mathcal{K}$, the cost function $c_k : \mathcal{A}_k \to [0, 1]$ satisfies the following condition: for all $(a_k, a_k') \in \mathcal{A}_k^2$, it holds that

$$c_k(a_k) < c_k(a_k'), \tag{6}$$

if and only if $a_k$ requires a lower effort than action $a_k'$ when it is played by
player k. Under the notion of effort, the set of SEs that are of particular interest
are those that require the lowest individual efforts. We formalize this notion of
equilibrium using the following definition.

Definition 2 (Efficient Satisfaction Equilibrium). An action profile $a^*$ is an ESE for the game $\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right)$, with cost functions $\{c_k\}_{k \in \mathcal{K}}$, if

$$\forall k \in \mathcal{K}, \quad a_k^* \in f_k\left(a_{-k}^*\right), \tag{7}$$

and

$$\forall k \in \mathcal{K}, \ \forall a_k \in f_k(a_{-k}^*), \quad c_k(a_k) \geq c_k(a_k^*). \tag{8}$$
Note that the effort associated by each player with each of its actions does not
depend on the choice of effort made by other players. Here, we have left players
to individually choose their cost functions, which adds another degree of freedom
to the modeling of the QoS problem in DSCNs.

Note also that a game in satisfaction form is not a game with a constrained
set of actions, as is the case in the formulation presented in [3]. Here, a player
can use any of its actions independently of all the other players. The dependency
on the other players’ actions enters through whether the player under study is
satisfied or not.
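Since Definitions 1 and 2 only involve set membership and cost comparisons, they can be checked by direct enumeration in finite games. The following sketch is our illustration (not from the paper); it assumes each correspondence $f_k$ is available as a callable returning the set of satisfying actions of player k.

```python
def is_se(profile, f):
    """Definition 1 / Eq. (5): every player's action satisfies its constraints."""
    return all(profile[k] in f[k](profile[:k] + profile[k+1:])
               for k in range(len(profile)))

def is_ese(profile, f, c):
    """Definition 2 / Eqs. (7)-(8): an SE with no cheaper satisfying deviation."""
    if not is_se(profile, f):
        return False
    for k in range(len(profile)):
        others = profile[:k] + profile[k+1:]
        # No satisfying action of player k may be strictly cheaper than its own.
        if any(c[k](a) < c[k](profile[k]) for a in f[k](others)):
            return False
    return True
```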

2.3 Power Control Game

In the rest of this paper, we use the context of uplink power control in a single cell as a case study. Although most of our results apply in a general context, we concentrate on the uplink power control problem as presented in [12,14] to illustrate our results.

Consider K transmitter/receiver pairs denoted by index $k \in \mathcal{K}$. For all $k \in \mathcal{K}$, transmitter k uses power level $p_k \in \mathcal{A}_k$, with $\mathcal{A}_k$ generally defined as a compact sublattice. For each player $k \in \mathcal{K}$, we denote by $p_k^{\min}$ and $p_k^{\max}$ the minimum and the maximum power levels in $\mathcal{A}_k$, respectively. For every couple $(i, j) \in \mathcal{K}^2$, we denote by $g_{ij}$ the channel gain coefficient between transmitter i and receiver j. The metric considered for each pair k is the Shannon rate, given by

$$u_k(p_k, p_{-k}) = \log_2\left(1 + \frac{p_k g_{kk}}{\sigma_k^2 + \sum_{j \neq k} p_j g_{jk}}\right) \ \text{[bps/Hz]}, \tag{9}$$

where $\sigma_k^2$ is the noise variance at receiver k.

The QoS requirement for each pair k is to have a channel capacity $u_k(p_k, p_{-k})$ higher than a given threshold $\Gamma_k$ bps/Hz. The satisfaction correspondence of link k is then

$$f_k(p_{-k}) = \{p_k \in \mathcal{A}_k \mid u_k(p_k, p_{-k}) \geq \Gamma_k\} = \left\{ p_k \in \mathcal{A}_k \ \middle|\ p_k \geq \left(2^{\Gamma_k} - 1\right) \frac{\sigma_k^2 + \sum_{j \neq k} p_j g_{jk}}{g_{kk}} \right\}. \tag{10}$$
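As a concrete reading of Eqs. (9)-(10), the sketch below (ours; indices are 0-based and names are illustrative) computes the Shannon rate of a link and the set of satisfying power levels on a discrete grid:

```python
import math

def shannon_rate(k, p, g, sigma2):
    """Shannon rate of link k (Eq. (9)); p = powers, g[i][j] = gains, sigma2 = noise."""
    interference = sum(p[j] * g[j][k] for j in range(len(p)) if j != k)
    return math.log2(1.0 + p[k] * g[k][k] / (sigma2[k] + interference))

def satisfaction_set(k, p, levels_k, g, sigma2, gamma):
    """f_k(p_{-k}) of Eq. (10): power levels of link k meeting the rate threshold."""
    interference = sum(p[j] * g[j][k] for j in range(len(p)) if j != k)
    p_required = (2.0 ** gamma[k] - 1.0) * (sigma2[k] + interference) / g[k][k]
    return {q for q in levels_k if q >= p_required}
```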

3 Convergence to an Efficient Satisfaction Equilibrium


In this section, we provide sufficient conditions for convergence of the BRD and of the robust blind response dynamics (RBRD) to an ESE of the game $\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right)$ with cost functions $\{c_k\}_{k \in \mathcal{K}}$.

3.1 Best Response Dynamics

In the context of a game in satisfaction form $\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right)$ with cost functions $\{c_k\}_{k \in \mathcal{K}}$, we define the best response (BR) correspondence of player k, given that the other players adopt the reduced action profile $a_{-k}$, as follows:

$$\mathrm{BR}_k(a_{-k}) = \arg\min_{a_k \in f_k(a_{-k})} c_k(a_k). \tag{11}$$

We consider a BRD defined as the behavioral rule in which players sequentially update their actions following the Gauss-Seidel method [2]. At step n + 1 of the algorithm, all the players sequentially update their actions with the following rule:

$$a_k^{(n+1)} = \mathrm{BR}_k\left(a_1^{(n+1)}, \ldots, a_{k-1}^{(n+1)}, a_{k+1}^{(n)}, \ldots, a_K^{(n)}\right). \tag{12}$$
For a discrete set of actions, the BRD can be compared to the asynchronous
version of the Minimum Feasible Value Assignment (MFVA) algorithm presented
in [12]. The difference is that in [12], players only move to their optimal satisfying
action if they are not satisfied with actions played at the previous step. In the
BRD, players move to their optimal satisfying action independently of their
satisfaction at the previous step.
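For a finite game, the Gauss-Seidel update (12) can be sketched as follows (our illustration; `f` and `c` are the satisfaction correspondences and cost functions introduced above):

```python
def best_response(k, profile, f, c):
    """BR_k of Eq. (11): cheapest satisfying action, or None if f_k is empty."""
    feasible = f[k](tuple(profile[:k] + profile[k+1:]))
    return min(feasible, key=c[k]) if feasible else None

def brd(profile, f, c, max_sweeps=1000):
    """Gauss-Seidel BRD of Eq. (12): each player reuses the freshest actions."""
    profile = list(profile)
    for _ in range(max_sweeps):
        previous = list(profile)
        for k in range(len(profile)):
            br = best_response(k, profile, f, c)
            if br is not None:
                profile[k] = br
        if profile == previous:          # fixed point reached
            return tuple(profile)
    return tuple(profile)                # may be cycling (cf. Example 1)
```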

3.2 Convergence of the BRD

To study the convergence of the BRD, we first define some notation of interest.
Let $a = (a_1, \ldots, a_N)$ and $b = (b_1, \ldots, b_N)$ be two action profiles, and let $c = a \vee b$ denote the componentwise maximum of $(a, b)$, i.e., $c = (c_1, \ldots, c_N)$ with $c_n = \max(a_n, b_n)$ for all $n \in \{1, \ldots, N\}$. Similarly, $a \wedge b$ denotes the componentwise minimum.

Definition 3 (S-modularity). The function $g : \mathcal{A} \to \mathbb{R}$ is said to be supermodular if for any $a, b \in \mathcal{A}$,

$$g(a \wedge b) + g(a \vee b) \geq g(a) + g(b), \tag{13}$$

and said to be submodular if

$$g(a \wedge b) + g(a \vee b) \leq g(a) + g(b). \tag{14}$$

In the case of the cost function defined in (6), ck depends only on the actions
of player k. Hence, ck is both supermodular and submodular. As a result, (13)
and (14) are equalities.

Definition 4 (Ascending and descending properties). The correspondence $f_k$ is said to possess the ascending property (respectively, the descending property) if for any two elements $a_{-k}$ and $a_{-k}'$ of the set $\mathcal{A}_{-k}$ with $a_{-k} = a_{-k} \wedge a_{-k}'$, for all $a_k \in f_k(a_{-k})$ and all $a_k' \in f_k(a_{-k}')$,

$$\min(a_k, a_k') \in f_k(a_{-k}), \qquad \max(a_k, a_k') \in f_k(a_{-k}'), \tag{15}$$

or, for the descending property,

$$\max(a_k, a_k') \in f_k(a_{-k}), \qquad \min(a_k, a_k') \in f_k(a_{-k}'). \tag{16}$$

An important consequence of the ascending (or descending) property is that

$$\forall a_{-k} \in \mathcal{A}_{-k}, \quad f_k(a_{-k}) \neq \emptyset. \tag{17}$$

The definition of an ascending set can easily be understood in the context of distributed power control. In such a context, the ascending property means that
if all the other players increase their powers, player k also has to increase its own
power if it wants to remain satisfied. Also note that if the ascending property
is ensured, then there is always at least one satisfying power level for player k,
whatever the other players are playing. In particular, when all the players are
at maximum power levels, there exists a satisfying power for player k, which is
a strong assumption.

Proposition 1. Assume that for all $k \in \mathcal{K}$, $f_k(\cdot)$ is nonempty and compact for all values of its argument, $f_k(\cdot)$ has either the ascending or the descending property, and $f_k(\cdot)$ is continuous. Then the following holds:
– (i) An ESE exists.
– (ii) If the dynamics start with the action profile associated with the highest
or lowest effort in ck (·), for all k ∈ K, the BRD converge monotonically to
an ESE.
– (iii) If the dynamics start from an SE, the trajectory of the best response
converges to an ESE. It monotonically evolves in all components.
– (iv) In a two-player game, the BRD converge to an ESE from any starting
point.

The proof of Prop. 1 comes from Th. 1 in [1] and Th. 2.3 in [13]. We simply
have to verify that the right assumptions hold for the ascending case and the
descending case:
– Let fk (·) be ascending for all k ∈ K. ck is a cost function player k wants
to minimize, in particular ck is a submodular function, and thus −ck is a
supermodular function player k wants to maximize and Th. 1 from [1] holds,
i.e., (i, ii, iii) in Prop. 1 are ensured when the sets are ascending.
– Let fk (·) be descending for all k ∈ K. A similar reasoning can be made: ck is a
submodular function player k wants to minimize and the same theorem holds
as well, i.e., (i, ii, iii) in Prop. 1 are ensured when the sets are descending.
In both ascending and descending cases, (iv) in Prop. 1 is obtained from Th. 2.3
in [13].

3.3 BRD in the Uplink Power Control Game


In the general framework of compact sublattices as strategy sets (including con-
tinuous and discrete action sets), the BRD converge only from given starting
points (see [1,13]). However, in the uplink power control problem, it has been
shown in [10,14] that when strategy sets are continuous, the BRD converge from
any initial point. When strategy sets are discrete, the convergence of the BRD

from any initial point to an equilibrium is not guaranteed. In [12], it is shown that the MFVA converges only when all the transmitters start at their lowest
power levels. In the following, we consider a 3-player uplink power control game
to illustrate the non-convergence of the BRD from a particular initial action
profile.

Example 1. In this example, we refer to the notation introduced in Sec. 2.3. Let us consider K = 3 pairs of transmitters/receivers. For all $k \in \mathcal{K}$, transmitter k uses power level $a_k \in \{p_{\min}, p_{\max}\}$. Given the constraints from Sec. 2.3, let us consider channel gains such that

$$\begin{aligned} f_1(p_{\min}, p_{\min}) &= f_3(p_{\min}, p_{\min}) = \{p_{\min}, p_{\max}\}, \\ f_1(p_{\min}, p_{\max}) &= f_3(p_{\min}, p_{\max}) = \{p_{\min}, p_{\max}\}, \\ f_1(p_{\max}, p_{\min}) &= f_3(p_{\max}, p_{\min}) = \{p_{\max}\}, \\ f_1(p_{\max}, p_{\max}) &= f_3(p_{\max}, p_{\max}) = \{p_{\max}\}, \end{aligned} \tag{18}$$

and

$$\begin{aligned} f_2(p_{\min}, p_{\min}) &= \{p_{\min}, p_{\max}\}, \\ f_2(p_{\min}, p_{\max}) &= \{p_{\max}\}, \\ f_2(p_{\max}, p_{\min}) &= \{p_{\min}, p_{\max}\}, \\ f_2(p_{\max}, p_{\max}) &= \{p_{\max}\}. \end{aligned} \tag{19}$$
We can check that fk has the ascending property for all k ∈ K. For each pair
k, the cost of the power level is given by the identity cost function ck (ak ) = ak .
This game has two ESEs:

– (pmin , pmin , pmin ) where all the players transmit at their lowest power level.
No player has interest in deviating from its action since any other action has
a higher cost (even though the player would remain satisfied).
– (pmax , pmax , pmax ) where all the players have to transmit at maximum power
to be satisfied. If one deviates from its action, it will not be satisfied anymore.

But depending on the initial action profile, the BRD may not converge to an ESE. For instance, assume that the BRD start at $p^{(0)} = (p_{\max}, p_{\min}, p_{\max})$. At step 1, player 1 chooses the action that minimizes $c_1(\cdot)$ given the previous actions of the other players, $p_{-1}^{(0)} = (p_{\min}, p_{\max})$, i.e.,

$$p_1^{(1)} = \mathrm{BR}_1(p_{\min}, p_{\max}) = p_{\min}. \tag{20}$$

Player 2 chooses the action that minimizes $c_2(\cdot)$ given the most recent actions of the other players, $(p_1^{(1)}, p_3^{(0)}) = (p_{\min}, p_{\max})$, i.e.,

$$p_2^{(1)} = \mathrm{BR}_2(p_{\min}, p_{\max}) = p_{\max}. \tag{21}$$

Player 3 chooses the action that minimizes $c_3(\cdot)$ given $(p_1^{(1)}, p_2^{(1)}) = (p_{\min}, p_{\max})$, i.e.,

$$p_3^{(1)} = \mathrm{BR}_3(p_{\min}, p_{\max}) = p_{\min}. \tag{22}$$

At step 2, player 1 chooses the action that minimizes $c_1(\cdot)$ given the previous actions of the other players, $p_{-1}^{(1)} = (p_{\max}, p_{\min})$, i.e.,

$$p_1^{(2)} = \mathrm{BR}_1(p_{\max}, p_{\min}) = p_{\max}. \tag{23}$$

Player 2 chooses the action that minimizes $c_2(\cdot)$ given the most recent actions of the other players, $(p_1^{(2)}, p_3^{(1)}) = (p_{\max}, p_{\min})$, i.e.,

$$p_2^{(2)} = \mathrm{BR}_2(p_{\max}, p_{\min}) = p_{\min}. \tag{24}$$

Player 3 chooses the action that minimizes $c_3(\cdot)$ given $(p_1^{(2)}, p_2^{(2)}) = (p_{\max}, p_{\min})$, i.e.,

$$p_3^{(2)} = \mathrm{BR}_3(p_{\max}, p_{\min}) = p_{\max}. \tag{25}$$

The algorithm is back at the starting point, and it is clear that it will continue in this infinite loop.
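This two-sweep cycle is easy to reproduce. The script below is our illustration (LOW/HIGH stand for $p_{\min}$/$p_{\max}$; with the identity cost, the cheapest satisfying action is the lowest power); it encodes the correspondences (18)-(19) and runs the Gauss-Seidel BRD from $p^{(0)} = (p_{\max}, p_{\min}, p_{\max})$:

```python
LOW, HIGH = 0, 1
# Satisfaction correspondences (18)-(19), keyed by the other players' actions.
f13 = {(LOW, LOW): {LOW, HIGH}, (LOW, HIGH): {LOW, HIGH},
       (HIGH, LOW): {HIGH},     (HIGH, HIGH): {HIGH}}      # players 1 and 3
f2  = {(LOW, LOW): {LOW, HIGH}, (LOW, HIGH): {HIGH},
       (HIGH, LOW): {LOW, HIGH}, (HIGH, HIGH): {HIGH}}      # player 2
f = [lambda o: f13[o], lambda o: f2[o], lambda o: f13[o]]

p = [HIGH, LOW, HIGH]                  # starting profile of Example 1
for step in range(4):
    for k in range(3):
        others = tuple(p[:k] + p[k+1:])
        p[k] = min(f[k](others))       # identity cost: pick the lowest power
    print(step + 1, p)                 # alternates (L,H,L) <-> (H,L,H): a cycle
```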

3.4 Robust Blind Response Dynamics

The BRD have significant drawbacks. First, it was just shown that in a K-player
game with K > 2, the dynamics may not converge to an ESE depending on the
initial action profile. Second, to determine the BR, each player has to know
the set fk (a−k ) ∀a−k ∈ A−k . To overcome these drawbacks, we propose a new
algorithm that requires less information about the game for each player and can
still be proven to converge to an ESE. Let us start by defining the robust blind response (RBR) $\mathrm{RBR}_k : \mathcal{A} \to \mathcal{A}_k$ such that

$$(a_k, a_{-k}) \mapsto \begin{cases} a_k', & \text{if } a_k' \in f_k(a_{-k}),\ a_k \in f_k(a_{-k}), \text{ and } c_k(a_k') \leq c_k(a_k), \\ a_k', & \text{if } a_k' \in f_k(a_{-k}) \text{ and } a_k \notin f_k(a_{-k}), \\ a_k, & \text{otherwise}, \end{cases} \tag{26}$$

with action $a_k'$ being randomly chosen in $\mathcal{A}_k$ such that $\forall \tilde{a}_k \in \mathcal{A}_k$, $\Pr(a_k' = \tilde{a}_k) > 0$.
Each time the RBR is used, a player $k \in \mathcal{K}$ randomly chooses an action in its strategy set $\mathcal{A}_k$ without taking into account the constraints of the other players. Player k only has to know whether the new action and the previous one allow the satisfaction of its individual constraints, and to compare their respective costs. If both actions allow the satisfaction of the constraints, it chooses the one with the lowest cost. If the new action allows the satisfaction of the individual constraints whereas the previous one does not, it moves to the new action. Otherwise, it keeps the same action. When all the players sequentially use the RBR such that, for all $k \in \mathcal{K}$,

$$a_k^{(n+1)} = \mathrm{RBR}_k\left(a_1^{(n+1)}, \ldots, a_{k-1}^{(n+1)}, a_{k+1}^{(n)}, \ldots, a_K^{(n)}\right), \tag{27}$$

we refer to these dynamics as the RBR dynamics (RBRD).
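A direct transcription of Eqs. (26)-(27) is sketched below (ours; `actions[k]` is the finite strategy set of player k, sampled uniformly so that every action has positive probability):

```python
import random

def rbr(k, profile, f, c, actions):
    """One robust blind response (Eq. (26)): try a random candidate action."""
    others = tuple(profile[:k] + profile[k+1:])
    trial = random.choice(list(actions[k]))
    trial_ok = trial in f[k](others)
    current_ok = profile[k] in f[k](others)
    if trial_ok and current_ok and c[k](trial) <= c[k](profile[k]):
        return trial                    # satisfied, and the trial is no costlier
    if trial_ok and not current_ok:
        return trial                    # the trial restores satisfaction
    return profile[k]                   # otherwise keep the same action

def rbrd(profile, f, c, actions, sweeps=10000):
    """Sequential RBR dynamics (Eq. (27))."""
    profile = list(profile)
    for _ in range(sweeps):
        for k in range(len(profile)):
            profile[k] = rbr(k, profile, f, c, actions)
    return tuple(profile)
```

Our main result in this section is stated in the following theorem.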

Theorem 1. Assume that for all $k \in \mathcal{K}$, $f_k(\cdot)$ is nonempty and compact for all values of its argument, $f_k(\cdot)$ has the ascending property and is continuous, and $c_k(\cdot)$ is strictly increasing. Then, the following holds:
– (i) If the dynamics start from an SE, the sequence of RBRs converges to an
ESE. It monotonically decreases in all components.
– (ii) If the dynamics start with the actions associated with the highest effort
in ck (·), ∀k ∈ K, the sequence of RBRs converges monotonically to an ESE.
– (iii) In a two-player game, the sequence of RBRs converges to an ESE from any starting point.

Proof. Applying Prop. 1, we know that there exists an ESE for the game $\mathcal{G} = \left( \mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{f_k\}_{k \in \mathcal{K}} \right)$. The convergence of the RBRD to an ESE is proven in two steps. First, we show for (i, ii, iii) that the RBRD converge to a fixed point. Second, we explain why this fixed point has to be an ESE.
– (i) Assume that the dynamics start from an SE $a^{\mathrm{SE}}$, and that this SE is not an ESE (otherwise, convergence is trivial). Let player $k \in \mathcal{K}$ be the first player to actually change its action, at step n, to $a_k^{(n)}$; necessarily this action has a lower cost than $a_k^{\mathrm{SE}}$, because a satisfied player can only move to another satisfying action with a lower cost. Let the next player to move be denoted by j. From its point of view, $(a_k^{(n)}, a_{-\{k,j\}}^{\mathrm{SE}}) = (a_k^{(n)}, a_{-\{k,j\}}^{\mathrm{SE}}) \wedge a_{-j}^{\mathrm{SE}}$. Hence, due to the ascending property of $f_j$ and the strict monotonicity of $c_j$, its new action necessarily satisfies $a_j^{(n')} \leq a_j^{\mathrm{SE}}$, and so forth. For each $k \in \mathcal{K}$ the sequence $\{a_k^{(n)}\}_{n \in \mathbb{N}}$ is decreasing in a compact set. Thus, the algorithm converges to a limit.
– (ii) Assume that the dynamics start from the action profile $a^{\max} = (a_1^{\max}, \ldots, a_K^{\max})$ and that this point is not an SE (otherwise, refer to (i)). Let player k update its action first, at step n. Necessarily, its updated action $a_k^{(n)}$ is lower than $a_k^{\max}$. Then, for all $j \neq k$, $j \in \mathcal{K}$,

$$(a_{-\{j,k\}}^{\max}, a_k^{(n)}) = (a_{-\{j,k\}}^{\max}, a_k^{(n)}) \wedge a_{-j}^{\max}. \tag{28}$$

Due to the ascending property of $f_j$ and the strict monotonicity of $c_j$, the update of player j is hence lower than $a_j^{\max}$, and so forth. Again, for each player $k \in \mathcal{K}$, the sequence of actions $\{a_k^{(n)}\}_{n \in \mathbb{N}}$ is decreasing in a compact set and the algorithm converges to a limit.
– (iii) In a two-player game, assume the dynamics start from a random action profile $(a_1^{(0)}, a_2^{(0)})$. Assume player 1 is the first player that updates its action to get satisfied, at step n. The action profile is then $(a_1^{(n)}, a_2^{(0)})$. In the next move, either the same player 1 decreases its action, remaining satisfied, or player 2 moves to an action that satisfies it, leading to an action profile $(a_1^{(n)}, a_2^{(n')})$. If this profile is an SE, the dynamics converge according to (i). Otherwise, player 1 is no longer satisfied and has to update its action. If $a_2^{(n')} < a_2^{(0)}$, then due to the ascending property and the strict monotonicity of $c_1$, player 1 will only move to a lower action than $a_1^{(n)}$. Then player 2 will also have to move to a lower action than $a_2^{(n')}$ for analogous reasons, and so forth. The sequences $\{a_1^{(n)}\}_{n \in \mathbb{N}}$ and $\{a_2^{(n)}\}_{n \in \mathbb{N}}$ are hence decreasing in a compact set, so they converge to a limit. If $a_2^{(n')} > a_2^{(0)}$, the sequences are increasing in a compact set and converge as well.
We now have to prove that a fixed point is an ESE. Consider that $a^*$ is a fixed point of $\mathrm{RBR}_k$ for all $k \in \mathcal{K}$. By the definition of $\mathrm{RBR}_k$, this means that there exists no $a_k \in f_k(a_{-k}^*)$ such that $c_k(a_k) < c_k(a_k^*)$, which is exactly the definition of an ESE. This completes the proof. □
The main advantage of these dynamics over the BRD in a general framework is that the former require only local information: the knowledge of an explicit expression for $f_k$ is no longer needed. Knowing whether the corresponding player is satisfied or not is sufficient to implement the RBR.

3.5 RBRD in the Uplink Power Control Game


A very interesting property occurs for the RBR in the uplink power control game
with discrete action sets.
Theorem 2. In the power allocation game defined above in Sec. 2.3, with discrete action sets, i.e., $\forall k \in \mathcal{K}$, $\mathcal{A}_k = \{p_k^{(1)}, \ldots, p_k^{(N_k)}\}$ with $N_k$ the number of power levels in action set $\mathcal{A}_k$, the RBRD converge to an ESE from any starting point.
Proof. We show that from any starting point of the dynamics, there is a non-null probability that the dynamics move to a particular SE in a given way. Note that the particular sequence of events we describe here is not always the way the dynamics run. It is simply a sequence that can occur with a non-null probability; there are many other possible sequences that lead to an SE.
Assume $p^{(0)} = (p_1^{(0)}, \ldots, p_K^{(0)})$ is the starting power profile of the dynamics. Consider all the unsatisfied players at this point and assume that they all move to their maximum possible power levels (this may happen with a non-null probability). These levels satisfy them, since the ascending property gives us

$$\forall k \in \mathcal{K}, \ \forall p_{-k} \in \mathcal{A}_{-k}, \quad p_k^{\max} \in f_k(p_{-k}). \tag{29}$$

This increase of power levels may cause some of the satisfied players at the
starting point not to be satisfied anymore. We also assume that these players
move to their maximum power levels. And the same is done until no unsatisfied
player remains. So we get a power profile made of the highest power levels for
some of the players and the initial power levels for the others, and every player
is satisfied at this point: it is an SE.
Finally, from (i) of Th. 1, the dynamics converge to an ESE, which completes
the proof. 
Th. 2 highlights a very interesting property of the RBRD when players enter or quit the game (or when the channel coefficients vary). Indeed, if K transmitters are in any given ESE $p^*$ and a new transmitter enters the game, a new game starts with K + 1 players. Thus, from Th. 2, it can be stated that convergence to a new ESE, if it exists, is ensured from the new starting point $(p^*, p_{K+1})$.

4 Numerical Results

In this section, we provide numerical results for the uplink power control game
with discrete action sets as defined in Sec. 2.3.
In Fig. 1, we show the sequences of actions converging to an ESE for the
RBRD in a 2-player power control game. The colored region is the satisfaction
region, i.e., the region allowing both players to be satisfied. The coloring of this
region follows the sum of the costs for each player. The RBR first converges to the
satisfaction region, then converges to an ESE while remaining in the satisfaction
region.

[Figure: power index of player 1 (horizontal axis, 0–35) versus power index of player 2 (vertical axis, 0–35); the shaded satisfaction region and the robust blind response trajectory are shown.]

Fig. 1. Sequence of power indices for the RBRD in the uplink 2-player power control game. The colored region is the satisfaction region, i.e., the region where the two players mutually satisfy their constraints.

The scenario we consider in Fig. 2 and Fig. 3 highlights the advantages of the RBRD over the BRD in a 3-player game: during the first 200 steps, only transmitters 1 and 3 are in the game, then transmitter 2 joins them for the 200
next steps, and finally transmitter 3 leaves for the last 200 steps. In each of the two figures, we show the sequence of power indices for the three players, each action set being made of $N_k = 32$ possible power levels from $10^{-6}$ W to $10^{-2}$ W. We also show the satisfaction states of the three players: for each step of the dynamics, if all the players are satisfied, the satisfaction state is 1; otherwise it is 0. Fig. 2 and Fig. 3 correspond to the behavior of the BRD and the RBRD, respectively. The channel parameters and the starting points of the two simulations are exactly the same. Channel gains are $g_{22} = 10^{-5}$, $g_{11} = g_{33} = g_{13} = g_{21} = g_{32} = 10^{-6}$, $g_{12} = g_{23} = g_{31} = 10^{-7}$, and transmitters 1, 2, and 3 start at power levels $10^{-3}$ W, $10^{-5/2}$ W, and $10^{-9/4}$ W, respectively. The utility constraints $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$ are taken as 1.2 bps/Hz, 1.5 bps/Hz, and 1.2 bps/Hz, respectively. The variance of the noise is fixed at $10^{-10}$ W for all the transmitters. It is interesting to notice that the BRD converge to an ESE during the first and third phases, but when transmitter 2 enters the game in the second phase, the BRD do not converge to an ESE. Instead, they enter a loop, and we can see that the transmitters are not satisfied. Concerning the RBRD, although their convergence time is longer, they converge in all three phases; another interesting fact is that the transmitters are satisfied for a longer amount of time compared to the BRD.
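For reference, the simulation parameters above can be collected as follows (a sketch; variable names are ours, and indices are 0-based, so transmitter k in the text is index k - 1 here):

```python
import numpy as np

K = 3
N_k = 32                                  # power levels per action set
levels = np.logspace(-6, -2, N_k)         # 10^-6 W ... 10^-2 W
# Channel gains g[i][j] from transmitter i to receiver j (1-indexed in the text)
g = np.full((K, K), 1e-6)                 # g11 = g33 = g13 = g21 = g32 = 1e-6
g[1][1] = 1e-5                            # g22
g[0][1] = g[1][2] = g[2][0] = 1e-7        # g12, g23, g31
gamma = [1.2, 1.5, 1.2]                   # rate thresholds Gamma_k [bps/Hz]
sigma2 = [1e-10] * K                      # noise variance [W]
p0 = [1e-3, 10 ** -2.5, 10 ** -2.25]      # starting powers of transmitters 1-3
```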

[Figure: top panel, power index (10–40) of transmitters 1–3 versus iteration (0–600); bottom panel, joint satisfaction state versus iteration.]

Fig. 2. Sequences of power indices and satisfaction states for the BRD in the 3-player uplink power control game

[Figure: same layout as Fig. 2, for the RBRD.]

Fig. 3. Sequences of power indices and satisfaction states for the RBRD in the 3-player uplink power control game

5 Conclusion and Future Work

In this work, we have proposed a behavioral rule that converges to an ESE in the general framework of compact sublattices as action sets. Compared to the BRD, the proposed rule requires far less information, although its convergence time is longer. Applying this rule to the uplink power control game with discrete action sets has been shown to be of great interest, since the dynamics are proven to converge to an ESE from any starting action profile. This particular feature makes the proposed rule robust to the entrance or exit of players in the power control game.
However, a strong assumption of this work is that for every player, for any action profile of the other players, there exists an action satisfying the considered player. In the power control game, it would be more relevant to take into account scenarios in which the power levels of the other players are too high and a given player cannot be satisfied by any action it can play. Hence, a natural perspective of this work is to relax this assumption and study the convergence of the dynamics in this context.

References
1. Altman, E., Altman, Z.: S-modular games and power control in wireless networks.
IEEE Transactions on Automatic Control 48(5), 839–842 (2003)
2. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical
Methods. Prentice-Hall, Inc., Upper Saddle River (1989)
3. Debreu, G.: A social equilibrium existence theorem. Proceedings of the National
Academy of Sciences of the United States of America 38(10), 886–893 (1952)
4. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge (1991)
5. Han, Z., Niyato, D., Saad, W., Basar, T., Hjorungnes, A.: Game Theory in Wire-
less and Communication Networks: Theory, Models and Applications. Cambridge
University Press, Cambridge (2011)
6. Haykin, S.: Cognitive radio: Brain-empowered wireless communications. IEEE
Journal on Selected Areas in Communications 23(2), 201–220 (2005)
7. Lasaulce, S., Tembine, H.: Game Theory and Learning in Wireless Networks: Fun-
damentals and Applications. Elsevier Academic Press, Waltham (2011)
8. Perlaza, S.M., Tembine, H., Lasaulce, S., Debbah, M.: Quality-of-service provision-
ing in decentralized networks: A satisfaction equilibrium approach. IEEE Journal
of Selected Topics in Signal Processing 6(2), 104–116 (2012)
9. Rose, L., Lasaulce, S., Perlaza, S.M., Debbah, M.: Learning equilibria with par-
tial information in decentralized wireless networks. IEEE Communications Maga-
zine 49(8), 136–142 (2011)
10. Scutari, G., Barbarossa, S., Palomar, D.P.: Potential games: A framework for vector
power control problems with coupled constraints. In: The IEEE International Con-
ference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France
(May 2006)
11. Scutari, G., Palomar, D.P., Facchinei, F., Pang, J.-S.: Convex optimization, game
theory, and variational inequality theory in multiuser communication systems.
IEEE Signal Processing Magazine 27(3), 35–49 (2010)
12. Wu, C.C., Bertsekas, D.P.: Distributed power control algorithms for wireless net-
works. IEEE Transactions on Vehicular Technology 50(2), 504–514 (2001)
13. Yao, D.D.: S-modular games, with queueing applications. Queueing Systems
21(3-4), 449–475 (1995)
14. Yates, R.D.: A framework for uplink power control in cellular radio systems. IEEE
Journal on Selected Areas in Communications 13(7), 1341–1347 (1995)
A Competitive Rate Allocation Game

Yanting Wu¹, George Rabanca², Bhaskar Krishnamachari¹, and Amotz Bar-Noy²

¹ University of Southern California, Los Angeles, CA 90089, USA, {yantingw,bkrishna}@usc.edu
² The City University of New York, New York, NY 10016, USA, [email protected], [email protected]

Abstract. We introduce a competitive rate allocation game in which two receivers compete to forward the data from a transmitter to a des-
tination in exchange for a payment proportional to the amount of for-
warded data. At each time slot the channel from the transmitter to each
receiver is an independent random variable with two states, high or low,
affecting the amount of data that can be transmitted. Receivers make
"bids" on the state of their channel and the transmitter allocates rate
accordingly. Receivers are rewarded for successful transmissions and pe-
nalized for unsuccessful transmissions. The goal of the transmitter is to
set the penalties in such a way that even if the receivers are selfish, the
data forwarded is close to the optimal transmission rate. We first model
this problem as a single shot game in which the receivers know the chan-
nel probability distributions but the transmitter does not, and show that
it is possible for the transmitter to set penalties so as to ensure that both
receivers have a dominant strategy and the corresponding Price of An-
archy is bounded by 2. We show, moreover, that this is in a sense the
best possible bound. We next consider the case when receivers have in-
complete information on the distributions, and numerically evaluate the
performance of a distributed online learning algorithm based on the well-
known UCB1 policy for this case. From simulations, we find that use of
UCB1 policy yields a performance close to the dominant strategy.

Keywords: competitive rate allocation game, Nash equilibrium, online learning.

1 Introduction
Optimizing throughput is one of the central problems in wireless networking
research. To make good use of the available wireless channels, the transmitter
must allocate rate efficiently. We study in this paper a simple yet fundamental
rate allocation problem in which the transmitter does not precisely know the
state of the channels, and the corresponding receivers are selfish.
In this problem, there is one transmitter that must allocate rates to two differ-
ent receivers to forward data on its behalf to a given destination (see illustration
in figure 1). The two channels from the transmitter to each receiver are indepen-
dent channels with two states: high or low. The channel states are assumed to


Fig. 1. Illustration of problem

be i.i.d. Bernoulli random variables. Initially we assume that the receivers both
know each other's channel parameters, but the transmitter does not. At each
time, the receivers communicate to the transmitter a binary bid corresponding
to the possible state of their respective channels. The transmitter responds to
these bids by deciding whether to send aggressively (at a high, or very high rate)
or conservatively (at a low rate) on one or both channels. Specifically when both
receivers bid low, the transmitter sends data at a low rate R1 over both channels.
And when both receivers bid high, the transmitter splits its power to send data
at a high rate R2 over both channels. When one of the receivers bids low and the
other bids high, the transmitter sends data at a very high rate R3 over the latter
channel. When the sender sends data at a high or very high rate, we assume
that there is a failure and nothing gets sent if the transmission channel actually
turns out to be bad. In this case, the sender levies a penalty on the receivers.
But whenever data is successfully sent, it pays the receiver a fee proportional to
the rate obtained.
There are two roles in this setting: the receivers and the transmitter. The
receivers want to get as much reward as possible, avoiding the penalties. Since
the transmitter’s rate allocation is a competitive resource that directly affects the
receivers’ utilities, the setting can be modeled as a two player, non-cooperative
game. On the other hand, the transmitter is the game designer: it can choose
the penalties in order to influence how the receivers play the game. The goal of
the transmitter is to transmit as much data as possible and, without knowledge
of the receiver’s channel states, to guarantee that the total transmission is not
much worse than the optimal. In this paper we prove that there is a way to set
the penalty terms such that both receivers have dominant strategies, and the
data forwarded by two receivers is at least 1/2 of the optimal, in other words,
that the Price of Anarchy from the transmitter’s point of view is at most 2.
If the underlying channels’ states are known we can assume that the two
receivers will play their dominant strategies if they have one. However, if the
underlying channel status is unknown, the receivers need to learn which action
is more beneficial. Assuming that the underlying channel state is drawn from
an unknown underlying distribution at each time slot, we show that modeling
each payers’ choice of action as a multi-armed bandit leads to desirable results.

In this paper we adapt the UCB1 algorithm [1]: there are two arms for each receiver, each arm corresponding to an action, bidding high or bidding low. From the simulations, we find that the UCB1 algorithm gives a performance which is close to the dominant strategy and, when both receivers use UCB1 to choose their strategies, it can give even better payoffs than playing the dominant strategy.
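For concreteness, a minimal two-armed UCB1 bidder might look as follows (our sketch, not the authors' code; standard UCB1 assumes rewards in [0, 1], so the rewards of Table 1 would first have to be shifted and rescaled):

```python
import math

class UCB1Bidder:
    """UCB1 over two arms: arm 0 = bid low, arm 1 = bid high."""
    def __init__(self):
        self.counts = [0, 0]      # times each arm was played
        self.means = [0.0, 0.0]   # empirical mean reward per arm
        self.t = 0                # total rounds played

    def choose(self):
        self.t += 1
        for arm in (0, 1):                    # play each arm once first
            if self.counts[arm] == 0:
                return arm
        ucb = [self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in (0, 1)]
        return max((0, 1), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```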
Related Work: Game theory, which is a mathematical tool for analyzing
the interaction of two or more decision makers, has been used in wireless com-
munications by many authors [2], [3]. While we are not aware of other papers
that have addressed exactly the same formulation as discussed here, other re-
searchers have explored related game theoretic problems pertaining to power
allocation over multiple channels. For instance, the authors of [4] formulate a
multiuser power control problem as a noncooperative game, show the existence
and uniqueness of a Nash equilibrium for a two-player version game and pro-
pose a water-filling algorithm which reaches the Nash equilibrium efficiently; and
the authors of [5] study a power allocation game for orthogonal multiple access
channels, prove that there exists a unique equilibrium of this game when the
channels are static and show that a simple distributed learning schema based
on the replicator dynamics converges to equilibrium exponentially fast. Unlike
most of the prior works, our formulation and analysis is not focused on optimiz-
ing the power allocation per se, but rather on issues of information asymmetry
between the transmitter and receivers and the design of appropriate penalties
levied by the transmitter to ensure that the receivers' selfishness does not hurt
performance too much. Somewhat related to the formulation in this paper are
two recent papers on non-game-theoretic formulations for a transmitter to decide
whether to send conservatively or aggressively over a single (known or
unknown) Markovian channel [6], [7]. Although we consider simpler Bernoulli channels here, in which case the transmitter's decisions would be simplified, our
formulation focuses on strategic interactions between two receivers. In the case of
unknown stochastic payoffs, we consider the use of a multi-armed bandit-based
learning algorithm. Relatively little is known about the performance of such on-
line learning algorithms in game formulations, though it has been shown that
they do not always converge to Nash equilibria [8].

2 Problem Formulation

In the rate allocation game we consider two receivers and one transmitter. The
transmitter uses the two receivers to forward data to the destination. The channel
from the transmitter to each receiver has one of the two states at each time slot:
low (L or 0) or high (H or 1). The two channels are independent of each other, and their states come from i.i.d. distributions. We denote by $p_i$ ($i = 1, 2$) the probability that channel i is in state high at any time. Before transmitting,
neither the receivers nor the transmitter know the state of the channel. At the
beginning of each time slot, each receiver makes a "bid" (high or low). The
transmitter allocates rate to the receivers according to the bids sent. At the end

of the time slot both receivers observe whether or not their transmission was
successful. A transmission is not successful if the respective channel is in a low
state but has been assumed to be in a high state.
Since the channel state is unknown in advance, the receivers' bids may lead to an unsuccessful transmission. If the transmission is successful, the receiver
is paid an amount proportional to the transmission rate. Otherwise, it will get a
penalty (negative reward). Table 1 shows the reward functions for each receiver.

Table 1. Reward to a receiver as a function of its bid, the actual state of its channel, and the other receiver's bid

Bid  Actual State  Other Channel Bid  Reward
L    L             L                  R1
L    L             H                  0
L    H             L                  R1
L    H             H                  0
H    L             L                  -C
H    L             H                  -D
H    H             L                  R3
H    H             H                  R2

Throughout the rest of the paper we will assume that R1 < R2 < R3 < 2R2 .
C and D are the penalties that the receivers get for making a high bid when
their channel state is low.
There are two roles in this game setting: the transmitter and the receivers.
The transmitter wants to carry as much data as possible to the destination. It
is not interested in penalizing the receivers, but only uses the penalty to give
incentive to the receivers to make good guesses. The receivers are only interested
in the reward and they don’t lose any utility from transmitting more data.

3 Parameters Known Cases - Receivers’ Perspective

Table 2 shows the relationship between the expected rewards for the two receivers
as a normal form game. In each cell, the first value corresponds to the reward
for receiver 1, and the second value corresponds to the reward for receiver 2.

Table 2. Expected rewards (receiver 1, receiver 2) as a normal-form game

                 Receiver 2: L               Receiver 2: H
Receiver 1: L    (R1, R1)                    (0, p2 R3 - (1 - p2)C)
Receiver 1: H    (p1 R3 - (1 - p1)C, 0)      (p1 R2 - (1 - p1)D, p2 R2 - (1 - p2)D)

3.1 Mixed Nash Equilibrium


We denote by $XY_i$ the expected reward for receiver i ($i = 1, 2$) when receiver 1 bids X and receiver 2 bids Y (where X and Y are high or low):

$$\begin{aligned} LL_1 &= R_1, & LL_2 &= R_1, \\ LH_1 &= 0, & LH_2 &= p_2 R_3 - (1 - p_2)C, \\ HL_1 &= p_1 R_3 - (1 - p_1)C, & HL_2 &= 0, \\ HH_1 &= p_1 R_2 - (1 - p_1)D, & HH_2 &= p_2 R_2 - (1 - p_2)D. \end{aligned} \tag{1}$$

Let receiver 1 bid high with probability q1 , and receiver 2 bid high with prob-
ability q2 . At Nash equilibrium, receiver 1 selects the probability such that the
utility function for receiver 2 is the same for both bidding high and bidding low.
Therefore we have:

$$(1 - q_1)\,LL_2 = (1 - q_1)\,LH_2 + q_1\,HH_2. \tag{2}$$

Similarly for receiver 2:

$$(1 - q_2)\,LL_1 = (1 - q_2)\,HL_1 + q_2\,HH_1. \tag{3}$$

Solving (2) and (3), we get

$$q_1 = \frac{-C + Cp_2 - R_1 + p_2 R_3}{-C + D + Cp_2 - Dp_2 - R_1 - p_2 R_2 + p_2 R_3}, \tag{4}$$

$$q_2 = \frac{-C + Cp_1 - R_1 + p_1 R_3}{-C + D + Cp_1 - Dp_1 - R_1 - p_1 R_2 + p_1 R_3}. \tag{5}$$
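Equations (4)-(5) are straightforward to evaluate numerically; the following sketch (ours, with illustrative parameter values satisfying R1 < R2 < R3 < 2R2) returns the mixed-equilibrium probabilities of bidding high:

```python
def mixed_ne(p1, p2, R1, R2, R3, C, D):
    """Mixed-NE probabilities of bidding high, Eqs. (4)-(5)."""
    q1 = (-C + C*p2 - R1 + p2*R3) / (-C + D + C*p2 - D*p2 - R1 - p2*R2 + p2*R3)
    q2 = (-C + C*p1 - R1 + p1*R3) / (-C + D + C*p1 - D*p1 - R1 - p1*R2 + p1*R3)
    return q1, q2

# Example values (ours): b1 = 1/3, b2 = 1/2, so p1 = p2 = 0.4 lies in (b1, b2),
# where the mixed equilibrium is interior; values outside [0, 1] would signal
# that a pure Nash equilibrium exists instead.
print(mixed_ne(p1=0.4, p2=0.4, R1=1.0, R2=2.0, R3=3.0, C=1.0, D=1.0))
```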
Setting $q_1$ and $q_2$ to be 0 or 1, we can find a relationship between the values of $p_1$ and $p_2$ and the existence of a pure Nash equilibrium:

$$\begin{aligned} &\text{If } q_1 = 0 \text{ and } q_2 = 0, \text{ then } p_1 = \tfrac{C+R_1}{C+R_3},\ p_2 = \tfrac{C+R_1}{C+R_3}. \\ &\text{If } q_1 = 0 \text{ and } q_2 = 1, \text{ then } p_1 = \tfrac{D}{D+R_2},\ p_2 = \tfrac{C+R_1}{C+R_3}. \\ &\text{If } q_1 = 1 \text{ and } q_2 = 0, \text{ then } p_1 = \tfrac{C+R_1}{C+R_3},\ p_2 = \tfrac{D}{D+R_2}. \\ &\text{If } q_1 = 1 \text{ and } q_2 = 1, \text{ then } p_1 = \tfrac{D}{D+R_2},\ p_2 = \tfrac{D}{D+R_2}. \end{aligned} \tag{6}$$

Denote

$$b_1 = \min\left\{ \frac{C+R_1}{C+R_3},\ \frac{D}{D+R_2} \right\}, \tag{7}$$

$$b_2 = \max\left\{ \frac{C+R_1}{C+R_3},\ \frac{D}{D+R_2} \right\}. \tag{8}$$

Theorem 1. If p1 ∉ [b1, b2] or p2 ∉ [b1, b2], then there exists a unique pure Nash
equilibrium.
Proof. Let p1 < b1 ; thus p1 < (C + R1)/(C + R3) and p1 < D/(D + R2).

HL1 = p1 R3 − (1 − p1 )C < b1 (R3 + C) − C ≤ R1 ,


HH1 = p1 R2 − (1 − p1 )D < b1 (R2 + D) − D ≤ 0.

Thus receiver 1 has a dominant strategy of bidding low. When receiver 1 bids
low, the optimal action for receiver 2 is bidding low if LL2 > LH2 , and high
otherwise.
Let p1 > b2 ; thus p1 > (C + R1)/(C + R3) and p1 > D/(D + R2).

HL1 = p1 R3 − (1 − p1 )C > b2 (R3 + C) − C ≥ R1 ,


HH1 = p1 R2 − (1 − p1 )D > b2 (R2 + D) − D ≥ 0.

Thus the dominant strategy for receiver 1 is bidding high. When receiver 1
bids high, the optimal action for receiver 2 is bidding high if HL2 < HH2 and
low otherwise.
The argument for p2 ∉ [b1, b2] is similar.
Lemma 1. If p1 ∈ (b1 , b2 ) and p2 ∈ (b1 , b2 ), there exists more than one Nash
equilibrium.
Proof. Let p1 ∈ (b1 , b2 ) and p2 ∈ (b1 , b2 ), then there are two possible scenarios:
Scenario 1: b1 = (C + R1)/(C + R3), b2 = D/(D + R2). Then

LH2 = p2 R3 − (1 − p2 )C = p2 (R3 + C) − C > b1 (R3 + C) − C = R1 .

Similarly, HL1 > R1 .

HH1 = p1 R2 − (1 − p1 )D = p1 (R2 + D) − D < b2 (R2 + D) − D = 0.

Similarly, HH2 < 0.


The payoff matrix for the receivers then becomes as shown in Table 3:

Table 3.

Receiver 1 \ Receiver 2   L             H
L                         (R1, R1)      (0, > R1)
H                         (> R1, 0)     (< 0, < 0)

There are two Nash equilibria: one receiver bids high while the other bids low.
Scenario 2: b1 = D/(D + R2), b2 = (C + R1)/(C + R3). Then

LH2 = p2 R3 − (1 − p2 )C = p2 (R3 + C) − C < b2 (R3 + C) − C = R1 .



Similarly, HL1 < R1 .

HH1 = p1 R2 − (1 − p1 )D = p1 (R2 + D) − D > b1 (R2 + D) − D = 0.

Similarly, HH2 > 0.


The payoff matrix for the receivers then becomes as shown in Table 4:

Table 4.

Receiver 1 \ Receiver 2   L             H
L                         (R1, R1)      (0, < R1)
H                         (< R1, 0)     (> 0, > 0)

There are two Nash equilibria: both bid high, or both bid low.
In the region (b1, b2) × (b1, b2), if both receivers play the mixed Nash equilibrium,
their utilities can be much worse than under a pure Nash equilibrium.
If the mixed Nash equilibrium is used, the expected total utilities of the two
receivers are:

U1 = (1 − q1)(1 − q2)R1 + q1(1 − p1)(1 − q2)(−C)
     + q1(1 − p1)q2(−D) + q1 p1(1 − q2)R3 + q1 p1 q2 R2 ,   (9)

U2 = (1 − q2)(1 − q1)R1 + q2(1 − p2)(1 − q1)(−C)
     + q2(1 − p2)q1(−D) + q2 p2(1 − q1)R3 + q2 p2 q1 R2 .   (10)

In the case where b1 = D/(D + R2) and b2 = (C + R1)/(C + R3), when p1 → b1 +
and p2 → b1 +, we have q1 → 1 and q2 → 1. Substituting into Eqs. (9) and (10),
we get U1 → 0 and U2 → 0, which is much worse than if they simply play the LL
Nash equilibrium. Both receivers suffer if they play the mixed Nash equilibrium.
For simplicity, we want to set C and D such that only pure Nash equilibria exist,
independent of the probabilities p1 and p2 .
Lemma 2. Given C, there exists a D such that only pure Nash equilibria exist.
Proof. When D = (−CR2 − R1 R2)/(R1 − R3), we get b1 = b2 , so only the pure
Nash equilibrium region exists.
Lemma 3. If only pure Nash equilibria exist, then both receivers have a dominant
strategy.
Proof. If only pure Nash equilibria exist, then we must have b1 = b2 = p.
There are four possible scenarios.
Scenario 1: p1 < p and p2 < p. The payoff matrix for the receivers becomes as
shown in Table 5:

Table 5.

Receiver 1 \ Receiver 2   L             H
L                         (R1, R1)      (0, < R1)
H                         (< R1, 0)     (< 0, < 0)

The dominant strategies for both receivers are bidding low.


Similarly, we have
Scenario 2: p1 < p and p2 > p, dominant strategy for receiver 1 is bidding low
and dominant strategy for receiver 2 is bidding high.
Scenario 3: p1 > p and p2 < p, dominant strategy for receiver 1 is bidding
high and dominant strategy for receiver 2 is bidding low.
Scenario 4: p1 > p and p2 > p, dominant strategy for both receivers is bidding
high.

4 Parameters Known Cases - Transmitter's Perspective

In this section, we consider the amount of data that can be sent via the two
receivers. Suppose the transmitter asks the two receivers to forward its data. What
the transmitter really cares about is how much data is sent; when a transmission
fails, we consider the amount of data sent to be 0. The penalty terms C and D
induce the receivers to adjust their bidding, but the transmitter itself does not
incur these penalties.
Table 6 shows the expected rewards from the transmitter's point of view:

Table 6.

Receiver 1 \ Receiver 2   L              H
L                         (R1, R1)       (0, p2 R3)
H                         (p1 R3, 0)     (p1 R2, p2 R2)

Utility functions from the transmitter’s point of view:

VLL = R1 + R1 ,        (11)
VHL = p1 R3 ,          (12)
VLH = p2 R3 ,          (13)
VHH = p1 R2 + p2 R2 .  (14)

Price of Anarchy (PoA):



PoA = max_{s∈S} V(s) / min_{s∈NE} V(s),   (16)

where S is the strategy set, NE is the set of Nash equilibria, and V(s) takes values
in {VLL , VHL , VLH , VHH }.
Theorem 2. If C = (R1 R3 − R1 R2)/(R2 − R1) and D = R1 R2 /(R2 − R1), then PoA < 2.

Proof. If C = (R1 R3 − R1 R2)/(R2 − R1) and D = R1 R2 /(R2 − R1), then
b1 = b2 = R1 /R2 , so only pure Nash equilibria exist. Let p = R1 /R2 .
If p1 < p and p2 < p,

VLL = 2R1 ,                          (17)
VHL = p1 R3 < R1 R3 /R2 < 2R1 ,      (18)
VLH = p2 R3 < R1 R3 /R2 < 2R1 ,      (19)
VHH = p1 R2 + p2 R2 < 2R1 .          (20)

The optimum is LL. The Nash equilibrium is also LL. Thus PoA = 1.


If p1 < p and p2 > p,

VLL = 2R1 < 2p2 R2 < 2p2 R3 ,                (22)
VHL = p1 R3 < p2 R3 ,                        (23)
VLH = p2 R3 ,                                (24)
VHH = p1 R2 + p2 R2 < 2p2 R2 < 2p2 R3 .      (25)

The optimum is at most 2p2 R3 . The Nash equilibrium is LH. Thus PoA < 2.
If p1 > p and p2 < p, similar to the p1 < p and p2 > p case.
If p1 > p and p2 > p,

VLL = 2R1 < 2p1 R2 ,                 (27)
VHL = p1 R3 < 2p1 R2 ,               (28)
VLH = p2 R3 < 2p2 R2 ,               (29)
VHH = p1 R2 + p2 R2 .                (30)

The optimum is at most 2(p1 R2 + p2 R2). The Nash equilibrium is HH. Thus PoA < 2.
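
The case analysis above is easy to check numerically. The sketch below (our own scaffolding) evaluates the transmitter utilities (11)–(14) and the realized ratio at the dominant-strategy equilibrium induced by the threshold p = R1/R2 of Theorem 2.

def transmitter_poa(R1, R2, R3, p1, p2):
    # Transmitter utilities, Eqs. (11)-(14).
    V = {('L', 'L'): 2 * R1,
         ('H', 'L'): p1 * R3,
         ('L', 'H'): p2 * R3,
         ('H', 'H'): p1 * R2 + p2 * R2}
    p = R1 / R2
    # Dominant strategies under the penalties of Theorem 2 (cf. Lemma 3).
    ne = ('H' if p1 > p else 'L', 'H' if p2 > p else 'L')
    return max(V.values()) / V[ne]

# With R1 < R2 < R3 < 2*R2, Theorem 2 guarantees the ratio is below 2:
print(transmitter_poa(40, 45, 60, 6/9, 7.9/9))  # 1.0 here, since LL is optimal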

Lemma 4. In the rate allocation game, for any fixed penalties C and D, there
exist p1 and p2 such that the PoA is at least 2R1 /R3 .

Proof. Assume that p1 = 0 and p2 = 1. Then Table 7 shows the receivers'
payoff matrix and Table 8 shows the transmitter's payoff.

Table 7. Receivers' payoff for p1 = 0 and p2 = 1

Receiver 1 \ Receiver 2   L             H
L                         (R1, R1)      (0, R3)
H                         (−C, 0)       (−D, R2)

Table 8. Transmitter's payoff for p1 = 0 and p2 = 1

Receiver 1 \ Receiver 2   L       H
L                         2R1     R3
H                         0       R2

Since R3 > R1 , the only Nash equilibrium in this instance of the game is
(L, H), for a transmitter utility of R3 . If 2R1 > R3 , then the optimal solution
from the transmitter's perspective is (L, L), for a utility of 2R1 .
The Price of Anarchy is therefore at least 2R1 /R3 .

Corollary 1. The Price of Anarchy for the rate allocation game over all in-
stances can be arbitrarily close to 2 for any C and D.

Proof. Setting R1 = R2 − ε = R3 − 2ε (ε → 0+) in Lemma 4 leads to
PoA → 2.

This corollary implies that our result in Theorem 2, showing that the PoA
can be bounded by 2, is essentially tight, in the sense that no better guarantee
can be provided that applies to all problem parameters.

5 Online Learning Using Multi-armed Bandit Algorithms

When the channel parameters are known, and C and D are set as described in
Theorem 2, both receivers have dominant strategies. However, when the channel
parameters are unknown, the receivers need to try both actions: sending at the
high data rate or sending at the low data rate. The underlying channels are
stochastic, and even to each receiver, the probability that its own channel will
be good is unknown. Multi-armed bandits are a handy tool for such stochastic
channel problems, so we adopt the well-known UCB1 algorithm [1] to find the
optimal strategy for each receiver. The arms correspond to the actions of bidding
high or bidding low. Each receiver records only the average reward and the number
of plays of each arm, and plays according to the UCB1 algorithm in a distributed
manner, without taking the other receiver's actions into account.
We recap the UCB1 algorithm in Alg. 1, normalizing the rewards in our case
to lie between 0 and 1.

Algorithm 1. Online learning using UCB1

There are two arms for each receiver: bidding high or bidding low. Let x̄l denote
the average reward gained by the receiver from playing arm l (l = H, L), and let
nl denote the number of times arm l has been played.
Initialization: Play each arm once, store the initial rewards in x̄l , and set nl = 1.
for time slot n = 1, 2, · · · do
    Select the arm with the highest value of (x̄l + D)/(R3 + D) + √(2 ln(n)/nl ).
    Play the selected arm for a time slot. Update the average reward x̄l and the
    play count nl of the selected arm.
end for
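
A minimal Python rendition of Algorithm 1 for a single receiver might look as follows. The class interface (a choose/update pair) is our own scaffolding, not the paper's; raw rewards are normalized from [−D, R3] to [0, 1] exactly as in the index above.

import math

class UCB1Receiver:
    # One receiver running Algorithm 1 over the two arms 'L' and 'H' (a sketch).

    def __init__(self, R3, D):
        self.R3, self.D = R3, D
        self.avg = {'L': 0.0, 'H': 0.0}   # average raw reward of each arm
        self.count = {'L': 0, 'H': 0}     # number of plays of each arm

    def choose(self, n):
        # Initialization: play each arm once before using the index.
        for arm in ('L', 'H'):
            if self.count[arm] == 0:
                return arm
        # UCB1 index with raw rewards normalized from [-D, R3] to [0, 1].
        def index(arm):
            normalized = (self.avg[arm] + self.D) / (self.R3 + self.D)
            return normalized + math.sqrt(2 * math.log(n) / self.count[arm])
        return max(('L', 'H'), key=index)

    def update(self, arm, reward):
        # Incremental update of the running average and the play count.
        self.count[arm] += 1
        self.avg[arm] += (reward - self.avg[arm]) / self.count[arm]

In each time slot, both receivers call choose independently, the two bids and the realized channel states (Bernoulli(p1) and Bernoulli(p2)) determine the raw rewards according to Table 1, and each receiver then calls update with its own reward only.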

6 Simulations

In this section we present simulation results showing that the UCB1 learning
algorithm performs well. In all simulations we fix the penalties C and D as in
Theorem 2, which leaves each receiver with a dominant strategy; this strategy,
however, is not usually known by the receivers. In the figures below we show how
the UCB1 learning algorithm compares with playing the dominant strategy (if
the receiver knew it), and find that using UCB1 does not lose much utility on
average, and is sometimes even better than the dominant strategy.
First, in Figure 2, we assume that receiver 2 knows the probability of its channel
state being high, and plays its own dominant strategy. In this case receiver 1
would be better off if it knew its own probability and played the dominant
strategy. However, playing UCB1 does not lose much utility on average. Figure 2
shows, for each R1 as a fraction of R3 , the average payoff over multiple games in
which R2 , p1 , and p2 are distributed over their entire domain.
In Figure 3 we show the average payoff over multiple choices of R2 , p1 , and p2 ,
when receiver 1 plays either the dominant strategy or the UCB1 strategy, and
receiver 2 plays the UCB1 strategy. We can see that the dominant strategy is only
better on average for large values of R1 ; for small values of R1 , playing UCB1
brings a better payoff.
Figures 4 and 5 show the same scenarios from the transmitter's perspective.
Figure 4 compares the optimal average utility the transmitter could get from each
game to the average utility the transmitter gets when receiver 1 uses UCB1 or
its dominant strategy, while receiver 2 plays its dominant strategy. We notice
that both strategies give almost the same payoff to the transmitter, especially
when the value of R1 is much smaller than R3 . This happens because when
receiver 1 uses UCB1 against a player that uses its dominant strategy, receiver 1
will quickly learn to play its dominant strategy as well. Figure 5 shows how the
transmitter's optimal payoff compares to its payoff when receiver 2 uses UCB1.
When both receivers use the UCB1 algorithm to choose their strategies, the
transmitter's payoff is better than when one receiver uses the dominant strategy
and the other receiver

[Figure omitted: payoff of UCB1 vs. dominant strategy when the other receiver plays its dominant strategy; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1.]

Fig. 2. Receiver 1 payoff against receiver 2 using dominant strategy

[Figure omitted: payoff of UCB1 vs. dominant strategy when the other receiver plays UCB1; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1.]

Fig. 3. Receiver 1 payoff against receiver 2 using UCB1 strategy

uses the UCB1 learning algorithm. When both receivers are using the UCB1
learning algorithm, the receivers do not play the Nash equilibrium when it is
much worse than cooperating. This is why UCB1 sometimes performs better
than the dominant strategy.
Finally, Figure 6 shows how the transmission rate varies when the receivers use
the UCB1 learning algorithm, compared to the optimal transmission rate. In this
simulation we vary the actual probabilities of the two channels while keeping
the rewards unchanged, and we observe that when the two channels are equally
good the UCB1 algorithm obtains an almost optimal transmission rate.
We now consider two specific problem instances to illustrate the performance
when UCB1 is adopted by both receivers. In both cases, we assume the following
parameters hold:

[Figure omitted: transmitter perspective, UCB1 vs. dominant strategy when the other receiver plays its dominant strategy; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1, OPT.]

Fig. 4. Transmitter payoff when one receiver uses dominant strategy

[Figure omitted: transmitter perspective, UCB1 vs. dominant strategy when the other receiver plays UCB1; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1, OPT.]

Fig. 5. Transmitter payoff when one receiver uses UCB1 strategy

R1 = 40, R2 = 45, R3 = 60, C = 120, D = 360, T = 10^5, b1 = b2 = 8/9.

Example 1: Probability parameters p1 = 6/9, p2 = 7.9/9

In this case, the payoff matrix from the receivers' point of view is shown in
Table 9.
The optimal action (from the transmitter's perspective) is both receivers bidding
low. When both receivers apply UCB1, we find that for receiver 1, the number of
times out of 100,000 that it bids high is 657 and the number of times it bids low
is 99,343; for receiver 2, the number of times it bids high is 39,814 and the number
of times it bids low is 60,186.
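
These matrix entries follow directly from the expected rewards of Table 2; the small sketch below (ours) reproduces Table 9 from the parameters above.

def expected_matrix(R1, R2, R3, C, D, p1, p2):
    # Expected receiver payoffs (Table 2) for given channel parameters.
    return {('L', 'L'): (R1, R1),
            ('L', 'H'): (0, p2*R3 - (1 - p2)*C),
            ('H', 'L'): (p1*R3 - (1 - p1)*C, 0),
            ('H', 'H'): (p1*R2 - (1 - p1)*D, p2*R2 - (1 - p2)*D)}

# Example 1's parameters reproduce Table 9: LH gives (0, 38), HH gives (-90, -4.5).
print(expected_matrix(40, 45, 60, 120, 360, 6/9, 7.9/9))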

[Figure omitted: transmitter perspective for R1 = 8, R2 = 9, R3 = 10 and various values of p1 and p2; normalized average payoff per game vs. p1, with curves for p2 = .2, .4, .6.]

Fig. 6. Normalized transmitter payoff with respect to optimum when both play UCB1
as a function of the two channel parameters

Table 9.

Receiver 1 \ Receiver 2   L             H
L                         (40, 40)      (0, 38)
H                         (0, 0)        (−90, −4.5)

Example 2: Probability parameters: p1 = 6/9, p2 = 8.1/9


The payoff matrix from the receivers' point of view is shown in Table 10:

Table 10.

Receiver 1 \ Receiver 2   L             H
L                         (40, 40)      (0, 42)
H                         (0, 0)        (−90, 4.5)

In this case, the optimal action (from the transmitter's perspective) is receiver
1 bidding low and receiver 2 bidding high. For receiver 1, the number of times out
of 100,000 that it bids high is 622 and the number of times it bids low is 99,378;
for receiver 2, the number of times it bids high is 62,706 and the number of times
it bids low is 37,294.
These examples illustrate how the distributed learning algorithm is sensitive
to the underlying channel parameters and learns to play the right bid over a
sufficient period of time, although as expected, the regret is higher when the
channel parameter is close to b1 .

7 Conclusion

We have presented and investigated a competitive rate allocation game in which


two selfish receivers compete to forward the data from a transmitter to a des-
tination for a rate-proportional fee. We showed that even if the transmitter is
unaware of the stochastic parameters of the two channels, it can set penalties
for failures in such a way that the two receivers’ strategic bids yield a total rate
that is not less than half of the best possible rate it could achieve if it had knowl-
edge of the channel parameters. We have also studied the challenging case when
the underlying channel is unknown, resulting in a game with unknown stochas-
tic payoffs. For this game, we numerically evaluated the use of the well-known
UCB1 strategy for multi-armed bandits, and showed that it gives performance
close to the dominant strategies (in the case the payoffs are known) or sometimes
even better. In future work, we would like to obtain more rigorous results for
the game with unknown stochastic payoffs.

Acknowledgment. This research was supported by the U.S. Army Research


Laboratory under the Network Science Collaborative Technology Alliance,
Agreement Number W911NF-09-2-0053.

References
1. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit
problem. Machine Learning 47, 235–256 (2002)
2. MacKenzie, A., DaSilva, L.: Game Theory for Wireless Engineers. Morgan and
Claypool Publishers (2006)
3. Altman, E., Boulogne, T., El-Azouzi, R., Jimenez, T., Wynter, L.: A survey on
networking games in telecommunications. Computers and Operations Research 33,
286–311 (2006)
4. Yu, W., Ginis, G., Cioffi, J.M.: Distributed Multiuser Power Control for Digital
Subscriber Lines. IEEE Jour. on Selected Areas in Communications 20(5), 1105–
1115 (2002)
5. Mertikopoulos, P., Belmega, E.V., Moustakas, A.L., Lasaulce, S.: Distributed Learn-
ing Policies for Power Allocation in Multiple Access Channels. IEEE Journal on
Selected Areas in Communications 30(1), 96–106 (2012)
6. Laourine, A., Tong, L.: Betting on Gilbert-Elliott Channels. IEEE Transactions on
Wireless Communications 50(3), 484–494 (2010)
7. Wu, Y., Krishnamachari, B.: Online Learning to Optimize Transmission over an
Unknown Gilbert-Elliott Channel. WiOpt (May 2012)
8. Daskalakis, C., Frongillo, R., Papadimitriou, C.H., Pierrakos, G., Valiant, G.: On
Learning Algorithms for Nash Equilibria. In: Kontogiannis, S., Koutsoupias, E.,
Spirakis, P.G. (eds.) SAGT 2010. LNCS, vol. 6386, pp. 114–125. Springer, Heidelberg
(2010)
Convergence Dynamics of Graphical Congestion
Games

Richard Southwell1 , Yanjiao Chen2 , Jianwei Huang1 , and Qian Zhang2


1 Information Engineering Department, The Chinese University of Hong Kong,
Shatin, New Territories, Hong Kong
2 Computer Science and Engineering Department,
Hong Kong University of Science and Technology,
Clear Water Bay, Kowloon, Hong Kong
{richardsouthwell254,chenyj.thu,jianweihuang}@gmail.com,
[email protected]

Abstract. Graphical congestion games provide powerful models for a


wide range of scenarios where spatially distributed individuals share re-
sources. Understanding when graphical congestion game dynamics con-
verge to pure Nash equilibria yields important engineering insights into
when spatially distributed individuals can reach a stable resource alloca-
tion. In this paper, we study the convergence dynamics of graphical con-
gestion games where players can use multiple resources simultaneously.
We show that when the players are free to use any subset of resources the
game always converges to a pure Nash equilibrium in polynomial time
via lazy best response updates. When the collection of sets of resources
available to each player is a matroid, we show that pure Nash equilibria
may not exist in the most general case. However, if the resources are ho-
mogenous, the game can converge to a Nash equilibrium in polynomial
time.

Keywords: congestion game, resource allocation, matroid, games on


graphs, graphical.

1 Introduction

Congestion games have found applications in many scientific and engineering


areas. The original congestion game model was introduced by Rosenthal [1]. The
idea behind this model is that players select resources to use, and the payoff a
player gains from using a given resource depends upon that resource’s congestion
level (i.e., the total number of players using it).
The original congestion game model is very general, because it allows different
resources to be associated with different payoff functions, and it allows players
to use multiple resources simultaneously. Also, the game has a very appealing

This work is supported by the General Research Funds (Project Number 412509)
established under the University Grant Committee of the Hong Kong Special Ad-
ministrative Region, China.


feature called the finite improvement property, which means that if the players
keep performing asynchronous better response updates (i.e., the players improve
their strategy choices one at a time) then the system will eventually reach a pure
Nash equilibrium - a strategy profile from which no player has any incentive
to deviate. Intuitively, the finite improvement property means greedy updating
always converges to a stable strategy profile.
The generality and pleasing convergence properties of the original congestion
game model have led to its application to a wide range of resource allocation
scenarios (e.g., economics [2], communication networks [3–6], network routing [7],
network formation [8], ecology [9], and sociology [10]). However, the original
model has the limitation that each player using the same resource gets the same
payoff from it. Treating players identically is unsuitable for many scenarios in
ecology [11], network routing [12], and wireless networks [13] where players are
heterogenous. This has motivated many adaptations of the original congestion
game, including congestion games with player-specific payoff functions [4,14] and
weighted congestion games [16].
In [17], we considered the graphical congestion game (see Figure 1), an im-
portant generalization of the original congestion game concept. This model not
only allows player-specific payoff functions but also models how the spatial po-
sitioning of the players affects their performance in the game. In the original
congestion game model, any pair of users cause congestion to one another when
using the same resource. In the graphical congestion game, we regard the play-
ers as vertices in a conflict graph. Only linked players cause congestion to each
other. Unlinked players can use the same resource without causing congestion
to each other. We describe some scenarios that can be modeled using graphical
congestion games in Table 1.

Fig. 1. A strategy profile in a graphical congestion game. The players (i.e., the vertices
on the graph) select sets of resources to use. Player 1 is using resources 1 and 2. The
amount of payoff a player gains from using a particular resource is a non-increasing
function of the number of its neighbors who are also using that resource.

Table 1. How graphical congestion games can be used to model various resource
sharing scenarios

Scenario                       Players Represent   Resources Represent        Links in the Conflict Graph Represent
Ecology [9]                    Organisms           Food Sources or Habitats   Organisms are spatially close enough to compete for the same food source or habitat.
Wireless Networks [13,15,17]   Wireless Users      Channels                   Users are close enough to cause significant interference to each other.
Market Sharing [23]            Businesses          Markets                    Business locations are close enough to compete over the same customers.

Although the graphical congestion game has a wide range of applications, it is


no longer guaranteed to possess the finite improvement property or even pure
Nash equilibria. Since the graphical congestion game is highly practically rele-
vant yet may lose some nice features, the obvious question is as follows: What are
the conditions under which a graphical congestion game possesses a pure Nash
equilibrium or even the finite improvement property? This is a question of fun-
damental importance for many spatially distributed resource sharing scenarios.

1.1 Problem Definition


A generalized graphical congestion game is a 5-tuple g =
(N , R, (ζn )n∈N , (fnr )n∈N ,r∈R , G), where:
• N = {1, 2, ..., N } is a set of N players.
• R = {1, 2, ..., R} is a set of R resources.
• ζn ⊆ 2R is1 the collection of resource sets available to player n ∈ N . During
the game player n selects a member of ζn to use. Therefore ζn can be viewed
as the set of strategies available to player n. Sometimes we refer to ζn as
the collection of available resource sets, and the members of ζn as available
resource sets.
• fnr is the non-increasing payoff function for a player n ∈ N using resource
r ∈ R.
• G = (N , E) is an undirected graph with vertex set N and edge set E. Here E is
a set of unordered pairs {n, n′} of players. We say that player n ∈ N is linked
to player n′ ∈ N if and only if {n, n′} ∈ E. We can interpret {n, n′} ∈ E as
being equivalent to saying that n and n′ can cause congestion to one another.
We assume {n, n} ∉ E for each player n ∈ N . In other words, we assume that
no player is adjacent to itself2 . We refer to G as the conflict graph.
1 Here 2R denotes the set of all subsets of R.
2 This is just a convention we adopt for simplicity. All our results persist if we allow
players to be adjacent to (i.e., cause congestion to) themselves, but the results would
look more cumbersome. One could emulate the idea that a player n is adjacent to
itself under our framework by replacing its payoff functions fnr (x) with new payoff
functions fnr (x + 1).

A strategy profile X ∈ Π_{n=1}^{N} ζn consists of a strategy (i.e., a collection of
resources) Xn ∈ ζn for each player n ∈ N .
We define the congestion level crn (X) of resource r ∈ R for player n ∈ N
within strategy profile X to be crn (X) = |{n′ ∈ N : {n, n′} ∈ E, r ∈ Xn′ }|. In
other words, crn (X) is the number of neighbors that n has in the conflict graph G
which are using resource r in strategy profile X. The total payoff that a player
n gets in strategy profile X is

Σ_{r∈Xn} fnr (crn (X)) .

This is the sum of the payoffs fnr (crn (X)) that n receives from each of the re-
sources r within the resource set Xn that n chooses.
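
These definitions translate directly into code. In the Python sketch below (our own representation: the conflict graph as a set of unordered pairs, X as a dictionary mapping players to chosen resource sets), we use an assumed star-shaped conflict graph, since Figure 1 itself is not reproduced here; with the payoffs fnr (x) = 1 − x of the illustrative example in Section 1.2 below, it recovers the payoff computed there.

def congestion(n, r, X, edges):
    # c_n^r(X): how many of n's conflict-graph neighbors use resource r.
    return sum(1 for (u, v) in edges
               if (u == n and r in X[v]) or (v == n and r in X[u]))

def total_payoff(n, X, edges, f):
    # Total payoff of player n: sum of f(n, r, c_n^r(X)) over r in X[n].
    return sum(f(n, r, congestion(n, r, X, edges)) for r in X[n])

# Assumed star-shaped conflict graph centered at player 1 (an assumption,
# consistent with the congestion levels quoted in Section 1.2).
edges = {(1, 2), (1, 3), (1, 4)}
X = {1: {1, 2}, 2: {1}, 3: {2}, 4: {1}}
f = lambda n, r, x: 1 - x
print(total_payoff(1, X, edges, f))  # (1 - 2) + (1 - 1) = -1, as in the text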

1.2 Better, Best, and Lazy Best Response Updates


We are concerned with how graphical congestion games evolve through time as
the players attempt to improve their resource choices.
Let us define an [n] → S update as the action where player n ∈ N
switches to use resource set S ∈ ζn , while all other players retain their ex-
isting resource selections. If the current strategy profile is X, then the [n] → S
update changes the strategy profile from X to a new strategy profile Y =
(X1 , ..., Xn−1 , S, Xn+1 , ..., XN ). We wish to emphasize that an [n] → S update
(and, in fact every update we consider) only involves one player changing its
strategy, while all other players keep their strategy choices unchanged.
We say that an [n] → S update is a better response update if it improves
player n’s payoff, i.e.,
 
Σ_{r∈Yn} fnr (crn (Y )) > Σ_{r∈Xn} fnr (crn (X)) .

We say that [n] → S is a best response update if it improves player n’s payoff
to the maximum possible value among all better responses from the current
strategy profile.
We say that [n] → S is a lazy best response update [16] if (a) [n] → S is a
best response update, and (b) for any other best response update [n] → S′ that
n could perform, we have |Xn − S′| + |S′ − Xn | ≥ |Xn − S| + |S − Xn |. In other
words, a lazy best response update is a best response update which minimizes
the number |Xn − S| + |S − Xn | of resources which n must add to or remove from
its currently chosen resource set Xn .
We say a strategy profile X ∈ Π_{n=1}^{N} ζn is a pure Nash equilibrium3 if and only
if no better response updates can be performed by any player from X.
We give an illustrative example of such a graphical congestion game in Figure
1. Suppose that the collections of available resources for the four players/vertices
3
We always suppose players use pure strategies and so all of the Nash equilibria that
we discuss are pure.

are ζ1 = 2{1,2,3} , ζ2 = ζ4 = {∅, {1}}, and ζ3 = {∅, {2}}. Assume that the payoff
functions are fnr (x) = 1 − x for each player n and resource r. In the strategy
profile X shown in Figure 1, player 1 uses strategy X1 = {1, 2} and receives a
total payoff of f11 (c11 (X)) + f12 (c21 (X)) = (1 − 2) + (1 − 1). From this strategy
profile, player 1 could perform the better response update [1] → {2} (which is
not a best response update), or the best response update [1] → {2, 3} (which is
not a lazy best response update), or the lazy best response update [1] → {3}
(which leads to a pure Nash equilibrium).
We are interested in how graphical congestion games evolve when the players
keep performing better response updates. Nash equilibria are the fixed points
of such dynamics, since no player has any incentive to deviate from a Nash
equilibrium.
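
Building on the helpers of the earlier sketch, a lazy best response can be computed by enumerating a player's available resource sets, taking the maximal-payoff ones, and breaking ties by the fewest added/removed resources. The payoffs g(x) = 1 − 2x in the demonstration are our own toy numbers, chosen so that the best response is unique.

from itertools import combinations

def powerset(resources):
    s = sorted(resources)
    return [set(c) for k in range(len(s) + 1) for c in combinations(s, k)]

def lazy_best_response(n, X, edges, f, strategies):
    # Among n's best responses, pick one minimizing |Xn - S| + |S - Xn|.
    def value(S):
        Y = dict(X)
        Y[n] = S
        return total_payoff(n, Y, edges, f)
    best = max(value(S) for S in strategies)
    return min((S for S in strategies if value(S) == best),
               key=lambda S: len(X[n] ^ S))

g = lambda n, r, x: 1 - 2 * x  # toy payoffs (ours)
print(lazy_best_response(1, X, edges, g, powerset({1, 2, 3})))  # {3}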
We can put the properties a congestion game might possess in ascending order
of strength/desirability as follows:

1. A pure Nash equilibrium exists.


2. A sufficiently long sequence of lazy best response updates is guaranteed to
drive the system to a pure Nash equilibrium.
3. A sufficiently long sequence of better response updates is guaranteed to drive
the system to a pure Nash equilibrium (the finite improvement property).

This paper is mainly concerned with identifying conditions under which the
generalized graphical congestion games have properties 1, 2, and 3. It should be
noted that the presence of property 3 implies the presence of property 2, which
in turn, implies the presence of property 1. However it is possible to construct
games with only subset (or none) of the above properties.

1.3 Previous Work

Graphical congestion games were introduced in [19], where the authors consid-
ered linear and non-player specific payoff functions. Such games are proved to
have the finite improvement property when the graph is undirected or acyclic.
But the authors illustrated a game on a directed graph with no pure Nash equilib-
ria. In [20], players are assigned different weights, so they suffer more congestion
from “heavier” neighbors. Both [19] and [20] restricted their attention to “sin-
gleton games” (where each player uses exactly one resource at any given time)
with linear and non-player-specific payoff functions.
In [17], the authors introduced the more general graphical congestion game
model as described in Section 1.1 to model spectrum sharing in wireless net-
works (see Table 1). The model allows generic player-specific payoff functions,
as wireless users often have complicated and heterogeneous responses to the
received interference. The authors showed that every singleton graphical con-
gestion game with two resources has the finite improvement property. They also
gave an example of a singleton graphical congestion game (with player-specific
and resource-specific payoff functions) which does not possess any pure Nash
equilibria. In [13], we extended upon this work by showing that every singleton

graphical congestion game with homogenous resources (i.e. the payoff functions
are not resource-specific) converges to a pure Nash equilibrium in polynomial
time.
In [15], the authors investigated the existence of pure Nash equilibrium of
spatial spectrum sharing games on general interference graphs, especially when
Aloha and random backoff mechanisms are used for channel contention. They
also proposed an efficient distributed spectrum sharing mechanism based on
distributed learning.

1.4 Our Results


We focus upon the generalized graphical congestion games where players can use
multiple resources simultaneously. In general, a player n can use any available
set of resources from ζn at any given time. Our results suggest that the kinds of
restrictions put on the combinatorial structure of the collections of available re-
source sets, ζn , have a dramatic influence on whether the convergence properties
exist or not.
In particular, we find that when the collections of available resource sets ζn
are “matroids” [21], many powerful results can be derived. A matroid M ⊆ 2U
with a ground set U is a set M of subsets S ⊆ U (called independent sets) which
has the following three properties4 :
1. Empty set ∅ ∈ M .
2. If S ∈ M and S′ ⊆ S, then S′ ∈ M .
3. If S ∈ M contains fewer elements than S′ ∈ M , then there exists some x ∈
S′ − S such that S ∪ {x} ∈ M .
We refer to 1, 2 and 3 in the above list as the matroid properties. Properties 1
and 2 are natural. Property 3 ensures that many examples of “independent set
structures” from combinatorics and linear algebra are matroids. In a graph, the
collection of subsets of edges which hold no cycles is a matroid. If U is a finite set
of vectors from a vector space, and M is the collection of linearly independent
subsets of U , then M is a matroid. Another important example of a matroid is
the uniform matroid {S ⊆ U : |S| ≤ k}, which is the collection of all subsets of
a set U which have no more than k elements. A simple kind of matroid is the
powerset M = 2U = {S ⊆ U } (i.e., the collection of all subsets of U ).
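
The three matroid properties can be checked mechanically for small collections; the sketch below (ours) does so for a uniform matroid.

from itertools import combinations

def is_matroid(M):
    # Check the three matroid properties for a finite collection M of sets.
    M = {frozenset(S) for S in M}
    if frozenset() not in M:                          # property 1
        return False
    for S in M:                                        # property 2: subset-closed
        for k in range(len(S)):
            if any(frozenset(T) not in M for T in combinations(S, k)):
                return False
    for S in M:                                        # property 3: augmentation
        for S2 in M:
            if len(S) < len(S2) and not any(S | {x} in M for x in S2 - S):
                return False
    return True

# The uniform matroid {S ⊆ U : |S| <= 2} over U = {1, 2, 3, 4}:
uniform = [set(c) for k in range(3) for c in combinations([1, 2, 3, 4], k)]
print(is_matroid(uniform))  # True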
A matroid graphical congestion game is a graphical congestion game, within
which the collection of available resource sets ζn of each player n is a matroid
with a ground set R. Matroids are very general, and so matroid graphical conges-
tion games have many applications. In Table 1, we discussed how the graphical
congestion game can be used to model ecologies, wireless networks, and market
sharing. In each of these cases, it is more reasonable to assume that the collection
of available resource sets of each player forms a uniform matroid than to treat
the system as a singleton graphical congestion game. For example, in ecology
4 We write |S| to denote the number of elements in set S, and S′ − S to denote the
set of elements in S′ but not in S.

the organisms will be able to access multiple food sources, but they will not be
able to access more than a certain number of food sources because of limited
time and energy. In wireless networks, users can divide their transmission power
among many channels, however they cannot access too many channels because
their total power is limited.5 In market sharing games (e.g., [23]), each player
has a fixed budget they can spend upon serving markets. When the cost of serv-
ing each market is the same, this corresponds to a uniform matroid congestion
game because the number of markets a player can serve is capped. Linked pay-
ers in a matroid graphical congestion game could represent businesses who are
close enough to compete for the same customers. As [16] noted, some network
formation games correspond to congestion games with a matroid structure. For
example, in [22] the authors considered the game where players select spanning
trees of a graph, but suffer congestion from sharing edges with other players.
In such scenarios, the conflict graph could represent which players are able to
observe each others’ actions.
In section 2, we consider the properties of a special important type of matroid
graphical congestion game, the powerset graphical congestion game, within which
the collection of available resource sets ζn of each player n is a powerset ζn = 2Qn
for some subset Qn ⊆ R of available resources. In section 3, we investigate the
properties of more general matroid graphical congestion games. Our main results
are listed below (and illustrated in Figure 2):

• There exist powerset graphical congestion games with homogenous resources
which do not have the finite improvement property (Theorem 1).
• Every powerset graphical congestion game will reach a pure Nash equilibrium
in polynomial time when the system evolves via lazy best response updates
(Theorem 2).
• There exist matroid graphical congestion games which possess no pure Nash
equilibria (Theorem 3).
• Every matroid graphical congestion game with homogenous resources will
reach a pure Nash equilibrium in polynomial time when the system evolves
via lazy best response updates (Theorem 4).

Our main result is Theorem 4, because it identifies a very general class of games
with pleasing convergence properties. This result is especially meaningful for
wireless networks, because wireless channels often have equal bandwidth, which
means that they correspond to homogenous resources (under flat fading or in-
terleaved channelization). The way we prove this convergence result is to define
a potential function, which decreases whenever a player performs a lazy best
response update. The existence of such a function guarantees that lazy best
response updatings eventually lead to a fixed point (a pure Nash equilibrium).
Due to limited space, we refer the readers to our online technical
report [24] for the full proofs of most results in this paper.
5
In reality, when a user shares its power among many channels, the benefit they
receive from using each one is diminished. Our game model does not capture this
effect, however other models that do [18] are often analytically intractable.

[Figure omitted: diagram of the classes Matroid GCG, Powerset GCG, Powerset homo-resource GCG, and Matroid homo-resource GCG.]

Fig. 2. In both Powerset GCG and Matroid homo-resource GCG, lazy best
response update converges to pure Nash equilibria in polynomial time. However, even
in the intersection class of Powerset homo-resource GCG, there exist examples
where better response update may never converge to pure Nash equilibria.

2 Powerset Graphical Congestion Games

We begin our exploration of the dynamics of graphical congestion games with


the “powerset” case, where players may use any subset of a set Qn of resources
available to them. In powerset congestion games, the decision of whether or not
to use one resource has no effect on a player’s ability to use the other resources.
This fact allows us to decouple the system and consider the usage of each resource
separately.
As we shall see, the players in a powerset graphical congestion game can reach
a pure Nash equilibrium in polynomial time via selfish updating. However, the
players must be careful about what kind of updates they perform, because the
following result suggests that better response updating is not guaranteed to lead
to a pure Nash equilibrium.

Theorem 1. There exist powerset graphical congestion games with homogenous


resources which do not have the finite improvement property.

Proof. Consider the powerset graphical congestion game g with players N =


{1, 2, 3}, resources R = {1, 2, 3, 4}, strategy sets ζ1 = ζ2 = ζ3 = 2{1,2,3,4}
and payoff functions fnr such that (f1r (0), f1r (1), f1r (2)) = (0, −5, −7) and
(f2r (0), f2r (1), f2r (2)) = (f3r (0), f3r (1), f3r (2)) = (0, −2, −7). The game is played
on a three vertex complete graph G. Figure 3 shows how better response up-
dating can lead to cycles in g, meaning g does not have the finite improvement
property. 

Fig. 3. A cycle in the best response dynamics of the powerset graphical congestion
game discussed in the proof of Theorem 1. The arrows represent how the strategy
profile changes with certain better response updates. Better response updating cannot
be guaranteed to drive this game into a pure Nash equilibrium because better response
updating can lead to cycles.

Notice that the example game in the proof of Theorem 1 is played on a complete
graph and has homogenous resources. Thus the lack of the finite improvement
property is not due to any special property of the graph or the resources. Theorem
1 seems to be quite negative. However, as we shall see, the players can often
be guaranteed to reach pure Nash equilibria if they update their resources in
special ways (instead of unregulated asynchronous updates). Before we describe
this in more detail, let us introduce some tools that will be useful throughout
our analysis: beneficial pickups, beneficial drops, and the temperature function.

2.1 Beneficial Pickups and Drops


A better response update may alter the set of resources that a player is using
in quite complicated ways. However, we will show that better response updates
can be decomposed into sequences of elementary update operations. Here we
introduce two such operations: the beneficial pickup (where a player starts using
a good new resource) and the beneficial drop (where a player stops using a bad
old resource).
More formally, suppose we have a graphical congestion game in the strategy
profile X. A beneficial pickup is a better response update [n] → Xn ∪ {a} with
a ∉ Xn (i.e., a beneficial pickup is where a player starts using a new resource a

and obtains additional benefits). A beneficial drop is a better response update


[n] → Xn − {b} where b ∈ Xn (i.e., a beneficial drop is where a player stops
using a resource b and gains benefits).
To illustrate these concepts, consider the graphical congestion game depicted
in Figure 1 with parameters as described in Section 1.2. In this case, [1] →
{1, 2, 3} is a beneficial pickup that player 1 can perform and [1] → {2} is a
beneficial drop that player 1 can perform.
We can use beneficial pickups and drops to construct more complex updates.
Thinking in this way is useful, because we can define a global “temperature”
function which decreases every time a player conducts a beneficial pickup or
drop.

2.2 The Temperature Function

The temperature function maps strategy profiles to integers. In certain scenarios,


the temperature function acts like a potential function, which decreases with
lazy best response updates6 . This fact allows us to prove our polynomial time
convergence results.
To build the temperature function, we associate each payoff function f with a
left-threshold value TN← [f ] (which, roughly speaking, is the maximum integer x
such that f (x) ≥ 0) and a right-threshold value TN→ [f ] (which, roughly speaking,
is the minimum integer x such that f (x) ≤ 0). The values of these thresholds
also depend on the integer N . We will take N to be the number of players in
our game when we apply these concepts later.
More precisely, suppose f is a non-increasing function and N is an integer.
We define the left-threshold TN← [f ] of f with respect to N as follows:

TN← [f ] = −1 if f (x) < 0 for all x ∈ {0, ..., N − 1}, and
TN← [f ] = max{x ∈ {0, ..., N − 1} : f (x) ≥ 0} otherwise.

We define the right-threshold TN→ [f ] of f with respect to N as follows:


TN→ [f ] = N if f (x) > 0 for all x ∈ {0, ..., N − 1}, and
TN→ [f ] = min{x ∈ {0, ..., N − 1} : f (x) ≤ 0} otherwise.

In an N -player graphical congestion game the input of a payoff function f will


be a congestion level in the range {0, 1, ..., N −1}. The following lemma describes
how TN← [f ] and TN→ [f ] indicate when a resource’s congestion level is so high that
it is no longer worth using.

Lemma 1. Suppose TN← [f ] and TN→ [f ] are the left-threshold and right-threshold
values of the non-increasing function f (with respect to N ), then for any x ∈
{0, ..., N − 1},
6
The temperature function is not always a potential function, because it may not
decrease when certain better response updates are performed in certain cases.

• f (x) > 0 if and only if x ≤ TN→ [f ] − 1, and


• f (x) < 0 if and only if x ≥ TN← [f ] + 1.

Lemma 1 can be proved using basic facts about non-increasing functions. With
this lemma in place we shall define the temperature function.
The temperature function Θ associated with an N -player graphical congestion
game g is defined as
 
Θ(X) = Σ_{n∈N} Σ_{r∈Xn} (crn (X) − TN← [fnr ] − TN→ [fnr ]) .
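
A direct transcription of the thresholds and of Θ might look as follows; this is a sketch that reuses the congestion helper from the earlier payoff sketch, with N the number of players.

def left_threshold(f, N):
    # Largest x in {0,...,N-1} with f(x) >= 0, or -1 if there is none.
    candidates = [x for x in range(N) if f(x) >= 0]
    return max(candidates) if candidates else -1

def right_threshold(f, N):
    # Smallest x in {0,...,N-1} with f(x) <= 0, or N if there is none.
    candidates = [x for x in range(N) if f(x) <= 0]
    return min(candidates) if candidates else N

def temperature(X, edges, f, N):
    # Theta(X): sum over players n and resources r in X[n] of
    # c_n^r(X) minus the two thresholds of the payoff function f_n^r.
    return sum(congestion(n, r, X, edges)
               - left_threshold(lambda x: f(n, r, x), N)
               - right_threshold(lambda x: f(n, r, x), N)
               for n in X for r in X[n])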

In many types of graphical congestion game, the temperature function always de-
creases with lazy best response updates. Now we will show that the temperature
function decreases every time a player performs a beneficial pickup or drop.

Lemma 2. Suppose that we have a graphical congestion game in a strategy pro-


file X, and a player n performs a beneficial pickup, [n] → Xn ∪{a}, which drives
the system into a strategy profile Y . We have Θ(Y ) ≤ Θ(X ) − 1.

Lemma 2 can be proved using Lemma 1 together with the fact that fna (can (X)) >
0 whenever [n] → Xn ∪ {a} is a beneficial pickup.

Lemma 3. Suppose that we have a graphical congestion game in a strategy pro-


file X, and a player n performs a beneficial drop, [n] → Xn − {b}, which drives
the system into a strategy profile Y . We have Θ(Y ) ≤ Θ(X ) − 1.

Lemma 3 can be proved using Lemma 1 together with the fact that fnb (cbn (X)) <
0 whenever [n] → Xn − {b} is a beneficial drop.
The temperature function clearly takes integer values. Another crucial feature
of the temperature function is that it is bounded both above and below.

Lemma 4. If X is a strategy profile of a graphical congestion game with N


players and R resources, then temperature function Θ satisfies the inequalities
R(N − 2N 2 ) ≤ Θ(X) ≤ RN 2 .

2.3 Convergence Dynamics of Powerset Graphical Congestion


Games

Lemma 5 characterizes the relationship between the lazy best response and the
beneficial pickups and drops.

Lemma 5. In a powerset graphical congestion game, every lazy best response


can be decomposed into a sequence of beneficial pickups and/or beneficial drops.

We know from Lemmas 2 and 3 that beneficial pickups and drops decrease the
temperature function. Hence Lemma 5 essentially shows that the temperature
function is a potential function, which decreases by integer steps when a powerset
graphical congestion game evolves via lazy best response updates.

Theorem 2. Consider a powerset graphical congestion game with N players


and R resources. A Nash equilibrium can be reached from any initial strategy
profile within R(3N 2 − N ) asynchronous lazy best response updates.

Sketch of Proof. Since each beneficial pickup or drop decreases the tempera-
ture function Θ by at least one (Lemmas 2 and 3), and each lazy best response
update can be decomposed into beneficial pickups and drops (Lemma 5), we have
that each lazy best response update decreases the temperature function by at
least one. Since the temperature function is bounded above by RN 2 and below by
R(N −2N 2 ) (Lemma 4), then no more than RN 2 −(R(N −2N 2 )) = R(3N 2 −N )
lazy best response updates can be performed starting from any strategy profile.
When no more lazy best response update can be performed, we reach a pure
Nash equilibrium. 
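
Putting the pieces together, the convergence statement can be exercised with asynchronous lazy best response dynamics. The loop below is a sketch reusing the helpers above, with strategies[n] standing for player n's collection ζn.

def run_to_equilibrium(X, edges, f, strategies, max_rounds=10000):
    # Asynchronous lazy best response dynamics; for powerset games Theorem 2
    # bounds the number of updates by R(3N^2 - N).
    X = {n: set(S) for n, S in X.items()}
    for _ in range(max_rounds):
        improved = False
        for n in X:
            S = lazy_best_response(n, X, edges, f, strategies[n])
            Y = dict(X)
            Y[n] = S
            if total_payoff(n, Y, edges, f) > total_payoff(n, X, edges, f):
                X[n] = S              # perform the lazy best response update
                improved = True
        if not improved:
            return X                  # no better response exists: pure NE
    raise RuntimeError("no convergence within max_rounds")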

3 Matroid Graphical Congestion Games

Powerset graphical congestion games have a relatively simple combinatorial


structure, which allows us to prove with relative ease that they always have
pure Nash equilibria. When the resource availability sets ζn ’s have a more com-
plicated structure, this is no longer true. In this section, we shall investigate the
properties of the more general matroid graphical congestion games, where each
player’s collection of available resource sets ζn is a matroid. We start by showing
that a pure strategy Nash equilibrium may not exist in general.

Theorem 3. There exist matroid graphical congestion games which do not pos-
sess a pure Nash equilibrium.

Sketch of Proof. In [17], the authors gave an example of a singleton graphical


congestion game g (with strictly positive payoff functions) that has no pure
Nash equilibria. We can convert g into a matroid graphical congestion game
g′ by giving players the extra option of using no resources (i.e., by adding the
empty set into their collection of available resource sets). Since using a resource
in g′ leads to a positive payoff, rational players in g′ will behave exactly as in
g (i.e., they will always want to use some resource). Since g has no pure Nash
equilibria, g′ has no pure Nash equilibria either. 

Next we shall examine a special type of matroid graphical congestion game,


which is guaranteed to possess a pure Nash equilibrium and nice convergence
properties.

3.1 Convergence Dynamics of Matroid Graphical Congestion


Games with Homogenous Resources

We say a graphical congestion game g = (N , R, (ζn )n∈N , (fnr )n∈N ,r∈R , G) has
homogenous resources when the payoff functions are not resource specific (i.e.,

fn1 (x) = fn2 (x) = ... = fnR (x) = fn (x), ∀n ∈ N , ∀x). Note that different play-
ers can have different payoff functions. When discussing resource homogenous
games, we often suppress the superscript on the payoff functions, writing fnr (x)
as fn (x) to represent the fact that the payoff functions do not depend on the
resources.
We will show that a matroid graphical congestion game with homogenous
resources will reach a pure Nash equilibrium in polynomial time if the players
perform lazy best response updates. We prove this result with the help of the
temperature function. Before we do this, we must introduce a third type of
elementary update operation – the beneficial swap, which is a better response
update [n] → (Xn ∪ {a}) − {b}, where a ∉ Xn and b ∈ Xn (i.e., a beneficial
swap is where a player stops using a resource b and starts using a resource a,
and benefits as a result).
Our next result states that in any graphical congestion game with homogenous
resources (but not necessarily with matroid structure), a beneficial swap will
decrease the temperature function Θ by at least one.

Lemma 6. Suppose we have a graphical congestion game with homogenous


resources in a strategy profile X, and we perform a beneficial swap [n] →
(Xn ∪ {a}) − {b}, which moves the system into a strategy profile Y . We have
Θ(Y ) ≤ Θ(X) − 1.

Lemma 6 follows from the fact that if [n] → (Xn ∪ {a}) − {b} is a beneficial swap
and the resources are homogenous, then can (X) < cbn (X).
Lemmas 2, 3, and 6 together imply that any beneficial pickup, drop, or swap
in a graphical congestion game with homogenous resources will decrease the
temperature function. Next we will show that if the strategy sets ζn ’s of the
game are matroids, then it is always possible to perform a beneficial pickup,
drop, or swap from a non-equilibrium state. In particular we will show that
each lazy best response update in a matroid graphical congestion game with
homogenous resources can be decomposed into a sequence of beneficial pickups,
drops, and/or swaps. The following three lemmas will allow us to achieve this
goal.

Lemma 7. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn | < |S|, then there exists a ∈ S − Xn such that [n] → Xn ∪ {a}
is a beneficial pickup that player n can perform from X.

Lemma 8. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn | > |S|, then there exists b ∈ Xn − S such that [n] → Xn − {b}
is a beneficial drop that player n can perform from X.

Lemma 9. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn | = |S|, then there exist a ∈ S − Xn and b ∈ Xn − S such
that [n] → (Xn ∪ {a}) − {b} is a beneficial swap that player n can perform from
X.

Lemmas 7 and 8 can be shown using the basic matroid properties. Our proof of
Lemma 9 uses a more sophisticated result about matroids shown in [21]. With
Lemmas 7, 8, and 9, we can prove the following main result of this paper.

Theorem 4. Consider a matroid graphical congestion game with homogenous


resources with N players and R resources. A Nash equilibrium can be reached
from any initial strategy profile within R(3N 2 − N ) asynchronous lazy best re-
sponse updates.

Sketch of Proof. Since each beneficial pickup, drop, or swap decreases the
temperature function Θ by at least one (Lemmas 2, 3, and 6) and each lazy best
response update can be decomposed into beneficial pickups, drops, or swaps (as
can be proved inductively using Lemmas 7, 8, and 9), we have that each lazy
best response update decreases the temperature function by at least one. Since
the temperature function is bounded above by RN 2 and below by R(N − 2N 2 )
(Lemma 4), no more than RN 2 −(R(N −2N 2)) = R(3N 2 −N ) lazy best response
updates can be performed starting from any strategy profile. When no more lazy
best response update can be performed, we reach a pure Nash equilibrium. 

By considering Theorem 4 in conjunction with Theorem 1, we can see an interest-


ing separation between the dynamics that always reach a pure Nash equilibrium
and the dynamics which sometimes do not. Theorem 1 implies the existence of
matroid graphical congestion games with homogenous resources that will never
converge to pure Nash equilibria when the players do better response updates.
However, Theorem 4 implies that when the players restrict themselves to lazy
best response updates (which are more accurate and rational), they are guaran-
teed to reach a pure Nash equilibrium in polynomial time.

4 Conclusion

We have derived many results which are useful for understanding when graphical
congestion games converge to pure Nash equilibria. Theorem 1 is quite negative,
because it implies the existence of games with simple features (players that can
use any combination of resources, and resources are homogenous) which cannot
be guaranteed to converge to pure Nash equilibria under generic better response
updating. However, Theorems 2 and 4 imply that in many cases (powerset games,
or matroid games with homogenous resources) the players do converge to pure
Nash equilibria under lazy best response updating. These results are very en-
couraging, because they imply that spatially distributed individuals will quickly
be able to organize themselves into a pure Nash equilibrium in a wide range of
scenarios, as long as the players are rational enough to restrict themselves
to lazy best response updates. We obtained our convergence results by breaking
better response updates into more elementary operations, and observing how

these operations alter the value of the temperature function we defined. In the
future, we will use these results to study the convergence dynamics of more
general games, where players have generic collections of available resource sets.

References
1. Rosenthal, R.: A class of games possessing pure-strategy Nash equilibria. Interna-
tional Journal of Game Theory 2, 65–67 (1973)
2. Tennenholtz, M., Zohar, A.: Learning equilibria in repeated congestion games. In:
Proceedings of AAMAS 2009 (2009)
3. Liu, M., Wu, Y.: Spectrum sharing as congestion games. In: Proceedings of the 46th
Annual Allerton Conference on Communication, Control, and Computing (2008)
4. Law, L., Huang, J., Liu, M., Li, S.: Price of Anarchy for Cognitive MAC Games.
In: Proceedings of IEEE GLOBECOM (2009)
5. Chen, X., Huang, J.: Evolutionarily Stable Spectrum Access in a Many-Users
Regime. In: Proceedings of IEEE GLOBECOM (2011)
6. Southwell, R., Huang, J.: Spectrum Mobility Games. In: IEEE INFOCOM (2012)
7. Vöcking, B., Aachen, R.: Congestion games: Optimization in competition. In: Pro-
ceedings of the Second ACiD Workshop (2006)
8. Tardos, E., Wexler, T.: Network formation games and the potential function
method. In: Algorithmic Game Theory. ch.19, pp. 487–516 (2007)
9. Fretwell, S.D., Lucas, H.L.: On Territorial Behavior and Other Factors Influencing
Habitat Distribution in Birds. Acta Biotheor. 19, 16–36 (1969)
10. Lachapelle, A., Wolfram, M.: On a mean field game approach modeling congestion
and aversion in pedestrian crowds. Transportation Research Part B: Methodologi-
cal. 45, 1572–1589 (2011)
11. Godin, J., Keenleyside, M.: Foraging on Patchily Distributed Prey by a Cichlid
Fish (Teleostei, Cichlidae): A Test of the Ideal Free Distribution Theory. Anim.
Behav. 32, 120–131 (1984)
12. Ackermann, H., Röglin, H., Vöcking, B.: On the Impact of Combinatorial Structure
on Congestion Games. In: Proceedings of FOCS 2006 (2006)
13. Southwell, R., Huang, J.: Convergence Dynamics of Resource-Homogeneous Con-
gestion Games. In: Jain, R., Kannan, R. (eds.) GameNets 2011. LNICST, vol. 75,
pp. 281–293. Springer, Heidelberg (2012)
14. Milchtaich, I.: Congestion Games with Player-Specific Payoff Functions. Games
and Economic Behavior 13, 111–124 (1996)
15. Chen, X., Huang, J.: Spatial Spectrum Access Game: Nash Equilibria and Dis-
tributed Learning. In: ACM International Symposium on Mobile Ad Hoc Net-
working and Computing (MobiHoc), Hilton Head Island, South Carolina, USA
(June 2012)
16. Ackermann, H., Röglin, H., Vöcking, B.: Pure Nash Equilibria in Player-Specific
and Weighted Congestion Games. In: Spirakis, P.G., Mavronicolas, M., Konto-
giannis, S.C. (eds.) WINE 2006. LNCS, vol. 4286, pp. 50–61. Springer, Heidelberg
(2006)
17. Tekin, C., Liu, M., Southwell, R., Huang, J., Ahmad, S.: Atomic Congestion Games
on Graphs and its Applications in Networking. IEEE Transactions on Networking
(to appear, 2012)
18. Etkin, R., Parekh, A., Tse, D.: Spectrum sharing for unlicensed bands. IEEE Jour-
nal on Selected Areas in Communications 25, 517–528 (2007)

19. Bilo, V., Fanelli, A., Flammini, M., Moscardelli, L.: Graphical congestion games.
Algorithmica 61, 274–297 (2008)
20. Fotakis, D., Gkatzelis, V., Kaporis, A.C., Spirakis, P.G.: The Impact of Social
Ignorance on Weighted Congestion Games. In: Leonardi, S. (ed.) WINE 2009.
LNCS, vol. 5929, pp. 316–327. Springer, Heidelberg (2009)
21. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency. Matroids,
Trees, Stable Sets, Volume B, 39–69 (2009)
22. Werneck, R., Setubal, J., Conceicao, A.: Finding minimum congestion spanning
trees. Journal of Experimental Algorithmics 5 (2000)
23. Goemans, M., Li, L., Mirrokni, V., Thottan, M.: Market sharing games applied to
content distribution in ad-hoc networks. In: Proceedings of MobiHoc 2004 (2004)
24. Southwell, R., Chen, Y., Huang, J., Zhang, Q.: Convergence Dynamics of Graphical Congestion Games. Technical Report, http://jianwei.ie.cuhk.edu.hk/publication/GCCConvergenceTechReport.pdf
Establishing Network Reputation
via Mechanism Design

Parinaz Naghizadeh Ardabili and Mingyan Liu

Department of Electrical Engineering and Computer Science


University of Michigan, Ann Arbor, Michigan, 48109-2122
{naghizad,mingyan}@umich.edu

Abstract. In any system of networks, such as the Internet, a network


must take some measure of security into account when deciding whether
to allow incoming traffic, and how to configure various filters when mak-
ing routing decisions. Existing methods tend to rely on the quality of
specific hosts in making such decisions, resulting in mostly reactive se-
curity policies. In this study we investigate the notion of reputation of a
network, and focus on constructing mechanisms that incentivize networks
to participate by providing information about themselves as well as
others. Such information is collected by a centralized reputation agent,
who then computes a reputation index for each network. We use a simple
mechanism to demonstrate not only that a network has an incentive to
provide information about itself (even though that information is in
general not truthful), but also that this information can help decrease
the estimation error.

Keywords: Mechanism Design, Network Reputation, Incentives.

1 Introduction
This paper studies the following mechanism design problem: in a distributed multi-
agent system where each agent possesses beliefs (or perceptions) of each other,
while the truth about an agent is only known to that agent itself and it may have
an interest in withholding the truth, how to construct mechanisms with the proper
incentives for agents to participate in a collective effort to arrive at the correct
perceptions of all participants without violating privacy and self-interest.
Our main motivation lies in the desire to enhance network security through
establishing the right quantitative assessment of the overall security posture of
different networks at a global level; such a quantitative measure can then be used
to construct sophisticated security policies that are proactive in nature, which are
distinctly different from current solutions that typically tackle specific security
problems. Such quantitative measure can also provide guidance to networks’
human operators to more appropriately allocate resources in prioritizing tasks –
after all, the health of a network is very much a function of the due diligence of
its human administrators.

The work is partially supported by the NSF under grant CIF-0910765 and CNS-
121768, and the U.S. Department of Commerce, National Institute of Standards
and Technology (NIST) Technology Innovation Program (TIP) under Cooperative
Agreement Number 70NANB9H9008.


Consider a system of inter-connected networks. Each network has access to statis-


tics gleaned from inbound and outbound traffic to a set of other networks. From
these statistics it can form certain opinions about the quality or “cleanliness” of
these other networks, and actions are routinely taken based on such opinions. For
instance, network administrators may choose to block a high percentage of inbound
traffic from a network observed to send out a large amount of spam. Such peer
network-network observations are often incomplete – a network does not get to
see the entire traffic profile of another network – and can be biased. Thus two net-
works’ view of a common third network may or may not be consistent.
The true quality of a network ultimately can only be known to that network
itself, though sometimes a network may not have or choose to use the resources
needed to obtain this knowledge. It is however not necessarily in the network’s
self-interest to truthfully disclose this information: a network has incentive to
inflate other’s perception about itself. This is because this perceived high qual-
ity often leads to higher visibility and less blocked outbound traffic from this
network. Similarly, a network may or may not wish to disclose truthfully what it
observes about others for a variety of privacy considerations. On the other hand,
it is typically in the interest of all networks to have the correct perception about
other networks. This is because this correct view of others can help the system
administrator determine the correct security configurations.
In this paper we set out to examine the validity and usefulness of a reputation
system, where a central reputation agent solicits input from networks regarding
their perceptions of themselves and others, and computes a reputation index
for each network as a measure/indicator of the health or security posture of a
network. These reputation indices are then broadcast to all networks; a network
can in turn combine such reputation information with its local observations to
take proactive measures to maintain “good” reputation and/or improve its own
reputation over time, and take proactive measures to protect themselves against
networks with “bad” reputations. The ultimate goal of this type of architecture
is to improve global network security, which has been championed by and is
gaining support from network operators’ organizations, see e.g., [10].
The design and analysis of such a system must observe two key features. The
first is that participation in such a system is completely voluntary, and therefore
it is critical for the system to adopt mechanisms that can incentivize networks
to participate. The second is that networks may not report truthfully to the
reputation agent even if they choose to participate in such a collaborative effort,
and therefore it is crucial for any mechanism adopted by the system to either
provide the right incentive to induce truth revelation, or be able to function
despite untruthful input from networks.
It should be noted that a wide variety of systems have been developed to deter-
mine host reputation by monitoring different types of data. Darknet monitors [2],
DNS sensors [1], scanning detection, firewall logs [3], web access logs, and ssh brute
force attack reports are all examples of systems that can report on hosts that have
engaged in potentially suspicious behavior. The most commonly used host repu-
tation systems are related to determining illegitimate email messages or SPAM.

A wide range of different organizations such as SPAMHAUS [12], SpamCop [6],


Shadowserver [14], and Barracuda [11], independently operate their own reputa-
tion lists, which are largely generated by observing unauthorized email activity
directed at monitored spamtraps. In addition, organizations such as Team Cymru
[8], Shadowserver, and Damballa [7] generate similar reputation lists by analyzing
malware or even DNS activity. There is however a significant difference between
assessing individual hosts’ reputation vs. defining reputation as a notion for a net-
work. Host reputation lists by themselves cannot directly be used in developing a
consistent security policy due to the dynamic nature of host addresses.
Besides the security context, there has been a large volume of literature on the
use of reputation in peer-to-peer (P2P) systems and other related social network
settings. Specifically, a large population and the anonymity of individuals in
such social settings make it difficult to sustain cooperative behavior among self-
interested individuals [5]. Reputation has thus been used in such systems as
an incentive mechanism for individuals to cooperate and behave according to
a certain social norm in general [15], and to reciprocate in P2P systems in
particular [4,13,9]. While the focus of social network studies is on the effect that
changing reputation has on individuals, the focus of our study in its present form
is on how to make network reputation an accurate representation of a network’s
security posture. Accordingly, our emphasis is on how to incentivize participation
from networks, while user participation in a P2P system is a given (i.e., by default
reputation only applies to an active user already in a P2P system).
Our main findings are summarized as follows. We propose a reputation mech-
anism which induces a network to participate in the collective assessment of its
own reputation. We first show that for two networks (Section 3), a network’s par-
ticipation can result in a higher mean estimated reputation and at the same time
lower estimation error, thus benefiting both itself and the system. This remains
true even if the observations of the other network is biased. We further show in
Section 4 that these results extend to the case of multiple interacting networks.

2 The Model, Main Assumptions, and Preliminaries

2.1 The Model

Consider a system of K inter-connected networks, denoted by N1 , N2 , · · · , NK .


Network Ni ’s overall health condition is described by a quantity rii , which will
also be referred to as the true or real reputation of Ni , or simply the truth. We
will assume without loss of generality that these true quantities are normalized,
i.e., rii ∈ [0, 1], for all i = 1, 2, · · · , K.
There is a central reputation agent, who solicits and collects a vector
(Xij)j∈K of reports from each network Ni. It consists of cross-reports Xij, i, j =
1, 2, · · · , K, j ≠ i, which represent Ni's assessment of Nj's quality, and self-
reports Xii , i = 1, 2, · · · , K, which are the networks’ self-advertised quality mea-
sure disclosed to the reputation agent. The reputation agent’s goal is to compute
a reputation index denoted by r̂i , which is an estimate of rii for each network

Ni using a certain mechanism with the above inputs collected from the net-
works. This index/estimate will then be used by peer networks to regulate their
interactions with Ni .

2.2 Assumptions
We assume that each network Ni is aware of its own conditions and therefore
knows rii precisely, but this is in general its private information. While it is
technically feasible for any network to obtain rii by closely monitoring its own
hosts and traffic, it is by no means always the case due to reasons such as resource
constraints.
We also assume that a network Ni can sufficiently monitor inbound traffic
from network Nj so as to form an estimate of Nj ’s condition, denoted by Rij ,
based on its observations. However, Ni ’s observation is in general an incomplete
view of Nj , and may contain error depending on the monitoring and estimation
technique used. We will thus assume that Rij is described by a Normal distribu-
tion N (μij , σij
2
), which itself may be unbiased (μij = rjj ) or biased (μij = rjj ).
We will further assume that this distribution is known to network Nj (a relax-
ation of this assumption is also considered later). The reason for this assumption
is that Nj can closely monitor its outbound traffic to Ni , and therefore may suf-
ficiently infer how it is perceived by Ni . On the other hand, Ni itself may or
may not be aware of the distribution N (μij , σij2
).
A reputation mechanism specifies a method used by the reputation agent to
compute the reputation indices, i.e., how the input reports are used to generate
output estimates. We assume the mechanism is common knowledge among all
K participating networks.
A participating network Ni ’s objective is assumed to be characterized by the
following two elements: (1) it wishes to obtain from the system as accurate as
possible a reputation estimate r̂j on networks Nj other than itself, and (2) it
wishes to obtain as high as possible an estimated reputation r̂i on itself. It must
therefore report to the reputation agent a carefully chosen (Xij )j∈K , using its
private information rii , its knowledge of the distributions (Rji )j∈K\i , and its
knowledge of the mechanism, to increase (or inflate) as much as possible r̂i
while keeping r̂j close to rjj . The reason for adopting the above assumption is
because, as pointed out earlier, accurate assessment of other networks’ security
posture can help a network configure its policies appropriately, and thus correct
perception of other networks is critical. On the other hand, a network has an
interest in inflating its own reputation so as to achieve better visibility and less
traffic blocked by other networks, etc. Note that these two elements do not fully
define a network’s preference model (or utility function). We are simply assuming
that a network’s preference is increasing in the accuracy of others’ reputation
estimate and increasing in its own reputation estimate, and that this is public
knowledge¹.
¹ How the preference increases with these estimates and how these two elements are weighed remain the network's private information and do not factor into the present analysis.

Note also that the objective assumed above may not capture the nature of a
malicious network, who may or may not care about the estimated perceptions
about itself and others. Recall that our basic intent through this work is to
provide reputation estimate as a quantitative measure so that networks may
adopt and develop better security policies and be incentivized to improve their
security posture through a variety of tools they already have. Malicious networks
are not expected to react in this manner. On the other hand, it must be admitted
that their participation in this reputation system, which cannot be ruled out as
malicious intent may not be a priori knowledge, can very well lead to skewed
estimates, thereby rendering the system less than useful. The hope is that a
critical mass of non-malicious networks will outweigh this effect, but this needs
to be more precisely established and is an important subject of future study.

2.3 Candidate Mechanisms and Rationale


One simple mechanism that can be used by the reputation agent is to take the
estimate r̂i to be the average of the cross-reports Xji and the self-report Xii . It
can be easily seen that in this case, Ni will always choose to report Xii = 1, and
thus the self-reports will bear no information. The mechanism can be modified
to take the average of only the cross-reports (Xji )j∈K\i as the estimate. If cross-
reports are unbiased, then r̂i can be made arbitrarily close to rii as the number of
networks increases. We will later take the mean absolute error of this mechanism,
which we will refer to as the averaging mechanism, as a benchmark in evaluating
the performance of other mechanisms.
An alternative to completely ignoring Ni ’s self-report is to induce or incen-
tivize Ni to provide useful information in its self-report even if it is not the precise
truth rii . With this in mind, a good mechanism might on one hand convince Ni
that it can help contribute to a desired, high estimate r̂i by supplying input Xii ,
while on the other hand try to use the cross-reports, which are estimates of the
truth rii , to assess Ni ’s self-report and threaten with punishment if it is judged
to be overly misleading.
Also, note that it is reasonable to design a mechanism in which Ni ’s cross-
reports are not used in calculating its own reputation estimate. By doing so,
we ensure that the cross-reports are reported truthfully2 . To see why this is the
case, note that by its cross-reports Ni can now only hope to increase its utility
by altering r̂j . Now Ni ’s best estimate of rjj is Rij , which it knows will be used
as a basis for the estimate r̂j . On the other hand, due to its lack of knowledge of
rjj , Ni can’t use a specific utility function to see how it can strategically choose
Xij so as to increase its utility. By this argument, for the rest of the paper we will
assume that the cross-reports are reported truthfully, and that this is common
knowledge.
It is worthwhile to emphasize that the above reasoning on truthful cross-
reports derives from accounting for the direct effect of the cross-reports on the
² This is conceptually similar to not using a user's own bid in calculating the price charged to him in the context of an auction, a technique commonly used to induce truthful implementation.

final estimates. One might argue that a network could potentially improve its
relative position by providing false cross-reports of other networks so as to lower
their reputation indices, i.e., it can make itself look better by comparison. A
close inspection of the situation reveals, however, that there is no clear incentive
for a network to exploit such indirect effect of their cross-reports either.
One reason is that the proposed reputation system is not a ranking system,
where making other entities look worse would indeed improve the standing of
oneself. The reputation index is a value normalized between [0, 1], a more or
less absolute scale. It is more advisable that a network tighten its security mea-
sures against all networks with low indices rather than favor the highest-indexed
among them.
But more importantly and perhaps more subtly, badmouthing another net-
work is not necessarily in the best interest of a network. Suppose that after
sending a low cross-report Xij , Ni subsequently receives a low r̂j from the rep-
utation agent. Due to its lack of knowledge of other networks’ cross-reports,
Ni cannot reasonably tell whether this low estimate r̂j is a consequence of its
own low cross-report, or if it is because Nj was observed to be poor(er) by
other networks and thus r̂j is in fact reflecting Nj ’s true reputation (unless a set
of networks collude and jointly target a particular network). This ambiguity is
against Ni ’s interest in obtaining accurate estimates of other networks; therefore
bashing is not a profitable deviation from truthful reporting.

3 A Two-Network Scenario
3.1 The Proposed Mechanism
We start by considering only two networks and extend the result to multiple
networks in the next section. We will examine the following way of computing
the reputation index r̂1 for N1, where ε is a fixed and known constant. The
expression for r̂2 is similar, thus for the remainder of this section we will only
focus on N1.

$$\hat r_1(X_{11}, X_{21}) = \begin{cases} \dfrac{X_{11}+X_{21}}{2}, & \text{if } X_{11} \in [X_{21}-\epsilon,\, X_{21}+\epsilon] \\[3pt] X_{21} - |X_{11}-X_{21}|, & \text{if } X_{11} \notin [X_{21}-\epsilon,\, X_{21}+\epsilon] \end{cases} \qquad (1)$$

In essence, the reputation agent takes the average of self-report X11 and cross-
report X21 if the two are sufficiently close, or else punishes N1 for reporting
significantly differently. Note that this is only one of many possibilities that
reflect the idea of weighing between averaging and punishing; for instance, we
can also choose to punish only when the self-report is higher than the cross-
report, and so on.
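As a concrete illustration, the following Python sketch (ours, not part of the original specification; the function name and sample values are illustrative) implements the averaging-and-punishment rule in (1).

def reputation_index(x_self, x_cross, eps):
    # Eq. (1): average the self-report and the cross-report when they agree
    # to within eps; otherwise punish the self-reporting network by the
    # size of the disagreement.
    if abs(x_self - x_cross) <= eps:
        return 0.5 * (x_self + x_cross)
    return x_cross - abs(x_self - x_cross)

# Averaging phase vs. punishment phase, with cross-report 0.7 and eps = 0.1:
print(reputation_index(0.75, 0.7, 0.1))  # -> 0.725
print(reputation_index(0.95, 0.7, 0.1))  # -> 0.45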

3.2 Choice of Self-report


As stated earlier, we assume N1 believes N2's cross-report is a sample of a
random variable with distribution X21 ∼ N(μ, σ²). As a result, the choice of
the self-report X11 is determined by the solution of the optimization problem
max_{X11} E[r̂1]. Using (1), E[r̂1] eventually simplifies to (with F(·) and f(·) denoting the cdf and pdf, respectively):

$$E[\hat r_1] = X_{11} + \frac{\epsilon}{2}\big(F(X_{11}+\epsilon) - 3F(X_{11}-\epsilon)\big) - \frac{1}{2}\int_{X_{11}-\epsilon}^{X_{11}+\epsilon} F(x)\,dx - 2\int_{-\infty}^{X_{11}-\epsilon} F(x)\,dx. \qquad (2)$$

Taking the derivative with respect to X11 we obtain:

$$\frac{dE}{dX_{11}} = 1 + \frac{\epsilon}{2}\big[f(X_{11}+\epsilon) - 3f(X_{11}-\epsilon)\big] - \frac{1}{2}\big[F(X_{11}+\epsilon) + 3F(X_{11}-\epsilon)\big]. \qquad (3)$$

We next re-write ε = aσ; this expression of ε reflects how the reputation agent
can limit the variation in the self-report using its knowledge of this variation σ³.
Replacing X21 ∼ N(μ, σ²) and ε = aσ in (3), and making the change of variable
y := (X11 − μ)/ε results in:

$$\frac{a}{\sqrt{2\pi}}\left(e^{-\left(\frac{a(y+1)}{\sqrt 2}\right)^2} - 3e^{-\left(\frac{a(y-1)}{\sqrt 2}\right)^2}\right) - \frac{1}{2}\left(\operatorname{erf}\Big(\frac{a(y+1)}{\sqrt 2}\Big) + 3\operatorname{erf}\Big(\frac{a(y-1)}{\sqrt 2}\Big)\right) = 0. \qquad (4)$$
Therefore, if y solves (4) for a given a, the optimal value for X11 would be
X11* = μ + aσy. Equation (4) can be solved numerically for each a, resulting in Figure 1.
It is interesting to see that in Figure 1 we always have y < 1, and as a
consequence X11* < μ + ε. This means that N1 is choosing a self-report within
its prediction of the acceptable range. Also note that this self-report is always
positively biased, reflecting N1's interest in increasing r̂1.
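The optimal self-report can also be checked numerically without solving (4). The sketch below (ours; the values of μ, σ, and a are illustrative choices) approximates E[r̂1] by Monte Carlo over X21 ∼ N(μ, σ²) for a grid of candidate self-reports and confirms that the maximizer sits inside the acceptance window, i.e., y < 1.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a = 0.7, 0.3, 1.7               # illustrative values
eps = a * sigma
x21 = rng.normal(mu, sigma, 200_000)       # draws of the cross-report

def expected_estimate(x_self):
    close = np.abs(x_self - x21) <= eps
    est = np.where(close, 0.5 * (x_self + x21), x21 - np.abs(x_self - x21))
    return est.mean()

grid = np.linspace(mu - eps, mu + eps, 201)
best = max(grid, key=expected_estimate)
print(best, (best - mu) / eps)             # optimal X11 and the implied y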

3.3 Value of Cross-Report and Self-report


We next examine how close the resulting reputation estimate r̂1 is to the real
quality r11 by calculating the mean absolute error (MAE) and comparing it
to that of the averaging mechanism; from this we further illustrate the roles
and values of cross-report and self-report. We do this separately for two cases,
where the cross-report comes from an unbiased distribution and a biased distri-
bution, respectively. Note that in both cases the averaging mechanism for the
two-network scenario reduces to taking the cross-report as the estimate, i.e. the
averaging mechanism has an estimate of E[X21 ] for N1 .

Unbiased Cross-Report. We now compare the performance of (1) to the


averaging mechanism.
³ Note that we are assuming σ is known by the reputation agent as well as the networks. σ can be thought of as a measure of the variation of N2's estimate, which depends on the nature of its observation and the algorithm it uses for the estimate. While this is not entirely an unreasonable assumption, it ultimately needs to be verified through analysis of real data.

Define em := E[|r̂1 − r11|] as the MAE of the mechanism described in (1)
with ε = aσ. As already derived, N1's self-report is set to X11* = μ + aσy,
where y solves (4) for a given a; N2's cross-report X21 is set to R21 (truthful
reporting); and R21 is assumed to be unbiased. With these assumptions, we find
the following expression for em⁴:

$$e_m = \frac{1}{2}\int_{\mu-ay\sigma}^{\mu+a(y+1)\sigma} x f(x)\,dx - \frac{1}{2}\int_{\mu+a(y-1)\sigma}^{\mu-ay\sigma} x f(x)\,dx - 2\int_{-\infty}^{\mu+a(y-1)\sigma} x f(x)\,dx + ay\sigma + (\mu - ay\sigma)F(\mu-ay\sigma) + (\mu + ay\sigma)\Big(\tfrac{3}{2}F(\mu+a(y-1)\sigma) - \tfrac{1}{2}F(\mu+a(y+1)\sigma)\Big). \qquad (5)$$
As seen in (5), em is a function of a. Thus we can optimize the choice of a by
solving the problem min_a em. Taking the derivative of (5) we get:

$$\frac{de_m}{da} = \frac{\sigma}{2}\Bigg[\frac{a}{\sqrt{2\pi}}\Big(e^{-\frac{(a(y+1))^2}{2}} - 3e^{-\frac{(a(y-1))^2}{2}}\Big) + (ay' + y)\Big(2 + \operatorname{erf}\big(\tfrac{ay}{\sqrt 2}\big)\Big) + \frac{a}{\sqrt{2\pi}}\Big(e^{-\frac{(a(y+1))^2}{2}} + 3e^{-\frac{(a(y-1))^2}{2}}\Big) - \frac{1}{2}\Big(\operatorname{erf}\big(\tfrac{a(y+1)}{\sqrt 2}\big) + 3\operatorname{erf}\big(\tfrac{a(y-1)}{\sqrt 2}\big)\Big)\Bigg]. \qquad (6)$$

[Fig. 1. Solution of (4): y vs. a]
[Fig. 2. Errors vs. a, r11 = 0.75, σ² = 0.1 (proposed vs. averaging mechanism)]
[Fig. 3. Est. reputation vs. a, r11 = 0.75, σ² = 0.1 (proposed vs. averaging mechanism)]

As seen in (6), the optimal choice of a does not depend on the specific values of
μ and σ. Therefore, the same mechanism can be used for any set of networks.
Equation (6) can be solved numerically, and is zero at two values: at a = 0,
which indicates a local maximum, and at a ≈ 1.7, where it has a minimum. This
can be seen from Figure 2, which shows the MAE of the proposed mechanism
compared to that of the averaging mechanism. Under the averaging mechanism
the MAE is E[|R21 − r11|] = √(2/π) σ. We see that for a large range of a values the
mechanism given in (1) results in smaller estimation error. This suggests that
N1 ’s self-report can significantly benefit the system as well as all networks other
than N1 .
⁴ The calculations here are possible if y ≤ 1/2, which based on Figure 1 is a valid assumption for moderate values of a.

We next examine whether there is incentive for N1 to provide this self-report,


i.e., does it benefit N1 itself? Figure 3 compares N1's estimated reputation r̂1
under the proposed mechanism to that under the averaging mechanism, in which
case it is simply N2 ’s cross-report X21 , and E[X21 ] = μ when unbiased.
Taking Figs. 2 and 3 together, we see that there is a region, a ∈ [2, 2.5], in which
the presence of the self-report helps N1 obtain a higher estimated reputation,
while helping the system reduce its estimation error on N1 . This is a region that
is mutually beneficial to both N1 and the system, and N1 clearly has an incentive
to participate and provide the self-report.
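Both effects are easy to reproduce by simulation. The following sketch (ours; r11 = 0.75 and σ² = 0.1 mirror Figure 2) estimates, for several values of a, the MAE and the mean estimated reputation when N1 best-responds with its self-report, and prints the averaging benchmark √(2/π)σ for comparison.

import numpy as np

rng = np.random.default_rng(1)
r11, sigma = 0.75, np.sqrt(0.1)
x21 = rng.normal(r11, sigma, 200_000)      # unbiased cross-reports

def mech(x_self, x_cross, eps):
    close = np.abs(x_self - x_cross) <= eps
    return np.where(close, 0.5 * (x_self + x_cross),
                    x_cross - np.abs(x_self - x_cross))

for a in [1.0, 1.7, 2.0, 2.5, 3.0]:
    eps = a * sigma
    grid = np.linspace(r11 - eps, r11 + eps, 101)
    x11 = max(grid, key=lambda s: mech(s, x21, eps).mean())  # N1's best self-report
    est = mech(x11, x21, eps)
    print(a, np.abs(est - r11).mean(), est.mean())

print("averaging benchmark MAE:", np.sqrt(2 / np.pi) * sigma)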

Biased Cross-Report. We now turn to the case where the cross-report X21
comes from the biased distribution N (r11 + b, σ 2 ), where b is the bias term, a
fact unknown to both N2 and the reputation mechanism. We will thus assume
that the mechanism used remains that given by (1) with the optimal value of a
obtained previously.
First consider the case that N1 is also not aware of the bias, and again chooses

X11* = r11 + ayσ. The calculation of the error is the same, leading to (5). However,
here F and f are those of the Normal distribution N (r11 + b, σ 2 ). Therefore, the
new minimum error and the value of a where it occurs are different. Figure 4
shows the MAE for three different values of the bias. As seen from the figure,
the error increases for b = −0.1σ, and decreases for b = 0.1σ compared to the
unbiased case. This is because for the negative bias, N1 is not adapting its self-
advertised reputation accordingly. This makes the mechanism operate mainly
in the punishment phase, which introduces larger errors. For the small positive
bias, however, the mechanism works mainly in the averaging phase, and the error
is less than both the biased and unbiased cases. The latter follows from the fact
that punishment phases happen more often in the unbiased case. Note however
that for larger values of positive bias, the error will eventually exceed that of the
unbiased case.

[Fig. 4. MAE, biased cross-reports, bias not known]
[Fig. 5. MAE, biased cross-reports, bias known]
[Fig. 6. Est. reputation, biased cross-reports, bias known]

Next we consider the case where X21 ∼ N(r11 + b, σ²) as before but this bias is
known to N1. N1 will accordingly adapt its self-report to be X11* = r11 + b + ayσ.
Figure 5 shows a comparison in this case. The results show that the selected
positive bias increases the error, while the negative bias can decrease the error
compared to the unbiased case.

The assumption of a known bias has the following two intuitively appealing
interpretations. The first is where N1 has deliberately sent its traffic through N2
in such a way so as to bias the cross-report. As expected, it’s in the interest of N1
to introduce a positive bias in N2 ’s evaluation of itself. If this is what N1 chooses
to do then arguably the mechanism has already achieved its goal of improving
networks’ security posture – after all, N2 now sees a healthier and cleaner version
of N1 which is welcomed! The second case is where given the mechanism, N2
knows that N1 will introduce a positive bias in its self-report, and consequently
counter-acts by sending a negatively biased version of its observation. To find
the optimal choice for this deliberately introduced bias we proceed as follows.
Define μ := r11 + b. To see how the mean absolute error behaves, we find an
expression for em at any given a⁵:

$$e_m = (\mu - r_{11} + ay\sigma) + \frac{1}{2}\int_{2r_{11}-\mu-ay\sigma}^{\mu+a(y+1)\sigma} x f(x)\,dx - \frac{1}{2}\int_{\mu+a(y-1)\sigma}^{2r_{11}-\mu-ay\sigma} x f(x)\,dx - 2\int_{-\infty}^{\mu+a(y-1)\sigma} x f(x)\,dx + (2r_{11}-\mu-ay\sigma)\,F(2r_{11}-\mu-ay\sigma) + (\mu+ay\sigma)\Big(\tfrac{3}{2}F(\mu+a(y-1)\sigma) - \tfrac{1}{2}F(\mu+a(y+1)\sigma)\Big). \qquad (7)$$
where F and f are the cdf and pdf of the biased distribution. To find the value
of b at which the error is minimized, we take the derivative of (7), resulting in:
$$\frac{de_m}{db} = 1 - 2F(2r_{11} - \mu - ay\sigma) = 0. \qquad (8)$$
Solving (8) will show that for a given a, the MAE is minimized at b* = −ayσ/2. As
a result, the final reports sent by the two networks will be X11* = r11 + ayσ/2 and
X21* = R21 − ayσ/2, which in turn increases the chance of having the mechanism
operate in the averaging phase, thus decreasing the error.
As in the unbiased case, we also compare the estimated reputation r̂1 in this
case to highlight that there is incentive for N1 to provide self-report, shown
in Figure 6. A comparison between Figs. 5 and 6 reflects the tradeoff between
achieving a lower estimation error and helping N1 achieve a higher estimated
reputation. In the case of positive bias, even though N1 benefits from providing
a self-report for smaller values of a compared to the unbiased case, the system
can use a more limited range of a to decrease MAE compared to the averaging
mechanism. Similarly, larger values of a are required for incentivizing N1 's partic-
ipation when the cross-report is negatively biased, while the MAE improvement
is achieved for a larger range of a.
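The counteracting bias b* = −ayσ/2 can be verified numerically. In the sketch below (ours; the value of y(a) is taken from the numerical solution of (4) and is only indicative), N2 shifts its observation by a bias b, N1, knowing b, self-reports r11 + b + ayσ, and a scan over the range of footnote 5 should place the MAE minimizer near b*.

import numpy as np

rng = np.random.default_rng(2)
r11, sigma, a, y = 0.75, 0.3, 1.7, 0.25    # y roughly y(a), illustrative
eps = a * sigma
z = rng.normal(0.0, sigma, 200_000)

def mae(b):
    x21 = r11 + b + z                      # deliberately biased cross-report
    x11 = r11 + b + a * y * sigma          # N1 adapts to the known bias
    close = np.abs(x11 - x21) <= eps
    est = np.where(close, 0.5 * (x11 + x21), x21 - np.abs(x11 - x21))
    return np.abs(est - r11).mean()

bs = np.linspace(-a * y * sigma, -a * y * sigma + a * sigma / 2, 101)
print(min(bs, key=mae), -a * y * sigma / 2)   # empirical minimizer vs. b*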

4 Extension to a Multi-network Scenario


We now consider the case with more than two participating networks. The pro-
posed mechanism can be extended as follows. The reputation agent now receives
⁵ The following calculations are for moderate values of bias b ∈ [−ayσ, −ayσ + aσ/2].

more cross-reports on the basis of which it will judge Ni. In the simplest case, the
agent can take the average of all the cross-reports to get X0i := (1/(K−1)) Σ_{j∈K\i} Xji,
and derive r̂i using:

$$\hat r_i(X_{ii}, X_{0i}) = \begin{cases} \dfrac{X_{0i}+X_{ii}}{2}, & \text{if } X_{ii} \in [X_{0i}-\epsilon,\, X_{0i}+\epsilon] \\[3pt] X_{0i} - |X_{ii}-X_{0i}|, & \text{if } X_{ii} \notin [X_{0i}-\epsilon,\, X_{0i}+\epsilon]. \end{cases} \qquad (9)$$

Another alternative is using a weighted version of the cross-reports in this mech-


anism. We defer this discussion to later in the section. For the mechanism defined
in (9), we again have two cases, one where the cross-reports are unbiased, and
one where they are biased. In the second case, we further distinguish between
the cases where the bias itself is of a non-skewed distribution and where the bias
distribution is skewed.

4.1 Unbiased Cross-Reports


We will assume Xji ∼ N(μji, σji²), and that these distributions are independent.
Thus X0i also has a Normal distribution, given by

$$X_{0i} \sim N\left(\frac{\sum_{j\in K\setminus i} \mu_{ji}}{K-1},\; \frac{\sum_{j\in K\setminus i} \sigma_{ji}^2}{(K-1)^2}\right).$$

The optimization problem for Ni is the same as before, resulting in Xii* = μ′ + ayσ′,
with μ′ and σ′² being the mean and variance of X0i. Note that in this case the
reputation agent is using ε = aσ′.
If all cross-reports are unbiased, i.e., μji = rii, and σji = σ, we have X0i ∼
N(rii, σ²/(K−1)). To find the optimal choice of a we will need to solve (6) again, with
the only difference that σ is replaced by σ′. Therefore, the optimal choice of a,
which is independent of the mean or variance of the reports, will be the same
as before. This result can be verified in Figures 7 and 8, which show the MAE
of collections of 3 and 10 networks respectively. Furthermore, as expected, the
error decreases as the number of networks increases in this case.
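The gain from additional unbiased reporters is easy to see in simulation. The sketch below (ours, with illustrative parameters) draws X0i with the tighter variance σ²/(K−1), lets Ni grid-search its best self-report, and prints the resulting MAE for K = 3 and K = 10.

import numpy as np

rng = np.random.default_rng(3)
r11, sigma, a = 0.75, 0.3, 1.7

for K in [3, 10]:
    sig0 = sigma / np.sqrt(K - 1)          # std. dev. of the averaged cross-reports
    eps = a * sig0
    x0 = rng.normal(r11, sig0, 200_000)    # X0i for unbiased cross-reports
    grid = np.linspace(r11 - eps, r11 + eps, 101)
    def est(s, x0=x0, eps=eps):
        close = np.abs(s - x0) <= eps
        return np.where(close, 0.5 * (s + x0), x0 - np.abs(s - x0))
    x_self = max(grid, key=lambda s: est(s).mean())
    print(K, np.abs(est(x_self) - r11).mean())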

4.2 Biased Cross-Reports


Now assume that the cross-reports are biased and that the bias term itself
comes from a Normal distribution. We re-write Xji = Rji + Bji, where
Rji ∼ N(rii, σji²), and Bji ∼ N(bji, σ_{b,ji}²). Therefore, assuming independence, in
general we have:

$$X_{0i} \sim N\left(r_{ii} + \frac{\sum_{j\in K\setminus i} b_{ji}}{K-1},\; \frac{\sum_{j\in K\setminus i} (\sigma_{ji}^2 + \sigma_{b,ji}^2)}{(K-1)^2}\right). \qquad (10)$$

Non-skewed Bias Distribution. If the bias distribution has zero mean (bji =
0) and all variance terms are the same: σji = σ and σb,ji = σb, then (10)
simplifies to X0i ∼ N(rii, σ′²), where σ′² = (σ² + σb²)/(K−1). The calculation of the
optimal self-report is given by the same optimization problem as before, resulting
in Xii* = rii + ayσ′. Figures 9 and 10 show the simulation results for K = 3
and K = 10 respectively. As expected, biased cross-reports result in larger error
[Fig. 7. MAE, 3 networks, unbiased cross-reports]
[Fig. 8. MAE, 10 networks, unbiased cross-reports]
[Fig. 9. MAE, 3 networks, non-skewed bias distribution]
[Fig. 10. MAE, 10 networks, non-skewed bias distribution]
[Fig. 11. MAE, 10 networks, skewed bias distribution]
[Fig. 12. Est. reputation, 10 networks, skewed bias distribution]

compared to unbiased cross-reports: the fact that σ′ is larger here than in the unbiased case
allows N1 to introduce a larger inflation in its self-report, thus increasing the
MAE in general.

Skewed Bias Distribution. If we assume that all bias terms are from the
same distribution but this distribution is skewed itself, i.e., B0i ∼ N(b0i, σb²),
then negatively biased cross-reports can result in lower MAE compared to a
non-skewed bias distribution, while positively biased cross-reports can increase
the error. Figure 11 verifies this property of the mechanism in a collection of 10
networks, and for a negative value of b0i .
In all of the above cases, we need the range of a to be such that using the
proposed mechanism is mutually beneficial for the system and the individual
networks. Our numerical results show that, when cross-reports are unbiased,
the values of a for which it is individually rational for a network to participate
does not change as the number of networks increases. Also, this range remains
unchanged if the cross-reports have a non-skewed bias distribution. In the case
of skewed bias distribution a similar behavior as the two-network scenario is
observed, where individual networks have more incentive to participate in the
estimation of their own reputation when there is a positive bias in the cross-
reports, and are less inclined to do so in the presence of a negative bias.
Figure 12 illustrates these results. As seen in the figure, for unbiased cross-
reports, the range for which networks are incentivized to participate is again
roughly a ∈ [2, 2.5] despite the increase in the number of networks. The figure
also shows the effect of a choice of b = −0.1σ for cross-reports with skewed
bias. A careful study of this figure along with Figure 11 indicates that the same

tradeoff described in Section 3 holds between minimizing error and providing


incentive for participation.

4.3 Weighted Mean of Cross-Reports


So far, we have assumed the reputation agent takes a simple average of the cross-
reports to judge the truthfulness of the self-report. Assume that as suggested
earlier, the agent forms the weighted mean:

$$X_{0i} := \frac{\sum_{j\in K\setminus i} w_j X_{ji}}{\sum_{j\in K\setminus i} w_j} \qquad (11)$$

where w := (wj )j∈K\i is a vector of weights, also specified by the reputation


agent. One reasonable choice for w could be a vector of previously computed
reputations r̂j , with the goal of allowing the more reputable networks to have a
higher influence on the estimate. We proceed by analyzing the performance of
this alternative mechanism.

Unbiased Cross-Reports. Assume Xji ∼ N(rii, σji²). By adopting this assumption,
we focus on a scenario where all networks have an unbiased view
of Ni, but potentially different accuracy as reflected by different values of σji,
with smaller variances corresponding to more precise estimates. Consequently,
the weighted mean in (11) has the distribution

$$X_{0i} \sim N\left(r_{ii},\; \frac{\sum_{j\in K\setminus i} w_j^2 \sigma_{ji}^2}{\big(\sum_{j\in K\setminus i} w_j\big)^2}\right).$$

Thus, except for the change in the equivalent variance, the overall problem remains
the same as the one discussed earlier⁶. Since an increased variance increases the
MAE, in order to have a better estimate using the weighted average compared
to the simple average, we would need

$$\frac{\sum_{j\in K\setminus i} w_j^2 \sigma_{ji}^2}{\big(\sum_{j\in K\setminus i} w_j\big)^2} \le \frac{\sum_{j\in K\setminus i} \sigma_{ji}^2}{(K-1)^2}.$$
j∈K\i wj )


In the special case σji = σ, ∀j, the Cauchy-Schwarz inequality implies
wj 2
j∈K\i ≥ K−1
2
1
, with equality at wj = w0 , ∀j. This is true independent
( j∈K\i wj )
of the choice of w, and therefore the weighted average will always have higher
estimation error. Figure 13 shows this result for a random choice of the vector
w.
Next consider the case where the σji's are different. Without loss of generality,
assume that the coefficients are normalized such that they sum to 1. In order to
achieve lower estimation error, we want to choose w such that

$$\sum_{j\in K\setminus i} w_j^2 \sigma_{ji}^2 \le \frac{1}{(K-1)^2}\sum_{j\in K\setminus i} \sigma_{ji}^2.$$

This rearrangement shows clearly that for the inequality to
hold, it suffices to put more weight on the smaller σji, i.e., more weight on
those with more accurate observations. It follows that if more reputable networks
(higher r̂j) also have more accurate observations (smaller σji), then selecting
weights according to existing reputation reduces the estimation error. Figure 14
shows the results for 3 networks when σ31 < σ21, and the weights are chosen
accordingly to be w = (0.45, 0.55).
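The variance comparison behind both observations is a one-line computation. The sketch below (ours; the weights and variances are illustrative, mirroring the w = (0.45, 0.55) example) evaluates the equivalent variance of (11), Σ wj²σji² / (Σ wj)², and that of the simple average; with σ21 > σ31, the weighted version is indeed smaller.

import numpy as np

def equivalent_var(w, sig):
    # Variance of the weighted mean of independent unbiased cross-reports:
    # sum(w_j^2 sig_j^2) / (sum w_j)^2.
    w, sig = np.asarray(w, float), np.asarray(sig, float)
    return np.sum(w**2 * sig**2) / np.sum(w)**2

sig = [0.4, 0.3]                           # sigma_21 > sigma_31 (illustrative)
print(equivalent_var([0.45, 0.55], sig))   # weighted, more weight on N3 -> 0.0596
print(equivalent_var([1.0, 1.0], sig))     # simple average              -> 0.0625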
⁶ In fact, using a simple average of cross-reports is a special case of this problem, by using equal wj and σji.

[Fig. 13. MAE, 3 networks, weighted averages, equal variances]
[Fig. 14. MAE, 3 networks, weighted averages, different variances]
[Fig. 15. MAE, 3 networks, weighted averages, skewed bias]
[Fig. 16. Est. reputation, 3 networks, weighted averages, skewed bias]

Biased Cross-Reports. Assume now Xji ∼ N(rii + bji, σji² + σ_{b,ji}²). Then (11)
results in

$$X_{0i} \sim N\left(r_{ii} + \frac{\sum_{j\in K\setminus i} w_j b_{ji}}{\sum_{j\in K\setminus i} w_j},\; \frac{\sum_{j\in K\setminus i} w_j^2 (\sigma_{ji}^2 + \sigma_{b,ji}^2)}{\big(\sum_{j\in K\setminus i} w_j\big)^2}\right).$$

The case of equally
distributed bias terms is very similar to before, and it will only add a bias term
to the mean of the equivalent X0i. Therefore, we only focus on the case where
the bji's are different.
In this case we have two ways of improving the result over the simple averag-
ing. Following our previous discussion, putting more weight on the cross-reports
that have smaller variances will decrease the final variance and thus the esti-
mation error. On the other hand, if we put more weight on smaller bias terms,
the overall bias will decrease. As already discussed in the beginning of this sec-
tion, positively biased cross-reports increase the estimation error. Thus, having
a smaller bias term will improve the MAE. Figure 15 shows the results for 3
networks, where N3 has a better estimate than N2 , by which we mean both
0 < b31 < b21 and σ31 < σ21 . The weights are chosen such that w3 > w2 .
Finally, we check networks’ incentives under the weighted version of the mech-
anism. Based on our previous observation, we expect a similar tradeoff here
as well: the lower MAE comes at the cost of the reduction in the range of a
that makes the mechanism individually rational. This effect is illustrated in
Figure 16.

5 Discussion and Conclusion


We demonstrated the feasibility of designing network reputation mechanisms
that can incentivize networks to participate in the collective effort of determining
Establishing Network Reputation 61

their health conditions by providing information about themselves and others.


We showed that our mechanism can allow both the participants and the system
to benefit. Furthermore, the mechanism remains robust even if we relax the
assumption of unbiased initial estimation. As a byproduct of this analysis, we
observed how once the mechanism is fixed, networks can improve the assessment
even further by strategically choosing their cross-reports. We also verified that
the same results hold as the number of participating networks increases.
This is only the first step toward building a comprehensive global reputation
system; there remain many interesting and challenging problems to pursue.
To begin, the mechanisms proposed here (simple and weighted averages) are
just two of many possible choices. In particular, it would be desirable to relax
the assumption of having known variances, σij², throughout the system, and see
if it is possible to design alternative mechanisms that can achieve the same or
better performance. Secondly, in practice it is possible for the reputation agent
to obtain direct observations of its own as additional input to the estimation.
This may allow us to relax the assumption that the cross-reports are truthful
(though as we have argued this is a reasonable assumption in and by itself).
Thirdly, it would be very interesting to analyze the effect of the presence of a
small percentage of malicious networks as discussed in the paper.
At an architectural level, it would be of great interest to design a distributed
mechanism without the need for a central reputation agent. One possibility is
to follow a gossip-like procedure, where neighboring networks update their re-
spective estimates using values provided by other networks through a similar
averaging-punishment process to ensure that peer networks provide useful if not
entirely true information. It would be interesting to see what type of computa-
tion will lead to system-wide convergence to accurate estimates of the networks’
health conditions.

References

1. Antonakakis, M., Perdisci, R., Dagon, D., Lee, W., Feamster, N.: Building a Dy-
namic Reputation System for DNS. In: 19th USENIX Security Symposium (August
2010)
2. Bailey, M., Cooke, E., Myrick, A., Sinha, S.: Practical Darknet Measurement. In:
40th Annual Conference on Information Sciences and Systems (March 2006)
3. DShield. How To Submit Your Firewall Logs To DShield (September 2011),
http://isc.sans.edu/howto.html
4. Feldman, M., Lai, K., Stoica, I., Chuang, J.: Robust incentive techniques for
peer-to-peer networks. In: ACM Conference on Electronic Commerce, pp. 102–111
(2004)
5. Hanaki, N., Peterhansl, A., Dodds, P., Watts, D.: Cooperation in evolving social
networks. Management Science 53(7), 1036–1050 (2007)
6. Cisco Systems Inc. SpamCop Blocking List - SCBL (May 2011),
http://www.spamcop.net/
7. Damballa Inc. Damballa Threat Reputation System (May 2011),
http://www.damballa.com/
62 P. Naghizadeh Ardabili and M. Liu

8. Team Cymru Inc. Malicious Activity Insight (May 2011),


http://www.team-cymru.com/Services/Insight/
9. Kamvar, S., Schlosser, M.T., Molina, H.G.: The Eigentrust Algorithm for Reputa-
tion Management in P2P Networks. In: International Conference on World Wide
Web, pp. 640–651 (2003)
10. Karir, M., Creyts, K., Mentley, N.: Towards Network Reputation - Analyzing the
Makeup of RBLs. In: NANOG52, Denver, CO (June 2011),
http://www.merit.edu/networkresearch/papers/pdf/
2011/NANOG52 reputation-nanog.pdf
11. Barracuda Networks. Barracuda Reputation Blocklist (May 2011),
http://www.barracudacentral.org/
12. The SPAMHAUS project. SBL, XBL, PBL, ZEN Lists (May 2011),
http://www.spamhaus.org/
13. Ravoaja, A., Anceaume, E.: STORM: A Secure Overlay for P2P Reputation Man-
agement. In: International Conference on Self-Adaptive and Self-Organizing Sys-
tems, pp. 247–256 (2007)
14. ShadowServer. The ShadowServer Botnet C&C List (May 2011),
http://www.shadowserver.org/
15. Zhang, Y., van der Schaar, M.: Peer-to-Peer Multimedia Sharing based on Social
Norms. Elsevier Journal on Signal Processing: Image Communication Special Issue
on Advances in Video Streaming for P2P Networks (to appear)
Efficiency Loss in a Cournot Oligopoly
with Convex Market Demand

John N. Tsitsiklis and Yunjian Xu

Laboratory for Information and Decision Systems, MIT, Cambridge, MA, 02139, USA
{jnt,yunjian}@mit.edu

Abstract. We consider a Cournot oligopoly model where multiple sup-


pliers (oligopolists) compete by choosing quantities. We compare the
social welfare achieved at a Cournot equilibrium to the maximum pos-
sible, for the case where the inverse market demand function is convex.
We establish a lower bound on the efficiency of Cournot equilibria in
terms of a scalar parameter derived from the inverse demand function.
Our results provide nontrivial quantitative bounds on the loss of social
welfare and aggregate profit for several convex inverse demand functions
that appear in the economics literature.

Keywords: Price of anarchy, Cournot oligopoly, revenue management.

1 Introduction

In a book on oligopoly theory (see Chapter 2.4 of [6]), Friedman raises an in-
teresting question on the relation between Cournot equilibria and competitive
equilibria: “is the Cournot equilibrium close, in some reasonable sense, to the
competitive equilibrium?” While a competitive equilibrium is generally socially
optimal, a Cournot (Nash) equilibrium can yield arbitrarily high efficiency loss
in general [8]. The concept of efficiency loss is intimately related to the concept
of “price of anarchy,” advanced by Koutsoupias and Papadimitriou in a seminal
paper [11]; it provides a natural measure of the difference between a Cournot
equilibrium and a socially optimal competitive equilibrium.
For Cournot oligopoly with affine demand functions, various efficiency bounds
have been reported in recent works [9][10]. Convex demand functions, such as
the negative exponential and the constant elasticity demand curves, have been
widely used in oligopoly analysis and marketing research [2,4,14]. The efficiency
loss in a Cournot oligopoly with some specific forms of convex inverse demand
functions1 has received some recent attention. For a particular form of convex

This research was supported in part by the National Science Foundation under grant
CMMI-0856063 and by a Graduate Fellowship from Shell.
¹ Since a demand function is generally nonincreasing, the convexity of a demand function implies that the corresponding inverse demand function is also convex. For a Cournot oligopoly model with non-concave inverse demand functions, existence results for Cournot equilibria can be found in [12,1].


inverse demand functions, i.e., p(q) = α − βq^γ, the authors of [3] show that when
γ > 0, the worst case efficiency loss occurs when an efficient supplier has to share
the market with infinitely many inefficient suppliers. The authors of [7] consider
a class of inverse demand functions that solve a certain differential equation (for
example, constant elasticity inverse demand functions belong to this class), and
establish efficiency lower bounds that depend on equilibrium market shares, the
market demand, and the number of suppliers.
For Cournot oligopolies with general convex and nonincreasing demand func-
tions, we establish a lower bound on the efficiency of Cournot equilibria in terms
of a scalar parameter c/d derived from the inverse demand function, namely, the
ratio of the slope of the inverse demand function at the Cournot equilibrium, c,
to the average slope of the inverse demand function between the Cournot equi-
librium and a social optimum, d. For convex and nonincreasing inverse demand
functions, we have c ≥ d; for affine inverse demand functions, we have c/d = 1.
In the latter case, our efficiency bound is f (1) = 2/3, which is consistent with the
bound derived in [9]. More generally, the ratio c/d can be viewed as a measure
of nonlinearity of the inverse demand function.
The rest of the paper is organized as follows. In the next section, we formulate
the model and provide some mathematical preliminaries on Cournot equilibria
that will be useful later, including the fact that efficiency lower bounds can be
obtained by restricting to linear cost functions. In Section 3, we consider affine
inverse demand functions and derive a refined lower bound on the efficiency of
Cournot equilibria that depends on a small amount of ex post information. We
also show this bound to be tight. In Section 4, we consider a more general model,
involving convex inverse demand functions. We show that for convex inverse de-
mand functions, and for the purpose of studying the worst case efficiency loss, it
suffices to restrict to a special class of piecewise linear inverse demand functions.
This leads to the main result of the paper, a lower bound on the efficiency of
Cournot equilibria (Theorem 2). Based on this theorem, in Section 5 we derive a
corollary that provides an efficiency lower bound that can be calculated without
detailed information on Cournot equilibria, and apply it to various commonly
encountered convex inverse demand functions. Finally, in Section 6, we make
some brief concluding remarks. Most proofs are omitted and can be found in an
extended version of the paper [13].

2 Formulation and Preliminaries

In this section, we first define the Cournot competition model that we study,
and introduce several main assumptions that we will be working with. In Sec-
tion 2.1, we present conditions for a nonnegative vector to be a social optimum or
a Cournot equilibrium. Then, in Section 2.2, we define the efficiency of a Cournot
equilibrium. In Sections 2.3 and 2.4, we derive some properties of Cournot equi-
libria that will be useful later, but which may also be of some independent
interest. For example, we show that the worst case efficiency occurs when the
cost functions are linear.

We consider a market for a single homogeneous good with inverse demand


function p : [0, ∞) → [0, ∞) and N suppliers. Supplier n ∈ {1, 2, . . . , N } has a
cost function Cn : [0, ∞) → [0, ∞). Each supplier n chooses a nonnegative real
number xn, which is the amount of the good to be supplied by her. The strategy
profile x = (x1, x2, . . . , xN) results in a total supply denoted by X = Σ_{n=1}^N xn,
and a corresponding market price p(X). The payoff to supplier n is
πn (xn , x−n ) = xn p(X) − Cn (xn ),
where we have used the standard notation x−n to indicate the vector x with the
component xn omitted. In the sequel, we will use ∂− p and ∂+ p to denote the
left and right derivatives of p, respectively.
Assumption 1. For any n, the cost function Cn : [0, ∞) → [0, ∞) is con-
vex, continuous, and nondecreasing on [0, ∞), and continuously differentiable
on (0, ∞). Furthermore, Cn (0) = 0.
Assumption 2. The inverse demand function p : [0, ∞) → [0, ∞) is contin-
uous, nonnegative, and nonincreasing, with p(0) > 0. Its right derivative at 0
exists and at every q > 0, its left and right derivatives also exist.
Note that we do not yet assume that the inverse demand function is convex. The
reason is that some of the results to be derived in this section are valid even in
the absence of such a convexity assumption. Note also that some parts of our
assumptions are redundant, but are included for easy reference. For example,
if Cn (·) is convex and nonnegative, with Cn (0) = 0, then it is automatically
continuous and nondecreasing.
Definition 1. The optimal social welfare is the optimal objective value in
the following optimization problem,

$$\text{maximize}\;\; \int_0^X p(q)\,dq - \sum_{n=1}^N C_n(x_n) \qquad \text{subject to}\;\; x_n \ge 0,\; n = 1, 2, \ldots, N, \qquad (1)$$

where X = Σ_{n=1}^N xn.

In the above definition, ∫₀^X p(q) dq is the aggregate consumer surplus and Σ_{n=1}^N Cn(xn) is the total cost of the suppliers. For a model with a nonincreasing
continuous inverse demand function and continuous convex cost functions, the
following assumption guarantees the existence of an optimal solution to (1).
Assumption 3. There exists some R > 0 such that p(R) ≤ min_n {C_n′(0)}.
The social optimization problem (1) may admit multiple optimal solutions. How-
ever, they must all result in the same price. We note that the differentiability of
the cost functions is crucial for this result to hold.
Proposition 1. Suppose that Assumptions 1 and 2 hold. All optimal solutions
to (1) result in the same price.

2.1 Optimality and Equilibrium Conditions


We observe that under Assumptions 1 and 2, the objective function in (1) is
concave. Hence, we have the following necessary and sufficient conditions for a
vector x^S to achieve the optimal social welfare:

$$C_n'(x_n^S) = p(X^S) \;\text{ if } x_n^S > 0, \qquad C_n'(0) \ge p(X^S) \;\text{ if } x_n^S = 0, \qquad (2)$$

where X^S = Σ_{n=1}^N x_n^S.
We have the following equilibrium conditions for a strategy profile x. In par-
ticular, under Assumptions 1 and 2, if x is a Cournot equilibrium, then

$$C_n'(x_n) \le p(X) + x_n \cdot \partial_- p(X), \;\text{ if } x_n > 0, \qquad (3)$$

$$C_n'(x_n) \ge p(X) + x_n \cdot \partial_+ p(X), \qquad (4)$$

where again X = Σ_{n=1}^N xn. Note, however, that in the absence of further assumptions, the payoff of supplier n need not be a concave function of xn and these conditions are, in general, not sufficient.
We will say that a nonnegative vector x is a Cournot candidate if it sat-
isfies the necessary conditions (3)-(4). Note that for a given model, the set of
Cournot equilibria is a subset of the set of Cournot candidates. Most of the re-
sults obtained in this section, including the efficiency lower bound in Proposition
5, apply to all Cournot candidates.
For convex inverse demand functions, the necessary conditions (3)-(4) can be
further refined.
Proposition 2. Suppose that Assumptions 1 and 2 hold, and that the inverse demand function p(·) is convex. If x is a Cournot candidate with X = Σ_{n=1}^N xn > 0, then p(·) must be differentiable at X, i.e., ∂₋p(X) = ∂₊p(X).

Because of the above proposition, when Assumptions 1 and 2 hold and the inverse
demand function is convex, we have the following necessary (and, by definition,
sufficient) conditions for a nonzero vector x to be a Cournot candidate:

$$C_n'(x_n) = p(X) + x_n p'(X) \;\text{ if } x_n > 0, \qquad C_n'(0) \ge p(X) + x_n p'(X) \;\text{ if } x_n = 0. \qquad (5)$$
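As a sanity check, conditions (5) can be verified numerically for a candidate profile. The sketch below (a toy instance of ours, with linear costs and affine inverse demand) tests whether a nonzero vector satisfies (5) up to a tolerance.

import numpy as np

def is_cournot_candidate(x, p, dp, dC, tol=1e-6):
    # Check conditions (5), given the inverse demand p, its derivative dp,
    # and the marginal-cost functions dC (one per supplier).
    X = x.sum()
    for xn, dCn in zip(x, dC):
        rhs = p(X) + xn * dp(X)
        if xn > 0 and abs(dCn(xn) - rhs) > tol:
            return False
        if xn == 0 and dCn(0.0) < rhs - tol:
            return False
    return True

# Toy instance: p(q) = 1 - q (for q <= 1), C_n(x) = 0.1 x for both suppliers.
p, dp = (lambda q: max(0.0, 1.0 - q)), (lambda q: -1.0)
dC = [lambda x: 0.1, lambda x: 0.1]
print(is_cournot_candidate(np.array([0.3, 0.3]), p, dp, dC))  # True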

2.2 Efficiency of Cournot Equilibria


As shown in [5], if p(0) > min_n {C_n′(0)}, then the aggregate supply at a Cournot
equilibrium is positive; see Proposition 3 below for a slight generalization. If on
the other hand p(0) ≤ min_n {C_n′(0)}, then the model is uninteresting, because
no supplier has an incentive to produce and the optimal social welfare is zero.
This motivates the assumption that follows.

Assumption 4. The price at zero supply is larger than the minimum marginal
cost of the suppliers, i.e.,

$$p(0) > \min_n\{C_n'(0)\}.$$

Proposition 3. Suppose that Assumptions 1, 2, and 4 hold. If x is a Cournot


candidate, then X > 0.

Under Assumption 4, at least one supplier has an incentive to choose a positive


quantity, which leads us to the next result.

Proposition 4. Suppose that Assumptions 1-4 hold. Then, the social welfare
achieved at a Cournot candidate, as well as the optimal social welfare [cf. (1)],
are positive.

We now define the efficiency of a nonnegative vector x as the ratio of the social
welfare that it achieves to the optimal social welfare.

Definition 2. Suppose that Assumptions 1-4 hold. The efficiency of a nonnegative
vector x = (x_1, . . . , x_N) is defined as

γ(x) = [ ∫_0^X p(q) dq − Σ_{n=1}^N C_n(x_n) ] / [ ∫_0^{X^S} p(q) dq − Σ_{n=1}^N C_n(x_n^S) ],        (6)

where x^S = (x_1^S, . . . , x_N^S) is an optimal solution of the optimization problem in
(1) and X^S = Σ_{n=1}^N x_n^S.

We note that γ(x) is well defined: because of Assumption 4 and Proposition 4,
the denominator on the right-hand side of (6) is guaranteed to be positive.
Furthermore, even if there are multiple socially optimal solutions x^S, the value
of the denominator is the same for all such x^S. Note that γ(x) ≤ 1 for every
nonnegative vector x. Furthermore, if x is a Cournot candidate, then γ(x) > 0,
by Proposition 4.

2.3 Restricting to Linear Cost Functions


Proposition 5. Suppose that Assumptions 1-4 hold and that p(·) is convex. Let
x be a Cournot candidate which is not socially optimal, and let α_n = C_n′(x_n).
Consider a modified model in which we replace the cost function of each supplier
n by a new function C̄_n(·), defined by

C̄_n(x) = α_n x,  ∀ x ≥ 0.

Then, for the modified model, Assumptions 1-4 still hold, the vector x is a
Cournot candidate, and its efficiency, denoted by γ̄(x), satisfies 0 < γ̄(x) ≤ γ(x).
If x is a Cournot equilibrium, then it satisfies Eqs. (3)-(4), and therefore is
a Cournot candidate. Hence, Proposition 5 applies to all Cournot equilibria
that are not socially optimal. We note that if a Cournot candidate x is socially
optimal for the original model, then the optimal social welfare in the modified
model could be zero, in which case γ(x) = 1, but γ̄(x) is undefined; see the
example that follows.
Example 1. Consider a model involving two suppliers (N = 2). The cost function
of supplier n is C_n(x) = x², for n = 1, 2. The inverse demand function is constant,
with p(q) = 1 for any q ≥ 0. It is not hard to see that the vector (1/2, 1/2) is
a Cournot candidate, which is also socially optimal. In the modified model, we
have C̄_n(x) = x, for n = 1, 2. The optimal social welfare achieved in the modified
model is zero. □
To lower bound the efficiency of a Cournot equilibrium in the original model, it
suffices to lower bound the efficiency achieved at a worst Cournot candidate for
a modified model. Accordingly, and for the purpose of deriving lower bounds, we
can (and will) restrict to the case of linear cost functions, and study the worst
case efficiency over all Cournot candidates.

2.4 Other Properties of Cournot Candidates


In this subsection, we collect a few useful and intuitive properties of Cournot
candidates. We show that at a Cournot candidate there are two possibilities:
either p(X) > p(X S ) and X < X S , or p(X) = p(X S ) (Proposition 6); in the
latter case, under the additional assumption that p(·) is convex, a Cournot can-
didate is socially optimal (Proposition 7). In either case, imperfect competition
can never result in a price that is less than the socially optimal price.
Proposition 6. Suppose that Assumptions 1-4 hold. Let x and x^S be a Cournot
candidate and an optimal solution to (1), respectively. If p(X) ≠ p(X^S), then
p(X) > p(X^S) and X < X^S.
For the case where p(X) = p(X S ), Proposition 6 does not provide any compari-
son between X and X S . While one usually has X < X S (imperfect competition
results in lower quantities), it is also possible that X > X S , as in the following
example.
Example 2. Consider a model involving two suppliers (N = 2). The cost function
of each supplier is linear, with slope equal to 1. The inverse demand function is
convex, of the form

p(q) = 2 − q,  if 0 ≤ q ≤ 1,
       1,      if 1 < q.

It is not hard to see that any nonnegative vector x^S that satisfies x_1^S + x_2^S ≥ 1
is socially optimal; x_1^S = x_2^S = 1/2 is one such vector. On the other hand, it can
be verified that x_1 = x_2 = 1 is a Cournot equilibrium. Hence, in this example,
2 = X > X^S = 1. □
Proposition 7. Suppose that Assumptions 1-4 hold and that the inverse demand
function is convex. Let x and x^S be a Cournot candidate and an optimal
solution to (1), respectively. If p(X) = p(X^S), then p′(X) = 0 and γ(x) = 1.

Proposition 1 shows that all social optima lead to a unique “socially optimal”
price. Combining with Proposition 7, we conclude that if p(·) is convex, a Cournot
candidate is socially optimal if and only if it results in the socially optimal price.

2.5 Concave Inverse Demand Functions

In this section, we argue that the case of concave inverse demand functions is
fundamentally different. For this reason, the study of the concave case would
require a very different line of analysis, and is not considered further in this
paper.
According to Proposition 7, if the inverse demand function is convex and
if the price at a Cournot equilibrium equals the price at a socially optimal
point, then the Cournot equilibrium is socially optimal. For nonconvex inverse
demand functions, this is not necessarily true: a socially optimal price can be
associated with a socially suboptimal Cournot equilibrium, as demonstrated by
the following example.

Example 3. Consider a model involving two suppliers (N = 2), with C_1(x) = x
and C_2(x) = x². The inverse demand function is concave on the interval where
it is positive, of the form

p(q) = 1,                      if 0 ≤ q ≤ 1,
       max{0, −M(q − 1) + 1},  if 1 < q,

where M > 2. It is not hard to see that the vector (0.5, 0.5) satisfies the optimality
conditions in (2), and is therefore socially optimal. We now argue that
(1 − 1/M, 1/M) is a Cournot equilibrium. Given the action x_2 = 1/M of supplier
2, any action on the interval [0, 1 − 1/M] is a best response for supplier 1. Given
the action x_1 = 1 − 1/M of supplier 1, a simple calculation shows that

arg max_{x ∈ [0,∞)} { x · p(x + 1 − 1/M) − x² } = 1/M.

Hence, (1 − 1/M, 1/M) is a Cournot equilibrium. Note that X = X^S = 1,
so that p(X) = p(X^S). However, the optimal social welfare is 0.25, while the
social welfare achieved at the Cournot equilibrium is 1/M − 1/M². By considering
arbitrarily large M, the corresponding efficiency can be made arbitrarily
small. □

The preceding example shows that arbitrarily high efficiency losses are possible,
even if X = X^S. The fact that allocations can be inefficient even when the price
is the correct one opens the door to substantial inefficiencies that are hard to
bound.
3 Affine Inverse Demand Functions

In this section, we establish an efficiency lower bound for Cournot oligopoly


models with affine inverse demand functions, of the form:

b − aq, if 0 ≤ q ≤ b/a,
p(q) = (7)
0, if b/a < q,

where a and b are positive constants.

Theorem 1. Suppose that Assumption 1 holds (convex cost functions), and that
the inverse demand function is affine, of the form (7). Suppose also that b >
min_n {C_n′(0)} (Assumption 4). Let x be a Cournot equilibrium, and let α_n =
C_n′(x_n). Let also

β = aX / (b − min_n {α_n}).

If X > b/a, then x is socially optimal. Otherwise:

(a) We have 1/2 ≤ β < 1.
(b) The efficiency of x satisfies

γ(x) ≥ g(β) = 3β² − 4β + 2.

(c) The bound in part (b) is tight. That is, for every β ∈ [1/2, 1) and every ε > 0,
there exists a model with a Cournot equilibrium whose efficiency is no more
than g(β) + ε.
(d) The function g(β) is minimized at β = 2/3, and the worst case efficiency is
2/3.

[Figure: plot of g(β) for β ∈ [0.5, 1]; vertical axis from 0.65 to 0.95, with the minimum marked at (2/3, 2/3).]

Fig. 1. A tight lower bound on the efficiency of Cournot equilibria for the case of affine
inverse demand functions
The lower bound g(β) is illustrated in Fig. 1. For the special case where all the
cost functions are linear, of the form C_n(x_n) = α_n x_n, Theorem 1 has an interesting
interpretation. We first note that β = X/X^S, which is the ratio of the aggregate
supply at the Cournot equilibrium to that at a social optimum. Clearly, if β
is close to 1, we expect the efficiency loss due to the difference X^S − X to be
small. However, efficiency losses may also arise if the total supply at a Cournot
equilibrium is not provided by the most efficient suppliers. Our result shows that,
for the affine case, β can be used to lower bound the total efficiency loss due
to this second factor as well. Somewhat surprisingly, the worst case efficiency
also tends to be somewhat better for low β, that is, when β approaches 1/2, as
compared to intermediate values (β ≈ 2/3).
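To make the bound concrete, the following sketch works through a symmetric instance (two suppliers with a common linear cost slope; all parameter values are illustrative, not taken from the paper). It computes the Cournot equilibrium and the social optimum in closed form from conditions (5) and (2), and compares the realized efficiency with g(β). A minimal sketch in Python:

```python
# Sketch: checking Theorem 1 on a symmetric affine instance.
# Illustrative assumptions: N = 2 suppliers with common linear cost C_n(x) = alpha*x,
# inverse demand p(q) = b - a*q on [0, b/a].

a, b, alpha = 1.0, 10.0, 4.0

# Cournot equilibrium from (5): b - a*X - a*x_n = alpha with x_1 = x_2 = X/2,
# giving X = 2(b - alpha)/(3a).
X = 2 * (b - alpha) / (3 * a)
# Social optimum from (2): price equals marginal cost, p(X_S) = alpha.
X_S = (b - alpha) / a

def welfare(q):
    # integral of p minus total production cost (equal slopes, so the split is irrelevant)
    return b * q - a * q**2 / 2 - alpha * q

beta = a * X / (b - alpha)            # = X / X_S, here exactly 2/3
g = 3 * beta**2 - 4 * beta + 2        # bound of Theorem 1(b)
print(beta, g, welfare(X) / welfare(X_S))   # 0.667, 0.667, 0.889
```

Here the realized efficiency 8/9 comfortably exceeds g(2/3) = 2/3; part (c) only asserts that some model comes arbitrarily close to the bound.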

4 Convex Inverse Demand Functions


In this section, we first show that in order to study the worst-case efficiency of
Cournot equilibria, it suffices to consider a particular form of piecewise linear
inverse demand functions. We then introduce the main result of this section,
an efficiency lower bound that holds for Cournot oligopoly models with convex
inverse demand functions.
Proposition 8. Suppose that Assumptions 1-4 hold, and that the inverse demand
function is convex. Let x and x^S be a Cournot candidate and an optimal
solution to (1), respectively. Assume that p(X) ≠ p(X^S) and let c = |p′(X)|.
Consider a modified model in which we replace the inverse demand function by
a new function p⁰(·), defined by

p⁰(q) = −c(q − X) + p(X),                                          if 0 ≤ q ≤ X,
        max{ 0, [(p(X^S) − p(X)) / (X^S − X)] · (q − X) + p(X) },  if X < q.        (8)

Then, for the modified model, with inverse demand function p⁰(·), the vector x^S
remains socially optimal, and the efficiency of x, denoted by γ⁰(x), satisfies

γ⁰(x) ≤ γ(x).

Proof. Since p(X) ≠ p(X^S), Proposition 6 implies that X < X^S, so that
p⁰(·) is well defined. Since the necessary and sufficient optimality conditions in
(2) only involve the value of the inverse demand function at X^S, which has been
left unchanged, the vector x^S remains socially optimal for the modified model.

Let

A = ∫_0^X p⁰(q) dq,   B = ∫_X^{X^S} p(q) dq,

and

C = ∫_X^{X^S} (p⁰(q) − p(q)) dq,   D = ∫_0^X (p(q) − p⁰(q)) dq.

D
p(q)

0
p (q)

Price
Cournot equilibrium

A C
Socially optimal point

X S Aggregate supply
X

Fig. 2. The efficiency of a Cournot equilibrium cannot increase if we replace the inverse
demand function by the piecewise linear function p0 (·). The function p0 (·) is tangent to
the inverse demand function p(·) at the equilibrium point, and connects the Cournot
equilibrium point with the socially optimal point.

See Fig. 2 for an illustration of p(·) and a graphical interpretation of A, B, C,
D. Note that since p(·) is convex, we have C ≥ 0 and D ≥ 0. The efficiency of x
in the original model, with inverse demand function p(·), is

0 < γ(x) = [ A + D − Σ_{n=1}^N C_n(x_n) ] / [ A + B + D − Σ_{n=1}^N C_n(x_n^S) ] ≤ 1,

where the first inequality is true because the social welfare achieved at any
Cournot candidate is positive (Proposition 4). The efficiency of x in the modified
model is

γ⁰(x) = [ A − Σ_{n=1}^N C_n(x_n) ] / [ A + B + C − Σ_{n=1}^N C_n(x_n^S) ].

Note that the denominators in the above formulas for γ(x) and γ⁰(x) are all
positive, by Proposition 4.

If A − Σ_{n=1}^N C_n(x_n) ≤ 0, then γ⁰(x) ≤ 0 and the result is clearly true. We
can therefore assume that A − Σ_{n=1}^N C_n(x_n) > 0. We then have

0 < γ⁰(x) = [ A − Σ_{n=1}^N C_n(x_n) ] / [ A + B + C − Σ_{n=1}^N C_n(x_n^S) ]
          ≤ [ A + D − Σ_{n=1}^N C_n(x_n) ] / [ A + B + C + D − Σ_{n=1}^N C_n(x_n^S) ]
          ≤ [ A + D − Σ_{n=1}^N C_n(x_n) ] / [ A + B + D − Σ_{n=1}^N C_n(x_n^S) ] = γ(x) ≤ 1,

which proves the desired result. □



Note that unless p(·) happens to be linear on the interval [X, X^S], the function
p⁰(·) is not differentiable at X and, according to Proposition 2, x cannot be a
Cournot candidate for the modified model. Nevertheless, p⁰(·) can still be used
to derive a lower bound on the efficiency of Cournot candidates in the original
model.

[Figure: plot of f(c/d) for c/d ∈ [1, 50]; the curve starts at the point (1, 2/3) and decreases monotonically toward zero.]

Fig. 3. Plot of the lower bound on the efficiency of a Cournot equilibrium in a Cournot
oligopoly with convex inverse demand functions, as a function of the ratio c/d

Theorem 2. Suppose that Assumptions 1-4 hold, and that the inverse demand
function is convex. Let x and x^S be a Cournot equilibrium and a solution to (1),
respectively. Then, the following hold.

(a) If p(X) = p(X^S), then γ(x) = 1.
(b) If p(X) ≠ p(X^S), let c = |p′(X)|, d = |(p(X^S) − p(X))/(X^S − X)|, and
c̄ = c/d. We have c̄ ≥ 1 and

1 > γ(x) ≥ f(c̄) = (φ² + 2) / (φ² + 2φ + c̄),        (9)

where

φ = max{ (2 − c̄ + √(c̄² − 4c̄ + 12)) / 2 , 1 }.
Remark 1. We do not know whether the lower bound in Theorem 2 is tight. The
difficulty in proving tightness is due to the fact that the vector x need not be a
Cournot equilibrium in the modified model. 
The lower bound established in part (b) is depicted in Fig. 3. If p(·) is affine,
then c̄ = c/d = 1. From (9), it can be verified that f(1) = 2/3, which agrees
with the lower bound in [9] for the affine case. We note that the lower bound
f(c̄) is monotonically decreasing in c̄ over the domain [1, ∞). When c̄ ∈ [1, 3),
φ is at least 1 and monotonically decreasing in c̄. When c̄ ≥ 3, φ = 1.
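As a quick numerical check of (9), the following sketch evaluates f at a few values of c̄; it confirms f(1) = 2/3 for the affine case and the value f(exp(1)) ≈ 0.5238, which appears (rounded down) in Example 4 below. A minimal sketch in Python:

```python
import math

def phi(c_bar):
    # phi as defined in Theorem 2(b); the max with 1 kicks in for c_bar >= 3
    return max((2 - c_bar + math.sqrt(c_bar**2 - 4*c_bar + 12)) / 2, 1.0)

def f(c_bar):
    # efficiency lower bound (9); defined for c_bar >= 1
    p = phi(c_bar)
    return (p**2 + 2) / (p**2 + 2*p + c_bar)

print(f(1.0))                                  # 0.6666... = 2/3
print(f(math.e))                               # ~0.5238
print([round(f(c), 4) for c in (1, 2, 3, 5, 10, 50)])   # decreasing in c_bar
```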
5 Corollaries and Applications


For a given inverse demand function p(·), the lower bound derived in Theorem
2 requires some knowledge of the Cournot candidate and the social optimum,
namely, the aggregate supplies X and X^S. We will derive an efficiency lower
bound that does not require knowledge of X and X^S, and apply it to various
convex inverse demand functions that have been considered in the economics
literature.

Corollary 1. Suppose that Assumptions 1-4 hold and that p(·) is convex. Let²

s = inf{ q | p(q) = min_n C_n′(0) },   t = inf{ q | min_n C_n′(q) ≥ p(q) + q ∂₊p(q) }.        (10)

If ∂₋p(s) < 0, then the efficiency of a Cournot candidate is at least
f(∂₊p(t)/∂₋p(s)).

Note that if there exists a "best" supplier n such that C_n′(x) ≤ C_m′(x) for any
other supplier m and any x > 0, then the parameters s and t depend only on
p(·) and C_n(·).

Example 4. Suppose that Assumptions 1, 3, and 4 hold, and that there is a
best supplier, whose cost function is linear with a slope c ≥ 0. Consider inverse
demand functions of the form (cf. Eq. (6) in [2])

p(q) = max{0, α − β log q},   0 < q,        (11)

where α and β are positive constants.³ Through a simple calculation we obtain

s = exp((α − c)/β),   t = exp((α − β − c)/β).

From Corollary 1 we obtain that for every Cournot equilibrium x,

γ(x) ≥ f( exp((α − c)/β) / exp((α − β − c)/β) ) = f(exp(1)) ≥ 0.5237.        (12)

Now we argue that the efficiency lower bound (12) holds even without the
assumption that there is a best supplier associated with a linear cost function.
From Proposition 5, the efficiency of any Cournot equilibrium x will not increase
if the cost function of each supplier n is replaced by

C̄_n(x) = C_n′(x_n) x,   ∀ x ≥ 0.


² Under Assumption 3, the existence of the real numbers defined in (10) is guaranteed.
³ In fact, p(0) is undefined. This turns out not to be an issue: for a small enough
ε > 0, we can guarantee that no supplier chooses a quantity below ε. Furthermore,
lim_{ε↓0} ∫_0^ε p(q) dq = 0. For this reason, the details of the inverse demand function in
the vicinity of zero are immaterial as far as the chosen quantities or the resulting
social welfare are concerned.
Let c = min_n {C_n′(x_n)}. Since the efficiency lower bound in (12) holds for the
modified model with linear cost functions, it applies whenever the inverse demand
function is of the form (11). □
Example 5. Suppose that Assumptions 1, 3, and 4 hold, and that there is a
best supplier, whose cost function is linear with a slope c ≥ 0. Consider inverse
demand functions of the form (cf. Eq. (5) in [2])

p(q) = max{α − βq^δ, 0},   0 < δ ≤ 1,        (13)

where α and β are positive constants. Note that if δ = 1, then p(·) is affine; if
0 < δ < 1, then p(·) is convex. Assumption 4 implies that α > c. Through a
simple calculation we have

s = ((α − c)/β)^{1/δ},   t = ((α − c)/(β(δ + 1)))^{1/δ}.

From Corollary 1 we know that for every Cournot equilibrium x,

γ(x) ≥ f( (−βδt^{δ−1}) / (−βδs^{δ−1}) ) = f( (δ + 1)^{(1−δ)/δ} ).

Using the argument in Example 4, we conclude that this lower bound also applies
to the case of general convex cost functions. □
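To see how this bound behaves, the sketch below (with illustrative parameter values of our own) computes s, t, and the resulting bound f((δ + 1)^{(1−δ)/δ}) for several values of δ, restating f and φ from (9):

```python
import math

def f(c_bar):
    # restatement of the lower bound (9)
    phi = max((2 - c_bar + math.sqrt(c_bar**2 - 4*c_bar + 12)) / 2, 1.0)
    return (phi**2 + 2) / (phi**2 + 2*phi + c_bar)

alpha, beta, c = 10.0, 1.0, 2.0     # illustrative constants with alpha > c

for delta in (1.0, 0.75, 0.5, 0.25):
    s = ((alpha - c) / beta) ** (1 / delta)
    t = ((alpha - c) / (beta * (delta + 1))) ** (1 / delta)
    c_bar = (delta + 1) ** ((1 - delta) / delta)   # = (t/s)**(delta - 1)
    print(f"delta={delta}: s={s:.2f}, t={t:.2f}, bound={f(c_bar):.4f}")
# delta = 1 recovers the affine bound 2/3; the bound decreases as delta shrinks.
```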

6 Conclusion
It is well known that Cournot oligopoly can yield arbitrarily high efficiency
loss in general; for details, see [8]. For Cournot oligopoly with convex market
demand and cost functions, results such as those provided in Theorem 2 show
that the efficiency loss of a Cournot equilibrium can be bounded away from
zero by a function of a scalar parameter that captures quantitative properties of
the inverse demand function. With additional information on the cost functions,
the efficiency lower bounds can be further refined. Our results apply to various
convex inverse demand functions that have been considered in the economics
literature.

References
1. Amir, R.: Cournot oligopoly and the theory of supermodular games. Games Econ.
Behav. 15, 132–148 (1996)
2. Bulow, J., Pfleiderer, P.: A note on the effect of cost changes on prices. J. Political
Econ. 91(1), 182–185 (1983)
3. Corchon, L.C.: Welfare losses under Cournot competition. International J. of In-
dustrial Organization 26(5), 1120–1131 (2008)
4. Fabinger, M., Weyl, G.: Apt Demand: A flexible, tractable adjustable-pass-through class of demand functions (2009),
http://isites.harvard.edu/fs/docs/icb.topic482110.files/Fabinger.pdf

5. Friedman, J.W.: Oligopoly and the Theory of Games. North-Holland, Amsterdam


(1977)
6. Friedman, J.: Oligopoly Theory. Cambridge University Press (1983)
7. Guo, X., Yang, H.: The Price of Anarchy of Cournot Oligopoly. In: Deng, X., Ye,
Y. (eds.) WINE 2005. LNCS, vol. 3828, pp. 246–257. Springer, Heidelberg (2005)
8. Johari, R.: Efficiency loss in market mechanisms for resource allocation, Ph.D.
dissertation, Mass. Inst. Technol., Cambridge, MA, USA (2004)
9. Johari, R., Tsitsiklis, J.N.: Efficiency loss in Cournot games, MIT Lab. Inf. Decision
Syst., Cambridge, MA, USA. Technical report 2639 (2005),
http://web.mit.edu/jnt/www/Papers/R-05-cournot-tr.pdf
10. Kluberg, J., Perakis, G.: Generalized quantity competition for multiple products
and loss of efficiency. In: Allerton Conf. Comm., Control, Comput., Monticello, IL,
USA (2008)
11. Koutsoupias, E., Papadimitriou, C.H.: Worst-case equilibria. Computer Science Review 3(2), 65–69 (2009)
12. Novshek, W.: On the existence of Cournot equilibrium. Review of Econ. Stud-
ies 52(1), 85–98 (1985)
13. Tsitsiklis, J.N., Xu, Y.: Efficiency loss in a Cournot oligopoly with convex market
demand (2012), http://arxiv.org/abs/1203.6675
14. Tyagi, R.: A characterization of retailer response to manufacturer trade deals. J.
of Marketing Res. 36(4), 510–516 (1999)
A Game Theoretic Optimization
of the Multi-channel ALOHA Protocol

Kobi Cohen, Amir Leshem, and Ephraim Zehavi

Faculty of Engineering, Bar-Ilan University,


Ramat-Gan, 52900, Israel
[email protected]

Abstract. In this paper we consider the problem of distributed through-


put maximization of networks with multi-channel ALOHA medium ac-
cess protocol. In the multi-channel ALOHA protocol, each user tries to
randomly access a channel using a probability vector defining the access
probability to the various channels. First, we characterize the Nash Equi-
librium Points (NEPs) of the network when users solve the unconstrained
rate maximization. We show that in this case, for any NEP, each user’s
probability vector is a standard unit vector (i.e., each user tries to access
a single channel with probability one and does not try to access other
channels). Specifically, when the number of users, N , is equal to the num-
ber of channels there are N ! NEPs. However, when the number of users
is much larger than the number of channels, most of the users get a zero
utility (due to collisions). To overcome this problem we propose to limit
each user’s total access probability and solve the problem under a total
probability constraint. We characterize the NEPs when user rates are
subject to a total transmission probability constraint. We propose a sim-
ple best-response algorithm that solves the constrained rate maximiza-
tion, where each user updates its strategy using its local channel state
information (CSI) and by monitoring the channel utilization. We prove
that the constrained rate maximization can be formulated as an exact
potential game. This implies that convergence of the proposed algorithm
is guaranteed. Finally, we provide numerical examples to demonstrate
the algorithm’s performance.

Keywords: Collision channels, multi-channel ALOHA, Nash equilibrium point, best response, potential games.

1 Introduction
In typical wireless communication networks, the bandwidth is shared by several
users. Medium Access Control (MAC) schemes are used to manage the access of
users to the shared channels. The slotted ALOHA access protocol is popular due
to its simple implementation and random-access nature [1]. In each time-slot, a
user may access a shared channel according to a specific transmission probability.
Transmission is successful only if a single user tries to access a shared channel
in a given time-slot. If more than one user transmits at the same time slot over


the same channel, a collision occurs. Here, we examine the ALOHA protocol
with multi-channel systems, dubbed multi-channel ALOHA. In multi-channel
systems, the bandwidth is divided into K orthogonal sub-bands using Orthogonal
Frequency Division Multiple Access (OFDMA). Each sub-band can be a
cluster of multiple carriers. A diversity of channel realizations is advantageous
when users exploit local CSI to access good channels. Multi-channel systems have
been widely investigated recently in cognitive radio networks, where cognitive users
share an unlicensed spectrum band, while avoiding interference with licensed
users. Related work on this subject can be found in [2–6].

In distributed optimization algorithms, users take autonomous decisions based
on local information, and coordination or message passing between users is not
required. Therefore, in wireless networks, distributed optimization algorithms
are simple to implement and generally preferred over centralized solutions. A
natural framework for analyzing distributed optimization algorithms in wireless
networks is non-cooperative game theory. Related work on this subject can be
found in [7–12].
In this paper we present a game theoretic approach to the problem of dis-
tributed rate maximization of multi-channel ALOHA networks. In the multi-
channel ALOHA protocol, each user tries to randomly access a channel using a
probability vector defining the access probability to the various channels. First,
we characterize the Nash Equilibrium Points (NEPs) of the network when users
solve the unconstrained rate maximization. We show that in this case, for any
NEP, each user’s probability vector is a standard unit vector (i.e., each user occu-
pies a single channel with probability one and does not try to access other chan-
nels). When considering the unconstrained rate maximization, we are mainly
interested in the case where the number of channels is greater than or equal to the
number of users, to avoid collisions. Specifically, in the case where the number of
users, N , is equal to the number of channels there are N ! NEPs. However, when
the number of users is much larger than the number of channels, most users get
a zero utility (due to collisions). To overcome this problem we propose to limit
each user’s total access probability and solve the problem under a total prob-
ability constraint. We characterize the NEPs when user rates are subject to a
total transmission probability constraint. We propose a simple best-response al-
gorithm that solves the constrained rate maximization, where each user updates
its strategy using its local CSI and by monitoring the channel utilization. We
prove that the constrained rate maximization can be formulated as an exact po-
tential game [13]. In potential games, the incentive of all players to change their
strategy can be expressed in a one global function, the potential function. The
existence of a bounded potential function corresponding to the constrained rate
maximization problem implies that the convergence of the proposed algorithm
is guaranteed. Furthermore, the convergence is in finite time, starting from any
point and using any updating dynamics across users.
The rest of this paper is organized as follows. In Section 2 we present
the network model and game formulation. In Sections 3 and 4 we discuss the
unconstrained and the constrained rate maximization problems, respectively. In
Section 5 we provide simulation results to demonstrate the algorithm's
performance.

2 Network Model and Game Formulation


In this paper we consider a wireless network containing N users who transmit
over K orthogonal collision channels. The users transmit using the slotted
ALOHA scheme. In each time slot, each user is allowed to access a single channel.
A transmission can be successful only if no other user tries to access the same
channel simultaneously. In this paper we denote the collision-free achievable rate
of user n at channel k by u_n(k). Furthermore, we define a virtual zero-rate channel
u_n(0) = 0, ∀n; i.e., accessing channel k = 0 refers to no-transmission.
The collision-free rate vector of user n in all K + 1 channels is given by:

u_n ≜ [ u_n(0)  u_n(1)  u_n(2)  · · ·  u_n(K) ],        (1)

and the collision-free rate matrix of all N users in all K + 1 channels is given by:

U ≜ [ u_1(0)  u_1(1)  u_1(2)  · · ·  u_1(K)
      u_2(0)  u_2(1)  u_2(2)  · · ·  u_2(K)
        ⋮
      u_N(0)  u_N(1)  u_N(2)  · · ·  u_N(K) ].        (2)

Let p_n(k) be the probability that user n tries to access channel k. Let P_n be
the set of all probability vectors of user n in all K + 1 channels. A probability
vector p_n ∈ P_n of user n is given by:

p_n ≜ [ p_n(0)  p_n(1)  p_n(2)  · · ·  p_n(K) ].        (3)

Let P be the set of all probability matrices of all N users in all K + 1 channels.
The probability matrix P ∈ P is given by:

P ≜ [ p_1(0)  p_1(1)  p_1(2)  · · ·  p_1(K)
      p_2(0)  p_2(1)  p_2(2)  · · ·  p_2(K)
        ⋮
      p_N(0)  p_N(1)  p_N(2)  · · ·  p_N(K) ],        (4)

where Σ_{k=0}^K p_n(k) = 1, ∀n.
Let P_{−n} be the set of all probability matrices of all N users in all K + 1
channels, except user n. The probability matrix P_{−n} ∈ P_{−n} is given by:

P_{−n} ≜ [ p_1(0)      p_1(1)      · · ·  p_1(K)
             ⋮
           p_{n−1}(0)  p_{n−1}(1)  · · ·  p_{n−1}(K)
           p_{n+1}(0)  p_{n+1}(1)  · · ·  p_{n+1}(K)
             ⋮
           p_N(0)      p_N(1)      · · ·  p_N(K) ].        (5)
We focus in this paper on stationary access strategies, where each user decides
whether or not to access a channel based on the current utility matrix and all
other users' strategies.

Definition 1: A stationary strategy for user n is a mapping from {P−n , un }


to pn ∈ Pn .

Remark 1: Note that u_n depends on the local CSI of user n, which can be
obtained by a pilot signal in practical implementations. On the other hand, in
the sequel we show that user n does not need the complete information on the
matrix P_{−n} to update its strategy, but only to monitor the channel utilization
by other users, defined by:

q_n(k) ≜ 1 − ∏_{i=1, i≠n}^N (1 − p_i(k)).        (6)

Remark 2: We refer to the probability matrix P as the multi-strategy containing
all users' strategies, and to P_{−n} as the multi-strategy containing all users'
strategies except the strategy of user n.

When user n perfectly monitors the k-th channel utilization, it observes:

v_n(k) ≜ 1 − q_n(k) = ∏_{i=1, i≠n}^N (1 − p_i(k)),        (7)

which is the probability that the k-th channel is available.


Since a collision occurs when more than one user tries to access the same
channel, the achievable rate of user n in the k-th channel is given by:

r_n(k) ≜ u_n(k) v_n(k).        (8)

Hence, the achievable expected rate of user n is given by:

R_n ≜ R_n(p_n, P_{−n}) = Σ_{k=1}^K p_n(k) r_n(k).        (9)
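The quantities (6)-(9) are straightforward to evaluate numerically. The sketch below, with hypothetical N = 3, K = 2 data chosen purely for illustration (column 0 is the virtual no-transmission channel), computes the channel availabilities v_n(k) and the expected rates R_n in Python:

```python
import numpy as np

U = np.array([[0.0, 5.0, 3.0],      # u_n(k), with u_n(0) = 0 as in the model
              [0.0, 4.0, 6.0],
              [0.0, 2.0, 2.0]])
P = np.array([[0.4, 0.6, 0.0],      # p_n(k); each row sums to 1
              [0.4, 0.0, 0.6],
              [0.4, 0.3, 0.3]])

N = P.shape[0]
v = np.empty_like(P)
for n in range(N):                  # v_n(k) per (7): product over i != n of (1 - p_i(k))
    v[n] = np.prod(1.0 - P[np.arange(N) != n], axis=0)

r = U * v                           # r_n(k) per (8)
R = (P[:, 1:] * r[:, 1:]).sum(axis=1)   # R_n per (9), summed over real channels only
print(np.round(R, 3))
```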

In this paper, we consider a distributed rate maximization problem, where each
user tries to maximize its own expected rate subject to a total transmission
probability constraint:

max_{p_n} R_n   s.t.   Σ_{k=1}^K p_n(k) ≤ P_max.        (10)

We are interested in unconstrained (i.e., P_max = 1) and constrained (i.e., P_max <
1) NEP solutions of this game. A NEP for our model is a multi-strategy P, given
in (4), which is self-sustaining in the sense that none of the users can increase its
utility by unilaterally modifying its strategy p_n.

Definition 2: A multi-strategy P is a Nash Equilibrium Point (NEP) if

R_n(p_n, P_{−n}) ≥ R_n(p̃_n, P_{−n}),   ∀n, ∀p̃_n.        (11)

Formally, we define the non-cooperative multi-channel ALOHA game in this
paper as follows:

Definition 3: The non-cooperative multi-channel ALOHA game (10) is given
by Γ = (N, P, R), where N = {1, 2, ..., N} denotes the set of players (or users),
P denotes the set of multi-strategies, and R : P → R^N denotes the payoff (i.e.,
rate) function.

Next, we examine the unconstrained and constrained NEP solutions of this
game (10).

3 Unconstrained Rate Maximization

In this section, we characterize unconstrained NEP solutions of the game (10).
Here, we set P_max = 1 in (10). When considering unconstrained solutions, we are
mainly interested in the case where K ≥ N, to avoid collisions. Practically, each
user monitors the channel utilization v_n(k) for all k = 1, ..., K (i.e., the complete
P_{−n} is not required), and tries to access only a single available channel, which
is the best response to the other users' multi-strategy P_{−n} (5).

Theorem 1. Assume that P_max = 1 in (10). Then:

a) For any NEP, each user's probability vector is a standard unit vector with
probability 1 (i.e., each user tries to access a single channel with probability one
and does not try to access other channels).
b) The network converges to a NEP in N iterations.

Proof. The proof is given in [14].

We infer from Theorem 1 that the unconstrained distributed rate maximization


is equivalent to a channel assignment problem, where each user chooses a single
channel. Once a channel is taken by some user, no other user can access the
same channel, since it has a zero utility. A good distributed solution to (10)
is obtained via distributed opportunistic access [15] combined with the Gale-
Shapley algorithm [16] to achieve a stable channel assignment, as was done
in [4, 5]. For details the reader is referred to [14].
In the general case where N = K, any permutation that avoids a collision
is a NEP. For instance, in the case of 3 users and 3 channels, the following
multi-strategy is a NEP:

P = [ 0 0 1 0
      0 1 0 0
      0 0 0 1 ],        (12)

since any user that unilaterally modifies its strategy gets a zero utility (due to a
collision or no-transmission). In this case we have N! NEPs.
In the case where K > N, any permutation that avoids a collision and maximizes
every user's rate (given the other users' strategies) is a NEP. For instance,
consider the case of 2 users and 3 channels, and assume that u_1(3) ≤ u_1(2) and
u_2(3) ≤ u_2(1). The following multi-strategy is a NEP:

P = [ 0 0 1 0
      0 1 0 0 ],        (13)

since none of the users can increase its utility by unilaterally modifying its strategy
p_n. As a result, there exist at most K · (K − 1) · · · (K − N + 1) NEPs.
In the case where N > K, any permutation is a NEP if at least K users access
K different channels. For instance, in the case of 3 users and 2 channels, the
following multi-strategy is a NEP:

P = [ 0 0 1
      0 1 0
      0 1 0 ],        (14)

since any user that unilaterally modifies its strategy gets a zero utility (due to
a collision or accessing the virtual channel). Note that a better NEP can be
obtained if user 2 or 3 accesses the virtual channel (i.e., does not transmit).
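The NEP conditions in these small examples are easy to verify exhaustively. As an illustration, the sketch below checks, for the 3-user/2-channel multi-strategy (14) with deterministic strategies (P_max = 1) and hypothetical utilities of our own, that no user can gain by a unilateral deviation:

```python
import numpy as np

U = np.array([[0.0, 3.0, 5.0],   # hypothetical u_n(k); column 0 is the virtual channel
              [0.0, 4.0, 2.0],
              [0.0, 1.0, 1.0]])
choice = [2, 1, 1]               # multi-strategy (14): chosen channel of each user

def rate(choice, n, k):
    # deterministic access: user n earns u_n(k) iff k >= 1 and no other user picks k
    if k == 0 or any(choice[i] == k for i in range(len(choice)) if i != n):
        return 0.0
    return U[n, k]

is_nep = all(rate(choice, n, choice[n]) >= max(rate(choice, n, k) for k in range(3))
             for n in range(3))
print(is_nep)   # True: every unilateral deviation yields zero utility
```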

4 Constrained Rate Maximization


We now discuss the more interesting case, where N > K. In this case, uncon-
strained solutions lead to collisions or to zero utilities for some users. Therefore,
constrained solutions should be used. According to Theorem 1, setting Pmax < 1
is necessary to avoid collisions (otherwise, all users access a single channel with
probability one). First, we show the following result:
Theorem 2. Assume that P_max < 1 in (10). Let r_n(k*) = max_k {r_n(k)}, where
r_n(k) is defined in (8). Then, each user n plays the strategy

p_n(k) = 1 − P_max,  if k = 0,
         P_max,      if k = k*,
         0,          otherwise,        (15)

with probability 1.

Proof. The proof is given in [14].

We infer from Theorem 2 that in each iteration each user will access a single
channel with probability Pmax and will not try to access other channels. However,
in contrast to the unconstrained solutions, other users can still access occupied
channels, since the utility is strictly positive in all channels. We discuss the
convergence later.
As a result of Theorem 2, we obtain a best response algorithm, given in Table
1. The proposed algorithm solves the constrained rate maximization problem
(10). In the initialization step, each user selects the channel with the maximal
collision-free rate un (k). This can be done by all users simultaneously in a single
iteration. Then, each user occasionally monitors the channel utilization and
updates its strategy by selecting the channel with the maximal achievable rate
r_n(k) given the channel utilization.

Table 1. Proposed best response algorithm

%—— initialization ——————————————
- for all users n = 1, ..., N do:
-   estimate u_n(k) for all k = 1, ..., K
-   k* ← arg max_k {u_n(k)}
-   p_n(k*) ← P_max
- end for
%—— end initialization —————————————
- repeat:
-   estimate v_n(k) for all k = 1, ..., K
-   compute r_n(k) = u_n(k)v_n(k) for all k = 1, ..., K
-   k* ← arg max_k {r_n(k)}
-   p_n(k*) ← P_max
- until convergence
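To illustrate the dynamics of Table 1, the following sketch simulates sequential best responses in a small network. The rate matrix is randomly generated purely for illustration, and, following Theorem 2, each user's strategy is summarized by the single channel it accesses with probability P_max. A minimal sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, P_MAX = 6, 3, 0.5                   # illustrative sizes, with P_max < 1
U = rng.uniform(1.0, 10.0, size=(N, K))   # hypothetical collision-free rates u_n(k)

choice = U.argmax(axis=1)                 # initialization: greedy on u_n(k)

def availability(choice, n):
    # v_n(k) per (7): each user i != n occupies channel choice[i] w.p. P_MAX
    v = np.ones(K)
    for i in range(N):
        if i != n:
            v[choice[i]] *= 1.0 - P_MAX
    return v

for rounds in range(1, 101):              # sequential (round-robin) best responses
    changed = False
    for n in range(N):
        k_star = int((U[n] * availability(choice, n)).argmax())   # maximize r_n(k)
        if k_star != choice[n]:
            choice[n], changed = k_star, True
    if not changed:
        print(f"converged to a NEP after {rounds} rounds; channels: {choice}")
        break
```

Sequential updating is one of the dynamics covered by Corollary 1 below, so the loop is guaranteed to terminate.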

Next, we examine the convergence of the proposed algorithm. In contrast to
the unconstrained solutions, convergence of the algorithm is not guaranteed in
N iterations. However, in the following we use the theory of potential games to
show that the constrained rate maximization (10) indeed converges in finite time.
In potential games, the incentive of all players to change their strategy can be
expressed as a single global function, the potential function. In exact potential
games, the improvement that each player can get by unilaterally changing its
strategy equals the improvement in the potential function. Hence, any local
maximum of the potential function is a NEP. The existence of an exact bounded
potential function corresponding to the constrained rate maximization problem
(10) implies that the convergence of the proposed algorithm is guaranteed.
Furthermore, the convergence occurs in finite time, starting from any point and using
any updating dynamics across users.
 
Definition 4 [13]: A game Γ = (N, P, R̃) is an exact potential game if there is
an exact potential function φ : P → R such that for every user n ∈ N and for
every P_{−n} ∈ P_{−n} the following holds:

R̃_n(p_n^{(2)}, P_{−n}) − R̃_n(p_n^{(1)}, P_{−n}) = φ(p_n^{(2)}, P_{−n}) − φ(p_n^{(1)}, P_{−n}),        (16)

∀ p_n^{(1)}, p_n^{(2)} ∈ P_n.

Theorem 3. The constrained rate maximization (10) can be formulated as an
exact potential game. Specifically, a global, bounded exact potential function exists
for this game.

Proof. The proof is given in [14].

Corollary 1: Any sequential update dynamics of the multi-channel ALOHA game
(10) converges to a NEP in finite time, starting from any point. Specifically, the
proposed best response algorithm, given in Table 1, converges to a NEP in finite
time.

5 Simulation Results

In this section we provide numerical examples to illustrate the algorithm's
performance. Here, we focus on the constrained rate maximization. We simulated a
network with N = 30 users, K = 10 channels, and the following parameters: the
channels are distributed according to a Rayleigh fading distribution, i.i.d. across
users and channels. The bandwidth W of each channel was set to 10 MHz, and
the SNR was set to 20 dB. The entries of the collision-free rate matrix U are
u_n(k) = W log(1 + SNR) Mbps. We set P_max = K/N = 1/3. We compare
two algorithms: 1) the totally greedy algorithm, in the sense that each user
transmits over the channel that maximizes its collision-free rate u_n(k) without
considering the channel utilization; 2) the proposed best response algorithm,
given in Table 1. We initialize the proposed algorithm by the totally greedy


algorithm solution, as described in Table 1.
In Fig. 1(a) and 1(b) we present the average density of the rates achieved
by the proposed algorithm and by the totally greedy algorithm, respectively. It
can be seen that the rate variance achieved by the proposed algorithm is much
lower than the rate variance achieved by the totally greedy algorithm. In
Table 2 we compare the algorithms' performance. It can be seen that
the average rate achieved by the proposed best response algorithm outperforms
the average rate achieved by the totally greedy algorithm by roughly 15%. The
average number of iterations until convergence of the proposed best response
algorithm is less than 9.

Table 2. Performance comparison

                              Proposed algorithm   Totally greedy
Average rate [Mbps]                 11.56               10.08
Variance                             1.45               34.16
Average number of iterations         8.75                1

6 Conclusion
In this paper we investigated the problem of distributed rate maximization of
networks applying the multi-channel ALOHA random access protocol. We char-
acterized the NEPs of the network when users solve the unconstrained rate
maximization. In this case, for any NEP, we obtained that each user tries to
access a single channel with probability one and does not try to access other
channels. Next, we limited each user’s total access probability and solved the
problem under a total probability constraint, to overcome the problem of col-
lisions when the number of users is much larger than the number of channels.
We characterized the NEPs when user rates are subject to a total transmission
probability constraint. We proposed a simple best-response algorithm that solves
the constrained rate maximization, where each user updates its strategy using
its local CSI and by monitoring the channel utilization. We used the theory
of potential games to prove convergence of the proposed algorithm. Finally, we
provided numerical examples to demonstrate the algorithm's performance.
[Figure: histograms of achievable rate (Mbps, horizontal axis) versus density for the two algorithms; the proposed algorithm's rates are concentrated (axis range 8-14 Mbps), while the totally greedy algorithm's rates are widely spread (axis range 0-30 Mbps).]

(a) Performance of the proposed best response algorithm, given in Table 1.
(b) Performance of the totally greedy algorithm.

Fig. 1. Average density of the rates achieved by the proposed algorithm and by the
totally greedy algorithm

References
1. Roberts, L.G.: ALOHA packets, with and without slots and capture. ACM SIG-
COMM Computer Communication Review 5(2), 28–42 (1975)
2. Zhao, Q., Sadler, B.: A survey of dynamic spectrum access. IEEE Signal Processing
Magazine 24(3), 79–89 (2007)
3. Zhao, Q., Tong, L., Swami, A.: Decentralized cognitive MAC for opportunistic spec-
trum access in ad hoc networks: a POMDP framework. IEEE Journal on Selected
Area in Comm. 25, 589–600 (2007)
4. Yaffe, Y., Leshem, A., Zehavi, E.: Stable matching for channel access control in
cognitive radio systems. In: International Workshop on Cognitive Information Pro-
cessing (CIP), pp. 470–475 (June 2010)
5. Leshem, A., Zehavi, E., Yaffe, Y.: Multichannel opportunistic carrier sensing for
stable channel access control in cognitive radio systems. IEEE Journal on Selected
Areas in Communications 30, 82–95 (2012)
6. Naparstek, O., Leshem, A.: Fully distributed auction algorithm for spectrum shar-
ing in unlicensed bands. In: IEEE International Workshop on Computational Ad-
vances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 233–236 (2011)
7. Yu, W., Ginis, G., Cioffi, J.: Distributed multiuser power control for digital sub-
scriber lines. IEEE Journal on Selected Areas in Communications 20(5), 1105–1115
(2002)
8. Luo, Z., Pang, J.: Analysis of iterative waterfilling algorithm for multiuser power
control in digital subscriber lines. EURASIP Journal on Applied Signal Process-
ing 2006, 80 (2006)
9. Maskery, M., Krishnamurthy, V., Zhao, Q.: Decentralized dynamic spectrum ac-
cess for cognitive radios: Cooperative design of a non-cooperative game. IEEE
Transactions on Communications 57(2), 459–469 (2009)
10. Huang, J., Krishnamurthy, V.: Transmission control in cognitive radio as a Marko-
vian dynamic game: Structural result on randomized threshold policies. IEEE
Transactions on Communications 58(1), 301–310 (2010)
11. Menache, I., Shimkin, N.: Rate-based equilibria in collision channels with fading.
IEEE Journal on Selected Areas in Communications 26(7), 1070–1077 (2008)
12. Candogan, U., Menache, I., Ozdaglar, A., Parrilo, P.: Competitive scheduling in
wireless collision channels with correlated channel state. In: International Conference on Game Theory for Networks (GameNets), pp. 621–630 (2009)
13. Monderer, D., Shapley, L.: Potential games. Games and Economic Behavior 14,
124–143 (1996)
14. Cohen, K., Leshem, A., Zehavi, E.: Game theoretic aspects of the multi-channel
ALOHA protocol in cognitive radio networks. Submitted to the IEEE Journal on
Selected Areas in Communications (2012)
15. Zhao, Q., Tong, L.: Opportunistic carrier sensing for energy-efficient information
retrieval in sensor networks. EURASIP J. Wireless Comm. Netw. 2, 231–241 (2005)
16. Gale, D., Shapley, L.: College admissions and the stability of marriage. The Amer-
ican Mathematical Monthly 69(1), 9–15 (1962)
Game-theoretic Robustness of Many-to-one Networks

Aron Laszka¹, Dávid Szeszlér², and Levente Buttyán¹

¹ Budapest University of Technology and Economics,
Department of Telecommunications,
Laboratory of Cryptography and System Security
{laszka,buttyan}@crysys.hu
http://www.crysys.hu/
² Budapest University of Technology and Economics,
Department of Computer Science and Information Theory
[email protected]
http://www.cs.bme.hu/

Abstract. In this paper, we study the robustness of networks that are
characterized by many-to-one communications (e.g., access networks and
sensor networks) in a game-theoretic model. More specifically, we model
the interactions between a network operator and an adversary as a two-player
zero-sum game, where the network operator chooses a spanning
tree in the network, the adversary chooses an edge to be removed from
the network, and the adversary's payoff is proportional to the number of
nodes that can no longer reach a designated node through the spanning
tree. We show that the payoff in every Nash equilibrium of the game is
equal to the reciprocal of the persistence of the network. We describe
optimal adversarial and operator strategies and give efficient, polynomial-time
algorithms to compute optimal strategies. We also generalize our
game model to include varying node weights, as well as attacks against
nodes.

Keywords: game theory, adversarial games, network robustness, directed
graph strength, graph persistence, access networks, sensor networks.

1 Introduction
Access networks and sensor networks are inherently vulnerable to physical at-
tacks, such as jamming and destruction of nodes and links. From a topological
point of view, the common characteristic of these networks is that the primary
goal of the nodes is to communicate with a designated node; therefore, we will
refer to them as many-to-one networks, as opposed to many-to-many networks,
such as backbone networks. For example, in a mesh network of wireless routers
that provide Internet access to mobile terminals, every router is typically inter-
ested in communicating with a designated gateway router through which the
Internet is reachable, and not with other peer routers of the network (except for


the purpose of packet forwarding of course). As another example, in a sensor


network, the goal of the network is to collect the sensed data at a designated
central node.
In this paper, we study the robustness of many-to-one networks in a game-
theoretic model. Traditionally, game-theoretic analysis of many-to-one networks
has been focused on resource allocation and routing in order to ensure fairness
and efficiency [1,2,3]. To the best of our knowledge, our work is the first that
uses game-theoretic analysis of network robustness in many-to-one networks.
Our work is inspired by [4] and [5], which use game-theoretic analysis of ro-
bustness of many-to-many networks. In [4], the strategic interactions between a
network manager, whose goal is to keep the network connected by choosing a
spanning tree, and an attacker, whose goal is to disconnect the network by at-
tacking a link, were modeled as a zero-sum game. It was shown that the payoff in
every Nash equilibrium of the game is the reciprocal of the (undirected) strength
of the network. Furthermore, an efficient algorithm was provided to compute an
optimal attack. In [5], the game model was generalized to include link attack
costs, which can vary based on the targeted links, resulting in a non-zero-sum
game.
While the definition of our game resembles that of [4] and [5], it is actually
fundamentally different:
– First, our game models many-to-one networks, while [4] and [5] modeled
many-to-many networks. We believe that studying adversarial games in
many-to-one networks is more important as these networks are usually more
vulnerable to attacks.
– Second, our payoff function considers the number of separated nodes, i.e.,
how disconnected the network becomes as the result of an attack. This is a
more realistic function for both the operator and the adversary.
– Finally, besides giving an algorithm to compute an optimal adversarial strat-
egy, we also give an algorithm to compute an optimal operator strategy.
Since we believe that a general theory of adversarial network games and graph
robustness metrics is possible, we have kept our notions as similar to those of [4]
and [5] as possible, even though our model and methodology are different.
In [6], a robustness metric for directed graphs with a designated node, called
directed graph strength, was introduced and shown to be computable in poly-
nomial time. Unfortunately, the name “directed strength” is misleading for two
reasons: Firstly, the definition works for undirected graphs as well, without any
modifications. Secondly, the fundamental difference between directed strength
and the similarly named (undirected) strength (which is also introduced in [6]
and used in [4]) is that the former is concerned with reachability between each
node and a designated node, while the latter is concerned with reachability be-
tween every pair of nodes. Therefore, to avoid ambiguity, we renamed directed
graph strength to persistence in [7]. In this paper, we continue to to use the
name persistence.

The main contributions of our paper are the following:

– We model the interactions between a network operator and an adversary as
a two player, zero-sum game.
– We show that the payoff in every Nash equilibrium of the game is equal to
the reciprocal of the persistence of the network.
– We describe optimal adversarial and operator strategies and give efficient,
polynomial-time algorithms to compute such optimal strategies.
The organization of this paper is the following: In Section 2, we present our game
model. In Section 3, we introduce the concepts and notions used by subsequent
sections. In Section 4, we propose an optimal adversarial strategy and show that
the expected payoff of the adversary cannot be smaller than the reciprocal of
the persistence of the network if she adopts the optimal strategy. In Section
5, we propose an optimal operator strategy and show that the expected payoff
of the operator cannot be smaller than minus the reciprocal of the persistence
of the network when it follows the optimal strategy. In Section 6, we combine
the results of the preceding sections to describe a class of Nash equilibria of
the game. In Section 7, we generalize our game model to allow nodes with non-
uniform weights and attacks against nodes. Finally, in Section 8, we conclude
the paper.

2 The Game

The network topology is represented by a connected undirected graph G = (V, E)
with a designated node r ∈ V(G). The goal of the network operator is to keep
the nodes of the network connected to the designated node, while the goal of the
adversary is to separate as many nodes as possible from it.

The interaction between the network operator and the adversary is modeled
as a two player, one-shot, zero-sum game. The network operator chooses a
spanning tree to be used for communications. The mixed strategy of the network
operator is a distribution on the set of spanning trees T(G), i.e.,

A := { α ∈ R_{≥0}^{|T(G)|} | Σ_{T∈T(G)} α_T = 1 }.

The adversary chooses an edge to be attacked. The mixed strategy of the
adversary is a distribution on E(G), i.e.,

B := { β ∈ R_{≥0}^{|E(G)|} | Σ_{e∈E(G)} β_e = 1 }.

The payoff for the adversary is the number of nodes from which there is no
path to r in T \ {e}, where T and e are the spanning tree and the edge chosen
by the operator and the adversary, respectively. If e ∉ T, then the payoff is
obviously zero.

Let λ(T, e) denote the number of nodes that are disconnected from r if the
operator uses T and the adversary attacks e. Then, the payoff function of the
game for the adversary can be written as

P(α, β) = Σ_{e∈E(G)} Σ_{T∈T(G)} α_T β_e λ(T, e).        (1)
The adversary has to solve max_β min_α P(α, β), while the operator has to solve
min_α max_β P(α, β). The corresponding solutions, i.e., the optimal adversarial and
operator strategies, are presented in Section 4 and Section 5, respectively.

In this paper, similarly to [4] and [5], we restrict the pure strategies of the
adversary to attacking single edges only. Studying generalized game models, in
which the pure strategies of the adversary consist of subsets of edges, is an open
problem in the case of both many-to-many and many-to-one networks.
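For concreteness, λ(T, e) and the payoff (1) are easy to evaluate directly. The sketch below (using the networkx library; helper names and example data are our own) counts the nodes separated from r when the attacked edge is removed from the spanning tree, and evaluates P(α, β) for given mixed strategies:

```python
import networkx as nx

def lam(T, r, e):
    """lambda(T, e): number of nodes that cannot reach r in T - {e}."""
    if not T.has_edge(*e):
        return 0                       # attacking a non-tree edge separates nobody
    T2 = T.copy()
    T2.remove_edge(*e)
    return sum(1 for v in T2 if v != r and not nx.has_path(T2, v, r))

def payoff(tree_dist, edge_dist, r):
    """P(alpha, beta) per (1); the arguments are lists of
    (spanning tree, probability) and (edge, probability) pairs."""
    return sum(a * b * lam(T, r, e) for T, a in tree_dist for e, b in edge_dist)

# Tiny example on a 4-cycle: a path-shaped tree, adversary uniform over two edges.
G = nx.cycle_graph(4)
T = nx.Graph([(0, 1), (1, 2), (2, 3)])
print(payoff([(T, 1.0)], [((1, 2), 0.5), ((0, 3), 0.5)], r=0))   # 0.5*2 + 0.5*0 = 1.0
```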

3 Preliminaries

In this section, we introduce the basic concepts and notions used by subsequent
sections.

For a set of edges A ⊆ E(G), let λ(A) denote the number of nodes from which
there is no path leading to r in the graph when A is removed.

In [6], the persistence of a graph was defined as:

Definition 1 (Persistence). Given a directed graph G with a designated node
r ∈ V(G), the persistence π(G) is defined as

π(G) = min{ |A| / λ(A) : A ⊆ E(G), λ(A) > 0 }.        (2)

Since reachability is well-defined in the case of undirected graphs as well, the
above definition also works for undirected graphs without any modifications.

Definition 2 (Critical set). A set of edges A ⊆ E(G) is critical if |A| / λ(A) =
π(G), i.e., if the minimum in Definition 1 is attained.

Definition 3 (Expected loss). The expected loss of an edge e ∈ E(G) in a
given operator strategy α is the expected payoff of the pure adversarial strategy
targeting exclusively e, i.e., Σ_{T∈T} α_T · λ(T, e).

3.1 Computing Persistence

It is shown in [6] that the computation of persistence can be performed using a
maximum flow algorithm.¹

Assume that the task is to decide if π(G) ≥ π₀ holds, where π₀ is a given
constant. For any set X ⊆ V(G), denote by δ(X) the set of edges leaving X. It
is easy to see that the minimum in Definition 1 is attained at a set A = δ(X)
for a suitable X ⊆ V(G) \ {r}. (Indeed, "spare" edges could be deleted from A
without increasing the ratio |A|/λ(A).) Of course, A = δ(X) implies λ(A) = |X|.
Therefore, π(G) ≥ π₀ is equivalent to saying that |δ(X)| − π₀ · |X| ≥ 0 holds for

¹ In this subsection we build on the basics of network flow theory; the required
background can be found in most introductory graph theory textbooks.

all X ⊆ V(G) \ {r}. Adding π₀ · (|V(G)| − 1) to both sides, we get that π(G) ≥ π₀
is equivalent to

|δ(X)| + π₀ · (|X̄| − 1) ≥ π₀ · (|V(G)| − 1),        (3)

for all X ⊆ V(G) \ {r} (where X̄ = V(G) \ X).

Consider the following maximum network flow problem. Add a new node s
to G; for each v ∈ V(G) \ {r}, add a new arc from s to v and set its capacity to
π₀; finally, set the capacity of each original arc of G to 1. Denote the obtained
network by G*. According to the well-known "max-flow-min-cut" theorem of
Ford and Fulkerson, the maximum flow in the obtained network from s to r is
equal to the minimum cut capacity, that is, the minimum of the sum of capacities
on arcs leaving a set X, where the minimum is taken over all subsets X ⊆ V(G*)
for which s ∈ X and r ∉ X. Obviously, the capacity of the cut X is |δ(X)| +
π₀ · (|X̄| − 1). Comparing this with Equation (3) above, we get that π(G) ≥ π₀
is equivalent to the existence of a flow of value π₀ · (|V(G)| − 1) from s to r in
the above constructed network; or, in other words, a flow that saturates all arcs
leaving s.
Consequently, the question of π(G) ≥ π0 can be answered by a maximum
flow algorithm. From this, the actual value of π(G) (that is, the maximum π0
for which the above described flow exists) can be determined by binary search
(which yields a polynomial time algorithm if all input numerical data is assumed
to be integer). In [6] a refinement of this approach is also given: it is shown that
π(G) can be determined by at most |V (G)| maximum flow computations (even
for arbitrary input data).
Furthermore, if π(G) is known, the above described reduction to maximum
flow can also be used to find a critical set: construct G* in the above manner
with π₀ = π(G). A minimum cut in G* is a critical set in G.
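The reduction above is easy to prototype with an off-the-shelf maximum flow routine. The sketch below (using the networkx library; function names and tolerances are our own) checks whether π(G) ≥ π₀ via the construction of G*, and then locates π(G) by binary search:

```python
import networkx as nx

def build_aux_network(G, r, pi0):
    """G*: super-source s*, arcs s*->v of capacity pi0, and each undirected
    edge of G as two opposite arcs of capacity 1."""
    H = nx.DiGraph()
    for u, v in G.edges():
        H.add_edge(u, v, capacity=1.0)
        H.add_edge(v, u, capacity=1.0)
    for v in G.nodes():
        if v != r:
            H.add_edge("s*", v, capacity=pi0)
    return H

def persistence_at_least(G, r, pi0, eps=1e-9):
    # pi(G) >= pi0 iff some flow of value pi0*(|V|-1) saturates all arcs leaving s*
    value, _ = nx.maximum_flow(build_aux_network(G, r, pi0), "s*", r)
    return value >= pi0 * (G.number_of_nodes() - 1) - eps

def persistence(G, r, tol=1e-4):
    # upper bound: cutting the edges around any single node v != r gives deg(v)/1
    lo, hi = 0.0, min(d for v, d in G.degree() if v != r)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if persistence_at_least(G, r, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Example: a star centered at r has persistence 1.
print(round(persistence(nx.star_graph(4), r=0), 3))
```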

4 Adversary Strategy
In this section, we describe an adversarial strategy which achieves an expected
payoff of 1/π(G), regardless of the strategy of the operator. Later, in Section 5, we
show that this strategy is optimal by proving that this is the highest attainable
expected payoff for the attacker if the operator is rational.

Theorem 1. If an adversary targets exclusively the edges of a critical set A with
uniform probability, then her expected payoff is at least 1/π(G).

Proof. For any given spanning tree T ∈ T and set of edges B ⊆ E(G),
Σ_{e∈B} λ(T, e) ≥ λ(B), since every node cut off by removing B has to increase
λ(T, e) by one for at least one e ∈ B. Therefore, the expected payoff for the
adversary is
   1
αT βe λ(T, e) = αT λ(T, e)
|A|
e∈E(G) T ∈T e∈A T ∈T
1  
= αT λ(T, e)
|A|
T ∈T e∈A
1 
≥ αT λ(A)
|A|
T ∈T
λ(A) 
= αT
|A|
T ∈T
λ(A) 1
= = .
|A| π(G)


As seen before in Subsection 3.1, a critical set can be computed in polynomial
time, which implies that the same holds for the adversary strategy described in
Theorem 1.
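Continuing the sketch from Subsection 3.1, an optimal attack can be extracted from a minimum cut of G* evaluated just above π(G) (a numerical convenience of ours, so that the cut is attained at a nontrivial source side). The resulting critical set is then attacked uniformly, as in Theorem 1:

```python
import networkx as nx

def optimal_attack(G, r, pi, eps=1e-3):
    """Uniform attack over a critical set; `pi` is pi(G), e.g., from the
    binary-search sketch above. Floating-point tolerances make this a
    sketch rather than production code."""
    pi0 = pi * (1 + eps)
    H = nx.DiGraph()
    for u, v in G.edges():
        H.add_edge(u, v, capacity=1.0)
        H.add_edge(v, u, capacity=1.0)
    for v in G.nodes():
        if v != r:
            H.add_edge("s*", v, capacity=pi0)
    _, (S, _) = nx.minimum_cut(H, "s*", r)
    S = set(S) - {"s*"}
    critical = [(u, v) for u, v in G.edges() if (u in S) != (v in S)]
    return {e: 1.0 / len(critical) for e in critical}

# Star centered at r = 0: all four leaf edges form a critical set.
print(optimal_attack(nx.star_graph(4), r=0, pi=1.0))   # each edge w.p. 1/4
```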

5 Operator Strategy
In this section, we propose an efficient algorithm that computes an optimal
operator strategy, which achieves an expected loss of at most 1/π(G), regardless of
the strategy of the adversary. We have already shown in Section 4 that this is the
best attainable expected payoff for the operator if the adversary is rational.
The following lemma is required by the proof of our main theorem:
Lemma 1. Let G be a graph with a designated sink node r. Let G′ denote the
graph obtained from G in the following way: add a source node s to the graph;
for each v ∈ V(G) \ {r}, add an arc from s to v and set its capacity to 1; finally,
set the capacity of every original edge of the graph to 1/π(G). The maximum flow
in G′ from s to r is |V(G)| − 1.

Proof. This readily follows from Subsection 3.1 by scaling the capacity of each
edge with 1/π(G). □

The proof of the following theorem is constructive: we describe our efficient
algorithm for obtaining optimal operator strategies.

Theorem 2. Let G be a graph with a designated node r. There is an operator
strategy in which the expected loss of every edge is at most 1/π(G).

Proof. Our proof is constructive and is based on the following algorithm:

1. Let G′ be the graph obtained from G in the way described in Lemma 1, with
the designated node used as the sink node. Find a maximum flow f in G′
from the source node s to the designated node r.
94 A. Laszka, D. Szeszlér, and L. Buttyán

2. Find a spanning reverse arborescence 2 T rooted at r in G such that


– T only includes edges to which f assigns a positive flow amount and
– every edge is directed in the same way as the flow.
3. Calculate λ(T, e) for every e ∈ T .
f (e)
4. Let αT := mine∈T λ(T,e) .
5. For every e ∈ E(G), let f (e) := f (e) − αT · λ(T, e).
6. For every v ∈ V (G) \ {r}, let f ((s, v)) := f ((s, v)) − αT .
7. If the flow assigned by f from s to r is greater than zero, then continue from
Step 2.
8. Let αT := 0 for every other spanning tree.

Before proving the correctness of the algorithm, we have to prove that Step
2 can be executed in each iteration, otherwise the algorithm would terminate
incorrectly. Obviously, if f is a network flow and the amount of flow along every
(s, v), v ∈ V (G) \ {r} edge is positive, there has to be a directed path from every
v ∈ V (G) \ {r} to r consisting of edges with positive flow amounts. Thus, we
have to show that if f is a network flow carrying γ from s to r before Step 5,
then it is a network flow carrying γ − αT (|V (G)| − 1) from s to r after Step 6.
For a v ∈ V (G) \ {r}, let λv denote λ(T, eout ), where eout is the outgoing edge
of v in T . Clearly, the sum of λ(T, ein ) over all incoming edges ein ∈ E(G) of v
is λv − 1. Since the flow along every edge e is decreased by αT · λ(T, e), the sum
of outgoing flows is decreased by αT · λv . Similarly, the sum of incoming flows is
decreased by αT · (λv − 1) + αT = αT · λv , which takes the αT decrease on (s, v)
into account as well. Clearly, the net flow at v remains zero. Since this is true
for every node, except s and r, f remains a network flow. The flow from s to r
is decreased by αT (|V (G)| − 1), since the flow on every (s, v), v ∈ V (G) \ {r},
edge is decreased by αT .
Now, we can prove the correctness of the algorithm. First, we have to prove that α is indeed a distribution, i.e., \sum_{T∈T} α_T = 1 and α_T ≥ 0 for all T ∈ T. This is evident, as the amount of flow from s to r is decreased by α_T(|V(G)| − 1) at every assignment, and the amount is |V(G)| − 1 after Step 1 and zero after the algorithm has finished.

Second, we have to prove that the expected loss of every edge in E(G) is at most 1/π(G). After Step 1, the amount of flow along every edge is at most 1/π(G). At every α_T assignment, the flow along every edge e is decreased by α_T · λ(T, e), and it is never decreased to a negative value. Therefore, \sum_{T∈T} α_T · λ(T, e) ≤ 1/π(G).

Finally, we have to prove that the algorithm terminates after a finite number of iterations. In every iteration, the flow along at least one edge (i.e., along every edge for which f(e)/λ(T, e) is minimal) is decreased from a positive amount to zero. Since there are a finite number of edges, the algorithm terminates after a finite number of iterations. □


Theorem 3. The above algorithm runs in polynomial time.



Proof. In Step 8, the assignment does not have to be actually performed for
every spanning tree, since it is enough to output the probabilities of only the
trees in the support of the distribution. Therefore, every step of the algorithm
can be performed in polynomial time. Furthermore, the number of iterations is
less than or equal to the number of edges |E(G)|, since the flow along at least
one edge is decreased from a positive amount to zero in every iteration. 
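To illustrate Steps 1–8 and the flow-peeling argument above, the following Python sketch (ours, not the authors' code) computes the distribution {α_T} on spanning trees, using networkx for the maximum flow; the fresh source label "s*" and the numerical tolerance are illustrative. As argued in the proof of Theorem 2, all source arcs start saturated and decrease by the same α_T in every iteration, so a spanning reverse arborescence on positive-flow arcs exists as long as flow remains.

```python
import networkx as nx

def operator_strategy(G, r, pi, tol=1e-9):
    """Return pairs (tree, alpha_T); a tree maps each node v != r to its parent."""
    # Step 1: maximum flow in G' (unit source arcs, edge capacities 1/pi).
    D = nx.DiGraph()
    for u, v in G.edges():
        D.add_edge(u, v, capacity=1.0 / pi)
        D.add_edge(v, u, capacity=1.0 / pi)
    for v in G.nodes():
        if v != r:
            D.add_edge("s*", v, capacity=1.0)
    _, fd = nx.maximum_flow(D, "s*", r)
    f = {(u, v): fd[u][v] for u in fd for v in fd[u]}
    strategy = []
    v0 = next(v for v in G.nodes() if v != r)   # all source arcs carry equal flow
    while f[("s*", v0)] > tol:
        # Step 2: reverse arborescence on positive-flow arcs (backward search from r).
        parent, stack = {r: None}, [r]
        while stack:
            x = stack.pop()
            for u in G.nodes():
                if u not in parent and f.get((u, x), 0.0) > tol:
                    parent[u] = x
                    stack.append(u)
        # Step 3: lambda(T, e) for the tree arc (v, parent[v]) = size of v's subtree.
        lam = {v: 0 for v in parent if v != r}
        for u in lam:
            x = u
            while x != r:      # every ancestor arc cuts off u as well
                lam[x] += 1
                x = parent[x]
        # Step 4: alpha_T = min over tree arcs of f(e) / lambda(T, e).
        aT = min(f[(v, parent[v])] / lam[v] for v in lam)
        # Steps 5-6: peel alpha_T * lambda(T, e) off the tree arcs and alpha_T
        # off every source arc (lambda(T, e) = 0 for non-tree edges).
        for v in lam:
            f[(v, parent[v])] -= aT * lam[v]
            f[("s*", v)] -= aT
        strategy.append(({v: parent[v] for v in lam}, aT))   # Step 7: loop
    return strategy                                           # Step 8 is implicit
```

Each iteration zeroes the flow on at least one edge, so at most |E(G)| trees receive positive probability, in line with Theorem 3.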

Corollary 1. An operator strategy that achieves at least −1/π(G) expected payoff for the operator can be found in polynomial time.

Proof. The claim of this corollary follows from Theorems 2 and 3. Suppose that the strategy of the operator is constructed using the proposed algorithm. Then, the expected payoff of every pure adversarial strategy is at most 1/π(G), since ∀e ∈ E(G): \sum_{T∈T} α_T · λ(T, e) ≤ 1/π(G). Therefore, the expected payoff of every mixed adversarial strategy is at most 1/π(G) as well. □


6 Nash Equilibrium

Based on the above results, we can describe a class of Nash equilibria:

Corollary 2. The adversarial strategies presented in Section 4 and the operator strategies presented in Section 5 form Nash equilibria of the game. The expected payoffs for the adversary and the operator are 1/π(G) and −1/π(G), respectively.

Since the game is zero-sum, all Nash equilibria have the same expected payoff. Consequently, graph persistence is a sensible measure of network robustness.

7 Generalizations
In this section, we present various generalizations to our basic game model intro-
duced in Section 2, which make our model more realistic and practical. We show
that all of these generalized models can be traced back to the basic game model,
i.e., with minor modifications, the previously presented theorems and algorithms
apply to these generalized models as well.

7.1 Directed Graphs


Recall that, in Section 3, graph persistence was defined for directed graphs, even
though it was applied only to undirected graphs so far. We have restricted the
topologies of the studied networks to undirected graphs only to simplify our
basic model. Now, we relax this restriction, and use directed graphs to represent
network topologies. This is clearly a generalization, since undirected networks
can also be represented in this model by replacing each undirected edge with
two arcs facing opposite directions. The generalization is very straightforward,
since all steps and arguments of the previously presented algorithms and proofs
work with directed graphs as well, without any modifications.

7.2 Non-uniform Node Weights

It is possible to generalize our results to the case where nodes have non-uniform
weight or importance. Let dv be the weight of node v: by disconnecting each
node v from r, the adversary gains and the operator loses dv (instead of 1, as
in the original model). Let λ(T, e) denote the total weight of the nodes that
are disconnected from r when the operator uses T and the adversary attacks e.
Similarly, let λ(A) denote the total weight of the nodes that are disconnected
when A is removed. It is easy to see that the definition of graph persistence and
the proposed adversarial strategy do not have to be modified to accommodate
the changes in the definitions of λ(T, e) and λ(A).
In the case of the operator strategy, the following modifications have to be made
to the proposed algorithm and the proof:

– In Step 1, the capacity of each arc (s, v), v ∈ V(G) \ {r}, has to be d_v instead of 1.
– In Step 6, the flow on each arc (s, v), v ∈ V(G) \ {r}, has to be decreased by d_v · α_T instead of α_T.
– Consequently,
  • the sum of λ(T, e_in) over all incoming edges e_in ∈ E(G) of v is λ_v − d_v instead of λ_v − 1, and
  • the flow from s to r is decreased by α_T \sum_{v∈V(G)\{r}} d_v instead of α_T(|V(G)| − 1).

7.3 Node Attacks

Based on the generalization presented in the previous subsection, our results


can be further generalized to the case where the adversary is not only able
to target edges, but it is able to target nodes as well. In this case, the mixed
strategy of the adversary is a distribution on (V (G) ∪ E(G)), i.e., B := {β ∈
|V (G)|+|E(G)| 
R≥0 | e∈(V (G)∪E(G)) βe = 1}.
For an arbitrary subset A ⊆ (V(G) ∪ E(G)), let λ(A) denote the total weight of the nodes which are either elements of A or from which there is no path leading to r in the graph when A is removed.
The definition of persistence has to be generalized to allow targeting nodes:

Definition 4 (Edge-node-persistence). Given a directed graph G with a designated node r ∈ V(G), the edge-node-persistence π_n(G) is defined as

$$\pi_n(G) = \min\left\{ \frac{|A|}{\lambda(A)} : A \subseteq (V(G) \cup E(G)),\ \lambda(A) > 0 \right\}. \tag{4}$$

In [7], we have shown that computing the edge-node-persistence can easily be reduced to computing persistence by vertex splitting, a well-known trick in graph theory: replace each node v by two nodes v1 and v2, add the arc (v1, v2) to G, let d(v1) = d(v) and d(v2) = 0; finally, replace each original arc (u, v) by (u2, v1). It is fairly easy to see that the persistence of the obtained graph is the same as the edge-node-persistence of the original one.
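A short sketch (ours) of this reduction, with nodes labeled (v, 1) and (v, 2) and weights carried in a dictionary, is:

```python
import networkx as nx

def split_vertices(G, d):
    """Vertex splitting: node v becomes v1 -> v2; attacking the internal arc
    ((v, 1), (v, 2)) in the new graph corresponds to attacking node v."""
    H, d2 = nx.DiGraph(), {}
    for v in G.nodes():
        H.add_edge((v, 1), (v, 2))   # internal arc representing node v
        d2[(v, 1)] = d.get(v, 1)     # d(v1) = d(v)
        d2[(v, 2)] = 0               # d(v2) = 0
    for u, v in G.edges():           # each original arc (u, v) -> (u2, v1)
        H.add_edge((u, 2), (v, 1))
    return H, d2
```

In our labeling, the designated node r of G corresponds to (r, 2) in the new graph.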
This trick can also be used to obtain adversarial and operator strategies that achieve 1/π_n(G) payoff in the generalized model on any given graph G. Let G′ be the graph obtained from G in the above manner. Find an optimal adversarial strategy on G′ as described in Section 4, which achieves 1/π(G′) = 1/π_n(G) payoff on G′. The support of the resulting distribution consists of edges in E(G) and edges corresponding to nodes in V(G). It is easy to see that if we replace the edges corresponding to nodes with those nodes in the support of the distribution, the resulting strategy achieves 1/π_n(G) payoff on G. An optimal operator strategy, which guarantees the operator an expected payoff of −1/π_n(G) on G, can be obtained in a similar manner.
Please note that we could define a model in which an adversary is only able
to target nodes, but this is unnecessary. For every optimal adversarial strategy
targeting both nodes and edges, we can construct a corresponding optimal ad-
versarial strategy that targets only nodes: simply replace each arc in the strategy
with its source node. It is easy to see that the payoff of the resulting strategy
is at least as large as the payoff of the original strategy.

8 Conclusions

In this paper, we introduced a game-theoretic model of the interactions between


the operator of a many-to-one network and an adversary. We showed that the
payoff in every Nash equilibrium of the game is equal to the reciprocal of the
persistence of the network. One of our main contributions is to link the graph-
theoretic robustness of a network, measured in persistence, to game theory, which
gives a better understanding of robustness and an argument for the soundness
of the notion of graph persistence. We also gave efficient, polynomial-time algo-
rithms to compute optimal strategies for the adversary and the operator. The
optimal operator strategy gives a baseline for the design of robust many-to-one
routing algorithms.

Acknowledgements. This paper has been supported by HSN Lab, Bu-


dapest University of Technology and Economics, http://www.hsnlab.hu. Dávid
Szeszlér is supported by grant Nr. OTKA 103985 of the Hungarian National Sci-
ence Fund. The work is also related to the internal project of the authors’ hosting
institution on “Talent care and cultivation in the scientific workshops of BME”,
which is supported by the grant TÁMOP - 4.2.2.B-10/1–2010-0009.

References

1. Altman, E., Boulogne, T., El-Azouzi, R., Jimenez, T., Wynter, L.: A survey on
networking games in telecommunications. Computers & Operations Research 33(2),
286–311 (2006)

2. Felegyhazi, M., Hubaux, J.P.: Game theory in wireless networks: A tutorial. Tech-
nical Report LCA-REPORT-2006-002, EPFL, Lausanne, Switzerland (June 2007)
3. Charilas, D.E., Panagopoulos, A.D.: A survey on game theory applications in wire-
less networks. Computer Networks 54(18), 3421–3430 (2010)
4. Gueye, A., Walrand, J.C., Anantharam, V.: Design of Network Topology in an
Adversarial Environment. In: Alpcan, T., Buttyán, L., Baras, J.S. (eds.) GameSec
2010. LNCS, vol. 6442, pp. 1–20. Springer, Heidelberg (2010)
5. Gueye, A., Walrand, J.C., Anantharam, V.: How to Choose Communication Links
in an Adversarial Environment? In: Jain, R., Kannan, R. (eds.) GameNets 2011.
LNICST, vol. 75, pp. 233–248. Springer, Heidelberg (2012)
6. Cunningham, W.H.: Optimal attack and reinforcement of a network. Journal of the
ACM 32(3), 549–561 (1985)
7. Laszka, A., Buttyán, L., Szeszlér, D.: Optimal selection of sink nodes in wireless
sensor networks in adversarial environments. In: Proc. of the 12th IEEE Interna-
tional Symposium on a World of Wireless, Mobile and Multimedia, WoWMoM 2011,
Lucca, Italy, pp. 1–6 (June 2011)
Hybrid Pursuit-Evasion Game between UAVs
and RF Emitters with Controllable
Observations: A Hawk-Dove Game

Husheng Li¹, Vasu Chakravarthy², Sintayehu Dehnie³, Deborah Walter⁴, and Zhiqiang Wu⁵

¹ The University of Tennessee, Knoxville, TN ([email protected])
² Air Force Research Lab, Dayton, OH ([email protected])
³ Booz Allen Hamilton, Dayton, OH ([email protected])
⁴ Rose-Hulman Institute of Technology ([email protected])
⁵ Wright State University, Dayton, OH ([email protected])

Abstract. Unmanned aerial vehicles (UAVs) can be used to chase radio


frequency (RF) emitters by sensing the signal sent out by the RF emit-
ters. Meanwhile, the RF emitter can evade the UAVs, thus forming
a pursuit-evasion game. In contrast to traditional pursuit-evasion games,
in which the players can always observe each other, the RF emitter can
stop transmitting such that the UAVs lose the target. However, stopping
the transmission also incurs cost to the RF emitter since it can no longer
convey information to destinations. Hence, the RF emitter can take both
continuous actions, i.e., the moving direction, and discrete actions, i.e.,
whether to stop transmission. Meanwhile, there are both discrete states,
i.e., whether the RF transmitter is transmitting, and continuous states,
i.e., the locations of UAVs and RF emitter, thus forming a hybrid sys-
tem. We will study the game theoretic properties of this novel game and
derive the optimal strategies for both parties under certain assumptions.

Keywords: UAV, pursuit-evasion game.

1 Introduction
An unmanned aerial vehicle (UAV) is a remotely piloted aircraft that is widely used in the military. It can be used for many tasks, particularly surveillance or reconnaissance. In recent years, people have studied how to use UAVs as a flying sensor network to monitor various activities, such as radio activities [2][4][5][9][11]. This is particularly useful in the military due to the low cost and efficient deployment of UAVs.
In this paper, we study how UAVs can be used to chase RF emitters. When
a UAV is equipped with directional antenna, it can determine where the RF


emitter is and then pursue it, either to continue the surveillance or destroy the
RF emitter. We assume that the RF emitter is also mobile, but with a slower
speed than the UAV. The RF emitter can move to evade the pursuit of the UAV.
Then, it forms a pursuit-evasion game which was originally studied by R. Isaacs
[6]. Since such a game is played in a continuous space and continuous time,
it belongs to the category of differential games. In contrast to traditional game
theory, in which randomness is a key factor of the game, the pursuit-evasion game is deterministic and can be described by a partial differential equation (called the Isaacs equation). It has been widely applied in the study of warfare, such as the dogfight of fighter aircraft and the Bunker Hill battle [6]. The value functions and the optimal strategies at the equilibrium have been obtained for many applications.

Fig. 1. An illustration of the pursuit-evasion game of UAV and RF emitter

In contrast to the traditional pursuit-evasion games, the game studied in this


paper is characterized by its hybrid action space and state space. Besides choos-
ing the moving direction, the RF emitter can choose to stop the RF transmission.
Since the UAV’s geolocationing capability is completely dependent on the RF
signal, the UAV will lose the observability of the RF emitter. During this ’blind’
period, the RF emitter can try to evade the UAV. However, it incurs penalty
to the RF emitter when it ceases the RF transmission. Hence, the RF emitter
must find a good tradeoff between the risk of being caught by the UAV and the
penalty of ceasing transmitting. Note that the action of whether to transmit and
the state of whether it is transmitting are both discrete. Therefore, the game is
actually played in a hybrid system in which both discrete and continuous states
exist [7].
Note that hybrid systems have been intensively studied in recent years due to their wide applications in various areas such as smart grids and robotic networks.
However, there have been very few studies on the games in hybrid systems [8].

In particular, there have been no studies on the pursuit-evasion games with the
observation controllable by the evader, to the best of our knowledge. In this paper, we
will consider both cases of discounted and non-discounted rewards. The feedback
Nash equilibrium will be obtained and described by a combination of Bellman’s
equation and Isaacs equation. Due to the prohibitive challenge of solving the
equations, we will study heuristic steering strategies of the UAV and RF emitter
and then use numerical simulations to explore the strategy of whether to stop
transmitting.
The remainder of this paper is organized as follows. The system model for the
UAV and RF emitter is introduced in Section 2. The case of single UAV and
single RF emitter is studied in Section 3 and is then extended to the multiple-
UAV-multiple-emitter case in Section 4. Numerical results and conclusions are presented in Sections 5 and 6, respectively.

2 System Model
Consider one UAV and one RF emitter. We denote by x_u = (x_{u1}, x_{u2}) and x_e = (x_{e1}, x_{e2}) the locations of the UAV and the RF emitter, respectively. We adopt a simple model for the motions of the UAV and RF emitter, using the following ordinary differential equations [3]:

$$\begin{cases} \dot{x}_{u1} = v_u \sin\theta_u \\ \dot{x}_{u2} = v_u \cos\theta_u \\ \dot{\theta}_u = w_u f_u \end{cases} \tag{1}$$

and

$$\begin{cases} \dot{x}_{e1} = v_e \sin\theta_e \\ \dot{x}_{e2} = v_e \cos\theta_e \\ \dot{\theta}_e = w_e f_e \end{cases} \tag{2}$$

where v_u and v_e are the velocities; f_u and f_e are the forces that change the direction; and w_u and w_e are the inertia. It is reasonable to assume that v_u > v_e. We assume that the forces are limited, i.e., |f_u| < F_u and |f_e| < F_e, where F_u and F_e are the maximum absolute values of the forces. Note that the above model is very simple but more mathematically tractable than more complicated motion models.

3 Single-UAV-Single-Emitter Game
In this section, we assume that there is only one UAV and it can perfectly
determine the location of the RF emitter when the emitter keeps transmitting.
This is reasonable if the UAV employs a powerful sensor which can determine
both distance (e.g., using the signal strength) and the angle (e.g., using an
antenna array). However, when the emitter stops transmitting at a certain cost,
the UAV loses the target; hence we say that the observation is controllable (by
the emitter). In contrast to the traditional continuous-time pursuit-evasion game,


the challenge of this game is the hybrid system state, which consists of both
continuous one (the locations and directions) and discrete one (the emitter’s
transmission state).

3.1 Game Formulation


Obviously, there are two players in the game, namely the UAV and the RF
emitter. Essentially, the UAV wants to pursue the RF emitter using its sensor while
the emitter wants to evade by moving or stopping emitting. For simplicity, we
assume that the pursuit and evasion occur in a plane. The elements are itemized
as follows.

State. We denote by s the state of the whole system, which consists of the
following components:
– For the UAV side, its state includes its current location xu = (xu1 , xu2 ) and
the direction θu .
– For the emitter side, its state includes the current location x_e = (x_{e1}, x_{e2}), the moving direction θ_e, and its transmission state s_e: s_e = 1 when the emitter transmits and s_e = 0 otherwise.

Since the game only concerns the relative location x = x_u − x_e, we can define the system state as s = (x, θ_u, θ_e).

Actions. Both the UAV and emitter can move and change direction. Moreover,
the emitter can choose to stop transmitting and then make the UAV lose track
of the target. Hence, the actions in the game are defined as follows.
– UAV: The action is fu which is visible to the emitter.
– Emitter: Its action includes fe , which is also visible to the UAV when se = 1,
and the decision on whether to stop the transmission, which is denoted by ae .
For simplicity, we assume that, when the UAV loses the target, it follows a certain predetermined track, e.g., keeping the original direction (f_u = 0). Moreover,
we assume that the transmission state has a minimal dwelling time τ0 ; i.e., each
transmission state, namely on or off, must last for at least τ0 units. To simplify
the analysis, we assume that the decision on transmission can be made at only
discrete times, namely 0, τ0 , 2τ0 , ... For the case in which the decision can be
made at continuous time under the constraint of minimum dwelling time, the
analysis is much more complicated and will be left to our future study.

Rewards. The purpose of the UAV is to catch the emitter or force the emitter to keep silent. When the distance between the UAV and the emitter is small, the game ends. This stopping time is defined as

$$T^* = \inf\{t : \|x(t)\| \le \gamma_d\}, \tag{3}$$


where γ_d is a predetermined threshold for the distance. It is possible that T^* is infinite if the UAV is unable to catch the emitter, e.g., if the RF emitter keeps silent forever.

Hence, the total reward of the UAV can be modeled as either discounted or non-discounted:

– Discounted reward: When the reward is discounted (i.e., the future is less important than the present; hence, the requirement of time is incorporated into the decisions), we have (both α > 0 and 0 < β < 1 are discounting parameters)

$$R = \int_{t=0}^{T^*} \left( R_0 e^{-\alpha t}\, \delta(\|x(t)\| \le \gamma_d) - c\beta^n a_e(t)\, \delta(t = n\tau_0) \right) dt, \tag{4}$$

where R_0 is the reward for locating the emitter and c is the penalty on the UAV when the emitter transmits in one time slot. The reward at time t is given by

$$r(t) = R_0 e^{-\alpha t}\, \delta(\|x(t)\| \le \gamma_d) - c\beta^n a_e(t)\, \delta(t = n\tau_0). \tag{5}$$

– Non-discounted reward: When the reward is not discounted (i.e., the future is as important as the present) within a time window [0, T_f], we have

$$R = \int_{t=0}^{\min\{T^*, T_f\}} \left( R_0\, \delta(\|x(t)\| \le \gamma_d) - c\, a_e(t)\, \delta(t = n\tau_0) \right) dt. \tag{6}$$

The reward at time t is given by

$$r(t) = R_0\, \delta(\|x(t)\| \le \gamma_d) - c\, a_e(t)\, \delta(t = n\tau_0). \tag{7}$$

For simplicity, we assume that t_f = T_f/τ_0 is an integer.


Since we model the game as a zero-sum one, the reward of the emitter at time
slot t is simply given by −r(t). Note that, in practice, the reward could be more
complicated, e.g., taking fuel consumption into account. This requires much
more complicated models and will be studied in the future.

System Dynamics. The dynamics of the game can be written as

$$\dot{s}(t) = f_{a_e(t)}(s(t), f_u(t), f_e(t)), \tag{8}$$

where a_e(t) is the transmission state of the emitter. We denote by π_u and π_e the strategies for the continuous actions of the UAV and emitter, respectively; i.e., f_u(t) = π_u(a_e(t), s(t)) and f_e(t) = π_e(a_e(t), x(t)). As we have assumed in the game formulation, when a_e(t) = 0 (the emitter stops transmitting), π_u is independent of s(t); i.e., the UAV follows a predetermined track. In this paper, we assume π_u = 0 when a_e = 0; i.e., the UAV keeps its original direction when it loses track of the emitter.

3.2 Feedback Nash Equilibrium


When ae (t) is always 1 (i.e., the emitter keeps transmitting all the time and is
thus always visible to the UAV), the game degenerates to a traditional pursuit-evasion game. A brief introduction to the feedback Nash equilibrium (whose definition can be found in [1]) of the traditional pursuit-evasion game is provided in the Appendix for self-containedness.
When ae (t) is not always 1, the challenge is that there are both discrete and
continuous system states in the dynamics, thus eliminating the possibility of
straightforwardly applying the traditional theories of stochastic games (discrete
system state) and differential games (continuous system state).

Equilibrium for Discounted Reward. First, we define the reward-to-go function R_s(t), i.e., the reward from time t to the termination of the game. We make the following two observations:
– The decision actually depends only on the relative location and the directions of the UAV and emitter, not on the current transmission status of the emitter.
– There are two types of reward-to-go functions, namely the ones at the times of deciding the transmission status and the ones at other times. We assume that the decision on whether to shut down the transmission is made at a time slightly before nτ_0, i.e., at (nτ_0)^−. Then, we have the reward-to-go functions {R_s((nτ_0)^−)}_{n=0,1,...} and R_s(t), t ≠ nτ_0.
Then, the following proposition provides the reward-to-go functions at the feedback Nash equilibrium of the game with discounted reward:

Proposition 1. The reward-to-go functions for the discounted reward are determined by

$$R_s((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s(0, a_e) \right], \tag{9}$$

and

$$R_s(t, 1) = \max_{f_u} \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta(\|x(t)\| < \gamma_d)\, dt + R_s(\tau_0^-) \right], \tag{10}$$

where s is the system state at time τ_0, and

$$R_s(t, 0) = \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta(\|x(t)\| < \gamma_d)\, dt + R_s(\tau_0^-) \right]_{f_u = 0}. \tag{11}$$

Equations (10) and (11) can be further written as

$$\begin{cases} -\dfrac{\partial R_s(t,1)}{\partial t} = \max_{f_u} \min_{f_e} \left[ \dfrac{\partial R_s(t,1)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right], \\ R_s(\tau, 1) = R_s((\tau_0)^-), \end{cases} \tag{12}$$

and

$$\begin{cases} -\dfrac{\partial R_s(t,0)}{\partial t} = \min_{f_e} \left[ \dfrac{\partial R_s(t,0)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right], \\ R_s(\tau, 0) = R_s((\tau_0)^-). \end{cases} \tag{13}$$
Then, we can obtain the optimal strategies of the UAV and emitter, which are given in the following corollary:

Corollary 1. The strategies at the feedback Nash equilibrium are given as follows.
– The strategy of the UAV is given by

$$u_f^* = \arg\max_{f_u} \min_{f_e} \left[ \frac{\partial R_s(t,1)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right]. \tag{14}$$

– The strategy of the emitter is given by

$$u_e^* = \arg\min_{f_e} \max_{f_u} \left[ \frac{\partial R_s(t,1)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right], \tag{15}$$

together with

$$R_s((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s(0, a_e) \right]. \tag{16}$$

Equilibrium for Non-discounted Reward. Similarly to the discounted-reward case, the equilibrium for the non-discounted reward case is given in the following proposition:

Proposition 2. The reward-to-go functions for the non-discounted reward are determined by

$$R_s^n((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s^{n+1}(0, a_e) \right], \tag{17}$$

and

$$R_s^{n+1}(t, 1) = \max_{f_u} \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta(\|x(t)\| < \gamma_d)\, dt + R_s^n(\tau_0^-) \right], \tag{18}$$

where s is the state at time τ, and

$$R_s^{n+1}(t, 0) = \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta(\|x(t)\| < \gamma_d)\, dt + R_s^n(\tau_0^-) \right]_{f_u = 0}. \tag{19}$$

Equations (18) and (19) can be further written as

$$\begin{cases} -\dfrac{\partial R_s^{n+1}(t,1)}{\partial t} = \max_{f_u} \min_{f_e} \left[ \dfrac{\partial R_s^{n+1}(t,1)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right], \\ R_s^{n+1}(\tau, 1) = R_s^{n+1}((\tau_0)^-), \end{cases} \tag{20}$$

and

$$\begin{cases} -\dfrac{\partial R_s^{n+1}(t,0)}{\partial t} = \min_{f_e} \left[ \dfrac{\partial R_s^{n+1}(t,0)}{\partial s} f(t, s, f_u, f_e) + R_0\, \delta(\|x(t)\| < \gamma_d) \right], \\ R_s^{n+1}(\tau, 0) = R_s^{n+1}((\tau_0)^-), \end{cases} \tag{21}$$

with the terminal condition

$$R_s^{t_f}((\tau_0)^-) = 0. \tag{22}$$

3.3 Computation of Strategy

Since we have both continuous and discrete actions, we address them separately and then integrate them into one uniform procedure for computing the strategies at the feedback Nash equilibrium.

Discrete Action. For the discrete action, we consider only the emitter, since there is no discrete action for the UAV. (A discretized sketch of the value iteration is given after this list.)

– Case of Discounted Reward: We assume that, given R_s((τ_0)^−), we know how to compute the strategies of the UAV and emitter in (12) and (13). Then, we can run the following value iteration for computing R_s((τ_0)^−):

$$\begin{cases} R_s^{k+1}((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s^k(0, a_e) \right], \\ R_s^0((\tau_0)^-) = R^0(s), \end{cases} \tag{23}$$

where R^0 is the initialization of the reward-to-go function, which is a function of the relative location, and R_s^k(0, a_e) is obtained from the values of R_s^k((τ_0)^−) in the k-th iteration. The difficulty of the value iteration is that s is a continuous state, thus requiring uncountably many equations in the value iteration. One effective approach is to discretize the location, thus approximating the problem by a discrete one.

– Case of Non-discounted Reward: We assume that, given R_s^n((τ_0)^−), we know how to compute the strategies of the UAV and emitter in (20) and (21). Then, we can run the following value iteration for computing R_s^n((τ_0)^−):

$$\begin{cases} R_s^{k+1}((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s^k(0, a_e) \right], \\ R_s^0((\tau_0)^-) = R^0(s), \end{cases} \tag{24}$$

where R^0 and R_s^k(0, a_e) are as above; the same discretization of the continuous state applies.
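The following Python sketch (ours, not the authors' code) illustrates the discretized value iteration (23)/(24) on a finite grid of relative states. The helper propagate is an assumption: it plays the inner differential game over one dwell period [0, τ_0] (e.g., with the heuristic steering strategies of the next paragraph) and returns the accumulated catch reward together with the successor grid index; the sign conventions mirror (23) literally.

```python
import numpy as np

def value_iteration(grid, propagate, c, n_iter=50):
    """Discretized fixed-point iteration for R_s((tau_0)^-); cf. (23)-(24).
    propagate(s, a_e) is an assumed helper returning (reward, successor index)
    for one dwell period under transmission decision a_e."""
    R = np.zeros(len(grid))                       # R^0: initialization
    for _ in range(n_iter):                       # e.g., 50 iterations (Sect. 5)
        R_next = np.empty_like(R)
        for idx, s in enumerate(grid):
            values = []
            for a_e in (0, 1):                    # emitter: silent / transmit
                reward, succ = propagate(s, a_e)  # inner game over [0, tau_0]
                values.append(-c * (a_e == 1) + reward + R[succ])
            R_next[idx] = min(values)             # minimization as in (23)
        R = R_next
    return R
```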

Continuous Action. It is highly nontrivial to solve the partial differential equation, particularly when the terminal cost function R_s^0((τ_0)^−) is complicated; we are still unable to solve it in closed form. Hence, we propose the following heuristic but reasonable strategies for both the UAV and the RF emitter (sketched in code below), which are independent of whether the reward is discounted or not:
– UAV: When the RF emitter is transmitting, the UAV steers toward the RF emitter using the full force.
– RF emitter: The RF emitter steers in the direction perpendicular to the vector between the UAV and the RF emitter at full strength.

4 Multi-UAV-Multi-Emitter Game

In this section, we extend the study on the single-UAV-single-emitter game to


the general case in which we consider multiple UAVs and multiple emitters.

4.1 Game Formulation


We assume that there are Nu UAVs and Ne RF emitters. We assume that both
quantities Nu and Ne are known to all UAVs and emitters. This is reasonable
since each emitter can count the number of UAVs due to the assumption of visi-
bility. We also assume that the emitters are in the state of ’on’ at the beginning
such that the UAVs know the number of emitters. The elements of the game are
then explained as follows.

Players: Since we do not consider any random factor, thus making the game a
deterministic one, each UAV and each emitter know the future evolution of the
game at the feedback Nash equilibrium. Hence, we can consider the game as a two-(virtual-)player one; i.e., both the UAV side and the emitter side are
controlled in centralized way. We assume that each emitter will be out of the
game once it is caught by any UAV; e.g., it is destroyed by the UAV. Hence,
the number of actual players may be changing during the game. We denote by
Ne (t) the set of emitters still surviving at time t.
In practice, when there is randomness in the observations or each UAV (emitter) has only limited knowledge of the system state, the communications among the UAVs or the emitters need to be considered, which raises the issue of team formation under a limited communication range. This more complicated case will be studied in the future.

State Space. For each individual UAV and emitter, the state is the same as in the single-UAV-single-emitter case. The system state space is the product of the individual ones; i.e., the state includes the locations and directions of all UAVs and emitters, denoted by {x_n^u}_{n=1,...,N_u}, {θ_n^u}_{n=1,...,N_u}, {x_n^e}_{n∈N_e(t)}, {θ_n^e}_{n=1,...,N_e}, as well as the emitters' transmission states. Note that, when an emitter is caught by a UAV, it is out of the game and the state space is reduced. Similarly to the single-UAV-single-emitter case, we still use s to denote the overall system state (excluding the discrete states of the transmission status of each emitter).

Action Space. For each individual UAV or emitter, its action space is the
same as the single-UAV-single-emitter case in the previous section. We simply
add superscript to distinguish the actions of different UAVs or emitters. For
simplicity, we do not add more constraints like collision avoidance or formation
maintenance.

Reward. Similarly to the single-UAV-single-emitter case, a reward is achieved by the UAVs when an emitter is caught, and a cost is incurred to an emitter if it stops transmitting. Due to the limited space, we consider only the non-discounted case, in which the reward is given by

$$R = \int_{t=0}^{T^*} \Big( e^{-\alpha t} R_0\, \delta\big(\|x_u^n - x_e^m(t)\| \le \gamma_d,\ \exists n,\, m \in \mathcal{N}_e(t)\big) - c \sum_{n} \beta^n \sum_{m \in \mathcal{N}_e(t)} a_e^m(t)\, \delta(t = n\tau_0) \Big)\, dt, \tag{25}$$

where T^* is the earliest time at which all emitters have been caught, i.e.,

$$T^* = \min\{t : |\mathcal{N}_e(t)| = 0\}. \tag{26}$$

Recall that R_0 is the reward for catching an emitter and c is the cost when an emitter transmits in one time slot. We can immediately obtain the instantaneous reward r(t) of the UAVs.

4.2 Multi-UAV-Single-Emitter Game


To study the general case, we first study the special case in which there is only one emitter. Similarly to the single-UAV case, we have the following conclusion for the multi-UAV-single-emitter game.

Proposition 3. The reward-to-go functions for the non-discounted reward are determined by

$$R_s((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s(0, a_e) \right], \tag{27}$$

and

$$R_s(t, 1) = \max_{f_u} \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta\big(\exists n,\ \|x_n^u(t) - x_e(t)\| < \gamma_d\big)\, dt + R_s(\tau^-) \right], \tag{28}$$

and

$$R_s(t, 0) = \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta\big(\exists n,\ \|x_n^u(t) - x_e(t)\| < \gamma_d\big)\, dt + R_s(\tau^-) \right]_{f_u = f_0}. \tag{29}$$

4.3 Multi-UAV-Multi-Emitter Game


Based on the discussion of the multi-UAV-single-emitter case, the general multi-UAV-multi-emitter case can be analyzed in a recursive manner: when an emitter is caught, the game is converted into a game with one less emitter.

Proposition 4. Suppose that the feedback Nash equilibrium for N_e − 1 emitters has been obtained; we use a superscript N_e − 1 in the reward-to-go function. The reward-to-go functions for the non-discounted reward are determined by

$$R_s^{N_e}((\tau_0)^-) = \min_{a_e} \left[ -c\, I(a_e = 1) + R_s^{N_e}(0, a_e) \right], \tag{30}$$

and

$$R_s^{N_e}(t, 1) = \max_{f_u} \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta\big(\exists n,\ \|x_n^u(t) - x_e(t)\| < \gamma_d\big)\, dt + R_s^{\tilde{N}_e}(\tau^-) \right], \tag{31}$$

where Ñ_e is the number of emitters after the time τ, i.e.,

$$\tilde{N}_e = \begin{cases} N_e - 1, & \text{if } \exists n, t,\ \|x_n^u(t) - x_e(t)\| < \gamma_d, \\ N_e, & \text{otherwise}, \end{cases} \tag{32}$$

and

$$R_s^{N_e}(t, 0) = \min_{f_e} \left[ \int_t^{\min(\tau, T^*)} R_0\, \delta\big(\exists n,\ \|x_n^u(t) - x_e(t)\| < \gamma_d\big)\, dt + R_s^{\tilde{N}_e}(\tau^-) \right]_{f_u = f_0}. \tag{33}$$

5 Numerical Results
In this section, we use numerical simulations to disclose some phenomena of the
pursuit-evasion game. For simplicity, we consider only one UAV and one RF
emitter.

5.1 Simulation Setup


We consider abstract length and time units. We assume v_u = 0.1, v_e = 0.02, F_u = 0.05 and F_e = 0.1. We assume γ_d = 0.1. Unless stated otherwise, the penalty of the RF emitter being caught by the UAV is 10, while the penalty of not transmitting is 3. For the case of discounted reward, the discounting factor is β = 0.9. We discretize d, δθ_1 and δθ_2 into a 40 × 20 × 20 grid. The value function is obtained from 50 iterations. For the case of non-discounted reward, we set t_f = 5; i.e., the RF emitter only needs to consider the game within 5 decision periods.

5.2 Case of Discounted Reward


Fig. 2 shows the value functions of different cases. We observe that the value
function is high when δθ1 is close to zero. The reason is that both the UAV and
RF emitter have similar initial direction; hence it is easier for the UAV to catch

[Fig. 2. Samples of value functions. Four panels plot the value against δθ_1 (for d = 1, θ_2 = 0 and d = 2.5, θ_2 = 0) and against d (for δθ_1 = π, δθ_2 = 0 and δθ_1 = 3π/2, δθ_2 = 0).]

[Fig. 3. Samples of tracks. Panels show trajectories for initial distances d = 7.07 and d = 4.24, comparing the always-transmitting emitter (left column) with the optimal strategy (right column).]



the RF emitter. We also observe that the value usually decreases as the initial distance between the UAV and the RF emitter increases (though there are some exceptions).
Fig. 3 shows the tracks of the UAV and RF emitter for different initial distances. In the left column, the RF emitter always keeps transmitting; eventually, it is caught by the UAV. In the right column, the RF emitter adopts the optimized strategy. We observe that the RF emitter can escape from the pursuit of the UAV by stopping its transmission at certain times.
Then, we increase the penalty of stopping transmission to 8. The tracks under the corresponding optimal strategy are shown in Fig. 4. We observe that, in both cases, the RF emitter is finally caught by the UAV, due to the large penalty of stopping transmission.

[Fig. 4. Samples of tracks when the penalty of ceasing transmission is increased. Panels show trajectories for d = 7.07 and d = 4.24, always-transmit (left) versus optimal strategy (right).]

5.3 Case of Non-discounted Reward


For the case of non-discounted reward, the value functions and the optimal actions in different stages are shown in Fig. 5. We observe that, in the 5-th stage, the RF emitter is more inclined to keep transmitting and take the risk of being caught by the UAV. The sample tracks are shown in Fig. 6. We observe that, in the first situation, the RF emitter stops transmitting to avoid the UAV at the beginning and finally gets caught by the UAV.

[Fig. 5. Samples of value functions and optimal actions when the reward is not discounted. Panels plot the optimal action and the value against d for stages 1 and 5.]

[Fig. 6. Samples of tracks when the reward is not discounted. Panels show trajectories for d = 5√2 and d = 3√2, always-transmit (left) versus optimal strategy (right).]



6 Conclusions

A The Isaacs Equation


We consider a differential game with N players over the time period [0, T], whose dynamics are given by (the system state x is in R^M)

$$\dot{x}(t) = f(t, x(t), u_1(t), \ldots, u_N(t)), \tag{34}$$

and the cost functionals are given by

$$L^n(u_1, \ldots, u_N) = \int_0^T g^n(t, x(t), u_1(t), \ldots, u_N(t))\, dt + q^n(x(T)). \tag{35}$$

We assume that each player has perfect access to all dimensions of the system state, i.e., closed-loop perfect state (CLPS) information. The following defines the feedback Nash equilibrium for the differential game.

Definition 1. For the N-player game in (34) and (35), an N-tuple of strategies {π_n^*}_{n=1,...,N} constitutes a feedback Nash equilibrium solution if there exist functionals V_n over [0, T] × R^M such that

$$V_n(T, x) = q^n(x), \tag{36}$$

$$V_n(t, x) = \int_t^T g^n(s, x^*(s), \pi_1^*(x^*), \ldots, \pi_N^*(x^*))\, ds + q^n(x^*(T)) \le \int_t^T g^n(s, x(s), \pi_1^*(x), \ldots, \pi_{n-1}^*(x), \pi_n(x), \pi_{n+1}^*(x), \ldots, \pi_N^*(x))\, ds + q^n(x(T)), \quad \forall \pi_n, \tag{37}$$

where x^* is the state trajectory when the actions are π_1^*, \ldots, π_N^*, and x is the state trajectory when the action of player n is changed to π_n.
The following theorem provides a sufficient condition for the feedback Nash equilibrium in the general N-player case.

Theorem 1. An N-tuple of strategies {π_n^*}_{n=1,...,N} provides a feedback Nash equilibrium if the functionals {V_n}_{n=1,...,N} satisfy the following equations:

$$-\frac{\partial V_n(t, x)}{\partial t} = \min_{u_n} \left[ \frac{\partial V_n(t, x)}{\partial x} f(t, x, \pi_{-n}^*(t, x), u_n) + g^n(t, x, \pi_{-n}^*(t, x), u_n) \right], \tag{38}$$

$$\pi_n^*(t, x) = \arg\min_{u_n} \left[ \frac{\partial V_n(t, x)}{\partial x} f(t, x, \pi_{-n}^*(t, x), u_n) + g^n(t, x, \pi_{-n}^*(t, x), u_n) \right], \tag{39}$$

and

$$V_n(T, x) = q^n(x). \tag{40}$$
The following theorem provides a sufficient condition for the two-player zero-sum game in which the cost for player 1 is given by

$$L(u_1, u_2) = \int_0^T g(t, x(t), u_1(t), u_2(t))\, dt + q(T, x(T)), \tag{41}$$

and the cost of player 2 is −L(u_1, u_2).

Theorem 2. The value function of the two-player zero-sum differential game satisfies the following Isaacs equation:

$$-\frac{\partial V}{\partial t} = \min_{u_1} \max_{u_2} \left[ \frac{\partial V}{\partial x} f(t, x, u_1, u_2) + g(t, x, u_1, u_2) \right] = \max_{u_2} \min_{u_1} \left[ \frac{\partial V}{\partial x} f(t, x, u_1, u_2) + g(t, x, u_1, u_2) \right]. \tag{42}$$

References
1. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society
for Industrial and Applied Mathematics (1999)
2. Beard, R.W., McLain, T.W., Nelson, D.B., Kingston, D., Johanson, D.: Decentral-
ized cooperative aerial surveillance using fixed-wing miniature UAVs. Proceedings
of the IEEE 94(7), 1306–1324 (2006)
3. Bullo, F., Cortes, J., Martinez, S.: Distributed Control of Robotic Networks: A
Mathematical Approach to Motion Coordination Algorithms. Princeton University
Press (2009)
4. DeLima, P., York, G., Pack, D.: Localization of ground targets using a flying sensor
network. In: Proc. of IEEE International Conference on Sensor Networks, Ubiqui-
tous, and Trustworthy Computing, vol. 1, pp. 194–199 (2006)
5. Elsaesser, D.: Emitter geolocation using low-accuracy direction-finding sensors. In:
IEEE Symposium on Computational Intelligence for Security and Defense Appli-
cations, CISDA, pp. 1–7 (2009)
6. Isaacs, R.: Differential Games. Wiley (1965)
7. Lunze, J., Lararrigue, F.L.: Handbook of Hybrid Systems Control: Theory, Tools
and Applications. Cambridge Univ. Press (2009)
8. Nerode, A., Remmel, J.B., Yakhnis, A.: Hybrid system games: Extraction of control
automata with small topologies. In: Handbook of Hybrid Systems Control: Theory,
Tools and Applications. Cambridge Univ. Press (2009)
9. Scerri, P., Glinton, R., Owens, S., Sycara, K.: Locating RF Emitters with Large
UAV Teams. In: Pardalos, P.M., Murphey, R., Grundel, D., Hirsch, M.J. (eds.) Adv.
in Cooper. Ctrl. & Optimization. LNCIS, vol. 369, pp. 1–20. Springer, Heidelberg
(2007)
10. Scerri, P., Glinton, R., Owens, S., Scerri, D., Sycara, K.: Geolocation of RF emit-
ters by many UAVs. In: AIAA, Infotech@Aerospace 2007 Conference and Exhibit
(2007)
11. Walter, D.J., Klein, J., Bullmaster, J.K., Chakravarthy, C.V.: Multiple UAV tomography based geolocation of RF emitters. In: Proc. of the SPIE Defense, Security, and Sensing 2010 Conference, Orlando, FL, April 5-9 (2010)
Learning Correlated Equilibria
in Noncooperative Games with Cluster
Structure

Omid Namvar Gharehshiran and Vikram Krishnamurthy

University of British Columbia


Department of Electrical and Computer Engineering
2332 Main Mall, Vancouver, BC V6T 1Z4, Canada
{omidn,vikramk}@ece.ubc.ca

Abstract. We consider learning correlated equilibria in noncooperative


repeated games where players form clusters. In each cluster, players ob-
serve the action profile of cluster members and receive local payoffs,
associated with performing localized tasks within their clusters. Players also acquire global payoffs due to global interaction with players outside the cluster; however, they are oblivious to the actions of those players. A novel adaptive learning algorithm is presented which generates trajectories of the empirical frequency of joint plays that converge almost surely to the set of correlated ε-equilibria. Thus, sophisticated rational global behavior is achieved by individual players' simple local behavior.

Keywords: Adaptive learning, correlated equilibrium, differential in-


clusions, stochastic approximation.

1 Introduction
Consider a noncooperative repeated game with a set of players comprising multi-
ple non-overlapping clusters. Clusters are characterized by the subset of players
that perform the same task locally and share information of their actions with
each other. However, clusters do not disclose their action profile to other clusters.
In fact, players inside clusters are even oblivious to the existence of other clusters
or players. Players repeatedly take actions to which two payoffs are associated: i)
local payoffs: due to performing localized tasks within clusters, ii) global payoffs:
due to global interaction with players outside clusters. The incremental informa-
tion that players acquire at the end of each period then comprises: i) the realized
payoff, delivered by a third party (e.g. network controller in sensor networks),
and ii) the observation of the action profile of cluster members. Players then utilize this
information and continuously update their strategies – via the proposed regret-
based learning algorithm – to maximize their expected payoff. The question we
tackle in this paper is: Given this simple local behavior of individual agents, can
the clustered network of players achieve sophisticated global behavior? Similar
problem have been studied in the Economics literature. For seminal works, the
reader is referred to [1,2,3].


Main Results: Regret-matching as a strategy of play in long-run interactions


has been introduced in [1,2]. In [1] the authors prove that when all players share
action information and follow the proposed regret-based learning procedure, un-
der general conditions, the global behavior converges to the set of correlated equi-
librium. A milder assumption is that players only observe the outcome, namely,
stage payoffs. A regret-based reinforcement learning algorithm is proposed in [2]
whereby players build statistics of their past experience and infer how their pay-
off would have improved based on the history of realized payoffs. Our model
differs from the above works as it incorporates cluster structure where action
information is only locally shared. The main result of this paper is that if every
player follows the proposed adaptive regret-based learning algorithm, the global
behavior of the network converges to the set of correlated ε-equilibria [4]. The
presented learning procedure can be simply regarded as a non-linear adaptive
filtering algorithm. In addition, we show (via empirical numerical studies) that,
taking advantage of the excess information disclosed within clusters, an order of
magnitude faster convergence to the set of correlated ε-equilibria can be achieved
as compared to the regret-based reinforcement learning algorithm in [2].
Correlated equilibrium is a generalization of Nash equilibrium and describes
a condition of competitive optimality. It is, however, preferable for online adaptive learning in distributed systems with tight computation/energy constraints (e.g. wireless sensor networks [5,6]) due to the following reasons:
i) Structural Simplicity: it is a convex polytope, whereas the Nash equilibria
are isolated points at the extrema of this set [7], ii) Computational Simplicity:
computing correlated equilibrium requires solving a linear program (that can be
solved in polynomial time), whereas computing Nash equilibrium necessitates
finding fixed points. iii) Coordination Capability: it directly takes into account
the ability of players to coordinate their actions. Indeed, Hart and Mas-Colell
observe in [2] that for most simple adaptive procedures, “...there is a natural
coordination device: the common history, observed by all players. It is thus rea-
sonable to expect that, at the end, independence among players will not obtain.”
This coordination leads to potentially higher payoffs than if players take their
actions independently as required by Nash equilibrium [4].

Context: The motivation for such formulation stems from multi-agent net-
works that require some sort of cluster structure such as intruder monitoring
in sensor networks. Consider a multiple-target localization scenario in an unat-
tended ground sensor network [5,6]. Depending on their locations, sensors form
clusters each responsible for localizing a particular target. Sensors receive two
payoffs: i) local payoffs, based on the importance and accuracy of the informa-
tion provided about the local phenomena, ii) global payoffs, for communicating
the collected data to the sink through the communication channel, which is
globally shared amongst all sensors. Consideration of the potential local interac-
tion among sensors leads to a more realistic modeling, hence, more sophisticated
design of reconfigurable networked sensors.

2 Regret-Based Learning with Cluster Structure

2.1 Game Model


Consider the finite repeated strategic-form noncooperative game:
       
$$\mathcal{G} = \left( \mathcal{K},\ (\mathcal{C}_m)_{m \in \mathcal{M}},\ (\mathcal{A}^k)_{k \in \mathcal{K}},\ (U^k)_{k \in \mathcal{K}},\ (\sigma^k)_{k \in \mathcal{K}} \right), \tag{1}$$

where each component is described as follows:


1) Set of Players: K = {1, 2, . . . , K}. Individual players are denoted by k ∈ K.
2) Local Clusters C_m: The set K is partitioned into M non-overlapping clusters C_m ⊂ K, m ∈ M = {1, . . . , M}. We make the cluster monitoring assumption: k, k′ ∈ C_m if and only if k knows a_n^{k′} and k′ knows a_n^k at the end of period n. Note that isolated players, which do not belong to any cluster, are formulated as singleton clusters.
3) Action Set: A^k = {1, 2, . . . , A^k} denotes the set of action indices for each player k, where |A^k| = A^k.
4) Payoff Function: U^k : A^K → R denotes the payoff function for each player k. Here, A^K = ×_{k∈K} A^k represents the set of K-tuples of action profiles. A generic element of A^K is denoted by a = (a^1, . . . , a^K) and can be rearranged as (a^k, a^{−k}) for any player k, where a^{−k} ∈ ×_{k′∈K, k′≠k} A^{k′}.
The payoff for each player k ∈ K is formulated as

$$U^k(a^k, a^{-k}) = U_l^k(a^k, a^{C_m}) + U_g^k(a^k, a^{-C_m}). \tag{2}$$

Here, a^{C_m} ∈ ×_{k′∈C_m, k′≠k} A^{k′} and a^{−C_m} ∈ ×_{k′∉C_m} A^{k′} denote the joint action profile of cluster C_m (to which player k belongs) excluding player k, and the joint action profile of all players excluding cluster C_m, respectively. In addition, U_l^k(a^k, a^{C_m}) = 0 if cluster C_m is singleton.
Time is discrete, n = 1, 2, . . .. Each player k takes an action a_n^k at time instant n and receives a payoff U_n^k(a_n^k). Each player is assumed to know its local payoff function U_l^k(·); hence, taking action a_n^k and knowing a_n^{C_m}, it is capable of evaluating its stage local payoff. Players do not know the global payoff function U_g^k(·). However, they can compute their realized global payoffs as follows:

$$U_{g,n}^k(a_n^k) = U_n^k(a_n^k) - U_l^k(a_n^k, a_n^{C_m}). \tag{3}$$

Note that, even if players knew U_g^k(·), they could not compute their stage global payoffs, as they are unaware of the actions taken by players outside the cluster, namely, a_n^{−C_m}.
5) Strategy σ^k: At period n, each player k selects actions according to a randomized strategy σ^k ∈ ΔA^k = {p^k ∈ R^{A^k} : p^k(a) ≥ 0, \sum_{a∈A^k} p^k(a) = 1}. The learning algorithm is an adaptive procedure whereby obtaining a relatively high payoff from a given action i at period n increases the probability σ_{n+1}^k(i) of choosing that action in the following period.

2.2 Learning Correlated ε-equilibria


The game G, defined in (1), is played repeatedly in discrete time n = 1, 2, . . .. Each player k maintains two average regret matrices and updates their elements over time: (i) ᾱ^k of size A^k × A^k, which records the average local regrets, and (ii) β̄^k of size A^k × A^k, which is an unbiased estimator of the average global regrets. Each element ᾱ_n^k(i, j), i, j ∈ A^k, gives the time-averaged regret, in terms of gains and losses in local payoff values, had the player selected action j every time he played action i in the past. However, players are not capable of computing their global payoffs and only receive the realized values. Each element β̄_n^k(i, j), i, j ∈ A^k, thus provides an unbiased estimate (based on the realized global payoffs) of the average regret for replacing action j every time i was played in the past.
Positive overall regrets (the sum of local and global regrets) imply the opportunity to gain a higher payoff by switching actions. Therefore, agents take only the positive regrets |ᾱ_n^k(i, j) + β̄_n^k(i, j)|^+ into account to determine the switching probabilities σ_n^k. Here, |x|^+ = max{0, x}. The more positive the regret for not choosing an action, the higher the probability that the player picks that action. At each period, with probability 1 − δ, player k chooses its next action according to |ᾱ_n^k(i, j) + β̄_n^k(i, j)|^+. With the remaining probability δ, player k randomizes amongst the actions A^k according to a uniform distribution. This can be interpreted as "exploration," which is essential as players continuously learn their global payoff functions. Exploration forces all actions to be chosen with a minimum frequency and hence rules out actions being rarely chosen.
The adaptive regret-based learning algorithm can then be summarized as follows:
Algorithm 1: Adaptive Regret-based Learning with Partial Local Information

0) Initialization: Set 0 < δ < 1. Initialize ψ_0^k(i) = 1/A^k for all i ∈ A^k, ᾱ_0^k = 0_{A^k×A^k} and β̄_0^k = 0_{A^k×A^k}.
For n = 1, 2, . . . repeat the following steps:
1) Strategy Update and Action Selection: Select action a_n^k = j according to the distribution

$$\sigma_n^k = (1 - \delta)\, \mu_n^k + \frac{\delta}{A^k} \cdot \mathbf{1}_{A^k}, \tag{4}$$

where 1_{A^k} = [1, 1, · · · , 1]_{A^k×1} and μ_n^k denotes an invariant measure for the following transition probabilities:

$$\psi_n^k(i) = \begin{cases} \dfrac{1}{\xi^k} \left| \bar{\alpha}_{n-1}^k(a_{n-1}^k, i) + \bar{\beta}_{n-1}^k(a_{n-1}^k, i) \right|^+, & i \ne a_{n-1}^k, \\[1ex] 1 - \sum_{j \in \mathcal{A}^k,\, j \ne i} \psi_n^k(j), & i = a_{n-1}^k. \end{cases} \tag{5}$$

Here, ξ^k is chosen such that ξ^k > \sum_{j∈A^k−\{a_{n−1}^k\}} ψ_n^k(j).
2) Local Information Exchange: Player k (i) broadcasts a_n^k to the cluster members, and (ii) receives the actions of the cluster members and forms the profile a_n^{C_m}.
3) Regret Update:
3.1) Local Regret Update:

$$\bar{\alpha}_n^k(i, j) = \bar{\alpha}_{n-1}^k(i, j) + \epsilon_n \left[ \left( U_l^k(j, a_n^{C_m}) - U_l^k(a_n^k, a_n^{C_m}) \right) I\{a_n^k = i\} - \bar{\alpha}_{n-1}^k(i, j) \right]. \tag{6}$$

3.2) Global Regret Update:

$$\bar{\beta}_n^k(i, j) = \bar{\beta}_{n-1}^k(i, j) + \epsilon_n \left[ \frac{\sigma_n^k(i)}{\sigma_n^k(j)}\, U_{g,n}^k(a_n^k)\, I\{a_n^k = j\} - U_{g,n}^k(a_n^k)\, I\{a_n^k = i\} - \bar{\beta}_{n-1}^k(i, j) \right]. \tag{7}$$

Here, I{·} denotes the indicator function and the step-size is selected as ε_n = 1/(n + 1) (in static games) or ε_n = ε̄, 0 < ε̄ ≪ 1 (in slowly time-varying games).
4) Recursion: Set n ← n + 1 and go to Step 1.
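A compact numerical sketch of one period of Algorithm 1 for a single player is given below (ours, not the authors' code). We interpret the transition probabilities (5) row-wise to build a full transition matrix, approximate the invariant measure μ_n^k by power iteration (assuming the resulting chain is ergodic), and take the known local payoff row U_l(·, a^{C_m}) and the realized global payoff (3) as inputs.

```python
import numpy as np

def mixed_strategy(alpha, beta, delta, xi, iters=200):
    """Eqs. (4)-(5): exploration-blended invariant measure of the regret chain."""
    A = alpha.shape[0]
    P = np.maximum(alpha + beta, 0.0) / xi        # |alpha + beta|^+ / xi
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))      # rows sum to one (xi large enough)
    mu = np.full(A, 1.0 / A)
    for _ in range(iters):                        # power iteration: mu P = mu
        mu = mu @ P
    return (1.0 - delta) * mu + delta / A         # eq. (4)

def update_regrets(alpha, beta, eps, a, sigma, U_loc_row, U_glob):
    """Eqs. (6)-(7): a = chosen action, U_loc_row[j] = U_l(j, a^{Cm}) for the
    observed cluster profile, U_glob = realized global payoff from (3)."""
    A = alpha.shape[0]
    for i in range(A):
        for j in range(A):
            loc = (U_loc_row[j] - U_loc_row[a]) if i == a else 0.0
            alpha[i, j] += eps * (loc - alpha[i, j])
            glob = (sigma[i] / sigma[j]) * U_glob * (a == j) - U_glob * (a == i)
            beta[i, j] += eps * (glob - beta[i, j])
```

An action is then drawn as np.random.choice(A, p=mixed_strategy(alpha, beta, delta, xi)).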

Remark 1. The game model may evolve with time due to: i) players joining/leaving the game, ii) players appending/shrinking the set of choices, iii) changes in players' incentives, and iv) changes in cluster membership agreements. In these cases, to keep players responsive to the changes, a constant step-size ε_n = ε̄ is required in (6) and (7). Algorithm 1 cannot respond to multiple successive changes in the game as players' strategies are functions of the time-averaged regrets.

3 Global Behavior and Convergence Analysis

3.1 Global Behavior and Correlated ε-equilibrium


Consider the game G, defined in (1), and suppose each player employs Algorithm 1 to pick its action for the next period. The global behavior, denoted by z̄_n, is defined as the (discounted) empirical frequency of the joint play of all players. Formally,

$$\bar{z}_n = \begin{cases} \dfrac{1}{n} \sum_{\tau \le n} e_{a_\tau}, & \text{if } \epsilon_n = \frac{1}{n}, \\[1ex] \bar{\epsilon} \sum_{\tau \le n} (1 - \bar{\epsilon})^{n-\tau} e_{a_\tau}, & \text{if } \epsilon_n = \bar{\epsilon}, \end{cases} \tag{8}$$

where e_{a_τ} denotes the \prod_{k∈K} A^k-dimensional unit vector with the element corresponding to a_τ equal to one. The second line in (8) is a discounted version of the first and will be used in slowly evolving games. Note that z̄_n is only used for the global convergence analysis of Algorithm 1; it does not need to be computed by the players. However, in multi-agent systems such as sensor networks, a network controller can monitor z̄_n and use it to adjust sensors' parameters, thereby changing the equilibrium set in novel ways.
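For instance, in the constant step-size case such a controller can maintain z̄_n recursively, since the second line of (8) satisfies z̄_n = z̄_{n−1} + ε̄(e_{a_n} − z̄_{n−1}); a one-line sketch (ours):

```python
import numpy as np

def update_behavior(z, e_a, eps_bar):
    """Recursive form of the discounted empirical frequency in (8)."""
    return z + eps_bar * (e_a - z)
```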

Before proceeding with the main theorem of this paper, we provide the definition of the correlated ε-equilibrium set C_ε.

Definition 1. Let π denote a joint distribution on A^K, where π(a) ≥ 0 for all a ∈ A^K and \sum_{a∈A^K} π(a) = 1. The set of correlated ε-equilibria, denoted by C_ε, is the convex set [4]

$$\mathcal{C}_\epsilon = \left\{ \pi : \sum_{a^{-k}} \pi^k(i, a^{-k}) \left[ U^k(j, a^{-k}) - U^k(i, a^{-k}) \right] \le \epsilon,\ \forall i, j \in \mathcal{A}^k,\ \forall k \in \mathcal{K} \right\}. \tag{9}$$

For ε = 0 in (9), C_0 is called the set of correlated equilibria.

In (9), π^k(i, a^{−k}) denotes the probability of player k choosing action i and the rest playing a^{−k}. Definition 1 simply states that when the recommended signal a, chosen according to the distribution π, allocates positive probability to playing action i by player k, choosing j ∈ A^k − {i} (instead of i) does not lead to a higher expected payoff.
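For small games, Definition 1 can be checked numerically; the sketch below (ours) represents π as a K-dimensional array over A^K and the payoffs as tensors U[k] of the same shape, and returns the largest deviation gain on the left-hand side of (9).

```python
import numpy as np

def max_deviation_gain(pi, U):
    """pi lies in C_eps iff the returned value is <= eps (cf. (9))."""
    worst = -np.inf
    for k in range(pi.ndim):                      # each player k
        for i in range(pi.shape[k]):              # recommended action i
            pi_i = np.take(pi, i, axis=k)         # pi^k(i, a^{-k})
            Ui = np.take(U[k], i, axis=k)
            for j in range(pi.shape[k]):          # candidate deviation j
                gain = (pi_i * (np.take(U[k], j, axis=k) - Ui)).sum()
                worst = max(worst, gain)
    return worst
```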

3.2 Convergence to Correlated ε-equilibrium


The following theorem states the main result of this paper:

Theorem 1. Suppose each player k ∈ K employs the learning procedure in Algorithm 1. Then, for each ε > 0, there exists δ̂(ε) such that if δ < δ̂(ε) in Algorithm 1, the global behavior z̄_n converges almost surely (for ε_n = 1/n) to the set of correlated ε-equilibria in the following sense:

$$\bar{z}_n \xrightarrow{a.s.} \mathcal{C}_\epsilon \text{ as } n \to \infty \iff d(\bar{z}_n, \mathcal{C}_\epsilon) = \inf_{z \in \mathcal{C}_\epsilon} \|\bar{z}_n - z\| \xrightarrow{a.s.} 0 \text{ as } n \to \infty. \tag{10}$$

For constant step-size ε_n = ε̄, z̄_n weakly tracks C_ε.

The above theorem implies that, for decreasing step-size ε_n = 1/n, the stochastic process z̄_n enters and stays in the correlated ε-equilibrium set C_ε forever with probability one. In other words, with probability one, for any ε > 0 there exists N(ε) > 0 such that for n > N(ε), one can find a correlated equilibrium π ∈ C_0 within ε-distance of z̄_n. In addition, if the game evolves slowly enough with time, Algorithm 1 can properly track the time-varying set of correlated ε-equilibria.

Remark 2. If one replaces δ in Algorithm 1 with δ_n, such that δ_n → 0 slowly enough as n → ∞, convergence to the set of correlated equilibria C_0 (instead of ε-equilibria C_ε) can be achieved in static games. This result cannot be extended to games that evolve slowly with time.

Proof. The proof uses concepts in stochastic averaging theory [8] and Lyapunov
stability of differential inclusions [9]. In what follows, a sketch of the proof will
be presented:

1) Asymptotics of the Discrete-time Dynamics: Trajectories of the piecewise constant continuous-time interpolations of the stochastic processes \bar\alpha^k_n and \bar\beta^k_n converge almost surely (for ε_n = 1/n), as n → ∞, or weakly track (for ε_n = ε̄), as ε̄ → 0, trajectories \bar\alpha^k(t) and \bar\beta^k(t) evolving according to the inter-connected system of a differential inclusion and a differential equation:

\frac{d\bar\alpha^k}{dt} \in L^k\big(\bar\alpha^k, \bar\beta^k\big) - \bar\alpha^k, \qquad \frac{d\bar\beta^k}{dt} = G^k\big(\bar\alpha^k, \bar\beta^k\big) - \bar\beta^k, \quad (11)

where the elements of the set-valued matrix L^k(\bar\alpha^k, \bar\beta^k) and the matrix G^k(\bar\alpha^k, \bar\beta^k) are given by:

L^k_{ij}\big(\bar\alpha^k, \bar\beta^k\big) = \big[U^k_l(j, \nu^{C_m}) - U^k_l(i, \nu^{C_m})\big]\,\sigma^k(i); \quad \nu^{C_m} \in \Delta A^{C_m - \{k\}}, \quad (12)

G^k_{ij}\big(\bar\alpha^k, \bar\beta^k\big) = \big[U^k_{g,t}(j) - U^k_{g,t}(i)\big]\,\sigma^k(i), \quad (13)

for some bounded measurable process U^k_{g,t}(·). Here,

U^k_l(a^k, \nu^{C_m}) = \int_{A^{C_m-\{k\}}} U^k_l(a^k, a^{C_m})\, d\nu^{C_m}(a^{C_m}), \quad (14)

and \Delta A^{C_m-\{k\}} denotes the simplex of probability measures over A^{C_m-\{k\}}. The proof for the case of a slowly time-varying game involves mean square error bounds and weak convergence analysis.

Furthermore, if (11) is Lyapunov stable, trajectories of the continuous-time interpolations of the stochastic processes \bar\alpha^k_n and \bar\beta^k_n converge almost surely (for ε_n = 1/n), as n → ∞, or weakly track (for ε_n = ε̄), as ε̄ → 0, the set of global attractors of (11).
2) The coupled system (11) of a differential inclusion and a differential equation is Lyapunov stable, and the set of global attractors is characterized by [\bar\alpha^k(i,j) + \bar\beta^k(i,j)]^+ being confined within an ε-distance of \mathbb{R}^-, for all i, j ∈ A^k. Formally, for almost every solution to (11),

\lim_{t\to\infty} \big[\bar\alpha^k_t(i,j) + \bar\beta^k_t(i,j)\big]^+ \le \varepsilon, \quad \forall i, j \in A^k. \quad (15)

This, together with step 1, proves that if player k employs the learning procedure
in Algorithm 1, ∀ε ≥ 0, there exists δ̂(ε) ≥ 0 such that if δ ≤ δ̂(ε) in Algorithm 1:

Table 1. Agents' Payoff Matrix

Local: (U_l^1, U_l^2)
                2: x1      2: x2
    1: x1      (3, 5)     (2, 3)
    1: x2      (3, 3)     (5, 4)

Global: (U_g^1, U_g^2, U_g^3)
                          3: y1                            3: y2
                2: x1          2: x2           2: x1          2: x2
    1: x1    (−1, 3, 1)     (2, −1, 3)      (1, −1, 3)     (0, 3, 1)
    1: x2    (1, −1, 3)     (1, 4, 1)       (3, 3, 1)      (−1, 0, 3)

\limsup_{n\to\infty} \big[\bar\alpha^k_n(i,j) + \bar\beta^k_n(i,j)\big]^+ \le \varepsilon \quad \text{w.p. } 1, \quad \forall i, j \in A^k. \quad (16)

3) The global behavior \bar z_n converges to C_ε if and only if (16) holds for all players k ∈ K. Thus, if every player k follows Algorithm 1, \bar z_n converges almost surely (in static games) to, or weakly tracks (in slowly evolving games), the set of correlated ε-equilibria C_ε.

4 Numerical Example

In this section we study a small hypothetical multi-agent network comprising


three agents K = {1, 2, 3}. Agents 1 and 2 are allocated the same task, hence,
form the cluster C = {1, 2} and share action information. Agent 3 forms a sin-
gleton cluster, hence, neither observes the action profile of C, nor discloses its
action to agents 1 and 2. Agents 1 and 2 repeatedly take action from the same
action set A1 = A2 = {x1 , x2 }. Agent 3, due to performing a different task,
chooses from a different action set A3 = {y1 , y2 }. Table 1 gives the payoffs in
normal form. The set of correlated equilibria is a singleton (a pure strategy), assigning probability one to a* = (x2, x2, y1) and zero to all other joint actions.
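For readers who wish to reproduce the example, the entries of Table 1 can be stored as arrays as sketched below (the indexing convention, 0 for x1/y1 and 1 for x2/y2, is mine); how the local and global payoffs combine into each agent's overall utility follows the cluster structure described in the earlier sections and is not restated here. These arrays can then be fed to a membership test such as the one sketched after Definition 1.

```python
import numpy as np

# Local payoffs (U_l^1, U_l^2), indexed [a1][a2] with 0 = x1, 1 = x2.
U_local = {
    1: np.array([[3, 2], [3, 5]]),
    2: np.array([[5, 3], [3, 4]]),
}

# Global payoffs (U_g^1, U_g^2, U_g^3), indexed [a1][a2][a3], a3: 0 = y1, 1 = y2.
U_global = {
    1: np.array([[[-1, 1], [2, 0]], [[1, 3], [1, -1]]]),
    2: np.array([[[3, -1], [-1, 3]], [[-1, 3], [4, 0]]]),
    3: np.array([[[1, 3], [3, 1]], [[3, 1], [1, 3]]]),
}
```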
In the numerical studies, we set ε_n = 1/n and δ = 0.1. Figure 1 illustrates the behavior of Algorithm 1 and compares its performance to the reinforcement learning algorithm proposed in [2]. The sample paths shown in Fig. 1 are averaged over 50 independent runs of the algorithms starting from the same initial condition a_1 = (x1, x1, y1). Note that Theorem 1 proves convergence to the set of correlated ε-equilibria. Therefore, although the average utilities increase with the number of iterations in Fig. 1(a), they only reach an ε-distance of the values achievable in correlated equilibrium, depending on the choice of the exploration parameter δ in Algorithm 1. Comparing the slopes of the lines in Fig. 1(b), m_1 = −0.182 (for regret-based reinforcement learning [2]) and m_2 = −0.346 (for Algorithm 1), numerically verifies that exploiting local action information results in an order of magnitude faster convergence to the set of correlated ε-equilibria.

[Figure 1(a): sample paths of the average overall utility vs. iteration number n]

[Figure 1(b): distance to the correlated equilibrium set vs. iteration number, on a log–log scale]

Fig. 1. Performance Comparison: The solid and dashed lines represent the results from
Algorithm 1 and the reinforcement learning algorithm in [2], respectively. In (a), the
blue, red and black lines illustrate the sample paths of average payoffs of agents 1, 2 and
3, respectively. The dotted lines also represent the payoffs achievable in the correlated
equilibrium.

5 Conclusion
We considered noncooperative repeated games with cluster structure and
presented a simple regret-based adaptive learning algorithm that ensured con-
vergence of global behavior to the set of correlated ε-equilibria. Noting that
reaching correlated equilibrium can be conceived as consensus formation in ac-
tions amongst players, the proposed learning algorithm could have significant

applications in frameworks where coordination is sought among “players” in a


distributed fashion, e.g., smart sensor systems and cognitive radio. It was numerically verified that utilizing the excess information shared/observed within clusters could lead to an order of magnitude faster convergence.

References
1. Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilib-
rium. Econometrica 68, 1127–1150 (2000)
2. Hart, S., Mas-Colell, A.: A reinforcement procedure leading to correlated equilibrium. In: Economic Essays: A Festschrift for Werner Hildenbrand, pp. 181–200 (2001)
3. Hart, S., Mas-Colell, A.: A general class of adaptive strategies. Journal of Economic
Theory 98, 26–54 (2001)
4. Aumann, R.J.: Correlated equilibrium as an expression of Bayesian rationality.
Econometrica: Journal of the Econometric Society 55, 1–18 (1987)
5. Krishnamurthy, V., Maskery, M., Yin, G.: Decentralized adaptive filtering algo-
rithms for sensor activation in an unattended ground sensor network. IEEE Trans-
actions on Signal Processing 56, 6086–6101 (2008)
6. Gharehshiran, O.N., Krishnamurthy, V.: Coalition formation for bearings-only localization in sensor networks – a cooperative game approach. IEEE Transactions on Signal Processing 58, 4322–4338 (2010)
7. Nau, R., Canovas, S.G., Hansen, P.: On the geometry of Nash equilibria and correlated equilibria. International Journal of Game Theory 32, 443–453 (2004)
8. Kushner, H.J., Yin, G.: Stochastic Approximation Algorithms and Applications,
2nd edn. Springer, New York (2003)
9. Benaïm, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions, Part II: Applications. Mathematics of Operations Research 31, 673–695 (2006)
Marketing Games in Social Commerce

Dohoon Kim

School of Business, Kyung Hee University


Hoegi-dong 1, Dongdaemoon-gu, Seoul 130-701, Korea (South)
[email protected]

Abstract. This study first provides a stylized model that captures the essential features of the SC (Social Commerce) business. The model focuses on the relationship between key decision issues such as marketing inputs and the revenue stream. As more SCs join the industry, they inevitably face fierce competition, which may lead to a sharp increase in total marketing and advertising expenditure. This type of competition may lead the industry away from its optimal development path and, at worst, toward a disruption of the entire industry. Such being the case, another goal of this study is to examine the possibility that the tragedy of commons may occur in the industry. Our basic analysis presents Nash equilibria with both homogeneous and heterogeneous players. Under a symmetric situation with homogeneous SCs, our analysis specifies the conditions under which the tragedy of commons can occur. Further discussions provide strategic implications and policy directions to overcome the shortcomings intrinsic to the current business model and help the industry develop sustainably toward the next level.

Keywords: Social commerce, SNS, Marketing competition, Game model,


Tragedy of commons, Regulation.

1 Introduction
SC (Social Commerce, or social shopping) providers started their business by combining group buying with selling discounts from their partners over the Internet. SC providers split the revenue with their business partners at a predefined commission rate. After Groupon first initiated this business model in 2009, this type of service has been called 'group buying,' since the service proposals become effective only when more than a certain number of customers buy the coupons. The SC services are also called 'daily deals' or 'flash deals,' which emphasizes the aspect of the service offerings that they are usually valid only for a short period of time.
SC, barely three years old as a new industry, has been witnessing rapid growth, and
more customers, business partners and investors have joined the industry. More than
500 SC providers (hereafter, simply referred to as SCs) are running their business
worldwide ([15]).¹ In Korea, one of the hottest regions of the SC industry, the
¹ The statistics vary to some extent since the ways of defining the SC industry differ across countries. Other statistics suggest that the number of SCs in mid-2011 amounted to 320 in the U.S., more than 3,000 in China, more than 300 in Japan, and 230 in Korea, respectively (Kim, 2011; Lee, 2011; ROA Holdings, 2011).


transaction scale over one quarter amounts to more than 200 million dollars. The sales revenue of SCs increased from 45 million dollars in 2010 to almost 500 million dollars in 2011. These figures mean that the industry has grown 10 times in terms of sales revenue and 20 times in terms of transaction scale over a year. As of the end of 2011, more than a third of the population in Korea had subscribed to and experienced the service ([9]). One observes similar figures in the East Asian region, where the SC business is most popular outside the U.S. Over the past years, for example, the sales revenue has increased from 780 billion dollars to more than 1 trillion dollars in the U.S., from 1,200 million dollars to 3,550 million dollars in China, and from 8,400 million dollars to 11 billion dollars in Japan ([5]).
The emergence of SC reflects the collective bargaining power of end-users, as the Internet has shifted bargaining power from sellers to customers. One of the distinct examples of this change is what SNS (Social Network Service) brought to distribution channels and marketing efforts. Thanks to this new opportunity, customers, particularly the younger generations who are now initiating and shaping consumer trends, have been exposed to more deals, discount chances, and new information around their local areas. Accordingly, they have been easily lured by the service proposals from SCs and gave a boost to the industry in its early stage.
However, many criticisms of the SC businesses are emerging now: for example, [12], [14], [15], [18], [19], etc. These startups have drawn skepticism for unusual accounting practices, rising marketing costs and inadequate quality assurance. This could make it more difficult for SCs to lure investors. Actually, Groupon experienced unstable stock price fluctuations after its IPO, and LivingSocial withdrew its moves toward an IPO. However, the most urgent, critical view on the SC industry points out that the industry's growth rates are unsustainable. Some also argue that the business model of SC has flaws that cannot be justified by the currently overheated market. The resulting instability may suddenly leave customers, partners and investors disenchanted. According to the Korea Consumer Protection Board, the number of accusation cases about service failures and customer damages reached 500 in 2011 ([9]). Furthermore, many SCs seem to suffer from huge marketing expenses and low ARPU (Average Revenue Per User). This result was predictable, since their business practices reinvest a big portion of revenue in advertising and promotion and maintain a wide range of service offerings whose assortment and operations costs are too high to justify. Actually, in most countries, ARPU has remained at a very low level: for example, [9], [11], [16].
The prosperity and adversity of the SC industry carry meaningful implications for other e-commerce industries. The business model of SC may seem IT-intensive at a glance, but it relies heavily upon laborious work. In fact, human resources and manpower are the main source of competitiveness in this industry. The SC business model requires investigating various commercial districts, negotiating and contracting with the partners, and advertising and promoting the service offerings to anonymous consumers. All of these activities require human intervention. This explains why the average sales per employee are far lower than those in other e-commerce sectors such as SNS, search engines, and business portals ([5], [11]). Thus, the low entry barrier in the SC industry is very likely to propel the industry into a chicken game in marketing.

The worst outcome of the persistence of the current situation is that the business model could end up as another bubble and the entire industry could collapse. SCs, entering a new phase, should revise the value proposition that they deliver to the market and develop a niche differentiated from online shopping malls or open markets.
This study aims at providing a stylized model that captures the essential features of
the SC business model. We will analyze the model to see whether SC is sustainable or
not and find some conditions for stable evolution of the industry. Our approach first
focuses on the relationship between marketing efforts and revenue stream. As more
SCs join the industry, fierce competition is inevitable, resulting in a sharp increase in marketing and advertising expenditure. This type of competition may lead the
industry away from its optimal development path, and at worst, toward a disruption of
the entire industry. Such being the case, the contribution of this study can be seen as
examining the possibility that ‘the tragedy of commons’ occurs in the industry and
devising a means of avoiding the tragedy.
The organization of our paper is as follows. In Section 2, we present our model, which is stylized to demonstrate the essential features of the SC business process and competitive landscape. We analyze the model in the next section and investigate the possibility that the tragedy of commons occurs in the industry due to excessive competition for market share. Implications of our findings through modeling and analysis follow in the next section, which also discusses the future development of the SC business model to overcome its limitations. The last section concludes the paper and suggests future work.

2 Model

SC offers a value proposition to potential customers by allowing them to buy in


groups and receive quantity discounts. Merchants or suppliers (as business partners
with SCs) also gain benefits from selling a significant volume (rather than selling one by one, customer by customer) through a single SC channel.
Furthermore, suppliers use SCs as a marketing channel to access potential customers
and increase sales. Thus, the key to the SC business model lies in a deep discount, or pooling willingness-to-buy from customers and turning that potential into effective real demand. Our study focuses on the latter part of the business model: i.e., pooling the potential demand and turning it into real demand. For example, Groupon tries to attract the minimum required number of users to unlock the corresponding offer.
However, potential investors are not quite sure about whether the business model is
sustainable. Those who considered investing in SC startups seem less interested
now ([12], [15], [19], etc.). The weakest link of the business model comes from its
simplicity. It is simple enough to be copied without heavy initial installation costs; as
a matter of fact, it is too simple to prevent potential competitors from entering the
industry. Accordingly, the competition is increasingly intensifying.
Some investigations show that SCs are forced to spend more money to acquire
customers due to intensifying competition. In 2010, Groupon spent $4.95 per
subscriber added, but in 2011, it spent $5.30 for each additional subscriber ([12]). This

increase will be worrisome to potential investors since it could be a signal that it is getting more costly for a SC to acquire and retain customers in order to keep the revenue stream. The business model reveals a self-limiting nature: success constrains growth.
This self-destructive aspect is best disclosed when there is less volume of available inventory or service capacity (ironically, thanks to a success of the SC business) for many deal-prone customers. In that situation, which is quite plausible, the willingness of partner suppliers to offer a deep discount will go down, and price-sensitive shoppers will switch to another SC which offers a better deal. In the long run, competition among SCs will drive down the discount rate and/or the minimum required threshold.
Considering the arguments above, here, we formulate the SC business model that
incorporates both the bright side and inherent weakness, and delve into the possibility
of self-destruction. We focus on key decision issues of SCs such as marketing efforts
and service configurations offered to customers. Due to fierce competition among
SCs, however, the commission rate is highly likely to be standardized and common to
all the SCs. For example, the commission rate in Korea has been almost fixed at 20%
over one year ([9]). With a fixed commission rate, our model allows a SC to leverage
its minimum required number (refer to the definition of threshold below) as a
competition tool depending on its competitive capability. We further assume that the
discount rates in service configuration are already reflected in this threshold level. In
sum, for the purpose of our study, it suffices to focus on the marketing expenses and
the threshold level.
Let’s suppose that there is a set composed of N SCs, where k is employed as the
index for an individual (sometimes representative) SC. N may also denote the set of
SCs if it is clearly distinguishable in the context: i.e., N = {1, …, n}. We define some
notations for the model elements as follows:
• ek: marketing efforts of SC k,
• tk: customer favor threshold (hereafter, simply referred to as ‘threshold’) set by
SC k (i.e., a reference point representing a service configuration including a discount
package and a minimum number of customers in order for the service offering to be
effective),
• δk: SC k’s tolerance capability to maintain positive cash flows in choosing the
threshold level (i.e., the maximum level of threshold that SC k endures),
• E: total marketing efforts currently under way in the industry (i.e., E = \sum_{k\in N} e_k).

Then, the stylized business model of SCs is abstracted as follows. First, a SC issues
coupons that can be used to purchase products or services from a partner supplier at a
constant discount rate. However, those coupons are effective only when the amount of
coupons sold exceeds a minimum required number of users, or a threshold (tk) set by
the corresponding SC k. The revenue of the SC k will be proportional to the effective
demand that the SC faces. Usually, the revenue function of SC k can be represented
by rk(tk, ek, E).

where \frac{\partial r_k}{\partial e_k} > 0, \quad \frac{\partial r_k}{\partial E_{-k}} < 0, \quad \text{and} \quad \frac{\partial^2 r_k}{\partial t_k^2} < 0. \qquad (1)

For example, we may employ r_k(t_k, e_k, E) = (e_k/E)·t_k·(δ_k − t_k), where δ_k is the maximum threshold level that SC k endures, simply called the capability of SC k; that is, SC k loses money if it sets t_k beyond δ_k.
Now, we need to explain more about the conditions in (1). First, the sales revenue
(the amount of deals concluded) of k will be proportional to the relative marketing
expenses. This feature reflects the current situation with a very low entry barrier and
brand-recognition directly related to market share. Thus, we get the first inequality in
(1). However, the marketing efforts of other players put a negative effect on the
corresponding revenue rk, which the second inequality in (1) suggests.
Before explaining the third inequality, note that the threshold has an effect both for and against the sales of SC k. The bigger t_k is, the larger the profit margin SC k can expect. On the other hand, the probability of 'service failure' increases as t_k rises.
What we mean by ‘service failure’ is the service that was offered but failed to be
delivered due to the effective demand falling short of the threshold. In its turn, the SC
should compensate for the failure according to the predefined SLA(Service Level
Agreement), which results in a loss on the revenue stream. According to a survey
conducted by the Korea Chamber of Commerce and Industry, more than 50% of
complaints from SC customers come from service failures such as shortage in
quantity and quality degradation due to excessive sales of coupons([8]). SCs are
responsible for the service failures, and they should compensate the corresponding
customers for breach of service agreement, which ultimately reduces the actual
revenue. Thus, the increase of the threshold tk will enhance the revenue at first, but it
will also increase the possibility of service failure, thereby reducing the real revenue
in the end. We model this effect of tk on the revenue in a concave shape, thereby
requiring the third inequality in (1).
Finally, we need to net out the costs of individual efforts of SC k, which is
assumed to be proportional to the amount of effort ek: that is, ck⋅ek. Note that ck
involves both pecuniary and non-pecuniary unit cost incurred in the course of
operations pertaining to marketing. Thus, it can be thought of as all the ex ante burden
when SC k implement one unit of marketing action. And ck should not be confused
with marketing expenses ek, which represents ex post values paid for marketing-
related activities. There will be no costs associated with the decision of tk since the
decision is a matter of deliberation and does not incur pecuniary costs. In sum, our
final payoff (profit) of SC k is formulated as follows:

\pi_k = r_k - c_k\, e_k = t_k \cdot \frac{e_k}{E} \cdot (\delta_k - t_k) - c_k\, e_k \quad (t_k \le \delta_k). \qquad (2)

And the total industry profit is naturally defined as \sum_{k\in N} \pi_k.
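As a minimal sketch of the payoff structure (2), with variable names of my choosing:

```python
def sc_profit(t_k, e_k, E, delta_k, c_k):
    """Profit (2) of social-commerce provider k.

    t_k     : customer favor threshold, must satisfy t_k <= delta_k
    e_k     : marketing effort of SC k (included in the industry total E)
    delta_k : tolerance capability of SC k
    c_k     : unit (pecuniary plus non-pecuniary) marketing cost
    """
    assert 0 < e_k <= E and t_k <= delta_k
    revenue = (e_k / E) * t_k * (delta_k - t_k)   # r_k(t_k, e_k, E)
    return revenue - c_k * e_k
```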



3 Analysis

Our analysis first presents Nash equilibria of the model. Assuming that heterogeneous
SCs may employ different strategies, the following Proposition shows that there are
infinitely many solutions, in particular, for a best response of individual marketing
effort ek.

Proposition 1. Let k denote an arbitrary SC among the N SCs²: i.e., k ∈ {1, …, N}. We also define constants ζ_{ij} ≡ (δ_i²/δ_j²)·(c_j/c_i) for all i < j, and ε_j ≡ e_j/E for all j in N; the latter represents the relative strength of the marketing effort of SC j. Given that the (N−1) SCs other than k have decided their optimal (best-response) thresholds t_j* and marketing efforts e_j* (j = 1, …, k−1, k+1, …, N), SC k's optimal t_k* and ε_k* (both positive) are determined as follows:

t_k* = δ_k/2, and

ε_k* as a solution to the linear equation system 1 − ζ_{ij} = −ζ_{ij}·ε_i* + ε_j*, ∀ i < j in N.

Then, e_k* is determined by P·ε_k* with a suitable proportionality constant P.

Proof: First, one can easily show that t_k* satisfies the FONC (First Order Necessary Condition). The linear equation system for the ε_k's (k = 1, …, N) comes from the set of FONCs for the e_k*'s. It is possible to derive a closed form solution for ε_k* by utilizing the matrix structure of the linear system and employing Cramer's rule. However, the detailed procedure is omitted here; instead, an example is demonstrated below. Once the ε_k*'s are identified, we can construct e_k* by simply multiplying a constant P to the corresponding ε_k*. Although the system of simultaneous equations for the e_k's has infinitely many solutions, thanks to the linearity of the equation system for the ε_k's, e_k* is unique up to scalar multiplication.
To check the SOSC (Second Order Sufficient Condition), we construct the Hessian matrix H as follows. One can easily show that H is negative definite at the points satisfying the FONCs if both t_k* and e_k* are positive, as assumed in the Proposition above.

H = \begin{pmatrix} -\dfrac{2E_{-k}\, t_k (\delta_k - t_k)}{E^3} & \dfrac{E_{-k}(\delta_k - 2t_k)}{E^2} \\[6pt] \dfrac{E_{-k}(\delta_k - 2t_k)}{E^2} & -\dfrac{2e_k}{E} \end{pmatrix}, \quad \text{where } E_{-k} = \sum_{i\ne k} e_i. \qquad \text{Q.E.D.}

According to Proposition 1, the optimal threshold is proportional to the capability


that the corresponding SC can exert in the market. The (relative) marketing effort of
² As stated before, the notation 'N' stands for either the number of SCs or a set of SCs. This usage will not cause any confusion since it is clear from the context what it means.

SC k, ε_k* (thereby e_k* too), increases as ζ_{kj} (j ≠ k) increases, but decreases as ζ_{jk} (j ≠ k) increases. Thus, more marketing effort by SC k is expected if the relative capability (i.e., δ_k/δ_j, j ≠ k) is enhanced and/or the relative marketing cost (i.e., c_k/c_j, j ≠ k) decreases. However, the former will have a stronger effect on e_k* than the latter, since ζ_{ij} is proportional to the square of the relative capability. Subsequently, the critical competitive edge can be gained from enhancing the capability to maintain a positive cash flow against low margins.
If all the SCs have the same capability and cost structure, a symmetric Nash
equilibrium can be found in the following Proposition. This sort of symmetric case with homogeneous SCs may fit two stages of the industry life-cycle. The first is the
infant or very early stage of the industry, where a small number of similar size
companies constitute the industry. Another one is the mature stage of the life-cycle,
where many small- and medium-sized SCs (in particular, with low δk) are forced out
of the market and a small number of big SCs with similar properties survive.

Proposition 2. In the symmetric case where c_k = c and δ_k = δ for all k = 1, …, N, a symmetric Nash equilibrium is determined as follows:

t_k^* = t^* = \frac{\delta}{2} \quad \text{and} \quad e_k^* = e^* = \frac{N-1}{N^2}\cdot\frac{\delta^2}{4c} \quad \text{for all } k = 1, \ldots, N.

Proof: As for t_k*, the same reasoning as in Proposition 1 applies. Thanks to the symmetric strategy assumption, we can construct the system of linear equations for the e_k's directly from the set of FONCs, c·(E*)² = (δ²/4)·E*_{−k} for all k. The last equation reduces to c·N²·e* = δ²·(N − 1)/4 since E* = N·e* and E*_{−k} = (N − 1)·e*. Thus, we get t_k* and e_k* as above, from which we see that the SOSCs are trivially satisfied. Q.E.D.

First note that in the symmetric case, the optimal level of the customer favor threshold t* does not depend on the number of SCs in the industry. On the other hand, it is interesting to look into the combined effort or total expenditure of all SCs, E* = N·e* = ((N − 1)/N)·(δ²/4c), which does depend on the number of SCs. The combined effort increases with N (i.e., dE*/dN > 0), but the rate of growth diminishes with N (i.e., d²E*/dN² < 0). Furthermore, E* converges as N goes to infinity: i.e., \lim_{N\to\infty} E^* = \frac{\delta^2}{4c} \equiv \hat E. In sum, E* is a concave function of N, which converges to Ê.
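A quick numerical sketch of Proposition 2 and the limit Ê (illustrative parameter values only):

```python
def symmetric_equilibrium(N, delta, c):
    """Symmetric Nash equilibrium of Proposition 2 and its aggregates."""
    t_star = delta / 2.0
    e_star = (N - 1) / N**2 * delta**2 / (4 * c)
    E_star = N * e_star                    # = ((N-1)/N) * delta^2 / (4c)
    E_hat = delta**2 / (4 * c)             # limit of E_star as N grows
    return t_star, e_star, E_star, E_hat

# For delta = 2 and c = 0.5: E_star = 1.0 at N = 2 and 1.5 at N = 4,
# increasing concavely toward E_hat = 2.0.
```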

Although each SC may exert less marketing effort as there are more SCs (see e* in Proposition 2), the addition of new SCs swamps this effect, thereby increasing the total marketing effort (E*) in the market. If we assume that the revenue function reflects the market demand, there is a strong possibility of overexploitation of customers; that is, collectively, SCs will exert marketing efforts far beyond the point that boosts the potential market demand to its maximum level. This resembles the typical situation of 'the tragedy of commons,' where this sort of negative externality is at the heart of the problem ([3], [6], [7]). When a SC advertises, it does not take into account the negative effect that its action might have on the revenue streams of other SCs.
To examine this possibility more precisely, let us first define the industry performance measure W(·) as a function of the total marketing expenses E and the average customer favor threshold t̄ as follows:

W(E, t̄) = CB(E, t̄)·PB(E, t̄), \qquad (3)

where CB and PB stand for 'Customer Benefits' and 'Producer Benefits,' respectively. Under symmetric strategies (i.e., δ̄ = δ and t̄ = t), the latter is simply the sum of the profits of all SCs: that is, PB(E, t) ≡ \sum_{k\in N} \pi_k = t·(δ − t) − c·E.
CB is supposed to have a linear and additive relationship with E and t̄: that is, CB(E, t̄) = α·E − β·t̄, where α and β are positive coefficients. This notion of CB is natural since the scale of demand for SC services is more likely to increase with greater total marketing efforts. In addition, the degree of customer benefits (for instance, higher reliability and assurance of services) is enhanced as the average threshold level is reduced.
However, CB and PB are not generally commensurable, and they cannot be
combined in a simple (weighted) sum. Literature on cost-benefit analysis and multi-criteria decision making suggests employing a multiplicative form (instead of a summation) when combining two mutually incommensurable terms. One may
incorporate additional weights to adjust the balance between CB and PB. However,
we did not apply such weights in (3) since our purpose of study is not to quantify or
estimate the exact amount of the industry performance, but to examine qualitative
behaviors of the system. As a result, our industry performance measure has been
proposed as the product of CB and PB, and the expression for W(-) in (3) has been
arranged into the following specific form:

W(E, t) = {t⋅(δ − t) − c⋅E}⋅(α⋅E − β⋅t). (4)

With the industry performance measure in (4), the following Proposition explains
how socially optimal E0 and t0 are determined.

Proposition 3. Let's assume the symmetric situation as in Proposition 2, and suppose that the following condition holds: δ/c > (β/α)·(1 + √3). Then, the total marketing effort E_0 and the average threshold t_0 given below maximize the industry performance defined in (4):

t_0 = \frac{\delta}{2} \quad \text{and} \quad E_0 = \frac{\delta\,(\alpha\delta + 2c\beta)}{8c\alpha}.

Proof: First, it is easy to show that the FONCs are satisfied by t_0 and E_0 if α·δ > c·β (in particular, for t_0), which is implied by the condition above. To check the SOSC, we construct the Hessian matrix below:

H = \begin{pmatrix} -2c\alpha & c\beta + \alpha(\delta - 2t) \\ c\beta + \alpha(\delta - 2t) & \beta(4t - \delta) - 2\alpha E \end{pmatrix}.

This Hessian is indeed negative definite at t_0 and E_0 when (α·δ − c·β)² > 3(c·β)², which is equivalent to (δ/c − β/α)² > 3·(β/α)², or (δ/c)² − 2·(β/α)·(δ/c) − 2·(β/α)² > 0. Since δ/c is positive, this inequality is satisfied if the condition in the Proposition holds. Q.E.D.
Note that t0 = t*; that is, at least for the threshold, the socially optimal level and the
optimal level of an individual choice coincide. Therefore, we may predict that SCs
will manage their threshold levels at the socially optimal level.
However, this desirable feature may not be sustained when we consider the total marketing efforts. Furthermore, a ramification of the tragedy of commons shows a 'phase transition' nature, where the relationship between δ/c and β/α specifies the sharp boundary of the phase transition. We have already seen that a relationship between these two terms gives the condition in Proposition 3; this condition holds when δ/c is far larger than β/α. Proposition 4 goes further and provides another relationship (in a somewhat different format) between these two terms. This relation is critical in triggering the situation of 'the tragedy of commons.'

Proposition 4. Let's assume the symmetric situation as in Proposition 2. Now, consider the following two cases, which are mutually exclusive and exhaustive.

Case (a) δ/c > 2β/α: Then, there is a positive critical value T such that the tragedy of commons occurs (i.e., E* > E_0) if the number of SCs exceeds this critical point (i.e., N ≥ T). T is larger than one and determined as follows:

T = \frac{2\alpha\delta}{\alpha\delta - 2c\beta},

Case (b) δ/c ≤ 2β/α: Then, for any N, the total marketing effort falls short of the socially optimal level (i.e., E* ≤ E_0).

Proof: E* > E_0 if and only if (α·δ + 2c·β)·N < 2α·δ·N − 2α·δ, which can be rearranged into (2c·β − α·δ)·N < −2α·δ. Then, we have two cases. The condition in Case (a) corresponds to the situation where the left-hand side is negative, while the condition in Case (b) guarantees that the left-hand side is non-negative. Thus, in Case (b), the inequality E* > E_0 cannot hold unless N is negative, which is impossible. In Case (a), E* > E_0 holds for N > 2α·δ/(α·δ − 2c·β) ≡ T. Furthermore, the numerator of T is always bigger than the denominator under the condition in Case (a), which guarantees T > 1. Q.E.D.
The results of the Proposition imply that the SC industry cannot be expected to sustain itself unless the condition in Case (b) holds. Whether the industry in Case (a) survives or not depends on the number of SCs; that is, a limited number of SCs may thrive only if the size of the industry remains below T. It is not difficult to construct examples in which both Case (b) and the limited opportunity of N < T in Case (a) are rarely observed. Therefore, the tragedy of commons seems inevitable in most practical situations.
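The phase-transition boundary of Proposition 4 is easy to evaluate numerically; the sketch below (with purely illustrative parameter values) returns the critical industry size T in Case (a) and None in Case (b):

```python
def tragedy_threshold(delta, c, alpha, beta):
    """Critical number of SCs from Proposition 4."""
    if delta / c > 2 * beta / alpha:          # Case (a)
        return 2 * alpha * delta / (alpha * delta - 2 * c * beta)
    return None                               # Case (b): E* <= E0 for all N

# Example: delta = 4, c = 1, alpha = 1, beta = 0.5 gives T = 8/3, so an
# industry of three or more homogeneous SCs already yields E* > E0.
```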

By rearranging T into 2α/(α − 2c·β/δ), we know that T is larger than 2 and converges to 2 as δ becomes larger. Since dT/dδ < 0 and d²T/dδ² > 0, T is diminishing but slowly converges to 2 as δ goes to infinity. However, T shows a different behavior when q ≡ β/α changes. Again, by rearranging terms, we get another expression of T, T = 2δ/(δ − 2c·q), where dT/dq > 0 and d²T/dq² > 0 when δ > 2c·q. Subsequently, T is close to 2 when α is far larger than β (i.e., β/α ≈ 0), and increases very rapidly (to infinity) as β/α approaches δ/(2c) (> 1). This behavior implicitly puts an upper bound on the relative size between α and β; that is, β cannot be larger than δ·α/(2c). As a result, T appears more sensitive to β/α than to δ.
α
Since t_0 = t* and they do not depend on the number of SCs under symmetric strategies, we can view the performance structure from a different angle by defining two parametric functions based on our model: that is, H = H(t̄) ≡ −β·t̄ and J = J(t̄) ≡ t̄·(δ − t̄). Note that at a symmetric equilibrium, both H(·) and J(·) are constant

functions: specifically, H = −β·δ/2 and J = δ²/4 for both the social and individual optimal levels (t_0 and t*). Accordingly, the performance measure (4) can be simply viewed as if it were a function of E only, as below:

Ŵ = (α⋅E + H)⋅(J − c⋅E) . (5)

In fact, this expression of the system performance is similar to a well-known system performance measure in ecology ([3], [7]). First, one can interpret H and J as a location parameter and an ecological capacity, respectively. The latter (J) is proportional to the average capability, and the former (H), together with J, determines the generic performance without marketing efforts; that is, J·H (< 0) corresponds to the performance level when E = 0. From (5), we know that both solutions to Ŵ = 0 (in terms of E) are positive, and Ŵ is maximized at \hat E = \frac{\alpha J - cH}{2\alpha c}.
There are two forces at work in (5). First, for a given potential market size (i.e., a
fixed H and J), more marketing efforts by SCs mean more revenue streams: the first
term (α⋅E + H) in (5). In fact, at the early stage of the industry, the marketing chicken
game has contributed to the rapid growth of the entire market for SC services([4]). In
Korea, the SC business has grown into a one billion dollar industry over the last two
years, and many experts agree that the massive marketing activities have raised
customers’ awareness of the SC businesses. However, due to the fierce competition
with a fixed installed base, more marketing efforts also result in a smaller population
to target in the next period: the second term (J − c·E) in (5). The overall effect of these two forces yields the system performance measure in the multiplicative form above.
Corollary 5 below provides more streamlined expressions of the conditions pertaining to the tragedy of commons when the t_k's are identical and fixed at some t, such as t_0 (= t*), for some policy reasons, and e_k is the only effective strategy of SC k (k = 1, …, N).

Corollary 5. Let's assume that the t_k's are fixed at t and the performance measure is given by (5). With symmetric strategies as in Proposition 2, each SC sets its optimal marketing effort at e* = \frac{t(\delta - t)}{c}\cdot\frac{N-1}{N^2} (thereby, E* = \frac{t(\delta - t)}{c}\cdot\frac{N-1}{N}). And we have the following two cases.

Case (a) c·β < α·(δ − t): Then, there is a positive critical value T̂ such that the tragedy of commons occurs (i.e., E* ≥ Ê) if the number of SCs exceeds this critical point (i.e., N ≥ T̂). T̂ is larger than one and determined as follows:

T̂ ≡ \frac{2\alpha(\delta - t)}{\alpha(\delta - t) - c\beta}.

Case (b) c·β ≥ α·(δ − t): Then, for any N, the total marketing efforts fall short of the socially optimal level (i.e., E* < Ê).

Proof: The proof is straightforward and follows procedures similar to those of Proposition 3 and Proposition 4. We omit it. Q.E.D.

Note that T̂ > 1 in Case (a) of Corollary 5. Thus, we still have a chance to escape from the tragedy of commons even in Case (a) when N < T̂. Unfortunately, however, a reasoning procedure similar to that used for Proposition 4 reveals that T̂ is always larger than 2 but quite small in most normal situations.

4 Conclusion and Future Works

The SC startups have drawn criticism for unusual accounting practices, rising marketing costs and inadequate quality assurance, despite rapid growth in their early stage. We tried to understand the current critical situation and figure out the causes of the pessimistic view toward the SC industry. For these purposes, our study developed stylized game models and analyzed them to find some potential (but critical) problems inherent in the business model at this early stage of the industrial life-cycle. In particular, we focused on the conditions under which the SC industry is sustainable.
Our findings and analytical results provide strategic implications and policy directions to overcome the shortcomings intrinsic to the current business model. For example, a set of regulations on marketing activities may help the industry develop sustainably toward the next level. Along this line, our future work will pursue empirical studies to identify the parameters in our model so that we can further enrich knowledge about the industry. For example, although gathering data will be intrinsically difficult due to the early stage of the industry, we need to develop an operational definition of the social welfare W to estimate the relevant parameters such as α and β in our model. Then, we will be able to quantify the conditions under which a (group of) first-mover(s) survives and estimate a proper size of the industry that is sustainable in the long run.

References
1. Baek, B.-S.: Ticket Monster and Coupang, head-to-head competition for the industry’s
number one position. Economy Today (November 25, 2011) (in Korean),
http://www.eto.co.kr/news/
outview.asp?Code=20111125145743203&ts=133239
2. Patel, K.: Groupon marketing spending works almost too well. Ad Age Digital
(November 12, 2011),
http://adage.com/article/digital/
groupon-marketing-spending-works/230777/
3. Alroy, J.: A multispecies overkill simulation of the end-Pleistocene mega faunal mass
extinction. Science 292, 1893–1896 (2001)
4. Anderson, M., Sims, J., Price, J., Brusa, J.: Turning 'like' to 'buy': social media emerges as a commerce channel. White Paper. Booz & Company (January 20, 2012),
http://www.booz.com/media/uploads/
BaC-Turning_Like_to_Buy.pdf
Marketing Games in Social Commerce 137

5. Financial News: Special report on social commerce (December 18, 2011) (in Korean),
http://www.fnnews.com/view?ra=Sent0901m_View&corp=fnnews&arc
id=0922494751&cDateYear=2011&cDateMonth=12&cDateDay=18
6. Greco, G.M., Floridi, L.: The tragedy of the digital commons. Ethics and Information
Technology 6, 73–81 (2004)
7. Hardin, G.: The tragedy of the commons. Science 162, 1243–1248 (1968)
8. KCCI: A consumer satisfaction survey on social commerce services. Research report.
Korea Chamber of Commerce and Industry (March 8, 2011) (in Korean),
http://www.korcham.net/EconNews/KcciReport/CRE01102R.asp?m_c
hamcd=A001&m_dataid=20110308001&m_page=1&m_query=TITLE&m_que
ryText=%BC%D2%BC%C8%C4%BF%B8%D3%BD%BA
9. Kim, Y.-H.: Social commerce: current market situations and policy issues. KISDI (Korea
Information Society Development Institute) Issue Report 23, 41–63 (2011) (in Korean)
10. Knowledge at Wharton: Dot-com bubble, part II? Why it’s so hard to value social
networking sites? Knowledge at Wharton Online (October 4, 2006),
http://knowledge.wharton.upenn.edu/
article.cfm?articleid=1570
11. Lee, E.-M.: Global market survey on social commerce. KISDI Issue Report 23, 36–44
(2011) (in Korean)
12. MacMillan, D.: Groupon’s stumbles may force it to pare back size of IPO. Bloomberg
Online (October 3, 2011),
http://www.bloomberg.com/news/2011-10-03/groupon-s-stumbles-
seen-paring-back-size-of-ipo-as-investor-interest-wanes.html
13. MacMillan, D.: LivingSocial aims to be different from Groupon. Business Week Online
(September 22, 2011),
http://www.businessweek.com/magazine/
livingsocial-aims-to-be-different-from-groupon-09222011.html
14. MacMillan, D.: Groupon China Venture said to fire workers for poor performance.
Bloomberg Online (August 24, 2011), http://www.bloomberg.com/news/2011-
08-23/groupon-china-joint-venture-said-to-fire-workers-for-
poor-performance.html
15. Reibstein, D.: How sustainable is Groupon’s business model? Knowledge at Wharton
(May 25, 2011),
http://knowledge.wharton.upenn.edu/
article.cfm?articleid=2784
16. ROA Holdings: The rapidly expanding social commerce market of South Korea and Japan.
Research report (February 21, 2011),
http://global.roaholdings.com/report/
research_view.html?type=country&num=143
17. Urstadt, B.: Social networking is not a business. MIT Technology Review (July/August
2008), http://www.technologyreview.com/business/20922/
18. Webster, K.: Groupon’s business model: bubble or the real deal? (September 19, 2011),
http://pymnts.com/commentary/pymnts-voice/
groupon-s-business-model-bubble-or-the-real-deal/
19. Wheeler, R.: Groupon gone wrong! Harvard business fellow’s warning to investors and
entrepreneurs (August 23, 2011),
http://pymnts.com/briefingroom/shopping-and-social-
buying/social-shopping-and-social-buying/
groupon-gone-wrong-a-warning-to-investors/
Mean Field Stochastic Games with Discrete
States and Mixed Players

Minyi Huang

School of Mathematics and Statistics, Carleton University,


Ottawa, ON K1S 5B6, Canada
[email protected]

Abstract. We consider mean field Markov decision processes with a


major player and a large number of minor players which have their indi-
vidual objectives. The players have decoupled state transition laws and
are coupled by the costs via the state distribution of the minor players.
We introduce a stochastic difference equation to model the update of
the limiting state distribution process and solve limiting Markov decision
problems for the major player and minor players using local information.
Under a solvability assumption of the consistent mean field approxima-
tion, the obtained decentralized strategies are stationary and have an
ε-Nash equilibrium property.

Keywords: mean field game, finite states, major player, minor player.

1 Introduction
Large population stochastic dynamic games with mean field coupling have attracted substantial interest in recent years; see, e.g., [1,4,11,16,12,13,18,19,22,23,24,26,27]. To obtain low complexity strategies,
consistent mean field approximations provide a powerful approach, and in the
resulting solution, each agent only needs to know its own state information
and the aggregate effect of the overall population which may be pre-computed
off-line. One may further establish an ε-Nash equilibrium property for the set
of control strategies [12]. The technique of consistent mean field approximations
is also applicable to optimization with a social objective [5,14,23]. The survey
[3] on differential games presents a timely report of recent progress in mean
field game theory. This general methodology has applications in diverse areas
[4,20,27]. The mean field approach has also appeared in anonymous sequential
games [17] with a continuum of players individually optimally responding to
the mean field. However, the modeling of a continuum of independent processes
leads to measurability difficulties and the empirical frequency of the realizations
of the continuum-indexed individual states cannot be meaningfully defined [2].
A recent generalization of the mean field game modeling has been introduced
in [10] where a major player and a large number of minor players coexist pursuing
their individual interests. Such interaction models are often seen in economic or
engineering settings, simple examples being a few large corporations and many


much smaller competitors, a network service provider and a large number of


small users with their respective objectives. An extension of the modeling in [10]
to dynamic games with Markovian switches in the dynamics is presented in [25].
The random switches model the abrupt changes of the decision environment.
Traditionally, game models differentiating vastly different strengths of players
have been well studied in cooperative game theory, and static models are usually
considered [6,8,9]. Such players with very different strengths are called mixed
players.
The linear-quadratic-Gaussian (LQG) model in [10] shows that the presence
of the major player causes an interesting phenomenon called the lack of sufficient
statistics. More specifically, in order to obtain asymptotic equilibrium strategies,
the major player cannot simply use a strategy as a function of its current state
and time; for a minor player, it cannot simply use the current states of the major
player and itself. To overcome this lack of sufficient statistics for decision, the
system dynamics are augmented by adding a new state, which approximates
the mean field and is driven by the major player’s state. This additional state
enters the obtained decentralized strategy of each player and it captures the past
influence of the major player. The recent work [21] considered minor players
parametrized by a continuum which causes high complexity to the state space
augmentation approach, and a backward stochastic differential equation based
approach (see, e.g., [28]) was used to deal with the random mean field process.
The resulting decentralized strategies are not Markovian.
In this paper, we consider the interaction modeling of a major player and a
large number of minor players in the setting of discrete time Markov decision
processes (MDPs). Although the major player modeling is conceptually very
similar to [10] which considers an LQG game model, the lack of linearity in
the MDP context will give rise to many challenges in analysis. Additionally,
an important motivation to use the MDP framework is that our method may
potentially be applicable to many practical problems. In relation to mean field
games with discrete state and action spaces, related work can also be found in
[15,23,7,17]; they all consider a population of comparably small decision makers
which may be called peers.
A key step in our decentralized control design is to describe the evolution of
the mean field, as the distribution of the minor players’ states, by a stochastic
difference equation driven by the major player’s state. Given the above repre-
sentation of the limiting mean field, we may approximate the original problems
of the major player and a typical minor player by limiting MDPs with hybrid
state spaces where the player in question has a finite state space and the mean
field process is a continuum evolving on a simplex.
The organization of the paper is as follows. Section 2 formulates the mean
field Markov decision game with a major player. Section 3 proposes a stochastic
representation of the update of the mean field and analyzes two auxiliary MDPs
in the mean field limit. The consistency condition for mean field approximations
is introduced in Section 4, and Section 5 shows an asymptotic Nash equilibrium
property. Section 6 presents concluding remarks of the paper.

2 The Mean Field Game Model


We adopt the framework of Markov decision processes to formulate the mean field
game which involves a major player A0 and a large population of minor players
{Ai , 1 ≤ i ≤ N }. The state and action spaces of all players are finite, and denoted
by S0 = {1, . . . , K0 } and A0 = {1, . . . , L0 }, respectively, for the major player.
For simplicity, we consider uniform minor players which share common state and
action spaces denoted by S = {1, . . . , K} and A = {1, . . . , L}, respectively. At
time t ∈ Z+ = {0, 1, 2, . . .}, the state and action of Aj are, respectively, denoted
by x_j(t), u_j(t), 0 ≤ j ≤ N. To model the mean field interaction of the players, we denote the random measure process as

I^{(N)}(t) = \big(I^{(N)}_1(t), \ldots, I^{(N)}_K(t)\big), \quad t \ge 0,

where I^{(N)}_k(t) = \frac{1}{N}\sum_{i=1}^N 1_{(x_i(t)=k)}. The process I^{(N)}(t) describes the frequency of occurrence of the states in S at time t.
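As a small illustration (a helper of my own, with states coded 1, …, K), the random measure I^{(N)}(t) is just the empirical distribution of the minor players' states:

```python
import numpy as np

def empirical_state_distribution(minor_states, K):
    """I^(N)(t): frequency of each state in S = {1, ..., K} among the
    N minor players whose current states are given as integers."""
    counts = np.bincount(np.asarray(minor_states) - 1, minlength=K)
    return counts / len(minor_states)
```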
For the major player, the state transition law is determined by the stochastic
kernel

Q0 (z|y, a0 ) = P (x0 (t + 1) = z|x0 (t) = y, u0 (t) = a0 ), (1)

where y, z ∈ S0 and a0 ∈ A0 . Following the usual convention in Markov decision


processes, the transition probability of the process x0 from t to t + 1 is solely
determined by x0 (t) = y and u0 (t) = a0 observed at t even if additional state
and action information before t is known.
The one-stage cost of the decision problem of the major player is given by
c0 (x0 , θ, a0 ), where θ is the state distribution of the minor players. The infinite
horizon discounted cost is


J_0 = E\sum_{t=0}^{\infty} \rho^t\, c_0\big(x_0(t), I^{(N)}(t), u_0(t)\big),

where ρ ∈ (0, 1) is the discount factor.


The state transition of minor player Ai is specified by

Q(z|y, a) = P (xi (t + 1) = z|xi (t) = y, ui (t) = a), (2)

where y, z ∈ S and a ∈ A. The one-stage cost is c(x, x0 , θ, a) and the infinite


horizon discounted cost is


J_i = E\sum_{t=0}^{\infty} \rho^t\, c\big(x_i(t), x_0(t), I^{(N)}(t), u_i(t)\big).

Due to the structure of the costs J0 and Ji , the major player has a significant
impact on each minor player. By contrast, each minor player has a negligible
impact on another minor player or the major player. Also, from the point of
view of the major player or a fixed minor player, it does not distinguish other

specific individual minor players. Instead, only the aggregate state information
I (N ) (t) matters at each step, which is an important feature of mean field decision
problems.
For the N + 1 decision processes, we specify the joint distribution as follows.
Given the states and actions of all players at time t, the transition probability
to a value of (x0 (t + 1), x1 (t + 1), . . . , xN (t + 1)) is simply given by the product
of the individual transition probabilities under their respective actions.
For integer k ≥ 2, denote the simplex

D_k = \Big\{(\lambda_1, \ldots, \lambda_k) \in \mathbb{R}^k_+ : \sum_{j=1}^k \lambda_j = 1\Big\}.

To ensure that the individual costs are finite, we introduce the assumption.
(A1) The one-stage costs c0 and c are functions on S0 × DK × A0 and S ×
S0 × DK × A, respectively, and they are both continuous in θ. ♦

Remark 1. By the continuity condition in (A1), there exists a fixed constant C


such that |c0 | + |c| ≤ C for all x0 ∈ S0 , x ∈ S, a0 ∈ A0 , a ∈ A and θ ∈ DK .

We further assume the following condition on the initial state distribution of


the minor players.
(A2) The initial states x1 (0), . . . , xN (0) are independent and there exists a
deterministic θ0 ∈ DK such that

lim I (N ) (0) = θ0
N →∞

with probability one. ♦

2.1 The Traditional Approach and Complexity

Denote the so-called t-history

ht = (xj (s), uj (s − 1), s ≤ t, j = 0, . . . , N ), t ≥ 1, (3)

and h0 = (x0 ). We may further specify mixed strategies (or policies; we shall use
the two names strategy and policy interchangeably), as a probability measure on
the action space, of each player depending on ht , and use the method of dynamic
programming to identify Nash strategies for the mean field game. However, for a
large population of minor players, this traditional approach is impractical. First,
each player must use centralized information which causes high complexity in
implementation; second, numerically solving the dynamic programming equation
is a prohibitive or even impossible task when the number of minor players exceeds
a few dozen.

3 The Mean Field Approximation


To overcome the fundamental complexity difficulty, we use the mean field ap-
proximation approach. The basic idea is to introduce a limiting process to ap-
proximate the random measure process I (N ) (t) and solve localized optimization
problems for both the major player and a representative minor player.
Regarding the informational requirement in our decentralized strategy design,
we assume (i) the limiting distribution θ0 and the state x0 (t) of the major player
are known to all players, (ii) each minor player knows its own state but not the
state of any other particular minor player.
We use a process θ(t) with state space DK to approximate I (N ) (t) when
N → ∞. Before specifying the rule governing the evolution of θ(t), we give some
intuitive explanation. Due to the presence of the major player, the action of
each minor player should be affected by x0 (t) and its own state xi (t), and this
causes the correlation of the individual state processes {xi (t), 1 ≤ i ≤ N } in the
closed-loop system. The resulting process θ(t) should be a random process. We
propose the updating rule
θ(t + 1) = ψ(x0 (t), θ(t)), (4)
where θ(0) = θ0 . The specific form of ψ will be determined by a procedure of
consistent mean field approximations. We consider ψ from the following function
class

\Psi = \Big\{\phi(i, \theta) = (\phi_1, \ldots, \phi_K) \,\Big|\, \phi_k \ge 0,\ \sum_{k\in S} \phi_k = 1\Big\},
where φ(i, ·) is continuous on D_K for all i ∈ S_0. The structure of (4) is analogous to the stochastic ordinary differential equation (ODE) modeling of the random mean field in the mean field LQG game model in [10], where the evolution of the ODE is driven by the state of the major player.
It is possible to consider a function of the form ψ(t, x0, θ), which is more general than in (4). For computational efficiency, we do not seek this generality. On the other hand, a time-invariant function suffices for developing our mean field approximation scheme. More specifically, by introducing (4), we may develop stationary feedback strategies for all the players; furthermore, the mean field limit of the closed-loop system regenerates a stationary transition law of θ(t), in agreement with the initial assumption of time-invariant dynamics.

3.1 The Limiting Problem of the Major Player


Suppose the function ψ in (4) has been given. The original problem of the major
player is now approximated by a new Markov decision process. We will often use
x0 , xi , θ to denote a value of the corresponding processes.
Problem (P0): Minimize
$$\bar J_0 = E \sum_{t=0}^{\infty} \rho^t c_0(x_0(t), \theta(t), u_0(t)),$$

where x0 (t) has the transition law (1) and θ(t) satisfies (4).
Problem (P0) gives a standard Markov decision process. To solve this problem,
we use the dynamic programming approach by considering a family of optimiza-
tion problems associated with different initial conditions. Given the initial state
(x0, θ) ∈ S0 × DK at t = 0, define the cost function
$$\bar J_0(x_0, \theta, u(\cdot)) = E\Big[\sum_{t=0}^{\infty} \rho^t c_0(x_0(t), \theta(t), u_0(t)) \,\Big|\, x_0, \theta\Big].$$

Denote the value function v(x0, θ) = inf J̄0(x0, θ, u(·)), where the infimum is taken over all mixed policies/strategies of the form π = (π(0), π(1), . . .) such that each π(s) is a probability measure on A0, indicating the probability of taking a particular action, and depends on the past history (. . . , x0(s−1), θ(s−1), u0(s−1), x0(s), θ(s)). By taking two different initial conditions (x0, θ) and (x0, θ′) and comparing the associated optimal costs, we may easily obtain the following continuity property.
following continuity property.

Proposition 1. For each x0, the value function v(x0, ·) is continuous on DK. □

We write the dynamic programming equation
$$\begin{aligned} v(x_0, \theta) &= \min_{a_0 \in A_0}\big\{c_0(x_0, \theta, a_0) + \rho E\, v(x_0(t+1), \theta(t+1))\big\} \\ &= \min_{a_0 \in A_0}\Big\{c_0(x_0, \theta, a_0) + \rho \sum_{k \in S_0} Q_0(k|x_0, a_0)\, v(k, \psi(x_0, \theta))\Big\}. \end{aligned}$$

Since the action space is finite, an optimal policy π̂0 solving the dynamic pro-
gramming equation exists and is determined as a stationary Markov policy of
the form π̂0 (x0 , θ), i.e., π̂0 is a function of the current state. Let the set of opti-
mal policies be denoted by Π0 . It is possible that Π0 consists of more than one
element.

3.2 The Limiting Problem of the Minor Player


Suppose a particular optimal strategy π̂0 ∈ Π0 has been fixed for the major
player. The resulting state process is x0 (t). The decision problem of the minor
player is approximated by the following limiting problem.
Problem (P1): Minimize
$$\bar J_i = E \sum_{t=0}^{\infty} \rho^t c(x_i(t), x_0(t), \theta(t), u_i(t)),$$

where xi (t) has the state transition law (2); θ(t) satisfies (4); and x0 (t) is subject
to the control policy π̂0 ∈ Π0 . This leads to a Markov decision problem with the
state (xi (t), x0 (t), θ(t)) and control action ui (t). Following the steps in Section
3.1, we define the value function w(xi , x0 , θ).
Before analyzing the value function w, we specify the state transition law of
the major player under any mixed strategy π0 . Suppose
$$\pi_0 = (\alpha_1, \ldots, \alpha_{L_0}), \qquad (5)$$
which is a probability vector. By the standard convention in Markov decision processes, the strategy π0 selects action k with probability αk. We further define
$$Q_0(z|y, \pi_0) = \sum_{l \in A_0} \alpha_l Q_0(z|y, l),$$
where π0 is given by (5).


The dynamic programming equation is now given by
$$\begin{aligned} w(x_i, x_0, \theta) &= \min_{a \in A}\big\{c(x_i, x_0, \theta, a) + \rho E\, w(x_i(t+1), x_0(t+1), \theta(t+1))\big\} \\ &= \min_{a \in A}\Big\{c(x_i, x_0, \theta, a) + \rho \sum_{j \in S,\, k \in S_0} Q(j|x_i, a)\, Q_0(k|x_0, \hat\pi_0)\, w(j, k, \psi(x_0, \theta))\Big\}. \end{aligned}$$

The following continuity property parallels Proposition 1.


Proposition 2. For each pair (xi, x0), the value function w(xi, x0, ·) is continuous on DK. □
Again, since the action space in Problem (P1) is finite, the value function is
attained by at least one optimal strategy. Let the optimal strategy set be denoted
by Π. Note that Π is determined after π̂0 is selected first.
Let π be a mixed strategy of the minor player, represented in the form π = (β1, . . . , βL). We determine the state transition law of the minor player as
$$Q(z|y, \pi) = \sum_{l \in A} \beta_l Q(z|y, l). \qquad (6)$$
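Numerically, mixing the transition kernels as in (5)-(6) amounts to a single weighted sum of matrices; a minimal sketch with an invented two-action kernel:

```python
# Transition law under a mixed strategy, eq. (6): Q(z|y, pi) = sum_l beta_l Q(z|y, l).
import numpy as np

Q = np.array([[[0.7, 0.3],     # Q[l] = transition matrix under pure action l
               [0.4, 0.6]],
              [[0.2, 0.8],
               [0.5, 0.5]]])
beta = np.array([0.25, 0.75])  # mixed strategy pi = (beta_1, ..., beta_L)

Q_mixed = np.tensordot(beta, Q, axes=1)        # average the kernels over actions
assert np.allclose(Q_mixed.sum(axis=1), 1.0)   # rows remain probability vectors
```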

We have the following theorem on the closed-loop system.


Theorem 1. Suppose π̂0 ∈ Π0 and π̂ ∈ Π is determined after π̂0 . Under the
policy pair (π̂0 , π̂), (xi (t), x0 (t), θ(t)) is a Markov chain with stationary transition
probabilities.
Proof. It is clear that π̂0 and π̂ are stationary feedback policies, each a function of the current state of the corresponding system. They may be represented as the two probability vectors
$$\hat\pi_0 = \big(\hat\pi_0^1(x_0, \theta), \ldots, \hat\pi_0^{L_0}(x_0, \theta)\big), \qquad \hat\pi = \big(\hat\pi^1(x_i, x_0, \theta), \ldots, \hat\pi^{L}(x_i, x_0, \theta)\big).$$

The process (xi(t), x0(t), θ(t)) is a Markov chain since the transition probability from time t to t+1 depends only on the value of (xi(t), x0(t), θ(t)) and not on the past history. Suppose at time t, (xi(t), x0(t), θ(t)) = (j, k, θ). Then at t+1, we have the transition probability
$$P\big(x_i(t+1) = j',\, x_0(t+1) = k',\, \theta(t+1) = \theta' \,\big|\, (x_i(t), x_0(t), \theta(t)) = (j, k, \theta)\big) = Q(j'|j, \hat\pi(j, k, \theta))\, Q_0(k'|k, \hat\pi_0(k, \theta))\, \delta_{\psi(k,\theta)}(\theta').$$
We use δa(x) to denote the Dirac function, i.e., δa(x) = 1 if x = a, and δa(x) = 0 otherwise. It is seen that the transition probability is determined by (j, k, θ) and does not depend on time. □

3.3 Discussions on Mixed Strategies

If Problems (P0) and (P1) are considered alone, one may always select an optimal
policy which is a pure policy, i.e., given the current state, the action can be
selected in a deterministic manner. However, in the mean field game setting we
need to eventually determine the function ψ by a fixed point argument. For this
reason, it is generally necessary to consider the optimal policies from the larger
class of mixed policies. The restriction to deterministic policies may potentially
lead to a nonexistence situation when the consistency requirement is imposed
later on the mean field approximation.

4 Replication of the Frequency Process

This section develops the procedure to replicate the dynamics of θ(t) from the
closed-loop system when the minor players apply the control strategies obtained
from the limiting Markov decision problems.
We start with a system of N minor players. Suppose the major player has
selected its optimal policy π̂0 (x0 , θ) from Π0 . Note that for the general case
of Problem (P1), there may be more than one optimal policy. We make the
convention that the same optimal policy π̂(xi , x0 , θ) is used by all the minor
players while each minor player substitutes its own state into the feedback policy
π̂. It is necessary to make this convention since otherwise the mean field limit
cannot be properly defined if there are multiple optimal policies and if each
minor player can take an arbitrary one.
We have the following key theorem on the asymptotic property of the update
of I (N ) (t) when N → ∞. Note that the range of I (N ) (t) is a discrete set. For
any θ ∈ DK , we take an approximation procedure. We suppose the vector θ has
been used by the minor players (of the finite population) at time t in solving
their limiting control problems and used in their optimal policy.
Theorem 2. Fix any θ = (θ1 , . . . , θK ) ∈ DK . Suppose the major player applies
π̂0 and the N minor players apply π̂, and at time t the state of the major player

is x0 and I(N)(t) = (s1, . . . , sK), where (s1, . . . , sK) → θ as N → ∞. Then given (x0, I(N)(t), π̂), as N → ∞,
$$I^{(N)}(t+1) \to \Big(\sum_{l=1}^{K} \theta_l\, Q(1|l, \hat\pi(l, x_0, \theta)),\ \ldots,\ \sum_{l=1}^{K} \theta_l\, Q(K|l, \hat\pi(l, x_0, \theta))\Big) \qquad (7)$$
with probability one.


Proof. By the assumption on I(N)(t), there are skN minor players in state k ∈ S at time t. In determining the distribution of I(N)(t+1), by symmetry of the minor players, we may assume without loss of generality that at time t minor players $A_1, \ldots, A_{s_1 N}$ are in state 1, $A_{s_1 N+1}, \ldots, A_{(s_1+s_2)N}$ are in state 2, etc.
We check the contribution of A1 alone in generating different states in S. Due
to the transition of A1 , state k ∈ S will appear with probability

Q(k|1, π̂(1, x0 , θ)).

We further obtain a probability vector $Q_1 := (Q(k|1, \hat\pi(1, x_0, \theta)))_{k=1}^{K}$, whose entries, indexed by the set S, give the probability of each state appearing as a result of the transition of A1.
An important fact is that in the closed-loop system with x0 (t) = x0 , condi-
tional independence holds for the transition from xi (t) to xi (t + 1) for the N
processes.
Thus, the distribution of N I (N ) (t + 1) given (x0 , I (N ) (t), π̂) is obtained as
the convolution of N independent distributions corresponding to all N minor
players. And Q1 is one of these N distributions. We have

$$E_{x_0, I^{(N)}(t), \hat\pi}\, I^{(N)}(t+1) = \Big(\sum_{l=1}^{K} s_l\, Q(1|l, \hat\pi(l, x_0, \theta)),\ \ldots,\ \sum_{l=1}^{K} s_l\, Q(K|l, \hat\pi(l, x_0, \theta))\Big), \qquad (8)$$

where $E_{x_0, I^{(N)}(t), \hat\pi}$ denotes the conditional mean given $(x_0, I^{(N)}(t), \hat\pi)$. By the law of large numbers, $I^{(N)}(t+1) - E_{x_0, I^{(N)}(t), \hat\pi}\, I^{(N)}(t+1)$ converges to zero with probability one as N → ∞. We obtain (7). □
Based on the right-hand side of (7), we introduce the K × K matrix
$$Q^*(x_0, \theta) = \begin{bmatrix} Q(1|1, \hat\pi(1, x_0, \theta)) & \cdots & Q(K|1, \hat\pi(1, x_0, \theta)) \\ Q(1|2, \hat\pi(2, x_0, \theta)) & \cdots & Q(K|2, \hat\pi(2, x_0, \theta)) \\ \vdots & \ddots & \vdots \\ Q(1|K, \hat\pi(K, x_0, \theta)) & \cdots & Q(K|K, \hat\pi(K, x_0, \theta)) \end{bmatrix}. \qquad (9)$$
Theorem 2 implies that, in the infinite population limit, if the random measure of the states of the minor players is θ(t) at time t, then θ(t+1) should be generated as
$$\theta(t+1) = \theta(t)\, Q^*(x_0(t), \theta(t)). \qquad (10)$$



4.1 The Consistency Condition


The fundamental requirement of consistent mean field approximations is that the mean field initially assumed should coincide with the one replicated by the closed-loop system when the number of minor players tends to infinity. By comparing (4) with (10), this consistency requirement reduces to the condition
$$\psi(x_0, \theta) = \theta\, Q^*(x_0, \theta), \qquad (11)$$
where Q* is given by (9). Recall that when we introduced the class Ψ for ψ, we imposed a continuity requirement; by imposing (11), we implicitly require continuity of Q* with respect to the variable θ.
Combining the solutions to Problems (P0) and (P1) with the consistency requirement, we write the so-called mean field equation system
$$\theta(t+1) = \psi(x_0(t), \theta(t)), \qquad (12)$$
$$v(x_0, \theta) = \min_{a_0 \in A_0}\Big\{c_0(x_0, \theta, a_0) + \rho \sum_{k \in S_0} Q_0(k|x_0, a_0)\, v(k, \psi(x_0, \theta))\Big\}, \qquad (13)$$
$$w(x_i, x_0, \theta) = \min_{a \in A}\Big\{c(x_i, x_0, \theta, a) + \rho \sum_{j \in S,\, k \in S_0} Q(j|x_i, a)\, Q_0(k|x_0, \hat\pi_0)\, w(j, k, \psi(x_0, \theta))\Big\}, \qquad (14)$$
$$\psi(x_0, \theta) = \theta\, Q^*(x_0, \theta). \qquad (15)$$

In the above, we use xi to denote the state of the generic minor player. Note that
only a single generic minor player appears in this mean field equation system.
Definition 1. We call (π̂0 , π̂, ψ(x0 , θ)) a consistent solution to the mean field
equation system (12)-(15) if π̂0 solves (13) and π̂ solves (14) and if the constraint
(15) is satisfied. ♦
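To illustrate how a consistent solution might be computed, the sketch below alternates a policy step (a stand-in for solving (13)-(14)) with the consistency step (15), stopping when ψ stabilizes on a grid over S0 × DK. The kernel Q, the grid, and the placeholder best-response rule are assumptions made only so the example runs; a real solver would perform the dynamic programming of Sections 3.1-3.2.

```python
# Fixed-point search for a consistent mean field map psi, eq. (15); toy model.
import numpy as np

K, S0 = 2, 3
grid = np.linspace(0.0, 1.0, 41)                # lam-grid parameterizing D_K
rng = np.random.default_rng(1)
Q = rng.dirichlet(np.ones(K), size=(K, 2))      # Q[x, a] = minor-player kernel

def snap(lam):                                  # project lam back onto the grid
    return grid[np.abs(grid - lam).argmin()]

def q_star(x0, lam, policy):
    """K-by-K matrix of eq. (9) under a given minor-player feedback policy."""
    return np.stack([Q[l, policy(l, x0, lam)] for l in range(K)])

def psi_from_policy(policy):
    """Mean field map of eq. (15): psi(x0, theta) = theta Q*(x0, theta)."""
    def psi(x0, lam):
        theta = np.array([lam, 1.0 - lam])
        return snap((theta @ q_star(x0, lam, policy))[0])
    return psi

def best_response(psi):
    """Placeholder for solving (14) given psi; returns a feedback policy."""
    return lambda xi, x0, lam: int(psi(x0, lam) > 0.5)

psi = psi_from_policy(lambda xi, x0, lam: 0)    # initial guess
for _ in range(50):                             # iterate psi -> policy -> psi
    psi_next = psi_from_policy(best_response(psi))
    gap = max(abs(psi(x0, lam) - psi_next(x0, lam))
              for x0 in range(S0) for lam in grid)
    psi = psi_next
    if gap < 1e-12:
        break
```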

5 Decentralized Strategies and Performance


We consider a system of N + 1 players. We specify randomized strategies with
centralized information and decentralized information, respectively.

Centralized Information. Define the t-history ht by (3). For any j = 0, . . . , N, the admissible control set Uj of player Aj consists of controls (uj(0), uj(1), . . .), where each uj(t) is a mixed strategy as a mapping from ht to DL0 if j = 0, and to DL if 1 ≤ j ≤ N.

Decentralized Information. For the major player, denote
$$h_t^{0,\mathrm{dec}} = \big(x_0(0), \theta(0), u_0(0), \ldots, x_0(t-1), \theta(t-1), u_0(t-1), x_0(t), \theta(t)\big).$$

A decentralized strategy at time t is such that u0(t) is a randomized strategy depending on $h_t^{0,\mathrm{dec}}$. For minor player Ai, denote
$$h_t^{i,\mathrm{dec}} = \big(x_i(0), x_0(0), \theta(0), u_i(0), \ldots, x_i(t-1), x_0(t-1), \theta(t-1), u_i(t-1), x_i(t), x_0(t), \theta(t)\big).$$
A decentralized strategy at time t is such that ui(t) depends on $h_t^{i,\mathrm{dec}}$.


For the mean field equation system, if a solution triple (π̂0 , π̂, ψ) exists, we will
obtain π̂0 and π̂ as decentralized Markov strategies as a function of the current
state (x0 (t), θ(t)) and (xi (t), x0 (t), θ(t)), respectively.
Suppose all the players use their decentralized strategies π̂0 (x0 , θ), π̂(xi , x0 , θ),
1 ≤ i ≤ N , respectively. In the setup of mean field decision problems, a central
issue is to examine the performance change for player Aj if it unilaterally changes
to a policy in Uj by utilizing extra information.
For examining the performance, we have the following error estimate on the
mean field approximation.
Theorem 3. Suppose (i) θ(t) is generated by (4), where θ0 is given by (A2);
(ii) (π̂0 , π̂, ψ(x0 , θ)) is a consistent solution to the mean field equation system
(12)-(15). Then we have
$$\lim_{N\to\infty} E\,\big|I^{(N)}(t) - \theta(t)\big| = 0$$
for each given t.


Proof. We use the technique introduced in the proof of Theorem 2. Fix any ε > 0. We have
$$P\big(|I^{(N)}(0) - \theta_0| \ge \varepsilon\big) \le E\big|I^{(N)}(0) - \theta(0)\big|/\varepsilon.$$
We take a sufficiently large N0 such that for all N ≥ N0,
$$P\big(|I^{(N)}(0) - \theta_0| < \varepsilon\big) > 1 - \varepsilon. \qquad (16)$$
Then, following the method for (8), we may estimate I(N)(1). By the consistency condition (11), we further obtain
$$\lim_{N\to\infty} E\big|I^{(N)}(1) - \theta(1)\big| = 0.$$
Carrying out the estimates recursively, we obtain the desired result for each fixed t. □
For j = 0, ..., N , denote u−j = (u0 , u1 , ..., uj−1 , uj+1 , ..., uN ).
Definition 2. A set of strategies uj ∈ Uj, 0 ≤ j ≤ N, for the N + 1 players is called an ε-Nash equilibrium with respect to the costs Jj, 0 ≤ j ≤ N, where ε ≥ 0, if for any j, 0 ≤ j ≤ N, we have $J_j(u_j, u_{-j}) \le J_j(u_j', u_{-j}) + \varepsilon$ whenever an alternative strategy $u_j'$ is applied by player Aj. ♦

Theorem 4. Assume the conditions in Theorem 3 hold. Then the set of strategies ûj, 0 ≤ j ≤ N, for the N + 1 players is an εN-Nash equilibrium, i.e., for 0 ≤ j ≤ N,
$$J_j(\hat u_j, \hat u_{-j}) - \varepsilon_N \le \inf_{u_j} J_j(u_j, \hat u_{-j}) \le J_j(\hat u_j, \hat u_{-j}),$$
where 0 ≤ εN → 0 as N → ∞ and uj is a centralized-information based strategy.

Proof. The theorem may be proven by following the usual argument in our pre-
vious work [12,10]. First, by using Theorem 3, we may approximate I (N ) (t) in
the original game by θ(t). Then the optimization problems of the major player
and any minor player are approximated by Problems (P0) and (P1), respec-
tively. Finally, it is seen that each player can gain little if it deviates from the
decentralized strategy determined from the mean field equation system. 

6 Concluding Remarks and Future Work

This paper considers a class of Markov decision processes involving a major


player and a large population of minor players. The players have independent
dynamics for fixed actions and have mean field coupling in their costs according
to the state distribution process of the minor players. We introduce a stochastic
difference equation depending on the state of the major player to characterize
the evolution of the minor players’ state distribution process in the infinite pop-
ulation limit and solve local Markov decision problems. This approach provides
decentralized stationary strategies and offers a low complexity solution.
This paper presents the main conceptual framework for decentralized decision making in the setting of Markov decision processes. The existence analysis and the associated computation of a solution to the mean field equation system are more challenging than in linear models. It is of interest to develop fixed point analysis to study the existence of solutions. Also, the development of iterative computation procedures for solutions is of practical interest.

References

1. Adlakha, S., Johari, R., Weintraub, G., Goldsmith, A.: Oblivious equilibrium for
large-scale stochastic games with unbounded costs. In: Proc. IEEE CDC 2008,
Cancun, Mexico, pp. 5531–5538 (December 2008)
2. Al-Najjar, N.I.: Aggregation and the law of large numbers in large economies.
Games and Economic Behavior 47(1), 1–35 (2004)
3. Buckdahn, R., Cardaliaguet, P., Quincampoix, M.: Some recent aspects of differ-
ential game theory. Dynamic Games and Appl. 1(1), 74–114 (2011)
4. Dogbé, C.: Modeling crowd dynamics by the mean field limit approach. Math.
Computer Modelling 52, 1506–1520 (2010)
5. Gast, N., Gaujal, B., Le Boudec, J.-Y.: Mean field for Markov decision processes:
from discrete to continuous optimization (2010) (Preprint)

6. Galil, Z.: The nucleolus in games with major and minor players. Internat. J. Game
Theory 3, 129–140 (1974)
7. Gomes, D.A., Mohr, J., Souza, R.R.: Discrete time, finite state space mean field
games. J. Math. Pures Appl. 93, 308–328 (2010)
8. Haimanko, O.: Nonsymmetric values of nonatomic and mixed games. Math. Oper.
Res. 25, 591–605 (2000)
9. Hart, S.: Values of mixed games. Internat. J. Game Theory 2, 69–86 (1973)
10. Huang, M.: Large-population LQG games involving a major player: the Nash cer-
tainty equivalence principle. SIAM J. Control Optim. 48(5), 3318–3353 (2010)
11. Huang, M., Caines, P.E., Malhamé, R.P.: Individual and mass behaviour in large
population stochastic wireless power control problems: centralized and Nash equi-
librium solutions. In: Proc. 42nd IEEE CDC, Maui, HI, pp. 98–103 (December
2003)
12. Huang, M., Caines, P.E., Malhamé, R.P.: Large-population cost-coupled LQG
problems with nonuniform agents: individual-mass behavior and decentralized ε-
Nash equilibria. IEEE Trans. Autom. Control 52(9), 1560–1571 (2007)
13. Huang, M., Caines, P.E., Malhamé, R.P.: The NCE (mean field) principle with
locality dependent cost interactions. IEEE Trans. Autom. Control 55(12), 2799–
2805 (2010)
14. Huang, M., Caines, P.E., Malhamé, R.P.: Social optima in mean field LQG control:
centralized and decentralized strategies. IEEE Trans. Autom. Control (in press,
2012)
15. Huang, M., Malhamé, R.P., Caines, P.E.: On a class of large-scale cost-coupled
Markov games with applications to decentralized power control. In: Proc. 43rd
IEEE CDC, Paradise Island, Bahamas, pp. 2830–2835 (December 2004)
16. Huang, M., Malhamé, R.P., Caines, P.E.: Nash equilibria for large-population linear
stochastic systems of weakly coupled agents. In: Boukas, E.K., Malhamé, R.P.
(eds.) Analysis, Control and Optimization of Complex Dynamic Systems, pp. 215–
252. Springer, New York (2005)
17. Jovanovic, B., Rosenthal, R.W.: Anonymous sequential games. Journal of Mathe-
matical Economics 17, 77–87 (1988)
18. Lasry, J.-M., Lions, P.-L.: Mean field games. Japan. J. Math. 2(1), 229–260 (2007)
19. Li, T., Zhang, J.-F.: Asymptotically optimal decentralized control for large popu-
lation stochastic multiagent systems. IEEE Trans. Automat. Control 53(7), 1643–
1660 (2008)
20. Ma, Z., Callaway, D., Hiskens, I.: Decentralized charging control for large popula-
tions of plug-in electric vehicles. IEEE Trans. Control Systems Technol. (to appear,
2012)
21. Nguyen, S.L., Huang, M.: Mean field LQG games with a major player: continuum-
parameters for minor players. In: Proc. 50th IEEE CDC, Orlando, FL, pp. 1012–
1017 (December 2011)
22. Nourian, M., Malhamé, R.P., Huang, M., Caines, P.E.: Mean field (NCE) formula-
tion of estimation based leader-follower collective dynamics. Internat. J. Robotics
Automat. 26(1), 120–129 (2011)
23. Tembine, H., Le Boudec, J.-Y., El-Azouzi, R., Altman, E.: Mean field asymptotics
of Markov decision evolutionary games and teams. In: Proc. International Confer-
ence on Game Theory for Networks, Istanbul, Turkey, pp. 140–150 (May 2009)
24. Tembine, H., Zhu, Q., Basar, T.: Risk-sensitive mean-field stochastic differential
games. In: Proc. 18th IFAC World Congress, Milan, Italy (August 2011)

25. Wang, B.-C., Zhang, J.-F.: Distributed control of multi-agent systems with random
parameters and a major agent (2012) (Preprint)
26. Weintraub, G.Y., Benkard, C.L., Van Roy, B.: Markov perfect industry dynamics
with many firms. Econometrica 76(6), 1375–1411 (2008)
27. Yin, H., Mehta, P.G., Meyn, S.P., Shanbhag, U.V.: Synchronization of coupled
oscillators is a game. IEEE Trans. Autom. Control 57(4), 920–935 (2012)
28. Yong, J., Zhou, X.Y.: Stochastic Controls: Hamiltonian Systems and HJB Equa-
tions. Springer, New York (1999)
Network Formation Game for Interference
Minimization Routing in Cognitive Radio Mesh
Networks

Zhou Yuan1, Ju Bin Song2, and Zhu Han1

1 Department of Electrical and Computer Engineering, University of Houston, Houston, TX, USA
2 Department of Electronics and Radio Engineering, Kyung Hee University, South Korea

Abstract. Cognitive radio (CR)-based wireless mesh networks (WMNs)


provide a very suitable framework for secondary users’ (SUs’) trans-
missions. When designing routing techniques in CR-WMNs, we need to
consider the aggregate interference from the SUs to PUs. Although the
interference from a single SU that is outside the PUs’ footprints is small,
the aggregate interference from a great number of SUs transmitting at
the same time may be significant, and this will greatly influence the PUs’
performance. Therefore, in this paper, we develop a distributed routing
algorithm using the network formation game to minimize the aggregate
interference from the SUs to the PUs. The proposed distributed algo-
rithm can avoid the problems in the centralized routing solution, such
as the high computation complexity and high information-gathering de-
lay. Simulation results show that the proposed framework can provide
better routes in terms of interference to the PUs compared to the Dijk-
stra’s shortest path algorithm, and the distributed solution shows near
optimum compared to the upper bound.

1 Introduction
Cognitive radio (CR) is a revolutionary technology that allows secondary users
(SUs) to occupy the idle licensed spectrum holes left by the primary users (PUs)
[1]. CR-based wireless mesh networks (WMNs) is dynamically self-organized
and self-configured, and the SUs (wireless mesh routers) have the capabilities to
automatically establish and maintain the mesh connections among themselves
avoiding the interference to the PUs [2–5].
Although there have been some work investigating routing problems in CR
networks, few in the literatures consider the aggregate interference to the PUs
from a large amount of SUs transmitting at the same time. Also the game the-
oretical approaches have been less investigated in the routing problems for the
CR networks. In this paper, we focus on the development of routing algorithms
for CR-WMNs to minimize the aggregate interference from the SUs to the PUs.
Note that we are not considering the interference between different secondary
nodes or between multiple paths, which has been well investigated in the idea


Fig. 1. Illustration of the CR-WMN model

of interference aware routing [6]. Instead, we are studying the aggregate inter-
ference from multiple SUs to the PUs in the CR networks. In CR-WMNs, the
secondary mesh nodes equipped with CR functionalities must be out of PUs’
footprint to avoid interference to the PUs, as long as they want to use the same
channels as the PUs’. Although the interference from a single SU (that is outside
the primary users’ footprint) is small, the aggregated interference from a large
number of SUs transmitting at the same time can be significant, and the performance of the PUs can be greatly influenced by this aggregate interference. We formulate the routing problem to minimize the aggregate interference
from the SUs to the PUs. We develop a distributed algorithm using the network
formation game framework and a myopic distributed algorithm [7]. From the
simulation results, we can see that the proposed distributed algorithm can pro-
duce better routes in terms of interference to the PUs compared to Dijkstra’s
algorithm. Also the distributed solution shows near optimum compared to the
upper bound.
The remainder of this paper is organized as follows. In Section 2, the CR-
WMN model is introduced. In Section 3, we provide the formulation of the
distributed routing algorithm. Section 4 presents the simulation results, and
Section 5 concludes the paper.

2 Cognitive Radio Wireless Mesh Network Model

In CR-WMNs, the wireless routers work as the SUs, which have the capabilities
to sense the spectrum and access the idle spectrum holes left by the PUs. The SUs
can employ the spectrum sensing techniques, such as radio identification based

sensing or the spectral correlation algorithm, to detect the available spectrum


left by the PUs [9]. We define N as the set of SUs in the CR-WMNs, and each
router i ∈ N . E is the set of direct links, and fe represents the flow on direct
link e ∈ E. If two SUs are in each other’s transmission range, we define the link
between these two nodes as a direct link. Otherwise, the link is called indirect
link, in which intermediate nodes along the link are required to relay packets.
ci,j is defined as the capacity of direct link e = (i, j), and it can be calculated using
$$c_{i,j} = W \log_2\Big(1 + \frac{P_i d_{i,j}^{-\alpha} h}{N_j + \Gamma}\Big),$$
where W represents the bandwidth, Pi is the
transmission power of node i, di,j is the distance between nodes i and j, α is the path loss constant, and h is the channel response, modeled as a circularly symmetric complex Gaussian random variable. Nj and Γ represent the AWGN noise and the interference from other nodes, respectively. We also define an indicator
Xi,j , which is set to 1 only if the link e = (i, j) is active. Fig. 1 illustrates the CR-
WMN model, and we can see that the big circle represents the PU’s footprint.
Solid lines between SUs represent the links that are connected and dashed lines
are the links that have no connections. If the licensed spectrum is occupied by
the PU, secondary users, such as SU4, which are inside the PUs’ footprint, are
not allowed to access the spectrum. Therefore, we will have Xi,j = 0, Xp,j = 0,
and Xj,q = 0. In contrast, if SUs are out of the PU's footprint, such as SUs i, p, and q, they are allowed to access the spectrum, since the interference from a single secondary user is sufficiently low. Consequently, we have Xi,p = 1 and Xp,q = 1, showing that these SUs can access the spectrum because they are out of the PU's footprint.
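As a quick numeric illustration of the capacity formula above (with arbitrary parameter values, and with the random channel response h replaced by a fixed real gain for simplicity):

```python
# Direct-link capacity c_{i,j} = W log2(1 + P_i d_{i,j}^{-alpha} h / (N_j + Gamma)).
import math

def link_capacity(W, P_i, d_ij, alpha, h, N_j, Gamma):
    """Capacity of direct link e = (i, j) in bits/s."""
    snr = P_i * d_ij ** (-alpha) * h / (N_j + Gamma)
    return W * math.log2(1.0 + snr)

# Example: 20 MHz bandwidth, 100 mW transmit power, 50 m link, alpha = 2.
print(link_capacity(W=20e6, P_i=0.1, d_ij=50.0, alpha=2.0,
                    h=1.0, N_j=1e-9, Gamma=1e-9))
```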

2.1 Routing with Minimum Aggregate Interference to the PUs in CR-WMNs
A single SU may produce sufficiently low interference to the PUs when the distance between itself and the primary users is sufficiently large. Nevertheless, when the number of SUs increases and a large number of SUs transmit at the same time, the aggregate interference from the SUs to the PUs can be significant.
interference. The concept of interference temperature can be considered to model
the interference level in CR-WMNs [8]. In this paper, we use the generalized
interference temperature model TI , i.e.,
$$T_I(f_c, B) = \frac{P_I(f_c, B)}{kB}, \qquad (1)$$
where PI (fc , B) is the average interference power in Watts centered at frequency
fc , covering a bandwidth of B in Hertz. Boltzmann’s constant k is 1.38 × 10−23
Joules per Kelvin degree.
In the example shown in Fig. 1, the interference temperature level of SU2 is
lower than that of SU1, considering the fact that SU1 is located closer than SU2
to the PU. When the SU i and SU q want to communicate with each other, we
should choose the path of i → SU2 → q instead of i → SU1 → q.

2.2 Transmission Range and Interference Range

The transmission power of SU i is denoted as Pi. We define the channel gain between two secondary nodes i and j as $G_{i,j} = \beta d_{i,j}^{-\alpha}$, where α is the path loss constant, β is a constant related to antenna design, and di,j is the distance between SU i and SU j. We define a threshold ρT: data is considered successfully transmitted only if the received power is higher than ρT. We also assume that the interference from a single secondary mesh node is sufficiently low when the received power at the PUs is smaller than another threshold ρI. Therefore, the transmission range of SU i can be calculated as $R_T^i = (\beta P_i/\rho_T)^{1/\alpha}$. In the same way, the interference range of secondary node i is $R_I^i = (\beta P_i/\rho_I)^{1/\alpha}$.
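A short sketch evaluating the interference temperature of Eq. (1) together with the two ranges just derived; β, the transmit power, and the thresholds are illustrative values chosen only so that the ranges match the 90 m / 180 m settings used later in Section 4.

```python
# Interference temperature (1) and the transmission/interference ranges of Sec. 2.2.
K_BOLTZMANN = 1.38e-23   # Joules per Kelvin

def interference_temperature(P_I, B):
    """T_I(f_c, B) = P_I(f_c, B) / (k B), eq. (1)."""
    return P_I / (K_BOLTZMANN * B)

def transmission_range(beta, P_i, rho_T, alpha):
    """R_T^i = (beta * P_i / rho_T)^(1/alpha)."""
    return (beta * P_i / rho_T) ** (1.0 / alpha)

def interference_range(beta, P_i, rho_I, alpha):
    """R_I^i = (beta * P_i / rho_I)^(1/alpha)."""
    return (beta * P_i / rho_I) ** (1.0 / alpha)

# rho_I < rho_T, so the interference range exceeds the transmission range.
print(transmission_range(beta=1.0, P_i=0.1, rho_T=0.1 / 90**2, alpha=2.0))   # 90.0
print(interference_range(beta=1.0, P_i=0.1, rho_I=0.1 / 180**2, alpha=2.0))  # 180.0
print(interference_temperature(P_I=1e-12, B=6e6))                            # Kelvin
```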

3 Distributed Routing Algorithm Using Network Formation Game

In this section, we propose a distributed routing algorithm for CR-WMNs us-


ing the network formation game. Compared to the centralized routing solution, which may suffer from problems such as the high cost of building centralized coordinating nodes, high information-gathering delay, and system breakdown caused by possible failures at the centralized nodes, the network formation based distributed routing algorithm can significantly reduce the system overhead and the computation complexity.

3.1 Game Formulation

Network formation games provide a suitable framework to model the interac-


tions among the SUs in CR-WMNs when they are trying to form the routes [7].
Network formation games model problems that involve a number of players interacting with each other in order to form a suitable graph that connects them. Depending on the objectives and incentives of the players in the network formation game, a final network graph G is formed based on the interactions between the players and their decisions. Therefore, we can model the routing problem in CR-WMNs as a network formation game in which the SUs are the players. The
result of the game will be a directed graph G(N, E). N = {1, ..., N } is defined as
the set of all secondary nodes, and E denotes the set of edges between the SUs.

Definition 1. A path between two SUs i and j in G can be defined as a sequence


of SUs i1 ,...,iK such that i1 = i, iK = j, and each directed link (ik , ik+1 ) ∈ G
for each k ∈ {1, . . . , K − 1}. We denote Vi as the set of all paths from SU i to
the destination of SU i, denoted as Di , and thus |Vi | represents the number of
paths from SU i to destination Di .

Convention 1: Each destination Di is connected to its source through at least


one path. Therefore, we can have |Vi | ≥ 1, ∀i ∈ N .

We need to define the strategy for each player in the game. The strategy of SU
i is to select the link that it wants to form from its strategy space, which can be
defined as the SUs in N that SU i is able to and wants to connect to. We want to
set a rule that player i cannot connect to player j which is already connected to i.
This means that if a link (j, i) ∈ G, then link (i, j) cannot be in G. Formally, for
a current network graph G, let $A_i = \{j \in N\setminus\{i\} \mid (j,i) \in G\}$ be the set of nodes from which node i has accepted a link (j, i), and let $S_i = \{(i,j) \mid j \in N\setminus(\{i\} \cup A_i)\}$ be the set of links corresponding to the nodes with which node i may connect.
Consequently, the strategy of player i is to select the link si ∈ Si that it wants
to form by choosing the player that it wants to connect to.

3.2 Utility
The players try to make decisions for utility maximization. Given a network
graph G and a selected strategy si for any player i ∈ N , the utility of player i
can be expressed as
$$u_i(G) = -B_e^1 B_e^2 \times \frac{f_{i,\mathrm{nexthop}}}{c_{i,\mathrm{nexthop}}} \times T_I^i, \qquad (2)$$
where $B_e^1$ and $B_e^2$ are the barrier functions, $T_I^i$ is node i's interference temperature, $f_{i,\mathrm{nexthop}}$ is the flow on the edge between node i and its next hop, and $c_{i,\mathrm{nexthop}}$ represents the capacity of the same edge.
We know that the flow on each edge should be smaller than the link capacity, i.e., fe ≤ ce, ∀e ∈ E. In addition, the outgoing flow should be equal to the sum of the incoming flow and the generated traffic. Therefore, we have
$$l_j + \sum_{e=(i,j)\in E} f_e = \sum_{e=(j,i)\in E} f_e,$$
where lj represents the generated traffic of secondary node j. This is the flow conservation constraint. We assume that lj consists of only generated traffic when there is no incoming traffic from the wired Internet. The barrier functions that capture the above two constraints can be defined as
$$B_e^1 = \left(\frac{1}{1 - \frac{f_e}{c_e} + \varepsilon_1}\right)^{\kappa_1}, \qquad (3)$$
and
$$B_e^2 = \left(\frac{1}{1 - \frac{l_j + \sum_{e=(i,j)\in E} f_e}{\sum_{e=(j,i)\in E} f_e} + \varepsilon_2}\right)^{\kappa_2}, \qquad (4)$$
where ε1 and ε2 are two small dummy constants that keep the denominators away from zero, and κ1 and κ2 are set to be greater than 0 in order to weight the different constraints. When a constraint is close to being violated, the value of the corresponding barrier function becomes large. Therefore, in the proposed utility function, the barrier functions ensure that the two constraints are satisfied while the interference to the PUs is minimized.
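The following sketch shows how a node might evaluate the utility (2) with the barriers (3)-(4). The flow, capacity, and interference-temperature inputs are placeholders; the ε and κ defaults follow the simulation settings reported later in Section 4.

```python
# Utility (2) with barrier functions (3)-(4); inputs are illustrative placeholders.
def barrier1(f_e, c_e, eps1=1.5, kappa1=0.01):
    """B_e^1: grows as the flow f_e approaches the link capacity c_e, eq. (3)."""
    return (1.0 / (1.0 - f_e / c_e + eps1)) ** kappa1

def barrier2(l_j, inflow, outflow, eps2=1.5, kappa2=0.01):
    """B_e^2: grows as flow conservation is close to being violated, eq. (4)."""
    return (1.0 / (1.0 - (l_j + inflow) / outflow + eps2)) ** kappa2

def utility(T_I_i, f_next, c_next, l_j, inflow, outflow):
    """u_i(G) = -B_e^1 B_e^2 (f/c) T_I^i, eq. (2): more negative for nodes with
    higher interference temperature or more heavily loaded next-hop links."""
    return -(barrier1(f_next, c_next) * barrier2(l_j, inflow, outflow)
             * (f_next / c_next) * T_I_i)

print(utility(T_I_i=1.3, f_next=10e6, c_next=24e6,
              l_j=4e6, inflow=6e6, outflow=10e6))
```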

3.3 Proposed Algorithm for Network Formation Game


We now design an algorithm of interactions that forms the network graph based on the utility function. When SU i plays a strategy si ∈ Si and all other SUs keep their strategies $s_{-i} = [s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_N]$, we obtain the graph $G_{s_i, s_{-i}}$. Each player i wants to select the strategy si = (i, j) ∈ Si that maximizes its utility. We define the best response for any player as follows:
Definition 2. A strategy $s_i^* \in S_i$ is a best response for player i ∈ N if $u_i(G_{s_i^*, s_{-i}}) \ge u_i(G_{s_i, s_{-i}})$ for all si ∈ Si. Therefore, given that the other nodes maintain their strategies, the best response for player i is to choose the strategy that maximizes its utility.
Subsequently, a distributed formation of the network graph is proposed. We assume that the network is dense enough. We also consider each node to be myopic, meaning that each player only considers the current state of the network: when improving its utility, it does not take the future evolution of the network into account. In this paper, we propose a myopic network
formation algorithm consisting of two phases: a fair prioritization phase and a
network formation phase. In the fair prioritization phase, we develop a priority
function that assigns a priority to each node. In the network formation phase, the
players interact to select the next hop to this destination by increasing priority.
In the fair prioritization phase, the node with a higher interference to the
PUs is assigned a higher priority. The objective of the prioritization is to make
the SUs that produce high interference to the PUs have an advantage in the
selection of their path towards their destinations. Therefore, those players can
have a larger chance to improve their performances because they are allowed to
select their partners with a larger space of strategies. In addition, we need to
mention that we can also use other priority functions. In fact, in the simulation
results, we use a random priority function for a general case.
In the myopic network formation phase, the secondary nodes start to select
their strategies based on the priorities defined in the fair prioritization phase.
Given the current network graph resulting from the strategies of the other play-
ers, player i plays its best response s∗i ∈ Si in order to maximize its utility at
each round. Every node replaces its current link to the destination with another link that maximizes its utility; therefore, the best response action is a link replacement operation. In order to find the best response, each node engages in pairwise negotiations with the other nodes. Once the negotiations are completed, the node selects the strategy that maximizes its payoff. Finally, a graph G is formed after convergence, in which no player can improve its utility by deviating from its best response.
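A compact sketch of the two-phase procedure just described is given below; the priority, strategy-space, and utility callbacks are assumptions standing in for the pairwise negotiations, and termination with no link replacement corresponds to reaching a Nash network (Definition 3).

```python
# Myopic network formation: fair prioritization, then best-response rounds.
def form_network(nodes, priority, strategy_space, utility, max_rounds=1000):
    order = sorted(nodes, key=priority, reverse=True)   # fair prioritization phase
    links = {i: None for i in nodes}                    # current next-hop choices
    for _ in range(max_rounds):                         # myopic formation phase
        changed = False
        for i in order:
            # Best response: the feasible link replacement maximizing u_i,
            # with all other nodes' strategies held fixed.
            best = max(strategy_space(i, links), key=lambda j: utility(i, j, links))
            if best != links[i]:
                links[i], changed = best, True
        if not changed:            # no node can improve: a Nash network is reached
            return links
    return links
```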
Definition 3. A network graph G in which no player i can improve its utility
by a unilateral change in its strategy si ∈ Si is a Nash network.
From the definition above, we can see that when the links chosen by each node are best responses, a Nash network is formed. In a Nash network, no node is able to improve its utility by unilaterally changing its current strategy, which means that the nodes are at a Nash equilibrium. Consequently, we have $u_i(G_{s_i^*, s_{-i}}) \ge u_i(G_{s_i, s_{-i}})$ for all si ∈ Si and any i ∈ N.

Fig. 2. A simulation result showing the network routing obtained by the distributed algorithm in a 250-by-250 meter area

Theorem 1. In the game with finitely many nodes, there exists a Nash network G∗.

After solving the network formation algorithm and obtaining the whole network topology, the source node may have several candidate routes to the destination, as noted in Convention 1. However, if we select a route that is very far away from the primary users, which may produce significantly lower interference to the primary users, we may incur a large delay along this route. Therefore, we need a tradeoff between the cumulative delay and the aggregate interference. In order to make sure that the interference to the PUs is low enough without increasing the delay too much, we select a route subject to the constraint
$$D_{\mathrm{total}} \le D_\tau, \qquad (5)$$
where Dtotal represents the total delay along the route, and Dτ is the threshold. Note that different source and destination pairs may have different values of the delay threshold. Given the constraint in Eq. (5), the source then selects the route with the lowest aggregate interference to the PUs.
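A sketch of this delay-constrained selection, reusing the interference temperature values quoted later for the Fig. 2 example; hop count stands in for delay, and the candidate route set is hypothetical.

```python
# Route selection under the delay constraint (5), then minimum interference.
def select_route(candidates, delay, interference, D_tau):
    feasible = [r for r in candidates if delay(r) <= D_tau]
    if not feasible:
        raise ValueError("no route satisfies the delay threshold D_tau")
    return min(feasible, key=interference)

routes = {("i", "SU1", "q"): 1.6195, ("i", "SU2", "q"): 1.3354}
best = select_route(list(routes), delay=len, interference=routes.get, D_tau=4)
print(best)   # ('i', 'SU2', 'q'): the lower-interference route within the budget
```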

4 Simulation Results and Discussions


In this section, we present the simulation results for the network formation game
based distributed routing algorithm for CR-WMNs. We consider that the nodes
are deployed in a 250-by-250 meter area. The value of the path loss constant is 2.

Fig. 3. Number of secondary nodes vs. normalized interference to the primary users, for the network formation algorithm, Dijkstra's algorithm, and the centralized algorithm

We assume that link capacities depend only on the distance between the two players, to simplify the problem. The data rate is 48 Mbps within a distance of 32 m, 36 Mbps within 37 m, 24 Mbps within 45 m, 18 Mbps within 60 m, 12 Mbps within 69 m, 9 Mbps within 77 m, and 6 Mbps within 90 m [11]. The maximum interference range RI is 180 m, and the maximum transmission range RT is 90 m. The number of nodes in the network may change, and we consider random topologies for the simulation. We generate a data set of 1,000 instances for the simulation. For every instance, the traffic generated by each node and the location of the gateway are randomly generated.
Fig. 2 shows the simulation results for the proposed distributed routing algorithm. We use a random priority in the fair prioritization phase for a general case. The big dot represents a PU, with the sector area as the PU's footprint. The other dots are 50 SUs, and the SUs inside the PU's footprint are forced to turn off because the spectrum is occupied by the PU. We also mark the source and destination nodes in Fig. 2. Applying the proposed distributed interference minimization routing algorithm yields the route shown by the dashed arrows. If we use Dijkstra's shortest path algorithm, which does not consider the aggregate interference to the PU, the solid route is obtained. The interference temperature values at the primary user for the solid and dashed routes are 1.6195 and 1.3354, respectively. Clearly, the solid route produces higher interference to the PU than the dashed route, since the nodes on the solid route are closer to the PU.
Now we compare the performance of the proposed distributed algorithm against the upper bound, which can be achieved using the centralized routing algorithm proposed in [12]. Fig. 3 shows the simulation results for the interference comparison with different numbers of SUs. ε1 and ε2 are both set to 1.5, and κ1 and κ2 are 0.01; we choose small κ values to avoid the cost function changing too fast. The delay threshold is set to twice the delay obtained by Dijkstra's algorithm. The solid line represents the simulated performance of the distributed network formation algorithm. The dashed line is the centralized

Fig. 4. Distance between secondary nodes vs. normalized delay, for Dijkstra's algorithm, the network formation algorithm, and the centralized algorithm

Fig. 5. Comparison of the aggregate interference to the primary users given different delay thresholds (Dτ1 = 4Ds, Dτ2 = 3Ds, Dτ3 = 2Ds) vs. the number of secondary nodes

solution, and it performs better than the distributed approach, as expected. The distributed solution is near-optimal compared with the centralized interference minimization solution, producing about 1.0098 times the interference of the centralized algorithm; that is, it is 99.02% efficient relative to the upper bound. The black dashed line is the result of Dijkstra's algorithm, which does not consider the aggregate interference to the PUs; it produces the highest interference of the three solutions. Moreover, with an increasing number of SUs, the interference to the PUs increases in Fig. 3. Note that we compare the proposed algorithms only with Dijkstra's shortest path algorithm because most other existing routing algorithms for CR networks do not consider the aggregate interference to the PUs.
Fig. 4 shows the comparison of delay between the proposed distributed algorithm and the upper bound. For simplicity, the delay is defined as the number of hops. We can see that as the distance between SUs increases, the total delay increases, which is consistent with the results in Fig. 3. In addition, the centralized algorithm yields slightly higher delay than the distributed network formation algorithm, since the network formation algorithm produces slightly higher interference than the centralized algorithm. Moreover, Dijkstra's algorithm performs best in terms of delay, since it does not consider the aggregate interference to the PUs and always finds the shortest path.
If we do not set the delay threshold of Eq. (5) in a large area with a significantly large number of SUs, the route will be very long with large delay, even though the aggregate interference to the PUs is decreased. This is not acceptable, so we use the delay threshold to constrain the route. In Fig. 5, we show the performance comparison between different delay thresholds using the distributed network formation algorithm, where Ds represents the delay of the route between the source and destination under Dijkstra's algorithm. We can see in Fig. 5 that a higher delay threshold yields a longer path with lower aggregate interference to the primary user. With a higher delay threshold, although the path we find is longer, with more secondary nodes and farther away from the primary user, the aggregate interference decreases exponentially with distance, which is much faster than the linear increase in the number of nodes on the route.

5 Conclusion

In this paper, we develop a distributed routing algorithm using a network formation game in CR-WMNs. In CR-WMNs, although the interference from a single SU to the PUs is small, the aggregate interference from a large number of SUs transmitting at the same time can be significant, which will influence the PUs' performance. Therefore, we develop a distributed routing algorithm using the network formation game framework to minimize the aggregate interference to the PUs, which is practically implementable. Simulation results show that the proposed scheme finds better routes in terms of interference to the PUs compared to the shortest path scheme. We also compare the performance of the distributed optimization algorithm with an upper bound and validate its efficiency: the distributed solution is near-optimal compared to the centralized solution, providing 99.02% of the efficiency of the upper bound.

References
1. Hossain, E., Niyato, D., Han, Z.: Dynamic Spectrum Access in Cognitive Radio
Networks. Cambridge University Press, UK (2009)
2. Chowdhury, K.R., Akyildiz, I.F.: Cognitive Wireless Mesh Networks with Dynamic
Spectrum Access. IEEE Journal on Selected Areas in Communications 26(1), 168–
181 (2008)
3. Ileri, O., Samardzija, D., Sizer, T., Mandayam, N.B.: Demand Responsive Pricing
and Competitive Spectrum Allocation Via a Spectrum Server. In: Proc. IEEE
Symposium on New Frontiers in Dynamic Spectrum Access Networks, Baltimore,
MD, US, November 8-11, pp. 194–202 (2005)
4. Etkin, R., Parekh, A., Tse, D.: Spectrum Sharing For Unlicensed Bands. In: Proc.
IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks, Bal-
timore, MD, US, November 8-11, pp. 251–258 (2005)

5. Kim, D.I., Le, L.B., Hossain, E.: Joint Rate and Power Allocation for Cognitive
Radios in Dynamic Spectrum Access Environment. IEEE Transactions on Wireless
Communications 7(12), 5517–5527 (2008)
6. Parissidis, G., Karaliopoulos, M., Spyropoulos, T., Plattner, B.: Interference-Aware
Routing in Wireless Multihop Networks. IEEE Transactions on Mobile Comput-
ing 10(5), 716–733 (2011)
7. Saad, W., Han, Z., Debbah, M., Hjorungnes, A., Basar, T.: A Game-Based Self-
Organizing Uplink Tree for VoIP Services in IEEE 802.16j Networks. In: Proc.
IEEE International Conference on Communications, Dresden, Germany (June
2009)
8. Clancy, T.C.: Achievable Capacity Under the Interference Temperature Model. In:
Proc. IEEE International Conference on Computer Communications, Anchorage,
AK, US, pp. 794–802 (May 2007)
9. Yucek, T., Arslan, H.: A Survey of Spectrum Sensing Algorithms for Cognitive Ra-
dio Applications. IEEE Communications Surveys and Tutorials 11, 116–130 (2009)
10. Han, Z., Liu, K.J.R.: Resource Allocation For Wireless Networks: Basics, Tech-
niques, and Applications. Cambridge University Press, UK (2008)
11. IEEE 802.11: Wireless LAN Medium Access Control (MAC) and Physical Layer
(PHY) Specifications
12. Yuan, Z., Song, J.B., Han, Z.: Interference Minimization Routing and Scheduling
in Cognitive Radio Wireless Mesh Networks. In: IEEE Wireless Communications
and Networking Conference, Sydney, Australia, pp. 1–6 (April 2010)
Noncooperative Games for Autonomous
Consumer Load Balancing over Smart Grid

Tarun Agarwal and Shuguang Cui

Department of Electrical and Computer Engineering,


Texas A&M University, College Station, TX 77843-3128
{atarun,cui}@tamu.edu

Abstract. Traditionally, most consumers of electricity pay for their consumption at a fixed rate. With the advancement of Smart

sumptions according to a fixed rate. With the advancement of Smart
Grid technologies, large-scale implementation of variable-rate metering
becomes more practical. As a result, consumers will be able to control
their electricity consumption in an automated fashion, where one pos-
sible scheme is to have each individual maximize its own utility as a
noncooperative game. In this paper, noncooperative games are formu-
lated among the electricity consumers in Smart Grid with two real-time
pricing schemes, where the Nash equilibrium operation points are investi-
gated for their uniqueness and load balancing properties. The first pricing
scheme charges a price according to the average cost of electricity borne
by the retailer and the second one charges according to a time-variant
increasing-block price, where for each scheme, a zero-revenue model and
a constant-rate revenue model are considered. The Nash equilibrium is
shown to exist for four different combined cases corresponding to the two
pricing schemes and the two revenue models, and is unique for three of
the cases under certain conditions. It is further shown that both pric-
ing schemes lead to similar electricity loading patterns when consumers
are only interested in minimizing the electricity costs without any other
profit considerations. Finally, the conditions under which the increasing-
block pricing scheme is preferred over the average-cost based pricing
scheme are discussed.

Keywords: Game Theory, Noncooperative Game, Nash Equilibrium,


Smart Grid, Real Time Pricing, Increasing-Block Pricing.

1 Introduction
In the traditional power market, electricity consumers usually pay a fixed retail
price for their electricity usage. This price only changes on a seasonal or yearly
basis. However, it has been long recognized in the economics community that
charging consumers a flat rate for electricity creates allocative inefficiencies, i.e.,
consumers do not pay equilibrium prices according to their consumption levels
[1]. This was shown through an example in [2], which illustrates how flat pricing
causes deadweight loss at off-peak times and excessive demand at the peak times.
The latter may lead to small-scale blackouts in a short run and excessive capacity


buildup over a long run. As a solution, variable-rate metering that reflects the
real-time cost of power generation can be used to influence consumers to defer
their power consumption away from the peak times. The reduced peak-load can
significantly reduce the need for expensive backup generation during peak times
and excessive generation capacity.
The main technical hurdle in implementing real-time pricing has been the
lack of cost-effective two-way smart metering, which can communicate real-time
prices to consumers and their consumption levels back to the energy provider.
In addition, the claim of social benefits from real-time pricing also assumes that
the consumer demand is elastic and responds to price changes while traditional
consumers do not possess the equipments that enable them to quickly alter their
demands according to the changing power prices. Significant research efforts
on real-time pricing have involved estimating the consumer demand elasticity
and the level of benefits that real time pricing can achieve [1, 3, 4]. Fortunately,
the above requirements on smart metering and consumer adaptability are being
fulfilled [5] as technology advances in cyber-enabled metering, power generation,
power storage, and manufacturing automation, which is driven by the need for a
Smart Grid.
Such real-time pricing dynamics have been studied in the literature mainly
with game theory [6–8]. In particular, the authors in [6] provided a design mech-
anism with revelation principle to determine the optimal amount of incentive
that is needed for the customers to be willing to enter a contract with the utility
and accept power curtailment during peak periods. However, they only consid-
ered a fixed pricing scheme. In [7], the authors studied games among consumers
under a certain class of demand profiles at a price that is a function of day long
aggregate cost of global electricity load of all consumers. However, the case with
real-time prices was not investigated in [7]. In [8], a noncooperative game was
studied to tackle the real-time pricing problem, where the solution was obtained
by exploring the relationship with the congestion games and potential games.
However, the pricing schemes that we study are not amenable to transformations
described in [8].
In this paper we formulate noncooperative games [9,10] among the consumers
with two real-time pricing schemes under more general load profiles and revenue
models. The first pricing scheme charges a price according to the instantaneous
average cost of electricity production and the second one charges according to
a time-varying version of increasing-block price [11]. We investigate consumer
demands at the Nash equilibrium operation points for their uniqueness and load
balancing properties. Furthermore, two revenue models are considered for each
of the schemes, and we show that both pricing schemes lead to similar electricity
loading patterns when consumers are interested only in the minimization of
electricity costs. Finally we discuss the conditions under which the increasing-
block pricing scheme is preferred over the average-cost based pricing scheme.
The rest of the paper is organized as follows. The system model and for-
mulation of the noncooperative game are presented in Section 2. The game is
analyzed with different real-time pricing schemes under different revenue models

in Sections 3 and 4, where the Nash equilibrium properties are investigated. We


conclude the paper in Section 5.

2 System Model and Game Formulation


2.1 System Model
We study the transaction of energy between a single electricity retailer and mul-
tiple consumers. In each given time slot, each consumer has a demand for electric
energy (measured in Watt-hour, Wh). The job of the retailer is to satisfy de-
mands from all the consumers. The electricity supply of the retailer is purchased
from a variety of sources over a wholesale electricity market and the retailer
may possess some generation capacity as well. These sources may use different
technologies and fuels to generate electricity, which leads to different marginal
costs of electricity at the retailer, where the marginal cost is the incremental
cost incurred to produce an additional unit of output [12]. Mathematically, the
marginal cost function is expressed as the first derivative of the total cost func-
tion. Examples of the marginal cost function and the corresponding total cost
are presented in Fig. 1(a) and Fig. 1(b), respectively, which are based on real
world data from the wholesale electricity market [3]. Naturally, the retailer at-
tempts to satisfy demands by procuring the cheapest source first. This results
in a non-decreasing marginal cost of the supply curve, as illustrated through
the example in Fig. 1(a). The retailer charges each consumer a certain price for
its consumption in order to cover the cost, where the sum payments by all the
consumers should be enough to cover the total cost and certain profit margin
set by the retailer or regulatory body. In our model we assume that all these are
incorporated within the marginal cost of electricity.
While the retailer aims to procure sufficient supply to meet the sum demand of
its consumers in each time slot, in reality, the supply is limited by the generation
capacity available in the wholesale electricity market. Thus, the maximum sum
load that the retailer can service bears an upper limit and we model this capacity
limit by setting the marginal cost of electricity to infinity when the sum load
exceeds a predetermined threshold. Each consumer has an energy demand in
each time slot and it pays the retailer at a price that is set by the retailer such
that, in each time slot, the sum of payments made by all consumers meets the
total cost in that slot. As such, a particular consumer’s share of this bill depends
on the retailer’s pricing scheme, which is a function of the demands from all
the consumers. Accordingly, as the total load varies over time, each consumer
operates over a time-variant price with time-slotted granularity. We assume that
each consumer has a total demand for electricity over each day1 , which can be
distributed throughout the day in a time-slotted manner, to maximize a certain
utility function. Next, we model such individual load balancing behaviors as a
noncooperative game.
1
Here we adopt one day as an operation period that contains a certain number of
time slots. Obviously, such a choice has no impact on the analytical results in this
paper.

(a) Marginal cost ($/MWh) as a function of quantity supplied (MWh)
(b) Total cost ($) as a function of quantity supplied (MWh)

Fig. 1. A hypothetical marginal cost of supply and the corresponding total cost curve as seen by the retailer in the wholesale market within a single time slot. Supply is from five different sources: hydroelectric, nuclear, coal, natural gas, and oil. Two different generators may use different technologies for power generation, thus incurring different marginal costs with the same fuel (e.g., the two different cost levels for oil in Fig. 1(a)).

2.2 Noncooperative Load Balancing Game


The noncooperative game between these consumers is formulated as follows.
Consider a group of N consumers, who submit their daily demands to a retailer in

a time-slotted pattern at the beginning of the day (which contains T time slots).
These consumers are selfish and aim to maximize their individual utility/payoff
functions; hence they do not cooperate with each other to manage their demands.
Each consumer i has a minimum total daily requirement of energy, βi ≥ 0, which
is split over the T time slots. Let $x_t^i$ denote the i-th consumer's demand in the t-th time slot. A consumer can demand any value $x_t^i \ge 0$ (non-negativity constraint) with $\sum_t x_t^i \ge \beta_i$ (demand constraint). Let $x^i = \{x_1^i, x_2^i, \ldots, x_t^i, \ldots, x_T^i\}$ represent the i-th consumer's demand vector, which is called the strategy of the i-th consumer. Let $x_t = \{x_t^1, \ldots, x_t^N\}$ represent the demand vector from all consumers in time slot t, with $x_t = \sum_i x_t^i$. Let x represent the set $\{x^1, \ldots, x^N\}$.
The payoff or utility for consumer i is denoted by π i which is the difference
between the total revenue it generates from the purchased electricity and its
cost. In particular, let Eti , a function of xit , represent the revenue generated by
the ith consumer in the tth time slot and Mti , a function of xt , represent its
payment to the retailer for purchasing xit . Then the payoff π i , to be maximized
by consumer i, is given by
  
πi = Eti − Mti .
t∈{1,...,T }

Since Mti is a function of xt , we see that the consumer payoff is influenced by


its load balancing strategy and those of other consumers.
We consider the problem of maximizing the payoff at each consumer by de-
signing the distributed load balancing strategy xi ’s, under two real-time pricing
schemes set by the retailer. The first one is the average-cost based pricing scheme
and the second one is the increasing-block pricing scheme. Specifically, for the
first scheme the retailer charges the consumers the average cost of electricity
procurement that is only dependent on the sum demands, xt , from all the con-
sumers. For the second scheme, the retailer charges according to a marginal cost
function that depends on the vector of demands from all consumers, xt .
Let C(x) represent the cost of x units of electricity, to the retailer, from the
wholesale market (an example function is plotted in Fig. 1(b)). Then under the
average-cost based pricing, the price per unit charged to the consumers is given
by
A(xt ) = C(xt )/xt , (1)
and at time t consumer i pays

Mti = xit A(xt ) (2)



for consuming xit units of electricity. It is easy to see that i Mti = C(xt ), i.e.,
with average-cost based pricing the total payment made by the consumers covers
the total cost to the retailer. Note that C  (xt ) gives the marginal cost function
in the wholesale market, henceforth denoted by C(xt ) = C  (xt ) in the context of
increasing-block pricing (an example marginal cost curve is plotted in Fig. 1(a)).
For reasons we discussed earlier, in the context of electricity market, the marginal
cost C(xt ) is always non-negative and non-decreasing such that C(xt ) is always
168 T. Agarwal and S. Cui

positive, non-decreasing, and convex. Briefly, we note that as the retailer capacity
is constrained by a predetermined upper limit U , we model this constraint as
C(xt ) = ∞, ∀xt > U ; obviously xit ≤ U is an implicit constraint on the demand
xit for any rational consumer.
The second scheme is a time-variant version of the increasing-block pric-
ing scheme [11]. With a typical increasing-block pricing scheme, consumer i
is charged a certain rate b1 for its first z1 units consumed, then charged rate
b2 (> b1 ) for additional z2 units, and charged rate b3 (> b2 ) for additional z3
units, and so on. The b’s and z’s describe the marginal cost price for the com-
modity. In our scheme we design a marginal cost function, which retains the
increasing nature of increasing-block pricing, such that it depends on xt and the
function C(·). Consumer i pays an amount determined by the marginal cost func-
tion M(x, xt ), applicable to all consumers at time slot t. In particular consumer
i pays
 xit
Mti = M(x, xt )dx (3)
0

for consuming xit units of electricity where M(·) is chosen as


⎛ ⎞

M(x, xt ) = C ⎝ min (x, xjt )⎠ ,
j


such that i Mti = C(xt ) is satisfied. An intuition behind this pricing scheme
is to penalize consumers with relatively larger demands. Note that in this case,
xit ≤ U is implicitly assumed by letting C(·) = ∞ ∀xit > U and hence Mti =
∞ ∀xit > U .
For each of the two pricing schemes, we study two different revenue models.
For the first one we set Eti as zero for all consumers over all time slots, which
leads to payoff maximization being the same as cost minimization from the point
of view of the consumers. For the second one we assign consumer i a constant
revenue rate φit at each time slot t, which gives Eti = φit xit and leads to payoff
maximization being the same as profit maximization.

3 Nash Equilibrium with Average-Cost Pricing

For the average-cost pricing, the payment to the retailer in slot t by consumer i
is given by (2).

3.1 Zero-Revenue Model

In this case the revenue is set to zero as Eti = 0, which results in payoff maxi-
mization being the same as cost minimization
 for each consumer. Specifically, the
payoff for consumer i is given by π i = − t Mti . The consumer load balancing
Noncooperative Games for Autonomous Consumer Load Balancing 169

problem for consumer i, for i = 1, . . . , N , is given by the following optimization


problem:

maximize π i (xi ) = − Mti
t
subject to Mti = xit A(xt ), ∀t,

xit ≥ βi ,
t

xt = xjt , ∀t,
j

0 ≤ xit , ∀t.

As cost to the retailer becomes infinity whenever the total demand goes beyond
the capacity threshold for the wholesale market, i.e., when C(xt ) = ∞ ∀xt > U ,
the price to consumers will become infinite and their payoff will go to negative
infinity. Thus any consumer facing an infinite cost at a particular time slot
can manipulate the demand vector such that the cost becomes finite, which is
always feasible under the assumption that sum load demand over all times slots
is less than sum supply availability. This implies that, at Nash equilibrium, sum
demand xt will be less than the capacity threshold
 U, ∀t, which allows for a
redundant constraint xit ≤ U, ∀i, t, as xit ≤ i xit = xt ≤ U . Such a redundant
but explicit constraint in turn makes the feasible region for x, denoted by X ,
finite and hence compact. The compactness property is utilized to prove the
Kakutani’s theorem [13] which in turn is required to show the existence of NEP
solution.
By the results in [14] we can show that there exists an NEP strategy for all
agents with the cost function used here and the NEP solution exists for the
proposed noncooperative consumer load balancing game.
On the other hand, the cost function Mti does not satisfy the conditions for
being a type-A function, defined in [14]. Therefore, the corresponding uniqueness
result in [14] cannot be extended to our formulation. In [15] we show that our
problem is equivalent to an atomic flow game [16] with splittable flows and
different player types (i.e., each player controls a different amount of total flow)
over a generalized nearly-parallel graph, which has strictly semi-convex, non-
negative, and non-decreasing functions for cost per unit flow. By the results
of [16], we can prove that the NEP solution for the load balancing game is
unique [15].
In the following, we discuss the properties for the unique NEP solution for
the proposed load balancing game.

Lemma 1. With the average-cost based pricing and zero revenue, at the Nash
equilibrium the price of electricity faced by all consumers is the same over all
time slots.

The proof is provided in [15].


170 T. Agarwal and S. Cui

Lemma 2. If C(·) is strictly convex, at the Nash equilibrium, the sum of de-
mands on the system, xt , keeps the same across different time slots.
The proof is provided in [15].
Lemma 3. If C(·) is strictly convex, at Nash equilibrium, each consumer will
distribute its demands equally over the T time slots.
The proof is provided in [15].

Remark: Under the average-cost based pricing scheme with zero revenue, if
one particular consumer increases its total demand of electricity, the price A(·)
increases, which in turn increases the payments for all other consumers as well.
Theoretically one consumer may cause indefinite increases in the payments of
all others; and in this sense this scheme does not protect the group from reckless
action of some consumer(s). This issue will be addressed by our second pricing
scheme as we will show in Section 4.

3.2 Constant-Rate Revenue Model


In this case, the rate of revenue generation for each consumer at each time slot
is taken as a non-negative constant φit . Thus, Eti = φit × xit . The consumer load
balancing problem for each consumer i is given by the following optimization
problem:
 
maximize π i (xi ) = Eti − Mti
t
subject to Eti = φit xit , ∀t,
Mti = xit A(xt ), ∀t,

xit ≥ βi ,
t

xt = xjt , ∀t,
j

0 ≤ xit , ∀t.
We assume that βi = 0, ∀i, and the rate of revenue is larger than the price of
electricity such that we do not end up with any negative payoff or the trivial
solution xit = 0, ∀i, t.
Here again, if the sum demand in a given time slot t exceeds the retailer’s
capacity threshold U , the consumers will face an infinite price for their con-
sumption. This implies that, at Nash equilibrium the sum demand xt will never
exceed the capacity threshold U , as we assume that sum load demand over all
time slots is greater that sum load available.
 This again allows for the redun-
dant constraint xit ≤ U, ∀i, t, as xit ≤ i xit = xt ≤ U , which in turn makes the
feasible region for x, X , finite and hence compact.
The proof for the existence of NEP for this game under the given assumptions
is provided in [15].
Noncooperative Games for Autonomous Consumer Load Balancing 171

Lemma 4. At the Nash equilibrium, the consumer(s) with the highest revenue
rate (φit ) within the time slot, may be the only one(s) buying the power in that
time slot.
The proof is provided in [15]. Thus if consumer i has the maximum rate of
revenue, either it is the only consumer buying non-zero power xit such that
φit = A(xit ) or φit < C  (0) and hence xit = 0 in that time slot, which leads to
a unique Nash equilibrium for the sub-game. If in a given time slot multiple
consumers experience the same maximum rate of revenue, the sub-game will
turn into a Nash Demand Game [17] between the set of consumers given by
{arg maxk φkt }, which is well known to admit multiple Nash equilibriums. Thus
the overall noncooperative game has a unique Nash equilibrium if and only if, in
each time slot, at most one consumer experiences the maximum rate of revenue.

4 Nash Equilibrium with Increasing-Block Pricing


In this section we study the load balancing game with the time-variant increasing-
block pricing scheme. Under this scheme consumer i pays Mti for xit units of
electricity, which is given by (3) with M(x, xt ) the marginal cost function posed
to the consumer. Thus, as defined before, we have
⎛ ⎞

M(x, xt ) = C ⎝ min (x, xjt )⎠ .
j

As an example, if the demands from different consumers at time slot t are iden-
tical, i.e., if xit = xjt , ∀i, j, we have,

M(x, xt ) = C(N x).

4.1 Zero-Revenue Model


In this case the payment by consumer i is given by (3)
 xit
Mti = M(x, xt )dx.
0

The consumer load balancing problem for each consumer i is given by the fol-
lowing optimization problem:

maximize π i (xi ) = − Mti
t
 xit
subject to Mti = M(x, xt )dx, ∀t,
0

xit ≥ βi ,
t
0 ≤ xit , ∀t.
172 T. Agarwal and S. Cui

If the sum demand xt in a time slot t exceeds U , the price of electricity for the
consumer with the highest demand (indexed by ĵ) becomes infinite. As we retain
the assumption that sum load demand over all time slots is greater that sum
load available, consumer ĵ can rearrange its demand vector such that either the
sum demand becomes within the capacity threshold or consumer ĵ is no longer
the highest demand consumer (then the new customer with the highest demand
performs the same routine until the sum demand is under the threshold). This
implies that, at the Nash equilibrium point we have xt ≤ U . Similarly, we now
have the redundant constraint xit ≤ U, ∀ i, t, which in turn makes the feasible
region X finite and hence compact.
The proof for the existence of NEP for this game under the given assumptions
is provided in [15]. When each consumer tries to minimize its total cost while
satisfying its minimum daily energy requirement βi , we have the following result.

Lemma 5. If C(·) is strictly convex, the Nash equilibrium is unique and each
consumer distributes its demand uniformly over all time slots.

The proof is provided in [15].

Remark: Notice that under the zero-revenue model, the NEP point is the same
with both increasing-block pricing and average-cost based pricing. For both the
cases, at NEP, we have xit = βi /T, ∀i, t. However, even though the loading pat-
tern is similar, the payments Mti made by the consumers will differ and, with
increasing-block pricing, will likely be lesser for consumers with relatively lower
consumption. In addition, with increasing-block pricing, the maximum payment
Mti made by the ith consumer given xit demand will be C(N xit )/N , irrespective
of what other consumers demand and consume. Thus this addresses the issue
faced under the average-cost based pricing and zero-revenue model, in which one
consumer can increase their demand indefinitely and cause indefinite increase in
the payments of all other consumers.

4.2 Constant-Rate Revenue Model

The consumer load balancing problem for consumer i is given by the following
optimization problem:
 
maximize π i (xi ) = Eti − Mti
t
subject to Eti = i i
φt xt , ∀t,
 xit
Mti = M(x, xt )dx, ∀t,
0

xit ≥ βi ,
t
0 ≤ xit , ∀t.
Noncooperative Games for Autonomous Consumer Load Balancing 173

Here again, we assume βi = 0, ∀i, to avoid any negative payoffs and we could
agree for the redundant constraint xit ≤ U, ∀ i, t, which in turn makes the
feasible region for X finite and hence compact.
The proof for the existence of NEP for this game under the given assump-
tions is provided in [15]. With the average-cost based pricing scheme under the
constant-rate revenue model, we see that in a given time slot, if a single con-
sumer enjoys the maximum rate of revenue, it will be the only consumer who
is able to purchase power. We show here that with the increasing-block pricing
scheme under constant-rate revenue model, the result is different.
For a given time slot t, consumer i has an incentive to increase their demand
xit as long as the payoff increases, i.e., ∂π i /∂xit > 0. Therefore at the equilibrium
the following holds for all consumers:

∂π i
≤0
∂xit
(4)
∂Mti
⇒ φit ≤ = M(xit , xt ).
∂xit

Additionally, if φit < M(xit , xt ), Jti can be reduced by reducing xit . This implies
that if xit > 0, at the equilibrium we have

φit ≥ M(xit , xt ). (5)

1500
Quantity Demanded xit (MWh)

Rate of Revenue φit ($/MWh) $100

Fig. 2. Demand xit versus the rate of revenue (φit ) at equilibrium. Each dot represents
a particular consumer i = {1, . . . , 100}.
174 T. Agarwal and S. Cui

Thus (4) and (5) together imply that, if xit > 0, we have φit = M(xit , xt ). Together
we can write the following set of necessary conditions for equilibrium,

φit = M(xit , xt ) if φit ≥ M(0, xt ),


(6)
xit = 0 if φit < M(0, xt ).

For illustration, we simulate a scenario consisting of 100 consumers, who have


their rate of revenue φit generated from a uniform distribution ranging over
$0−$100/MWh, where the marginal cost to the retailer C(·) is given by Fig. 1(a).
In Fig. 2 we plot the demand xit versus the rate of revenue (φit ) at a given time
slot t, where xit is evaluated over i = {1, . . . , 100}. The equilibrium is obtained
by iterative updates of M(·) and xt until convergence within an error tolerance
as in (6).
Thus, unlike with the average-cost pricing, where only the consumer with
the maximum rate of revenue could purchase electricity at the equilibrium, any
consumer may procure a non-zero amount of energy as long as its own rate of
revenue is larger than M(0, xt ).

5 Conclusion

In this paper we formulated noncooperative games among the consumers of


Smart Grid with two real-time pricing schemes to derive autonomous load bal-
ancing solutions. The first pricing scheme charges consumers a price that is equal
to the average cost of electricity borne by the retailer and the second scheme
charges consumers an amount that is dependent on the incremental marginal
cost which is shown to protect consumers from irrational behaviors. Two revenue
models were considered for each of the pricing schemes, for which we investigated
the Nash equilibrium operation points for their uniqueness and load balancing
properties. For the zero-revenue model, we showed that when consumers are in-
terested only in the minimization of electricity costs, the Nash equilibrium point
is unique with both the pricing schemes and leads to similar electricity loading
patterns in both cases. For the constant-rate revenue model, we showed the ex-
istence of Nash equilibrium with both the pricing schemes and the uniqueness
results with the average-cost based pricing scheme.

References

1. Allcott, H.: Rethinking real time electricity pricing. CEEPR Working Paper 2009-
015, MIT Center for Energy and Environmental Policy Research (October 2009),
http://web.mit.edu/ceepr/www/publications/workingpapers/2009-015.pdf
2. Borenstein, S.: Time-varying retail electricity prices: Theory and practice. In: Grif-
fin, J., Puller, S. (eds.) Electricity Deregulation: Choices and Challenges, pp. 317–
357. University of Chicago Press, Chicago (2005)
3. Holland, S., Mansur, E.: The short-run effects of time-varying prices in competitive
electricity markets. The Energy Journal 27(4), 127–155 (2006)
Noncooperative Games for Autonomous Consumer Load Balancing 175

4. Borenstein, S.: The long-run effects of real-time electricity pricing. CSEM Working
Paper 133, University of California Energy Institute, Berkeley (June 2004),
http://www.ucei.berkeley.edu/PDF/csemwp133.pdf
5. Faruqui, A., Hledik, R., Sergici, S.: Rethinking prices. Public Utilities Fort-
nightly 148(1), 30–39 (2010)
6. Fahrioglu, M., Alvarado, F.: Designing cost effective demand management con-
tracts using game theory. In: IEEE Power Engineering Society 1999 Winter Meet-
ing, vol. 1, pp. 427–432. IEEE (1999)
7. Caron, S., Kesidis, G.: Incentive-based energy consumption scheduling algorithms
for the smart grid. In: 2010 First IEEE International Conference on Smart Grid
Communications (SmartGridComm), pp. 391–396 (October 2010)
8. Ibars, C., Navarro, M., Giupponi, L.: Distributed demand management in smart
grid with a congestion game. In: 2010 First IEEE International Conference on
Smart Grid Communications (SmartGridComm), pp. 495–500 (October 2010)
9. Tirole, J.: The Theory of Industrial Organization. The MIT Press, Cambridge
(1988)
10. Başar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Society for Indus-
trial and Applied Mathematics, Philadelphia (1999)
11. Borenstein, S.: Equity effects of increasing-block electricity pricing. CSEM Working
Paper 180, University of California Energy Institute, Berkeley (November 2008),
http://www.ucei.berkeley.edu/PDF/csemwp180.pdf
12. Lindeman, J.: EZ-101 Microeconomics. Barron’s Educational Series, Hauppauge
(2001)
13. Kakutani, S.: A generalization of brouwers fixed point theorem. Duke Mathematical
Journal 8(3), 457–459 (1941)
14. Orda, A., Rom, R., Shimkin, N.: Competitive routing in multiuser communication
networks. IEEE/ACM Transactions on Networking (TON) 1(5), 510–521 (1993)
15. Agarwal, T., Cui, S.: Noncooperative Games for Autonomous Consumer Load Bal-
ancing over Smart Grid. ArXiv e-prints (April 2011),
http://arxiv.org/abs/1104.3802
16. Bhaskar, U., Fleischer, L., Hoy, D., Huang, C.: Equilibria of atomic flow games are
not unique. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on
Discrete Algorithms, pp. 748–757 (2009)
17. Nash, J.: Two-person cooperative games. Econometrica 21(1), 128–140 (1953)
Optimal Contract Design for an Efficient
Secondary Spectrum Market

Shang-Pin Sheng and Mingyan Liu

Department of Electrical Engineering and Computer Science


University of Michigan, Ann Arbor, Michigan, 48109-2122
{shangpin,mingyan}@umich.edu

Abstract. In this paper we formulate a contract design problem where


a primary license holder wishes to profit from its excess spectrum ca-
pacity by selling it to potential secondary users/buyers, but needs to
determine how to optimally price it to maximize its profit, knowing that
this excess capacity is stochastic in nature and cannot provide determin-
istic service guarantees to a buyer. We address this problem by adopting
as a reference a traditional spectrum market where the buyer can pur-
chase exclusive access with fixed/deterministic guarantees. We consider
two cases; in one the seller has full information on the buyer, including
its service requirement and quality constraint, and in the other the seller
only knows possible types and their distribution. In the first case we fully
characterize the nature of the optimal contract design. In the second case,
we find the optimal contract design when there are two possible types
and determine a design procedure and show that it is optimal when the
nature of the stochastic channel is common to all possible types.

Keywords: contract design, incentives, quality of service constraint,


secondary spectrum market.

1 Introduction
The scarcity of spectrum resources and the desire to improve spectrum efficiency
have led to extensive research and development in recent years in such concepts
as dynamic spectrum access/sharing, open access, and secondary (spot or short-
term) spectrum market, see e.g., [1, 2].
One of the fundamental premises behind a secondary (and short-term) spec-
trum market is the existence of excess capacity due to the primary license holder’s
own spectrum under-utilization. However, this excess capacity is typically uncon-
trolled and random, both spatially and temporally, and strongly dependent on the
behavior of the primary users. One may be able to collect statistics and make pre-
dictions, as has been done in numerous spectrum usage studies [3–5], but it is fun-
damentally stochastic in nature. The primary license holder can of course choose
to eliminate the randomness by setting aside resources (e.g., bandwidth) exclu-
sively for secondary users. This will however likely impinge on its current users
and may not be in the interest of its primary business model.

The work is partially supported by the NSF under grants CIF-0910765 and CNS-
1217689, and the ARO under Grant W911NF-11-1-0532.

V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 176–191, 2012.

c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
Optimal Contract Design for an Efficient Secondary Spectrum Market 177

The alternative is to simply give non-exclusive access to secondary users for


a fee, which allows the secondary users to share a certain amount of bandwidth
simultaneously with its existing licensed users, but only under certain conditions
on the primary traffic/spectrum usage. For instance, a secondary user is given
access but can only use the bandwidth if the current activity by the licensed
users is below a certain level, e.g., as measured by received SNR. This is a typical
scenario under the spectrum overlay and underlay models [6]; many spectrum
sharing schemes proposed in the literature fall under this scenario, see e.g., [7–10].
In this case a secondary user pays (either in the form of money or services in
return) for gaining spectrum access but not for guaranteed use of the spectrum.
This presents a challenge to both the primary and the secondary users: On
one hand, the secondary user must assess its needs and determine whether the
uncertainty in spectrum quality is worth the price asked for and what level of
uncertainty can be tolerated. On the other hand, the primary must decide how
stochastic service quality should be priced so as to remain competitive against
guaranteed (or deterministic) services which the secondary user may be able to
purchase from a traditional market or a different primary license holder.
In this paper we formulate this as a contract problem for the primary user
and seek to address the question of what type of contracts should the primary
design so as to maximize its profit. Within this framework we adopt a reference
point in the form of a traditional spectrum market from where a secondary user
can purchase deterministic or guaranteed service, i.e., exclusive access rights
to certain bandwidth, at a fixed price per unit. This gives the secondary user
a choice to reject the offer from the primary user if it is risk-averse or if the
primary user’s offer is not attractive. This also implies that the price per unit of
bandwidth offered by the primary user must reflect its stochastic quality.
Work most relevant to the study presented in this paper includes [11], which
considers a contract problem where the secondary users help relay primary user’s
data and in return are allowed to send their own data, as well as [12], which
considers the convexity of an optimal portfolio of different stochastic purchases,
under two types of bandwidth shortage constraints. The work in [12] however
considers only the perspective of the buyer but not the seller.
Our main results are as follows. We formally present the contract design prob-
lem in Section 2, and consider two cases. In the first case the seller is assumed to
have full information on the buyer, including its service requirement and qual-
ity constraint. For in this case we fully characterize the optimal contract design
(Section 3). In the second case the buyer belongs to a set of types and the seller
knows only the set and its distribution but not the buyer’s exact type. We again
fully characterize the nature of the optimal contract design when the number
of types is limited to two. In the case of having more than two possible types
of buyer, we assume that the channel condition is common among the buyers.
Under this assumption, we determine the optimal contract when the seller can
design as many contract as the it wants. When the number of contracts is limited,
we describe a design procedure and prove the optimality (Section 4).
178 S.-P. Sheng and M. Liu

2 Model and Assumptions


The basic idea underlying our model is to capture the value of secondary spec-
trum service, which is random and non-guaranteed in nature, by using guaran-
teed service as a reference.

2.1 The Contract Setup


The contract is setup to be distributed from the seller to the buyer in this model.
The seller, who is also referred to as the owner or the primary license holder,
uses the spectrum to provide business and service to its primary users, and carry
primary traffic. The seller is willing to sell whatever underutilized bandwidth it
has as long as it generates positive profit and does not impact negatively its
primary business. It knows that the bandwidth it is selling is stochastic and
cannot provide hard guarantees. We will assume that the seller pre-designs up
to M contracts and announce them to a potential buyer. If the buyer accepts
one of the contracts, they come to an agreement and they have to follow the
contract up to a predetermined period of time. It is up to the seller to design
the contracts, but up to the buyer to decide whether or not to accept it.
Each contract is in the form of a pair of real numbers (x, p), where x ∈ R+
and p ∈ R+ :
– x is the amount of bandwidth they agree to trade on (given from the seller
to buyer).
– p is the price per unit of x (total of xp paid to the seller).
When a contract (x, p) is signed, the seller’s profit or utility is defined by

U (x, p) = x(p − c),

where c(> 0) is a predetermined constant cost which takes into account the
operating cost of the seller. If none of the contract is accepted by the buyer, the
reserved utility of the owner is defined by U (0, 0) = 0.

2.2 A Reference Market of Fixed/Deterministic Service or


Exclusive Use
We next consider what a contract specified by the pair (x, p) means to a potential
buyer. To see this, we will assume that there exists a traditional (as opposed to
this emerging, secondary) market from where the buyer can purchase services
with fixed or deterministic guarantees. What this means is that the buyer can
purchase exclusive use of certain amount of bandwidth, which does not have to
be shared with other (primary) users. This serves as an alternative to the buyer,
and will be used in our model as a point of reference. We will leave it unspecified
how the price of exclusive use is set, and will thus normalize it to be unit price
per unit of bandwidth (or per unit of transmission rate). The idea is that given
this alternative, the owner cannot arbitrarily set its price because the buyer can
Optimal Contract Design for an Efficient Secondary Spectrum Market 179

always walk away and purchase from this traditional market. This traditional
market will also be referred to as the reference market, and the service it sells
as the fixed or deterministic service/channel. Our model does allow a buyer to
purchase from both markets should there be a benefit.

2.3 The Buyer’s Consideration


When the set of M contracts are presented to a buyer, its choices are (1) to
choose one of the contracts and abide by its terms, (2) to reject all contracts and
go to the traditional market, and (3) to purchase a certain combination from
both market. The buyer’s goal is to minimize its purchasing cost as long as a
certain quality constraint is satisfied. The framework presented here applies to
any meaningful quality constraint; to make our discussion concrete, below we
will focus on a loss constraint for concreteness.
Suppose the buyer chooses to purchase y unit of fixed service from the ref-
erence market together with a contract (x, p). Then its constraint on expected
loss of transmission can be expressed as:
E[(q − y − xB)+ ] ≤  ,
where
– q: the amount of data/traffic the buyer wishes to transmit.
– B ∈ {0, 1}: a binary random variable denoting the quality of the channel for
this buyer. We will denote b := P (B = 1).
– : a threshold on expected loss that’s acceptable to the buyer.
Here we have adopted a simplifying assumption that the purchased channel (in
the amount of x) is either available in the full amount or completely unavailable.
More sophisticated models can be adopted here, by replacing xB with another
random variable X(x) denoting the random amount of data transmission the
buyer can actually realize. This will not affect the framework presented here,
but will alter the technical details that follow.
With this purchase (y, (x, p)), the buyer’s cost is given by y + xp. The cost of
the contract (x, p) to this buyer is given by the value of the following minimiza-
tion problem:
C(x, p) = minimize y + px (1)
y

subject to E[(q − y − xB)+ ] ≤  (2)


That is, to assess how much this contract actually costs him, the buyer has to
consider how much additional fixed service he needs to purchase to fulfill his
needs.
The buyer can always choose to not enter into any of the presented contracts
and only purchase from the traditional market. In this case, its cost is given by
the value of the following minimization problem:
C(0, 0) = minimize y
y

subject to E[(q − y)+ ] ≤ 


180 S.-P. Sheng and M. Liu

Since every term is deterministic, we immediately conclude that C(0, 0) = q −


, which will be referred to as the reserve price of the buyer. Obviously if a
contract’s cost is higher than this price then there is no incentive for the buyer
to enter into that contract.

2.4 Informational Constraints


We investigate the following two possible scenarios.

1. Information is symmetric:
Under this assumption, the seller knows exactly the values q, b,  of the buyer.
The seller can thus extract all of the buyer’s surplus (over the reserve price),
resulting in C(x, p) = C(0, 0) at the optimal contract point.
2. Information is asymmetric:
Under this assumption, the seller can no longer exploit all of the buyer’s
surplus, resulting in a more complicated contract design process. We assume
there are possibly K types of buyers, each having a different triple (q, b, ).
We further assume that the seller has a prior belief of the distribution of
the buyer types; a buyer is of type i with probability ri and has the triple
(qi , bi , i ) as its private information. We will also assume that at most M
different contracts are announced to the buyer.

3 Optimal Contract under Symmetric Information

In the symmetric information case, the seller can custom-design a contract for
the buyer, subject to the constraint that it offers an incentive for the buyer to
accept, referred to as the individual rationality (IR) constraint. In other words,
the buyer (by accepting the contract) has to be able to achieve a cost no higher
than the reserve price: C(x, p) ≤ C(0, 0) = q − . Knowing this, the seller can
exactly determined the region where the buyer would accept a contract (x, p)
since it knows the values q, , b.
Theorem 1. When q(1 − b) ≤ , the buyer accepts the contract (x, p) if and
only if

b if x ≤ q−
p ≤ q− b
q− (3)
x if x > b

When q(1 − b) > , the buyer accepts the contract if and only if


b if x ≤ 1−b
p≤ b  (4)
x(1−b) if x > 1−b

The above result is illustrated in Fig. 1. The meaning of the two different types
of regions are as follows. (i) When q(1 − b) ≤ , or b ≥ q−
q , the quality of the
Optimal Contract Design for an Efficient Secondary Spectrum Market 181

q(1−b)<ε q=5, b=0.8, ε=3 q(1−b)>ε, q=5, b=0.3, ε=3


1
1
* *
0.8
(x ,p ) 0.8

price (p)
price (p)

0.6 0.6
p=b
xp=(q−ε) p=b (x*,p*)
0.4 0.4
xp=bε/(1−b)
0.2 0.2

0 0
0 2 4 6 8 10 0 2 4 6 8 10
bandwidth (x) bandwidth (x)

Fig. 1. Examples of q(1 − b) ≤  (left) and q(1 − b) >  (right)

stochastic channel is sufficiently good such that, when x is large enough, the
constraint Eqn. (2) can be achieved without any purchase of the deterministic
channel (fixed service y). Thus, the buyer is willing to spend up to C(0, 0) = q−.
(ii) When q(1 − b) > , or b < q−q , the quality of the stochastic channel is not
so good that no matter how much is purchased, some deterministic channel (y)
has to be purchased to satisfy the loss constraint. Thus, the buyer is not willing
to spend all of q −  on the contract. Below we prove the sufficient condition of
the acceptable region when q(1 − b) ≤ ; other parts of the above theorem can
be done using similar arguments.
q−
1. The buyer accepts the contract (x, p) if x ≤ b and p ≤ b.
Proof. We start by letting y = q −  − xp and show that the IR constraint is
satisfied:
y + xp = q −  − xp + xp = q −  ≤ U (0, 0) .
The loss constraint is satisfied because,

E[(q − y − xB)+ ] = (q − y − x)+ b + (q − y)+ (1 − b)


= ( + xp − x)+ b + ( + xp)(1 − b)

( + xp)(1 − b) ≤ ( + b q−
b )(1 − b) ≤  if  + xp − x ≤ 0
=
 + x(p − b) ≤  if  + xp − x > 0


q−
2. The buyer is willing to accept the contract (x, p) if x ≥ b and xp ≤
U (0, 0) = q − .
Proof. The IR constraint is satisfied when the buyer does not purchase any
y. We next examine whether the quality constraint is satisfied with y = 0.
q− +
E[(q − x)+ ] = (q − x)+ b + q(1 − b) ≤ (q − ) b + q(1 − b)
b
= (qb − (q − ))+ + q(1 − b) = ( − q(1 − b))+ + q(1 − b)
= ( − q(1 − b)) + q(1 − b)) =  ,

where the second to last equality follows from the fact that q(1 − b) ≤ . 

182 S.-P. Sheng and M. Liu

After determining the feasible region of contracts for a given type (q, , b), the
seller can choose any point in this region to maximize its utility. We next show
that the optimal contract for the seller is determined by the intersection of the
two boundary lines derived above, which we will denote as (x∗ , p∗ ) throughout
the rest of the paper. Here we assume that there exists a contract with p > c
such that the buyer will accept, for otherwise the seller has no incentive to sell
the stochastic channel.

Theorem 2. The optimal contract is the intersection point of the two lines:

p∗ = b (5)

∗ ∗ q −  if q(1 − b) ≤ 
x p = b (6)
1−b if q(1 − b) > 

Proof. From the form of the seller’s utility (U (x, p) = x(p − c)), it can be easily
verified that the profit is increasing in p. Using this property and the fact that
we already determined the feasible contracts in Theorem 1, we can show that
the contract pair (x, p) that generates the highest profit for the seller is the
intersection point (x∗ , p∗ ) (as illustrated in Figure 1). 


Once the seller determines the optimal contract and presents it to the buyer,
the buyer chooses to accept because it satisfies the loss constraint and the IR
constraint. It can be shown that the buyer’s utility is exactly C(0, 0), as we
expected.
The optimal contract for buyer of type (q, , b) defined in Theorem 2 can be
written in a compact form in the following theorem.

Theorem 3. The optimal contract (x∗ , p∗ ) of a buyer type (q, , b) is given by



(x∗ , p∗ ) = (min( 1−b , q−
b ), b).

Proof. By Theorem 2, when q(1 − b) ≤ ,



q− 1−b −
≤ =
b b 1−b
∗ q−  q−
⇒ x = = min( , )
b 1−b b

Similarly, when q(1 − b) > ,



q− 1−b −
> =
b b 1−b
  q−
⇒ x∗ = = min( , )
1−b 1−b b


Optimal Contract Design for an Efficient Secondary Spectrum Market 183

We now introduce the concept of an equal-cost line of a buyer, this concept will
be used to find the optimal contract when there are more than one possible type
of buyer. Consider a contract (x , p ). Denote by P (x , p , x) a price such that the
contract (x, P (x , p , x)) has the same cost as contract (x , p ) to a buyer. This
will be referred to as an equivalent price. Obviously P (x , p , x) is a function of
x, x , and p .
Definition 1. The equal-cost line E of a buyer of type (q, , b) is the set of
contracts within the buyer’s acceptance region T that are of equal cost to the
buyer. Thus (x, p) ∈ E if and only if p = P (x , p , x) for some other (x , p ) ∈ E.
The cost of this line is given by C(x , p ), ∀(x , p ) ∈ E.
It should be clear that there are many equal-cost lines, each with a different
cost. Figure 2 shows an example of a set of equal-cost lines. We will therefore
also write an equal-cost line as Ex ,p for some (x , p ) on the line to distinguish
it from other equal-cost lines. The next theorem gives a precise expression for
the equivalent price that characterizes an equal-cost line.

Equal cost line (Cost of accepting)


0.8
13.5
13.15
12.75
0.6
12
11.5
price (p)

0.4

0.2

0
0 2 4 6 8 10
bandwidth (x)

Fig. 2. Example of equal cost lines

Theorem 4. For a buyer of type (q, , b) with an intersection point (x∗ , p∗ ) on


its acceptance region boundary, and given a contract (x , p ), an equal-cost line
Ex ,p consists of all contracts (x, P (x , p , x)) such that

x 
⎪ b − x (b − p )
⎪ if x, x ≤ x∗
⎨  
x p /x if x, x ≥ x∗
P (x , p , x) =

⎪ (b(x − x ) + x p )/x if x < x∗ < x
∗   

b − (x∗ b − x p )/x if x < x∗ < x

Proof. We will prove this for the case q(1 − b) ≤ ; the other case can be shown
with similar arguments and is thus omitted for brevity. In this case x∗ = q− b .
When x, x ≤ x∗ , without buying deterministic service the loss is given by
E[(q − xB)+ ] = (q − x)+ b + q(1 − b)
= (q − x)b + q(1 − b) = q − xb ≥ ,
184 S.-P. Sheng and M. Liu

where the second equality is due to the fact that q(1 − b) ≤  ⇒ q−
b ≤q ⇒x≤
q−
b ≤ q. The incentive for the buyer is to purchase y such that the loss is just
equal to .

E[(q − y − xB)+ ] = (q − y − x)b + (q − y)(1 − b)


= q − y − xb =  .

The first equality follows from the fact that q(1 − b) ≤ , which implies both
(q − y − x) ≥ 0 and (q − y) ≥ 0. This is true for both (x, p) and (x , p ). Since
(x, p) is on the equal cost line Ex ,p , we know that C(x, p) = C(x , p ). We also
know that C(x, p) = y + xp and C(x , p ) = y  + x p ,

C(x, p) = q −  − xb + xp = q −  − x b + x p = C(x , p ) .

Rearranging the second equality such that p is a function of x, x , p immediately


gives the result. When x, x > x∗ , x (x ) alone is sufficient to achieve the loss
constraint. For C(x, p) = C(x , p ) we must have x p = xp, resulting in the
second branch. The third and fourth branch can be directly derived from the
first two branches. When x > x∗ > x (x > x∗ < x), we first find the equivalent
price at x∗ by the first branch (second branch), and then use the second branch
(first branch) to find P (x , p , x). This gives the third branch (fourth branch) 


Note that every contract below an equal-cost line is strictly preferable to a


contract on the line for the buyer.

4 Contract under Asymmetric Information

We now turn to the case where parameters (q, b, ) are private information of
the buyer. The seller no longer knows the exact type of the buyer but only
what types are out there and their distribution; consequently it has to guess the
buyer’s type and design the contract in a way that maximizes its expected payoff.
In order to do so, the seller can design a specific contract for each type so that
the buyers will reveal their true types. Specifically, when the buyer distributes a
set of contracts C = {(x1 , p1 ), (x2 , p2 )......(xK , pK )} specially designed for each
of the K types, a buyer of type i will select (xi , pi ) only if the following set of
equations is satisfied:

Ci (xi , pi ) ≤ Ci (xj , pj ) ∀j = i ,

where Ci denotes the cost of a type i buyer. In other words, the contract designed
for one specific type of buyer, must be as good as any other contract from the
buyer’s point of view. Let Ri (C) denote the contract that a type i buyer will
select given a set of contract C. Then,

Ri (C) = argmin Ci (x, p) .


(x,p)∈C
Optimal Contract Design for an Efficient Secondary Spectrum Market 185

Given a set of contracts C, we can now express the seller’s expected utility as

E[U (C)] := U (Ri (C))ri
i

where ri is the a priori probability that the buyer is of type i. We further denote
the set Ti = {(x, p) : Ci (x, p) ≤ Ci (0, 0)} as the set of all feasible contracts for
type i buyer (feasible region in Theorem 1). The optimal contract (Theorem 2)
designed for the type-i buyer, will also be called maxi :

maxi := (x∗i , p∗i )


:= argmax U (x, p)
(x,p)∈Ti

4.1 Two Types of Buyer, K=2

We first consider the case when there are only two possible types of buyer
(qi , i , bi ), i ∈ {1, 2}, with probability ri , r1 + r2 = 1.

1 0.7
max1 max
0.6 1
0.8 I
I 0.5 3
4
I
price (p)

price (p)

0.6 1 0.4 max


max 2
2
0.4 I2 0.3 I2
G
0.2
0.2 I
3
0.1
I
1

0 0
0 2 4 6 8 10 0 2 4 6 8 10
bandwidth (x) bandwidth (x)

Fig. 3. Example when max1 ∈


/ T2 and max2 ∈
/ T1 (left), max1 ∈ T2 or max2 ∈ T1
(right)

M = 1. We first consider the case when the seller hands out only one contract.

Theorem 5. The optimal contract is as follows,

– if max1 ∈
/ T2 and max2 ∈
/ T1 ,

⎨ max1 if r1 U (max1 ) ≥ r2 U (max2 ) and r1 U (max1 ) ≥ U (G)
optimal = max2 if r2 U (max2 ) ≥ r1 U (max1 ) and r2 U (max2 ) ≥ U (G)

G if U (G) ≥ r2 U (max2 ) and U (G) ≥ r1 U (max1 )

– if max1 ∈ T2 .

max1 if U (max1 ) ≥ r2 U (max2 )
optimal =
max2 if r2 U (max2 ) ≥ U (max1 )
186 S.-P. Sheng and M. Liu

– if max2 ∈ T1 .

max2 if U (max2 ) ≥ r1 U (max1 )
optimal =
max1 if r1 U (max1 ) ≥ U (max2 )

When max1 ∈ / T2 and max2 ∈ / T1 , we denote the intersecting point of the two
boundaries (of the accepting region of the two types) as G (see Figure 3 (left)).
Theorem 5 can be proved by showing that the payoffs of contracts in a particular
region are no greater than special points such as G. For example, in the case of
max1 ∈ / T2 and max2 ∈ / T1 any point in I3 is suboptimal to point G because
they are both acceptable by both types of buyers and G has a strictly higher
profit than any other point in I3 .

M = 2, max1 ∈ / T2 and max2 ∈ / T1 . The seller can hand out at most


two contracts for the buyer to choose from. We will see that providing multiple
contracts can help the seller obtain higher profits.
Theorem 6. The set {max1 , max2 } is the optimal set of contracts.
Proof. The set C = {max1 , max2 } gives an expected payoff of
E[U (C)] = r1 U (R1 (C)) + r2 U (R2 (C))) = r1 U (R1 (max1 )) + r2 U (R2 (max2 ))
The last equality holds because max1 ∈ / T2 and max2 ∈ / T1 and both types
choose the maxi intended for them. If C is not the optimal set, then there must
exist some contract set C = {(x1 , p1 ), (x2 , p2 )} such that
E[U ((C ))] = r1 U (R1 (x1 , p1 )) + r2 U (R2 (x2 , p2 ))
> E[U (C)] = r1 U (R1 (max1 )) + r2 U (R2 (max2 ))
This has to mean either U (R1 (x1 , p1 )) > U (R1 (max1 )), or U (R2 (x2 , p2 )) >
U (R2 (max2 )), or both, all of which contradict the definition of maxi . Thus,
{max1 , max2 } is the optimal contract set. 


M = 2, max1 ∈ T2 or max2 ∈ T1 . The seller can hand out at most two


contracts.
Obviously, the seller cannot hand out the same contract C = {max1 , max2 }
as in the previous section and claim that it is optimal. Without loss of genrality,
we will assume that the type-1 buyer has a smaller b1 (b1 ≤ b2 ), thus, we are
considering the max1 ∈ T2 case. We will first determine the optimal contract
when x∗1 ≤ x∗2 , the optimal contract when x∗1 > x∗2 can be determined based
on the results of the first case. To find the optimal contract set, we consider
only the contract pairs {(x1 , p1 ), (x2 , p2 )} where each type-i buyer pick (xi , pi )
instead of the other one. It is quite simple to show that we do not lose optimality
by restricting to this type of contract sets.
To find the optimal contract, we will 1) first show that for each (x1 , p1 ) we
can express the optimal (x2 , p2 ) in terms of x1 and p1 ; 2) then we will show that
(x1 , p1 ) must be on the boundary of T1 with x1 ≤ x∗1 ; 3) using 1) and 2) we can
calculate the expected profit by a simpler optimization problem.
Optimal Contract Design for an Efficient Secondary Spectrum Market 187

Lemma 1. In the K = 2 case, if max1 ∈ T2 and x∗1 ≤ x∗2 . Given a contract for
type-1 (x1 , p1 ), the optimal contract for type-2 must be (x∗2 , P2 (x1 , p1 , x∗2 )).
Proof. Given a contract (x1 , p1 ), the feasible region for the contract of type-2
buyer is the area below P2 (x1 , p1 , x) as defined in Theorem 4 (see Figure 4).
By noticing that the form of the seller’s profit is increasing in both p and x
(U (x, p) = x(p − c)), the contract that generates the highest profit will be such
that x2 = x∗2 and p2 =, P2 (x1 , p1 , x∗2 ). 

Lemma 2. In the K = 2 case, if max1 ∈ T2 and x∗1 ≤ x∗2 . An optimal contract
for type-1 must be p1 = b1 and x1 ≤ x∗1 .
Proof. Lemma 2 can be proved in two steps. First we assume the optimal con-
tract has (x1 , p1 ) ∈ T1 , where we can increase p1 by some positive δ > 0 but
still have (x1 , p1 + δ) ∈ T1 . By noticing that both U (x, p) and P (x, p, x ) are
increasing in p. We know that both U (x1 , p1 + δ) and U (x∗2 , P2 (x1 , p1 + δ, x∗2 )))
are strictly larger than U (x1 , p1 ) and U (x∗2 , P2 (x1 , p1 , x∗2 ))). This contradicts the
assumption that it was optimal before, thus, we know that the optimal contract
for (x1 , p1 ) must be on the two lines (the upper boundary of T1 ) defined in The-
orem 2. Then we can exclude the possibility of having (x1 , p1 ) on the boundary
of T1 with x1 > x∗1 by comparing the contract (x∗1 , b1 ) with such a contract.  

1
Optimal contract to give type−2
0.8
Equal−cost line of type−2
price (p)

0.6 (x1, p1) for type−1

0.4

0.2

0
0 2 4 6 8 10
bandwidth (x)

Fig. 4. The regions to distinguish type-2 given (x1 , p1 )

By putting the constraints from Lemmas 1, 2 and using Theorem 4, the expected
profit can be expressed as follows.

E[U (C)] = r1 U (x1 , p1 ) + r2 U (x2 , p2 )


= r1 U (x1 , p1 ) + r2 U (x2 , P2 (x1 , p1 , x∗2 ))
x1
= r1 U (x1 , b1 ) + r2 U (x∗2 , b2 − ∗ (b2 − b1 ))
x2
∗ x1
= r1 x1 (b1 − c) + r2 x2 (b2 − ∗ (b2 − b1 ) − c)
x2
∂E[U (C)]
= r1 (b1 − c) − r2 (b2 − b1 )
∂x1
188 S.-P. Sheng and M. Liu

The x1 acheiving the optimal contract C is given by,



0 if r1 (b1 − c) − r2 (b2 − b1 ) < 0
x1 =
x∗1 if r1 (b1 − c) − r2 (b2 − b1 ) > 0

max2 if r1 (b1 − c) − r2 (b2 − b1 ) < 0
C= ∗ x∗
max1 , (x2 , b2 − x∗ (b2 − b1 )) if r1 (b1 − c) − r2 (b2 − b1 ) > 0
1
2

This result shows two different conditions: 1) When rr12 < bb21−b 1
−c , type-2 is more
profitable and the seller will distribute max2 . If the seller chooses to distribute
max2 , there is no way to distribute another contract for type-1 without affecting
the behavior of type-2. Consequently, the seller only distributes one contract. 2)
When rr12 > bb21−b
−c , type-1 is more profitable and the seller will distribute max1 .
1

x∗
After choosing max1 , the seller can also choose (x∗2 , b2 − x1∗ (b2 − b1 )) for the
2
type-2 buyer without affecting the type-1 buyer’s choice. As a result, the seller
distributes a pair of contracts to get the most profit.
With a very similar argument, the optimal contract for x∗1 > x∗2 can be
determined. Again, we can prove that the optimal contract must have p1 =
b1 and x1 ≤ x∗1 . The difference is that when x∗1 > x∗2 , the expression for
(x∗2 , P2 (x1 , p1 , x∗2 )) has two cases depending on whether x1 > x∗2 or x1 ≤ x∗2 .

r1 U (x1 , b1 ) + r2 U (x∗2 , b2 − xx∗1 (b2 − b1 )) if x1 ≤ x∗2
E[U (C)] = 2
r1 U (x1 , b1 ) + r2 U (x∗2 , xx1∗b1 ) if x1 > x∗2
2

∂E[U (C)] r1 (b1 − c) − r2 (b2 − b1 ) if x1 ≤ x∗2
=
∂x1 r1 (b1 − c) + r2 b1 if x1 > x∗2

To summarize, when r1 (b1 − c) − r2 (b2 − b1 ) > 0, E[R(C)] is strictly increasing in


x1 and we know that x1 = x∗1 maximizes the expected profit. When r1 (b1 − c) −
r2 (b2 − b1 ) < 0, E[R(C)] is decreasing in x1 if x1 ∈ [0, x∗2 ] and increasing in x1
if x1 ∈ [x∗2 , x∗1 ]. We can only conclude that either x1 = 0 or x1 = x∗1 maximizes
the expected profit.


0 or x∗1 if r1 (b1 − c) − r2 (b2 − b1 ) < 0
x1 =
x∗1 if r1 (b1 − c) − r2 (b2 − b1 ) > 0
 x∗
1 b1
max2 or {max1 , (x∗2 , x∗ )} if r1 (b1 − c) − r2 (b2 − b1 ) < 0
C= x∗ b
2

{max1 , (x∗2 , x1∗1 )} if r1 (b1 − c) − r2 (b2 − b1 ) > 0


2

In the first condition, we can calculate the expected profit of the two contract
sets and pick the one with the higher profit.

4.2 K Types of Buyer, K ≥ 2, Common bi


In this section we consider the case when different types share the same channel
condition bi = b, ∀i = 1, · · · , K, which is also known to the seller. This models
Optimal Contract Design for an Efficient Secondary Spectrum Market 189

the case where the condition is largely determined by the seller’s primary user
traffic. An example of the acceptance regions of three buyer types are shown
in Figure 5. We will assume that the indexing of the buyer is in the increasing
order of x∗i ; this can always be done by relabeling the buyer indices. There are
two possible cases: (1) the seller can announce as many contracts as it likes,
i.e., M = K (note that there is no point in designing more contracts than there
are types); (2) the seller is limited to at most M < K contracts. In the results
presented below we fully characterize the optimal contract set in both cases.

Three buyer types with same channel condition


0.7

0.6 max max2 max3


1
0.5
price (p)

0.4 I I2 I3
1
0.3

0.2

0.1

0
0 2 4 6 8 10
bandwidth (x)

Fig. 5. Three buyer types with common b

Theorem 7. When M = K and ∀bi = b, the contract set that maximizes the
seller’s profit is (max1 , max2 , ..., maxK ).

This result holds for the following reason. As shown in Figure 5, with a constant
b, the intersection points (maxi ) of all acceptance regions are on the same line
p = b. For a buyer of type i, all points to the left of maxi on this line cost the same
as maxi , and all points to its right are outside the buyer’s acceptance region.
Therefore the type-i buyer will select the contract maxi given this contract set.
Since this is the best the seller can do with a type-i buyer (see Theorem 4) this
set is optimal for the seller. (see proof of Theorem 6)

Lemma 3. When M < K and ∀bi = b, the optimal contract set is a subset of
(max1 , ..., maxK ).

Proof. Assume the optimal contract C is not a subset of (max1 , ..., maxK ). Then
it must consists of some contract points from at least one of the Ii regions as
demonstrated in Figure 5. Let these contracts be Ai ⊂ Ii and i Ai = C. For
each non-empty Ai , we replace it by the contract maxi and call this new contract
set C . The proof is to show that this contract set generates profit at least as large
as the original one. For each type-i buyer that picked some contract (x, p) ∈ Aj
from the optimal contract C, it must had a type greater than or equal to j
otherwise (x, p) is not in its acceptance region. In the contract set C , type-i will
now pick maxj or maxl with l > j. The choice of each possible type of buyer
picks from C is at least as profitable as the one they picked from C. Thus, the
expected profit of C is at least as good as C. 

190 S.-P. Sheng and M. Liu

The above lemma suggests the following iterative way of finding the optimal
contract set.
Definition 2. Define function g(m, i) as the the maximum expected profit for
the seller by picking contract maxi and selecting optimally m − 1 contracts from
the set (maxi+1 , ..., maxK ).
Note that if we include maxi and maxj (i < j) in the contract set but nothing
else in between i and j, then a buyer of type l (i ≤ l < j) will pick contract maxi .
j−1
These types contribute to an expected profit of x∗i (b − c) l=i rl . At the same
time, no types below i will select maxi (as it is outside their acceptance regions),
and no types at or above j will select maxi (as for them maxj is preferable).
Thus the function g(m, i) can be recursively obtained as follows:
j−1

g(m, i) = max g(m − 1, j) + x∗i (b − c) rl ,
j:i<j≤K−m+2
l=i

K
with the boundary condition g(1, i) = x∗i (b − c) l=i rl .
Finally, it should be clear that the maximum expected profit for the seller
is given by max1≤i≤K g(M, i), and the optimal contract set can be determined
by going backwards: first determine i∗M = arg max1≤i≤K g(M, i), then i∗M−1 =
arg max1≤i≤K−1 g(M − 1, i), and so on.

Theorem 8. The set of contracts {maxi∗1 , maxi∗2 , · · · , maxi∗M } obtained using


the above procedure is optimal and its expected profit is given by g(M, i∗M ).

5 Conclusion
In this paper we considered a contract design problem where a primary license
holder wishes to profit from its excess spectrum capacity by selling it to potential
secondary users/buyers via designing a set of profitable contracts. We considered
two cases. Under symmetric information, we found the optimal contract that
achieves maximum profit for the primary user. Under asymmetric information,
we found the optimal contract if the buyer belongs to one of two types. When
there are more than two types we restricted our attention to the case where the
channel condition is common to all types, and presented an optimal procedure
to design the contracts.

References
1. Akyildiz, I.F., Lee, W.Y., Vuran, M.C., Mohanty, S.: Next generation/dynamic
spectrum access/cognitive radio wireless networks: a survey. Computer Net-
works 50(13), 2127–2159 (2006)
2. Buddhikot, M.M.: Understanding dynamic spectrum access: Models, taxonomy and
challenges. In: New Frontiers in Dynamic Spectrum Access Networks, DySPAN
2007, pp. 649–663. IEEE (2007)
Optimal Contract Design for an Efficient Secondary Spectrum Market 191

3. McHenry, M.A., Tenhula, P.A., McCloskey, D., Roberson, D.A., Hood, C.S.:
Chicago spectrum occupancy measurements & analysis and a long-term studies
proposal. In: The First International Workshop on Technology and Policy for Ac-
cessing Spectrum. ACM Press, New York (2006)
4. McHenry, M.A.: NSF spectrum occupancy measurements project summary. Shared
Spectrum Company Report (August 2005)
5. Chen, D., Yin, S., Zhang, Q., Liu, M., Li, S.: Mining spectrum usage data: a large-
scale spectrum measurement study. In: ACM International Conference on Mobile
Computing and Networking (MobiCom), Beijing, China (September 2009)
6. Zhao, Q., Sadler, B.M.: A survey of dynamic spectrum access. IEEE Signal Pro-
cessing Magazine: Special Issue on Resource-Constrained Signal Processing, Com-
munications, and Networking 24, 79–89 (2007)
7. Kim, H., Shin, K.G.: Efficient discovery of spectrum opportunities with mac-layer
sensing in cognitive radio networks. IEEE Transactions on Mobile Computing 7(5),
533–545 (2008)
8. Liu, X., Shankar, S.N.: Sensing-based opportunistic channel access. Journal of Mo-
bile Networks and Applications 11(4), 577–591 (2006)
9. Zhao, Q., Tong, L., Swami, A., Chen, Y.: Decentralized cognitive mac for oppor-
tunistic spectrum access in ad hoc networks: A pomdp framework. IEEE Journal
on Selected Areas in Communications (JSAC) 5(3), 589–600 (2007)
10. Ahmad, S.H.A., Liu, M., Javidi, T., Zhao, Q., Krishnamachari, B.: Optimality
of myopic sensing in multi-channel opportunistic access. IEEE Transactions on
Information Theory 55(9), 4040–4050 (2009)
11. Duan, L., Gao, L., Huang, J.: Contract-based cooperative spectrum sharing. In:
Dynamic Spectrum Access Networks (DySPAN), pp. 399–407. IEEE (2011)
12. Muthuswamy, P.K., Kar, K., Gupta, A., Sarkar, S., Kasbekar, G.: Portfolio opti-
mization in secondary spectrum markets. WiOpt (2011)
Primary User Emulation Attack Game
in Cognitive Radio Networks: Queuing
Aware Dogfight in Spectrum

Husheng Li1 , Vasu Chakravarthy2, Sintayehu Dehnie3 , and Zhiqiang Wu4


1
The University of Tennessee, Knoxville, TN
[email protected]
2
Air Force Research Lab, Dayton, OH
[email protected]
3
Booz Allen Hamilton, Dayton, OH
[email protected]
4
Wright State University, Dayton, OH
[email protected]

Abstract. Primary user emulation attack, which targets distabilizing


the queuing dynamics of cognitive radio networks, is studied using game
theoretic argument. The attack and defense are modeled as a stochas-
tic game. The Nash equilibrium of the game is studied. In particular,
the Lyapunov drift is considered as the reward in each round. Explicit
expressions of the Nash equilibrium strategies are obtained.

Keywords: primary user emulation, cognitive radio, stochastic game.

1 Introduction
Cognitive radio has attracted substantial studies since its birth in 1999 [14].
In cognitive radio systems, users without license (called secondary users) are
allowed to use the licensed spectrum that licensed users (called primary users)
are not using, thus improving the spectrum utilization efficiency. When primary
users emerge, the secondary users must quit the corresponding channels. To
ensure no interference to primary user traffic, the secondary user must sense the
spectrum periodically to determine the existence of primary users.
Such a dynamical spectrum access mechanism, particularly the spectrum sens-
ing mechanism, also incurs vulnerabilities for the communication system. One
serious threat is the primary user emulation (PUE) attack [1], in which the at-
tacker sends out signal similar to that of primary users during the spectrum
sensing period such that the secondary users will be ‘scared’ away even if there
is no primary user, since it is difficult to distinguish the signals from primary
users and the attacker. Such an attack is very efficient since the attacker needs
only very weak power consumption, due to the high requirement on the spectrum
sensing sensitivity of secondary users.
Most existing studies on PUE attack fall in the topics of proactive detection of
attacker [1] or passive frequency hopping [11]. Due to the difficulty of detecting

V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 192–208, 2012.

c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
Dogfight 193

the attacker, we will focus on the frequency hopping for avoiding PUE, in which
the secondary users randomly choose channels to sense such that the attacker
cannot always block the traffic in the cognitive radio network. Such an attack
and defense procedure is essentially a game, which is coined ‘dogfight in spec-
trum’ in [11] and has been studied using game theoretic argument. In previous
studies, only single hop communications, such as point to point communications
or multiple access, are considered and total throughput is considered as the game
reward (cost) for the defender (attacker). However, in many practical applica-
tions like sensor networks, the traffics are multihop and have constant average
traffic volumes, thus forming a queuing dynamics since each secondary user has
a buffer to store the received packets. Hence, the ultimate goal of the game is to
stabilize / destabilize the queuing dynamics, thus making the game a queuing
aware one. Note that the optimal scheduling strategy of stabilizing a queuing
system in wireless communication network is obtained in the seminal work [20].
However, there has not been any study on the game for stabilizing/destabilizing
queuing systems, which is of significant importance for the security of various
queuing systems. In this paper, we will model this queuing aware dogfight in spec-
trum as a stochastic game, in which the actions are the channels to sense/block
and the rewards are the metrics related to the queue stability such as the Lya-
punov drift and back pressure. We will first study the centralized case, in which
both the cognitive radio network and attackers are controlled by centralized
controllers, thus making the game a two-player one. Then, we will extend it to
the more practical situation, in which each player can observe only local system
state, based on which it makes the decision of action. Different from graphical
games in which each player has its own reward, in our situation, the attackers
and secondary users form two coalitions whose rewards are the sum of the local
rewards and each player devotes to increase the coalition reward. Such a ‘local-
decision-global-reward’ brings significant difference from the graphical games.
For both cases, we will provide the game formulation, the Nash equilibrium and
value of the general situation, and discussions for special cases. Note that, for
simplicity, we assume that the attackers know the queuing situation of the cogni-
tive radio network, which can be realized by eavesdropping the control messages
in the network.
In summary, our major contribution includes
– Study the network wide PUE attack with the awareness of queuing dynamics,
which extends the PUE attack in single hop systems.
– Study the game of stabilizing/destabilizing queuing systems, which extends
the decision problems of network stabilization.
The study will deepen our understanding on the security of cognitive radio net-
works, as well as that of general queuing systems. It will help the design of a
robust cognitive radio network which can effectively combat the PUE attack in
the network range.
The remainder of this paper is organized as follows. The existing work re-
lated to this paper will be introduced in Section 2. The system model will be
explained in Section 3. The centralized and decentralized games will be discussed
194 H. Li et al.

in Sections 4 and 5, respectively. Numerical results will be provided in Section


6. Finally, the conclusion will be drawn in Section 7.

2 Related Works
In this section, we provide a brief survey of the existing works related to this
paper. Note that there are huge number of studies in each of the topics. Hence,
the introduction is far from exhaustive.

2.1 Security Issues in Cognitive Radio Networks


In contrast to traditional wireless communication networks, two types of new
attacks emerge in the context of cognitive radio network, namely the false re-
port attack and PUE. The former one occurs only in collaborative spectrum
sensing in which secondary users exchange information or send the observations
to a fusion center in order to improve the performance of spectrum sensing. In
such a collaboration, a malicious node can send faked report to the center or
the neighbors. Hence, various approaches have been proposed to detect such an
attack and the corresponding attacker. For example, a Bayesian framework is
applied in [23] and [24] to detect the attacker. In [17], a trust system is used
to evaluate the trustworthiness of each collaborating secondary user. We have
explained the mechanism of PUE attack which also attracted significant studies.
In [7], a mechanism is proposed to detect the PUE attack based on the approach
proposed in [1]. In [21], the emulation attack is modeled as a Bayesian game
and the Nash equilibrium is analyzed. In contrast to these studies, we study the
attack and defense with multiple players in networks and the goal of destabiliz-
ing/stabilizing the queuing dynamics, instead of the spectrum sensing precision
or the system throughput.

2.2 Stability of Queuing Systems


A key task in queuing systems is to stabilize the queuing dynamics; otherwise
the buffers containing the packets will overflow, thus causing packet loss. In the
seminal work [20], Tassiulas and Ephremides found a scheduling algorithm for
wireless communication networks achieving the optimal throughput region. The
algorithm was extended to the context of cognitive radio networks by incorpo-
rating impact of primary users [22]. In [15], a ‘drift-plus-penalty’ cost function is
proposed to achieve the tradeoff between the queuing stability and other factors
like delay. Note that the algorithm proposed in [20] is centralized, i.e., a center
will make the decision of scheduling based on the queuing situations of each
node, which is impractical in most applications. In recent years, the scheduling
algorithm has been extended to the decentralized case at the cost of reasonable
performance loss [25][27]. Although the scheduling algorithm and the correspond-
ing queuing stability have been widely studied, there have been few studies on
the queuing dynamics aware attack and defense using the tool of game theory.
Dogfight 195

2.3 Games

The analysis in this paper is based on game theory. Due to the features of the
queuing aware dogfight in spectrum, our study concerns both stochastic games
and graphical games since the reward is dependent on the system state and the
players form a graph (network).

Stochastic Game. Many queuing systems can be modeled as Markov systems


in which the future evolution of system is based on only the current system state.
Hence, the corresponding game also requires a Markov framework, in which the
reward is dependent on the system state. Games in a Markov system are called
stochastic games and were studied by Shapley [19]. In stochastic games, the Nash
equilibrium, or the game value, is characterized by the combination of dynamic
programming and the value of one-snapshot games. A comprehensive introduc-
tion on stochastic games can be found in [5]. Note that the classical results in
[5][19] are based on the assumption that all players have perfect information of
the system state. In many situations, this assumption is invalid. For example, in
the context of PUE attack on queuing dynamics in cognitive radio network, the
attackers may not perfectly know the queuing situations of each secondary user.
However, the computation of Nash equilibrium in stochastic games with partial
observation is still an open problem. Hence, we focus on the case of perfect state
information in this paper.

Graphical Game. As more studies are paid to various types of networks, such
as social networks and communication networks, game theory is also extended
from the traditional structureless setup (e.g., two players or multiple layers form-
ing a complete graph) to the scenarios with network structures (called graphical
game) [16]. In such games, the players form a graph or a network in which
the corresponding topology plays an important role in the game. In one type of
graphical game, each player has its own payoff. The algorithm for computing the
Nash equilibrium has been studied in [4], [8] and [9]. In another type of graphi-
cal game, the players form two coalitions and each player aims to maximize the
coalition reward equaling the sum of individual rewards. Such a game has been
studied in [3] and [26]. An excellent summary can be found in [2].

3 System Model

The system model consists of the models of cognitive radio networks, data flows
and primary user emulation attacks.

3.1 Network Model

We consider a cognitive radio network with N secondary users. The topology of


the network can be represented by a graph with N nodes, in which two nodes
196 H. Li et al.

adjacent to each other are able to communicate with each directly. We denote by
n ∼ m if secondary users n and m are neighbors in the network. We assume that
there are totally M licensed channels which may be used by K primary users.
We denote by Nk the set of secondary users that may be affected by primary
user k and denote by Mk the set of channels that primary user k occupies when
it is active. For simplicity, we assume that the activities in different time slots of
each primary user are mutually independent, and the probability of being active
is denoted by pk for primary user k.
The time is divided into time slots, each containing a spectrum sensing period
followed by a data transmission period. At time slot t, the status of channel m
is denoted by sm ; i.e., sm = 0 when the channel is not being used by primary
users and sm = 1 otherwise. Due to the limited capability of spectrum sensing, we
assume that each secondary user can sense only one channel during the spectrum
sensing period. It is straightforward to extend to the more generic case in which
multiple channels can be sensed simultaneously. For simplicity, we assume that
the spectrum sensing is perfect; i.e., the output of spectrum sensing is free of
errors.

3.2 Traffic Model

We assume that there are totally F data flows in the cognitive radio network. We
denote by Sf and Df the source node and destination node of flow f , respectively.
We assume that the number of packets arriving at the source node of data flow
f satisfies a Poisson distribution with expectation af . The routing paths of the
F data flows can be represented by an F × N matrix R in which Rf n = 1 if
data flow f passes through secondary user n and Rf n = 0 otherwise. We denote
by In the set of flows passing through secondary user n.
The data flows are packetized using the
same packet length. Each secondary user has CR
one or more buffers to store the received pack- State: queue
Action: channel
selection and
lengths
ets. In each time slot, the secondary users will scheduling

choose one packet, if there is any, and sense


one or more channel for transmission. Suppose Reward:
that one channel can support the transmission Lyapunov drift

of only one data flow. We assume that, if two


Attckers
secondary users are close to each other, they
State: queue Action: the
are not allowed to sense the same channel due length channels to jam
to the potential collision. We denote by Cn the
set of other secondary users that have intol-
erable co-channel interference with secondary
Fig. 1. Elements of the game
user n. We assume that there are sufficiently
many channels such that any set of interfer-
ing secondary users can be assigned to different channels and all secondary
users can transmit simultaneously by appropriately allocating the channels; i.e.,
maxn Cn  M .
Dogfight 197

When secondary user n decides to transmit to the next hop neighbor j and
an available channel, say m, is found, the packet can be delivered successfully
with probability μnjm which is determined by the channel quality.

3.3 PUE Attack Model


We assume that there are totally L PUE attackers distributed around the cogni-
tive radio network. Each attacker chooses Q (Q ≤ M ) channels to attack. During
each spectrum sensing period, each PUE sends interference in the Q channels
such that the secondary users sensing these channels are scared away even if
the channel is actually not being used by primary users. We denote by Vl the
set of potential secondary user victims that could be jammed by attacker l. We
assume that the attackers have certain knowledge about the current state of the
cognitive radio network.

4 Centralized Game
In this section, we consider the centralized case, in which the actions of the
attackers and secondary users are both fully coordinated. Hence, we can assume
that there are two centers making the decisions for the attackers and secondary
users, respectively, such that there are two players in the game.

4.1 Elements of Game


We define the following elements of the game. Obviously, this game is a stochastic
one having the elements of reward, action and state.

– State: The system state, denoted by s, includes the queue lengths of all flows
and all secondary users which are denoted by {qf n }f =1,...,F,n=1,...,N . The
state space is then denoted by S which consists of all possible s. We assume
that the system state is visible to both attackers and secondary users. Note
that, since we assume that the primary users’ activities are independent in
time, the spectrum situation is memoryless. It is easy to extend to the case
in which the spectrum has memory by incorporating the spectrum state into
the system state.
– Actions: We denote by Aa and As the sets of actions of the attackers and
secondary users, respectively. The actions of the attackers, denoted by aa ,
include the channels to jam, which are denoted by {cal }l=1,...,L (cl is a vec-
tor containing the Q channels to jam). The action of the secondary users,
denoted by as , includes the assignment of the channels, as well as the sched-
uled flow. We denote by cn (t) and fn (t) the assigned channel and scheduled
flow at secondary user n at time slot t. To avoid co-channel interference, we
have cn (t) = cm (t) if m ∈ Cn .
198 H. Li et al.

– Reward: The purpose of the attacker is to make the cognitive radio col-
lapse, or equivalently destabilizing the queuing system, while the purpose of
the secondary users is to stabilize the system. Hence, a quantity is needed
to quantify the stability of the system. We define the following Lyapunov
function, which is given by
F 
 N
V (s(t)) = qf2 n (t), (1)
f =1 n=1

namely the square sum of all queue lengths. The larger the Lyapunov func-
tion is, the more unstable the system is since there are more packets staying
in the network. Since V (s(t)) can be rewritten as
t

V (s(t)) = V (s(0)) + V (s(r)) − V (s(r − 1)), (2)
r=1

we define d(t) = E [V (s(t)) − V (s(t − 1)], namely the expected Lyapunov


drift [15], as the reward of the attacker. When the Lyapunov drift is positive,
the system becomes more unstable, thus benefiting the attackers. To define
the reward of the secondary user system, we model the game as a zero-sum
one and define −d(t) as the reward of cognitive radio network. For simplicity,
we add a discounting factor 0 < β < 1 to the reward in each time slot such
that the total reward of the attacker is given by


R= β t d(t), (3)
t=0

which simplifies the analysis since it is much easier to analysis the game with
a discounted sum of rewards. Note that this definition is motivated by the
classical works on scheduling queuing network in which the scheduling tries
to minimize the Lyapnov drift in order to stabilize the queues [15][20].

4.2 Attack/Defense Strategies


The attack strategy, denoted by πa , is defined as the condition probability
P (aa |s); i.e., the probability of the action given the current system state. Sim-
ilarly, we can also define the defense strategy, denoted by πs , as P (as |s). We
will first study the Nash equilibrium given the above game configuration via
the Shapley’s Theorem. Then, we will use a simpler definition of reward which
simplifies the analysis.

Nash Equilibrium. First, we follow the standard solution of stochastic games.


For a general stochastic game (not necessary zero-sum), the Nash equilibrium is
defined as the pair of strategies (πs∗ , πa∗ ), which satisfies
R(πs∗ , πa∗ ) ≥ R(πs∗ , πa ), ∀πa , (4)
R(πs∗ , πa∗ ) ≤ R(πs , πa∗ ), ∀πs . (5)
Dogfight 199

At the Nash equilibrium point, both players have no motivation to change the
strategies specified the equilibrium point; any unilateral deviation from the equi-
librium point can only incur performance degradation of itself.
To find the Nash equilibrium, an auxiliary matrix game proposed by Shapley
[19] is needed. We first define the matrix game conditioned on the system state
s, which is given by
⎛ ⎞
d(s, 1, 1) d(s, 1, 2) · · · d(s, 1, |Aa |)
⎜ d(s, 2, 1) d(s, 2, 2) · · · d(s, 2, |Aa |) ⎟
⎜ ⎟
R(s) = ⎜ .. .. .. .. ⎟,
⎝ . . . . ⎠
d(s, |As |, 1) d(s, |As |, 2) · · · d(s, |As |, |Aa |)

in which d(s, a1 , a2 ) is the expected Lyapunov drift when the system state is
s and the actions are a1 and a2 for the attackers and cognitive radio network,
respectively.
We define the value vector of the attacker, denoted by va = (va (1), ..., va (|S|),
whose elements are given by

va (s) = R(s), s = 1, ..., |S|, (6)

where R(s) is the reward of the attackers given the initial state s. Then, an
auxiliary matrix game is defined with the following payoff matrices

R̃(s, va ) = R(s) + βT(s, va ), s ∈ S, (7)

where the elements in the matrix T(s, va ) are defined as



T(s, va )ij = p(s |s, i, j)va (s ). (8)
s

Similarly, we can also define the value vector for the cognitive radio network,
which is denoted by vc .
The following theorem (Shapley, 1953, [19]) discloses the condition of the Nash
equilibrium of the zero-sum stochastic game:
Theorem 1. The value vector at the Nash equilibrium satisfies the following
equations:

va (s) = val [R(s, va )] , s ∈ S, (9)

where the matrix game R(s, va ) is in (7).


Once the value vector va is obtained, the optimal action of the attackers is given
by
 
aa (s) = arg max min R̃(s, va ) , (10)
j i ij

while the optimal action of the cognitive radio is given by


 
ac (s) = arg min max R̃(s, va ) . (11)
i j ij
200 H. Li et al.

Myopic Game for Back Pressure. Although the Nash equilibrium exists for
the stochastic game formulation in the previous subsection, it is very difficult to
obtain analytic expression for the equilibrium. We can only obtain the numerical
solution for small systems. Moreover, it is still not clear whether defining the
Lyapunov drift as the reward of each time slot is the optimal choice. In this
subsection, we will study the myopic case in which the attackers and cognitive
radio take myopic strategies by maximizing their rewards in each time slot,
without considering the future evolution. Moreover, we will approximate the
maximization of Lyapunov drift by maximizing the back pressure, which can
simplify the stochastic game to a one-stage normal game.
It is well known that, when there is no attacker, the back pressure of flow f
at secondary user n is given by [20]

(qf n − qf j ) μnjm , j ∈
/ Df
Df n = , (12)
qf n μnjm , j ∈ D
where j is the next secondary user along flow f and m is the channel for the
transmission from n to j (recall that i ∈ Df means that node i is a destination
node for flow f ). [20] has shown that the scheduling algorithm minimizing the
back pressure, which is tightly related to minimizing the Lyapunov drift, can
stabilize the queuing system.
However, when attacks exist, the back pressure is dependent on the attackers’
strategy since the channels selected by the attackers will change the transmission
success probability μnjm . Recall that the actions of attackers and cognitive radio
network are denoted by aa and ac , respectively. Then, the success probability,
as a function of the actions, is given by (recall that Vl is the set of secondary
user that attacker l can attack)
/ cal , ∀n ∈ Vl ),
μ̃njm (aa , ac ) = μnjm I(m ∈ (13)
where I is the characteristic function of the event that no attacker that can
interfere secondary user n is attacking channel m. Then, the back pressure in
the game is defined as a function of the actions aa and ac , which is given by

(qf n − qf j ) μ̃njm (aa , ac ), j ∈
/ Df
D̃f n (aa , ac ) = . (14)
qf n μ̃njm (aa , ac ), j ∈ Df

Then, the reward of the attacker is given by (recall that fn is the flow scheduled
at secondary user n)
N

R(aa , ac ) = − D̃fn ,n (aa , ac ), (15)
n=1

and the reward of the cognitive radio network is N n=1 D̃fn ,n since the game
is modeled as a zero-sum one. Then, the strategy of the attackers at the Nash
equilibrium is given by
πa∗ = arg max min R(πa , πc ), (16)
πa πc
Dogfight 201

and the corresponding action of the cognitive radio is given by

πc∗ = arg min max R(πa , πc ). (17)


πc πa

The actions at the Nash equilibrium point can be computed using linear pro-
gramming. The challenge is the large number of actions when the network size
or the number of channels is large. We will find the analytic expression for an
example in the sequel. For large system size, we can only use approximations for
exploring the Nash equilibrium.

4.3 Example
In this subsection, we use one example to illustrate the previous discussions,
which also provides insights for networks with larger size. The example is illus-
trated in Fig. 2, in which there is one attacker and three secondary users. We
assume that there are totally two channels over which two data flows are sent
from secondary user 3 to secondary users 1 and 2, respectively. The attacker can
only interfere secondary user 3. For simplicity, we assume that secondary user 3
can sense and transmit over both channels simultaneously; hence, there are only
two possible actions for secondary user 3.

Stochastic Game for Lyapunov Drift.


Due to its small size, the Nash equilibrium of Attacker
the example can be obtained by solving equa-
tions. However, we are still unable to obtain PUE
the explicit expression. Below, we consider a Attack
more simplified case in which μ311 μ312
3
and μ322 μ321 ; i.e., secondary user 3 should Flow 1 Flow 2
use channel 1 to transmit data flow 1 and use
channel 2 to transmit data flow 2. In this case, 1 Cognitive Radio 2
the strategy of the cognitive radio network is Network
fixed; hence, the problem is converted from
a game theoretic one to a single sided deci-
Fig. 2. An illustration of the
sion one. Then, the attacker needs to decide example
whether to jam data flow 1 (thus sending sig-
nal over channel 1) or jam data flow 2 (thus sending signal over channel 2). The
following proposition provides the optimal strategy for the attacker.
Proposition 1. Consider the above simplified case. We assume that the trans-
mission success probability is μ and the new packet arrival rate is λ, both identi-
cal for the two data flows. The optimal strategy of the attacker to maximize the
Lyapunov function is to choose the channel to jam the data flow having a larger
queue length.

One Stage Game for Back Pressure. Now we consider the one stage game
for maximizing or minimizing the back pressure. Fix a certain time slot and
202 H. Li et al.

drop the index of time for simplicity. It is easy to verify that the reward for the
cognitive radio can be represented by a matrix, which is given by
 
q32 μ322 q31 μ312
. (18)
q31 μ311 q32 μ321

The Nash equilibrium of this matrix game is provided in the following proposi-
tion. The proof is a straightforward application of the conclusion in [5]; hence,
we omit the proof due to the limited space.
Proposition 2. We denote by πja the probability that the attacker attacks chan-
nel j, j = 1, 2, and by πkc the probability that secondary user 3 transmits data flow
1 over channel k while transmitting data flow 2 over the other channel, k = 1, 2.
The Nash equilibrium of the matrix game in (18) is given by the following cases:
– If the following inequality holds; i.e.,

(q32 μ322 − q31 μ312 )(q32 μ321 − q31 μ311 ) > 0
, (19)
(q32 μ322 − q31 μ311 )(q32 μ321 − q31 μ312 ) > 0

the equilibrium strategies are given by


q32 μ321 −q31 μ311
π1a = q32 μ322 −q31 μ312 +q32 μ321 −q31 μ311
q32 μ321 −q31 μ312 . (20)
π1c = q32 μ322 −q31 μ311 +q32 μ321 −q31 μ312

– If the first equality in (19) does not hold, then we have the following possi-
bilities:
• q32 μ322 ≥ q31 μ312 and q32 μ321 < q31 μ311 , or q32 μ322 > q31 μ312 and
q32 μ321 ≤ q31 μ311 : secondary user 3 should always transmit data flow 1
over channel 1; the attacker should attack channel 1 if q31 μ311 > q32 μ322
and attack channel 2 otherwise.
• q32 μ322 < q31 μ312 and q32 μ321 ≥ q31 μ311 , or q32 μ322 ≤ q31 μ312 and
q32 μ321 > q31 μ311 : secondary user 3 should always transmit data flow 1
over channel 2; the attacker should attack channel 1 if q32 μ321 > q31 μ312
and attack channel 2 otherwise.
• q32 μ322 = q31 μ312 and q32 μ321 = q31 μ311 : secondary user 3 can choose
either action; the attacker should attack channel 1 if q32 μ321 > q31 μ312
and attack channel 2 otherwise.
– If the second equality in (19) does not hold, then we have the following pos-
sibilities:
• q32 μ322 ≤ q31 μ311 and q31 μ312 < q32 μ321 , or q32 μ322 < q31 μ311 and
q31 μ312 ≤ q32 μ321 : the attacker should always attack channel 1; secondary
user 3 should transmit data flow 1 over channel 1, if q32 μ322 > q31 μ312 ,
and transmit over channel 2 otherwise.
• q32 μ322 ≥ q31 μ311 and q31 μ312 > q32 μ321 , or q32 μ322 > q31 μ311 and
q31 μ312 ≥ q32 μ321 : the attacker should always attack channel 2; secondary
user 3 should transmit data flow 1 over channel 1, if q31 μ311 > q32 μ321 ,
and transmit over channel 2 otherwise..
Dogfight 203

• q32 μ322 = q31 μ311 and q32 μ321 = q31 μ312 : the attacker can attack any
channel; secondary user 3 should transmit flow 3 over channel 1 if
q32 μ322 > q31 μ312 and attack channel 2 otherwise..
Remark 1. We can draw the following conclusions from the Nash equilibrium:
– When all channels have the same quality, the attacker should attack each
channel with probability 0.5, which is independent of the queue lengths.
– Suppose μ311 = μ322 μ312 = μ321 ; i.e., it is much desirable to transmit
data flow 1 over channel 1 and data flow 2 over channel 2, the attacker should
attack the channel more desirable for the data flow with more queue length.
In this situation, the queue length information is useful.

4.4 Stability Analysis


Now we analyze the stability of the queuing dynamics. We first provide a brief
introduction to the queuing stability when there is no attacker. Then, we consider
the case when attacker exists.

Stability Without Attacker. When there is no attacker, the stability of queu-


ing networks has been analyzed for single channel case in [20], which is easy to
extend to the multichannel case. We denote by a L-dimensional vector f the sums
of data flow rates of the links; i.e., fl stands for the total data rates through link
l, l = 1, ..., L. We denote by S the set of all vectors of transmission success prob-
abilities, each dimension corresponding to a link and each vector corresponding
to one possible channel assignment. Then, if we can find a vector c ∈ co(S) such
that f < c, then the queue is stabilizable. When f > c, the queues cannot be
stabilized. The proofs follow those of Lemma 3.2 and Lemma 3.3 in [20].

Stability Subject to Attacker. When


attacker(s) exists, the capacity vector c is 5 6
changed since the transmission success prob-
A B
ability is decreased due to the PUE attack. 3 4
Since the attack actions are dynamical, de-
2 1
pending on the queue situations, each vector
in S also becomes dynamical. Hence, it is diffi-
cult to analyze the stability analytically. Here Fig. 3. An illustration of the de-
we just provide a qualitative observation. For centralized game
a certain link l, if the total flow rate fl is close
to the capacity cl , then it is more possible that
there is a long queue at the transmitter. As we have seen in the example, the
attacker tends to attack secondary users with longer queues by jamming the
channels more possibly available to the secondary user, given that the channel
conditions are similar. Then, cl is further decreased, thus making the attacker
more focused on link l.
For the simple example in Fig. 2, when the attacker and the network carry
out the one-stage game, we have the following corollary of Prop. 1:
204 H. Li et al.

Corollary 1. A necessary condition for q1 → ∞ and q2 being finite is


 2  2
μ312 μ311
μ311 + μ312 < f1 , (21)
μ311 + μ312 μ311 + μ312

and
μ312 μ311 (μ321 + μ322 )
2 > f2 . (22)
(μ311 + μ312 )

Proof. The proof is simple. We notice

c1 = (1 − π1a ) π1c μ311 + π1a (1 − π1c ) μ312 , (23)

and

c2 = π1a π1c μ321 + (1 − π1a ) (1 − π1c ) μ322 . (24)

Then, we simply substitute the conclusion in Prop. 1 into the above expressions
of c1 and c2 .

5 Decentralized Game

As we have discussed in the previous section, the centralized game is difficult to


analyze due to the large action space and state space; moreover, the centralized
controls of the attackers and cognitive radio network are impractical in applica-
tions. Hence, we study the decentralized game for both attackers and cognitive
radio network. An illustration is given in Fig. 3, in which we consider two at-
tackers, namely A and B, and six secondary users, namely 1, 2, 3, 4, 5 and 6. A
key feature for the decentralized game is that each attacker/secondary user is a
player and each player makes decision based on the states of its neighbors/direct
victims. For example, secondary user 2 makes its decision based on the state of
secondary user 4, while attacker A makes its decision based on the states of
secondary users 2 and 3.
Based on the big picture described above, we define the elements of the game
as follows:

– System state: Due to the locality assumption, each player does not nec-
essarily know the queue lengths of all secondary user and all flows. For
attacker l, its state is sal = {qf n }n∈Vl ,f ∈In , i.e., the queuing situations of
all secondary users that it may attack. For secondary user n, its state is
sal = {qf m }n∼m,f ∈Im , i.e., the queuing situations of all neighboring sec-
ondary users.
– Strategy: As we have assumed, each player knows only the states of its
neighbors. Hence, its action is also dependent on only the neighbors. We de-
fine the strategy of a player as the distribution of action given the states
Dogfight 205

of its neighbors and itself1 . For each attacker, the strategy is given by
P (a| {qf n }n∈Vl ,f ∈In ), a = 1, ..., M . For each secondary user n, the strat-
egy is given by P (a| {qf n }m∼n,f ∈Im ). The overall strategy of the cognitive
radio network (attacker) is the product of the strategies of each secondary
user (attacker); i.e.,
M m
πa = m=1 πa
N n
. (25)
πc = n=1 πc

Note that the key difference between the decentralized game and the central-
ized one is the structure of the strategy; i.e., the decentralized game has a
product space for the strategy while the centralized does not.
– Reward: Again, we consider the Laypunov drift as the reward. For secondary
user n, its reward is given by

rn (t) = qf2 n (t − 1) − qf2 n (t). (26)
f ∈Im

The total reward of the coalition of secondary users is then given by


N

R(t) = rn (t)
n=1
= V (t − 1) − V (t), (27)

which is equal to the negative of the Laypunov drift.


The situation is slightly more complicated for the attacker
 coalition. Nat-
urally, we can define the reward of attacker k as − n∈Nk rn (t). However,
if we simply add up the individual rewards of the attackers as the total re-
ward of the attacker coalition, it may not be equal to the negative of R(t),
since the sets of secondary users affected by different attackers may overlap.
Hence, we assume that, before launching the attack, the attacker divide the
secondary users into disjoint sets and each attacker takes the rewards from
only one set of secondary users, denoted by Ñk for attacker k. Then, we
define − n∈Ñk rn (t) as the reward of attacker k; thus, the total reward of
the attacker coalition is equal to the negative of the reward of the secondary
user coalition.
Then, the reward of the secondary user coalition is given by


Rs = E β t R(t) , (28)
t=1
1
The more general strategy should include the history, namely the previous actions
and previous system states, into the condition of the probability distribution of
actions. It is still not clear whether the Markov assumption in the strategy loses any
information. For the case of time average reward, it has been shown in [2] that, when
the strategy of one coalition is fixed, the Markov strategy can achieve the optimal
reward for the other coalition.
206 H. Li et al.

where β is the discounting factor. We can also consider the mean reward;
however, it is much more complicated.
For the PUE attack game, we define the value of the game as follows [2].
Definition 1. The value of the PUE attack game is given by

sup inf Rc = inf sup Rc , (29)


πc πa πa πc

if both sides exist.


The following proposition shows the existence of the value of the decentralized
stochastic game. The proof is similar to that of Theorem 4.16 in [2], where the
reward is the average of rewards.
Proposition 3. The value of the decentralized game, defined in (29) exists.
The proof and more discussions will be made in the journal version.

6 Numerical Results
In this section, we use numerical results to
demonstrate the theoretical analysis. In Fig.
4, we show the rate region subject to PUE 0.9
uniformly random
0.8 Nash equilibrium
attacks for the network in Fig. 2. The strate- 0.7
no attack

gies are obtained by solving the equations in 0.6

the Shapley’s Theorem, using numerical ap- 0.5


2
λ

0.4
proach [5]. Since there are infinitely many pos- 0.3
sible queue lengths, thus resulting in infinitely 0.2

many system states, we merge all the cases 0.1

0
with more than 9 packets in a queue into one 0 0.2 0.4
λ
1
0.6 0.8 1

state. We judge whether a given set of rates is


stable by carrying out the simulation for the Fig. 4. Rate region subject to
queuing dynamics; if one of the queues has PUE attacks
more than 50 packets after 2000 time slots, we
claim that the rates are unstable. We tested
the case of Nash equilibrium, uniformly choosing the actions and no PUE attack.
The region of each case is the area below the corresponding curve. We observe
that the PUE attack can cause a significant reduction of the rate region.

7 Conclusions
In this paper, we have studied multiple attackers and an arbitrary cognitive radio
network with multiple data flows, where the goal of the game is to stabilize
(destabilize) the queuing dynamics by the secondary users (attackers). Both
the centralized and decentralized cases of the game have been considered. The
Lyapunov drift and the back pressure are considered as the game rewards for
Dogfight 207

the stochastic game case and the myopic strategy case, respectively. The value
functions and Nash equilibriums have been obtained for the general case, while
the explicit expressions are obtained for simple but typical scenarios. Numerical
simulations have been carried out to demonstrate the analysis.

References

1. Chen, R., Park, J.-M., Reed, J.H.: Defense against primary user emulation attacks
in cognitive radio networks. IEEE J. on Select. Areas in Commun. Special Issue
on Cognitive Radio Theory and Applications 26(1) (2008)
2. Chornei, R.K., Daduna, H., Knopov, P.S.: Control of Spatially Structured Random
Processes and Random Fields with Applications. Springer (2006)
3. Daskalakis, C., Papadimitriou, C.: Computing pure Nash equilibria in graphical
games via Markov random fields. In: Proc. of the 7th ACM Conferene on Electrionic
Commerce (2006)
4. Elkind, E., Goldberg, L., Goldberg, P.: Graphical games on tree revisited. In: Proc.
of the 7th ACM Conferene on Electrionic Commerce (2006)
5. Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer (1997)
6. Han, Z., Pandana, C., Liu, K.J.R.: Distributive opportunistic spectrum access for
cognitive radio using correlated equilibrium and no-regret learning. In: Proc. of
IEEE Wireless Communications and Networking Conference, WCNC (2007)
7. Jin, Z., Anand, S., Subbalakshmi, K.P.: Detecting primary user emulation attacks
in dynamic spectrum access networks. In: Proc. of IEEE International Conference
on Communications, ICC (2009)
8. Kakade, S., Kearns, M., Langford, J., Ortiz, L.: Correlated equilibria in graphical
games. In: Proc. of the 4th ACM Conference on Electronic Commerce, EC (2003)
9. Kakade, S.M., Kearns, M., Ortiz, L.E.: Graphical Economics. In: Shawe-Taylor,
J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 17–32. Springer,
Heidelberg (2004)
10. Korilis, Y.A., Lazar, A.A.: On the existence of equlibria in noncooperative optimal
flow control. Journal of the ACM 42, 584–613 (1995)
11. Li, H., Han, Z.: Dogfight in spectrum: Jamming and anti-jamming in cognitive
radio systems. In: Proc. of IEEE Conference on Global Communications, Globecom
(2009)
12. Li, H., Han, Z.: Blind dogfight in spectrum: Combating primary user emulation
attacks in cognitive radio systems with unknown channel statistics. In: Proc. of
IEEE International Conference on Communications, ICC (2010)
13. Li, H., Han, Z.: Competitive spectrum access in cognitive radio networks: Graphical
game and learning. In: Proc. of IEEE Wireless Communication and Networking
Conference, WCNC (2010)
14. Mitola, J.: Cognitive radio for flexible mobile multimedia communications. In: Proc.
IEEE Int. Workshop Mobile Multimedia Communications, pp. 3–10 (1999)
15. Neely, M.J.: Stochastic Network Optimization with Application to Communication
and Queuing Systems. Morgan&Claypool Press (2010)
16. Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V. (eds.): Algorithmic Game
Theory. Cambridge University Press (2007)
17. Qin, T., Yu, H., Leung, C., Sheng, Z., Miao, C.: Towards a trust aware cognitive
radio architecture. ACM SIGMOBILE Newsletter 13 (April 2009)
208 H. Li et al.

18. Sampath, A., Dai, H., Zheng, H., Zhao, B.Y.: Multi-channel jamming attacks using
cognitive radios. In: Proc. of IEEE Conference on Computer Communications and
Networks, ICCCN (2007)
19. Shapley, L.S.: Stochastic games. In: Proceedings Nat. Acad. of Science USA, pp.
1095–1100 (1953)
20. Tassiulas, L., Ephremides, A.: Stability properties of constrained queuing systems
and scheduling for maximum throughput in multihop radio networks. IEEE Trans.
Automat. Control 37, 1936–1949 (1992)
21. Thomas, R.W., Komali, R.S., Borghetti, B.J., Mahonen, P.: A Bayesian game
analysis of emulation attacks in dynamic spectrum access networks. In: Proc. of
IEEE International Symposium of New Frontiers in Dynamic Spectrum Access
Networks, DySPAN (2008)
22. Urgaonkar, R., Neely, M.J.: Opportunistic scheduling with reliability guarantees
in cognitive radio networks. IEEE Trans. Mobile Computing 8, 766–777 (2009)
23. Wang, W., Li, H., Sun, Y., Han, Z.: Attack-proof collaborative spectrum sensing
in cognitive radio networks. In: Proc. of Conference on Information Sciences and
Systems, CISS (2009)
24. Wang, W., Li, H., Sun, Y., Han, Z.: CatchIt: Detect malicious nodes in collabora-
tive spectrum sensing. In: Proc. of IEEE Conference on Global Communications,
Globecom (2009)
25. Wu, X., Srikant, R.: Regulated maximal matching: A distributed scheduling al-
gorithm for multihop wireless networks with node-exclusive spectrum sharing. In:
Proc. of 44th IEEE Conference on Decision and Control (2005)
26. Yao, D.: S-modular games, with queuing applications. Queuing Systems and Their
Applications 21, 449–475 (1995)
27. Ying, L., Srikant, R., Eryilmaz, A., Dullerud, G.E.: Distributed fair resource al-
location in cellular networks in the presence of heterogeneous delays. In: Proc. of
IEEE International Symposium on Modeling and Optimization in Mobile, Ad Hoc
and Wireless Networks, WIOPT (April 2005)
Revenue Maximization
in Customer-to-Customer Markets

Shaolei Ren and Mihaela van der Schaar

Electrical Engineering Department, University of California, Los Angeles


{rsl,mihaela}@ee.ucla.edu

Abstract. Customer-to-customer (C2C) markets, such as eBay, provide


a platform allowing customers to engage in business with each other. The
success of a C2C market requires an appropriate pricing (i.e., transaction
fee charged by the market owner) scheme that can maximize the market
owner’s revenue while encouraging customers to participate in the mar-
ket. However, the choice of an optimal revenue-maximizing transaction
fee is challenged by the large population of self-interested customers (i.e.,
sellers and buyers). In this paper, we address the problem of maximizing
the market owner’s revenue based on a hierarchical decision framework
that captures the rationality of both sellers and buyers. First, we use
a model with a representative buyer to determine the sales of products
in the market. Then, by modeling sellers as self-interested agents mak-
ing independent selling decisions, we show that for any transaction fee
charged by the market owner, there always exists a unique equilibrium in
the selling decision stage. Finally, we derive the optimal transaction fee
that maximizes the market owner’s revenue. We find that to maximize its
revenue under certain circumstances, the market owner may even share
its advertising revenues with sellers as rewards to encourage them to sell
products in the market and bring more website traffic. Our results indi-
cate that the market owner’s revenue can be significantly increased by
optimally choosing the transaction fee, even though sellers and buyers
make self-interested and rational decisions.

Keywords: Revenue maximization, customer-to-customer market, pric-


ing, product substitutability.

1 Introduction
Electronic commerce markets have witnessed an explosive growth over the past
decade and have now become an integral part of our everyday lives. In the
realm of electronic commerce, customer-to-customer, also known as consumer-to-
consumer (C2C), markets are becoming more and more popular, as they provide
a convenient platform allowing customers to easily engage in business with each
other. A well-known C2C market is eBay, on which a wide variety of products,
including second-hands goods, are sold.
As a major source of revenue, a C2C market owner charges various fees, which
we refer to as transaction fees, for products sold in the market. For instance, eBay

V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 209–223, 2012.

c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
210 S. Ren and M. van der Schaar

charges a final value fee and a listing fee for each sold item [1]. Hence, to enhance
a C2C market’s profitability, it is vital for the market owner to appropriately
set the transaction fee. In this paper, we focus on a C2C market and address
the problem of maximizing the market owner’s revenue. The scenario that we
consider is summarized as follows.
1. The market owner monetizes the market by charging transaction fees for
each product sold and (possibly) through advertising in the market. For the
completeness of analysis, we also allow the market owner to reward the sellers
to encourage them to sell products in the market, which increases the market’s
website traffic and hence advertising revenues if applicable. Although rewarding
sellers may seem to deviate from our initial goal of collecting transaction fees
from sellers, we shall show that rewarding sellers may also maximize the market
owner’s revenue under certain circumstances.
2. Products are sold by sellers and purchased by buyers at fixed prices. Pro-
motional activities (e.g., monetary rewards, rebate) and/or auctions are not
considered in our study.
3. Buyers do not need to pay the market owner (e.g., membership fees) in
order to purchase products in the market, and they can directly interact with
sellers that participate in the market (e.g., eBay).
In the following analysis, we adopt a leader-follower model (i.e., the mar-
ket owner is the leader, followed by the sellers and then by the buyers), which
is described in Fig. 1. Note that, without causing ambiguity, we refer to the
market owner as intermediary for brevity if applicable. Fig. 1 also shows inter-
dependencies of different decisions stages. The intermediary’s transaction fee
decision will directly affect the sellers’ participation in the market, while the
sellers’ selling decisions influence the buyers’ purchasing decisions. Based on
backward induction, we first use a model with a representative buyer, which is
a collection of all the individual buyers, to determine the sales of products sold
in the market. As a distinguishing feature, our model captures the (implicit)
competition among the sellers, which is typically neglected in existing two-sided
market research [8], and also the buyers’ preference towards a bundle of diversi-
fied products. Then, we study the selling decisions made by self-interested sellers.
It is shown that there always exists a unique equilibrium point at which no seller
can gain by changing its selling decision, which makes it possible for the inter-
mediary to maximize its revenue without uncertainties. Next, we formulate the
intermediary’s revenue maximization problem and develop an efficient algorithm
to derive the optimal transaction fee that maximizes the intermediary’s revenue.
Finally, we conduct simulations to complement our analysis and show that the
intermediary’s revenue can be significantly increased by optimally choosing the
transaction fee, even though sellers and buyers make self-interested and rational
decisions.
The rest of this paper is organized as follows. Related work is reviewed in Sec-
tion 2. Section 3 describes the model. In Section 4, we study the decisions made
by the buyers and sellers, and derive the optimal transaction fee maximizing the
intermediary’s revenue. Finally, concluding remarks are offered in Section 5.
Revenue Maximization 211

Stage 1: Optimal Transaction


Intermediary Fee
(i.e., decide transaction fee)

Stage 2: Selling Decision


Sellers (i.e., decide whether or not to sell
products in the market)

Stage 3: Product Purchasing


Decision
Buyers (i.e., decide which products to
purchase and how many to
purchase)

Fig. 1. Order of decision making

2 Related Works
We briefly summarize the existing related works in this section.
If the intermediary chooses to reward the sellers, then the transaction fee
is essentially an incentive for sellers to sell products in the market. Various
incentive mechanisms have been proposed recently. For instance, the authors
in [3] proposed eliminating or hiding low-quality content to provide content
producers with incentives to generate high-quality content. In [4], two scoring
rules, the approval-voting scoring rule and the proportional-share scoring rule,
were proposed to enable the high-quality answers for online question and answer
forums (e.g., Yahoo! Answers). The authors in [5] proposed a (virtual) reward-
based incentive mechanism to improve the overall task completion probability in
collaborative social media networks. If the intermediary charges the sellers, then
our work can be classified as market pricing. By considering a general two-sided
market, the authors in [8] studied the tradeoffs between the merchant mode and
the platform mode, and showed the conditions under which the merchant or
platform mode is preferred. Focusing on the Internet markets, [10] revealed that
a neutral network is inferior to a non-neutral one in terms of social welfare when
the ratio between advertising rates and end user price sensitivity is either too
high or too low.
In economics literature, C2C markets are naturally modeled as two-sided mar-
kets, where two user groups (i.e., sellers and buyers in this paper) interact and
provide each other with network benefits. Nevertheless, most two-sided market
research neglected the intra-group externalities (e.g., see [11][12] for a survey),
which in the contexts of C2C markets indicate the sellers’ competition. A few
recent studies on two-sided markets explicitly considered intra-group external-
ities. For instance, [13] studied the optimal pricing problem to maximize the
platform’s profit for the payment card industry with competition among the
merchants. [14] considered the sellers’ competition in a two-sided market with
differentiated products. More recently, considering intra-group competition, [15]
212 S. Ren and M. van der Schaar

studied the problem of whether an entrant platform can divert agents from the
existing platform and make a profit. Nevertheless, the focus in all these works
was market pricing, whereas in our work the intermediary can either charge or
reward the sellers. Moreover, the existing studies on two-sided markets typically
neglected product substitutability as well as buyers’ “love for variety”.
To summarize, this paper derives the optimal transaction fee, and determines
analytically when the intermediary should subsidize sellers to maximize its rev-
enue. Unlike general two-sided market research (e.g., [11][12]), this paper con-
siders both the sellers’ competition and the product substitutability, which are
key features of C2C markets and, as shown in this paper, significantly impact
the optimal transaction fee of C2C platforms.

3 Model
We first specify the basic modeling details of the intermediary, sellers and buyers,
and then discuss the model extension.

3.1 Intermediary
An important and prevailing charging model in C2C markets is that, for each sold
product, the intermediary charges a transaction fee that is proportional to the
product price (e.g., final value fee in eBay). From the perspective of sellers, sellers
pay to the intermediary when their products are sold, i.e., “pay-per-sale”. In this
paper, we concentrate on the “pay-per-sale” model. Nevertheless, it should be
noted that other fees may also be levied on product sales, e.g., eBay charges
a lump-sum listing fee for listing a product regardless of the quantities sold
[1]. Investigating more sophisticated charging models (e.g., “pay-per-sale” plus
lump-sum fee) is part of our ongoing research. As in many real C2C markets
such as eBay, buyers do not need to pay the intermediary (e.g., membership
fees) in order to purchase products in the market.
To formally state our model, we denote x̄ ≥ 0 as the sales volume (i.e., quan-
tities of sold products) in the market, and θ > 0 is the transaction fee1 that the
intermediary charges the sellers for each of their sold products. For the ease of
presentation, we assume in our basic model that all the products belong to the
same category and have the same price and hence, θ is the same for all the prod-
ucts. This assumption is valid if all the sellers sell similar and/or standardized
products (e.g., books, CDs) and, due to perfect competition, set the same price
for their products [8][19]. Recent research support the assumption of a uniform
product price by showing that price dispersion in online shopping sites is fairly
small, i.e., prices offered by different sellers for the same or similar products are
very close to each other [6]. Moreover, if the considered C2C market is an online
labor market in which sellers “sell” their services (e.g., skills, knowledge, etc.),
1
Note that θ is actually the percentage of the product price charged by the interme-
diary. However, since we later normalize the product price to 1, θ can also represent
the absolute transaction fee charged by the intermediary.
Revenue Maximization 213

the assumption of different services having the same price is reasonable when the
offered services are of the same or similar types (see, e.g., Fiverr, an emerging
C2C market where the “sellers” offer, possibly different, services and products
for a fixed price of US$ 5.00 [2]). We should also make it clear that our analysis
can be generalized and applied if different products are sold at different prices
(see Section 3.4 for a detailed discussion). Besides the transaction fees charged
for product sales, the intermediary may also receive advertising revenues by
displaying contextual advertisement on its website. In general, the advertising
revenue is approximately proportional to page views (i.e., the number of times
that the webpages are viewed), which are also approximately proportional to
sales volume in the market. Thus, overall, the advertising revenue is approxi-
mately proportional to the sales volume. Let b ≥ 0 be the (average) advertising
revenue that the intermediary can derive from each sold product. For the conve-
nience of analysis, we assume that b is constant regardless of x̄, i.e., the average
advertising revenue is independent of the sales volume. Next, we can express the
intermediary’s revenue as2
ΠI = (b + θ) · x̄. (1)
Remark 1: For the completeness of analysis, we allow θ to take negative values,
in which case the intermediary rewards the sellers for selling their products. This
may occur if the intermediary can derive a sufficiently high advertising revenue
per page view and hence would like to encourage more sellers to participate in its
market, which attracts more buyers and increases the website traffic (and hence,
higher advertising revenues, too). In the following analysis, we use the term
transaction fee (per sold product) to refer to θ wherever applicable, regardless
of its positive or negative sign.
Remark 2: While b can be increased by using sophisticated advertising algorithms
showing more relevant advertisement, we assume throughout the paper that b
is exogenously determined and fixed, and shall focus on deriving the optimal θ
that maximizes the intermediary’s revenue.
Remark 3: As in [8], we focus on only one C2C market in this paper. Although
the competition among various C2C markets is not explicitly modeled, we do
consider that online buyers can purchase products from other markets (see Sec-
tion 3.3 for details).

3.2 Sellers
As evidenced by the exploding number of sellers on eBay, a popular C2C market
can attract a huge number of sellers. To capture this fact, we use a continuum
model and assume that the mass of sellers is normalized to one. Each seller can
sell products of a certain quality while incurring a lump-sum cost, which we
refer to as selling cost, regardless of the sales volume. Note that the product
2
The expression in (1) can also be considered as the intermediary’s profit, if we treat b
as the average advertising profit for each sold product and neglect the intermediary’s
recurring fixed operational cost.
214 S. Ren and M. van der Schaar

quality can be different across sellers, although we assume in our basic model
that the selling cost is the same for all sellers. We should emphasize that the
product quality is represented by a scalar and, as a generalized concept, is jointly
determined by a variety of factors including, not not limited to, product popu-
larity, seller ratings, customer service and product reviews [7]. For instance, even
though two sellers with different customer ratings sell the same product, we say
that the product sold by the seller with a higher rating has a higher quality. The
scalar representation of product quality, i.e., abstracting and aggregating vari-
ous factors to one value, is indeed an emerging approach to representing product
quality [7]. Mathematically, we denote qi ≥ 0 and c > 0 as the product quality
sold by seller i and the selling cost, respectively. Without causing ambiguity, we
occasionally use product qi to refer to the product with a quality qi . To charac-
terize heterogeneity in the product quality, we assume that the product quality
q follows a distribution in a normalized interval [0, 1] across the unit mass of
sellers and the cumulative distribution function (CDF) is denoted by F (q) for
q ∈ [0, 1] . In other words, F (q) denotes the number or fraction of sellers whose
products have a quality less than or equal to q ≥ 0. In what follows, we shall
explicitly focus on the uniform distribution, i.e., F (q) = q for q ∈ [0, 1], when
we derive specific results, although other CDFs can also be considered and our
approach of analysis still applies.3 Note that scaling the interval [0, 1] to [0, q̄]
does not affect the analysis, but will only complicate the notations.
As stated in the previous subsection, we assume in our basic model that all
the products are sold at the same price in the market. Hence, without loss of
generality, we normalize the product price to 1. Denote the profit that each seller
can obtain by selling a product by s ∈ (0, 1), which is assumed to be same for all
the sellers, and let x(qi ) ≥ 0 be the sales volume for product qi . Heterogeneous
product profits (i.e., different s for different sellers) can be treated in the same
way as treating heterogeneous product prices (see Section 3.4 for details). In our
model, sellers are rational and each seller makes a self-interested binary decision:
sell or not sell products in the considered C2C market. If seller i chooses to sell
products in the market, it can derive a profit expressed as

πi = (s − θ) · x(qi ) − c, (2)

where θ is the transaction fee charged by the intermediary per product sale, and
c is the (lump-sum) selling cost. Seller i obtains zero profit if it chooses not to
sell products in the market. By the assumption of rationality, seller i chooses to
sell products if and only if its profit is non-negative. It is intuitively expected
that, with the same price, a product with a higher quality will have a higher
sales volume (and yield a higher profit for its seller, too) than the one with a
lower quality.4 Thus, the sellers’ selling decisions have a threshold structure. In
particular, there exist marginal sellers whose products have a quality denoted
3
The uniform distribution has been widely applied to model the diversity of various
factors, such as opportunity cost [8] and valuation of quality-of-service [9].
4
This statement can also be mathematically proved, while the proof is omitted here
for brevity.
Revenue Maximization 215

by qm ∈ [0, 1], and those sellers whose product quality is greater (less) than
qm will (not) choose to sell products in the market. We refer to qm as the
marginal product quality. Next, it is worthwhile to provide the following remarks
concerning the model of sellers.
Remark 4: In our model, a seller who sells m ≥ 1 different products is viewed as
m sellers, each of whom sells a single product, and the total selling cost is m · c
(i.e., constant returns to scale [8]).
Remark 5: The lump-sum selling cost c accounts for a variety of fixed costs for
selling products. For instance, sellers need to spend time in purchasing products
from manufactures and in listing products in the market. Moreover, as charged
by eBay, a small amount of lump-sum fee, i.e., listing fee, may also be charged
for listing a product (although we do not explicitly consider this revenue for
maximizing intermediary’s revenue) [1]. As in [8], we assume that the sellers will
incur a predetermined selling cost if they choose to sell products in the market.
For the ease of presentation, we consider a homogeneous selling cost among the
sellers, while we shall discuss the extension to heterogeneous selling costs in
Section 3.4.
Remark 6: In our model, sellers always have products available if buyers would
like to purchase. That is, “out of stock” does not occur.

3.3 Buyers
We adopt the widely-used representative agent model to determine how the total
budget (i.e., buyers’ expenditure in online shopping) is allocated across a vari-
ety of products [18]. Specifically, the representative buyer optimally allocates
its total budget, denoted by T , across the available products to maximize its
utility. Note that T can be interpreted as the size of the representative buyer or
the online shopping market size. In addition to purchasing products sold in the
considered C2C market, buyers may also have access to products sold in other
online markets (e.g., business-to-customer shopping sites and/or other C2C mar-
kets), and we refer to these products as outside products. Similarly, we refer to
those online markets where outside products are sold as outside markets. Focus-
ing on the intermediary’s optimal transaction fee decision, we do not consider
the details of how or by whom outside products are sold. Instead, we assume
that the mass of outside products is na ≥ 0 and the outside product quality
follows a certain CDF F̃ (q) with support q ∈ [ql , qh ], where 0 ≤ ql < qh are
the lowest and highest product quality of outside products, respectively. For the
convenience of notation, throughout the paper, we alternatively represent the
outside products using a unit mass of products with an aggregate quality of qa ,
without affecting the analysis. Note that qa is a function of na ≥ 0, F̃ (q) and
the utility function of the representative buyer. In particular, given a uniform
distribution of outside product quality and the quality-adjusted Dixit-Stiglitz
216 S. Ren and M. van der Schaar

utility for the representative buyer (which we shall define later), we can readily
obtain
    σ1
na qhσ+1 − qlσ+1
qa = , (3)
1+σ

where σ > 1 measures the product substitutability [17]. Recalling that qm ∈


[0, 1] is the marginal product quality above which the sellers choose to sell
products in the market, we write the representative buyer’s utility function as
U (x(q), xa | qm , qa ), where x(q) denotes the sales volume for product q ∈ [qm , 1]
and xa is the sales volume for outside products with an aggregate quality of
qa . Note that although there are outside products available in outside markets,
we focus on only one C2C market and implicitly assume that the sellers under
consideration, if they choose to sell products, can only participate in the consid-
ered C2C market [10]. In our future work, we shall explicitly consider that the
sellers may sell products in multiple markets. Thus, xa is essentially interpreted
as “outside activity” of the buyers, i.e., how many products buyers purchase
in outside markets. Note that x(q) can be rewritten as x(q | qm , qa ), although
we use the succinct notation x(q) throughout the paper whenever applicable. If
qm increases (decreases), there will be fewer types of products available in the
considered C2C market. Because of the continuum model, we allow x(q) and xa
to take non-integer values, and x(q) actually represents the sales volume den-
sity for a continuum of products with quality q ∈ [qm , 1], i.e., x(q) is the sales
volume that an individual seller with a product quality of q obtains. Next, by
using a quality-adjusted version of the well-known Dixit-Stiglitz function [17][18]
as the utility function which captures product heterogeneity as well as the buy-
ers’ “love for variety”, we formulate the utility maximization problem for the
representative buyer as follows
 1  σ−1
σ
σ−1 σ−1
U (x(q), xa | qm , qa ) = q · x(q) σ dF (q) + qa · xaσ
,
qm
 (4)
1
s.t., x(q)dF (q) + xa ≤ T,
qm

where σ > 1 measures the elasticity of substitution between different products.


In the extreme case, the products are perfectly substitutable when σ = ∞, i.e.,
purchasing product A and product B makes no difference except for the quality
difference [17]. The inequality in (4) specifies the budget constraint, i.e., the total
expenditure in purchasing products cannot exceed T . As we stated in Section
3.2, the product price is normalized to 1 and hence, the price does not appear in
the inequality constraint in (4). Note that to limit the number of parameters, we
assume that the price of outside products is also normalized to 1. We can also
choose other values of outside product price, and it does not affect our analysis
except for that the aggregate outside product quality may be changed. It is also
worth mentioning that an implicit assumption underlying the problem (4) is that
Revenue Maximization 217

the aggregate quality of outside products is independent of the intermediary’s


transaction fee decision and other variables in the model such as qm and x(q).
This can be justified by noting that there are many outside markets besides the
considered C2C market and changes in one market have a negligible impact on
the others. Before performing further analysis, we list the following properties
satisfied by the utility function U (x(q), xa | qm , qa ) in (4).
Property 1 (Diminishing marginal utility): U (x(q), xa | qm , qa ) is increasing and
strictly (jointly) concave in x(q) and xa , for q ∈ [0, 1].
Property 2 (Preference towards diversified products): maxx(q)≥0,xa ≥0 U (x(q),
xa | qm , qa ) is decreasing in qm ∈ [0, 1].
Property 3 (Negative externalities): Denote by x∗ (q | qm , qa ), for q ∈ [0, 1], the
optimal solution to (4). x∗ (q | qm , qa ) is continuous and strictly increasing in
qm ∈ [0, 1], increasing in q ∈ [0, 1], and decreasing in qa for qa ∈ [0, ∞). In
particular, x∗ (0 | qm , qa ) = 0 for all qm ∈ [0, 1] and qa ≥ 0.
We briefly discuss the above properties. Property 1 captures the effects of dimin-
ishing marginal utility when the representative buyer purchases more products
[17]. Property 2 models the phenomenon that buyers will benefit from the par-
ticipation of sellers in the market. This is particularly true for online markets,
where the buyers prefer to be given available options for a diversified bundle
of products. Thus, when qm ∈ [0, 1] increases, i.e., fewer sellers sell products in
the market, the representative buyer’s (maximum) utility decreases. Property 3
reflects the “crowding effects”, i.e., lower qm or more (types of) products avail-
able increases competition among the sellers. Specifically, an individual seller
will obtain a lower sales volume if more sellers choose to sell products in the
market or the aggregate outside product quality is higher [19].
Remark 7: Although we focus on the utility function defined in (4) for the ease
of presentation, our analysis of product purchasing and product selling decisions
applies to any other utility functions that satisfy Properties 1–3.

3.4 Model Extension


To keep the model succinct and highlight our hierarchical framework that cap-
tures the customer rationality, we only present the basic model in this paper.
In this subsection, we briefly discuss how our basic model is extended to better
capture a real market. In particular, we emphasize heterogeneous selling costs
and heterogeneous product prices.

Heterogeneous Selling Costs. The assumption that all the sellers have the
same (homogeneous) selling cost can be relaxed to consider that different sellers
have heterogeneous selling costs. Specifically, as in [20], we assume that there
are K ≥ 1 possible values of selling costs, denoted by c1 , c2 , . . . , cK , where 0 <
c1 ≤ c2 · · · ≤ cK , and refer to sellers with the selling cost of ck as type-k sellers.
Under the continuum model, the (normalized) mass of type-k sellers is nk > 0
K
such that k=1 nk = 1. To model the product quality heterogeneity, we consider
218 S. Ren and M. van der Schaar

that the product quality of type-k sellers follows a continuous and positive CDF
denoted by Fk (q) > 0 for q ∈ [0, 1]. Thus, the fraction of type-k sellers whose
product quality is less than or equal to q ∈ [0, 1] is given by nk Fk (q). Following
a framework of analysis similar to the one illustrated in Fig. 1, we can show
that there exists a unique equilibrium outcome in the selling decision stage, and
develop a recursive algorithm to derive the optimal transactions fee to maximize
the intermediary’s revenue.

Heterogeneous Product Prices. To explain how the assumption of a uni-


form price for all the products can be relaxed, we consider a scenario that
the product price is expressed as a function p(q) in terms of the quality.5 To
limit the number of free parameters, we still assume that the price for out-
side products is normalized to 1. Hence, the budget constraint in (4) becomes
1
qm
x(q) · p(q)dF (q) + xa ≤ T , while the objective function in (4) remains un-
changed. Then, buyers will purchase more products that have higher values of
“quality/price” (i.e., q/p(q)) instead of higher values of q. Moreover, according
to the distribution of product quality, we can easily derive the distribution of
q/p(q). As a result, we can view q/p(q) as if it were the product quality “q”
in our basic model. Next, because of the price heterogeneity, a seller’ profit
may not always increase with the sales volume. To tackle this problem, we can
normalize the sellers’ profits with respect to their own net profits per product
without affecting the binary selling decisions. For instance, if the profits of seller
A and seller B are (sA − pA · θ) · xA − c and (sB − pB · θ) · xB − c, then the
corresponding normalized profits are xA − c/(sA − pA ) and xB − c/(sB − pB ),
respectively, where pA , sA and xA are seller A’s product price, product profit,
and sales volume, respectively, and similar definitions for seller B. Note that θ is
the percentage that the intermediary charges as the transaction fee based on the
product price, while in our basic model the normalized product price is 1 and
hence the product price term does not appear in (1) or (2). It can be seen that
the normalized profits of sellers are obtained by dividing (2) by s − θ, except
for the heterogeneous selling costs. Thus, the analysis of selling decisions can be
performed following the “heterogeneous selling costs” model that we discussed
above. To sum up, if we view q/p(q) as if it were the product quality “q” in our
basic model, then the analysis in this paper still applies, although there may not
exist a closed-form expression for the optimal transaction fee θ∗ to maximize the
intermediary’s revenue (since the intermediary’s profit expression changes) and
we may need to resort to numerical methods to find it.

4 Revenue Maximization in C2C Markets


In this section, based on the model described above, we study the problem of
optimizing the transaction fee in the presence of self-interested sellers and buyers.
We proceed with our analysis using backward induction.
5
We can also consider that products of the same quality may have different prices,
but this significantly complicates the notations and explanation.
Revenue Maximization 219

4.1 Optimal Product Purchasing

By considering the quality-adjusted Dixit-Stiglitz utility defined in (4) and uni-


form distribution of the product quality, we can obtain explicitly the closed-form
solution as follows

T (σ + 1)q σ
x∗ (q) =  
σ+1 , (5)
(σ + 1) · qaσ + 1 − qm
σ
T (σ+1)qa
for q ∈ [qm , 1], x∗ (q) = 0 for q ∈ [0, qm ), and x∗a = σ + 1−q σ+1
. The details
(σ+1)·qa ( m )
∗ ∗
of deriving (5) are omitted for brevity. After plugging x (q) and xa into (4), the
maximum utility derived by the representative buyer is given by
 σ+1
 σ−1
1

∗ ∗ 1 − qm
U (x (q), x∗a ) =T qaσ + , (6)
σ+1

which is decreasing in qm ∈ [0, 1]. Note that the other concave utility functions
can also be considered, although an explicit closed-form solution may not exist.

4.2 Equilibrium Selling Decision

Based on the representative buyer’s product purchasing decision, we now ana-


lyze the self-interested selling decisions made by sellers (i.e., Stage 2 in Fig. 1).
Due to rationality, sellers will not choose to sell products if they cannot obtain
non-negative profits. Essentially, interaction among the sellers can be formalized
as a non-cooperative game with an infinite number of players, the solution to
which is (Nash) equilibrium. The intermediary’s revenue will become stabilized
if the product selling stage reaches an equilibrium. Thus, the existence of an
equilibrium point is important and relevant for the intermediary to maximize
its long-term revenue. At an equilibrium, if any, no sellers can gain more profits
by deviating from their decisions. In other words, the fraction of sellers choos-
ing to sell products on the intermediary’s C2C market does not change at the
equilibrium, or equivalently, the marginal product quality qm ∈ [0, 1] becomes
invariant. Next, we study the equilibrium selling decision by specifying the equi-

librium marginal product quality denoted by qm .

If qm = 1, then no (or a zero mass of) sellers can receive a non-negative profit

by selling products in the market. This implies that, with qm = 1, we have

x (1|1, qa ) · (θ + s) − c ≤ 0. If there are some sellers choosing to sell products at

the equilibrium (i.e., qm ∈ [0, 1)), then according to the definition of marginal
product quality, we have x∗ (qm ∗ ∗
|qm , qa ) · (θ + s) − c = 0. Hence, we can show that
⎡    σ1 ⎤1
σ ∗ σ+1
c · (σ + 1) · (qa ) + 1 − (qm )

qm ∗
 Q(qm )=⎣ ⎦ , (7)
T (σ + 1)(s − θ)
0
220 S. Ren and M. van der Schaar

where [ ν ]10 = max{1, min{0, ν}}. Thus, an equilibrium selling decision exists

if and only if the mapping Q(qm ), defined in (7), has a fixed point. Next, we

formally define the equilibrium marginal product quality in terms of qm as below.
∗ ∗
Definition 1: qm is an equilibrium marginal product quality if it satisfies qm =

Q(qm ).
We establish the existence and uniqueness of an equilibrium marginal product
quality in Theorem 1, whose proof is omitted for brevity. For the proof technique,
interested readers may refer to [20] where we consider a user-generated content
platform.

Theorem 1. For any θ ∈ [−s, b], there exists a unique equilibrium qm ∈ (0, 1]

in the selling decision stage. Moreover, qm satisfies

qm = 1, if x∗ (1 | 1, qa ) · (s − θ) ≤ c,
∗ (8)
qm ∈ (0, 1), otherwise,

where x∗ (1 | 1, qa ) is obtained by solving (4) with qm → 1.6 


Theorem 1 guarantees the existence of a unique equilibrium point and shows
that if the seller with the highest product quality cannot obtain a profit (due
to high selling cost, high transaction fee, etc.), then no sellers choose to sell
products in the market at equilibrium. For notational convenience, we denote
the value of θ that satisfies x∗ (1 | 1, qa ) · (s − θ) = c by
σ
c c · (qa )
θ̄  s − ∗
=s− . (9)
x (1 | 1, qa ) T
Then, it follows from Theorem 1 that the intermediary can gain a positive rev-
enue if and only if θ ∈ (−b, θ̄). Nevertheless, if θ̄ ≤ −b, then the intermediary’s
revenue is always zero. Hence, we assume θ̄ > −b throughout the paper. Based
∗ ∗ ∗
on the uniqueness of qm for any θ ∈ [−b, s], we can express qm = qm (θ) as a func-

tion of θ ∈ [−b, s]. While there exists no simple closed-form expression of qm (θ),

it can be easily shown that qm (θ) ∈ (0, 1) is strictly increasing in θ ∈ [−b, θ̄)
(i.e., fewer sellers choose to sell products in the market when the transaction fee

θ increases) and qm (θ) = 1 for θ ∈ [θ̄, s].

4.3 Optimal Transaction Fee


Based on decisions made by the buyers and sellers, we study the optimal trans-
action fee θ that maximizes the intermediary’s steady-state revenue (i.e., revenue
obtained when the product selling decision stage reaches equilibrium). Mathe-
matically, we formalize the problem as

θ∗ = arg max (b + θ) · x̄, (10)


θ∈[−b,θ̄]

6
When qm → 1, only a negligible fraction of sellers choose to sell products in the
market.
Revenue Maximization 221

1
where x̄ = q∗ x∗ (q | qm ∗
, qa )dF (q). The decision interval is shrunk to [−b, θ̄],
m
since θ ∈ (θ̄, s] always results in a zero revenue for the intermediary, where θ̄ is
defined in (9). In the following analysis, a closed-form optimal transaction fee
σ
θ∗ ∈ [−b, s − c·(qTa ) ] is obtained and shown in Theorem 2.
Theorem 2. The unique optimal transaction fee θ∗ ∈ [−b, θ̄] that maximizes
the intermediary’s revenue is given by
 σ 
∗ c · (σ + 1) · (qa ) + 1 − z σ+1
θ =s− , (11)
T (σ + 1) · z σ

where z ∈ [qm (−b), 1] is the unique root of the equation7
σ
T · (qa ) · (b + s) c σ + z σ+1
− σ 2 + 3
· = 0. (12)
[(σ + 1) · (qa ) + 1 − z σ+1 ] (σ + 1) z 2σ+1

Proof. Due to space limitations, we only provide the proof outline. Instead of
directly solving (10), we first find the optimal (equilibrium) marginal product
quality, which is the root of (12). Then, based on the marginal user principle, we
can obtain the optimal transaction fee θ∗ maximizing the intermediary’s revenue.
The detailed proof technique is similar to that in [20]. 
Next, we note that, to maximize its revenue, the intermediary may even reward
the sellers for selling products in the market, i.e., θ∗ < 0. In particular, “reward-
ing” should be applied if one of the following cases is satisfied:

1. Total budget T (i.e., market size) is sufficiently small;


2. Selling cost c is sufficiently large;
3. Profit of each sold product s is sufficiently small;
4. Aggregate outside product quality qa is sufficiently large;
5. Advertising revenue for each sold product b is sufficiently large.

In the first four cases, few sellers can receive a non-negative profit by sell-
ing products without being economically rewarded by the intermediary (e.g.,
if the selling cost c is very high, then sellers need to receive subsidy from the
intermediary to cover part of their selling costs). The last case indicates that if
the intermediary can derive a sufficiently high advertising revenue for each sold
product, then it can share the advertising revenue with the sellers to encourage
them to sell products in the market such that the intermediary can increase its
total advertising revenue. In Fig. 2, we illustrate the impacts of transaction fees
on the intermediary’s revenue. Note that the numeric settings for Fig. 2 are only
for the purpose of illustration and our analysis applies to any other settings.
For instance, with all the other parameters being the same, a larger value of T
indicates that the buyers spend more money in online shopping (i.e., the online
shopping market size is bigger). In practice, the intermediary needs to obtain
7 ∗
qm (−b) is the equilibrium point in the product selling stage when θ = −b.
222 S. Ren and M. van der Schaar

0.3
"Charging" is optimal c=2.0
c=1.5
0.2

Revenue
c=1.0

0.1
"Rewarding" is optimal

0
−0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5
θ

0.3
"Charging" is optimal T=20
T=30
0.2
Revenue

T=40

0.1
"Rewarding" is optimal

0
−0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5
θ

Fig. 2. Revenue versus transaction fee. σ = 2.0, b = 0.2, s = 0.5, qa = 3.0. Upper:
T = 40; Lower: c = 1.0.

real market settings by conducting market surveys, data analysis, etc. [8]. The
upper plot Fig. 2 verifies that the intermediary should reward the sellers if the
selling cost is high, while the lower plot indicates the intermediary should share
its advertising revenue with sellers in an emerging online shopping market (i.e.,
the market size is small). We also observe from Fig. 2 that by optimally choos-
ing the transaction fee θ∗ , the intermediary can significantly increase its revenue
compared to setting a non-optimal transaction fee (e.g., θ = 0). For instance,
the upper plot in Fig. 2 shows that with an optimal transaction fee and c = 1.0,
the intermediary’s revenue increases by nearly 30% compared to θ = 0 (i.e., the
intermediary only relies on advertising revenues). Due to the space limitation,
we omit more numerical results and the analytical condition specifying when the
intermediary should reward sellers (i.e., θ∗ < 0) to maximize its revenue.

5 Conclusion
In this paper, we studied a C2C market and proposed an algorithm to iden-
tify the optimal transaction fee to maximize the intermediary’s revenue while
taking into account the customer rationality. We first used the representative
buyer model to determine how the buyers’ total budget is allocated across a
variety of products. Then, we showed that there always exists a unique equilib-
rium point at which no seller can gain by changing its selling decision. Next, we
formalized the intermediary’s revenue maximization problem and, by using the
quality-adjusted Dixit-Stiglitz utility function function and the uniform distri-
bution of product qualities, derived the closed-form optimal solution explicitly.
We discussed qualitatively the impacts of the aggregate outside product quality
and product substitutability on the intermediary’s revenue. Extension to hetero-
geneous selling costs and product prices were also addressed. Our results showed
Revenue Maximization 223

that a significant increase in the intermediary’s revenue can be achieved using


our proposed algorithm. Future research directions include, but are not limited
to: (1) competition among different markets; (2) intermediary’s investment de-
cisions; and (3) optimal transaction fee maximizing social welfare.

References
1. eBay Seller Fees, http://pages.ebay.com/help/sell/fees.html
2. Fiverr, http://www.fiverr.com
3. Gosh, A., McAfee, P.: Incentivizing high-quality user-generated content. In: 20th
Intl. Conf. World Wide Web (2011)
4. Jain, S., Chen, Y., Parkes, D.C.: Designing incentives for online question and an-
swer forums. In: ACM Conf. Electronic Commerce (2009)
5. Singh, V.K., Jain, R., Kankanhalli, M.S.: Motivating contributors in social media
networks. In: ACM SIGMM Workshop on Social Media (2009)
6. Ghose, A., Yao, Y.: Using transaction prices to re-examine price dispersion in
electronic markets. Info. Sys. Research 22(2), 1526–5536 (2011)
7. McGlohon, M., Glance, N., Reiter, Z.: Star quality: aggregating reviews to rank
products and merchants. In: Intl. Conf. Weblogs Social Media, ICWSM (2010)
8. Hagiu, A.: Merchant or two-sided platform? Review of Network Economics 6(2),
115–133 (2007)
9. Jin, Y., Sen, S., Guerin, R., Hosanagar, K., Zhang, Z.-L.: Dynamics of competition
between incumbent and emerging network technologies. NetEcon (August 2008)
10. Musacchio, J., Kim, D.: Network platform competition in a two-sided market:
Implications to the net neutrality issue. In: TPRC: Conf. Commun., Inform., and
Internet Policy (September 2009)
11. Rochet, J.C., Tirole, J.: Platform competition in two-sided markets. Journal of the
European Economic Association 1, 990–1029 (2003)
12. Rochet, J.C., Tirole, J.: Two-sided markets: A progress report. RAND Journal of
Economics 37, 645–667 (2006)
13. Rochet, J.C., Tirole, J.: Cooperation among competitors: Some economics of pay-
ment card associations. Rand Journal of Economics 33, 549–570 (2002)
14. Nocke, V., Peitz, M., Stahl, K.: Platform ownership. Journal of the European
Economic Association 5, 1130–1160 (2007)
15. Belleflamme, P., Toulemonde, E.: Negative intra-group externalities in two-sided
markets. CESifo Working Paper Series
16. Evans, G.W., Honkapohja, S.: Learning and Expectations in Macroeconomics.
Princeton Univ. Press, Princeton (2001)
17. Dixit, A.K., Stiglitz, J.E.: Monopolistic competition and optimum product diver-
sity. American Economic Review 67(3), 297–308 (1977)
18. Hallak, J.C.: The effect of cross-country differences in product quality on the di-
rection of international trade 2002, Working Paper, Univ. Michigan, Ann Arbor,
MI (2002)
19. Rochet, J.C., Tirole, J.: Two-sided markets: A progress report. RAND J. Eco-
nomics 37(3), 645–667 (2006)
20. Ren, S., Park, J., van der Schaar, M.: Maximizing profit on user-generated content
platforms with heterogeneous participants. In: IEEE Infocom (2012)
21. Munkres, J.R.: Elements of Algebraic Topology. Perseus Books Pub., New York
(1993)
A Stackelberg Game to Optimize the Distribution
of Controls in Transportation Networks

Ralf Borndörfer1, Bertrand Omont2 , Guillaume Sagnol1 , and Elmar Swarat1


1
ZIB (Zuse Institut Berlin), Berlin, Germany
{borndoerfer,sagnol,swarat}@zib.de
2
Ecole Centrale Paris, Chatenay-Malabry, France
[email protected]

Abstract. We propose a game theoretic model for the spatial distribu-


tion of inspectors on a transportation network. The problem is to spread
out the controls so as to enforce the payment of a transit toll. We formu-
late a linear program to find the control distribution which maximizes
the expected toll revenue, and a mixed integer program for the prob-
lem of minimizing the number of evaders. Furthermore, we show that
the problem of finding an optimal mixed strategy for a coalition of N
inspectors can be solved efficiently by a column generation procedure.
Finally, we give experimental results from an application to the truck
toll on German motorways.

Keywords: Stackelberg game, Polymatrix game, Controls in transporta-


tion networks.

1 Introduction
In this article, we study from a theoretical point of view the problem of allocat-
ing inspectors to spatial locations of a transportation network, in order to enforce
the payment of a transit fee. The question of setting an optimal level of control in
transportation networks has been addressed by several authors, but to the best
of our knowledge, none of them takes the topology of the network and the spatial
distribution of the inspectors into account. Simple game theoretic models have
been proposed to model the effect of the control intensity on the behaviour of the
users of the network [4], to find an optimal trade-off between the control costs
and the revenue from the network fee [1], or to evaluate the effect of giving some
information (about the controls) to the users [6]. More recently, an approach to
optimize the schedules of inspectors in public transportation networks was pro-
posed by DSB S-tog in Denmark [7]. In contrast to our problem, the authors of
the latter article focus on temporal scheduling and assume an evasion rate which
does not depend on the control intensity. The present paper is motivated by an
application to the enforcement of a truck toll in Germany, which we present next.

Truck toll on German motorways. In 2005 Germany introduced a distance-


based toll for commercial trucks weighing twelve tonnes or more in order to

V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 224–235, 2012.

c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
A Stackelberg Game 225

fund growing investments for maintenance and extensions of motorways. The


enforcement of the toll is the responsibility of the German Federal Office for
Goods Transport (BAG), who has the task to carry out a network-wide control,
with an intensity which is proportional to spatial and time dependent traffic
distributions. It is implemented by a combination of 300 automatic stationary
gantry bridges and by tours of 300 control vehicles on the entire highway network.
In this paper, we present some theoretical work obtained in the framework of
our cooperation with the BAG, whose final goal is to develop an optimization
tool to schedule the control tours of the inspectors. This real-world problem
is subject to a variety of legal constraints, which we handle by mixed integer
programming [2]. We propose a game theoretic approach to optimize the spatial
distribution of the controls with respect to two different objectives: (i) maximize
the (expected) monetary profit of the government; (ii) minimize the number
of evaders. The goal of this study is twofold. On the one hand, we want to
evaluate the reasonableness of current BAG’s methodology (control intensities
proportional to traffic volumes). On the other hand, we plan to use in a follow-
up work the distributions computed in this article as a target of the real-world
problem.

Specificity of the applied problem and assumptions made in this article. The
model presented in this article is not limited to the case of motorway networks.
It applies to any situation where the individuals in transit can be controlled
on each section of their route through a network. A strong assumption of our
model however is that we know the set of routed demands of the network, i.e. the
number of individuals taking each possible route. In our model, the drivers do
not have the choice of their route between their source and destination. We plan
to search in this direction for future work. In particular, it might be relevant to
consider that the drivers can take some sections of a trunk road to avoid the toll
motorway.
We do not pretend that our model is representative of all the complexity of
the drivers’ reaction to inspectors’ behavior, in particular because certain facts
are particularly hard to model. For example, the perception of the penalty is not
the same for all truck drivers. If an evader is caught with a second offense, he
may get a higher fine in a trial.
In this article, we assume that the users of the network act on a selfish be-
haviour, and decide to pay or to evade so as to minimize the expected cost of
their trip. This is obviously wrong, since there is certainly a large fraction of
honest people who always pay the toll. However, we claim that our simplified
model still leads to significant spatial distributions of the controls, because: (i)
the number of evaders that we compute in this model corresponds to the num-
ber of network users for which it is more interesting to evade the toll; (ii) hence,
the toll revenue in this model is a lower bound for the true revenue; (iii) if the
fraction of honest drivers is the same on every route, we could solve the problem
by considering the remaining fraction of crafty drivers only, which would lead to
the same results.
226 R. Borndörfer et al.

Organization and contribution. We present our model in Section 2. We show


that the optimal distribution of controls (with respect to the aforementioned
goals) is the optimal strategy of the inspectors in a Stackelberg game, and can
be found by mean of mathematical programming formulations. Then we exhibit
in Section 2.3 a relation between the optimal solution of our model and the Nash
equilibriums of a particular polymatrix game. Finally, experimental results from
the application to the truck toll in Germany are presented in Section 3.

2 A Network Spot-Checking Game

We make use of the standard notation [n] := {1, . . . , n} and we denote vectors by
boldface lower case letters. We model the transportation network by a directed
graph G = (V, E). We assume that the users of the network are distributed over
a set of routes R = {r1 , . . . , rm }, where each ri ⊂ E. In addition, we are given
the demand xi of each route, that is, the number of users that take the route
ri per unity of time (typically one hour; we assume a constant demand, i.e., we
do nottake the diurnal variations of the traffic into account). We denote by
ye := {i∈[m]: ri e} xi the number of users passing through edge e per unity of
time.
Every user of the route ri has to pay a toll fee Ti , but he may also decide
to evade the toll, with the risk to pay a penalty Pi if he gets controlled. We
assume that the inspectors have a total capacity of control κ. This means that
κ individuals can be controlled per unity of time. We consider two manners of
spreading out the controls over the network in the next subsections. In the first
one, we simply assume that the control force can be arbitrarily distributed over
the network. The second one is a more theoretical approach, where we consider
all possible allocations of a finite number of inspectors over the sections e ∈ E,
and we search for the best mixed strategy combining these allocations.

2.1 Arbitrarily Splittable Controls

We denote by q ∈ ΔE the distribution of the controls, where ΔE is the set of


all probability distributions over E:

ΔE := {q ∈ [0, 1]|E| : qe = 1}.
e∈E

Each coordinate of q represents the proportion of the total capacity of control κ


that is allocated to the corresponding edge, i.e., κqe individuals can be controlled
on the section e per unity of time.

Strategy of the users. We denote by πi the probability for a user of the route
ri to be controlled during its trip. We assume a stationary regime in which the
users have learned the values of the probability πi . Hence, a user of the route ri
will pay if πi is above the threshold PTii , and evade if it is below. In other words,
A Stackelberg Game 227

the proportion pi of payers on the route ri minimizes the average cost per user
of this route:
 
λi := min(Ti , Pi πi ) = min pi Ti + (1 − pi )Pi πi .
pi ∈[0,1]

A user passing on the section e has a probability ( κq


ye ∧ 1) to be controlled on
e

this section, where we have used the notation a ∧ b := min(a, b). Hence, the
probability πi of being controlled during a trip on route ri can be expressed as
a function of the control distribution q:
 κqe 
πi = 1 − 1−( ∧ 1) .
e∈r
ye
i

In this section, we will use the classical approximation


 κqe
πi  πi := ( ∧ 1), (1)
e∈r
ye
i

which is valid when the right hand side of Equation (1) is small. In the experi-
ments presented in Section 3, we obtain values of πi that never exceed 0.2. Note
that this approximation is equivalent to assuming that a user pays twice the fine
if he is caught twice.

Strategy of the inspectors. We think of the set of inspectors as a single player


who splits the total control force κ according to a distribution q ∈ ΔE , called
the mixed strategy of the controller. Similarly, the users of the route ri ∈ R
are considered as a single player (called the ith user), who pays the toll with
a probability pi and tries to evade with the complementary probability 1 − pi .
We say that the ith user plays with mixed strategy pi = [pi , 1 − pi ]T ∈ Δ2 . Our
assumption that the users have the ability to learn the control distribution q can
be described in the framework of a Stackelberg game, in which the controller is the
leader, who makes the first move, while the users are followers who react to the
leader’s action. The controller knows that the users will adjust their strategies
pi depending on the control distribution q, and plays accordingly. We can now
formulate the problem of optimally distributing the controls over the networks,
with respect to two distinct objectives.

Maximizing the profit. If the controller wants to maximize the total revenue
generated by the toll, which is, by construction, equal to the total loss of the
users, the problem to solve is:
 
max xi λi = max xi min(Ti , Pi πi ), (2)
q∈ΔE q∈ΔE
i∈[m] i

where πi depends on q through Equation (1). If the costs of the controls must
be taken into account, and the cost for a control on section e is ce , then we can
solve:  
max xi min(Ti , Pi πi ) − qe κce , (3)
q∈Δ−
E i∈[m] e∈E
228 R. Borndörfer et al.


where Δ− E := {q ∈ [0, 1]
|E|
: e∈E qe ≤ 1} (we do not necessarily use all the
control capacity). It is not difficult to see that there must be an optimum such
that ∀e ∈ E, κq ye ≤ 1, because the controller never has interest to place more
e

capacity of control on a section than the number of users that pass through
 κq
it. If we impose this constraint, the expression of πi simplifies to e
ye , and
e∈ri
Problem (3) becomes a linear program:
 
max− xi λi − qe κce (4)
q∈ΔE
i e∈E
λ∈Rm
 κqe
s. t. ∀i ∈ [m], λi ≤ Pi
e∈r
ye
i

∀i ∈ [m], λi ≤ Ti
∀e ∈ E, κqe ≤ ye .

Minimizing the number of evaders. If the goal of the controller is to minimize


the number of evaders, the problem to solve is:

min xi .
q∈ΔE
{i∈[m]: Pi πi <Ti }

Note that we have chosen to consider here that the ith user is paying when the
threshold πi = PTii is reached but not exceeded. We can formulate this problem
as a mixed integer program (MIP), by introducing a binary variable δi which is
forced to take the value 1 when πi < PTii :

min xi δi (5)
q∈ΔE
δ∈{0;1}m i

Ti  κqe
s. t. ∀i ∈ [m], ≤ + δi
Pi e∈r
ye
i

∀e ∈ E, κqe ≤ ye .

2.2 Coalition of a Finite Number of Controllers


In this section, we consider a more realistic setting, in which N inspectors, each
κ
having a capacity of control N , play a cooperative game in order to maximize
the revenue generated from the toll. A strategyof the coalition of controllers
consists of a vector n ∈ SN := {n ∈ N|E| : e∈E ne = N } that indicates
how many inspectors are allocated to each edge of the network. We assume that
the inspectors play with a mixed strategy q ∈ ΔSN , i.e. they choose the spatial
distribution n ∈ SN with the probability qn . With this setting, the probability
for a user of the route ri to be controlled during its trip becomes
  ne κ 
π̄i = qn 1− 1−( ∧ 1) .
e∈ri
N ye
n∈SN

αn,i
A Stackelberg Game 229

As in Section 2.1, the problem to maximize the revenue generated from the toll
can be formulated as an LP (we do not consider control costs for the sake of
simplicity). Note that this time, we do not need to take a linear approximation
of π̄i , because αn,i is a fixed parameter:

max xi λi
q∈ΔSN
i
λ∈Rm
s. t. ∀i ∈ [m], λi ≤ Ti (6a)

∀i ∈ [m], λi ≤ qn Pi αn,i . (6b)
n∈Sn

Although the number of strategies of the inspectors’ coalition is exponential with


respect to N , we will see that this problem can be solved efficiently by column
generation. Let v n denote the vector of Rm with coordinates vn,i = Pi αn,i . From
a geometrical point of view, the constraint (6b) restricts λ to the polyhedron P
which is defined as the convex hull of the vertices (v n )n∈SN , plus the cone of
nonpositive vectors:

P = {v + z : v ∈ convex-hull({v n : n ∈ SN }), z ∈ Rm
− }. (7)

The next proposition shows that if the capacity of control κ is smaller than the
traffic on every edge, then P has no more than |E| extreme points, so that we
can impose qn = 0 for almost all n ∈ SN .
Proposition 1. Assume that ∀e ∈ E, κ ≤ ye , and denote by ñ(e) the allocation
where all the inspectors are concentrated on edge e. Then, every extreme point
of P is of the form v ñ(e) for an e ∈ E. Hence, Problem (6) has a solution in
which qn = 0 for all n ∈ SN \ {ñ(e) : e ∈ E}.

Proof. It is clear that the extreme points of convex-hull(SN ) are the vectors of
the form ñ(e) := [0, . . . , 0, N, 0, . . . , 0]T , with the nonzero in position e. The
application n → un , which maps SN onto Rm , and where
 ne κ
un,i := Pi
e∈r
N ye
i

is linear, and hence the extreme points of the polyhedron with vertices (un )n∈SN
are among the images of the extreme points of convex-hull(SN ), that is, the
vectors uñ(e) (e ∈ E). Let n ∈ SN . Since κ ≤ ye for all e, the expression of vn,i
can be simplified to:

 ne κ
vn,i = Pi 1− (1 − ) ≤ un,i ,
e∈ri
N ye

where the inequality follows from the log-concavity of x → i (1 − xi ). Moreover
the equality is attained for the vectors of the type v ñ(e) , because the product
consists of only one factor (or even 0 factor if e ∈/ ri ), i.e., ∀e ∈ E we have
230 R. Borndörfer et al.

v ñ(e) = uñ(e) . This shows that un ∈ P, because it can be written as a convex


combination of the vectors (v ñ(e) )e∈E . Finally, if n is not of the type ñ(e), i.e.,
max ne < N , then we know that un is not an extreme point of P, and hence
e∈E
the vector v n , which can be written as un + z for a vector z ∈ Rm
− is not an
extreme point of P.
If κ > ye for some e ∈ E, then some other extreme points will appear. How-
ever, we expect the solution to be sparse and we can solve Problem (6) by
column generation. In addition to the columns corresponding to the variable λ,
we start with the columns that correspond to the fully concentrated allocations
(qñ(e) )e∈E . After each iteration, the subproblem that we must solve to add a
new column is the maximization of the reduced cost μT vn − μ0 , where μ ≥ 0
is the current dual variable
 associated with the constraints (6b), and μ0 is the
dual of the constraint n qn ≤ 1:
⎧ ⎫
⎨  ne κ  ⎬
max μi Pi 1 − 1−( ∧ 1) − μ0 : n ∈ SN (8)
n ⎩ N ye ⎭
i∈[m] e∈ri

We use a greedy heuristic to find an approximate solution of Problem (8): we


start from the configuration n(0) = 0 without any inspector, and for k =
1, . . . , N we add an inspector on the section which causes the largest possible
increase of the reduced cost:

(k−1)
(k) ne + 1 if e = ek
∀e ∈ E, ne = (k−1)
ne otherwise,

  (k−1)
(ne + δe,e )κ 
where ek ∈ argmax μi Pi 1− 1− ∧1 .
e ∈E e∈r
N ye
i∈[m] i

In the above equation, δ stands for the Kronecker delta function. We use the
vector n(N ) generated by this greedy procedure as an approximation for the
solution of (8), and we add the column vn(N ) in the linear program. Finally, we
solve this augemented linear program and repeat the above procedure.
An argument justifying this greedy method is that if we use the same approx-
imation as in Equation (1), the objective of Problem (8) becomes separable and
concave, and it is well known that the greedy procedure finds the optimum (see
e.g. [5]). The column generation procedure can be stopped when the optimal
value of Problem (8) is 0, which guarantees that no other column can increase
the value of Problem (6). In practice, we stop the column generation as soon as
the reduced cost of the strategy n(N ) returned by the greedy procedure is 0.

2.3 Relation with Polymatrix Games


A polymatrix game is a multiplayer game in which the payoff of player i is the
sum of the partial payoffs received from a bimatrix game against each other
player:
A Stackelberg Game 231

Payoff(i) = pi T Aij pj .
j=i

In this section, we establish a relation between the solutions of the model (3)
presented above and the Nash equilibriums of a particular polymatrix game. For
the model without costs (2), it is not difficult to write the payoff of the controller
as the sum of partial payoffs from zero-sum bimatrix games played against each
user (recall that pi = [pi , 1 − pi ]T ):
  
Payoff(controller) = xi λi = Loss(user i) = pTi Ai q,
i i i

where Ai is the 2 × |E|−matrix with elements


 κ
∀e ∈ E, (Ai )1,e = xi Ti ; (Ai )2,e = ye xi Pi if e ∈ ri ;
0 otherwise.

This particular polymatrix game has a special structure, since the interaction
between the players can be modelled by a star graph with the controller in the
central node, and each edge represents a zero-sum game between a user and the
controller. Modulo the additional constraint κqe ≤ ye , which bounds from above
the mixed strategy of the controller, any Nash equilibrium (q, p1 , . . . , pm ) of
this polymatrix game gives a solution q to the Stackelberg competition problem
studied in Section 2.1. The model with control costs (3) can also be formulated
in this way, by adding a new player who has a single strategy. This player plays
a zero-sum
 game against the controller, whose payoff is the sum of the control
costs e ce qe .
Interestingly, the fact that Problem (3) is representable by a LP is strongly
related to the fact that every partial game is zero-sum. We point out a recent
paper of Daskalakis and Papadimitriou [3], who have generalized the Neumann’s
minmax theorem to the case of zero-sum polymatrix games. In the introduction
of the latter article, the authors moreover notice that for any star network, we
can find an equilibrium of a zero-sum polymatrix game by solving a LP.

3 Experimental Results

We have solved the models presented in this paper for several regions of the
German motorways network, based on real traffic data (averaged over time). We
present here a brief analysis of our results. On Figure 1, we have represented
the mixed strategy of the controller that maximizes the revenue from the toll
(without control costs, for κ = 60 controls per hour), for the regions of Berlin-
Brandenburg and North Rhine-Westphalia (NRW). The graphs corresponding
to these regions consist of 57 nodes (resp. 111) and 120 directed edges (resp.
264), and we have taken in consideration 1095 routes (resp. 4905). We have used
a toll fee of 0.176 e per kilometer, and a penalty of 400 e that does not depend
on the route.
232 R. Borndörfer et al.

Control rate

Berlin

Brandenburg

Cottbus

(a)
Control rate

Dortmund

Duisburg

Wuppertal
Düsseldorf

(b)

Fig. 1. Mixed strategy of the controller which maximizes the revenue (2), for the regions
of Berlin-Brandenburg (a), and NRW (b). The widths of the sections indicate the traffic
volumes.

For the region of Berlin-Brandenburg, we have plotted the evolution of the


number of evaders and the revenue generated from the toll as a function of κ on
Figure 2. Just to give an idea of the order of magnitudes, there is an average of
1620 trucks per hour in this instance. The strategies that maximize the revenue
A Stackelberg Game 233

100 max_revenue
min_evaders
proportional
80

fraction of evaders(%)
60

40

20

0
0 20 40 60 80 100

κ (# controls per hour)

(a)

100
revenue (% of the all pay case)

80

60

40

20 max_revenue
min_evaders
proportional
0
0 20 40 60 80 100

κ (# controls per hour)

(b)

Fig. 2. Evolution of the number of evaders (a) and of the toll revenue (b) with κ, for
the region of Berlin-Brandenburg

and that minimize the number of evaders are compared to the case where the
controls are proportional to the traffic. Several conclusions can be drawn from
this Figure: first, the “proportional” strategy is not so bad in terms of revenue,
however a difference of up to 4% with the max_revenue strategy is observed.
Second, the number of evaders decreases much faster when the controls are dis-
tributed with respect to this goal. For κ = 55, the evasion rate achieved by the
234 R. Borndörfer et al.

control distribution that is proportional to the traffic (resp. that maximizes the
revenue) is of 97% (resp. 54%), while we can achieve an evasion rate of 31% with
the min_evaders strategy. Third, both the max_revenue and the min_evaders
strategies create a situation in which it is in the interest of no driver to evade
for κ ≥ 80.3. In contrast, there is still 2% of the drivers who had better evade
with the proportional strategy for κ = 115.
We have also computed the optimal mixed strategy for a coalition of N = 13
inspectors, with the column generation procedure described in Section 2.2. For
κ = 60, we found that the N inspectors should be simultaneously allocated to a
common section 84% of the time. The column generation procedure, which allows
to consider the strategies where the inspectors are spread over the network, yields
an increase of revenue of only 1.84%. An intuitive explanation is that spreading
out the inspectors leads to potentially controlling several times the same driver.
Moreover, most of the traffic passes only through sections where ye ≥ κ, so that
v ñ(e) is an extreme point of P (cf. Equation (7)).

4 Conclusion and Perspectives


We have presented a novel approach based on a Stackelberg game to spread
out the controls over a transportation network, in order to enforce the payment
of a transit toll. To the best of our knowledge, this is the first article which
studies the distribution of controls while taking the topology of the network
into account. The problem of distributing the controls so as to maximize the
expected toll revenue (resp. minimize the number of evaders) was formulated as
a linear program (resp. mixed integer program), and we have drawn a parallel
with polymatrix games. Experimental results suggest that this approach can
lead to significant improvements compared to the strategy which consists in
controlling each section proportionally to the traffic volumes, especially when
the goal is to minimize the number of toll evaders.
We have also shown that our model can be extended to deal with the prob-
lem of simultaneously deploying N controllers over the network. Despite the
apparent complexity of this problem, we were able to find a solution by column
generation in our experiments. The optimal strategy assigns most of the time
the N controllers to the same section.
In future work, we want to improve the behavioral model of the users. A key
point seems to be the perception of the probability to be controlled as a function
of the control distributions, which can very different for several users [1]. We also
want to introduce some time dynamics in the model, since the diurnal variations
of the traffic can be very important.

Aknowledgement. The authors express their gratitude to Tobias Harks for


his precious suggestions, which improved the presentation of this article. They
also thank Julia Buwaya for her valuable support on this project.
A Stackelberg Game 235

References
1. Boyd, C., Martini, C., Rickard, J., Russell, A.: Fare evasion and non-compliance: A
simple model. Journal of Transport Economics and Policy, 189–197 (1989)
2. Borndörfer, R., Sagnol, G., Swarat, E.: An IP approach to toll enforcement opti-
mization on german motorways. Tech. Rep. ZIB, Report 11-42, Zuse Institut Berlin
(2011)
3. Daskalakis, C., Papadimitriou, C.H.: On a Network Generalization of the Minmax
Theorem. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S.,
Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 423–434. Springer,
Heidelberg (2009)
4. Hildebrand, M.D., Prentice, B.E., Lipnowski, I.: Enforcement of highway weight
regulations: A game theoretic model. Journal of the Transportation Research Fo-
rum 30(2) (1990)
5. Ibaraki, T., Katoh, N.: Resource allocation problems: algorithmic approaches. MIT
Press (1988)
6. Jankowski, W.B.: Fare evasion and non-compliance: A game theoretical approach.
International Journal of Transport Economics 38(3), 275–287 (1991)
7. Thorlacius, P., Clausen, J., Brygge, K.: Scheduling of inspectors for ticket spot
checking in urban rail transportation. Trafikdage på Aalborg Universitet 2008 (2010)
Stochastic Loss Aversion
for Random Medium Access

George Kesidis and Youngmi Jin


1
CS&E and EE Depts, Penn State University
[email protected]
2
EE Dept, KAIST, South Korea
youngmi [email protected]

Abstract. We consider a slotted-ALOHA LAN with loss-averse, non-


cooperative greedy users. To avoid non-Pareto equilibria, particularly
deadlock, we assume probabilistic loss-averse behavior. This behavior is
modeled as a modulated white noise term, in addition to the greedy
term, creating a diffusion process modeling the game. We observe that
when player’s modulate with their throughput, a more efficient explo-
ration of play-space (by Gibbs sampling) results, and so finding a Pareto
equilibrium is more likely over a given interval of time.

Keywords: ALOHA MAC, Pareto equilibria, diffusion machine.

1 Introduction

The “by rule” window flow control mechanisms of, e.g., TCP and CSMA, have
elements of both proactive and reactive communal congestion control suitable
for distributed/information-limited high-speed networking scenarios. Over the
past ten years, game theoretic models for medium access and flow control have
been extensively explored in order to consider the effects of even a single end-
user/player who greedily departs from such prescribed/standard behaviors [1, 6,
9, 13–16, 23–25, 28]. Greedy end-users may have a dramatic effect on the overall
“fairness” of the communication network under consideration. So, if even one
end-user acts in a greedy way, it may be prudent for all of them to do so.
However, even end-users with an noncooperative disposition may temporarily
not practice greedy behavior in order to escape from sub-optimal (non-Pareto)
Nash equilibria. In more general game theoretic contexts, the reluctance of an
end-user to act in a non-greedy fashion is called loss aversion [7].
In this note, we focus on simple slotted-ALOHA MAC for a LAN. We begin
with a noncooperative model of end-user behavior. Despite the presence of a stable
interior Nash equilibrium, this system was shown in [13,14] to have a large domain
of attraction to deadlock where all players’ transmission probability is one and
so obviously all players’ throughput is zero (here assuming feasible demands and
throughput based costs). To avoid non-Pareto Nash equilibria, particularly those
G. Kesidis was supported by American NSF CISE/CNS grant 0916179.
Y. Jin was supported by the Korean NRF grant number 2010-0006611.

involving zero throughput for some or all users, we assume that end-users will
probabilistically engage in non-greedy behavior; that is, we posit a stochastic
model of loss aversion, a behavior whose aim is long-term communal betterment.
We may be able to model a play that reduces net-utility using a single
“temperature” parameter T in the manner of simulated annealing (e.g., [12]);
i.e., plays that increase net utility are always accepted and plays that reduce net
utility are (sometimes) accepted with probability increasing in T , so the players
are (collectively) less loss averse with larger T . Though our model of probabilistic
loss aversion is related to that of simulated annealing by diffusions [10, 29], even
with a free meta-parameter (η or ηw below) possibly interpretable as temperature,
our modeling aim is not centralized annealing (temperature cooling) but rather
decentralized exploration of play-space by noncooperative users.
We herein do not model how the end-users will keep track of the best (Pareto)
equilibria previously played/discovered.¹ Because the global extrema of the global
objective functions (Gibbs exponents) we derive do not necessarily correspond to
Pareto equilibria, we do not advocate collective slow “cooling” (annealing) of the
equivalent temperature parameters. Also, we do not model how end-user through-
put demands may be time-varying, a scenario which would motivate the “contin-
ual search” aspect of the following framework.
The following stochastic approach to distributed play-space search is also
related to “aspiration” of repeated games [3, 8, 18], where a play resulting in
suboptimal utility may be accepted when the utility is less than a threshold, say
according to a "mutation" probability [17, 26]. This type of "boundedly rational"
behavior has been proposed to find Pareto equilibria, in particular for distributed
settings where players act with limited information [26]. Clearly, given a global
objective L whose global maxima correspond to Pareto equilibria, these ideas
are similar to the use of simulated annealing to find the global maxima of L
while avoiding suboptimal local maxima.
This paper is organized as follows. In Section 2, we formulate the basic
ALOHA noncooperative game under consideration. Our stochastic framework
(a diffusion) for loss aversion is given in Section 3; for two different modulat-
ing terms of the white-noise process, the invariant distribution in the collective
play-space is derived. A two-player numerical example is used to illustrate the
performance of these two approaches in Section 4. We conclude in Section 5 with
a discussion of future work.

2 A Distributed Slotted-ALOHA Game for LAN MAC


Consider an idealized² ALOHA LAN where each user/player i ∈ {1, 2, ..., n}
has (potentially different) transmission probability vi . For the collective “play”
v = (v1 , v2 , ..., vn ), the net utility of player i is
¹ The players could, e.g., alternate between (loss averse) greedy behavior to discover
Nash equilibrium points, and the play dynamics modeled herein for breadth of search
(to escape non-Pareto equilibria).
² We herein do not consider physical layer channel phenomena such as shadowing and
fading as in, e.g., [16, 25].

V_i(v) = U_i(θ_i(v)) − M θ_i(v),        (1)

where the strictly convex and increasing utility U_i of steady-state throughput

θ_i := v_i ∏_{j≠i} (1 − v_j)

is such that U_i(0) = 0, and the throughput-based price is M. So, the
throughput-demand of the ith player is

y_i := (U_i′)^{−1}(M).

This is a quasi-stationary game wherein future action is based on the outcome


of the current collective play v observed in steady-state [5].
The corresponding continuous Jacobi iteration of the better-response dynamics
is [13, 14, 27]: for all i,

(d/dt) v_i = y_i / ∏_{j≠i}(1 − v_j) − v_i =: −E_i(v),        (2)

cf. (6). Note that we define −E_i, instead of E_i, to be consistent with the notation
of [29], which seeks to minimize a global objective, though we want to maximize
such objectives in the following.
Such dynamics generally exhibit multiple Nash equilibria, including non-Pareto
equilibria with significant domains of attraction. Our ALOHA context has a sta-
ble deadlock equilibrium point where all players always transmit, i.e., v = 1 :=
(1, 1, ..., 1) [13, 14].
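As a point of reference, the better-response ODE (2) is easy to integrate numerically. The following Python sketch (our illustration; the step size, demands, and initial plays are arbitrary choices) uses a forward-Euler step and exhibits both convergence to an interior equilibrium and capture by the deadlock attractor, depending on the initial condition.

```python
import numpy as np

def better_response_step(v, y, dt=0.01):
    """One forward-Euler step of the Jacobi better-response ODE (2)."""
    n = len(v)
    prod_others = np.array([np.prod(np.delete(1.0 - v, i)) for i in range(n)])
    dv = y / np.maximum(prod_others, 1e-12) - v   # dv_i/dt = -E_i(v)
    return np.clip(v + dt * dv, 0.0, 1.0)

y = np.array([8/15, 1/15])                        # feasible demands (illustrative)
for v0 in ([0.5, 0.1], [0.9, 0.6]):               # two initial collective plays
    v = np.array(v0)
    for _ in range(5000):
        v = better_response_step(v, y)
    print(v0, "->", np.round(v, 3))               # interior equilibrium vs. deadlock near 1
```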

3 A Diffusion Model of Loss Aversion

Generally in the following, we consider differently loss-averse players. Both ex-


amples considered are arguably distributed (information limited) games wherein
every player’s choice of transmission probability is based on information know-
able to them only through their channel observations, so that consultation among
users is not required. In particular, players are not directly aware of each other’s
demands (y).

3.1 Model Overview

We now model stochastic perturbation of the Jacobi dynamics (2), allowing for
suboptimal plays despite loss aversion, together with a sigmoid mapping g to
ensure plays (transmission probabilities) v remain in a feasible hyper-rectangle
D ⊂ [0, 1]n (i.e., the feasible play-space for v): for all i,

dui = −Ei (v)dt + σi (vi )dWi (3)


vi = gi (ui ) (4)

where W_i are independent standard Brownian motions. An example sigmoid is

g(u) := γ (tanh(u/w) + δ),        (5)

where 1 ≤ δ < 2 and 0 < γ ≤ 1/(1 + δ). Thus, inf_u g(u) = inf v = γ(−1 + δ) ≥ 0
and sup_u g(u) = sup v = γ(1 + δ) ≤ 1. Again, to escape from the domains of
attraction of non-Pareto equilibria, the deterministic Jacobi dynamics
(i.e., −E_i(v) dt in (3)) have been perturbed by white noise (dW_i), here modulated
by a diffusion term of the form

σ_i(v_i) = √( 2 h_i(v) / f_i(v_i) ),

where

f_i(v_i) := g_i′(g_i^{−1}(v_i)).

For the example sigmoid (5),

f(v) = (γ/w) ( 1 − (v/γ − δ)² ).

In the following, we will consider different functions h_i leading to Gibbs invariant
distributions for v.
Note that the discrete-time (k) version of this game model would be

u_i(k + 1) − u_i(k) = −E_i(v(k)) ε + σ_i(v(k)) N_i(k)

v_i(k + 1) = g_i(u_i(k + 1)),        (6)

where the N_i(k) are all i.i.d. normal N(0, ε) random variables.
The system just described is a variation of E. Wong’s diffusion machine [29],
the difference being the introduction of the term h instead of a temperature
meta-parameter T . Also, the diffusion function σi is player-i dependent at least
through hi . Finally, under the slotted-ALOHA dynamics, there is no function
E(v) such that ∂E/∂vi = Ei , so we will select the diffusion factors hi to achieve
a tractable Gibbs stationary distribution of v, and interpret them in terms of
player loss aversion.
Note that in the diffusion machine, a common temperature parameter T may
be slowly reduced to zero to find the minimum of a global potential function
(the exponent of the Gibbs stationary distribution of v) [20, 21], in the manner
of simulated annealing. Again, the effective temperature parameter here (η or
ηw) will be constant.

3.2 Example Diffusion Term hi Decreasing in vi


In this subsection, we analyze the model when, for all i,

h_i(v_i) := η y_i (1 − v_i)²,        (7)

with η > 0 a free meta-parameter (assumed common to all players). So, a greedier
player i (larger yi ) will generally tend to be less loss averse (larger hi ), except
when their current retransmission play vi is large.
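As a concrete illustration of the loss-averse play dynamics, the following Python sketch simulates the discrete-time model (6) with the example sigmoid (5) and the diffusion factor (7). All parameter values (γ, δ, w, ε, η, the horizon, and the initial play) are our own illustrative choices, not taken from the analysis above.

```python
import numpy as np

rng = np.random.default_rng(1)

gamma, delta, w = 0.499, 1.0, 1.0                 # sigmoid (5); plays live in (0, 2*gamma)
g = lambda u: gamma * (np.tanh(u / w) + delta)
g_inv = lambda v: w * np.arctanh(v / gamma - delta)
f = lambda v: (gamma / w) * (1.0 - (v / gamma - delta) ** 2)   # f = g'(g^{-1}(v))

def simulate(y, eta=1.0, eps=1e-3, steps=100_000):
    """Discrete-time diffusion (6) with the throughput-modulated factor h_i of (7)."""
    u = g_inv(np.full(len(y), 0.3))               # start away from deadlock
    for _ in range(steps):
        v = g(u)
        prod_others = np.array([np.prod(np.delete(1.0 - v, i)) for i in range(len(y))])
        E = v - y / np.maximum(prod_others, 1e-12)             # E_i(v) from (2)
        h = eta * y * (1.0 - v) ** 2                           # diffusion factor (7)
        sigma = np.sqrt(2.0 * h / np.maximum(f(v), 1e-12))
        u = u - E * eps + sigma * rng.normal(0.0, np.sqrt(eps), len(y))  # N_i(k) ~ N(0, eps)
    return g(u)

print(np.round(simulate(np.array([8/15, 1/15])), 3))
```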

Theorem 1. The stationary probability density function of v ∈ D ⊂ [0, 1]^n,
defined by (4) and (3), is

p(v) = (1/Z) exp( Λ(v)/(ηY) − log H(v) ),        (8)

where: the normalizing term

Z := ∫_D exp( Λ(v)/(ηY) − log H(v) ) dv,

D := ∏_{i=1}^n ( γ_i(−1 + δ_i), γ_i(1 + δ_i) ),

Λ(v) := ∏_{i=1}^n y_i/(1 − v_i) − Σ_{j=1}^n [ v_j/(1 − v_j) + log(1 − v_j) ] ∏_{i≠j} y_i,

H(v) := ∏_{j=1}^n (1 − v_j)²,   and

Y := ∏_{j=1}^n y_j.

Remark: Λ is a Lyapunov function of the deterministic (σi ≡ 0 for all i) Jacobi


iteration [13, 14].

Proof. Applying Ito's lemma [19, 29] to (3) and (4) gives

dv_i = g_i′(u_i) du_i + (1/2) g_i″(u_i) σ_i²(v) dt
     = [ −f_i(v_i) E_i(v) + (1/2) g_i″(g_i^{−1}(v_i)) σ_i²(v) ] dt + f_i(v_i) σ_i(v) dW_i,

where ′ denotes differentiation with respect to the function's argument and we
have just substituted (3) for the second equality. From the Fokker–Planck
(Kolmogorov forward) equation for this diffusion [19, 29], we get the following
equation for the time-invariant (stationary) distribution p of v: for all i,

0 = (1/2) ∂_i( f_i² σ_i² p ) − [ −f_i E_i + (1/2) (g_i″ ∘ g_i^{−1}) σ_i² ] p,

where the operator ∂_i := ∂/∂v_i.

Now note that

f_i²(v_i) σ_i²(v) = 2 h_i(v_i) f_i(v_i)   and

g_i″(g_i^{−1}(v_i)) σ_i²(v_i) = 2 h_i(v_i) g_i″(g_i^{−1}(v_i)) / f_i(v_i) = 2 h_i(v_i) f_i′(v_i).

So, the previous display reduces to

0 = ∂_i( h_i f_i p ) − ( −E_i f_i + h_i f_i′ ) p = ( h_i ∂_i p + h_i′ p + E_i p ) f_i,

where the second equality is due to cancellation of the h_i f_i′ p terms. For all i,
since f_i > 0,

∂_i p(v) / p(v) = ∂_i log p(v) = − E_i(v)/h_i(v_i) − h_i′(v_i)/h_i(v_i)        (9)
               = (1/(ηY)) ∂_i Λ(v) + 2/(1 − v_i).

Finally, (8) follows by direct integration.  □

Unfortunately, the exponent of p under (7),

Λ̃(v) := Λ(v)/(ηY) − log H(v),        (10)

and both its component terms Λ and −log H, remain maximal in the deadlock
region near 1. Under first-order necessary conditions for optimality, ∇Λ̃ = 0,
demand is less than achieved throughput for every user i:

y_i = v_i ∏_{j≠i}(1 − v_j) / ( 1 + 2η ∏_j (1 − v_j) ).        (11)

3.3 Example Diffusion Term hi Increasing in vi


The following alternative diffusion term h_i is an example which is instead
increasing in v_i, but decreasing in the channel idle time from player i's
point of view [2, 11]:

h_i(v) := η v_i / ∏_{j≠i}(1 − v_j).        (12)

That a user would be less loss averse (higher h) when the channel was perceived
to be more idle may be a reflection of a "dynamic" altruism [2] (i.e., a player is
more courteous as s/he perceives that others are). The particular form of (12)
also leads to another tractable Gibbs distribution for v.

Theorem 2. Using (12), the stationary probability density function of the
diffusion v on [0, 2γ]^n is

p(v) = (1/W) exp( Δ(v) ),        (13)

where

Δ(v) = Σ_{i=1}^n ( y_i/η − 1 ) log v_i + (1/η) ∏_{i=1}^n (1 − v_i),        (14)

and W is the normalizing term.

Proof. Following the proof of Theorem 1, the invariant density here also
satisfies (9):

∂_i log p(v) = − E_i(v)/h_i(v) − ∂_i log h_i(v)
             = y_i/(η v_i) − (1/η) ∏_{j≠i}(1 − v_j) − 1/v_i.

Substituting (12) gives

∂_i log p(v) = ( y_i/η − 1 ) (1/v_i) − (1/η) ∏_{j≠i}(1 − v_j).

So, we obtain (14) by direct integration.  □

3.4 Discussion
Note that if η > max_i y_i, then Δ is strictly decreasing in v_i for all i, and so will
be minimal in the deadlock region (unlike Λ̃). So the stationary probability in
the region of deadlock will be low. However, large η may result in very high
stationary probability near v = 0. So, we see that the meta-parameter η (or
ηw) here plays a more significant role (though the parameters δ and γ in g play
a more significant role in the former objective Λ̃ owing to its global extremum
at 1).
For small η < min_i y_i, note that Δ(1) = 0; i.e., unlike Λ̃, Δ does not have a
maximal singularity at 1. Also, the difference in the role played by η in the two
Gibbs distributions (8) and (13) is apparent from the first-order necessary
conditions for optimality of their potentials:

∇Λ(v) = 0 ⇔ y_i = v_i ∏_{j≠i}(1 − v_j)

∇Δ(v) = 0 ⇔ y_i − η = v_i ∏_{j≠i}(1 − v_j),

so that here demand is more than achieved throughput. Thus, under the potential
Δ, if 0 < η < min_i y_i, then the Gibbs distribution is maximal at points v where
the throughputs θ = y − η1, i.e., all users' achieved throughputs are less than
their demands by the same constant amount η. So, the meta-parameter η may
be used to deal with the problem of excessive total demand Σ_i y_i.
Finally note that the Hessian of Δ has all off-diagonal entries 1/η and ith
diagonal entry −(y_i − η)/(η v_i²). Assume that the reduced demands y − η1 are
feasible and achieved at v. If y_i − η > (n − 1) v_i² for all users i (again where n
is the number of users), then by diagonal dominance, Δ's Hessian is negative
definite at v, and hence v is a local maximum. The sufficient condition of
diagonal dominance is achieved in the special case when v_i < 1/(2n) for all i
because, for all i,

y_i − η = v_i ∏_{j≠i}(1 − v_j) ≈ v_i ( 1 − Σ_{j≠i} v_j ),

where the approximation is accurate since Σ_j v_j < 1/2 by assumption, and

(n − 1) v_i + Σ_{j≠i} v_j < 0.5 + 0.5 = 1,

i.e.,

(y_i − η)/(η v_i²) ≈ v_i ( 1 − Σ_{j≠i} v_j ) / (η v_i²) > (n − 1) (1/η).

This special case obviously does not include the classical, static choice for slotted
ALOHA of vi = 1/n for all i, which leads to optimal total throughput (for the
identical users case) of 1/e when n is large.
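A quick numerical check of this local-maximum condition (our illustration, with arbitrary n, η and v): we pick v with v_i < 1/(2n), back out demands consistent with y_i − η = v_i ∏_{j≠i}(1 − v_j), and evaluate the Hessian of Δ directly from (14).

```python
import numpy as np

n, eta = 4, 0.05                              # illustrative values
v = np.full(n, 0.1)                           # v_i < 1/(2n) = 0.125
y = eta + v * np.array([np.prod(np.delete(1 - v, i)) for i in range(n)])

# Hessian of Delta(v) = sum_i (y_i/eta - 1) log v_i + (1/eta) prod_i (1 - v_i)
H = np.empty((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            H[i, j] = -(y[i] - eta) / (eta * v[i] ** 2)
        else:
            H[i, j] = np.prod(np.delete(1 - v, [i, j])) / eta  # bounded by 1/eta
print(np.all(np.linalg.eigvalsh(H) < 0))      # True: negative definite, a local maximum
```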

4 Numerical Example
For an n = 2 player example with demands y = (8/15, 1/15) and η = 1, the two
interior Nash equilibria are the locally stable equilibrium (under the deterministic
dynamics) v∗_a = (2/3, 1/5) and the (unstable) saddle point v∗_b = (4/5, 1/3)
(both with corresponding throughputs θ = y) [13, 14]. Again, 1 is a stable deadlock
boundary equilibrium, which is naturally to be avoided if possible as both players'
throughputs are zero there, θ = 0. Under the deterministic dynamics of (2),
the deadlock equilibrium 1 has a significant domain of attraction including a
neighborhood of the saddle point v∗_b.
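Both interior equilibria are straightforward to verify numerically; this quick check (illustrative) confirms that each play achieves the stated demands, θ_i = v_i ∏_{j≠i}(1 − v_j) = y_i.

```python
import numpy as np

y = np.array([8/15, 1/15])
for v in (np.array([2/3, 1/5]), np.array([4/5, 1/3])):
    theta = np.array([v[i] * np.prod(np.delete(1.0 - v, i)) for i in range(2)])
    print(np.round(theta, 6), np.allclose(theta, y))   # both plays meet the demands y
```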
The exponent of p (potential of the Gibbs distribution), Λ̃, for this example
is depicted in Figure 1. Λ̃ has a shape similar to that of the Lyapunov function
Λ, but without the same interior local extrema or saddle points by (11). The
extreme mode at 1 is clearly evident.

Fig. 1. The potential/exponent (10) of the Gibbs distribution (8) for n = 2 players
with demands y = (8/15, 1/15)

4.1 Small η

For the case where 0 < η < min{y_1, y_2}, we took η = 0.01 for the example
above. The potential Δ of the Gibbs distribution (13) is depicted in Figure 2.
Compared to Λ̃ in Figure 1, v = 1 is not a local extremum under Δ (and does
not have a domain of attraction). The function Λ under demands y − 0.01 · 1,
denoted Λ∗ (recall the discussion at the end of Section 3.4), is depicted in Figure
3 and, again, is similar to that depicted in Figure 1. For purposes of reference
in these figures, the following table compares these quantities at the points v∗
that achieve the demands y under Λ:

(v∗_1, v∗_2)    Λ       Δ       Λ∗
(4/5, 1/3)      .059    −4.6    .037
(2/3, 1/5)      .057    −3.7    .046

Fig. 2. The potential Δ of (13) for n = 2 players with demands y = (8/15, 1/15) under
(12) with η = 0.01

Fig. 3. The component Λ of the potential of (8) for n = 2 players with demands
y = (8/15, 1/15) − 0.01 · 1

4.2 Large η
See [22] for a numerical example of this case, where we illustrate how the use of
(12) results in dramatically less sensitivity to the choice of the parameters δ and
γ governing the range of the play-space D.

5 Conclusions and Future Work


The diffusion term (12) was clearly more effective than (7) at exploring the
play-space, but the interior local maxima of the Gibbs distribution are at points
where achieved throughput is less than demand by the "temperature" meta-parameter
η. In future work, we plan to explore other diffusion factors h and consider a
model with power-based costs, i.e., M v_i instead of M θ_i in the net utility (1).
Also, we will study the effects of asynchronous and/or multirate play among the
users [2, 4, 15].

References
1. Altman, E., Boulogne, T., El-Azouzi, R., Jiménez, T., Wynter, L.: A survey on net-
working games in telecommunications. Comput. Oper. Res. 33(2), 286–311 (2006)
2. Antoniadis, P., Fdida, S., Griffin, C., Jin, Y., Kesidis, G.: CSMA Local Area Net-
working under Dynamic Altruism (December 2011) (submitted)
3. Bendor, J., Mookherjee, D., Ray, D.: Aspiration-based reinforcement learning in re-
peated interaction games: an overview. International Game Theory Review 3(2&3),
159–174 (2001)
4. Bertsekas, D.P., Tsitsiklis, J.N.: Convergence rate and termination of asynchronous
iterative algorithms. In: Proc. 3rd International Conference on Supercomputing
(1989)
5. Brown, G.W.: Iterative solutions of games with fictitious play. In: Koopmans, T.C.
(ed.) Activity Analysis of Production and Allocation. Wiley, New York (1951)
6. Cagalj, M., Ganeriwal, S., Aad, I., Hubaux, J.P.: On Selfish Behavior in CSMA/CA
networks. In: Proc. IEEE INFOCOM (2005)
7. Camerer, C.F., Loewenstein, G.: Behavioral Economics: Past, Present, Future. In:
Camerer, C.F., Loewenstein, G., Rabin, M. (eds.) Advances in Behavioral Eco-
nomics. Princeton Univ. Press (2003)
8. Cho, I.-K., Matsui, A.: Learning aspiration in repeated games. Journal of Economic
Theory 124, 171–201 (2005)
9. Cui, T., Chen, L., Low, S.H.: A Game-Theoretic Framework for Medium Access
Control. IEEE Journal on Selected Areas in Communications 26(7) (September
2008)
10. Gidas, B.: Global optimization via the Langevin equation. In: Proc. IEEE CDC,
Ft. Lauderdale, FL (December 1985)
11. Heusse, M., Rousseau, F., Guillier, R., Duda, A.: Idle sense: An optimal access
method for high throughput and fairness in rate diverse wireless LANs. In: Proc.
ACM SIGCOMM (2005)
12. Holley, R., Stroock, D.: Simulated Annealing via Sobolev Inequalities. Communi-
cations in Mathematical Physics 115(4) (September 1988)
13. Jin, Y., Kesidis, G.: A pricing strategy for an ALOHA network of heterogeneous
users with inelastic bandwidth requirements. In: Proc. CISS, Princeton (March
2002)
14. Jin, Y., Kesidis, G.: Equilibria of a noncooperative game for heterogeneous users
of an ALOHA network. IEEE Communications Letters 6(7), 282–284 (2002)
15. Jin, Y., Kesidis, G.: Dynamics of usage-priced communication networks: the case
of a single bottleneck resource. IEEE/ACM Trans. Networking (October 2005)
16. Jin, Y., Kesidis, G.: A channel-aware MAC protocol in an ALOHA network with
selfish users. IEEE JSAC Special Issue on Game Theory in Wireless Communica-
tions (January 2012)
17. Kandori, M., Mailath, G., Rob, R.: Learning, mutation, and long run equilibria in
games. Econometrica 61(1), 29–56 (1993)
18. Karandikar, R., Mookherjee, D., Ray, D., Vega-Redondo, F.: Evolving aspirations
and cooperation. Journal of Economic Theory 80, 292–331 (1998)

19. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Springer
(1991)
20. Kesidis, G.: Analog Optimization with Wong’s Stochastic Hopfield Network. IEEE
Trans. Neural Networks 6(1) (January 1995)
21. Kesidis, G.: A quantum diffusion network. Technical Report 0908.1597 (2009),
http://arxiv.org/abs/0908.1597
22. Kesidis, G., Jin, Y.: Stochastic loss aversion for random medium access. Technical
report (January 9, 2012), http://arxiv.org/abs/1201.1776
23. Lee, J.W., Chiang, M., Calderbank, R.A.: Utility-optimal random-access protocol.
IEEE Transactions on Wireless Communications 6(7) (July 2007)
24. Ma, R.T.B., Misra, V., Rubenstein, D.: An Analysis of Generalized Slotted-Aloha
Protocols. IEEE/ACM Transactions on Networking 17(3) (June 2009)
25. Menache, I., Shimkin, N.: Fixed-Rate Equilibrium in Wireless Collision Channels.
In: Chahed, T., Tuffin, B. (eds.) NET-COOP 2007. LNCS, vol. 4465, pp. 23–32.
Springer, Heidelberg (2007)
26. Montanari, A., Saberi, A.: Convergence to equilibrium in local interaction games.
In: FOCS (2009)
27. Shamma, J.S., Arslan, G.: Dynamic fictitious play, dynamic gradient play, and
distributed convergence to Nash equilibria. IEEE Trans. Auto. Contr. 50(3), 312–
327 (2005)
28. Wicker, S.B., MacKenzie, A.B.: Stability of Multipacket Slotted Aloha with Selfish
Users and Perfect Information. In: Proc. IEEE INFOCOM (2003)
29. Wong, E.: Stochastic Neural Networks. Algorithmica 6 (1991)
Token-Based Incentive Protocol Design
for Online Exchange Systems

Jie Xu⋆, William Zame, and Mihaela van der Schaar

University of California Los Angeles, Los Angeles CA 90095, USA
[email protected], [email protected], [email protected]

Abstract. In many online exchange systems, agents provide services to
satisfy other agents' demands. Typically, the provider incurs an (immediate)
cost and hence may withhold service. As a result, the success of
the exchange system requires proper incentive mechanisms to encourage
service provision. This paper studies the design of such systems that are
operated based on the exchange of tokens, a simple internal currency
which provides indirect reciprocity among agents. The emphasis is on
how the protocol designer should choose a protocol - a supply of tokens
and suggested strategies - to maximize service provision, taking into account
that impatient agents will comply with the protocol if and only if
it is in their interests to do so. Agents' interactions are modeled as a
repeated game. We prove that these protocols have a simple threshold
structure and establish the existence of equilibria. Then we use this structural
property to design exchange strategies that maximize the system efficiency.
Among all protocols with the same threshold, we find that there
is a unique optimal supply of tokens that balances the token distribution
in the population and achieves the optimal efficiency. Such token protocols
are proven to achieve full efficiency asymptotically as
agents become sufficiently patient or the cost becomes sufficiently small.

Keywords: token protocols, repeated games, agents, efficiency.

1 Introduction
Resource sharing services are currently proliferating in many online systems. For
example, in BitTorrent, Gnutella and Kazaa, individuals share files; in Seti@home,
individuals provide computational assistance; in Slashdot and Yahoo!Answers,
individuals provide content, evaluations and answers to questions. The expansion
of such sharing and exchange services will depend on their participating members
(herein referred to as agents) to contribute and share resources with each other.
However, the participating agents are self-interested and hence, they will try to
"free-ride", i.e. they will derive services from other agents without contributing
their own services in return. Empirical studies show that this free-riding problem
can be quite severe: in the Gnutella system, for instance, almost 70% of users share
no files at all [1].
⋆ Corresponding author.


To compel the self-interested agents to cooperate, incentive schemes can be
designed which rely on the information that individual agents have. Typically,
this information is about the past reciprocation behavior of other agents in the
system, which can be complete or partial. Such incentive schemes can be classified
into two categories: personal reciprocation (direct reciprocation) and social
reciprocation (indirect reciprocation). In the first category [3][7], agents are able
to "recognize" (identify) each other and exchange resources depending on their
own past mutual interactions. While simple to implement, such incentive schemes
cannot be efficiently deployed in systems where anonymous agents interact infrequently
with the same partner or in systems with a large number of agents. There
has been considerable literature on reputation-based schemes for various applications,
which appertain to the second category of incentive schemes. Reputation
is used as a way to achieve cooperation among self-interested users in [8]. This
framework is generalized in [13], where protocols are also designed using social
norms based on reputation. However, an important limitation of such solutions
is their centralized nature: because the provision of service depends on the reputation
of both the client and server, some central authority is required to keep track of
and verify reputations. Moreover, reputation schemes are also vulnerable to collusion
attacks: a set of colluding cheaters can mutually increase their reputation
by giving each other positive feedback while giving others negative feedback.
In this paper, we focus on pure strategies and design a new framework for
providing incentives in social communities, using tokens. Agents exchange tokens
for services: the client who receives service from a server pays for that service
with a token which the provider will later use to obtain service when it becomes
a client. In this setting, there is potentially a great deal of scope for a designer
to improve the social welfare of the system by carefully designing the token
exchanges. The extent to which this potential can be realized depends of course
on the degree of control the designer can exert. Here we ask what the designer can
achieve by imposing a system that relies solely on the exchange of intrinsically
worthless tokens or fiat money. Our emphasis in this paper is on the design of
such a system; in particular, how the designer should choose a protocol - a supply
of tokens and suggested strategies - to maximize the system efficiency. Among all
such choices/recommendations, the designer should select one that maximizes
the social welfare/system efficiency - or at least approaches this maximum. We
characterize the equilibria (in terms of the system parameters), show that they
have a particularly simple form, and determine the achievable system efficiency.
When agents are patient, it is possible to design equilibria to nearly optimal
efficiency.
This work connects to a number of economic literatures [11][10][6][14]. We
go further than these papers in that we emphasize the design of equilibria and
the designer’s goal of efficiency. In particular, we identify equilibria that are
asymptotically efficient, which these papers do not do. In the computer science
and engineering literature, token approaches are also adopted in various systems
[12][4][2]. However, they either assume that agents are compliant rather than
self-interested, and hence do not treat incentives and equilibrium, or they mainly
focus on simulations rather than rigorous theoretical justifications. The work closest
to ours is probably [5][9], which treats a rather different model in a "scrip" system.
More importantly, it assumes that agents adopt threshold strategies, whereas we
rigorously prove that threshold strategies are the only equilibrium strategies.
The rest of this paper is organized as follows. Section 2 introduces the pro-
posed token exchange model, defines equilibrium strategies and formulates the
optimal protocol design problem. Section 3 describes the nature of equilibrium.
Section 4 discusses efficiency of equilibrium protocols and designs the optimal
protocol - optimal token supply and optimal threshold. Section 5 illustrates the
simulation results. Finally, concluding remarks are made in Section 6.

2 System Model
In the environment we consider, a continuum (mass 1) of agents each possess
a unique resource that can be duplicated and provided to others. (In the real
systems we have in mind, the population is frequently in the tens of thousands, so
a continuum model seems a reasonable approximation.) The benefit of receiving
this resource is b and the cost of producing it is c; we assume b > c > 0,
so that social welfare is increased when the service is provided, but the cost is
strictly positive, so that the server has a disincentive to provide it. Agents care
about current and future benefits/costs and discount future benefits/costs at
the constant rate β ∈ (0, 1). Agents are risk neutral and so seek to maximize the
discounted present value of a stream of benefits and costs.
Time is discrete. In each time period, a fraction ρ ≤ 1/2 of the population is
randomly chosen to be a client and matched with a randomly chosen server; the
fraction 1 − 2ρ is unmatched. (No agent is both a client and a server in the same
period.) When a client and server are matched, the client chooses whether or
not to request service, and the server chooses whether or not to provide service
(i.e., transfer the file) if requested. This client-server model describes a world where
an agent has demand at times and also is matched by the system to provide
service at other times.
The parameters b, c, β, ρ completely describe the environment. Because the
units of benefit b and cost c are arbitrary (and tokens have no intrinsic value),
only the benefit/cost ratio r = b/c is actually relevant. We consider variations
in the benefit/cost ratio r and the discount factor β, but view the matching rate
ρ as immutable.

2.1 Tokens and Strategies


In a single server-client interaction, the server has no incentive to provide services
to the client. The mechanism we study for creating incentives to provide involves
the exchange of tokens. Tokens are indivisible, have no intrinsic value, and can be
stored without loss. Each agent can hold an arbitrary non-negative finite number
of tokens, but cannot hold a negative number of tokens and cannot borrow.
The protocol designer creates incentives for the agents to provide or share
resources by providing a supply of tokens and recommending strategies for agents

when they are clients and servers. The recommended strategy is a pair (σ, τ) of
maps ℕ → {0, 1}; τ is the client strategy and σ is the server strategy. It is obvious
that the strategy should depend only on an agent's current token holding because
the future matching process is independent of the history.

2.2 Equilibrium
Because we consider a continuum population and assume that agents can observe
only their own token holdings, the relevant state of the system from the point
of view of a single agent can be completely summarized by the fraction μ of
agents who do not request service when they are clients and the fraction ν of
agents who do not provide service when they are servers. If the population is in
a steady state then μ, ν do not change over time.
Given μ, ν, the strategy (σ, τ) is optimal, or a best response, for the current
token holding k if the long-run utility satisfies

V(k | μ, ν, σ, τ) ≥ V(k | μ, ν, σ′, τ′)

for all alternative strategies σ′, τ′. Because agents discount the future at the constant
rate β, the strategy (σ, τ) is optimal if and only if it has the one-shot deviation
property: there does not exist a continuation history h and a profitable deviation
(σ′, τ′) that differs from (σ, τ) followed by the history h and nowhere else; i.e.,
for the server strategy,

σ(k) = 0 ⇒ βV(k | σ, τ, μ, ν) ≥ −c + βV(k + 1 | σ, τ, μ, ν)
σ(k) ∈ (0, 1) ⇒ βV(k | σ, τ, μ, ν) = −c + βV(k + 1 | σ, τ, μ, ν)
σ(k) = 1 ⇒ βV(k | σ, τ, μ, ν) ≤ −c + βV(k + 1 | σ, τ, μ, ν)

and for the client strategy,

τ(k) = 0 ⇒ βV(k | σ, τ, μ, ν) ≥ b + βV(k − 1 | σ, τ, μ, ν)
τ(k) ∈ (0, 1) ⇒ βV(k | σ, τ, μ, ν) = b + βV(k − 1 | σ, τ, μ, ν)
τ(k) = 1 ⇒ βV(k | σ, τ, μ, ν) ≤ b + βV(k − 1 | σ, τ, μ, ν).
Write EQ(r, β) for the set of protocols Π that constitute an equilibrium when
the benefit/cost ratio is r and the discount factor is β. Conversely, given Π write
Φ(Π) for the set {(r, β)} of pairs of benefit/cost ratios r and discount factors
β such that Π is an equilibrium protocol. Note that EQ, Φ are correspondences
and are inverse to each other.

2.3 Invariant Distribution


If the designer chooses the protocol Π = (α, σ, τ ) and agents follow the recom-
mendation, we can easily describe the evolution of the token distribution (the
distribution of token holdings). Note that the token distribution must satisfy
two feasibility conditions:

 ∞

η (k) = 1, kη (k) = α
k=1 k=0

μ, ν are computed as

μ = Σ_{k=0}^∞ (1 − τ(k)) η(k),    ν = Σ_{k=0}^∞ (1 − σ(k)) η(k).

Evidently, μ is the fraction of agents who do not request service, and ν is the
fraction of agents who do not serve (assuming they follow the protocol).
To determine the token distribution next period, it is convenient to work
backwards and ask how an agent could come to have k tokens in the next period.
Given the protocol Π, the (feasible) token distribution η is invariant if η₊ = η,
where η₊ denotes the next-period distribution; that is, η is stationary when agents
comply with the recommendation (σ, τ).

2.4 Problem Formulation


The goal of the protocol designer is to provide agents with incentives to provide
service. Define the system efficiency as the probability that the service provi-
sion is successfully carried out when two agents are paired given the system
parameters b, c, β. Using the definition of μ, ν, by the Law of Large Numbers,
the efficiency is computed in the straightforward manner,

Eff (Π|b, c, β) = (1 − μ) (1 − ν)

Taking into account that impatient agents will comply with the protocol if and
only if it is in their interests to do so, the protocol needs to be an equilibrium
given the system parameters. Formally, the design problem is thus to choose the
protocol

Π∗ = arg max_{Π : (β,r) ∈ Φ(Π)} Eff(Π | β, r).

3 Equilibrium Strategies
The space of candidate protocols is enormous, so directly optimizing the efficiency
is intractable. Therefore, we explore whether there exist special structures
of the optimal strategies which may simplify the system design.
Proposition 1. Given b, c, β, μ, ν,
1. The optimal client strategy τ is τ (k) = 1 for every k ≥ 1; that is, “always
request service when possible”.
2. The optimal server strategy σ has a threshold property; that is, there exists
K such that σ(k) = 1, ∀k < K and σ(k) = 0, ∀k ≥ K.

Proof. 1. Suppose there are some b, c, β, μ, ν such that τ(k) < 1. If this client
strategy is optimal, it implies that the marginal value of holding the kth token is
at least b/β, i.e. V(k) − V(k − 1) ≥ b/β > b. Consider any realized continuation
history following the decision period. We estimate the loss in expected utility from
holding one less token. Because there is only one deviation, in the initial time
period, the subsequent behaviors are exactly the same. The only difference occurs
at the first time when the token holding drops to 0 when the agent is supposed to
buy. At this moment, the agent cannot buy and loses benefit b. Therefore the loss
in utility is β^t b for some t depending on the specific realized history. Because
this analysis is valid for all possible histories, the expected loss in utility is strictly
less than b. This violates the optimality condition. Hence, it is always optimal for
the agent to spend the token if possible.
2. (sketch) Based on the result of part 1, we study an arbitrary server strategy
σ. The utilities of holding different numbers of tokens are inter-dependent:

V(0) = σ(0) ρ (1 − μ) (−c + βV(1)) + ( ρ(σ(0)(μ − 1) + 2) + 1 − 2ρ ) βV(0)

V(k) = σ(k) ρ (1 − μ) (−c + βV(k + 1)) + ρ (1 − ν) (b + βV(k − 1))
       + ( ρ(σ(k)μ + ν + 1 − σ(k)) + 1 − 2ρ ) βV(k),   ∀k = 1, 2, ..., K − 1

V(k) = ρ (1 − ν) (b + βV(k − 1)) + ( ρ(ν + 1) + 1 − 2ρ ) βV(k),   ∀k = K, K + 1, ...

Using these equations, it can be shown that if a strategy is an equilibrium, the
marginal utilities M(k) = V(k + 1) − V(k) form a decreasing sequence. Therefore,
there exists a threshold K such that M(k) ≥ c/β, ∀k < K and M(k) < c/β,
∀k ≥ K.
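Because β < 1, the coupled value equations above define a contraction and can be solved by simple value iteration. The following sketch (our illustration, with made-up values for b, c, β, ρ, μ, ν) computes V(k) under a threshold strategy σ_K and prints the marginal utilities M(k) = V(k + 1) − V(k), which can be inspected for the decreasing property.

```python
import numpy as np

def token_values(K, mu, nu, b=1.0, c=0.5, beta=0.9, rho=0.5, kmax=50, iters=3000):
    """Value iteration for V(k) under the threshold server strategy sigma_K."""
    V = np.zeros(kmax + 1)
    for _ in range(iters):
        W = np.empty_like(V)
        for k in range(kmax + 1):
            serve = rho * (1 - mu) if k < K else 0.0   # earn a token at cost c
            req = rho * (1 - nu) if k >= 1 else 0.0    # spend a token for benefit b
            W[k] = (serve * (-c + beta * V[min(k + 1, kmax)])
                    + (req * (b + beta * V[k - 1]) if k >= 1 else 0.0)
                    + (1 - serve - req) * beta * V[k])
        V = W
    return V

V = token_values(K=3, mu=0.25, nu=0.25)
print(np.round(np.diff(V)[:8], 4))   # marginal utilities M(k) = V(k+1) - V(k)
```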
In view of Proposition 1, we suppress client strategy τ entirely, assuming that
clients always request service whenever possible. Therefore we frequently write
Π = (α, σ) instead of Π = (α, σ, τ ). Moreover, we only need to focus on threshold
server strategies in the following analysis.
Existence of equilibrium is not trivial. To see why, fix a benefit/cost ratio and
consider a threshold protocol Π = (α, σK ). If the discount factor is small, agents
will not be willing to continue providing service until they acquire K tokens; if
β is large, agents will not be willing to stop providing service after they have
acquired K tokens - and it is not obvious that there will be any discount factor
β that makes agents be willing to do so. The following theorem claims that such
β can always be found.
Proposition 2. For each threshold strategy protocol Π = (α, σ_K) and benefit/cost
ratio r > 1, the set {β : Π_K ∈ EQ(r, β)} is a non-degenerate interval [β_L, β_H).
Proof. (sketch) We first see that M(K − 1) > c/β, M(K) < c/β is a necessary
and sufficient condition for a strategy to be an equilibrium. This is established
from the properties of the marginal utilities. Define F(β) = M(K − 1|β) − c/β and
G(β) = M(K|β) − c/β. Hence, the necessary and sufficient condition becomes
F(β) > 0, G(β) < 0.
It can be shown that there exists a unique β_L ∈ (0, 1) such that F(β) ≥ 0,
∀β ∈ (β_L, 1), with equality only at β_L. Next we show that there exists a unique
β_H ∈ (β_L, 1) such that G(β) ≤ 0, ∀β ∈ (β_L, β_H), with equality only at β_H.
To see that such a β_H exists, we prove that G(β) is strictly increasing in β,
that G(β_L) < 0, and that G(1) > 0. Therefore, there must exist a non-degenerate
interval [β_L, β_H] that makes a pure threshold strategy an equilibrium.

If the discount factor is given, the existence of equilibrium can be similarly


characterized by the benefit/cost ratio.
Proposition 3. For each threshold strategy protocol Π = (α, σ_K) and discount
factor β ∈ (0, 1), the set {r : Π_K ∈ EQ(r, β)} is a non-degenerate interval [r_L, r_H).

Proof. (sketch) The proof is similar to the proof of Proposition 2, but this time
we write F(r) = M(K − 1|r) − c/β and G(r) = M(K|r) − c/β as functions
of r. Using similar arguments, we can show that F(r) ≥ 0, ∀r ∈ (r_L, ∞) and
G(r) < 0, ∀r ∈ (r_L, r_H), with r_L < r_H.

From the design perspective, it is important to understand the set of strategies


that can be equilibria for given system parameters. This will become clearer when
we show that the system efficiency depends not only on the strategy (threshold)
but also on the token supply. If the token supply is not designed properly
with regard to the threshold, there will be a strict efficiency loss. For this reason,
understanding the equilibrium thresholds for the system parameters is of
paramount importance.

4 Protocol Design

The protocol designer is interested in maximizing the probability of service
provision Eff = (1 − μ)(1 − ν), which we have defined as the system efficiency. It
depends directly on the fractions of requests (1 − μ) and service (1 − ν), which
are determined by the recommended strategy and the token distribution in the
population.
The token holding distribution is jointly determined by the recommended strategy
and the token supply. Using the definition of the token distribution and its
transition equations, we are able to characterize it for the threshold strategy: it is
completely determined by the feasibility conditions and the relationship

η(k) = η(0) ( (1 − η(0)) / (1 − η(K)) )^k,   ∀k = 0, 1, ..., K − 1.
We will use it in determining the optimal token supply in the next subsection.
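A minimal numerical sketch of this characterization (our illustration): the invariant distribution is geometric on {0, ..., K} with ratio r = (1 − η(0))/(1 − η(K)); normalization fixes the distribution for each r, and the per-capita supply α = Σ_k k η(k) selects r. Assuming, per the threshold structure, that μ = η(0) (agents with no token cannot request) and ν = η(K) (agents holding K tokens do not serve), the efficiency of any supply follows by bisection on r.

```python
import numpy as np

def invariant_distribution(K, alpha):
    """Geometric invariant token distribution for threshold K and supply alpha."""
    def dist(r):
        eta = r ** np.arange(K + 1)
        return eta / eta.sum()                 # normalized geometric distribution
    lo, hi = 1e-9, 1e9                         # bisect on the geometric ratio r
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if (np.arange(K + 1) * dist(mid)).sum() < alpha:
            lo = mid
        else:
            hi = mid
    return dist(lo)

for alpha in (0.5, 1.5, 2.5):                  # K/2 = 1.5 is the optimal supply for K = 3
    eta = invariant_distribution(3, alpha)
    print(alpha, round((1 - eta[0]) * (1 - eta[-1]), 4))   # Eff = (1 - mu)(1 - nu)
```

For K = 3, the supply α = K/2 = 1.5 yields the uniform distribution and efficiency (1 − 1/4)² ≈ 0.5625, as in Proposition 4 below.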

4.1 Optimal Token Supply

In general it seems hard to determine the efficiency of a given protocol or to


compare the efficiency of different protocols. However, for a given threshold
strategy, we can find the most efficient protocol and compute its efficiency. Write
ΠK = (K/2, σK ).

Proposition 4. For a given threshold strategy σ_K, Π_K is the most efficient
protocol; i.e., Eff(α, σ_K) ≤ Eff(Π_K) for every per capita supply of tokens α.
Moreover,

Eff(Π_K) = ( 1 − 1/(K + 1) )².
Proof. It is convenient to first solve the following maximization problem:

maximize   (1 − x_1)(1 − x_2) = 1 − x_1 − x_2 + x_1 x_2
subject to x_1 (1 − x_1)^K = x_2 (1 − x_2)^K,
           0 ≤ x_1, x_2 ≤ 1.

To solve this problem, set f(x) = x(1 − x)^K; a straightforward calculus exercise
shows that if 0 ≤ x_1 ≤ 1/(K + 1) ≤ x_2 ≤ 1 and f(x_1) = f(x_2), then:
(a) x_1 + x_2 ≥ 2/(K + 1), with equality achieved only at x_1 = x_2 = 1/(K + 1);
(b) x_1 x_2 ≤ 1/(K + 1)², with equality achieved only at x_1 = x_2 = 1/(K + 1).
Putting (a) and (b) together shows that the optimal solution to the maximization
problem is x_1 = x_2 = 1/(K + 1), and the maximized objective function value is

max (1 − x_1)(1 − x_2) = ( 1 − 1/(K + 1) )².

Now consider the threshold-K strategy and let η be the corresponding invariant
distribution. If we take x_1 = η(0), x_2 = η(K), then our characterization of the
invariant distribution shows that f(x_1) = f(x_2). By definition,
Eff = (1 − x_1)(1 − x_2), so

Eff ≤ ( 1 − 1/(K + 1) )²,

with equality at the supply α = K/2, for which η is uniform on {0, ..., K}.
Taken together, these are the assertions which were to be proved.
Proposition 4 identifies a sense in which there is an optimal quantity of tokens.
This optimal token supply balances the token distribution in the population in
the sense that there are not too many agents who do not serve or too many
agents who cannot request service. However, these most efficient protocols (for
a given threshold) need not be equilibrium protocols; i.e. such combinations of
token supply and threshold need not be feasible for all system parameters. For
example, given the benefit/cost ratio r, this does not exclude the possibility that
for some discount factor β, we cannot find any threshold protocol with the corresponding
optimal token supply that is an equilibrium. However, we rule out this
possibility by showing that the sustainable discount factor intervals of consecutive
threshold protocols with optimal token supply overlap. Based on this
overlap property, the following proposition describes the equilibrium threshold
in the limiting case.

Proposition 5. 1. For each fixed discount factor β < 1, lim_{r→∞} Eff = 1;
2. For each fixed benefit/cost ratio r > 1, lim_{β→1} Eff = 1.

Proof. (sketch) We prove the first part; the second part is proved similarly.
Consider two protocols Π_1 = (K/2, σ_K) and Π_2 = ((K + 1)/2, σ_{K+1}), which
have consecutive thresholds. The corresponding intervals of discount factors that
sustain equilibrium are [β_1^L, β_1^H] and [β_2^L, β_2^H]. We assert that

β_1^L < β_2^L < β_1^H,   β_2^L < β_1^H < β_2^H.

In words, the sustainable ranges of the discount factors of two consecutive
threshold protocols overlap. To see this, arithmetical exercises show that
M_{Π_1}(K|β_2^L) > c/β_2^L, which leads to β_2^L > β_1^L, and
M_{Π_2}(K|β_1^H) > c/β_1^H, which leads to β_2^L < β_1^H. The assertion follows
immediately by combining this overlapping result with Proposition 4.

As agents become arbitrarily patient or the benefit/cost ratio becomes arbitrarily
large, it is possible to choose equilibrium protocols that achieve efficiency
arbitrarily close to full efficiency (i.e., Eff → 1).

5 Simulations

In Fig. 1 we illustrate the sustainable region of pairs (β, r) of the discount
factor and the benefit/cost ratio for various threshold protocols. For a larger
threshold to be an equilibrium, larger discount factors or larger benefit/cost
ratios are required. Moreover, fixing one of β and r, for a given threshold there is
always a continuous interval of the other parameter that makes the threshold
protocol an equilibrium.

Fig. 1. Threshold equilibrium region (equilibrium regions in the (β, r) plane, r = b/c,
for thresholds K = 1, ..., 5)



Fig. 2. Efficiency loss of a fixed threshold protocol (normalized efficiency vs. discount
factor β, for the optimal equilibrium threshold protocol and a fixed threshold protocol
with K = 3)

Fig. 2 shows the efficiency of an optimal equilibrium protocol and a fixed threshold
protocol. First, the optimal system efficiency goes to 1 as the agents become
sufficiently patient (β → 1). Second, it compares the achievable efficiency with the
efficiency of a protocol for which the strategic threshold is constrained to be
K = 3. The enormous efficiency loss induced by choosing the wrong protocol
supports our emphasis on designing the system in accordance with the system
parameters.

6 Conclusions
In this paper, we designed token-based protocols - a supply of tokens and
recommended strategies - to encourage cooperation in online exchange systems
where a large population of anonymous agents interact with each other. We focused
on pure strategy equilibria and proved that only threshold strategies
can emerge in equilibrium. With this threshold structural result in mind, we
showed that there also exists a unique optimal quantity of tokens that maximizes
the efficiency given the threshold. It balances the population in such a
way that there are not too many agents who do not serve or too many agents
who cannot pay with tokens. Moreover, the proposed protocols asymptotically
achieve full efficiency when the agents become perfectly patient or the benefit/cost
ratio goes to infinity. This paper characterizes the performance of online
exchange systems operated on tokens and emphasizes the importance of a
proper token protocol. Importantly, the token supply serves as a critical design
parameter that needs to be well understood based on the intrinsic environment
parameters.

References
1. Adar, E., Huberman, B.A.: Free riding on Gnutella. First Monday 5(10) (October
2000)
2. Buttyán, L., Hubaux, J.-P.: Stimulating cooperation in self-organizing mobile ad
hoc networks. Mob. Netw. Appl. 8, 579–592 (2003)
3. Feldman, M., Lai, K., Stoica, I., Chuang, J.: Robust incentive techniques for peer-
to-peer networks. In: The 5th ACM Conference on Electronic Commerce, EC 2004,
pp. 102–111. ACM Press, New York (2004)
4. Figueiredo, D., Shapiro, J., Towsley, D.: Incentives to promote availability in peer-
to-peer anonymity systems. In: 13th IEEE International Conference on Network
Protocols, 12 p. (November 2005)
5. Friedman, E.J., Halpern, J.Y., Kash, I.: Efficiency and nash equilibria in a scrip
system for p2p networks. In: 7th ACM Conference on Electronic Commerce, EC
2006, pp. 140–149. ACM, New York (2006)
6. Green, E.J., Zhou, R.: A rudimentary random-matching model with divisible
money and prices. GE, Growth, Math methods 9606001, EconWPA (June 1996)
7. Habib, A., Chuang, J.: Service differentiated peer selection: an incentive mechanism
for peer-to-peer media streaming. IEEE Transactions on Multimedia 8(3), 610–621
(2006)
8. Kandori, M.: Social norms and community enforcement. Review of Economic Stud-
ies 59(1), 63–80 (1992)
9. Kash, I.A., Friedman, E.J., Halpern, J.Y.: Optimizing scrip systems: efficiency,
crashes, hoarders, and altruists. In: Proceedings of the 8th ACM Conference on
Electronic Commerce, EC 2007, pp. 305–315. ACM Press, New York (2007)
10. Kiyotaki, N., Wright, R.: On money as a medium of exchange. Journal of Political
Economy 97(4), 927–954 (1989)
11. Ostroy, J.M., Starr, R.M.: Money and the decentralization of exchange. Economet-
rica 42(6), 1093–1113 (1974)
12. Vishnumurthy, V., Chandrakumar, S., Sirer, E.G.: Karma: A secure economic
framework for peer-to-peer resource sharing (2003)
13. Zhang, Y., Park, J., van der Schaar, M.: Reputation-based incentive protocols in
crowdsourcing applications. In: Proceedings of IEEE Infocom 2012 (2012)
14. Zhou, R.: Individual and aggregate real balances in a random-matching model.
International Economic Review 40(4), 1009–1038 (1999)
Towards a Metric for Communication Network
Vulnerability to Attacks: A Game Theoretic
Approach

Assane Gueye¹, Vladimir Marbukh¹, and Jean C. Walrand²

¹ National Institute of Standards and Technology, Gaithersburg, USA
² University of California, Berkeley, USA

Abstract. In this paper, we propose a quantification of the vulnerabil-


ity of a communication network where links are subject to failures due to
the actions of a strategic adversary. We model the adversarial nature of
the problem as a 2-player game between a network manager who chooses
a spanning tree of the network as communication infrastructure and an
attacker who is trying to disrupt the communication by attacking a link.
We use previously proposed models for the value of a network to derive
payoffs of the players and propose the network’s expected loss-in-value
as a metric for vulnerability. In the process, we generalize the notion of
betweenness centrality: a metric largely used in Graph Theory to mea-
sure the relative importance of a link within a network. Furthermore, by
computing and analyzing the Nash equilibria of the game, we determine
the actions of both the attacker and the defender. The analysis reveals
the existence of subsets of links that are more critical than the others. We
characterize these critical subsets of links and compare them for the dif-
ferent network value models. The comparison shows that critical subsets
depend both on the value model and on the connectivity of the network.

Keywords: Vulnerability Metric, Value of Communication Network,


Spanning Tree, Betweenness Centrality, Critical Links, Nash Equilib-
rium.

1 Introduction
“...one cannot manage a problem if one cannot measure it...”

This study is an effort to derive a metric that quantifies the vulnerability of a


communication network when the links are subject to failures due to the actions
of a strategic attacker. Such a metric can serve as guidance when designing
new networks in adversarial environments. Also, knowing such a value helps
identify the most critical/vulnerable links and/or nodes of the network, which
is an important step towards improving an existing network. We quantify the
⋆ This material is based in part upon work supported by the NIST-ARRA Measurement
Science and Engineering Fellowship Program award 70NANB10H026, through
the University of Maryland.


vulnerability as the loss-in-value of a network when links are attacked by an


adversary. Naturally, the first question towards such quantification is: “what is
the value of a communication network?”
The value of a network depends on several parameters including the number
of agents who can communicate over it. It is widely accepted that the utility of
a network increases as it adds more members: the more members a network has,
the more valuable it is. But, there ends the consensus. There is no unanimity
on how much this value increases when new members are added, and there is
very little (if not zero) agreement on how important a given node or link is for
a network. Experts also do not concur on how much value a given network has.
Attempts to assess the utility of a communication network as a function of the
number of its members include the proposition by David Sarnoff [1] who viewed
the value of a network as a linear function of its number of nodes O(n). Robert
Metcalfe [7] has suggested that the value of a network grows as a function of
the total number of possible connections (O(n2 )). David Reed ([4], [16], [17])
has proposed an exponential (O(2n )) model for the utility of a network. For
Briscoe et. al. ([13], [3]) a more reasonable approximation of the value of a
network as a function of the number of nodes is O(nlog(n)). Finally, the authors
of the present paper have considered a power law model where the value of a
network is estimated as O(n1+a ), a ≤ 1. The parameter a is a design parameter
and needs to be specified. Details of these value models are discussed later in
section 2.1.
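For a rough side-by-side comparison of these models (our illustration; absolute scales are arbitrary, and a = 0.5 is just one choice of the design parameter):

```python
import math

def network_value(n, model, a=0.5):
    """Value of an n-node network under the generic models discussed above."""
    return {"Sarnoff O(n)": n,
            "Metcalfe O(n^2)": n * (n - 1) / 2,     # number of possible connections
            "Reed O(2^n)": 2.0 ** n,                # number of possible subgroups
            "Briscoe O(n log n)": n * math.log(n),
            "Walrand O(n^(1+a))": n ** (1 + a)}[model]

for m in ("Sarnoff O(n)", "Metcalfe O(n^2)", "Reed O(2^n)",
          "Briscoe O(n log n)", "Walrand O(n^(1+a))"):
    print(f"{m:20s} n=100 -> {network_value(100, m):.3g}")
```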
Each of these very generic models is suitable for a particular network setting,
as we will see later. However, they all have a number of limitations; two of which
are particularly of interest to us: They do not take into account the topology of
the network nor do they consider the way in which traffic is being carried over the
network. In this paper, we build upon these models and use them in the process
to quantify the vulnerability of a network. More precisely, we use the models
as a proof of concept for defining the importance of network links relative to
spanning trees. With this definition, we are implicitly considering networks where
information flow over spanning trees. The topology is also taken into account
because the set of spanning trees of the network has a one-to-one correspondence
with its topology. We are particularly interested in an adversarial situation where
links are the target of an attacker. We use a game theoretic approach to model
the strategic interaction between the attacker and the defender1 .
Our focus on spanning trees is not a limitation as the techniques of the pa-
per can be used to study other scenarios where the network manager chooses
some subset of links (shortest path, Hamiltonian cycle, etc...) and the attacker
is targeting more than one link as can be seen in Gueye [8, Chap. 4]. However,
spanning trees have a number of desirable properties that have made them a cen-
tral concept in communication networking. The Spanning-Tree Protocol (STP-
802.1D 1998–[14] and [15]) is the standard link management protocol used in
Ethernet networks.

¹ Throughout this paper we call the defender a "network manager". The defender
can be a human or an automaton that implements the game.

When communication is carried over a spanning tree, any node can reach any
other node. In that sense, a spanning tree can be said to deliver the maximum
value of the network (indeed this ignores the cost of communication). This value
can be determined by using one of the models cited above. Now, assuming that
information flows over a given spanning tree, two scenarios are possible when a
link of the network fails.
If the link does not belong to the spanning tree, then its failure does not affect
the communication. If, on the other hand, the link belongs to the spanning tree,
then the spanning tree is separated into two subtrees, each of which is a
connected subnetwork and delivers some value. However, the sum of the
values delivered by the two subnetworks is expected to be less than the value
of the original network. We define the importance of the link, relative to the
spanning tree, to be this loss-in-value (LIV) due to the failure of the link.
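The LIV computation itself is simple. A minimal sketch (our illustration, using the O(n^{1+a}) value model with a = 0.5 and a 5-node path as the spanning tree, both arbitrary choices): removing a tree edge splits the tree into components of sizes n_1 and n_2, and the edge's LIV is the value of the full network minus the values of the two components.

```python
import networkx as nx

def value(n, a=0.5):
    return n ** (1 + a)          # generic network value model O(n^{1+a})

def link_importance(tree):
    """LIV of each spanning-tree edge: full value minus the two subtree values."""
    n, liv = tree.number_of_nodes(), {}
    for e in list(tree.edges):
        tree.remove_edge(*e)
        n1, n2 = (len(c) for c in nx.connected_components(tree))
        liv[e] = value(n) - value(n1) - value(n2)
        tree.add_edge(*e)        # restore the tree
    return liv

T = nx.path_graph(5)             # a path used as the spanning tree
print(link_importance(T))
```

Since the value model grows faster than linearly, links whose removal splits the tree into comparable halves have the largest LIV.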
Link failures may occur because of random events (faults) such as human
errors and/or machine failures: this is dealt with under the subject of reliability
and fault tolerance [12]. They also can be the result of the action of a malicious
attacker whose goal is to disrupt the communication. It is this type of failure that
is the main concern of this paper. A network manager (defender) would like to
avoid this disruption by choosing an appropriate communication infrastructure.
We model this scenario as a 2-player game where the defender is choosing a
spanning tree to carry the communication in anticipation of an intelligent attack
by a malicious attacker who is trying to inflict the most damage. The adversary
also plans in anticipation of the defense. We use the links’ LIV discussed above
to derive payoffs for both players.
Applying game theoretic models to the security problem is a natural process
and it has recently attracted a lot of interest (see surveys [18], [11]). In this paper,
we set up a game on the graph of a network and consider the Nash equilibrium
concept. We propose the expected LIV of the game for the network manager
as a metric for vulnerability. This value captures how much loss an adversary
can inflict to the network manager by attacking links. By analyzing the Nash
equilibria of the game, we determine the actions of both the attacker and the
defender. The analysis reveals the existence of a set of links that are most critical
for the network. We identify the critical links and compare them for the different
network value models cited above. The comparison shows that the set of critical
links depends on the value model and on the connectivity of the network.
In the process of quantifying the importance of a communication link, we
propose a generalization of the notion of betweenness centrality which, in its
standard form, is defined with respect to shortest paths ([6]). We consider networks
where information flows over spanning trees, hence we use spanning trees
in lieu of paths. Our generalization allows both the consideration of arbitrary
(instead of binary) weights of the links as well as preference for spanning tree
utilization.
The remainder of this paper is organized as follows. The next section 2.1
discusses the different network value models that we briefly introduced above.
We use these models to compute the relative importance of the links with respect

to spanning trees. This is shown in section 2.2, followed by our generalization


of the notion of betweenness centrality in section 2.3. The strategic interaction
between the network manager and the attacker is modeled as a 2-player game
which is presented in section 3.1. The Nash equilibrium theorem of the game
is stated in section 3.2 followed by a discussion and analysis of its implications
in section 4. Section 4.1 discusses our choice of metric for the vulnerability of
a network. In section 4.2 we compare the critical subsets of a network for the
different value models cited above. Concluding remarks and future directions are
presented in section 5. All our proofs are presented in the appendix of our online
report [9].

2 On the Value of Communication Networks

The value of a network depends on several parameters including the number of nodes, the number of links, the topology, and the type of communication/information that is carried over the network. Assessing such value is a subjective topic and, to the knowledge of the authors, there is no systematic quantification of the value of a communication network. Next, we discuss some attempts that have been made to measure the utility of a network as a function of its number of nodes.

2.1 Network Value Models

Sarnoff's Law:
Sarnoff's law [1] states that the value of a broadcast network is proportional to the number of users (O(n)). This law was mainly designed for radio/TV broadcast networks, where the popularity of a program is measured by the number of listeners/viewers. The high advertising cost during prime-time shows and other popular events can be explained by Sarnoff's law: as more viewers are expected to watch a program, a higher price is charged per second of advertising. Although Sarnoff's law has been widely accepted as a good model for broadcast networks, many critics say that it underestimates the value of general communication networks such as the Internet.

Metcalfe's Law:
Metcalfe's law [5] was first formulated by George Gilder (1993) and attributed to Robert Metcalfe, who used it mostly in the context of the Internet. The law states that the value of a communication network is proportional to the square of the number of nodes. Its foundation is the observation that in a general network with n nodes, each node can establish n − 1 connections. As a consequence, the total number of undirected connections is equal to n(n − 1)/2 ∼ O(n^2). This observation is particularly true in Ethernet networks, where everything is "logically" connected to everything else. Metcalfe's law has long been held up alongside Moore's law as a foundation of Internet growth.

Walrand's Law:
Walrand's law generalizes the previous laws by introducing a parameter a. The intuition behind this law is as follows. Imagine a large tree of degree d that is rooted at you. Your direct children in the tree are your friends. The children of these children are the friends of your friends, and so on. Imagine that there are L ≥ 2 levels. The total number of nodes is n = d(d^L − 1)/(d − 1) + 1. If d is large, this number can be roughly approximated by n ≈ d^L. Assume that you only consider your direct friends, i.e., about d people. Then the value of the network to you is O(d) = O(n^a) where a = 1/L. If you care about your friends and their friends (i.e., d^2 people), then your value of the network is O(n^(2/L)). If all the nodes up to level l ≤ L are important to you (d^l nodes), then the network has a value of O(n^(l/L)) to you. Repeating the same reasoning for each user (node), the total value of the network is approximately equal to O(n · n^a) = O(n^(1+a)) with 0 < a ≤ 1. The parameter a is a characteristic of the network and needs to be determined. Notice that if all nodes value children at all levels (l = L), the total value of the network becomes n^2, which corresponds to Metcalfe's law (a = 1). If, on the other hand, a → 0, we recover Sarnoff's model.
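A two-line numeric check of the approximation above (a minimal sketch; d = 10 and L = 3 are arbitrary illustrative values, not taken from the paper):

```python
# Check n = d(d^L - 1)/(d - 1) + 1 against the approximation n ≈ d^L.
d, L = 10, 3                          # illustrative values
n = d * (d**L - 1) // (d - 1) + 1     # exact node count: 1111
print(n, d**L)                        # 1111 vs. 1000: same order of magnitude
```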

Reed's Law:
Reed's law, also called the Group-Forming law, was introduced by David Reed ([16], [4], [17]) to quantify the value of networks that support the construction of communicating groups. A group-forming network resembles a network with smart nodes that, on demand, form into such configurations. Indeed, the number of possible groups that can be formed over a network of n nodes is O(2^n). Reed's law has been used to explain many new social network phenomena: important messages posted on social networking platforms such as Twitter and Facebook have been witnessed to spread exponentially fast.

Briscoe, Odlyzko, and Tilly (BOT)'s Law:
Briscoe, Odlyzko, and Tilly ([3], [13]) have proposed an O(n log(n)) rule for the valuation of a network of size n. Their law is mostly inspired by Zipf's law, which states that if we order a large collection of items by size or popularity, the second element in the collection will be about half the measure of the first, the third element will be about 1/3 of the first, and the k-th element will measure about 1/k of the first. Setting the measure of the first element (arbitrarily) to 1, the sequence looks like (1, 1/2, 1/3, ..., 1/k, ..., 1/n). Now, assuming that each node in the network assigns value to the other nodes according to Zipf's law, the total value of the network to any given node will be proportional to the harmonic sum Σ_{i=1}^{n−1} 1/i, which approaches log(n). Summing over the nodes, we get the n log(n) rule. This growth rate is faster than the linear growth of Sarnoff's law and does not have the overestimating downside that is inherent to Reed's and Metcalfe's laws. It also has a diminishing-return property that is missing in all the other models.
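To compare the five growth laws side by side, the sketch below tabulates them with every proportionality constant set to 1 (an arbitrary normalization of ours) and the Walrand exponent fixed at a = 0.6, the value used later in section 4.2:

```python
import math

# The five network value models f(n), proportionality constants set to 1.
value_models = {
    "Sarnoff":  lambda n: n,                # O(n)
    "BOT":      lambda n: n * math.log(n),  # O(n log n)
    "Walrand":  lambda n: n ** 1.6,         # O(n^(1+a)) with a = 0.6
    "Metcalfe": lambda n: n ** 2,           # O(n^2)
    "Reed":     lambda n: 2.0 ** n,         # O(2^n)
}

for n in (8, 64, 512):
    row = "  ".join(f"{name}={f(n):.3g}" for name, f in value_models.items())
    print(f"n={n:3d}:  {row}")
```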

2.2 Assessing Importance of Links via Spanning Trees
Assuming that a model has been determined for the value of a network, we
quantify the importance of a network link with respect to a spanning tree as the
loss-in-value (LIV) when the link fails while communication is carried over the
tree.


Fig. 1. Determining the loss-in-value (LIV) of a network link. a) Complete network of n = 8 nodes, with the link e of interest shown in bold. b) A particular spanning tree T of the graph containing link e. c) When link e is removed, the network is disconnected into 2 connected components, each with 4 nodes.

The LIV of a link e, relative to a given spanning tree T, is determined as follows (see Figure 1). Assume that communication is carried over T and delivers a value of f(n) − η(T), where η(T) is the cost of maintaining spanning tree T, and f(0) = 0 (an empty network delivers no value). Now assume that link e of the network fails. If e ∈ T, then T is partitioned into 2 subtrees; each subtree Ti, i ∈ {1, 2}, represents a connected component with ni nodes, where n1 + n2 = n. The net value of the resulting disconnected network is f(n1) + f(n2) − η(T), where f(ni) is the value of connected component i. When link e is removed, some exchanges that could be carried on the original network become impossible. As such, it is reasonable to assume that f(·) satisfies f(n) ≥ f(n1) + f(n2), which is the case for all the network value models cited above. We define the importance of link e, relative to spanning tree T, as this LIV f(n) − (f(n1) + f(n2)) when link e fails. If the link does not belong to the spanning tree, then removing it leaves the network connected; hence its LIV is equal to zero. More formally, the importance of link e relative to T is the (normalized) LIV λ(T, e):
to T is the (normalized) LIV λ(e, T ):

f (n1 ) + f (n2 )
λ(T, e) = 1 − . (1)
f (n)

with the understanding that if e ∈/ T , n1 = n and n2 = 0, giving λ(T, e) = 0.


Writing this expression for all spanning trees and all links of the network, we
build the tree-link LIV matrix Λ defined by Λ[T, e] = λ(T, e).
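On small graphs, Λ can be built by brute force. The sketch below (our illustration, not code from the paper) enumerates spanning trees as the (n − 1)-edge subsets that connect all nodes and fills Λ[T, e] using (1); the 4-node cycle and Metcalfe's f(n) = n^2 are arbitrary example choices:

```python
from itertools import combinations

def component_size(n, edges, start):
    # Size of the connected component containing `start` (n nodes, `edges`).
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen)

def spanning_trees(n, edge_list):
    # Brute force: every (n-1)-edge subset connecting all n nodes is a spanning tree.
    for T in combinations(edge_list, n - 1):
        if component_size(n, T, 0) == n:
            yield T

def liv_matrix(n, edge_list, f):
    # Lambda[T][e] = lambda(T, e) = 1 - (f(n1) + f(n2))/f(n), from Eq. (1).
    rows = []
    for T in spanning_trees(n, edge_list):
        row = []
        for e in edge_list:
            if e not in T:
                row.append(0.0)                      # non-tree link: zero LIV
            else:
                rest = [x for x in T if x != e]
                n1 = component_size(n, rest, e[0])   # one side of the split
                row.append(1.0 - (f(n1) + f(n - n1)) / f(n))
        rows.append(row)
    return rows

# 4-node cycle under Metcalfe's model f(n) = n^2 (illustrative choices).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Lambda = liv_matrix(4, edges, lambda k: k**2)
print(len(Lambda), "spanning trees; first row:", Lambda[0])
# -> 4 spanning trees; first row: [0.375, 0.5, 0.375, 0.0]
```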
Remark 1. With the definition in (1), the LIV of a link relative to any spanning tree is always equal to zero under Sarnoff's law (i.e., λ(T, e) = 0 for all e and T). As a consequence, we drop Sarnoff's law in the analysis below and consider instead the simple model (GWA) introduced in [10]. It gives the same normalized LIV of 1 if the link e belongs to the spanning tree and 0 otherwise (i.e., λ(T, e) = 1_{e∈T}). The model basically assumes that whenever a link on the spanning tree is removed (i.e., successfully attacked, hence disconnecting the network), the network loses its entire value.
Table 1 shows the LIV of links for the different models presented above (Sarnoff replaced by GWA). It is assumed that removing link e divides spanning tree T into two subtrees with n1 and n2 nodes, respectively (n1 + n2 = n).

Table 1. Normalized LIV of link e relative to spanning tree T for the different laws. Removing link e from spanning tree T divides the network into two subnetworks with n1 and n2 nodes, respectively (n1 + n2 = n).

Model      Normalized LIV
GWA        1_{e∈T}
Metcalfe   1 − (n1^2 + n2^2)/n^2
Reed       1 − 2^(−n1) − 2^(−n2)
BOT        1 − (n1 log(n1) + n2 log(n2))/(n log(n))
Walrand    1 − (n1^(1+a) + n2^(1+a))/n^(1+a)
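Plugging the split of Figure 1 (n = 8, n1 = n2 = 4) into Table 1 shows how sharply the models disagree on the severity of the same failure (a minimal check of ours; a = 0.6 is the Walrand exponent used in section 4.2):

```python
import math

n, n1, n2 = 8, 4, 4   # the split of Figure 1
a = 0.6               # Walrand exponent, as used in section 4.2
liv = {
    "GWA":      1.0,  # e is assumed to belong to T
    "Metcalfe": 1 - (n1**2 + n2**2) / n**2,
    "Reed":     1 - 2.0**(-n1) - 2.0**(-n2),
    "BOT":      1 - (n1*math.log(n1) + n2*math.log(n2)) / (n*math.log(n)),
    "Walrand":  1 - (n1**(1 + a) + n2**(1 + a)) / n**(1 + a),
}
for model, v in liv.items():
    print(f"{model:8s} {v:.4f}")   # GWA 1.0000, Metcalfe 0.5000, Reed 0.8750, ...
```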

2.3 A Generalization of the Betweenness Centrality Measure
The quantification we have described above for the significance of a link is relative to spanning trees: there is a different value for each tree. In general, one would like to get a sense of the importance of a link for the overall communication process. Betweenness centrality is a measure that has long been used for that purpose. Next, we propose a quantification of the importance of a link within a network that generalizes the notion of betweenness. We start by recalling the betweenness centrality measure as it was defined by Freeman [6].
For a link e and nodes i and j, let gij be the number of shortest paths between i and j, and let gij(e) be the number of those paths that contain e. The partial betweenness measure of e with respect to i and j is defined as ϑij(e) = gij(e)/gij, and the betweenness of e is defined as ϑ(e) = Σ_{i<j} ϑij(e). Freeman [6] made the observation that, in the definition of betweenness, gij(e) can be seen as a weight given to e for a communication between i and j, and 1/gij can be seen as a (uniform) probability of choosing among the several alternative geodesics that can carry communication between i and j.
Using this observation and using spanning trees (in lieu of shortest paths), we can easily generalize the betweenness centrality to quantify the importance of a link as

ϑ(e, λ, α) = Σ_T α_T λ(T, e),    (2)

where the summation is now over spanning trees. The parameter λ(T, e) is the weight of link e for spanning tree T, and α_T is the probability (preference) of using T as communication infrastructure.
In general, λ and α can be determined by considering relevant aspects of the communication network (e.g., cost of utilizing the links, overall communication delay, vulnerability of links). In this paper, the parameters λ are chosen to be equal to the LIVs of the links relative to spanning trees, and α is chosen to be the mixed-strategy Nash equilibrium of a game between a network manager and an attacker. Details of the game are presented next.
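Equation (2) is a one-line computation once Λ and α are available; a minimal sketch with a made-up 2-tree, 3-link Λ:

```python
def generalized_betweenness(Lambda, alpha):
    # theta(e) = sum over trees T of alpha_T * lambda(T, e), Eq. (2).
    return [sum(a * row[e] for a, row in zip(alpha, Lambda))
            for e in range(len(Lambda[0]))]

# Toy example: 2 spanning trees, 3 links, uniform preference over the trees.
Lambda = [[0.5, 0.375, 0.0],
          [0.0, 0.375, 0.5]]
print(generalized_betweenness(Lambda, [0.5, 0.5]))  # -> [0.25, 0.375, 0.25]
```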

3 Game Theoretic Approach

3.1 Game Model
The game is played over the links of a network whose topology is given by a connected undirected graph G = (V, E) with |E| = m links and |V| = n nodes. The set of spanning trees is denoted T; we let N = |T|.
To get all nodes connected in a cycle-free way, the network manager chooses a spanning tree T ∈ T of the graph. Running the communication on spanning tree T imposes a maintenance cost of η(T) on the network manager. If link e is attacked, the total cost to the manager is η(T) + λ(T, e), where λ(T, e) is the LIV introduced in (1). The attacker simultaneously selects an edge e ∈ E to attack. Each edge e ∈ E is associated with some cost μ(e) that the attacker needs to spend to launch a successful attack on e, and a successful attack gives a reward of λ(T, e). Hence, the net attack reward is equal to λ(T, e) − μ(e) for the attacker. It is assumed that the attacker has the option e_∅ of not attacking, with λ(T, e_∅) = 0 for all T, and μ(e_∅) = 0.
We are mainly interested in analyzing mixed-strategy Nash equilibria of the game, where the defender chooses α over T to minimize the expected net communication cost L(α, β), while the attacker chooses β over E ∪ {e_∅} to maximize the expected net reward R(α, β):

L(α, β) = Σ_{T∈T} α_T [ η(T) + Σ_{e∈T} β_e λ(T, e) ],    (3)

R(α, β) = Σ_{e∈E} β_e [ Σ_{T: e∈T} α_T λ(T, e) − μ(e) ].    (5)

In this paper, we focus on the case where η(T) = η is constant; the maintenance cost is then not relevant to the optimization of L(α, β), which becomes the minimization of Σ_{T∈T} α_T Σ_{e∈T} β_e λ(T, e). As a consequence, we ignore η(T) for the rest of this paper. The general case of η(T) will be considered in subsequent studies.
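With η dropped, L and R are bilinear in (α, β) and can be evaluated directly from the matrix Λ. A sketch with made-up numbers, assuming numpy as a dependency (summing over all e is equivalent to summing over e ∈ T, since λ(T, e) = 0 for e ∉ T):

```python
import numpy as np

def expected_loss(alpha, beta, Lambda):
    # L(alpha, beta) with constant eta dropped: sum_T alpha_T sum_e beta_e lambda(T, e).
    return alpha @ Lambda @ beta

def expected_reward(alpha, beta, Lambda, mu):
    # R(alpha, beta) = sum_e beta_e (sum_T alpha_T lambda(T, e) - mu(e)).
    return beta @ (Lambda.T @ alpha - mu)

Lambda = np.array([[0.5, 0.375, 0.0],     # 2 trees x 3 links, made-up LIVs
                   [0.0, 0.375, 0.5]])
alpha = np.array([0.5, 0.5])              # defender mixes over the two trees
beta  = np.array([0.0, 1.0, 0.0])         # attacker targets the middle link
mu    = np.zeros(3)                       # zero attack costs
print(expected_loss(alpha, beta, Lambda))        # 0.375
print(expected_reward(alpha, beta, Lambda, mu))  # 0.375 (zero-sum when mu = 0)
```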

3.2 Nash Equilibrium Theorem
To state the NE theorem of the game, we need a number of definitions.
For each subset of edges E ⊆ E, we let Λ_E be the matrix Λ in which the columns corresponding to links not in E are set to zero. The matrix Λ is defined in section 2.2 and its entries are given in (1).
Definition 1. For any subset of links E ⊆ E, we define the function κ(E):

κ : 2^E → R_+,  E ↦ κ(E) = min{ 1ᵀy : y ∈ R^m_+, Λ_E y ≥ 1 }.    (6)

κ(E) is the value of a linear program (LP) that might be infeasible (e.g., when a row of Λ_E is all zeros). However, its dual is always feasible (see [9, App. E]), and when the dual LP is bounded, the primal is necessarily feasible [2]. Let y_E be a solution of the primal program whenever the dual LP is bounded. If the dual is unbounded for some subset E, we let y_E = K·1_m for an arbitrarily large constant K, where m = |E| and 1_m is the all-ones vector of length m. With this "fix", κ(E) = mK when the dual LP is unbounded. Hence, we can define the following quantities.
Definition 2. The probability distribution induced by E is defined as β_E = y_E/κ(E). The induced expected net reward θ(E) and the maximum induced expected net reward θ* are defined by

θ(E) := 1/κ(E) − Σ_{e∈E} β_E(e) μ(e),  and  θ* := max_E θ(E).    (7)

We call a subset E critical if θ(E) = θ* and we let C be the set of all critical subsets.
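Since (6) is an ordinary LP, κ(E), β_E, and θ(E) can be computed with any LP solver. A sketch using scipy.optimize.linprog (an assumed dependency, not used in the paper; the Λ_E below is made up, and primal infeasibility corresponds to the unbounded-dual case handled by the "fix" above):

```python
import numpy as np
from scipy.optimize import linprog

def kappa_and_beta(Lambda_E):
    # kappa(E) = min 1'y  s.t.  Lambda_E y >= 1, y >= 0   (Eq. (6));
    # linprog uses <= constraints, so both sides are negated.
    k, m = Lambda_E.shape
    res = linprog(c=np.ones(m), A_ub=-Lambda_E, b_ub=-np.ones(k))
    if res.status != 0:                 # infeasible primal <-> unbounded dual
        return np.inf, None
    return res.fun, res.x / res.fun     # kappa(E) and beta_E = y_E / kappa(E)

Lambda_E = np.array([[0.5, 0.375, 0.0],  # made-up LIV matrix restricted to E
                     [0.0, 0.375, 0.5]])
mu = np.zeros(3)                         # zero attack costs
kappa, beta_E = kappa_and_beta(Lambda_E)
theta = 1.0 / kappa - beta_E @ mu        # theta(E) from Eq. (7)
print(kappa, beta_E, theta)              # ~2.667, [0, 1, 0], 0.375
```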
Remark 2.
– In our online report [9, App. E], we argue that a critical subset E is such that 0 < κ(E) < ∞; hence its corresponding y_E and β_E are always well-defined.
– With the definition of κ(·), if μ = 0 and a subset E of links is critical, then any subset F ⊇ E is also critical. In this case, the most critical subset is the critical subset of minimum size. More details can be found in [9].
Theorem 1. For the game defined above, the following always hold.
1. If θ* ≤ 0, then "No Attack" (i.e., β(e_∅) = 1) is always an optimal strategy for the attacker. In this case, the equilibrium strategy (α_T, T ∈ T) for the defender is such that

ϑ(e, λ, α) = Σ_{T∈T} α_T λ(T, e) ≤ μ(e),  ∀ e ∈ E.    (8)

The corresponding payoff is 0 for both players.

2. If θ* ≥ 0, then for every probability distribution (γ_E, E ∈ C) on the set of critical subsets, the attacker's strategy (β(e), e ∈ E) defined by β(e) := Σ_{E∈C} γ_E β_E(e) is in Nash equilibrium with any strategy (α_T, T ∈ T) of the defender that satisfies the following properties:

ϑ(e, λ, α) − μ(e) = θ* for all e ∈ E such that β(e) > 0,
ϑ(e, λ, α) − μ(e) ≤ θ* for all e ∈ E.    (9)

Furthermore, there exists at least one such strategy α. The corresponding payoffs are θ* for the attacker and r(γ) := Σ_{E∈C} γ_E/κ(E) for the defender.
3. If μ = 0, then every Nash equilibrium pair of strategies for the game has the form described above.

4 Discussion and Analysis

The NE theorem has three parts. If the quantity θ* is negative, then the attacker has no incentive to attack. For such a choice to hold in an equilibrium, the defender has to choose his strategy α as given in (8); such an α always exists. When θ* ≥ 0, there exists an equilibrium under which the attacker launches an attack that focuses only on edges of critical subsets. The attack strategies (probabilities of attacking the links) are given by convex combinations of the induced distributions of critical subsets, and the corresponding defender's strategies are given by (9). When there is no attack cost, the attacker always launches an attack (θ* > 0), and the theorem states that all Nash equilibria of the game have the structure in (9).

4.1 Vulnerability Metric and the Importance of Links

For simplicity, let us first assume that there is no attack cost, i.e., μ = 0. In this case, θ(E) = 1/κ(E) and θ* > 0. Also, a subset of links E is critical if and only if κ(E) is minimal. Since in this case the game is zero-sum, the defender's expected loss is also θ* = 1/(min_E κ(E)). θ* depends only on the graph and the network value model (f(n)). It measures the worst-case loss/risk that the network manager should expect in the presence of a (strategic) attacker. Notice that in our setting, a powerful attacker is one who does not have a cost of attack (i.e., μ = 0). When θ* is high, the potential loss in connectivity is high. When it is low, an attacker has very little incentive to attack; hence the risk from an attack is low. Thus, θ* can be used as a measure of the risk of disconnection in the presence of a strategic attacker: a graph with a high θ* is a very vulnerable one.
This vulnerability metric also corresponds to a quantification of the importance of the most critical links. This is captured by the inequalities in (9), which, when μ = 0, become

ϑ(e, λ, α) ≤ θ* for all e ∈ E,    (10)

with equality whenever link e is targeted with positive probability (β(e) > 0) at equilibrium. From (9) we see that β(e) > 0 only if edge e belongs to a critical subset, and hence is critical. Thus, the attacker focuses its attack only on the critical links, which inflict the maximum loss on the defender.
For the defender, since the game is zero-sum, the Nash equilibrium strategy corresponds to the min-max strategy. In other words, his choice of α minimizes the maximum expected loss. Hence, the defender's equilibrium strategy α can be interpreted as the best way (in the min-max sense) to choose a spanning tree in the presence of a strategic adversary. Combining this interpretation with our generalization of betweenness centrality in (2), we get a way to quantify the importance of the links to the overall communication process. The inequalities in (10) above say that the links that are the most important to the defender (i.e., those with maximum ϑ(e, λ, α)) are the ones that are targeted by the attacker (the most critical). This unifies the positive view of the importance of links, when it comes to participation in the communication process, with the negative view of criticality, when it comes to being the target of a strategic adversary. This is not surprising: since the attacker's goal is to cause the maximum damage to the network, it makes sense that she targets the most important links.
When the cost of attack is not zero (μ ≠ 0), our vulnerability metric θ* takes it into account. For instance, if the attacker has to spend too much effort to successfully launch an attack, to the point where the expected net reward θ* is negative, the theorem tells us that, unsurprisingly, the attacker will choose not to launch an attack. To "force" the attacker to hold to such a choice (i.e., to maintain the equilibrium), the defender has to randomly pick a spanning tree according to (8). With this choice, the relative value of any link is less than the amount of effort needed to attack it, which means that any attack will result in a negative net payoff for the attacker. When μ is known, such a choice of α can be seen as a deterrence tactic for the defender.
If the vulnerability θ* is greater than zero, then there exists an attack strategy that only targets critical links. To counter such an attack, the defender has to draw a spanning tree according to the distribution α in (9). For such a choice of tree, the relative importance of any critical link, offset by the cost of attacking the link, is equal to θ*; for any other link, this difference is less than θ*. In this case, the criticality of a link is determined not only by how much importance it has for the network, but also by how much it would take for the adversary to successfully attack it. Hence, when μ ≥ 0, θ* is a measure of the willingness of an attacker to launch an attack: it includes the loss-in-value for the defender as well as the cost of attack for the attacker.
Observe that when μ ≠ 0, the theorem does not say anything about the existence of other Nash equilibria. It is our conjecture (verified in all simulations) that even if there were other equilibria, θ* would still be the maximum payoff that the attacker could ever receive. Hence, it measures the worst-case scenario for the defender.


Fig. 2. Example of critical subsets for different value models. a) GWA model. b) BOT, Walrand, and Metcalfe's models. c) Reed's model.

4.2 Critical Subsets and Network Value Models

In this section we discuss how the critical subsets depend on the model used for the value of the network. Figure 2 shows an example network with the critical subsets for the different value models discussed earlier. The example shows a "core" network (i.e., the inner links) and a set of bridges connecting it to peripheral nodes. A bridge is a single link whose removal disconnects the network. In all figures, the critical subset of links is shown with dashed lines. In this discussion we mainly assume that the attack cost μ is equal to zero.
Figure 2.a shows the critical subset corresponding to the GWA link cost model introduced in [10], for which λ(T, e) = 1_{e∈T}. With this model, the defender loses everything (i.e., 1) whenever the attacked link belongs to the chosen spanning tree. Since a bridge is contained in every spanning tree, attacking a bridge gives the maximum outcome to the attacker. As a consequence, the critical subsets correspond to the set of bridges, as can be observed in the figure. In fact, with the GWA value model and Definition 1 of [10], one can easily show that κ(E) = |E|/M(E), where M(E) = min_T |T ∩ E|. Notice that if E is a disconnecting set (i.e., removing the edges in E divides the graph into 2 or more connected components), then M(E) ≥ 1. Now, if e is a bridge, |T ∩ {e}| = 1 for all spanning trees T, implying that M({e}) = 1 and θ({e}) = κ({e}) = 1, which is the maximum possible value of θ*. As a consequence, each bridge is a critical subset and any convex combination over the bridges yields an optimal attack.
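This bridge argument is easy to verify numerically under the GWA model; a self-contained sketch (the 5-node graph with one bridge is an arbitrary example of ours):

```python
from itertools import combinations

def connected(n, edges):
    # Depth-first connectivity check over the given edges.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == n

def gwa_kappa(n, edge_list, E):
    # GWA: kappa(E) = |E| / M(E), where M(E) = min over spanning trees T of |T ∩ E|.
    trees = [T for T in combinations(edge_list, n - 1) if connected(n, T)]
    M = min(sum(e in E for e in T) for T in trees)
    return float("inf") if M == 0 else len(E) / M

# Cycle 0-1-2-3-0 plus pendant node 4: edge (3, 4) is a bridge.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
print(gwa_kappa(5, edges, {(3, 4)}))  # 1.0: theta({bridge}) = 1/kappa = 1, the maximum
print(gwa_kappa(5, edges, {(0, 1)}))  # inf: a lone cycle edge is avoidable (M = 0)
```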
Figure 2.b depicts the critical subsets for the Metcalfe, BOT, and Walrand (a = 0.6) models. For all these models (as well as for Reed's model), the function f(x) − (f(x1) + f(x2)), where x1 + x2 = x, is maximized when x1 = x2 = x/2. This suggests that attacks targeting links that evenly divide (most) spanning trees are optimal. This conjecture "seems" to be confirmed by the examples shown in the figure: the most critical links are the innermost, or core, links of the network for all three models. The Nash equilibrium attack distributions are slightly different for the 3 models; the distribution on links (1, 2, 3, 4, 5) is given in Table 2 for the Metcalfe, BOT, and Walrand (a = 0.6) models. Notice that for all models, the middle link (2) is attacked with a higher probability.

Table 2. Attack probabilities on links (1, 2, 3, 4, 5) for the Metcalfe, BOT, and Walrand models

Model              Attack probability
Metcalfe           (0.1875, 0.2500, 0.1875, 0.1875, 0.1875)
BOT                (0.1911, 0.2356, 0.1911, 0.1911, 0.1911)
Walrand (a = 0.6)  (0.1884, 0.2465, 0.1884, 0.1884, 0.1884)

Although Reed's (exponential) model also has the property discussed in the previous paragraph, the critical subset under Reed's model is different, as can be seen in Figure 2.c. While the Metcalfe, BOT, and Walrand models lead to the core network being critical, with Reed's model the critical links are those giving access to the core network. Each of these links is attacked with the same probability. This might be a little surprising because it contradicts the conjecture that innermost links tend to be more critical. However, observing the attacker's reward function 1 − (f(n1) + f(n − n1))/f(n) shown in Figure 3, Reed's model coincides with the GWA model over a wide range of n1. This means that any link that separates (most of) the spanning trees into subtrees of n1 and n − n1 nodes gives the maximum reward to the attacker, for most values of n1. Also, notice that since the core network is "well connected", the defender has many options for choosing a spanning tree. This means that in the core, the attacker has less chance of disrupting the communication. Links accessing the core, on the other hand, deliver high gain and better chances of disrupting the communication. Hence, the best strategy for the attacker is, in this case, to target access to the core. Notice that the Metcalfe, BOT, and Walrand (a ≤ 1) models do not present this optimal tradeoff.
By choosing the parameter a to be sufficiently large in the Walrand model, we have (experimentally) observed that the critical subset moves from the core to the one of the GWA model (the bridges) for very large values of a. In fact, for all network topologies we have considered (more than 50), we could always choose the parameter of the Walrand model so that the critical subset matches the one of the GWA model. This implies that as the model loss function 1 − (f(n1) + f(n − n1))/f(n) gets closer to the GWA function 1_{e∈T}, the critical subset moves away from the inner links to the outer links.
These observations indicate that the critical subsets of a graph depend on the value model used to set up the game. The value model is, however, not the only factor that characterizes the critical subset(s) of a graph. Figure 4 shows the same network as in the previous example with one additional (core) link. With this link, the connectivity of the network is enhanced. The critical subset does not change for the GWA model. However, for the other 4 models, the critical subset is now the access to the core. This suggests that connectivity is another factor that characterizes the critical subset(s).
As was observed (via simulations) in the previous example, here also, when the parameter a of Walrand's model is chosen sufficiently large, the critical subsets become the same as the GWA critical subsets.

Fig. 3. Comparison of the loss functions 1 − (f(n1) + f(n − n1))/f(n) when a link belonging to the chosen spanning tree is cut, dividing it into 2 subtrees of n1 and n − n1 nodes. (x-axis: n1; y-axis: 1 − (f(n1) + f(n − n1))/f(n); curves: Walrand with a = 0.4, a = 1.4, and a = 25; Metcalfe; Reed; BOT; GWA.) For GWA, since λ(T, e) = 1_{e∈T}, the loss is always 1. The models GWA, Reed, and Walrand (for large values of a) overlap over a wide range of values of n1.
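The curves of Figure 3 are straightforward to reproduce numerically (a sketch; n = 20 is consistent with the figure's axis, and the model set mirrors its legend):

```python
import math

def loss(f, n, n1):
    # Attacker's reward when a tree link splits the network into n1 and n - n1 nodes.
    return 1 - (f(n1) + f(n - n1)) / f(n)

n = 20
models = {
    "Metcalfe":       lambda k: k**2,
    "Reed":           lambda k: 2.0**k,
    "BOT":            lambda k: k * math.log(k) if k > 1 else 0.0,
    "Walrand (a=25)": lambda k: k**26.0,
}
for name, f in models.items():
    print(name, [round(loss(f, n, n1), 3) for n1 in (1, 5, 10)])
# Reed stays near 1 over a wide range of n1, mirroring GWA's constant loss of 1.
```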

Fig. 4. Example of critical subsets for different value models. a) GWA model. b) BOT, Walrand, Metcalfe, and Reed's models.

5 Conclusion and Future Work

In this study, we quantify the vulnerability of a communication network whose links are subject to failures due to the actions of a strategic attacker. Such a metric can serve as guidance when designing new communication networks, and determining it is an important step towards improving existing networks.
We build upon previously proposed models for the value of a network to quantify the importance of a link, relative to a spanning tree, as the loss-in-value when communication is carried over the tree and the link is failed by a strategic attacker. We use these values to set up a 2-player game where the defender (network manager) chooses a spanning tree of the network as communication infrastructure and the attacker tries to disrupt the communication by attacking one link. We propose the equilibrium expected loss-in-value as a metric for the vulnerability of the network. We analyze the set of Nash equilibria of the game and discuss its implications. The analysis shows the existence of subsets of links that are more critical than the others. We characterize these critical subsets and, using examples, we show that they depend on the network value model as well as on the connectivity of the graph. The nature of this dependency is an interesting question that we plan to investigate in future studies. Finally, we propose a generalization of the notion of betweenness centrality that allows different weights for the links as well as preferences among the graph structures that carry the communication (e.g., spanning trees in this paper).
Several future directions are being considered as a follow-up to this paper. First, here we have discussed the critical subsets using illustrative examples. To get a better intuition about the relationship between the value function and the critical subsets of the network, a more rigorous analysis of the game value function κ(E) is needed. With such an analysis we will be able to integrate and understand more realistic (and potentially more complicated) network value models. Also, in this paper we use spanning trees to define the relative importance of links. This implicitly considers only networks in which information flows over spanning trees. However, our result is general and can be used to study games on other types of networks. One interesting extension is the situation where the network manager chooses p ≥ 1 spanning trees (p = 2 is, for example, the situation where the manager chooses a communication tree and a backup one) and the attacker has a budget to attack k ≥ 1 links. Also, we have assumed in this paper that the cost of communicating over any spanning tree is the same. In the future, we will study versions of the problem where some spanning trees might be more costly than others. Finally, this study has focused on the failure of links in a network. Nodes are also subject to failures, whether random or strategic. A more thorough study should consider both links and nodes.

References
1. Stavridis, J. (Admiral, USN): Channeling David Sarnoff (September 2006), http://www.aco.nato.int/saceur/channeling-david-sarnoff.aspx
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (March 2004)
3. Briscoe, B., Odlyzko, A., Tilly, B.: Metcalfe's Law is Wrong. IEEE Spectrum, 26–31 (July 2006)
4. Marketing Conversation: Reeds Law States that Social Networks Scale Exponentially (August 2007), http://marketingconversation.com/2007/08/28/reeds-law/
5. Marketing Conversation: A Short Discussion on Metcalfe's Law for Social Networks (May 2008), http://marketingconversation.com/2007/08/28/reeds-law/
6. Freeman, L.: Centrality in Social Networks: Conceptual Clarification. Social Networks 1(3), 215–239 (1979)
7. Gilder, G.: Metcalfe's Law and Legacy (November 1995), http://www.seas.upenn.edu/~gaj1/metgg.html
8. Gueye, A.: A Game Theoretical Approach to Communication Security. PhD dissertation, University of California, Berkeley, Electrical Engineering and Computer Sciences (March 2011)
9. Gueye, A., Marbukh, V., Walrand, J.C.: Towards a Quantification of Communication Network Vulnerability to Attacks: A Game Theoretic Approach. Technical report, National Institute of Standards and Technology (December 2011), http://www.nist.gov/itl/math/cctg/assane.cfm
10. Gueye, A., Walrand, J.C., Anantharam, V.: Design of Network Topology in an Adversarial Environment. In: Alpcan, T., Buttyán, L., Baras, J.S. (eds.) GameSec 2010. LNCS, vol. 6442, pp. 1–20. Springer, Heidelberg (2010)
11. Manshaei, M.H., Zhu, Q., Alpcan, T., Basar, T., Hubaux, J.-P.: Game Theory Meets Network Security and Privacy. Technical report, EPFL, Lausanne (2010)
12. Medhi, D.: Network Reliability and Fault-Tolerance. John Wiley & Sons, Inc. (2007)
13. Odlyzko, A., Tilly, B.: A Refutation of Metcalfe's Law and a Better Estimate for the Value of Networks and Network Interconnections
14. Cisco Press: Spanning Tree Protocol: Introduction (August 2006), http://www.cisco.com/en/US/tech/tk389/tk621/tsd_technology_support_protocol_home.html
15. Cisco Press: Understanding and Configuring Spanning Tree Protocol (STP) on Catalyst Switches (August 2006), http://www.cisco.com/en/US/tech/tk389/tk621/technologies_configuration_example09186a008009467c.shtml
16. Reed, D.P.: That Sneaky Exponential: Beyond Metcalfe's Law to the Power of Community Building (Spring 1999), http://www.reed.com/dpr/locus/gfn/reedslaw.html
17. Reed, D.P.: Weapon of Math Destruction (February 2003), http://www.immagic.com/eLibrary/ARCHIVES/GENERAL/GENREF/C030200D.pdf
18. Roy, S., Ellis, C., Shiva, S., Dasgupta, D., Shandilya, V., Wu, Q.: A Survey of Game Theory as Applied to Network Security. In: Hawaii International Conference on System Sciences, pp. 1–10 (2010)
Author Index

Agarwal, Tarun 163
Ardabili, Parinaz Naghizadeh 47
Bar-Noy, Amotz 16
Borndörfer, Ralf 224
Buttyán, Levente 88
Chakravarthy, Vasu 99, 192
Chen, Yanjiao 31
Cohen, Kobi 77
Cui, Shuguang 163
Dehnie, Sintayehu 99, 192
Gharehshiran, Omid Namvar 115
Gueye, Assane 259
Han, Zhu 1, 152
Huang, Jianwei 31
Huang, Minyi 138
Jin, Youngmi 236
Kesidis, George 236
Kim, Dohoon 125
Krishnamachari, Bhaskar 16
Krishnamurthy, Vikram 115
Lasaulce, Samson 1
Laszka, Aron 88
Leshem, Amir 77
Li, Husheng 99, 192
Liu, Mingyan 47, 176
Marbukh, Vladimir 259
Mériaux, François 1
Omont, Bertrand 224
Perlaza, Samir 1
Poor, Vincent 1
Rabanca, George 16
Ren, Shaolei 209
Sagnol, Guillaume 224
Sheng, Shang-Pin 176
Song, Ju Bin 152
Southwell, Richard 31
Swarat, Elmar 224
Szeszlér, Dávid 88
Tsitsiklis, John N. 63
van der Schaar, Mihaela 209, 248
Walrand, Jean C. 259
Walter, Deborah 99
Wu, Yanting 16
Wu, Zhiqiang 99, 192
Xu, Jie 248
Xu, Yunjian 63
Yuan, Zhou 152
Zame, William 248
Zehavi, Ephraim 77
Zhang, Qian 31