Lecture Notes of The Institute For Computer Sciences, Social Informatics and Telecommunications Engineering 105
Editorial Board
Ozgur Akan
Middle East Technical University, Ankara, Turkey
Paolo Bellavista
University of Bologna, Italy
Jiannong Cao
Hong Kong Polytechnic University, Hong Kong
Falko Dressler
University of Erlangen, Germany
Domenico Ferrari
Università Cattolica Piacenza, Italy
Mario Gerla
UCLA, USA
Hisashi Kobayashi
Princeton University, USA
Sergio Palazzo
University of Catania, Italy
Sartaj Sahni
University of Florida, USA
Xuemin (Sherman) Shen
University of Waterloo, Canada
Mircea Stan
University of Virginia, USA
Jia Xiaohua
City University of Hong Kong, Hong Kong
Albert Zomaya
University of Sydney, Australia
Geoffrey Coulson
Lancaster University, UK
Vikram Krishnamurthy, Qing Zhao, Minyi Huang, Yonggang Wen (Eds.)
Game Theory
for Networks
Third International ICST Conference
GameNets 2012
Vancouver, BC, Canada, May 24-26, 2012
Revised Selected Papers
Volume Editors
Vikram Krishnamurthy
University of British Columbia
Vancouver, BC V6T 1Z4, Canada
E-mail: [email protected]
Qing Zhao
University of California
Electrical and Computer Engineering
Davis, CA 95616, USA
E-mail: [email protected]
Minyi Huang
Carleton University
Ottawa, ON K1S 5B6, Canada
E-mail: [email protected]
Yonggang Wen
Nanyang Technological University
Singapore 639798
E-mail: [email protected]
CR Subject Classification (1998): I.2.1, K.4.4, I.2.6, C.2.4, H.3.4, K.6.5, G.1.6
© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2012
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Vikram Krishnamurthy
Organization
Organizing Committee
Conference General Chair
Vikram Krishnamurthy University of British Columbia, Canada
Workshops Co-chairs
Mihaela van der Schaar UCLA, USA
Hamidou Tembine Supelec, France
Srinivas Shakkottai Texas A&M, USA
Publications Chair
Alfredo Garcia University of Virginia, USA
Industry Chair
Shrutivandana Sharma Yahoo Labs, India
Publicity Co-chairs
Dusit Niyato Nanyang Technological University, Singapore
Walid Saad University of Miami, USA
Web Chair
Omid Namvar Gharehshiran University of British Columbia, Canada
Conference Organizer
Erica Polini EAI, contact: erica.polini[at]eai.eu
Steering Committee
Athanasios Vasilakos National Technical University of Athens,
Greece
Imrich Chlamtac Create-Net, Italy
Table of Contents
1 Introduction
A decentralized self-configuring network (DSCN) is basically an infrastructure-
less communication system in which radio devices autonomously choose their
own transmit/receive configuration in order to guarantee reliable communica-
tion. In particular, a transmit/receive configuration can be described in terms of
power allocation policies, coding-modulation schemes, scheduling policies, decoding order, etc. Typical examples of DSCNs are wireless sensor networks, short-range networks in the ISM bands (e.g., Wi-Fi, Bluetooth, ZigBee), femto-cell networks (e.g., femto cells in LTE-A), and ad hoc networks in general. The under-
lying feature of DSCNs is that transmitters directly communicate with their re-
spective receivers without the intervention of a central controller. Thus, the main
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 1–15, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
2 F. Mériaux et al.
The problem of all the links wanting to satisfy their QoS constraints at the same time can naturally be described as a game in satisfaction form, defined by the triplet

G = (K, {A_k}_{k∈K}, {f_k}_{k∈K}).   (4)
In this triplet, K represents the set of players, Ak is the strategy set of player
k ∈ K, and the correspondence fk determines the set of actions of player k
that allows its satisfaction given the actions played by all the other players.
A strategy profile is denoted by vector a = (a1 , . . . , aK ) ∈ A. In general, an
important outcome of a game in satisfaction form is the one where all players
are satisfied, that is, an SE. The notion of SE was formulated as a fixed point
in [8] as follows:
Definition 1 (Satisfaction Equilibrium). An action profile a+ is an equilibrium for the game G = (K, {A_k}_{k∈K}, {f_k}_{k∈K}) if

∀k ∈ K, a_k+ ∈ f_k(a_{-k}+).   (5)
As we shall see, the SE is often not unique and thus, there might exist some SEs that are of particular interest. In the following, we introduce the notion of an efficient SE (ESE). To this end, we consider a cost function for each player of the game, in order to model the notion of effort or cost associated with a given action choice. For all k ∈ K, the cost function c_k : A_k → [0, 1] satisfies the following condition: ∀(a_k, a_k') ∈ A_k²,

c_k(a_k) < c_k(a_k')   (6)

if and only if a_k requires a lower effort than action a_k' when it is played by
player k. Under the notion of effort, the set of SEs that are of particular interest
are those that require the lowest individual efforts. We formalize this notion of
equilibrium using the following definition.
Definition 2 (Efficient Satisfaction Equilibrium). An action profile a* is an ESE for the game G = (K, {A_k}_{k∈K}, {f_k}_{k∈K}), with cost functions {c_k}_{k∈K}, if

∀k ∈ K, a_k* ∈ f_k(a_{-k}*)   (7)

and

∀k ∈ K, ∀a_k ∈ f_k(a_{-k}*), c_k(a_k) ≥ c_k(a_k*).   (8)
Note that the effort associated by each player with each of its actions does not
depend on the choice of effort made by other players. Here, we have left players
to individually choose their cost functions, which adds another degree of freedom
to the modeling of the QoS problem in DSCNs.
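As an illustration of Definitions 1 and 2, the SEs and ESEs of a small finite game in satisfaction form can be enumerated directly. The sketch below is not from the paper; the function name and the `satisfied` predicate (which encodes the correspondences f_k) are our own illustrative choices.

```python
from itertools import product

def satisfaction_equilibria(actions, satisfied, costs):
    """Enumerate SEs and ESEs of a finite game in satisfaction form.

    actions[k]            : list of actions for player k
    satisfied(k, profile) : True if player k's constraint holds under profile
    costs[k](a)           : effort of action a for player k
    """
    ses = [p for p in product(*actions)
           if all(satisfied(k, p) for k in range(len(actions)))]
    eses = []
    for p in ses:
        efficient = True
        for k, ak in enumerate(p):
            for b in actions[k]:
                q = p[:k] + (b,) + p[k + 1:]
                # a cheaper satisfying unilateral deviation rules out an ESE
                if satisfied(k, q) and costs[k](b) < costs[k](ak):
                    efficient = False
        if efficient:
            eses.append(p)
    return ses, eses
```

For instance, in a toy two-player game where each player is satisfied whenever at least one player chooses action 1, and the cost of an action is the action itself, the SEs are (0,1), (1,0), (1,1) while only (0,1) and (1,0) are efficient.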
Achievability of Efficient Satisfaction Equilibria in Self-Configuring Networks 5
Note also that a game in satisfaction form is not a game with a constrained
set of actions, as is the case in the formulation presented in [3]. Here, a player
can use any of its actions independently of all the other players. The dependency
on the other players’ actions enters through whether the player under study is
satisfied or not.
In the rest of this paper, we use the context of uplink power control in a single cell as a case study. Although most of our results apply in a general context, we concentrate on the uplink power control problem as presented in [12,14] to illustrate our results.
Consider K transmitter/receiver pairs denoted by index k ∈ K. For all k ∈ K, transmitter k uses power level p_k ∈ A_k, with A_k generally defined as a compact sublattice. For each player k ∈ K, we denote by p_k^min and p_k^max the minimum and the maximum power levels in A_k, respectively. For every couple (i, j) ∈ K², we denote by g_ij the channel gain coefficient between transmitter i and receiver j. The considered metric for each pair k is the Shannon rate, given by

u_k(p_k, p_{-k}) = log₂(1 + p_k g_kk / (σ_k² + Σ_{j≠k} p_j g_jk))  [bps/Hz].   (9)
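For concreteness, the rate metric in (9) can be evaluated numerically. Below is a minimal sketch; the function name and the gain-matrix convention `g[i][j]` for g_ij are our own.

```python
import math

def shannon_rate(k, p, g, sigma2):
    """Shannon rate of link k in bps/Hz, following (9).

    p       : list of transmit powers (W)
    g[i][j] : channel gain between transmitter i and receiver j
    sigma2  : noise variance at receiver k (W)
    """
    interference = sum(p[j] * g[j][k] for j in range(len(p)) if j != k)
    sinr = p[k] * g[k][k] / (sigma2 + interference)
    return math.log2(1 + sinr)
```

For a single interference-free link with p = 10⁻² W, g = 10⁻⁶ and σ² = 10⁻¹⁰ W, the SINR is 100 and the rate is log₂(101) ≈ 6.66 bps/Hz.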
In this section, we provide sufficient conditions for convergence of the BRD and the robust blind response dynamics (RBRD) to an ESE of the game G = (K, {A_k}_{k∈K}, {f_k}_{k∈K}), with cost functions {c_k}_{k∈K}.
To study the convergence of the BRD, we first define some notation of interest.
Let a = (a1 , . . . , aN ) and b = (b1 , . . . , bN ) be two action profiles and let c =
a ∨ b denote the maximum of (a, b) component wise, i.e., c = (c1 , . . . , cN ) with
cn = max(an , bn ) ∀n ∈ {1, . . . , N }. In a similar way, a ∧ b denotes min(a, b)
component wise.
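For action profiles stored as tuples, these componentwise operations can be written, e.g. in Python, as:

```python
def join(a, b):
    """Componentwise maximum a ∨ b of two action profiles."""
    return tuple(max(x, y) for x, y in zip(a, b))

def meet(a, b):
    """Componentwise minimum a ∧ b of two action profiles."""
    return tuple(min(x, y) for x, y in zip(a, b))
```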
In the case of the cost function defined in (6), ck depends only on the actions
of player k. Hence, ck is both supermodular and submodular. As a result, (13)
and (14) are equalities.
Proposition 1. Assume that for all k ∈ K, f_k(·) is nonempty and compact for all values of its argument, f_k(·) has either the ascending or the descending property, and f_k(·) is continuous. Then the following holds:
– (i) An ESE exists.
– (ii) If the dynamics start with the action profile associated with the highest
or lowest effort in ck (·), for all k ∈ K, the BRD converge monotonically to
an ESE.
– (iii) If the dynamics start from an SE, the trajectory of the best response
converges to an ESE. It monotonically evolves in all components.
– (iv) In a two-player game, the BRD converge to an ESE from any starting
point.
The proof of Prop. 1 follows from Th. 1 in [1] and Th. 2.3 in [13]. We simply have to verify that the right assumptions hold for the ascending case and the descending case:
– Let f_k(·) be ascending for all k ∈ K. c_k is a cost function that player k wants to minimize; in particular, c_k is a submodular function, and thus −c_k is a supermodular function that player k wants to maximize, so Th. 1 from [1] holds, i.e., (i), (ii) and (iii) in Prop. 1 are ensured when the sets are ascending.
– Let f_k(·) be descending for all k ∈ K. A similar reasoning can be made: c_k is a submodular function that player k wants to minimize and the same theorem holds as well, i.e., (i), (ii) and (iii) in Prop. 1 are ensured when the sets are descending.
In both ascending and descending cases, (iv) in Prop. 1 is obtained from Th. 2.3
in [13].
Example 1. In this example, we refer to the notation introduced in Sec. 2.3. Let us consider K = 3 pairs of transmitters/receivers. For all k ∈ K, transmitter k uses power level a_k ∈ {p_min, p_max}. Given the constraints from Sec. 2.3, let us consider channel gains such that, in particular,

f_2(p_min, p_min) = {p_min, p_max},
f_2(p_min, p_max) = {p_max},
f_2(p_max, p_min) = {p_min, p_max},   (19)
f_2(p_max, p_max) = {p_max}.
We can check that fk has the ascending property for all k ∈ K. For each pair
k, the cost of the power level is given by the identity cost function ck (ak ) = ak .
This game has two ESEs:
– (p_min, p_min, p_min), where all the players transmit at their lowest power level. No player has an incentive to deviate from its action, since any other action has a higher cost (even though the player would remain satisfied).
– (pmax , pmax , pmax ) where all the players have to transmit at maximum power
to be satisfied. If one deviates from its action, it will not be satisfied anymore.
But depending on the initial action profile of the BRD, the BRD may not converge to an ESE. For instance, assume that the BRD start at p^(0) = (p_max, p_min, p_max). At step 1, player 1 chooses the action that minimizes c_1(·) given the previous actions of the other players p_{-1}^(0) = (p_min, p_max), i.e.,

p_1^(1) = BR_1(p_min, p_max) = p_min.   (20)

Player 2 chooses the action that minimizes c_2(·) given the most recent actions of the other players (p_1^(1), p_3^(0)) = (p_min, p_max), i.e.,

p_2^(1) = BR_2(p_min, p_max) = p_max.   (21)

Player 3 chooses the action that minimizes c_3(·) given (p_1^(1), p_2^(1)) = (p_min, p_max), i.e.,

p_3^(1) = BR_3(p_min, p_max) = p_min.   (22)

At step 2, player 1 chooses the action that minimizes c_1(·) given the previous actions of the other players p_{-1}^(1) = (p_max, p_min), i.e.,

p_1^(2) = BR_1(p_max, p_min) = p_max.   (23)

Player 2 chooses the action that minimizes c_2(·) given the most recent actions of the other players (p_1^(2), p_3^(1)) = (p_max, p_min), i.e.,

p_2^(2) = BR_2(p_max, p_min) = p_min.   (24)

Player 3 chooses the action that minimizes c_3(·) given (p_1^(2), p_2^(2)) = (p_max, p_min), i.e.,

p_3^(2) = BR_3(p_max, p_min) = p_max.   (25)

The algorithm is back at the starting point, and it is clear that it will continue in this infinite loop.
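The two-step cycle above can be replayed programmatically. The sketch below hardcodes only the best responses reported in (20)–(25), with 0 standing for p_min and 1 for p_max; it illustrates the loop in this particular example and is not a general BRD implementation.

```python
# Best responses as given in (20)-(25); 0 = pmin, 1 = pmax.
# Only the opponent profiles visited by the example are encoded.
BR = {
    1: {(0, 1): 0, (1, 0): 1},
    2: {(0, 1): 1, (1, 0): 0},
    3: {(0, 1): 0, (1, 0): 1},
}

def brd_round(p):
    """One round of sequential best responses (players 1, 2, 3)."""
    p = list(p)
    for k in (1, 2, 3):
        others = tuple(p[j - 1] for j in (1, 2, 3) if j != k)
        p[k - 1] = BR[k][others]
    return tuple(p)

p = (1, 0, 1)       # (pmax, pmin, pmax), the starting point p(0)
p1 = brd_round(p)   # after step 1: (pmin, pmax, pmin)
p2 = brd_round(p1)  # after step 2: back to (pmax, pmin, pmax)
```

Iterating `brd_round` therefore alternates forever between the two profiles, confirming the infinite loop.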
The BRD have significant drawbacks. First, it was just shown that in a K-player
game with K > 2, the dynamics may not converge to an ESE depending on the
initial action profile. Second, to determine the BR, each player has to know
the set fk (a−k ) ∀a−k ∈ A−k . To overcome these drawbacks, we propose a new
algorithm that requires less information about the game for each player and can
still be proven to converge to an ESE. Let us start by defining the robust blind
response (RBR) by RBR_k : A → A_k, such that

RBR_k(a_k, a_{-k}) =
    a_k',  if a_k' ∈ f_k(a_{-k}), a_k ∈ f_k(a_{-k}) and c_k(a_k') ≤ c_k(a_k),
    a_k',  if a_k' ∈ f_k(a_{-k}) and a_k ∉ f_k(a_{-k}),   (26)
    a_k,   otherwise,

with action a_k' being randomly chosen in A_k such that ∀b ∈ A_k, Pr(a_k' = b) > 0.
0. Each time the RBR is used, a player k ∈ K randomly chooses an action in
its strategy set Ak without taking into account the constraints of other players.
Player k only has to know whether the new action and the previous one allow the satisfaction of its individual constraints, and to compare their respective costs. If
both actions allow the satisfaction of the constraints, it chooses the one with the
lowest cost. If the new action allows the satisfaction of the individual constraints
whereas the previous one does not, it moves to the new action. Otherwise, it
keeps the same action. When all the players sequentially use the RBR such that
∀k ∈ K,

a_k^(n+1) = RBR_k(a_1^(n+1), …, a_{k-1}^(n+1), a_k^(n), a_{k+1}^(n), …, a_K^(n)),   (27)
we refer to these dynamics as the RBR dynamics (RBRD). Our main result in
this section is stated in the following theorem.
Theorem 1. Assume that for all k ∈ K, f_k(·) is nonempty and compact for all values of its argument, f_k(·) has the ascending property and is continuous, and c_k(·) is strictly increasing. Then, the following holds:
– (i) If the dynamics start from an SE, the sequence of RBRs converges to an ESE. It monotonically decreases in all components.
– (ii) If the dynamics start with the actions associated with the highest effort in c_k(·), ∀k ∈ K, the sequence of RBRs converges monotonically to an ESE.
– (iii) In a two-player game, the sequence of RBRs converges to an ESE from any starting point.
Proof. Applying Prop. 1, we know that there exists an ESE for the game G = (K, {A_k}_{k∈K}, {f_k}_{k∈K}). The convergence of the RBRD to an ESE is proven in two steps. First, we show for (i), (ii) and (iii) that the RBRD converge to a fixed point. Second, we explain why this fixed point has to be an ESE.
– (i) Assume that the dynamics start from an SE a^SE and this SE is not an ESE (otherwise, the convergence is trivial). Let player k ∈ K be the first player to actually change its action, at step n, to a_k^(n); necessarily this action has a lower cost than a_k^SE, because a satisfied player can only move to another satisfying action with a lower cost. Let the next player to move be denoted by j. From its point of view, (a_k^(n), a_{-{k,j}}^SE) = (a_k^(n), a_{-{k,j}}^SE) ∧ a_{-j}^SE. Hence, due to the ascending property of f_j and the strict monotonicity of c_j, its new action necessarily satisfies a_j^(n') ≤ a_j^SE, and so forth. For each k ∈ K, the sequence {a_k^(n)}_{n∈N} is decreasing in a compact set. Thus, the algorithm converges to a limit.
– (ii) Assume that the dynamics start from the action profile a^max = (a_1^max, …, a_K^max) and this point is not an SE (otherwise refer to (i)). Let player k update its action first, at step n. Necessarily, its updated action a_k^(n) is lower than a_k^max. Then, ∀j ≠ k, j ∈ K,

(a_{-{j,k}}^max, a_k^(n)) = (a_{-{j,k}}^max, a_k^(n)) ∧ a_{-j}^max.   (28)

Due to the ascending property of f_j and the strict monotonicity of c_j, the update of player j is hence lower than a_j^max, and so forth. Again, for each player k ∈ K, the sequence of actions {a_k^(n)}_{n∈N} is decreasing in a compact set and the algorithm converges to a limit.
– (iii) In a two-player game, assume the dynamics start from a random action profile (a_1^(0), a_2^(0)). Assume player 1 is the first player that updates its action to get satisfied, at step n. The action profile is then (a_1^(n), a_2^(0)). In the next move, either the same player 1 decreases its action, remaining satisfied, or player 2 moves to an action that satisfies it, leading to an action profile (a_1^(n), a_2^(n')). If this profile is an SE, the dynamics converge according to (i). Otherwise, player 1 is no longer satisfied and has to update its action. If a_2^(n') < a_2^(0), then due to the ascending property and the strict monotonicity of c_1, player 1 will only move to a lower action than a_1^(n). Then player 2 will also have to move to a lower action than a_2^(n') for analogous reasons, and so forth. The sequences {a_1^(n)}_{n∈N} and {a_2^(n)}_{n∈N} are hence decreasing in a compact set, so they converge to a limit. If a_2^(n') > a_2^(0), the sequences are increasing in a compact set and converge as well.
We now have to prove that a fixed point is an ESE. Consider that a* is a fixed point of RBR_k, ∀k ∈ K. By the definition of RBR_k, this means that there exists no a_k ∈ A_k such that a_k ∈ f_k(a_{-k}*) and c_k(a_k) < c_k(a_k*), which is exactly the definition of the ESE. This completes the proof.
The main advantage of these dynamics over the BRD in a general framework is that the former require only local information: the knowledge of an explicit expression for f_k is no longer needed. Knowing whether the corresponding player is satisfied or not is sufficient to implement the RBR.
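A minimal sketch of one RBR update following (26) is given below. The function and predicate names are our own; satisfaction is queried only through a black-box `satisfied` test, matching the local-information requirement just described.

```python
import random

def rbr(k, a, actions, satisfied, cost, rng=random):
    """One robust blind response of player k, following (26).

    a                     : current action profile (tuple)
    actions[k]            : player k's action set A_k
    satisfied(k, profile) : True if player k's constraint holds
    cost[k](action)       : effort of an action for player k
    """
    trial = rng.choice(actions[k])        # candidate drawn with full support
    q = a[:k] + (trial,) + a[k + 1:]
    old_ok, new_ok = satisfied(k, a), satisfied(k, q)
    if new_ok and old_ok and cost[k](trial) <= cost[k](a[k]):
        return trial                      # both satisfying: keep the cheaper one
    if new_ok and not old_ok:
        return trial                      # only the candidate satisfies
    return a[k]                           # otherwise keep the current action
```

Applying `rbr` sequentially over the players, as in (27), yields the RBRD.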
Theorem 2. In the power control game of Sec. 2.3, where each player chooses among a finite set of power levels in action set A_k, the RBRD converge to an ESE from any starting point.
Proof. We show that, from any starting point of the dynamics, there is a non-null probability that the dynamics move to a particular SE in a particular way. Note that the sequence of events we describe here is not always the way the dynamics run. It is simply a sequence that can occur with non-null probability; there are many other possible sequences that lead to an SE.
Assume p^(0) = (p_1^(0), …, p_K^(0)) is the starting power profile of the dynamics. Consider all the unsatisfied players at this point and assume that they all move to their maximum possible power levels (this may happen with a non-null probability). These levels satisfy them, by the ascending property of the correspondences f_k.
This increase of power levels may cause some of the players satisfied at the starting point not to be satisfied anymore. We assume that these players also move to their maximum power levels, and so on, until no unsatisfied player remains. We thus obtain a power profile made of the highest power levels for some of the players and the initial power levels for the others, in which every player is satisfied: it is an SE.
Finally, from (i) of Th. 1, the dynamics converge to an ESE, which completes
the proof.
Th. 2 highlights a very interesting property of the RBRD when players enter or
quit the game (or when the channel coefficients vary). Indeed, if K transmitters
are in any given ESE p* and a new transmitter enters the game, a new game starts with K + 1 players. Thus, from Th. 2, it can be stated that convergence to a new ESE, if it exists, is ensured from the new starting point (p*, p_{K+1}).
4 Numerical Results
In this section, we provide numerical results for the uplink power control game
with discrete action sets as defined in Sec. 2.3.
In Fig. 1, we show the sequences of actions converging to an ESE for the
RBRD in a 2-player power control game. The colored region is the satisfaction
region, i.e., the region allowing both players to be satisfied. The coloring of this
region follows the sum of the costs for each player. The RBR first converges to the
satisfaction region, then converges to an ESE while remaining in the satisfaction
region.
[Figure: robust blind response trajectory in the (power index of player 1, power index of player 2) plane, with the satisfaction region shaded.]
Fig. 1. Sequence of power indices for the RBRD in the uplink 2-player power control game. The colored region is the satisfaction region, i.e., the region where the two players mutually satisfy their constraints.
next steps, and finally transmitter 3 leaves for the last 200 steps. On each of
the two figures, we show the sequence of power indices for the three players,
knowing that each action set is made of Nk = 32 possible power levels from
10−6 W to 10−2 W. We also show the satisfaction states of the three players: for each step of the dynamics, if all the players are satisfied, the satisfaction state is 1; otherwise it is 0. Fig. 2 and Fig. 3 correspond to the behavior of the BRD
and the RBRD, respectively. The channel parameters and the starting points
of the two simulations are exactly the same. Channel gains are g22 = 10−5 ,
g11 = g33 = g13 = g21 = g32 = 10−6 , g12 = g23 = g31 = 10−7 , and transmitters
1, 2, and 3 start at power levels 10−3 W, 10−5/2 W, and 10−9/4 W, respectively.
The utility constraints Γ1 , Γ2 , and Γ3 are taken as 1.2 bps/Hz, 1.5 bps/Hz, and
1.2 bps/Hz, respectively. The variance of the noise is fixed at 10−10 W for all the
transmitters. It is interesting to notice that the BRD converge to an ESE during the first and third phases, but when transmitter 2 enters the game in the second phase, the BRD do not converge to an ESE. Instead, they enter a loop, and we can see that the transmitters are not satisfied. Concerning the RBRD, although their convergence time is longer, they converge in all three phases; moreover, the transmitters are satisfied for a longer amount of time compared to the BRD.
[Figure: power indices of transmitters 1, 2 and 3, and joint satisfaction state, over 600 steps.]
Fig. 2. Sequences of power indices and satisfaction states for the BRD in the 3-player uplink power control game
[Figure: power indices of transmitters 1, 2 and 3, and joint satisfaction state, over 600 steps.]
Fig. 3. Sequences of power indices and satisfaction states for the RBRD in the 3-player uplink power control game
References
1. Altman, E., Altman, Z.: S-modular games and power control in wireless networks.
IEEE Transactions on Automatic Control 48(5), 839–842 (2003)
2. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical
Methods. Prentice-Hall, Inc., Upper Saddle River (1989)
3. Debreu, G.: A social equilibrium existence theorem. Proceedings of the National
Academy of Sciences of the United States of America 38(10), 886–893 (1952)
4. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge (1991)
5. Han, Z., Niyato, D., Saad, W., Basar, T., Hjorungnes, A.: Game Theory in Wire-
less and Communication Networks: Theory, Models and Applications. Cambridge
University Press, Cambridge (2011)
6. Haykin, S.: Cognitive radio: Brain-empowered wireless communications. IEEE
Journal on Selected Areas in Communications 23(2), 201–220 (2005)
7. Lasaulce, S., Tembine, H.: Game Theory and Learning in Wireless Networks: Fun-
damentals and Applications. Elsevier Academic Press, Waltham (2011)
8. Perlaza, S.M., Tembine, H., Lasaulce, S., Debbah, M.: Quality-of-service provision-
ing in decentralized networks: A satisfaction equilibrium approach. IEEE Journal
of Selected Topics in Signal Processing 6(2), 104–116 (2012)
9. Rose, L., Lasaulce, S., Perlaza, S.M., Debbah, M.: Learning equilibria with par-
tial information in decentralized wireless networks. IEEE Communications Maga-
zine 49(8), 136–142 (2011)
10. Scutari, G., Barbarossa, S., Palomar, D.P.: Potential games: A framework for vector
power control problems with coupled constraints. In: The IEEE International Con-
ference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France
(May 2006)
11. Scutari, G., Palomar, D.P., Facchinei, F., Pang, J.-S.: Convex optimization, game
theory, and variational inequality theory in multiuser communication systems.
IEEE Signal Processing Magazine 27(3), 35–49 (2010)
12. Wu, C.C., Bertsekas, D.P.: Distributed power control algorithms for wireless net-
works. IEEE Transactions on Vehicular Technology 50(2), 504–514 (2001)
13. Yao, D.D.: S-modular games, with queueing applications. Queueing Systems
21(3-4), 449–475 (1995)
14. Yates, R.D.: A framework for uplink power control in cellular radio systems. IEEE
Journal on Selected Areas in Communications 13(7), 1341–1347 (1995)
A Competitive Rate Allocation Game
1 Introduction
Optimizing throughput is one of the central problems in wireless networking
research. To make good use of the available wireless channels, the transmitter
must allocate rate efficiently. We study in this paper a simple yet fundamental
rate allocation problem in which the transmitter does not precisely know the
state of the channels, and the corresponding receivers are selfish.
In this problem, there is one transmitter that must allocate rates to two different receivers, which forward data on its behalf to a given destination (see illustration in figure 1). The two channels from the transmitter to the receivers are independent channels with two states: high or low. The channel states are assumed to
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 16–30, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
be i.i.d. Bernoulli random variables. Initially we assume that the receivers both know each other's channel parameters, but the transmitter does not. At each
time, the receivers communicate to the transmitter a binary bid corresponding
to the possible state of their respective channels. The transmitter responds to
these bids by deciding whether to send aggressively (at a high, or very high rate)
or conservatively (at a low rate) on one or both channels. Specifically when both
receivers bid low, the transmitter sends data at a low rate R1 over both channels.
And when both receivers bid high, the transmitter splits its power to send data
at a high rate R2 over both channels. When one of the receivers bids low and the
other bids high, the transmitter sends data at a very high rate R3 over the latter
channel. When the sender sends data at a high or very high rate, we assume
that there is a failure and nothing gets sent if the transmission channel actually
turns out to be bad. In this case, the sender levies a penalty on the receivers.
But whenever data is successfully sent, it pays the receiver a fee proportional to
the rate obtained.
There are two roles in this setting: the receivers and the transmitter. The
receivers want to get as much reward as possible, avoiding the penalties. Since
the transmitter’s rate allocation is a competitive resource that directly affects the
receivers’ utilities, the setting can be modeled as a two player, non-cooperative
game. On the other hand, the transmitter is the game designer: it can choose
the penalties in order to influence how the receivers play the game. The goal of
the transmitter is to transmit as much data as possible and, without knowledge
of the receiver’s channel states, to guarantee that the total transmission is not
much worse than the optimal. In this paper we prove that there is a way to set
the penalty terms such that both receivers have dominant strategies, and the
data forwarded by two receivers is at least 1/2 of the optimal, in other words,
that the Price of Anarchy from the transmitter’s point of view is at most 2.
If the underlying channels’ states are known we can assume that the two
receivers will play their dominant strategies if they have one. However, if the
underlying channel status is unknown, the receivers need to learn which action
is more beneficial. Assuming that the underlying channel state is drawn from
an unknown underlying distribution at each time slot, we show that modeling
each player's choice of action as a multi-armed bandit leads to desirable results.
18 Y. Wu et al.
In this paper we adapt the UCB1 algorithm [1], in which there are two arms for each receiver, each arm corresponding to an action: bidding high or bidding low.
From the simulations, we find that the UCB1 algorithm gives a performance close to that of the dominant strategy and that, when both receivers use UCB1 to choose their strategies, it can give even better payoffs than playing the dominant strategy.
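For reference, a standard UCB1 index policy over the two bidding arms can be sketched as follows. This is the generic algorithm of [1], not code from the paper; rewards are assumed rescaled to [0, 1] before being fed back, as UCB1's analysis requires bounded rewards.

```python
import math

class UCB1:
    """UCB1 over n_arms arms (here, bid low = 0, bid high = 1)."""
    def __init__(self, n_arms=2):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms   # empirical mean reward per arm
        self.t = 0

    def select(self):
        self.t += 1
        for a in range(len(self.counts)):
            if self.counts[a] == 0:
                return a               # play each arm once first
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n
```

In the game, each receiver runs its own instance: it calls `select()` to choose a bid, observes its (rescaled) reward or penalty, and calls `update()`.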
Related Work: Game theory, which is a mathematical tool for analyzing
the interaction of two or more decision makers, has been used in wireless com-
munications by many authors [2], [3]. While we are not aware of other papers
that have addressed exactly the same formulation as discussed here, other re-
searchers have explored related game theoretic problems pertaining to power
allocation over multiple channels. For instance, the authors of [4] formulate a
multiuser power control problem as a noncooperative game, show the existence
and uniqueness of a Nash equilibrium for a two-player version of the game and propose a water-filling algorithm which reaches the Nash equilibrium efficiently; and the authors of [5] study a power allocation game for orthogonal multiple access channels, prove that there exists a unique equilibrium of this game when the channels are static, and show that a simple distributed learning scheme based on the replicator dynamics converges to equilibrium exponentially fast. Unlike
most of the prior works, our formulation and analysis are not focused on optimizing the power allocation per se, but rather on issues of information asymmetry between the transmitter and the receivers and on the design of appropriate penalties levied by the transmitter to ensure that the receivers' selfishness does not hurt performance too much. Somewhat related to the formulation in this paper are
two recent papers on non-game-theoretic formulations in which a transmitter decides whether to send conservatively or aggressively over a single (known or unknown) Markovian channel [6], [7]. Although we consider simpler Bernoulli channels here, in which case the transmitter's decisions would be simplified, our
formulation focuses on strategic interactions between two receivers. In the case of
unknown stochastic payoffs, we consider the use of a multi-armed bandit-based
learning algorithm. Relatively little is known about the performance of such on-
line learning algorithms in game formulations, though it has been shown that
they do not always converge to Nash equilibria [8].
2 Problem Formulation
In the rate allocation game we consider two receivers and one transmitter. The
transmitter uses the two receivers to forward data to the destination. The channel
from the transmitter to each receiver has one of two states at each time slot: low (L or 0) or high (H or 1). The two channels are independent of each other and their states come from an i.i.d. distribution. We denote by pi (i = 1, 2) the probability that channel i is in state high at any time. Before transmitting,
neither the receivers nor the transmitter know the state of the channel. At the
beginning of each time slot, each receiver makes a "bid" (high or low). The
transmitter allocates rate to the receivers according to the bids sent. At the end
of the time slot both receivers observe whether or not their transmission was
successful. A transmission is not successful if the respective channel is in a low
state but has been assumed to be in a high state.
Since the channel state is unknown in advance, the receivers' bids may lead to an unsuccessful transmission. If the transmission is successful, the receiver is paid an amount proportional to the transmission rate. Otherwise, it gets a penalty (negative reward). Table 1 shows the reward functions for each receiver.
Table 1.
Throughout the rest of the paper we will assume that R1 < R2 < R3 < 2R2.
C and D are the penalties that the receivers incur for making a high bid when
their channel state is low.
There are two roles in this game: the transmitter and the receivers.
The transmitter wants to carry as much data as possible to the destination. It
is not interested in penalizing the receivers; it only uses the penalty to give the
receivers an incentive to make good guesses. The receivers are only interested
in the reward, and they do not lose any utility from transmitting more data.
Table 2 shows the relationship between the expected rewards for the two receivers
as a normal-form game. In each cell, the first value is the reward for
receiver 1, and the second value is the reward for receiver 2.
Table 2.
XXX
XXReceiver 2
Receiver 1 XXXX
L H
L (R1 , R1 ) (0, p2 R3 − (1 − p2 )C)
H (p1 R3 − (1 − p1 )C, 0) (p1 R2 − (1 − p1 )D, p2 R2 − (1 − p2 )D)
20 Y. Wu et al.
LL1 = R1 , (1)
LL2 = R1 ,
LH1 = 0,
LH2 = p2 R3 − (1 − p2 )C,
HL1 = p1 R3 − (1 − p1 )C,
HL2 = 0,
HH1 = p1 R2 − (1 − p1 )D,
HH2 = p2 R2 − (1 − p2 )D.
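The expected payoffs above can be tabulated directly. The following is a minimal sketch in Python; the function name and the example parameter values are our own illustration, not the paper's:

```python
def expected_payoffs(p1, p2, R1, R2, R3, C, D):
    """Expected rewards (receiver 1, receiver 2) for each bid pair, per Table 2."""
    return {
        ('L', 'L'): (R1, R1),
        ('L', 'H'): (0.0, p2 * R3 - (1 - p2) * C),
        ('H', 'L'): (p1 * R3 - (1 - p1) * C, 0.0),
        ('H', 'H'): (p1 * R2 - (1 - p1) * D, p2 * R2 - (1 - p2) * D),
    }

# Example instance satisfying R1 < R2 < R3 < 2*R2.
table = expected_payoffs(p1=0.6, p2=0.7, R1=2, R2=3, R3=5, C=1, D=1)
```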
Let receiver 1 bid high with probability q1, and receiver 2 bid high with probability
q2. At a mixed Nash equilibrium, receiver 1 selects its probability so that
receiver 2's expected utility is the same whether it bids high or low (and
symmetrically for q2). Therefore we have

(1 − q1)LL2 + q1 HL2 = (1 − q1)LH2 + q1 HH2.
Denote

b1 = min{ (C + R1)/(C + R3), D/(D + R2) },     (7)

b2 = max{ (C + R1)/(C + R3), D/(D + R2) }.     (8)
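These thresholds, and the dominant bids they induce (per Theorem 1 below), can be computed as follows; this is our own sketch with hypothetical function names:

```python
def thresholds(R1, R2, R3, C, D):
    """b1 and b2 of Eqs. (7)-(8): min and max of (C+R1)/(C+R3) and D/(D+R2)."""
    x = (C + R1) / (C + R3)
    y = D / (D + R2)
    return min(x, y), max(x, y)

def dominant_bid(p, b1, b2):
    """Per Theorem 1: p below b1 makes 'L' dominant, p above b2 makes 'H' dominant."""
    if p < b1:
        return 'L'
    if p > b2:
        return 'H'
    return None  # inside [b1, b2]: no dominant strategy in general
```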
A Competitive Rate Allocation Game 21
Theorem 1. If p1 ∉ [b1, b2] or p2 ∉ [b1, b2], then there exists a unique pure
Nash equilibrium.
Proof. Let p1 < b1; then p1 < (C + R1)/(C + R3) and p1 < D/(D + R2).
Thus receiver 1 has a dominant strategy of bidding low. When receiver 1 bids
low, the optimal action for receiver 2 is to bid low if LL2 > LH2, and high
otherwise.

Let p1 > b2; then p1 > (C + R1)/(C + R3) and p1 > D/(D + R2).
Thus the dominant strategy for receiver 1 is to bid high. When receiver 1
bids high, the optimal action for receiver 2 is to bid high if HL2 < HH2, and
low otherwise.
The case p2 ∉ [b1, b2] is similar.
Lemma 1. If p1 ∈ (b1, b2) and p2 ∈ (b1, b2), then there exists more than one Nash
equilibrium.

Proof. Let p1 ∈ (b1, b2) and p2 ∈ (b1, b2); then there are two possible scenarios.

Scenario 1: b1 = (C + R1)/(C + R3) and b2 = D/(D + R2). Then
Table 3.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (R1, R1)         (0, > R1)
Receiver 1: H   (> R1, 0)        (< 0, < 0)
There are two Nash equilibria: one receiver bids high and the other bids low.
Scenario 2: b1 = D/(D + R2) and b2 = (C + R1)/(C + R3). Then
Table 4.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (R1, R1)         (0, < R1)
Receiver 1: H   (< R1, 0)        (> 0, > 0)
There are two Nash equilibria: both bid high, or both bid low.

In the range (b1, b2) × (b1, b2), if both receivers play the mixed Nash equilibrium,
their utilities can be much worse than under a pure Nash equilibrium.
If the mixed Nash equilibrium is used, the expected utilities for the
receivers are:
Table 5.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (R1, R1)         (0, < R1)
Receiver 1: H   (< R1, 0)        (< 0, < 0)
In this section, we consider the amount of data that can be sent by the two
receivers. Suppose the transmitter asks the two receivers to forward its data. What
the transmitter really cares about is how much data is sent, so when a
transmission fails, we count the data sent as 0. The penalty terms C and D
let the receivers adjust their bidding, but the transmitter itself incurs no such
penalty.

Table 6 shows the expected rewards from the transmitter's point of view:
Table 6.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (R1, R1)         (0, p2 R3)
Receiver 1: H   (p1 R3, 0)       (p1 R2, p2 R2)
VLL = R1 + R1 , (11)
VHL = p1 R3 , (12)
VLH = p2 R3 , (13)
VHH = p1 R2 + p2 R2 . (14)
PoA = max_{s ∈ S} V(s) / min_{s ∈ NE} V(s),     (16)

where S is the strategy set, NE is the Nash equilibrium set, and V(s) is the
corresponding value among VLL, VHL, VLH, VHH.
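Equation (16) can be evaluated numerically by enumerating the pure Nash equilibria of the receivers' game (Table 2) together with the transmitter values (Table 6). The sketch below is our own code, not the paper's:

```python
def price_of_anarchy(p1, p2, R1, R2, R3, C, D):
    """PoA of Eq. (16) over pure Nash equilibria; None if no pure NE exists."""
    bids = ('L', 'H')
    # Receivers' expected payoffs (Table 2).
    u = {('L', 'L'): (R1, R1),
         ('L', 'H'): (0.0, p2 * R3 - (1 - p2) * C),
         ('H', 'L'): (p1 * R3 - (1 - p1) * C, 0.0),
         ('H', 'H'): (p1 * R2 - (1 - p1) * D, p2 * R2 - (1 - p2) * D)}
    # Transmitter values (Table 6).
    V = {('L', 'L'): 2 * R1,
         ('L', 'H'): p2 * R3,
         ('H', 'L'): p1 * R3,
         ('H', 'H'): (p1 + p2) * R2}
    # Pure Nash equilibria: no receiver gains by unilaterally switching its bid.
    ne = [(a, b) for a in bids for b in bids
          if all(u[(a, b)][0] >= u[(c, b)][0] for c in bids)
          and all(u[(a, b)][1] >= u[(a, c)][1] for c in bids)]
    if not ne:
        return None
    return max(V.values()) / min(V[s] for s in ne)

# Lemma 4 instance below: p1 = 0, p2 = 1, with 2*R1 > R3 (ratio is 2*R1/R3).
print(price_of_anarchy(0, 1, R1=4, R2=5, R3=6, C=2, D=2))  # → 8/6 ≈ 1.333
```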
Theorem 2. If C = (R1 R3 − R1 R2)/(R2 − R1) and D = R1 R2/(R2 − R1), then
PoA < 2.

Proof. If C = (R1 R3 − R1 R2)/(R2 − R1) and D = R1 R2/(R2 − R1), then
b1 = b2 = R1/R2, so there only exists a pure Nash equilibrium. Let p = R1/R2.
If p1 < p and p2 > p, the Nash equilibrium is LH and the optimum is at most
2p2 R3; thus PoA < 2.
If p1 > p and p2 < p, similar to the p1 < p and p2 > p case.
If p1 > p and p2 > p,
Lemma 4. In the rate allocation game, for any fixed penalties C and D, there
exist p1 and p2 such that the PoA is at least 2R1/R3.
Proof. Assume that p1 = 0 and p2 = 1. Then Table 7 shows the receivers'
payoff matrices and Table 8 shows the transmitter's payoff.
Since R3 > R1, the only Nash equilibrium in this instance of the game is
(L, H), for a transmitter utility of R3. If 2R1 > R3, then the optimal solution
from the transmitter's perspective is (L, L), for a utility of 2R1. The Price of
Anarchy is therefore at least 2R1/R3.
Corollary 1. The Price of Anarchy for the rate allocation game over all
instances can be arbitrarily close to 2 for any C and D.

This corollary implies that our result in Theorem 2, showing that the PoA
can be bounded by 2, is essentially tight, in the sense that no better guarantee
applying to all problem parameters can be provided.
When the channel statistics are known and C and D are set as described in
Theorem 2, both receivers have dominant strategies. However, when the channel
statistics are unknown, the receivers need to try both actions: sending at a high
data rate or sending at a low data rate. The underlying channels are stochastic,
and for each receiver the probability that its channel will be good is unknown.
Multi-armed bandits are a handy tool for such stochastic channel problems,
so we adopt the well-known UCB1 algorithm [1] to find the optimal strategy
for each receiver. The arms correspond to the actions of bidding high or low;
each receiver only records the average rewards and numbers of plays and follows
the UCB1 algorithm in a distributed manner, without taking the other
receiver's actions into account.
We recap the UCB1 algorithm in Alg. 1, normalizing the rewards in our case
to lie between 0 and 1.
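Since Algorithm 1 itself is not reproduced in this excerpt, the following is a minimal sketch of UCB1 [1] as used here, with one arm per bid and rewards assumed already normalized to [0, 1]; class and method names are our own:

```python
import math

class UCB1:
    """Minimal UCB1 [1] over two arms (bid 'L' or 'H'); rewards must lie in [0, 1]."""

    def __init__(self, arms=('L', 'H')):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:              # play each arm once first
            if self.counts[a] == 0:
                return a
        # pick the arm maximizing empirical mean + exploration bonus
        return max(self.arms, key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```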
6 Simulations
In this section we present simulation results showing that the UCB1 learning
algorithm performs well. In all simulations we fix the penalties C and D as
in Theorem 2, which leaves each receiver with a dominant strategy, though one
not usually known to the receivers. In the figures below we compare the UCB1
learning algorithm with playing the dominant strategy (if the receiver knew it),
and find that using UCB1 does not lose much utility on average, and is
sometimes better than the dominant strategy.
First, in Figure 2, we assume that receiver 2 knows the probability of its channel
state being high and plays its own dominant strategy. In this case receiver 1
would be better off if it knew its own probability and played the dominant
strategy. However, playing UCB1 does not lose much utility on average.
Figure 2 shows, for each R1 as a fraction of R3, the average payoff over multiple
games in which R2, p1 and p2 are distributed over their entire domain.
In Figure 3 we show the average payoff over multiple choices of R2, p1 and p2,
when receiver 1 plays either the dominant strategy or the UCB1 strategy, and
receiver 2 plays the UCB1 strategy. We can see here that the dominant strategy
is only better on average for large values of R1; for small values of R1, playing
UCB1 brings a better payoff.
Figures 4 and 5 show the same scenarios from the transmitter's perspective.
Figure 4 compares the optimal average utility the transmitter could get from each
game to the average utility the transmitter gets when receiver 1 uses UCB1
or its dominant strategy, while receiver 2 plays its dominant strategy. We notice
that both strategies give almost the same payoff to the transmitter, especially
when the value of R1 is much smaller than R3. This happens because when
receiver 1 uses UCB1 against a player that uses its dominant strategy, receiver 1
quickly learns to play its dominant strategy as well. Figure 5 shows how the
transmitter's optimal payoff compares to the transmitter's payoff when receiver 2
uses UCB1. When both receivers use the UCB1 algorithm to choose their
strategies, the transmitter's payoff is better than when one receiver uses the
dominant strategy and the other receiver
[Fig. 2: average payoff per game vs. R1 (as a fraction of R3).]

[Fig. 3: average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1.]
uses the UCB1 learning algorithm. When both receivers use the UCB1 learning
algorithm, they do not play the Nash equilibrium when it is much worse than
cooperating. This is why UCB1 sometimes performs better than the dominant
strategy.
Finally, Figure 6 shows how the transmission rate varies when the receivers use
the UCB1 learning algorithm, compared to the optimal transmission rate. In this
simulation we vary the actual probabilities of the two channels while keeping
the rewards unchanged, and we observe that when the two channels are equally
good, the UCB1 algorithm obtains an almost optimal transmission rate.
We now consider two specific problem instances to illustrate the performance
when UCB1 is adopted by both receivers. In both cases, we assume the following
parameters hold:
[Fig. 4: transmitter perspective, UCB1 vs. dominant strategy when the other receiver plays its dominant strategy; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1, OPT.]

[Fig. 5: transmitter perspective, UCB1 vs. dominant strategy when the other receiver plays UCB1; average payoff per game vs. R1 (as a fraction of R3); curves: dominant, UCB1, OPT.]
In this case, the payoff matrix from the receivers' point of view is shown in
Table 10. The optimal action (from the transmitter's perspective) is both
receivers bidding low. When both receivers apply UCB1, we find that out of
100,000 slots, receiver 1 bids high 657 times and low 99,343 times, while
receiver 2 bids high 39,814 times and low 60,186 times.
Fig. 6. Normalized transmitter payoff with respect to optimum when both play UCB1, as a function of the two channel parameters
Table 9.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (40, 40)         (0, 38)
Receiver 1: H   (0, 0)           (−90, −4.5)
Table 10.

                Receiver 2: L    Receiver 2: H
Receiver 1: L   (40, 40)         (0, 42)
Receiver 1: H   (0, 0)           (−90, 4.5)
In this case, the optimal action (from the transmitter's perspective) is receiver
1 bidding low and receiver 2 bidding high. Out of 100,000 slots, receiver 1 bids
high 622 times and low 99,378 times, while receiver 2 bids high 62,706 times
and low 37,294 times.
These examples illustrate how the distributed learning algorithm is sensitive
to the underlying channel parameters and learns to play the right bid over a
sufficient period of time, although as expected, the regret is higher when the
channel parameter is close to b1 .
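To make these experiments concrete, the following self-contained sketch has both receivers run UCB1 on realized (not expected) rewards. The normalization to [0, 1] and the example parameters are our own choices (the paper's exact parameters for Tables 9 and 10 are not reproduced in this excerpt); C = 4 and D = 6 follow Theorem 2 for R1 = 2, R2 = 3:

```python
import math
import random

def simulate(p1, p2, R1, R2, R3, C, D, slots=100_000, seed=0):
    """Both receivers run UCB1 over the arms {'L','H'}; realized rewards follow
    the game's rules. Returns per-receiver bid counts."""
    rng = random.Random(seed)
    lo, hi = -max(C, D), R3                     # normalization bounds (our choice)
    norm = lambda x: (x - lo) / (hi - lo)
    cnt = [{'L': 0, 'H': 0} for _ in range(2)]
    mean = [{'L': 0.0, 'H': 0.0} for _ in range(2)]
    for t in range(1, slots + 1):
        bids = []
        for i in range(2):
            untried = [a for a in 'LH' if cnt[i][a] == 0]
            bids.append(untried[0] if untried else
                        max('LH', key=lambda a: mean[i][a]
                            + math.sqrt(2 * math.log(t) / cnt[i][a])))
        states = (rng.random() < p1, rng.random() < p2)
        for i in range(2):
            b, other = bids[i], bids[1 - i]
            if b == 'L':                         # a low bid always succeeds
                r = R1 if other == 'L' else 0.0
            elif states[i]:                      # high bid, channel is high
                r = R3 if other == 'L' else R2
            else:                                # high bid, channel low: penalty
                r = -(C if other == 'L' else D)
            cnt[i][b] += 1
            mean[i][b] += (norm(r) - mean[i][b]) / cnt[i][b]
    return cnt

cnt = simulate(p1=0.1, p2=0.9, R1=2, R2=3, R3=5, C=4, D=6, slots=20_000)
```

With p1 < R1/R2 < p2, the counts concentrate on receiver 1 bidding low and receiver 2 bidding high, qualitatively matching the bid counts reported above.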
7 Conclusion
References
1. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit
problem. Machine Learning 47, 235–256 (2002)
2. MacKenzie, A., DaSilva, L.: Game Theory for Wireless Engineers. Morgan and Clay-
pool Publishers (2006)
3. Altman, E., Boulogne, T., El-Azouzi, R., Jimenez, T., Wynter, L.: A survey on
networking games in telecommunications. Computers and Operations Research 33,
286–311 (2006)
4. Yu, W., Ginis, G., Cioffi, J.M.: Distributed Multiuser Power Control for Digital
Subscriber Lines. IEEE Jour. on Selected Areas in Communications 20(5), 1105–
1115 (2002)
5. Mertikopoulos, P., Belmega, E.V., Moustakas, A.L., Lasaulce, S.: Distributed Learn-
ing Policies for Power Allocation in Multiple Access Channels. IEEE Journal on
Selected Areas in Communications 30(1), 96–106 (2012)
6. Laourine, A., Tong, L.: Betting on Gilbert-Elliott Channels. IEEE Transactions on
Wireless Communications 50(3), 484–494 (2010)
7. Wu, Y., Krishnamachari, B.: Online Learning to Optimize Transmission over an
Unknown Gilbert-Elliott Channel. WiOpt (May 2012)
8. Daskalakis, C., Frongillo, R., Papadimitriou, C.H., Pierrakos, G., Valiant, G.: On
Learning Algorithms for Nash Equilibria. In: Kontogiannis, S., Koutsoupias, E.,
Spirakis, P.G. (eds.) SAGT 2010. LNCS, vol. 6386, pp. 114–125. Springer, Heidelberg
(2010)
Convergence Dynamics of Graphical Congestion
Games
1 Introduction
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 31–46, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
32 R. Southwell et al.
feature called the finite improvement property, which means that if the players
keep performing asynchronous better response updates (i.e., the players improve
their strategy choices one at a time), then the system will eventually reach a pure
Nash equilibrium: a strategy profile from which no player has any incentive
to deviate. Intuitively, the finite improvement property means that greedy updating
always converges to a stable strategy profile.
The generality and pleasing convergence properties of the original congestion
game model have led to its application to a wide range of resource allocation
scenarios (e.g., economics [2], communication networks [3–6], network routing [7],
network formation [8], ecology [9], and sociology [10]). However, the original
model has the limitation that each player using the same resource gets the same
payoff from it. Treating players identically is unsuitable for many scenarios in
ecology [11], network routing [12], and wireless networks [13], where players are
heterogeneous. This has motivated many adaptations of the original congestion
game, including congestion games with player-specific payoff functions [4,14] and
weighted congestion games [16].
In [17], we considered the graphical congestion game (see Figure 1), an
important generalization of the original congestion game concept. This model not
only allows player-specific payoff functions but also models how the spatial
positioning of the players affects their performance in the game. In the original
congestion game model, any two users of the same resource cause congestion
to each other. In the graphical congestion game, we regard the players as
vertices in a conflict graph. Only linked players cause congestion to each
other. Unlinked players can use the same resource without causing congestion
to each other. We describe some scenarios that can be modeled using graphical
congestion games in Table 1.
Fig. 1. A strategy profile in a graphical congestion game. The players (i.e., the vertices
on the graph) select sets of resources to use. Player 1 is using resources 1 and 2. The
amount of payoff a player gains from using a particular resource is a non-increasing
function of the number of its neighbors who are also using that resource.
Table 1. How graphical congestion games can be used to model various resource
sharing scenarios
A strategy profile X ∈ Π_{n=1}^{N} ζn consists of a strategy (i.e., a collection of
resources) Xn ∈ ζn for each player n ∈ N.

We define the congestion level c_n^r(X) of resource r ∈ R for player n ∈ N
within strategy profile X to be c_n^r(X) = |{n′ ∈ N : {n, n′} ∈ E, r ∈ X_{n′}}|. In
other words, c_n^r(X) is the number of neighbors that n has in the conflict graph G
which are using resource r in strategy profile X. The total payoff that a player
n gets in strategy profile X is

Σ_{r ∈ Xn} f_n^r(c_n^r(X)).

This is the sum of the payoffs f_n^r(c_n^r(X)) that n receives from each of the
resources r within the resource set Xn that n chooses.
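These definitions translate directly into code. The data structures below (a dict from player to chosen resource set, an edge set of frozensets, and per-player, per-resource payoff functions) are our own choices:

```python
def congestion(n, r, X, edges):
    """c_n^r(X): the number of n's neighbors in the conflict graph using r."""
    return sum(1 for m in X if m != n and frozenset((n, m)) in edges and r in X[m])

def total_payoff(n, X, edges, f):
    """Sum of f[n][r](c_n^r(X)) over the resources r in n's chosen set X[n]."""
    return sum(f[n][r](congestion(n, r, X, edges)) for r in X[n])

# Two linked players sharing resource 1: each sees congestion 1 on it.
X = {1: {1}, 2: {1}}
edges = {frozenset((1, 2))}
f = {n: {1: (lambda x: 1 - x)} for n in (1, 2)}
print(total_payoff(1, X, edges, f))  # → 0
```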
We say that [n] → S is a best response update if it improves player n’s payoff
to the maximum possible value among all better responses from the current
strategy profile.
We say that [n] → S is a lazy best response update [16] if (a) [n] → S is a
best response update, and (b) for any other best response update [n] → S′ that
n could perform, we have |Xn − S′| + |S′ − Xn| ≥ |Xn − S| + |S − Xn|. In other
words, a lazy best response update is a best response update which minimizes
the number |Xn − S| + |S − Xn| of resources which n must add to or remove from
its currently chosen resource set Xn.
We say a strategy profile X ∈ Π_{n=1}^{N} ζn is a pure Nash equilibrium³ if and only
if no better response updates can be performed by any player from X.
We give an illustrative example of such a graphical congestion game in Figure
1. Suppose that the collections of available resources for the four players/vertices
³ We always suppose players use pure strategies, and so all of the Nash equilibria that
we discuss are pure.
are ζ1 = 2^{1,2,3}, ζ2 = ζ4 = {∅, {1}}, and ζ3 = {∅, {2}}. Assume that the payoff
functions are f_n^r(x) = 1 − x for each player n and resource r. In the strategy
profile X shown in Figure 1, player 1 uses strategy X1 = {1, 2} and receives a
total payoff of f_1^1(c_1^1(X)) + f_1^2(c_1^2(X)) = (1 − 2) + (1 − 1). From this strategy
profile, player 1 could perform the better response update [1] → {2} (which is
not a best response update), or the best response update [1] → {2, 3} (which is
not a lazy best response update), or the lazy best response update [1] → {3}
(which leads to a pure Nash equilibrium).
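The example's updates can be checked numerically. The sketch below assumes, consistently with the congestion counts above, that players 2, 3, and 4 are all neighbors of player 1 (the full conflict graph of Figure 1 is not reproduced in the text):

```python
edges = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4)]}  # assumed conflict graph
X = {1: {1, 2}, 2: {1}, 3: {2}, 4: {1}}
f = lambda x: 1 - x

def payoff(n, S):
    """Payoff player n would get from choosing S, with the others' strategies fixed."""
    cong = lambda r: sum(1 for m in X if m != n
                         and frozenset((n, m)) in edges and r in X[m])
    return sum(f(cong(r)) for r in S)

current = payoff(1, X[1])              # (1 - 2) + (1 - 1) = -1
assert payoff(1, {2}) > current        # [1] -> {2} is a better response
assert payoff(1, {2, 3}) == payoff(1, {3}) == 1   # both attain the best value
```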
We are interested in how graphical congestion games evolve when the players
keep performing better response updates. Nash equilibria are the fixed points
of such dynamics, since no player has any incentive to deviate from a Nash
equilibrium.
We can order the properties a congestion game might possess by ascending
strength/desirability as follows:

This paper is mainly concerned with identifying conditions under which
generalized graphical congestion games have properties 1, 2, and 3. It should be
noted that property 3 implies property 2, which in turn implies property 1.
However, it is possible to construct games with only a subset (or none) of the
above properties.
Graphical congestion games were introduced in [19], where the authors considered
linear and non-player-specific payoff functions. Such games were proved to
have the finite improvement property when the graph is undirected or acyclic,
but the authors exhibited a game on a directed graph with no pure Nash
equilibrium. In [20], players are assigned different weights, so they suffer more
congestion from "heavier" neighbors. Both [19] and [20] restricted their attention
to "singleton games" (where each player uses exactly one resource at any given
time) with linear and non-player-specific payoff functions.
In [17], the authors introduced the more general graphical congestion game
model as described in Section 1.1 to model spectrum sharing in wireless net-
works (see Table 1). The model allows generic player-specific payoff functions,
as wireless users often have complicated and heterogeneous responses to the
received interference. The authors showed that every singleton graphical con-
gestion game with two resources has the finite improvement property. They also
gave an example of a singleton graphical congestion game (with player-specific
and resource-specific payoff functions) which does not possess any pure Nash
equilibria. In [13], we extended this work by showing that every singleton
graphical congestion game with homogenous resources (i.e., the payoff functions
are not resource-specific) converges to a pure Nash equilibrium in polynomial
time.
In [15], the authors investigated the existence of pure Nash equilibria in
spatial spectrum sharing games on general interference graphs, especially when
Aloha and random backoff mechanisms are used for channel contention. They
also proposed an efficient distributed spectrum sharing mechanism based on
distributed learning.
the organisms will be able to access multiple food sources, but they will not be
able to access more than a certain number of food sources because of limited
time and energy. In wireless networks, users can divide their transmission power
among many channels, however they cannot access too many channels because
their total power is limited.5 In market sharing games (e.g., [23]), each player
has a fixed budget they can spend upon serving markets. When the cost of serv-
ing each market is the same, this corresponds to a uniform matroid congestion
game because the number of markets a player can serve is capped. Linked
players in a matroid graphical congestion game could represent businesses who are
close enough to compete for the same customers. As [16] noted, some network
formation games correspond to congestion games with a matroid structure. For
example, in [22] the authors considered the game where players select spanning
trees of a graph, but suffer congestion from sharing edges with other players.
In such scenarios, the conflict graph could represent which players are able to
observe each others’ actions.
In Section 2, we consider the properties of an important special type of matroid
graphical congestion game, the powerset graphical congestion game, in which
the collection of available resource sets ζn of each player n is a powerset ζn = 2^{Qn}
for some subset Qn ⊆ R of available resources. In Section 3, we investigate the
properties of more general matroid graphical congestion games. Our main results
are listed below (and illustrated in Figure 2);
Our main result is Theorem 4, because it identifies a very general class of games
with pleasing convergence properties. This result is especially meaningful for
wireless networks, because wireless channels often have equal bandwidth, which
means that they correspond to homogenous resources (under flat fading or in-
terleaved channelization). The way we prove this convergence result is to define
a potential function which decreases whenever a player performs a lazy best
response update. The existence of such a function guarantees that lazy best
response updates eventually lead to a fixed point (a pure Nash equilibrium).
Due to limited space, we refer the reader to our online technical
report [24] for the full proofs of most results in this paper.
⁵ In reality, when a user shares its power among many channels, the benefit it
receives from using each one is diminished. Our game model does not capture this
effect; however, other models that do [18] are often analytically intractable.
[Fig. 2: relationships among the classes Matroid GCG, Powerset GCG, Matroid homo-resource GCG, and Powerset homo-resource GCG.]
Fig. 2. In both Powerset GCG and Matroid homo-resource GCG, lazy best
response update converges to pure Nash equilibria in polynomial time. However, even
in the intersection class of Powerset homo-resource GCG, there exist examples
where better response update may never converge to pure Nash equilibria.
Fig. 3. A cycle in the best response dynamics of the powerset graphical congestion
game discussed in the proof of Theorem 1. The arrows represent how the strategy
profile changes with certain better response updates. Better response updating cannot
be guaranteed to drive this game into a pure Nash equilibrium because better response
updating can lead to cycles.
Notice that the example game in the proof of Theorem 1 is played on a complete
graph and has homogenous resources. Thus the lack of the finite improvement
property is not due to any special property of the graph or the resources.
Theorem 1 seems quite negative. However, as we shall see, the players can often
be guaranteed to reach pure Nash equilibria if they update their resources in
special ways (instead of unregulated asynchronous updates). Before we describe
this in more detail, let us introduce some tools that will be useful throughout
our analysis: beneficial pickups, beneficial drops, and the temperature function.
T_N^→[f] = N if f(x) > 0 for all x ∈ {0, ..., N − 1}, and
T_N^→[f] = min{x ∈ {0, ..., N − 1} : f(x) ≤ 0} otherwise.
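In code (our naming), the right-threshold of a non-increasing function f with respect to N is:

```python
def right_threshold(f, N):
    """T_N^->[f]: the smallest x in {0, ..., N-1} with f(x) <= 0, or N if f is
    positive on the whole range."""
    for x in range(N):
        if f(x) <= 0:
            return x
    return N
```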
Lemma 1. Suppose T_N^←[f] and T_N^→[f] are the left-threshold and right-threshold
values of the non-increasing function f (with respect to N); then for any x ∈
{0, ..., N − 1},
⁶ The temperature function is not always a potential function, because it may not
decrease when certain better response updates are performed in certain cases.
Lemma 1 can be proved using basic facts about non-increasing functions. With
this lemma in place we shall define the temperature function.
The temperature function Θ associated with an N-player graphical congestion
game g is defined as

Θ(X) = Σ_{n ∈ N} Σ_{r ∈ Xn} (c_n^r(X) − T_N^←[f_n^r] − T_N^→[f_n^r]).
In many types of graphical congestion game, the temperature function always de-
creases with lazy best response updates. Now we will show that the temperature
function decreases every time a player performs a beneficial pickup or drop.
Lemma 2 can be proved using Lemma 1 together with the fact that f_n^a(c_n^a(X)) >
0 whenever [n] → Xn ∪ {a} is a beneficial pickup.

Lemma 3 can be proved using Lemma 1 together with the fact that f_n^b(c_n^b(X)) <
0 whenever [n] → Xn − {b} is a beneficial drop.
The temperature function clearly takes integer values. Another crucial feature
of the temperature function is that it is bounded both above and below.
Lemma 5 characterizes the relationship between the lazy best response and the
beneficial pickups and drops.
We know from Lemmas 2 and 3 that beneficial pickups and drops decrease the
temperature function. Hence Lemma 5 essentially shows that the temperature
function is a potential function, which decreases by integer steps when a powerset
graphical congestion game evolves via lazy best response updates.
Sketch of Proof. Since each beneficial pickup or drop decreases the temperature
function Θ by at least one (Lemmas 2 and 3), and each lazy best response
update can be decomposed into beneficial pickups and drops (Lemma 5), each
lazy best response update decreases the temperature function by at least one.
Since the temperature function is bounded above by RN² and below by
R(N − 2N²) (Lemma 4), no more than RN² − R(N − 2N²) = R(3N² − N)
lazy best response updates can be performed starting from any strategy profile.
When no more lazy best response updates can be performed, we have reached a
pure Nash equilibrium.
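The counting argument above is easy to exercise numerically for powerset games, where a lazy best response takes exactly the resources with positive payoff while retaining currently held zero-payoff ones. The simulation below is entirely our own construction over random instances:

```python
import random

def lazy_dynamics(N, R, seed=0):
    """Run lazy best response updates in a random powerset graphical congestion
    game until no player can improve; return the number of updates performed."""
    rng = random.Random(seed)
    edges = {frozenset((i, j)) for i in range(N)
             for j in range(i + 1, N) if rng.random() < 0.5}
    a = [[rng.randint(0, N) for _ in range(R)] for _ in range(N)]
    f = lambda n, r, x: a[n][r] - x            # non-increasing payoff functions
    X = [set(rng.sample(range(R), rng.randint(0, R))) for _ in range(N)]
    cong = lambda n, r: sum(1 for m in range(N) if m != n
                            and frozenset((n, m)) in edges and r in X[m])
    steps = 0
    while True:
        moved = False
        for n in range(N):
            # Lazy best response in a powerset game: take every resource with
            # positive payoff, and keep currently held zero-payoff resources.
            S = {r for r in range(R)
                 if f(n, r, cong(n, r)) > 0
                 or (r in X[n] and f(n, r, cong(n, r)) == 0)}
            if (sum(f(n, r, cong(n, r)) for r in S)
                    > sum(f(n, r, cong(n, r)) for r in X[n])):
                X[n], steps, moved = S, steps + 1, True
        if not moved:
            return steps

steps = lazy_dynamics(N=6, R=4, seed=1)
assert steps <= 4 * (3 * 6**2 - 6)   # at most R(3N^2 - N) lazy updates
```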
Theorem 3. There exist matroid graphical congestion games which do not pos-
sess a pure Nash equilibrium.
We say a graphical congestion game g = (N, R, (ζn)_{n∈N}, (f_n^r)_{n∈N, r∈R}, G) has
homogenous resources when the payoff functions are not resource-specific (i.e.,
f_n^1(x) = f_n^2(x) = · · · = f_n^R(x) = f_n(x), ∀n ∈ N, ∀x). Note that different
players can have different payoff functions. When discussing resource-homogenous
games, we often suppress the superscript on the payoff functions, writing f_n^r(x)
as f_n(x) to represent the fact that the payoff functions do not depend on the
resources.
We will show that a matroid graphical congestion game with homogenous
resources will reach a pure Nash equilibrium in polynomial time if the players
perform lazy best response updates. We prove this result with the help of the
temperature function. Before we do this, we must introduce a third type of
elementary update operation, the beneficial swap, which is a better response
update [n] → (Xn ∪ {a}) − {b} where a ∉ Xn and b ∈ Xn (i.e., a beneficial
swap is where a player stops using a resource b, starts using a resource a,
and benefits as a result).
Our next result states that in any graphical congestion game with homogenous
resources (but not necessarily with matroid structure), a beneficial swap will
decrease the temperature function Θ by at least one.
Lemma 6 follows from the fact that if [n] → (Xn ∪ {a}) − {b} is a beneficial swap
and the resources are homogenous, then c_n^a(X) < c_n^b(X).
Lemmas 2, 3, and 6 together imply that any beneficial pickup, drop, or swap
in a graphical congestion game with homogenous resources will decrease the
temperature function. Next we will show that if the strategy sets ζn ’s of the
game are matroids, then it is always possible to perform a beneficial pickup,
drop, or swap from a non-equilibrium state. In particular we will show that
each lazy best response update in a matroid graphical congestion game with
homogenous resources can be decomposed into a sequence of beneficial pickups,
drops, and/or swaps. The following three lemmas will allow us to achieve this
goal.
Lemma 7. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn | < |S|, then there exists a ∈ S − Xn such that [n] → Xn ∪ {a}
is a beneficial pickup that player n can perform from X.
Lemma 8. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn| > |S|, then there exists b ∈ Xn − S such that [n] → Xn − {b}
is a beneficial drop that player n can perform from X.
Lemma 9. If [n] → S is a lazy best response update that can be performed from
a strategy profile X of a matroid graphical congestion game with homogenous
resources and |Xn| = |S|, then there exist a ∈ S − Xn and b ∈ Xn − S such
that [n] → (Xn ∪ {a}) − {b} is a beneficial swap that player n can perform from
X.
Lemmas 7 and 8 can be shown using basic matroid properties. Our proof of
Lemma 9 uses a more sophisticated result about matroids from [21]. With
Lemmas 7, 8, and 9, we can prove the following main result of this paper.
Sketch of Proof. Since each beneficial pickup, drop, or swap decreases the
temperature function Θ by at least one (Lemmas 2, 3, and 6), and each lazy best
response update can be decomposed into beneficial pickups, drops, or swaps (as
can be proved inductively using Lemmas 7, 8, and 9), each lazy best response
update decreases the temperature function by at least one. Since the temperature
function is bounded above by RN² and below by R(N − 2N²) (Lemma 4),
no more than RN² − R(N − 2N²) = R(3N² − N) lazy best response updates
can be performed starting from any strategy profile. When no more lazy best
response updates can be performed, we have reached a pure Nash equilibrium.
4 Conclusion
We have derived many results which are useful for understanding when graphical
congestion games converge to pure Nash equilibria. Theorem 1 is quite negative,
because it implies the existence of games with simple features (players that can
use any combination of resources, and homogenous resources) which cannot
be guaranteed to converge to pure Nash equilibria under generic better response
updating. However, Theorems 2 and 4 imply that in many cases (powerset games,
or matroid games with homogenous resources) the players do converge to pure
Nash equilibria under lazy best response updating. These results are very
encouraging, because they imply that spatially distributed individuals will quickly
be able to organize themselves into a pure Nash equilibrium in a wide range of
scenarios, as long as the players are rational enough to restrict themselves
to lazy best response updates. We obtained our convergence results by breaking
better response updates into more elementary operations, and observing how
these operations alter the value of the temperature function we defined. In the
future, we will use these results to study the convergence dynamics of more
general games, where players have generic collections of available resource sets.
46 R. Southwell et al.
Establishing Network Reputation
via Mechanism Design
1 Introduction
This paper studies the following mechanism design problem: in a distributed
multi-agent system where each agent holds beliefs (or perceptions) about the
others, while the truth about an agent is known only to that agent itself, which
may have an interest in withholding it, how can one construct mechanisms with
the proper incentives for agents to participate in a collective effort to arrive at
correct perceptions of all participants without violating privacy and self-interest?
Our main motivation lies in the desire to enhance network security through
establishing the right quantitative assessment of the overall security posture of
different networks at a global level; such a quantitative measure can then be used
to construct sophisticated security policies that are proactive in nature, which are
distinctly different from current solutions that typically tackle specific security
problems. Such quantitative measure can also provide guidance to networks’
human operators to more appropriately allocate resources in prioritizing tasks –
after all, the health of a network is very much a function of the due diligence of
its human administrators.
The work is partially supported by the NSF under grant CIF-0910765 and CNS-
121768, and the U.S. Department of Commerce, National Institute of Standards
and Technology (NIST) Technology Innovation Program (TIP) under Cooperative
Agreement Number 70NANB9H9008.
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 47–62, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
48 P. Naghizadeh Ardabili and M. Liu
Ni using a certain mechanism with the above inputs collected from the net-
works. This index/estimate will then be used by peer networks to regulate their
interactions with Ni .
2.2 Assumptions
We assume that each network Ni is aware of its own conditions and therefore
knows rii precisely, but this is in general its private information. While it is
technically feasible for any network to obtain rii by closely monitoring its own
hosts and traffic, this is by no means always done in practice, for reasons such
as resource constraints.
We also assume that a network Ni can sufficiently monitor inbound traffic
from network Nj so as to form an estimate of Nj ’s condition, denoted by Rij ,
based on its observations. However, Ni ’s observation is in general an incomplete
view of Nj , and may contain error depending on the monitoring and estimation
technique used. We will thus assume that Rij is described by a Normal distri-
bution N(μij, σij²), which may be unbiased (μij = rjj) or biased (μij ≠ rjj).
We will further assume that this distribution is known to network Nj (a relax-
ation of this assumption is also considered later). The reason for this assumption
is that Nj can closely monitor its outbound traffic to Ni , and therefore may suf-
ficiently infer how it is perceived by Ni. On the other hand, Ni itself may or
may not be aware of the distribution N(μij, σij²).
A reputation mechanism specifies a method used by the reputation agent to
compute the reputation indices, i.e., how the input reports are used to generate
output estimates. We assume the mechanism is common knowledge among all
K participating networks.
A participating network Ni ’s objective is assumed to be characterized by the
following two elements: (1) it wishes to obtain from the system as accurate as
possible a reputation estimate r̂j on networks Nj other than itself, and (2) it
wishes to obtain as high as possible an estimated reputation r̂i on itself. It must
therefore report to the reputation agent a carefully chosen (Xij )j∈K , using its
private information rii , its knowledge of the distributions (Rji )j∈K\i , and its
knowledge of the mechanism, to increase (or inflate) as much as possible r̂i
while keeping r̂j close to rjj . The reason for adopting the above assumption is
because, as pointed out earlier, accurate assessment of other networks’ security
posture can help a network configure its policies appropriately, and thus correct
perception of other networks is critical. On the other hand, a network has an
interest in inflating its own reputation so as to achieve better visibility and less
traffic blocked by other networks, etc. Note that these two elements do not fully
define a network’s preference model (or utility function). We are simply assuming
that a network’s preference is increasing in the accuracy of others’ reputation
estimate and increasing in its own reputation estimate, and that this is public
knowledge.¹
¹ How the preference increases with these estimates and how these two elements
are weighed remain the network's private information and do not factor into the
present analysis.
Note also that the objective assumed above may not capture the nature of a
malicious network, which may or may not care about the estimated perceptions
about itself and others. Recall that our basic intent in this work is to
provide reputation estimate as a quantitative measure so that networks may
adopt and develop better security policies and be incentivized to improve their
security posture through a variety of tools they already have. Malicious networks
are not expected to react in this manner. On the other hand, it must be admitted
that their participation in this reputation system, which cannot be ruled out
since malicious intent may not be known a priori, can very well lead to skewed
estimates, thereby rendering the system less than useful. The hope is that a
critical mass of non-malicious networks will outweigh this effect, but this needs
to be more precisely established and is an important subject of future study.
final estimates. One might argue that a network could potentially improve its
relative position by providing false cross-reports of other networks so as to lower
their reputation indices, i.e., it can make itself look better by comparison. A
close inspection of the situation reveals, however, that there is no clear incentive
for a network to exploit such an indirect effect of its cross-reports either.
One reason is that the proposed reputation system is not a ranking system,
where making other entities look worse would indeed improve the standing of
oneself. The reputation index is a value normalized to the interval [0, 1], a more
or less absolute scale. It is more advisable for a network to tighten its security
measures against all networks with low indices rather than favor the highest-
indexed among them.
But more importantly and perhaps more subtly, badmouthing another net-
work is not necessarily in the best interest of a network. Suppose that after
sending a low cross-report Xij , Ni subsequently receives a low r̂j from the rep-
utation agent. Due to its lack of knowledge of other networks’ cross-reports,
Ni cannot reasonably tell whether this low estimate r̂j is a consequence of its
own low cross-report, or if it is because Nj was observed to be poor(er) by
other networks and thus r̂j is in fact reflecting Nj ’s true reputation (unless a set
of networks collude and jointly target a particular network). This ambiguity is
against Ni ’s interest in obtaining accurate estimates of other networks; therefore
bashing is not a profitable deviation from truthful reporting.
3 A Two-Network Scenario
3.1 The Proposed Mechanism
We start by considering only two networks and extend the result to multiple
networks in the next section. We will examine the following way of computing
the reputation index r̂1 for N1, where ϵ is a fixed and known constant. The
expression for r̂2 is similar, so for the remainder of this section we will only
focus on N1:

    r̂1(X11, X21) = (X11 + X21)/2,         if X11 ∈ [X21 − ϵ, X21 + ϵ],
    r̂1(X11, X21) = X21 − |X11 − X21|,     if X11 ∉ [X21 − ϵ, X21 + ϵ].     (1)
In essence, the reputation agent takes the average of self-report X11 and cross-
report X21 if the two are sufficiently close, or else punishes N1 for reporting
significantly differently. Note that this is only one of many possibilities that
reflect the idea of weighing between averaging and punishing; for instance, we
can also choose to punish only when the self-report is higher than the cross-
report, and so on.
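The rule in (1) can be transcribed directly; the function below is a sketch, with the threshold ϵ passed explicitly as a parameter:

```python
def reputation(x_self, x_cross, eps):
    """Reputation index per Eq. (1): average the self- and cross-report when
    they agree to within eps; otherwise punish the self-reporter by the
    discrepancy, measured from the cross-report."""
    if abs(x_self - x_cross) <= eps:
        return (x_self + x_cross) / 2
    return x_cross - abs(x_self - x_cross)
```

For example, with ϵ = 0.1, reports (0.8, 0.75) fall in the averaging branch and yield 0.775, while an inflated self-report of 0.95 against a cross-report of 0.7 is punished down to 0.45.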
Fig. 1. Solution of (4): y vs. a.   Fig. 2. Errors vs. a, r11 = 0.75, σ² = 0.1.
Fig. 3. Est. reputation vs. a, r11 = 0.75, σ² = 0.1.
As seen in (6), the optimal choice of a does not depend on the specific values of
μ and σ. Therefore, the same mechanism can be used for any set of networks.
Equation (6) can be solved numerically, and is zero at two values: at a = 0,
which indicates a local maximum, and at a ≈ 1.7, where it has a minimum. This
can be seen from Figure 2, which shows the MAE of the proposed mechanism
compared to that of the averaging mechanism. Under the averaging mechanism
the MAE is E[|R21 − r11|] = √(2/π) σ. We see that for a large range of a values the
mechanism given in (1) results in smaller estimation error. This suggests that
N1 ’s self-report can significantly benefit the system as well as all networks other
than N1 .
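The closed form E[|R21 − r11|] = √(2/π) σ for the averaging mechanism is just the mean of a folded Normal distribution, and is easy to check by simulation; the sketch below uses the hypothetical parameter values from the figures (r11 = 0.75, σ² = 0.01, i.e., σ = 0.1):

```python
import math
import random

random.seed(0)
sigma, r11 = 0.1, 0.75

# Unbiased cross-reports R21 ~ N(r11, sigma^2); Monte Carlo estimate of MAE.
samples = [random.gauss(r11, sigma) for _ in range(200_000)]
mae = sum(abs(x - r11) for x in samples) / len(samples)

# Folded-normal mean: E|R21 - r11| = sigma * sqrt(2/pi)
assert abs(mae - sigma * math.sqrt(2 / math.pi)) < 1e-3
```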
⁴ The calculations here are possible if y ≤ 1/2, which based on Figure 1 is a valid
assumption for moderate values of a.
Biased Cross-Report. We now turn to the case where the cross-report X21
comes from the biased distribution N (r11 + b, σ 2 ), where b is the bias term, a
fact unknown to both N2 and the reputation mechanism. We will thus assume
that the mechanism used remains that given by (1) with the optimal value of a
obtained previously.
First consider the case that N1 is also not aware of the bias, and again chooses
X11* = r11 + ayσ. The calculation of the error is the same, leading to (5). However,
here F and f are those of the Normal distribution N (r11 + b, σ 2 ). Therefore, the
new minimum error and the value of a where it occurs are different. Figure 4
shows the MAE for three different values of the bias. As seen from the figure,
the error increases for b = −0.1σ, and decreases for b = 0.1σ compared to the
unbiased case. This is because for the negative bias, N1 is not adapting its self-
advertised reputation accordingly. This makes the mechanism operate mainly
in the punishment phase, which introduces larger errors. For the small positive
bias, however, the mechanism works mainly in the averaging phase, and the error
is smaller than in both the negatively biased and the unbiased cases. The latter
follows from the fact
that punishment phases happen more often in the unbiased case. Note however
that for larger values of positive bias, the error will eventually exceed that of the
unbiased case.
Next we consider the case where X21 ∼ N(r11 + b, σ²) as before, but this bias is
known to N1. N1 will accordingly adapt its self-report to be X11* = r11 + b + ayσ.
Figure 5 shows a comparison in this case. The results show that the selected
positive bias increases the error, while the negative bias can decrease the error
compared to the unbiased case.
56 P. Naghizadeh Ardabili and M. Liu
The assumption of a known bias has the following two intuitively appealing
interpretations. The first is where N1 has deliberately sent its traffic through N2
so as to bias the cross-report. As expected, it is in the interest of N1
to introduce a positive bias in N2's evaluation of itself. If this is what N1 chooses
to do then arguably the mechanism has already achieved its goal of improving
networks’ security posture – after all, N2 now sees a healthier and cleaner version
of N1 which is welcomed! The second case is where given the mechanism, N2
knows that N1 will introduce a positive bias in its self-report, and consequently
counteracts by sending a negatively biased version of its observation. To find
the optimal choice for this deliberately introduced bias we proceed as follows.
Define μ := r11 + b. To see how the mean absolute error behaves, we find an
expression for em at any given a.⁵
more cross-reports on the basis of which it will judge Ni. In the simplest case, the
agent can take the average of all the cross-reports to get X0i := (1/(K−1)) Σj∈K\i Xji,
and derive r̂i using:

    r̂i(Xii, X0i) = (Xii + X0i)/2,         if Xii ∈ [X0i − ϵ, X0i + ϵ],
    r̂i(Xii, X0i) = X0i − |Xii − X0i|,     if Xii ∉ [X0i − ϵ, X0i + ϵ],     (9)
with μ′ and σ′² being the mean and variance of X0i. Note that in this case the
reputation agent is using ϵ = aσ′.
If all cross-reports are unbiased, i.e., μji = rii and σji = σ, we have X0i ∼
N(rii, σ²/(K−1)). To find the optimal choice of a we will need to solve (6) again,
with the only difference that σ is replaced by σ′. Therefore, the optimal choice of a,
which is independent of the mean or variance of the reports, will be the same
as before. This result can be verified in Figures 7 and 8, which show the MAE
of collections of 3 and 10 networks respectively. Furthermore, as expected the
error decreases as the number of networks increases in this case.
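The variance reduction from averaging K − 1 unbiased cross-reports (σ′ = σ/√(K−1)) can be checked numerically; the parameter values below are hypothetical and only illustrate the scaling with K:

```python
import math
import random

random.seed(1)
# Unbiased cross-reports X_ji ~ N(r_ii, sigma^2): their simple average X_0i
# has standard deviation sigma' = sigma / sqrt(K - 1), so the cross-report
# average tightens as more networks participate.
r_ii, sigma, trials = 0.75, 0.1, 50_000

def avg_report_std(K):
    """Monte Carlo std of the average of K - 1 unbiased cross-reports."""
    means = [sum(random.gauss(r_ii, sigma) for _ in range(K - 1)) / (K - 1)
             for _ in range(trials)]
    m = sum(means) / trials
    return math.sqrt(sum((x - m) ** 2 for x in means) / trials)

s3, s10 = avg_report_std(3), avg_report_std(10)
assert abs(s3 - sigma / math.sqrt(2)) < 5e-3   # K = 3: sigma' = sigma/sqrt(2)
assert s10 < s3                                 # more networks, smaller error
```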
Non-skewed Bias Distribution. If the bias distribution has zero mean (bji = 0)
and all variance terms are the same, σji = σ and σb,ji = σb, then (10)
simplifies to X0i ∼ N(rii, σ′²), where σ′² = (σ² + σb²)/(K−1). The calculation of
the optimal self-report is given by the same optimization problem as before,
resulting in Xii* = rii + ayσ′. Figures 9 and 10 show the simulation results for K = 3
and K = 10 respectively. As expected, biased cross-reports result in larger error
Fig. 10. MAE, 10 Networks, non-skewed bias distribution.   Fig. 11. MAE, 10
Networks, skewed bias distribution.   Fig. 12. Est. Reputation, 10 Networks, skewed
bias distribution.
compared to unbiased cross-reports: the fact that σ′ is larger than in the unbiased
case allows N1 to introduce a larger inflation in its self-report, thus increasing the
MAE in general.
Skewed Bias Distribution. If we assume that all bias terms are from the
same distribution but this distribution is itself skewed, i.e., B0i ∼ N(b0i, σb²),
then negatively biased cross-reports can result in lower MAE compared to a
non-skewed bias distribution, while positively biased cross-reports can increase
the error. Figure 11 verifies this property of the mechanism in a collection of 10
networks, and for a negative value of b0i .
In all of the above cases, we need the range of a to be such that using the
proposed mechanism is mutually beneficial for the system and the individual
networks. Our numerical results show that, when cross-reports are unbiased,
the range of a for which it is individually rational for a network to participate
does not change as the number of networks increases. Also, this range remains
unchanged if the cross-reports have a non-skewed bias distribution. In the case
of skewed bias distribution a similar behavior as the two-network scenario is
observed, where individual networks have more incentive to participate in the
estimation of their own reputation when there is a positive bias in the cross-
reports, and are less inclined to do so in the presence of a negative bias.
Figure 12 illustrates these results. As seen in the figure, for unbiased cross-
reports, the range for which networks are incentivized to participate is again
roughly a ∈ [2, 2.5] despite the increase in the number of networks. The figure
also shows the effect of a choice of b = −0.1σ for cross-reports with skewed
bias. A careful study of this figure along with Figure 11 indicates that the same
    X0i := (Σj∈K\i wj Xji) / (Σj∈K\i wj).     (11)
In the special case σji = σ, ∀j, the Cauchy-Schwarz inequality implies

    Σj∈K\i wj² / (Σj∈K\i wj)² ≥ 1/(K−1),

with equality at wj = w0, ∀j. This is true independent of the choice of w, and
therefore the weighted average will always have higher estimation error. Figure 13
shows this result for a random choice of the vector w.
Next consider the case where the σji are different. Without loss of generality,
assume that the coefficients are normalized such that they sum to 1. In order to
achieve lower estimation error, we want to choose w such that

    Σj∈K\i wj² σji² ≤ Σj∈K\i σji² / (K−1)².

This rearrangement shows clearly that for the inequality to hold, it suffices to
put more weight on the smaller σji, i.e., more weight on
those with more accurate observations. It follows that if more reputable networks
(higher r̂j ) also have more accurate observations (smaller σji ), then selecting
weights according to existing reputation reduces the estimation error. Figure 14
shows the results for 3 networks when σ31 < σ21 , and the weights are chosen
accordingly to be w = (0.45, 0.55).
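The variance comparison driving this weight choice can be sketched directly; the standard deviations below are hypothetical values consistent with the setup (σ31 < σ21), and the weights are those used in the figure discussion, w = (0.45, 0.55):

```python
def weighted_var(w, sig):
    """Variance of the weighted average X0i = sum(w_j X_ji) with sum(w) = 1,
    for independent cross-reports with standard deviations sig_j."""
    assert abs(sum(w) - 1) < 1e-12
    return sum(wj**2 * sj**2 for wj, sj in zip(w, sig))

sig = (0.20, 0.15)                          # hypothetical: N3 observes more accurately
equal = weighted_var((0.5, 0.5), sig)       # simple average
tilted = weighted_var((0.45, 0.55), sig)    # more weight on the smaller sigma

# Tilting weight toward the more accurate observer lowers the variance
# of X0i, hence the estimation error.
assert tilted < equal
```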
⁶ In fact, using a simple average of cross-reports is a special case of this problem,
obtained by using equal wj and σji.
Fig. 13. MAE, 3 Networks, Weighted Averages, Equal Variances.   Fig. 14. MAE,
3 Networks, Weighted Averages, Different Variances.
Fig. 15. MAE, 3 Networks, Weighted Averages, Skewed Bias.   Fig. 16. Est.
Reputation, 3 Networks, Weighted Averages, Skewed Bias.
Laboratory for Information and Decision Systems, MIT, Cambridge, MA, 02139, USA
{jnt,yunjian}@mit.edu
1 Introduction
In a book on oligopoly theory (see Chapter 2.4 of [6]), Friedman raises an in-
teresting question on the relation between Cournot equilibria and competitive
equilibria: “is the Cournot equilibrium close, in some reasonable sense, to the
competitive equilibrium?” While a competitive equilibrium is generally socially
optimal, a Cournot (Nash) equilibrium can yield arbitrarily high efficiency loss
in general [8]. The concept of efficiency loss is intimately related to the concept
of “price of anarchy,” advanced by Koutsoupias and Papadimitriou in a seminal
paper [11]; it provides a natural measure of the difference between a Cournot
equilibrium and a socially optimal competitive equilibrium.
For Cournot oligopoly with affine demand functions, various efficiency bounds
have been reported in recent works [9,10]. Convex demand functions, such as
the negative exponential and the constant elasticity demand curves, have been
widely used in oligopoly analysis and marketing research [2,4,14]. The efficiency
loss in a Cournot oligopoly with some specific forms of convex inverse demand
functions1 has received some recent attention. For a particular form of convex
This research was supported in part by the National Science Foundation under grant
CMMI-0856063 and by a Graduate Fellowship from Shell.
¹ Since a demand function is generally nonincreasing, the convexity of a demand
function implies that the corresponding inverse demand function is also convex. For
a Cournot oligopoly model with non-concave inverse demand functions, existence
results for Cournot equilibria can be found in [12,1].
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 63–76, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
64 J.N. Tsitsiklis and Y. Xu
inverse demand functions, i.e., p(q) = α − βq^γ, the authors of [3] show that when
γ > 0, the worst case efficiency loss occurs when an efficient supplier has to share
the market with infinitely many inefficient suppliers. The authors of [7] consider
a class of inverse demand functions that solve a certain differential equation (for
example, constant elasticity inverse demand functions belong to this class), and
establish efficiency lower bounds that depend on equilibrium market shares, the
market demand, and the number of suppliers.
For Cournot oligopolies with general convex and nonincreasing demand func-
tions, we establish a lower bound on the efficiency of Cournot equilibria in terms
of a scalar parameter c/d derived from the inverse demand function, namely, the
ratio of the slope of the inverse demand function at the Cournot equilibrium, c,
to the average slope of the inverse demand function between the Cournot equi-
librium and a social optimum, d. For convex and nonincreasing inverse demand
functions, we have c ≥ d; for affine inverse demand functions, we have c/d = 1.
In the latter case, our efficiency bound is f (1) = 2/3, which is consistent with the
bound derived in [9]. More generally, the ratio c/d can be viewed as a measure
of nonlinearity of the inverse demand function.
The rest of the paper is organized as follows. In the next section, we formulate
the model and provide some mathematical preliminaries on Cournot equilibria
that will be useful later, including the fact that efficiency lower bounds can be
obtained by restricting to linear cost functions. In Section 3, we consider affine
inverse demand functions and derive a refined lower bound on the efficiency of
Cournot equilibria that depends on a small amount of ex post information. We
also show this bound to be tight. In Section 4, we consider a more general model,
involving convex inverse demand functions. We show that for convex inverse de-
mand functions, and for the purpose of studying the worst case efficiency loss, it
suffices to restrict to a special class of piecewise linear inverse demand functions.
This leads to the main result of the paper, a lower bound on the efficiency of
Cournot equilibria (Theorem 2). Based on this theorem, in Section 5 we derive a
corollary that provides an efficiency lower bound that can be calculated without
detailed information on Cournot equilibria, and apply it to various commonly
encountered convex inverse demand functions. Finally, in Section 6, we make
some brief concluding remarks. Most proofs are omitted and can be found in an
extended version of the paper [13].
In this section, we first define the Cournot competition model that we study,
and introduce several main assumptions that we will be working with. In Sec-
tion 2.1, we present conditions for a nonnegative vector to be a social optimum or
a Cournot equilibrium. Then, in Section 2.2, we define the efficiency of a Cournot
equilibrium. In Sections 2.3 and 2.4, we derive some properties of Cournot equi-
libria that will be useful later, but which may also be of some independent
interest. For example, we show that the worst case efficiency occurs when the
cost functions are linear.
Efficiency Loss in a Cournot Oligopoly with Convex Market Demand 65
∂−p(X) = ∂+p(X).
Because of the above proposition, when Assumptions 1 and 2 hold and the inverse
demand function is convex, we have the following necessary (and, by definition,
sufficient) conditions for a nonzero vector x to be a Cournot candidate:
    Cn′(xn) = p(X) + xn p′(X),   if xn > 0,
    Cn′(0) ≥ p(X) + xn p′(X),    if xn = 0.          (5)
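As a quick sanity check of the first condition above, consider a hypothetical instance (not from the paper): affine inverse demand p(q) = 1 − q and two suppliers with zero cost. The symmetric Cournot profile xn = 1/3 satisfies the first-order condition exactly:

```python
# Hypothetical instance: p(q) = 1 - q, two suppliers, C_n = 0 (so C_n' = 0).
# At the symmetric Cournot equilibrium x_n = 1/3, condition (5) for x_n > 0
# reads C_n'(x_n) = p(X) + x_n p'(X).
x = (1/3, 1/3)
X = sum(x)                    # aggregate supply, X = 2/3
p = lambda q: 1 - q           # inverse demand
dp = -1                       # p'(X), constant slope of the affine demand

for xn in x:
    # marginal cost (0) equals marginal revenue p(X) + x_n p'(X)
    assert abs(0 - (p(X) + xn * dp)) < 1e-12
```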
Assumption 4. The price at zero supply is larger than the minimum marginal
cost of the suppliers, i.e., p(0) > minn {Cn′(0)}.
Proposition 4. Suppose that Assumptions 1-4 hold. Then, the social welfare
achieved at a Cournot candidate, as well as the optimal social welfare [cf. (1)],
are positive.
We now define the efficiency of a nonnegative vector x as the ratio of the social
welfare that it achieves to the optimal social welfare.
C̄n(x) = αn x, ∀ x ≥ 0.

Then, for the modified model, Assumptions 1-4 still hold, the vector x is a
Cournot candidate, and its efficiency, denoted by γ̄(x), satisfies 0 < γ̄(x) ≤ γ(x).
It is not hard to see that any nonnegative vector x^S that satisfies x1^S + x2^S ≥ 1
is socially optimal; x1^S = x2^S = 1/2 is one such vector. On the other hand, it can
be verified that x1 = x2 = 1 is a Cournot equilibrium. Hence, in this example,
2 = X > X^S = 1.
Proposition 7. Suppose that Assumptions 1-4 hold and that the inverse de-
mand function is convex. Let x and x^S be a Cournot candidate and an optimal
solution to (1), respectively. If p(X) = p(X^S), then p′(X) = 0 and γ(x) = 1.
Proposition 1 shows that all social optima lead to a unique “socially optimal”
price. Combining with Proposition 7, we conclude that if p(·) is convex, a Cournot
candidate is socially optimal if and only if it results in the socially optimal price.
In this section, we argue that the case of concave inverse demand functions is
fundamentally different. For this reason, the study of the concave case would
require a very different line of analysis, and is not considered further in this
paper.
According to Proposition 7, if the inverse demand function is convex and
if the price at a Cournot equilibrium equals the price at a socially optimal
point, then the Cournot equilibrium is socially optimal. For nonconvex inverse
demand functions, this is not necessarily true: a socially optimal price can be
associated with a socially suboptimal Cournot equilibrium, as demonstrated by
the following example.
    p(q) = 1,                        if 0 ≤ q ≤ 1,
    p(q) = max{0, −M(q − 1) + 1},    if 1 < q,
where M > 2. It is not hard to see that the vector (0.5, 0.5) satisfies the opti-
mality conditions in (2), and is therefore socially optimal. We now argue that
(1 − 1/M, 1/M) is a Cournot equilibrium. Given the action x2 = 1/M of supplier
2, any action on the interval [0, 1 − 1/M ] is a best response for supplier 1. Given
the action x1 = 1 − (1/M ) of supplier 1, a simple calculation shows that
The preceding example shows that arbitrarily high efficiency losses are possible,
even if X = X^S. The fact that inefficient allocations can arise even when the
price is the correct one opens up the possibility of substantial inefficiencies that
are hard to bound.
    p(q) = b − aq,   if 0 ≤ q ≤ b/a,
    p(q) = 0,        if b/a < q,     (7)
Theorem 1. Suppose that Assumption 1 holds (convex cost functions), and that
the inverse demand function is affine, of the form (7). Suppose also that b >
min_n {C′n(0)} (Assumption 4). Let x be a Cournot equilibrium, and let αn =
C′n(xn). Let also

    β = aX / (b − min_n {αn}).

(a) If X > b/a, then x is socially optimal. Otherwise:
(b) γ(x) ≥ g(β) = 3β^2 − 4β + 2.
(c) The bound in part (b) is tight. That is, for every β ∈ [1/2, 1) and every ε > 0,
there exists a model with a Cournot equilibrium whose efficiency is no more
than g(β) + ε.
(d) The function g(β) is minimized at β = 2/3 and the worst case efficiency is
2/3.
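Theorem 1 is easy to check numerically on a small instance. The sketch below (an illustration, not the authors' code) computes a Cournot equilibrium by best-response iteration for two suppliers with linear costs Cn(xn) = αn xn under the affine demand (7), and verifies γ(x) ≥ g(β) = 3β^2 − 4β + 2; the marginal costs 0 and 0.5 are illustrative values of our choosing.

```python
# Numerical check of Theorem 1 for affine inverse demand p(q) = b - a*q and
# linear costs C_n(x_n) = alpha_n * x_n (illustrative sketch, not from the paper).
a, b = 1.0, 1.0
alpha = [0.0, 0.5]                 # marginal costs of the two suppliers

def best_response(n, x):
    # Supplier n maximizes x_n * (b - a*(x_n + X_{-n})) - alpha[n] * x_n.
    X_minus = sum(x) - x[n]
    return max(0.0, (b - a * X_minus - alpha[n]) / (2 * a))

x = [0.0] * len(alpha)
for _ in range(2000):              # iterate best responses to a fixed point
    x = [best_response(n, x) for n in range(len(alpha))]

def welfare(x):
    X = min(sum(x), b / a)         # p(q) = 0 beyond b/a
    return b * X - a * X**2 / 2 - sum(alpha[n] * x[n] for n in range(len(alpha)))

# Social optimum: all supply from the cheapest supplier until p(q) = min alpha.
X_S = (b - min(alpha)) / a
w_S = b * X_S - a * X_S**2 / 2 - min(alpha) * X_S

X = sum(x)
beta = a * X / (b - min(alpha))
g = 3 * beta**2 - 4 * beta + 2
gamma = welfare(x) / w_S
assert gamma >= g - 1e-9           # Theorem 1(b)
print(round(beta, 3), round(gamma, 3), round(g, 3))   # prints: 0.5 0.75 0.75
```

Here the inefficient supplier is driven out and the efficient one plays the monopoly quantity, so β = 1/2 and γ(x) = g(1/2) = 3/4, i.e., the bound is attained at this β, consistent with part (c).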
[Plot of g(β) over β ∈ [0.5, 1]: the curve decreases from g(0.5) = 0.75 to its minimum at (2/3, 2/3), then increases to g(1) = 1.]
Fig. 1. A tight lower bound on the efficiency of Cournot equilibria for the case of affine
inverse demand functions
Efficiency Loss in a Cournot Oligopoly with Convex Market Demand 71
The lower bound g(β) is illustrated in Fig. 1. For the special case where all the
cost functions are linear, of the form Cn(xn) = αn xn, Theorem 1 has an interesting
interpretation. We first note that β = X/X^S, which is the ratio of the aggregate
supply at the Cournot equilibrium to that at a social optimum. Clearly, if β
is close to 1 we expect the efficiency loss due to the difference X S − X to be
small. However, efficiency losses may also arise if the total supply at a Cournot
equilibrium is not provided by the most efficient suppliers. Our result shows that,
for the affine case, β can be used to lower bound the total efficiency loss due
to this second factor as well. Somewhat surprisingly, the worst case efficiency
also tends to be somewhat better for low β, that is, when β approaches 1/2, as
compared to intermediate values (β ≈ 2/3).
    p0(q) = −c(q − X) + p(X),                                      if 0 ≤ q ≤ X,     (8)
    p0(q) = max{0, ((p(X^S) − p(X))/(X^S − X))(q − X) + p(X)},     if X < q.
Then, for the modified model, with inverse demand function p0(·), the vector x^S
remains socially optimal, and the efficiency of x, denoted by γ0(x), satisfies
γ0(x) ≤ γ(x).
Proof. Since p(X) ≠ p(X^S), Proposition 6 implies that X < X^S, so that
p0(·) is well defined. Since the necessary and sufficient optimality conditions in
(2) only involve the value of the inverse demand function at X^S, which is
unchanged, the vector x^S remains socially optimal for the modified model.
Let

    A = ∫_0^X p0(q) dq,        B = ∫_X^{X^S} p(q) dq,

and

    C = ∫_X^{X^S} (p0(q) − p(q)) dq,        D = ∫_0^X (p(q) − p0(q)) dq.
[Figure: the curves p(q) and p0(q) against aggregate supply, showing the regions A, C, and D, the Cournot equilibrium at X, and the socially optimal point at X^S.]
Fig. 2. The efficiency of a Cournot equilibrium cannot increase if we replace the inverse
demand function by the piecewise linear function p0 (·). The function p0 (·) is tangent to
the inverse demand function p(·) at the equilibrium point, and connects the Cournot
equilibrium point with the socially optimal point.
Note that unless p(·) happens to be linear on the interval [X, X^S], the function
p0 (·) is not differentiable at X and, according to Proposition 2, x cannot be a
Cournot candidate for the modified model. Nevertheless, p0 (·) can still be used
to derive a lower bound on the efficiency of Cournot candidates in the original
model.
[Plot of f(c/d) over c/d ∈ [1, 50]: the bound starts at (1, 2/3) and decreases monotonically toward 0.]
Fig. 3. Plot of the lower bound on the efficiency of a Cournot equilibrium in a Cournot
oligopoly with convex inverse demand functions, as a function of the ratio c/d
Theorem 2. Suppose that Assumptions 1-4 hold, and that the inverse demand
function is convex. Let x and x^S be a Cournot equilibrium and a solution to (1),
respectively. Then, the following hold.
(a) If p(X) = p(X^S), then γ(x) = 1.
(b) If p(X) ≠ p(X^S), let c = |p′(X)|, d = |(p(X^S) − p(X))/(X^S − X)|, and
c̄ = c/d. We have c̄ ≥ 1 and

    1 > γ(x) ≥ f(c̄) = (φ^2 + 2) / (φ^2 + 2φ + c̄),     (9)

where

    φ = max{ (2 − c̄ + √(c̄^2 − 4c̄ + 12)) / 2, 1 }.
Remark 1. We do not know whether the lower bound in Theorem 2 is tight. The
difficulty in proving tightness is due to the fact that the vector x need not be a
Cournot equilibrium in the modified model.
The lower bound established in part (b) is depicted in Fig. 3. If p(·) is affine,
then c̄ = c/d = 1. From (9), it can be verified that f(1) = 2/3, which agrees
with the lower bound in [9] for the affine case. We note that the lower bound
f(c̄) is monotonically decreasing in c̄ over the domain [1, ∞). When c̄ ∈ [1, 3),
φ is at least 1 and monotonically decreasing in c̄. When c̄ ≥ 3, φ = 1.
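The bound (9) is straightforward to evaluate. A small sketch (illustrative) computing φ and f(c̄), confirming f(1) = 2/3 and the monotone decrease over [1, ∞):

```python
import math

def phi(cbar):
    # phi = max{ (2 - cbar + sqrt(cbar^2 - 4*cbar + 12)) / 2, 1 }
    return max((2 - cbar + math.sqrt(cbar**2 - 4 * cbar + 12)) / 2, 1.0)

def f(cbar):
    # Lower bound (9) on the efficiency of a Cournot equilibrium.
    p = phi(cbar)
    return (p**2 + 2) / (p**2 + 2 * p + cbar)

assert abs(f(1.0) - 2 / 3) < 1e-12          # agrees with the affine case
assert phi(3.0) == 1.0                      # phi = 1 once cbar >= 3
values = [f(c) for c in (1, 1.5, 2, 3, 5, 10, 50)]
assert all(u > v for u, v in zip(values, values[1:]))  # monotonically decreasing
```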
Corollary 1. Suppose that Assumptions 1-4 hold and that p(·) is convex. Let

    s = inf{q | p(q) = min_n C′n(0)},   t = inf{q | min_n C′n(q) ≥ p(q) + q ∂+p(q)}.     (10)

If ∂−p(s) < 0, then the efficiency of a Cournot candidate is at least
f(∂+p(t)/∂−p(s)).
Note that if there exists a “best” supplier n such that C′n(x) ≤ C′m(x) for any
other supplier m and any x > 0, then the parameters s and t depend only on
p(·) and C′n(·).
Now we argue that the efficiency lower bound (12) holds even without the as-
sumption that there is a best supplier associated with a linear cost function. From
Proposition 5, the efficiency of any Cournot equilibrium x will not increase if
the cost function of each supplier n is replaced by
Let c = min_n {C′n(xn)}. Since the efficiency lower bound in (12) holds for the
modified model with linear cost functions, it applies whenever the inverse
demand function is of the form (11).
Example 5. Suppose that Assumptions 1, 3, and 4 hold, and that there is a
best supplier, whose cost function is linear with a slope c ≥ 0. Consider inverse
demand functions of the form (cf. Eq. (5) in [2])
where α and β are positive constants. Note that if δ = 1, then p(·) is affine; if
0 < δ ≤ 1, then p(·) is convex. Assumption 4 implies that α > c. Through a
simple calculation we have

    s = ((α − c)/β)^{1/δ},   t = ((α − c)/(β(δ + 1)))^{1/δ}.
From Corollary 1 we know that for every Cournot equilibrium x,

    γ(x) ≥ f( (−βδ t^{δ−1}) / (−βδ s^{δ−1}) ) = f( (δ + 1)^{(1−δ)/δ} ).
Using the argument in Example 4, we conclude that this lower bound also applies
to the case of general convex cost functions.
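The closed forms in Example 5 can be sanity-checked numerically. The inverse demand function itself is elided above; the sketch assumes the form p(q) = α − βq^δ, which is consistent with the displayed expressions for s and t, and the parameter values are purely illustrative.

```python
# Example 5 quantities, assuming p(q) = alpha - beta*q**delta (the demand form
# is inferred from the displayed s and t; parameter values are illustrative).
alpha, beta, delta, c = 2.0, 1.0, 0.5, 0.5

s = ((alpha - c) / beta) ** (1 / delta)                 # solves p(s) = c
t = ((alpha - c) / (beta * (delta + 1))) ** (1 / delta)

def dp(q):
    # p'(q) = -beta * delta * q**(delta - 1)
    return -beta * delta * q ** (delta - 1)

assert abs((alpha - beta * s**delta) - c) < 1e-12       # p(s) = c indeed
ratio = dp(t) / dp(s)                                   # argument of f in the bound
closed_form = (delta + 1) ** ((1 - delta) / delta)
assert abs(ratio - closed_form) < 1e-9
print(s, t, ratio)                                      # s = 2.25, t = 1.0, ratio ≈ 1.5
```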
6 Conclusion
It is well known that Cournot oligopoly can yield arbitrarily high efficiency
loss in general; for details, see [8]. For Cournot oligopoly with convex market
demand and cost functions, results such as those provided in Theorem 2 show
that the efficiency loss of a Cournot equilibrium can be bounded away from
zero by a function of a scalar parameter that captures quantitative properties of
the inverse demand function. With additional information on the cost functions,
the efficiency lower bounds can be further refined. Our results apply to various
convex inverse demand functions that have been considered in the economics
literature.
References
1. Amir, R.: Cournot oligopoly and the theory of supermodular games. Games Econ.
Behav. 15, 132–148 (1996)
2. Bulow, J., Pfleiderer, P.: A note on the effect of cost changes on prices. J. Political
Econ. 91(1), 182–185 (1983)
3. Corchon, L.C.: Welfare losses under Cournot competition. International J. of In-
dustrial Organization 26(5), 1120–1131 (2008)
4. Fabinger, M., Weyl, G.: Apt Demand: A flexible, tractable adjustable-pass-through
class of demand functions (2009),
http://isites.harvard.edu/fs/docs/icb.topic482110.files/Fabinger.pdf
1 Introduction
In typical wireless communication networks, the bandwidth is shared by several
users. Medium Access Control (MAC) schemes are used to manage the access of
users to the shared channels. The slotted ALOHA access protocol is popular due
to its simple implementation and random-access nature [1]. In each time-slot, a
user may access a shared channel according to a specific transmission probability.
Transmission is successful only if a single user tries to access a shared channel
in a given time-slot. If more than one user transmits at the same time slot over
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 77–87, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
78 K. Cohen, A. Leshem, and E. Zehavi
the same channel, a collision occurs. Here, we examine the ALOHA protocol
with multi-channel systems, dubbed multi-channel ALOHA. In multi-channel
systems, the bandwidth is divided into K orthogonal sub-bands using Orthog-
onal Frequency Division Multiple Access (OFDMA). Each sub-band can be a
cluster of multiple carriers. A diversity of channel realizations is advantageous
when users exploit local CSI to access good channels. Multi-channel systems
have been widely investigated recently in cognitive radio networks, where cognitive
users share an unlicensed spectrum band while avoiding interference with licensed
users. Related work on this subject can be found in [2–6].
In distributed optimization algorithms, users take autonomous decisions based
on local information, and coordination or message passing between users is not
required. Therefore, in wireless networks, distributed optimization algorithms
are simple to implement and generally preferred over centralized solutions. A
natural framework to analyze distributed optimization algorithms in wireless
networks is non-cooperative game theory. Related work on this subject can be
found in [7–12].
In this paper we present a game theoretic approach to the problem of dis-
tributed rate maximization of multi-channel ALOHA networks. In the multi-
channel ALOHA protocol, each user tries to randomly access a channel using a
probability vector defining the access probability to the various channels. First,
we characterize the Nash Equilibrium Points (NEPs) of the network when users
solve the unconstrained rate maximization. We show that in this case, for any
NEP, each user’s probability vector is a standard unit vector (i.e., each user occu-
pies a single channel with probability one and does not try to access other chan-
nels). When considering the unconstrained rate maximization, we are mainly
interested in the case where the number of channels is greater than or equal to the
number of users, to avoid collisions. Specifically, in the case where the number of
users, N, is equal to the number of channels, there are N! NEPs. However, when
the number of users is much larger than the number of channels, most users get
a zero utility (due to collisions). To overcome this problem we propose to limit
each user’s total access probability and solve the problem under a total prob-
ability constraint. We characterize the NEPs when user rates are subject to a
total transmission probability constraint. We propose a simple best-response al-
gorithm that solves the constrained rate maximization, where each user updates
its strategy using its local CSI and by monitoring the channel utilization. We
prove that the constrained rate maximization can be formulated as an exact
potential game [13]. In potential games, the incentive of all players to change their
strategy can be expressed as a single global function, the potential function. The
existence of a bounded potential function corresponding to the constrained rate
maximization problem implies that the convergence of the proposed algorithm
is guaranteed. Furthermore, the convergence is in finite time, starting from any
point and using any updating dynamics across users.
The rest of this paper is organized as follows. In Section 2 we present
the network model and game formulation. In Sections 3 and 4 we discuss the
unconstrained and the constrained rate maximization problems, respectively. In
A Game Theoretic Optimization of the Multi-channel ALOHA Protocol 79
and the collision-free rate matrix of all N users in all K + 1 channels is given by:

    U ≜ [ u1(0)  u1(1)  u1(2)  · · ·  u1(K)
          u2(0)  u2(1)  u2(2)  · · ·  u2(K)
           :
          uN(0)  uN(1)  uN(2)  · · ·  uN(K) ].     (2)
Let pn(k) be the probability that user n tries to access channel k. Let Pn be
the set of all probability vectors of user n in all K + 1 channels. A probability
vector pn ∈ Pn of user n is given by:

    pn ≜ [ pn(0)  pn(1)  pn(2)  · · ·  pn(K) ],     (3)
Let P be the set of all probability matrices of all N users in all K + 1 channels.
The probability matrix P ∈ P is given by:

    P ≜ [ p1(0)  p1(1)  p1(2)  · · ·  p1(K)
          p2(0)  p2(1)  p2(2)  · · ·  p2(K)
           :
          pN(0)  pN(1)  pN(2)  · · ·  pN(K) ],     (4)

where Σ_{k=0}^{K} pn(k) = 1 for all n.
Let P−n be the set of all probability matrices of all N users in all K + 1
channels, except user n. The probability matrix P−n ∈ P−n is given by:

    P−n ≜ [ p1(0)     p1(1)     p1(2)     · · ·  p1(K)
              :
            pn−1(0)   pn−1(1)   pn−1(2)   · · ·  pn−1(K)
            pn+1(0)   pn+1(1)   pn+1(2)   · · ·  pn+1(K)
              :
            pN(0)     pN(1)     pN(2)     · · ·  pN(K) ],     (5)
We focus in this paper on stationary access strategies, where each user decides
whether or not to access a channel based on the current utility matrix and all
other users’ strategy.
Remark 1: Note that un depends on the local CSI of user n, which can be
obtained by a pilot signal in practical implementations. On the other hand, in
the sequel we show that user n does not need complete information about the
matrix P−n to update its strategy, but only needs to monitor the channel utilization
by other users, defined by:
    qn(k) ≜ 1 − ∏_{i=1, i≠n}^{N} (1 − pi(k)).     (6)
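Equation (6) is the only aggregate of the other users' strategies that user n needs. A minimal sketch with illustrative probabilities:

```python
# Channel utilization q_n(k) = 1 - prod_{i != n} (1 - p_i(k)), as in Eq. (6).
# P[i][k]: probability that user i accesses channel k (illustrative values).
P = [
    [0.0, 0.8, 0.2],   # user 0
    [0.5, 0.5, 0.0],   # user 1
    [0.1, 0.0, 0.9],   # user 2
]

def utilization(P, n, k):
    prod = 1.0
    for i, p in enumerate(P):
        if i != n:
            prod *= 1.0 - p[k]
    return 1.0 - prod

# Probability that some user other than user 0 accesses channel 1:
print(utilization(P, n=0, k=1))    # 1 - (1 - 0.5)*(1 - 0.0) = 0.5
```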
We are interested in unconstrained (i.e., Pmax = 1) and constrained (i.e., Pmax <
1) NEP solutions of this game. A NEP for our model is a multi-strategy P, given
in (4), which is self-sustaining in the sense that none of the users can increase its
utility by unilaterally modifying its strategy pn.
since any user that unilaterally modifies its strategy gets a zero utility (due to
a collision or accessing the virtual channel). Note that a better NEP can be
obtained if users 2 or 3 access the virtual channel (i.e., do not transmit).
We infer from Theorem 2 that in each iteration each user will access a single
channel with probability Pmax and will not try to access other channels. However,
in contrast to the unconstrained solutions, other users can still access occupied
channels, since the utility is strictly positive in all channels. We discuss the
convergence later.
As a result of Theorem 2, we obtain a best response algorithm, given in Table
1. The proposed algorithm solves the constrained rate maximization problem
(10). In the initialization step, each user selects the channel with the maximal
collision-free rate un(k). This can be done by all users simultaneously in a single
iteration. Then, each user occasionally monitors the channel utilization and
updates its strategy by selecting the channel with the maximal achievable rate
rn(k) given the channel utilization.
[Table 1: the best-response algorithm — an initialization loop over the users, followed by a repeat loop that runs until convergence.]
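Table 1 is only partially legible here; the following sketch (ours, not the authors' code) reconstructs the best-response dynamic from the surrounding prose. The achievable-rate expression rn(k) = Pmax · un(k) · (1 − qn(k)) is an assumption on our part, chosen as the natural expected rate under slotted random access; the rates in u and the value of Pmax are illustrative.

```python
# Best-response sketch reconstructed from the prose around Table 1 (not the
# authors' code).  u[n][k]: collision-free rates; Pmax: access-probability budget.
# ASSUMED achievable rate: r_n(k) = Pmax * u[n][k] * (1 - q_n(k)).
u = [[3.0, 1.0], [2.0, 2.5], [1.0, 3.0]]   # illustrative rates: 3 users, 2 channels
Pmax = 0.7
N, K = len(u), len(u[0])

# Initialization: each user picks the channel with the maximal collision-free rate.
choice = [max(range(K), key=lambda k: u[n][k]) for n in range(N)]

def q(n, k):
    # Utilization of channel k by users other than n (cf. Eq. (6)), when each
    # user puts its whole budget Pmax on its chosen channel.
    prod = 1.0
    for i in range(N):
        if i != n and choice[i] == k:
            prod *= 1.0 - Pmax
    return 1.0 - prod

changed = True
while changed:                    # repeat ... until convergence
    changed = False
    for n in range(N):            # any updating order works for a potential game
        best = max(range(K), key=lambda k: Pmax * u[n][k] * (1.0 - q(n, k)))
        if best != choice[n]:
            choice[n] = best
            changed = True
print(choice)                     # [0, 1, 1]: users 1 and 2 share channel 1
```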
show that the constrained rate maximization (10) indeed converges in finite time.
In potential games, the incentive of all players to change their strategy can be
expressed as a single global function, the potential function. In exact potential
games, the improvement that each player can get by unilaterally changing its
strategy equals to the improvement in the potential function. Hence, any local
maximum of the potential function is a NEP. The existence of an exact bounded
potential function corresponding to the constrained rate maximization problem
(10) implies that the convergence of the proposed algorithm is guaranteed. Fur-
thermore, the convergence is in finite time, starting from any point and using
any updating dynamics across users.
Definition 4 [13]: A game Γ = (N, P, R̃) is an exact potential game if there is
an exact potential function φ : P → R such that for every user n ∈ N, for every
P−n ∈ P−n, and for every pair pn^(1), pn^(2) ∈ Pn, the following holds:

    R̃n(pn^(2), P−n) − R̃n(pn^(1), P−n) = φ(pn^(2), P−n) − φ(pn^(1), P−n).
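Definition 4 can be verified mechanically on a small example. The toy two-player congestion game below is our illustration (not from the paper); its Rosenthal potential is exact, and the assertions check the identity of Definition 4 for every unilateral deviation.

```python
from itertools import product

# Toy two-player congestion game: each player picks channel 0 or 1.
# w[k][l-1]: payoff to each user of channel k when l users share it.
w = [[4.0, 1.0], [3.0, 2.0]]

def payoff(n, a):
    k = a[n]
    load = sum(1 for an in a if an == k)
    return w[k][load - 1]

def potential(a):
    # Rosenthal potential: sum over channels k of w[k][0] + ... + w[k][load-1].
    return sum(sum(w[k][:sum(1 for an in a if an == k)]) for k in range(2))

# Exact-potential identity: a deviator's payoff change equals the potential change.
for a in product(range(2), repeat=2):
    for n in range(2):
        for dev in range(2):
            b = list(a); b[n] = dev; b = tuple(b)
            assert abs((payoff(n, b) - payoff(n, a)) -
                       (potential(b) - potential(a))) < 1e-12
```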
5 Simulation Results
[Table: average number of iterations until convergence — 8.75 and 1 for the two compared algorithms.]
6 Conclusion
In this paper we investigated the problem of distributed rate maximization of
networks applying the multi-channel ALOHA random access protocol. We char-
acterized the NEPs of the network when users solve the unconstrained rate
maximization. In this case, for any NEP, we showed that each user tries to
access a single channel with probability one and does not try to access other
channels. Next, we limited each user’s total access probability and solved the
problem under a total probability constraint, to overcome the problem of col-
lisions when the number of users is much larger than the number of channels.
We characterized the NEPs when user rates are subject to a total transmission
probability constraint. We proposed a simple best-response algorithm that solves
the constrained rate maximization, where each user updates its strategy using
its local CSI and by monitoring the channel utilization. We used the theory
of potential games to prove convergence of the proposed algorithm. Finally, we
provided numerical examples to demonstrate the algorithm's performance.
[Two histograms: the density of the achievable rate (Mbps), roughly over 8–14 Mbps in the upper panel and 0–30 Mbps in the lower panel.]
Fig. 1. Average density of the rates achieved by the proposed algorithm and by the
totally greedy algorithm
References
1. Roberts, L.G.: ALOHA packets, with and without slots and capture. ACM SIG-
COMM Computer Communication Review 5(2), 28–42 (1975)
2. Zhao, Q., Sadler, B.: A survey of dynamic spectrum access. IEEE Signal Processing
Magazine 24(3), 79–89 (2007)
3. Zhao, Q., Tong, L., Swami, A.: Decentralized cognitive MAC for opportunistic spec-
trum access in ad hoc networks: a POMDP framework. IEEE Journal on Selected
Area in Comm. 25, 589–600 (2007)
4. Yaffe, Y., Leshem, A., Zehavi, E.: Stable matching for channel access control in
cognitive radio systems. In: International Workshop on Cognitive Information Pro-
cessing (CIP), pp. 470–475 (June 2010)
5. Leshem, A., Zehavi, E., Yaffe, Y.: Multichannel opportunistic carrier sensing for
stable channel access control in cognitive radio systems. IEEE Journal on Selected
Areas in Communications 30, 82–95 (2012)
6. Naparstek, O., Leshem, A.: Fully distributed auction algorithm for spectrum shar-
ing in unlicensed bands. In: IEEE International Workshop on Computational Ad-
vances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 233–236 (2011)
7. Yu, W., Ginis, G., Cioffi, J.: Distributed multiuser power control for digital sub-
scriber lines. IEEE Journal on Selected Areas in Communications 20(5), 1105–1115
(2002)
8. Luo, Z., Pang, J.: Analysis of iterative waterfilling algorithm for multiuser power
control in digital subscriber lines. EURASIP Journal on Applied Signal Process-
ing 2006, 80 (2006)
9. Maskery, M., Krishnamurthy, V., Zhao, Q.: Decentralized dynamic spectrum ac-
cess for cognitive radios: Cooperative design of a non-cooperative game. IEEE
Transactions on Communications 57(2), 459–469 (2009)
10. Huang, J., Krishnamurthy, V.: Transmission control in cognitive radio as a Marko-
vian dynamic game: Structural result on randomized threshold policies. IEEE
Transactions on Communications 58(1), 301–310 (2010)
11. Menache, I., Shimkin, N.: Rate-based equilibria in collision channels with fading.
IEEE Journal on Selected Areas in Communications 26(7), 1070–1077 (2008)
12. Candogan, U., Menache, I., Ozdaglar, A., Parrilo, P.: Competitive scheduling in
wireless collision channels with correlated channel state, pp. 621–630 (2009)
13. Monderer, D., Shapley, L.: Potential games. Games and Economic Behavior 14,
124–143 (1996)
14. Cohen, K., Leshem, A., Zehavi, E.: Game theoretic aspects of the multi-channel
ALOHA protocol in cognitive radio networks. Submitted to the IEEE Journal on
Selected Areas in Communications (2012)
15. Zhao, Q., Tong, L.: Opportunistic carrier sensing for energy-efficient information
retrieval in sensor networks. EURASIP J. Wireless Comm. Netw. 2, 231–241 (2005)
16. Gale, D., Shapley, L.: College admissions and the stability of marriage. The Amer-
ican Mathematical Monthly 69(1), 9–15 (1962)
Game-theoretic Robustness
of Many-to-one Networks
1 Introduction
Access networks and sensor networks are inherently vulnerable to physical at-
tacks, such as jamming and destruction of nodes and links. From a topological
point of view, the common characteristic of these networks is that the primary
goal of the nodes is to communicate with a designated node; therefore, we will
refer to them as many-to-one networks, as opposed to many-to-many networks,
such as backbone networks. For example, in a mesh network of wireless routers
that provide Internet access to mobile terminals, every router is typically inter-
ested in communicating with a designated gateway router through which the
Internet is reachable, and not with other peer routers of the network (except for
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 88–98, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
The organization of this paper is the following: In Section 2, we present our game
model. In Section 3, we introduce the concepts and notions used by subsequent
sections. In Section 4, we propose an optimal adversarial strategy and show that
the expected payoff of the adversary cannot be smaller than the reciprocal of
the persistence of the network if she adopts the optimal strategy. In Section
5, we propose an optimal operator strategy and show that the expected payoff
of the operator cannot be smaller than minus the reciprocal of the persistence
of the network when it follows the optimal strategy. In Section 6, we combine
the results of the preceding sections to describe a class of Nash equilibria of
the game. In Section 7, we generalize our game model to allow nodes with non-
uniform weights and attacks against nodes. Finally, in Section 8, we conclude
the paper.
2 The Game
The adversary has to solve max_β min_α P(α, β), while the operator has to solve
min_α max_β P(α, β). The corresponding solutions, i.e., the optimal adversarial and
operator strategies, are presented in Section 4 and Section 5, respectively.
In this paper, similarly to [4] and [5], we restrict the pure strategies of the
adversary to attacking single edges only. Studying generalized game models, in
which the pure strategies of the adversary consist of subsets of edges, is an open
problem in the case of both many-to-many and many-to-one networks.
3 Preliminaries
In this section, we introduce the basic concepts and notions used by subsequent
sections.
For a set of edges A ⊆ E(G), let λ(A) denote the number of nodes from which
there is no path leading to r in the graph when A is removed.
In [6], the persistence of a graph was defined as:
all X ⊆ V (G) \ {r}. Adding π0 · (|V (G)| − 1) to both sides we get that π(G) ≥ π0
is equivalent to
4 Adversary Strategy
In this section, we describe an adversarial strategy which achieves an expected
payoff of 1/π(G), regardless of the strategy of the operator. Later, in Section 5, we
show that this strategy is optimal by proving that this is the highest attainable
expected payoff for the attacker if the operator is rational.
Proof. For any given spanning tree T ∈ T and set of edges B ⊆ E(G),
Σ_{e∈B} λ(T, e) ≥ λ(B), since every node cut off by removing B has to increase
λ(T, e) by one for at least one e ∈ B. Therefore, the expected payoff for the
adversary is
    Σ_{e∈E(G)} Σ_{T∈T} αT βe λ(T, e) = (1/|A|) Σ_{T∈T} αT Σ_{e∈A} λ(T, e)
                                     ≥ (1/|A|) Σ_{T∈T} αT λ(A)
                                     = (λ(A)/|A|) Σ_{T∈T} αT
                                     = λ(A)/|A|
                                     = 1/π(G).
As seen before in Subsection 3.1, a critical set can be computed in polynomial
time, which implies that the same holds for the adversary strategy described in
Theorem 1.
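On small graphs, persistence can also be computed by brute force directly from the relation λ(A)/|A| = 1/π(G) for a critical set A, i.e., π(G) = min over edge sets A with λ(A) > 0 of |A|/λ(A). The sketch below (ours; exponential in |E(G)|, for illustration only) does this for a toy undirected graph.

```python
from itertools import combinations

# Brute-force persistence pi(G) = min over edge sets A (with lambda(A) > 0)
# of |A| / lambda(A), where lambda(A) counts nodes cut off from the sink r
# after removing A.  Exponential; for small illustrative graphs only.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # toy graph, sink r = 3
n_nodes, r = 4, 3

def cut_off(removed):
    # Number of nodes from which r is unreachable once `removed` is deleted.
    adj = {v: set() for v in range(n_nodes)}
    for (u, v) in edges:
        if (u, v) not in removed:
            adj[u].add(v); adj[v].add(u)
    seen, stack = {r}, [r]
    while stack:
        v = stack.pop()
        for nb in adj[v]:
            if nb not in seen:
                seen.add(nb); stack.append(nb)
    return n_nodes - len(seen)

best = None
for size in range(1, len(edges) + 1):
    for A in combinations(edges, size):
        lam = cut_off(set(A))
        if lam > 0:
            ratio = size / lam
            best = ratio if best is None else min(best, ratio)
print(best)   # 0.6666666666666666 (= 2/3) for the toy graph
```

For this graph the minimizing set is {(1, 3), (2, 3)}, which cuts off three nodes with two edges, so π(G) = 2/3 and the adversary's guaranteed expected payoff is 1/π(G) = 1.5.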
5 Operator Strategy
In this section, we propose an efficient algorithm that computes an optimal
operator strategy, which achieves an expected payoff of −1/π(G), regardless of
the strategy of the adversary. We have already shown in Section 4 that this is
the best attainable expected payoff for the operator if the adversary is rational.
The following lemma is required by the proof of our main theorem:
Lemma 1. Let G be a graph with a designated sink node r. Let G′ denote the
graph obtained from G in the following way: Add a source node s to the graph.
For each v ∈ V(G) \ {r}, add an arc from s to v and set its capacity to 1. Finally,
set the capacity of every original edge of the graph to 1/π(G). The maximum flow
in G′ from s to r is |V(G)| − 1.

Proof. This readily follows from Subsection 3.1 by scaling the capacity of each
edge with 1/π(G).
Before proving the correctness of the algorithm, we have to prove that Step
2 can be executed in each iteration, otherwise the algorithm would terminate
incorrectly. Obviously, if f is a network flow and the amount of flow along every
(s, v), v ∈ V (G) \ {r} edge is positive, there has to be a directed path from every
v ∈ V (G) \ {r} to r consisting of edges with positive flow amounts. Thus, we
have to show that if f is a network flow carrying γ from s to r before Step 5,
then it is a network flow carrying γ − αT (|V (G)| − 1) from s to r after Step 6.
For a v ∈ V (G) \ {r}, let λv denote λ(T, eout ), where eout is the outgoing edge
of v in T . Clearly, the sum of λ(T, ein ) over all incoming edges ein ∈ E(G) of v
is λv − 1. Since the flow along every edge e is decreased by αT · λ(T, e), the sum
of outgoing flows is decreased by αT · λv . Similarly, the sum of incoming flows is
decreased by αT · (λv − 1) + αT = αT · λv , which takes the αT decrease on (s, v)
into account as well. Clearly, the net flow at v remains zero. Since this is true
for every node, except s and r, f remains a network flow. The flow from s to r
is decreased by αT (|V (G)| − 1), since the flow on every (s, v), v ∈ V (G) \ {r},
edge is decreased by αT .
Now, we can prove the correctness of the algorithm. First, we have to prove
that α is indeed a distribution, i.e., Σ_{T∈T} αT = 1 and αT ≥ 0, ∀T ∈ T. This
is evident, as the amount of flow from s to r is decreased by αT(|V(G)| − 1) at
every assignment, and the amount is |V(G)| − 1 after Step 1 and zero after the
algorithm has finished.
Second, we have to prove that the expected loss of every edge in E(G) is at
most 1/π(G). After Step 1, the amount of flow along every edge is at most 1/π(G).
At every αT assignment, the flow along every edge is decreased by αT · λ(T, e),
and it is never decreased to a negative value. Therefore, Σ_{T∈T} αT · λ(T, e) ≤ 1/π(G).
Finally, we have to prove that the algorithm terminates after a finite number
of iterations. In every iteration, the flow along at least one edge (i.e., along every
edge for which f(e)/λ(T, e) is minimal) is decreased from a positive amount to zero.
Since there are a finite number of edges, the algorithm terminates after a finite
number of iterations.
Proof. In Step 8, the assignment does not have to be actually performed for
every spanning tree, since it is enough to output the probabilities of only the
trees in the support of the distribution. Therefore, every step of the algorithm
can be performed in polynomial time. Furthermore, the number of iterations is
less than or equal to the number of edges |E(G)|, since the flow along at least
one edge is decreased from a positive amount to zero in every iteration.
Corollary 1. An operator strategy that achieves at least −1/π(G) expected payoff
for the operator can be found in polynomial time.

Proof. The claim of this corollary follows from Theorems 2 and 3. Suppose that
the strategy of the operator is constructed using the proposed algorithm. Then,
the expected payoff of every pure adversarial strategy is at most 1/π(G), since
∀e ∈ E(G): Σ_{T∈T} αT · λ(T, e) ≤ 1/π(G). Therefore, the expected payoff of every
mixed adversarial strategy is at most 1/π(G) as well.
6 Nash-Equilibrium
Based on the above results, we can describe a class of Nash equilibria:
Corollary 2. The adversarial strategies presented in Section 4 and the operator
strategies presented in Section 5 form Nash equilibria of the game. The expected
payoffs for the adversary and the operator are 1/π(G) and −1/π(G), respectively.
Since the game is zero-sum, all Nash equilibria have the same expected payoff.
Consequently, graph persistence is a sensible measure of network robustness.
7 Generalizations
In this section, we present various generalizations to our basic game model intro-
duced in Section 2, which make our model more realistic and practical. We show
that all of these generalized models can be traced back to the basic game model,
i.e., with minor modifications, the previously presented theorems and algorithms
apply to these generalized models as well.
It is possible to generalize our results to the case where nodes have non-uniform
weight or importance. Let dv be the weight of node v: by disconnecting each
node v from r, the adversary gains and the operator loses dv (instead of 1, as
in the original model). Let λ(T, e) denote the total weight of the nodes that
are disconnected from r when the operator uses T and the adversary attacks e.
Similarly, let λ(A) denote the total weight of the nodes that are disconnected
when A is removed. It is easy to see that the definition of graph persistence and
the proposed adversarial strategy do not have to be modified to accommodate
the changes in the definitions of λ(T, e) and λ(A).
In case of the operator strategy, the following modifications have to be made
to the proposed algorithm and the proof:
– In Step 1, the capacity of each (s, v), v ∈ V (G)\{r} arc has to be dv , instead
of 1.
– In Step 6, the capacity of each (s, v), v ∈ V (G) \ {r} arc has to be decreased
by dv · αT , instead of αT .
– Consequently,
  • the sum of λ(T, ein) over all incoming edges ein ∈ E(G) of v is λv − dv,
    instead of λv − 1,
  • the flow from s to r is decreased by αT Σ_{v∈V(G)\{r}} dv, instead of
    αT(|V(G)| − 1).
is fairly easy to see that the persistence of the obtained graph is the same as the
edge-vertex-persistence of the original one.
This trick can be also used to obtain adversarial and operator strategies that
achieve 1/πn(G) payoff in the generalized model on any given graph G. Let G′ be
the graph obtained from G in the above manner. Find an optimal adversarial
strategy on G′ as it has been described in Section 4, which achieves 1/π(G′) =
1/πn(G) payoff on G′. The support of the resulting distribution consists of edges
in E(G) and edges corresponding to nodes in V(G). It is easy to see that if we
replace the edges corresponding to nodes with the nodes in the support of the
distribution, the resulting strategy achieves 1/πn(G) payoff on G. An optimal
operator strategy, which achieves −1/πn(G) payoff on G, can be obtained in a
similar manner.
Please note that we could define a model in which an adversary is only able
to target nodes, but this is unnecessary. For every optimal adversarial strategy
targeting both nodes and edges, we can construct a corresponding optimal ad-
versarial strategy that targets only nodes: simply replace each arc in the strategy
with its source node. It is easy to see that the payoff of the resulting strategy
is at least as large as the payoff of the original strategy.
8 Conclusions
References
1. Altman, E., Boulogne, T., El-Azouzi, R., Jimenez, T., Wynter, L.: A survey on
networking games in telecommunications. Computers & Operations Research 33(2),
286–311 (2006)
98 A. Laszka, D. Szeszlér, and L. Buttyán
2. Felegyhazi, M., Hubaux, J.P.: Game theory in wireless networks: A tutorial. Tech-
nical Report LCA-REPORT-2006-002, EPFL, Lausanne, Switzerland (June 2007)
3. Charilas, D.E., Panagopoulos, A.D.: A survey on game theory applications in wire-
less networks. Computer Networks 54(18), 3421–3430 (2010)
4. Gueye, A., Walrand, J.C., Anantharam, V.: Design of Network Topology in an
Adversarial Environment. In: Alpcan, T., Buttyán, L., Baras, J.S. (eds.) GameSec
2010. LNCS, vol. 6442, pp. 1–20. Springer, Heidelberg (2010)
5. Gueye, A., Walrand, J.C., Anantharam, V.: How to Choose Communication Links
in an Adversarial Environment? In: Jain, R., Kannan, R. (eds.) GameNets 2011.
LNICST, vol. 75, pp. 233–248. Springer, Heidelberg (2012)
6. Cunningham, W.H.: Optimal attack and reinforcement of a network. Journal of the
ACM 32(3), 549–561 (1985)
7. Laszka, A., Buttyán, L., Szeszlér, D.: Optimal selection of sink nodes in wireless
sensor networks in adversarial environments. In: Proc. of the 12th IEEE Interna-
tional Symposium on a World of Wireless, Mobile and Multimedia, WoWMoM 2011,
Lucca, Italy, pp. 1–6 (June 2011)
Hybrid Pursuit-Evasion Game between UAVs
and RF Emitters with Controllable
Observations: A Hawk-Dove Game
1 Introduction
An unmanned aerial vehicle (UAV) is a remotely piloted aircraft that is widely
used in the military. It can be used for many tasks, particularly surveillance or
reconnaissance. In recent years, people have studied how to use UAVs as a flying
sensor network to monitor various activities, such as radio activities [2][4][5][9][11].
This is particularly useful in military applications due to the low cost and efficient
deployment of UAVs.
In this paper, we study how UAVs can be used to chase RF emitters. When
a UAV is equipped with directional antenna, it can determine where the RF
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 99–114, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
100 H. Li et al.
emitter is and then pursue it, either to continue the surveillance or destroy the
RF emitter. We assume that the RF emitter is also mobile, but with a slower
speed than the UAV. The RF emitter can move to evade the pursuit of the UAV.
This forms a pursuit-evasion game, which was originally studied by R. Isaacs
[6]. Since such a game is played in continuous space and continuous time,
it belongs to the category of differential games. In contrast to traditional game
theory, in which randomness is a key factor of the game, the pursuit-evasion game
is deterministic and can be described by a partial differential equation (called the
Isaacs equation). It has been widely applied in the study of warfare, such as the
dogfight between fighter aircraft and the Battle of Bunker Hill [6]. The value functions
and the optimal strategies at the equilibrium have been obtained for many applications.
However, to the best of our knowledge, there have been no studies on pursuit-evasion
games in which the observation is controllable by the evader. In this paper, we
will consider both cases of discounted and non-discounted rewards. The feedback
Nash equilibrium will be obtained and described by a combination of Bellman’s
equation and Isaacs equation. Due to the prohibitive challenge of solving the
equations, we will study heuristic steering strategies of the UAV and RF emitter
and then use numerical simulations to explore the strategy of whether to stop
transmitting.
The remainder of this paper is organized as follows. The system model for the
UAV and RF emitter is introduced in Section 2. The case of single UAV and
single RF emitter is studied in Section 3 and is then extended to the multiple-
UAV-multiple-emitter case in Section 4. The numerical results and conclusions
are obtained in Sections 5 and 6, respectively.
2 System Model
Consider one UAV and one RF emitter. We denote by x_u = (x_{u1}, x_{u2}) and
x_e = (x_{e1}, x_{e2}) the locations of the UAV and the RF emitter. We adopt a simple
model for the motions of the UAV and RF emitter, using the following ordinary
differential equations [3]:

    ẋ_{u1} = v_u sin θ_u,
    ẋ_{u2} = v_u cos θ_u,        (1)
    θ̇_u = w_u f_u,

and

    ẋ_{e1} = v_e sin θ_e,
    ẋ_{e2} = v_e cos θ_e,        (2)
    θ̇_e = w_e f_e,

where v_u and v_e are the velocities; f_u and f_e are the forces that change the
direction; and w_u and w_e are the inertia coefficients. It is reasonable to assume
that v_u > v_e. We assume that the forces are limited, i.e., |f_u| < F_u and |f_e| < F_e,
where F_u and F_e are the maximum absolute values of the forces. Note that the
above model is very simple but more mathematically tractable than more complicated
motion models.
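As a concrete illustration, the motion model (1)-(2) can be integrated numerically with an Euler scheme. The sketch below uses a pure-pursuit steering heuristic for the UAV and a fleeing heuristic for the emitter; both heuristics and all parameter values are illustrative assumptions, not the equilibrium strategies derived later.

```python
import math

def step(state, v, w, f, dt=0.01):
    """One Euler step of the motion model (1)-(2):
    x1' = v sin(theta), x2' = v cos(theta), theta' = w f."""
    x1, x2, theta = state
    return (x1 + v * math.sin(theta) * dt,
            x2 + v * math.cos(theta) * dt,
            theta + w * f * dt)

def clip(x, limit):
    """Clip a steering force to the admissible range [-limit, limit]."""
    return max(-limit, min(limit, x))

def simulate(T=5.0, dt=0.01, vu=1.0, ve=0.5, wu=1.0, we=1.0, Fu=1.0, Fe=1.0):
    """Pure-pursuit UAV vs. a fleeing emitter (heuristic steering only)."""
    uav = (0.0, 0.0, 0.0)        # (x1, x2, heading)
    emitter = (3.0, 3.0, 0.0)
    for _ in range(int(T / dt)):
        # UAV steers toward the bearing of the emitter; note the model's
        # convention (x1' = v sin(theta)) makes the bearing atan2(dx1, dx2)
        bearing = math.atan2(emitter[0] - uav[0], emitter[1] - uav[1])
        err = (bearing - uav[2] + math.pi) % (2 * math.pi) - math.pi
        # emitter steers directly away from the UAV
        flee = math.atan2(uav[0] - emitter[0], uav[1] - emitter[1]) + math.pi
        err_e = (flee - emitter[2] + math.pi) % (2 * math.pi) - math.pi
        uav = step(uav, vu, wu, clip(err, Fu), dt)
        emitter = step(emitter, ve, we, clip(err_e, Fe), dt)
    return uav, emitter
```

Since v_u > v_e, the pure-pursuit UAV steadily closes the distance under these assumed parameters.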
3 Single-UAV-Single-Emitter Game
In this section, we assume that there is only one UAV and it can perfectly
determine the location of the RF emitter when the emitter keeps transmitting.
This is reasonable if the UAV employs a powerful sensor which can determine
both distance (e.g., using the signal strength) and the angle (e.g., using an
antenna array). However, when the emitter stops transmitting at a certain cost,
the UAV loses the target; hence we say that the observation is controllable (by the emitter).
State. We denote by s the state of the whole system, which consists of the
following components:
– For the UAV side, its state includes its current location xu = (xu1 , xu2 ) and
the direction θu .
– For the emitter side, its state includes the current location xe = (xe1 , xe2 ),
the moving direction θe and its transmission state se: se = 1 when the
emitter transmits and se = 0 otherwise.
Since the game only concerns the relative location x = xu − xe, we can define
the system state as s = (x, θu , θe ).
Actions. Both the UAV and emitter can move and change direction. Moreover,
the emitter can choose to stop transmitting and then make the UAV lose track
of the target. Hence, the actions in the game are defined as follows.
– UAV: The action is fu which is visible to the emitter.
– Emitter: Its action includes fe , which is also visible to the UAV when se = 1,
and the decision on whether to stop the transmission, which is denoted by ae .
For simplicity, we assume that, when the UAV loses the target, it follows a certain
predetermined track; e.g., keeping the original direction (fu = 0). Moreover,
we assume that the transmission state has a minimal dwelling time τ0 ; i.e., each
transmission state, namely on or off, must last for at least τ0 units. To simplify
the analysis, we assume that the decision on transmission can be made at only
discrete times, namely 0, τ0 , 2τ0 , ... For the case in which the decision can be
made in continuous time under the constraint of the minimum dwelling time, the
analysis is much more complicated and is left for future study.
Rewards. The purpose of the UAV is to catch the emitter or force the emitter
to keep silent. When the distance between UAV and emitter is small, the game
is ended. This stopping time is defined as
where R0 is the reward for locating the emitter and c is the penalty on the
UAV when the emitter transmits in one time slot. The reward at time t is
given by
– Non-discounted reward: When the reward is not discounted (i.e., the future
is as important as the present) within a time window [0, Tf ], we have
    R = ∫_0^{min{T*, Tf}} ( R0 δ(x(t) ≤ γd) − c ae(t) δ(t = nτ0) ) dt.        (6)
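A discretized evaluation of (6) along a sampled trajectory can be sketched as follows. The sampling step, the interpretation of the catch event as terminating the integral at T*, and all parameter values are illustrative assumptions.

```python
def total_reward(dist, ae, dt, tau0, R0=10.0, c=1.0, gamma_d=0.5):
    """Discretized version of (6): collect R0 once the UAV-emitter
    distance first drops to gamma_d or below (the stopping time T*),
    and charge the penalty c at every decision epoch t = n*tau0 at
    which the emitter's discrete action ae is 1.

    dist : sampled UAV-emitter distances, dist[n] at time n*dt
    ae   : sampled discrete transmission decisions, ae[n] in {0, 1}
    """
    epoch = max(1, int(round(tau0 / dt)))
    R = 0.0
    for n in range(len(dist)):
        if n % epoch == 0:
            R -= c * ae[n]           # penalty term at decision instants
        if dist[n] <= gamma_d:       # emitter located: reward, game over
            R += R0
            break
    return R
```

For example, a trajectory that ends inside the catch radius collects R0, while one that never closes in accumulates only the epoch penalties.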
and

    Rs(t, 1) = max_{fu} min_{fe} [ ∫_t^{min(τ, T*)} R0 δ(x(t) < γd) dt + Rs(τ0⁻) ],        (10)

and

    −∂Rs(t, 0)/∂t = min_{fe} [ ∂Rs(t, 1)/∂s · f(t, s, fu, fe) + R0 δ(x(t) < γd) ],
    Rs(τ, 0) = Rs((τ0)⁻),        (13)
Then, we can obtain the optimal strategies of the UAV and emitter, which are
given in the following corollary.
Corollary 1. The strategies at the feedback Nash equilibrium are given by
– The strategy of the UAV is given by

    u_f* = arg max_{fu} min_{fe} [ ∂Rs(t, 1)/∂s · f(t, s, fu, fe) + R0 δ(x(t) < γd) ].        (14)

– The strategy of the emitter is given by

    u_e* = arg min_{fe} max_{fu} [ ∂Rs(t, 1)/∂s · f(t, s, fu, fe) + R0 δ(x(t) < γd) ].        (15)
and

    Rx((τ0)⁻) = min_{ae} [ −c I(ae = 1) + Rx(0, ae) ],        (16)

and

    Rs^{n+1}(t, 1) = max_{fu} min_{fe} [ ∫_t^{min(τ, T*)} R0 δ(x(t) < γd) dt + Rs^n(τ0⁻) ],        (18)

and

    −∂Rx^{n+1}(t, 0)/∂t = min_{fe} [ ∂Rx^{n+1}(t, 1)/∂s · f(t, s, fu, fe) + R0 δ(x(t) < γd) ],
    Rs^{n+1}(τ, 0) = Rs^{n+1}((τ0)⁻),        (21)

and

    Rs^f((τ0)⁻) = 0.        (22)
Discrete Action. For the discrete action, we consider only the emitter since
there is no discrete action for the UAV.
– Case of Discounted Reward: We assume that, given Rs ((τ0 )− ), we know how
to compute the strategies of the UAV and emitter in (20) and (21). Then,
we can do the following value iteration for computing Rs((τ0)⁻):

    Rs^{k+1}((τ0)⁻) = min_{ae} [ −c I(ae = 1) + Rs^k(0, ae) ],
    Rs^0((τ0)⁻) = R0(x).        (23)
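The iteration (23) can be sketched as a simple fixed-point computation. The continuation values Rs^k(0, ae), which in the paper come from solving the differential game over one dwell interval, are replaced here by a stand-in function; the grid, the stand-in contraction, and the parameter values are all illustrative assumptions.

```python
import numpy as np

def value_iteration(R0, cont_value, c, n_iter=500, tol=1e-9):
    """Iterate (23): R^{k+1}((tau0)^-) = min_{ae} [ -c*I(ae=1) + R^k(0, ae) ],
    starting from R^0((tau0)^-) = R0(x), on a grid of states x.

    cont_value(R, ae) stands in for R^k(0, ae): the value propagated over
    one dwell interval given the transmission decision ae (1 = stop)."""
    R = np.asarray(R0, dtype=float).copy()
    for _ in range(n_iter):
        R_next = np.minimum(cont_value(R, 0),        # ae = 0: no penalty
                            -c + cont_value(R, 1))   # ae = 1: penalty term -c
        if np.max(np.abs(R_next - R)) < tol:
            R = R_next
            break
        R = R_next
    return R
```

For instance, with the toy contraction cont_value(R, ae) = 0.9 R + (1 − ae), the iteration converges to the fixed point of R = min(0.9 R + 1, 0.9 R − c).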
4 Multi-UAV-Multi-Emitter Game
Players: Since we do not consider any random factor, the game is deterministic;
hence, each UAV and each emitter knows the future evolution of the game at the
feedback Nash equilibrium. We can therefore consider the game as a two-(virtual-)
player one; i.e., both the UAV side and the emitter side are controlled in a
centralized way. We assume that each emitter is out of the game once it is caught
by any UAV; e.g., it is destroyed by the UAV. Hence, the number of actual players
may change during the game. We denote by Ne(t) the set of emitters still surviving
at time t.
In practice, when there exists randomness in the observations or each UAV
(emitter) has only limited knowledge of the system state, the communications among
the UAVs or the emitters need to be considered, which is concerned with the
team formations due to limited communication range. This more complicated
case will be studied in the future.
State Space. For each individual UAV and emitter, its state is the same as in the
single-UAV-single-emitter case. The system state space is the product of the in-
dividual ones; i.e., the state includes the locations and directions of all UAVs and
emitters, denoted by {x_n^u}_{n=1,...,Nu}, {θ_n^u}_{n=1,...,Nu}, {x_n^e}_{n∈Ne(t)}, {θ_n^e}_{n∈Ne(t)},
as well as the emitters' transmission states. Note that, when an emitter is caught
by a UAV, it is out of the game and the state space is reduced. Similarly to the
single-UAV-single-emitter case, we still use s to denote the overall system state
(excluding the discrete states of the transmission status of each emitter).
Action Space. For each individual UAV or emitter, its action space is the
same as the single-UAV-single-emitter case in the previous section. We simply
add superscript to distinguish the actions of different UAVs or emitters. For
simplicity, we do not add more constraints like collision avoidance or formation
maintenance.
where T ∗ is the earliest time that all emitters have been caught; i.e.,
Recall that R0 is the reward for catching an emitter and c is the cost when an
emitter transmits in one time slot. We can immediately obtain the instantaneous
reward r(t) of the UAVs.
and

    Rs(t, 0) = min_{fe} [ ∫_t^{min(τ, T*)} R0 δ(∃n, x_n^u(t) − x^e(t) < γd) dt + Rs(τ⁻) ] |_{fu = f0}.        (29)
5 Numerical Results
In this section, we use numerical simulations to reveal some phenomena of the
pursuit-evasion game. For simplicity, we consider only one UAV and one RF
emitter.
[Figure: sample value functions for different initial configurations; panels: value vs. δθ1 for d = 1, θ2 = 0 and d = 2.5, θ2 = 0, and value vs. d for δθ1 = π, δθ2 = 0 and δθ1 = 3π/2, δθ2 = 0.]
[Figure: tracks of the UAV and the RF emitter.]
We also observe that the value usually decreases as the initial distance between
the UAV and the RF emitter increases (though there are some exceptions).
Fig. 3 shows the tracks of the UAV and RF emitter with different initial
distances. In the left column, the RF emitter always keeps transmitting and is
finally caught by the UAV. In the right column, the RF emitter adopts the
optimized strategy. We observe that the RF emitter can escape from the pursuit
of the UAV by stopping transmission at certain times.
Then, we increase the penalty for stopping transmission to 8. The tracks under
the corresponding optimal strategy are shown in Fig. 4. We observe that, in both
cases, the RF emitter is finally caught by the UAV, due to the large penalty of
stopping transmitting.
[Fig. 4: tracks of the UAV and the RF emitter under the larger penalty for stopping transmission.]
[Panels of Fig. 5: optimal action and value vs. d at stage 1 and stage 5.]
Fig. 5. Samples of value functions and optimal actions when the reward is not dis-
counted
6 Conclusions
We assume that each player has perfect access to all dimensions of the system
state; i.e., the closed-loop perfect state (CLPS). The following definition defines
the feedback Nash equilibrium for the differential game.
Definition 1. For the N-player game in (34) and (35), an N-tuple of strategies
{πn*}_{n=1,...,N} constitutes a feedback Nash equilibrium solution if there exist
functionals Vn over [0, T] × R^M such that

    Vn(T, x) = qn(x),        (36)

    Vn(t, x) = ∫_t^T gn(s, x*(s), π1*(x*), ..., πN*(x*)) ds + qn(x*(T))
             ≤ ∫_t^T gn(s, x(s), π1*(x), ..., π_{n−1}*(x), πn(x), π_{n+1}*(x), ..., πN*(x)) ds + qn(x(T)),   ∀πn,        (37)

where x* is the state trace when the actions are π1*, ..., πN*, and x is the
state trace when the action of player n is changed to πn.
The following theorem provides a sufficient condition for the feedback Nash
equilibrium for the general N -player case.
Theorem 1. An N-tuple of strategies {πn*}_{n=1,...,N} provides a feedback Nash
equilibrium if the functionals {Vn}_{n=1,...,N} satisfy the following equations:

    −∂Vn(t, x)/∂t = min_{un} [ ∂Vn(t, x)/∂x · f(t, x, π_{−n}*(t, x), un) + g(t, x, π_{−n}*(t, x), un) ],        (38)

and

    πn*(t, x) = arg min_{un} [ ∂Vn(t, x)/∂x · f(t, x, π_{−n}*(t, x), un) + g(t, x, π_{−n}*(t, x), un) ],        (39)

and

    Vn(T, x) = qn(x).        (40)
The following theorem provides a sufficient condition for the two-player zero-sum
game in which the cost for player 1 is given by

    L(u1, u2) = ∫_0^T g(t, x(t), u1(t), u2(t)) dt + q(T, x(T)),        (41)

and the cost of player 2 is −L(u1, u2).

Theorem 2. The value function of the two-player zero-sum differential game
satisfies the following Isaacs equation:

    −∂V/∂t = min_{u1} max_{u2} [ ∂V/∂x · f(t, x, u1, u2) + g(t, x, u1, u2) ]
           = max_{u2} min_{u1} [ ∂V/∂x · f(t, x, u1, u2) + g(t, x, u1, u2) ].        (42)
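The equality of the two orders of optimization in (42) is the saddle-point (Isaacs) condition. For controls restricted to finite grids, the min-max/max-min interchange can be checked directly; the Hamiltonian samples in the usage example below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def minmax_maxmin(H):
    """H[i, j]: Hamiltonian sampled at control pair (u1_i, u2_j).
    Returns (min_u1 max_u2 H, max_u2 min_u1 H); the two coincide iff
    the sampled Hamiltonian has a saddle point in pure controls."""
    upper = H.max(axis=1).min()   # min over u1 of max over u2
    lower = H.min(axis=0).max()   # max over u2 of min over u1
    return upper, lower
```

For example, H = [[1, 2], [3, 4]] has a saddle point and both orders give 2, while H = [[1, −1], [−1, 1]] has no pure saddle point and the two orders differ.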
References
1. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society
for Industrial and Applied Mathematics (1999)
2. Beard, R.W., McLain, T.W., Nelson, D.B., Kingston, D., Johanson, D.: Decentral-
ized cooperative aerial surveillance using fixed-wing miniature UAVs. Proceedings
of the IEEE 94(7), 1306–1324 (2006)
3. Bullo, F., Cortes, J., Martinez, S.: Distributed Control of Robotic Networks: A
Mathematical Approach to Motion Coordination Algorithms. Princeton University
Press (2009)
4. DeLima, P., York, G., Pack, D.: Localization of ground targets using a flying sensor
network. In: Proc. of IEEE International Conference on Sensor Networks, Ubiqui-
tous, and Trustworthy Computing, vol. 1, pp. 194–199 (2006)
5. Elsaesser, D.: Emitter geolocation using low-accuracy direction-finding sensors. In:
IEEE Symposium on Computational Intelligence for Security and Defense Appli-
cations, CISDA, pp. 1–7 (2009)
6. Isaacs, R.: Differential Games. Wiley (1965)
7. Lunze, J., Lamnabhi-Lagarrigue, F.: Handbook of Hybrid Systems Control: Theory,
Tools and Applications. Cambridge Univ. Press (2009)
8. Nerode, A., Remmel, J.B., Yakhnis, A.: Hybrid system games: Extraction of control
automata with small topologies. In: Handbook of Hybrid Systems Control: Theory,
Tools and Applications. Cambridge Univ. Press (2009)
9. Scerri, P., Glinton, R., Owens, S., Sycara, K.: Locating RF Emitters with Large
UAV Teams. In: Pardalos, P.M., Murphey, R., Grundel, D., Hirsch, M.J. (eds.) Adv.
in Cooper. Ctrl. & Optimization. LNCIS, vol. 369, pp. 1–20. Springer, Heidelberg
(2007)
10. Scerri, P., Glinton, R., Owens, S., Scerri, D., Sycara, K.: Geolocation of RF emit-
ters by many UAVs. In: AIAA, Infotech@Aerospace 2007 Conference and Exhibit
(2007)
11. Walter, D.J., Klein, J., Bullmaster, J.K., Chakravarthy, C.V.: Multiple UAV to-
mography based geolocation of RF emitters. In: Proc. of the SPIE Defense, Secu-
rity, and Sensing 2010 Conference, Orlando, FL, April 5-9 (2010)
Learning Correlated Equilibria
in Noncooperative Games with Cluster
Structure
1 Introduction
Consider a noncooperative repeated game with a set of players comprising multi-
ple non-overlapping clusters. Clusters are characterized by the subset of players
that perform the same task locally and share information of their actions with
each other. However, clusters do not disclose their action profile to other clusters.
In fact, players inside clusters are even oblivious to the existence of other clusters
or players. Players repeatedly take actions to which two payoffs are associated: i)
local payoffs: due to performing localized tasks within clusters, ii) global payoffs:
due to global interaction with players outside clusters. The incremental informa-
tion that players acquire at the end of each period then comprises: i) the realized
payoff, delivered by a third party (e.g. network controller in sensor networks),
and ii) observation of action profile of cluster members. Players then utilize this
information and continuously update their strategies – via the proposed regret-
based learning algorithm – to maximize their expected payoff. The question we
tackle in this paper is: Given this simple local behavior of individual agents, can
the clustered network of players achieve sophisticated global behavior? Similar
problems have been studied in the economics literature. For seminal works, the
reader is referred to [1,2,3].
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 115–124, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
116 O.N. Gharehshiran and V. Krishnamurthy
Context: The motivation for such a formulation stems from multi-agent net-
works that require some sort of cluster structure such as intruder monitoring
in sensor networks. Consider a multiple-target localization scenario in an unat-
tended ground sensor network [5,6]. Depending on their locations, sensors form
clusters each responsible for localizing a particular target. Sensors receive two
payoffs: i) local payoffs, based on the importance and accuracy of the informa-
tion provided about the local phenomena, ii) global payoffs, for communicating
the collected data to the sink through the communication channel, which is
globally shared amongst all sensors. Consideration of the potential local interac-
tion among sensors leads to a more realistic modeling, hence, more sophisticated
design of reconfigurable networked sensors.
Here, a_{−k}^{Cm} ∈ ×_{k′∈Cm, k′≠k} A^{k′} and a^{−Cm} ∈ ×_{k′∉Cm} A^{k′} denote the joint action profile of
cluster Cm (to which player k belongs) excluding player k, and the joint action
profile of all players excluding cluster Cm, respectively. In addition, U_l^k(a^k, a_{−k}^{Cm})
= 0 if cluster Cm is a singleton.
Time is discrete, n = 1, 2, . . .. Each player k takes an action a_n^k at time instant
n and receives a payoff U_n^k(a_n^k). Each player is assumed to know its local payoff
function U_l^k(·); hence, taking action a_n^k and knowing a_n^{Cm}, it is capable of evaluat-
ing its stage local payoff. Players do not know the global payoff function U_g^k(·).
However, they can compute their realized global payoffs as follows:

    U_{g,n}^k(a_n) = U_n^k(a_n^k) − U_l^k(a_n^k, a_n^{Cm}).        (3)

Note that, even if players knew U_g^k(·), they could not compute stage global
payoffs, as they are unaware of the actions taken by players outside their cluster,
namely a_n^{−Cm}.
5) Strategy σ^k: At period n, each player k selects actions according to a
randomized strategy σ^k ∈ ΔA^k = { p^k ∈ R^{|A^k|} : p^k(a) ≥ 0, Σ_{a∈A^k} p^k(a) = 1 }.
The learning algorithm is an adaptive procedure whereby obtaining relatively
Here, I{·} denotes the indicator function, and the step-size is selected as ε_n =
1/(n + 1) (in static games) or ε_n = ε̄, 0 < ε̄ ≪ 1 (in slowly time-varying games).
4) Recursion: Set n ← n + 1 and go to Step 1.
Remark 1. The game model may evolve with time due to: i) players join-
ing/leaving the game, ii) players appending/shrinking the set of choices, iii)
changes in players’ incentives, and iv) changes in cluster membership agree-
ments. In these cases, to keep players responsive to the changes, a constant
step-size ε_n = ε̄ is required in (6) and (7). Algorithm 1 cannot respond to mul-
tiple successive changes in the game as players’ strategies are functions of the
time-averaged regrets.
Before proceeding with the main theorem of this paper, we provide the defi-
nition of the correlated ε-equilibrium Cε .
Definition 1. Let π denote a joint distribution on A^K, where π(a) ≥ 0 for all
a ∈ A^K and Σ_{a∈A^K} π(a) = 1. The set of correlated ε-equilibria, denoted by
Cε, is the convex set [4]

    Cε = { π : Σ_{a^{−k}} π(i, a^{−k}) [ U^k(j, a^{−k}) − U^k(i, a^{−k}) ] ≤ ε,  ∀i, j ∈ A^k, ∀k ∈ K }.        (9)
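Condition (9) can be checked directly for a finite game by enumerating all deviation pairs (i, j) for each player. The sketch below does this for payoff arrays indexed by the joint action; the Chicken-style payoff matrices in the usage example are illustrative assumptions, not the games studied in this paper.

```python
import numpy as np

def is_correlated_eps_equilibrium(pi, payoffs, eps=0.0, tol=1e-12):
    """Check (9): for every player k and every action pair (i, j),
    sum_{a^{-k}} pi(i, a^{-k}) * (U^k(j, a^{-k}) - U^k(i, a^{-k})) <= eps.

    pi         : joint distribution, array of shape (|A^1|, ..., |A^K|)
    payoffs[k] : player k's payoff array, same shape as pi
    """
    for k, U in enumerate(payoffs):
        for i in range(pi.shape[k]):
            pi_i = np.take(pi, i, axis=k)        # pi(i, a^{-k})
            for j in range(pi.shape[k]):
                u_j = np.take(U, j, axis=k)      # U^k(j, a^{-k})
                u_i = np.take(U, i, axis=k)      # U^k(i, a^{-k})
                if np.sum(pi_i * (u_j - u_i)) > eps + tol:
                    return False
    return True
```

For the game of Chicken with U1 = [[6, 2], [7, 0]] and U2 = [[6, 7], [2, 0]], the distribution putting weight 1/3 on each of the outcomes (0,0), (0,1), (1,0) passes this check, while the pure outcome (0,0) alone does not.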
Proof. The proof uses concepts in stochastic averaging theory [8] and Lyapunov
stability of differential inclusions [9]. In what follows, a sketch of the proof will
be presented:
    dᾱ^k/dt ∈ L^k(ᾱ^k, β̄^k) − ᾱ^k,
    dβ̄^k/dt = G^k(ᾱ^k, β̄^k) − β̄^k,        (11)

where the elements of the set-valued matrix L^k(ᾱ^k, β̄^k) and the matrix G^k(ᾱ^k, β̄^k)
are given by:

    L^k_{ij}(ᾱ^k, β̄^k) = { [ U_l^k(j, ν^{Cm}) − U_l^k(i, ν^{Cm}) ] σ^k(i) ; ν^{Cm} ∈ ΔA^{Cm−{k}} },        (12)

    G^k_{ij}(ᾱ^k, β̄^k) = [ U_{g,t}^k(j) − U_{g,t}^k(i) ] σ^k(i),        (13)

for some bounded measurable process U_{g,t}^k(·). Here,

    U_l^k(a^k, ν^{Cm}) = ∫_{A^{Cm−{k}}} U_l^k(a^k, a^{Cm}) dν^{Cm}(a^{Cm}),        (14)
This, together with step 1, proves that if player k employs the learning procedure
in Algorithm 1, ∀ε ≥ 0, there exists δ̂(ε) ≥ 0 such that if δ ≤ δ̂(ε) in Algorithm 1:
Local payoffs (U_l^1, U_l^2):

              2: x1    2: x2
    1: x1    (3, 5)   (2, 3)
    1: x2    (3, 3)   (5, 4)

Global payoffs (U_g^1, U_g^2, U_g^3):

                     3: y1                        3: y2
               2: x1        2: x2           2: x1        2: x2
    1: x1   (−1, 3, 1)   (2, −1, 3)      (1, −1, 3)   (0, 3, 1)
    1: x2   (1, −1, 3)   (1, 4, 1)       (3, 3, 1)    (−1, 0, 3)
    lim sup_{n→∞} [ ᾱ_n^k(i, j) + β̄_n^k(i, j) ]^+ ≤ ε   w.p. 1, ∀i, j ∈ A^k.        (16)
3) The global behavior z̄n converges to Cε if and only if (16) holds for all
players k ∈ K. Thus, if every player k follows Algorithm 1, z̄n converges almost
surely (in static games) or weakly tracks (in slowly evolving games) the set of
correlated ε-equilibrium Cε .
4 Numerical Example
[Fig. 1(a): average payoffs vs. iteration number n. Fig. 1(b): distance to correlated equilibrium vs. iteration number (log scale).]
Fig. 1. Performance Comparison: The solid and dashed lines represent the results from
Algorithm 1 and the reinforcement learning algorithm in [2], respectively. In (a), the
blue, red and black lines illustrate the sample paths of average payoffs of agents 1, 2 and
3, respectively. The dotted lines also represent the payoffs achievable in the correlated
equilibrium.
5 Conclusion
We considered noncooperative repeated games with cluster structure and
presented a simple regret-based adaptive learning algorithm that ensured con-
vergence of global behavior to the set of correlated ε-equilibria. Noting that
reaching correlated equilibrium can be conceived as consensus formation in ac-
tions amongst players, the proposed learning algorithm could have significant
References
1. Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilib-
rium. Econometrica 68, 1127–1150 (2000)
2. Hart, S., Mas-Colell, A.: A reinforcement procedure leading to correlated equilibrium.
In: Economic Essays: A Festschrift for Werner Hildenbrand, pp. 181–200 (2001)
3. Hart, S., Mas-Colell, A.: A general class of adaptive strategies. Journal of Economic
Theory 98, 26–54 (2001)
4. Aumann, R.J.: Correlated equilibrium as an expression of Bayesian rationality.
Econometrica: Journal of the Econometric Society 55, 1–18 (1987)
5. Krishnamurthy, V., Maskery, M., Yin, G.: Decentralized adaptive filtering algo-
rithms for sensor activation in an unattended ground sensor network. IEEE Trans-
actions on Signal Processing 56, 6086–6101 (2008)
6. Gharehshiran, O.N., Krishnamurthy, V.: Coalition formation for bearings-only lo-
calization in sensor networks – a cooperative game approach. IEEE Transactions on
Signal Processing 58, 4322–4338 (2010)
7. Nau, R., Canovas, S.G., Hansen, P.: On the geometry of nash equilibria and corre-
lated equilibria. International Journal of Game Theory 32, 443–453 (2004)
8. Kushner, H.J., Yin, G.: Stochastic Approximation Algorithms and Applications,
2nd edn. Springer, New York (2003)
9. Benaı̈m, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential in-
clusions; Part II: Applications. Mathematics of Operations Research 31, 673–695
(2006)
Marketing Games in Social Commerce
Dohoon Kim
Abstract. This study first provides a stylized model that captures the essential
features of the SC (Social Commerce) business. The model focuses on the
relationship between key decision issues such as marketing inputs and the revenue
stream. As more SCs join the industry, they inevitably face fierce
competition, which may lead to a sharp increase in total marketing and
advertising expenditure. This type of competition may lead the industry away
from its optimal development path and, at worst, toward a disruption of the
entire industry. Such being the case, another goal of this study is to examine the
possibility that the tragedy of the commons may occur in the industry. Our basic
analysis presents Nash equilibria with both homogeneous and heterogeneous
players. Under a symmetric situation with homogeneous SCs, our analysis
specifies the conditions under which the tragedy of the commons can occur. Further
discussions provide strategic implications and policy directions to overcome the
shortcomings intrinsic to the current business model, and help the industry
develop sustainably toward the next level.
1 Introduction
SC (Social Commerce, or social shopping) providers started their business by
combining group buying with selling discounts from their partners over the Internet.
SC providers split the revenue with their business partners at a predefined
commission rate. After Groupon first initiated this business model in 2009, this type
of service has been called 'group buying,' since the service proposals become
effective only when more than a certain number of customers buy the coupons. The
SC services are also called 'daily deals' or 'flash deals,' which emphasizes that
the service offerings are usually valid only for a short period of time.
SC, barely three years old as a new industry, has been witnessing rapid growth, and
more customers, business partners and investors have joined the industry. More than
500 SC providers (hereafter, simply referred to as SCs) are running their business
worldwide([15]). 1 In Korea, one of the hottest regions of the SC industry, the
1
The statistics vary to some extent since the ways of defining the SC industry differ across
countries. Other statistics suggest that the number of SCs in the middle of 2011 amounted to
320 in the U.S., more than 3,000 in China, more than 300 in Japan, and 230 in Korea,
respectively (Kim, 2011; Lee, 2011; ROA Holdings, 2011).
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 125–137, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
transaction scale over one quarter amounts to more than 200 million dollars. The sales
revenue of SCs increased from 45 million dollars in 2010 to almost 500 million
dollars in 2011. These figures mean that the industry has grown 10 times in terms of
sales revenue and 20 times in terms of transaction scale over a year. As of the end of
2011, more than a third of the population in Korea had subscribed to and experienced the
service ([9]). One observes similar figures in the East Asian region,
where the SC business is most popular outside the U.S. Over the past years, for
example, the sales revenue has increased from 780 billion dollars to more than 1
trillion dollars in the U.S., from 1,200 million dollars to 3,550 million dollars in
China, and from 8,400 million dollars to 11 billion dollars in Japan ([5]).
The emergence of SC reflects the collective bargaining power of end-users as the
Internet has shifted the bargaining power from sellers to customers. One of the
distinct examples of this change is what SNS(Social Network Service) brought to the
distribution channels and marketing efforts. Thanks to this new opportunity,
customers, particularly in younger generations who are now initiating and shaping the
consumer trends, have been exposed to more deals, discount chances, and new
information around their local areas. Accordingly, they have been easily allured to the
service proposals from SCs and gave a boost to the industry in its early stage.
However, many criticisms of the SC businesses are emerging now: for
example, [12], [14], [15], [18], [19], etc. These startups have drawn skepticism for
unusual accounting practices, rising marketing costs, and quality-assurance problems.
This could make it more difficult for SCs to lure investors. Indeed, Groupon experienced an
unstable fluctuation of its stock price after its IPO, and LivingSocial withdrew its moves
toward an IPO. Moreover, the most urgent and critical view on the SC industry points out
that the industry's growth rates are unsustainable. Some also argue that the business
model of SC has flaws and cannot be justified by the current overheated market.
The resulting instability may suddenly leave customers, partners, and investors
disenchanted. According to the Korea Consumer Protection Board, the number of
accusation cases about service failures and customer damages reached 500 in
2011 ([9]). Furthermore, many SCs seem to suffer from huge marketing expenses and
low ARPU (Average Revenue Per User). This result was predictable, since their
business practices reinvested a big portion of revenue in advertising and promotion
and maintained a wide range of service offerings whose assortment and
operations costs are too high to justify. Indeed, in most countries, ARPU has been
stuck at a very low level: for example, [9], [11], [16].
The prosperity and adversity of the SC industry carry meaningful implications for
other e-commerce industries. The business model of SC may seem IT-intensive
at a glance, but it relies heavily upon labor-intensive work. In fact, human resources
and manpower are the main source of competitiveness in this industry. The SC business
model requires investigating various commercial districts, negotiating and contracting
with partners, and advertising and promoting the service offerings to anonymous consumers.
All of these activities require human intervention. This explains why the average sales
amount per employee is far lower than those in other e-commerce sectors such as SNS,
search engines, and business portals ([5], [11]). Thus, the low entry barrier in the SC
industry is very likely to propel the industry into a chicken game in marketing. The
worst outcome of persisting in the current situation is that the business model could
end up as another bubble and the entire industry could collapse. SCs, entering a
new phase, should revise the value proposition that they are willing to deliver to the
market and develop a niche differentiated from online shopping malls or open markets.
This study aims at providing a stylized model that captures the essential features of
the SC business model. We analyze the model to see whether SC is sustainable
and find conditions for the stable evolution of the industry. Our approach first
focuses on the relationship between marketing efforts and the revenue stream. As more
SCs join the industry, fierce competition is inevitable, resulting in a sharp increase in
marketing and advertising expenditure. This type of competition may lead the
industry away from its optimal development path and, at worst, toward a disruption of
the entire industry. Such being the case, the contribution of this study can be seen as
examining the possibility that 'the tragedy of the commons' occurs in the industry and
devising a means of avoiding the tragedy.
The organization of our paper is as follows. In Section 2, we present our model, which
is stylized to demonstrate the essential features of the SC business process and
competitive landscape. We analyze the model in the next section and investigate the
possibility that the tragedy of the commons occurs in the industry due to excessive
competition for market share. Implications of our findings through modeling and
analysis follow in the next section. Section 4 also discusses the future
development of the SC business model to overcome its limitations. The last section
concludes the paper and suggests future work.
2 Model
increase will be worrisome to potential investors since it could signal that it is
getting more costly for an SC to acquire and retain customers in order to
maintain its revenue stream. The business model thus reveals a nature in which success
constrains its own growth.
This self-destructive aspect is most clearly exposed when there is less available
inventory or service capacity (ironically, thanks to the success of the SC
business) for the many deal-prone customers. In that situation, which is quite plausible,
the willingness of partner suppliers to offer a deep discount will go down, and
price-sensitive shoppers will switch to another SC that offers a better deal. In the long
run, competition among SCs will drive down the discount rate and/or the minimum
required threshold.
Considering the arguments above, we formulate an SC business model that
incorporates both the bright side and the inherent weakness, and delve into the possibility
of self-destruction. We focus on key decisions of SCs such as marketing efforts
and the service configurations offered to customers. Due to fierce competition among
SCs, however, the commission rate is highly likely to be standardized and common to
all SCs. For example, the commission rate in Korea has remained almost fixed at 20%
for over a year ([9]). With a fixed commission rate, our model allows an SC to leverage
its minimum required number (see the definition of the threshold below) as a
competitive tool depending on its competitive capability. We further assume that the
discount rates in the service configuration are already reflected in this threshold level. In
sum, for the purpose of our study, it suffices to focus on the marketing expenses and
the threshold level.
Let us suppose that there is a set of N SCs, where k is employed as the
index for an individual (sometimes representative) SC. N may also denote the set of
SCs when this is clear from the context, i.e., N = {1, …, n}. We define the following
notation for the model elements:
• ek: marketing effort of SC k,
• tk: customer favor threshold (hereafter simply referred to as 'threshold') set by
SC k (i.e., a reference point representing a service configuration, including a discount
package and a minimum number of customers required for the service offering to be
effective),
• δk: SC k's tolerance capability to maintain positive cash flows in choosing the
threshold level (i.e., the maximum threshold level that SC k endures),
• E: total marketing effort currently under way in the industry (i.e., E = Σ_{k∈N} ek).
Then, the stylized business model of SCs is abstracted as follows. First, an SC issues
coupons that can be used to purchase products or services from a partner supplier at a
constant discount rate. However, those coupons are effective only when the number of
coupons sold exceeds a minimum required number of users, or a threshold (tk), set by
the corresponding SC k. The revenue of SC k will be proportional to the effective
demand that it faces. In general, the revenue function of SC k can be represented
by rk(tk, ek, E).
Here ∂rk/∂ek > 0, ∂rk/∂E−k < 0, and ∂²rk/∂tk² < 0. (1)
For example, we may employ rk(tk, ek, E) = (ek/E)⋅tk⋅(δk − tk), where δk is the
maximum threshold level that SC k endures, simply called the capability of SC
k; that is, SC k loses money if it sets tk beyond δk.
Now, we need to explain the conditions in (1) in more detail. First, the sales revenue
(the amount of deals concluded) of SC k will be proportional to its relative marketing
expenses. This feature reflects the current situation, with a very low entry barrier and
brand recognition directly related to market share. Thus, we get the first inequality in
(1). However, the marketing efforts of other players exert a negative effect on the
corresponding revenue rk, which the second inequality in (1) captures.
Before explaining the third inequality, note that the threshold has an effect both for
and against the sales of SC k. The bigger tk is, the larger the profit margin SC k can
expect. On the other hand, the probability of 'service failure' increases as tk rises.
By 'service failure' we mean a service that was offered but failed to be
delivered because the effective demand fell short of the threshold. In turn, the SC
must compensate for the failure according to the predefined SLA (Service Level
Agreement), which results in a loss on the revenue stream. According to a survey
conducted by the Korea Chamber of Commerce and Industry, more than 50% of
complaints from SC customers concern service failures such as shortages in
quantity and quality degradation due to excessive sales of coupons ([8]). SCs are
responsible for these service failures and must compensate the affected
customers for breach of the service agreement, which ultimately reduces the actual
revenue. Thus, an increase in the threshold tk will enhance the revenue at first, but it
will also increase the possibility of service failure, thereby reducing the real revenue
in the end. We model this effect of tk on the revenue as concave, which yields
the third inequality in (1).
Finally, we need to net out the cost of the individual effort of SC k, which is
assumed to be proportional to the amount of effort ek: that is, ck⋅ek. Note that ck
involves both the pecuniary and the non-pecuniary unit cost incurred in the course of
marketing operations. Thus, it can be thought of as the entire ex ante burden
when SC k implements one unit of marketing action. ck should not be confused
with the marketing expense ek, which represents the ex post amount paid for
marketing-related activities. There is no cost associated with the decision on tk since that
decision is a matter of deliberation and does not incur pecuniary costs. In sum, the
final payoff (profit) of SC k is formulated as follows:
πk = rk − ck⋅ek = tk⋅(ek/E)⋅(δk − tk) − ck⋅ek (tk ≤ δk). (2)
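As a concrete illustration of the payoff formula (2), the sketch below evaluates πk for a hypothetical effort profile; the values δk = 8, ck = 0.5 and the efforts are made up, and only the formula itself comes from the text.

```python
def payoff(t_k, e_k, E, delta_k, c_k):
    """pi_k = t_k * (e_k / E) * (delta_k - t_k) - c_k * e_k, as in (2)."""
    assert t_k <= delta_k, "the model requires t_k <= delta_k"
    return t_k * (e_k / E) * (delta_k - t_k) - c_k * e_k

efforts = [2.0, 3.0, 5.0]        # hypothetical e_1, e_2, e_3
E = sum(efforts)                 # total industry marketing effort E = 10
print(payoff(4.0, efforts[0], E, delta_k=8.0, c_k=0.5))   # -> 2.2
```

Setting tk = δk drives the revenue term to zero, so the payoff reduces to the pure cost −ck⋅ek, matching the remark that SC k loses money beyond its capability.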
3 Analysis
Our analysis first presents Nash equilibria of the model. Assuming that heterogeneous
SCs may employ different strategies, the following Proposition shows that there are
infinitely many solutions, in particular for the best response of the individual marketing
effort ek.
determined as follows:
tk* = δk/2, and
εk* is a solution to the linear equation system 1 − ζij = −ζij⋅εi* + εj*, ∀ i < j in N.
Then, ek* is determined by P⋅εk* with a suitable proportionality constant P.
Proof: First, one can easily show that tk* satisfies the FONC (first-order necessary
condition). The linear equation system for the εk's (k = 1, …, N) comes from the set of
FONCs for the ek*'s. It is possible to derive a closed-form solution for εk* by utilizing the
matrix structure of the linear system and employing Cramer's rule. The
detailed procedure is omitted here; instead, an example is demonstrated below. Once
the εk*'s are identified, we can construct ek* by simply multiplying the corresponding
εk* by a constant P. Although the system of simultaneous equations for the ek's has
infinitely many solutions, thanks to the linearity of the equation system for the εk's, ek* is
unique up to scalar multiplication.
To check the SOSC (second-order sufficient condition), we construct the
Hessian matrix H as follows. One can easily show that H is negative definite at the
points satisfying the FONCs if both tk* and ek* are positive, as assumed in the
Proposition above:

H = ⎡ −2E⋅E−k⋅tk⋅(δk − tk)/E⁴   E−k⋅(δk − 2tk)/E² ⎤
    ⎣ E−k⋅(δk − 2tk)/E²         −2ek/E            ⎦, where E−k = Σ_{i≠k} ei. Q.E.D.
SC k, εk* (and thereby ek*) increases as ζkj (j ≠ k) increases, but decreases as ζjk (j ≠ k)
increases. Thus, more marketing effort by SC k is expected if its relative capability
(i.e., δk/δj, j ≠ k) is enhanced and/or its relative marketing cost (i.e., ck/cj, j ≠ k)
decreases. However, the former has a stronger effect on ek* than the latter since
ζij is proportional to the square of the relative capability. Consequently, the critical
competitive edge can be gained by enhancing the capability of an SC to maintain
a positive cash flow against low margins.
If all the SCs have the same capability and cost structure, a symmetric Nash
equilibrium can be found, as in the following Proposition. Such symmetric cases
with homogeneous SCs may fit two stages of the industry life-cycle. The first is the
infant or very early stage of the industry, where a small number of similar-sized
companies constitute the industry. The other is the mature stage of the life-cycle,
where many small- and medium-sized SCs (in particular, those with low δk) are forced out
of the market and a small number of big SCs with similar properties survive.
Proof: As for tk*, the same reasoning as in Proposition 1 applies. Thanks to the
symmetric strategy assumption, we can construct the system of linear equations for
the ek's directly from the set of FONCs, c⋅(E*)² = (δ²/4)⋅E*−k for all k. The last equation
reduces to c⋅N²⋅e* = δ²⋅(N − 1)/4 since E* = N⋅e* and E*−k = (N − 1)⋅e*. Thus, we
get tk* and ek* as above, from which we see that the SOSCs are trivially satisfied.
Q.E.D.
First note that in the case of symmetric strategies, the optimal level of the customer
favor threshold t* does not depend on the number of SCs in the industry. On the other
hand, it is interesting to look into the combined effort or total expenditure from all
SCs (i.e., E* = N⋅e* = ((N − 1)/N)⋅(δ²/4c)), which does depend on the number of SCs. The
combined effort increases with N (i.e., dE*/dN > 0), but the rate of growth is
diminishing with N (i.e., d²E*/dN² < 0). Furthermore, E* converges to a limit as N
goes to infinity: i.e., lim_{N→∞} E* = δ²/4c ≡ Ê. In sum, E* is a concave function of N,
which converges to Ê.
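These comparative statics of E* are easy to verify numerically; in the sketch below the values δ = 2 and c = 0.25 are purely illustrative, and only the formula E* = ((N − 1)/N)⋅δ²/4c comes from the Proposition above.

```python
delta, c = 2.0, 0.25             # illustrative parameters

def total_effort(N):
    """E* = ((N - 1) / N) * delta**2 / (4c), the symmetric-equilibrium total."""
    return (N - 1) / N * delta**2 / (4 * c)

E_hat = delta**2 / (4 * c)       # the limit of E* as N -> infinity
values = [total_effort(N) for N in range(2, 12)]
diffs = [b - a for a, b in zip(values, values[1:])]
assert all(d > 0 for d in diffs)                         # increasing in N
assert all(d2 < d1 for d1, d2 in zip(diffs, diffs[1:]))  # at a diminishing rate
assert all(v < E_hat for v in values)                    # bounded above by E_hat
```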
132 D. Kim
Although each SC may exert less marketing effort as there are more SCs
(see e* in Proposition 2), the addition of new SCs swamps this effect, thereby
increasing the total marketing effort (E*) in the market. If we assume that the
revenue function reflects the market demand, there will be a strong possibility of
overexploitation of customers; that is, collectively, SCs will exert marketing efforts
far beyond the point that maximizes the potential market demand.
This resembles the typical situation of 'the tragedy of the commons,' where this sort of
negative externality is at the heart of the problem ([3], [6], [7]). When an SC
advertises, it does not take into account the negative effect that its action might have on
the revenue streams of other SCs.
To examine this possibility more precisely, let us first define the industry
performance measure W(·) as a function of the total marketing expense E and the
average customer favor threshold t̄ as follows:
With the industry performance measure in (4), the following Proposition explains
how the socially optimal E0 and t0 are determined.
t0 = δ/2 and E0 = δ⋅(α⋅δ + 2c⋅β)/(8c⋅α).
Proof: First, it is easy to show that the FONCs are satisfied by t0 and E0 if α⋅δ > c⋅β (in
particular, for t0), which is implied by the condition above. To check the SOSC,
we construct the Hessian matrix below:

H = ⎡ −2c⋅α              c⋅β + α⋅(δ − 2t)   ⎤
    ⎣ c⋅β + α⋅(δ − 2t)   β⋅(4t − δ) − 2α⋅E  ⎦.

This Hessian is indeed negative definite at t0 and E0 when (α⋅δ − c⋅β)² > 3(c⋅β)²,
which is equivalent to (δ/c − β/α)² > 3⋅(β/α)², or (δ/c)² − 2⋅(δ/c)⋅(β/α) − 2⋅(β/α)² > 0.
Since δ/c is positive, this inequality is satisfied if the condition in the Proposition holds.
Q.E.D.
Q.E.D.
Note that t0 = t*; that is, at least for the threshold, the socially optimal level and the
optimal level of an individual choice coincide. Therefore, we may predict that SCs
will manage their threshold levels at the socially optimal level.
However, this desirable feature may not be sustained when we consider the total
marketing effort. Furthermore, a ramification of the tragedy of the commons shows a
'phase transition' nature, where the relationship between δ/c and β/α specifies the
sharp boundary of the phase transition. We have already seen that a relationship
between these two terms underlies the conditions in Proposition 3; those conditions
hold when δ/c is far larger than β/α. Proposition 4 goes further and provides another
relationship (in a somewhat different form) between these two terms. This relation is
critical in triggering 'the tragedy of the commons.'
T = 2α⋅δ/(α⋅δ − 2c⋅β),
Case (b) δ/c ≤ 2β/α: Then, for any N, the total marketing effort falls short of the
socially optimal level (i.e., E* ≤ E0).
Proof: E* > E0 if and only if (α⋅δ + 2c⋅β)⋅N < 2α⋅δ⋅N − 2α⋅δ, which can be
rearranged into (2c⋅β − α⋅δ)⋅N < −2α⋅δ. Then we have two cases. The condition in
Case (a) corresponds to the situation where the left-hand side is negative, while the
condition in Case (b) guarantees that the left-hand side is non-negative. Thus, in Case
(b), the inequality E* > E0 cannot hold unless N is negative, which is impossible. In
Case (a), E* > E0 holds for N > 2α⋅δ/(α⋅δ − 2c⋅β) ≡ T. Furthermore, the numerator of T is
always larger than the denominator under the condition in Case (a), which guarantees T
> 1. Q.E.D.
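A quick numerical check of this threshold behavior, with hypothetical parameters chosen so that Case (a) holds (α⋅δ > 2c⋅β): over-exploitation E* > E0 sets in exactly once the number of SCs exceeds T.

```python
def E_star(N, delta, c):
    # equilibrium total effort, from Proposition 2
    return (N - 1) / N * delta**2 / (4 * c)

def E_0(delta, c, alpha, beta):
    # socially optimal total effort, from Proposition 3
    return delta * (alpha * delta + 2 * c * beta) / (8 * c * alpha)

delta, c, alpha, beta = 4.0, 0.5, 1.0, 1.0   # made up; alpha*delta > 2c*beta
T = 2 * alpha * delta / (alpha * delta - 2 * c * beta)   # = 8/3 here
for N in range(2, 10):
    # the tragedy (E* > E0) appears exactly for N > T
    assert (E_star(N, delta, c) > E_0(delta, c, alpha, beta)) == (N > T)
```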
The results of the Proposition imply that one cannot expect the SC industry to be
sustained unless the condition in Case (b) comes true. In Case (a), whether the
industry survives depends on the number of SCs: a limited number of SCs
may thrive only if the size of the industry is kept below T. It is not difficult to
construct examples in which both the condition of Case (b) and the limited opportunity
of N < T in Case (a) of Proposition 4 are rarely observed. Therefore, the tragedy of the
commons seems inevitable in most practical situations.
By rearranging T into 2α/(α − 2c⋅β/δ), we see that T is larger than 2 and converges
to 2 as δ becomes larger. Since dT/dδ < 0 and d²T/dδ² > 0, T is diminishing in δ but slowly
converges to 2 as δ goes to infinity. However, T shows a different behavior when q ≡ β/α
changes. Again, by rearranging terms in T, we get another expression,
T = 2δ/(δ − 2c⋅q), with dT/dq > 0 and d²T/dq² > 0 when δ > 2c. Subsequently, T is close to 2
when α is far larger than β (i.e., β/α ≈ 0), and increases very rapidly (to infinity) as
β/α approaches δ/(2c) (> 1). This behavior implicitly puts an upper bound on the
relative size of β to α; that is, β cannot be larger than δ⋅α/(2c). As a result, T
appears more sensitive to β/α than to δ.
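This sensitivity of T to q ≡ β/α can be sketched directly; c = 1 and δ = 4 below are hypothetical (chosen with δ > 2c so that δ/(2c) = 2 > 1), and only the expression for T comes from the text.

```python
c, delta = 1.0, 4.0              # illustrative, with delta > 2c

def T_of_q(q):
    """T = 2*delta / (delta - 2c*q), valid for q < delta/(2c)."""
    return 2 * delta / (delta - 2 * c * q)

assert T_of_q(0.0) == 2.0        # alpha >> beta: T sits at its floor of 2
samples = [T_of_q(q) for q in (0.5, 1.0, 1.5, 1.9, 1.99)]
assert all(a < b for a, b in zip(samples, samples[1:]))   # increasing in q
print(samples)                   # blows up as q -> delta/(2c) = 2
```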
Since t0 = t* and neither depends on the number of SCs under symmetric
strategies, we can view the performance structure from a different angle by defining
two parametric functions based on our model: H = H(t) ≡ −β⋅t and J = J(t) ≡ t⋅(δ − t).
Note that at a symmetric equilibrium, both H(·) and J(·) are constant:
specifically, H = −β⋅δ/2 and J = δ²/4 at both the social and the individual optimal
levels (t0 and t*). Accordingly, the performance measure (4) can be simply viewed as
if it were a function of E only, as below:
Note that T̂ > 1 in Case (a) of Corollary 5. Thus, we still have a chance to escape
from the tragedy of the commons even in Case (a), when N < T̂. Unfortunately, however,
a reasoning procedure similar to the one following Proposition 4 reveals that T̂ is
always larger than 2 but quite small in most normal situations.
The SC startups have drawn criticism for unusual accounting practices, rising
marketing costs, and inadequate quality assurance, despite rapid growth in their early
stage. We have tried to understand the current critical situation and identify the causes of
the pessimistic view of the SC industry. To this end, our study developed
stylized game models and analyzed them to uncover some potential (but critical)
problems inherent in the business model at the early stage of the industry life-cycle. In
particular, we focused on the conditions under which the SC industry is sustainable.
Our findings and analytical results provide strategic implications and policy
directions for overcoming the shortcomings intrinsic to the current business model. For
example, a set of regulations on marketing activities may help the industry
develop sustainably toward the next level. Along this line, our future work will
pursue empirical studies to identify the parameters in our model so that we can
further enrich knowledge about the industry. For example, although gathering data
will be intrinsically difficult given the early stage of the industry, we need to develop
an operational definition of the social welfare W to estimate the relevant parameters
such as α and β in our model. Then we will be able to quantify the conditions under
which a (group of) first-mover(s) survives and estimate a proper size at which the
industry is sustainable in the long run.
References
1. Baek, B.-S.: Ticket Monster and Coupang, head-to-head competition for the industry’s
number one position. Economy Today (November 25, 2011) (in Korean),
http://www.eto.co.kr/news/
outview.asp?Code=20111125145743203&ts=133239
2. Patel, K.: Groupon marketing spending works almost too well. Ad Age Digital
(November 12, 2011),
http://adage.com/article/digital/
groupon-marketing-spending-works/230777/
3. Alroy, J.: A multispecies overkill simulation of the end-Pleistocene mega faunal mass
extinction. Science 292, 1893–1896 (2001)
4. Anderson, M., Sims, J., Price, J., Brusa, J.: Turning ‘like’ to ‘buy’: social media emerges
as a commerce channel. White Paper. Booz & Company (January 20, 2012),
http://www.booz.com/media/uploads/
BaC-Turning_Like_to_Buy.pdf
5. Financial News: Special report on social commerce (December 18, 2011) (in Korean),
http://www.fnnews.com/view?ra=Sent0901m_View&corp=fnnews&arc
id=0922494751&cDateYear=2011&cDateMonth=12&cDateDay=18
6. Greco, G.M., Floridi, L.: The tragedy of the digital commons. Ethics and Information
Technology 6, 73–81 (2004)
7. Hardin, G.: The tragedy of the commons. Science 162, 1243–1248 (1968)
8. KCCI: A consumer satisfaction survey on social commerce services. Research report.
Korea Chamber of Commerce and Industry (March 8, 2011) (in Korean),
http://www.korcham.net/EconNews/KcciReport/CRE01102R.asp?m_c
hamcd=A001&m_dataid=20110308001&m_page=1&m_query=TITLE&m_que
ryText=%BC%D2%BC%C8%C4%BF%B8%D3%BD%BA
9. Kim, Y.-H.: Social commerce: current market situations and policy issues. KISDI (Korea
Information Society Development Institute) Issue Report 23, 41–63 (2011) (in Korean)
10. Knowledge at Wharton: Dot-com bubble, part II? Why it’s so hard to value social
networking sites? Knowledge at Wharton Online (October 4, 2006),
http://knowledge.wharton.upenn.edu/
article.cfm?articleid=1570
11. Lee, E.-M.: Global market survey on social commerce. KISDI Issue Report 23, 36–44
(2011) (in Korean)
12. MacMillan, D.: Groupon’s stumbles may force it to pare back size of IPO. Bloomberg
Online (October 3, 2011),
http://www.bloomberg.com/news/2011-10-03/groupon-s-stumbles-
seen-paring-back-size-of-ipo-as-investor-interest-wanes.html
13. MacMillan, D.: LivingSocial aims to be different from Groupon. Business Week Online
(September 22, 2011),
http://www.businessweek.com/magazine/
livingsocial-aims-to-be-different-from-groupon-09222011.html
14. MacMillan, D.: Groupon China Venture said to fire workers for poor performance.
Bloomberg Online (August 24, 2011), http://www.bloomberg.com/news/2011-
08-23/groupon-china-joint-venture-said-to-fire-workers-for-
poor-performance.html
15. Reibstein, D.: How sustainable is Groupon’s business model? Knowledge at Wharton
(May 25, 2011),
http://knowledge.wharton.upenn.edu/
article.cfm?articleid=2784
16. ROA Holdings: The rapidly expanding social commerce market of South Korea and Japan.
Research report (February 21, 2011),
http://global.roaholdings.com/report/
research_view.html?type=country&num=143
17. Urstadt, B.: Social networking is not a business. MIT Technology Review (July/August
2008), http://www.technologyreview.com/business/20922/
18. Webster, K.: Groupon’s business model: bubble or the real deal? (September 19, 2011),
http://pymnts.com/commentary/pymnts-voice/
groupon-s-business-model-bubble-or-the-real-deal/
19. Wheeler, R.: Groupon gone wrong! Harvard business fellow’s warning to investors and
entrepreneurs (August 23, 2011),
http://pymnts.com/briefingroom/shopping-and-social-
buying/social-shopping-and-social-buying/
groupon-gone-wrong-a-warning-to-investors/
Mean Field Stochastic Games with Discrete
States and Mixed Players
Minyi Huang
Keywords: mean field game, finite states, major player, minor player.
1 Introduction
Large population stochastic dynamic games with mean field coupling
have attracted substantial interest in recent years; see, e.g.,
[1,4,11,16,12,13,18,19,22,23,24,26,27]. To obtain low-complexity strategies,
consistent mean field approximations provide a powerful approach; in the
resulting solution, each agent only needs to know its own state information
and the aggregate effect of the overall population, which may be pre-computed
off-line. One may further establish an ε-Nash equilibrium property for the set
of control strategies [12]. The technique of consistent mean field approximations
is also applicable to optimization with a social objective [5,14,23]. The survey
[3] on differential games presents a timely report of recent progress in mean
field game theory. This general methodology has applications in diverse areas
[4,20,27]. The mean field approach has also appeared in anonymous sequential
games [17], with a continuum of players individually optimally responding to
the mean field. However, the modeling of a continuum of independent processes
leads to measurability difficulties, and the empirical frequency of the realizations
of the continuum-indexed individual states cannot be meaningfully defined [2].
A recent generalization of the mean field game modeling has been introduced
in [10] where a major player and a large number of minor players coexist pursuing
their individual interests. Such interaction models are often seen in economic or
engineering settings, simple examples being a few large corporations and many
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 138–151, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
Games with Mixed Players 139
Due to the structure of the costs J0 and Ji, the major player has a significant
impact on each minor player. By contrast, each minor player has a negligible
impact on another minor player or on the major player. Also, from the point of
view of the major player or a fixed minor player, specific individual minor players
are not distinguished; instead, only the aggregate state information
I^(N)(t) matters at each step, which is an important feature of mean field decision
problems.
For the N + 1 decision processes, we specify the joint distribution as follows.
Given the states and actions of all players at time t, the transition probability
to a value of (x0 (t + 1), x1 (t + 1), . . . , xN (t + 1)) is simply given by the product
of the individual transition probabilities under their respective actions.
For an integer k ≥ 2, denote the simplex
Dk = { (λ1, …, λk) ∈ R^k_+ : Σ_{j=1}^{k} λj = 1 }.
To ensure that the individual costs are finite, we introduce the following assumption.
(A1) The one-stage costs c0 and c are functions on S0 × DK × A0 and S ×
S0 × DK × A, respectively, and they are both continuous in θ. ♦
lim_{N→∞} I^(N)(0) = θ0
and h0 = (x0). We may further specify a mixed strategy (or policy; we shall use
the two names strategy and policy interchangeably) of each player as a probability
measure on the action space depending on ht, and use the method of dynamic
programming to identify Nash strategies for the mean field game. However, for a
large population of minor players, this traditional approach is impractical. First,
each player must use centralized information, which causes high implementation
complexity; second, numerically solving the dynamic programming equation
is a prohibitive or even impossible task when the number of minor players exceeds
a few dozen.
142 M. Huang
where x0 (t) has the transition law (1) and θ(t) satisfies (4).
Problem (P0) gives a standard Markov decision process. To solve this problem,
we use the dynamic programming approach by considering a family of optimization
problems associated with different initial conditions. Given the initial state
(x0, θ) ∈ S0 × DK at t = 0, define the cost function
J̄0(x0, θ, u(·)) = E[ Σ_{t=0}^{∞} ρ^t⋅c0(x0(t), θ(t), u0(t)) | x0, θ ].
Denote the value function v(x0, θ) = inf J̄0(x0, θ, u(·)), where the infimum is
taken over all mixed policies/strategies of the form π = (π(0), π(1), …)
such that each π(s) is a probability measure on A0, indicating the probability
of taking a particular action, and depends on the past history (…, x0(s − 1), θ(s −
1), u0(s − 1), x0(s), θ(s)). By taking two different initial conditions (x0, θ) and
(x0, θ′) and comparing the associated optimal costs, we may easily obtain the
following continuity property.
v(x0, θ) = min_{a0∈A0} { c0(x0, θ, a0) + ρ⋅E v(x0(t + 1), θ(t + 1)) }
         = min_{a0∈A0} { c0(x0, θ, a0) + ρ⋅Σ_{k∈S0} Q0(k|x0, a0)⋅v(k, ψ(x0, θ)) }.
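A minimal sketch of solving this dynamic programming equation by value iteration, under strong simplifying assumptions: a toy model with two major-player states and two actions, made-up transition kernels Q0 and costs c0, and ψ frozen to the identity so that θ is a fixed parameter absorbed into c0.

```python
# Toy data: Q0[a][x][y] = Q0(y | x, a); each row is a probability vector.
Q0 = {0: [[0.9, 0.1], [0.2, 0.8]],
      1: [[0.5, 0.5], [0.6, 0.4]]}
c0 = [[1.0, 0.0],                # c0[x][a], with the fixed theta absorbed
      [2.0, 0.5]]
rho = 0.9
S0, A0 = (0, 1), (0, 1)

def bellman(v):
    # one application of the dynamic programming operator above
    return [min(c0[x][a] + rho * sum(Q0[a][x][y] * v[y] for y in S0)
                for a in A0)
            for x in S0]

v = [0.0, 0.0]
for _ in range(500):             # converges by the rho-contraction property
    v = bellman(v)

policy = [min(A0, key=lambda a, x=x: c0[x][a]
              + rho * sum(Q0[a][x][y] * v[y] for y in S0))
          for x in S0]
print(v, policy)
```

The stationary Markov policy comes out as the argmin at the converged value function, mirroring the existence statement that follows in the text.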
Since the action space is finite, an optimal policy π̂0 solving the dynamic
programming equation exists and can be taken as a stationary Markov policy of
the form π̂0(x0, θ), i.e., π̂0 is a function of the current state. Let the set of optimal
policies be denoted by Π0. It is possible that Π0 consists of more than one
element.
where xi (t) has the state transition law (2); θ(t) satisfies (4); and x0 (t) is subject
to the control policy π̂0 ∈ Π0 . This leads to a Markov decision problem with the
state (xi (t), x0 (t), θ(t)) and control action ui (t). Following the steps in Section
3.1, we define the value function w(xi , x0 , θ).
Before analyzing the value function w, we specify the state transition law of
the major player under any mixed strategy π0 . Suppose
π0 = (α1 , . . . , αL0 ), (5)
which is a probability vector. By the standard convention in Markov decision
processes, the strategy π0 selects action k with probability αk . We further define
Q0(z|y, π0) = Σ_{l∈A0} αl⋅Q0(z|y, l),
The process (xi (t), x0 (t), θ(t)) is a Markov chain since the transition probability
from time t to t + 1 depends only on the value of (xi (t), x0 (t), θ(t)) and not on
the past history. Suppose at time t, (xi (t), x0 (t), θ(t)) = (j, k, θ). Then at t + 1,
we have the transition probability
P( xi(t + 1) = j′, x0(t + 1) = k′, θ(t + 1) = θ′ | (xi(t), x0(t), θ(t)) = (j, k, θ) )
= Q(j′|j, π̂(j, k, θ))⋅Q0(k′|k, π̂0(k, θ))⋅δ_{ψ(k,θ)}(θ′).
We use δa(x) to denote the Dirac function, i.e., δa(x) = 1 if x = a, and δa(x) = 0
otherwise. It is seen that the transition probability is determined by (j, k, θ) and
does not depend on time.
If Problems (P0) and (P1) are considered alone, one may always select an optimal
policy which is a pure policy, i.e., given the current state, the action can be
selected in a deterministic manner. However, in the mean field game setting we
need to eventually determine the function ψ by a fixed point argument. For this
reason, it is generally necessary to consider the optimal policies from the larger
class of mixed policies. The restriction to deterministic policies may potentially
lead to a nonexistence situation when the consistency requirement is imposed
later on the mean field approximation.
This section develops the procedure to replicate the dynamics of θ(t) from the
closed-loop system when the minor players apply the control strategies obtained
from the limiting Markov decision problems.
We start with a system of N minor players. Suppose the major player has
selected its optimal policy π̂0(x0, θ) from Π0. Note that for the general case
of Problem (P1), there may be more than one optimal policy. We adopt the
convention that the same optimal policy π̂(xi, x0, θ) is used by all the minor
players, with each minor player substituting its own state into the feedback policy
π̂. This convention is necessary since otherwise the mean field limit
cannot be properly defined when there are multiple optimal policies and each
minor player may take an arbitrary one.
We have the following key theorem on the asymptotic property of the update
of I^(N)(t) as N → ∞. Note that the range of I^(N)(t) is a discrete set. For
any θ ∈ DK, we take an approximation procedure: we suppose the vector θ has
been used by the minor players (of the finite population) at time t in solving
their limiting control problems and is used in their optimal policies.
Theorem 2. Fix any θ = (θ1 , . . . , θK ) ∈ DK . Suppose the major player applies
π̂0 and the N minor players apply π̂, and at time t the state of the major player
We further obtain a probability vector Q1 := (Q(k|1, π̂(1, x0, θ)))_{k=1}^{K}, with its
entries on the set S indicating the probability that each state appears
as a result of the transition of A1.
An important fact is that in the closed-loop system with x0(t) = x0, conditional
independence holds for the transition from xi(t) to xi(t + 1) for the N
processes.
Thus, the distribution of N⋅I^(N)(t + 1) given (x0, I^(N)(t), π̂) is obtained as
the convolution of N independent distributions corresponding to the N minor
players, and Q1 is one of these N distributions. We have
E_{x0,I^(N)(t),π̂} I^(N)(t + 1) = ( Σ_{l=1}^{K} sl⋅Q(1|l, π̂(l, x0, θ)), …, Σ_{l=1}^{K} sl⋅Q(K|l, π̂(l, x0, θ)) ), (8)
where E_{x0,I^(N)(t),π̂} denotes the conditional mean given (x0, I^(N)(t), π̂).
So by the law of large numbers, I^(N)(t + 1) − E_{x0,I^(N)(t),π̂} I^(N)(t + 1) converges
to zero with probability one as N → ∞. We obtain (7).
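The concentration behind this law-of-large-numbers step is easy to see in simulation; the kernel Q below is made up, and the weights sl in (8) are read here as the current state fractions θl.

```python
import random

Q = [[0.7, 0.3],                 # Q[l][k] = Q(k | l, pi_hat(l, x0, theta))
     [0.4, 0.6]]
theta = [0.5, 0.5]               # I(N)(t): fractions of minor players per state
expected = [theta[0] * Q[0][k] + theta[1] * Q[1][k] for k in range(2)]

random.seed(0)
errors = {}
for N in (100, 100_000):
    counts = [0, 0]
    for i in range(N):
        l = 0 if i < theta[0] * N else 1            # N*theta_l players in state l
        k = 0 if random.random() < Q[l][0] else 1   # independent transitions
        counts[k] += 1
    I_next = [c / N for c in counts]
    errors[N] = max(abs(I_next[k] - expected[k]) for k in range(2))
print(errors)    # the deviation from the conditional mean shrinks as N grows
```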
Based on the right-hand side of (7), we introduce the K × K matrix

Q∗(x0, θ) =
⎡ Q(1|1, π̂(1, x0, θ)) ⋯ Q(K|1, π̂(1, x0, θ)) ⎤
⎢ Q(1|2, π̂(2, x0, θ)) ⋯ Q(K|2, π̂(2, x0, θ)) ⎥
⎢          ⋮                      ⋮          ⎥
⎣ Q(1|K, π̂(K, x0, θ)) ⋯ Q(K|K, π̂(K, x0, θ)) ⎦. (9)
Theorem 2 implies that in the infinite population limit, if the random measure
of the states of the minor players is θ(t) at time t, then θ(t + 1) should be
generated as
θ(t + 1) = θ(t)⋅Q∗(x0(t), θ(t)),
where Q∗ is given by (9). Recall that when we introduced the class Ψ for ψ,
we imposed a continuity requirement. By imposing (11), we implicitly require a
continuity property of Q∗ with respect to the variable θ.
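To see how the population update and the consistency requirement interact, here is a sketch with a hypothetical θ-dependent two-state kernel standing in for (9); iterating the update drives θ to a fixed point that reproduces itself.

```python
def Q_star(theta):
    # hypothetical stand-in for (9): row l is the transition law of a minor
    # player in state l; the first row varies continuously with theta
    p = 0.6 + 0.3 * theta[0]
    return [[p, 1.0 - p],
            [0.2, 0.8]]

def step(theta):
    # the population update: theta(t+1) = theta(t) * Q*(x0, theta(t))
    Q = Q_star(theta)
    return [theta[0] * Q[0][k] + theta[1] * Q[1][k] for k in range(2)]

theta = [0.5, 0.5]
for _ in range(200):
    theta = step(theta)

# at the limit, theta reproduces itself: the consistency requirement on psi
assert max(abs(a - b) for a, b in zip(theta, step(theta))) < 1e-9
print(theta)
```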
Combining the solutions to Problems (P0) and (P1) and the consistency
requirement, we write the so-called mean field equation system
In the above, we use xi to denote the state of the generic minor player. Note that
only a single generic minor player appears in this mean field equation system.
Definition 1. We call (π̂0 , π̂, ψ(x0 , θ)) a consistent solution to the mean field
equation system (12)-(15) if π̂0 solves (13) and π̂ solves (14) and if the constraint
(15) is satisfied. ♦
Then, following the method for (8), we may estimate I^(N)(1). By the consistency
condition (11), we further obtain
Carrying out the estimates recursively, we obtain the desired result for each
fixed t.
For j = 0, ..., N , denote u−j = (u0 , u1 , ..., uj−1 , uj+1 , ..., uN ).
Definition 2. A set of strategies u_j ∈ U_j, 0 ≤ j ≤ N, for the N + 1 players is called an ε-Nash equilibrium with respect to the costs J_j, 0 ≤ j ≤ N, where ε ≥ 0, if for any j, 0 ≤ j ≤ N, we have J_j(u_j, u_−j) ≤ J_j(u′_j, u_−j) + ε when any alternative u′_j is applied by player A_j. ♦
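For a finite-strategy illustration of Definition 2, the smallest ε for which a profile is an ε-Nash equilibrium is the largest amount any single player can save by a unilateral deviation. The two-player game below is hypothetical, not from the paper:

```python
def nash_epsilon(costs, strategy_sets, profile):
    """Smallest eps making `profile` an eps-Nash equilibrium (Definition 2):
    the largest one-player gain from a unilateral deviation."""
    eps = 0.0
    for j, S in enumerate(strategy_sets):
        current = costs(j, profile)
        best = min(costs(j, profile[:j] + (s,) + profile[j + 1:]) for s in S)
        eps = max(eps, current - best)
    return eps

# Hypothetical coordination game: cost 1 for mismatching, plus a fee of 0.1
# charged to player 0 whenever it plays strategy 1.
def costs(j, profile):
    mismatch = 0.0 if profile[0] == profile[1] else 1.0
    fee = 0.1 if (j == 0 and profile[0] == 1) else 0.0
    return mismatch + fee

S = [(0, 1), (0, 1)]
print(nash_epsilon(costs, S, (0, 0)))  # 0.0: an exact Nash equilibrium
print(nash_epsilon(costs, S, (0, 1)))  # 1.0: far from equilibrium
```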
Games with Mixed Players 149
Theorem 4. Assume the conditions in Theorem 3 hold. Then the set of strategies û_j, 0 ≤ j ≤ N, for the N + 1 players is an ε_N-Nash equilibrium, i.e., for 0 ≤ j ≤ N,
Proof. The theorem may be proven by following the usual argument in our pre-
vious work [12,10]. First, by using Theorem 3, we may approximate I (N ) (t) in
the original game by θ(t). Then the optimization problems of the major player
and any minor player are approximated by Problems (P0) and (P1), respec-
tively. Finally, it is seen that each player can gain little if it deviates from the
decentralized strategy determined from the mean field equation system.
References
1. Adlakha, S., Johari, R., Weintraub, G., Goldsmith, A.: Oblivious equilibrium for
large-scale stochastic games with unbounded costs. In: Proc. IEEE CDC 2008,
Cancun, Mexico, pp. 5531–5538 (December 2008)
2. Al-Najjar, N.I.: Aggregation and the law of large numbers in large economies.
Games and Economic Behavior 47(1), 1–35 (2004)
3. Buckdahn, R., Cardaliaguet, P., Quincampoix, M.: Some recent aspects of differ-
ential game theory. Dynamic Games and Appl. 1(1), 74–114 (2011)
4. Dogbé, C.: Modeling crowd dynamics by the mean field limit approach. Math.
Computer Modelling 52, 1506–1520 (2010)
5. Gast, N., Gaujal, B., Le Boudec, J.-Y.: Mean field for Markov decision processes:
from discrete to continuous optimization (2010) (Preprint)
6. Galil, Z.: The nucleolus in games with major and minor players. Internat. J. Game
Theory 3, 129–140 (1974)
7. Gomes, D.A., Mohr, J., Souza, R.R.: Discrete time, finite state space mean field
games. J. Math. Pures Appl. 93, 308–328 (2010)
8. Haimanko, O.: Nonsymmetric values of nonatomic and mixed games. Math. Oper.
Res. 25, 591–605 (2000)
9. Hart, S.: Values of mixed games. Internat. J. Game Theory 2, 69–86 (1973)
10. Huang, M.: Large-population LQG games involving a major player: the Nash cer-
tainty equivalence principle. SIAM J. Control Optim. 48(5), 3318–3353 (2010)
11. Huang, M., Caines, P.E., Malhamé, R.P.: Individual and mass behaviour in large
population stochastic wireless power control problems: centralized and Nash equi-
librium solutions. In: Proc. 42nd IEEE CDC, Maui, HI, pp. 98–103 (December
2003)
12. Huang, M., Caines, P.E., Malhamé, R.P.: Large-population cost-coupled LQG
problems with nonuniform agents: individual-mass behavior and decentralized ε-
Nash equilibria. IEEE Trans. Autom. Control 52(9), 1560–1571 (2007)
13. Huang, M., Caines, P.E., Malhamé, R.P.: The NCE (mean field) principle with
locality dependent cost interactions. IEEE Trans. Autom. Control 55(12), 2799–
2805 (2010)
14. Huang, M., Caines, P.E., Malhamé, R.P.: Social optima in mean field LQG control:
centralized and decentralized strategies. IEEE Trans. Autom. Control (in press,
2012)
15. Huang, M., Malhamé, R.P., Caines, P.E.: On a class of large-scale cost-coupled
Markov games with applications to decentralized power control. In: Proc. 43rd
IEEE CDC, Paradise Island, Bahamas, pp. 2830–2835 (December 2004)
16. Huang, M., Malhamé, R.P., Caines, P.E.: Nash equilibria for large-population linear
stochastic systems of weakly coupled agents. In: Boukas, E.K., Malhamé, R.P.
(eds.) Analysis, Control and Optimization of Complex Dynamic Systems, pp. 215–
252. Springer, New York (2005)
17. Jovanovic, B., Rosenthal, R.W.: Anonymous sequential games. Journal of Mathe-
matical Economics 17, 77–87 (1988)
18. Lasry, J.-M., Lions, P.-L.: Mean field games. Japan. J. Math. 2(1), 229–260 (2007)
19. Li, T., Zhang, J.-F.: Asymptotically optimal decentralized control for large popu-
lation stochastic multiagent systems. IEEE Trans. Automat. Control 53(7), 1643–
1660 (2008)
20. Ma, Z., Callaway, D., Hiskens, I.: Decentralized charging control for large popula-
tions of plug-in electric vehicles. IEEE Trans. Control Systems Technol. (to appear,
2012)
21. Nguyen, S.L., Huang, M.: Mean field LQG games with a major player: continuum-
parameters for minor players. In: Proc. 50th IEEE CDC, Orlando, FL, pp. 1012–
1017 (December 2011)
22. Nourian, M., Malhamé, R.P., Huang, M., Caines, P.E.: Mean field (NCE) formulation of estimation based leader-follower collective dynamics. Internat. J. Robotics Automat. 26(1), 120–129 (2011)
23. Tembine, H., Le Boudec, J.-Y., El-Azouzi, R., Altman, E.: Mean field asymptotics
of Markov decision evolutionary games and teams. In: Proc. International Confer-
ence on Game Theory for Networks, Istanbul, Turkey, pp. 140–150 (May 2009)
24. Tembine, H., Zhu, Q., Basar, T.: Risk-sensitive mean-field stochastic differential
games. In: Proc. 18th IFAC World Congress, Milan, Italy (August 2011)
25. Wang, B.-C., Zhang, J.-F.: Distributed control of multi-agent systems with random
parameters and a major agent (2012) (Preprint)
26. Weintraub, G.Y., Benkard, C.L., Van Roy, B.: Markov perfect industry dynamics
with many firms. Econometrica 76(6), 1375–1411 (2008)
27. Yin, H., Mehta, P.G., Meyn, S.P., Shanbhag, U.V.: Synchronization of coupled
oscillators is a game. IEEE Trans. Autom. Control 57(4), 920–935 (2012)
28. Yong, J., Zhou, X.Y.: Stochastic Controls: Hamiltonian Systems and HJB Equa-
tions. Springer, New York (1999)
Network Formation Game for Interference
Minimization Routing in Cognitive Radio Mesh
Networks
1 Introduction
Cognitive radio (CR) is a revolutionary technology that allows secondary users
(SUs) to occupy the idle licensed spectrum holes left by the primary users (PUs)
[1]. CR-based wireless mesh networks (WMNs) are dynamically self-organized and self-configured, and the SUs (wireless mesh routers) have the capability to automatically establish and maintain the mesh connections among themselves while avoiding interference to the PUs [2–5].
Although there has been some work investigating routing problems in CR networks, few studies in the literature consider the aggregate interference to the PUs from a large number of SUs transmitting at the same time. Game-theoretic approaches have also been less investigated for routing problems in CR networks. In this paper, we focus on the development of routing algorithms for CR-WMNs to minimize the aggregate interference from the SUs to the PUs. Note that we are not considering the interference between different secondary nodes or between multiple paths, which has been well investigated in the idea
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 152–162, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
Network Formation Game 153
of interference aware routing [6]. Instead, we are studying the aggregate inter-
ference from multiple SUs to the PUs in the CR networks. In CR-WMNs, the
secondary mesh nodes equipped with CR functionalities must be out of PUs’
footprint to avoid interference to the PUs, as long as they want to use the same
channels as the PUs’. Although the interference from a single SU (that is outside
the primary users’ footprint) is small, the aggregated interference from a large
number of SUs transmitting at the same time can be significant, and the performance of the PUs can be greatly influenced by this aggregate interference. We formulate the routing problem to minimize the aggregate interference from the SUs to the PUs. We develop a distributed algorithm using the network formation game framework and a myopic distributed algorithm [7]. The simulation results show that the proposed distributed algorithm produces better routes in terms of interference to the PUs than Dijkstra's algorithm, and that the distributed solution is near-optimal compared with the upper bound.
The remainder of this paper is organized as follows. In Section 2, the CR-
WMN model is introduced. In Section 3, we provide the formulation of the
distributed routing algorithm. Section 4 presents the simulation results, and
Section 5 concludes the paper.
In CR-WMNs, the wireless routers work as the SUs, which have the capabilities
to sense the spectrum and access the idle spectrum holes left by the PUs. The SUs
can employ the spectrum sensing techniques, such as radio identification based
154 Z. Yuan, J.B. Song, and Z. Han
We need to define the strategy for each player in the game. The strategy of SU
i is to select the link that it wants to form from its strategy space, which can be
defined as the SUs in N that SU i is able to and wants to connect to. We want to
set a rule that player i cannot connect to player j which is already connected to i.
This means that if a link (j, i) ∈ G, then link (i, j) cannot be in G. Formally, for a current network graph G, let A_i = {j ∈ N∖{i} | (j, i) ∈ G} be the set of nodes from which node i accepted a link (j, i), and S_i = {(i, j) | j ∈ N∖({i} ∪ A_i)} the set of links corresponding to the nodes with whom node i wants to connect. Consequently, the strategy of player i is to select the link s_i ∈ S_i that it wants to form by choosing the player that it wants to connect to.
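The rule above can be sketched directly; the graph and node labels below are hypothetical:

```python
def strategy_space(i, nodes, links):
    """The set S_i of links node i may propose: per the rule above, node i
    cannot propose (i, j) when (j, i) is already in the current graph G."""
    accepted_from = {j for (j, k) in links if k == i}   # the set A_i
    return [(i, j) for j in nodes if j != i and j not in accepted_from]

nodes = [1, 2, 3, 4]
links = {(2, 1), (3, 4)}                 # hypothetical current graph G
print(strategy_space(1, nodes, links))   # [(1, 3), (1, 4)]: 2 excluded since (2, 1) is in G
```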
3.2 Utility
The players try to make decisions for utility maximization. Given a network
graph G and a selected strategy si for any player i ∈ N , the utility of player i
can be expressed as
u_i(G) = −B_{e1} (f_{i,nexthop} / c_{i,nexthop}) B_{e2} × TI_i,   (2)

where B_{e1} and B_{e2} are the barrier functions, TI_i is node i's interference temperature, f_{i,nexthop} is the flow on the edge between node i and its next hop, and c_{i,nexthop} represents the capacity of the same edge.
We know that the flow on each edge should be smaller than the link capacity, which means f_e ≤ c_e, ∀e ∈ E. In addition, the outgoing flow should be equal to the sum of the incoming flow and the generated traffic. Therefore, we have the flow conservation constraint l_j + Σ_{e=(i,j)∈E} f_e = Σ_{e=(j,i)∈E} f_e, where l_j represents the generated traffic of secondary node j. We assume that l_j consists of only generated traffic if there is no incoming traffic from the wired Internet. Therefore, the barrier functions that account for the above two constraints can be defined as

B_{e1} = ( 1 / (1 − f_e/c_e + ε_1) )^{κ_1},   (3)

and

B_{e2} = ( 1 / (1 − (l_j + Σ_{e=(i,j)∈E} f_e) / (Σ_{e=(j,i)∈E} f_e) + ε_2) )^{κ_2},   (4)

where ε_1 and ε_2 are two small dummy constants that keep the denominators from reaching zero, and κ_1 and κ_2 are set greater than 0 in order to weight the different constraints. When a constraint is close to being violated, the value of the corresponding barrier function becomes large. Therefore, in the proposed utility function, the barrier functions protect the PUs from interference by ensuring that the two constraints are satisfied.
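A direct transcription of (3) and (4), using the parameter values from the simulation section (ε = 1.5, κ = 0.01); the flow numbers are hypothetical:

```python
def barrier1(f_e, c_e, kappa1=0.01, eps1=1.5):
    """B_e1 in (3): grows as the flow f_e on an edge approaches its capacity c_e."""
    return (1.0 / (1.0 - f_e / c_e + eps1)) ** kappa1

def barrier2(l_j, f_in, f_out, kappa2=0.01, eps2=1.5):
    """B_e2 in (4): grows as the flow conservation l_j + f_in = f_out at node j
    is violated (f_in, f_out are the summed incoming/outgoing flows)."""
    return (1.0 / (1.0 - (l_j + f_in) / f_out + eps2)) ** kappa2

b_light = barrier1(f_e=10.0, c_e=48.0)   # lightly loaded link
b_full = barrier1(f_e=47.9, c_e=48.0)    # nearly saturated link
print(b_light, b_full)                   # the barrier rises toward saturation
```

With the paper's small κ the barriers change slowly, which matches the stated motivation for choosing small κ values.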
Fig. 2. A simulation result showing the network routing using distributed algorithm
in a 250-by-250 meter area
means that the nodes are in a Nash equilibrium. Consequently, we have u_i(G_{s*_i, s_−i}) ≥ u_i(G_{s_i, s_−i}), ∀s_i ∈ S̆_i, for any i ∈ N.
Theorem 1. In the game with finitely many nodes, there exists a Nash network G∗.
After solving the network formation algorithm and obtaining the whole network
topology, the source node may have several route choices to the destination, as defined in Convention 1. However, if we select a route that is very far away from the primary users, which may yield significantly lower interference to the primary users, we may incur a large delay along this route. Therefore, we need a tradeoff
between the cumulative delay and the aggregate interference. In order to make
sure that the interference to the PUs is low enough without increasing much
delay, we will select a route based on the constraint:
Dtotal ≤ Dτ , (5)
where Dtotal represents the total delay along the route, and Dτ is the threshold.
Note that for different source and destination pairs, we may have different values
for the delay threshold. Given the constraint in Eq. (5), the source will then select
the route with the lowest aggregate interference to the PUs.
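The selection rule can be sketched as follows; the routes are hypothetical (delay, aggregate interference) pairs:

```python
def select_route(candidates, delay_threshold):
    """Among candidate routes satisfying D_total <= D_tau in (5), return the
    one with the lowest aggregate interference to the PUs (None if infeasible)."""
    feasible = [r for r in candidates if r[0] <= delay_threshold]
    return min(feasible, key=lambda r: r[1]) if feasible else None

# (delay in hops, interference temperature) for three hypothetical routes.
routes = [(3, 1.62), (5, 1.10), (9, 0.40)]
print(select_route(routes, delay_threshold=6))    # lowest interference among feasible routes
print(select_route(routes, delay_threshold=10))   # relaxing D_tau admits a cleaner route
```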
Fig. 3. Interference to the primary users versus the number of secondary nodes (20–60) for the network formation algorithm, Dijkstra's algorithm, and the centralized algorithm.
We assume that link capacities only depend on the distance between the two
players to simplify the problem. The data rate is 48 Mbps within the distance of
32m , 36 Mbps within 37m, 24 Mbps within 45m, 18 Mbps within 60m, 12 Mbps
within 69m, 9 Mbps within 77m, and 6 Mbps within 90m [11]. The maximum
interference range RI is 180m, and the maximum transmission range RT is 90m.
The number of nodes in the network may change, and we consider random topologies for the simulation. We generate a data set of 1,000 random topologies; for each one, the traffic generated by each node and the location of the gateway are randomly chosen.
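The distance-dependent capacity model above is a step function and can be written as a lookup:

```python
# (max distance in meters, rate in Mbps), from the simulation setup [11].
RATE_STEPS = [(32, 48), (37, 36), (45, 24), (60, 18), (69, 12), (77, 9), (90, 6)]

def link_rate_mbps(distance_m):
    """Link capacity between two players as a function of distance; zero
    beyond the maximum transmission range R_T = 90 m."""
    for max_dist, rate in RATE_STEPS:
        if distance_m <= max_dist:
            return rate
    return 0

print(link_rate_mbps(30), link_rate_mbps(65), link_rate_mbps(120))  # 48 12 0
```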
Fig. 2 shows the simulation results for the proposed distributed routing al-
gorithm. We use a random priority in the fair prioritization phase for a general
case. The big dot represents a PU with the sector area as the PU’s footprint.
The other dots are 50 SUs and those SUs that are inside the PU’s footprint
are forced to turn off because the spectrum is occupied by the PU. We also
define the source and destination nodes in Fig. 2. After applying the proposed
distributed interference minimization routing algorithm, we obtain the route shown by the dashed arrows. If we use Dijkstra's shortest path algorithm, which does not consider the aggregate interference to the PU, the solid route is obtained. The interference temperature values at the primary user are 1.3354 for the dashed route and 1.6195 for the solid route. Clearly, the solid route produces higher interference to the PU than the dashed route, since the nodes in the solid route are closer to the PU.
Now we compare the performance between the proposed distributed algorithm
and the upper bound. The upper bound can be achieved using the centralized
routing algorithm proposed in [12]. Fig. 3 shows the simulation results of the interference comparison for different numbers of SUs. ε_1 and ε_2 are both set to 1.5, and κ_1 and κ_2 are 0.01; we choose small κ values to keep the cost function from changing too fast. The delay threshold is set to twice the delay obtained with Dijkstra's algorithm. The solid line represents the simulated performance of the distributed network formation algorithm. The dashed line is the centralized
Fig. 4. Normalized delay versus the number of secondary nodes for Dijkstra's algorithm, the network formation algorithm, and the centralized algorithm.

Fig. 5. Normalized interference to the primary users versus the number of secondary nodes (20–60) for delay thresholds D_τ1 = 4D_s, D_τ2 = 3D_s, and D_τ3 = 2D_s.
solution, and it performs better than the distributed approach as expected. The
distributed solution is near-optimal compared with the centralized interference minimization solution, producing about 1.0098 times the interference of the centralized algorithm. This means that it is 99.02% efficient compared to the upper bound. The black dashed line is the result of using Dijkstra's algorithm without considering the aggregate interference to the PUs; it produces the highest interference among the three solutions. Moreover, as the number of SUs increases, the interference to the PUs increases in Fig. 3. Note that
the reason that we only compare the proposed algorithms with the Dijkstra’s
shortest path algorithm is that most other existing routing algorithms for CR
networks do not consider the aggregate interference to the PUs.
Fig. 4 shows the comparison of delay between the proposed distributed algorithm and the upper bound. For simplicity, the delay is defined as the number of hops. We can see that as the distance between SUs increases, the total delay increases, which is consistent with the results in Fig. 3. In addition, the centralized algorithm provides slightly higher delay than the distributed algorithm.
5 Conclusion
In this paper, we develop a distributed routing algorithm using network forma-
tion game in CR-WMNs. In CR-WMNs, although the interference from a single
SU to the PUs is small, aggregate interference from a large number of SUs that
are transmitting at the same time can be significant, which will influence the
PUs’ performance. Therefore, we develop a distributed routing algorithm using
the network formation game framework to minimize the aggregate interference to
the PUs, which is practically implementable. Simulation results show that the proposed scheme finds better routes in terms of interference to the PUs than the shortest path scheme. We also compare the performance of the distributed optimization algorithm with an upper bound and validate its efficiency; the distributed solution is near-optimal compared to the centralized solution, providing 99.02% of the upper bound's efficiency.
References
1. Hossain, E., Niyato, D., Han, Z.: Dynamic Spectrum Access in Cognitive Radio
Networks. Cambridge University Press, UK (2009)
2. Chowdhury, K.R., Akyildiz, I.F.: Cognitive Wireless Mesh Networks with Dynamic
Spectrum Access. IEEE Journal on Selected Areas in Communications 26(1), 168–
181 (2008)
3. Ileri, O., Samardzija, D., Sizer, T., Mandayam, N.B.: Demand Responsive Pricing
and Competitive Spectrum Allocation Via a Spectrum Server. In: Proc. IEEE
Symposium on New Frontiers in Dynamic Spectrum Access Networks, Baltimore,
MD, US, November 8-11, pp. 194–202 (2005)
4. Etkin, R., Parekh, A., Tse, D.: Spectrum Sharing For Unlicensed Bands. In: Proc.
IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks, Bal-
timore, MD, US, November 8-11, pp. 251–258 (2005)
5. Kim, D.I., Le, L.B., Hossain, E.: Joint Rate and Power Allocation for Cognitive
Radios in Dynamic Spectrum Access Environment. IEEE Transactions on Wireless
Communications 7(12), 5517–5527 (2008)
6. Parissidis, G., Karaliopoulos, M., Spyropoulos, T., Plattner, B.: Interference-Aware
Routing in Wireless Multihop Networks. IEEE Transactions on Mobile Comput-
ing 10(5), 716–733 (2011)
7. Saad, W., Han, Z., Debbah, M., Hjorungnes, A., Basar, T.: A Game-Based Self-
Organizing Uplink Tree for VoIP Services in IEEE 802.16j Networks. In: Proc.
IEEE International Conference on Communications, Dresden, Germany (June
2009)
8. Clancy, T.C.: Achievable Capacity Under the Interference Temperature Model. In:
Proc. IEEE International Conference on Computer Communications, Anchorage,
AK, US, pp. 794–802 (May 2007)
9. Yucek, T., Arslan, H.: A Survey of Spectrum Sensing Algorithms for Cognitive Ra-
dio Applications. IEEE Communications Surveys and Tutorials 11, 116–130 (2009)
10. Han, Z., Liu, K.J.R.: Resource Allocation For Wireless Networks: Basics, Tech-
niques, and Applications. Cambridge University Press, UK (2008)
11. IEEE 802.11: Wireless LAN Medium Access Control (MAC) and Physical Layer
(PHY) Specifications
12. Yuan, Z., Song, J.B., Han, Z.: Interference Minimization Routing and Scheduling
in Cognitive Radio Wireless Mesh Networks. In: IEEE Wireless Communications
and Networking Conference, Sydney, Australia, pp. 1–6 (April 2010)
Noncooperative Games for Autonomous
Consumer Load Balancing over Smart Grid
1 Introduction
In the traditional power market, electricity consumers usually pay a fixed retail
price for their electricity usage. This price only changes on a seasonal or yearly
basis. However, it has been long recognized in the economics community that
charging consumers a flat rate for electricity creates allocative inefficiencies, i.e.,
consumers do not pay equilibrium prices according to their consumption levels
[1]. This was shown through an example in [2], which illustrates how flat pricing
causes deadweight loss at off-peak times and excessive demand at the peak times.
The latter may lead to small-scale blackouts in the short run and excessive capacity
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 163–175, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
164 T. Agarwal and S. Cui
buildup over the long run. As a solution, variable-rate metering that reflects the real-time cost of power generation can be used to influence consumers to defer their power consumption away from peak times. The reduced peak load can significantly reduce the need for expensive backup generation during peak times and for excess generation capacity.
The main technical hurdle in implementing real-time pricing has been the
lack of cost-effective two-way smart metering, which can communicate real-time
prices to consumers and their consumption levels back to the energy provider.
In addition, the claim of social benefits from real-time pricing also assumes that
the consumer demand is elastic and responds to price changes, while traditional consumers do not possess the equipment that enables them to quickly alter their demands according to the changing power prices. Significant research efforts
on real-time pricing have involved estimating the consumer demand elasticity
and the level of benefits that real time pricing can achieve [1, 3, 4]. Fortunately,
the above requirements on smart metering and consumer adaptability are being
fulfilled [5] as technology advances in cyber-enabled metering, power generation, power storage, and manufacturing automation, which is driven by the need for a
Smart Grid.
Such real-time pricing dynamics have been studied in the literature mainly
with game theory [6–8]. In particular, the authors in [6] provided a design mech-
anism with revelation principle to determine the optimal amount of incentive
that is needed for the customers to be willing to enter a contract with the utility
and accept power curtailment during peak periods. However, they only considered a fixed pricing scheme. In [7], the authors studied games among consumers under a certain class of demand profiles, with a price that is a function of the day-long aggregate cost of the global electricity load of all consumers. However, the case with
real-time prices was not investigated in [7]. In [8], a noncooperative game was
studied to tackle the real-time pricing problem, where the solution was obtained
by exploring the relationship with the congestion games and potential games.
However, the pricing schemes that we study are not amenable to transformations
described in [8].
In this paper we formulate noncooperative games [9,10] among the consumers
with two real-time pricing schemes under more general load profiles and revenue
models. The first pricing scheme charges a price according to the instantaneous
average cost of electricity production and the second one charges according to
a time-varying version of increasing-block price [11]. We investigate consumer
demands at the Nash equilibrium operation points for their uniqueness and load
balancing properties. Furthermore, two revenue models are considered for each
of the schemes, and we show that both pricing schemes lead to similar electricity
loading patterns when consumers are interested only in the minimization of
electricity costs. Finally we discuss the conditions under which the increasing-
block pricing scheme is preferred over the average-cost based pricing scheme.
The rest of the paper is organized as follows. The system model and for-
mulation of the noncooperative game are presented in Section 2. The game is
analyzed with different real-time pricing schemes under different revenue models
Noncooperative Games for Autonomous Consumer Load Balancing 165
Fig. 1. A hypothetical marginal cost of supply and the corresponding total cost curve
as seen by the retailer in the wholesale market within a single time slot. Supply is from
five different sources: hydroelectric, nuclear, coal, natural gas, and oil. Two different
generators may use different technologies for power generation thus incurring different
marginal costs with the same fuel (e.g., the two different cost levels for oil in Fig. 1(a)).
a time-slotted pattern at the beginning of the day (which contains T time slots).
These consumers are selfish and aim to maximize their individual utility/payoff
functions; hence they do not cooperate with each other to manage their demands.
Each consumer i has a minimum total daily requirement of energy, β_i ≥ 0, which is split over the T time slots. Let x_t^i denote the ith consumer's demand in the tth time slot. A consumer can demand any value x_t^i ≥ 0 (non-negativity constraint) with Σ_t x_t^i ≥ β_i (demand constraint). Let x^i = {x_1^i, x_2^i, . . . , x_t^i, . . . , x_T^i} represent the ith consumer's demand vector, which is called the strategy for the ith consumer. Let x_t = {x_t^1, . . . , x_t^N} represent the demand vector from all consumers in time slot t, with x_t = Σ_i x_t^i. Let x represent the set {x^1, . . . , x^N}.
The payoff or utility for consumer i is denoted by π^i, which is the difference between the total revenue it generates from the purchased electricity and its cost. In particular, let E_t^i, a function of x_t^i, represent the revenue generated by the ith consumer in the tth time slot, and M_t^i, a function of x_t, represent its payment to the retailer for purchasing x_t^i. Then the payoff π^i, to be maximized by consumer i, is given by

π^i = Σ_{t∈{1,...,T}} (E_t^i − M_t^i).
positive, non-decreasing, and convex. Briefly, we note that as the retailer capacity
is constrained by a predetermined upper limit U , we model this constraint as
C(xt ) = ∞, ∀xt > U ; obviously xit ≤ U is an implicit constraint on the demand
xit for any rational consumer.
The second scheme is a time-variant version of the increasing-block pric-
ing scheme [11]. With a typical increasing-block pricing scheme, consumer i
is charged a certain rate b1 for its first z1 units consumed, then charged rate
b2 (> b1 ) for additional z2 units, and charged rate b3 (> b2 ) for additional z3
units, and so on. The b’s and z’s describe the marginal cost price for the com-
modity. In our scheme we design a marginal cost function, which retains the
increasing nature of increasing-block pricing, such that it depends on xt and the
function C(·). Consumer i pays an amount determined by the marginal cost function M(x, x_t), applicable to all consumers at time slot t. In particular, consumer i pays

M_t^i = ∫_0^{x_t^i} M(x, x_t) dx,   (3)

such that Σ_i M_t^i = C(x_t) is satisfied. An intuition behind this pricing scheme is to penalize consumers with relatively larger demands. Note that in this case, x_t^i ≤ U is implicitly assumed by letting C(·) = ∞ ∀x_t^i > U and hence M_t^i = ∞ ∀x_t^i > U.
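The payment rule (3) integrates the common marginal price up to the consumer's own demand, so later units cost more. A numerical sketch with a hypothetical increasing marginal function (the paper's M(x, x_t) also depends on x_t and C(·)):

```python
def block_payment(x_i, marginal, n=10_000):
    """M_t^i in (3): integral of the marginal price from 0 to x_i, by the
    midpoint rule (exact for the linear marginal used below)."""
    h = x_i / n
    return sum(marginal((k + 0.5) * h) for k in range(n)) * h

# Hypothetical increasing marginal price: base rate plus a linear ramp,
# penalizing relatively larger demands.
marginal = lambda x: 1.0 + 0.5 * x
first = block_payment(2.0, marginal)   # first 2 units:  2 + 0.5*4/2  = 3.0
whole = block_payment(4.0, marginal)   # all 4 units:    4 + 0.5*16/2 = 8.0
print(first, whole)                    # the last 2 units cost 5.0 > 3.0
```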
For each of the two pricing schemes, we study two different revenue models.
For the first one we set Eti as zero for all consumers over all time slots, which
leads to payoff maximization being the same as cost minimization from the point
of view of the consumers. For the second one we assign consumer i a constant
revenue rate φit at each time slot t, which gives Eti = φit xit and leads to payoff
maximization being the same as profit maximization.
For the average-cost pricing, the payment to the retailer in slot t by consumer i is given by (2). In this case the revenue is set to zero, E_t^i = 0, which results in payoff maximization being the same as cost minimization for each consumer. Specifically, the payoff for consumer i is given by π^i = −Σ_t M_t^i. The consumer load balancing
0 ≤ x_t^i, ∀t.
As the cost to the retailer becomes infinite whenever the total demand goes beyond the capacity threshold for the wholesale market, i.e., when C(x_t) = ∞ ∀x_t > U, the price to consumers becomes infinite and their payoff goes to negative infinity. Thus any consumer facing an infinite cost at a particular time slot can manipulate its demand vector such that the cost becomes finite, which is always feasible under the assumption that the sum load demand over all time slots is less than the sum supply availability. This implies that, at the Nash equilibrium, the sum demand x_t will be less than the capacity threshold U, ∀t, which allows for a redundant constraint x_t^i ≤ U, ∀i, t, as x_t^i ≤ Σ_i x_t^i = x_t ≤ U. Such a redundant but explicit constraint in turn makes the feasible region for x, denoted by X, finite and hence compact. The compactness property is utilized in applying Kakutani's fixed-point theorem [13], which in turn is required to show the existence of the NEP solution.
By the results in [14], we can show that an NEP strategy exists for all agents with the cost function used here; i.e., the NEP solution exists for the proposed noncooperative consumer load balancing game.
On the other hand, the cost function Mti does not satisfy the conditions for
being a type-A function, defined in [14]. Therefore, the corresponding uniqueness
result in [14] cannot be extended to our formulation. In [15] we show that our
problem is equivalent to an atomic flow game [16] with splittable flows and
different player types (i.e., each player controls a different amount of total flow)
over a generalized nearly-parallel graph, which has strictly semi-convex, non-
negative, and non-decreasing functions for cost per unit flow. By the results
of [16], we can prove that the NEP solution for the load balancing game is
unique [15].
In the following, we discuss the properties for the unique NEP solution for
the proposed load balancing game.
Lemma 1. With the average-cost based pricing and zero revenue, at the Nash
equilibrium the price of electricity faced by all consumers is the same over all
time slots.
Lemma 2. If C(·) is strictly convex, then at the Nash equilibrium the sum of demands on the system, x_t, remains the same across different time slots.
The proof is provided in [15].
Lemma 3. If C(·) is strictly convex, at Nash equilibrium, each consumer will
distribute its demands equally over the T time slots.
The proof is provided in [15].
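Lemma 3 can be checked numerically with a simple transfer dynamics (not the paper's proof technique), under the assumed strictly convex cost C(x) = x², for which the average price is A(x_t) = x_t and player i's marginal cost in slot t is x_t + x_t^i:

```python
def transfer_dynamics(demand, steps=20_000, delta=1e-3):
    """Each player repeatedly shifts `delta` demand from its most expensive
    loaded slot to the cheapest slot (marginal cost x_t + x_t^i when the
    total cost is C(x) = x^2). demand[i][t] is player i's demand in slot t;
    every total beta_i is conserved by construction."""
    N, T = len(demand), len(demand[0])
    for _ in range(steps):
        for i in range(N):
            tot = [sum(demand[j][t] for j in range(N)) for t in range(T)]
            marg = [tot[t] + demand[i][t] for t in range(T)]
            hi = max((t for t in range(T) if demand[i][t] >= delta),
                     key=lambda t: marg[t])
            lo = min(range(T), key=lambda t: marg[t])
            if marg[hi] > marg[lo]:
                demand[i][hi] -= delta
                demand[i][lo] += delta
    return demand

# Two players, T = 3 slots, unequal starting splits (beta = 6 and 3).
x = transfer_dynamics([[6.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
print([[round(v, 2) for v in row] for row in x])  # each row approaches a uniform split
```

The dynamics settles where every player's marginals are equalized, which for this cost forces equal slot totals (Lemma 2) and a uniform split per consumer (Lemma 3).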
Remark: Under the average-cost based pricing scheme with zero revenue, if
one particular consumer increases its total demand of electricity, the price A(·)
increases, which in turn increases the payments for all other consumers as well.
Theoretically, one consumer may cause unbounded increases in the payments of all others; in this sense the scheme does not protect the group from the reckless actions of some consumer(s). This issue will be addressed by our second pricing scheme, as we will show in Section 4.
0 ≤ x_t^i, ∀t.
We assume that βi = 0, ∀i, and the rate of revenue is larger than the price of
electricity such that we do not end up with any negative payoff or the trivial
solution xit = 0, ∀i, t.
Here again, if the sum demand in a given time slot t exceeds the retailer's capacity threshold U, the consumers will face an infinite price for their consumption. This implies that at the Nash equilibrium the sum demand x_t will never exceed the capacity threshold U, as we assume that the sum load demand over all time slots is less than the sum supply availability. This again allows for the redundant constraint x_t^i ≤ U, ∀i, t, as x_t^i ≤ Σ_i x_t^i = x_t ≤ U, which in turn makes the feasible region for x, X, finite and hence compact.
The proof for the existence of NEP for this game under the given assumptions
is provided in [15].
Lemma 4. At the Nash equilibrium, the consumer(s) with the highest revenue rate (φ_t^i) within a time slot may be the only one(s) buying power in that time slot.
The proof is provided in [15]. Thus if consumer i has the maximum rate of revenue, either it is the only consumer buying non-zero power x_t^i such that φ_t^i = A(x_t^i), or φ_t^i < C′(0) and hence x_t^i = 0 in that time slot, which leads to a unique Nash equilibrium for the sub-game. If in a given time slot multiple consumers experience the same maximum rate of revenue, the sub-game turns into a Nash Demand Game [17] between the set of consumers given by {arg max_k φ_t^k}, which is well known to admit multiple Nash equilibria. Thus the overall noncooperative game has a unique Nash equilibrium if and only if, in each time slot, at most one consumer experiences the maximum rate of revenue.
As an example, if the demands from different consumers at time slot t are iden-
tical, i.e., if xit = xjt , ∀i, j, we have,
The consumer load balancing problem for each consumer i is given by the following optimization problem:
\[
\begin{aligned}
\text{maximize}\quad & \pi^i(x^i) = -\sum_t M_t^i \\
\text{subject to}\quad & M_t^i = \int_0^{x_t^i} M(x, x_t)\,dx,\ \forall t, \\
& \sum_t x_t^i \ge \beta_i, \\
& 0 \le x_t^i,\ \forall t.
\end{aligned}
\]
172 T. Agarwal and S. Cui
If the sum demand x_t in a time slot t exceeds U, the price of electricity for the consumer with the highest demand (indexed by ĵ) becomes infinite. As we retain the assumption that the sum load available over all time slots is greater than the sum load demand, consumer ĵ can rearrange its demand vector such that either the sum demand falls within the capacity threshold or consumer ĵ is no longer the highest-demand consumer (then the new consumer with the highest demand performs the same routine until the sum demand is under the threshold). This implies that, at the Nash equilibrium point, we have x_t ≤ U. Similarly, we again have the redundant constraint x_t^i ≤ U, ∀i, t, which in turn makes the feasible region X bounded and hence compact.
The proof for the existence of NEP for this game under the given assumptions
is provided in [15]. When each consumer tries to minimize its total cost while
satisfying its minimum daily energy requirement βi , we have the following result.
Lemma 5. If C(·) is strictly convex, the Nash equilibrium is unique and each
consumer distributes its demand uniformly over all time slots.
Remark: Notice that under the zero-revenue model, the NEP is the same with both increasing-block pricing and average-cost based pricing. For both cases, at the NEP, we have x_t^i = β_i/T, ∀i, t. However, even though the loading pattern is the same, the payments M_t^i made by the consumers will differ and, with increasing-block pricing, will likely be smaller for consumers with relatively lower consumption. In addition, with increasing-block pricing, the maximum payment M_t^i made by the i-th consumer for a given demand x_t^i is C(N x_t^i)/N, irrespective of what other consumers demand and consume. Thus increasing-block pricing addresses the issue faced under the average-cost based pricing and zero-revenue model, in which one consumer can increase its demand indefinitely and cause indefinite increases in the payments of all other consumers.
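The payment cap C(N x_t^i)/N can be sanity-checked numerically. The sketch below is our own: it assumes C(x) = x² and the worst case in which all N consumers demand the same amount, so that the marginal price consumer i sees at its own level u is C′(Nu) and its payment is ∫_0^{x_i} C′(Nu) du = C(N x_i)/N.

```python
# Worst-case payment under increasing-block pricing (our assumptions:
# C(x) = x^2, and all N consumers demanding the same amount, so the
# marginal price seen at own consumption level u is C'(N*u)).
N, x_i = 4, 3.0
C = lambda x: x ** 2
C_prime = lambda x: 2 * x

steps = 100_000
du = x_i / steps
# midpoint-rule integral of C'(N*u) over [0, x_i]
payment = sum(C_prime(N * (k + 0.5) * du) * du for k in range(steps))

cap = C(N * x_i) / N          # the bound claimed in the text: C(N x_i)/N
print(abs(payment - cap) < 1e-6)  # True
```

The cap depends only on the consumer's own demand, which is exactly the protection the remark contrasts with average-cost pricing.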
The consumer load balancing problem for consumer i is given by the following
optimization problem:
\[
\begin{aligned}
\text{maximize}\quad & \pi^i(x^i) = \sum_t \left( E_t^i - M_t^i \right) \\
\text{subject to}\quad & E_t^i = \varphi_t^i x_t^i,\ \forall t, \\
& M_t^i = \int_0^{x_t^i} M(x, x_t)\,dx,\ \forall t, \\
& \sum_t x_t^i \ge \beta_i, \\
& 0 \le x_t^i,\ \forall t.
\end{aligned}
\]
Here again, we assume β_i = 0, ∀i, to avoid any negative payoffs, and we can again adopt the redundant constraint x_t^i ≤ U, ∀i, t, which in turn makes the feasible region X bounded and hence compact.
The proof for the existence of NEP for this game under the given assump-
tions is provided in [15]. With the average-cost based pricing scheme under the
constant-rate revenue model, we see that in a given time slot, if a single con-
sumer enjoys the maximum rate of revenue, it will be the only consumer who
is able to purchase power. We show here that with the increasing-block pricing
scheme under constant-rate revenue model, the result is different.
For a given time slot t, consumer i has an incentive to increase its demand x_t^i as long as the payoff increases, i.e., ∂π^i/∂x_t^i > 0. Therefore, at the equilibrium the following holds for all consumers:
\[
\frac{\partial \pi^i}{\partial x_t^i} \le 0
\;\Rightarrow\;
\varphi_t^i \le \frac{\partial M_t^i}{\partial x_t^i} = M(x_t^i, x_t). \tag{4}
\]
Additionally, if φ_t^i < M(x_t^i, x_t), the payoff can be increased by reducing x_t^i. This implies that, if x_t^i > 0, at the equilibrium we have
\[
\varphi_t^i \ge M(x_t^i, x_t). \tag{5}
\]
[Fig. 2. Demand x_t^i versus the rate of revenue φ_t^i at equilibrium. Each dot represents a particular consumer i ∈ {1, . . . , 100}.]
Thus (4) and (5) together imply that, if x_t^i > 0, we have φ_t^i = M(x_t^i, x_t). Together, these give the following set of necessary conditions for equilibrium:
\[
\varphi_t^i \le M(x_t^i, x_t), \quad \text{with equality if } x_t^i > 0, \quad \forall i, t.
\]
5 Conclusion
References
1. Allcott, H.: Rethinking real time electricity pricing. CEEPR Working Paper 2009-015, MIT Center for Energy and Environmental Policy Research (October 2009), http://web.mit.edu/ceepr/www/publications/workingpapers/2009-015.pdf
2. Borenstein, S.: Time-varying retail electricity prices: Theory and practice. In: Griffin, J., Puller, S. (eds.) Electricity Deregulation: Choices and Challenges, pp. 317–357. University of Chicago Press, Chicago (2005)
3. Holland, S., Mansur, E.: The short-run effects of time-varying prices in competitive
electricity markets. The Energy Journal 27(4), 127–155 (2006)
4. Borenstein, S.: The long-run effects of real-time electricity pricing. CSEM Working
Paper 133, University of California Energy Institute, Berkeley (June 2004),
http://www.ucei.berkeley.edu/PDF/csemwp133.pdf
5. Faruqui, A., Hledik, R., Sergici, S.: Rethinking prices. Public Utilities Fortnightly 148(1), 30–39 (2010)
6. Fahrioglu, M., Alvarado, F.: Designing cost effective demand management contracts using game theory. In: IEEE Power Engineering Society 1999 Winter Meeting, vol. 1, pp. 427–432. IEEE (1999)
7. Caron, S., Kesidis, G.: Incentive-based energy consumption scheduling algorithms
for the smart grid. In: 2010 First IEEE International Conference on Smart Grid
Communications (SmartGridComm), pp. 391–396 (October 2010)
8. Ibars, C., Navarro, M., Giupponi, L.: Distributed demand management in smart
grid with a congestion game. In: 2010 First IEEE International Conference on
Smart Grid Communications (SmartGridComm), pp. 495–500 (October 2010)
9. Tirole, J.: The Theory of Industrial Organization. The MIT Press, Cambridge
(1988)
10. Başar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Society for Industrial and Applied Mathematics, Philadelphia (1999)
11. Borenstein, S.: Equity effects of increasing-block electricity pricing. CSEM Working
Paper 180, University of California Energy Institute, Berkeley (November 2008),
http://www.ucei.berkeley.edu/PDF/csemwp180.pdf
12. Lindeman, J.: EZ-101 Microeconomics. Barron’s Educational Series, Hauppauge
(2001)
13. Kakutani, S.: A generalization of Brouwer's fixed point theorem. Duke Mathematical Journal 8(3), 457–459 (1941)
14. Orda, A., Rom, R., Shimkin, N.: Competitive routing in multiuser communication
networks. IEEE/ACM Transactions on Networking (TON) 1(5), 510–521 (1993)
15. Agarwal, T., Cui, S.: Noncooperative Games for Autonomous Consumer Load Balancing over Smart Grid. ArXiv e-prints (April 2011), http://arxiv.org/abs/1104.3802
16. Bhaskar, U., Fleischer, L., Hoy, D., Huang, C.: Equilibria of atomic flow games are
not unique. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on
Discrete Algorithms, pp. 748–757 (2009)
17. Nash, J.: Two-person cooperative games. Econometrica 21(1), 128–140 (1953)
Optimal Contract Design for an Efficient
Secondary Spectrum Market
1 Introduction
The scarcity of spectrum resources and the desire to improve spectrum efficiency
have led to extensive research and development in recent years in such concepts
as dynamic spectrum access/sharing, open access, and secondary (spot or short-
term) spectrum market, see e.g., [1, 2].
One of the fundamental premises behind a secondary (and short-term) spec-
trum market is the existence of excess capacity due to the primary license holder’s
own spectrum under-utilization. However, this excess capacity is typically uncon-
trolled and random, both spatially and temporally, and strongly dependent on the
behavior of the primary users. One may be able to collect statistics and make pre-
dictions, as has been done in numerous spectrum usage studies [3–5], but it is fun-
damentally stochastic in nature. The primary license holder can of course choose
to eliminate the randomness by setting aside resources (e.g., bandwidth) exclu-
sively for secondary users. This will however likely impinge on its current users
and may not be in the interest of its primary business model.
The work is partially supported by the NSF under grants CIF-0910765 and CNS-
1217689, and the ARO under Grant W911NF-11-1-0532.
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 176–191, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
Optimal Contract Design for an Efficient Secondary Spectrum Market 177
where c (> 0) is a predetermined constant that accounts for the operating cost of the seller. If none of the contracts is accepted by the buyer, the reserve utility of the seller is defined as U(0, 0) = 0.
always walk away and purchase from this traditional market. This traditional
market will also be referred to as the reference market, and the service it sells
as the fixed or deterministic service/channel. Our model does allow a buyer to
purchase from both markets should there be a benefit.
1. Information is symmetric:
Under this assumption, the seller knows exactly the values q, b, ε of the buyer. The seller can thus extract all of the buyer's surplus (over the reserve price), resulting in C(x, p) = C(0, 0) at the optimal contract point.
2. Information is asymmetric:
Under this assumption, the seller can no longer exploit all of the buyer's surplus, resulting in a more complicated contract design process. We assume there are possibly K types of buyers, each having a different triple (q, b, ε). We further assume that the seller has a prior belief of the distribution of the buyer types; a buyer is of type i with probability r_i and has the triple (q_i, b_i, ε_i) as its private information. We will also assume that at most M different contracts are announced to the buyer.
In the symmetric information case, the seller can custom-design a contract for the buyer, subject to the constraint that it offers an incentive for the buyer to accept, referred to as the individual rationality (IR) constraint. In other words, the buyer (by accepting the contract) has to be able to achieve a cost no higher than the reserve price: C(x, p) ≤ C(0, 0) = q − ε. Knowing this, the seller can exactly determine the region where the buyer would accept a contract (x, p), since it knows the values q, ε, b.
Theorem 1. When q(1 − b) ≤ ε, the buyer accepts the contract (x, p) if and only if
\[
p \le \begin{cases} b & \text{if } x \le \frac{q-\varepsilon}{b} \\[2pt] \frac{q-\varepsilon}{x} & \text{if } x > \frac{q-\varepsilon}{b} \end{cases} \tag{3}
\]
When q(1 − b) > ε, the buyer accepts the contract if and only if
\[
p \le \begin{cases} b & \text{if } x \le \frac{\varepsilon}{1-b} \\[2pt] \frac{b\varepsilon}{x(1-b)} & \text{if } x > \frac{\varepsilon}{1-b} \end{cases} \tag{4}
\]
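Theorem 1's acceptance region translates directly into a small predicate; the following sketch and its numbers are ours, for illustration only:

```python
def accepts(x, p, q, eps, b):
    # Acceptance region of Theorem 1 for a buyer of type (q, eps, b)
    if q * (1 - b) <= eps:
        return p <= b if x <= (q - eps) / b else x * p <= q - eps
    return p <= b if x <= eps / (1 - b) else x * p <= b * eps / (1 - b)

# Here q(1-b) = 5 > eps = 2: the boundary is p = b up to x = eps/(1-b) = 4,
# then the hyperbola x*p = b*eps/(1-b) = 2.
print(accepts(3.0, 0.50, q=10.0, eps=2.0, b=0.5))  # True
print(accepts(8.0, 0.25, q=10.0, eps=2.0, b=0.5))  # True  (8 * 0.25 = 2)
print(accepts(8.0, 0.30, q=10.0, eps=2.0, b=0.5))  # False (8 * 0.30 > 2)
```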
The above result is illustrated in Fig. 1.
[Fig. 1. The buyer's acceptance region: its boundary is p = b together with xp = q − ε (left, when q(1 − b) ≤ ε) or xp = bε/(1 − b) (right, when q(1 − b) > ε); the optimal contract (x∗, p∗) lies at the intersection of the two boundary lines.]
The meaning of the two different types of regions is as follows. (i) When q(1 − b) ≤ ε, or b ≥ (q − ε)/q, the quality of the
stochastic channel is sufficiently good such that, when x is large enough, the constraint Eqn. (2) can be satisfied without any purchase of the deterministic channel (fixed service y). Thus, the buyer is willing to spend up to C(0, 0) = q − ε. (ii) When q(1 − b) > ε, or b < (q − ε)/q, the quality of the stochastic channel is not good enough: no matter how much of it is purchased, some deterministic channel (y) has to be purchased to satisfy the loss constraint. Thus, the buyer is not willing to spend all of q − ε on the contract. Below we prove the sufficient condition of the acceptance region when q(1 − b) ≤ ε; the other parts of the above theorem can be proved using similar arguments.
1. The buyer accepts the contract (x, p) if x ≤ (q − ε)/b and p ≤ b.
Proof. We start by letting y = q − ε − xp and show that the IR constraint is satisfied:
\[
y + xp = q - \varepsilon - xp + xp = q - \varepsilon = C(0, 0).
\]
The loss constraint is satisfied because,
where the second to last equality follows from the fact that q(1 − b) ≤ ε.
182 S.-P. Sheng and M. Liu
After determining the feasible region of contracts for a given type (q, ε, b), the seller can choose any point in this region to maximize its utility. We next show that the optimal contract for the seller is determined by the intersection of the two boundary lines derived above, which we will denote by (x∗, p∗) throughout the rest of the paper. Here we assume that there exists a contract with p > c that the buyer will accept, for otherwise the seller has no incentive to sell the stochastic channel.
Theorem 2. The optimal contract is the intersection point of the two lines:
\[
p^* = b \tag{5}
\]
\[
x^* p^* = \begin{cases} q - \varepsilon & \text{if } q(1-b) \le \varepsilon \\[2pt] \frac{b\varepsilon}{1-b} & \text{if } q(1-b) > \varepsilon \end{cases} \tag{6}
\]
Proof. From the form of the seller's utility, U(x, p) = x(p − c), it can be easily verified that the profit is increasing in p. Using this property and the fact that we have already determined the feasible contracts in Theorem 1, we can show that the contract pair (x, p) that generates the highest profit for the seller is the intersection point (x∗, p∗) (as illustrated in Figure 1).
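Theorem 2 likewise yields a one-line computation; the sketch below is ours, with an assumed illustrative seller cost c = 0.1:

```python
def optimal_contract(q, eps, b):
    # Theorem 2: p* = b, and x* p* = q - eps or b*eps/(1 - b)
    p_star = b
    xp = (q - eps) if q * (1 - b) <= eps else b * eps / (1 - b)
    return xp / p_star, p_star

x_star, p_star = optimal_contract(q=10.0, eps=2.0, b=0.5)
print((x_star, p_star))        # (4.0, 0.5)

c = 0.1                        # illustrative seller unit cost (our assumption)
print(x_star * (p_star - c))   # seller's profit U(x*, p*)
```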
Once the seller determines the optimal contract and presents it to the buyer, the buyer chooses to accept because the contract satisfies the loss constraint and the IR constraint. It can be shown that the buyer's cost is exactly C(0, 0), as expected.
The optimal contract for a buyer of type (q, ε, b) defined in Theorem 2 can be written in a compact form in the following theorem.
We now introduce the concept of an equal-cost line of a buyer; this concept will be used to find the optimal contract when there is more than one possible type of buyer. Consider a contract (x′, p′). Denote by P(x′, p′, x) a price such that the contract (x, P(x′, p′, x)) has the same cost to the buyer as the contract (x′, p′). This will be referred to as an equivalent price. Obviously P(x′, p′, x) is a function of x, x′, and p′.
Definition 1. The equal-cost line E of a buyer of type (q, ε, b) is the set of contracts within the buyer's acceptance region T that are of equal cost to the buyer. Thus (x, p) ∈ E if and only if p = P(x′, p′, x) for some other (x′, p′) ∈ E. The cost of this line is given by C(x′, p′), ∀(x′, p′) ∈ E.
It should be clear that there are many equal-cost lines, each with a different cost. Figure 2 shows an example of a set of equal-cost lines. We will therefore also write an equal-cost line as E_{x′,p′} for some (x′, p′) on the line, to distinguish it from other equal-cost lines. The next theorem gives a precise expression for the equivalent price that characterizes an equal-cost line.
[Fig. 2. An example of a set of equal-cost lines in the (x, p) plane.]
Proof. We will prove this for the case q(1 − b) ≤ ε; the other case can be shown with similar arguments and is thus omitted for brevity. In this case x∗ = (q − ε)/b. When x, x′ ≤ x∗, without buying deterministic service the loss is given by
\[
E[(q - xB)^+] = (q - x)^+ b + q(1 - b) = (q - x)b + q(1 - b) = q - xb \ge \varepsilon,
\]
where the second equality is due to the fact that q(1 − b) ≤ ε ⇒ (q − ε)/b ≤ q ⇒ x ≤ (q − ε)/b ≤ q. The incentive for the buyer is to purchase y such that the loss is just equal to ε.
Setting the loss just equal to ε,
\[
E[(q - y - xB)^+] = (q - y - x)b + (q - y)(1 - b) = q - y - xb = \varepsilon,
\]
which gives y = q − ε − xb. The first equality follows from the fact that q(1 − b) ≤ ε, which implies both (q − y − x) ≥ 0 and (q − y) ≥ 0. This is true for both (x, p) and (x′, p′). Since (x, p) is on the equal-cost line E_{x′,p′}, we know that C(x, p) = C(x′, p′). We also know that C(x, p) = y + xp and C(x′, p′) = y′ + x′p′, so
\[
C(x, p) = q - \varepsilon - xb + xp = q - \varepsilon - x'b + x'p' = C(x', p').
\]
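The derivation above pins down both the buyer's cost C(x, p) = q − ε − xb + xp and the equivalent price P(x′, p′, x) = b − (x′/x)(b − p′). A quick consistency check (our sketch, valid only under the proof's assumptions q(1 − b) ≤ ε and x, x′ ≤ x∗):

```python
def cost(x, p, q, eps, b):
    # buyer's cost C(x, p) = q - eps - x*b + x*p (case q(1-b) <= eps, x <= x*)
    return q - eps - x * b + x * p

def equiv_price(x_prime, p_prime, x, b):
    # P(x', p', x): the price solving cost(x, P) == cost(x', p')
    return b - (x_prime / x) * (b - p_prime)

q, eps, b = 10.0, 6.0, 0.5          # q*(1-b) = 5 <= eps; x* = (q-eps)/b = 8
x_prime, p_prime, x = 2.0, 0.3, 4.0
p = equiv_price(x_prime, p_prime, x, b)
print(abs(cost(x, p, q, eps, b) - cost(x_prime, p_prime, q, eps, b)) < 1e-12)  # True
```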
We now turn to the case where the parameters (q, b, ε) are private information of the buyer. The seller no longer knows the exact type of the buyer but only what types are out there and their distribution; consequently it has to guess the buyer's type and design the contracts in a way that maximizes its expected payoff. In order to do so, the seller can design a specific contract for each type so that the buyers will reveal their true types. Specifically, when the seller distributes a set of contracts C = {(x_1, p_1), (x_2, p_2), . . . , (x_K, p_K)} specially designed for each of the K types, a buyer of type i will select (x_i, p_i) only if the following set of inequalities is satisfied:
\[
C_i(x_i, p_i) \le C_i(x_j, p_j) \quad \forall j \ne i,
\]
where C_i denotes the cost of a type-i buyer. In other words, the contract designed for one specific type of buyer must be as good as any other contract from that buyer's point of view. Let R_i(C) denote the contract that a type-i buyer will select given a set of contracts C.
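The selection rule R_i(C) can be sketched as an arg-min over the offered contracts plus the outside option (0, 0). The code below is ours and reuses the cost expression derived earlier for the case q(1 − b) ≤ ε with x ≤ x∗:

```python
def cost(x, p, q, eps, b):
    # buyer's cost (assumption: the q(1-b) <= eps, x <= x* branch)
    return q - eps - x * b + x * p

def select(contracts, q, eps, b):
    # R_i(C): pick the cheapest option, falling back to the reserve (0, 0)
    options = [(0.0, 0.0)] + list(contracts)
    return min(options, key=lambda xp: cost(xp[0], xp[1], q, eps, b))

# reserve cost C(0,0) = q - eps = 4; the offered contract costs 4 - 2 + 1.2 = 3.2
print(select([(4.0, 0.3)], q=10.0, eps=6.0, b=0.5))  # (4.0, 0.3)
```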
Given a set of contracts C, we can now express the seller's expected utility as
\[
E[U(C)] := \sum_i U(R_i(C))\, r_i,
\]
where r_i is the a priori probability that the buyer is of type i. We further denote the set T_i = {(x, p) : C_i(x, p) ≤ C_i(0, 0)} as the set of all feasible contracts for a type-i buyer (the feasible region in Theorem 1). The optimal contract (Theorem 2) designed for the type-i buyer will also be called max_i.
We first consider the case when there are only two possible types of buyers, (q_i, ε_i, b_i), i ∈ {1, 2}, occurring with probabilities r_i, r_1 + r_2 = 1.
[Fig. 3. Acceptance regions of the two buyer types, the contracts max_1 and max_2, and the regions I_1, . . . , I_4 they partition (left: the case max_1 ∉ T_2 and max_2 ∉ T_1).]
M = 1. We first consider the case when the seller hands out only one contract.
– If max_1 ∉ T_2 and max_2 ∉ T_1,
\[
\text{optimal} = \begin{cases}
\max_1 & \text{if } r_1 U(\max_1) \ge r_2 U(\max_2) \text{ and } r_1 U(\max_1) \ge U(G) \\
\max_2 & \text{if } r_2 U(\max_2) \ge r_1 U(\max_1) \text{ and } r_2 U(\max_2) \ge U(G) \\
G & \text{if } U(G) \ge r_2 U(\max_2) \text{ and } U(G) \ge r_1 U(\max_1)
\end{cases}
\]
– If max_1 ∈ T_2,
\[
\text{optimal} = \begin{cases}
\max_1 & \text{if } U(\max_1) \ge r_2 U(\max_2) \\
\max_2 & \text{if } r_2 U(\max_2) \ge U(\max_1)
\end{cases}
\]
– If max_2 ∈ T_1,
\[
\text{optimal} = \begin{cases}
\max_2 & \text{if } U(\max_2) \ge r_1 U(\max_1) \\
\max_1 & \text{if } r_1 U(\max_1) \ge U(\max_2)
\end{cases}
\]
When max_1 ∉ T_2 and max_2 ∉ T_1, we denote the intersection point of the two boundaries (of the acceptance regions of the two types) by G (see Figure 3 (left)). Theorem 5 can be proved by showing that the payoffs of contracts in a particular region are no greater than those of special points such as G. For example, in the case of max_1 ∉ T_2 and max_2 ∉ T_1, any point in I_3 is suboptimal to the point G, because both are acceptable to both types of buyers and G yields a strictly higher profit than any other point in I_3.
Lemma 1. In the K = 2 case, suppose max_1 ∈ T_2 and x_1^∗ ≤ x_2^∗. Given a contract (x_1, p_1) for type 1, the optimal contract for type 2 must be (x_2^∗, P_2(x_1, p_1, x_2^∗)).
Proof. Given a contract (x_1, p_1), the feasible region for the contract of the type-2 buyer is the area below P_2(x_1, p_1, x) as defined in Theorem 4 (see Figure 4). By noticing that the seller's profit U(x, p) = x(p − c) is increasing in both p and x, the contract that generates the highest profit is such that x_2 = x_2^∗ and p_2 = P_2(x_1, p_1, x_2^∗).
Lemma 2. In the K = 2 case, suppose max_1 ∈ T_2 and x_1^∗ ≤ x_2^∗. An optimal contract for type 1 must have p_1 = b_1 and x_1 ≤ x_1^∗.
Proof. Lemma 2 can be proved in two steps. First, suppose the optimal contract has (x_1, p_1) in the interior of T_1, so that we can increase p_1 by some δ > 0 and still have (x_1, p_1 + δ) ∈ T_1. Since both U(x, p) and P(x, p, x′) are increasing in p, both U(x_1, p_1 + δ) and U(x_2^∗, P_2(x_1, p_1 + δ, x_2^∗)) are strictly larger than U(x_1, p_1) and U(x_2^∗, P_2(x_1, p_1, x_2^∗)), respectively. This contradicts the assumed optimality, so the optimal contract (x_1, p_1) must lie on the two lines (the upper boundary of T_1) defined in Theorem 2. Then we can exclude the possibility of having (x_1, p_1) on the boundary of T_1 with x_1 > x_1^∗ by comparing such a contract with the contract (x_1^∗, b_1).
[Fig. 4. The equal-cost line of type 2 through (x_1, p_1) and the optimal contract to give type 2, located at x = x_2^∗.]
By putting the constraints from Lemmas 1, 2 and using Theorem 4, the expected
profit can be expressed as follows.
This result shows two different conditions: 1) When r_1/r_2 < (b_2 − b_1)/(b_1 − c), type 2 is more profitable and the seller will distribute max_2. If the seller chooses to distribute max_2, there is no way to distribute another contract for type 1 without affecting the behavior of type 2. Consequently, the seller only distributes one contract. 2) When r_1/r_2 > (b_2 − b_1)/(b_1 − c), type 1 is more profitable and the seller will distribute max_1. After choosing max_1, the seller can also choose (x_2^∗, b_2 − (x_1^∗/x_2^∗)(b_2 − b_1)) for the type-2 buyer without affecting the type-1 buyer's choice. As a result, the seller distributes a pair of contracts to get the most profit.
With a very similar argument, the optimal contract for x∗1 > x∗2 can be
determined. Again, we can prove that the optimal contract must have p1 =
b1 and x1 ≤ x∗1 . The difference is that when x∗1 > x∗2 , the expression for
(x∗2 , P2 (x1 , p1 , x∗2 )) has two cases depending on whether x1 > x∗2 or x1 ≤ x∗2 .
\[
E[U(C)] = \begin{cases}
r_1 U(x_1, b_1) + r_2 U\!\left(x_2^*,\, b_2 - \frac{x_1}{x_2^*}(b_2 - b_1)\right) & \text{if } x_1 \le x_2^* \\[4pt]
r_1 U(x_1, b_1) + r_2 U\!\left(x_2^*,\, \frac{x_1 b_1}{x_2^*}\right) & \text{if } x_1 > x_2^*
\end{cases}
\]
\[
\frac{\partial E[U(C)]}{\partial x_1} = \begin{cases}
r_1(b_1 - c) - r_2(b_2 - b_1) & \text{if } x_1 \le x_2^* \\[2pt]
r_1(b_1 - c) + r_2 b_1 & \text{if } x_1 > x_2^*
\end{cases}
\]
\[
x_1 = \begin{cases}
0 \text{ or } x_1^* & \text{if } r_1(b_1 - c) - r_2(b_2 - b_1) < 0 \\[2pt]
x_1^* & \text{if } r_1(b_1 - c) - r_2(b_2 - b_1) > 0
\end{cases}
\]
\[
C = \begin{cases}
\max_2 \text{ or } \left\{\max_1, \left(x_2^*, \frac{x_1^* b_1}{x_2^*}\right)\right\} & \text{if } r_1(b_1 - c) - r_2(b_2 - b_1) < 0 \\[4pt]
\left\{\max_1, \left(x_2^*, \frac{x_1^* b_1}{x_2^*}\right)\right\} & \text{if } r_1(b_1 - c) - r_2(b_2 - b_1) > 0
\end{cases}
\]
In the first condition, we can calculate the expected profit of the two contract
sets and pick the one with the higher profit.
the case where the condition is largely determined by the seller's primary user traffic. An example of the acceptance regions of three buyer types is shown in Figure 5. We will assume that the buyers are indexed in increasing order of x_i^∗; this can always be done by relabeling the buyer indices. There are two possible cases: (1) the seller can announce as many contracts as it likes, i.e., M = K (note that there is no point in designing more contracts than there are types); (2) the seller is limited to at most M < K contracts. In the results presented below we fully characterize the optimal contract set in both cases.
[Fig. 5. Acceptance regions of three buyer types sharing a common b, partitioned into regions I_1, I_2, I_3.]
Theorem 7. When M = K and b_i = b, ∀i, the contract set that maximizes the seller's profit is (max_1, max_2, . . . , max_K).
This result holds for the following reason. As shown in Figure 5, with a constant b, the intersection points max_i of all acceptance regions are on the same line p = b. For a buyer of type i, all points to the left of max_i on this line cost the same as max_i, and all points to its right are outside the buyer's acceptance region. Therefore the type-i buyer will select the contract max_i given this contract set. Since this is the best the seller can do with a type-i buyer (see Theorem 4 and the proof of Theorem 6), this set is optimal for the seller.
Lemma 3. When M < K and b_i = b, ∀i, the optimal contract set is a subset of (max_1, . . . , max_K).
Proof. Assume the optimal contract set C is not a subset of (max_1, . . . , max_K). Then it must contain some contract points from at least one of the regions I_i shown in Figure 5. Let these contracts be A_i ⊂ I_i with ∪_i A_i = C. For each non-empty A_i, we replace it by the contract max_i and call the new contract set C′. The proof is to show that this contract set generates profit at least as large as the original one. Each type-i buyer that picked some contract (x, p) ∈ A_j from the optimal contract set C must have a type greater than or equal to j, for otherwise (x, p) would not be in its acceptance region. From the contract set C′, type i will now pick max_j or some max_l with l > j. The choice each possible type of buyer picks from C′ is at least as profitable as the one it picked from C. Thus, the expected profit of C′ is at least as large as that of C.
190 S.-P. Sheng and M. Liu
The above lemma suggests the following iterative way of finding the optimal contract set.
Definition 2. Define the function g(m, i) as the maximum expected profit for the seller from picking contract max_i and selecting optimally m − 1 contracts from the set (max_{i+1}, . . . , max_K).
Note that if we include max_i and max_j (i < j) in the contract set but nothing else in between i and j, then a buyer of type l (i ≤ l < j) will pick contract max_i. These types contribute an expected profit of x_i^∗(b − c) Σ_{l=i}^{j−1} r_l. At the same time, no types below i will select max_i (as it is outside their acceptance regions), and no types at or above j will select max_i (as for them max_j is preferable). Thus the function g(m, i) can be obtained recursively as follows:
\[
g(m, i) = \max_{j:\, i < j \le K - m + 2} \left\{ g(m - 1, j) + x_i^*(b - c) \sum_{l=i}^{j-1} r_l \right\},
\]
with the boundary condition g(1, i) = x_i^∗(b − c) Σ_{l=i}^{K} r_l.
Finally, it should be clear that the maximum expected profit for the seller is given by max_{1≤i≤K} g(M, i), and the optimal contract set can be determined by backtracking: first determine the lowest-indexed contract i^∗ = arg max_{1≤i≤K} g(M, i), then take the next contract to be the maximizing index j in the recursion for g(M, i^∗), and so on.
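The recursion for g(m, i), together with the backtracking step, can be implemented directly with memoization. The sketch below is ours (types are 0-indexed; alongside the profit, each call returns the tuple of chosen type indices, which makes the backtracking implicit):

```python
from functools import lru_cache

def best_contract_set(x_star, r, b, c, M):
    # x_star: the intersection points x_i* in increasing order (one per type);
    # r: prior probabilities of the types; all types share the same b.
    K = len(x_star)
    pre = [0.0]
    for ri in r:
        pre.append(pre[-1] + ri)   # pre[j] - pre[i] = sum of r_l for i <= l < j

    @lru_cache(maxsize=None)
    def g(m, i):
        # best expected profit from picking max_i plus m-1 contracts above i
        if m == 1:
            return x_star[i] * (b - c) * (pre[K] - pre[i]), (i,)
        best, picks = float("-inf"), None
        for j in range(i + 1, K - m + 2):  # paper's i < j <= K-m+2, 0-indexed
            sub, sub_picks = g(m - 1, j)
            val = sub + x_star[i] * (b - c) * (pre[j] - pre[i])
            if val > best:
                best, picks = val, (i,) + sub_picks
        return best, picks

    # max over i of g(M, i); only i <= K-M leaves room for M contracts
    return max(g(M, i) for i in range(K - M + 1))

profit, picks = best_contract_set([1.0, 2.0, 4.0], [0.5, 0.3, 0.2], b=0.5, c=0.1, M=2)
print(picks)   # (0, 2): announce max_1 and max_3
```

The DP has O(K·M) states and O(K) transitions per state, so it is inexpensive even for many buyer types.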
5 Conclusion
In this paper we considered a contract design problem where a primary license
holder wishes to profit from its excess spectrum capacity by selling it to potential
secondary users/buyers via designing a set of profitable contracts. We considered
two cases. Under symmetric information, we found the optimal contract that
achieves maximum profit for the primary user. Under asymmetric information,
we found the optimal contract if the buyer belongs to one of two types. When
there are more than two types we restricted our attention to the case where the
channel condition is common to all types, and presented an optimal procedure
to design the contracts.
References
1. Akyildiz, I.F., Lee, W.Y., Vuran, M.C., Mohanty, S.: Next generation/dynamic spectrum access/cognitive radio wireless networks: a survey. Computer Networks 50(13), 2127–2159 (2006)
2. Buddhikot, M.M.: Understanding dynamic spectrum access: Models, taxonomy and
challenges. In: New Frontiers in Dynamic Spectrum Access Networks, DySPAN
2007, pp. 649–663. IEEE (2007)
3. McHenry, M.A., Tenhula, P.A., McCloskey, D., Roberson, D.A., Hood, C.S.: Chicago spectrum occupancy measurements & analysis and a long-term studies proposal. In: The First International Workshop on Technology and Policy for Accessing Spectrum. ACM Press, New York (2006)
4. McHenry, M.A.: NSF spectrum occupancy measurements project summary. Shared
Spectrum Company Report (August 2005)
5. Chen, D., Yin, S., Zhang, Q., Liu, M., Li, S.: Mining spectrum usage data: a large-
scale spectrum measurement study. In: ACM International Conference on Mobile
Computing and Networking (MobiCom), Beijing, China (September 2009)
6. Zhao, Q., Sadler, B.M.: A survey of dynamic spectrum access. IEEE Signal Processing Magazine: Special Issue on Resource-Constrained Signal Processing, Communications, and Networking 24, 79–89 (2007)
7. Kim, H., Shin, K.G.: Efficient discovery of spectrum opportunities with MAC-layer sensing in cognitive radio networks. IEEE Transactions on Mobile Computing 7(5), 533–545 (2008)
8. Liu, X., Shankar, S.N.: Sensing-based opportunistic channel access. Journal of Mo-
bile Networks and Applications 11(4), 577–591 (2006)
9. Zhao, Q., Tong, L., Swami, A., Chen, Y.: Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework. IEEE Journal on Selected Areas in Communications (JSAC) 25(3), 589–600 (2007)
10. Ahmad, S.H.A., Liu, M., Javidi, T., Zhao, Q., Krishnamachari, B.: Optimality
of myopic sensing in multi-channel opportunistic access. IEEE Transactions on
Information Theory 55(9), 4040–4050 (2009)
11. Duan, L., Gao, L., Huang, J.: Contract-based cooperative spectrum sharing. In:
Dynamic Spectrum Access Networks (DySPAN), pp. 399–407. IEEE (2011)
12. Muthuswamy, P.K., Kar, K., Gupta, A., Sarkar, S., Kasbekar, G.: Portfolio optimization in secondary spectrum markets. In: WiOpt (2011)
Primary User Emulation Attack Game
in Cognitive Radio Networks: Queuing
Aware Dogfight in Spectrum
1 Introduction
Cognitive radio has attracted substantial research attention since its birth in 1999 [14].
In cognitive radio systems, users without license (called secondary users) are
allowed to use the licensed spectrum that licensed users (called primary users)
are not using, thus improving the spectrum utilization efficiency. When primary
users emerge, the secondary users must quit the corresponding channels. To
ensure no interference to primary user traffic, the secondary user must sense the
spectrum periodically to determine the existence of primary users.
Such a dynamic spectrum access mechanism, particularly the spectrum sensing mechanism, also introduces vulnerabilities into the communication system. One serious threat is the primary user emulation (PUE) attack [1], in which the attacker sends out signals similar to those of primary users during the spectrum sensing period, such that the secondary users will be 'scared' away even if there is no primary user, since it is difficult to distinguish the signals of primary users from those of the attacker. Such an attack is very efficient, since the attacker needs only very low transmit power, due to the stringent spectrum sensing sensitivity required of secondary users.
Most existing studies on the PUE attack fall into the topics of proactive detection of the attacker [1] or passive frequency hopping [11]. Due to the difficulty of detecting
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 192–208, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
the attacker, we will focus on the frequency hopping for avoiding PUE, in which
the secondary users randomly choose channels to sense such that the attacker
cannot always block the traffic in the cognitive radio network. Such an attack
and defense procedure is essentially a game, which is coined the 'dogfight in spectrum' in [11] and has been studied using game-theoretic arguments. In previous studies, only single-hop communications, such as point-to-point communications or multiple access, are considered, and total throughput is taken as the game reward (cost) for the defender (attacker). However, in many practical applications like sensor networks, the traffic is multihop and has a constant average volume, thus forming queuing dynamics, since each secondary user has a buffer to store the received packets. Hence, the ultimate goal of the game is to stabilize/destabilize the queuing dynamics, making the game a queuing-aware one. Note that the optimal scheduling strategy for stabilizing a queuing system in a wireless communication network was obtained in the seminal work [20].
However, there has not been any study on the game of stabilizing/destabilizing queuing systems, which is of significant importance for the security of various queuing systems. In this paper, we model this queuing-aware dogfight in spectrum as a stochastic game, in which the actions are the channels to sense/block and the rewards are metrics related to queue stability, such as the Lyapunov drift and back pressure. We first study the centralized case, in which both the cognitive radio network and the attackers are controlled by centralized controllers, making the game a two-player one. Then, we extend it to the more practical situation in which each player can observe only the local system state, based on which it makes its decisions. Different from graphical games, in which each player has its own reward, in our situation the attackers and secondary users form two coalitions whose rewards are the sums of the local rewards, and each player strives to increase its coalition's reward. Such a 'local-decision-global-reward' structure makes the game significantly different from graphical games.
For both cases, we provide the game formulation, the Nash equilibrium and the value of the game for the general situation, and discussions of special cases. Note that, for simplicity, we assume that the attackers know the queuing situation of the cognitive radio network, which can be realized by eavesdropping on the control messages in the network.
In summary, our major contributions include:
– Studying the network-wide PUE attack with awareness of the queuing dynamics, which extends the PUE attack in single-hop systems.
– Studying the game of stabilizing/destabilizing queuing systems, which extends the decision problems of network stabilization.
The study deepens our understanding of the security of cognitive radio networks, as well as that of general queuing systems. It will help the design of robust cognitive radio networks that can effectively combat the PUE attack at the network scale.
The remainder of this paper is organized as follows. The existing work related to this paper is introduced in Section 2. The system model is explained in Section 3. The centralized and decentralized games are discussed in Sections 4 and 5, respectively. Numerical results are provided in Section 6, and conclusions are drawn in Section 7.
194 H. Li et al.
2 Related Works
In this section, we provide a brief survey of the existing works related to this paper. Note that there is a huge number of studies on each of these topics; hence, the survey is far from exhaustive.
2.3 Games
The analysis in this paper is based on game theory. Due to the features of the queuing-aware dogfight in spectrum, our study concerns both stochastic games and graphical games, since the reward depends on the system state and the players form a graph (network).
Graphical Game. As more attention is paid to various types of networks, such as social networks and communication networks, game theory has been extended from the traditional structureless setup (e.g., two players, or multiple players forming a complete graph) to scenarios with network structure (called graphical games) [16]. In such games, the players form a graph or a network in which the corresponding topology plays an important role in the game. In one type of graphical game, each player has its own payoff. Algorithms for computing the Nash equilibrium have been studied in [4], [8] and [9]. In another type of graphical game, the players form two coalitions and each player aims to maximize the coalition reward, which equals the sum of the individual rewards. Such a game has been studied in [3] and [26]. An excellent summary can be found in [2].
3 System Model
The system model consists of the models of cognitive radio networks, data flows
and primary user emulation attacks.
adjacent to each other are able to communicate with each other directly. We denote by
n ∼ m if secondary users n and m are neighbors in the network. We assume that
there are M licensed channels in total, which may be used by K primary users.
We denote by Nk the set of secondary users that may be affected by primary
user k and denote by Mk the set of channels that primary user k occupies when
it is active. For simplicity, we assume that the activities in different time slots of
each primary user are mutually independent, and the probability of being active
is denoted by pk for primary user k.
The time is divided into time slots, each containing a spectrum sensing period
followed by a data transmission period. At time slot t, the status of channel m is denoted by s_m; i.e., s_m = 0 when the channel is not being used by primary users and s_m = 1 otherwise. Due to the limited capability of spectrum sensing, we
assume that each secondary user can sense only one channel during the spectrum
sensing period. It is straightforward to extend to the more generic case in which
multiple channels can be sensed simultaneously. For simplicity, we assume that
the spectrum sensing is perfect; i.e., the output of spectrum sensing is free of
errors.
We assume that there are F data flows in total in the cognitive radio network. We
denote by Sf and Df the source node and destination node of flow f , respectively.
We assume that the number of packets arriving at the source node of data flow
f satisfies a Poisson distribution with expectation a_f. The routing paths of the F data flows can be represented by an F × N matrix R in which R_{fn} = 1 if data flow f passes through secondary user n and R_{fn} = 0 otherwise. We denote by I_n the set of flows passing through secondary user n.
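The arrival and routing model above can be sketched numerically. A minimal illustration (the network size, routing matrix, and arrival rates below are invented for the example, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

F, N = 2, 3                      # number of data flows and secondary users (illustrative)
a = np.array([0.4, 0.7])         # Poisson arrival rates a_f at each flow's source node

# Routing matrix R: R[f, n] = 1 iff flow f passes through secondary user n
R = np.array([[1, 0, 1],         # flow 1 traverses users 1 and 3
              [0, 1, 1]])        # flow 2 traverses users 2 and 3

# I_n: set of flows passing through secondary user n, read off the columns of R
I = [set(np.flatnonzero(R[:, n])) for n in range(N)]

# Packets arriving at each flow's source node in one time slot
arrivals = rng.poisson(a)

print(I)   # [{0}, {1}, {0, 1}]
print(arrivals)
```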
The data flows are packetized using the same packet length. Each secondary user has one or more buffers to store the received packets. In each time slot, the secondary users carry out the channel selection and flow scheduling. (Figure: the cognitive radio network viewed as a controlled system, with the queue lengths as the state and the channel selection and scheduling as the action.)
When secondary user n decides to transmit to the next-hop neighbor j and an available channel, say m, is found, the packet can be delivered successfully with probability μ_{njm}, which is determined by the channel quality.
4 Centralized Game
In this section, we consider the centralized case, in which the actions of the
attackers and secondary users are both fully coordinated. Hence, we can assume
that there are two centers making the decisions for the attackers and secondary
users, respectively, such that there are two players in the game.
– State: The system state, denoted by s, includes the queue lengths of all flows at all secondary users, which are denoted by {q_{fn}}_{f=1,...,F, n=1,...,N}. The state space is denoted by S, which consists of all possible s. We assume that the system state is visible to both the attackers and the secondary users. Note that, since we assume that the primary users' activities are independent in time, the spectrum situation is memoryless. It is easy to extend to the case in which the spectrum has memory by incorporating the spectrum state into the system state.
– Actions: We denote by A_a and A_s the sets of actions of the attackers and the secondary users, respectively. The action of the attackers, denoted by a_a, consists of the channels to jam, which are denoted by {c_l^a}_{l=1,...,L} (c_l^a is a vector containing the Q channels that attacker l jams). The action of the secondary users, denoted by a_s, consists of the assignment of the channels, as well as the scheduled flows. We denote by c_n(t) and f_n(t) the assigned channel and the scheduled flow at secondary user n in time slot t. To avoid co-channel interference, we require c_n(t) ≠ c_m(t) if m ∈ C_n.
– Reward: The purpose of the attackers is to make the cognitive radio network collapse, or equivalently to destabilize the queuing system, while the purpose of the secondary users is to stabilize it. Hence, a quantity is needed to quantify the stability of the system. We define the following Lyapunov function:

$$V(s(t)) = \sum_{f=1}^{F} \sum_{n=1}^{N} q_{fn}^2(t), \qquad (1)$$

namely the sum of the squared queue lengths. The larger the Lyapunov function, the more unstable the system, since more packets stay in the network. V(s(t)) can be rewritten as

$$V(s(t)) = V(s(0)) + \sum_{r=1}^{t} \big( V(s(r)) - V(s(r-1)) \big), \qquad (2)$$

so the per-slot Lyapunov drift V(s(r)) − V(s(r−1)) can serve as the reward of each time slot, which simplifies the analysis since it is much easier to analyze a game with a discounted sum of rewards. Note that this definition is motivated by the classical works on scheduling queuing networks, in which the scheduling tries to minimize the Lyapunov drift in order to stabilize the queues [15][20].
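As a concrete reading of (1) and (2), the following sketch computes the Lyapunov function and the one-slot drift from a queue-length snapshot (the queue values are illustrative, not from the paper):

```python
import numpy as np

def lyapunov(q):
    """V(s) = sum over flows f and users n of q_{fn}^2  (Eq. (1))."""
    return float(np.sum(np.asarray(q, dtype=float) ** 2))

def drift(q_prev, q_curr):
    """One-slot Lyapunov drift V(s(t)) - V(s(t-1)), the per-slot game reward."""
    return lyapunov(q_curr) - lyapunov(q_prev)

# q[f, n]: queue length of flow f at secondary user n (illustrative numbers)
q_prev = np.array([[2.0, 1.0], [0.0, 3.0]])
q_curr = np.array([[1.0, 1.0], [1.0, 3.0]])

print(lyapunov(q_prev))        # 4 + 1 + 0 + 9 = 14.0
print(drift(q_prev, q_curr))   # (1 + 1 + 1 + 9) - 14 = -2.0
```

The attackers seek to make the cumulative drift large (destabilization); the secondary users seek to make it small.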
At the Nash equilibrium point, neither player has any motivation to change the strategy specified by the equilibrium point; any unilateral deviation from the equilibrium point can only degrade the deviating player's own performance.
To find the Nash equilibrium, an auxiliary matrix game proposed by Shapley [19] is needed. We first define the matrix game conditioned on the system state s, which is given by

$$R(s) = \begin{pmatrix} d(s,1,1) & d(s,1,2) & \cdots & d(s,1,|A_a|) \\ d(s,2,1) & d(s,2,2) & \cdots & d(s,2,|A_a|) \\ \vdots & \vdots & \ddots & \vdots \\ d(s,|A_s|,1) & d(s,|A_s|,2) & \cdots & d(s,|A_s|,|A_a|) \end{pmatrix},$$

in which d(s, a_1, a_2) is the expected Lyapunov drift when the system state is s and the actions are a_1 and a_2 for the attackers and the cognitive radio network, respectively.
We define the value vector of the attacker, denoted by v_a = (v_a(1), ..., v_a(|S|)), whose elements are given by
where R(s) is the reward of the attackers given the initial state s. Then, an
auxiliary matrix game is defined with the following payoff matrices
Similarly, we can also define the value vector for the cognitive radio network, which is denoted by v_c.
The following theorem (Shapley, 1953, [19]) gives the condition for the Nash equilibrium of the zero-sum stochastic game:
Theorem 1. The value vector at the Nash equilibrium satisfies the following
equations:
Myopic Game for Back Pressure. Although the Nash equilibrium exists for the stochastic game formulation in the previous subsection, it is very difficult to obtain an analytic expression for the equilibrium; we can obtain numerical solutions only for small systems. Moreover, it is still not clear whether defining the Lyapunov drift as the reward of each time slot is the optimal choice. In this subsection, we study the myopic case, in which the attackers and the cognitive radio network take myopic strategies by maximizing their rewards in each time slot, without considering the future evolution. Moreover, we approximate the optimization of the Lyapunov drift by the optimization of the back pressure, which simplifies the stochastic game to a one-stage normal-form game.
It is well known that, when there is no attacker, the back pressure of flow f at secondary user n is given by [20]

$$D_{fn} = \begin{cases} (q_{fn} - q_{fj})\,\mu_{njm}, & j \notin D_f \\ q_{fn}\,\mu_{njm}, & j \in D_f \end{cases}, \qquad (12)$$

where j is the next secondary user along flow f and m is the channel for the transmission from n to j (recall that i ∈ D_f means that node i is a destination node for flow f). It has been shown in [20] that the scheduling algorithm maximizing the back pressure, which is tightly related to minimizing the Lyapunov drift, can stabilize the queuing system.
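Equation (12) can be read off directly in code. A minimal sketch (the queue lengths and success probability are illustrative):

```python
def back_pressure(q_fn, q_fj, mu_njm, j_is_destination):
    """Back pressure D_{fn} of flow f at user n toward next hop j on channel m (Eq. (12))."""
    if j_is_destination:
        # next hop is a destination node of flow f: no downstream queue to subtract
        return q_fn * mu_njm
    return (q_fn - q_fj) * mu_njm

# Relay case: n holds 5 packets of flow f, next hop j holds 2, success prob 0.8
print(back_pressure(5, 2, 0.8, False))   # (5 - 2) * 0.8 ≈ 2.4
# Destination case: next hop is the flow's destination
print(back_pressure(5, 0, 0.8, True))    # 5 * 0.8 ≈ 4.0
```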
However, when attacks exist, the back pressure depends on the attackers' strategy, since the channels selected by the attackers change the transmission success probability μ_{njm}. Recall that the actions of the attackers and the cognitive radio network are denoted by a_a and a_c, respectively. Then, the success probability, as a function of the actions, is given by (recall that V_l is the set of secondary users that attacker l can attack)

$$\tilde{\mu}_{njm}(a_a, a_c) = \mu_{njm}\, I\big(m \notin c_l^a,\ \forall l \text{ such that } n \in V_l\big), \qquad (13)$$

where I is the characteristic function of the event that no attacker that can interfere with secondary user n is attacking channel m. Then, the back pressure in the game is defined as a function of the actions a_a and a_c:

$$\tilde{D}_{fn}(a_a, a_c) = \begin{cases} (q_{fn} - q_{fj})\,\tilde{\mu}_{njm}(a_a, a_c), & j \notin D_f \\ q_{fn}\,\tilde{\mu}_{njm}(a_a, a_c), & j \in D_f \end{cases}. \qquad (14)$$
Then, the reward of the attackers is given by (recall that f_n is the flow scheduled at secondary user n)

$$R(a_a, a_c) = -\sum_{n=1}^{N} \tilde{D}_{f_n, n}(a_a, a_c), \qquad (15)$$

and the reward of the cognitive radio network is \sum_{n=1}^{N} \tilde{D}_{f_n, n}, since the game is modeled as a zero-sum one. Then, the strategy of the attackers at the Nash equilibrium is given by

$$\pi_a^* = \arg\max_{\pi_a} \min_{\pi_c} R(\pi_a, \pi_c). \qquad (16)$$
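Equations (13)–(15) can be sketched as follows; the topology, attack reach sets V_l, and all numbers below are invented for illustration:

```python
def eff_success_prob(mu_njm, m, n, jammed, reach):
    """Eq. (13): mu is zeroed iff some attacker l with n in V_l jams channel m.

    jammed[l] = set of channels attacker l jams; reach[n] = attackers l with n in V_l.
    """
    is_jammed = any(m in jammed[l] for l in reach.get(n, ()))
    return 0.0 if is_jammed else mu_njm

def attacker_reward(hops, jammed, reach):
    """Eq. (15): minus the sum of attack-modified back pressures over all users.

    hops[n] = (q_fn, q_fj, mu_njm, m, j_is_dest) for the flow scheduled at user n.
    """
    total = 0.0
    for n, (q_fn, q_fj, mu, m, j_is_dest) in hops.items():
        mu_eff = eff_success_prob(mu, m, n, jammed, reach)
        total += q_fn * mu_eff if j_is_dest else (q_fn - q_fj) * mu_eff  # Eq. (14)
    return -total

reach = {1: [0], 2: []}              # attacker 0 can reach user 1 only
jam = {0: {1}}                       # attacker 0 jams channel 1
hops = {1: (4, 1, 0.9, 1, False),    # user 1 transmits on the jammed channel 1
        2: (3, 0, 0.5, 2, True)}     # user 2 transmits on channel 2, toward a destination
print(attacker_reward(hops, jam, reach))   # -(0 + 3*0.5) = -1.5
```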
Dogfight 201
The actions at the Nash equilibrium point can be computed using linear programming. The challenge is the large number of actions when the network size or the number of channels is large. We will derive an analytic expression for an example in the sequel. For large systems, we can only use approximations to explore the Nash equilibrium.
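As noted above, the equilibrium of a zero-sum matrix game can be computed with standard linear programming. A minimal sketch using SciPy (the payoff matrix below is an invented illustration, not one of the paper's matrices):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Mixed strategy of the row player (maximizer) and game value for payoff matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x (row mixed strategy, length m) and v (game value).
    # Maximize v  s.t.  (A^T x)_j >= v for every column j,  sum(x) = 1,  x >= 0.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Illustrative payoff matrix for the defender (row player)
x, v = solve_zero_sum([[3.0, 1.0], [2.0, 4.0]])
print(np.round(x, 3), round(v, 3))   # defender mixes [0.5, 0.5], value 2.5
```

Here `x` is the row (maximizer) strategy; the column player's equilibrium strategy can be obtained by solving the game with payoff matrix `-A.T`.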
4.3 Example
In this subsection, we use an example to illustrate the previous discussions, which also provides insights for larger networks. The example is illustrated in Fig. 2, in which there is one attacker and three secondary users. We assume that there are two channels in total, over which two data flows are sent from secondary user 3 to secondary users 1 and 2, respectively. The attacker can interfere with only secondary user 3. For simplicity, we assume that secondary user 3
can sense and transmit over both channels simultaneously; hence, there are only
two possible actions for secondary user 3.
One Stage Game for Back Pressure. Now we consider the one stage game
for maximizing or minimizing the back pressure. Fix a certain time slot and
drop the time index for simplicity. It is easy to verify that the reward of the cognitive radio network can be represented by a matrix, which is given by

$$\begin{pmatrix} q_{32}\,\mu_{322} & q_{31}\,\mu_{312} \\ q_{31}\,\mu_{311} & q_{32}\,\mu_{321} \end{pmatrix}. \qquad (18)$$
The Nash equilibrium of this matrix game is provided in the following proposi-
tion. The proof is a straightforward application of the conclusion in [5]; hence,
we omit the proof due to the limited space.
Proposition 2. We denote by π_j^a the probability that the attacker attacks channel j, j = 1, 2, and by π_k^c the probability that secondary user 3 transmits data flow 1 over channel k while transmitting data flow 2 over the other channel, k = 1, 2.
The Nash equilibrium of the matrix game in (18) is given by the following cases:
– If the following inequalities hold, i.e.,

$$(q_{32}\mu_{322} - q_{31}\mu_{312})(q_{32}\mu_{321} - q_{31}\mu_{311}) > 0 \quad \text{and} \quad (q_{32}\mu_{322} - q_{31}\mu_{311})(q_{32}\mu_{321} - q_{31}\mu_{312}) > 0, \qquad (19)$$
– If the first inequality in (19) does not hold, then we have the following possibilities:
• q_{32}μ_{322} ≥ q_{31}μ_{312} and q_{32}μ_{321} < q_{31}μ_{311}, or q_{32}μ_{322} > q_{31}μ_{312} and q_{32}μ_{321} ≤ q_{31}μ_{311}: secondary user 3 should always transmit data flow 1 over channel 1; the attacker should attack channel 1 if q_{31}μ_{311} > q_{32}μ_{322} and attack channel 2 otherwise.
• q_{32}μ_{322} < q_{31}μ_{312} and q_{32}μ_{321} ≥ q_{31}μ_{311}, or q_{32}μ_{322} ≤ q_{31}μ_{312} and q_{32}μ_{321} > q_{31}μ_{311}: secondary user 3 should always transmit data flow 1 over channel 2; the attacker should attack channel 1 if q_{32}μ_{321} > q_{31}μ_{312} and attack channel 2 otherwise.
• q_{32}μ_{322} = q_{31}μ_{312} and q_{32}μ_{321} = q_{31}μ_{311}: secondary user 3 can choose either action; the attacker should attack channel 1 if q_{32}μ_{321} > q_{31}μ_{312} and attack channel 2 otherwise.
– If the second inequality in (19) does not hold, then we have the following possibilities:
• q_{32}μ_{322} ≤ q_{31}μ_{311} and q_{31}μ_{312} < q_{32}μ_{321}, or q_{32}μ_{322} < q_{31}μ_{311} and q_{31}μ_{312} ≤ q_{32}μ_{321}: the attacker should always attack channel 1; secondary user 3 should transmit data flow 1 over channel 1 if q_{32}μ_{322} > q_{31}μ_{312}, and transmit over channel 2 otherwise.
• q_{32}μ_{322} ≥ q_{31}μ_{311} and q_{31}μ_{312} > q_{32}μ_{321}, or q_{32}μ_{322} > q_{31}μ_{311} and q_{31}μ_{312} ≥ q_{32}μ_{321}: the attacker should always attack channel 2; secondary user 3 should transmit data flow 1 over channel 1 if q_{31}μ_{311} > q_{32}μ_{321}, and transmit over channel 2 otherwise.
• q_{32}μ_{322} = q_{31}μ_{311} and q_{32}μ_{321} = q_{31}μ_{312}: the attacker can attack either channel; secondary user 3 should transmit data flow 1 over channel 1 if q_{32}μ_{322} > q_{31}μ_{312} and over channel 2 otherwise.
Remark 1. We can draw the following conclusions from the Nash equilibrium:
– When all channels have the same quality, the attacker should attack each channel with probability 0.5, which is independent of the queue lengths.
– Suppose μ_{311} = μ_{322} ≫ μ_{312} = μ_{321}; i.e., it is much more desirable to transmit data flow 1 over channel 1 and data flow 2 over channel 2. Then the attacker should attack the channel that is more desirable for the data flow with the larger queue length. In this situation, the queue length information is useful.
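As a sanity check on the first claim in Remark 1, the interior (fully mixed) equilibrium of a 2×2 zero-sum game has a standard closed form; applying it to matrix (18) with all channel qualities equal shows the 50/50 attack, independent of the queue lengths. The queue values below are illustrative:

```python
def mixed_2x2(A):
    """Interior mixed equilibrium of a 2x2 zero-sum game (row player maximizes).

    Valid when the game has no saddle point; returns (p_row1, q_col1, value).
    """
    (a11, a12), (a21, a22) = A
    d = a11 - a12 - a21 + a22
    p = (a22 - a21) / d          # prob. the row player picks row 1
    q = (a22 - a12) / d          # prob. the column player picks column 1
    v = (a11 * a22 - a12 * a21) / d
    return p, q, v

# Matrix (18) with all channel qualities mu = 1 and queue lengths q31 = 2, q32 = 5
q31, q32 = 2.0, 5.0
A = [[q32, q31], [q31, q32]]
p, q, v = mixed_2x2(A)
print(p, q, v)   # 0.5 0.5 3.5 -- the attacker mixes 50/50, independent of the queues
```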
and

$$\frac{\mu_{312}\,\mu_{311}\,(\mu_{321} + \mu_{322})}{(\mu_{311} + \mu_{312})^2} > f_2, \qquad (22)$$

and then we simply substitute the conclusion in Prop. 1 into the above expressions of c_1 and c_2.
5 Decentralized Game
– System state: Due to the locality assumption, each player does not necessarily know the queue lengths of all secondary users and all flows. For attacker l, its state is s_l^a = {q_{fn}}_{n∈V_l, f∈I_n}, i.e., the queuing situations of all secondary users that it may attack. For secondary user n, its state is s_n^c = {q_{fm}}_{m∼n, f∈I_m}, i.e., the queuing situations of all neighboring secondary users.
– Strategy: As we have assumed, each player knows only the states of its neighbors. Hence, its action depends only on the neighbors. We define the strategy of a player as the distribution of its action given the states
of its neighbors and itself. For each attacker l, the strategy is given by P(a | {q_{fn}}_{n∈V_l, f∈I_n}), a = 1, ..., M. For each secondary user n, the strategy is given by P(a | {q_{fm}}_{m∼n, f∈I_m}). The overall strategy of the cognitive radio network (attackers) is the product of the strategies of the individual secondary users (attackers); i.e.,

$$\pi_a = \prod_{l=1}^{L} \pi_a^l, \qquad \pi_c = \prod_{n=1}^{N} \pi_c^n. \qquad (25)$$

Note that the key difference between the decentralized game and the centralized one is the structure of the strategy; i.e., the decentralized game has a product space for the strategy while the centralized one does not.
– Reward: Again, we consider the Lyapunov drift as the reward. For secondary user n, its reward is given by

$$r_n(t) = \sum_{f \in I_n} \big( q_{fn}^2(t-1) - q_{fn}^2(t) \big), \qquad (26)$$

and the total reward is the discounted sum of the per-slot rewards, where β is the discounting factor. We can also consider the mean reward; however, it is much more complicated.
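A minimal sketch of the local reward (26); the queue snapshots are illustrative, and the `flows_at_n` argument plays the role of I_n:

```python
def local_reward(q_prev, q_curr, n, flows_at_n):
    """Eq. (26): decrease of the squared queue lengths at secondary user n."""
    return sum(q_prev[(f, n)] ** 2 - q_curr[(f, n)] ** 2 for f in flows_at_n)

# q[(f, n)]: queue of flow f at user n, before and after one time slot (illustrative)
q_prev = {(0, 1): 3, (1, 1): 2}
q_curr = {(0, 1): 2, (1, 1): 2}
print(local_reward(q_prev, q_curr, 1, [0, 1]))   # (9 - 4) + (4 - 4) = 5
```

A positive local reward means the queues at user n shrank, which is what the secondary users' coalition tries to achieve.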
For the PUE attack game, we define the value of the game as follows [2].
Definition 1. The value of the PUE attack game is given by
6 Numerical Results
In this section, we use numerical results to demonstrate the theoretical analysis. In Fig. 4, we show the rate region subject to PUE attacks for the network in Fig. 2, comparing the uniformly random attack, the Nash-equilibrium attack, and the no-attack case. The strategies are computed using the approach in [5]. Since there are infinitely many possible queue lengths, thus resulting in infinitely many system states, we aggregate all states with more than 9 packets in a queue into a single state. (Fig. 4 plots the achievable region over the flow arrival rates λ1 and λ2, each ranging from 0 to 1.)
7 Conclusions
In this paper, we have studied multiple attackers and an arbitrary cognitive radio
network with multiple data flows, where the goal of the game is to stabilize
(destabilize) the queuing dynamics by the secondary users (attackers). Both
the centralized and decentralized cases of the game have been considered. The
Lyapunov drift and the back pressure are considered as the game rewards for
the stochastic game case and the myopic strategy case, respectively. The value functions and Nash equilibria have been obtained for the general case, while explicit expressions are obtained for simple but typical scenarios. Numerical simulations have been carried out to demonstrate the analysis.
References
1. Chen, R., Park, J.-M., Reed, J.H.: Defense against primary user emulation attacks
in cognitive radio networks. IEEE J. on Select. Areas in Commun. Special Issue
on Cognitive Radio Theory and Applications 26(1) (2008)
2. Chornei, R.K., Daduna, H., Knopov, P.S.: Control of Spatially Structured Random
Processes and Random Fields with Applications. Springer (2006)
3. Daskalakis, C., Papadimitriou, C.: Computing pure Nash equilibria in graphical games via Markov random fields. In: Proc. of the 7th ACM Conference on Electronic Commerce (2006)
4. Elkind, E., Goldberg, L., Goldberg, P.: Graphical games on trees revisited. In: Proc. of the 7th ACM Conference on Electronic Commerce (2006)
5. Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer (1997)
6. Han, Z., Pandana, C., Liu, K.J.R.: Distributive opportunistic spectrum access for
cognitive radio using correlated equilibrium and no-regret learning. In: Proc. of
IEEE Wireless Communications and Networking Conference, WCNC (2007)
7. Jin, Z., Anand, S., Subbalakshmi, K.P.: Detecting primary user emulation attacks
in dynamic spectrum access networks. In: Proc. of IEEE International Conference
on Communications, ICC (2009)
8. Kakade, S., Kearns, M., Langford, J., Ortiz, L.: Correlated equilibria in graphical
games. In: Proc. of the 4th ACM Conference on Electronic Commerce, EC (2003)
9. Kakade, S.M., Kearns, M., Ortiz, L.E.: Graphical Economics. In: Shawe-Taylor,
J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 17–32. Springer,
Heidelberg (2004)
10. Korilis, Y.A., Lazar, A.A.: On the existence of equilibria in noncooperative optimal flow control. Journal of the ACM 42, 584–613 (1995)
11. Li, H., Han, Z.: Dogfight in spectrum: Jamming and anti-jamming in cognitive
radio systems. In: Proc. of IEEE Conference on Global Communications, Globecom
(2009)
12. Li, H., Han, Z.: Blind dogfight in spectrum: Combating primary user emulation
attacks in cognitive radio systems with unknown channel statistics. In: Proc. of
IEEE International Conference on Communications, ICC (2010)
13. Li, H., Han, Z.: Competitive spectrum access in cognitive radio networks: Graphical
game and learning. In: Proc. of IEEE Wireless Communication and Networking
Conference, WCNC (2010)
14. Mitola, J.: Cognitive radio for flexible mobile multimedia communications. In: Proc.
IEEE Int. Workshop Mobile Multimedia Communications, pp. 3–10 (1999)
15. Neely, M.J.: Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool (2010)
16. Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V. (eds.): Algorithmic Game
Theory. Cambridge University Press (2007)
17. Qin, T., Yu, H., Leung, C., Sheng, Z., Miao, C.: Towards a trust aware cognitive
radio architecture. ACM SIGMOBILE Newsletter 13 (April 2009)
18. Sampath, A., Dai, H., Zheng, H., Zhao, B.Y.: Multi-channel jamming attacks using
cognitive radios. In: Proc. of IEEE Conference on Computer Communications and
Networks, ICCCN (2007)
19. Shapley, L.S.: Stochastic games. Proceedings of the National Academy of Sciences USA 39, 1095–1100 (1953)
20. Tassiulas, L., Ephremides, A.: Stability properties of constrained queuing systems
and scheduling for maximum throughput in multihop radio networks. IEEE Trans.
Automat. Control 37, 1936–1949 (1992)
21. Thomas, R.W., Komali, R.S., Borghetti, B.J., Mahonen, P.: A Bayesian game
analysis of emulation attacks in dynamic spectrum access networks. In: Proc. of
IEEE International Symposium of New Frontiers in Dynamic Spectrum Access
Networks, DySPAN (2008)
22. Urgaonkar, R., Neely, M.J.: Opportunistic scheduling with reliability guarantees
in cognitive radio networks. IEEE Trans. Mobile Computing 8, 766–777 (2009)
23. Wang, W., Li, H., Sun, Y., Han, Z.: Attack-proof collaborative spectrum sensing
in cognitive radio networks. In: Proc. of Conference on Information Sciences and
Systems, CISS (2009)
24. Wang, W., Li, H., Sun, Y., Han, Z.: CatchIt: Detect malicious nodes in collabora-
tive spectrum sensing. In: Proc. of IEEE Conference on Global Communications,
Globecom (2009)
25. Wu, X., Srikant, R.: Regulated maximal matching: A distributed scheduling al-
gorithm for multihop wireless networks with node-exclusive spectrum sharing. In:
Proc. of 44th IEEE Conference on Decision and Control (2005)
26. Yao, D.: S-modular games, with queuing applications. Queuing Systems and Their
Applications 21, 449–475 (1995)
27. Ying, L., Srikant, R., Eryilmaz, A., Dullerud, G.E.: Distributed fair resource al-
location in cellular networks in the presence of heterogeneous delays. In: Proc. of
IEEE International Symposium on Modeling and Optimization in Mobile, Ad Hoc
and Wireless Networks, WIOPT (April 2005)
Revenue Maximization
in Customer-to-Customer Markets
1 Introduction
Electronic commerce markets have witnessed an explosive growth over the past
decade and have now become an integral part of our everyday lives. In the
realm of electronic commerce, customer-to-customer, also known as consumer-to-
consumer (C2C), markets are becoming more and more popular, as they provide
a convenient platform allowing customers to easily engage in business with each
other. A well-known C2C market is eBay, on which a wide variety of products,
including second-hand goods, are sold.
As a major source of revenue, a C2C market owner charges various fees, which
we refer to as transaction fees, for products sold in the market. For instance, eBay
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 209–223, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
210 S. Ren and M. van der Schaar
charges a final value fee and a listing fee for each sold item [1]. Hence, to enhance
a C2C market’s profitability, it is vital for the market owner to appropriately
set the transaction fee. In this paper, we focus on a C2C market and address
the problem of maximizing the market owner’s revenue. The scenario that we
consider is summarized as follows.
1. The market owner monetizes the market by charging transaction fees for
each product sold and (possibly) through advertising in the market. For the
completeness of analysis, we also allow the market owner to reward the sellers
to encourage them to sell products in the market, which increases the market’s
website traffic and hence advertising revenues if applicable. Although rewarding
sellers may seem to deviate from our initial goal of collecting transaction fees
from sellers, we shall show that rewarding sellers may also maximize the market
owner’s revenue under certain circumstances.
2. Products are sold by sellers and purchased by buyers at fixed prices. Promotional activities (e.g., monetary rewards, rebates) and/or auctions are not considered in our study.
3. Buyers do not need to pay the market owner (e.g., membership fees) in
order to purchase products in the market, and they can directly interact with
sellers that participate in the market (e.g., eBay).
In the following analysis, we adopt a leader-follower model (i.e., the mar-
ket owner is the leader, followed by the sellers and then by the buyers), which
is described in Fig. 1. Note that, without causing ambiguity, we refer to the market owner as the intermediary for brevity where applicable. Fig. 1 also shows the interdependencies of the different decision stages. The intermediary's transaction fee
decision will directly affect the sellers’ participation in the market, while the
sellers’ selling decisions influence the buyers’ purchasing decisions. Based on
backward induction, we first use a model with a representative buyer, which is
a collection of all the individual buyers, to determine the sales of products sold
in the market. As a distinguishing feature, our model captures the (implicit)
competition among the sellers, which is typically neglected in existing two-sided
market research [8], and also the buyers’ preference towards a bundle of diversi-
fied products. Then, we study the selling decisions made by self-interested sellers.
It is shown that there always exists a unique equilibrium point at which no seller
can gain by changing its selling decision, which makes it possible for the inter-
mediary to maximize its revenue without uncertainties. Next, we formulate the
intermediary’s revenue maximization problem and develop an efficient algorithm
to derive the optimal transaction fee that maximizes the intermediary’s revenue.
Finally, we conduct simulations to complement our analysis and show that the
intermediary’s revenue can be significantly increased by optimally choosing the
transaction fee, even though sellers and buyers make self-interested and rational
decisions.
The rest of this paper is organized as follows. Related work is reviewed in Sec-
tion 2. Section 3 describes the model. In Section 4, we study the decisions made
by the buyers and sellers, and derive the optimal transaction fee maximizing the
intermediary’s revenue. Finally, concluding remarks are offered in Section 5.
Revenue Maximization 211
2 Related Works
We briefly summarize the existing related works in this section.
If the intermediary chooses to reward the sellers, then the transaction fee
is essentially an incentive for sellers to sell products in the market. Various
incentive mechanisms have been proposed recently. For instance, the authors
in [3] proposed eliminating or hiding low-quality content to provide content
producers with incentives to generate high-quality content. In [4], two scoring rules, the approval-voting scoring rule and the proportional-share scoring rule, were proposed to elicit high-quality answers in online question-and-answer forums (e.g., Yahoo! Answers). The authors in [5] proposed a (virtual) reward-
based incentive mechanism to improve the overall task completion probability in
collaborative social media networks. If the intermediary charges the sellers, then
our work can be classified as market pricing. By considering a general two-sided
market, the authors in [8] studied the tradeoffs between the merchant mode and
the platform mode, and showed the conditions under which the merchant or
platform mode is preferred. Focusing on the Internet markets, [10] revealed that
a neutral network is inferior to a non-neutral one in terms of social welfare when
the ratio between advertising rates and end user price sensitivity is either too
high or too low.
In economics literature, C2C markets are naturally modeled as two-sided mar-
kets, where two user groups (i.e., sellers and buyers in this paper) interact and
provide each other with network benefits. Nevertheless, most two-sided market
research has neglected intra-group externalities (e.g., see [11][12] for a survey), which in the context of C2C markets correspond to the sellers' competition. A few
recent studies on two-sided markets explicitly considered intra-group external-
ities. For instance, [13] studied the optimal pricing problem to maximize the
platform’s profit for the payment card industry with competition among the
merchants. [14] considered the sellers’ competition in a two-sided market with
differentiated products. More recently, considering intra-group competition, [15]
studied the problem of whether an entrant platform can divert agents from the
existing platform and make a profit. Nevertheless, the focus in all these works
was market pricing, whereas in our work the intermediary can either charge or
reward the sellers. Moreover, the existing studies on two-sided markets typically
neglected product substitutability as well as buyers’ “love for variety”.
To summarize, this paper derives the optimal transaction fee, and determines
analytically when the intermediary should subsidize sellers to maximize its rev-
enue. Unlike general two-sided market research (e.g., [11][12]), this paper con-
siders both the sellers’ competition and the product substitutability, which are
key features of C2C markets and, as shown in this paper, significantly impact
the optimal transaction fee of C2C platforms.
3 Model
We first specify the basic modeling details of the intermediary, sellers and buyers,
and then discuss the model extension.
3.1 Intermediary
An important and prevailing charging model in C2C markets is that, for each sold
product, the intermediary charges a transaction fee that is proportional to the
product price (e.g., final value fee in eBay). From the perspective of sellers, sellers
pay to the intermediary when their products are sold, i.e., “pay-per-sale”. In this
paper, we concentrate on the “pay-per-sale” model. Nevertheless, it should be
noted that other fees may also be levied on product sales, e.g., eBay charges
a lump-sum listing fee for listing a product regardless of the quantities sold
[1]. Investigating more sophisticated charging models (e.g., “pay-per-sale” plus
lump-sum fee) is part of our ongoing research. As in many real C2C markets
such as eBay, buyers do not need to pay the intermediary (e.g., membership
fees) in order to purchase products in the market.
To formally state our model, we denote by x̄ ≥ 0 the sales volume (i.e., the quantity of sold products) in the market, and by θ the transaction fee1 that the intermediary charges the sellers for each of their sold products. For ease of presentation, we assume in our basic model that all the products belong to the same category and have the same price; hence, θ is the same for all the products. This assumption is valid if all the sellers sell similar and/or standardized
ucts. This assumption is valid if all the sellers sell similar and/or standardized
products (e.g., books, CDs) and, due to perfect competition, set the same price
for their products [8][19]. Recent research support the assumption of a uniform
product price by showing that price dispersion in online shopping sites is fairly
small, i.e., prices offered by different sellers for the same or similar products are
very close to each other [6]. Moreover, if the considered C2C market is an online
labor market in which sellers “sell” their services (e.g., skills, knowledge, etc.),
1. Note that θ is actually the percentage of the product price charged by the intermediary. However, since we later normalize the product price to 1, θ can also represent the absolute transaction fee charged by the intermediary.
Revenue Maximization 213
the assumption of different services having the same price is reasonable when the
offered services are of the same or similar types (see, e.g., Fiverr, an emerging
C2C market where the “sellers” offer, possibly different, services and products
for a fixed price of US$ 5.00 [2]). We should also make it clear that our analysis
can be generalized and applied if different products are sold at different prices
(see Section 3.4 for a detailed discussion). Besides the transaction fees charged
for product sales, the intermediary may also receive advertising revenues by
displaying contextual advertisement on its website. In general, the advertising
revenue is approximately proportional to page views (i.e., the number of times
that the webpages are viewed), which are also approximately proportional to
sales volume in the market. Thus, overall, the advertising revenue is approxi-
mately proportional to the sales volume. Let b ≥ 0 be the (average) advertising
revenue that the intermediary can derive from each sold product. For the conve-
nience of analysis, we assume that b is constant regardless of x̄, i.e., the average
advertising revenue is independent of the sales volume. Next, we can express the
intermediary’s revenue as2
ΠI = (b + θ) · x̄. (1)
Remark 1: For the completeness of analysis, we allow θ to take negative values,
in which case the intermediary rewards the sellers for selling their products. This
may occur if the intermediary can derive a sufficiently high advertising revenue
per page view and hence would like to encourage more sellers to participate in its
market, which attracts more buyers and increases the website traffic (and hence,
higher advertising revenues, too). In the following analysis, we use the term
transaction fee (per sold product) to refer to θ wherever applicable, regardless
of its positive or negative sign.
Remark 2: While b can be increased by using sophisticated advertising algorithms
showing more relevant advertisement, we assume throughout the paper that b
is exogenously determined and fixed, and shall focus on deriving the optimal θ
that maximizes the intermediary’s revenue.
Remark 3: As in [8], we focus on only one C2C market in this paper. Although
the competition among various C2C markets is not explicitly modeled, we do
consider that online buyers can purchase products from other markets (see Sec-
tion 3.3 for details).
3.2 Sellers
As evidenced by the exploding number of sellers on eBay, a popular C2C market
can attract a huge number of sellers. To capture this fact, we use a continuum
model and assume that the mass of sellers is normalized to one. Each seller can
sell products of a certain quality while incurring a lump-sum cost, which we
refer to as selling cost, regardless of the sales volume. Note that the product
2. The expression in (1) can also be considered as the intermediary's profit, if we treat b as the average advertising profit for each sold product and neglect the intermediary's recurring fixed operational cost.
214 S. Ren and M. van der Schaar
quality can be different across sellers, although we assume in our basic model
that the selling cost is the same for all sellers. We should emphasize that the
product quality is represented by a scalar and, as a generalized concept, is jointly
determined by a variety of factors including, but not limited to, product popularity, seller ratings, customer service and product reviews [7]. For instance, even
though two sellers with different customer ratings sell the same product, we say
that the product sold by the seller with a higher rating has a higher quality. The
scalar representation of product quality, i.e., abstracting and aggregating vari-
ous factors to one value, is indeed an emerging approach to representing product
quality [7]. Mathematically, we denote qi ≥ 0 and c > 0 as the product quality
sold by seller i and the selling cost, respectively. Without causing ambiguity, we
occasionally use product qi to refer to the product with a quality qi . To charac-
terize heterogeneity in the product quality, we assume that the product quality
q follows a distribution in a normalized interval [0, 1] across the unit mass of
sellers and the cumulative distribution function (CDF) is denoted by F (q) for
q ∈ [0, 1] . In other words, F (q) denotes the number or fraction of sellers whose
products have a quality less than or equal to q ≥ 0. In what follows, we shall
explicitly focus on the uniform distribution, i.e., F (q) = q for q ∈ [0, 1], when
we derive specific results, although other CDFs can also be considered and our
approach of analysis still applies.3 Note that scaling the interval [0, 1] to [0, q̄]
does not affect the analysis, but will only complicate the notations.
As stated in the previous subsection, we assume in our basic model that all
the products are sold at the same price in the market. Hence, without loss of
generality, we normalize the product price to 1. Denote the profit that each seller
can obtain by selling a product by s ∈ (0, 1), which is assumed to be same for all
the sellers, and let x(qi ) ≥ 0 be the sales volume for product qi . Heterogeneous
product profits (i.e., different s for different sellers) can be treated in the same
way as treating heterogeneous product prices (see Section 3.4 for details). In our
model, sellers are rational and each seller makes a self-interested binary decision:
sell or not sell products in the considered C2C market. If seller i chooses to sell
products in the market, it can derive a profit expressed as
πi = (s − θ) · x(qi ) − c, (2)
where θ is the transaction fee charged by the intermediary per product sale, and
c is the (lump-sum) selling cost. Seller i obtains zero profit if it chooses not to
sell products in the market. By the assumption of rationality, seller i chooses to
sell products if and only if its profit is non-negative. It is intuitively expected
that, with the same price, a product with a higher quality will have a higher
sales volume (and yield a higher profit for its seller, too) than the one with a
lower quality.4 Thus, the sellers’ selling decisions have a threshold structure. In
particular, there exist marginal sellers whose products have a quality denoted
3. The uniform distribution has been widely applied to model the diversity of various factors, such as opportunity cost [8] and valuation of quality-of-service [9].
4. This statement can also be mathematically proved, while the proof is omitted here for brevity.
Revenue Maximization 215
by qm ∈ [0, 1], and those sellers whose product quality is greater (less) than
qm will (not) choose to sell products in the market. We refer to qm as the
marginal product quality. Next, it is worthwhile to provide the following remarks
concerning the model of sellers.
Remark 4: In our model, a seller who sells m ≥ 1 different products is viewed as
m sellers, each of whom sells a single product, and the total selling cost is m · c
(i.e., constant returns to scale [8]).
Remark 5: The lump-sum selling cost c accounts for a variety of fixed costs for
selling products. For instance, sellers need to spend time in purchasing products
from manufactures and in listing products in the market. Moreover, as charged
by eBay, a small amount of lump-sum fee, i.e., listing fee, may also be charged
for listing a product (although we do not explicitly consider this revenue for
maximizing intermediary’s revenue) [1]. As in [8], we assume that the sellers will
incur a predetermined selling cost if they choose to sell products in the market.
For the ease of presentation, we consider a homogeneous selling cost among the
sellers, while we shall discuss the extension to heterogeneous selling costs in
Section 3.4.
Remark 6: In our model, sellers always have products available if buyers would
like to purchase. That is, “out of stock” does not occur.
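The threshold structure of the selling decisions can be illustrated with a small numerical sketch of the rule in (2). The sales-volume function x(q) = 10q² and the parameter values below are made up for illustration only; they stand in for the equilibrium demand derived later from the buyer model.

```python
# Sellers' decision rule from (2): seller with quality q sells iff
# (s - theta) * x(q) - c >= 0.  x(q) = 10 q^2 is a made-up increasing
# sales-volume function; all parameter values are illustrative.
def seller_profit(q, s, theta, c, x):
    return (s - theta) * x(q) - c

s, theta, c = 0.5, 0.1, 0.9
x = lambda q: 10.0 * q ** 2

qualities = [k / 100 for k in range(101)]     # unit mass of sellers on [0, 1]
active = [q for q in qualities if seller_profit(q, s, theta, c, x) >= 0]

# Because x(q) is increasing in q, the active sellers form an upper
# interval [q_m, 1] of the quality range, i.e., a threshold structure:
q_m = min(active)
print(f"marginal product quality q_m = {q_m:.2f}")   # 0.48 here
```

Any increasing x(q) yields the same threshold structure; only the location of q_m changes.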
3.3 Buyers
We adopt the widely-used representative agent model to determine how the total
budget (i.e., buyers’ expenditure in online shopping) is allocated across a vari-
ety of products [18]. Specifically, the representative buyer optimally allocates
its total budget, denoted by T , across the available products to maximize its
utility. Note that T can be interpreted as the size of the representative buyer or
the online shopping market size. In addition to purchasing products sold in the
considered C2C market, buyers may also have access to products sold in other
online markets (e.g., business-to-customer shopping sites and/or other C2C mar-
kets), and we refer to these products as outside products. Similarly, we refer to
those online markets where outside products are sold as outside markets. Focus-
ing on the intermediary’s optimal transaction fee decision, we do not consider
the details of how or by whom outside products are sold. Instead, we assume
that the mass of outside products is na ≥ 0 and the outside product quality
follows a certain CDF F̃ (q) with support q ∈ [ql , qh ], where 0 ≤ ql < qh are
the lowest and highest product quality of outside products, respectively. For the
convenience of notation, throughout the paper, we alternatively represent the
outside products using a unit mass of products with an aggregate quality of qa ,
without affecting the analysis. Note that qa is a function of na ≥ 0, F̃ (q) and
the utility function of the representative buyer. In particular, given a uniform
distribution of outside product quality and the quality-adjusted Dixit-Stiglitz
utility for the representative buyer (which we shall define later), we can readily
obtain
$$q_a = \left( \frac{n_a \,\big(q_h^{\sigma+1} - q_l^{\sigma+1}\big)}{1+\sigma} \right)^{1/\sigma}. \qquad (3)$$
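Reading the outside products as having density n_a per unit of quality on [q_l, q_h], the closed form (3) can be checked against a direct numerical evaluation of the CES aggregate q_a^σ = ∫ n_a q^σ dq; a small sketch with illustrative parameter values:

```python
# Check the closed form (3) for the aggregate outside quality q_a against a
# midpoint-rule evaluation of q_a^sigma = integral_{ql}^{qh} n_a * q^sigma dq.
# Parameter values are illustrative, not taken from the paper.
sigma, n_a, ql, qh = 2.0, 1.5, 0.2, 0.9

q_a_closed = (n_a * (qh ** (sigma + 1) - ql ** (sigma + 1)) / (1 + sigma)) ** (1 / sigma)

steps = 100_000
dq = (qh - ql) / steps
integral = sum(n_a * (ql + (k + 0.5) * dq) ** sigma * dq for k in range(steps))
q_a_numeric = integral ** (1 / sigma)

print(q_a_closed, q_a_numeric)   # the two values agree to high precision
```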
Heterogeneous Selling Costs. The assumption that all the sellers have the
same (homogeneous) selling cost can be relaxed to consider that different sellers
have heterogeneous selling costs. Specifically, as in [20], we assume that there
are K ≥ 1 possible values of selling costs, denoted by c1 , c2 , . . . , cK , where 0 <
c1 ≤ c2 · · · ≤ cK , and refer to sellers with the selling cost of ck as type-k sellers.
Under the continuum model, the (normalized) mass of type-k sellers is nk > 0
such that $\sum_{k=1}^{K} n_k = 1$. To model the product quality heterogeneity, we consider
that the product quality of type-k sellers follows a continuous and positive CDF
denoted by Fk (q) > 0 for q ∈ [0, 1]. Thus, the fraction of type-k sellers whose
product quality is less than or equal to q ∈ [0, 1] is given by nk Fk (q). Following
a framework of analysis similar to the one illustrated in Fig. 1, we can show
that there exists a unique equilibrium outcome in the selling decision stage, and
develop a recursive algorithm to derive the optimal transactions fee to maximize
the intermediary’s revenue.
$$x^*(q) = \frac{T(\sigma+1)\,q^\sigma}{(\sigma+1)\,q_a^\sigma + 1 - q_m^{\sigma+1}}, \qquad (5)$$
for $q \in [q_m, 1]$, $x^*(q) = 0$ for $q \in [0, q_m)$, and $x_a^* = \frac{T(\sigma+1)\,q_a^\sigma}{(\sigma+1)\,q_a^\sigma + 1 - q_m^{\sigma+1}}$. The details of deriving (5) are omitted for brevity. After plugging $x^*(q)$ and $x_a^*$ into (4), the
maximum utility derived by the representative buyer is given by
$$U(x^*(q), x_a^*) = T \left( q_a^\sigma + \frac{1 - q_m^{\sigma+1}}{\sigma+1} \right)^{1/\sigma}, \qquad (6)$$
which is decreasing in qm ∈ [0, 1]. Note that the other concave utility functions
can also be considered, although an explicit closed-form solution may not exist.
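A useful sanity check on (5) is that the allocation exhausts the budget: integrating x*(q) over [q_m, 1] and adding x_a* gives exactly T. A numerical sketch with illustrative parameter values:

```python
# Verify that the optimal allocation (5) spends the whole budget T:
# integral_{qm}^{1} x*(q) dq + x_a* = T.  Parameters are illustrative.
sigma, T, q_a, q_m = 2.0, 40.0, 3.0, 0.7

D = (sigma + 1) * q_a ** sigma + 1 - q_m ** (sigma + 1)   # common denominator in (5)

def x_star(q):
    return T * (sigma + 1) * q ** sigma / D if q >= q_m else 0.0

x_a_star = T * (sigma + 1) * q_a ** sigma / D

steps = 200_000
dq = (1 - q_m) / steps
spent = sum(x_star(q_m + (k + 0.5) * dq) * dq for k in range(steps)) + x_a_star
print(spent)   # approximately 40.0, i.e., the whole budget T
```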
where $[\nu]_0^1 := \min\{1, \max\{0, \nu\}\}$. Thus, an equilibrium selling decision exists if and only if the mapping $Q(q_m)$, defined in (7), has a fixed point. Next, we formally define the equilibrium marginal product quality in terms of $q_m^*$ as below.
Definition 1: $q_m^*$ is an equilibrium marginal product quality if it satisfies $q_m^* = Q(q_m^*)$.
We establish the existence and uniqueness of an equilibrium marginal product
quality in Theorem 1, whose proof is omitted for brevity. For the proof technique,
interested readers may refer to [20] where we consider a user-generated content
platform.
Theorem 1. For any $\theta \in [-b, s]$, there exists a unique equilibrium $q_m^* \in (0, 1]$ in the selling decision stage. Moreover, $q_m^*$ satisfies
$$q_m^* = 1 \;\text{ if } x^*(1 \mid 1, q_a)\,(s - \theta) \le c, \quad \text{and } q_m^* \in (0, 1) \text{ otherwise}. \qquad (8)$$
6. When $q_m \to 1$, only a negligible fraction of sellers choose to sell products in the market.
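Since x*(q)(s − θ) is increasing in q, the equilibrium of Theorem 1 can be computed by bisection on the marginal seller's indifference condition x*(q_m)(s − θ) = c; a sketch using the Fig. 2 parameter values (T = 40, c = 1.0) and θ = 0.2:

```python
# Bisection for the equilibrium marginal quality q_m^* of Theorem 1:
# the marginal seller satisfies x*(q_m) * (s - theta) = c, with
# x*(q) = T(sigma+1) q^sigma / ((sigma+1) q_a^sigma + 1 - q^{sigma+1}),
# and q_m^* = 1 when even the best product cannot recover the selling cost.
sigma, T, q_a, s, c, theta = 2.0, 40.0, 3.0, 0.5, 1.0, 0.2

def x_star(q):
    return T * (sigma + 1) * q ** sigma / ((sigma + 1) * q_a ** sigma + 1 - q ** (sigma + 1))

def equilibrium_qm():
    g = lambda q: x_star(q) * (s - theta) - c       # increasing in q
    if g(1.0) <= 0:
        return 1.0                                  # case q_m^* = 1 in (8)
    lo, hi = 1e-9, 1.0
    for _ in range(80):                             # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

qm = equilibrium_qm()
print(round(qm, 3))   # approximately 0.871 for these parameters
```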
where $\bar{x} = \int_{q_m^*}^{1} x^*(q \mid q_m^*, q_a)\, dF(q)$. The decision interval is shrunk to $[-b, \bar{\theta}]$, since $\theta \in (\bar{\theta}, s]$ always results in a zero revenue for the intermediary, where $\bar{\theta}$ is defined in (9). In the following analysis, a closed-form optimal transaction fee $\theta^* \in [-b,\; s - c\,q_a^\sigma/T]$ is obtained and shown in Theorem 2.
Theorem 2. The unique optimal transaction fee $\theta^* \in [-b, \bar{\theta}]$ that maximizes the intermediary's revenue is given by
$$\theta^* = s - \frac{c\,\big[(\sigma+1)\,q_a^\sigma + 1 - z^{\sigma+1}\big]}{T(\sigma+1)\,z^\sigma}, \qquad (11)$$
where $z \in [q_m^*(-b), 1]$ is the unique root of the equation⁷
$$\frac{T\,q_a^\sigma\,(b+s)}{\big[(\sigma+1)\,q_a^\sigma + 1 - z^{\sigma+1}\big]^2} \;-\; \frac{c}{(\sigma+1)^3} \cdot \frac{\sigma + z^{\sigma+1}}{z^{2\sigma+1}} \;=\; 0. \qquad (12)$$
Proof. Due to space limitations, we only provide the proof outline. Instead of
directly solving (10), we first find the optimal (equilibrium) marginal product
quality, which is the root of (12). Then, based on the marginal user principle, we
can obtain the optimal transaction fee θ∗ maximizing the intermediary’s revenue.
The detailed proof technique is similar to that in [20].
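Theorem 2 can be checked numerically: sweep θ over [−b, s], compute the equilibrium q_m^* and the revenue (b + θ)x̄ for each value, and compare the grid argmax with the closed form (11), whose root z is obtained from (12) by bisection. A sketch with the Fig. 2 parameters (σ = 2, b = 0.2, s = 0.5, q_a = 3, T = 40, c = 1):

```python
# Numerical check of Theorem 2: grid-search the revenue-maximizing fee and
# compare with the closed form (11), whose root z solves (12).
sigma, b, s, q_a, T, c = 2.0, 0.2, 0.5, 3.0, 40.0, 1.0

def D(q):           # common denominator (sigma+1) q_a^sigma + 1 - q^{sigma+1}
    return (sigma + 1) * q_a ** sigma + 1 - q ** (sigma + 1)

def qm_star(theta):  # equilibrium marginal quality (bisection, Theorem 1)
    g = lambda q: T * (sigma + 1) * q ** sigma / D(q) * (s - theta) - c
    if g(1.0) <= 0:
        return 1.0
    lo, hi = 1e-9, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def revenue(theta):  # (b + theta) * x_bar, x_bar = T(1 - qm^{sigma+1})/D(qm)
    qm = qm_star(theta)
    return (b + theta) * T * (1 - qm ** (sigma + 1)) / D(qm)

# Grid search over [-b, s]:
grid = [-b + k * (s + b) / 2000 for k in range(2001)]
theta_grid = max(grid, key=revenue)

# Closed form: root z of (12) by bisection, then theta* from (11):
f12 = lambda z: (T * q_a ** sigma * (b + s) / D(z) ** 2
                 - c / (sigma + 1) ** 3 * (sigma + z ** (sigma + 1)) / z ** (2 * sigma + 1))
lo, hi = 0.05, 1.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f12(mid) < 0 else (lo, mid)
z = 0.5 * (lo + hi)
theta_closed = s - c * D(z) / (T * (sigma + 1) * z ** sigma)

print(round(theta_grid, 3), round(theta_closed, 3))   # both close to 0.115
```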
Next, we note that, to maximize its revenue, the intermediary may even reward
the sellers for selling products in the market, i.e., θ∗ < 0. In particular, “reward-
ing” should be applied if one of the following cases is satisfied:
In the first four cases, few sellers can receive a non-negative profit by sell-
ing products without being economically rewarded by the intermediary (e.g.,
if the selling cost c is very high, then sellers need to receive subsidy from the
intermediary to cover part of their selling costs). The last case indicates that if
the intermediary can derive a sufficiently high advertising revenue for each sold
product, then it can share the advertising revenue with the sellers to encourage
them to sell products in the market such that the intermediary can increase its
total advertising revenue. In Fig. 2, we illustrate the impacts of transaction fees
on the intermediary’s revenue. Note that the numeric settings for Fig. 2 are only
for the purpose of illustration and our analysis applies to any other settings.
For instance, with all the other parameters being the same, a larger value of T
indicates that the buyers spend more money in online shopping (i.e., the online
shopping market size is bigger). In practice, the intermediary needs to obtain
7. $q_m^*(-b)$ is the equilibrium point in the product selling stage when $\theta = -b$.
[Figure 2: two plots of the intermediary's revenue versus the transaction fee θ, each marking the regions where "rewarding" (θ < 0) and "charging" (θ > 0) is optimal; the upper plot shows curves for c = 1.0, 1.5, 2.0 and the lower plot for T = 20, 30, 40.]
Fig. 2. Revenue versus transaction fee. σ = 2.0, b = 0.2, s = 0.5, qa = 3.0. Upper:
T = 40; Lower: c = 1.0.
real market settings by conducting market surveys, data analysis, etc. [8]. The
upper plot of Fig. 2 verifies that the intermediary should reward the sellers if the
selling cost is high, while the lower plot indicates the intermediary should share
its advertising revenue with sellers in an emerging online shopping market (i.e.,
the market size is small). We also observe from Fig. 2 that by optimally choos-
ing the transaction fee θ∗ , the intermediary can significantly increase its revenue
compared to setting a non-optimal transaction fee (e.g., θ = 0). For instance,
the upper plot in Fig. 2 shows that with an optimal transaction fee and c = 1.0,
the intermediary’s revenue increases by nearly 30% compared to θ = 0 (i.e., the
intermediary only relies on advertising revenues). Due to the space limitation,
we omit more numerical results and the analytical condition specifying when the
intermediary should reward sellers (i.e., θ∗ < 0) to maximize its revenue.
5 Conclusion
In this paper, we studied a C2C market and proposed an algorithm to iden-
tify the optimal transaction fee to maximize the intermediary’s revenue while
taking into account the customer rationality. We first used the representative
buyer model to determine how the buyers’ total budget is allocated across a
variety of products. Then, we showed that there always exists a unique equilib-
rium point at which no seller can gain by changing its selling decision. Next, we
formalized the intermediary’s revenue maximization problem and, by using the
quality-adjusted Dixit-Stiglitz utility function and the uniform distribution of product qualities, derived the closed-form optimal solution explicitly.
We discussed qualitatively the impacts of the aggregate outside product quality
and product substitutability on the intermediary's revenue. Extensions to heterogeneous selling costs and product prices were also addressed. Our results showed
References
1. eBay Seller Fees, http://pages.ebay.com/help/sell/fees.html
2. Fiverr, http://www.fiverr.com
3. Ghosh, A., McAfee, P.: Incentivizing high-quality user-generated content. In: 20th
Intl. Conf. World Wide Web (2011)
4. Jain, S., Chen, Y., Parkes, D.C.: Designing incentives for online question and an-
swer forums. In: ACM Conf. Electronic Commerce (2009)
5. Singh, V.K., Jain, R., Kankanhalli, M.S.: Motivating contributors in social media
networks. In: ACM SIGMM Workshop on Social Media (2009)
6. Ghose, A., Yao, Y.: Using transaction prices to re-examine price dispersion in
electronic markets. Info. Sys. Research 22(2), 1526–5536 (2011)
7. McGlohon, M., Glance, N., Reiter, Z.: Star quality: aggregating reviews to rank
products and merchants. In: Intl. Conf. Weblogs Social Media, ICWSM (2010)
8. Hagiu, A.: Merchant or two-sided platform? Review of Network Economics 6(2),
115–133 (2007)
9. Jin, Y., Sen, S., Guerin, R., Hosanagar, K., Zhang, Z.-L.: Dynamics of competition
between incumbent and emerging network technologies. NetEcon (August 2008)
10. Musacchio, J., Kim, D.: Network platform competition in a two-sided market:
Implications to the net neutrality issue. In: TPRC: Conf. Commun., Inform., and
Internet Policy (September 2009)
11. Rochet, J.C., Tirole, J.: Platform competition in two-sided markets. Journal of the
European Economic Association 1, 990–1029 (2003)
12. Rochet, J.C., Tirole, J.: Two-sided markets: A progress report. RAND Journal of
Economics 37, 645–667 (2006)
13. Rochet, J.C., Tirole, J.: Cooperation among competitors: Some economics of pay-
ment card associations. Rand Journal of Economics 33, 549–570 (2002)
14. Nocke, V., Peitz, M., Stahl, K.: Platform ownership. Journal of the European
Economic Association 5, 1130–1160 (2007)
15. Belleflamme, P., Toulemonde, E.: Negative intra-group externalities in two-sided
markets. CESifo Working Paper Series
16. Evans, G.W., Honkapohja, S.: Learning and Expectations in Macroeconomics.
Princeton Univ. Press, Princeton (2001)
17. Dixit, A.K., Stiglitz, J.E.: Monopolistic competition and optimum product diver-
sity. American Economic Review 67(3), 297–308 (1977)
18. Hallak, J.C.: The effect of cross-country differences in product quality on the direction of international trade. Working Paper, Univ. Michigan, Ann Arbor, MI (2002)
19. Rochet, J.C., Tirole, J.: Two-sided markets: A progress report. RAND J. Eco-
nomics 37(3), 645–667 (2006)
20. Ren, S., Park, J., van der Schaar, M.: Maximizing profit on user-generated content
platforms with heterogeneous participants. In: IEEE Infocom (2012)
21. Munkres, J.R.: Elements of Algebraic Topology. Perseus Books Pub., New York
(1993)
A Stackelberg Game to Optimize the Distribution
of Controls in Transportation Networks
1 Introduction
In this article, we study from a theoretical point of view the problem of allocat-
ing inspectors to spatial locations of a transportation network, in order to enforce
the payment of a transit fee. The question of setting an optimal level of control in
transportation networks has been addressed by several authors, but to the best
of our knowledge, none of them takes the topology of the network and the spatial
distribution of the inspectors into account. Simple game theoretic models have
been proposed to model the effect of the control intensity on the behaviour of the
users of the network [4], to find an optimal trade-off between the control costs
and the revenue from the network fee [1], or to evaluate the effect of giving some
information (about the controls) to the users [6]. More recently, an approach to
optimize the schedules of inspectors in public transportation networks was pro-
posed by DSB S-tog in Denmark [7]. In contrast to our problem, the authors of
the latter article focus on temporal scheduling and assume an evasion rate which
does not depend on the control intensity. The present paper is motivated by an
application to the enforcement of a truck toll in Germany, which we present next.
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 224–235, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
A Stackelberg Game 225
Specificity of the applied problem and assumptions made in this article. The
model presented in this article is not limited to the case of motorway networks.
It applies to any situation where the individuals in transit can be controlled
on each section of their route through a network. A strong assumption of our
model however is that we know the set of routed demands of the network, i.e. the
number of individuals taking each possible route. In our model, the drivers do
not have the choice of their route between their source and destination. We plan
to search in this direction for future work. In particular, it might be relevant to
consider that the drivers can take some sections of a trunk road to avoid the toll
motorway.
We do not claim that our model captures all the complexity of the drivers' reaction to the inspectors' behavior, in particular because certain effects are particularly hard to model. For example, the perception of the penalty is not the same for all truck drivers, and an evader caught for a second offense may receive a higher fine in a trial.
In this article, we assume that the users of the network act selfishly, and decide to pay or to evade so as to minimize the expected cost of their trip. This is obviously a simplification, since there is certainly a large fraction of honest people who always pay the toll. However, we claim that our simplified model still leads to meaningful spatial distributions of the controls, because: (i)
the number of evaders that we compute in this model corresponds to the num-
ber of network users for which it is more interesting to evade the toll; (ii) hence,
the toll revenue in this model is a lower bound for the true revenue; (iii) if the
fraction of honest drivers is the same on every route, we could solve the problem
by considering the remaining fraction of crafty drivers only, which would lead to
the same results.
226 R. Borndörfer et al.
We make use of the standard notation [n] := {1, . . . , n} and we denote vectors by
boldface lower case letters. We model the transportation network by a directed
graph G = (V, E). We assume that the users of the network are distributed over
a set of routes R = {r1 , . . . , rm }, where each ri ⊂ E. In addition, we are given
the demand xi of each route, that is, the number of users that take the route
$r_i$ per unity of time (typically one hour; we assume a constant demand, i.e., we do not take the diurnal variations of the traffic into account). We denote by $y_e := \sum_{i \in [m]:\, r_i \ni e} x_i$ the number of users passing through edge $e$ per unity of time.
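The edge loads y_e are obtained by summing the route demands x_i over the routes containing each edge; a minimal sketch on a toy instance (the routes and demands below are made up):

```python
# Compute edge loads y_e = sum of demands x_i over routes r_i containing e.
# Toy instance: 3 routes over 4 directed edges (illustrative data).
routes = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a", "c", "d"}}   # r_i as edge sets
demand = {0: 100.0, 1: 50.0, 2: 30.0}                          # x_i, users per hour

edges = set().union(*routes.values())
y = {e: sum(demand[i] for i, r in routes.items() if e in r) for e in edges}
print(y)   # {'a': 130.0, 'b': 150.0, 'c': 80.0, 'd': 30.0} (in some order)
```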
Every user of the route ri has to pay a toll fee Ti , but he may also decide
to evade the toll, at the risk of paying a penalty Pi if he gets controlled. We
assume that the inspectors have a total capacity of control κ. This means that
κ individuals can be controlled per unity of time. We consider two manners of
spreading out the controls over the network in the next subsections. In the first
one, we simply assume that the control force can be arbitrarily distributed over
the network. The second one is a more theoretical approach, where we consider
all possible allocations of a finite number of inspectors over the sections e ∈ E,
and we search for the best mixed strategy combining these allocations.
Strategy of the users. We denote by πi the probability for a user of the route
ri to be controlled during its trip. We assume a stationary regime in which the
users have learned the values of the probability πi . Hence, a user of the route ri
will pay if $\pi_i$ is above the threshold $T_i/P_i$, and evade if it is below. In other words,
the proportion pi of payers on the route ri minimizes the average cost per user
of this route:
$$\lambda_i := \min(T_i, P_i \pi_i) = \min_{p_i \in [0,1]} \; p_i T_i + (1 - p_i) P_i \pi_i.$$
this section, where we have used the notation a ∧ b := min(a, b). Hence, the
probability πi of being controlled during a trip on route ri can be expressed as
a function of the control distribution q:
$$\pi_i = 1 - \prod_{e \in r_i} \left(1 - \Big(\frac{\kappa q_e}{y_e} \wedge 1\Big)\right) \;\approx\; \sum_{e \in r_i} \frac{\kappa q_e}{y_e}, \qquad (1)$$
an approximation which is valid when the right-hand side of Equation (1) is small. In the experi-
ments presented in Section 3, we obtain values of πi that never exceed 0.2. Note
that this approximation is equivalent to assuming that a user pays twice the fine
if he is caught twice.
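The exact detection probability and the linear approximation in Equation (1) can be compared directly; a sketch with made-up numbers in the regime where π_i stays small:

```python
# Exact probability of being controlled on route r_i versus the linear
# approximation pi_i ~= sum_e kappa*q_e/y_e (valid when that sum is small).
# All data are illustrative.
kappa = 60.0
route = ["a", "b", "c"]                      # edges of route r_i
q = {"a": 0.03, "b": 0.05, "c": 0.02}        # fraction of control capacity per edge
y = {"a": 1000.0, "b": 1500.0, "c": 800.0}   # edge loads, users per hour

exact = 1.0
for e in route:
    exact *= 1.0 - min(kappa * q[e] / y[e], 1.0)
exact = 1.0 - exact

approx = sum(kappa * q[e] / y[e] for e in route)
print(exact, approx)   # close, because pi_i is small on this instance
```

The approximation always overestimates slightly, consistent with the remark that it amounts to charging the fine once per control, even if a user is caught twice.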
Maximizing the profit. If the controller wants to maximize the total revenue
generated by the toll, which is, by construction, equal to the total loss of the
users, the problem to solve is:
$$\max_{q \in \Delta_E} \sum_{i \in [m]} x_i \lambda_i = \max_{q \in \Delta_E} \sum_{i \in [m]} x_i \min(T_i, P_i \pi_i), \qquad (2)$$
where πi depends on q through Equation (1). If the costs of the controls must
be taken into account, and the cost for a control on section e is ce , then we can
solve:
$$\max_{q \in \Delta_E^-} \; \sum_{i \in [m]} x_i \min(T_i, P_i \pi_i) - \sum_{e \in E} q_e \kappa c_e, \qquad (3)$$
where $\Delta_E^- := \{q \in [0,1]^{|E|} : \sum_{e \in E} q_e \le 1\}$ (we do not necessarily use all the control capacity). It is not difficult to see that there must be an optimum such that $\forall e \in E,\; \kappa q_e / y_e \le 1$, because the controller never has interest to place more capacity of control on a section than the number of users that pass through it. If we impose this constraint, the expression of $\pi_i$ simplifies to $\sum_{e \in r_i} \kappa q_e / y_e$, and
Problem (3) becomes a linear program:
$$\begin{aligned} \max_{q \in \Delta_E^-,\; \lambda \in \mathbb{R}^m} \quad & \sum_{i} x_i \lambda_i - \sum_{e \in E} q_e \kappa c_e & (4)\\ \text{s.t.} \quad & \forall i \in [m],\; \lambda_i \le P_i \sum_{e \in r_i} \frac{\kappa q_e}{y_e} \\ & \forall i \in [m],\; \lambda_i \le T_i \\ & \forall e \in E,\; \kappa q_e \le y_e. \end{aligned}$$
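For intuition, the objective behind (4) (here without control costs, i.e., Problem (2) with the linearized π_i) can be brute-forced on a toy two-edge instance by enumerating control distributions on a grid of the simplex. This is only an illustrative check with made-up data, not a replacement for the LP:

```python
# Brute-force the toll revenue max_q sum_i x_i * min(T_i, P_i * sum_e k*q_e/y_e)
# over a grid of the simplex, for a toy 2-edge network (illustrative data).
# With kappa = 50 <= min(y_e), the constraint kappa*q_e <= y_e holds trivially.
kappa = 50.0
routes = [["a"], ["b"], ["a", "b"]]
x = [100.0, 80.0, 40.0]          # demands
T = [2.0, 2.0, 4.0]              # toll fees
P = [40.0, 40.0, 40.0]           # penalties
y = {"a": 140.0, "b": 120.0}     # edge loads

def revenue(q):
    total = 0.0
    for r, xi, Ti, Pi in zip(routes, x, T, P):
        pi = sum(kappa * q[e] / y[e] for e in r)   # linearized detection prob.
        total += xi * min(Ti, Pi * pi)             # each user pays min(T_i, P_i*pi_i)
    return total

steps = 200
best = max(
    ({"a": k / steps, "b": (steps - k) / steps} for k in range(steps + 1)),
    key=revenue,
)
print(best, revenue(best))
```

On this instance the maximum 520 (= everyone paying the toll) is reached on a whole range of distributions, since moderate control on both edges already makes evasion unattractive on every route.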
Note that we have chosen to consider here that the $i$th user is paying when the threshold $\pi_i = T_i/P_i$ is reached but not exceeded. We can formulate this problem as a mixed integer program (MIP), by introducing a binary variable $\delta_i$ which is forced to take the value 1 when $\pi_i < T_i/P_i$:
$$\begin{aligned} \min_{q \in \Delta_E,\; \delta \in \{0,1\}^m} \quad & \sum_{i} x_i \delta_i & (5)\\ \text{s.t.} \quad & \forall i \in [m],\; \frac{T_i}{P_i} \le \sum_{e \in r_i} \frac{\kappa q_e}{y_e} + \delta_i \\ & \forall e \in E,\; \kappa q_e \le y_e. \end{aligned}$$
As in Section 2.1, the problem to maximize the revenue generated from the toll
can be formulated as an LP (we do not consider control costs for the sake of
simplicity). Note that this time, we do not need to take a linear approximation
of π̄i , because αn,i is a fixed parameter:
$$\begin{aligned} \max_{q \in \Delta_{S_N},\; \lambda \in \mathbb{R}^m} \quad & \sum_{i} x_i \lambda_i \\ \text{s.t.} \quad & \forall i \in [m],\; \lambda_i \le T_i & (6a)\\ & \forall i \in [m],\; \lambda_i \le \sum_{n \in S_N} q_n P_i \alpha_{n,i}. & (6b) \end{aligned}$$
$$P = \{v + z : v \in \text{convex-hull}(\{v_n : n \in S_N\}),\; z \in \mathbb{R}^m_-\}. \qquad (7)$$
The next proposition shows that if the capacity of control κ is smaller than the
traffic on every edge, then P has no more than |E| extreme points, so that we
can impose qn = 0 for almost all n ∈ SN .
Proposition 1. Assume that $\forall e \in E,\; \kappa \le y_e$, and denote by $\tilde{n}(e)$ the allocation where all the inspectors are concentrated on edge $e$. Then, every extreme point of $P$ is of the form $v_{\tilde{n}(e)}$ for an $e \in E$. Hence, Problem (6) has a solution in which $q_n = 0$ for all $n \in S_N \setminus \{\tilde{n}(e) : e \in E\}$.
Proof. It is clear that the extreme points of convex-hull($S_N$) are the vectors of the form $\tilde{n}(e) := [0, \ldots, 0, N, 0, \ldots, 0]^T$, with the nonzero entry in position $e$. The map $n \mapsto u_n$ from $S_N$ into $\mathbb{R}^m$, where
$$u_{n,i} := P_i \sum_{e \in r_i} \frac{n_e \kappa}{N y_e},$$
is linear, and hence the extreme points of the polyhedron with vertices $(u_n)_{n \in S_N}$ are among the images of the extreme points of convex-hull($S_N$), that is, the vectors $u_{\tilde{n}(e)}$ ($e \in E$). Let $n \in S_N$. Since $\kappa \le y_e$ for all $e$, the expression of $v_{n,i}$ can be simplified to:
$$v_{n,i} = P_i \left(1 - \prod_{e \in r_i} \Big(1 - \frac{n_e \kappa}{N y_e}\Big)\right) \le u_{n,i},$$
where the inequality follows from the log-concavity of $x \mapsto \prod_i (1 - x_i)$. Moreover, the equality is attained for the vectors of the type $v_{\tilde{n}(e)}$, because the product consists of only one factor (or even 0 factors if $e \notin r_i$), i.e., $\forall e \in E$ we have
where $e_k \in \operatorname*{argmax}_{e' \in E} \; \sum_{i \in [m]} \mu_i P_i \left(1 - \prod_{e \in r_i} \left(1 - \frac{\big(n_e^{(k-1)} + \delta_{e,e'}\big)\kappa}{N y_e} \wedge 1\right)\right).$
In the above equation, δ stands for the Kronecker delta function. We use the
vector n(N ) generated by this greedy procedure as an approximation for the
solution of (8), and we add the column $v_{n^{(N)}}$ to the linear program. Finally, we solve this augmented linear program and repeat the above procedure.
An argument justifying this greedy method is that if we use the same approx-
imation as in Equation (1), the objective of Problem (8) becomes separable and
concave, and it is well known that the greedy procedure finds the optimum (see
e.g. [5]). The column generation procedure can be stopped when the optimal
value of Problem (8) is 0, which guarantees that no other column can increase
the value of Problem (6). In practice, we stop the column generation as soon as
the reduced cost of the strategy n(N ) returned by the greedy procedure is 0.
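The greedy step can be sketched as follows: starting from the empty allocation, each of the N inspectors is placed on the edge yielding the largest increase of the weighted objective. The data below are made up, and the weights μ_i stand in for the reduced-cost coefficients of the column generation:

```python
# Greedy allocation of N inspectors to edges, as in the column-generation
# subproblem: at step k, add one inspector on the edge maximizing the weighted
# objective sum_i mu_i * P_i * (1 - prod_e (1 - n_e*kappa/(N*y_e) ^ 1)).
# Toy data; mu_i are generic nonnegative weights (illustrative).
N, kappa = 5, 50.0
routes = [["a"], ["b"], ["a", "b"]]
mu = [1.0, 0.8, 1.2]
P = [40.0, 40.0, 40.0]
y = {"a": 140.0, "b": 120.0}
edges = ["a", "b"]

def objective(n):
    total = 0.0
    for r, mui, Pi in zip(routes, mu, P):
        miss = 1.0                                   # prob. of not being controlled
        for e in r:
            miss *= 1.0 - min(n[e] * kappa / (N * y[e]), 1.0)
        total += mui * Pi * (1.0 - miss)
    return total

n = {e: 0 for e in edges}
for _ in range(N):                                   # place inspectors one at a time
    best = max(edges, key=lambda ep: objective({**n, ep: n[ep] + 1}))
    n[best] += 1

print(n, round(objective(n), 3))
```

On this instance the greedy concentrates all inspectors on one edge, which echoes the extreme-point structure of Proposition 1.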
In this section, we establish a relation between the solutions of the model (3) presented above and the Nash equilibria of a particular polymatrix game. For the model without costs (2), it is not difficult to write the payoff of the controller as the sum of partial payoffs from zero-sum bimatrix games played against each user (recall that $p_i = [p_i, 1 - p_i]^T$):
$$\text{Payoff(controller)} = \sum_i x_i \lambda_i = \sum_i \text{Loss(user } i\text{)} = \sum_i p_i^T A_i q,$$
This particular polymatrix game has a special structure, since the interaction
between the players can be modelled by a star graph with the controller in the
central node, and each edge represents a zero-sum game between a user and the
controller. Modulo the additional constraint κqe ≤ ye , which bounds from above
the mixed strategy of the controller, any Nash equilibrium (q, p1 , . . . , pm ) of
this polymatrix game gives a solution q to the Stackelberg competition problem
studied in Section 2.1. The model with control costs (3) can also be formulated
in this way, by adding a new player who has a single strategy. This player plays
a zero-sum game against the controller, whose payoff is the sum of the control costs $\sum_{e} c_e q_e$.
Interestingly, the fact that Problem (3) is representable by an LP is strongly related to the fact that every partial game is zero-sum. We point out a recent paper of Daskalakis and Papadimitriou [3], who have generalized von Neumann's minimax theorem to the case of zero-sum polymatrix games. In the introduction of the latter article, the authors moreover notice that for any star network, we can find an equilibrium of a zero-sum polymatrix game by solving an LP.
3 Experimental Results
We have solved the models presented in this paper for several regions of the
German motorways network, based on real traffic data (averaged over time). We
present here a brief analysis of our results. In Figure 1, we have represented the mixed strategy of the controller that maximizes the revenue from the toll (without control costs, for κ = 60 controls per hour), for the regions of Berlin-Brandenburg and North Rhine-Westphalia (NRW). The graphs corresponding to these regions consist of 57 nodes (resp. 111) and 120 directed edges (resp. 264), and we have taken into consideration 1095 routes (resp. 4905). We have used a toll fee of 0.176 € per kilometer, and a penalty of 400 € that does not depend on the route.
232 R. Borndörfer et al.
[Figure 1: control-rate maps; panel (a) covers Berlin, Brandenburg, and Cottbus; panel (b) covers Dortmund, Duisburg, Wuppertal, and Düsseldorf.]
Fig. 1. Mixed strategy of the controller which maximizes the revenue (2), for the regions
of Berlin-Brandenburg (a), and NRW (b). The widths of the sections indicate the traffic
volumes.
[Figure 2: (a) fraction of evaders (%) and (b) revenue (% of the all-pay case), each plotted against κ for the strategies max_revenue, min_evaders, and proportional.]
Fig. 2. Evolution of the number of evaders (a) and of the toll revenue (b) with κ, for
the region of Berlin-Brandenburg
In Figure 2, the strategies that maximize the revenue and that minimize the
number of evaders are compared to the case where the controls are proportional
to the traffic. Several conclusions can be drawn from this figure: first, the
"proportional" strategy is not so bad in terms of revenue; however, a difference
of up to 4% with the max_revenue strategy is observed. Second, the number of
evaders decreases much faster when the controls are distributed with respect to
this goal. For κ = 55, the evasion rate achieved by the
control distribution that is proportional to the traffic (resp. that maximizes
the revenue) is 97% (resp. 54%), while we can achieve an evasion rate of 31%
with the min_evaders strategy. Third, both the max_revenue and the min_evaders
strategies create a situation in which no driver has an interest in evading for
κ ≥ 80.3. In contrast, 2% of the drivers are still better off evading under the
proportional strategy for κ = 115.
We have also computed the optimal mixed strategy for a coalition of N = 13
inspectors, with the column generation procedure described in Section 2.2. For
κ = 60, we found that the N inspectors should be simultaneously allocated to a
common section 84% of the time. The column generation procedure, which makes it
possible to consider strategies where the inspectors are spread over the
network, yields an increase in revenue of only 1.84%. An intuitive explanation
is that spreading out the inspectors leads to potentially controlling the same
driver several times. Moreover, most of the traffic passes only through
sections where ye ≥ κ, so that the vector ñ(e) is an extreme point of P
(cf. Equation (7)).
References
1. Boyd, C., Martini, C., Rickard, J., Russell, A.: Fare evasion and non-compliance: A
simple model. Journal of Transport Economics and Policy, 189–197 (1989)
2. Borndörfer, R., Sagnol, G., Swarat, E.: An IP approach to toll enforcement opti-
mization on German motorways. ZIB Report 11-42, Zuse Institut Berlin
(2011)
3. Daskalakis, C., Papadimitriou, C.H.: On a Network Generalization of the Minmax
Theorem. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S.,
Thomas, W. (eds.) ICALP 2009, Part II. LNCS, vol. 5556, pp. 423–434. Springer,
Heidelberg (2009)
4. Hildebrand, M.D., Prentice, B.E., Lipnowski, I.: Enforcement of highway weight
regulations: A game theoretic model. Journal of the Transportation Research Fo-
rum 30(2) (1990)
5. Ibaraki, T., Katoh, N.: Resource allocation problems: algorithmic approaches. MIT
Press (1988)
6. Jankowski, W.B.: Fare evasion and non-compliance: A game theoretical approach.
International Journal of Transport Economics 38(3), 275–287 (1991)
7. Thorlacius, P., Clausen, J., Brygge, K.: Scheduling of inspectors for ticket spot
checking in urban rail transportation. Trafikdage på Aalborg Universitet 2008 (2010)
Stochastic Loss Aversion
for Random Medium Access
1 Introduction
The “by rule” window flow control mechanisms of, e.g., TCP and CSMA, have
elements of both proactive and reactive communal congestion control suitable
for distributed/information-limited high-speed networking scenarios. Over the
past ten years, game theoretic models for medium access and flow control have
been extensively explored in order to consider the effects of even a single end-
user/player who greedily departs from such prescribed/standard behaviors [1, 6,
9, 13–16, 23–25, 28]. Greedy end-users may have a dramatic effect on the overall
“fairness” of the communication network under consideration. So, if even one
end-user acts in a greedy way, it may be prudent for all of them to do so.
However, even end-users with a noncooperative disposition may temporarily
not practice greedy behavior in order to escape from sub-optimal (non-Pareto)
Nash equilibria. In more general game theoretic contexts, the reluctance of an
end-user to act in a non-greedy fashion is called loss aversion [7].
In this note, we focus on simple slotted-ALOHA MAC for a LAN. We begin
with a noncooperative model of end-user behavior. Despite the presence of a stable
interior Nash equilibrium, this system was shown in [13,14] to have a large domain
of attraction to deadlock where all players’ transmission probability is one and
so obviously all players’ throughput is zero (here assuming feasible demands and
throughput based costs). To avoid non-Pareto Nash equilibria, particularly those
G. Kesidis was supported by American NSF CISE/CNS grant 0916179.
Y. Jin was supported by the Korean NRF grant number 2010-0006611.
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 236–247, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
involving zero throughput for some or all users, we assume that end-users will
probabilistically engage in non-greedy behavior. That is, we posit a stochastic
model of loss aversion, a behavior whose aim is long-term communal betterment.
We may be able to model a play that reduces net-utility using a single
“temperature” parameter T in the manner of simulated annealing (e.g., [12]);
i.e., plays that increase net utility are always accepted and plays that reduce net
utility are (sometimes) accepted with probability decreasing in T , so the players
are (collectively) less loss averse with larger T . Though our model of probabilis-
tic loss aversion is related that of simulated annealing by diffusions [10,29], even
with a free meta-parameter (η or ηw below) possibly interpretable as tempera-
ture, our modeling aim is not centralized annealing (temperature cooling) rather
decentralized exploration of play-space by noncooperative users.
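As a point of comparison, the generic simulated-annealing acceptance rule just described can be sketched as follows (a standard Metropolis-style rule, not the diffusion model of this paper):

```python
import math, random

random.seed(1)

def accept(delta_utility, T):
    # Metropolis-style rule: plays that increase net utility are always
    # accepted; a play that reduces it is accepted with probability
    # exp(delta/T), which grows toward 1 as the temperature T increases.
    return delta_utility >= 0 or random.random() < math.exp(delta_utility / T)

# empirical acceptance rate of a play that loses 0.5 units of net utility
rates = []
for T in (0.1, 1.0, 10.0):
    rates.append(sum(accept(-0.5, T) for _ in range(10_000)) / 10_000)
print(rates)  # less loss averse (higher acceptance) as T grows
```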
We herein do not model how the end-users will keep track of the best (Pareto)
equilibria previously played/discovered. Because the global extrema of the global
objective functions (Gibbs exponents) we derive do not necessarily correspond to
Pareto equilibria, we do not advocate collective slow “cooling” (annealing) of the
equivalent temperature parameters. Also, we do not model how end-user through-
put demands may be time-varying, a scenario which would motivate the “contin-
ual search” aspect of the following framework.
The following stochastic approach to distributed play-space search is also
related to “aspiration” of repeated games [3, 8, 18], where a play resulting in
suboptimal utility may be accepted when the utility is less than a threshold, say
according to a “mutation” probability [17, 26]. This type of “bounded rational”
behavior has been proposed to find Pareto equilibria, in particular for distributed
settings where players act with limited information [26]. Clearly, given a global
objective L whose global maxima correspond to Pareto equilibria, these ideas
are similar to the use of simulated annealing to find the global maxima of L
while avoiding suboptimal local maxima.
This paper is organized as follows. In Section 2, we formulate the basic
ALOHA noncooperative game under consideration. Our stochastic framework
(a diffusion) for loss aversion is given in Section 3; for two different modulat-
ing terms of the white-noise process, the invariant distribution in the collective
play-space is derived. A two-player numerical example is used to illustrate the
performance of these two approaches in Section 4. We conclude in Section 5 with
a discussion of future work.
is such that Ui(0) = 0, and the throughput-based price is M. So, the
throughput-demand of the ith player is yi := (Ui′)⁻¹(M). The Jacobi dynamics of
play are

(d/dt) vi = yi / ∏_{j≠i}(1 − vj) − vi =: −Ei(v),    (2)
cf. (6). Note that we define −Ei , instead of Ei , to be consistent with the notation
of [29], which seeks to minimize a global objective, though we want to maximize
such objectives in the following.
Such dynamics generally exhibit multiple Nash equilibria, including non-Pareto
equilibria with significant domains of attraction. Our ALOHA context has a sta-
ble deadlock equilibrium point where all players always transmit, i.e., v = 1 :=
(1, 1, ..., 1) [13, 14].
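These basins are easy to reproduce numerically. The sketch below Euler-integrates the deterministic dynamics (2) for the two-player demands used later in Section 4; the step size and initial points are our own choices.

```python
import numpy as np

y = np.array([8/15, 1/15])   # throughput demands (the Section 4 example)

def jacobi_flow(v0, steps=20_000, dt=0.01):
    # Euler integration of dv_i/dt = y_i / prod_{j != i}(1 - v_j) - v_i
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        others = np.array([np.prod(1 - np.delete(v, i)) for i in range(len(v))])
        v = np.clip(v + dt * (y / others - v), 0.0, 1.0 - 1e-9)  # keep v in [0, 1)
    return v

# a play near the stable interior equilibrium converges to v*_a = (2/3, 1/5) ...
print(jacobi_flow([0.6, 0.2]))
# ... while a play beyond the saddle is absorbed by the deadlock v -> 1
print(jacobi_flow([0.9, 0.5]))
```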
We now model stochastic perturbation of the Jacobi dynamics (2), allowing for
suboptimal plays despite loss aversion, together with a sigmoid mapping g to
ensure plays (transmission probabilities) v remain in a feasible hyper-rectangle
D ⊂ [0, 1]n (i.e., the feasible play-space for v): for all i,
where 1 ≤ δ < 2 and 0 < γ ≤ 1/(1 + δ). Thus, inf_u g(u) = inf v = γ(δ − 1) ≥ 0
and sup_u g(u) = sup v = γ(1 + δ) ≤ 1. Again, to escape from the
domains of attraction of non-Pareto equilibria, the deterministic Jacobi dynamics
(i.e., −Ei (v)dt in (3)) have been perturbed by white noise (dWi ) here modulated
by a diffusion term of the form

σi(vi) = √( 2 hi(v) / fi(vi) ),

where hi is a player-dependent loss-aversion factor chosen below; in the
discrete-time implementation, the driving noises Ni(k) are i.i.d. normal
N(0, ε) random variables.
The system just described is a variation of E. Wong’s diffusion machine [29],
the difference being the introduction of the term h instead of a temperature
meta-parameter T . Also, the diffusion function σi is player-i dependent at least
through hi . Finally, under the slotted-ALOHA dynamics, there is no function
E(v) such that ∂E/∂vi = Ei , so we will select the diffusion factors hi to achieve
a tractable Gibbs stationary distribution of v, and interpret them in terms of
player loss aversion.
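A discrete-time (Euler-Maruyama) sketch of such perturbed play dynamics follows. For simplicity we simulate directly in the play space (the sigmoid's slope factor is taken to be 1), and the loss-aversion factor h_i(v_i) = η y_i (1 − v_i)² is only an assumed placeholder that is larger for greedier players and vanishes as v_i → 1; it is not necessarily the paper's selection.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([8/15, 1/15])        # throughput demands
eta, dt = 0.05, 1e-3              # assumed meta-parameter and time step
delta, gamma = 1.2, 0.45          # feasible box D = [gamma(delta-1), gamma(delta+1)]^n
lo, hi = gamma * (delta - 1), gamma * (delta + 1)

def E(v):
    # E_i(v) = v_i - y_i / prod_{j != i}(1 - v_j); -E_i is the Jacobi drift of (2)
    others = np.array([np.prod(1 - np.delete(v, i)) for i in range(len(v))])
    return v - y / others

def h(v):
    # placeholder loss-aversion factor (an assumption, not the paper's h_i)
    return eta * y * (1 - v) ** 2

v = np.array([0.5, 0.3])
traj = np.empty((50_000, 2))
for k in range(len(traj)):
    v = v - E(v) * dt + np.sqrt(2 * h(v) * dt) * rng.standard_normal(2)
    v = np.clip(v, lo, hi)        # project the play back into D
    traj[k] = v
print("mean play over the run:", traj.mean(axis=0))
```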
Note that in the diffusion machine, a common temperature parameter T may
be slowly reduced to zero to find the minimum of a global potential function
(the exponent of the Gibbs stationary distribution of v) [20, 21], in the manner
of simulated annealing. Again, the effective temperature parameter here (η or
ηw) will be constant.
with η > 0 a free meta-parameter (assumed common to all players). So, a greedier
player i (larger yi ) will generally tend to be less loss averse (larger hi ), except
when their current retransmission play vi is large.
Proof. Applying Ito’s lemma [19, 29] to (3) and (4) gives

dvi = gi′(ui) dui + (1/2) gi′′(ui) σi²(v) dt
    = [−fi(vi) Ei(v) + (1/2) gi′′(gi⁻¹(vi)) σi²(v)] dt + fi(vi) σi(v) dWi,

where the derivative operator is z′ := dz(vi)/dvi and we have substituted (3)
for the second equality.
for the second equality. From the Fokker-Planck (Kolmogorov forward) equation
for this diffusion [19, 29], we get the following equation for the time-invariant
(stationary) distribution p of v: for all i,
0 = (1/2) ∂i(fi² σi² p) − [−fi Ei + (1/2)(gi′′ ∘ gi⁻¹) σi²] p,

where the operator ∂i := ∂/∂vi.
where the second equality is due to cancellation of the hi fi′ p terms. For all i,
since fi > 0,
∂i p(v)/p(v) = ∂i log p(v) = −Ei(v)/hi(vi) − hi′(vi)/hi(vi)    (9)
             = (1/(ηY)) ∂i Λ(v) + 2/(1 − vi).
Finally, (8) follows by direct integration.
That a user would be less loss averse (higher h) when the channel was perceived
to be more idle may be a reflection of a “dynamic” altruism [2] (i.e., a player is
more courteous as s/he perceives that others are). The particular form of (12)
also leads to another tractable Gibbs distribution for v.
242 G. Kesidis and Y. Jin
Theorem 2. Using (12), the stationary probability density function of the dif-
fusion v on [0, 2γ]n is
p(v) = (1/W) exp(Δ(v))    (13)

where

Δ(v) = ∑_{i=1}^n (yi/η − 1) log vi + (1/η) ∏_{i=1}^n (1 − vi).    (14)
Proof. Following the proof of Theorem 1, the invariant distribution here also
satisfies (9):

∂i log p(v) = −Ei(v)/hi(v) − ∂i log hi(v)
            = yi/(ηvi) − (1/η) ∏_{j≠i}(1 − vj) − 1/vi.
3.4 Discussion
Note that if η > maxi yi, then Δ is strictly decreasing in vi for all i, and so
will be minimal in the deadlock region (unlike Λ̃). So the stationary
probability in the region of deadlock will be low. However, large η may result
in very high stationary probability near v = 0. So, we see that the
meta-parameter η (or ηw) plays a more significant role here (whereas the
parameters δ and γ in g play a more significant role in the former objective Λ̃
owing to its global extremum at 1).
For small η < mini yi, note that Δ(1) = 0, i.e., 1 is not a maximal singularity
of Δ as it is of Λ̃. Also, the difference in the role played by η in the two
Gibbs distributions (8) and (13) is apparent from the first-order necessary
conditions for optimality of their potentials:

∇Λ(v) = 0 ⇔ yi = vi ∏_{j≠i}(1 − vj) for all i,
∇Δ(v) = 0 ⇔ yi − η = vi ∏_{j≠i}(1 − vj) for all i,
so that here demand exceeds achieved throughput. Thus, under the potential Δ,
if 0 < η < mini yi, then the Gibbs distribution is maximal at points v where
the throughputs are θ = y − η1, i.e., all users’ achieved throughputs are less
than their demands by the same constant amount η. So, the meta-parameter η may
be used to deal with the problem of excessive total demand ∑i yi.
Finally, note that the Hessian of Δ has all off-diagonal entries 1/η and ith
diagonal entry −(yi − η)/(ηvi²). Assume that the reduced demands y − η1 are
feasible and achieved at v. If yi − η > (n − 1)vi² for all users i (again,
where n is the number of users), then by diagonal dominance, Δ’s Hessian is
negative definite at v, and hence Δ has a local maximum there. The sufficient
condition of diagonal dominance is achieved in the special case when
vi < 1/(2n) for all i because, for all i,

yi − η = vi ∏_{j≠i}(1 − vj) ≈ vi (1 − ∑_{j≠i} vj),

i.e., (yi − η)/(ηvi²) ≈ vi (1 − ∑_{j≠i} vj)/(ηvi²) > (n − 1)/η.
This special case obviously does not include the classical, static choice for slotted
ALOHA of vi = 1/n for all i, which leads to optimal total throughput (for the
identical users case) of 1/e when n is large.
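A quick numerical check of this classical fact:

```python
import math

def total_throughput(n):
    # n identical users, each transmitting with probability v = 1/n per slot;
    # user i succeeds when it transmits and the other n-1 stay silent
    v = 1.0 / n
    return n * v * (1 - v) ** (n - 1)

for n in (2, 10, 100, 1000):
    print(n, round(total_throughput(n), 4))
print("1/e =", round(1 / math.e, 4))
```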
4 Numerical Example
For an n = 2 player example with demands y = (8/15, 1/15) and η = 1, the two
interior Nash equilibria are the locally stable (under the deterministic
dynamics) equilibrium at v∗a = (2/3, 1/5) and the (unstable) saddle point at
v∗b = (4/5, 1/3) (both with corresponding throughputs θ = y) [13, 14]. Again,
1 is a stable deadlock boundary equilibrium, naturally to be avoided if
possible since both players’ throughputs are zero there, θ = 0. Under the
deterministic dynamics of (2), the deadlock equilibrium 1 has a significant
domain of attraction, including a neighborhood of the saddle point v∗b.
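These equilibrium values can be checked directly from the slotted-ALOHA throughput formula θi = vi ∏_{j≠i}(1 − vj):

```python
import numpy as np

y = np.array([8/15, 1/15])   # the demands of the example

def throughput(v):
    # theta_i = v_i * prod_{j != i} (1 - v_j)
    v = np.asarray(v, dtype=float)
    return np.array([v[i] * np.prod(1 - np.delete(v, i)) for i in range(len(v))])

print(throughput([2/3, 1/5]))   # v*_a: throughputs equal the demands y
print(throughput([4/5, 1/3]))   # v*_b: also theta = y
print(throughput([1.0, 1.0]))   # deadlock: both throughputs are zero
```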
The exponent of p (potential of the Gibbs distribution), Λ̃, for this example
is depicted in Figure 1. Λ̃ has a shape similar to that of the Lyapunov function
Λ, but without the same interior local extrema or saddle points by (11). The
extreme mode at 1 is clearly evident.
Fig. 1. The potential/exponent (10) of the Gibbs distribution (8) for n = 2 players
with demands y = (8/15, 1/15)
4.1 Small η
For the case where 0 < η < min{y1 , y2 }, we took η = 0.01 for the example
above. The potential Δ of the Gibbs distribution (13) is depicted in Figure 2.
Compared to Λ̃ in Figure 1, v = 1 is not a local extremum under Δ (and does
Fig. 2. The potential Δ of (13) for n = 2 players with demands y = (8/15, 1/15) under
(12) with η = 0.01
v1∗, v2∗ = 4/5, 1/3:    Λ = .059,    Δ = −4.6,    Λ∗ = .037
Fig. 3. The component Λ of the potential of (8) for n = 2 players with demands
y = (8/15, 1/15) − 0.01 · 1
4.2 Large η
See [22] for a numerical example of this case, where we illustrate how the use of
(12) results in dramatically less sensitivity to the choice of the parameters δ and
γ governing the range of the play-space D.
In future work, we will consider the model with power-based costs, i.e., M v instead of M θ in the net utility (1).
Also, we will study the effects of asynchronous and/or multirate play among the
users [2, 4, 15].
References
1. Altman, E., Boulogne, T., El-Azouzi, R., Jiménez, T., Wynter, L.: A survey on net-
working games in telecommunications. Comput. Oper. Res. 33(2), 286–311 (2006)
2. Antoniadis, P., Fdida, S., Griffin, C., Jin, Y., Kesidis, G.: CSMA Local Area Net-
working under Dynamic Altruism (December 2011) (submitted)
3. Bendor, J., Mookherjee, D., Ray, D.: Aspiration-based reinforcement learning in re-
peated interaction games: an overview. International Game Theory Review 3(2&3),
159–174 (2001)
4. Bertsekas, D.P., Tsitsiklis, J.N.: Convergence rate and termination of asynchronous
iterative algorithms. In: Proc. 3rd International Conference on Supercomputing
(1989)
5. Brown, G.W.: Iterative solutions of games with fictitious play. In: Koopmans, T.C.
(ed.) Activity Analysis of Production and Allocation. Wiley, New York (1951)
6. Cagalj, M., Ganeriwal, S., Aad, I., Hubaux, J.P.: On Selfish Behavior in CSMA/CA
networks. In: Proc. IEEE INFOCOM (2005)
7. Camerer, C.F., Loewenstein, G.: Behavioral Economics: Past, Present, Future. In:
Camerer, C.F., Loewenstein, G., Rabin, M. (eds.) Advances in Behavioral Eco-
nomics. Princeton Univ. Press (2003)
8. Cho, I.-K., Matsui, A.: Learning aspiration in repeated games. Journal of Economic
Theory 124, 171–201 (2005)
9. Cui, T., Chen, L., Low, S.H.: A Game-Theoretic Framework for Medium Access
Control. IEEE Journal on Selected Areas in Communications 26(7) (September
2008)
10. Gidas, B.: Global optimization via the Langevin equation. In: Proc. IEEE CDC,
Ft. Lauderdale, FL (December 1985)
11. Heusse, M., Rousseau, F., Guillier, R., Dula, A.: Idle sense: An optimal access
method for high throughput and fairness in rate diverse wireless LANs. In: Proc.
ACM SIGCOMM (2005)
12. Holley, R., Stroock, D.: Simulated Annealing via Sobolev Inequalities. Communi-
cations in Mathematical Physics 115(4) (September 1988)
13. Jin, Y., Kesidis, G.: A pricing strategy for an ALOHA network of heterogeneous
users with inelastic bandwidth requirements. In: Proc. CISS, Princeton (March
2002)
14. Jin, Y., Kesidis, G.: Equilibria of a noncooperative game for heterogeneous users
of an ALOHA network. IEEE Communications Letters 6(7), 282–284 (2002)
15. Jin, Y., Kesidis, G.: Dynamics of usage-priced communication networks: the case
of a single bottleneck resource. IEEE/ACM Trans. Networking (October 2005)
16. Jin, Y., Kesidis, G.: A channel-aware MAC protocol in an ALOHA network with
selfish users. IEEE JSAC Special Issue on Game Theory in Wireless Communica-
tions (January 2012)
17. Kandori, M., Mailath, G., Rob, R.: Learning, mutation, and long run equilibria in
games. Econometrica 61(1), 29–56 (1993)
18. Karandikar, R., Mookherjee, D., Ray, D., Vega-Redondo, F.: Evolving aspirations
and cooperation. Journal of Economic Theory 80, 292–331 (1998)
19. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Springer
(1991)
20. Kesidis, G.: Analog Optimization with Wong’s Stochastic Hopfield Network. IEEE
Trans. Neural Networks 6(1) (January 1995)
21. Kesidis, G.: A quantum diffusion network. Technical Report 0908.1597 (2009),
http://arxiv.org/abs/0908.1597
22. Kesidis, G., Jin, Y.: Stochastic loss aversion for random medium access. Technical
report (January 9, 2012), http://arxiv.org/abs/1201.1776
23. Lee, J.W., Chiang, M., Calderbank, R.A.: Utility-optimal random-access protocol.
IEEE Transactions on Wireless Communications 6(7) (July 2007)
24. Ma, R.T.B., Misra, V., Rubenstein, D.: An Analysis of Generalized Slotted-Aloha
Protocols. IEEE/ACM Transactions on Networking 17(3) (June 2009)
25. Menache, I., Shimkin, N.: Fixed-Rate Equilibrium in Wireless Collision Channels.
In: Chahed, T., Tuffin, B. (eds.) NET-COOP 2007. LNCS, vol. 4465, pp. 23–32.
Springer, Heidelberg (2007)
26. Montanari, A., Saberi, A.: Convergence to equilibrium in local interaction games.
In: FOCS (2009)
27. Shamma, J.S., Arslan, G.: Dynamic fictitious play, dynamic gradient play, and
distributed convergence to Nash equilibria. IEEE Trans. Auto. Contr. 50(3), 312–
327 (2005)
28. Wicker, S.B., MacKenzie, A.B.: Stability of Multipacket Slotted Aloha with Selfish
Users and Perfect Information. In: Proc. IEEE INFOCOM (2003)
29. Wong, E.: Stochastic Neural Networks. Algorithmica 6 (1991)
Token-Based Incentive Protocol Design
for Online Exchange Systems
1 Introduction
Resource sharing services are currently proliferating in many online systems.
For example, in BitTorrent, Gnutella, and Kazaa, individuals share files; in
Seti@home, individuals provide computational assistance; in Slashdot and
Yahoo!Answers, individuals provide content, evaluations, and answers to
questions. The expansion of such sharing and exchange services will depend on
their participating members (herein referred to as agents) contributing and
sharing resources with each other. However, the participating agents are
self-interested and hence will try to “free-ride”, i.e., they will derive
services from other agents without contributing their own services in return.
Empirical studies show that this free-riding problem can be quite severe: in
the Gnutella system, for instance, almost 70% of users share no files at all [1].
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 248–258, 2012.
© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
2 System Model
In the environment we consider, a continuum (mass 1) of agents each possess
a unique resource that can be duplicated and provided to others. (In the real
systems we have in mind, the population is frequently in the tens of thousands, so
a continuum model seems a reasonable approximation.) The benefit of receiving
this resource is b and the cost of producing it is c ; we assume b > c > 0 ,
so that social welfare is increased when the service is provided, but the cost is
strictly positive, so that the server has a disincentive to provide it. Agents care
about current and future benefits/costs and discount future benefits/costs at
the constant rate β ∈ (0, 1) . Agents are risk neutral so seek to maximize the
discounted present value of a stream of benefits and costs.
Time is discrete. In each time period, a fraction ρ ≤ 1/2 of the population is
randomly chosen to be a client and matched with a randomly chosen server; the
fraction 1 − 2ρ is unmatched. (No agent is both a client and a server in the
same period.) When a client and server are matched, the client chooses whether
or not to request service, and the server chooses whether or not to provide
service (i.e., transfer the file) if requested. This client-server model
describes a world where an agent has demand at some times and is matched by the
system to provide service at other times.
The parameters b, c, β, ρ completely describe the environment. Because the
units of benefit b and cost c are arbitrary (and tokens have no intrinsic value),
only the benefit/cost ratio r = b/c is actually relevant. We consider variations
in the benefit/cost ratio r and the discount factor β, but view the matching rate
ρ as immutable.
when they are clients and servers. The recommended strategy is a pair
(σ, τ): ℕ → {0, 1}; τ is the client strategy and σ is the server strategy.
The strategy should depend only on an agent’s current token holding because
the future matching process is independent of the history.
2.2 Equilibrium
Because we consider a continuum population and assume that agents can observe
only their own token holdings, the relevant state of the system from the point
of view of a single agent can be completely summarized by the fraction μ of
agents who do not request service when they are clients and the fraction ν of
agents who do not provide service when they are servers. If the population is in
a steady state then μ, ν do not change over time.
Given μ, ν, the strategy (σ, τ) is optimal, or a best response, for the current
token holding k if the long-run utility satisfies

V(k | μ, ν, σ, τ) ≥ V(k | μ, ν, σ′, τ′)

for all alternative strategies (σ′, τ′). Because agents discount the future at
the constant rate β, the strategy (σ, τ) is optimal if and only if it has the
one-shot deviation property: there does not exist a continuation history h and
a profitable deviation (σ′, τ′) that differs from (σ, τ) at the history h and
nowhere else; i.e.,
for the server strategy,

σ(k) = 0 ⇒ βV(k|σ, τ, μ, ν) ≥ −c + βV(k + 1|σ, τ, μ, ν)
σ(k) ∈ (0, 1) ⇒ βV(k|σ, τ, μ, ν) = −c + βV(k + 1|σ, τ, μ, ν)
σ(k) = 1 ⇒ βV(k|σ, τ, μ, ν) ≤ −c + βV(k + 1|σ, τ, μ, ν)

and for the client strategy,

τ(k) = 0 ⇒ βV(k|σ, τ, μ, ν) ≥ b + βV(k − 1|σ, τ, μ, ν)
τ(k) ∈ (0, 1) ⇒ βV(k|σ, τ, μ, ν) = b + βV(k − 1|σ, τ, μ, ν)
τ(k) = 1 ⇒ βV(k|σ, τ, μ, ν) ≤ b + βV(k − 1|σ, τ, μ, ν)
Write EQ(r, β) for the set of protocols Π that constitute an equilibrium when
the benefit/cost ratio is r and the discount factor is β. Conversely, given Π write
Φ(Π) for the set {(r, β)} of pairs of benefit/cost ratios r and discount factors
β such that Π is an equilibrium protocol. Note that EQ, Φ are correspondences
and are inverse to each other.
μ, ν are computed as

μ = ∑_{k=0}^∞ (1 − τ(k)) η(k),    ν = ∑_{k=0}^∞ (1 − σ(k)) η(k).

Evidently, μ is the fraction of agents who do not request service, and ν is the
fraction of agents who do not serve (assuming they follow the protocol).
To determine the token distribution next period, it is convenient to work
backwards and ask how an agent could come to have k tokens in the next period.
Given the protocol Π the (feasible) token distribution η is invariant if η+ = η;
that is, η is stationary when agents comply with the recommendation (σ, τ ).
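To illustrate, the sketch below iterates the token-holding chain induced by a threshold-K protocol (serve iff k < K, request iff k ≥ 1) until η₊ = η. The one-step transition probabilities are our reading of the matching model, and the starting distribution (with mean K/2 tokens per capita) is an arbitrary choice.

```python
import numpy as np

def invariant_distribution(K, eta_init, rho=0.5, tol=1e-12):
    # Iterate eta -> eta_+ under compliance with the threshold-K protocol:
    # earn a token w.p. rho*(1-mu) if k < K, spend one w.p. rho*(1-nu) if k >= 1
    eta = np.array(eta_init, dtype=float)
    for _ in range(500_000):
        mu, nu = eta[0], eta[K]             # non-requesters / non-servers
        up, dn = rho * (1 - mu), rho * (1 - nu)
        new = eta.copy()
        new[:-1] -= up * eta[:-1]; new[1:] += up * eta[:-1]   # serve, earn token
        new[1:]  -= dn * eta[1:];  new[:-1] += dn * eta[1:]   # buy, spend token
        if np.abs(new - eta).max() < tol:
            return new
        eta = new
    return eta

K = 3
start = np.array([0.0, 0.5, 0.5, 0.0])      # mean K/2 tokens per capita
eta = invariant_distribution(K, start)
mu, nu = eta[0], eta[K]
print("stationary eta:", np.round(eta, 4))
# at the fixed point, eta_0 and eta_K balance: x1(1-x1)^K = x2(1-x2)^K
print(mu * (1 - mu) ** K, nu * (1 - nu) ** K)
```

Note that the iteration conserves the per-capita token supply, which is why the supply acts as a design parameter of the invariant distribution.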
Eff (Π|b, c, β) = (1 − μ) (1 − ν)
Taking into account that impatient agents will comply with the protocol if and
only if it is in their interest to do so, the protocol needs to be an
equilibrium given the system parameters. Formally, the design problem is thus
to choose the protocol

Π∗ = arg max_{Π : (β, r) ∈ Φ(Π)} Eff(Π|β, r).
3 Equilibrium Strategies
The space of candidate protocols is enormous, so directly optimizing the
efficiency is intractable. Therefore, we explore whether the optimal strategies
have special structure that may simplify the system design.
Proposition 1. Given b, c, β, μ, ν,
1. The optimal client strategy τ is τ (k) = 1 for every k ≥ 1; that is, “always
request service when possible”.
2. The optimal server strategy σ has a threshold property; that is, there exists
K such that σ(k) = 1, ∀k < K and σ(k) = 0, ∀k ≥ K.
Proof. 1. Suppose there are some b, c, β, μ, ν such that τ(k) < 1. If this
client strategy is optimal, it implies that the marginal value of the kth token
is at least b/β, i.e., V(k) − V(k − 1) ≥ b/β > b. Consider any realized
continuation history following the decision period. We estimate the loss in
expected utility from having one less token. Because there is only one
deviation, in the initial time period, the subsequent behaviors are exactly the
same. The only difference occurs the first time the token holding drops to 0
when the agent is supposed to buy. At this moment, the agent cannot buy and
loses the benefit b. Therefore, the loss in utility is β^t b for some t ≥ 1
depending on the specific realized history. Because this analysis is valid for
all possible histories, the expected loss in utility is strictly less than b.
This contradicts the optimality condition. Hence, it is always optimal for the
agent to spend a token if possible.
2. (sketch) Based on the result of part 1, we study an arbitrary server
strategy σ. The utilities of holding different numbers of tokens are
interdependent:

V(0) = σ(0) ρ (1 − μ) (−c + βV(1)) + [1 − σ(0) ρ (1 − μ)] βV(0)

V(k) = σ(k) ρ (1 − μ) (−c + βV(k + 1)) + ρ (1 − ν) (b + βV(k − 1))
       + [1 − σ(k) ρ (1 − μ) − ρ (1 − ν)] βV(k),    ∀k = 1, 2, ..., K − 1

V(k) = ρ (1 − ν) (b + βV(k − 1)) + [1 − ρ (1 − ν)] βV(k),    ∀k = K, K + 1, ...
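These interdependent values can be computed by value iteration on a truncated token space; the numbers below (and the choice of μ, ν) are arbitrary illustrations, not calibrated to anything in the paper.

```python
import numpy as np

# Illustrative parameters (arbitrary choices)
b, c, beta, rho, K = 1.0, 0.4, 0.9, 0.5, 3
mu, nu = 0.25, 0.25            # assumed fractions not requesting / not serving
kmax = K + 30                  # truncate the (unbounded) token space

V = np.zeros(kmax + 1)
for _ in range(2000):          # value iteration: a beta-contraction
    W = np.empty_like(V)
    W[0] = rho*(1 - mu)*(-c + beta*V[1]) + (1 - rho*(1 - mu))*beta*V[0]
    for k in range(1, kmax):
        s = 1.0 if k < K else 0.0          # threshold-K server strategy
        W[k] = (s*rho*(1 - mu)*(-c + beta*V[k + 1])
                + rho*(1 - nu)*(b + beta*V[k - 1])
                + (1 - s*rho*(1 - mu) - rho*(1 - nu))*beta*V[k])
    W[kmax] = rho*(1 - nu)*(b + beta*V[kmax - 1]) + (1 - rho*(1 - nu))*beta*V[kmax]
    if np.abs(W - V).max() < 1e-13:
        break
    V = W
V = W

# one-shot deviation margins for serving at k < K: -c + beta*V[k+1] - beta*V[k]
print([round(-c + beta*V[k + 1] - beta*V[k], 4) for k in range(K)])
```

If the printed margins are all nonnegative, serving below the threshold satisfies the one-shot deviation conditions for these particular parameters.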
only for β^H. To see that such a β^H exists, we prove that G(β) is strictly
increasing in β, G(β^L) < 0, and G(1) > 0. Therefore, there must exist a
non-degenerate interval [β^L, β^H] that makes a pure threshold strategy an
equilibrium.
Proof. (sketch) The proof is similar to the proof of Theorem 2 but this time
we write F (r) = M (K − 1|r) − c/β and G(r) = M (K|r) − c/β as functions
of r. Using similar arguments, we can show that F (r) ≥ 0, ∀r ∈ (rL , ∞) and
G(r) < 0, ∀r ∈ (rL , rH ) and rL < rH .
4 Protocol Design
We will use it in determining the optimal token supply in the next subsection.
maximize (1 − x1)(1 − x2) = 1 − x1 − x2 + x1x2
subject to x1(1 − x1)^K = x2(1 − x2)^K,
0 ≤ x1, x2 ≤ 1.
To solve this problem, set f(x) = x(1 − x)^K; a straightforward calculus
exercise shows that if 0 ≤ x1 ≤ 1/(K + 1) ≤ x2 ≤ 1 and f(x1) = f(x2), then
(a) x1 + x2 ≥ 2/(K + 1), with equality achieved only at x1 = x2 = 1/(K + 1);
(b) x1x2 ≤ 1/(K + 1)², with equality achieved only at x1 = x2 = 1/(K + 1).
Putting (a) and (b) together shows that the optimal solution to the
maximization problem is to have x1 = x2 = 1/(K + 1), and the maximized objective
function value is

max (1 − x1)(1 − x2) = (1 − 1/(K + 1))².
Now consider the threshold-K strategy and let η be the corresponding invariant
distribution. If we take x1 = ηo, x2 = ηd, then our characterization of the
invariant distribution shows that f(x1) = f(x2). By definition,
Eff = (1 − x1)(1 − x2), so

Eff ≤ (1 − 1/(K + 1))².

Taken together, these are the assertions which were to be proved.
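Facts (a) and (b), namely x1 + x2 ≥ 2/(K + 1) and x1x2 ≤ 1/(K + 1)² along the constraint f(x1) = f(x2), can be spot-checked numerically:

```python
from scipy.optimize import brentq

K = 3
f = lambda x: x * (1 - x) ** K        # maximized at x = 1/(K+1)
xstar = 1 / (K + 1)

for x1 in (0.05, 0.10, 0.20):         # points below the maximizer
    # matching point x2 >= 1/(K+1) with f(x1) = f(x2)
    x2 = brentq(lambda x: f(x) - f(x1), xstar, 1 - 1e-12)
    assert x1 + x2 >= 2 / (K + 1) - 1e-10       # fact (a)
    assert x1 * x2 <= 1 / (K + 1) ** 2 + 1e-10  # fact (b)
    print(round(x1, 3), round(x2, 3), round((1 - x1) * (1 - x2), 4))

print("upper bound:", (1 - 1 / (K + 1)) ** 2)   # = 0.5625 for K = 3
```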
Proof. (sketch) We prove the first part; the second part is proved similarly.
Consider two protocols Π1 = (K/2, σK) and Π2 = ((K + 1)/2, σK+1) which have
consecutive thresholds. The corresponding intervals of discount factors that
sustain equilibrium are [β1^L, β1^H] and [β2^L, β2^H]. We assert that

β1^L < β2^L < β1^H,    β2^L < β1^H < β2^H.

In words, the sustainable ranges of the discount factors of two consecutive
threshold protocols overlap. To see this, arithmetical exercises show that
MΠ1(K|β2^L) > c/β2^L, which leads to β2^L > β1^L, and MΠ2(K|β1^H) > c/β1^H,
which leads to β2^L < β1^H. The assertion follows immediately by combining this
overlapping result with Proposition 4.
5 Simulations
In Fig. 1 we illustrate the sustainable region of the pair (β, r) of the
discount factor and the benefit/cost ratio for various threshold protocols.
For a larger threshold to be an equilibrium, larger discount factors or larger
benefit/cost ratios are required. Moreover, fixing one of β and r, for a given
threshold there is always a continuous interval of the other parameter that
makes the threshold protocol an equilibrium.
[Fig. 1: sustainable (β, r) regions for threshold protocols K = 1, ..., 5; horizontal axis: discount factor β; vertical axis: benefit/cost ratio r = b/c.]
[Fig. 2: efficiency versus discount factor β.]
Fig. 2 shows the efficiency of an optimal equilibrium protocol and of a
fixed-threshold protocol. First, the optimal system efficiency goes to 1 as the
agents become sufficiently patient (β → 1). Second, the figure compares the
achievable efficiency with the efficiency of a protocol whose strategic
threshold is constrained to be K = 3. The enormous efficiency loss induced by
choosing the wrong protocol supports our emphasis on designing the system in
accordance with the system parameters.
6 Conclusions
In this paper, we designed token-based protocols (a supply of tokens and
recommended strategies) to encourage cooperation in online exchange systems
where a large population of anonymous agents interact with each other. We
focused on pure-strategy equilibria and proved that only threshold strategies
can emerge in equilibrium. With these threshold structural results in mind, we
showed that there also exists a unique optimal quantity of tokens that
maximizes the efficiency given the threshold. It balances the population in
such a way that there are not too many agents who do not serve or too many
agents who cannot pay with tokens. Moreover, the proposed protocols
asymptotically achieve full efficiency when the agents become perfectly patient
or the benefit/cost ratio goes to infinity. This paper characterizes the
performance of online exchange systems operated on tokens and emphasizes the
importance of a proper token protocol. Importantly, the token supply serves as
a critical design parameter that needs to be well understood based on the
intrinsic environment parameters.
258 J. Xu, W. Zame, and M. van der Schaar
Towards a Metric for Communication Network
Vulnerability to Attacks: A Game Theoretic
Approach
1 Introduction
“...one cannot manage a problem if one cannot measure it...”
V. Krishnamurthy et al. (Eds.): GameNets 2012, LNICST 105, pp. 259–274, 2012.
c Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012
260 A. Gueye, V. Marbukh, and J.C. Walrand
1 Throughout this paper we call the defender a “network manager”. The defender can
be a human or an automaton that implements the game.
Metric for Communication Network Vulnerability to Attacks 261
When communication is carried over a spanning tree, any node can reach any
other node. In that sense, a spanning tree can be said to deliver the maximum
value of the network (indeed this ignores the cost of communication). This value
can be determined by using one of the models cited above. Now, assuming that
information flows over a given spanning tree, two scenarios are possible when a
link of the network fails.
If the link does not belong to the spanning tree, then its failure does not affect
the communication. If, on the other hand, the link belongs to the spanning tree,
then the spanning tree is separated into two subtrees, each of which is a
connected subnetwork that delivers some value. However, the sum of the
values delivered by the two subnetworks is expected to be less than the value
of the original network. We define the importance of the link, relative to the
spanning tree, as this loss-in-value (LIV) due to the failure of the link.
Link failures may occur because of random events (faults) such as human
errors and/or machine failures: this is dealt with under the subject of reliability
and fault tolerance [12]. They also can be the result of the action of a malicious
attacker whose goal is to disrupt the communication. It is this type of failure that
is the main concern of this paper. A network manager (defender) would like to
avoid this disruption by choosing an appropriate communication infrastructure.
We model this scenario as a 2-player game where the defender is choosing a
spanning tree to carry the communication in anticipation of an intelligent attack
by a malicious attacker who is trying to inflict the most damage. The adversary
also plans in anticipation of the defense. We use the links’ LIV discussed above
to derive payoffs for both players.
Applying game-theoretic models to security problems is natural, and it has recently attracted a lot of interest (see the surveys [18], [11]). In this paper,
we set up a game on the graph of a network and consider the Nash equilibrium
concept. We propose the expected LIV of the game for the network manager
as a metric for vulnerability. This value captures how much loss an adversary
can inflict on the network manager by attacking links. By analyzing the Nash
equilibria of the game, we determine the actions of both the attacker and the
defender. The analysis reveals the existence of a set of links that are most critical
for the network. We identify the critical links and compare them for the different
network value models cited above. The comparison shows that the set of critical
links depends on the value model and on the connectivity of the network.
In the process of quantifying the importance of a communication link, we
propose a generalization of the notion of betweenness centrality, which, in its
standard form, is defined with respect to shortest paths ([6]). We consider networks where information flows over spanning trees; hence, we use spanning trees
in lieu of paths. Our generalization allows both the consideration of arbitrary
(instead of binary) weights of the links as well as preference for spanning tree
utilization.
The remainder of this paper is organized as follows. The next section
discusses the different network value models that we briefly introduced above.
We use these models to compute the relative importance of the links with respect
Sarnoff’s Law:
Sarnoff’s law [1] states that the value of a broadcast network is proportional to
the number of users (O(n)). This law was mainly designed for radio/TV broad-
cast networks where the popularity of a program is measured by the number of
listeners/viewers. The high advertising cost during prime time shows and other
popular events can be explained by Sarnoff’s law. Indeed as more viewers are
expected to watch a program, a higher price is charged per second of advertising.
Although Sarnoff’s law has been widely accepted as a good model for broadcast
networks, many critics say that it underestimates the value of general communication networks such as the Internet.
Metcalfe’s Law:
Metcalfe’s law [5] was first formulated by George Gilder (1993) and attributed
to Robert Metcalfe who used it mostly in the context of the Internet. The law
states that the value of a communication network is proportional to the square of
the number of nodes. Its foundation is the observation that in a general network
with n nodes, each node can establish n − 1 connections. As a consequence, the
total number of undirected connections is equal to n(n − 1)/2 ∼ O(n^2). This
observation is particularly true in Ethernet networks where everything is “logically” connected to everything else. Metcalfe’s law has long been held up
alongside Moore’s law as a foundation of Internet growth.
Walrand’s Law:
Walrand’s law generalizes the previous laws by introducing a parameter a. The
intuition behind this law is as follows. Imagine a large tree of degree d that is
rooted at you. Your direct children in the tree are your friends. The children
of these children are the friends of your friends, and so on. Imagine that there
are L ≥ 2 levels. The total number of nodes is n = d(d^L − 1)/(d − 1) + 1. If
d is large, this number can be roughly approximated by n ≈ d^L. Assume that
you only consider your direct friends, i.e., about d people. Then the value of the
network to you is O(d) = O(n^a) where a = 1/L. If you care about your friends
and their friends (i.e., d^2 people), then your value of the network is O(n^{2/L}). If all
the nodes up to level l ≤ L are important to you (d^l nodes), then the network
has a value of O(n^{l/L}) to you. Repeating the same reasoning for each user (node), the
total value of the network is approximately equal to O(n · n^a) = O(n^{1+a}), with
0 < a ≤ 1. The parameter a is a characteristic of the network and needs to be
determined. Notice that if all nodes value children at all levels (a = 1), the total value
of the network becomes n^2, which corresponds to Metcalfe’s law. If,
on the other hand, a = 0, we get back Sarnoff’s model.
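As a quick numerical check of these limiting cases, the sketch below (an illustrative toy computation, not part of the original analysis) evaluates the Walrand value n^{1+a}:

```python
def walrand_value(n, a):
    """Walrand's law: total network value grows as n**(1 + a), 0 <= a <= 1."""
    return n ** (1 + a)

n = 100
sarnoff = walrand_value(n, 0)    # a = 0: value = n, i.e., Sarnoff's law
metcalfe = walrand_value(n, 1)   # a = 1: value = n**2, i.e., Metcalfe's law
print(sarnoff, metcalfe)
```

Intermediate values 0 < a < 1 interpolate between the two laws.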
Reed’s Law:
Reed’s law, also called the Group-Forming law, was introduced by David Reed
([16], [4], [17]) to quantify the value of networks that support the construction
of communicating groups. A group-forming network resembles a network with
smart nodes that, on demand, form into such groups. Indeed, the number
of possible groups that can be formed over a network of n nodes is O(2^n). Reed’s
law has been used to explain many new social network phenomena. Important
messages posted on social networking platforms such as Twitter and Facebook
have been observed to spread exponentially fast.
λ(T, e) = 1 − (f(n1) + f(n2))/f(n).    (1)
Under the GWA model, λ(T, e) equals 1 if link e belongs to the spanning tree and 0 otherwise (i.e., λ(T, e) = 1_{e∈T}). The
model basically assumes that whenever a link on the spanning tree is removed
(i.e., successfully attacked, hence disconnecting the network), the network
loses its entire value.
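For illustration, equation (1) can be evaluated directly for any of the value models; the sketch below (our own toy code, with the value functions f instantiated as in the laws above) computes the normalized LIV of a tree link:

```python
def liv(f, n1, n2):
    """Normalized loss-in-value (eq. 1) when removing a spanning-tree link
    splits the n-node network into components of n1 and n2 nodes."""
    n = n1 + n2
    return 1.0 - (f(n1) + f(n2)) / f(n)

metcalfe = lambda n: n ** 2      # Metcalfe: f(n) = n^2
walrand = lambda n: n ** 1.6     # Walrand with a = 0.6 (illustrative choice)

print(liv(metcalfe, 5, 5))   # even split of a 10-node network
print(liv(metcalfe, 1, 9))   # uneven split loses less value
print(liv(walrand, 5, 5))
```

Note that, for a fixed total n, the loss is largest for the even split, as discussed later in the paper.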
Table 1 shows the LIV of links for the different models presented above (with Sarnoff
replaced by GWA). It is assumed that removing link e divides spanning tree T
into two subtrees with n1 and n2 nodes, respectively (n1 + n2 = n).
Table 1. Normalized LIV of link e relative to spanning tree T for the different laws.
Removing link e from spanning tree T divides the network into two subnetworks with
respective n1 and n2 nodes (n1 + n2 = n).
Walrand   1 − (n1^{1+a} + n2^{1+a})/n^{1+a}
where the summation is now over spanning trees. The parameter λ(T, e) is the
weight of link e for spanning tree T, and α(T) is the probability (preference)
of using T as the communication infrastructure.
In general, λ and α can be determined by considering relevant aspects of the
communication network (e.g. cost of utilizing the links, overall communication
delay, vulnerability of links). In this paper, the parameters λ are chosen to be
equal to the LIV of the links relative to spanning trees, and α is chosen to be
the mixed strategy Nash equilibrium in a game between a network manager and
an attacker. Details of the game are presented next.
In this paper, we have focused on the case where η(T) = η is constant; hence it is not
relevant to the optimization of L(α, β), which now becomes the minimization
of Σ_{T∈T} α_T Σ_{e∈T} β_e λ(T, e). As a consequence, we ignore η(T) for the rest of
this paper. The general case of η(T) will be considered in subsequent studies.
κ : 2^E −→ R_+
E −→ κ(E) = min{ 1'y : y ∈ {ỹ ∈ R^m_+ | Λ_E ỹ ≥ 1} }.    (6)
κ(E) is the value of a linear program (LP) that might be infeasible (e.g. when
a row of ΛE is all zeros). However, its dual is always feasible (see [9, App.E]),
and when the dual LP is bounded, the primal is necessarily feasible [2]. Let yE
be a solution of the primal program whenever the dual LP is bounded. If this
dual is unbounded for some subset E, we let y_E = K·1_m, for an arbitrarily large
constant K, where m = |E| and 1_m is the all-ones vector of length m. With this
“fix”, κ(E) = mK when the dual LP is unbounded. Hence, we can define the
following quantities.
Definition 2. The probability distribution induced by E is defined as β E =
yE /κ(E).
The induced expected net reward θ(E) and the maximum induced expected net
reward θ∗ are defined by
θ(E) := 1/κ(E) − Σ_{e∈E} β_E(e) μ(e),  and  θ∗ := max_E θ(E).    (7)
We call a subset E critical if θ(E) = θ∗ and we let C be the set of all critical
subsets.
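For a toy instance, the program in (6) can be handed to an off-the-shelf LP solver. The sketch below is our own illustration, not the authors' code; it assumes the rows of Λ_E are indexed by spanning trees and its columns by the links of E, with GWA weights λ(T, e) = 1_{e∈T}, and computes κ(E), β_E, and θ(E) for the full edge set of a triangle graph with μ = 0:

```python
from scipy.optimize import linprog

# Triangle graph: 3 edges, 3 spanning trees (each pair of edges).
# Lam[t][e] = 1 if edge e belongs to spanning tree t (GWA weights).
Lam = [[1, 1, 0],
       [1, 0, 1],
       [0, 1, 1]]

# kappa(E) = min 1'y  s.t.  Lam y >= 1, y >= 0   (eq. 6, as >= becomes -Lam y <= -1)
res = linprog(c=[1, 1, 1],
              A_ub=[[-a for a in row] for row in Lam],
              b_ub=[-1, -1, -1])
kappa = res.fun
beta = res.x / kappa        # induced distribution beta_E (Definition 2)
theta = 1 / kappa           # expected net reward (eq. 7) with mu = 0
print(kappa, beta, theta)
```

For this symmetric instance the unique optimum is y = (1/2, 1/2, 1/2), so κ(E) = 3/2, β_E is uniform over the three links, and θ(E) = 2/3.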
Remark 2. – In our online report [9, App.E], we argue that a critical subset E
is such that 0 < κ(E) < ∞, hence its corresponding yE and β E are always
well-defined.
– With the definition of κ(·), if μ = 0 and a subset E of links is critical, then any
subset F ⊇ E is also critical. In this case, the most critical subset is the critical
subset with the minimum size. More details about this can be found in [9].
Theorem 1. For the game defined above, the following always hold.
1. If θ∗ ≤ 0, then “No Attack” (i.e. β(e∅ ) = 1) is always an optimal strategy
for the attacker. In this case, the equilibrium strategy (αT , T ∈ T ) for the
defender is such that
ϑ(e, λ, α) = Σ_{T∈T} α_T λ(T, e) ≤ μ(e),  ∀e ∈ E.    (8)
The NE theorem has three parts. If the quantity θ∗ is negative then the attacker
has no incentive to attack. For such choice to hold in an equilibrium, the defender
has to choose his strategy α as given in (8). Such α always exists. When θ∗ ≥ 0
there exists an equilibrium under which the attacker launches an attack that
focuses only on edges of critical subsets. The attack strategies (probability of
attack of the links) are given by convex combinations of the induced distributions
of critical subsets. The corresponding defender’s strategies are given by (9).
When there is no attack cost, the attacker always launches an attack (θ∗ > 0),
and the theorem states that all Nash equilibria of the game have the structure
in (9).
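The no-attack condition (8) is straightforward to check numerically. The sketch below (a toy example of ours: a triangle graph with GWA weights, a uniform defender strategy, and hypothetical attack costs μ) evaluates ϑ(e, λ, α) for each link:

```python
# Triangle graph: spanning trees are the 2-edge subsets of links {0, 1, 2}.
trees = [{0, 1}, {0, 2}, {1, 2}]
alpha = [1 / 3] * 3                 # uniform defender mixed strategy
mu = {0: 0.7, 1: 0.7, 2: 0.7}       # hypothetical attack costs

def vartheta(e):
    """vartheta(e, lambda, alpha) = sum_T alpha_T * lambda(T, e), as in (8),
    with GWA weights lambda(T, e) = 1 if e in T else 0."""
    return sum(a for a, T in zip(alpha, trees) if e in T)

for e in (0, 1, 2):
    print(e, vartheta(e), vartheta(e) <= mu[e])
```

Here ϑ(e, λ, α) = 2/3 for every link, so with μ(e) = 0.7 the condition in (8) holds and "No Attack" is sustainable in this toy instance.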
For simplicity, let us first assume that there is no attack cost, i.e., μ = 0. In this case,
θ(E) = 1/κ(E) and θ∗ > 0. Also, a subset of links E is critical if and only if κ(E)
is minimal. Since in this case the game is zero-sum, the defender’s expected loss
is also θ∗ = 1/(min_E κ(E)). θ∗ depends only on the graph and the network value
model (f(n)). It measures the worst-case loss/risk that the network manager is
expecting in the presence of any (strategic) attacker. Notice that in our setting,
a powerful attacker is one who does not have a cost of attack (i.e. μ = 0). When
θ∗ is high, the potential loss in connectivity is high. When it is low, an attacker
has very little incentive, hence the risk from an attack is low. Hence, θ∗ can be
used as a measure of the risk of disconnectivity in the presence of a strategic
attacker. A graph with a high θ∗ is a very vulnerable one.
This vulnerability metric also corresponds to a quantification of the importance of the most critical links. This is captured by the inequalities in (9), which,
when μ = 0, become
ϑ(e, λ, α) ≤ θ∗,  ∀e ∈ E,    (10)
with equality whenever link e is targeted with positive probability (β(e) > 0) at
equilibrium. From (9) we see that β(e) > 0 only if edge e belongs to a critical
subset, and hence is critical. Thus, the attacker focuses its attack only on critical
links, which inflict the maximum loss to the defender.
For the defender, since the game is zero-sum, the Nash equilibrium strategy
corresponds to the min-max strategy. In other words, his choice of α minimizes
the maximum expected loss. Hence, the defender’s equilibrium strategy α can
be interpreted as the best way (in the min-max sense) to choose a spanning
tree in the presence of a strategic adversary. Using this interpretation with our
generalization of betweenness centrality in (2), we get a way to quantify the
importance of the links to the overall communication process. The inequalities
in (10) above say that the links that are the most important to the defender
(i.e. with maximum ϑ(e, λ, α)) are the ones that are targeted by the attacker
(the most critical). This unifies the positive view of the importance of links, when
it comes to participation in the communication process, with the negative view
of criticality, when it comes to being the target of a strategic adversary. This
is not surprising: since the attacker’s goal is to cause the maximum
damage to the network, it makes sense that she targets the most important
links.
When the cost of attack is not zero (μ = 0), our vulnerability metric θ∗ takes
it into account. For instance, if the attacker has to spend too much effort to
successfully launch an attack, to the point where the expected net reward θ∗ is
negative, the theorem tells us that, unsurprisingly, the attacker will choose not to
launch an attack. To “force” the attacker to hold to this choice (i.e., to maintain
the equilibrium), the defender has to randomly pick a spanning tree according
to (8). With this choice, the relative value of any link is less than the amount of
effort needed to attack it (which means that any attack will result in a negative
net payoff to the attacker). When μ is known, such a choice of α can be seen as
a deterrence tactic for the defender.
If the vulnerability θ∗ is greater than zero, then there exists an attack strategy
that only targets critical links. To counter such an attack, the defender has to draw
a spanning tree according to the distribution α in (9). For such a choice of tree,
the relative importance of any critical link, offset by the cost of attacking the
link, is equal to θ∗ . For any other link, this difference is less than θ∗ . In this
case, the criticality of a link is determined not only by how much importance
it has for the network, but also by how much it would take for the adversary to
successfully attack it. Hence, when μ ≥ 0, θ∗ is a measure of the willingness of
an attacker to launch an attack. It includes the loss-in-value for the defender as
well as the cost of attack for the attacker.
Observe that when μ > 0, the theorem does not say anything about the
existence of other Nash equilibria. It is our conjecture (verified in all simulations)
that even if there were other equilibria, θ∗ is still the maximum payoff that the
attacker could ever receive. Hence, it measures the worst case scenario for the
defender.
Fig. 2. Example of critical subsets for different value models. a) GWA model b) BOT,
Walrand, and Metcalfe’s models. c) Reed’s model.
In this section we discuss how the critical subsets depend on the model used
for the value of the network. Figure 2 shows an example network with the
critical subsets for the different value models discussed earlier. The example
shows a “core” network (i.e., the inner links) and a set of bridges connecting it to
peripheral nodes. A bridge is a single link whose removal disconnects the
network. In all figures, the critical subset of links is shown with dashed lines. In
this discussion we mainly assume that the attack cost μ is equal to zero.
Figure 2.a shows the critical subset corresponding to the GWA link cost model
introduced in [10], for which λ(T, e) = 1_{e∈T}. With this model, the defender loses
everything (i.e., 1) whenever the attacked link belongs to the chosen spanning
tree. Since a bridge is contained in every spanning tree, attacking a bridge gives
the maximum outcome to the attacker. As a consequence, the critical subsets
correspond to the set of bridges, as can be observed in the figure. In fact, with
the GWA value model and Definition 1 of [10], one can easily show that
κ(E) = |E|/M(E), where M(E) = min_T(|T ∩ E|). Notice that if E is a disconnecting
set (i.e., removing the edges in E divides the graph into 2 or more connected
components), then M(E) ≥ 1. Now, if e is a bridge, |T ∩ {e}| = 1 for all spanning
trees T, implying that M({e}) = 1 and θ({e}) = κ({e}) = 1, which is the
maximum possible value of θ∗. As a consequence, each bridge is a critical subset
and any convex combination over the bridges yields an optimal attack.
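The bridge argument can be verified by brute force using the closed form κ(E) = |E|/M(E). The sketch below (our own illustration on a hypothetical four-node graph, a triangle plus one bridge) enumerates spanning trees and evaluates θ(E) = M(E)/|E|, i.e., the μ = 0 case:

```python
from itertools import combinations

nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle {0,1,2} plus bridge (2,3)

def spanning_trees():
    """Brute force: every (|V|-1)-edge acyclic subset is a spanning tree."""
    trees = []
    for cand in combinations(edges, len(nodes) - 1):
        parent = {v: v for v in nodes}     # union-find forest
        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v
        acyclic = True
        for u, v in cand:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False            # edge closes a cycle
                break
            parent[ru] = rv
        if acyclic:
            trees.append(set(cand))
    return trees

def theta(E):
    """theta(E) = M(E)/|E| = 1/kappa(E) for the GWA model with mu = 0."""
    M = min(len(T & set(E)) for T in spanning_trees())
    return M / len(E)

print(theta([(2, 3)]))                     # the bridge attains the maximum
print(theta([(0, 1), (1, 2), (0, 2)]))     # the triangle edges
```

The bridge yields θ = 1, the maximum possible value, while the set of triangle edges yields only θ = 2/3, consistent with bridges being critical.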
Figure 2.b depicts the critical subsets with the Metcalfe, BOT, and Walrand
(a = 0.6) models. For all these models (as well as for Reed’s model), the function
f (x) − (f (x1 ) + f (x2 )), where x1 + x2 = x, is maximized when x1 = x2 = x/2.
This suggests that attacks targeting links that evenly divide (most) spanning
trees are optimal. This conjecture “seems” to be confirmed by the examples
shown in the figure. The most critical links are the innermost or core links of
the network for all three models. The Nash equilibrium attack distributions are
slightly different for the three models. The distribution on links (1, 2, 3, 4, 5) is given
in Table 2 for the Metcalfe, BOT, and Walrand (a = 0.6) models. Notice that for all
models, the middle link (2) is attacked with a higher probability.
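The even-split property behind these attacks is easy to confirm numerically; the sketch below (our own toy check, not from the paper) maximizes the attacker's gain f(n) − f(n1) − f(n − n1) over n1:

```python
def gain(f, n, n1):
    """Attacker's gain when a cut splits n nodes into n1 and n - n1."""
    return f(n) - f(n1) - f(n - n1)

n = 12
for f in (lambda x: x ** 2,        # Metcalfe
          lambda x: x ** 1.6):     # Walrand with a = 0.6
    best = max(range(1, n), key=lambda n1: gain(f, n, n1))
    print(best)   # the even split n1 = 6 maximizes the gain
```

For any strictly convex value function f, the maximizer is the even split n1 = n/2, which is why links that evenly divide spanning trees are attractive targets.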
Table 2. Attack probabilities on links (1, 2, 3, 4, 5) for Metcalfe, BOT, and Walrand
models
Although Reed’s (exponential) model also has the same property discussed in
the previous paragraph, the critical subset with Reed is different, as can be seen
in figure 2.c. While Metcalfe, BOT, and Walrand models lead to the core network
being critical, with Reed’s model, the critical links are the links giving access to
the core network. Each of the links is attacked with the same probability. This
might be a little surprising because it contradicts the conjecture that innermost
links tend to be more critical. However, observing the attacker’s reward function
1 − (f(n1) + f(n − n1))/f(n) as shown in Figure 3, Reed’s model coincides with the GWA
model in a wide range of n1. This means that any link that separates (most of)
the spanning trees into subtrees of n1 and n − n1 nodes gives the maximum reward
to the attacker, for most values of n1. Also, notice that since the core network is
“well connected”, the defender has many options for choosing a spanning tree.
This means that in the core, the attacker has less chance of disrupting the
communication. Links accessing the core, on the other hand, deliver high gain
and better chances of disrupting the communication. Hence, the best strategy
for the attacker is, in this case, to target access to the core. Notice that the Metcalfe,
BOT, and Walrand (a ≤ 1) models do not exhibit this optimal tradeoff.
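This near-coincidence can be seen numerically; the sketch below (our own illustration for a hypothetical 20-node network) compares the reward 1 − (f(n1) + f(n − n1))/f(n) under Reed's and Metcalfe's models:

```python
n = 20

def reward(f, n1):
    """Attacker's reward, eq. (1) style, for a split into n1 and n - n1 nodes."""
    return 1 - (f(n1) + f(n - n1)) / f(n)

reed = [reward(lambda x: 2 ** x, n1) for n1 in range(1, n)]      # f(n) = 2^n
metcalfe = [reward(lambda x: x ** 2, n1) for n1 in range(1, n)]  # f(n) = n^2

# Reed's reward stays close to 1 (the GWA value) over a wide middle range
# of n1, while Metcalfe's reward peaks sharply at the even split n1 = 10.
print(max(reed), max(reed) - min(reed[5:14]))
print(max(metcalfe))
```

Under Reed's model the reward is nearly flat and close to 1 for splits in the middle range, so nearly any disconnecting link gives the attacker close to the maximum reward, much as under the GWA model.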
By choosing the parameter a to be sufficiently large in the Walrand model,
we have (experimentally) observed that the critical subset moves from being the
core to corresponding to the one of the GWA model (the bridges) for very large
values of a. In fact, with all network topologies we have considered (more than
50), we could always choose the parameter of the Walrand model so that the critical
subset matches the one of the GWA model. This implies that as the model loss
function 1 − (f(n1) + f(n − n1))/f(n) gets closer to the GWA function 1_{e∈T}, the critical
subset moves away from the inner links to the outer links.
These observations indicate that the critical subsets of a graph depend on the
value model used to set up the game. The value model is, however, not the only
factor that characterizes the critical subset(s) of a graph. Figure 4 shows the
same network as in the previous example with one additional (core) link. With
this link, the connectivity of the network is enhanced. The critical subset does
not change for the GWA model. However, for the four other models, the critical
subset is now the access to the core. This suggests that connectivity is another
factor that characterizes the critical subset(s).
As was observed (with simulations) in the previous example, in this case also,
when the parameter a of Walrand’s model is chosen sufficiently large, the critical
subsets become the same as the GWA critical subsets.
Fig. 4. Example of critical subsets for different value models. a) GWA model b) BOT,
Walrand, Metcalfe and Reed’s models.
References
1. Stavridis, J.: Channeling David Sarnoff (September 2006),
http://www.aco.nato.int/saceur/channeling-david-sarnoff.aspx
2. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press
(March 2004)
3. Briscoe, B., Odlyzko, A., Tilly, B.: Metcalfe’s Law is Wrong. IEEE Spectrum, 26–31
(July 2006)
4. Marketing Conversation. Reeds Law States that Social Networks Scale Ex-
ponentially (August 2007), http://marketingconversation.com/2007/08/28/
reeds-law/
5. Marketing Conversation. A Short discussion on Metcalfe’s Law for Social Networks
(May 2008), http://marketingconversation.com/2007/08/28/reeds-law/
6. Freeman, L.: Centrality in Social Networks Conceptual Clarification. Social Net-
works 1(3), 215–239 (1979)