This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2019.2927314, IEEE
Transactions on Mobile Computing
IEEE TRANS. ON MOBILE COMPUTING, VOL. XX, NO. XX, XX XX 1
Abstract—The explosive increase of mobile devices with built-in sensors such as GPS, accelerometer, gyroscope and camera has
made the design of mobile crowdsensing (MCS) applications possible, creating a new interface between humans and their
surroundings. To date, various MCS applications have been designed in which task initiators (TIs) recruit mobile users (MUs) to
complete the required sensing tasks. In this paper, deep reinforcement learning (DRL) based techniques are investigated to address
the problem of assigning satisfactory yet profitable incentives to multiple TIs and MUs in a MCS game. Specifically, we first
formulate the problem as a multi-leader multi-follower Stackelberg game, where the TIs are the leaders and the MUs are the followers.
Then, the existence of the Stackelberg Equilibrium (SE) is proved. Considering the challenge of computing the SE, a DRL based
Dynamic Incentive Mechanism (DDIM) is proposed. It enables the TIs to learn the optimal pricing strategies directly from game
experiences without knowing the private information of the MUs. Finally, numerical experiments are provided to illustrate the
effectiveness of the proposed incentive mechanism compared with both state-of-the-art and baseline approaches.
Index Terms—Incentive mechanism, Multi-leader multi-follower mobile crowdsensing, Stackelberg Equilibrium, Deep reinforcement
learning
1536-1233 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
TABLE 1
List of important notations used in this paper.

n, N : Index of a MU; number of MUs.
m, M : Index of a TI; number of TIs.
t_m^n, t_m, t^n : Sensing time MU n spends for TI m; vector of sensing time MUs spend for TI m; vector of sensing time MU n spends for all TIs.
(t_m^n)*, t_m^*, (t^n)* : Optimum of t_m^n, t_m, t^n.
p_m^n, p_m, p^n : Price TI m determines for MU n; vector of prices TI m determines for all MUs; vector of prices all TIs pay for MU n.
(p_m^n)*, p_m^*, (p^n)* : Optimum of p_m^n, p_m, p^n.
C_m^n(t_m^n) : Cost of MU n for implementing TI m's task.
κ_n, δ_m : Maximum sensing time of MU n in one sensing slot; budget of TI m.
ω_m^n : Sensing quality of MU n for TI m's task.
e_m^n : Mobility index of MU n from TI m's PoI.
ϕ_m(t_m) : Utility function of TI m.
ψ_n(t^n, p^n) : Payoff function of MU n.
φ_m(t_m, p_m) : Payoff function of TI m.

Fig. 1. An example of a MCS system. (The figure shows the TIs as leaders and the MUs as followers: in Stage I, each TI announces its payment strategy to the MUs through the MCS server; in Stage II, the MUs perform sensing, report the sensing data, and receive the payments.)

In this paper, as shown in Fig. 1, we design an incentive mechanism for MCS based on a two-stage, non-cooperative game known as the Stackelberg game. Specifically, we consider multiple TIs and multiple MUs who can participate in the MCS system simultaneously. Each MU can arbitrarily divide its resource to serve different TIs. We first study the optimal solution and Nash equilibrium of the MCS game. Based on the insight provided by the optimal solution and Nash equilibrium of the MCS game, we extend our work to a dynamic incentive mechanism by formulating the MCS game as a multi-agent Markov decision process (MDP). This makes the model more flexible, which is different from the single-agent discrete models used by existing work [12].

To address the challenges of multiple TIs and the continuous decision space of MUs, we propose an approach based on multi-agent DRL with policy gradient. Our approach can effectively learn the optimal pricing strategy directly from the MCS game history without any prior knowledge about the participants' utility functions. It has merits over model-based MCS game strategies in that it is totally model-free and provides a general solution to MCS systems. Thus, it can be applied to complex and unpredictable scenarios where it is difficult to obtain precise system models.

Differing from previous works, our contribution is three-fold.

1) We formulate the MCS system as a multi-leader multi-follower Stackelberg game, and the existence of the SE is proved. To the best of our knowledge, this is one of the first works that models the MCS system as a multi-leader multi-follower Stackelberg game.
2) Since the SE cannot be solved directly, we transform the MCS game into a multi-agent MDP and design a DRL based incentive mechanism called "DDIM", which enables each TI to learn the optimal pricing strategy directly from the game history without any prior knowledge about the MUs' private information. DDIM can learn the pricing strategy not only under a deterministic environment but also under a stochastic environment in which the MUs enter or leave the TIs' PoIs dynamically.
3) Numerical results demonstrate the effectiveness of the proposed DDIM scheme when compared with both state-of-the-art and baseline approaches.

The remainder of this paper is organized as follows. In Section 2, we discuss the related works. Section 3 presents the system model. Section 4 describes the problem formulation. Section 5 gives the analysis of the Nash equilibrium and optimal solution. Section 6 provides the detailed design of the DDIM approach. In Section 7, numerical experiments are conducted to evaluate the system performance. Section 8 discusses the paper and Section 9 concludes it. Table 1 lists the important notations used in this paper.

2 RELATED WORK

Currently, a large number of previous works have been dedicated to designing incentive mechanisms for MCS. Auction is one of the most widely used incentive mechanism design frameworks for MCS. Yang et al. [9, 10] proposed an incentive mechanism for user-centric MCS using an auction method. Several papers [13-15] have taken into consideration that MUs may come into the MCS system in an online manner. Recently, many works [16-18] have considered the quality of the sensing data. Jin et al. [19] proposed an incentive mechanism for privacy-aware data aggregation in MCS. Gan et al. [20] proposed a game-based incentive mechanism for multi-resource sharing to maintain the social fairness-efficiency tradeoff. However, in these works, the MUs, as the sellers, bid for the sensing tasks.

Yang et al. [9, 10] modeled a platform-centric incentive mechanism for MCS using a Stackelberg game approach. Duan et al. [21] used the Stackelberg game to design a threshold revenue model for the MUs. Cheung et al. [22] designed a delay-sensitive MCS mechanism based on the Stackelberg game. In [11], Maharjan et al. proposed a multimedia application of quality-aware MCS based on the Stackelberg game. Chen et al. [23] modeled crowdsourcing systems as a two-stage non-cooperative game and investigated the behaviors of MUs under global network effects. Nie et al. [24, 25] modeled the rewarding and participation in a MCS system as a two-stage single-leader multi-follower game, where the reward was designed taking into account the underlying social network effects amid the mobile social network. However, these works only considered one sensing task in one sensing slot; meanwhile, they did not take the privacy of MUs into consideration. Xiao et al. [12] proposed a secure MCS game with only one sensing task
in a sensing slot based on the Stackelberg game, and they took the privacy of MUs into account by designing a DQN approach. In [26], the authors studied the MCS game in vehicular networks, where a Q-learning approach was applied to derive the equilibrium of the game. Peng et al. [27] and Chakeri et al. [28] proposed incentive mechanisms based on a two-stage non-cooperative game for MCS with multiple crowdsourcers, yet these works assumed that the MUs could participate in only one sensing task in one sensing slot. Even though there are some works, such as [29], that have studied the multi-leader multi-follower Stackelberg game, they cannot be directly applied to our work. In that work, the authors designed an evolutionary algorithm to find the optimal strategies in the Stackelberg game; however, it assumes that each player has only one decision variable, and its objective of maximizing the total payoffs of all players is different from ours.

There are some works on MCS systems with multiple TIs and multiple MUs [30-32]. He et al. [30] designed an incentive mechanism for MCS systems based on the Walrasian Equilibrium. Duan et al. [32] devised an incentive mechanism for the MCS system that benefits all the TIs, MUs and the MCS server in a balanced manner. However, these works need a central controller to control the market, which is infeasible in free market scenarios. In this work, we formulate the MCS system with multiple TIs and multiple MUs as a Stackelberg game in free market crowdsensing; the main challenges are how to prove the existence of the SE and how to compute it.

3 SYSTEM MODEL

We consider a MCS system which consists of M TIs and N MUs. Let M = {1, 2, · · · , M} be the set of TIs and N = {1, 2, · · · , N} be the set of MUs. Each TI m aims to recruit some of the N MUs located near its PoIs to gather sensing data and establish a MCS application.

3.1 MU Modeling

MUs and TIs all aim to maximize their payoffs in the trading process, which consists of the following two steps: (a) each TI m determines the sensing price p_m^n paid for occupying a unit amount of each MU n's sensing time, and (b) each MU n chooses t_m^n amount of sensing time to serve TI m. Let t^n = (t_m^n)_{∀m∈M} be the vector of sensing time that MU n spends for all tasks, and p^n = (p_m^n)_{∀m∈M} be the vector of prices that all TIs pay for MU n. The mobility index of MU n from TI m's PoI (denoted by e_m^n) is the probability that MU n leaves the PoI of TI m during the sensing task, which depends on the movement speed of the MU. Each participant (i.e., either a MU or a TI) is associated with a payoff function, which represents its benefit. MU n obtains p_m^n t_m^n amount of benefit when it allocates t_m^n amount of sensing time to TI m. At the same time, performing TI m's sensing task for t_m^n units of sensing time incurs a cost denoted as C_m^n(t_m^n), which could incorporate several factors such as the physical or mental tiredness of MUs, battery drainage, bandwidth occupation, etc. Without loss of generality, the cost function C_m^n(t_m^n) is assumed to be a monotonically increasing, differentiable and strictly convex function of t_m^n, for each m-n pair [9, 10, 31-34]. Piecewise linear functions [10] and quadratic functions [32, 33] are two examples widely used in previous works. In this paper, a quadratic function of the effort level is selected for each MU, i.e., C_m^n(t_m^n) = a_m^n (t_m^n)^2 + b_m^n t_m^n, with a_m^n > 0 and b_m^n > 0, which can be used to model the increasing marginal cost for every additional unit of effort exerted. The coefficients a_m^n and b_m^n differ across pairs since different MUs have different levels of availability and different sensing tasks are at different levels of difficulty. This kind of cost function has been widely accepted to represent the cost of a MU, e.g., [32, 33]. For MU n, the payoff function can be formulated as:

ψ_n(t^n, p^n) = Σ_m p_m^n t_m^n − Σ_m C_m^n(t_m^n).    (1)

That is, (1) is the benefit MU n can obtain by selling its sensing service to different TIs.

3.2 TI Modeling

Let p_m = (p_m^n)_{∀n∈N} be the vector of prices that TI m determines for all the MUs, and t_m = (t_m^n)_{∀n∈N} be the vector of sensing time that all MUs spend for TI m. Each TI m is associated with a utility function ϕ_m(·) that measures the sensing quality of all the MUs. Here, we assume that ϕ_m(·) is a monotonically increasing, differentiable, and strictly concave function of t_m. Due to the heterogeneous characteristics of mobile devices, they may contribute differently to the quality of sensing for a given amount of sensing time. The definition of quality of sensing varies across applications. For example, in the MedWatch system, the quality of sensing refers to the quality (e.g., resolution, contrast, sharpness) of uploaded photos; photos with higher quality help the TI better identify visible problems with medical devices. In air quality monitoring MCS systems, quality of sensing refers to a MU's estimation accuracy of air quality. Obviously, the higher the MUs' quality of sensing, the higher the utilities the TIs obtain; e.g., the utility function ϕ_m(·) of TI m is monotonic in the quality of sensing [17, 18]. To capture this, the weight ω_m^n is used to indicate the contribution of a unit of MU n's sensing time to the quality of sensing of TI m.

Then, the payoff of each TI m consists of two parts: (a) the utility gained from collecting the MUs' sensing data, and (b) the incentives paid for the sensing service of the MUs, i.e.:

φ_m(t_m, p_m) = ϕ_m(t_m) − Σ_n p_m^n t_m^n.    (2)

In this paper, the widely used utility function ϕ_m(t_m) = µ_m log(1 + Σ_n log(1 + ω_m^n t_m^n)) is adopted, as in the previous works [9-11, 16], where µ_m is the weight for different TIs. The log(1 + ω_m^n t_m^n) term reflects TI m's diminishing return on the work of MU n, and the outer log term reflects TI m's diminishing return on the participating MUs. For convenience, we set σ_m = 1 + Σ_n log(1 + ω_m^n t_m^n), and thus g_m(σ_m) = µ_m log(σ_m).

4 PROBLEM FORMULATION

In a MCS system, the TIs and MUs negotiate the pricing and task allocation strategies to maximize their own payoffs. Specifically, the objectives for TI m and MU n can be formulated as constrained optimization problems.
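Before these optimization problems are formalized, the payoff definitions of Eqns. (1) and (2) can be evaluated numerically. The following is an illustrative sketch only; the parameter values are arbitrary toy choices, not numbers from the paper:

```python
import math

def mu_payoff(t_n, p_n, a_n, b_n):
    """Eqn. (1): psi_n = sum_m [p_m^n t_m^n - C_m^n(t_m^n)], quadratic cost."""
    return sum(p * t - (a * t * t + b * t)
               for t, p, a, b in zip(t_n, p_n, a_n, b_n))

def ti_utility(t_m, w_m, mu_m):
    """phi_m(t_m) = mu_m * log(1 + sum_n log(1 + w_m^n t_m^n))."""
    sigma = 1.0 + sum(math.log(1.0 + w * t) for w, t in zip(w_m, t_m))
    return mu_m * math.log(sigma)

def ti_payoff(t_m, p_m, w_m, mu_m):
    """Eqn. (2): phi_m(t_m) - sum_n p_m^n t_m^n."""
    return ti_utility(t_m, w_m, mu_m) - sum(p * t for p, t in zip(p_m, t_m))

# Toy numbers (assumed): one MU serving two TIs, one TI recruiting two MUs.
print(round(mu_payoff([0.5, 0.3], [1.0, 0.8], [0.4, 0.5], [0.1, 0.2]), 4))
print(round(ti_payoff([0.5, 0.3], [1.0, 0.8], [2.0, 1.5], 5.0), 4))
```

Note that with zero sensing time, σ_m = 1 and the utility vanishes, matching the diminishing-return shape described above.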
For TI m, the sensing service pricing problem is:

OPTI: max_{p_m} φ_m(t_m, p_m)    (3)
s.t. Σ_n p_m^n t_m^n ≤ δ_m, p_m^n > 0,

where δ_m is the credit budget of TI m.

For MU n,

OPMU: max_{t^n} ψ_n(t^n, p^n)    (4)
s.t. Σ_m t_m^n ≤ κ_n, t_m^n > 0,

where κ_n is the maximum sensing time of MU n.

Since each participant has its own objective, it is necessary to find a pricing strategy p_m for each TI m and a sensing time allocation strategy t^n for each MU n such that all of them can receive benefits. We formulate the incentive mechanism as a Stackelberg game based on two-stage non-cooperative game theory. In a Stackelberg game, the participants are classified into two groups, leaders and followers, where the leaders have the privilege of moving first while the followers move according to the leaders' actions. Specifically, our problem is modeled as a multi-leader multi-follower Stackelberg game with two stages, where the TIs act as the leaders and all the MUs act as the followers. First, each TI m determines its pricing strategy p_m. Then, each MU n acts as a game follower choosing its sensing time strategy t^n to maximize its own payoff.

Definition 1. SE: The strategy set ((p_m^n)*, (t_m^n)*) for all n ∈ N and m ∈ M constitutes a SE of the MCS game if the following conditions are satisfied:
(a): (p_m^n)* is a Nash Equilibrium for the TIs, i.e.,

φ_m(t_m^*(p_m^*, p_{-m}^*), p_m^*, p_{-m}^*) ≥ φ_m(t_m^*(p_m, p_{-m}^*), p_m, p_{-m}^*),    (5)

where p_{-m}^* = (p_{m1}^n)*_{∀m1∈M\m, ∀n∈N} denotes the pricing strategies of the TIs other than m.
(b): For a given pricing strategy (p^n)*, the optimal response from MU n is (t^n)*, which is the unique maximizer of ψ_n(t^n, (p^n)*).

5 OPTIMAL SOLUTION AND NASH EQUILIBRIUM

5.1 Optimal Solution for MUs

As formulated in Section 4, for a given set of prices announced by the TIs, MU n calculates its optimal sensing time response by solving the optimization problem OPMU in Eqn. (4). Obviously, OPMU is a convex optimization problem. Hence, the stationary solution for each MU is unique and optimal. Note that p_m^n must be greater than b_m^n, otherwise MU n will not participate in TI m's sensing task. This is because, if p_m^n < b_m^n and MU n participates in TI m's task, then supposing MU n contributes any sensing time t̂_m^n ∈ R+, its benefit for serving m is (p_m^n − b_m^n) t̂_m^n − a_m^n (t̂_m^n)^2, which is less than 0. This implies that any TI m who wants to recruit MU n for its own task must offer a sensing price of at least b_m^n, otherwise MU n will not contribute any sensing data for that task.

Theorem 1. For any given feasible p^n, the optimal sensing time allocation of MU n is

(t_m^n)* = (p_m^n − b_m^n)/(2a_m^n), if Σ_{m=1}^{M} (p_m^n − b_m^n)/(2a_m^n) ≤ κ_n;
(t_m^n)* = (F1 + F2 + F3)/F4, else if F1 + F2 + F3 > 0;
(t_m^n)* = 0, otherwise,    (6)

where

F1 = Σ_{m1≠m} Π_{m2≠m1,m} a_{m2}^n (p_m^n − p_{m1}^n),
F2 = Σ_{m1≠m} Π_{m2≠m1,m} a_{m2}^n (b_{m1}^n − b_m^n),
F3 = 2 Π_{m1≠m} a_{m1}^n κ_n,  F4 = 2 Σ_m Π_{m1≠m} a_{m1}^n.    (7)

Proof. Assume that there are M TIs in the MCS system. OPMU becomes:

max Σ_m (p_m^n t_m^n − a_m^n (t_m^n)^2 − b_m^n t_m^n)    (8)
s.t. Σ_m t_m^n ≤ κ_n, t_m^n > 0.    (9)

Obviously, OPMU is a strictly convex optimization problem. Hence, there exists a unique solution. Setting λ_0^n and λ_m^n as Lagrangian multipliers, the optimization problem (8)-(9) becomes:

L_n = Σ_m (p_m^n t_m^n − a_m^n (t_m^n)^2 − b_m^n t_m^n) − λ_0^n (Σ_m t_m^n − κ_n) + Σ_m λ_m^n t_m^n.    (10)

The KKT conditions are

∂L_n/∂t_m^n = 0, ∀m ∈ M,    (11)
λ_0^n (Σ_m t_m^n − κ_n) = 0, λ_m^n t_m^n = 0,    (12)
λ_0^n, λ_m^n ≥ 0, t_m^n > 0, Σ_m t_m^n ≤ κ_n.    (13)

Eqn. (11) can be converted to:

p_m^n − 2a_m^n t_m^n − b_m^n − λ_0^n + λ_m^n = 0, ∀m ∈ M.    (14)

Since t_m^n > 0, we obtain λ_m^n = 0. Then, the optimal sensing time allocation takes one of the following cases:
i) Case I: λ_0^n = 0. According to Eqn. (14), we have:

t_m^n = (p_m^n − b_m^n)/(2a_m^n).    (15)

ii) Case II: λ_0^n > 0. According to Eqn. (14), we have t_m^n = (p_m^n − b_m^n − λ_0^n)/(2a_m^n). Substituting t_m^n into Eqn. (12), we have

λ_0^n = (Σ_m Π_{m1≠m} a_{m1}^n (p_m^n − b_m^n) − 2 Π_m a_m^n κ_n) / (Σ_m Π_{m1≠m} a_{m1}^n),

and:

t_m^n = (F1 + F2 + F3)/F4,    (16)

where F1, F2, F3 and F4 satisfy Eqn. (7).

Then, using Eqns. (15) and (16), the optimal sensing allocation strategies for M TIs and N MUs, which cover cases I-II for any given feasible p^n, satisfy Eqn. (6).

From Theorem 1, the MUs can adjust their sensing time allocation strategies according to the TIs' pricing strategies and obtain the maximum payoffs. The detailed sensing service computing process for MU n is described by Algorithm 1.
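Theorem 1's closed form can be sketched directly in code. The sketch below is illustrative only, with assumed toy parameters; Case II is computed through the multiplier λ_0^n in an equivalent ratio form (divide the numerator and denominator of λ_0^n by Π_m a_m^n), which avoids building the products F1-F4 explicitly but yields the same allocation:

```python
def best_response(p, a, b, kappa):
    """Optimal t_m^n of Theorem 1 for one MU (all prices assumed > b_m^n).

    Case I: the unconstrained optimum t_m = (p_m - b_m) / (2 a_m) fits in kappa.
    Case II: the time budget binds; lambda0 = (sum((p_m - b_m)/a_m) - 2*kappa)
             / sum(1/a_m), then t_m = (p_m - b_m - lambda0) / (2 a_m),
             floored at 0 per the third case of Eqn. (6).
    """
    t = [(pm - bm) / (2.0 * am) for pm, am, bm in zip(p, a, b)]
    if sum(t) <= kappa:
        return t                               # Case I (lambda0 = 0)
    lam0 = (sum((pm - bm) / am for pm, am, bm in zip(p, a, b)) - 2.0 * kappa) \
        / sum(1.0 / am for am in a)
    return [max((pm - bm - lam0) / (2.0 * am), 0.0)
            for pm, am, bm in zip(p, a, b)]    # Case II

# Example (assumed numbers): two TIs price one MU whose budget is kappa = 1.
print(best_response([2.0, 1.5], [1.0, 0.5], [0.2, 0.1], 1.0))
```

In the constrained case the returned times sum exactly to κ_n, as Eqn. (12) requires when λ_0^n > 0.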
Algorithm 1 The sensing service computing process for MU n.
Input: The TIs' pricing strategies p^n for MU n;
Output: The sensing time allocation strategy t^n for MU n;
1: Receive the pricing strategies p^n from all the TIs;
2: if Σ_{m=1}^{M} (p_m^n − b_m^n)/(2a_m^n) ≤ κ_n then
3:   t_m^n = (p_m^n − b_m^n)/(2a_m^n);
4: else if F1 + F2 + F3 > 0 then
5:   t_m^n = (F1 + F2 + F3)/F4;
6: else
7:   t_m^n = 0;
8: end if

When a MU receives the TIs' pricing strategies, it starts its sensing time allocation computing process (Line 1). First, the MU checks whether its sensing time is sufficient (Line 2). If so, the TIs can get enough sensing time from the MUs and do not need to compete with each other (Line 3). However, when the MU's sensing time is limited, there is competition among the TIs. In this case, if TI m sets a feasible price for MU n, then MU n allocates an amount of sensing time larger than 0 to it (Lines 4-5); otherwise, MU n does not participate in its sensing task (Line 7). Note that there is no game played among the MUs. Each MU responds to the pricing strategies announced by the TIs using only its local information. The pricing strategies depend on all the sensing time allocation strategies selected by the MUs, and hence the MUs indirectly affect each other's decisions.

5.2 Nash Equilibrium Analysis for TIs

Since each TI behaves in a selfish manner and all of them are rational, all the TIs aim at maximizing their own payoffs. In this situation, the competition among the TIs can be formulated as a non-cooperative game, whose solution is the well-known Nash equilibrium. Let δ_m > 0 be the budget of TI m. Each TI aims to recruit more sensing time at a lower price. If there were only a single TI, it could set a very low price to maximize its payoff. Assume that δ_m is given for each TI m. As mentioned in Section 4, the sensing service pricing problem OPTI of TI m is shown in Eqn. (3).

In the proposed MCS game, the non-cooperative game among the TIs can be described as follows:

• Players: the TIs m ∈ M.
• Strategy: the pricing strategy p_m of any TI m.
• Utility: the payoff functions of the TIs given in Eqn. (2).

Lemma 1. If the following conditions are satisfied, there exists a Nash Equilibrium in the non-cooperative game [35].

• The player set is finite.
• The strategy sets are closed, bounded, and convex.
• The utility functions are continuous and quasi-concave in the strategy space.

Theorem 2. There exists a Nash Equilibrium in the non-cooperative game among the TIs.

Proof. As analyzed in Section 5.1, when the TIs announce their pricing strategies, each MU n gives its sensing time allocation strategy t_m^n for TI m. For convenience, in the following, ϕ_m will be used in place of ϕ_m(t_m). The Hessian matrix of ϕ_m is defined as H = (∂²ϕ_m/∂p_m^{n1}∂p_m^{n2}) ∈ R^{N×N}. Also, we use g'_m(σ_m) and g''_m(σ_m) to denote ∂g_m(σ_m)/∂σ_m and ∂²g_m(σ_m)/∂σ_m², respectively. According to the definition of ϕ_m, the second-order derivative of ϕ_m with respect to (w.r.t.) p_m^n is

∂²ϕ_m/∂(p_m^n)² = ((g''_m(σ_m) − g'_m(σ_m))/(1 + t_m^n)²) (∂t_m^n/∂p_m^n)².    (17)

The second-order partial derivative of ϕ_m is

∂²ϕ_m/∂p_m^{n1}∂p_m^{n2} = g''_m(σ_m) (1/((1 + t_m^{n1})(1 + t_m^{n2}))) (∂t_m^{n1}/∂p_m^{n1})(∂t_m^{n2}/∂p_m^{n2}).    (18)

Set the diagonal matrix H1 = diag[λ^1, λ^2, · · · , λ^N], where λ^n = −(g'_m(σ_m)/(1 + t_m^n)²)(∂t_m^n/∂p_m^n)². Obviously, we have g'_m(σ_m) > 0. As a result, it can be derived that λ^n ≤ 0.

Furthermore, set H2 = g''_m(σ_m)(H2(n1, n2)) ∈ R^{N×N}, where:

H2(n1, n2) = H2(n2, n1) = (1/((1 + t_m^{n1})(1 + t_m^{n2}))) (∂t_m^{n1}/∂p_m^{n1})(∂t_m^{n2}/∂p_m^{n2}).    (19)

Then, we can rewrite H2 as H2 = g''_m(σ_m) q q^T, where q = (q^n) ∈ R^{N×1} and q^n = (1/(1 + t_m^n)) ∂t_m^n/∂p_m^n. According to the definition of the Hessian matrix of ϕ_m, we have H = H1 + H2. Randomly select a vector v ∈ R^{N×1} whose elements are not all 0. Then, we have v^T H v = v^T H1 v + v^T H2 v. Based on the definition of H1, it is easy to obtain v^T H1 v = Σ_n λ^n (v^n)² ≤ 0. And according to the definition of H2, we have

v^T H2 v = g''_m(σ_m) v^T q q^T v = g''_m(σ_m) (Σ_n (v^n/(1 + t_m^n)) ∂t_m^n/∂p_m^n)².    (20)

Since g''_m(σ_m) = −µ_m/σ_m² < 0 and ∂t_m^n/∂p_m^n ≥ 0, it is clear that v^T H2 v ≤ 0. Therefore, we have v^T H v ≤ 0. This indicates that ϕ_m is a concave function. Meanwhile, we have ∂²(−p_m^n t_m^n)/∂(p_m^n)² ≤ 0, which means that −p_m^n t_m^n is also a concave function. Since a sum of concave functions is concave, the payoff functions of the TIs are all concave. In addition, it is clear that the strategy sets of the TIs are closed, bounded and convex. Based on Lemma 1, there exists a Nash Equilibrium in the non-cooperative game among the TIs.

Since there is a Nash Equilibrium among the TIs, and OPMU has a unique maximizer for any given p_m, the Stackelberg game possesses a SE.

Theorem 3. The Stackelberg game formulated in the considered multi-leader multi-follower MCS game possesses a SE.

According to Theorem 3, each TI m can determine its optimal pricing strategy, from which it cannot unilaterally deviate to receive more payoff. Furthermore, the MUs can determine their optimal sensing time allocation strategies according to the TIs' pricing strategies to gain the maximum payoffs. In order to solve OPTI optimally, we start by relaxing the positivity constraint on p_m^n for convenience of analysis. Let us define L_m as:

L_m = φ_m(t_m, p_m) − λ_m (Σ_n p_m^n t_m^n − δ_m).    (21)
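Although the TI-side equilibrium conditions derived next have no closed form, a toy instance can be explored numerically with best-response dynamics, each TI hill-climbing its payoff (2) over a price grid against the MUs' Theorem-1 response. This is an illustrative sketch only (it is not the DDIM algorithm), and all parameters below are assumed:

```python
import math

def mu_response(p, a, b, kappa):
    """Theorem 1 response of one MU to the price vector p (prices > b assumed)."""
    t = [(pm - bm) / (2.0 * am) for pm, am, bm in zip(p, a, b)]
    if sum(t) <= kappa:
        return t                                   # Case I
    lam0 = (sum((pm - bm) / am for pm, am, bm in zip(p, a, b)) - 2.0 * kappa) \
        / sum(1.0 / am for am in a)
    return [max((pm - bm - lam0) / (2.0 * am), 0.0)
            for pm, am, bm in zip(p, a, b)]        # Case II

def ti_payoff(m, prices, a, b, kappa, w_m, mu_m):
    """Eqn. (2) payoff of TI m: utility of recruited time minus payments."""
    t = mu_response(prices, a, b, kappa)
    return mu_m * math.log(1.0 + math.log(1.0 + w_m * t[m])) - prices[m] * t[m]

# Toy instance: two symmetric TIs compete for one MU (all numbers assumed).
a, b, kappa = [0.5, 0.5], [0.1, 0.1], 0.8
w, mu = [2.0, 2.0], [3.0, 3.0]
grid = [0.1 + 0.01 * i for i in range(300)]        # candidate prices

prices = [1.0, 1.0]
for _ in range(100):                               # best-response dynamics
    new = list(prices)
    for m in (0, 1):
        new[m] = max(grid, key=lambda pm: ti_payoff(
            m, [pm if j == m else new[j] for j in (0, 1)],
            a, b, kappa, w[m], mu[m]))
    if new == prices:                              # mutual best responses reached
        break
    prices = new
print(prices, mu_response(prices, a, b, kappa))
```

At the resulting price pair, neither TI can improve its payoff by deviating within the grid, which is exactly the (grid-restricted) Nash condition of Definition 1(a).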
Fig. 2. (left part) MDP for a MCS game; (right part) proposed DRL based modeling of DDIM for each TI. (For each TI, the figure shows the state transition under the policy, an actor network that outputs the action, a critic network used to calculate the loss, the payoff used as the reward, and a replay buffer that is closed up when full and opened up when empty.)
The first-order optimality condition for the TIs leads to ∂L_m/∂p_m^n = 0. For a given TI m, ∂L_m/∂p_m^n = 0 for all n ∈ N gives N equations; letting m range over M gives M such sets of equations. Then, a Nash Equilibrium exists for the TIs in the pricing strategy selection game, which can be obtained by solving the following equations together:

{∂L_m/∂p_m^n = 0, ∀n, m}.    (22)

Solving these M × N equations, we can obtain p_m^*. From Theorem 2, we know there exists a Nash Equilibrium for the TIs; therefore, Eqn. (22) has a solution. Using p_m^*, we can compute t_m^*. However, even though we know there exists a SE, we still do not know how to compute it because of the complexity of Eqn. (22). That is, it is impossible to get an analytical solution of Eqn. (22). Moreover, the TIs would have to know the private information of the MUs to solve Eqn. (22), and in practical scenarios it is impossible for the TIs to know the private information of the MUs a priori. Therefore, it is necessary to design an efficient algorithm for the incentive mechanism in this multi-leader multi-follower MCS system. Since no existing works can be applied directly to solve our problem, we design a DRL-based approach to obtain the SE, as shown in the next section.

6 PROPOSED DRL-BASED DYNAMIC INCENTIVE MECHANISM (DDIM)

Since the optimization problem OPTI in Eqn. (3) is non-linear with a complicated form, and the OPTI problems of the TIs are tightly coupled, it is difficult to solve them explicitly. Furthermore, a_m^n and b_m^n represent the private information of an m-n pair, yet they are necessary for solving OPTI in Eqn. (3). Obtaining them can be unrealistic in practice when MUs are unwilling to expose their private information. Therefore, a DRL approach is employed to enable the TIs to learn the optimal pricing strategies directly from the negotiation history without prior knowledge about any MU. In the following, we first formulate the MCS game as a multi-agent MDP. Then, we present the DRL approach for the TIs to learn the optimal pricing strategies.

6.1 MDP for MCS Game

As shown in the left part of Fig. 2, we formulate the MCS game as a multi-agent MDP for MCS (referred to as MMDP). It is composed of the state space S_m = {s_m}, the action space A_m = {p_m}, the state transition probability P = {P_m}, and the reward R_m = {r_m}, which are described in detail in the next few sections. Then, MMDP = <S_m, A_m, P_m, R_m> [36]. Furthermore, each TI acts as an agent in the MMDP, and the environment consists of the N MUs.

6.1.1 State Space
S_m = {s_m}. We denote the pricing strategy of TI m at the k-th game and the vector of sensing time that all MUs spend for TI m as p_m(k) and t_m(k), respectively. Then, the state of TI m in the MMDP is defined as s_m(k) = [p_m(k − L), t_m(k − L), · · · , p_m(k − 1), t_m(k − 1)]. That is, the state of TI m is comprised of the past L game records between TI m and all MUs. We call it the "game history matrix".

6.1.2 Action Space
A_m = {p_m}. The action of TI m at the k-th game is defined as p_m(k), which is the pricing strategy profile of TI m at the k-th game.

6.1.3 State Transition Probability Function
P = {P_m}, where P_m : S_m × A_m × S_m → [0, 1] represents the transition probability distribution of TI m's state. Assume that the current state and action of TI m are s_m(k) = s and p_m(k) = p, respectively. Then, the probability of s_m(k + 1) = s′ is P(s′|s, p).

6.1.4 Reward Function
R_m : S_m × A_m → R, where the reward function of TI m is defined as

r_m(k) = ξ φ_m(t_m(k), p_m(k)),    (23)

where ξ is a scaling factor.
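The state, action, and reward just defined can be captured in a small container. This sketch is illustrative; the class and function names are assumed, not from the paper, and the ξ value is an arbitrary choice:

```python
from collections import deque

class TIAgentState:
    """Rolling game-history state s_m(k) = [p_m(k-L), t_m(k-L), ..., p_m(k-1), t_m(k-1)]."""

    def __init__(self, history_len, num_mus):
        self.L = history_len
        self.num_mus = num_mus
        self.records = deque(maxlen=history_len)   # each record: (p_m, t_m)

    def observe(self, p_m, t_m):
        """Append the k-th game record (pricing profile, sensing-time profile)."""
        self.records.append((list(p_m), list(t_m)))

    def state(self):
        """Flattened game history matrix; zero-padded before L games have occurred."""
        pad = self.L - len(self.records)
        flat = [0.0] * (2 * self.num_mus * pad)
        for p_m, t_m in self.records:
            flat.extend(p_m + t_m)
        return flat

def reward(phi_value, xi=0.01):
    """Eqn. (23): r_m(k) = xi * phi_m(t_m(k), p_m(k)); xi is a scaling factor."""
    return xi * phi_value

s = TIAgentState(history_len=3, num_mus=2)
s.observe([1.0, 1.2], [0.4, 0.3])
print(len(s.state()))   # 2 * num_mus * L entries
```

The `deque` with `maxlen=L` automatically discards the oldest record, so the state always reflects exactly the past L games.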
∇_{α_m} J_m = E_{s_m∼ρ_m(s), p_m∼π_{α_m}} [∇_{α_m} log π_{α_m}(p_m|s_m) Q_m(s_m, p_m)]
= E_{s_m∼ρ_m(s), p_m∼π_{ᾱ_m}} [∇_{α_m} log π_{α_m}(p_m|s_m) f_m(s_m, p_m) A_m(s_m, p_m)],    (25)

where f_m(s_m, p_m) = π_{α_m}(p_m|s_m)/π_{ᾱ_m}(p_m|s_m). The state-action value function is Q_m(s_m, p_m) = E[Σ_{l=1}^{∞} γ_m^l r_m(k + l) | s_m(k) = s_m, p_m(k) = p_m], the advantage function is A_m(s_m, p_m) = Q_m(s_m, p_m) − V_m(s_m), and ᾱ_m is the parameter of the policy used for sampling p_m. Notably, V_m(s_m) = E_{p_m∼π_{α_m}} Q_m(s_m, p_m).

Moreover, to reduce the oscillation caused by the gradient method during training, we integrate the proximal policy optimization (PPO) method proposed in [39] to stabilize the training process, which clips the policy gradient as:

∇_{α_m} J_m^{Clip} = ∇_{α_m} E_{s_m∼ρ_m(s), p_m∼π_{ᾱ_m}} [min(f_m A_m, η(f_m) A_m)]
≈ Σ_{k=1}^{D} ∇_{α_m} log π_{α_m}(p_m(k)|s_m(k)) min(f_m(k) Â_m(k), η(f_m(k)) Â_m(k)),    (26)

where η(x) is the piecewise function equal to 1 − ε for x < 1 − ε, x for 1 − ε ≤ x ≤ 1 + ε, and 1 + ε for x > 1 + ε (i.e., it clips the importance ratio into [1 − ε, 1 + ε]), f_m(k) = π_{α_m}(p_m(k)|s_m(k))/π_{ᾱ_m}(p_m(k)|s_m(k)), Â_m(k) = Σ_{l=k}^{D} r_m(l) + V_{β_m}(s_m(D + 1)) − V_{β_m}(s_m(k)), and D is the number of samples for estimating the policy gradient at one training step. Through the policy gradient method, the actor can be optimized.

Finally, the critic V_{β_m} can be optimized by minimizing the following loss function:

L_m(β_m) = E_{s_m∼ρ_m(s_m)} [(−V_{β_m}(s_m) + E_{s′_m∼P_m, p_m∼π_{ᾱ_m}}[r_m + V_{β_m}(s′_m)])²]
≈ Σ_{k=1}^{D} [−V_{β_m}(s_m(k)) + Σ_{l=k}^{D} r_m(l) + V_{β_m}(s_m(D + 1))]²,    (27)

where D is the number of samples for training the critic.

6.3 Proposed DRL based Modeling of DDIM for Each TI

As shown in the right part of Fig. 2, each TI is an independent agent with one actor network and one critic network. At the k-th game, TI m updates its state s_m(k) (i.e., the game history matrix) as the input of the actor network. The actor network then generates its action p_m(k) (i.e., the pricing policy). After all TIs broadcast their pricing policies (p_1(k), p_2(k), · · · , p_M(k)), the MUs determine their sensing time allocation strategies for all the TIs (t_1(k), t_2(k), · · · , t_M(k)). Upon receiving the MUs' strategies, TI m calculates its payoff r_m(k), and the newest game record [s_m(k), p_m(k), s_m(k + 1), r_m(k)] is stored into the replay buffer D.

6.3.1 Update Actor and Critic

The actor and critic networks are updated when the replay buffer is filled with D records.
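Putting Eqs. (25)-(29) together, one batch update can be sketched as below. This is a minimal NumPy sketch under our own simplifications: the function and variable names are ours, `eps` stands for the clipping constant ε, and simple scalar arrays stand in for the actor and critic networks.

```python
import numpy as np

def batch_update_quantities(rewards, values, v_final, ratios, eps=0.2):
    """Quantities of Eqs. (26)-(27) for one replay-buffer batch of D records.

    rewards[k] : payoff r_m(k) of the k-th record
    values[k]  : critic estimate V_{beta_m}(s_m(k))
    v_final    : bootstrap value V_{beta_m}(s_m(D+1))
    ratios[k]  : importance ratio f_m(k) = pi_alpha(p|s) / pi_alphabar(p|s)
    eps        : PPO clipping constant (illustrative value)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    ratios = np.asarray(ratios, dtype=float)

    # A_hat(k) = sum_{l >= k} r_m(l) + V(s(D+1)) - V(s(k))
    reward_tails = np.cumsum(rewards[::-1])[::-1]
    advantages = reward_tails + v_final - values

    # clipped surrogate terms min(f * A, clip(f, 1-eps, 1+eps) * A) of Eq. (26)
    clipped = np.minimum(ratios * advantages,
                         np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages)

    # critic loss of Eq. (27): squared error against the same bootstrapped targets
    critic_loss = float(np.sum((reward_tails + v_final - values) ** 2))
    return advantages, clipped, critic_loss
```

The actor parameter αm then moves by gradient ascent with step τ1 on the clipped surrogate (Eq. (28)), and the critic parameter βm by gradient descent with step τ2 on the critic loss (Eq. (29)).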
Specifically, TI m first calculates all the state values Vβm(sm(k)) in D through the critic network. Then, it calculates all ∑_{l=k}^{D} rm(l), fm(k), and Âm(k) in D. After all Vβm(sm(k)), ∑_{l=k}^{D} rm(l), fm(k), and Âm(k) in D are calculated, the policy gradient ∇αm Jm^Clip is estimated and the parameter of the actor network is updated with the gradient ascent method as:

α′m ← αm + τ1 ∇αm Jm^Clip(αm),   (28)

where τ1 is the learning rate for training the actor network and α′m represents the renewed parameter of the actor network. Furthermore, the critic network can be updated by minimizing the loss function Lm(βm) with the gradient descent method:

β′m ← βm − τ2 ∇βm Lm(βm),   (29)

where τ2 is the learning rate for training the critic network and β′m represents the renewed parameter of the critic network.

6.3.2 Detailed Explanation of DDIM
Algorithm 2 shows the pseudocode of DDIM. When a game starts, each TI initializes its game history matrix (Line 1). During each game period, it first updates its game history matrix (Line 7). Then, taking its game history matrix as the input of its policy network, each TI derives its pricing policy and sends it to the MUs (Lines 8-10). After receiving the MUs' sensing time (Line 11), the TIs calculate their payoffs and store their game information into the replay buffer (Lines 12-16). The predefined penalty ϑ is applied to pm(k) if it makes TI m exceed its budget (Lines 13-15). The actor and critic networks are updated every D epochs using the past D training samples stored in the replay buffer (Lines 3 and 18-19). After these two neural networks are updated with gradient ascent and gradient descent, respectively, the replay buffer is cleared (Line 20), and the game finishes when the maximum number of training episodes is reached (Line 2). Furthermore, both the actor and critic networks are chosen as fully-connected neural networks with two hidden layers. Specifically, the actor network παm takes state sm(k) as its input and outputs the mean and variance of action pm. The critic Vβm takes state sm(k) as its input and outputs the estimated state value of sm(k).

7 PERFORMANCE EVALUATION

7.1 Simulation Settings
Simulations are conducted to evaluate the performance of the proposed DDIM approach with M TIs and N MUs. The parameter µm of each TI m is generated uniformly at random from the range [30, 120]. The values of a^n_m, b^n_m, and ω^n_m between MU n and TI m are generated uniformly at random within (0, 1). The budget δm of each TI m is randomly generated within [80, 100]. The neural network parameters in our simulation are selected through fine-tuning. In this paper, the actor and critic networks of each agent both have two hidden fully-connected layers, with 200 and 50 neurons, respectively. We set D = 20 and L = 5 by default.

7.2 Compared State-of-the-Art and Baseline Approaches
We compare our work with two state-of-the-art approaches and two baseline approaches.
• Platform-centric Model (PCM) [10]: a modified version of a classical incentive mechanism based on a Stackelberg game, in which the MCS server originally can only issue one task in one sensing slot. We extend this work to multiple sensing tasks, where the tasks appear in a random sequence.
• Platform-centric Model with data quality (PCMDQ) [11]: different from PCM, in this work the authors took data quality into consideration.
• Greedy: a heuristic algorithm that greedily chooses the policy with the maximum reward from the replay buffer.
• Random: the TIs randomly select their prices, and the MUs randomly decide their sensing time.
We use the total payoff value ζ = ∑_m φm + ∑_n ψn for performance evaluation, where φm and ψn are the payoffs of each TI and MU as in Eqn. (1) and (2), respectively.

7.3 Results
First, we show the convergence of our proposed DDIM approach. In this group of simulations, the maximum sensing time of each MU n is randomly generated from the range (6, 10). We set the number of TIs to M = 2 and the number of MUs to N = 8. Fig. 3(a) shows the pricing strategy of TI 1. We see that our proposed DDIM approach converges quickly to the optimal prices, within fewer than 200 episodes. Since the TIs constantly adjust their pricing strategies, the MUs also adjust their sensing time allocation strategies accordingly. As shown in Fig. 3(d), the MUs' sensing time allocation strategies also converge to stable states. Specifically, from Fig. 3(e) and Fig. 3(f), we see that the payoffs of the TIs and MUs converge quickly, which means that the DDIM approach allows the participants to quickly obtain optimal payoffs.

Next, we show how each MU and TI behaves in an MCS game under all five approaches, in terms of their payoffs, as shown in Table 2. We set the numbers of MUs and TIs both to N = M = 6. The MUs' maximum sensing time κn is randomly generated from (2, 4). We can see that DDIM achieves the highest total payoff over all MUs and TIs, ζ = 375.3, compared to 203.6 and 214.9 for PCM and PCMDQ, respectively. The Greedy and Random approaches are slightly better than PCM and PCMDQ, but our DDIM approach still achieves at least an 8.6% improvement. For each MU and TI, we see that several TIs achieve higher payoffs in PCM and PCMDQ at the beginning. This is because initially the MUs are not aware that performing subsequent sensing tasks will bring higher benefits. Thus, the MUs offer as much of their sensing time as possible to serve the TIs, which lets the TIs set very low prices initially and obtain higher payoffs. However, this strategy causes the MUs to quickly exhaust their sensing time, so they cannot serve any further TIs. For example, in PCM and PCMDQ, TIs 4-6 are unable to recruit any MU. On the contrary, in DDIM all TIs compete with each other, so the MUs allocate their sensing time in a much better way.
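To make the per-TI cycle of Algorithm 2 (Secs. 6.3 and 6.3.2) concrete — update the game-history state, price, observe the MUs' reply, store the record, and train every D records — it can be sketched as below. The `actor`, `mu_sensing_time`, and `ti_payoff` functions are toy stand-ins of our own, not the paper's networks or utility functions.

```python
D = 20  # replay-buffer size between updates (default from Sec. 7.1)

def actor(state):
    # stand-in pricing policy: price nudged by the recent average payoff
    return 1.0 + 0.1 * (sum(state) / len(state) if state else 0.0)

def mu_sensing_time(price):
    # stand-in MU response: offered sensing time grows with the price, capped
    return min(2.0 * price, 10.0)

def ti_payoff(price, t):
    # stand-in TI payoff: concave utility of sensing time minus the payment
    return 3.0 * t ** 0.5 - price * t

replay_buffer, state, updates = [], [], 0
for k in range(60):                       # 60 games for a single TI
    p = actor(state)                      # action p_m(k) from the game history
    t = mu_sensing_time(p)                # MUs' sensing-time reply
    r = ti_payoff(p, t)                   # payoff r_m(k)
    next_state = (state + [r])[-5:]       # roll a short game-history window
    replay_buffer.append((state, p, next_state, r))  # record [s, p, s', r]
    state = next_state
    if len(replay_buffer) == D:           # every D records: train, then clear
        updates += 1                      # (Eqs. (28)-(29) updates would go here)
        replay_buffer.clear()

print(updates)  # -> 3
```

With 60 games and D = 20, the actor and critic would be trained three times, and the buffer is empty after each update, matching Lines 18-20 of the algorithm as described above.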
[Fig. 3: (a) Pricing strategy of TI 1. (b) MU sensing strategies to TI 1. (c) Pricing strategy of TI 2. (d) MU sensing strategies to TI 2. (e) Payoffs of TIs. (f) Payoffs of MUs. x-axis: Episode (0-200).]
TABLE 2
Payoff values for the five approaches

MU/TI id          1      2      3      4      5      6
DDIM     ψn     8.7   11.3   10.7    9.2    8.2   12.3
         φm    46.8   58.3   46.9   39.4   53.5   70
         ζ = ∑n ψn + ∑m φm = 375.3
Greedy   ψn     8     10.8    9.6    8.4    7     11.2
         φm    32     48     35.8   28.2   39.2  107.5
         ζ = ∑n ψn + ∑m φm = 345.7
PCM      ψn     3.1    5.4    4.9    5.9    2.6    3.0
         φm    80.6   89.1    9.0    0      0      0
         ζ = ∑n ψn + ∑m φm = 203.6
PCMDQ    ψn     3.8    7.6    8.8    6.4    3.3    3.2
         φm    81.7   90.2    9.9    0      0      0
         ζ = ∑n ψn + ∑m φm = 214.9
Random   ψn     8.6   11      9.9    8.6    7.8   11.8
         φm    20.8   35.2   23     18     28     40
         ζ = ∑n ψn + ∑m φm = 222.7

This competition has been formulated as a non-cooperative game in this work, whose solution is a Nash equilibrium, i.e., all the sensing time is optimally allocated, which makes DDIM outperform PCM and PCMDQ. In Greedy, the TIs can only see their instantaneous rewards rather than the long-term ones of the future; and in Random, the TIs always choose their pricing strategies randomly, which leads them to pick worse pricing strategies and thus obtain lower payoffs.

Next, we show the overall performance comparison when varying the numbers of TIs and MUs, in Fig. 4 and Fig. 5. In this set of simulations, the MUs' maximum sensing time is randomly generated from the range (2, 3). In the simulation of Fig. 4, we set the number of MUs to 40, and in the simulation of Fig. 5, we set the number of TIs to 20. Fig. 4(a) shows the overall payoff value for all MUs and TIs, where we see that DDIM achieves clearer gains as more TIs are assumed. For example, ζ = 1750 when M = 20 for DDIM, but only 1250 for Greedy with the same number of TIs. To understand the reasons behind this, we examine the respective payoffs of the TIs and MUs, as shown in Fig. 4(b) and Fig. 4(c). We observe from Fig. 4(b) that TI payoffs decrease as the number of TIs increases. That is, although the sensing time supply remains unchanged, more TIs lead to more competition among the TIs. Therefore, the TIs need to increase their prices to recruit sensing time from the MUs, resulting in a decline in their payoffs, as shown in Fig. 4(b), and a rise in the MUs' payoffs, as shown in Fig. 4(c). We also see that the payoffs of the TIs in PCM and PCMDQ drop more quickly than in DDIM, since the degree of competition among the TIs does not change in the PCM and PCMDQ approaches. In addition, more TIs do not influence how the MUs behave in the PCM and PCMDQ schemes. This is because, in those approaches, the MUs only serve the early-arriving TIs; once the MUs' sensing time is exhausted, the subsequent TIs have no influence on them and the payoffs of the MUs remain unchanged. For example, as shown in Fig. 4(c), when M = 5, the sensing time of all the MUs in PCM has been exhausted, so the subsequently arriving TIs cannot recruit any MU and the payoffs of the MUs will not change.
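As a quick sanity check, each total ζ in Table 2 is the sum of the listed per-participant payoffs; for the DDIM row:

```python
# DDIM row of Table 2: MU payoffs psi_n and TI payoffs phi_m for n, m = 1..6
psi = [8.7, 11.3, 10.7, 9.2, 8.2, 12.3]
phi = [46.8, 58.3, 46.9, 39.4, 53.5, 70.0]

zeta = sum(psi) + sum(phi)  # zeta = sum_n psi_n + sum_m phi_m
print(round(zeta, 1))       # -> 375.3
```

The same computation reproduces the totals of the other four rows.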
[Fig. 4: (a) Total payoffs of TIs and MUs when varying no. of TIs. (b) Payoffs of TIs when varying no. of TIs. (c) Payoffs of MUs when varying no. of TIs. x-axis: Number of TIs (5-25); curves: DDIM, Greedy, PCM, PCMDQ, Random.]

[Fig. 5: (a) Total payoffs of TIs and MUs when varying no. of MUs. (b) Payoffs of TIs when varying no. of MUs. (c) Payoffs of MUs when varying no. of MUs. x-axis: Number of MUs (5-40); curves: DDIM, Greedy, PCM, PCMDQ, Random.]
Then, we show the trend of the total payoffs of all MUs and TIs when varying the number of MUs, in Fig. 5(a). We see that DDIM achieves obviously better results than all the other four approaches. For example, ζ = 1450 when N = 30 for DDIM, but only 495 for PCM and 1150 for Greedy with the same number of MUs. To understand the reasons behind this, we examine the respective payoffs of the TIs and MUs, as shown in Fig. 5(b) and Fig. 5(c). From Fig. 5(b) and Fig. 5(c), we can see that TI payoffs increase with more MUs, while MU payoffs decrease. This is because the TIs have more options to buy sensing service from more MUs.
[Figure: (a) Total payoffs of TIs and MUs when varying κmax. (b) Payoffs of TIs when varying κmax. (c) Payoffs of MUs when varying κmax. x-axis: κmax (1-5); curves: DDIM, Greedy, PCM, PCMDQ.]

Fig. 6 shows the overall performance comparison when [...] N = 6, respectively. We simulate the sensing quality ω^n_m of MU n for TI m as randomly distributed in (ωmax − 0.2, ωmax), where ωmax was varied from 0.2 to 1 with an increment of 0.2. We see that, with the increase of ωmax, all five approaches follow the same trend. This is because a larger ω^n_m implies that the MUs can provide sensing data of higher quality, thus making the TIs obtain better sensing performance. Meanwhile, DDIM consistently obtains the best performance compared with all the baselines. Because with the increase of ωmax, although the TIs in all the five [...] the TIs will obtain more utilities given the same sensing time of the MUs. In Fig. 8(c), the average payoff of all the MUs increases with a larger value of µm. This is because, with a larger µm, the TIs obtain more utilities given the same amount of sensing time of the MUs; the TIs will then increase their prices for the MUs to get more sensing time, and thus the MUs will obtain higher payoffs.

Fig. 9 shows the change of the TIs' payoffs when varying µ1. In this simulation, we fix the MUs' maximum sensing time κn as randomly generated from the range (2, 3). We set M = 3, N = 6, and µ2 = µ3 = 50. We simulate that µ1 was varied from 20 to 100 with an increment of 20. From Fig. 9, we can observe that the payoff of TI 1 increases with µ1, while the payoffs of TIs 2 and 3 slowly decrease. This is because, with the increase of µ1, TI 1 will obtain more utilities given the same amount of sensing time of the MUs, thus paying higher prices to obtain more sensing time from the MUs.

[Fig. 10. Payoffs of TIs when varying e^m_n. x-axis: e^m_n (0.1-0.5); curves: TI 1, TI 2, TI 3.]

Fig. 10 shows the change of the TIs' average payoffs when varying e^m_n. In this simulation, MU n leaves TI m's PoI in each time slot with a probability e^m_n. We simulate that e^m_n was varied from 0.1 to 0.5 with an increment of 0.1.
[Fig. 8: (a) Total payoffs of TIs and MUs when varying µm. (b) Payoffs of TIs when varying µm. (c) Payoffs of MUs when varying µm. x-axis: µm (30-70); curves: DDIM, Greedy, PCM.]
From Fig. 10, we can observe that the average payoffs of the TIs decrease with e^m_n. This is because, with the increase of e^m_n, more MUs leave the PoIs of the TIs, making the sensing data worthless to the TIs. DDIM can learn the MUs' mobility model and can then make the best decision for each TI, while the other approaches cannot. From Fig. 10, we can see that DDIM obtains the best performance. For example, the average payoff of the TIs is 35 when e^m_n = 0.5 for DDIM, but only 11 for PCM and 30 for Greedy with the same e^m_n.

8 DISCUSSIONS

8.1 Extension to General Incentive Mechanism Design of the Multi-Leader Multi-Follower MCS Case
In this paper, the incentive mechanism of multi-leader multi-follower MCS is modeled as a two-stage Stackelberg game. In this game, the main challenge is how to achieve the SE for each participant (TIs and MUs) in an unfamiliar environment. From Algorithm 1, we know that the MUs can obtain the optimal sensing time allocation strategies directly from the prices published by the TIs. Therefore, the remaining challenge becomes how to achieve the optimal pricing strategies for the TIs. To address this challenge, we design Algorithm 2, in which we build DDIM on a DRL approach. With DDIM, each TI can directly decide its pricing strategy from the previous game records without knowing any prior information about the other TIs and MUs. Therefore, the incentive mechanism design by the DRL approach in this paper can be generalized to any multi-leader multi-follower MCS case that can be modeled as a two-stage Stackelberg game.

8.2 Extension to a Privacy-Preserving Model and the Security of MUs
In the incentive mechanism designed in this paper, when the TIs decide their optimal pricing strategies (Eqn. (22)), they need to know the exact utility functions of the other participants. Since the parameters a^n_m, b^n_m of each MU n are private information [12, 32, 40], the TIs may not have sufficient information to solve Eqn. (22). In our future work, we are going to use a privacy-preserving model to protect the security of the MUs, such as encrypting the location information of the MUs through a privacy-preserving model to avoid privacy leakage.

8.3 Time Complexity During Actual Execution
During the execution phase, given the state information as input, each TI m utilizes its own actor network παm to output an action; thus, the computational complexity depends merely on an actor network. According to [41], the time complexity of a fully-connected deep neural network is computed as the number of multiplications: O(∑_{f=1}^{F} ϵf · ϵf−1), where ϵf is the number of neural units in fully-connected layer f. In this work, we design the actor network παm with two fully-connected hidden layers. Therefore, executing the actor network will not impose a computational burden on the TIs.

9 CONCLUSION
In this paper, we designed the pricing and sensing time allocation strategies for MCS systems with multiple TIs and multiple MUs to incentivize MUs to participate. We studied this problem from a free-market perspective with the goal of achieving a SE. We first formulated the incentive mechanism as a Stackelberg game, which can reveal the characteristics of the supply-demand pattern of MCS. We then analyzed and proved that a SE exists; however, its closed-form expression cannot be obtained. Next, we proposed a DRL-based solution called DDIM to solve it. The TIs can learn the optimal pricing strategies directly from game experience without knowing the private information of the MUs. Extensive simulation results showed that DDIM outperforms the state-of-the-art and baseline approaches.

REFERENCES
[1] A. Draghici and M. V. Steen, "A survey of techniques for automatically sensing the behavior of a crowd," ACM Comput. Surv., vol. 51, no. 1, p. 21, 2018.
[2] B. Guo, Z. Wang, Z. Yu, et al., "Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm," ACM Comput. Surv., vol. 48, no. 1, pp. 7:1–7:31, 2015.
[3] "Patientslikeme," https://www.patientslikeme.com/.
[4] P. Zhou, Y. Zheng, and M. Li, "How long to wait?: predicting bus arrival time with mobile phone based participatory sensing," in ACM MobiSys, 2012, pp. 379–392.
[5] Y. Cheng, X. Li, Z. Li, et al., "Aircloud: a cloud-based air-quality monitoring system for everyone," in ACM SenSys, 2014, pp. 251–265.
[6] F. Rebecchi, M. D. De Amorim, V. Conan, A. Passarella, R. Bruno, and M. Conti, "Data offloading techniques in cellular networks: A survey," IEEE Commun. Surv. Tut., vol. 17, no. 2, pp. 580–603, 2015.
[7] M. Xiao, J. Wu, L. Huang, et al., "Online task assignment for crowdsensing in predictable mobile social networks," IEEE Trans. Mob. Comput., vol. 16, no. 8, pp. 2306–2320, 2017.
[8] E. Wang, Y. Yang, J. Wu, W. Liu, and X. Wang, "An efficient prediction-based user recruitment for mobile crowdsensing," IEEE Trans. Mob. Comput., vol. 17, no. 1, pp. 16–28, 2018.
[9] D. Yang, G. Xue, X. Fang, et al., "Crowdsourcing to smartphones: incentive mechanism design for mobile phone sensing," in ACM MobiCom, 2012, pp. 173–184.
[10] D. Yang, G. Xue, X. Fang, and J. Tang, "Incentive mechanisms for crowdsensing: Crowdsourcing with smartphones," IEEE/ACM Trans. Net., vol. 24, no. 3, pp. 1732–1744, 2016.
[11] S. Maharjan, Y. Zhang, and S. Gjessing, "Optimal incentive design for cloud-enabled multimedia crowdsourcing," IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2470–2481, 2016.
[12] L. Xiao, Y. Li, G. Han, et al., "A secure mobile crowdsensing game with deep reinforcement learning," IEEE Trans. Inf. Foren. Sec., vol. 13, no. 1, pp. 35–47, 2018.
[13] K. Han, C. Zhang, J. Luo, et al., "Truthful scheduling mechanisms for powering mobile crowdsensing," IEEE Trans. Comput., vol. 65, no. 1, pp. 294–307, 2016.
[14] D. Zhao, X.-Y. Li, and H. Ma, "How to crowdsource tasks truthfully without sacrificing utility: Online incentive mechanisms with budget constraint," in IEEE INFOCOM, vol. 14, 2014, pp. 1213–1221.
[15] L. Gao, F. Hou, and J. Huang, "Providing long-term participation incentive in participatory sensing," in IEEE INFOCOM, 2015, pp. 2803–2811.
[16] J. Wang, J. Tang, D. Yang, et al., "Quality-aware and fine-grained incentive mechanisms for mobile crowdsensing," in IEEE ICDCS, 2016, pp. 354–363.
[17] H. Jin, L. Su, D. Chen, et al., "Quality of information aware incentive mechanisms for mobile crowd sensing systems," in ACM MobiHoc, 2015, pp. 167–176.
[18] H. Jin, L. Su, D. Chen, H. Guo, K. Nahrstedt, and J. Xu, "Thanos: Incentive mechanism with quality awareness for mobile crowd sensing," IEEE Trans. Mob. Comput., 2018.
[19] H. Jin, L. Su, H. Xiao, and K. Nahrstedt, "Incentive mechanism for privacy-aware data aggregation in mobile crowd sensing systems," IEEE/ACM Trans. Net., vol. 26, no. 5, pp. 2019–2032, 2018.
[20] X. Gan, Y. Li, W. Wang, L. Fu, and X. Wang, "Social crowdsourcing to friends: An incentive mechanism for multi-resource sharing," IEEE JSAC, vol. 35, no. 3, pp. 795–808, 2017.
[21] L. Duan, T. Kubo, K. Sugiyama, J. Huang, T. Hasegawa, and J. Walrand, "Incentive mechanisms for smartphone collaboration in data acquisition and distributed computing," in IEEE INFOCOM, 2012, pp. 1701–1709.
[22] M. H. Cheung, F. Hou, and J. Huang, "Delay-sensitive mobile crowdsensing: Algorithm design and economics," IEEE Trans. Mob. Comput., 2018.
[23] Y. Chen, B. Li, and Q. Zhang, "Incentivizing crowdsourcing systems with network effects," in IEEE INFOCOM, 2016, pp. 1–9.
[24] J. Nie, J. Luo, Z. Xiong, D. Niyato, et al., "A stackelberg game approach towards socially-aware incentive mechanisms for mobile crowdsensing," arXiv preprint arXiv:1807.08412v1, 2018.
[25] J. Nie, Z. Xiong, D. Niyato, et al., "A socially-aware incentive mechanisms for mobile crowdsensing service market," arXiv preprint arXiv:1807.08412v1, 2018.
[26] L. Xiao, T. Chen, C. Xie, H. Dai, and H. V. Poor, "Mobile crowdsensing games in vehicular networks," IEEE Trans. Veh. Technol., vol. 67, no. 2, pp. 1535–1545, 2018.
[27] J. Peng, Y. Zhu, W. Shu, and M. Wu, "When data contributors meet multiple crowdsourcers: Bilateral competition in mobile crowdsourcing," Comput. Netw., vol. 95, pp. 1–14, 2016.
[28] A. Chakeri and L. G. Jaimes, "An incentive mechanism for crowdsensing markets with multiple crowdsourcers," IEEE Internet of Things J., vol. 5, no. 2, pp. 708–715, 2018.
[29] A. Sinha, P. Malo, A. Frantsev, and K. Deb, "Finding optimal strategies in a multi-period multi-leader–follower stackelberg game using an evolutionary algorithm," Computers & Operations Research, vol. 41, pp. 374–385, 2014.
[30] S. He, D.-H. Shin, J. Zhang, J. Chen, and P. Lin, "An exchange market approach to mobile crowdsensing: pricing, task allocation, and walrasian equilibrium," IEEE JSAC, vol. 35, no. 4, pp. 921–934, 2017.
[31] Y. Zhan, Y. Xia, and J. Zhang, "Quality-aware incentive mechanism based on payoff maximization for mobile crowdsensing," Ad Hoc Netw., vol. 72, pp. 44–55, 2018.
[32] X. Duan, C. Zhao, S. He, P. Cheng, and J. Zhang, "Distributed algorithms to compute walrasian equilibrium in mobile crowdsensing," IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4048–4057, 2017.
[33] M. H. Cheung, F. Hou, and J. Huang, "Make a difference: Diversity-driven social mobile crowdsensing," in IEEE INFOCOM, 2017, pp. 1–9.
[34] T. Liu and Y. Zhu, "Social welfare maximization in participatory smartphone sensing," Comput. Netw., vol. 73, pp. 195–209, 2014.
[35] D. Fudenberg and J. Tirole, Game Theory. Massachusetts: Cambridge Press, 1991.
[36] R. S. Sutton, A. G. Barto, et al., Reinforcement Learning: An Introduction. MIT Press, 1998.
[37] V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," in NIPS, 2000, pp. 1008–1014.
[38] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in NIPS, 2000, pp.
Yufeng Zhan received his Ph.D. degree from the School of Automation, Beijing Institute of Technology, Beijing, China, in 2018. His research interests include mobile computing, machine learning, and networked control systems.

Jiang Zhang is currently a final-year undergraduate student at the School of Computer Science and Technology, Beijing Institute of Technology, China. His research interests include machine learning and control science and technology.