Cooperative Computation Offloading and Resource Allocation For Blockchain-Enabled Mobile Edge Computing: A Deep Reinforcement Learning Approach

This document summarizes a research article that proposes a cooperative computation offloading and resource allocation framework for blockchain-enabled mobile edge computing systems. The framework aims to maximize computation rates and transaction throughput by jointly optimizing offloading decisions, power allocation, block size, and block interval. It formulates the dynamic optimization problem as a Markov decision process and develops an asynchronous advantage actor-critic algorithm using deep neural networks to solve it. Simulation results show the proposed algorithm converges fast and outperforms existing schemes in terms of total reward.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2961707, IEEE Internet of Things Journal

Cooperative Computation Offloading and Resource Allocation for Blockchain-Enabled Mobile Edge Computing: A Deep Reinforcement Learning Approach
Jie Feng, F. Richard Yu, Fellow, IEEE, Qingqi Pei, Xiaoli Chu, Jianbo Du, and Li Zhu

Abstract—Mobile edge computing (MEC) is a promising paradigm for improving the quality of computation experience of mobile devices, because it allows them to offload computing tasks to MEC servers and benefit from the servers' powerful computing resources. However, existing work on computation offloading still leaves several issues open: 1) security and privacy, 2) cooperative computation offloading, and 3) dynamic optimization. To address the security and privacy issues, we employ blockchain technology, which ensures the reliability and irreversibility of data in MEC systems. Meanwhile, we jointly design and optimize the performance of blockchain and MEC. In this paper, we develop a cooperative computation offloading and resource allocation framework for blockchain-enabled MEC systems. In the framework, we design a multi-objective function to maximize the computation rate of MEC systems and the transaction throughput of blockchain systems by jointly optimizing the offloading decision, power allocation, block size and block interval. Due to the dynamic characteristics of the wireless fading channel and the processing queues at MEC servers, the joint optimization is formulated as a Markov decision process (MDP). To tackle the dynamics and complexity of the blockchain-enabled MEC system, we develop an A3C-based cooperative computation offloading and resource allocation algorithm to solve the MDP problem. In the algorithm, deep neural networks are optimized by utilizing asynchronous gradient descent and eliminating the correlation of data. Simulation results show that the proposed algorithm converges fast and achieves significant performance improvements over existing schemes in terms of total reward.

Index Terms—Mobile edge computing, blockchain, computation offloading, transaction throughput, A3C.

This work is supported by the National Key Research and Development Program of China under Grant 2018YFE0126000, the Key Program of NSFC-Tongyong Union Foundation under Grant U1636209, the National Natural Science Foundation of China under Grant 61902292, the Key Research and Development Programs of Shaanxi under Grant 2019ZDLGY13-07 and 2019ZDLGY13-04, the Natural Science Foundation of China under Grant 61901367, and the Doctoral Student’s Short-Term Study Abroad Scholarship Fund of Xidian University. (Corresponding authors: Qingqi Pei and F. Richard Yu)
Jie Feng and Qingqi Pei are with State Key Laboratory of Integrated Services Networks (Xidian University), School of Telecomm. Engineering, Xidian University, Xi’an, Shaanxi, China (email: [email protected]; [email protected]).
F. Richard Yu is with the Dept. of Systems and Computer Eng., Carleton University, Ottawa, ON, Canada (email: [email protected]).
Xiaoli Chu is with University of Sheffield, S1 3JD, UK (email: [email protected]).
Jianbo Du is with Shaanxi Key Laboratory of Information Communications, Xi’an University of Posts and Telecommunications, Xi’an, Shaanxi, China (email: [email protected]).
L. Zhu is with Beijing Jiaotong University, Beijing, P.R. China (email: zhulibj-[email protected]).

I. INTRODUCTION

Mobile edge computing (MEC) is a promising technology that can enhance the computation capability of mobile devices by offloading their computing tasks to MEC servers [1]. Compared with a centralized cloud computing system, the distributed structure of MEC systems has many advantages, including reduced energy consumption and decreased latency. Many efforts have been made on computation offloading and resource allocation in MEC systems [2]–[5]. However, these existing methods are not suitable for some specific environments because of the following challenges.

1) Security and Privacy Issues: Most of the existing studies [6], [7] pay little attention to the security and privacy of MEC. The interaction between heterogeneous edge nodes and the migration of services across edge nodes are likely to challenge its security and privacy. To address these issues, blockchain has been envisioned as a promising approach [8]. Different from traditional digital ledger approaches, which depend on a trusted central authority, blockchain employs community verification to synchronize the decentralized ledgers that are replicated across multiple nodes [9]. Blockchain can facilitate the establishment of trusted, secure, and decentralized MEC systems. In blockchain-enabled MEC systems, MEC servers not only handle their own tasks but also deal with tasks from the blockchain system (e.g., generating blocks and performing the consensus process), which makes the design of the system more complex. Therefore, the design and optimization of blockchain and MEC should be carried out simultaneously.

2) Cooperative Computation Offloading: This approach has been considered by only a few researchers in previous works. Most existing computation offloading schemes [6], [7] assume that computing tasks can be directly offloaded to MEC servers via wireless communications. However, a mobile device may experience weak or intermittent connectivity and thus be unable to offload computing tasks to MEC servers directly. If computing tasks are forced to be offloaded to MEC servers directly, the computation offloading performance of mobile devices may suffer due to signal loss. In that case, a mobile device must offload computing tasks to MEC servers with the help of neighbouring nodes. Therefore, it is necessary to study cooperative computation offloading. Furthermore, if there exist malicious nearby nodes, the data security and privacy of

2327-4662 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

mobile devices will be susceptible to attacks. Therefore, a trust model needs to be considered in cooperative computation offloading.

3) Dynamic Optimization: Moreover, in most of the existing works [6], [10], [11], the computation offloading decision and resource allocation strategies are optimized for a single time slot, so the long-term computation offloading performance cannot be characterized [12], [13]. The design and optimization of blockchain-enabled MEC systems should account for the environmental dynamics, such as the time-varying channel conditions and the task arrivals.

To deal with the first two challenges, in this paper, we propose to maximize the weighted sum of the computation rate and the transaction throughput for blockchain-enabled MEC systems by jointly optimizing the cooperative computation offloading decision and resource allocation. Specifically, the computation tasks are offloaded from mobile devices to MEC servers through cooperative communications, wherein blockchain technology is applied to guarantee data security. For the dynamic optimization issue, we formulate the joint optimization as a Markov decision process (MDP) problem, and develop an efficient deep reinforcement learning (DRL) based offloading decision and resource allocation algorithm to solve the problem.

The contributions of this paper are summarized as follows.
• In most existing works [10], [11], [14], [15], the design and optimization of blockchain and MEC are done separately, which results in sub-optimal performance. We propose a cooperative computation offloading framework for blockchain-enabled MEC systems to enable the joint analysis of the MEC computation rate and the blockchain transaction throughput while considering the trust model.
• The study jointly considers the offloading decision, power allocation, block size, and block interval to maximize the weighted sum of the computation rate of MEC systems and the transaction throughput of blockchain systems. Considering the dynamic characteristics of wireless channels and the available resources, the optimization problem is formulated as an MDP. Since the action space of the MDP problem has both continuous actions and discrete actions, traditional learning algorithms, such as Q-learning [16] and SARSA [17], are not applicable. An asynchronous advantage actor-critic (A3C) reinforcement learning algorithm is introduced to solve the MDP problem, in which deep neural networks are optimized by using asynchronous gradient descent and eliminating the correlation of data.
• The proposed algorithm and the baseline schemes are implemented using TensorFlow on a Python-based simulator. Extensive simulation results show that the proposed algorithm converges well and achieves significant performance improvements over existing algorithms. Furthermore, we observe that the proposed scheme can achieve the optimal trade-off between the performance of the MEC system and the blockchain system.

The rest of this paper is organized as follows. In Section II, we discuss related research. We introduce the system model in Section III. In Section IV, the trust calculation in blockchain-enabled MEC systems is described. In Section V, the joint problem of offloading decision and resource allocation is formulated. We introduce the offloading decision and resource allocation in the A3C framework in Section VI. The performance of the proposed algorithm is evaluated and analyzed by simulations in Section VII. Finally, in Section VIII, we conclude this paper and look forward to future work.

II. RELATED WORK

The cooperative computation offloading problem has been widely studied for MEC and cloud computing systems [18]. Hong et al. [19] proposed a quality of service (QoS) cooperative computation offloading problem for robot swarms in cloud systems, aimed at minimizing latency. Cao et al. [20] studied a novel cooperative computation offloading scheme based on both computation and communication of MEC systems to improve the energy efficiency for latency-constrained computation. Guo et al. [21] presented an efficient dynamic offloading and resource scheduling strategy to decrease energy consumption and latency. However, these approaches do not take into account the security and privacy of data in cooperative computation offloading.

The application of blockchain in MEC systems can significantly improve the network security, data integrity and computation validity of the system [22]. The computation offloading problem has also been studied for the blockchain-enabled MEC system [11], [14]. Liu et al. [10] proposed a novel blockchain-based framework with an adaptive block size in MEC systems, which considered two offloading models, i.e., offloading to MEC servers or to nearby device-to-device users. Kang et al. [23] proposed a secure and distributed vehicular blockchain for data management in vehicular edge computing and networks. Based on the common decentralization characteristic of MEC and blockchain technology, Xu et al. [24] proposed a trustless crowd-intelligence ecosystem to improve network congestion. However, these works only consider blockchain as an overlay system above the MEC system, which gives rise to sub-optimal performance. Furthermore, these approaches utilize static optimization techniques, which cannot characterize the long-term computation offloading performance. Therefore, their methods cannot be applied in practical dynamic systems.

Deep reinforcement learning (DRL) is emerging as one of the efficient methods to obtain the optimal decision-making policy and maximize long-term rewards. Therefore, the use of DRL to solve computation offloading problems for blockchain-enabled MEC systems has attracted considerable interest from academia. Qiu et al. [25] proposed a model-free DRL-based computation offloading scheme for blockchain-enabled MEC systems while considering mining tasks and data processing tasks. A computation offloading problem based on an advanced deep Q-learning network (DQN) was presented to minimize energy consumption and delay [26]. However, all of the above algorithms can only handle a discrete action space and do not apply to continuous action cases. Therefore, we develop an A3C-based cooperative computation offloading and resource allocation algorithm to achieve the optimal trade-off between the performance of the MEC system and the


performance of the blockchain system while considering the trust model.

III. SYSTEM MODEL

In this section, we first present the network model, and then describe the MEC model and the blockchain model in detail.

A. Network Model

We consider a blockchain-enabled MEC system, as shown in Fig. 1, which is composed of an MEC system and a blockchain system. In the MEC system, a single macrocell base station (MBS) is located in the center of the coverage area. Several small base stations (SBSs) are distributed around the MBS, and all BSs, each of which is integrated with an MEC server, are connected by wired links. We consider an interference-free system in this paper, in which users utilize orthogonal spectrum for data transmission [27]. Let $\mathcal{N} = \{1, 2, \ldots, N\}$ denote the set of BSs. We assume that each BS serves only one mobile device [28]; the scenario in which each BS serves multiple mobile devices will be discussed in our future work. Therefore, the set of mobile devices is denoted by $\mathcal{N}^* = \mathcal{N} = \{1, 2, \ldots, N\}$, where mobile device $n$ is served by its corresponding BS $n$. We assume that each mobile device is running independent and fine-grained tasks [2], [29]. Since mobile devices have relatively weak computing capability, their computing tasks need to be completed with the help of MEC servers to improve the quality of the user’s computation experience.

Fig. 1: The system model (blockchain system: consensus nodes, ordinary nodes, blocks, transactions and the consensus process; MEC system: mobile devices, relay nodes, small base stations, a macrocell base station and MEC servers; offloading in Phase 1 and Phase 2).

We adopt cooperative communications to offload tasks in the MEC system, i.e., offloading computation tasks with the help of relay nodes, to meet the computation requirements of mobile devices that are far from MEC servers. Assume that there are $R$ relay nodes around each mobile device, and each mobile device can select only one relay node to offload computation tasks collaboratively. Let $\mathcal{R} = \{1, 2, \ldots, R\}$ denote the set of relay nodes within the coverage of each BS. When computation tasks are offloaded by relaying, there may be selfish and malicious nodes. Therefore, security plays an important role in realizing cooperative communications [30]. In this paper, we consider a trust-based secure computation offloading scheme. Let $D^{trust}_{n \to r}$ denote the trust value of mobile device $n$ to relay node $r$.

In the blockchain system, the blockchain nodes consist of all BSs. These nodes have two types of roles: ordinary nodes and consensus nodes. The blockchain system mainly deals with transactions, i.e., offloading data records, from the MEC network. To handle the transactions, the blockchain system needs to complete two steps: block generation and the consensus process. Ordinary nodes only transfer and accept ledger data, while consensus nodes produce blocks and perform the consensus process. However, there may be a security issue during the block generation process, i.e., malicious consensus nodes may tamper with transaction data. Therefore, the trust value of each candidate should be considered when voting for consensus nodes. Consensus nodes with high trust values are likely to ensure a secure and reliable block generation process and consensus process [14]. Meanwhile, the blockchain system can also store some parameters for calculating the trust value of the interactive nodes (i.e., relay nodes and consensus nodes), such as network status, resource availability, and trustworthiness of interactive nodes [31]. Note that $K$ consensus nodes are selected out of $N$ according to a certain rule (specified in Subsection III-C). Let $\mathcal{K} = \{1, 2, \ldots, K\}$ denote the set of consensus nodes. Similar to many previous works [2], [15], time is slotted in this paper and the length of a time slot is $\Delta t$. All the notations used are listed in Table I.

B. MEC System

1) Local Computing Mode: Let $L_n$ denote the number of CPU cycles required for mobile device $n$ to process a 1-bit computing task, which is determined by the type of application and can be obtained by off-line measurements [32]. We let $f_n$ denote the CPU-cycle frequency of mobile device $n$, which is adjusted by dynamic voltage and frequency scaling (DVFS) and must meet the constraint $f_n \le f_n^{max}$. Let $t_{loc}$ ($0 \le t_{loc} \le \Delta t$) denote the computing time of the mobile device. The computation rate for local computation (in bits per second), denoted by $r_{loc}$, can be given by

$$r_{loc} = \frac{f_n}{L_n}. \tag{1}$$

2) Cooperative Offloading Mode: Cooperative offloading is composed of two phases, as shown in Fig. 1. In the first phase, the mobile device transmits wireless signals that contain the offloaded data to the BS, which are simultaneously overheard by a relay node. The selected relay node forwards the detected signals to BS by employing the regenerate and


forward scheme. The offloaded data received in the two phases is combined at the BS using maximal ratio combining (MRC) [33]. The transmit rate of mobile device $n$ when relay node $r$ is selected in time slot $t$ is expressed as

$$R_n(t) = \frac{1}{2}\min\left\{\log_2\left(1 + \frac{P_n(t)g_n(t)}{\sigma_n^2(t)}\right),\ \log_2\left(1 + \frac{P_n(t)g_{n,r}(t)}{\sigma_{n,r}^2(t)} + \frac{P_{r,n}(t)g_{r,n}(t)}{\sigma_{r,n}^2(t)}\right)\right\}, \tag{2}$$

where $P_n(t)$ and $P_{r,n}(t)$ are the transmit power of mobile device $n$ and of relay node $r$ to BS $n$ in time slot $t$, respectively, and $g_n(t)$, $g_{n,r}(t)$, and $g_{r,n}(t)$ are the channel gains between mobile device $n$ and BS $n$, between mobile device $n$ and relay node $r$, and between relay node $r$ and BS $n$ in slot $t$, respectively.

TABLE I: Notation Definitions

Symbol | Definition
$\mathcal{N}^*/\mathcal{N}$ | The set of all mobile devices or BSs
$\mathcal{R}$ | The set of all relay nodes
$D^{trust}_{n \to r}$ | The trust value of mobile device $n$ to relay node $r$
$\mathcal{K}$ | The set of the block producers
$P_n(t)$ | The transmit power of mobile device $n$ in slot $t$
$P_{r,n}(t)$ | The transmit power of relay node $r$ to BS $n$ in slot $t$
$g_n(t)$ | The channel gain of mobile device $n$ to BS $n$ in slot $t$
$g_{n,r}(t)$ | The channel gain of mobile device $n$ to relay node $r$ in slot $t$
$g_{r,n}(t)$ | The channel gain of relay node $r$ to BS $n$ in slot $t$
$\psi_n$ | The delay of mobile device $n$ in offloading data
$C_n$ | The processing density
$s_n$ | The CPU-cycle frequency
$F'$ | The total computation capability of the MEC server
$\sigma_n(t)$ | The noise variance of mobile device $n$ to BS $n$ in slot $t$
$\sigma_{n,r}(t)$ | The noise variance of mobile device $n$ to relay node $r$ in slot $t$
$\sigma_{r,n}(t)$ | The noise variance of relay node $r$ to BS $n$ in slot $t$
$\tau_n$ | The tolerable maximum delay
$T_{min}$ | The minimum computing capacity required by the blockchain system
$f_n^{max}$ | The maximum CPU-cycle frequency of mobile device $n$
$S_b(t)$ | The block size in slot $t$
$T_b(t)$ | The block interval in slot $t$

The total power of the mobile device and the relay node in BS $n$ is given by

$$P_{tot,n}(t) = P_n(t) + P_{r,n}(t). \tag{3}$$

When the direct channel is worse than the relay channel, i.e., $g_n(t)/\sigma_n^2(t) < g_{n,r}(t)/\sigma_{n,r}^2(t)$, cooperative computation offloading is adopted. In cooperative computation offloading, any increase of power has to be shared between the mobile device and the relay node. Therefore, the transmit rate reaches its maximum when the following equation is satisfied.

$$\frac{P_n(t)g_n(t)}{\sigma_n^2(t)} + \frac{P_{r,n}(t)g_{r,n}(t)}{\sigma_{r,n}^2(t)} = \frac{P_n(t)g_{n,r}(t)}{\sigma_{n,r}^2(t)}. \tag{4}$$

According to (3) and (4), the transmit power of mobile device $n$ and that of relay node $r$ to BS $n$ are respectively given by

$$P_n(t) = \frac{g_{r,n}(t)P_{tot,n}(t)/\sigma_{r,n}^2(t)}{g_{n,r}(t)/\sigma_{n,r}^2(t) + g_{r,n}(t)/\sigma_{r,n}^2(t) - g_n(t)/\sigma_n^2(t)}, \tag{5}$$

$$P_{r,n}(t) = \frac{\left(g_{n,r}(t)/\sigma_{n,r}^2(t) - g_n(t)/\sigma_n^2(t)\right)P_{tot,n}(t)}{g_{n,r}(t)/\sigma_{n,r}^2(t) + g_{r,n}(t)/\sigma_{r,n}^2(t) - g_n(t)/\sigma_n^2(t)}. \tag{6}$$

Substituting (5) and (6) into (2), we have

$$\frac{P_n(t)g_n(t)}{\sigma_n^2(t)} = \frac{g_{r,n}(t)g_n(t)P_{tot,n}(t)/\left(\sigma_{r,n}^2(t)\sigma_n^2(t)\right)}{g_{n,r}(t)/\sigma_{n,r}^2(t) + g_{r,n}(t)/\sigma_{r,n}^2(t) - g_n(t)/\sigma_n^2(t)}, \tag{7}$$

$$\frac{P_n(t)g_{n,r}(t)}{\sigma_{n,r}^2(t)} + \frac{P_{r,n}(t)g_{r,n}(t)}{\sigma_{r,n}^2(t)} = \frac{\left(g_{r,n}(t)/\sigma_{r,n}^2(t)\right)\left(2g_{n,r}(t)/\sigma_{n,r}^2(t) - g_n(t)/\sigma_n^2(t)\right)P_{tot,n}(t)}{g_{n,r}(t)/\sigma_{n,r}^2(t) + g_{r,n}(t)/\sigma_{r,n}^2(t) - g_n(t)/\sigma_n^2(t)}. \tag{8}$$

Since $g_n(t)/\sigma_n^2(t) < g_{n,r}(t)/\sigma_{n,r}^2(t)$, we have $g_n(t)/\sigma_n^2(t) < 2g_{n,r}(t)/\sigma_{n,r}^2(t) - g_n(t)/\sigma_n^2(t)$, so the first term of the minimum in (2), given by (7), is the smaller one. Then, the transmit rate at which mobile device $n$ offloads the computation tasks through relay node $r$ is given by

$$R_{n,r}(t) = \frac{B D^{trust}_{n \to r}(t)}{2}\log_2\left(1 + \frac{g_{r,n}(t)g_n(t)P_{tot,n}(t)/\left(\sigma_{r,n}^2(t)\sigma_n^2(t)\right)}{g_{n,r}(t)/\sigma_{n,r}^2(t) + g_{r,n}(t)/\sigma_{r,n}^2(t) - g_n(t)/\sigma_n^2(t)}\right), \tag{9}$$

where $B$ is the bandwidth.

For secure communications, the trust value of the relay node should be considered when offloading data through the relay node. The selection of the relay node is therefore based on the transmit rate $R_{n,r}(t)$. When $R_{n,r^*}(t) > R_{n,r}(t)$, $r \in \mathcal{R}$, mobile device $n$ offloads computation tasks through relay node $r^*$. Then the rate of mobile device $n$ is given by

$$r'_n(t) = \max\{R_{n,r}(t), \forall r \in \mathcal{R}\}. \tag{10}$$

The delay for mobile device $n$ to offload data is $\psi_n$. Then the offloaded data size for mobile device $n$ in slot $t$ is given by

$$D_n(t) = b_n(t)\upsilon(t) = r'_n(t)\psi_n, \tag{11}$$

where $b_n(t)$ and $\upsilon(t)$ are the amount of raw data and the communication overhead in computation offloading, respectively. Then, the computation rate of cooperative offloading, denoted by $r_{n,off}(t)$, is given by

$$r_{n,off}(t) = \frac{r'_n(t)\psi_n}{\Delta t\,\upsilon(t)}. \tag{12}$$

After decoding the signals from mobile devices and relay nodes, MEC servers can perform the offloaded tasks. The CPU clock speed consumed by the computing tasks of mobile device $n$ is represented by $s_n$ (in CPU cycles/s), which is a


constant. Then the time that the MEC server takes to execute the offloaded tasks of mobile device $n$ is given by

$$\tau_n(t) = \frac{D_n(t)L_n}{s_n} = \frac{r'_n(t)\psi_n L_n}{s_n}. \tag{13}$$

Since the computation results are very small, we ignore the time needed to return them in this paper. The computation rate of MEC server $n$ is given by $s_n/L_n$. The time after which the computation tasks of mobile device $n$ are successfully executed is given by

$$t'_{n,off}(t) = \psi_n + \tau_n(t). \tag{14}$$

Accordingly, the total computation rate and the total time of mobile device $n$ are respectively given by

$$r_n(t) = a_n(t)r_{loc}(t) + (1 - a_n(t))\left(r_{n,off}(t) + \frac{s_n}{L_n}\right), \tag{15}$$

$$t_{tot,n}(t) = a_n(t)t_{loc} + (1 - a_n(t))t'_{n,off}(t), \tag{16}$$

where $a_n(t) \in \{0, 1\}$ is the offloading decision of mobile device $n$. When $a_n(t) = 1$, the computation tasks of mobile device $n$ are executed locally. Otherwise, the computation tasks are offloaded to the MEC server.

C. Blockchain System

Any node in the blockchain can collect the transactions from the MEC system. To improve the system performance, some blockchain nodes with a high number of votes are selected as consensus nodes to participate in generating and verifying blocks. The number of votes for a consensus node candidate is determined by the number of stakes it holds, its available resources and its trust value. We assume that the stake and available computing resource of blockchain node $n$ in slot $t$ are represented by $\Phi_n(t)$ and $T_n(t)$, respectively. The available computing resource of the node is the resource remaining after processing the offloaded tasks. Denote the sets of the stakes and available computing resources of nodes by $\Phi_s(t) = \{\Phi_1(t), \Phi_2(t), \ldots, \Phi_N(t)\}$ and $T(t) = \{T_1(t), T_2(t), \ldots, T_N(t)\}$, respectively. We assume that the MEC server has a first in first out (FIFO) data buffer to store the offloaded tasks that have arrived but not yet been executed. Hence, the dynamics of the processing queue at the beginning of time slot $t + 1$ can be given as follows.

$$F_n(t + 1) = \max\{F_n(t) - s_n + \rho_n r_n(t), 0\}, \tag{17}$$

where $\rho_n$ is the processing density (in CPU cycles/bit). Then, the computing resource that MEC server $n$ makes available to the blockchain system in slot $t$ is given by

$$T_n(t) = \max\{F' - F_n(t), T_{min}\}, \tag{18}$$

where $F'$ and $T_{min}$ are the total computing capacity of MEC servers and the minimum computing capacity required by the blockchain system, respectively. Let $D_n^{trust}$ denote the trust value of blockchain nodes. In this paper, we assume that the $K$ block producers generate blocks in turn [15]. Let $S_b(t)$ and $T_b(t)$ denote the block size (in MB) and block interval (in seconds) in slot $t$, respectively.

After a block is generated, it needs to be verified. In the consensus process, we utilize the delegated Byzantine fault tolerance (dBFT) consensus mechanism [34]. When there are $K$ consensus nodes in the consensus process, we assume that $K \ge 3f + 1$, where $f$ is the maximum number of faulty nodes that can be tolerated. In the consensus process, the leading node is called the speaker, and the others are called members. The speaker is responsible for broadcasting new block proposals to other nodes. The members are responsible for voting on the new block proposal. When the number of votes is not less than $K - f$, the proposal is passed. The speaker $p$ of the consensus process is determined by

$$p = (h - v) \bmod N, \tag{19}$$

where $h$ is the block height of the current consensus, and $v$ is the view number. The process of the consensus algorithm is shown in Fig. 2.

Fig. 2: The process of the consensus algorithm (phases: pre-prepare, prepare, persist; nodes 0 to 3).

The algorithm can be divided into three phases: pre-prepare, prepare, and persist. During the pre-prepare phase, the speaker for this round broadcasts a message to the other members and launches a proposal. In the prepare phase, the members broadcast the message and vote. When a consensus node receives no less than $K - f$ signatures of the block, it enters the third phase, in which a block is successfully generated. The block is then broadcast to the whole blockchain system, and the nodes enter the next round of the consensus process.

Let $T_c(t)$ denote the time cost of the consensus process. For simplicity, the consensus process is divided into two parts, i.e., message propagation and message verification, the latter including signature verification, message authentication code (MAC) generation, and MAC verification [35]. Then, the latency of the consensus process in slot $t$ is given by

$$T_c(t) = T_p(t) + T_v(t), \tag{20}$$

where $T_p(t)$ and $T_v(t)$ are the message propagation time and the validation time in slot $t$, respectively.

Similar to [15], we utilize latency time to finality (LTF) as the latency of the blockchain system. The LTF is given by

$$T_{total}(t) = T_c(t) + T_b(t). \tag{21}$$

Then, the transaction throughput [15] can be expressed as

$$\Psi(t) = \frac{\lfloor S_b(t)/\chi \rfloor}{T_b(t)}, \tag{22}$$

where $\chi$ is the average size of transactions.
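
To make the chain of formulas in Section III easier to follow, the Python sketch below evaluates them for a single slot: the relay power split of (5)-(6), the trust-weighted rate (9), the offloading rate (12), the total rate (15), the queue update (17), the blockchain resource (18), the speaker rule (19), the dBFT voting threshold, and the throughput (22). It is only an illustration under stated assumptions, not the authors' simulator. All function and argument names are hypothetical, and the shorthand a, b, c for the three gain-to-noise ratios is introduced here purely for brevity.

```python
import math


def relay_power_split(p_tot, a, b, c):
    """Split the total power between device and relay, Eqs. (5)-(6).

    Shorthand used only in this sketch:
      a = g_{n,r}/sigma_{n,r}^2  (device -> relay)
      b = g_{r,n}/sigma_{r,n}^2  (relay -> BS)
      c = g_n/sigma_n^2          (device -> BS, direct link)
    """
    if not c < a:
        raise ValueError("cooperative mode assumes the direct link is weaker (c < a)")
    denom = a + b - c
    return b * p_tot / denom, (a - c) * p_tot / denom


def relay_rate(bandwidth, trust, p_tot, a, b, c):
    """Trust-weighted transmit rate through one relay, Eq. (9)."""
    p_n, _ = relay_power_split(p_tot, a, b, c)
    # p_n * c is the direct-link SNR of Eq. (7).
    return 0.5 * bandwidth * trust * math.log2(1.0 + p_n * c)


def offload_rate(r_best, psi, dt, overhead):
    """Computation rate of cooperative offloading, Eq. (12).

    r_best is the largest relay rate, i.e. max over relays as in Eq. (10).
    """
    return r_best * psi / (dt * overhead)


def total_rate(a_n, r_loc, r_off, s_n, l_n):
    """Total computation rate, Eq. (15); a_n = 1 means local execution."""
    return a_n * r_loc + (1 - a_n) * (r_off + s_n / l_n)


def next_queue(f_n, s_n, rho_n, r_n):
    """Processing-queue update at the MEC server, Eq. (17)."""
    return max(f_n - s_n + rho_n * r_n, 0.0)


def blockchain_resource(f_total, f_n, t_min):
    """Computing resource left for the blockchain system, Eq. (18)."""
    return max(f_total - f_n, t_min)


def speaker(h, v, n_nodes):
    """Index of the speaker for block height h and view number v, Eq. (19)."""
    return (h - v) % n_nodes


def proposal_passes(votes, k, f):
    """dBFT rule from the text: K >= 3f + 1, and a proposal needs >= K - f votes."""
    if k < 3 * f + 1:
        raise ValueError("need K >= 3f + 1 consensus nodes")
    return votes >= k - f


def throughput(block_size, avg_tx_size, block_interval):
    """Transaction throughput, Eq. (22); both sizes must use the same unit."""
    return math.floor(block_size / avg_tx_size) / block_interval


if __name__ == "__main__":
    # Arbitrary illustrative numbers, not taken from the paper.
    rates = [relay_rate(10.0, trust, 7.5, 4.0, 2.0, 1.0) for trust in (0.6, 0.8)]
    r_off = offload_rate(max(rates), psi=0.5, dt=1.0, overhead=2.0)
    r_n = total_rate(0, r_loc=5.0, r_off=r_off, s_n=100.0, l_n=10.0)
    print("total rate:", r_n)
    print("next queue:", next_queue(100.0, 50.0, 2.0, r_n))
    print("throughput:", throughput(8.0, 0.5, 4.0))
```

Because the trust value multiplies the rate in (9), a relay with a better channel but lower trust can lose to a slightly worse but more trusted relay when the maximum in (10) is taken.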


IV. TRUST CALCULATION IN BLOCKCHAIN-ENABLED MEC SYSTEMS

For secure communications, only relay nodes with high trust values should be chosen to relay the offloaded data to the MEC server. If computing tasks are relayed by a relay node with a low trust value, the relay node may take malicious actions, e.g., dropping relayed data packets. Therefore, each mobile device should interact with relay nodes with high trust values to avoid potential security threats. Similarly, malicious block producers may generate a fake block. Therefore, the consensus nodes selected should have a higher trust value. To compute the trust values of nodes (relay nodes and consensus nodes), we jointly utilize two common evaluation methods, i.e., direct trust and indirect trust (recommendation) [36]. Direct trust values of nodes are calculated based on subjective logic, while indirect trust values are computed based on the third party's recommendations. In this work, we evaluate the trust value of a node by a real number ranging from 0 to 1. Like most literature, such as [14], [37], the trust threshold is set to 0.5. In other words, the node is trustworthy when its trust value is higher than 0.5; otherwise, it is not trustworthy. Next, we first calculate the trust value of relay nodes.

A. Calculation of Direct Trust

Similar to [36], we utilize node honesty and node competence to calculate direct trust. Since mobile communication channels between mobile devices and relay nodes are unstable and noisy, the communication behaviors in computation offloading involve considerable uncertainty. We tackle the uncertainty by using a Subjective Logic framework [38]. The trust value of mobile device n to relay node r in the Subjective Logic framework can be described as a triplet $\omega_{n\to r} = \{b_{n\to r}, d_{n\to r}, \nu_{n\to r}\}$, where $b_{n\to r}$, $d_{n\to r}$, and $\nu_{n\to r}$ represent belief, disbelief and uncertainty, respectively. Specifically, the relationships among them are determined by

$$b_{n\to r},\ d_{n\to r},\ \nu_{n\to r} \in [0, 1], \qquad b_{n\to r} + d_{n\to r} + \nu_{n\to r} = 1. \tag{23}$$

Based on the trust model of [39], the node honesty (NH) can be given by

$$NH_{n\to r} = b_{n\to r} + \xi \nu_{n\to r}, \tag{24}$$

where $0 \le \xi \le 1$ is a constant indicating the degree of influence of trust uncertainty [40] and

$$\begin{aligned} b_{n\to r} &= (1 - \nu_{n\to r}) \frac{\alpha_{n\to r}}{\alpha_{n\to r} + \beta_{n\to r}}, \\ d_{n\to r} &= (1 - \nu_{n\to r}) \frac{\beta_{n\to r}}{\alpha_{n\to r} + \beta_{n\to r}}, \\ \nu_{n\to r} &= 1 - s_{n\to r}, \end{aligned} \tag{25}$$

where $\alpha_{n\to r}$ and $\beta_{n\to r}$ are the numbers of successful and unsuccessful communications, respectively, and $s_{n\to r}$ represents the quality of the communication link, which refers to the packet success probability. The packet loss is not only caused by mobile communication channels, but also induced by malicious nodes [36]. Therefore, the values of $\alpha_{n\to r}$ and $\beta_{n\to r}$ can be respectively recast as

$$\alpha_{n\to r}^{new} = \alpha_{n\to r} + P_{n\to r}^{plr} \times (\alpha_{n\to r} + \beta_{n\to r}), \tag{26}$$

$$\beta_{n\to r}^{new} = \beta_{n\to r} - P_{n\to r}^{plr} \times (\alpha_{n\to r} + \beta_{n\to r}), \tag{27}$$

where $P_{n\to r}^{plr}$ is the packet loss rate. Similar to [36], the packet loss rate is estimated by the following equation.

$$P_{n\to r}^{plr} = 1 - \frac{\sum_{b}^{c} \omega(b) \times \omega(b)}{\sum_{b}^{c} \omega(b)}, \tag{28}$$

where $\omega(b)$ is the weight value of a historical link state and $link = (\omega(1), \omega(2), ..., \omega(b))$ is a historical link state record. The weight value is given by $\omega(b) = \frac{2b}{c(c+1)}$, where b and c are the serial number of $\omega(b)$ in link and the number of state records, respectively.

On the other hand, we assume that all relay nodes have the same initial energy consumption rate and energy level. When malicious nodes launch attacks, they consume an anomalous amount of energy. Therefore, we utilize energy as a quality of service (QoS) trust metric to measure whether a relay node is malicious or not. Let $P_{n\to r}^{pen}$ be the energy consumption rate, which is obtained by using the Ray Projection method [41] ($P_{n\to r}^{pen} \in [0, 1]$). Then the node competence (NC) is given by

$$NC_{n\to r} = \begin{cases} 1 - P_{n\to r}^{pen}, & \text{if } E_{n\to r}^{res} \ge \theta, \\ 0, & \text{otherwise,} \end{cases} \tag{29}$$

where $E_{n\to r}^{res}$ and $\theta$ are the residual energy of one relay node and the energy threshold, respectively.

As mentioned above, the node trust relies on the node honesty and node competence. Then, the direct trust of a relay node is defined as

$$D_{n\to r}^{direct} = \begin{cases} 0.5 + (NH_{n\to r} - 0.5) \times NC_{n\to r}, & \text{if } NH_{n\to r} \ge 0.5, \\ NH_{n\to r} \times NC_{n\to r}, & \text{otherwise.} \end{cases} \tag{30}$$

B. Calculation of Recommendation

For the calculation of the trust value, we also consider the recommendation from the third party, i.e., blockchain systems. We assume that some relay nodes are willing to contribute their resources to help mobile devices offload computing tasks. These relay nodes are called candidates. When a mobile device needs to offload tasks via the relay model, the candidates around it send a request to the blockchain system and recommend themselves to assist it in completing the task offloading. Upon receiving the request, the blockchain system will select a suitable candidate based on the recommended value of each candidate stored in the system. We assume that the blockchain system periodically updates and stores the candidate's recommended value. However, not every updated recommendation is reliable. If only a single updated recommended value of a candidate is considered, it is likely that an unreliable candidate's recommendation is adopted, resulting in an unreliable


trust evaluation. Therefore, we need to detect whether the recommendation is reliable before calculating the trust value. For this purpose, we present a simple way to detect the trust value by defining the recommended reliability $R_{n\to r}^{rel}$. To begin with, we compute the average value of all updated recommendations for candidate r, denoted by $R_r^{ave}$. Then, we obtain the difference between the recommendation value and the average value. The greater the difference, the lower the reliability of the recommendation. Therefore, the recommended reliability $R_{n\to r}^{rel}$ is given by

$$R_{n\to r}^{rel} = 1 - | R_{n\to r}^{rec,i} - R_r^{ave} |, \tag{31}$$

where $R_{n\to r}^{rec,i}$ represents the recommended value for the ith update in the blockchain system.

If the recommended reliability of a recommender is less than 0.5, even if it has a high recommended value, the recommended value cannot be used to compute the recommended trust. Therefore, we obtain the recommended trust based on the recommended reliability and the recommended value as follows.

$$R_{n\to r}^{recom} = \frac{\sum_{i=1}^{I} R_{n\to r}^{rel} \times R_{n\to r}^{rec,i}}{I}, \tag{32}$$

where I is the number of updates. Therefore, the relay node trust is given by

$$D_{n\to r}^{trust} = \begin{cases} D_{n\to r}^{direct}, & \text{if } \alpha_{n\to r}^{new} \ge Th_{num}, \\ \omega_{direct} D_{n\to r}^{direct} + \omega_{recom} R_{n\to r}^{recom}, & \text{otherwise,} \end{cases} \tag{33}$$

where $\omega_{direct}$, $\omega_{recom}$, and $Th_{num}$ are the weight values of the direct value and the recommendation, and the number of interactions between recommenders and the blockchain system, respectively. $\omega_{direct} \in [0, 1]$, $\omega_{recom} \in [0, 1]$, and $\omega_{direct} + \omega_{recom} = 1$. Similarly, the trust value of the nodes in the blockchain system is evaluated using the same method.

V. PROBLEM FORMULATION

It is well known that wireless channels have the Markovian property [42], [43]. Therefore, the blockchain-enabled MEC system is formulated as a discrete MDP to maximize the system reward. Since it is impossible to predict the state transition probability and reward in advance in a mobile environment, we propose a model-free approach based on deep reinforcement learning to solve the above MDP problem. The MDP is defined by a tuple < S, A, P, r >, where S is the state set of the system, A is the action set of the system, P is the state transition probabilities, and r is the reward function.

A. State Space and State Transition Probability

We define the state space at the current decision epoch t (t = 1, 2, ...) as a union of the wireless channel conditions $G(t) = (g_n(t), g_{n,r}(t), g_{r,n}(t))$, the available computing resource of the MEC server $T(t) = \{T_1(t), T_2(t), ..., T_n(t)\}$, the number of the stakes $\Phi_s(t) = \{\Phi_1(t), \Phi_2(t), ..., \Phi_n(t)\}$, and the trust value of relay nodes and blockchain nodes $D^{trust}(t) = \{D_{n\to r}^{trust}(t), D_n^{trust}(t)\}$, which is denoted as

$$S(t) \triangleq \{ G(t), T(t), \Phi_s(t), D^{trust}(t) \}. \tag{34}$$

Since the state space is continuous, the probability of being in a particular state is zero. The probability that the process will leave the state s(t) to transition to the next state s(t + 1) after taking an action $a(t) \in A$ can be expressed as

$$Pr(s(t+1) \mid s(t), a(t)) = \int_{S_{t+1}} f(s(t), a(t), s')\, ds', \tag{35}$$

where f is the state transition probability density function.

B. Action Space

The action space includes offloading decision a(t), power allocation P(t), block size $S_b(t)$, and block interval $T_b(t)$. We utilize $A(t) = [a(t), P(t), a_{Sk}(t), a_{Tk}(t)]$ to define the action set.

Offloading Decision: The offloading decision is denoted by

$$a(t) \triangleq [a_1(t), a_2(t), ..., a_N(t)]. \tag{36}$$

Power Allocation Decision: The power allocation decision will be obtained based on achieving a maximum reward. We denote the power allocation decision by P(t), as shown below.

$$P(t) \triangleq [P_{total,1}(t), P_{total,2}(t), ..., P_{total,N}(t)]. \tag{37}$$

Block Size and Block Interval: The delegators are elected by voting based on the number of stakes held by the blockchain nodes, the trust value of blockchain nodes, and available computing resource. After determining the delegators, they take turns to produce blocks. By using the limits fractional method, the block size and block interval decisions are respectively given by

$$a_{Sk}(t) \in [0.2, \dot{S}_b], \tag{38}$$

$$a_{Tk}(t) \in [0.1, \dot{T}_b], \tag{39}$$

where $\dot{S}_b$ and $\dot{T}_b$ are the block size limit and the maximum block interval, respectively.

C. Reward Function

In this paper, we formulate an optimization problem to maximize the weighted sum of the computation rate of the MEC system and the transaction throughput of the blockchain system, which jointly optimizes offloading decision, power allocation, block size, and block interval. Then, the joint optimization problem is formulated as

$$\begin{aligned} \max_{A(t)}\ & \mathbb{E}\left[ \sum_{t=0}^{T-1} \left( \omega_1 \omega_2 \sum_{n=1}^{N} r_n(t) + (1 - \omega_1)\Psi(t) \right) \right] \\ \text{s.t.}\ & (C1): t_{tot,n}(t) \le \varepsilon, \\ & (C2): T_{total}(t) \le \omega \times T_b(t), \\ & (C3): 0 \le P_{tot,n}(t) \le P_T, \\ & (C4): a_n(t) \in \{0, 1\}, \end{aligned} \tag{40}$$


where $\omega > 1$, $\omega_1$ ($0 < \omega_1 < 1$) is a weight factor that combines the objectives into a single one, and $\omega_2$ is a mapping factor that ensures that the terms of the objective function are at the same level. $\varepsilon$ ($\varepsilon \le \Delta t$) is the maximum tolerable average delay in offloading tasks. $P_T$ is the sum of the power available for all mobile devices and relay nodes in the network. Then, we define the reward function as

$$r_t = \begin{cases} O(t), & \text{if C1 to C4 are satisfied,} \\ 0, & \text{otherwise,} \end{cases} \tag{41}$$

where $O(t) = \omega_1 \omega_2 \sum_{n=1}^{N} r_n(t) + (1 - \omega_1)\Psi(t)$.

VI. OFFLOADING DECISION AND RESOURCE ALLOCATION IN THE A3C FRAMEWORK

Compared with other DRL algorithms, such as actor-critic learning (AC), advantage actor-critic learning (A2C), and policy-based learning, A3C is a faster, simpler, and more robust parallel reinforcement learning algorithm proposed by Google DeepMind in 2016 [44]. It can reliably train deep neural network policies. Different from the underlying reinforcement learning algorithms, such as actor-critic, which is an on-policy search algorithm, and Q-learning, which is an off-policy value-based search algorithm, A3C combines the benefits of the value-based method and the policy-based method [44]. More importantly, it works in discrete as well as continuous action spaces. A3C utilizes asynchronous actor-learners, i.e., employing multiple CPU threads on a single machine, to learn more efficiently. Multiple actor-learners running in parallel can interact with their environment and obtain different exploration policies. Moreover, the exploration policy of each actor-learner is independent of those of the others. Hence, the overall exploration policy available for training becomes more diverse.

In an A3C algorithm, we need to maintain a policy $\pi(a_t | s_t; \theta)$ (a set of action probability outputs) with the parameter $\theta$ and an estimate of the value function $V(s_t; \theta_v)$ (how good a certain state is) with the parameter $\theta_v$. Compared with traditional policy gradient methods, A3C is more intelligent because the agent utilizes the estimated value function (the critic) to update the policy (the actor). The policy and the value function are updated in the terminal state or after at most $t_{max}$ actions. In the policy-based methods, the rule is updated by using the discounted return, which is given by

$$R_t(\theta_v) = \sum_{i=0}^{k-1} \gamma^i r_{t+i} + \gamma^k V(s_{t+k}; \theta_v), \tag{42}$$

where k varies from state to state and is upper-bounded by $t_{max}$, $r_{t+i}$ is the immediate reward, and $\gamma \in (0, 1]$ is the discount factor.

However, this estimate can have high variance. To reduce the variance of the estimate, an advantage estimate is adopted, which is given by

$$A(a_t, s_t) = Q(a_t, s_t) - V(s_t). \tag{43}$$

Since the $Q(a_t, s_t)$ value cannot be determined in A3C, the discounted return is used as the estimate of $Q(a_t, s_t)$ to generate an estimate of the advantage. Then, the advantage function is given by

$$A(a_t, s_t; \theta, \theta_v) = R_t(\theta_v) - V(s_t; \theta_v) = \sum_{i=0}^{k-1} \gamma^i r_{t+i} + \gamma^k V(s_{t+k}; \theta_v) - V(s_t; \theta_v). \tag{44}$$

The policy $\pi(a_t | s_t; \theta)$ and the value function $V(s_t; \theta_v)$ are approximated by using a single convolutional neural network. Specifically, the policy function is output by a softmax layer, and the estimate of the value function is output by a linear layer. In A3C, all network weights are stored in a central parameter server [45]. In the beginning, each actor-learner sets its network parameters to those of the server. Then, multiple actor-learners learn concurrently and optimize the convolutional neural network through asynchronous gradient descent. After computing the gradient, the actor-learners send the updates to the server. Then, the server propagates new weights to the actor-learners to ensure they share a common policy.

Two loss functions are associated with the two convolutional neural network outputs. For the policy loss function, we have

$$f_\pi(\theta) = \log \pi(a_t \mid s_t; \theta)(R_t - V(s_t; \theta_v)) + \beta H(\pi(s_t; \theta)), \tag{45}$$

where $H(\pi(s_t; \theta))$ is the entropy and $\beta$ is a hyperparameter that controls the strength of the entropy regularization term.

Differentiating the policy loss function in (45) with respect to the parameter $\theta$, we have

$$\nabla_\theta f_\pi(\theta) = \nabla_\theta \log \pi(a_t \mid s_t; \theta)(R_t - V(s_t; \theta_v)) + \beta \nabla_\theta H(\pi(s_t; \theta)). \tag{46}$$

The loss function for the estimated value function is given by

$$f_v(\theta_v) = (R_t - V(s_t; \theta_v))^2. \tag{47}$$

Similarly, differentiating the value loss function in (47) with respect to $\theta_v$, with $R_t$ treated as a fixed target, yields

$$\nabla_{\theta_v} f_v(\theta_v) = -2(R_t - V(s_t; \theta_v)) \nabla_{\theta_v} V(s_t; \theta_v). \tag{48}$$

The loss function can be minimized by adopting the RMSProp algorithm, which has been widely used in deep learning. Then, the estimate of the gradient under RMSProp is given by

$$g = \alpha g + (1 - \alpha) \Delta\theta^2, \tag{49}$$

where $\alpha$ is the momentum, and $\Delta\theta$ is the accumulated gradient of the loss function.

Then, the parameters are updated under RMSProp according to the following gradient descent step.

$$\theta \leftarrow \theta - \eta \frac{\Delta\theta}{\sqrt{g + \epsilon}}, \tag{50}$$

where $\eta$ is the learning rate, and $\epsilon$ is a small positive number.

Based on (46) and (48), the detail of the A3C algorithm used in our proposed approach is shown in Algorithm 1.
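To make (42) and (44) concrete, the following minimal Python sketch computes the bootstrapped discounted returns of a short rollout and the corresponding advantage estimates. It is an illustration under stated assumptions rather than the authors' implementation: the function names and the toy numbers are hypothetical, and the critic outputs are passed in as plain lists instead of being produced by a neural network.

```python
def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted returns R_t of Eq. (42) for every step of one rollout.

    rewards         -- immediate rewards r_t, ..., r_{t+k-1}
    bootstrap_value -- critic estimate V(s_{t+k}) (0.0 at a terminal state)
    gamma           -- discount factor in (0, 1]
    """
    returns = []
    running = bootstrap_value
    # Work backwards so that each step reuses the return of its successor.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage estimates of Eq. (44): R_t - V(s_t) for each step."""
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step rollout with critic values chosen by hand.
    rets = n_step_returns([1.0, 0.0, 2.0], bootstrap_value=0.5, gamma=0.5)
    print(rets)
    print(advantages(rets, [1.0, 1.0, 2.0]))
```

The backward recursion is equivalent to evaluating the sum in (42) separately for each step, but it costs only one pass over the rollout.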

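The RMSProp update in (49) and (50) can be sketched in a few lines of plain Python. This is a hedged illustration, not the TensorFlow optimizer used in the simulations: the function name `rmsprop_step` and the default values of `alpha` and `eps` are assumptions, and parameters are represented as flat lists of floats.

```python
import math


def rmsprop_step(theta, grad, g, lr=1e-3, alpha=0.99, eps=1e-8):
    """One element-wise RMSProp update following Eqs. (49) and (50).

    theta -- current parameters
    grad  -- accumulated gradient (Delta theta in the text)
    g     -- running average of squared gradients
    """
    # Eq. (49): g = alpha * g + (1 - alpha) * grad^2
    new_g = [alpha * g_i + (1.0 - alpha) * d_i * d_i for g_i, d_i in zip(g, grad)]
    # Eq. (50): theta = theta - lr * grad / sqrt(g + eps)
    new_theta = [
        th_i - lr * d_i / math.sqrt(g_i + eps)
        for th_i, d_i, g_i in zip(theta, grad, new_g)
    ]
    return new_theta, new_g


if __name__ == "__main__":
    # Toy single-parameter example with hand-picked values.
    theta, g = rmsprop_step([1.0], [2.0], [0.0], lr=0.1, alpha=0.75, eps=0.0)
    print(theta, g)
```

Dividing by the root of the running average normalizes the step size per parameter, so parameters with consistently large gradients take proportionally smaller steps.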

Algorithm 1 A3C-Based Computation Offloading and Resource Allocation Algorithm

Initialization:
• Assume that θ and θv are the parameters of the actor network and critic network in the global network.
• Assume that θ′ and θv′ are the parameters of the actor network and critic network in the local network.
• Set global counter T = 0 and local step counter t = 1.
• Set Tmax, tg, γ, learning rate η, ϵ, and tmax.
• Set the number of agents W.

Iteration:
while T < Tmax do
    for w = 1 to W do
        Reset global gradient dθ = 0 and dθv = 0.
        Synchronize local parameters θ′ = θ and θv′ = θv.
        Set t0 = t and obtain system state S(t).
        repeat
            Obtain action A(t) according to policy π(A(t)|S(t); θ′).
            Execute action A(t), observe reward R(t), and observe next state S(t + 1).
            t = t + 1.
        until t − t0 == tmax
        if t % tg == 0 then
            R = V(S(t); θv′).
        end if
        for i = t − 1 to t0 do
            R = R(i) + γR.
            Compute policy gradient ∇θ′ fπ(θ′) according to (46).
            Accumulate gradient dθ = dθ + ∇θ′ fπ(θ′).
            Compute value gradient ∇θv′ fv(θv′) according to (48).
            Accumulate gradient dθv = dθv + ∇θv′ fv(θv′).
        end for
        Asynchronously update weight parameters θ and θv according to (50).
    end for
end while

VII. SIMULATION RESULTS AND ANALYSIS

In this section, we evaluate the performance of the proposed algorithm under different parameter settings. Simulation is performed using Tensorflow [46] on a Python-based simulator. To verify the performance of the proposed algorithm, we consider the following schemes: 1) Proposed scheme without local computing (Only-offloading): the computation tasks are only offloaded to MEC servers. 2) Proposed scheme with fixed block size (FBS): the size of the blocks generated by the block producers is the same. 3) Proposed scheme with fixed block interval (FBT): the frequency of generating blocks is the same.

TABLE II: Summary of the Simulation Parameters

Parameters   Definition                      Values
B            Bandwidth                       180 kHz [50]
PT           Maximum power available         1 W [51]
φn           Transmit time                   0.4 s [47]
N0           Noise power density             −174 dBm/Hz [3]
χ            Average transaction size        200 KB
Ṡb           Block size                      8 MB [15]
ε            Tolerable maximum delay         1 s [5]
Ln           Processing density              737.5 cycle/bit [5]
F            Computation capability          2.5 GHz [5]
ϖ2, ϖ1       The weighted values             0.2, 0.0001
ηa, ηc       Learning rate                   10^-3, 10^-2
ξ            Shadowing standard deviation    10 dB [52]

A. Simulation Parameters

We consider a network that consists of an MEC system and a blockchain system, which comprises 30 mobile devices scattered over a 2 × 2 km^2 area [47]. The number of relay nodes within the coverage of each BS is 5. The CPU-cycle frequency of mobile devices and MEC servers is 1 GHz [5] and 2.4 GHz [48], respectively. Other simulation parameters are summarized in Table II, where the path loss models and the shadowing fading are standard settings provided by 3GPP [49]. In our simulations, we use a computer with 6 CPU cores. The CPU is an Intel Core i5-8400 with 32G memory. The software environment we used is Tensorflow 1.10.0 with Python 3.6 on Ubuntu 18.04.2 LTS. For the blockchain system, we used virtualization for distributed ledger technology (vDLT) that we developed, which is a service-oriented blockchain system with virtualization and decoupled management/control and execution. Different block sizes can be dynamically set in vDLT by changing its parameters. For more details, please go to http://vdlt.io/approach.html.

By using Tensorflow's built-in module TensorBoard, we visualize our A3C architecture, as shown in Fig. 3. In Fig. 3, the architecture of the proposed algorithm consists of a global network and six worker agents. We can observe that the proposed algorithm starts with constructing the global network. Then, the parameters in the global network are propagated synchronously to each worker agent. In Fig. 4, we show the internal structure of one of the worker agents. Every worker agent has its own network and environment. By interacting with their own environment, worker agents update the global network parameters.

B. Performance of the Proposed Algorithm

We first show the convergence of the proposed algorithm under different actor learning rates ηa, where the critic's learning rate is set to a fixed value ηc = 10^-1, as shown in Fig. 5. As can be seen from the figure, when the actor's learning rate is large, the proposed algorithm has a fast convergence rate (i.e., ηa = 0.0001). Similarly, Fig. 6 shows the convergence of the proposed algorithm under different critic learning rates ηc, where we fix the actor's learning rate ηa = 10^-4. From


Fig. 3: Visualization of the proposed deep reinforcement learning algorithm using TensorBoard. [Graph: Global_Net connected to six worker agents W_0 to W_5 through sync operations.]

Fig. 4: Visualization of interaction in the worker agent. [Graph of worker W_0: actor and critic networks with a_loss, c_loss, TD_error, and local gradient nodes.]

Fig. 5: The total reward with different learning of the actor network. [Plot: total reward versus episode for ηa = 10^-3, 10^-4, and 10^-5.]

Fig. 6: The total reward with different learning of the critic network. [Plot: total reward versus episode for ηc = 10^-1, 10^-2, and 10^-3.]

the figure, we can observe that when the critic's learning rate ηc = 0.1, the proposed algorithm converges first, followed by ηc = 0.01 and ηc = 0.001.

In Fig. 7, we illustrate how the sum of the power available PT affects the average reward. We can observe that the average reward increases when PT increases, and that the proposed algorithm performs better than the other schemes. From the figure, the average reward of all schemes grows slowly when the sum of the power available, PT, is greater than 0.7 W. The reason is that as the transmit power increases, although the transmit rate increases, the overhead of communication, such as energy consumption, also increases, which can affect the computation rate.

Fig. 8 and Fig. 9 show the impact of the CPU-cycle frequency of the MEC servers sn on the average computation rate and the average transaction throughput, respectively. From Fig. 8, we can observe that the average computation rate of all schemes increases slowly with the increase in the CPU-cycle


Fig. 7: Average reward versus the total power available PT. [Plot: average reward versus the sum of the power available (W) for Proposed, FBS, Only-offloading, and FBT.]

Fig. 8: Average computation rate versus the CPU-cycle frequency sn. [Plot: average computation rate (bit/s) versus the CPU-cycle frequency sn of MEC servers for Proposed, Only-offloading, FBS, and FBT.]

Fig. 9: Average transaction throughput versus the CPU-cycle frequency sn. [Plot: average transaction throughput (TPS) versus the CPU-cycle frequency sn of MEC servers for Proposed, Only-offloading, FBS, and FBT.]

Fig. 10: Average reward versus number of mobile devices N. [Plot: average reward for N = 8, 14, 20, 26, and 32 for Proposed, Only-offloading, FBS, and FBT.]

frequency sn. However, from Fig. 9, it is observed that the average transaction throughput decreases with the increase in the CPU-cycle frequency sn for all schemes. That is because the computing resource of MEC servers is limited. When the MEC server consumes more computing resource to perform the offloading tasks, less computing resource is available to the blockchain system.

Fig. 10 shows the comparison of the average reward versus the number of mobile devices. As can be seen from the figure, as the number of mobile devices increases, the average reward keeps increasing. Due to the joint optimization of offloading decision, the allocation of transmit power, block size, and block interval, the proposed algorithm always benefits compared with other algorithms that only optimize part of the optimization items.

In Fig. 11, we examine the average reward under different maximum block intervals Ṫb. It can be observed that the average reward of all the schemes decreases with the increase in the maximum block interval. That is because the transaction throughput decreases with the increase in the maximum block interval when other parameters are unchanged. To verify the impact of the average transaction size χ on the average reward, we evaluate the performance obtained by the proposed scheme under different average transaction sizes, as shown in Fig. 12. From the figure, we can observe that the average reward for all schemes decreases with increasing average transaction size. The reason is that one block can only contain a small number of transactions when the transactions are large. Furthermore, we can also find that the proposed scheme obtains the highest average reward across the range of average transaction sizes, followed by FBS and Only-offloading, with FBT being the lowest. Similarly, we also evaluate the impact of block size Ṡb on the average reward, as shown in Fig. 13. Observe that the average reward


Fig. 11: Average reward versus maximum block interval Ṫb. [Plot: average reward versus maximum block interval (s) for Proposed, FBS, Only-offloading, and FBT.]

Fig. 12: Average reward versus transaction size χ. [Plot: average reward versus the average transaction size (KB) for Proposed, Only-offloading, FBS, and FBT.]

Fig. 13: Average reward versus block size limit Ṡb. [Plot: average reward versus block size limit (MB) for Proposed, Only-offloading, FBS, and FBT.]

Fig. 14: The computing resource of the randomly chosen blockchain nodes at the randomly selected 90 episode for the proposed algorithm. [Plot: the computing resource available Tn(t) versus episode for B = 150 kHz, 300 kHz, and 450 kHz.]

slowly increases with the increase in block size except for FBS. That is because the LTF constraint limits the maximum number of transactions in a block. Another observation is that the proposed scheme always performs the best, followed by Only-offloading and FBT. Moreover, we randomly choose a blockchain node in the blockchain system to display its computing resource at 90 randomly selected episodes for B = 150 kHz, B = 300 kHz, and B = 450 kHz in Fig. 14. From the figure, the queue length of the blockchain nodes at different episodes is finite and varies because the transmit rate is different. Besides, we can observe that the computing resource available at the blockchain node decreases with the increase in bandwidth B. That is because the transmit rate increases with the increase in bandwidth.

In Figs. 15 and 16, we show the impact of the CPU-cycle frequency sn on the average reward, computation rate of the MEC system, and throughput of the blockchain system, respectively. From Fig. 15, it is in accordance with our intuition that the reward improves for a given ϖ1 as the CPU-cycle frequency sn increases. Besides, we can observe that the average reward increases as ϖ1 increases. That is because the performance of the MEC system is mainly affected by changes in the CPU-cycle frequency, and the performance of the blockchain system is almost constant, as shown in Fig. 16. Then we can achieve the tradeoff between the performance of the MEC system and the performance of the blockchain system based on Fig. 16.

VIII. CONCLUSIONS AND FUTURE WORK

In this paper, we studied a blockchain-enabled MEC system and, considering the trust value of nodes (i.e., relay nodes and consensus nodes), investigated the problem of maximizing the computation rate of the MEC system and the transaction throughput of the blockchain system. To satisfy the performance requirements of the system, we jointly optimized cooperative offloading decision, power allocation, block size, and block interval. Due to the dynamic characteristics of the wireless channels and available resources, the formulated optimization


Fig. 15: The impact of different parameter settings on average reward. [Plot: average reward for sn = 1 GHz, 2 GHz, and 3 GHz with ϖ1 = 0.3 and ϖ1 = 0.7.]

Fig. 16: The tradeoff under different parameter setting of ϖ1 and sn. [Plots: computation rate (bit/s) and throughput (TPS) for sn = 1 GHz, 2 GHz, and 3 GHz with ϖ1 = 0.3 and ϖ1 = 0.7.]

problem was modeled as an MDP. An A3C algorithm was [11] N. Zhao, H. Wu, and Y. Chen, “Coalition game-based computation
developed to cope with the MDP problem, which can stably resource allocation for wireless blockchain networks,” IEEE Internet
of Things J., pp. 1–1, 2019.
train neural networks. Simulation results have shown the [12] T. Chen and G. B. Giannakis, “Bandit convex optimization for scalable
efficiency of our proposed algorithm, which has fast con- and dynamic iot management,” IEEE Internet of Things Journal, vol. 6,
vergence and better performance than other algorithms under no. 1, pp. 1276–1286, 2018.
[13] M. H. Amini, H. Arasteh, and P. Siano, “Sustainable smart cities through
different parameter settings. Meanwhile, we can also observe the lens of complex interdependent infrastructures: Panorama and state-
that the algorithm can achieve the optimal trade-off between of-the-art,” in Sustainable Interdependent Networks II, 2019, pp. 45–68.
the computation rate of the MEC system and transaction [14] J. Kang, Z. Xiong, D. Niyato, D. Ye, D. I. Kim, and J. Zhao, “Toward
secure blockchain-enabled internet of vehicles: Optimizing consensus
throughput in the blockchain system. In future work, we will management using reputation and contract theory,” IEEE Trans. Veh.
study interference management in blockchain-enabled MEC Technol., vol. 68, no. 3, pp. 2906–2920, Mar. 2019.
systems. [15] M. Liu, F. Yu, Y. Teng, V. Leung, and M. Song, “Performance optimiza-
tion for blockchain-enabled industrial internet of things (IIoT) systems:
A deep reinforcement learning approach,” IEEE Trans. Ind. Info., pp.
R EFERENCES 1–1, 2019.
[16] C. J. Watkins and P. Dayan, “Q-learning,” Machine learning, vol. 8, no.
[1] M. Patel, B. Naughton, C. Chan, N. Sprecher, S. Abeta, A. Neal et al., 3-4, pp. 279–292, 1992.
“Mobile-edge computing introductory technical white paper,” White [17] G. A. Rummery and M. Niranjan, On-line Q-learning using connec-
paper, mobile-edge computing (MEC) industry initiative, pp. 1089–7801, tionist systems. University of Cambridge, Department of Engineering
2014. Cambridge, England, 1994, vol. 37.
[2] J. Kwak, Y. Kim, J. Lee, and S. Chong, “Dream: Dynamic resource and [18] F. Guo, F. R. Yu, H. Zhang, H. Ji, M. Liu, and V. C. M. Leung, “Adaptive
task allocation for energy minimization in mobile cloud systems,” IEEE resource allocation in future wireless networks with blockchain and
J. Sel. Areas Commun., vol. 33, no. 12, pp. 2510–2523, Dec. 2015. mobile edge computing,” IEEE Trans. Wireless Commun., Accepted,
[3] J. Du, L. Zhao, X. Chu, F. R. Yu, J. Feng, and C. I, “Enabling low-latency 2019.
applications in LTE-A based mixed fog/cloud computing systems,” IEEE [19] Z. Hong, H. Huang, S. Guo, W. Chen, and Z. Zheng, “Qos-aware
Trans. Veh. Technol., vol. 68, no. 2, pp. 1757–1771, Feb. 2019. cooperative computation offloading for robot swarms in cloud robotics,”
[4] J. Feng, Q. Pei, F. R. Yu, X. Chu, and B. Shang, “Computation IEEE Trans. Veh. Technol., vol. 68, no. 4, pp. 4027–4041, Apr. 2019.
offloading and resource allocation for wireless powered mobile edge [20] X. Cao, F. Wang, J. Xu, R. Zhang, and S. Cui, “Joint computation and
computing with latency constraint,” IEEE Wireless Communications communication cooperation for energy-efficient mobile edge comput-
Letters, accepted, 2019. ing,” IEEE Internet of Things J., vol. 6, no. 3, pp. 4188–4200, Jun.
[5] Y. Mao, J. Zhang, S. H. Song, and K. B. Letaief, “Stochastic joint 2019.
radio and computational resource management for multi-user mobile- [21] S. Guo, J. Liu, Y. Yang, B. Xiao, and Z. Li, “Energy-efficient dynamic
edge computing systems,” IEEE Trans. Wireless Commun., vol. 16, no. 9, computation offloading and cooperative task scheduling in mobile cloud
pp. 5994–6009, Sept. 2017. computing,” IEEE Trans. Mobile Comput., vol. 18, no. 2, pp. 319–333,
[6] J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading Feb. 2019.
and resource allocation for cloud assisted mobile edge computing in [22] R. Yang, F. R. Yu, P. Si, Z. Yang, and Y. Zhang, “Integrated blockchain
vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944– and edge computing systems: A survey, some research issues and
7956, Aug. 2019. challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 2,
[7] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, “Computation rate maximization pp. 1508–1532, 2019.
in UAV-enabled wireless-powered mobile-edge computing systems,” [23] J. Kang, R. Yu, X. Huang, M. Wu, S. Maharjan, S. Xie, and Y. Zhang,
IEEE J. Sel. Areas Commun., vol. 36, no. 9, pp. 1927–1941, Sep. 2018. “Blockchain for secure and efficient data sharing in vehicular edge
[8] Z. Xiong, Y. Zhang, D. Niyato, P. Wang, and Z. Han, “When mobile computing and networks,” IEEE Internet of Things J., accepted, 2019.
blockchain meets edge computing,” IEEE Comm. Mag., vol. 56, no. 8, [24] J. Xu, S. Wang, B. Bhargava, and F. Yang, “A blockchain-enabled
pp. 33–39, Aug. 2018. trustless crowd-intelligence ecosystem on mobile edge computing,”
[9] D. Miller, “Blockchain and the Internet of things in the industrial sector,” IEEE Trans. Ind. Infor., pp. 1–1, accept, 2019.
IT Professional, vol. 20, no. 3, pp. 15–18, May 2018. [25] X. Qiu, L. Liu, W. Chen, Z. Hong, and Z. Zheng, “Online deep rein-
[10] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Distributed forcement learning for computation offloading in blockchain-empowered
resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Veh. Technol., pp. 1–1, 2019.
mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, [26] D. C. Nguyen, P. N. Pathirana, M. Ding, and A. Seneviratne, “Secure
pp. 695–708, Jan. 2019. computation offloading in blockchain based iot networks with deep
reinforcement learning,” arXiv preprint arXiv:1908.07466, 2019.


[27] M. S. Alam, J. W. Mark, and X. S. Shen, “Relay selection and resource [49] “3GPP TR 36.839 v11.0.0,” 3GPP, Tech. Rep., Sept. 2012.
allocation for multi-user cooperative ofdma networks,” IEEE Trans. [50] “Evolved universal terrestrial radio access (E-UTRA): Physical channels
Wireless Commun., vol. 12, no. 5, pp. 2193–2205, May 2013. and modulation,” 3GPP TS 36.211 V8.6.0 Std., Mar., 2009.
[28] Y. Wang, M. Sheng, X. Wang, L. Wang, and J. Li, “Mobile-edge com- [51] P. Guo, W. Hou, L. Guo, W. Sun, C. Liu, H. Bao, L. H. K. Duong, and
puting: Partial computation offloading using dynamic voltage scaling,” W. Liu, “Fault-tolerant routing mechanism in 3d optical network-on-chip
IEEE Trans. Commun., vol. 64, no. 10, pp. 4268–4282, Oct. 2016. based on node reuse,” IEEE Transactions on Parallel and Distributed
[29] L. Liu, C. Chen, Q. Pei, S. Maharjan, and Y. Zhang, “Vehicular edge Systems, Accepted, 2019.
computing and networking: A survey,” arXiv preprint arXiv:1908.06849, [52] D. Zhai, R. Zhang, Y. Wang, H. Sun, L. Cai, and Z. Ding, “Joint
2019. user pairing, mode selection, and power control for d2d-capable cellular
[30] G. Han, J. Jiang, L. Shu, and M. Guizani, “An attack-resistant trust networks enhanced by nonorthogonal multiple access,” IEEE Internet of
model based on multidimensional trust metrics in underwater acoustic Things J., vol. 6, no. 5, pp. 8919–8932, Oct. 2019.
sensor network,” IEEE Trans. Mobile Comput., vol. 14, no. 12, pp. 2447–
2459, Dec. 2015.
[31] Z. Yao, D. Kim, and Y. Doh, “Plus: Parameterized and localized
trust management scheme for sensor networks security,” in 2006 IEEE
Jie Feng is currently pursuing the Ph.D. degree in Communication and Information System at Xidian University, Xi'an, China. She has also been a visiting Ph.D. student at Carleton University since January 2019. Her current research interests include mobile edge computing, blockchain, deep reinforcement learning, device-to-device communication, resource allocation, convex optimization, and stochastic network optimization.

F. Richard Yu (S'00–M'04–SM'08–F'18) received the Ph.D. degree in electrical engineering from the University of British Columbia (UBC) in 2003. From 2002 to 2006, he was with Ericsson (in Lund, Sweden) and a start-up in California, USA. He joined Carleton University in 2007, where he is currently a Professor. He received the IEEE Outstanding Service Award in 2016, the IEEE Outstanding Leadership Award in 2013, the Carleton Research Achievement Award in 2012, the Ontario Early Researcher Award (formerly Premier's Research Excellence Award) in 2011, the Excellent Contribution Award at IEEE/IFIP TrustCom 2010, the Leadership Opportunity Fund Award from Canada Foundation of Innovation in 2009, and the Best Paper Awards at IEEE ICNC 2018, VTC 2017 Spring, ICC 2014, Globecom 2012, IEEE/IFIP TrustCom 2009, and Int'l Conference on Networking 2005. His research interests include wireless cyber-physical systems, connected/autonomous vehicles, security, distributed ledger technology, and deep learning.
He serves on the editorial boards of several journals, including Co-Editor-in-Chief for Ad Hoc & Sensor Wireless Networks, Lead Series Editor for IEEE Transactions on Vehicular Technology, IEEE Transactions on Green Communications and Networking, and IEEE Communications Surveys & Tutorials. He has served as the Technical Program Committee (TPC) Co-Chair of numerous conferences. Dr. Yu is a registered Professional Engineer in the province of Ontario, Canada, a Fellow of the Institution of Engineering and Technology (IET), and a Fellow of the IEEE. He is a Distinguished Lecturer, the Vice President (Membership), and an elected member of the Board of Governors (BoG) of the IEEE Vehicular Technology Society.

Qingqi Pei received his B.S., M.S., and Ph.D. degrees in Computer Science and Cryptography from Xidian University in 1998, 2005, and 2008, respectively. He is now a Professor and a member of the State Key Laboratory of Integrated Services Networks. He is also a Professional Member of ACM, a Senior Member of IEEE, and a Senior Member of the Chinese Institute of Electronics and the China Computer Federation. His research interests focus on privacy preserving, blockchain, and edge computing security.


Xiaoli Chu (M'06–SM'15) received the B.Eng. degree in electronic and information engineering from Xi'an Jiaotong University, Xi'an, China, in 2001, and the Ph.D. degree in electrical and electronic engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2005. She is a Senior Lecturer with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield, U.K. From September 2005 to April 2012, she was with the Centre for Telecommunications Research, King's College London. She has published more than 100 peer-reviewed journal and conference papers. She is the Lead Editor/author of the book Heterogeneous Cellular Networks: Theory, Simulation and Deployment (Cambridge University Press, 2013) and the book 4G Femtocells: Resource Allocation and Interference Management (Springer, 2013).

Jianbo Du received the B.S. degree and M.S. degree from Xi'an University of Posts and Telecommunications in 2007 and 2013, respectively, and the Ph.D. in communication and information systems at Xidian University, Xi'an, Shaanxi, China, in 2018. She is now a teacher with the Department of Communication and Information Engineering, Xi'an University of Posts and Telecommunications. Her research interests include mobile edge computing, resource management, NOMA, deep reinforcement learning, convex optimization, stochastic network optimization, and heuristic algorithms and their applications in wireless communications.

Li Zhu received the Ph.D. degree in traffic control and information engineering from Beijing Jiaotong University, Beijing, China, in 2012. He is currently a Faculty Member at Beijing Jiaotong University and a Visiting Scholar at Carleton University, Ottawa, ON, Canada, and The University of British Columbia, Vancouver, BC, Canada. His research interests include intelligent transportation systems, train-ground communication technology in communication-based train ground communication systems, and cross-layer design in train-ground communication systems.
