This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSAC.2021.3088624, IEEE Journal on Selected Areas in Communications

Proactive UAV Network Slicing for URLLC and Mobile Broadband Service Multiplexing
Peng Yang, Member, IEEE, Xing Xi, Graduate Student Member, IEEE, Kun Guo, Member, IEEE,
Tony Q. S. Quek, Fellow, IEEE, Jingxuan Chen, and Xianbin Cao, Senior Member, IEEE

Abstract—The unmanned aerial vehicle (UAV) network, regarded as a significant component of 5G and emerging 6G wireless networks, is expected to accommodate multiple types of service requirements simultaneously. However, converging different types of services onto a common UAV network without deploying an individual network solution for each type of service is challenging. We tackle this challenge in this paper by slicing the UAV network, i.e., creating logical UAV networks customized for specific requirements. To this end, we formulate the UAV network slicing problem as a sequential decision problem to provide mobile broadband (MBB) services for ground mobile users while satisfying the ultra-reliable and low-latency requirements of UAV control and non-payload signal delivery. This problem, however, is difficult to solve directly, mainly due to its sequence-dependent characteristic and, in practice, the lack of accurate location information of mobile users and of accurate and tractable channel gain models. To overcome these difficulties, we propose a novel solution approach based on learning and optimization methods. Particularly, we develop a distributed learning method to predict mobile users' locations, where the partial user location information stored on each UAV is utilized to train user location prediction networks. To obtain accurate channel gain models, we design deep neural networks (DNNs) that are trained by signal measurements at each UAV. To cope with the challenging sequence-dependent characteristic of the problem, we develop a Lyapunov-based optimization framework with provable performance guarantees to decompose the original problem into a sequence of separate optimization subproblems based on the learned results. Finally, an iterative optimization scheme combined with a successive convex approximation technique is exploited to solve these subproblems. Simulation results demonstrate the accuracy of the learning methods as well as the effectiveness of the Lyapunov-based optimization framework.

Index Terms—UAV network slicing, URLLC, mobile broadband service, learning and optimization

Manuscript received October 21, 2020; revised February 25, 2021; accepted April 12, 2021. This research is supported by the National Research Foundation, Singapore and Infocomm Media Development Authority under its Future Communications Research & Development Programme, and MOE ARF Tier 2 under Grant T2EP20120-0006. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore and Infocomm Media Development Authority. (Corresponding author: Peng Yang.)
P. Yang, K. Guo, and T. Q. S. Quek are with the Information Systems Technology and Design, Singapore University of Technology and Design, 487372 Singapore.
X. Xi, J. Chen, and X. Cao are with the School of Electronic and Information Engineering, Beihang University, Beijing 100083, China, and also with the Key Laboratory of Advanced Technology, Near Space Information System (Beihang University), Ministry of Industry and Information Technology of China, Beijing 100083, China.
This paper was presented in part at the IEEE Global Communications Conference 2020 [1].

I. INTRODUCTION

5G and emerging 6G wireless networks are expected to be highly agile and resilient and to support fast communication service recovery [2] in case of network failure (e.g., infrastructure damage, flash crowd areas and remote areas). To achieve such an ambitious goal, the unmanned aerial vehicle (UAV) network has been considered a significant component of 5G and 6G networks owing to its unique rapid-response ability and reduced vulnerability to natural disasters [3]. Meanwhile, 5G and 6G networks are expected to accommodate different service requirements concerning communication latency, network throughput and communication reliability. However, it is quite challenging to design a UAV network that satisfies these diverse service requirements simultaneously, as UAVs have stringent size, weight and power consumption requirements. Fortunately, a UAV network can benefit from the network slicing characteristic of 5G and 6G networks by enabling virtually isolated on-board processing systems. In this way, multiple services can be converged onto a common UAV infrastructure, and the number of hardware components on UAVs can also be minimized, providing novel on-board system realizations [4].

A. Prior works

Recently, much research interest [4]–[8] has been paid to UAV network slicing due to the significant role of the UAV network in 5G and 6G and the urgent requirement of improving the cost efficiency of providing diverse communication services by deploying UAVs. For example, the work in [4] evaluated the performance of network slicing for UAV communications and demonstrated that slicing was effective in terms of UAV payload slice and UAV control slice isolation. A network slicing demo over 5G radio for UAV communications was also presented in [5]. In this demo, the network was virtually sliced into two slices: one slice was created for sending commands to a UAV, and the other was created to transmit payload from the UAV to a ground user.

As control information delivery has the stringent requirement of ultra-reliable and low-latency communications (URLLC) [9] and payload needs to be transmitted over high-speed broadband links [10], two types of slices, i.e., URLLC slices for control information delivery and mobile broadband (MBB) slices for payload transmission, can be envisioned [4] in the UAV network. During the past few years, a rich body of works [1], [9]–[19] on URLLC-enabled and MBB-enabled UAV networks has been published.

0733-8716 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: National University of Singapore. Downloaded on July 06,2021 at 21:22:56 UTC from IEEE Xplore. Restrictions apply.

In the research domain of URLLC-enabled UAV networks, many works [9], [11]–[16] studied how to guarantee the performance of URLLC links for control and/or non-payload information transmission. For instance, the average achievable data rate of the URLLC link between a ground control station and a UAV under a three-dimensional (3D) channel was studied in [9], where the URLLC link was established to deliver control information for UAV collision avoidance. In [15], the authors studied the uplink power problem for enabling green URLLC in UAV-assisted Internet of Things (IoT) networks. Besides, the joint design of passive beamforming, blocklength allocation and UAV positioning for enabling URLLC services facilitated by a UAV decode-and-forward (DF) relay and a reflective intelligent surface (RIS) was proposed in [16]. In the research domain of MBB-enabled UAV networks, most of the existing literature [1], [10], [17]–[19] focused on the movement control and/or resource (e.g., transmit power, bandwidth) allocation of the UAV network towards MBB service coverage. For example, in [1], we proposed to provide energy-efficient and fair MBB services for ground stationary users by jointly optimizing UAV trajectories, UAV transmit power, and the acceptance of users' access requests. Nevertheless, the issue of achieving ultra-reliable and low-latency control information delivery was not investigated in [1]. Joint UAV trajectory planning and transmit power allocation for a UAV network was researched in [10] to extend the communication coverage for two disconnected, distant ground vehicles. Additionally, the work in [17] investigated the UAV trajectory design and bandwidth allocation problem considering both the UAV's energy consumption and the service fairness among ground mobile users, and a deep reinforcement learning based algorithm was used to solve this problem.

B. Motivations and contributions

However, to practically realize the vision of deploying the UAV network, the communication problem in the UAV network should be rationally formulated and effectively solved. To this aim, UAVs need to obtain location information of ground mobile users and require analytical channel gain models. For these reasons, most of the above works [1], [9]–[14], [17]–[19] assumed that users' locations were known and adopted either simplified channel gain models, such as isotropic antenna radiation and the free-space propagation model [20], or complicated channel gain models, such as the probabilistic line-of-sight (LoS) model [21] and angle-dependent channel parameters [22]. The simplified models, however, may be inaccurate in practical environments, as they bear no association with the local environment where UAVs are actually deployed. Even sophisticated statistical models cannot guarantee model accuracy in the local environment, because they can only simulate the channel gain in an average sense.

To tackle the user location-related issue, the work in [23] proposed to use a learning method to predict ground users' locations. Based on the predicted locations, UAVs were deployed to provide MBB services for users. However, this work still adopted the probabilistic LoS model to calculate UAV-to-ground-user (UtG) channel gain values. To address the channel gain model-related issue, a recent work in [24] proposed to use a deep neural network (DNN), trained on raw signal measurements, to obtain numerical channel gain estimation values. However, this work did not derive analytically tractable channel gain models; as a result, no theoretical analysis of the communication problem in the UAV network could be conducted.

Additionally, slicing the UAV network must tackle the non-trivial mismatch between slice supply and slice demand. The creation and configuration of UAV slices (slice supply), which require protocol configuration and resource orchestration and release, are time-consuming; however, services (especially URLLC services) cannot tolerate the delay of creating and configuring the slices.

To overcome the above issues, we propose in this paper to proactively slice the UAV network for URLLC and MBB service multiplexing. Our main contributions are summarized as follows:
• Owing to the mobility of users and the limited UAV communication coverage, a time-varying UAV network needs to be operated to improve mobile users' quality of service (QoS). Thus, we formulate the UAV network slicing problem as a sequential decision problem to provide energy-efficient and fair MBB services for ground mobile users while satisfying the ultra-reliable and low-latency requirements of UAV control and non-payload information transmission. This problem, however, is highly challenging to solve via standard optimization methods, mainly due to the lack of accurate users' locations and channel gain models, as well as its sequence-dependent characteristic.
• A distributed learning method is exploited to predict users' locations, as users' historical locations are scattered among UAVs and a UAV cannot rely solely on partial location information to predict users' locations. Besides, we propose to mitigate the mismatch between slice supply and slice demand by proactively slicing the UAV network. Although the creation and configuration of slices are time-consuming, they can be performed proactively, i.e., in advance, based on users' future locations.
• We construct accurate and analytically tractable channel gain models based on the estimation results of DNNs. This is because actual channel gain values depend on mobile users and flying UAVs in a rather sophisticated manner, and DNNs have powerful non-linear function approximation ability.
• Inspired by the superiority of the Lyapunov approach in tackling sequential decision problems, we propose to decompose the formulated problem into multiple repeated optimization subproblems based on the learned results via a Lyapunov-based optimization framework with provable performance guarantees.
• The subproblems are shown to be mixed-integer non-convex and are therefore difficult to solve. To make them tractable, an iterative optimization scheme and a successive convex approximation (SCA) technique are exploited to handle the mixed-integer and non-convexity
properties, respectively.
• Finally, we conduct simulations to verify the accuracy of the learning methods as well as the effectiveness of the Lyapunov-based optimization framework.

[Fig. 1 image not reproduced. Recoverable labels: network provider (BS, UAV relays, edge cloud, storage); resource manage and control (protocol stack, computing, resource management and orchestration); virtualized network slice manager (dedicated and common slice control functions, slice admission control, slice blueprints describing service flow and structure for URLLC slices and MBB slices); UAV communication scenario.]
Fig. 1. A communication coverage scenario of a UAV network in an urban environment and the UAV network slicing system architecture.

II. SYSTEM MODEL

As shown in Fig. 1, this paper considers a communication coverage scenario in which a UAV network is deployed in an urban environment¹. This scenario mainly includes a BS with an array antenna configuration, $J$ single-antenna UAVs, and $N^e$ pedestrians (also called ground mobile users) walking in a two-dimensional (2D) urban area of interest $\mathbb{R}^2$. The BS is utilized to transmit ultra-reliable and low-latency control signals (e.g., UAV trajectories) to control the movement of UAVs and non-payload signals (e.g., UAV transmit power) to configure the UAV network through uplink wireless fading channels². The UAVs, the set of which is denoted by $\mathcal{J} = \{1, 2, \dots, J\}$, act as flying relays and are deployed to perform a communication task, i.e., providing energy-efficient and fair service coverage for ground mobile users via downlink wireless fading channels. Besides, owing to the limited number of UAVs and each UAV's restricted communication coverage, the movement of UAVs is continuously controlled to complete the communication task. To theoretically model the communication task, we assume that the time domain in the considered scenario is discretized into a sequence of time slots $t = 1, 2, \dots$, and consider that the task may last long enough, even though the working time of UAVs is limited in practice, mainly due to the energy constraint.

¹ The proposed algorithm can be directly applied in many other types of environment even though the urban environment is considered here. Besides, although BSs are densely deployed in urban environments, it is still necessary to deploy a UAV network to recover partial communication coverage in emergency communication scenarios like flash crowd areas and terrestrial infrastructure malfunction areas.
² Since cooperation among UAVs can improve the UAV network resource utilization, we study the case of controlling UAVs in a centralized manner. The communication scenario where BSs and UAVs cooperate to serve ground users is left for future research due to space limitations.

A. UAV network slicing system architecture

In the above scenario, the UAV network has to simultaneously support ultra-reliable and low-latency uplinks for UAV control and non-payload signal transmission and energy-efficient downlinks for payload transmission. To achieve this goal, we propose to virtually isolate UAV network resources and functions customized for specific requirements using the concept of network slicing. Particularly, the UAV network is logically divided into two types of slices, i.e., URLLC slices and MBB slices³. A UAV network slicing system architecture is shown in Fig. 1. The system is composed of three major components: network provider (NP), resource manage and control (RMC), and virtualized network slice manager (VNSM).

Following the 3GPP management reference framework [27], we consider the UAVs as a network infrastructure resource domain. NP owns the UAVs as well as the BS according to the business relationships and stakeholder roles defined in [28]. Each UAV can be considered as a node of the network function virtualization (NFV) infrastructure. Based on the virtualized network function (VNF) and/or the physical network function (PNF) [29], user-specific radio resource control is activated at each UAV. RMC is responsible for configuring radio access network protocol stacks according to the service requirements of slices. For example, for slices with high-reliability and low-latency requirements, lower frame error rates, reduced round trip time (RTT), shortened transmission time interval (TTI), and/or multi-point diversity schemes should be applied. RMC is also in charge of orchestrating and releasing resources on request for all UAVs by exposing the northbound interface to the VNSM. VNSM is made available by NP by logically abstracting the physical infrastructure resources as virtual computing, storage, and networking resources. Besides, VNSM, operating on top of the physical and/or virtualized infrastructure, is responsible for creating, activating, maintaining, configuring, and releasing slices during their life cycle. Through the dedicated/common slice control function, VNSM generates a network slice blueprint (i.e., a template) for each accepted network slice. The slice blueprint describes the structure, configuration, control signals, and service flows for instantiating and controlling the network slice instance of a type of service during its life cycle. The slice instance includes a set of network functions and resources to meet the end-to-end service requirements. However, the above slice creation and configuration processes are time-consuming.

³ In practice, it may be difficult to provide enhanced mobile broadband (eMBB) services for many ground users by deploying a UAV. This is because massive multiple-input multiple-output (MIMO) is seen as a key technology for enabling eMBB services [25], [26]. However, it is difficult to install massive MIMO equipment on a UAV due to its stringent size, weight, and power constraints [3]. We therefore do not investigate the eMBB use case.

B. URLLC slice model

Before creating or activating a URLLC slice, the RMC must receive and admit a URLLC slice request. According to the network slice concept (from the QoS requirement viewpoint), the URLLC slice request is defined as follows.

Definition 1. A URLLC slice request is characterized as a tuple $\{\tau^{\rm req}_{s,u}, \varepsilon^{\rm req}_{s,u}\}$ for a slice $s \in \mathcal{S}^u = \{1, 2, \dots, |\mathcal{S}^u|\}$, where $\tau^{\rm req}_{s,u}$ represents the transmission latency requirement of a data packet of control and non-payload signals in $s$, $\varepsilon^{\rm req}_{s,u}$ denotes the codeword⁴ decoding error probability of the packet, and $|\mathcal{S}^u|$ denotes the number of URLLC slices.

⁴ A URLLC packet will usually be coded before transmission, and the generated codeword will be transmitted over the air interface such that the transmission reliability can be improved.

We assume that all URLLC slice requests can be accepted by the RMC. This assumption is reasonable because URLLC packets should be served immediately upon arriving in the UAV network owing to the ultra-low-latency requirement. Even if all URLLC slice requests are accepted, URLLC slices will not occupy many system resources, as the number of UAVs is relatively small. Besides, UAVs that have the same reliability and low-latency requirement are served by the same URLLC slice, where $\mathcal{N}^u_s = \{1, 2, \dots, N^u_s\}$ is the set of UAVs served by URLLC slice $s$, with $N^u_s$ being the number of UAVs served by slice $s \in \mathcal{S}^u$.

C. MBB slice model

Similarly, an MBB slice can only be created after the MBB slice request is accepted, and we define the MBB slice request as follows.

Definition 2. An MBB slice request can be characterized as a tuple $\{I^e_s, C^{\rm th}_s\}$ for any slice $s \in \mathcal{S}^e = \{1, 2, \dots, |\mathcal{S}^e|\}$, where $I^e_s$ is the number of served ground users in $s$, $C^{\rm th}_s$ is the data rate requirement of each served user in $s$, and $|\mathcal{S}^e|$ is the number of MBB slices.

As an MBB slice is created and configured to serve users with the same data rate requirement, users with different data rate requirements will be served by different slices. In this paper, ground users are partitioned into $|\mathcal{S}^e|$ groups according to their data rate requirements. Let $\mathcal{N}^e_s = \{1, 2, \dots, N^e_s\}$ denote the set of users with the data rate requirement $C^{\rm th}_s$ for all $s \in \mathcal{S}^e$. However, owing to the resource limitation of the UAV network and since $N^e = \sum_{s \in \mathcal{S}^e} N^e_s$ is much greater than $J$, we do not assume that all MBB slice requests are accepted. To this end, for UAV $j \in \mathcal{J}$ and user $i \in \mathcal{N}^e_s$, we use an indicator variable $a_{ij,s}(t) \in \mathcal{A}(t)$ to indicate whether the MBB slice request of creating an MBB slice $s$ to serve user $i$ using UAV $j$ at time slot $t$ is accepted, where $\mathcal{A}(t)$ is the acceptance set of MBB slice requests. We let $a_{ij,s}(t) = 1$ if the slice request is accepted; otherwise, $a_{ij,s}(t) = 0$.

D. Channel gain models

The solution of the UAV movement control problem requires theoretical expressions of the channel gain models. In the considered scenario, we focus on modelling the BS-to-UAV (BtU) and UtG channel gains.

1) BtU channel gain model: As mentioned above, some existing simplified and complicated statistical channel gain models [20]–[22] are inaccurate in the local environment where UAVs are actually deployed. To tackle this issue, for the BS and a typical UAV $j \in \mathcal{J}$ in URLLC slice $s \in \mathcal{S}^u$, we model the DNN-based BtU channel gain at time slot $t$, denoted by $h^B_{j,s}(t)$, as follows:

$$h^B_{j,s}(t) = G^B_{j,s}\big(\mathbf{x}^{3D}_B, \mathbf{v}^{3D}_j(t)\big)\, G_r\, \bar{g}^B_{j,s}\big(\mathbf{x}^{3D}_B, \mathbf{v}^{3D}_j(t)\big)\, f^B_{j,s}(t) \triangleq \frac{\theta^B_j(t)}{D^2_{j;B}(t)}, \tag{1}$$

where $G^B_{j,s}(\mathbf{x}^{3D}_B, \mathbf{v}^{3D}_j(t))$ denotes the transmitting antenna gain, which relies on BS antenna configurations such as the number of elements in the antenna array and the placement of these elements. The azimuth and elevation angles of the BS antenna, uniquely determined by the BS and UAV locations, also affect the transmitting antenna gain. $G_r$ is the receiving antenna gain, and $\mathbf{x}^{3D}_B = [x_B, y_B, g_B]^T$ is the 3D coordinate of the BS, with $g_B$ being the deployment altitude of the BS. Besides, $\bar{g}^B_{j,s}(\mathbf{x}^{3D}_B, \mathbf{v}^{3D}_j(t))$ represents the path loss between the BS and UAV $j$ in $s$ at $t$, and $f^B_{j,s}(t)$ is a random variable denoting the small-scale fading. $\theta^B_j(t) := \mu(s_{ul}(t)|\theta^\mu_j(t))$, where $\mu(s_{ul}(t)|\theta^\mu_j(t))$ is a DNN with $s_{ul}(t)$ being the DNN input and $\theta^\mu_j(t)$ being the DNN parameters, is a channel gain coefficient that will be obtained based on the DNN described in subsection V-A. $D_{j;B}(t) = \|\mathbf{v}^{3D}_j(t) - \mathbf{x}^{3D}_B\|_2$ is the distance between UAV $j$ and the BS at $t$.

The advantages of (1) are twofold: the obtained channel gain reflects the actual local environment parameters, and the theoretical expression of the channel gain is not complicated, which makes the UAV communication problem analytically tractable.

2) UtG channel gain model: Similar to the definition of the BtU channel gain, for a typical UAV $j \in \mathcal{J}$ and a ground user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, we model the DNN-based UtG channel gain at time slot $t$ as follows:

$$h_{ij,s}(t) = G_j\, G_r\, \bar{g}_{ij,s}\big(\mathbf{x}^{3D}_{i,s}(t), \mathbf{v}^{3D}_j(t)\big)\, f_{ij,s}(t) \triangleq \frac{\theta_{ij,s}(t)}{D^2_{ij,s}(t)}, \tag{2}$$

where $G_j$ denotes the transmitting antenna gain of UAV $j$, $\bar{g}_{ij,s}(\mathbf{x}^{3D}_{i,s}(t), \mathbf{v}^{3D}_j(t))$ represents the path loss between UAV $j$ and user $i$ at time slot $t$, and $f_{ij,s}(t)$ is a random variable accounting for the small-scale fading. $\mathbf{x}^{3D}_{i,s}(t) = [x_{i,s}(t), y_{i,s}(t), g_{i,s}]^T$ represents the 3D coordinates of user $i$, with $g_{i,s}$ being the height of user $i$. $\mathbf{v}^{3D}_j(t) = [x_j(t), y_j(t), g_j(t)]^T$ is the 3D coordinates of UAV $j$ at $t$, with $g_j(t)$ being the deployment altitude of UAV $j$. $\theta_{ij,s}(t) := Q(s_{dl}(t)|\theta^Q_j(t))$, where $Q(s_{dl}(t)|\theta^Q_j(t))$ denotes a DNN with $s_{dl}(t)$ being the DNN input and $\theta^Q_j(t)$ being the DNN parameters, is the channel gain coefficient that will be determined based on the DNN described in subsection V-B. $D_{ij,s}(t) = \|\mathbf{v}^{3D}_j(t) - \mathbf{x}^{3D}_{i,s}(t)\|_2$ is the distance between UAV $j$ and user $i$ at $t$.

Additionally, we denote $\mathbf{x}_{i,s}(t) = [x_{i,s}(t), y_{i,s}(t)]^T$ as the horizontal location of user $i$ at $t$ and denote the horizontal location of the BS by $\mathbf{x}_B = [x_B, y_B]^T$. In this paper, we consider the case where all UAVs are deployed at the same fixed altitude and leave the movement control problem of UAVs in 3D space for future research. The horizontal location of UAV $j$ is denoted by $\mathbf{v}_j(t) = [x_j(t), y_j(t)]^T \in \mathcal{X}(t)$, where $\mathcal{X}(t)$ is the set of UAVs' horizontal locations at $t$.
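Both (1) and (2) collapse the channel gain into a coefficient divided by the squared 3D distance between transmitter and receiver. The Python sketch below evaluates that common form. It is a minimal illustration under stated assumptions: the coefficient θ is treated as an already-available number (in the paper it is the output of a trained DNN, which is not modelled here), and the helper names `distance_3d` and `dnn_channel_gain`, the positions, and the coefficient value are all hypothetical.

```python
import math
from typing import Sequence


def distance_3d(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance ||a - b||_2 between two 3D points [x, y, g]."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dnn_channel_gain(theta: float, tx_3d: Sequence[float], rx_3d: Sequence[float]) -> float:
    """Channel gain theta / D^2, the common form of (1) and (2).

    `theta` stands in for the DNN-predicted coefficient (theta_j^B(t) in (1),
    theta_ij,s(t) in (2)); in this sketch it is simply passed in as a number.
    """
    d = distance_3d(tx_3d, rx_3d)
    if d <= 0.0:
        raise ValueError("transmitter and receiver positions must differ")
    return theta / d ** 2


# Hypothetical positions in metres and a hypothetical coefficient value.
bs_3d = [0.0, 0.0, 30.0]        # x_B^3D = [x_B, y_B, g_B]
uav_3d = [300.0, 400.0, 30.0]   # v_j^3D(t); same altitude as the BS in this example
user_3d = [300.0, 400.0, 1.5]   # x_i,s^3D(t); user directly below the UAV
theta = 1e-3

h_btu = dnn_channel_gain(theta, bs_3d, uav_3d)    # BtU gain h_j,s^B(t): D = 500 m
h_utg = dnn_channel_gain(theta, uav_3d, user_3d)  # UtG gain h_ij,s(t): D = 28.5 m
print(f"h_BtU = {h_btu:.3e}, h_UtG = {h_utg:.3e}")
```

Because the antenna gains, path loss and small-scale fading are all folded into θ, the only geometry-dependent term left for the optimizer is 1/D², so halving a link distance quadruples its gain for a fixed coefficient. This is the property that keeps (1) and (2) analytically tractable.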

III. PROBLEM FORMULATION

In this section, we formulate the problem of slicing a UAV network to provide energy-efficient and fair communication coverage for ground mobile users as an optimization problem. To this aim, we first enforce intra-slice and inter-slice constraints and then define the objective function of the optimization problem. With the aforementioned system model, constraints, and objective function, the optimization problem is formulated.

A. URLLC slice constraints

To support URLLC services, the constraints on crucial performance indicators, such as the transmit data rates of transmitters in a network, should be rigorously satisfied. In wireless communication networks, the Shannon formula is usually leveraged to quantify the transmit data rate of a transmitter. However, in a network supporting URLLC transmission, the Shannon formula cannot be utilized. This is because the Shannon formula is derived under the crucial assumption of transmitting packets of sufficiently long blocklength, whereas a URLLC packet is typically very short to satisfy the stringent low-latency requirement. To tackle this issue, as in [30], [31], we assume that the fading channel is a quasi-static Rayleigh fading channel that remains constant over a time slot and changes independently across time slots. Then, the rate formula in the finite blocklength regime in [31] is exploited to approximate the achievable data rate of transmitting URLLC packets from the BS to UAV $j$ in URLLC slice $s \in \mathcal{S}^u$ under the given transmission latency $\tau^{\rm req}_{s,u}$ and codeword decoding error probability $\varepsilon^{\rm req}_{s,u}$ [31], i.e.,

$$R^u_{j,s}(t) \approx \frac{W^u_{j,s}(t)}{\ln 2}\left[\ln\left(1 + \frac{h^B_{j,s}(t)\,p^B_{j,s}(t)}{N_0 W^u_{j,s}(t)}\right) - \sqrt{\frac{V_{j,s}(t)}{\tau^{\rm req}_{s,u} W^u_{j,s}(t)}}\,Q^{-1}\big(\varepsilon^{\rm req}_{s,u}\big)\right], \tag{3}$$

where $p^B_{j,s}(t)$ is the transmit power of the BS when connecting to UAV $j$ at slot $t$, $W^u_{j,s}(t)$ is the system bandwidth allocated to UAV $j$ in URLLC slice $s$, $N_0$ is the noise power spectral density, $Q^{-1}(\cdot)$ is the inverse of the Q-function, and $V_{j,s}(t) = 1 - 1\Big/\Big(1 + \frac{h^B_{j,s}(t)\,p^B_{j,s}(t)}{N_0 W^u_{j,s}(t)}\Big)^2$ is the channel dispersion. Note that, to ensure the low-latency requirements of UAVs, a frequency division multiple access (FDMA) technique is applied to achieve URLLC inter-slice and intra-slice isolation.

When the received signal-to-noise ratio (SNR) is higher than 5 dB, $V_{j,s}(t)$ can be very accurately approximated as 1 [32]. On the other hand, even in the low-SNR regime, since $V_{j,s}(t) < 1$, we can obtain an upper bound of the minimum required transmit power by substituting $V_{j,s}(t) = 1$ into (3). If this upper bound is applied in optimizing resource allocation, the requirements on transmission latency and codeword decoding error probability in URLLC communication can be satisfied. Besides, to satisfy the low-latency requirement, the minimum data rate is $R^u_{j,s}(t) = b^{\rm req}_{s,u}/\tau^{\rm req}_{s,u}$, where $b^{\rm req}_{s,u}$ denotes the number of bits to be transmitted within $\tau^{\rm req}_{s,u}$. Then, by activating the transmit data rate condition and approximating $V_{j,s}(t) = 1$, the required transmit power from the BS to UAV $j$ that satisfies the transmission latency and codeword decoding error probability requirements can be derived as

$$p^B_{j,s}(t) = \frac{N_0 W^u_{j,s}(t)}{h^B_{j,s}(t)}\left\{\exp\left[\frac{b^{\rm req}_{s,u}\ln 2}{\tau^{\rm req}_{s,u} W^u_{j,s}(t)} + \frac{Q^{-1}\big(\varepsilon^{\rm req}_{s,u}\big)}{\sqrt{\tau^{\rm req}_{s,u} W^u_{j,s}(t)}}\right] - 1\right\}. \tag{4}$$

Let $p(W^u(t))$ denote the total required transmit power of the BS for connecting to all UAVs with given $W^u(t) = \sum_{s,j} W^u_{j,s}(t)$, where we use the lightened notation $\sum_{s,j} W^u_{j,s}(t)$ for $\sum_{s\in\mathcal{S}^u}\sum_{j\in\mathcal{N}^u_s} W^u_{j,s}(t)$. Similar lightened notation is adopted throughout the rest of the paper to simplify the description. Since the maximum transmit power of the BS, denoted by $p^B_{\max}$, is limited, we have the following transmit power constraint:

$$p(W^u(t)) = \sum_{s,j} p^B_{j,s}(t) \le p^B_{\max}. \tag{5}$$

B. MBB slice constraints

Owing to the movement of UAVs, user $i$, for all $i \in I^e_s$, $s \in \mathcal{S}^e$, may be in the communication ranges of several UAVs at slot $t$. We assume that at $t$, a user can be served by at most one UAV, and a UAV is allowed to deliver MBB traffic to at most one user due to its limited service capability. In this way, the upper-layer network slice configuration (e.g., protocol stack configuration, slice blueprint generation), which is time-consuming, can be proactively identified by RMC and VNSM to accommodate the data rate requirements of admitted users [33]. Mathematically, we have

$$\begin{aligned} &\sum_{j\in\mathcal{J}} a_{ij,s}(t) \le 1, && \forall i \in \mathcal{N}^e_s,\ s \in \mathcal{S}^e,\\ &\sum_{i,s} a_{ij,s}(t) \le 1, && \forall j \in \mathcal{J}. \end{aligned} \tag{6}$$

Based on the above slice request acceptance condition, the system bandwidth allocated to MBB slices at $t$, denoted by $W^e(t)$, can be fully reused by each ground mobile user. In this case, for a user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, we denote its received signal-to-interference-plus-noise ratio (SINR) from UAV $j$ at time slot $t$ by ${\rm SINR}_{ij,s}(t)$, which can be expressed as

$${\rm SINR}_{ij,s}(t) = \frac{p_j(t)\,h_{ij,s}(t)}{N_0 W^e(t) + I_{ij,s}(t)}, \tag{7}$$

where $p_j(t) \in \mathcal{P}(t)$ is the instantaneous transmit power of UAV $j$ at $t$, with $\mathcal{P}(t)$ being the set of possible values of all UAVs' transmit power, and $I_{ij,s}(t) = \sum_{k\in\mathcal{J}\setminus\{j\}} p_k(t)\,h_{ik,s}(t)$ is the interference caused by other UAVs.

The time-average transmit power of UAV $j$ during the first $t$ time slots can be written as $\bar{p}_j(t) = \frac{1}{t}\sum_{\tau=1}^{t} p_j(\tau)$. In addition to the transmit power, UAVs incur propulsion power consumption and inherent circuit power consumption, the latter mainly including the power consumption of mixers, frequency synthesizers, and digital-to-analog converters. Denoting by $p^p_j$ and $p^c_j$ the propulsion power and circuit power of $j$ during a time slot, respectively, we model the total power consumption of $j$ at $t$ as

$$p^{\rm tot}_j(t) = p_j(t) + p^p_j + p^c_j, \tag{8}$$

where $p^p_j = p_0\big(1 + 3(v^s_j/U_{\rm tip})^2\big) + p_1\Big(\big(1 + 0.25(v^s_j/v_0)^4\big)^{1/2} - 0.5(v^s_j/v_0)^2\Big)^{1/2} + 0.5A(v^s_j)^3$,

0733-8716 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: National University of Singapore. Downloaded on July 06,2021 at 21:22:56 UTC from IEEE Xplore. Restrictions apply.

$p_0$ and $p_1$ are the blade profile power and the induced power, $v^s_j$ represents the speed of UAV $j$, $U_{\rm tip}$ denotes the tip speed of the rotor blade, $v_0$ is the average rotor-induced velocity, and $A_j$ is a parameter related to the mechanical features of UAV $j$ [34]. $p^{\rm tot}_j(t)$ is upper-bounded by the maximum instantaneous total power $\hat p_j$ [35], i.e., $p^{\rm tot}_j(t) \le \hat p_j$. Accordingly, the time-average total power consumption of UAV $j$ over the first $t$ time slots can be written as

$$\bar p^{\rm tot}_j(t) = \bar p_j(t) + p^p_j + p^c_j, \tag{9}$$

which is constrained by $\bar p^{\rm tot}_j(t) \le \tilde p_j$, where $\tilde p_j$ is the maximum time-average total power consumption of UAV $j$ [35].

We then leverage the Shannon formula to quantify the achievable data rate $u_{i,s}(t)$ (in Mbps) of user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, at $t$ as

$$u_{i,s}(t) = \sum_{j\in\mathcal{J}} a_{ij,s}(t)\,W^e(t)\log_2\left(1 + \mathrm{SINR}_{ij,s}(t)\right). \tag{10}$$

Over the first $t$ time slots, the time-average achievable data rate of user $i$ can be written as $\bar u_{i,s}(t) = \frac{1}{t}\sum_{\tau=1}^{t} u_{i,s}(\tau)$. As users require minimum time-average achievable data rates in practical communication scenarios, we impose a constraint guaranteeing that user $i$'s minimum data rate requirement is satisfied, i.e.,

$$\bar u_{i,s}(t) \ge C^{\rm th}_s,\ \forall i \in \mathcal{N}^e_s,\ s \in \mathcal{S}^e. \tag{11}$$

C. Physical resource and UAV movement constraints

During the flight, the distance between two consecutive waypoints on a UAV trajectory is constrained by the UAV's maximum speed. As such, the waypoint distance constraint can be written as $\|v_j(t) - v_j(t-1)\|_2^2 \le e_{\max}^2$, where $e_{\max}$ is the UAV's maximum flight distance during a time slot. Additionally, for collision avoidance, the distance between any two UAVs at each slot should not be less than a safety distance, i.e., $\|v_j(t) - v_k(t)\|_2^2 \ge d_{\min}^2$, where $d_{\min}$ is the minimum safety distance.

Besides, since both MBB and URLLC services are provisioned, and the network bandwidth resources allocated to MBB and URLLC slices are separated in the frequency domain to achieve inter-slice isolation, the network bandwidth constraint can be written as

$$W^u(t) + W^e(t) = W^{\rm tot}, \tag{12}$$

where $W^{\rm tot}$ denotes the total system bandwidth.

D. Objective function and problem formulation

Define $\phi(\{\bar u_{i,s}(t)\}) = \sum_{i,s}\log_2\left(1 + \bar u_{i,s}(t)\right)$ as a proportional fairness function of the time-average achievable data rates across all ground users. Maximizing $\phi(\{\bar u_{i,s}(t)\})$ increases users' time-average achievable data rates while promoting fair coverage by the UAVs. Our goal is to achieve energy-efficient and fair MBB service provision by physically configuring the UAV network, which includes optimizing the acceptance set of MBB slice requests $\mathcal{A}(t)$, controlling UAVs' movement $\mathcal{X}(t)$, and optimizing UAV transmit power $\mathcal{P}(t)$ over the whole period of the communication task. Combining the above analysis, we can formulate the UAV network slicing problem as the following sequential decision problem:

$$\begin{aligned}
\max_{\mathcal{A}(t),\mathcal{P}(t),\mathcal{X}(t),\{W^u_{j,s}(t)\}}\ & \liminf_{t\to\infty}\Big(\phi(\{\bar u_{i,s}(t)\}) - \rho\sum_{j\in\mathcal{J}}\bar p^{\rm tot}_j(t)\Big) \qquad (13a)\\
\text{s.t.}\ & \liminf_{t\to\infty}\bar u_{i,s}(t) \ge C^{\rm th}_s,\ \forall i,s \qquad (13b)\\
& \limsup_{t\to\infty}\bar p^{\rm tot}_j(t) \le \tilde p_j,\ \forall j \qquad (13c)\\
& p^{\rm tot}_j(t) \le \hat p_j,\ \forall j,t \qquad (13d)\\
& \|v_j(t) - v_j(t-1)\|_2^2 \le e_{\max}^2,\ \forall j,t \qquad (13e)\\
& \|v_j(t) - v_k(t)\|_2^2 \ge d_{\min}^2,\ \forall j,\ k \ne j,\ t \qquad (13f)\\
& a_{ij,s}(t) \in \{0,1\},\ \forall i,s,j,t \qquad (13g)\\
& W^u_{j,s}(t) > 0,\ \forall j,s,t \qquad (13h)\\
& (5),\ (6),\ (12), \qquad (13i)
\end{aligned}$$

where $v_j(0)$ represents the initial horizontal location of UAV $j$, and $\rho$ is a non-negative coefficient that weighs the trade-off between the system revenue and the power consumption.

However, solving (13) is highly challenging, because the locations of ground mobile users are unknown at each time slot and analytically tractable channel gain models are not available. Besides, the sequence-dependent characteristic of (13) significantly hinders its solution. To solve such a challenging problem, we first propose to predict ground mobile users' locations using a distributed learning method. Second, we design an online learning method to estimate channel gain coefficients; based on the predicted users' locations and the estimated channel gain coefficients, we construct analytically tractable channel gain models. Third, with the predicted users' locations and the constructed channel gain models, we exploit a Lyapunov-based optimization framework to solve the sequential decision problem. Nevertheless, (13) needs to optimize discrete and continuous decision variables simultaneously. Besides, the function $\bar u_{i,s}(t)$ is non-concave with respect to (w.r.t.) both $p_j(t)$ and $v_j(t)$, which results in a non-concave objective function and a non-convex constraint (13b). Constraint (13f) is non-convex w.r.t. $v_j(t)$, and the left-hand side (LHS) of (5) is non-convex w.r.t. $W^u_{j,s}(t)$. Therefore, (13) is also a mixed-integer non-convex problem. Finally, to make (13) tractable, an iterative optimization scheme incorporating an SCA technique is explored to tackle its mixed-integer and non-convex characteristics. The procedures for solving (13) are elaborated in the following three sections.

IV. DISTRIBUTED LEARNING FOR USERS' LOCATION PREDICTION

Since users' locations may continuously change over time, dynamic slice creation and configuration should be enabled to improve users' QoS. Because slice creation and configuration are time-consuming, (13) should be proactively solved on the basis of predicted users' locations to achieve this goal. Many works assumed that users' current locations can be obtained via the global positioning system (GPS) [35]–[37]. However, they did not study how to predict users' locations based on users' current and/or historical locations.


To predict users' locations, it is necessary to apply a machine learning method. Yet, a user's location information may be scattered across multiple UAVs owing to the movement of UAVs, and each UAV can only collect partial information on users' locations after a period of time. A UAV, however, cannot accurately predict users' locations based on partial location information. Thus, centralized methods performed at the BS may be exploited to accurately predict users' locations. However, they require exchanging a large amount of raw user location data between the BS and all UAVs, which consumes substantial network resources. To tackle this issue, we develop a distributed learning method. In this method, although each UAV learns locally, it can obtain the global prediction model by exchanging users' location prediction models, rather than large amounts of raw data, with the BS. Moreover, owing to the local learning and the exchange of prediction models, both the BS and the UAVs can predict each user's locations. With the predicted users' locations, the BS can effectively control the movement of UAVs by solving (13), and UAVs can construct UtG channel gain models.

A. Components of echo state network

In this paper, an echo state network (ESN)-based learning method is exploited to train a location prediction model, because the training process of the ESN is simple and fast and the ESN can effectively perform sequence-dependent data mining [38]. However, there are two distinctions between the ESN-based learning method explored in this paper and that in [38]. First, this paper leverages the ESN-based learning method to predict mobile users' locations over multiple time slots. Second, the convergence performance of the ESN-based learning method is theoretically analyzed in this paper. An ESN is a recurrent neural network that can be partitioned into four components: agent, input, ESN model, and output, as specified below:

• Agent: There are $J + 1$ agents, located on the $J$ UAVs and the BS, respectively, as shown in Fig. 2. Each UAV agent trains ESN models locally and then sends the trained local ESN models to the BS agent for aggregation. The BS agent aggregates the received local ESN models and then broadcasts the aggregated (or global) ESN model to all UAV agents.
• Input: For a typical UAV $j$ and user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, the input set of the ESN is defined as $\mathcal{V}_{ij,s}(t) = \{x_{i,s}(t-Q), \ldots, x_{i,s}(t)\}$, where $Q$ is the number of users' location samples. The $Q$ location samples $\{x_{i,s}(t-Q), \ldots, x_{i,s}(t-1)\}$ are used to train an ESN model related to user $i$, and $x_{i,s}(t)$ is used to predict user $i$'s location.
• ESN model: For a typical UAV $j$, one of its local ESN models is leveraged to correlate $x_{i,s}(t)$ with user $i$'s predicted location. As a single ESN, which has one hidden layer connecting the input and the output as shown in Fig. 2, can quickly converge, we regard it as a location prediction model. Consequently, a local ESN model includes an input weight matrix $W^r_{\rm in}$, a recurrent matrix $W^r_r$, and an output weight matrix $W_{j,t}$.
• Output: For a typical UAV $j$ and user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, the output of the ESN is defined as a matrix $\hat Y_{ij,s}(t) = [\hat y_{i,s}(t+1), \ldots, \hat y_{i,s}(t+K)]$, where $K$ is the number of future time slots and $\hat y_{i,s}(t+k)$ is the predicted location of user $i$ at time slot $t+k$. For the BS agent, the predicted output matrix is denoted by $\hat Y^B_{i,s}(t) = [\hat y^B_{i,s}(t+1), \ldots, \hat y^B_{i,s}(t+K)]$ for all $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$.

[Fig. 2 diagram: UAV agents 1, ..., J, each holding a local dataset and computing a local model update, exchange models with the BS agent, which performs the aggregation (global model update).]
Fig. 2. The training processes of all UAVs and interrelationships between UAVs and the BS.

B. Distributed ESN learning for users' location prediction

In this subsection, we present in detail the procedure of training all the local ESN models in a distributed way and then driving these models to the global model at convergence. Fig. 2 shows the training processes of all UAVs and the interrelationship between UAVs and the BS.

For user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, an input vector $x_{i,s}(t) \in \mathbb{R}^{N_i}$, where $N_i$ represents the dimension of the vector, is fed to a reservoir of dimension $N_r$, whose internal state $q_{i,s}(t-1) \in \mathbb{R}^{N_r}$ is updated as follows:

$$q_{i,s}(t) = f_{\rm res}\left(W^r_{\rm in}\,x_{i,s}(t) + W^r_r\,q_{i,s}(t-1)\right), \tag{14}$$

where $W^r_{\rm in} \in \mathbb{R}^{N_r\times N_i}$ and $W^r_r \in \mathbb{R}^{N_r\times N_r}$ are random matrices whose entries are uniformly distributed in the interval (0, 1), and $f_{\rm res}$ is a suitably defined non-linear function (e.g., $\tanh(\cdot)$).

The predicted output of the ESN at $t$ is given by

$$\hat y_{i,s}(t+1) = W^o_{\rm in}\,x_{i,s}(t) + W^o_r\,q_{i,s}(t), \tag{15}$$

where $W^o_{\rm in} \in \mathbb{R}^{N_o\times N_i}$ and $W^o_r \in \mathbb{R}^{N_o\times N_r}$ are trained based on the location samples.

To train local ESN models on UAV $j$, we need a sequence of $Q$ target input-output pairs $\{(x_{i,s}(t-Q), y_{i,s}(t-Q+1)), \ldots, (x_{i,s}(t-1), y_{i,s}(t))\}$, with $y_{i,s}(t)$ representing the real 2D position of user $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, at $t$. Besides, training ESN models requires up-to-date users' locations. To this end, each user broadcasts Beacon messages containing location and time information every $T_p$ time slots to refresh its location stored on the UAVs that can


receive the Beacon messages. Define the hidden matrix $X_{ij,st}$ as

$$X_{ij,st} = \begin{bmatrix} x^T_{i,s}(t-1) & q^T_{i,s}(t-1)\\ \vdots & \vdots\\ x^T_{i,s}(t-Q) & q^T_{i,s}(t-Q) \end{bmatrix}. \tag{16}$$

The optimal output weight matrix at time slot $t$ is then obtained by solving the regularized least-squares problem

$$W^\star_t = \arg\min_{W_t\in\mathbb{R}^{(N_i+N_r)\times N_o}}\ \frac{1}{2}\sum_{j=1}^{J}\left\|X_{ij,st}W_t - Y_{ij,st}\right\|_F^2 + \frac{\xi}{2}\left\|W_t\right\|_F^2, \tag{17}$$

where $W_t = [W^o_{\rm in}, W^o_r]^T$, $\xi \in \mathbb{R}^+$ is a positive scalar known as the regularization factor, and $Y_{ij,st} = [y_{i,s}(t)^T; \ldots; y_{i,s}(t-Q+1)^T] \in \mathbb{R}^{Q\times N_o}$.

As $\{X_{ij,st}\}$ and $\{Y_{ij,st}\}$ for all $i, j, s, t$ are collected locally, we adopt the alternating direction method of multipliers (ADMM) [39] to solve (17). This is because ADMM is a distributed method that can quickly aggregate locally obtained results into a global result at convergence. In particular, by enforcing the consensus constraint $W_{j,t} = \hat W_t$, we obtain the augmented Lagrangian function of (17):

$$\mathcal{L}(W_{j,t}, \hat W_t) = \sum_{j=1}^{J}\frac{1}{2}\left\|X_{ij,st}W_{j,t} - Y_{ij,st}\right\|_F^2 + \frac{\xi}{2}\left\|\hat W_t\right\|_F^2 + \sum_{j=1}^{J}\mathrm{tr}\left(A^T_{j,t}\left(W_{j,t} - \hat W_t\right)\right) + \frac{\lambda}{2}\sum_{j=1}^{J}\left\|W_{j,t} - \hat W_t\right\|_F^2, \tag{18}$$

where $A_{j,t} \in \mathbb{R}^{(N_i+N_r)\times N_o}$ is a Lagrangian multiplier matrix. $\{W_{j,t}\}$ can be considered as a family of local variables, $\hat W_t$ can be regarded as the global consensus variable, and $\lambda > 0$ is the penalty parameter of the augmented Lagrangian.

Next, an iterative framework is developed to solve (18). By setting the derivatives of $\mathcal{L}(W_{j,t}, \hat W_t)$ w.r.t. $W_{j,t}$ and $\hat W_t$ to zero at the $r$-th iteration, we obtain the update rules of $W^{(r)}_{j,t}$ and $\hat W^{(r)}_t$, respectively:

$$W^{(r+1)}_{j,t} = \left(X^T_{ij,st}X_{ij,st} + \lambda I\right)^{-1}\left(X^T_{ij,st}Y_{ij,st} + \lambda\hat W^{(r)}_t - A^{(r)}_{j,t}\right), \tag{19}$$

where $I$ is the $(N_i+N_r)\times(N_i+N_r)$ identity matrix, and

$$\hat W^{(r+1)}_t = (\xi + \lambda J)^{-1}\left(\sum_{j=1}^{J}A^{(r)}_{j,t} + \lambda\sum_{j=1}^{J}W^{(r+1)}_{j,t}\right). \tag{20}$$

To update $A_{j,t}$, the gradient ascent method is leveraged with

$$A^{(r+1)}_{j,t} = A^{(r)}_{j,t} + \eta\left(W^{(r+1)}_{j,t} - \hat W^{(r+1)}_t\right), \tag{21}$$

where $\eta$ is a constant step size.

By exploiting ADMM, the distributed ESN learning method can be implemented for users' location prediction. In summary, the $J$ UAVs separately calculate (19) and (21), and the BS calls (20) to aggregate the local variables $\{W_{j,t}\}$. The local Lagrangian multipliers $\{A_{j,t}\}$ are updated to drive $\{W_{j,t}\}$ to their global consensus. After a limited number of iterations, $\{W_{j,t}\}$ converge to the global consensus variable $W^\star_t = \hat W_t$. With the obtained $W^\star_t$, each UAV and the BS can predict the locations of all users over the next $K$ time slots.

The main steps of predicting users' locations are given in Algorithm 1.

Algorithm 1 Distributed ESN learning for users' location prediction
1: Input: Local training data sets $\{\mathcal{V}_{ij,s}(t)\}$, $\forall i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, $j \in \mathcal{J}$; the maximum number of iterations $r_{\max}$.
2: Output: The predicted locations $\hat Y^B_{i,s}(t)$ and $\hat Y_{ij,s}(t)$ for all $i$, $s$.
3: Each UAV agent $j$ generates $N^e$ local ESN models, including $N^e$ matrices $W^r_{\rm in}$ and $N^e$ matrices $W^r_r$, for predicting the locations of the $N^e$ users, respectively.
4: Each UAV agent $j$ gathers hidden matrices $\{X_{ij,st}\}$ and teacher data $\{Y_{ij,st}\}$ from $\{\mathcal{V}_{ij,s}(t)\}$ for all $i$, $s$.
5: Initialize $A^{(1)}_{j,t} = 0$ and $\hat W^{(1)}_t = 0$.
6: for $r = 1 : r_{\max}$ do
7:   for each UAV agent $j$ in parallel do
8:     Agent $j$ computes $W^{(r+1)}_{j,t}$ using (19) and transmits $W^{(r+1)}_{j,t}$ to the BS agent.
9:   end for
10:  The BS agent computes $\hat W^{(r+1)}_t$ using (20) and broadcasts $\hat W^{(r+1)}_t$ to all UAV agents.
11:  for each UAV agent $j$ in parallel do
12:    Agent $j$ computes $A^{(r+1)}_{j,t}$ using (21) and transmits $A^{(r+1)}_{j,t}$ to the BS agent.
13:  end for
14:  Break if converged or $r = r_{\max}$.
15: end for
16: The BS agent then obtains the predicted locations $\hat Y^B_{i,s}(t)$ of every user $i$ by iteratively feeding each prediction back as the next input, i.e., $x_{i,s}(t+1) = \hat y^B_{i,s}(t+1)$, and calling (15) $K$ times. Similarly, each UAV can obtain all users' predicted locations.

C. Convergence of the distributed ESN learning method

In this subsection, we analyze the convergence of Algorithm 1. The following lemma shows that Algorithm 1 is convergent.

Lemma 1. Denote $(W^\star_{j,t}, \hat W^\star_t)$ as the optimal solution to (18). For the distributed ESN learning method, $\forall r \in \mathbb{Z}^+$, $j \in \mathcal{J}$, $\mathcal{L}(W^{(r)}_{j,t}, \hat W^{(r)}_t)$ is bounded and

$$\mathcal{L}(W^\star_{j,t}, \hat W^\star_t) = \lim_{r\to\infty}\mathcal{L}(W^{(r)}_{j,t}, \hat W^{(r)}_t). \tag{22}$$

Proof. Please refer to Appendix A in the technical report [40].

V. ONLINE CHANNEL GAIN COEFFICIENT ESTIMATION

Even though we obtain the predicted locations of all ground users, it is still impossible to theoretically analyze (13) without closed-form expressions of the BtU and UtG channel gains. To this end, we correlate channel gain values with channel gain coefficients in a simple manner (refer to (1) and (2)). In this way, analytically tractable channel gain models can be achieved if the channel gain coefficients can be numerically generated. Additionally, as the developed channel models are


much simpler in expression than other channel models, such as the probabilistic LoS model [21] and angle-dependent channel parameters [22], they can significantly simplify the theoretical analysis of UAV communication problems. In this paper, DNNs, which are practical for approximating complicated functions, are leveraged to generate numerical values of the channel gain coefficients.

A. BtU channel gain coefficient estimation

In this subsection, we describe the procedure of estimating the BtU channel gain coefficient using a DNN from the viewpoint of the design of the input space, output space, and network parameters. Note that the BS can simultaneously connect to all UAVs, and the channel gain coefficient of each BtU link should be estimated. Therefore, the BS is required to construct and train a DNN for each UAV. For a typical UAV $j \in \mathcal{N}^u_s$, $s \in \mathcal{S}^u$, the input space, network parameters, and output space of the constructed DNN are presented as follows:

• Input space $s^{ul}(t)$: At each time slot $t$, we set $s^{ul}(t) = [v^{3D}_j(t); x^{3D}_B; a^{ul}_{j,s}(t)]$, where $a^{ul}_{j,s}(t)$ is an indicator variable indicating whether there is an LoS link between the BS and UAV $j$ at time slot $t$: $a^{ul}_{j,s}(t) = 1$ if an LoS link exists; otherwise, $a^{ul}_{j,s}(t) = 0$. This choice is made because the BtU channel gain is closely related to the locations of the BS and UAV $j$ and to the LoS/NLoS connection between them. In an area with given building locations and heights, the presence or absence of an LoS link between the BS and a UAV can be exactly determined by checking whether the link connecting the BS and the UAV is blocked by any building.
• Network parameters $\theta^\mu_j(t)$: We consider two fully-connected hidden layers. The ReLU function is utilized as the activation function in the hidden layers. Besides, the network parameters are initialized by a Xavier initialization scheme [41].
• Output space $o_j(t)$: $o_j(t)$ is the estimated output value of the output layer. We set $\hat o_j(t) = \hat\theta^B_j(t)$, with $\hat\theta^B_j(t)$ being the target value of the channel gain coefficient between the BS and UAV $j$, which can be obtained by signal measurements. Signal measurements are practically available through existing cellular communication protocols, such as the reference signal received power (RSRP) for signal receiving power measurement and the received signal strength indicator (RSSI) for total receiving power measurement [24]. Besides, the linear function is considered as the activation function in the output layer.

To effectively train the DNN, the experience replay technique [42] is exploited. This is due to two special characteristics of the channel gain estimation problem: 1) the collected input data $s^{ul}(t)$ arrive incrementally as UAV $j$ moves to new locations, instead of all being available at the beginning of training; 2) as UAV $j$ collects input data while it flies, the input data obtained at consecutive time slots are typically correlated, which may result in oscillation or divergence of the DNN. Specifically, at time slot $t$, a new training sample $(s^{ul}(t), \hat o_j(t))$ is added to a database or replay memory. When the memory is full, the newly generated sample replaces the oldest one. We randomly choose a minibatch of training samples $\{(s^{ul}(\tau), \hat o_j(\tau)) \mid \tau \in \mathcal{T}_t\}$ from the database, where $\mathcal{T}_t$ is a set of time slot indices. The network parameters $\theta^\mu_j(t)$ are trained using the ADAM method [43] to reduce the mean squared error

$$L(\theta^\mu_j(t)) = \frac{1}{|\mathcal{T}_t|}\sum_{\tau\in\mathcal{T}_t}\left(\hat o_j(\tau) - o_j(\tau)\right)^2. \tag{23}$$

Algorithm 2 summarizes the steps of training the DNN for BtU channel gain estimation.

Algorithm 2 DNN for BtU channel gain estimation
1: Initialize: DNN $\mu(s^{ul}(t)|\theta^\mu_j(t))$ with network parameters $\theta^\mu_j(t)$.
2: Initialize: Replay buffer $\mathcal{R}$ with capacity $C$ and minibatch size $|\mathcal{T}_t|$.
3: for each slot $t = 1, 2, \ldots$ do
4:   Calculate the input space $s^{ul}(t)$ according to the locations of the BS and UAV $j$.
5:   The BS measures signals to obtain the target channel gain coefficient $\hat o_j(t)$.
6:   Store the transition $(s^{ul}(t), \hat o_j(t))$ in the buffer $\mathcal{R}$.
7:   If $t \ge |\mathcal{T}_t|$, sample a random minibatch of $|\mathcal{T}_t|$ transitions $(s^{ul}(m), \hat o_j(m))$ from $\mathcal{R}$.
8:   Update the network parameters $\theta^\mu_j(t)$ by minimizing the loss $L(\theta^\mu_j(t))$ using the ADAM method.
9: end for

In Algorithm 2, the training process at each time slot is fast, since it calls the ADAM method to reduce the value of $L(\theta^\mu_j(t))$ over only a minibatch of randomly selected samples. Thus, Algorithm 2 can be performed online.

B. UtG channel gain coefficient estimation

Owing to the movement of UAVs and ground users, each UAV may connect to every ground user after a long enough period of time. Besides, the UtG channel gain is closely related to the environment where the UAVs are deployed and to the locations of UAVs and users, regardless of user-specific characteristics of ground users (e.g., user profile information and device types). Therefore, in this paper each UAV is required to construct and train a single DNN to estimate the UtG channel gain coefficients between it and all ground users. In particular, for a typical UAV $j$, we design the input space, network parameters, and output space of the DNN as follows:

• Input space $s^{dl}(t)$: At each time slot $t$, we set $s^{dl}(t) = [\tilde x^{3D}(t); v^{3D}_j(t); a^{dl}_j(t)]$, where $\tilde x^{3D}(t)$ is the 3D coordinate of the user connected to UAV $j$, and $a^{dl}_j(t)$ is an indicator variable indicating whether there is an LoS link between UAV $j$ and its connected user at time slot $t$: $a^{dl}_j(t) = 1$ if the LoS link exists; otherwise, $a^{dl}_j(t) = 0$. $a^{dl}_j(t)$ is determined in the same way as $a^{ul}_{j,s}(t)$.


• Network parameters $\theta^Q_j(t)$: We consider two fully-connected hidden layers. The ReLU is used as the activation function in the hidden layers. The network parameters are initialized by a Xavier initialization scheme.
• Output space $o^{dl}_{ij}(t)$: $o^{dl}_{ij}(t)$ is the output value of the output layer. We set the target output value $\hat o^{dl}_{ij}(t) = \hat\theta_{ij}(t)$, with $\hat\theta_{ij}(t)$ being the target channel gain value between UAV $j$ and its connected user $i$ at $t$. Similarly, $\hat\theta_{ij}(t)$ can be obtained by signal measurements. Besides, the linear function is considered as the activation function in the output layer.

Likewise, the experience replay technique, jointly with the ADAM method, is used to train the DNN $Q(s^{dl}(t)|\theta^Q_j(t))$. The main steps of training the DNN for UtG channel gain estimation are similar to those of Algorithm 2, except that each UAV measures the signals used for DNN training. Thus, we omit a summary of the main steps for brevity.

VI. LYAPUNOV-BASED OPTIMIZATION FRAMEWORK

A. Lyapunov decomposition

Given the predicted users' locations and estimated channel gain coefficients, it is still difficult to solve (13), because (13) is a sequential decision problem. Reinforcement learning methods can be explored to solve sequential decision problems; for example, the works in [23], [44] proposed reinforcement learning methods to solve sequential decision problems with a discrete decision space and a continuous decision space, respectively. However, how to solve sequential decision problems involving both discrete and continuous decision variables (e.g., problem (13)) is a significant and understudied problem. To solve (13), we propose to decompose the sequential decision problem into multiple repeated optimization subproblems via the Lyapunov approach. In particular, let $\gamma(t) = (\gamma_{1,1}(t), \ldots, \gamma_{N^e_{|\mathcal{S}^e|},|\mathcal{S}^e|}(t))$ be an auxiliary data rate vector with $0 \le \gamma_{i,s}(t) \le u^{\max}_{i,s}(t)$, $\forall i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, $t$. Define $g(t) = \phi(\gamma(t)) = \sum_{i,s}\log_2\left(1 + \gamma_{i,s}(t)\right)$. The following lemma presents a transformed problem of (13) and the conditions required to enforce the time-average constraints of the transformed problem.

Lemma 2. The original problem (13) can be equivalently transformed into the following problem:

$$\begin{aligned}
\max_{\mathcal{A}(t),\mathcal{P}(t),\mathcal{X}(t),\gamma(t),\{W^u_{j,s}(t)\}}\ & \liminf_{t\to\infty}\Big(\bar g(t) - \rho\sum_{j\in\mathcal{J}}\bar p^{\rm tot}_j(t)\Big) \qquad (24a)\\
\text{s.t.}\ & \liminf_{t\to\infty}\left[\bar u_{i,s}(t) - \bar\gamma_{i,s}(t)\right] \ge 0,\ \forall i,s \qquad (24b)\\
& \liminf_{t\to\infty}\left[\bar u_{i,s}(t) - C^{\rm th}_s\right] \ge 0,\ \forall i,s \qquad (24c)\\
& \liminf_{t\to\infty}\left[\tilde p_j - \bar p^{\rm tot}_j(t)\right] \ge 0,\ \forall j \qquad (24d)\\
& 0 \le \gamma_{i,s}(t) \le u^{\max}_{i,s}(t),\ \forall i,s,t \qquad (24e)\\
& \text{constraints (13d)–(13i).} \qquad (24f)
\end{aligned}$$

For all $i \in \mathcal{N}^e_s$, $s \in \mathcal{S}^e$, $j \in \mathcal{J}$, introduce three families of virtual queue variables $\{Q_{i,s}(t)\}$, $\{Z_{i,s}(t)\}$, and $\{H_j(t)\}$, and update them with

$$Q_{i,s}(t) = Q_{i,s}(t-1) + C^{\rm th}_s - u_{i,s}(t-1), \tag{25}$$

$$Z_{i,s}(t) = Z_{i,s}(t-1) + \gamma_{i,s}(t-1) - u_{i,s}(t-1), \tag{26}$$

$$H_j(t) = H_j(t-1) + p^{\rm tot}_j(t-1) - \tilde p_j. \tag{27}$$

If the following mean-rate stability conditions hold:

$$\lim_{t\to\infty}\frac{\mathbb{E}\{[Q_{i,s}(t)]^+\}}{t} = 0,\qquad \lim_{t\to\infty}\frac{\mathbb{E}\{[Z_{i,s}(t)]^+\}}{t} = 0,\qquad \lim_{t\to\infty}\frac{\mathbb{E}\{[H_j(t)]^+\}}{t} = 0, \tag{28}$$

where the non-negative operation is $[x]^+ = \max\{x, 0\}$, then the time-average constraints of (24) are satisfied.

Proof. Please refer to Appendix B in the technical report [40].

The question then arises: how can (24) be solved with the virtual queues?

For simplicity, we assume that all virtual queues are initialized to zero and define a Lyapunov function $L(t)$ at $t$ as half the sum of squares of the three families of virtual queues $[Q_{i,s}(t)]^+$, $[Z_{i,s}(t)]^+$, and $[H_j(t)]^+$, i.e.,

$$L(t) \triangleq \frac{1}{2}\sum_{i,s}\left([Q_{i,s}(t)]^+\right)^2 + \frac{1}{2}\sum_{i,s}\left([Z_{i,s}(t)]^+\right)^2 + \frac{1}{2}\sum_{j\in\mathcal{J}}\left([H_j(t)]^+\right)^2.$$

$L(t)$ is a scalar measure of constraint violations. Intuitively, if $L(t)$ is small, all queues are small; otherwise, at least one queue is large. Additionally, we define a drift-plus-penalty function as $\Delta(t) - V\big(g(t) - \rho\sum_{j\in\mathcal{J}}p^{\rm tot}_j(t)\big)$, where $\Delta(t) = L(t+1) - L(t)$ represents the Lyapunov drift, $-V\big(g(t) - \rho\sum_{j\in\mathcal{J}}p^{\rm tot}_j(t)\big)$ is the penalty, and $V$ is a non-negative penalty coefficient that weighs the trade-off between constraint violations and optimality. In this way, the sequential decision problem (24) can be solved by repeatedly minimizing the drift-plus-penalty function under all non-time-average constraints of (24) at each time slot $t$.

Besides, the following lemma presents an upper bound on the value of the drift-plus-penalty function.

Lemma 3. At each time slot $t$, the value of the drift-plus-penalty function $\Delta(t) - V\big(g(t) - \rho\sum_{j\in\mathcal{J}}p^{\rm tot}_j(t)\big)$ is upper-bounded as in (29), with $B \triangleq \big(\sum_{i,s}(u^{\max}_{i,s})^2 + \sum_{j\in\mathcal{J}}\hat p_j^2\big)/2$:

$$\begin{aligned}
\Delta(t) - V\Big(g(t) - \rho\sum_{j\in\mathcal{J}}p^{\rm tot}_j(t)\Big) \le\ & B + \sum_{i,s}[Q_{i,s}(t)]^+ C^{\rm th}_s - \sum_{j\in\mathcal{J}}[H_j(t)]^+\left(\tilde p_j - p^p_j - p^c_j\right)\\
& + V\rho\sum_{j\in\mathcal{J}}\left(p^p_j + p^c_j\right) - V\phi(\gamma(t)) + \sum_{i,s}[Z_{i,s}(t)]^+\gamma_{i,s}(t)\\
& + \sum_{j\in\mathcal{J}}\left\{V\rho + [H_j(t)]^+\right\}p_j(t) - \sum_{i,s}\left\{[Q_{i,s}(t)]^+ + [Z_{i,s}(t)]^+\right\}u_{i,s}(t).
\end{aligned} \tag{29}$$

Proof. Please refer to Appendix C in the technical report [40].

In (29), the right-hand side is an upper bound on the drift-plus-penalty. As such, minimizing the drift-plus-penalty can be approximated by minimizing its upper bound. Further, the upper bound can be partitioned into two independent terms related to $\gamma(t)$ and other sets of decision


variables, i.e., $\mathcal{A}(t)$, $\mathcal{P}(t)$, $\mathcal{X}(t)$, respectively. Therefore, we can summarize the Lyapunov-based optimization framework for solving (13) as follows.

• At each $t$, observe $Q_{i,s}(t)$, $Z_{i,s}(t)$, and $H_j(t)$ for all $i\in\mathcal{N}^e_s$, $s\in\mathcal{S}^e$, $j\in\mathcal{J}$.
• Choose $\gamma_{i,s}(t)$ for each user $i$ by solving (30):
$$\min_{\boldsymbol{\gamma}(t)}\ -V\phi(\boldsymbol{\gamma}(t))+\sum_{i,s}[Z_{i,s}(t)]^+\gamma_{i,s}(t)\tag{30a}$$
$$\text{s.t.}\ 0\le\gamma_{i,s}(t)\le u^{\max}_{i,s}(t).\tag{30b}$$
• Given the UAVs' locations $\mathcal{X}(t-1)$, choose $\mathcal{A}(t)$, $\mathcal{P}(t)$, and $\mathcal{X}(t)$ by solving (31):
$$\min_{\mathcal{A}(t),\mathcal{P}(t),\mathcal{X}(t),\{W^u_{j,s}(t)\}}\ \sum_{j\in\mathcal{J}}\{V\rho+[H_j(t)]^+\}p_j(t)-\sum_{i,s}\{[Q_{i,s}(t)]^++[Z_{i,s}(t)]^+\}u_{i,s}(t)\tag{31a}$$
$$\text{s.t.}\ \text{(13d)-(13i)}.\tag{31b}$$
• Compute $u_{i,s}(t)$ using (10), and update the three virtual queues using (25), (26), and (27).

B. Solution to subproblem (30)

The proportional fairness function $\phi(\boldsymbol{\gamma}(t))$ is a separable sum of individual logarithmic functions. Thus, solving (30) is equivalent to separately selecting, for each user $i\in\mathcal{N}^e_s$, $s\in\mathcal{S}^e$, the auxiliary variable $\gamma_{i,s}(t)\in[0,u^{\max}_{i,s}(t)]$ that minimizes the convex function $-V\log_2(1+\gamma_{i,s}(t))+[Z_{i,s}(t)]^+\gamma_{i,s}(t)$. Setting its derivative $-V/\big((1+\gamma_{i,s}(t))\ln 2\big)+[Z_{i,s}(t)]^+$ to zero gives $\gamma_{i,s}(t)=V/([Z_{i,s}(t)]^+\ln 2)-1$, which is then projected onto $[0,u^{\max}_{i,s}(t)]$. If $[Z_{i,s}(t)]^+=0$, the function is decreasing, so the upper end of the interval is optimal. Consequently, the closed-form solution to (30) is

$$\gamma_{i,s}(t)=\begin{cases}u^{\max}_{i,s}(t), & [Z_{i,s}(t)]^+=0,\\ \min\Big\{\Big[\dfrac{V}{[Z_{i,s}(t)]^+\ln 2}-1\Big]^+,\ u^{\max}_{i,s}(t)\Big\}, & \text{otherwise}.\end{cases}\tag{32}$$

C. Solution to subproblem (31)

To make (31) easier to tackle, we reduce its variable dimension and decompose it into several independent subproblems. The key observations on (31) are as follows. In (31), the network resources allocated to URLLC slices and MBB slices must be optimized simultaneously. However, the system bandwidth allocated to URLLC slices and MBB slices is coupled only through the total bandwidth constraint (12). We therefore decouple (12) and decompose (31) into a URLLC slice resource allocation subproblem and an MBB slice resource allocation subproblem.

1) Subproblem formulation of URLLC slice resource allocation: In (5), the URLLC bandwidth $W^u(t)$ and the total transmit power $p(W^u(t))$ are correlated. From (5), we can derive the following lemma, which characterizes the relationship between $p(W^u(t))$ and $W^u(t)$.

Lemma 4. $p(W^u(t))$ satisfies the following properties [45]:
• $p(W^u(t))$ decreases with $W^u(t)$ when $0<W^u(t)\le W^{\rm th}_u(t)$, where $W^{\rm th}_u(t)$ is the unique solution that minimizes $p(W^u(t))$.
• The derivative of $p(W^u(t))$, denoted by $\partial p(W^u(t))/\partial W^u(t)$, satisfies
$$\frac{\partial p(W^u(t))}{\partial W^u(t)}\begin{cases}<0, & 0<W^u(t)<W^{\rm th}_u(t),\\ =0, & W^u(t)=W^{\rm th}_u(t),\\ >0, & W^u(t)>W^{\rm th}_u(t).\end{cases}\tag{33}$$

This lemma shows that there is a value $W^{\rm th}_u(t)$ that minimizes the total required BS transmit power. Consequently, minimizing $W^u(t)$ while satisfying the BS transmit power constraint will not decrease the objective function value of (31). On the contrary, finding the minimum $W^u(t)$ that satisfies the BS transmit power constraint increases the value of $W^e(t)$. As $u_{i,s}(t)$ increases monotonically with $W^e(t)$ [45], the maximum value of the objective function of (31) is obtained when the minimum $W^u(t)$ is achieved. Hence, the optimal solution can still be obtained after the problem is decomposed into subproblems. We therefore formulate the URLLC slice resource allocation subproblem at slot $t$ as follows:

$$\min_{W^u(t)}\ W^u(t)\tag{34a}$$
$$\text{s.t.}\ 0<W^u(t)\le W^{\rm tot},\ \forall t\tag{34b}$$
$$\text{(5) is satisfied}.\tag{34c}$$

Based on the second property in Lemma 4, we conclude that $\partial p(W^{\rm th}_u(t))/\partial W^{\rm th}_u(t)=0$. A binary search method [45] can then be run on this condition to obtain $W^{\rm th}_u(t)$. In addition, solving (34) is equivalent to finding the $W^u(t)$ that satisfies $p(W^u(t))=p^{\max}_B$ on $W^u(t)\in(0,W^u_{\rm ub}(t)]$, with $W^u_{\rm ub}(t)=\min\{W^{\rm th}_u(t),W^{\rm tot}\}$. Similarly, binary search is used to obtain the minimum URLLC bandwidth $W^u_{\rm opt}(t)$ by searching $W^u(t)$ over the interval $(0,W^u_{\rm ub}(t)]$. As $p(W^u_{\rm opt}(t))=p^{\max}_B$, with $W^u_{\rm opt}(t)$ being the optimal $W^u(t)$, we have

$$p^B_{j,s}(t)=\frac{p^{\max}_B\,\big(h^B_{j,s}(t)\big)^{-2}}{\sum_{s\in\mathcal{S}^u}\sum_{l\in\mathcal{N}^u_s}\big(h^B_{l,s}(t)\big)^{-2}}.\tag{35}$$

2) Subproblem of MBB slice resource allocation: Given $W^u(t)$, the MBB slice resource allocation subproblem can be formulated as follows:

$$\min_{\mathcal{A}(t),\mathcal{P}(t),\mathcal{X}(t)}\ \sum_{j\in\mathcal{J}}\{V\rho+[H_j(t)]^+\}p_j(t)-\sum_{i,s}\{[Q_{i,s}(t)]^++[Z_{i,s}(t)]^+\}u_{i,s}(t)\tag{36a}$$
$$\text{s.t.}\ \text{(13d)-(13g), (6), (12)}.\tag{36b}$$

In (36), $\mathcal{P}(t)$ and $\mathcal{X}(t)$ are continuous variable sets, and $\mathcal{A}(t)$ is a 0-1 variable set. Besides, the constraint (13f) is non-convex in $\mathbf{v}_j(t)$, $\forall j$, and (36a) is non-convex in $\mathbf{v}_j(t)$ and $p_j(t)$, $\forall j$. Therefore, (36) is a challenging mixed-integer non-convex programming problem.

The iterative optimization scheme has been shown to be effective for solving mixed-integer programming problems [10]. We therefore adopt this type of scheme to solve (36), and first optimize the acceptance of MBB slice requests. Supported by the optimal slice access

enforcement, UAV movement control and UAV transmit power optimization are then performed.

a) Acceptance optimization of slice requests: Given the UAVs' locations $\mathcal{X}(t)$ and the UAV transmit power $\mathcal{P}(t)$, the acceptance set of slice requests of (36) can be optimized by solving the following problem:

$$\max_{\mathcal{A}(t)}\ \sum_{i,s}\sum_{j\in\mathcal{J}}c_{ij,s}(t)a_{ij,s}(t)\tag{37a}$$
$$\text{s.t.}\ \text{(6), (13g)},\tag{37b}$$

where $c_{ij,s}(t)=\{[Q_{i,s}(t)]^++[Z_{i,s}(t)]^+\}W^e(t)\log_2(1+{\rm SINR}_{ij,s}(t))$.

Note that at the initial time slot $t=1$, all weights $\{c_{ij,s}(1)\}$ equal zero because all virtual queues are initialized to zero. To tackle this issue, we define $c_{ij,s}(1)=W^e(1)\log_2(1+{\rm SINR}_{ij,s}(1))$. Problem (37) is a linear integer programming problem and can be solved efficiently by optimization tools such as MOSEK.

b) UAV movement control: As the constraint (13f) is non-convex in $\mathbf{v}_j(t)$, $\forall j,t$, the successive convex approximation (SCA) technique [46] is exploited to tackle the non-convexity. Instead of solving the non-convex problem directly, SCA obtains an approximate solution by solving a sequence of convex optimization problems, each constructed around a successive reference point. In this paper, we first use approximation functions to locally approximate the non-convex (36a) and (13f), based on the following assumption.

Assumption 1. A function $\tilde{f}:\mathcal{X}\times\mathcal{K}\to\mathbb{R}$ is called an approximation function of the non-convex function $f(x)$ ($x\in\mathcal{X}$) when the following conditions hold [46]:
• $\tilde{f}(\cdot,\cdot)$ is continuous in $\mathcal{X}\times\mathcal{K}$.
• $\tilde{f}(\cdot,x^{(r)})$ is convex in $\mathcal{X}$ for all $x^{(r)}\in\mathcal{K}$.
• Function value consistency: $\tilde{f}(x^{(r)},x^{(r)})=f(x^{(r)})$ for all $x^{(r)}\in\mathcal{X}$.
• Gradient consistency: $\partial\tilde{f}(x,x^{(r)})/\partial x\,|_{x=x^{(r)}}=\nabla f(x)|_{x=x^{(r)}}$ for all $x^{(r)}\in\mathcal{K}$.
• Upper bound: $f(x)\le\tilde{f}(x,x^{(r)})$ for all $x\in\mathcal{X}$, $x^{(r)}\in\mathcal{K}$.
• $\partial\tilde{f}(\cdot,\cdot)/\partial x$ is continuous in $\mathcal{X}\times\mathcal{K}$.

Then, given the UAV transmit power $\mathcal{P}(t)$, the UAVs' locations $\mathcal{X}(t-1)$ at the previous time slot $t-1$, and the acceptance set of slice requests $\mathcal{A}(t)$, the following proposition gives a method of controlling the movement of UAVs.

Proposition 1. By exploiting the SCA technique, the UAVs' locations at $t$ can be obtained by solving the following convex optimization problem:

$$\max_{\mathcal{X}(t),\{\eta_{i,s}(t)\},\{B_{ik,s}(t)\}}\ \sum_{i,s}\{[Q_{i,s}(t)]^++[Z_{i,s}(t)]^+\}\eta_{i,s}(t)\tag{38a}$$
$$\begin{aligned}\text{s.t.}\ &\sum_{j\in\mathcal{J}}a_{ij,s}(t)W^e(t)\Big(D^{(r)}_{i,s}(t)-\sum_{k\in\mathcal{J}}E^{(r)}_{ik,s}(t)\big(\|\mathbf{v}_k(t)-\mathbf{x}_{i,s}(t)\|_2^2-\|\mathbf{v}^{(r)}_k(t)-\mathbf{x}_{i,s}(t)\|_2^2\big)\Big)\\&+\sum_{j\in\mathcal{J}}a_{ij,s}(t)W^e(t)\tilde{R}_{ij,s}(t)\ge\eta_{i,s}(t),\ \forall i,t\end{aligned}\tag{38b}$$
$$B_{ik,s}(t)\le-\|\mathbf{v}^{(r)}_k(t)-\mathbf{x}_{i,s}(t)\|_2^2+2\big(\mathbf{v}^{(r)}_k(t)-\mathbf{x}_{i,s}(t)\big)^T\big(\mathbf{v}_k(t)-\mathbf{x}_{i,s}(t)\big),\ \forall i,k\ne j,t\tag{38c}$$
$$-\|\mathbf{v}^{(r)}_j(t)-\mathbf{v}^{(r)}_k(t)\|_2^2+2\big(\mathbf{v}^{(r)}_j(t)-\mathbf{v}^{(r)}_k(t)\big)^T\big(\mathbf{v}_j(t)-\mathbf{v}_k(t)\big)\ge d^2_{\min},\ \forall j,k\ne j,t\tag{38d}$$
$$\text{(13e) is satisfied},\tag{38e}$$

where $\eta_{i,s}(t)$ and $B_{ik,s}(t)$ are slack variables,

$$D^{(r)}_{i,s}(t)=\log_2\Big(N_0W^e(t)+\sum_{k\in\mathcal{J}}\frac{p_k(t)\theta_{ik,s}(t)}{|g_k(t)-g_{i,s}|^2+\|\mathbf{v}^{(r)}_k(t)-\mathbf{x}_{i,s}(t)\|_2^2}\Big),$$
$$E^{(r)}_{ik,s}(t)=\frac{p_k(t)\theta_{ik,s}(t)}{\big(|g_k(t)-g_{i,s}|^2+\|\mathbf{v}^{(r)}_k(t)-\mathbf{x}_{i,s}(t)\|_2^2\big)^2\,2^{D^{(r)}_{i,s}(t)}\ln 2},$$
$$\tilde{R}_{ij,s}(t)=-\log_2\Big(N_0W^e(t)+\sum_{k\in\mathcal{J}\setminus\{j\}}\frac{p_k(t)\theta_{ik,s}(t)}{|g_k(t)-g_{i,s}|^2+B_{ik,s}(t)}\Big),$$

and $\mathbf{v}^{(r)}_j(t)$, $\mathbf{v}^{(r)}_k(t)$ are the given locations at the $r$-th iteration.

Proof. Please refer to Appendix D in the technical report [40].

Remark: As (38) is convex, the optimization tool MOSEK can be used to solve it effectively. Owing to the approximation, the feasible domain of (38) is smaller than that of (36) with fixed $\mathcal{P}(t)$ and $\mathcal{A}(t)$. Thus, the value of (38a), if it exists, is the upper bound of the opposite of (36a) with given $\mathcal{P}(t)$ and $\mathcal{A}(t)$.

c) UAV transmit power optimization: For any given slice request acceptance set $\mathcal{A}(t)$ and UAVs' locations $\mathcal{X}(t)$, the following proposition gives a method of optimizing the UAV transmit power.

Proposition 2. By exploiting the SCA technique, the UAV transmit power at $t$ can be optimized by solving the following convex optimization problem:

$$\max_{\mathcal{P}(t),\{\eta_{i,s}(t)\}}\ -V\rho\sum_{j\in\mathcal{J}}p_j(t)-\sum_{j\in\mathcal{J}}[H_j(t)]^+p_j(t)+\sum_{i,s}\{[Q_{i,s}(t)]^++[Z_{i,s}(t)]^+\}\eta_{i,s}(t)\tag{39a}$$
$$\begin{aligned}\text{s.t.}\ &\sum_{j\in\mathcal{J}}a_{ij,s}(t)W^e(t)\big(\hat{R}_{i,s}(t)-F^{(r)}_{ij,s}(t)\big)\\&-\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{J}\setminus\{j\}}a_{ij,s}(t)W^e(t)G^{(r)}_{ik,s}(t)\big(p_k(t)-p^{(r)}_k(t)\big)\ge\eta_{i,s}(t),\ \forall i,s,t\end{aligned}\tag{39b}$$
$$\text{(13d) is satisfied},\tag{39c}$$

where $F^{(r)}_{ij,s}(t)=\log_2\big(N_0W^e(t)+\sum_{k\in\mathcal{J}\setminus\{j\}}p^{(r)}_k(t)h_{ik,s}(t)\big)$, $\hat{R}_{i,s}(t)=\log_2\Big(N_0W^e(t)+\sum_{k\in\mathcal{J}}\frac{p_k(t)\theta_{ik,s}(t)}{|g_k(t)-g_{i,s}|^2+\|\mathbf{v}_k(t)-\mathbf{x}_{i,s}\|_2^2}\Big)$, $G^{(r)}_{ik,s}(t)=\frac{h_{ik,s}(t)}{2^{F^{(r)}_{ij,s}(t)}\ln 2}$, and $p^{(r)}_k(t)$ is the given transmit power of UAV $k$ at the $r$-th iteration.

Proof. Please refer to Appendix E in the technical report [40].

Remark: MOSEK can now be used to solve the convex problem (39) effectively. Likewise, using the SCA technique makes the feasible domain of (39) smaller than that of (36) with given $\mathcal{A}(t)$ and $\mathcal{X}(t)$. Therefore, the optimal value of (39a), if it exists, is the upper bound of the opposite of (36a) with fixed $\mathcal{A}(t)$ and $\mathcal{X}(t)$.
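The closed-form rule (32) and the queue update (25) are simple enough to check numerically. The following Python sketch is an illustration, not the authors' implementation. It evaluates the auxiliary variable for one user and the $Q_{i,s}$ update for scalar inputs. The function names `aux_variable` and `update_Q` are hypothetical. The updates (26) and (27) for $Z_{i,s}(t)$ and $H_j(t)$ are deliberately not reproduced, because their exact forms are not restated in this section.

```python
import math

def aux_variable(V: float, Z: float, u_max: float) -> float:
    """Closed-form minimizer of -V*log2(1+g) + [Z]^+ * g over 0 <= g <= u_max, as in (32)."""
    z_plus = max(Z, 0.0)  # [Z_{i,s}(t)]^+
    if z_plus == 0.0:
        # The objective is decreasing in g, so the upper end of the interval is optimal.
        return u_max
    # Stationary point of the convex objective: V / ((1 + g) ln 2) = [Z]^+.
    g_star = V / (z_plus * math.log(2)) - 1.0
    # Project onto [0, u_max].
    return min(max(g_star, 0.0), u_max)

def update_Q(Q_prev: float, C_th: float, u_prev: float) -> float:
    """Virtual queue update (25): Q(t) = Q(t-1) + C_th - u(t-1)."""
    return Q_prev + C_th - u_prev

if __name__ == "__main__":
    # V = 2 follows the simulation setting; the queue and rate values are illustrative.
    print(aux_variable(V=2.0, Z=1.0, u_max=10.0))  # about 1.885
    print(update_Q(Q_prev=0.0, C_th=2.0, u_prev=1.5))
```

With $V=2$, a user whose backlog is $[Z_{i,s}(t)]^+=1$ receives $\gamma_{i,s}(t)=2/\ln 2-1\approx1.885$, unless $u^{\max}_{i,s}(t)$ is smaller.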
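The two binary searches used for (34) can be sketched in Python as follows. The sketch relies only on the shape that Lemma 4 guarantees: $p(W^u)$ first decreases, then increases, with a unique minimizer. Because (5) is not restated here, the example uses a hypothetical stand-in, $p(W)=4/W+W$, whose minimizer is $W=2$. The tolerance `eps` plays the role of $\epsilon$ in the complexity analysis, and the helper names are illustrative.

```python
from typing import Callable, Optional

def find_w_th(dp: Callable[[float], float], w_lo: float, w_hi: float,
              eps: float = 1e-6) -> float:
    """Locate W_th, where dp/dW changes sign from negative to positive (Lemma 4, (33))."""
    while w_hi - w_lo > eps:
        mid = 0.5 * (w_lo + w_hi)
        if dp(mid) < 0:
            w_lo = mid  # still on the decreasing branch of p(W)
        else:
            w_hi = mid  # at or beyond the minimizer
    return 0.5 * (w_lo + w_hi)

def find_w_opt(p: Callable[[float], float], p_max: float, w_ub: float,
               eps: float = 1e-6) -> Optional[float]:
    """Smallest W in (0, w_ub] with p(W) <= p_max, assuming p decreases on (0, w_ub]."""
    if p(w_ub) > p_max:
        return None  # infeasible: even W_ub needs more power than p_max
    w_lo, w_hi = 0.0, w_ub  # invariant: p(w_hi) <= p_max
    while w_hi - w_lo > eps:
        mid = 0.5 * (w_lo + w_hi)
        if p(mid) <= p_max:
            w_hi = mid
        else:
            w_lo = mid
    return w_hi

if __name__ == "__main__":
    W_TOT = 10.0                     # total bandwidth, toy units
    p = lambda w: 4.0 / w + w        # hypothetical stand-in for p(W^u) in (5)
    dp = lambda w: -4.0 / w ** 2 + 1.0
    w_th = find_w_th(dp, 0.0, W_TOT)
    w_ub = min(w_th, W_TOT)          # W^u_ub = min{W^th_u, W^tot}
    print(w_th, find_w_opt(p, 5.0, w_ub))
```

With $p^{\max}_B=5$ in this toy model, the first search returns $W^{\rm th}_u\approx2$. The second search returns $W^u_{\rm opt}\approx1$, the smaller root of $4/W+W=5$. If $p(W^u_{\rm ub})$ already exceeds $p^{\max}_B$, the sketch reports infeasibility instead of returning a bandwidth.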
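Equation (35) splits $p^{\max}_B$ across the URLLC links in inverse proportion to the squared BtU channel gain. Below is a minimal Python sketch under two assumptions: the gains $h^B_{j,s}(t)$ are positive scalars, and they are flattened into one list over all pairs $(j,s)$. The function name `split_bs_power` is hypothetical.

```python
from typing import List

def split_bs_power(p_max_B: float, gains: List[float]) -> List[float]:
    """Allocate p_max_B across links proportionally to h^{-2}, following (35)."""
    inv_sq = [h ** -2 for h in gains]  # (h^B_{j,s})^{-2} for every URLLC link
    total = sum(inv_sq)                # denominator of (35)
    return [p_max_B * w / total for w in inv_sq]

if __name__ == "__main__":
    # p_max_B = 50000 mW as in the simulation setting; the gains are illustrative.
    print(split_bs_power(50000.0, [1.0, 2.0]))
```

For gains $[1.0, 2.0]$ and $p^{\max}_B=50000$ mW, the weaker link receives 40000 mW and the stronger one 10000 mW. By construction, the allocations always sum to $p^{\max}_B$.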
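The SCA step in (38d) replaces the non-convex separation constraint with the first-order Taylor expansion of the convex function $\|\mathbf{d}\|_2^2$ at the previous iterate. The key identity is $\|\mathbf{d}\|_2^2-\big(-\|\mathbf{d}^{(r)}\|_2^2+2\mathbf{d}^{(r)T}\mathbf{d}\big)=\|\mathbf{d}-\mathbf{d}^{(r)}\|_2^2\ge0$. The linearization is therefore a global lower bound that is tight at the reference point, so satisfying the linearized constraint guarantees the original separation. The Python sketch below checks both properties on random positions. The helper names are hypothetical, and positions are plain tuples in metres.

```python
import random
from typing import Sequence

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def linearized_sq_dist(a: Sequence[float], b: Sequence[float],
                       a_ref: Sequence[float], b_ref: Sequence[float]) -> float:
    """-||a_ref - b_ref||^2 + 2 (a_ref - b_ref)^T (a - b): the left-hand side of (38d)."""
    d_ref = [x - y for x, y in zip(a_ref, b_ref)]
    d = [x - y for x, y in zip(a, b)]
    return -sum(x * x for x in d_ref) + 2 * sum(x * y for x, y in zip(d_ref, d))

if __name__ == "__main__":
    rng = random.Random(0)
    rand_pos = lambda: tuple(rng.uniform(0.0, 1000.0) for _ in range(3))
    v_j_ref, v_k_ref = rand_pos(), rand_pos()  # UAV positions at iterate r
    # Tight at the reference point (function value consistency).
    gap = linearized_sq_dist(v_j_ref, v_k_ref, v_j_ref, v_k_ref) - sq_dist(v_j_ref, v_k_ref)
    assert abs(gap) < 1e-6
    # Global lower bound for any candidate positions at iterate r + 1.
    for _ in range(1000):
        v_j, v_k = rand_pos(), rand_pos()
        assert linearized_sq_dist(v_j, v_k, v_j_ref, v_k_ref) <= sq_dist(v_j, v_k) + 1e-6
    print("linearization is a tight global lower bound")
```

The simulation setting uses $d_{\min}=35$ m. Any candidate pair of positions with `linearized_sq_dist(...) >= 35**2` is therefore at least 35 m apart, which is why solutions of the convexified problem respect the minimum UAV separation.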

Based on the above derivation, we next propose an iterative algorithm for (36), named iterative request, location and power optimization, which is summarized below.

Algorithm 3 Iterative request, location and power optimization
1: Initialization: Initialize $\mathcal{X}^{(0)}(t)=\mathcal{X}(t-1)$, $\mathcal{P}^{(0)}(t)=\mathcal{P}(t-1)$, and $\bar{r}_{\max}$; let $r=0$.
2: repeat
3:   Given $\mathcal{X}^{(r)}(t)$ and $\mathcal{P}^{(r)}(t)$, solve (37) to obtain the optimal solution $\mathcal{A}^{(r+1)}(t)$.
4:   Given $\mathcal{A}^{(r+1)}(t)$, $\mathcal{X}^{(r)}(t)$, and $\mathcal{P}^{(r)}(t)$, solve (38) to obtain the optimal solution $\mathcal{X}^{(r+1)}(t)$.
5:   Given $\mathcal{A}^{(r+1)}(t)$, $\mathcal{X}^{(r+1)}(t)$, and $\mathcal{P}^{(r)}(t)$, solve (39) to obtain the optimal solution $\mathcal{P}^{(r+1)}(t)$. Update $r=r+1$.
6: until convergence or $r=\bar{r}_{\max}$.

Finally, the energy-efficient and fair algorithm for solving the UAV network slicing problem is summarized in Algorithm 4.

Algorithm 4 Repeatedly Energy-Efficient and Fair Service coverage (RE²FS)
1: Initialization: Randomly initialize the UAVs' locations, and run the initialization steps of Algorithms 1 and 2.
2: Initialization: Let $Q_{i,s}(1)=0$, $Z_{i,s}(1)=0$, $H_j(1)=0$ for all $i\in\mathcal{N}^e_s$, $s\in\mathcal{S}^e$, $j\in\mathcal{J}$.
3: Pre-train ESN models and DNNs:
4: for each episode $\hat{t}=1,2,\ldots,500$ do
5:   Steps 3-19 of Algorithm 1.
6: end for
7: for each episode $\hat{t}=1,2,\ldots,3000$ do
8:   Steps 4-8 of Algorithm 2. Pre-train the DNN for UtG channel gain coefficient estimation for 1000 episodes.
9: end for
10: for each time slot $t=1,2,\ldots,T$ do
11:   Predict users' locations using the distributed ESN learning method:
12:   Run step 19 of Algorithm 1 to obtain the predicted locations $\hat{y}^B_{i,s}(t+K)$ of user $i\in\mathcal{N}^e_s$ for all $s\in\mathcal{S}^e$.
13:   Estimate BtU and UtG channel gain coefficients using the DNNs:
14:   Observe the states $s^{ul}(t+K)$ and $s^{dl}(t+K)$. Use the DNNs to estimate the corresponding channel gain coefficients $\theta_{ij,s}(t+K)$ and $\theta_{jB}(t+K)$. Then, calculate the channel gains $h_{ij,s}(t+K)$, $\forall i\in\mathcal{N}^e_s$, $j\in\mathcal{J}$, $s\in\mathcal{S}^e$, and $h^B_{j,s}(t+K)$, $\forall j\in\mathcal{N}^u_s$, $s\in\mathcal{S}^u$, using (2) and (1), respectively.
15:   Slice resource allocation:
16:   Compute $\gamma_{i,s}(t+K)$ using (32) for all $i$ and $s$.
17:   Call the binary search method to obtain the optimal $W^u(t+K)$, and calculate $W^e(t+K)$ using (12).
18:   Find the acceptance set of slice requests $\mathcal{A}(t+K)$, the UAVs' locations $\mathcal{X}(t+K)$, and the UAV transmit power $\mathcal{P}(t+K)$ using Algorithm 3.
19:   Update the ESN models and DNNs:
20:   Steps 5-19 of Algorithm 1.
21:   Steps 4-8 of Algorithm 2. Likewise, train the DNN for UtG channel gain coefficient estimation.
22:   Update virtual queues:
23:   Calculate $u_{i,s}(t+K)$ for all $i$ and $s$ using (10). Calculate $p^{\rm tot}_j(t+K)$ for all $j$ using (8).
24:   Update $Q_{i,s}(t+K+1)$, $Z_{i,s}(t+K+1)$, and $H_j(t+K+1)$ for all $i$, $s$, and $j$ using (25), (26), and (27).
25: end for

VII. IMPLEMENTATION AND PERFORMANCE ANALYSIS OF ALGORITHM 4

In this section, we first summarize the implementation of Algorithm 4 and then analyze its convergence and computational complexity.

In Algorithm 4, steps 3-9 are performed before the communication task is executed. This ensures that accurate predicted user locations and estimated channel gain coefficients are available as inputs when steps 11-18 are called. Fig. 3 depicts the working diagram of Algorithm 4. Block ① is called first for ESN model and DNN pre-training, and then the logical flow ② → ③ → ④ → ⑤ → ⑥ → ② is executed cyclically. In this logical flow, ② and ③ denote the prediction of users' locations and the estimation of the BtU & UtG channels, ④ and ⑤ represent resource optimization and allocation, and ⑥ denotes the update of the ESN models and DNNs. Besides, Algorithm 4 can effectively tackle the mismatch between slice supply and demand. Virtually isolating the UAV network resources and functions is time-consuming. However, this process is expected to be completed within the time interval $(t,t+K)$, based on the predicted users' locations $\{\hat{y}^B_{i,s}(t+K)\}$, $\forall i,s,t$. At time slot $t+K$, the created and configured network slices are used to accommodate the QoS requirements of UAV control and non-payload links and to serve ground mobile users. In summary, Algorithm 4 partitions network slices in advance; thus, we call it proactive UAV network slicing.

The following lemma shows that the performance of Algorithms 3 and 4 can be guaranteed.

Lemma 5. Algorithm 3 is convergent, and Algorithm 4 makes all virtual queues mean-rate stable.

Proof. Please refer to Appendix F in the technical report [40].

We analyze the computational complexity of Algorithm 4 in terms of its two main contributors: Algorithm 1 and the procedures for solving (30) and (31). For Algorithm 1, (19), (20), and (21) must be computed iteratively until convergence or until the maximum number of iterations $r_{\max}$ is reached. Besides, (15) must be called $K$ times to obtain users' predicted locations. Therefore, the worst-case complexity of calling Algorithm 1 is $O(f_1)=O\big(N^e\big(r_{\max}((N_i+N_r)^3N_o+(N_i+N_r)^2Q+(N_i+N_r)N_oQ)+(N_i+N_r)N_oK\big)\big)$. The second contributor can be further divided into five parts: the complexities of solving (30), (34), (37), (38), and (39). As (30) has a closed-form solution, its complexity is $O(7N^e)$ in the worst case. Binary search methods with complexity $O\big(\log_2(W^{\rm tot}/\epsilon)+\log_2(W^u_{\rm ub}(t)/\epsilon)\big)$ are used to solve (34) and obtain the optimal $W^u(t)$, where

Fig. 3. Working diagram of Algorithm 4. [Blocks: UAV network information (locations, power & queues); communication coverage scenario; fresh and historical data; learning (ESN for user location prediction and DNN for BtU & UtG channel estimation, each with pre-train and update stages); optimization (URLLC slice resource optimization via binary search; MBB slice resource optimization via slice access enforcement, UAV movement control, and UAV transmit power control); resource allocation.]

$\epsilon$ represents the error tolerance. The complexity of solving the linear integer programming problem (37) is $O((N^e+1)^J)$. The complexities of solving the convex problems (38) and (39) are $O((J+(J+1)N^e)^{3.5})$ and $O((J+N^e)^{3.5})$, respectively [47]. Besides, (37), (38), and (39) must be solved iteratively until convergence or until the maximum number of iterations is reached. Therefore, the worst-case complexity of solving (31) is $O(f_2)=O\big(\log_2(W^{\rm tot}/\epsilon)+\log_2(W^u_{\rm ub}(t)/\epsilon)+\bar{r}_{\max}((N^e+1)^J+(J+(J+1)N^e)^{3.5}+(J+N^e)^{3.5})\big)$. In summary, the total worst-case complexity of Algorithm 4 is $O(T(f_1+7N^e+f_2))$. This total complexity is exponential in the number of UAVs $J$, which is small. Moreover, the actual complexity is usually much smaller than the worst case.

VIII. SIMULATION RESULTS

A. Comparison algorithms and parameter setting

To verify the effectiveness of the proposed algorithm, we compare it with two benchmark algorithms:
1) Static UAV-based (SUAV) algorithm: SUAV differs from RE²FS in the scheme used to control the movement of UAVs. SUAV randomly deploys $J$ hovering UAVs at an altitude of 50 m over the area of interest.
2) Circular Trajectory-based (CCT) algorithm: each UAV flies along a circular trajectory at a speed of 10 m/s. At the beginning of the simulation, the UAVs (at an altitude of 50 m) are deployed in a line, and the distance between two adjacent UAVs is $1/(2J)$ km. The horizontal locations of the first and the last UAVs are $[1/2+1/(4J);1/2]$ km and $[1-1/(4J);1/2]$ km, respectively, and their turning radii are $1/(4J)$ km and $1/2-1/(4J)$ km. Besides, CCT adopts the same slice request acceptance and UAV transmit power optimization schemes as RE²FS.
The comparison algorithms are implemented in TensorFlow 1.15.0 and Python 3.7 on a PC with a Core i7 processor, 8 GB RAM, and Windows 10.

We consider an urban area of size 1 × 1 km² with high-rise buildings in the simulation. This scenario is the most challenging environment for slicing the UAV network, since the LoS/NLoS links may change frequently as UAVs fly. To accurately simulate the UtG and BtU channel gains in this environment, we generate the building locations and heights from a realization of a local building model suggested by the International Telecommunication Union (ITU) [48]. The statistical parameters are α = 0.3 and β = 300 buildings/km², and γ is modelled as a Rayleigh distribution with mean value σ = 30 m. For convenience, building heights are clipped to at most 40 m. The BS antenna model follows the antenna model of 3GPP TR 36.873 given in Table 7.1-1 of [49], where an eight-element uniform linear array is placed vertically. Each array element is directional, with half-power beamwidths of 65° in both the vertical and horizontal dimensions. To simulate the signal strength measured by the UAVs, we first check the presence or absence of an LoS link between a UAV and a ground user based on the building realization. Likewise, to simulate the signal strength measured by the BS, we determine whether an LoS link exists between the BS and a UAV. Then, we generate the BtU and UtG path losses using the 3GPP TR 36.777 path-loss model for urban macro given in Table B-2 of [50]. Small-scale fading is added assuming Rayleigh fading for the NLoS case and Rician fading with a 15 dB Rician factor for the LoS case [24].

To test the practicality of the distributed ESN learning method, a realistic pedestrian movement dataset extracted from a GitHub website⁵ is used in the simulation. The dataset is built from 12000 Twitter records collected near Oxford Street, London, on 14 March 2018. It records the GPS position information of $N^e$ mobile users who tweeted more than two times. Linear interpolation was used to obtain additional samples and describe users' movement in more detail. The 2D trajectory of each user was then linearly scaled into the simulated area of size 1 × 1 km², which yields the trajectory of each user. The turntable game in [44] was also used to set the required data rates of ground users, with $C^{\rm th}_s\in\{1,2,4\}$ Mbps.

⁵ https://github.com/pswf/Twitter-Dataset/blob/master/Dataset. Our algorithm accommodates other realistic pedestrian movement datasets.

Additionally, the parameters related to URLLC slices are as follows: we consider one URLLC slice and set $\tau^{\rm req}_{s,u}=5$ ms, $\epsilon^{\rm req}_{s,u}=10^{-7}$, $b^{\rm req}_{s,u}=160$ bits, $p^{\max}_B=50000$ mW, and the user height $g_{i,s}=1.8$ m, $\forall i,s$. The parameters related to MBB slices and UAV power consumption are as follows: we consider three types of MBB slices

and set the UAV altitude $g_j(t)=50$ m, the circuit power $p^c_j=20$ mW, $\hat{p}_j=1650$ mW, $\tilde{p}_j=1500$ mW, $p_0=3.4$ mW, $p_1=118$ mW, $v^s_j=10$ m/s, $U_{\rm tip}=60$ m/s, $v_0=5.4$ m/s, and $A_j=3.1\times10^{-3}$ [34], [51], $\forall j$. Besides, we let $e_{\max}=50$ m and $d_{\min}=35$ m, and approximate $u^{\max}_{i,s}(t)$ as $W^{\rm tot}\log_2\big(1+(\hat{p}_j-p^c_j)\theta_{ij,s}(t)/(N_0W^{\rm tot}|g_j(t)-g_{i,s}|^2)\big)$. The other learning-related parameters are set as follows: $r_{\max}=100$, $\bar{r}_{\max}=1000$, the sample number $Q=6$, the number of future time slots $K=10$, $N_i=2$, $N_o=2$, $N_r=300$, $\lambda=0.001$, the step size $\eta=0.01$, and $\xi=0.001$. For each DNN, the first hidden layer has 512 neurons and the second hidden layer has 256 neurons. The learning rate of each DNN is 0.001, the minibatch size is $|\mathcal{T}_t|=64$, $\forall t$, and the buffer capacity is $C=10^6$. The remaining system parameters are as follows: $T_p=1$, the carrier frequency $f_c=2.0$ GHz, the speed of light $c=3.0\times10^8$ m/s, $G_j=G_r=1$ dBi, $\forall j$, the total bandwidth $W^{\rm tot}=10$ MHz, the noise power spectral density $N_0=-235$ dBm/Hz, $T=500$, the coefficients $\rho=0.01$ and $V=2$, and the 3D location of the BS $\mathbf{x}^{\rm 3D}_B=[25;37.5;25]$ m.

B. Performance evaluation

To comprehensively evaluate the accuracy and availability of the developed learning methods and optimization framework, we present the performance of the distributed learning method, the online channel gain coefficient learning methods, and the Lyapunov-based optimization framework, respectively. In this simulation, we first set the number of UAVs to $J=3$ and the number of mobile users to $N^e=64$.

First, to validate the accuracy of the distributed learning method in predicting users' locations, we plot the actual trajectory of a randomly selected user and its predicted trajectory in Fig. 4. The prediction accuracy for the trajectories of the 64 mobile users, measured by the mean square error (MSE), is plotted in Fig. 5. From these figures, we can observe the following. 1) When users do not change their heading directions quickly, the method predicts their locations accurately. When users change their moving directions quickly, the method loses track of their future locations. However, it recaptures them after the ESN models are retrained on newly collected location samples. 2) The MSE between the predicted and actual trajectories of the 64 mobile users does not exceed 0.8. Therefore, we may conclude that the developed distributed learning method can be used to predict users' locations.

Fig. 4. Comparison of true and predicted trajectories of a user. (a) Comparison of x-coordinate; (b) Comparison of y-coordinate; (c) Comparison of a user's 2D coordinates. [Panels (a) and (b) plot the true and predicted coordinates (m) versus time slot (0-500); panel (c) plots the user's true and predicted 2D coordinates (m).]

Fig. 5. Prediction accuracy of the distributed ESN learning method. [Two panels plot the MSE of the x- and y-coordinates of users' predicted trajectories.]

Fig. 6. Loss values of DNNs for UtG channel gain coefficient estimation versus time slot. [Loss versus time slot (0-1200) for UAV 1, UAV 2, and UAV 3, with insets over slots 800-1200.]

Next, to test the accuracy of the online channel gain coefficient estimation methods, we plot in Figs. 6 and 7 the trend of the loss, calculated by (23), between the estimated coefficient values and the target coefficient values. Fig. 6 shows the loss values of the DNNs for UtG channel gain coefficient estimation. In this figure, the loss values in the first 736 time slots reflect the pre-training accuracy; the initial 264 time slots are discarded to alleviate the impact of noise. The loss values in the later 500 time slots reflect the online learning accuracy. Fig. 7 shows the loss of the DNN for BtU channel gain coefficient estimation between the BS and the first UAV. Similarly, the values in the first 2736 time slots and the later 500 time slots reflect the pre-training and online learning accuracy, respectively. Besides, in Fig. 8, we plot the trend of the instantaneous energy efficiency, defined as $E(t)=\sum_{i,s}\log_2(1+u_{i,s}(t))-\rho\sum_{j\in\mathcal{J}}p^{\rm tot}_j(t)$, both with perfect channel state information (CSI) and in the presence of DNN estimation errors. From these figures,

100 0.012
0.8
80 0.01
0.6
0.008
60 0.4

W
0.006
40 0.2

0 0.004
20 2800 2900 3000 3100 3200
0.002
0
0 500 1000 1500 2000 2500 3000 0
Time slot 2 4 6 8 10 12
Iteration times, r

Fig. 7. Loss values of the DNN for BtU channel gain coefficient
estimation versus time slot. Fig. 9. The convergence curve of the ADMM method explored in the
0 optimization framework.
Instantaneous energy efficiency, E(t)

14

-50 12

10

-100 8

f
6
-150
2
4
RE FS
perfect CSI, RE2FS 2
-200
0 100 200 300 400 500 0
Time slot 2 4 6 8 10 12
Iteration times, r
Fig. 8. Trends of instantaneous energy efficiency versus time slot.
Fig. 10. The convergence curve of the iterative optimization scheme
adopted in the optimization framework.

we can observe that: 1) the estimation error is great initially,


but it quickly decreases with the increase of time slots as 25
Queue stability values
2 30
SQ SZ
more experience is accumulated. For example, the obtained 20 1
20

loss value is smaller than 0.2 after twenty time slots. Besides, 10

15 0 0
after a number of time slots, DNNs for BtU and UtG channel 0 100 200 300
2
400 500 0 100 200 300 400 500

gain coefficient estimation can converge; 2) although actual 10


1
SH

BtU and UtG channel gain coefficients vary fast due to the 5 0
movement of users and UAVs, the estimation method can 0 100 200 300 400 500

achieve good estimation results. For example, during the 0


0 100 200 300 400 500
online learning period, the loss values of DNNs for U2G Time slot
channel gain coefficient estimation reach an order of less than
1.5e-1, and the loss value of the DNN for BtU channel gain Fig. 11. Trend of virtual queue stability values versus time slot.
1000
coefficient estimation is smaller than 0.65; 3) the obtained
E(t) of RE2 FS in the presence of DNN estimation errors is 800
y-coordinate (m)

close to that in the case of perfect CSI. These observations


600
show that DNNs can well estimate channel gain coefficients.
Third, to verify the validity of the Lyapunov-based optimization framework, we plot its convergence curves. From Algorithm 4, the convergence of the framework is determined by that of the ADMM method and of the iterative optimization scheme. We therefore plot the convergence curves of the ADMM method and the iterative optimization scheme in Figs. 9 and 10. Following the principle of ADMM, we measure its convergence by $\Delta W = \|\hat{W}_t^{(r+1)} - \hat{W}_t^{(r)}\|_F$. If $\Delta W$ tends to zero after a limited number of iterations, the ADMM method converges [39]. The iterative optimization scheme converges if $\Delta f$ approaches zero after a limited number of iterations, where $\Delta f = |f(A^{(r+1)}(t), P^{(r+1)}(t), X^{(r+1)}(t)) - f(A^{(r)}(t), P^{(r)}(t), X^{(r)}(t))|$ and $f(A^{(r)}(t), P^{(r)}(t), X^{(r)}(t))$ is calculated by (36a). Then, in Fig. 11, we plot the trend of the virtual queue stability values, defined as $S_Q = \max_{i,s} [Q_{i,s}(t)]^+/t$, $S_Z = \max_{i,s} [Z_{i,s}(t)]^+/t$, and $S_H = \max_{j \in \mathcal{J}} [H_j(t)]^+/t$. Besides, the trajectories of the three UAVs in the first 50 time slots and their final 2D locations are plotted in Fig. 12.

Fig. 12. Trajectories of three UAVs projected in a 2D space. (Plot: y-coordinate (m) versus x-coordinate (m); trajectory and final position of UAVs 1, 2, and 3.)

The following observations can be made from these figures: 1) both the ADMM method and the iterative optimization scheme converge after several iterations; therefore, the optimization framework is convergent; 2) during the learning period, the obtained queue stability values are bounded; 3) the obtained queue stability values tend to zero as the time slot increases; as a result, all introduced virtual queues are mean-rate stable according to (28), i.e., all time average constraints
in (24) can be satisfied. This result verifies the effectiveness of the Lyapunov-based optimization framework; 4) since the value of $S_Q$ tends to zero, the rate requirements of served users can be satisfied, which means that $W^e(t)$ is non-zero. If the URLLC requirements of UAV control and non-payload information delivery were not satisfied, then $W^{tot}$ would be allocated to URLLC slices. In this case, all MBB slices would be released and $S_Q$ would increase monotonically with $t$, which is not the case in Fig. 11. Therefore, the URLLC requirements of UAV control and non-payload information transmission can be accommodated; 5) UAV movement constraints can be met at each time slot; 6) as users frequently appear in the upper right corner of the considered area, the UAVs tend to move toward this corner. In this way, the QoS requirements of ground users can be met while the UAV transmit power is reduced.

After that, we show the impact of the medium and high SNR approximation ($V_{j,s}(t) \approx 1, \forall j \in \mathcal{J}, s \in \mathcal{S}^u$) on the BS transmit power consumption. Table I reports the impact of the approximation $V_{j,s}(t) \approx 1$ on the SNR and on the corresponding BS transmit power required to ensure that SNR. The results are obtained with one deployed UAV, and the other parameters are set to their default values. By changing the system bandwidth allocated to the UAV, the SNR experienced by the UAV is varied from 5 dB to 20 dB. Exploiting the approximation $V_{j,s}(t) \approx 1$ yields an upper bound on the BS transmit power. The results demonstrate that the approximation is tight when the received SNR is higher than 5 dB. When the received SNR is 5 dB, the gap between the upper bound and the accurate BS transmit power is around 1%.

TABLE I
IMPACT OF THE APPROXIMATION $V_{j,s}(t) \approx 1$

Received SNR with approximated V ≈ 1 (dB)        5         10        15        20
Received SNR with accurate V in (3) (dB)         4.9168    9.9872    14.9983   19.9998
BS transmit power with approximated V ≈ 1 (dBm)  1.7159    2.5224    6.7376    81.5712
BS transmit power with accurate V in (3) (dBm)   1.6984    2.5156    6.7325    81.5552

Next, we verify the effectiveness of the proposed RE$^2$FS algorithm by comparing it with two benchmark algorithms, CCT and SUAV. To measure the effectiveness, we introduce two key performance indicators: the energy efficiency, computed by (13a), and Jain's fairness index, defined as
$$\frac{\left(\sum_{i,s} \bar{u}_{i,s}\right)^2}{N^e \sum_{i,s} \bar{u}_{i,s}^2}, \qquad \bar{u}_{i,s} = \frac{1}{T}\sum_{t=1}^{T} u_{i,s}(t).$$

We first plot the impact of the number of users on the energy efficiency and Jain's fairness index obtained by all comparison algorithms. Figs. 13 and 14 show the trends of the obtained energy efficiency and Jain's fairness index of all algorithms, respectively, when $J = 3$ and $N^e \in \{16, 32, 64, 96, 128\}$.

Fig. 13. Energy efficiency versus the number of users (RE$^2$FS, CCT, and SUAV).

Fig. 14. Jain's fairness index versus the number of users (RE$^2$FS, CCT, and SUAV).

From these figures, we can observe that: 1) the proposed RE$^2$FS achieves the highest energy efficiency. For example, compared with the CCT and SUAV algorithms, RE$^2$FS improves the energy efficiency by 17.41% and 20.04%, respectively, when $N^e = 96$; 2) the energy efficiency achieved by all comparison algorithms increases with the number of users, as more users can be served; 3) the proposed RE$^2$FS achieves the highest Jain's fairness index. For instance, when $N^e = 16$, compared with the SUAV and CCT algorithms, RE$^2$FS improves the fairness index by 25.03% and 31.30%, respectively; 4) when $N^e \le 64$, the fairness index achieved by SUAV is greater than that of CCT. Because the communication coverage of a UAV is limited, regular UAV trajectories may create more coverage holes when the number of users is small (e.g., not more than 64). In contrast, mobile UAVs can serve more users when the number of users is large (e.g., greater than 64). As a result, CCT provides fairer coverage than SUAV when $N^e > 64$; 5) as $N^e$ appears in the denominator of Jain's fairness index, the fairness indexes obtained by all comparison algorithms with $N^e = 16$ are greater than those obtained with $N^e = 128$.

We then illustrate the impact of the number of UAVs on the energy efficiency and Jain's fairness index of all comparison algorithms. Figs. 15 and 16 show the trends of these performance indicators when $N^e = 64$. From these figures, we can observe that: 1) owing to the effective UAV movement control, the proposed RE$^2$FS achieves the highest energy efficiency and Jain's fairness index. For instance, compared with the benchmark algorithms, the minimum improvement of RE$^2$FS in energy efficiency and Jain's fairness index is 3.96% and 2.91%, respectively; 2) we cannot conclude that the performance indicators obtained by all comparison algorithms increase or decrease monotonically with the number of UAVs. More mobile users can be served simultaneously when the number of UAVs increases, which indicates that the energy efficiency and the fairness index may be
raised. However, more UAVs cause greater interference and consume more energy, which decreases the energy efficiency; meanwhile, owing to the strong interference, some mobile users may experience coverage interruptions, which lowers the fairness index; 3) when the number of UAVs is varied, the SUAV algorithm, in which UAVs hover, does not outperform the CCT algorithm, in which UAVs follow circular trajectories. In summary, the above results show that, by exploiting the proposed RE$^2$FS algorithm, the URLLC requirements of UAV control and non-payload information delivery can be accommodated, and the UAV network can provide energy-efficient and fair MBB services for ground mobile users.

Fig. 15. Energy efficiency versus the number of UAVs (RE$^2$FS, CCT, and SUAV).

Fig. 16. Jain's fairness index versus the number of UAVs (RE$^2$FS, CCT, and SUAV).

IX. CONCLUSION

This paper investigated a proactive UAV network slicing problem and formulated it as a sequential decision problem with the goal of providing energy-efficient and fair service coverage for MBB users while satisfying the URLLC requirements of UAV control and non-payload signal delivery. This problem was shown to be a mixed-integer non-convex optimization problem whose solution also requires accurate mobile user locations and channel gain models. To solve it, we proposed a new approach combining learning and optimization methods. Specifically, we first developed a distributed learning method to predict mobile users' locations, based on which we built analytically tractable DNN-based channel gain models. Then, we proposed a Lyapunov-based optimization framework to decompose the original problem into several repeated optimization subproblems based on the learned results. Finally, these subproblems were solved by exploiting an SCA technique and an iterative optimization scheme. Simulation results were provided to show the accuracy of the learning methods and to verify the effectiveness of the Lyapunov-based optimization framework.

REFERENCES

[1] P. Yang, X. Xi, T. Q. S. Quek, J. Chen, and X. Cao, “Repeatedly energy-efficient and fair service coverage: UAV slicing,” in 2020 IEEE Global Communications Conference (Globecom). IEEE, 2020, pp. 1–6.
[2] L. Gupta, R. Jain, and G. Vaszkun, “Survey of important issues in UAV communication networks,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1123–1152, 2016.
[3] X. Cao, P. Yang, M. Alzenad, X. Xi, D. Wu, and H. Yanikomeroglu, “Airborne communication networks: A survey,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 1907–1926, 2018.
[4] A. E. Garcia, S. Hofmann, C. Sous, L. Garcia, A. Baltaci, C. Bach, R. Wellens, D. Gera, D. Schupke, and H. E. Gonzalez, “Performance evaluation of network slicing for aerial vehicle communications,” in 2019 IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, 2019, pp. 1–6.
[5] W. Sarah, “Autonomous drone services tested over intercontinental 5G,” https://5g.co.uk/news/drone-services-over-intercontinental-5g/4374/, 2018.
[6] H. Hellaoui, O. Bekkouche, M. Bagaa, and T. Taleb, “Aerial control system for spectrum efficiency in UAV-to-cellular communications,” IEEE Communications Magazine, vol. 56, no. 10, pp. 108–113, 2018.
[7] G. K. Xilouris, M. C. Batistatos, G. E. Athanasiadou, G. Tsoulos, H. B. Pervaiz, and C. C. Zarakovitis, “UAV-assisted 5G network architecture with slicing and virtualization,” in 2018 IEEE Globecom Workshops (GC Wkshps). IEEE, 2018, pp. 1–7.
[8] G. Faraci, C. Grasso, and G. Schembra, “Design of a 5G network slice extension with MEC UAVs managed with reinforcement learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2356–2371, 2020.
[9] H. Ren, C. Pan, K. Wang, Y. Deng, M. Elkashlan, and A. Nallanathan, “Achievable data rate for URLLC-enabled UAV systems with 3-D channel model,” IEEE Wireless Communications Letters, vol. 8, no. 6, pp. 1587–1590, 2019.
[10] O. Abbasi, H. Yanikomeroglu, A. Ebrahimi, and N. Mokari, “Trajectory design and power allocation for drone-assisted NR-V2X network with dynamic NOMA/OMA,” IEEE Transactions on Wireless Communications, 2020, in press. DOI: 10.1109/TWC.2020.3008568.
[11] C. Pan, H. Ren, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint blocklength and location optimization for URLLC-enabled UAV relay systems,” IEEE Communications Letters, vol. 23, no. 3, pp. 498–501, 2019.
[12] A. Ranjha and G. Kaddoum, “Quasi-optimization of distance and blocklength in URLLC aided multi-hop UAV relay links,” IEEE Wireless Communications Letters, vol. 9, no. 3, pp. 306–310, 2019.
[13] K. Wang, C. Pan, H. Ren, W. Xu, L. Zhang, and A. Nallanathan, “Packet error probability and effective throughput for ultra-reliable and low-latency UAV communications,” IEEE Transactions on Communications, 2020, in press. DOI: 10.1109/TCOMM.2020.3025578.
[14] K. Chen, Y. Wang, Z. Fei, and X. Wang, “Power limited ultra-reliable and low-latency communication in UAV-enabled IoT networks,” in 2020 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2020, pp. 1–6.
[15] A. Ranjha and G. Kaddoum, “Quasi-optimization of uplink power for enabling green URLLC in mobile UAV-assisted IoT networks: A perturbation-based approach,” IEEE Internet of Things Journal, 2020, in press. DOI: 10.1109/JIOT.2020.3014039.
[16] A. Ranjha and G. Kaddoum, “URLLC facilitated by mobile UAV relay and RIS: A joint design of passive beamforming, blocklength and UAV positioning,” IEEE Internet of Things Journal, 2020, in press. DOI: 10.1109/JIOT.2020.3027149.
[17] R. Ding, F. Gao, and X. Shen, “3D UAV trajectory design and frequency band allocation for energy-efficient and fair communication: A deep reinforcement learning approach,” IEEE Transactions on Wireless Communications, 2020, in press. DOI: 10.1109/TWC.2020.3016024.
[18] H. Wu, F. Lyu, C. Zhou, J. Chen, L. Wang, and X. Shen, “Optimal UAV caching and trajectory in aerial-assisted vehicular networks: A learning-based approach,” IEEE Journal on Selected Areas in Communications, 2020, in press. DOI: 10.1109/JSAC.2020.3005469.
[19] A. Al-Hilo, M. Samir, C. Assi, S. Sharafeddine, and D. Ebrahimi, “UAV-assisted content delivery in intelligent transportation systems-joint trajectory planning and cache management,” IEEE Transactions on Intelligent Transportation Systems, 2020, in press. DOI: 10.1109/TITS.2020.3020220.
[20] R. Mudumbai, S. K. Singh, and U. Madhow, “Medium access control for 60 GHz outdoor mesh networks with highly directional links,” in IEEE INFOCOM, 2009, pp. 2871–2875.
[21] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Wireless Communications Letters, vol. 3, no. 6, pp. 569–572, 2014.
[22] M. M. Azari, F. Rosas, K. Chen, and S. Pollin, “Ultra reliable UAV communication using altitude and cooperation diversity,” IEEE Transactions on Communications, vol. 66, no. 1, pp. 330–344, 2018.
[23] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1046–1061, 2017.
[24] Y. Zeng, X. Xu, S. Jin, and R. Zhang, “Simultaneous navigation and radio mapping for cellular-connected UAV with deep reinforcement learning,” IEEE Transactions on Wireless Communications, 2021, in press. DOI: 10.1109/TWC.2021.3056573.
[25] 3GPP, “Study on scenarios and requirements for next generation access technologies (Release 14), v14.2.0,” The 3rd Generation Partnership Project, Tech. Rep. 38.913, May 2017.
[26] T. Kashima, J. Qiu, H. Shen, C. Tang, T. Tian, X. Wang, X. Hou, H. Jiang, A. Benjebbour, Y. Saito et al., “Large scale massive MIMO field trial for 5G mobile communications system,” in 2016 International Symposium on Antennas and Propagation (ISAP). IEEE, 2016, pp. 602–603.
[27] 5G PPP, “5G architecture white paper, version 3.0,” https://5g-ppp.eu/white-papers/, Feb. 2020.
[28] A. Devlic, A. Hamidian, D. Liang, M. Eriksson, A. Consoli, and J. Lundstedt, “NESMO: Network slicing management and orchestration framework,” in IEEE International Conference on Communications (ICC), 2017, pp. 1202–1208.
[29] P. Rost, C. Mannweiler, D. S. Michalopoulos, C. Sartori, V. Sciancalepore, N. Sastry, O. Holland, S. Tayade et al., “Network slicing to enable scalability and flexibility in 5G mobile networks,” IEEE Communications Magazine, vol. 55, no. 5, pp. 72–79, 2017.
[30] H. Yang, K. Zhang, K. Zheng, and Y. Qian, “Joint frame design and resource allocation for ultra-reliable and low-latency vehicular networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 5, pp. 3607–3622, 2020.
[31] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint pilot and payload power allocation for massive-MIMO-enabled URLLC IIoT networks,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 816–830, 2020.
[32] S. Schiessl, J. Gross, and H. Al-Zubaidy, “Delay analysis for wireless fading channels with finite blocklength channel coding,” in Proceedings of the 18th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. ACM, 2015, pp. 13–22.
[33] R. Ni, X. Li, J. Chen, S. Chen, E. Wang, M. Zhu, W. Zhang, and Y. Chen, “An end-to-end demonstration for 5G network slicing,” in 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring). IEEE, 2019, pp. 1–5.
[34] Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.
[35] T. Liu, M. Cui, G. Zhang, Q. Wu, X. Chu, and J. Zhang, “3D trajectory and transmit power optimization for UAV-enabled multi-link relaying systems,” IEEE Transactions on Green Communications and Networking, 2020, in press. DOI: 10.1109/TGCN.2020.3048135.
[36] D. Xu, Y. Sun, D. W. K. Ng, and R. Schober, “Multiuser MISO UAV communications in uncertain environments with no-fly zones: Robust trajectory and resource allocation design,” IEEE Transactions on Communications, vol. 68, no. 5, pp. 3153–3172, 2020.
[37] Y. Sun, D. Xu, D. W. K. Ng, L. Dai, and R. Schober, “Optimal 3D-trajectory design and resource allocation for solar-powered UAV communication systems,” IEEE Transactions on Communications, vol. 67, no. 6, pp. 4281–4298, 2019.
[38] S. Scardapane, D. Wang, and M. Panella, “A decentralized training algorithm for echo state networks in distributed big data applications,” Neural Networks, vol. 78, pp. 65–74, 2016.
[39] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[40] P. Yang, X. Xi, K. Guo, T. Q. S. Quek, J. Chen, and X. Cao, “Proactive UAV network slicing for URLLC and mobile broadband service multiplexing,” 2020, https://arxiv.org/pdf/1912.03600v5.pdf.
[41] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[42] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[43] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR (Poster), 2015, pp. 1–15.
[44] P. Yang, X. Cao, X. Xi, W. Du, Z. Xiao, and D. O. Wu, “Three-dimensional continuous movement control of drone cells for energy-efficient communication coverage,” IEEE Transactions on Vehicular Technology, vol. 68, no. 7, pp. 6535–6546, 2019.
[45] C. Sun, C. She, C. Yang, T. Q. S. Quek, Y. Li, and B. Vucetic, “Optimizing resource allocation in the short blocklength regime for ultra-reliable and low-latency communications,” IEEE Transactions on Wireless Communications, vol. 18, no. 1, pp. 402–415, 2018.
[46] G. Scutari, F. Facchinei, and L. Lampariello, “Parallel and distributed methods for constrained nonconvex optimization-Part I: Theory,” IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 1929–1944, 2017.
[47] Y. Ye, Interior Point Algorithms: Theory and Analysis. John Wiley & Sons, Toronto, Canada, 2011, vol. 44.
[48] ITU-R, “Propagation data and prediction methods required for the design of terrestrial broadband radio access systems operating in a frequency range from 3 to 60 GHz,” International Telecommunication Union, Tech. Rep. P.1410-5, Feb. 2012.
[49] 3GPP, “Study on 3D channel model for LTE, v12.7.0,” The 3rd Generation Partnership Project, Tech. Rep. 36.873, Dec. 2017.
[50] 3GPP, “Technical specification group radio access network: Study on enhanced LTE support for aerial vehicles, v15.0.0,” The 3rd Generation Partnership Project, Tech. Rep. 36.777, Dec. 2017.
[51] S. Hu, W. Ni, X. Wang, A. Jamalipour, and D. Ta, “Joint optimization of trajectory, propulsion and thrust powers for covert UAV-on-UAV video tracking and surveillance,” IEEE Transactions on Information Forensics and Security, 2020, in press. DOI: 10.1109/TIFS.2020.3047758.