Integrating Convex Optimization and Deep Learning For Downlink Resource Allocation in LEO Satellites Networks
3, JUNE 2024
Abstract—This paper investigates the satellite communication network (SCN) and jointly optimizes the channel allocation of the fixed ground base station and the transmission power allocation of the low Earth orbit (LEO) satellites while considering the freshness of the information. We present a mathematical model for the problem and formulate it as a mixed-integer programming (MIP) problem, which is NP-hard. To tackle this challenge, we propose a two-step approach that decomposes the problem into a channel allocation problem and a power allocation problem. For the power allocation problem, we propose a convex optimization algorithm termed Opt. For the channel allocation problem, we introduce two learning-based schemes, Ptr and DNN-Ptr. Combining these two steps, we develop two novel algorithms, i.e., Opt-Ptr and Opt-DNN-Ptr. In particular, the Opt-Ptr algorithm devises a novel Pointer Network to obtain the channel allocation decision and then solves the remaining power allocation problem using convex optimization algorithms. To further improve the performance, the Opt-DNN-Ptr algorithm utilizes a DNN to predict a transmission power allocation, which is then combined with the channel allocation decision obtained from the pointer network to solve the remaining power allocation problem. The simulation results verify the superiority of the proposed algorithms.

Index Terms—Satellite communication, resource allocation, deep neural network, pointer network.

Manuscript received 11 June 2023; revised 1 November 2023; accepted 21 January 2024. Date of publication 1 February 2024; date of current version 7 June 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 61971457 and Grant U23A20275. The associate editor coordinating the review of this article and approving it for publication was Z. Xiao. (Corresponding author: Han Hu.)
Xiufeng Sui, Ziqi Jiang, Yifeng Lyu, Rongfei Fan, and Han Hu are with the Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]; [email protected]; yifenglyu@bit.edu.cn; [email protected]; [email protected]).
Zhi Liu is with the Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCCN.2024.3361071
2332-7731 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on October 16,2024 at 11:03:17 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

SATELLITE communication networks (SCNs) are widely utilized in critical areas such as navigation, environmental monitoring, and emergency assistance. Due to their excellent properties, including high bandwidth, global coverage, and low transmission delay, SCNs have become a research hotspot [1]. Typically, satellites can be categorized into three types based on their orbit altitude: geostationary orbit (GEO), medium earth orbit (MEO), and low earth orbit (LEO) satellites [2]. Among these types, LEO satellites are positioned at an altitude ranging from 500 to 2000 km [3]. The lower orbit of LEO satellites provides them with better coverage, shorter communication distance, and less path loss [4], leading to lower time delay and energy consumption compared to higher-orbiting satellites [5]. As a result of these advantages, the deployment of LEO satellites has become more active, and the percentage of LEO satellites in orbit has increased. Notably, several significant projects on LEO satellite communication systems, such as OneWeb and SpaceX, have been launched [6].

However, the power and spectrum resources of LEO satellites are very limited and can no longer meet the growing demand for communications. Therefore, the effective use of LEO satellites is on the agenda, as the number of devices in need of service around the world is rapidly increasing [7].

On the other hand, research on SCNs is still in its infancy, and most of the existing studies are about architecture, applications, multibeam satellite communication, or other research topics [1]. There are only a few works about optimizing the resource allocation for effective data downloading to match the network resources and data demand [8], [9], [10], [11], [12], [13]. In particular, Zhou et al. have proposed a joint scheme for satellite channel and power resource allocation, along with Internet of Remote Things data scheduling, aiming to maximize the network's capacity for Internet of Remote Things data using a model-free reinforcement learning framework [8]. Jia et al. [9] have presented a collaborative data downloading algorithm that optimizes the data reallocation between satellites by utilizing an inter-satellite link (ISL) routing method, considering the communication resource allocation of ISLs to maximize the throughput of data downloading. Reference [10] maximizes the minimum number of successfully scheduled missions over all user satellites by jointly optimizing contact (i.e., potential available communication links) plan design, power allocation in relay satellites, and mission schedules using a time-expanded graph. However, the data sharing between satellites is non-trivial due to the high mobility of the satellites. To achieve efficient data downloading, a terrestrial-satellite network (TSN) architecture to integrate the ultra-dense LEO networks and the terrestrial networks is proposed in [11]. The authors have proposed two matching algorithms to solve the joint user scheduling and backhaul transmission power resource allocation problem to maximize the total data rate and the number of accessed users
SUI et al.: INTEGRATING CONVEX OPTIMIZATION AND DEEP LEARNING FOR DOWNLINK RESOURCE ALLOCATION 1105
in the TSN. In addition, to maximize the delay-constrained throughput in satellite networks, Liu et al. [12] have extended a traditional time-expanded graph to model the data acquisition, transmission, and energy management. Reference [13] ensures data transmission rate and minimizes energy consumption by addressing intermittent link issues [14] between satellites and ground stations through a joint data downloading and resource management approach.

The aforementioned studies have investigated the data downloading problems for LEO satellite networks. However, the channel resources and transmission power resource are ignored. Note that channel allocation and transmission power allocation have been widely studied in traditional wireless networks [15], [16]. These challenging optimization problems are usually divided into sub-problems and optimized interactively. However, different from traditional wireless networks, we cannot afford such interaction due to the high communication cost. Deep RL and other learning-based approaches are more adaptive [17], but cannot be directly used in downlink resource allocation in SCNs due to the unique features of the SCNs.

In this paper, we investigate the downlink resource allocation in a satellite communication network (SCN) consisting of a fixed ground base station and multiple LEO satellites, with the aim of addressing the aforementioned problems. We jointly optimize the channel allocation of the ground base station and the transmission power allocation of the LEO satellites to maximize a weighted sum transmission rate considering the information freshness. In particular, we first mathematically model this downlink transmission and formulate the joint optimization of channel allocation and power allocation as a mixed-integer programming (MIP) problem. To tackle this challenging NP-hard MIP problem, we decouple the problem into two sub-problems: the channel allocation problem and the transmission power allocation problem.

To solve the transmission power allocation problem, we apply a convex optimization algorithm (i.e., Lagrange multiplier method and sub-gradient method) to obtain the optimal power allocation with the input and the given channel allocation, and this algorithm is termed Opt. Thus, the main difficulty in solving our problem lies in the channel allocation problem. Inspired by the wide application of learning-based schemes in solving such problems, we introduce two learning-based schemes, Ptr and DNN-Ptr, for the channel allocation problem. Combining these two steps, we develop two novel algorithms, i.e., Opt-Ptr and Opt-DNN-Ptr. In particular, the Opt-Ptr algorithm devises a novel Pointer Network [18] to obtain the channel allocation decision and then solves the remaining power allocation problem using convex optimization algorithms, such as the Lagrange multiplier method and sub-gradient method. To further improve the performance, the Opt-DNN-Ptr algorithm utilizes a deep neural network (DNN) to predict a transmission power allocation, which is then combined with the channel allocation decision obtained from the pointer network to solve the remaining power allocation problem. By considering the power allocation during the channel allocation in the Opt-DNN-Ptr algorithm, the performance can be further improved. Note that the predicted power allocation is only used to obtain the channel allocation more accurately. The optimal transmission power allocation of the problem is still obtained by the corresponding convex optimization algorithm. Extensive simulations are conducted, and the simulation results show that the proposed learning-based algorithms can achieve encouraging performance compared to other representative benchmark methods. The main contributions can be summarized as follows:

• We formulate the downlink transmission scenario as a MIP problem, allowing for the joint optimization of the ground station's channel allocation and transmission power allocation.
• We decompose the challenging NP-hard problem into two sub-problems, namely the channel allocation problem and the transmission power allocation problem, to avoid the complexity of directly solving the MIP problem. For the power allocation, we propose a convex optimization algorithm called Opt, while for the channel allocation, we introduce two learning-based schemes, namely Ptr and DNN-Ptr.
• We develop two novel algorithms, Opt-Ptr and Opt-DNN-Ptr, by combining these two steps. The Opt-Ptr algorithm uses a Pointer Network in conjunction with convex optimization algorithms such as the Lagrange multiplier method and sub-gradient method to determine the channel allocation decision and solve the remaining power allocation problem. To further enhance performance, the Opt-DNN-Ptr leverages a DNN to predict the transmission power allocation and incorporates it with the channel allocation decision from the pointer network to effectively allocate resources.
• We conduct a comparison between the proposed algorithms and benchmark schemes, and the simulation results validate the excellent performance of our learning-based algorithms across various scenarios. Furthermore, the Opt-DNN-Ptr surpasses other algorithms and exhibits superior stability in performance.

II. RELATED WORK

A. Resource Allocation in SCN

1) Channel and Bandwidth Allocation: Considering the limited spectrum resources of LEO satellites, research has been conducted on allocating the channels or the bandwidth in satellite communication. For example, Xiao et al. [19] propose a long-short term bandwidth allocation strategy for beam-hopping LEO satellites to improve the transmission rate. To efficiently utilize limited spectrum resources, dynamic channel allocation is crucial, and Liu et al. [20] address this by proposing a centralized channel allocation approach based on DRL to minimize average transmission latency.

2) Power Allocation: In addition to the channel allocation, power allocation has also been studied. For example, Dai et al. [21] formulate the dynamic and unpredictable channel conditions into the power allocation problem, which is then solved by the proposed DRL-based power allocation algorithm. Similarly, the power allocation optimization with
1106 IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, VOL. 10, NO. 3, JUNE 2024
the collected data to the ground base station accordingly at each time frame through the assigned downlink sub-channel. Due to the high cost of satellite communication, we assume that the channel allocation decision remains the same within a time frame. Considering that channel conditions may change, we further allocate transmission power over smaller intervals within each frame to obtain a more appropriate power allocation. The satellite allocates its limited power to the assigned sub-channels, taking into account other communication tasks with limited and time-varying available power, in order to maximize the defined objectives. Details will be described later.

Let $\alpha_{n,k} \in \{0, 1\}$ be the indicator variable, where $\alpha_{n,k} = 1$ means that sub-channel $n$ is assigned to satellite $k$, and $\alpha_{n,k} = 0$ means that sub-channel $n$ is not assigned to satellite $k$. We assume that each sub-channel can be assigned to at most one satellite, i.e., $\sum_{k=1}^{K} \alpha_{n,k} \le 1, \forall n \in \mathcal{N}$. In cases where satellites are not assigned to any sub-channel in the current frame, the collected data will be stored and await the next connection.

We further divide a frame evenly into $M$ time slots of length $\tau = T/M$, indexed by $\mathcal{M} = \{1, \ldots, M\}$. Let $C_{n,k,m}$ denote the amount of data transmitted by satellite $k$ through sub-channel $n$ in time slot $m$. To better respond to service latency requirements and to encourage earlier completion of transmissions, we define the objective function as information freshness-oriented, i.e., higher priority is given to the transmission in the early time slots. The objective function is defined as $\sum_{n=1}^{N} \alpha_{n,k} \sum_{m=1}^{M} w_m C_{n,k,m}$, where $w_m$ is the weight parameter and decreases as $m$ increases. This objective function denotes the sum weighted transmission data of satellite $k$ and considers the information freshness.

Thus, to maximize the objective function, we mainly need to optimize the channel allocation decision $\alpha_{n,k}$ and $C_{n,k,m}$, where $C_{n,k,m}$ can be obtained as $C_{n,k,m} = r_{n,k,m} \cdot \tau$. Here $r_{n,k,m}$ denotes the transmission rate of sub-channel $n$ in slot $m$ when sub-channel $n$ is allocated to satellite $k$.

Let $p_{n,k,m}$ denote the transmission power that satellite $k$ allocates to sub-channel $n$ in slot $m$, which is zero if $\alpha_{n,k} = 0$. Let $h_{n,k,m}$ denote the downlink channel gain of sub-channel $n$ assigned to satellite $k$ in slot $m$. Considering the channel fading caused by rain attenuation, weather, air quality, and other physical particulates [36], [37], $h_{n,k,m}$ can be modeled as

$$h_{n,k,m} = \frac{A_R A_{T,k} f_n^2}{c^2 l_k^2} \cdot A^t_{n,k,m},$$

where $A_R$ and $A_{T,k}$ are the effective areas of the ground receiving antenna and the transmitting antenna of satellite $k$, respectively, $c$ is the speed of light, $l_k$ is the distance between satellite $k$ and the ground BS, $f_n$ is the carrier frequency of sub-channel $n$, and $A^t_{n,k,m}$ is the total channel fading of sub-channel $n$ allocated to satellite $k$ in slot $m$, which increases rapidly with the carrier frequency. The achievable transmission rate of sub-channel $n$ allocated to satellite $k$ in slot $m$, denoted by $r_{n,k,m}$, can be modeled using the Shannon formula as

$$r_{n,k,m} = B_n \cdot \log_2\left(1 + \frac{p_{n,k,m} \cdot h_{n,k,m}^2}{\sigma_n^2}\right), \tag{1}$$

where $B_n$ and $\sigma_n^2$ are the transmission bandwidth and the noise power of sub-channel $n$, respectively.

The satellite-to-ground data link is mainly free space optical (FSO) communication, and the downlink channel gain varies slowly with time and is mainly affected by weather [38]. We therefore assume that $\tau$ is small enough for the channel gain to be approximately constant within a time slot, even though the weather changes and the satellite moves. Considering the limited power resource of satellites, the maximum transmission power provided by satellite $k$ at each time instant cannot surpass a specific threshold $p_k^{\max}$. Thus we have the maximum power constraint for satellite $k$, denoted as $p_{n,k,m} \le p_k^{\max}$. However, continuous transmission with maximum power may result in a large drain on the satellite's battery [39]. To keep the long-term power budget, we further consider an average power constraint. Specifically, the average transmit power within each frame is restricted below a certain level $p_k^{\mathrm{av}}$, where $p_k^{\mathrm{av}} \le p_k^{\max}$. For the power allocation of satellite $k$, the average transmit power constraint can be formulated as $\sum_{n=1}^{N} \sum_{m=1}^{M} p_{n,k,m} / M \le p_k^{\mathrm{av}}$. For brevity, we denote $p_k^{\mathrm{tot}} = M p_k^{\mathrm{av}}$. Thus the average power constraint can be transformed into the total transmission power constraint $\sum_{n=1}^{N} \sum_{m=1}^{M} p_{n,k,m} \le p_k^{\mathrm{tot}}$, where $p_k^{\mathrm{tot}} \le M p_k^{\max}$.

B. Problem Formulation

With given maximum power $p_k^{\max}$ and total power $p_k^{\mathrm{tot}}$, we denote the parameters of satellite $k$ as $g_k = (p_k^{\max}, p_k^{\mathrm{tot}})$, $k \in \mathcal{K}$. For the convenience of expression, we use $I_t = \{h_t, G_t\}$ to denote the channel gains and satellite parameters in the $t$-th frame. Without loss of generality, we omit the subscript $t$ for brevity in the following. Let $\boldsymbol{\alpha} \triangleq \{\alpha_{n,k}\}$ denote the sub-channel allocation and $\mathbf{P} \triangleq \{p_{n,k,m}\}$ denote the overall transmit power allocation of the $K$ satellites. For each frame with the input $I$, we are interested in optimizing the channel allocation $\boldsymbol{\alpha}$ and power allocation $\mathbf{P}$ with the goal of maximizing the sum weighted transmission data $\bar{C}(I) \triangleq \sum_{n=1}^{N} \sum_{k=1}^{K} \bar{C}_{n,k}$, where $\bar{C}_{n,k} \triangleq \alpha_{n,k} \sum_{m=1}^{M} w_m C_{n,k,m}$.

We consider two cases with different channel allocation constraints in the formulation. The first case assumes that each satellite can be assigned at most one sub-channel, i.e., $\sum_{n=1}^{N} \alpha_{n,k} \le 1, \forall k \in \mathcal{K}$. This allows as many satellites as possible to transmit their data back to the ground station when the channel resources are scarce. Thus the downlink resource allocation with the aforementioned constraints can be formulated as follows:

$$
\begin{aligned}
(\mathrm{P1}):\quad \bar{C}^*(I) = \max_{\boldsymbol{\alpha},\,\mathbf{P}}\ & \sum_{n=1}^{N} \sum_{k=1}^{K} \bar{C}_{n,k} \\
\text{s.t.}\quad & 0 \le p_{n,k,m} \le p_k^{\max}, && n \in \mathcal{N},\ k \in \mathcal{K},\ m \in \mathcal{M}, && \text{(2a)} \\
& \sum_{m=1}^{M} p_{n,k,m} \le p_k^{\mathrm{tot}}, && n \in \mathcal{N},\ k \in \mathcal{K}, && \text{(2b)} \\
& \alpha_{n,k} \in \{0, 1\}, && n \in \mathcal{N},\ k \in \mathcal{K}, && \text{(2c)} \\
& \sum_{k=1}^{K} \alpha_{n,k} \le 1, && n \in \mathcal{N}, && \text{(2d)} \\
& \sum_{n=1}^{N} \alpha_{n,k} \le 1, && k \in \mathcal{K}. && \text{(2e)}
\end{aligned}
$$
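As a rough illustration of the system model and of (P1), the following Python sketch evaluates the channel gain, the Shannon rate of Eq. (1), the freshness-weighted objective, and the constraints (2a)-(2e) for a given allocation. The function names, the nested-list layout `x[n][k][m]`, and the tolerance `eps` are assumptions made for this example only; they do not come from the paper.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def channel_gain(a_r, a_t, f_n, l_k, fading):
    """h = A_R * A_T,k * f_n^2 / (c^2 * l_k^2) * A^t (the gain model above)."""
    return a_r * a_t * f_n ** 2 / (C_LIGHT ** 2 * l_k ** 2) * fading


def rate(p, h, bandwidth, noise_power):
    """Eq. (1): B_n * log2(1 + p * h^2 / sigma_n^2)."""
    return bandwidth * math.log2(1.0 + p * h ** 2 / noise_power)


def weighted_data(alpha, p, h, weights, bandwidth, noise_power, tau):
    """Sum over n, k of alpha[n][k] * sum_m w_m * r[n][k][m] * tau."""
    total = 0.0
    for n, row in enumerate(alpha):
        for k, assigned in enumerate(row):
            if not assigned:
                continue  # unassigned pairs contribute nothing
            for m, w in enumerate(weights):
                r = rate(p[n][k][m], h[n][k][m], bandwidth[n], noise_power[n])
                total += w * tau * r
    return total


def feasible_p1(alpha, p, p_max, p_tot, eps=1e-9):
    """Check constraints (2a)-(2e) of (P1) for a candidate (alpha, p)."""
    n_sub, n_sat = len(alpha), len(alpha[0])
    # (2d): each sub-channel serves at most one satellite.
    if any(sum(row) > 1 for row in alpha):
        return False
    # (2e): each satellite holds at most one sub-channel.
    if any(sum(alpha[n][k] for n in range(n_sub)) > 1 for k in range(n_sat)):
        return False
    for n in range(n_sub):
        for k in range(n_sat):
            if alpha[n][k] not in (0, 1):  # (2c)
                return False
            slots = p[n][k]
            if any(x < -eps or x > p_max[k] + eps for x in slots):  # (2a)
                return False
            if sum(slots) > p_tot[k] + eps:  # (2b)
                return False
    return True
```

With decreasing weights such as `[1.0, 0.5]`, the same power yields more objective value in an early slot than in a late one, which is how the formulation encodes information freshness.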
Additionally, we also consider the case that each satellite can be assigned multiple sub-channels. The downlink resource allocation with the aforementioned constraints can be formulated as follows:

$$
\begin{aligned}
(\mathrm{P2}):\quad \bar{C}^*(I) = \max_{\boldsymbol{\alpha},\,\mathbf{P}}\ & \sum_{n=1}^{N} \sum_{k=1}^{K} \bar{C}_{n,k} \\
\text{s.t.}\quad & 0 \le \sum_{n=1}^{N} p_{n,k,m} \le p_k^{\max}, && k \in \mathcal{K},\ m \in \mathcal{M}, && \text{(3a)} \\
& \sum_{n=1}^{N} \sum_{m=1}^{M} p_{n,k,m} \le p_k^{\mathrm{tot}}, && k \in \mathcal{K}, && \text{(3b)} \\
& \alpha_{n,k} \in \{0, 1\}, && n \in \mathcal{N},\ k \in \mathcal{K}, && \text{(3c)} \\
& \sum_{k=1}^{K} \alpha_{n,k} \le 1, && n \in \mathcal{N}. && \text{(3d)}
\end{aligned}
$$

(P1) and (P2) are non-convex MIP problems with binary variables $\boldsymbol{\alpha}$ and continuous variables $\mathbf{P}$, which are difficult to solve directly. To tackle the problem, we decompose the MIP problem into two sub-problems, i.e., the channel allocation problem, which is a combinatorial optimization problem, and the power allocation problem, which is a convex optimization problem. The major difficulty of solving our problems lies in the channel allocation problem, as the power allocation problem can be solved by convex optimization algorithms with a given channel allocation.

Since there are too many possible channel allocations in our problems, it is difficult to search for an optimal or a satisfying sub-optimal channel allocation decision among all the possible channel allocations. Instead, for the channel allocation problem, we utilize learning-based methods to obtain the channel allocation decision. With the obtained channel allocation, the total problem is reduced to a convex optimization problem, which can be solved using convex optimization. We introduce the details next.

(2b) in (P3) can be transformed into an equality constraint, and we can solve this problem with the Lagrange multiplier method. To be specific, we first introduce the Lagrangian multiplier $\lambda$ and obtain the Lagrangian function

$$L(\mathbf{P}, \lambda) = \sum_{n=1}^{N} \sum_{k=1}^{K} \bar{C}_{n,k} + \lambda \left( \sum_{m=1}^{M} p_{n,k,m} - p_k^{\mathrm{tot}} \right). \tag{5}$$

It can be shown that finding the optimal solution to (P3) is equivalent to finding the optimal solution of $L(\mathbf{P}, \lambda)$ under the constraint of (2a), where $\frac{\partial L(\mathbf{P}, \lambda)}{\partial \lambda} = 0$ and $\frac{\partial L(\mathbf{P}, \lambda)}{\partial \mathbf{P}} = 0$. As a result, we can express the optimal power allocation $p^*_{n,k,m}$ of satellite $k$ in slot $m$ as

$$p^*_{n,k,m} = -\alpha_{n,k} \left( \frac{\tau w_m \cdot B_n}{\ln 2 \cdot \lambda^*} + \frac{\sigma_n^2}{h_{n,k,m}^2} \right). \tag{6}$$

Thus, finding the optimal power allocation $\mathbf{P}^*$ is equivalent to finding the optimal $\lambda^*$, which can be obtained by the efficient binary search method.

B. Power Allocation for (P2)

Since one satellite can be assigned multiple sub-channels in (P2), the corresponding power allocation of a satellite may change from one-dimensional to multi-dimensional. In this case, the power allocation is written as the following convex optimization problem:

$$
\begin{aligned}
(\mathrm{P4}):\quad \bar{C}^*(I, \boldsymbol{\alpha}) = \max_{\mathbf{P}}\ & \sum_{n=1}^{N} \sum_{k=1}^{K} \bar{C}_{n,k} \\
\text{s.t.}\quad & \text{(3a), (3b)}. && \text{(7a)}
\end{aligned}
$$

In the following, we solve problem (P4) by the sub-gradient method. In particular, we set the initial feasible power allocation as $\mathbf{P}^{(0)} \triangleq \{p^{(0)}_{n,k,m}\}$. Starting from the initial power allocation $\mathbf{P}^{(0)}$, we repeatedly update the power allocation using a simple iteration.
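For the single sub-channel case, the closed form (6) together with the binary search on the multiplier can be sketched as below for one satellite on its assigned sub-channel. This is a minimal illustration under stated assumptions, not the paper's implementation: `nu` stands for $-\lambda^*$ (positive, since $\lambda^*$ in (6) is negative for a positive power), `gains[m]` is shorthand for $h_{n,k,m}^2 / \sigma_n^2$, the result of (6) is clipped to the box constraint (2a), and all function names are hypothetical.

```python
import math


def power_from_multiplier(nu, weights, gains, tau, bandwidth, p_max):
    """Eq. (6) with nu = -lambda > 0, clipped to the box constraint (2a)."""
    scale = tau * bandwidth / (math.log(2.0) * nu)
    return [min(max(w * scale - 1.0 / g, 0.0), p_max)
            for w, g in zip(weights, gains)]


def allocate_power(weights, gains, tau, bandwidth, p_max, p_tot, iters=100):
    """Bisection on nu so that the slot powers add up to p_tot.

    Assumes strictly positive weights and gains.
    """
    if len(weights) * p_max <= p_tot:
        # The total budget never binds: use peak power in every slot.
        return [p_max] * len(weights)
    c = tau * bandwidth / math.log(2.0)
    # At nu = lo every slot is clipped at p_max; at nu = hi every slot is zero.
    lo = min(c * w / (p_max + 1.0 / g) for w, g in zip(weights, gains))
    hi = max(c * w * g for w, g in zip(weights, gains))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        used = sum(power_from_multiplier(mid, weights, gains, tau, bandwidth, p_max))
        if used > p_tot:
            lo = mid  # budget exceeded: raise the price of power
        else:
            hi = mid
    # hi always satisfies the total power constraint (2b).
    return power_from_multiplier(hi, weights, gains, tau, bandwidth, p_max)
```

Because the total power used is non-increasing in `nu`, the bisection converges to the multiplier at which the budget $p_k^{\mathrm{tot}}$ is met; slots with larger weights and better channels receive more power, in line with the freshness-oriented objective.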
During the updating, we record and update the best feasible allocation, i.e., the one corresponding to the maximum objective value, as $\mathbf{P}_{\mathrm{best}}$, and output $\mathbf{P}_{\mathrm{best}}$ as the result.

V. LEARNING BASED CHANNEL ALLOCATION

The power allocation problem in (P1) and (P2) can be solved with convex optimization algorithms, as demonstrated in Section IV. However, we need to obtain the optimal channel allocation in advance. As there are $A_K^N$ and $K^N$ possible channel allocations in problems (P1) and (P2), respectively, it is difficult to search for an optimal channel allocation strategy among all the possible channel allocation strategies. Deep learning can approximate arbitrary continuous functions, which allows it to mimic the behavior of highly nonlinear and complex systems and makes it a universal approximator, so it can be applied to solve a variety of problems in communications. Therefore, we introduce two learning-based schemes, Ptr and DNN-Ptr, to solve the channel allocation problem. Combining the learning-based schemes with convex optimization based power allocation schemes, we develop two novel algorithms, i.e., Opt-Ptr and Opt-DNN-Ptr. The Opt-Ptr algorithm introduces a novel Pointer Network to determine the channel allocation decision. It then addresses the remaining power allocation problem by employing convex optimization algorithms. Additionally, the Opt-DNN-Ptr algorithm enhances performance by incorporating a DNN to predict transmission power allocation. This predicted allocation is combined with the channel allocation decision obtained from the pointer network to solve the remaining power allocation problem. By leveraging more information on power allocation, the Opt-DNN-Ptr guides the channel allocation, leading to a more rational policy generated by the Pointer Network.

A. Pointer Network Based Channel Allocation (Ptr)

With the input $I$, we need to select an optimal action (channel allocation for each sub-channel) through continuous training, in order to learn the policy $\pi : I \to \boldsymbol{\alpha}^*$. Since the input is dynamic due to the highly dynamic channel condition and the varying parameters of different satellites in different time frames, we apply RL to train the agent. RL is a kind of learning from one's own experience, and is a suitable and powerful tool for automatic control and decision-making problems in random dynamic environments. In addition, RL does not need manually labeled training samples, and it is more robust to changes in the satellite parameters and the channel conditions. Note that traditional methods, such as DQN and DNN, can be applied to meet the above demands. However, these solutions are not scalable, i.e., these models are rigid regarding problem size. As mentioned before, the size of our problem is not fixed, as the number of satellites $K$ may change. Thus, we need a flexible and scalable solution with respect to problem size, and we devise a novel Pointer Network to allocate the channels.

The Pointer Network is a sequence-to-sequence network with an encoder and a decoder, both of which consist of Long Short-Term Memory (LSTM) cells, as presented in Fig. 2. The encoder reads the input sequence and converts the sequence into memory states. To be specific, for the input in step $k$, the encoder first embeds $I_k$ into a $d$-dimensional embedding $e_k$, which is then transformed into the memory state $h_k$ by the LSTM cell. The decoder also maintains its memory states and generates a distribution over the output selection. For the input to the decoder, the LSTM cells in the decoder directly convert the input into the memory states $q$. At the first decoder step, the initial input of the LSTM cell in the decoder is the start of sequence (SOS) token $\Rightarrow$, which can be treated as a trainable parameter of our neural network. The LSTM cell in the decoder takes $\Rightarrow$ as input and outputs $q_1$. The output of the LSTM cell is combined with the encoder output, $h$, to create the probability distribution over the input sequence in the Pointer Generation (Ptr-Gen), where the probability distribution in the $n$th decoder step, $p_n$, is generated as

$$u_k^n = \begin{cases} v^T \tanh(W_1 h_k + W_2 q_n), & m_k = 0, \\ -\infty, & m_k = 1, \end{cases} \tag{11}$$

$$p_n = \mathrm{softmax}(u^n), \quad u^n = \left(u_1^n, u_2^n, \ldots, u_K^n\right). \tag{12}$$

In formulas (11) and (12), $W_1$, $W_2$, and $v$ are trainable parameters. To guarantee that an input node can be selected at most once in some problems, a binary vector $m$ (masks) with length equal to the input size is used to mask the input nodes that have been selected. $m_k = 1$ denotes that input node $k$ has been selected and cannot be selected again, while $m_k = 0$ denotes that node $k$ has not been selected. With the probability distribution in the $n$th decoder step, $p_n$, the pointer is then selected from $p_n$ via a selection strategy. The simplest strategy is $\arg\max_k (p_n(k))$, where $p_n(k)$ is the probability of the $k$th input node. Once the pointer is selected, it is passed as the input to the next decoder step to generate the next selection.

It can be noted that the output of the decoder is a sequence of pointers (i.e., indexes) to the input sequence. Taking the Pointer Network architecture in Fig. 2 as an example, the output (green arrows) is the sequence $2, K, k$. Because of this, the Pointer Network is suitable for such selection problems. More importantly, the output size of the Pointer Network can vary, which provides flexibility and scalability with respect to the problem size for our problem.

B. DNN Based Power Allocation Prediction

As mentioned above, there are $A_K^N = \frac{K!}{(K-N)!}$ possible channel allocations in the single sub-channel case (P1) and $K^N$ possible channel allocations in the multi sub-channel case (P2). In the learning-based channel allocation discussed above, the pointer network based scheme Ptr outputs the optimal channel allocation based solely on the input consisting of channel gains and the parameters of satellites, which may be insufficient to output the channel allocation accurately in our problems, especially when $N$ and $K$ are large. Thus, we consider adding a DNN-based scheme to predict the power allocation, and the prediction results are added to the input of the pointer network based channel allocation to obtain a more accurate channel allocation policy. As we can obtain the optimal power allocation through the convex optimization algorithms, we can
train the neural network with training data obtained from the output of convex optimization algorithms, to achieve the power allocation prediction policy $\pi : I \to \mathbf{P}^*$.

In particular, we adopt a DNN to learn the power allocation policy. This DNN is characterized by the coefficient vector $\theta_D$, e.g., the weights that connect the hidden neurons. Specifically, as the DNN architecture in Fig. 2 shows, the fully connected DNN consists of an input layer, two hidden layers, and one output layer, where we use ReLU as the activation function in the hidden layers. In the output layer, we use a Sigmoid activation function. As the convex optimization algorithms can calculate the optimal power allocation, we train the DNN with the experience replay technique [40]. At every training interval, a batch of training data pairs (optimal power allocations calculated by the convex optimization algorithm) is sampled from the replay memory to update the parameters of the DNN. Then the power allocation prediction in the next time frame is obtained from the newly updated DNN. Such iterations repeat thereafter, as the input of each time frame is different, and the policy of the DNN is gradually improved.

In the next subsections, we introduce two algorithms, the Opt-Ptr algorithm and the Opt-DNN-Ptr algorithm, to solve the MIP problems.

Fig. 2. The structure of the proposed algorithms. The solid part and dashed part represent the structure of Opt-Ptr and Opt-DNN-Ptr, respectively. The structure consists of five steps. Step 1: generate the input sequence $\hat{I}$, where $\hat{I} = (I, \dot{P})$ in Opt-DNN-Ptr and $\hat{I} = I$ in Opt-Ptr. Step 2: with $\hat{I}$, the pointer network outputs the relaxed channel allocation $\dot{\alpha}$ and the critic network generates the predicted reward $b$. Step 3: generate the candidate channel allocations (1 in Opt-Ptr and $Q$ in Opt-DNN-Ptr) based on the relaxed channel allocation $\dot{\alpha}$. Step 4: calculate the power allocation and the corresponding $\bar{C}^*$ according to the optimization algorithm. Step 5: train the networks.

C. Opt-Ptr Algorithm

The Opt-Ptr algorithm combines convex optimization and the Pointer Network, as shown in Fig. 2 (the solid line part), where we obtain the channel allocation based on the Pointer Network and solve the remaining convex optimization problem by the power allocation algorithm introduced in Section IV. For a given input $I$, the Opt-Ptr algorithm directly outputs the optimized channel allocation of the ground base station $\boldsymbol{\alpha}^*$ through the Pointer Network. With the obtained channel allocation $\boldsymbol{\alpha}^*$, we utilize the power allocation optimization algorithm in Section IV to calculate the optimal power allocation $\mathbf{P}^*$ and the corresponding $\bar{C}^*$. After obtaining the optimal $\bar{C}^*$, we update the Pointer Network using a policy-based RL method. The architecture of the Opt-Ptr algorithm is detailed as follows.

For the input $I$ that consists of the satellite parameters $g_k = (D_{k,0}, p_k^{\max}, p_k^{\mathrm{tot}})$, $k \in \mathcal{K}$, and channel gains $h \triangleq \{h_{n,k,m}\}$, we need to divide it into $K$ input sequences, one for each satellite, as $I_k = \{h_k, g_k\}$, $k \in \{1, \ldots, K\}$. Here $h_k$ denotes the channel gain of the channels connected to satellite $k$, which can be expressed as $h_k = \{h_{n,k,m}\}$, $n \in \{1, \ldots, N\}$, $m \in \{1, \ldots, M\}$. The encoder of the Pointer Network takes the sequence of satellite parameters $I_k$, $k \in \{1, \ldots, K\}$, as the input and embeds it into a vector
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on October 16,2024 at 11:03:17 UTC from IEEE Xplore. Restrictions apply.
SUI et al.: INTEGRATING CONVEX OPTIMIZATION AND DEEP LEARNING FOR DOWNLINK RESOURCE ALLOCATION 1111
Algorithm 1 Opt-Ptr Algorithm train the Pointer Network. Taking the optimal sum weighted
Input: Total input I of each frame. transmission data C̄ ∗ in each frame as the reward signal, we
Output: Optimized channel allocation of ground base station update the parameters of the Pointer Network θP by using
α ∗ and Optimized power allocation of satellites P∗ . the policy gradient method. In each frame, with the input I,
1: Initialize Pointer Network parameters θ P and Critic output of the optimal channel allocation α ∗ , and reward signal
Network parameters θ V ; C̄ ∗ , we obtain the policy gradient ∇J (θ P ) in this frame as
2: Set training number O; follows:
3: for all t = 1, . . . , O do
4: Generate solution of the channel allocation problem ∇J (θ P ) ← (C ∗ − b)∇θ P log pθ P (α ∗ | I), (14)
α ∗ with Pointer Network; where b is the baseline value of sum weighted transmission
5: Get the baseline of the Pointer Network b with critic data which can effectively reduce the variance of gradients
network; and thus improve the performance of Pointer Network. To
6: Compute Optimal C̄ ∗ (I, α ∗ ) and P∗ (I, α ∗ ) utilizing obtain the baseline b, we still need a critic network, whose
the power allocation optimization algorithm; parameter is denoted by θ V . The critic network has a similar
7: gθ P = (C̄ ∗ −b)∇θ P log pθ P (α ∗ | I),Lv = b−C̄ ∗ 22 ; architecture to Pointer Network and takes the same input as
8: Update pointer network and critic network, θ P ← Pointer Network, but outputs the baseline b instead. In order
ADAM (θ P , gθ P ), θ V ← ADAM (θ V , ∇θ V LV ); to accurately predict the baseline b, we also need to train the
9: end for critic network. Based on the reward signal C̄ ∗ , we update
the parameters of critic network based on the loss function
Lv , which is obtained as Lv = b − C̄ ∗ 22 . With the policy
presentation. Based on the vector presentation, the decoder of gradient of Pointer network and the loss function of critic
Pointer Network points to one of the input satellites k each network, we update the whole network by ADAM algorithm.
step. Setting the total step of the decoder equal to number of The pseudocode of the Opt-Ptr algorithm is provided in
sub-channels N, we can take the output of Pointer Network in Algorithm 2.
n-th step as an optimized allocation of sub-channel n, denoted
as αn∗ ∈ {1, . . . , K }, n = 1, . . . , N . However, it is worthwhile
to note that the channel allocation in Section IV is denoted D. Opt-DNN-Ptr Algorithm
as αn,k , not αn , which means that we cannot apply the In the Opt-DNN-Ptr approach, we utilize a DNN to obtain
optimization algorithm based on α q directly. In fact, it is easy a predicted power allocation, which is then combined with
to obtain the channel allocation αn,k based on the αn , i.e., the input I and used to obtain the optimal channel allo-
cation by applying Pointer Network. As shown in Fig. 2
1, αn = k ,
αn,k = (13) (the dashed part), there are three parts in the Opt-DNN-Ptr
0, αn = k . framework: DNN, Pointer Network, and the aforementioned
Thus, the channel allocation expression utilized in this power allocation optimization algorithm. Different from the
section (αn ∈ {1, . . . , K }) is barely equivalent to the Opt-Ptr algorithm, we take the predicted power allocation P,
expression αn,k in Section IV. Without loss of generality, we together with the input I as the basis to generate optimal
substitute the channel allocation αn,k with αn ∈ {1, . . . , K } channel allocation α ∗ . For the input I, DNN predicts a power
in the following. Thus, we can take the output of Pointer allocation Ṗ. Then Pointer Network takes the predicted power
Network as the optimized channel allocation α ∗ {αn∗ }, n ∈ allocation Ṗ together with the initial input I as the input
{1, . . . , N }. Note that the satellites in single sub-channel case sequence and outputs the basis channel allocation α̇. Based
(P1) can only be assigned to one sub-channel at most. We on the corresponding quantization function, we quantize the
set the decoder can only point to each input at most once basis channel allocation into Q candidate channel alloca-
in this problem (P1) by using a mask mechanism that sets tions. For the candidate channel allocation, we calculate the
the parameter of the selected input as −∞ and thus decoder optimal power allocation and corresponding sum weighted
will not point to it anymore. As there is no limit in (P2), we transmission data under each candidate channel allocation by
apply the Pointer Network without the mask mechanism. For corresponding power allocation optimization algorithm. By
convenience, we can express this optimized channel allocation comparing the sum weighted transmission data corresponding
generation process as α ∗ = pθ P (· | I), where pθ P denotes to each candidate channel allocation, we obtain the optimal
the function of Pointer Network, θ P denotes the parameters solution for our problems and corresponding sum weighted
of Pointer Network. Based on the optimal channel allocation transmission data C̄ ∗ . According to the input I and optimal
α ∗ , we obtain the optimal power allocation P∗ (I, α ∗ ) and power allocation P∗ , we update the DNN using ADMA. In
corresponding sum weighted transmission data C̄ ∗ (I, α ∗ ) by addition, we also update the Pointer Network based on reward
applying the aforementioned convex optimization algorithms. C̄ ∗ and input. Details of the Opt-DNN-Ptr are described
Then we update the Pointer network accordingly. below.
For the Pointer Network, our goal is to obtain an optimal For the input I consisting of channel gain and satellites
channel allocation based on the input channel gain and parameters, we first predict the overall power allocation of
satellites parameters, which requires a reasonable training satellites Ṗ {ṗn,k ,m } by applying DNN with parameters
method. Thus, we apply the policy-based RL method to θ D . Based on the predicted power allocation Ṗ and the initial
Authorized licensed use limited to: BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on October 16,2024 at 11:03:17 UTC from IEEE Xplore. Restrictions apply.
1112 IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, VOL. 10, NO. 3, JUNE 2024
network based on the optimal output channel allocation $\alpha^*$, the reward signal $\bar{C}^*$, and the input $(I, \dot{P})$, similar to the Opt-Ptr algorithm. Based on the policy gradient and the loss, we update the Pointer Network and the corresponding critic network using the ADAM algorithm in each frame. During the execution phase after training, only the actor network is used, without the critic network. The whole procedure of the Opt-DNN-Ptr algorithm is organized in Algorithm 3.

Although each satellite needs to send basic information to the ground station for decision-making in each time slot, the interval between time slots is approximately a few minutes, which is not frequent. Moreover, compared to the actual data transmission volume, the time and bandwidth consumed by transmitting this information are negligible.

VI. SIMULATION RESULTS AND DISCUSSION

A. Parameters Settings

In the simulation, we assume that the ground base station communicates with the satellites in the Ka band, and the carrier frequencies of the sub-channels lie in the range of (20, 30) GHz. The specific carrier frequency of each sub-channel depends on the number of sub-channels $N$, with approximately equal spacing between the carrier frequencies of the sub-channels. The average gain of sub-channel $n$ connected to satellite $k$, denoted by $\bar{h}_{n,k}$, is modeled as $\bar{h}_{n,k} = \frac{A_R A_{T,k}}{f_n^2 l^2} \cdot A_{n,k} c^2$, where the average distance between the satellites and the ground station $l$ is uniformly distributed in the range of (500, 2000) km according to the orbital altitude of LEO satellites, and $c = 3 \times 10^8$ m/s is the speed of light. The link attenuation, denoted by $A_{n,k}$, and the product of the areas of the receiving and transmitting antennas $A_R A_{T,k}$ are randomly distributed in the ranges presented in Table I. For the average channel gain $\bar{h}_{n,k}$, we further consider the varying distance and communication angle between the satellites and the ground station during one time frame through $\phi$, which denotes the set of coefficients describing the change of channel gain with distance and angle. The time-varying channel gains of the $M$ mini-slots, $h_{n,k} = [h_{n,k,1}, h_{n,k,2}, \ldots, h_{n,k,M}]$, are modeled as $h_{n,k} = \bar{h}_{n,k} \phi_{n,k} + N_w$, where $\phi_{n,k}$ denotes the $M$ coefficients selected from $\phi$ and $N_w$ denotes the channel gain variation caused by other factors such as weather. In addition, the total power limit of the satellites $p_k^{\text{tot}}$ is obtained as $p_k^{\text{tot}} = \mu M p_k^{\max}$. The corresponding parameters follow the literature [41], [42] and are uniformly generated from the ranges specified in Table I. In the experiment, we consider the information freshness and the amount of downloaded data, assuming $w_m = \frac{\tau}{2}[2(M - m) + 1]$. As for the quantization parameter $Q$, we set the quantization number equal to the number of satellites, $Q = K$.

In the Opt-Ptr algorithm, we set the size of the embedding vector to 120 and the learning rate to 0.00025. Besides, we set 1 glimpse for the Pointer Network and 3 glimpses for the critic network. The parameter settings for Opt-DNN-Ptr are listed in Table II.

TABLE I
SIMULATION PARAMETERS

TABLE II
HYPERPARAMETERS OF Opt-DNN-Ptr ALGORITHM

B. Convergence Analysis

In Fig. 3, we plot the training loss of the DNN in Opt-DNN-Ptr for P1 and P2. As shown in Fig. 3(a), the DNN training loss in the case $N = 5$, $K = 7$ for P1 gradually decreases and stabilizes at around 0.14, with fluctuations mainly due to the random sampling of training data. Meanwhile, as shown in Fig. 3(b), the training loss of the DNN in the case $N = 7$, $K = 12$ for P2 converges within 1000 time frames and stabilizes at around 0.16.

C. Performance Comparison

Regarding the sum weighted transmission data performance, we also compare the proposed learning-based methods with the following benchmark schemes:
• Opt-Greedy: Each sub-channel is allocated to the satellite with the maximum average channel gain. For P1, a checking mechanism is added to guarantee that each satellite is allocated to at most one sub-channel. The power allocation is obtained by the corresponding power allocation optimization algorithm.
• Opt-RDS: The channel allocation is randomly selected from the set $S$ in each time frame, and the corresponding power allocation and sum weighted data are obtained using the power allocation algorithms in Section IV.
• Opt-DNN [30]: The Opt-DNN algorithm obtains the channel allocation by applying a fully connected DNN and the corresponding quantization function. The optimal power allocation of each candidate channel allocation can be obtained by solving (P3) or (P4) with the optimization algorithm in Section IV.

1) Simulation Results for Single Sub-Channel Case (P1): In Fig. 4, we evaluate the sum weighted transmission data performance of the proposed Opt-Ptr and Opt-DNN-Ptr algorithms. As shown in Fig. 4(a), the Opt-DNN-Ptr algorithm obtains a larger $\bar{C}^*$ than Opt-Ptr in most frames when there are 5 sub-channels and 7 satellites. In Fig. 4(b), it can be noted that the performance gap between Opt-Ptr and Opt-DNN-Ptr in the case $N = 5$, $K = 12$ becomes smaller than in the case $N = 5$, $K = 7$.

Fig. 4. The sum weighted transmission data $\bar{C}^*$ for Opt-Ptr and Opt-DNN-Ptr algorithms.

In Fig. 5, we investigate the average sum weighted transmission data performance of different schemes under different $K$. We can observe that the proposed Opt-DNN-Ptr algorithm achieves the best performance in all the cases. Opt-Ptr performs worse than Opt-DNN when $K = 7$ and better than Opt-DNN when $K > 7$. In addition, the performance gap between Opt-Ptr and Opt-DNN increases as $K$ increases. The performance of the other two benchmark algorithms, Opt-RDS and Opt-Greedy, increases more slowly than that of the proposed learning-based algorithms and Opt-DNN, especially for Opt-RDS. Compared with the benchmark schemes Opt-RDS and Opt-Greedy, the two proposed learning-based methods and Opt-DNN have better performance. Among the three learning-based methods (i.e., Opt-DNN-Ptr, Opt-Ptr, and Opt-DNN), the performance of Opt-DNN-Ptr and Opt-Ptr increases faster than that of Opt-DNN. In Opt-DNN, the DNN model is updated based on training data pairs sampled from memory, where the training data pairs are composed of the best data pairs among the $Q$ candidates in each frame. When $K$ is small, the proportion of the quantization number $Q$ to $S$ is sufficient to obtain an excellent solution. The excellent solutions in each frame constitute the training data pairs of the DNN, allowing the DNN to further improve the solution quality. In other words, the DNN learns from high-quality training data pairs in Opt-DNN when $K$ is small, so Opt-DNN can obtain a better performance than Opt-Ptr when $K = 7$. However, as $K$ increases and $Q$ cannot increase accordingly, the quality of the training data pairs may be insufficient when $K$ is large, which leads to a slower performance increase than for Opt-Ptr. Compared to Opt-DNN, which learns from the experience data pairs, the RL-based Opt-DNN-Ptr and Opt-Ptr algorithms perform better, especially when the number of satellites K
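The channel-allocation rule of the Opt-Greedy benchmark can be sketched as follows. This is an illustrative NumPy version under stated assumptions: `h_bar` is an $N \times K$ array of average channel gains, sub-channels are processed in index order (the text does not specify an order), satellites are indexed from 0, and the function name `opt_greedy_allocation` is hypothetical. The power allocation step, which the paper solves by convex optimization, is not included.

```python
import numpy as np


def opt_greedy_allocation(h_bar, single_subchannel=True):
    """Assign each sub-channel to the satellite with the largest average gain.

    h_bar: (N, K) array, h_bar[n, k] = average gain of sub-channel n
           towards satellite k.
    single_subchannel=True adds the check used for (P1): a satellite that
    already holds a sub-channel is skipped for later sub-channels.
    Returns alpha of length N with alpha[n] in {0, ..., K-1}.
    """
    h_bar = np.asarray(h_bar, dtype=float)
    N, K = h_bar.shape
    if single_subchannel and N > K:
        raise ValueError("(P1) needs at least as many satellites as sub-channels")
    taken = np.zeros(K, dtype=bool)
    alpha = np.empty(N, dtype=int)
    for n in range(N):
        gains = h_bar[n].copy()
        if single_subchannel:
            gains[taken] = -np.inf  # exclude satellites that are already served
        k = int(np.argmax(gains))
        alpha[n] = k
        taken[k] = True
    return alpha


# Toy example with N = 2 sub-channels and K = 3 satellites.
h_bar = np.array([[3.0, 2.0, 1.0],
                  [5.0, 1.0, 4.0]])
print(opt_greedy_allocation(h_bar, single_subchannel=True).tolist())   # [0, 2]
print(opt_greedy_allocation(h_bar, single_subchannel=False).tolist())  # [0, 0]
```

In the toy example, satellite 0 has the largest gain on both sub-channels. With the (P1) check the second sub-channel falls back to satellite 2, while without it both sub-channels go to satellite 0.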
Fig. 7. The sum weighted transmission data $\bar{C}^*$ for Opt-Ptr and Opt-DNN-Ptr algorithms.

Fig. 8. The average sum weighted transmission data performance over different K.

Fig. 9. The average sum weighted transmission data performance over different N.

and P2, the proposed Opt-DNN-Ptr can obtain the maximum sum weighted transmission data under different cases. In addition, comparing the simulation results under (P1) and

VII. CONCLUSION

In this paper, we jointly optimize the channel allocation of the fixed ground station and the transmission power allocation of satellites in an SCN. We mathematically model the downlink transmission and formulate the resource allocation as an MIP problem. To address this NP-hard problem, we develop two novel algorithms, i.e., Opt-Ptr and Opt-DNN-Ptr. Extensive simulations show that our learning-based algorithms achieve impressive performance compared to other benchmark methods.

APPENDIX

Here we introduce the method for obtaining the weight parameters of the objective function. Our goal, for joint sub-channel allocation and transmit power allocation, is to maximize the amount of downloaded data while minimizing the data transmission delay. Specifically, combining the storage loss proposed in [43], we minimize the integral of the total amount of stored data across all $K$ satellites within each frame. The integral of the amount of data of satellite $k$ in one frame is formulated as

$$O_k \triangleq \int_0^T D_k(t)\,dt, \tag{15}$$

where $D_k(t)$ is the amount of data stored in satellite $k$ at time instant $t$. Using a time discretization approach with slot length $\tau$, we can approximate the overall $O_k$ by the Trapezoid Method:

$$O_k = \int_0^{\tau_1} D_k(t)\,dt + \ldots + \int_{\tau_{M-1}}^{\tau_M} D_k(t)\,dt \approx \tau \sum_{m=1}^{M} \frac{D_{k,m-1} + D_{k,m}}{2} \triangleq \tilde{O}_k. \tag{16}$$

Here $D_{k,m}$ denotes the amount of data stored in satellite $k$ at the end of slot $m$, where $1 \le m \le M$, and $D_{k,0}$ represents the initial amount of data stored in satellite $k$ at the beginning of the first slot. The difference in the amount of stored data within a time slot is equal to the amount of data transmitted back to the ground in that time slot. Thus, we have

$$D_{k,m-1} - D_{k,m} = \sum_{n=1}^{N} \alpha_{n,k} C_{n,k,m}, \quad \forall k, m, \tag{17}$$

where $C_{n,k,m}$ denotes the amount of data transmitted by satellite $k$ through sub-channel $n$ in slot $m$. With the equality (17), we can reformulate (16) as

$$\tilde{O}_k = \frac{\tau}{2}\left(2M D_{k,0} - \sum_{n=1}^{N} \sum_{m=1}^{M} \alpha_{n,k}\,\bigl(2(M - m) + 1\bigr)\,C_{n,k,m}\right). \tag{18}$$

As $D_{k,0}$ is a constant value, minimizing the integral of the amount of data of satellite $k$ in one frame, $O_k$, is equivalent to maximizing $\sum_{n=1}^{N} \alpha_{n,k} \frac{\tau}{2} \left[\sum_{m=1}^{M} (2(M - m) + 1)\,C_{n,k,m}\right]$, representing the weighted sum of data from satellite $k$.

REFERENCES

[1] O. Kodheli et al., “Satellite communications in the new space era: A survey and future challenges,” IEEE Commun. Surveys Tuts., vol. 23, no. 1, pp. 70–109, 1st Quart., 2021.
[2] Y. Su, Y. Liu, Y. Zhou, J. Yuan, H. Cao, and J. Shi, “Broadband LEO satellite communications: Architectures and key technologies,” IEEE Wireless Commun., vol. 26, no. 2, pp. 55–61, Apr. 2019.
[3] H. Zhou and H. Liu, “Development review of foreign emerging commercial LEO satellite communication constellations,” Telecommun. Eng., vol. 58, no. 9, pp. 1108–1114, 2018.
[4] J. Li, M. Li, and W. Li, “Satellite communication on the non-geostationary system and the geostationary system in the fixed-satellite service,” in Proc. 28th Wireless Opt. Commun. Conf. (WOCC), 2019, pp. 1–5.
[5] L. You, K.-X. Li, J. Wang, X. Gao, X.-G. Xia, and B. Ottersten, “Massive MIMO transmission for LEO satellite communications,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1851–1865, Aug. 2020.
[6] B. Di, L. Song, Y. Li, and H. V. Poor, “Ultra-dense LEO: Integration of satellite access networks into 5G and beyond,” IEEE Wireless Commun., vol. 26, no. 2, pp. 62–69, Apr. 2019.
[7] Z. Qu, G. Zhang, H. Cao, and J. Xie, “LEO satellite constellation for Internet of Things,” IEEE Access, vol. 5, pp. 18391–18401, 2017.
[8] D. Zhou, M. Sheng, Y. Wang, J. Li, and Z. Han, “Machine learning-based resource allocation in satellite networks supporting Internet of Remote Things,” IEEE Trans. Wireless Commun., vol. 20, no. 10, pp. 6606–6621, Oct. 2021.
[9] X. Jia, T. Lv, F. He, and H. Huang, “Collaborative data downloading by using inter-satellite links in LEO satellite networks,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1523–1532, Mar. 2017.
[10] D. Zhou, M. Sheng, R. Liu, Y. Wang, and J. Li, “Channel-aware mission scheduling in broadband data relay satellite networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 5, pp. 1052–1064, May 2018.
[11] B. Di, H. Zhang, L. Song, Y. Li, and G. Y. Li, “Ultra-dense LEO: Integrating terrestrial-satellite networks into 5G and beyond for data offloading,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 47–62, Jan. 2019.
[12] R. Liu, M. Sheng, K.-S. Lui, X. Wang, Y. Wang, and D. Zhou, “An analytical framework for resource-limited small satellite networks,” IEEE Commun. Lett., vol. 20, no. 2, pp. 388–391, Feb. 2016.
[13] S. Zhang, G. Cui, and W. Wang, “Joint data downloading and resource management for small satellite cluster networks,” IEEE Trans. Veh. Technol., vol. 71, no. 1, pp. 887–901, Jan. 2022.
[14] H. Yao, L. Wang, X. Wang, Z. Lu, and Y. Liu, “The space-terrestrial integrated network: An overview,” IEEE Commun. Mag., vol. 56, no. 9, pp. 178–185, Sep. 2018.
[15] S. Aboagye, T. M. N. Ngatched, and O. A. Dobre, “Subchannel and power allocation in downlink VLC under different system configurations,” IEEE Trans. Wireless Commun., vol. 21, no. 5, pp. 3179–3191, May 2022.
[16] Y. Qiu, H. Zhang, K. Long, and M. Guizani, “Subchannel assignment and power allocation for time-varying fog radio access network with NOMA,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3685–3697, Jun. 2021.
[17] A. A. Khan and R. S. Adve, “Centralized and distributed deep reinforcement learning methods for downlink sum-rate optimization,” IEEE Trans. Wireless Commun., vol. 19, no. 12, pp. 8410–8426, Dec. 2020.
[18] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1–9.
[19] A. Xiao, Z. Chen, S. Wu, S. Jin, and L. Ma, “Collaborative long-short term bandwidth allocation for satellite-terrestrial networks,” IEEE Commun. Lett., vol. 26, no. 5, pp. 1121–1125, May 2022.
[20] J. Liu, B. Zhao, Q. Xin, and H. Liu, “Dynamic channel allocation for satellite Internet of Things via deep reinforcement learning,” in Proc. IEEE Int. Conf. Inf. Netw. (ICOIN), 2020, pp. 465–470.
[21] N. Dai, D. Zhou, M. Sheng, and J. Li, “Deep reinforcement learning based power allocation for high throughput satellites,” in Proc. IEEE 94th Veh. Technol. Conf. (VTC-Fall), 2021, pp. 1–5.
[22] H. Tsuchida et al., “Efficient power control for satellite-borne batteries using Q-learning in low-earth-orbit satellite constellations,” IEEE Wireless Commun. Lett., vol. 9, no. 6, pp. 809–812, Jun. 2020.
[23] R. Chen, X. Hu, X. Li, and W. Wang, “Optimum power allocation based on traffic matching service for multi-beam satellite system,” in Proc. 5th Int. Conf. Comput. Commun. Syst. (ICCCS), 2020, pp. 655–659.
[24] A. Paris, I. Del Portillo, B. Cameron, and E. Crawley, “A genetic algorithm for joint power and bandwidth allocation in multibeam satellite systems,” in Proc. IEEE Aerosp. Conf., 2019, pp. 1–15.
[25] N. Pachler, J. J. G. Luis, M. Guerster, E. Crawley, and B. Cameron, “Allocating power and bandwidth in multibeam satellite systems using particle swarm optimization,” in Proc. IEEE Aerosp. Conf., 2020, pp. 1–11.
[26] F. G. Ortiz-Gomez, D. Tarchi, R. Martínez, A. Vanelli-Coralli, M. A. Salas-Natera, and S. Landeros-Ayala, “Cooperative multi-agent deep reinforcement learning for resource management in full flexible VHTS systems,” IEEE Trans. Cogn. Commun. Netw., vol. 8, no. 1, pp. 335–349, Mar. 2022.
[27] G. Cui, X. Li, L. Xu, and W. Wang, “Latency and energy optimization for MEC enhanced SAT-IoT networks,” IEEE Access, vol. 8, pp. 55915–55926, 2020.
[28] Y.-C. Wu, T. Q. Dinh, Y. Fu, C. Lin, and T. Q. S. Quek, “A hybrid DQN and optimization approach for strategy and resource allocation in MEC networks,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4282–4295, Jul. 2021.
[29] S. Chen, J. Chen, Y. Miao, Q. Wang, and C. Zhao, “Deep reinforcement learning-based cloud-edge collaborative mobile computation offloading in industrial networks,” IEEE Trans. Signal Inf. Process. Netw., vol. 8, pp. 364–375, 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9776583
[30] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2581–2593, Nov. 2020.
[31] B. V. R. Gorantla and N. B. Mehta, “Resource and computationally efficient subchannel allocation for D2D in multi-cell scenarios with partial and asymmetric CSI,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5806–5817, Dec. 2019.
[32] Z. Wang, J. Li, Y. Wang, Z. Su, S. Yu, and W. Meng, “Optimal repair strategy against advanced persistent threats under time-varying networks,” IEEE Trans. Inf. Forensics Security, vol. 18, pp. 5964–5979, 2023.
[33] Y. Wang, Z. Su, T. H. Luan, J. Li, Q. Xu, and R. Li, “SEAL: A strategy-proof and privacy-preserving UAV computation offloading framework,” IEEE Trans. Inf. Forensics Security, vol. 18, pp. 5213–5228, 2023.
[34] H. Liao, Z. Zhou, X. Zhao, and Y. Wang, “Learning-based queue-aware task offloading and resource allocation for space–air–ground-integrated power IoT,” IEEE Internet Things J., vol. 8, no. 7, pp. 5250–5263, Apr. 2021.
[35] Z. Li et al., “Energy efficient resource allocation for UAV-assisted space-air-ground Internet of Remote Things networks,” IEEE Access, vol. 7, pp. 145348–145362, 2019.
[36] W. Lyu, B. Cong, J. Han, and J. Xu, “OFDM and self-coherent detection based satellite-to-ground communication system,” in Proc. 15th Int. Conf. Opt. Commun. Netw. (ICOCN), 2016, pp. 1–3.
[37] J. M. Tang, P. M. Lane, and K. A. Shore, “High-speed transmission of adaptively modulated optical OFDM signals over multimode fibers using directly modulated DFBs,” J. Lightw. Technol., vol. 24, no. 1, pp. 429–441, Jan. 2006.
[38] C. Loo, “Impairment of digital transmission through a Ka band satellite channel due to weather conditions,” Int. J. Satell. Commun., vol. 16, no. 3, pp. 137–145, 1998.
[39] H. Zhang, Q. Li, Y. Zhang, and X. Li, “Game theory based power allocation method for inter-satellite links in LEO/MEO two-layered satellite networks,” in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), 2021, pp. 398–403.
[40] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[41] M. Chen, R. Chai, and Q. Chen, “Joint route selection and resource allocation algorithm for data relay satellite systems based on energy efficiency optimization,” in Proc. IEEE 11th Int. Conf. Wireless Commun. Signal Process. (WCSP), 2019, pp. 1–6.
[42] A. Wang, L. Lei, X. Hu, E. Lagunas, A. I. Pérez-Neira, and S. Chatzinotas, “Adaptive beam pattern selection and resource allocation for NOMA-based LEO satellite systems,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), 2022, pp. 674–679.
[43] A. Sadeghi, F. Sheikholeslami, A. G. Marques, and G. B. Giannakis, “Reinforcement learning for adaptive caching with dynamic storage pricing,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2267–2281, Oct. 2019.

Xiufeng Sui received the Ph.D. degree in computer science from the School of Computer Science, University of Science and Technology of China, Anhui, China. He is currently an Associate Professor with the School of Information and Electronics, Beijing Institute of Technology. His main research interests include computer architecture, operating systems, strategies of engineering science and technology, and management of technological innovation.

Yifeng Lyu received the B.E. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2015, and the M.Eng. degree from the University of Toronto, Toronto, Canada, in 2017. He is currently pursuing the Ph.D. degree with the School of Information and Electronics, Beijing Institute of Technology. His research interests include satellite communication, satellite networks, and reinforcement learning.

Rongfei Fan (Member, IEEE) received the B.E. degree in communication engineering from the Harbin Institute of Technology, Harbin, China, in 2007, and the Ph.D. degree in electrical engineering from the University of Alberta, Edmonton, Alberta, Canada, in 2012. Since 2013, he has been a Faculty Member with the Beijing Institute of Technology, Beijing, China, where he is currently an Associate Professor with the School of Cyberspace Science and Technology. His research interests include edge computing, federated learning, resource allocation in wireless networks, and statistical signal processing.

Han Hu (Member, IEEE) received the B.E. and Ph.D. degrees from the University of Science and Technology of China, China, in 2007 and 2012, respectively. He is currently a Professor with the School of Information and Electronics, Beijing Institute of Technology, China. His research interests include multimedia networking, edge intelligence, and space-air-ground integrated network. He received several academic awards, including the Best Paper Award of IEEE TCSVT 2019, the Best Paper Award of IEEE Multimedia Magazine 2015, and the Best Paper Award of IEEE Globecom 2013. He served as an Associate Editor for IEEE TRANSACTIONS ON MULTIMEDIA and Ad Hoc Networks, and a TPC Member of Infocom, ACM MM, AAAI, and IJCAI.