Playing With A Multi Armed Bandit To Optimize Resource Allocation in Satellite-Enabled 5G Networks
IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 21, NO. 1, FEBRUARY 2024

Abstract—In this paper, we address issues associated with the effective management of handover events in satellite-enabled 5G network infrastructures. Namely, we devise a strategy for dynamically allocating 5G gNB available resources in the presence of a constellation of LEO satellites, based on several parameters collected and dispatched by an ad hoc orchestration platform. We propose to leverage a Combinatorial Multi-Armed Bandit approach to design a resource allocation game that helps make decisions over time under uncertainty conditions related to the incidence of several factors that can determine the quality of experience perceived on the user equipment side. The introduced approach is a pioneering one, since it allows us to model a joint optimization task as a competitive game in which agents typically share resources with other agents instead of occupying them exclusively. The designed task allows channels to be dynamically enabled and disabled, taking care of the relationships with the lower layer, transparently managing the required handover operations, and considering possible interference with other channels. We discuss an implementation of the proposed solution in a simulated environment. We also analyze the performance it attains, by measuring both its efficiency and efficacy in a trial setup reproducing a scenario with near real-time temporal requirements. Results show that the proposed approach has a linear trend in terms of running time with respect to the number of user equipments and gNBs involved, while achieving a sub-optimal solution of the handover task in around 20-30 rounds.

Index Terms—AI-based management, 5G, satellite handover, multi armed bandit, game theory.

Manuscript received 28 January 2023; revised 24 May 2023 and 26 July 2023; accepted 31 July 2023. Date of publication 8 August 2023; date of current version 7 February 2024. This work is carried out within the framework of the ESA ARTES AT project “ANChOR” (Data-driven Network Controller and Orchestrator for Real-time Network Management) – ESA/ESTEC Contract No.: 4000131447/20/NL/AB. The views expressed herein can in no way be taken to reflect the official opinion of the European Space Agency. The associate editor coordinating the review of this article and approving it for publication was N. Zincir-Heywood. (Corresponding author: Giancarlo Sperlí.)
Antonio Galli is with the Department of Electrical Engineering and Information Technology, University of Naples Federico II, 80125 Naples, Italy (e-mail: [email protected]).
Vincenzo Moscato, Simon Pietro Romano, and Giancarlo Sperlí are with the Department of Electrical Engineering and Information Technology, University of Naples Federico II, 80125 Naples, Italy, and also with the CINI-ITEM National Lab, Complesso Universitario Monte S. Angelo, 80126 Naples, Italy (e-mail: [email protected]; [email protected]; giancarlo.sperli@unina.it).
Digital Object Identifier 10.1109/TNSM.2023.3302064
1932-4537 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Luxembourg. Downloaded on June 03,2024 at 13:44:43 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Low Earth Orbit (LEO) satellite networks, together with the associated mega-constellations, will certainly play in the near future a role that is of paramount importance, especially when it comes to the effective provision of ubiquitous network and communications services [1], [2], [3]. Their adoption does indeed guarantee wide-area coverage and service availability, while at the same time providing continuity and scalability [4], [5], [6]. A number of initiatives have recently emerged that address issues associated with the seamless integration between satellite and terrestrial networks in 5G-enabled next-generation networks [7], [8]. In particular, the satellite communication industry has become increasingly involved with the 3rd Generation Partnership Project (3GPP) standardization activities for 5G, by proposing effective ways for such an integration to take place [9].

In the above depicted scenario, special attention must be devoted to the design and implementation of advanced mechanisms for managing User Equipment (UE) hand-off in a way that does not negatively impact the perceived end-user's quality of experience, while at the same time preserving as much as possible the resources made available by a satellite-enabled 5G access network [10], [11]. In the presence of LEO satellite infrastructures, such a task plays a crucial role, due to the relative movement between satellite constellations and earth. As shown in [2], different drivers entail satellite handover: i) the service time of each satellite is limited; ii) link loss, link interference, and other environmental factors can negatively impact communication; and iii) UE requests are random and unpredictable, while traffic distribution is unbalanced. Although different solutions have been proposed in the literature (see [12] for more details), several challenges are still open due to the continuous movement of UE and satellites. Examples of these issues are related to the high frequency of handovers, which can lead to potential service malfunction, or context conditions, which can affect the variability of signal power and propagation delays, as has also been discussed in [13]. Different strategies have been proposed for handover optimization based on either deep learning techniques [14], [15], or optimization models [16], [17], or even game theory [18], [19]. All such strategies suffer from several limitations in representing real-world scenarios. In fact, the majority of them consider neither environmental information nor UE resource requirements. A few others reduce the complexity of analysis by relaxing some constraints through a model linearization procedure.

In this paper, we address the mentioned issues by designing a joint optimization strategy related to the configuration
and allocation of the physical resources required in order to deploy sensor data channels across a specific coverage area, considering both the existing allocation of resources and the specific satellite constellation setup. The allocation strategy we define allows channels to be dynamically enabled and disabled, taking care of the relationships with the lower layer, transparently managing the required handover operations, and considering possible interference with other channels. We will explain in the paper how we leverage a Combinatorial Multi-Armed Bandit (CMAB) approach and design a resource allocation game that helps make decisions over time under uncertainty conditions related to the incidence of several factors that can determine the quality of experience perceived on the user equipment side. We also demonstrate how the main contribution of our work, when compared to the current state of the art, consists in the fact that it allows us to model a joint optimization task as a competitive game in which agents typically share resources with other ones instead of occupying them exclusively. In particular, we model our task as a contextual multi-agent multi-armed bandit, in which each player can explore a finite set of arms with stochastic reward, whose distribution is player-dependent. In our game each UE plays an arm in order to maximize its own reward without communicating anything to the others because they compete for the use of satellite resources. This approach is particularly suitable when the number of agents is much smaller than the number of arms to be played [20]. In fact, in our case the cardinality of the edge (i.e., arms) set in the proposed graph-based model is much larger than the number of available agents. Furthermore, we model the context about the current allocation of resources through a bipartite graph (defined in Section III-B), in which connections between a satellite and a UE represent different communication channels the terminal can connect to. Edge connection probabilities are dynamically updated under dynamic settings due to UE's dynamic behavior (in terms of both bandwidth request and position), satellite movement over earth orbit, and variability in weather conditions.

We also present an implementation of the proposed solution in a simulated environment and analyze the related performance, by measuring both its efficiency and efficacy in a near real-time scenario. In particular, the efficiency of the proposed approach is evaluated in terms of running time and shows a linear trend with respect to the number of UE and gNBs involved. This makes it feasible also for mega-constellations such as SpaceX [21]. Our tests show that the proposed technique converges to a sub-optimal solution in around 20-30 rounds, as witnessed by an Average Regret value that is close to 0.

The paper is organized into six sections. Section II introduces the CMAB problem and summarizes the state of the art approaches for the handover task, whose definition is described in more detail in Section III. In the same section, we define the proposed data model and design a probability assignment estimation task to deal with the Resource Allocation game in an uncertain setting. The experimental protocol and the dataset generation are detailed in Section IV, while Section V describes the campaign carried out in a simulated environment for evaluating the proposed approach in terms of both efficiency and efficacy. Section VI concludes the paper by also identifying possible directions of future work.

II. RELATED WORKS

In this section, we first discuss the use of a CMAB approach within the context of different network applications to unveil its main peculiarities, with special reference to its ability to make decisions over time under uncertain conditions. Then, we investigate more in-depth current state-of-the-art approaches to the optimization of satellite handover tasks.

A. Multi-Armed Bandit

Markov games are a framework that applies game theory techniques to MDP-like environments [22]. The framework is commonly used to determine the optimal learning strategy for a given problem. Despite its wide adoption, the techniques it leverages are typically ineffective either when the number of agents scales or when dealing with uncertainty problems. In our scenario, decisions over time are strongly affected by the incidence of several factors (e.g., weather conditions or unpredictable and dynamic requests from UE) that can determine the quality of experience perceived on the user equipment side. For this reason, the Combinatorial Multi-armed Bandit (CMAB) framework has been designed as a sequential decision game over T rounds, in which different agents make decisions while being only able to observe limited information when pulling arms [23].

Such a framework has been used to model a wide range of application tasks (see [24] for more details), also defining some optimization strategies for supporting online decision-making, as shown in [25].

An example of possible application concerns the Ultra-Dense Networks (UDNs), in which the densification of Base Stations (BSs) enables user equipment (UE) to connect to multiple BSs. Under this perspective, Qureshi et al. [26] designed a CMAB approach to address the problem of assigning small base stations to multiple mobile data users in heterogeneous settings. Their model takes decisions based on dynamic arrivals of the users and additional information linked to the user and the small BSs (SBSs), i.e., user/SBSs distance, as well as transmission frequency. Another approach in UDNs has been proposed by Zhu et al. [27], whose aim is to maximize the users' Quality of Experience (QoE) by defining an optimization problem based on a multi-agent multi-armed bandit to deal with a scalable video coding problem.

The CMAB framework has also been used for supporting spectrum scheduling tasks. In particular, it has been applied in a wireless scenario for assigning bands of the wireless spectrum as resources to users (see [28] for more details). A similar CMAB model has been designed by Joshi et al. [29] for learning the optimal subset size and optimal channel set with a minimum number of reconstruction failures in the presence of a limited, shared and non-contiguous spectrum.

Finally, the authors of [30] developed an algorithm to deal with resource allocation tasks by using a multi-armed bandit exploration aimed at identifying a trade-off between the
optimal played arm and information accumulated during the round.

The Multi-Armed Bandit is used to effectively support different optimization tasks, as well as resource allocation or spectrum scheduling in the context of network applications (see Table I). For this reason, we decided to integrate the multi-armed bandit methodology in our framework and design a Resource Allocation game that helps make decisions over time under uncertainty conditions related to the incidence of several factors (e.g., meteorological information, orbital resource utilization, etc.).

TABLE I
SUMMARIZING THE STATE OF THE ART ABOUT MULTI-ARMED BANDIT IN TERMS OF TASKS AND APPLICATIONS

B. Satellite Handover

In recent years, with the rapid development of communication technologies, satellite networks have become a significant information infrastructure with global coverage, hence representing a key component of 5G networks. They can indeed be used in addition to terrestrial networks for achieving connection anytime and anywhere [31], [32]. Although the LEO satellite constellation has a smaller transmission delay and lower energy consumption when compared to Geostationary Earth Orbit (GEO) (see [33] for more details), one of the main issues associated with it concerns the satellite handover. Each LEO satellite can only serve UE for a few minutes while supporting different features and satisfying heterogeneous constraints.

Satellite handover may cause issues, such as lower throughput, higher signaling costs, and more frequent communication interruptions. For this reason, it is crucial to design a reasonable satellite handover strategy to improve the Quality of Service (QoS) perceived by the User Terminal, with the aim to satisfy end-users' needs in terms of service performance [34].

Starting from the above considerations, Wang et al. [19] designed a load balancing schema based on the Stackelberg game for dealing with the packet loss caused by a cache overflow of the Low Earth Orbit (LEO) satellites. Namely, their model consists of one leader (a GEO satellite) and multiple followers (LEO satellites), in a multi-layered satellite network (MLSN) that allows for high resource storage capacity.

Furthermore, researchers are focusing on developing techniques and models for resource allocation, which is one of the main topics in 5G networks (see [12], [35] for more details). In particular, there are two main challenges: (i) the bandwidth demand due to the presence of more and more UE with increased computational capabilities, as well as to the rise in data demanding applications, such as online gaming, augmented reality, etc.; (ii) the number of UE connections that is exponentially growing mainly due to the wide deployment of IoT (Internet of Things) devices. To deal with these challenges, several approaches have been proposed, mainly relying either on optimization models or on deep learning-based techniques.

Chen et al. [36] proposed an optimization model for resource allocation in satellite-terrestrial networks that guarantees both efficiency and fairness. In [16], an Individualistic Dynamic Handover Parameter Optimization algorithm based on an Automatic Weight Function has been proposed to estimate the Handover Control Parameters settings for each UE based on its past experience. Zhang et al. [17] proposed a network flows model, in which the requests of UTs determine the weights of the edges, as well as the quality of satellite services according to the computed flow matrix. In turn, Huang et al. [14] dealt with the handover problem by using, for the classification task, a Deep Neural Network model based on the Signal-to-Interference-Plus-Noise Ratio (SINR) values associated with UE. A framework based on deep reinforcement learning has also been developed by Hu et al. [15] for dynamic resource allocation in resource-limited satellite infrastructures.

As clearly highlighted in Table II, different strategies (e.g., deep neural networks, the primary field equilibrium or Stackelberg game) have been proposed for handover optimization. Nevertheless, they all suffer from several limitations in representing real-world scenarios. Most of them are mainly focused on UE location and take into account neither environmental information nor UE resource requirements. In some cases, they reduce the complexity of analysis by relaxing some constraints through a model linearization procedure.

To the best of our knowledge, the introduced approach of applying a CMAB model to support the satellite handover task both effectively and efficiently is a pioneering one in the literature. The task is in our case modeled as a competitive game in which agents typically share resources with other agents instead of occupying them exclusively. Thus the reward is a continuous value in [0, 1] rather than either 0 or 1 [18]. Our framework is based on a combinatorial multi-armed bandit formulation. With respect to real-time decision making, previous works in the literature (see, e.g., [37], [38]) proved the capability of CMAB to effectively support real-time decisions both in the streaming music domain and in real-time gaming.

III. FRAMEWORK

In this section, we describe our framework for dealing with the problem of optimal terminal allocation over different gateways in a 5G-enabled satellite network scenario. The adopted notations are summarized in Table III. In particular, we aim to jointly maximize terminal connectivity on one side and gateway resource allocation on the other.

A. Task Definition

The explosion of the number of connected devices has fostered a new generation of Mobile Satellite Systems (MSSs). In particular, the aim of the designed task concerns the management of handover events by dynamically allocating 5G gNB available resources, based on several indications received from a resource orchestration platform.
TABLE II
SUMMARY OF STATE-OF-THE-ART APPROACHES FOR HANDOVER AND RESOURCE ALLOCATION IN TERMS OF PERFORMED TASK, DOMAIN OF APPLICATION AND DRAWBACKS IDENTIFIED

Fig. 1. Example of handover in a 5G network using one UE and two satellites (SAT 2 and SAT 3). (a) The UE is initially connected to satellite number 2; (b) The UE is going out of reach of satellite number 2; (c) The UE enters the area covered by satellite number 3.

TABLE III
NOMENCLATURE

More in detail, this task deals with configuring and allocating the physical resources required to establish sensor data channels on a specific coverage area, considering both the existing allocation of resources and the satellite constellation setup.

In particular, the task allows channels to be enabled and disabled as needed, taking care of the relationships with the lower layer, transparently managing the required handover operations, and considering possible interference with other channels. The dynamic allocation challenge aims to properly manage resource reservation according to specific Service Level Agreements (SLA), or even to deal with the attenuation of power signal due to the high number of connected UE [39], [40]. Being provided through LEO satellites, these services must comply with the technology constraints and link budget limitations. Furthermore, due to the relative movement between satellite constellations and Earth, it is necessary to ensure dynamic access to 5G massive Machine Type Communications (mMTC) services over any target geographical area, as represented in Figure 1, where the UE performs a handover from satellite number 2 (Sat2 in Figure 1(b)) to satellite number 3 (Sat3 in Figure 1(c)). In particular, it is important to note that the UE in Figure 1(b) is going to leave the area served by satellite 2 while it is entering the one managed by satellite 3. This scenario may affect the QoS of the UE due to channel interference. It hence requires the definition of a handover strategy that is both effective and efficient.

Definition 1 (Resource Allocation): Let $U_e$ and $G_a$ be, respectively, a set of UE and a set of gateways. The aim of the task concerns the resource allocation of each terminal $u \in U_e$ to the suitable gateway $g \in G_a$, also considering possible handover, for each round t over a proper time horizon T.

Thus, the problem is to assign, for each considered timestamp, a given UE u to the most suitable gateway g on the basis of network conditions, in order to jointly maximize terminal connectivity and gateway resource allocation.
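As a rough illustration of Definition 1, the allocation chosen in one round can be viewed as a mapping from each UE to a single gateway, and a handover as a change in that mapping between two consecutive rounds. The Python sketch below is not taken from the paper's implementation: the names `Allocation` and `handovers`, and the identifiers `UE0`, `UE1`, `SAT2` and `SAT3` (chosen to mirror Figure 1), are assumptions introduced only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one round's allocation maps a UE id to a gateway id.
Allocation = Dict[str, str]


def handovers(prev: Allocation, curr: Allocation) -> List[Tuple[str, str, str]]:
    """Return (ue, old_gateway, new_gateway) for every UE whose gateway changed."""
    events = []
    for ue in sorted(curr):
        old = prev.get(ue)
        new = curr[ue]
        # A UE that was already attached and now uses a different gateway
        # has been handed over between the two rounds.
        if old is not None and old != new:
            events.append((ue, old, new))
    return events


# Two consecutive rounds mirroring Figure 1: UE0 moves from SAT2 to SAT3.
round_t = {"UE0": "SAT2", "UE1": "SAT2"}
round_t1 = {"UE0": "SAT3", "UE1": "SAT2"}
print(handovers(round_t, round_t1))
```

Because each UE appears once as a dictionary key, the constraint that a UE is attached to a single gateway at any given time holds by construction in this representation.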
Our assumption is to model the network scenario as a graph and leverage it as the underlying data structure for a specific CMAB game.

B. Model Definition

Our model relies upon two main entities, i.e., a User Equipment and a Gateway. The user equipment (e.g., smartphone, laptop, or smart device) can be interconnected at any given time to a single gateway, representing a connection point with the satellite network. In particular, we model our scenario through a bipartite graph, whose formal representation is discussed in Definition 2.

Definition 2 (Network Connection Graph): Let $U_e$ and $G_a$ be, respectively, the set of Terminals and the set of Gateways. We define the Network Connection Graph (NCG) as a bipartite multi-graph NCG = (V, E, W), where the vertex set is composed of the union of the terminals and gateways ($V = U_e \cup G_a$), while the edge set E is only composed of directed links $e_{ijk}$ from a terminal $i \in U_e$ to a gateway $j \in G_a$ across the channel k, whose weights are represented using a proper function $w : V \times V \to [0, 1]$.

Each vertex of the NCG is characterized by a set of attributes such as the position in Earth-Centered Inertial (ECI) format¹, the number of attached UE (for gNB type of nodes) and information about traffic profile. Such attributes can be used for estimating the resources consumed by a gateway without resorting to self-loops, which the NCG excludes by definition. Furthermore, edge weight $w_{ij}$ represents the probability that UE i is associated with a given satellite j through a specific channel. Hence, we model UE competition with respect to accessing satellite resources through the NCG: each UE can get connected to a specific satellite in the constellation with an attachment probability that is dynamically updated through the algorithm described in Section III-C.

¹ Earth-Centered Inertial (ECI) is a format based on three Cartesian axes, whose centre is fixed in the centre of the Earth; the z-axis extends in the direction of the North Pole, while the x- and y-axes extend in the equatorial plane pointing to two fixed stars.

Working Example 1: Let $S_0$ and $S_1$ be two satellite gateways, having two and one channels respectively, to which different user terminals are connected, as shown in Figure 2(a). This scenario can be modeled through an NCG (see Figure 2(b)) represented as a bipartite multi-graph, which interconnects each terminal with the possible satellites to which it can be connected through different channels (represented by different edges from a UE to a satellite node). In particular, we outline in bold the channel of choice which connects a UE to the related satellite.

C. Multi-Armed Bandit Game

In this section, we describe our approach based on game theory, whose aim is to compute a probability assignment distribution over different gateways for each terminal. In particular, we design our scenario as a multi-armed bandit problem according to Definition 3, where different terminals compete for the use of gateway resources, whose internal parameters (i.e., employed resources, number of connected terminals, network parameters) dynamically vary during each round of the game. In our vision, a single round is considered as the time period required for a subsequent handover event to take place. In fact, continuously performing handover procedures has an impact on both the constant reconfiguration of the network and the Quality of Service perceived by UE, as shown in [41]. Environment uncertainties have been modeled by varying, among rounds, the following parameters: (i) satellite positions over the earth orbit; (ii) UE positions; (iii) bandwidth requests; (iv) weather conditions.

Definition 3 (Resource Allocation Game): Let M, N and A be, respectively, the sets of terminals, gateways, and actions. We define the Resource Allocation Game as a tuple (A, R), where a terminal i makes a request for associating to a satellite j (an action $a_{ij}(t) \in A$) at each time step t, thus receiving an expected reward $Q(a_{ij}(t)) = E[r_i \mid a_{ij}] = \Theta_{ij}$, where $r_i \in R$ and $\Theta_{ij}$ represent, respectively, the reward value and the reward probability.

We claim that the multi-armed bandit problem fits our task because it requires making decisions over time under uncertainty. In our case, these actions might be affected by several uncertainty factors (e.g., meteorological conditions, orbital placement, and resource utilization). In particular, the dynamic behavior of UE, representing the different types of interaction with the available services, in conjunction with variability in weather conditions, introduces a non-linearity in
the optimization model. This entails uncertainty in identifying the optimal resource allocation. Furthermore, the randomness of user requests, as well as the unbalanced traffic distribution over different satellites, contribute to increasing the level of uncertainty in dealing with the task. For this reason, our approach relies on a feedback strategy, computed on the basis of the chosen arm (that could be played by an agent), as well as of the related reward, that is the gain obtained by playing the chosen arm. In a single round, each agent, corresponding to a UE, can play an arm, i.e., enable a specific satellite channel on which to transmit its information flow. At the beginning of each round, the proposed approach plays a super-arm, which corresponds to activating a single channel (i.e., an edge in the NCG) having the highest probability for each UE, while at the same time preventing the same UE from being simultaneously connected to different satellites. Hence, the super-arm represents the actions of all agents in each round with the aim to maximize their own rewards, while also satisfying their bandwidth requirements. It is worth noting that playing arms having the highest probabilities is not always the best strategy because of possible changes in both satellite orbits and UE positions. Variations in bandwidth requests, as well as weather conditions, can also affect the QoS perceived by UE. This uncertainty has been modeled through a feedback mechanism combined with an exploration-exploitation strategy that allows updating the connection probability between satellites and UE in a dynamic environment, while also avoiding the risk of getting caught in a local optimum. Furthermore, we model the context about the current allocation of resources through a bipartite graph (defined in Section III-B), whose estimated connection probabilities ($\hat{\mu}$) among user terminals and satellites over specified channels are dynamically updated by playing different rounds on the basis of historical trials.

We summarize the used notations in Table IV to underline the correlation between the CMAB concepts and the scenario we designed.

TABLE IV
USED NOTATION

The Resource Allocation Game does not satisfy the Independent and Identically Distributed (i.i.d.) assumption because the same action does not achieve the same rewards in each round. In fact, the mean reward vector $\mu \in [0, 1]$ for each arm, whose reward can be either 0 or 1 (connected or not connected to a gateway) according to a Bernoulli distribution, depends on different parameters (e.g., the distance between terminal and gateway, used resources, network properties). The difference $\Delta(a_{ij}) = \mu(a^*) - \mu(a_{ij})$, also called the gap of arm $a_{ij}$, describes how bad arm $a_{ij}$ is compared to the best mean reward $\mu(a^*) = \max_{j \in G_a} \mu(a_{ij})$. In particular, we aim to minimize the regret R over T rounds according to equation (1):

$$R(T) = \mu(a^*) \cdot T - \sum_{t=1}^{T} \mu\big(a_{ij}(t)\big) \tag{1}$$

where $\mu(a^*) \cdot T$ is the cumulative reward obtained by playing the best arm $a^*$ (note that it is not necessarily unique) in each round, corresponding to the best possible total expected reward associated with the gateway it is more likely to be connected to, and $\sum_{t=1}^{T} \mu(a_{ij}(t))$ is the sum of the rewards obtained by choosing that arm in each round.

Nevertheless, we introduce an expected regret E[R(T)] over the time horizon T, whose reward probability is $\Theta(T)$ because it is not possible to identify the optimal action in advance. Therefore, our goal is to minimize the loss $L_T$ over the T rounds, whose formal definition is provided in Equation (2):

$$L_T = E[\Theta(T)] = E\left[\sum_{t=1}^{T} \big(\Theta^* - Q(a_t)\big)\right] \tag{2}$$

where the optimal reward probability $\Theta^* = Q(a^*) = \max_{j \in G_a} Q(a_{ij}) = \max_{1 \le j \le N} \Theta_{ij}$ is related to the optimal action $a^*$.

Algorithm 1 Assignment Probability Estimation (APE)
Require: A Network Connection Graph NCG = (V, E, W), Feedback mechanism M and Oracle O
Ensure: Possible assignment pairs ($G_a$, $U_e$)
 1: procedure APE(NCG, k)
 2:     Initialize $\hat{\mu}$
 3:     for s ← 0 to T do
 4:         is_exploit ← random_value(0, 1)
 5:         if is_exploit then
 6:             S ← EXPLOIT(NCG, k, O, $\hat{\mu}$)
 7:         else
 8:             S ← EXPLORE(NCG, k)
 9:         end if
10:         Play arm A, observing the assignment process
11:         Compute the reward of the played arms
12:         $\hat{\mu}$ = UPDATE(NCG, M, AES)
13:     end for
14:     return ($G_a$, $U_e$)
15: end procedure

1) Algorithm: Algorithm 1 aims to estimate the probability that a UE can be attached to the most suitable satellite through N discrete steps (rows 3-13). In each round, a set of k user equipment is selected by activating the edges having a
significant assignment probability. The others are assigned to the suitable satellites, observing the generated rewards.

Algorithm 1 gets as input the Network Connection Graph, whose edge assignment probabilities have been randomly selected considering the initial configuration. In each round, different arms can be played, each one associated with a random variable $X_{i,s} \in [0, 1]$, whose mean $\mu_i$ is defined as the reward obtained by playing that arm in the examined round. Furthermore, a super-arm A is played at each round (row 10), involving the simultaneous activation of a subset of arms related to a set of selected nodes (Seed Set S). In particular, these arms are chosen in descending order of assignment probability, so as to activate a single edge between a UE and the most suitable satellite. Edge weights are initially set to the same value. Then, they are updated after each round through the update function (row 12), which computes the mean of the probability assignment over the round itself.

The initial set of k nodes can be chosen by the exploitation (row 6) or the exploration (row 8) strategy at the beginning of each round. The exploitation strategy, which corresponds to pulling the arm with the highest expected reward, leverages an oracle algorithm to deal with the handover task. This is done by minimizing the difference between the UE bandwidth requirements and the available resources from the satellite constellation while estimating influence probabilities (SeedSet). In particular, the oracle provides an approximation of the optimal solution by solving an optimization problem defined on the basis of weather conditions, the distance between UE and satellite, UE requirements, and satellite available resources. In turn, the exploration strategy (i.e., sampling arms to learn about them) chooses the seed set at random. Once the seed set has been chosen, the super-arm may be played by investigating the assignment process started when additional “weapons” are activated as a consequence. At the end of the round, the estimated average ($\hat{\mu}$) is updated (row 12) according to Equation (3):

\[ \hat{\mu}_i = \frac{\sum_{s=1}^{t} X_{i,s}}{T_{i,t}} \tag{3} \]

To summarize, an optimal balance between these two strategies must be struck in order to reduce the probability of ending up in a local optimum, which is a risk when leveraging just the exploitation strategy [42]. The final aim is to identify the best seed set for each round on the basis of the previous ones.

Working Example 2: Let us consider again Example 1, in which we have two satellites (S0 and S1), with two channels and one channel, respectively. Different user terminals are connected to such satellites. Let us assume that S0 and S1 have the same bandwidth for each channel, while the two user equipment have different requirements in each round. In our analysis, we also assume that weather information is typical of a sunny day (according to the label provided by OpenWeatherMap).² Resource allocation is performed across three simulation rounds (1 minute per round). The proposed algorithm aims to identify the optimal UE allocation configuration by updating the association probability distributions for each channel. Therefore, the algorithm assigns to each UE the related probability distribution. For instance, it assigns to the first user equipment (U0) the following probabilities on the basis of the above-defined conditions: <(0.4, S0,0), (0.3, S0,1), (0.3, S1,0)>, where the first element of each pair is the assignment probability and the second element is the identifier of the selected satellite and related channel.

² https://openweathermap.org/history

2) Time Complexity: Combinatorial Multi-Armed Bandit (CMAB) covers a large class of combinatorial online learning problems under stochastic settings, where subsets of base arms with unknown distributions can be played, and whose reward depends on unknown stochastic behaviors. In particular, we say that an algorithm solves the CMAB problem if it matches the $R_T = O(\log T)$ lower bound. The CMAB problem, under specific conditions, has been shown to achieve a tight asymptotic regret of $\Theta(\log n)$. As demonstrated in [43], [44], and [45], $O(\log n)$ regret can be achieved uniformly over time rather than just asymptotically.

IV. EXPERIMENTAL EVALUATION

In this section, we first describe the experimental protocol for performing our evaluation analysis, focusing on how the dataset has been generated. Then, we discuss the obtained results.

A. Experimental Protocol

We herein evaluate both the efficiency and the efficacy of the proposed approach based on a multi-armed bandit strategy, whose aim is to estimate an assignment probability distribution w.r.t. a given constellation of satellites. In particular, the aim of our evaluation is threefold:
1) Evaluating the running time of the proposed methodology in an environment with strict temporal requirements, i.e., near real-time; namely, we evaluate the running time by varying the seed set size (k), as well as the probability distribution between exploitation and exploration strategies, while also increasing the number of graph objects (total number of nodes and edges);
2) Evaluating the proposed methodology in terms of the regret/reward achieved, varying both the seed set size and the probability distribution between exploitation and exploration strategies;
3) Further comparing the proposed approach with two different baselines obtained, respectively, through an optimization-based [17] and a random-based technique. The comparison is made in terms of execution time and achieved regret/reward.

We split a simulation into multiple time intervals with a fixed duration, called simulation rounds. It is worth noting that, for each round of the simulation, we vary both bandwidth demands and UE positions, as well as weather conditions and satellite positions along the Earth orbit. Furthermore, we will also present a case study in which the resources required by the different user terminals exceed those made available by the satellites (due to either lack of resources or adverse weather conditions).
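The two quantities varied in this protocol, the seed set size k and the exploit/explore probability, can be made concrete with a small sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' Algorithm 1: epsilon is taken as the probability of exploring, exploitation is replaced by a plain greedy choice over the current estimates (the paper uses an oracle that solves an optimization problem), rewards are drawn from fixed Bernoulli probabilities (the paper's rewards are not i.i.d.), and all function names and probability values are hypothetical. Only the running mean follows Equation (3).

```python
import random


def update_mean(reward_sum, pulls):
    """Empirical mean as in Equation (3): summed rewards / number of plays."""
    return reward_sum / pulls if pulls else 0.0


def select_seed_set(estimates, k, epsilon, rng):
    """Return k arms: random ones with probability epsilon, else the k best estimates."""
    arms = list(estimates)
    if rng.random() < epsilon:
        return rng.sample(arms, k)  # exploration: seed set chosen at random
    # Exploitation stand-in (assumption): greedy choice instead of the paper's oracle.
    return sorted(arms, key=lambda a: estimates[a], reverse=True)[:k]


def run_rounds(true_probs, k, epsilon, rounds, seed=0):
    """Simulate `rounds` rounds; return (estimates, cumulative regret per round)."""
    rng = random.Random(seed)
    sums = {arm: 0.0 for arm in true_probs}
    pulls = {arm: 0 for arm in true_probs}
    estimates = {arm: 0.0 for arm in true_probs}
    # Hypothetical benchmark: expected reward of the k truly best arms.
    best = sum(sorted(true_probs.values(), reverse=True)[:k])
    cumulative, regret = 0.0, []
    for _ in range(rounds):
        chosen = select_seed_set(estimates, k, epsilon, rng)
        for arm in chosen:
            reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # Bernoulli draw
            sums[arm] += reward
            pulls[arm] += 1
            estimates[arm] = update_mean(sums[arm], pulls[arm])
        cumulative += best - sum(true_probs[arm] for arm in chosen)
        regret.append(cumulative)
    return estimates, regret


if __name__ == "__main__":
    # Arm names follow Working Example 2; the probabilities are made up.
    probs = {"S0,0": 0.8, "S0,1": 0.5, "S1,0": 0.2}
    est, reg = run_rounds(probs, k=1, epsilon=0.3, rounds=50)
    print(est, reg[-1])
```

Sweeping `k` and `epsilon` in `run_rounds` mirrors, in a toy setting, the first two evaluation goals listed above.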
Fig. 3. Efficiency analysis varying the number of UE and gNB nodes and setting the value of epsilon to 0.5.

TABLE V. DATASET CHARACTERIZATION. NOTE THAT WEATHER INFORMATION IS GATHERED FROM OPENWEATHERMAP

The evaluation has been carried out on the platform-as-a-service (PaaS) Google Colab,³ equipped with a single-core hyper-threaded Xeon processor @ 2.2 GHz, 12 GB of RAM and a Tesla T4 GPU.

B. Dataset Generation

We generated the artificial dataset through a simulator, creating an mMTC scenario with a LEO satellite constellation as RAN (Radio Access Network). The simulator is mainly based on three different components, written in the C++ programming language: i) the ns-3 official release,⁴ representing the main structure of the simulated network, ii) a 5G LENA module, which provides the 5G New Radio (NR) interface for UE and gNBs, and iii) a satellite mobility module, computing and periodically updating satellite node positions. The aim of the simulator is to set the spectrum resources allocated to 5G UE by placing different gNBs and user equipment at different positions, as identified by ECI (Earth-Centered Inertial) coordinates.

The overall simulation is split into multiple rounds with a fixed round duration, mainly because the 5G LENA module does not yet support handover. In fact, handover events and the considered resource reallocation cannot be enforced while the simulation is still running. Thus, the resource configuration may change, and handover events can occur, between one simulation round and the next.

At each simulation round, network information is provided in order to model the interaction between UE and gNB in terms of Distance (km), ElevationAngle (deg), and RemainingContactTime (sec). Different spectrum features are further provided in terms of Number of active frequency bands (FB), Central frequency of each FB, Bandwidth of each FB, Number of Component Carriers (CC) of each FB, Central frequency of each CC, Bandwidth of each CC, Number of Bandwidth Parts (BWP) of each CC, Central frequency of each BWP, Bandwidth of each BWP, and Numerology of each BWP. All such features are considered when making decisions about handover. We further enriched this dataset by integrating both user equipment and satellite requirements, in terms of required and available resources, as well as related weather information such as temperature, pressure, humidity and wind speed. Table V summarizes the main information about the dataset, whilst the weather information is retrieved from OpenWeatherMap.²

By running traffic through the configured GWs, it is possible to collect a series of real-time values (long-term monitoring and collection) for the SNR and hence create the data sets for driving decisions on the best satellite channel configuration option. In addition, data for the multi-gateway switching can be extracted from external open dataset sources, such as OpenWeatherMap.²

³ https://research.google.com
⁴ https://www.nsnam.org/releases/

V. EXPERIMENTAL RESULTS

In view of the demanding constraints of the handover task, we are interested in evaluating the efficiency of the proposed approach when varying the number of objects in the graph (see Figure 3, Figure 4 and Figure 5). In particular, two main factors affect the efficiency of the proposed approach: i) the probability of choosing between exploit and explore strategies (epsilon value) and ii) the number of graph objects (gNB and UE with the related relationships) and provided resources.

Increasing the probability of performing the exploit strategy (see Figure 4), which corresponds to decreasing the value of epsilon below 0.5, leads to a higher execution time. This is due to the fact that the exploit strategy requires identifying the best solution for the allocation of user equipment with respect to gNBs at each round. In turn, Figure 5 shows a lower execution time when playing the exploit strategy 30% of the time. This highlights a constant trend when increasing
both the number of graph objects and rounds. In fact, this strategy randomly assigns a user equipment to a possible satellite, without guaranteeing that the optimal solution is achieved. In particular, we can observe peaks in Figures 3-5 because of the dynamic behavior of the task, which strongly affects performance. Increasing the number of UE and gNBs makes things even more complicated. It is worth noting that the majority of peaks happen when leveraging exploitation rather than exploration. The dynamic bandwidth demands of each UE, the movement of satellites along the Earth’s orbit, and the variability of weather conditions are the main causes behind the generation of such peaks. As already stated, exploitation also calls for a higher execution time. On the other hand, using the exploration strategy reduces the number of peaks, but also lowers the overall effectiveness of the proposal due to the random assignment of UE to satellites.

Fig. 4. Efficiency analysis varying the number of UE and gNB nodes and setting the value of epsilon to 0.3 (corresponding to choosing the exploit strategy 70% of the time).

Fig. 5. Efficiency analysis varying the number of UE and gNB nodes and setting the epsilon value to 0.7 (corresponding to choosing the exploit strategy 30% of the time).

Fig. 6. Efficacy analysis varying the number of UE and gNB and setting the epsilon value to 0.5.

Furthermore, the rise in the number of graph nodes (UE and gNB) involves a simultaneous increase in the execution time per round, from $10^{-3}$ up to 3 seconds (see Figures 3(a)-3(c), 4(a)-4(c), 5(a)-5(c)). In fact, the execution time of our approach has a linear trend with respect to the number of graph objects, which makes it feasible also for mega-constellations such as SpaceX [21].

In turn, we summarize the effectiveness analysis of the proposed approach in Figures 6-8, varying both the number of nodes and the value of epsilon. It is possible to note that the proposed approach converges to a sub-optimal solution (with respect to the oracle’s outcome) in around 20-30 rounds. In fact, the Average Regret reaches a value close to 0 after 20-30 rounds for all configurations of the considered network topology. The increase in the number of rounds
required to reach an optimal value is related to the increase in the number of graph objects.

Fig. 7. Efficacy analysis varying the number of UE and gNB and setting the epsilon value to 0.3 (corresponding to choosing the exploit strategy 70% of the time).

Fig. 8. Efficacy analysis varying the number of UE and gNB and setting the epsilon value to 0.7 (corresponding to choosing the exploit strategy 30% of the time).

Furthermore, by varying the value of epsilon (which determines the probability of choosing either the exploit or the explore strategy), our proposal is able to learn fast in all cases by strongly diminishing the Average Regret. However, as it is easy to notice from Figures 6(b) and 6(c), our approach learns faster when the probability of choosing the exploit strategy is higher than the probability of choosing the explore strategy (i.e., when epsilon is equal to 0.3, which corresponds to choosing the exploit strategy 70% of the time). This happens because the exploit strategy allows leveraging the historical outcomes of the past rounds for identifying the best solution on the basis of the specific context. Although such a result leads us to choose predominantly the exploit strategy, the latter can often get stuck in a local optimum (as shown in [42]). Hence, a proper trade-off must be found between these two strategies before arriving at a suitable solution for the designed task. In summary, the predominant choice of the exploration strategy results in a less stable solution but with a faster running time, while the exploitation case yields a more stable solution but with a longer running time. Furthermore, increasing the number of satellites (shown in Figures 6(c)-8(c)) leads to a rise in the available bandwidth provided by the constellation. As a result, this increase allows for better satisfaction of UE requests, which smooths the curve.

We also consider the case in which the satellite resources are not able to satisfy the user equipment needs. In Figures 9 and 10 we evaluate the proposed approach considering both the case in which the available satellite resources exceed the ones required by the set of UE (see Figures 9(a) and 10(a)) and the counter case (see Figures 9(b) and 10(b)). It is easy to notice that in the latter case there is an increase in the time needed to configure the required resources at the UE side, due to a lack of resources made available by the satellites (see Figure 9(a) vs Figure 9(b)). For the very same reason, there is a slower learning of the optimal configurations, which results in a higher average regret (see Figure 10(a) vs Figure 10(b)).

We finally compare the proposed approach with two different baselines (optimization-based [17] and random-based strategies, respectively). Figures 11, 12, 13 and 14 represent the efficiency and efficacy analysis for both the baselines, respectively. In these Figures, it is easy to notice that the optimization-based strategy requires a higher execution time, though its regret converges to 0 in a lower number of iterations (around 20 rounds versus 30-40 rounds). Indeed, our solution aims to leverage both approaches, so as to achieve the two-fold objective of reducing the execution time of optimization-based strategies on one side and improving the effectiveness of the random-based approach (around 20 rounds) on the other. This is shown in Figures 3-8.

In conclusion, it is possible to remark that our approach aims to identify a trade-off between effectiveness and efficiency for supporting different types of 5G-based services as well as mMTC traffic profiles. We finally observe how our approach shows, with the experimental setup described in Section IV-A, a running time of a few seconds, which
is quite high. This, however, is mainly due to the limited computing power used for the simulations (PaaS deployment with 12 GB of RAM). The use of Big Data architectures and technologies, which are able to effectively handle large amounts of information, would significantly reduce the running time of the proposed approach.

Fig. 9. Efficiency analysis setting the epsilon value to 0.5 and the number of gNB and UE to 100. (a) Represents the case when satellite resources exceed UE’s demand. (b) Reproduces the counter case.

Fig. 10. Efficacy analysis setting the epsilon value to 0.5 and the number of gNB and UE to 100. (a) Represents the case when satellite resources exceed UE’s demand. (b) Is the counter case.

Fig. 11. Efficiency analysis for the optimization-based strategy [17], by varying the number of UE and gNB.

Fig. 12. Efficacy analysis for the optimization-based strategy [17], by varying the number of UE and gNB.
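The efficacy analysis in this section reads convergence off the Average Regret curve (a value close to 0 after about 20-30 rounds). The Python sketch below shows one way such a reading could be automated. It is an illustration only: the text does not spell out how the Average Regret is computed, so the running average used here is an assumption, and the function names and the tolerance parameter are hypothetical.

```python
def average_regret(per_round_regret):
    """Running average of the per-round regret: avg[t] = (r_1 + ... + r_t) / t."""
    averages, total = [], 0.0
    for t, regret in enumerate(per_round_regret, start=1):
        total += regret
        averages.append(total / t)
    return averages


def first_round_below(averages, tolerance):
    """1-based round from which the average stays <= tolerance, or None if it never does."""
    last_above = 0  # 1-based index of the last round whose average exceeds the tolerance
    for t, avg in enumerate(averages, start=1):
        if avg > tolerance:
            last_above = t
    if last_above == len(averages):
        return None  # still above the tolerance at the final round (or no data)
    return last_above + 1


if __name__ == "__main__":
    # Made-up per-round regret values, for illustration only.
    curve = average_regret([1.0, 1.0, 0.0, 0.0])
    print(curve, first_round_below(curve, 0.7))
```

Applied to the regret series of different epsilon values or node counts, `first_round_below` gives a single number per configuration that can be compared across runs.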
Fig. 13. Efficiency analysis for the random-based strategy [17], by varying the number of UE and gNB.
Fig. 14. Efficacy analysis for the random-based strategy [17], by varying the number of UE and gNB.
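The random-based baseline of Figs. 13 and 14 is not specified in detail in this section. As a minimal sketch, assuming it draws one candidate satellite channel uniformly at random for each UE, it could look like the following Python code. The function name, the input layout and the candidate lists are hypothetical.

```python
import random


def random_assignment(candidates, rng):
    """Map each UE to one candidate channel drawn uniformly at random.

    UEs without any candidate channel are left unassigned.
    """
    return {ue: rng.choice(channels) for ue, channels in candidates.items() if channels}


if __name__ == "__main__":
    # Identifiers follow Working Example 2; the candidate lists are illustrative.
    candidates = {"U0": ["S0,0", "S0,1", "S1,0"], "U1": ["S0,1", "S1,0"]}
    print(random_assignment(candidates, random.Random(42)))
```

Because no estimate is kept between rounds, this baseline is cheap to run but cannot learn from past rewards, which matches the lower effectiveness reported for random assignment.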
[10] J. Guo, D. Rincón, S. Sallent, L. Yang, X. Chen, and X. Chen, “Gateway placement optimization in LEO satellite networks based on traffic estimation,” IEEE Trans. Veh. Technol., vol. 70, no. 4, pp. 3860–3876, Apr. 2021.
[11] D. Zhou, M. Sheng, J. Wu, J. Li, and Z. Han, “Gateway placement in integrated satellite–terrestrial networks: Supporting communications and Internet of Remote Things,” IEEE Internet Things J., vol. 9, no. 6, pp. 4421–4434, Mar. 2022.
[12] E. Juan, M. Lauridsen, J. Wigard, and P. Mogensen, “Handover solutions for 5G low-earth orbit satellite networks,” IEEE Access, vol. 10, pp. 93309–93325, 2022.
[13] I. Shayea, M. Ergen, M. Hadri Azmi, S. Aldirmaz Çolak, R. Nordin, and Y. I. Daradkeh, “Key challenges, drivers and solutions for mobility management in 5G networks: A survey,” IEEE Access, vol. 8, pp. 172534–172552, 2020.
[14] Z.-H. Huang, Y.-L. Hsu, P.-K. Chang, and M.-J. Tsai, “Efficient handover algorithm in 5G networks using deep learning,” in Proc. IEEE Global Commun. Conf., 2020, pp. 1–6.
[15] X. Hu, S. Liu, R. Chen, W. Wang, and C. Wang, “A deep reinforcement learning-based framework for dynamic resource allocation in multibeam satellite systems,” IEEE Commun. Lett., vol. 22, no. 8, pp. 1612–1615, Aug. 2018.
[16] I. Shayea, M. Ergen, A. Azizan, M. Ismail, and Y. I. Daradkeh, “Individualistic dynamic handover parameter self-optimization algorithm for 5G networks based on automatic weight function,” IEEE Access, vol. 8, pp. 214392–214412, 2020.
[17] S. Zhang, A. Liu, C. Han, X. Ding, and X. Liang, “A network-flows-based satellite handover strategy for LEO satellite networks,” IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2669–2673, Dec. 2021.
[18] P. Yang, K. Iyer, and P. Frazier, “Mean field equilibria for resource competition in spatial settings,” Stochast. Syst., vol. 8, no. 4, pp. 307–334, 2018.
[19] E. Wang, H. Li, and S. Zhang, “Load balancing based on cache resource allocation in satellite networks,” IEEE Access, vol. 7, pp. 56864–56879, 2019.
[20] S. Hossain, E. Micha, and N. Shah, “Fair algorithms for multi-agent multi-armed bandits,” in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 24005–24017.
[21] A. C. Boley and M. Byers, “Satellite mega-constellations create risks in low earth orbit, the atmosphere and on earth,” Sci. Rep., vol. 11, no. 1, pp. 1–8, 2021.
[22] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Proc. 11th Int. Conf. Mach. Learn., 1994, pp. 157–163.
[23] S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
[24] D. Bouneffouf and I. Rish, “A survey on practical applications of multi-armed and contextual bandits,” 2019, arXiv:1904.10040.
[25] L. Chen, J. Xu, and Z. Lu, “Contextual combinatorial multi-armed bandits with volatile arms and submodular reward,” in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 3247–3256.
[26] M. A. Qureshi, A. Nika, and C. Tekin, “Multi-user small base station association via contextual combinatorial volatile bandits,” IEEE Trans. Commun., vol. 69, no. 6, pp. 3726–3740, Jun. 2021.
[27] K. Zhu, L. Li, Y. Xu, T. Zhang, and L. Zhou, “Multi-connection based scalable video streaming in UDNs: A multi-agent multi-armed bandit approach,” IEEE Trans. Wireless Commun., vol. 21, no. 2, pp. 1156–1169, Feb. 2022.
[28] F. Li, D. Yu, H. Yang, J. Yu, H. Karl, and X. Cheng, “Multi-armed-bandit-based spectrum scheduling algorithms in wireless networks: A survey,” IEEE Wireless Commun., vol. 27, no. 1, pp. 24–30, Feb. 2020.
[29] H. Joshi, S. Santra, S. J. Darak, M. K. Hanawal, and S. V. S. Santosh, “Multiplay multiarmed bandit algorithm based sensing of noncontiguous wideband spectrum for AIoT networks,” IEEE Trans. Ind. Informat., vol. 18, no. 5, pp. 3337–3348, May 2022.
[30] B. Thananjeyan, K. Kandasamy, I. Stoica, M. Jordan, K. Goldberg, and J. Gonzalez, “Resource allocation in multi-armed bandit exploration: Overcoming sublinear scaling with adaptive parallelism,” in Proc. Int. Conf. Mach. Learn., 2021, pp. 10236–10246.
[31] A. Guidotti et al., “Architectures and key technical challenges for 5G systems incorporating satellites,” IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 2624–2639, Mar. 2019.
[32] L. You, K.-X. Li, J. Wang, X. Gao, X.-G. Xia, and B. Ottersten, “Massive MIMO transmission for LEO satellite communications,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1851–1865, Aug. 2020.
[33] C. Han, L. Huo, X. Tong, H. Wang, and X. Liu, “Spatial anti-jamming scheme for Internet of Satellites based on the deep reinforcement learning and Stackelberg game,” IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5331–5342, May 2020.
[34] M. S. Mollel et al., “A survey of machine learning applications to handover management in 5G and beyond,” IEEE Access, vol. 9, pp. 45770–45802, 2021.
[35] N. Sharma and K. Kumar, “Resource allocation trends for ultra dense networks in 5G and beyond networks: A classification and comprehensive survey,” Phys. Commun., vol. 48, Oct. 2021, Art. no. 101415.
[36] Z. Chen, D. Guo, K. An, B. Zhang, X. Zhang, and B. Zhao, “Efficient and fair resource allocation scheme for cognitive satellite-terrestrial networks,” IEEE Access, vol. 7, pp. 145124–145133, 2019.
[37] W. Bendada, G. Salha, and T. Bontempelli, “Carousel personalization in music streaming apps with contextual bandits,” in Proc. RecSys, New York, NY, USA, 2020, pp. 420–425.
[38] S. Ontañón, “Combinatorial multi-armed bandits for real-time strategy games,” J. Artif. Intell. Res., vol. 58, no. 1, pp. 665–702, 2017.
[39] X. Cheng, Z. Huang, and L. Bai, “Channel nonstationarity and consistency for beyond 5G and 6G: A survey,” IEEE Commun. Surveys Tuts., vol. 24, no. 3, pp. 1634–1669, 3rd Quart., 2022.
[40] W. Shi, W. Xu, X. You, C. Zhao, and K. Wei, “Intelligent reflection enabling technologies for integrated and green Internet-of-Everything beyond 5G: Communication, sensing, and security,” IEEE Wireless Commun., vol. 30, no. 2, pp. 147–154, Apr. 2023.
[41] M. S. Mollel et al., “A survey of machine learning applications to handover management in 5G and beyond,” IEEE Access, vol. 9, pp. 45770–45802, 2021.
[42] W. V. F. Mauricio, T. F. Maciel, A. Klein, and F. R. M. Lima, “Scheduling for massive MIMO with hybrid precoding using contextual multi-armed bandits,” IEEE Trans. Veh. Technol., vol. 71, no. 7, pp. 7397–7413, Jul. 2022.
[43] T. L. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Adv. Appl. Math., vol. 6, no. 1, pp. 4–22, 1985.
[44] W. Chen, Y. Wang, and Y. Yuan, “Combinatorial multi-armed bandit: General framework and applications,” in Proc. 30th Int. Conf. Mach. Learn., Jun. 2013, pp. 151–159.
[45] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “The nonstochastic multiarmed bandit problem,” SIAM J. Comput., vol. 32, no. 1, pp. 48–77, 2002.

Antonio Galli received the bachelor’s and master’s (cum laude) degrees in computer science and engineering from the University of Naples “Federico II” in 2016 and 2019, respectively. He is currently pursuing the Ph.D. degree in technology, innovation and management with the University of Bergamo and the University of Naples “Federico II.” His main research interests are in the area of deep learning and big data analytics.

Vincenzo Moscato is currently an Associate Professor of Database and Information Systems with the Department of Electrical Engineering and Information Technologies, University of Naples “Federico II.” He was involved in several international, national, and local research projects and is currently an author of more than one hundred publications in international journals and conference proceedings. His current research interests lie in the area of multimedia, knowledge management, and big data analytics.
Simon Pietro Romano is currently a Full Professor with the Department of Electrical Engineering and Information Technology, University of Naples Federico II. He teaches computer networks, computer architectures, network security, and telematics applications. He is also the Co-Founder of Meetecho, a startup and university spin-off dealing with scalable video streaming and WebRTC-based unified collaboration, as well as of SECurity Solutions for Innovation that focuses on network security. He actively participates in Internet Engineering Task Force standardization activities, mainly in the applications and real time area.

Giancarlo Sperlí received the Ph.D. degree in information technology and electrical engineering from the University of Naples Federico II, defending his thesis: “Multimedia Social Networks.” He is an Assistant Professor with the Department of Electrical Engineering and Information Technology, University of Naples Federico II. He is a member of the Pattern analysis and Intelligent Computation for mUltimedia Systems (PICUS) departmental research groups. Finally, he has authored about 105 publications in international journals, conference proceedings, and book chapters. His main research interests are in the area of cybersecurity, semantic analysis of multimedia data, and social networks analysis. He served as a guest editor of different special issues on international journals.