Article
Energy-Aware Dynamic DU Selection and NF Relocation in
O-RAN Using Actor–Critic Learning
Shahram Mollahasani 1 , Turgay Pamuklu 1 , Rodney Wilson 2 and Melike Erol-Kantarci 1, *
1 School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada;
[email protected] (S.M.); [email protected] (T.P.)
2 Ciena, Ottawa, ON K2K 0L1, Canada; [email protected]
* Correspondence: [email protected]
Abstract: Open radio access network (O-RAN) is a promising candidate for meeting flexibility
and cost-effectiveness goals by incorporating openness and intelligence into its architecture. In the O-RAN
architecture, a central unit (O-CU) and a distributed unit (O-DU) are virtualized and executed on
processing pools of general-purpose processors that can be placed at different locations. Therefore, it
is challenging to choose a proper location for executing network functions (NFs) over these entities
by considering propagation delay and computational capacity. In this paper, we propose a Soft Actor–
Critic Energy-Aware Dynamic DU Selection algorithm (SA2C-EADDUS) by integrating two nested
actor–critic agents in the O-RAN architecture. In addition, we formulate an optimization model that
minimizes delay and energy consumption. Then, we solve that problem with an MILP solver and use
that solution as a lower bound comparison for our SA2C-EADDUS algorithm. Moreover, we compare
that algorithm with recent works, including RL- and DRL-based resource allocation algorithms and
a heuristic method. We show that by collaborating A2C agents in different layers and by dynamic
relocation of NFs, based on service requirements, our schemes improve the energy efficiency by 50%
with respect to other schemes. Moreover, we reduce the mean delay by a significant amount with our
Citation: Mollahasani, S.; Pamuklu,
novel SA2C-EADDUS approach.
T.; Wilson, R.; Erol-Kantarci, M.
Energy-Aware Dynamic DU Selection
Keywords: actor–critic learning; energy-efficiency; O-RAN; RAN optimization
and NF Relocation in O-RAN Using
Actor–Critic Learning. Sensors 2022,
22, 5029. https://doi.org/10.3390/
s22135029
1. Introduction
Next-generation wireless networks involve a fundamental transformation of the radio access
network, known as the next-generation radio access network (NG-RAN) [1]. The NG-RAN protocol
stack can be split into eight different disaggregated options, which are combined within three
network units: the radio unit (RU), the distributed unit (DU), and the centralized unit (CU).
Furthermore, unlike in the traditional RAN, NG-RAN functions can be virtualized on top of
general-purpose hardware. In Open RAN (O-RAN), the concept of virtualized RAN (vRAN) and the
disaggregation of network units reach full interoperability through open interfaces among
multiple vendors [2,3]. However, the placement of virtual network functions can be challenging
due to multiple constraints, such as the routing path, RAN protocol splits, bandwidth
limitations, maximum latency tolerance requirements, heterogeneous computational resources,
and so on. The main objective of this work is to develop an RL-based network function placement
scheme that minimizes energy consumption while satisfying the expected quality of service (QoS)
for each network function. In this work, resource block (RB) scheduling is considered as a
network function (NF). The proposed scheme dynamically relocates this NF among DUs based on
their location and processing power, targeting minimum energy consumption while satisfying the
QoS requirements of users.
The disaggregation concept in O-RAN allows RAN functions to be placed at different computing
devices in a distributed manner [4]. Therefore, it is vital to identify how this disaggregation
can be executed and what metrics need to be satisfied to run these
disaggregated components correctly. By considering the concept of function splits in O-
RAN, the requirements for network functions such as minimum bit rate and the maximum
latency for communication among O-RAN components (RU, DU, and CU) need to be
satisfied. This work is developed based on the O-RAN architecture and split option 7.2
where the functional split is between RU and DU [1,5].
Since the amount of energy consumption in DUs is proportional to the time they are
active, i.e., processing tasks, mobile network operators (MNOs) are seeking an intelligent
assignment model where RUs use minimum required resources at DUs, while QoS require-
ments such as packet delivery ratio and latency are satisfied [6]. Note that a part of DU
energy consumption is fixed due to the cooling system, idle power, etc. Therefore, reducing
the active time of DUs can decrease the overall energy consumption in the network. This
can be achieved by optimizing load distribution among DUs with respect to their process-
ing power and their location and setting redundant DUs to sleep mode. The main focus of
this paper is to highlight the effect of deploying an AI-enabled network function relocation
in a dynamic environment on energy consumption and network performance. To this end,
we examine our model by considering resource block scheduling as a network function
and applying our AI-based framework to it. We evaluate this model by generating User
Datagram Protocol (UDP) and Transmission Control Protocol (TCP) packets (video and
ITS) with different delay budgets and QoS requirements.
In this paper, we propose an RL-based resource block allocation scheme for 5G together with DU
selection in the O-RAN architecture. In the proposed model, we employ energy awareness as a key
performance indicator and provide a multi-agent actor–critic (A2C) method as the primary
novelty of this study. We employ two agents: one is responsible for allocating RBs to UEs by
considering the type and priority of packets, while the other is integrated to reduce energy
consumption by considering the processing power, capacity, and propagation delay among DUs in
the network. Furthermore, performance
evaluation includes a mixed-integer linear programming (MILP) solver comparison to
determine the gap between this novel approach and the lower bound of that multi-objective
minimization problem. Thus, we demonstrate the feasibility of this advanced machine
learning implementation in a disaggregated RAN scheme.
In our prior work [7], we developed an A2C-based algorithm for invoking NFs at
the CU or DU by considering traffic requirements. This work extends [7] by considering
the propagation delay between O-RAN components and the energy consumption, and in
addition, we consider DUs with different processing capacities. The proposed model is ex-
amined for a different number of UEs, and it is compared with a heuristic model developed
in our previous works and two other recent works. Our results show that the proposed
model can reduce energy consumption by 50% with respect to models where network
functions are executed locally under the given settings. Apart from energy conservation,
the proposed model can reduce the overall mean delay for an intelligent transport system
(ITS) and video traffic down to 7 ms and 12 ms, respectively. In addition, the packet delivery
ratio for ITS and video traffic increases by up to 80% and 30%, respectively.
The rest of this paper is organized as follows: In Section 2, we discuss the recent works
in this area. In Section 3, we formulate the problem and propose an MILP solution. The A2C-
based algorithm is comprehensively explained in Section 4. In Section 5, the proposed
scheme is compared with four baseline algorithms, and in Section 6, we conclude the paper.
2. Related Work
Energy-efficiency algorithms for traditional RANs have been comprehensively pre-
sented in [8]. Most of the works covered in [8] are based on on-off techniques for BSs. Some
of these models rely on traffic prediction to estimate low traffic intervals, while others use
cell zooming techniques to expand the BSs’ coverage with respect to their neighbors [9–11].
These studies treat RAN equipment as monolithic.
Energy efficiency is also considered in centralized RAN (C-RAN), where just the radio
unit of BSs is disaggregated at remote radio heads (RRHs), and the rest is implemented at
the baseband unit (BBU). For instance, in [12], BBU placement across physical machines in
a cloud site is formulated as a bin-packing problem, and the authors tackle this problem
by proposing a heuristic algorithm. Additionally, in [13,14], the authors improve the work
in [12] by proposing a BBU virtualization technique with linear computational complexity
that reduces the power consumption in the network. Furthermore, in [15], the authors
show how traffic forecasting at each BS can be used in dynamically switching (on/off)
RRHs. Moreover, ref. [16] shows an end-to-end energy-efficient model by activating and
placing virtual network functions (VNFs) on physical machines and distributing the traffic
among them.
The aforementioned C-RAN scenarios mainly reflect the fixed functional split; they require a
high fronthaul bitrate and consequently incur high deployment costs. To this
end, the impact of the flexible function split is examined in several recent studies [17].
For instance, in [18], savings in power and computational resources with respect to different
function splits are analytically modeled. In [19], the authors aim to optimize the energy
efficiency and the midhaul bandwidth in C-RAN.
Different from these works, in this paper we evaluate the energy consumption for
dynamic NF migration among edge clouds, i.e., not only CU allocation as in C-RAN but
also DU allocation in O-RAN is considered. Finally, refs. [20,21] show that the migration of
NFs has a non-negligible impact on energy consumption, which has not been addressed
in previous works. Since we aim to address the placement of resource allocation function
using machine learning, it is important to give a brief overview on the existing studies
on RL-based resource allocation [22]. For instance, an RL-based resource block allocation
technique is employed in a vehicle-to-vehicle network in [23].
In [24], an RL-based algorithm is proposed for optimizing the energy consumption
and cost in a disaggregated RAN. Moreover, in [25], the authors develop an RL-based
user-beam association and resource allocation scheme using transfer reinforcement learn-
ing. In [26], a deep RL-based resource block allocation is introduced in which RBs are
allocated in a way that the mean delay is reduced. Another deep RL approach is proposed
in [27] to solve a two-tier resource allocation problem in a standalone base station. In our
previous work [28], we developed an RL-based NF to schedule URLLC and eMBB pack-
ets with respect to the delay budget and channel conditions of UEs in the network. We
also developed an RL-based algorithm for invoking NFs at the CU or DU by considering
traffic requirements [7]. However, unlike [7], in this paper we develop a comprehensive
scheme that considers the propagation delay between O-RAN components and the en-
ergy consumption, and in addition we consider DUs with different processing capacities.
Furthermore, in [29], we introduced an optimization-based solution for the DU selection
problem under delay constraints. This paper extends [29] by modeling the problem as a
multi-objective minimization problem for jointly addressing energy-efficiency and delay.
Unlike previous works, in this paper we develop an actor–critic based DU selection
scheme for a RAN with disaggregated layers (such as O-RAN) to dynamically relocate
network functions among available DUs (edge servers) by considering their processing
power and propagation delay in a way that the overall energy consumption in the network
is reduced, while packet delivery ratio and delay budget of user traffic are satisfied.
3. System Model
Figure 1 shows the overall architecture, which is structured as a time-interval-based model. In
each time interval (t ∈ T), I UEs (i ∈ I) request demands, each defined as a tuple ⟨i, t⟩. A
demand may belong to one of two (K = 2) different types of traffic (k ∈ K), namely video and
ITS. These traffic types have different demand sizes (U_⟨i,t⟩k) and delay budgets (Δ_⟨i,t⟩k). On
the infrastructure side, we consider L low-processing-power DUs (DU^LP, l ∈ L) that serve these
UEs. These DUs can
house network protocols from the lowest level to the packet data convergence protocol [30].
Moreover, each one has a dedicated RU to perform lower layer RF functions.
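To make the notation concrete, a demand tuple ⟨i, t⟩ with its type, size, and budget can be encoded as in the minimal sketch below; the field names and toy values are our own illustration, not code from the paper.

```python
from dataclasses import dataclass

# Hypothetical encoding of a demand tuple <i, t>; field names mirror the
# notation U_<i,t>k (size) and Delta_<i,t>k (delay budget).
@dataclass(frozen=True)
class Demand:
    ue: int           # user index i in I
    t: int            # arrival time interval t in T
    k: int            # traffic type k in K (e.g., 0: video, 1: ITS)
    size: float       # demand size U_<i,t>k
    budget_ms: float  # delay budget Delta_<i,t>k

demands = [Demand(ue=3, t=0, k=1, size=512.0, budget_ms=30.0)]
```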
Figure 1. The overall architecture of the proposed model. Agents are located at DUs and interact
with the DU selection algorithm by sending some feedback during each time interval (the red arrow).
Then the DU selection scheme informs the agents (the green arrow) regarding the location of the NF during
the next time interval based on the types of packets (URLLC or eMBB), available processing capacity,
scheduling delay, and propagation delay among DUs to minimize the overall energy consumption in
the network.
In Equation (1), a_⟨i,t⟩rt′ is a binary decision variable that equals '1' if demand ⟨i, t⟩
is processed in RB r in time interval t′, and b_lt′ indicates that the NFs of DU_l^LP are
migrated to DU^HP in t′. Lastly, Equation (2) calculates the total energy consumption in the DUs.

E_{lt'} = \Big( E_l^F + E_l^D \sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{r \in \mathcal{R}} a_{\langle i,t \rangle r t'} \Big) \cdot (1 - b_{lt'})    (1)

E^{TOT} = \sum_{t' \in \mathcal{T}} \Big( E^{HP} + \sum_{l \in \mathcal{L}} E_{lt'} \Big)    (2)
The delay of each traffic demand (⟨i, t⟩) is the summation of two terms: the first is the
scheduling delay (δ^S_⟨i,t⟩), and the second is the propagation delay (δ^P_⟨i,t⟩).
Equation (3) calculates the scheduling delay, in which y_⟨i,t⟩t′ is a binary decision variable
equaling '1' if the traffic demand ⟨i, t⟩ is scheduled/assigned for processing in an RB in time
interval t′. However, for a large traffic demand, the RBs in a single time interval may not be
enough to finish that demand; thus, multiple time intervals might be assigned to demand ⟨i, t⟩.
For that reason, we find the most recent assigned time interval (max_{t′∈T}(y_⟨i,t⟩t′ · t′)) for
that demand (⟨i, t⟩). Then, we subtract the demand's arrival time interval (t) from that value
to find the number of time intervals that demand (⟨i, t⟩) waited. Finally, we multiply that
value by the length of a transmission time interval (TTI) to find the scheduling delay.

\delta^S_{\langle i,t \rangle} = \Big( \max_{t' \in \mathcal{T}} ( y_{\langle i,t \rangle t'} \cdot t' ) - t \Big) \cdot TTI    (3)
\delta^P_{\langle i,t \rangle} = \frac{D^P_{M(i)}}{\sum_{t' \in \mathcal{T}} y_{\langle i,t \rangle t'}} \sum_{t' \in \mathcal{T}} y_{\langle i,t \rangle t'} \cdot b_{M(i)t'}    (4)

\delta^M = \frac{\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{k \in \mathcal{K}} (\delta^S_{\langle i,t \rangle} + \delta^P_{\langle i,t \rangle}) \cdot u_{\langle i,t \rangle k}}{\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k}}    (5)
Equation (4) defines the propagation delay for demand ⟨i, t⟩. As explained, a user demand may be
scheduled in more than one time interval. Some of these time intervals may belong to times when
the NF is processed in DU^LP (b_M(i)t′ = 0), where M(i) is a given mapping from user i to its
DU^LP (l = M(i)). In those time intervals, the propagation delay equals zero due to the
negligible distance between UEs and their DU^LP. On the other hand, in some intervals, the NF
may be processed in DU^HP (b_M(i)t′ = 1). To calculate the number of time intervals in the
latter case, we first multiply the scheduled time intervals by the NF migration decision
variable (y_⟨i,t⟩t′ · b_M(i)t′). That value equals one if and only if the NF is migrated to
DU^HP in time interval t′. Second, we divide that value by the total number of time intervals
needed to process that demand (∑_{t′∈T} y_⟨i,t⟩t′). That gives us the fraction of time the
demand is processed in DU^HP. Finally, we multiply that value by the propagation delay between
DU_l^LP and DU^HP (D^P_M(i)) to find the total propagation delay to finish that user demand.
Equation (5) shows the mean delay in the network. Here, the numerator is the total delay, and
the denominator is the total number of demands in the network. In addition, u_⟨i,t⟩k is the
traffic demand indicator, which equals one if there is a demand (⟨i, t⟩) of type k; otherwise,
it equals zero.
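To make Equations (1)–(5) concrete, the following Python sketch evaluates them for fixed decision variables; the array shapes, helper names, and toy values are our own illustrative assumptions rather than code from the paper.

```python
import numpy as np

# rb[l, t2] aggregates the a_<i,t>rt' variables: the number of RBs DU l
# processes in interval t2; b[l, t2] = 1 if DU l's NF runs at DU^HP in t2;
# y is one demand's binary schedule vector over intervals.

def scheduling_delay(y, t_arrival, tti_ms):
    """Eq. (3): (latest assigned interval - arrival interval) * TTI."""
    return (np.nonzero(y)[0].max() - t_arrival) * tti_ms

def propagation_delay(y, b_row, d_prop_ms):
    """Eq. (4): fraction of intervals served at DU^HP times D^P_M(i)."""
    return d_prop_ms * (y * b_row).sum() / y.sum()

def total_energy(rb, b, e_fix, e_dyn, e_hp):
    """Eqs. (1)-(2): per-DU energy, zeroed while migrated, plus DU^HP."""
    e_lt = (e_fix[:, None] + e_dyn[:, None] * rb) * (1 - b)  # Eq. (1)
    return (e_hp + e_lt.sum(axis=0)).sum()                   # Eq. (2)

# Toy check: one demand scheduled in intervals 1 and 3, DU^HP used in 3.
y = np.array([0, 1, 0, 1]); b_row = np.array([0, 0, 0, 1])
print(scheduling_delay(y, t_arrival=0, tti_ms=1.0))  # 3.0 ms
print(propagation_delay(y, b_row, d_prop_ms=0.2))    # 0.1 ms
```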
Equation (6) ensures that the total service rate of the RBs assigned to a demand covers its
size, while Equation (7) ensures that each RB in a time interval is assigned to at most one
demand (⟨i, t⟩). Equation (8) correlates the y_⟨i,t⟩t′ decision variable with a_⟨i,t⟩rt′,
indicating that at least one resource block r ∈ R in time interval t′ is allocated to the user
demand ⟨i, t⟩. Thus, we can simplify our delay calculation by using y_⟨i,t⟩t′ instead of the
more complex decision variable a_⟨i,t⟩rt′. Equation (9) generates the user traffic indicator
u_⟨i,t⟩k from the demand size U_⟨i,t⟩k. A UE may demand only one type of traffic in a specific
time interval, which Equation (10) ensures. In addition, the total number of DU_l^LP that can
migrate their NFs to DU^HP is limited by ξ in Equation (11). Lastly, the delay of each demand is
limited by Δ_⟨i,t⟩k according to its traffic type k in Equation (12).
\sum_{t'=t}^{T} \sum_{r \in \mathcal{R}} S_{irt'} \cdot a_{\langle i,t \rangle rt'} \geq \sum_{k \in \mathcal{K}} U_{\langle i,t \rangle k} \cdot u_{\langle i,t \rangle k}, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (6)

\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} a_{\langle i,t \rangle rt'} \leq 1, \quad \forall l \in \mathcal{L}, \forall r \in \mathcal{R}, \forall t' \in \mathcal{T}    (7)

M \cdot y_{\langle i,t \rangle t'} - \sum_{r \in \mathcal{R}} a_{\langle i,t \rangle rt'} \geq 0, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle, \forall t' \in \mathcal{T}    (8)

M \cdot u_{\langle i,t \rangle k} - U_{\langle i,t \rangle k} \geq 0, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle, \forall k \in \mathcal{K}    (9)

\sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k} \leq 1, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (10)

\sum_{l \in \mathcal{L}} b_{lt} \leq \xi, \quad \forall t \in \mathcal{T}    (11)

\delta^S_{\langle i,t \rangle} + \delta^P_{\langle i,t \rangle} \leq \sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k} \cdot \Delta_{\langle i,t \rangle k}, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (12)
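A minimal sketch of how a few of these constraints (Equations (7), (10), and (11)) could be expressed with the open-source PuLP modeler is shown below; the index sets are toy-sized, and the paper does not specify its solver interface, so this is illustrative only.

```python
import pulp

# PuLP sketch (ours) of Eqs. (7), (10), and (11) over toy-sized index sets.
L_, R_, T_, K_ = range(2), range(4), range(6), range(2)
demands = [(0, 0), (1, 2)]  # <i, t> tuples
xi = 1                      # migration cap from Eq. (11)

prob = pulp.LpProblem("eaddus_sketch", pulp.LpMinimize)
a = pulp.LpVariable.dicts("a", (demands, R_, T_), cat="Binary")
b = pulp.LpVariable.dicts("b", (L_, T_), cat="Binary")
u = pulp.LpVariable.dicts("u", (demands, K_), cat="Binary")

for r in R_:
    for t2 in T_:   # Eq. (7): an RB serves at most one demand per interval
        prob += pulp.lpSum(a[d][r][t2] for d in demands) <= 1
for d in demands:   # Eq. (10): at most one traffic type per demand
    prob += pulp.lpSum(u[d][k] for k in K_) <= 1
for t2 in T_:       # Eq. (11): at most xi DU^LPs migrate their NFs
    prob += pulp.lpSum(b[l][t2] for l in L_) <= xi
# prob.solve()  # the weighted energy/delay objective would be added first
```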
4. Actor–Critic Solution
4.1. RL Model
In this work, we employ two nested A2C agents which are developed based on O-
RAN architecture. While one agent is responsible for scheduling resource blocks during
each time interval, the other is designed to dynamically choose a proper DU for executing
the scheduler agent by considering energy consumption, processing power, scheduling,
and propagation delay with respect to each DU’s traffic load and location. Processing
resource block allocation decisions of multiple RUs on a single DU (DU HP ) can expand
the observation level of decision agents, which can lead to applying more accurate actions.
More precisely, when an A2C-based scheduler agent can access other RUs’ resource block
map, subcarriers can be allocated to edge UEs in a way that the inter-cell interference
among RUs are minimized as shown in [28]. The goal of the proposed actor–critic agent is
improving the performance of the A2C-based scheduler and reducing the overall energy
Sensors 2022, 22, 5029 7 of 17
by an A2C agent. To this end, a neural network (NN) with three layers of neurons is employed.
Furthermore, in every TTI, the neurons' weights are updated through interactions between the
actor and the critic. In the exploitation stage, the NN works as a non-linear function, and it
is tuned by updating the weights. In each state s, the main goal of the SA2C agents is to
maximize the received reward r by applying more accurate actions a, which can be obtained
through the action–value (Q(s, a)) and state–value (V(s)) functions. The action–value function
is used to estimate the effect of applying actions in states, while the state–value function
estimates the expected outcome of a state. In this work, to improve convergence and increase
the stability of the model (by reducing the overestimation bias), the soft actor–critic model
(SA2C) is used. The overall architecture of SA2C is depicted in Figure 1. In the SA2C model,
instead of evaluating both action- and state-value functions, we only need to estimate V(s).
Moreover, the error parameter in SA2C is a metric for examining the effect of the performed
action relative to the expected value V(s), which can be expressed as the advantage
A(s_t, a_t) = Q(s_t, a_t) − V(s_t). In addition, the SA2C model is a synchronous model (unlike
asynchronous actor–critic models), which makes it more consistent and suitable for
disaggregated implementations. In the proposed scheme, each actor and critic contains an
independent neural network, described as follows:
• The critic's neural network is used to estimate the corresponding value function for
aligning the actor's actions. In the proposed scheme, we use two critics to minimize the
overestimation bias.
• The actor's neural network is used to estimate the proper action (choosing the best DU
for executing the NF) during each time interval.
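As an illustration of this actor/twin-critic structure, the sketch below builds three-layer networks in PyTorch (the framework used in Section 5) and forms a TD-based advantage estimate; the state dimension, action count, and update details are our assumptions, with hidden-layer sizes taken from the 1024 × 512 setting reported in the simulation parameters.

```python
import torch
import torch.nn as nn

# A sketch (not the authors' exact code) of the actor and twin critics.
STATE_DIM, N_ACTIONS = 4, 2  # 4-element state tuple; DU^LP vs. DU^HP choice

def mlp(out_dim):
    return nn.Sequential(nn.Linear(STATE_DIM, 1024), nn.ReLU(),
                         nn.Linear(1024, 512), nn.ReLU(),
                         nn.Linear(512, out_dim))

actor = mlp(N_ACTIONS)             # action logits over candidate DUs
critic1, critic2 = mlp(1), mlp(1)  # two V(s) heads against overestimation

def advantage(s, r, s_next, gamma=0.99):
    # TD estimate of A(s_t, a_t) = Q(s_t, a_t) - V(s_t), taking the minimum
    # of the two critics to damp the overestimation bias.
    v = torch.min(critic1(s), critic2(s))
    v_next = torch.min(critic1(s_next), critic2(s_next))
    return r + gamma * v_next.detach() - v
```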
The states of the proposed DU selection scheme are extracted from the environment as a tuple,
S_t = {DU_type, QCI, CQI, HOL_delay}: the DU type identifies a DU's location and its available
processing power; the QoS class identifier (QCI) captures the traffic type and its priority;
the channel quality indicator (CQI) reflects the signal strength; and the head-of-line delay
(HOL_delay) is the amount of time packets stay in the scheduler queue. Meanwhile, two
independent actions are generated by the agents in each time interval. The action space of the
A2C-based scheduler is the location of the assigned resource block in the RB map. Additionally,
the action space of the DU selection agent is the DU type that should handle the NFs (in our
case, we only consider the placement of the A2C-based scheduler; however, our scheme can be
extended to other NFs in the 5G stack [26,28]). The DU selection agent needs to collaborate
with the A2C-based RB allocation agent in a way that the overall energy consumption in the
network is reduced, while the expected QoS metrics (delay and packet delivery ratio) of the
A2C-based RB allocation agent are met.
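A hypothetical flattening of this state tuple into an observation vector is sketched below; the scaling constants are placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical encoding of S_t = {DU_type, QCI, CQI, HOL_delay}; the
# normalization constants are placeholders.
def encode_state(du_type, qci, cqi, hol_delay_ms, max_budget_ms=150.0):
    return np.array([
        float(du_type),                                    # 0: DU^LP, 1: DU^HP
        qci / 9.0,                                         # QCI class, scaled
        cqi / 15.0,                                        # CQI in [0, 15]
        min(hol_delay_ms, max_budget_ms) / max_budget_ms,  # HOL delay
    ], dtype=np.float32)
```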
We consider two separate reward functions for our agents since their objectives and their
action spaces are different. In this work, the reward function for the A2C-based scheduler is
defined as in [28]. Here, the feedback of UE_i is its CQI (cqi_i); R2 is an extra reward of 1
given to URLLC traffic for prioritization; the traffic delay budget and the packet HOL delay
are denoted Traffic_Budget and Packet_Delay, respectively; and β, γ, and Φ are
weighting/scaling factors.
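The exact reward expression from [28] is not reproduced in this excerpt; the following hedged reconstruction combines the described terms (CQI feedback, a delay-budget slack, and the extra URLLC reward R2) under an assumed linear form with weights β, γ, and Φ.

```python
# Hedged reconstruction of the scheduler reward described in the text; the
# linear combination is an assumption, not the paper's exact expression.
def scheduler_reward(cqi_i, packet_delay_ms, traffic_budget_ms, is_urllc,
                     beta=1.0, gamma=1.0, phi=1.0):
    r2 = 1.0 if is_urllc else 0.0                   # extra URLLC reward
    slack = (traffic_budget_ms - packet_delay_ms) / traffic_budget_ms
    return beta * cqi_i + gamma * slack + phi * r2  # weighted sum (assumed)
```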
The scheduling agent consists of an actor and a critic. The actor is located at the BSs and is
responsible for allocating resource blocks to UEs, and the critic is integrated into the DUs to
inspect the actors and improve their decisions. In this model, the reward function is defined
based on the CQI feedback, the amount of time a packet stays in the scheduling queue, and the
UEs' satisfaction. To this end, the reward function is defined in a way that actors receive a
reward when the received SINR becomes higher (higher CQI), the scheduling delay is reduced, and
the UEs' satisfaction ratio is increased (the mean delay should be less than the delay budget).
Therefore, when the RB map filled by an actor causes inter-cell interference, the received SINR
is reduced and the mean delay is increased, leading to a reduced reward and a punishment for
the agent, which improves the learning process.
We define the reward for the DU selection agent as follows. Here, n_URLLC is the number of UEs
that generate URLLC traffic, which can be obtained from the QCI value assigned by the EPS
bearer; the QCI indicates packet priorities, types, and delay budgets [34]. In this reward
function, the output of sinc(πbc) is a binary value used to produce discrete actions (0 or 1)
in each time interval. As explained previously, RUs select their local DUs to perform
higher-layer processing. Our system model includes one DU with higher processing power than the
others. For energy efficiency, one would consider offloading the NFs of local DUs to the
high-processing-power DU and switching the local DUs off. This may also allow expanding the
inter-cell interference observation capability of the agents. However, it is important to note
that this approach would have an adverse impact on delay, since local DUs can provide access
with less propagation delay than the distant high-power DU site; this is in addition to the
scheduling delay. Moreover, DU^HP has a limited processing capacity, and we cannot transfer the
load of all DUs to it. Therefore, in our reward function, we want to ensure in each time
interval that we choose a proper DU (based on the priority of its packets and the propagation
delay with respect to DU^HP) such that the overall delay that each packet experiences
(scheduling delay (δ^S_it) + propagation delay (δ^P_it)) always remains below the predefined
delay budget (Δ_itk) (Equation (18)). Moreover, since URLLC UEs are delay sensitive, we need to
make sure that the mean delay of URLLC traffic is kept as low as possible with respect to other
UEs (Equation (19)). To this end, by increasing α, the overall delay (δ^S_it + δ^P_it) can be
proportionally reduced with respect to the predefined delay budget. In addition, we ensure that
the total energy consumption (E^TOT) is reduced through a third term, which is calculated by
Equation (2). τ and ω are scalar weights defined based on the priority given to enhancing the
packet delivery ratio (R′_1), reducing the overall delay (R′_2), and reducing the energy
consumption in the network (E^TOT). As a final remark, in this algorithm, unlike our previous
work [7], we improve our reward function by considering the propagation delay between O-RAN
components as well as the fixed and dynamic energy consumption with respect to the DUs'
processing capacities.
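Since the corresponding equations are likewise not reproduced in this excerpt, the sketch below gives one plausible reading of this reward: weighted delivery-ratio and delay-slack terms minus an energy penalty; the functional form and default weights are our assumptions.

```python
# One plausible reading (ours) of the DU-selection reward: tau and omega
# weight the delivery-ratio term R1' and the alpha-scaled delay-slack term
# R2', with a penalty on total energy E_TOT from Eq. (2).
def du_selection_reward(delivery_ratio, sched_delay_ms, prop_delay_ms,
                        budget_ms, e_tot, alpha=1.0, tau=1.0, omega=1.0,
                        energy_weight=1e-3):
    r1 = delivery_ratio                                            # R1'
    r2 = alpha * (budget_ms - (sched_delay_ms + prop_delay_ms)) / budget_ms
    return tau * r1 + omega * r2 - energy_weight * e_tot
```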
5. Performance Evaluation
The SA2C-EADDUS scheme is implemented in ns3-gym, a framework in which OpenAI Gym (a toolkit
for using machine learning libraries) is integrated into ns-3 [35,36]. The neural network of
the proposed scheme is developed in PyTorch. In the simulations, we assume the number of UEs is
between 40 and 80; the UEs are randomly distributed, and each is associated with the closest DU
among 4 DU^LPs. The URLLC UE ratio is 10%, and we employ numerology zero with 12 subcarriers,
14 symbols per subframe, and 15 kHz subcarrier spacing. Furthermore, scheduling decisions are
applied every TTI.
threshold value can be challenging due to the high dynamicity of the network parameters.
The proposed heuristic algorithm is illustrated in Algorithm 1.
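Algorithm 1 itself does not survive in this excerpt; the following is a rough, hypothetical sketch of a threshold-based migration heuristic consistent with the surrounding text, in which the threshold is precisely the parameter flagged as hard to tune in a dynamic network.

```python
# Hypothetical guess at Algorithm 1's shape: lightly loaded DU^LPs are
# offloaded to DU^HP, up to its capacity and the migration cap xi.
def heuristic_du_selection(du_loads, hp_capacity, threshold, xi):
    migrate, used = [], 0.0
    for l, load in sorted(enumerate(du_loads), key=lambda x: x[1]):
        if load < threshold and used + load <= hp_capacity and len(migrate) < xi:
            migrate.append(l)  # offload this DU's NF and put the DU to sleep
            used += load
    return migrate
```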
Parameters: Value
Number of neurons: 1024 × 512 layers (Actor + Critic)
5.4.1. Convergence
Before discussing the obtained results, in Figure 3a we present the convergence of the reward
function for SA2C-EADDUS. In this figure, the number of UEs is 70, of which 10% are assigned
URLLC traffic. Additionally, we employed an epsilon-greedy policy, forcing actors to either
assign RBs with the highest weight or choose actions randomly during the exploration phase. As
shown, the algorithm converges after roughly 100 episodes, where each episode contains
500 iterations.
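The epsilon-greedy selection described here can be sketched as follows; the decision rule is standard, and the weight vector stands in for the actor's output.

```python
import random

# Epsilon-greedy action selection: explore randomly with probability eps,
# otherwise pick the action with the highest actor weight.
def epsilon_greedy(weights, eps):
    if random.random() < eps:
        return random.randrange(len(weights))
    return max(range(len(weights)), key=lambda a: weights[a])
```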
Figure 3. The convergence of the reward function and the overall energy conservation in the
network when SA2C-EADDUS with different processing capacity levels is employed: (a) the
convergence of the reward function; (b) the impact of changing ξ in SA2C-EADDUS.
Figure 4. Energy consumption and mean delay performance with the MILP solver: (a) the tradeoff
between these two KPIs; (b) the impact of an increasing number of UEs.
Figure 5a compares the mean delay results of the SA2C-EADDUS scheme and the MILP solution. The
SA2C-EADDUS scheme provides a reasonable mean delay when the number of UEs is lower than 14.
However, with a higher number of UEs, the high contention causes packet drops, which in turn
increase the mean delay. On the other hand, the MILP solver is not affected by this due to its
ideal conditions and predefined given data. Figure 5b compares the energy consumption of the
two solutions. The SA2C-EADDUS scheme performs very close to the MILP solution.
Figure 5. Lower bound comparisons of the proposed method SA2C-EADDUS: (a) mean delay
comparison; (b) energy consumption comparison.
5.4.4. Delay
Hereafter, we examine the performance of the SA2C-EADDUS scheme with respect to three baselines
and use a larger network where the number of UEs is varied between 40 and 80. As shown in
Table 1, we evaluate the proposed model by generating User Datagram Protocol (UDP) and
Transmission Control Protocol (TCP) packets (video and ITS) with different delay budgets
(150 ms and 30 ms, respectively) and QoS requirements. In Figure 6a,b, we present the mean
delay of video packets and ITS packets independently to illustrate how the proposed model
manages to keep the mean delay of each traffic type below its delay budget threshold. The
proposed SA2C-EADDUS scheme is compared with an A2C-based scheduler (A2C-RBA) [28], a DRL-based
scheduler with different processing capacity levels [26], and a heuristic scheme
(Heuristic-DUS). Here, to examine the effect of the processing capacity of DU^HP on the network
performance, we assume that DU^HP has two processing capacity levels. In the first, we can
transfer the processing load of all DUs (DU^LP) to DU^HP (100%-DRL-EADDUS), while in the
second, the load of only 50% of the DU^LPs can be transferred to DU^HP (50%-DRL-EADDUS). The
energy consumption
corresponding to the delay results in Figure 6 is depicted in Figure 3b. In Figure 3b,
we present the overall energy conservation when employing SA2C-EADDUS with different capacity
levels. The maximum energy (20 kWh) is consumed when UEs are scheduled locally, and energy
consumption can be reduced by up to 50% as the available processing power increases.
Figure 6. The overall mean delay considering different processing capacity levels and
algorithms: (a) the video packets' mean delay; (b) the ITS packets' mean delay.
As observed in Figure 6, the proposed algorithm reduces the mean delay of both ITS and video
traffic with respect to the baselines. Additionally, by increasing the processing power, the
observation level of the agents increases and actions become more accurate; therefore, the
100%-DRL-EADDUS agent performs better, reducing the mean delay in the network in comparison
with the 50%-DRL-EADDUS agent. Finally, SA2C-EADDUS reduces the mean delay dramatically with
respect to the A2C-RBA and Heuristic-DUS algorithms, since SA2C-EADDUS can allocate RBs in a
way that reduces the inter-cell interference in the network. Additionally, the SA2C-EADDUS
approach performs better than the DRL-EADDUS algorithm because, unlike DRL-EADDUS, the RB
allocation agent in the SA2C-EADDUS scheme considers, in addition to packet delay, the mean CQI
level of UEs and the priority of URLLC packets.
Figure 7. The overall packet delivery ratio considering different processing capacity levels
and algorithms: (a) the video packets' delivery ratio; (b) the ITS packets' delivery ratio.
6. Conclusions
As future mobile networks become more complex, the need for intelligence and the participation
of more players is emerging, eventually leading to the need for openness. As these goals define
initiatives such as O-RAN and several others, there is a dire need to explore intelligence
capabilities. In this paper, we evaluated the significance of expanding the observation level
for NFs in the O-RAN architecture. To this end, we consider the resource allocation function as
an example NF and propose two nested A2C-based algorithms that work together. The first layer
dynamically transfers NFs to a DU with higher processing power to strike a balance between
saving energy and improving the accuracy of actions. Meanwhile, the second layer contains an
A2C-based scheduling algorithm, which allocates RBs by considering user traffic types and their
delay budgets. The simulation results show that the proposed scheme can significantly increase
energy efficiency and reduce the overall mean delay and packet drop ratio with respect to the
case where the NF is solely executed at local DUs with limited processing power. In future
work, we will incorporate the delay and energy consumption of the fronthaul links and the
switching networks.
Author Contributions: Conceptualization, S.M. and T.P.; methodology, S.M. and T.P.; software,
S.M. and T.P.; validation, S.M. and T.P.; formal analysis, S.M. and T.P.; investigation, S.M. and T.P.;
resources, S.M., T.P. and R.W.; writing—original draft preparation, S.M. and T.P.; writing—review
and editing, S.M. and T.P.; visualization, S.M. and T.P.; supervision, M.E.-K.; project administration,
M.E.-K.; funding acquisition, Ontario Centers of Excellence (OCE) 5G ENCQOR program and Ciena.
All authors have read and agreed to the published version of the manuscript.
Funding: This work is supported by Ontario Centers of Excellence (OCE) 5G ENCQOR program
and Ciena.
Conflicts of Interest: The authors declare no conflict of interest.
Notations
The following notations are used in this manuscript:
Sets:
r ∈ R: set of RBs in one time interval
t ∈ T: set of time intervals
i ∈ I: set of UEs
⟨i, t⟩ ∈ ⟨I, T⟩: tuple of demands
l ∈ L: set of DU^LP
k ∈ K: type of traffic
Variables:
a_⟨i,t⟩rt′: indicates RB r in t′ is assigned to ⟨i, t⟩
b_lt: indicates the NFs of DU^LP l migrate to DU^HP in t
y_⟨i,t⟩t′: indicates any r in t′ is assigned to ⟨i, t⟩
u_⟨i,t⟩k: user traffic indicator
Given Data:
D^P_l: propagation delay
E^F_l: fixed energy consumption at l
E^D_l: dynamic energy consumption coefficient at l
E^HP: total energy consumption at DU^HP
S_irt′: max service rate
U_⟨i,t⟩k: user traffic demand size
Δ_⟨i,t⟩k: delay budget
ξ: max. number of DU^LP that can migrate their NFs
M(i): user-to-DU^LP mapping
TTI: length of a transmission time interval
W: energy consumption weight in the objective function
Ω: scaling factor between energy consumption and mean delay
References
1. Klinkowski, M. Latency-Aware DU/CU Placement in Convergent Packet-Based 5G Fronthaul Transport Networks. Appl. Sci.
2020, 10, 7429. [CrossRef]
2. Semov, P.; Koleva, P.; Tonchev, K.; Poulkov, V.; Cooklev, T. Evolution of mobile networks and C-RAN on the road beyond 5G. In
Proceedings of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), Milan, Italy, 7–9 July
2020; IEEE: Piscataway, NJ, USA, 2020; pp. 392–398.
3. Dryjański, M.; Kułacz, Ł.; Kliks, A. Toward Modular and Flexible Open RAN Implementations in 6G Networks: Traffic Steering
Use Case and O-RAN xApps. Sensors 2021, 21, 8173. [CrossRef] [PubMed]
4. Yi, B.; Wang, X.; Li, K.; Huang, M. A comprehensive survey of network function virtualization. Comput. Netw. 2018, 133, 212–262.
[CrossRef]
5. Gilson, M.; Mackenzie, R.; Sutton, A.; Huang, J. NGMN Overview on 5G RAN Functional Decomposition; NGMN Alliance: Frankfurt
am Main, Germany, 2018.
6. Pamuklu, T.; Ersoy, C. GROVE: A Cost-Efficient Green Radio Over Ethernet Architecture for Next Generation Radio Access
Networks. IEEE Trans. Green Commun. Netw. 2021, 5, 84–93. [CrossRef]
7. Mollahasani, S.; Erol-Kantarci, M.; Wilson, R. Dynamic CU-DU Selection for Resource Allocation in O-RAN Using Actor–Critic
Learning. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021.
8. Wu, J.; Zhang, Y.; Zukerman, M.; Yung, E.K.N. Energy-efficient base-stations sleep-mode techniques in green cellular networks:
A survey. IEEE Commun. Surv. Tutor. 2015, 17, 803–826. [CrossRef]
9. Oh, E.; Son, K.; Krishnamachari, B. Dynamic base station switching-on/off strategies for green cellular networks. IEEE Trans.
Wirel. Commun. 2013, 12, 2126–2136. [CrossRef]
10. Niu, Z. TANGO: Traffic-aware network planning and green operation. IEEE Wirel. Commun. 2011, 18, 25–29. [CrossRef]
11. Mollahasani, S.; Onur, E. Density-aware, energy-and spectrum-efficient small cell scheduling. IEEE Access 2019, 7, 65852–65869.
[CrossRef]
12. Qian, M.; Hardjawana, W.; Shi, J.; Vucetic, B. Baseband processing units virtualization for cloud radio access networks. IEEE
Wirel. Commun. Lett. 2015, 4, 189–192. [CrossRef]
13. Wang, X.; Thota, S.; Tornatore, M.; Chung, H.S.; Lee, H.H.; Park, S.; Mukherjee, B. Energy-efficient virtual base station formation
in optical-access-enabled cloud-RAN. IEEE J. Sel. Areas Commun. 2016, 34, 1130–1139. [CrossRef]
14. Sahu, B.J.; Dash, S.; Saxena, N.; Roy, A. Energy-efficient BBU allocation for green C-RAN. IEEE Commun. Lett. 2017, 21, 1637–1640.
[CrossRef]
15. Saxena, N.; Roy, A.; Kim, H. Traffic-aware cloud RAN: A key for green 5G networks. IEEE J. Sel. Areas Commun. 2016,
34, 1010–1021. [CrossRef]
16. Malandrino, F.; Chiasserini, C.F.; Casetti, C.; Landi, G.; Capitani, M. An Optimization-Enhanced MANO for Energy-Efficient 5G
Networks. IEEE/ACM Trans. Netw. 2019, 27, 1756–1769. [CrossRef]
17. Larsen, L.M.; Checko, A.; Christiansen, H.L. A survey of the functional splits proposed for 5G mobile crosshaul networks. IEEE
Commun. Surv. Tutor. 2018, 21, 146–172. [CrossRef]
18. Shehata, M.; Elbanna, A.; Musumeci, F.; Tornatore, M. Multiplexing gain and processing savings of 5G radio-access-network
functional splits. IEEE Trans. Green Commun. Netw. 2018, 2, 982–991. [CrossRef]
19. Alabbasi, A.; Wang, X.; Cavdar, C. Optimal processing allocation to minimize energy and bandwidth consumption in hybrid
CRAN. IEEE Trans. Green Commun. Netw. 2018, 2, 545–555. [CrossRef]
20. Akoush, S.; Sohan, R.; Rice, A.; Moore, A.W.; Hopper, A. Predicting the performance of virtual machine migration. In Proceedings
of the 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems,
Miami Beach, FL, USA, 17–19 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 37–46.
21. Zhan, Z.H.; Liu, X.F.; Gong, Y.J.; Zhang, J.; Chung, H.S.H.; Li, Y. Cloud computing resource scheduling and a survey of its
evolutionary approaches. ACM Comput. Surv. (CSUR) 2015, 47, 1–33. [CrossRef]
22. Elsayed, M.; Erol-Kantarci, M. AI-enabled future wireless networks: Challenges, opportunities, and open issues. IEEE Veh.
Technol. Mag. 2019, 14, 70–77. [CrossRef]
23. Şahin, T.; Khalili, R.; Boban, M.; Wolisz, A. Reinforcement learning scheduler for vehicle-to-vehicle communications outside
coverage. In Proceedings of the 2018 IEEE Vehicular Networking Conference (VNC), Taipei, Taiwan, 5–7 December 2018; IEEE:
Piscataway, NJ, USA, 2018; pp. 1–8.
24. Pamuklu, T.; Erol-Kantarci, M.; Ersoy, C. Reinforcement Learning Based Dynamic Function Splitting in Disaggregated Green
Open RANs. In Proceedings of the IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021.
25. Elsayed, M.; Erol-Kantarci, M.; Yanikomeroglu, H. Transfer Reinforcement Learning for 5G-NR mm-Wave Networks. IEEE Trans.
Wirel. Commun. 2020, 20, 2838–2849. [CrossRef]
26. Zhang, T.; Shen, S.; Mao, S.; Chang, G.K. Delay-aware Cellular Traffic Scheduling with Deep Reinforcement Learning. In
Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020;
pp. 1–6.
27. Chen, G.; Zhang, X.; Shen, F.; Zeng, Q. Two Tier Slicing Resource Allocation Algorithm Based on Deep Reinforcement Learning
and Joint Bidding in Wireless Access Networks. Sensors 2022, 22, 3495. [CrossRef]
28. Mollahasani, S.; Erol-Kantarci, M.; Hirab, M.; Dehghan, H.; Wilson, R. Actor–Critic Learning Based QoS-Aware Scheduler for
Reconfigurable Wireless Networks. IEEE Trans. Netw. Sci. Eng. 2021, 9, 45–54. [CrossRef]
29. Pamuklu, T.; Mollahasani, S.; Erol-Kantarci, M. Energy-Efficient and Delay-Guaranteed Joint Resource Allocation and DU
Selection in O-RAN. In Proceedings of the 5G World Forum (5GWF), Montreal, QC, Canada, 13–15 October 2021.
30. O-RAN Alliance. O-RAN-WG1-O-RAN Architecture Description—v04.00.00; Technical Specification; O-RAN Alliance: Alfter,
Germany, 2021.
31. Yu, Y.J.; Pang, A.C.; Hsiu, P.C.; Fang, Y. Energy-efficient downlink resource allocation for mobile devices in wireless systems. In
Proceedings of the 2013 IEEE Global Communications Conference (GLOBECOM), Atlanta, GA, USA, 9–13 December 2013; IEEE:
Piscataway, NJ, USA, 2013; pp. 4692–4698.
32. Bonati, L.; D’Oro, S.; Polese, M.; Basagni, S.; Melodia, T. Intelligence and Learning in O-RAN for Data-Driven NextG Cellular
Networks. IEEE Commun. Mag. 2021, 59, 21–27. [CrossRef]
33. ITU. ITU-T Recommendation G Suppl. 66. In 5G Wireless Fronthaul Requirements in a Passive Optical Network Context; Technical
Report; International Telecommunications Union: Geneva, Switzerland, 2018.
34. 3GPP. Table 6.1.7-A: Standardized QCI Characteristics from 3GPP TS 23.203 V16.1.0; Technical Report; 3GPP: Sophia Antipolis,
France, 2020.
35. Gawłowicz, P.; Zubow, A. ns-3 meets OpenAI Gym: The playground for machine learning in networking research. In Proceedings
of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Miami Beach,
FL, USA, 25–29 November 2019; pp. 113–120.
36. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016,
arXiv:1606.01540.