Article
Energy-Aware Dynamic DU Selection and NF Relocation in
O-RAN Using Actor–Critic Learning
Shahram Mollahasani 1 , Turgay Pamuklu 1 , Rodney Wilson 2 and Melike Erol-Kantarci 1, *
1 School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada;
[email protected] (S.M.); [email protected] (T.P.)
2 Ciena, Ottawa, ON K2K 0L1, Canada; [email protected]
* Correspondence: [email protected]
Abstract: Open radio access network (O-RAN) is a promising candidate for meeting flexibility
and cost-effectiveness goals by incorporating openness and intelligence into its architecture. In the O-RAN
architecture, a central unit (O-CU) and a distributed unit (O-DU) are virtualized and executed on
processing pools of general-purpose processors that can be placed at different locations. Therefore, it
is challenging to choose a proper location for executing network functions (NFs) over these entities
by considering propagation delay and computational capacity. In this paper, we propose a Soft Actor–
Critic Energy-Aware Dynamic DU Selection algorithm (SA2C-EADDUS) by integrating two nested
actor–critic agents in the O-RAN architecture. In addition, we formulate an optimization model that
minimizes delay and energy consumption. Then, we solve that problem with an MILP solver and use
that solution as a lower bound comparison for our SA2C-EADDUS algorithm. Moreover, we compare
that algorithm with recent works, including RL- and DRL-based resource allocation algorithms and
a heuristic method. We show that by collaborating A2C agents in different layers and by dynamic
relocation of NFs, based on service requirements, our schemes improve the energy efficiency by 50%
with respect to other schemes. Moreover, we reduce the mean delay by a significant amount with our
Citation: Mollahasani, S.; Pamuklu,
novel SA2C-EADDUS approach.
T.; Wilson, R.; Erol-Kantarci, M.
Energy-Aware Dynamic DU Selection
Keywords: actor–critic learning; energy-efficiency; O-RAN; RAN optimization
and NF Relocation in O-RAN Using
Actor–Critic Learning. Sensors 2022,
22, 5029. https://doi.org/10.3390/
s22135029
1. Introduction
Next-generation wireless networks involve a fundamental transformation of the radio access
network, known as the next-generation radio access network (NG-RAN) [1]. The NG-RAN protocol
stack can be split into eight different disaggregated options, which are combined within three
network units: the radio unit (RU), the distributed unit (DU), and the centralized unit (CU).
Furthermore, unlike in the traditional RAN, NG-RAN functions can be virtualized on top of
general-purpose hardware. In Open RAN (O-RAN), the concept of virtualized RAN (vRAN) and the
disaggregation of network units reach full interoperability through open interfaces among
multiple vendors [2,3]. However, the placement of virtual network functions can be challenging
due to multiple constraints, such as the routing path, RAN protocol splits, bandwidth
limitations, maximum latency tolerance requirements, heterogeneous computational resources,
and so on. The main objective of this work is to develop an RL-based network function placement
scheme that minimizes energy consumption while satisfying the expected quality of service (QoS)
for each network function. In this work, resource block (RB) scheduling is considered as a
network function (NF). The proposed scheme dynamically relocates this NF among DUs based on
their location and processing power, targeting minimum energy consumption while satisfying the
QoS requirements of users.
The disaggregation concept in O-RAN allows RAN functions to be placed at different computing
devices in a distributed manner [4]. Therefore, it is vital to identify how this disaggregation
can be executed and what metrics need to be satisfied to run these
disaggregated components correctly. By considering the concept of function splits in O-
RAN, the requirements for network functions such as minimum bit rate and the maximum
latency for communication among O-RAN components (RU, DU, and CU) need to be
satisfied. This work is developed based on the O-RAN architecture and split option 7.2
where the functional split is between RU and DU [1,5].
Since the amount of energy consumption in DUs is proportional to the time they are
active, i.e., processing tasks, mobile network operators (MNOs) are seeking an intelligent
assignment model where RUs use minimum required resources at DUs, while QoS require-
ments such as packet delivery ratio and latency are satisfied [6]. Note that a part of DU
energy consumption is fixed due to the cooling system, idle power, etc. Therefore, reducing
the active time of DUs can decrease the overall energy consumption in the network. This
can be achieved by optimizing load distribution among DUs with respect to their process-
ing power and their location and setting redundant DUs to sleep mode. The main focus of
this paper is to highlight the effect of deploying an AI-enabled network function relocation
in a dynamic environment on energy consumption and network performance. To this end,
we examine our model by considering resource block scheduling as a network function
and applying our AI-based framework to it. We evaluate this model by generating User
Datagram Protocol (UDP) and Transmission Control Protocol (TCP) packets (video and
ITS) with different delay budgets and QoS requirements.
In this paper, we propose an RL-based resource block allocation scheme for 5G together with DU
selection in the O-RAN architecture. In the proposed model, we employ energy awareness as a key
performance indicator and provide a multi-agent actor–critic (A2C) method as the primary
novelty of this study. We employ two agents: one is responsible for allocating RBs to UEs by
considering the type and priority of packets, while the other is integrated to reduce energy
consumption by considering the processing power, capacity, and propagation delay among DUs in
the network. Furthermore, performance
evaluation includes a mixed-integer linear programming (MILP) solver comparison to
determine the gap between this novel approach and the lower bound of that multi-objective
minimization problem. Thus, we demonstrate the feasibility of this advanced machine
learning implementation in a disaggregated RAN scheme.
In our prior work [7], we developed an A2C-based algorithm for invoking NFs at
the CU or DU by considering traffic requirements. This work extends [7] by considering
the propagation delay between O-RAN components and the energy consumption, and in
addition, we consider DUs with different processing capacities. The proposed model is ex-
amined for a different number of UEs, and it is compared with a heuristic model developed
in our previous works and two other recent works. Our results show that the proposed
model can reduce energy consumption by 50% with respect to models where network
functions are executed locally under the given settings. Apart from energy conservation,
the proposed model can reduce the overall mean delay for an intelligent transport system
(ITS) and video traffic down to 7 ms and 12 ms, respectively. In addition, the packet delivery
ratio for ITS and video traffic increases by up to 80% and 30%, respectively.
The rest of this paper is organized as follows: In Section 2, we discuss the recent works
in this area. In Section 3, we formulate the problem and propose an MILP solution. The A2C-
based algorithm is comprehensively explained in Section 4. In Section 5, the proposed
scheme is compared with four baseline algorithms, and in Section 6, we conclude the paper.
2. Related Work
Energy-efficiency algorithms for traditional RANs have been comprehensively pre-
sented in [8]. Most of the works covered in [8] are based on on-off techniques for BSs. Some
of these models rely on traffic prediction to estimate low traffic intervals, while others use
cell zooming techniques to expand the BSs’ coverage with respect to their neighbors [9–11].
These studies treat RAN equipment as monolithic.
Energy efficiency is also considered in centralized RAN (C-RAN), where just the radio
unit of BSs is disaggregated at remote radio heads (RRHs), and the rest is implemented at
the baseband unit (BBU). For instance, in [12], BBU placement across physical machines in
a cloud site is formulated as a bin-packing problem, and the authors tackle this problem
by proposing a heuristic algorithm. Additionally, in [13,14], the authors improve the work
in [12] by proposing a BBU virtualization technique with linear computational complexity
that reduces the power consumption in the network. Furthermore, in [15], the authors
show how traffic forecasting at each BS can be used in dynamically switching (on/off)
RRHs. Moreover, ref. [16] shows an end-to-end energy-efficient model by activating and
placing virtual network functions (VNFs) on physical machines and distributing the traffic
among them.
The aforementioned C-RAN scenarios mainly reflect the fixed functional split; they require a
high fronthaul bitrate and consequently incur high deployment costs. To this
end, the impact of the flexible function split is examined in several recent studies [17].
For instance, in [18], savings in power and computational resources with respect to different
function splits are analytically modeled. In [19], the authors aim to optimize the energy
efficiency and the midhaul bandwidth in C-RAN.
Different from these works, in this paper we evaluate the energy consumption for
dynamic NF migration among edge clouds, i.e., not only CU allocation as in C-RAN but
also DU allocation in O-RAN is considered. Finally, refs. [20,21] show that the migration of
NFs has a non-negligible impact on energy consumption, which has not been addressed
in previous works. Since we aim to address the placement of resource allocation function
using machine learning, it is important to give a brief overview on the existing studies
on RL-based resource allocation [22]. For instance, an RL-based resource block allocation
technique is employed in a vehicle-to-vehicle network in [23].
In [24], an RL-based algorithm is proposed for optimizing the energy consumption
and cost in a disaggregated RAN. Moreover, in [25], the authors develop an RL-based
user-beam association and resource allocation scheme using transfer reinforcement learn-
ing. In [26], a deep RL-based resource block allocation is introduced in which RBs are
allocated in a way that the mean delay is reduced. Another deep RL approach is proposed
in [27] to solve a two-tier resource allocation problem in a standalone base station. In our
previous work [28], we developed an RL-based NF to schedule URLLC and eMBB pack-
ets with respect to the delay budget and channel conditions of UEs in the network. We
also developed an RL-based algorithm for invoking NFs at the CU or DU by considering
traffic requirements [7]. However, unlike [7], in this paper we develop a comprehensive
scheme that considers the propagation delay between O-RAN components and the en-
ergy consumption, and in addition we consider DUs with different processing capacities.
Furthermore, in [29], we introduced an optimization-based solution for the DU selection
problem under delay constraints. This paper extends [29] by modeling the problem as a
multi-objective minimization problem for jointly addressing energy-efficiency and delay.
Unlike previous works, in this paper we develop an actor–critic based DU selection
scheme for a RAN with disaggregated layers (such as O-RAN) to dynamically relocate
network functions among available DUs (edge servers) by considering their processing
power and propagation delay in a way that the overall energy consumption in the network
is reduced, while packet delivery ratio and delay budget of user traffic are satisfied.
3. System Model
Figure 1 shows the overall architecture, which is structured as a time-interval-based model. In
each time interval (t ∈ T), I UEs (i ∈ I) request demands, each defined as a tuple ⟨i, t⟩. A
demand may belong to one of two (K = 2) different types of traffic (k ∈ K), namely video and
ITS. These traffic types have different demand sizes (U_⟨i,t⟩k) and delay budgets (Δ_⟨i,t⟩k). On
the infrastructure side, we consider L low-processing-power DUs (DU^LP, l ∈ L) that serve these
UEs. These DUs can
house network protocols from the lowest level to the packet data convergence protocol [30].
Moreover, each one has a dedicated RU to perform lower layer RF functions.
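To make the notation concrete, a demand tuple ⟨i, t⟩ with its type, size, and budget can be encoded as in the minimal sketch below; the field names and toy values are our own illustration, not code from the paper.

```python
from dataclasses import dataclass

# Hypothetical encoding of a demand tuple <i, t>; field names mirror the
# notation U_<i,t>k (size) and Delta_<i,t>k (delay budget).
@dataclass(frozen=True)
class Demand:
    ue: int           # user index i in I
    t: int            # arrival time interval t in T
    k: int            # traffic type k in K (e.g., 0: video, 1: ITS)
    size: float       # demand size U_<i,t>k
    budget_ms: float  # delay budget Delta_<i,t>k

demands = [Demand(ue=3, t=0, k=1, size=512.0, budget_ms=30.0)]
```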
Figure 1. The overall architecture of the proposed model. Agents are located at DUs and interact
with the DU selection algorithm by sending some feedback during each time interval (the red arrow).
Then the DU selection scheme informs the agents (the green arrow) regarding the location of the NF during
the next time interval based on the types of packets (URLLC or eMBB), available processing capacity,
scheduling delay, and propagation delay among DUs to minimize the overall energy consumption in
the network.
In Equation (1), a_⟨i,t⟩rt′ is a binary decision variable that equals '1' if demand ⟨i, t⟩
is processed in RB r in time interval t′, and b_lt′ indicates that the NFs of DU_l^LP are
migrated to DU^HP in t′. Lastly, Equation (2) calculates the total energy consumption in the DUs.

E_{lt'} = \Big( E_l^F + E_l^D \sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{r \in \mathcal{R}} a_{\langle i,t \rangle r t'} \Big) \cdot (1 - b_{lt'})    (1)

E^{TOT} = \sum_{t' \in \mathcal{T}} \Big( E^{HP} + \sum_{l \in \mathcal{L}} E_{lt'} \Big)    (2)
The delay of each traffic demand (⟨i, t⟩) is the summation of two terms: the first is the
scheduling delay (δ^S_⟨i,t⟩), and the second is the propagation delay (δ^P_⟨i,t⟩).
Equation (3) calculates the scheduling delay, in which y_⟨i,t⟩t′ is a binary decision variable
equaling '1' if the traffic demand ⟨i, t⟩ is scheduled/assigned for processing in an RB in time
interval t′. However, for a large traffic demand, the RBs in a single time interval may not be
enough to finish that demand; thus, multiple time intervals might be assigned to demand ⟨i, t⟩.
For that reason, we find the most recent assigned time interval (max_{t′∈T}(y_⟨i,t⟩t′ · t′)) for
that demand (⟨i, t⟩). Then, we subtract the demand's arrival time interval (t) from that value
to find the number of time intervals that demand (⟨i, t⟩) waited. Finally, we multiply that
value by the length of a transmission time interval (TTI) to find the scheduling delay.

\delta^S_{\langle i,t \rangle} = \Big( \max_{t' \in \mathcal{T}} ( y_{\langle i,t \rangle t'} \cdot t' ) - t \Big) \cdot TTI    (3)
\delta^P_{\langle i,t \rangle} = \frac{D^P_{M(i)}}{\sum_{t' \in \mathcal{T}} y_{\langle i,t \rangle t'}} \sum_{t' \in \mathcal{T}} y_{\langle i,t \rangle t'} \cdot b_{M(i)t'}    (4)

\delta^M = \frac{\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{k \in \mathcal{K}} (\delta^S_{\langle i,t \rangle} + \delta^P_{\langle i,t \rangle}) \cdot u_{\langle i,t \rangle k}}{\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} \sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k}}    (5)
Equation (4) defines the propagation delay for demand ⟨i, t⟩. As explained, a user demand may be
scheduled in more than one time interval. Some of these time intervals may belong to times when
the NF is processed in DU^LP (b_M(i)t′ = 0), where M(i) is a given mapping from user i to its
DU^LP (l = M(i)). In those time intervals, the propagation delay equals zero due to the
negligible distance between UEs and their DU^LP. On the other hand, in some intervals, the NF
may be processed in DU^HP (b_M(i)t′ = 1). To calculate the number of time intervals in the
latter case, we first multiply the scheduled time intervals by the NF migration decision
variable (y_⟨i,t⟩t′ · b_M(i)t′). That value equals one if and only if the NF is migrated to
DU^HP in time interval t′. Second, we divide that value by the total number of time intervals
needed to process that demand (∑_{t′∈T} y_⟨i,t⟩t′). That gives us the fraction of time the
demand is processed in DU^HP. Finally, we multiply that value by the propagation delay between
DU_l^LP and DU^HP (D^P_M(i)) to find the total propagation delay to finish that user demand.
Equation (5) shows the mean delay in the network. Here, the numerator is the total delay, and
the denominator is the total number of demands in the network. In addition, u_⟨i,t⟩k is the
traffic demand indicator, which equals one if there is a demand (⟨i, t⟩) of type k; otherwise,
it equals zero.
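To make Equations (1)–(5) concrete, the following Python sketch evaluates them for fixed decision variables; the array shapes, helper names, and toy values are our own illustrative assumptions rather than code from the paper.

```python
import numpy as np

# rb[l, t2] aggregates the a_<i,t>rt' variables: the number of RBs DU l
# processes in interval t2; b[l, t2] = 1 if DU l's NF runs at DU^HP in t2;
# y is one demand's binary schedule vector over intervals.

def scheduling_delay(y, t_arrival, tti_ms):
    """Eq. (3): (latest assigned interval - arrival interval) * TTI."""
    return (np.nonzero(y)[0].max() - t_arrival) * tti_ms

def propagation_delay(y, b_row, d_prop_ms):
    """Eq. (4): fraction of intervals served at DU^HP times D^P_M(i)."""
    return d_prop_ms * (y * b_row).sum() / y.sum()

def total_energy(rb, b, e_fix, e_dyn, e_hp):
    """Eqs. (1)-(2): per-DU energy, zeroed while migrated, plus DU^HP."""
    e_lt = (e_fix[:, None] + e_dyn[:, None] * rb) * (1 - b)  # Eq. (1)
    return (e_hp + e_lt.sum(axis=0)).sum()                   # Eq. (2)

# Toy check: one demand scheduled in intervals 1 and 3, DU^HP used in 3.
y = np.array([0, 1, 0, 1]); b_row = np.array([0, 0, 0, 1])
print(scheduling_delay(y, t_arrival=0, tti_ms=1.0))  # 3.0 ms
print(propagation_delay(y, b_row, d_prop_ms=0.2))    # 0.1 ms
```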
Equation (6) ensures that the total service rate of the RBs assigned to a demand covers its
size, while Equation (7) ensures that each RB in a time interval is assigned to at most one
demand (⟨i, t⟩). Equation (8) correlates the y_⟨i,t⟩t′ decision variable with a_⟨i,t⟩rt′,
indicating that at least one resource block r ∈ R in time interval t′ is allocated to the user
demand ⟨i, t⟩. Thus, we can simplify our delay calculation by using y_⟨i,t⟩t′ instead of the
more complex decision variable a_⟨i,t⟩rt′. Equation (9) generates the user traffic indicator
u_⟨i,t⟩k from the demand size U_⟨i,t⟩k. A UE may demand only one type of traffic in a specific
time interval, which Equation (10) ensures. In addition, the total number of DU_l^LP that can
migrate their NFs to DU^HP is limited by ξ in Equation (11). Lastly, the delay of each demand is
limited by Δ_⟨i,t⟩k according to its traffic type k in Equation (12).
\sum_{t'=t}^{T} \sum_{r \in \mathcal{R}} S_{irt'} \cdot a_{\langle i,t \rangle rt'} \geq \sum_{k \in \mathcal{K}} U_{\langle i,t \rangle k} \cdot u_{\langle i,t \rangle k}, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (6)

\sum_{\langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle} a_{\langle i,t \rangle rt'} \leq 1, \quad \forall l \in \mathcal{L}, \forall r \in \mathcal{R}, \forall t' \in \mathcal{T}    (7)

M \cdot y_{\langle i,t \rangle t'} - \sum_{r \in \mathcal{R}} a_{\langle i,t \rangle rt'} \geq 0, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle, \forall t' \in \mathcal{T}    (8)

M \cdot u_{\langle i,t \rangle k} - U_{\langle i,t \rangle k} \geq 0, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle, \forall k \in \mathcal{K}    (9)

\sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k} \leq 1, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (10)

\sum_{l \in \mathcal{L}} b_{lt} \leq \xi, \quad \forall t \in \mathcal{T}    (11)

\delta^S_{\langle i,t \rangle} + \delta^P_{\langle i,t \rangle} \leq \sum_{k \in \mathcal{K}} u_{\langle i,t \rangle k} \cdot \Delta_{\langle i,t \rangle k}, \quad \forall \langle i,t \rangle \in \langle \mathcal{I},\mathcal{T} \rangle    (12)
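A minimal sketch of how a few of these constraints (Equations (7), (10), and (11)) could be expressed with the open-source PuLP modeler is shown below; the index sets are toy-sized, and the paper does not specify its solver interface, so this is illustrative only.

```python
import pulp

# PuLP sketch (ours) of Eqs. (7), (10), and (11) over toy-sized index sets.
L_, R_, T_, K_ = range(2), range(4), range(6), range(2)
demands = [(0, 0), (1, 2)]  # <i, t> tuples
xi = 1                      # migration cap from Eq. (11)

prob = pulp.LpProblem("eaddus_sketch", pulp.LpMinimize)
a = pulp.LpVariable.dicts("a", (demands, R_, T_), cat="Binary")
b = pulp.LpVariable.dicts("b", (L_, T_), cat="Binary")
u = pulp.LpVariable.dicts("u", (demands, K_), cat="Binary")

for r in R_:
    for t2 in T_:   # Eq. (7): an RB serves at most one demand per interval
        prob += pulp.lpSum(a[d][r][t2] for d in demands) <= 1
for d in demands:   # Eq. (10): at most one traffic type per demand
    prob += pulp.lpSum(u[d][k] for k in K_) <= 1
for t2 in T_:       # Eq. (11): at most xi DU^LPs migrate their NFs
    prob += pulp.lpSum(b[l][t2] for l in L_) <= xi
# prob.solve()  # the weighted energy/delay objective would be added first
```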
4. Actor–Critic Solution
4.1. RL Model
In this work, we employ two nested A2C agents which are developed based on O-
RAN architecture. While one agent is responsible for scheduling resource blocks during
each time interval, the other is designed to dynamically choose a proper DU for executing
the scheduler agent by considering energy consumption, processing power, scheduling,
and propagation delay with respect to each DU’s traffic load and location. Processing
resource block allocation decisions of multiple RUs on a single DU (DU HP ) can expand
the observation level of decision agents, which can lead to applying more accurate actions.
More precisely, when an A2C-based scheduler agent can access other RUs’ resource block
map, subcarriers can be allocated to edge UEs in a way that the inter-cell interference
among RUs are minimized as shown in [28]. The goal of the proposed actor–critic agent is
improving the performance of the A2C-based scheduler and reducing the overall energy
Sensors 2022, 22, 5029 7 of 17
by an A2C agent. To this end, a neural network (NN) with three layers of neurons is employed.
Furthermore, in every TTI, the neurons' weights are updated through interactions between the
actor and the critic. In the exploitation stage, the NN works as a non-linear function, and it
is tuned by updating the weights. In each state s, the main goal of the SA2C agents is to
maximize the received reward r by applying more accurate actions a, which can be obtained
through the action–value (Q(s, a)) and state–value (V(s)) functions. The action–value function
is used to estimate the effect of applying actions in states, while the state–value function
estimates the expected outcome of a state. In this work, to improve convergence and increase
the stability of the model (by reducing the overestimation bias), the soft actor–critic model
(SA2C) is used. The overall architecture of SA2C is depicted in Figure 1. In the SA2C model,
instead of evaluating both action- and state-value functions, we only need to estimate V(s).
Moreover, the error parameter in SA2C is a metric for examining the effect of the performed
action relative to the expected value V(s), which can be expressed as the advantage
A(s_t, a_t) = Q(s_t, a_t) − V(s_t). In addition, the SA2C model is a synchronous model (unlike
asynchronous actor–critic models), which makes it more consistent and suitable for
disaggregated implementations. In the proposed scheme, each actor and critic contains an
independent neural network, described as follows:
• The critic's neural network is used to estimate the corresponding value function for
aligning the actor's actions. In the proposed scheme, we use two critics to minimize the
overestimation bias.
• The actor's neural network is used to estimate the proper action (choosing the best DU
for executing the NF) during each time interval.
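As an illustration of this actor/twin-critic structure, the sketch below builds three-layer networks in PyTorch (the framework used in Section 5) and forms a TD-based advantage estimate; the state dimension, action count, and update details are our assumptions, with hidden-layer sizes taken from the 1024 × 512 setting reported in the simulation parameters.

```python
import torch
import torch.nn as nn

# A sketch (not the authors' exact code) of the actor and twin critics.
STATE_DIM, N_ACTIONS = 4, 2  # 4-element state tuple; DU^LP vs. DU^HP choice

def mlp(out_dim):
    return nn.Sequential(nn.Linear(STATE_DIM, 1024), nn.ReLU(),
                         nn.Linear(1024, 512), nn.ReLU(),
                         nn.Linear(512, out_dim))

actor = mlp(N_ACTIONS)             # action logits over candidate DUs
critic1, critic2 = mlp(1), mlp(1)  # two V(s) heads against overestimation

def advantage(s, r, s_next, gamma=0.99):
    # TD estimate of A(s_t, a_t) = Q(s_t, a_t) - V(s_t), taking the minimum
    # of the two critics to damp the overestimation bias.
    v = torch.min(critic1(s), critic2(s))
    v_next = torch.min(critic1(s_next), critic2(s_next))
    return r + gamma * v_next.detach() - v
```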
The states of the proposed DU selection scheme are extracted from the environment as a tuple,
S_t = {DU_type, QCI, CQI, HOL_delay}: the DU type identifies a DU's location and its available
processing power; the QoS class identifier (QCI) captures the traffic type and its priority;
the channel quality indicator (CQI) reflects the signal strength; and the head-of-line delay
(HOL_delay) is the amount of time packets stay in the scheduler queue. Meanwhile, two
independent actions are generated by the agents in each time interval. The action space of the
A2C-based scheduler is the location of the assigned resource block in the RB map. Additionally,
the action space of the DU selection agent is the DU type that should handle the NFs (in our
case, we only consider the placement of the A2C-based scheduler; however, our scheme can be
extended to other NFs in the 5G stack [26,28]). The DU selection agent needs to collaborate
with the A2C-based RB allocation agent in a way that the overall energy consumption in the
network is reduced, while the expected QoS metrics (delay and packet delivery ratio) of the
A2C-based RB allocation agent are met.
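A hypothetical flattening of this state tuple into an observation vector is sketched below; the scaling constants are placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical encoding of S_t = {DU_type, QCI, CQI, HOL_delay}; the
# normalization constants are placeholders.
def encode_state(du_type, qci, cqi, hol_delay_ms, max_budget_ms=150.0):
    return np.array([
        float(du_type),                                    # 0: DU^LP, 1: DU^HP
        qci / 9.0,                                         # QCI class, scaled
        cqi / 15.0,                                        # CQI in [0, 15]
        min(hol_delay_ms, max_budget_ms) / max_budget_ms,  # HOL delay
    ], dtype=np.float32)
```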
We consider two separate reward functions for our agents since their objectives and their
action spaces are different. In this work, the reward function for the A2C-based scheduler is
defined as in [28]. Here, the feedback of UE_i is its CQI (cqi_i); R2 is an extra reward of 1
given to URLLC traffic for prioritization; the traffic delay budget and the packet HOL delay
are denoted Traffic_Budget and Packet_Delay, respectively; and β, γ, and Φ are
weighting/scaling factors.
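The exact reward expression from [28] is not reproduced in this excerpt; the following hedged reconstruction combines the described terms (CQI feedback, a delay-budget slack, and the extra URLLC reward R2) under an assumed linear form with weights β, γ, and Φ.

```python
# Hedged reconstruction of the scheduler reward described in the text; the
# linear combination is an assumption, not the paper's exact expression.
def scheduler_reward(cqi_i, packet_delay_ms, traffic_budget_ms, is_urllc,
                     beta=1.0, gamma=1.0, phi=1.0):
    r2 = 1.0 if is_urllc else 0.0                   # extra URLLC reward
    slack = (traffic_budget_ms - packet_delay_ms) / traffic_budget_ms
    return beta * cqi_i + gamma * slack + phi * r2  # weighted sum (assumed)
```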
The scheduling agent consists of an actor and a critic. The actor is located at the BSs and is
responsible for allocating resource blocks to UEs, and the critic is integrated into the DUs to
inspect the actors and improve their decisions. In this model, the reward function is defined
based on the CQI feedback, the amount of time a packet stays in the scheduling queue, and the
UEs' satisfaction. To this end, the reward function is defined in a way that actors receive a
reward when the received SINR becomes higher (higher CQI), the scheduling delay is reduced, and
the UEs' satisfaction ratio is increased (the mean delay should be less than the delay budget).
Therefore, when the RB map filled by an actor causes inter-cell interference, the received SINR
is reduced and the mean delay is increased, leading to a reduced reward and a punishment for
the agent, which improves the learning process.
We define the reward for the DU selection agent as follows. Here, n_URLLC is the number of UEs
that generate URLLC traffic, which can be obtained from the QCI value assigned by the EPS
bearer; the QCI indicates packet priorities, types, and delay budgets [34]. In this reward
function, the output of sinc(πbc) is a binary value used to produce discrete actions (0 or 1)
in each time interval. As explained previously, RUs select their local DUs to perform
higher-layer processing. Our system model includes one DU with higher processing power than the
others. For energy efficiency, one would consider offloading the NFs of local DUs to the
high-processing-power DU and switching the local DUs off. This may also allow expanding the
inter-cell interference observation capability of the agents. However, it is important to note
that this approach would have an adverse impact on delay, since local DUs can provide access
with less propagation delay than the distant high-power DU site; this is in addition to the
scheduling delay. Moreover, DU^HP has a limited processing capacity, and we cannot transfer the
load of all DUs to it. Therefore, in our reward function, we want to ensure in each time
interval that we choose a proper DU (based on the priority of its packets and the propagation
delay with respect to DU^HP) such that the overall delay that each packet experiences
(scheduling delay (δ^S_it) + propagation delay (δ^P_it)) always remains below the predefined
delay budget (Δ_itk) (Equation (18)). Moreover, since URLLC UEs are delay sensitive, we need to
make sure that the mean delay of URLLC traffic is kept as low as possible with respect to other
UEs (Equation (19)). To this end, by increasing α, the overall delay (δ^S_it + δ^P_it) can be
proportionally reduced with respect to the predefined delay budget. In addition, we ensure that
the total energy consumption (E^TOT) is reduced through a third term, which is calculated by
Equation (2). τ and ω are scalar weights defined based on the priority given to enhancing the
packet delivery ratio (R′_1), reducing the overall delay (R′_2), and reducing the energy
consumption in the network (E^TOT). As a final remark, in this algorithm, unlike our previous
work [7], we improve our reward function by considering the propagation delay between O-RAN
components as well as the fixed and dynamic energy consumption with respect to the DUs'
processing capacities.
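Since the corresponding equations are likewise not reproduced in this excerpt, the sketch below gives one plausible reading of this reward: weighted delivery-ratio and delay-slack terms minus an energy penalty; the functional form and default weights are our assumptions.

```python
# One plausible reading (ours) of the DU-selection reward: tau and omega
# weight the delivery-ratio term R1' and the alpha-scaled delay-slack term
# R2', with a penalty on total energy E_TOT from Eq. (2).
def du_selection_reward(delivery_ratio, sched_delay_ms, prop_delay_ms,
                        budget_ms, e_tot, alpha=1.0, tau=1.0, omega=1.0,
                        energy_weight=1e-3):
    r1 = delivery_ratio                                            # R1'
    r2 = alpha * (budget_ms - (sched_delay_ms + prop_delay_ms)) / budget_ms
    return tau * r1 + omega * r2 - energy_weight * e_tot
```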
5. Performance Evaluation
The SA2C-EADDUS scheme is implemented in ns3-gym, a framework in which OpenAI Gym (a toolkit
for using machine learning libraries) is integrated into ns-3 [35,36]. The neural network of
the proposed scheme is developed in PyTorch. In the simulations, we assume the number of UEs is
between 40 and 80; the UEs are randomly distributed, and each is associated with the closest DU
among 4 DU^LPs. The URLLC UE ratio is 10%, and we employ numerology zero with 12 subcarriers,
14 symbols per subframe, and 15 kHz subcarrier spacing. Furthermore, scheduling decisions are
applied every TTI.
threshold value can be challenging due to the high dynamicity of the network parameters.
The proposed heuristic algorithm is illustrated in Algorithm 1.
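Algorithm 1 itself does not survive in this excerpt; the following is a rough, hypothetical sketch of a threshold-based migration heuristic consistent with the surrounding text, in which the threshold is precisely the parameter flagged as hard to tune in a dynamic network.

```python
# Hypothetical guess at Algorithm 1's shape: lightly loaded DU^LPs are
# offloaded to DU^HP, up to its capacity and the migration cap xi.
def heuristic_du_selection(du_loads, hp_capacity, threshold, xi):
    migrate, used = [], 0.0
    for l, load in sorted(enumerate(du_loads), key=lambda x: x[1]):
        if load < threshold and used + load <= hp_capacity and len(migrate) < xi:
            migrate.append(l)  # offload this DU's NF and put the DU to sleep
            used += load
    return migrate
```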
Parameters: Value
Number of neurons: 1024 × 512 layers (Actor + Critic)
5.4.1. Convergence
Before discussing the obtained results, in Figure 3a we present the convergence of the reward
function for SA2C-EADDUS. In this figure, the number of UEs is 70, of which 10% are assigned
URLLC traffic. Additionally, we employed an epsilon-greedy policy, forcing actors to either
assign RBs with the highest weight or choose actions randomly during the exploration phase. As
shown, the algorithm converges after roughly 100 episodes, where each episode contains
500 iterations.
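The epsilon-greedy selection described here can be sketched as follows; the decision rule is standard, and the weight vector stands in for the actor's output.

```python
import random

# Epsilon-greedy action selection: explore randomly with probability eps,
# otherwise pick the action with the highest actor weight.
def epsilon_greedy(weights, eps):
    if random.random() < eps:
        return random.randrange(len(weights))
    return max(range(len(weights)), key=lambda a: weights[a])
```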
Figure 3. The convergence of the reward function and the overall energy conservation in the
network when SA2C-EADDUS with different processing capacity levels is employed: (a) the
convergence of the reward function; (b) the impact of changing ξ in SA2C-EADDUS.
Figure 4. Energy consumption and mean delay performance with the MILP solver: (a) the tradeoff
between these two KPIs; (b) the impact of an increasing number of UEs.
Figure 5a compares the mean delay results of the SA2C-EADDUS scheme and the MILP solution. The
SA2C-EADDUS scheme provides a reasonable mean delay when the number of UEs is lower than 14.
However, with a higher number of UEs, the high contention causes packet drops, which in turn
increase the mean delay. On the other hand, the MILP solver is not affected by this due to its
ideal conditions and predefined given data. Figure 5b compares the energy consumption of the
two solutions. The SA2C-EADDUS scheme performs very close to the MILP solution.
Figure 5. Lower bound comparisons of the proposed method SA2C-EADDUS: (a) mean delay
comparison; (b) energy consumption comparison.
5.4.4. Delay
Hereafter, we examine the performance of the SA2C-EADDUS scheme with respect to three baselines
and use a larger network where the number of UEs is varied between 40 and 80. As shown in
Table 1, we evaluate the proposed model by generating User Datagram Protocol (UDP) and
Transmission Control Protocol (TCP) packets (video and ITS) with different delay budgets
(150 ms and 30 ms, respectively) and QoS requirements. In Figure 6a,b, we present the mean
delay of video packets and ITS packets independently to illustrate how the proposed model
manages to keep the mean delay of each traffic type below its delay budget threshold. The
proposed SA2C-EADDUS scheme is compared with an A2C-based scheduler (A2C-RBA) [28], a DRL-based
scheduler with different processing capacity levels [26], and a heuristic scheme
(Heuristic-DUS). Here, to examine the effect of the processing capacity of DU^HP on the network
performance, we assume that DU^HP has two processing capacity levels. In the first, we can
transfer the processing load of all DUs (DU^LP) to DU^HP (100%-DRL-EADDUS), while in the
second, the load of only 50% of the DU^LPs can be transferred to DU^HP (50%-DRL-EADDUS). The
energy consumption
corresponding to the delay results in Figure 6 is depicted in Figure 3b. In Figure 3b,
we present the overall energy conservation when employing SA2C-EADDUS with different capacity
levels. The maximum energy (20 kWh) is consumed when UEs are scheduled locally, and energy
consumption can be reduced by up to 50% as the available processing power increases.
Figure 6. The overall mean delay considering different processing capacity levels and
algorithms: (a) the video packets' mean delay; (b) the ITS packets' mean delay.
As observed in Figure 6, the proposed algorithm reduces the mean delay of both ITS and video
traffic with respect to the baselines. Additionally, by increasing the processing power, the
observation level of the agents increases and actions become more accurate; therefore, the
100%-DRL-EADDUS agent performs better, reducing the mean delay in the network in comparison
with the 50%-DRL-EADDUS agent. Finally, SA2C-EADDUS reduces the mean delay dramatically with
respect to the A2C-RBA and Heuristic-DUS algorithms, since SA2C-EADDUS can allocate RBs in a
way that reduces the inter-cell interference in the network. Additionally, the SA2C-EADDUS
approach performs better than the DRL-EADDUS algorithm because, unlike DRL-EADDUS, the RB
allocation agent in the SA2C-EADDUS scheme considers, in addition to packet delay, the mean CQI
level of UEs and the priority of URLLC packets.
Figure 7. The overall packet delivery ratio considering different processing capacity levels
and algorithms: (a) the video packets' delivery ratio; (b) the ITS packets' delivery ratio.
6. Conclusions
As future mobile networks become more complex, the need for intelligence and the participation
of more players is emerging, eventually leading to the need for openness. As these goals define
initiatives such as O-RAN and several others, there is a dire need to explore intelligence
capabilities. In this paper, we evaluated the significance of expanding the observation level
for NFs in the O-RAN architecture. To this end, we consider the resource allocation function as
an example NF and propose two nested A2C-based algorithms that work together. The first layer
dynamically transfers NFs to a DU with higher processing power to strike a balance between
saving energy and improving the accuracy of actions. Meanwhile, the second layer contains an
A2C-based scheduling algorithm, which allocates RBs by considering user traffic types and their
delay budgets. The simulation results show that the proposed scheme can significantly increase
energy efficiency and reduce the overall mean delay and packet drop ratio with respect to the
case where the NF is solely executed at local DUs with limited processing power. In future
work, we will incorporate the delay and energy consumption of the fronthaul links and the
switching networks.
Author Contributions: Conceptualization, S.M. and T.P.; methodology, S.M. and T.P.; software,
S.M. and T.P.; validation, S.M. and T.P.; formal analysis, S.M. and T.P.; investigation, S.M. and T.P.;
resources, S.M., T.P. and R.W.; writing—original draft preparation, S.M. and T.P.; writing—review
and editing, S.M. and T.P.; visualization, S.M. and T.P.; supervision, M.E.-K.; project administration,
M.E.-K.; funding acquisition, Ontario Centers of Excellence (OCE) 5G ENCQOR program and Ciena.
All authors have read and agreed to the published version of the manuscript.
Funding: This work is supported by Ontario Centers of Excellence (OCE) 5G ENCQOR program
and Ciena.
Conflicts of Interest: The authors declare no conflict of interest.
Notations
The following notations are used in this manuscript:
Sets:
r ∈ R: set of RBs in one time interval
t ∈ T: set of time intervals
i ∈ I: set of UEs
⟨i, t⟩ ∈ ⟨I, T⟩: tuple of demands
l ∈ L: set of DU^LP
k ∈ K: type of traffic
Variables:
a_⟨i,t⟩rt′: indicates RB r in t′ is assigned to ⟨i, t⟩
b_lt: indicates the NFs of DU^LP l migrate to DU^HP in t
y_⟨i,t⟩t′: indicates any r in t′ is assigned to ⟨i, t⟩
u_⟨i,t⟩k: user traffic indicator
Given Data:
D^P_l: propagation delay
E^F_l: fixed energy consumption at l
E^D_l: dynamic energy consumption coefficient at l
E^HP: total energy consumption at DU^HP
S_irt′: max service rate
U_⟨i,t⟩k: user traffic demand size
Δ_⟨i,t⟩k: delay budget
ξ: max. number of DU^LP that can migrate their NFs
M(i): user-to-DU^LP mapping
TTI: length of a transmission time interval
W: energy consumption weight in the objective function
Ω: scaling factor between energy consumption and mean delay
References
1. Klinkowski, M. Latency-Aware DU/CU Placement in Convergent Packet-Based 5G Fronthaul Transport Networks. Appl. Sci.
2020, 10, 7429. [CrossRef]
2. Semov, P.; Koleva, P.; Tonchev, K.; Poulkov, V.; Cooklev, T. Evolution of mobile networks and C-RAN on the road beyond 5G. In
Proceedings of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), Milan, Italy, 7–9 July
2020; IEEE: Piscataway, NJ, USA, 2020; pp. 392–398.
3. Dryjański, M.; Kułacz, Ł.; Kliks, A. Toward Modular and Flexible Open RAN Implementations in 6G Networks: Traffic Steering
Use Case and O-RAN xApps. Sensors 2021, 21, 8173. [CrossRef] [PubMed]
4. Yi, B.; Wang, X.; Li, K.; Huang, M. A comprehensive survey of network function virtualization. Comput. Netw. 2018, 133, 212–262.
[CrossRef]
5. Gilson, M.; Mackenzie, R.; Sutton, A.; Huang, J. NGMN Overview on 5G RAN Functional Decomposition; NGMN Alliance: Frankfurt
am Main, Germany, 2018.
6. Pamuklu, T.; Ersoy, C. GROVE: A Cost-Efficient Green Radio Over Ethernet Architecture for Next Generation Radio Access
Networks. IEEE Trans. Green Commun. Netw. 2021, 5, 84–93. [CrossRef]
7. Mollahasani, S.; Erol-Kantarci, M.; Wilson, R. Dynamic CU-DU Selection for Resource Allocation in O-RAN Using Actor–Critic
Learning. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021.
8. Wu, J.; Zhang, Y.; Zukerman, M.; Yung, E.K.N. Energy-efficient base-stations sleep-mode techniques in green cellular networks:
A survey. IEEE Commun. Surv. Tutor. 2015, 17, 803–826. [CrossRef]
9. Oh, E.; Son, K.; Krishnamachari, B. Dynamic base station switching-on/off strategies for green cellular networks. IEEE Trans.
Wirel. Commun. 2013, 12, 2126–2136. [CrossRef]
10. Niu, Z. TANGO: Traffic-aware network planning and green operation. IEEE Wirel. Commun. 2011, 18, 25–29. [CrossRef]
11. Mollahasani, S.; Onur, E. Density-aware, energy-and spectrum-efficient small cell scheduling. IEEE Access 2019, 7, 65852–65869.
[CrossRef]
12. Qian, M.; Hardjawana, W.; Shi, J.; Vucetic, B. Baseband processing units virtualization for cloud radio access networks. IEEE
Wirel. Commun. Lett. 2015, 4, 189–192. [CrossRef]
13. Wang, X.; Thota, S.; Tornatore, M.; Chung, H.S.; Lee, H.H.; Park, S.; Mukherjee, B. Energy-efficient virtual base station formation
in optical-access-enabled cloud-RAN. IEEE J. Sel. Areas Commun. 2016, 34, 1130–1139. [CrossRef]
14. Sahu, B.J.; Dash, S.; Saxena, N.; Roy, A. Energy-efficient BBU allocation for green C-RAN. IEEE Commun. Lett. 2017, 21, 1637–1640.
[CrossRef]
15. Saxena, N.; Roy, A.; Kim, H. Traffic-aware cloud RAN: A key for green 5G networks. IEEE J. Sel. Areas Commun. 2016,
34, 1010–1021. [CrossRef]
16. Malandrino, F.; Chiasserini, C.F.; Casetti, C.; Landi, G.; Capitani, M. An Optimization-Enhanced MANO for Energy-Efficient 5G
Networks. IEEE/ACM Trans. Netw. 2019, 27, 1756–1769. [CrossRef]
17. Larsen, L.M.; Checko, A.; Christiansen, H.L. A survey of the functional splits proposed for 5G mobile crosshaul networks. IEEE
Commun. Surv. Tutor. 2018, 21, 146–172. [CrossRef]
18. Shehata, M.; Elbanna, A.; Musumeci, F.; Tornatore, M. Multiplexing gain and processing savings of 5G radio-access-network
functional splits. IEEE Trans. Green Commun. Netw. 2018, 2, 982–991. [CrossRef]
19. Alabbasi, A.; Wang, X.; Cavdar, C. Optimal processing allocation to minimize energy and bandwidth consumption in hybrid
CRAN. IEEE Trans. Green Commun. Netw. 2018, 2, 545–555. [CrossRef]
20. Akoush, S.; Sohan, R.; Rice, A.; Moore, A.W.; Hopper, A. Predicting the performance of virtual machine migration. In Proceedings
of the 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems,
Miami Beach, FL, USA, 17–19 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 37–46.
21. Zhan, Z.H.; Liu, X.F.; Gong, Y.J.; Zhang, J.; Chung, H.S.H.; Li, Y. Cloud computing resource scheduling and a survey of its
evolutionary approaches. ACM Comput. Surv. (CSUR) 2015, 47, 1–33. [CrossRef]
22. Elsayed, M.; Erol-Kantarci, M. AI-enabled future wireless networks: Challenges, opportunities, and open issues. IEEE Veh.
Technol. Mag. 2019, 14, 70–77. [CrossRef]
23. Şahin, T.; Khalili, R.; Boban, M.; Wolisz, A. Reinforcement learning scheduler for vehicle-to-vehicle communications outside
coverage. In Proceedings of the 2018 IEEE Vehicular Networking Conference (VNC), Taipei, Taiwan, 5–7 December 2018; IEEE:
Piscataway, NJ, USA, 2018; pp. 1–8.
24. Pamuklu, T.; Erol-Kantarci, M.; Ersoy, C. Reinforcement Learning Based Dynamic Function Splitting in Disaggregated Green
Open RANs. In Proceedings of the IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021.
25. Elsayed, M.; Erol-Kantarci, M.; Yanikomeroglu, H. Transfer Reinforcement Learning for 5G-NR mm-Wave Networks. IEEE Trans.
Wirel. Commun. 2020, 20, 2838–2849. [CrossRef]
26. Zhang, T.; Shen, S.; Mao, S.; Chang, G.K. Delay-aware Cellular Traffic Scheduling with Deep Reinforcement Learning. In
Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020;
pp. 1–6.
27. Chen, G.; Zhang, X.; Shen, F.; Zeng, Q. Two Tier Slicing Resource Allocation Algorithm Based on Deep Reinforcement Learning
and Joint Bidding in Wireless Access Networks. Sensors 2022, 22, 3495. [CrossRef]
28. Mollahasani, S.; Erol-Kantarci, M.; Hirab, M.; Dehghan, H.; Wilson, R. Actor–Critic Learning Based QoS-Aware Scheduler for
Reconfigurable Wireless Networks. IEEE Trans. Netw. Sci. Eng. 2021, 9, 45–54. [CrossRef]
29. Pamuklu, T.; Mollahasani, S.; Erol-Kantarci, M. Energy-Efficient and Delay-Guaranteed Joint Resource Allocation and DU
Selection in O-RAN. In Proceedings of the 5G World Forum (5GWF), Montreal, QC, Canada, 13–15 October 2021.
30. O-RAN Alliance. O-RAN-WG1-O-RAN Architecture Description—v04.00.00; Technical Specification; O-RAN Alliance: Alfter,
Germany, 2021.
31. Yu, Y.J.; Pang, A.C.; Hsiu, P.C.; Fang, Y. Energy-efficient downlink resource allocation for mobile devices in wireless systems. In
Proceedings of the 2013 IEEE Global Communications Conference (GLOBECOM), Atlanta, GA, USA, 9–13 December 2013; IEEE:
Piscataway, NJ, USA, 2013; pp. 4692–4698.
32. Bonati, L.; D’Oro, S.; Polese, M.; Basagni, S.; Melodia, T. Intelligence and Learning in O-RAN for Data-Driven NextG Cellular
Networks. IEEE Commun. Mag. 2021, 59, 21–27. [CrossRef]
33. ITU. ITU-T Recommendation G Suppl. 66. In 5G Wireless Fronthaul Requirements in a Passive Optical Network Context; Technical
Report; International Telecommunications Union: Geneva, Switzerland, 2018.
34. 3GPP. Table 6.1.7-A: Standardized QCI Characteristics from 3GPP TS 23.203 V16.1.0; Technical Report; 3GPP: Sophia Antipolis,
France, 2020.
35. Gawłowicz, P.; Zubow, A. ns-3 meets OpenAI Gym: The playground for machine learning in networking research. In Proceedings
of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Miami Beach,
FL, USA, 25–29 November 2019; pp. 113–120.
36. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016,
arXiv:1606.01540.