Deep Reinforcement Learning Based Resource Allocation in Delay-Tolerance-Aware 5G
Industrial IoT Systems
Problem Solved:
This paper investigates the resource allocation problem under delay tolerance constraints in
5G-enabled IIoT. Specifically, the authors propose a traffic prediction algorithm that supplies
future traffic information to the allocation algorithms. They then design two allocation
algorithms: one that minimizes PRB usage, and one that jointly optimizes PRB and power
allocation; in both cases, delay tolerance is treated as a constraint.
Introduction
Industrial Internet of Things (IIoT) is a network that deeply integrates communication technology
with traditional industrial manufacturing. In IIoT, nodes have strict quality of service (QoS)
requirements, and 5G technology provides important technical support for nodes to guarantee QoS.
Since there are usually many nodes with different QoS in IIoT, a reasonable resource allocation
algorithm is needed to guarantee the QoS of each node [2]. Using network slicing technology, nodes
are divided into different slices according to their QoS so that network resources can be allocated
and managed effectively, which addresses the problem of heterogeneous services. IIoT has stricter requirements on the
delay and reliability of data transmission. In the industrial production process, the failure of data to
arrive in time or even packet loss will cause unpredictable and serious consequences to the
production process and even personal safety. Besides ensuring the reliability of data transmission,
reducing power consumption is also crucial for IIoT systems. Keeping the network transmission
power low is an effective way to reduce power consumption in industrial production, and
maintaining low power consumption is an important way to extend the battery life of remote
industrial devices. According to the Shannon formula, both power and physical resource blocks (PRBs)
can directly affect the data transmission speed, thereby affecting the transmission delay. There is
strong coupling among reliability of data transmission, power, and PRBs. By designing resource
allocation algorithms related to PRBs and power, the reliability requirements of the IIoT system can
be met.
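Since power and PRBs both enter the Shannon rate, their coupling with the transmission delay can be illustrated with a small sketch (the numbers and the simplified SNR model are illustrative assumptions, not values from the paper):

```python
import math

def transmission_delay(packet_bits, num_prbs, prb_bandwidth_hz,
                       power_w, channel_gain, noise_w):
    """Illustrative Shannon-capacity model: the rate grows with both the
    number of PRBs (bandwidth) and the transmit power, so either resource
    can be spent to meet a delay deadline."""
    snr = power_w * channel_gain / noise_w
    rate_bps = num_prbs * prb_bandwidth_hz * math.log2(1.0 + snr)
    return packet_bits / rate_bps  # seconds

# Doubling the PRBs halves the transmission delay at fixed power...
d1 = transmission_delay(1e4, 2, 180e3, 0.1, 1.0, 1e-3)
d2 = transmission_delay(1e4, 4, 180e3, 0.1, 1.0, 1e-3)
# ...while doubling the power also helps, but only logarithmically.
d3 = transmission_delay(1e4, 2, 180e3, 0.2, 1.0, 1e-3)
```

This is why the two resources are strongly coupled: extra bandwidth scales the rate linearly, while extra power gives diminishing returns inside the logarithm.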
System Model
The downlink (the direction in which data is transmitted from a higher-level network
component, such as a cellular base station, to a lower-level device, such as a smartphone)
of a 5G IIoT system is considered, which consists of a single small-cell Base Station (BS)
(a small base station that provides 5G network coverage to a specific area and serves as a
hub for communication with the devices within its range), N mobile nodes and M network
slices.
The downlink communication bandwidth is partitioned into K PRBs (the smallest units of
bandwidth that can be allocated to different communications in the network), and each
block has a bandwidth of B MHz.
Each node in the BS corresponds to a queue, and the data to be sent to the node is buffered
in the queue. Nodes with the same delay tolerance are grouped into the same slice, and the
delay deadline of slice m is defined as D_m^max. If the delay of data in the queue exceeds
the deadline D_m^max, the data is discarded.
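The deadline-based discard rule can be sketched as a simple per-node queue (a hypothetical illustration; the paper's slot and packet model may differ in detail):

```python
class NodeQueue:
    """Per-node buffer at the BS: packets whose queueing delay exceeds the
    slice deadline D_m^max are discarded (a reliability violation)."""
    def __init__(self, deadline_slots):
        self.deadline = deadline_slots
        self.packets = []          # list of (arrival_slot, size)
        self.dropped = 0

    def enqueue(self, slot, size):
        self.packets.append((slot, size))

    def expire(self, current_slot):
        """Drop packets that have waited longer than the deadline."""
        fresh = [(t, s) for (t, s) in self.packets
                 if current_slot - t <= self.deadline]
        self.dropped += len(self.packets) - len(fresh)
        self.packets = fresh
```

The allocation algorithms then try to assign enough PRBs that packets are served before `expire` would drop them.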
At each slot, k_n PRBs are assigned to node n, under the constraint that the total number of
assigned PRBs does not exceed K. In addition, the channels between the BS and the nodes are
assumed to be block fading: the channel gain remains constant within each slot but varies
independently between slots. In this work, the channel gain h_n(t) between the BS and node n
at slot t is modeled as a Rayleigh fading channel gain, and there is only one traffic flow
between a node and the BS.
The total end-to-end delay of slice m includes the transmission delay, propagation delay,
queueing delay and processing delay. The BS is assumed to support 5G communication and to
be equipped with powerful servers, so the propagation delay and processing delay can be
ignored.
In this paper, delay tolerance is used to measure the reliability of the network. It reflects
the minimum requirement for data to meet the end-to-end delay deadline, which ensures the
normal operation of the system. In order to buffer the incoming traffic data to be sent and
to measure the queueing delay, a queue is maintained for each node.
The objective is to minimize the number of PRBs used while keeping slices isolated from each
other and meeting the delay tolerance constraints. Reducing PRB usage improves resource
utilization; in addition, the saved PRBs can allow more nodes to access the network or be
scheduled for other tasks.
1. They combine CNN, Bi-LSTM and an attention mechanism to design the traffic prediction algorithm CBL-A.
B. Attention Mechanism:
Without an attention mechanism, all features are given the same weight, so the neural
network needs more training time to learn which features matter. In this work, the
Squeeze-and-Excitation (SE) block from [26] is used as the attention mechanism. The SE
block consists of two main parts: squeeze and excitation.
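A minimal NumPy sketch of the SE block's two stages, assuming a (channels, timesteps) feature map and a two-layer bottleneck (the weight shapes and reduction ratio are illustrative, not taken from [26]):

```python
import numpy as np

def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation over a (channels, timesteps) feature map.
    Squeeze: global average pooling per channel.
    Excitation: two FC layers (ReLU then sigmoid) yield per-channel weights."""
    squeezed = features.mean(axis=1)                   # (C,) channel summary
    hidden = np.maximum(0.0, w1 @ squeezed + b1)       # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # sigmoid in (0, 1)
    return features * scale[:, None]                   # reweight channels
```

Because the sigmoid outputs lie in (0, 1), the block can only attenuate less-informative channels relative to the important ones, which is what lets the downstream network focus its capacity.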
LSTM is widely used in time-series forecasting because of its gate structure, which controls
the time scale of information flow and effectively alleviates the vanishing and exploding
gradient problems of traditional recurrent neural networks (RNNs). There are three gates in
an LSTM cell.
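The three gates can be illustrated with a single NumPy LSTM step (the textbook formulation, not the paper's exact network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. The stacked rows of W, U, b hold the forget, input and
    output gates plus the candidate cell update; the gates control how long
    information is retained, which is what mitigates vanishing/exploding
    gradients."""
    z = W @ x + U @ h + b          # (4H,) pre-activations
    H = h.shape[0]
    f = sigmoid(z[0:H])            # forget gate: how much old state to keep
    i = sigmoid(z[H:2*H])          # input gate: how much new input to admit
    o = sigmoid(z[2*H:3*H])        # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])        # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Because the cell state `c` is updated additively rather than through repeated matrix multiplication, gradients can flow over many slots of traffic history.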
2. Aiming at the optimization goal of minimizing the number of PRBs used while meeting the
delay tolerance constraints, a hierarchical resource allocation algorithm is proposed that
combines a deep reinforcement learning (DRL) algorithm with a heuristic algorithm. A
hierarchical architecture is a common way to reduce algorithmic complexity in dynamic
resource allocation.
Deep Q-Networks (DQN), a deep reinforcement learning method, are commonly used for
discrete action scheduling and optimize the policy through interaction between the agent
and the environment. A DQN agent contains two networks, the evaluation network and the
target network. The two networks have the same structure, and the parameters of the
evaluation network are copied to the target network every few steps; this two-network
design makes training iterate in a more stable direction. Besides, an ε-greedy strategy is
used to select actions in DQN. Dueling network architectures for DRL adopt an advantage
function, which makes Dueling DQN more accurate when estimating Q-values, so Dueling DQN
can learn faster than DQN.
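The dueling aggregation step can be sketched as follows (this is the standard formulation; the paper's network layout is not detailed here):

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean makes V and A identifiable, so the state value is
    learned separately from the per-action advantages."""
    return value + advantages - advantages.mean()

q = dueling_q_values(2.0, np.array([0.5, -0.5, 1.0]))
```

Note that the subtraction leaves the action ranking untouched (it depends only on the advantage stream), while the mean of Q over actions recovers the state value V.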
The size of the action space equals the number of ways of allocating between 0 and K PRBs
across all nodes. As the number of nodes increases, the action space grows rapidly, which is
detrimental to DRL training: it increases the training time and difficulty and can even
prevent convergence. Therefore, a heuristic algorithm called the PRBs scheduling policy (PSP)
is proposed and combined with D3QN to effectively reduce the size of the action space.
D3QN is responsible for allocating PRBs to each slice, and the PSP then distributes each
slice's PRBs to its nodes.
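Since the PSP's internal rules are not spelled out here, the hierarchical split can be illustrated with a hypothetical greedy policy: the D3QN agent would pick only the per-slice budget `slice_prbs`, and a PSP-like heuristic hands the blocks to nodes inside the slice (the function name and the one-PRB-drains-one-backlog-unit assumption are illustrative):

```python
def psp_distribute(slice_prbs, node_backlogs):
    """Hypothetical sketch of a PRB scheduling policy (PSP): within one
    slice, hand out the slice's PRB budget one block at a time to the node
    with the largest remaining backlog. The DRL agent only chooses
    slice_prbs, so its action space no longer grows with the node count."""
    remaining = list(node_backlogs)
    alloc = [0] * len(node_backlogs)
    for _ in range(slice_prbs):
        n = max(range(len(remaining)), key=lambda i: remaining[i])
        if remaining[n] <= 0:
            break                  # nothing left to send in this slice
        alloc[n] += 1
        remaining[n] -= 1          # assume one PRB drains one backlog unit
    return alloc
```

The key design point is the factorization: the exponential per-node action space is replaced by a small per-slice action for the agent plus a cheap deterministic rule per slice.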
3. They propose a dynamic allocation algorithm that simultaneously allocates PRBs and power to
minimize the weighted sum of PRB usage and power consumption, achieving a balance between
resource utilization and power consumption.
The proposed algorithm minimizes the weighted sum of the PRBs and power allocated to
each node n, under the delay tolerance constraints.
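Under the description above, the joint objective can be written, with assumed notation (weight ω, PRBs k_n, power p_n, per-node delay d_n, slice deadline D_m^max), as something like:

```latex
\min_{\{k_n,\ p_n\}} \; \sum_{n=1}^{N} \bigl( \omega\, k_n + (1-\omega)\, p_n \bigr)
\quad \text{s.t.} \quad d_n \le D_m^{\max} \ \ \forall n \in \text{slice } m,
\qquad \sum_{n=1}^{N} k_n \le K .
```

This is a plausible formalization consistent with the text, not the paper's exact formulation.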
Simulation Results
Simulation Results and Analysis for Traffic Prediction:
As the figures and tables show, the prediction algorithm proposed in this paper has the
smallest prediction error and the best overall prediction performance, reducing the RMSE
by up to 3%.
It can also be seen that the algorithm follows the expected trend: when the bandwidth of
each PRB is large, PRB usage is low, and when the bandwidth is small, PRB usage is high.
BDQ is compared only with MDQN. Fig. 8 shows that when the weight is large, i.e., power
usage dominates the objective, the resulting power values are generally low; and when the
weight is small, i.e., PRB usage dominates, the PRB usage is small. This demonstrates that
the proposed algorithm can balance resource utilization and power consumption through a
weight that can be adjusted to the scenario.
Conclusion
The traffic prediction algorithm CBL-A is composed of CNN, attention mechanism and Bi-
LSTM.
A two-layer dynamic resource allocation algorithm is proposed based on the traffic
prediction results. The first layer uses D3QN to allocate PRBs for each slice, and the second
layer uses a heuristic algorithm to allocate PRBs for nodes, to minimize the usage of PRBs.
A dynamic resource allocation algorithm based on branch structure is proposed, which
divides PRBs and power allocation into different branches, reducing the complexity of the
action space, to minimize the weighted sum of PRBs usage and power.
Simulation results indicate that the traffic prediction algorithm achieves higher accuracy
than the baseline algorithms. The D3QN-PSP algorithm leads to higher resource utilization
and faster convergence. The BDQ algorithm can adapt to the dynamic resource allocation
problem under a large action space and realize the balance between resource utilization
and power consumption.
In addition to the proposed algorithms, DRL methods that support continuous action spaces
are also a potential way to solve the optimization problem, e.g., designing resource
allocation algorithms using Proximal Policy Optimization.