2022 IEEE Wireless Communications and Networking Conference (WCNC)

Reinforcement Learning-Based Resource Allocation for M2M Communications over Cellular Networks
Sree Krishna Das∗, Md. Siddikur Rahman†, Lina Mohjazi‡, Muhammad Ali Imran‡, and Khaled M. Rabie⋆
2022 IEEE Wireless Communications and Networking Conference (WCNC) | 978-1-6654-4266-4/22/$31.00 ©2022 IEEE | DOI: 10.1109/WCNC51071.2022.9771998

∗ Dept. of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology, Bangladesh
† Dept. of Electrical and Electronic Engineering, American International University-Bangladesh, Bangladesh
‡ James Watt School of Engineering, University of Glasgow, UK
⋆ Department of Engineering, Manchester Metropolitan University, UK
Email: [email protected], [email protected], [email protected], [email protected],
[email protected]

Abstract—Spectrum efficiency can be greatly enhanced by deploying machine-to-machine (M2M) communications over cellular networks. Existing resource allocation approaches allocate the maximum number of resource blocks (RBs) to cellular user equipments (CUEs). However, M2M user equipments (MUEs) share the same frequency among themselves within the same tier. This generates co-tier interference, which may deteriorate the MUEs' quality-of-service (QoS). To tackle this problem and improve the user experience, we propose a novel resource utilization policy that exploits a reinforcement learning (RL) algorithm built on the pointer network (PN). In particular, we design an optimization problem that determines the optimal frequency and power allocation needed to maximize the achievable rate of all M2M pairs and CUEs in the network, subject to co-tier interference and QoS constraints. The proposed scheme enables the user equipment (UE) to autonomously select an available channel and optimal power to maximize the network capacity and spectrum efficiency while minimizing co-tier interference. Moreover, the proposed scheme is compared with traditional spectrum allocation schemes. Simulation results demonstrate the superiority of the proposed scheme over the traditional schemes. Finally, the convergence of the proposed scheme, which reduces the computational complexity (CC), is investigated.

Index Terms—M2M communications, resource allocation, throughput, pointer network, reinforcement learning.

I. INTRODUCTION

By 2020, 4 billion devices are linked over 25 billion embedded intelligent systems creating 50 trillion GB of data [1]. Following these figures, the internet of things (IoT), in particular wireless IoT, is a potential candidate for the future smart world. Its vast adoption puts forward several technical challenges, which include network design and storage architecture for smart devices, effective information transmission protocols, proactive IoT device identification, malicious attack prevention, technology standardization, and appliance interfaces. As a result, M2M communications is deemed a promising paradigm for addressing these issues and offering efficient operation of beyond fifth generation (B5G) and sixth generation (6G) cellular networks. Besides, unlike conventional communications, M2M communications involve direct link transmission with the evolved node B (eNB), resulting in many benefits, such as enhancing the user data rate, decreasing power consumption, and reducing end-to-end (E2E) latency [2]. For MUEs coexisting with CUEs, there are two deployment modes: (i) the overlaying mode and (ii) the underlaying mode. In the underlaying mode, CUEs and MUEs share the same radio resources and therefore interfere with each other. In the overlaying mode, dedicated spectrum resources are allocated without creating cross-tier interference. However, since a high number of users exist in the wireless network, the radio resources are usually inadequate [3], [4]. To fully exploit the potential of underlaid M2M communications, it is essential to provide the proper power for each UE through a power control scheme and to design an efficient machine learning (ML)-based resource utilization policy that mitigates the co-tier and cross-tier interference between MUEs and CUEs.

A. Related Works

In recent years, stochastic optimization and robust optimization methods have received a great deal of attention because of the need to handle the unpredictable value of the channel state information (CSI) in M2M communications. The study in [4] addressed CSI by maximizing the predicted link capacity through stochastic optimization. In addition, accurate CSI is often not available or may require high feedback rates, which makes the channel condition uncertain. The works in [2], [5] studied how to maximize the resource efficiency (RE) in M2M communications by using joint power control and spectrum allocation to improve the user's data rate and prolong the battery lifetime of the UE by facilitating the reuse of radio resources between MUEs and CUEs. In real-time signal transmission, the size and shape parameters of the uncertainty set fluctuate with the channel conditions [6], [7]. That is, CSI changes frequently due to the high mobility of CUEs and MUEs. However, the preceding works emphasize M2M communications with ideal CSI [3]. Therefore, it is critical to investigate how to meet the increasing demand for higher transmission rates in M2M communications.


B. Motivation

Next-generation wireless networks will generate a tremendous amount of data related to network statistics, such as user traffic, channel occupancy, and channel quality. This will induce unmanageable overhead that largely increases the delay, computation, and energy consumption of network elements [8]. A neural network (NN) can leverage this data to develop automated and fine-grained schemes to optimize network radio resources. When integrated with an NN, the multipurpose pointer network (PN) can predict sequences over variable-length input dictionaries, improving the sequences with the assistance of input attention and generalizing to variable-size output dictionaries. However, the NN-based PN is not taken into account in [7], and interference is not taken into account in [4]. Furthermore, resource allocation techniques based on conventional optimization theory, such as multidimensional knapsack problems (MKP), greedy algorithms, and heuristic algorithms [6], [7], are generally highly complex and not feasible for real-time applications. Reinforcement learning (RL) is deemed a promising approach for solving cellular communication problems, especially spectrum allocation, data offloading, adaptive modulation, power control, and interference mitigation, more efficiently than supervised and unsupervised learning algorithms. However, RL algorithms exhibit low convergence speed and low overall efficiency when working with large state–action spaces in complex networks. Moreover, in combined resource sharing and power control schemes, RL is unable to manage large action and state spaces. Therefore, this paper considers an RL-based PN for solving problems with multi-dimensional state spaces and complex discrete action spaces, and proposes a joint power and spectrum allocation algorithm. The key contributions of this paper can be summarized as follows.

• This paper proposes an RL-based resource utilization policy for M2M communications over cellular networks.
• We formulate a mixed-integer non-linear programming problem, which is NP-hard. Then, we assign orthogonal subfrequencies to different MUEs and CUEs to maximize the sum rate of the network. Moreover, the M2M pairs are permitted to reuse resources, to make better use of the scarce resources, when the co-tier interference is below the threshold interference. Besides, MUEs can choose a number of channels and the proper power to transmit services as soon as possible without affecting the traditional communication of CUEs.
• This paper considers an RL-empowered PN architecture, which is based on a low-complexity process, to effectively utilize the spectrum resources. Besides, the PN is a prominent type of NN that efficiently solves combinatorial optimization problems in M2M communications. Furthermore, this method achieves the goal of significantly improving the QoS of the system, such as optimizing system capacity and simultaneously reducing interference.
• Extensive simulations are carried out to evaluate the resource utilization policy of the proposed scheme. It is also found that the proposed scheme reduces the computational complexity (CC). Also, the proposed scheme provides better network performance than the traditional schemes.

The rest of the paper is organized as follows. Section II illustrates the system model. We put forward the radio resource utilization policy and develop the RL algorithm with low CC in Section III. Section IV presents simulation results, followed by conclusions in Section V.

II. PROPOSED NETWORK MODEL

A. System Model

The system model considers the downlink data transfer scenario in which the eNB is located at the center of the cell. An M2M pair comprises two devices that are able to transmit data directly without the help of the eNB. On the other hand, the CUEs are mobile terminals that can only be connected via the eNB. There are $M$ M2M pairs and $N$ CUEs deployed randomly in the coverage area of the eNB, as shown in Fig. 1. UEs operate in orthogonal frequency bands following the orthogonal frequency division multiple access (OFDMA) technique. In this case, MUEs do not share the spectrum with CUEs. Because resource blocks (RBs) are scarce, the MUEs share the same frequency among themselves, which causes co-tier interference between MUEs. Moreover, we assume that the complete CSI is accessible [9]. In other words, for simplicity, the eNB is capable of obtaining the full CSI between CUEs and M2M pairs.

Fig. 1. Proposed system model of M2M communications over cellular networks.

B. Performance Metrics

As OFDMA is incorporated for CUE and MUE transmissions, MUE receivers are subject to the interference caused only by other MUE transmitters that reuse the same frequency. In this sense, the co-tier interference for the MUE receiver at subfrequency $k$ is given as follows

$$I_k = \sum_{m=1}^{M} v^{*}_{m,k}\, p^{*}_{m,k}\, h^{*}_{m,k}, \tag{1}$$

where $v^{*}_{m,k} \in \{0,1\}$ designates whether subfrequency $k$ is allocated to M2M pair $m$.

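The co-tier interference in (1) is a masked sum over the M2M pairs, which can be sketched in a few lines of Python with NumPy. The function name `co_tier_interference`, the array layout (rows are M2M pairs, columns are subfrequencies), and the toy numbers are illustrative assumptions, not part of the original simulation code.

```python
import numpy as np


def co_tier_interference(v, p, h):
    """Evaluate Eq. (1): I_k = sum_m v[m, k] * p[m, k] * h[m, k].

    v, p, h are (M, K) arrays holding the reuse indicators, transmit
    powers and channel gains of the M2M pairs (assumed layout).
    Returns a length-K vector with one interference value per subfrequency.
    """
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    h = np.asarray(h, dtype=float)
    if not (v.shape == p.shape == h.shape):
        raise ValueError("v, p and h must share the shape (M, K)")
    return (v * p * h).sum(axis=0)


# Toy example: M = 3 M2M pairs, K = 2 subfrequencies (made-up numbers).
v = np.array([[1, 0], [1, 1], [0, 1]])
p = np.full((3, 2), 0.5)
h = np.array([[0.2, 0.1], [0.4, 0.3], [0.6, 0.8]])
I = co_tier_interference(v, p, h)
```

Only the pairs whose indicator is 1 on a subfrequency contribute to the sum for that subfrequency, which is why the element-wise product with `v` is enough.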

If M2M pair $m$ reuses subfrequency $k$, $v^{*}_{m,k}$ is set to 1; otherwise it is set to 0. Also, $p^{*}_{m,k}$ denotes the transmit power, while $h^{*}_{m,k}$ is the channel gain on subfrequency $k$ between the MUE receiver and the other MUE transmitters. The channel gain includes the 3rd generation partnership project (3GPP) path loss (PL) model, and the channel fading on both M2M pairs and CUEs follows the Rayleigh distribution with uniform variance. The signal to interference-plus-noise ratio (SINR) expressions for CUE $n$ and M2M pair $m$ at subfrequency $k$ can be written as

$$\gamma_n^k = \frac{p_n^k h_n^k}{B_n N_0}, \tag{2}$$

and

$$\gamma_m^k = \frac{p_m^k h_m^k}{I_k + B_m N_0}, \tag{3}$$

respectively, where $h_n^k$ and $h_m^k$ represent the channel power gains on subfrequency $k$ of the link between the eNB and CUE $n$ and of M2M pair $m$, respectively, and $N_0$ is the noise power. Besides, $B_n$ and $B_m$ denote the allocated spectrum resources of the CUE and the M2M pair, respectively. Moreover, $p_n^k$ and $p_m^k$ denote the transmit power of the $n$th CUE and the $m$th M2M pair, respectively. Therefore, the total achievable data rates of the CUEs and the M2M pairs can be stated as

$$R_n = \sum_{n=1}^{N} \sum_{k=1}^{K} B_n w_n^k \log_2\left(1 + \gamma_n^k\right), \tag{4}$$

and

$$R_m = \sum_{m=1}^{M} \sum_{k=1}^{K} B_m v_m^k \log_2\left(1 + \gamma_m^k\right), \tag{5}$$

respectively, where $w_n^k$ and $v_m^k$ denote whether or not subfrequency $k$ is assigned to the $n$th CUE and the $m$th M2M pair, respectively.

III. RL-BASED RESOURCE ALLOCATION POLICY

A. Overview

RL-based resource optimization: This subsection first formulates an optimization problem that maximizes the proposed network performance in terms of spectrum efficiency (SE) [9]. In addition, an RL-enabled lower-CC algorithm is presented to address the resource scarcity.

Problem formulation: Since the M2M pairs dispatch information opportunistically [7], a proper power control system requires a significant amount of control signals. We assume that an M2M pair consumes constant power to transmit data during its operation. In other words, the transmit power of an MUE allocated at each subfrequency is given by

$$p_m^k = \begin{cases} 0, & \text{if the M2M pair is inoperative at } k \\ P, & \text{if the M2M pair is operative at } k \end{cases} \tag{6}$$

We designate $S_k = \{m \mid v_m^k = 1\}$ as the set of working M2M pairs at subfrequency $k$. To maximize the system performance of both MUEs and CUEs, the optimization problem is expressed as

$$\text{OP}_1: \max_{\{W, V, P\}} \{R_m + R_n\} \tag{7}$$

subject to

$$\sum_{k=1}^{K} \sum_{n=1}^{N} w_n^k \le 1, \quad \forall n \in N, \forall k \in K \tag{8}$$

$$w_n^k \in \{0, 1\}, \quad \forall k, n, \tag{9}$$

$$v_m^k \in \{0, 1\}, \quad \forall k, m, \tag{10}$$

$$0 \le \sum_{n=1}^{N} \sum_{k=1}^{K} p_n^k \le P^{\max}, \tag{11}$$

$$\gamma_n^k \ge \gamma_{th}, \quad \forall k, n, \tag{12}$$

$$\gamma_m^k \ge \gamma_{th}, \quad \forall k, \forall m \in S_k, \tag{13}$$

where $W = [w_n^k]_{N \times K}$ is the $N$ by $K$ subfrequency allocation matrix for CUEs, $V = [v_m^k]_{M \times K}$ is the $M$ by $K$ subfrequency allocation matrix for M2M pairs, and $P = [p_n^k]_{N \times K}$ is the matrix of the transmission power of CUEs at all subfrequencies. Constraints (8) and (9) ensure that at most one CUE is allocated to each subfrequency. Moreover, constraint (11) limits the UE's transmit power. On the other hand, constraints (12) and (13) are the SINR constraints for the CUEs and MUEs, respectively. Here, $\gamma_{th}$ is the minimum SINR required to guarantee the QoS requirements of the CUEs and M2M pairs.

B. Proposed Low Complexity Algorithm

From (7), it can be seen that OP$_1$ is a mixed-integer non-linear programming problem, which is generally NP-hard [9]. In particular, in contrast to existing ML techniques, the proposed RL technique can solve complex network resource optimization problems and take judicious control decisions with only limited information about the network statistics. Consequently, we decompose OP$_1$ into two sub-problems, namely OP$_2$ and OP$_3$, and develop a lower-CC algorithm to solve them. The proposed RL-based resource allocation policy first performs the allocation for CUEs within the constraints of the eNB transmission power and orthogonal subfrequencies, and after that the allocation for M2M pairs under the threshold interference.

1) Spectrum allocation for each CUE: The first sub-problem (OP$_2$) seeks to optimize the overall performance of each CUE by presuming that orthogonal subfrequencies do not suffer from co-tier and cross-tier interference resulting from M2M pairs, i.e.

$$\text{OP}_2: \max_{\{W, P\}} R_n, \tag{14}$$

s.t. (8), (9), and (11).

We can maximize the transmission rate of CUEs on each subfrequency to acquire the maximum throughput of all CUEs. Hence, the maximum SINR constraint is achieved for CUEs. In addition, the CUE with the leading channel gain is allowed to transfer data on each subfrequency. Furthermore, the transmission power of this CUE, denoted by $n^*$, is managed with open-loop power control.

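The SINR and rate expressions in (2) to (5) can be evaluated directly once the allocation matrices are known. The following is a minimal sketch under stated assumptions: the helper names (`sinr_cue`, `sinr_m2m`, `sum_rate`) are hypothetical, all quantities are in linear scale, and the toy values are normalized ($B = 1$, $N_0 = 1$) purely for illustration.

```python
import numpy as np


def sinr_cue(p_n, h_n, B_n, N0):
    """Eq. (2): SINR of a CUE, which sees no interference from M2M pairs."""
    return p_n * h_n / (B_n * N0)


def sinr_m2m(p_m, h_m, I_k, B_m, N0):
    """Eq. (3): SINR of an M2M pair under co-tier interference I_k."""
    return p_m * h_m / (I_k + B_m * N0)


def sum_rate(B, alloc, sinr):
    """Eqs. (4)/(5): sum over users and subfrequencies of B * alloc * log2(1 + SINR).

    alloc is the 0/1 allocation matrix (W for CUEs, V for M2M pairs).
    """
    return float(np.sum(B * alloc * np.log2(1.0 + sinr)))


# Toy example with N = 2 CUEs and K = 2 subfrequencies (normalized units).
w = np.eye(2)                      # CUE n uses subfrequency n only
h = np.array([[3.0, 1.0], [1.0, 7.0]])
gamma_n = sinr_cue(np.ones((2, 2)), h, 1.0, 1.0)
R_n = sum_rate(1.0, w, gamma_n)    # log2(1 + 3) + log2(1 + 7)
```

Because the allocation matrix multiplies the logarithm element-wise, unassigned user and subfrequency combinations contribute nothing to the sum, exactly as in (4) and (5).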

The open-loop power control technique mitigates the interference at the M2M receivers by choosing the transmission power of CUE $n^*$ such that

$$p_{n^*}^k \ge \frac{\gamma_{th}\left(I_{\max}^k + N_0\right)}{h_{n^*}^k}, \tag{15}$$

where $I_{\max}^k$ is the highest permitted interference at each subfrequency.

2) Spectrum allocation for each MUE: After finding the allocation for each CUE, the subfrequency assignment for M2M pairs can be given as

$$\text{OP}_3: \max_{\{V\}} R_m, \tag{16}$$

s.t. (10), (12), and (13).

From equation (2), (12) is rewritten as follows

$$\min_{n} \left\{ \frac{p_n^k h_n^k}{\gamma_{th}} - N_0 B_n \right\} \ge 0. \tag{17}$$

Similarly, (13) is rewritten as

$$\sum_{m=1}^{M} v^{*}_{m,k}\, p^{*}_{m,k}\, h^{*}_{m,k} \le \min_{m \in S_k} \left\{ \frac{P h_m^k}{\gamma_{th}} - N_0 B_m + v_m^k P h_m^k \right\}. \tag{18}$$

After the alteration of (12) and (13), OP$_3$ is transformed into a two-dimensional (2D) knapsack problem (KP), that is, a KP whose weights have two dimensions. It is applied to achieve the optimal solution for resource allocation in M2M communications under power and threshold interference constraints [9]. This type of problem is a part of the MKP [9], and only water-filling algorithms can be applied to address this type of NP-hard problem. However, efficient water-filling algorithms require high computation time, making them unsuitable for real-time applications [5]. In this article, the PN is employed to handle the combinatorial optimization problem.

C. Proposed Algorithm Representation

1) PN architecture: The RL-based optimal resource allocation scheme is given in Algorithm 1. As above, a 2D KP is a resource optimization problem for each subfrequency $k$. Given the CUEs' resource distribution state, each M2M pair is described by a three-dimensional feature vector $(v, x, y)$, where $v$ is the achievable data rate of the M2M pair according to (5), and $x$ and $y$ are the weights on the 2D KP constraints, which may be obtained from (17) and (18), respectively. The PN is a special type of recurrent NN with distinct encoder and decoder components. Because the PN is built on the sequence-to-sequence model, its input should be a sequence of these feature vectors. The output is also a sequence, obtained from the PN through the pointing mechanism, which re-arranges the input. The output is a collection of valid entities that meet the requirements. In particular, we traverse the solution sequence and terminate when the obtained entities exceed the limits of (17) and (18). The identified entities are the solution to the KP. We name it solution $o$ and designate $V(o)$ as the total value of the corresponding set of entities. The details of the PN framework are provided in [9].

Algorithm 1: RL-based resource utilization policy
  Step 1: Spectrum distribution for CUEs
  Set $n = [1, 2, ..., N]$, $m = [1, 2, ..., M]$ and the total number of simulation runs
  Initialize: $R_m = 0$, $R_n = 0$ and $p_n^k = P^{\max}/N$
  for $k = [1, 2, ..., K]$ do
    Obtain $n^*$ on subfrequency $k$
    Update the optimal transmission power of each CUE according to (15)
    Update the data rate of each CUE according to (17)
  end
  Step 2: Spectrum distribution for MUEs
  Assign the training set $S$, number of training phases $T$, batch size $Q$ and PN parameter $P$
  for $t = [1, 2, ..., T]$ do
    Choose a batch of samples $s_b$ for $b \in [1, 2, ..., Q]$
    Sample a solution $o_b$ based on $\theta_p(\cdot|s_b)$ for $b \in [1, 2, ..., Q]$
    Calculate the value $V(o_b|s_b)$
    Update the PG ($g_p$) according to (22)
    Update the parameter of the PN according to (23)
    Update the baseline function $q(s_b) = q(s_b) + \alpha(V(o_b|s_b) - q(s_b))$ for $b \in [1, 2, ..., Q]$
  end
  for $k = [1, 2, ..., K]$ do
    Utilize the PN to calculate $v_m^k$ for the $m$th MUE
    Update the data rate of each MUE
  end

2) RL algorithm: RL is considered a proper method for training an NN to solve combinatorial optimization problems. Our proposed low-complexity policy-based RL aims to optimize the parameters of the PN, which are denoted by $P$. The training objective, i.e., the expected value for an input graph $s$, is given below [10]

$$J(P|s) = \mathbb{E}_{o \sim \theta_p(\cdot|s)} V(o|s). \tag{19}$$

During training, the graphs are drawn from a distribution $S$, so the overall training objective includes sampling from the graph distribution and is written as follows

$$J(P) = \mathbb{E}_{s \sim S} J(P|s). \tag{20}$$

To optimize the parameters, we resort to policy gradient (PG) techniques and stochastic gradient descent. Using the RL algorithm, the gradient of (20) can be expressed as [10]

$$\nabla_p J(P|s) = \mathbb{E}_{o \sim \theta_p(\cdot|s)} \left[ \left(V(o|s) - q(s)\right) \nabla_p \log \theta_p(o|s) \right], \tag{21}$$

where $q(s)$ denotes the baseline function in the training procedure, which does not depend on the sampled solution and estimates the expected value in order to reduce the variance of the gradients [10]. The popular RL algorithm is used to extract this gradient.

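Step 1 of Algorithm 1 can be sketched as follows: on every subfrequency the CUE with the largest channel gain is selected, and its power is set to the lower bound of (15). The function name `allocate_cues`, the shared scalar `i_max` for all subfrequencies, and the feasibility check against a total budget `p_max` in the spirit of (11) are assumptions made for illustration; the original simulation code is not available.

```python
import numpy as np


def allocate_cues(h, gamma_th, i_max, n0, p_max):
    """Pick the best CUE per subfrequency and its minimum power from Eq. (15).

    h        : (N, K) array of CUE channel gains (linear scale)
    gamma_th : SINR threshold (linear scale)
    i_max    : highest permitted interference (assumed equal on all k)
    n0       : noise power
    p_max    : total transmit power budget (assumption, cf. constraint (11))
    Returns (n_star, p_min, feasible).
    """
    h = np.asarray(h, dtype=float)
    n_star = h.argmax(axis=0)                   # leading channel gain per k
    h_best = h[n_star, np.arange(h.shape[1])]   # gain of the selected CUE
    p_min = gamma_th * (i_max + n0) / h_best    # right-hand side of Eq. (15)
    feasible = bool(p_min.sum() <= p_max)       # total power within budget?
    return n_star, p_min, feasible


# Toy example: N = 2 CUEs, K = 2 subfrequencies (made-up gains).
h = np.array([[1.0, 4.0], [2.0, 1.0]])
n_star, p_min, feasible = allocate_cues(h, gamma_th=2.0, i_max=1.0, n0=1.0, p_max=5.0)
```

The loop over $k$ in Algorithm 1 is replaced here by a vectorized `argmax`, so the cost stays linear in the number of gain entries.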

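The decoding rule described for the 2D KP (traverse the ordered sequence and stop once a limit would be exceeded) can be illustrated without a trained PN. In this sketch the ordering is simply passed in as a list of indices, standing in for the PN output; `decode_solution`, `cap_x`, and `cap_y` are hypothetical names, with the two capacities playing the role of the limits derived from (17) and (18).

```python
from typing import List, Sequence, Tuple

Item = Tuple[float, float, float]  # (v, x, y): value and the two KP weights


def decode_solution(items: Sequence[Item], order: Sequence[int],
                    cap_x: float, cap_y: float) -> Tuple[List[int], float]:
    """Walk through `order` and keep items until a capacity would be exceeded.

    Returns the chosen indices (the solution o) and their total value V(o).
    """
    chosen: List[int] = []
    used_x = used_y = value = 0.0
    for idx in order:
        v, x, y = items[idx]
        if used_x + x > cap_x or used_y + y > cap_y:
            break  # terminate at the first entity that violates a limit
        chosen.append(idx)
        used_x += x
        used_y += y
        value += v
    return chosen, value


# Toy example: three M2M pairs described by (v, x, y), made-up numbers.
items = [(6.0, 2.0, 1.0), (5.0, 1.0, 2.0), (4.0, 3.0, 3.0)]
chosen, total_value = decode_solution(items, order=[0, 1, 2], cap_x=3.0, cap_y=3.0)
```

A different ordering of the same items generally yields a different value $V(o)$, which is the signal the RL training uses to improve the ordering produced by the PN.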
The network parameters are then improved from the gradient using the Adam method [10]. The proposed resource allocation scheme applies the RL algorithm in a more practical way with Monte Carlo sampling. After assigning the batch size $Q$ and generating $Q$ independent and identically distributed sample KPs, the gradient is approximated by the sample mean

$$g_p = \frac{1}{Q} \sum_{b=1}^{Q} \left(V(o_b|s_b) - q(s_b)\right) \nabla_p \log \theta_p(o_b|s_b), \tag{22}$$

$$P = \text{ADAM}(P, g_p). \tag{23}$$

The baseline function is an exponential moving average of the rewards achieved by the network over time, which accounts for the fact that the policy improves with training.

3) CC analysis: Here we analyze the CC of the proposed scheme. According to Step 1, the CC of Algorithm 1 is $O(N)$. The long short-term memory (LSTM) units with attention are the elementary modules of the PN in Step 2. The CC of a trained PN is $O(M^2)$, since the attention calculation is done $M$ times at every generation step [9], [10]. Thus, the entire CC of the proposed scheme is $O(M^2 + N)$.

A model-free policy-based RL algorithm optimizes the parameters without knowledge of the environment. This algorithm measures the time of model inference, that is, the step in which the trained model makes decisions that minimize the error on training samples while keeping the bound on its model complexity small. Thus, the computation time of the proposed scheme is shorter than that of existing ML schemes, as analyzed in Fig. 4.

IV. SIMULATION RESULTS AND ANALYSIS

A. Simulation Setup

The MATLAB-based simulation results for the proposed scheme are presented in this section. The CUEs and M2M pairs are randomly placed in the network. For both CUEs and M2M pairs, the channel fading follows the Rayleigh distribution with uniform variance. Table I lists the main parameters of the simulation in detail. We compare the proposed algorithm with two traditional schemes, namely the M2M mode and the cellular mode. In the M2M mode, the M2M pairs communicate with each other without using learning-based resource utilization [5]. In the cellular mode, the CUE communicates without using learning-based resource utilization in the proposed network [5]. Moreover, the proposed mechanism selects the suitable communication mode using the RL-empowered PN-based resource utilization policy without considering the UE selection.

TABLE I
THE PARAMETERS OF SIMULATION

| Parameter | Value |
| Cell radius | 250 m |
| Carrier frequency | 2 GHz |
| System bandwidth | 6 MHz |
| Per RB bandwidth | 200 kHz |
| Number of M2M pairs | 30 |
| Number of CUEs | 10 |
| M2M communication distance | 50 m |
| Maximum transmission power [2] | 30 dBm |
| Noise power | −174 dBm/Hz |
| PL model between eNB and CUE [2] | 128.1 + 37.6 log(d[km]) |
| PL model between M2M pair [2] | 148 + 40 log(d[km]) |
| Threshold SINR | 10 dB |
| Training samples | 2000 |
| Testing samples | 500 |
| Batch size | 64 |
| Hidden layer | 64 |

B. The Throughput Comparison of M2M Pairs

Fig. 2 shows the average throughput versus the number of M2M pairs. The average throughput achieved by the proposed scheme rises as the number of M2M pairs rises. However, the rise is not remarkable due to the interference limitation in the cellular mode. From the figure, it can be observed that the proposed approach achieves a significant improvement in the network performance while mitigating the co-tier interference of the M2M links. The M2M mode maximizes the SINR of M2M links compared to traditional methods, leading to only a slightly better performance than the cellular mode.

Fig. 2. Average throughput versus number of M2M pairs.

C. The Convergence of the Proposed Mechanism

Fig. 3 shows the reward comparison of the proposed scheme, the M2M mode, and the cellular mode. As illustrated in the figure, our proposed scheme outperforms the other existing M2M distribution algorithms in the proposed network. It shows that as the number of iterations increases, the network performance of the users improves, and our proposed method is much better than the other methods. Compared to traditional resource distribution approaches, MUEs achieve reasonable system performance while co-tier interference is appropriately managed. Moreover, we can observe that the proposed scheme converges very rapidly. This is because the proposed scheme adopts the PN, which reduces the CC. The numbers of M2M pairs and CUEs are set to 30 and 10, respectively.

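The sample-mean gradient in (22) and the exponential moving average baseline from Algorithm 1 can be illustrated with a deliberately small stand-in policy. The sketch below assumes a softmax policy over a handful of discrete solutions instead of a PN, so that $\nabla_p \log \theta_p$ has a closed form (one-hot minus probabilities); `policy_gradient` and `update_baseline` are hypothetical names, and the Adam step of (23) is left out.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def policy_gradient(logits, actions, values, baselines):
    """Eq. (22) for a softmax policy: mean of (V_b - q_b) * grad log theta(o_b).

    For a softmax policy, grad_logits log theta(o) = onehot(o) - softmax(logits).
    """
    probs = softmax(np.asarray(logits, dtype=float))
    grad = np.zeros_like(probs)
    for o, v, q in zip(actions, values, baselines):
        grad_log = -probs.copy()
        grad_log[o] += 1.0
        grad += (v - q) * grad_log
    return grad / len(actions)


def update_baseline(q, v, alpha):
    """Exponential moving average baseline: q <- q + alpha * (V - q)."""
    return q + alpha * (v - q)


# Toy batch with Q = 2 samples and two candidate solutions (made-up values).
g = policy_gradient(np.zeros(2), actions=[0, 1], values=[1.0, 0.0], baselines=[0.5, 0.5])
q_new = update_baseline(0.5, 1.0, alpha=0.1)
```

In this toy batch the gradient pushes probability toward solution 0, whose value exceeds the baseline, and away from solution 1, whose value falls below it.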

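The link budget implied by Table I can be reproduced with a few helper functions. This is a sketch under stated assumptions: the logarithm in the PL models is taken to be base 10, distances are in kilometres as the table indicates, and the function names are hypothetical. Rayleigh fading is not included.

```python
import math


def pl_enb_cue_db(d_km: float) -> float:
    """Path loss between eNB and CUE from Table I, in dB (log assumed base 10)."""
    return 128.1 + 37.6 * math.log10(d_km)


def pl_m2m_db(d_km: float) -> float:
    """Path loss between the devices of an M2M pair from Table I, in dB."""
    return 148.0 + 40.0 * math.log10(d_km)


def noise_power_dbm(bandwidth_hz: float, psd_dbm_hz: float = -174.0) -> float:
    """Noise power over a bandwidth, given the noise density of Table I."""
    return psd_dbm_hz + 10.0 * math.log10(bandwidth_hz)


def snr_db(tx_dbm: float, path_loss_db: float, noise_dbm: float) -> float:
    """Interference-free SNR in dB from transmit power, path loss and noise."""
    return tx_dbm - path_loss_db - noise_dbm


# Table I values: 50 m M2M distance, 30 dBm transmit power, 200 kHz per RB.
pl = pl_m2m_db(0.05)
noise = noise_power_dbm(200e3)
snr = snr_db(30.0, pl, noise)
```

Comparing the resulting SNR with the 10 dB threshold of Table I gives a quick sanity check of how much co-tier interference an M2M link could tolerate before violating its SINR constraint.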
Fig. 3. The convergence of the proposed mechanism.

The UEs are randomly located in this figure. Additionally, if the number of UEs rises, the convergence rate gradually slows. The estimated reward decreases as the number of UEs rises, which indicates that the system performance is better when there are fewer UEs in the proposed network. The reason is that the interference generated by M2M links as a result of co-tier transmissions increases. Therefore, the estimated reward of a network with a lower number of UEs is greater than that of a network with many UEs in M2M communications.

Fig. 4. The CC comparison with the proposed scheme, Q-learning scheme and MKP scheme.

D. The Computational Complexity of the Proposed Scheme

Fig. 4 shows the expected reward versus the number of iterations. Here, $O(N(K^2 + K + P))$ is the CC of the MKP scheme, where the number of M2M pairs, the length of the MKP, and the total number of iterations are denoted by $K$, $P$, and $N$, respectively. This figure considers 80 and 30 as the values of $N$ and $K$. The RL-based Q-learning scheme for resource allocation used in [8] is also considered for comparison. From Fig. 4, it can be seen that if the number of M2M pairs is small, the Q-learning scheme has a performance close to that of the MKP scheme. As the number gets larger, the Q-learning scheme achieves better reward performance and convergence than the MKP scheme, but lower than the proposed scheme. In the case of a large state-action space, Q-learning cannot provide good efficiency due to poor reward performance and convergence. On the other hand, by introducing the PN architecture in the training process, the proposed scheme attains larger reward performance, more stable convergence, lower CC, and adequate network performance compared with the MKP and Q-learning schemes.

V. CONCLUSIONS

This paper investigated the optimal resource distribution for M2M communications coexisting with cellular networks using mixed-integer non-linear programming and a low-complexity process based on the PN. Monte Carlo simulations have been performed to evaluate the performance of the RL-based resource utilization policy. It has been found that the number of M2M pairs and the number of iterations have a substantial impact on the results. Simulation results showed that the proposed mechanism achieves better system performance in terms of average throughput compared to traditional approaches while significantly mitigating the interference. To further upgrade the spectrum allocation policy, a federated learning framework can be used, since it provides a global solution for complex network optimization problems without sharing information between eNBs, which makes it an interesting topic for further investigation.

REFERENCES

[1] F. Hussain, Internet of Things: Building Blocks and Business Models, 1st ed. Springer, Cham, 2017, pp. 73.
[2] S. K. Das and M. F. Hossain, “A Location-Aware Power Control Mechanism for Interference Mitigation in M2M Communications over Cellular Networks,” Comput. & Elect. Eng., vol. 88, pp. 1–23, Dec. 2020.
[3] X. Li, L. Ma, Y. Xu, and R. Shankaran, “Resource Allocation for D2D-Based V2X Communication With Imperfect CSI,” IEEE Internet Things J., vol. 7, no. 4, pp. 3545–3558, Apr. 2020.
[4] H. Xu et al., “Robust transmission design for multicell D2D underlaid cellular networks,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 5922–5936, Jul. 2018.
[5] S. K. Das and R. Mudi, “A Location-Aware Resource Efficiency and Energy Efficiency Optimization for M2M Communications over Downlink Cellular Systems,” IEEE 31st Annu. Int. Symp. on Pers., Indoor and Mobile Radio Commun., 2020, pp. 1–6.
[6] Y. Hao, Q. Ni, H. Li, and S. Hou, “Robust Multi-Objective Optimization for EE-SE Tradeoff in D2D Communications Underlaying Heterogeneous Networks,” IEEE Trans. Commun., vol. 66, no. 10, pp. 4936–4949, Oct. 2018.
[7] W. Wu, R. Liu, Q. Yang, and T. Q. S. Quek, “Learning-Based Robust Resource allocation for D2D Underlaying Cellular Network,” May 2021, Accessed: Jul. 03, 2021. [Online]. Available: http://arxiv.org/abs/2105.08324.
[8] D. Wang et al., “Joint resource allocation and power control for D2D communication with deep reinforcement learning in MCC,” Physical Commun., vol. 45, Apr. 2021.
[9] L. Zhu, C. Liu, J. Yuan, and G. Yu, “Machine Learning-Based Resource Optimization for D2D Communication Underlaying Networks,” IEEE 92nd Veh. Technol. Conf. (VTC2020-Fall), 2020, pp. 1–6.
[10] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural Combinatorial Optimization with Reinforcement Learning,” 5th Int. Conf. Learn. Represent. ICLR 2017 - Work. Track Proc., Nov. 2016, Accessed: Jul. 08, 2021. [Online]. Available: https://arxiv.org/abs/1611.09940v3.

Authorized licensed use limited to: McGill University. Downloaded on March 31,2024 at 16:59:09 UTC from IEEE Xplore. Restrictions apply.
1478

M2M pairs is small, then the Q-learning scheme has a close
