
2022 IEEE Wireless Communications and Networking Conference (WCNC)

Reinforcement Learning-Based Resource Allocation for M2M Communications over Cellular Networks
Sree Krishna Das∗, Md. Siddikur Rahman†, Lina Mohjazi‡, Muhammad Ali Imran‡, and Khaled M. Rabie§
∗Dept. of Electrical, Electronic and Communication Engineering, Military Institute of Science and Technology, Bangladesh
†Dept. of Electrical and Electronic Engineering, American International University-Bangladesh, Bangladesh
‡James Watt School of Engineering, University of Glasgow, UK
§Department of Engineering, Manchester Metropolitan University, UK
Email: [email protected], [email protected], [email protected], [email protected],
[email protected]

Abstract—The spectrum efficiency can be greatly enhanced by the deployment of machine-to-machine (M2M) communications through cellular networks. Existing resource allocation approaches allocate the maximum resource blocks (RBs) to cellular user equipments (CUEs). However, M2M user equipments (MUEs) share the same frequency among themselves within the same tier. This generates co-tier interference, which may deteriorate the MUEs' quality-of-service (QoS). To tackle this problem and improve the user experience, in this paper we propose a novel resource utilization policy that exploits a reinforcement learning (RL) algorithm built on the pointer network (PN). In particular, we design an optimization problem that determines the optimal frequency and power allocation needed to maximize the achievable rate of all M2M pairs and CUEs in the network, subject to co-tier interference and QoS constraints. The proposed scheme enables the user equipment (UE) to autonomously select an available channel and the optimal power to maximize the network capacity and spectrum efficiency while minimizing co-tier interference. Moreover, the proposed scheme is compared with traditional spectrum allocation schemes. Simulation results demonstrate the superiority of the proposed scheme over the traditional schemes. Furthermore, the convergence of the proposed scheme is investigated, which reduces the computational complexity (CC).

Index Terms—M2M communications, resource allocation, throughput, pointer network, reinforcement learning.

I. INTRODUCTION

By 2020, 4 billion devices were linked over 25 billion embedded intelligent systems, creating 50 trillion GB of data [1]. Following these figures, the internet of things (IoT), in particular the wireless IoT, is a potential candidate for the future smart world. Its vast adoption puts forward several technical challenges, including network design and storage architecture for smart devices, effective information transmission protocols, proactive IoT device identification, malicious attack prevention, technology standardization and appliance interfaces. As a result, machine-to-machine (M2M) communication is deemed a promising paradigm for addressing these issues and offering efficient operation of beyond fifth generation (B5G) and sixth generation (6G) cellular networks. Besides, unlike conventional communications, M2M communications involve direct link transmission between devices without relaying through the evolved node B (eNB), resulting in many benefits, such as enhancing the user data rate, decreasing power consumption, and reducing end-to-end (E2E) latency [2]. For M2M user equipments (MUEs) coexisting with cellular user equipments (CUEs), there are two types of deployment: (i) overlaying mode and (ii) underlaying mode. In the underlaying mode, CUEs and MUEs share the same radio resources and therefore interfere with each other. In the overlaying mode, dedicated spectrum resources are allocated, so no cross-tier interference is created. However, since a high number of users exist in the wireless network, the radio resources are usually inadequate [3], [4]. In order to fully exploit the potential of underlaid M2M communications, it is essential to provide the proper power for each UE by using a power control scheme and to design an efficient machine learning (ML)-based resource utilization policy that mitigates the co-tier and cross-tier interference between MUEs and CUEs.

A. Related Works

In recent years, stochastic optimization and robust optimization methods have received a great deal of attention because of the need to handle the unpredictable channel state information (CSI) in M2M communications. The study in [4] addressed CSI uncertainty by maximizing the expected link capacity through stochastic optimization. In addition, accurate CSI is often not available or may require high feedback rates, which makes the channel condition uncertain. The works in [2], [5] studied how to maximize the resource efficiency (RE) in M2M communications by using joint power control and spectrum allocation to improve the user's data rate and prolong the battery lifetime of the UE by facilitating the reuse of radio resources between MUEs and CUEs. In real-time signal transmission, the size and shape parameters of the uncertainty set fluctuate with the channel conditions [6], [7]; that is, CSI frequently changes due to the high mobility of CUEs and MUEs. However, the preceding works emphasize M2M communications with ideal CSI [3]. Therefore, it is critical to investigate how to meet the increasing demand for higher transmission rates in M2M communications.


B. Motivation

Next generation wireless networks will generate a tremendous amount of data related to network statistics, such as user traffic, channel occupancy and channel quality. This will induce unmanageable overhead that largely increases the delay, computation, and energy consumption of the network elements [8]. A neural network (NN) can leverage this data to develop automated and fine-grained schemes to optimize the network radio resources. The multipurpose pointer network (PN) can predict sequences over variable-length input dictionaries when integrated with an NN, improving the output sequences with the assistance of input attention and generalizing over variable-size output dictionaries. However, the PN based on the NN is not taken into account in [7], and interference is not taken into account in [4]. Furthermore, resource allocation techniques based on conventional optimization theory, such as multidimensional knapsack problems (MKP), greedy algorithms, and heuristic algorithms [6], [7], are generally highly complex and not feasible for real-time applications. Reinforcement learning (RL) is deemed a promising approach to solve cellular communication problems, especially spectrum allocation, data offloading, adaptive modulation, power control and interference mitigation, more efficiently than supervised and unsupervised learning algorithms. However, RL algorithms exhibit low convergence speed and low overall efficiency when working with large state-action spaces in complex networks. Moreover, in combined resource sharing and power control schemes, plain RL is unable to manage large action and state spaces. Therefore, this paper considers an RL-based PN for solving multi-dimensional state space and complex discrete action space problems and proposes a joint power and spectrum allocation algorithm. The key contributions of this paper can be summarized as follows.

• This paper proposes an RL-based resource utilization policy for M2M communications over cellular networks.
• We formulate a mixed integer non-linear programming problem, which is NP-hard. Then, we assign orthogonal subfrequencies to the different MUEs and CUEs to maximize the sum rate of the network. Moreover, the M2M pairs are permitted to reuse resources to better use the scarce spectrum when the co-tier interference is below a threshold. Besides, MUEs can choose a number of channels and proper power to transmit services as soon as possible without affecting the traditional communication of CUEs.
• This paper considers an RL-empowered PN architecture, which is based on a low-complexity process, to effectively utilize the spectrum resources. Besides, the PN is a prominent type of NN which efficiently solves combinatorial optimization problems in M2M communications. Furthermore, this method achieves the goal of significantly improving the QoS of the system, optimizing the system capacity while simultaneously reducing interference.
• Extensive simulations are carried out to evaluate the resource utilization policy of the proposed scheme. It is also found that the proposed scheme reduces the computational complexity (CC). Also, the proposed scheme provides better network performance than the traditional schemes.

The rest of the paper is organized as follows. Section II illustrates the system model. We put forward the radio resource utilization policy and develop the RL algorithm with low CC in Section III. Section IV presents simulation results, followed by conclusions in Section V.

Fig. 1. Proposed system model of M2M communications over cellular networks.

II. PROPOSED NETWORK MODEL

A. System Model

The system model considers the downlink data transfer situation where the eNB is located in the center of the cellular cell. An M2M pair comprises a couple of devices that are able to directly transmit data without the help of the eNB. On the other hand, the CUEs are mobile terminals that can only be connected via the eNB. There are M M2M pairs and N CUEs deployed randomly in the coverage area of the eNB, as shown in Fig. 1. UEs operate in orthogonal frequency bands following the orthogonal frequency division multiple access (OFDMA) technique. In this case, MUEs do not share the spectrum with CUEs. The unavailability of resource blocks (RBs) forces the MUEs to share the same frequency among themselves, which also causes co-tier interference between MUEs. Moreover, we assume that the complete CSI is accessible [9]. In other words, for simplicity, the eNB is capable of obtaining the full CSI between the CUEs and the M2M pairs.

B. Performance Metrics

As OFDMA is incorporated for CUE and MUE transmissions, MUE receivers are subject to the interference caused only by other MUE transmitters that reuse the same frequency. In this sense, the co-tier interference for the MUE receiver at subfrequency k is given as follows

I_k = \sum_{m=1}^{M} v_{m,k} p^{*}_{m,k} h^{*}_{m,k},   (1)

where v_{m,k} \in \{0,1\} designates whether subfrequency k is allocated to M2M pair m: if the m-th M2M pair reuses subfrequency k, v_{m,k} is set to 1, otherwise to 0. Also, p^{*}_{m,k} denotes the transmit power, while h^{*}_{m,k} is the channel gain on subfrequency k between the MUE receiver and the other MUE transmitters, which includes the 3rd generation partnership project (3GPP) path loss (PL) model; the channel fading of both M2M pairs and CUEs follows a Rayleigh distribution with uniform variance.
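As an illustration of (1), the following Python snippet sketches how the co-tier interference seen by an MUE receiver on one subfrequency might be evaluated; the array names, shapes and numerical values are assumptions made for the example, not part of the paper.

```python
import numpy as np

def co_tier_interference(v, p, h, k):
    """Co-tier interference on subfrequency k, following Eq. (1).

    v: (M, K) 0/1 allocation indicators v_{m,k}
    p: (M, K) MUE transmit powers p*_{m,k} (linear scale, W)
    h: (M, K) channel gains h*_{m,k} from the interfering MUE
       transmitters to the receiver of interest (linear scale)
    """
    return np.sum(v[:, k] * p[:, k] * h[:, k])

# Toy usage with M = 3 M2M pairs and K = 2 subfrequencies (assumed values).
v = np.array([[1, 0], [1, 1], [0, 1]])
p = np.full((3, 2), 1.0)                  # 1 W each, for illustration only
h = np.array([[1e-7, 2e-7], [5e-8, 1e-7], [3e-7, 4e-8]])
print(co_tier_interference(v, p, h, k=0))
```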


The signal to interference-plus-noise ratio (SINR) expressions for CUE n and M2M pair m at subfrequency k can be written as

\gamma_n^k = \frac{p_n^k h_n^k}{B_n N_0},   (2)

and

\gamma_m^k = \frac{p_m^k h_m^k}{I_k + B_m N_0},   (3)

respectively, where h_n^k and h_m^k represent the channel power gains on subfrequency k of the CUE link with the eNB and of the M2M pair, respectively, and N_0 is the noise power. Besides, B_n and B_m denote the allocated spectrum resources of the CUE and the M2M pair, respectively. Moreover, p_n^k and p_m^k denote the transmit powers of CUE n and M2M pair m, respectively. Therefore, the total achievable data rates of the CUEs and of the M2M pairs can be stated as

R_n = \sum_{n=1}^{N} \sum_{k=1}^{K} B_n w_n^k \log_2(1 + \gamma_n^k),   (4)

and

R_m = \sum_{m=1}^{M} \sum_{k=1}^{K} B_m v_m^k \log_2(1 + \gamma_m^k),   (5)

respectively, where w_n^k and v_m^k denote whether or not subfrequency k is assigned to the n-th CUE and the m-th M2M pair, respectively.

III. RL-BASED RESOURCE ALLOCATION POLICY

A. Overview

RL-based resource optimization: This subsection first formulates an optimization problem that maximizes the proposed network performance in terms of spectrum efficiency (SE) [9]. In addition, an RL-enabled lower-CC algorithm is presented to address the resource scarcity.

Problem formulation: Since the M2M pairs dispatch information opportunistically [7], a fine-grained power control system would require a significant amount of control signaling. We therefore assume that an M2M pair consumes constant power to transmit data during its operation. In other words, the MUE transmit power allocated at each subfrequency is given by

p_m^k = \begin{cases} 0, & \text{if M2M pair } m \text{ is inoperative at } k, \\ P, & \text{if M2M pair } m \text{ is operative at } k. \end{cases}   (6)

We designate S_k = \{m \mid v_m^k = 1\} as the set of working M2M pairs at subfrequency k. To maximize the system performance of both MUEs and CUEs, the optimization problem is expressed as

OP_1: \max_{\{W,V,P\}} \{R_m + R_n\}   (7)

s.t.

\sum_{k=1}^{K} \sum_{n=1}^{N} w_n^k \le 1, \quad \forall n \in N, \forall k \in K,   (8)

w_n^k \in \{0, 1\}, \quad \forall k, n,   (9)

v_m^k \in \{0, 1\}, \quad \forall k, m,   (10)

0 \le \sum_{n=1}^{N} \sum_{k=1}^{K} p_n^k \le P^{max},   (11)

\gamma_n^k \ge \gamma_{th}, \quad \forall k, n,   (12)

\gamma_m^k \ge \gamma_{th}, \quad \forall k, \forall m \in S_k,   (13)

where W = [w_n^k]_{N \times K} is the N-by-K subfrequency allocation matrix for the CUEs, V = [v_m^k]_{M \times K} is the M-by-K subfrequency allocation matrix for the M2M pairs, and P = [p_n^k]_{N \times K} is the matrix of the transmission powers of the CUEs at all subfrequencies. Constraints (8) and (9) ensure that a maximum of one CUE is allocated each subfrequency. Moreover, constraint (11) limits the UE transmit power. On the other hand, constraints (12) and (13) present the SINR constraints for the CUEs and MUEs, respectively. Here, γ_th is the threshold indicating the minimum required SINR to guarantee the QoS requirements of the CUEs and M2M pairs, respectively.

B. Proposed Low Complexity Algorithm

From (7), it can be seen that OP_1 is a mixed-integer non-linear programming problem, and it is generally NP-hard [9]. In particular, the proposed RL technique can solve complex network resource optimization problems and make judicious control decisions with only limited information about the network statistics, unlike existing ML techniques. Consequently, OP_1 is decomposed into two sub-problems, namely OP_2 and OP_3, and a lower-CC algorithm is developed to solve them. The proposed RL-based resource allocation policy first performs the allocation for the CUEs within the constraints of the eNB transmission power and orthogonal subfrequencies, and after that the allocation for the M2M pairs under the threshold interference.

1) Spectrum allocation for each CUE: The first sub-problem (OP_2) seeks to optimize the overall performance of each CUE by presuming that the orthogonal subfrequencies do not suffer from co-tier and cross-tier interference resulting from the M2M pairs, i.e.,

OP_2: \max_{\{W,P\}} R_n,   (14)

s.t. (8), (9), and (11).

We can maximize the transmission rate of the CUEs on each subfrequency to acquire the maximum throughput of all CUEs. Hence, the maximum SINR constraint is achieved for the CUEs. In addition, the CUE with the leading channel gain is allowed to transfer data on each subfrequency.
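A minimal sketch of this CUE allocation step is given below, assuming flat arrays of channel gains and candidate powers; the helper name, the linear-scale units and the toy numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def allocate_cues(h_cue, p_cue, bandwidth, noise_psd):
    """Greedy CUE allocation sketch for OP2.

    h_cue: (N, K) channel power gains of the CUE-eNB links
    p_cue: (N, K) candidate CUE transmit powers (W)
    bandwidth: per-subfrequency bandwidth B_n (Hz)
    noise_psd: noise power spectral density N_0 (W/Hz)
    Returns the 0/1 allocation matrix W and the CUE sum rate R_n.
    """
    N, K = h_cue.shape
    W = np.zeros((N, K), dtype=int)
    rate = 0.0
    for k in range(K):
        n_star = np.argmax(h_cue[:, k])   # CUE with the leading channel gain
        W[n_star, k] = 1
        sinr = p_cue[n_star, k] * h_cue[n_star, k] / (bandwidth * noise_psd)  # Eq. (2)
        rate += bandwidth * np.log2(1.0 + sinr)                               # term of Eq. (4)
    return W, rate

# Toy usage with N = 4 CUEs and K = 6 subfrequencies (assumed values).
rng = np.random.default_rng(0)
W, R_n = allocate_cues(rng.rayleigh(1e-4, (4, 6)) ** 2,
                       np.full((4, 6), 0.5), 2e5, 10 ** (-174 / 10) * 1e-3)
print(W, R_n)
```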

Furthermore, the transmission power of CUE n^* is managed with the open-loop power control technique to mitigate the interference at the M2M receivers, such that

p_{n^*}^k \ge \frac{\gamma_{th}(I_{max}^k + N_0)}{h_{n^*}^k},   (15)

where I_{max}^k is the highest permitted interference at each subfrequency.

2) Spectrum allocation for each MUE: After finding the allocation for each CUE, the subfrequency assignment for the M2M pairs can be given as

OP_3: \max_{\{V\}} R_m,   (16)

s.t. (10), (12), and (13).

From equation (2), constraint (12) is rewritten as follows

\min_n \left( \frac{p_n^k h_n^k}{\gamma_{th}} - N_0 B_n \right) \ge 0.   (17)

Similarly, (13) is rewritten as

\sum_{m=1}^{M} v^{*}_{m,k} p^{*}_{m,k} h^{*}_{m,k} \le \min_{m \in S_k} \left( \frac{P h_m^k}{\gamma_{th}} - N_0 B_m \right) + v_m^k P h_m^k.   (18)

After the alteration of (12) and (13), OP_3 is transformed into a two-dimensional (2D) knapsack problem (KP); in particular, there are two dimensions for the weights of the KP. It is applied to achieve the optimal solution for resource allocation in M2M communications under the power and threshold-interference constraints [9]. This type of problem is part of the MKP family [9], and only water-filling algorithms can be applied to address this type of NP-hard problem. However, efficient water-filling algorithms require high computation time, making them unsuitable for real-time applications [5]. In this article, the PN is employed to handle the combinatorial optimization problem.

C. Proposed Algorithm Representation

1) PN architecture: The RL-based optimal resource allocation scheme is presented in Algorithm 1. As noted above, a 2D KP is a resource optimization problem for each subfrequency k. Given the CUEs' resource distribution state, each of the M2M pairs is characterized by a 3D feature vector (v, x, y), where v is the achievable data rate of the M2M pair according to (5), and x and y are the weights of the 2D KP limitations, which may be obtained from (17) and (18), respectively. The PN is a special type of recurrent NN that distinguishes the encoder and the decoder. The input of v, x and y should be a sequence that comprises the 3D feature vectors as specified, because the PN is built on the sequence-to-sequence model. The output is also a sequence and is obtained from the PN by using the pointing mechanism, which re-arranges the input. The output is a collection of valid entities that meet the requirements. In particular, we traverse the solution series and terminate when the obtained entities exceed the limitations of (17) and (18). The identified entities are the solution to the KP. We name this solution o and designate V(o) as the total value of the corresponding set of entities. The details of the PN framework are provided in [9].

Algorithm 1: RL-based resource utilization policy
1  Step 1: Spectrum distribution for CUEs;
2  Set n = [1, 2, ..., N], m = [1, 2, ..., M] and the total number of simulation runs;
3  Initialize: R_m = 0, R_n = 0 and p_n^k = P^{max}/N;
4  for k = [1, 2, ..., K] do
5      Obtain n^* on subfrequency k;
6      Update the optimal transmission power of each CUE according to (15);
7      Update the data rate of each CUE according to (17);
8  end
9  Step 2: Spectrum distribution for MUEs;
10 Assign the training set S, the number of training phases T, the batch size Q and the PN parameters P;
11 for t = [1, 2, ..., T] do
12     Choose a batch of samples s_b for b ∈ [1, 2, ..., Q];
13     Obtain trial solutions o_b based on θ_p(·|s_b) for b ∈ [1, 2, ..., Q];
14     Calculate the values V(o_b|s_b);
15     Update the policy gradient g_p according to (22);
16     Update the parameters of the PN according to (23);
17     Update the baseline function q(s_b) = q(s_b) + α(V(o_b|s_b) − q(s_b)) for b ∈ [1, 2, ..., Q];
18 end
19 for k = [1, 2, ..., K] do
20     Utilize the PN to calculate v_m^k for the m-th MUE;
21     Update the data rate of each MUE;
22 end

2) RL algorithm: The RL algorithm is considered a proper method to train the NN for solving combinatorial optimization problems. Our proposed low-complexity policy-empowered RL aims at optimizing the parameters of the PN, which are denoted by P. The expected tour length for an input graph s is given below [10]

J(P|s) = \mathbb{E}_{o \sim \theta_p(\cdot|s)} V(o|s).   (19)

The graphs are generated from a distribution S during training, and the overall training objective involves sampling from the graph distribution, written as follows

J(P) = \mathbb{E}_{s \sim S} J(P|s).   (20)

To optimize the parameters, we resort to policy gradient (PG) techniques and stochastic gradient descent. Using the RL algorithm, the gradient of (20) can be expressed as [10]

\nabla_p J(P|s) = \mathbb{E}_{o \sim \theta_p(\cdot|s)} \left[ (V(o|s) - q(s)) \nabla_p \log \theta_p(o|s) \right],   (21)

where q(s) signifies the baseline in the training procedure; it does not depend on the ordering produced by the proposed network and estimates the expected value in order to decrease the variance of the gradients [10].
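To make Step 2 of Algorithm 1 and the gradient in (21) concrete, the following Python sketch shows a REINFORCE-style training loop with a moving-average baseline. The tiny feed-forward "policy", the synthetic reward and the hyper-parameters are placeholders assumed only for illustration; the paper's actual model is the PN (an LSTM encoder-decoder with attention).

```python
import torch

# Placeholder policy: scores M candidate M2M pairs from their 3D features (v, x, y).
policy = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)  # Adam update, as in (23)
baseline, alpha = 0.0, 0.05                                  # moving-average baseline q(s)

def reward(selected, values):
    """Total value V(o|s) of the selected set (stand-in for the KP objective)."""
    return (selected * values).sum()

for t in range(200):                      # number of training phases T (assumed)
    batch_loss = 0.0
    for _ in range(8):                    # batch size Q (assumed)
        feats = torch.rand(10, 3)         # one sample s_b: 10 M2M pairs, features (v, x, y)
        probs = torch.sigmoid(policy(feats)).squeeze(-1)
        sel = torch.bernoulli(probs)      # trial solution o_b ~ theta_p(.|s_b)
        logp = (sel * torch.log(probs + 1e-8)
                + (1 - sel) * torch.log(1 - probs + 1e-8)).sum()
        V = reward(sel, feats[:, 0])      # value V(o_b|s_b)
        batch_loss += -(V.detach() - baseline) * logp   # sample term of the PG estimator (21)
        baseline += alpha * (V.item() - baseline)       # baseline update, as in Algorithm 1
    optimizer.zero_grad()
    (batch_loss / 8).backward()           # stochastic mean over the batch, as in (22)
    optimizer.step()
```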


The popular RL algorithm has been utilized to extract the gradient for improving the network parameters using the Adam method [10]. The proposed resource allocation scheme applies the RL algorithm in a more practicable way with Monte Carlo sampling. After assigning the batch size Q, by drawing Q independently and identically distributed sample KPs, the gradient is stated in a stochastic mean form as

g_p = \frac{1}{Q} \sum_{b=1}^{Q} (V(o_b|s_b) - q(s_b)) \nabla_p \log \theta_p(o_b|s_b),   (22)

P = \mathrm{ADAM}(P, g_p).   (23)

The baseline function is an exponential moving mean of the system rewards achieved through the network over time, reflecting the fact that the strategy improves with training.

3) CC analysis: Here we analyze the CC of the proposed scheme. According to Step 1, the CC of the suggested Algorithm 1 is O(N). Long short-term memory (LSTM) units with attention are the elementary modules of the PN in Step 2. The CC for every given fine-tuned PN is O(M^2), since the attention variable calculation is done M times at every generation step [9], [10]. Thus, the entire CC of the proposed scheme is O(M^2 + N).

A model-free policy-based RL algorithm optimizes the parameters without knowledge of the environment. This algorithm measures the model inference time, that is, the time for the trained model to make decisions that minimize the error on the training samples while keeping the bound on its model complexity small. Thus, the computation time of the proposed scheme is lower than that of existing ML schemes, as analyzed in Fig. 4.

IV. SIMULATION RESULTS AND ANALYSIS

A. Simulation Setup

The MATLAB-based simulation results for the suggested scheme are presented in this section. The CUEs and M2M pairs are randomly placed in the network. For both CUEs and M2M pairs, the channel fading follows a Rayleigh distribution with uniform variance. Table I lists the main simulation parameters in detail. We compare the proposed algorithm with two traditional schemes: the M2M mode and the cellular mode. In the M2M mode, the M2M pairs communicate with each other without using learning-based resource utilization [5]. Furthermore, in the cellular mode, the CUE communicates without using learning-based resource utilization in the proposed network [5]. Moreover, the proposed mechanism selects the suitable communication mode using the RL-empowered PN-based resource utilization policy without considering the UE selection.

B. The Throughput Comparison of M2M Pairs

Fig. 2 shows the average throughput versus the number of M2M pairs. The average throughput achieved by the proposed scheme rises as the number of M2M pairs rises. However, the rise is not remarkable due to the interference limitation in the cellular mode. From the figure, it can be observed that the proposed approach achieves a significant improvement in network performance while mitigating the co-tier interference of the M2M links. The M2M mode maximizes the SINR of the M2M links compared to traditional methods, leading to only a slightly better performance than the cellular mode.

TABLE I
THE PARAMETERS OF SIMULATION

Parameter                          | Value
Cell radius                        | 250 m
Carrier frequency                  | 2 GHz
System bandwidth                   | 6 MHz
Per-RB bandwidth                   | 200 kHz
Number of M2M pairs                | 30
Number of CUEs                     | 10
M2M communication distance         | 50 m
Maximum transmission power [2]     | 30 dBm
Noise power                        | −174 dBm/Hz
PL model between eNB and CUE [2]   | 128.1 + 37.6 log(d[km])
PL model between M2M pair [2]      | 148 + 40 log(d[km])
Threshold SINR                     | 10 dB
Training samples                   | 2000
Testing samples                    | 500
Batch size                         | 64
Hidden layer                       | 64

Fig. 2. Average throughput versus number of M2M pairs.
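For reproducibility, a small Python helper is sketched below that turns the Table I path-loss models and noise figure into linear-scale channel gains and noise power; the function names, the example distances and the Rayleigh-fading draw are assumptions made for the illustration.

```python
import numpy as np

def db_to_linear(db):
    return 10.0 ** (db / 10.0)

def channel_gain(d_km, link="cellular", rng=None):
    """Linear channel power gain with the Table I path loss and Rayleigh fading.

    link='cellular' -> PL = 128.1 + 37.6 log10(d[km])  (eNB-CUE link)
    link='m2m'      -> PL = 148   + 40   log10(d[km])  (M2M pair link)
    """
    rng = rng or np.random.default_rng()
    pl_db = (128.1 + 37.6 * np.log10(d_km)) if link == "cellular" \
            else (148.0 + 40.0 * np.log10(d_km))
    fading = rng.exponential(1.0)     # |h|^2 for unit-variance Rayleigh fading
    return db_to_linear(-pl_db) * fading

# Noise power over one 200 kHz RB at -174 dBm/Hz (Table I).
noise_w = db_to_linear(-174) * 1e-3 * 200e3
rng = np.random.default_rng(1)
g_cue = channel_gain(0.2, "cellular", rng)    # CUE assumed 200 m from the eNB
g_m2m = channel_gain(0.05, "m2m", rng)        # 50 m M2M communication distance
tx_w = db_to_linear(30) * 1e-3                # 30 dBm maximum transmission power
print(10 * np.log10(tx_w * g_cue / noise_w))  # resulting SNR in dB, cf. Eq. (2)
```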


C. The Convergence of the Proposed Mechanism

Fig. 3 shows the reward comparison of the proposed scheme, the M2M mode and the cellular mode. As illustrated in the figure, our proposed scheme outperforms the other M2M resource distribution algorithms in the proposed network. It shows that as the number of iterations increases, the network performance of the users improves, and our proposed method is much better than the other methods. Compared to traditional resource distribution approaches, MUEs achieve reasonable system performance while the co-tier interference is appropriately managed. Moreover, we can observe that the proposed scheme converges very rapidly. This is because the proposed scheme adopts the PN, which reduces the CC. The numbers of M2M pairs and CUEs are set to 30 and 10, respectively, and they are randomly located in this figure. Additionally, if the number of UEs rises, the convergence rate becomes gradually slower. The estimated reward decreases as the number of UEs rises, which indicates that the system performance is better when there are fewer UEs in the proposed network. The reason is that the interference generated by the M2M links as a result of co-tier transmissions increases. Therefore, the estimated reward of a network with a lower number of UEs is greater than that of one with many UEs in M2M communications.

Fig. 3. The convergence of the proposed mechanism.

D. The Computational Complexity of the Proposed Scheme

Fig. 4 shows the expected reward versus the number of iterations. Here, O(N(K^2 + K + P)) presents the CC of the MKP scheme, where the number of M2M pairs, the length of the MKP and the total number of iterations are denoted by K, P, and N, respectively. This figure considers 80 and 30 as the values of N and K. The RL-based Q-learning scheme for resource allocation is the one used in [8]. From Fig. 4, it can be seen that if the number of M2M pairs is small, then the Q-learning scheme has a performance close to the MKP scheme. As the number gets larger, the Q-learning scheme achieves better reward performance and convergence than the MKP scheme, while remaining below the proposed scheme. In the case of a large state-action space, Q-learning cannot provide good efficiency due to poor reward performance and convergence. On the other hand, by introducing the PN architecture in the training process, larger reward performance, more stable convergence, lower CC, and adequate network performance are attained with the proposed scheme compared to the MKP and Q-learning schemes.

Fig. 4. The CC comparison of the proposed scheme, the Q-learning scheme and the MKP scheme.

V. CONCLUSIONS

This paper investigated the optimal resource distribution for M2M communications coexisting with cellular networks using a mixed-integer non-linear programming formulation and a low-complexity process based on the PN. Monte Carlo simulations have been performed to evaluate the performance of the RL-based resource utilization policy. It has been found that the number of M2M pairs and the number of iterations have a substantial impact on the results. Simulation results showed that the proposed mechanism achieves better system performance in terms of average throughput compared to traditional approaches while significantly mitigating the interference. To further upgrade the spectrum allocation policy, a federated learning framework can be used, since it provides a global solution for complex network optimization problems without sharing information between eNBs, which makes it an interesting topic for further investigation.

REFERENCES

[1] F. Hussain, Internet of Things: Building Blocks and Business Models, 1st ed. Springer, Cham, 2017, p. 73.
[2] S. K. Das and M. F. Hossain, "A Location-Aware Power Control Mechanism for Interference Mitigation in M2M Communications over Cellular Networks," Comput. & Elect. Eng., vol. 88, pp. 1–23, Dec. 2020.
[3] X. Li, L. Ma, Y. Xu, and R. Shankaran, "Resource Allocation for D2D-Based V2X Communication With Imperfect CSI," IEEE Internet Things J., vol. 7, no. 4, pp. 3545–3558, Apr. 2020.
[4] H. Xu et al., "Robust transmission design for multicell D2D underlaid cellular networks," IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 5922–5936, Jul. 2018.
[5] S. K. Das and R. Mudi, "A Location-Aware Resource Efficiency and Energy Efficiency Optimization for M2M Communications over Downlink Cellular Systems," in Proc. IEEE 31st Annu. Int. Symp. on Pers., Indoor and Mobile Radio Commun., 2020, pp. 1–6.
[6] Y. Hao, Q. Ni, H. Li, and S. Hou, "Robust Multi-Objective Optimization for EE-SE Tradeoff in D2D Communications Underlaying Heterogeneous Networks," IEEE Trans. Commun., vol. 66, no. 10, pp. 4936–4949, Oct. 2018.
[7] W. Wu, R. Liu, Q. Yang, and T. Q. S. Quek, "Learning-Based Robust Resource Allocation for D2D Underlaying Cellular Network," May 2021, Accessed: Jul. 03, 2021. [Online]. Available: http://arxiv.org/abs/2105.08324.
[8] D. Wang et al., "Joint resource allocation and power control for D2D communication with deep reinforcement learning in MCC," Physical Commun., vol. 45, Apr. 2021.
[9] L. Zhu, C. Liu, J. Yuan, and G. Yu, "Machine Learning-Based Resource Optimization for D2D Communication Underlaying Networks," in Proc. IEEE 92nd Veh. Technol. Conf. (VTC2020-Fall), 2020, pp. 1–6.
[10] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, "Neural Combinatorial Optimization with Reinforcement Learning," in Proc. 5th Int. Conf. Learn. Represent. (ICLR 2017), Workshop Track, Nov. 2016, Accessed: Jul. 08, 2021. [Online]. Available: https://arxiv.org/abs/1611.09940v3.
