
Decentralized Deep Reinforcement Learning Approach for Channel Access Optimization

Sheila C. da S. J. Cruz (INATEL)
Felipe Augusto Pereira de Figueiredo (INATEL)
Rausley A. A. de Souza (INATEL)

Research Article

Keywords: Wi-Fi, contention-based channel access, channel utilization optimization, reinforcement learning, NS-3, NS3-gym

Posted Date: June 11th, 2024

DOI: https://doi.org/10.21203/rs.3.rs-4555252/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: The authors declare no competing interests.

Decentralized Deep Reinforcement Learning
Approach for Channel Access Optimization
Sheila C. da S. J. Cruz, Felipe A. P. de Figueiredo and Rausley A. A. de Souza

Abstract— The IEEE 802.11 standard's binary exponential back-off (BEB) algorithm is the prevailing method for tackling the collision avoidance problem. Under the BEB paradigm, the back-off period increases each time a collision occurs, aiming to minimize the likelihood of subsequent collisions. However, this provides sub-optimal results, degrading network performance and leading to bandwidth wastage, especially in dynamic dense networks. To overcome these drawbacks, this paper proposes using a decentralized approach with deep reinforcement learning algorithms, namely Deep Q Learning (DQN) and Deep Deterministic Policy Gradient (DDPG), to optimize the contention window value and maximize throughput while minimizing collisions. Simulations with the NS-3 simulator and NS3-gym toolkit reveal that DQN and DDPG outperform BEB in both static and dynamic scenarios, achieving up to a 37.16% network throughput improvement in dense networks while keeping a high and stable throughput as the number of stations increases.

Keywords— Wi-Fi, contention-based channel access, channel utilization optimization, reinforcement learning, NS-3, NS3-gym.

This work was partially funded by CNPq (Grant Nos. 403612/2020-9, 311470/2021-1, and 403827/2021-3), by the Minas Gerais Research Foundation (FAPEMIG) (Grant Nos. APQ-00810-21 and PPE-00124-23), by FADCOM - Fundo de Apoio ao Desenvolvimento das Comunicações, presidential decree no. 264/10, November 26, 2010, Republic of Angola, and by the projects XGM-AFCCT-2024-2-5-1 and XGM-FCRH-2024-2-1-1 supported by xGMobile - EMBRAPII-Inatel Competence Center on 5G and 6G Networks, with financial resources from the PPI IoT/Manufatura 4.0 from MCTI grant number 052/2023, signed with EMBRAPII.

I. INTRODUCTION

Wireless networks are widely used and applied in different domains where the stations connected to the wireless network require a fair share of the spectrum resources to guarantee better performance while accessing the channel to transmit data [1]. One key challenge is managing collisions, where multiple stations transmit simultaneously, causing interference and data loss. The IEEE 802.11 standard uses the carrier-sensing multiple access with collision avoidance (CSMA/CA) [2] protocol in the MAC layer to mitigate collision occurrences by employing a contention window (CW) value, which defines a random back-off time used to minimize collisions. Each new collision doubles the CW value, ranging from CWMin (15 or 31) to CWMax (1023), in order to reduce the chance of different stations selecting the same back-off value, deferring the transmission to a later time.

The binary exponential back-off (BEB) algorithm is responsible for this deferring method, which is used in CSMA/CA [3]. However, the BEB algorithm has significant limitations and often provides sub-optimal results, particularly under high loads; it is unable to adapt to changing network conditions and also lacks fairness [4]. To overcome these drawbacks, machine learning-based solutions such as deep reinforcement learning (DRL) algorithms have been proposed and applied in various domains, including wireless networks. DRL algorithms have the capacity to learn from network states and adjust to evolving network conditions, thereby optimizing cumulative rewards over time. This adaptability renders DRL particularly suitable for solving many optimization and decision-making problems of Wi-Fi networks, providing optimal solutions that are flexible and adaptable to different learning scenarios. DRL algorithms offer potential improvements by optimizing CW values, especially in dynamic scenarios where the number of nodes increases over time.

In [5], a centralized single-agent DRL approach is presented. In that solution, the DRL agent is located at the access point (AP). Since it has a global view of the network, a unique CW value is optimized and broadcast to all associated stations, ensuring that all stations have the same CW value. It showed a considerable increase in throughput. However, a decentralized approach offers a more robust and efficient solution with guaranteed high scalability, especially in dense dynamic scenarios. By leveraging distributed computing, new stations can be easily added to the corresponding scenario, accelerating convergence to optimal collision avoidance solutions in wireless local area networks (WLANs).

Several multi-agent reinforcement learning (MARL) methods have been proposed in the literature to enhance the performance of WLANs by providing decentralized solutions. For instance, the authors in [6] have proposed a MARL approach to optimize spectrum occupation prediction and enhance multi-channel slotted wireless network access. The overuse of the wireless spectrum due to various network technologies causes collisions. To mitigate this, a multi-agent DRL mechanism with six known and two unknown nodes, using ring and AP topologies, was proposed. A distributed reinforcement learning (RL)-based scheduler allocates slots to avoid collisions, with agents trained via online supervised learning with experience replay. Using radio frequency observations for predictions, this approach reduces inter-network collisions by 30% and increases overall throughput by 10% compared to the traditional exponentially weighted moving average (EWMA) algorithm.

To optimize the system spectral efficiency, the authors in [7] have proposed a MARL algorithm for power allocation and joint subcarrier assignment in multi-cell orthogonal frequency-division multiplexing systems. Each base station independently calculates resource allocation based on local conditions but collaborates by exchanging information for global optimization. The MARL algorithm demonstrated fast convergence and up to 53.6% higher efficiency compared to conventional Q-learning. In heterogeneous networks, where multiple APs and users share the same spectrum, effective power control is crucial for managing interference.
However, obtaining instantaneous global channel state information in rapidly changing environments is challenging.

Previous works present limitations in directly optimizing the network's throughput and in adapting to extremely dense dynamic scenarios, and they exhibit high computational complexity. Our decentralized solution differs from previous works by treating each station as a DRL agent that updates its own CW value, producing optimized individual throughputs that are passed to the AP, which sums up the individual throughputs from the self-learning agents. The AP then broadcasts the total throughput value back to the stations, maximizing the network's overall performance and leading to a more flexible, adaptable, and robust collision avoidance solution for static and dynamic scenarios.

The decentralized solution proposed in this work uses the collision probability, i.e., the transmission failure probability, as the network-state information for training the independent DRL agents. The study compares the collision probabilities of the decentralized approach and the conventional BEB algorithm, demonstrating that the decentralized method outperforms BEB and better adapts to changing network conditions, making it well-suited for addressing collision avoidance in WLANs.

Therefore, this work involves analyzing two scenarios: static, with a fixed number of nodes, and dynamic, with an increasing number of nodes. We propose the use of DRL algorithms, specifically Deep Q Learning (DQN) and Deep Deterministic Policy Gradient (DDPG), using the well-known collision probability as the observation metric to optimize network performance.

The remainder of the paper is organized as follows. Section II presents a brief theoretical background and the methodology used for the simulation. Section III describes the simulation results. Finally, Section IV presents the conclusions and future works.

II. SIMULATION METHODOLOGY

A. Theoretical Background

RL involves single-agent learning through interaction with the environment, making decisions that maximize the overall cumulative reward [8]. The main idea of RL algorithms is to find a policy, i.e., a rule to explore and exploit the environment, that maximizes the total future reward. DRL combines RL with artificial neural networks to handle high-dimensional data and accelerate convergence [9]. DRL includes the DQN [10] and DDPG [11] algorithms. This work focuses on using these two algorithms to assist in optimizing the CW, with the primary goal of reducing node collisions while improving network performance.

A decentralized RL system is a multi-agent RL approach designed for decision-making and optimization involving multiple self-learning agents in a shared environment [12]. Based on game theory, MARL involves agents independently maximizing their rewards by using local information without considering other agents, resulting in a competitive and non-communicative learning process [13]. A decentralized approach offers significant advantages in the context of WLAN networks, such as providing optimal solutions to channel access, collision avoidance, resource management, and improved network performance. It supports scalability and robustness, allowing easy integration of new agents and continued operation despite agent failures. Agents can also share information to enhance convergence. However, decentralized MARL faces challenges like the non-stationarity problem [12], [13], which leads to the problem of moving targets: the optimal policy changes whenever the other agents' policies change. Furthermore, the computational complexity increases exponentially with more agents, and achieving agent communication, i.e., collaboration, is difficult due to limited local information and a lack of awareness of other agents' actions and rewards.

B. Methodology

The proposed approach encompasses a decentralized algorithm that runs on multiple stations simultaneously. Each station observes the network state independently and selects appropriate CW values to optimize overall network performance. Next, we describe each part of the decentralized solution; a code sketch showing how these elements fit together is given after the list.

1) Agent: each agent is represented by a DRL algorithm (DQN or DDPG) running on one of the stations, whose number varies from 5 to 50.

2) Current state: the environment status, s, of all stations associated with the AP. However, it is impossible to obtain this information because of the nature of the optimization problem. Therefore, we model the problem as a partially observable Markov decision process (POMDP) instead of a Markov decision process (MDP). A POMDP assumes the environment's state cannot be perfectly observed [14].

3) Observation, O: the network information, based on the collision probability, used to observe the overall network's status. This information is saved into a buffer of recent observations. Then a moving average is calculated, producing the mean value, µ, and the variance, σ², which are transferred to a two-dimensional vector used to train the DRL agent.

4) Action, a: determines the CW value. As we compare DRL algorithms with discrete and continuous action spaces, the actions are integer values between 0 and 6 in the discrete case and real values within the interval [0, 6] in the continuous case. This interval is selected so that the action space stays within the 802.11 standard's CW range, which goes from 15 up to 1023. Therefore, the CW value to be calculated for each station can be obtained by applying CW = ⌊2^(a+4)⌋ − 1.

5) Reward, r: the normalized network throughput. It is calculated by dividing the actual throughput by the expected maximum throughput for each station. Each station's agent receives individual rewards, and the cumulative reward to be broadcast to every station is the sum of these individual rewards, resulting in a real value within the interval [0, 1].
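To make the agent-environment interface concrete, the following minimal Python sketch shows how one station's interaction loop could be written on top of the NS3-gym bindings of [15]. It is only an illustration under several assumptions: the port number and simulation arguments are placeholders, the NS-3 scenario is assumed to report the collision probability as the observation and the broadcast normalized throughput as the reward, and the naive policy below stands in for a trained DQN or DDPG agent.

import numpy as np
from ns3gym import ns3env  # NS3-gym Python bindings [15]

ENV_STEP_TIME = 0.010    # 10 ms interaction interval (Table II)
TRAINING_PERIOD = 840.0  # training stage duration in seconds (Table II)
OBS_HISTORY = 300        # size of the observation history memory (Table II)

def preprocess(history):
    # Item 3): reduce the recent collision-probability samples to (mu, sigma^2).
    window = np.asarray(history[-OBS_HISTORY:], dtype=np.float64)
    return np.array([window.mean(), window.var()])

def policy(mu_var, explore):
    # Placeholder for the action function A_theta: returns an action a in [0, 6].
    # A trained DQN (discrete) or DDPG (continuous) agent would be queried here.
    greedy = 6.0 * mu_var[0]  # naive rule: more collisions -> larger CW
    noise = np.random.normal(0.0, 0.5) if explore else 0.0
    return float(np.clip(greedy + noise, 0.0, 6.0))

def action_to_cw(a):
    # Item 4): CW = floor(2^(a+4)) - 1, i.e., a value between 15 and 1023.
    return int(np.floor(2.0 ** (a + 4))) - 1

# Assumes an NS-3 scenario compiled against ns3-gym is already listening on this port.
env = ns3env.Ns3Env(port=5555, stepTime=ENV_STEP_TIME, startSim=False, simArgs={})
try:
    obs = env.reset()
    history, sim_time, done = [], 0.0, False
    while not done:
        history.append(float(np.ravel(obs)[0]))  # collision probability sample
        mu_var = preprocess(history)
        training = sim_time < TRAINING_PERIOD    # learning stage vs. operational stage
        a = policy(mu_var, explore=training)
        cw = action_to_cw(a)                     # CW the station would apply (illustration only)
        obs, reward, done, info = env.step(a)    # reward: broadcast normalized throughput
        sim_time += ENV_STEP_TIME
finally:
    env.close()

In the actual solution, the CW derived from the action is applied inside the NS-3 simulation, and the experience tuples formed by (µ, σ², a, r) are stored in the replay buffer for mini-batch updates, as detailed in Algorithm 1.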
The collision probability is a good characterization of the environment state. It is the probability of collision, p_col, observed by the network. It can also be interpreted as the probability of transmission failure. It is calculated based on the number of transmitted, N_t, and correctly received, N_r, frames, that is

    p_col = (N_t − N_r) / N_t.    (1)

This collision rate approximates the actual probability of collision as the number of frames used to calculate it increases. Thus, this rate represents the probability of a frame not being received because another station is transmitting a frame at the same time. These probabilities are calculated within the interaction periods and provide information on the performance of the selected CW value.

The decentralized CW optimization with DRL follows three stages: i) pre-learning, where the observation buffer is filled with data; ii) learning, where the agent selects CW values; and iii) operational, where the agent uses the actions learned during training that provide the best rewards. The decentralized solution includes a preprocessing step that uses the recent observation history to calculate a moving average, generating the mean, µ, and the variance, σ², values that fill the two-dimensional vector used for training the agent. Exploration is enabled by adding a decreasing noise factor to the actions: DQN uses a noise factor corresponding to the probability of taking a random action instead of the action predicted by the agent, whereas DDPG adds Gaussian noise directly to the actions. The implementation of the proposed decentralized solution follows the pseudo-code shown in Algorithm 1, which presents how the three aforementioned stages operate.

Algorithm 1: DRL-based Decentralized CW Optimization

▷ ### Initialization ###
 1: Define the maximum number of stations, WifiNode
 2: Initialize the observation buffer of each station, O(i), with zeroes
 3: Initialize the weights of each agent, θ(i)
 4: Get the action function of each station, A_θ(i), which each agent uses to choose the action according to its current state
 5: Initialize the algorithm's interaction period with the environment, envStepTime
 6: Initialize the number of episodes, N_episodes
 7: Initialize the number of steps per episode, N_spe
 8: Initialize the training stage period, trainingPeriod
 9: Set trainingFlag ← True to indicate that the algorithm is in the training stage
10: Initialize the experience replay buffer of each station, E(i), with zeroes
11: trainingStartTime ← currentTime
12: lastUpdate ← currentTime
13: Initialize the previous mean value of each station, µ_prev(i) ← 0
14: Initialize the previous variance value of each station, σ²_prev(i) ← 0
15: Set CW(i) ← 15, ∀i
16: for e = 1, ..., N_episodes do
17:     Reset and run the environment, i.e., reset and run the NS-3 simulator
18:     for t = 1, ..., N_spe do
19:         for i = 1, ..., WifiNode do
                ▷ ### Pre-learning stage ###
20:             N_t(i) ← get the number of transmitted frames of the i-th station
21:             N_r(i) ← get the number of received frames of the i-th station
22:             observation(i) ← (N_t(i) − N_r(i)) / N_t(i)
23:             O(i).append(observation(i))
24:             if currentTime ≥ lastUpdate + envStepTime then
                    ▷ ### Learning and operational stages ###
25:                 µ(i), σ²(i) ← preprocess(O(i))
26:                 a(i) ← A_θ(i)(µ(i), σ²(i), trainingFlag)
27:                 CW(i) ← ⌊2^(a(i)+4)⌋ − 1
28:                 if trainingFlag == True then
29:                     N_RP(i) ← get the number of received packets of the i-th station
30:                     tput(i) ← N_RP(i) / envStepTime
31:                     Send the throughput of each station to the access point
32:                     r ← normalize(tput(i))
33:                     Broadcast the new reward value r to all associated stations
34:                     E(i).append((µ(i), σ²(i), a(i), r, µ_prev(i), σ²_prev(i)))
35:                     µ_prev(i) ← µ(i)
36:                     σ²_prev(i) ← σ²(i)
37:                     mb(i) ← get a random mini-batch from E(i)
38:                     Update θ(i) based on mb(i)
39:                 end if
40:                 lastUpdate ← currentTime
41:             end if
                ▷ ### Transition between the learning and operational stages ###
42:             if currentTime ≥ trainingStartTime + trainingPeriod then
43:                 trainingFlag ← False
44:             end if
45:         end for
46:     end for
47: end for

III. SIMULATION RESULTS

In this section, we compare the proposed solution's performance against that of the conventional BEB algorithm under static and dynamic scenarios.

A. Simulation Scenario Description

In the decentralized solution, the stations with RL agents follow a distributed topology with one AP, as shown in Fig. 1. Each station acts as an autonomous agent, adjusting its CW value and receiving individual rewards. The cumulative reward is the total sum of all individual rewards. The stations' arrangement occurs statically and dynamically. The NS3-gym toolkit and the NS-3 simulator are used to train the agents and implement the DRL algorithms using TensorFlow and PyTorch [15]. Tables I and II summarize the parameter setup used in NS-3 for creating the agents' environment and in NS3-gym for training the DRL agents, respectively. Both DRL algorithms include one recurrent long short-term memory (LSTM) layer and two fully connected layers as the network architecture, forming an 8 × 128 × 64 topology.

Fig. 1. System model for the decentralized DRL-based CW optimization.
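The text does not break the 8 × 128 × 64 topology down layer by layer, so the PyTorch sketch below is only one plausible reading of it (an LSTM over a short window of (µ, σ²) observations followed by two fully connected layers); the window length, output dimension, and activation are assumptions rather than the authors' exact design.

import torch
import torch.nn as nn

class CwPolicyNet(nn.Module):
    # One plausible reading of the "LSTM + two fully connected layers" architecture.
    # The input window (4 steps of (mu, sigma^2) = 8 values), the output size,
    # and the ReLU activation are assumptions.
    def __init__(self, obs_dim: int = 2, n_outputs: int = 7):
        super().__init__()
        self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=128, batch_first=True)
        self.fc1 = nn.Linear(128, 64)
        self.head = nn.Linear(64, n_outputs)  # 7 Q-values for DQN, or 1 action for a DDPG actor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, obs_dim)
        _, (h_n, _) = self.lstm(x)
        h = torch.relu(self.fc1(h_n[-1]))
        return self.head(h)

# Example: Q-values for a single window of four (mu, sigma^2) observations.
q_values = CwPolicyNet()(torch.zeros(1, 4, 2))
print(q_values.shape)  # torch.Size([1, 7])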
TABLE I
NS-3 ENVIRONMENT CONFIGURATION PARAMETERS

Configuration Parameter        Value
Wi-Fi standard                 IEEE 802.11ax
Number of APs                  1
Number of static stations      5, 15, 30, or 50
Number of dynamic stations     increases steadily from 5 to 50
Frame aggregation              disabled
Packet size                    1500 bytes
Max queue size                 100 packets
Frequency                      5 GHz
Channel BW                     20 MHz
Traffic                        constant bit-rate UDP of 150 Mbps
MCS                            HeMcs (1024-QAM with a 5/6 coding rate)
Guard interval                 800 ns
Propagation delay model        ConstantSpeedPropagationDelayModel
Propagation loss model         MatrixPropagationLossModel

TABLE II
NS3-GYM AGENT CONFIGURATION PARAMETERS

Configuration Parameter                   Value
DQN's learning rate                       4 × 10⁻⁴
DDPG's actor learning rate                4 × 10⁻⁴
DDPG's critic learning rate               4 × 10⁻³
Reward discount rate                      0.7
Batch size                                32
Replay memory size                        18000
Size of observation history memory        300
trainingPeriod                            840 s
envStepTime (i.e., interaction interval)  10 ms
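For readers reproducing the setup, the agent-side hyperparameters of Table II could be grouped into a single configuration object, as in the hedged Python sketch below; the field names are illustrative and are not part of NS3-gym or the authors' code.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    # Hyperparameters taken from Table II; the field names are illustrative only.
    dqn_lr: float = 4e-4             # DQN's learning rate
    ddpg_actor_lr: float = 4e-4      # DDPG's actor learning rate
    ddpg_critic_lr: float = 4e-3     # DDPG's critic learning rate
    gamma: float = 0.7               # reward discount rate
    batch_size: int = 32
    replay_memory_size: int = 18000
    obs_history_size: int = 300      # size of the observation history memory
    training_period_s: float = 840.0
    env_step_time_s: float = 0.010   # 10 ms interaction interval

config = AgentConfig()
print(config.gamma, config.batch_size)  # 0.7 32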

B. Static Scenario

In the static scenario, a fixed number of stations associated with the AP is kept constant during the experiment execution. Fig. 2 presents a comparison of the network throughput as a function of the number of stations for the decentralized DRL algorithms and the conventional BEB. The improvements of DDPG over BEB are 5.19%, 17.64%, 16.67%, and 27.78% for 5, 15, 30, and 50 stations, respectively. The improvements of DQN over BEB are 5.19%, 17.21%, 16.27%, and 27.10% for 5, 15, 30, and 50 stations, respectively. The results demonstrate that DDPG is slightly better than DQN, especially for 15, 30, and 50 stations. This can be attributed to DDPG's ability to select any real CW value within the [0, 6] range, which is well suited for tracking the network's dynamics [11].

Fig. 2. Comparison of the network throughput in a static scenario as the number of stations increases from 5 to 50.

Fig. 3 depicts the mean CW value throughout 15 episodes with 30 stations in the static scenario. The results demonstrate that in the initial episodes there is a higher variance in the CW, but as the training progresses, the variance reduces. This happens because the number of random actions decreases over time, and the agent converges to a result and learns correctly. After the 10th episode, the mean CW value is kept stable around the same value.

Fig. 3. Mean CW value for 30 stations in the static scenario.

C. Dynamic Scenario

In the dynamic scenario, the number of stations grows steadily during the simulation execution, increasing from 5 to 50. The higher the number of stations, the higher the collision probability. This experiment evaluated whether the DRL algorithms correctly act upon network changes. Fig. 4 shows that the DRL algorithms effectively enhance the network's throughput compared to the BEB algorithm in the decentralized dynamic scenario. The degrees of DQN's and DDPG's improvement in comparison to BEB were similar: the improvements over BEB are 7.89%, 11.94%, 9.27%, and 8.43% for 5, 15, 30, and 50 stations, respectively.

Fig. 4. Comparison of the network throughput in the dynamic scenario as the number of stations increases from 5 to 50.

Fig. 5 shows the mean CW for the dynamic scenario as a function of the number of episodes. As with the static scenario, after some episodes, the CW value remains stable around the same value.

Fig. 6 illustrates the instantaneous selected CW values and the number of stations as a function of the simulation time for the decentralized dynamic scenario, with the number of stations incrementally rising from 5 to 50 every 1.2 seconds. It is possible to observe that DQN varies between discrete neighboring CW values, while DDPG consistently raises the CW value, resulting in a lower CW value for 50 stations and subsequently improving the throughput.
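The qualitative difference visible in Fig. 6, where DQN jumps between neighboring discrete CW values while DDPG adjusts the CW smoothly, follows from the two action spaces and their exploration noise. The short sketch below illustrates this with an arbitrary greedy action and an assumed decaying noise schedule; it does not reproduce the trained agents.

import numpy as np

rng = np.random.default_rng(0)

def cw_from_action(a):
    # Map an action in [0, 6] to a contention window: CW = floor(2^(a+4)) - 1.
    return int(np.floor(2.0 ** (a + 4))) - 1

def dqn_action(greedy, epsilon):
    # Discrete case: with probability epsilon pick a random action in {0, ..., 6}.
    return int(rng.integers(0, 7)) if rng.random() < epsilon else greedy

def ddpg_action(greedy, sigma):
    # Continuous case: add Gaussian noise to the actor's output and clip to [0, 6].
    return float(np.clip(greedy + rng.normal(0.0, sigma), 0.0, 6.0))

# Arbitrary greedy actions near a = 4; the noise scale decays as training progresses.
for step, decay in enumerate((1.0, 0.5, 0.1)):
    print(step,
          cw_from_action(dqn_action(greedy=4, epsilon=0.3 * decay)),
          cw_from_action(ddpg_action(greedy=4.2, sigma=0.5 * decay)))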
Fig. 5. Mean CW value for 30 stations in the dynamic scenario.

Fig. 6. Selected CW value with the number of stations increasing from 5 to 50 in the dynamic scenario.

Fig. 7 compares the instantaneous network throughput in the decentralized dynamic scenario, where the number of stations progressively increases from 5 to 50. The elevated number of stations modifies the CW value, affecting the instantaneous network throughput. The BEB's throughput decays to approximately 26 Mbps when the number of stations connected to the AP reaches 50, unlike the decentralized mode, which yields a throughput of 35.8 Mbps, an increase of 37.16% over BEB. The proposed DRL algorithms (with either DQN or DDPG) present an almost constant behavior, keeping a high and stable throughput as the number of stations progressively increases. These findings make the DRL algorithms well suited to address the collision avoidance challenges in dense wireless networks.

Fig. 7. Comparison of the instantaneous network throughput in the dynamic scenario.

IV. CONCLUSIONS

This work has proposed a decentralized solution using multiple self-learning agents with RL algorithms (DQN and DDPG) to optimize the CW parameter and enhance network throughput. Simulation results have shown that this approach outperforms the traditional BEB, with DQN and DDPG achieving up to a 37.16% increase in throughput with 50 stations. Both DRL algorithms have performed similarly, making either suitable for improving network efficiency in static and dynamic environments. These findings have confirmed the effectiveness of DRL algorithms in addressing collision avoidance in WLANs. Future work could focus on station cooperation through information sharing, which is crucial for agents to optimize network-wide rewards.

REFERENCES

[1] S. Giannoulis, C. Donato, R. Mennes, F. A. P. de Figueiredo, I. Jabandžic, Y. De Bock, M. Camelo, J. Struye, P. Maddala, M. Mehari, A. Shahid, D. Stojadinovic, M. Claeys, F. Mahfoudhi, W. Liu, I. Seskar, S. Latre, and I. Moerman, "Dynamic and collaborative spectrum sharing: The SCATTER approach," in 2019 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2019, pp. 1–6.
[2] X. Guo, S. Wang, H. Zhou, J. Xu, Y. Ling, and J. Cui, "Performance evaluation of the networks with Wi-Fi based TDMA coexisting with CSMA/CA," Wireless Personal Communications, vol. 114, no. 2, pp. 1763–1783, 2020.
[3] P. Patel and D. K. Lobiyal, "A simple but effective collision and error aware adaptive back-off mechanism to improve the performance of IEEE 802.11 DCF in error-prone environment," Wireless Personal Communications, vol. 83, pp. 1477–1518, 2015.
[4] B.-J. Kwak, N.-O. Song, and L. E. Miller, "Performance analysis of exponential backoff," IEEE/ACM Transactions on Networking, vol. 13, no. 2, pp. 343–355, 2005.
[5] S. J. Sheila de Cássia, M. A. Ouameur, and F. A. P. de Figueiredo, "Reinforcement learning-based Wi-Fi contention window optimization," Journal of Communication and Information Systems, vol. 38, no. 1, pp. 128–143, 2023.
[6] R. Mennes, F. A. P. De Figueiredo, and S. Latré, "Multi-agent deep learning for multi-channel access in slotted wireless networks," IEEE Access, vol. 8, pp. 95032–95045, 2020.
[7] Y. Hu, M. Chen, Z. Yang, M. Chen, and G. Jia, "Optimization of resource allocation in multi-cell OFDM systems: a distributed reinforcement learning approach," in IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, 2020, pp. 1–6.
[8] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. USA: Prentice Hall Press, 2009.
[9] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "A brief survey of deep reinforcement learning," arXiv preprint arXiv:1708.05866, 2017.
[10] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
[11] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
[12] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
[13] D. Huh and P. Mohapatra, "Multi-agent reinforcement learning: A comprehensive survey," arXiv preprint arXiv:2312.10256, 2023.
[14] A. R. Cassandra, "A survey of POMDP applications," in Working Notes of the AAAI 1998 Fall Symposium on Planning with Partially Observable Markov Decision Processes, vol. 1724, 1998.
[15] P. Gawłowicz and A. Zubow, "NS3-gym: Extending OpenAI Gym for networking research," arXiv preprint arXiv:1810.03943, 2018.
