A Q-Learning-Based Adaptive MAC Protocol For Internet of Things Networks
Digital Object Identifier 10.1109/ACCESS.2021.3103718
ABSTRACT In Internet of Things (IoT) applications, quality of service (QoS) is often required, whether as throughput for transmitting video or as bounded delay for controlling a sensor node. A traditional contention-based medium access control (MAC) protocol cannot meet the adaptive traffic demands of these networks and imposes delay-related constraints. Q-learning (QL), one of the reinforcement learning (RL) mechanisms, is a promising machine learning scheme for future spectrum MAC protocols in IoT networks. In this study, a QL-based MAC protocol is proposed to facilitate adaptive adjustment of the length of the contention period in response to the ongoing traffic rate in IoT networks. The novelty of the QL-based MAC lies in its use of RL to adjust the length of the contention period dynamically according to the traffic rate; the protocol adapts to environmental variations during training without requiring additional input information. We confirm that the proposed QL-based MAC protocol is robust under node contention. In addition, we show that our proposed QL-based MAC protocol achieves higher system throughput, lower end-to-end delay, and lower energy consumption in MAC contention than contention-based MAC protocols.
INDEX TERMS Internet of Things, quality of service, medium access control, reinforcement learning,
Q-learning.
One standard for wireless network design is defined in the IEEE 802.11 protocol, which is used extensively as a testbed and as a reference model for simulation in wireless network studies. IEEE 802.11 defines the request-to-send/clear-to-send (RTS/CTS) scheme to overcome the hidden terminal problem in wireless networks. When many nodes simultaneously send the RTS control frame in the contention period, a collision occurs [9]. Furthermore, if the receiver node cannot reply with the CTS control frame after receiving the RTS control frame, the connection also fails. This scheme can be used to solve the hidden terminal problem in wireless networks, which improves the system throughput.

Accordingly, a TDMA scheme based on carrier sense multiple access with collision avoidance (CSMA/CA) has been proposed and termed ‘‘hybrid-TDMA’’ [10]. In this scheme, a shorter end-to-end delay and higher channel utilization for data transmission can be achieved. Different approaches have been used to create systems that adapt dynamically to network traffic. In [11], [12], node density and mobility speed are used as the reference conditions for determining the length of the contention period. This ensures that system performance is maintained when the traffic rate changes.

In [13], the authors propose a delay-collision CSMA (DC-CSMA) scheme based on nonpersistent CSMA/CA. The high system performance of the DC-CSMA scheme is achieved by balancing the contention probability and channel access time. A low contention latency and a high successful channel access probability are achieved using DC-CSMA under a nonuniform contention probability distribution. DC-CSMA also achieves robustness under a dynamic number of contenders, contention window size, and packet size.

In [14], the authors propose a contention window control mechanism for wireless networks based on IEEE 802.11. The size of the contention window is determined by the traffic load, transmission rate, and packet size. The authors also develop a model to demonstrate the system performance in terms of collision probability, collision time, and back-off time. In [15], the authors propose a token adaptive MAC protocol (token-based MAC) for a mobile network. The token-based MAC protocol uses a two-hop wireless mobile network with a fixed licensed channel. Each node can send packets when it receives a token, and the length of the token cycle corresponds to the number of nodes. Therefore, the cycles are long when token-based MAC is used with a large number of nodes, which results in increased propagation delays.

In traditional MAC protocols, a fixed length of the contention period is designed into the MAC contention mechanism. Many studies have used the number of wireless network nodes to define the length of the contention period, which does not allow adaptation to network traffic rates. This, in turn, reduces the system throughput and leads to a longer MAC contention delay than if adaptive lengths were used.

Machine learning (ML) can enable machines to perform as intelligent tools in IoT-enabled wireless networks, which can independently access system resources based on the condition of the wireless channels. Q-learning (QL) is a reinforcement learning (RL) scheme that may also serve as an ML strategy for facilitating IoT networks in the future [16]. ML, RL, and QL have been previously investigated with regard to their potential roles in wireless networks. In [17], a novel Markov decision process (MDP) model is proposed. Using this model, the highest throughput can be achieved by adjusting the power and probabilities of node transmission in the wireless networks. A model-free RL mechanism is also employed over many states to solve the centralized MDP model.

Future research on Q-learning-based MAC protocols with duty-cycling mechanisms is expected to focus on achieving energy efficiency and delay awareness [18]. In [19], a power control scheme based on game theory is proposed. The maximum energy efficiency is determined by the selection of the source nodes, and the data transmission rate is maximized by the selection of the relay nodes. This selection process is achieved using a QL-based algorithm to identify convergence to the best Nash equilibrium points. In [20], the authors propose an ML-based mechanism for cognitive radio technology. In this approach, the efficiency of opportunistic spectrum access is determined by a QL algorithm; as a result, prior knowledge of the environment's characteristics is not required. In addition, identifying the ongoing interaction between radio nodes and the environment by this method facilitates increased performance.

In [21], the authors proposed a Q-learning-based packet-flow ALOHA protocol that applies a simple Q-learning mechanism without an ACK control framework to achieve a collision-free protocol. When a packet is received successfully, the reward is +1, and it is −1 otherwise. The node selects its slot according to the Q-value to mitigate collisions and to achieve a collision-free protocol.

In [22], a novel RL-based, model-free scheme is proposed for wireless networks that combines expected Sarsa and an eligibility trace. In this scheme, all values of the possible successive actions are averaged to update the targets, which reduces the variance caused by random sampling. In [23], the authors propose a deep-RL MAC (DRL-MAC) protocol for heterogeneous wireless networks that considers the time-slot sharing problem for multiple different MAC protocols in multiple time-slotted networks.

In [24], the authors propose a QL-based MAC (QL-MAC) protocol to solve the sleep-wake scheduling problem in wireless networks. QL-MAC is a distributed QL mechanism and can accordingly adapt to dynamic traffic load. The QL-MAC protocol can increase network lifetimes and packet delivery ratios compared to other MAC protocols for wireless sensor networks. In [25], the authors propose an RL-based adaptive MAC (RLA-MAC) protocol for wireless sensor networks to optimize the sleep-wake schedule of sensor nodes and thus reduce energy consumption in MAC contention. Whereas most protocols use adaptive duty cycles to optimize energy utilization,
in RLA-MAC, each node receives information and actively infers the status of other nodes. RLA-MAC then uses the RL mechanism to create a duty-cycle function based on traffic. This protocol enables high system throughput and low energy consumption under dynamic traffic conditions.

Thus, previous works have shown that RL can be applied to improve the system performance of wireless networks and reduce the collision probability. RL can also be applied to decrease the propagation delay and energy consumption while increasing system throughput. Therefore, RL can overcome the disadvantages of existing network protocols. Furthermore, in other environments, ML mechanisms have been widely used to solve problems without prior knowledge. Thus, in this study, a QL-based MAC protocol for an IoT network is proposed to improve the system performance. In this QL-based MAC protocol, the length of the contention period is adaptive and regulated by the QL algorithm to ensure the quality of service (QoS) of the system in an IoT network. Overall, the main contributions of this work are as follows:
1) We apply the QL algorithm in pervasive applications for an IoT network.
2) We propose a QL algorithm for controlling channel access in a contention-based IoT network.
3) We employ QL to adaptively regulate the length of the contention period for an IoT network. Our QL-based MAC protocol is formulated to enable adaptive contention period sizing according to traffic rates in an IoT network.
4) We propose a QL algorithm to generalize our QL-based MAC protocol for adaptive contention period sizing to achieve higher system performance.

The remainder of this paper is organized as follows. The preliminaries are introduced in Section II. The system model is described in Section III. The QL-based MAC protocol for IoT networks is presented in Section IV. Performance evaluation of the protocol is discussed in Section V, and the final section presents our conclusions.

II. PRELIMINARIES
A. CSMA/CA AND TDMA
For the communications of IoT networks, one important issue is whether the MAC protocol is scalable. IoT networks may involve a large number of IoT nodes, and with the increasing applications of IoT networks, the density of IoT nodes is increasing. In addition, because nodes keep leaving and joining the IoT network, the number of contending IoT nodes is dynamic. Therefore, an important requirement is to propose a MAC protocol with scalability for IoT networks that adapts to the variation in the number of IoT nodes [26]. When the number of contending nodes increases for contention-based MAC protocols, the collision probability increases due to hidden terminal problems, and the system performance decreases. IEEE 802.11 CSMA/CA is one contention-based MAC protocol that has no scalability with respect to an increase in IoT nodes [27].

In [28], the authors proposed a time division multiple access (TDMA) slot assignment protocol to improve the system performance. The length of the contention period of the TDMA protocol is adaptive and can be adjusted dynamically as the number of nodes increases. The length of the contention period is increased by a power of 2 when the unassigned time slots are insufficient for new nodes. A collision-free MAC protocol is achieved owing to the dynamic length of the contention period for different numbers of nodes.

In [29], [30], the authors proposed hybrid MAC protocols, which exhibit effective performance under low traffic load. However, the system performance deteriorates because of the high collision probability when the traffic load is high. The hybrid MAC protocol is composed of the CSMA/CA and TDMA protocols. For wireless networks, the system performance is determined by the contention conditions. The hybrid MAC protocol can achieve high system performance under dynamic traffic load. However, a scalable network for hybrid MAC protocols cannot be achieved due to the accumulation of high traffic load.

A token-based MAC protocol was proposed for an IoT network in [15]. In token-based MAC protocols, the number of time slots of the contention period is equal to the number of nodes in the IoT network. Each node must wait to send a packet until it receives the token. If the node receives the token and has no packets to transmit, the token is released and randomly rotated to another node. Therefore, the propagation delay of the token-based MAC protocol for an IoT network is longer than that of a traffic-adaptive MAC protocol, and the system throughput is lower. Therefore, an efficient traffic-adaptive MAC protocol for IoT networks is important, and the current barriers to its implementation are significant challenges to be overcome.

B. Q-LEARNING IN MARKOVIAN ENVIRONMENTS
The selection of the length of the contention period is a trade-off problem. When the length of the contention period is large enough to support high IoT network traffic, no collision may occur in the contention period. However, the length of the contention period need not be large for low traffic loads: a long contention period under a low traffic load leads to a long packet propagation delay. In that case, many IoT nodes have to wait for a long time, and the channel utilization is also low. Therefore, the QL algorithm is applied to adjust the length of the contention period dynamically.

The QL algorithm does not require complex computational capability from MAC controllers and has a low communication overhead for IoT network nodes. In wireless networks, the acknowledgment of reception is applied to verify the reliability of unicast communication. Most contention-based MAC protocols have this function to confirm whether the packet is received by the receiver or not. Therefore, contention-based MAC protocols have a high overhead because of the acknowledgement scheme [31].
promptly and upload it to an IoT destination during a specified time slot. To send the data to the receiver, each IoT sender must create a connection; this first requires it to contend with the other nodes in the contention period, following which the successfully contending IoT sender can transmit data to the IoT receiver. The destination of a connection can be an IoT cluster head or other IoT nodes; however, the contention period in an IoT network should be able to adapt to the traffic by adjusting the length of the contention field. After the contention period is completed, the cluster head broadcasts the status of transmission and the length of the next contention period.

In this study, the QL algorithm is applied to IoT networks to dynamically adjust the length of the contention period according to the change in traffic rate. Therefore, we introduce the RL algorithm for IoT networks here. In Fig. 1, the agent is defined as the learner and labelled as a decision node. In addition to the agent, the environment interacts with all other network components. The agent selects the action, and the environment adapts to the resulting new situation and creates a reward. The agent seeks to maximize the reward over time by selecting the action.

The state information of the environment is received at each time step t by the agent, St ∈ S; subsequently, an action is selected by the agent, such that At ∈ A. The agent receives a reward, Rt ∈ R, after one time step based on the selected action, after which the agent moves to the new state, St+1 [32].

In our proposed system model, the agent is any IoT node. The state, St, is the length of the contention period at time t. The action, At, is the step size for adjusting the length of the contention period. The step size may be plus d time slots or minus d time slots with respect to the current length of the contention period. In addition, the step size may also be zero, in which case the next length of the contention period is the same as the current length; d is an integer. The reward, Rt, depends on the contention results of the channel access at time t.

Therefore, the interactions between the IoT node (agent) and the environment at time step t are as follows [22]:
1) The IoT node observes the environment in the IoT network and determines the current length of the contention period, St.
2) The IoT node decides the current step size for adjusting the length of the contention period according to the current length of the contention period, St, and then determines the next action, At.
3) The IoT node applies the selected action At, and after one time step the IoT node obtains the feedback reward Rt+1 from the IoT network.
4) The IoT node moves from the state St to the new state St+1.
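For concreteness, this state/action mapping can be captured in a few lines of C. The type names, the action encoding, and the 4-slot lower bound are illustrative assumptions for this sketch, not the authors' implementation.

/* Illustrative sketch of the state/action mapping described above. */
typedef int state_t;                                   /* St: contention-period length in slots */
typedef enum { DEC = 0, STAY = 1, INC = 2 } action_t;  /* At: -d, 0, or +d slots                */

/* Apply an action to the current length L with integer step size d;
   4 slots is assumed here to be the smallest admissible length. */
static state_t next_length(state_t L, action_t a, int d)
{
    if (a == DEC && L - d >= 4) return L - d;
    if (a == INC)               return L + d;
    return L;   /* STAY, or DEC clipped at the minimum length */
}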
IV. QL-BASED MAC PROTOCOL DESIGN
If the length of the contention period is fixed, there is a high collision probability in high-traffic environments when the contention period is short. Furthermore, there is a high propagation delay in low-traffic environments when the contention period is long. As such, we herein propose a traffic-adaptive MAC protocol for IoT networks. The regulation of the adaptive length of the contention period complies with the MDP formulation, and QL is used to design a MAC protocol that enables the cluster head to select the appropriate length of the contention period in the network based on experience gained from agent-environment interactions. The proposed QL-based MAC protocol features a QL-based scheme that adaptively regulates the length of the contention period to maximize the system throughput and minimize the propagation delay. Table 1 shows the symbols used in the performance evaluation of this approach.

A. QL-BASED MAC PROTOCOL FOR IoT NETWORKS
The adaptive-length contention period problem suits the MDP formulation. The QL algorithm is used to design an IoT MAC protocol that can adaptively adjust the length of the contention period dynamically based on the experience obtained in the IoT communication area. The proposed QL-based adaptive IoT MAC protocol adjusts the length of the contention period based on the broadcast of the cluster head in order to avoid contention collisions.

In the remaining parts of this section, we use the QL function (3) as a learning and self-improving control mechanism among multiple nodes that share the wireless channel of an IoT network. The basic procedure for this mechanism is as follows. A node that wants to create a connection sends an RTS control frame in the contention period. After the end of the contention period, the cluster head obtains the contention status of each slot in the contention period. The cluster head computes the reward using the reward function, which combines the contention results; the contention results are determined by the numbers of successful, collision, and idle slots. The Q-learning agent calculates the Q-value using the QL function (3). The Q-learning agent then takes the maximum Q-value from the Q-table and performs the corresponding action. This action determines the length of the next contention period. Therefore, the Q-learning agent dynamically adjusts the length of the next contention period, and the nodes then send their RTS control frames according to the new length of the contention period. This procedure is then repeated. The baseline operation of the proposed Q-learning-based MAC protocol is shown in Fig. 2.
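A compact C sketch of this per-contention-period loop is given below. It assumes the standard one-step Q-learning update for the QL function (3), which is not reproduced in this excerpt; the table dimensions, helper names, and random-number handling are illustrative rather than the authors' code.

#include <stdlib.h>

#define N_STATES  16   /* contention-period lengths 4, 4+d, ..., 4+15d (assumed range) */
#define N_ACTIONS 3    /* step sizes -d, 0, +d                                          */

static double Q[N_STATES][N_ACTIONS];   /* Q-table, all entries start at zero */

static int best_action(int s)
{
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > Q[s][best]) best = a;
    return best;
}

/* One learning step, run after the cluster head reports the contention results:
   move Q(s, a) toward reward + gamma * max_a' Q(s', a'), then pick the next
   action (step size for the contention-period length) epsilon-greedily. */
static int ql_step(int s, int a, double reward, int s_next,
                   double alpha, double gamma, double epsilon)
{
    double target = reward + gamma * Q[s_next][best_action(s_next)];
    Q[s][a] += alpha * (target - Q[s][a]);

    double p = (double)rand() / RAND_MAX;
    return (p < epsilon) ? rand() % N_ACTIONS : best_action(s_next);
}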
By observing the current state and receiving the reward, which provides the IoT node with a specific goal, the QL-based MAC protocol can satisfy IoT network applications. The IoT network can dynamically adjust the length of the contention period according to the traffic load. We can develop the reward function according to the performance metrics of the system, based on the mechanism of adapting the length of the contention period. In our proposed QL-based MAC protocol for IoT networks, time and spectrum access are opportunistically created in the beacon interval by division. There are contention and data periods in each beacon interval.
generating the reward in past situations. However, to discover such preferred actions, the agent cannot restrict itself to previously selected actions that have generated the reward; it must exploit the experience that can be used to obtain the reward, but it must also explore potential outcomes in order to identify better actions for future situations. As such, neither exploration nor exploitation can be carried out exclusively, which results in a dilemma [32].

The simplest action selection rule is to select one of the actions with the greatest estimated value. If there is more than one ‘‘greedy’’ action (an action that results in the maximum estimated value), then one of them is selected at random. The greedy action selection method can be expressed as [32]:

π(s) = argmax_a Q(s, a),   (5)

where argmax_a denotes the action a for which the expression is maximized.

Continuously exploiting the greedy mechanism of the Q-value cannot satisfy the first convergence condition because it does not correctly explore all (state, action) pairs. On the other hand, all (state, action) pairs are continuously explored when a purely random mechanism is used; however, a randomly exploring controller is sub-optimal. A compromise between these two extremes is the ε-greedy method [32], in which the greedy action is exploited with probability 1 − ε and a random action is explored with probability ε, so the balance between exploration and exploitation is guaranteed with good system performance.
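The following C fragment sketches the greedy rule of Eq. (5), including the random tie-breaking mentioned above, together with the ε-greedy compromise. The table layout mirrors the illustrative sketch in Section IV-A and is an assumption, not the authors' code.

#include <stdlib.h>

#define N_STATES  16
#define N_ACTIONS 3

static double Q[N_STATES][N_ACTIONS];

/* pi(s) = argmax_a Q(s, a); ties between greedy actions are broken at random. */
static int greedy_action(int s)
{
    int best[N_ACTIONS];
    int n_best = 1;
    best[0] = 0;
    for (int a = 1; a < N_ACTIONS; a++) {
        if (Q[s][a] > Q[s][best[0]]) {
            best[0] = a;
            n_best  = 1;
        } else if (Q[s][a] == Q[s][best[0]]) {
            best[n_best++] = a;
        }
    }
    return best[rand() % n_best];
}

/* epsilon-greedy: explore with probability epsilon, otherwise exploit Eq. (5). */
static int epsilon_greedy_action(int s, double epsilon)
{
    double p = (double)rand() / RAND_MAX;
    return (p < epsilon) ? rand() % N_ACTIONS : greedy_action(s);
}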
C. ε-GREEDY METHOD
From the first transmission of the RTS control packet, using the strategy defined by Eq. (6) can result in instant performance benefits. For an iterative algorithm such as QL, an initial condition is needed; after the first update, the initial condition is changed. For QL, the initial condition is always zero. The initial untrained Q-table is set as in Table 2.

TABLE 2. The initial Q-value table.

The first column in Table 2 represents the possible states and the first row represents the action space. Here, L denotes the current length of the contention period. The action (At) L − d denotes that the length of the contention period will be decreased by d, whereas the action (At) L + d denotes that the length of the contention period will be increased by d at time t. The action (At) L denotes that the length of the contention period will remain the same as the current length. Let the initial length of the contention period be 4, and let nd denote n times d. The length of the contention period then has the following states (St): 4, 4 + d, 4 + 2d, . . . , 4 + (n − 1)d, or 4 + nd time slots.
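As a concrete illustration, the untrained table of Table 2 can be initialized as below; the value of n, the index mapping, and the names are assumptions made only for this sketch.

#define N_ROWS 16          /* states St: 4, 4+d, ..., 4+15d (assumes n = 15) */
#define N_COLS 3           /* actions At: L-d, L, L+d                        */

static double Q[N_ROWS][N_COLS];

/* Map a contention-period length to its row index for a given step size d. */
static int state_index(int L, int d)
{
    return (L - 4) / d;
}

static void init_q_table(void)
{
    for (int s = 0; s < N_ROWS; s++)
        for (int a = 0; a < N_COLS; a++)
            Q[s][a] = 0.0;   /* for QL, the initial condition is always zero */
}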
Each node uses a controller that depends on the traffic rate and the length of the contention period. In this study, all the nodes are in a one-hop environment, and the destination of the traffic is the cluster head or another IoT node in the IoT network. The controller is trained a priori with γ = 0.9 and a decay period of 10000 s in a 100-node IoT network.

The convergence speed of QL algorithms depends on the application and its associated environmental complexities [34]. The ε-greedy method is used to focus the exploration of our proposed MAC protocol on the most probable contention-period-length trajectories. The scheme forces the agent, with a given probability, to sample (s, a) pairs over time. Therefore, the proposed QL-based adaptive MAC protocol can satisfy the convergence rules; however, further optimization of the convergence speed and system suitability is required.

The current state of knowledge is used to maximize the immediate reward by greedy action selection. However, greedy action selection spends no additional time sampling to see whether better options can be selected. Thus, ε-greedy action selection behaves greedily in most cases but, with a small probability ε, selects an action at random. The ε-greedy method is defined such that the exploratory action is not simply drawn from all actions with the same probability; the greedy action is instead selected independently based on the action-value estimates. As the number of steps increases, the number of samplings of each action approaches infinity, and Qt(a) is guaranteed to converge to q∗(a); this implies that the probability of selecting an optimal action converges to a value greater than 1 − ε.

When QL is applied in a new environment, the agent must explore and exploit the reward to gradually discover the optimal action At that maximizes the Q-value. Then, ε is defined as follows:

ε = e^(−Trun/Tsimu),   (6)

where Trun is the simulation running time and Tsimu is the system simulation time. Convergence to an optimal policy is guaranteed by the decay function in our proposed QL-based MAC for an IoT network.

The policy is based on the ε-greedy mechanism. The value of ε varies over time and is not constant. The starting value of ε is 1, and the agent initially explores different values randomly to cover all (state, action) pairs. The value of ε gradually becomes smaller over time and finally reaches its smallest value. This ε-greedy mechanism can achieve optimal convergence: once the optimal action has been learned, further exploratory actions are not needed. In our implementation, the ε value decreases as the simulation time increases. If the ε value reaches the minimum value (0.05), the portion of random exploration occupies 5 percent [31].

According to Eq. (6), the agent gradually limits the rate at which the reward obtained from random exploration overrides the current experience.
The agent thus becomes more confident in the obtained experience as time passes. Finally, the agent finds the optimal state and performs better control to avoid large oscillations of the L value.
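The decay schedule of Eq. (6), together with the 0.05 floor mentioned above, can be written directly in C; the function name is illustrative.

#include <math.h>

#define EPSILON_MIN 0.05

/* Eq. (6): epsilon = e^(-Trun/Tsimu), clamped at the 5% exploration floor. */
static double epsilon_schedule(double t_run, double t_simu)
{
    double eps = exp(-t_run / t_simu);
    return (eps < EPSILON_MIN) ? EPSILON_MIN : eps;
}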
D. REWARD FUNCTION FORMULATION
The QL agent gets a positive or negative reward depending on whether the action it learns in the IoT network is correct. The main purpose of IoT networks is to achieve low contention collision, which provides a high system throughput, low propagation delay, and low-energy MAC contention. This can be achieved by using the binary reward function: the agent gets a reward according to the results of contention.

Each node that wants to transmit data in an IoT network must send an RTS control packet in the RTS sub-period. After one contention period has ended, the cluster head collects the transmission status of each slot. There are three possible statuses for each slot: success, collision, and idle. The cluster head calculates the reward of Eq. (7) by assigning suitable values to the parameters Fsucc, Fcoll, and Fidle:

Rt = Fsucc ∗ Slotsucc − Fcoll ∗ AvgSend ∗ Slotcoll − Fidle ∗ Slotidle,   (7)

where Slotsucc, Slotcoll, and Slotidle denote the numbers of successful, collision, and idle slots in a contention period, respectively. AvgSend denotes the average number of transmissions until a successful transmission. Fsucc, Fcoll, and Fidle denote the impact factors of successful contention, collision contention, and idle time slots, respectively.
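Eq. (7) translates directly into a small C helper; the struct that carries the per-period contention statistics is an illustrative assumption.

/* Per-contention-period statistics collected by the cluster head. */
struct contention_stats {
    int    slot_succ;   /* Slot_succ: successful slots                   */
    int    slot_coll;   /* Slot_coll: collision slots                    */
    int    slot_idle;   /* Slot_idle: idle slots                         */
    double avg_send;    /* AvgSend: average transmissions until success  */
};

/* Eq. (7): Rt = Fsucc*Slotsucc - Fcoll*AvgSend*Slotcoll - Fidle*Slotidle. */
static double reward(const struct contention_stats *c,
                     double f_succ, double f_coll, double f_idle)
{
    return f_succ * c->slot_succ
         - f_coll * c->avg_send * c->slot_coll
         - f_idle * c->slot_idle;
}

With the values used later in Section V (Fsucc = 1.5, Fcoll = 0.55, Fidle = 0.4), successful slots raise the reward, while collision slots, weighted by AvgSend, are penalized more heavily than idle slots.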
Generally, the discount factor γ is set between 0.6 and 0.99 [31], and its determination is considered to be part of the problem. In addition, the estimation speed increases with an increase in the learning rate and then decreases over time as the learning rate decreases. If the learning rate α is set to 0, then no new information is learned; if the learning rate α is set to 1, then only the most recent information is considered.

Therefore, we set γ to 0.9 and α to 0.1. Subsequently, we regulate Fsucc, Fcoll, and Fidle under the above-mentioned γ (0.9) and α (0.1) values. The optimal length of the contention period for a traffic rate of 8 is approximately 4, and that for a traffic rate of 192 is approximately 16, when using zero primary user (PU) channels in the active ON state [35]. We regulate Fsucc, Fcoll, and Fidle to observe whether the state is stable while the traffic rate is low (traffic rate: 8) or high (traffic rate: 192). First, we let Rt = 0 be a baseline for determining Fsucc, Fcoll, and Fidle. Of course, this question has multiple groups of answers. We let Fsucc be higher than Fcoll and Fidle so that a positive reward can subsequently be obtained in the learning process. In addition, the number of collision time slots is more influential than the number of idle time slots for a negative reward; hence, we let Fcoll be higher than Fidle. Therefore, Rt may be positive or negative owing to slight changes in Slotsucc and Slotcoll under the above-mentioned regulations of Fsucc, Fcoll, and Fidle. The agent can then keep learning using QL. Subsequently, state stability is achieved by sustained regulation of Fsucc, Fcoll, and Fidle. Thus, the values of Fsucc, Fcoll, and Fidle are obtained.

After Fsucc, Fcoll, and Fidle are determined, we again regulate γ and α. According to the QL function (3), we first performed the experiments using γ = 0.9 and α = 0.5. If the discount factor γ is set too small, then the Q-value obtained by the learning scheme will always be negative under a small value of γ maxa Q(St+1, a) − Q(St, At), even if the reward (Rt) is positive. To prevent the learning step size from being too slow, the learning rate may need to be increased; to prevent the learning step size from being too fast, the learning rate should be reduced. We use a step decay schedule to decrease the learning rate by a factor every few epochs and observe the variations of the state. We observed that the state was stable under γ = 0.9 and α = 0.1.
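A step decay schedule of the kind described above can be sketched as follows; the decay factor and the number of epochs per drop are placeholders, since the paper does not report them.

#include <math.h>

/* Step decay: multiply the initial learning rate by `factor` once every
   `epochs_per_drop` epochs (integer division keeps alpha piecewise constant). */
static double alpha_step_decay(double alpha0, int epoch,
                               double factor, int epochs_per_drop)
{
    return alpha0 * pow(factor, (double)(epoch / epochs_per_drop));
}

/* Example: alpha_step_decay(0.5, epoch, 0.5, 10) halves alpha every 10 epochs. */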
After Fsucc, Fcoll, Fidle, γ, and α are determined, to achieve a stable state and optimal throughput, we also perform different numbers of iterations based on the above-mentioned Fsucc, Fcoll, Fidle, γ, and α values. We consider a simulation time of 10,000 s as one iteration. State stability is achieved when the number of iterations is 10 in our simulation. Therefore, the convergence time is 10 iterations.

In the Algorithm (Q-Learning Adaptive MAC Protocol), the action is selected based on Pε. When Pε < ε, the length of the contention period for the next period is randomly selected from (Lt − d, Lt, Lt + d). Otherwise, the action is decided by the controller (Eq. (5)). Pε is randomly selected from (0, 1).

In this design, the agent must overcome the problem that each state has a different reward gradient. A more detailed reward function provides more information and thus speeds up the convergence of the algorithm. The feedback for specified goals returns different rewards, so L converges faster and provides higher system performance.

In fact, we can let the Q-learning agent prefer some specified L over other L values. If the contention results have many successful slots, few collision slots, and few idle slots, this L is preferred. If the contention results have few successful slots, many collision slots, and many idle slots, this L is not preferred. Therefore, the adaptation of L can be achieved using the numbers of successful, collision, and idle slots. We therefore define the reward function to be determined by the numbers of successful, collision, and idle slots in the contention period. If the number of successful slots in the contention period increases, the system throughput increases. If the number of collision slots decreases, the energy consumption decreases. A high number of successful slots and a low number of collision slots for a fixed traffic rate mean that the end-to-end delay decreases.
Algorithm: Q-Learning Adaptive MAC Protocol
01: Initialize Q0(L, A) at t = 0
02: L0 = 4 at t = 0
03: if Trun < Tsimu then
04:     ε, α ←− decay function
05: else
06:     ε, α ←− constant
07: end if
08: procedure Action-selection(Lt, d)
09:     randomly select Pε ∈ (0, 1)
10:     if Pε < ε then
11:         At+1 ←− random (Lt − d, Lt, Lt + d)
12:     else
13:         At+1 ←− Aπ
14:     end if
15:     Lt+1 ←− L(At+1)
16: end procedure
17: procedure Feedback(Lt+1, At+1)
18:     set Fsucc, Fcoll and Fidle
19:     cluster head collects Slotsucc, Slotcoll and Slotidle
20:     cluster head calculates AvgSend for each node
21:     cluster head calculates the reward of Eq. (7):
22:     Rt = Fsucc ∗ Slotsucc − Fcoll ∗ AvgSend ∗ Slotcoll − Fidle ∗ Slotidle
23:     update Q(Lt+1, At+1)
24:     Action-selection(Lt, d)
25: end procedure
V. PERFORMANCE EVALUATION OF AN IoT NETWORK
In this paper, the QL algorithm for our proposed QL-based MAC protocol is applied offline in the IoT network. In this section, the simulation results for the proposed QL-based MAC protocol in an IoT network are presented. The C programming language was used to implement the simulation. For hybrid MAC and token-based MAC, the length of the contention period is fixed; for QL-based MAC, the length of the contention period is adjusted dynamically according to the traffic rate. The lengths of the contention period of the token-based MAC and hybrid MAC protocols were fixed as 4, 8, 12, 16, 20, 24, 28, or 32 slots.

In token-based MAC, the length of the contention period was fixed for the simulation. Whether a node could transmit data was determined by whether it could access the token. If a node wanted to transmit data and received the token, then the transmission was successful. Conversely, if a node wanted to transmit data and could not receive the token, then the transmission failed. The IoT sender transmitted the data and determined the next owner of the token randomly. Here, we considered the effect of sending a token to be similar to that of the RTS control frame in hybrid MAC and IoT-based MAC. Thus, the end-to-end delay of token-based MAC increased under low traffic loads. With greater end-to-end propagation delays, the system performance also decreased.

The length of the contention period is also fixed in hybrid MAC. In this case, each IoT node transmitted the RTS control frame randomly during the contention period. Here, we only considered the RTS control frame in hybrid MAC. After the transmission of the RTS, the cluster head broadcasted the BTC control frame. If the contention of the RTS was successful, then the connection was created successfully. The probability of collision increased under high traffic, as did the end-to-end delay of CSMA/CA systems. The system performance of the IoT network when CSMA/CA was used also decreased because of the longer end-to-end delay.

If a node wants to transmit data, it must exchange and negotiate information on the control-frame licensed channel. Therefore, token-based MAC and hybrid MAC schemes will have low system performance under the variable traffic load in an IoT network.

A uniform distribution with various overall loads is assumed for the traffic in IoT networks. The traffic is the ratio of the arrival rate to the departure rate. The arrival rate is defined as the number of new connections created per second, whereas the departure rate is defined as the number of connections ended per second. The inverse of the departure rate is also the lifetime of each connection.

In our simulation, the departure rate was fixed at 0.5, and the desired traffic load is obtained by varying the arrival rate. Therefore, if the arrival rate is 2, then the overall load is 4, which means that the average number of active connections in the network is 4. The arrival rate was set as 1, 2, 4, 8, 16, 32, 64, 96, 128, 160, 192, 224, or 256. The system load is defined as the arrival rate divided by the departure rate. For example, if the arrival rate is 16, then the system load is 32, which indicates that there is an average of 32 active connections in the IoT network at a given time. The optimal/near-optimal length of the contention period is determined by selecting the maximum Q-value for each state of the length of the contention period.

The system performance of QL-based MAC is compared with those of token-based MAC and hybrid MAC in this section. Here, the relationship between nodes in the IoT network is one-hop. For our proposed QL-based MAC protocol, the length of the contention period is dynamically adjusted according to the traffic rate. The bandwidth of the unlicensed channel is 2 Mbps. The energy consumption of transmitting and receiving each control frame is 1,675 and 1,425 mW, respectively. We also set 1 time slot = 1 s. Here, we focus on the one-hop IoT environment. For a square region of 200 m × 200 m, the largest distance between any two objects is 282.8 m; therefore, we set the transmission range of an IoT node to 300 m. Table 3 shows the other parameters of our proposed QL-based MAC protocol for IoT networks.

Regarding QL training and evaluation, the values of Fsucc, Fcoll, and Fidle are set as 1.5, 0.55, and 0.4, respectively. In addition, the discount factor γ is set at 0.9, and the learning rate α is set at 0.1. In the simulation process, 30 seeds were used to create 30 topologies, and the simulation results for these thirty seeds were summed and averaged. The system throughput, end-to-end delay, and energy consumption were used as the performance metrics.
TABLE 3. Parameters for our proposed QL-based MAC scheme.

arrival rate of 256 and a contention period length of 32 slots; the throughput ranged from 0.184 to 1.016 Mbps.

When using hybrid MAC, the throughput increased at the beginning of the simulation as the traffic rate increased for the different contention period lengths. However, the throughput plateaued once a certain traffic rate was reached, beyond which it gradually decreased.

For token-based MAC, the probability of simultaneously holding the token and having data to send increases with the traffic rate. Therefore, the throughput increases with the traffic rate; however, the throughput is accordingly low while the traffic rate is low.

For QL-based MAC, however, the length of the contention period dynamically adapts to different traffic rates using the QL algorithm. Therefore, the throughput of QL-based MAC systems can always increase for different traffic rates, which differs from hybrid MAC and token-based MAC in IoT networks.

FIGURE 3. System throughput for QL-based MAC, hybrid MAC and token-based MAC as a function of the arrival rate in an IoT network.

B. END-TO-END DELAY
The average end-to-end delay for MAC contention is denoted by E[Tdelay]. The MAC delay per connection is defined as the average time elapsed between the generation of a frame and its successful and complete receipt. In the QL-based MAC protocol, the MAC delay per connection is the sum of the MAC delays that occur prior to the complete transmission of a packet. Let t_delay^j denote the contention delay of the jth trial for a connection involved in a MAC contention in an IoT network. E[Tdelay] can then be computed as follows [38]:

E[Tdelay] = Σ_{i=1..Csucc} Σ_{j=1..ntrial, conn=i} t_delay^j.   (9)
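Eq. (9) can be evaluated with a simple double loop; the data layout below (one record per successful connection) is an illustrative assumption.

/* One record per connection that completed a MAC contention. */
struct connection {
    int     n_trial;    /* n_trial: number of contention trials for this connection */
    double *t_delay;    /* t_delay[j]: contention delay of the (j+1)th trial        */
};

/* Eq. (9): sum t_delay^j over all trials of all Csucc successful connections. */
static double total_mac_delay(const struct connection *conn, int c_succ)
{
    double sum = 0.0;
    for (int i = 0; i < c_succ; i++)
        for (int j = 0; j < conn[i].n_trial; j++)
            sum += conn[i].t_delay[j];
    return sum;
}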
FIGURE 5. Energy consumption for MAC contention of successful transmission by IoT nodes for QL-based MAC, hybrid MAC and token-based MAC protocols as a function of arrival rate in an IoT network.

FIGURE 6. Energy consumption of failed MAC contention of nodes in QL-based MAC, hybrid MAC and token-based MAC protocols as a function of the arrival rate in an IoT network.
with one unlicensed channel. The maximum throughput using Sarsa-based MAC was 1.300 Mbps for an arrival rate of 64; the throughput using Sarsa-based MAC ranged from 1.077 to 1.300 Mbps. The highest energy consumption for MAC contention of successful transmission using Sarsa-based MAC, shown on the right side of Fig. 7, is 3,363,500 mW for an arrival rate of 192; the energy consumption for MAC contention of successful transmission ranged from 1,695,080 to 3,363,500 mW.

For Sarsa-based MAC, the length of the contention period dynamically adapts to different traffic rates, as in the QL algorithm. Therefore, the throughput of Sarsa-based MAC systems can also always increase for different traffic rates, as with the QL algorithm in IoT networks. In addition, for Sarsa-based MAC, the energy consumption for MAC contention of successful transmission is less than that of hybrid MAC and greater than that of token-based MAC, similar to the QL algorithm in an IoT network.

However, the throughput index ζ of the Sarsa-based MAC is smaller than that of the QL-based MAC owing to the different Q functions. The energy consumption for MAC contention of successful transmission is determined by the number of successful transmissions. Therefore, the energy consumption for MAC contention of successful transmission of the Sarsa-based MAC is smaller than that of the QL-based MAC.
VI. CONCLUSION
The length of the contention period of a traditional MAC protocol is fixed in IoT networks. The collision probability increases for a fixed length under high traffic; in contrast, the slot utilization decreases for a fixed length under low traffic. In an IoT network, a node that has high power or is connected to the wired network can be selected as the cluster head. The cluster head is responsible for collecting the slot contention information from the nodes and broadcasting the suitable length of the next contention period. Therefore, the proposed QL-MAC protocol is feasible, and the optimal length of the next contention period can be achieved by using the Q-learning algorithm. The proposed QL-based MAC protocol can dynamically adjust the length of the contention period to achieve a high-throughput, energy-efficient IoT network with a short end-to-end delay. The reduction of idle slots in the QL-based MAC protocol exceeds that in hybrid MAC and token-based MAC due to the dynamic contention period. Therefore, the increase in the system throughput of QL-based MAC is greater than that of hybrid MAC and token-based MAC in IoT networks. Furthermore, the proposed QL-based MAC scheme had a shorter MAC delay and higher throughput than token-based MAC; however, token-based MAC had lower energy consumption due to the rotating token mechanism. The simulation results showed that the greatest reduction in end-to-end delay using QL-based MAC compared to hybrid MAC and token-based MAC was 87.3% (for a contention period length of 32 slots with traffic rate = 64). The simulation results also showed that the maximum throughput using QL-based MAC compared to hybrid MAC and token-based MAC was increased by 101.7% and 344.3%, respectively (for a contention period length of 32 slots with traffic rate = 64). Therefore, the proposed QL-MAC protocol is worth using instead of classical MAC protocols.

ACKNOWLEDGMENT
The authors would like to thank the editor and the reviewers for their valuable comments and suggestions.

REFERENCES
[1] A. A. Khan, M. H. Rehmani, and A. Rachedi, ‘‘Cognitive-radio-based Internet of Things: Applications, architectures, spectrum related functionalities, and future research directions,’’ IEEE Wireless Commun., vol. 24, no. 3, pp. 17–25, Jun. 2017.
[2] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, ‘‘Internet of Things (IoT): A vision, architectural elements, and future directions,’’ Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, Sep. 2013.
[3] H. Nishiyama, T. Ngo, S. Oiyama, and N. Kato, ‘‘Relay by smart device: Innovative communications for efficient information sharing among vehicles and pedestrians,’’ IEEE Veh. Technol. Mag., vol. 10, no. 4, pp. 54–62, Dec. 2015.
[4] J. Liu and N. Kato, ‘‘A Markovian analysis for explicit probabilistic stopping-based information propagation in postdisaster ad hoc mobile networks,’’ IEEE Trans. Wireless Commun., vol. 15, no. 1, pp. 81–90, Jan. 2016.
[5] P. Wang, H. Jiang, and W. Zhuang, ‘‘Capacity improvement and analysis for voice/data traffic over WLANs,’’ IEEE Trans. Wireless Commun., vol. 6, no. 4, pp. 1530–1541, Apr. 2007.
[6] J. L. Sobrinho and A. S. Krishnakumar, ‘‘Quality-of-service in ad hoc carrier sense multiple access wireless networks,’’ IEEE J. Sel. Areas Commun., vol. 17, no. 8, pp. 1353–1368, Aug. 1999.
[7] S. Jiang, J. Rao, D. He, X. Ling, and C. C. Ko, ‘‘A simple distributed PRMA for MANETs,’’ IEEE Trans. Veh. Technol., vol. 51, no. 2, pp. 293–305, Mar. 2002.
[8] P. Wang and W. Zhuang, ‘‘A collision-free MAC scheme for multimedia wireless mesh backbone,’’ IEEE Trans. Wireless Commun., vol. 8, no. 7, pp. 3577–3589, Jul. 2009.
[9] Wireless LAN Media Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Standard 802.11, 1999.
[10] R. Zhang, L. Cai, and J. Pan, ‘‘Performance analysis of reservation and contention-based hybrid MAC for wireless networks,’’ in Proc. IEEE Int. Conf. Commun., Cape Town, South Africa, May 2010, pp. 1–5.
[11] M. Wang, Q. Shen, R. Zhang, H. Liang, and X. Shen, ‘‘Vehicle-density-based adaptive MAC for high throughput in drive-thru networks,’’ IEEE Internet Things J., vol. 1, no. 6, pp. 533–543, Dec. 2014.
[12] W. Alasmary and W. Zhuang, ‘‘Mobility impact in IEEE 802.11p infrastructureless vehicular networks,’’ Ad Hoc Netw., vol. 10, no. 2, pp. 222–230, Mar. 2012.
[13] N. Cordeschi, F. De Rango, and M. Tropea, ‘‘Exploiting an optimal delay-collision tradeoff in CSMA-based high-dense wireless systems,’’ IEEE/ACM Trans. Netw., early access, Jun. 30, 2021, doi: 10.1109/TNET.2021.3089825.
[14] J. Choi, S. Byeon, S. Choi, and K. B. Lee, ‘‘Activity probability-based performance analysis and contention control for IEEE 802.11 WLANs,’’ IEEE Trans. Mobile Comput., vol. 16, no. 7, pp. 1802–1814, Jul. 2017.
[15] Q. Ye and W. Zhuang, ‘‘Token-based adaptive MAC for a two-hop Internet-of-Things enabled MANET,’’ IEEE Internet Things J., vol. 4, no. 5, pp. 1739–1753, Oct. 2017.
[16] R. Ali, Y. A. Qadri, Y. B. Zikria, T. Umer, B.-S. Kim, and S. W. Kim, ‘‘Q-learning-enabled channel access in next-generation dense wireless networks for IoT-based eHealth systems,’’ J. Wireless Commun. Netw., vol. 178, pp. 1–12, Jul. 2019.
[17] G. Naddafzadeh-Shirazi, P.-Y. Kong, and C.-K. Tham, ‘‘Distributed reinforcement learning frameworks for cooperative retransmission in wireless networks,’’ IEEE Trans. Veh. Technol., vol. 59, no. 8, pp. 4157–4162, Oct. 2010.
[18] S. Sarwar, R. Sirhindi, L. Aslam, G. Mustafa, M. M. Yousaf, and S. W. U. Q. Jaffry, ‘‘Reinforcement learning based adaptive duty cycling in LR-WPANs,’’ IEEE Access, vol. 8, pp. 161157–161174, 2020.
[19] F. Shams, G. Bacci, and M. Luise, ‘‘Energy-efficient power control for multiple-relay cooperative networks using Q-learning,’’ IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1567–1580, Mar. 2015.
[20] P. Venkatraman, B. Hamdaoui, and M. Guizani, ‘‘Opportunistic bandwidth sharing through reinforcement learning,’’ IEEE Trans. Veh. Technol., vol. 59, no. 6, pp. 3148–3153, Jul. 2010.
[21] I. B. Alhassan and P. D. Mitchell, ‘‘Packet flow based reinforcement learning MAC protocol for underwater acoustic sensor networks,’’ Sensors, vol. 21, no. 7, p. 2284, 2021.
[22] H. Jiang, R. Gui, Z. Chen, L. Wu, J. Dang, and J. Zhou, ‘‘An improved Sarsa(λ) reinforcement learning algorithm for wireless communication systems,’’ IEEE Access, vol. 7, pp. 115418–115427, 2019.
[23] Y. Yu, T. Wang, and S. C. Liew, ‘‘Deep-reinforcement learning multiple access for heterogeneous wireless networks,’’ in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, USA, May 2018, pp. 20–24.
[24] S. Galzarano, A. Liotta, and G. Fortino, ‘‘QL-MAC: A Q-learning based MAC for wireless sensor networks,’’ in Proc. ICAPP, Vietri sul Mare, Italy, Dec. 2013, pp. 267–275.
[25] Z. Liu and I. Elhanany, ‘‘RL-MAC: A QoS-aware reinforcement learning based MAC protocol for wireless sensor networks,’’ in Proc. IEEE Int. Conf. Netw., Sens. Control, Ft. Lauderdale, FL, USA, Apr. 2006, pp. 768–773.
[26] A. Rajandekar and B. Sikdar, ‘‘A survey of MAC layer issues and protocols for machine-to-machine communications,’’ IEEE Internet Things J., vol. 2, no. 2, pp. 175–186, Apr. 2015.
[27] R. Jurdak, C. V. Lopes, and P. Baldi, ‘‘A survey, classification and comparative analysis of medium access control protocols for ad hoc networks,’’ IEEE Commun. Surveys Tuts., vol. 6, no. 1, pp. 2–16, 1st Quart., 2004.
[28] A. Kanzaki, T. Uemukai, T. Hara, and S. Nishio, ‘‘Dynamic TDMA slot assignment in ad hoc networks,’’ in Proc. 17th Int. Conf. Adv. Inf. Netw. Appl. (AINA), Xi'an, China, Mar. 2003, pp. 330–335.
[29] W. Hu, H. Yousefi'zadeh, and X. Li, ‘‘Load adaptive MAC: A hybrid MAC protocol for MIMO SDR MANETs,’’ IEEE Trans. Wireless Commun., vol. 10, no. 11, pp. 3924–3933, Nov. 2011.
[30] Y. Liu, C. Yuen, X. Cao, N. U. Hassan, and J. Chen, ‘‘Design of a scalable hybrid MAC protocol for heterogeneous M2M networks,’’ IEEE Internet Things J., vol. 1, no. 1, pp. 99–111, Feb. 2014.
[31] A. Pressas, Z. Sheng, F. Ali, and D. Tian, ‘‘A Q-learning approach with collective contention estimation for bandwidth-efficient and fair access control in IEEE 802.11p vehicular networks,’’ IEEE Trans. Veh. Technol., vol. 68, no. 9, pp. 9136–9150, Sep. 2019.
[32] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1988.
[33] C. J. C. H. Watkins and P. Dayan, ‘‘Q-learning,’’ Mach. Learn., vol. 8, nos. 3–4, pp. 279–292, 1992.
[34] A. Pressas, Z. Sheng, F. Ali, D. Tian, and M. Nekovee, ‘‘Contention-based learning MAC protocol for broadcast vehicle-to-vehicle communication,’’ in Proc. IEEE Veh. Netw. Conf. (VNC), Nov. 2017, pp. 263–270.
[35] C.-M. Wu, M.-S. Wu, Y.-J. Yang, and C.-Y. Sie, ‘‘Cluster-based distributed MAC protocol for multichannel cognitive radio ad hoc networks,’’ IEEE Access, vol. 7, pp. 65781–65796, 2019.
[36] L. T. Tan and L. B. Le, ‘‘Channel assignment for throughput maximization in cognitive radio networks,’’ in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Paris, France, Apr. 2012, pp. 1427–1431.
[37] Z. Sadreddini, O. Makul, T. Cavdar, and F. B. Gunay, ‘‘Performance analysis of licensed shared access based secondary users activity on cognitive radio networks,’’ in Proc. Electric Electron., Comput. Sci., Biomed. Engineerings' Meeting (EBBT), Istanbul, Turkey, Apr. 2018, pp. 1–4.
[38] A. Azarfar, J. F. Frigon, and B. Sansò, ‘‘Delay analysis of multichannel opportunistic spectrum access MAC protocols,’’ IEEE Trans. Mobile Comput., vol. 15, no. 1, pp. 92–106, Jan. 2016.
[39] D. Jung, R. Kim, and H. Lim, ‘‘Power-saving strategy for balancing energy and delay performance in WLANs,’’ Comput. Commun., vol. 50, pp. 3–9, Sep. 2014.
[40] S. Maleki, A. Pandharipande, and G. Leus, ‘‘Energy-efficient distributed spectrum sensing for cognitive sensor networks,’’ IEEE Sensors J., vol. 11, no. 3, pp. 565–573, Mar. 2011.
[41] S. Atapattu, C. Tellambura, and H. Jiang, ‘‘Energy detection for spectrum sensing in cognitive radio,’’ in Communications and Networks. Berlin, Germany: Springer, 2014.
[42] S. Gao, L. Qian, and D. Vaman, ‘‘Distributed energy efficient spectrum access in cognitive radio wireless ad hoc networks,’’ IEEE Trans. Wireless Commun., vol. 8, no. 10, pp. 5202–5213, Oct. 2009.

CHIEN-MIN WU was born in Yunlin, Taiwan, in 1966. He received the B.S. degree in automatic control engineering from Feng Chia University, Taichung, Taiwan, in 1989, the M.S. degree in electrical and information engineering from Yuan Ze University, Chung-Li, Taiwan, in 1994, and the Ph.D. degree from the Department of Electrical Engineering, National Chung Cheng University, Chiayi, Taiwan, in 2004. In July 1994, he joined the Technical Development Department, Philips Company Ltd., where he was a member of the Technical Staff. He is currently a Professor with the Department of Computer Science and Information Engineering, Nanhua University. His current research interests include reinforcement learning, cognitive radio networks, ad hoc networks, and MAC protocol design.

YEN-CHUN KAO was born in Yunlin, Taiwan, in 1998. He received the B.S. degree in computer science and information engineering from Nanhua University, Chiayi, Taiwan, in 2020. His research interests include reinforcement learning, cognitive radio networks, ad hoc networks, and MAC protocol design.

KAI-FU CHANG was born in Kaohsiung, Taiwan, in 1997. He received the B.S. degree in computer science and information engineering from Nanhua University, Chiayi, Taiwan, in 2020. His research interests include reinforcement learning, cognitive radio networks, ad hoc networks, and MAC protocol design.

CHENG-TAI TSAI was born in Pingtung, Taiwan, in 1999. He is currently pursuing the B.S. degree in computer science and information engineering from Nanhua University, Chiayi, Taiwan. His research interests include reinforcement learning, cognitive radio networks, ad hoc networks, and MAC protocol design.

CHENG-CHUN HOU was born in Taoyuan, Taiwan, in 1995. He is currently pursuing the B.S. degree in computer science and information engineering from Nanhua University, Chiayi, Taiwan. His research interests include reinforcement learning, cognitive radio networks, ad hoc networks, and MAC protocol design.