
Multi-Agent Learning Automata for Online Adaptive Control of Large-Scale Traffic Signal Systems
Xuewei Hou, Lixing Chen, Junhua Tang, Jianhua Li
School of Electronic Information and Electrical Engineering
Shanghai Jiao Tong University, Shanghai, China
{xuewei0401,lxchen,junhuatang,lijh888}@sjtu.edu.cn

Abstract—Adaptive traffic control systems are gaining attention in recent years as traditional hand-crafted traffic control experiences performance fall-offs with increasingly complicated metropolitan traffic patterns. This paper studies a learning automata (LA)-based traffic signal control scheme that adapts to real-time traffic patterns and optimizes traffic flows by dynamically changing the green split timings. A novel LA algorithm, called K-Neighbor Multi-Agent Learning Automata (KN-MALA), is proposed to learn the optimal decision online and adjust the traffic lights accordingly in an attempt to minimize the overall waiting time at an intersection. In particular, KN-MALA employs an online distributed learning framework that integrates the traffic conditions of neighboring intersections to efficiently learn and infer optimal decisions for large-scale traffic signal systems. Furthermore, a parameter-insensitive update mechanism is designed for KN-MALA to overcome the instability caused by initialization variations. Experiments are conducted on real-world traffic patterns of Sioux Falls City, and the performance of the proposed algorithm is compared with a pre-timed traffic light control scheme and an adaptive traffic light control scheme based on single-agent learning automata. The results show that the proposed algorithm outperforms the other schemes in terms of quick traffic clearance under various traffic patterns and initial conditions.

Index Terms—traffic signal control, learning automata, multi-agent system

I. INTRODUCTION

Traffic congestion has been a major urban transportation problem that not only brings inconvenience to city residents but also causes a huge loss in economic vitality. A great amount of effort has been made to alleviate traffic congestion, among which the Intelligent Transportation System (ITS) has attracted tremendous attention in recent years, covering topics such as adaptive traffic signal control (ATSC), public transport signal priority (TSP), and emergency vehicle signal preemption (EVSP) [1], [2]. The traffic signal system, as the most important piece of infrastructure for an intelligent transportation system, is the main coordinator of urban traffic flows. Traditional traffic light control methods can be roughly classified into two groups [1]. The first is pre-timed signal control, where a fixed time is determined for all green phases according to historical traffic demand, without considering possible fluctuations in traffic demand. The second is vehicle-actuated control, where a traffic light phase is triggered by the arrival of vehicles; such methods are suitable for situations with relatively high traffic randomness.

As traditional traffic signal control methods mainly rely on hand-crafted rules without considering real-time traffic conditions, adaptive traffic signal control (ATSC) has received much attention since the 1980s. Early ATSC methods (e.g., SCOOT, SCATS, RHODES, COMDYCS III, LHOVRA, OPAC [1], [2]) use sensor loops to monitor the traffic condition and adopt certain traffic scheduling strategies to optimize one or multiple objectives (e.g., queue size, traveling time). With the rapid development of vehicle-to-everything (V2X) and edge computing technologies in recent years, real-time traffic statistics can be collected and collaboration among controllers at adjacent intersections becomes possible; an emerging trend is therefore to use reinforcement learning algorithms for adaptive traffic light control [2]–[4]. ATSC schemes based on fuzzy logic and deep Q-learning have also been reported in the literature [5], [6]. However, most existing reinforcement learning-based methods can only be applied to a single intersection or a small number of consecutive intersections, and cannot deal with large-scale urban traffic congestion problems.

In this paper, we develop a lightweight, yet well-performing, traffic signal control scheme based on Learning Automata for adaptive traffic signal control. The main contributions of this paper can be summarized as follows:

1) We propose a lightweight traffic light control algorithm based on Learning Automata (LA) which can learn the optimal decision online without training. Since each controller makes decisions in a distributed manner, the algorithm is scalable and can be used in large cities with complex traffic conditions.
2) A K-Neighbor Multi-Agent Learning Automata (KN-MALA) algorithm is proposed. It encourages cooperation among controllers by allowing LA agents to integrate the traffic information of neighboring intersections to learn better system-wide traffic signal control decisions.
3) We further design a parameter-insensitive update mechanism for KN-MALA. It enables KN-MALA to overcome the instability caused by variations of initial settings, thereby providing more stable traffic signal control policies.
4) Experiments are conducted based on the real-world traffic data set of Sioux Falls city to evaluate the effectiveness of the proposed scheme. The results show that our method outperforms the pre-timed traffic light control scheme and an adaptive traffic light control scheme based on single-agent learning automata.

The rest of this paper is organized as follows. Section II briefly reviews the related work in the literature and gives the background of Learning Automata. In Section III we describe the proposed KN-MALA algorithm. Section IV presents the experiment results and Section V concludes the paper.

II. RELATED WORK AND PRELIMINARIES

A. Related Work on Adaptive Traffic Signal Control

Much research work and a number of projects in the field of Adaptive Traffic Signal Control (ATSC) have been reported in the literature. Webster proposed a signal timing method to minimize vehicle delay, which laid the foundation for modern signal control strategies [7]. ATSC models including SCOOT, SCATS, RHODES, COMDYCS III, LHOVRA, and OPAC are widely adopted at urban intersections around the world. In recent years, reinforcement learning and deep reinforcement learning have been used to dynamically adjust traffic lights according to real-time traffic conditions [8]. Approaches in the literature can be classified into the following groups.

Single signal control: The authors in [9] proposed a traffic light control algorithm for a single intersection. The algorithm determines the time duration of the next cycle and each phase by calculating the traffic flow and congestion level of each lane. The authors in [10] formulated the signal control problem as a job scheduling problem, where each job corresponds to vehicles passing through the intersection. This group of methods concentrates on the traffic control strategy at a single controller without considering the influence of other intersections, and thus may need adjustment in real implementations.

Centralized signal control: SCATS [11] automatically adjusts the signal parameters for each intersection in the road network based on its library and real-time road conditions. The algorithm proposed by Zhou et al. [12] used the Internet of Vehicles (IoV) to collect real-time traffic data, which are sent to a central server for network-wide signal control decisions. Centralized signal control can usually achieve the optimal solution, but the high cost and slow convergence make it difficult to apply on a large scale.

Distributed signal control: Liu et al. [13] designed a distributed signal control scheme in which a cooperative algorithm is used to achieve collaboration between consecutive intersections. Chen et al. [14] proposed a decentralized multi-agent reinforcement learning scheme for traffic signal control in large-scale road networks. However, the above methods are usually sensitive to hyper-parameters and require high-quality initial settings.

In this paper, we intend to design a lightweight distributed traffic light control algorithm based on Learning Automata which is scalable to large road networks. Collaboration between neighboring controllers is enabled by adopting a multi-agent learning model, and insensitivity to the initial parameter setting is achieved by a parameter-insensitive update mechanism in the learning process.

B. Preliminaries on Learning Automata

A Learning Automaton (LA) is a stochastic model operating in the framework of reinforcement learning [15]. It is quite useful for adaptive decision making where one needs to choose an action (among several alternatives) online to optimize system performance, but without complete knowledge of how actions affect performance. The LA approach regards such learning problems as the optimization of the expectation of a random function whose underlying probability distributions are unknown. There are basically two categories of LA, based on the feature of the action set: Finite Action Learning Automata (FALA) and Continuous Action Learning Automata (CALA) [16]. A finite action Learning Automaton can be represented by a quadruple {α, β, p, T}, where α = {α_1, α_2, ..., α_r} is the set of actions of the automaton, β = {β_1, β_2, ..., β_m} is the set of inputs generated by the environment, p = {p_1, p_2, ..., p_r} is the probability vector for selecting the actions, and p(n+1) = T[α(n), β(n), p(n)] is the learning algorithm, where n stands for the n-th iteration of the learning process. A general linear scheme for updating the action probabilities when action i is performed is given by [16]:

p_i(n+1) = p_i(n) + a(1 − β(n))(1 − p_i(n)) − b β(n) p_i(n),   (1)

p_j(n+1) = p_j(n) − a(1 − β(n)) p_j(n) + b β(n) [1/(r−1) − p_j(n)],  ∀j ≠ i,   (2)

where a and b are the reward and penalty parameters.
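As a concrete reading of (1) and (2), the following minimal Python sketch applies one linear reward-penalty step to a probability vector; the function name and the uniform starting policy are our own illustration, not from [16]. Note that the update preserves the probability mass: the terms added to the chosen action are exactly the terms removed from the others.

import numpy as np

def update_probabilities(p, i, beta, a=0.1, b=0.01):
    """Linear reward-penalty update of (1)-(2).

    p    : action probability vector of length r (sums to 1)
    i    : index of the action just performed
    beta : environment feedback in [0, 1]; 0 = full reward, 1 = full penalty
    a, b : reward and penalty parameters
    """
    r = len(p)
    q = p.copy()
    # Chosen action: pushed towards 1 on reward, towards 0 on penalty.
    q[i] = p[i] + a * (1 - beta) * (1 - p[i]) - b * beta * p[i]
    # Other actions: shrunk on reward, pushed towards 1/(r-1) on penalty.
    for j in range(r):
        if j != i:
            q[j] = p[j] - a * (1 - beta) * p[j] + b * beta * (1.0 / (r - 1) - p[j])
    return q

p = np.full(4, 0.25)                         # uniform policy over 4 actions
p = update_probabilities(p, i=2, beta=0.0)   # action 2 fully rewarded
print(p, p.sum())                            # mass shifts to action 2; still sums to 1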
III. K-NEIGHBOR MULTI-AGENT LEARNING AUTOMATA FOR ATSC

In this section, we give the system model of adaptive traffic signal control and propose a K-Neighbor Multi-Agent Learning Automata (KN-MALA) algorithm as a solution.

A. System Model and Problem Formulation

For the traffic light controller at each intersection, there is a traffic phase plan. A setting of the traffic light is defined as a phase (e.g., green light in the west-east and east-west directions), and each phase allows a certain number of traffic flows to pass, where a traffic flow stands for the traffic demand in a certain direction. Depending on the type of intersection and the traffic phase design, a set of light phases is associated with the intersection, covering the traffic flows in all directions without overlapping. As in real-world settings, the traffic light can only change in a specific order (i.e., phase 1 → phase 2 → phase 3 → phase 1 → phase 2 → phase 3 ...). One complete rotation of the phases forms a cycle. Fig. 1 shows an example of a 4-phase traffic plan for a crossroad intersection.
As shown in Fig. 1, there are altogether 12 traffic flows, denoted as F = {f_1, f_2, ..., f_12}, and 4 light phases are defined. The 4 phases rotate in sequence, and the set of traffic flows allowed in phase i (i ∈ {1, 2, 3, 4}) is denoted as F_i. For example, phase 1 turns on the green light in both the west-east and east-west directions, allowing f_2, f_3, f_8, f_9 to pass; thus we have F_1 = {f_2, f_3, f_8, f_9}. For adaptive traffic signal control, the duration of each signal phase is variable, and it is up to the intelligent traffic signal agent to determine the time interval of each signal phase based on the dynamic traffic status at the intersection.

Fig. 1. A 4-phase traffic plan for a crossroad intersection

Suppose there are I intersections in an area, and the traffic signal control agent at intersection i is denoted as S_i (i ∈ {1, 2, ..., I}). Let p_{i,j} (j ∈ {1, 2, ..., J_i}) denote the jth phase of the traffic light at S_i, where J_i is the total number of phases for S_i. Let F_{i,j} be the set of traffic flows allowed to pass the intersection in phase p_{i,j}.

Congestion Delay Index: We propose a Congestion Delay Index (CDI) to indicate the level of congestion, defined as the ratio between the time to pass an intersection in congested conditions and in non-congested conditions. Consequently, a CDI greater than 1 implies traffic congestion, and the larger the CDI value, the more serious the congestion.

To simplify the problem, we consider a vehicle as a dot moving at a constant speed. Let v_s be the traffic speed in smooth traffic conditions, and v_c the speed in congested conditions. Let t_{f_l} be the time it takes a vehicle in flow f_l to travel from the upstream intersection to the exit of the current intersection in smooth traffic conditions. Define w_{f_l} as the number of vehicles waiting to pass the intersection in flow f_l. Let C_{i,j,l} denote the CDI of flow f_l in phase p_{i,j}. For f_l ∈ F_{i,j}, C_{i,j,l} is calculated as

C_{i,j,l} = (t_{f_l} + (w_{f_l} × T_i) / (v_c × g_{i,j} × 2)) / t_{f_l},  if w_{f_l} ≠ 0;  C_{i,j,l} = 1,  if w_{f_l} = 0,   (3)

where T_i is the time duration of the current phase cycle at S_i, and g_{i,j} is the remaining green time of phase p_{i,j} from the time C_{i,j,l} is calculated.

For a traffic flow f_m ∉ F_{i,j}, C_{i,j,m} is calculated as

C_{i,j,m} = (t_{f_m} + (w_{f_m} × T_i) / (v_c × g_{p_{i,j'}} × 2)) / t_{f_m},  if w_{f_m} ≠ 0;  C_{i,j,m} = 1,  if w_{f_m} = 0,   (4)

where g_{p_{i,j'}} is the green time of phase p_{i,j'} with f_m ∈ F_{i,j'}.

The CDI of phase p_{i,j} is calculated as

C_{i,j} = (Σ_{l=1}^{L_i} C_{i,j,l}) / L_i,   (5)

where L_i is the total number of traffic flows at intersection i. The CDI at S_i is then obtained by

C_i = (Σ_{j=1}^{J_i} C_{i,j}) / J_i.   (6)

Online Learning Problem for ATSC: For adaptive traffic signal control, the time duration of the light phases is variable and can be adjusted to the dynamic traffic status. In the framework of reinforcement learning, the problem can be formulated as: given the state describing the traffic condition near the ith intersection, the goal of the traffic control agent S_i is to learn the optimal action (i.e., whether to change the light to the next phase), so that the congestion delay index C_i is minimized.
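To make (3)-(6) concrete, the sketch below computes the per-flow, per-phase, and intersection-level CDI; the function names and the nested-list data layout are illustrative assumptions, not from the paper.

def flow_cdi(t_f, w_f, T_i, g, v_c):
    """Per-flow CDI, eqs. (3)/(4): congested over free-flow travel time.

    t_f : free-flow travel time through the intersection (s)
    w_f : number of vehicles waiting in the flow
    T_i : duration of the current phase cycle at S_i (s)
    g   : (remaining) green time of the phase serving the flow (s)
    v_c : vehicle speed under congestion
    """
    if w_f == 0:
        return 1.0  # empty queue: no extra delay, CDI is exactly 1
    return (t_f + (w_f * T_i) / (v_c * g * 2)) / t_f

def mean(xs):
    return sum(xs) / len(xs)

def intersection_cdi(flow_cdis_per_phase):
    """Eq. (5) then eq. (6): average flow CDIs within each phase,
    then average the phase CDIs over the J_i phases of S_i.

    flow_cdis_per_phase: list with one inner list of flow CDIs per phase.
    """
    return mean([mean(phase) for phase in flow_cdis_per_phase])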

B. K-Neighbor Multi-Agent Learning Automata

In this section, we propose a K-Neighbor Multi-Agent Learning Automata (KN-MALA) algorithm for adaptive traffic signal control. Each signal control agent at an intersection operates as a learning automaton. In addition to its own traffic status, a learning agent also takes the traffic conditions of its K neighbors into consideration; thus we call our scheme multi-agent learning automata. Fig. 2 provides an overview of KN-MALA, and in the following we give detailed explanations.

State: For the Learning Automaton at S_i, the state of the environment at the nth iteration is represented by φ_i(n), which takes values in a finite set mapped from the congestion delay index C_i. The states of its K neighbors, denoted by φ_{i,1}(n), φ_{i,2}(n), ..., φ_{i,K}(n), are combined with its own state to form the state vector Φ_i(n) = [φ_i(n), φ_{i,1}(n), φ_{i,2}(n), ..., φ_{i,K}(n)].

Fig. 2. Illustration of K-neighbor multi-agent learning automata
Action: The action α_i(n) is the time interval of the next phase, generated from a random distribution with mean µ_i(n) and standard deviation σ_i(n). In this design, an action is taken immediately after a Learning Automaton obtains its neighbors' states when there is one second left in its current phase.

Reward: When half of its current phase time is left, the Learning Automaton obtains its neighbors' states again to evaluate the effect of its action. The reward r_{α_i}(n) and feedback β(α_i(n)) for action α_i(n) are calculated as:

r_{α_i}(n) = φ_i(n+1) + (Σ_{k=1}^{K} φ_{i,k}(n+1)) / K,   (7)

β(α_i(n)) = sigmoid(r_{α_i}(n) − r_best),   (8)

where r_best is the lowest reward in history under Φ_i, and sigmoid(x) = 1/(1 + e^{−x}).
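The reward and feedback of (7) and (8) reduce to a few lines; keeping the per-state best reward in a dictionary keyed by the state vector is our own bookkeeping choice, not specified in the paper.

import math

def reward(phi_next, neighbor_phis_next):
    """Eq. (7): own next state plus the mean of the K neighbors' next states."""
    return phi_next + sum(neighbor_phis_next) / len(neighbor_phis_next)

def feedback(r, r_best):
    """Eq. (8): squash the gap to the lowest reward seen under this state
    vector; grows toward 1 as the action's outcome worsens."""
    return 1.0 / (1.0 + math.exp(-(r - r_best)))

# Hypothetical bookkeeping of r_best per observed state vector.
r_best_table = {}

def feedback_for(state_vector, r):
    key = tuple(state_vector)
    r_best = min(r_best_table.get(key, r), r)
    r_best_table[key] = r_best
    return feedback(r, r_best)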
Parameter-insensitive update of action probability: The traditional update method of LA shown in (1) and (2) is significantly affected by the initial parameter settings, and it is necessary to manually adjust them. To overcome this problem, we adopt a parameter-insensitive update method proposed by Guo [17] in our learning model.

If action α_i is taken in iteration n, the mean µ_i and standard deviation σ_i of the action probability density function are updated as:

µ_i(n+1) = µ_i(n) − a_0 β(α_i(n)) σ_i(n) (α_i(n) − µ_i(n)),  if n < n_0;
µ_i(n+1) = µ_i(n) − (a_0 / ∛⌊n/n_0⌋) β(α_i(n)) σ_i(n) (α_i(n) − µ_i(n)),  if n ≥ n_0,   (9)

σ_i(n+1) = σ_i(0),  if n < n_0;   σ_i(n+1) = 1 / ∛⌊n/n_0⌋,  if n ≥ n_0,   (10)

where a_0 is the fixed learning rate and β(α_i(n)) is the feedback when action α_i(n) is taken. From (9) it can be seen that if β(α_i(n)) is close to 0, the action α_i(n) was a good decision and µ_i(n) needs little adjustment; if β(α_i(n)) is close to 1, the action α_i(n) was extremely bad, and it is necessary to update µ_i(n) substantially. To achieve parameter insensitivity, we introduce a constant n_0 [17]. The learning rate a_0 and standard deviation σ_i(n) are kept constant until iteration n_0, so that actions are sampled uniformly and the Learning Automata can explore as many values as possible. After iteration n_0, σ_i(n) gradually decreases and the system converges. The pseudo-code of the KN-MALA algorithm is shown in Algorithm 1.

Algorithm 1 KN-MALA
Input: Directed graph G, traffic demand D, learning rate a_0, constant n_0
Output: State-action strategy π
1: repeat
2:   Initialize the running time to 0; initialize the phase time and state-action strategy for each signal.
3:   while not all vehicles have reached their destination do
4:     for each signal do
5:       if the phase remaining time equals half the phase time then
6:         Obtain state Φ, calculate reward r_{α_i}(n) and feedback β(α_i(n)), and update the parameters of the state-action strategy.
7:       end if
8:       if the phase remaining time == 1 then
9:         Obtain state Φ and select an action as the next phase time according to the state-action strategy.
10:      end if
11:      if the phase remaining time == 0 then
12:        Switch to the next phase.
13:      end if
14:    end for
15:    Generate new vehicles and update vehicle positions.
16:    Running time += 1; phase remaining time −= 1.
17:  end while
18: until n iterations are reached.
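A minimal sketch of the update step in (9)-(10), under our reading of the original: before iteration n_0 the learning rate and σ stay fixed, and afterwards both the effective step size and σ shrink with the cube root of ⌊n/n_0⌋. The lower limit on µ follows Section IV-C; all names and defaults are illustrative.

import math
import random

def update_action_distribution(mu, sigma, alpha, beta_val, n,
                               a0=0.01, n0=100, sigma0=5.0, mu_min=10.0):
    """One KN-MALA update of the Gaussian phase-time distribution, after (9)-(10).

    mu, sigma : current mean and std of the action distribution
    alpha     : phase time just tried (the sampled action, in seconds)
    beta_val  : feedback from (8), in (0, 1); larger means worse
    n         : iteration counter for the current state vector
    """
    if n < n0:
        # Exploration: constant learning rate and std, wide sampling around mu.
        new_mu = mu - a0 * beta_val * sigma * (alpha - mu)
        new_sigma = sigma0
    else:
        # Convergence: step size and std decay with the cube root of n/n0.
        decay = math.floor(n / n0) ** (1.0 / 3.0)
        new_mu = mu - (a0 / decay) * beta_val * sigma * (alpha - mu)
        new_sigma = 1.0 / decay
    return max(new_mu, mu_min), new_sigma

# Sampling the next phase time from the updated distribution:
mu, sigma = update_action_distribution(30.0, 5.0, alpha=34.2, beta_val=0.8, n=42)
next_phase_time = random.gauss(mu, sigma)

A bad action (β near 1) with α > µ moves µ away from α, matching the intuition given after (9); a good action (β near 0) leaves the distribution nearly unchanged.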
IV. EXPERIMENTS AND RESULTS

In this section, we conduct simulation experiments on a real-world dataset to evaluate the performance of the proposed KN-MALA algorithm.

A. Real-World Dataset

Our simulation is based on a dataset describing the real road network and traffic demand of the city of Sioux Falls in the United States [18]. The road network consists of 24 intersections and 74 directed roads, as shown in Fig. 3.

In order to simulate the off-peak and peak periods that exist in real traffic conditions, the rate of newly generated vehicles first increases linearly and then decreases linearly according to the traffic demand D, generating a total of ten minutes of traffic.

Fig. 3. Sioux Falls transportation network

B. Traffic Light Phase Design

As shown in Fig. 3, there are three types of intersections on the map: straight intersections, T-junctions, and crossroads. The traffic light phase plan we designed for each type of intersection is shown in Table I.
Fig. 4. Total running time during learning
Fig. 5. Average waiting time during learning
Fig. 6. Longest travel time during learning

Fig. 7. Total running time with different µ
Fig. 8. Average waiting time with different µ
Fig. 9. Longest travel time with different µ
TABLE I
PHASE SETTING

Type of Intersection    Road Directions   Phase   Flows Allowed
Straight intersection   E, W              1       E-W, W-E
                                          2       ——
T-Junction              E, W, N           1       E-W, W-E, E-N
                                          2       W-N
                                          3       N-E, N-W
Crossroads              E, W, N, S        1       E-W, E-N, W-E, W-S
                                          2       E-S, W-N
                                          3       N-S, S-N, N-W, S-E
                                          4       N-E, S-W

C. Parameter Setting

First, the CDI C_i at intersection S_i is mapped to one of the discrete states. In our experiments, the number of states is set to 5, and the mapping rule is: 1 ≤ C_i < 1.05 → s_1, 1.05 ≤ C_i < 1.15 → s_2, 1.15 ≤ C_i < 1.25 → s_3, 1.25 ≤ C_i < 1.35 → s_4, and C_i ≥ 1.35 → s_5. Therefore, the state of S_i is φ_i ∈ {s_1, s_2, s_3, s_4, s_5}. The Learning Automata are initialized with µ = 30 and σ = 5. The learning rate a_0 is set to 0.01 and the constant n_0 is set to 100. The lower limit of µ is set to 10 during the update process.
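The 5-state mapping above is a simple threshold lookup; a small sketch follows, with the boundaries hard-coded from this section and the helper name our own.

import bisect

STATE_BOUNDS = [1.05, 1.15, 1.25, 1.35]  # s1 | s2 | s3 | s4 | s5

def cdi_to_state(c_i):
    """Map the intersection CDI C_i to a discrete state index 1..5."""
    return bisect.bisect_right(STATE_BOUNDS, c_i) + 1

assert cdi_to_state(1.00) == 1  # 1 <= C_i < 1.05 -> s1
assert cdi_to_state(1.20) == 3  # 1.15 <= C_i < 1.25 -> s3
assert cdi_to_state(1.40) == 5  # C_i >= 1.35 -> s5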
D. Results and Discussion

Evaluation Metrics. The performance of the proposed algorithm is evaluated in terms of the following metrics. 1) Total running time: the time it takes for the system to clear all the traffic. 2) Average waiting time: the average waiting time experienced by a vehicle. 3) Longest travel time: the time it takes for the vehicle travelling the longest path to reach its destination.

Compared Methods. The performance of the proposed KN-MALA algorithm is compared with two other methods: 1) pre-timed traffic light control, where a fixed time is set for each phase; the fixed phase time is set to 30 seconds in our experiments, and the results are marked as "Benchmark"; 2) Single Agent Learning Automata (SALA), where a traditional Learning Automaton is employed at each intersection for adaptive traffic light control without collaboration between learning agents.

Convergence of Performance Metrics. Fig. 4, Fig. 5 and Fig. 6 show the total running time, average waiting time, and longest travel time versus the learning episode, respectively, for the proposed KN-MALA algorithm and the compared methods. For KN-MALA, all three performance metrics converge after episode 500; this is because the value of n corresponding to each state becomes large enough that its standard deviation σ_i(n) is sufficiently small, as shown in (10). It can also be observed in Fig. 4 that the SALA algorithm, which learns in isolation, is inferior to the benchmark in terms of total running time, while the total running time of the KN-MALA algorithm is nearly 14% less than the benchmark, which means KN-MALA can clear the traffic faster than the other two algorithms. From Fig. 5 and Fig. 6, we can see that the KN-MALA algorithm significantly outperforms the benchmark and SALA methods in terms of average waiting time and longest travel time.

Effect of Initial Parameter Setting. To investigate the effect of the initial parameters µ and σ on the performance of the proposed algorithm, we conduct experiments with different µ and σ; the results are shown in Fig. 7 to Fig. 10.

Fig. 7, Fig. 8 and Fig. 9 show the total running time, average waiting time, and longest travel time with different µ, respectively, for the three algorithms. It can be observed that all three performance indicators increase with µ for all three algorithms. However, compared with the benchmark and SALA, the performance metrics of KN-MALA change
very slightly, exhibiting a certain degree of insensitivity to the initial setting of µ. Fig. 10 shows the total running time, average waiting time, and longest travel time of the KN-MALA algorithm under different initial settings of µ and σ. The results show that the KN-MALA algorithm is relatively insensitive to the initial setting of both µ and σ.

Fig. 10. Performance of KN-MALA with different µ and σ

The number of states for φ also affects the performance of the system. Fig. 11 shows the performance of KN-MALA when the number of states is 3, 5 and 7, respectively. When the number of states is 3, CDI C_i in the ranges (<1.11, 1.11–1.29, >1.29) is mapped to {s_1, s_2, s_3}, respectively. Similarly, when the number of states is 7, CDI C_i in the ranges (<1.05, 1.05–1.11, 1.11–1.17, 1.17–1.23, 1.23–1.29, 1.29–1.35, >1.35) is mapped to {s_1, s_2, ..., s_7}. It can be seen in Fig. 11 that the KN-MALA algorithm performs better when the number of states is larger; this is because increasing the number of states of φ is equivalent to refining the defined ranges of the CDI, so the algorithm can make better decisions and the traffic in the system is cleared faster.

Fig. 11. Performance of KN-MALA with different numbers of states

V. CONCLUSION

In this paper, we proposed an algorithm for adaptive traffic signal control. The algorithm is based on multi-agent Learning Automata and can learn the optimal decision online to determine the phase times of the signals dynamically. Simulation is conducted based on a data set describing the road network of 24 intersections of the city of Sioux Falls, USA. The results show that the proposed KN-MALA algorithm can effectively improve traffic efficiency, thus alleviating the traffic congestion problem. The proposed method is also applicable to multi-agent planning problems in other fields, such as economics, management, and engineering.

ACKNOWLEDGEMENT

This work is sponsored by projects 61831007 and U20B2048 supported by the National Natural Science Foundation of China, and by the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security.

REFERENCES

[1] W.-H. Lee and C.-Y. Chiu, "Design and implementation of a smart traffic signal control system for smart city applications," Sensors, vol. 20, no. 2, pp. 508–525, 2020.
[2] X. Liang, X. Du, G. Wang, and Z. Han, "A deep reinforcement learning network for traffic light cycle control," IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 1243–1253, 2019.
[3] H. Wei, G. Zheng, H. Yao, and Z. Li, "IntelliLight: A reinforcement learning approach for intelligent traffic light control," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 2496–2505. [Online]. Available: https://doi.org/10.1145/3219819.3220096
[4] M. Wiering, "Multi-agent reinforcement learning for traffic light control," in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML '00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 1151–1158.
[5] Y. Bi, X. Lu, Z. Sun, D. Srinivasan, and Z. Sun, "Optimal type-2 fuzzy system for arterial traffic signal control," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 9, pp. 3009–3027, 2018.
[6] H. Ge, Y. Song, C. Wu, J. Ren, and G. Tan, "Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control," IEEE Access, vol. 7, pp. 40797–40809, 2019.
[7] F. V. Webster, "Traffic signal settings," Road Research Technical Paper, vol. 39, 1958.
[8] N. Wu, D. Li, and Y. Xi, "Distributed weighted balanced control of traffic signals for urban traffic congestion," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 10, pp. 3710–3720, 2019.
[9] H. J. Chang and G. T. Park, "A study on traffic signal control at signalized intersections in vehicular ad hoc networks," Ad Hoc Networks, vol. 11, no. 7, pp. 2115–2124, 2013.
[10] K. Pandit, D. Ghosal, H. M. Zhang, and C.-N. Chuah, "Adaptive traffic signal control with vehicular ad hoc networks," IEEE Transactions on Vehicular Technology, vol. 62, no. 4, pp. 1459–1471, 2013.
[11] L. Zhang, T. M. Garoni, and J. de Gier, "A comparative study of macroscopic fundamental diagrams of arterial road networks governed by adaptive traffic signal systems," Transportation Research Part B, vol. 49, pp. 1–23, 2013.
[12] P. Zhou, X. Chen, Z. Liu, T. Braud, P. Hui, and J. Kangasharju, "DRLE: Decentralized reinforcement learning at the edge for traffic light control in the IoV," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 4, pp. 2262–2273, 2021.
[13] W. Liu, G. Qin, Y. He, and F. Jiang, "Distributed cooperative reinforcement learning-based traffic signal control that integrates V2X networks' dynamic clustering," IEEE Transactions on Vehicular Technology, vol. 66, no. 10, pp. 8667–8681, 2017.
[14] Y. Chen, C. Li, W. Yue, H. Zhang, and G. Mao, "Engineering a large-scale traffic signal control: A multi-agent reinforcement learning approach," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2021, pp. 1–6.
[15] Z. Zhang, D. Wang, and J. Gao, "Learning automata-based multiagent reinforcement learning for optimization of cooperative tasks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 10, pp. 4639–4652, 2021.
[16] B. Masoumi and M. R. Meybodi, "Learning automata based multi-agent system algorithms for finding optimal policies in Markov games," Asian Journal of Control, vol. 14, no. 1, pp. 137–152, 2012.
[17] Y. Guo, "Research on algorithms of free-of-parameter-tuning learning automata," Ph.D. dissertation, Shanghai Jiao Tong University, 2019.
[18] Transportation Networks for Research Core Team, "Transportation networks for research," https://github.com/bstabler/TransportationNetworks, Oct 2021.
