Multi-Agent Learning Automata for Online Adaptive Control of Large-Scale Traffic Signal Systems
Abstract—Adaptive traffic control systems are gaining attention in recent years as traditional hand-crafted traffic control experiences performance fall-offs with increasingly complicated metropolitan traffic patterns. This paper studies a learning automata (LA)-based traffic signal control scheme that adapts to real-time traffic patterns and optimizes traffic flows by dynamically changing the green split timings. A novel LA algorithm, called K-Neighbor Multi-Agent Learning Automata (KN-MALA), is proposed to learn the optimal decision online and adjust the traffic light accordingly in an attempt to minimize the overall waiting time at an intersection. In particular, KN-MALA employs an online distributed learning framework that integrates the traffic conditions of neighboring intersections to efficiently learn and infer optimal decisions for large-scale traffic signal systems. Furthermore, a parameter-insensitive update mechanism is designed for KN-MALA to overcome the instability caused by initialization variations. Experiments are conducted on real-world traffic patterns of Sioux Falls City, and the performance of the proposed algorithm is compared with the pre-timed traffic light control scheme and an adaptive traffic light control scheme based on single-agent learning automata. The results show that the proposed algorithm outperforms the other schemes in terms of quick traffic clearance under various traffic patterns and initial conditions.

Index Terms—traffic signal control, learning automata, multi-agent system

978-1-6654-3540-6/22/$31.00 © 2022 IEEE

I. INTRODUCTION

The traffic congestion issue has been a major urban transportation problem that not only brings inconvenience to city residents but also causes a huge loss in economic vitality. A great amount of effort has been made to alleviate traffic congestion, among which the Intelligent Transportation System (ITS) is attracting tremendous attention in recent years, including topics of adaptive traffic signal control (ATSC), public transport signal priority (TSP), and emergency vehicle signal preemption (EVSP) [1], [2]. The traffic signal system, as the most important part of the infrastructure for an intelligent transportation system, is the main coordinator of urban traffic flows. Traditional traffic light control methods can be roughly classified into two groups [1]. The first is pre-timed signal control, where a fixed time is determined for all green phases according to historical traffic demand, without considering possible fluctuations in traffic demand. The second is vehicle-actuated control, where a traffic light phase is triggered by the arrival of vehicles; such methods are suitable for situations with relatively high traffic randomness.

As traditional traffic signal control methods mainly rely on hand-crafted rules without considering real-time traffic conditions, adaptive traffic signal control (ATSC) has received much attention since the 1980s. Early ATSC methods (e.g., SCOOT, SCATS, RHODES, COMDYCS III, LHOVRA, OPAC [1], [2]) use sensor loops to monitor the traffic condition and adopt certain traffic scheduling strategies to optimize one or multiple objectives (e.g., queue size, traveling time). With the rapid development of vehicle-to-everything (V2X) and edge computing technologies in recent years, real-time traffic statistics can be collected and collaboration among controllers at adjacent intersections becomes possible; an emerging trend is therefore to use reinforcement learning algorithms for adaptive traffic light control [2]–[4]. ATSC schemes based on fuzzy logic and deep Q-learning have also been reported in the literature [5], [6]. However, most existing reinforcement learning-based methods can only be applied to a single intersection or a small number of consecutive intersections, and cannot deal with large-scale urban traffic congestion problems.

In this paper, we develop a lightweight, yet well-performing, traffic signal control scheme based on Learning Automata for adaptive traffic signal control. The main contributions of this paper can be summarized as follows:

1) We propose a lightweight traffic light control algorithm based on Learning Automata (LA) which can learn the optimal decision online without training. Since each controller makes decisions in a distributed manner, the algorithm is scalable and can be used in large cities with complex traffic conditions.

2) A K-Neighbor Multi-Agent Learning Automata (KN-MALA) algorithm is proposed. It encourages cooperation among controllers by allowing LA agents to integrate the traffic information of neighboring intersections to learn better system-wide traffic signal control decisions.

3) We further design a parameter-insensitive update mechanism for KN-MALA. It enables KN-MALA to overcome the instability caused by variations of initial settings, thereby providing more stable traffic signal control policies.
4) Experiments are conducted based on the real-world traffic data set of Sioux Falls city to evaluate the effectiveness of the proposed scheme. The results show that our method outperforms the pre-timed traffic light control scheme and an adaptive traffic light control scheme based on single-agent learning automata.

The rest of this paper is organized as follows. Section II briefly reviews the related work in the literature and gives the background of Learning Automata. In Section III we describe the proposed KN-MALA algorithm. Section IV presents the experiment results and Section V concludes the paper.

II. RELATED WORK AND PRELIMINARIES

A. Related Work about Adaptive Traffic Signal Control

Much research work and a number of projects in the field of Adaptive Traffic Signal Control (ATSC) have been reported in the literature. Webster proposed a signal timing method to minimize vehicle delay, which lays the foundation for modern signal control strategies [7]. ATSC models including SCOOT, SCATS, RHODES, COMDYCS III, LHOVRA, and OPAC are widely adopted in urban intersections around the world. In recent years, reinforcement learning and deep reinforcement learning have been used to dynamically adjust traffic lights according to real-time traffic conditions [8]. Approaches in the literature can be classified into the following groups.

Single signal control: The authors in [9] proposed a traffic light control algorithm for a single intersection. The algorithm determines the time duration of the next cycle and each phase by calculating the traffic flow and congestion level of each lane. The authors in [10] formulated the signal control problem as a job scheduling problem, where each job is treated as vehicles passing through the intersection. This group of methods concentrates on the traffic control strategy at a single controller without considering the influence of other intersections, and thus may need adjustment in real implementations.

Centralized signal control: SCATS [11] automatically adjusts the signal parameters for each intersection in the road network based on its library and real-time road conditions. The algorithm proposed by Zhou et al. [12] used the Internet of Vehicles (IoV) to collect real-time traffic data, which are sent to a central server for network-wide signal control decisions. Centralized signal control can usually achieve the optimal solution, but its high cost and slow convergence make it difficult to apply on a large scale.

Distributed signal control: Liu et al. [13] designed a distributed signal control scheme in which a cooperative algorithm is used to achieve collaboration between consecutive intersections. Chen et al. [14] proposed a decentralized multi-agent reinforcement learning scheme for traffic signal control in large-scale road networks. However, the above methods are usually sensitive to hyper-parameters and require high-quality initial settings.

In this paper, we intend to design a lightweight distributed traffic light control algorithm based on Learning Automata which is scalable to large road networks. Collaboration between neighboring controllers is enabled by adopting a multi-agent learning model, and insensitivity to the initial parameter setting is achieved by a parameter-insensitive update mechanism in the learning process.

B. Preliminaries in Learning Automata

A Learning Automaton (LA) is a stochastic model operating in the framework of reinforcement learning [15]. It is quite useful for adaptive decision making where one needs to choose an action (among several alternatives) online to optimize system performance, but without complete knowledge of how actions affect performance. The LA approach regards these learning problems as the optimization of the expectation of a random function where the underlying probability distributions are unknown. There are basically two categories of LA based on the feature of the action set: Finite Action Learning Automata (FALA) and Continuous Action Learning Automata (CALA) [16]. A finite-action Learning Automaton can be represented by a quadruple {α, β, p, T}, where α = {α1, α2, ..., αr} is the set of actions of the automaton, β = {β1, β2, ..., βm} is the set of inputs generated by the environment, p = {p1, p2, ..., pr} is the probability vector for selecting the actions, and p(n + 1) = T[α(n), β(n), p(n)] is the learning algorithm, where n stands for the n-th iteration of the learning process. A general linear scheme for updating the action probabilities when action i is performed is given by [16]:

p_i(n + 1) = p_i(n) + a(1 − β(n))(1 − p_i(n)) − bβ(n)p_i(n),   (1)

p_j(n + 1) = p_j(n) − a(1 − β(n))p_j(n) + bβ(n)[1/(r − 1) − p_j(n)],   ∀j ≠ i,   (2)

where a and b are the reward and penalty parameters.

III. K-NEIGHBOR MULTI-AGENT LEARNING AUTOMATA FOR ATSC

In this section, we give the system model of adaptive traffic signal control and propose a K-Neighbor Multi-Agent Learning Automata (KN-MALA) algorithm as a solution.

A. System Model and Problem Formulation

For the traffic light controller at each intersection, there is a traffic phase plan. A setting of the traffic light is defined as a phase (e.g., green light in the west-east and east-west directions), and each phase allows a certain number of traffic flows to pass, where a traffic flow stands for the traffic demand in a certain direction. Depending on the type of intersection and the traffic phase design, a set of light phases is associated with the intersection which covers the traffic flows in all directions without overlapping. As in real-world settings, the traffic light can only change in a specific order (i.e., phase 1 → phase 2 → phase 3 → phase 1 → phase 2 → phase 3 ...). One complete rotation of the phases forms a cycle. Fig. 1 shows an example of a 4-phase traffic plan for a crossroad intersection. As shown
in Fig. 1, there are altogether 12 traffic flows denoted as F = {f1, f2, ..., f12}, and 4 light phases are defined. The 4 phases rotate in sequence, and the set of traffic flows allowed in phase i (i ∈ {1, 2, 3, 4}) is denoted as Fi. For example, phase 1 turns on the green light in both the west-east and east-west directions, allowing f2, f3, f8, f9 to pass; thus we have F1 = {f2, f3, f8, f9}. For adaptive traffic signal control, the duration of each signal phase is variable, and it is up to the intelligent traffic signal agent to determine the time interval of each signal phase based on the dynamic traffic status at the intersection.

where g_{p_{i,j'}} is the green time of phase p_{i,j'}, f_m ∈ F_{i,j'}. The CDI of phase p_{i,j} is calculated as

C_{i,j} = ( Σ_{l=1}^{L_i} C_{i,j,l} ) / L_i,   (5)

where L_i is the total number of traffic flows at intersection i. The CDI at S_i is thus obtained by

C_i = ( Σ_{j=1}^{J_i} C_{i,j} ) / J_i.   (6)
Online Learning Problem for ATSC: For adaptive traffic
signal control, the time duration of the light phases is variable
and can be adjusted to dynamic traffic status. In the framework
of reinforcement learning, the problem can be formulated as:
given the state describing the traffic condition near the ith
intersection, the goal of the traffic control agent Si is to learn
the optimal action (i.e., whether to change the light to the
next phase), so that the congestion delay index Ci can be
minimized.
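A minimal sketch of the linear reward-penalty update in (1)-(2) for such an agent is given below, using a two-action automaton (keep the current phase vs. switch to the next phase). The parameter values and the binary reward signal are placeholders, and this generic FALA step is not the full KN-MALA update.

```python
def fala_update(p, i, beta, a=0.1, b=0.1):
    """One step of the linear reward-penalty scheme in (1)-(2).

    p: action probability vector (sums to 1); i: index of the action
    just performed; beta: environment response (0 = reward, 1 = penalty);
    a, b: reward and penalty parameters (placeholder values).
    """
    r = len(p)
    q = list(p)
    # Eq. (1): update for the selected action i.
    q[i] = p[i] + a * (1 - beta) * (1 - p[i]) - b * beta * p[i]
    # Eq. (2): update for every other action j != i.
    for j in range(r):
        if j != i:
            q[j] = p[j] - a * (1 - beta) * p[j] + b * beta * (1.0 / (r - 1) - p[j])
    return q

# Two actions: 0 = keep current phase, 1 = switch to next phase.
p = fala_update([0.5, 0.5], i=1, beta=0)  # rewarded switch
# p[1] grows toward 1 while the vector stays normalized.
```

Note that the reward and penalty terms are balanced so that the probabilities always sum to one after each step.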
Fig. 4. Total running time during learning

Fig. 5. Average waiting time during learning

Fig. 6. Longest travel time during learning
Fig. 7. Total running time with different µ

Fig. 8. Average waiting time with different µ

Fig. 9. Longest travel time with different µ
TABLE I
PHASE SETTING

Type of Intersection    Road Direction    Phase    Flows Allowed
Straight intersection   E,W               1        E-W,W-E
                                          2        ——
T-Junction              E,W,N             1        E-W,W-E,E-N
                                          2        W-N
                                          3        N-E,N-W
Crossroads              E,W,N,S           1        E-W,E-N,W-E,W-S
                                          2        E-S,W-N
                                          3        N-S,S-N,N-W,S-E
                                          4        N-E,S-W

C. Parameter Setting

First, the CDI Ci at intersection Si is mapped to one of the discrete states. In our experiment, the number of states is set to 5, and the mapping rule is 1 ≤ Ci < 1.05 → s1, 1.05 ≤ Ci < 1.15 → s2, 1.15 ≤ Ci < 1.25 → s3, 1.25 ≤ Ci < 1.35 → s4, 1.35 ≤ Ci → s5. Therefore, the state of Si is φi ∈ {s1, s2, s3, s4, s5}. The Learning Automaton is initialized with µ = 30 and σ = 5. The learning rate a0 is set to 0.01 and the constant n0 is set to 100. The lower limit of µ is set to 10 during the update process.

D. Results and Discussion

Evaluation Metrics. The performance of the proposed algorithm is evaluated in terms of the following metrics. 1) Total running time: the time it takes for the system to clear all the traffic. 2) Average waiting time: the average waiting time experienced by a vehicle. 3) Longest travel time: the time it takes for the vehicle travelling the longest path to reach its destination.

Compared Methods. The performance of the proposed KN-MALA algorithm is compared with two other methods: 1) pre-timed traffic light control, where a fixed time is set for each phase; the fixed phase time is set to 30 seconds in our experiments, and the results are marked as "Benchmark". 2) Single Agent Learning Automata (SALA), where a traditional Learning Automaton is employed at each intersection for adaptive traffic light control without collaboration between learning agents.

Convergence of Performance Metrics. Fig. 4, Fig. 5 and Fig. 6 show the total running time, average waiting time, and longest travel time versus the learning episode, respectively, for the proposed KN-MALA algorithm and the other compared methods. For KN-MALA, we can see that all three performance metrics converge after episode 500; this is because the value of n corresponding to each state is large enough that its standard deviation σi(n) is small enough, as shown in (10). It can also be observed in Fig. 4 that the SALA algorithm that learns by itself is inferior to the benchmark in terms of total running time, while the total running time of the KN-MALA algorithm is nearly 14% less than the benchmark, which means KN-MALA can clear the traffic faster than the other two algorithms. From Fig. 5 and Fig. 6, we can see that the KN-MALA algorithm outperforms the benchmark and SALA methods significantly in terms of average waiting time and longest travel time.

Effect of Initial Parameter Setting. To investigate the effect of the initial parameters µ and σ on the performance of the proposed algorithm, we conduct experiments with different µ and σ, and the results are shown in Fig. 7 to Fig. 10. Fig. 7, Fig. 8 and Fig. 9 show the total running time, average waiting time and longest travel time with different µ, respectively, for the three algorithms. It can be observed that all three performance indicators increase with µ for all three algorithms. However, compared with the benchmark and SALA, the performance metrics of KN-MALA change
very slightly, exhibiting a certain degree of insensitivity to the initial setting of µ. Fig. 10 shows the total running time, average waiting time, and longest travel time of the KN-MALA algorithm under different initial settings of µ and σ. The results show that the KN-MALA algorithm is relatively insensitive to the initial setting of both µ and σ.

Fig. 10. Performance of KN-MALA with different µ and σ

Fig. 11. Performance of KN-MALA with different number of states

The number of states for φ also affects the performance of the system. Fig. 11 shows the performance of KN-MALA when the number of states is 3, 5 and 7, respectively. When the number of states is 3, CDI Ci in the ranges (<1.11, 1.11–1.29, >1.29) is mapped to {s1, s2, s3}, respectively. Similarly, when the number of states is 7, CDI Ci in the ranges (<1.05, 1.05–1.11, 1.11–1.17, 1.17–1.23, 1.23–1.29, 1.29–1.35, >1.35) is mapped to {s1, s2, ..., s7}. It can be seen in Fig. 11 that the KN-MALA algorithm performs better when the number of states is larger; this is because increasing the number of states of φ is equivalent to refining the defined range of the CDI, so the algorithm can make better decisions and the traffic in the system is cleared faster.

V. CONCLUSION

In this paper, we proposed an algorithm for adaptive traffic signal control. The algorithm is based on multi-agent Learning Automata and can learn the optimal decision online to determine the phase time of the signals dynamically. Simulation is conducted based on the data set describing the city road network of 24 intersections of the city of Sioux Falls, USA. The results show that the proposed KN-MALA algorithm can effectively improve traffic efficiency, thus alleviating the traffic congestion problem. The proposed method is also applicable to multi-agent planning problems in other fields, such as economics, management, and engineering.

ACKNOWLEDGEMENT

This work is sponsored by projects 61831007 and U20B2048 supported by the National Natural Science Foundation of China, and by the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security.

REFERENCES

[1] W.-H. Lee and C.-Y. Chiu, "Design and implementation of a smart traffic signal control system for smart city applications," Sensors, vol. 20, no. 2, pp. 508–525, 2020.
[2] X. Liang, X. Du, G. Wang, and Z. Han, "A deep reinforcement learning network for traffic light cycle control," IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 1243–1253, 2019.
[3] H. Wei, G. Zheng, H. Yao, and Z. Li, "IntelliLight: A reinforcement learning approach for intelligent traffic light control," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 2496–2505. [Online]. Available: https://doi.org/10.1145/3219819.3220096
[4] M. Wiering, "Multi-agent reinforcement learning for traffic light control," in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML '00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 1151–1158.
[5] Y. Bi, X. Lu, Z. Sun, D. Srinivasan, and Z. Sun, "Optimal type-2 fuzzy system for arterial traffic signal control," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 9, pp. 3009–3027, 2018.
[6] H. Ge, Y. Song, C. Wu, J. Ren, and G. Tan, "Cooperative deep Q-learning with Q-value transfer for multi-intersection signal control," IEEE Access, vol. 7, pp. 40797–40809, 2019.
[7] F. V. Webster, "Traffic signal settings," Road Research Technical Paper, vol. 39, 1958.
[8] N. Wu, D. Li, and Y. Xi, "Distributed weighted balanced control of traffic signals for urban traffic congestion," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 10, pp. 3710–3720, 2019.
[9] H. J. Chang and G. T. Park, "A study on traffic signal control at signalized intersections in vehicular ad hoc networks," Ad Hoc Networks, vol. 11, no. 7, pp. 2115–2124, 2013.
[10] K. Pandit, D. Ghosal, H. M. Zhang, and C.-N. Chuah, "Adaptive traffic signal control with vehicular ad hoc networks," IEEE Transactions on Vehicular Technology, vol. 62, no. 4, pp. 1459–1471, 2013.
[11] L. Zhang, T. M. Garoni, and J. de Gier, "A comparative study of macroscopic fundamental diagrams of arterial road networks governed by adaptive traffic signal systems," Transportation Research Part B, vol. 49, pp. 1–23, 2013.
[12] P. Zhou, X. Chen, Z. Liu, T. Braud, P. Hui, and J. Kangasharju, "DRLE: Decentralized reinforcement learning at the edge for traffic light control in the IoV," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 4, pp. 2262–2273, 2021.
[13] W. Liu, G. Qin, Y. He, and F. Jiang, "Distributed cooperative reinforcement learning-based traffic signal control that integrates V2X networks' dynamic clustering," IEEE Transactions on Vehicular Technology, vol. 66, no. 10, pp. 8667–8681, 2017.
[14] Y. Chen, C. Li, W. Yue, H. Zhang, and G. Mao, "Engineering a large-scale traffic signal control: A multi-agent reinforcement learning approach," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2021, pp. 1–6.
[15] Z. Zhang, D. Wang, and J. Gao, "Learning automata-based multiagent reinforcement learning for optimization of cooperative tasks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 10, pp. 4639–4652, 2021.
[16] B. Masoumi and M. R. Meybodi, "Learning automata based multi-agent system algorithms for finding optimal policies in Markov games," Asian Journal of Control, vol. 14, no. 1, pp. 137–152, 2012.
[17] Y. Guo, "Research on algorithms of free-of-parameter-tuning learning automata," Ph.D. dissertation, Shanghai Jiao Tong University, 2019.
[18] Transportation Networks for Research Core Team, "Transportation networks for research," https://github.com/bstabler/TransportationNetworks, Oct 2021.