IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control (KDD’18)

Yamato OKAMOTO
2018/12/16

Who is Yamato?
- Master of Informatics, Kyoto University, Japan (2013)
- Working as a business developer & AI researcher at OMRON.Inc (2013~)
- Twitter: RoadRoller_DESU
Photo: at the ICDM’18 banquet

Today’s paper
IntelliLight: A Reinforcement Learning Approach for Intelligent
Traffic Light Control (KDD’18)
Hua Wei, Guanjie Zheng, Huaxiu Yao, Zhenhui Li
Pennsylvania State University, University Park, PA, USA
Why I chose this paper
- This paper has two unique points:
1. They tested the method on real-world traffic data.
2. They try to interpret the learned policies.

Motivation
Traffic congestion has become increasingly costly.
One way to reduce the traffic congestion is by intelligently
controlling traffic lights.

Related Work (1/2)
Self-Organizing Traffic Light Control (SOTL)
Controls the traffic light according to the current traffic state
(the “state” includes the elapsed time and the number of vehicles
waiting at the red light).
The traffic light changes when the number of waiting cars is above
a hand-tuned threshold.
Remaining Challenges
SOTL decides without taking future situations into account.
In order to control traffic lights intelligently…
Detect waiting cars
and change traffic light
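
The threshold rule described for SOTL can be sketched in a few lines. This is only a minimal illustration of the idea on this slide, not the SOTL implementation: the class name, the `min_green_seconds` guard, and all default values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SotlController:
    """Threshold rule in the spirit of SOTL (names and defaults are hypothetical)."""

    threshold: int = 5            # hand-tuned number of waiting cars
    min_green_seconds: int = 10   # hypothetical minimum time before a switch

    def should_switch(self, waiting_at_red: int, elapsed_seconds: int) -> bool:
        # Switch only if the current phase has been held long enough and
        # the number of cars waiting at the red light is above the threshold.
        return (elapsed_seconds >= self.min_green_seconds
                and waiting_at_red > self.threshold)


controller = SotlController(threshold=5, min_green_seconds=10)
print(controller.should_switch(waiting_at_red=6, elapsed_seconds=12))
```

The rule looks only at the current state, which is exactly the limitation listed under Remaining Challenges.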

Related Work (2/2)
Deep Reinforcement Learning for Traffic Light Control
Apply deep Q-learning to cope with the unmanageably large state space.
Learn a Q-function (e.g. a deep neural network) that maps a state and
an action to the expected reward. These works vary in their state representation
and reward design.
Remaining Challenges
Previous studies all take the traffic light phase as just one feature, and
this single feature does not have enough influence on the decision.
As a result, agents have difficulty distinguishing the decision
process for different traffic light phases.
In order to take future situations into account…

Key Idea (1/2)
1. Phase-gated model learning
To distinguish the decision process for different phases, they
design a separate learning process for the decision function Q(s, a) of each phase.
These separate processes are selected through a gate
controlled by the phase:
when phase P = 0, the left branch is activated,
while when phase P = 1, the right branch is activated.
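
The gating idea can be illustrated with a small sketch. It assumes a toy linear function per branch as a stand-in for a neural sub-network and leaves out any layers shared between branches; `make_linear_branch`, `PhaseGatedQ`, and the weight values are hypothetical names and numbers chosen for the example.

```python
from typing import Callable, Dict, List, Sequence

QBranch = Callable[[Sequence[float]], List[float]]


def make_linear_branch(weights: List[List[float]]) -> QBranch:
    """Toy Q-branch with one weight row per action (stand-in for a sub-network)."""
    def branch(features: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, features)) for row in weights]
    return branch


class PhaseGatedQ:
    """Selects which Q-branch is evaluated based on the current phase."""

    def __init__(self, branches: Dict[int, QBranch]) -> None:
        self.branches = branches

    def q_values(self, features: Sequence[float], phase: int) -> List[float]:
        # The gate: only the branch that belongs to the current phase is used.
        return self.branches[phase](features)

    def best_action(self, features: Sequence[float], phase: int) -> int:
        q = self.q_values(features, phase)
        return max(range(len(q)), key=q.__getitem__)


model = PhaseGatedQ({
    0: make_linear_branch([[1.0, 0.0], [0.0, 1.0]]),  # "left" branch, P = 0
    1: make_linear_branch([[0.0, 1.0], [1.0, 0.0]]),  # "right" branch, P = 1
})
features = [3.0, 1.0]
print(model.best_action(features, phase=0), model.best_action(features, phase=1))
```

With identical input features, the two phases can lead to different decisions because each phase has its own branch.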

Key Idea (2/2)
2. Memory Palace and Model Updating
Imbalanced samples of traffic on different lanes lead to
inferior performance in less frequent situations.
To solve this, they use different memory palaces for different
traffic-light-phase-action combinations:
training samples for different phase-action combinations
are stored in different memory palaces.
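
One way to picture the memory palace is a set of replay buffers keyed by (phase, action), from which the same number of samples is drawn per buffer. The sketch below is an assumption about how this could look, not the authors' code: the class name, the per-palace capacity, and the equal-count sampling rule are all illustrative.

```python
import random
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Sample = Tuple[tuple, int, float]  # (state, action, reward)


class MemoryPalace:
    """One bounded buffer per (phase, action) combination (hypothetical design)."""

    def __init__(self, capacity_per_palace: int = 1000) -> None:
        self.capacity = capacity_per_palace
        self.palaces: Dict[Tuple[int, int], Deque[Sample]] = defaultdict(
            lambda: deque(maxlen=self.capacity))

    def store(self, phase: int, action: int, sample: Sample) -> None:
        self.palaces[(phase, action)].append(sample)

    def sample_balanced(self, per_palace: int, rng=random) -> List[Sample]:
        # Draw up to `per_palace` samples from every palace so that rare
        # phase-action combinations are not drowned out by frequent ones.
        batch: List[Sample] = []
        for memory in self.palaces.values():
            k = min(per_palace, len(memory))
            batch.extend(rng.sample(list(memory), k))
        return batch


palace = MemoryPalace()
for i in range(100):                      # frequent situation
    palace.store(0, 0, ((i,), 0, -1.0))
for i in range(3):                        # rare situation
    palace.store(1, 1, ((i,), 1, -2.0))
print(len(palace.sample_balanced(per_palace=2)))
```

Even though the first combination has far more samples, the drawn batch contains the same number from each palace.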

Traffic Light Optimization Framework
State
A single intersection is considered. For each lane 𝒊 at this intersection:
𝑳𝒊 ：queue length
𝑽𝒊 ：number of vehicles
𝑾𝒊 ：waiting time of vehicles
𝑴 ：image representation of vehicles’ position
Action
The traffic light has two actions:
a = 1: change the light to next phase
a = 0: keep the current phase

Reward
The reward is defined as a weighted sum of the following factors:
𝑳𝒊 ：sum of queue length L over all approaching lanes
𝑫𝒊 ：sum of delay D over all approaching lanes
𝑾𝒊 ：sum of updated waiting time W over all approaching lanes
𝑪 ：C = 0 for keeping and C = 1 for changing the current phase
𝑵 ：number of vehicles that passed the intersection during time interval ∆t after the last action
𝑻 ：total time that vehicles spent on approaching lanes
*heuristic
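
The weighted sum can be written out as a short function. The slide does not give the weights, so the values below are placeholders chosen only to make the example computable; the field names and `weighted_reward` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RewardTerms:
    queue_length: float   # sum of L_i over all approaching lanes
    delay: float          # sum of D_i over all approaching lanes
    waiting_time: float   # sum of W_i over all approaching lanes
    phase_changed: int    # C: 0 = keep, 1 = change
    vehicles: float       # N
    total_time: float     # T


# Placeholder weights for illustration only; not taken from the slides.
PLACEHOLDER_WEIGHTS: Dict[str, float] = {
    "queue_length": -0.25,
    "delay": -0.25,
    "waiting_time": -0.25,
    "phase_changed": -5.0,
    "vehicles": 1.0,
    "total_time": 1.0,
}


def weighted_reward(terms: RewardTerms,
                    weights: Dict[str, float] = PLACEHOLDER_WEIGHTS) -> float:
    # Reward = sum over all factors of (weight * factor value).
    return sum(weight * getattr(terms, name) for name, weight in weights.items())


terms = RewardTerms(queue_length=8.0, delay=2.0, waiting_time=4.0,
                    phase_changed=1, vehicles=3.0, total_time=1.5)
print(weighted_reward(terms))
```

Keeping the weights in one dictionary makes it easy to see how each factor pushes the reward up or down when the weights are tuned.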

Training(1) ~Offline Part~
• To collect data samples, let traffic go through with a fixed light timetable.
Training(2) ~Online Part~
• At every time interval ∆t, the traffic light agent observes the state of the
environment and takes the action with the maximum estimated reward
according to a greedy strategy.
• After that, the agent observes the environment and gets the reward.
Then the tuple (s, a, r) is stored in memory.
• After several timestamps, the agent updates the network according to the logs
in the memory.
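
The online part can be sketched as a loop of observe, act, store, and periodically update. Everything here is a simplified assumption: `env_step`, `q_fn`, and `update_fn` are hypothetical callables standing in for the simulator, the Q-network, and its training step, and the exploration rate `epsilon` is an addition of this sketch (setting it to 0 gives the purely greedy choice described above).

```python
import random
from typing import Callable, List, Tuple

Transition = Tuple[object, int, float]  # (s, a, r)


def select_action(q_values: List[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def run_online(env_step: Callable[[object, int], Tuple[object, float]],
               q_fn: Callable[[object], List[float]],
               update_fn: Callable[[List[Transition]], None],
               initial_state: object,
               steps: int,
               update_every: int = 4,
               epsilon: float = 0.0,
               seed: int = 0) -> List[Transition]:
    rng = random.Random(seed)
    memory: List[Transition] = []
    state = initial_state
    for t in range(1, steps + 1):
        action = select_action(q_fn(state), epsilon, rng)   # decide
        next_state, reward = env_step(state, action)        # observe outcome
        memory.append((state, action, reward))               # store (s, a, r)
        if t % update_every == 0:
            update_fn(memory)                                 # periodic update
        state = next_state
    return memory


# Toy stand-ins so the loop can run without a traffic simulator.
updates = []
log = run_online(env_step=lambda s, a: (s + 1, -float(a)),
                 q_fn=lambda s: [1.0, 0.0],
                 update_fn=lambda mem: updates.append(len(mem)),
                 initial_state=0, steps=8)
print(len(log), updates)
```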

Experiment
In this paper, they conduct experiments using both synthetic
and real-world traffic data.
To evaluate the effectiveness of the proposed model, they
compare it with the following baseline methods:
- Fixed-time Control (FT)
- Self-Organizing Traffic Light Control (SOTL)
- Deep Reinforcement Learning for Traffic Light Control (DRL).
To interpret the learned signal policy, they show the percentage
of each action.

Experiment (1/3)
About Synthetic Data
Four traffic flow settings:
1. simple changing traffic (configuration 1)
2. equally steady traffic (configuration 2)
3. unequally steady traffic (configuration 3)
4. complex traffic (configuration 4) (*) combination of the previous configurations

Experiment (1/3)
Performance on Synthetic Data
The proposed method, IntelliLight, achieves the best reward.
The proposed components MP (Memory Palace) and PG (Phase Gate) boost the
reward (but not in all configurations).

Experiment (2/3)
About Real-world Data and Performance on it
Data is collected by 1,704 surveillance cameras in Jinan (China), over the
time period from 08/01/2016 to 08/31/2016.
By analyzing records with camera locations, the trajectories of vehicles
are recorded when they pass through road intersections.
The dataset covers 935 locations, and they feed this real-world traffic
setting into SUMO as online experiments.
The proposed method, IntelliLight, achieves the best reward.

Experiment (3/3)
The learned policy adjusts intelligently to different traffic conditions:
peak hour vs. non-peak hour, weekday vs. weekend.

Conclusion
This paper addresses the traffic light control problem
using a well-designed reinforcement learning
approach.
The proposed method distinguishes the decision process
for different traffic light phases.
They conducted experiments using both synthetic
and real-world data.
The proposed method showed superior performance
over state-of-the-art methods.
